Supported models and datasets

Models

The table below introduces all models supported by SWIFT:

  • Model Type: The model_type registered in SWIFT.
  • Default Lora Target Modules: The default lora_target_modules for the model.
  • Default Template: The default template for the model.
  • Support Flash Attn: Whether the model supports flash attention to accelerate SFT and inference.
  • Support vLLM: Whether the model supports vLLM to accelerate inference and deployment.
  • Requires: Additional dependencies required by the model.
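
As a rough illustration of how the Requires column can be read, the sketch below checks a few rows against a set of installed package versions. It is not part of SWIFT: `ModelEntry`, `parse_requirement`, `usable`, and the `installed` versions are hypothetical names and values chosen for this example, and the version comparison assumes plain dotted numeric versions.

```python
from dataclasses import dataclass, field


@dataclass
class ModelEntry:
    """One table row, reduced to the columns used in this sketch (hypothetical type)."""
    model_type: str
    lora_target_modules: list
    template: str
    requires: list = field(default_factory=list)


def parse_requirement(spec):
    """Split a Requires cell such as 'transformers>=4.37' into (name, op, version)."""
    spec = spec.strip()
    # Check two-character operators first so '>=' is not mistaken for '>'.
    for op in (">=", "<=", "==", "<", ">"):
        if op in spec:
            name, version = spec.split(op, 1)
            return name.strip(), op, version.strip()
    return spec, None, None  # bare package name, e.g. 'autoawq'


def version_tuple(version, width=3):
    """Turn '4.42' into (4, 42, 0); assumes numeric dotted versions only."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed, op, required):
    a, b = version_tuple(installed), version_tuple(required)
    return {">=": a >= b, "<=": a <= b, "==": a == b, "<": a < b, ">": a > b}[op]


def usable(entry, installed):
    """True if every requirement of the entry is met by the installed packages."""
    for spec in entry.requires:
        name, op, required = parse_requirement(spec)
        if name not in installed:
            return False
        if op is not None and not satisfies(installed[name], op, required):
            return False
    return True


# A few rows copied from the table below.
ENTRIES = [
    ModelEntry("qwen-7b-chat-int4", ["c_attn"], "qwen", ["auto_gptq>=0.5"]),
    ModelEntry("chatglm3-6b", ["query_key_value"], "chatglm3", ["transformers<4.42"]),
    ModelEntry("glm4-9b-chat", ["query_key_value"], "chatglm4", ["transformers>=4.42"]),
    ModelEntry("llama3_1-8b-instruct-awq", ["q_proj", "k_proj", "v_proj"], "llama3",
               ["transformers>=4.43", "autoawq"]),
]

# Hypothetical environment; the versions are made up for the example.
installed = {"transformers": "4.41.2", "auto_gptq": "0.7.1"}
available = [e.model_type for e in ENTRIES if usable(e, installed)]
print(available)
```

With these assumed versions, only `qwen-7b-chat-int4` and `chatglm3-6b` pass. The example also shows that `chatglm3-6b` (transformers<4.42) and `glm4-9b-chat` (transformers>=4.42) cannot share one transformers installation.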

LLM

Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support vLLM Support LMDeploy Support Megatron Requires Tags HF Model ID
qwen-1_8b qwen/Qwen-1_8B c_attn default-generation - Qwen/Qwen-1_8B
qwen-1_8b-chat qwen/Qwen-1_8B-Chat c_attn qwen - Qwen/Qwen-1_8B-Chat
qwen-1_8b-chat-int4 qwen/Qwen-1_8B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int4
qwen-1_8b-chat-int8 qwen/Qwen-1_8B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int8
qwen-7b qwen/Qwen-7B c_attn default-generation - Qwen/Qwen-7B
qwen-7b-chat qwen/Qwen-7B-Chat c_attn qwen - Qwen/Qwen-7B-Chat
qwen-7b-chat-int4 qwen/Qwen-7B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int4
qwen-7b-chat-int8 qwen/Qwen-7B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int8
qwen-14b qwen/Qwen-14B c_attn default-generation - Qwen/Qwen-14B
qwen-14b-chat qwen/Qwen-14B-Chat c_attn qwen - Qwen/Qwen-14B-Chat
qwen-14b-chat-int4 qwen/Qwen-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int4
qwen-14b-chat-int8 qwen/Qwen-14B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int8
qwen-72b qwen/Qwen-72B c_attn default-generation - Qwen/Qwen-72B
qwen-72b-chat qwen/Qwen-72B-Chat c_attn qwen - Qwen/Qwen-72B-Chat
qwen-72b-chat-int4 qwen/Qwen-72B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int4
qwen-72b-chat-int8 qwen/Qwen-72B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int8
modelscope-agent-7b iic/ModelScope-Agent-7B c_attn modelscope-agent - -
modelscope-agent-14b iic/ModelScope-Agent-14B c_attn modelscope-agent - -
qwen1half-0_5b qwen/Qwen1.5-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-0.5B
qwen1half-1_8b qwen/Qwen1.5-1.8B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-1.8B
qwen1half-4b qwen/Qwen1.5-4B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-4B
qwen1half-7b qwen/Qwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-7B
qwen1half-14b qwen/Qwen1.5-14B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-14B
qwen1half-32b qwen/Qwen1.5-32B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-32B
qwen1half-72b qwen/Qwen1.5-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-72B
qwen1half-110b qwen/Qwen1.5-110B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-110B
codeqwen1half-7b qwen/CodeQwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/CodeQwen1.5-7B
qwen1half-moe-a2_7b qwen/Qwen1.5-MoE-A2.7B q_proj, k_proj, v_proj default-generation transformers>=4.40 moe Qwen/Qwen1.5-MoE-A2.7B
qwen1half-0_5b-chat qwen/Qwen1.5-0.5B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat
qwen1half-1_8b-chat qwen/Qwen1.5-1.8B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat
qwen1half-4b-chat qwen/Qwen1.5-4B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-4B-Chat
qwen1half-7b-chat qwen/Qwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-7B-Chat
qwen1half-14b-chat qwen/Qwen1.5-14B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-14B-Chat
qwen1half-32b-chat qwen/Qwen1.5-32B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-32B-Chat
qwen1half-72b-chat qwen/Qwen1.5-72B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-72B-Chat
qwen1half-110b-chat qwen/Qwen1.5-110B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-110B-Chat
qwen1half-moe-a2_7b-chat qwen/Qwen1.5-MoE-A2.7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.40 moe Qwen/Qwen1.5-MoE-A2.7B-Chat
codeqwen1half-7b-chat qwen/CodeQwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/CodeQwen1.5-7B-Chat
qwen1half-0_5b-chat-int4 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
qwen1half-1_8b-chat-int4 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4
qwen1half-4b-chat-int4 qwen/Qwen1.5-4B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int4
qwen1half-7b-chat-int4 qwen/Qwen1.5-7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
qwen1half-14b-chat-int4 qwen/Qwen1.5-14B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int4
qwen1half-32b-chat-int4 qwen/Qwen1.5-32B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
qwen1half-72b-chat-int4 qwen/Qwen1.5-72B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
qwen1half-110b-chat-int4 qwen/Qwen1.5-110B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-110B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-int8 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8
qwen1half-1_8b-chat-int8 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8
qwen1half-4b-chat-int8 qwen/Qwen1.5-4B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int8
qwen1half-7b-chat-int8 qwen/Qwen1.5-7B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int8
qwen1half-14b-chat-int8 qwen/Qwen1.5-14B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int8
qwen1half-72b-chat-int8 qwen/Qwen1.5-72B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int8
qwen1half-moe-a2_7b-chat-int4 qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.40 moe Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-awq qwen/Qwen1.5-0.5B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-0.5B-Chat-AWQ
qwen1half-1_8b-chat-awq qwen/Qwen1.5-1.8B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-1.8B-Chat-AWQ
qwen1half-4b-chat-awq qwen/Qwen1.5-4B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-4B-Chat-AWQ
qwen1half-7b-chat-awq qwen/Qwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-7B-Chat-AWQ
qwen1half-14b-chat-awq qwen/Qwen1.5-14B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-14B-Chat-AWQ
qwen1half-32b-chat-awq qwen/Qwen1.5-32B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-32B-Chat-AWQ
qwen1half-72b-chat-awq qwen/Qwen1.5-72B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-72B-Chat-AWQ
qwen1half-110b-chat-awq qwen/Qwen1.5-110B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-110B-Chat-AWQ
codeqwen1half-7b-chat-awq qwen/CodeQwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/CodeQwen1.5-7B-Chat-AWQ
qwen2-0_5b qwen/Qwen2-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-0.5B
qwen2-0_5b-instruct qwen/Qwen2-0.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct
qwen2-0_5b-instruct-int4 qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4
qwen2-0_5b-instruct-int8 qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8
qwen2-0_5b-instruct-awq qwen/Qwen2-0.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-0.5B-Instruct-AWQ
qwen2-1_5b qwen/Qwen2-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-1.5B
qwen2-1_5b-instruct qwen/Qwen2-1.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct
qwen2-1_5b-instruct-int4 qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4
qwen2-1_5b-instruct-int8 qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8
qwen2-1_5b-instruct-awq qwen/Qwen2-1.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-1.5B-Instruct-AWQ
qwen2-7b qwen/Qwen2-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-7B
qwen2-7b-instruct qwen/Qwen2-7B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-7B-Instruct
qwen2-7b-instruct-int4 qwen/Qwen2-7B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-7B-Instruct-GPTQ-Int4
qwen2-7b-instruct-int8 qwen/Qwen2-7B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-7B-Instruct-GPTQ-Int8
qwen2-7b-instruct-awq qwen/Qwen2-7B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-7B-Instruct-AWQ
qwen2-72b qwen/Qwen2-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-72B
qwen2-72b-instruct qwen/Qwen2-72B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-72B-Instruct
qwen2-72b-instruct-int4 qwen/Qwen2-72B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-72B-Instruct-GPTQ-Int4
qwen2-72b-instruct-int8 qwen/Qwen2-72B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-72B-Instruct-GPTQ-Int8
qwen2-72b-instruct-awq qwen/Qwen2-72B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-72B-Instruct-AWQ
qwen2-57b-a14b qwen/Qwen2-57B-A14B q_proj, k_proj, v_proj default-generation transformers>=4.40 moe Qwen/Qwen2-57B-A14B
qwen2-57b-a14b-instruct qwen/Qwen2-57B-A14B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.40 moe Qwen/Qwen2-57B-A14B-Instruct
qwen2-57b-a14b-instruct-int4 qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.40 moe Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4
qwen2-math-1_5b qwen/Qwen2-Math-1.5B q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-1.5B
qwen2-math-1_5b-instruct qwen/Qwen2-Math-1.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-1.5B-Instruct
qwen2-math-7b qwen/Qwen2-Math-7B q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-7B
qwen2-math-7b-instruct qwen/Qwen2-Math-7B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-7B-Instruct
qwen2-math-72b qwen/Qwen2-Math-72B q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-72B
qwen2-math-72b-instruct qwen/Qwen2-Math-72B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-72B-Instruct
qwen2_5-0_5b qwen/Qwen2.5-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-0.5B
qwen2_5-1_5b qwen/Qwen2.5-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-1.5B
qwen2_5-3b qwen/Qwen2.5-3B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-3B
qwen2_5-7b qwen/Qwen2.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-7B
qwen2_5-14b qwen/Qwen2.5-14B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-14B
qwen2_5-32b qwen/Qwen2.5-32B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-32B
qwen2_5-72b qwen/Qwen2.5-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-72B
qwen2_5-0_5b-instruct qwen/Qwen2.5-0.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-0.5B-Instruct
qwen2_5-1_5b-instruct qwen/Qwen2.5-1.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-1.5B-Instruct
qwen2_5-3b-instruct qwen/Qwen2.5-3B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-3B-Instruct
qwen2_5-7b-instruct qwen/Qwen2.5-7B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-7B-Instruct
qwen2_5-14b-instruct qwen/Qwen2.5-14B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-14B-Instruct
qwen2_5-32b-instruct qwen/Qwen2.5-32B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-32B-Instruct
qwen2_5-72b-instruct qwen/Qwen2.5-72B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-72B-Instruct
qwen2_5-0_5b-instruct-gptq-int4 qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4
qwen2_5-1_5b-instruct-gptq-int4 qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4
qwen2_5-3b-instruct-gptq-int4 qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4
qwen2_5-7b-instruct-gptq-int4 qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4
qwen2_5-14b-instruct-gptq-int4 qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4
qwen2_5-32b-instruct-gptq-int4 qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4
qwen2_5-72b-instruct-gptq-int4 qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
qwen2_5-0_5b-instruct-gptq-int8 qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8
qwen2_5-1_5b-instruct-gptq-int8 qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8
qwen2_5-3b-instruct-gptq-int8 qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8
qwen2_5-7b-instruct-gptq-int8 qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8
qwen2_5-14b-instruct-gptq-int8 qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8
qwen2_5-32b-instruct-gptq-int8 qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8
qwen2_5-72b-instruct-gptq-int8 qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8
qwen2_5-0_5b-instruct-awq qwen/Qwen2.5-0.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-0.5B-Instruct-AWQ
qwen2_5-1_5b-instruct-awq qwen/Qwen2.5-1.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-1.5B-Instruct-AWQ
qwen2_5-3b-instruct-awq qwen/Qwen2.5-3B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-3B-Instruct-AWQ
qwen2_5-7b-instruct-awq qwen/Qwen2.5-7B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-7B-Instruct-AWQ
qwen2_5-14b-instruct-awq qwen/Qwen2.5-14B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-14B-Instruct-AWQ
qwen2_5-32b-instruct-awq qwen/Qwen2.5-32B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-32B-Instruct-AWQ
qwen2_5-72b-instruct-awq qwen/Qwen2.5-72B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-72B-Instruct-AWQ
qwen2_5-math-1_5b qwen/Qwen2.5-Math-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Math-1.5B
qwen2_5-math-7b qwen/Qwen2.5-Math-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Math-7B
qwen2_5-math-72b qwen/Qwen2.5-Math-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Math-72B
qwen2_5-math-1_5b-instruct qwen/Qwen2.5-Math-1.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Math-1.5B-Instruct
qwen2_5-math-7b-instruct qwen/Qwen2.5-Math-7B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Math-7B-Instruct
qwen2_5-math-72b-instruct qwen/Qwen2.5-Math-72B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Math-72B-Instruct
qwen2_5-coder-1_5b qwen/Qwen2.5-Coder-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Coder-1.5B
qwen2_5-coder-1_5b-instruct qwen/Qwen2.5-Coder-1.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Coder-1.5B-Instruct
qwen2_5-coder-7b qwen/Qwen2.5-Coder-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Coder-7B
qwen2_5-coder-7b-instruct qwen/Qwen2.5-Coder-7B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Coder-7B-Instruct
chatglm2-6b ZhipuAI/chatglm2-6b query_key_value chatglm2 transformers<4.42 - THUDM/chatglm2-6b
chatglm2-6b-32k ZhipuAI/chatglm2-6b-32k query_key_value chatglm2 transformers<4.42 - THUDM/chatglm2-6b-32k
chatglm3-6b-base ZhipuAI/chatglm3-6b-base query_key_value chatglm-generation transformers<4.42 - THUDM/chatglm3-6b-base
chatglm3-6b ZhipuAI/chatglm3-6b query_key_value chatglm3 transformers<4.42 - THUDM/chatglm3-6b
chatglm3-6b-32k ZhipuAI/chatglm3-6b-32k query_key_value chatglm3 transformers<4.42 - THUDM/chatglm3-6b-32k
chatglm3-6b-128k ZhipuAI/chatglm3-6b-128k query_key_value chatglm3 transformers<4.42 - THUDM/chatglm3-6b-128k
codegeex2-6b ZhipuAI/codegeex2-6b query_key_value chatglm-generation transformers<4.34 coding THUDM/codegeex2-6b
glm4-9b ZhipuAI/glm-4-9b query_key_value chatglm-generation transformers>=4.42 - THUDM/glm-4-9b
glm4-9b-chat ZhipuAI/glm-4-9b-chat query_key_value chatglm4 transformers>=4.42 - THUDM/glm-4-9b-chat
glm4-9b-chat-1m ZhipuAI/glm-4-9b-chat-1m query_key_value chatglm4 transformers>=4.42 - THUDM/glm-4-9b-chat-1m
codegeex4-9b-chat ZhipuAI/codegeex4-all-9b query_key_value codegeex4 transformers<4.42 coding THUDM/codegeex4-all-9b
llama2-7b modelscope/Llama-2-7b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-7b-hf
llama2-7b-chat modelscope/Llama-2-7b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-7b-chat-hf
llama2-13b modelscope/Llama-2-13b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-13b-hf
llama2-13b-chat modelscope/Llama-2-13b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-13b-chat-hf
llama2-70b modelscope/Llama-2-70b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-70b-hf
llama2-70b-chat modelscope/Llama-2-70b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-70b-chat-hf
llama2-7b-aqlm-2bit-1x16 AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation transformers>=4.38, aqlm, torch>=2.2.0 - ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf
llama3-8b LLM-Research/Meta-Llama-3-8B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-8B
llama3-8b-instruct LLM-Research/Meta-Llama-3-8B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-8B-Instruct
llama3-8b-instruct-int4 swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4
llama3-8b-instruct-int8 swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8
llama3-8b-instruct-awq swift/Meta-Llama-3-8B-Instruct-AWQ q_proj, k_proj, v_proj llama3 autoawq - study-hjt/Meta-Llama-3-8B-Instruct-AWQ
llama3-70b LLM-Research/Meta-Llama-3-70B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-70B
llama3-70b-instruct LLM-Research/Meta-Llama-3-70B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-70B-Instruct
llama3-70b-instruct-int4 swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4
llama3-70b-instruct-int8 swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8
llama3-70b-instruct-awq swift/Meta-Llama-3-70B-Instruct-AWQ q_proj, k_proj, v_proj llama3 autoawq - study-hjt/Meta-Llama-3-70B-Instruct-AWQ
llama3_1-8b LLM-Research/Meta-Llama-3.1-8B q_proj, k_proj, v_proj default-generation transformers>=4.43 - meta-llama/Meta-Llama-3.1-8B
llama3_1-8b-instruct LLM-Research/Meta-Llama-3.1-8B-Instruct q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-8B-Instruct
llama3_1-8b-instruct-awq LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, autoawq - hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
llama3_1-8b-instruct-gptq-int4 LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, auto_gptq - hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
llama3_1-8b-instruct-bnb LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4 q_proj, k_proj, v_proj llama3 transformers>=4.43, bitsandbytes - hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4
llama3_1-70b LLM-Research/Meta-Llama-3.1-70B q_proj, k_proj, v_proj default-generation transformers>=4.43 - meta-llama/Meta-Llama-3.1-70B
llama3_1-70b-instruct LLM-Research/Meta-Llama-3.1-70B-Instruct q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-70B-Instruct
llama3_1-70b-instruct-fp8 LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8 q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-70B-Instruct-FP8
llama3_1-70b-instruct-awq LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, autoawq - hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
llama3_1-70b-instruct-gptq-int4 LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, auto_gptq - hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4
llama3_1-70b-instruct-bnb LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit q_proj, k_proj, v_proj llama3 transformers>=4.43, bitsandbytes - unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit
llama3_1-405b LLM-Research/Meta-Llama-3.1-405B q_proj, k_proj, v_proj default-generation transformers>=4.43 - meta-llama/Meta-Llama-3.1-405B
llama3_1-405b-instruct LLM-Research/Meta-Llama-3.1-405B-Instruct q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-405B-Instruct
llama3_1-405b-instruct-fp8 LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8 q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
llama3_1-405b-instruct-awq LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, autoawq - hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4
llama3_1-405b-instruct-gptq-int4 LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, auto_gptq - hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4
llama3_1-405b-instruct-bnb LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4 q_proj, k_proj, v_proj llama3 transformers>=4.43, bitsandbytes - hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4
llama3_2-1b LLM-Research/Llama-3.2-1B q_proj, k_proj, v_proj default-generation transformers>=4.45 - meta-llama/Llama-3.2-1B
llama3_2-1b-instruct LLM-Research/Llama-3.2-1B-Instruct q_proj, k_proj, v_proj llama3_2 transformers>=4.45 - meta-llama/Llama-3.2-1B-Instruct
llama3_2-3b LLM-Research/Llama-3.2-3B q_proj, k_proj, v_proj default-generation transformers>=4.45 - meta-llama/Llama-3.2-3B
llama3_2-3b-instruct LLM-Research/Llama-3.2-3B-Instruct q_proj, k_proj, v_proj llama3_2 transformers>=4.45 - meta-llama/Llama-3.2-3B-Instruct
reflection-llama_3_1-70b LLM-Research/Reflection-Llama-3.1-70B q_proj, k_proj, v_proj reflection transformers>=4.43 - mattshumer/Reflection-Llama-3.1-70B
longwriter-glm4-9b ZhipuAI/LongWriter-glm4-9b query_key_value chatglm4 transformers>=4.42 - THUDM/LongWriter-glm4-9b
longwriter-llama3_1-8b ZhipuAI/LongWriter-llama3.1-8b q_proj, k_proj, v_proj longwriter-llama3 transformers>=4.43 - THUDM/LongWriter-llama3.1-8b
chinese-llama-2-1_3b AI-ModelScope/chinese-llama-2-1.3b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-1.3b
chinese-llama-2-7b AI-ModelScope/chinese-llama-2-7b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b
chinese-llama-2-7b-16k AI-ModelScope/chinese-llama-2-7b-16k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b-16k
chinese-llama-2-7b-64k AI-ModelScope/chinese-llama-2-7b-64k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b-64k
chinese-llama-2-13b AI-ModelScope/chinese-llama-2-13b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-13b
chinese-llama-2-13b-16k AI-ModelScope/chinese-llama-2-13b-16k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-13b-16k
chinese-alpaca-2-1_3b AI-ModelScope/chinese-alpaca-2-1.3b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-1.3b
chinese-alpaca-2-7b AI-ModelScope/chinese-alpaca-2-7b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b
chinese-alpaca-2-7b-16k AI-ModelScope/chinese-alpaca-2-7b-16k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b-16k
chinese-alpaca-2-7b-64k AI-ModelScope/chinese-alpaca-2-7b-64k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b-64k
chinese-alpaca-2-13b AI-ModelScope/chinese-alpaca-2-13b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-13b
chinese-alpaca-2-13b-16k AI-ModelScope/chinese-alpaca-2-13b-16k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-13b-16k
llama-3-chinese-8b ChineseAlpacaGroup/llama-3-chinese-8b q_proj, k_proj, v_proj default-generation - hfl/llama-3-chinese-8b
llama-3-chinese-8b-instruct ChineseAlpacaGroup/llama-3-chinese-8b-instruct q_proj, k_proj, v_proj llama3 - hfl/llama-3-chinese-8b-instruct
atom-7b FlagAlpha/Atom-7B q_proj, k_proj, v_proj default-generation - FlagAlpha/Atom-7B
atom-7b-chat FlagAlpha/Atom-7B-Chat q_proj, k_proj, v_proj atom - FlagAlpha/Atom-7B-Chat
yi-6b 01ai/Yi-6B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B
yi-6b-200k 01ai/Yi-6B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B-200K
yi-6b-chat 01ai/Yi-6B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-6B-Chat
yi-6b-chat-awq 01ai/Yi-6B-Chat-4bits q_proj, k_proj, v_proj chatml autoawq - 01-ai/Yi-6B-Chat-4bits
yi-6b-chat-int8 01ai/Yi-6B-Chat-8bits q_proj, k_proj, v_proj chatml auto_gptq - 01-ai/Yi-6B-Chat-8bits
yi-9b 01ai/Yi-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B
yi-9b-200k 01ai/Yi-9B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B-200K
yi-34b 01ai/Yi-34B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B
yi-34b-200k 01ai/Yi-34B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B-200K
yi-34b-chat 01ai/Yi-34B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-34B-Chat
yi-34b-chat-awq 01ai/Yi-34B-Chat-4bits q_proj, k_proj, v_proj chatml autoawq - 01-ai/Yi-34B-Chat-4bits
yi-34b-chat-int8 01ai/Yi-34B-Chat-8bits q_proj, k_proj, v_proj chatml auto_gptq - 01-ai/Yi-34B-Chat-8bits
yi-1_5-6b 01ai/Yi-1.5-6B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-6B
yi-1_5-6b-chat 01ai/Yi-1.5-6B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-6B-Chat
yi-1_5-9b 01ai/Yi-1.5-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-9B
yi-1_5-9b-chat 01ai/Yi-1.5-9B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-9B-Chat
yi-1_5-9b-chat-16k 01ai/Yi-1.5-9B-Chat-16K q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-9B-Chat-16K
yi-1_5-34b 01ai/Yi-1.5-34B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-34B
yi-1_5-34b-chat 01ai/Yi-1.5-34B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-34B-Chat
yi-1_5-34b-chat-16k 01ai/Yi-1.5-34B-Chat-16K q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-34B-Chat-16K
yi-1_5-6b-chat-awq-int4 AI-ModelScope/Yi-1.5-6B-Chat-AWQ q_proj, k_proj, v_proj chatml autoawq - modelscope/Yi-1.5-6B-Chat-AWQ
yi-1_5-6b-chat-gptq-int4 AI-ModelScope/Yi-1.5-6B-Chat-GPTQ q_proj, k_proj, v_proj chatml auto_gptq>=0.5 - modelscope/Yi-1.5-6B-Chat-GPTQ
yi-1_5-9b-chat-awq-int4 AI-ModelScope/Yi-1.5-9B-Chat-AWQ q_proj, k_proj, v_proj chatml autoawq - modelscope/Yi-1.5-9B-Chat-AWQ
yi-1_5-9b-chat-gptq-int4 AI-ModelScope/Yi-1.5-9B-Chat-GPTQ q_proj, k_proj, v_proj chatml auto_gptq>=0.5 - modelscope/Yi-1.5-9B-Chat-GPTQ
yi-1_5-34b-chat-awq-int4 AI-ModelScope/Yi-1.5-34B-Chat-AWQ q_proj, k_proj, v_proj chatml autoawq - modelscope/Yi-1.5-34B-Chat-AWQ
yi-1_5-34b-chat-gptq-int4 AI-ModelScope/Yi-1.5-34B-Chat-GPTQ q_proj, k_proj, v_proj chatml auto_gptq>=0.5 - modelscope/Yi-1.5-34B-Chat-GPTQ
yi-coder-1_5b 01ai/Yi-Coder-1.5B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-Coder-1.5B
yi-coder-1_5b-chat 01ai/Yi-Coder-1.5B-Chat q_proj, k_proj, v_proj yi-coder - 01-ai/Yi-Coder-1.5B-Chat
yi-coder-9b 01ai/Yi-Coder-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-Coder-9B
yi-coder-9b-chat 01ai/Yi-Coder-9B-Chat q_proj, k_proj, v_proj yi-coder - 01-ai/Yi-Coder-9B-Chat
internlm-7b Shanghai_AI_Laboratory/internlm-7b q_proj, k_proj, v_proj default-generation - internlm/internlm-7b
internlm-7b-chat Shanghai_AI_Laboratory/internlm-chat-7b q_proj, k_proj, v_proj internlm - internlm/internlm-chat-7b
internlm-7b-chat-8k Shanghai_AI_Laboratory/internlm-chat-7b-8k q_proj, k_proj, v_proj internlm - -
internlm-20b Shanghai_AI_Laboratory/internlm-20b q_proj, k_proj, v_proj default-generation - internlm/internlm-20b
internlm-20b-chat Shanghai_AI_Laboratory/internlm-chat-20b q_proj, k_proj, v_proj internlm - internlm/internlm-chat-20b
internlm2-1_8b Shanghai_AI_Laboratory/internlm2-1_8b wqkv default-generation transformers>=4.38 - internlm/internlm2-1_8b
internlm2-1_8b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-1_8b-sft
internlm2-1_8b-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-1_8b
internlm2-7b-base Shanghai_AI_Laboratory/internlm2-base-7b wqkv default-generation transformers>=4.38 - internlm/internlm2-base-7b
internlm2-7b Shanghai_AI_Laboratory/internlm2-7b wqkv default-generation transformers>=4.38 - internlm/internlm2-7b
internlm2-7b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-7b-sft wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-7b-sft
internlm2-7b-chat Shanghai_AI_Laboratory/internlm2-chat-7b wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-7b
internlm2-20b-base Shanghai_AI_Laboratory/internlm2-base-20b wqkv default-generation transformers>=4.38 - internlm/internlm2-base-20b
internlm2-20b Shanghai_AI_Laboratory/internlm2-20b wqkv default-generation transformers>=4.38 - internlm/internlm2-20b
internlm2-20b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-20b-sft wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-20b-sft
internlm2-20b-chat Shanghai_AI_Laboratory/internlm2-chat-20b wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-20b
internlm2_5-1_8b Shanghai_AI_Laboratory/internlm2_5-1_8b wqkv default-generation transformers>=4.38 - internlm/internlm2_5-1_8b
internlm2_5-1_8b-chat Shanghai_AI_Laboratory/internlm2_5-1_8b-chat wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-1_8b-chat
internlm2_5-7b Shanghai_AI_Laboratory/internlm2_5-7b wqkv default-generation transformers>=4.38 - internlm/internlm2_5-7b
internlm2_5-7b-chat Shanghai_AI_Laboratory/internlm2_5-7b-chat wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-7b-chat
internlm2_5-7b-chat-1m Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-7b-chat-1m
internlm2_5-20b Shanghai_AI_Laboratory/internlm2_5-20b wqkv default-generation transformers>=4.38 - internlm/internlm2_5-20b
internlm2_5-20b-chat Shanghai_AI_Laboratory/internlm2_5-20b-chat wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-20b-chat
internlm2-math-7b Shanghai_AI_Laboratory/internlm2-math-base-7b wqkv default-generation transformers>=4.38 math internlm/internlm2-math-base-7b
internlm2-math-7b-chat Shanghai_AI_Laboratory/internlm2-math-7b wqkv internlm2 transformers>=4.38 math internlm/internlm2-math-7b
internlm2-math-20b Shanghai_AI_Laboratory/internlm2-math-base-20b wqkv default-generation transformers>=4.38 math internlm/internlm2-math-base-20b
internlm2-math-20b-chat Shanghai_AI_Laboratory/internlm2-math-20b wqkv internlm2 transformers>=4.38 math internlm/internlm2-math-20b
deepseek-7b deepseek-ai/deepseek-llm-7b-base q_proj, k_proj, v_proj default-generation - deepseek-ai/deepseek-llm-7b-base
deepseek-7b-chat deepseek-ai/deepseek-llm-7b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-7b-chat
deepseek-moe-16b deepseek-ai/deepseek-moe-16b-base q_proj, k_proj, v_proj default-generation moe deepseek-ai/deepseek-moe-16b-base
deepseek-moe-16b-chat deepseek-ai/deepseek-moe-16b-chat q_proj, k_proj, v_proj deepseek moe deepseek-ai/deepseek-moe-16b-chat
deepseek-67b deepseek-ai/deepseek-llm-67b-base q_proj, k_proj, v_proj default-generation - deepseek-ai/deepseek-llm-67b-base
deepseek-67b-chat deepseek-ai/deepseek-llm-67b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-67b-chat
deepseek-coder-1_3b deepseek-ai/deepseek-coder-1.3b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-1.3b-base
deepseek-coder-1_3b-instruct deepseek-ai/deepseek-coder-1.3b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-1.3b-instruct
deepseek-coder-6_7b deepseek-ai/deepseek-coder-6.7b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-6.7b-base
deepseek-coder-6_7b-instruct deepseek-ai/deepseek-coder-6.7b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-6.7b-instruct
deepseek-coder-33b deepseek-ai/deepseek-coder-33b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-33b-base
deepseek-coder-33b-instruct deepseek-ai/deepseek-coder-33b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-33b-instruct
deepseek-coder-v2-instruct deepseek-ai/DeepSeek-Coder-V2-Instruct q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-coder-v2-lite-instruct deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
deepseek-coder-v2 deepseek-ai/DeepSeek-Coder-V2-Base q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Base
deepseek-coder-v2-lite deepseek-ai/DeepSeek-Coder-V2-Lite-Base q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Lite-Base
deepseek-math-7b deepseek-ai/deepseek-math-7b-base q_proj, k_proj, v_proj default-generation math deepseek-ai/deepseek-math-7b-base
deepseek-math-7b-instruct deepseek-ai/deepseek-math-7b-instruct q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-instruct
deepseek-math-7b-chat deepseek-ai/deepseek-math-7b-rl q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-rl
numina-math-7b AI-ModelScope/NuminaMath-7B-TIR q_proj, k_proj, v_proj numina-math math AI-MO/NuminaMath-7B-TIR
deepseek-v2 deepseek-ai/DeepSeek-V2 q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2
deepseek-v2-chat deepseek-ai/DeepSeek-V2-Chat q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2-Chat
deepseek-v2-lite deepseek-ai/DeepSeek-V2-Lite q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2-Lite
deepseek-v2-lite-chat deepseek-ai/DeepSeek-V2-Lite-Chat q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2-Lite-Chat
deepseek-v2_5 deepseek-ai/DeepSeek-V2.5 q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2_5 transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2.5
gemma-2b AI-ModelScope/gemma-2b q_proj, k_proj, v_proj default-generation transformers>=4.38 - google/gemma-2b
gemma-7b AI-ModelScope/gemma-7b q_proj, k_proj, v_proj default-generation transformers>=4.38 - google/gemma-7b
gemma-2b-instruct AI-ModelScope/gemma-2b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-2b-it
gemma-7b-instruct AI-ModelScope/gemma-7b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-7b-it
gemma2-2b LLM-Research/gemma-2-2b q_proj, k_proj, v_proj default-generation transformers>=4.42 - google/gemma-2-2b
gemma2-9b LLM-Research/gemma-2-9b q_proj, k_proj, v_proj default-generation transformers>=4.42 - google/gemma-2-9b
gemma2-27b LLM-Research/gemma-2-27b q_proj, k_proj, v_proj default-generation transformers>=4.42 - google/gemma-2-27b
gemma2-2b-instruct LLM-Research/gemma-2-2b-it q_proj, k_proj, v_proj gemma transformers>=4.42 - google/gemma-2-2b-it
gemma2-9b-instruct LLM-Research/gemma-2-9b-it q_proj, k_proj, v_proj gemma transformers>=4.42 - google/gemma-2-9b-it
gemma2-27b-instruct LLM-Research/gemma-2-27b-it q_proj, k_proj, v_proj gemma transformers>=4.42 - google/gemma-2-27b-it
minicpm-1b-sft-chat OpenBMB/MiniCPM-1B-sft-bf16 q_proj, k_proj, v_proj minicpm transformers>=4.36.0 - openbmb/MiniCPM-1B-sft-bf16
minicpm-2b-sft-chat OpenBMB/MiniCPM-2B-sft-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-sft-fp32
minicpm-2b-chat OpenBMB/MiniCPM-2B-dpo-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-dpo-fp32
minicpm-2b-128k OpenBMB/MiniCPM-2B-128k q_proj, k_proj, v_proj chatml transformers>=4.36.0 - openbmb/MiniCPM-2B-128k
minicpm-moe-8x2b OpenBMB/MiniCPM-MoE-8x2B q_proj, k_proj, v_proj minicpm transformers>=4.36.0 moe openbmb/MiniCPM-MoE-8x2B
minicpm3-4b OpenBMB/MiniCPM3-4B q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj chatml transformers>=4.36 - openbmb/MiniCPM3-4B
openbuddy-llama-65b-chat OpenBuddy/openbuddy-llama-65b-v8-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama-65b-v8-bf16
openbuddy-llama2-13b-chat OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
openbuddy-llama2-70b-chat OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
openbuddy-llama3-8b-chat OpenBuddy/openbuddy-llama3-8b-v21.1-8k q_proj, k_proj, v_proj openbuddy2 - OpenBuddy/openbuddy-llama3-8b-v21.1-8k
openbuddy-llama3-70b-chat OpenBuddy/openbuddy-llama3-70b-v21.1-8k q_proj, k_proj, v_proj openbuddy2 - OpenBuddy/openbuddy-llama3-70b-v21.1-8k
openbuddy-mistral-7b-chat OpenBuddy/openbuddy-mistral-7b-v17.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-mistral-7b-v17.1-32k
openbuddy-zephyr-7b-chat OpenBuddy/openbuddy-zephyr-7b-v14.1 q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-zephyr-7b-v14.1
openbuddy-deepseek-67b-chat OpenBuddy/openbuddy-deepseek-67b-v15.2 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-deepseek-67b-v15.2
openbuddy-mixtral-moe-7b-chat OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.36 moe OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k
openbuddy-llama3_1-8b-chat OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k q_proj, k_proj, v_proj openbuddy2 transformers>=4.43 - OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
mistral-7b AI-ModelScope/Mistral-7B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.34 - mistralai/Mistral-7B-v0.1
mistral-7b-v2 AI-ModelScope/Mistral-7B-v0.2-hf q_proj, k_proj, v_proj default-generation transformers>=4.34 - alpindale/Mistral-7B-v0.2-hf
mistral-7b-instruct AI-ModelScope/Mistral-7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.1
mistral-7b-instruct-v2 AI-ModelScope/Mistral-7B-Instruct-v0.2 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.2
mistral-7b-instruct-v3 LLM-Research/Mistral-7B-Instruct-v0.3 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.3
mistral-nemo-base-2407 AI-ModelScope/Mistral-Nemo-Base-2407 q_proj, k_proj, v_proj default-generation transformers>=4.43 - mistralai/Mistral-Nemo-Base-2407
mistral-nemo-instruct-2407 AI-ModelScope/Mistral-Nemo-Instruct-2407 q_proj, k_proj, v_proj mistral-nemo transformers>=4.43 - mistralai/Mistral-Nemo-Instruct-2407
mistral-large-instruct-2407 LLM-Research/Mistral-Large-Instruct-2407 q_proj, k_proj, v_proj mistral-nemo transformers>=4.43 - mistralai/Mistral-Large-Instruct-2407
mistral-small-instruct-2409 AI-ModelScope/Mistral-Small-Instruct-2409 q_proj, k_proj, v_proj mistral-nemo transformers>=4.43 - mistralai/Mistral-Small-Instruct-2409
mixtral-moe-7b AI-ModelScope/Mixtral-8x7B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.36 moe mistralai/Mixtral-8x7B-v0.1
mixtral-moe-7b-instruct AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.36 moe mistralai/Mixtral-8x7B-Instruct-v0.1
mixtral-moe-7b-aqlm-2bit-1x16 AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation transformers>=4.38, aqlm, torch>=2.2.0 moe ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf
mixtral-moe-8x22b-v1 AI-ModelScope/Mixtral-8x22B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.36 moe mistral-community/Mixtral-8x22B-v0.1
wizardlm2-7b-awq AI-ModelScope/WizardLM-2-7B-AWQ q_proj, k_proj, v_proj wizardlm2-awq transformers>=4.34 - MaziyarPanahi/WizardLM-2-7B-AWQ
wizardlm2-8x22b AI-ModelScope/WizardLM-2-8x22B q_proj, k_proj, v_proj wizardlm2 transformers>=4.36 - alpindale/WizardLM-2-8x22B
baichuan-7b baichuan-inc/baichuan-7B W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-7B
baichuan-13b baichuan-inc/Baichuan-13B-Base W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-13B-Base
baichuan-13b-chat baichuan-inc/Baichuan-13B-Chat W_pack baichuan transformers<4.34 - baichuan-inc/Baichuan-13B-Chat
baichuan2-7b baichuan-inc/Baichuan2-7B-Base W_pack default-generation - baichuan-inc/Baichuan2-7B-Base
baichuan2-7b-chat baichuan-inc/Baichuan2-7B-Chat W_pack baichuan - baichuan-inc/Baichuan2-7B-Chat
baichuan2-7b-chat-int4 baichuan-inc/Baichuan2-7B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-7B-Chat-4bits
baichuan2-13b baichuan-inc/Baichuan2-13B-Base W_pack default-generation - baichuan-inc/Baichuan2-13B-Base
baichuan2-13b-chat baichuan-inc/Baichuan2-13B-Chat W_pack baichuan - baichuan-inc/Baichuan2-13B-Chat
baichuan2-13b-chat-int4 baichuan-inc/Baichuan2-13B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-13B-Chat-4bits
yuan2-2b-instruct YuanLLM/Yuan2.0-2B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-hf
yuan2-2b-janus-instruct YuanLLM/Yuan2-2B-Janus-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-Janus-hf
yuan2-51b-instruct YuanLLM/Yuan2.0-51B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-51B-hf
yuan2-102b-instruct YuanLLM/Yuan2.0-102B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-102B-hf
yuan2-m32 YuanLLM/Yuan2-M32-hf q_proj, k_proj, v_proj yuan moe IEITYuan/Yuan2-M32-hf
xverse-7b xverse/XVERSE-7B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-7B
xverse-7b-chat xverse/XVERSE-7B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-7B-Chat
xverse-13b xverse/XVERSE-13B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B
xverse-13b-chat xverse/XVERSE-13B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-13B-Chat
xverse-65b xverse/XVERSE-65B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B
xverse-65b-v2 xverse/XVERSE-65B-2 q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B-2
xverse-65b-chat xverse/XVERSE-65B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-65B-Chat
xverse-13b-256k xverse/XVERSE-13B-256K q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B-256K
xverse-moe-a4_2b xverse/XVERSE-MoE-A4.2B q_proj, k_proj, v_proj default-generation moe xverse/XVERSE-MoE-A4.2B
orion-14b OrionStarAI/Orion-14B-Base q_proj, k_proj, v_proj default-generation - OrionStarAI/Orion-14B-Base
orion-14b-chat OrionStarAI/Orion-14B-Chat q_proj, k_proj, v_proj orion - OrionStarAI/Orion-14B-Chat
bluelm-7b vivo-ai/BlueLM-7B-Base q_proj, k_proj, v_proj default-generation - vivo-ai/BlueLM-7B-Base
bluelm-7b-32k vivo-ai/BlueLM-7B-Base-32K q_proj, k_proj, v_proj default-generation - vivo-ai/BlueLM-7B-Base-32K
bluelm-7b-chat vivo-ai/BlueLM-7B-Chat q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat
bluelm-7b-chat-32k vivo-ai/BlueLM-7B-Chat-32K q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat-32K
ziya2-13b Fengshenbang/Ziya2-13B-Base q_proj, k_proj, v_proj default-generation - IDEA-CCNL/Ziya2-13B-Base
ziya2-13b-chat Fengshenbang/Ziya2-13B-Chat q_proj, k_proj, v_proj ziya - IDEA-CCNL/Ziya2-13B-Chat
skywork-13b skywork/Skywork-13B-base q_proj, k_proj, v_proj default-generation - Skywork/Skywork-13B-base
skywork-13b-chat skywork/Skywork-13B-chat q_proj, k_proj, v_proj skywork - -
zephyr-7b-beta-chat modelscope/zephyr-7b-beta q_proj, k_proj, v_proj zephyr transformers>=4.34 - HuggingFaceH4/zephyr-7b-beta
polylm-13b damo/nlp_polylm_13b_text_generation c_attn default-generation - DAMO-NLP-MT/polylm-13b
seqgpt-560m damo/nlp_seqgpt-560m query_key_value default-generation - DAMO-NLP/SeqGPT-560M
sus-34b-chat SUSTC/SUS-Chat-34B q_proj, k_proj, v_proj sus - SUSTech/SUS-Chat-34B
tongyi-finance-14b TongyiFinance/Tongyi-Finance-14B c_attn default-generation financial -
tongyi-finance-14b-chat TongyiFinance/Tongyi-Finance-14B-Chat c_attn qwen financial jxy/Tongyi-Finance-14B-Chat
tongyi-finance-14b-chat-int4 TongyiFinance/Tongyi-Finance-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 financial jxy/Tongyi-Finance-14B-Chat-Int4
codefuse-codellama-34b-chat codefuse-ai/CodeFuse-CodeLlama-34B q_proj, k_proj, v_proj codefuse-codellama coding codefuse-ai/CodeFuse-CodeLlama-34B
codefuse-codegeex2-6b-chat codefuse-ai/CodeFuse-CodeGeeX2-6B query_key_value codefuse transformers<4.34 coding codefuse-ai/CodeFuse-CodeGeeX2-6B
codefuse-qwen-14b-chat codefuse-ai/CodeFuse-QWen-14B c_attn codefuse coding codefuse-ai/CodeFuse-QWen-14B
phi2-3b AI-ModelScope/phi-2 Wqkv default-generation coding microsoft/phi-2
phi3-4b-4k-instruct LLM-Research/Phi-3-mini-4k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-mini-4k-instruct
phi3-4b-128k-instruct LLM-Research/Phi-3-mini-128k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-mini-128k-instruct
phi3-small-8k-instruct LLM-Research/Phi-3-small-8k-instruct query_key_value phi3 transformers>=4.36 - microsoft/Phi-3-small-8k-instruct
phi3-medium-4k-instruct LLM-Research/Phi-3-medium-4k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-medium-4k-instruct
phi3-small-128k-instruct LLM-Research/Phi-3-small-128k-instruct query_key_value phi3 transformers>=4.36 - microsoft/Phi-3-small-128k-instruct
phi3-medium-128k-instruct LLM-Research/Phi-3-medium-128k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-medium-128k-instruct
phi3_5-mini-instruct LLM-Research/Phi-3.5-mini-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3.5-mini-instruct
phi3_5-moe-instruct LLM-Research/Phi-3.5-MoE-instruct q_proj, k_proj, v_proj phi3 transformers>=4.36 moe microsoft/Phi-3.5-MoE-instruct
mamba-130m AI-ModelScope/mamba-130m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-130m-hf
mamba-370m AI-ModelScope/mamba-370m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-370m-hf
mamba-390m AI-ModelScope/mamba-390m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-390m-hf
mamba-790m AI-ModelScope/mamba-790m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-790m-hf
mamba-1.4b AI-ModelScope/mamba-1.4b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-1.4b-hf
mamba-2.8b AI-ModelScope/mamba-2.8b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-2.8b-hf
telechat-7b TeleAI/TeleChat-7B key_value, query telechat - Tele-AI/telechat-7B
telechat-12b TeleAI/TeleChat-12B key_value, query telechat - Tele-AI/TeleChat-12B
telechat-12b-v2 TeleAI/TeleChat-12B-v2 key_value, query telechat - Tele-AI/TeleChat-12B-v2
telechat-12b-v2-gptq-int4 swift/TeleChat-12B-V2-GPTQ-Int4 key_value, query telechat auto_gptq>=0.5 - -
telechat2-115b TeleAI/TeleChat2-115B key_value, query telechat2 - Tele-AI/TeleChat2-115B
grok-1 colossalai/grok-1-pytorch q_proj, k_proj, v_proj default-generation - hpcai-tech/grok-1
dbrx-instruct AI-ModelScope/dbrx-instruct attn.Wqkv dbrx transformers>=4.36 moe databricks/dbrx-instruct
dbrx-base AI-ModelScope/dbrx-base attn.Wqkv dbrx transformers>=4.36 moe databricks/dbrx-base
mengzi3-13b-base langboat/Mengzi3-13B-Base q_proj, k_proj, v_proj mengzi - Langboat/Mengzi3-13B-Base
c4ai-command-r-v01 AI-ModelScope/c4ai-command-r-v01 q_proj, k_proj, v_proj c4ai transformers>=4.39.1 - CohereForAI/c4ai-command-r-v01
c4ai-command-r-plus AI-ModelScope/c4ai-command-r-plus q_proj, k_proj, v_proj c4ai transformers>4.39 - CohereForAI/c4ai-command-r-plus
codestral-22b swift/Codestral-22B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.34 - mistralai/Codestral-22B-v0.1
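
The Requires column in the table above mixes version constraints and bare package names in a single comma-separated cell, for example `transformers>=4.38, aqlm, torch>=2.2.0` or `bitsandbytes<0.41.2, accelerate<0.26`. The sketch below shows one way such a cell could be checked against a set of installed versions. It is an illustration only and is not how SWIFT itself validates requirements: the helper names (`parse_requires`, `compare`, `unmet`) are hypothetical, the `installed` dictionary is supplied by hand, and versions are compared on their leading numeric components only.

```python
import re
from typing import Dict, List, Tuple

# Comparison operators that appear in the Requires column.
OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

# Optional package name, optional operator, then whatever is left (the version).
ITEM = re.compile(r"^([A-Za-z0-9_\-]*)\s*(>=|<=|==|!=|>|<)?\s*(.*)$")


def version_key(text: str) -> Tuple[int, ...]:
    """Keep only the leading numeric components: '4.45.dev.0' -> (4, 45)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def compare(installed: str, op: str, required: str) -> bool:
    """Compare two versions after zero-padding them to the same length."""
    a, b = version_key(installed), version_key(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return OPS[op](a, b)


def parse_requires(cell: str) -> Dict[str, List[Tuple[str, str]]]:
    """Split a Requires cell into {package: [(operator, version), ...]}.

    A comma-separated item without a package name (such as '<4.42') is
    attached to the most recently named package.
    """
    result: Dict[str, List[Tuple[str, str]]] = {}
    current = None
    for item in (part.strip() for part in cell.split(",")):
        if not item:
            continue
        name, op, ver = ITEM.match(item).groups()
        if name:
            current = name
            result.setdefault(current, [])
        if op and current is not None:
            result[current].append((op, ver))
    return result


def unmet(cell: str, installed: Dict[str, str]) -> List[str]:
    """Describe every constraint in `cell` that `installed` does not meet."""
    problems = []
    for name, specs in parse_requires(cell).items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name} is not installed")
            continue
        for op, ver in specs:
            if not compare(have, op, ver):
                problems.append(f"{name} {have} does not satisfy {op}{ver}")
    return problems


if __name__ == "__main__":
    # Hand-written example environment; replace with real versions as needed.
    env = {"transformers": "4.44.0", "timm": "1.0.0"}
    print(unmet("transformers>=4.36,<4.42, timm", env))
    print(unmet("transformers>=4.38, aqlm, torch>=2.2.0", env))
```

For real environments, a dedicated version-parsing library is a safer choice than the simplified numeric comparison used here, which ignores pre-release and development suffixes.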

MLLM
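
Many rows in the table below give Default Lora Target Modules as a regular expression rather than a list of layer names, for example `^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*`. The pattern has two parts: a required prefix that selects the submodules to adapt, and a negative lookahead that rejects any name containing `lm_head`, `output`, `emb`, `wte`, or `shared`. The sketch below applies that pattern to a few module names. The names are hypothetical and chosen only for illustration, and the use of `re.fullmatch` is an assumption about how the pattern is applied to each module name.

```python
import re

# Pattern copied from the Default Lora Target Modules column.
PATTERN = r"^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical module names, not taken from any specific checkpoint.
MODULE_NAMES = [
    "language_model.model.layers.0.self_attn.q_proj",
    "language_model.model.embed_tokens",
    "language_model.lm_head",
    "multi_modal_projector.linear_1",
    "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj",
]


def select_targets(pattern, names):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


if __name__ == "__main__":
    for name in select_targets(PATTERN, MODULE_NAMES):
        print(name)
```

With these sample names, the language-model attention projection and the projector layer are selected. The embedding and `lm_head` entries are rejected by the lookahead, and the vision tower is rejected because it lacks the required prefix. Note that the `.` in prefixes such as `transformer.h` is not escaped in the table, so it matches any single character, not only a literal dot.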

Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support vLLM Support LMDeploy Support Megatron Requires Tags HF Model ID
qwen-vl qwen/Qwen-VL ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-vl-generation vision Qwen/Qwen-VL
qwen-vl-chat qwen/Qwen-VL-Chat ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-vl vision Qwen/Qwen-VL-Chat
qwen-vl-chat-int4 qwen/Qwen-VL-Chat-Int4 ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-vl auto_gptq>=0.5 vision Qwen/Qwen-VL-Chat-Int4
qwen-audio qwen/Qwen-Audio ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-audio-generation audio Qwen/Qwen-Audio
qwen-audio-chat qwen/Qwen-Audio-Chat ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-audio audio Qwen/Qwen-Audio-Chat
qwen2-audio-7b qwen/Qwen2-Audio-7B ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-audio-generation librosa, transformers>=4.45 audio Qwen/Qwen2-Audio-7B
qwen2-audio-7b-instruct qwen/Qwen2-Audio-7B-Instruct ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-audio librosa, transformers>=4.45 audio Qwen/Qwen2-Audio-7B-Instruct
qwen2-vl-2b qwen/Qwen2-VL-2B ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl-generation transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-2B
qwen2-vl-2b-instruct qwen/Qwen2-VL-2B-Instruct ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-2B-Instruct
qwen2-vl-2b-instruct-gptq-int4 qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4
qwen2-vl-2b-instruct-gptq-int8 qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
qwen2-vl-2b-instruct-awq qwen/Qwen2-VL-2B-Instruct-AWQ ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, autoawq vision, video Qwen/Qwen2-VL-2B-Instruct-AWQ
qwen2-vl-7b qwen/Qwen2-VL-7B ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl-generation transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-7B
qwen2-vl-7b-instruct qwen/Qwen2-VL-7B-Instruct ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-7B-Instruct
qwen2-vl-7b-instruct-gptq-int4 qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4
qwen2-vl-7b-instruct-gptq-int8 qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8
qwen2-vl-7b-instruct-awq qwen/Qwen2-VL-7B-Instruct-AWQ ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, autoawq vision, video Qwen/Qwen2-VL-7B-Instruct-AWQ
qwen2-vl-72b qwen/Qwen2-VL-72B ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl-generation transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-72B
qwen2-vl-72b-instruct qwen/Qwen2-VL-72B-Instruct ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-72B-Instruct
qwen2-vl-72b-instruct-gptq-int4 qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4
qwen2-vl-72b-instruct-gptq-int8 qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8
qwen2-vl-72b-instruct-awq qwen/Qwen2-VL-72B-Instruct-AWQ ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, autoawq vision, video Qwen/Qwen2-VL-72B-Instruct-AWQ
glm4v-9b-chat ZhipuAI/glm-4v-9b ^(transformer.encoder)(?!.*(lm_head|output|emb|wte|shared)).* glm4v transformers>=4.42 vision THUDM/glm-4v-9b
llama3_2-11b-vision LLM-Research/Llama-3.2-11B-Vision ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision-generation transformers>=4.45 vision meta-llama/Llama-3.2-11B-Vision
llama3_2-11b-vision-instruct LLM-Research/Llama-3.2-11B-Vision-Instruct ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision transformers>=4.45 vision meta-llama/Llama-3.2-11B-Vision-Instruct
llama3_2-90b-vision LLM-Research/Llama-3.2-90B-Vision ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision-generation transformers>=4.45 vision meta-llama/Llama-3.2-90B-Vision
llama3_2-90b-vision-instruct LLM-Research/Llama-3.2-90B-Vision-Instruct ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision transformers>=4.45 vision meta-llama/Llama-3.2-90B-Vision-Instruct
llama3_1-8b-omni ICTNLP/Llama-3.1-8B-Omni ^(model.layers|model.speech_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_1-omni whisper, openai-whisper audio ICTNLP/Llama-3.1-8B-Omni
idefics3-8b-llama3 AI-ModelScope/Idefics3-8B-Llama3 ^(model.text_model|model.connector)(?!.*(lm_head|output|emb|wte|shared)).* idefics3 transformers>=4.45 vision HuggingFaceM4/Idefics3-8B-Llama3
llava1_5-7b-instruct swift/llava-1.5-7b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava1_5 transformers>=4.36 vision llava-hf/llava-1.5-7b-hf
llava1_5-13b-instruct swift/llava-1.5-13b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava1_5 transformers>=4.36 vision llava-hf/llava-1.5-13b-hf
llava1_6-mistral-7b-instruct swift/llava-v1.6-mistral-7b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-mistral transformers>=4.39 vision llava-hf/llava-v1.6-mistral-7b-hf
llava1_6-vicuna-7b-instruct swift/llava-v1.6-vicuna-7b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-vicuna transformers>=4.39 vision llava-hf/llava-v1.6-vicuna-7b-hf
llava1_6-vicuna-13b-instruct swift/llava-v1.6-vicuna-13b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-vicuna transformers>=4.39 vision llava-hf/llava-v1.6-vicuna-13b-hf
llava1_6-llama3_1-8b-instruct DaozeZhang/llava-llama3.1-8b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-llama3 transformers>=4.41 vision -
llava1_6-yi-34b-instruct swift/llava-v1.6-34b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-yi transformers>=4.39 vision llava-hf/llava-v1.6-34b-hf
llama3-llava-next-8b-hf swift/llama3-llava-next-8b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama-llava-next-hf transformers>=4.39 vision llava-hf/llama3-llava-next-8b-hf
llava-next-72b-hf AI-ModelScope/llava-next-72b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama-qwen-hf transformers>=4.39 vision llava-hf/llava-next-72b-hf
llava-next-110b-hf AI-ModelScope/llava-next-110b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama-qwen-hf transformers>=4.39 vision llava-hf/llava-next-110b-hf
llava-onevision-qwen2-0_5b-ov AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-onevision-qwen transformers>=4.45 vision, video llava-hf/llava-onevision-qwen2-0.5b-ov-hf
llava-onevision-qwen2-7b-ov AI-ModelScope/llava-onevision-qwen2-7b-ov-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-onevision-qwen transformers>=4.45 vision, video llava-hf/llava-onevision-qwen2-7b-ov-hf
llava-onevision-qwen2-72b-ov AI-ModelScope/llava-onevision-qwen2-72b-ov-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-onevision-qwen transformers>=4.45 vision, video llava-hf/llava-onevision-qwen2-72b-ov-hf
llama3-llava-next-8b AI-Modelscope/llama3-llava-next-8b ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3-llava-next vision lmms-lab/llama3-llava-next-8b
llava-next-72b AI-Modelscope/llava-next-72b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-qwen vision lmms-lab/llava-next-72b
llava-next-110b AI-Modelscope/llava-next-110b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-qwen vision lmms-lab/llava-next-110b
llava-next-video-7b-instruct swift/LLaVA-NeXT-Video-7B-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-7B-hf
llava-next-video-7b-32k-instruct swift/LLaVA-NeXT-Video-7B-32K-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-7B-32K-hf
llava-next-video-7b-dpo-instruct swift/LLaVA-NeXT-Video-7B-DPO-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-7B-DPO-hf
llava-next-video-34b-instruct swift/LLaVA-NeXT-Video-34B-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video-yi transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-34B-hf
yi-vl-6b-chat 01ai/Yi-VL-6B ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* yi-vl transformers>=4.34 vision 01-ai/Yi-VL-6B
yi-vl-34b-chat 01ai/Yi-VL-34B ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* yi-vl transformers>=4.34 vision 01-ai/Yi-VL-34B
llava-llama3-8b-v1_1 AI-ModelScope/llava-llama-3-8b-v1_1-transformers ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-llama-instruct transformers>=4.36 vision xtuner/llava-llama-3-8b-v1_1-transformers
internlm-xcomposer2-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2-7b attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 internlm-xcomposer2 vision internlm/internlm-xcomposer2-7b
internlm-xcomposer2-4khd-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 internlm-xcomposer2-4khd vision internlm/internlm-xcomposer2-4khd-7b
internlm-xcomposer2_5-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 internlm-xcomposer2_5 vision internlm/internlm-xcomposer2d5-7b
internvl-chat-v1_5 AI-ModelScope/InternVL-Chat-V1-5 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl transformers>=4.35, timm vision OpenGVLab/InternVL-Chat-V1-5
internvl-chat-v1_5-int8 AI-ModelScope/InternVL-Chat-V1-5-int8 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl transformers>=4.35, timm vision OpenGVLab/InternVL-Chat-V1-5-int8
mini-internvl-chat-2b-v1_5 OpenGVLab/Mini-InternVL-Chat-2B-V1-5 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl transformers>=4.35, timm vision OpenGVLab/Mini-InternVL-Chat-2B-V1-5
mini-internvl-chat-4b-v1_5 OpenGVLab/Mini-InternVL-Chat-4B-V1-5 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl-phi3 transformers>=4.35,<4.42, timm vision OpenGVLab/Mini-InternVL-Chat-4B-V1-5
internvl2-1b OpenGVLab/InternVL2-1B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-1B
internvl2-2b OpenGVLab/InternVL2-2B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-2B
internvl2-4b OpenGVLab/InternVL2-4B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2-phi3 transformers>=4.36,<4.42, timm vision, video OpenGVLab/InternVL2-4B
internvl2-8b OpenGVLab/InternVL2-8B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-8B
internvl2-26b OpenGVLab/InternVL2-26B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-26B
internvl2-40b OpenGVLab/InternVL2-40B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-40B
internvl2-llama3-76b OpenGVLab/InternVL2-Llama3-76B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-Llama3-76B
internvl2-2b-awq OpenGVLab/InternVL2-2B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-2B-AWQ
internvl2-8b-awq OpenGVLab/InternVL2-8B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-8B-AWQ
internvl2-26b-awq OpenGVLab/InternVL2-26B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-26B-AWQ
internvl2-40b-awq OpenGVLab/InternVL2-40B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-40B-AWQ
internvl2-llama3-76b-awq OpenGVLab/InternVL2-Llama3-76B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-Llama3-76B-AWQ
deepseek-vl-1_3b-chat deepseek-ai/deepseek-vl-1.3b-chat ^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* deepseek-vl vision deepseek-ai/deepseek-vl-1.3b-chat
deepseek-vl-7b-chat deepseek-ai/deepseek-vl-7b-chat ^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* deepseek-vl vision deepseek-ai/deepseek-vl-7b-chat
ovis1_6-gemma2-9b AIDC-AI/Ovis1.6-Gemma2-9B ^(llm)(?!.*(lm_head|output|emb|wte|shared)).* ovis1_6 transformers>=4.42 vision AIDC-AI/Ovis1.6-Gemma2-9B
paligemma-3b-pt-224 AI-ModelScope/paligemma-3b-pt-224 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-pt-224
paligemma-3b-pt-448 AI-ModelScope/paligemma-3b-pt-448 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-pt-448
paligemma-3b-pt-896 AI-ModelScope/paligemma-3b-pt-896 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-pt-896
paligemma-3b-mix-224 AI-ModelScope/paligemma-3b-mix-224 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-mix-224
paligemma-3b-mix-448 AI-ModelScope/paligemma-3b-mix-448 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-mix-448
minicpm-v-3b-chat OpenBMB/MiniCPM-V ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v timm, transformers<4.42 vision openbmb/MiniCPM-V
minicpm-v-v2-chat OpenBMB/MiniCPM-V-2 ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v timm, transformers<4.42 vision openbmb/MiniCPM-V-2
minicpm-v-v2_5-chat OpenBMB/MiniCPM-Llama3-V-2_5 ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v-v2_5 timm, transformers>=4.36 vision openbmb/MiniCPM-Llama3-V-2_5
minicpm-v-v2_6-chat OpenBMB/MiniCPM-V-2_6 ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v-v2_6 timm, transformers>=4.36 vision, video openbmb/MiniCPM-V-2_6
pixtral-12b AI-ModelScope/pixtral-12b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* pixtral transformers>=4.45 vision mistral-community/pixtral-12b
mplug-owl2-chat iic/mPLUG-Owl2 q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 mplug-owl2 transformers<4.35, icecream vision MAGAer13/mplug-owl2-llama2-7b
mplug-owl2_1-chat iic/mPLUG-Owl2.1 c_attn.multiway.0, c_attn.multiway.1 mplug-owl2 transformers<4.35, icecream vision Mizukiluke/mplug_owl_2_1
mplug-owl3-1b-chat iic/mPLUG-Owl3-1B-241014 ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* mplug_owl3 transformers>=4.36, icecream vision, video mPLUG/mPLUG-Owl3-1B-241014
mplug-owl3-2b-chat iic/mPLUG-Owl3-2B-241014 ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* mplug_owl3 transformers>=4.36, icecream vision, video mPLUG/mPLUG-Owl3-2B-241014
mplug-owl3-7b-chat iic/mPLUG-Owl3-7B-240728 ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* mplug_owl3 transformers>=4.36, icecream vision, video mPLUG/mPLUG-Owl3-7B-240728
phi3-vision-128k-instruct LLM-Research/Phi-3-vision-128k-instruct ^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* phi3-vl transformers>=4.36 vision microsoft/Phi-3-vision-128k-instruct
phi3_5-vision-instruct LLM-Research/Phi-3.5-vision-instruct ^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* phi3-vl transformers>=4.36 vision microsoft/Phi-3.5-vision-instruct
cogvlm-17b-chat ZhipuAI/cogvlm-chat ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm transformers<4.42 vision THUDM/cogvlm-chat-hf
cogvlm2-19b-chat ZhipuAI/cogvlm2-llama3-chinese-chat-19B ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm transformers<4.42 vision THUDM/cogvlm2-llama3-chinese-chat-19B
cogvlm2-en-19b-chat ZhipuAI/cogvlm2-llama3-chat-19B ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm transformers<4.42 vision THUDM/cogvlm2-llama3-chat-19B
cogvlm2-video-13b-chat ZhipuAI/cogvlm2-video-llama3-chat ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm2-video decord, pytorchvideo, transformers>=4.42 vision, video THUDM/cogvlm2-video-llama3-chat
cogagent-18b-chat ZhipuAI/cogagent-chat ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogagent-chat timm vision THUDM/cogagent-chat-hf
cogagent-18b-instruct ZhipuAI/cogagent-vqa ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogagent-instruct timm vision THUDM/cogagent-vqa-hf
molmoe-1b LLM-Research/MolmoE-1B-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/MolmoE-1B-0924
molmo-7b-o LLM-Research/Molmo-7B-O-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/Molmo-7B-O-0924
molmo-7b-d LLM-Research/Molmo-7B-D-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/Molmo-7B-D-0924
molmo-72b LLM-Research/Molmo-72B-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/Molmo-72B-0924
florence-2-base AI-ModelScope/Florence-2-base ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-base
florence-2-base-ft AI-ModelScope/Florence-2-base-ft ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-base-ft
florence-2-large AI-ModelScope/Florence-2-large ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-large
florence-2-large-ft AI-ModelScope/Florence-2-large-ft ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-large-ft
got-ocr2 stepfun-ai/GOT-OCR2_0 ^(model.layers|model.mm_projector_vary)(?!.*(lm_head|output|emb|wte|shared)).* got_ocr2 audio stepfun-ai/GOT-OCR2_0
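
The regex-style entries in the Default Lora Target Modules column can be read as "start with one of the listed prefixes, and contain none of the excluded fragments (lm_head, output, emb, wte, shared) anywhere after it". The sketch below illustrates this with the internvl pattern from the table. It assumes full-match semantics, which is how PEFT treats a string `target_modules` to the best of our knowledge. The module names and the helper `select_lora_targets` are hypothetical and only serve the illustration.

```python
import re

# Pattern copied from the internvl rows of the table above.
PATTERN = r"^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*"


def select_lora_targets(module_names, pattern=PATTERN):
    """Return the module names that the pattern fully matches.

    Hypothetical helper: it only mimics regex-based target selection
    and is not part of SWIFT or PEFT.
    """
    regex = re.compile(pattern)
    return [name for name in module_names if regex.fullmatch(name)]


# Hypothetical module names, loosely shaped like a vision-language model.
names = [
    "language_model.model.layers.0.self_attn.q_proj",
    "language_model.model.layers.0.mlp.down_proj",
    "language_model.model.embed_tokens",       # excluded: contains "emb"
    "language_model.lm_head",                  # excluded: contains "lm_head"
    "vision_model.encoder.layers.0.attn.qkv",  # excluded: wrong prefix
    "mlp1.1",
]

selected = select_lora_targets(names)
print(selected)
```

Under these assumptions, only the two language-model projection layers and `mlp1.1` are selected. The vision tower, embeddings, and output head are left untouched.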

Datasets

The table below introduces the datasets supported by SWIFT:

  • Dataset Name: The dataset name registered in SWIFT.
  • Dataset ID: The dataset id in ModelScope.
  • Dataset Size: The number of rows in the dataset.
  • Statistic: Token-count statistics for the dataset, reported as mean±std, min, and max, which help when choosing the max_length hyperparameter. The training and validation sets are concatenated before the statistics are computed, and the text is tokenized with qwen's tokenizer. Different tokenizers produce different statistics; to obtain token statistics for another model's tokenizer, you can run the script yourself.
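
As a rough illustration of how a Statistic entry could be derived, the sketch below concatenates a training and a validation split, counts tokens per row, and formats the result in the same `mean±std, min=…, max=…` shape as the table. The function `token_stats` is hypothetical and is not the SWIFT script. The whitespace tokenizer is a stand-in for a real one, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import Callable, List, Sequence


def token_stats(texts: Sequence[str],
                tokenize: Callable[[str], List]) -> str:
    """Format token-length statistics as 'mean±std, min=…, max=…'.

    Hypothetical helper; assumes population standard deviation.
    """
    lengths = [len(tokenize(text)) for text in texts]
    mean = statistics.fmean(lengths)
    std = statistics.pstdev(lengths)
    return f"{mean:.1f}±{std:.1f}, min={min(lengths)}, max={max(lengths)}"


# Toy splits; a real run would load the dataset rows instead.
train_rows = ["hello world", "a b c d"]
val_rows = ["one two three"]

# Concatenate the splits first, then compute the statistics.
# str.split is a stand-in tokenizer used only to keep the sketch runnable.
summary = token_stats(train_rows + val_rows, str.split)
print(summary)  # 3.0±0.8, min=2, max=4
```

To get figures for a specific model, one could pass a callable that wraps that model's tokenizer in place of `str.split`. The resulting mean and max then give a starting point for choosing max_length.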
Dataset Name Dataset ID Subsets Dataset Size Statistic (token) Tags HF Dataset ID
🔥ms-bench iic/ms_bench 316820 346.9±443.2, min=22, max=30960 chat, general, multi-round -
🔥alpaca-en AI-ModelScope/alpaca-gpt4-data-en 52002 176.2±125.8, min=26, max=740 chat, general vicgalle/alpaca-gpt4
🔥alpaca-zh AI-ModelScope/alpaca-gpt4-data-zh 48818 162.1±93.9, min=26, max=856 chat, general llm-wizard/alpaca-gpt4-data-zh
multi-alpaca damo/nlp_polylm_multialpaca_sft ar, de, es, fr, id, ja, ko, pt, ru, th, vi 131867 112.9±50.6, min=26, max=1226 chat, general, multilingual -
instinwild wyj123456/instinwild default, subset 103695 145.4±60.7, min=28, max=1434 - -
cot-en YorickHe/CoT 74771 122.7±64.8, min=51, max=8320 chat, general -
cot-zh YorickHe/CoT_zh 74771 117.5±70.8, min=43, max=9636 chat, general -
instruct-en wyj123456/instruct 888970 269.1±331.5, min=26, max=7254 chat, general -
firefly-zh AI-ModelScope/firefly-train-1.1M 1649399 178.1±260.4, min=26, max=12516 chat, general YeungNLP/firefly-train-1.1M
gpt4all-en wyj123456/GPT4all 806199 302.7±384.5, min=27, max=7391 chat, general -
sharegpt swift/sharegpt common-zh, computer-zh, unknow-zh, common-en, computer-en 96566 933.3±864.8, min=21, max=66412 chat, general, multi-round -
tulu-v2-sft-mixture AI-ModelScope/tulu-v2-sft-mixture 5119 520.7±437.6, min=68, max=2549 chat, multilingual, general, multi-round allenai/tulu-v2-sft-mixture
wikipedia-zh AI-ModelScope/wikipedia-cn-20230720-filtered 254547 568.4±713.2, min=37, max=78678 text-generation, general, pretrained pleisto/wikipedia-cn-20230720-filtered
open-orca AI-ModelScope/OpenOrca 994896 382.3±417.4, min=31, max=8740 chat, multilingual, general -
🔥sharegpt-gpt4 AI-ModelScope/sharegpt_gpt4 default, V3_format, zh_38K_format 72684 1047.6±1313.1, min=22, max=66412 chat, multilingual, general, multi-round, gpt4 -
deepctrl-sft AI-ModelScope/deepctrl-sft-data default, en 14149024 389.8±628.6, min=21, max=626237 chat, general, sft, multi-round -
🔥coig-cqia AI-ModelScope/COIG-CQIA chinese_traditional, coig_pc, exam, finance, douban, human_value, logi_qa, ruozhiba, segmentfault, wiki, wikihow, xhs, zhihu 44694 703.8±654.2, min=33, max=19288 general -
🔥ruozhiba AI-ModelScope/ruozhiba post-annual, title-good, title-norm 85658 39.9±13.1, min=21, max=559 pretrain -
long-alpaca-12k AI-ModelScope/LongAlpaca-12k 11998 9619.0±8295.8, min=36, max=78925 longlora, QA Yukang/LongAlpaca-12k
lmsys-chat-1m AI-ModelScope/lmsys-chat-1m - Dataset is too huge, please click the original link to view the dataset stat. chat, em lmsys/lmsys-chat-1m
🔥ms-agent iic/ms_agent 26336 650.9±217.2, min=209, max=2740 chat, agent, multi-round -
🔥ms-agent-for-agentfabric AI-ModelScope/ms_agent_for_agentfabric default, addition 30000 617.8±199.1, min=251, max=2657 chat, agent, multi-round -
ms-agent-multirole iic/MSAgent-MultiRole 9500 447.6±84.9, min=145, max=1101 chat, agent, multi-round, role-play, multi-agent -
🔥toolbench-for-alpha-umi shenweizhou/alpha-umi-toolbench-processed-v2 backbone, caller, planner, summarizer 1448337 1439.7±853.9, min=123, max=18467 chat, agent -
damo-agent-zh damo/MSAgent-Bench 386984 956.5±407.3, min=326, max=19001 chat, agent, multi-round -
damo-agent-zh-mini damo/MSAgent-Bench 20845 1326.4±329.6, min=571, max=4304 chat, agent, multi-round -
agent-instruct-all-en huangjintao/AgentInstruct_copy alfworld, db, kg, mind2web, os, webshop 1866 1144.3±635.5, min=206, max=6412 chat, agent, multi-round -
🔥msagent-pro iic/MSAgent-Pro 21905 1524.5±921.3, min=64, max=16770 chat, agent, multi-round -
toolbench swift/ToolBench 124345 3669.5±1600.9, min=1047, max=22581 chat, agent, multi-round -
code-alpaca-en wyj123456/code_alpaca_en 20016 100.2±60.1, min=29, max=1776 - sahil2801/CodeAlpaca-20k
🔥leetcode-python-en AI-ModelScope/leetcode-solutions-python 2359 727.1±235.9, min=259, max=2146 chat, coding -
🔥codefuse-python-en codefuse-ai/CodeExercise-Python-27k 27224 483.6±193.9, min=45, max=3082 chat, coding -
🔥codefuse-evol-instruction-zh codefuse-ai/Evol-instruction-66k 66862 439.6±206.3, min=37, max=2983 chat, coding -
medical-en swift/medical_zh en 117617 257.4±89.1, min=36, max=2564 chat, medical -
medical-zh swift/medical_zh zh 1950972 167.2±219.7, min=26, max=27351 chat, medical -
🔥disc-med-sft-zh AI-ModelScope/DISC-Med-SFT 441767 354.1±193.1, min=25, max=2231 chat, medical Flmc/DISC-Med-SFT
lawyer-llama-zh AI-ModelScope/lawyer_llama_data 21476 194.4±91.7, min=27, max=924 chat, law Skepsun/lawyer_llama_data
tigerbot-law-zh AI-ModelScope/tigerbot-law-plugin 55895 109.9±126.4, min=37, max=18878 text-generation, law, pretrained TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh AI-ModelScope/DISC-Law-SFT 166758 533.7±495.4, min=30, max=15169 chat, law ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh AI-ModelScope/blossom-math-v2 10000 169.3±58.7, min=35, max=563 chat, math Azure99/blossom-math-v2
school-math-zh AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 chat, math, quality BelleGroup/school_math_0.25M
open-platypus-en AI-ModelScope/Open-Platypus 24926 367.9±254.8, min=30, max=3951 chat, math, quality garage-bAInd/Open-Platypus
text2sql-en AI-ModelScope/texttosqlv2_25000_v2 25000 274.6±326.4, min=38, max=1975 chat, sql Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en AI-ModelScope/sql-create-context 78577 80.2±17.8, min=36, max=456 chat, sql b-mc2/sql-create-context
synthetic-text-to-sql AI-ModelScope/synthetic_text_to_sql default 100000 283.4±115.8, min=61, max=1356 nl2sql, en gretelai/synthetic_text_to_sql
🔥advertise-gen-zh lvjianjin/AdvertiseGen 98399 130.6±21.7, min=51, max=241 text-generation shibing624/AdvertiseGen
🔥dureader-robust-zh modelscope/DuReader_robust-QG 17899 241.1±137.4, min=60, max=1416 text-generation -
cmnli-zh modelscope/clue cmnli 404024 82.6±16.6, min=51, max=199 text-generation, classification clue
🔥jd-sentiment-zh DAMO_NLP/jd 50000 66.0±83.2, min=39, max=4039 text-generation, classification -
🔥hc3-zh simpleai/HC3-Chinese baike, open_qa, nlpcc_dbqa, finance, medicine, law, psychology 39781 176.8±81.5, min=57, max=3051 text-generation, classification Hello-SimpleAI/HC3-Chinese
🔥hc3-en simpleai/HC3 finance, medicine 11021 298.3±138.7, min=65, max=2267 text-generation, classification Hello-SimpleAI/HC3
dolly-15k AI-ModelScope/databricks-dolly-15k default 15011 199.2±267.8, min=22, max=8615 multi-task, en, quality databricks/databricks-dolly-15k
zhihu-kol OmniData/Zhihu-KOL default - Dataset is too huge, please click the original link to view the dataset stat. zhihu, qa wangrui6/Zhihu-KOL
zhihu-kol-filtered OmniData/Zhihu-KOL-More-Than-100-Upvotes default 271261 952.0±1727.2, min=25, max=98658 zhihu, qa bzb2023/Zhihu-KOL-More-Than-100-Upvotes
finance-en wyj123456/finance_en 68911 135.6±134.3, min=26, max=3525 chat, financial ssbuild/alpaca_finance_en
poetry-zh modelscope/chinese-poetry-collection 390309 55.2±9.4, min=23, max=83 text-generation, poetry -
webnovel-zh AI-ModelScope/webnovel_cn 50000 1478.9±11526.1, min=100, max=490484 chat, novel zxbsmk/webnovel_cn
generated-chat-zh AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 chat, character-dialogue BelleGroup/generated_chat_0.4M
🔥self-cognition swift/self-cognition 134 53.6±18.6, min=29, max=121 chat, self-cognition modelscope/self-cognition
🔥swift-mix swift/swift-sft-mixture sharegpt, firefly, codefuse, metamathqa - Dataset is too huge, please click the original link to view the dataset stat. chat, sft, general -
cls-fudan-news-zh damo/zh_cls_fudan-news 4959 3234.4±2547.5, min=91, max=19548 chat, classification -
ner-jave-zh damo/zh_ner-JAVE 1266 118.3±45.5, min=44, max=223 chat, ner -
coco-en modelscope/coco_2014_caption coco_2014_caption 454617 299.8±2.8, min=295, max=352 chat, multi-modal, vision -
🔥coco-en-mini modelscope/coco_2014_caption coco_2014_caption 40504 299.8±2.6, min=295, max=338 chat, multi-modal, vision -
coco-en-2 modelscope/coco_2014_caption coco_2014_caption 454617 36.8±2.8, min=32, max=89 chat, multi-modal, vision -
🔥coco-en-2-mini modelscope/coco_2014_caption coco_2014_caption 40504 36.8±2.6, min=32, max=75 chat, multi-modal, vision -
capcha-images AI-ModelScope/captcha-images 8000 31.0±0.0, min=31, max=31 chat, multi-modal, vision -
latex-ocr-print AI-ModelScope/LaTeX_OCR full 17918 362.7±34.8, min=294, max=528 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
latex-ocr-handwrite AI-ModelScope/LaTeX_OCR synthetic_handwrite 95424 375.1±59.4, min=292, max=2115 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
aishell1-zh speech_asr/speech_asr_aishell1_trainsets 141600 152.2±36.8, min=63, max=419 chat, multi-modal, audio -
🔥aishell1-zh-mini speech_asr/speech_asr_aishell1_trainsets 14526 152.2±35.6, min=74, max=359 chat, multi-modal, audio -
🔥video-chatgpt swift/VideoChatGPT Generic, Temporal, Consistency 3206 88.4±48.3, min=32, max=399 chat, multi-modal, video lmms-lab/VideoChatGPT
egoschema AI-ModelScope/egoschema Subset 101 191.6±80.7, min=96, max=435 chat, multi-modal, video lmms-lab/egoschema
hh-rlhf AI-ModelScope/hh-rlhf harmless-base, helpful-base, helpful-online, helpful-rejection-sampled 127459 245.4±190.7, min=22, max=1999 rlhf, dpo, pairwise -
🔥hh-rlhf-cn AI-ModelScope/hh_rlhf_cn hh_rlhf, harmless_base_cn, harmless_base_en, helpful_base_cn, helpful_base_en 355920 171.2±122.7, min=22, max=3078 rlhf, dpo, pairwise -
orpo-dpo-mix-40k AI-ModelScope/orpo-dpo-mix-40k default 43666 548.3±397.4, min=28, max=8483 dpo, orpo, en, quality mlabonne/orpo-dpo-mix-40k
stack-exchange-paired AI-ModelScope/stack-exchange-paired 4483004 534.5±594.6, min=31, max=56588 hfrl, dpo, pairwise lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji hjh0119/shareAI-Llama3-DPO-zh-en-emoji default 2449 334.0±162.8, min=36, max=1801 rlhf, dpo, pairwise -
ultrafeedback-kto AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto default 230720 11.0±0.0, min=11, max=11 rlhf, kto -
rlaif-v swift/RLAIF-V-Dataset default 83132 119.8±52.6, min=28, max=556 rlhf, dpo, multi-modal, en openbmb/RLAIF-V-Dataset
pileval swift/pile-val-backup 214670 1612.3±8856.2, min=11, max=1208955 text-generation, awq mit-han-lab/pile-val-backup
mantis-instruct swift/Mantis-Instruct birds-to-words, chartqa, coinstruct, contrastive_caption, docvqa, dreamsim, dvqa, iconqa, imagecode, llava_665k_multi, lrv_multi, multi_vqa, nextqa, nlvr2, spot-the-diff, star, visual_story_telling 655351 825.7±812.5, min=284, max=13563 chat, multi-modal, vision, quality TIGER-Lab/Mantis-Instruct
llava-data-instruct swift/llava-data llava_instruct 364100 189.0±142.1, min=33, max=5183 sft, multi-modal, quality TIGER-Lab/llava-data
midefics swift/MideficsDataset 3800 201.3±70.2, min=60, max=454 medical, en, vqa WinterSchool/MideficsDataset
gqa None train_all_instructions - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, vqa, quality lmms-lab/GQA
text-caps swift/TextCaps 18145 38.2±4.4, min=31, max=73 multi-modal, en, caption, quality HuggingFaceM4/TextCaps
refcoco-unofficial-caption swift/refcoco 46215 44.7±3.2, min=36, max=71 multi-modal, en, caption jxu124/refcoco
refcoco-unofficial-grounding swift/refcoco 46215 45.2±3.1, min=37, max=69 multi-modal, en, grounding jxu124/refcoco
refcocog-unofficial-caption swift/refcocog 44799 49.7±4.7, min=37, max=88 multi-modal, en, caption jxu124/refcocog
refcocog-unofficial-grounding swift/refcocog 44799 50.1±4.7, min=37, max=90 multi-modal, en, grounding jxu124/refcocog
a-okvqa swift/A-OKVQA 18201 45.8±7.9, min=32, max=100 multi-modal, en, vqa, quality HuggingFaceM4/A-OKVQA
okvqa swift/OK-VQA_train 9009 34.4±3.3, min=28, max=59 multi-modal, en, vqa, quality Multimodal-Fatima/OK-VQA_train
ocr-vqa swift/OCR-VQA 186753 35.6±6.6, min=29, max=193 multi-modal, en, ocr-vqa howard-hou/OCR-VQA
grit swift/GRIT - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, caption-grounding, quality zzliang/GRIT
llava-instruct-mix swift/llava-instruct-mix-vsft 13640 179.8±120.2, min=30, max=962 multi-modal, en, vqa, quality HuggingFaceH4/llava-instruct-mix-vsft
lnqa swift/lnqa - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, ocr-vqa, quality vikhyatk/lnqa
science-qa swift/ScienceQA 8315 100.3±59.5, min=38, max=638 multi-modal, science, vqa, quality derek-thomas/ScienceQA
guanaco AI-ModelScope/GuanacoDataset default 31561 250.1±70.3, min=89, max=1436 chat, zh JosephusCheung/GuanacoDataset
mind2web swift/Multimodal-Mind2Web 1009 297522.4±325496.2, min=8592, max=3499715 agent, multi-modal osunlp/Multimodal-Mind2Web
sharegpt-4o-image AI-ModelScope/ShareGPT-4o image_caption 57289 638.7±157.9, min=47, max=4640 vqa, multi-modal OpenGVLab/ShareGPT-4o
pixelprose swift/pixelprose - Dataset is too huge, please click the original link to view the dataset stat. caption, multi-modal, vision tomg-group-umd/pixelprose
m3it AI-ModelScope/M3IT coco, vqa-v2, shapes, shapes-rephrased, coco-goi-rephrased, snli-ve, snli-ve-rephrased, okvqa, a-okvqa, viquae, textcap, docvqa, science-qa, imagenet, imagenet-open-ended, imagenet-rephrased, coco-goi, clevr, clevr-rephrased, nlvr, coco-itm, coco-itm-rephrased, vsr, vsr-rephrased, mocheg, mocheg-rephrased, coco-text, fm-iqa, activitynet-qa, msrvtt, ss, coco-cn, refcoco, refcoco-rephrased, multi30k, image-paragraph-captioning, visual-dialog, visual-dialog-rephrased, iqa, vcr, visual-mrc, ivqa, msrvtt-qa, msvd-qa, gqa, text-vqa, ocr-vqa, st-vqa, flickr8k-cn - Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
sharegpt4v AI-ModelScope/ShareGPT4V ShareGPT4V, ShareGPT4V-PT - Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
llava-instruct-150k AI-ModelScope/LLaVA-Instruct-150K 624610 490.4±180.2, min=288, max=5438 chat, multi-modal, vision -
llava-pretrain AI-ModelScope/LLaVA-Pretrain default - Dataset is too huge, please click the original link to view the dataset stat. vqa, multi-modal, quality liuhaotian/LLaVA-Pretrain
sa1b-dense-caption Tongyi-DataEngine/SA1B-Dense-Caption - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
sa1b-paired-caption Tongyi-DataEngine/SA1B-Paired-Captions-Images - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
alpaca-cleaned AI-ModelScope/alpaca-cleaned 51760 177.9±126.4, min=26, max=1044 chat, general, bench, quality yahma/alpaca-cleaned
aya-collection swift/aya_collection aya_dataset 202364 494.0±6911.3, min=21, max=3044268 multi-lingual, qa CohereForAI/aya_collection
belle-generated-chat-0.4M AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 common, zh BelleGroup/generated_chat_0.4M
belle-math-0.25M AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 math, zh BelleGroup/school_math_0.25M
belle-train-0.5M-CN AI-ModelScope/train_0.5M_CN 519255 129.1±91.5, min=27, max=6507 common, zh, quality BelleGroup/train_0.5M_CN
belle-train-1M-CN AI-ModelScope/train_1M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_1M_CN
belle-train-2M-CN AI-ModelScope/train_2M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_2M_CN
belle-train-3.5M-CN swift/train_3.5M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_3.5M_CN
c4 None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/c4
chart-qa swift/ChartQA 28299 43.1±5.5, min=29, max=77 en, vqa, quality HuggingFaceM4/ChartQA
chinese-c4 swift/chinese-c4 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality shjwudp/chinese-c4
cinepile swift/cinepile - Dataset is too huge, please click the original link to view the dataset stat. vqa, en, youtube, video tomg-group-umd/cinepile
classical-chinese-translate swift/classical_chinese_translate 6655 344.0±76.4, min=61, max=815 chat, play-ground -
codealpaca-20k AI-ModelScope/CodeAlpaca-20k 20016 100.2±60.1, min=29, max=1776 code, en HuggingFaceH4/CodeAlpaca_20K
cosmopedia None auto_math_text, khanacademy, openstax, stanford, stories, web_samples_v1, web_samples_v2, wikihow - Dataset is too huge, please click the original link to view the dataset stat. multi-domain, en, qa HuggingFaceTB/cosmopedia
cosmopedia-100k swift/cosmopedia-100k 100000 1024.5±243.1, min=239, max=2981 multi-domain, en, qa HuggingFaceTB/cosmopedia-100k
dolma swift/dolma v1_7 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/dolma
dolphin swift/dolphin flan1m-alpaca-uncensored, flan5m-alpaca-uncensored - Dataset is too huge, please click the original link to view the dataset stat. en cognitivecomputations/dolphin
duet AI-ModelScope/Duet-v0.5 5000 1157.4±189.3, min=657, max=2344 CoT, en G-reen/Duet-v0.5
evol-instruct-v2 AI-ModelScope/WizardLM_evol_instruct_V2_196k 109184 480.9±333.1, min=26, max=4942 chat, en WizardLM/WizardLM_evol_instruct_V2_196k
fineweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality HuggingFaceFW/fineweb
gen-qa swift/GenQA - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task tomg-group-umd/GenQA
github-code swift/github-code - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality codeparrot/github-code
gpt4v-dataset swift/gpt4v-dataset 12356 217.9±68.3, min=35, max=596 en, caption, multi-modal, quality laion/gpt4v-dataset
guanaco-belle-merge AI-ModelScope/guanaco_belle_merge_v1.0 693987 134.2±92.0, min=24, max=6507 QA, zh Chinese-Vicuna/guanaco_belle_merge_v1.0
infinity-instruct swift/Infinity-Instruct - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task BAAI/Infinity-Instruct
llava-med-zh-instruct swift/llava-med-zh-instruct-60k 56649 207.7±67.6, min=37, max=657 zh, medical, vqa BUAADreamer/llava-med-zh-instruct-60k
🔥longwriter-6k ZhipuAI/LongWriter-6k 6000 4887.2±2879.2, min=117, max=30354 long, chat, sft THUDM/LongWriter-6k
🔥longwriter-6k-filtered swift/longwriter-6k-filtered 666 4108.9±2636.9, min=1190, max=17050 long, chat, sft -
math-instruct AI-ModelScope/MathInstruct 262283 254.4±183.5, min=11, max=4383 math, cot, en, quality TIGER-Lab/MathInstruct
math-plus TIGER-Lab/MATH-plus train 893929 287.1±158.7, min=24, max=2919 qa, math, en, quality TIGER-Lab/MATH-plus
moondream2-coyo-5M swift/moondream2-coyo-5M-captions - Dataset is too huge, please click the original link to view the dataset stat. caption, pretrain, quality isidentical/moondream2-coyo-5M-captions
no-robots swift/no_robots 9485 298.7±246.4, min=40, max=6739 multi-task, quality, human-annotated HuggingFaceH4/no_robots
open-hermes swift/OpenHermes-2.5 - Dataset is too huge, please click the original link to view the dataset stat. cot, en, quality teknium/OpenHermes-2.5
open-orca-chinese AI-ModelScope/OpenOrca-Chinese - Dataset is too huge, please click the original link to view the dataset stat. QA, zh, general, quality yys/OpenOrca-Chinese
orca_dpo_pairs swift/orca_dpo_pairs 12859 366.9±251.9, min=30, max=2010 rlhf, quality Intel/orca_dpo_pairs
path-vqa swift/path-vqa 19654 34.8±7.3, min=27, max=85 multi-modal, vqa, medical flaviagiammarino/path-vqa
pile AI-ModelScope/pile - Dataset is too huge, please click the original link to view the dataset stat. pretrain EleutherAI/pile
poison-mpts iic/100PoisonMpts 906 150.6±80.8, min=39, max=656 poison-management, zh -
🔥qwen2-pro-en AI-ModelScope/Magpie-Qwen2-Pro-200K-English 200000 605.4±287.3, min=221, max=4267 chat, sft, en Magpie-Align/Magpie-Qwen2-Pro-200K-English
🔥qwen2-pro-filtered AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered 300000 555.8±286.6, min=148, max=4267 chat, sft Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
🔥qwen2-pro-zh AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese 200000 446.2±246.4, min=74, max=4101 chat, sft, zh Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
redpajama-data-1t swift/RedPajama-Data-1T - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-1T
redpajama-data-v2 swift/RedPajama-Data-V2 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-V2
refinedweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality tiiuae/falcon-refinedweb
rwkv-pretrain-web mapjack/openwebtext_dataset - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality -
sft-nectar AI-ModelScope/SFT-Nectar 131192 396.4±272.1, min=44, max=10732 cot, en, quality AstraMindAI/SFT-Nectar
skypile AI-ModelScope/SkyPile-150B - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality, zh Skywork/SkyPile-150B
slim-orca swift/SlimOrca 517982 399.1±370.2, min=35, max=8756 quality, en Open-Orca/SlimOrca
slim-pajama-627b None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality cerebras/SlimPajama-627B
starcoder AI-ModelScope/starcoderdata - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/starcoderdata
tagengo-gpt4 swift/tagengo-gpt4 78057 472.3±292.9, min=22, max=3521 chat, multi-lingual, quality lightblue/tagengo-gpt4
the-stack AI-ModelScope/the-stack - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/the-stack
ultrachat-200k swift/ultrachat_200k 207865 1195.4±573.7, min=76, max=4470 chat, en, quality HuggingFaceH4/ultrachat_200k
vqa-v2 swift/VQAv2 443757 31.8±2.2, min=27, max=58 en, vqa, quality HuggingFaceM4/VQAv2
web-instruct-sub swift/WebInstructSub - Dataset is too huge, please click the original link to view the dataset stat. qa, en, math, quality, multi-domain, science TIGER-Lab/WebInstructSub
wikipedia swift/wikipedia - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality wikipedia
wikipedia-cn-filtered AI-ModelScope/wikipedia-cn-20230720-filtered - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf AI-ModelScope/zhihu_rlhf_3k 3460 594.5±365.9, min=31, max=1716 rlhf, dpo, zh liyucheng/zhihu_rlhf_3k