The table below introduces all models supported by SWIFT:
- Model Type: The model_type registered in SWIFT.
- Default Lora Target Modules: The default lora_target_modules used by the model.
- Default Template: The default template used by the model.
- Support Flash Attn: Whether the model supports flash attention to accelerate fine-tuning (SFT) and inference.
- Support vLLM: Whether the model supports vLLM to accelerate inference and deployment.
- Requires: Additional dependencies required by the model.

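
The column legend above can be turned into a small lookup when you want to filter the table programmatically. The following is a minimal sketch, assuming the rows are available as pipe-delimited text exactly as they appear in this document. The helper names `split_cells` and `parse_row` are hypothetical and are not part of SWIFT; the three sample rows are copied from the table.

```python
# Column names as they appear in the table header of this document.
HEADER = (
    "Model Type | Model ID | Default Lora Target Modules | Default Template | "
    "Support Flash Attn | Support vLLM | Support LMDeploy | Support Megatron | "
    "Requires | Tags | HF Model ID |"
)

# Three sample rows copied from the table.
ROWS = [
    "qwen1half-7b-chat | qwen/Qwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat |",
    "qwen1half-moe-a2_7b-chat-int4 | qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.40 | moe | Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 |",
    "chatglm3-6b | ZhipuAI/chatglm3-6b | query_key_value | chatglm3 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | THUDM/chatglm3-6b |",
]


def split_cells(line):
    """Split one pipe-delimited table line into stripped cell strings."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def parse_row(line, columns):
    """Hypothetical helper: map one table row onto the header columns."""
    cells = split_cells(line)
    if len(cells) != len(columns):
        raise ValueError(f"expected {len(columns)} cells, got {len(cells)}")
    record = dict(zip(columns, cells))
    # The "Support ..." columns hold a check mark or a cross; store booleans.
    for key in columns:
        if key.startswith("Support "):
            record[key] = record[key] == "✔"
    # Comma-separated cells become lists; "-" or an empty cell means "none".
    record["Default Lora Target Modules"] = [
        m.strip() for m in record["Default Lora Target Modules"].split(",")
    ]
    record["Requires"] = [
        r.strip() for r in record["Requires"].split(",") if r.strip() not in ("", "-")
    ]
    return record


COLUMNS = split_cells(HEADER)
records = [parse_row(row, COLUMNS) for row in ROWS]

# Example query: model types in the sample that are marked as supporting vLLM.
vllm_models = [r["Model Type"] for r in records if r["Support vLLM"]]
print(vllm_models)
```

The same records can be filtered on any other column, for example selecting rows whose `Requires` list is empty to find models with no extra dependencies.
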
Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support vLLM | Support LMDeploy | Support Megatron | Requires | Tags | HF Model ID |
---|---|---|---|---|---|---|---|---|---|---|
qwen-1_8b | qwen/Qwen-1_8B | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-1_8B |
qwen-1_8b-chat | qwen/Qwen-1_8B-Chat | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-1_8B-Chat |
qwen-1_8b-chat-int4 | qwen/Qwen-1_8B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int4 |
qwen-1_8b-chat-int8 | qwen/Qwen-1_8B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int8 |
qwen-7b | qwen/Qwen-7B | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-7B |
qwen-7b-chat | qwen/Qwen-7B-Chat | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-7B-Chat |
qwen-7b-chat-int4 | qwen/Qwen-7B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int4 |
qwen-7b-chat-int8 | qwen/Qwen-7B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int8 |
qwen-14b | qwen/Qwen-14B | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-14B |
qwen-14b-chat | qwen/Qwen-14B-Chat | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-14B-Chat |
qwen-14b-chat-int4 | qwen/Qwen-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int4 |
qwen-14b-chat-int8 | qwen/Qwen-14B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int8 |
qwen-72b | qwen/Qwen-72B | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-72B |
qwen-72b-chat | qwen/Qwen-72B-Chat | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | Qwen/Qwen-72B-Chat |
qwen-72b-chat-int4 | qwen/Qwen-72B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int4 |
qwen-72b-chat-int8 | qwen/Qwen-72B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int8 |
modelscope-agent-7b | iic/ModelScope-Agent-7B | c_attn | modelscope-agent | ✔ | ✘ | ✘ | ✘ | | - | - |
modelscope-agent-14b | iic/ModelScope-Agent-14B | c_attn | modelscope-agent | ✔ | ✘ | ✘ | ✘ | | - | - |
qwen1half-0_5b | qwen/Qwen1.5-0.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B |
qwen1half-1_8b | qwen/Qwen1.5-1.8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B |
qwen1half-4b | qwen/Qwen1.5-4B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B |
qwen1half-7b | qwen/Qwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B |
qwen1half-14b | qwen/Qwen1.5-14B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B |
qwen1half-32b | qwen/Qwen1.5-32B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen1.5-32B |
qwen1half-72b | qwen/Qwen1.5-72B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B |
qwen1half-110b | qwen/Qwen1.5-110B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen1.5-110B |
codeqwen1half-7b | qwen/CodeQwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B |
qwen1half-moe-a2_7b | qwen/Qwen1.5-MoE-A2.7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | Qwen/Qwen1.5-MoE-A2.7B |
qwen1half-0_5b-chat | qwen/Qwen1.5-0.5B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat |
qwen1half-1_8b-chat | qwen/Qwen1.5-1.8B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat |
qwen1half-4b-chat | qwen/Qwen1.5-4B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat |
qwen1half-7b-chat | qwen/Qwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat |
qwen1half-14b-chat | qwen/Qwen1.5-14B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat |
qwen1half-32b-chat | qwen/Qwen1.5-32B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat |
qwen1half-72b-chat | qwen/Qwen1.5-72B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat |
qwen1half-110b-chat | qwen/Qwen1.5-110B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen1.5-110B-Chat |
qwen1half-moe-a2_7b-chat | qwen/Qwen1.5-MoE-A2.7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | Qwen/Qwen1.5-MoE-A2.7B-Chat |
codeqwen1half-7b-chat | qwen/CodeQwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B-Chat |
qwen1half-0_5b-chat-int4 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 |
qwen1half-1_8b-chat-int4 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 |
qwen1half-4b-chat-int4 | qwen/Qwen1.5-4B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 |
qwen1half-7b-chat-int4 | qwen/Qwen1.5-7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 |
qwen1half-14b-chat-int4 | qwen/Qwen1.5-14B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 |
qwen1half-32b-chat-int4 | qwen/Qwen1.5-32B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 |
qwen1half-72b-chat-int4 | qwen/Qwen1.5-72B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 |
qwen1half-110b-chat-int4 | qwen/Qwen1.5-110B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-110B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-int8 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 |
qwen1half-1_8b-chat-int8 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 |
qwen1half-4b-chat-int8 | qwen/Qwen1.5-4B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 |
qwen1half-7b-chat-int8 | qwen/Qwen1.5-7B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 |
qwen1half-14b-chat-int8 | qwen/Qwen1.5-14B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 |
qwen1half-72b-chat-int8 | qwen/Qwen1.5-72B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 |
qwen1half-moe-a2_7b-chat-int4 | qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.40 | moe | Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-awq | qwen/Qwen1.5-0.5B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-0.5B-Chat-AWQ |
qwen1half-1_8b-chat-awq | qwen/Qwen1.5-1.8B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-1.8B-Chat-AWQ |
qwen1half-4b-chat-awq | qwen/Qwen1.5-4B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-4B-Chat-AWQ |
qwen1half-7b-chat-awq | qwen/Qwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-7B-Chat-AWQ |
qwen1half-14b-chat-awq | qwen/Qwen1.5-14B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-14B-Chat-AWQ |
qwen1half-32b-chat-awq | qwen/Qwen1.5-32B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-32B-Chat-AWQ |
qwen1half-72b-chat-awq | qwen/Qwen1.5-72B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-72B-Chat-AWQ |
qwen1half-110b-chat-awq | qwen/Qwen1.5-110B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-110B-Chat-AWQ |
codeqwen1half-7b-chat-awq | qwen/CodeQwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/CodeQwen1.5-7B-Chat-AWQ |
qwen2-0_5b | qwen/Qwen2-0.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-0.5B |
qwen2-0_5b-instruct | qwen/Qwen2-0.5B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-0.5B-Instruct |
qwen2-0_5b-instruct-int4 | qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 |
qwen2-0_5b-instruct-int8 | qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 |
qwen2-0_5b-instruct-awq | qwen/Qwen2-0.5B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2-0.5B-Instruct-AWQ |
qwen2-1_5b | qwen/Qwen2-1.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-1.5B |
qwen2-1_5b-instruct | qwen/Qwen2-1.5B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-1.5B-Instruct |
qwen2-1_5b-instruct-int4 | qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 |
qwen2-1_5b-instruct-int8 | qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 |
qwen2-1_5b-instruct-awq | qwen/Qwen2-1.5B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2-1.5B-Instruct-AWQ |
qwen2-7b | qwen/Qwen2-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-7B |
qwen2-7b-instruct | qwen/Qwen2-7B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-7B-Instruct |
qwen2-7b-instruct-int4 | qwen/Qwen2-7B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-7B-Instruct-GPTQ-Int4 |
qwen2-7b-instruct-int8 | qwen/Qwen2-7B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-7B-Instruct-GPTQ-Int8 |
qwen2-7b-instruct-awq | qwen/Qwen2-7B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2-7B-Instruct-AWQ |
qwen2-72b | qwen/Qwen2-72B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-72B |
qwen2-72b-instruct | qwen/Qwen2-72B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-72B-Instruct |
qwen2-72b-instruct-int4 | qwen/Qwen2-72B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-72B-Instruct-GPTQ-Int4 |
qwen2-72b-instruct-int8 | qwen/Qwen2-72B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2-72B-Instruct-GPTQ-Int8 |
qwen2-72b-instruct-awq | qwen/Qwen2-72B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2-72B-Instruct-AWQ |
qwen2-57b-a14b | qwen/Qwen2-57B-A14B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | Qwen/Qwen2-57B-A14B |
qwen2-57b-a14b-instruct | qwen/Qwen2-57B-A14B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | Qwen/Qwen2-57B-A14B-Instruct |
qwen2-57b-a14b-instruct-int4 | qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.40 | moe | Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 |
qwen2-math-1_5b | qwen/Qwen2-Math-1.5B | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-Math-1.5B |
qwen2-math-1_5b-instruct | qwen/Qwen2-Math-1.5B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-Math-1.5B-Instruct |
qwen2-math-7b | qwen/Qwen2-Math-7B | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-Math-7B |
qwen2-math-7b-instruct | qwen/Qwen2-Math-7B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-Math-7B-Instruct |
qwen2-math-72b | qwen/Qwen2-Math-72B | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-Math-72B |
qwen2-math-72b-instruct | qwen/Qwen2-Math-72B-Instruct | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen2-Math-72B-Instruct |
qwen2_5-0_5b | qwen/Qwen2.5-0.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-0.5B |
qwen2_5-1_5b | qwen/Qwen2.5-1.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-1.5B |
qwen2_5-3b | qwen/Qwen2.5-3B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-3B |
qwen2_5-7b | qwen/Qwen2.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-7B |
qwen2_5-14b | qwen/Qwen2.5-14B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-14B |
qwen2_5-32b | qwen/Qwen2.5-32B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-32B |
qwen2_5-72b | qwen/Qwen2.5-72B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-72B |
qwen2_5-0_5b-instruct | qwen/Qwen2.5-0.5B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-0.5B-Instruct |
qwen2_5-1_5b-instruct | qwen/Qwen2.5-1.5B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-1.5B-Instruct |
qwen2_5-3b-instruct | qwen/Qwen2.5-3B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-3B-Instruct |
qwen2_5-7b-instruct | qwen/Qwen2.5-7B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct |
qwen2_5-14b-instruct | qwen/Qwen2.5-14B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct |
qwen2_5-32b-instruct | qwen/Qwen2.5-32B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-32B-Instruct |
qwen2_5-72b-instruct | qwen/Qwen2.5-72B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-72B-Instruct |
qwen2_5-0_5b-instruct-gptq-int4 | qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 |
qwen2_5-1_5b-instruct-gptq-int4 | qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 |
qwen2_5-3b-instruct-gptq-int4 | qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 |
qwen2_5-7b-instruct-gptq-int4 | qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 |
qwen2_5-14b-instruct-gptq-int4 | qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 |
qwen2_5-32b-instruct-gptq-int4 | qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 |
qwen2_5-72b-instruct-gptq-int4 | qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 |
qwen2_5-0_5b-instruct-gptq-int8 | qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 |
qwen2_5-1_5b-instruct-gptq-int8 | qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 |
qwen2_5-3b-instruct-gptq-int8 | qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 |
qwen2_5-7b-instruct-gptq-int8 | qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 |
qwen2_5-14b-instruct-gptq-int8 | qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 |
qwen2_5-32b-instruct-gptq-int8 | qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 |
qwen2_5-72b-instruct-gptq-int8 | qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 |
qwen2_5-0_5b-instruct-awq | qwen/Qwen2.5-0.5B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2.5-0.5B-Instruct-AWQ |
qwen2_5-1_5b-instruct-awq | qwen/Qwen2.5-1.5B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2.5-1.5B-Instruct-AWQ |
qwen2_5-3b-instruct-awq | qwen/Qwen2.5-3B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2.5-3B-Instruct-AWQ |
qwen2_5-7b-instruct-awq | qwen/Qwen2.5-7B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2.5-7B-Instruct-AWQ |
qwen2_5-14b-instruct-awq | qwen/Qwen2.5-14B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2.5-14B-Instruct-AWQ |
qwen2_5-32b-instruct-awq | qwen/Qwen2.5-32B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2.5-32B-Instruct-AWQ |
qwen2_5-72b-instruct-awq | qwen/Qwen2.5-72B-Instruct-AWQ | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | Qwen/Qwen2.5-72B-Instruct-AWQ |
qwen2_5-math-1_5b | qwen/Qwen2.5-Math-1.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Math-1.5B |
qwen2_5-math-7b | qwen/Qwen2.5-Math-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Math-7B |
qwen2_5-math-72b | qwen/Qwen2.5-Math-72B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Math-72B |
qwen2_5-math-1_5b-instruct | qwen/Qwen2.5-Math-1.5B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Math-1.5B-Instruct |
qwen2_5-math-7b-instruct | qwen/Qwen2.5-Math-7B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Math-7B-Instruct |
qwen2_5-math-72b-instruct | qwen/Qwen2.5-Math-72B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Math-72B-Instruct |
qwen2_5-coder-1_5b | qwen/Qwen2.5-Coder-1.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Coder-1.5B |
qwen2_5-coder-1_5b-instruct | qwen/Qwen2.5-Coder-1.5B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Coder-1.5B-Instruct |
qwen2_5-coder-7b | qwen/Qwen2.5-Coder-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Coder-7B |
qwen2_5-coder-7b-instruct | qwen/Qwen2.5-Coder-7B-Instruct | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | Qwen/Qwen2.5-Coder-7B-Instruct |
chatglm2-6b | ZhipuAI/chatglm2-6b | query_key_value | chatglm2 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | THUDM/chatglm2-6b |
chatglm2-6b-32k | ZhipuAI/chatglm2-6b-32k | query_key_value | chatglm2 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | THUDM/chatglm2-6b-32k |
chatglm3-6b-base | ZhipuAI/chatglm3-6b-base | query_key_value | chatglm-generation | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | THUDM/chatglm3-6b-base |
chatglm3-6b | ZhipuAI/chatglm3-6b | query_key_value | chatglm3 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | THUDM/chatglm3-6b |
chatglm3-6b-32k | ZhipuAI/chatglm3-6b-32k | query_key_value | chatglm3 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | THUDM/chatglm3-6b-32k |
chatglm3-6b-128k | ZhipuAI/chatglm3-6b-128k | query_key_value | chatglm3 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | THUDM/chatglm3-6b-128k |
codegeex2-6b | ZhipuAI/codegeex2-6b | query_key_value | chatglm-generation | ✘ | ✔ | ✘ | ✘ | transformers<4.34 | coding | THUDM/codegeex2-6b |
glm4-9b | ZhipuAI/glm-4-9b | query_key_value | chatglm-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | THUDM/glm-4-9b |
glm4-9b-chat | ZhipuAI/glm-4-9b-chat | query_key_value | chatglm4 | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | THUDM/glm-4-9b-chat |
glm4-9b-chat-1m | ZhipuAI/glm-4-9b-chat-1m | query_key_value | chatglm4 | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | THUDM/glm-4-9b-chat-1m |
codegeex4-9b-chat | ZhipuAI/codegeex4-all-9b | query_key_value | codegeex4 | ✔ | ✔ | ✔ | ✘ | transformers<4.42 | coding | THUDM/codegeex4-all-9b |
llama2-7b | modelscope/Llama-2-7b-ms | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Llama-2-7b-hf |
llama2-7b-chat | modelscope/Llama-2-7b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Llama-2-7b-chat-hf |
llama2-13b | modelscope/Llama-2-13b-ms | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Llama-2-13b-hf |
llama2-13b-chat | modelscope/Llama-2-13b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Llama-2-13b-chat-hf |
llama2-70b | modelscope/Llama-2-70b-ms | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Llama-2-70b-hf |
llama2-70b-chat | modelscope/Llama-2-70b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Llama-2-70b-chat-hf |
llama2-7b-aqlm-2bit-1x16 | AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | ✘ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf |
llama3-8b | LLM-Research/Meta-Llama-3-8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Meta-Llama-3-8B |
llama3-8b-instruct | LLM-Research/Meta-Llama-3-8B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Meta-Llama-3-8B-Instruct |
llama3-8b-instruct-int4 | swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4 |
llama3-8b-instruct-int8 | swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8 |
llama3-8b-instruct-awq | swift/Meta-Llama-3-8B-Instruct-AWQ | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | autoawq | - | study-hjt/Meta-Llama-3-8B-Instruct-AWQ |
llama3-70b | LLM-Research/Meta-Llama-3-70B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Meta-Llama-3-70B |
llama3-70b-instruct | LLM-Research/Meta-Llama-3-70B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | | - | meta-llama/Meta-Llama-3-70B-Instruct |
llama3-70b-instruct-int4 | swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4 |
llama3-70b-instruct-int8 | swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8 |
llama3-70b-instruct-awq | swift/Meta-Llama-3-70B-Instruct-AWQ | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | autoawq | - | study-hjt/Meta-Llama-3-70B-Instruct-AWQ |
llama3_1-8b | LLM-Research/Meta-Llama-3.1-8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-8B |
llama3_1-8b-instruct | LLM-Research/Meta-Llama-3.1-8B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-8B-Instruct |
llama3_1-8b-instruct-awq | LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, autoawq | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 |
llama3_1-8b-instruct-gptq-int4 | LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, auto_gptq | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 |
llama3_1-8b-instruct-bnb | LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, bitsandbytes | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4 |
llama3_1-70b | LLM-Research/Meta-Llama-3.1-70B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-70B |
llama3_1-70b-instruct | LLM-Research/Meta-Llama-3.1-70B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-70B-Instruct |
llama3_1-70b-instruct-fp8 | LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-70B-Instruct-FP8 |
llama3_1-70b-instruct-awq | LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43, autoawq | - | hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 |
llama3_1-70b-instruct-gptq-int4 | LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, auto_gptq | - | hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 |
llama3_1-70b-instruct-bnb | LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, bitsandbytes | - | unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit |
llama3_1-405b | LLM-Research/Meta-Llama-3.1-405B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-405B |
llama3_1-405b-instruct | LLM-Research/Meta-Llama-3.1-405B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-405B-Instruct |
llama3_1-405b-instruct-fp8 | LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 |
llama3_1-405b-instruct-awq | LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43, autoawq | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 |
llama3_1-405b-instruct-gptq-int4 | LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, auto_gptq | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 |
llama3_1-405b-instruct-bnb | LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, bitsandbytes | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4 |
llama3_2-1b | LLM-Research/Llama-3.2-1B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | meta-llama/Llama-3.2-1B |
llama3_2-1b-instruct | LLM-Research/Llama-3.2-1B-Instruct | q_proj, k_proj, v_proj | llama3_2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | meta-llama/Llama-3.2-1B-Instruct |
llama3_2-3b | LLM-Research/Llama-3.2-3B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | meta-llama/Llama-3.2-3B |
llama3_2-3b-instruct | LLM-Research/Llama-3.2-3B-Instruct | q_proj, k_proj, v_proj | llama3_2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | meta-llama/Llama-3.2-3B-Instruct |
reflection-llama_3_1-70b | LLM-Research/Reflection-Llama-3.1-70B | q_proj, k_proj, v_proj | reflection | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | mattshumer/Reflection-Llama-3.1-70B |
longwriter-glm4-9b | ZhipuAI/LongWriter-glm4-9b | query_key_value | chatglm4 | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | THUDM/LongWriter-glm4-9b |
longwriter-llama3_1-8b | ZhipuAI/LongWriter-llama3.1-8b | q_proj, k_proj, v_proj | longwriter-llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | THUDM/LongWriter-llama3.1-8b |
chinese-llama-2-1_3b | AI-ModelScope/chinese-llama-2-1.3b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-llama-2-1.3b |
chinese-llama-2-7b | AI-ModelScope/chinese-llama-2-7b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-llama-2-7b |
chinese-llama-2-7b-16k | AI-ModelScope/chinese-llama-2-7b-16k | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-llama-2-7b-16k |
chinese-llama-2-7b-64k | AI-ModelScope/chinese-llama-2-7b-64k | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-llama-2-7b-64k |
chinese-llama-2-13b | AI-ModelScope/chinese-llama-2-13b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-llama-2-13b |
chinese-llama-2-13b-16k | AI-ModelScope/chinese-llama-2-13b-16k | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-llama-2-13b-16k |
chinese-alpaca-2-1_3b | AI-ModelScope/chinese-alpaca-2-1.3b | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-alpaca-2-1.3b |
chinese-alpaca-2-7b | AI-ModelScope/chinese-alpaca-2-7b | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-alpaca-2-7b |
chinese-alpaca-2-7b-16k | AI-ModelScope/chinese-alpaca-2-7b-16k | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-alpaca-2-7b-16k |
chinese-alpaca-2-7b-64k | AI-ModelScope/chinese-alpaca-2-7b-64k | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-alpaca-2-7b-64k |
chinese-alpaca-2-13b | AI-ModelScope/chinese-alpaca-2-13b | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-alpaca-2-13b |
chinese-alpaca-2-13b-16k | AI-ModelScope/chinese-alpaca-2-13b-16k | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | hfl/chinese-alpaca-2-13b-16k |
llama-3-chinese-8b | ChineseAlpacaGroup/llama-3-chinese-8b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | hfl/llama-3-chinese-8b |
llama-3-chinese-8b-instruct | ChineseAlpacaGroup/llama-3-chinese-8b-instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | | - | hfl/llama-3-chinese-8b-instruct |
atom-7b | FlagAlpha/Atom-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | | - | FlagAlpha/Atom-7B |
atom-7b-chat | FlagAlpha/Atom-7B-Chat | q_proj, k_proj, v_proj | atom | ✔ | ✔ | ✘ | ✘ | | - | FlagAlpha/Atom-7B-Chat |
yi-6b | 01ai/Yi-6B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-6B |
yi-6b-200k | 01ai/Yi-6B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-6B-200K |
yi-6b-chat | 01ai/Yi-6B-Chat | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-6B-Chat |
yi-6b-chat-awq | 01ai/Yi-6B-Chat-4bits | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | 01-ai/Yi-6B-Chat-4bits |
yi-6b-chat-int8 | 01ai/Yi-6B-Chat-8bits | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | 01-ai/Yi-6B-Chat-8bits |
yi-9b | 01ai/Yi-9B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-9B |
yi-9b-200k | 01ai/Yi-9B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-9B-200K |
yi-34b | 01ai/Yi-34B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-34B |
yi-34b-200k | 01ai/Yi-34B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-34B-200K |
yi-34b-chat | 01ai/Yi-34B-Chat | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-34B-Chat |
yi-34b-chat-awq | 01ai/Yi-34B-Chat-4bits | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | 01-ai/Yi-34B-Chat-4bits |
yi-34b-chat-int8 | 01ai/Yi-34B-Chat-8bits | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | 01-ai/Yi-34B-Chat-8bits |
yi-1_5-6b | 01ai/Yi-1.5-6B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-6B |
yi-1_5-6b-chat | 01ai/Yi-1.5-6B-Chat | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-6B-Chat |
yi-1_5-9b | 01ai/Yi-1.5-9B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-9B |
yi-1_5-9b-chat | 01ai/Yi-1.5-9B-Chat | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-9B-Chat |
yi-1_5-9b-chat-16k | 01ai/Yi-1.5-9B-Chat-16K | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-9B-Chat-16K |
yi-1_5-34b | 01ai/Yi-1.5-34B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-34B |
yi-1_5-34b-chat | 01ai/Yi-1.5-34B-Chat | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-34B-Chat |
yi-1_5-34b-chat-16k | 01ai/Yi-1.5-34B-Chat-16K | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-1.5-34B-Chat-16K |
yi-1_5-6b-chat-awq-int4 | AI-ModelScope/Yi-1.5-6B-Chat-AWQ | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | modelscope/Yi-1.5-6B-Chat-AWQ |
yi-1_5-6b-chat-gptq-int4 | AI-ModelScope/Yi-1.5-6B-Chat-GPTQ | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | modelscope/Yi-1.5-6B-Chat-GPTQ |
yi-1_5-9b-chat-awq-int4 | AI-ModelScope/Yi-1.5-9B-Chat-AWQ | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | modelscope/Yi-1.5-9B-Chat-AWQ |
yi-1_5-9b-chat-gptq-int4 | AI-ModelScope/Yi-1.5-9B-Chat-GPTQ | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | modelscope/Yi-1.5-9B-Chat-GPTQ |
yi-1_5-34b-chat-awq-int4 | AI-ModelScope/Yi-1.5-34B-Chat-AWQ | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | modelscope/Yi-1.5-34B-Chat-AWQ |
yi-1_5-34b-chat-gptq-int4 | AI-ModelScope/Yi-1.5-34B-Chat-GPTQ | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | modelscope/Yi-1.5-34B-Chat-GPTQ |
yi-coder-1_5b | 01ai/Yi-Coder-1.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-Coder-1.5B |
yi-coder-1_5b-chat | 01ai/Yi-Coder-1.5B-Chat | q_proj, k_proj, v_proj | yi-coder | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-Coder-1.5B-Chat |
yi-coder-9b | 01ai/Yi-Coder-9B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-Coder-9B |
yi-coder-9b-chat | 01ai/Yi-Coder-9B-Chat | q_proj, k_proj, v_proj | yi-coder | ✔ | ✔ | ✔ | ✘ | | - | 01-ai/Yi-Coder-9B-Chat |
internlm-7b | Shanghai_AI_Laboratory/internlm-7b | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✔ | ✘ | | - | internlm/internlm-7b |
internlm-7b-chat | Shanghai_AI_Laboratory/internlm-chat-7b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | ✔ | ✘ | | - | internlm/internlm-chat-7b |
internlm-7b-chat-8k | Shanghai_AI_Laboratory/internlm-chat-7b-8k | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | ✔ | ✘ | | - | - |
internlm-20b | Shanghai_AI_Laboratory/internlm-20b | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✔ | ✘ | | - | internlm/internlm-20b |
internlm-20b-chat | Shanghai_AI_Laboratory/internlm-chat-20b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | ✔ | ✘ | | - | internlm/internlm-chat-20b |
internlm2-1_8b | Shanghai_AI_Laboratory/internlm2-1_8b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-1_8b |
internlm2-1_8b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-chat-1_8b-sft |
internlm2-1_8b-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-chat-1_8b |
internlm2-7b-base | Shanghai_AI_Laboratory/internlm2-base-7b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-base-7b |
internlm2-7b | Shanghai_AI_Laboratory/internlm2-7b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-7b |
internlm2-7b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-7b-sft | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-chat-7b-sft |
internlm2-7b-chat | Shanghai_AI_Laboratory/internlm2-chat-7b | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-chat-7b |
internlm2-20b-base | Shanghai_AI_Laboratory/internlm2-base-20b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-base-20b |
internlm2-20b | Shanghai_AI_Laboratory/internlm2-20b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-20b |
internlm2-20b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-20b-sft | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-chat-20b-sft |
internlm2-20b-chat | Shanghai_AI_Laboratory/internlm2-chat-20b | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2-chat-20b |
internlm2_5-1_8b | Shanghai_AI_Laboratory/internlm2_5-1_8b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2_5-1_8b |
internlm2_5-1_8b-chat | Shanghai_AI_Laboratory/internlm2_5-1_8b-chat | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2_5-1_8b-chat |
internlm2_5-7b | Shanghai_AI_Laboratory/internlm2_5-7b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2_5-7b |
internlm2_5-7b-chat | Shanghai_AI_Laboratory/internlm2_5-7b-chat | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2_5-7b-chat |
internlm2_5-7b-chat-1m | Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2_5-7b-chat-1m |
internlm2_5-20b | Shanghai_AI_Laboratory/internlm2_5-20b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2_5-20b |
internlm2_5-20b-chat | Shanghai_AI_Laboratory/internlm2_5-20b-chat | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | internlm/internlm2_5-20b-chat |
internlm2-math-7b | Shanghai_AI_Laboratory/internlm2-math-base-7b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | internlm/internlm2-math-base-7b |
internlm2-math-7b-chat | Shanghai_AI_Laboratory/internlm2-math-7b | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | internlm/internlm2-math-7b |
internlm2-math-20b | Shanghai_AI_Laboratory/internlm2-math-base-20b | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | internlm/internlm2-math-base-20b |
internlm2-math-20b-chat | Shanghai_AI_Laboratory/internlm2-math-20b | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | internlm/internlm2-math-20b |
deepseek-7b | deepseek-ai/deepseek-llm-7b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | deepseek-ai/deepseek-llm-7b-base |
deepseek-7b-chat | deepseek-ai/deepseek-llm-7b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | - | deepseek-ai/deepseek-llm-7b-chat |
deepseek-moe-16b | deepseek-ai/deepseek-moe-16b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | | moe | deepseek-ai/deepseek-moe-16b-base |
deepseek-moe-16b-chat | deepseek-ai/deepseek-moe-16b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✘ | ✘ | | moe | deepseek-ai/deepseek-moe-16b-chat |
deepseek-67b | deepseek-ai/deepseek-llm-67b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | deepseek-ai/deepseek-llm-67b-base |
deepseek-67b-chat | deepseek-ai/deepseek-llm-67b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | - | deepseek-ai/deepseek-llm-67b-chat |
deepseek-coder-1_3b | deepseek-ai/deepseek-coder-1.3b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | coding | deepseek-ai/deepseek-coder-1.3b-base |
deepseek-coder-1_3b-instruct | deepseek-ai/deepseek-coder-1.3b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | ✔ | ✘ | | coding | deepseek-ai/deepseek-coder-1.3b-instruct |
deepseek-coder-6_7b | deepseek-ai/deepseek-coder-6.7b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | coding | deepseek-ai/deepseek-coder-6.7b-base |
deepseek-coder-6_7b-instruct | deepseek-ai/deepseek-coder-6.7b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | ✔ | ✘ | | coding | deepseek-ai/deepseek-coder-6.7b-instruct |
deepseek-coder-33b | deepseek-ai/deepseek-coder-33b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | coding | deepseek-ai/deepseek-coder-33b-base |
deepseek-coder-33b-instruct | deepseek-ai/deepseek-coder-33b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | ✔ | ✘ | | coding | deepseek-ai/deepseek-coder-33b-instruct |
deepseek-coder-v2-instruct | deepseek-ai/DeepSeek-Coder-V2-Instruct | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | deepseek-ai/DeepSeek-Coder-V2-Instruct |
deepseek-coder-v2-lite-instruct | deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
deepseek-coder-v2 | deepseek-ai/DeepSeek-Coder-V2-Base | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | deepseek-ai/DeepSeek-Coder-V2-Base |
deepseek-coder-v2-lite | deepseek-ai/DeepSeek-Coder-V2-Lite-Base | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | deepseek-ai/DeepSeek-Coder-V2-Lite-Base |
deepseek-math-7b | deepseek-ai/deepseek-math-7b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | math | deepseek-ai/deepseek-math-7b-base |
deepseek-math-7b-instruct | deepseek-ai/deepseek-math-7b-instruct | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | math | deepseek-ai/deepseek-math-7b-instruct |
deepseek-math-7b-chat | deepseek-ai/deepseek-math-7b-rl | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | math | deepseek-ai/deepseek-math-7b-rl |
numina-math-7b | AI-ModelScope/NuminaMath-7B-TIR | q_proj, k_proj, v_proj | numina-math | ✔ | ✔ | ✘ | ✘ | | math | AI-MO/NuminaMath-7B-TIR |
deepseek-v2 | deepseek-ai/DeepSeek-V2 | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | deepseek-ai/DeepSeek-V2 |
deepseek-v2-chat | deepseek-ai/DeepSeek-V2-Chat | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | deepseek-ai/DeepSeek-V2-Chat |
deepseek-v2-lite | deepseek-ai/DeepSeek-V2-Lite | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | deepseek-ai/DeepSeek-V2-Lite |
deepseek-v2-lite-chat | deepseek-ai/DeepSeek-V2-Lite-Chat | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | deepseek-ai/DeepSeek-V2-Lite-Chat |
deepseek-v2_5 | deepseek-ai/DeepSeek-V2.5 | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | deepseek-ai/DeepSeek-V2.5 |
gemma-2b | AI-ModelScope/gemma-2b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | google/gemma-2b |
gemma-7b | AI-ModelScope/gemma-7b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | google/gemma-7b |
gemma-2b-instruct | AI-ModelScope/gemma-2b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | google/gemma-2b-it |
gemma-7b-instruct | AI-ModelScope/gemma-7b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | google/gemma-7b-it |
gemma2-2b | LLM-Research/gemma-2-2b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | google/gemma-2-2b |
gemma2-9b | LLM-Research/gemma-2-9b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | google/gemma-2-9b |
gemma2-27b | LLM-Research/gemma-2-27b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | google/gemma-2-27b |
gemma2-2b-instruct | LLM-Research/gemma-2-2b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | google/gemma-2-2b-it |
gemma2-9b-instruct | LLM-Research/gemma-2-9b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | google/gemma-2-9b-it |
gemma2-27b-instruct | LLM-Research/gemma-2-27b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | google/gemma-2-27b-it |
minicpm-1b-sft-chat | OpenBMB/MiniCPM-1B-sft-bf16 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | ✘ | ✘ | transformers>=4.36.0 | - | openbmb/MiniCPM-1B-sft-bf16 |
minicpm-2b-sft-chat | OpenBMB/MiniCPM-2B-sft-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | ✘ | ✘ | | - | openbmb/MiniCPM-2B-sft-fp32 |
minicpm-2b-chat | OpenBMB/MiniCPM-2B-dpo-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | ✘ | ✘ | | - | openbmb/MiniCPM-2B-dpo-fp32 |
minicpm-2b-128k | OpenBMB/MiniCPM-2B-128k | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | transformers>=4.36.0 | - | openbmb/MiniCPM-2B-128k |
minicpm-moe-8x2b | OpenBMB/MiniCPM-MoE-8x2B | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | ✘ | ✘ | transformers>=4.36.0 | moe | openbmb/MiniCPM-MoE-8x2B |
minicpm3-4b | OpenBMB/MiniCPM3-4B | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj | chatml | ✔ | ✘ | ✘ | ✘ | transformers>=4.36 | - | openbmb/MiniCPM3-4B |
openbuddy-llama-65b-chat | OpenBuddy/openbuddy-llama-65b-v8-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | ✔ | ✘ | | - | OpenBuddy/openbuddy-llama-65b-v8-bf16 |
openbuddy-llama2-13b-chat | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | ✔ | ✘ | | - | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 |
openbuddy-llama2-70b-chat | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | ✔ | ✘ | | - | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 |
openbuddy-llama3-8b-chat | OpenBuddy/openbuddy-llama3-8b-v21.1-8k | q_proj, k_proj, v_proj | openbuddy2 | ✔ | ✔ | ✔ | ✘ | | - | OpenBuddy/openbuddy-llama3-8b-v21.1-8k |
openbuddy-llama3-70b-chat | OpenBuddy/openbuddy-llama3-70b-v21.1-8k | q_proj, k_proj, v_proj | openbuddy2 | ✔ | ✔ | ✔ | ✘ | | - | OpenBuddy/openbuddy-llama3-70b-v21.1-8k |
openbuddy-mistral-7b-chat | OpenBuddy/openbuddy-mistral-7b-v17.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | OpenBuddy/openbuddy-mistral-7b-v17.1-32k |
openbuddy-zephyr-7b-chat | OpenBuddy/openbuddy-zephyr-7b-v14.1 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | OpenBuddy/openbuddy-zephyr-7b-v14.1 |
openbuddy-deepseek-67b-chat | OpenBuddy/openbuddy-deepseek-67b-v15.2 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | ✔ | ✘ | | - | OpenBuddy/openbuddy-deepseek-67b-v15.2 |
openbuddy-mixtral-moe-7b-chat | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | moe | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k |
openbuddy-llama3_1-8b-chat | OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k | q_proj, k_proj, v_proj | openbuddy2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k |
mistral-7b | AI-ModelScope/Mistral-7B-v0.1 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | mistralai/Mistral-7B-v0.1 |
mistral-7b-v2 | AI-ModelScope/Mistral-7B-v0.2-hf | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | alpindale/Mistral-7B-v0.2-hf |
mistral-7b-instruct | AI-ModelScope/Mistral-7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.1 |
mistral-7b-instruct-v2 | AI-ModelScope/Mistral-7B-Instruct-v0.2 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.2 |
mistral-7b-instruct-v3 | LLM-Research/Mistral-7B-Instruct-v0.3 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.3 |
mistral-nemo-base-2407 | AI-ModelScope/Mistral-Nemo-Base-2407 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | mistralai/Mistral-Nemo-Base-2407 |
mistral-nemo-instruct-2407 | AI-ModelScope/Mistral-Nemo-Instruct-2407 | q_proj, k_proj, v_proj | mistral-nemo | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | mistralai/Mistral-Nemo-Instruct-2407 |
mistral-large-instruct-2407 | LLM-Research/Mistral-Large-Instruct-2407 | q_proj, k_proj, v_proj | mistral-nemo | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | mistralai/Mistral-Large-Instruct-2407 |
mistral-small-instruct-2409 | AI-ModelScope/Mistral-Small-Instruct-2409 | q_proj, k_proj, v_proj | mistral-nemo | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | mistralai/Mistral-Small-Instruct-2409 |
mixtral-moe-7b | AI-ModelScope/Mixtral-8x7B-v0.1 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | moe | mistralai/Mixtral-8x7B-v0.1 |
mixtral-moe-7b-instruct | AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | moe | mistralai/Mixtral-8x7B-Instruct-v0.1 |
mixtral-moe-7b-aqlm-2bit-1x16 | AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | ✘ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | moe | ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf |
mixtral-moe-8x22b-v1 | AI-ModelScope/Mixtral-8x22B-v0.1 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | moe | mistral-community/Mixtral-8x22B-v0.1 |
wizardlm2-7b-awq | AI-ModelScope/WizardLM-2-7B-AWQ | q_proj, k_proj, v_proj | wizardlm2-awq | ✔ | ✔ | ✘ | ✘ | transformers>=4.34 | - | MaziyarPanahi/WizardLM-2-7B-AWQ |
wizardlm2-8x22b | AI-ModelScope/WizardLM-2-8x22B | q_proj, k_proj, v_proj | wizardlm2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | alpindale/WizardLM-2-8x22B |
baichuan-7b | baichuan-inc/baichuan-7B | W_pack | default-generation | ✘ | ✔ | ✔ | ✘ | transformers<4.34 | - | baichuan-inc/Baichuan-7B |
baichuan-13b | baichuan-inc/Baichuan-13B-Base | W_pack | default-generation | ✘ | ✔ | ✔ | ✘ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Base |
baichuan-13b-chat | baichuan-inc/Baichuan-13B-Chat | W_pack | baichuan | ✘ | ✔ | ✔ | ✘ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Chat |
baichuan2-7b | baichuan-inc/Baichuan2-7B-Base | W_pack | default-generation | ✘ | ✔ | ✔ | ✘ | | - | baichuan-inc/Baichuan2-7B-Base |
baichuan2-7b-chat | baichuan-inc/Baichuan2-7B-Chat | W_pack | baichuan | ✘ | ✔ | ✔ | ✘ | | - | baichuan-inc/Baichuan2-7B-Chat |
baichuan2-7b-chat-int4 | baichuan-inc/Baichuan2-7B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-7B-Chat-4bits |
baichuan2-13b | baichuan-inc/Baichuan2-13B-Base | W_pack | default-generation | ✘ | ✔ | ✔ | ✘ | | - | baichuan-inc/Baichuan2-13B-Base |
baichuan2-13b-chat | baichuan-inc/Baichuan2-13B-Chat | W_pack | baichuan | ✘ | ✔ | ✔ | ✘ | | - | baichuan-inc/Baichuan2-13B-Chat |
baichuan2-13b-chat-int4 | baichuan-inc/Baichuan2-13B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-13B-Chat-4bits |
yuan2-2b-instruct | YuanLLM/Yuan2.0-2B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | ✘ | ✘ | | - | IEITYuan/Yuan2-2B-hf |
yuan2-2b-janus-instruct | YuanLLM/Yuan2-2B-Janus-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | ✘ | ✘ | | - | IEITYuan/Yuan2-2B-Janus-hf |
yuan2-51b-instruct | YuanLLM/Yuan2.0-51B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | ✘ | ✘ | | - | IEITYuan/Yuan2-51B-hf |
yuan2-102b-instruct | YuanLLM/Yuan2.0-102B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | ✘ | ✘ | | - | IEITYuan/Yuan2-102B-hf |
yuan2-m32 | YuanLLM/Yuan2-M32-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | ✘ | ✘ | | moe | IEITYuan/Yuan2-M32-hf |
xverse-7b | xverse/XVERSE-7B | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-7B |
xverse-7b-chat | xverse/XVERSE-7B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-7B-Chat |
xverse-13b | xverse/XVERSE-13B | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-13B |
xverse-13b-chat | xverse/XVERSE-13B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-13B-Chat |
xverse-65b | xverse/XVERSE-65B | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-65B |
xverse-65b-v2 | xverse/XVERSE-65B-2 | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-65B-2 |
xverse-65b-chat | xverse/XVERSE-65B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-65B-Chat |
xverse-13b-256k | xverse/XVERSE-13B-256K | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✘ | ✘ | | - | xverse/XVERSE-13B-256K |
xverse-moe-a4_2b | xverse/XVERSE-MoE-A4.2B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | ✘ | ✘ | | moe | xverse/XVERSE-MoE-A4.2B |
orion-14b | OrionStarAI/Orion-14B-Base | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | ✘ | ✘ | | - | OrionStarAI/Orion-14B-Base |
orion-14b-chat | OrionStarAI/Orion-14B-Chat | q_proj, k_proj, v_proj | orion | ✔ | ✘ | ✘ | ✘ | | - | OrionStarAI/Orion-14B-Chat |
bluelm-7b | vivo-ai/BlueLM-7B-Base | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | ✘ | ✘ | | - | vivo-ai/BlueLM-7B-Base |
bluelm-7b-32k | vivo-ai/BlueLM-7B-Base-32K | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | ✘ | ✘ | | - | vivo-ai/BlueLM-7B-Base-32K |
bluelm-7b-chat | vivo-ai/BlueLM-7B-Chat | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | ✘ | ✘ | | - | vivo-ai/BlueLM-7B-Chat |
bluelm-7b-chat-32k | vivo-ai/BlueLM-7B-Chat-32K | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | ✘ | ✘ | | - | vivo-ai/BlueLM-7B-Chat-32K |
ziya2-13b | Fengshenbang/Ziya2-13B-Base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | IDEA-CCNL/Ziya2-13B-Base |
ziya2-13b-chat | Fengshenbang/Ziya2-13B-Chat | q_proj, k_proj, v_proj | ziya | ✔ | ✔ | ✔ | ✘ | | - | IDEA-CCNL/Ziya2-13B-Chat |
skywork-13b | skywork/Skywork-13B-base | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | ✘ | ✘ | | - | Skywork/Skywork-13B-base |
skywork-13b-chat | skywork/Skywork-13B-chat | q_proj, k_proj, v_proj | skywork | ✘ | ✘ | ✘ | ✘ | | - | - |
zephyr-7b-beta-chat | modelscope/zephyr-7b-beta | q_proj, k_proj, v_proj | zephyr | ✔ | ✔ | ✔ | ✘ | transformers>=4.34 | - | HuggingFaceH4/zephyr-7b-beta |
polylm-13b | damo/nlp_polylm_13b_text_generation | c_attn | default-generation | ✘ | ✘ | ✘ | ✘ | | - | DAMO-NLP-MT/polylm-13b |
seqgpt-560m | damo/nlp_seqgpt-560m | query_key_value | default-generation | ✘ | ✔ | ✘ | ✘ | | - | DAMO-NLP/SeqGPT-560M |
sus-34b-chat | SUSTC/SUS-Chat-34B | q_proj, k_proj, v_proj | sus | ✔ | ✔ | ✔ | ✘ | | - | SUSTech/SUS-Chat-34B |
tongyi-finance-14b | TongyiFinance/Tongyi-Finance-14B | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | financial | - |
tongyi-finance-14b-chat | TongyiFinance/Tongyi-Finance-14B-Chat | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | financial | jxy/Tongyi-Finance-14B-Chat |
tongyi-finance-14b-chat-int4 | TongyiFinance/Tongyi-Finance-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | financial | jxy/Tongyi-Finance-14B-Chat-Int4 |
codefuse-codellama-34b-chat | codefuse-ai/CodeFuse-CodeLlama-34B | q_proj, k_proj, v_proj | codefuse-codellama | ✔ | ✔ | ✔ | ✘ | | coding | codefuse-ai/CodeFuse-CodeLlama-34B |
codefuse-codegeex2-6b-chat | codefuse-ai/CodeFuse-CodeGeeX2-6B | query_key_value | codefuse | ✘ | ✔ | ✘ | ✘ | transformers<4.34 | coding | codefuse-ai/CodeFuse-CodeGeeX2-6B |
codefuse-qwen-14b-chat | codefuse-ai/CodeFuse-QWen-14B | c_attn | codefuse | ✔ | ✔ | ✔ | ✘ | | coding | codefuse-ai/CodeFuse-QWen-14B |
phi2-3b | AI-ModelScope/phi-2 | Wqkv | default-generation | ✔ | ✔ | ✘ | ✘ | | coding | microsoft/phi-2 |
phi3-4b-4k-instruct | LLM-Research/Phi-3-mini-4k-instruct | qkv_proj | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | microsoft/Phi-3-mini-4k-instruct |
phi3-4b-128k-instruct | LLM-Research/Phi-3-mini-128k-instruct | qkv_proj | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | microsoft/Phi-3-mini-128k-instruct |
phi3-small-8k-instruct | LLM-Research/Phi-3-small-8k-instruct | query_key_value | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | microsoft/Phi-3-small-8k-instruct |
phi3-medium-4k-instruct | LLM-Research/Phi-3-medium-4k-instruct | qkv_proj | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | microsoft/Phi-3-medium-4k-instruct |
phi3-small-128k-instruct | LLM-Research/Phi-3-small-128k-instruct | query_key_value | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | microsoft/Phi-3-small-128k-instruct |
phi3-medium-128k-instruct | LLM-Research/Phi-3-medium-128k-instruct | qkv_proj | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | microsoft/Phi-3-medium-128k-instruct |
phi3_5-mini-instruct | LLM-Research/Phi-3.5-mini-instruct | qkv_proj | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | - | microsoft/Phi-3.5-mini-instruct |
phi3_5-moe-instruct | LLM-Research/Phi-3.5-MoE-instruct | q_proj, k_proj, v_proj | phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | moe | microsoft/Phi-3.5-MoE-instruct |
mamba-130m | AI-ModelScope/mamba-130m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-130m-hf |
mamba-370m | AI-ModelScope/mamba-370m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-370m-hf |
mamba-390m | AI-ModelScope/mamba-390m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-390m-hf |
mamba-790m | AI-ModelScope/mamba-790m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-790m-hf |
mamba-1.4b | AI-ModelScope/mamba-1.4b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-1.4b-hf |
mamba-2.8b | AI-ModelScope/mamba-2.8b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-2.8b-hf |
telechat-7b | TeleAI/TeleChat-7B | key_value, query | telechat | ✔ | ✘ | ✘ | ✘ | | - | Tele-AI/telechat-7B |
telechat-12b | TeleAI/TeleChat-12B | key_value, query | telechat | ✔ | ✘ | ✘ | ✘ | | - | Tele-AI/TeleChat-12B |
telechat-12b-v2 | TeleAI/TeleChat-12B-v2 | key_value, query | telechat | ✔ | ✘ | ✘ | ✘ | | - | Tele-AI/TeleChat-12B-v2 |
telechat-12b-v2-gptq-int4 | swift/TeleChat-12B-V2-GPTQ-Int4 | key_value, query | telechat | ✔ | ✘ | ✘ | ✘ | auto_gptq>=0.5 | - | - |
telechat2-115b | TeleAI/TeleChat2-115B | key_value, query | telechat2 | ✔ | ✘ | ✘ | ✘ | | - | Tele-AI/TeleChat2-115B |
grok-1 | colossalai/grok-1-pytorch | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | ✘ | ✘ | | - | hpcai-tech/grok-1 |
dbrx-instruct | AI-ModelScope/dbrx-instruct | attn.Wqkv | dbrx | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | moe | databricks/dbrx-instruct |
dbrx-base | AI-ModelScope/dbrx-base | attn.Wqkv | dbrx | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | moe | databricks/dbrx-base |
mengzi3-13b-base | langboat/Mengzi3-13B-Base | q_proj, k_proj, v_proj | mengzi | ✔ | ✔ | ✘ | ✘ | | - | Langboat/Mengzi3-13B-Base |
c4ai-command-r-v01 | AI-ModelScope/c4ai-command-r-v01 | q_proj, k_proj, v_proj | c4ai | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.1 | - | CohereForAI/c4ai-command-r-v01 |
c4ai-command-r-plus | AI-ModelScope/c4ai-command-r-plus | q_proj, k_proj, v_proj | c4ai | ✔ | ✔ | ✘ | ✘ | transformers>4.39 | - | CohereForAI/c4ai-command-r-plus |
codestral-22b | swift/Codestral-22B-v0.1 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.34 | - | mistralai/Codestral-22B-v0.1 |
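
The Requires column above lists package constraints such as `transformers>=4.38, aqlm, torch>=2.2.0`. The following is a minimal sketch of how such a cell could be parsed and checked against a set of installed versions. It is not SWIFT's own dependency check: the helper names (`parse_requires`, `satisfies`, `unmet_requirements`) and the `installed` versions are hypothetical, and the version comparison is a simplification that ignores non-numeric parts such as `dev`.

```python
import operator
import re

# Comparison operators that appear in the Requires column.
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}

# package name, optional operator, optional version (e.g. "torch>=2.2.0" or "aqlm")
_SPEC = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(>=|<=|==|>|<)?\s*([0-9A-Za-z.]*)\s*$")


def version_key(version):
    """Turn '4.39.3' into (4, 39, 3); non-numeric parts such as 'dev' are dropped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def parse_requires(cell):
    """Split a Requires cell into (package, operator, version) triples."""
    if cell.strip() in ("", "-"):
        return []
    triples = []
    for item in cell.split(","):
        match = _SPEC.match(item)
        if match is None:
            raise ValueError(f"unrecognised requirement: {item!r}")
        name, op, version = match.groups()
        triples.append((name, op or "", version))
    return triples


def satisfies(installed, op, required):
    """Check one installed version against one constraint; no operator accepts any version."""
    if not op:
        return True
    a, b = version_key(installed), version_key(required)
    # Pad with zeros so that 4.38 and 4.38.0 compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return _OPS[op](a, b)


def unmet_requirements(cell, installed):
    """Return the constraints in `cell` that `installed` (name -> version) does not meet."""
    return [
        f"{name}{op}{version}"
        for name, op, version in parse_requires(cell)
        if name not in installed or not satisfies(installed[name], op, version)
    ]


# Hypothetical environment, used only for illustration.
installed = {"transformers": "4.40.0", "aqlm": "1.1.0", "torch": "2.1.2"}
print(unmet_requirements("transformers>=4.38, aqlm, torch>=2.2.0", installed))
# prints ['torch>=2.2.0']
```

In a real environment the `installed` mapping could be filled from the standard library's `importlib.metadata.version`, and `packaging.specifiers.SpecifierSet` is the better choice when full PEP 440 semantics (pre-releases, dev releases) matter.
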
Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support vLLM | Support LMDeploy | Support Megatron | Requires | Tags | HF Model ID |
---|---|---|---|---|---|---|---|---|---|---|
qwen-vl | qwen/Qwen-VL | ^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen-vl-generation | ✔ | ✔ | ✔ | ✘ | | vision | Qwen/Qwen-VL |
qwen-vl-chat | qwen/Qwen-VL-Chat | ^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen-vl | ✔ | ✔ | ✔ | ✘ | | vision | Qwen/Qwen-VL-Chat |
qwen-vl-chat-int4 | qwen/Qwen-VL-Chat-Int4 | ^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen-vl | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | vision | Qwen/Qwen-VL-Chat-Int4 |
qwen-audio | qwen/Qwen-Audio | ^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen-audio-generation | ✔ | ✘ | ✘ | ✘ | | audio | Qwen/Qwen-Audio |
qwen-audio-chat | qwen/Qwen-Audio-Chat | ^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen-audio | ✔ | ✘ | ✘ | ✘ | | audio | Qwen/Qwen-Audio-Chat |
qwen2-audio-7b | qwen/Qwen2-Audio-7B | ^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-audio-generation | ✔ | ✘ | ✘ | ✘ | librosa, transformers>=4.45 | audio | Qwen/Qwen2-Audio-7B |
qwen2-audio-7b-instruct | qwen/Qwen2-Audio-7B-Instruct | ^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-audio | ✔ | ✘ | ✘ | ✘ | librosa, transformers>=4.45 | audio | Qwen/Qwen2-Audio-7B-Instruct |
qwen2-vl-2b | qwen/Qwen2-VL-2B | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils | vision, video | Qwen/Qwen2-VL-2B |
qwen2-vl-2b-instruct | qwen/Qwen2-VL-2B-Instruct | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils | vision, video | Qwen/Qwen2-VL-2B-Instruct |
qwen2-vl-2b-instruct-gptq-int4 | qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 | vision, video | Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 |
qwen2-vl-2b-instruct-gptq-int8 | qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8 | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 | vision, video | Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8 |
qwen2-vl-2b-instruct-awq | qwen/Qwen2-VL-2B-Instruct-AWQ | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, autoawq | vision, video | Qwen/Qwen2-VL-2B-Instruct-AWQ |
qwen2-vl-7b | qwen/Qwen2-VL-7B | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils | vision, video | Qwen/Qwen2-VL-7B |
qwen2-vl-7b-instruct | qwen/Qwen2-VL-7B-Instruct | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils | vision, video | Qwen/Qwen2-VL-7B-Instruct |
qwen2-vl-7b-instruct-gptq-int4 | qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 | ^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 | vision, video | Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 |
qwen2-vl-7b-instruct-gptq-int8 | qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8 | ^(model)(?!.*(lm_head|output|emb|wte|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 | vision, video | Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8 |
qwen2-vl-7b-instruct-awq | qwen/Qwen2-VL-7B-Instruct-AWQ | ^(model)(?!.*(lm_head|output|emb|wte|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, autoawq | vision, video | Qwen/Qwen2-VL-7B-Instruct-AWQ |
qwen2-vl-72b | qwen/Qwen2-VL-72B | ^(model)(?!.*(lm_head|output|emb|wte|shared)).* | qwen2-vl-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils | vision, video | Qwen/Qwen2-VL-72B |
qwen2-vl-72b-instruct | qwen/Qwen2-VL-72B-Instruct | ^(model)(?!.*(lm_head|output|emb|wte|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils | vision, video | Qwen/Qwen2-VL-72B-Instruct |
qwen2-vl-72b-instruct-gptq-int4 | qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 | ^(model)(?!.*(lm_head|output|emb|wte|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 | vision, video | Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 |
qwen2-vl-72b-instruct-gptq-int8 | qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8 | ^(model)(?!.*(lm_head|output|emb|wte|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 | vision, video | Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8 |
qwen2-vl-72b-instruct-awq | qwen/Qwen2-VL-72B-Instruct-AWQ | ^(model)(?!.*(lm_head|output|emb|wte|shared)).* | qwen2-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.45.dev.0, qwen_vl_utils, autoawq | vision, video | Qwen/Qwen2-VL-72B-Instruct-AWQ |
glm4v-9b-chat | ZhipuAI/glm-4v-9b | ^(transformer.encoder)(?!.*(lm_head|output|emb|wte|shared)).* | glm4v | ✘ | ✘ | ✘ | ✘ | transformers>=4.42 | vision | THUDM/glm-4v-9b |
llama3_2-11b-vision | LLM-Research/Llama-3.2-11B-Vision | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama3_2-vision-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.45 | vision | meta-llama/Llama-3.2-11B-Vision |
llama3_2-11b-vision-instruct | LLM-Research/Llama-3.2-11B-Vision-Instruct | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama3_2-vision | ✔ | ✔ | ✘ | ✘ | transformers>=4.45 | vision | meta-llama/Llama-3.2-11B-Vision-Instruct |
llama3_2-90b-vision | LLM-Research/Llama-3.2-90B-Vision | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama3_2-vision-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.45 | vision | meta-llama/Llama-3.2-90B-Vision |
llama3_2-90b-vision-instruct | LLM-Research/Llama-3.2-90B-Vision-Instruct | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama3_2-vision | ✔ | ✔ | ✘ | ✘ | transformers>=4.45 | vision | meta-llama/Llama-3.2-90B-Vision-Instruct |
llama3_1-8b-omni | ICTNLP/Llama-3.1-8B-Omni | ^(model.layers|model.speech_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama3_1-omni | ✔ | ✘ | ✘ | ✘ | whisper, openai-whisper | audio | ICTNLP/Llama-3.1-8B-Omni |
idefics3-8b-llama3 | AI-ModelScope/Idefics3-8B-Llama3 | ^(model.text_model|model.connector)(?!.*(lm_head|output|emb|wte|shared)).* | idefics3 | ✔ | ✘ | ✘ | ✘ | transformers>=4.45 | vision | HuggingFaceM4/Idefics3-8B-Llama3 |
llava1_5-7b-instruct | swift/llava-1.5-7b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava1_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | vision | llava-hf/llava-1.5-7b-hf |
llava1_5-13b-instruct | swift/llava-1.5-13b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava1_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | vision | llava-hf/llava-1.5-13b-hf |
llava1_6-mistral-7b-instruct | swift/llava-v1.6-mistral-7b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-mistral | ✔ | ✔ | ✘ | ✘ | transformers>=4.39 | vision | llava-hf/llava-v1.6-mistral-7b-hf |
llava1_6-vicuna-7b-instruct | swift/llava-v1.6-vicuna-7b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-vicuna | ✔ | ✔ | ✘ | ✘ | transformers>=4.39 | vision | llava-hf/llava-v1.6-vicuna-7b-hf |
llava1_6-vicuna-13b-instruct | swift/llava-v1.6-vicuna-13b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-vicuna | ✔ | ✔ | ✘ | ✘ | transformers>=4.39 | vision | llava-hf/llava-v1.6-vicuna-13b-hf |
llava1_6-llama3_1-8b-instruct | DaozeZhang/llava-llama3.1-8b | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-next-llama3 | ✔ | ✘ | ✘ | ✘ | transformers>=4.41 | vision | - |
llava1_6-yi-34b-instruct | swift/llava-v1.6-34b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-yi | ✔ | ✔ | ✘ | ✘ | transformers>=4.39 | vision | llava-hf/llava-v1.6-34b-hf |
llama3-llava-next-8b-hf | swift/llama3-llava-next-8b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama-llava-next-hf | ✔ | ✔ | ✘ | ✘ | transformers>=4.39 | vision | llava-hf/llama3-llava-next-8b-hf |
llava-next-72b-hf | AI-ModelScope/llava-next-72b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama-qwen-hf | ✔ | ✔ | ✘ | ✘ | transformers>=4.39 | vision | llava-hf/llava-next-72b-hf |
llava-next-110b-hf | AI-ModelScope/llava-next-110b-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama-qwen-hf | ✔ | ✔ | ✘ | ✘ | transformers>=4.39 | vision | llava-hf/llava-next-110b-hf |
llava-onevision-qwen2-0_5b-ov | AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-onevision-qwen | ✔ | ✘ | ✘ | ✘ | transformers>=4.45 | vision, video | llava-hf/llava-onevision-qwen2-0.5b-ov-hf |
llava-onevision-qwen2-7b-ov | AI-ModelScope/llava-onevision-qwen2-7b-ov-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-onevision-qwen | ✔ | ✘ | ✘ | ✘ | transformers>=4.45 | vision, video | llava-hf/llava-onevision-qwen2-7b-ov-hf |
llava-onevision-qwen2-72b-ov | AI-ModelScope/llava-onevision-qwen2-72b-ov-hf | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-onevision-qwen | ✔ | ✘ | ✘ | ✘ | transformers>=4.45 | vision, video | llava-hf/llava-onevision-qwen2-72b-ov-hf |
llama3-llava-next-8b | AI-Modelscope/llama3-llava-next-8b | ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llama3-llava-next | ✔ | ✘ | ✘ | ✘ | | vision | lmms-lab/llama3-llava-next-8b |
llava-next-72b | AI-Modelscope/llava-next-72b | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-qwen | ✔ | ✘ | ✘ | ✘ | | vision | lmms-lab/llava-next-72b |
llava-next-110b | AI-Modelscope/llava-next-110b | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-qwen | ✔ | ✘ | ✘ | ✘ | | vision | lmms-lab/llava-next-110b |
llava-next-video-7b-instruct | swift/LLaVA-NeXT-Video-7B-hf | ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* | llava-next-video | ✔ | ✔ | ✘ | ✘ | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-7B-hf |
llava-next-video-7b-32k-instruct | swift/LLaVA-NeXT-Video-7B-32K-hf | ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* | llava-next-video | ✔ | ✔ | ✘ | ✘ | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-7B-32K-hf |
llava-next-video-7b-dpo-instruct | swift/LLaVA-NeXT-Video-7B-DPO-hf | ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* | llava-next-video | ✔ | ✔ | ✘ | ✘ | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-7B-DPO-hf |
llava-next-video-34b-instruct | swift/LLaVA-NeXT-Video-34B-hf | ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* | llava-next-video-yi | ✔ | ✔ | ✘ | ✘ | transformers>=4.42, av | video | llava-hf/LLaVA-NeXT-Video-34B-hf |
yi-vl-6b-chat | 01ai/Yi-VL-6B | ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* | yi-vl | ✔ | ✘ | ✘ | ✘ | transformers>=4.34 | vision | 01-ai/Yi-VL-6B |
yi-vl-34b-chat | 01ai/Yi-VL-34B | ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* | yi-vl | ✔ | ✘ | ✘ | ✘ | transformers>=4.34 | vision | 01-ai/Yi-VL-34B |
llava-llama3-8b-v1_1 | AI-ModelScope/llava-llama-3-8b-v1_1-transformers | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | llava-llama-instruct | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | vision | xtuner/llava-llama-3-8b-v1_1-transformers |
internlm-xcomposer2-7b-chat | Shanghai_AI_Laboratory/internlm-xcomposer2-7b | attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 | internlm-xcomposer2 | ✔ | ✘ | ✔ | ✘ | | vision | internlm/internlm-xcomposer2-7b |
internlm-xcomposer2-4khd-7b-chat | Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b | attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 | internlm-xcomposer2-4khd | ✔ | ✘ | ✔ | ✘ | | vision | internlm/internlm-xcomposer2-4khd-7b |
internlm-xcomposer2_5-7b-chat | Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b | attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 | internlm-xcomposer2_5 | ✔ | ✘ | ✔ | ✘ | | vision | internlm/internlm-xcomposer2d5-7b |
internvl-chat-v1_5 | AI-ModelScope/InternVL-Chat-V1-5 | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl | ✔ | ✔ | ✔ | ✘ | transformers>=4.35, timm | vision | OpenGVLab/InternVL-Chat-V1-5 |
internvl-chat-v1_5-int8 | AI-ModelScope/InternVL-Chat-V1-5-int8 | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl | ✔ | ✘ | ✘ | ✘ | transformers>=4.35, timm | vision | OpenGVLab/InternVL-Chat-V1-5-int8 |
mini-internvl-chat-2b-v1_5 | OpenGVLab/Mini-InternVL-Chat-2B-V1-5 | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl | ✔ | ✔ | ✔ | ✘ | transformers>=4.35, timm | vision | OpenGVLab/Mini-InternVL-Chat-2B-V1-5 |
mini-internvl-chat-4b-v1_5 | OpenGVLab/Mini-InternVL-Chat-4B-V1-5 | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl-phi3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.35,<4.42, timm | vision | OpenGVLab/Mini-InternVL-Chat-4B-V1-5 |
internvl2-1b | OpenGVLab/InternVL2-1B | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-1B |
internvl2-2b | OpenGVLab/InternVL2-2B | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-2B |
internvl2-4b | OpenGVLab/InternVL2-4B | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2-phi3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36,<4.42, timm | vision, video | OpenGVLab/InternVL2-4B |
internvl2-8b | OpenGVLab/InternVL2-8B | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-8B |
internvl2-26b | OpenGVLab/InternVL2-26B | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-26B |
internvl2-40b | OpenGVLab/InternVL2-40B | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-40B |
internvl2-llama3-76b | OpenGVLab/InternVL2-Llama3-76B | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-Llama3-76B |
internvl2-2b-awq | OpenGVLab/InternVL2-2B-AWQ | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-2B-AWQ |
internvl2-8b-awq | OpenGVLab/InternVL2-8B-AWQ | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-8B-AWQ |
internvl2-26b-awq | OpenGVLab/InternVL2-26B-AWQ | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-26B-AWQ |
internvl2-40b-awq | OpenGVLab/InternVL2-40B-AWQ | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-40B-AWQ |
internvl2-llama3-76b-awq | OpenGVLab/InternVL2-Llama3-76B-AWQ | ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* | internvl2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.36, timm | vision, video | OpenGVLab/InternVL2-Llama3-76B-AWQ |
deepseek-vl-1_3b-chat | deepseek-ai/deepseek-vl-1.3b-chat | ^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* | deepseek-vl | ✔ | ✘ | ✔ | ✘ | | vision | deepseek-ai/deepseek-vl-1.3b-chat |
deepseek-vl-7b-chat | deepseek-ai/deepseek-vl-7b-chat | ^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* | deepseek-vl | ✔ | ✘ | ✔ | ✘ | | vision | deepseek-ai/deepseek-vl-7b-chat |
ovis1_6-gemma2-9b | AIDC-AI/Ovis1.6-Gemma2-9B | ^(llm)(?!.*(lm_head|output|emb|wte|shared)).* | ovis1_6 | ✔ | ✘ | ✘ | ✘ | transformers>=4.42 | vision | AIDC-AI/Ovis1.6-Gemma2-9B |
paligemma-3b-pt-224 | AI-ModelScope/paligemma-3b-pt-224 | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | paligemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-pt-224 |
paligemma-3b-pt-448 | AI-ModelScope/paligemma-3b-pt-448 | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | paligemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-pt-448 |
paligemma-3b-pt-896 | AI-ModelScope/paligemma-3b-pt-896 | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | paligemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-pt-896 |
paligemma-3b-mix-224 | AI-ModelScope/paligemma-3b-mix-224 | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | paligemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-mix-224 |
paligemma-3b-mix-448 | AI-ModelScope/paligemma-3b-mix-448 | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | paligemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-mix-448 |
minicpm-v-3b-chat | OpenBMB/MiniCPM-V | ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* | minicpm-v | ✔ | ✘ | ✘ | ✘ | timm, transformers<4.42 | vision | openbmb/MiniCPM-V |
minicpm-v-v2-chat | OpenBMB/MiniCPM-V-2 | ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* | minicpm-v | ✔ | ✘ | ✘ | ✘ | timm, transformers<4.42 | vision | openbmb/MiniCPM-V-2 |
minicpm-v-v2_5-chat | OpenBMB/MiniCPM-Llama3-V-2_5 | ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* | minicpm-v-v2_5 | ✔ | ✔ | ✘ | ✘ | timm, transformers>=4.36 | vision | openbmb/MiniCPM-Llama3-V-2_5 |
minicpm-v-v2_6-chat | OpenBMB/MiniCPM-V-2_6 | ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* | minicpm-v-v2_6 | ✔ | ✔ | ✘ | ✘ | timm, transformers>=4.36 | vision, video | openbmb/MiniCPM-V-2_6 |
pixtral-12b | AI-ModelScope/pixtral-12b | ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* | pixtral | ✘ | ✘ | ✘ | ✘ | transformers>=4.45 | vision | mistral-community/pixtral-12b |
mplug-owl2-chat | iic/mPLUG-Owl2 | q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 | mplug-owl2 | ✔ | ✘ | ✘ | ✘ | transformers<4.35, icecream | vision | MAGAer13/mplug-owl2-llama2-7b |
mplug-owl2_1-chat | iic/mPLUG-Owl2.1 | c_attn.multiway.0, c_attn.multiway.1 | mplug-owl2 | ✔ | ✘ | ✘ | ✘ | transformers<4.35, icecream | vision | Mizukiluke/mplug_owl_2_1 |
mplug-owl3-1b-chat | iic/mPLUG-Owl3-1B-241014 | ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* | mplug_owl3 | ✔ | ✘ | ✘ | ✘ | transformers>=4.36, icecream | vision, video | mPLUG/mPLUG-Owl3-1B-241014 |
mplug-owl3-2b-chat | iic/mPLUG-Owl3-2B-241014 | ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* | mplug_owl3 | ✔ | ✘ | ✘ | ✘ | transformers>=4.36, icecream | vision, video | mPLUG/mPLUG-Owl3-2B-241014 |
mplug-owl3-7b-chat | iic/mPLUG-Owl3-7B-240728 | ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* | mplug_owl3 | ✔ | ✘ | ✘ | ✘ | transformers>=4.36, icecream | vision, video | mPLUG/mPLUG-Owl3-7B-240728 |
phi3-vision-128k-instruct | LLM-Research/Phi-3-vision-128k-instruct | ^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* | phi3-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | vision | microsoft/Phi-3-vision-128k-instruct |
phi3_5-vision-instruct | LLM-Research/Phi-3.5-vision-instruct | ^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* | phi3-vl | ✔ | ✔ | ✘ | ✘ | transformers>=4.36 | vision | microsoft/Phi-3.5-vision-instruct |
cogvlm-17b-chat | ZhipuAI/cogvlm-chat | ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* | cogvlm | ✘ | ✘ | ✘ | ✘ | transformers<4.42 | vision | THUDM/cogvlm-chat-hf |
cogvlm2-19b-chat | ZhipuAI/cogvlm2-llama3-chinese-chat-19B | ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* | cogvlm | ✘ | ✘ | ✔ | ✘ | transformers<4.42 | vision | THUDM/cogvlm2-llama3-chinese-chat-19B |
cogvlm2-en-19b-chat | ZhipuAI/cogvlm2-llama3-chat-19B | ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* | cogvlm | ✘ | ✘ | ✔ | ✘ | transformers<4.42 | vision | THUDM/cogvlm2-llama3-chat-19B |
cogvlm2-video-13b-chat | ZhipuAI/cogvlm2-video-llama3-chat | ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* | cogvlm2-video | ✘ | ✘ | ✘ | ✘ | decord, pytorchvideo, transformers>=4.42 | vision, video | THUDM/cogvlm2-video-llama3-chat |
cogagent-18b-chat | ZhipuAI/cogagent-chat | ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* | cogagent-chat | ✘ | ✘ | ✘ | ✘ | timm | vision | THUDM/cogagent-chat-hf |
cogagent-18b-instruct | ZhipuAI/cogagent-vqa | ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* | cogagent-instruct | ✘ | ✘ | ✘ | ✘ | timm | vision | THUDM/cogagent-vqa-hf |
molmoe-1b | LLM-Research/MolmoE-1B-0924 | ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* | molmo | ✔ | ✘ | ✘ | ✘ | transformers>=4.45.0 | vision | allenai/MolmoE-1B-0924 |
molmo-7b-o | LLM-Research/Molmo-7B-O-0924 | ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* | molmo | ✔ | ✘ | ✘ | ✘ | transformers>=4.45.0 | vision | allenai/Molmo-7B-O-0924 |
molmo-7b-d | LLM-Research/Molmo-7B-D-0924 | ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* | molmo | ✔ | ✘ | ✘ | ✘ | transformers>=4.45.0 | vision | allenai/Molmo-7B-D-0924 |
molmo-72b | LLM-Research/Molmo-72B-0924 | ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* | molmo | ✔ | ✘ | ✘ | ✘ | transformers>=4.45.0 | vision | allenai/Molmo-72B-0924 |
florence-2-base | AI-ModelScope/Florence-2-base | ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* | florence | ✔ | ✘ | ✘ | ✘ | | vision | microsoft/Florence-2-base |
florence-2-base-ft | AI-ModelScope/Florence-2-base-ft | ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* | florence | ✔ | ✘ | ✘ | ✘ | | vision | microsoft/Florence-2-base-ft |
florence-2-large | AI-ModelScope/Florence-2-large | ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* | florence | ✔ | ✘ | ✘ | ✘ | | vision | microsoft/Florence-2-large |
florence-2-large-ft | AI-ModelScope/Florence-2-large-ft | ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* | florence | ✔ | ✘ | ✘ | ✘ | | vision | microsoft/Florence-2-large-ft |
got-ocr2 | stepfun-ai/GOT-OCR2_0 | ^(model.layers|model.mm_projector_vary)(?!.*(lm_head|output|emb|wte|shared)).* | got_ocr2 | ✔ | ✘ | ✘ | ✘ | | audio | stepfun-ai/GOT-OCR2_0 |
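
Many rows above give Default Lora Target Modules as a regular expression rather than a list of module names. The following is a minimal sketch of how such a pattern could select modules, assuming it is applied as a full match against each module's dotted name. The pattern is copied from the qwen2-vl rows; the module names and the `select_modules` helper are hypothetical and only illustrate the matching.

```python
import re

# Pattern copied from the qwen2-vl rows of the table above.
QWEN2_VL_TARGET = r"^(model)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical module names, for illustration only.
MODULE_NAMES = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "model.embed_tokens",
    "visual.blocks.0.attn.qkv",
    "lm_head",
]


def select_modules(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


selected = select_modules(QWEN2_VL_TARGET, MODULE_NAMES)
print(selected)
```

In this sketch the prefix group keeps only names under `model`, and the negative lookahead drops any name containing `lm_head`, `output`, `emb`, `wte`, or `shared`, so the embedding layer and the vision tower are left out.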
The table below introduces the datasets supported by SWIFT:
- Dataset Name: The dataset name registered in SWIFT.
- Dataset ID: The dataset id in ModelScope.
- Dataset Size: The number of rows in the dataset.
- Statistic: Token-count statistics for the dataset, which help when choosing the max_length hyperparameter. The training and validation sets are concatenated before the statistics are computed, and the text is tokenized with qwen's tokenizer. Different tokenizers produce different statistics; to obtain token statistics for another model's tokenizer, you can use the script to compute them yourself.
Dataset Name | Dataset ID | Subsets | Dataset Size | Statistic (token) | Tags | HF Dataset ID |
---|---|---|---|---|---|---|
🔥ms-bench | iic/ms_bench | | 316820 | 346.9±443.2, min=22, max=30960 | chat, general, multi-round | - |
🔥alpaca-en | AI-ModelScope/alpaca-gpt4-data-en | | 52002 | 176.2±125.8, min=26, max=740 | chat, general | vicgalle/alpaca-gpt4 |
🔥alpaca-zh | AI-ModelScope/alpaca-gpt4-data-zh | | 48818 | 162.1±93.9, min=26, max=856 | chat, general | llm-wizard/alpaca-gpt4-data-zh |
multi-alpaca | damo/nlp_polylm_multialpaca_sft | ar de es fr id ja ko pt ru th vi | 131867 | 112.9±50.6, min=26, max=1226 | chat, general, multilingual | - |
instinwild | wyj123456/instinwild | default subset | 103695 | 145.4±60.7, min=28, max=1434 | - | - |
cot-en | YorickHe/CoT | | 74771 | 122.7±64.8, min=51, max=8320 | chat, general | - |
cot-zh | YorickHe/CoT_zh | | 74771 | 117.5±70.8, min=43, max=9636 | chat, general | - |
instruct-en | wyj123456/instruct | | 888970 | 269.1±331.5, min=26, max=7254 | chat, general | - |
firefly-zh | AI-ModelScope/firefly-train-1.1M | | 1649399 | 178.1±260.4, min=26, max=12516 | chat, general | YeungNLP/firefly-train-1.1M |
gpt4all-en | wyj123456/GPT4all | | 806199 | 302.7±384.5, min=27, max=7391 | chat, general | - |
sharegpt | swift/sharegpt | common-zh computer-zh unknow-zh common-en computer-en | 96566 | 933.3±864.8, min=21, max=66412 | chat, general, multi-round | - |
tulu-v2-sft-mixture | AI-ModelScope/tulu-v2-sft-mixture | | 5119 | 520.7±437.6, min=68, max=2549 | chat, multilingual, general, multi-round | allenai/tulu-v2-sft-mixture |
wikipedia-zh | AI-ModelScope/wikipedia-cn-20230720-filtered | | 254547 | 568.4±713.2, min=37, max=78678 | text-generation, general, pretrained | pleisto/wikipedia-cn-20230720-filtered |
open-orca | AI-ModelScope/OpenOrca | | 994896 | 382.3±417.4, min=31, max=8740 | chat, multilingual, general | - |
🔥sharegpt-gpt4 | AI-ModelScope/sharegpt_gpt4 | default V3_format zh_38K_format | 72684 | 1047.6±1313.1, min=22, max=66412 | chat, multilingual, general, multi-round, gpt4 | - |
deepctrl-sft | AI-ModelScope/deepctrl-sft-data | default en | 14149024 | 389.8±628.6, min=21, max=626237 | chat, general, sft, multi-round | - |
🔥coig-cqia | AI-ModelScope/COIG-CQIA | chinese_traditional coig_pc exam finance douban human_value logi_qa ruozhiba segmentfault wiki wikihow xhs zhihu | 44694 | 703.8±654.2, min=33, max=19288 | general | - |
🔥ruozhiba | AI-ModelScope/ruozhiba | post-annual title-good title-norm | 85658 | 39.9±13.1, min=21, max=559 | pretrain | - |
long-alpaca-12k | AI-ModelScope/LongAlpaca-12k | | 11998 | 9619.0±8295.8, min=36, max=78925 | longlora, QA | Yukang/LongAlpaca-12k |
lmsys-chat-1m | AI-ModelScope/lmsys-chat-1m | | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, em | lmsys/lmsys-chat-1m |
🔥ms-agent | iic/ms_agent | | 26336 | 650.9±217.2, min=209, max=2740 | chat, agent, multi-round | - |
🔥ms-agent-for-agentfabric | AI-ModelScope/ms_agent_for_agentfabric | default addition | 30000 | 617.8±199.1, min=251, max=2657 | chat, agent, multi-round | - |
ms-agent-multirole | iic/MSAgent-MultiRole | | 9500 | 447.6±84.9, min=145, max=1101 | chat, agent, multi-round, role-play, multi-agent | - |
🔥toolbench-for-alpha-umi | shenweizhou/alpha-umi-toolbench-processed-v2 | backbone caller planner summarizer | 1448337 | 1439.7±853.9, min=123, max=18467 | chat, agent | - |
damo-agent-zh | damo/MSAgent-Bench | | 386984 | 956.5±407.3, min=326, max=19001 | chat, agent, multi-round | - |
damo-agent-zh-mini | damo/MSAgent-Bench | | 20845 | 1326.4±329.6, min=571, max=4304 | chat, agent, multi-round | - |
agent-instruct-all-en | huangjintao/AgentInstruct_copy | alfworld db kg mind2web os webshop | 1866 | 1144.3±635.5, min=206, max=6412 | chat, agent, multi-round | - |
🔥msagent-pro | iic/MSAgent-Pro | | 21905 | 1524.5±921.3, min=64, max=16770 | chat, agent, multi-round | - |
toolbench | swift/ToolBench | | 124345 | 3669.5±1600.9, min=1047, max=22581 | chat, agent, multi-round | - |
code-alpaca-en | wyj123456/code_alpaca_en | | 20016 | 100.2±60.1, min=29, max=1776 | - | sahil2801/CodeAlpaca-20k |
🔥leetcode-python-en | AI-ModelScope/leetcode-solutions-python | | 2359 | 727.1±235.9, min=259, max=2146 | chat, coding | - |
🔥codefuse-python-en | codefuse-ai/CodeExercise-Python-27k | | 27224 | 483.6±193.9, min=45, max=3082 | chat, coding | - |
🔥codefuse-evol-instruction-zh | codefuse-ai/Evol-instruction-66k | | 66862 | 439.6±206.3, min=37, max=2983 | chat, coding | - |
medical-en | swift/medical_zh | en | 117617 | 257.4±89.1, min=36, max=2564 | chat, medical | - |
medical-zh | swift/medical_zh | zh | 1950972 | 167.2±219.7, min=26, max=27351 | chat, medical | - |
🔥disc-med-sft-zh | AI-ModelScope/DISC-Med-SFT | | 441767 | 354.1±193.1, min=25, max=2231 | chat, medical | Flmc/DISC-Med-SFT |
lawyer-llama-zh | AI-ModelScope/lawyer_llama_data | | 21476 | 194.4±91.7, min=27, max=924 | chat, law | Skepsun/lawyer_llama_data |
tigerbot-law-zh | AI-ModelScope/tigerbot-law-plugin | | 55895 | 109.9±126.4, min=37, max=18878 | text-generation, law, pretrained | TigerResearch/tigerbot-law-plugin |
🔥disc-law-sft-zh | AI-ModelScope/DISC-Law-SFT | | 166758 | 533.7±495.4, min=30, max=15169 | chat, law | ShengbinYue/DISC-Law-SFT |
🔥blossom-math-zh | AI-ModelScope/blossom-math-v2 | | 10000 | 169.3±58.7, min=35, max=563 | chat, math | Azure99/blossom-math-v2 |
school-math-zh | AI-ModelScope/school_math_0.25M | | 248480 | 157.7±72.2, min=33, max=3450 | chat, math, quality | BelleGroup/school_math_0.25M |
open-platypus-en | AI-ModelScope/Open-Platypus | | 24926 | 367.9±254.8, min=30, max=3951 | chat, math, quality | garage-bAInd/Open-Platypus |
text2sql-en | AI-ModelScope/texttosqlv2_25000_v2 | | 25000 | 274.6±326.4, min=38, max=1975 | chat, sql | Clinton/texttosqlv2_25000_v2 |
🔥sql-create-context-en | AI-ModelScope/sql-create-context | | 78577 | 80.2±17.8, min=36, max=456 | chat, sql | b-mc2/sql-create-context |
synthetic-text-to-sql | AI-ModelScope/synthetic_text_to_sql | default | 100000 | 283.4±115.8, min=61, max=1356 | nl2sql, en | gretelai/synthetic_text_to_sql |
🔥advertise-gen-zh | lvjianjin/AdvertiseGen | | 98399 | 130.6±21.7, min=51, max=241 | text-generation | shibing624/AdvertiseGen |
🔥dureader-robust-zh | modelscope/DuReader_robust-QG | | 17899 | 241.1±137.4, min=60, max=1416 | text-generation | - |
cmnli-zh | modelscope/clue | cmnli | 404024 | 82.6±16.6, min=51, max=199 | text-generation, classification | clue |
🔥jd-sentiment-zh | DAMO_NLP/jd | | 50000 | 66.0±83.2, min=39, max=4039 | text-generation, classification | - |
🔥hc3-zh | simpleai/HC3-Chinese | baike open_qa nlpcc_dbqa finance medicine law psychology | 39781 | 176.8±81.5, min=57, max=3051 | text-generation, classification | Hello-SimpleAI/HC3-Chinese |
🔥hc3-en | simpleai/HC3 | finance medicine | 11021 | 298.3±138.7, min=65, max=2267 | text-generation, classification | Hello-SimpleAI/HC3 |
dolly-15k | AI-ModelScope/databricks-dolly-15k | default | 15011 | 199.2±267.8, min=22, max=8615 | multi-task, en, quality | databricks/databricks-dolly-15k |
zhihu-kol | OmniData/Zhihu-KOL | default | - | Dataset is too huge, please click the original link to view the dataset stat. | zhihu, qa | wangrui6/Zhihu-KOL |
zhihu-kol-filtered | OmniData/Zhihu-KOL-More-Than-100-Upvotes | default | 271261 | 952.0±1727.2, min=25, max=98658 | zhihu, qa | bzb2023/Zhihu-KOL-More-Than-100-Upvotes |
finance-en | wyj123456/finance_en | | 68911 | 135.6±134.3, min=26, max=3525 | chat, financial | ssbuild/alpaca_finance_en |
poetry-zh | modelscope/chinese-poetry-collection | | 390309 | 55.2±9.4, min=23, max=83 | text-generation, poetry | - |
webnovel-zh | AI-ModelScope/webnovel_cn | | 50000 | 1478.9±11526.1, min=100, max=490484 | chat, novel | zxbsmk/webnovel_cn |
generated-chat-zh | AI-ModelScope/generated_chat_0.4M | | 396004 | 273.3±52.0, min=32, max=873 | chat, character-dialogue | BelleGroup/generated_chat_0.4M |
🔥self-cognition | swift/self-cognition | | 134 | 53.6±18.6, min=29, max=121 | chat, self-cognition | modelscope/self-cognition |
🔥swift-mix | swift/swift-sft-mixture | sharegpt firefly codefuse metamathqa | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, sft, general | - |
cls-fudan-news-zh | damo/zh_cls_fudan-news | | 4959 | 3234.4±2547.5, min=91, max=19548 | chat, classification | - |
ner-jave-zh | damo/zh_ner-JAVE | | 1266 | 118.3±45.5, min=44, max=223 | chat, ner | - |
coco-en | modelscope/coco_2014_caption | coco_2014_caption | 454617 | 299.8±2.8, min=295, max=352 | chat, multi-modal, vision | - |
🔥coco-en-mini | modelscope/coco_2014_caption | coco_2014_caption | 40504 | 299.8±2.6, min=295, max=338 | chat, multi-modal, vision | - |
coco-en-2 | modelscope/coco_2014_caption | coco_2014_caption | 454617 | 36.8±2.8, min=32, max=89 | chat, multi-modal, vision | - |
🔥coco-en-2-mini | modelscope/coco_2014_caption | coco_2014_caption | 40504 | 36.8±2.6, min=32, max=75 | chat, multi-modal, vision | - |
capcha-images | AI-ModelScope/captcha-images | | 8000 | 31.0±0.0, min=31, max=31 | chat, multi-modal, vision | - |
latex-ocr-print | AI-ModelScope/LaTeX_OCR | full | 17918 | 362.7±34.8, min=294, max=528 | chat, ocr, multi-modal, vision | linxy/LaTeX_OCR |
latex-ocr-handwrite | AI-ModelScope/LaTeX_OCR | synthetic_handwrite | 95424 | 375.1±59.4, min=292, max=2115 | chat, ocr, multi-modal, vision | linxy/LaTeX_OCR |
aishell1-zh | speech_asr/speech_asr_aishell1_trainsets | 141600 | 152.2±36.8, min=63, max=419 | chat, multi-modal, audio | - | |
🔥aishell1-zh-mini | speech_asr/speech_asr_aishell1_trainsets | 14526 | 152.2±35.6, min=74, max=359 | chat, multi-modal, audio | - | |
🔥video-chatgpt | swift/VideoChatGPT | Generic Temporal Consistency | 3206 | 88.4±48.3, min=32, max=399 | chat, multi-modal, video | lmms-lab/VideoChatGPT |
egoschema | AI-ModelScope/egoschema | Subset | 101 | 191.6±80.7, min=96, max=435 | chat, multi-modal, video | lmms-lab/egoschema |
hh-rlhf | AI-ModelScope/hh-rlhf | harmless-base helpful-base helpful-online helpful-rejection-sampled | 127459 | 245.4±190.7, min=22, max=1999 | rlhf, dpo, pairwise | - |
🔥hh-rlhf-cn | AI-ModelScope/hh_rlhf_cn | hh_rlhf harmless_base_cn harmless_base_en helpful_base_cn helpful_base_en | 355920 | 171.2±122.7, min=22, max=3078 | rlhf, dpo, pairwise | - |
orpo-dpo-mix-40k | AI-ModelScope/orpo-dpo-mix-40k | default | 43666 | 548.3±397.4, min=28, max=8483 | dpo, orpo, en, quality | mlabonne/orpo-dpo-mix-40k |
stack-exchange-paired | AI-ModelScope/stack-exchange-paired | 4483004 | 534.5±594.6, min=31, max=56588 | hfrl, dpo, pairwise | lvwerra/stack-exchange-paired | |
shareai-llama3-dpo-zh-en-emoji | hjh0119/shareAI-Llama3-DPO-zh-en-emoji | default | 2449 | 334.0±162.8, min=36, max=1801 | rlhf, dpo, pairwise | - |
ultrafeedback-kto | AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto | default | 230720 | 11.0±0.0, min=11, max=11 | rlhf, kto | - |
rlaif-v | swift/RLAIF-V-Dataset | default | 83132 | 119.8±52.6, min=28, max=556 | rlhf, dpo, multi-modal, en | openbmb/RLAIF-V-Dataset |
pileval | swift/pile-val-backup | 214670 | 1612.3±8856.2, min=11, max=1208955 | text-generation, awq | mit-han-lab/pile-val-backup | |
mantis-instruct | swift/Mantis-Instruct | birds-to-words chartqa coinstruct contrastive_caption docvqa dreamsim dvqa iconqa imagecode llava_665k_multi lrv_multi multi_vqa nextqa nlvr2 spot-the-diff star visual_story_telling | 655351 | 825.7±812.5, min=284, max=13563 | chat, multi-modal, vision, quality | TIGER-Lab/Mantis-Instruct |
llava-data-instruct | swift/llava-data | llava_instruct | 364100 | 189.0±142.1, min=33, max=5183 | sft, multi-modal, quality | TIGER-Lab/llava-data |
midefics | swift/MideficsDataset | 3800 | 201.3±70.2, min=60, max=454 | medical, en, vqa | WinterSchool/MideficsDataset | |
gqa | None | train_all_instructions | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-modal, en, vqa, quality | lmms-lab/GQA |
text-caps | swift/TextCaps | 18145 | 38.2±4.4, min=31, max=73 | multi-modal, en, caption, quality | HuggingFaceM4/TextCaps | |
refcoco-unofficial-caption | swift/refcoco | 46215 | 44.7±3.2, min=36, max=71 | multi-modal, en, caption | jxu124/refcoco | |
refcoco-unofficial-grounding | swift/refcoco | 46215 | 45.2±3.1, min=37, max=69 | multi-modal, en, grounding | jxu124/refcoco | |
refcocog-unofficial-caption | swift/refcocog | 44799 | 49.7±4.7, min=37, max=88 | multi-modal, en, caption | jxu124/refcocog | |
refcocog-unofficial-grounding | swift/refcocog | 44799 | 50.1±4.7, min=37, max=90 | multi-modal, en, grounding | jxu124/refcocog | |
a-okvqa | swift/A-OKVQA | 18201 | 45.8±7.9, min=32, max=100 | multi-modal, en, vqa, quality | HuggingFaceM4/A-OKVQA | |
okvqa | swift/OK-VQA_train | 9009 | 34.4±3.3, min=28, max=59 | multi-modal, en, vqa, quality | Multimodal-Fatima/OK-VQA_train | |
ocr-vqa | swift/OCR-VQA | 186753 | 35.6±6.6, min=29, max=193 | multi-modal, en, ocr-vqa | howard-hou/OCR-VQA | |
grit | swift/GRIT | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-modal, en, caption-grounding, quality | zzliang/GRIT | |
llava-instruct-mix | swift/llava-instruct-mix-vsft | 13640 | 179.8±120.2, min=30, max=962 | multi-modal, en, vqa, quality | HuggingFaceH4/llava-instruct-mix-vsft | |
lnqa | swift/lnqa | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-modal, en, ocr-vqa, quality | vikhyatk/lnqa | |
science-qa | swift/ScienceQA | 8315 | 100.3±59.5, min=38, max=638 | multi-modal, science, vqa, quality | derek-thomas/ScienceQA | |
guanaco | AI-ModelScope/GuanacoDataset | default | 31561 | 250.1±70.3, min=89, max=1436 | chat, zh | JosephusCheung/GuanacoDataset |
mind2web | swift/Multimodal-Mind2Web | 1009 | 297522.4±325496.2, min=8592, max=3499715 | agent, multi-modal | osunlp/Multimodal-Mind2Web | |
sharegpt-4o-image | AI-ModelScope/ShareGPT-4o | image_caption | 57289 | 638.7±157.9, min=47, max=4640 | vqa, multi-modal | OpenGVLab/ShareGPT-4o |
pixelprose | swift/pixelprose | - | Dataset is too huge, please click the original link to view the dataset stat. | caption, multi-modal, vision | tomg-group-umd/pixelprose | |
m3it | AI-ModelScope/M3IT | coco vqa-v2 shapes shapes-rephrased coco-goi-rephrased snli-ve snli-ve-rephrased okvqa a-okvqa viquae textcap docvqa science-qa imagenet imagenet-open-ended imagenet-rephrased coco-goi clevr clevr-rephrased nlvr coco-itm coco-itm-rephrased vsr vsr-rephrased mocheg mocheg-rephrased coco-text fm-iqa activitynet-qa msrvtt ss coco-cn refcoco refcoco-rephrased multi30k image-paragraph-captioning visual-dialog visual-dialog-rephrased iqa vcr visual-mrc ivqa msrvtt-qa msvd-qa gqa text-vqa ocr-vqa st-vqa flickr8k-cn | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, multi-modal, vision | - |
sharegpt4v | AI-ModelScope/ShareGPT4V | ShareGPT4V ShareGPT4V-PT | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, multi-modal, vision | - |
llava-instruct-150k | AI-ModelScope/LLaVA-Instruct-150K | 624610 | 490.4±180.2, min=288, max=5438 | chat, multi-modal, vision | - | |
llava-pretrain | AI-ModelScope/LLaVA-Pretrain | default | - | Dataset is too huge, please click the original link to view the dataset stat. | vqa, multi-modal, quality | liuhaotian/LLaVA-Pretrain |
sa1b-dense-caption | Tongyi-DataEngine/SA1B-Dense-Caption | - | Dataset is too huge, please click the original link to view the dataset stat. | zh, multi-modal, vqa | - | |
sa1b-paired-caption | Tongyi-DataEngine/SA1B-Paired-Captions-Images | - | Dataset is too huge, please click the original link to view the dataset stat. | zh, multi-modal, vqa | - | |
alpaca-cleaned | AI-ModelScope/alpaca-cleaned | 51760 | 177.9±126.4, min=26, max=1044 | chat, general, bench, quality | yahma/alpaca-cleaned | |
aya-collection | swift/aya_collection | aya_dataset | 202364 | 494.0±6911.3, min=21, max=3044268 | multi-lingual, qa | CohereForAI/aya_collection |
belle-generated-chat-0.4M | AI-ModelScope/generated_chat_0.4M | 396004 | 273.3±52.0, min=32, max=873 | common, zh | BelleGroup/generated_chat_0.4M | |
belle-math-0.25M | AI-ModelScope/school_math_0.25M | 248480 | 157.7±72.2, min=33, max=3450 | math, zh | BelleGroup/school_math_0.25M | |
belle-train-0.5M-CN | AI-ModelScope/train_0.5M_CN | 519255 | 129.1±91.5, min=27, max=6507 | common, zh, quality | BelleGroup/train_0.5M_CN | |
belle-train-1M-CN | AI-ModelScope/train_1M_CN | - | Dataset is too huge, please click the original link to view the dataset stat. | common, zh, quality | BelleGroup/train_1M_CN | |
belle-train-2M-CN | AI-ModelScope/train_2M_CN | - | Dataset is too huge, please click the original link to view the dataset stat. | common, zh, quality | BelleGroup/train_2M_CN | |
belle-train-3.5M-CN | swift/train_3.5M_CN | - | Dataset is too huge, please click the original link to view the dataset stat. | common, zh, quality | BelleGroup/train_3.5M_CN | |
c4 | None | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | allenai/c4 | |
chart-qa | swift/ChartQA | 28299 | 43.1±5.5, min=29, max=77 | en, vqa, quality | HuggingFaceM4/ChartQA | |
chinese-c4 | swift/chinese-c4 | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, zh, quality | shjwudp/chinese-c4 | |
cinepile | swift/cinepile | - | Dataset is too huge, please click the original link to view the dataset stat. | vqa, en, youtube, video | tomg-group-umd/cinepile | |
classical-chinese-translate | swift/classical_chinese_translate | 6655 | 344.0±76.4, min=61, max=815 | chat, play-ground | - | |
codealpaca-20k | AI-ModelScope/CodeAlpaca-20k | 20016 | 100.2±60.1, min=29, max=1776 | code, en | HuggingFaceH4/CodeAlpaca_20K | |
cosmopedia | None | auto_math_text khanacademy openstax stanford stories web_samples_v1 web_samples_v2 wikihow | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-domain, en, qa | HuggingFaceTB/cosmopedia |
cosmopedia-100k | swift/cosmopedia-100k | 100000 | 1024.5±243.1, min=239, max=2981 | multi-domain, en, qa | HuggingFaceTB/cosmopedia-100k | |
dolma | swift/dolma | v1_7 | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | allenai/dolma |
dolphin | swift/dolphin | flan1m-alpaca-uncensored flan5m-alpaca-uncensored | - | Dataset is too huge, please click the original link to view the dataset stat. | en | cognitivecomputations/dolphin |
duet | AI-ModelScope/Duet-v0.5 | 5000 | 1157.4±189.3, min=657, max=2344 | CoT, en | G-reen/Duet-v0.5 | |
evol-instruct-v2 | AI-ModelScope/WizardLM_evol_instruct_V2_196k | 109184 | 480.9±333.1, min=26, max=4942 | chat, en | WizardLM/WizardLM_evol_instruct_V2_196k | |
fineweb | None | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | HuggingFaceFW/fineweb | |
gen-qa | swift/GenQA | - | Dataset is too huge, please click the original link to view the dataset stat. | qa, quality, multi-task | tomg-group-umd/GenQA | |
github-code | swift/github-code | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | codeparrot/github-code | |
gpt4v-dataset | swift/gpt4v-dataset | 12356 | 217.9±68.3, min=35, max=596 | en, caption, multi-modal, quality | laion/gpt4v-dataset | |
guanaco-belle-merge | AI-ModelScope/guanaco_belle_merge_v1.0 | 693987 | 134.2±92.0, min=24, max=6507 | QA, zh | Chinese-Vicuna/guanaco_belle_merge_v1.0 | |
infinity-instruct | swift/Infinity-Instruct | - | Dataset is too huge, please click the original link to view the dataset stat. | qa, quality, multi-task | BAAI/Infinity-Instruct | |
llava-med-zh-instruct | swift/llava-med-zh-instruct-60k | 56649 | 207.7±67.6, min=37, max=657 | zh, medical, vqa | BUAADreamer/llava-med-zh-instruct-60k | |
🔥longwriter-6k | ZhipuAI/LongWriter-6k | 6000 | 4887.2±2879.2, min=117, max=30354 | long, chat, sft | THUDM/LongWriter-6k | |
🔥longwriter-6k-filtered | swift/longwriter-6k-filtered | 666 | 4108.9±2636.9, min=1190, max=17050 | long, chat, sft | - | |
math-instruct | AI-ModelScope/MathInstruct | 262283 | 254.4±183.5, min=11, max=4383 | math, cot, en, quality | TIGER-Lab/MathInstruct | |
math-plus | TIGER-Lab/MATH-plus | train | 893929 | 287.1±158.7, min=24, max=2919 | qa, math, en, quality | TIGER-Lab/MATH-plus |
moondream2-coyo-5M | swift/moondream2-coyo-5M-captions | - | Dataset is too huge, please click the original link to view the dataset stat. | caption, pretrain, quality | isidentical/moondream2-coyo-5M-captions | |
no-robots | swift/no_robots | 9485 | 298.7±246.4, min=40, max=6739 | multi-task, quality, human-annotated | HuggingFaceH4/no_robots | |
open-hermes | swift/OpenHermes-2.5 | - | Dataset is too huge, please click the original link to view the dataset stat. | cot, en, quality | teknium/OpenHermes-2.5 | |
open-orca-chinese | AI-ModelScope/OpenOrca-Chinese | - | Dataset is too huge, please click the original link to view the dataset stat. | QA, zh, general, quality | yys/OpenOrca-Chinese | |
orca_dpo_pairs | swift/orca_dpo_pairs | 12859 | 366.9±251.9, min=30, max=2010 | rlhf, quality | Intel/orca_dpo_pairs | |
path-vqa | swift/path-vqa | 19654 | 34.8±7.3, min=27, max=85 | multi-modal, vqa, medical | flaviagiammarino/path-vqa | |
pile | AI-ModelScope/pile | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain | EleutherAI/pile | |
poison-mpts | iic/100PoisonMpts | 906 | 150.6±80.8, min=39, max=656 | poison-management, zh | - | |
🔥qwen2-pro-en | AI-ModelScope/Magpie-Qwen2-Pro-200K-English | 200000 | 605.4±287.3, min=221, max=4267 | chat, sft, en | Magpie-Align/Magpie-Qwen2-Pro-200K-English | |
🔥qwen2-pro-filtered | AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered | 300000 | 555.8±286.6, min=148, max=4267 | chat, sft | Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered | |
🔥qwen2-pro-zh | AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese | 200000 | 446.2±246.4, min=74, max=4101 | chat, sft, zh | Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese | |
redpajama-data-1t | swift/RedPajama-Data-1T | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | togethercomputer/RedPajama-Data-1T | |
redpajama-data-v2 | swift/RedPajama-Data-V2 | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | togethercomputer/RedPajama-Data-V2 | |
refinedweb | None | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | tiiuae/falcon-refinedweb | |
rwkv-pretrain-web | mapjack/openwebtext_dataset | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, zh, quality | - | |
sft-nectar | AI-ModelScope/SFT-Nectar | 131192 | 396.4±272.1, min=44, max=10732 | cot, en, quality | AstraMindAI/SFT-Nectar | |
skypile | AI-ModelScope/SkyPile-150B | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality, zh | Skywork/SkyPile-150B | |
slim-orca | swift/SlimOrca | 517982 | 399.1±370.2, min=35, max=8756 | quality, en | Open-Orca/SlimOrca | |
slim-pajama-627b | None | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | cerebras/SlimPajama-627B | |
starcoder | AI-ModelScope/starcoderdata | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | bigcode/starcoderdata | |
tagengo-gpt4 | swift/tagengo-gpt4 | 78057 | 472.3±292.9, min=22, max=3521 | chat, multi-lingual, quality | lightblue/tagengo-gpt4 | |
the-stack | AI-ModelScope/the-stack | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | bigcode/the-stack | |
ultrachat-200k | swift/ultrachat_200k | 207865 | 1195.4±573.7, min=76, max=4470 | chat, en, quality | HuggingFaceH4/ultrachat_200k | |
vqa-v2 | swift/VQAv2 | 443757 | 31.8±2.2, min=27, max=58 | en, vqa, quality | HuggingFaceM4/VQAv2 | |
web-instruct-sub | swift/WebInstructSub | - | Dataset is too huge, please click the original link to view the dataset stat. | qa, en, math, quality, multi-domain, science | TIGER-Lab/WebInstructSub | |
wikipedia | swift/wikipedia | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | wikipedia | |
wikipedia-cn-filtered | AI-ModelScope/wikipedia-cn-20230720-filtered | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | pleisto/wikipedia-cn-20230720-filtered | |
zhihu-rlhf | AI-ModelScope/zhihu_rlhf_3k | 3460 | 594.5±365.9, min=31, max=1716 | rlhf, dpo, zh | liyucheng/zhihu_rlhf_3k |