posts/2024/importing-yi9b-to-ollama/ #8
-
I'm a complete newcomer to LLMs and have never touched machine learning or deep learning before. Recently I started trying to learn about LLMs and how to fine-tune them. Along the way I learned that these large models are usually released as base models, and through some searching I found that they need SFT fine-tuning. I have finished that work, but I haven't uploaded the model to HuggingFace or any other public platform, because I found that when doing LoRA fine-tuning with LLaMA-Factory I had set the learning rate too high, so training never converged.

How are the results? Frankly, quite poor, but I can now talk to it the way you would with ChatGPT, although at times it still rambles on to itself like a base model. I noticed that your latest update can already hold a conversation. Did you do any further work after that? Looking forward to your guidance.

Here are my scripts:

# SFT fine-tuning, so the model can handle chat tasks
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--do_train True \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--finetuning_type lora \
--quantization_bit 4 \
--template yi \
--dataset_dir data \
--dataset belle_2m \
--cutoff_len 1024 \
--learning_rate 0.0002 \
--num_train_epochs 3.0 \
--max_samples 20000 \
--per_device_train_batch_size 6 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 5 \
--save_steps 100 \
--warmup_steps 50 \
--neftune_noise_alpha 5 \
--optim adamw_torch \
--packing True \
--report_to none \
--output_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora \
--fp16 True \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target q_proj,v_proj \
--plot_loss True
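If I retrain, the first thing I plan to change is the learning rate, since that is what I suspect broke convergence; 5e-5 is a commonly suggested starting point for LoRA SFT, but the exact value is only my guess and I have not verified it on Yi-9B. Only this flag of the command above would change:
# Hypothetical adjustment; every other flag stays exactly as in the command above
--learning_rate 0.00005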
# Try the model from the command line, to check that it works properly
CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--quantization_bit 4 \
--finetuning_type lora
# Evaluate the model; this run failed: not enough VRAM on the A10
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--quantization_bit 4 \
--finetuning_type lora \
--task mmlu \
--split test \
--lang zh \
--n_shot 5 \
--batch_size 4
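If I retry the MMLU evaluation, the simplest workaround for the out-of-memory error is a smaller batch size; whether batch_size 1 actually fits on the A10 is only my assumption.
# Hypothetical retry of the evaluation with a smaller batch size,
# since batch_size 4 ran out of VRAM on the A10
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--quantization_bit 4 \
--finetuning_type lora \
--task mmlu \
--split test \
--lang zh \
--n_shot 5 \
--batch_size 1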
# Evaluate the model; this run succeeded, results below
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--finetuning_type lora \
--quantization_bit 4 \
--template yi \
--dataset_dir data \
--dataset alpaca_gpt4_zh \
--cutoff_len 1024 \
--max_samples 2000 \
--per_device_eval_batch_size 16 \
--predict_with_generate True \
--max_new_tokens 128 \
--top_p 0.7 \
--temperature 0.95 \
--output_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--do_predict True
***** predict metrics *****
predict_bleu-4 = 12.0712
predict_rouge-1 = 34.153
predict_rouge-2 = 12.641
predict_rouge-l = 23.7601
predict_runtime = 0:38:24.18
predict_samples_per_second = 0.868
predict_steps_per_second = 0.054
# Merge the model
# DO NOT use quantized model or quantization_bit when merging lora weights
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
--model_name_or_path /mnt/workspace/LLaMA-Factory/Yi-9B-200K \
--adapter_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/ \
--template yi \
--finetuning_type lora \
--export_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora/models \
--export_size 4 \
--export_legacy_format False
# GPTQ 4-bit quantization of the merged model; this run failed: not enough VRAM
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
--model_name_or_path saves/Yi-9B/lora/yi-9b-200k-chat-lora/models \
--template yi \
--export_dir saves/Yi-9B/lora/yi-9b-200k-chat-lora-int4/models \
--export_quantization_bit 4 \
--export_quantization_dataset data/c4_demo.json \
--export_size 1 \
--export_legacy_format False
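Since GPTQ quantization ran out of VRAM, an alternative I am considering (not something from the scripts above) is converting the merged model to GGUF and quantizing it on the CPU with llama.cpp, which needs no GPU memory and also produces the format Ollama imports. The llama.cpp checkout location and the output file names below are assumptions.
# Hypothetical CPU-side quantization with llama.cpp instead of GPTQ;
# assumes llama.cpp has been cloned and built next to LLaMA-Factory
python llama.cpp/convert-hf-to-gguf.py \
saves/Yi-9B/lora/yi-9b-200k-chat-lora/models \
--outtype f16 \
--outfile yi-9b-200k-chat-f16.gguf
# q4_0 quantization runs on the CPU, so the A10's VRAM is not a limit here
./llama.cpp/quantize yi-9b-200k-chat-f16.gguf yi-9b-200k-chat-q4_0.gguf q4_0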
-
posts/2024/importing-yi9b-to-ollama/
The log of importing the Yi-9B LLM model into the Ollama library.
https://shinyzhu.com/posts/2024/importing-yi9b-to-ollama/
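For reference, the Ollama import described in the post comes down to a Modelfile that points at a GGUF file plus an ollama create call; the GGUF file name and the ChatML-style template below are assumptions based on Yi's chat format, not details taken from the post.
# Hypothetical Modelfile for the quantized GGUF (file name assumed)
FROM ./yi-9b-200k-chat-q4_0.gguf
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"

# Create the model in the local Ollama library and chat with it
ollama create yi-9b-200k-chat -f Modelfile
ollama run yi-9b-200k-chat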