
During llama training, saving the "best" state causes training to stall; suggest removing the code that saves the best checkpoint — please remember to update #52

Open
baketbek opened this issue Apr 16, 2023 · 4 comments

Comments

@baketbek

No description provided.

@jamestch

jamestch commented Apr 16, 2023

Hi, I also ran into training stalls during pretraining, but I don't know the cause. How did you determine that it was the best-checkpoint-saving code that caused the stall?

@baketbek
Author

> Hi, I also ran into training stalls during pretraining, but I don't know the cause. How did you determine that it was the best-checkpoint-saving code that caused the stall?

Check at which step the save happens: if exactly at that step the log prints "saving best" and then training stops making progress, that's this problem. Feel free to add me on WeChat to discuss: 437461219
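The symptom described above (the loop reaching a "saving best" step and then never advancing) can be illustrated with a toy sketch. This is not the actual pretrain.py code; `train` and its arguments are hypothetical, and the sketch only shows the pattern in which a best-checkpoint save fires whenever the loss improves, in addition to the periodic save controlled by something like `--save_checkpoint_steps`. If the best-save call blocks (for example, a multi-GB synchronous write, or a rank-mismatched collective in distributed runs), disabling it, as the issue suggests, removes the stall while keeping the periodic checkpoints.

```python
def train(steps, losses, save_checkpoint_steps, save_best=True):
    """Toy training loop mirroring the suspected pattern.

    Returns a list of (kind, step) save events instead of writing
    files, so the control flow is easy to inspect.
    """
    events = []
    best = float("inf")
    for step in range(1, steps + 1):
        loss = losses[step - 1]
        # The save suspected of stalling: fires on every loss improvement.
        if save_best and loss < best:
            best = loss
            events.append(("best", step))
        # The periodic save, analogous to --save_checkpoint_steps.
        if step % save_checkpoint_steps == 0:
            events.append(("periodic", step))
    return events
```

With `save_best=False`, only the periodic events remain, which is the workaround the issue title proposes.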

@jiangjingyao

Hi, how large is the output file after your training finishes? Mine is very small. Here is my command:
python pretrain.py --pretrained_model_path models/llama-7b.bin --dataset_path dataset.pt --spm_model_path ../llama.cpp-master/zh-models/tokenizer.model --config_path models/llama/7b_config.json --output_model_path models/llama_zh_7b.bin --world_size 1 --gpu_ranks 0 --data_processor lm --total_steps 100 --save_checkpoint_steps 50 --batch_size 24 --use_lora --lora_dropout 0.0 --vocab_path models/google_zh_vocab.txt

@zhanghaok

> Hi, how large is the output file after your training finishes? Mine is very small. Here is my command: python pretrain.py --pretrained_model_path models/llama-7b.bin --dataset_path dataset.pt --spm_model_path ../llama.cpp-master/zh-models/tokenizer.model --config_path models/llama/7b_config.json --output_model_path models/llama_zh_7b.bin --world_size 1 --gpu_ranks 0 --data_processor lm --total_steps 100 --save_checkpoint_steps 50 --batch_size 24 --use_lora --lora_dropout 0.0 --vocab_path models/google_zh_vocab.txt

Your training command includes the parameter --vocab_path models/google_zh_vocab.txt, but I can't find this parameter in the code. Could you explain what's going on?
