Out-of-memory during multi-GPU fine-tuning / CUDA Out of Memory Error #1473
Unanswered
Chenhong-Zhang asked this question in Q&A
Replies: 2 comments
-
Same problem here: I also hit a CUDA OOM during LoRA training, but the error message claims it needs 171 T of memory, so something in the configuration must be wrong. Fine-tuning works with swift, but I haven't yet found a way to merge the result into a new model and run inference on it.
-
Switching to P-Tuning for fine-tuning, and adding model quantization in the main function, made fine-tuning work.
-
I trained on 8× RTX 3080 GPUs, each with 10 GB of VRAM.
The fine-tuning command:
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune_hf.py /home/usr/IRE/data /home/usr/IRE/chatglm3-6b configs/lora.yaml configs/ds_zero_2.json
The run fails with an out-of-memory error.
The traceback shows the failure occurs while moving the model to the device:
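A plausible explanation for an OOM at the `.to(device)` step (a rough estimate, not a diagnosis of this exact traceback): ZeRO stage 2 partitions only the optimizer states and gradients across ranks, while every rank still materializes a full fp16 copy of the weights. For a ~6 B-parameter model that copy alone exceeds 10 GB:

```python
# Why .to(device) can OOM under ZeRO-2: each rank holds ALL fp16 weights.
PARAMS = 6.2e9        # approximate ChatGLM3-6B parameter count (assumption)
BYTES_PER_PARAM = 2   # fp16
GPU_MEM_GB = 10.0     # RTX 3080

weights_gb = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"full fp16 weights per rank: {weights_gb:.1f} GB")
# ~11.5 GB > 10 GB, so copying the model to the GPU fails before
# training even starts; LoRA's small adapter count does not help here,
# because the frozen base weights still have to be resident.
assert weights_gb > GPU_MEM_GB
```

Under these assumptions, options include a ZeRO-3 config (which also partitions the weights across ranks) in place of `ds_zero_2.json`, or quantizing the base model as described in the earlier reply.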