Out of memory when fine-tuning Qwen #10

Open
delian11 opened this issue Jun 27, 2024 · 3 comments
Comments

@delian11

delian11 commented Jun 27, 2024

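Hello, I am fine-tuning Qwen with the original code on two A100 80G GPUs. GPU memory usage on both cards is only 919M, but host memory usage keeps growing (apparently during data loading?) until it passes 180 GB, at which point memory runs out and the program is terminated. How can I solve this?
Training log:
[image]

Memory usage:
[image]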

@TobiasLee

Which size of Qwen?

@delian11

Which size of Qwen?

qwen-vl, 7b

@TobiasLee

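Could you try adjusting the bsz? Its vocabulary has around 100k entries, so the final activation is very large. Try bsz=1 and see whether it runs. As I recall, an 80G card can go up to per_device_batch_size=4; then adjust gradient_accumulation_step to keep the global_batch_size you want.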
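
To make the batch-size arithmetic above concrete, here is a rough sketch in Python. It is not taken from the repo's training code: the helper names, the target global batch size of 32, the sequence length of 2048, and the 2-byte (fp16/bf16) element size are all assumptions chosen for illustration. Only the 2 GPUs and the ~100k vocabulary come from this thread. It counts only the output-layer tensor of shape (batch, seq_len, vocab), not total training memory.

```python
def grad_accum_steps(global_batch_size: int,
                     per_device_batch_size: int,
                     num_devices: int) -> int:
    """Accumulation steps needed to reach a target global batch size.

    global_batch_size = per_device_batch_size * num_devices * accumulation steps
    """
    per_step = per_device_batch_size * num_devices
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be a multiple of "
                         "per_device_batch_size * num_devices")
    return global_batch_size // per_step


def output_activation_bytes(batch_size: int,
                            seq_len: int,
                            vocab_size: int,
                            bytes_per_element: int = 2) -> int:
    """Size of one (batch, seq_len, vocab) output tensor, in bytes."""
    return batch_size * seq_len * vocab_size * bytes_per_element


if __name__ == "__main__":
    NUM_GPUS = 2              # from the issue: 2 x A100 80G
    VOCAB_SIZE = 100_000      # "around 100k", as stated above
    GLOBAL_BATCH = 32         # assumed target, not from the thread
    SEQ_LEN = 2048            # assumed sequence length, not from the thread

    for bsz in (1, 4):
        steps = grad_accum_steps(GLOBAL_BATCH, bsz, NUM_GPUS)
        gib = output_activation_bytes(bsz, SEQ_LEN, VOCAB_SIZE) / 2**30
        print(f"per_device_batch_size={bsz}: "
              f"accumulation steps={steps}, "
              f"output tensor per device ~ {gib:.2f} GiB")
```

Lowering the per-device batch size while raising the accumulation steps by the same factor keeps the global batch size unchanged and shrinks the per-step output tensor proportionally.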
