Error with Multi-GPU peft Reward Training #480
Comments
We have the same problem.
Hi @mnoukhov @Receiling
Gently pinging @younesbelkada if there's any update on the For context, the Llama 2 paper shows that training large reward models is an important ingredient in RLHF and enabling the
No positive results yet, but a new negative result: this isn't related to #728. Using that fix, I still encounter the problem. I also checked that the fix from huggingface/peft#899 doesn't solve it, although that issue also prevents multi-GPU reward training since we need to use
I'm getting the same error with
Hi all, https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py But I am giving the TrainingArguments a path to ds_config.json where I specify
Since I encountered incompatible dtype issues, I used the workaround mentioned here:
I run the whole thing with
I enabled gradient checkpointing to prevent an OOM error:
I also had to set
To avoid this warning:
Here is my logging now:
I have also found that
Which models/setup are you using?
I am using
EDIT: I misunderstood "unused parameters". As long as the frozen parameters are used in the backward computation, they are fine. So if the model has two peft adapters, then there could be unused parameters, but not if there's just one.
After looking into this, I am mostly convinced that DDP does not work with gradient checkpointing when there are unused parameters in the forward computation or two forward passes. This means you should not use gradient checkpointing with
This is a note from the PyTorch docs
two
I think a next step is to check whether DDP training with
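To make the failure mode described above more concrete, here is a toy, pure-Python model of the bookkeeping involved. It is not PyTorch's implementation: `ToyReducer`, `mark_ready`, `finish_backward`, and the parameter names are all hypothetical. The only behavior it borrows from DDP is the assumption that each parameter's gradient should be reported exactly once per backward pass.

```python
class ToyReducer:
    """Toy stand-in for DDP's gradient bookkeeping (hypothetical, not PyTorch code)."""

    def __init__(self, param_names):
        self.param_names = set(param_names)
        self.ready = set()

    def mark_ready(self, name):
        # Each parameter may report its gradient only once per backward pass.
        if name in self.ready:
            raise RuntimeError(f"{name} was marked ready twice in one backward pass")
        self.ready.add(name)

    def finish_backward(self):
        # Parameters that never reported play the role of "unused parameters".
        unused = self.param_names - self.ready
        self.ready = set()
        return sorted(unused)


def backward(reducer, hook_calls):
    """Fire gradient hooks in the given order, then close the iteration."""
    for name in hook_calls:
        reducer.mark_ready(name)
    return reducer.finish_backward()


params = ["lora_A", "lora_B"]

# One hook call per parameter: nothing is unused and nothing fails.
all_used = backward(ToyReducer(params), ["lora_B", "lora_A"])

# A parameter that never takes part in the backward pass shows up as unused.
one_unused = backward(ToyReducer(params), ["lora_B"])

# The same parameter reporting twice in one backward pass is rejected.
try:
    backward(ToyReducer(params), ["lora_B", "lora_A", "lora_B"])
    double_report_failed = False
except RuntimeError:
    double_report_failed = True
```

In this toy picture, a frozen-but-used parameter is harmless, while a second adapter that never participates, or a checkpointed segment that reports the same parameter twice, is what trips the bookkeeping.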
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I think this issue is resolved by #912, right @younesbelkada?
Yes indeed! If you use the latest releases of transformers, trl and peft, simply pass
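The argument name is cut off in the comment above, so what follows is an assumption rather than a quote. Recent transformers releases accept a `gradient_checkpointing_kwargs` field on `TrainingArguments`, and passing `use_reentrant=False` through it selects non-reentrant checkpointing, which is the variant usually recommended alongside DDP. A minimal sketch of such settings as a plain dict; `reward_training_kwargs` is a hypothetical helper and the output directory is a placeholder.

```python
def reward_training_kwargs(output_dir):
    """Build keyword arguments intended for transformers.TrainingArguments.

    Assumption: the installed transformers version supports
    `gradient_checkpointing_kwargs`; older releases do not.
    """
    return {
        "output_dir": output_dir,
        "gradient_checkpointing": True,
        # Assumed fix: use the non-reentrant mode of torch.utils.checkpoint.
        "gradient_checkpointing_kwargs": {"use_reentrant": False},
        # Matches the issue description, where find_unused_parameters is False.
        "ddp_find_unused_parameters": False,
    }


kwargs = reward_training_kwargs("reward_model_out")
# Usage (requires transformers): TrainingArguments(**kwargs)
```

Keeping the settings in a dict makes it easy to check them before handing them to the trainer; whether this matches the exact argument meant in the thread is not confirmed by the text here.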
I tested this and it is resolved by #912, thank you @younesbelkada! Sorry I didn't do this myself earlier. I tried doing the
Awesome that it worked @mnoukhov, thanks!
Same error here, set
There is an issue when you combine all four:
- peft
- quantization
- multi-GPU
- gradient checkpointing

This is reproducible if you correctly enable gradient checkpointing in examples/multi-adapter-rl, as shown in PR #479, and then run in a multi-GPU setup:

    accelerate launch --multi_gpu reward_modeling.py --gradient_checkpointing True

you will receive the error

With TORCH_DISTRIBUTED_DEBUG=DETAIL, we find the affected parameter is a LoRA parameter. It is not related to pytorch/pytorch#60844 because find_unused_parameters is set to False.

This is likely a problem between peft and accelerate/ddp, but I'm putting the issue here because it affects RewardTrainer, and quantization + multi-GPU + gradient checkpointing are a common combination.
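For convenience, the reproduction above can also be driven from Python. This sketch only assembles the command and environment and does not run anything; `build_repro` is a hypothetical helper, and the script path is assumed to be relative to wherever you launch from.

```python
import os
import shlex


def build_repro(script="reward_modeling.py", debug_level="DETAIL"):
    """Return (command, environment) for the multi-GPU reproduction."""
    cmd = [
        "accelerate", "launch", "--multi_gpu",
        script,
        "--gradient_checkpointing", "True",
    ]
    env = dict(os.environ)
    # Ask torch.distributed for verbose diagnostics about the failing parameter.
    env["TORCH_DISTRIBUTED_DEBUG"] = debug_level
    return cmd, env


cmd, env = build_repro()
print(shlex.join(cmd))
# To actually launch: subprocess.run(cmd, env=env, check=True)
```

Setting the variable in a copied environment keeps the debug output scoped to the launched process instead of the whole shell session.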