fix warmup lr when using deepspeed #2125
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@Eikor can you please do …
I don't think this should be the case. Let me deep dive into this.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Was there a reason this was never merged, @pacman100 and @muellerzr? Pretty sure it's still happening on …
What does this PR do?
Fixes #2124
Currently, DeepSpeedEngineWrapper.backward() calls DeepSpeedEngine._take_model_step(), which performs the optimizer step first, followed by the lr_scheduler step.
accelerate/src/accelerate/utils/deepspeed.py
Line 176 in 4f10031
However, the DeepSpeed optimizer is initialized with the maximum learning rate in the prepare function, so the optimizer updates the model parameters with the maximum learning rate at the first step, which causes unexpected behavior.
accelerate/src/accelerate/accelerator.py
Line 1665 in 4f10031
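To make the problem concrete, here is a minimal, self-contained sketch of that ordering (a hand-rolled linear warmup stands in for DeepSpeed's scheduler; none of this is the actual accelerate/DeepSpeed code):

```python
import torch

model = torch.nn.Linear(4, 4)
max_lr, warmup_min_lr, warmup_num_steps = 1e-3, 1e-6, 100

# The optimizer is created with the maximum learning rate, as in prepare().
optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr)

def warmup_lr(step):
    # Linear warmup from warmup_min_lr to max_lr (WarmupLR-style, simplified).
    frac = min(step, warmup_num_steps) / warmup_num_steps
    return warmup_min_lr + frac * (max_lr - warmup_min_lr)

loss = model(torch.randn(2, 4)).sum()
loss.backward()

# Mirrors the _take_model_step() ordering: optimizer first, scheduler second.
print(optimizer.param_groups[0]["lr"])  # 0.001 -> the first update uses max_lr
optimizer.step()                        # parameters already moved at max_lr
for group in optimizer.param_groups:    # lr is lowered only after the step
    group["lr"] = warmup_lr(1)
print(optimizer.param_groups[0]["lr"])  # ~1.1e-05, too late for step 1
```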
This PR solves the problem by initializing the optimizer with warmup_min_lr when a warmup lr scheduler is active.
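A rough sketch of the idea behind the fix (the ds_config dict and the exact placement are illustrative, not the actual accelerate code): seed the optimizer's lr with the warmup floor before DeepSpeed takes its first step.

```python
import torch

# Hypothetical stand-ins for what accelerate sees at prepare() time: a user
# optimizer built with the maximum lr, and a DeepSpeed config containing a
# WarmupLR-style scheduler section.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
ds_config = {
    "scheduler": {
        "type": "WarmupLR",
        "params": {"warmup_min_lr": 1e-6, "warmup_max_lr": 1e-3, "warmup_num_steps": 100},
    }
}

# The idea of the fix: if a warmup scheduler is configured, start the optimizer
# at warmup_min_lr so the first step does not use the maximum learning rate.
scheduler_params = ds_config.get("scheduler", {}).get("params", {})
if "warmup_min_lr" in scheduler_params:
    for param_group in optimizer.param_groups:
        param_group["lr"] = scheduler_params["warmup_min_lr"]

print(optimizer.param_groups[0]["lr"])  # 1e-06
```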
@pacman100