fsdp refactoring #2177
Conversation
The documentation is not available anymore as the PR was closed or merged.
This looks like a great change, love to see so many lines deleted.
I don't have experience with FSDP, so a few questions:
- Does this still work as expected when using PyTorch < 2.1?
- The `use_orig_params` default was changed to True. Is there any disadvantage to that, e.g. more memory usage?
Hello Benjamin,
`accelerate/src/accelerate/accelerator.py`, lines 306 to 307 at 0e51680
It is expected to become the default as per the above dev blogpost.
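As a minimal sketch (assuming `FullyShardedDataParallelPlugin` continues to expose this flag), the new default can still be overridden explicitly:

```python
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin

# Sketch: explicitly opt out of the new use_orig_params=True default.
# In practice this is usually driven by the accelerate config instead.
fsdp_plugin = FullyShardedDataParallelPlugin(use_orig_params=False)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```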
Ah okay, I missed the version bump, thanks for pointing me to it.
Thanks for providing more context. My question arose because
Nicely done @pacman100! Excellent refactor and loving that diff. Keeping the simple API all around is a phenomenal win!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks Sourab.
What does this PR do?
FSDP refactoring based on:

1. With `use_orig_params=True`, we no longer require preparing the `model` before creating the optimizer object. Earlier, we needed to prepare the model, i.e., wrap it with FSDP before creating the optimizer, because of the warning in the official PyTorch docs that the optimizer must be initialized after the module has been wrapped with FSDP. Now, with `use_orig_params=True`, that is no longer the case. This makes the Accelerate training API consistent, i.e., users on single GPU, DDP, FSDP, and DeepSpeed now follow the same logic, as shown in the sketch below. Earlier, for FSDP, the recommended practice was to prepare the model first and only then create the optimizer; otherwise, we used to recreate the optimizer after preparing the model, and that didn't preserve optimizer parameter groups. Now all of that is resolved, and optimizer groups are also supported. As such, `use_orig_params=True` is now the default.
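For illustration only, a minimal sketch of the now-uniform flow (the toy model, optimizer groups, and learning rate are hypothetical; the point is that the optimizer can be created before `accelerator.prepare`, and a single code path covers single GPU, DDP, FSDP, and DeepSpeed):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Build the model and optimizer *before* prepare(); with use_orig_params=True
# this ordering now works for FSDP as well, so the same code path covers
# single GPU, DDP, FSDP, and DeepSpeed.
model = torch.nn.Linear(128, 2)  # hypothetical toy model
optimizer = torch.optim.AdamW(
    [
        {"params": [model.weight], "weight_decay": 0.01},  # optimizer groups are preserved
        {"params": [model.bias], "weight_decay": 0.0},
    ],
    lr=1e-3,
)

# A single prepare() call wraps everything for the configured backend.
model, optimizer = accelerator.prepare(model, optimizer)
```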
2. `FULL_STATE_DICT` and `SHARDED_STATE_DICT`: we are also supporting both of these and already have tests for them. The available references don't show how to save and load for the `LOCAL_STATE_DICT` state dict type, and the `LOCAL_STATE_DICT` checkpointing feature of FSDP is now failing. Couldn't find anything about it in the llama recipes, the FSDP documentation, the torch FSDP codebase (https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp), or elsewhere on the internet. Will raise an issue with the PyTorch team regarding it.
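For reference, a minimal sketch of how checkpointing with the supported state dict types looks from the user side (the toy model and checkpoint directory are hypothetical; the state dict type itself is assumed to be set in the FSDP section of the accelerate config):

```python
import torch
from accelerate import Accelerator

# The FSDP state dict type (FULL_STATE_DICT or SHARDED_STATE_DICT) is assumed
# to be configured via `accelerate config`.
accelerator = Accelerator()

model = torch.nn.Linear(128, 2)  # hypothetical toy model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

# save_state() writes the model/optimizer/RNG states using the configured
# FSDP state dict type under the hood.
accelerator.save_state("checkpoints/step_100")  # hypothetical directory

# load_state() restores everything save_state() wrote.
accelerator.load_state("checkpoints/step_100")
```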