Sync states for npu fsdp #2093
Thank you @jq460494839 for adding this, LGTM!
cc @pacman100, @muellerzr and @BenjaminBossan, please take a look at this PR, thanks :)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@jq460494839 The imports are not sorted correctly. Could you please fix that?
Fixed.
Thanks, LGTM
Thanks for iterating
Hi @muellerzr, would you mind having a look at this PR? It is linked to #1777.
Thanks, LG2M bar some nits.
src/accelerate/utils/dataclasses.py
```python
elif is_xpu_available():
    device = torch.xpu.current_device()
else:
    raise RuntimeError("There are currently no available device found in ['XPU', 'CUDA', 'NPU']!")
```
```diff
- raise RuntimeError("There are currently no available device found in ['XPU', 'CUDA', 'NPU']!")
+ raise RuntimeError("There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'.")
```
ok, fixed it.
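For readers following the thread, the device-selection chain under review can be sketched as a small standalone function that raises the suggested error message. This is a hypothetical illustration, not the code merged in `dataclasses.py`: `pick_current_device` and the `(name, is_available, current_device)` tuples are invented stand-ins for checks such as `torch.cuda.is_available()` / `torch.cuda.current_device()` and `is_xpu_available()` / `torch.xpu.current_device()`, so the logic can run without any accelerator.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical backend descriptor: (name, availability probe, current-device getter).
# On real hardware the probes would wrap calls like torch.cuda.is_available().
Backend = Tuple[str, Callable[[], bool], Callable[[], int]]


def pick_current_device(backends: Sequence[Backend]) -> Tuple[str, int]:
    """Return (backend_name, device_index) for the first available backend.

    Backends are checked in the order given, mirroring an if/elif chain;
    if none is available, the error text from the review suggestion is raised.
    """
    for name, is_available, current_device in backends:
        if is_available():
            return name, current_device()
    raise RuntimeError(
        "There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'."
    )


if __name__ == "__main__":
    # Simulated machine: no NPU, one CUDA device at index 1, no XPU.
    backends = [
        ("NPU", lambda: False, lambda: 0),
        ("CUDA", lambda: True, lambda: 1),
        ("XPU", lambda: False, lambda: 0),
    ]
    print(pick_current_device(backends))  # ('CUDA', 1)
```

Passing the probes in as data keeps the fallback order explicit and makes the "no device" branch easy to exercise in a unit test.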
Can you run
Adds module-state syncing for FSDP on NPU: the model is loaded on rank 0 and its states are broadcast to the other NPU ranks.
cc @pacman100 @statelesshz
Fixes #2085: fine-tuning an LLM with FastChat on NPU raises an AssertionError.
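The rank-0 load-and-broadcast flow described in this PR can be pictured with a single-process simulation in which each list entry stands for one rank's state dict. This is a conceptual sketch only: `broadcast_from_rank0` is a hypothetical helper, and plain Python lists stand in for tensors. In a real job the copy would be a collective such as `torch.distributed.broadcast(tensor, src=0)` over the NPU process group, which PyTorch FSDP performs when constructed with `sync_module_states=True`.

```python
import copy
from typing import Dict, List

# Hypothetical stand-in for a state dict: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def broadcast_from_rank0(per_rank_states: List[StateDict]) -> List[StateDict]:
    """Simulate syncing module states so every rank ends up with rank 0's values.

    Only rank 0 is assumed to hold real (loaded) weights; other ranks hold
    placeholders with the same parameter names, which get overwritten.
    """
    source = per_rank_states[0]
    synced: List[StateDict] = []
    for rank, state in enumerate(per_rank_states):
        # A real broadcast also requires every rank to agree on the parameter layout.
        if state.keys() != source.keys():
            raise ValueError(f"rank {rank} has mismatched parameter names")
        # Deep-copy so the simulated ranks do not share rank 0's lists.
        synced.append(copy.deepcopy(source))
    return synced


if __name__ == "__main__":
    # Rank 0 loaded the pretrained weights; ranks 1 and 2 hold zero placeholders.
    states = [
        {"linear.weight": [0.5, -1.0], "linear.bias": [0.1]},
        {"linear.weight": [0.0, 0.0], "linear.bias": [0.0]},
        {"linear.weight": [0.0, 0.0], "linear.bias": [0.0]},
    ]
    synced = broadcast_from_rank0(states)
    print(synced[2]["linear.weight"])  # [0.5, -1.0]
```

Loading on a single rank and broadcasting avoids having every process read the full checkpoint into host memory, which is the usual motivation for syncing module states under FSDP.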