Sync states for npu fsdp #2093

Closed
wants to merge 0 commits into from

jq460494839
Contributor

Adds state syncing for FSDP on NPU: the model states are loaded on rank 0 and then broadcast to the other NPU ranks.

cc @pacman100 @statelesshz

Fixes #2085 ("Use FastChat with NPU fine-tune LLM get an AssertionError").
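The flow described above (load on rank 0, then broadcast to the other NPU ranks) can be illustrated with a minimal, framework-free sketch. It assumes each rank's module state can be modelled as a dict of parameter name to a flat list of floats. `load_checkpoint`, `init_rank_state`, `broadcast_from_rank0`, `sync_states` and the parameter sizes are hypothetical names for illustration only. The PR itself relies on FSDP and its collectives on NPU, which this code does not reproduce.

```python
from typing import Dict, List

# Hypothetical, framework-free simulation of "load on rank 0, then broadcast
# to the other ranks". Each rank's module state is modelled as a dict of
# parameter name -> flat list of floats; no torch, NPU, or accelerate API is used.
State = Dict[str, List[float]]

SHAPES = {"linear.weight": 2, "linear.bias": 1}  # illustrative parameter sizes


def load_checkpoint() -> State:
    # Stand-in for reading pretrained weights from disk; only rank 0 does this.
    return {"linear.weight": [0.5, -1.25], "linear.bias": [0.1]}


def init_rank_state(rank: int) -> State:
    if rank == 0:
        return load_checkpoint()
    # Other ranks only allocate placeholder buffers of the right size.
    return {name: [0.0] * size for name, size in SHAPES.items()}


def broadcast_from_rank0(states: List[State]) -> List[State]:
    # Mimics a broadcast with src=0: copy rank 0's values into the
    # pre-allocated buffers of every other rank.
    src = states[0]
    for dst in states[1:]:
        for name, values in src.items():
            if len(dst[name]) != len(values):
                raise ValueError(f"size mismatch for {name!r}")
            dst[name][:] = values  # slice assignment copies, no aliasing
    return states


def sync_states(world_size: int) -> List[State]:
    states = [init_rank_state(rank) for rank in range(world_size)]
    return broadcast_from_rank0(states)
```

For example, `sync_states(4)` returns four identical state dicts in which ranks 1 to 3 hold copies of rank 0's weights rather than references to them. Writing into pre-allocated buffers (instead of replacing them) mirrors how a broadcast fills tensors that already exist on each rank.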

Contributor

@statelesshz statelesshz left a comment

Thank you @jq460494839 for adding this, LGTM!

@statelesshz
Contributor

cc @pacman100, @muellerzr and @BenjaminBossan, please take a look at this PR, thanks :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@BenjaminBossan
Member

@jq460494839 The import is not sorted correctly, could you please fix that?

@jq460494839
Contributor Author

fixed.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks, LGTM

Contributor

@statelesshz statelesshz left a comment

Thanks for iterating

@statelesshz
Contributor

Hi @muellerzr, would you mind having a look at this PR? It is linked to #1777.

Collaborator

@muellerzr muellerzr left a comment

Thanks, LG2M bar some nits.

elif is_xpu_available():
    device = torch.xpu.current_device()
else:
    raise RuntimeError("There are currently no available device found in ['XPU', 'CUDA', 'NPU']!")
Collaborator

Suggested change
- raise RuntimeError("There are currently no available device found in ['XPU', 'CUDA', 'NPU']!")
+ raise RuntimeError("There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'.")
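The fallback pattern in the reviewed snippet (try each accelerator backend, raise if none is found) can be sketched without torch or accelerate. `select_device_type` and `NO_DEVICE_MSG` are hypothetical names, and the availability callables stand in for checks such as `is_xpu_available()`. Only the tail of the real chain (`elif is_xpu_available()` / `else`) is visible in this thread, so the backend order passed in is an assumption. The error text is taken from the suggested change; the real code returns a device index via calls like `torch.xpu.current_device()`, whereas this sketch returns only a backend name.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical helper mirroring the fallback chain in this review thread:
# probe each accelerator backend in priority order and raise if none is usable.
# Availability checks are passed in, so the sketch runs without torch/accelerate.
NO_DEVICE_MSG = (
    "There are currently no available devices found, "
    "must be one of 'XPU', 'CUDA', or 'NPU'."
)


def select_device_type(backends: Sequence[Tuple[str, Callable[[], bool]]]) -> str:
    for name, is_available in backends:
        if is_available():
            return name  # first available backend wins
    raise RuntimeError(NO_DEVICE_MSG)
```

For example, `select_device_type([("npu", lambda: False), ("cuda", lambda: True), ("xpu", lambda: True)])` returns `"cuda"`, and an empty or all-unavailable list raises `RuntimeError` with the suggested message.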

Contributor Author

ok, fixed it.

@muellerzr
Collaborator

Can you run make style; make quality? This should fix the failing test. Thanks!

5 participants