Add process group documentation for SPMD #6469

Merged (1 commit) on Feb 5, 2024
jonb377 (Collaborator) commented Feb 5, 2024

As pointed out in #6465, our documentation is missing discussion of how to initialize the process group in SPMD execution mode.

A process group is required for distributed checkpointing and can be used with various other torch.distributed APIs. In SPMD, we don't allow process groups on the XLA backend, since the compiler is responsible for controlling the on-device collectives.
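
For illustration, here is a minimal sketch of what this could look like in user code. It assumes a CPU backend such as `gloo` is used in place of the XLA backend when SPMD is enabled; the backend choice, the helper names `choose_process_group_backend` and `init_cpu_process_group`, and the single-host address and port defaults are assumptions made for this example and are not taken from the PR.

```python
import os


def choose_process_group_backend(spmd_enabled: bool) -> str:
    """Hypothetical helper: pick a torch.distributed backend name.

    Assumption: under SPMD the compiler owns the on-device collectives, so
    the process group lives on a CPU backend ("gloo" here) rather than "xla".
    """
    return "gloo" if spmd_enabled else "xla"


def init_cpu_process_group(rank: int = 0, world_size: int = 1):
    """Hypothetical helper: create a gloo process group for host-side use,
    for example by distributed checkpointing."""
    # Imported lazily so the backend helper above works without torch.
    import torch.distributed as dist

    # Assumed single-host defaults for the env:// rendezvous; a real
    # multi-host job would set these to the coordinator's address and port.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    if not dist.is_initialized():
        dist.init_process_group(
            choose_process_group_backend(spmd_enabled=True),
            rank=rank,
            world_size=world_size,
        )
    return dist
```

Calling `init_cpu_process_group()` once per host process, before any checkpointing call, would then make a process group available to the other torch.distributed APIs while leaving the on-device collectives to the compiler.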

jonb377 requested a review from yeounoh on February 5, 2024, 19:10
jonb377 self-assigned this on Feb 5, 2024
yeounoh (Contributor) commented Feb 5, 2024

cc @vanbasten23, you might have already done this: did we add the SPMD + GPU documentation/section as well?

yeounoh (Contributor) left a comment

LGTM thanks!

jonb377 merged commit 732a1c7 into master on Feb 5, 2024
17 of 18 checks passed
jonb377 deleted the jonbolin/pg branch on February 5, 2024, 21:22
vanbasten23 (Collaborator) replied

> cc @vanbasten23, you might have already done this: did we add the SPMD + GPU documentation/section as well?

Good call. I haven't done that yet, but let me add some tomorrow.

amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024