Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a DDP graph capture issue #8489

Merged
merged 3 commits into from
Dec 17, 2024
Merged

Conversation

yaochengji
Copy link
Contributor

@yaochengji yaochengji commented Dec 12, 2024

There're issues in torch/xla DDP. #8396. After investigation, one of the issues are caused by the incorrect graph capture of the combination of inplace-update and non aten op.

Changes:

  1. Modify TryGetXlaTensor to make sure we can always have the most updated version of tensors
  2. Modify the learning rate of ddp test from 1e-4 to 1e-1 as 1e-4 is too small to verify the DDP correctness
  3. Remove the gradient_as_bucket_view = True requirement in all the docs and code
  4. Add the test_inplace_update.py file, note that test_non_aten_op_after_partial_update cannot get passed before the TryGetXlaTensor patch

@yaochengji
Copy link
Contributor Author

cc @qihqi @tengyifei

@jeffhataws
Copy link
Collaborator

@tengyifei will you enable TPU testing on this PR?

@tengyifei
Copy link
Collaborator

@yaochengji could you force push a commit? I just added a tpuci label but we need to retrigger the jobs to run TPU tests.

Copy link
Collaborator

@tengyifei tengyifei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@tengyifei tengyifei merged commit b1869a8 into pytorch:master Dec 17, 2024
12 checks passed
@yaochengji
Copy link
Contributor Author

@tengyifei thanks for your review. Do I still need to force push as it is already merged?

@tengyifei
Copy link
Collaborator

oops, i merged too soon :) looks like tip-of-tree build is green, so nothing needed on your part now!

@jeffhataws
Copy link
Collaborator

@yaochengji will you add this change to #8455 for cherry-pick to 2.6? Let me know if I can help.

@yaochengji
Copy link
Contributor Author

@jeffhataws it's added, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants