Fix a DDP graph capture issue #8489
Conversation
cc @qihqi @tengyifei
Force-pushed from 0352fb3 to 31ca070
Force-pushed from 1b66afe to 5b6917e
@tengyifei will you enable TPU testing on this PR?
@yaochengji could you force push a commit? I just added a
Thanks!
@tengyifei thanks for your review. Do I still need to force push as it is already merged?
oops, i merged too soon :) looks like tip-of-tree build is green, so nothing needed on your part now!
@yaochengji will you add this change to #8455 for cherry-pick to 2.6? Let me know if I can help. |
@jeffhataws it's added, thanks. |
There are known issues with DDP in torch/xla (#8396). After investigation, one of them turns out to be caused by incorrect graph capture when an in-place update is combined with a non-ATen op, as sketched below.
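A minimal sketch of the problematic pattern (an illustration, not the PR's actual test; `xm.all_reduce` is used here only as an example of a non-ATen op lowered by torch_xla):

```python
# Illustration only: partially update a tensor in place, then feed it to a
# non-ATen op. Before the TryGetXlaTensor fix, the captured graph could read
# the tensor's value from before the in-place update.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

t = torch.zeros(4, device=device)
t[:2] += 1.0                            # partial in-place update
out = xm.all_reduce(xm.REDUCE_SUM, t)   # non-ATen op lowered by torch_xla
xm.mark_step()

# With a single replica the all-reduce is an identity, so the result should
# reflect the in-place update: [1.0, 1.0, 0.0, 0.0].
print(out.cpu().tolist())
```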
Changes:
- Patch `TryGetXlaTensor` to make sure we always have the most updated version of tensors.
- Remove the `gradient_as_bucket_view = True` requirement in all the docs and code (see the sketch after this list).
- Add a test, `test_non_aten_op_after_partial_update`, which cannot pass before the `TryGetXlaTensor` patch.
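For context, a hedged sketch of DDP usage on torch/xla once `gradient_as_bucket_view=True` is no longer required; the toy model and the `_mp_fn` launcher here are illustrative, not taken from the PR:

```python
# Sketch of DDP on torch/xla without gradient_as_bucket_view=True.
# Assumes the usual xla:// init method and the xmp.spawn launcher.
import torch
import torch.distributed as dist
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # noqa: F401  (registers the "xla" backend)
import torch_xla.distributed.xla_multiprocessing as xmp
from torch.nn.parallel import DistributedDataParallel as DDP


def _mp_fn(index):
    dist.init_process_group("xla", init_method="xla://")
    device = xm.xla_device()
    model = nn.Linear(8, 8).to(device)
    # Previously the docs asked for DDP(model, gradient_as_bucket_view=True);
    # with the TryGetXlaTensor fix the default (False) also captures the
    # correct graph.
    ddp_model = DDP(model)
    loss = ddp_model(torch.randn(4, 8, device=device)).sum()
    loss.backward()
    xm.mark_step()


if __name__ == "__main__":
    xmp.spawn(_mp_fn)
```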