
Include token as part of the input/output tuple in all-gather and reduce-scatter #7338

Closed
wants to merge 4 commits into from

Conversation


@jeffhataws jeffhataws commented Nov 28, 2023

(Replaced by #7677)

This is a follow-up change to #5740 that includes the token as part of the input/output tuple in all-gather and reduce-scatter. The change cleans up the interface to match all-reduce and also enables use of the XLA token type. Per discussions in the previous PR, we will open an RFC discussion on openxla-discuss.
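As a conceptual illustration of the change (this is a hedged sketch in plain Python, not the real XLA API: `all_gather`, `reduce_scatter`, and `Token` here are hypothetical stand-ins), threading a token through each collective's inputs and outputs creates a data dependency that pins the execution order of CC ops, the same way the all-reduce token already does:

```python
# Conceptual sketch (assumption: NOT the actual XLA implementation).
# Each collective consumes a token and produces a new one; a later
# collective that consumes the new token is therefore ordered after it.
class Token:
    def __init__(self, history=()):
        self.history = tuple(history)

def all_gather(tensors, token):
    # Hypothetical op: returns (outputs, new_token). The new token
    # depends on this op, so consumers of it run after this op.
    return [t * 2 for t in tensors], Token(token.history + ("all-gather",))

def reduce_scatter(tensors, token):
    return [t / 2 for t in tensors], Token(token.history + ("reduce-scatter",))

t = Token()
xs, t = all_gather([1, 2], t)
ys, t = reduce_scatter(xs, t)
# t.history records the order the token chain enforces.
```

The key point is only the shape of the interface: token in, token out, matching all-reduce.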

…tter

Imported from GitHub PR openxla#5740

This PR adds tuple input support to all-gather and reduce-scatter. This is a revival of part of tensorflow/tensorflow#58377 and to be used in conjunction with pytorch/xla#5624 .

In FSDP, different layers' weights need to be all-gathered/reduce-scattered during training. If some layers are small, multiple layers' weights can be aggregated for more efficient data transfer (the same concept as bucket_cap_mb in DDP). With the existing all-gather and reduce-scatter in PyTorch/XLA, you would have to do the bucketing and decomposing outside of the operation. This PR enables multiple different tensors to be all-gathered/reduce-scattered while keeping the original tensor shapes, so that bucketing and decomposing optimizations can happen inside the operation.
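The bucketing/decomposing idea can be sketched as follows (an illustrative NumPy toy, not the XLA implementation; `fake_all_gather` is a hypothetical stand-in that simulates the collective locally for two replicas):

```python
# Illustrative sketch (assumption: NOT the real collective) of coalescing
# several small tensors into one bucket so a single transfer moves them all.
import numpy as np

def bucket(tensors):
    """Flatten and concatenate tensors into one 1-D bucket."""
    shapes = [t.shape for t in tensors]
    flat = np.concatenate([t.ravel() for t in tensors])
    return flat, shapes

def unbucket(flat, shapes):
    """Split a 1-D bucket back into tensors of the original shapes."""
    out, offset = [], 0
    for s in shapes:
        n = int(np.prod(s))
        out.append(flat[offset:offset + n].reshape(s))
        offset += n
    return out

def fake_all_gather(flat, world_size=2):
    # Stand-in for the real collective: every replica contributes the
    # same data here, so the result is world_size stacked copies.
    return np.concatenate([flat] * world_size)

# Two small "layer weights" coalesced into one transfer.
w1 = np.arange(4.0).reshape(2, 2)
w2 = np.arange(3.0)
flat, shapes = bucket([w1, w2])
gathered = fake_all_gather(flat)
# Decompose each replica's contribution back to the original shapes.
per_replica = [unbucket(chunk, shapes) for chunk in np.split(gathered, 2)]
```

With tuple support in the op itself, the `bucket`/`unbucket` steps move inside the collective, so callers pass the original tensors directly.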

The original PR had token support, like the token used for all-reduce, to ensure ordering between CC ops. That will be a separate PR if needed.

Copybara import of the project:

--
7ea1159 by Junmin Hao <[email protected]>:

Add Tuple input and token support to all-gather and reduce-scatter.

Committer: Junmin Hao <[email protected]>

--
cdb873e by Junmin Hao <[email protected]>:

lint fix

--
aad3521 by Jeffrey Huynh <[email protected]>:

Fix hlo_verifier_test failure due to changed expectation

--
32e8145 by Jeffrey Huynh <[email protected]>:

Separate the token change out into a separate PR with RFC.

--
b301c2a by Jeffrey Huynh <[email protected]>:

Change *WithToken tests to *WithTuple

--
5890278 by Jeffrey Huynh <[email protected]>:

Fix missing parenthesis

Merging this change closes openxla#5740

COPYBARA_INTEGRATE_REVIEW=openxla#5740 from jeffhataws:ag_rs_coalesce_revived 14e09f0
PiperOrigin-RevId: 573976449
@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Nov 28, 2023
@ddunl ddunl added kokoro:force-run Forces CI to rerun and removed kokoro:force-run Forces CI to rerun labels Nov 28, 2023
@kamaljeeti kamaljeeti requested a review from ezhulenev November 29, 2023 07:08
@ddunl (Member) commented Nov 29, 2023

I think if you rebase the CI will run as expected, sorry about that

@kamaljeeti (Contributor) commented:

Hi @jeffhataws, Can you please check @ddunl comments? Thanks.

@jeffhataws (Contributor, Author) commented:

> Hi @jeffhataws, Can you please check @ddunl comments? Thanks.

Rebased in another branch, so switching to #7677.

@jeffhataws (Contributor, Author) commented:

Switching to #7677

@jeffhataws jeffhataws closed this Dec 11, 2023