#10415: add new full-tensor bidirectional mode to all-gather #10421
Merged
SeanNijjar force-pushed the snijjar/issue-8574-push branch 3 times, most recently from 8d40bbb to d7dcf36 (July 19, 2024 20:35)
SeanNijjar requested review from eyonland, arakhmati, cfjchu, xanderchin, TT-BrianLiu and ayerofieiev-tt as code owners (July 19, 2024 20:35)
SeanNijjar force-pushed the snijjar/issue-8574-push branch from 0219a91 to 73ed18f (July 21, 2024 19:07)
arakhmati approved these changes (Jul 22, 2024)
SeanNijjar force-pushed the snijjar/issue-8574-push branch from 73ed18f to 5d605ec (July 22, 2024 15:53)
arakhmati pushed a commit that referenced this pull request (Jul 23, 2024)
The new bidirectional all-gather mode is being added as a prerequisite to all-gather + matmul fusion. It also improves performance, particularly for smaller all-gathers where the tensor fits in a single packet per channel per ring index, because fewer end-to-end latencies add up. Larger tensors see a performance degradation with this mode, so for now it is only enabled by default for small tensors; for other sizes it can be opted into.
For a given input tensor, the new mode sends the full tensor in both directions around the ring, but only halfway around the ring in each direction. By contrast, the prior default mode (SPLIT_TENSOR) sends half of the input tensor in each direction, but the full way around the ring.
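A minimal Python sketch of the two data-movement patterns described above, useful for checking that both deliver every device's input to every other device and for comparing how far data travels. It only models which fraction of which input reaches which device, not packets, channels, or workers. The function names are hypothetical, and the choice of which direction covers the extra device when the ring size is even is an assumption, not taken from the implementation.

```python
from collections import defaultdict
from fractions import Fraction


def split_tensor_deliveries(n):
    """SPLIT_TENSOR pattern: each device sends half of its input in each
    direction, the full way around the ring (n - 1 hops).

    Returns (received, max_hops); received[dst][src] is the fraction of
    device src's input that reached device dst.
    """
    received = defaultdict(lambda: defaultdict(Fraction))
    for src in range(n):
        for direction in (+1, -1):  # +1 = clockwise, -1 = counter-clockwise
            for hop in range(1, n):  # visits all n - 1 other devices
                dst = (src + direction * hop) % n
                received[dst][src] += Fraction(1, 2)
    return received, n - 1


def full_tensor_deliveries(n):
    """Full-tensor bidirectional pattern: each device sends its whole input
    in each direction, but only halfway around the ring.

    Hypothetical assumption: when n - 1 is odd, the clockwise direction
    covers the one extra device.
    """
    cw_hops = n // 2         # ceil((n - 1) / 2)
    ccw_hops = (n - 1) // 2  # floor((n - 1) / 2)
    received = defaultdict(lambda: defaultdict(Fraction))
    for src in range(n):
        for direction, hops in ((+1, cw_hops), (-1, ccw_hops)):
            for hop in range(1, hops + 1):
                dst = (src + direction * hop) % n
                received[dst][src] += Fraction(1)
    return received, cw_hops


def is_complete_all_gather(received, n):
    """True if every device received exactly one full copy of every other input."""
    return all(
        received[dst][src] == 1
        for dst in range(n)
        for src in range(n)
        if src != dst
    )


if __name__ == "__main__":
    n = 8
    for name, deliver in (("SPLIT_TENSOR", split_tensor_deliveries),
                          ("full-tensor bidirectional", full_tensor_deliveries)):
        received, max_hops = deliver(n)
        print(f"{name}: complete={is_complete_all_gather(received, n)}, max_hops={max_hops}")
```

On an 8-device ring this reports a maximum of 7 hops for SPLIT_TENSOR and 4 for the full-tensor mode; the price is that each transfer carries the whole input instead of half of it.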
This new mode is not enabled yet for sharded all-gather.
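To illustrate how this trade-off can favour the new mode for small tensors and SPLIT_TENSOR for large ones, here is a deliberately crude latency-plus-bandwidth model as a Python sketch. It is not the author's analysis: it assumes transfer time is (hops times per-hop latency) plus (data on the busiest link divided by link bandwidth), it ignores packetization, channels, and worker scheduling, and the default parameter values are made-up placeholders rather than measurements. The function name is hypothetical.

```python
def modeled_time_us(n, tensor_bytes, mode,
                    hop_latency_us=1.0, link_bytes_per_us=10_000.0):
    """Toy estimate of all-gather time on an n-device ring (hypothetical model).

    tensor_bytes is the size of one device's input tensor; the default
    latency and bandwidth values are placeholders, not measured numbers.
    """
    if mode == "split":
        # Half of each input travels n - 1 hops each way, so every link
        # direction carries (n - 1) / 2 inputs' worth of data.
        hops = n - 1
        busiest_link_inputs = (n - 1) / 2
    elif mode == "full":
        # Each whole input travels at most n // 2 hops in one direction, so the
        # busier direction's links each carry n // 2 whole inputs.
        hops = n // 2
        busiest_link_inputs = n // 2
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return hops * hop_latency_us + busiest_link_inputs * tensor_bytes / link_bytes_per_us


if __name__ == "__main__":
    for size in (2_048, 64 * 1024 * 1024):
        split = modeled_time_us(8, size, "split")
        full = modeled_time_us(8, size, "full")
        print(f"{size} bytes: split={split:.1f} us, full={full:.1f} us")
```

For an 8-device ring the full-tensor mode comes out ahead for the 2 KB input and behind for the 64 MiB one. With an odd ring size this toy model never favours SPLIT_TENSOR, which shows how much it leaves out.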
Ticket
Link to Github Issue
Problem description
When fusing all-gather with matmul, the current bidirectional all-gather data-movement scheme would require extremely tight coupling between the matmul and the all-gather. This alternative bidirectional approach removes that coupling, because full chunks can be passed to each matmul instead of irregular slice shapes that depend on the CCL worker schedule.
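A minimal Python sketch of why full chunks simplify fusion, under stated assumptions: the gathered activation is the row-wise concatenation of per-device shards, each device multiplies every shard by the same weight as soon as that whole shard arrives, and shards arrive nearest-ring-neighbour first, as the full-tensor bidirectional pattern suggests. All function names are hypothetical, the arrival order is an assumption rather than the actual CCL worker schedule, and plain lists stand in for device tensors.

```python
def full_tensor_arrival_order(n, device):
    """Hypothetical order in which `device` receives the other devices' whole
    input shards under the full-tensor bidirectional pattern.

    Assumption: at step k it receives shard (device - k) travelling clockwise
    and, while still within the counter-clockwise range, shard (device + k).
    """
    cw_hops, ccw_hops = n // 2, (n - 1) // 2
    order = []
    for k in range(1, cw_hops + 1):
        order.append((device - k) % n)
        if k <= ccw_hops:
            order.append((device + k) % n)
    return order


def matmul(a, b):
    """Plain-Python matrix product of row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fused_all_gather_matmul(shards, weight, device):
    """Compute gather(shards) @ weight on `device`, one whole shard at a time.

    Because every arrival is a complete shard, each partial product is a
    complete row block of the output, so no slice bookkeeping is needed.
    """
    n = len(shards)
    out_blocks = {device: matmul(shards[device], weight)}  # local shard: no transfer
    for src in full_tensor_arrival_order(n, device):
        out_blocks[src] = matmul(shards[src], weight)
    # Reassemble the row blocks in global device order.
    return [row for i in range(n) for row in out_blocks[i]]
```

Per the problem description, with SPLIT_TENSOR each arrival would instead be a partial slice whose shape depends on the worker schedule, which the fused matmul would have to track.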
What's changed
Setup work for enabling all_gather + matmul fusion
Checklist