
#10415: add new full-tensor bidirectional mode to all-gather #10421

Merged 1 commit into main on Jul 22, 2024

Conversation

SeanNijjar
Contributor

@SeanNijjar commented on Jul 17, 2024

The new bidirectional all-gather mode is being added as a prerequisite to all-gather + matmul fusion. This change also improves performance, particularly for smaller all-gathers, because fewer end-to-end latencies accumulate when the tensor is small enough to fit in a single packet per channel, per ring index. Larger tensors see a performance degradation with this mode, so for now it is enabled by default only for small tensors; otherwise it can be opted into.

The new mode sends the full input tensor in both directions around the ring, but only halfway around the ring in each direction. This contrasts with the prior default mode (SPLIT_TENSOR), which sends half of the input tensor in each direction, but all the way around the ring.

This new mode is not enabled yet for sharded all-gather.
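The tradeoff between the two modes described above can be sketched with a toy latency model. This is a hypothetical illustration, not the tt-metal API: function names and the linear cost model are assumptions, chosen only to show why the shorter path of the full-tensor mode helps small tensors while the halved payload of SPLIT_TENSOR helps large ones.

```python
import math

def full_tensor_hops(num_devices: int) -> int:
    """FULL_TENSOR mode: each chunk travels whole, but only about
    halfway around the ring in each direction."""
    return math.ceil((num_devices - 1) / 2)

def split_tensor_hops(num_devices: int) -> int:
    """SPLIT_TENSOR mode: each half-chunk travels the full way around
    the ring in one direction, i.e. N-1 hops."""
    return num_devices - 1

def latency_estimate(hops: int, payload_units: float,
                     per_hop_latency: float = 1.0,
                     per_unit_bw_cost: float = 0.1) -> float:
    """Toy linear model (an assumption): each hop pays a fixed latency
    plus a bandwidth cost proportional to the payload it forwards."""
    return hops * (per_hop_latency + payload_units * per_unit_bw_cost)

ring_size = 8
# Small tensor (1 unit): FULL_TENSOR's shorter path wins because
# per-hop latency dominates.
small_full = latency_estimate(full_tensor_hops(ring_size), 1.0)
small_split = latency_estimate(split_tensor_hops(ring_size), 0.5)
# Large tensor (100 units): SPLIT_TENSOR's halved per-direction
# payload can win despite the longer path.
large_full = latency_estimate(full_tensor_hops(ring_size), 100.0)
large_split = latency_estimate(split_tensor_hops(ring_size), 50.0)
print(small_full, small_split, large_full, large_split)
```

Under this model, for an 8-device ring the full-tensor mode traverses 4 hops instead of 7, which matches the PR's observation that the new mode helps small all-gathers but degrades large ones.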

Ticket

Link to Github Issue

Problem description

When fusing all-gather with matmul, the current bidirectional all-gather data movement scheme would require extremely tight coupling between matmul and all-gather. This alternative bidirectional approach removes that coupling, because full chunks can be passed to each matmul instead of irregular slice shapes that depend on the CCL worker schedule.
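The benefit for fusion can be illustrated with a small sketch. This is hypothetical code, not the tt-metal API: the function name and the use of NumPy stand in for the device-side kernels. The point is that when each ring step delivers one complete input chunk, the fused matmul can consume it immediately, and the result equals the unfused "gather everything, then matmul" computation without any schedule-dependent re-slicing.

```python
import numpy as np

def fused_allgather_matmul_full_chunks(chunks, weight):
    """Hypothetical fusion sketch: each ring step delivers one whole
    input chunk, so the matmul consumes it as-is, independent of any
    CCL worker slice boundaries."""
    outputs = []
    for chunk in chunks:                 # arrival order = ring schedule
        outputs.append(chunk @ weight)   # whole-chunk matmul, no re-slicing
    return np.concatenate(outputs, axis=0)

# One chunk per ring index (deterministic toy data).
chunks = [np.arange(8, dtype=float).reshape(2, 4) + i for i in range(4)]
w = np.arange(12, dtype=float).reshape(4, 3)
unfused = np.concatenate(chunks, axis=0) @ w
fused = fused_allgather_matmul_full_chunks(chunks, w)
assert np.allclose(unfused, fused)
```

With SPLIT_TENSOR, by contrast, each arriving piece is a half-chunk whose shape depends on the worker schedule, so the consuming matmul would need matching slice logic; full chunks sidestep that entirely.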

What's changed

Setup work for enabling all_gather + matmul fusion

Checklist

@SeanNijjar force-pushed the snijjar/issue-8574-push branch 3 times, most recently from 8d40bbb to d7dcf36 on July 19, 2024 20:35
@SeanNijjar force-pushed the snijjar/issue-8574-push branch from 0219a91 to 73ed18f on July 21, 2024 19:07
@SeanNijjar force-pushed the snijjar/issue-8574-push branch from 73ed18f to 5d605ec on July 22, 2024 15:53
@SeanNijjar merged commit 4e4eedc into main on Jul 22, 2024
5 checks passed
@SeanNijjar deleted the snijjar/issue-8574-push branch on July 22, 2024 15:56
arakhmati pushed a commit that referenced this pull request Jul 23, 2024