#0: Remove hardcoded grid width in all_gather and skip test_sharded_matmul test when the device grid size is too small #16315

Merged
merged 1 commit into from
Dec 26, 2024

Contributor

@tt-aho tt-aho commented Dec 26, 2024

Ticket

Link to Github Issue

Problem description

The all-gather op hardcoded a device grid width of 8 instead of deriving it from the device grid size.
The sharded matmul unit test (test_sharded_matmul) did not check whether the shard config actually fits on the device grid.

What's changed

All-gather now reads the grid width from the device instead of using a fixed value.
The sharded matmul test is skipped when the device grid is too small for the shard config.
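
As a rough illustration of the two changes (not the code from this PR), the sketch below derives the width from a grid object and checks whether a shard grid fits before running a test. `GridSize`, `all_gather_grid_width`, and `skip_reason` are hypothetical names introduced only for this example; they stand in for whatever the device and test code actually expose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GridSize:
    """Hypothetical stand-in for a device compute grid (x columns, y rows)."""
    x: int
    y: int


def all_gather_grid_width(device_grid: GridSize) -> int:
    # Assumption: the width was previously a constant 8; here it is read
    # from the device grid so smaller grids are handled as well.
    return device_grid.x


def skip_reason(shard_grid: GridSize, device_grid: GridSize) -> Optional[str]:
    """Return a skip message if the shard grid does not fit, else None."""
    if shard_grid.x > device_grid.x or shard_grid.y > device_grid.y:
        return (
            f"shard grid {shard_grid.x}x{shard_grid.y} exceeds "
            f"device grid {device_grid.x}x{device_grid.y}"
        )
    return None
```

In a pytest-based test, a non-`None` result from `skip_reason` could be passed to `pytest.skip(...)` at the top of the test body, so the test is reported as skipped rather than failed on devices with a smaller grid.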

Checklist

…atmul test when the device grid size is too small
Contributor

@SeanNijjar SeanNijjar left a comment

Thanks for the fix! Lgtm. You're going to also want to run TG frequent/nightly, t3k frequent, t4k and model perf.

Contributor

@blozano-tt blozano-tt left a comment

Thank you!

@tt-aho tt-aho merged commit 9879b37 into main Dec 26, 2024
42 of 44 checks passed
@tt-aho tt-aho deleted the aho/ci branch December 26, 2024 20:26
4 participants