-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for auto packing ratio #683
Conversation
206000d
to
36dc1a0
Compare
2bf4366
to
d48fb97
Compare
bb27d78
to
aeffb4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for adding this!!
Will hold off on approval so this can get another set of eyes!
Also left a couple comments (nothing major) along with a suggestion for an alternative way to search for the optimal packing ratio. I think re-using my brute force approach might not be the best way to go. If you agree, that will require more of an overhaul of this code, unfortunately. But potentially better to do that than simply inherit my hasty decision :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
Two more manual tests I would like to see before merging:
(1) a short training run without packing, before and after this PR. These should be identical.
(2) a short training run with a set packing ratio, before and after this PR. These should be identical.
6cc7bc5
to
f899b69
Compare
f899b69
to
d88cdcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for one determinism related comment
Co-authored-by: Daniel King <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with one more question relating to the randomness
Adds 'auto' packing ratio for finetuning.
If 'auto' is specified, packing will be profiled for the dataloader configuration across a maximum of 20 packing ratios until the highest packing ratio with 0 waste is found.
If there are multiple ranks, we take the minimum 'auto' packing ratio across all ranks.
Testing
- before: finetune-auto-pack-test-baseline2-zptNpN , after: finetune-ratio-no-pack-branch-test-8N5I1y
- before: finetune-ratio-3-pack-test-kW8gp6 , after: finetune-ratio-3-pack-with-auto-test-YVP00E