-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor test_random
to minimize collective calls
#1677
Conversation
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
1 similar comment
Thank you for the PR! |
Thank you for the PR! |
1 similar comment
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
test_random
to minimize collective calls
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1677 +/- ##
=======================================
Coverage 92.13% 92.13%
=======================================
Files 83 83
Lines 12165 12173 +8
=======================================
+ Hits 11208 11216 +8
Misses 957 957
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. You skipped the median tests. Is this intentional?
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Yes, I skipped the |
Thank you for the PR! |
* debugging * fix misinterpretation of dtype * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * replace numpy() calls with alternative checks * debugging * debugging * debugging randint * debugging * cast ints to float in statistical ops * bypass numpy call l. 197 * bypass more numpy calls, skip median checks * bypass more numpy calls, skip median checks * bypass numpy calls wherever possible * reinstate median checks * skip ht.median if split>0 * skip all ht.median * Revert "skip all ht.median" This reverts commit 1241454. * Revert "skip ht.median if split>0" This reverts commit 4da8c93. * Revert "reinstate median checks" This reverts commit bf50914. (cherry picked from commit 4b3e570)
Successfully created backport PR for |
* debugging * fix misinterpretation of dtype * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * replace numpy() calls with alternative checks * debugging * debugging * debugging randint * debugging * cast ints to float in statistical ops * bypass numpy call l. 197 * bypass more numpy calls, skip median checks * bypass more numpy calls, skip median checks * bypass numpy calls wherever possible * reinstate median checks * skip ht.median if split>0 * skip all ht.median * Revert "skip all ht.median" This reverts commit 1241454. * Revert "skip ht.median if split>0" This reverts commit 4da8c93. * Revert "reinstate median checks" This reverts commit bf50914. (cherry picked from commit 4b3e570) Co-authored-by: Claudia Comito <[email protected]>
Due Diligence
Description
test_random
has been giving us problems in connection to.numpy()
calls (aka Allgather/Allgatherv and copying to CPU) before.As far as I can tell, it isn't any particular instance of "allgathering" that doesn't work. On the AMD runner (2-process GPU tests), since this Monday,
test_random
has been failing consistently around the 10th numpy() call in the module.I have refactored
test_random
to gather and copy only when absolutely necessary. It now gathers/copies to CPU only 8 times, as opposed to 47 in the legacy implementation.Issue/s resolved: #1682
Changes proposed:
Type of change
Bug fix (non-breaking change which fixes an issue)
Memory requirements
NA
Performance
NA
Does this change modify the behaviour of other functions? If so, which?
no