Segfault in Blackhole didt stress workload #12608
Comments
While doing the Wormhole didt testing, we also hit segfaults, which stopped after rebasing to a newer version of main. This happened about a month ago, and we haven't seen them since.
After rebasing to main
Adding
Pushed update to
Closing this issue because syseng confirmed that the workload is now passing for 100k iterations after some changes to voltage regulator configurations were reverted and then re-applied.
on
abhullar/didt-mm
syseng reported a segfault when bumping up to 50k iterations for: pytest models/experimental/falcon_7b/tests/test_reproduce_hang_matmul.py -k ff1-hang
Ran this on yyzo-bh-05 and yyzo-bh-06 for 100k iterations multiple times without any segfault.
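For anyone re-checking this on other machines, a minimal sketch of a wrapper that reruns a command and flags a segfault is below. It is not part of the repo: `classify_exit` and `run_repeatedly` are hypothetical helper names, and it assumes a POSIX host where a process killed by SIGSEGV shows up as a negative return code (or 128 + signal number when launched through a shell). How the 50k/100k iteration count is configured for the test is not shown in this issue, so the sketch only repeats whole process runs.

```python
import signal
import subprocess
import sys


def classify_exit(returncode: int) -> str:
    """Map a process return code to 'pass', 'segfault', or 'fail'."""
    if returncode == 0:
        return "pass"
    # subprocess reports death-by-signal as -signum on POSIX;
    # a shell wrapper reports it as 128 + signum instead.
    if returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "segfault"
    return "fail"


def run_repeatedly(cmd, runs):
    """Run cmd up to `runs` times; return (run_index, status) for the
    first non-passing run, or None if every run passed."""
    for i in range(runs):
        result = subprocess.run(cmd)
        status = classify_exit(result.returncode)
        if status != "pass":
            return i, status
    return None
```

A possible invocation with the command from this issue would be `run_repeatedly(["pytest", "models/experimental/falcon_7b/tests/test_reproduce_hang_matmul.py", "-k", "ff1-hang"], runs=3)`, where `runs=3` is an arbitrary illustrative value.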