Only write to HBM at the last iteration. #8393
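The change follows a common Pallas pattern: accumulate partial results in an on-chip (VMEM) scratch buffer across grid iterations and copy them to the output only on the final iteration, rather than writing out at every step. The sketch below is a minimal, hypothetical illustration of that pattern, not the actual paged attention kernel touched by this PR; names such as `sum_kernel`, `acc_ref`, and `block_sum` are made up, and it assumes a recent JAX/Pallas TPU API.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl
from jax.experimental.pallas import tpu as pltpu


def sum_kernel(x_ref, o_ref, acc_ref):
    # Grid position along the reduction axis.
    i = pl.program_id(0)

    # Zero the VMEM accumulator on the first iteration.
    @pl.when(i == 0)
    def _():
        acc_ref[...] = jnp.zeros(acc_ref.shape, acc_ref.dtype)

    # Accumulate the current input block entirely in VMEM.
    acc_ref[...] += x_ref[...]

    # Store into the output ref only on the last iteration, so the
    # result is written out once instead of on every grid step.
    @pl.when(i == pl.num_programs(0) - 1)
    def _():
        o_ref[...] = acc_ref[...]


def block_sum(x):
    # x: (num_blocks, b, b) -> (b, b), summed over the leading axis.
    n, b, _ = x.shape
    return pl.pallas_call(
        sum_kernel,
        grid=(n,),
        in_specs=[pl.BlockSpec((None, b, b), lambda i: (i, 0, 0))],
        out_specs=pl.BlockSpec((b, b), lambda i: (0, 0)),
        out_shape=jax.ShapeDtypeStruct((b, b), x.dtype),
        scratch_shapes=[pltpu.VMEM((b, b), x.dtype)],
    )(x)
```

For example, `block_sum(jnp.ones((4, 128, 128), jnp.float32))` sums the four 128x128 blocks while only the final grid step stores into the output ref.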
Conversation
The TPU CI failure seems to be unrelated to this PR: I ran the test
The TPU CI failure should be resolved if you rebase; I disabled that test for now.
Force-pushed from 1c17a71 to d52c6f2
Thanks Jack for the info!
The TPU test failure is very strange. The test fails on my TPU v4, but it succeeded on my v5e VM, which uses an older version of torch and torch_xla:
It seems the error is due to OOM, despite the confusing error message, even with
Test plan:
root@t1v-n-f3643994-w-0:/workspaces/persist# python pytorch/xla/test/test_tpu_paged_attention_kernel.py 2>&1 | tee ~/out.txt