Update non_xla attention to properly support paged_attention dynamo code path #7022

wonjoolee95 · 2024-05-02T23:29:03Z

Update non_xla attention to properly support paged_attention dynamo code path
Fix the original broken dynamo unit tests with paged_attention

Test plan:

root@1fdc3324aeef:/pytorch/xla# python test/test_pallas.py PallasTest.test_paged_attention_wrapper_with_dynamo
.
----------------------------------------------------------------------
Ran 1 test in 1.798s

OK

+ TPU CI


alanwaketan

LGTM.

alanwaketan · 2024-05-03T00:38:40Z

torch_xla/experimental/custom_kernel.py

-  attn_output = attn_weight @ v
-  return attn_output
+  # Return original shape of q.
+  return torch.empty_like(q)


Do you know if this actually initializes anything? I hope not.

Yeah, it is worth checking. Above, we have a warning that q should be on the meta device, and I think running ops on a meta tensor will not allocate any device memory.

According to https://pytorch.org/docs/stable/generated/torch.empty_like.html, empty_like returns an uninitialized tensor. Along with the meta tensor check above, I think we should be good.
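For anyone who wants to verify this locally, here is a minimal sketch of the behavior discussed above. It assumes only that PyTorch is installed; the shape of `q` is hypothetical and not taken from the PR. It shows that `torch.empty_like` on a meta tensor returns another meta tensor with the same shape and dtype, so only metadata is propagated.

```python
import torch

# Hypothetical query shape, chosen only for illustration.
q = torch.empty(4, 8, 128, device="meta")

# empty_like copies the shape, dtype, and device of its input.
# On the meta device the result carries metadata only, with no real data behind it.
out = torch.empty_like(q)

print(out.is_meta)             # True
print(out.shape == q.shape)    # True
print(out.dtype == q.dtype)    # True
```

This is the property the dynamo tracing path relies on here: the non-XLA function only needs to report the output shape, which matches the shape of `q`.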

wonjoolee95 · 2024-05-03T02:03:09Z

Thanks for the reviews, I'll go ahead and merge this as the CIs (including TPU CI) are all green.

wonjoolee95 added 2 commits May 2, 2024 23:25

Update non_xla attention to properly support paged_attention dynamo c…

1cd9adb

…ode path

Run linter

9d75282

wonjoolee95 requested review from alanwaketan and JackCaoG May 2, 2024 23:29

JackCaoG added the tpuci label May 2, 2024

JackCaoG approved these changes May 2, 2024


alanwaketan approved these changes May 3, 2024


wonjoolee95 merged commit 2bce3f8 into master May 3, 2024
21 of 22 checks passed
