[PyTorch] Failed running call_method movedim #1235

Open
RedRAINXXXX opened this issue Oct 10, 2024 · 2 comments

Version
Name: flash-attn
Version: 2.6.3

Name: transformer-engine
Version: 1.11.0+c27ee60

Name: flashattn-hopper
Version: 3.0.0b1

Bug report
The bug occurs in this function:

@jit_fuser
def flash_attn_fwd_out_correction(out, out_per_step, seq_dim, softmax_lse, softmax_lse_per_step):
"""Merge partial outputs of each step in Attention with context parallelism"""
softmax_lse_corrected_exp = torch.exp(softmax_lse_per_step - softmax_lse).movedim(2, seq_dim)
softmax_lse_corrected_exp = softmax_lse_corrected_exp.unsqueeze(-1)
out_corrected = out_per_step * softmax_lse_corrected_exp
out.add_(out_corrected)
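
For context, one reading of the snippet above: it rescales each step's partial output by exp(softmax_lse_per_step - softmax_lse) and accumulates it into out, and the movedim(2, seq_dim) call implicitly assumes softmax_lse has a third (sequence) dimension to move. A minimal sketch under those assumptions; the layouts, sizes, and seq_dim value below are illustrative only, not taken from the report:

import torch

# Assumed layouts (illustrative only): softmax_lse* as (batch, num_heads, seq_len),
# out* as (batch, seq_len, num_heads, head_dim), with seq_dim == 1.
batch, heads, seq, hdim, seq_dim = 2, 4, 8, 16, 1
out = torch.zeros(batch, seq, heads, hdim)
out_per_step = torch.randn(batch, seq, heads, hdim)
softmax_lse = torch.randn(batch, heads, seq)
softmax_lse_per_step = torch.randn(batch, heads, seq)

# Same arithmetic as the quoted function; it only works when softmax_lse is 3-D,
# so that dim 2 (the sequence dim) exists and can be moved to position seq_dim.
corr = torch.exp(softmax_lse_per_step - softmax_lse).movedim(2, seq_dim).unsqueeze(-1)
out.add_(out_per_step * corr)
print(out.shape)  # torch.Size([2, 8, 4, 16])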

The relevant part of the traceback is:
torch._dynamo.exc.TorchRuntimeError: Failed running call_method movedim(*(FakeTensor(..., device='cuda:4', size=(36, 13056)), 2, 1), **{}):

ksivaman (Member) commented Oct 10, 2024

Which PyTorch version are you using? I'm wondering if this is related to #1217; could you try that fix?

RedRAINXXXX (Author) commented Oct 11, 2024

My torch version is:
Version: 2.4.1+cu124

I've tried that (setting NVTE_TORCH_COMPILE to 1), but it didn't work; removing the @jit_fuser decorator didn't help either. :(

The size of softmax_lse and softmax_lse_per_step is (36, 13056), i.e. they are only 2-D, which is why the movedim(2, seq_dim) call fails.
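
As a minimal check outside torch.compile (the shape is copied from the trace above; everything else here is hypothetical), the same call already fails on a plain 2-D tensor, because dimension index 2 does not exist:

import torch

# softmax_lse is reported as (36, 13056): a 2-D tensor, so there is no dim 2 to move.
softmax_lse = torch.randn(36, 13056)
softmax_lse_per_step = torch.randn(36, 13056)

torch.exp(softmax_lse_per_step - softmax_lse).movedim(2, 1)
# IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)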
