Question about 'attention bias not supported for flash attention' #230

Open
amitaie opened this issue Sep 14, 2023 · 2 comments
Comments


amitaie commented Sep 14, 2023

assert not exists(attn_bias), 'attention bias not supported for flash attention'

Why not add the bias and the mask together to build a float attn_mask and supply it to scaled_dot_product_attention as attn_mask? Isn't that the same as what we do when not using flash attention?
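
For concreteness, a minimal sketch of what I mean (illustrative shapes and names, not the repo's code): fold the additive bias and the boolean padding mask into a single float attn_mask and hand that to scaled_dot_product_attention.

import torch
import torch.nn.functional as F

b, h, n, d = 2, 8, 128, 64
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))

attn_bias = torch.randn(h, n, n)                # e.g. a relative position bias
key_mask = torch.ones(b, n, dtype=torch.bool)   # True = attend, False = padding

# start from the bias, then push masked-out key positions to -inf
float_mask = attn_bias.unsqueeze(0).expand(b, -1, -1, -1).clone()
float_mask = float_mask.masked_fill(~key_mask[:, None, None, :], float('-inf'))

# a float attn_mask is accepted, but (as far as I understand) PyTorch will not
# dispatch this call to the flash kernel, which only supports no mask / is_causal
out = F.scaled_dot_product_attention(q, k, v, attn_mask=float_mask)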

amitaie (Author) commented Sep 14, 2023

Actually, I see that in the SoundStorm repo you started to do something like that:

https://github.com/lucidrains/soundstorm-pytorch/blob/22d257d6b5241583e84619b7af6a634158aba426/soundstorm_pytorch/attend.py#L96-L99

But you left the assert there, and I also didn't understand why you divide the value by 2.

lucidrains (Owner) commented

@amitaie it is not possible to get bias gradients from flash attention
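
A minimal sketch of how that constraint plays out, assuming a hypothetical attend helper (not the repo's actual code): only take the fused scaled_dot_product_attention path when the bias does not need a gradient, and otherwise fall back to a plain einsum attention where gradients flow through the bias.

import torch
import torch.nn.functional as F

def attend(q, k, v, attn_bias=None, use_flash=True):
    # fused path only when no bias gradient is required,
    # since the flash kernel does not return a gradient for the bias
    if use_flash and (attn_bias is None or not attn_bias.requires_grad):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias)

    # manual path: gradients flow through attn_bias as well
    scale = q.shape[-1] ** -0.5
    sim = torch.einsum('b h i d, b h j d -> b h i j', q, k) * scale
    if attn_bias is not None:
        sim = sim + attn_bias
    attn = sim.softmax(dim=-1)
    return torch.einsum('b h i j, b h j d -> b h i d', attn, v)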
