Enable FP32 Accumulate in Flash Attention and Flash Decode #13364
Labels: flash-attention, flash-decode, kernels, llama3, models, P1
Description
We do not have support for fp32 accumulation in the SDPA family of kernels. This becomes a problem when the number of chunks gets large: the PCC against ground truth diverges. For models that require 128K sequence length, this is problematic.
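For illustration, here is a host-side PyTorch sketch (not tt-metal kernel code) of the flash-attention online-softmax recurrence, where the precision of the running max/sum and output accumulator is configurable. The shapes, chunk size, and the pcc() helper are assumptions made for this example only; it shows how a bf16 accumulator lets rounding error grow with the number of K/V chunks, which is the PCC divergence described above.

```python
# Host-side PyTorch sketch (not tt-metal kernel code) of chunked SDPA with a
# configurable accumulator dtype. Shapes, chunk size, and pcc() are
# illustrative assumptions for this example only.
import torch
import torch.nn.functional as F


def pcc(a: torch.Tensor, b: torch.Tensor) -> float:
    """Pearson correlation coefficient between two tensors."""
    a, b = a.flatten().double(), b.flatten().double()
    return torch.corrcoef(torch.stack([a, b]))[0, 1].item()


def chunked_sdpa(q, k, v, chunk=256, acc_dtype=torch.float32):
    """Chunked SDPA; running max/sum and the output accumulator live in acc_dtype."""
    scale = q.shape[-1] ** -0.5
    out = torch.zeros(*q.shape[:-1], v.shape[-1], dtype=acc_dtype)
    row_max = torch.full(q.shape[:-1], float("-inf"), dtype=acc_dtype)
    row_sum = torch.zeros(q.shape[:-1], dtype=acc_dtype)
    for s in range(0, k.shape[-2], chunk):
        ks, vs = k[..., s:s + chunk, :], v[..., s:s + chunk, :]
        scores = (q.to(acc_dtype) @ ks.to(acc_dtype).transpose(-1, -2)) * scale
        new_max = torch.maximum(row_max, scores.amax(dim=-1))
        correction = torch.exp(row_max - new_max)        # rescale previous partials
        p = torch.exp(scores - new_max.unsqueeze(-1))
        row_sum = row_sum * correction + p.sum(dim=-1)   # accumulated in acc_dtype
        out = out * correction.unsqueeze(-1) + p @ vs.to(acc_dtype)
        row_max = new_max
    return (out / row_sum.unsqueeze(-1)).to(q.dtype)


torch.manual_seed(0)
seq = 128 * 1024                                         # 128K K/V entries -> 512 chunks
q = torch.randn(1, 8, 32, 64, dtype=torch.bfloat16)
k = torch.randn(1, 8, seq, 64, dtype=torch.bfloat16)
v = torch.randn(1, 8, seq, 64, dtype=torch.bfloat16)

ref = F.scaled_dot_product_attention(q.float(), k.float(), v.float())
print("fp32 accumulate PCC:", pcc(chunked_sdpa(q, k, v, acc_dtype=torch.float32).float(), ref))
print("bf16 accumulate PCC:", pcc(chunked_sdpa(q, k, v, acc_dtype=torch.bfloat16).float(), ref))
```

With 128K of K/V split into 256-entry chunks, the bf16 accumulator absorbs roughly 512 partial updates per output row, so its rounding error compounds with sequence length, while the fp32 accumulator stays close to the ground-truth reference.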
This issue tracks enabling fp32 accumulation in the following kernels:
round 1:
round 2:
FYI @cglagovichTT