IMPORTANT: The definition of softmax_1 is wrong #5
Comments
I don't understand why you did this. Could you explain further how, starting from that mathematical formula, you arrive at changing the 1 to [formula image not captured]?
@Devil-SX The reason is that in the original formulation, when you subtract the max for numerical stability, it cannot give more than 0.5 attention to the attention sink: the largest shifted logit becomes exp(0) = 1, so the sink's weight 1 / (1 + Σ exp(x_j − max)) is at most 0.5. But if you replace the 1 with e^c you get: [formula image not captured]
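The point above can be checked numerically. Below is a minimal NumPy sketch (the function names are mine, not from any linked repository): the naive max-subtracted version that keeps the bare 1 in the denominator gives different weights than the unshifted definition and caps the sink weight at 0.5, while shifting the 1 to exp(−max) as well reproduces the original formula exactly.

```python
import numpy as np

def softmax_1(x):
    # softmax_1 as defined by Evan Miller: an extra 1 in the
    # denominator, i.e. an implicit "attention sink" logit at 0.
    e = np.exp(x)
    return e / (1.0 + e.sum())

def softmax_1_naive_stable(x):
    # WRONG stable version: subtracting the max but keeping the
    # bare 1. The largest shifted logit is exp(0) = 1, so the
    # sink weight 1 / (1 + sum) can never exceed 0.5, and the
    # result no longer matches softmax_1(x).
    e = np.exp(x - x.max())
    return e / (1.0 + e.sum())

def softmax_1_stable(x):
    # Correct stable version: the implicit 0 logit must be shifted
    # too, so the 1 becomes exp(-max). Multiplying numerator and
    # denominator by exp(max) recovers the unshifted definition.
    m = x.max()
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

x = np.array([3.0, 1.0, 0.5])
print(softmax_1(x))               # reference weights
print(softmax_1_naive_stable(x))  # differs from the reference
print(softmax_1_stable(x))        # matches the reference
```

The sink weight in each case is one minus the sum of the returned weights; with the correct version it can approach 1 when all logits are very negative, which is the whole motivation for softmax_1.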
I agree. The definition of softmax_1 is wrong here. I worked with Evan Miller on this, and saw many people make the same mistake. I implemented the correct version in Flash Attention here: https://github.com/softmax1/Flash-Attention-Softmax-N |
The correct definition of softmax_1 would be: (softmax_1(x))_i = exp(x_i) / (1 + Σ_j exp(x_j))
Please refer to the mathematical formula below and reconsider: [formula image not captured]