
IMPORTANT: The definition of the softmax one is wrong #5

Open
PhilIp-L-Good opened this issue Aug 3, 2023 · 4 comments

Comments

@PhilIp-L-Good

PhilIp-L-Good commented Aug 3, 2023

The correct definition of softmax_1 would be:
[image]

Please reconsider with reference to the mathematical formula below:
[image]
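For readers following along without the embedded images (which did not survive this transcript): the definition at issue is presumably Evan Miller's softmax_1, which matches the right-hand side of the derivation given later in this thread:

$$\operatorname{softmax}_1(x)_i = \frac{e^{x_i}}{1 + \sum_j e^{x_j}}$$

The point of the issue is that a numerically stable implementation must subtract $m = \max_j x_j$ from every logit, which turns the constant $1$ in the denominator into $e^{-m}$ rather than leaving it as $1$.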

@PhilIp-L-Good (Author)

Consider the following derivation. If we multiply both the numerator and the denominator of the softmax function by the constant $e^C$, we get the following equation:
[image]

@Devil-SX

I don't understand why you would do this. Could you explain further how this mathematical formula leads to changing the 1 into $e^{-C}$?

@martindbp

martindbp commented Dec 12, 2023

@Devil-SX The reason is that in the original formulation, when you subtract the max for numerical stability but leave the constant as 1, the model can never give more than 0.5 attention to the attention sink. But if you replace the 1 with $e^c$ (where $c = -\max_j x_j$), you get:

$\frac{e^{x_i+c}}{e^c + \sum_j e^{x_j+c}} = \frac{e^c e^{x_i}}{e^c + \sum_j e^{x_j}e^c} = \frac{e^c e^{x_i}}{e^c(1 + \sum_j e^{x_j})} = \frac{e^{x_i}}{1 + \sum_j e^{x_j}}$
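The derivation above can be sketched in a few lines of NumPy. This is a minimal illustration (function names are mine, not from the repository under discussion): the naive form overflows for large logits, while the max-subtracted form, with the 1 replaced by $e^{-m}$, computes the same values stably.

```python
import numpy as np

def softmax_1_naive(x):
    # Direct definition: e^{x_i} / (1 + sum_j e^{x_j}).
    # Overflows as soon as any x_i is large (e.g. ~1000).
    e = np.exp(x)
    return e / (1.0 + e.sum())

def softmax_1_stable(x):
    # Multiply numerator and denominator by e^{-m} with m = max_j x_j.
    # The constant 1 in the denominator becomes e^{-m}:
    #   e^{x_i - m} / (e^{-m} + sum_j e^{x_j - m})
    m = x.max()
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

x = np.array([1.0, 2.0, 3.0])
assert np.allclose(softmax_1_naive(x), softmax_1_stable(x))

big = np.array([1000.0, 1001.0])
out = softmax_1_stable(big)  # the naive version would overflow here
```

Note that leaving the denominator as `1.0 + e.sum()` after the max subtraction (the bug this issue reports) would silently change the result whenever the logits are far from zero.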

@christopher-w-murphy

I agree. The definition of softmax_1 is wrong here. I worked with Evan Miller on this, and saw many people make the same mistake. I implemented the correct version in Flash Attention here: https://github.com/softmax1/Flash-Attention-Softmax-N
