The output of the attention is constantly being scaled down in magnitude — is this a warmup technique? And why is it applied after the attention rather than after the convolution?
This is related to knowledge distillation. The original paper explains why a temperature is used; for background on knowledge distillation, see the paper "Distilling the Knowledge in a Neural Network".
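For reference, here is a minimal sketch of the temperature mechanism from that paper (not this repo's actual code): logits are divided by a temperature T before the softmax, which softens the distribution, and the distillation loss is scaled by T² as Hinton et al. suggest. All names and values below are illustrative.

```python
import torch
import torch.nn.functional as F

def soften(logits, temperature):
    # Dividing logits by T > 1 flattens the softmax output,
    # exposing the teacher's relative probabilities for wrong
    # classes ("dark knowledge").
    return F.softmax(logits / temperature, dim=-1)

# Hypothetical usage: match softened student and teacher outputs
# with KL divergence, scaled by T^2 to keep gradient magnitudes
# comparable across temperatures (as argued in the paper).
T = 4.0
teacher_logits = torch.randn(8, 10)
student_logits = torch.randn(8, 10)
distill_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    soften(teacher_logits, T),
    reduction="batchmean",
) * (T * T)
```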
Hello, could you tell me how to get this program running?