[Bug] Extremely low TFLOPS when training with MoE #264
Comments
@Cerberous Could you share your config? Or are you using the config/7b_Moe4_sft config? The MoE gate computation involves many small operators; if they are not fused, then together with the all2all overhead, MoE's MFU is currently only about half that of a dense model.
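For illustration, here is a minimal PyTorch sketch of an unfused top-k gate; it is not InternEvo's actual gate implementation, and the function and tensor names are assumptions. It shows how the gate decomposes into many small kernels, which is why an unfused gate plus the all2all dispatch drags MFU down:

```python
# Minimal sketch (assumed names, not InternEvo's real code) of an unfused
# top-k MoE gate. Each line below launches a separate small kernel on
# [num_tokens, num_experts]-sized tensors, so kernel-launch latency rather
# than FLOPs dominates the gate's runtime.
import torch
import torch.nn.functional as F

def naive_topk_gate(hidden, gate_weight, top_k=2):
    # hidden:      [num_tokens, hidden_dim]
    # gate_weight: [hidden_dim, num_experts]
    logits = hidden @ gate_weight                    # small GEMM
    probs = F.softmax(logits, dim=-1)                # small elementwise kernel
    topk_probs, topk_idx = probs.topk(top_k, dim=-1) # small top-k kernel
    mask = F.one_hot(topk_idx, logits.size(-1))      # small scatter kernel

    # GShard-style load-balancing auxiliary loss: two more tiny reductions.
    me = probs.mean(dim=0)
    ce = mask.float().mean(dim=(0, 1))
    aux_loss = (me * ce).sum() * logits.size(-1)

    # Normalize the selected probabilities: more tiny elementwise kernels.
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_probs, topk_idx, aux_loss

# After gating, tokens are dispatched to experts (typically via
# torch.distributed.all_to_all); that communication time also contributes
# no useful FLOPs, further lowering the reported MFU versus a dense model.
```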
model = dict( moe = dict( parallel = dict(
OK, I'll try to reproduce it. How many GPUs are you running on?
I'm running on a single node with 8 H800 GPUs.
Describe the bug
When training an MoE model, the reported TFLOPS is only a few dozen, whereas it is normal during regular training.
Environment information
Official image and code
Other information
No response