loss.detach().clone().mean() * (microbatch_size / current_batch_size) #1596
Comments
Hi @YixinSong-e, thanks for bringing up the issue.
I found the reason. When moe_loss_weight is set to 0, megablocks returns a plain Python float instead of a tensor. I fixed the issue by bypassing the loss['lbl'] term.
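A minimal sketch of what such a bypass could look like. This is illustrative, not the actual patch: the function name, arguments, and the 'lbl' dict key are assumptions taken from this thread.

```python
import torch

def scale_lbl(loss_dict, microbatch_size, current_batch_size):
    """Illustrative guard: skip the load-balancing loss when it is not a tensor.

    When moe_loss_weight == 0, megablocks reportedly returns the float 0.0
    instead of a tensor, so calling detach() on it would raise AttributeError.
    """
    lbl = loss_dict['lbl']
    if not torch.is_tensor(lbl):
        # Bypass: a plain float has nothing to detach or scale; treat it as a
        # zero contribution to the total loss.
        return 0.0
    return lbl.detach().clone().mean() * (microbatch_size / current_batch_size)
```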
Hi @YixinSong-e, can you explain in a bit more detail what you mean by "bypass the loss['lbl']"?
In the llmfoundry/models/mpt/modeling_mpt.py file, when I set moe_loss_weight to 0, the value returned from batched_load_balancing_loss is the float 0.0, not a tensor, i.e. the loss dict looks like {'loss': 0.0}. Since 0.0 is a plain Python float, it has no detach() method, so the expression in the issue title fails.
This happens when I set moe_loss_weight: 0 in the config.
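For context, a minimal reproduction of the failure mode described above. The float return value and the batch sizes are assumptions based on this thread, not verified against a specific megablocks version.

```python
import torch

# What batched_load_balancing_loss reportedly returns when moe_loss_weight == 0:
loss = 0.0
microbatch_size, current_batch_size = 8, 64  # hypothetical sizes

try:
    scaled = loss.detach().clone().mean() * (microbatch_size / current_batch_size)
except AttributeError as err:
    print(err)  # 'float' object has no attribute 'detach'

# The same expression succeeds once the value is an actual tensor:
loss = torch.tensor(0.0)
scaled = loss.detach().clone().mean() * (microbatch_size / current_batch_size)
print(scaled)  # tensor(0.)
```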