First of all, thanks for the great work. Is there any specific rationale behind setting LOSS_SCALE to 128, or is it just a magic number that worked well in practice?
Answered by Tom94 on Apr 8, 2022
Hi there, yes, it is a magic number that helps make better use of fp16 numbers during backpropagation through the neural network. There's otherwise no effect on training (the loss is multiplied by LOSS_SCALE before backpropagation, and the gradients are afterwards divided by LOSS_SCALE to compensate). See this documentation for details. Cheers!
Answer selected by hturki
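For readers who want to see the effect concretely, here is a minimal, hypothetical sketch of the idea described in the answer. It is plain NumPy, not the project's actual CUDA code; the value 128 is taken from the question, and `np.float16` is used only to illustrate how small gradients behave in half precision:

```python
import numpy as np

LOSS_SCALE = 128.0  # the magic constant discussed above

# Hypothetical gradient value, smaller than fp16's tiniest subnormal (~6e-8).
tiny_grad = 2e-8

# Without scaling, the value underflows to zero when stored in fp16.
print(np.float16(tiny_grad))                   # 0.0 -- gradient information lost

# Scaling the loss scales every gradient by the same factor (chain rule),
# pushing it back into fp16's representable range.
print(np.float16(tiny_grad * LOSS_SCALE))      # ~2.56e-06 -- representable

# Dividing by LOSS_SCALE afterwards (in higher precision) recovers the
# original value, so the optimizer update matches unscaled training.
recovered = np.float32(np.float16(tiny_grad * LOSS_SCALE)) / LOSS_SCALE
print(recovered)                               # ~2.0e-08
```

This is the standard mixed-precision loss-scaling trick: scale up before backpropagation so gradients stay representable in fp16, then scale back down before the parameter update so training dynamics are unchanged.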