In section 4.5 of the paper:

> All models are trained using an SGD optimizer with an initial learning rate of 1e-1 and a batch size of 512. The learning rate is divided by 10 at 30k, 60k, and 90k training iterations.
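For concreteness, here is a minimal sketch of that schedule in PyTorch using `MultiStepLR`. The model, data, and loss below are placeholders of my own, not the paper's actual training code:

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model and data; the real network and loss come from the paper's code.
model = torch.nn.Linear(512, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Divide the lr by 10 (gamma=0.1) at 30k/60k/90k *iterations*, so
# scheduler.step() is called once per iteration, not once per epoch.
scheduler = MultiStepLR(optimizer, milestones=[30_000, 60_000, 90_000], gamma=0.1)

for it in range(100_000):
    x = torch.randn(512, 512)              # batch size 512, as in the paper
    y = torch.randint(0, 10, (512,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```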
Since this paper is about losses, have there been any experiments with learning rate schedulers?
In my experiment I am using the SRT loss, and the loss keeps dropping with a learning rate of 1e-1.
Any suggestions on when it is best to divide the learning rate by 10?
Thank you!
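One common alternative to hand-picked milestones (my own suggestion, not something from the paper) is to drop the learning rate automatically once the monitored loss plateaus, e.g. with PyTorch's `ReduceLROnPlateau`:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(512, 10)            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# factor=0.1 mirrors the paper's divide-by-10 step; patience counts how many
# scheduler.step(metric) calls may pass without improvement before dropping.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)

val_loss = 1.0                              # hypothetical measured validation loss
scheduler.step(val_loss)                    # call once per validation pass
```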