Inquiry about the choice of learning rate schedule #2

Open
pipilurj opened this issue Jul 6, 2020 · 1 comment

Comments

@pipilurj

pipilurj commented Jul 6, 2020

Hello, the paper only addresses the case where step decay is used as the learning rate schedule. What happens if cosine decay or another non-linear schedule is used instead? Are there any experiments with these schedules? Thanks!
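For concreteness, the two kinds of schedule being contrasted can be sketched as functions of the epoch index: step decay with a linear warmup, and cosine decay with the same warmup. This is a minimal sketch, not code from the paper or its repository; the base learning rate, warmup length, milestones, decay factor, and total epoch count are illustrative assumptions.

```python
import math

# Illustrative hyperparameters (assumed, not taken from the paper).
BASE_LR = 0.1
WARMUP_EPOCHS = 5
TOTAL_EPOCHS = 182


def linear_warmup(epoch: int, base_lr: float, warmup_epochs: int) -> float:
    """Ramp the learning rate linearly up to base_lr over the first warmup_epochs."""
    return base_lr * (epoch + 1) / warmup_epochs


def step_decay_lr(epoch: int, base_lr: float = BASE_LR, warmup_epochs: int = WARMUP_EPOCHS,
                  milestones=(91, 136), gamma: float = 0.1) -> float:
    """Step decay: after warmup, multiply the rate by gamma at each milestone reached."""
    if epoch < warmup_epochs:
        return linear_warmup(epoch, base_lr, warmup_epochs)
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


def cosine_decay_lr(epoch: int, total_epochs: int = TOTAL_EPOCHS, base_lr: float = BASE_LR,
                    warmup_epochs: int = WARMUP_EPOCHS) -> float:
    """Cosine decay: after warmup, decay smoothly from base_lr to 0 at total_epochs."""
    if epoch < warmup_epochs:
        return linear_warmup(epoch, base_lr, warmup_epochs)
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))


# Compare the two schedules at a few epochs.
for epoch in (0, 4, 50, 100, 150, 181):
    print(epoch, round(step_decay_lr(epoch), 4), round(cosine_decay_lr(epoch), 4))
```

Both schedules share the warmup; they differ only in the shape of the decay afterwards (piecewise-constant drops versus a smooth curve).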

@alexrenda
Collaborator

Hi,

In the paper, we only experimented on networks trained with step decay, with a linear learning rate warmup at the beginning of training. All three re-training techniques compared in the paper could still be applied with a non-linear schedule. It would definitely be interesting to compare the techniques on networks trained with other schedules. I suspect that the same findings would hold, though of course it's always possible that a non-linear schedule would change things.
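To illustrate why a re-training phase need not depend on the shape of the schedule, here is a minimal sketch in which the schedule is just a function from epoch to learning rate. The helpers `replay_tail` (re-train by replaying the final epochs of the original schedule) and `hold_final_lr` (re-train at the schedule's final learning rate) are hypothetical names chosen for illustration; they are not the paper's code and are not necessarily its three techniques. The schedules and epoch counts are also assumptions.

```python
import math
from typing import Callable, List

# A schedule maps an epoch index to a learning rate.
Schedule = Callable[[int], float]


def replay_tail(schedule: Schedule, total_epochs: int, retrain_epochs: int) -> List[float]:
    """Hypothetical: re-train by replaying the last retrain_epochs of the original schedule.
    Nothing here depends on the schedule's shape, so step or cosine decay both work."""
    if not 0 < retrain_epochs <= total_epochs:
        raise ValueError("retrain_epochs must be in (0, total_epochs]")
    start = total_epochs - retrain_epochs
    return [schedule(e) for e in range(start, total_epochs)]


def hold_final_lr(schedule: Schedule, total_epochs: int, retrain_epochs: int) -> List[float]:
    """Hypothetical: re-train at the schedule's final learning rate for every epoch."""
    return [schedule(total_epochs - 1)] * retrain_epochs


TOTAL = 100  # assumed length of the original training run


def step(e: int) -> float:
    """Assumed step decay: 0.1, dropped by 10x at epochs 50 and 75."""
    return 0.1 * 0.1 ** sum(e >= m for m in (50, 75))


def cosine(e: int) -> float:
    """Assumed cosine decay from 0.1 toward 0 over TOTAL epochs."""
    return 0.05 * (1 + math.cos(math.pi * e / TOTAL))


for name, sched in [("step", step), ("cosine", cosine)]:
    tail = replay_tail(sched, TOTAL, retrain_epochs=30)
    flat = hold_final_lr(sched, TOTAL, retrain_epochs=30)
    print(f"{name}: replay starts at {tail[0]:.4f}, hold uses {flat[0]:.4f}")
```

Swapping `step` for `cosine` requires no change to either re-training helper, which is the sense in which such techniques carry over to non-linear schedules.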
