Inquiry about the choice of learning rate schedule #2

pipilurj · 2020-07-06T12:31:31Z

Hello, the paper only addresses the case where step decay is used as the learning rate schedule. What if cosine decay or another non-linear schedule is used? Are there experiments using such schedules? Thanks!

alexrenda · 2020-07-06T14:26:34Z

Hi,

In the paper, we only experimented with networks trained using step decay, with linear learning rate warmup at the beginning of training. All three re-training techniques compared in the paper could still be applied with a non-linear schedule. It would definitely be interesting to compare the techniques on networks with other schedules: I suspect that the same findings would hold, though of course it's always possible that a non-linear schedule would change things.
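As a rough illustration of why the schedule shape need not matter to the re-training procedure, here is a minimal sketch that expresses both kinds of schedule as plain functions of the epoch. Nothing here is taken from the paper or the repository: the function names, the base learning rate, the warmup length, the decay milestones, and the epoch count are all hypothetical values chosen for the example. The `schedule_tail` helper is likewise an assumption about how a procedure that reuses part of the original schedule might consume it.

```python
import math


def step_decay_with_warmup(epoch, base_lr=0.1, warmup_epochs=5,
                           milestones=(91, 136), gamma=0.1):
    """Linear warmup followed by step decay (all defaults are illustrative)."""
    if epoch < warmup_epochs:
        # Ramp linearly up to base_lr over the first warmup_epochs epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Multiply by gamma once for every milestone already reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


def cosine_decay(epoch, total_epochs=182, base_lr=0.1):
    """Cosine decay from base_lr at epoch 0 down to 0 at total_epochs."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))


def schedule_tail(schedule, total_epochs, retrain_epochs):
    """Hypothetical helper: learning rates for the last retrain_epochs epochs.

    It only calls schedule(epoch), so it works for any schedule shape.
    """
    start = total_epochs - retrain_epochs
    return [schedule(e) for e in range(start, total_epochs)]


if __name__ == "__main__":
    for name, fn in [("step", step_decay_with_warmup), ("cosine", cosine_decay)]:
        print(name, [round(lr, 5) for lr in schedule_tail(fn, 182, 5)])
```

Because the helper depends only on the schedule being callable per epoch, swapping step decay for cosine decay changes the learning rates it returns but not the code around it. Whether the resulting accuracy comparison comes out the same is the open experimental question raised above.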
