LR defined wrt single-GPU batch size or global batch size? #766

Answered by rwightman
yoni-f asked this question in Q&A

@yoni-f Batch size is per GPU right now, so to convert between the example hparams and your own setup you need to calculate the global batch size from the number of GPUs. I realize that's not ideal; I intend to add a mode that normalizes the LR per 256 batch size, but that will likely end up in the future 'bits' training code.
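The conversion described above can be sketched as a small helper. This is an illustrative snippet, not part of timm; the names `scale_lr`, `global_batch_size`, and the reference batch of 256 are assumptions based on the linear LR scaling rule the answer alludes to.

```python
# Reference global batch size the base LR is assumed to be defined against
# (the "per 256 batch size" normalization mentioned in the answer).
REF_BATCH = 256

def global_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    """Global batch size is the per-GPU batch size times the number of GPUs."""
    return per_gpu_batch * num_gpus

def scale_lr(base_lr: float, per_gpu_batch: int, num_gpus: int) -> float:
    """Linearly scale base_lr (quoted at REF_BATCH) to the actual global batch."""
    return base_lr * global_batch_size(per_gpu_batch, num_gpus) / REF_BATCH

# Example: hparams quote lr=0.1 at global batch 256, but you train with
# 8 GPUs x 64 per GPU = global batch 512, so the scaled LR is 0.2.
print(scale_lr(0.1, per_gpu_batch=64, num_gpus=8))
```

This follows the common linear scaling convention; whether a given set of example hparams assumes 256 (or some other reference batch) is something to check against the specific training recipe.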

Answer selected by yoni-f