Learning rate adjustment when training with one GPU or two GPUs #80

Open
Young0111 opened this issue Apr 1, 2021 · 5 comments

@Young0111

Young0111 commented Apr 1, 2021

Thanks for your code.

When I trained this code with two GPUs (Tesla P4), changing IMS_PER_BATCH to 4 by running:
python projects/SparseRCNN/train_net.py
--config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml
--num-gpus 2 SOLVER.IMS_PER_BATCH 4

at iteration 7319 the saved model has an AP of 3.915, which differs from the 11.440 in your log.

Referring to Detectron, I adjusted the learning rate to 0.0025 by running:
python projects/SparseRCNN/train_net.py
--config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml
--num-gpus 2 SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.0025

and also tried a single GPU by running:
python projects/SparseRCNN/train_net.py
--config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml
--num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025

but the result was worse.

Later I found that Detectron uses SGD while you use AdamW, so maybe 0.0025 is not applicable.
So I would like to know whether I need to adjust some parameters, and if so, how to adjust them.
Thanks.
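
For reference, the arithmetic I had in mind is just the linear scaling rule; whether it even carries over from SGD to AdamW is exactly my question, so this is only a sketch:

# Sketch of the linear LR scaling rule; treating it as valid for AdamW
# is an assumption, which is what this issue is asking about.
def scale_lr(base_lr, base_batch, new_batch):
    # Scale the learning rate in proportion to the total batch size.
    return base_lr * new_batch / base_batch

# Detectron's SGD baselines use BASE_LR 0.02 at IMS_PER_BATCH 16,
# which is where 0.0025 for a 2-image batch comes from:
print(scale_lr(0.02, 16, 2))      # 0.0025

# This repo's AdamW default is BASE_LR 0.000025 at IMS_PER_BATCH 16,
# so the same rule would give a much smaller value for a 4-image batch:
print(scale_lr(0.000025, 16, 4))  # 6.25e-06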

@iFighting
Collaborator

You should set BASE_LR to 0.000025 instead of 0.0025.

@Young0111
Author

When I first ran it, the learning rate was the default, i.e. 0.000025, and I got a lower AP than in the author's log; only then did I change the learning rate.

@iFighting
Collaborator

When I first ran it, the learning rate was the default, i.e. 0.000025, and I got a lower AP than in the author's log; only then did I change the learning rate.

Maybe you need a smaller lr than 0.000025.
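
For example, linearly scaling the default by the batch size (0.000025 * 4 / 16 = 0.00000625) would give, purely as an untested illustration:

python projects/SparseRCNN/train_net.py
--config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml
--num-gpus 2 SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.00000625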

@Young0111
Author

OK, thanks for your reply, I will try it now.

@lingl-space

So, what changes did you make? I guess I need to set the lr to 0.000025 * 1/8 and make the total number of iterations 8 times larger? However, as the number of iterations increases, the overall training time becomes extremely long. A rough sketch of what I have in mind is below.
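
For concreteness, this is the adjustment I am guessing at, assuming the usual detectron2 3x-schedule values (MAX_ITER 270000, STEPS 210000/250000) and treating linear LR scaling as only a heuristic:

# Rough sketch: batch shrinks from 16 to 2, i.e. a factor of 8.
ref_batch, new_batch = 16, 2
factor = ref_batch // new_batch              # 8

base_lr  = 0.000025                          # default at IMS_PER_BATCH 16
max_iter = 270000                            # assumed 3x-schedule default
steps    = (210000, 250000)                  # assumed 3x-schedule LR drops

scaled_lr    = base_lr / factor              # 3.125e-06
scaled_iter  = max_iter * factor             # 2,160,000 iterations
scaled_steps = tuple(s * factor for s in steps)

# The total number of images processed stays the same; the run just has
# 8x more, smaller iterations, and small batches may use the GPU less
# efficiently, which is why the wall-clock time balloons.
print(scaled_lr, scaled_iter, scaled_steps)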
