-
Hi, I have been experimenting with the A2 config from ResNet Strikes Back.

Question: I failed to reproduce the validation top-1 accuracy reported in the paper (79.8); mine is 79.416.

Experiment Setup: I used the command below.

Experiment Results: I experimented with different values of warmup_lr (1e-6 or 1e-4), batch_size (256 or 448), and bce_threshold (0.0 or 0.2).
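Since the swept bce_threshold values (0.0 vs 0.2) can be confusing, here is a minimal sketch of how a BCE target threshold is typically applied: soft targets (from mixup/cutmix or label smoothing) are binarized at the threshold before the BCE loss. This is an illustration of the mechanism only, not code from the original training command; the function name and defaults are hypothetical.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def bce_with_target_threshold(logits: torch.Tensor,
                              soft_targets: torch.Tensor,
                              threshold: Optional[float] = None) -> torch.Tensor:
    """Sketch of BCE with an optional target threshold (hypothetical helper).

    Soft targets (e.g. produced by mixup/cutmix or label smoothing) are
    binarized at `threshold` before the BCE loss is computed, which is roughly
    what a bce_threshold knob controls.
    """
    if threshold is not None:
        soft_targets = (soft_targets > threshold).to(soft_targets.dtype)
    return F.binary_cross_entropy_with_logits(logits, soft_targets)


# Example: with threshold=0.2 a heavily smoothed target of 0.1 is pushed to 0,
# while with threshold=0.0 any non-zero soft target becomes 1.
logits = torch.randn(4, 1000)
targets = torch.full((4, 1000), 0.1)
loss = bce_with_target_threshold(logits, targets, threshold=0.2)
```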
-
@hankyul2 your 79.63 is sitting around the mean (79.68) for the A2 runs in the paper. The 79.8 was a better-than-average run (seed 0) for the official runs. For my local runs I believe I hit (edit: just looked up, I had 4 runs on my local machine for A2) 79.65, 79.68, 79.82, 79.83.

You could try a different seed; I had a lucky affinity to '21' for some of my local runs, or you can use 0 like the paper numbers. It should start out with the same model weights as the paper runs (we checked that), but the rest of the random selections for augmentations, dataset sampling, etc. will likely follow a different path, so results won't end up exactly the same.

I find that scaling LR by sqrt is better than linear for lamb and adamw, so …
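For reference, a minimal sketch of the two batch-size scaling rules mentioned above (linear vs. square-root). The base LR and base batch size below are placeholder assumptions for illustration, not values taken from this thread.

```python
import math

# Placeholder reference values for illustration only (not from this thread):
# a base LR that was tuned at some reference global batch size.
base_lr = 5e-3
base_batch = 2048


def linear_scaled_lr(batch_size: int) -> float:
    """Linear scaling rule: LR grows proportionally with batch size."""
    return base_lr * batch_size / base_batch


def sqrt_scaled_lr(batch_size: int) -> float:
    """Square-root scaling: LR grows with the square root of the batch-size ratio."""
    return base_lr * math.sqrt(batch_size / base_batch)


for bs in (256, 448, 2048):
    print(f"batch={bs:4d}  linear={linear_scaled_lr(bs):.2e}  sqrt={sqrt_scaled_lr(bs):.2e}")
```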
-
Hi @hankyul2, thanks for sharing your config and acc-step figure, it helps me a lot.