[New Scheduler] Strange behavior of the lr_mult #3059

Open

AlphaPlusTT opened this issue Nov 22, 2024 · 0 comments

AlphaPlusTT commented Nov 22, 2024

Scheduler description

Version: mmdetection3d v1.4
I want the learning rate of the img_backbone to stay at 0.1 times the base learning rate throughout training, so I set the following in the configuration file:

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }
    )
)
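
For context, my understanding is that lr_mult is applied once, when the optimizer's parameter groups are built, so the matching group simply starts from a scaled initial learning rate. Below is a minimal plain-PyTorch sketch of what I assume this boils down to (module names and the lr value are placeholders, not the real model):

import torch
from torch import nn

# Toy stand-ins for the real model: an image backbone plus the rest of the network.
model = nn.ModuleDict({
    'img_backbone': nn.Linear(4, 4),
    'head': nn.Linear(4, 4),
})

lr = 1e-4  # base learning rate; the value is only for illustration

# What I assume custom_keys={'img_backbone': dict(lr_mult=0.1)} boils down to:
# two parameter groups whose *initial* lr differs by the factor lr_mult.
optimizer = torch.optim.AdamW(
    [
        dict(params=model['img_backbone'].parameters(), lr=lr * 0.1),
        dict(params=model['head'].parameters(), lr=lr),
    ],
    lr=lr,
    weight_decay=0.01,
)

for group in optimizer.param_groups:
    print(group['lr'])  # 1e-05 for the img_backbone group, 0.0001 for the rest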

I also set the param_scheduler:

param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, learning rate increases from lr to lr * 100
    # during the next 12 epochs, learning rate decreases from lr * 100 to lr
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 100,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum increases from 0 to 0.85 / 0.95
    # during the next 12 epochs, momentum increases from 0.85 / 0.95 to 1
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]
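
Each CosineAnnealingLR phase above follows the standard cosine annealing formula, so per parameter group the intended curve is lr -> lr * 100 over the first 8 epochs and then lr * 100 -> lr over the remaining 12, as sketched below (epoch-level only; the iteration-level conversion done by convert_to_iter_based is omitted, and the lr value is illustrative):

import math

lr = 1e-4  # illustrative base learning rate

def cosine(eta_max, eta_min, t, t_max):
    # Standard cosine annealing: starts at eta_max (t=0) and ends at eta_min (t=t_max).
    return eta_min + (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

for epoch in range(21):
    if epoch <= 8:
        cur = cosine(lr, lr * 100, epoch, 8)       # phase 1: lr -> lr * 100
    else:
        cur = cosine(lr * 100, lr, epoch - 8, 12)  # phase 2: lr * 100 -> lr
    print(f'epoch {epoch:2d}: lr = {cur:.2e}')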

At the beginning, the learning rate of img_backbone is indeed 0.1 times the base learning rate:

2024/11/21 21:40:30 - mmengine - INFO - Epoch(train)  [1][ 100/3517]  base_lr: 5.0005e-05 lr: 5.0060e-06  eta: 20:29:23  time: 0.9889  data_time: 0.0563  memory: 32041  grad_norm: 52825.8107  loss: 5568.1427  task0.loss_heatmap: 49.9860  task0.loss_bbox: 0.9099  task1.loss_heatmap: 596.6443  task1.loss_bbox: 1.1417  task2.loss_heatmap: 2504.7168  task2.loss_bbox: 1.5418  task3.loss_heatmap: 620.9393  task3.loss_bbox: 0.8771  task4.loss_heatmap: 1612.4171  task4.loss_bbox: 0.9113  task5.loss_heatmap: 177.1250  task5.loss_bbox: 0.9324

However, the learning rate of img_backbone slowly catches up with the base learning rate as training progresses:

2024/11/21 23:45:30 - mmengine - INFO - Epoch(train)  [3][ 100/3517]  base_lr: 7.2556e-05 lr: 3.4323e-05  eta: 18:42:58  time: 1.0738  data_time: 0.0613  memory: 32031  grad_norm: 64.6705  loss: 14.8487  task0.loss_heatmap: 1.4181  task0.loss_bbox: 0.6505  task1.loss_heatmap: 2.0847  task1.loss_bbox: 0.7157  task2.loss_heatmap: 2.0074  task2.loss_bbox: 0.7194  task3.loss_heatmap: 1.4966  task3.loss_bbox: 0.5754  task4.loss_heatmap: 1.9814  task4.loss_bbox: 0.6894  task5.loss_heatmap: 1.8084  task5.loss_bbox: 0.7016
...
...
2024/11/22 01:50:03 - mmengine - INFO - Epoch(train)  [5][ 100/3517]  base_lr: 1.2583e-04 lr: 1.0358e-04  eta: 16:36:18  time: 1.0527  data_time: 0.0568  memory: 32069  grad_norm: 52.7803  loss: 15.1927  task0.loss_heatmap: 1.4274  task0.loss_bbox: 0.6234  task1.loss_heatmap: 2.1254  task1.loss_bbox: 0.6715  task2.loss_heatmap: 2.0836  task2.loss_bbox: 0.7248  task3.loss_heatmap: 1.9199  task3.loss_bbox: 0.6361  task4.loss_heatmap: 1.8900  task4.loss_bbox: 0.6479  task5.loss_heatmap: 1.7534  task5.loss_bbox: 0.6891

It looks like lr_mult only affects the initial learning rate of each parameter group and is not taken into account by the scheduler afterwards. How can I keep the img_backbone learning rate at 0.1 times the base learning rate for the whole training run?
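
My guess is that this happens because eta_min is a single absolute value shared by all parameter groups, so every group is annealed towards the same target regardless of its lr_mult-scaled starting point. A standalone sketch with plain torch.optim.lr_scheduler.CosineAnnealingLR (which I assume behaves like the mmengine scheduler in this respect; epoch-level steps only, illustrative lr) reproduces the drift:

import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR

lr = 1e-4  # illustrative base learning rate
backbone, head = nn.Linear(4, 4), nn.Linear(4, 4)

optimizer = torch.optim.AdamW(
    [
        dict(params=backbone.parameters(), lr=lr * 0.1),  # stands in for the img_backbone group
        dict(params=head.parameters(), lr=lr),            # stands in for everything else
    ],
    lr=lr,
    weight_decay=0.01,
)

# A single absolute eta_min = lr * 100 shared by *all* parameter groups,
# like the first phase of the schedule above.
scheduler = CosineAnnealingLR(optimizer, T_max=8, eta_min=lr * 100)

for epoch in range(8):
    optimizer.step()
    scheduler.step()
    lr_backbone, lr_rest = [g['lr'] for g in optimizer.param_groups]
    print(f'epoch {epoch + 1}: backbone/base ratio = {lr_backbone / lr_rest:.3f}')
# The 0.1 ratio is not preserved: both groups are pulled towards the same
# eta_min, and after T_max steps both equal lr * 100 exactly (ratio 1.0).

Is there a recommended way to write this schedule so that the ratio introduced by lr_mult is preserved, i.e. so that each parameter group is annealed relative to its own initial learning rate rather than towards one shared absolute eta_min?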

Open source status

  • The model implementation is available.
  • The model weights are available.

Provide useful links for the implementation

https://github.com/open-mmlab/mmdetection3d/blob/fe25f7a51d36e3702f961e198894580d83c4387b/projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py
