How to resume training from last checkpoint #33

sparshgarg23 · 2024-03-23T18:07:00Z

Hi, I came across your work on instance segmentation and I am currently trying to reproduce the results. I was previously able to train the model for 90,000 iterations, but when I tried resuming training from the last checkpoint, I got errors related to the configuration file not being loaded properly.

As I am new to detectron2, could you provide pointers on how to resume training from an existing checkpoint? Does the resume option expect a cfg file as an argument, or does it expect model weights?
Thanks

junjiehe96 · 2024-03-24T08:23:10Z

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_net.py --num-gpus 4 \
  --config-file /path/to/your_config.yaml \
  --resume MODEL.WEIGHTS /path/to/existing_checkpoint.pth
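
A note on what `--resume` does, assuming train_net.py follows detectron2's stock `DefaultTrainer` pattern (not confirmed in this thread): `--resume` is a boolean flag, and `MODEL.WEIGHTS /path/...` is a separate config override. With `--resume`, the trainer looks for a `last_checkpoint` file in `OUTPUT_DIR`. If that file exists, the checkpoint it names is loaded together with the iteration counter, and `MODEL.WEIGHTS` is ignored. If it does not exist, only the weights from `MODEL.WEIGHTS` are loaded and training starts at iteration 0. The sketch below is a rough stdlib-only check of which case applies; `find_resume_checkpoint` is a hypothetical helper, not part of detectron2.

```python
import os


def find_resume_checkpoint(output_dir):
    """Return the checkpoint path named in output_dir/last_checkpoint, or None.

    Hypothetical helper: it mirrors the lookup that detectron2's checkpointer
    is assumed to perform when --resume is passed.
    """
    marker = os.path.join(output_dir, "last_checkpoint")
    if not os.path.isfile(marker):
        # No marker file: --resume would fall back to MODEL.WEIGHTS
        # and the iteration counter would start from 0.
        return None
    with open(marker) as f:
        # The marker file holds the file name of the most recent checkpoint.
        name = f.read().strip()
    return os.path.join(output_dir, name)


# Example usage (the path is a placeholder for your config's OUTPUT_DIR):
# print(find_resume_checkpoint("/path/to/output_dir"))
```

If this returns `None` for the `OUTPUT_DIR` in your config, the run has nothing to resume from, for example because `OUTPUT_DIR` changed or the `last_checkpoint` file was not copied along with the `.pth` file.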

sparshgarg23 · 2024-03-24T18:36:38Z

Hmm, I tried that, but instead of resuming from iteration 70,000 it restarted training from iteration 0.
