How to resume training from last checkpoint #33

Open
sparshgarg23 opened this issue Mar 23, 2024 · 2 comments

Comments

@sparshgarg23

Hi, I came across your work on instance segmentation and am currently trying to reproduce the results. I was previously able to train the model for 90,000 iterations, but when I tried resuming training from the last checkpoint, I got errors related to the configuration file not loading properly.

As I am new to detectron2, could you provide pointers on how to resume training from an existing checkpoint? Does the resume option expect a cfg file as an argument, or does it expect model weights?
Thanks

@junjiehe96
Owner

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_net.py --num-gpus 4 \
    --config-file /path/to/your_config.yaml \
    --resume MODEL.WEIGHTS /path/to/existing_checkpoint.pth

@sparshgarg23
Author

Hmm, I tried that, but instead of resuming from iteration 70,000 it restarted training from iteration 0.
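
A possible cause, offered as a guess rather than a confirmed diagnosis: as far as I understand detectron2's default trainer, `--resume` picks up the iteration count only when the configured OUTPUT_DIR contains a `last_checkpoint` marker file written during the earlier run. If that file is missing (for example because OUTPUT_DIR points to a new folder), only the model weights from MODEL.WEIGHTS are loaded and the iteration counter starts at 0. The sketch below checks which checkpoint a resume would find. The helper name `find_resume_checkpoint` is hypothetical and not part of detectron2; it only assumes the marker-file convention described above.

```python
import os


def find_resume_checkpoint(output_dir):
    """Return the checkpoint path a resume would pick up, or None.

    Assumption: the training run wrote a text file named `last_checkpoint`
    into `output_dir`, containing the file name of the newest checkpoint.
    This helper is illustrative only and is not a detectron2 API.
    """
    marker = os.path.join(output_dir, "last_checkpoint")
    if not os.path.isfile(marker):
        # No marker file: nothing to resume from in this directory.
        return None
    with open(marker) as f:
        name = f.read().strip()
    if not name:
        # Marker exists but is empty, so it names no checkpoint.
        return None
    return os.path.join(output_dir, name)
```

Calling `find_resume_checkpoint("/path/to/output_dir")` with the OUTPUT_DIR from your config should return a path to a `.pth` file. If it returns `None`, pointing OUTPUT_DIR back at the folder used for the original run may be enough for `--resume` to continue from the saved iteration.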
