Thank you for your work on this very useful library!
I have had success training Albert Unbiased from scratch. I'm curious how model performance would compare if training continued from one of your checkpoints (unbiased-albert-c8519128.ckpt in this case). However, if I attempt to launch train.py with this file, I get the following error:

KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to ModelCheckpoint.save_weights_only being set to True.'

FYI, I am using the following command:

python train.py --config configs/Unintended_bias_toxic_comment_classification_Albert_revised_training.json -d 1 --num_workers 0 -e 101 -r model_ckpts/unbiased-albert-c8519128_modified_state_dict.ckpt
Inspecting the checkpoint file, I can indeed see that it is missing some components, the most critical of which (I think) is optimizer_states. Comparing it to one of my own checkpoints, the absent keys appear to be: ['pytorch-lightning_version', 'callbacks', 'optimizer_states', 'lr_schedulers', 'hparams_name', 'hyper_parameters'].
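For anyone wanting to reproduce this comparison, the check above can be sketched as a diff of the top-level keys. The key sets below are hard-coded to mirror what I observed; in practice you would populate them with torch.load(path, map_location="cpu").keys() for each file.

```python
# Sketch: diff the top-level keys of a full PyTorch Lightning checkpoint
# against a weights-only one. The sets below simulate the two files;
# replace them with torch.load(..., map_location="cpu").keys() on real data.

# Typical top-level contents of a checkpoint saved with
# ModelCheckpoint(save_weights_only=False):
full_ckpt_keys = {
    "state_dict",
    "pytorch-lightning_version",
    "callbacks",
    "optimizer_states",
    "lr_schedulers",
    "hparams_name",
    "hyper_parameters",
}

# A weights-only checkpoint keeps just the model parameters:
weights_only_keys = {"state_dict"}

missing = sorted(full_ckpt_keys - weights_only_keys)
print(missing)
# → ['callbacks', 'hparams_name', 'hyper_parameters', 'lr_schedulers',
#    'optimizer_states', 'pytorch-lightning_version']
```

Without optimizer_states and lr_schedulers, Lightning cannot restore the training state, which is exactly what the KeyError reports.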
Am I doing something wrong? If not, would it be possible for you to share new versions of your checkpoints that include these missing components?
Yes, we only saved the weights to keep the files small, since the optimizer state is not needed for prediction. If you used the same data and followed the same training instructions, your own full checkpoint should be equivalent to ours, which you could verify by running both models on the test set and comparing scores.
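If resuming exact training state isn't required, one workaround is to warm-start from the weights-only checkpoint: load its state_dict into the model manually and start a fresh training run instead of passing it to -r. A minimal sketch, assuming the checkpoint's state_dict keys carry a "model." prefix from the LightningModule wrapper (the actual prefix in detoxify's checkpoints may differ, so inspect the keys first):

```python
# Hedged sketch: warm-start from a weights-only checkpoint rather than
# resuming. Key names here are illustrative, not detoxify's actual layout.

def strip_prefix(state_dict, prefix="model."):
    """Drop a leading submodule prefix so a plain nn.Module accepts the keys."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# In practice:
#   ckpt = torch.load("model_ckpts/unbiased-albert-c8519128.ckpt",
#                     map_location="cpu")
#   model.load_state_dict(strip_prefix(ckpt["state_dict"]))

# Tiny demonstration with dummy tensors replaced by ints:
example = {"model.albert.embeddings.weight": 0, "classifier.bias": 1}
print(strip_prefix(example))
# → {'albert.embeddings.weight': 0, 'classifier.bias': 1}
```

The optimizer and LR scheduler then start from their defaults, so the run is not a bit-exact continuation, but the model itself picks up from the released weights.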