version: Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000
Bug description
i recently adapted a network architecture to a `LightningModule`, and find that when resuming a training run from a checkpoint file, the state of the `OneCycleLR` scheduler is not properly restored. i've tested with version 1.8.0 and confirmed that the issue persists.

the example pasted below runs a full end-to-end training of a toy model and dataset, then trains the same model halfway to completion, and finally loads the checkpoint file from the halfway-trained model and trains it the rest of the way. the example plots the learning rate in both cases, and demonstrates that in the latter case, the learning rate scheduler's internal state is not restored successfully when loading from the checkpoint file.
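for reference, the behaviour expected from a correct resume can be sketched in plain PyTorch, without Lightning. this is a minimal sketch and not the reproduction script from this report: the toy model, the 20-step schedule, the in-memory `checkpoint` dict and the helper names `make_training_objects` and `run_steps` are all assumptions made for illustration. it runs one uninterrupted schedule, then an interrupted one that saves and reloads the model, optimizer and scheduler `state_dict`s at the halfway point, and checks that both learning rate curves agree.

```python
import copy

import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR

TOTAL_STEPS = 20  # hypothetical schedule length, chosen only for this sketch
HALF = TOTAL_STEPS // 2


def make_training_objects():
    """Build a fresh toy model, optimizer and OneCycleLR scheduler."""
    torch.manual_seed(0)
    model = nn.Linear(4, 1)
    # momentum is set because OneCycleLR cycles momentum by default
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=TOTAL_STEPS)
    return model, optimizer, scheduler


def run_steps(model, optimizer, scheduler, n_steps):
    """Run n_steps training steps and record the lr used at each one."""
    lrs = []
    for _ in range(n_steps):
        lrs.append(optimizer.param_groups[0]["lr"])
        loss = model(torch.ones(8, 4)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
    return lrs


# reference: one uninterrupted run over the whole schedule
full_lrs = run_steps(*make_training_objects(), TOTAL_STEPS)

# interrupted run: train halfway, then capture a checkpoint-like dict
model, optimizer, scheduler = make_training_objects()
first_half = run_steps(model, optimizer, scheduler, HALF)
checkpoint = {
    "model": copy.deepcopy(model.state_dict()),
    "optimizer": copy.deepcopy(optimizer.state_dict()),
    "scheduler": copy.deepcopy(scheduler.state_dict()),
}

# resume: construct fresh objects first, then load the saved state into them
model, optimizer, scheduler = make_training_objects()
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scheduler.load_state_dict(checkpoint["scheduler"])
second_half = run_steps(model, optimizer, scheduler, TOTAL_STEPS - HALF)
resumed_lrs = first_half + second_half

# largest gap between the uninterrupted and the resumed lr curves
max_diff = max(abs(a - b) for a, b in zip(full_lrs, resumed_lrs))
print(f"largest lr difference between runs: {max_diff:.3e}")
```

in this sketch the state is loaded only after the new scheduler has been constructed, because constructing a scheduler resets the optimizer's learning rate to the schedule's starting value. a resumed Lightning run would be expected to produce the same agreement between the two curves.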
How to reproduce the bug
Error messages and logs
No response
Environment
More info
the environment provided above is my local machine, where i constructed the toy, but i also observe the same issue on an Nvidia GPU cluster and in HPC environments, so it is not localised to a specific architecture.
cc @rohitgr7