Hello, I trained a checkpoint without the '--no-pipeline-parallel' flag, which constructs the model GPTModelPipe() in pretrain_gpt.py. I then converted the checkpoint to a universal checkpoint and manually renamed the layers. When I resume training with '--no-pipeline-parallel', which constructs GPTModel() instead, the training loss jumps significantly (from 1.7 to 3.0).
Is there a way to resume training correctly across this model change?
Also, could you explain the difference between GPTModel() and GPTModelPipe()?
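For reference, the manual renaming step I did looks roughly like the sketch below. The exact key names are assumptions on my side and depend on the Megatron-DeepSpeed version: GPTModelPipe stores weights under flat pipeline-module indices (e.g. "module.<N>.…"), while GPTModel uses a nested hierarchy (e.g. "language_model.encoder.layers.<N>.…").

```python
# Hypothetical sketch of the manual layer-name remapping described above.
# The key patterns ("module.<N>." from GPTModelPipe's flat LayerSpec list,
# "language_model.encoder.layers.<N>." from GPTModel) are assumptions and
# must be checked against the actual checkpoint contents.
import re

def remap_pipe_keys(state_dict, num_embedding_stages=2):
    """Map GPTModelPipe-style flat module indices to GPTModel-style names."""
    remapped = {}
    for key, value in state_dict.items():
        m = re.match(r"module\.(\d+)\.(.+)", key)
        if m is None:
            # Key already in the target naming scheme; keep it unchanged.
            remapped[key] = value
            continue
        idx, rest = int(m.group(1)), m.group(2)
        # Assumption: the first pipeline modules hold embeddings, so the
        # transformer layer index is shifted by that offset.
        layer = idx - num_embedding_stages
        remapped[f"language_model.encoder.layers.{layer}.{rest}"] = value
    return remapped
```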
Thanks.