Training script for GPT-CoNuT model does not work #7
Comments
@lin-tan and @jiang719, we instantiated OpenAIGPTLMHeadModel(config) to train a new model from scratch. This change looks like:
We would really appreciate it if you could share the config you trained the model with, as we don't want to compare CURE with our approach incorrectly. @jiang719, can you please help?
@msintaha @nashid I used things like: You can also try other reasonable settings, as these values were only set empirically. Also, the checkpoint contains the config information; you can get and print it like this:
@jiang719 I can't find any model pushed to the repo. Where can I find the checkpoint?
@jiang719 can you please share the trained model or upload it somewhere so that others can download it?
@nashid Hello! Are you able to train with gpt_conut_trainer.py and gpt_fconv_trainer.py now? When I train with these two scripts, the loss value becomes NaN after the second round of reads. I have tried many fixes, but none of them work. If you have trained successfully with these two scripts, could you share your scripts and training data with me? (I'm fairly sure my training data is fine, since I used it to train two other NPR models.)
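When chasing a NaN loss like the one described above, it can help to pinpoint the first step at which the loss stops being finite, so the batch that triggers it can be inspected. The following is only a minimal sketch in plain Python: the helper name first_nonfinite_step is hypothetical and is not part of the CURE scripts, and it assumes the per-step loss values have already been collected as floats.

```python
import math
from typing import Iterable, Optional


def first_nonfinite_step(losses: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or infinite loss, or None if all are finite.

    Hypothetical helper for illustration; not part of the CURE repository.
    """
    for step, loss in enumerate(losses):
        # math.isfinite is False for NaN, +inf and -inf.
        if not math.isfinite(loss):
            return step
    return None


# Example with made-up loss values: the third entry (index 2) is NaN.
example_losses = [2.31, 1.97, float("nan"), 1.80]
bad_step = first_nonfinite_step(example_losses)
```

Once the step is known, the corresponding batch and learning rate can be examined; the sketch itself does not change how training behaves.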
We are trying to train the GPT-CoNuT model. Following the instructions, we tried to run the training script: src/trainer/gpt_conut_trainer.py.
However, the training fails here:
https://github.com/lin-tan/CURE/blob/master/src/trainer/gpt_conut_trainer.py#L22
In the very first step, this code tries to load an existing model. However, we are trying to train the model from scratch, so unless I am missing something, this does not seem correct. Could you please share the artefact needed to train the model from scratch?
Looking forward to hearing your feedback. Thanks in advance for the help.
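To illustrate the behaviour being asked for, here is a minimal sketch of a "load if a checkpoint exists, otherwise initialise from scratch" pattern. It is not the code in gpt_conut_trainer.py: the function load_or_init and its load_fn and init_fn parameters are hypothetical stand-ins for whatever actually loads or constructs the model.

```python
import os
from typing import Callable, Optional, TypeVar

M = TypeVar("M")


def load_or_init(
    checkpoint_path: Optional[str],
    load_fn: Callable[[str], M],
    init_fn: Callable[[], M],
) -> M:
    """Load a model from checkpoint_path if that file exists, else build a fresh one.

    Hypothetical helper for illustration: load_fn and init_fn are supplied by
    the caller and stand in for the real loading and construction code.
    """
    if checkpoint_path and os.path.isfile(checkpoint_path):
        # A checkpoint is available: resume from the saved weights.
        return load_fn(checkpoint_path)
    # No usable checkpoint: start training from scratch.
    return init_fn()


# Example with placeholder callables and no checkpoint path given.
model = load_or_init(None, lambda path: "loaded", lambda: "fresh")
```

With a pattern like this, a missing checkpoint falls through to the from-scratch branch instead of failing at the load step.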