Training script for GPT-CoNuT model does not work #7

Open
nashid opened this issue Sep 27, 2022 · 6 comments

@nashid

nashid commented Sep 27, 2022

We are trying to train the GPT-CoNuT model. Following the instructions, we are running the training script src/trainer/gpt_conut_trainer.py.

However, the training fails here:

https://github.com/lin-tan/CURE/blob/master/src/trainer/gpt_conut_trainer.py#L22

    def __init__(self, train_loader, valid_loader, dictionary, gpt_file):
        gpt_loaded = torch.load(gpt_file)
        config = gpt_loaded['config']
        gpt_model = OpenAIGPTLMHeadModel(config).cuda()
        gpt_model.load_state_dict(gpt_loaded['model'])

At the very first step, this code tries to load an existing GPT checkpoint. However, we want to train the model from scratch, so unless I am missing something, this does not seem right. Could you please share the artefact needed to train the model from scratch?

Looking forward to hearing your feedback. Thanks in advance for the help.

@msintaha

msintaha commented Sep 27, 2022

Hi @lin-tan and @jiang719, I have a similar query. Could you also provide the config (used in OpenAIGPTLMHeadModel(config)) for training a new model from scratch?

Thank you.

@nashid
Author

nashid commented Oct 1, 2022

@lin-tan and @jiang719 we changed the code to instantiate OpenAIGPTLMHeadModel(config) directly so we can train a new model from scratch.

The change looks like this:

        - gpt_loaded = torch.load(gpt_file)
        - config = gpt_loaded['config']
        - gpt_model = OpenAIGPTLMHeadModel(config).cuda()
        - gpt_model.load_state_dict(gpt_loaded['model'])
        + configuration = OpenAIGPTConfig()
        + gpt_model = OpenAIGPTLMHeadModel(configuration).cuda() if torch.cuda.is_available() else OpenAIGPTLMHeadModel(configuration)
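
Expanded, the changed part of the constructor looks roughly like this (a sketch only; it assumes OpenAIGPTConfig is imported from the same package as OpenAIGPTLMHeadModel, and the default hyperparameters may not match what CURE actually used):

configuration = OpenAIGPTConfig()  # default hyperparameters; vocab_size presumably has to match the trainer's dictionary
gpt_model = OpenAIGPTLMHeadModel(configuration)
if torch.cuda.is_available():
    gpt_model = gpt_model.cuda()  # move to GPU only when one is available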

We would really appreciate it if you could share the config you trained the model with, as we don't want to compare CURE with our approach incorrectly.

@jiang719 can you please help?

@jiang719
Collaborator

jiang719 commented Oct 5, 2022

@msintaha @nashid I used things like:
n_positions=1024, n_ctx=1024, n_embd=384, n_layer=8, n_head=6

You can also try other reasonable settings; these values were just set empirically.
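
Put together, constructing the model with those settings would look roughly like this (a sketch assuming HuggingFace's transformers; the vocab size below is a placeholder and has to match the dictionary you train with):

from transformers import OpenAIGPTConfig, OpenAIGPTLMHeadModel

config = OpenAIGPTConfig(
    vocab_size=50000,  # placeholder; set this to the size of your subword dictionary
    n_positions=1024,
    n_ctx=1024,
    n_embd=384,
    n_layer=8,
    n_head=6,
)
gpt_model = OpenAIGPTLMHeadModel(config)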

Actually, the checkpoint contains the config information; you can retrieve and print it like this:

gpt_loaded = torch.load(gpt_file)  # the released GPT checkpoint
config = gpt_loaded['config']      # the config the model was trained with
print(config)

@nashid
Author

nashid commented Oct 6, 2022

@jiang719 I can't find any model pushed to the repo. Where can I find the checkpoint?

@nashid
Author

nashid commented Oct 6, 2022

@jiang719 can you please share the trained model, or upload it somewhere so that others can download it?

@HanJin996
Copy link

@nashid Hello! Were you able to train with gpt_conut_trainer.py and gpt_fconv_trainer.py in the end? When I train using these two scripts, the loss becomes NaN after the second round of reading the data. I have tried many things, but none of them work. If you have successfully trained with these two scripts, could you share your scripts and training data with me? (I'm fairly sure my training data is fine, since I used it to train two other NPR models.)
