Training script for GPT-CoNuT model does not work #7

nashid · 2022-09-27T21:47:30Z

We are trying to train the GPT-CoNuT model. Following the instructions, we ran the training script: src/trainer/gpt_conut_trainer.py.

However, the training fails here:

https://github.com/lin-tan/CURE/blob/master/src/trainer/gpt_conut_trainer.py#L22

    def __init__(self, train_loader, valid_loader, dictionary, gpt_file):
        gpt_loaded = torch.load(gpt_file)
        config = gpt_loaded['config']
        gpt_model = OpenAIGPTLMHeadModel(config).cuda()
        gpt_model.load_state_dict(gpt_loaded['model'])

In the very first step, this code is trying to load the model. Here, we are trying to train the model from scratch. So unless I am missing something, this does not seem correct. Can you share the artefact for training the model from scratch, please?

Looking forward to hearing your feedback. Thanks in advance for the help.


msintaha · 2022-09-27T23:47:29Z

Hi @lin-tan and @jiang719, I have a similar query. Can you also provide the config (used in OpenAIGPTLMHeadModel(config)) for training a new model from scratch?

Thank you.

nashid · 2022-10-01T20:21:22Z

@lin-tan and @jiang719 we instantiated OpenAIGPTLMHeadModel with a default OpenAIGPTConfig to train a new model from scratch.

The change looks like this:

        - gpt_loaded = torch.load(gpt_file)
        - config = gpt_loaded['config']
        - gpt_model.load_state_dict(gpt_loaded['model'])
        + configuration = OpenAIGPTConfig()
        + gpt_model = OpenAIGPTLMHeadModel(configuration).cuda() if torch.cuda.is_available() else OpenAIGPTLMHeadModel(configuration)

We would really appreciate it if you could share the config you trained the model with, as we don't want to compare CURE with our approach incorrectly.

@jiang719 can you please help?

jiang719 · 2022-10-05T16:59:30Z

@msintaha @nashid I used things like:
n_positions=1024, n_ctx=1024, n_embd=384, n_layer=8, n_head=6

You can also try other reasonable settings, as this is just empirically set.
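For anyone trying other settings, here is a minimal sketch of a sanity check on the values quoted above. It only verifies that n_embd divides evenly across the attention heads and gives a rough weight count for the transformer blocks. The helper name `check_gpt_kwargs` is hypothetical, and the estimate ignores biases, layer norms, and embeddings.

```python
# Hyperparameters quoted in the reply above; everything else is illustrative.
gpt_kwargs = dict(n_positions=1024, n_ctx=1024, n_embd=384, n_layer=8, n_head=6)


def check_gpt_kwargs(kwargs):
    """Return (head_dim, approx_block_params) or raise ValueError."""
    n_embd = kwargs["n_embd"]
    n_head = kwargs["n_head"]
    n_layer = kwargs["n_layer"]
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    head_dim = n_embd // n_head
    # Rough weights per block: 4*d^2 for attention plus 8*d^2 for a 4x-wide MLP.
    approx_block_params = 12 * n_embd ** 2 * n_layer
    return head_dim, approx_block_params


head_dim, approx_params = check_gpt_kwargs(gpt_kwargs)
print(head_dim, approx_params)  # 64 14155776

# Assumption: the same keywords could then be passed on as
# OpenAIGPTConfig(**gpt_kwargs), with vocab_size set to match the dictionary.
```

The thread does not state the vocabulary size, so that value would have to come from the dictionary used by the trainer.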

Actually, the checkpoint contains the config information. You can get and print it like this:

    gpt_loaded = torch.load(gpt_file)
    config = gpt_loaded['config']
    print(config)

nashid · 2022-10-06T06:24:59Z

@jiang719 I can't find any model pushed to the repo. Where can I find the checkpoint?

nashid · 2022-10-06T06:34:31Z

@jiang719 can you please share the trained model or upload it somewhere so that others can download it?

HanJin996 · 2023-08-28T10:42:56Z

@nashid Hello! Are you able to train with gpt_conut_trainer.py and gpt_fconv_trainer.py now? When I train with these two scripts, the loss becomes NaN after the second round of reads. I have tried many fixes, but none of them work. If you have trained successfully with these two scripts, could you share your scripts and training data with me? (I am fairly sure my training data is fine, since I used it to train two other NPR models.)
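One way to narrow down a NaN loss like this is to log the per-batch loss and stop at the first non-finite value, so the offending batch can be inspected. The sketch below is not taken from the CURE trainers; the helper `first_nonfinite` and the sample values are hypothetical.

```python
import math


def first_nonfinite(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Hypothetical per-batch losses; in a real trainer these would be loss.item().
logged = [4.21, 3.87, float("nan"), float("nan")]
print(first_nonfinite(logged))  # 2
```

Knowing the first failing step makes it possible to check that batch's inputs and the learning rate before the NaN spreads to the weights.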
