
loading pretrained open ai model #28

Open
mehdimashayekhi opened this issue Jul 20, 2018 · 3 comments

@mehdimashayekhi commented Jul 20, 2018

Can somebody please explain what these parameters are here:

def load_openai_pretrained_model(model, n_ctx=-1, n_special=-1, n_transfer=12, n_embd=768, path='./model/', ...

e.g., offsets, init parameters. Can you add some comments to this function? Thanks.

@rodgzilla (Contributor) commented

There you go (@thomwolf correct me if I'm wrong on any of these); see the sketch after this list for how they fit together:

  • n_ctx is the maximum number of tokens in an input sequence.
  • n_special is the number of special tokens used to format the input properly. For example, in the ROCStories problem we use 3 additional tokens: _start_, _delimiter_ and _classify_.
  • n_transfer is the number of pre-trained layers that will be loaded; the remaining layers will be initialized randomly.
  • n_embd is the dimension of the embedding and of the vector associated with each position in the network. It has the value 768 because the network uses multi-head attention with 12 heads and 768 = 12 * 64.
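For concreteness, here is a minimal sketch of how these arguments relate to each other in a ROCStories-style setup. Only the load_openai_pretrained_model signature quoted above comes from the repo; the dataset lengths and the module path are assumptions, and building the model itself is left as a placeholder.

```python
# Sketch only: how the arguments to load_openai_pretrained_model() fit together.
# The dataset lengths below are made-up; adapt them to your data.

from model_py import load_openai_pretrained_model  # module name may differ in your checkout

# 3 special tokens wrap each example: _start_ <story> _delimiter_ <ending> _classify_
n_special = 3

# n_ctx: longest formatted input (special tokens + text segments),
# capped at 512, the context length of the released OpenAI weights.
max_story_len, max_ending_len = 250, 60           # assumed dataset statistics
n_ctx = min(n_special + max_story_len + max_ending_len, 512)

n_embd = 768       # 12 attention heads * 64 dims per head
n_transfer = 12    # transfer all 12 pre-trained transformer blocks

# model = ...      # build your transformer here, sized for n_ctx / n_special / n_embd

load_openai_pretrained_model(
    model,
    n_ctx=n_ctx,          # position embeddings are loaded up to this length
    n_special=n_special,  # embeddings for the special tokens are initialized randomly
    n_transfer=n_transfer,
    n_embd=n_embd,
    path='./model/',      # directory with the released OpenAI weight files
)
```

If I read the loading code correctly, the embedding matrix it builds has n_vocab + n_special + n_ctx rows: the pre-trained token and position embeddings are kept, and only the n_special rows in between are initialized randomly.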

If these comments are helpful to you I will add them in the code.

@thomwolf (Member) commented

Exactly!

@mehdimashayekhi (Author) commented

@rodgzilla thanks!
