
loading pretrained open ai model #28

Open
mehdimashayekhi opened this issue Jul 20, 2018 · 3 comments

@mehdimashayekhi commented Jul 20, 2018

Can somebody please explain what these parameters are here:

def load_openai_pretrained_model(model, n_ctx=-1, n_special=-1, n_transfer=12, n_embd=768, path='./model/', ...

e.g., offsets, init parameters. Can you add some comments to this function? Thanks.

@rodgzilla (Contributor) commented

There you go (@thomwolf correct me if I'm wrong on any of these); see the sketch after this list for how they fit together:

  • n_ctx is the maximum number of tokens in an input sequence.
  • n_special is the number of special tokens used to format the input properly. For example, in the ROCStories problem we use 3 additional tokens: _start_, _delimiter_ and _classify_.
  • n_transfer is the number of pre-trained layers that will be loaded; the remaining layers will be initialized randomly.
  • n_embd is the dimension of the embedding and of the vector associated with each position in the network. It has the value 768 because the network uses multi-head attention with 12 heads and 768 = 12 * 64.
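For concreteness, here is a minimal sketch of how these arguments relate to each other in a ROCStories-style setup. Only the load_openai_pretrained_model signature quoted above comes from the repo; the dataset lengths and the module path are assumptions, and building the model itself is left as a placeholder.

```python
# Sketch only: how the arguments to load_openai_pretrained_model() fit together.
# The dataset lengths below are made-up; adapt them to your data.

from model_py import load_openai_pretrained_model  # module name may differ in your checkout

# 3 special tokens wrap each example: _start_ <story> _delimiter_ <ending> _classify_
n_special = 3

# n_ctx: longest formatted input (special tokens + text segments),
# capped at 512, the context length of the released OpenAI weights.
max_story_len, max_ending_len = 250, 60           # assumed dataset statistics
n_ctx = min(n_special + max_story_len + max_ending_len, 512)

n_embd = 768       # 12 attention heads * 64 dims per head
n_transfer = 12    # transfer all 12 pre-trained transformer blocks

# model = ...      # build your transformer here, sized for n_ctx / n_special / n_embd

load_openai_pretrained_model(
    model,
    n_ctx=n_ctx,          # position embeddings are loaded up to this length
    n_special=n_special,  # embeddings for the special tokens are initialized randomly
    n_transfer=n_transfer,
    n_embd=n_embd,
    path='./model/',      # directory with the released OpenAI weight files
)
```

If I read the loading code correctly, the embedding matrix it builds has n_vocab + n_special + n_ctx rows: the pre-trained token and position embeddings are kept, and only the n_special rows in between are initialized randomly.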

If these comments are helpful to you I will add them in the code.

@thomwolf (Member) commented

Exactly!

@mehdimashayekhi (Author) commented

@rodgzilla thanks!
