I know that n_vocab is the total number of tokens in the encoder dictionary. But when I saw vocab = n_vocab + n_special + n_ctx, I was confused. Maybe n_special is for the start, delimiter, and classify tokens, but what is n_ctx? Why add these 3 things? (Also, why is there so little commenting on variables and functions? Is there somewhere else to see an explanation of the code?) I am new to learning about the transformer.
n_ctx is the maximum number of tokens in an input sequence.
n_special is the number of special tokens used to format the input properly. For example, in the ROCStories problem we use 3 additional tokens: start, delimiter, and classify.
n_vocab is the number of actual vocabulary tokens (the encoder dictionary).
The reason for adding the three is that each token in a sequence also needs a position encoding. Here learned position embeddings are used (similar to word embeddings) and they are stored in the same embedding matrix, so the matrix needs n_ctx extra rows, one per position.
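As a minimal sketch of the idea (not the repository's actual code; the sizes, the matrix name we, and the helper embed are illustrative assumptions), the snippet below shows why the embedding matrix has n_vocab + n_special + n_ctx rows: word/special tokens index the first rows, and position ids index the last n_ctx rows of the same matrix, with the two lookups summed.

```python
import numpy as np

# Hypothetical sizes for illustration only
n_vocab   = 40000   # BPE vocabulary tokens
n_special = 3       # start, delimiter, classify
n_ctx     = 77      # maximum sequence length
n_embd    = 768     # embedding dimension

# One matrix holds word, special, AND position embeddings,
# hence the first dimension of n_vocab + n_special + n_ctx.
we = np.random.randn(n_vocab + n_special + n_ctx, n_embd) * 0.02

def embed(token_ids):
    """Sum token embeddings with position embeddings looked up
    from the same matrix (a sketch of the idea)."""
    seq_len = len(token_ids)
    # Position ids point at the last n_ctx rows of the matrix.
    position_ids = n_vocab + n_special + np.arange(seq_len)
    return we[token_ids] + we[position_ids]

# Example: a 5-token input (including two special-token ids)
h = embed(np.array([11, 42, 7, n_vocab, n_vocab + 1]))
print(h.shape)  # (5, 768)
```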