Encoder paddings influence results? #45

OanaMariaCamburu · 2018-11-20T11:40:28Z

Hi,

I noticed that simply increasing n_ctx changes the results. It is 77 in https://github.com/huggingface/pytorch-openai-transformer-lm/blob/master/train.py#L214, and I tried several larger values. For example, with n_ctx=200 I get:
ROCStories Valid Accuracy: 91.18
ROCStories Test Accuracy: 86.10

while without modifying it (n_ctx=77) I get:
ROCStories Valid Accuracy: 90.37
ROCStories Test Accuracy: 86.00

or with n_ctx=100:
ROCStories Valid Accuracy: 90.11
ROCStories Test Accuracy: 86.58

That is a difference of about 1 percentage point on the validation set and 0.58 points on the test set. Running twice with the same n_ctx gives the same result, so the differences are not run-to-run noise.

I also couldn't (quickly) find the code that sets the attention scores for padding positions to -INF. I could only find the mask at https://github.com/huggingface/pytorch-openai-transformer-lm/blob/master/model_pytorch.py#L87, which prevents the decoder from looking ahead, but nothing that prevents attention from going over the paddings.
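
To make concrete what I was looking for, here is a minimal sketch of a padding mask combined with the look-ahead mask. It is plain Python rather than the repository's code, and the names `attention_weights`, `masked_softmax`, and `is_pad` are my own, purely for illustration. Note that with right padding the look-ahead mask alone already hides later padding positions from earlier real tokens, so an explicit padding mask mainly matters when padding can precede real tokens or attention is bidirectional.

```python
import math

NEG_INF = float("-inf")


def masked_softmax(scores, allowed):
    """Softmax over one row of scores, giving zero weight to disallowed keys.

    Assumes at least one key is allowed; otherwise the result is undefined.
    """
    masked = [s if ok else NEG_INF for s, ok in zip(scores, allowed)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - top) for m in masked]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, is_pad):
    """Apply a causal mask and a padding mask to raw attention scores.

    scores: square list of lists, scores[q][k] for query q and key k.
    is_pad: list of bools, True where the key position is padding.
    """
    n = len(scores)
    weights = []
    for q in range(n):
        # A key is visible only if it is not in the future and not padding.
        allowed = [(k <= q) and not is_pad[k] for k in range(n)]
        weights.append(masked_softmax(scores[q], allowed))
    return weights


# Two real tokens followed by two padding positions (hypothetical scores).
scores = [[0.1 * (q + k) for k in range(4)] for q in range(4)]
weights = attention_weights(scores, [False, False, True, True])
print(weights[3])  # the last two entries are 0.0: no attention on padding
```

With both masks in place, every row still sums to 1 and padding keys receive exactly zero weight.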

As a side question, I was wondering about the choice of -1e9 for -INF: couldn't that be not negative enough, so that the model still gets a tiny bit of information from the positions ahead?
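
A quick way to check this numerically is sketched below. It uses plain Python floats (float64) and a hand-written `softmax` helper of my own, not the repository's code, and the scores are made up. As far as I can tell, once the maximum is subtracted, exp of roughly -1e9 underflows to exactly 0.0, and float32 underflows even sooner (below about -104), so the masked position would get exactly zero weight as long as the unmasked scores are nowhere near -1e9 in magnitude.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


visible = [2.0, -1.0, 0.5]  # hypothetical scores for the visible positions

with_big_negative = softmax(visible + [-1e9])
with_true_neg_inf = softmax(visible + [float("-inf")])

print(with_big_negative[-1])                   # 0.0
print(with_big_negative == with_true_neg_inf)  # True
```

Under these assumptions the two variants are indistinguishable, so -1e9 by itself should not leak information from masked positions.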

Thanks,
Oana

OanaMariaCamburu mentioned this issue on Dec 5, 2018 in "Why is output vocab including positional embeddings? #47" (Open)
