
How is the file "cloze_test_test__spring2016 - cloze_test_ALL_test.csv" created? #41

Open
luffycodes opened this issue Sep 12, 2018 · 5 comments

Comments

@luffycodes

The dataset downloaded from the website comes with different filenames, none of which matches this one. Could you please explain how this file is created, e.g. by merging the train, test, and val files? If so, which files exactly? Thanks!

@luffycodes
Author

Thanks! So the model is not trained on the entire dataset, "ROCStories__spring2016 - ROCStories_spring2016.csv"?

@artemisart

According to the datasets.py file, it's trained on 1497 examples from 'cloze_test_val__spring2016 - cloze_test_ALL_val.csv', validated on 374 examples from the same file, and tested on 'cloze_test_test__spring2016 - cloze_test_ALL_test.csv'.
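A minimal sketch of that split using only the Python standard library, for anyone trying to reproduce it. It assumes the Story Cloze CSV layout (a header row, then story id, four context sentences, two candidate endings, and a 1-based AnswerRightEnding column) and a seeded shuffle. The names `load_cloze_csv` and `split_train_valid` and the seed are hypothetical; the repo's datasets.py may load and shuffle the rows differently. Since 1497 + 374 = 1871, this assumes the validation CSV holds 1,871 stories.

```python
import csv
import random
from typing import List, Tuple

# One example: (four-sentence story, ending 1, ending 2, 0-based label)
Example = Tuple[str, str, str, int]


def load_cloze_csv(path: str) -> List[Example]:
    """Read a Story Cloze CSV (assumed layout: id, 4 sentences, 2 endings, 1-based answer)."""
    examples = []
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            story = " ".join(row[1:5])
            # Last column is assumed to be AnswerRightEnding (1 or 2); shift it to 0/1.
            examples.append((story, row[5], row[6], int(row[-1]) - 1))
    return examples


def split_train_valid(examples: List[Example], n_valid: int = 374,
                      seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy with a fixed seed, then hold out n_valid examples for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_valid:], shuffled[:n_valid]
```

The test CSV would be read with the same loader and kept entirely separate, so neither training nor validation ever sees it.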

@Belerafon

That looks like a catastrophically small dataset for a deep learning model, doesn't it? I have heard that a good starting point for getting an adequate model is 1 GB of text data. How does this work?

@rodgzilla
Contributor

The idea of the OpenAI paper is to use a pretrained network and transfer what it knows about language to another task. By doing this, you can obtain really good results with a small dataset.
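To make the transfer idea concrete for this task: the OpenAI GPT paper handles multiple-choice problems such as the Story Cloze Test by concatenating the context with each candidate ending (with a delimiter in between), scoring each sequence independently with the fine-tuned network, and normalizing the scores with a softmax. The sketch below shows only that wrapper. `score_fn` is a hypothetical stand-in for the pretrained transformer plus its small output head, and the special-token strings are placeholders, not the repo's actual tokens.

```python
import math
from typing import Callable, List, Sequence

# Placeholder special tokens (assumed; the real model uses learned token embeddings).
START, DELIM, CLF = "<start>", "<delim>", "<classify>"


def build_inputs(story: str, endings: Sequence[str]) -> List[str]:
    """One input sequence per candidate ending: [start] story [delim] ending [classify]."""
    return [f"{START} {story} {DELIM} {ending} {CLF}" for ending in endings]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def choose_ending(story: str, endings: Sequence[str],
                  score_fn: Callable[[str], float]) -> int:
    """Score each (story, ending) sequence independently; return the most probable index."""
    probs = softmax([score_fn(seq) for seq in build_inputs(story, endings)])
    return max(range(len(probs)), key=probs.__getitem__)
```

The softmax matters for the training loss; at prediction time the argmax of the raw scores gives the same answer. Since nearly all weights start from pretraining and only the small head is learned from scratch, roughly 1.5k labelled stories can be enough.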
