Mention the truncation danger somewhere #19

Open
sven-nm opened this issue Apr 4, 2022 · 0 comments

sven-nm commented Apr 4, 2022

This is not really an issue, as the code is doing what it should, but I think it would be good to warn users about truncation. If they truncate samples to fit the maximum model length (e.g. 512 tokens), which to my knowledge is the default setting of HuggingFace and PyTorch, the tokens beyond that limit never get a prediction. Their reconstructed files will then contain blanks, which will lower their results.
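To illustrate the effect, here is a minimal sketch in plain Python. It is not this project's code: `predict`, `labels_with_truncation`, `labels_with_chunking` and the tiny `MAX_LEN` are hypothetical stand-ins, and the "model" simply labels every token it receives. It only shows where the blanks come from and how splitting into windows avoids them.

```python
from typing import List, Optional

MAX_LEN = 8  # assumed stand-in for a real model limit such as 512


def predict(tokens: List[str]) -> List[str]:
    """Hypothetical model: returns one label per token it is given."""
    return ["O" for _ in tokens]


def labels_with_truncation(tokens: List[str], max_len: int = MAX_LEN) -> List[Optional[str]]:
    """Truncate to max_len; tokens past the limit never reach the model."""
    seen = tokens[:max_len]
    labels: List[Optional[str]] = list(predict(seen))
    # Pad with None to make the gap in the reconstructed output visible.
    return labels + [None] * (len(tokens) - len(seen))


def labels_with_chunking(tokens: List[str], max_len: int = MAX_LEN) -> List[str]:
    """Split into consecutive windows so that every token gets a label."""
    labels: List[str] = []
    for start in range(0, len(tokens), max_len):
        labels.extend(predict(tokens[start:start + max_len]))
    return labels


tokens = [f"tok{i}" for i in range(20)]
truncated = labels_with_truncation(tokens)
chunked = labels_with_chunking(tokens)

print(truncated.count(None))  # 12 tokens left blank
print(chunked.count(None))    # 0
```

With a real HuggingFace tokenizer, I believe the equivalent of the chunking variant would be passing `return_overflowing_tokens=True` (optionally with `stride`) alongside `truncation=True`, but I have not checked how that interacts with this project.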

Just dropping the idea!
