Long text clipped when disambiguated by BERT #145

Open
ahmadabousetta opened this issue Jun 6, 2024 · 1 comment
@ahmadabousetta
```python
predictions.extend(prediction)
```

The referenced line assumes that each new batch comes from a new sentence, which is fine when predicting a list of short sentences.
However, if we pass a single very long text, the dataloader splits that text into multiple batches.
Since the input is only one sentence, only the predictions of the first batch are returned; in my case, only 13,309 out of 16,949 tokens.

Fixing this issue should be done with care, as this function is also called to predict a list of sentences.
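One possible direction, as a minimal sketch only (this is not the library's actual code; the function name, inputs, and data shapes are assumptions): instead of treating each batch as a new sentence, flatten the per-batch token predictions and then split them by the token count of each original input sentence, so a sentence spanning several batches gets all of its predictions back.

```python
# Hypothetical sketch: regroup batch-level predictions into
# per-sentence lists using the known token count of each sentence.
def regroup_predictions(batch_preds, sentence_lengths):
    """batch_preds: list of per-batch token predictions.
    sentence_lengths: token count of each original input sentence.
    Returns one prediction list per sentence, regardless of how the
    dataloader split the tokens into batches."""
    # Flatten all batches into a single token-level stream.
    flat = [p for batch in batch_preds for p in batch]
    out, i = [], 0
    for n in sentence_lengths:
        out.append(flat[i:i + n])
        i += n
    return out

# A single 5-token sentence split by the dataloader into 3 batches:
batches = [[0, 1], [2, 3], [4]]
print(regroup_predictions(batches, [5]))  # [[0, 1, 2, 3, 4]]
```

The same regrouping also covers the multi-sentence case, since the split is driven by sentence lengths rather than batch boundaries.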

@ahmadabousetta ahmadabousetta changed the title long text clipped when disambiguated by BERT Long text clipped when disambiguated by BERT Jun 6, 2024
@owo owo assigned owo and go-inoue Aug 23, 2024
@owo
Collaborator

owo commented Aug 23, 2024

Agreed, we shouldn't be truncating output regardless of how long the input is.
We'll look into a good way of doing this without losing accuracy.
