Paper Parser Fails #1

Open
data-hound opened this issue Jan 16, 2023 · 1 comment


data-hound commented Jan 16, 2023

Hi,

This is a really awesome project, and it could become a very handy open tool using @keerthanpg's Colab!
I have been fiddling with the scripts, but I found that not all kinds of papers can be parsed yet. I tried a couple of recent NeurIPS papers:
https://proceedings.neurips.cc/paper/2021/file/007ff380ee5ac49ffc34442f5c2a2b86-Paper.pdf
https://proceedings.neurips.cc/paper/2021/file/003dd617c12d444ff9c80f717c3fa982-Paper.pdf

For both, the parser returned an empty list. Do you have any idea what the issue could be? I first suspected that two-column formatting was the problem, since the first couple of papers I tried were in two-column format and the parser failed on them. But now it's also failing on the single-column format (NeurIPS, the same venue as the demo paper). Maybe pdfplumber would be a better option?
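
For reference, here is a minimal sketch of what a pdfplumber-based check could look like. It assumes pdfplumber is installed and that the PDF has already been downloaded locally. `extract_page_texts` and `summarize_pages` are hypothetical helper names for illustration, not part of this project.

```python
from typing import Dict, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return one string per page, using pdfplumber (assumed installed)."""
    import pdfplumber  # imported lazily so summarize_pages works without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None for a page with no text layer,
        # so fall back to an empty string.
        return [page.extract_text() or "" for page in pdf.pages]


def summarize_pages(page_texts: List[str]) -> Dict[str, int]:
    """Count pages, whitespace-only pages, and total characters extracted."""
    return {
        "pages": len(page_texts),
        "empty_pages": sum(1 for text in page_texts if not text.strip()),
        "total_chars": sum(len(text) for text in page_texts),
    }
```

Running `summarize_pages(extract_page_texts("paper.pdf"))` on one of the failing papers (the file name is a placeholder) would show whether any text comes out at all. If every page is empty, that would suggest the problem lies in text extraction rather than in the column layout.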


Harsharma2308 commented Jan 17, 2023

The filtered text has some odd text lengths, going above 15k.
The data frame that gets created has shape (64, 3), but the text lengths look spurious. Maybe there is an issue with the PDF parser?
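
One quick way to check this, sketched in plain Python. It assumes the lengths are character counts and that the parsed text is available as a list of strings; `flag_long_chunks` is a hypothetical helper, and the sample data is made up rather than real parser output.

```python
from typing import List, Tuple


def flag_long_chunks(texts: List[str], max_chars: int = 15_000) -> List[Tuple[int, int]]:
    """Return (row index, length) for every text longer than max_chars."""
    return [(i, len(t)) for i, t in enumerate(texts) if len(t) > max_chars]


# Toy data standing in for the parsed text (not real parser output).
sample = ["Abstract ...", "x" * 16_000, "1 Introduction ..."]
print(flag_long_chunks(sample))
```

Rows flagged this way are the ones worth inspecting by hand, for example to see whether several sections were merged into a single entry.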

