Paper Parser Fails #1

data-hound · 2023-01-16T05:29:52Z

Hi

This is really an awesome project, and could become a very handy open tool using @keerthanpg's colab!
Anyways, I have been fiddling with the scripts but I found that not all kinds of papers can be parsed yet. I tried some of the recent NeurIPS papers:
https://proceedings.neurips.cc/paper/2021/file/007ff380ee5ac49ffc34442f5c2a2b86-Paper.pdf, https://proceedings.neurips.cc/paper/2021/file/003dd617c12d444ff9c80f717c3fa982-Paper.pdf

But the parser returned an empty list for both. Do you have any idea what the issue could be? At first I suspected two-column formatting, since the first couple of papers I tried were two-column and the parser failed on them. But now it's also failing on single-column papers (NeurIPS, the same venue as the demo paper). Maybe pdfplumber would be a better option?
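To check whether pdfplumber gets any text out of these PDFs at all, a minimal sketch along these lines might help. It assumes pdfplumber is installed and that the PDF has been downloaded locally. The function names `extract_page_texts` and `count_empty_pages` are made up for illustration and are not part of this project.

```python
from typing import List, Optional


def extract_page_texts(pdf_path: str) -> List[Optional[str]]:
    """Return the raw text of each page as extracted by pdfplumber."""
    # Third-party dependency, imported lazily so the helper below
    # can be used without it.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() for page in pdf.pages]


def count_empty_pages(page_texts: List[Optional[str]]) -> int:
    """Count pages whose extracted text is missing or whitespace-only."""
    return sum(1 for text in page_texts if not (text or "").strip())
```

If `count_empty_pages(extract_page_texts("paper.pdf"))` equals the page count, the PDF probably has no usable text layer; if not, the empty list is more likely coming from the filtering step than from extraction. Here `"paper.pdf"` is a placeholder path.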

Harsharma2308 · 2023-01-17T01:37:48Z

The filtered text has a weird length, going past 15k?
The data frame that gets created has shape (64, 3), but the text length looks spurious. Maybe some issue with the PDF parser?
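One way to narrow this down would be to list which rows carry the oversized text. The sketch below is only illustrative: it assumes the rows are available as dictionaries with a `"text"` field and that the 15k figure is a character count, neither of which is confirmed in this thread. `find_long_rows` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def find_long_rows(
    rows: List[Dict[str, str]],
    text_key: str = "text",      # assumed column name
    max_chars: int = 15000,      # assumed threshold, taken from the "15k" above
) -> List[Tuple[int, int]]:
    """Return (row index, text length) for rows whose text exceeds max_chars."""
    return [
        (index, len(row[text_key]))
        for index, row in enumerate(rows)
        if len(row[text_key]) > max_chars
    ]
```

With a pandas data frame, something like `find_long_rows(df.to_dict("records"))` should work. If only one or two of the 64 rows show up, the parser is probably merging several blocks of text into a single entry.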
