Fatal R error when attempting to extract text from a PDF that includes a particular mathematical symbol #166
Comments
@tomsutch thanks for reporting this
@tomsutch it took me longer than expected, but I think I was able to solve it
hi @tomsutch
Hi, thanks for looking into this! I can't see a new commit here - please could you point me to it?
Sorry, I realize I never pushed the commit. I have now pushed it in dev/, but it fails on Ubuntu, although it worked on Windows when I set UTF-8.
Hola @jazzido @tomsutch, I found this very interesting case that I can't solve "universally". Do you have any clues? I added my test to reproduce the error here: https://github.com/ropensci/tabulapdf/blob/main/dev/test-special_characters.R and the file here: https://github.com/ropensci/tabulapdf/blob/main/inst/examples/xbar.pdf
I proposed a fix in pachadotdev/tabula-java@7bcb49c, but when I build the jar locally, the resulting jar no longer works with R. This:
fails with:
Hi
I proposed a fix to the Java code, but the produced jar is not working for me
I updated to tabula 1.0.6, but because I do not know Java, I cannot fix the issue that comes from there; see https://github.com/ropensci/tabulapdf/tree/166. The solution is that Java returns "The mean of x is denoted ?" instead of "The mean of x is denoted ?̅?".
Description
Fatal R error when attempting to use extract_text on a PDF that includes $\bar{x}$. There's no error message, R just terminates.
Reproducible example
I have constructed a simple example PDF, attached as xbar.pdf, that gives the error. (I made this using Microsoft Word, inserting the $x$ and $\bar{x}$ using the equation editor, then saving to PDF.)
As this crashes R, I can't use the reprex package for this, as far as I know...
Note that if I call the tabula.jar bundled with the R package directly from the command line like this
java -jar C:\Users\<username>\AppData\Local\R\win-library\4.4\tabulapdf\java\tabula.jar xbar.pdf
I get the following output (which is fine for my purposes - I am not particularly concerned about the $\bar{x}$ rendering properly, I just don't want the R session to crash):
Expected result
No fatal error: I would expect any issues with reading/rendering the $\bar{x}$ to result in a fallback like putting in '??' or similar.
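To make the kind of fallback described here concrete, below is a minimal Java sketch. It is not the code from the tabula-java commit mentioned in the comments. It assumes, without confirmation from this thread, that the troublesome symbol is a code point outside the Basic Multilingual Plane (for example U+1D465, the mathematical italic x that Word's equation editor may emit). The class name `SupplementaryFallback` and the method `replaceSupplementary` are hypothetical names used only for illustration.

```java
// Hypothetical helper, not part of tabula-java: a sketch of a '?' fallback.
class SupplementaryFallback {

    // Replace every code point outside the Basic Multilingual Plane with '?'.
    // ASSUMPTION: such code points are what upsets the R side; the thread
    // does not confirm this.
    static String replaceSupplementary(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append('?');          // fallback placeholder
            } else {
                out.appendCodePoint(cp);  // keep BMP characters unchanged
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D465 (mathematical italic small x) followed by U+0305 (combining overline)
        String xbar = new String(Character.toChars(0x1D465)) + "\u0305";
        System.out.println(replaceSupplementary("The mean of x is denoted " + xbar));
    }
}
```

In this sketch the combining overline (U+0305) is inside the Basic Multilingual Plane, so it is kept and only the italic x becomes '?'. Whether a real fix should also drop combining marks is a separate design choice.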
Session info