Tika is crashing on a PDF at line 30 of StreamTextExtractor.cs while attempting to extract text from it. (The PDF contains confidential information, so sorry, I can't post it.)
Exception details:
System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=TikaOnDotNet
StackTrace:
at org.apache.jempbox.impl.XMLUtil.getStringValue(Element node)
Oddly, this code throws an exception even though it is inside a try/finally block. If it would let me catch the exception, we could just ignore this file and keep going.
The file causing the error came from a Konica copier and appears to be a TIFF embedded in a PDF. I suspect this error is related to issues #145 and #142, but only because Tika needs to extract information from a TIFF. I do not see how to add the optional dependencies to the .NET build to check whether that is the problem. Does anybody know how that is accomplished?
I can open the file in Adobe. I have also saved it as a new PDF, which fails as well.
Is it possible to catch this error so the code can keep going?
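To make the question concrete, here is a minimal sketch of the skip-and-continue behavior I am after. It is not the actual code from the project: `BatchExtraction`, `ExtractAll`, and the `extract` delegate are hypothetical names, and `extract` just stands in for whatever call performs the Tika extraction. Whether the NullReferenceException above would actually reach a catch block like this one is exactly the part I am unsure about.

```csharp
using System;
using System.Collections.Generic;

public static class BatchExtraction
{
    // Runs `extract` (a hypothetical stand-in for the Tika extraction call)
    // on every path. Files that throw are added to `skipped` instead of
    // aborting the whole batch.
    public static Dictionary<string, string> ExtractAll(
        IEnumerable<string> paths,
        Func<string, string> extract,
        List<string> skipped)
    {
        var results = new Dictionary<string, string>();
        foreach (var path in paths)
        {
            try
            {
                results[path] = extract(path);
            }
            catch (Exception ex)
            {
                // Record the failure and move on to the next file.
                skipped.Add(path);
                Console.Error.WriteLine(
                    $"Skipping {path}: {ex.GetType().Name}: {ex.Message}");
            }
        }
        return results;
    }
}
```

With something like this, the problem PDF would end up in `skipped` and every other file would still be processed.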