Exception on a simple pdf file extraction #158

Open
wis-niowy opened this issue Jun 28, 2023 · 0 comments

wis-niowy commented Jun 28, 2023

I started experimenting with TikaOnDotNet today and set up a simple PDF text extraction case.
Unfortunately, calling the TextExtractor.Extract() method fails with both overloads (the one taking a byte[] and the one taking a string path).
The exception chain is:

TextExtractionException: Extraction failed.
TypeInitializationException: The type initializer for 'java.nio.charset.StandardCharsets' threw an exception.
TypeLoadException: Could not load type 'System.Reflection.Emit.MethodToken' from assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'.

The code to reproduce is minimal:

var tikaResult_path = new TextExtractor().Extract(pathToPdf);
//(..)
// .. get file stream and initialize StreamReader instance
var bytes = await streamReader.ReadToEndAsync();
var tikaResult_bytes = new TextExtractor().Extract(bytes);

Both calls fail with the same exception chain.

Installed version of TikaOnDotNet.TextExtraction: 1.17.1 (published Tuesday, April 3, 2018).

I saw this comment in another issue: #118 (comment)
I checked whether the DLLs mentioned there are copied to the output folder, and they are (e.g. IKVM.OpenJDK.Cldrdata.dll).
