Exception on a simple pdf file extraction #158

wis-niowy · 2023-06-28T13:00:04Z

I started experimenting with TikaOnDotnet today and set up a simple PDF text extraction case.
Unfortunately, calling TextExtractor.Extract() throws an exception with both overloads I tried (byte[] and string path arguments).
The exception chain is:

TextExtractionException: Extraction failed.
TypeInitializationException: The type initializer for 'java.nio.charset.StandardCharsets' threw an exception.
TypeLoadException: Could not load type 'System.Reflection.Emit.MethodToken' from assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'.

The code to reproduce is very simple - I only do:

var tikaResult_path = new TextExtractor().Extract(pathToPdf);
//(..)
// .. get file stream and initialize StreamReader instance
var bytes = await streamReader.ReadToEndAsync();
var tikaResult_bytes = new TextExtractor().Extract(bytes);

Both calls fail with the same exceptions.

Installed version of TikaOnDotNet.TextExtraction: 1.17.1 (published Tuesday, April 3, 2018)

I saw this comment in another issue: #118 (comment)
I checked whether the DLLs mentioned there get copied to the output folder, and they do (e.g. IKVM.OpenJDK.Cldrdata.dll).
