Amazon Textract to hOCR

Convert your Amazon Textract results to hOCR output.

The code that transforms Amazon Textract text-extraction results into hOCR output is in code/hocrOuput.py.
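The conversion can be sketched roughly as follows. This is a minimal sketch, not the code in code/hocrOuput.py. It assumes a DetectDocumentText-style response: a Blocks list in which PAGE blocks link to LINE blocks, and LINE blocks link to WORD blocks, through CHILD relationships. It also assumes you know the page size in pixels and that every page shares it, because Textract reports bounding boxes as ratios of the page size while hOCR expects pixel coordinates. The names bbox_to_pixels, child_ids and textract_to_hocr_body are hypothetical.

```python
from html import escape


def bbox_to_pixels(box, page_width, page_height):
    """Textract BoundingBox values are ratios of the page size; hOCR wants pixels."""
    x0 = round(box["Left"] * page_width)
    y0 = round(box["Top"] * page_height)
    x1 = round((box["Left"] + box["Width"]) * page_width)
    y1 = round((box["Top"] + box["Height"]) * page_height)
    return x0, y0, x1, y1


def child_ids(block):
    """Ids listed in a block's CHILD relationships (PAGE -> LINE, LINE -> WORD)."""
    return [
        child_id
        for rel in block.get("Relationships", [])
        if rel["Type"] == "CHILD"
        for child_id in rel["Ids"]
    ]


def textract_to_hocr_body(response, page_width, page_height):
    """Render PAGE/LINE/WORD blocks as hOCR markup (assumes one page size for all pages)."""
    blocks = {block["Id"]: block for block in response["Blocks"]}
    pages = [b for b in response["Blocks"] if b["BlockType"] == "PAGE"]
    out = []
    line_no = word_no = 0
    for page_no, page in enumerate(pages, start=1):
        out.append(f'<div class="ocr_page" id="page_{page_no}" '
                   f'title="bbox 0 0 {page_width} {page_height}">')
        for line_id in child_ids(page):
            line = blocks[line_id]
            if line["BlockType"] != "LINE":  # skip non-text children, if any
                continue
            line_no += 1
            x0, y0, x1, y1 = bbox_to_pixels(
                line["Geometry"]["BoundingBox"], page_width, page_height)
            out.append(f'<span class="ocr_line" id="line_{line_no}" '
                       f'title="bbox {x0} {y0} {x1} {y1}">')
            for word_id in child_ids(line):
                word = blocks[word_id]
                word_no += 1
                x0, y0, x1, y1 = bbox_to_pixels(
                    word["Geometry"]["BoundingBox"], page_width, page_height)
                # Textract confidence is 0-100, which matches hOCR's x_wconf range.
                conf = round(word.get("Confidence", 0))
                out.append(f'<span class="ocrx_word" id="word_{word_no}" '
                           f'title="bbox {x0} {y0} {x1} {y1}; x_wconf {conf}">'
                           f'{escape(word["Text"])}</span>')
            out.append("</span>")
        out.append("</div>")
    return "\n".join(out)
```

Word text is HTML-escaped so that characters such as `<` in the extracted text cannot break the generated markup.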

To make the code work, install the following packages with pip:

In code/hocrOuput.py, inside the main function, replace the value of input_document_url with the Amazon S3 location of your document.
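If the location is written as an s3://bucket/key URL (an assumption; check how the script expects it), it can be split into the bucket and object key that Textract's Document parameter takes. The helper name s3_document is hypothetical, and the S3Object/Bucket/Name shape is the Document structure accepted by Textract operations such as DetectDocumentText.

```python
def s3_document(input_document_url):
    """Turn an s3://bucket/key URL into Textract's Document parameter.

    Splits on the first "/" instead of using urlparse, because S3 keys may
    legitimately contain "?" or "#", which urlparse would treat as query or
    fragment delimiters.
    """
    prefix = "s3://"
    if not input_document_url.startswith(prefix):
        raise ValueError(f"expected s3://bucket/key, got {input_document_url!r}")
    bucket, _, key = input_document_url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://bucket/key, got {input_document_url!r}")
    return {"S3Object": {"Bucket": bucket, "Name": key}}
```

With boto3 installed and AWS credentials configured, the result could be passed as `boto3.client("textract").detect_document_text(Document=s3_document(url))`; whether the script uses this synchronous call or an asynchronous one is not stated here.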

Run the script. It generates an output HTML file.
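An hOCR file is an ordinary (X)HTML page whose head declares the OCR engine and the hOCR classes used. The sketch below wraps already-generated hOCR body markup in such a page and writes it to disk; it illustrates the file format and is not the script's own output code. The function names and the default title are hypothetical.

```python
from html import escape
from pathlib import Path


def build_hocr_document(body, title="Amazon Textract hOCR output"):
    """Wrap hOCR body markup in a complete XHTML page with the hOCR meta tags."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>{escape(title)}</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="ocr-system" content="Amazon Textract" />
<meta name="ocr-capabilities" content="ocr_page ocr_line ocrx_word" />
</head>
<body>
{body}
</body>
</html>
"""


def write_hocr_document(body, output_path):
    """Write the wrapped document as UTF-8 and return its path."""
    path = Path(output_path)
    path.write_text(build_hocr_document(body), encoding="utf-8")
    return path
```

The ocr-capabilities value lists only the classes the body actually uses, so hOCR consumers know which elements to expect.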

Security

See CONTRIBUTING for more information.

This library is licensed under the MIT-0 License. See the LICENSE file.