Amazon Textract to hOCR

Convert your Amazon Textract results to hOCR output.

Usage Instructions

The code that transforms Amazon Textract text extraction results into hOCR output is in code/hocrOuput.py.

To make the code work you will need to install the following packages via pip:

In the main function of code/hocrOuput.py, set input_document_url to the location of your document in Amazon S3.
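As a rough illustration of what pointing Textract at an S3 document involves, here is a minimal sketch. It assumes input_document_url has the form s3://bucket/key and that the synchronous boto3 `detect_document_text` operation is used; neither is confirmed by this README, and the helper names `split_s3_url` and `detect_text` are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key).

    The s3:// form is an assumption; the script may expect another format.
    """
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Expected s3://bucket/key, got: {url!r}")
    return parsed.netloc, key


def detect_text(input_document_url):
    """Hypothetical helper: run Textract text detection on a document in S3."""
    # Imported here so split_s3_url can be used without boto3 installed.
    import boto3

    bucket, key = split_s3_url(input_document_url)
    client = boto3.client("textract")
    # detect_document_text is the synchronous Textract text-detection call.
    return client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
```

With a placeholder location such as s3://my-bucket/scans/page-1.png, `split_s3_url` returns the bucket name and object key that Textract needs. Calling `detect_text` requires AWS credentials with permission to use Textract and to read the object.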

Run the script. It generates an output HTML file.
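For orientation, the kind of mapping a Textract-to-hOCR conversion performs can be sketched as follows. This is not the code in code/hocrOuput.py; it is a simplified illustration that assumes a single-page response and that you supply the page size in pixels. The function names `bbox_to_pixels` and `textract_to_hocr` are hypothetical. Textract reports bounding boxes as ratios of the page size, while hOCR expects pixel coordinates in a `bbox x0 y0 x1 y1` property.

```python
from html import escape


def bbox_to_pixels(box, page_width, page_height):
    """Convert a ratio-based Textract BoundingBox to pixel (x0, y0, x1, y1)."""
    x0 = round(box["Left"] * page_width)
    y0 = round(box["Top"] * page_height)
    x1 = round((box["Left"] + box["Width"]) * page_width)
    y1 = round((box["Top"] + box["Height"]) * page_height)
    return x0, y0, x1, y1


def title_for(block, page_width, page_height):
    """Build the hOCR 'bbox ...' property string for one block."""
    coords = bbox_to_pixels(block["Geometry"]["BoundingBox"], page_width, page_height)
    return "bbox " + " ".join(str(c) for c in coords)


def textract_to_hocr(response, page_width, page_height):
    """Render the LINE and WORD blocks of a Textract response as hOCR markup."""
    blocks = {b["Id"]: b for b in response["Blocks"]}
    out = [
        "<html><body>",
        f"<div class='ocr_page' title='bbox 0 0 {page_width} {page_height}'>",
    ]
    for line in response["Blocks"]:
        if line["BlockType"] != "LINE":
            continue
        out.append(
            f"<span class='ocr_line' title='{title_for(line, page_width, page_height)}'>"
        )
        # A LINE block lists its WORD blocks through CHILD relationships.
        for rel in line.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                word = blocks[child_id]
                if word["BlockType"] != "WORD":
                    continue
                title = title_for(word, page_width, page_height)
                conf = round(word.get("Confidence", 0))
                out.append(
                    f"<span class='ocrx_word' title='{title}; x_wconf {conf}'>"
                    f"{escape(word['Text'])}</span>"
                )
        out.append("</span>")
    out.append("</div>")
    out.append("</body></html>")
    return "\n".join(out)
```

Writing the returned string to a file with an .html extension gives a document that hOCR-aware tools can read. Word text is HTML-escaped so that characters such as & and < do not break the markup.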

Output example

Example

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.