Convert your Amazon Textract results to hOCR output.
The code necessary for transforming Amazon Textract text extraction results to hOCR output is located in code/hocrOuput.py.
To run the code, install the following packages with pip:
- Yattag (PyPI name: yattag), used for HTML generation
- Textract-Caller (PyPI name: amazon-textract-caller), used to make calls to Amazon Textract
Inside code/hocrOuput.py, in the main function, set input_document_url to the location of your document in Amazon S3.
Run the script. It generates an output HTML file.
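
To illustrate the kind of mapping such a conversion involves, here is a minimal sketch and not the code in code/hocrOuput.py. It assumes Textract-style blocks (BlockType, Geometry.BoundingBox with relative Left/Top/Width/Height, and CHILD relationships) and turns LINE and WORD blocks into hOCR ocr_line and ocrx_word spans. The function names bbox_title and blocks_to_hocr are hypothetical, the page size in pixels is an assumed input (Textract returns relative coordinates only), and plain string formatting from the standard library stands in for Yattag so the example runs without extra packages.

```python
from html import escape


def bbox_title(box, page_width, page_height):
    """Turn a relative Textract BoundingBox into an hOCR 'bbox x0 y0 x1 y1' string."""
    x0 = round(box["Left"] * page_width)
    y0 = round(box["Top"] * page_height)
    x1 = round((box["Left"] + box["Width"]) * page_width)
    y1 = round((box["Top"] + box["Height"]) * page_height)
    return f"bbox {x0} {y0} {x1} {y1}"


def blocks_to_hocr(blocks, page_width, page_height):
    """Build an hOCR page fragment from a list of Textract-style blocks.

    page_width and page_height are assumed pixel dimensions of the page image.
    """
    by_id = {block["Id"]: block for block in blocks}
    parts = [f'<div class="ocr_page" title="bbox 0 0 {page_width} {page_height}">']

    for block in blocks:
        if block["BlockType"] != "LINE":
            continue
        line_title = bbox_title(block["Geometry"]["BoundingBox"], page_width, page_height)
        parts.append(f'<span class="ocr_line" title="{line_title}">')

        # A LINE block lists its WORD blocks through CHILD relationships.
        child_ids = [
            child_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for child_id in rel["Ids"]
        ]
        for child_id in child_ids:
            word = by_id.get(child_id)
            if word is None or word["BlockType"] != "WORD":
                continue
            word_title = bbox_title(word["Geometry"]["BoundingBox"], page_width, page_height)
            word_text = escape(word["Text"])  # escape &, <, > for HTML output
            parts.append(f'<span class="ocrx_word" title="{word_title}">{word_text}</span>')

        parts.append("</span>")

    parts.append("</div>")
    return "\n".join(parts)
```

Calling blocks_to_hocr(response["Blocks"], page_width, page_height) on a Textract response returns the page fragment as a string, which can then be wrapped in an HTML document and written to a file.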
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.