Adds invisible text layers to PDFs for Overview

Methodology

This program always outputs 0.json and 0.blob.

The output 0.json has wantOcr:false.
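As a rough illustration of this output contract, here is a minimal Go sketch. The README only guarantees that 0.json and 0.blob are always written and that 0.json has wantOcr:false. The idea that 0.json copies input.json's other fields through unchanged, the example field filename, and the helper names buildOutputJSON and writeOutputs are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// buildOutputJSON turns input.json's contents into 0.json's contents.
// ASSUMPTION: the other input fields are copied through unchanged; the
// README only guarantees that the output has wantOcr:false.
func buildOutputJSON(inputJSON []byte) ([]byte, error) {
	// json.RawMessage keeps the other values as-is instead of re-decoding
	// them (so large numbers are not rounded through float64).
	var meta map[string]json.RawMessage
	if err := json.Unmarshal(inputJSON, &meta); err != nil {
		return nil, fmt.Errorf("parsing input JSON: %w", err)
	}
	if meta == nil { // the input was the JSON literal null
		meta = map[string]json.RawMessage{}
	}
	meta["wantOcr"] = json.RawMessage("false")
	return json.Marshal(meta) // encoding/json writes map keys in sorted order
}

// writeOutputs writes both output files, since the program always produces
// 0.json and 0.blob. pdfWithText is a hypothetical stand-in for PdfOcr's result.
func writeOutputs(dir string, inputJSON, pdfWithText []byte) error {
	outJSON, err := buildOutputJSON(inputJSON)
	if err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(dir, "0.json"), outJSON, 0o644); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "0.blob"), pdfWithText, 0o644)
}

func main() {
	out, err := buildOutputJSON([]byte(`{"filename":"scan.pdf","wantOcr":true}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out)) // {"filename":"scan.pdf","wantOcr":false}
}
```

Whatever the input says about OCR, the sketch overwrites wantOcr rather than merging it, which matches the README's unconditional wording.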

We custom-built PdfOcr, so if it has errors, we'll fix them. The one error we can't handle is OutOfMemory: when it happens, we exit with a non-zero exit code and count on the framework to bail us out.

Testing

Write tests in test/test-*. Running docker build . runs the tests.
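For orientation, here is a hypothetical Go sketch of the first thing a test runner like /app/test-convert-single-file might do: find the test/test-* directories and report in TAP style, matching the "1..1" and "not ok 1 - test-embedded-png ..." lines in the build log further down. The function names are illustrative, and running the converter itself is left out.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// findTestDirs lists the entries under root whose names match test-*.
// filepath.Glob returns a nil error when nothing matches; its only
// possible error is a malformed pattern.
func findTestDirs(root string) ([]string, error) {
	return filepath.Glob(filepath.Join(root, "test-*"))
}

// tapLine formats one TAP result line. An empty failure means the test passed.
func tapLine(n int, name, failure string) string {
	if failure == "" {
		return fmt.Sprintf("ok %d - %s", n, name)
	}
	return fmt.Sprintf("not ok %d - %s %s", n, name, failure)
}

func main() {
	dirs, err := findTestDirs("test")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	// TAP plan line, e.g. "1..1" when there is a single test directory.
	fmt.Printf("1..%d\n", len(dirs))
	for _, dir := range dirs {
		// Running the converter and comparing outputs would happen here;
		// this sketch only lists the tests it found.
		fmt.Println("# found", filepath.Base(dir))
	}
}
```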

Each test has input.blob (with the same meaning as in production) and input.json (whose contents are $1 in do-convert-single-file). The files stdout, 0.json and 0.blob in the test directory hold the expected values. If actual values differ from expected values, the test fails.
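The comparison rule can be sketched in Go as follows. Beyond what the README states, this sketch assumes that the actual stdout is captured to a file named stdout next to the actual 0.json and 0.blob, that files are compared byte for byte, and that a missing expected file means the program must not write that file (which is what the "expected it not to exist" failure in the build log below suggests). All names are hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// The files the README lists as expected values in each test directory.
var comparedFiles = []string{"stdout", "0.json", "0.blob"}

// readOptional returns a file's contents and whether the file exists.
func readOptional(path string) ([]byte, bool, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	return data, true, nil
}

// compareOutputs describes the first mismatch between the expected and
// actual directories, or returns "" when every compared file agrees.
func compareOutputs(expectedDir, actualDir string) (string, error) {
	for _, name := range comparedFiles {
		want, wantOK, err := readOptional(filepath.Join(expectedDir, name))
		if err != nil {
			return "", err
		}
		actualPath := filepath.Join(actualDir, name)
		got, gotOK, err := readOptional(actualPath)
		if err != nil {
			return "", err
		}
		switch {
		case !wantOK && gotOK:
			return fmt.Sprintf("wrote %s, but we expected it not to exist", actualPath), nil
		case wantOK && !gotOK:
			return fmt.Sprintf("did not write %s, but we expected it to exist", actualPath), nil
		case wantOK && !bytes.Equal(want, got):
			return fmt.Sprintf("%s differs from the expected file", actualPath), nil
		}
	}
	return "", nil
}

// demoMismatch mimics the test-embedded-png failure: the expected
// directory has no 0.blob, but the actual run wrote one.
func demoMismatch() (string, error) {
	expected, err := os.MkdirTemp("", "expected")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(expected)
	actual, err := os.MkdirTemp("", "actual")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(actual)
	for _, dir := range []string{expected, actual} {
		if err := os.WriteFile(filepath.Join(dir, "stdout"), nil, 0o644); err != nil {
			return "", err
		}
		if err := os.WriteFile(filepath.Join(dir, "0.json"), []byte(`{"wantOcr":false}`), 0o644); err != nil {
			return "", err
		}
	}
	if err := os.WriteFile(filepath.Join(actual, "0.blob"), []byte("%PDF-"), 0o644); err != nil {
		return "", err
	}
	return compareOutputs(expected, actual)
}

func main() {
	msg, err := demoMismatch()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg)
}
```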

PDF is a tricky format to get exactly right. You may need to use the Docker image itself to generate expected output files. For instance, here is how we built test-embedded-png/0.blob:

1. Wrote test/test-embedded-png/{input.json,input.blob,0.json,stdout}.
2. Ran docker build . and saw output ending like this:

        Step 12/13 : RUN [ "/app/test-convert-single-file" ]
         ---> Running in 202f38be95c9
        1..1
        not ok 1 - test-embedded-png do-convert-single-file wrote /tmp/test-do-convert-single-file887786150/0.blob, but we expected it not to exist
        ...

3. Ran docker cp 202f38be95c9:/tmp/test-do-convert-single-file887786150/0.blob test/test-embedded-png/
4. Ran docker rm -f 202f38be95c9
5. Inspected the file to make sure it behaves as expected.
6. Ran docker build . again: success!
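The copy-and-clean-up steps above can be partly automated. This hypothetical Go helper pulls the container ID and the unexpected file's path out of a build log shaped like the one above and produces the matching docker cp and docker rm commands. The regular expressions are assumptions about the log format, and the approach only works while the failed build container still exists, as it does in this walkthrough.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// "---> Running in 202f38be95c9" names the container of a build step.
	containerRe = regexp.MustCompile(`Running in ([0-9a-f]+)`)
	// The test failure message names the file the test did not expect.
	unexpectedRe = regexp.MustCompile(`wrote (\S+), but we expected it not to exist`)
)

// copyCommands returns the docker cp and docker rm commands that copy the
// unexpected output file into testDir and then remove the container.
func copyCommands(buildLog, testDir string) ([]string, error) {
	// The failing step is the last one that started a container, so use
	// the last "Running in" match rather than the first.
	all := containerRe.FindAllStringSubmatch(buildLog, -1)
	if len(all) == 0 {
		return nil, fmt.Errorf("no container ID found in build log")
	}
	id := all[len(all)-1][1]
	u := unexpectedRe.FindStringSubmatch(buildLog)
	if u == nil {
		return nil, fmt.Errorf("no unexpected output file found in build log")
	}
	return []string{
		fmt.Sprintf("docker cp %s:%s %s/", id, u[1], testDir),
		fmt.Sprintf("docker rm -f %s", id),
	}, nil
}

func main() {
	// Log text taken from the walkthrough above.
	buildLog := `Step 12/13 : RUN [ "/app/test-convert-single-file" ]
 ---> Running in 202f38be95c9
1..1
not ok 1 - test-embedded-png do-convert-single-file wrote /tmp/test-do-convert-single-file887786150/0.blob, but we expected it not to exist`
	cmds, err := copyCommands(buildLog, "test/test-embedded-png")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, cmd := range cmds {
		fmt.Println(cmd)
	}
}
```

Inspecting the copied file by hand is still required before rerunning docker build .; the helper only saves retyping the IDs and paths.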
