How to Retro-Digitize a Historical Dictionary

Step 3.3: Transcribe the Dictionary Pages

Use the trained Hanunoo model to OCR the dictionary. For this tutorial, you will OCR the same 5 sample pages.

cd retro-digitization/tutorial
export TESSDATA_PREFIX=./final

tesseract sample-01.tif trained-01 -l hnn
tesseract sample-02.tif trained-02 -l hnn
tesseract sample-03.tif trained-03 -l hnn
tesseract sample-04.tif trained-04 -l hnn
tesseract sample-05.tif trained-05 -l hnn
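
The five commands above differ only in the page number, so they could also be written as a loop. The following is a minimal sketch, not part of the original tutorial: the function name `ocr_samples` is hypothetical, and it assumes you are already in retro-digitization/tutorial with the trained model available under ./final as set up above.

```shell
# Sketch: run the same tesseract command over all five sample pages.
# ocr_samples is a hypothetical helper name, not part of the tutorial repo.
ocr_samples() {
  # Same model location as in the commands above.
  export TESSDATA_PREFIX=./final
  for n in 01 02 03 04 05; do
    # Tesseract appends .txt to the output base name (trained-$n).
    # Stop at the first page that fails so the error is easy to spot.
    tesseract "sample-$n.tif" "trained-$n" -l hnn || return 1
  done
}
```

Calling `ocr_samples` from the tutorial directory should be equivalent to typing the five commands one by one.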

You should now see five OCR-ed text files, trained-01.txt through trained-05.txt. Tesseract appends .txt to each output base name.
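
Before moving on to proofreading, it may help to confirm that every output file exists and is not empty. This is a small sketch under the same assumptions as the commands above (run from the tutorial directory); the function name `check_ocr_outputs` is hypothetical and not part of the tutorial.

```shell
# Sketch: report which of the five expected OCR outputs are present.
# check_ocr_outputs is a hypothetical helper name.
check_ocr_outputs() {
  missing=0
  for n in 01 02 03 04 05; do
    f="trained-$n.txt"
    # -s is true only if the file exists and has a size greater than zero.
    if [ -s "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f"
      missing=1
    fi
  done
  # Non-zero exit status if any file was missing or empty.
  return "$missing"
}
```

Running `check_ocr_outputs` prints one line per page and returns a non-zero status if any page needs to be OCR-ed again.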

Step 3.4 - Proofread
