Skip to content

Latest commit

 

History

History
43 lines (33 loc) · 3.13 KB

README.md

File metadata and controls

43 lines (33 loc) · 3.13 KB

defHTR

Pytorch code for deformation-invariant line-level Handwritten Text Recognition, as proposed in paper (accepted to ICIP'21).

Motivation: Image deformations under simple geometric restrictions are crucial for Handwriting Text Recognition (HTR), since different writing styles can be viewed as simple geometrical deformations of the same textual elements.
Contibutions: 1) Exploration of different existing strategies for ensuring deformation invariance, including spatial transformers and deformable convolutions, under the context of text recognition. 2) Introduction of a new deformation-based algorithm, inspired by adversarial learning, which aims to reduce character output uncertainty during evaluation time.

DNN Architecture: A Convolutional-only HTR system is presented (see paper), where the output of a convolutional backbone, which transforms the images into a sequence of feature vectors, is fed into a cascade of 1-D convolutional layers. Model architecture can be modified by changing the the cnn_cfg and cnn_top variables in config.py. Specifically, CNN backbone is consisted of multiple stacks of ResBlocks and the default setting cnn_cfg = [(2, 32), 'M', (4, 64), 'M', (6, 128), 'M', (2, 256)] is interpeted as follows: the first stack consists of 2 resblocks with output channels of 32 dimensions, the second of 4 resblocks with 64 output channels etc. The head, consisted of three 1-D convolutional layers, can be modified through the cnn_top variable, which controls the number of output channels in these layers.

Selected Features:

  • Dataset is saved in a '.pt' file after the initial preprocessing for faster loading operations
  • All images are resized to 128x1024 (using padding if possible in order to retain aspect ratio).
  • Transformations used during training: global affine, local deformations (elastic net), local morphological operations.

File valid_deforms.py contains:

  • local_deform(img): creates a grid of smooth offsets in order to perform an elastic net distortion.
  • morphological(img): creates a grid which control the morhpological operation to apply in each neighborhod.
  • uncertainty_reduction(net, img): the proposed uncertainty reduction method to be used at evaluation. It discovers small local deformations (akin to local_deform method) which maximize the per character softmax output, i.e. minimize the uncertainty to select these characters.

Installation:

python3 -m venv venv_defhtr         # Create a virtual environment.. (optional)
source venv_defhtr/bin/activate     # ..and activate the virtual environment (optional)
pip3 install --upgrade pip          # Upgrade pip
pip3 install -r requirements.txt    # Install required libraries
cat utils/iam_config.py             # Update the folder where IAM resides in your hard drive
python3 train_htr.py                # Have fun!!

Note: Local paths of IAM dataset (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database) are hardcoded in utils/iam_config.py