SyntheticDocuments

Generate simulated handwritten texts to use as supplementary data for machine learning tasks.

Requirements

OpenCV
Python 3
Java JRE

Usage

Generating synthetic images requires a few prerequisites. The scripts depend on access to three sets of files:

Background images (i.e. the pages the text will be placed on)
Handwriting samples
Stains to place on the pages

By default, these need to be placed in background_images/, handwriting_images/, and stain_images/ folders relative to the root of this repository.
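Before running the generator, it can help to confirm that the input folders exist and are not empty. The following is a minimal sketch, not part of this repository: the helper `missing_input_dirs` is hypothetical, and the folder names in `DEFAULT_DIRS` are assumptions that should be adjusted to match whatever your configuration expects.

```python
from pathlib import Path

# Assumed folder names; change them to match your configuration.
DEFAULT_DIRS = ("background_images", "handwriting_images", "stain_images")


def missing_input_dirs(root, names=DEFAULT_DIRS):
    """Return the expected input folders that are missing or contain no files."""
    root = Path(root)
    missing = []
    for name in names:
        folder = root / name
        # A folder counts as usable only if it exists and holds at least one file.
        has_files = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_files:
            missing.append(name)
    return missing
```

Calling `missing_input_dirs(".")` from the repository root returns an empty list when every folder is populated.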

To get started, good sources of representative data for each type are listed below.

Background Images

Handwriting Samples

A good dataset to use is the IAM Handwriting Database, which is freely available for non-commercial research use. Note that downloading the database requires registration.

N.B. All handwriting samples should be black text on a white background.
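If some samples come as light text on a dark background, they can be inverted before use. The sketch below is an illustration only and is not taken from this repository: `is_black_on_white` and `to_black_on_white` are hypothetical helpers, and the median-brightness threshold of 127 is an assumed heuristic. It expects an 8-bit grayscale array, such as the one returned by `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`.

```python
import numpy as np


def is_black_on_white(gray):
    """Heuristic: the background covers most of the page, so a light
    median pixel value suggests a light background."""
    return float(np.median(gray)) > 127


def to_black_on_white(gray):
    """Return the image unchanged if it already looks black-on-white,
    otherwise return its inverse (assumes 8-bit pixel values)."""
    return gray if is_black_on_white(gray) else 255 - gray
```

The heuristic can fail on pages that are densely covered with ink, so spot-checking the results is advisable.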

Stains

A good collection of stains is provided by DIVADid itself. The small stains can be downloaded here and larger stains can be downloaded here.

Generating Synthetic Images

Once the required data files are in place, the script can be run as follows:

./generate_images.py 10

which will generate 10 synthetic images in a folder (by default in tmp/).

Options can be specified by editing the settings.ini file or passed in on the command line. For example,

./generate_images.py --output_dir=~/synthetic_images 10

will generate 10 images and save them in the ~/synthetic_images directory.
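The precedence described above, where INI values act as defaults and command-line flags override them, can be sketched as follows. This is an assumption about how such a merge might work, not the actual code in generate_images.py: the function `resolve_options` and the `[options]` section name are hypothetical, while `--output_dir`, the image count, and the tmp/ default come from the text above.

```python
import argparse
import configparser


def resolve_options(ini_text, argv):
    """Merge INI defaults with command-line overrides (command line wins)."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    # Hypothetical section name; an absent section simply yields no defaults.
    defaults = dict(config["options"]) if config.has_section("options") else {}

    parser = argparse.ArgumentParser()
    parser.add_argument("count", type=int, help="number of images to generate")
    # The INI value (or tmp/ if none is set) is used unless the flag is given.
    parser.add_argument("--output_dir", default=defaults.get("output_dir", "tmp/"))
    args = parser.parse_args(argv)
    return {"count": args.count, "output_dir": args.output_dir}
```

For instance, `resolve_options("[options]\noutput_dir = from_ini\n", ["--output_dir=cli", "3"])` yields an `output_dir` of `cli`, because the flag takes priority over the file.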

Explanation of Process

There are three high-level steps to the process of generating these synthetic images.

Add degradations to paper images
Alpha blend text to paper documents
Further degrade images

See DIVADid for information on how the degradations are performed, as DIVADid is used for all degradations.
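The alpha blending in the second step can be illustrated with a generic NumPy sketch. This shows the standard blending formula under stated assumptions and is not necessarily how this repository implements it: `alpha_blend_text` is a hypothetical function, both inputs are assumed to be 8-bit grayscale arrays of the same shape, and the text image is assumed to be black on white so that darker pixels carry more ink.

```python
import numpy as np


def alpha_blend_text(paper, text, opacity=1.0, ink=0.0):
    """Blend black-on-white text onto a paper image.

    Each text pixel's darkness is used as its alpha value:
    white (255) leaves the paper untouched, black (0) is fully ink.
    """
    paper_f = paper.astype(np.float64)
    # Darkness in [0, 1], scaled by the overall opacity.
    alpha = (255.0 - text.astype(np.float64)) / 255.0 * opacity
    # Standard alpha blend: result = background * (1 - a) + foreground * a
    blended = paper_f * (1.0 - alpha) + ink * alpha
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

Lowering `opacity` lets more of the paper texture show through the strokes, which is one simple way to imitate faded ink.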
