Skip to content

Single-pass DjVu encoder based on DjVu Libre, minidjvu and ImageMagick

Notifications You must be signed in to change notification settings

homocomputeris/img2djvu

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 

Repository files navigation

This is a single-pass DjVu encoder based on DjVu Libre and ImageMagick,
it requires bash and getopt, optionally also tifftopnm, minidjvu and
ocrodjvu plus cuneiform

img2djvu is in a public domain. Many thanks to members of
forum.ru-board.com for ideas, suggestions and corrections, especially to
U235, anagnost96 and monday2000.

===

SIMPLE USE

> img2djvu out

(where "out" is a name of folder wich contains some images)

img2djvu will output some diagnostics: first, total number of files in
folder, then will show processing status of every file. Skipped
non-graphic files are labeled with dots (.), graphic files are labeled
with angle brackets (<>). Black and white pages are labeled with B or M
(if minidjvu used), color pages with C, F (if cpaldjvu used) or L (if
layer separation used). OCR labeled as an additional layer (+1) or
additional plus (+). Aggressivity is the number of enclosed brackets
(e.g., <<<>>> for -a 2).

===

ADVANCED USE

> img2djvu -d 600 out

Will set resolution to 600 dpi (default is 300 dpi)

> img2djvu -t 17 out

Among color files, only files with color count < 17 will be sent to
cpaldjvu DjVu coder

> img2djvu -t 0 out

Will skip ImageMagick color count (this is much faster!) and send all
color files to c44 DjVu coder; cpaldjvu will not run

> img2djvu -m 10 out

For black and white pages, will use minidjvu DjVu coder (with 10 pages
per dictionary) instead of default cjb2 coder

> img2djvu -l 1 out

For color pages, img2djvu will perform layer separation*** (assuming that
the file is the result of Scan Tailor* mixed mode output**), then blur
color part and start forced segmentation****; it is slow but usually
produce compact output.

===

*    Scan Tailor: http://scantailor.sourceforge.net/

**   Mixed mode outputs image colors in [1...254] rank whereas text is
pure black [0] and page background is pure white [255]

***  Layer separation after Scan Tailor:
http://forum.ru-board.com/topic.cgi?forum=5&topic=32945 (in Russian),
http://alexrey036.narod.ru/LayerTailor/LayerTailor.zip

**** Forced segmentation: http://djvu-soft.narod.ru/scan/back_glue.htm
(in Russian)

> img2djvu -a 2 -m 20 out

Will produce more compact output; however, check the result
carefully---artifacts could appear!

> img2djvu -l 4 out

Will use four-fold downsampling for color layers; result will be
extremely compact. Coefficient should be an integer between 1 and 12;
best choices are 2, 3, or 4.

> img2djvu -p "" out

Will NOT use blur and contrast for processing color layers

> img2djvu -l 1 -r rus -e cuneiform -j 2 -a 1 out

After creation of the final DjVu, will run two OCR jobs of cuneiform with
"-rus" language option via ocrodjvu and insert text layer in place

> img2djvu -c 1 out

Will create temporary files in current directory

===

CAVEATS

Absolute paths are not allowed

It is expected that all images inside a folder have the same resolution

Folder and image names are very important; they, for example, determine
the page sequence in future DjVu file. It is strongly recommended to
rename files sequentially before im2djvu run. Also, please avoid using
anything except [0-9a-z_-] in the file and folder names. No spaces and
non-ASCII letters, please!

It is NOT recommended to use -l option with files which did not come from
Scan Tailor output

About

Single-pass DjVu encoder based on DjVu Libre, minidjvu and ImageMagick

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Shell 100.0%