Transformers OCR

https://tatsumoto.neocities.org/blog/mining-from-manga.html


An OCR tool for the GNU operating system that uses Transformers. Supports Xorg and Wayland.

ocr.mp4

This Manga OCR application is likely the most suckless and lightweight option available. It is designed to work best with a tiling window manager. It has few dependencies, and you probably already have all of them installed. However, it still relies on large Python libraries to work. To isolate the bloat, these libraries are installed in a dedicated folder. If your computer is rather slow, use Tesseract instead.

Installation

Arch Linux and Arch-based distros

Install from the AUR.

Other distros

If you want to package this program for your distribution and know how to do it, please create a pull request. Otherwise, read the section below.

To install manually (not recommended)

The steps below are for people who can't access the AUR.

Step 1. Install the dependencies for your environment if they are not already installed.

Xorg
Wayland
GNOME
KDE

Step 2. Install the program using the Makefile.

git clone 'https://github.com/Ajatt-Tools/transformers_ocr.git'
cd -- 'transformers_ocr'
sudo make install

Setup

Before you start, download manga-ocr data:

transformers_ocr download

The files will be saved to ~/.local/share/manga_ocr.

Usage

To show a help page, run transformers_ocr help.

To OCR text on a manga page, run:

transformers_ocr recognize

Bind the command to a keyboard shortcut using your WM's config. This enables you to call the OCR from anywhere, as shown in the demo video.

For example, if you use i3wm, add this line to the config file.

bindsym $mod+o  exec --no-startup-id transformers_ocr recognize

The first run takes longer than usual because additional files are downloaded and saved to ~/.cache/huggingface.

On the first run, transformers_ocr launches a listener process that runs in the background and reads any new screenshots passed to it. To avoid this startup delay on the first recognition, add the command below to autostart (using ~/.profile, ~/.xinitrc, etc.).

transformers_ocr start
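
If the same ~/.profile is shared with machines where the program may not be installed, the autostart line can be guarded. This is only a minimal sketch: the function name start_ocr_listener and the command -v check are additions for illustration, and only transformers_ocr start comes from this README.

```shell
# Hypothetical wrapper for ~/.profile: start the listener only when the
# transformers_ocr command is actually available on this machine.
start_ocr_listener() {
    if command -v transformers_ocr >/dev/null 2>&1; then
        transformers_ocr start
    fi
}

start_ocr_listener
```

On a machine without the program the function does nothing and returns successfully, so login scripts keep working.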

Holding text

Quite often one sentence, phrase or a chunk of meaning is split between two or more speech bubbles. This is a problem because if you take a screenshot of the whole area, including the area between the speech bubbles, you will likely end up with junk in the results. Processing each bubble separately is also not ideal since you want to analyze the entire sentence in GoldenDict, add it to Anki, etc.

A solution is to have transformers_ocr hold text for you. It will recognize one speech bubble, remember it, then wait for another, and only copy the text from all bubbles together when you're done.

To use this feature, add a new keyboard shortcut to the config file of your WM, for example Mod+Shift+o. Example for i3wm:

bindsym $mod+Shift+o  exec --no-startup-id transformers_ocr hold

screencast.mp4

Every time you call hold, a speech bubble will be recognized and saved for later. Finally, call recognize using the usual keyboard shortcut to copy the last speech bubble and all the saved ones together. The list of saved bubbles will be emptied when calling recognize.

Config file

Optionally, you can create a config file.

mkdir -p ~/.config/transformers_ocr
touch ~/.config/transformers_ocr/config

Each line must have this format: key=value. Lines that start with # are ignored.
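
To make the format concrete, here is a small sketch that writes a sample config to a scratch file and reads a key back. The config_get helper is hypothetical and is not how transformers_ocr itself parses the file. It only mirrors the two rules stated above (key=value lines, # lines ignored). The keys used are the ones described in this README.

```shell
# Hypothetical helper (not part of transformers_ocr): print the value of a key
# from a key=value file, skipping blank lines and lines that start with '#'.
config_get() {
    key=$1
    file=$2
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
            "$key="*) printf '%s\n' "${line#*=}"; return 0 ;;
        esac
    done < "$file"
    return 1    # key not found
}

# Write a sample config to a scratch file so the real one is left untouched.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Lines starting with # are ignored.
force_cpu=yes
clip_command=goldendict %TEXT%
EOF

config_get clip_command "$sample"    # prints: goldendict %TEXT%
```

Everything after the first = is treated as the value here, which is why the clip_command value may contain spaces.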

Pass image path

The --image-path argument lets you run OCR on an existing image file instead of relying on a screenshot application. It can also be used to add support for other screenshot applications.

Example usage in zsh:

flameshot_path=$(mktemp -u --suffix .png)
# cli usage for flameshot with no copy
flameshot gui --path "$flameshot_path" --delay 100
transformers_ocr recognize --image-path "$flameshot_path"
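
The same argument can be used on images that already exist on disk. The sketch below loops over the PNG files in a directory. The ocr_directory function and the OCR_CMD variable are hypothetical names introduced for illustration (OCR_CMD exists only so the loop can be dry-run), and recognize --image-path is the only part taken from this README.

```shell
# Hypothetical helper: OCR every PNG in a directory, one file at a time.
# OCR_CMD is an assumption added so the loop can be dry-run with another command.
ocr_directory() {
    dir=$1
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue    # no PNG matched, the pattern stayed literal
        ${OCR_CMD:-transformers_ocr} recognize --image-path "$img"
    done
}

# Dry run: print the commands instead of running the OCR.
pages=$(mktemp -d)
: > "$pages/page1.png"
: > "$pages/page2.png"
OCR_CMD=echo ocr_directory "$pages"
```

To run it for real, drop the OCR_CMD=echo prefix and pass a directory that contains actual images.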

Send text to an external application

Instead of copying text to the clipboard, you may want to pass it as an argument to an external application. In the example below, clip_command is set to goldendict, which sends recognized text directly to GoldenDict and keeps the system clipboard free for other tasks.

echo 'clip_command=goldendict %TEXT%' >> ~/.config/transformers_ocr/config
transformers_ocr stop
transformers_ocr start

If %TEXT% is passed as a parameter, it will be replaced with the actual text in the speech bubble. If not, the text will be passed to stdin of the called program.
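
The rule can be illustrated with a short shell sketch. This is not the program's actual code: send_text is a hypothetical function that only imitates the behavior described above, and for simplicity it replaces just the first %TEXT% in each argument.

```shell
# Illustration only: imitates the documented %TEXT% rule.
# Usage: send_text TEXT COMMAND [ARGS...]
send_text() {
    st_text=$1
    shift
    st_placeholder='%TEXT%'
    st_found=no
    # Rebuild the argument list, substituting the placeholder where present.
    for st_arg do
        shift
        case $st_arg in
            *%TEXT%*)
                st_found=yes
                st_before=${st_arg%%"$st_placeholder"*}
                st_after=${st_arg#*"$st_placeholder"}
                st_arg=$st_before$st_text$st_after
                ;;
        esac
        set -- "$@" "$st_arg"
    done
    if [ "$st_found" = yes ]; then
        "$@"                                # text was passed as an argument
    else
        printf '%s' "$st_text" | "$@"       # text is passed on stdin
    fi
}

send_text 'hello' echo 'lookup: %TEXT%'    # prints: lookup: hello
send_text 'hello' cat                      # prints: hello
```

In practice this means programs that take the text as an argument need %TEXT% in clip_command, while programs that read stdin do not.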

Force CPU

To force the OCR to run on the CPU, add this setting to the config file:

echo 'force_cpu=yes' >> ~/.config/transformers_ocr/config