[TOC]
This project builds on the FOTS implementations by Ning Lu and DongLiang Ma, as well as other excellent open source projects. Thanks to all open source contributors.
Field | Role | Reference Value | Comment |
---|---|---|---|
name | project name | FOTS_2019-05-11_own_dataset | as you like |
cuda | whether to use the GPU | true | use it if you can |
gpus | GPUs to use for parallel training | [0] | run nvidia-smi and note the ids of the GPUs you want to use |
finetune | checkpoint to finetune from | "" | the model must be finetuned with the same mode |
data_loader | dataset configuration | see below for details | |
validation | validation dataset configuration | see below for details | |
lr_scheduler_type | learning rate scheduler type | ExponentialLR | linear, exponential, custom, etc. |
lr_scheduler_freq | learning rate scheduler frequency | 50 | adjust according to the convergence rate of the model |
lr_scheduler | hyper-parameters of the learning rate scheduler | depend on the scheduler type | |
optimizer_type | optimizer type | Adam | any optimizer type supported by PyTorch |
optimizer | hyper-parameters of the optimizer | depend on the optimizer type | |
loss | loss function name | FOTSLoss | only FOTSLoss is supported for now |
metrics | metric names | fots_metrics | recognition_metric and detection_metric are still incomplete |
trainer | trainer configuration | see below for details | |
arch | name of the model architecture | FOTSModel | only FOTSModel |
model | hyper-parameters of FOTSModel | see below for details | updated frequently |
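Putting these fields together, a complete config file has roughly the following shape. This is a sketch assembled from the tables and examples in this document; the `lr_scheduler` and `optimizer` values shown (`gamma`, `lr`) are illustrative PyTorch parameters, not prescribed values:

```json
{
    "name": "FOTS_2019-05-11_own_dataset",
    "cuda": true,
    "gpus": [0],
    "finetune": "",
    "data_loader": {"...": "see the data_loader section below"},
    "validation": {"...": "see the validation section below"},
    "lr_scheduler_type": "ExponentialLR",
    "lr_scheduler_freq": 50,
    "lr_scheduler": {"gamma": 0.9},
    "optimizer_type": "Adam",
    "optimizer": {"lr": 0.0001},
    "loss": "FOTSLoss",
    "metrics": "fots_metrics",
    "trainer": {"...": "see the trainer section below"},
    "arch": "FOTSModel",
    "model": {"...": "see the model section below"}
}
```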
The `data_loader` field takes the following parameters:

Field | Role | Reference Value | Comment |
---|---|---|---|
dataset | type of dataset | icdar2015, mydataset | if your data format is special, add support in data_loader/dataset.py |
data_dir | path to the data folder | training_gt, training_images | the folder must contain the images folder and the ground truth folder |
annotation_dir | path to the annotation folder | training_gt | NOT COMPLETE |
batch_size | batch size | 32 | according to your hardware |
shuffle | whether to shuffle the data | true | recommended |
workers | number of workers generating data | 0 | 1. in Docker, the environment parameters need to be passed in, otherwise an error is reported; 2. if get_item is inefficient, it is highly recommended to use more workers, or the GPU will sit idle |
To train on the icdar2015 dataset, modify the configuration file as follows:
```json
{
    "data_loader": {
        "dataset": "icdar2015",
        "data_dir": "/mnt/disk1/dataset/icdar2015/4.4/training",
        "batch_size": 16,
        "shuffle": true,
        "workers": 0
    }
}
```
To train on your own dataset, modify the configuration file as follows:
```json
{
    "data_loader": {
        "dataset": "mydataset",
        "image_dir": "/data/OCR/owndataset/own_dataset/training_images",
        "annotation_dir": "/data/OCR/owndataset/own_dataset/training_gt",
        "batch_size": 4,
        "shuffle": true,
        "workers": 0
    }
}
```
"validation": {
"validation_split": 0.15,
"shuffle": true
}
The training and validation data are split in a fixed proportion: `validation_split` is the fraction of the data held out for validation, and `shuffle` controls whether the samples are reordered before the split.
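For reference, a fixed-proportion split of this kind usually boils down to something like the following sketch. The sampler-based approach and all names here are assumptions for illustration, not this project's exact code:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

def make_loaders(dataset, validation_split=0.15, shuffle=True, batch_size=16):
    """Sketch of a fixed-proportion train/validation split; illustrative only."""
    indices = list(range(len(dataset)))
    if shuffle:
        # reorder the samples before splitting
        indices = torch.randperm(len(dataset)).tolist()
    n_val = int(len(dataset) * validation_split)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    val_loader = DataLoader(dataset, batch_size=batch_size,
                            sampler=SubsetRandomSampler(val_idx))
    return train_loader, val_loader
```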
"trainer": {
"epochs": 10000,
"save_dir": "/path/to/save_model",
"save_freq": 1,
"verbosity": 2,
"monitor": "loss",
"monitor_mode": "min"
}
In the trainer parameters, `epochs` is the total number of training epochs and `save_dir` is where models are stored; the final location is `save_dir/name`, where `name` is the project name from the global configuration. `save_freq` means the model is saved every N epochs. `verbosity` sets the logger display level. `monitor` and `monitor_mode` select the optimal model, saved as `model_best.pth.tar`. The following quantities can be monitored:
- `[val_]loss`: global loss
- `[val_]det_loss`: text detection loss
- `[val_]rec_loss`: text recognition loss
- `[val_]precious`: the precision defined in `metric`
- `[val_]recall`: the recall defined in `metric`
- `[val_]hmean`: the F-score defined in `metric`

(The `val_` prefix denotes the value computed on the validation set.)
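Internally, this pair typically reduces to a simple comparison, along the lines of the following sketch. The names are illustrative, not this project's exact code:

```python
import math

class BestMonitor:
    """Sketch of how `monitor`/`monitor_mode` could pick model_best.pth.tar.
    This illustrates the mechanism; it is not this project's exact code."""

    def __init__(self, monitor="loss", monitor_mode="min"):
        self.monitor = monitor
        self.mode = monitor_mode
        self.best = math.inf if monitor_mode == "min" else -math.inf

    def improved(self, log):
        # `log` is a dict of the epoch's metrics, e.g. {"loss": 0.42}
        value = log[self.monitor]
        better = value < self.best if self.mode == "min" else value > self.best
        if better:
            # at this point the checkpoint would be written as model_best.pth.tar
            self.best = value
        return better
```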
In the example above, you want `loss` to be as small as possible. You could instead monitor `precious` and make it as large as possible, i.e. set `monitor` to `precious` and `monitor_mode` to `max`.
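For example, the trainer block from above with only the monitor settings changed:

```json
"trainer": {
    "epochs": 10000,
    "save_dir": "/path/to/save_model",
    "save_freq": 1,
    "verbosity": 2,
    "monitor": "precious",
    "monitor_mode": "max"
}
```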
"model": {
"mode": "united",
"scale": 512,
"crnn": {
"img_h": 16,
"hidden": 1024
},
"keys": "number_and_dot"
}
NOTE
This part is not finalized yet and can be customized according to individual needs.
`mode` selects which part of the model is trained. Three modes are available:

- `detection`: train only the detection branch
- `recognition`: train only the recognition branch
- `united`: train detection and recognition together

If you only need a single module, choose `detection` or `recognition` (see the example below); the default is to train both branches together.
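For example, to train only the detection branch, keep the other fields from the model example above and change `mode`:

```json
"model": {
    "mode": "detection",
    "scale": 512,
    "crnn": {
        "img_h": 16,
        "hidden": 1024
    },
    "keys": "number_and_dot"
}
```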
The `scale` parameter has not been wired up yet; it will later be used to adjust the size of the recognition boxes.
In `crnn`, `img_h` is the height of the FeatureMap passed into the CRNN model after `ROIRotate`; **it must be a multiple of 8**. `hidden` is the hidden size of the BiLSTM in CRNN; tune these parameters yourself.
`keys` is the character set currently used for recognition. To add a character set or view the existing ones, see common_str.py.
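The file itself is not reproduced here; judging by the `keys` value above, adding a character set presumably amounts to defining one more string constant, roughly like this (both names and contents below are illustrative guesses, check common_str.py for the real ones):

```python
# common_str.py -- hedged sketch; check the real file for the actual contents.
# The "keys" value in the model config selects one of these constants by name.
number_and_dot = "0123456789."                 # matches "keys": "number_and_dot"
alphabet_lower = "abcdefghijklmnopqrstuvwxyz"  # a hypothetical additional set
```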
To train:

```bash
python train.py -c /path/to/your/config.json
```

To evaluate:

```bash
python eval.py -m /path/to/your/model.pth.tar -i /path/to/eval/images -o /path/to/output/result
```
See the `service` folder for details.
NOTE
There are some differences between this project and the original paper: various adjustments were made so that the model converges better and fits real-world scenes. The result may not be exactly FOTS, so don't worry about consistency with the original paper; good results are enough.
Essentially, this FOTS is an updated implementation of EAST + CRNN. Instead of simply and crudely combining two separate models, FOTS puts both into one model, so gradient descent updates the two parts jointly. The model structure is shown in the figure below:
NOTE
In the paper, the FPN part is shared between det and rec, but in practice it only converged when training det or rec alone; making the shared part converge for both would require a much larger model. For convenience, two separate FPN branches are used here, one for det and one for rec, which makes the model much easier to converge.
The det part is trained like a common object detection model. The rec part applies ROIRotate (essentially an affine transformation) to the FeatureMap of the ground-truth rbox regions, and the result serves as the input of CRNN. At inference time, the det part predicts the rboxes, the corresponding regions of the rec-branch FPN FeatureMap are ROIRotated, and the result is passed into CRNN.
The overall model is still one-stage.
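To make the ROIRotate step above concrete, here is a minimal sketch of cropping a rotated box out of a feature map with an affine warp. OpenCV is used for illustration only; the project itself applies this to network FeatureMaps, and the function below is a simplified stand-in, not its actual implementation:

```python
import cv2

def roi_rotate(feature_map, center, size, angle_deg, out_h=16):
    """Simplified ROIRotate sketch: extract an rbox region via an affine
    transform and resize it to a fixed height (cf. img_h in the config).

    feature_map: H x W (x C) numpy array
    center:      (x, y) center of the rbox
    size:        (width, height) of the rbox
    angle_deg:   rotation of the rbox in degrees
    """
    w, h = size
    # rotate around the rbox center so the box becomes axis-aligned
    M = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    # then translate so the box's top-left corner lands at the origin
    M[0, 2] += w / 2 - center[0]
    M[1, 2] += h / 2 - center[1]
    cropped = cv2.warpAffine(feature_map, M, (int(round(w)), int(round(h))))
    # fixed height for the CRNN input, width scaled to keep the aspect ratio
    out_w = max(1, int(round(w * out_h / h)))
    return cv2.resize(cropped, (out_w, out_h))
```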
- use APEX to convert the model from float32 to float16
- make the CRNN LSTM step parameter adjustable, so that the rec part can also support CNN + CTC
- ultra-long text recognition
- adaptive TextSnake