We proposed strategies to comprehensively enhance CTC-based STR models and developed a novel CTC-based method, SVTRv2. SVTRv2 can outperform previous attention-based STR methods in accuracy while retaining the advantages of CTC, such as fast inference and robust recognition of long text. These features make SVTRv2 particularly well suited to practical applications. To this end, building on SVTRv2, we trained a practical version of the model from scratch on publicly available Chinese and English datasets. Combined with a detection model, it forms OpenOCR, a general OCR system that is both accurate and efficient. Compared with the PP-OCRv4 baseline on the OCR competition leaderboard, OpenOCR (mobile) achieves a 4.5% improvement in accuracy while preserving a similar inference speed on an NVIDIA 1080Ti GPU.
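To illustrate why CTC inference is fast, here is a minimal sketch of generic greedy CTC decoding: one argmax per frame, then repeats are collapsed and blanks removed, with no autoregressive decoding loop. This is the textbook rule, not code taken from SVTRv2; the function name `ctc_greedy_decode`, the toy character set, and the scores are all made up for illustration.

```python
def ctc_greedy_decode(frame_scores, charset, blank=0):
    """Collapse per-frame predictions into text using the standard CTC rule."""
    chars = []
    previous = blank
    for scores in frame_scores:
        # Pick the most likely class for this frame (argmax).
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only if it is not blank and not a repeat
        # of the class predicted for the previous frame.
        if best != blank and best != previous:
            chars.append(charset[best])
        previous = best
    return "".join(chars)


# Toy example (hypothetical data): index 0 is the blank symbol.
charset = ["<blank>", "c", "a", "t"]
frames = [
    [0.10, 0.80, 0.05, 0.05],  # c
    [0.10, 0.70, 0.10, 0.10],  # c again, collapsed into the previous frame
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
]
print(ctc_greedy_decode(frames, charset))  # prints "cat"
```

Because every frame is decoded independently in a single pass, the cost grows linearly with the sequence length, which is one reason CTC models cope well with long text.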
Model | Config | E2E Metric | Downloading |
---|---|---|---|
PP-OCRv4 | | 62.77% | PaddleOCR Model List |
SVTRv2 (Rec Server) | configs/rec/svtrv2/svtrv2_ch.yml | 68.81% | Google Drive, Github Released |
RepSVTR (Mobile) | Rec: configs/rec/svtrv2/repsvtr_ch.yml; Det: configs/det/dbnet/repvit_db.yml | 67.22% | Rec: Google Drive, Github Released; Det: Google Drive, Github Released |
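The improvement figures can be read directly off the table. The short sketch below recomputes them, assuming the quoted 4.5% is the absolute difference in E2E Metric (percentage points) between RepSVTR (Mobile) and PP-OCRv4; that reading is an inference from the numbers, not something the text states explicitly.

```python
# E2E Metric values copied from the table above (in percent).
e2e_metric = {
    "PP-OCRv4": 62.77,
    "SVTRv2 (Rec Server)": 68.81,
    "RepSVTR (Mobile)": 67.22,
}

baseline = e2e_metric["PP-OCRv4"]

# Absolute gain over the baseline, in percentage points.
gains = {
    name: round(score - baseline, 2)
    for name, score in e2e_metric.items()
    if name != "PP-OCRv4"
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} points over PP-OCRv4")
```

Under this reading, the mobile model gains 4.45 points, which matches the roughly 4.5% quoted above, and the server recognition model gains 6.04 points.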
- PyTorch version >= 1.13.0
- Python version >= 3.7
conda create -n openocr python==3.8
conda activate openocr
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
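Before going further, it can help to confirm that the active environment meets the stated minimums (Python >= 3.7, PyTorch >= 1.13.0). The following is a small sketch for that check; `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names, not part of OpenOCR.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '2.2.0+cu118' into a tuple like (2, 2, 0)."""
    # Drop any local build suffix (after '+'), then keep the first three numbers.
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])]
    return tuple((numbers + [0, 0, 0])[:3])


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (a tuple of ints)."""
    return parse_version(version) >= tuple(minimum)


def check_environment():
    """Return (python_ok, torch_ok) for the requirements listed above."""
    python_ok = sys.version_info[:2] >= (3, 7)
    try:
        import torch
        torch_ok = meets_minimum(torch.__version__, (1, 13, 0))
    except ImportError:
        # PyTorch is not installed in this environment.
        torch_ok = False
    return python_ok, torch_ok


if __name__ == "__main__":
    print(check_environment())
```

With the conda commands above, both values should be `True`.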
After installing the dependencies, choose either of the following two installation methods.
pip install openocr-python
Usage:
from openocr import OpenOCR
engine = OpenOCR()
img_path = '/path/img_path'  # or '/path/img_file'
result, elapse = engine(img_path)
# Server mode
# engine = OpenOCR(mode='server')
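The usage above accepts either a folder or a single image file. If you prefer to enumerate the images yourself, for example to log or filter them before recognition, a sketch like the following could be used. `list_images` and the suffix set are assumptions made for illustration; they are not part of the `openocr` package.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(path):
    """Return a sorted list of image files for a file or a folder path."""
    p = Path(path)
    if p.is_file():
        return [p]
    if p.is_dir():
        return sorted(
            f for f in p.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {p}")
```

Each returned path can then be passed to the engine one at a time, following the call shown above: `result, elapse = engine(str(image))`.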
git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install -r requirements.txt
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_repsvtr_ch.pth
# Rec Server model
# wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_svtrv2_ch.pth
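On systems without `wget`, the same checkpoints can be fetched from Python. This is a minimal sketch using only the standard library; the URLs are the ones listed above, while `release_url` and `fetch_weights` are hypothetical helper names introduced here. Files that already exist are skipped.

```python
from pathlib import Path
from urllib.request import urlretrieve

RELEASE_BASE = "https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1"
# Mobile detection and recognition weights, as in the wget commands above.
DEFAULT_WEIGHTS = ("openocr_det_repvit_ch.pth", "openocr_repsvtr_ch.pth")


def release_url(filename):
    """Build the download URL for a file in the release listed above."""
    return f"{RELEASE_BASE}/{filename}"


def fetch_weights(filenames=DEFAULT_WEIGHTS, target_dir=".", fetch=urlretrieve):
    """Download each file into target_dir unless it is already present."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in filenames:
        destination = target / name
        if not destination.exists():
            # `fetch` defaults to urllib's urlretrieve(url, filename).
            fetch(release_url(name), str(destination))
        paths.append(destination)
    return paths
```

Calling `fetch_weights()` from the `OpenOCR` directory mirrors the two `wget` commands; pass `("openocr_svtrv2_ch.pth",)` to fetch the Rec Server model instead.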
Usage:
# OpenOCR system: Det + Rec model
python tools/infer_e2e.py --img_path=/path/img_fold  # or /path/img_file
# Det model
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold  # or /path/img_file
# Rec model
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/img_fold  # or /path/img_file
pip install gradio==4.20.0
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/OCR_e2e_img.tar
tar xf OCR_e2e_img.tar
# start demo
python demo_gradio.py
In the examples provided, OpenOCR's detection model generates bounding boxes that are generally more complete and better aligned with the boundaries of text instances than those of PP-OCRv4. In addition, OpenOCR is better at distinguishing separate text instances, avoiding errors such as merging two distinct text instances into one or splitting a single instance into multiple parts. This suggests better handling of semantic completeness and spatial layout, which makes it particularly effective for complex layouts.
OpenOCR's recognition model generalizes better than PP-OCRv4. It performs especially well at recognizing text under difficult conditions, such as:
- Artistic or stylized fonts.
- Handwritten text.
- Blurry or low-resolution images.
- Incomplete or occluded text.
Notably, the OpenOCR mobile recognition model delivers results comparable to the larger and more resource-intensive PP-OCRv4 server model. This highlights OpenOCR's efficiency and accuracy, making it a versatile solution across different hardware platforms.
As shown in the Det + Rec System results, OpenOCR performs well in practical scenarios, including documents, tables, invoices, and similar contexts. This underscores its potential as a general-purpose OCR system that adapts to diverse use cases with high accuracy and reliability.
If you find our method useful for your research, please cite:
@article{Du2024SVTRv2,
title={SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition},
author={Yongkun Du and Zhineng Chen and Hongtao Xie and Caiyan Jia and Yu-Gang Jiang},
journal={CoRR},
volume={abs/2411.15858},
eprinttype={arXiv},
year={2024},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.15858}
}