We aim to establish a unified benchmark for training and evaluating models in scene text detection and recognition. Building on this benchmark, we introduce OpenOCR, a general OCR system that balances accuracy and efficiency. This repository also serves as the official codebase of the OCR team at the FVL Laboratory, Fudan University.
We sincerely welcome researchers to recommend OCR or related algorithms and to point out any potential factual errors or bugs. Upon receiving suggestions, we will promptly evaluate and carefully reproduce them. We look forward to collaborating with you to advance the development of OpenOCR and to continuously contribute to the OCR community!
- 🔥 OpenOCR: A general OCR system with accuracy and efficiency
- ⚡ [Quick Start] [Model] [ModelScope Demo] [Hugging Face Demo] [Local Demo] [PaddleOCR Implementation]
- Introduction
- A practical OCR system built on SVTRv2.
- Outperforms the PP-OCRv4 baseline by 4.5% in accuracy on the OCR competition leaderboard, while maintaining similar inference speed.
- Supports Chinese and English text detection and recognition.
- Provides both server and mobile models.
- Supports fine-tuning OpenOCR on a custom dataset.
- ONNX model export for wider compatibility (see the export sketch in Quick Start below).
- 🔥 SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
- [Paper] [Doc] [Model] [Datasets] [Config, Training and Inference] [Benchmark]
- Introduction
- A unified training and evaluation benchmark (on top of Union14M) for Scene Text Recognition
- Supports 24 Scene Text Recognition methods trained from scratch on the large-scale real dataset Union14M-L-Filter; the latest methods will be continually added.
- Improves accuracy by 20-30% compared to models trained on synthetic datasets.
- Towards arbitrary-shaped text recognition and language modeling with a single visual model.
- Surpasses attention-based encoder-decoder methods across challenging scenarios in terms of both accuracy and speed.
- Get Started with training a SOTA Scene Text Recognition model from scratch.
- DPTR (Shuai Zhao, Yongkun Du, Zhineng Chen*, Yu-Gang Jiang. Decoder Pre-Training with only Text for Scene Text Recognition, ACM MM 2024. Paper)
- IGTR (Yongkun Du, Zhineng Chen*, Yuchen Su, Caiyan Jia, Yu-Gang Jiang. Instruction-Guided Scene Text Recognition, Under TPAMI minor revision 2024. Doc, Paper)
- SVTRv2 (Yongkun Du, Zhineng Chen*, Hongtao Xie, Caiyan Jia, Yu-Gang Jiang. SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition, 2024. Doc, Paper)
- SMTR&FocalSVTR (Yongkun Du, Zhineng Chen*, Caiyan Jia, Xieping Gao, Yu-Gang Jiang. Out of Length Text Recognition with Sub-String Matching, 2024. Doc, Paper)
- CDistNet (Tianlun Zheng, Zhineng Chen*, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang. CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition, IJCV 2024. Paper)
- MRN (Tianlun Zheng, Zhineng Chen*, Bingchen Huang, Wei Zhang, Yu-Gang Jiang. MRN: Multiplexed routing network for incremental multilingual text recognition, ICCV 2023. Paper, Code)
- TPS++ (Tianlun Zheng, Zhineng Chen*, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang. TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition, IJCAI 2023. Paper, Code)
- CPPD (Yongkun Du, Zhineng Chen*, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang. Context Perception Parallel Decoder for Scene Text Recognition, Under TPAMI minor revision 2023. PaddleOCR Doc, Paper)
- SVTR (Yongkun Du, Zhineng Chen*, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang. SVTR: Scene Text Recognition with a Single Visual Model, IJCAI 2022 (Long). PaddleOCR Doc, Paper)
- NRTR (Fenfen Sheng, Zhineng Chen*, Bo Xu. NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition, ICDAR 2019. Paper)
- 🔥 2024.11.23 release notes:
- OpenOCR: A general OCR system with accuracy and efficiency
- SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
- PyTorch version >= 1.13.0
- Python version >= 3.7
```bash
conda create -n openocr python==3.8
conda activate openocr
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```
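After creating the environment, a quick way to confirm that PyTorch and CUDA are wired up correctly (a minimal sanity check, not part of the repository's tooling):

```python
# Minimal environment sanity check (not part of OpenOCR itself).
import torch

print(torch.__version__)           # expect 2.2.0 with the install above
print(torch.cuda.is_available())   # expect True on a CUDA 11.8-capable machine
```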
After installing the dependencies, choose one of the following two installation methods.
```bash
pip install openocr-python
```
Usage:
```python
from openocr import OpenOCR

engine = OpenOCR()

# Path to a single image file or a directory of images
img_path = '/path/img_path or /path/img_file'
result, elapse = engine(img_path)

# Server mode
# engine = OpenOCR(mode='server')
```
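The exact structure of `result` is not documented here; the sketch below assumes the engine returns one entry per processed image alongside the elapsed time, which should be verified against the actual output:

```python
# Sketch of consuming the engine output, assuming a (results, elapsed-time)
# pair as in the snippet above; the per-image structure is an assumption.
from openocr import OpenOCR

engine = OpenOCR()
result, elapse = engine('/path/img_path or /path/img_file')

print(f'elapsed: {elapse}')
for item in result:
    print(item)  # assumed: one result entry per processed image
```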
```bash
git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install -r requirements.txt

# Det model
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
# Rec Mobile model
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_repsvtr_ch.pth
# Rec Server model
# wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_svtrv2_ch.pth
```
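A downloaded checkpoint can be sanity-checked before use by loading it on CPU and listing a few parameter tensors (a minimal sketch; whether the weights sit at the top level or under a key such as `state_dict` is an assumption to verify):

```python
# Inspect a downloaded checkpoint without building the model.
import torch

ckpt = torch.load('openocr_repsvtr_ch.pth', map_location='cpu')
# The nesting under 'state_dict' is an assumption; adjust to the actual layout.
state = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
print(f'{len(state)} entries')
for name, value in list(state.items())[:5]:
    shape = tuple(value.shape) if hasattr(value, 'shape') else type(value).__name__
    print(name, shape)
```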
Usage:
```bash
# OpenOCR system: Det + Rec model
python tools/infer_e2e.py --img_path=/path/img_fold or /path/img_file

# Det model
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold or /path/img_file

# Rec model
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/img_fold or /path/img_file
```
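For the ONNX export mentioned in the feature list, the simplest route is PyTorch's built-in exporter. The sketch below uses a stand-in module because the exact model-loading API is not shown here; the input shape, file name, and dynamic axes are illustrative assumptions, and the repository may ship its own export tool:

```python
# Minimal ONNX export sketch with vanilla PyTorch. The stand-in module below
# replaces a trained OpenOCR recognition model, which would be loaded from a
# checkpoint in practice (hypothetical).
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3, padding=1))  # stand-in
model.eval()

dummy_input = torch.randn(1, 3, 48, 320)  # assumed (N, C, H, W) text-line input
torch.onnx.export(
    model,
    dummy_input,
    'openocr_rec.onnx',
    input_names=['image'],
    output_names=['logits'],
    dynamic_axes={'image': {0: 'batch', 3: 'width'}},  # variable batch/width
)
```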
```bash
pip install gradio==4.20.0
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/OCR_e2e_img.tar
tar xf OCR_e2e_img.tar
# start demo
python demo_gradio.py
```
| Method | Venue | Training | Evaluation | Contributor |
| --- | --- | --- | --- | --- |
| CRNN | TPAMI 2016 | ✅ | ✅ | |
| ASTER | TPAMI 2019 | ✅ | ✅ | pretto0 |
| NRTR | ICDAR 2019 | ✅ | ✅ | |
| SAR | AAAI 2019 | ✅ | ✅ | pretto0 |
| MORAN | PR 2019 | ✅ | ✅ | Debug |
| DAN | AAAI 2020 | ✅ | ✅ | |
| RobustScanner | ECCV 2020 | ✅ | ✅ | pretto0 |
| AutoSTR | ECCV 2020 | ✅ | ✅ | |
| SRN | CVPR 2020 | ✅ | ✅ | pretto0 |
| SEED | CVPR 2020 | ✅ | ✅ | |
| ABINet | CVPR 2021 | ✅ | ✅ | YesianRohn |
| VisionLAN | ICCV 2021 | ✅ | ✅ | YesianRohn |
| SVTR | IJCAI 2022 | ✅ | ✅ | |
| PARSeq | ECCV 2022 | ✅ | ✅ | |
| MATRN | ECCV 2022 | ✅ | ✅ | |
| MGP-STR | ECCV 2022 | ✅ | ✅ | |
| CPPD | 2023 | ✅ | ✅ | |
| LPV | IJCAI 2023 | ✅ | ✅ | |
| MAERec (Union14M) | ICCV 2023 | ✅ | ✅ | |
| LISTER | ICCV 2023 | ✅ | ✅ | |
| CDistNet | IJCV 2024 | ✅ | ✅ | YesianRohn |
| BUSNet | AAAI 2024 | ✅ | ✅ | |
| DCTC | AAAI 2024 | TODO | | |
| CAM | PR 2024 | ✅ | ✅ | |
| OTE | CVPR 2024 | ✅ | ✅ | |
| CFF | IJCAI 2024 | TODO | | |
| DPTR | ACM MM 2024 | TODO | | |
| VIPTR | ACM CIKM 2024 | TODO | | |
| IGTR | 2024 | ✅ | ✅ | |
| SMTR | 2024 | ✅ | ✅ | |
| FocalSVTR-CTC | 2024 | ✅ | ✅ | |
| SVTRv2 | 2024 | ✅ | ✅ | |
| ResNet+Trans-CTC | | ✅ | ✅ | |
| ViT-CTC | | ✅ | ✅ | |
Yiming Lei (pretto0) and Xingsong Ye (YesianRohn) from the FVL Laboratory, Fudan University, completed the majority of the algorithm reproduction work under the guidance of Dr. Zhineng Chen. We are grateful for their outstanding contributions.
TODO
TODO
If you find our method useful for your research, please cite:
```bibtex
@article{Du2024SVTRv2,
  title={SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition},
  author={Yongkun Du and Zhineng Chen and Hongtao Xie and Caiyan Jia and Yu-Gang Jiang},
  journal={CoRR},
  volume={abs/2411.15858},
  eprinttype={arXiv},
  year={2024},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.15858}
}
```
This codebase is built on PaddleOCR, PytorchOCR, and MMOCR. Thanks for their awesome work!