Skip to content

Repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting"

License

Notifications You must be signed in to change notification settings

maps-as-data/DeepSolo

 
 

Repository files navigation

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

News | Main Results | Installation | Citation | Acknowledgement

This is a fork of the official repo for the papers:

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

image

News

2024.04.04 Repo forked from main repo. This fork may not reflect changes in the main repo from this point.

2023.06.2 Update the pre-trained and fine-tuned Chinese scene text spotting model (78.3% 1-NED on ICDAR 2019 ReCTS).

2023.05.31 The extension paper (DeepSolo++) is submitted to ArXiv. The code and models will be released soon.

2023.02.28 DeepSolo is accepted by CVPR 2023. 🎉🎉


Relevant Project:

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation | Code

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching | Code

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer | Code

Other applications of ViTAE inlcude: ViTPose | Remote Sensing | Matting | VSA | Video Object Segmentation

Main Results

Total-Text

Backbone External Data Det-P Det-R Det-F1 E2E-None E2E-Full Weights
Res-50 Synth150K 93.9 82.1 87.6 78.8 86.2 OneDrive
Res-50 Synth150K+MLT17+IC13+IC15 93.1 82.1 87.3 79.7 87.0 OneDrive
Res-50 Synth150K+MLT17+IC13+IC15+TextOCR 93.2 84.6 88.7 $\underline{\text{82.5}}$ $\underline{\text{88.7}}$ OneDrive
Res-101 Synth150K+MLT17+IC13+IC15 93.2 83.5 88.1 80.1 87.1 OneDrive
Swin-T Synth150K+MLT17+IC13+IC15 92.8 83.5 87.9 79.7 87.1 OneDrive
Swin-S Synth150K+MLT17+IC13 +C15 93.7 84.2 88.7 81.3 87.8 OneDrive
ViTAEv2-S Synth150K+MLT17+IC13+IC15 92.6 85.5 $\underline{\text{88.9}}$ 81.8 88.4 OneDrive
ViTAEv2-S Synth150K+MLT17+IC13+IC15+TextOCR 92.9 87.4 90.0 83.6 89.6 OneDrive

ICDAR 2015 (IC15)

Backbone External Data Det-P Det-R Det-F1 E2E-S E2E-W E2E-G Weights
Res-50 Synth150K+Total-Text+MLT17+IC13 92.8 87.4 90.0 86.8 81.9 76.9 OneDrive
Res-50 Synth150K+Total-Text+MLT17+IC13+TextOCR 92.5 87.2 89.8 $\underline{\text{88.0}}$ $\underline{\text{83.5}}$ $\underline{\text{79.1}}$ OneDrive
ViTAEv2-S Synth150K+Total-Text+MLT17+IC13 93.7 87.3 90.4 87.5 82.8 77.7 OneDrive
ViTAEv2-S Synth150K+Total-Text+MLT17+IC13+TextOCR 92.4 87.9 $\underline{\text{90.1}}$ 88.1 83.9 79.5 OneDrive

CTW1500

Backbone External Data Det-P Det-R Det-F1 E2E-None E2E-Full Weights
Res-50 Synth150K+Total-Text+MLT17+IC13+IC15 93.2 85.0 88.9 64.2 81.4 OneDrive

ICDAR 2019 ReCTS

Backbone External Data Det-P Det-R Det-H 1-NED Weights
Res-50 SynChinese130K+ArT+LSVT 92.6 89.0 90.7 78.3 OneDrive
ViTAEv2-S SynChinese130K+ArT+LSVT 92.6 89.9 91.2 79.6 OneDrive

Installation

git clone https://github.com/maps-as-data/DeepSolo.git
cd DeepSolo
pip install -v .

Citation

If you find DeepSolo helpful, please consider giving this repo a star:star: and citing:

@inproceedings{ye2023deepsolo,
  title={DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting},
  author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19348--19357},
  year={2023}
}

@article{ye2023deepsolo++,
  title={DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting},
  author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
  booktitle={arxiv preprint arXiv:2305.19957},
  year={2023}
}

Acknowledgement

This project is based on Adelaidet. For academic use, this project is licensed under the 2-clause BSD License.

About

Repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 82.9%
  • Cuda 15.4%
  • C++ 1.7%