DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

News | Main Results | Installation | Citation | Acknowledgement

This is a fork of the official repo for the papers:

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

News

2024.04.04 Repo forked from main repo. This fork may not reflect changes in the main repo from this point.

2023.06.02 Updated the pre-trained and fine-tuned Chinese scene text spotting model (78.3% 1-NED on ICDAR 2019 ReCTS).

2023.05.31 The extension paper (DeepSolo++) is submitted to arXiv. The code and models will be released soon.

2023.02.28 DeepSolo is accepted by CVPR 2023. 🎉🎉

Relevant Projects:

✨ Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation | Code

✨ GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching | Code

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer | Code

Other applications of ViTAE include: ViTPose | Remote Sensing | Matting | VSA | Video Object Segmentation

Main Results

Total-Text

Backbone	External Data	Det-P	Det-R	Det-F1	E2E-None	E2E-Full	Weights
Res-50	Synth150K	93.9	82.1	87.6	78.8	86.2	OneDrive
Res-50	Synth150K+MLT17+IC13+IC15	93.1	82.1	87.3	79.7	87.0	OneDrive
Res-50	Synth150K+MLT17+IC13+IC15+TextOCR	93.2	84.6	88.7	$\underline{\text{82.5}}$	$\underline{\text{88.7}}$	OneDrive
Res-101	Synth150K+MLT17+IC13+IC15	93.2	83.5	88.1	80.1	87.1	OneDrive
Swin-T	Synth150K+MLT17+IC13+IC15	92.8	83.5	87.9	79.7	87.1	OneDrive
Swin-S	Synth150K+MLT17+IC13+IC15	93.7	84.2	88.7	81.3	87.8	OneDrive
ViTAEv2-S	Synth150K+MLT17+IC13+IC15	92.6	85.5	$\underline{\text{88.9}}$	81.8	88.4	OneDrive
ViTAEv2-S	Synth150K+MLT17+IC13+IC15+TextOCR	92.9	87.4	90.0	83.6	89.6	OneDrive
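
For readers unfamiliar with the detection columns: Det-F1 is conventionally the harmonic mean of Det-P (precision) and Det-R (recall). The snippet below is a minimal sketch under that standard definition; the function name `f_measure` is illustrative and is not part of this repo. Because the table reports rounded precision and recall, recomputing other rows this way may differ from the reported F1 in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First Total-Text row above: Det-P 93.9, Det-R 82.1
print(round(f_measure(93.9, 82.1), 1))  # 87.6
```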

ICDAR 2015 (IC15)

Backbone	External Data	Det-P	Det-R	Det-F1	E2E-S	E2E-W	E2E-G	Weights
Res-50	Synth150K+Total-Text+MLT17+IC13	92.8	87.4	90.0	86.8	81.9	76.9	OneDrive
Res-50	Synth150K+Total-Text+MLT17+IC13+TextOCR	92.5	87.2	89.8	$\underline{\text{88.0}}$	$\underline{\text{83.5}}$	$\underline{\text{79.1}}$	OneDrive
ViTAEv2-S	Synth150K+Total-Text+MLT17+IC13	93.7	87.3	90.4	87.5	82.8	77.7	OneDrive
ViTAEv2-S	Synth150K+Total-Text+MLT17+IC13+TextOCR	92.4	87.9	$\underline{\text{90.1}}$	88.1	83.9	79.5	OneDrive

CTW1500

Backbone	External Data	Det-P	Det-R	Det-F1	E2E-None	E2E-Full	Weights
Res-50	Synth150K+Total-Text+MLT17+IC13+IC15	93.2	85.0	88.9	64.2	81.4	OneDrive

ICDAR 2019 ReCTS

Backbone	External Data	Det-P	Det-R	Det-H	1-NED	Weights
Res-50	SynChinese130K+ArT+LSVT	92.6	89.0	90.7	78.3	OneDrive
ViTAEv2-S	SynChinese130K+ArT+LSVT	92.6	89.9	91.2	79.6	OneDrive
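
The 1-NED column is usually understood as one minus the average normalized edit distance between predicted and ground-truth strings. The sketch below assumes that common definition and assumes predictions are already paired with their ground truths; the official ReCTS evaluation also handles detection matching, which is not shown. The names `edit_distance` and `one_minus_ned` are illustrative and are not taken from this repo.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def one_minus_ned(pairs):
    """1 - mean(edit_distance / max length) over (prediction, ground truth) pairs."""
    if not pairs:
        return 0.0
    total = 0.0
    for pred, gt in pairs:
        longest = max(len(pred), len(gt))
        if longest:  # two empty strings contribute zero distance
            total += edit_distance(pred, gt) / longest
    return 1.0 - total / len(pairs)


# One pair with a single wrong character out of four, one exact match
print(one_minus_ned([("abcd", "abcf"), ("ab", "ab")]))  # 0.875
```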

Installation

git clone https://github.com/maps-as-data/DeepSolo.git
cd DeepSolo
pip install -v .
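
After installing, a quick import check can confirm the package is visible to the active Python environment. This is a minimal sketch that assumes the import name is `deepsolo` (inferred from the `src/deepsolo` source directory, not confirmed by this README); the helper `is_installed` is illustrative.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "deepsolo" is an assumed import name; adjust it if the package installs differently.
print("deepsolo importable:", is_installed("deepsolo"))
```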

Citation

If you find DeepSolo helpful, please consider giving this repo a star ⭐ and citing:

@inproceedings{ye2023deepsolo,
  title={DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting},
  author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19348--19357},
  year={2023}
}

@article{ye2023deepsolo++,
  title={DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting},
  author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
  journal={arXiv preprint arXiv:2305.19957},
  year={2023}
}

Acknowledgement

This project is based on AdelaiDet. For academic use, this project is licensed under the 2-clause BSD License.
