This is a fork of the official repo for the papers:
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting
2024.04.04
Forked from the main repo. This fork may not reflect changes made to the main repo after this date.
2023.06.02
Updated the pre-trained and fine-tuned Chinese scene text spotting models (78.3% 1-NED on ICDAR 2019 ReCTS).
2023.05.31
The extension paper (DeepSolo++) has been submitted to arXiv. The code and models will be released soon.
2023.02.28
DeepSolo has been accepted to CVPR 2023. 🎉🎉
Relevant Projects:
✨ Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation | Code
✨ GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching | Code
DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer | Code
Other applications of ViTAE include: ViTPose | Remote Sensing | Matting | VSA | Video Object Segmentation
Total-Text
Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
---|---|---|---|---|---|---|---|
Res-50 | Synth150K | 93.9 | 82.1 | 87.6 | 78.8 | 86.2 | OneDrive |
Res-50 | Synth150K+MLT17+IC13+IC15 | 93.1 | 82.1 | 87.3 | 79.7 | 87.0 | OneDrive |
Res-50 | Synth150K+MLT17+IC13+IC15+TextOCR | 93.2 | 84.6 | 88.7 |  |  | OneDrive |
Res-101 | Synth150K+MLT17+IC13+IC15 | 93.2 | 83.5 | 88.1 | 80.1 | 87.1 | OneDrive |
Swin-T | Synth150K+MLT17+IC13+IC15 | 92.8 | 83.5 | 87.9 | 79.7 | 87.1 | OneDrive |
Swin-S | Synth150K+MLT17+IC13+IC15 | 93.7 | 84.2 | 88.7 | 81.3 | 87.8 | OneDrive |
ViTAEv2-S | Synth150K+MLT17+IC13+IC15 | 92.6 | 85.5 |  | 81.8 | 88.4 | OneDrive |
ViTAEv2-S | Synth150K+MLT17+IC13+IC15+TextOCR | 92.9 | 87.4 | 90.0 | 83.6 | 89.6 | OneDrive |
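The Det-F1 column is the harmonic mean of Det-P and Det-R: F1 = 2PR / (P + R). The minimal sketch below recomputes Det-F1 for a few rows of the Total-Text table. The helper name `f1_score` is our own and is not part of this repo. The published precision and recall are rounded to one decimal place, so a recomputed value can differ from the reported one by about 0.1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (row label, Det-P, Det-R, reported Det-F1), copied from the Total-Text table above.
rows = [
    ("Res-50 / Synth150K", 93.9, 82.1, 87.6),
    ("Res-101", 93.2, 83.5, 88.1),
    ("Swin-S", 93.7, 84.2, 88.7),
]

for name, p, r, reported in rows:
    computed = f1_score(p, r)
    # Inputs are rounded to one decimal, so allow a small tolerance.
    status = "ok" if abs(computed - reported) <= 0.1 else "check"
    print(f"{name}: computed {computed:.1f}, reported {reported} ({status})")
```

The same check works for the other tables. Det-H in the ReCTS table is the same harmonic mean.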
ICDAR 2015 (IC15)
Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-S | E2E-W | E2E-G | Weights |
---|---|---|---|---|---|---|---|---|
Res-50 | Synth150K+Total-Text+MLT17+IC13 | 92.8 | 87.4 | 90.0 | 86.8 | 81.9 | 76.9 | OneDrive |
Res-50 | Synth150K+Total-Text+MLT17+IC13+TextOCR | 92.5 | 87.2 | 89.8 |  |  |  | OneDrive |
ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13 | 93.7 | 87.3 | 90.4 | 87.5 | 82.8 | 77.7 | OneDrive |
ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13+TextOCR | 92.4 | 87.9 |  | 88.1 | 83.9 | 79.5 | OneDrive |
CTW1500
Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
---|---|---|---|---|---|---|---|
Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15 | 93.2 | 85.0 | 88.9 | 64.2 | 81.4 | OneDrive |
ICDAR 2019 ReCTS
Backbone | External Data | Det-P | Det-R | Det-H | 1-NED | Weights |
---|---|---|---|---|---|---|
Res-50 | SynChinese130K+ArT+LSVT | 92.6 | 89.0 | 90.7 | 78.3 | OneDrive |
ViTAEv2-S | SynChinese130K+ArT+LSVT | 92.6 | 89.9 | 91.2 | 79.6 | OneDrive |
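The 1-NED column reports one minus the normalized edit distance (NED) between recognized and ground-truth transcriptions. Below is a minimal sketch. It assumes the usual ICDAR 2019 ReCTS definition: each pair's Levenshtein distance is divided by the length of the longer string, then averaged over all pairs. It also assumes predictions are already matched to ground-truth regions, with unmatched detections or missed words paired with an empty string. The official evaluation script handles that matching step. The helper names `edit_distance` and `one_minus_ned` are hypothetical and are not part of this repo.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def one_minus_ned(predictions: list[str], ground_truths: list[str]) -> float:
    """1 - mean normalized edit distance over already-matched (prediction, GT) pairs.

    Assumption: unmatched detections or missed words are passed in paired with "".
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must be paired one-to-one")
    if not predictions:
        return 1.0
    total = 0.0
    for pred, gt in zip(predictions, ground_truths):
        denom = max(len(pred), len(gt))
        # Two empty strings count as a perfect match (distance 0).
        total += edit_distance(pred, gt) / denom if denom else 0.0
    return 1.0 - total / len(predictions)
```

For example, a missed ground-truth word paired with `""` contributes the maximum per-pair distance of 1. A single wrong character in a two-character word contributes 0.5.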
git clone https://github.com/maps-as-data/DeepSolo.git
cd DeepSolo
pip install -v .
If you find DeepSolo helpful, please consider giving this repo a star :star: and citing:
@inproceedings{ye2023deepsolo,
title={DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting},
author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={19348--19357},
year={2023}
}
@article{ye2023deepsolo++,
title={DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting},
author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
journal={arXiv preprint arXiv:2305.19957},
year={2023}
}
This project is based on AdelaiDet. For academic use, this project is licensed under the 2-clause BSD License.