This code repository contains an implementation of SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition (AAAI 2021). Chromatic difficulties in complex scenes have received little attention. We introduce a new learnable geometric-unrelated module, the Structure-Preserving Inner Offset Network (SPIN), which allows color manipulation of the source data within the network.
Dataset | Samples | Description | Release |
---|---|---|---|
MJSynth | 8919257 | Synthetic dataset for scene text recognition | Link |
SynText | 7266164 | Synthetic scene text dataset; the text instances are cropped from larger images | Link |
Test Set | Instance Number | Note |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC03_860 | 860 | regular |
IC13_857 | 857 | regular |
IC15_1811 | 1811 | irregular |
SVTP | 645 | irregular |
CUTE80 | 288 | irregular |
For a quick start, use the LMDB-formatted datasets above, which contain the full benchmarks for scene text recognition tasks, organized as follows.
Data Type: LMDB
File storage format:
|-- train
| |-- MJ
| |-- ST
|-- validation
| |-- mixture
|-- evaluation
| |-- mixture
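Before training, it can help to confirm that the data directory matches the layout above. The following is a minimal sketch using only the Python standard library; the helper name `missing_dirs` and the `root` argument are illustrative assumptions and are not part of this repository.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, per the layout above.
EXPECTED_DIRS = (
    "train/MJ",
    "train/ST",
    "validation/mixture",
    "evaluation/mixture",
)


def missing_dirs(root):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

Calling `missing_dirs("/path/to/data")` (a placeholder path) returns an empty list when all four directories exist, and otherwise lists the ones that still need to be downloaded or moved into place.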
Run the following command from the command line:
cd .
bash ./train.sh
Online validation is provided. To disable it and save training time, add the following option to the startup script:
--no-validate
Run the following scripts to compare the different rectification modules.
cd .
bash ./test_scripts/test_affine.sh
cd .
bash ./test_scripts/test_tps.sh
cd .
bash ./test_scripts/test_spin.sh
cd .
bash ./test_scripts/test_gaspin.sh
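To evaluate all four rectification modules in one pass, the commands above can be driven from a small script. This is only a sketch: the helper names `build_commands` and `run_all` are hypothetical, and it assumes the script is launched from the same directory in which the commands above are run.

```python
import subprocess

# Rectification modules compared above; each has a matching test script.
MODULES = ("affine", "tps", "spin", "gaspin")


def build_commands(modules=MODULES):
    """Build one `bash ./test_scripts/test_<name>.sh` command per module."""
    return [["bash", f"./test_scripts/test_{name}.sh"] for name in modules]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands())` runs the four scripts sequentially; passing a subset such as `build_commands(("spin", "gaspin"))` limits the comparison to those modules.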
IIIT5K, SVT, IC03, and IC13 are regular text benchmarks; IC15, SVTP, and CUTE80 are irregular text benchmarks. Rows marked (Report) give the reported results for comparison.
Method | IIIT5K | SVT | IC03 | IC13 | IC15 | SVTP | CUTE80 | Config | Model |
---|---|---|---|---|---|---|---|---|---|
Affine(Report) | - | - | - | - | - | - | - | - | - |
Affine | 94.0 | 87.9 | 93.6 | 94.4 | 81.2 | 80.9 | 83.0 | | pth [Link] (Access Code: 5nr1) |
TPS(Report) | 87.9 | 87.5 | 94.9 | 93.6 | 77.6 | 79.2 | 74.0 | - | - |
TPS | 94.2 | 90.4 | 94.5 | 95.0 | 82.1 | 82.6 | 83.7 | | pth [Link] (Access Code: 024F) |
SPIN(Report) | 94.7 | 87.6 | 93.4 | 91.5 | 79.1 | 79.7 | 85.1 | - | - |
SPIN | 94.7 | 89.8 | 93.4 | 94.1 | 80.7 | 82.2 | 83.7 | | pth [Link] (Access Code: M45z) |
GA-SPIN(Report) | 94.7 | 90.3 | 94.4 | 92.8 | 82.2 | 82.8 | 87.5 | - | - |
GA-SPIN | 94.6 | 89.0 | 93.3 | 94.2 | 80.7 | 83.0 | 84.7 | | pth [Link] (Access Code: 12q3) |
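Because the benchmarks differ widely in size, a single summary number per method is sometimes easier to compare than seven columns. The sketch below computes an instance-weighted average for the released models. It assumes the table values are per-benchmark accuracies in percent and that the IC03, IC13, and IC15 columns correspond to the IC03_860, IC13_857, and IC15_1811 test sets listed earlier; the function name `weighted_accuracy` is illustrative.

```python
# Instance counts per benchmark, from the test set table earlier in this README.
COUNTS = {
    "IIIT5K": 3000, "SVT": 647, "IC03": 860, "IC13": 857,
    "IC15": 1811, "SVTP": 645, "CUTE80": 288,
}

# Scores of the released (non-"Report") models, copied from the results table,
# in the same benchmark order as COUNTS.
RESULTS = {
    "Affine": [94.0, 87.9, 93.6, 94.4, 81.2, 80.9, 83.0],
    "TPS": [94.2, 90.4, 94.5, 95.0, 82.1, 82.6, 83.7],
    "SPIN": [94.7, 89.8, 93.4, 94.1, 80.7, 82.2, 83.7],
    "GA-SPIN": [94.6, 89.0, 93.3, 94.2, 80.7, 83.0, 84.7],
}


def weighted_accuracy(accuracies, counts=COUNTS):
    """Average per-benchmark scores, weighting each by its instance count."""
    sizes = list(counts.values())
    if len(accuracies) != len(sizes):
        raise ValueError("expected one accuracy per benchmark")
    return sum(a * n for a, n in zip(accuracies, sizes)) / sum(sizes)


for name, accs in RESULTS.items():
    print(f"{name}: {weighted_accuracy(accs):.2f}")
```

Weighting by instance count means IIIT5K and IC15 dominate the summary, so the per-benchmark columns remain the better guide when irregular text is the main concern.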
Here is the picture for result visualization.
@article{SPIN,
title={SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition},
author={Chengwei Zhang and Yunlu Xu and Zhanzhan Cheng and Shiliang Pu and Yi Niu and Fei Wu},
journal={AAAI},
year={2021}
}
This project is released under the Apache 2.0 license.
If you have any suggestions or problems, please feel free to contact the author at [email protected].