Text Recognition Models¶
An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition¶
Introduction¶
[ALGORITHM]
@article{shi2016end,
title={An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition},
author={Shi, Baoguang and Bai, Xiang and Yao, Cong},
journal={IEEE transactions on pattern analysis and machine intelligence},
year={2016}
}
Results and Models¶
Train Dataset¶
trainset | instance_num | repeat_num | note |
---|---|---|---|
Syn90k | 8919273 | 1 | synth |
Test Dataset¶
testset | instance_num | note |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
IC15 | 2077 | irregular |
SVTP | 645 | irregular |
CT80 | 288 | irregular |
Results and models¶
methods | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|
CRNN | 80.5 | 81.5 | 86.5 | 54.1 | 59.1 | 55.6 | model / log |
NRTR¶
Introduction¶
[ALGORITHM]
@inproceedings{sheng2019nrtr,
title={NRTR: A no-recurrence sequence-to-sequence model for scene text recognition},
author={Sheng, Fenfen and Chen, Zhineng and Xu, Bo},
booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},
pages={781--786},
year={2019},
organization={IEEE}
}
[BACKBONE]
@inproceedings{li2019show,
title={Show, attend and read: A simple and strong baseline for irregular text recognition},
author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
number={01},
pages={8610--8617},
year={2019}
}
Dataset¶
Train Dataset¶
trainset | instance_num | repeat_num | source |
---|---|---|---|
SynthText | 7266686 | 1 | synth |
Syn90k | 8919273 | 1 | synth |
Test Dataset¶
testset | instance_num | type |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
IC15 | 2077 | irregular |
SVTP | 645 | irregular |
CT80 | 288 | irregular |
Results and Models¶
Methods | Backbone | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|---|
NRTR | R31-1/16-1/8 | 93.9 | 90.0 | 93.5 | 74.5 | 78.5 | 86.5 | model / log |
NRTR | R31-1/8-1/4 | 94.7 | 87.5 | 93.3 | 75.1 | 78.9 | 87.9 | model / log |
Notes:
- `R31-1/16-1/8` means the height of the backbone feature map is 1/16 of the input image height, and its width is 1/8 of the input image width.
- `R31-1/8-1/4` means the height of the backbone feature map is 1/8 of the input image height, and its width is 1/4 of the input image width.
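To make the backbone naming concrete, the following minimal sketch computes the feature map size implied by a name such as `R31-1/8-1/4`. The helper `feature_size` and the 32 x 128 input size are illustrative assumptions, not part of the released configs or code, and a real backbone may round differently at the borders.

```python
from fractions import Fraction
from typing import Tuple


def feature_size(backbone: str, height: int, width: int) -> Tuple[int, int]:
    """Return (feature_height, feature_width) implied by a name like 'R31-1/8-1/4'.

    Hypothetical helper for illustration only: it parses the two ratios in the
    name and applies them to the input size, truncating to whole pixels.
    """
    _, h_ratio, w_ratio = backbone.split("-")
    return (int(height * Fraction(h_ratio)), int(width * Fraction(w_ratio)))


# Assumed input size of 32 x 128 (height x width), chosen only for the example.
print(feature_size("R31-1/16-1/8", 32, 128))  # (2, 16)
print(feature_size("R31-1/8-1/4", 32, 128))   # (4, 32)
```

Under these assumptions, the `R31-1/8-1/4` variant keeps twice the resolution of `R31-1/16-1/8` along both axes.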
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition¶
Introduction¶
[ALGORITHM]
@inproceedings{yue2020robustscanner,
title={RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition},
author={Yue, Xiaoyu and Kuang, Zhanghui and Lin, Chenhao and Sun, Hongbin and Zhang, Wayne},
booktitle={European Conference on Computer Vision},
year={2020}
}
Dataset¶
Train Dataset¶
trainset | instance_num | repeat_num | source |
---|---|---|---|
icdar_2011 | 3567 | 20 | real |
icdar_2013 | 848 | 20 | real |
icdar2015 | 4468 | 20 | real |
coco_text | 42142 | 20 | real |
IIIT5K | 2000 | 20 | real |
SynthText | 2400000 | 1 | synth |
SynthAdd | 1216889 | 1 | synth, 1.6m in [1] |
Syn90k | 2400000 | 1 | synth |
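As a rough illustration of how `repeat_num` shifts the balance between real and synthetic data, the sketch below multiplies each `instance_num` by its `repeat_num` and sums the result per source. It assumes that `repeat_num` simply means the dataset is repeated that many times per epoch; the names `TRAIN_SETS` and `effective_counts` are illustrative only.

```python
# (name, instance_num, repeat_num, source) rows copied from the train dataset table.
TRAIN_SETS = [
    ("icdar_2011", 3567, 20, "real"),
    ("icdar_2013", 848, 20, "real"),
    ("icdar2015", 4468, 20, "real"),
    ("coco_text", 42142, 20, "real"),
    ("IIIT5K", 2000, 20, "real"),
    ("SynthText", 2400000, 1, "synth"),
    ("SynthAdd", 1216889, 1, "synth"),
    ("Syn90k", 2400000, 1, "synth"),
]


def effective_counts(rows):
    """Sum instance_num * repeat_num per source (assumed meaning of repeat_num)."""
    totals = {}
    for _name, instance_num, repeat_num, source in rows:
        totals[source] = totals.get(source, 0) + instance_num * repeat_num
    return totals


totals = effective_counts(TRAIN_SETS)
print(totals)  # {'real': 1060500, 'synth': 6016889}

# Fraction of samples seen per epoch that come from real images.
real_share = totals["real"] / sum(totals.values())
print(f"real share per epoch: {real_share:.1%}")
```

Without repetition the real datasets would contribute only 53025 samples, so the factor of 20 is what keeps them from being swamped by the synthetic data.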
Test Dataset¶
testset | instance_num | type |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
IC15 | 2077 | irregular |
SVTP | 645 | irregular, 639 in [1] |
CT80 | 288 | irregular |
Results and Models¶
Methods | GPUs | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|---|
RobustScanner | 16 | 95.1 | 89.2 | 93.1 | 77.8 | 80.3 | 90.3 | model / log |
References¶
[1] Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.
Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition¶
Introduction¶
[ALGORITHM]
@inproceedings{li2019show,
title={Show, attend and read: A simple and strong baseline for irregular text recognition},
author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
number={01},
pages={8610--8617},
year={2019}
}
Dataset¶
Train Dataset¶
trainset | instance_num | repeat_num | source |
---|---|---|---|
icdar_2011 | 3567 | 20 | real |
icdar_2013 | 848 | 20 | real |
icdar2015 | 4468 | 20 | real |
coco_text | 42142 | 20 | real |
IIIT5K | 2000 | 20 | real |
SynthText | 2400000 | 1 | synth |
SynthAdd | 1216889 | 1 | synth, 1.6m in [1] |
Syn90k | 2400000 | 1 | synth |
Test Dataset¶
testset | instance_num | type |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
IC15 | 2077 | irregular |
SVTP | 645 | irregular, 639 in [1] |
CT80 | 288 | irregular |
Results and Models¶
Methods | Backbone | Decoder | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|---|---|
SAR | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | 79.0 | 82.2 | 88.9 | model / log |
SAR | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | 78.2 | 81.9 | 89.6 | model / log |
Chinese Dataset¶
Results and Models¶
Methods | Backbone | Decoder | download |
---|---|---|---|
SAR | R31-1/8-1/4 | ParallelSARDecoder | model / log / dict |
Notes:
- `R31-1/8-1/4` means the height of the backbone feature map is 1/8 of the input image height, and its width is 1/4 of the input image width.
- We did not use beam search during decoding.
- We implemented two kinds of decoder, `ParallelSARDecoder` and `SequentialSARDecoder`.
  - `ParallelSARDecoder`: parallel decoding during training with an `LSTM` layer. It is faster.
  - `SequentialSARDecoder`: sequential decoding during training with `LSTMCell`. It is easier to understand.
- For the train dataset:
  - We did not construct distinct data groups (20 groups in [1]) to train the model group by group, since that would make model training too complicated.
  - Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See config for details.
- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speed up training, while keeping the `initial lr = 1e-3` unchanged.
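The data selection described in the notes can be sketched as follows. This is a minimal illustration with toy sizes, assuming each dataset is a plain list of annotation records; `sample_subset`, the seed, and the toy lists are hypothetical and do not reflect the actual config.

```python
import random


def sample_subset(records, k, seed=0):
    """Return k records drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, k)


# Toy stand-ins for the annotation lists; the real datasets hold millions of entries.
syn90k = [f"syn90k_{i}" for i in range(1000)]
synthtext = [f"synthtext_{i}" for i in range(1000)]
synthadd = [f"synthadd_{i}" for i in range(500)]

# Scaled-down analogue of 2.4m / 2.4m / 1.2m: everything goes into one pool,
# with no group-by-group training schedule.
train_pool = (
    sample_subset(syn90k, 240)
    + sample_subset(synthtext, 240)
    + sample_subset(synthadd, 120)
)
random.Random(0).shuffle(train_pool)
print(len(train_pool))  # 600

# Total batch size exactly as written in the notes: 64 * 48 on 48 GPUs.
total_batch_size = 64 * 48
print(total_batch_size)  # 3072
```

Fixing the seed keeps the selected subset reproducible across runs, which matters when comparing results against a published log.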
References¶
[1] Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.
SATRN¶
Introduction¶
[ALGORITHM]
@article{junyeop2019recognizing,
title={On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention},
author={Lee, Junyeop and Park, Sungrae and Baek, Jeonghun and Oh, Seong Joon and Kim, Seonghyeon and Lee, Hwalsuk},
year={2019}
}
Dataset¶
Train Dataset¶
trainset | instance_num | repeat_num | source |
---|---|---|---|
SynthText | 7266686 | 1 | synth |
Syn90k | 8919273 | 1 | synth |
Test Dataset¶
testset | instance_num | type |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
IC15 | 2077 | irregular |
SVTP | 645 | irregular |
CT80 | 288 | irregular |
Results and Models¶
Methods | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|
Satrn | 96.1 | 93.5 | 95.7 | 84.1 | 88.5 | 90.3 | model / log |
Satrn_small | 94.7 | 91.3 | 95.4 | 81.9 | 85.9 | 86.5 | model / log |
SegOCR Simple Baseline.¶
Introduction¶
[ALGORITHM]
@unpublished{key,
title={SegOCR Simple Baseline.},
author={},
note={Unpublished Manuscript},
year={2021}
}
Dataset¶
Train Dataset¶
trainset | instance_num | repeat_num | source |
---|---|---|---|
SynthText | 7266686 | 1 | synth |
Test Dataset¶
testset | instance_num | type |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
CT80 | 288 | irregular |
Results and Models¶
Backbone | Neck | Head | IIIT5K (regular) | SVT (regular) | IC13 (regular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|
R31-1/16 | FPNOCR | 1x | 90.9 | 81.8 | 90.7 | 80.9 | model / log |
Notes:
- `R31-1/16` means the size (both height and width) of the backbone feature map is 1/16 of the input image.
- `1x` means the size (both height and width) of the head feature map is the same as the input image.
CRNN with TPS based STN¶
Introduction¶
[ALGORITHM]
@article{shi2016end,
title={An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition},
author={Shi, Baoguang and Bai, Xiang and Yao, Cong},
journal={IEEE transactions on pattern analysis and machine intelligence},
year={2016}
}
[PREPROCESSOR]
@article{shi2016robust,
title={Robust Scene Text Recognition with Automatic Rectification},
author={Shi, Baoguang and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
year={2016}
}
Results and Models¶
Train Dataset¶
trainset | instance_num | repeat_num | note |
---|---|---|---|
Syn90k | 8919273 | 1 | synth |
Test Dataset¶
testset | instance_num | note |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
IC15 | 2077 | irregular |
SVTP | 645 | irregular |
CT80 | 288 | irregular |
Results and models¶
methods | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|
CRNN-STN | 80.8 | 81.3 | 85.0 | 59.6 | 68.1 | 53.8 | model / log |