Overview
Supported Datasets
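Dataset Name | Text Detection | Text Recognition | Text Spotting | KIE |
---|---|---|---|---|
COCO Text v2 | ✓ | ✓ | ✓ |  |
CUTE80 |  | ✓ |  |  |
FUNSD | ✓ | ✓ | ✓ |  |
Incidental Scene Text IC13 | ✓ | ✓ | ✓ |  |
Incidental Scene Text IC15 | ✓ | ✓ | ✓ |  |
IIIT5K |  | ✓ |  |  |
NAF | ✓ | ✓ | ✓ |  |
Scanned Receipts OCR and Information Extraction | ✓ | ✓ | ✓ |  |
Street View Text Dataset (SVT) | ✓ | ✓ | ✓ |  |
Street View Text Perspective (SVT-P) |  | ✓ |  |  |
Text OCR | ✓ | ✓ | ✓ |  |
Total Text | ✓ | ✓ | ✓ |  |
WildReceipt | ✓ | ✓ | ✓ | ✓ |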
Dataset Details
COCO Text v2
“COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images”, arXiv, 2016.
A. Basic Info
Official Website: cocotextv2
Year: 2016
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: CC BY 4.0
B. Annotation Format
Text Detection/Spotting
{
"cats": {},
"anns": {
"45346": {
"mask":[468.9,286.7,468.9,295.2,493.0,295.8,493.0,287.2],
"class":"machine printed",
"bbox":[468.9,286.7,24.1,9.1],
"image_id":522579,
"id":167312,
"language":"english",
"area":55.5,
"utf8_string":"the",
"legibility":"legible"
},
// ...
},
"imgs": {
"522579": {
"file_name":"COCO_train2014_000000522579.jpg",
"height":476,
"width":640,
"id":522579,
"set":"train"
},
// ...
},
"imgToAnns": {
"522579": [167294, 167295, 167296, 167297, 167298, 167299, 167300, 167301, 167302, 167303, 167304, 167305, 167306, 167307, 167308, 167309, 167310, 167311, 167312, 167313, 167314, 167315, 167316, 167317],
// ...
},
"info": {}
}
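As a minimal sketch (not MMOCR's own converter), the structure above can be read with plain Python. The function name `coco_text_instances` and the inline `sample` dict are hypothetical. The sketch assumes `bbox` is `[x, y, width, height]`, which agrees with the `mask` corners in the sample, and it indexes annotations by their `id` field rather than by dictionary key.

```python
def coco_text_instances(data: dict, image_id: int) -> list:
    """Return the legible word instances of one image with corner-format boxes."""
    # Index by the "id" field so the lookup does not depend on the dict keys.
    anns_by_id = {ann["id"]: ann for ann in data["anns"].values()}
    instances = []
    for ann_id in data["imgToAnns"].get(str(image_id), []):
        ann = anns_by_id.get(ann_id)
        if ann is None or ann["legibility"] != "legible":
            continue
        x, y, w, h = ann["bbox"]  # assumed layout: [x, y, width, height]
        instances.append({
            "bbox": [x, y, x + w, y + h],
            "polygon": ann["mask"],
            "text": ann["utf8_string"],
        })
    return instances


# Trimmed copy of the sample annotation shown above.
sample = {
    "anns": {
        "45346": {
            "mask": [468.9, 286.7, 468.9, 295.2, 493.0, 295.8, 493.0, 287.2],
            "bbox": [468.9, 286.7, 24.1, 9.1],
            "image_id": 522579,
            "id": 167312,
            "utf8_string": "the",
            "legibility": "legible",
        }
    },
    "imgToAnns": {"522579": [167311, 167312]},
}
instances = coco_text_instances(sample, 522579)
```

Annotation ids that are listed in `imgToAnns` but missing from `anns` are skipped rather than raising an error.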
C. Reference
@article{veit2016coco, title={Coco-text: Dataset and benchmark for text detection and recognition in natural images}, author={Veit, Andreas and Matera, Tomas and Neumann, Lukas and Matas, Jiri and Belongie, Serge}, journal={arXiv preprint arXiv:1601.07140}, year={2016}}
CUTE80
“A Robust Arbitrary Text Detection System for Natural Scene Images”, ESWA, 2014.
A. Basic Info
Official Website: cute80
Year: 2014
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textrecog’]
License: N/A
B. Annotation Format
Text Recognition
# timage/img_name text 1 text
timage/001.jpg RONALDO 1 RONALDO
timage/002.jpg 7 1 7
timage/003.jpg SEACREST 1 SEACREST
timage/004.jpg BEACH 1 BEACH
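A short sketch of how such lines could be split into image/text pairs. The helper name `parse_cute80_line` is hypothetical, and the sketch assumes fields are separated by single spaces and that the second field holds the label; the meaning of the remaining fields is not specified here, so they are ignored.

```python
from typing import Optional, Tuple


def parse_cute80_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (image path, text), or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(" ")
    if len(fields) < 2:
        return None
    # Assumption: field 0 is the image path and field 1 is the transcription.
    return fields[0], fields[1]


lines = [
    "# timage/img_name text 1 text",
    "timage/001.jpg RONALDO 1 RONALDO",
    "timage/002.jpg 7 1 7",
]
pairs = [pair for pair in map(parse_cute80_line, lines) if pair is not None]
```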
C. Reference
@article{risnumawan2014robust, title={A robust arbitrary text detection system for natural scene images}, author={Risnumawan, Anhar and Shivakumara, Palaiahnakote and Chan, Chee Seng and Tan, Chew Lim}, journal={Expert Systems with Applications}, volume={41}, number={18}, pages={8027--8048}, year={2014}, publisher={Elsevier}}
FUNSD
“FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”, ICDAR, 2019.
A. Basic Info
Official Website: funsd
Year: 2019
Language: [‘English’]
Scene: [‘Document’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: FUNSD License
B. Annotation Format
Text Detection/Recognition/Spotting
{
"form": [
{
"id": 0,
"text": "Registration No.",
"box": [
94,
169,
191,
186
],
"linking": [
[
0,
1
]
],
"label": "question",
"words": [
{
"text": "Registration",
"box": [
94,
169,
168,
186
]
},
{
"text": "No.",
"box": [
170,
169,
191,
183
]
}
]
},
{
"id": 1,
"text": "533",
"box": [
209,
169,
236,
182
],
"label": "answer",
"words": [
{
"box": [
209,
169,
236,
182
],
"text": "533"
}
],
"linking": [
[
0,
1
]
]
}
]
}
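The nesting above (entities under `form`, each with its own `words`) can be flattened into word-level instances. This is only a sketch: `funsd_words` is a hypothetical helper, and reading `box` as `[x1, y1, x2, y2]` is an assumption based on the sample values.

```python
def funsd_words(annotation: dict) -> list:
    """Flatten FUNSD entities into word-level instances."""
    words = []
    for entity in annotation["form"]:
        for word in entity["words"]:
            if not word["text"]:
                continue  # skip words with an empty transcription
            words.append({
                "bbox": word["box"],       # assumed [x1, y1, x2, y2]
                "text": word["text"],
                "label": entity["label"],  # label of the parent entity
            })
    return words


# Trimmed copy of the first entity in the sample above.
sample = {
    "form": [
        {
            "id": 0,
            "text": "Registration No.",
            "box": [94, 169, 191, 186],
            "linking": [[0, 1]],
            "label": "question",
            "words": [
                {"text": "Registration", "box": [94, 169, 168, 186]},
                {"text": "No.", "box": [170, 169, 191, 183]},
            ],
        }
    ]
}
words = funsd_words(sample)
```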
C. Reference
@inproceedings{jaume2019, title = {FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents}, author = {Guillaume Jaume and Hazim Kemal Ekenel and Jean-Philippe Thiran}, booktitle = {Accepted to ICDAR-OST}, year = {2019}}
Incidental Scene Text IC13
“ICDAR 2013 Robust Reading Competition”, ICDAR, 2013.
A. Basic Info
Official Website: icdar2013
Year: 2013
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: N/A
B. Annotation Format
Text Detection
# train split
# x1 y1 x2 y2 "transcript"
158 128 411 181 "Footpath"
443 128 501 169 "To"
64 200 363 243 "Colchester"
# test split
# x1, y1, x2, y2, "transcript"
38, 43, 920, 215, "Tiredness"
275, 264, 665, 450, "kills"
0, 699, 77, 830, "A"
Text Recognition
# img_name, "text"
word_1.png, "PROPER"
word_2.png, "FOOD"
word_3.png, "PRONTO"
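The train and test detection files use different separators (spaces versus commas), so a single tolerant pattern is convenient. The sketch below is one possible approach; the name `parse_ic13_det_line` is hypothetical, and it assumes integer coordinates and a double-quoted transcript, as in the samples above.

```python
import re
from typing import Optional

# Four integers, each optionally followed by a comma, then a quoted transcript.
_IC13_LINE = re.compile(r'^(\d+),?\s+(\d+),?\s+(\d+),?\s+(\d+),?\s+"(.*)"$')


def parse_ic13_det_line(line: str) -> Optional[dict]:
    """Parse one detection line from either split; None if it does not match."""
    match = _IC13_LINE.match(line.strip())
    if match is None:
        return None
    x1, y1, x2, y2 = (int(match.group(i)) for i in range(1, 5))
    return {"bbox": [x1, y1, x2, y2], "text": match.group(5)}


train_instance = parse_ic13_det_line('158 128 411 181 "Footpath"')
test_instance = parse_ic13_det_line('38, 43, 920, 215, "Tiredness"')
```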
C. Reference
@inproceedings{karatzas2013icdar, title={ICDAR 2013 robust reading competition}, author={Karatzas, Dimosthenis and Shafait, Faisal and Uchida, Seiichi and Iwamura, Masakazu and i Bigorda, Lluis Gomez and Mestre, Sergi Robles and Mas, Joan and Mota, David Fernandez and Almazan, Jon Almazan and De Las Heras, Lluis Pere}, booktitle={2013 12th international conference on document analysis and recognition}, pages={1484--1493}, year={2013}, organization={IEEE}}
Incidental Scene Text IC15
“ICDAR 2015 Competition on Robust Reading”, ICDAR, 2015.
A. Basic Info
Official Website: icdar2015
Year: 2015
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: CC BY 4.0
B. Annotation Format
Text Detection
# x1,y1,x2,y2,x3,y3,x4,y4,trans
377,117,463,117,465,130,378,130,Genaxis Theatre
493,115,519,115,519,131,493,131,[06]
374,155,409,155,409,170,374,170,###
Text Recognition
# img_name, "text"
word_1.png, "Genaxis Theatre"
word_2.png, "[06]"
word_3.png, "62-03"
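Because a transcription may itself contain commas, a detection line is safer to split only on the first eight commas. A minimal sketch, assuming the `###` transcription marks a region to be ignored; `parse_ic15_det_line` is a hypothetical name.

```python
def parse_ic15_det_line(line: str) -> dict:
    """Parse 'x1,y1,x2,y2,x3,y3,x4,y4,trans' into a quadrilateral and its text."""
    # Defensive: drop a byte-order mark if the file was read without "utf-8-sig".
    fields = line.strip().lstrip("\ufeff").split(",", 8)
    if len(fields) != 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    polygon = [int(value) for value in fields[:8]]
    text = fields[8]  # everything after the eighth comma, commas included
    # Assumption: "###" denotes a region that should not be trained on or scored.
    return {"polygon": polygon, "text": text, "ignore": text == "###"}


first = parse_ic15_det_line("377,117,463,117,465,130,378,130,Genaxis Theatre")
ignored = parse_ic15_det_line("374,155,409,155,409,170,374,170,###")
```

The same splitting logic would apply to any format in this document that uses the `x1,y1,x2,y2,x3,y3,x4,y4,trans` layout.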
C. Reference
@inproceedings{karatzas2015icdar, title={ICDAR 2015 competition on robust reading}, author={Karatzas, Dimosthenis and Gomez-Bigorda, Lluis and Nicolaou, Anguelos and Ghosh, Suman and Bagdanov, Andrew and Iwamura, Masakazu and Matas, Jiri and Neumann, Lukas and Chandrasekhar, Vijay Ramaseshan and Lu, Shijian and others}, booktitle={2015 13th international conference on document analysis and recognition (ICDAR)}, pages={1156--1160}, year={2015}, organization={IEEE}}
IIIT5K
“Scene Text Recognition using Higher Order Language Priors”, BMVC, 2012.
A. Basic Info
Official Website: iiit5k
Year: 2012
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textrecog’]
License: N/A
B. Annotation Format
Text Recognition
# img_name text
train/1009_2.png You
train/1017_1.png Rescue
train/1017_2.png mission
C. Reference
@InProceedings{MishraBMVC12, author = "Mishra, A. and Alahari, K. and Jawahar, C.~V.", title = "Scene Text Recognition using Higher Order Language Priors", booktitle = "BMVC", year = "2012"}
NAF
“Deep Visual Template-Free Form Parsing”, ICDAR, 2019.
A. Basic Info
Official Website: naf
Year: 2019
Language: [‘English’]
Scene: [‘Document’, ‘Handwritten’]
Annotation Granularity: [‘Word’, ‘Line’]
Supported Tasks: [‘textrecog’, ‘textdet’, ‘textspotting’]
License: CDLA
B. Annotation Format
Text Detection/Recognition/Spotting
{"fieldBBs": [{"poly_points": [[435, 1406], [466, 1406], [466, 1439], [435, 1439]], "type": "fieldCheckBox", "id": "f0", "isBlank": 1}, {"poly_points": [[435, 1444], [469, 1444], [469, 1478], [435, 1478]], "type": "fieldCheckBox", "id": "f1", "isBlank": 1}],
"textBBs": [{"poly_points": [[1183, 1337], [2028, 1345], [2032, 1395], [1186, 1398]], "type": "text", "id": "t0"}, {"poly_points": [[492, 1336], [809, 1338], [809, 1379], [492, 1378]], "type": "text", "id": "t1"}, {"poly_points": [[512, 1375], [798, 1376], [798, 1405], [512, 1404]], "type": "textInst", "id": "t2"}], "imageFilename": "007182398_00026.jpg", "transcriptions": {"f0": "\u00bf\u00bf\u00bf \u00bf\u00bf\u00bf 18/1/49 \u00bf\u00bf\u00bf\u00bf\u00bf", "f1": "U.S. Navy 53rd. Naval Const. Batt.", "t0": "APPLICATION FOR HEADSTONE OR MARKER", "t1": "ORIGINAL"}}
C. Reference
@inproceedings{davis2019deep, title={Deep visual template-free form parsing}, author={Davis, Brian and Morse, Bryan and Cohen, Scott and Price, Brian and Tensmeyer, Chris}, booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)}, pages={134--141}, year={2019}, organization={IEEE}}
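In the NAF annotation format shown above, boxes and transcriptions live in separate maps that share an `id`. The following sketch joins the two for text boxes; `naf_text_instances` is a hypothetical helper, and dropping boxes without a transcription is a design choice of this example rather than documented behavior.

```python
def naf_text_instances(annotation: dict) -> list:
    """Pair each entry of "textBBs" with its transcription, if one exists."""
    transcriptions = annotation.get("transcriptions", {})
    instances = []
    for box in annotation.get("textBBs", []):
        text = transcriptions.get(box["id"])
        if text is None:
            continue  # no transcription recorded for this box
        # Flatten [[x, y], ...] into [x1, y1, x2, y2, ...].
        polygon = [coord for point in box["poly_points"] for coord in point]
        instances.append({"polygon": polygon, "text": text, "type": box["type"]})
    return instances


# Trimmed copy of the sample above.
sample = {
    "textBBs": [
        {"poly_points": [[492, 1336], [809, 1338], [809, 1379], [492, 1378]],
         "type": "text", "id": "t1"},
        {"poly_points": [[512, 1375], [798, 1376], [798, 1405], [512, 1404]],
         "type": "textInst", "id": "t2"},
    ],
    "transcriptions": {"t1": "ORIGINAL"},
}
instances = naf_text_instances(sample)
```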
Scanned Receipts OCR and Information Extraction
“ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”, ICDAR, 2019.
A. Basic Info
Official Website: sroie
Year: 2019
Language: [‘English’]
Scene: [‘Document’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: CC BY 4.0
B. Annotation Format
Text Detection/Recognition/Spotting
# x1,y1,x2,y2,x3,y3,x4,y4,trans
72,25,326,25,326,64,72,64,TAN WOON YANN
50,82,440,82,440,121,50,121,BOOK TA .K(TAMAN DAYA) SDN BND
205,121,285,121,285,139,205,139,789417-W
C. Reference
@INPROCEEDINGS{8977955, author={Huang, Zheng and Chen, Kai and He, Jianhua and Bai, Xiang and Karatzas, Dimosthenis and Lu, Shijian and Jawahar, C. V.}, booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)}, title={ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction}, year={2019}, volume={}, number={}, pages={1516-1520}, doi={10.1109/ICDAR.2019.00244}}
Street View Text Dataset (SVT)
“Word Spotting in the Wild”, ECCV, 2010.
A. Basic Info
Official Website: svt
Year: 2010
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: N/A
B. Annotation Format
Text Detection/Recognition/Spotting
<image>
<imageName>img/14_03.jpg</imageName>
<address>341 Southwest 10th Avenue Portland OR</address>
<lex>
LIVING,ROOM,THEATERS,KENNY,ZUKE,DELICATESSEN,CLYDE,COMMON,ACE,HOTEL,PORTLAND,ROSE,CITY,BOOKS,STUMPTOWN,COFFEE,ROASTERS,RED,CAP,GARAGE,FISH,GROTTO,SEAFOOD,RESTAURANT,AURA,RESTAURANT,LOUNGE,ROCCO,PIZZA,PASTA,BUFFALO,EXCHANGE,MARK,SPENCER,LIGHT,FEZ,BALLROOM,READING,FRENZY,ROXY,SCANDALS,MARTINOTTI,CAFE,DELI,CROWSENBERG,HALF
</lex>
<Resolution x="1280" y="880"/>
<taggedRectangles>
<taggedRectangle height="75" width="236" x="375" y="253">
<tag>LIVING</tag>
</taggedRectangle>
<taggedRectangle height="76" width="175" x="639" y="272">
<tag>ROOM</tag>
</taggedRectangle>
<taggedRectangle height="87" width="281" x="839" y="283">
<tag>THEATERS</tag>
</taggedRectangle>
</taggedRectangles>
</image>
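The XML above can be read with the standard library. This is a sketch under the assumption that `x`/`y` give the top-left corner of each `taggedRectangle` in pixels; `parse_svt_image` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_svt_image(image_elem: ET.Element) -> dict:
    """Convert one <image> element into an image path and word instances."""
    instances = []
    for rect in image_elem.iter("taggedRectangle"):
        x, y = int(rect.get("x")), int(rect.get("y"))
        w, h = int(rect.get("width")), int(rect.get("height"))
        instances.append({
            "bbox": [x, y, x + w, y + h],  # corner format [x1, y1, x2, y2]
            "text": rect.findtext("tag"),
        })
    return {"img_path": image_elem.findtext("imageName"), "instances": instances}


# Trimmed copy of the sample above.
xml_text = """
<image>
  <imageName>img/14_03.jpg</imageName>
  <taggedRectangles>
    <taggedRectangle height="75" width="236" x="375" y="253">
      <tag>LIVING</tag>
    </taggedRectangle>
    <taggedRectangle height="76" width="175" x="639" y="272">
      <tag>ROOM</tag>
    </taggedRectangle>
  </taggedRectangles>
</image>
"""
result = parse_svt_image(ET.fromstring(xml_text.strip()))
```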
C. Reference
@inproceedings{wang2010word, title={Word spotting in the wild}, author={Wang, Kai and Belongie, Serge}, booktitle={European conference on computer vision}, pages={591--604}, year={2010}, organization={Springer}}
Street View Text Perspective (SVT-P)
“Recognizing Text with Perspective Distortion in Natural Scenes”, ICCV, 2013.
A. Basic Info
Official Website: svtp
Year: 2013
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textrecog’]
License: N/A
B. Annotation Format
Text Recognition
13_15_0_par.jpg WYNDHAM
13_15_1_par.jpg HOTEL
12_16_0_par.jpg UNITED
C. Reference
@inproceedings{phan2013recognizing, title={Recognizing text with perspective distortion in natural scenes}, author={Phan, Trung Quy and Shivakumara, Palaiahnakote and Tian, Shangxuan and Tan, Chew Lim}, booktitle={Proceedings of the IEEE International Conference on Computer Vision}, pages={569--576}, year={2013}}
Text OCR
“TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text”, CVPR, 2021.
A. Basic Info
Official Website: textocr
Year: 2021
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: CC BY 4.0
B. Annotation Format
Text Detection/Recognition/Spotting
{
"imgs": {
"OpenImages_ImageID_1": {
"id": "OpenImages_ImageID_1",
"width": "INT, Width of the image",
"height": "INT, Height of the image",
"set": "Split train|val|test",
"filename": "train|test/OpenImages_ImageID_1.jpg"
},
"OpenImages_ImageID_2": {
"...": "..."
}
},
"anns": {
"OpenImages_ImageID_1_1": {
"id": "STR, OpenImages_ImageID_1_1, Specifies the nth annotation for an image",
"image_id": "OpenImages_ImageID_1",
"bbox": [
"FLOAT x1",
"FLOAT y1",
"FLOAT x2",
"FLOAT y2"
],
"points": [
"FLOAT x1",
"FLOAT y1",
"FLOAT x2",
"FLOAT y2",
"...",
"FLOAT xN",
"FLOAT yN"
],
"utf8_string": "text for this annotation",
"area": "FLOAT, area of this box"
},
"OpenImages_ImageID_1_2": {
"...": "..."
},
"OpenImages_ImageID_2_1": {
"...": "..."
}
},
"img2Anns": {
"OpenImages_ImageID_1": [
"OpenImages_ImageID_1_1",
"OpenImages_ImageID_1_2",
"OpenImages_ImageID_1_3"
],
"OpenImages_ImageID_N": [
"..."
]
}
}
C. Reference
@inproceedings{singh2021textocr, title={{TextOCR}: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text}, author={Singh, Amanpreet and Pang, Guan and Toh, Mandy and Huang, Jing and Galuba, Wojciech and Hassner, Tal}, booktitle={The Conference on Computer Vision and Pattern Recognition}, year={2021}}
Total Text
“Total-Text: Towards Orientation Robustness in Scene Text Detection”, IJDAR, 2020.
A. Basic Info
Official Website: totaltext
Year: 2020
Language: [‘English’]
Scene: [‘Natural Scene’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]
License: BSD-3
B. Annotation Format
Text Detection/Spotting
x: [[259 313 389 427 354 302]], y: [[542 462 417 459 507 582]], ornt: [u'c'], transcriptions: [u'PAUL']
x: [[400 478 494 436]], y: [[398 380 448 465]], ornt: [u'#'], transcriptions: [u'#']
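Each line stores the x and y coordinates of the polygon in separate lists. The sketch below interleaves them into a flat polygon; it only handles the single-line form shown above, the name `parse_totaltext_line` is hypothetical, and treating a `#` transcription as an ignored region is an assumption.

```python
import re
from typing import Optional

_TOTALTEXT_LINE = re.compile(
    r"x: \[\[(.*?)\]\], y: \[\[(.*?)\]\], "
    r"ornt: \[u'(.*?)'\], transcriptions: \[u'(.*)'\]"
)


def parse_totaltext_line(line: str) -> Optional[dict]:
    """Parse one annotation line; None if the line has a different shape."""
    match = _TOTALTEXT_LINE.search(line)
    if match is None:
        return None
    xs = [int(value) for value in match.group(1).split()]
    ys = [int(value) for value in match.group(2).split()]
    # Interleave into [x1, y1, x2, y2, ...].
    polygon = [coord for point in zip(xs, ys) for coord in point]
    text = match.group(4)
    return {
        "polygon": polygon,
        "orientation": match.group(3),
        "text": text,
        "ignore": text == "#",  # assumption: "#" marks a region to ignore
    }


word = parse_totaltext_line(
    "x: [[259 313 389 427 354 302]], y: [[542 462 417 459 507 582]], "
    "ornt: [u'c'], transcriptions: [u'PAUL']"
)
ignored = parse_totaltext_line(
    "x: [[400 478 494 436]], y: [[398 380 448 465]], "
    "ornt: [u'#'], transcriptions: [u'#']"
)
```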
C. Reference
@article{CK2019, author = {Chee Kheng Chng and Chee Seng Chan and Chenglin Liu}, title = {Total-Text: Towards Orientation Robustness in Scene Text Detection}, journal = {International Journal on Document Analysis and Recognition (IJDAR)}, volume = {23}, pages = {31-52}, year = {2020}, doi = {10.1007/s10032-019-00334-z}}
WildReceipt
“Spatial Dual-Modality Graph Reasoning for Key Information Extraction”, arXiv, 2021.
A. Basic Info
Official Website: wildreceipt
Year: 2021
Language: [‘English’]
Scene: [‘Receipt’]
Annotation Granularity: [‘Word’]
Supported Tasks: [‘kie’, ‘textdet’, ‘textrecog’, ‘textspotting’]
License: N/A
B. Annotation Format
KIE
// Close Set
{
"file_name": "image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg",
"height": 1200,
"width": 1600,
"annotations":
[
{
"box": [550.0, 190.0, 937.0, 190.0, 937.0, 104.0, 550.0, 104.0],
"text": "SAFEWAY",
"label": 1
},
{
"box": [1048.0, 211.0, 1074.0, 211.0, 1074.0, 196.0, 1048.0, 196.0],
"text": "TM",
"label": 25
}
], //...
}
// Open Set
{
"file_name": "image_files/Image_12/10/845be0dd6f5b04866a2042abd28d558032ef2576.jpeg",
"height": 348,
"width": 348,
"annotations":
[
{
"box": [114.0, 19.0, 230.0, 19.0, 230.0, 1.0, 114.0, 1.0],
"text": "CHOEUN",
"label": 2,
"edge": 1
},
{
"box": [97.0, 35.0, 236.0, 35.0, 236.0, 19.0, 97.0, 19.0],
"text": "KOREANRESTAURANT",
"label": 2,
"edge": 1
}
]
}
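Each `box` above holds four corner points as eight numbers. A sketch of deriving an axis-aligned box from them follows; it assumes one JSON object per record (without the `//` comments, which are only illustrative here), and `wildreceipt_instances` is a hypothetical helper. The optional `edge` field of the open-set variant is passed through when present.

```python
import json


def wildreceipt_instances(json_record: str) -> list:
    """Turn one WildReceipt record into instances with axis-aligned boxes."""
    record = json.loads(json_record)
    instances = []
    for ann in record["annotations"]:
        xs, ys = ann["box"][0::2], ann["box"][1::2]
        instance = {
            "bbox": [min(xs), min(ys), max(xs), max(ys)],
            "polygon": ann["box"],
            "text": ann["text"],
            "label": ann["label"],
        }
        if "edge" in ann:  # only present in the open-set variant
            instance["edge"] = ann["edge"]
        instances.append(instance)
    return instances


# Trimmed copy of the close-set sample above.
record = json.dumps({
    "file_name": "image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg",
    "height": 1200,
    "width": 1600,
    "annotations": [
        {"box": [550.0, 190.0, 937.0, 190.0, 937.0, 104.0, 550.0, 104.0],
         "text": "SAFEWAY", "label": 1},
    ],
})
instances = wildreceipt_instances(record)
```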
C. Reference
@article{sun2021spatial, title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction}, author={Sun, Hongbin and Kuang, Zhanghui and Yue, Xiaoyu and Lin, Chenhao and Zhang, Wayne}, journal={arXiv preprint arXiv:2103.14470}, year={2021} }