
Overview

Supported Datasets

Dataset Name    Text Detection    Text Recognition    Text Spotting    KIE
cocotextv2      ✓                 ✓                   ✓
cute80                            ✓
funsd           ✓                 ✓                   ✓
icdar2013       ✓                 ✓                   ✓
icdar2015       ✓                 ✓                   ✓
iiit5k                            ✓
naf             ✓                 ✓                   ✓
sroie           ✓                 ✓                   ✓
svt             ✓                 ✓                   ✓
svtp                              ✓
textocr         ✓                 ✓                   ✓
totaltext       ✓                 ✓                   ✓
wildreceipt     ✓                 ✓                   ✓                ✓

Dataset Details

COCO Text v2

“COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images”, arXiv, 2016.

A. Basic Info

  • Official Website: cocotextv2

  • Year: 2016

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: CC BY 4.0

B. Annotation Format


Text Detection/Spotting

{
  "cats": {},
  "anns": {
      "45346": {
          "mask":[468.9,286.7,468.9,295.2,493.0,295.8,493.0,287.2],
          "class":"machine printed",
          "bbox":[468.9,286.7,24.1,9.1],
          "image_id":522579,
          "id":167312,
          "language":"english",
          "area":55.5,
          "utf8_string":"the",
          "legibility":"legible"
      },
      // ...
  },
  "imgs": {
      "522579": {
          "file_name":"COCO_train2014_000000522579.jpg",
          "height":476,
          "width":640,
          "id":522579,
          "set":"train"
      },
      // ...
  },
  "imgToAnns": {
      "522579": [167294, 167295, 167296, 167297, 167298, 167299, 167300, 167301, 167302, 167303, 167304, 167305, 167306, 167307, 167308, 167309, 167310, 167311, 167312, 167313, 167314, 167315, 167316, 167317],
      // ...
  },
  "info": {}
}
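The three top-level maps are linked by ID: `imgToAnns` lists, for each image key in `imgs`, the IDs of its annotations in `anns`. A minimal Python sketch of that lookup is below. It assumes the file has been loaded with `json.load`, that `anns` is keyed by the string form of each annotation `id`, and that `mask` is a flat `[x1, y1, ..., xN, yN]` polygon. `group_cocotext_v2` is a hypothetical helper, not an MMOCR API, and the trimmed `sample` keys its single annotation by its `id`.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: groups COCO-Text v2 word polygons and transcriptions by image file name.
def group_cocotext_v2(data: dict, legible_only: bool = True
                      ) -> Dict[str, List[Tuple[List[float], str]]]:
    grouped: Dict[str, List[Tuple[List[float], str]]] = {}
    for img_key, img in data["imgs"].items():
        instances = []
        # imgToAnns stores integer annotation IDs; anns is assumed to be keyed by their string form.
        for ann_id in data["imgToAnns"].get(img_key, []):
            ann = data["anns"][str(ann_id)]
            if legible_only and ann.get("legibility") != "legible":
                continue
            # "mask" is the word polygon; "bbox" (unused here) is [x, y, width, height].
            instances.append((ann["mask"], ann.get("utf8_string", "")))
        grouped[img["file_name"]] = instances
    return grouped


sample = {
    "anns": {
        "167312": {"mask": [468.9, 286.7, 468.9, 295.2, 493.0, 295.8, 493.0, 287.2],
                   "bbox": [468.9, 286.7, 24.1, 9.1], "image_id": 522579,
                   "id": 167312, "utf8_string": "the", "legibility": "legible"},
    },
    "imgs": {"522579": {"file_name": "COCO_train2014_000000522579.jpg",
                        "height": 476, "width": 640, "id": 522579, "set": "train"}},
    "imgToAnns": {"522579": [167312]},
}
print(group_cocotext_v2(sample))
```

The `bbox` reading follows from the sample itself: its width and height (24.1 and 9.1) match the x and y extents of the `mask` polygon.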


C. Reference

@article{veit2016coco, title={Coco-text: Dataset and benchmark for text detection and recognition in natural images}, author={Veit, Andreas and Matera, Tomas and Neumann, Lukas and Matas, Jiri and Belongie, Serge}, journal={arXiv preprint arXiv:1601.07140}, year={2016}}

CUTE80

“A Robust Arbitrary Text Detection System for Natural Scene Images”, ESWA, 2014.

A. Basic Info

  • Official Website: cute80

  • Year: 2014

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textrecog’]

  • License: N/A

B. Annotation Format


Text Recognition

# timage/img_name text 1 text

timage/001.jpg RONALDO 1 RONALDO
timage/002.jpg 7 1 7
timage/003.jpg SEACREST 1 SEACREST
timage/004.jpg BEACH 1 BEACH


C. Reference

@article{risnumawan2014robust, title={A robust arbitrary text detection system for natural scene images}, author={Risnumawan, Anhar and Shivakumara, Palaiahankote and Chan, Chee Seng and Tan, Chew Lim}, journal={Expert Systems with Applications}, volume={41}, number={18}, pages={8027--8048}, year={2014}, publisher={Elsevier}}

FUNSD

“FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”, ICDAR, 2019.

A. Basic Info

  • Official Website: funsd

  • Year: 2019

  • Language: [‘English’]

  • Scene: [‘Document’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: FUNSD License

B. Annotation Format


Text Detection/Recognition/Spotting

{
  "form": [
    {
      "id": 0,
      "text": "Registration No.",
      "box": [
          94,
          169,
          191,
          186
      ],
      "linking": [
          [
              0,
              1
          ]
      ],
      "label": "question",
      "words": [
          {
              "text": "Registration",
              "box": [
                  94,
                  169,
                  168,
                  186
              ]
          },
          {
              "text": "No.",
              "box": [
                  170,
                  169,
                  191,
                  183
              ]
          }
      ]
    },
    {
      "id": 1,
      "text": "533",
      "box": [
          209,
          169,
          236,
          182
      ],
      "label": "answer",
      "words": [
          {
              "box": [
                  209,
                  169,
                  236,
                  182
              ],
              "text": "533"
          }
      ],
      "linking": [
          [
              0,
              1
          ]
      ]
    }
  ]
}
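In FUNSD, each entity in `form` carries its own `words`, and `linking` holds `[from_id, to_id]` pairs between entity IDs. The hypothetical Python sketch below does two things with the sample:

* It flattens the form into word-level polygons for detection or recognition.
* It recovers the question-answer link.

Neither function is an MMOCR API.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: one (polygon, text) instance per word in the form.
def funsd_word_instances(form: List[dict]) -> List[Tuple[List[int], str]]:
    instances = []
    for entity in form:
        for word in entity["words"]:
            x1, y1, x2, y2 = word["box"]
            # Expand the axis-aligned [x1, y1, x2, y2] box into a 4-point polygon
            # ordered top-left, top-right, bottom-right, bottom-left.
            instances.append(([x1, y1, x2, y1, x2, y2, x1, y2], word["text"]))
    return instances


# Hypothetical helper: resolves "linking" ID pairs into (source text, target text) pairs.
def funsd_links(form: List[dict]) -> List[Tuple[str, str]]:
    by_id: Dict[int, dict] = {entity["id"]: entity for entity in form}
    pairs = set()
    for entity in form:
        # The same [from_id, to_id] link is stored on both entities, so a set removes duplicates.
        for src, dst in entity.get("linking", []):
            pairs.add((by_id[src]["text"], by_id[dst]["text"]))
    return sorted(pairs)


form = [
    {"id": 0, "text": "Registration No.", "box": [94, 169, 191, 186],
     "linking": [[0, 1]], "label": "question",
     "words": [{"text": "Registration", "box": [94, 169, 168, 186]},
               {"text": "No.", "box": [170, 169, 191, 183]}]},
    {"id": 1, "text": "533", "box": [209, 169, 236, 182], "label": "answer",
     "words": [{"box": [209, 169, 236, 182], "text": "533"}],
     "linking": [[0, 1]]},
]
print(funsd_word_instances(form)[0])  # ([94, 169, 168, 169, 168, 186, 94, 186], 'Registration')
print(funsd_links(form))              # [('Registration No.', '533')]
```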


C. Reference

@inproceedings{jaume2019, title = {FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents}, author = {Jaume, Guillaume and Ekenel, Hazim Kemal and Thiran, Jean-Philippe}, booktitle = {Accepted to ICDAR-OST}, year = {2019}}

Focused Scene Text IC13

“ICDAR 2013 Robust Reading Competition”, ICDAR, 2013.

A. Basic Info

  • Official Website: icdar2013

  • Year: 2013

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: N/A

B. Annotation Format


Text Detection

# train split
# x1 y1 x2 y2 "transcript"

158 128 411 181 "Footpath"
443 128 501 169 "To"
64 200 363 243 "Colchester"

# test split
# x1, y1, x2, y2, "transcript"

38, 43, 920, 215, "Tiredness"
275, 264, 665, 450, "kills"
0, 699, 77, 830, "A"
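The IC13 train and test splits differ only in their separators (spaces versus commas followed by a space), so one pattern can accept both. This is a minimal sketch under two assumptions: `x1 y1 x2 y2` are the top-left and bottom-right corners of an axis-aligned box, and the transcription is always double-quoted. `parse_ic13_det` is a hypothetical name.

```python
import re
from typing import Tuple

# A field separator is either a comma (optionally followed by spaces) or plain whitespace.
SEP = r"(?:,\s*|\s+)"
IC13_DET = re.compile(
    r"^\s*(\d+)" + SEP + r"(\d+)" + SEP + r"(\d+)" + SEP + r"(\d+)" + SEP + r'"(.*)"\s*$'
)


# Hypothetical helper: returns (x1, y1, x2, y2, transcript) for one IC13 detection line.
def parse_ic13_det(line: str) -> Tuple[int, int, int, int, str]:
    m = IC13_DET.match(line)
    if m is None:
        raise ValueError(f"unrecognised IC13 line: {line!r}")
    x1, y1, x2, y2 = (int(v) for v in m.groups()[:4])
    return x1, y1, x2, y2, m.group(5)


print(parse_ic13_det('158 128 411 181 "Footpath"'))     # (158, 128, 411, 181, 'Footpath')
print(parse_ic13_det('38, 43, 920, 215, "Tiredness"'))  # (38, 43, 920, 215, 'Tiredness')
```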

Text Recognition

# img_name, "text"

word_1.png, "PROPER"
word_2.png, "FOOD"
word_3.png, "PRONTO"
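This `img_name, "text"` recognition format can be read by splitting at the first comma and removing one pair of surrounding quotes. Here is a hypothetical sketch, assuming one entry per line:

```python
from typing import Tuple


# Hypothetical helper: parses 'word_1.png, "PROPER"' into ('word_1.png', 'PROPER').
def parse_quoted_recog(line: str) -> Tuple[str, str]:
    # Split only at the first comma so commas inside the transcription survive.
    name, text = line.rstrip("\r\n").split(",", 1)
    text = text.strip()
    # Drop exactly one pair of surrounding double quotes, keeping any inner quotes.
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1]
    return name.strip(), text


print(parse_quoted_recog('word_1.png, "PROPER"'))  # ('word_1.png', 'PROPER')
```

If a ground-truth file begins with a UTF-8 byte-order mark, opening it with `encoding="utf-8-sig"` strips the mark before the first line reaches the parser.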


C. Reference

@inproceedings{karatzas2013icdar, title={ICDAR 2013 robust reading competition}, author={Karatzas, Dimosthenis and Shafait, Faisal and Uchida, Seiichi and Iwamura, Masakazu and i Bigorda, Lluis Gomez and Mestre, Sergi Robles and Mas, Joan and Mota, David Fernandez and Almazan, Jon Almazan and De Las Heras, Lluis Pere}, booktitle={2013 12th international conference on document analysis and recognition}, pages={1484--1493}, year={2013}, organization={IEEE}}

Incidental Scene Text IC15

“ICDAR 2015 Competition on Robust Reading”, ICDAR, 2015.

A. Basic Info

  • Official Website: icdar2015

  • Year: 2015

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: CC BY 4.0

B. Annotation Format


Text Detection

# x1,y1,x2,y2,x3,y3,x4,y4,trans

377,117,463,117,465,130,378,130,Genaxis Theatre
493,115,519,115,519,131,493,131,[06]
374,155,409,155,409,170,374,170,###
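In this format, the first eight fields are the four corner points of a quadrilateral. Everything after the eighth comma is the transcription, which may itself contain commas. A transcription of `###` marks a "do not care" region. A hypothetical Python sketch that keeps the transcription intact and flags those regions:

```python
from typing import List, Tuple


# Hypothetical helper: returns (polygon, transcription, ignore_flag) for one line.
def parse_ic15_det(line: str) -> Tuple[List[int], str, bool]:
    # maxsplit=8 leaves any commas inside the transcription untouched.
    fields = line.strip().split(",", 8)
    polygon = [int(v) for v in fields[:8]]
    text = fields[8]
    # "###" marks a "do not care" region that loaders usually skip or ignore.
    return polygon, text, text == "###"


print(parse_ic15_det("374,155,409,155,409,170,374,170,###"))
```

The SROIE sample further down uses the same eight-coordinate layout.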

Text Recognition

# img_name, "text"

word_1.png, "Genaxis Theatre"
word_2.png, "[06]"
word_3.png, "62-03"


C. Reference

@inproceedings{karatzas2015icdar, title={ICDAR 2015 competition on robust reading}, author={Karatzas, Dimosthenis and Gomez-Bigorda, Lluis and Nicolaou, Anguelos and Ghosh, Suman and Bagdanov, Andrew and Iwamura, Masakazu and Matas, Jiri and Neumann, Lukas and Chandrasekhar, Vijay Ramaseshan and Lu, Shijian and others}, booktitle={2015 13th international conference on document analysis and recognition (ICDAR)}, pages={1156--1160}, year={2015}, organization={IEEE}}

IIIT5K

“Scene Text Recognition using Higher Order Language Priors”, BMVC, 2012.

A. Basic Info

  • Official Website: iiit5k

  • Year: 2012

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textrecog’]

  • License: N/A

B. Annotation Format


Text Recognition

# img_name text

train/1009_2.png You
train/1017_1.png Rescue
train/1017_2.png mission


C. Reference

@InProceedings{MishraBMVC12, author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.", title     = "Scene Text Recognition using Higher Order Language Priors", booktitle = "BMVC", year      = "2012"}

NAF

“Deep Visual Template-Free Form Parsing”, ICDAR, 2019.

A. Basic Info

  • Official Website: naf

  • Year: 2019

  • Language: [‘English’]

  • Scene: [‘Document’, ‘Handwritten’]

  • Annotation Granularity: [‘Word’, ‘Line’]

  • Supported Tasks: [‘textrecog’, ‘textdet’, ‘textspotting’]

  • License: CDLA

B. Annotation Format


Text Detection/Recognition/Spotting

{
  "fieldBBs": [
    {"poly_points": [[435, 1406], [466, 1406], [466, 1439], [435, 1439]], "type": "fieldCheckBox", "id": "f0", "isBlank": 1},
    {"poly_points": [[435, 1444], [469, 1444], [469, 1478], [435, 1478]], "type": "fieldCheckBox", "id": "f1", "isBlank": 1}
  ],
  "textBBs": [
    {"poly_points": [[1183, 1337], [2028, 1345], [2032, 1395], [1186, 1398]], "type": "text", "id": "t0"},
    {"poly_points": [[492, 1336], [809, 1338], [809, 1379], [492, 1378]], "type": "text", "id": "t1"},
    {"poly_points": [[512, 1375], [798, 1376], [798, 1405], [512, 1404]], "type": "textInst", "id": "t2"}
  ],
  "imageFilename": "007182398_00026.jpg",
  "transcriptions": {
    "f0": "\u00bf\u00bf\u00bf \u00bf\u00bf\u00bf 18/1/49 \u00bf\u00bf\u00bf\u00bf\u00bf",
    "f1": "U.S. Navy 53rd. Naval Const. Batt.",
    "t0": "APPLICATION FOR HEADSTONE OR MARKER",
    "t1": "ORIGINAL"
  }
}
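A NAF annotation links boxes to text through the `id` field: each entry in `fieldBBs` and `textBBs` may have a matching key in `transcriptions`. The hypothetical sketch below pairs them and skips boxes without a transcription, such as `t2` in the sample. It keeps transcriptions verbatim, including the `\u00bf` (¿) characters.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: returns (box id, polygon, transcription) for every transcribed box.
def naf_instances(ann: dict) -> List[Tuple[str, List[List[int]], str]]:
    transcriptions: Dict[str, str] = ann.get("transcriptions", {})
    instances = []
    # Field boxes ("f0", ...) and text boxes ("t0", ...) share one id space in "transcriptions".
    for bb in ann.get("fieldBBs", []) + ann.get("textBBs", []):
        text = transcriptions.get(bb["id"])
        if text is None:
            continue  # e.g. "t2" in the sample has no transcription
        instances.append((bb["id"], bb["poly_points"], text))
    return instances


ann = {
    "fieldBBs": [
        {"poly_points": [[435, 1406], [466, 1406], [466, 1439], [435, 1439]],
         "type": "fieldCheckBox", "id": "f0", "isBlank": 1},
        {"poly_points": [[435, 1444], [469, 1444], [469, 1478], [435, 1478]],
         "type": "fieldCheckBox", "id": "f1", "isBlank": 1},
    ],
    "textBBs": [
        {"poly_points": [[1183, 1337], [2028, 1345], [2032, 1395], [1186, 1398]], "type": "text", "id": "t0"},
        {"poly_points": [[492, 1336], [809, 1338], [809, 1379], [492, 1378]], "type": "text", "id": "t1"},
        {"poly_points": [[512, 1375], [798, 1376], [798, 1405], [512, 1404]], "type": "textInst", "id": "t2"},
    ],
    "imageFilename": "007182398_00026.jpg",
    "transcriptions": {"f0": "\u00bf\u00bf\u00bf \u00bf\u00bf\u00bf 18/1/49 \u00bf\u00bf\u00bf\u00bf\u00bf",
                       "f1": "U.S. Navy 53rd. Naval Const. Batt.",
                       "t0": "APPLICATION FOR HEADSTONE OR MARKER",
                       "t1": "ORIGINAL"},
}
print([box_id for box_id, _, _ in naf_instances(ann)])  # ['f0', 'f1', 't0', 't1']
```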


C. Reference

@inproceedings{davis2019deep, title={Deep visual template-free form parsing}, author={Davis, Brian and Morse, Bryan and Cohen, Scott and Price, Brian and Tensmeyer, Chris}, booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)}, pages={134--141}, year={2019}, organization={IEEE}}

Scanned Receipts OCR and Information Extraction

“ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”, ICDAR, 2019.

A. Basic Info

  • Official Website: sroie

  • Year: 2019

  • Language: [‘English’]

  • Scene: [‘Document’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: CC BY 4.0

B. Annotation Format


Text Detection/Recognition/Spotting

# x1,y1,x2,y2,x3,y3,x4,y4,trans

72,25,326,25,326,64,72,64,TAN WOON YANN
50,82,440,82,440,121,50,121,BOOK TA .K(TAMAN DAYA) SDN BND
205,121,285,121,285,139,205,139,789417-W


C. Reference

@INPROCEEDINGS{8977955, author={Huang, Zheng and Chen, Kai and He, Jianhua and Bai, Xiang and Karatzas, Dimosthenis and Lu, Shijian and Jawahar, C. V.}, booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)}, title={ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction}, year={2019}, volume={}, number={}, pages={1516-1520}, doi={10.1109/ICDAR.2019.00244}}

Street View Text Dataset (SVT)

“Word Spotting in the Wild”, ECCV, 2010.

A. Basic Info

  • Official Website: svt

  • Year: 2010

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: N/A

B. Annotation Format


Text Detection/Recognition/Spotting

<image>
  <imageName>img/14_03.jpg</imageName>
  <address>341 Southwest 10th Avenue Portland OR</address>
  <lex>
  LIVING,ROOM,THEATERS,KENNY,ZUKE,DELICATESSEN,CLYDE,COMMON,ACE,HOTEL,PORTLAND,ROSE,CITY,BOOKS,STUMPTOWN,COFFEE,ROASTERS,RED,CAP,GARAGE,FISH,GROTTO,SEAFOOD,RESTAURANT,AURA,RESTAURANT,LOUNGE,ROCCO,PIZZA,PASTA,BUFFALO,EXCHANGE,MARK,SPENCER,LIGHT,FEZ,BALLROOM,READING,FRENZY,ROXY,SCANDALS,MARTINOTTI,CAFE,DELI,CROWSENBERG,HALF
  </lex>
  <Resolution x="1280" y="880"/>
  <taggedRectangles>
    <taggedRectangle height="75" width="236" x="375" y="253">
      <tag>LIVING</tag>
    </taggedRectangle>
    <taggedRectangle height="76" width="175" x="639" y="272">
      <tag>ROOM</tag>
    </taggedRectangle>
    <taggedRectangle height="87" width="281" x="839" y="283">
      <tag>THEATERS</tag>
    </taggedRectangle>
  </taggedRectangles>
</image>
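The SVT XML stores each word as a `taggedRectangle` with `x`, `y`, `width` and `height` attributes, plus a per-image lexicon in `<lex>`. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes `(x, y)` is the top-left corner. `parse_svt_image` is hypothetical and handles a single `<image>` element, so a file with several images would be processed one element at a time.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


# Hypothetical helper: returns (image name, lexicon, [(x1, y1, x2, y2 box, tag), ...]).
def parse_svt_image(elem: ET.Element) -> Tuple[str, List[str], List[Tuple[List[int], str]]]:
    name = elem.findtext("imageName", default="").strip()
    # The lexicon is a single comma-separated string of candidate words.
    lexicon = [w for w in elem.findtext("lex", default="").strip().split(",") if w]
    words = []
    for rect in elem.iter("taggedRectangle"):
        x, y = int(rect.get("x")), int(rect.get("y"))
        w, h = int(rect.get("width")), int(rect.get("height"))
        # Assumes (x, y) is the top-left corner; convert to [x1, y1, x2, y2].
        words.append(([x, y, x + w, y + h], rect.findtext("tag", default="").strip()))
    return name, lexicon, words


xml_text = """<image>
  <imageName>img/14_03.jpg</imageName>
  <lex>
  LIVING,ROOM,THEATERS
  </lex>
  <taggedRectangles>
    <taggedRectangle height="75" width="236" x="375" y="253"><tag>LIVING</tag></taggedRectangle>
    <taggedRectangle height="87" width="281" x="839" y="283"><tag>THEATERS</tag></taggedRectangle>
  </taggedRectangles>
</image>"""
name, lexicon, words = parse_svt_image(ET.fromstring(xml_text))
print(name, lexicon, words)
```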


C. Reference

@inproceedings{wang2010word, title={Word spotting in the wild}, author={Wang, Kai and Belongie, Serge}, booktitle={European conference on computer vision}, pages={591--604}, year={2010}, organization={Springer}}

Street View Text Perspective (SVT-P)

“Recognizing Text with Perspective Distortion in Natural Scenes”, ICCV, 2013.

A. Basic Info

  • Official Website: svtp

  • Year: 2013

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textrecog’]

  • License: N/A

B. Annotation Format


Text Recognition

13_15_0_par.jpg WYNDHAM
13_15_1_par.jpg HOTEL
12_16_0_par.jpg UNITED


C. Reference

@inproceedings{phan2013recognizing, title={Recognizing text with perspective distortion in natural scenes}, author={Phan, Trung Quy and Shivakumara, Palaiahnakote and Tian, Shangxuan and Tan, Chew Lim}, booktitle={Proceedings of the IEEE International Conference on Computer Vision}, pages={569--576}, year={2013}}

TextOCR

“TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text”, CVPR, 2021.

A. Basic Info

  • Official Website: textocr

  • Year: 2021

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: CC BY 4.0

B. Annotation Format


Text Detection/Recognition/Spotting

{
  "imgs": {
    "OpenImages_ImageID_1": {
      "id": "OpenImages_ImageID_1",
      "width": "INT, Width of the image",
      "height": "INT, Height of the image",
      "set": "Split train|val|test",
      "filename": "train|test/OpenImages_ImageID_1.jpg"
    },
    "OpenImages_ImageID_2": {
      "...": "..."
    }
  },
  "anns": {
    "OpenImages_ImageID_1_1": {
      "id": "STR, OpenImages_ImageID_1_1, Specifies the nth annotation for an image",
      "image_id": "OpenImages_ImageID_1",
      "bbox": [
        "FLOAT x1",
        "FLOAT y1",
        "FLOAT x2",
        "FLOAT y2"
      ],
      "points": [
        "FLOAT x1",
        "FLOAT y1",
        "FLOAT x2",
        "FLOAT y2",
        "...",
        "FLOAT xN",
        "FLOAT yN"
      ],
      "utf8_string": "text for this annotation",
      "area": "FLOAT, area of this box"
    },
    "OpenImages_ImageID_1_2": {
      "...": "..."
    },
    "OpenImages_ImageID_2_1": {
      "...": "..."
    }
  },
  "img2Anns": {
    "OpenImages_ImageID_1": [
      "OpenImages_ImageID_1_1",
      "OpenImages_ImageID_1_2",
      "..."
    ],
    "OpenImages_ImageID_N": [
      "..."
    ]
  }
}


C. Reference

@inproceedings{singh2021textocr, title={{TextOCR}: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text}, author={Singh, Amanpreet and Pang, Guan and Toh, Mandy and Huang, Jing and Galuba, Wojciech and Hassner, Tal}, booktitle={The Conference on Computer Vision and Pattern Recognition}, year={2021}}

Total-Text

“Total-Text: Towards Orientation Robustness in Scene Text Detection”, IJDAR, 2020.

A. Basic Info

  • Official Website: totaltext

  • Year: 2020

  • Language: [‘English’]

  • Scene: [‘Natural Scene’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: BSD-3

B. Annotation Format


Text Detection/Spotting

x: [[259 313 389 427 354 302]], y: [[542 462 417 459 507 582]], ornt: [u'c'], transcriptions: [u'PAUL']
x: [[400 478 494 436]], y: [[398 380 448 465]], ornt: [u'#'], transcriptions: [u'#']
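Total-Text stores separate x and y arrays for each polygon, so a loader has to zip them back into vertices. Below is a regex-based Python sketch for the line format above. The pattern is an assumption fitted to these two sample lines; for example, it expects `u'...'` quoting. Reading `c` as curved text and `#` as a don't-care region is likewise an assumption. `parse_totaltext_line` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed pattern for one line of the text-format ground truth shown above.
TT_LINE = re.compile(
    r"x: \[\[(?P<x>[^\]]*)\]\], y: \[\[(?P<y>[^\]]*)\]\], "
    r"ornt: \[u'(?P<ornt>[^']*)'\], transcriptions: \[u'(?P<text>.*)'\]$"
)


# Hypothetical helper: returns ([(x, y), ...], orientation, transcription).
def parse_totaltext_line(line: str) -> Tuple[List[Tuple[int, int]], str, str]:
    m = TT_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised Total-Text line: {line!r}")
    xs = [int(v) for v in m.group("x").split()]
    ys = [int(v) for v in m.group("y").split()]
    # Pair the separate x and y arrays into (x, y) polygon vertices.
    return list(zip(xs, ys)), m.group("ornt"), m.group("text")


points, ornt, text = parse_totaltext_line(
    "x: [[259 313 389 427 354 302]], y: [[542 462 417 459 507 582]], "
    "ornt: [u'c'], transcriptions: [u'PAUL']")
print(points[:2], ornt, text)  # [(259, 542), (313, 462)] c PAUL
```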


C. Reference

@article{CK2019, author = {Chee Kheng Chng and Chee Seng Chan and Chenglin Liu}, title = {Total-Text: Towards Orientation Robustness in Scene Text Detection}, journal = {International Journal on Document Analysis and Recognition (IJDAR)}, volume = {23}, pages = {31-52}, year = {2020}, doi = {10.1007/s10032-019-00334-z}}

WildReceipt

“Spatial Dual-Modality Graph Reasoning for Key Information Extraction”, arXiv, 2021.

A. Basic Info

  • Official Website: wildreceipt

  • Year: 2021

  • Language: [‘English’]

  • Scene: [‘Receipt’]

  • Annotation Granularity: [‘Word’]

  • Supported Tasks: [‘kie’, ‘textdet’, ‘textrecog’, ‘textspotting’]

  • License: N/A

B. Annotation Format


KIE

// Closed Set
{
  "file_name": "image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg",
  "height": 1200,
  "width": 1600,
  "annotations":
    [
      {
        "box": [550.0, 190.0, 937.0, 190.0, 937.0, 104.0, 550.0, 104.0],
        "text": "SAFEWAY",
        "label": 1
      },
      {
        "box": [1048.0, 211.0, 1074.0, 211.0, 1074.0, 196.0, 1048.0, 196.0],
        "text": "TM",
        "label": 25
      }
    ], //...
}

// Open Set
{
  "file_name": "image_files/Image_12/10/845be0dd6f5b04866a2042abd28d558032ef2576.jpeg",
  "height": 348,
  "width": 348,
  "annotations":
    [
      {
        "box": [114.0, 19.0, 230.0, 19.0, 230.0, 1.0, 114.0, 1.0],
        "text": "CHOEUN",
        "label": 2,
        "edge": 1
      },
      {
        "box": [97.0, 35.0, 236.0, 35.0, 236.0, 19.0, 97.0, 19.0],
        "text": "KOREANRESTAURANT",
        "label": 2,
        "edge": 1
      }
    ]
}
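Each WildReceipt annotation stores a quadrilateral as eight numbers, a transcription and an integer class `label`. The sample lists the `y = 190` corners before the `y = 104` ones, so taking the min and max of the coordinates is safer than assuming a corner order. The hypothetical sketch below assumes one JSON record per line of the annotation file and does not use the open-set `edge` field.

```python
import json
from typing import List, Tuple


# Hypothetical helper: returns (file name, [(x1, y1, x2, y2 box, text, label), ...]).
def wildreceipt_instances(line: str) -> Tuple[str, List[Tuple[List[float], str, int]]]:
    record = json.loads(line)
    instances = []
    for ann in record["annotations"]:
        # "box" is a flat list of four (x, y) corners; min/max gives an axis-aligned box.
        xs, ys = ann["box"][0::2], ann["box"][1::2]
        instances.append(([min(xs), min(ys), max(xs), max(ys)], ann["text"], ann["label"]))
    return record["file_name"], instances


line = json.dumps({
    "file_name": "image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg",
    "height": 1200, "width": 1600,
    "annotations": [
        {"box": [550.0, 190.0, 937.0, 190.0, 937.0, 104.0, 550.0, 104.0], "text": "SAFEWAY", "label": 1},
        {"box": [1048.0, 211.0, 1074.0, 211.0, 1074.0, 196.0, 1048.0, 196.0], "text": "TM", "label": 25},
    ],
})
name, instances = wildreceipt_instances(line)
print(instances[0])  # ([550.0, 104.0, 937.0, 190.0], 'SAFEWAY', 1)
```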


C. Reference

@article{sun2021spatial, title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction}, author={Sun, Hongbin and Kuang, Zhanghui and Yue, Xiaoyu and Lin, Chenhao and Zhang, Wayne}, journal={arXiv preprint arXiv:2103.14470}, year={2021} } 