Overview

Supported Datasets

Dataset Name	Text Detection	Text Recognition	Text Spotting	KIE
cocotextv2	✓	✓	✓
ctw1500	✓	✓	✓
cute80		✓
funsd	✓	✓	✓
icdar2013	✓	✓	✓
icdar2015	✓	✓	✓
iiit5k		✓
mjsynth		✓
naf	✓	✓	✓
sroie	✓	✓	✓
svt	✓	✓	✓
svtp		✓
synthtext	✓	✓	✓
textocr	✓	✓	✓
totaltext	✓	✓	✓
wildreceipt	✓	✓	✓	✓

Dataset Details

COCO Text v2

“COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images”, arXiv, 2016. PDF

A. Basic Info

Official Website: cocotextv2
Year: 2016
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: CC BY 4.0

B. Annotation Format

Text Detection/Spotting

{
  "cats": {},
  "anns": {
      "45346": {
          "mask":[468.9,286.7,468.9,295.2,493.0,295.8,493.0,287.2],
          "class":"machine printed",
          "bbox":[468.9,286.7,24.1,9.1],
          "image_id":522579,
          "id":167312,
          "language":"english",
          "area":55.5,
          "utf8_string":"the",
          "legibility":"legible"
      },
      // ...
  },
  "imgs": {
      "522579": {
          "file_name":"COCO_train2014_000000522579.jpg",
          "height":476,
          "width":640,
          "id":522579,
          "set":"train"
      },
      // ...
  },
  "imgToAnns": {
      "522579": [167294, 167295, 167296, 167297, 167298, 167299, 167300, 167301, 167302, 167303, 167304, 167305, 167306, 167307, 167308, 167309, 167310, 167311, 167312, 167313, 167314, 167315, 167316, 167317],
      // ...
  },
  "info": {}
}
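A minimal sketch of grouping the word annotations above by image file. It reads `image_id` from each annotation instead of relying on the `anns` keys, and converts the COCO-style `bbox` ([x, y, width, height]) into corner coordinates. The function names are hypothetical, and treating every non-"legible" word as an ignore region is an assumption, not part of the dataset definition.

```python
import json
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def group_cocotext(data):
    """Group COCO-Text v2 word annotations by image file name (hypothetical helper)."""
    grouped = defaultdict(list)
    for ann in data["anns"].values():
        # "image_id" is an integer, but the "imgs" dict is keyed by strings.
        img = data["imgs"][str(ann["image_id"])]
        grouped[img["file_name"]].append({
            "polygon": ann["mask"],           # flat x1, y1, ..., x4, y4
            "bbox": xywh_to_xyxy(ann["bbox"]),
            "text": ann["utf8_string"],
            # Assumption: anything not marked "legible" is treated as ignore.
            "ignore": ann["legibility"] != "legible",
        })
    return dict(grouped)


def load_cocotext(path):
    with open(path, encoding="utf-8") as f:
        return group_cocotext(json.load(f))
```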

C. Reference

@article{veit2016coco, title={Coco-text: Dataset and benchmark for text detection and recognition in natural images}, author={Veit, Andreas and Matera, Tomas and Neumann, Lukas and Matas, Jiri and Belongie, Serge}, journal={arXiv preprint arXiv:1601.07140}, year={2016}}

CTW1500

“Curved scene text detection via transverse and longitudinal sequence connection”, PR, 2019. PDF

A. Basic Info

Official Website: ctw1500
Year: 2019
Language: ['English']
Scene: ['Scene']
Annotation Granularity: ['Word', 'Line']
Supported Tasks: ['textrecog', 'textdet', 'textspotting']
License: N/A

B. Annotation Format

C. Reference

@article{liu2019curved, title={Curved scene text detection via transverse and longitudinal sequence connection}, author={Liu, Yuliang and Jin, Lianwen and Zhang, Shuaitao and Luo, Canjie and Zhang, Sheng}, journal={Pattern Recognition}, volume={90}, pages={337--345}, year={2019}, publisher={Elsevier} }

CUTE80

“A Robust Arbitrary Text Detection System for Natural Scene Images”, ESWA, 2014. PDF

A. Basic Info

Official Website: cute80
Year: 2014
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textrecog']
License: N/A

B. Annotation Format

Text Recognition

# timage/img_name text 1 text

timage/001.jpg RONALDO 1 RONALDO
timage/002.jpg 7 1 7
timage/003.jpg SEACREST 1 SEACREST
timage/004.jpg BEACH 1 BEACH

C. Reference

@article{risnumawan2014robust, title={A robust arbitrary text detection system for natural scene images}, author={Risnumawan, Anhar and Shivakumara, Palaiahnakote and Chan, Chee Seng and Tan, Chew Lim}, journal={Expert Systems with Applications}, volume={41}, number={18}, pages={8027--8048}, year={2014}, publisher={Elsevier}}

FUNSD

“FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”, ICDAR, 2019. PDF

A. Basic Info

Official Website: funsd
Year: 2019
Language: ['English']
Scene: ['Document']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: FUNSD License

B. Annotation Format

Text Detection/Recognition/Spotting

{
  "form": [
    {
      "id": 0,
      "text": "Registration No.",
      "box": [
          94,
          169,
          191,
          186
      ],
      "linking": [
          [
              0,
              1
          ]
      ],
      "label": "question",
      "words": [
          {
              "text": "Registration",
              "box": [
                  94,
                  169,
                  168,
                  186
              ]
          },
          {
              "text": "No.",
              "box": [
                  170,
                  169,
                  191,
                  183
              ]
          }
      ]
    },
    {
      "id": 1,
      "text": "533",
      "box": [
          209,
          169,
          236,
          182
      ],
      "label": "answer",
      "words": [
          {
              "box": [
                  209,
                  169,
                  236,
                  182
              ],
              "text": "533"
          }
      ],
      "linking": [
          [
              0,
              1
          ]
      ]
    }
  ]
}
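The sketch below flattens the entity list into word-level records and resolves the `linking` pairs to entity texts. Boxes are [x1, y1, x2, y2] (left, top, right, bottom). In the sample, the same pair [0, 1] appears on both linked entities, so pairs are deduplicated first. The helper names are hypothetical.

```python
import json


def funsd_words(form):
    """Flatten a FUNSD "form" list into word-level records.

    Each word inherits the label of the entity it belongs to.
    """
    return [
        {"box": word["box"], "text": word["text"], "label": entity["label"]}
        for entity in form
        for word in entity["words"]
    ]


def funsd_links(form):
    """Return linked (text, text) pairs from the "linking" field."""
    by_id = {entity["id"]: entity for entity in form}
    # A link can be listed on both entities, so collect unique id pairs.
    pairs = {tuple(link) for entity in form for link in entity["linking"]}
    return [(by_id[a]["text"], by_id[b]["text"]) for a, b in sorted(pairs)]


def load_funsd(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)["form"]
```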

C. Reference

@inproceedings{jaume2019, title = {FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents}, author = {Guillaume Jaume and Hazim Kemal Ekenel and Jean-Philippe Thiran}, booktitle = {Accepted to ICDAR-OST}, year = {2019}}

Incidental Scene Text IC13

“ICDAR 2013 Robust Reading Competition”, ICDAR, 2013. PDF

A. Basic Info

Official Website: icdar2013
Year: 2013
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: N/A

B. Annotation Format

Text Detection

# train split
# x1 y1 x2 y2 "transcript"

158 128 411 181 "Footpath"
443 128 501 169 "To"
64 200 363 243 "Colchester"

# test split
# x1, y1, x2, y2, "transcript"

38, 43, 920, 215, "Tiredness"
275, 264, 665, 450, "kills"
0, 699, 77, 830, "A"

Text Recognition

# img_name, "text"

word_1.png, "PROPER"
word_2.png, "FOOD"
word_3.png, "PRONTO"
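The two detection splits differ only in their separators (spaces in the train split, commas plus spaces in the test split), so a single regular expression can accept both. This is a sketch with hypothetical function names; it assumes each transcript is wrapped in double quotes, as in the samples.

```python
import re

# Four integers separated by spaces and/or commas, then a quoted transcript.
_IC13_DET = re.compile(r'^\s*(\d+)[,\s]+(\d+)[,\s]+(\d+)[,\s]+(\d+)[,\s]+"(.*)"\s*$')
# Image name, a comma, then a quoted transcript.
_IC13_REC = re.compile(r'^\s*([^,]+),\s*"(.*)"\s*$')


def parse_ic13_det(line):
    """Return ((x1, y1, x2, y2), transcript) for a train or test line."""
    m = _IC13_DET.match(line)
    if m is None:
        raise ValueError(f"unrecognized detection line: {line!r}")
    x1, y1, x2, y2 = (int(v) for v in m.groups()[:4])
    return (x1, y1, x2, y2), m.group(5)


def parse_ic13_rec(line):
    """Return (image name, transcript) for a recognition line."""
    m = _IC13_REC.match(line)
    if m is None:
        raise ValueError(f"unrecognized recognition line: {line!r}")
    return m.group(1), m.group(2)
```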

C. Reference

@inproceedings{karatzas2013icdar, title={ICDAR 2013 robust reading competition}, author={Karatzas, Dimosthenis and Shafait, Faisal and Uchida, Seiichi and Iwamura, Masakazu and i Bigorda, Lluis Gomez and Mestre, Sergi Robles and Mas, Joan and Mota, David Fernandez and Almazan, Jon Almazan and De Las Heras, Lluis Pere}, booktitle={2013 12th international conference on document analysis and recognition}, pages={1484--1493}, year={2013}, organization={IEEE}}

Incidental Scene Text IC15

“ICDAR 2015 Competition on Robust Reading”, ICDAR, 2015. PDF

A. Basic Info

Official Website: icdar2015
Year: 2015
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: CC BY 4.0

B. Annotation Format

Text Detection

# x1,y1,x2,y2,x3,y3,x4,y4,trans

377,117,463,117,465,130,378,130,Genaxis Theatre
493,115,519,115,519,131,493,131,[06]
374,155,409,155,409,170,374,170,###

Text Recognition

# img_name, "text"

word_1.png, "Genaxis Theatre"
word_2.png, "[06]"
word_3.png, "62-03"
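Because a transcript may itself contain commas, a detection line is best split at most eight times, leaving everything after the eighth coordinate as the text. The sketch also flags the `###` placeholder (an illegible, "do not care" region) and opens files with `utf-8-sig`, which strips a byte-order mark if one is present. Function names are hypothetical.

```python
def parse_ic15_det(line):
    """Parse 'x1,y1,...,x4,y4,transcript' into (quad, text, ignore)."""
    # Split at most 8 times so commas inside the transcript are preserved.
    parts = line.rstrip("\r\n").split(",", 8)
    quad = [int(v) for v in parts[:8]]
    text = parts[8]
    # "###" marks an illegible region that should be ignored.
    return quad, text, text == "###"


def parse_ic15_rec(line):
    """Parse 'word_1.png, "text"' into (image name, text)."""
    name, quoted = line.rstrip("\r\n").split(", ", 1)
    return name, quoted[1:-1]  # drop the surrounding double quotes


def load_ic15_det(path):
    # "utf-8-sig" removes a leading byte-order mark, if the file has one.
    with open(path, encoding="utf-8-sig") as f:
        return [parse_ic15_det(line) for line in f if line.strip()]
```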

C. Reference

@inproceedings{karatzas2015icdar, title={ICDAR 2015 competition on robust reading}, author={Karatzas, Dimosthenis and Gomez-Bigorda, Lluis and Nicolaou, Anguelos and Ghosh, Suman and Bagdanov, Andrew and Iwamura, Masakazu and Matas, Jiri and Neumann, Lukas and Chandrasekhar, Vijay Ramaseshan and Lu, Shijian and others}, booktitle={2015 13th international conference on document analysis and recognition (ICDAR)}, pages={1156--1160}, year={2015}, organization={IEEE}}

IIIT5K

“Scene Text Recognition using Higher Order Language Priors”, BMVC, 2012. PDF

A. Basic Info

Official Website: iiit5k
Year: 2012
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textrecog']
License: N/A

B. Annotation Format

Text Recognition

# img_name text

train/1009_2.png You
train/1017_1.png Rescue
train/1017_2.png mission

C. Reference

@InProceedings{MishraBMVC12, author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.", title     = "Scene Text Recognition using Higher Order Language Priors", booktitle = "BMVC", year      = "2012"}

Synthetic Word Dataset (MJSynth/Syn90k)

“Reading Text in the Wild with Convolutional Neural Networks”, International Journal of Computer Vision, 2016. PDF

A. Basic Info

Official Website: mjsynth
Year: 2016
Language: ['English']
Scene: ['Synthesis']
Annotation Granularity: ['Word']
Supported Tasks: ['textrecog']
License: N/A

B. Annotation Format

Text Recognition

./3000/7/182_slinking_71711.jpg 71711
./3000/7/182_REMODELERS_64541.jpg 64541
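The second column of each line is an index into the MJSynth lexicon, not the transcription; the word itself is embedded in the file name as `<n>_<word>_<lexicon index>.jpg`. A minimal sketch (hypothetical function name) that recovers it:

```python
from pathlib import PurePosixPath


def parse_mjsynth_line(line):
    """Split an annotation line into (relative path, lexicon index, word)."""
    rel_path, lex_idx = line.split()
    stem = PurePosixPath(rel_path).stem  # e.g. "182_slinking_71711"
    # The word is everything between the first and the last underscore.
    word = stem[stem.index("_") + 1:stem.rindex("_")]
    return rel_path, int(lex_idx), word
```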

C. Reference

@InProceedings{Jaderberg14c, author       = "Max Jaderberg and Karen Simonyan and Andrea Vedaldi and Andrew Zisserman", title        = "Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition", booktitle    = "Workshop on Deep Learning, NIPS", year         = "2014", }
@Article{Jaderberg16, author       = "Max Jaderberg and Karen Simonyan and Andrea Vedaldi and Andrew Zisserman", title        = "Reading Text in the Wild with Convolutional Neural Networks", journal      = "International Journal of Computer Vision", number       = "1", volume       = "116", pages        = "1--20", month        = "jan", year         = "2016", }

NAF

“Deep Visual Template-Free Form Parsing”, ICDAR, 2019. PDF

A. Basic Info

Official Website: naf
Year: 2019
Language: ['English']
Scene: ['Document', 'Handwritten']
Annotation Granularity: ['Word', 'Line']
Supported Tasks: ['textrecog', 'textdet', 'textspotting']
License: CDLA

B. Annotation Format

Text Detection/Recognition/Spotting

{"fieldBBs": [{"poly_points": [[435, 1406], [466, 1406], [466, 1439], [435, 1439]], "type": "fieldCheckBox", "id": "f0", "isBlank": 1}, {"poly_points": [[435, 1444], [469, 1444], [469, 1478], [435, 1478]], "type": "fieldCheckBox", "id": "f1", "isBlank": 1}],
 "textBBs": [{"poly_points": [[1183, 1337], [2028, 1345], [2032, 1395], [1186, 1398]], "type": "text", "id": "t0"}, {"poly_points": [[492, 1336], [809, 1338], [809, 1379], [492, 1378]], "type": "text", "id": "t1"}, {"poly_points": [[512, 1375], [798, 1376], [798, 1405], [512, 1404]], "type": "textInst", "id": "t2"}], "imageFilename": "007182398_00026.jpg", "transcriptions": {"f0": "\u00bf\u00bf\u00bf \u00bf\u00bf\u00bf 18/1/49 \u00bf\u00bf\u00bf\u00bf\u00bf", "f1": "U.S. Navy 53rd. Naval Const. Batt.", "t0": "APPLICATION FOR HEADSTONE OR MARKER", "t1": "ORIGINAL"}}
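Boxes and transcriptions live in separate structures and are joined through the box `id`. Not every box has a transcription (in the sample, `t2` has none), so the sketch below returns `None` for those and leaves the filtering decision to the caller. Function names are hypothetical.

```python
import json


def naf_instances(ann):
    """Pair each NAF text/field box with its transcription via the shared id."""
    transcriptions = ann["transcriptions"]
    instances = []
    for key in ("textBBs", "fieldBBs"):
        for bb in ann[key]:
            instances.append({
                "id": bb["id"],
                "type": bb["type"],
                "polygon": bb["poly_points"],  # list of [x, y] points
                # None when the box has no entry in "transcriptions".
                "text": transcriptions.get(bb["id"]),
            })
    return instances


def load_naf(path):
    with open(path, encoding="utf-8") as f:
        return naf_instances(json.load(f))
```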

C. Reference

@inproceedings{davis2019deep, title={Deep visual template-free form parsing}, author={Davis, Brian and Morse, Bryan and Cohen, Scott and Price, Brian and Tensmeyer, Chris}, booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)}, pages={134--141}, year={2019}, organization={IEEE}}

Scanned Receipts OCR and Information Extraction

“ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”, ICDAR, 2019. PDF

A. Basic Info

Official Website: sroie
Year: 2019
Language: ['English']
Scene: ['Document']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: CC BY 4.0

B. Annotation Format

Text Detection, Text Recognition and Text Spotting

# x1,y1,x2,y2,x3,y3,x4,y4,trans

72,25,326,25,326,64,72,64,TAN WOON YANN
50,82,440,82,440,121,50,121,BOOK TA .K(TAMAN DAYA) SDN BND
205,121,285,121,285,139,205,139,789417-W

C. Reference

@INPROCEEDINGS{8977955, author={Huang, Zheng and Chen, Kai and He, Jianhua and Bai, Xiang and Karatzas, Dimosthenis and Lu, Shijian and Jawahar, C. V.}, booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)}, title={ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction}, year={2019}, volume={}, number={}, pages={1516-1520}, doi={10.1109/ICDAR.2019.00244}}

Street View Text Dataset (SVT)

“Word Spotting in the Wild”, ECCV, 2010. PDF

A. Basic Info

Official Website: svt
Year: 2010
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: N/A

B. Annotation Format

Text Detection/Recognition/Spotting

<image>
  <imageName>img/14_03.jpg</imageName>
  <address>341 Southwest 10th Avenue Portland OR</address>
  <lex>
  LIVING,ROOM,THEATERS,KENNY,ZUKE,DELICATESSEN,CLYDE,COMMON,ACE,HOTEL,PORTLAND,ROSE,CITY,BOOKS,STUMPTOWN,COFFEE,ROASTERS,RED,CAP,GARAGE,FISH,GROTTO,SEAFOOD,RESTAURANT,AURA,RESTAURANT,LOUNGE,ROCCO,PIZZA,PASTA,BUFFALO,EXCHANGE,MARK,SPENCER,LIGHT,FEZ,BALLROOM,READING,FRENZY,ROXY,SCANDALS,MARTINOTTI,CAFE,DELI,CROWSENBERG,HALF
  </lex>
  <Resolution x="1280" y="880"/>
  <taggedRectangles>
    <taggedRectangle height="75" width="236" x="375" y="253">
      <tag>LIVING</tag>
    </taggedRectangle>
    <taggedRectangle height="76" width="175" x="639" y="272">
      <tag>ROOM</tag>
    </taggedRectangle>
    <taggedRectangle height="87" width="281" x="839" y="283">
      <tag>THEATERS</tag>
    </taggedRectangle>
  </taggedRectangles>
</image>
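Each `taggedRectangle` stores a position plus `width` and `height`; the sketch below treats (x, y) as the top-left corner and converts it to corner coordinates with the standard-library ElementTree parser. `Element.iter` also visits the element itself, so the function accepts either a single `<image>` element, as in the sample, or a root element wrapping several of them. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_svt(xml_text):
    """Return {imageName: [(x1, y1, x2, y2, word), ...]} from SVT XML."""
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.iter("image"):
        boxes = []
        for rect in image.iter("taggedRectangle"):
            x, y = int(rect.get("x")), int(rect.get("y"))
            w, h = int(rect.get("width")), int(rect.get("height"))
            boxes.append((x, y, x + w, y + h, rect.findtext("tag")))
        result[image.findtext("imageName")] = boxes
    return result
```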

C. Reference

@inproceedings{wang2010word, title={Word spotting in the wild}, author={Wang, Kai and Belongie, Serge}, booktitle={European conference on computer vision}, pages={591--604}, year={2010}, organization={Springer}}

Street View Text Perspective (SVT-P)

“Recognizing Text with Perspective Distortion in Natural Scenes”, ICCV, 2013. PDF

A. Basic Info

Official Website: svtp
Year: 2013
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textrecog']
License: N/A

B. Annotation Format

Text Recognition

13_15_0_par.jpg WYNDHAM
13_15_1_par.jpg HOTEL
12_16_0_par.jpg UNITED

C. Reference

@inproceedings{phan2013recognizing, title={Recognizing text with perspective distortion in natural scenes}, author={Phan, Trung Quy and Shivakumara, Palaiahnakote and Tian, Shangxuan and Tan, Chew Lim}, booktitle={Proceedings of the IEEE International Conference on Computer Vision}, pages={569--576}, year={2013}}

SynthText in the Wild Dataset

“Synthetic Data for Text Localisation in Natural Images”, CVPR, 2016. PDF

A. Basic Info

Official Website: synthtext
Year: 2016
Language: ['English']
Scene: ['Synthesis']
Annotation Granularity: ['Word', 'Character']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: SynthText Custom

B. Annotation Format

Text Detection/Recognition/Spotting

{
    "imnames": [['8/ballet_106_0.jpg', ...]],
    "wordBB": [[[420.58957   418.85016   448.08478   410.3094    117.745026
                322.30963   322.6857    159.09138   154.27284   260.14597
                431.9315    427.52274   296.86508    99.56819   108.96211  ]
               [512.3321    431.88342   519.4515    499.81183   179.0544
                377.97382   376.4993    203.64464   193.77492   313.61514
                487.58023   484.64633   365.83176   142.49403   144.90457  ]
               [511.92203   428.7077    518.7375    499.0373    172.1684
                378.35858   377.2078    203.3191    193.0739    319.69186
                485.6758    482.571     365.76303   142.31898   144.43858  ]
               [420.1795    415.67444   447.3708    409.53485   110.859024
                322.6944    323.3942    158.76585   153.57182   266.2227
                430.02707   425.44742   296.79636    99.39314   108.49613  ]]

              [[ 21.06382    46.19922    47.570374   73.95366   197.17792
                  9.993624   48.437763    9.064571   49.659035  208.57095
                118.41646   162.82489    29.548729    5.800581   28.812992 ]
               [ 23.069519   48.254295   50.130234   77.18146   208.71487
                  8.999153   46.69632     9.698633   50.869553  203.25742
                122.64043   168.38647    29.660484    6.2558594  29.602367 ]
               [ 41.827087   68.39458    70.03627    98.65903   245.30832
                 30.534437   68.589294   32.57161    73.74529   264.40634
                147.7303    189.70224    72.08       22.759935   50.81941  ]
               [ 39.82139    66.3395     67.47641    95.43123   233.77136
                 31.528908   70.33074    31.937548   72.534775  269.71988
                143.50633   184.14066    71.96825    22.304657   50.030033 ]], ...],
    "charBB": [[[423.16126397 439.60847343 450.66887979 466.31976402 479.76190495
                504.59927448 418.80489444 450.13965942 464.16775197 480.46891089
                502.46437709 413.02373632 433.01396211 446.7222192  470.28467827
                482.51674486 116.52285438 139.51408587 150.7448586  162.03366629
                322.84717946 333.54881536 343.28386485 363.07416389 323.48968759
                337.98503283 356.66355903 160.48517048 174.1707753  189.64454066
                155.7637383  167.45490471 179.63644201 262.2183876  271.75848874
                284.05396524 298.26103738 432.8464733  449.15387392 468.07231897
                428.11482147 445.61538159 469.24565878 296.86441324 323.6603118
                344.09880401 101.14677814 110.45423597 120.54555495 131.18342618
                132.20545124 110.01673682 120.83144568 131.35885673]
               [438.2997574  452.61288403 466.31976402 482.22585715 498.3934528
                512.20555863 431.88338084 466.11639619 481.73414937 499.62012025
                519.36789779 432.51717267 449.23571387 465.73425964 484.45139112
                499.59056304 140.27413679 149.59811175 160.13352083 169.59504507
                333.55849014 344.33923741 361.08275796 378.09844418 339.92898685
                355.57692063 376.51230484 174.1707753  189.07871028 203.64462646
                165.22739457 181.27572412 193.60260894 270.99557614 283.13281739
                298.75499435 313.61511672 447.1421735  470.27065563 487.02126631
                446.97485257 468.98979567 484.64633864 317.88691577 341.16094163
                365.8300006  111.15280603 120.54555495 130.72086821 135.27663717
                142.4726875  120.1331955  133.07976304 144.75919258]
               [435.54895424 449.95797159 464.5848793  480.68235876 497.04793842
                511.1101386  428.95660757 463.61882066 480.14247127 498.2535215
                518.03243928 429.36600266 447.19056345 463.89483785 482.21016814
                498.18529977 142.63162835 152.55587851 162.80539142 172.21885945
                333.35620309 344.09880401 360.86201193 377.82379299 339.7646859
                355.37508239 376.1110999  172.46032372 187.37816388 201.39094518
                163.04321987 178.99078221 191.89681939 275.3073355  286.08373072
                301.85539131 318.57227103 444.54207279 467.53925436 485.27070558
                444.57367155 466.90671029 482.56302723 317.62908407 340.9131681
                365.44465854 109.40501176 119.4999228  129.67892444 134.35253232
                140.97421069 118.61779828 131.34019115 143.25688164]
               [420.17946701 436.74150236 448.74896556 464.5848793  478.18853922
                503.4152019  415.67442461 447.3707845  462.35927516 478.8614766
                500.86810735 409.54560397 430.77026495 444.64606264 467.79077782
                480.89051912 119.14629674 142.63162835 153.56593297 164.78799774
                322.69436747 333.35620309 343.11884239 362.84714115 323.37931952
                337.83763574 356.35573621 158.76583616 172.46032372 187.37816388
                153.57183805 165.15781218 177.92125239 266.22269514 274.45156305
                286.82608962 302.69695881 430.02705241 446.01814255 466.05208347
                425.44741792 443.19481667 466.90671029 296.79634428 323.49707084
                343.82488703  99.39315359 109.40501176 119.4999228  130.25798537
                130.70149005 108.49612777 119.08444238 129.84935461]]

              [[ 22.26958901  21.60559248  27.0241972   27.25747678  27.45783459
                 28.73896576  47.91255579  47.80732383  53.77711568  54.24219042
                 52.00169325  74.79043429  80.45929285  81.04748707  76.11658669
                 82.58335942 203.67278213 201.2743445  205.59358622 205.51198143
                 10.06536976  10.82312635  16.77203865  16.31842372  54.80444433
                 54.66492     47.33822371  15.08534083  15.18716407   9.62607092
                 51.06813224  50.18928243  56.16019366 220.78902143 236.08062638
                231.69267533 209.73652786 124.25352842 119.99631725 128.73732717
                165.78411123 167.31764153 167.05531699  29.97351822  31.5116502
                 31.14650552   5.88513488  12.51324147  12.57920537   8.21515307
                  8.21998849  35.66412031  29.17945741  36.00660903]
               [ 22.46075572  21.76391911  27.25747678  27.49456029  27.73554156
                 28.85582217  48.25428361  48.21714995  54.27828788  54.78857757
                 52.4595556   75.57743634  81.15533616  81.86325615  76.681392
                 83.31596322 210.04771309 203.83983042 208.00417391 207.41791524
                  9.79265706  10.55231862  16.36406888  15.97405105  54.64620856
                 54.49559004  47.09756263  15.18716407  15.29808166   9.69862498
                 51.27597632  50.48652154  56.49239954 216.92183074 232.02141018
                226.44624213 203.25738931 125.19349641 121.32658508 130.00428964
                167.43676857 169.36588297 168.38645076  29.58279603  31.19899202
                 30.75826599   5.92344996  12.57920537  12.64571832   8.23451892
                  8.26856497  35.82646468  29.342662    36.22165159]
               [ 40.15739982  40.47241401  40.79219178  41.14411963  41.50190876
                 41.80934074  66.81590976  68.05921213  68.6519006   69.30152766
                 70.01097963  96.14641662  96.04484417  96.89110144  97.81897661
                 98.62829468 237.26055111 240.35280825 243.54641271 245.04022528
                 31.33842788  31.14650552  30.84702178  30.54399042  69.80098672
                 68.7212013   68.62479627  32.13243303  32.34474067  32.54416771
                 72.82501686  73.31372392  73.70922459 267.74318222 265.39839711
                259.52741156 253.14023308 144.60810334 145.23371653 147.69958337
                186.00278322 188.17713786 189.70144388  71.89351759  53.62266986
                 54.40060855  22.41084398  22.51791234  22.62587258  17.11356079
                 22.74567232  50.25232032  46.05692507  50.79345235]
               [ 39.82138755  40.18347166  40.44598236  40.79219178  41.08959901
                 41.64111176  66.33948982  67.47640971  68.01403337  68.60595247
                 69.3953105   95.13188979  95.21297344  95.91593691  97.08847413
                 97.75212171 229.94285119 237.26055111 240.66752705 242.74145162
                 31.52890731  31.33842788  31.16401306  30.81155638  69.87135926
                 68.80273568  68.71664209  31.93753588  32.13243303  32.34474067
                 72.53476992  72.88981775  73.28094858 269.71986636 267.92938572
                262.93698624 256.88902439 143.50635029 143.61251781 146.24080653
                184.14064261 185.86853729 188.17713786  71.96823746  53.79651809
                 54.60870874  22.30465649  22.41084398  22.51791234  17.07939535
                 22.63671808  50.03002471  45.81009198  50.49899163]], ...],
    "txt": [['Lines:\nI lost\nKevin ' 'will                ' 'line\nand            '
              'and\nthe             ' '(and                ' 'the\nout             '
              'you                 ' "don't\n pkg          "], ...]
}
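Per image, `wordBB` has shape 2 x 4 x N: the first axis selects x or y, the second the corner, and the third the word. The transcriptions in `txt` are whitespace-separated strings that must be split into words; for the sample above this yields 15 words, matching the 15 columns of `wordBB`. This sketch assumes the arrays (for example, from `scipy.io.loadmat` on the ground-truth `.mat` file) have been converted to nested lists with `.tolist()`. It also accepts a 2 x 4 array, which is how a single-word image may come back after loading; that handling is a defensive assumption. Function names are hypothetical.

```python
def synthtext_word_quads(word_bb):
    """Convert one image's wordBB (2 x 4 x N nested lists) into N quads.

    Each quad is a list of four (x, y) corner tuples.
    """
    xs, ys = word_bb
    if not isinstance(xs[0], list):  # 2 x 4: a single word without the N axis
        xs = [[v] for v in xs]
        ys = [[v] for v in ys]
    num_words = len(xs[0])
    return [[(xs[k][i], ys[k][i]) for k in range(4)] for i in range(num_words)]


def synthtext_words(txt):
    """Split one image's "txt" entries into individual words."""
    return " ".join(txt).split()
```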

C. Reference

@InProceedings{Gupta16, author       = "Ankush Gupta and Andrea Vedaldi and Andrew Zisserman", title        = "Synthetic Data for Text Localisation in Natural Images", booktitle    = "IEEE Conference on Computer Vision and Pattern Recognition", year         = "2016", }

Text OCR

“TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text”, CVPR, 2021. PDF

A. Basic Info

Official Website: textocr
Year: 2021
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: CC BY 4.0

B. Annotation Format

Text Detection/Recognition/Spotting

{
  "imgs": {
    "OpenImages_ImageID_1": {
      "id": "OpenImages_ImageID_1",
      "width": "INT, Width of the image",
      "height": "INT, Height of the image",
      "set": "Split train|val|test",
      "filename": "train|test/OpenImages_ImageID_1.jpg"
    },
    "OpenImages_ImageID_2": {
      "...": "..."
    }
  },
  "anns": {
    "OpenImages_ImageID_1_1": {
      "id": "STR, OpenImages_ImageID_1_1, Specifies the nth annotation for an image",
      "image_id": "OpenImages_ImageID_1",
      "bbox": [
        "FLOAT x1",
        "FLOAT y1",
        "FLOAT x2",
        "FLOAT y2"
      ],
      "points": [
        "FLOAT x1",
        "FLOAT y1",
        "FLOAT x2",
        "FLOAT y2",
        "...",
        "FLOAT xN",
        "FLOAT yN"
      ],
      "utf8_string": "text for this annotation",
      "area": "FLOAT, area of this box"
    },
    "OpenImages_ImageID_1_2": {
      "...": "..."
    },
    "OpenImages_ImageID_2_1": {
      "...": "..."
    }
  },
  "img2Anns": {
    "OpenImages_ImageID_1": [
      "OpenImages_ImageID_1_1",
      "OpenImages_ImageID_1_2",
      "OpenImages_ImageID_1_2"
    ],
    "OpenImages_ImageID_N": [
      "..."
    ]
  }
}
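The flat `points` list holds an arbitrary-length polygon, so it has to be paired up into (x, y) vertices. The sketch below groups annotations by reading each annotation's `image_id` directly rather than going through `img2Anns`, and looks up the file name in `imgs`. Function names are hypothetical.

```python
import json
from collections import defaultdict


def textocr_polygons(data):
    """Group TextOCR annotations as {filename: [(polygon, text), ...]}."""
    grouped = defaultdict(list)
    for ann in data["anns"].values():
        pts = ann["points"]
        # [x1, y1, x2, y2, ..., xN, yN] -> [(x1, y1), ..., (xN, yN)]
        polygon = list(zip(pts[0::2], pts[1::2]))
        filename = data["imgs"][ann["image_id"]]["filename"]
        grouped[filename].append((polygon, ann["utf8_string"]))
    return dict(grouped)


def load_textocr(path):
    with open(path, encoding="utf-8") as f:
        return textocr_polygons(json.load(f))
```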

C. Reference

@inproceedings{singh2021textocr, title={{TextOCR}: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text}, author={Singh, Amanpreet and Pang, Guan and Toh, Mandy and Huang, Jing and Galuba, Wojciech and Hassner, Tal}, booktitle={The Conference on Computer Vision and Pattern Recognition}, year={2021}}

Total Text

“Total-Text: Towards Orientation Robustness in Scene Text Detection”, IJDAR, 2020. PDF

A. Basic Info

Official Website: totaltext
Year: 2020
Language: ['English']
Scene: ['Natural Scene']
Annotation Granularity: ['Word']
Supported Tasks: ['textdet', 'textrecog', 'textspotting']
License: BSD-3

B. Annotation Format

Text Detection/Spotting

x: [[259 313 389 427 354 302]], y: [[542 462 417 459 507 582]], ornt: [u'c'], transcriptions: [u'PAUL']
x: [[400 478 494 436]], y: [[398 380 448 465]], ornt: [u'#'], transcriptions: [u'#']
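Each record stores the polygon as separate `x` and `y` arrays plus an orientation tag and a transcription, where `#` marks a "do not care" region. The sketch below pulls the fields out with regular expressions and zips the coordinates into vertices. It assumes one record per line, as in the sample, and accepts either quote style around the transcription; the function name is hypothetical.

```python
import re

_X = re.compile(r"x: \[\[([^\]]*)\]\]")
_Y = re.compile(r"y: \[\[([^\]]*)\]\]")
_TEXT = re.compile(r"transcriptions: \[u?(['\"])(.*)\1\]")


def parse_totaltext_line(line):
    """Return (polygon, transcription, ignore) for one Total-Text record."""
    xs = [int(v) for v in _X.search(line).group(1).split()]
    ys = [int(v) for v in _Y.search(line).group(1).split()]
    text = _TEXT.search(line).group(2)
    # "#" is used as the transcription of regions to be ignored.
    return list(zip(xs, ys)), text, text == "#"
```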

C. Reference

@article{CK2019, author = {Chee Kheng Chng and Chee Seng Chan and Chenglin Liu}, title = {Total-Text: Towards Orientation Robustness in Scene Text Detection}, journal = {International Journal on Document Analysis and Recognition (IJDAR)}, volume = {23}, pages = {31-52}, year = {2020}, doi = {10.1007/s10032-019-00334-z}}

WildReceipt

“Spatial Dual-Modality Graph Reasoning for Key Information Extraction”, arXiv, 2021. PDF

A. Basic Info

Official Website: wildreceipt
Year: 2021
Language: ['English']
Scene: ['Receipt']
Annotation Granularity: ['Word']
Supported Tasks: ['kie', 'textdet', 'textrecog', 'textspotting']
License: N/A

B. Annotation Format

KIE

// Close Set
{
  "file_name": "image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg",
  "height": 1200,
  "width": 1600,
  "annotations":
    [
      {
        "box": [550.0, 190.0, 937.0, 190.0, 937.0, 104.0, 550.0, 104.0],
        "text": "SAFEWAY",
        "label": 1
      },
      {
        "box": [1048.0, 211.0, 1074.0, 211.0, 1074.0, 196.0, 1048.0, 196.0],
        "text": "TM",
        "label": 25
      }
    ], //...
}

// Open Set
{
  "file_name": "image_files/Image_12/10/845be0dd6f5b04866a2042abd28d558032ef2576.jpeg",
  "height": 348,
  "width": 348,
  "annotations":
    [
      {
        "box": [114.0, 19.0, 230.0, 19.0, 230.0, 1.0, 114.0, 1.0],
        "text": "CHOEUN",
        "label": 2,
        "edge": 1
      },
      {
        "box": [97.0, 35.0, 236.0, 35.0, 236.0, 19.0, 97.0, 19.0],
        "text": "KOREANRESTAURANT",
        "label": 2,
        "edge": 1
      }
    ]
}
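Each `box` is a flat list of four (x, y) corners. In the samples above the bottom corners come first (y = 190 before y = 104), so the sketch derives an axis-aligned box with min/max instead of assuming a corner order. It assumes each record is stored as one JSON object per line, and passes `edge` through as `None` for closed-set records that lack it; the function names are hypothetical.

```python
import json


def quad_to_xyxy(box):
    """Axis-aligned (x1, y1, x2, y2) from a flat 8-value quadrilateral."""
    xs, ys = box[0::2], box[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def wildreceipt_records(line):
    """Parse one WildReceipt record (assumed: one JSON object per line)."""
    ann = json.loads(line)
    return [
        {
            "bbox": quad_to_xyxy(a["box"]),
            "text": a["text"],
            "label": a["label"],
            "edge": a.get("edge"),  # present in the open-set format only
        }
        for a in ann["annotations"]
    ]
```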

C. Reference

@article{sun2021spatial, title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction}, author={Sun, Hongbin and Kuang, Zhanghui and Yue, Xiaoyu and Lin, Chenhao and Zhang, Wayne}, journal={arXiv preprint arXiv:2103.14470}, year={2021} }