Text Recognition

Overview

The text recognition dataset directory is organized as follows.

├── mixture
│   ├── coco_text
│   │   ├── train_label.txt
│   │   ├── train_words
│   ├── icdar_2011
│   │   ├── training_label.txt
│   │   ├── Challenge1_Training_Task3_Images_GT
│   ├── icdar_2013
│   │   ├── train_label.txt
│   │   ├── test_label_1015.txt
│   │   ├── test_label_1095.txt
│   │   ├── Challenge2_Training_Task3_Images_GT
│   │   ├── Challenge2_Test_Task3_Images
│   ├── icdar_2015
│   │   ├── train_label.txt
│   │   ├── test_label.txt
│   │   ├── ch4_training_word_images_gt
│   │   ├── ch4_test_word_images_gt
│   ├── IIIT5K
│   │   ├── train_label.txt
│   │   ├── test_label.txt
│   │   ├── train
│   │   ├── test
│   ├── ct80
│   │   ├── test_label.txt
│   │   ├── image
│   ├── svt
│   │   ├── test_label.txt
│   │   ├── image
│   ├── svtp
│   │   ├── test_label.txt
│   │   ├── image
│   ├── Syn90k
│   │   ├── shuffle_labels.txt
│   │   ├── label.txt
│   │   ├── label.lmdb
│   │   ├── mnt
│   ├── SynthText
│   │   ├── alphanumeric_labels.txt
│   │   ├── shuffle_labels.txt
│   │   ├── instances_train.txt
│   │   ├── label.txt
│   │   ├── label.lmdb
│   │   ├── synthtext
│   ├── SynthAdd
│   │   ├── label.txt
│   │   ├── label.lmdb
│   │   ├── SynthText_Add
│   ├── TextOCR
│   │   ├── image
│   │   ├── train_label.txt
│   │   ├── val_label.txt
│   ├── Totaltext
│   │   ├── imgs
│   │   ├── annotations
│   │   ├── train_label.txt
│   │   ├── test_label.txt
│   ├── OpenVINO
│   │   ├── image_1
│   │   ├── image_2
│   │   ├── image_5
│   │   ├── image_f
│   │   ├── image_val
│   │   ├── train_1_label.txt
│   │   ├── train_2_label.txt
│   │   ├── train_5_label.txt
│   │   ├── train_f_label.txt
│   │   ├── val_label.txt

Dataset                images                         annotation file (training)                                                      annotation file (test)
coco_text              homepage                       train_label.txt                                                                 -
icdar_2011             homepage                       train_label.txt                                                                 -
icdar_2013             homepage                       train_label.txt                                                                 test_label_1015.txt
icdar_2015             homepage                       train_label.txt                                                                 test_label.txt
IIIT5K                 homepage                       train_label.txt                                                                 test_label.txt
ct80                   homepage                       -                                                                               test_label.txt
svt                    homepage                       -                                                                               test_label.txt
svtp                   unofficial homepage (*)        -                                                                               test_label.txt
MJSynth (Syn90k)       homepage                       shuffle_labels.txt | label.txt                                                  -
SynthText (Synth800k)  homepage                       alphanumeric_labels.txt | shuffle_labels.txt | instances_train.txt | label.txt  -
SynthAdd               SynthText_Add.zip (code:627x)  label.txt                                                                       -
TextOCR                homepage                       -                                                                               -
Totaltext              homepage                       -                                                                               -
OpenVINO               Open Images                    annotations                                                                     annotations

(*) Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
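
Before training, it can help to confirm that the expected files and folders are actually in place. The following is a minimal sketch, not part of MMOCR: the helper name missing_entries and the EXPECTED_LAYOUT subset are illustrative assumptions, and the entries are copied from the directory tree above.

```python
from pathlib import Path

# A hand-picked subset of the layout shown above; extend it as needed.
EXPECTED_LAYOUT = {
    "icdar_2015": ["train_label.txt", "test_label.txt"],
    "svt": ["test_label.txt", "image"],
    "Syn90k": ["label.txt", "mnt"],
}


def missing_entries(root, layout=EXPECTED_LAYOUT):
    """Return the relative paths from `layout` that do not exist under `root`."""
    root = Path(root)
    missing = []
    for dataset, entries in layout.items():
        for entry in entries:
            relative = Path(dataset) / entry
            # Path.exists() follows symlinks, so soft-linked datasets count.
            if not (root / relative).exists():
                missing.append(relative.as_posix())
    return missing


# Example: report whatever is still missing under data/mixture.
for path in missing_entries("data/mixture"):
    print("missing:", path)
```

Run it from the MMOCR root directory so that the relative path data/mixture resolves correctly.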

Preparation Steps

ICDAR 2013

ICDAR 2015

IIIT5K

svt

python tools/data/textrecog/svt_converter.py <download_svt_dir_path>

ct80

svtp

coco_text

MJSynth (Syn90k)

  • Step1: Download mjsynth.tar.gz from homepage

  • Step2: Download label.txt (8,919,273 annotations) and shuffle_labels.txt (2,400,000 randomly sampled annotations). Please make sure you’re using the right annotation to train the model by checking its dataset specs in Model Zoo.

  • Step3:

mkdir Syn90k && cd Syn90k

mv /path/to/mjsynth.tar.gz .

tar -xzf mjsynth.tar.gz

mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .

# create soft link
cd /path/to/mmocr/data/mixture

ln -s /path/to/Syn90k Syn90k
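
One way to check that the right annotation file was downloaded is to compare its size with the counts quoted in Step2. The sketch below assumes one annotation per non-empty line, which is not stated in this guide; count_annotations and matches_expected are hypothetical helper names.

```python
def count_annotations(lines):
    """Count non-empty lines, assuming one annotation per line."""
    return sum(1 for line in lines if line.strip())


def matches_expected(label_path, expected):
    """Return True if the label file holds exactly `expected` annotations."""
    with open(label_path, encoding="utf-8") as f:
        return count_annotations(f) == expected


# Counts quoted in Step2 above.
EXPECTED_COUNTS = {
    "label.txt": 8_919_273,
    "shuffle_labels.txt": 2_400_000,
}
```

For example, matches_expected("data/mixture/Syn90k/label.txt", EXPECTED_COUNTS["label.txt"]) should return True if the full annotation file is in place and the one-line-per-annotation assumption holds.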

SynthText (Synth800k)

  • Step1: Download SynthText.zip from homepage

  • Step2: Download whichever of the following annotation files fits your needs: label.txt (7,266,686 annotations), shuffle_labels.txt (2,400,000 randomly sampled annotations), alphanumeric_labels.txt (7,239,272 annotations with alphanumeric characters only) and instances_train.txt (7,266,686 character-level annotations).

Warning

Please make sure you’re using the right annotation to train the model by checking its dataset specs in Model Zoo.

  • Step3:

mkdir SynthText && cd SynthText
mv /path/to/SynthText.zip .
unzip SynthText.zip
mv SynthText synthtext

mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .
mv /path/to/alphanumeric_labels.txt .
mv /path/to/instances_train.txt .

# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthText SynthText

  • Step4: Generate cropped images and labels:

cd /path/to/mmocr

python tools/data/textrecog/synthtext_converter.py data/mixture/SynthText/gt.mat data/mixture/SynthText/ data/mixture/SynthText/synthtext/SynthText_patch_horizontal --n_proc 8
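
Step2 above distinguishes label.txt from alphanumeric_labels.txt, which keeps only annotations made of alphanumeric characters. The sketch below illustrates what such a filter could look like. It rests on two assumptions not confirmed by this guide: that each label line has the form "<image path> <text>", and that "alphanumeric" means ASCII letters and digits only. It is not the script used to produce the official file.

```python
def parse_label_line(line):
    """Split '<image path> <text>' into (path, text); the format is an assumption."""
    path, _, text = line.rstrip("\n").partition(" ")
    return path, text


def is_alphanumeric(text):
    """True if `text` is non-empty and made only of ASCII letters and digits."""
    return text.isascii() and text.isalnum()


def keep_alphanumeric(lines):
    """Yield only the label lines whose text is purely alphanumeric."""
    for line in lines:
        _, text = parse_label_line(line)
        if is_alphanumeric(text):
            yield line


# Illustrative sample lines (made up for this example).
sample = ["a/1.jpg Hello\n", "a/2.jpg it's\n", "a/3.jpg 2021\n"]
print(list(keep_alphanumeric(sample)))
```

In this sample the second line is dropped because its text contains an apostrophe.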

SynthAdd

  • Step1: Download SynthText_Add.zip from SynthAdd (code:627x)

  • Step2: Download label.txt

  • Step3:

mkdir SynthAdd && cd SynthAdd

mv /path/to/SynthText_Add.zip .

unzip SynthText_Add.zip

mv /path/to/label.txt .

# create soft link
cd /path/to/mmocr/data/mixture

ln -s /path/to/SynthAdd SynthAdd

Tip

To convert a label file from txt format to lmdb format, run:

python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>

For example,

python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
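
When several datasets need the same conversion, the commands can be generated rather than typed by hand. The sketch below only builds and prints the command lines, using the -i and -o options shown above; the helper name txt2lmdb_command and the choice to write label.lmdb next to label.txt are assumptions that mirror the directory tree in the overview.

```python
from pathlib import PurePosixPath

TXT2LMDB = "tools/data/utils/txt2lmdb.py"


def txt2lmdb_command(txt_label_path):
    """Build the argv list for converting one txt label file to lmdb."""
    txt = PurePosixPath(txt_label_path)
    lmdb = txt.with_suffix(".lmdb")  # e.g. label.txt -> label.lmdb
    return ["python", TXT2LMDB, "-i", str(txt), "-o", str(lmdb)]


# The three synthetic datasets whose directory listing includes label.lmdb.
for name in ("Syn90k", "SynthText", "SynthAdd"):
    print(" ".join(txt2lmdb_command(f"data/mixture/{name}/label.txt")))
```

Each returned list can be passed to subprocess.run(..., check=True) from the MMOCR root directory to perform the conversion.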

TextOCR

  • Step1: Download train_val_images.zip, TextOCR_0.1_train.json and TextOCR_0.1_val.json to textocr/.

mkdir textocr && cd textocr

# Download TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json

# For images
unzip -q train_val_images.zip
mv train_images train

  • Step2: Generate train_label.txt, val_label.txt and crop images using 4 processes with the following command:

python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
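
To give a rough idea of what this conversion involves, the sketch below derives word-level records (crop name, crop box, text) from a TextOCR-style annotation dictionary. It is not the code of textocr_converter.py. The JSON keys ("imgToAnns", "anns", "bbox", "utf8_string"), the use of "." to mark illegible text, and the crop naming scheme are all assumptions made for illustration.

```python
def textocr_records(annotation):
    """Turn a TextOCR-style annotation dict into (crop name, box, text) records.

    Assumed layout: annotation["imgToAnns"] maps image id -> annotation ids,
    and annotation["anns"][ann_id] has "bbox" (x, y, w, h) and "utf8_string".
    """
    records = []
    for img_id, ann_ids in annotation["imgToAnns"].items():
        for ann_id in ann_ids:
            ann = annotation["anns"][ann_id]
            text = ann["utf8_string"]
            if text == ".":  # assumed marker for illegible text; skip it
                continue
            x, y, w, h = ann["bbox"]
            # Convert (x, y, w, h) to a (left, top, right, bottom) crop box.
            box = (int(x), int(y), int(x + w), int(y + h))
            # Hypothetical crop file name derived from the annotation id.
            records.append((f"{ann_id}.jpg", box, text))
    return records


# Made-up sample data in the assumed layout.
sample = {
    "imgToAnns": {"img1": ["img1_1", "img1_2"]},
    "anns": {
        "img1_1": {"bbox": [10.5, 20.0, 30.0, 15.0], "utf8_string": "STOP"},
        "img1_2": {"bbox": [0, 0, 5, 5], "utf8_string": "."},
    },
}
print(textocr_records(sample))
```

A real converter would additionally crop each box from the source image and write the name and text pairs to train_label.txt or val_label.txt.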

Totaltext

  • Step1: Download totaltext.zip from the GitHub dataset page and groundtruth_text.zip from the GitHub Groundtruth page (our totaltext_converter.py supports ground truth in both .mat and .txt formats).

mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations

# For images
# in ./totaltext
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test

# For annotations
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test

  • Step2: Generate cropped images, train_label.txt and test_label.txt with the following command (the cropped images will be saved to data/totaltext/dst_imgs/):

python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test

OpenVINO

  • Step0: Install awscli.

  • Step1: Download Open Images subsets train_1, train_2, train_5, train_f, and validation to openvino/.

mkdir openvino && cd openvino

# Download Open Images subsets
for s in 1 2 5 f; do
  aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_${s}.tar.gz .
done
aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz .

# Download annotations
for s in 1 2 5 f; do
  wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_train_${s}.json
done
wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_validation.json

# Extract images
mkdir -p openimages_v5/val
for s in 1 2 5 f; do
  tar zxf train_${s}.tar.gz -C openimages_v5
done
tar zxf validation.tar.gz -C openimages_v5/val

  • Step2: Generate train_{1,2,5,f}_label.txt, val_label.txt and crop images using 4 processes with the following command:

python tools/data/textrecog/openvino_converter.py /path/to/openvino 4