Key Information Extraction
Note
This page is a manual preparation guide for datasets that Dataset Preparer does not yet support. All of these scripts will eventually be migrated into Dataset Preparer.
Overview
The key information extraction dataset directory is organized as follows:
└── wildreceipt
    ├── class_list.txt
    ├── dict.txt
    ├── image_files
    ├── openset_train.txt
    ├── openset_test.txt
    ├── test.txt
    └── train.txt
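As a quick sanity check, the layout above can be verified with a short script. This is a minimal sketch, not part of the toolbox: the helper name `missing_entries` is hypothetical, `data/wildreceipt` is an assumed location, and the two `openset_*.txt` files are only expected after the WildReceiptOpenset step has been run.

```python
from pathlib import Path

# Entries expected directly under the dataset root, taken from the layout above.
EXPECTED_FILES = [
    "class_list.txt",
    "dict.txt",
    "openset_train.txt",
    "openset_test.txt",
    "test.txt",
    "train.txt",
]
EXPECTED_DIRS = ["image_files"]


def missing_entries(root):
    """Return the expected files and directories that are absent under ``root``."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Hypothetical location; adjust it to wherever the dataset was extracted.
    absent = missing_entries("data/wildreceipt")
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("Layout looks complete.")
```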
Preparation Steps
WildReceipt
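Download wildreceipt.tar and extract it into the data/ directory, so that the dataset ends up under data/wildreceipt/ with the layout shown above.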
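If you prefer to script the extraction, the following minimal sketch uses Python's standard `tarfile` module. It assumes `wildreceipt.tar` has already been downloaded to the current directory and that its top-level folder is `wildreceipt/`, so extracting into `data/` yields `data/wildreceipt/`. The function name `extract_wildreceipt` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_wildreceipt(archive="wildreceipt.tar", dest="data"):
    """Extract the archive into ``dest`` and return the expected dataset root."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that would escape ``dest``.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    # Assumes the archive's top-level folder is named ``wildreceipt``.
    return dest / "wildreceipt"
```

Calling `extract_wildreceipt()` from the repository root should leave the files where the conversion commands in the next section expect them. On Python versions that provide `tarfile.data_filter`, the sketch passes `filter="data"`, which rejects archive members that would be written outside the destination directory.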
WildReceiptOpenset
Step 0: Prepare WildReceipt as described above.
Step 1: Convert the annotation files to the OpenSet format:
# You may find more available arguments by running
# python tools/data/kie/closeset_to_openset.py -h
python tools/data/kie/closeset_to_openset.py data/wildreceipt/train.txt data/wildreceipt/openset_train.txt
python tools/data/kie/closeset_to_openset.py data/wildreceipt/test.txt data/wildreceipt/openset_test.txt
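The two commands above differ only in the split name, so they can also be driven from a small Python loop. This is a hypothetical convenience wrapper, not part of the toolbox: it assumes it is run from the repository root (the commands use repository-relative paths), and it invokes the script with the current interpreter (`sys.executable`) rather than whatever `python` resolves to on `PATH`.

```python
import subprocess
import sys

# Script path and file naming taken from the commands above.
SCRIPT = "tools/data/kie/closeset_to_openset.py"
SPLITS = ("train", "test")


def build_commands(data_root="data/wildreceipt", python=sys.executable):
    """Return one command per split: <split>.txt -> openset_<split>.txt."""
    return [
        [python, SCRIPT,
         f"{data_root}/{split}.txt",
         f"{data_root}/openset_{split}.txt"]
        for split in SPLITS
    ]


def convert_all(data_root="data/wildreceipt"):
    """Run the conversion for every split, stopping at the first failure."""
    for cmd in build_commands(data_root):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

Calling `convert_all()` runs both conversions in turn; because of `check=True`, it raises `subprocess.CalledProcessError` as soon as one of them exits with a non-zero status instead of silently continuing.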
Note
You can learn more about the key differences between CloseSet and OpenSet annotations in our tutorial.
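To see how the two annotation styles differ on your own copy of the data, you could compare the per-box fields of the first record in a CloseSet file and in its OpenSet counterpart. This sketch assumes, without this page stating it, that both files store one JSON object per line with an `annotations` list of per-text-box dictionaries; the function names are hypothetical.

```python
import json


def first_record(path):
    """Parse the first non-empty line of a line-delimited JSON annotation file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                return json.loads(line)
    raise ValueError(f"{path} contains no records")


def annotation_fields(record):
    """Union of the keys used by a record's per-text-box annotations."""
    fields = set()
    for anno in record.get("annotations", []):
        fields.update(anno)
    return fields


def compare_first_records(closeset_path, openset_path):
    """Report which annotation fields appear in only one of the two files."""
    close_fields = annotation_fields(first_record(closeset_path))
    open_fields = annotation_fields(first_record(openset_path))
    return {
        "only_in_closeset": sorted(close_fields - open_fields),
        "only_in_openset": sorted(open_fields - close_fields),
        "shared": sorted(close_fields & open_fields),
    }
```

For example, `compare_first_records("data/wildreceipt/train.txt", "data/wildreceipt/openset_train.txt")` returns the field names found in only one file and those the two share. It inspects annotation keys only, not label values, so it complements rather than replaces the tutorial.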