Key Information Extraction
This page is a manual preparation guide for datasets not yet supported by Dataset Preparer, into which all of these scripts will eventually be migrated.
The key information extraction dataset directory is organized as follows:
└── wildreceipt
    ├── class_list.txt
    ├── dict.txt
    ├── image_files
    ├── openset_train.txt
    ├── openset_test.txt
    ├── test.txt
    └── train.txt
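Before running any conversion, it can help to confirm that the directory matches the layout above. The following is a minimal sketch, not part of the toolkit: the function name `missing_entries` and the `include_openset` flag are hypothetical, and the `data/wildreceipt` location is assumed from the commands used later on this page.

```python
from pathlib import Path

# Entry names copied from the directory tree above.
CLOSESET_ENTRIES = ["class_list.txt", "dict.txt", "image_files", "test.txt", "train.txt"]
# These two files only exist after the OpenSet conversion step has been run.
OPENSET_ENTRIES = ["openset_train.txt", "openset_test.txt"]


def missing_entries(root, include_openset=False):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    expected = CLOSESET_ENTRIES + (OPENSET_ENTRIES if include_openset else [])
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Assumed dataset location; adjust to wherever WildReceipt was extracted.
    missing = missing_entries("data/wildreceipt")
    print("missing:", missing if missing else "none")
```

Passing `include_openset=True` also checks for the two OpenSet annotation files, which is useful after the conversion step.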
Step0: Have WildReceipt prepared.
Step1: Convert annotation files to OpenSet format:
# You may find more available arguments by running
# python tools/data/kie/closeset_to_openset.py -h
python tools/data/kie/closeset_to_openset.py data/wildreceipt/train.txt data/wildreceipt/openset_train.txt
python tools/data/kie/closeset_to_openset.py data/wildreceipt/test.txt data/wildreceipt/openset_test.txt
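The two commands above differ only in the split name, so they can be generated in a loop. The sketch below is one possible wrapper, assuming it is run from the repository root so that the script path resolves. The helper names `build_commands` and `convert_splits` and the `dry_run` flag are hypothetical. Only the script path and the two positional arguments come from the commands above.

```python
import subprocess
import sys
from pathlib import Path

# Script path taken from the commands above; assumed relative to the repository root.
SCRIPT = "tools/data/kie/closeset_to_openset.py"


def build_commands(data_root="data/wildreceipt", splits=("train", "test")):
    """Build one argument list per split: <script> <closeset file> <openset file>."""
    root = Path(data_root)
    return [
        [sys.executable, SCRIPT, str(root / f"{split}.txt"), str(root / f"openset_{split}.txt")]
        for split in splits
    ]


def convert_splits(data_root="data/wildreceipt", dry_run=True):
    """Print the commands when dry_run is True; otherwise execute them in order."""
    commands = build_commands(data_root)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the conversion script fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_splits()` only prints the commands. Calling `convert_splits(dry_run=False)` runs them and stops at the first failure.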
You can learn more about the key differences between CloseSet and OpenSet annotations in our tutorial.