Changelog¶

v0.3.0 (25/8/2021)¶

Highlights¶

We add a new text recognition model – SATRN! Its pretrained checkpoint achieves the best performance over other provided text recognition models. A lighter version of SATRN is also released which can obtain ~98% of the performance of the original model with only 45 MB in size. (@2793145003) #405
Improve the demo script, ocr.py, which supports applying end-to-end text detection, text recognition and key information extraction models on images with easy-to-use commands. Users can find its full documentation in the demo section. (@samayala22, @manjrekarom) #371, #386, #400, #374, #428
Our documentation is reorganized into a clearer structure. More useful contents are on the way! #409, #454
The requirement of Polygon3 is removed since this project is no longer maintained or distributed. We unified all its references to equivalent substitutions in shapely instead. #448

Breaking Changes & Migration Guide¶

Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs #382
MMOCR now has its own model and layer registries inherited from MMDetection’s or MMCV’s counterparts. (#436) The modified hierarchical structure of the model registries are now organized as follows.

mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
mmcv.MODELS -> mmdet.NECKS -> NECKS
mmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS
mmcv.MODELS -> mmdet.HEADS -> HEADS
mmcv.MODELS -> mmdet.LOSSES -> LOSSES
mmcv.MODELS -> mmdet.DETECTORS -> DETECTORS
mmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS
mmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS

To migrate your old implementation to our new backend, you need to change the import path of any registries and their corresponding builder functions (including build_detectors) from mmdet.models.builder to mmocr.models.builder. If you have referred to any model or layer of MMDetection or MMCV in your model config, you need to add mmdet. or mmcv. prefix to its name to inform the model builder of the right namespace to work on.

Interested users may check out MMCV’s tutorial on Registry for in-depth explanations on its mechanism.

New Features¶

Automatically replace SyncBN with BN for inference #420, #453
Support batch inference for CRNN and SegOCR #407
Support exporting documentation in pdf or epub format #406
Support persistent_workers option in data loader #459

Bug Fixes¶

Remove depreciated key in kie_test_imgs.py #381
Fix dimension mismatch in batch testing/inference of DBNet #383
Fix the problem of dice loss which stays at 1 with an empty target given #408
Fix a wrong link in ocr.py (@naarkhoo) #417
Fix undesired assignment to “pretrained” in test.py #418
Fix a problem in polygon generation of DBNet #421, #443
Skip invalid annotations in totaltext_converter #438
Add zero division handler in poly utils, remove Polygon3 #448

Improvements¶

Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to @gen-ko who has re-distributed this package!)
Support MIM #394
Add tests for PyTorch 1.9 in CI #401
Enables fullscreen layout in readthedocs #413
General documentation enhancement #395
Update version checker #427
Add copyright info #439
Update citation information #440

Contributors¶

We thank @2793145003, @samayala22, @manjrekarom, @naarkhoo, @gen-ko, @duanjiaqi, @gaotongxiao, @cuhk-hbsun, @innerlee, @wdsd641417025 for their contribution to this release!

v0.2.1 (20/7/2021)¶

Highlights¶

Upgrade to use MMCV-full >= 1.3.8 and MMDetection >= 2.13.0 for latest features
Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) #278, #291, #300, #328
Unified parameter initialization method which uses init_cfg in config files #365

New Features¶

Support TextOCR dataset #293
Support Total-Text dataset #266, #273, #357
Support grouping text detection box into lines #290, #304
Add benchmark_processing script that benchmarks data loading process #261
Add SynthText preprocessor for text recognition models #351, #361
Support batch inference during testing #310
Add user-friendly OCR inference script #366

Bug Fixes¶

Fix improper class ignorance in SDMGR Loss #221
Fix potential numerical zero division error in DRRG #224
Fix installing requirements with pip and mim #242
Fix dynamic input error of DBNet #269
Fix space parsing error in LineStrParser #285
Fix textsnake decode error #264
Correct isort setup #288
Fix a bug in SDMGR config #316
Fix kie_test_img for KIE nonvisual #319
Fix metafiles #342
Fix different device problem in FCENet #334
Ignore improper tailing empty characters in annotation files #358
Docs fixes #247, #255, #265, #267, #268, #270, #276, #287, #330, #355, #367
Fix NRTR config #356, #370

Improvements¶

Add backend for resizeocr #244
Skip image processing pipelines in SDMGR novisual #260
Speedup DBNet #263
Update mmcv installation method in workflow #323
Add part of Chinese documentations #353, #362
Add support for ConcatDataset with two workflows #348
Add list_from_file and list_to_file utils #226
Speed up sort_vertex #239
Support distributed evaluation of KIE #234
Add pretrained FCENet on IC15 #258
Support CPU for OCR demo #227
Avoid extra image pre-processing steps #375

v0.2.0 (18/5/2021)¶

Highlights¶

Add the NER approach Bert-softmax (NAACL’2019)
Add the text detection method DRRG (CVPR’2020)
Add the text detection method FCENet (CVPR’2021)
Increase the ease of use via adding text detection and recognition end-to-end demo, and colab online demo.
Simplify the installation.

New Features¶

Add Bert-softmax for Ner task #148
Add DRRG #189
Add FCENet #133
Add end-to-end demo #105
Support batch inference #86 #87 #178
Add TPS preprocessor for text recognition #117 #135
Add demo documentation #151 #166 #168 #170 #171
Add checkpoint for Chinese recognition #156
Add metafile #175 #176 #177 #182 #183
Add support for numpy array inference #74

Bug Fixes¶

Fix the duplicated point bug due to transform for textsnake #130
Fix CTC loss NaN #159
Fix error raised if result is empty in demo #144
Fix results missing if one image has a large number of boxes #98
Fix package missing in dockerfile #109

Improvements¶

Simplify installation procedure via removing compiling #188
Speed up panet post processing so that it can detect dense texts #188
Add zh-CN README #70 #95
Support windows #89
Add Colab #147 #199
Add 1-step installation using conda environment #193 #194 #195

v0.1.0 (7/4/2021)¶

Highlights¶

MMOCR is released.

Main Features¶

Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.
For text detection, support both single-step (PSENet, PANet, DBNet, TextSnake) and two-step (MaskRCNN) methods.
For text recognition, support CTC-loss based method CRNN; Encoder-decoder (with attention) based methods SAR, Robustscanner; Segmentation based method SegOCR; Transformer based method NRTR.
For key information extraction, support GCN based method SDMG-R.
Provide checkpoints and log files for all of the methods above.