FAQ

General

Q1 I’m getting a warning like unexpected key in source state_dict: fc.weight, fc.bias. Is something wrong?

A It’s not an error. It occurs because the backbone network is pretrained on image classification tasks, where the last fc layer is required to generate the classification output. However, the fc layer is no longer needed when the backbone network is used to extract features in downstream tasks, and therefore these weights can be safely skipped when loading the checkpoint.
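The key matching behind this warning can be illustrated with a minimal sketch. It uses plain dicts in place of real tensors, and the helper name split_state_dict and the layer names are hypothetical; the actual checkpoint loader may work differently in detail.

```python
def split_state_dict(source_state, model_keys):
    """Separate checkpoint entries the model can use from those it cannot."""
    model_keys = set(model_keys)
    matched = {k: v for k, v in source_state.items() if k in model_keys}
    unexpected = [k for k in source_state if k not in model_keys]
    missing = sorted(model_keys - set(source_state))
    return matched, unexpected, missing


# Hypothetical classification checkpoint: backbone weights plus the fc head.
checkpoint = {
    'conv1.weight': [0.1, 0.2],
    'layer1.0.conv1.weight': [0.3],
    'fc.weight': [0.4],
    'fc.bias': [0.5],
}
# Hypothetical feature-extraction backbone: it has no fc layer.
backbone_keys = ['conv1.weight', 'layer1.0.conv1.weight']

matched, unexpected, missing = split_state_dict(checkpoint, backbone_keys)
print('unexpected key in source state_dict:', ', '.join(unexpected))
```

Every parameter the backbone actually has is matched and nothing is missing; only the classification head is left over, which is why the warning is harmless.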

Q2 MMOCR terminates with an error: shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry. How could I fix it?

A This error occurs when the dataset contains invalid polygons (e.g., polygons with self-intersections), or when a non-rigorous data transform generates them. You can fix these polygons by adding the FixInvalidPolygon transform after any transform that is likely to introduce invalid polygons. For example, a common practice is to append it after LoadOCRAnnotations in both the train and test pipelines. The resulting pipeline should look like:

train_pipeline = [
    ...
    dict(
        type='LoadOCRAnnotations',
        with_polygon=True,
        with_bbox=True,
        with_label=True,
    ),
    dict(type='FixInvalidPolygon', min_poly_points=4),
    ...
]

In practice, we find that Totaltext contains some invalid polygons and using FixInvalidPolygon is a must. Here is an example config.
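To make "invalid polygon" concrete, here is a minimal pure-Python sketch that detects the most common case, a self-intersecting ("bow-tie") outline. The function names are hypothetical, and the check only covers proper edge crossings; it is not how FixInvalidPolygon or shapely validate or repair geometry.

```python
def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def _segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross at an interior point."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def has_self_intersection(points):
    """Check whether any two non-adjacent edges of a closed polygon cross."""
    n = len(points)
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # the first and last edges share a vertex
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2)]  # same corners, bad vertex order
print(has_self_intersection(square))   # False
print(has_self_intersection(bow_tie))  # True
```

The bow-tie uses the same four corners as the square but visits them in an order that makes two edges cross, which is the kind of annotation that triggers the TopologicalError during intersection operations.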

Q3 I get libpng warning: iCCP: known incorrect sRGB profile when loading images with the cv2 backend. What should I do?

A This is a warning from libpng and it is safe to ignore. It is caused by the ICC profile embedded in the image. You can use the pillow backend to avoid this warning:

train_pipeline = [
    dict(
        type='LoadImageFromFile',
        imdecode_backend='pillow'),
    ...
]
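If you want to find out which images carry an embedded ICC profile, the sketch below lists the chunk types of a PNG file using only the standard library. The helper names and the synthetic sample image are hypothetical. Note that an iCCP chunk only shows that a profile is embedded; the sketch does not tell you whether libpng considers that profile incorrect.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_chunk_types(data):
    """Return the chunk type names of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError('not a PNG file')
    types, pos = [], len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack('>I4s', data[pos:pos + 8])
        types.append(ctype.decode('ascii'))
        pos += 8 + length + 4  # chunk header + payload + CRC
    return types


def _chunk(ctype, payload):
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return (struct.pack('>I', len(payload)) + ctype + payload
            + struct.pack('>I', crc))


# Synthetic 1x1 grayscale PNG with a placeholder iCCP chunk (demo only).
ihdr = struct.pack('>IIBBBBB', 1, 1, 8, 0, 0, 0, 0)
iccp = b'demo\x00\x00' + zlib.compress(b'placeholder profile')
idat = zlib.compress(b'\x00\x00')  # filter byte + one pixel
sample = (PNG_SIGNATURE + _chunk(b'IHDR', ihdr) + _chunk(b'iCCP', iccp)
          + _chunk(b'IDAT', idat) + _chunk(b'IEND', b''))

print(png_chunk_types(sample))  # ['IHDR', 'iCCP', 'IDAT', 'IEND']
```

For a real file, read it in binary mode and pass the bytes to png_chunk_types, then check whether 'iCCP' is in the result.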

Text Recognition

Q1 What are the steps to train text recognition models with my own dictionary?

A In MMOCR 1.0, you only need to modify the config so that Dictionary points to your custom dictionary file. For example, if you want to train the SAR model (https://github.com/open-mmlab/mmocr/blob/75c06d34bbc01d3d11dfd7afc098b6cdeee82579/configs/textrecog/sar/sar_resnet31_parallel-decoder_5e_st-sub_mj-sub_sa_real.py) with your own dictionary placed at /my/dict.txt, you can change the dictionary.dict_file field in the base config to:

dictionary = dict(
    type='Dictionary',
    dict_file='/my/dict.txt',
    with_start=True,
    with_end=True,
    same_start_end=True,
    with_padding=True,
    with_unknown=True)

Now you are good to go. You can also find more information in Dictionary API.
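If you do not have a dictionary file yet, the following sketch builds one from your training transcriptions. It assumes the dictionary file lists one character per line; please confirm the expected format in the Dictionary API before relying on it. The function name build_dict_file and the sample labels are hypothetical.

```python
import os
import tempfile


def build_dict_file(labels, path):
    """Write every distinct character in `labels` to `path`, one per line."""
    chars = sorted(set(''.join(labels)))
    with open(path, 'w', encoding='utf-8') as f:
        for ch in chars:
            f.write(ch + '\n')
    return chars


# Hypothetical transcriptions taken from a training set.
labels = ['cab', 'abba', 'b2c']
dict_path = os.path.join(tempfile.mkdtemp(), 'dict.txt')
chars = build_dict_file(labels, dict_path)
print(chars)  # ['2', 'a', 'b', 'c']
```

Labels that contain spaces or other whitespace need extra care, since a whitespace-only line is easy to lose when the file is read back.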

Q2 How can I properly visualize non-English characters?

A You can customize font_families or font_properties in the visualizer. For example, to visualize Korean:

configs/textrecog/_base_/default_runtime.py:

visualizer = dict(
    type='TextRecogLocalVisualizer',
    name='visualizer',
    font_families='NanumGothic',  # font family with Korean glyphs
    vis_backends=vis_backends)

It’s also fine to pass the path of a font file to the visualizer:

visualizer = dict(
    type='TextRecogLocalVisualizer',
    name='visualizer',
    font_properties='path/to/font_file',
    vis_backends=vis_backends)