TextRecogGeneralAug

class mmocr.datasets.transforms.TextRecogGeneralAug

A general geometric augmentation tool for text images, proposed in the CVPR 2020 paper “Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition”. It applies random distortion, stretching, and perspective transforms to an image.

This implementation is adapted from https://github.com/RubanSeven/Text-Image-Augmentation-python/blob/master/augment.py

TODO: Split this transform into three transforms.

Required Keys:

img

Modified Keys:

img
img_shape
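A minimal sketch of how the transform could be enabled in an MMOCR 1.x style training pipeline config. It assumes TextRecogGeneralAug takes no constructor arguments (the class signature lists none); the surrounding steps, the target size, and the RandomApply wrapper with prob=0.5 are illustrative choices, not taken from this page.

```python
# Illustrative text recognition training pipeline (MMOCR 1.x config style).
# Only the TextRecogGeneralAug entry comes from this page; the other steps
# and their arguments are assumptions made for the example.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadOCRAnnotations', with_text=True),
    # Apply the geometric augmentation to roughly half of the samples.
    dict(
        type='RandomApply',
        prob=0.5,
        transforms=[dict(type='TextRecogGeneralAug')]),
    dict(type='Resize', scale=(128, 32)),
    dict(type='PackTextRecogInputs'),
]
```

Wrapping the transform in a random-apply step keeps a share of undistorted samples in each epoch; whether that helps is a tuning decision for the dataset at hand.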

tia_distort(img, segment=4)

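Randomly distort the image: control points at its corners and along its top and bottom edges are jittered, and the image is warped to follow them with warp_mls().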

Parameters

img (np.ndarray) – The image.
segment (int) – The number of segments to divide the image along the width. Defaults to 4.

Return type

numpy.ndarray
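The control points behind tia_distort can be sketched as follows. distort_control_points is a hypothetical helper written from the structure of the upstream Text-Image-Augmentation code as we understand it: points at the four corners plus segment - 1 pairs along the top and bottom edges, with every destination point jittered at random. The jitter bound cut // 3 is an assumption; the resulting point pairs would then be handed to warp_mls.

```python
import numpy as np


def distort_control_points(img_h, img_w, segment=4, rng=None):
    """Hypothetical helper: build (src_pts, dst_pts) for a distortion warp.

    Points are given as [x, y]. The jitter bound (cut // 3) is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    cut = img_w // segment
    thresh = max(cut // 3, 1)  # keep the random range non-empty
    half = thresh / 2

    src = [[0, 0], [img_w, 0], [img_w, img_h], [0, img_h]]
    # Each corner moves inwards by less than `thresh` pixels on both axes.
    dst = [
        [rng.integers(thresh), rng.integers(thresh)],
        [img_w - rng.integers(thresh), rng.integers(thresh)],
        [img_w - rng.integers(thresh), img_h - rng.integers(thresh)],
        [rng.integers(thresh), img_h - rng.integers(thresh)],
    ]
    for k in range(1, segment):
        x = cut * k
        src += [[x, 0], [x, img_h]]
        # Interior edge points move in x and y, roughly centred on the source.
        dst += [
            [x + rng.integers(thresh) - half, rng.integers(thresh) - half],
            [x + rng.integers(thresh) - half,
             img_h + rng.integers(thresh) - half],
        ]
    return np.array(src, dtype=float), np.array(dst, dtype=float)


src, dst = distort_control_points(32, 100, segment=4,
                                  rng=np.random.default_rng(0))
print(src.shape, dst.shape)  # (10, 2) (10, 2)
```

A larger segment adds more control points along the width, so the warp bends the text more locally.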

tia_perspective(img)

Image perspective transformation.

Parameters

img (np.ndarray) – The image.

Return type

numpy.ndarray
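A similar hypothetical helper for tia_perspective, assuming that only the four corners act as control points and that each moves vertically by less than img_h // 2 (an assumed bound) while its x coordinate stays fixed. Because each corner moves independently, the left and right edges shrink by different amounts, which approximates viewing the text at an angle.

```python
import numpy as np


def perspective_control_points(img_h, img_w, rng=None):
    """Hypothetical helper: (src_pts, dst_pts) for a perspective-like warp.

    Top corners move down and bottom corners move up by random amounts;
    x coordinates are unchanged. The bound img_h // 2 is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    thresh = max(img_h // 2, 1)
    src = [[0, 0], [img_w, 0], [img_w, img_h], [0, img_h]]
    dst = [
        [0, rng.integers(thresh)],
        [img_w, rng.integers(thresh)],
        [img_w, img_h - rng.integers(thresh)],
        [0, img_h - rng.integers(thresh)],
    ]
    return np.array(src, dtype=float), np.array(dst, dtype=float)


src, dst = perspective_control_points(32, 100, rng=np.random.default_rng(1))
```

No segment argument is needed here, since the warp is driven by the corners alone.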

tia_stretch(img, segment=4)

Randomly stretch or squeeze segments of the image along its width.

Parameters

img (np.ndarray) – The image.
segment (int) – The number of segments to divide the image along the width. Defaults to 4.

Return type

numpy.ndarray
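For tia_stretch, a hypothetical sketch under the assumption that the corners stay fixed and each interior cut line is shifted horizontally by one random offset shared by its top and bottom points (bound cut * 4 // 5, also assumed). Neighbouring segments are therefore widened or narrowed while the image height is untouched.

```python
import numpy as np


def stretch_control_points(img_h, img_w, segment=4, rng=None):
    """Hypothetical helper: (src_pts, dst_pts) for a stretching warp.

    Corners are fixed; interior cut lines move along x only.
    The bound cut * 4 // 5 is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    cut = img_w // segment
    thresh = max(cut * 4 // 5, 1)
    half = thresh / 2
    corners = [[0, 0], [img_w, 0], [img_w, img_h], [0, img_h]]
    src, dst = corners.copy(), corners.copy()
    for k in range(1, segment):
        x = cut * k
        move = rng.integers(thresh) - half  # one shift for the whole line
        src += [[x, 0], [x, img_h]]
        dst += [[x + move, 0], [x + move, img_h]]
    return np.array(src, dtype=float), np.array(dst, dtype=float)


src, dst = stretch_control_points(32, 100, segment=5,
                                  rng=np.random.default_rng(2))
```

Sharing one offset between the top and bottom point keeps each cut line vertical, which is what separates stretching from the distortion above.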

transform(results)

Call function to apply the distortion, stretching, and perspective transforms to the image.

Parameters: results (dict) – Result dict from loading pipeline.
Returns: Updated result dict.
Return type: dict
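A sketch of the control flow of transform(), written against hypothetical callables so that it runs without MMOCR. The order (distort, then stretch, then perspective), the minimum image sizes that gate each step (20 and 5 pixels), and the segment range passed to the first two warps are assumptions, not guarantees about this class. Only the updated keys, img and img_shape, come from the documentation above.

```python
import random
from typing import Callable, Dict

import numpy as np

# Hypothetical stand-ins for tia_distort / tia_stretch and tia_perspective.
SegmentWarp = Callable[[np.ndarray, int], np.ndarray]
Warp = Callable[[np.ndarray], np.ndarray]


def general_aug(results: Dict, distort: SegmentWarp, stretch: SegmentWarp,
                perspective: Warp) -> Dict:
    img = results['img']
    h, w = img.shape[:2]
    if h >= 20 and w >= 20:  # assumed minimum size for segment-based warps
        img = distort(img, random.randint(3, 6))  # assumed segment range
        img = stretch(img, random.randint(3, 6))
    h, w = img.shape[:2]
    if h >= 5 and w >= 5:  # assumed minimum size for the perspective warp
        img = perspective(img)
    results['img'] = img
    # img_shape is a modified key, so it must track the new image size.
    results['img_shape'] = img.shape[:2]
    return results


def keep(img, segment=None):
    """Identity warp used only for the usage example."""
    return img


out = general_aug({'img': np.zeros((32, 100, 3), np.uint8)}, keep, keep, keep)
print(out['img_shape'])  # (32, 100)
```

Re-reading the shape after the first two warps matters if a warp changes the output size; the size gates keep very small crops from being split into segments only a few pixels wide.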

warp_mls(src, src_pts, dst_pts, dst_w, dst_h, trans_ratio=1.0)

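Warp the image with moving least squares (MLS) deformation, so that the control points src_pts are moved to dst_pts.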

Parameters

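src (numpy.ndarray) – The source image.
src_pts (List[int]) – Control points in the source image, each given as [x, y].
dst_pts (List[int]) – Target positions of the control points in the output image, each given as [x, y].
dst_w (int) – Width of the output image.
dst_h (int) – Height of the output image.
trans_ratio (float) – Factor by which the computed displacement is scaled. Defaults to 1.0.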

Return type

numpy.ndarray
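The MLS warp can be illustrated with a small NumPy sketch. It is not the library's code: it uses the affine variant of moving least squares (Schaefer et al., 2006) with inverse squared-distance weights, evaluates the fit at every output pixel (the upstream helper may evaluate it more sparsely and interpolate), and samples with nearest-neighbour lookup. The name mls_affine_warp and the eps parameter are hypothetical; the other arguments mirror warp_mls. The warp is computed backwards: each output pixel looks up where it comes from in the source.

```python
import numpy as np


def mls_affine_warp(src, src_pts, dst_pts, dst_w, dst_h, trans_ratio=1.0,
                    eps=1e-8):
    """Hypothetical affine-MLS backward warp moving src_pts onto dst_pts.

    For every output pixel v, fit the affine map f that best sends
    dst_pts[i] to src_pts[i] under weights 1 / |dst_pts[i] - v|^2, then read
    the source pixel nearest to v + trans_ratio * (f(v) - v).
    """
    p = np.asarray(dst_pts, dtype=float)  # (K, 2) control points, output space
    q = np.asarray(src_pts, dtype=float)  # (K, 2) matching points, source space
    ys, xs = np.mgrid[0:dst_h, 0:dst_w]
    v = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (N, 2)

    # Inverse squared-distance weights; eps keeps them finite at control points.
    w = 1.0 / (((p[None] - v[:, None]) ** 2).sum(-1) + eps)  # (N, K)
    w_sum = w.sum(axis=1, keepdims=True)
    p_star = (w @ p) / w_sum  # per-pixel weighted centroids, (N, 2)
    q_star = (w @ q) / w_sum
    p_hat = p[None] - p_star[:, None]  # (N, K, 2)
    q_hat = q[None] - q_star[:, None]

    # Normal equations of the weighted affine fit: M = A^-1 B for every pixel.
    a = np.einsum('nk,nki,nkj->nij', w, p_hat, p_hat)
    b = np.einsum('nk,nki,nkj->nij', w, p_hat, q_hat)
    m = np.linalg.solve(a, b)  # (N, 2, 2)
    f = np.einsum('ni,nij->nj', v - p_star, m) + q_star

    # Scale the displacement by trans_ratio, then nearest-neighbour sampling.
    loc = v + trans_ratio * (f - v)
    src_h, src_w = src.shape[:2]
    sx = np.clip(np.rint(loc[:, 0]), 0, src_w - 1).astype(int)
    sy = np.clip(np.rint(loc[:, 1]), 0, src_h - 1).astype(int)
    return src[sy, sx].reshape((dst_h, dst_w) + src.shape[2:])


# Usage: moving every control point 3 px to the right translates the content.
img = np.random.default_rng(0).integers(0, 256, size=(32, 100)).astype(np.uint8)
corners = [[0, 0], [100, 0], [100, 32], [0, 32]]
shifted = [[x + 3, y] for x, y in corners]
out = mls_affine_warp(img, corners, shifted, 100, 32)
print(np.array_equal(out[:, 3:], img[:, :-3]))  # True
```

Passing identical src_pts and dst_pts, or setting trans_ratio=0, leaves the image unchanged. Evaluating the fit per pixel needs memory proportional to H * W * K, which is acceptable for text-line crops but is one reason to evaluate MLS on a coarser grid in practice.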