TextRecogDataPreprocessor

class mmocr.models.textrecog.TextRecogDataPreprocessor(mean=None, std=None, pad_size_divisor=1, pad_value=0, bgr_to_rgb=False, rgb_to_bgr=False, batch_augments=None)[source]

Image pre-processor for recognition tasks.

Compared with mmengine.ImgDataPreprocessor:

1. It supports batch augmentations.

2. It additionally appends batch_input_shape and valid_ratio to data_samples, as needed by the recognition task.
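To make the two extra fields concrete, here is a minimal pure-Python sketch. It does not reproduce MMOCR's code: the helper name `batch_meta` is hypothetical, and the definition of valid_ratio used here (the fraction of the padded batch width covered by the real image) is an assumption for illustration, since this page does not give the exact formula.

```python
def batch_meta(img_shapes, batch_input_shape):
    """Build per-sample metadata for a padded batch.

    img_shapes: list of (height, width) of each image before padding.
    batch_input_shape: (height, width) shared by every image after padding.
    The valid_ratio definition below is an assumption, not MMOCR's formula.
    """
    _, batch_w = batch_input_shape
    metas = []
    for _, w in img_shapes:
        metas.append({
            "batch_input_shape": batch_input_shape,
            # Share of the padded width that holds real pixels, capped at 1.
            "valid_ratio": min(1.0, w / batch_w),
        })
    return metas


metas = batch_meta([(32, 50), (32, 100)], (32, 100))
```

Under this assumption, a recognizer could use valid_ratio to ignore the padded columns on the right of each image.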

It performs the following pre-processing steps:

Collate and move data to the target device.
Pad inputs to the maximum size in the current batch with the defined pad_value. The padded size is rounded up so that it is divisible by the defined pad_size_divisor.
Stack the padded inputs into a single batch.
Convert inputs from BGR to RGB if the input shape is (3, H, W).
Normalize images with the defined mean and std.
Apply batch augmentations during training.
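The padding step can be sketched in plain Python. This illustrates the arithmetic only and is not the library implementation: the helper names `padded_batch_shape` and `pad_image` are hypothetical, images are modelled as 2D lists of pixel values, and padding on the bottom and right is an assumption.

```python
import math


def padded_batch_shape(img_shapes, pad_size_divisor=1):
    """Return the (height, width) every image in the batch is padded to."""
    max_h = max(h for h, _ in img_shapes)
    max_w = max(w for _, w in img_shapes)
    # Round each side up to the next multiple of pad_size_divisor.
    pad_h = math.ceil(max_h / pad_size_divisor) * pad_size_divisor
    pad_w = math.ceil(max_w / pad_size_divisor) * pad_size_divisor
    return pad_h, pad_w


def pad_image(img, target_h, target_w, pad_value=0):
    """Pad a 2D list of pixels on the right and bottom (assumed layout)."""
    padded = [row + [pad_value] * (target_w - len(row)) for row in img]
    padded += [[pad_value] * target_w for _ in range(target_h - len(img))]
    return padded


shape = padded_batch_shape([(32, 100), (30, 117)], pad_size_divisor=32)
small = pad_image([[1, 2], [3, 4]], target_h=3, target_w=4, pad_value=0)
```

With pad_size_divisor=32, the widest image (117 pixels) is rounded up to 128, so every image in this batch is padded to 32 x 128 before stacking.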

Parameters

mean (Sequence[Number], optional) – The pixel mean of R, G, B channels. Defaults to None.
std (Sequence[Number], optional) – The pixel standard deviation of R, G, B channels. Defaults to None.
pad_size_divisor (int) – The size of padded image should be divisible by pad_size_divisor. Defaults to 1.
pad_value (Number) – The padded pixel value. Defaults to 0.
bgr_to_rgb (bool) – Whether to convert the image from BGR to RGB. Defaults to False.
rgb_to_bgr (bool) – Whether to convert the image from RGB to BGR. Defaults to False.
batch_augments (list[dict], optional) – Batch-level augmentations. Defaults to None.

Return type

None
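In MMOCR-style Python config files, a data preprocessor is typically declared as a dict whose type key names the class. The fragment below is a sketch under that assumption. The mean and std values are illustrative placeholders, not recommended settings, and where this dict sits inside a full model config is not covered here.

```python
# Illustrative config fragment; numeric values are placeholders.
data_preprocessor = dict(
    type="TextRecogDataPreprocessor",
    mean=[127.5, 127.5, 127.5],  # per-channel mean (illustrative)
    std=[127.5, 127.5, 127.5],   # per-channel std (illustrative)
    pad_size_divisor=1,          # documented default: no extra rounding
    pad_value=0,                 # documented default padding value
    bgr_to_rgb=False,
    rgb_to_bgr=False,
    batch_augments=None,
)
```

Because bgr_to_rgb and rgb_to_bgr describe opposite conversions, enable at most one of them.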

forward(data, training=False)[source]

Perform normalization, padding, and BGR-to-RGB conversion based on BaseDataPreprocessor.

Parameters

data (dict) – Data sampled from dataloader.
training (bool) – Whether to enable training-time augmentation. Defaults to False.

Returns

Data in the same format as the model input.

Return type

dict