STN

class mmocr.models.textrecog.STN(in_channels, resized_image_size=(32, 64), output_image_size=(32, 100), num_control_points=20, margins=[0.05, 0.05], init_cfg=[{'type': 'Xavier', 'layer': 'Conv2d'}, {'type': 'Constant', 'val': 1, 'layer': 'BatchNorm2d'}])

Implements the STN (spatial transformer network) module from ASTER: An Attentional Scene Text Recognizer with Flexible Rectification (https://ieeexplore.ieee.org/abstract/document/8395027/).

Parameters

in_channels (int) – The number of input channels.
resized_image_size (Tuple[int, int]) – The size to which the input image is resized. The input is downsampled to obtain a better rectified result. Defaults to (32, 64).
output_image_size (Tuple[int, int]) – The size of the output image for TPS. Defaults to (32, 100).
num_control_points (int) – The number of control points. Defaults to 20.
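margins (Tuple[float, float]) – The margins between the control points and the top and bottom sides of the image for TPS. Defaults to [0.05, 0.05].
init_cfg (Optional[Union[Dict, List[Dict]]]) – Initialization config. Defaults to Xavier initialization for Conv2d layers and constant initialization (val=1) for BatchNorm2d layers.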
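
As an illustration of how the parameters above fit together, the following is a minimal sketch of an OpenMMLab-style config dict whose `type` key names the class. The value `in_channels=3` is an assumption (an RGB input); the remaining values simply repeat the documented defaults. Where this dict belongs inside a full recognizer config is not covered here.

```python
# Hypothetical config fragment for the STN module.
# in_channels=3 is an assumption (RGB input); all other values
# repeat the defaults from the class signature.
stn_cfg = dict(
    type='STN',
    in_channels=3,
    resized_image_size=(32, 64),   # size the input is downsampled to
    output_image_size=(32, 100),   # size of the rectified output
    num_control_points=20,         # total number of TPS control points
    margins=[0.05, 0.05],          # margins for the control points
)

# Simple sanity checks on the values before handing the dict to a builder.
assert all(0.0 <= m < 0.5 for m in stn_cfg['margins'])
assert stn_cfg['num_control_points'] > 0
```

Omitting `init_cfg`, as above, leaves the default initialization from the signature in effect.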

forward(img)

Forward function of STN.

Parameters: img (Tensor) – The input image tensor.
Returns: The rectified image tensor.
Return type: Tensor

init_stn(stn_fc2)

Initialize the output linear layer of the STN so that the initial source points lie along the top and bottom sides of the image, which helps optimization.

Parameters: stn_fc2 (nn.Linear) – The output linear layer of the STN.
Return type: None
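
The initial point layout described for init_stn can be sketched in plain Python. This is not the library's code: the helper names `initial_control_points` and `flatten_as_bias`, the `margin` argument, and the use of normalized [0, 1] coordinates are all assumptions made for illustration. The idea is that if the layer's weights start at zero and its bias holds the flattened points, the layer initially outputs these points for any input.

```python
def initial_control_points(num_control_points=20, margin=0.05):
    """Return (x, y) points spaced evenly along the top and bottom edges.

    Hypothetical helper: coordinates are normalized to [0, 1], and half of
    the points go on each edge, inset by `margin`.
    """
    if num_control_points < 4 or num_control_points % 2 != 0:
        raise ValueError('num_control_points must be an even number >= 4')
    per_side = num_control_points // 2
    step = (1.0 - 2.0 * margin) / (per_side - 1)
    xs = [margin + i * step for i in range(per_side)]
    top = [(x, margin) for x in xs]
    bottom = [(x, 1.0 - margin) for x in xs]
    return top + bottom


def flatten_as_bias(points):
    """Flatten [(x, y), ...] into [x0, y0, x1, y1, ...] for use as a bias."""
    return [coord for point in points for coord in point]


points = initial_control_points()
bias = flatten_as_bias(points)
```

With the default of 20 control points this gives 10 points per edge and a bias vector of 40 values, matching an output layer that predicts one (x, y) pair per control point.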