PositionAttentionDecoder
- class mmocr.models.textrecog.PositionAttentionDecoder(dictionary, module_loss=None, postprocessor=None, rnn_layers=2, dim_input=512, dim_model=128, max_seq_len=40, mask=True, return_feature=True, encode_value=False, init_cfg=None)
Position attention decoder for RobustScanner.
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition
- Parameters
dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.
module_loss (dict, optional) – Config to build module_loss. Defaults to None.
postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.
rnn_layers (int) – Number of RNN layers. Defaults to 2.
dim_input (int) – Dimension \(D_i\) of the input vector feat. Defaults to 512.
dim_model (int) – Dimension \(D_m\) of the model. Should match the dimension of the encoder output vector out_enc. Defaults to 128.
max_seq_len (int) – Maximum output sequence length \(T\). Defaults to 40.
mask (bool) – Whether to mask input features according to img_meta['valid_ratio']. Defaults to True.
return_feature (bool) – Whether to return the hidden feature instead of logits as the result. Defaults to True.
encode_value (bool) – Whether to use the encoder output out_enc as the value of the attention layer. If False, the original feature feat is used. Defaults to False.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.
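As a rough illustration of how the parameters above fit together, the sketch below writes them out as a plain Python config dict using the documented defaults. It assumes the common MMOCR convention of selecting a class through a `type` key; the `dict_file` path is a hypothetical placeholder, not a file named on this page.

```python
# Hypothetical dictionary config. The dict_file path is a placeholder
# (an assumption for illustration); point it at a real character list.
dictionary = dict(
    type='Dictionary',
    dict_file='path/to/dict.txt',
)

# Decoder config mirroring the documented defaults of
# PositionAttentionDecoder. Every value below is the default listed above.
decoder = dict(
    type='PositionAttentionDecoder',
    dictionary=dictionary,
    rnn_layers=2,
    dim_input=512,        # D_i: channel dimension of feat
    dim_model=128,        # D_m: should match the channels of out_enc
    max_seq_len=40,       # T: maximum output sequence length
    mask=True,            # mask padded regions using valid_ratio
    return_feature=True,  # return hidden features instead of logits
    encode_value=False,   # use feat (not out_enc) as the attention value
)
```

Since dim_model should match the encoder output, changing it here generally means changing the encoder's output dimension as well.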
- forward_test(feat, out_enc, img_metas)
- Parameters
feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).
out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).
img_metas (Sequence[mmocr.structures.textrecog_data_sample.TextRecogDataSample]) – Batch of TextRecogDataSample. The docstring lists this argument as data_samples.
- Returns
Character probabilities of shape \((N, T, C)\) if return_feature=False. Otherwise, the hidden feature before the prediction projection layer, of shape \((N, T, D_m)\).
- Return type
Tensor
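To make the mask option more concrete, here is a minimal pure-Python sketch of how a per-sample valid_ratio could be turned into a column mask over an \(H \times W\) feature map. It assumes valid_ratio is the fraction of the width that holds real image content (the rest being padding); the helper name `valid_ratio_mask` is hypothetical and the decoder's own masking may differ in detail.

```python
import math


def valid_ratio_mask(valid_ratios, height, width):
    """Build one height x width mask per sample.

    A value of 1 marks a padded column (to be ignored by attention),
    and 0 marks a valid column. This is an illustrative assumption,
    not the library's implementation.
    """
    masks = []
    for ratio in valid_ratios:
        # Number of leading columns that contain real content.
        valid_width = min(width, math.ceil(width * ratio))
        row = [0] * valid_width + [1] * (width - valid_width)
        # Every row of the feature map shares the same column mask.
        masks.append([list(row) for _ in range(height)])
    return masks


# Two samples: one unpadded, one padded over half of its width.
masks = valid_ratio_mask([1.0, 0.5], height=2, width=8)
```

With mask=False, no such mask would be built and attention would cover every position of the feature map, padding included.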
- forward_train(feat, out_enc, data_samples)
- Parameters
feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).
out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).
data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing gt_text information. Defaults to None.
- Returns
A raw logit tensor of shape \((N, T, C)\) if return_feature=False. Otherwise, the hidden feature before the prediction projection layer, of shape \((N, T, D_m)\).
- Return type
Tensor
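The output shapes documented for forward_train and forward_test follow the same rule, which the small helper below restates. It is only a bookkeeping sketch: `decoder_output_shape` is a hypothetical name, and \(C\) is assumed to be the number of character classes, which this page does not define explicitly.

```python
def decoder_output_shape(batch_size, num_classes, max_seq_len=40,
                         dim_model=128, return_feature=True):
    """Return the documented (N, T, last_dim) output shape.

    last_dim is D_m when return_feature is True (hidden feature before
    the prediction projection layer) and C otherwise (logits or
    character probabilities).
    """
    last_dim = dim_model if return_feature else num_classes
    return (batch_size, max_seq_len, last_dim)


# Default settings: hidden features of shape (N, T, D_m).
feature_shape = decoder_output_shape(batch_size=4, num_classes=37)

# return_feature=False: per-class outputs of shape (N, T, C).
logit_shape = decoder_output_shape(batch_size=4, num_classes=37,
                                   return_feature=False)
```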