
SequenceAttentionDecoder

class mmocr.models.textrecog.SequenceAttentionDecoder(dictionary, module_loss=None, postprocessor=None, rnn_layers=2, dim_input=512, dim_model=128, max_seq_len=40, mask=True, dropout=0, return_feature=True, encode_value=False, init_cfg=None)

Sequence attention decoder for RobustScanner.

Reference: RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

Parameters
  • dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.

  • module_loss (dict, optional) – Config to build module_loss. Defaults to None.

  • postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.

  • rnn_layers (int) – Number of RNN layers. Defaults to 2.

  • dim_input (int) – Channel dimension \(D_i\) of the input feature feat. Defaults to 512.

  • dim_model (int) – Dimension \(D_m\) of the model. Must equal the channel dimension of the encoder output out_enc. Defaults to 128.

  • max_seq_len (int) – Maximum output sequence length \(T\). Defaults to 40.

  • mask (bool) – Whether to mask input features according to data_sample.valid_ratio. Defaults to True.

  • dropout (float) – Dropout rate for the LSTM layers. Defaults to 0.

  • return_feature (bool) – Whether to return the hidden feature instead of the logits. Defaults to True.

  • encode_value (bool) – Whether to use the encoder output out_enc as the value of the attention layer. If False, the original feature feat is used instead. Defaults to False.

  • init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.

Return type

None
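In MMOCR the decoder is usually built from a config dict rather than instantiated directly. The sketch below assembles such a dict from the constructor arguments documented above. The `Dictionary` keyword arguments and the `dict_file` path are illustrative assumptions, and `value_dim` is a hypothetical helper that only spells out what encode_value implies for the attention values.

```python
# Hypothetical config sketch: decoder keys mirror the constructor signature above.
# The dictionary settings and dict_file path are illustrative assumptions.
dictionary = dict(
    type='Dictionary',
    dict_file='path/to/dict.txt',  # hypothetical path
    with_start=True,
    with_end=True,
    same_start_end=True,
    with_padding=True,
    with_unknown=True,
)

decoder = dict(
    type='SequenceAttentionDecoder',
    dictionary=dictionary,
    rnn_layers=2,
    dim_input=512,       # D_i: channels of `feat`
    dim_model=128,       # D_m: must equal the channels of `out_enc`
    max_seq_len=40,      # T
    mask=True,           # mask padded columns using valid_ratio
    dropout=0,
    return_feature=True,
    encode_value=False,  # use `feat` (not `out_enc`) as attention values
)


def value_dim(cfg: dict) -> int:
    """Channel size of the attention values implied by `encode_value`."""
    return cfg['dim_model'] if cfg['encode_value'] else cfg['dim_input']


print(value_dim(decoder))  # 512
```

Keeping dim_model and the encoder's output channels in sync is the main constraint; the other arguments can be tuned independently.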

forward_test(feat, out_enc, data_samples)
Parameters
  • feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).

  • out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).

  • data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing gt_text information. Defaults to None.

Returns

Character probabilities of shape \((N, T, C)\), where \(T\) is self.max_seq_len and \(C\) is num_classes.

Return type

Tensor
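The relationship between forward_test and forward_test_step can be illustrated with a pure-Python greedy decoding loop. This is a sketch under assumptions, not the library's code: it assumes forward_test fills decode_sequence with a start token, calls forward_test_step once per position up to max_seq_len, writes each step's argmax into the next position, and applies a softmax to the stacked logits. `greedy_decode`, `toy_step`, and `start_idx=0` are hypothetical names and values, not MMOCR API.

```python
import math
from typing import Callable, List

Logits = List[List[float]]  # shape (N, C)


def softmax(row: List[float]) -> List[float]:
    # Numerically stable softmax over the class dimension.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(
    step_fn: Callable[[List[List[int]], int], Logits],
    batch_size: int,
    max_seq_len: int,
    start_idx: int,
) -> List[List[List[float]]]:
    """Greedy step-by-step decoding; returns probabilities of shape (N, T, C).

    step_fn(decode_sequence, current_step) is a hypothetical stand-in for
    forward_test_step and must return one row of C logits per sample.
    """
    # Every position starts as the start token; later positions are overwritten.
    decode_sequence = [[start_idx] * max_seq_len for _ in range(batch_size)]
    step_logits: List[List[List[float]]] = [[] for _ in range(batch_size)]
    for step in range(max_seq_len):
        logits = step_fn(decode_sequence, step)
        for n, row in enumerate(logits):
            step_logits[n].append(row)
            # Feed the most likely token back in as the input for the next step.
            if step < max_seq_len - 1:
                decode_sequence[n][step + 1] = max(range(len(row)), key=row.__getitem__)
    # Convert the stacked logits to per-step character probabilities.
    return [[softmax(row) for row in sample] for sample in step_logits]


# Toy usage: a fake step function that always favours (previous token + 1) % C.
C = 5


def toy_step(seq: List[List[int]], step: int) -> Logits:
    return [[10.0 if c == (s[step] + 1) % C else 0.0 for c in range(C)] for s in seq]


probs = greedy_decode(toy_step, batch_size=2, max_seq_len=4, start_idx=0)
best = [max(range(C), key=row.__getitem__) for row in probs[0]]
print(best)  # [1, 2, 3, 4]
```

Because each step only sees tokens already written into decode_sequence, inference cost grows linearly with max_seq_len, which is why the test path is split into a per-step method.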

forward_test_step(feat, out_enc, decode_sequence, current_step, data_samples)
Parameters
  • feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).

  • out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).

  • decode_sequence (Tensor) – Shape \((N, T)\). Tensor storing the tokens decoded so far.

  • current_step (int) – Current decoding step.

  • data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing gt_text information. Defaults to None.

Returns

Shape \((N, C)\). Logits of the predicted tokens at the current time step.

Return type

Tensor
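The attention inside a single decoding step, and the effect of the mask option, can be sketched as scaled dot-product attention over the \(H \times W\) flattened feature positions, with columns past each sample's valid width masked out. This pure-Python illustration rests on assumptions not stated on this page: the query stands in for the LSTM output at current_step, keys come from out_enc, values come from feat or out_enc depending on encode_value, a missing valid_ratio is treated as 1.0, and the valid width is taken as \(\min(W, \lceil W \cdot \text{valid\_ratio} \rceil)\). `valid_width` and `attend` are hypothetical helpers.

```python
import math
from typing import List, Tuple


def valid_width(width: int, valid_ratio: float) -> int:
    # Assumed rounding rule: keep ceil(W * valid_ratio) columns, capped at W.
    return min(width, math.ceil(width * valid_ratio))


def attend(
    query: List[float],
    keys: List[List[float]],
    values: List[List[float]],
    width: int,
    valid_ratio: float = 1.0,
    mask: bool = True,
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one query over H*W flattened positions.

    Position p corresponds to row p // width and column p % width.
    Returns (attended value vector, attention weights).
    """
    d = len(query)
    limit = valid_width(width, valid_ratio) if mask else width
    scores = []
    for p, k in enumerate(keys):
        s = sum(q * kv for q, kv in zip(query, k)) / math.sqrt(d)
        # Padded columns get -inf so their softmax weight becomes exactly 0.
        scores.append(s if p % width < limit else -math.inf)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of values; value width is D_i or D_m depending on encode_value.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


# One image row (H=1) of width W=4 whose right half is padding (valid_ratio=0.5).
keys = [[1.0, 0.0]] * 4
values = [[1.0], [2.0], [3.0], [4.0]]
out, weights = attend([1.0, 0.0], keys, values, width=4, valid_ratio=0.5)
print(weights)  # [0.5, 0.5, 0.0, 0.0]
print(out)      # [1.5]
```

Masking by column rather than by flattened index matters because padding added to batch variable-width images lies on the right side of every row.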

forward_train(feat, out_enc, data_samples=None)
Parameters
  • feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).

  • out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).

  • targets_dict (dict) – A dict with the key padded_targets, a tensor of shape \((N, T)\). Each element is the index of a character.

  • data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing gt_text information. Defaults to None.

Returns

A raw logit tensor of shape \((N, T, C)\) if return_feature=False. Otherwise, the hidden feature before the prediction projection layer, of shape \((N, T, D_m)\).

Return type

Tensor
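The two possible return shapes of forward_train differ only by the final prediction projection. The sketch below assumes that projection is a single linear layer mapping \(D_m\) to \(C\), applied independently at every time step; `project` and `train_output` are hypothetical helpers, and the weights are toy values rather than learned parameters.

```python
from typing import List

Matrix = List[List[float]]


def project(features: List[Matrix], weight: Matrix, bias: List[float]) -> List[Matrix]:
    """Hypothetical linear layer D_m -> C applied at every (n, t) position.

    features: (N, T, D_m); weight: (C, D_m); bias: (C,). Returns (N, T, C).
    """
    return [
        [[sum(w * x for w, x in zip(w_row, vec)) + b for w_row, b in zip(weight, bias)]
         for vec in sample]
        for sample in features
    ]


def train_output(features, weight, bias, return_feature=True):
    # return_feature=True: hand back the (N, T, D_m) hidden features unchanged.
    # return_feature=False: apply the prediction projection to get (N, T, C) logits.
    return features if return_feature else project(features, weight, bias)


feats = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]  # N=1, T=2, D_m=3
weight = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]   # C=2
bias = [0.0, 0.5]
print(train_output(feats, weight, bias))                        # unchanged features
print(train_output(feats, weight, bias, return_feature=False))  # [[[1.0, 2.5], [1.0, 0.5]]]
```

Returning the pre-projection features is useful when another module combines this decoder's output with other features before classifying.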
