MasterDecoder

class mmocr.models.textrecog.MasterDecoder(n_layers=3, n_head=8, d_model=512, feat_size=240, d_inner=2048, attn_drop=0.0, ffn_drop=0.0, feat_pe_drop=0.2, module_loss=None, postprocessor=None, dictionary=None, max_seq_len=30, init_cfg=None)[source]

Decoder module in MASTER.

Code is partially modified from https://github.com/wenwenyu/MASTER-pytorch.

Parameters
  • n_layers (int) – Number of attention layers. Defaults to 3.

  • n_head (int) – Number of parallel attention heads. Defaults to 8.

  • d_model (int) – Dimension \(E\) of the input from the previous model. Defaults to 512.

  • feat_size (int) – The size of the input feature map from the previous model, usually \(H * W\). Defaults to 240 (i.e. 6 * 40).

  • d_inner (int) – Hidden dimension of feedforward layers. Defaults to 2048.

  • attn_drop (float) – Dropout rate of the attention layer. Defaults to 0.

  • ffn_drop (float) – Dropout rate of the feedforward layer. Defaults to 0.

  • feat_pe_drop (float) – Dropout rate of the feature positional encoding layer. Defaults to 0.2.

  • dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary. Defaults to None.

  • module_loss (dict, optional) – Config to build module_loss. Defaults to None.

  • postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.

  • max_seq_len (int) – Maximum output sequence length \(T\). Defaults to 30.

  • init_cfg (dict or list[dict], optional) – Initialization configs.
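The parameters above are typically supplied through MMOCR's registry-style dict configs. The fragment below is an illustrative sketch only: the key names mirror the parameter list, but the `dictionary` sub-config and its `dict_file` path are placeholder assumptions, not values from this page.

```python
# Hypothetical config fragment for a MasterDecoder. Keys follow the
# parameter list above; the dictionary config is an assumed example.
decoder_cfg = dict(
    type='MasterDecoder',
    n_layers=3,
    n_head=8,
    d_model=512,
    feat_size=6 * 40,   # H * W of the encoder feature map
    d_inner=2048,
    attn_drop=0.0,
    ffn_drop=0.0,
    feat_pe_drop=0.2,
    max_seq_len=30,
    dictionary=dict(
        type='Dictionary',
        dict_file='path/to/dict_file.txt',  # placeholder path
    ),
)
```

Such a dict would normally be passed to MMOCR's model builder rather than instantiated by hand.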

decode(tgt_seq, feature, src_mask, tgt_mask)[source]

Decode the input sequence.

Parameters
  • tgt_seq (Tensor) – Target sequence of shape \((N, T, C)\).

  • feature (Tensor) – Input feature map from the encoder, of shape \((N, C, H, W)\).

  • src_mask (BoolTensor) – The source mask of shape \((N, H * W)\).

  • tgt_mask (BoolTensor) – The target mask of shape \((N, T, T)\).

Returns

The decoded sequence.

Return type

Tensor

forward_test(feat=None, out_enc=None, data_samples=None)[source]

Forward for testing.

Parameters
  • feat (Tensor, optional) – Input feature map from backbone.

  • out_enc (Tensor) – Unused.

  • data_samples (list[TextRecogDataSample]) – Unused.

Returns

Character probabilities of shape \((N, self.max\_seq\_len, C)\), where \(C\) is num_classes.

Return type

Tensor
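At test time, sequence decoders of this kind typically decode greedily and autoregressively: starting from a start-of-sequence token, the decoder is run repeatedly, the most probable character is appended, and the loop stops after max_seq_len steps. The sketch below illustrates that loop with a toy scorer; it is a conceptual sketch, not MMOCR's actual implementation, and `step_fn`/`toy_step` are hypothetical names.

```python
# Illustrative greedy autoregressive decoding loop (assumed behaviour,
# not MMOCR source). step_fn maps the current token sequence to a list
# of per-class scores for the next position.
def greedy_decode(step_fn, sos_idx, max_seq_len):
    seq = [sos_idx]
    for _ in range(max_seq_len):
        scores = step_fn(seq)
        # Pick the class with the highest score (greedy choice).
        next_tok = max(range(len(scores)), key=scores.__getitem__)
        seq.append(next_tok)
    return seq[1:]  # drop the start-of-sequence token

# Toy scorer over 3 classes: the "best" class cycles with sequence length.
def toy_step(seq):
    best = len(seq) % 3
    return [1.0 if c == best else 0.0 for c in range(3)]
```

With `toy_step`, `greedy_decode(toy_step, sos_idx=0, max_seq_len=4)` walks the cycle deterministically, which makes the loop easy to verify.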

forward_train(feat=None, out_enc=None, data_samples=None)[source]

Forward for training. Source mask will not be used here.

Parameters
  • feat (Tensor, optional) – Input feature map from backbone.

  • out_enc (Tensor) – Unused.

  • data_samples (list[TextRecogDataSample]) – Batch of TextRecogDataSample, containing gt_text and valid_ratio information.

Returns

The raw logit tensor of shape \((N, T, C)\), where \(C\) is num_classes.

Return type

Tensor
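During training, decoders like this usually apply teacher forcing: the ground-truth character indices (from gt_text) are prefixed with a start token, padded to max_seq_len, and decoded in a single parallel pass. The helper below sketches that target preparation under those assumptions; the function name and token indices are illustrative, not MMOCR API.

```python
# Assumed teacher-forcing target preparation (illustrative sketch).
# gt_indices: nested list of ground-truth character indices, one list
# per sample. Returns a [N, max_seq_len] = [N, T] batch of token ids.
def prepare_targets(gt_indices, sos_idx, pad_idx, max_seq_len):
    batch = []
    for seq in gt_indices:
        # Prefix with SOS, then truncate so the row fits in max_seq_len.
        padded = [sos_idx] + list(seq[:max_seq_len - 1])
        # Right-pad with the padding index up to max_seq_len.
        padded += [pad_idx] * (max_seq_len - len(padded))
        batch.append(padded)
    return batch
```

Each row of the result would serve as `tgt_seq` for a single decode pass producing the \((N, T, C)\) logits described above.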

make_target_mask(tgt, device)[source]

Make target mask for self attention.

Parameters
  • tgt (Tensor) – Target sequence of shape \((N, l_{tgt})\).

  • device (torch.device) – Mask device.

Returns

Mask of shape \((N * self.n\_head, l_{tgt}, l_{tgt})\).

Return type

Tensor
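A plain-Python sketch of what such a mask plausibly looks like, assuming it combines the standard causal (lower-triangular) mask with a padding mask and repeats the result once per attention head, as is conventional for Transformer decoders; the exact MMOCR implementation may differ.

```python
# Illustrative target-mask construction (assumed semantics). tgt is a
# nested list of token ids with shape [N, l_tgt]; the result is a
# boolean mask of shape [N * n_head, l_tgt, l_tgt] where True means
# "position i may attend to position j".
def make_target_mask(tgt, n_head, pad_idx=0):
    masks = []
    for seq in tgt:
        length = len(seq)
        not_pad = [tok != pad_idx for tok in seq]
        # Causal constraint (j <= i) combined with the padding mask.
        m = [[not_pad[j] and j <= i for j in range(length)]
             for i in range(length)]
        # One copy of the per-sample mask for every attention head.
        masks.extend([m] * n_head)
    return masks
```

For a single sequence `[5, 7, 0]` with `pad_idx=0`, the mask is lower-triangular with the padded third column forced to False in every row.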
