NRTRDecoder

class mmocr.models.textrecog.NRTRDecoder(n_layers=6, d_embedding=512, n_head=8, d_k=64, d_v=64, d_model=512, d_inner=256, n_position=200, dropout=0.1, module_loss=None, postprocessor=None, dictionary=None, max_seq_len=30, init_cfg=None)[source]

Transformer decoder block with a self-attention mechanism.

Parameters
  • n_layers (int) – Number of attention layers. Defaults to 6.

  • d_embedding (int) – Language embedding dimension. Defaults to 512.

  • n_head (int) – Number of parallel attention heads. Defaults to 8.

  • d_k (int) – Dimension of the key vector. Defaults to 64.

  • d_v (int) – Dimension of the value vector. Defaults to 64.

  • d_model (int) – Dimension \(D_m\) of the input from previous model. Defaults to 512.

  • d_inner (int) – Hidden dimension of feedforward layers. Defaults to 256.

  • n_position (int) – Length of the positional encoding vector. Must be greater than max_seq_len. Defaults to 200.

  • dropout (float) – Dropout rate for text embedding, MHSA, FFN. Defaults to 0.1.

  • module_loss (dict, optional) – Config to build module_loss. Defaults to None.

  • postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.

  • dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.

  • max_seq_len (int) – Maximum output sequence length \(T\). Defaults to 30.

  • init_cfg (dict or list[dict], optional) – Initialization configs.

Return type

None
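
The documented defaults and the `n_position` constraint can be summarized in a small sketch. The dictionary below only restates the values from the signature above; `check_decoder_cfg` is a hypothetical helper written for illustration and is not part of MMOCR.

```python
# Defaults copied from the NRTRDecoder signature above.
nrtr_decoder_defaults = dict(
    n_layers=6,
    d_embedding=512,
    n_head=8,
    d_k=64,
    d_v=64,
    d_model=512,
    d_inner=256,
    n_position=200,
    dropout=0.1,
    max_seq_len=30,
)


def check_decoder_cfg(cfg):
    """Hypothetical helper: enforce the documented n_position constraint."""
    if cfg["n_position"] <= cfg["max_seq_len"]:
        raise ValueError(
            f"n_position ({cfg['n_position']}) must be greater than "
            f"max_seq_len ({cfg['max_seq_len']})"
        )
    return cfg


# The defaults satisfy the constraint (200 > 30), so this does not raise.
check_decoder_cfg(nrtr_decoder_defaults)
```

With the defaults, `n_head * d_k` equals `d_model` (8 × 64 = 512), which is worth keeping in mind when changing any of the three values.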

forward_test(feat=None, out_enc=None, data_samples=None)[source]

Forward for testing.

Parameters
  • feat (Tensor, optional) – Unused.

  • out_enc (Tensor) – Encoder output of shape \((N, T, D_m)\) where \(D_m\) is d_model. Defaults to None.

  • data_samples (list[TextRecogDataSample]) – Batch of TextRecogDataSample, containing gt_text and valid_ratio information. Defaults to None.

Returns

Character probabilities of shape \((N, self.max_seq_len, C)\) where \(C\) is num_classes.

Return type

Tensor
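
To illustrate how an output of shape \((N, self.max_seq_len, C)\) can be read, the sketch below picks the most probable class at every step. It uses nested Python lists in place of a Tensor, and `greedy_indices` is a hypothetical stand-in written for this page; in practice the configured postprocessor is expected to turn the probabilities into text.

```python
def greedy_indices(probs):
    """Return, for each sample, the most probable class index at every step.

    probs: nested lists shaped (N, T, C), standing in for the output Tensor.
    """
    return [
        [max(range(len(step)), key=step.__getitem__) for step in sample]
        for sample in probs
    ]


# Toy batch: N=1 sample, T=3 steps, C=4 classes.
toy_probs = [[
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.05, 0.05, 0.1, 0.8],
]]
print(greedy_indices(toy_probs))  # [[1, 0, 3]]
```

Mapping the resulting indices back to characters depends on the dictionary passed to the decoder and is not shown here.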

forward_train(feat=None, out_enc=None, data_samples=None)[source]

Forward for training. Source mask will be used here.

Parameters
  • feat (Tensor, optional) – Unused.

  • out_enc (Tensor) – Encoder output of shape \((N, T, D_m)\) where \(D_m\) is d_model. Defaults to None.

  • data_samples (list[TextRecogDataSample]) – Batch of TextRecogDataSample, containing gt_text and valid_ratio information. Defaults to None.

Returns

The raw logit tensor. Shape \((N, T, C)\) where \(C\) is num_classes.

Return type

Tensor
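
The docstring does not spell out how the source mask is built. One plausible reading, given that each data sample carries a valid_ratio, is that the mask marks the leading fraction of the \(T\) encoder steps as valid. The sketch below works under that assumption only; `source_mask` and its rounding rule are hypothetical and should be checked against the linked source.

```python
import math


def source_mask(valid_ratios, seq_len):
    """Build one 0/1 mask row per sample; 1 marks a valid encoder step."""
    masks = []
    for ratio in valid_ratios:
        # Assumed rule: round the valid width up and clamp it to seq_len.
        valid_width = min(seq_len, math.ceil(seq_len * ratio))
        masks.append([1] * valid_width + [0] * (seq_len - valid_width))
    return masks


# Two samples with T=8: the first is fully valid, the second half padded.
print(source_mask([1.0, 0.5], seq_len=8))
```

A mask of this kind lets attention over the encoder output skip positions that correspond to padding rather than to image content.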
