ABIVisionDecoder

class mmocr.models.textrecog.ABIVisionDecoder(dictionary, in_channels=512, num_channels=64, attn_height=8, attn_width=32, attn_mode='nearest', module_loss=None, postprocessor=None, max_seq_len=40, init_cfg={'layer': 'Conv2d', 'type': 'Xavier'}, **kwargs)[source]

Converts visual features into text characters.

Implementation of VisionEncoder in ABINet.

Parameters
  • dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.

  • in_channels (int) – Number of channels \(E\) of input vector. Defaults to 512.

  • num_channels (int) – Number of channels of hidden vectors in mini U-Net. Defaults to 64.

  • attn_height (int) – Height \(H\) of input image features. Defaults to 8.

  • attn_width (int) – Width \(W\) of input image features. Defaults to 32.

  • attn_mode (str) – Upsampling mode for torch.nn.Upsample in mini U-Net. Defaults to 'nearest'.

  • module_loss (dict, optional) – Config to build loss. Defaults to None.

  • postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.

  • max_seq_len (int) – Maximum sequence length. The sequence is usually generated by the decoder. Defaults to 40.

  • init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to dict(type='Xavier', layer='Conv2d').
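
As an illustration of how the parameters above fit together, here is a minimal config sketch that spells out the documented defaults. The variable name and the dict_file path are hypothetical placeholders, and a real config would normally also supply module_loss and postprocessor entries, which are left at their None defaults here.

```python
# Sketch of a decoder config built from the documented defaults.
# Assumption: the dictionary is given as a config dict; the dict_file
# path below is a hypothetical placeholder, not a real file.
abi_vision_decoder = dict(
    type='ABIVisionDecoder',
    dictionary=dict(
        type='Dictionary',
        dict_file='path/to/dict.txt',  # hypothetical placeholder path
    ),
    in_channels=512,      # E: channels of the input feature map
    num_channels=64,      # hidden channels of the mini U-Net
    attn_height=8,        # H: height of the input feature map
    attn_width=32,        # W: width of the input feature map
    attn_mode='nearest',  # upsampling mode for torch.nn.Upsample
    max_seq_len=40,       # T: maximum decoded sequence length
    init_cfg=dict(type='Xavier', layer='Conv2d'),
)

# Number of spatial positions each attention map covers (H * W).
num_positions = (
    abi_vision_decoder['attn_height'] * abi_vision_decoder['attn_width']
)
```

With the defaults, each attention map spans 8 x 32 spatial positions, which matches the (N, T, H, W) shape reported for attn_scores.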

Return type

None

forward_test(feat=None, out_enc=None, data_samples=None)[source]
Parameters
  • feat (torch.Tensor, optional) – Image features of shape (N, E, H, W). Defaults to None.

  • out_enc (torch.Tensor, optional) – Encoder output. Defaults to None.

  • data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing gt_text information. Defaults to None.

Returns

A dict with keys feature, logits and attn_scores.

  • feature (Tensor): Shape (N, T, E). Raw visual features for language decoder.

  • logits (Tensor): Shape (N, T, C). The raw logits for characters.

  • attn_scores (Tensor): Shape (N, T, H, W). Intermediate result for vision-language aligner.

Return type

dict
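
To make the (N, T, C) layout of logits concrete, the following is a rough sketch of greedy decoding for a single sample, written in plain Python rather than with MMOCR's own postprocessor. The function name, the toy character set and the end-token index are all hypothetical; in practice the configured postprocessor and Dictionary handle this step, and their behavior may differ in detail.

```python
from typing import Sequence


def greedy_decode(
    logits: Sequence[Sequence[float]], charset: str, end_idx: int
) -> str:
    """Greedily decode one sample's logits of shape (T, C).

    Hypothetical helper for illustration only: picks the highest-scoring
    class at each time step and stops at the first end token.
    """
    chars = []
    for step_scores in logits:
        # Index of the largest logit at this time step.
        best = max(range(len(step_scores)), key=step_scores.__getitem__)
        if best == end_idx:
            break
        if best < len(charset):
            chars.append(charset[best])
    return ''.join(chars)


# Toy logits with T=4 steps and C=4 classes: 'a', 'b', 'c', <end>.
toy_logits = [
    [0.1, 2.0, 0.3, 0.0],  # argmax is index 1, 'b'
    [1.5, 0.2, 0.1, 0.0],  # argmax is index 0, 'a'
    [0.0, 0.1, 0.2, 3.0],  # argmax is index 3, the end token
    [9.0, 0.0, 0.0, 0.0],  # ignored, decoding already stopped
]
decoded = greedy_decode(toy_logits, charset='abc', end_idx=3)
```

Applied to a batch, the same loop would run once per sample along the N dimension.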

forward_train(feat=None, out_enc=None, data_samples=None)[source]
Parameters
  • feat (torch.Tensor, optional) – Image features of shape (N, E, H, W). Defaults to None.

  • out_enc (torch.Tensor, optional) – Encoder output. Defaults to None.

  • data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing gt_text information. Defaults to None.

Returns

A dict with keys feature, logits and attn_scores.

  • feature (Tensor): Shape (N, T, E). Raw visual features for language decoder.

  • logits (Tensor): Shape (N, T, C). The raw logits for characters.

  • attn_scores (Tensor): Shape (N, T, H, W). Intermediate result for vision-language aligner.

Return type

dict
