ABIFuser

class mmocr.models.textrecog.ABIFuser(dictionary, vision_decoder, language_decoder=None, d_model=512, num_iters=1, max_seq_len=40, module_loss=None, postprocessor=None, init_cfg=None, **kwargs)[source]

A special decoder responsible for mixing and aligning visual and linguistic features, as used in ABINet.

Parameters
  • dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary. The dictionary must have an end token.

  • vision_decoder (dict) – The config for vision decoder.

  • language_decoder (dict, optional) – The config for language decoder.

  • num_iters (int) – Rounds of iterative correction. Defaults to 1.

  • d_model (int) – Hidden size \(E\) of model. Defaults to 512.

  • max_seq_len (int) – Maximum sequence length \(T\). The sequence is usually generated from decoder. Defaults to 40.

  • module_loss (dict, optional) – Config to build loss. Defaults to None.

  • postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.

Return type

None
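
The constructor arguments above are normally supplied through a config dict. The following is a minimal sketch of such a config, not a confirmed working one: the nested type names (`Dictionary`, `ABIVisionDecoder`, `ABILanguageDecoder`) and the dictionary file path are assumptions for illustration, and a real vision or language decoder would need its own arguments.

```python
# Illustrative config only. Nested type names and the dict_file path are
# assumptions, not taken from this page.
dictionary = dict(
    type='Dictionary',
    dict_file='path/to/dict.txt',  # hypothetical path
    with_end=True,                 # the dictionary must have an end token
)

decoder = dict(
    type='ABIFuser',
    dictionary=dictionary,
    vision_decoder=dict(type='ABIVisionDecoder'),      # assumed type name
    language_decoder=dict(type='ABILanguageDecoder'),  # assumed; optional
    d_model=512,     # hidden size E
    num_iters=1,     # rounds of iterative correction
    max_seq_len=40,  # maximum sequence length T
    module_loss=None,
    postprocessor=None,
)
```

Leaving `language_decoder` out (or setting it to None) corresponds to the documented default.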

forward_test(feat, logits, data_samples=None)[source]
Parameters
  • feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.

  • logits (Tensor) – Raw language logits. Shape \((N, T, C)\).

  • data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.

Returns

Character probabilities of shape \((N, self.max_seq_len, C)\), where \(C\) is num_classes.

Return type

Tensor
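
As a rough illustration of what "character probabilities" means here, the sketch below turns raw logits into probabilities with a softmax over the class dimension and then picks the most likely character index per position. It does not call `forward_test`; the sizes and the use of softmax over the last dimension are assumptions for the example.

```python
import torch

N, T, C = 2, 40, 38  # illustrative batch size, sequence length, num_classes

logits = torch.randn(N, T, C)          # stand-in for raw logits
probs = torch.softmax(logits, dim=-1)  # probabilities over C classes, (N, T, C)
char_indices = probs.argmax(dim=-1)    # most likely class per position, (N, T)
```

In practice the index-to-character mapping would be handled by the configured postprocessor and dictionary.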

forward_train(feat=None, out_enc=None, data_samples=None)[source]
Parameters
  • feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.

  • out_enc (Tensor) – Raw language logits. Shape \((N, T, C)\). Defaults to None.

  • data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.

Returns

A dict with keys out_vis, out_langs and out_fusers.

  • out_vis (dict): Dict from self.vision_decoder with keys feature, logits and attn_scores.

  • out_langs (dict or list): Dict from self.language_decoder with keys feature, logits if applicable, or an empty list otherwise.

  • out_fusers (dict or list): Dict of fused visual and language features with keys feature, logits if applicable, or an empty list otherwise.

Return type

Dict
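
The structure of the returned dict can be sketched with stub callables. This is an assumption-laden outline of the iterative correction loop, not the library's code: `forward_train_sketch` and the `stub_*` functions are hypothetical names, and the sketch assumes one language and fuser output is collected per iteration.

```python
def forward_train_sketch(vision_decoder, language_decoder, fuse, num_iters,
                         feat=None, out_enc=None, data_samples=None):
    """Outline of the vision -> language -> fuse loop (assumed behavior)."""
    out_vis = vision_decoder(feat, out_enc, data_samples)
    out_langs, out_fusers = [], []
    if language_decoder is not None:
        text_logits = out_vis['logits']
        for _ in range(num_iters):
            # The language decoder refines the current text logits.
            out_lang = language_decoder(feat, text_logits, data_samples)
            out_langs.append(out_lang)
            # Fuse linguistic and visual features, then feed the fused
            # logits into the next round of correction.
            out_fuser = fuse(out_lang['feature'], out_vis['feature'])
            out_fusers.append(out_fuser)
            text_logits = out_fuser['logits']
    return dict(out_vis=out_vis, out_langs=out_langs, out_fusers=out_fusers)


# Hypothetical stubs standing in for the real modules.
def stub_vision(feat, out_enc, data_samples):
    return dict(feature='v_feat', logits='v_logits', attn_scores='attn')


def stub_language(feat, logits, data_samples):
    return dict(feature='l_feat', logits='l_logits')


def stub_fuse(l_feature, v_feature):
    return dict(logits='fused_logits')


out = forward_train_sketch(stub_vision, stub_language, stub_fuse, num_iters=2)
out_no_lang = forward_train_sketch(stub_vision, None, stub_fuse, num_iters=2)
```

With no language decoder, `out_langs` and `out_fusers` stay empty lists, matching the "empty list otherwise" wording above.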

fuse(l_feature, v_feature)[source]

Mix and align visual feature and linguistic feature.

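Parameters
  • l_feature (torch.Tensor) – Linguistic feature of shape \((N, T, E)\), where \(N\) is the batch size, \(T\) is the sequence length and \(E\) is the model dimension.

  • v_feature (torch.Tensor) – Visual feature of shape \((N, T, E)\), the same shape as l_feature.

Returns

A dict with key logits, of shape \((N, T, C)\), where \(N\) is the batch size, \(T\) is the sequence length and \(C\) is the number of characters.

Return type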

dict
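
One common way to mix two feature streams of the same shape is a learned gate. The sketch below shows such a gated fusion followed by a classifier, under stated assumptions: `GatedFuseSketch`, its layer names and `num_classes=38` are hypothetical and are not confirmed by this page as the internals of `fuse`.

```python
import torch
from torch import nn


class GatedFuseSketch(nn.Module):
    """Hypothetical gated fusion of linguistic and visual features."""

    def __init__(self, d_model=512, num_classes=38):
        super().__init__()
        self.w_att = nn.Linear(2 * d_model, d_model)  # produces the gate
        self.cls = nn.Linear(d_model, num_classes)    # maps E -> C

    def forward(self, l_feature, v_feature):
        f = torch.cat((l_feature, v_feature), dim=2)  # (N, T, 2E)
        gate = torch.sigmoid(self.w_att(f))           # (N, T, E), in (0, 1)
        # Per-element convex combination of the two feature streams.
        fused = gate * v_feature + (1 - gate) * l_feature
        return dict(logits=self.cls(fused))           # logits: (N, T, C)


N, T, E, C = 2, 40, 512, 38
fuser = GatedFuseSketch(d_model=E, num_classes=C)
out = fuser(torch.randn(N, T, E), torch.randn(N, T, E))
print(out['logits'].shape)  # torch.Size([2, 40, 38])
```

The gate lets the model weight visual evidence more heavily at some positions and linguistic context at others, while keeping the output shape \((N, T, C)\) described above.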
