ABIFuser

class mmocr.models.textrecog.ABIFuser(dictionary, vision_decoder, language_decoder=None, d_model=512, num_iters=1, max_seq_len=40, module_loss=None, postprocessor=None, init_cfg=None, **kwargs)

A special decoder responsible for mixing and aligning visual and linguistic features, as used in ABINet.

Parameters

dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary. The dictionary must have an end token.
vision_decoder (dict) – The config for vision decoder.
language_decoder (dict, optional) – The config for language decoder.
num_iters (int) – Rounds of iterative correction. Defaults to 1.
d_model (int) – Hidden size \(E\) of model. Defaults to 512.
max_seq_len (int) – Maximum sequence length \(T\). The sequence is usually generated from decoder. Defaults to 40.
module_loss (dict, optional) – Config to build loss. Defaults to None.
postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.

Return type

None
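The constructor arguments above are normally supplied as a config dict. The following is a minimal sketch of such a config, not a verified recipe: the nested `type` values (`ABIVisionDecoder`, `ABILanguageDecoder`, `Dictionary`), the `dict_file` path and the chosen `num_iters` and `max_seq_len` values are assumptions for illustration. Only the top-level keys come from the signature documented here.

```python
# Hypothetical ABIFuser config; nested keys and values are illustrative assumptions.
dictionary_cfg = dict(
    type="Dictionary",
    dict_file="path/to/dict.txt",  # placeholder path, replace with a real file
    with_end=True,                 # the dictionary must have an end token
)

fuser_cfg = dict(
    type="ABIFuser",
    dictionary=dictionary_cfg,
    vision_decoder=dict(type="ABIVisionDecoder"),      # assumed decoder type
    language_decoder=dict(type="ABILanguageDecoder"),  # optional; may be None
    d_model=512,      # hidden size E
    num_iters=3,      # rounds of iterative correction (default is 1)
    max_seq_len=26,   # maximum sequence length T (default is 40)
    module_loss=None,
    postprocessor=None,
)
```

Leaving `language_decoder` as `None` corresponds to the vision-only case, in which `out_langs` and `out_fusers` are documented as empty lists.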

forward_test(feat, logits, data_samples=None)

Parameters

feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.
logits (Tensor) – Raw language logits. Shape \((N, T, C)\).
data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.

Returns

Character probabilities of shape \((N, self.max_seq_len, C)\), where \(C\) is num_classes.

Return type

Tensor

forward_train(feat=None, out_enc=None, data_samples=None)

Parameters

feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.
out_enc (Tensor) – Raw language logits. Shape \((N, T, C)\). Defaults to None.
data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.

Returns

A dict with keys out_vis, out_langs and out_fusers.

out_vis (dict): Dict from self.vision_decoder with keys feature, logits and attn_scores.
out_langs (dict or list): Dict from self.language_decoder with keys feature, logits if applicable, or an empty list otherwise.
out_fusers (dict or list): Dict of fused visual and language features with keys feature, logits if applicable, or an empty list otherwise.

Return type

Dict
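Because out_langs and out_fusers may be empty lists, code that consumes this dict has to handle the missing stages. The helper below is a hedged sketch of one way to pick the most refined stage available (fused, then language, then vision). The name `pick_final_stage` and the string placeholders standing in for tensors are hypothetical, and the fallback order is an assumption rather than documented behaviour.

```python
def pick_final_stage(outputs):
    """Return (key, stage_dict) for the most refined non-empty stage.

    Assumed preference order: out_fusers, then out_langs, then out_vis.
    A stage may be a dict, or a list of dicts (one per iteration), in
    which case the last entry is used.
    """
    for key in ("out_fusers", "out_langs", "out_vis"):
        stage = outputs.get(key)
        if isinstance(stage, list):
            if not stage:          # empty list: this stage was not run
                continue
            stage = stage[-1]      # take the last correction round
        if stage:
            return key, stage
    raise ValueError("no non-empty stage found in outputs")


# Strings stand in for tensors in this illustration.
vision_only = {
    "out_vis": {"feature": "v_feat", "logits": "v_logits", "attn_scores": "attn"},
    "out_langs": [],
    "out_fusers": [],
}
with_fusion = {
    "out_vis": {"feature": "v_feat", "logits": "v_logits", "attn_scores": "attn"},
    "out_langs": [{"feature": "l1", "logits": "ll1"}, {"feature": "l2", "logits": "ll2"}],
    "out_fusers": [{"feature": "f1", "logits": "fl1"}, {"feature": "f2", "logits": "fl2"}],
}

vision_key, vision_stage = pick_final_stage(vision_only)
fused_key, fused_stage = pick_final_stage(with_fusion)
```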

fuse(l_feature, v_feature)

Mix and align visual feature and linguistic feature.

Parameters

l_feature (torch.Tensor) – Linguistic feature of shape \((N, T, E)\), where \(N\) is batch size, \(T\) is sequence length and \(E\) is the model dimension.
v_feature (torch.Tensor) – Visual feature of shape \((N, T, E)\), the same shape as l_feature.

Returns

A dict with key logits, of shape \((N, T, C)\), where \(N\) is batch size, \(T\) is sequence length and \(C\) is the number of characters.

Return type

dict
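This page does not spell out how the two features are mixed. As an illustration only, the sketch below assumes the gated fusion described for ABINet: a sigmoid gate computed from the concatenated features weights the visual feature against the linguistic one. It works on a single position with plain Python lists. `gated_fuse`, `w_att` and `b_att` are hypothetical names standing in for learned parameters, and the real method operates on \((N, T, E)\) tensors and then projects to \(C\) classes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(l_vec, v_vec, w_att, b_att):
    """Fuse one position's linguistic and visual vectors (each of length E).

    w_att is an E x 2E weight matrix and b_att a length-E bias; both are
    hypothetical stand-ins for learned parameters.
    """
    concat = l_vec + v_vec  # list concatenation, length 2E
    fused = []
    for row, bias, l, v in zip(w_att, b_att, l_vec, v_vec):
        gate = sigmoid(sum(w * x for w, x in zip(row, concat)) + bias)
        # gate close to 1 favours the visual feature, close to 0 the linguistic one
        fused.append(gate * v + (1.0 - gate) * l)
    return fused


E = 2
l_vec = [1.0, 0.0]
v_vec = [0.0, 1.0]
w_zero = [[0.0] * (2 * E) for _ in range(E)]  # zero weights give a gate of 0.5
fused = gated_fuse(l_vec, v_vec, w_zero, [0.0] * E)
```

With zero weights every gate is 0.5, so the result is the plain average of the two inputs; trained weights would shift each dimension toward whichever modality is more reliable.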