ABILanguageDecoder

class mmocr.models.textrecog.ABILanguageDecoder(dictionary, d_model=512, n_head=8, d_inner=2048, n_layers=4, dropout=0.1, detach_tokens=True, use_self_attn=False, max_seq_len=40, module_loss=None, postprocessor=None, init_cfg=None, **kwargs)

Transformer-based language model responsible for spell correction. Implementation of the language model of ABINet.

Parameters

dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary. The dictionary must have an end token.
d_model (int) – Hidden size \(E\) of model. Defaults to 512.
n_head (int) – Number of attention heads in the multi-head attention layers. Defaults to 8.
d_inner (int) – Hidden size of the feedforward network. Defaults to 2048.
n_layers (int) – Number of stacked decoder layers. Defaults to 4.
dropout (float) – Dropout rate. Defaults to 0.1.
detach_tokens (bool) – Whether to block the gradient flow at input tokens. Defaults to True.
use_self_attn (bool) – If True, use self-attention in decoder layers; otherwise, use cross-attention. Defaults to False.
max_seq_len (int) – Maximum sequence length \(T\). The sequence is usually generated from decoder. Defaults to 40.
module_loss (dict, optional) – Config to build loss. Defaults to None.
postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.

Return type

None
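As an illustration of how the parameters above fit together, the following is a minimal config sketch written as a plain Python dict. The numeric values simply repeat the defaults from the signature. The `type='Dictionary'` entry, the `dict_file` and `with_end` keys, and the file path are assumptions for illustration and should be checked against the Dictionary documentation of the installed MMOCR version.

```python
# Hypothetical config fragment; numeric values mirror the signature defaults.
decoder_cfg = dict(
    type='ABILanguageDecoder',
    dictionary=dict(
        type='Dictionary',              # assumed registry name
        dict_file='path/to/dict.txt',   # placeholder path, not from the source
        with_end=True,                  # assumed key; the dictionary must have an end token
    ),
    d_model=512,          # hidden size E
    n_head=8,
    d_inner=2048,
    n_layers=4,
    dropout=0.1,
    detach_tokens=True,
    use_self_attn=False,
    max_seq_len=40,       # maximum sequence length T
)

# Multi-head attention generally requires the hidden size to split evenly
# across heads; the defaults satisfy this (512 / 8 = 64).
head_dim = decoder_cfg['d_model'] // decoder_cfg['n_head']
```

Overriding only the values that differ from the defaults keeps such a config short; they are all spelled out here for reference.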

forward_test(feat=None, logits=None, data_samples=None)

Parameters

feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.
logits (Tensor) – Raw language logits. Shape \((N, T, C)\). Defaults to None.
data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.

Returns

A dict with keys feature and logits.

feature (Tensor): Shape \((N, T, E)\). Raw textual features for vision language aligner.
logits (Tensor): Shape \((N, T, C)\). The raw logits for characters after spell correction.

Return type

Dict
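To make the \((N, T, C)\) layout of the returned logits concrete, here is a small pure-Python sketch of greedy decoding that stops at the end token. It is not MMOCR's postprocessor: `greedy_decode`, the toy `charset`, and the end-token index are hypothetical names and values chosen for illustration, and nested lists stand in for tensors.

```python
def greedy_decode(logits, charset, end_idx):
    """Turn (N, T, C) logits into N strings by per-step argmax.

    Decoding of a sample stops at the first step whose argmax is end_idx.
    """
    texts = []
    for sample in logits:                 # iterate over N
        chars = []
        for step in sample:               # iterate over T
            # Index of the largest logit among the C classes.
            idx = max(range(len(step)), key=step.__getitem__)
            if idx == end_idx:
                break
            chars.append(charset[idx])
        texts.append(''.join(chars))
    return texts


# Toy data: N=2 samples, T=4 steps, C=4 classes (3 characters + end token).
charset = ['a', 'b', 'c', '<EOS>']
logits = [
    [[2.0, 0.1, 0.0, -1.0], [0.0, 3.0, 0.1, 0.2],
     [0.0, 0.0, 0.0, 5.0], [9.0, 0.0, 0.0, 0.0]],
    [[0.1, 0.2, 4.0, 0.0], [0.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
]
texts = greedy_decode(logits, charset, end_idx=3)
print(texts)
```

Steps after the end token are ignored, which is why the requirement that the dictionary has an end token matters for variable-length text.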

forward_train(feat=None, out_enc=None, data_samples=None)

Parameters

feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.
out_enc (torch.Tensor) – Logits with shape \((N, T, C)\). Defaults to None.
data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.

Returns

A dict with keys feature and logits.

feature (Tensor): Shape \((N, T, E)\). Raw textual features for vision language aligner.
logits (Tensor): Shape \((N, T, C)\). The raw logits for characters after spell correction.

Return type

Dict
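The shape bookkeeping between the input logits \((N, T, C)\) and the output features \((N, T, E)\) can be sketched as below. This assumes, in line with the ABINet design, that the logits are first converted to per-step probability vectors and then linearly projected to the hidden size \(E\); the actual module additionally runs the decoder layers, which are omitted here. `softmax`, `project_tokens`, and the toy weight matrix are hypothetical, and nested lists stand in for tensors.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of C logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def project_tokens(logits, weight):
    """Map (N, T, C) logits to (N, T, E) embeddings.

    weight is a (C, E) matrix. Each step's logits become a probability
    vector, which is then multiplied by weight.
    """
    n_out = len(weight[0])
    out = []
    for sample in logits:
        rows = []
        for step in sample:
            probs = softmax(step)
            rows.append([
                sum(p * weight[c][e] for c, p in enumerate(probs))
                for e in range(n_out)
            ])
        out.append(rows)
    return out


# Toy sizes; the documented defaults are T=40 and E=512.
N, T, C, E = 2, 3, 4, 5
logits = [[[float(n + t + c) for c in range(C)] for t in range(T)]
          for n in range(N)]
weight = [[0.1 * (c + e) for e in range(E)] for c in range(C)]

embed = project_tokens(logits, weight)
shape = (len(embed), len(embed[0]), len(embed[0][0]))
print(shape)
```

In a framework with automatic differentiation, `detach_tokens=True` would correspond to cutting the gradient at the probability vectors before the projection, so the language model's loss does not update whatever produced the input logits. Plain Python has no gradients, so that step is not shown.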