ABIFuser
- class mmocr.models.textrecog.ABIFuser(dictionary, vision_decoder, language_decoder=None, d_model=512, num_iters=1, max_seq_len=40, module_loss=None, postprocessor=None, init_cfg=None, **kwargs)
A special decoder responsible for mixing and aligning visual and linguistic features, as used in ABINet.
- Parameters
dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary. The dictionary must have an end token.
vision_decoder (dict) – The config for vision decoder.
language_decoder (dict, optional) – The config for language decoder.
num_iters (int) – Rounds of iterative correction. Defaults to 1.
d_model (int) – Hidden size \(E\) of model. Defaults to 512.
max_seq_len (int) – Maximum sequence length \(T\). The sequence is usually generated from decoder. Defaults to 40.
module_loss (dict, optional) – Config to build loss. Defaults to None.
postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.
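As a rough illustration of how these parameters fit together, the following is a minimal config sketch written as plain Python dicts. Only the top-level keys of the `ABIFuser` dict come from the signature above. The sub-module `type` names (`ABIVisionDecoder`, `ABILanguageDecoder`, `ABIModuleLoss`, `AttentionPostprocessor`), the `with_end` flag, and the dictionary file path are assumptions and should be checked against the installed MMOCR version.

```python
# Hypothetical config sketch. Sub-module type names and their arguments are
# assumptions for illustration, not taken from this page.
dictionary = dict(
    type='Dictionary',
    dict_file='path/to/dict.txt',  # placeholder path
    with_end=True,  # assumed flag; the dictionary must have an end token
)

decoder = dict(
    type='ABIFuser',
    dictionary=dictionary,
    vision_decoder=dict(type='ABIVisionDecoder'),  # assumed type name
    language_decoder=dict(type='ABILanguageDecoder'),  # optional; assumed name
    d_model=512,  # hidden size E
    num_iters=1,  # rounds of iterative correction
    max_seq_len=40,  # maximum sequence length T
    module_loss=dict(type='ABIModuleLoss'),  # assumed type name
    postprocessor=dict(type='AttentionPostprocessor'),  # assumed type name
)
```

Leaving out `language_decoder` (or setting it to `None`) corresponds to the documented default.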
- Return type
- forward_test(feat, logits, data_samples=None)
- Parameters
feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.
logits (Tensor) – Raw language logits. Shape \((N, T, C)\).
data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.
- Returns
Character probabilities of shape \((N, self.max_seq_len, C)\), where \(C\) is num_classes.
- Return type
Tensor
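To make the relation between the input logits and the returned character probabilities concrete, here is a small dependency-free sketch. It assumes that a softmax over the class dimension \(C\) is what turns logits into probabilities, which this page does not state. The helper name `softmax_last_dim` is hypothetical, and nested lists stand in for tensors.

```python
import math


def softmax_last_dim(logits):
    """Apply softmax over the class axis of a nested list shaped (N, T, C)."""
    probs = []
    for sequence in logits:  # iterate over the batch dimension N
        seq_probs = []
        for step in sequence:  # iterate over the time steps T
            peak = max(step)  # subtract the max for numerical stability
            exps = [math.exp(x - peak) for x in step]
            total = sum(exps)
            seq_probs.append([e / total for e in exps])
        probs.append(seq_probs)
    return probs


# Toy input with N=1, T=2, C=3.
logits = [[[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]]
probs = softmax_last_dim(logits)
```

Each innermost list of `probs` sums to 1, and the output keeps the same \((N, T, C)\) layout as the input.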
- forward_train(feat=None, out_enc=None, data_samples=None)
- Parameters
feat (torch.Tensor, optional) – Not required. Feature map placeholder. Defaults to None.
out_enc (Tensor) – Raw language logits. Shape \((N, T, C)\). Defaults to None.
data_samples (list[TextRecogDataSample], optional) – Not required. DataSample placeholder. Defaults to None.
- Returns
A dict with keys out_vis, out_langs and out_fusers.
out_vis (dict): Dict from self.vision_decoder with keys feature, logits and attn_scores.
out_langs (dict or list): Dict from self.language_decoder with keys feature, logits if applicable, or an empty list otherwise.
out_fusers (dict or list): Dict of fused visual and language features with keys feature, logits if applicable, or an empty list otherwise.
- Return type
Dict
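The structure of the returned dict can be sketched with stub callables. This is an illustrative control-flow sketch only, not the library's implementation: the function `forward_train_sketch` and all stubs are hypothetical, strings stand in for tensors, and collecting one entry per correction round in a list is an assumption about how `num_iters` interacts with `out_langs` and `out_fusers`.

```python
def forward_train_sketch(vision_decoder, language_decoder, fuse, num_iters=1):
    """Illustrative control flow only; all callables are hypothetical stubs."""
    out_vis = vision_decoder()
    out_langs, out_fusers = [], []
    if language_decoder is not None:
        text_logits = out_vis['logits']
        for _ in range(num_iters):
            out_lang = language_decoder(text_logits)
            out_fuser = fuse(out_lang['feature'], out_vis['feature'])
            out_langs.append(out_lang)
            out_fusers.append(out_fuser)
            # Feed the fused logits into the next round of correction.
            text_logits = out_fuser['logits']
    return dict(out_vis=out_vis, out_langs=out_langs, out_fusers=out_fusers)


def stub_vision():
    return dict(feature='v_feat', logits='v_logits', attn_scores='attn')


def stub_language(text_logits):
    return dict(feature='l_feat', logits='l_logits(' + text_logits + ')')


def stub_fuse(l_feature, v_feature):
    return dict(feature='fused', logits='f_logits')


with_lang = forward_train_sketch(stub_vision, stub_language, stub_fuse,
                                 num_iters=2)
without_lang = forward_train_sketch(stub_vision, None, stub_fuse)
```

With no language decoder, `out_langs` and `out_fusers` stay empty lists, matching the "empty list otherwise" wording above.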
- fuse(l_feature, v_feature)
Mix and align visual feature and linguistic feature.
- Parameters
l_feature (torch.Tensor) – Linguistic feature of shape \((N, T, E)\), where T is length, N is batch size and E is dim of model.
v_feature (torch.Tensor) – Visual feature of shape \((N, T, E)\), the same shape as l_feature.
- Returns
A dict with key logits, whose value has shape \((N, T, C)\), where N is batch size, T is length and C is the number of characters.
- Return type