SequentialSARDecoder

class mmocr.models.textrecog.SequentialSARDecoder(dictionary=None, module_loss=None, postprocessor=None, enc_bi_rnn=False, dec_bi_rnn=False, dec_gru=False, d_k=64, d_model=512, d_enc=512, pred_dropout=0.0, mask=True, max_seq_len=40, pred_concat=False, init_cfg=None, **kwargs)

Implementation of the sequential decoder module in SAR (https://arxiv.org/abs/1811.00751).

Parameters

dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.
module_loss (dict, optional) – Config to build module_loss. Defaults to None.
postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.
enc_bi_rnn (bool) – If True, use bidirectional RNN in encoder. Defaults to False.
dec_bi_rnn (bool) – If True, use bidirectional RNN in decoder. Defaults to False.
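dec_do_rnn (float) – Dropout of the RNN layer in the decoder. Defaults to 0. Note that dec_do_rnn is not an explicit argument in the signature above; a value passed for it is collected by **kwargs.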
dec_gru (bool) – If True, use GRU, else LSTM in decoder. Defaults to False.
d_k (int) – Dimension of the conv layers in the attention module. Defaults to 64.
d_model (int) – Number of channels in the backbone feature map, \(D_i\). Defaults to 512.
d_enc (int) – Dimension of the encoder RNN layer, \(D_m\). Defaults to 512.
pred_dropout (float) – Dropout probability of prediction layer. Defaults to 0.
max_seq_len (int) – Maximum sequence length during decoding. Defaults to 40.
mask (bool) – If True, mask padding in the feature map. Defaults to True.
pred_concat (bool) – If True, concat glimpse feature from attention with holistic feature and hidden state. Defaults to False.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.
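The mask parameter, together with the valid_ratio carried by data_samples, determines which feature-map columns the attention may look at. The following is a minimal pure-Python illustration for one image at one decoding step, not MMOCR's code. It assumes that the H x W attention scores are normalised by a single softmax and that columns at or beyond ceil(W * valid_ratio) are padding. masked_attention_weights is a hypothetical helper.

```python
import math
from typing import List


def masked_attention_weights(scores: List[List[float]],
                             valid_ratio: float) -> List[List[float]]:
    """Softmax over an H x W grid of attention scores, skipping padding.

    Hypothetical helper. scores holds H rows of W raw scores for one image
    at one decoding step; valid_ratio (assumed > 0) is the fraction of the
    width that contains real, unpadded image content.
    """
    h, w = len(scores), len(scores[0])
    # Columns at or beyond valid_width are treated as padding.
    valid_width = min(w, math.ceil(w * valid_ratio))
    masked = [[s if j < valid_width else float('-inf')
               for j, s in enumerate(row)] for row in scores]
    # One softmax over all H * W positions (2D attention).
    flat = [s for row in masked for s in row]
    peak = max(flat)
    exps = [math.exp(s - peak) for s in flat]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [[exps[i * w + j] / total for j in range(w)] for i in range(h)]


weights = masked_attention_weights([[0.0] * 4, [0.0] * 4], valid_ratio=0.5)
print(weights)  # [[0.25, 0.25, 0.0, 0.0], [0.25, 0.25, 0.0, 0.0]]
```

In this sketch, padded columns receive exactly zero weight, so the attended glimpse draws only on real image content. Skipping the masking line (roughly what mask=False would correspond to) would let padded columns receive weight as well.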

forward_test(feat, out_enc, data_samples=None)

Parameters

feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).
out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).
data_samples (list[TextRecogDataSample]) – Batch of TextRecogDataSample, containing valid_ratio information.

Returns

Character probabilities of shape \((N, self.max_seq_len, C)\), where \(C\) is num_classes.

Return type

Tensor
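At test time there is no gt_text to condition on, so a sequential decoder has to feed its own predictions back in one step at a time. The following is a pure-Python illustration of that greedy loop under stated assumptions: argmax feedback is assumed, step_fn and toy_step are hypothetical stand-ins for the real attention-plus-RNN step, and start_idx is an illustrative start-token index. Stacking the per-step rows gives one sample's \((self.max\_seq\_len, C)\) slice of the output described above.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(step_fn: Callable[[int, int], List[float]],
                  start_idx: int,
                  max_seq_len: int) -> List[List[float]]:
    """Run max_seq_len decoding steps, feeding back the argmax each time.

    step_fn(prev_idx, t) is a hypothetical stand-in for one attention + RNN
    step; it returns raw class scores (length C) for step t.
    Returns a (max_seq_len, C) list of per-step class probabilities.
    """
    probs: List[List[float]] = []
    prev = start_idx
    for t in range(max_seq_len):
        step_probs = softmax(step_fn(prev, t))
        probs.append(step_probs)
        # Feed the most likely class back in as the next step's input.
        prev = max(range(len(step_probs)), key=step_probs.__getitem__)
    return probs


def toy_step(prev: int, t: int) -> List[float]:
    """Hypothetical toy step: always favours class (prev + 1) mod 4."""
    return [5.0 if c == (prev + 1) % 4 else 0.0 for c in range(4)]


out = greedy_decode(toy_step, start_idx=0, max_seq_len=3)
print([row.index(max(row)) for row in out])  # [1, 2, 3]
```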

forward_train(feat, out_enc, data_samples=None)

Parameters

feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).
out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).
data_samples (list[TextRecogDataSample]) – Batch of TextRecogDataSample, containing gt_text and valid_ratio information.

Returns

A raw logit tensor of shape \((N, T, C)\), where \(T\) is the number of decoding steps and \(C\) is num_classes.

Return type

Tensor
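During training, data_samples supply gt_text, so the decoder can be fed ground-truth characters instead of its own predictions (teacher forcing). The following pure-Python code shows only the input/target pairing that this implies. The index layout (start token first, padding at the end) and the concrete index values are assumptions made for illustration, and teacher_forcing_pairs is a hypothetical helper, not an MMOCR function.

```python
from typing import List, Tuple


def teacher_forcing_pairs(padded_indexes: List[int],
                          max_seq_len: int) -> List[Tuple[int, int]]:
    """Return (decoder input, target) index pairs for each training step.

    Hypothetical helper. padded_indexes is assumed to begin with a start
    token and to hold at least max_seq_len + 1 entries. At step t the
    decoder input is ground-truth token t and the target is token t + 1.
    """
    if len(padded_indexes) < max_seq_len + 1:
        raise ValueError('padded_indexes is too short for max_seq_len')
    return [(padded_indexes[t], padded_indexes[t + 1])
            for t in range(max_seq_len)]


# Illustrative indexes: START=0, END=1, PAD=2, 'c'=5, 'a'=3, 't'=7.
cat = [0, 5, 3, 7, 1, 2]
pairs = teacher_forcing_pairs(cat, max_seq_len=5)
print(pairs)  # [(0, 5), (5, 3), (3, 7), (7, 1), (1, 2)]
```

In this sketch, each step yields one row of class scores, which corresponds to the \(T\) dimension of the \((N, T, C)\) logits.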