ParallelSARDecoder

class mmocr.models.textrecog.ParallelSARDecoder(dictionary, module_loss=None, postprocessor=None, enc_bi_rnn=False, dec_bi_rnn=False, dec_rnn_dropout=0.0, dec_gru=False, d_model=512, d_enc=512, d_k=64, pred_dropout=0.0, max_seq_len=30, mask=True, pred_concat=False, init_cfg=None, **kwargs)[source]

Implementation of the Parallel Decoder module in `SAR <https://arxiv.org/abs/1811.00751>`_.

Parameters

dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.
module_loss (dict, optional) – Config to build module_loss. Defaults to None.
postprocessor (dict, optional) – Config to build postprocessor. Defaults to None.
enc_bi_rnn (bool) – If True, use bidirectional RNN in encoder. Defaults to False.
dec_bi_rnn (bool) – If True, use bidirectional RNN in decoder. Defaults to False.
dec_rnn_dropout (float) – Dropout of RNN layer in decoder. Defaults to 0.0.
dec_gru (bool) – If True, use GRU, else LSTM in decoder. Defaults to False.
d_model (int) – Dim of channels from backbone \(D_i\). Defaults to 512.
d_enc (int) – Dim of encoder RNN layer \(D_m\). Defaults to 512.
d_k (int) – Dim of channels of attention module. Defaults to 64.
pred_dropout (float) – Dropout probability of prediction layer. Defaults to 0.0.
max_seq_len (int) – Maximum sequence length for decoding. Defaults to 30.
mask (bool) – If True, mask padding in feature map. Defaults to True.
pred_concat (bool) – If True, concat glimpse feature from attention with holistic feature and hidden state. Defaults to False.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.
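The constructor arguments above map directly onto a config dict. The following is a minimal sketch, assuming the usual MMOCR convention of a `type` key naming the class; the `dict_file` path is a placeholder, and the extra `Dictionary` flags (`with_start`, `with_end`, `with_padding`, `with_unknown`) are assumptions that should be checked against the `Dictionary` reference.

```python
# Hypothetical dictionary config: the path is a placeholder, and the
# with_* flags are assumptions to verify against the Dictionary docs.
dictionary = dict(
    type='Dictionary',
    dict_file='path/to/dict.txt',
    with_start=True,
    with_end=True,
    with_padding=True,
    with_unknown=True,
)

# Decoder config spelling out the documented defaults explicitly.
decoder = dict(
    type='ParallelSARDecoder',
    dictionary=dictionary,
    enc_bi_rnn=False,      # unidirectional encoder RNN
    dec_bi_rnn=False,      # unidirectional decoder RNN
    dec_rnn_dropout=0.0,
    dec_gru=False,         # False selects LSTM
    d_model=512,           # backbone channels D_i
    d_enc=512,             # encoder RNN size D_m
    d_k=64,                # attention channels
    pred_dropout=0.0,
    max_seq_len=30,
    mask=True,             # mask padded feature-map columns
    pred_concat=False,
)
```

Listing the defaults is redundant but makes it obvious which knobs exist when tuning a recognizer config.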

Return type

None

forward_test(feat, out_enc, data_samples=None)[source]

Parameters

feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).
out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).
data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing valid_ratio information. Defaults to None.
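To make the role of `valid_ratio` and the `mask` option concrete, here is a rough pure-Python sketch of masking padded columns before an attention softmax. It is an illustration under assumptions, not the decoder's actual code: the helper names `valid_width` and `masked_softmax_row` are hypothetical, and the rule "valid columns = ceil(W * valid_ratio)" is assumed.

```python
import math


def valid_width(width, valid_ratio):
    """Number of feature-map columns assumed to hold real image content."""
    assert 0 < valid_ratio <= 1
    return min(width, math.ceil(width * valid_ratio))


def masked_softmax_row(scores, valid_ratio):
    """Softmax over one row of attention scores, ignoring padded columns."""
    valid = valid_width(len(scores), valid_ratio)
    # Padded columns get -inf so that they receive zero attention weight.
    masked = [s if i < valid else float('-inf') for i, s in enumerate(scores)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax_row([0.0, 0.0, 0.0, 0.0], valid_ratio=0.5)
```

With four equal scores and half of the width valid, all attention mass lands on the first two columns.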

Returns

Character probabilities of shape \((N, self.max_seq_len, C)\), where \(C\) is num_classes.

Return type

Tensor
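As an illustration of how an \((N, T, C)\) probability output can be read, the sketch below greedily picks the most likely class at each step and stops at an end token. It uses nested lists instead of tensors, and the toy `idx2char` table and `end_idx` are hypothetical; in practice the configured postprocessor handles this conversion.

```python
def greedy_decode(probs, idx2char, end_idx):
    """Turn (N, T, C) probabilities (nested lists) into a list of strings."""
    texts = []
    for sample in probs:                      # iterate over the batch (N)
        chars = []
        for step in sample:                   # iterate over time steps (T)
            # Index of the highest-probability class at this step.
            idx = max(range(len(step)), key=step.__getitem__)
            if idx == end_idx:                # stop at the end token
                break
            chars.append(idx2char[idx])
        texts.append(''.join(chars))
    return texts


# Toy dictionary with C = 3 classes; index 2 plays the end-of-sequence role.
idx2char = ['a', 'b', '<EOS>']
probs = [[[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]]
decoded = greedy_decode(probs, idx2char, end_idx=2)
```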

forward_train(feat, out_enc, data_samples)[source]

Parameters

feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).
out_enc (Tensor) – Encoder output of shape \((N, D_m, H, W)\).
data_samples (list[TextRecogDataSample]) – Batch of TextRecogDataSample, containing gt_text and valid_ratio information.

Returns

A raw logit tensor of shape \((N, T, C)\).

Return type

Tensor
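Because the training output is raw logits rather than probabilities, a loss such as cross-entropy applies the softmax itself. The pure-Python sketch below shows that relationship for a single sample of shape \((T, C)\); the helper names are hypothetical and this is not the `module_loss` implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one vector of C logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_seq, target_seq):
    """Mean negative log-likelihood over the T steps of one sample."""
    losses = [
        -math.log(softmax(step_logits)[target])
        for step_logits, target in zip(logits_seq, target_seq)
    ]
    return sum(losses) / len(losses)


# Uniform logits over C = 2 classes give a loss of log(2) per step.
loss = cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1])
```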