SAREncoder
- class mmocr.models.textrecog.SAREncoder(enc_bi_rnn=False, rnn_dropout=0.0, enc_gru=False, d_model=512, d_enc=512, mask=True, init_cfg=[{'type': 'Xavier', 'layer': 'Conv2d'}, {'type': 'Uniform', 'layer': 'BatchNorm2d'}], **kwargs)
Implementation of the encoder module in `SAR <https://arxiv.org/abs/1811.00751>`_.
- Parameters
enc_bi_rnn (bool) – If True, use bidirectional RNN in encoder. Defaults to False.
rnn_dropout (float) – Dropout probability of RNN layer in encoder. Defaults to 0.0.
enc_gru (bool) – If True, use GRU, else LSTM in encoder. Defaults to False.
d_model (int) – Number of channels \(D_i\) in the feature map from the backbone. Defaults to 512.
d_enc (int) – Hidden dimension \(D_m\) of the encoder RNN layer. Defaults to 512.
mask (bool) – If True, mask padding in RNN sequence. Defaults to True.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to [dict(type='Xavier', layer='Conv2d'), dict(type='Uniform', layer='BatchNorm2d')].
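As an illustration of the parameters above, an MMOCR-style config fragment selecting this encoder might look like the following. The keys mirror the constructor arguments with their documented defaults. The variable name `encoder_cfg` is only for this sketch, and how the dict is nested inside a full recognizer config is not shown here.

```python
# Hypothetical config fragment; every value is the documented default.
encoder_cfg = dict(
    type='SAREncoder',
    enc_bi_rnn=False,   # unidirectional RNN
    rnn_dropout=0.0,    # no dropout in the RNN layer
    enc_gru=False,      # LSTM rather than GRU
    d_model=512,        # D_i: channels coming from the backbone
    d_enc=512,          # D_m: dimension of the encoder RNN layer
    mask=True,          # mask padding in the RNN sequence
)
```

Only the arguments that differ from the defaults need to be listed in practice; they are all spelled out here for reference.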
- forward(feat, data_samples=None)
- Parameters
feat (Tensor) – Tensor of shape \((N, D_i, H, W)\).
data_samples (list[TextRecogDataSample], optional) – Batch of TextRecogDataSample, containing valid_ratio information. Defaults to None.
- Returns
A tensor of shape \((N, D_m)\).
- Return type
Tensor
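To make the role of `valid_ratio` more concrete, here is a minimal pure-Python sketch of one way padding could be masked: choose the index of the last RNN time step that still covers real (non-padded) image content, so that the \((N, D_m)\) output is read from that step. The helper name `last_valid_step` and the rounding rule are assumptions made for illustration and are not stated on this page; the library source is the authority on the actual behavior.

```python
import math


def last_valid_step(num_steps: int, valid_ratio: float) -> int:
    """Return the index of the last time step overlapping non-padded content.

    Hypothetical helper: `num_steps` is the RNN sequence length (the feature
    map width W) and `valid_ratio` is the fraction of the width that is real
    image rather than padding.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 < valid_ratio <= 1.0:
        raise ValueError("valid_ratio must be in (0, 1]")
    # Round up so a partially covered step still counts, then clamp to the
    # sequence length and convert the count to a zero-based index.
    return min(num_steps, math.ceil(num_steps * valid_ratio)) - 1
```

Under this assumption, a sample with `valid_ratio=1.0` simply uses the final time step, which is presumably also what happens when `mask=False` or `data_samples` is None.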