ABIEncoder¶
- class mmocr.models.textrecog.ABIEncoder(n_layers=2, n_head=8, d_model=512, d_inner=2048, dropout=0.1, max_len=256, init_cfg=None)[source]¶
Implements a transformer encoder for text recognition, modified from <https://github.com/FangShancheng/ABINet>.
- Parameters
n_layers (int) – Number of attention layers. Defaults to 2.
n_head (int) – Number of parallel attention heads. Defaults to 8.
d_model (int) – Dimension \(D_m\) of the input from previous model. Defaults to 512.
d_inner (int) – Hidden dimension of feedforward layers. Defaults to 2048.
dropout (float) – Dropout rate. Defaults to 0.1.
max_len (int) – Maximum output sequence length \(T\). Defaults to 256 (i.e. 8 * 32).
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.
- forward(feature, data_samples)[source]¶
- Parameters
feature (Tensor) – Feature tensor of shape \((N, D_m, H, W)\).
data_samples (List[TextRecogDataSample]) – List of data samples.
- Returns
Features of shape \((N, D_m, H, W)\).
- Return type
Tensor
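The forward pass above keeps the \((N, D_m, H, W)\) shape by flattening the spatial dimensions into a sequence, running transformer layers, and reshaping back. The following is a minimal sketch of that pattern using a plain `torch.nn.TransformerEncoder` as a stand-in for the ABINet encoder layers; the class name, the learned positional embedding, and the omission of `data_samples` are illustrative assumptions, not mmocr's actual implementation.

```python
import torch
from torch import nn


class ToyABIEncoder(nn.Module):
    """Illustrative stand-in for ABIEncoder (not mmocr's implementation)."""

    def __init__(self, n_layers=2, n_head=8, d_model=512, d_inner=2048,
                 dropout=0.1, max_len=8 * 32):
        super().__init__()
        # Learned positional embedding for up to max_len positions (assumption).
        self.pos_embed = nn.Parameter(torch.zeros(max_len, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_head,
            dim_feedforward=d_inner, dropout=dropout)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feature):
        n, c, h, w = feature.shape
        # (N, D_m, H, W) -> (H*W, N, D_m): spatial positions become the sequence.
        x = feature.flatten(2).permute(2, 0, 1)
        x = x + self.pos_embed[:h * w]
        x = self.transformer(x)
        # Reshape back to the original (N, D_m, H, W) layout.
        return x.permute(1, 2, 0).reshape(n, c, h, w)


enc = ToyABIEncoder(d_model=64, d_inner=128, n_head=4)
out = enc(torch.randn(2, 64, 8, 32))
print(out.shape)  # same (N, D_m, H, W) shape as the input
```

Note that the input and output shapes match, as documented for `ABIEncoder.forward`: the encoder refines the features without changing their layout.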