EncoderDecoderRecognizer
- class mmocr.models.textrecog.EncoderDecoderRecognizer(preprocessor=None, backbone=None, encoder=None, decoder=None, data_preprocessor=None, init_cfg=None)
Base class for encoder-decoder recognizers.
- Parameters
preprocessor (dict, optional) – Config dict for preprocessor. Defaults to None.
backbone (dict, optional) – Backbone config. Defaults to None.
encoder (dict, optional) – Encoder config. If None, the output from the backbone is fed directly into the decoder. Defaults to None.
decoder (dict, optional) – Decoder config. Defaults to None.
data_preprocessor (dict, optional) – Model preprocessing config for processing the input image data. Allowed keys are ``to_rgb`` (bool), ``pad_size_divisor`` (int), ``pad_value`` (int or float), ``mean`` (int or float) and ``std`` (int or float). Preprocessing order: 1. conversion to RGB; 2. normalization; 3. padding. Defaults to None.
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.
- Return type
None
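The preprocessing order listed for data_preprocessor (to RGB, then normalization, then padding) can be illustrated with a small framework-free sketch. It assumes a channel-last image stored as nested lists in BGR order, a scalar mean and std applied to every channel, and padding on the bottom and right edges only. The function name preprocess_image is hypothetical. This is not MMOCR's or MMEngine's implementation, which operates on batched tensors.

```python
from typing import List

# H x W x C nested lists, channel-last (an assumption for this sketch).
Image = List[List[List[float]]]


def preprocess_image(img: Image, to_rgb: bool, mean: float, std: float,
                     pad_size_divisor: int, pad_value: float) -> Image:
    """Hypothetical sketch: 1. to RGB, 2. normalize, 3. pad."""
    # 1. To RGB: reverse the channel order of every pixel (assumes BGR input).
    if to_rgb:
        img = [[pixel[::-1] for pixel in row] for row in img]

    # 2. Normalization: (value - mean) / std on every channel.
    img = [[[(v - mean) / std for v in pixel] for pixel in row] for row in img]

    # 3. Pad bottom/right so height and width become multiples of the divisor.
    height, width, channels = len(img), len(img[0]), len(img[0][0])
    padded_h = -(-height // pad_size_divisor) * pad_size_divisor  # ceil to multiple
    padded_w = -(-width // pad_size_divisor) * pad_size_divisor
    pad_pixel = [pad_value] * channels
    out = [row + [list(pad_pixel) for _ in range(padded_w - width)] for row in img]
    out += [[list(pad_pixel) for _ in range(padded_w)]
            for _ in range(padded_h - height)]
    return out


if __name__ == "__main__":
    bgr = [[[0.0, 127.5, 255.0]]]  # a single 1x1 BGR pixel
    result = preprocess_image(bgr, to_rgb=True, mean=127.5, std=127.5,
                              pad_size_divisor=2, pad_value=0.0)
    print(len(result), len(result[0]))  # 2 2
    print(result[0][0])  # [1.0, 0.0, -1.0]
```

In this sketch the padding value is written after normalization, so pad_value is expressed in normalized units.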
- extract_feat(inputs)
Directly extract features from the backbone.
- Parameters
inputs (torch.Tensor) – Image input tensor.
- Return type
torch.Tensor
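A minimal sketch of the feature-extraction step follows, using plain Python callables as hypothetical stand-ins for the configured modules. It assumes that an optional preprocessor, when configured, transforms the images before they reach the backbone. Treating a missing backbone as a pass-through is also an assumption. The name extract_feat_sketch is illustrative only.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in type: any callable that maps an input to an output.
Module = Callable[[Any], Any]


def extract_feat_sketch(inputs: Any,
                        backbone: Optional[Module] = None,
                        preprocessor: Optional[Module] = None) -> Any:
    """Sketch: optional preprocessor, then optional backbone."""
    if preprocessor is not None:
        # Assumption: the preprocessor transforms images before the backbone.
        inputs = preprocessor(inputs)
    if backbone is not None:
        # The backbone output is returned as-is, with no further processing.
        inputs = backbone(inputs)
    return inputs


if __name__ == "__main__":
    def double(xs):  # toy "preprocessor"
        return [x * 2 for x in xs]

    print(extract_feat_sketch([1, 2, 3], backbone=sum))  # 6
    print(extract_feat_sketch([1, 2, 3], backbone=sum, preprocessor=double))  # 12
```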
- loss(inputs, data_samples, **kwargs)
Calculate losses from a batch of inputs and data samples.
- Parameters
inputs (torch.Tensor) – Input images of shape (N, C, H, W). Typically these should be mean-centered and std-scaled.
data_samples (list[TextRecogDataSample]) – A list of N data samples, containing meta information and gold annotations for each of the images.
- Returns
A dictionary of loss components.
- Return type
dict
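The training path can be sketched with hypothetical stand-in callables rather than MMOCR modules. Following the encoder parameter description, when no encoder is configured the backbone features go straight to the decoder. Here that is modeled by passing None as the encoder output alongside the features, which is an assumption about how the decoder receives its inputs. The names loss_sketch and toy_decoder_loss are illustrative only.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in signatures, not MMOCR's module interfaces.
FeatFn = Callable[[Any], Any]
EncoderFn = Callable[[Any, List[Any]], Any]
DecoderLossFn = Callable[[Any, Optional[Any], List[Any]], Dict[str, float]]


def loss_sketch(inputs: Any, data_samples: List[Any], extract_feat: FeatFn,
                decoder_loss: DecoderLossFn,
                encoder: Optional[EncoderFn] = None) -> Dict[str, float]:
    """Sketch of the training path: features, optional encoder, decoder loss."""
    feat = extract_feat(inputs)
    # Without an encoder, out_enc stays None and the decoder relies on feat.
    out_enc = encoder(feat, data_samples) if encoder is not None else None
    return decoder_loss(feat, out_enc, data_samples)


def toy_decoder_loss(feat: Any, out_enc: Optional[Any],
                     data_samples: List[Any]) -> Dict[str, float]:
    """Toy loss: averages whichever signal is available over the batch."""
    source = feat if out_enc is None else out_enc
    return {"loss_toy": source / len(data_samples)}


if __name__ == "__main__":
    samples = ["sample_0", "sample_1"]
    print(loss_sketch([1, 2, 3], samples, sum, toy_decoder_loss))
    # {'loss_toy': 3.0}
    print(loss_sketch([1, 2, 3], samples, sum, toy_decoder_loss,
                      encoder=lambda feat, ds: feat * 10))
    # {'loss_toy': 30.0}
```

Returning a dictionary keeps each loss term named, so a training loop can log the components separately before combining them.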
- predict(inputs, data_samples, **kwargs)
Predict results from a batch of inputs and data samples with post-processing.
- Parameters
inputs (torch.Tensor) – Image input tensor.
data_samples (list[TextRecogDataSample]) – A list of N data samples, containing meta information and gold annotations for each of the images.
- Returns
A list of N data samples of prediction results. Results are stored in ``pred_text``.
- Return type
list[TextRecogDataSample]
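The following hedged sketch illustrates the kind of post-processing that can turn per-step decoder scores into the text stored in pred_text. It uses greedy (argmax) decoding. The character list, the end-of-sequence token, and the ToyDataSample class are hypothetical stand-ins. Real MMOCR decoders use their own dictionaries and post-processors and store results on TextRecogDataSample objects.

```python
from dataclasses import dataclass
from typing import List, Optional

CHARS = ["<END>", "a", "b", "c"]  # hypothetical dictionary; index 0 ends decoding


@dataclass
class ToyDataSample:
    """Stand-in for TextRecogDataSample; only the prediction field is modeled."""
    pred_text: Optional[str] = None


def greedy_postprocess(probs: List[List[List[float]]],
                       data_samples: List[ToyDataSample]) -> List[ToyDataSample]:
    """probs[n][t][k] is the score of character k at time step t for sample n."""
    for sample_probs, sample in zip(probs, data_samples):
        chars = []
        for step in sample_probs:
            best = max(range(len(step)), key=step.__getitem__)  # argmax index
            if CHARS[best] == "<END>":
                break  # stop at the end-of-sequence token
            chars.append(CHARS[best])
        sample.pred_text = "".join(chars)
    return data_samples


if __name__ == "__main__":
    probs = [[[0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.8, 0.0],
              [0.9, 0.0, 0.1, 0.0],
              [0.0, 0.0, 0.0, 1.0]]]
    out = greedy_postprocess(probs, [ToyDataSample()])
    print(out[0].pred_text)  # ab
```

Greedy decoding picks the single best character at each step, which is simple and fast. Beam search or CTC-style collapsing are common alternatives, depending on the decoder.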