SingleStageTextDetector

class mmocr.models.textdet.SingleStageTextDetector(backbone, det_head, neck=None, data_preprocessor=None, init_cfg=None)[source]

The class for implementing single-stage text detectors.

Single-stage text detectors directly and densely predict bounding boxes or polygons from the output features of the backbone and the optional neck.
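As a rough illustration of how these components fit together, the detector can be written as a config dict. This is only a sketch: the top-level keys mirror the constructor signature above, while the component type names (`ExampleBackbone`, `ExampleHead`) and the helper `has_required_components` are hypothetical placeholders, not names taken from MMOCR.

```python
# Hypothetical config sketch. Only the top-level keys (backbone, neck,
# det_head, data_preprocessor, init_cfg) come from the signature above;
# the nested "type" values are placeholders, not real registered modules.
model = dict(
    type='SingleStageTextDetector',
    backbone=dict(type='ExampleBackbone'),  # required
    neck=None,                              # optional: backbone output goes straight to det_head
    det_head=dict(type='ExampleHead'),      # required
    data_preprocessor=None,                 # optional
    init_cfg=None,                          # optional
)


def has_required_components(cfg):
    """Return True if the two mandatory components are configured."""
    return all(cfg.get(key) is not None for key in ('backbone', 'det_head'))
```

Leaving `neck` as None corresponds to the case where the backbone features are passed directly to the head.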

Parameters
  • backbone (dict) – Backbone config.

  • neck (dict, optional) – Neck config. If None, the output from backbone will be directly fed into det_head.

  • det_head (dict) – Head config.

  • data_preprocessor (dict, optional) – Model preprocessing config for processing the input image data. Keys allowed are ``to_rgb`` (bool), ``pad_size_divisor`` (int), ``pad_value`` (int or float), ``mean`` (int or float) and ``std`` (int or float). Preprocessing order: 1. convert to RGB; 2. normalize; 3. pad. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to None.
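The preprocessing order listed for data_preprocessor (convert to RGB, normalize, pad) can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `preprocess` is hypothetical, the image is a single channel-last nested list rather than an (N, C, H, W) tensor, and padding on the bottom and right edges is an assumption.

```python
import math


def preprocess(img, to_rgb=True, mean=0.0, std=1.0,
               pad_size_divisor=1, pad_value=0):
    """Apply the three documented steps to one H x W x C nested-list image."""
    # 1. to rgb: reverse the channel order of every pixel (BGR -> RGB).
    if to_rgb:
        img = [[px[::-1] for px in row] for row in img]

    # 2. normalization: subtract the mean and divide by the std.
    img = [[[(v - mean) / std for v in px] for px in row] for row in img]

    # 3. pad: grow height and width to the next multiple of the divisor.
    #    Padding on the bottom/right is an assumption of this sketch.
    h, w, c = len(img), len(img[0]), len(img[0][0])
    target_h = math.ceil(h / pad_size_divisor) * pad_size_divisor
    target_w = math.ceil(w / pad_size_divisor) * pad_size_divisor
    img = [row + [[float(pad_value)] * c for _ in range(target_w - w)]
           for row in img]
    img += [[[float(pad_value)] * c for _ in range(target_w)]
            for _ in range(target_h - h)]
    return img
```

Because padding comes last, the padded pixels hold `pad_value` as given and are not themselves normalized.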

Return type

None

extract_feat(inputs)[source]

Extract features.

Parameters

inputs (Tensor) – Image tensor with shape (N, C, H, W).

Returns

Multi-level features that may have different resolutions.

Return type

Tensor or tuple[Tensor]
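The feature-extraction flow implied by the class description (backbone first, then the neck only if one is configured) might look like the sketch below. It is an assumption-laden illustration: `toy_backbone` and `toy_neck` are hypothetical stand-ins that operate on shape tuples instead of tensors, and the free function `extract_feat` only mimics the method of the same name.

```python
def extract_feat(inputs, backbone, neck=None):
    """Run the backbone, then the neck if one is provided."""
    feats = backbone(inputs)
    if neck is not None:
        feats = neck(feats)
    return feats


def toy_backbone(shape):
    """Stand-in backbone: map an (N, C, H, W) shape to two feature levels."""
    n, _, h, w = shape
    return ((n, 64, h // 4, w // 4), (n, 128, h // 8, w // 8))


def toy_neck(feats):
    """Stand-in neck: fuse all levels into one map at the finest resolution."""
    n, _, h, w = feats[0]
    return (n, 256, h, w)
```

Without a neck the result is a tuple of multi-level features; with this toy neck it is a single fused feature, which matches the documented "Tensor or tuple[Tensor]" return type.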

loss(inputs, data_samples)[source]

Calculate losses from a batch of inputs and data samples.

Parameters
  • inputs (torch.Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • data_samples (list[TextDetDataSample]) – A list of N data samples, containing meta information and gold annotations for each of the images.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]
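A dictionary of loss components usually has to be reduced to one scalar before backpropagation. The sketch below shows one common convention, assumed here rather than stated on this page: sum every entry whose key contains "loss" and treat the rest as logged metrics. The key names and the helper `total_loss` are hypothetical, and plain floats stand in for tensors.

```python
def total_loss(losses):
    """Sum the entries whose key contains 'loss'; other keys are ignored."""
    return sum(value for key, value in losses.items() if 'loss' in key)


# Hypothetical loss components; real key names depend on the det_head in use.
example_losses = {'loss_a': 1.5, 'loss_b': 0.5, 'metric': 9.0}
combined = total_loss(example_losses)
```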

predict(inputs, data_samples)[source]

Predict results from a batch of inputs and data samples with post-processing.

Parameters
  • inputs (torch.Tensor) – Images of shape (N, C, H, W).

  • data_samples (list[TextDetDataSample]) – A list of N data samples, containing meta information and gold annotations for each of the images.

Returns

A list of N data samples containing the prediction results. Each TextDetDataSample usually contains ‘pred_instances’, which in turn usually has the following keys:

  • scores (Tensor): Classification scores, with shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, with shape (num_instances, ).

  • bboxes (Tensor): Has shape (num_instances, 4); the last dimension is arranged as (x1, y1, x2, y2).

  • polygons (list[np.ndarray]): A list of length num_instances. Each element represents the polygon of one instance, in (xn, yn) order.

Return type

list[TextDetDataSample]
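To make the shape of the prediction fields concrete, the sketch below filters instances by score and turns one polygon into coordinate pairs. It rests on stated assumptions: a plain dict of lists stands in for pred_instances, each polygon is assumed to be a flat sequence (x1, y1, x2, y2, ...), and the helpers `keep_confident` and `polygon_points` are hypothetical, not part of MMOCR.

```python
def keep_confident(pred, threshold):
    """Keep only the instances whose score is at least `threshold`."""
    keep = [i for i, score in enumerate(pred['scores']) if score >= threshold]
    return {key: [values[i] for i in keep] for key, values in pred.items()}


def polygon_points(flat):
    """Convert a flat (x1, y1, x2, y2, ...) sequence into (x, y) pairs."""
    if len(flat) % 2 != 0:
        raise ValueError('polygon needs an even number of coordinates')
    return list(zip(flat[0::2], flat[1::2]))


# Stand-in for pred_instances with two detected instances.
pred = {
    'scores': [0.9, 0.3],
    'labels': [0, 0],
    'bboxes': [[0, 0, 10, 5], [1, 1, 2, 2]],
    'polygons': [[0, 0, 10, 0, 10, 5, 0, 5], [1, 1, 2, 1, 2, 2, 1, 2]],
}
kept = keep_confident(pred, 0.5)
points = polygon_points(kept['polygons'][0])
```

All fields share the same instance index, so one list of kept indices is enough to filter every field consistently.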
