PackTextDetInputs

class mmocr.datasets.transforms.PackTextDetInputs(meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction'))[source]

Pack the inputs data for text detection.

The type of outputs is dict:

  • inputs: image converted to tensor, whose shape is (C, H, W).

  • data_samples: Two components of TextDetDataSample will be updated:

    • gt_instances (InstanceData): Depending on annotations, a subset of the following keys will be updated:

      • bboxes (torch.Tensor((N, 4), dtype=torch.float32)): The ground truth bounding boxes in the form of [x1, y1, x2, y2]. Renamed from ‘gt_bboxes’.

      • labels (torch.LongTensor(N)): The labels of instances. Renamed from ‘gt_bboxes_labels’.

      • polygons (list[np.ndarray((2k,), dtype=np.float32)]): The ground truth polygons in the form of [x1, y1, …, xk, yk]. Each element in polygons may have a different number of points. Renamed from ‘gt_polygons’. Numpy arrays are used instead of tensors because polygons are usually not model outputs and are processed on the CPU.

      • ignored (torch.BoolTensor((N,))): The flag indicating whether the corresponding instance should be ignored. Renamed from ‘gt_ignored’.

      • texts (list[str]): The ground truth texts. Renamed from ‘gt_texts’.

    • metainfo (dict): ‘metainfo’ is always populated. The contents of ‘metainfo’ depend on meta_keys. Typical keys include:

      • “img_path”: Path to the image file.

      • “img_shape”: Shape of the image input to the network as a tuple (h, w). Note that the image may be zero-padded afterward on the bottom/right if the batch tensor is larger than this shape.

      • “scale_factor”: A tuple indicating the ratio of width and height of the preprocessed image to the original one.

      • “ori_shape”: Shape of the original image as a tuple (h, w).

      • “pad_shape”: Image shape after padding (if any Pad-related transform involved) as a tuple (h, w).

      • “flip”: A boolean indicating if the image has been flipped.

      • “flip_direction”: The flipping direction.

Parameters

meta_keys (Sequence[str], optional) – Meta keys to be converted to the metainfo of TextDetDataSample. Defaults to ('img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction').
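
In a config, this transform typically sits at the end of a text detection data pipeline, with meta_keys overridden when a different set of meta fields is needed. Below is a minimal, illustrative sketch; the transforms preceding the packing step and their arguments are placeholders, not requirements of PackTextDetInputs.

    # Illustrative pipeline config; the transforms before packing are placeholders.
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='LoadOCRAnnotations',
            with_bbox=True,
            with_polygon=True,
            with_label=True),
        dict(type='Resize', scale=(640, 640), keep_ratio=True),
        dict(
            type='PackTextDetInputs',
            meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor')),
    ]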

transform(results)[source]

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (obj:torch.Tensor): Data for model forwarding.

  • ‘data_samples’ (obj:TextDetDataSample): The annotation info of the sample.

Return type

dict
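
Example (a minimal sketch of calling the transform directly, assuming the usual MMOCR pipeline keys are already present in the results dict; the image and annotations below are dummy placeholders):

    import numpy as np

    from mmocr.datasets.transforms import PackTextDetInputs

    # Dummy pipeline results for a single 64x64 RGB image with one instance.
    results = dict(
        img=np.zeros((64, 64, 3), dtype=np.uint8),   # (H, W, C) image
        img_path='demo.jpg',
        ori_shape=(64, 64),
        img_shape=(64, 64),
        scale_factor=(1.0, 1.0),
        flip=False,
        flip_direction=None,
        gt_bboxes=np.array([[10., 10., 30., 30.]], dtype=np.float32),
        gt_bboxes_labels=np.array([0], dtype=np.int64),
        gt_polygons=[
            np.array([10, 10, 30, 10, 30, 30, 10, 30], dtype=np.float32)
        ],
        gt_ignored=np.array([False], dtype=bool),
        gt_texts=['hello'],
    )

    packed = PackTextDetInputs().transform(results)
    print(packed['inputs'].shape)               # torch.Size([3, 64, 64])
    data_sample = packed['data_samples']
    print(data_sample.gt_instances.bboxes)      # float tensor of shape (1, 4)
    print(data_sample.gt_instances.polygons)    # list of np.float32 arrays
    print(data_sample.metainfo['img_shape'])    # (64, 64)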
