PackKIEInputs
- class mmocr.datasets.transforms.PackKIEInputs(meta_keys=())
Pack the inputs data for key information extraction.
The type of outputs is dict:
inputs: image converted to tensor, whose shape is (C, H, W).
data_samples: Two components of KIEDataSample will be updated:
gt_instances (InstanceData): Depending on annotations, a subset of the following keys will be updated:
bboxes (torch.Tensor((N, 4), dtype=torch.float32)): The groundtruth of bounding boxes in the form of [x1, y1, x2, y2]. Renamed from ‘gt_bboxes’.
labels (torch.LongTensor(N)): The labels of instances. Renamed from ‘gt_bboxes_labels’.
edge_labels (torch.LongTensor(N, N)): The edge labels. Renamed from ‘gt_edges_labels’.
texts (list[str]): The groundtruth texts. Renamed from ‘gt_texts’.
metainfo (dict): ‘metainfo’ is always populated. Its contents depend on meta_keys. By default it includes:
“img_path”: Path to the image file.
“img_shape”: Shape of the image input to the network as a tuple (h, w). Note that the image may be zero-padded afterward on the bottom/right if the batch tensor is larger than this shape.
“scale_factor”: A tuple indicating the ratio of width and height of the preprocessed image to the original one.
“ori_shape”: Shape of the original image, before preprocessing, as a tuple (h, w).
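The key renaming and metainfo selection described above can be sketched in plain Python. This is only an illustration of the documented mapping, not the library's implementation: the helper name `pack_kie_sketch` is hypothetical, plain dicts stand in for the data sample and `InstanceData` objects, the image is assumed to sit under an `'img'` key, and no tensor conversion is performed.

```python
# Illustrative sketch: plain dicts replace the real data-sample classes.
# Old annotation key -> new key, as listed in the description above.
KEY_MAPPING = {
    'gt_bboxes': 'bboxes',
    'gt_bboxes_labels': 'labels',
    'gt_edges_labels': 'edge_labels',
    'gt_texts': 'texts',
}


def pack_kie_sketch(results, meta_keys=('img_path', 'ori_shape',
                                        'img_shape', 'scale_factor')):
    """Hypothetical helper mimicking the documented output layout."""
    # Only the annotation keys that are present get renamed and copied.
    gt_instances = {
        new_key: results[old_key]
        for old_key, new_key in KEY_MAPPING.items()
        if old_key in results
    }
    # In this sketch, a requested meta key missing from `results`
    # raises KeyError.
    metainfo = {key: results[key] for key in meta_keys}
    return {
        'inputs': results.get('img'),  # assumption: image stored under 'img'
        'data_samples': {
            'gt_instances': gt_instances,
            'metainfo': metainfo,
        },
    }


results = {
    'img': [[[0]]],  # placeholder for an image array
    'img_path': 'demo.jpg',
    'ori_shape': (32, 64),
    'img_shape': (32, 64),
    'scale_factor': (1.0, 1.0),
    'gt_bboxes': [[0, 0, 10, 10]],
    'gt_texts': ['TOTAL'],
}
packed = pack_kie_sketch(results)
print(sorted(packed['data_samples']['gt_instances']))  # ['bboxes', 'texts']
```

Because only two annotation keys are supplied here, only `bboxes` and `texts` appear in `gt_instances`, matching the "subset of the following keys" wording.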
- Parameters
meta_keys (Sequence[str], optional) – Meta keys to be converted to the metainfo of KIEDataSample. Defaults to ('img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction').
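As a usage illustration, the transform would typically close a data pipeline in a config file. The fragment below is a hedged sketch: the preceding steps, the `scale` value, and the chosen `meta_keys` are assumptions for illustration and should be adapted to the actual dataset and model config.

```python
# Sketch of a pipeline config fragment; the steps before PackKIEInputs
# and their arguments are assumed, not taken from this page.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadKIEAnnotations'),
    dict(type='Resize', scale=(1024, 512), keep_ratio=True),
    # Pack the image and annotations; keep only the listed meta keys.
    dict(
        type='PackKIEInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor')),
]
```

Passing `meta_keys` explicitly makes it clear which entries end up in `metainfo`, independent of the default.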