DRRGHead

class mmocr.models.textdet.DRRGHead(in_channels, k_at_hops=(8, 4), num_adjacent_linkages=3, node_geo_feat_len=120, pooling_scale=1.0, pooling_output_size=(4, 3), nms_thr=0.3, min_width=8.0, max_width=24.0, comp_shrink_ratio=1.03, comp_ratio=0.4, comp_score_thr=0.3, text_region_thr=0.2, center_region_thr=0.2, center_region_area_thr=50, local_graph_thr=0.7, module_loss={'type': 'DRRGModuleLoss'}, postprocessor={'link_thr': 0.85, 'type': 'DRRGPostprocessor'}, init_cfg={'mean': 0, 'override': {'name': 'out_conv'}, 'std': 0.01, 'type': 'Normal'})

The detection head of DRRG, as described in "Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection".

Parameters

in_channels (int) – The number of input channels.
k_at_hops (tuple(int)) – The number of i-hop neighbors, i = 1, 2. Defaults to (8, 4).
num_adjacent_linkages (int) – The number of linkages when constructing the adjacency matrix. Defaults to 3.
node_geo_feat_len (int) – The length of embedded geometric feature vector of a component. Defaults to 120.
pooling_scale (float) – The spatial scale of rotated RoI-Align. Defaults to 1.0.
pooling_output_size (tuple(int)) – The output size of rotated RoI-Align. Defaults to (4, 3).
nms_thr (float) – The locality-aware NMS threshold of text components. Defaults to 0.3.
min_width (float) – The minimum width of text components. Defaults to 8.0.
max_width (float) – The maximum width of text components. Defaults to 24.0.
comp_shrink_ratio (float) – The shrink ratio of text components. Defaults to 1.03.
comp_ratio (float) – The reciprocal of aspect ratio of text components. Defaults to 0.4.
comp_score_thr (float) – The score threshold of text components. Defaults to 0.3.
text_region_thr (float) – The threshold for text region probability map. Defaults to 0.2.
center_region_thr (float) – The threshold for text center region probability map. Defaults to 0.2.
center_region_area_thr (int) – The threshold for filtering small-sized text center region. Defaults to 50.
local_graph_thr (float) – The threshold to filter identical local graphs. Defaults to 0.7.
module_loss (dict) – The config of loss that DRRGHead uses. Defaults to dict(type='DRRGModuleLoss').
postprocessor (dict) – The config of the postprocessor for DRRG. Defaults to dict(type='DRRGPostprocessor', link_thr=0.85).
init_cfg (dict or list[dict], optional) – Initialization configs. Defaults to dict(type='Normal', override=dict(name='out_conv'), mean=0, std=0.01).

Return type

None
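As an illustration of how the parameters above fit together, the following is a minimal sketch of a head config written as a plain Python dict. The variable name det_head and the value in_channels=32 are assumptions for this example only; in_channels must match the channel count of whatever feature map feeds the head. All other values repeat the defaults from the signature.

```python
# Hypothetical config fragment for DRRGHead.
# in_channels=32 is an assumed value, not a documented default: it must equal
# the number of channels produced by the module that feeds this head.
det_head = dict(
    type='DRRGHead',
    in_channels=32,
    k_at_hops=(8, 4),            # 8 one-hop and 4 two-hop neighbors
    text_region_thr=0.2,         # threshold on the text region map
    center_region_thr=0.2,       # threshold on the text center region map
    module_loss=dict(type='DRRGModuleLoss'),
    postprocessor=dict(type='DRRGPostprocessor', link_thr=0.85),
)
```

Parameters left out of the dict fall back to the defaults listed above.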

forward(inputs, data_samples=None)

Run the DRRG head in prediction mode and return the raw tensors only.

Parameters

inputs (torch.Tensor) – Shape of \((1, C, H, W)\).
data_samples (list[TextDetDataSample], optional) – A list of data samples. Defaults to None.

Returns

A tuple (edge, score, text_comps).

edge (ndarray): The edge array of shape \((N_{edges}, 2)\) where each row is a pair of text component indices that makes up an edge in the graph.
score (ndarray): The score array of shape \((N_{edges},)\), corresponding to the edges above.
text_comps (ndarray): The text components of shape \((M, 9)\) where each row corresponds to one box and its score: (x1, y1, x2, y2, x3, y3, x4, y4, score).

Return type

tuple
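To make the meaning of the (edge, score, text_comps) output more concrete, here is a minimal sketch of how edges and their scores could be used to group text components into text instances. It is not the DRRGPostprocessor implementation: the function group_components is hypothetical, it assumes an edge is kept when its score is at least link_thr, and it uses plain Python sequences in place of ndarrays so that it runs without extra dependencies.

```python
def group_components(edges, scores, num_comps, link_thr=0.85):
    """Group component indices connected by edges scoring >= link_thr.

    Hypothetical helper: a union-find over the kept edges, returning one
    sorted list of component indices per connected group.
    """
    parent = list(range(num_comps))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), s in zip(edges, scores):
        if s >= link_thr:  # assumed linking rule
            ra, rb = find(int(a)), find(int(b))
            if ra != rb:
                parent[rb] = ra

    groups = {}
    for i in range(num_comps):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: 5 components, 3 candidate edges.
edges = [(0, 1), (1, 2), (3, 4)]
scores = [0.95, 0.40, 0.90]
groups = group_components(edges, scores, num_comps=5)
print(groups)  # [[0, 1], [2], [3, 4]]
```

In this toy input, the edge (1, 2) falls below the threshold, so component 2 stays in its own group. With the real output, num_comps would be M, the number of rows in text_comps, and each index in a group would refer to one row of that array.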

loss(inputs, data_samples)

Loss function.

Parameters

inputs (Tensor) – Shape of \((N, C, H, W)\).
data_samples (List[TextDetDataSample]) – List of data samples.

Returns

pred_maps (Tensor): Prediction map with shape \((N, 6, H, W)\).
gcn_pred (Tensor): Prediction from GCN module, with shape \((N, 2)\).
gt_labels (Tensor): Ground-truth label of shape \((m, n)\) where \(m * n = N\).

Return type

tuple(pred_maps, gcn_pred, gt_labels)