SDMGRHead

class mmocr.models.kie.SDMGRHead(dictionary, num_classes=26, visual_dim=64, fusion_dim=1024, node_input=32, node_embed=256, edge_input=5, edge_embed=256, num_gnn=2, bidirectional=False, relation_norm=10.0, module_loss={'type': 'SDMGRModuleLoss'}, postprocessor={'type': 'SDMGRPostProcessor'}, init_cfg={'mean': 0, 'override': {'name': 'edge_embed'}, 'std': 0.01, 'type': 'Normal'})

SDMGR Head.

Parameters

dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.
num_classes (int) – Number of class labels. Defaults to 26.
visual_dim (int) – Dimension of visual features \(E\). Defaults to 64.
fusion_dim (int) – Dimension of fusion layer. Defaults to 1024.
node_input (int) – Dimension of raw node embedding. Defaults to 32.
node_embed (int) – Dimension of node embedding. Defaults to 256.
edge_input (int) – Dimension of raw edge embedding. Defaults to 5.
edge_embed (int) – Dimension of edge embedding. Defaults to 256.
num_gnn (int) – Number of GNN layers. Defaults to 2.
bidirectional (bool) – Whether to use bidirectional RNN to embed nodes. Defaults to False.
relation_norm (float) – Norm to map value from one range to another. Defaults to 10.0.
module_loss (dict) – Module Loss config. Defaults to dict(type='SDMGRModuleLoss').
postprocessor (dict) – Postprocessor config. Defaults to dict(type='SDMGRPostProcessor').
init_cfg (dict or list[dict], optional) – Initialization configs.

Return type

None
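As a rough illustration of how these parameters fit together, the following config fragment spells out the documented defaults. It is a sketch only: the variable names `dictionary` and `kie_head` are hypothetical, the `dict_file` key and its path are placeholder assumptions that do not come from this page, and a real config may override any of these values.

```python
# Hypothetical config fragment. The dictionary file path is a placeholder
# and the `dict_file` key is an assumption, not taken from this page.
dictionary = dict(
    type='Dictionary',
    dict_file='path/to/dict.txt',
)

# Values below mirror the defaults listed in the class signature.
kie_head = dict(
    type='SDMGRHead',
    dictionary=dictionary,
    num_classes=26,
    visual_dim=64,
    fusion_dim=1024,
    node_input=32,
    node_embed=256,
    edge_input=5,
    edge_embed=256,
    num_gnn=2,
    bidirectional=False,
    relation_norm=10.0,
    module_loss=dict(type='SDMGRModuleLoss'),
    postprocessor=dict(type='SDMGRPostProcessor'),
)
```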

compute_relations(data_samples)

Compute the relations between every pair of boxes in each data sample, then return the concatenated relations.

Parameters

data_samples (List[KIEDataSample]) – List of data samples.

Return type

torch.Tensor
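To make the idea of pairwise box relations concrete, here is a minimal pure-Python sketch. The function name `pairwise_relations` and the particular five features (two normalized offsets plus three size ratios) are illustrative assumptions chosen to match `edge_input=5`; this page does not state which features the head actually computes.

```python
def pairwise_relations(bboxes, relation_norm=10.0):
    """Return an N x N x 5 nested list of relation features.

    bboxes: list of (x1, y1, x2, y2) tuples. The five features are an
    assumed, illustrative choice, not the documented formula.
    """
    feats = []
    for ax1, ay1, ax2, ay2 in bboxes:
        aw = max(ax2 - ax1 + 1, 1)  # width of box a, at least 1
        ah = max(ay2 - ay1 + 1, 1)  # height of box a, at least 1
        row = []
        for bx1, by1, bx2, by2 in bboxes:
            bw = max(bx2 - bx1 + 1, 1)
            bh = max(by2 - by1 + 1, 1)
            row.append([
                (bx1 - ax1) / relation_norm,  # horizontal offset, rescaled
                (by1 - ay1) / relation_norm,  # vertical offset, rescaled
                aw / ah,                      # aspect ratio of box a
                bh / ah,                      # height of b relative to a
                bw / ah,                      # width of b relative to a's height
            ])
        feats.append(row)
    return feats


boxes = [(0, 0, 9, 4), (20, 0, 29, 4)]
rel = pairwise_relations(boxes)
```

Dividing the offsets by `relation_norm` keeps them in a range comparable to the ratio features, which is the role the parameter description suggests.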

convert_texts(data_samples)

Extract the texts from the data samples and pack them into a batch.

Parameters

data_samples (List[KIEDataSample]) – List of data samples.

Returns

node_nums (List[int]): A list of node numbers for each sample.
char_nums (List[Tensor]): A list of character numbers for each sample.
nodes (Tensor): A tensor of shape \((N, C)\) where \(C\) is the maximum number of characters in a sample.

Return type

tuple(List[int], List[Tensor], Tensor)
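The packing step can be sketched with plain lists. Everything here is an assumption for illustration: `pack_texts` and `char2idx` are hypothetical names, the padding value of -1 is a guess, and lists stand in for tensors.

```python
def pack_texts(samples, char2idx, pad_value=-1):
    """Pack per-sample text lists into one padded batch.

    samples: list of samples, each a list of text strings (one per node).
    char2idx: hypothetical character-to-index mapping.
    pad_value: assumed padding index for short texts.
    """
    # Number of nodes (text boxes) in each sample.
    node_nums = [len(texts) for texts in samples]
    # Character counts, grouped per sample.
    char_nums = [[len(text) for text in texts] for texts in samples]
    # Longest text across the whole batch determines the padded width C.
    max_len = max((n for nums in char_nums for n in nums), default=0)
    nodes = []
    for texts in samples:
        for text in texts:
            idx = [char2idx[c] for c in text]
            nodes.append(idx + [pad_value] * (max_len - len(idx)))
    return node_nums, char_nums, nodes


char2idx = {'a': 0, 'b': 1, 'c': 2}
samples = [['ab', 'c'], ['abc']]
node_nums, char_nums, nodes = pack_texts(samples, char2idx)
```

Here `nodes` has one row per node across the batch (N = 3) and one column per character position (C = 3).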

forward(inputs, data_samples)

Parameters

inputs (torch.Tensor) – Shape \((N, E)\).
data_samples (List[KIEDataSample]) – List of data samples.

Returns

node_cls (Tensor): Raw logits scores for nodes. Shape \((N, C_{l})\) where \(C_{l}\) is number of classes.
edge_cls (Tensor): Raw logits scores for edges. Shape \((N * N, 2)\).

Return type

tuple(Tensor, Tensor)

loss(inputs, data_samples)

Calculate losses from a batch of inputs and data samples.

Parameters

inputs (torch.Tensor) – Shape \((N, E)\).
data_samples (List[KIEDataSample]) – List of data samples.

Returns

A dictionary of loss components.

Return type

dict[str, tensor]
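A dictionary of loss components usually has to be reduced to one scalar before backpropagation. The sketch below assumes a common convention in which only entries whose name contains "loss" are summed, while other entries (such as accuracies) are logged but not optimized. The key names and values are hypothetical, and plain floats stand in for tensors.

```python
def total_loss(losses):
    """Sum the entries whose key contains 'loss' (assumed convention)."""
    return sum(value for name, value in losses.items() if 'loss' in name)


# Hypothetical output of loss(); key names and values are made up.
losses = {'loss_node': 0.75, 'loss_edge': 0.25, 'acc_node': 0.9}
total = total_loss(losses)
```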

predict(inputs, data_samples)

Predict results from a batch of inputs and data samples with post-processing.

Parameters

inputs (torch.Tensor) – Shape \((N, E)\).
data_samples (List[KIEDataSample]) – List of data samples.

Returns

A list of data samples containing the prediction results. Results are stored in pred_instances.labels, pred_instances.scores, pred_instances.edge_labels and pred_instances.edge_scores.

labels (Tensor): An integer tensor of shape (N, ) indicating bbox labels for each image.
scores (Tensor): A float tensor of shape (N, ), indicating the confidence scores for node label predictions.
edge_labels (Tensor): An integer tensor of shape (N, N) indicating the connection between nodes. Options are 0, 1.
edge_scores (Tensor): A float tensor of shape (N, N), indicating the confidence scores for edge predictions.

Return type

List[KIEDataSample]
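The relationship between the raw logits returned by forward and the fields listed above can be sketched for a single sample as follows. This is an assumed reading rather than the actual postprocessor: it takes a softmax over each row, uses the argmax as the label and its probability as the score, and assumes the \((N * N, 2)\) edge logits are laid out row by row, so that row k corresponds to the node pair (k // N, k % N). The names `softmax` and `postprocess` are hypothetical, and lists stand in for tensors.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def postprocess(node_logits, edge_logits):
    """Turn raw logits for one sample into labels and scores."""
    n = len(node_logits)
    labels, scores = [], []
    for row in node_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best)
        scores.append(probs[best])
    edge_labels = [[0] * n for _ in range(n)]
    edge_scores = [[0.0] * n for _ in range(n)]
    for k, row in enumerate(edge_logits):
        i, j = divmod(k, n)  # assumed row-major layout of node pairs
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        edge_labels[i][j] = best
        edge_scores[i][j] = probs[best]
    return labels, scores, edge_labels, edge_scores


# Made-up logits for a sample with N = 2 nodes and 2 node classes.
node_logits = [[0.0, 2.0], [3.0, 0.0]]
edge_logits = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
labels, scores, edge_labels, edge_scores = postprocess(node_logits, edge_logits)
```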