SDMGRHead

class mmocr.models.kie.SDMGRHead(dictionary, num_classes=26, visual_dim=64, fusion_dim=1024, node_input=32, node_embed=256, edge_input=5, edge_embed=256, num_gnn=2, bidirectional=False, relation_norm=10.0, module_loss={'type': 'SDMGRModuleLoss'}, postprocessor={'type': 'SDMGRPostProcessor'}, init_cfg={'mean': 0, 'override': {'name': 'edge_embed'}, 'std': 0.01, 'type': 'Normal'})[source]

SDMGR Head.

Parameters
  • dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.

  • num_classes (int) – Number of class labels. Defaults to 26.

  • visual_dim (int) – Dimension of visual features \(E\). Defaults to 64.

  • fusion_dim (int) – Dimension of fusion layer. Defaults to 1024.

  • node_input (int) – Dimension of raw node embedding. Defaults to 32.

  • node_embed (int) – Dimension of node embedding. Defaults to 256.

  • edge_input (int) – Dimension of raw edge embedding. Defaults to 5.

  • edge_embed (int) – Dimension of edge embedding. Defaults to 256.

  • num_gnn (int) – Number of GNN layers. Defaults to 2.

  • bidirectional (bool) – Whether to use bidirectional RNN to embed nodes. Defaults to False.

  • relation_norm (float) – Norm to map value from one range to another. Defaults to 10.

  • module_loss (dict) – Module Loss config. Defaults to dict(type='SDMGRModuleLoss').

  • postprocessor (dict) – Postprocessor config. Defaults to dict(type='SDMGRPostProcessor').

  • init_cfg (dict or list[dict], optional) – Initialization configs.

Return type

None
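As a rough illustration of how the parameters above fit together, the head could be declared in a model config along the following lines. This is a sketch rather than a config taken from the MMOCR repository: the variable name `sdmgr_head` and the `dict_file` path are placeholders, and the numeric values simply repeat the documented defaults.

```python
# Hypothetical config fragment for SDMGRHead.
# 'path/to/dict.txt' is a placeholder, not a real file shipped with MMOCR.
sdmgr_head = dict(
    type='SDMGRHead',
    # Config for the Dictionary; an instance could be passed instead.
    dictionary=dict(type='Dictionary', dict_file='path/to/dict.txt'),
    num_classes=26,       # number of class labels
    visual_dim=64,        # dimension E of the visual features
    fusion_dim=1024,      # dimension of the fusion layer
    node_input=32,        # raw node embedding dimension
    node_embed=256,       # node embedding dimension
    edge_input=5,         # raw edge embedding dimension
    edge_embed=256,       # edge embedding dimension
    num_gnn=2,            # number of GNN layers
    bidirectional=False,  # unidirectional RNN for node embedding
    relation_norm=10.0,
    module_loss=dict(type='SDMGRModuleLoss'),
    postprocessor=dict(type='SDMGRPostProcessor'),
)
```

Only `dictionary` has no default, so a shorter config that sets `type` and `dictionary` alone should be equivalent under the defaults listed above.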

compute_relations(data_samples)[source]

Compute the relations between every pair of boxes in each data sample, then return the concatenated relations.

Parameters

data_samples (List[mmocr.structures.kie_data_sample.KIEDataSample]) – List of data samples.

Return type

torch.Tensor
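This page does not spell out which relation features are computed, so the following is only a sketch of what a pairwise box relation could look like, not the library's implementation. It assumes boxes given as `(x1, y1, x2, y2)`, uses plain lists instead of tensors, and picks five features per pair (scaled offsets plus size ratios) purely to match the default `edge_input=5`. The function name `pairwise_relations` is hypothetical.

```python
from typing import List, Sequence


def pairwise_relations(boxes: Sequence[Sequence[float]],
                       relation_norm: float = 10.0) -> List[List[List[float]]]:
    """Return an N x N x 5 nested list of illustrative relation features.

    The feature choice is an assumption made for this example only.
    """
    relations = []
    for x1_i, y1_i, x2_i, y2_i in boxes:
        # Width and height of the reference box, clamped to at least 1.
        w_i = max(x2_i - x1_i + 1, 1)
        h_i = max(y2_i - y1_i + 1, 1)
        row = []
        for x1_j, y1_j, x2_j, y2_j in boxes:
            w_j = max(x2_j - x1_j + 1, 1)
            h_j = max(y2_j - y1_j + 1, 1)
            row.append([
                (x1_j - x1_i) / relation_norm,  # horizontal offset, scaled
                (y1_j - y1_i) / relation_norm,  # vertical offset, scaled
                w_i / h_i,                      # aspect ratio of box i
                h_j / h_i,                      # height ratio j : i
                w_j / h_i,                      # width of j over height of i
            ])
        relations.append(row)
    return relations


boxes = [[0, 0, 9, 4], [20, 0, 39, 4]]
rel = pairwise_relations(boxes)
```

Here `rel[i][j]` holds the features of box `j` relative to box `i`, so a sample with N boxes yields N * N relation vectors.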

convert_texts(data_samples)[source]

Extract texts in datasamples and pack them into a batch.

Parameters

data_samples (List[KIEDataSample]) – List of data samples.

Returns

  • node_nums (List[int]): A list of node numbers for each sample.

  • char_nums (List[Tensor]): A list of character numbers for each sample.

  • nodes (Tensor): A tensor of shape \((N, C)\) where \(C\) is the maximum number of characters in a sample.

Return type

tuple(List[int], List[Tensor], Tensor)
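The three return values can be pictured with a small stand-alone sketch. It is not the method's actual code: it assumes each sample is simply a list of strings, that characters are looked up in a plain `dict`, and that shorter texts are padded with `-1`. The helper name `pack_texts` and the padding value are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def pack_texts(
    samples: List[List[str]],
    char2idx: Dict[str, int],
    pad_idx: int = -1,
) -> Tuple[List[int], List[List[int]], List[List[int]]]:
    """Pack the texts of several samples into one padded batch."""
    node_nums, char_nums, rows = [], [], []
    for texts in samples:
        # One node per text box in the sample.
        node_nums.append(len(texts))
        encoded = [[char2idx[ch] for ch in text] for text in texts]
        # Character count of every node in this sample.
        char_nums.append([len(seq) for seq in encoded])
        rows.extend(encoded)
    # Pad every row to the longest text so the batch is rectangular.
    max_len = max((len(row) for row in rows), default=0)
    nodes = [row + [pad_idx] * (max_len - len(row)) for row in rows]
    return node_nums, char_nums, nodes


char2idx = {'a': 0, 'b': 1, 'c': 2}
samples = [['ab', 'c'], ['abc']]
node_nums, char_nums, nodes = pack_texts(samples, char2idx)
```

In this toy input the batch has three nodes in total and the longest text has three characters, so `nodes` has shape (3, 3) in the notation used above.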

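forward(inputs, data_samples)[source]

Parameters

  • inputs (torch.Tensor) – Shape \((N, E)\).

  • data_samples (List[KIEDataSample]) – List of data samples.

Returns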

  • node_cls (Tensor): Raw logits scores for nodes. Shape \((N, C_{l})\) where \(C_{l}\) is number of classes.

  • edge_cls (Tensor): Raw logits scores for edges. Shape \((N * N, 2)\).

Return type

tuple(Tensor, Tensor)

loss(inputs, data_samples)[source]

Calculate losses from a batch of inputs and data samples.

Parameters

  • inputs (torch.Tensor) – Shape \((N, E)\).

  • data_samples (List[KIEDataSample]) – List of data samples.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

predict(inputs, data_samples)[source]

Predict results from a batch of inputs and data samples with post-processing.

Parameters

  • inputs (torch.Tensor) – Shape \((N, E)\).

  • data_samples (List[KIEDataSample]) – List of data samples.

Returns

A list of data samples containing the prediction results. Results are stored in pred_instances.labels, pred_instances.scores, pred_instances.edge_labels and pred_instances.edge_scores.

  • labels (Tensor): An integer tensor of shape (N, ) indicating bbox labels for each image.

  • scores (Tensor): A float tensor of shape (N, ), indicating the confidence scores for node label predictions.

  • edge_labels (Tensor): An integer tensor of shape (N, N) indicating the connection between nodes. Options are 0, 1.

  • edge_scores (Tensor): A float tensor of shape (N, N), indicating the confidence scores for edge predictions.

Return type

List[KIEDataSample]
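To make the relationship between the raw logits returned by forward and the fields listed above more concrete, here is a minimal sketch of one plausible decoding step. It assumes the post-processing applies a softmax and takes the most likely class, and that the \(N * N\) edge logits are laid out row by row; neither assumption is confirmed by this page. The helper names `softmax` and `decode_logits` are hypothetical, and plain lists stand in for tensors.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(
    node_cls: List[List[float]],
    edge_cls: List[List[float]],
) -> Tuple[List[int], List[float], List[List[int]], List[List[float]]]:
    """Turn node logits (N, C_l) and edge logits (N * N, 2) into labels and scores."""
    num_nodes = len(node_cls)
    node_probs = [softmax(row) for row in node_cls]
    labels = [probs.index(max(probs)) for probs in node_probs]
    scores = [max(probs) for probs in node_probs]

    edge_probs = [softmax(row) for row in edge_cls]
    edge_labels, edge_scores = [], []
    for i in range(num_nodes):
        # Assumed row-major layout: entries i*N .. i*N+N-1 belong to node i.
        row = edge_probs[i * num_nodes:(i + 1) * num_nodes]
        edge_labels.append([probs.index(max(probs)) for probs in row])
        edge_scores.append([max(probs) for probs in row])
    return labels, scores, edge_labels, edge_scores


node_cls = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
edge_cls = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
labels, scores, edge_labels, edge_scores = decode_logits(node_cls, edge_cls)
```

With two nodes, `edge_labels` comes out as a 2 x 2 grid of 0/1 values, matching the (N, N) shape documented for edge_labels.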
