SDMGRHead
- class mmocr.models.kie.SDMGRHead(dictionary, num_classes=26, visual_dim=64, fusion_dim=1024, node_input=32, node_embed=256, edge_input=5, edge_embed=256, num_gnn=2, bidirectional=False, relation_norm=10.0, module_loss={'type': 'SDMGRModuleLoss'}, postprocessor={'type': 'SDMGRPostProcessor'}, init_cfg={'mean': 0, 'override': {'name': 'edge_embed'}, 'std': 0.01, 'type': 'Normal'})
SDMGR Head.
- Parameters
dictionary (dict or Dictionary) – The config for Dictionary or the instance of Dictionary.
num_classes (int) – Number of class labels. Defaults to 26.
visual_dim (int) – Dimension of visual features \(E\). Defaults to 64.
fusion_dim (int) – Dimension of fusion layer. Defaults to 1024.
node_input (int) – Dimension of raw node embedding. Defaults to 32.
node_embed (int) – Dimension of node embedding. Defaults to 256.
edge_input (int) – Dimension of raw edge embedding. Defaults to 5.
edge_embed (int) – Dimension of edge embedding. Defaults to 256.
num_gnn (int) – Number of GNN layers. Defaults to 2.
bidirectional (bool) – Whether to use bidirectional RNN to embed nodes. Defaults to False.
relation_norm (float) – Norm to map value from one range to another. Defaults to 10.
module_loss (dict) – Module Loss config. Defaults to dict(type='SDMGRModuleLoss').
postprocessor (dict) – Postprocessor config. Defaults to dict(type='SDMGRPostProcessor').
init_cfg (dict or list[dict], optional) – Initialization configs.
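As a rough illustration, the documented defaults can be written out as an MMOCR-style config dict. This is only a sketch that restates the signature above: the `dictionary` entry is a hypothetical placeholder (the `dict_file` path does not come from this page), and actually building the head from the dict would require an MMOCR installation.

```python
# Sketch of a config dict mirroring the documented SDMGRHead defaults.
# The `dictionary` sub-config is a hypothetical placeholder, not taken
# from this page; replace it with a real Dictionary config or instance.
sdmgr_head_cfg = dict(
    type='SDMGRHead',
    dictionary=dict(type='Dictionary', dict_file='path/to/dict.txt'),
    num_classes=26,       # number of class labels
    visual_dim=64,        # dimension E of the visual features
    fusion_dim=1024,      # dimension of the fusion layer
    node_input=32,        # raw node embedding size
    node_embed=256,       # node embedding size
    edge_input=5,         # raw edge embedding size
    edge_embed=256,       # edge embedding size
    num_gnn=2,            # number of GNN layers
    bidirectional=False,  # unidirectional RNN for node embedding
    relation_norm=10.0,
    module_loss=dict(type='SDMGRModuleLoss'),
    postprocessor=dict(type='SDMGRPostProcessor'),
)
```

Only the values that differ from the defaults need to be listed in practice; they are all spelled out here for reference.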
- compute_relations(data_samples)
Compute the relations between every pair of boxes in each data sample, then return the concatenated relations.
- Parameters
data_samples (List[KIEDataSample]) – List of data samples.
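To make "relations between every pair of boxes" more concrete, here is a minimal pure-Python sketch. It assumes boxes are given as `(x1, y1, x2, y2)` and produces five values per ordered pair, in line with `edge_input=5`. The specific five features and the helper name `pairwise_relations` are assumptions made for illustration; this page does not state which features the head actually computes.

```python
def pairwise_relations(bboxes, relation_norm=10.0):
    """Return an N x N x 5 nested list of pairwise box features.

    bboxes: list of (x1, y1, x2, y2) tuples.
    The five features are an assumed, illustrative choice.
    """
    relations = []
    for ax1, ay1, ax2, ay2 in bboxes:
        a_w = max(ax2 - ax1 + 1, 1)  # clamp to avoid zero-sized boxes
        a_h = max(ay2 - ay1 + 1, 1)
        row = []
        for bx1, by1, bx2, by2 in bboxes:
            b_w = max(bx2 - bx1 + 1, 1)
            b_h = max(by2 - by1 + 1, 1)
            row.append([
                (bx1 - ax1) / relation_norm,  # scaled horizontal offset
                (by1 - ay1) / relation_norm,  # scaled vertical offset
                a_w / a_h,                    # aspect ratio of box a
                b_h / a_h,                    # height of b relative to a
                b_w / a_h,                    # width of b relative to a's height
            ])
        relations.append(row)
    return relations


boxes = [(0, 0, 9, 9), (10, 20, 29, 29)]
relations = pairwise_relations(boxes)
```

Dividing the offsets by `relation_norm` is one way to read "map value from one range to another": pixel distances are scaled down to a range closer to the ratio features.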
- convert_texts(data_samples)
Extract texts from the data samples and pack them into a batch.
- Parameters
data_samples (List[KIEDataSample]) – List of data samples.
- Returns
node_nums (List[int]): A list of node numbers for each sample.
char_nums (List[Tensor]): A list of character numbers for each sample.
nodes (Tensor): A tensor of shape \((N, C)\) where \(C\) is the maximum number of characters in a sample.
- Return type
tuple(List[int], List[Tensor], Tensor)
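The packing step can be sketched in plain Python as follows. This is an illustration under assumptions, not the library code: `pack_texts`, the `char_to_idx` mapping, and the padding value `-1` are all hypothetical, and plain lists stand in for tensors. It also assumes \(N\) counts nodes across the whole batch.

```python
def pack_texts(samples, char_to_idx, pad_value=-1):
    """Pack per-sample node texts into one padded batch.

    samples: list of samples, each a list of strings (one per node).
    Returns (node_nums, char_nums, nodes) using plain lists.
    """
    node_nums, char_nums, encoded = [], [], []
    for texts in samples:
        ids_per_node = [[char_to_idx[c] for c in text] for text in texts]
        node_nums.append(len(ids_per_node))                    # nodes in this sample
        char_nums.append([len(ids) for ids in ids_per_node])   # characters per node
        encoded.extend(ids_per_node)
    # Pad every node to the longest text so the batch is rectangular (N x C).
    max_len = max((len(ids) for ids in encoded), default=0)
    nodes = [ids + [pad_value] * (max_len - len(ids)) for ids in encoded]
    return node_nums, char_nums, nodes


char_to_idx = {'a': 0, 'b': 1, 'c': 2}
node_nums, char_nums, nodes = pack_texts([['ab', 'c'], ['abc']], char_to_idx)
```

Here `node_nums` lets a caller split the flat `nodes` batch back into per-sample groups, while `char_nums` records the unpadded length of each text.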
- forward(inputs, data_samples)
- Parameters
inputs (torch.Tensor) – Shape \((N, E)\).
data_samples (List[KIEDataSample]) – List of data samples.
- Returns
node_cls (Tensor): Raw logit scores for nodes. Shape \((N, C_{l})\) where \(C_{l}\) is the number of classes.
edge_cls (Tensor): Raw logit scores for edges. Shape \((N * N, 2)\).
- Return type
tuple(Tensor, Tensor)
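Because the edge logits come back flattened to \((N * N, 2)\), a caller usually needs to regroup them by node pair. The sketch below does this with plain lists for a single sample. It assumes row-major ordering (the pair \((i, j)\) sits at index \(i * N + j\)), which this page does not confirm, and `unflatten_edges` is a hypothetical helper name.

```python
def unflatten_edges(edge_cls, num_nodes):
    """Regroup N*N flat [logit_0, logit_1] pairs into an N x N grid.

    Assumes row-major order: entry (i, j) is at index i * num_nodes + j.
    """
    expected = num_nodes * num_nodes
    if len(edge_cls) != expected:
        raise ValueError(f'expected {expected} edges, got {len(edge_cls)}')
    return [
        edge_cls[i * num_nodes:(i + 1) * num_nodes]
        for i in range(num_nodes)
    ]


flat_edges = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
edge_grid = unflatten_edges(flat_edges, num_nodes=2)
```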
- loss(inputs, data_samples)
Calculate losses from a batch of inputs and data samples.
- Parameters
inputs (torch.Tensor) – Shape \((N, E)\).
data_samples (List[KIEDataSample]) – List of data samples.
- Returns
A dictionary of loss components.
- Return type
dict
- predict(inputs, data_samples)
Predict results from a batch of inputs and data samples with post-processing.
- Parameters
inputs (torch.Tensor) – Shape \((N, E)\).
data_samples (List[KIEDataSample]) – List of data samples.
- Returns
A list of data samples holding the prediction results. Results are stored in pred_instances.labels, pred_instances.scores, pred_instances.edge_labels and pred_instances.edge_scores.
labels (Tensor): An integer tensor of shape (N, ) indicating bbox labels for each image.
scores (Tensor): A float tensor of shape (N, ), indicating the confidence scores for node label predictions.
edge_labels (Tensor): An integer tensor of shape (N, N) indicating the connection between nodes. Options are 0, 1.
edge_scores (Tensor): A float tensor of shape (N, N), indicating the confidence scores for edge predictions.
- Return type
List[KIEDataSample]
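One common way to turn raw logits into labels and confidence scores is softmax followed by argmax. The sketch below applies that to a single sample using plain lists. It is an assumed decoding for illustration only; the actual behaviour is defined by the configured postprocessor (`SDMGRPostProcessor` by default) and may differ. The helper names are hypothetical, and edge logits are assumed to be in row-major order.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax_with_score(logits):
    """Return (index of the most probable class, its probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def decode(node_cls, edge_cls):
    """Decode node logits (N x C) and flat edge logits (N*N x 2)."""
    num_nodes = len(node_cls)
    node_results = [argmax_with_score(row) for row in node_cls]
    labels = [label for label, _ in node_results]
    scores = [score for _, score in node_results]

    edge_labels, edge_scores = [], []
    for i in range(num_nodes):
        # Assumed row-major layout: pair (i, j) is at index i * N + j.
        row = [argmax_with_score(edge_cls[i * num_nodes + j])
               for j in range(num_nodes)]
        edge_labels.append([label for label, _ in row])
        edge_scores.append([score for _, score in row])
    return labels, scores, edge_labels, edge_scores


node_logits = [[0.0, 2.0], [3.0, 0.0]]
edge_logits = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
labels, scores, edge_labels, edge_scores = decode(node_logits, edge_logits)
```

In this sketch the edge scores are kept per node pair, so they form an N x N grid alongside `edge_labels`.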