FCEModuleLoss

class mmocr.models.textdet.FCEModuleLoss(fourier_degree, num_sample, negative_ratio=3.0, resample_step=4.0, center_region_shrink_ratio=0.3, level_size_divisors=(8, 16, 32), level_proportion_range=((0, 0.4), (0.3, 0.7), (0.6, 1.0)), loss_tr={'type': 'MaskedBalancedBCELoss'}, loss_tcl={'type': 'MaskedBCELoss'}, loss_reg_x={'reduction': 'none', 'type': 'SmoothL1Loss'}, loss_reg_y={'reduction': 'none', 'type': 'SmoothL1Loss'})

The class for implementing FCENet loss.

FCENet (CVPR 2021): Fourier Contour Embedding for Arbitrary-shaped Text Detection

Parameters

fourier_degree (int) – The maximum Fourier transform degree k.
num_sample (int) – The number of points sampled for the regression loss. If it is too small, FCENet tends to overfit.
negative_ratio (float or int) – Maximum ratio of negative samples to positive ones in OHEM. Defaults to 3.
resample_step (float) – The step size for resampling the text center line (TCL). It should preferably not exceed half of the minimum text width. Defaults to 4.0.
center_region_shrink_ratio (float) – The shrink ratio of the text center region. Defaults to 0.3.
level_size_divisors (tuple(int)) – The downsample ratio on each level. Defaults to (8, 16, 32).
level_proportion_range (tuple(tuple(float))) – The range of text sizes assigned to each level. Defaults to ((0, 0.4), (0.3, 0.7), (0.6, 1.0)).
loss_tr (dict) – The loss config used to calculate the text region loss. Defaults to dict(type='MaskedBalancedBCELoss').
loss_tcl (dict) – The loss config used to calculate the text center line loss. Defaults to dict(type='MaskedBCELoss').
loss_reg_x (dict) – The loss config used to calculate the regression loss on the x axis. Defaults to dict(type='SmoothL1Loss', reduction='none').
loss_reg_y (dict) – The loss config used to calculate the regression loss on the y axis. Defaults to dict(type='SmoothL1Loss', reduction='none').
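
To illustrate what negative_ratio bounds, here is a minimal sketch of OHEM-style negative selection in plain Python. The helper name select_hard_negatives and its list-based inputs are hypothetical and not part of MMOCR; the real loss operates on tensors and may differ in detail.

```python
def select_hard_negatives(neg_losses, num_pos, negative_ratio=3.0):
    """Keep at most negative_ratio * num_pos of the highest-loss negatives.

    Hypothetical helper for illustration only; not an MMOCR function.
    """
    # Cap the count by the ratio and by how many negatives actually exist.
    num_keep = min(len(neg_losses), int(num_pos * negative_ratio))
    # The "hardest" negatives are the ones with the largest loss values.
    return sorted(neg_losses, reverse=True)[:num_keep]


# One positive sample and five negatives: at most three negatives are kept.
kept = select_hard_negatives([0.1, 0.9, 0.4, 0.7, 0.2], num_pos=1)
print(kept)  # [0.9, 0.7, 0.4]
```

With the default ratio of 3, each positive sample admits at most three negatives into the loss, which keeps the many easy background pixels from dominating training.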

Return type

None
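
As a sketch of how the constructor arguments fit together, the fragment below writes them as a config dict. It assumes the class is registered under its class name, as is conventional in OpenMMLab configs. The values of fourier_degree and num_sample are illustrative assumptions, since the signature gives no defaults for them; the remaining values repeat the defaults from the signature.

```python
# Illustrative config fragment, not taken from an official config file.
module_loss = dict(
    type='FCEModuleLoss',
    fourier_degree=5,  # assumed value for illustration
    num_sample=50,     # assumed value for illustration
    negative_ratio=3.0,
    resample_step=4.0,
    center_region_shrink_ratio=0.3,
    level_size_divisors=(8, 16, 32),
    level_proportion_range=((0, 0.4), (0.3, 0.7), (0.6, 1.0)),
    loss_tr=dict(type='MaskedBalancedBCELoss'),
    loss_tcl=dict(type='MaskedBCELoss'),
    loss_reg_x=dict(type='SmoothL1Loss', reduction='none'),
    loss_reg_y=dict(type='SmoothL1Loss', reduction='none'),
)

# Each feature level needs both a divisor and a proportion range.
assert len(module_loss['level_size_divisors']) == len(
    module_loss['level_proportion_range'])
```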

forward(preds, data_samples)

Compute FCENet loss.

Parameters

preds (list[dict]) – A list of dicts with keys cls_res and reg_res, which hold the classification result and the regression result computed from the input tensor with the same index. They have shapes \((N, C_{cls,i}, H_i, W_i)\) and \((N, C_{out,i}, H_i, W_i)\), respectively.
data_samples (list[TextDetDataSample]) – The data samples.

Returns

A dict of FCENet losses with keys loss_text, loss_center, loss_reg_x and loss_reg_y.

Return type

dict
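
The returned dict holds four separate loss terms. The sketch below shows one way they might be reduced to a single scalar for backpropagation. The numbers are placeholders standing in for scalar loss tensors, and summing every entry whose key contains "loss" is an assumption about the surrounding training loop, not something this class does itself.

```python
# Placeholder values standing in for the scalar tensors returned by forward().
losses = {
    'loss_text': 0.82,
    'loss_center': 0.41,
    'loss_reg_x': 0.27,
    'loss_reg_y': 0.30,
}

# Sum all entries whose key marks them as a loss term.
total_loss = sum(value for key, value in losses.items() if 'loss' in key)
print(round(total_loss, 2))
```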

forward_single(pred, gt)

Compute loss for one feature level.

Parameters

pred (dict) – A dict with keys cls_res and reg_res, which hold the classification result and the regression result from one feature level.
gt (Tensor) – Ground truth for one feature level. Cls and reg targets are concatenated along the channel dimension.

Returns

A list of losses for each feature level.

Return type

list[Tensor]
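
Because gt concatenates the classification and regression targets along the channel dimension, it has to be split by channel index before the individual losses are computed. The sketch below shows one plausible layout: three mask channels followed by 2k + 1 Fourier coefficients for each axis, where k is fourier_degree. The channel order, the names, and the helper gt_channel_slices are assumptions for illustration and should be checked against the source code.

```python
def gt_channel_slices(fourier_degree):
    """Return assumed channel slices of the concatenated target tensor.

    Hypothetical helper; the layout is an assumption, not documented API.
    """
    # A degree-K Fourier signature has coefficients for -K..K, i.e. 2K + 1.
    k = 2 * fourier_degree + 1
    return {
        'tr_mask': slice(0, 1),      # text region target
        'tcl_mask': slice(1, 2),     # text center line target
        'train_mask': slice(2, 3),   # pixels that contribute to the loss
        'x_map': slice(3, 3 + k),    # regression targets on the x axis
        'y_map': slice(3 + k, 3 + 2 * k),  # regression targets on the y axis
    }


slices = gt_channel_slices(fourier_degree=5)
num_channels = slices['y_map'].stop
print(num_channels)  # 3 + 2 * 11 = 25
```

Under this assumed layout, a tensor of shape (N, C, H, W) would be indexed as gt[:, slices['x_map']] to recover the x-axis regression targets.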

get_targets(data_samples)

Generate loss targets for FCENet from data samples.

Parameters

data_samples (list[TextDetDataSample]) – Ground truth data samples.

Returns

A tuple of three tensors, one for each of the three feature levels, used as FCENet targets.

Return type

tuple[Tensor]
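
To make the roles of level_size_divisors and level_proportion_range concrete, the sketch below computes the target map size per level and the levels a text instance could be assigned to. Both helpers are hypothetical. The definition of proportion (text size relative to image size) and the strict inequalities are assumptions about how the ranges are applied.

```python
def level_sizes(img_h, img_w, level_size_divisors=(8, 16, 32)):
    """Spatial size of the target map on each level (hypothetical helper)."""
    return [(img_h // d, img_w // d) for d in level_size_divisors]


def assign_levels(proportion,
                  level_proportion_range=((0, 0.4), (0.3, 0.7), (0.6, 1.0))):
    """Indices of the levels whose range contains the given proportion.

    Hypothetical helper; strict bounds are an assumption.
    """
    return [
        index for index, (low, high) in enumerate(level_proportion_range)
        if low < proportion < high
    ]


print(level_sizes(640, 640))  # [(80, 80), (40, 40), (20, 20)]
print(assign_levels(0.35))    # [0, 1]
print(assign_levels(0.5))     # [1]
```

Because the default ranges overlap, a text instance whose proportion falls in an overlap (for example 0.35) would contribute targets to two adjacent levels under these assumptions.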