Dictionary

class mmocr.models.common.Dictionary(dict_file, with_start=False, with_end=False, same_start_end=False, with_padding=False, with_unknown=False, start_token='<BOS>', end_token='<EOS>', start_end_token='<BOS/EOS>', padding_token='<PAD>', unknown_token='<UKN>')[source]

The class generates a dictionary for recognition. It pre-defines four special tokens: start_token, end_token, padding_token, and unknown_token, which are appended in that order to the end of the dictionary when their corresponding flags are True.

Parameters
  • dict_file (str) – The path of the character dict file, in which each character must occupy its own line.

  • with_start (bool) – The flag to control whether to include the start token. Defaults to False.

  • with_end (bool) – The flag to control whether to include the end token. Defaults to False.

  • same_start_end (bool) – The flag to control whether the start token and end token are the same. It only works when both with_start and with_end are True. Defaults to False.

  • with_padding (bool) – The flag to control whether to include the padding token. The padding token may represent more than padding: it can also stand for tokens like the blank token in CTC or the background token in SegOCR. Defaults to False.

  • with_unknown (bool) – The flag to control whether to include the unknown token. Defaults to False.

  • start_token (str) – The start token as a string. Defaults to ‘<BOS>’.

  • end_token (str) – The end token as a string. Defaults to ‘<EOS>’.

  • start_end_token (str) – The shared start/end token as a string, used if the start and end tokens are the same. Defaults to ‘<BOS/EOS>’.

  • padding_token (str) – The padding token as a string. Defaults to ‘<PAD>’.

  • unknown_token (str, optional) – The unknown token as a string. If it’s set to None and with_unknown is True, the unknown token will be skipped when converting string to index. Defaults to ‘<UKN>’.

Return type

None
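
The ordering of the special tokens can be illustrated with a small stand-alone sketch. It does not import MMOCR: build_dict is a hypothetical helper that only mirrors the behavior described above (enabled special tokens appended after the regular characters, in the order start, end, padding, unknown). How same_start_end collapses the two tokens into one is an assumption based on the parameter descriptions, not on the library source.

```python
def build_dict(chars, with_start=False, with_end=False, same_start_end=False,
               with_padding=False, with_unknown=False,
               start_token='<BOS>', end_token='<EOS>',
               start_end_token='<BOS/EOS>', padding_token='<PAD>',
               unknown_token='<UKN>'):
    """Hypothetical stand-in: append enabled special tokens after the characters."""
    result = list(chars)
    if with_start and with_end and same_start_end:
        # Assumption: a single shared token replaces separate start/end tokens.
        result.append(start_end_token)
    else:
        if with_start:
            result.append(start_token)
        if with_end:
            result.append(end_token)
    if with_padding:
        result.append(padding_token)
    if with_unknown and unknown_token is not None:
        result.append(unknown_token)
    return result


chars = ['a', 'b', 'c']
full = build_dict(chars, with_start=True, with_end=True,
                  with_padding=True, with_unknown=True)
print(full)       # ['a', 'b', 'c', '<BOS>', '<EOS>', '<PAD>', '<UKN>']
print(len(full))  # 7, the analogue of num_classes (special tokens are counted)
```

With all flags left at False, the sketch returns only the characters read from the dict file.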

char2idx(char, strict=True)[source]

Convert a character to an index via Dictionary.dict.

Parameters
  • char (str) – The character to convert to index.

  • strict (bool) – The flag to control whether to raise an exception when the character is not in the dictionary. Defaults to True.

Returns

The index of the character.

Return type

int
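
The effect of strict can be sketched without MMOCR as follows. The function below is a hypothetical stand-in that takes the character list explicitly. Raising KeyError and returning None in the non-strict case are assumptions of this sketch, since the text above only states that strict controls whether an exception is raised.

```python
def char2idx(dictionary, char, strict=True):
    """Hypothetical stand-in for a strict / non-strict character lookup."""
    lookup = {c: i for i, c in enumerate(dictionary)}
    if char in lookup:
        return lookup[char]
    if strict:
        # Assumption: the exception type is chosen for this sketch only.
        raise KeyError(f'Character {char!r} is not in the dictionary')
    return None  # Assumption: "not found" is signalled with None.


chars = ['a', 'b', 'c']
print(char2idx(chars, 'b'))                # 1
print(char2idx(chars, 'z', strict=False))  # None
```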

property dict: list

The list of characters to recognize, including any special tokens.

Type

list

idx2str(index)[source]

Convert a list of indexes to a string.

Parameters

index (list[int]) – The list of indexes to convert to string.

Returns

The converted string.

Return type

str
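
As a rough illustration, the conversion amounts to looking up each index in the character list and joining the results. The function below is a hypothetical stand-alone sketch, not the MMOCR implementation, and takes the character list as an explicit argument.

```python
def idx2str(dictionary, index):
    """Hypothetical stand-in: map each index to its character and join them."""
    return ''.join(dictionary[i] for i in index)


chars = ['h', 'e', 'l', 'o']
print(idx2str(chars, [0, 1, 2, 2, 3]))  # hello
```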

property num_classes: int

Number of output classes. Special tokens are counted.

Type

int

str2idx(string)[source]

Convert a string to a list of indexes via Dictionary.dict.

Parameters

string (str) – The string to convert to indexes.

Returns

The list of indexes of the string.

Return type

list
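
The interaction with the unknown token can be sketched as follows. This hypothetical stand-alone function models only the case where with_unknown is True: characters missing from the dictionary map to the unknown token's index, or are skipped when no unknown token exists (unknown_token set to None). The unknown_idx argument is an assumption of this sketch, not part of the documented signature.

```python
def str2idx(dictionary, string, unknown_idx=None):
    """Hypothetical stand-in: convert a string to a list of indexes."""
    lookup = {c: i for i, c in enumerate(dictionary)}
    indexes = []
    for char in string:
        if char in lookup:
            indexes.append(lookup[char])
        elif unknown_idx is not None:
            indexes.append(unknown_idx)
        # Otherwise the character is skipped (unknown token set to None).
    return indexes


chars = ['a', 'b', 'c', '<UKN>']
print(str2idx(chars, 'abz', unknown_idx=3))  # [0, 1, 3]
print(str2idx(chars[:3], 'abz'))             # [0, 1]
```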
