Dictionary

class mmocr.models.common.Dictionary(dict_file, with_start=False, with_end=False, same_start_end=False, with_padding=False, with_unknown=False, start_token='<BOS>', end_token='<EOS>', start_end_token='<BOS/EOS>', padding_token='<PAD>', unknown_token='<UKN>')[source]

The class generates a dictionary for text recognition. It pre-defines four special tokens: start_token, end_token, padding_token, and unknown_token. Each token is appended to the end of the dictionary, in that order, when its corresponding flag is True.

Parameters
  • dict_file (str) – The path to the character dictionary file, in which each line holds exactly one character.

  • with_start (bool) – The flag to control whether to include the start token. Defaults to False.

  • with_end (bool) – The flag to control whether to include the end token. Defaults to False.

  • same_start_end (bool) – The flag to control whether the start token and end token are the same. It only works when both with_start and with_end are True. Defaults to False.

  • with_padding (bool) – The flag to control whether to include the padding token. The padding token may represent more than padding: it can also stand for tokens like the blank token in CTC or the background token in SegOCR. Defaults to False.

  • with_unknown (bool) – The flag to control whether to include the unknown token. Defaults to False.

  • start_token (str) – The start token as a string. Defaults to ‘<BOS>’.

  • end_token (str) – The end token as a string. Defaults to ‘<EOS>’.

  • start_end_token (str) – The shared start/end token as a string, used when the start and end tokens are the same. Defaults to ‘<BOS/EOS>’.

  • padding_token (str) – The padding token as a string. Defaults to ‘<PAD>’.

  • unknown_token (str, optional) – The unknown token as a string. If it is set to None and with_unknown is True, characters that are not in the dictionary are skipped when converting a string to indexes. Defaults to ‘<UKN>’.

Return type

None
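
The ordering rule described above (special tokens appended after the regular characters, in the order start, end, padding, unknown) can be sketched in plain Python. This is only an illustration under stated assumptions, not MMOCR's implementation: `build_vocab` is a hypothetical helper, it takes an in-memory list of characters instead of reading `dict_file`, and it assumes that a single shared token is appended when `same_start_end` is in effect.

```python
def build_vocab(chars, with_start=False, with_end=False,
                same_start_end=False, with_padding=False,
                with_unknown=False, start_token='<BOS>',
                end_token='<EOS>', start_end_token='<BOS/EOS>',
                padding_token='<PAD>', unknown_token='<UKN>'):
    """Hypothetical helper: regular characters followed by the
    special tokens whose flags are enabled."""
    vocab = list(chars)
    # Assumption: one shared token replaces the separate start/end
    # tokens only when both flags are set and same_start_end is True.
    if with_start and with_end and same_start_end:
        vocab.append(start_end_token)
    else:
        if with_start:
            vocab.append(start_token)
        if with_end:
            vocab.append(end_token)
    if with_padding:
        vocab.append(padding_token)
    # Assumption: an unknown token of None adds no entry.
    if with_unknown and unknown_token is not None:
        vocab.append(unknown_token)
    return vocab


full = build_vocab(['a', 'b'], with_start=True, with_end=True,
                   with_padding=True, with_unknown=True)
shared = build_vocab(['a', 'b'], with_start=True, with_end=True,
                     same_start_end=True)
print(full)    # regular characters first, special tokens last
print(shared)  # a single shared start/end token
```

In this sketch the regular characters keep their indexes regardless of which flags are enabled, since special tokens only ever extend the end of the list.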

char2idx(char, strict=True)[source]

Convert a character to an index via Dictionary.dict.

Parameters
  • char (str) – The character to convert to index.

  • strict (bool) – The flag to control whether to raise an exception when the character is not in the dictionary. Defaults to True.

Returns

The index of the character.

Return type

int
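
The effect of the strict flag can be illustrated with a small stand-alone lookup. This is a sketch, not the library code: the function below is a hypothetical stand-in that takes the character-to-index mapping explicitly, the exception type (`ValueError`) is an assumption, and so is the non-strict fallback of returning `unknown_idx` (which may be `None`).

```python
def char2idx(char_to_idx, char, strict=True, unknown_idx=None):
    """Hypothetical stand-in for a strict / non-strict lookup."""
    idx = char_to_idx.get(char)
    if idx is not None:
        return idx
    if strict:
        # Assumption: the real class may raise a different exception type.
        raise ValueError(f'character {char!r} is not in the dictionary')
    # Assumption: the non-strict path falls back to the unknown index.
    return unknown_idx


vocab = ['a', 'b', 'c', '<UKN>']
char_to_idx = {c: i for i, c in enumerate(vocab)}

known = char2idx(char_to_idx, 'b')
fallback = char2idx(char_to_idx, 'z', strict=False,
                    unknown_idx=char_to_idx['<UKN>'])
```

With `strict=True`, the same lookup for `'z'` raises instead of returning a fallback index.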

property dict: list

The list of characters to recognize, including special tokens.

Type

list

idx2str(index)[source]

Convert a list of indexes to a string.

Parameters

index (list[int]) – The list of indexes to convert to string.

Returns

The converted string.

Return type

str

property num_classes: int

The number of output classes, including special tokens.

Type

int

str2idx(string)[source]

Convert a string to a list of indexes via Dictionary.dict.

Parameters

string (str) – The string to convert to indexes.

Returns

The list of indexes of the string.

Return type

list
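
The relationship between str2idx and idx2str can be shown as a round trip over a small in-memory vocabulary. The functions below are hypothetical stand-ins written for illustration, assuming every character of the input is present in the vocabulary; they do not reproduce how the real class handles unknown characters.

```python
vocab = ['a', 'b', 'c', '<PAD>']           # special token at the end
char_to_idx = {c: i for i, c in enumerate(vocab)}


def str2idx(string):
    """Map each character to its position in the vocabulary."""
    return [char_to_idx[ch] for ch in string]


def idx2str(index):
    """Map each index back to its character and join the result."""
    return ''.join(vocab[i] for i in index)


indexes = str2idx('cab')
restored = idx2str(indexes)
num_classes = len(vocab)                    # special tokens are counted
```

Here `num_classes` mirrors the documented property in that it counts the special token along with the regular characters.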
