Dictionary
- class mmocr.models.common.Dictionary(dict_file, with_start=False, with_end=False, same_start_end=False, with_padding=False, with_unknown=False, start_token='<BOS>', end_token='<EOS>', start_end_token='<BOS/EOS>', padding_token='<PAD>', unknown_token='<UKN>')
The class generates a dictionary for recognition. It pre-defines four special tokens: start_token, end_token, padding_token, and unknown_token, which are appended to the end of the dictionary, in that order, when their corresponding flags are True.
- Parameters
dict_file (str) – The path of the character dict file, in which each character must occupy its own line.
with_start (bool) – The flag to control whether to include the start token. Defaults to False.
with_end (bool) – The flag to control whether to include the end token. Defaults to False.
same_start_end (bool) – The flag to control whether the start token and end token are the same. It only works when both with_start and with_end are True. Defaults to False.
with_padding (bool) – The flag to control whether to include the padding token. The padding token may represent more than padding: it can also represent tokens like the blank token in CTC or the background token in SegOCR. Defaults to False.
with_unknown (bool) – The flag to control whether to include the unknown token. Defaults to False.
start_token (str) – The start token as a string. Defaults to ‘<BOS>’.
end_token (str) – The end token as a string. Defaults to ‘<EOS>’.
start_end_token (str) – The shared start/end token as a string, used when the start and end tokens are the same. Defaults to ‘<BOS/EOS>’.
padding_token (str) – The padding token as a string. Defaults to ‘<PAD>’.
unknown_token (str, optional) – The unknown token as a string. If it’s set to None and with_unknown is True, the unknown token will be skipped when converting string to index. Defaults to ‘<UKN>’.
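The way the flags decide which special tokens end up at the end of the dictionary can be sketched in plain Python. This is a minimal illustration, not the MMOCR source: build_token_list is a hypothetical helper, the characters come from an in-memory string standing in for the lines of dict_file, and the choice to add no entry when unknown_token is None is an assumption.

```python
from typing import List, Optional


def build_token_list(
    chars: List[str],
    with_start: bool = False,
    with_end: bool = False,
    same_start_end: bool = False,
    with_padding: bool = False,
    with_unknown: bool = False,
    start_token: str = "<BOS>",
    end_token: str = "<EOS>",
    start_end_token: str = "<BOS/EOS>",
    padding_token: str = "<PAD>",
    unknown_token: Optional[str] = "<UKN>",
) -> List[str]:
    """Hypothetical helper: append special tokens after the characters."""
    tokens = list(chars)
    # same_start_end only takes effect when both start and end are requested.
    if with_start and with_end and same_start_end:
        tokens.append(start_end_token)
    else:
        if with_start:
            tokens.append(start_token)
        if with_end:
            tokens.append(end_token)
    if with_padding:
        tokens.append(padding_token)
    # Assumption: nothing is appended when unknown_token is None.
    if with_unknown and unknown_token is not None:
        tokens.append(unknown_token)
    return tokens


# Stand-in for a dict file with one character per line.
chars = "a\nb\n".splitlines()

separate = build_token_list(
    chars, with_start=True, with_end=True, with_padding=True, with_unknown=True
)
shared = build_token_list(
    chars, with_start=True, with_end=True, same_start_end=True
)
print(separate)  # ['a', 'b', '<BOS>', '<EOS>', '<PAD>', '<UKN>']
print(shared)    # ['a', 'b', '<BOS/EOS>']
```

In this sketch the index of each special token follows from its position in the list, so the start token of `separate` has index 2 and the padding token has index 4.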
- Return type
- property dict: list
Returns a list of characters to recognize, where special tokens are counted.
- Type
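Given a list shaped like the one this property returns, the effect of unknown_token on string-to-index conversion can be sketched as follows. The function str_to_indices is hypothetical and is not the library's converter. It only illustrates the two cases named in the parameter description: mapping out-of-dictionary characters to the unknown token's index, and skipping them when unknown_token is None. Skipping whenever no unknown entry is present is an assumption of this sketch.

```python
from typing import List, Optional


def str_to_indices(
    text: str,
    tokens: List[str],
    unknown_token: Optional[str] = "<UKN>",
) -> List[int]:
    """Hypothetical converter: map each character to its list position."""
    index_of = {tok: i for i, tok in enumerate(tokens)}
    unknown_idx = None
    if unknown_token is not None:
        unknown_idx = index_of.get(unknown_token)

    indices = []
    for ch in text:
        if ch in index_of:
            indices.append(index_of[ch])
        elif unknown_idx is not None:
            # Out-of-dictionary character: use the unknown token's index.
            indices.append(unknown_idx)
        # Otherwise the character is skipped (assumption of this sketch).
    return indices


with_unk = ["a", "b", "<UKN>"]  # special token counted in the list
without_unk = ["a", "b"]        # unknown_token=None, so no entry was added

mapped = str_to_indices("cab", with_unk)
skipped = str_to_indices("cab", without_unk, unknown_token=None)
print(mapped)   # [2, 0, 1]
print(skipped)  # [0, 1]
```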