Research Article

Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets

Figure 6

Distribution of labels in the Chinese-ocr dataset. The first bar represents the 300 most frequently used words in the dataset. Most Chinese words each account for only a small fraction of all word occurrences.
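A distribution of this kind can be sketched as follows: count the label occurrences, sort them by frequency, and sum the counts in consecutive rank bins. This is a minimal illustration, not the authors' plotting code. The function name `rank_binned_shares` is hypothetical, and using the same width of 300 for every bar after the first is an assumption, since the caption only states the width of the first bar.

```python
from collections import Counter


def rank_binned_shares(labels, bin_size=300):
    """Return the share of all label occurrences covered by each rank bin.

    Bin 0 holds the `bin_size` most frequent labels, bin 1 the next
    `bin_size`, and so on. Using one fixed width for every bin is an
    assumption; the caption only describes the first bar.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # Occurrence counts sorted from most to least frequent label.
    freqs = [n for _, n in counts.most_common()]
    return [
        sum(freqs[i:i + bin_size]) / total
        for i in range(0, len(freqs), bin_size)
    ]


# Toy example: four distinct labels, bin size 2 instead of 300.
toy_labels = list("的的的的一一一是是不")
shares = rank_binned_shares(toy_labels, bin_size=2)
```

In the toy example the two most frequent labels cover 7 of the 10 occurrences, so the first bin dominates, which is the same kind of imbalance the figure describes.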