Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets

<div>We show the distribution of labels in the Chinese-ocr dataset. The first bar represents the 300 most frequently used characters in the dataset. Most Chinese characters account for only a small share of all character occurrences.</div>
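A distribution of this kind can be computed by ranking characters by frequency and grouping the ranks into fixed-size bins. The sketch below is only an illustration, not the authors' code: the function name `rank_bin_shares`, the assumption that every bar covers 300 ranks (the caption states this only for the first bar), and the toy labels are all hypothetical.

```python
from collections import Counter


def rank_bin_shares(labels, bin_size=300):
    """Share of all character occurrences covered by each frequency-rank bin.

    Bin 0 holds the `bin_size` most frequent characters, bin 1 the next
    `bin_size`, and so on. `labels` is an iterable of label strings.
    """
    # Count every character occurrence across all label strings.
    counts = Counter(ch for text in labels for ch in text)
    total = sum(counts.values())
    if total == 0:
        return []
    # Frequencies sorted from most to least common.
    ranked = [n for _, n in counts.most_common()]
    # Sum the frequencies inside each rank bin and normalise by the total.
    return [
        sum(ranked[i:i + bin_size]) / total
        for i in range(0, len(ranked), bin_size)
    ]


# Toy labels (made up, not taken from the Chinese-ocr dataset).
toy_labels = ["的的的的", "的一一", "是", "不"]
shares = rank_bin_shares(toy_labels, bin_size=2)
print(shares)
```

On an unbalanced dataset the first entry of the returned list dominates, which is the pattern the figure describes.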

Complexity


Figure 6: Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets