Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets

<div>The network architecture consists of three parts: convolutional layers, which extract a feature sequence from the input image; recurrent layers, which predict a label distribution for each frame; and a transcription layer, which translates the per-frame predictions into the final label sequence.</div>
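
To make the role of the transcription layer concrete, the following is a minimal sketch of how per-frame label distributions could be turned into a label sequence. It assumes greedy (best-path) CTC decoding with the blank label at index 0; the figure itself does not specify the decoding strategy, and the function name `greedy_ctc_decode` and the toy probabilities are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the most probable label index for every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Keep a label only if it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy per-frame distributions over {blank, label 1, label 2},
# standing in for the output of the recurrent layers.
frame_probs = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank separates two occurrences of label 1
    [0.10, 0.60, 0.30],  # label 1
    [0.10, 0.20, 0.70],  # label 2
]
print(greedy_ctc_decode(frame_probs))  # [1, 1, 2]
```

The blank frame is what allows the same character to appear twice in a row in the output; without it, the two runs of label 1 would collapse into one.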

Complexity

Figure 2: Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets