Research Article
Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets
Figure 2
The network architecture. It consists of three parts: convolutional layers, which extract a feature sequence from the input image; recurrent layers, which predict a label distribution for each frame; and a transcription layer, which translates the per-frame predictions into the final label sequence.
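To make the role of the transcription layer concrete, the following minimal sketch shows one common way per-frame label distributions can be turned into a label sequence: best-path (greedy) CTC decoding, which takes the most probable label in each frame, merges consecutive repeats, and drops blanks. This is an illustration under stated assumptions, not necessarily the decoding used in this article. The function name `greedy_ctc_decode`, the blank index 0, and the toy probabilities are all hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank label


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous = blank
    for probs in frame_probs:
        # Most probable label index for this frame.
        label = max(range(len(probs)), key=lambda i: probs[i])
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank; a blank between two equal labels keeps both.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy per-frame distributions over {blank, label 1, label 2}.
# The per-frame argmax path is 1, 1, blank, 1, 2, 2, blank.
toy_frames = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
    [0.10, 0.10, 0.80],
    [0.80, 0.10, 0.10],
]
result = greedy_ctc_decode(toy_frames)
```

In this toy input the two occurrences of label 1 are separated by a blank frame, so both survive the merge step, while the repeated frames of label 2 collapse into a single output label.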