Research Article

Focal CTC Loss for Chinese Optical Character Recognition on Unbalanced Datasets

Figure 2

The network architecture. The architecture consists of three parts: convolutional layers, which extract a feature sequence from the input image; recurrent layers, which predict a label distribution for each frame; and a transcription layer, which translates the per-frame predictions into the final label sequence.
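To make the role of the transcription layer concrete, the following is a minimal sketch of how per-frame label distributions can be turned into a label sequence. It assumes greedy (best-path) CTC decoding with the blank label at index 0; the caption does not state which decoding strategy is used, and the function name `greedy_ctc_decode` and the toy distributions are hypothetical, chosen only for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank label


def greedy_ctc_decode(frame_probs, blank=BLANK):
    """Best-path decoding: pick the most likely label in each frame,
    merge consecutive repeats, then drop blank labels."""
    # Most likely label index for every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Hypothetical per-frame distributions over the labels {blank, 1, 2},
# standing in for the output of the recurrent layers.
frame_probs = [
    [0.10, 0.80, 0.10],  # frame 1: label 1
    [0.20, 0.70, 0.10],  # frame 2: label 1 (repeat, merged with frame 1)
    [0.90, 0.05, 0.05],  # frame 3: blank
    [0.10, 0.60, 0.30],  # frame 4: label 1 (new occurrence after the blank)
    [0.10, 0.20, 0.70],  # frame 5: label 2
]

decoded = greedy_ctc_decode(frame_probs)
print(decoded)
```

In this toy input the frame-wise best path is 1, 1, blank, 1, 2. The two leading frames merge into a single label, while the blank separates them from the second occurrence of label 1, so the decoded sequence contains label 1 twice followed by label 2.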