Research Article

DetReco: Object-Text Detection and Recognition Based on Deep Neural Network

Figure 3

The feature fusion network. Three feature maps at different scales are generated by upsampling and concatenating the outputs of convolutional layers. First, the three feature maps are extracted from different convolutional layers of Darknet-53. Second, a convolutional set (a stack of convolutional layers) reduces the number of channels of the 13 × 13 feature map. The reduced 13 × 13 feature map is then upsampled by a factor of 2 and concatenated with the 26 × 26 feature map. The 52 × 52 feature map is fused in the same way. The red dashed boxes mark the feature maps extracted at the 3 scales.
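
The upsample-and-concatenate step described in the caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The channel counts (1024, 512, 256), the reduced widths (256, 128), the use of a single random 1 × 1 convolution as a stand-in for the convolutional set, nearest-neighbour upsampling, and the choice to build the 52 × 52 map from the fused 26 × 26 map are all assumptions made for illustration, as are the function names.

```python
import numpy as np


def reduce_channels(x, out_channels, rng):
    """Stand-in for the 'convolutional set': one 1x1 convolution,
    i.e. a per-pixel linear map over channels (random weights, assumed)."""
    in_channels = x.shape[0]
    w = rng.standard_normal((out_channels, in_channels)) / np.sqrt(in_channels)
    return np.einsum("oc,chw->ohw", w, x)


def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fuse(coarse, fine, reduced_channels, rng):
    """Reduce the coarse map's channels, upsample it, and concatenate it
    with the finer map along the channel axis."""
    reduced = reduce_channels(coarse, reduced_channels, rng)
    return np.concatenate([upsample2x(reduced), fine], axis=0)


rng = np.random.default_rng(0)

# Hypothetical backbone outputs in (channels, height, width) layout.
c13 = rng.standard_normal((1024, 13, 13))
c26 = rng.standard_normal((512, 26, 26))
c52 = rng.standard_normal((256, 52, 52))

p13 = c13                        # coarsest scale, used as extracted
p26 = fuse(p13, c26, 256, rng)   # 256 upsampled + 512 backbone channels
p52 = fuse(p26, c52, 128, rng)   # 128 upsampled + 256 backbone channels

for name, p in [("p13", p13), ("p26", p26), ("p52", p52)]:
    print(name, p.shape)
```

Under these assumptions, concatenation adds the channel counts of the two inputs while the spatial size is set by the finer map, so each fused map keeps the resolution of its scale and also carries information from the coarser one.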