Research Article

Object Detection Based on Fast/Faster RCNN Employing Fully Convolutional Architectures

Table 1

ConvNet configurations. The convolutional layer parameters are denoted as “layer/receptive field size-stride.” The suffix s2 indicates that the stride of the layer is 2; a stride of 1 is omitted. All convolutional layers in ZF/VGG have a 3 × 3 receptive field, which is omitted. LRN and GAP are short for local response normalization layer and global average pooling layer, while Inc and Res abbreviate inception and residual building blocks, respectively. The ReLU activation function is not shown for brevity. The batch normalization layers, which are included only in the last three architectures, are likewise not shown.

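| ZF | VGG | GoogleNet | Inception_v2 | Inception_v3 | ResNet_50 |
|---|---|---|---|---|---|
| Conv1-s2 | Conv1_1 | Conv1/7 × 7-s2 | Conv1/7 × 7-s2 | Conv1/3 × 3-s2 | Conv1/7 × 7-s2 |
| LRN | Conv1_2 | MaxPool-s2 | MaxPool-s2 | Conv2/3 × 3 | MaxPool-s2 |
| MaxPool-s2 | MaxPool-s2 | LRN | Conv2/1 × 1 | Conv3/3 × 3 | Res2a |
| Conv2-s2 | Conv2_1 | Conv2/1 × 1 | Conv2/3 × 3 | MaxPool-s2 | Res2b |
| LRN | Conv2_2 | Conv2/3 × 3 | MaxPool-s2 | Conv4/1 × 1 | Res2c |
| MaxPool-s2 | MaxPool-s2 | LRN | Inc_3a | Conv4/3 × 3 | Res3a-s2 |
| Conv3 | Conv3_1 | MaxPool-s2 | Inc_3b | MaxPool-s2 | Res3b |
| Conv4 | Conv3_2 | Inc_3a | Inc_3c-s2 | Inc_a1 | Res3c |
| Conv5 | Conv3_3 | Inc_3b | Inc_4a | Inc_a2 | Res3d |
| MaxPool-s2 | MaxPool-s2 | MaxPool-s2 | Inc_4b | Inc_a3 | Res4a-s2 |
| Fc_4096 | Conv4_1 | Inc_4a | Inc_4c | Inc_a-s2 | Res4b |
| Fc_4096 | Conv4_2 | Inc_4b | Inc_4d | Inc_b1 | Res4c |
| | Conv4_3 | Inc_4c | Inc_4e-s2 | Inc_b2 | Res4d |
| | MaxPool-s2 | Inc_4d | Inc_5a | Inc_b3 | Res4e |
| | Conv5_1 | Inc_4e | Inc_5b | Inc_b4 | Res4f |
| | Conv5_2 | MaxPool-s2 | GAP | Inc_b-s2 | Res5a-s2 |
| | Conv5_3 | Inc_5a | | Inc_c1 | Res5b |
| | MaxPool-s2 | Inc_5b | | Inc_c2 | Res5c |
| | Fc_4096 | GAP | | GAP | GAP |
| | Fc_4096 | | | | |
| Fc_1000 | Fc_1000 | Fc_1000 | Fc_1000 | Fc_1000 | Fc_1000 |
| Softmax | Softmax | Softmax | Softmax | Softmax | Softmax |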
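
The “layer/receptive field size-stride” notation of Table 1 can be read mechanically. The following is a minimal sketch, not part of the original article, that parses one table entry and multiplies the strides along a column to obtain the overall downsampling factor of the convolutional trunk. The names `parse_entry`, `total_stride`, and `RESNET_50_TRUNK` are hypothetical helpers introduced here for illustration; the layer list is copied from the ResNet_50 column, stopping before GAP because global average pooling collapses the spatial dimensions rather than striding over them.

```python
import re
from math import prod

# Pattern for the caption's notation: name, optional "/k × k", optional "-sN".
# (Hypothetical helper; the article itself gives no parsing code.)
_ENTRY = re.compile(r"(?P<name>[^/]+?)(?:/(?P<k>\d+) × \d+)?(?:-s(?P<s>\d+))?")


def parse_entry(entry):
    """Split a Table 1 entry into (name, kernel_size, stride).

    kernel_size is None when the table omits it; stride defaults to 1,
    following the caption's convention that stride 1 is not written.
    """
    m = _ENTRY.fullmatch(entry)
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    kernel = int(m["k"]) if m["k"] else None
    stride = int(m["s"]) if m["s"] else 1
    return m["name"], kernel, stride


def total_stride(column):
    """Product of the strides of all entries in one column."""
    return prod(parse_entry(e)[2] for e in column)


# ResNet_50 column of Table 1, excluding GAP, Fc_1000, and Softmax.
RESNET_50_TRUNK = [
    "Conv1/7 × 7-s2", "MaxPool-s2",
    "Res2a", "Res2b", "Res2c",
    "Res3a-s2", "Res3b", "Res3c", "Res3d",
    "Res4a-s2", "Res4b", "Res4c", "Res4d", "Res4e", "Res4f",
    "Res5a-s2", "Res5b", "Res5c",
]

print(parse_entry("Conv1/7 × 7-s2"))   # ('Conv1', 7, 2)
print(total_stride(RESNET_50_TRUNK))   # 32
```

Five entries in the ResNet_50 column carry the s2 suffix, so the feature map entering GAP is 32 times smaller than the input along each spatial dimension. The same function can be applied to any other column of the table.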