Mathematical Problems in Engineering

Research Article

Object Detection Based on Fast/Faster RCNN Employing Fully Convolutional Architectures

Table 1

ConvNet configurations. The convolutional layer parameters are denoted as “layer/receptive field size-stride.” The suffix s2 indicates that the layer has a stride of 2; a stride of 1 is omitted. For ZF/VGG, the receptive field size of all convolutional layers is 3 × 3 and is omitted. LRN and GAP stand for local response normalization layer and global average pooling layer, while Inc and Res denote the inception and residual building blocks, respectively. The ReLU activation function is not shown for brevity. The batch normalization layers, which are included only in the last three architectures (Inception_v2, Inception_v3, and ResNet_50), are likewise not shown.


ZF	VGG	GoogleNet	Inception_v2	Inception_v3	ResNet_50

Conv1-s2	Conv1_1	Conv1/7 × 7-s2	Conv1/7 × 7-s2	Conv1/3 × 3-s2	Conv1/7 × 7-s2
LRN	Conv1_2	MaxPool-s2	MaxPool-s2	Conv2/3 × 3	MaxPool-s2
MaxPool-s2	MaxPool-s2	LRN	Conv2/1 × 1	Conv3/3 × 3	Res2a
Conv2-s2	Conv2_1	Conv2/1 × 1	Conv2/3 × 3	MaxPool-s2	Res2b
LRN	Conv2_2	Conv2/3 × 3	MaxPool-s2	Conv4/1 × 1	Res2c
MaxPool-s2	MaxPool-s2	LRN	Inc_3a	Conv4/3 × 3	Res3a-s2
Conv3	Conv3_1	MaxPool-s2	Inc_3b	MaxPool-s2	Res3b
Conv4	Conv3_2	Inc_3a	Inc_3c-s2	Inc_a1	Res3c
Conv5	Conv3_3	Inc_3b	Inc_4a	Inc_a2	Res3d
MaxPool-s2	MaxPool-s2	MaxPool-s2	Inc_4b	Inc_a3	Res4a-s2
Fc_4096	Conv4_1	Inc_4a	Inc_4c	Inc_a-s2	Res4b
Fc_4096	Conv4_2	Inc_4b	Inc_4d	Inc_b1	Res4c
	Conv4_3	Inc_4c	Inc_4e-s2	Inc_b2	Res4d
	MaxPool-s2	Inc_4d	Inc_5a	Inc_b3	Res4e
	Conv5_1	Inc_4e	Inc_5b	Inc_b4	Res4f
	Conv5_2	MaxPool-s2	GAP	Inc_b-s2	Res5a-s2
	Conv5_3	Inc_5a		Inc_c1	Res5b
	MaxPool-s2	Inc_5b		Inc_c2	Res5c
	Fc_4096	GAP		GAP	GAP
	Fc_4096

Fc_1000

Softmax
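
The “layer/receptive field size-stride” notation used in Table 1 can be read mechanically. The following is a minimal sketch, not part of the original article, that parses a table entry into its name, receptive field size, and stride, and multiplies the strides of a column to obtain its cumulative downsampling factor. The names `parse_layer` and `total_stride` are hypothetical helpers introduced here for illustration; the two layer lists are transcribed from the VGG and ResNet_50 columns up to (and excluding) the fully connected or GAP layers.

```python
import re

# Pattern for entries such as "Conv1/7 × 7-s2", "MaxPool-s2", "Inc_3a".
# The receptive field ("/k × k") and the stride suffix ("-sN") are optional.
TOKEN = re.compile(
    r"^(?P<name>[A-Za-z0-9_]+)"
    r"(?:/(?P<kh>\d+)\s*[×x]\s*(?P<kw>\d+))?"
    r"(?:-s(?P<stride>\d+))?$"
)


def parse_layer(token):
    """Parse one table entry (hypothetical helper, not from the article)."""
    m = TOKEN.match(token.strip())
    if m is None:
        raise ValueError(f"unrecognized layer entry: {token!r}")
    kernel = None  # receptive field not stated in the entry
    if m.group("kh") is not None:
        kernel = (int(m.group("kh")), int(m.group("kw")))
    # Per the caption, an omitted stride means stride 1.
    stride = int(m.group("stride")) if m.group("stride") else 1
    return {"name": m.group("name"), "kernel": kernel, "stride": stride}


def total_stride(tokens):
    """Product of the strides of all listed layers."""
    product = 1
    for token in tokens:
        product *= parse_layer(token)["stride"]
    return product


# Columns transcribed from Table 1 (convolutional part only).
VGG = [
    "Conv1_1", "Conv1_2", "MaxPool-s2",
    "Conv2_1", "Conv2_2", "MaxPool-s2",
    "Conv3_1", "Conv3_2", "Conv3_3", "MaxPool-s2",
    "Conv4_1", "Conv4_2", "Conv4_3", "MaxPool-s2",
    "Conv5_1", "Conv5_2", "Conv5_3", "MaxPool-s2",
]
RESNET_50 = [
    "Conv1/7 × 7-s2", "MaxPool-s2",
    "Res2a", "Res2b", "Res2c",
    "Res3a-s2", "Res3b", "Res3c", "Res3d",
    "Res4a-s2", "Res4b", "Res4c", "Res4d", "Res4e", "Res4f",
    "Res5a-s2", "Res5b", "Res5c",
]

if __name__ == "__main__":
    print(parse_layer("Conv1/7 × 7-s2"))
    print("VGG cumulative stride:", total_stride(VGG))
    print("ResNet_50 cumulative stride:", total_stride(RESNET_50))
```

Under this reading, each listed column contains five stride-2 layers, so the feature map entering the final classifier layers is downsampled by a factor of 2^5 = 32 relative to the input. The sketch treats building blocks such as Res3a-s2 as a single unit with the stride given in the table and does not model their internal layers.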