Research Article

Multiactivation Pooling Method in Convolutional Neural Networks for Image Recognition

Table 1

Baseline architectures. Models A and E are two baseline architectures [13] with slightly different numbers of filters in some layers. The others are architectures with large-scale pooling layers or MAP layers. Conv3 indicates that the filter kernel size is 3×3. The ReLU gate is not shown for simplicity. The structures for small-size datasets (CIFAR, SVHN) and large-scale datasets (ImageNet) differ in the input layer and the fully connected layers. Parameters to the right of the slash belong to large-scale datasets.

Network architectures based on VGG
VGG-11 | VGG-13
A | B | C | D | E | F | G | H

Small-size datasets: Input (32×32 RGB image)/Large-scale datasets: Input (224×224 RGB image)

conv3-64 | conv3-64 | conv3-64 | conv3-64 | conv3-64 | conv3-64 | conv3-64 | conv3-64
2×2 maxpool
conv3-128 | conv3-128 | conv3-128 | conv3-64 | conv3-64 | conv3-64 | conv3-64
conv3-128 | conv3-128 | conv3-128 | 2×2 maxpool
conv3-128 | conv3-128 | conv3-128
4×4 maxpool/MAP
conv3-128 | conv3-128 | conv3-128 | conv3-128 | conv3-128
conv3-128 | conv3-256 | conv3-256 | conv3-128 | conv3-256 | conv3-256 | conv3-256
2×2 maxpool
8×8 maxpool/MAP
conv3-256 | conv3-128 | 4×4 maxpool/MAP
conv3-256 | conv3-256
16×16 maxpool/MAP
2×2 maxpool
conv3-256 | conv3-256
8×8 maxpool/MAP
conv3-512
conv3-256 | conv3-256 | 16×16 maxpool/MAP
conv3-256 | conv3-256
2×2 maxpool
2×2 maxpool
conv3-128 | conv3-256
conv3-256 | conv3-256
conv3-256 | conv3-512
conv3-512 | 4×4 maxpool/MAP
conv3-512 | 4×4 maxpool/MAP
conv3-512 | conv3-512
2×2 maxpool
conv3-256 | 2×2 maxpool
conv3-512
conv3-512 | conv3-512
conv3-512 | conv3-512
conv3-512 | conv3-512 | 4×4 avgpool
conv3-512 | conv3-512 | conv3-512 | 4×4 avgpool
conv3-512
conv3-512 | conv3-512 | conv3-512 | conv3-512 | conv3-512 | conv3-512
2×2 avgpool | 2×2 avgpool | 2×2 avgpool | 2×2 avgpool | 2×2 avgpool | 2×2 avgpool

fc1-512×512/25088×4096

fc2-512×512/4096×4096

fc3-512×10/4096×1000

Softmax-10/1000
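
The fc1 input sizes in Table 1 (512 for small-size inputs, 25088 for ImageNet inputs) can be checked with a short calculation. The sketch below is illustrative only and is not code from the article. It assumes that the 3×3 convolutions are padded so that only pooling shrinks the feature map, that the pooling layers together reduce each spatial side by a factor of 32 (for example, five 2×2 poolings), and that the last convolutional layer has 512 channels. The function name `fc1_input_size` and its parameters are hypothetical.

```python
from math import prod


def fc1_input_size(input_side, pool_sizes=(2, 2, 2, 2, 2), channels=512):
    """Length of the flattened feature vector fed to fc1.

    Assumes padded 3x3 convolutions (spatial size preserved), so only the
    pooling layers reduce the feature map. pool_sizes lists the side of each
    non-overlapping pooling window; the default is an assumed sequence of
    five 2x2 poolings.
    """
    total_reduction = prod(pool_sizes)       # 2*2*2*2*2 = 32 by default
    side = input_side // total_reduction     # final feature-map side
    return side * side * channels


small = fc1_input_size(32)    # CIFAR / SVHN input, 32x32
large = fc1_input_size(224)   # ImageNet input, 224x224
print(small, large)
```

Under these assumptions, any pooling sequence whose window sizes multiply to 32 gives the same fc1 input size, which is consistent with all eight models sharing the same fully connected layers.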