Mathematical Problems in Engineering

Research Article

Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN

Table 2

Feature extraction network data comparison.


Network structure	Feature	Anchors	Lateral	Top-down	AR	AR_s	AR_m	AR_l

Featurized image pyramid	C4	47k	No	No	48.3	32.0	58.7	62.2
Featurized image pyramid	C5	12k	No	No	44.9	25.3	55.5	64.2
Bottom-up pyramid		200k	Yes	No	49.5	30.5	59.9	68.0
Top-down pyramid		200k	No	Yes	46.1	26.5	57.4	64.7
Finest level only	P2	200k	Yes	Yes	51.3	35.1	59.7	67.6
FPN		200k	Yes	Yes	56.3	44.9	63.4	66.2

The first column lists the feature extraction structure: the first two rows use the output of a single layer (C4 or C5), while the remaining rows cover the single feature map and pyramidal feature hierarchy variants. The second column is the name of the output layer, where the “{}” symbol indicates independent prediction at each layer. The evaluation standard is average recall (AR). The superscript on AR indicates the number of anchors generated for each image. The subscripts “s,” “m,” and “l” denote small, medium, and large targets, respectively.
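To make the evaluation standard concrete, the following minimal sketch shows one way AR and its size-specific variants could be computed. It is not the evaluation code used for Table 2. It assumes COCO-style conventions that the text above does not state: recall is averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05, and targets are "small" below 32×32 pixels in area, "large" from 96×96 pixels, and "medium" in between. The function names are hypothetical, and matching is simplified: each ground-truth box is compared against its best-overlapping proposal, without one-to-one assignment.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def size_bucket(box):
    """Return 's', 'm', or 'l' from box area (assumed COCO-style bounds)."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "s"
    if area < 96 ** 2:
        return "m"
    return "l"


def average_recall(gt_boxes, proposals, thresholds=None):
    """Mean recall over IoU thresholds; None if there are no ground-truth boxes."""
    if thresholds is None:
        # Assumed thresholds: 0.50, 0.55, ..., 0.95
        thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    if not gt_boxes:
        return None
    # Simplification: best overlap per ground-truth box, no one-to-one matching.
    best = [max((iou(g, p) for p in proposals), default=0.0) for g in gt_boxes]
    recalls = [sum(b >= t for b in best) / len(best) for t in thresholds]
    return sum(recalls) / len(recalls)


def average_recall_by_size(gt_boxes, proposals):
    """AR over all targets plus AR restricted to small, medium, and large ones."""
    result = {"all": average_recall(gt_boxes, proposals)}
    for key in ("s", "m", "l"):
        subset = [g for g in gt_boxes if size_bucket(g) == key]
        result[key] = average_recall(subset, proposals)
    return result
```

For example, calling `average_recall_by_size(gt_boxes, proposals)` with lists of `(x1, y1, x2, y2)` tuples returns a dictionary with the keys `"all"`, `"s"`, `"m"`, and `"l"`; a bucket that contains no ground-truth boxes is reported as `None` rather than as zero recall.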