Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN

<div>The RPN treats every pixel of each feature map as an anchor point and, at each one, generates candidate frames of three sizes and three aspect ratios. All candidate frames are then ranked by score and screened with non-maximum suppression (NMS), and a certain number of the surviving frames are saved for subsequent training and prediction.</div>
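
The anchor generation and NMS screening described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `generate_anchors`, `iou`, and `nms`, the stride, the scale and ratio values, the IoU threshold, and the `max_keep` limit are all assumptions chosen for illustration, and the scores in the usage lines are made up rather than produced by a network.

```python
import math


def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return one (x1, y1, x2, y2) box per pixel, scale, and ratio combination."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map pixel in input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Assumed convention: ratio = height / width, area = s * s.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold, max_keep):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps, cap the count."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_keep:
            break
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Usage with assumed values: a 2 x 3 feature map, stride 16,
# three sizes and three ratios per anchor point.
anchors = generate_anchors(2, 3, 16, (32, 64, 128), (0.5, 1.0, 2.0))
# Placeholder scores that decrease with index (a real RPN would predict these).
scores = [1.0 - 0.01 * i for i in range(len(anchors))]
kept = nms(anchors, scores, iou_threshold=0.7, max_keep=10)
```

The greedy loop compares each candidate only against frames that have already been kept, which gives the same result as the usual "pick the best, suppress its neighbours, repeat" formulation while staying short. Production code would normally use a vectorized or GPU implementation instead.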

Mathematical Problems in Engineering


Figure 6: Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN