Research Article
Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN
Table 4
Experimental data.
The first row gives the network structure and the number of recognition categories. For example, FPN + resnet101_81 denotes an FPN with a ResNet-101 residual network as the backbone, recognizing 81 categories. The second row indicates that every network structure is trained for 160 × 1000 = 160,000 iterations. Memory_size is the memory occupied by the weights of each network structure after training. Train_time is the time taken to train each network structure. Total params and Trainable params are the total number of parameters and the number of trainable parameters, respectively, of the network structure. Floating-point operations (FLOPs) give the number of floating-point operations required by each network structure, that is, its computational cost. Test_avg_time (M4952) is the average time for each network structure to test 4952 images. Min_train_loss is the minimum training loss reached by each network structure over the 160,000 training iterations.
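The distinction between Total params and Trainable params, and the naming of the column labels, can be illustrated with a minimal Python sketch. It is not the authors' code: `ConvLayer`, `count_params`, and `parse_label` are hypothetical names, the two toy layers are not taken from the paper's networks, and the label format is assumed from the single example given in the note.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one 2D convolution layer."""
    in_channels: int
    out_channels: int
    kernel_size: int
    trainable: bool = True  # False means the layer is frozen during training

    def param_count(self) -> int:
        # Weights (k * k * C_in * C_out) plus one bias per output channel.
        weights = self.kernel_size ** 2 * self.in_channels * self.out_channels
        return weights + self.out_channels


def count_params(layers):
    """Return (total, trainable) parameter counts for a list of layers."""
    total = sum(layer.param_count() for layer in layers)
    trainable = sum(layer.param_count() for layer in layers if layer.trainable)
    return total, trainable


def parse_label(label):
    """Split a label such as 'FPN + resnet101_81' into (backbone, num_classes).

    Assumes the format '<head> + <backbone>_<num_classes>'.
    """
    backbone, num_classes = label.split("+")[-1].strip().rsplit("_", 1)
    return backbone, int(num_classes)


# Toy layers for illustration only, not the paper's network.
layers = [
    ConvLayer(3, 64, 7, trainable=False),  # frozen layer
    ConvLayer(64, 128, 3),                 # trainable layer
]
total_params, trainable_params = count_params(layers)
backbone, num_classes = parse_label("FPN + resnet101_81")
total_iterations = 160 * 1000  # training schedule stated in the table note
```

In this sketch a frozen layer still contributes to the total count but not to the trainable count, which is why the two figures can differ for the same network structure.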