Research Article

Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN

Table 4

Experimental data.

Backbone_class          FPN + resnet101_81   FPN + resnet86_81   FPN + resnet50_81   FPN + resnet101_3   FPN + resnet86_3   FPN + resnet50_3
Epoch_steps             160_1000             160_1000            160_1000            160_1000            160_1000           160_1000
Total params            64,158,584           58,549,624          45,088,120          63,744,170          58,135,210         44,673,706
Trainable params        64,047,096           58,453,496          45,028,856          63,632,682          58,039,082         44,614,442
FLOPs                   130,205,828          118,834,293         91,542,609          129,377,924         118,006,389        90,714,705
Memory_size             257.6 M              235.0 M             180.9 M             255.9 M             233.3 M            179.2 M
Train_time              27.98 h              21.77 h             18.25 h             23.02 h             20.43 h            18.73 h
Test_avg_time (M4952)   2.14 s               2.014 s             1.39 s              2.10 s              1.97 s             1.36 s

The first row gives the network structure and the number of detection categories; for example, FPN + resnet101_81 combines an FPN with the ResNet-101 residual network and detects 81 categories. The second row indicates that every network structure is trained for 160 × 1000 = 160,000 steps. Memory_size refers to the size of the weight file saved after each network structure is trained. Train_time refers to the time required to train each network structure. Total params and Trainable params represent the total and trainable parameter counts, respectively, of each network structure. Floating-point operations (FLOPs) indicate the number of floating-point operations performed by each network structure, that is, its computational cost. Test_avg_time (M4952) refers to the average test time per image over the 4952 test images for each network structure. Min_train_loss refers to the minimum training loss reached by each network structure after the 160,000 training steps.
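As a rough illustration of how the parameter counts and the average test time in Table 4 can be obtained, the following is a minimal sketch assuming the detector is implemented as a built Keras/TensorFlow model (as is common for Mask R-CNN); the function names count_parameters and average_test_time are illustrative placeholders rather than the authors' code, and FLOPs, which are usually measured with a separate profiler, are not reproduced here.

import time

import numpy as np
from tensorflow import keras


def count_parameters(model):
    """Return (total, trainable) parameter counts for a built Keras model."""
    total = model.count_params()
    trainable = int(np.sum([keras.backend.count_params(w)
                            for w in model.trainable_weights]))
    return total, trainable


def average_test_time(model, images):
    """Average single-image forward-pass time, analogous to Test_avg_time."""
    start = time.time()
    for img in images:  # e.g. the 4952 preprocessed test images
        model.predict(np.expand_dims(img, axis=0), verbose=0)
    return (time.time() - start) / len(images)

In this sketch, the trainable count sums only the weights updated during training, so it is slightly smaller than the total count, which also includes fixed quantities such as batch-normalization statistics; this matches the gap between Total params and Trainable params in Table 4.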