Research Article

Efficient ConvNet Feature Extraction with Multiple RoI Pooling for Landmark-Based Visual Localization of Autonomous Vehicles

Table 5

Comparison of the average running time per image and the GPU memory cost of the two variants of our method and of the compared methods for extracting ConvNet features. The total running time is the sum of the costs of preprocessing (Pre), the forward pass in Caffe (Caffe), and postprocessing (Post). The suffix b (AlexNetb/VGG-M1024b) refers to the costs of computation and GPU memory when the 100 detected landmarks are sent into Caffe as a single batch of 100. “—” means the computational cost is negligible. The running time and GPU memory consumption of the two variants of our method are close to those of FastRCNN-AlexNet/VGG-M1024; compared with AlexNetb/VGG-M1024b, they are about five times faster and use roughly four to five times less GPU memory.

Method                     GPU memory (MB)   Average running time (ms)
                                             Pre     Caffe    Post    Total

MRoI-FastRCNN-AlexNet      240               6.9     18.3     3.8     29.0
MRoI-FastRCNN-VGG-M1024    396               6.9     33.5     5.8     46.2

FastRCNN-AlexNet           218               6.9     13.3     —       20.2
FastRCNN-VGG-M1024         367               6.9     31.0     —       37.9

AlexNet                    183               30.3    518.5    —       548.8
VGG-M1024                  229               30.3    998.8    —       1029.1
AlexNetb                   880               30.3    115.9    —       146.2
VGG-M1024b                 1965              30.3    199.7    —       230.0
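
The relationship between the columns can be checked with a short script. The following is a minimal sketch rather than part of the original evaluation: it recomputes each total as pre + Caffe + post and derives the speed and memory ratios against the batched baselines. It assumes that a negligible ("—") cost counts as 0 ms, that the preprocessing time printed once per group (6.9 ms or 30.3 ms) applies to every row of that group, and that the 880 MB row is the batched AlexNet variant (AlexNetb). The names `TABLE5`, `total_ms`, and `ratios` are illustrative only.

```python
# Rows transcribed from Table 5:
# (GPU memory in MB, pre, Caffe, post, reported total), all times in ms.
# Assumptions: a post time of 0.0 stands in for the negligible "—" entries;
# the pre time printed once per group (6.9 or 30.3) is shared by the rows of
# that group; the 880 MB row is taken to be the batched AlexNet variant.
TABLE5 = {
    "MRoI-FastRCNN-AlexNet":   (240,   6.9,  18.3, 3.8,   29.0),
    "MRoI-FastRCNN-VGG-M1024": (396,   6.9,  33.5, 5.8,   46.2),
    "FastRCNN-AlexNet":        (218,   6.9,  13.3, 0.0,   20.2),
    "FastRCNN-VGG-M1024":      (367,   6.9,  31.0, 0.0,   37.9),
    "AlexNet":                 (183,  30.3, 518.5, 0.0,  548.8),
    "VGG-M1024":               (229,  30.3, 998.8, 0.0, 1029.1),
    "AlexNetb":                (880,  30.3, 115.9, 0.0,  146.2),
    "VGG-M1024b":              (1965, 30.3, 199.7, 0.0,  230.0),
}


def total_ms(row):
    """Sum the pre, Caffe, and post stages of one table row."""
    _, pre, caffe, post, _ = row
    return pre + caffe + post


def ratios(baseline, method):
    """Return (time ratio, memory ratio) of baseline over method."""
    b, m = TABLE5[baseline], TABLE5[method]
    return b[4] / m[4], b[0] / m[0]


if __name__ == "__main__":
    for name, row in TABLE5.items():
        # Each recomputed sum should match the reported total within rounding.
        assert abs(total_ms(row) - row[4]) < 0.05, name

    for base, ours in [("AlexNetb", "MRoI-FastRCNN-AlexNet"),
                       ("VGG-M1024b", "MRoI-FastRCNN-VGG-M1024")]:
        t, mem = ratios(base, ours)
        print(f"{ours} vs {base}: {t:.1f}x faster, {mem:.1f}x less GPU memory")
```

Under these assumptions every reported total matches the sum of its three stages, which supports reading the single preprocessing value in each group as shared by all rows of that group.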