Abstract

Precise, reliable, and low-cost vehicular localization across a continuous spatiotemporal domain is an important problem in the field of outdoor ground vehicles. This paper proposes a visual odometry algorithm in which an ultrarobust and fast feature-matching scheme is combined with an effective antiblurring frame selection strategy. Our method follows the procedure of finding feature correspondences in consecutive frames and minimizing their reprojection error. Blurred images pose a great challenge to localization during sharp turns or fast movement, so we attempt to mitigate the impact of blur with an image singular value decomposition (ISVD) antiblurring algorithm. Moreover, a statistic filter of feature space displacement and circle matching are proposed to screen or prune potential matching features, so as to remove the outliers caused by mismatching. An evaluation on the benchmark dataset KITTI and on real outdoor data with blur, low texture, and illumination change demonstrates that the proposed ego-motion scheme achieves significantly better performance than other state-of-the-art visual odometry approaches.

1. Introduction

Precise, reliable, and low-cost vehicular localization across a continuous spatiotemporal domain of highway transportation is an important problem in the field of outdoor ground vehicles. Solving this issue is a breakthrough in promoting the future development of intelligent transportation, specifically in the following five aspects:

1.1. Conducive to Fine, Real-Time Traffic Information Accurate Perception

Compared with the traditional video-based or magnetic coil-based detection methods, more refined real-time traffic parameters can be obtained from high-precision vehicle positions. With this information, we can calculate accurate micro- and macrotraffic parameters (e.g., time headway, space headway, lane occupancy rate, average speed, traffic flow, traffic density, queue length, driving trajectory, congestion level, and accurate origin-destination (OD) pairs). These parameters provide powerful data support for advanced intelligent transportation applications (e.g., vehicle collision avoidance, parking guidance, navigation, platoon collaborative control, and dynamic traffic assignment over the road network).

1.2. Conducive to Establishment of the Vehicle Credit Records in the Vehicular Network

With a long-term record of the precise position of a vehicle, its behavior safety level can be evaluated according to its driving trajectory. The recorded credit is valuable for vehicle collision warning, dynamic allocation of vehicle road rights, vehicle priority ranking, and other applications in the vehicular network.

1.3. Conducive to Early Warning of Traffic Accidents

In the future intelligent transportation system, if each vehicle can obtain its own precise position and those of the surrounding vehicles in real time, it can fuse other information (e.g., weather, road condition, road alignment, and obstacles) from the roadside to evaluate vehicle risk dynamically. Advance warning of traffic accidents can then avoid accidents or reduce their probability.

1.4. Conducive to Postanalysis of Traffic Accidents

In the future vehicular network, if the real-time location of the vehicle can be obtained and recorded exactly, microscale data on the scenes before, during, and after an accident can be obtained, which helps to reconstruct the situation before the collision and to assign responsibility scientifically and objectively. Long-term observation of accurate traffic accident data also helps to discover the spatiotemporal distribution law of traffic black spots, which can provide an important basis for road design and traffic accident prevention.

1.5. Conducive to the Cooperative Security Decision-Making Analysis with V2V (Vehicle to Vehicle) and V2I (Vehicle to Infrastructure)

A precise and reliable location is the most basic and important datum for dynamic traffic information collection and integration and for security control with V2V and V2I. Accurate localization of the vehicle helps to obtain more refined real-time traffic parameters, from which one can derive the optimal speed in a designated area, the probability of joining or leaving a platoon, and the optimal path selection under comprehensive consideration of trip time and driving distance.

In short, a major breakthrough in large-scale, precise, reliable, low-cost, and real-time vehicular localization will greatly reduce the uncertainty and randomness of the road traffic system, bring it closer to the rail transit system, and greatly improve transport efficiency and safety without reducing mobility. In this paper, a precise, reliable, and low-cost ego-motion estimation methodology is proposed, built around an image antiblurring scheme and a statistic filter of feature space displacement. There are three main contributions of this paper.
(1) An image singular value decomposition (ISVD) antiblurring algorithm is proposed to mitigate the effect of image blur. To highlight the efficiency and performance of ISVD, extensive comparisons are conducted with other image-blurred degree assessment algorithms.
(2) We find that the feature displacements not only between temporally adjacent matched frames (left-left or right-right) but also between the left and right matched frames of a stereo pair conform to a Laplacian distribution. Based on this discovery, we propose a statistic filter of feature space displacement to remove the outliers in feature correspondences.
(3) Experiments are conducted on a public benchmark dataset and on data captured with our outdoor ground vehicle platform. We analyze the results quantitatively and qualitatively to demonstrate the superiority of the proposed method.

2. Related Work

This paper focuses on innovation in the front end of visual information-based ego-motion estimation, so here we review only works on the front-end procedure. The accuracy and reliability of feature detection and matching are the basis of any vision-based ego-motion estimation method. To eliminate the error caused by feature detection and matching or tracking, several effective techniques have been developed in the relevant literature. Badino et al. [1] used the whole history of the tracked feature points to compute the motion of the camera, which reduced drift significantly at a negligible additional computational cost. They also discovered that the interframe feature tracking error follows a Laplacian distribution in all three dimensions, although this property was not employed to remove outliers. The concept of temporal flow [2] restricts the uncertainty to reduce error propagation from matches. A more direct way is careful feature selection and tracking, for example, ORB based [3], KLT based [4], and AKAZE based [5]. With further analysis, we find that the feature displacement statistic for sequential frames conforms to a Laplacian-like distribution with a zero mean. Badino et al. [1] observed this distribution but did not exploit it in the matching procedure.

In the feature-based framework, however, common feature detectors cannot detect enough features in a blurred image, especially for an outdoor ground vehicle in a sharp turn. The blur effect further degrades the matching accuracy between frames (Figure 1(a); the number of good matches decreases from 111 to 15 as the blur grows more severe) and leads to inaccurate loop closure detection (Figure 1(b)). Figures 1(b1) and 1(b2) show almost the same scene, but Figure 1(b2) is blurred. The similarity between Figures 1(b1) and 1(b2) is 76%, while that between Figures 1(b1) and 1(b3) is 92%, which has a great effect on reconstruction and localization performance. When an image is captured while the camera is moving during the exposure time, the one-to-one relationship between scene points and image points is broken, and a number of scene points are projected onto a single pixel, all contributing to the final pixel value. This effect is called motion blur. Motion blur is the most common artifact in images captured by a moving robot with an onboard camera. Usually, four types of blur are considered to occur in a moving environment: linear blur, rotation blur, zoom blur, and Gaussian blur. To remove or reduce the harm caused by blurry images, two approaches are generally used: image deblurring and antiblurring. The former [6] recovers the blurred image by deconvolution, which sacrifices feature accuracy and is time consuming. The latter [7–9] classifies images into blurry and clear ones and then eliminates the blurry ones. Lee et al. [10] presented robust data association and drift-free simultaneous localization and mapping (SLAM) with blurred images, where the blurred images were recovered by deconvolution; obviously, this deblurring method is not applicable to outdoor ground vehicle localization. Zhao et al. [11] presented an adaptive blurred image classification algorithm, which was effective but could not be applied to outdoor ground vehicles with feature-matching failure. Pretto et al. [12] proposed a visual odometry (VO) method robust to motion blur based on an improved SIFT feature detector [13]. However, only Gaussian blur is considered in that work; the other three common types of blur are ignored.

In general, there are three ways to estimate visual ego-motion, namely, VO, structure from motion (SFM), and SLAM. VO is devoted to the real-time and accurate estimation of camera movement, while SFM focuses on 3-dimensional reconstruction as well as camera pose estimation and usually refines the estimation with bundle adjustment. VO can be thought of as a special case of SFM, whereas SLAM's purpose is to localize a robot in an unknown environment; more specifically, SLAM recovers the trajectory of the camera and builds a map at the same time. The concept of VO was formally put forward by Nister et al. [14], who realized a real-time VO system, and later research was mostly based on this VO framework. The most successful VO applications were NASA's Mars Exploration Rovers, Spirit and Opportunity [15]; an image pyramid was introduced to help feature tracking on Curiosity in 2011. Howard [16] implemented a stereo VO with adapted Harris and FAST features to ensure real-time performance and employed a feature-matching method [17] to find the correspondences. This research laid a good foundation for the development of VO but suffered from low accuracy. Geiger et al. [18, 19] used a simple Sobel template operator to detect features for a real-time system; they tested their VO algorithm on the KITTI benchmark dataset and obtained positioning results with high efficiency and accuracy. The KITTI vision benchmark suite was established by Andreas Geiger (MPI Tübingen), Philip Lenz (KIT), Christoph Stiller (KIT), Raquel Urtasun (University of Toronto), and others. Bellavia et al. [2] proposed keyframe matching and loop closure to build a stereo camera SLAM system. Badino et al. [1] integrated features from multiple frames to improve the accuracy of the motion estimation and combined the historical image information to obtain a refined motion estimate. Mur-Artal et al. [20] proposed the feature-based monocular ORB-SLAM system, which selects points and keyframes for reconstruction and obtains excellent performance; ORB-SLAM is the most outstanding feature-based system that runs in real time in small and large, indoor and outdoor environments. Unlike the feature-based methods, Engel et al. [21] relied on the photoconsistency of high-contrast pixels, including corners, edges, and high-texture areas, and an illumination-invariance method removed the effect of brightness changes. DSO [22] and LSD-SLAM [23] abandoned the feature detection procedure and directly used the actual sensor values, the light received from a certain direction over a certain time period. Hu and Chen [24] combined a monocular camera and an inertial measurement unit (IMU) in a visual odometer; they employed trifocal tensor geometry and a multistate constraint Kalman filter in the algorithm architecture to reduce the time consumption and enhance the accuracy of the algorithm. Besides the fusion of visual information and IMU, VO is usually used as a supplement to the global positioning system (GPS) [25, 26]. Fusion approaches are becoming the mainstream of research on practical positioning and navigation systems. Although there are many multisensor localization schemes, the potential of VO has not been unlocked completely. With the development of deep learning (DL) in computer vision, some researchers have imported DL into localization [27, 28].
Typically, DL is used to detect loop closure, which differs from the conventional bag-of-words (BoW) approach [29]. For a more comprehensive and in-depth survey of the VO algorithms that have been developed, the reader may refer to [30–34].

3. Proposed Method

In this section, the system model and motion parameterization are briefly introduced. Figure 2 shows our experimental platform equipped with a high-resolution stereo camera rig (Basler acA1600 GigE, image size 1200 × 800 pixels) and differential GPS (DGPS, RT2000).

Then, the image antiblurring algorithm, based on the blurred degree calculated with ISVD, is presented. Moreover, we propose a statistic filter of feature displacement to obtain ultrarobust feature correspondences.

3.1. System Model and Motion Parameterization

In this paper, we parameterize the motion as the vector $\mathbf{m} = (r_x, r_y, r_z, t_x, t_y, t_z)^{T}$, where $\mathbf{r} = (r_x, r_y, r_z)$ and $\mathbf{t} = (t_x, t_y, t_z)$ represent the rotation and translation vectors between consecutive frames, respectively.

In each pose estimation step, $\mathbf{m}$ is estimated from matched points. For stereo VO, there are four feature points in each matching step, namely, $p_{l,c}$, $p_{r,c}$, $p_{l,p}$, and $p_{r,p}$, where the subscripts $l$, $r$, $c$, and $p$ represent the left, right, current, and previous frames, respectively. $X$, $Y$, and $Z$ are the 3D coordinates corresponding to the feature points, obtained by stereo triangulation:

$$Z = \frac{f b}{d}, \qquad X = \frac{(u - c_u)\, Z}{f}, \qquad Y = \frac{(v - c_v)\, Z}{f},$$

where $(u, v)$ is the pixel coordinate in the left image, $K$ is the camera intrinsic matrix, $f$ is the focal length, $(c_u, c_v)$ is the principal point of the camera, $b$ is the base length, and $d$ is the disparity.
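As a worked example of the pinhole model above (the helper name and numeric values below are ours, purely illustrative), the 3D coordinates follow directly from a left-image pixel and its disparity:

import numpy as np

def triangulate(u, v, d, f, cu, cv, b):
    # Depth from disparity, then back-projection through the left camera.
    Z = f * b / d
    X = (u - cu) * Z / f
    Y = (v - cv) * Z / f
    return np.array([X, Y, Z])

# Example: f = 700 px, principal point (600, 400), baseline b = 0.54 m,
# pixel (650, 420) with disparity 20 px.
point = triangulate(650.0, 420.0, 20.0, 700.0, 600.0, 400.0, 0.54)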

During the $k$th Gauss-Newton gradient descent iteration, the motion vector is refined using three random features selected by random sample consensus (RANSAC) [35]. The incremental equation is

$$\left(J^{T} W J\right)\Delta\mathbf{m} = J^{T} W \mathbf{r},$$

where $\mathbf{r} = \tilde{p} - \hat{p}(\mathbf{m})$ is the residual and $\tilde{p}$ is the homogeneous form of the observed feature point $p$. Here, the weight matrix $W$ is designed to reduce the error caused by inaccurate camera calibration; the weight is small if the feature point is far away from the camera. $J$ is the Jacobian matrix of the predicted feature positions $\hat{p}(\mathbf{m})$ with respect to the motion parameters.

If the stopping criterion $\lVert\Delta\mathbf{m}\rVert < \varepsilon$ is met, the gradient descent iteration is terminated. Otherwise, the gradient descent iteration continues with $\mathbf{m}_{k+1} = \mathbf{m}_{k} + \Delta\mathbf{m}$.
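For illustration only, the following Python sketch shows how such a weighted Gauss-Newton refinement can be organized; the callbacks residual_fn and jacobian_fn, the weighting rule, and all names are our assumptions rather than the paper's implementation, with the residual defined as observed minus predicted feature positions.

import numpy as np

def gauss_newton_step(m, residual_fn, jacobian_fn, weights):
    # r: residuals (observed minus predicted); J: Jacobian of the
    # predicted feature positions with respect to m.
    r = residual_fn(m)
    J = jacobian_fn(m)
    W = np.diag(weights)  # small weights for distant feature points
    # Solve the normal equations (J^T W J) dm = J^T W r.
    dm = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)
    return m + dm, dm

def refine_motion(m0, residual_fn, jacobian_fn, weights, eps=1e-6, max_iter=20):
    m = np.asarray(m0, dtype=float)
    for _ in range(max_iter):
        m, dm = gauss_newton_step(m, residual_fn, jacobian_fn, weights)
        if np.linalg.norm(dm) < eps:  # stopping criterion ||dm|| < eps
            break
    return m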

3.2. ISVD Antiblurring Algorithm

Image blurring is a tough problem in visual applications, especially for outdoor ground vehicles making sharp turns. Some efforts have been made toward deblurring [6, 36] with a deconvolutional model, which is not tractable in a real-time system such as VO or SLAM. Zhao et al. [11] designed an adaptive blurred image classification framework, which selects a relatively slightly blurred image to calculate the ego-motion. This method can eliminate the blur effect to a certain extent, but its image-blurred degree calculation algorithm, SIGD, is less impressive than other no-reference image quality assessment algorithms, for example, Marziliano [7], JNB [8], and CPBD [9]. This paper proposes an image quality assessment method, image singular value decomposition (ISVD), to enhance the performance of image antiblurring.

For an image matrix $A \in \mathbb{R}^{m \times n}$, the singular value decomposition is $A = U \Sigma V^{T}$, where $U$ and $V$ are orthogonal matrices and the diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ of $\Sigma$ are the nonzero singular values. To explore the relationship between the singular values and image quality, we choose six images with an increasing blur degree from the LIVE database [37], as shown in Figure 3.

Figure 4 shows the singular value curves of the images with different blur degrees in Figure 3. After decomposing the image matrix, a singular value histogram is built on the nonzero singular values. In this figure, each blur degree yields a distinct number of singular values above the threshold, so the blur degree can be distinguished by that number without ambiguity, exactly as the threshold line indicates. ISVD is defined as

$$\mathrm{ISVD}(A) = \sum_{i=1}^{r} \mathbb{1}\left(\sigma_i > T\right),$$

where $r$ is the number of nonzero singular values of $A$, $\sigma_i$ is the $i$th singular value in $\Sigma$, and $T$ is a singular value threshold, set to 100 in this paper.

The image blurred degree calculation with ISVD is summarized as Algorithm 1.

Input: image $I$
Output: blurred degree $N$ of image $I$
Initialization: $N \leftarrow 0$
1 convert $I$ into grayscale $G$
2 calculate the singular values $\Sigma = \{\sigma_1, \ldots, \sigma_r\}$ by singular value decomposition on $G$
3 for $\sigma_i$ in $\Sigma$
4  if $\sigma_i > T$
5   $N$++
6  end
7 end
8 return blurred degree $N$

Then, an adaptive blurred image classification is adopted to select a less blurred image for the following procedures. ISVD's time complexity is that of the singular value decomposition, $O(mn\min(m,n))$ for an $m \times n$ image matrix; if the image matrix is too large, the decomposition becomes slow. To overcome this time consumption, antiblurring is applied only to keyframes, which is far more efficient than a frame-by-frame scheme.
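A minimal Python sketch of Algorithm 1 follows (NumPy and OpenCV; the default threshold mirrors the paper's setting $T = 100$, while the function name and everything else are our illustrative choices):

import cv2
import numpy as np

def isvd_blur_degree(image, threshold=100.0):
    # Algorithm 1: count the singular values above the threshold;
    # a smaller count indicates a more blurred image.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float64)
    sigma = np.linalg.svd(gray, compute_uv=False)  # singular values only
    return int(np.sum(sigma > threshold))

# Antiblurring keyframe selection can then keep the sharper candidate:
# best = max(candidate_keyframes, key=isvd_blur_degree)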

3.3. Statistic Filter of Feature Space Displacement

As discovered in [1], the feature tracking error follows a Laplacian distribution. We identify that the feature displacements not only between temporally adjacent matched frames (left-left or right-right) but also between the left and right matched frames of a stereo pair conform to this distribution. The feature space displacement is the displacement along the x or y coordinate in the feature space. When the displacements of all the original matching features, including inliers and outliers, are collected, the displacement between frames fits a Laplacian distribution with a long tail, as shown in Figure 5. The corresponding features closer to the peak bin are more likely inliers. The feature displacements for inliers alone between frames are shown in Figure 6, with a short tail. The x-axis denotes the displacement in pixels, and the y-axis is the number of features collected by histogram statistics.

On the basis of this discovery, we propose a statistic filter of feature space displacement (SFFSD) to remove the outliers and retain the inliers, as shown in Algorithm 2. This algorithm describes the statistic filter between two candidate matched images $I_1$ and $I_2$ only.

Input: candidate matched images $I_1$ and $I_2$
Output: good feature matches $M_{\mathrm{good}}$
1 detect features to get descriptors and key-points
2 match the descriptors to get the original matches $M$ with a brute-force matcher and Hamming distance
3 calculate the key-point displacements of $M$ in the x and y components, $d_x$ and $d_y$
4 create a 2D histogram from $d_x$ and $d_y$ to confirm the highest bin for mode approximation
5 use the samples within a radius of the mode to estimate the Laplacian distribution parameters in x and y
6 determine the min and max boundary values in x to include a certain percentage of inliers, assuming a Laplacian distribution
7 find the matches $M_x$ according to the boundary in Step 6
8 repeat Steps 6 and 7 in the y component to find the matches $M_y$
9 calculate the common elements of $M_x$ and $M_y$
10 for $m$ in $M_x$
11  if $m$ in $M_y$
12      push the corresponding element $m$ into $M_{\mathrm{good}}$
13  end
14 end

The bin number is chosen so that the corresponding features cover 80% of the inliers. However, this value is not set in stone; it depends on the diversity of the detected features. To further refine the matches after the statistic filter, circle matching is employed to verify them [19] (see the sketch after this paragraph). Three points are randomly selected from the resulting good feature matches to implement RANSAC at each iteration. The ultrarobust matches improve accuracy and, meanwhile, accelerate the convergence of RANSAC.
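The sketch below illustrates Algorithm 2 in Python with OpenCV and NumPy. The ORB settings, bin count, search radius, and the 80% coverage value are our assumptions for illustration; the Laplacian parameters are estimated by their maximum-likelihood estimators (median for location, mean absolute deviation for scale), and the boundary in Steps 6-8 uses the fact that a Laplacian's central interval of mass $p$ is $\mu \pm b\ln\frac{1}{1-p}$.

import cv2
import numpy as np

def sffsd_filter(img1, img2, coverage=0.8, bins=32, radius=5.0):
    # Steps 1-2: ORB features and brute-force Hamming matching.
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    # Step 3: displacements of the matched key-points in x and y.
    dx = np.array([kp2[m.trainIdx].pt[0] - kp1[m.queryIdx].pt[0] for m in matches])
    dy = np.array([kp2[m.trainIdx].pt[1] - kp1[m.queryIdx].pt[1] for m in matches])

    # Step 4: the highest 2D-histogram bin approximates the displacement mode.
    hist, xe, ye = np.histogram2d(dx, dy, bins=bins)
    ix, iy = np.unravel_index(np.argmax(hist), hist.shape)
    mx, my = (xe[ix] + xe[ix + 1]) / 2.0, (ye[iy] + ye[iy + 1]) / 2.0

    # Step 5: fit a Laplacian to the samples near the mode.
    near = (np.abs(dx - mx) < radius) & (np.abs(dy - my) < radius)
    keep = np.ones(len(matches), dtype=bool)
    for d in (dx, dy):
        mu = np.median(d[near])                   # location (MLE)
        b = np.mean(np.abs(d[near] - mu)) + 1e-9  # scale (MLE)
        # Steps 6-8: keep displacements inside the central interval.
        keep &= np.abs(d - mu) <= b * np.log(1.0 / (1.0 - coverage))
    # Steps 9-14: retain matches consistent in both x and y.
    return [m for m, k in zip(matches, keep) if k]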

4. Experimental Results and Analysis

4.1. Performance Evaluation of ISVD

This paper applies the video quality assessment indexes proposed by the Video Quality Experts Group (VQEG) to test the performance of different blurred degree assessment algorithms. With the following parameters, we can evaluate the performance of the different algorithms (a computational sketch follows this list).
(1) Outlier ratio (OR):
$$\mathrm{OR} = \frac{N_{\mathrm{out}}}{N},$$
where $N_{\mathrm{out}}$ is the number of outlier images and $N$ is the total number of images. An image is regarded as an outlier if the difference between the score calculated by the blurred degree assessment algorithm and the actual reference score is larger than a threshold. The smaller the OR, the better the consistency of the algorithm.
(2) Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(s_i - \hat{s}_i\right)^{2}},$$
where $s_i$ and $\hat{s}_i$ are the reference score and the score calculated by the blurred degree assessment algorithm for the $i$th image, respectively. The smaller the RMSE, the better the predicted result of the assessment algorithm.
(3) Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|s_i - \hat{s}_i\right|,$$
which sums the absolute values of the residuals $s_i - \hat{s}_i$. In statistics, MAE is a measure of the difference between two continuous variables. The smaller the MAE, the better the value calculated by the assessment algorithm.
(4) Spearman rank-order correlation coefficient (SROCC):
$$\mathrm{SROCC} = 1 - \frac{6\sum_{i=1}^{N} d_i^{2}}{N\left(N^{2}-1\right)},$$
where $d_i$ is the difference between the rank of the reference score (mean opinion score, MOS) of the $i$th image and the rank of the value calculated by the assessment algorithm for that image. The bigger the SROCC, the better the performance of the assessment algorithm.
(5) Pearson linear correlation coefficient (PCC):
$$\mathrm{PCC} = \frac{\sum_{i=1}^{N}\left(s_i - \bar{s}\right)\left(\hat{s}_i - \bar{\hat{s}}\right)}{\sqrt{\sum_{i=1}^{N}\left(s_i - \bar{s}\right)^{2}}\sqrt{\sum_{i=1}^{N}\left(\hat{s}_i - \bar{\hat{s}}\right)^{2}}},$$
where $s_i$ is the actual reference score, $\bar{s}$ is the average reference score over the images, $\hat{s}_i$ is the score calculated by the assessment algorithm for the $i$th image, and $\bar{\hat{s}}$ is the average calculated score. The bigger the PCC, the better the consistency between the calculated score and the actual reference score.
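Under the definitions above, these metrics can be computed as in the following sketch; the function name and the outlier threshold are illustrative assumptions, and SciPy's spearmanr and pearsonr implement the SROCC and PCC formulas.

import numpy as np
from scipy import stats

def vqeg_metrics(ref, pred, outlier_thresh=2.0):
    # ref: actual reference scores; pred: scores from the assessment algorithm.
    ref, pred = np.asarray(ref, float), np.asarray(pred, float)
    resid = pred - ref
    return {
        "OR": float(np.mean(np.abs(resid) > outlier_thresh)),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "SROCC": float(stats.spearmanr(ref, pred).correlation),
        "PCC": float(stats.pearsonr(ref, pred)[0]),
    }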

With the different blurred degree assessment algorithms (Marziliano, JNB, CPBD, SIGD, and ISVD), we calculate the blurred degree for 174 images in the LIVE dataset [37], whose actual reference scores are known. The image quality parameters OR, RMSE, MAE, SROCC, and PCC are calculated for each assessment algorithm, as Table 1 shows. Obviously, the proposed ISVD outperforms the other algorithms. The blurred degree calculated by ISVD remains monotonic over the full range of the data, as suggested in [36].

High efficiency is critical to a real-time system. As shown in Table 2, ISVD is somewhat less efficient than SIGD, but it is sufficient for real-time VO.

4.2. Performance Evaluation of Statistic Filter

We analyze SFFSD’s performance against the alternative ratio test (RT) introduced in SIFT (a good match is declared if the ratio of the nearest correspondence distance to the second nearest is below a threshold), in terms of precision, recall, and F1-measure, which are critical to matching performance. The evaluation is conducted on the Oxford dataset VGG [37, 38] and TUM [39], as shown in Figure 7.

VGG has different photometric and geometric transformations, for example, Gaussian blur, lighting variation, and scale and rotation changes. The first frame in each set is designated as the reference, and the other five frames, indexed 0–4, from the same set are matched to the reference. TUM has video sequences with low texture and blurriness. Obviously, SFFSD achieves relatively higher performance than the RT.

Figure 8 shows examples of outlier removal with SFFSD, where blue and red lines represent inliers and outliers, respectively. After the SFFSD, circle matching is employed to verify the matches (Figure 9).

4.3. Experiments and Comparisons

Experiments have been conducted on the open benchmark dataset KITTI [19] and on real outdoor and indoor datasets. In consideration of feature detection and matching efficiency, ORB [37] is regarded as the better choice; a comparison with other detectors and descriptors, such as SIFT, SURF [40], KAZE [41], and AKAZE [42], has been carried out in [41]. To emphasize the performance, we employ two baseline VO algorithms, libviso2 (VISO2) [43] and ORB-SLAM2 [3], as comparisons in our experiments. The feature detector and descriptor in VISO2 are replaced with ORB, named ORB-VISO2. Our method is classified into two variants, namely, a method without loop closure and bundle adjustment based on VISO2 (Our-method1) and a method with loop closure and bundle adjustment based on ORB-SLAM2 (Our-method2). Whichever algorithms Our-method1 and Our-method2 are based on, the major difference is our contribution of image antiblurring and the statistic filter of feature space displacement. Table 3 shows the comparison of processing times among VISO2, ORB-VISO2, ORB-SLAM2, Our-method1, and Our-method2. The time spent on ISVD and SFFSD is within 13 ms, which has a negligible effect on the total execution time.

4.3.1. Experiments on Benchmark Datasets

The first experiment is conducted on the KITTI benchmark dataset, which consists of 22 stereo sequences; we randomly take sequences 02 and 05 as comparison datasets. The trajectories of the different algorithms and the ground truth are drawn in the same figure.

From Figure 10, we can see that our method is closer to the ground truth than the state-of-the-art methods. In terms of both rotation error and translation error, our method performs better. Compared with the other VO algorithms, the absolute trajectory errors on sequences 00–10 are shown in Table 4. Among the methods without loop closure and bundle adjustment, Our-method1 is better than VISO2 and ORB-VISO2; on sequences “KITTI 01” and “KITTI 08,” ORB-VISO2 fails under fast motion on the highway or illumination change. Among the methods with loop closure and bundle adjustment, Our-method2 is better than ORB-SLAM2; the longer the trajectory, the greater the advantage of Our-method2.

4.3.2. Experiments in Outdoor and Indoor Environment

In this section, to verify stability and scalability, we also carry out experiments in real outdoor and indoor environments. The data are captured in a considerably cluttered environment on campus, which contains challenging scenes with dynamic objects, illumination change, surface-gathered water, and motion blur. The image samples are shown in Figure 9.

The trajectories recovered by the proposed method and the other methods are shown in Figure 11(a). The trajectory is approximately 1.45 km long and lasted 236.4 s. We regard the trajectory of the DGPS as the ground truth. VISO2 and ORB-VISO2 failed to localize under the instability factors of the outdoor environment, and Our-method1 is not reliable owing to error accumulation. ORB-SLAM2 and Our-method2 perform well with loop closure and bundle adjustment; obviously, Our-method2 has less drift than ORB-SLAM2. As shown in Table 5, the proposed method outperforms the other methods in terms of overall RMSE and end-point error.

To further support the findings of this paper, we conduct another experiment to compare the performance of different VO and SLAM algorithms. The trajectory in Figure 11(b) is 1.17 km long and lasted 109.3 s, with an average speed of about 38.5 km/h. Unlike the dataset in Figure 11(a), the images in Figure 11(b) contain serious blurring effects or illumination changes, which are common in a challenging outdoor environment. VISO2 and ORB-VISO2, without exception, fail to estimate the motion of the ground vehicle on this challenging dataset. Because of image blur, moving objects, or illumination change, even ORB-SLAM2 fails to localize. However, Our-method1 and Our-method2 still perform well over the whole process with acceptable accumulative rotation and translation errors. Table 6 shows the overall RMSEs and end-point errors of the listed algorithms corresponding to the outdoor data in Figure 11(b).

Figure 12 shows that our proposed methods perform well in an underground parking lot (image samples are shown in Figures 12(a) and 12(b)). Compared with the other vision-based methods, only our proposed method can still localize the vehicle under poor light and with dynamic objects, while VISO2 and ORB-VISO2 both fail, as shown in Figure 12(c). The repeated indoor parking experiments demonstrate that ORB-SLAM2 is not stable, failing under drastic lighting changes.

5. Conclusions

In this paper, a visual odometry algorithm based on image antiblurring and a statistic filter of feature space displacement is proposed. The approach achieves a low drift error even on long paths. The impact of image blurring is mitigated by the antiblurring algorithm, and the statistic filter of feature space displacement and circle matching are adopted to screen or prune potential matching features, so as to remove the outliers caused by mismatching. An evaluation on the benchmark dataset KITTI and on the real outdoor dataset demonstrates that the proposed method is suitable for robot ego-motion estimation, even for outdoor ground vehicles. Our further research will be devoted to establishing a dense SLAM system based on the proposed method, because a dense map is vital to vehicle navigation applications.

Conflicts of Interest

The authors declare that they have no competing interests.

Authors’ Contributions

Xiangmo Zhao and Haigen Min conceived and designed the research. Xiaochi Li, Pengpeng Sun, and Xia Wu performed the experiments. Haigen Min and Zhigang Xu analyzed the data. Xiangmo Zhao, Zhigang Xu, and Haigen Min wrote and edited the manuscript. All authors read and approved the final manuscript.

Acknowledgments

This work was supported and made possible by the National Natural Science Foundation of China (no. 51278058), 111 Project on Information of Vehicle-Infrastructure Sensing and ITS (no. B14043), Joint Laboratory of Internet of Vehicles sponsored by Ministry of Education and China Mobile (no. 213024170015), Application of Basic Research Project for National Ministry of Transport (Grant no. 2015319812060), and the Fundamental Research Funds for the Central Universities (310824165024, 300102328108, 2013G5240009, and 310824153103).