Abstract

Range estimation is crucial for maintaining a safe distance, in particular for vision-based navigation and localization. Autonomous vehicles with monocular vision are well suited to outdoor environments due to their mobility and operability. However, accurate range estimation using a vision system is challenging because of the nonholonomic dynamics and susceptibility of vehicles. In this paper, a measuring rectification algorithm for range estimation under shaking conditions is designed. The proposed method focuses on how to estimate range using monocular vision when a shake occurs, and the algorithm only requires the pose variations of the camera to be acquired. Simultaneously, it solves the problem of how to assimilate results from different kinds of sensors. To eliminate measuring errors caused by shakes, we establish a pose-range variation model. Afterwards, the algebraic relation between the distance increment and the camera's pose variation is formulated. The pose variations are presented in the form of roll, pitch, and yaw angle changes to evaluate the pixel coordinate increment. To demonstrate the superiority of our proposed algorithm, the approach is validated in a laboratory environment using Pioneer 3-DX robots. The experimental results demonstrate that the proposed approach improves range accuracy significantly.

1. Introduction

The application of mobile robots to observation and rescue missions has received increasing attention in recent years. Current advances in sensing and computing make mobile robots a suitable option for tasks such as search and rescue, SLAM (Simultaneous Localization and Mapping), automatic navigation, and target detection. For mobile robots, retrieving their position is one of the most important issues. In recent years, vision sensors have attracted a lot of attention for this problem because they are relatively inexpensive and compact with low power consumption. Furthermore, methods using vision sensors can localize the robot in various environments where general localization methods such as wheel odometry and GPS perform poorly. If localization can be performed using only image information, a robot's flexibility will be improved remarkably.

However, the precondition for a successful intelligent robot system is the exact perception of the surroundings, where the range and azimuth information of surrounding targets plays an important role. Range estimation algorithms using vision sensors are known as VO (visual odometry).

Approaches for range estimation can mainly be divided into three categories: radar-based, laser-based, and vision-based. As a typical paradigm of noncontact approaches, ultrasonic sensors have the advantages of time efficiency and measurement accuracy. However, it is arduous for them to detect objects with small surfaces or objects situated at a wide angle relative to the ultrasonic sensor(s). Among all perception sensors, vision sensors have the added advantage of acquiring a large amount of information at a lower cost. Vision-based methods can solve both the range and azimuth estimation problems using only the acquired images themselves. There has been much interest in research on object detection with stereo cameras [14], but the monocular camera is still strongly advantageous for its large sensing area, low cost, and easy installation.

To achieve precise VO in outdoor environments, some problems remain to be solved. In this work, we consider VO on bumpy courses, which are common in outdoor environments; a bumpy course here means an environment with rough roads on which VO accuracy is dynamically affected by pose changes of the vision sensor. Furthermore, if precise VO can be realized in environments that include rough roads, we believe it can be utilized in any outdoor environment.

In research on intelligent unmanned vehicle systems, computer vision generally relies on image processing algorithms. In those works, image features are extracted, along with a model of the ambient environment, for vehicle localization and obstacle avoidance. Range and azimuth information is then refined from the above model using the vision system. It is unrealistic to assume that the road is absolutely flat in this process. This paper concentrates on the dynamic measurement rectification problem in which the camera pose changes abruptly. This approach is particularly suitable for applications such as navigating autonomous vehicles running on rough terrain. Pose variations are first measured by a three-axis angle sensor and then applied to calculate the distance offsets using the proposed range-pixel model. Although monocular visual odometry has an advantage in wide FOV (field of view), factors such as lighting conditions, shadows, and random noise unavoidably decrease the measurement precision; these effects stem from both human limitations and sensor characteristics. In contrast, noncontact sensors such as sonar are typically not susceptible to those external conditions that would affect the result accuracy. Nevertheless, one main defect is the existence of inherent blind areas. In view of these advantages and corresponding limitations, a sensor assimilation technique based on the OI (Optimal Interpolation) method is employed. The main contributions of this paper are summarized as follows.
(1) The relation between range increment and the camera's pose variation has been formulated, based on which a feasible data rectification algorithm has been designed to modify the metrical results. To the best of our knowledge, this is the first work to solve the range estimation problem under camera shaking conditions.
(2) An improved estimation mechanism for range information in the OI model has been developed, which enhances the adaptability and accuracy of a multisensor measuring system.
(3) Experiments on mobile robots have been conducted and analytical results demonstrated.

The rest of this paper is organized as follows. The following section provides some background and a more detailed literature review. Section 3 formulates the problem. Section 4 details the proposed approach for measurement rectification and sensor fusion. Finally, experimental results and conclusions are given in Sections 5 and 6.

2. Related Work

Visual distance estimation comprises a specialized set of approaches that focus on real-time and accurate image capture followed by the acquisition of range information. Several of these mechanisms have been developed as foundational elements of 3D reconstruction, simultaneous localization, and map building.

Some basic algorithms as well as their improvements for range estimation have been developed: the epipolar constraint model [2], the defocusing method [3–5], the coordinate mapping scheme [6, 7], and the camera movement approach [1, 8]. Katsuyuki et al. proposed a coupled estimation of the unknown vehicle width and following distance by sequential Bayesian estimation. The method can run in real time and produce a highly accurate estimate of the following distance under the precondition that no camera shaking occurs.

These methods can be divided into two categories: monocular and stereo systems. Monocular approaches involve a single, unsophisticated camera and compute the pixel size or coordinates that are used for range estimation. Examples of these are studied in [9]. Stereo vision approaches can provide much higher accuracy than monocular ones, but they have a smaller field of view and higher operational complexity. Several intelligent and operable algorithms [10, 11] fall into this category.

Monocular and stereo vision approaches have advantages in different respects. Monocular approaches are usually easy to implement and have a wide view scope; meanwhile, they cost much less than stereo systems. Stereo vision methods, in contrast, perform well in accuracy thanks to subpixel synthetical localization technology, while their biggest drawback lies in the complicated operations and high computational complexity, especially during the calibration process.

Among the existing research, most work assumes that the camera pose is fixed [3, 6, 9, 10, 12–14]. Some notable exceptions, which are similar in spirit to the present work, are as follows. Guo et al. [15] put forward a parallel-constraint method based on the two lane boundaries. In [13, 16, 17], vehicles are equipped with an angle sensor to accurately acquire the pitch angle of the camera, and the authors propose an improved angle-calculation algorithm using a function of the angles representing the two parallel lane lines.

Some other approaches have also been proposed. Typical paradigms are as follows. Han et al. [18] devise a feature-point-based method for monocular measurement, but the feature-point extraction hinders real-time implementation. Malis and Rives [19] design a hybrid algorithm that minimizes the relative displacements of tokens between two frames and then estimates the image-space distance.

3. Problem Formulation

In Figure 1, suppose $p_1$ is a point in the image plane of the camera at pose $X_1$, and suppose that we have an estimate of the pose of the camera from a three-axis gyroscope. From this information, a standard ground-constrained model [18] can be used to estimate the position $P$ of the corresponding point in the world coordinate system. If the camera's pose suddenly changes to $X_2$, we can use this information to project point $P$ into the camera's image plane, which yields a second point $p_2$. Now, assuming that the pose measurement is reasonably accurate and that the position estimation algorithm works well, the problem is to estimate $P$ utilizing measurements including $p_1$, $p_2$, and the pose variation of the camera.

The initial and final poses of the camera are denoted by $X_1 = (\phi_1, \theta_1, \psi_1)$ and $X_2 = (\phi_2, \theta_2, \psi_2)$, respectively, where $\phi_i$, $\theta_i$, and $\psi_i$ ($i = 1, 2$) stand for the roll, pitch, and yaw angles. Although the actual relative distance from the optical center to the target changes only slightly, the measured results deviate from the truth significantly. This is mainly because of the nonlinear mapping between pixel coordinates and the corresponding distance values. The problem is to correct the actual measurements to be close to the truth by eliminating the effect of the pose perturbation of the camera.
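To make the effect of a pose perturbation concrete, the following minimal Python sketch evaluates a generic flat-ground range model (not necessarily the exact ground-constrained model of [18]); the camera height, focal length, pixel offset, and pitch angles are illustrative values only. Even a 2° pitch shake changes the recovered range noticeably.

```python
import math

def ground_range(v_offset_px, f_px, cam_height_m, pitch_rad):
    """Flat-ground range model (illustrative): distance to a ground point whose
    image lies v_offset_px pixels below the principal point, for a camera at
    height cam_height_m pitched down by pitch_rad."""
    ray_angle = pitch_rad + math.atan2(v_offset_px, f_px)  # angle below horizontal
    return cam_height_m / math.tan(ray_angle)

f_px = 800.0      # assumed focal length in pixels
h = 0.3           # assumed camera height above the ground plane (m)
v_offset = 60.0   # assumed pixel row offset of the target below the principal point

d_initial = ground_range(v_offset, f_px, h, math.radians(10.0))
d_shaken = ground_range(v_offset, f_px, h, math.radians(12.0))  # 2 deg pitch shake
print(f"range before shake: {d_initial:.3f} m, after shake: {d_shaken:.3f} m")
```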

4. Data Rectification Algorithm

In this section we describe our approach to the problem of monocular vision-based measurement rectification. Since a robot's trajectory is most conveniently described in a world coordinate system, while a target on the ground is generally described in the camera coordinate system, we start with a brief review of these two coordinate systems. To model the problem in a general geometrodynamical architecture, the algebraic relation between the camera pose displacement and the displacement of the measured distance is derived.

4.1. World and Camera Coordinate Systems

Assume that $O_w X_w Y_w Z_w$ and $O_c X_c Y_c Z_c$ are, respectively, the world coordinate system and the camera coordinate system, as shown in Figure 2. The coordinates of a point under these two coordinate systems are transformed by
\[
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T, \tag{1}
\]
where $(X_w, Y_w, Z_w)$ and $(X_c, Y_c, Z_c)$ are the point coordinates in the world and the robot camera coordinate system. Moreover, $R$ and $T$ are, respectively, the rotation and translation relating the two coordinate systems, which determine the position and orientation of the camera in the world coordinate system. Furthermore, for a 3D point in the FOV of the camera, its image coordinates are given by the projection equation as follows:
\[
u = f\,\frac{X_c}{Z_c}, \qquad v = f\,\frac{Y_c}{Z_c}, \tag{2}
\]
where $(u, v)$ are the coordinates of the point in the image coordinate system and $f$ is the camera's focal length.
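A minimal sketch of (1) and (2), with illustrative values for $R$, $T$, and $f$; a real system would of course use the calibrated quantities:

```python
import numpy as np

def world_to_camera(P_w, R, T):
    """World-to-camera transform of (1): P_c = R @ P_w + T."""
    return R @ P_w + T

def project(P_c, f):
    """Pinhole projection of (2): u = f*Xc/Zc, v = f*Yc/Zc."""
    X, Y, Z = P_c
    return np.array([f * X / Z, f * Y / Z])

# Illustrative camera pose and focal length (assumed values).
R = np.eye(3)                     # assumed rotation between the two frames
T = np.array([0.0, 0.0, 0.3])     # assumed translation (m)
f = 0.006                         # assumed focal length (m)

P_w = np.array([0.2, -0.1, 2.8])  # a target point in world coordinates (m)
print("image-plane coordinates:", project(world_to_camera(P_w, R, T), f))
```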

4.2. Chebyshev Best Uniform Approximation Rectification Algorithm

The distance-orientation information between targets and the camera can be derived from the corresponding pixel coordinates in the image [20, 21]. It is found that the ratio of image pixel motion to the camera rotation angles varies nonlinearly along the main optical axis. The main idea of the designed algorithm is to piecewise linearize this nonlinear rate and then calculate its rate of change with respect to the rotation angles as well as the measured distance. Equation (3) presents the rotation matrix in 3D space:
\[
R = R_z(\psi)\,R_y(\theta)\,R_x(\phi) =
\begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{bmatrix}, \tag{3}
\]
where $\phi$, $\theta$, and $\psi$ are the roll, pitch, and yaw angles.

Variations of pixel coordinates are associated with the world coordinates by this rotation matrix, whose parameters are the attitude angles of the camera, which is described by
\[
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f/dx & 0 & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R & T \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \tag{4}
\]
where the inner parameters $dx$ and $dy$ are determined only by the CCD structure itself.
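The following sketch ties (3) and (4) together: it builds the rotation matrix from roll, pitch, and yaw, projects a fixed world point through an assumed intrinsic matrix, and shows how a small yaw change shifts the pixel coordinates. All numeric values are illustrative, and the axis convention in rotation_rpy is one common choice rather than necessarily the paper's exact convention.

```python
import numpy as np

def rotation_rpy(roll, pitch, yaw):
    """Rotation matrix as in (3): R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project_pixel(P_w, K, R, T):
    """Pixel coordinates of a world point following (4)."""
    p = K @ (R @ P_w + T)
    return p[:2] / p[2]

f, dx, dy, u0, v0 = 0.006, 1e-5, 1e-5, 320.0, 240.0        # assumed intrinsics
K = np.array([[f / dx, 0.0, u0], [0.0, f / dy, v0], [0.0, 0.0, 1.0]])
T = np.array([0.0, 0.0, 0.3])                               # assumed translation (m)
P_w = np.array([0.4, 0.1, 3.0])                             # a fixed world point (m)

for yaw_deg in (0.0, 1.5):                                  # before / after a yaw shake
    R = rotation_rpy(0.0, np.radians(5.0), np.radians(yaw_deg))
    print(f"yaw = {yaw_deg:4.1f} deg -> pixel {project_pixel(P_w, K, R, T)}")
```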

For convenience of discussion, we assume that the camera pose changes mainly along the yaw angle direction.

Denote ; using (3) and (4), we obtain where

Substituting (3) in (5) results in

From Figure 3, we can see that the slope of the curves tends to be constant within a sliding interval of the independent variable. This interval becomes smaller when the ratio of to increases. The Chebyshev approximation method has the property of uniform approximation on a selected closed interval. Inspired by this, the nonlinear rate can be approximated by a linear polynomial, and the deviation caused by pose changes of the camera can be effectively compensated. The second derivative is taken as

Considering the yaw angle variations of a PTZ (Pan/Tilt/Zoom) camera caused by uneven pavement during practical robot motion, a closed subinterval () is chosen for further deduction. Since (8) is a continuous function and keeps a consistent sign, the best uniform approximation method can be used.

Denote ; using this, the normal equations of the approximation are acquired:

Set the solution of (9) as . Then the approximation equation is written as follows:

We study the slope of the fitted line after linear approximation as a function of different ratios of to . The results show that the slope converges to its limit uniformly. Moreover, this constant value is independent of the ratio above:

Substituting (7) in (11) results in

To demonstrate the convergence of , we have also analyzed the limit value, given by
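Since the paper's exact rate function is not reproduced here, the sketch below illustrates the underlying construction with a stand-in convex rate g: the classical best uniform (minimax) linear approximation on a closed interval, whose slope is the secant slope and whose intercept is fixed by equioscillation. The interval endpoints and the function g are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def best_uniform_linear(g, dg, a, b):
    """Classical minimax (Chebyshev) linear approximation of a convex g on [a, b]:
    the slope is the secant slope, and the intercept is chosen so the error
    equioscillates at a, b, and the interior tangency point c."""
    m = (g(b) - g(a)) / (b - a)                  # secant slope
    c = brentq(lambda x: dg(x) - m, a, b)        # interior point where g'(c) = m
    q = 0.5 * (g(a) + g(c)) - 0.5 * m * (a + c)  # equioscillation intercept
    return m, q

# Illustrative convex "rate" standing in for the nonlinear pixel/angle rate.
g = lambda x: 1.0 / np.tan(x)
dg = lambda x: -1.0 / np.sin(x) ** 2
a, b = np.radians(5.0), np.radians(20.0)
m, q = best_uniform_linear(g, dg, a, b)
xs = np.linspace(a, b, 200)
print("slope:", m, " max approximation error:", np.max(np.abs(g(xs) - (m * xs + q))))
```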

The solid curves in Figure 4 are the results of the actual slope and the linear approximation, respectively. The two curves coincide well after a translation operation, which indicates a high slope accuracy for the linear approximation. Figure 5 shows the convergence of the slope with respect to the metric , in good agreement with the experimental results. Another important property is that the function value rapidly reaches convergence after a dramatic increase; that is, the measured range varies with the metric with high nonlinearity. This also implies that the measurement should be conducted on the smooth interval to reduce the errors caused by camera shaking. On the other hand, it is impossible to compensate for the deviations when the metric is too small.

4.3. Sonar and Camera Data Assimilation Model

The Optimal Interpolation Algorithm is derived to generate least-squares estimates from vectors of observations and background fields, assuming a priori known statistical models for the background error covariance. The Optimal Interpolation technique, based on minimum-variance estimation, plays an important role in data assimilation. It combines several different real-world observations to produce a corrected output that is closer to the truth.

The motivation of the proposed method comes from the similar characteristics shared by a camera-sonar measurement system and an OI algorithm. First, a camera and sonar system can be considered an OI system that produces an optimal output from several groups of observations. Second, the OI algorithm is dimension-extensible and imposes only loose a priori requirements, characteristics that are attractive for a camera and sonar measurement system.

The following are given:
(i) A background field $x_b$ available in two or three dimensions.
(ii) A set of observations $y_o$ available at irregular positions.
The optimal estimation is described by
\[
x_a = x_b + W\,(y_o - x_b).
\]
The errors are given by
\[
\varepsilon_b = x_b - x_t, \qquad \varepsilon_o = y_o - x_t,
\]
where $x_t$ denotes the true value. The optimal weight is then as follows:
\[
W = \frac{\overline{\varepsilon_b^2} - \overline{\varepsilon_b \varepsilon_o}}{\overline{\varepsilon_b^2} + \overline{\varepsilon_o^2} - 2\,\overline{\varepsilon_b \varepsilon_o}},
\]
where $\overline{\varepsilon_b^2}$ and $\overline{\varepsilon_o^2}$ represent the mean values of $\varepsilon_b^2$ and $\varepsilon_o^2$. As the data from the camera and sonar are unrelated, it is assumed that $\overline{\varepsilon_b \varepsilon_o} = 0$, so that $W = \overline{\varepsilon_b^2} / (\overline{\varepsilon_b^2} + \overline{\varepsilon_o^2})$.
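A minimal sketch of this scalar assimilation step, assuming unbiased and uncorrelated sonar (background) and camera (observation) errors; the error variances and range values are illustrative:

```python
import numpy as np

def oi_fuse(x_b, y_o, var_b, var_o):
    """Scalar optimal-interpolation analysis with uncorrelated errors:
        W   = var_b / (var_b + var_o)
        x_a = x_b + W * (y_o - x_b)"""
    W = var_b / (var_b + var_o)
    return x_b + W * (y_o - x_b), W

# Illustrative error variances (m^2); real values would come from calibration runs.
var_sonar, var_camera = 0.02 ** 2, 0.05 ** 2

sonar_range = np.array([1.48, 2.01, 2.55])    # background field (m)
camera_range = np.array([1.52, 1.97, 2.60])   # observation field (m)
fused, W = oi_fuse(sonar_range, camera_range, var_sonar, var_camera)
print(f"weight W = {W:.3f}, fused ranges = {fused}")
```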

5. Evaluations and Analysis

In this section, we present the results of a set of physical experiments to demonstrate the performance of the algorithm proposed in Section 4. To validate the effectiveness of the proposed data rectification algorithm, we compare the results before and after a pose change with the ground truth. Moreover, we have conducted a set of experiments under different initial camera poses to verify the robustness of the method. In addition, comparative experiments have been designed to show the validity of the data assimilation approach.

Autonomous vehicles can be modeled as mobile robots; therefore, we use the mobile robot Pioneer 3-DX (Figure 6), mounted with a camera, for the experiments. To prepare for the experiments, the PTZ camera is first calibrated.

5.1. Camera Calibration

The grid size of the calibration board in the experiments is . The picture resolution of the VCC50I is fixed at . To ensure error balance and calibration accuracy, a group of calibration images containing four images from various poses is collected at a distance interval of 10 cm. The calibration distance ranges from 1500 mm to 4000 mm. Considering the effects of the pitch and rotation angles as well as the zoom value, the camera state during calibration is fixed as given in Table 1. The internal parameters, which are crucial to distance measurement, are listed in Table 2.

5.2. Performance Evaluation of the Data Rectification Algorithm

Angle variations are acquired by a three-axis angle sensor, and they act as the input of the rectification algorithm. The module (MPU6050) has the advantages of low temperature dependency, high resolution, and low noise. Due to these advantages, this module is chosen to measure the Euler angles.

To validate the robustness of the algorithm, a target is set at different positions randomly. For each metric , the results under a set of camera poses are analyzed (the yaw, pitch, and roll angles are random for each metric, as set in Table 3). The initial readings of the angle sensor are as follows: pitch 9.65°, yaw −0.27°, and roll −0.94°. The target position tuples are set as .

Figure 7(a) shows the measurements from the initial camera state. Compared with the ground truth, they show bias along both the horizontal and vertical directions.

The deviations of the VO results caused by the camera motions are rectified independently. Based on the analysis of the range model in Section 3, the pitch angle is an independent variable of the distance function. Therefore, we recalculate the pitch angle instead of performing inversion operations on pixel coordinates. Figures 7(b)–7(f) show the results before and after rectification. In Figure 7(e), the distance error along the optical axis is almost as high as 50% using direct measurement; however, this value decreases to only 6% using the proposed algorithm. We can also see that the least improvement in accuracy in this direction is 10%, as shown in Figure 7(c). Accordingly, an even more remarkable effect can be seen in the results along the direction perpendicular to the optical axis. In the worst case, shown in Figure 7(f), the measured distance along the -axis is rectified from −780 mm to 98 mm, and the percentage gain in measuring precision approaches 878%. Even in a general situation, this percentage can be close to 35%, as demonstrated in Figure 7(c). These figures also show that the range deviation becomes larger as the distance along the optical axis direction increases. This is mainly because the ratio of physical distance to a pixel unit increases along the optical axis.

5.3. Data Assimilation Evaluation

For generality, the assimilation results under different metric sizes (i.e., manipulating ) with a fixed camera pose are demonstrated. Data from the sonar sensors are set as the background field, and those from the camera are set as the observation field. In Figure 8(a), measurement results at some positions are missing, which indicates that blind zones exist when only the sonar system is adopted. The range data in Figure 8(b) are the results of sensor assimilation. They demonstrate the accuracy improvement along both the -axis and the -axis compared with measurements solely from the vision system or the sonar sensors. The assimilated results proved to be as much as 25 percent more accurate along the -axis and 9 percent more accurate along the -axis compared to those acquired using a single type of sensor. This is mainly because new information is brought in to compensate the output of a single measuring system, that is, the wide FOV of the camera and the high measurement accuracy of the sonar sensors.

6. Conclusions

In this paper, we have proposed an analytical measuring rectification algorithm for monocular range estimation under camera shaking conditions. Specifically, we have established a pose-range model, and the algebraic relation between the distance increment and the camera's pose variation has been formulated. We have also designed a data assimilation system to provide reliable range information using different types of transducer systems. Physical experiments have been conducted to validate the effectiveness and robustness of the proposed algorithm. For future work, we will try to implement our algorithms on multiple-robot formations as well as in swarm coordination applications.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This research is financially supported by the Natural Science Foundation of China (Grants nos. 61571334 and 2014AA09A512).