Abstract

In the pose estimation problem for space robots, photogrammetry has been used to determine the relative pose between an object and a camera. The calculation of the projection from two-dimensional measured data to three-dimensional models is of utmost importance in this vision-based estimation; however, this process is usually time consuming, especially in the outer space environment, where hardware performance is limited. This paper proposes a computationally efficient iterative algorithm for vision-based pose estimation. In this method, an error function is designed to estimate the object-space collinearity error, and the error is minimized iteratively with respect to the rotation matrix based on the absolute orientation information. Experimental results show that this approach achieves accuracy comparable to that of the SVD-based methods, while the computation time is greatly reduced owing to the use of the proposed absolute orientation method.

1. Introduction

Vision-based methods have been applied to estimate the pose of space robots since the 1990s. In these methods, the relative position and orientation between a camera and a robot target are determined from a set of feature points expressed in three-dimensional (3D) object coordinates and their two-dimensional (2D) projections in camera coordinates. The error in position and orientation is usually minimized using noniterative or iterative algorithms. The noniterative algorithms give an analytical solution to the optimization [1–3]; a typical example is the method that represents the feature points as a linear combination of four virtual control points based on their coordinates [4]. The noniterative methods are generally less time consuming than the iterative methods and offer acceptable accuracy; however, they are sensitive to observation disturbances such as image noise, varying lighting conditions, and even occlusion by outliers. The iterative approaches, in contrast, achieve better accuracy than the noniterative methods by solving for the rotation matrix iteratively with a nonlinear least-squares method.

A typical iterative method is the Levenberg-Marquardt (L-M) algorithm [5–7], which has been widely used and accepted as a standard algorithm for least-squares problems in photogrammetry. The L-M method is essentially a combination of the steepest descent method and the Gauss-Newton method applied at different optimization stages. The steepest descent method is used at the early stage of optimization, when the current error is still far from the minimum, while the Gauss-Newton method is used at the later stage, when the solution is relatively close to the target. The combination of the two methods is in effect a coarse global search followed by a fine search within a local area. The steepest descent method in the early stage provides a guaranteed descent direction and confines the solution to a small area, while the Gauss-Newton method then finds the optimized solution quickly. The L-M algorithm thus offers a way to find an optimized solution iteratively; however, since it is a general-purpose optimization method, it can be improved significantly to suit the specific requirements of pose estimation, namely, faster convergence and better noise tolerance.
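The blending of steepest descent and Gauss-Newton described above can be illustrated with a minimal sketch. The one-parameter model y = exp(a x), the function name `levenberg_marquardt_1d`, and the factor-of-ten damping schedule are all assumptions chosen for illustration; this is not the photogrammetric formulation used in [5–7].

```python
import math

def levenberg_marquardt_1d(xs, ys, a0, damping=1e-3, max_iter=100):
    """Fit the scalar parameter a of the toy model y = exp(a * x)."""
    def cost(a):
        return sum((math.exp(a * x) - y) ** 2 for x, y in zip(xs, ys))

    a = a0
    for _ in range(max_iter):
        residuals = [math.exp(a * x) - y for x, y in zip(xs, ys)]
        jacobian = [x * math.exp(a * x) for x in xs]
        gradient = sum(j * r for j, r in zip(jacobian, residuals))
        curvature = sum(j * j for j in jacobian)
        # Large damping: short step along the negative gradient (steepest descent).
        # Small damping: the step approaches the Gauss-Newton step.
        step = -gradient / (curvature + damping)
        if abs(step) < 1e-12:
            break
        if cost(a + step) < cost(a):
            a += step
            damping /= 10.0  # step accepted: lean toward Gauss-Newton
        else:
            damping *= 10.0  # step rejected: lean toward steepest descent
    return a

# Noise-free data generated with a = 0.7 (hypothetical test values).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]
a_fit = levenberg_marquardt_1d(xs, ys, a0=0.0)
```

In this toy run the first undamped steps overshoot and are rejected, so the damping grows until a cautious gradient-like step is accepted; close to the solution the damping shrinks and the iteration behaves like Gauss-Newton.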

An iterative optimization method designed specifically for pose estimation has been proposed based on the target pose and the depths of the feature points [8]. This method calculates the depth information and the absolute orientation of the target in turn, and the perspective nonlinearity is reduced by introducing the depth variable. However, this method requires hundreds of iterations before it converges [8].

Another iterative algorithm, the orthogonal iteration (OI) algorithm, was proposed by Lu et al. [9] to minimize the object-space collinearity error with a different objective function. Instead of using the depth of the object, this algorithm uses scene points to improve the calculation of the translation vector, and it achieves higher accuracy and better computational efficiency. However, corruption in the input data can cause a considerable error in the rotation matrix, which degrades the accuracy of the OI method [10]. The OI algorithm was further developed by Zhang et al., who introduced a depth update into the computation of the translation vector in a two-stage iterative process [11, 12]. This method achieves higher accuracy than the OI algorithm by refining the error of the absolute orientation [10].

The essential step of the above optimization algorithms designed specifically for pose estimation is solving the absolute orientation problem, which can be done with quaternion-based methods [13–15] or with Singular Value Decomposition (SVD) methods [15, 16]. The SVD method performs well and has been used extensively because of its closed-form solution and enhanced orthogonality; however, its computational load makes it difficult to implement in real-time systems.

In order to overcome this problem, this paper introduces the FOAM method [17] to calculate the absolute orientation for pose estimation. The experimental results show that its accuracy and noise resistance are comparable to those of the SVD method, while its computational efficiency is considerably better. The remainder of this paper is organized as follows. Section 2 formulates the problem and presents the proposed solution, Section 3 reports the experimental results, and Section 4 draws the conclusions.

2. Theory

In pose estimation of space robot, we usually have a target coordinate frame and a camera coordinate frame , and they are defined as illustrated in Figure 1, respectively.

It can be seen that the center of the projection from the object is at the origin and the optical axis points to the positive axis. Suppose that a lens with the focal length of is located at the origin; the plane is then considered as the image plane of the camera on which the feature points are projected. If the coordinates of feature points, () on the target are denoted as in the target coordinate frame, and its corresponding projection on the camera axis is expressed as , then the rotation matrix and the translation vector from the target coordinate frame to the camera coordinate frame can be written in the relationship such as where denotes the rotation matrix and is the translation vector.

If the image point in camera axis frame represents the projection from the feature point , and its coordinate is written as , according to the idealized pinhole camera model, the relationship between the coordinates of and can be expressed as and in (2) is regarded as a collinearity equation in image space, and the orthogonal projection of to can be written as where denotes the projection matrix to the vector .
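For concreteness, the camera model described above can be written in the standard form used by the OI algorithm [9]. The symbols below are assumptions chosen for illustration and may differ from the original notation: p_i is a feature point in the target frame, q_i the same point in the camera frame, (x_i, y_i) its image coordinates, f the focal length, and r_k^T the k-th row of R.

```latex
\begin{align*}
\mathbf{q}_i &= R\,\mathbf{p}_i + \mathbf{t}, \\
x_i &= f\,\frac{\mathbf{r}_1^{T}\mathbf{p}_i + t_x}{\mathbf{r}_3^{T}\mathbf{p}_i + t_z},
\qquad
y_i = f\,\frac{\mathbf{r}_2^{T}\mathbf{p}_i + t_y}{\mathbf{r}_3^{T}\mathbf{p}_i + t_z}, \\
V_i &= \frac{\mathbf{v}_i\mathbf{v}_i^{T}}{\mathbf{v}_i^{T}\mathbf{v}_i},
\qquad \mathbf{v}_i = (x_i,\; y_i,\; f)^{T}, \\
R\,\mathbf{p}_i + \mathbf{t} &= V_i\,(R\,\mathbf{p}_i + \mathbf{t}).
\end{align*}
```

The last line states that a camera-frame point lies on the line of sight through its own image point, so projecting it onto that line leaves it unchanged.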

The object-space collinearity error for each feature point is then formulated for optimization, such as where represents the observed projection matrix to vector , and denotes the observation of . Considering each of the feature points, the following objective function is defined: The objective function in (7) can be minimized by finding a suitable and .
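The object-space collinearity error can be sketched in a few lines, assuming the OI-style form e_i = (I − V_i)(R p_i + t) with V_i = v_i v_i^T / (v_i^T v_i) from [9]. The function names and the test data are hypothetical, and the image vectors use a focal length of 1.

```python
def line_of_sight_projection(v):
    """Return V = v v^T / (v^T v), which projects onto the ray through v."""
    norm_sq = sum(c * c for c in v)
    return [[v[i] * v[j] / norm_sq for j in range(3)] for i in range(3)]

def mat_vec(m, x):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * x[k] for k in range(3)) for i in range(3)]

def collinearity_error(R, t, points, image_vectors):
    """Sum over all points of ||(I - V_i)(R p_i + t)||^2."""
    total = 0.0
    for p, v in zip(points, image_vectors):
        q = [a + b for a, b in zip(mat_vec(R, p), t)]  # point in the camera frame
        Vq = mat_vec(line_of_sight_projection(v), q)   # its projection onto the ray
        total += sum((qi - vi) ** 2 for qi, vi in zip(q, Vq))
    return total

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[0.1, 0.2, 0.0], [-0.3, 0.1, 0.2], [0.2, -0.2, 0.1]]
t_true = [0.0, 0.0, 5.0]
# Noise-free image vectors (x/z, y/z, 1) for the pose (identity, t_true).
image_vectors = [[(p[0] + t_true[0]) / (p[2] + t_true[2]),
                  (p[1] + t_true[1]) / (p[2] + t_true[2]), 1.0] for p in points]
error_true = collinearity_error(identity, t_true, points, image_vectors)
error_wrong = collinearity_error(identity, [0.5, 0.0, 5.0], points, image_vectors)
```

With noise-free data the error vanishes at the true pose (up to rounding) and becomes clearly positive when the translation is perturbed, which is the property the minimization relies on.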

Since the objective function in (7) is the second norm of both the translation vector and the rotation matrix , the error of the absolute orientation can be found by taking the derivative of (7) with respect to and making it equal to zero, such as The left side of (7) can be rewritten as, Since is the projection of on , and in this case cannot project on itself, we have , or ; therefore, the term in (8) is nonsingular, and the optimal position of can be written as a function of the rotation matrix such as If , then (7) can be rewritten as a function of , such that It can be seen from (11) that the object-space collinearity error function can be minimized with respect to the rotation matrix only.
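The closed-form translation can be sketched as follows, assuming the OI-style result from [9]: setting the derivative of the error with respect to t to zero gives (n I − Σ V_j) t = Σ (V_j − I) R p_j. The helper names and the test data are hypothetical, and Cramer's rule stands in for whatever 3×3 solver an implementation would use.

```python
def mat_vec(m, x):
    return [sum(m[i][k] * x[k] for k in range(3)) for i in range(3)]

def line_of_sight_projection(v):
    norm_sq = sum(c * c for c in v)
    return [[v[i] * v[j] / norm_sq for j in range(3)] for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution

def optimal_translation(R, points, image_vectors):
    """Translation minimizing the object-space error for a fixed rotation R."""
    n = len(points)
    lhs = [[float(n) if i == j else 0.0 for j in range(3)] for i in range(3)]
    rhs = [0.0, 0.0, 0.0]
    for p, v in zip(points, image_vectors):
        V = line_of_sight_projection(v)
        Rp = mat_vec(R, p)
        VRp = mat_vec(V, Rp)
        for i in range(3):
            rhs[i] += VRp[i] - Rp[i]      # accumulates sum (V_j - I) R p_j
            for j in range(3):
                lhs[i][j] -= V[i][j]      # accumulates n I - sum V_j
    return solve3(lhs, rhs)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[0.1, 0.2, 0.0], [-0.3, 0.1, 0.2], [0.2, -0.2, 0.1], [0.0, 0.3, -0.2]]
t_true = [0.3, -0.2, 5.0]
image_vectors = [[(p[0] + t_true[0]) / (p[2] + t_true[2]),
                  (p[1] + t_true[1]) / (p[2] + t_true[2]), 1.0] for p in points]
t_est = optimal_translation(identity, points, image_vectors)
```

The system matrix is invertible as long as the lines of sight are not all identical, which matches the nonsingularity argument in the text.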

In the same nonlinear least-square format as shown in (11), another objective function can be formulated in order to find the optimal orientation matrix and translation vector , such that If the mean values of the feature points in object coordinate and camera coordinate are calculated as and respectively, (12) can be rewritten in the form of, where and . Since the second term of (13) can be set to zero by assigning , the objective function can be optimized by minimizing the first term, such that where . It is obvious that in (14) can be minimized with a maximum value of , and it can be found by using the SVD method. In traditional SVD method, can be decomposed as , where and are orthogonal to each other, and ( are the singular values of ), such that where , , , and . Therefore, we have It can be seen that the maximum of in (16) is with , where is an identity matrix. Therefore, the optimal can be obtained by using . However, the computation of the matrices and is too time consuming and this method cannot be implemented for real-time applications.
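The SVD argument above can be summarized as a short derivation. The notation is an assumption for illustration: p_i and q_i are corresponding points in the object and camera frames, primes denote centred coordinates, and M is taken as the sum of q'_i p'_i^T (a different convention for M transposes the result).

```latex
\begin{align*}
\min_{R,\mathbf{t}} \; & \sum_{i=1}^{n} \lVert R\,\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i \rVert^{2},
\qquad
\bar{\mathbf{p}} = \tfrac{1}{n}\sum_i \mathbf{p}_i,\quad
\bar{\mathbf{q}} = \tfrac{1}{n}\sum_i \mathbf{q}_i, \\
\mathbf{p}'_i &= \mathbf{p}_i - \bar{\mathbf{p}},\qquad
\mathbf{q}'_i = \mathbf{q}_i - \bar{\mathbf{q}},\qquad
\mathbf{t} = \bar{\mathbf{q}} - R\,\bar{\mathbf{p}}, \\
\sum_i \lVert R\,\mathbf{p}'_i - \mathbf{q}'_i \rVert^{2}
&= \sum_i \left( \lVert \mathbf{p}'_i \rVert^{2} + \lVert \mathbf{q}'_i \rVert^{2} \right)
 - 2\,\operatorname{tr}\!\left(R^{T} M\right),
\qquad M = \sum_i \mathbf{q}'_i\,\mathbf{p}'^{T}_i, \\
M &= U \Sigma V^{T}
\;\Rightarrow\;
\operatorname{tr}\!\left(R^{T} M\right)
 = \operatorname{tr}\!\left(V^{T} R^{T} U\,\Sigma\right)
 \le \sigma_1 + \sigma_2 + \sigma_3, \\
R &= U V^{T} \quad \text{attains the bound, provided } \det\!\left(U V^{T}\right) = +1 .
\end{align*}
```

Only the trace term depends on R, so minimizing the residual is equivalent to maximizing tr(R^T M); the bound holds because V^T R^T U is orthogonal and its diagonal entries cannot exceed 1.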

It is noted that the calculation of the singular values of can be replaced by the combination of , and adj(), and therefore can be calculated without the need for the SVD operation, such that If we let then the following equation can be formed in order to obtain , such that It is noted that (18) is one of the four solutions and the largest root of (19). Therefore, can be derived from (19) such that where . With the optimal being found, the minimum in (13) can be calculated as,
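An SVD-free rotation update of this kind can be sketched using the commonly published form of FOAM (Fast Optimal Attitude Matrix) [17]. Several things here are assumptions: M is taken as the sum of q'_i p'_i^T over centred point pairs, the starting value sqrt(3)·||M||_F is simply an upper bound on the sum of singular values, and the fixed number of Newton steps is arbitrary.

```python
import math

def cofactor_matrix(m):
    """Cofactor matrix of a 3x3 matrix, which equals adj(m^T)."""
    return [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
             - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
             for j in range(3)] for i in range(3)]

def frobenius_sq(m):
    return sum(x * x for row in m for x in row)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def foam_rotation(M, newton_steps=20):
    """Rotation maximizing tr(R^T M), computed without an SVD."""
    cof = cofactor_matrix(M)
    det = sum(M[0][j] * cof[0][j] for j in range(3))
    norm_sq = frobenius_sq(M)
    adj_sq = frobenius_sq(cof)
    # Newton iteration for the largest root of the quartic
    # (lam^2 - ||M||^2)^2 - 8 lam det(M) - 4 ||adj(M)||^2 = 0.
    lam = math.sqrt(3.0 * norm_sq)
    for _ in range(newton_steps):
        psi = (lam * lam - norm_sq) ** 2 - 8.0 * lam * det - 4.0 * adj_sq
        dpsi = 4.0 * lam * (lam * lam - norm_sq) - 8.0 * det
        lam -= psi / dpsi
    kappa = 0.5 * (lam * lam - norm_sq)
    zeta = kappa * lam - det
    MMtM = matmul(matmul(M, transpose(M)), M)
    return [[((kappa + norm_sq) * M[i][j] + lam * cof[i][j] - MMtM[i][j]) / zeta
             for j in range(3)] for i in range(3)]

# Usage: recover a known rotation from noise-free, zero-mean point pairs.
c, s = math.cos(0.5), math.sin(0.5)
R_true = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
P = [[1.0, 0.0, 0.0], [-1.0, 0.5, 0.0], [0.0, -0.5, 1.0], [0.0, 0.0, -1.0]]
Q = [[sum(R_true[i][k] * p[k] for k in range(3)) for i in range(3)] for p in P]
M = [[sum(q[i] * p[j] for p, q in zip(P, Q)) for j in range(3)] for i in range(3)]
R_est = foam_rotation(M)
```

Only the Frobenius norm, determinant, and adjugate of a 3×3 matrix are needed, plus a scalar root search, which is where the saving over a full SVD comes from. The sketch does not handle degenerate inputs such as collinear points, where the quartic has a repeated largest root.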

The procedure of the computationally efficient pose estimation method can be summarized as follows.

First of all, calculate the optimal translation vector using the rotation matrix ; secondly, update the camera-frame coordinates with ; thirdly, calculate the optimal rotation matrix with and based on the proposed absolute orientation method. Repeat the process above until the absolute and relative object-space collinearity errors are less than the predefined thresholds. The initial value is computed with and based on the weak perspective approximation [9]. The flowchart of the iterative algorithm is shown in Figure 2.
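The steps above can be written compactly as one iteration of an OI-style loop. The notation is an assumption for illustration: k is the iteration index, the observed line-of-sight projection matrices are written with a hat, E is the object-space collinearity error, and ε_abs and ε_rel are hypothetical names for the two thresholds.

```latex
\begin{align*}
\mathbf{t}^{(k)} &= \mathbf{t}\!\left(R^{(k)}\right), \\
\mathbf{q}_i^{(k)} &= \hat{V}_i \left( R^{(k)}\mathbf{p}_i + \mathbf{t}^{(k)} \right), \\
R^{(k+1)} &= \arg\min_{R} \sum_{i=1}^{n}
  \left\lVert R\,\mathbf{p}'_i - \mathbf{q}'^{(k)}_i \right\rVert^{2}, \\
\text{stop when}\quad & E\!\left(R^{(k+1)}\right) < \varepsilon_{\mathrm{abs}}
\quad\text{and}\quad
\frac{\left| E\!\left(R^{(k+1)}\right) - E\!\left(R^{(k)}\right) \right|}{E\!\left(R^{(k)}\right)} < \varepsilon_{\mathrm{rel}} .
\end{align*}
```

The third line is the absolute orientation problem on centred coordinates, so it is the step where the SVD-free rotation update replaces the SVD.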

3. Experiment

In this section, the accuracy, noise resistance, and computational efficiency of the proposed method are tested and compared with those of Lu's OI algorithm, Zhang's two-stage algorithm, the classic linear transform method [18], and the L-M method.

The above methods are implemented and tested both in Matlab 7.0 and in VC++ 2008 with LAPACK 3.3.1 (Linear Algebra Package). The tests are run on a PC with an Intel Pentium D E5500 CPU (2.8 GHz clock frequency) and 2 GB of random access memory. The operating system is Microsoft Windows XP Professional with Service Pack 3.

The feature points for the testing are generated randomly within a space of in the object coordinate frame. The rotation matrix is chosen randomly through the Euler angles yaw, pitch, and roll, drawn from (−90, 90), (−90, 90], and [0, 360) degrees, respectively. The translation vector t is distributed randomly within a space of in the camera coordinate frame. The focal length on Figure 1 is set to. The coordinates of the image points are calculated based on , , and , and Gaussian white noise is added to to generate observed points . The standard deviation of the noise is a function of the SNR (Signal-to-Noise Ratio) defined by . In this way, 1000 sets of are generated for each set of , , and , and the final result is the average value over the 1000 sets of .
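The data generation described above can be sketched as follows. Several choices are assumptions rather than the settings of the experiment: the Z-Y-X (yaw-pitch-roll) composition order, the cube half-width and depth range, the focal length, and the noise standard deviation, which is passed in directly because the SNR-to-deviation mapping is not reproduced here.

```python
import math
import random

def rotation_from_euler(yaw, pitch, roll):
    """R = Rz(yaw) Ry(pitch) Rx(roll), angles in radians (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]

def synthetic_observation(rng, n_points, half_width, depth_range, focal_length, noise_std):
    """Draw a random pose and feature points, then project them with noise."""
    yaw = math.radians(rng.uniform(-90.0, 90.0))
    pitch = math.radians(rng.uniform(-90.0, 90.0))
    roll = math.radians(rng.uniform(0.0, 360.0))
    R = rotation_from_euler(yaw, pitch, roll)
    t = [rng.uniform(-half_width, half_width),
         rng.uniform(-half_width, half_width),
         rng.uniform(depth_range[0], depth_range[1])]
    points = [[rng.uniform(-half_width, half_width) for _ in range(3)]
              for _ in range(n_points)]
    observed = []
    for p in points:
        # Camera-frame point, then pinhole projection plus Gaussian image noise.
        q = [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
        x = focal_length * q[0] / q[2] + rng.gauss(0.0, noise_std)
        y = focal_length * q[1] / q[2] + rng.gauss(0.0, noise_std)
        observed.append((x, y))
    return R, t, points, observed

rng = random.Random(0)  # fixed seed so the run is repeatable
R, t, points, observed = synthetic_observation(
    rng, n_points=9, half_width=1.0, depth_range=(5.0, 10.0),
    focal_length=1.0, noise_std=1e-3)
```

The depth range is kept well beyond the cube diagonal so that every point stays in front of the camera; repeating the call many times and averaging the resulting pose errors mirrors the averaging described in the text.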

In the first test, we select feature points for the optimization, and the SNR is varied from 30 dB to 80 dB with the interval of 10 dB. The errors of Euler angles and translation vector are recorded on the camera, and the result of each of the steps is averaged from 500 sets of , , and .

In the second test, the SNR is set to 60 dB, and the number of feature points varies from 4 to 29 in steps of 5. The computation time is recorded as well as the errors of the Euler angles and the translation vector. The result of each step is the average over 500 sets of , and .

Figure 3 shows the rotation error against the number of feature points, and Figure 4 gives the translation error versus the number of feature points. Figure 5 shows the rotation error against the SNR, and Figure 6 shows the translation error versus the SNR. In these figures, the rotation errors are represented by the roll angle errors because the yaw and pitch errors behave in the same way. It can be seen from these figures that four of the approaches have comparable accuracy and noise resistance, whereas the linear method performs relatively poorly.

Figure 7 shows the computation time that three of the algorithms require to complete the calculation in both the Matlab and C++ environments. Since the accuracy and noise resistance of the linear method are not satisfactory, this method is not considered in the comparison of computation time. The L-M method is also excluded, because its computation time is so long that it falls far beyond the range of the time axis in Figure 7. It can be seen from Figure 7 that the programs generally run faster in C++ than in Matlab. The proposed Fast Pose Estimation algorithm does not show its advantage in Matlab because the SVD calculation is optimized in the built-in function, while the matrix operations used by the Fast Pose Estimation algorithm are not. However, when the tests are carried out with LAPACK in C++, the SVD calculation and the other matrix operations are optimized equally, and the results show that the proposed Fast Pose Estimation algorithm runs much faster than the other two approaches. Since our software for embedded systems is generally written in C++ in real-world applications, the performance of the algorithms in the C++ environment is of most concern.

4. Conclusions

In this paper, a computationally efficient pose estimation algorithm based on vision data is proposed. In this approach, a new absolute orientation method is designed to replace the time-consuming SVD calculation with operations such as the Frobenius norm, determinant, and adjoint of a matrix. Experimental results show that, in C++, the computation time required for pose estimation with the proposed method is much less than that of the original SVD approach, while the accuracy and noise resistance remain similar to those of the original method. Therefore, the proposed fast pose estimation method is more suitable for real-world applications such as the embedded systems used in satellites or other space missions.