Abstract

The traditional image stitching result based on SIFT feature point extraction has, to a certain extent, distortion errors. The panorama, especially, becomes more seriously distorted when a long image sequence is composited. To achieve the goal of creating a high-quality panorama, an improved algorithm is proposed in this paper, including altering the way the reference image is selected and putting forward a method that computes the transformation matrix for any image of the sequence to align it with the reference image in the same coordinate space. Additionally, the improved stitching method dynamically selects the next input image based on the number of SIFT matching points. Compared with the traditional stitching process, the improved method increases the number of matching feature points and reduces the SIFT feature detection area of the reference image. The experimental results show that the improved method not only accelerates the image stitching process but also reduces panoramic distortion errors, finally yielding a pleasing panoramic result.

1. Introduction

Image stitching is a process that combines a sequence of images with mutually overlapping areas, resulting in a seamless, smooth panoramic image [1]. A hand-held camera has limited resolution and a small field of view, whereas image stitching can produce a high-resolution, high-quality panorama from images captured with hand-held equipment. Image stitching has become a hotspot in the fields of computer vision, image processing, and computer graphics.

Image registration [2] is an important step in the image stitching process. The quality of image stitching greatly depends on the accuracy of image registration. According to the registration method used, image stitching algorithms are generally divided into two categories: region-related image registration [3] and feature-based image registration [4, 5]. Region-related image registration studies the relationship between same-dimension blocks of the input image and the reference image and computes their degree of similarity. But when the image is rotated or resized, this method does not produce the desired result, and when the textures are too strong or too weak, the result also shows large stitching errors. Feature-based image registration uses mathematical models to build abstract descriptions of the useful pixel information and compares these description features to find correspondences between the input image and the reference image. However, traditional feature detectors such as the Harris corner and the Susan operator do not have invariance properties, so a stable feature detection method is required for the image stitching process. In 2004, Lowe proposed the local scale-invariant image feature extraction algorithm (SIFT) [6], which performs well across images with different scales, different rotation directions, and perspective distortion.

This paper uses the feature-based image registration method and selects scale-invariant SIFT features to implement panorama image stitching. The aim of image stitching is to transform multiple source images with mutually overlapping areas into the same coordinate system through transformation matrixes. Therefore, it is important to select a reference coordinate system. The traditional stitching process [7-9] constructs panoramic images from ordered image sequences, stitching step by step from left to right. The first image of the sequence is selected as the reference image, and each subsequent stitching result is selected as the next reference image. The traditional stitching process accumulates the matching errors of each stitching step in the reference image, which becomes seriously distorted [10]; when the image sequence is long, this degrades the quality of the panoramic result. The improved method proposed in this paper first performs image registration for all adjacent images in the sequence and calculates the transformation matrix between each pair of adjacent images, and then takes the middle image of the sequence as the reference image. Using the transformation matrixes of all adjacent images, the improved method can transform any image in the sequence into the coordinate space of the reference image. Therefore, all images in the sequence are unified into the same coordinate system after all stitching steps are completed. The experimental results show that the improved method reduces the distortion errors of the panorama, saves stitching time, and enhances the quality of the stitching result.

The rest of this paper is organized as follows. Section 2 describes the SIFT algorithm for extracting image features and the RANSAC algorithm for purifying the matching feature points, which together yield the transformation matrix between matching images. Section 3 introduces the Levenberg-Marquardt algorithm, which is used to adjust the parameters of the transformation matrix [11]. Section 4 describes the traditional image stitching process and Section 5 proposes the improved image stitching process. Section 6 presents the experimental results and a quantitative analysis evaluating the improved method. Section 7 concludes the paper.

2. Image Registration

2.1. SIFT Feature Extraction

Lowe proposed the scale-invariant feature extraction algorithm (SIFT) in 2004 [6], and Brown and Lowe subsequently proposed an image stitching process based on the SIFT algorithm in 2007 [9]. The SIFT algorithm detects image features quickly and provides invariance when the image is rotated, resized, or differently illuminated. The SIFT detection process is shown in Figure 1.

In order to simulate multiscale features, the SIFT algorithm builds a scale space, defined as the convolution of a variable-scale Gaussian $G(x, y, \sigma)$ with an input image $I(x, y)$:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \quad (1)$$

where $\sigma$ represents the factor of the scale space. Large-scale factors correspond to overviews of the image features and small-scale factors correspond to details of the image features. The Difference of Gaussians (DOG) operator is then created as

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma). \quad (2)$$
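As a concrete illustration, the following minimal sketch (our own, not part of the original method description) builds a small Gaussian scale space and its DOG layers with OpenCV and NumPy; the values sigma = 1.6 and k = sqrt(2) are conventional SIFT defaults, not values stated in this paper.

```python
import cv2
import numpy as np

def dog_layers(img, sigma=1.6, k=np.sqrt(2), levels=5):
    """Blur the image at geometrically spaced scales and subtract
    neighbouring blurs to form the DOG layers D(x, y, sigma)."""
    gray = img.astype(np.float32)
    blurred = [cv2.GaussianBlur(gray, (0, 0), sigma * k**i)
               for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]
```

Extrema of these layers across space and scale become the candidate keypoints described next.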

We then search for extreme points in the DOG space. Each sampling point is compared with its 8 neighborhood points at the same scale factor and the 18 neighborhood points at the two adjacent scale factors (9 in each). The detected extreme points are chosen as candidate keypoints.

In order to improve the stability of the keypoints, the parameters of the DOG operator are corrected based on the Taylor expansion to obtain an accurate position and scale factor. Meanwhile, low-contrast points and unstable points along edges are eliminated to enhance the stability of the matching points. In the neighborhood around each keypoint, the gradient orientations of the pixels are accumulated in a histogram, and the maximum orientation of the histogram is assigned to the keypoint to provide the rotation-invariant characteristic. The gradient magnitude and orientation are computed as follows:

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}, \qquad \theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}. \quad (3)$$

At this point the keypoints of the image are fully detected. Each keypoint has position, scale, and orientation information. We then build a descriptor for each keypoint. First, in order to ensure the rotation-invariant characteristic, the coordinate axis is rotated to the orientation of the keypoint, and the region around the keypoint is divided into 4 x 4 subregions. Each subregion contributes an 8-dimension vector of orientation information. Thus, the keypoint descriptor is a 4 x 4 x 8 = 128-dimension vector. Finally, we normalize the keypoint descriptor to eliminate the influence of illumination changes. The original images (image sequence) are shown in Figure 2.
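In practice the whole detection-plus-description pipeline is available off the shelf; below is a minimal sketch using OpenCV (assuming OpenCV >= 4.4, where SIFT lives in the main module; "input.jpg" is a placeholder path):

```python
import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# each keypoint carries position, scale, and orientation;
# descriptors has shape (num_keypoints, 128), i.e. 4 x 4 x 8
print(len(keypoints), descriptors.shape)
```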

Figure 3 shows the results of extracting SIFT features from the 4 original images in Figure 2. Different colors represent different density degrees of the feature points; a higher density degree indicates a more significant feature point. The density degrees are sorted from high to low into 5 levels: red, green, blue, yellow, and black. Table 1 lists the experimental data of SIFT feature extraction, including the number of SIFT feature points and the time of each step in the SIFT feature detection process.

2.2. RANSAC Algorithm

The RANSAC (Random Sample Consensus) algorithm estimates the parameters of a mathematical model from random samples, iterating over new random combinations of sample data until a model that explains or fits the distribution of the whole data set is found [12]. For each feature point of the image, the matching points are searched based on the minimum Euclidean distance using a k-d tree. The coordinate space mapping relationship between matching points is represented by the transformation matrix. This paper uses the RANSAC algorithm to obtain the transformation matrix between the matching points and then maps the matching images to the same coordinate space using the transformation matrix information. In this paper, the image sequences were shot by a hand-held camera along the same horizontal plane, so an affine matrix is a suitable transformation matrix. The affine matrix structure [13] is as follows:

$$H = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}, \quad (4)$$

where $\theta$ represents the image rotation angle and $t_x$, $t_y$ separately represent the translation distances along the $x$-axis and the $y$-axis. In order to calculate the parameters of the affine matrix, at least 3 pairs of matching point coordinates are needed. We assume that $(x_i, y_i)$ and $(x_i', y_i')$ represent a pair of matching feature points. The affine matrix then satisfies

$$\begin{bmatrix} x_i' \\ y_i' \\ 1 \end{bmatrix} = H \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}. \quad (5)$$
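As an illustration of solving for the matrix from correspondences, the following sketch (our own) recovers a full 6-parameter affine matrix, of which the rotation-plus-translation matrix above is a special case, from 3 point pairs by solving the resulting 6 x 6 linear system:

```python
import numpy as np

def affine_from_three_pairs(src, dst):
    """src, dst: (3, 2) arrays of matched points (x_i, y_i) -> (x_i', y_i').
    Returns the 3x3 homogeneous affine matrix H with H @ [x, y, 1] = [x', y', 1].
    Raises LinAlgError if the three source points are collinear (the
    system matrix is then singular, cf. the invertibility condition)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0]); b.append(u)
        A.append([0, 0, 0, x, y, 1]); b.append(v)
    params = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.vstack([params.reshape(2, 3), [0.0, 0.0, 1.0]])
```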

In the panorama image stitching process, 3 pairs of matching points are selected at random (ensuring that the 3 feature points in the input image yield an invertible coefficient matrix, i.e., that they are not collinear) [10]. Through formula (5), an affine transformation matrix is calculated, and new matching feature points are then iteratively resampled to find the affine matrix $H$ that corresponds to the maximum number of inliers (matches whose projections are consistent with $H$ within a tolerance of a few pixels). Assume that $p_i$ represents the probability that a feature point matches correctly between the matching images. After $N$ iterations, the probability of finding a correct affine matrix is $1 - (1 - p_i^3)^N$. As the iteration number $N$ increases, this probability increases as well, so for sufficiently many iterations the probability that a correct affine transformation matrix is not found approaches 0.
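The success probability above is easy to evaluate; a small helper follows (the example values are purely illustrative, not figures from the paper):

```python
def ransac_success_prob(p_inlier, iterations, sample_size=3):
    """Probability that at least one random sample of `sample_size`
    matching pairs is all-inlier after the given number of iterations."""
    return 1.0 - (1.0 - p_inlier ** sample_size) ** iterations

# even a modest inlier rate converges quickly:
print(ransac_success_prob(0.5, 100))   # ~0.9999984
```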

This paper first extracts the image feature points with the SIFT algorithm, searches for matching feature points in the overlapping areas of adjacent images according to the minimum Euclidean distance, and then uses the RANSAC algorithm to find the maximum set of inliers within the matching points, obtaining the affine transformation matrix. After refining the matching points with the RANSAC algorithm, the inlier matches are drawn with red lines, as shown in Figure 4. The inlier counts and the image registration times are shown in Table 2.
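Putting the pieces together, here is a sketch of the pairwise registration step using OpenCV's FLANN-based k-d tree matcher and its built-in RANSAC affine estimator (our own sketch; the 0.7 ratio test is a common pruning heuristic from Lowe's work, not a step this paper specifies):

```python
import cv2
import numpy as np

def register_pair(img_a, img_b, ratio=0.7):
    """Estimate the 3x3 affine matrix mapping img_a onto img_b."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    # FLANN with a k-d tree index; nearest neighbours by Euclidean distance
    matcher = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5},
                                    {"checks": 50})
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    src = np.float32([kp_a[m.queryIdx].pt for m in good])
    dst = np.float32([kp_b[m.trainIdx].pt for m in good])
    M, inlier_mask = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                          ransacReprojThreshold=3.0)
    return np.vstack([M, [0.0, 0.0, 1.0]]), int(inlier_mask.sum())
```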

3. Bundle Adjustment Using L-M Algorithm

The L-M (Levenberg-Marquardt) algorithm is a nonlinear optimization algorithm that combines the Gauss-Newton method with the gradient descent method, using known measurement data to estimate unknown parameters [14]. In this paper, we use L-M bundle adjustment to refine the parameters of the affine transformation matrix, making the total matching error among the matching feature points approach a minimum. Generally, the standard error function is defined as $E = \sum_i r_i^2$, where $r_i$ represents the Euclidean distance between the $i$th pair of matching points. However, when there is a mismatching point, its Euclidean distance is so large that it affects the standard error function with a great weight. In order to eliminate the influence of mismatching points, we choose the Huber function instead of the standard error function. The Huber function [9, 15] assigns weights dynamically according to the matching error value:

$$\rho(r) = \begin{cases} r^2/2, & |r| \le \sigma, \\ \sigma |r| - \sigma^2/2, & |r| > \sigma, \end{cases} \quad (6)$$

$$w(r) = \begin{cases} 1, & |r| \le \sigma, \\ \sigma/|r|, & |r| > \sigma. \end{cases} \quad (7)$$

Formula (6) shows the Huber function and (7) shows the corresponding Huber weight function, where $\sigma$ represents the standard deviation of the measurement data. The new error function is expressed as $E = \sum_i \rho(r_i)$, which is minimized by iteratively reweighting the squared residuals as $\sum_i w(r_i)\, r_i^2$. When $|r| \le \sigma$, the matching point is an inlier and its residual is given weight 1. When $|r| > \sigma$, the matching point is an outlier and the weight of its residual tends towards 0 as the residual grows.
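A direct transcription of (6) and (7) in NumPy (a sketch; sigma must be estimated from the measurement data):

```python
import numpy as np

def huber_rho(r, sigma):
    """Huber cost of formula (6): quadratic near zero, linear in the tails."""
    r = np.abs(r)
    return np.where(r <= sigma, 0.5 * r**2, sigma * r - 0.5 * sigma**2)

def huber_weight(r, sigma):
    """Huber weight of formula (7): 1 for inliers, decaying for outliers."""
    r = np.abs(r)
    return np.where(r <= sigma, 1.0, sigma / np.maximum(r, 1e-12))
```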

The L-M algorithm iterates until it finds a set of parameters that minimizes the error function. Assume $\mathbf{x}_k$ represents the solution vector of the unknown parameters at the $k$th iteration and $\boldsymbol{\delta}$ represents the offset vector. Then the next iteration of the L-M algorithm can be expressed as

$$\big(J^T J + \lambda I\big)\,\boldsymbol{\delta} = -J^T \mathbf{r}, \qquad \mathbf{x}_{k+1} = \mathbf{x}_k + \boldsymbol{\delta}, \quad (8)$$

where $J$ is the Jacobian matrix evaluated at $\mathbf{x}_k$, $I$ is the unit matrix, $\mathbf{r}$ is the residual vector of all matching points under $\mathbf{x}_k$, and $\lambda$ ($\lambda > 0$) is the damping parameter. When $\lambda$ gradually tends to 0, the L-M algorithm behaves like the Gauss-Newton method, which continuously approaches the ideal value after each iteration. When $\lambda$ is large, the L-M algorithm behaves like the gradient descent method, which can quickly locate the neighborhood of the unknown parameters.
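A compact sketch of one L-M update for the three parameters (theta, t_x, t_y) of the affine model of Section 2, with a forward-difference Jacobian (our own minimal implementation, not the authors' code):

```python
import numpy as np

def residuals(p, src, dst):
    """Stacked x/y residuals of the rotation-plus-translation model."""
    theta, tx, ty = p
    c, s = np.cos(theta), np.sin(theta)
    rx = c * src[:, 0] - s * src[:, 1] + tx - dst[:, 0]
    ry = s * src[:, 0] + c * src[:, 1] + ty - dst[:, 1]
    return np.concatenate([rx, ry])

def lm_step(p, src, dst, lam, eps=1e-6):
    """Solve (J^T J + lam * I) delta = -J^T r and return the updated p."""
    r = residuals(p, src, dst)
    J = np.empty((r.size, p.size))
    for j in range(p.size):                 # forward-difference Jacobian
        dp = np.zeros(p.size); dp[j] = eps
        J[:, j] = (residuals(p + dp, src, dst) - r) / eps
    delta = np.linalg.solve(J.T @ J + lam * np.eye(p.size), -J.T @ r)
    return p + delta
```

In a full solver, lam is decreased after a step that reduces the error and increased otherwise, which yields the Gauss-Newton and gradient-descent limits described above; the Huber weights from formula (7) would premultiply the rows of J and r.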

Table 3 lists comparative data of the total matching error before and after L-M bundle adjustment. The total matching error is expressed as $E = \sum_{i=1}^{n} r_i^2$, where $n$ represents the number of matching feature points. The comparative data shows that the L-M algorithm truly reduces the image matching errors.

4. The Traditional Stitching Method

Referring to the papers published by Xiong and Pulli [7] and Brown and Lowe [9], we summarize the traditional stitching method in the following steps.

Algorithm 1 (the traditional image stitching process). Consider the following.
Input. An ordered image sequence $S = \{I_1, I_2, \ldots, I_n\}$ in which adjacent images mutually have overlapping areas.
(1) Select the first two images of the set and calculate their SIFT feature points, respectively.
(2) Select the first image in the sequence as the reference image and the second image as the new input image. Use the k-NN (k Nearest-Neighbours) algorithm [16] to search matching feature points between the new input image and the reference image in accordance with the minimum Euclidean distance.
(3) According to the matching feature points data set, use the RANSAC algorithm to calculate the affine matrix $H$, which transforms the new input image into the coordinate space of the reference image.
(4) Use the L-M algorithm to optimize the affine matrix $H$.
(5) Use the optimized $H$ to transform the new input image.
(6) Search the optimal seam between the affine transformation result in step (5) and the reference image; then combine them together seamlessly along the seam, obtaining the stitched result $R$.
(7) Add $R$ into $S$ to replace the input image and the reference image. In the next stitching process $R$ is selected as the new reference image.
(8) Return to step (1) to implement the next stitching process until there is only one image left in $S$, which is the panoramic result.
Output. A panoramic image.
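As a reading aid, Algorithm 1 reduces to the following loop (a sketch; register_pair stands for the pairwise registration of Section 2 and warp_and_blend for the seam search and composition of steps (5)-(6), both placeholders here):

```python
def stitch_traditional(images, register_pair, warp_and_blend):
    """Left-to-right stitching: every stitched result becomes the
    reference image for the next registration (Algorithm 1)."""
    reference = images[0]
    for new_input in images[1:]:
        H, _ = register_pair(new_input, reference)  # input -> reference space
        reference = warp_and_blend(reference, new_input, H)
    return reference
```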

However, in the traditional method, each time an image stitching is completed, the dimensions of the reference image keep increasing. The overlapping area between the new input image and the reference image accounts for a smaller and smaller proportion of the entire reference image area. Therefore, calculating SIFT feature points over the whole reference image consumes a lot of system resources and registration time. In addition, the traditional algorithm's approach to selecting the next input image is inflexible: each time the adjacent image is selected as the new input image, neither the size of the overlapping area nor the number of matching feature points between the new input image and the reference image is given priority. Meanwhile, if there are many matching errors in the image registration process, these errors are preserved and affect every subsequent registration step, because the reference image is formed from the previous stitched results. Finally, the panoramic image will be seriously distorted.

5. The Improved Stitching Method

In the image stitching process, the accuracy of image registration decides the quality of the panoramic image, and the affine transformation matrix is the end product of image registration, so it is important to obtain a precise transformation matrix. We therefore propose a new mathematical model to compute it:

$$H_{i \to m} = \begin{cases} H_{m-1} H_{m-2} \cdots H_i, & i < m, \\ I, & i = m, \\ H_m^{-1} H_{m+1}^{-1} \cdots H_{i-1}^{-1}, & i > m, \end{cases} \quad (9)$$

where $H_j$ represents the affine transformation matrix obtained from the adjacent images $I_j$ and $I_{j+1}$, and $m$ represents the middle index of the sequence. The new mathematical model does not calculate the affine transformation matrix between the result of each stitching step and the new input image; instead, exploiting the transitivity of matrices, it obtains the affine transformation matrix for the new input image through the chain of adjacent images. Figure 5 describes the process of calculating the affine transformation matrix for an arbitrary image. First, we perform image registration for all adjacent images to get the matching feature point information. Using this information, we calculate the affine matrices between all adjacent images; the affine matrix mapping the input image to the reference image can then be obtained from this set of matrices according to formula (9). Due to the transitivity of the matrices, we can get the affine matrix transforming any image to the reference image.
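A minimal sketch of formula (9) in NumPy (our own; H_adj[j] is assumed to hold the 3 x 3 homogeneous affine matrix mapping image j's coordinates into image j + 1's):

```python
import numpy as np

def transform_to_reference(H_adj, i, m):
    """Chain adjacent affine matrices to map image i into the
    coordinate space of the middle (reference) image m."""
    H = np.eye(3)
    if i < m:
        for j in range(i, m):            # H_{m-1} @ ... @ H_i
            H = H_adj[j] @ H
    else:
        for j in range(m, i):            # H_m^{-1} @ ... @ H_{i-1}^{-1}
            H = H @ np.linalg.inv(H_adj[j])
    return H
```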

The improved method does not detect SIFT feature points over the whole reference image area but uses the registration information gathered from all adjacent images to calculate the affine matrix. The improved method therefore not only reduces the SIFT feature detection area of the reference image, but also increases the number of matching feature points and improves the time efficiency of stitching.

On the other hand, in contrast to stitching serially from left to right in the traditional method, the improved method takes the middle image as the reference image, selects the next input image based on the number of matching points, and completes the image sequence stitching by dynamically expanding from the middle position towards both sides. In the improved stitching process, the maximum accumulated matching error is accumulated only from the middle image to either side of the image sequence, as shown in (10), and is thus reduced to roughly half the level of the traditional method:

$$E_{\max} = \max\left( \sum_{i=1}^{m-1} \sum_{j=1}^{n_i} r_{ij}^2,\; \sum_{i=m}^{N-1} \sum_{j=1}^{n_i} r_{ij}^2 \right), \quad (10)$$

where $n_i$ represents the number of matching points between image $I_i$ and image $I_{i+1}$ and $r_{ij}$ represents the residual of the $j$th pair of matching feature points. After obtaining the affine transformation matrix with the RANSAC algorithm, we continue to use L-M bundle adjustment to refine the affine matrix and further reduce the accumulated matching errors. Since the stitching errors are obviously reduced in the improved method, the distortion degree of the panorama is lower than in the traditional method.

Algorithm 2 (the improved image stitching process). Consider the following.
Input. An ordered image sequence $S = \{I_1, I_2, \ldots, I_n\}$ in which adjacent images mutually have overlapping areas.
(1) Perform image registration for all adjacent images:
(i) Assume the adjacent images are $I_i$ and $I_{i+1}$, $1 \le i \le n-1$. Use the k-NN algorithm to search matching SIFT feature points between $I_i$ and $I_{i+1}$ in accordance with the minimum Euclidean distance, and then preserve the matching feature point data and the matching point counts separately in the arrays $P$ and $C$.
(ii) According to the matching feature point data set $P$, compute the affine matrix that transforms the coordinate space of image $I_i$ to the coordinate space of image $I_{i+1}$, and preserve it in the array $H$.
(iii) Return to step (i) to complete the image registrations for the next pair of adjacent images.
(2) Let $m$ represent the middle index of the set $S$. Take $I_m$ as the reference image. According to the array $C$, select the neighboring image which has the larger matching point count as the next new input image.
(3) According to the index of the new input image and the array $H$, calculate the affine matrix between the new input image and the reference image based on formula (9).
(4) Use the L-M algorithm to optimize the affine matrix.
(5) Use the optimized affine matrix to transform the new input image.
(6) Search the optimal seam between the affine transformation result in step (5) and the reference image; then combine them together seamlessly along the seam and obtain the stitched result $R$.
(7) Add $R$ into $S$ to replace the input image and the reference image. The stitched result $R$ therefore becomes the new middle image of the sequence $S$.
(8) Return to step (2) to implement the next stitching process until there is only one image left in $S$, which is the panoramic result.
Output. A panoramic image.
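Algorithm 2 then reduces to a middle-out loop (a sketch reusing transform_to_reference from above; match_counts[i] holds the SIFT match count between images i and i+1, and warp_and_blend is again a placeholder for steps (5)-(6)):

```python
def stitch_improved(images, match_counts, H_adj, warp_and_blend):
    """Middle-out stitching: take the middle image as the reference and
    expand next to whichever side currently has more SIFT matches."""
    m = len(images) // 2
    mosaic = images[m]
    left, right = m - 1, m + 1
    while left >= 0 or right < len(images):
        take_left = right >= len(images) or (
            left >= 0 and match_counts[left] >= match_counts[right - 1])
        i = left if take_left else right
        H = transform_to_reference(H_adj, i, m)     # formula (9)
        mosaic = warp_and_blend(mosaic, images[i], H)
        if take_left:
            left -= 1
        else:
            right += 1
    return mosaic
```

Note that, unlike the traditional loop, registration here never touches the growing mosaic: all matching is done once, between original adjacent images, which is where the feature-count and detection-area savings come from.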

Figure 6(a) is the original image sequence of the scene. Figures 6(b) and 6(c) are the panoramic images, respectively, obtained by the traditional method and the improved method. Comparing the two panoramic images, we can conclude that the result from the improved stitching method is better. The distortion in the traditional stitching result is significantly reduced by the improved stitching method.

6. The Experimental Results and Analysis

6.1. The Experimental Results

In order to demonstrate the stability of the improved method, we select image sequences from multiple scenes and apply the improved stitching algorithm. Figures 7(a), 8(a), and 9(a) show parts of the experimental scenes. The source image size in Figures 7 and 9 is and the size in Figure 8 is . Figures 7(b) and 7(c) show the experimental results of the traditional stitching process and the improved stitching process. Figure 7(d) shows the stitching result after adding the L-M adjustment to the improved method. Because an obvious seam remains in the panorama, we continue to search for the optimal seam between the affine transformation result and the reference image and combine them seamlessly along that seam; the result is shown in Figure 7(e). Figure 7(f) shows the panoramic result after cutting away the useless area. To test the stability of the improved stitching process further, we select longer image sequences: Figure 8(a) lists 12 computer graphics images and Figure 9(a) lists 16 building images. The remaining images in Figures 8 and 9 show the experimental results corresponding to those in Figure 7.

The experimental results in Figures 7, 8, and 9 show that the improved stitching method is adaptable to longer image sequences as well. The improved stitching process significantly reduces panorama distortion, obtaining a high-resolution, high-quality panorama.

6.2. Estimation of Performance

This paper takes the middle image of the sequence as the reference image. Image registration is first performed for all adjacent images. We then calculate the affine transformation matrix through the transitivity of the overlapping areas among the adjacent images, instead of calculating the affine transformation matrix between the new input image and the reference image directly. In addition, the improved method obtains more matching feature points, so more sample data is available for the RANSAC algorithm, which increases the accuracy of the affine transformation matrix and ensures a high-quality panorama in the end.

Figures 10(a), 10(b), and 10(c) show the comparative matching feature point counts of the traditional method and the improved method; the results correspond to the scenes of Figures 7(a), 8(a), and 9(a), which respectively have 8, 12, and 16 images in their sequences. The horizontal axis indexes the successive stitching processes. The three charts show that the improved method performs better, obtaining more matching feature points.

Figure 11 shows the comparative stitching times of the traditional method and the improved method. The stitching time is recorded from the beginning of the stitching process to the end of the last image stitching. The chart shows that the improved method is faster; moreover, the time saving becomes more pronounced for longer image sequences.

At the same time, from Figures 7, 8, and 9, we can see that the improved algorithm obviously eliminates the distortion of the traditional results and obtains better panoramic results. We assume that $(x_i, y_i)$ represents the center of the $i$th input image after being transformed by its affine matrix, and we preserve the center point of each stitching step in the set $C$. To some extent, $C$ reflects the degree of panorama distortion. We use formula (11) to express the distortion degree, in which $k(\cdot,\cdot)$ represents a function computing the slope between two points:

$$D = \max_{1 \le i \le n-1} \big| k\big((x_i, y_i), (x_{i+1}, y_{i+1})\big) \big|. \quad (11)$$

In addition, we also define the variable $R$, which represents the ratio between the number of panorama pixels remaining after removing the useless black pixels and the total number of panorama pixels. For example, in Figure 9(e) there are some black background pixels, which are useless and are cut away in Figure 9(f). The expression is shown as formula (12), where $W$ and $H$ are the panorama dimensions and $N_{\text{valid}}$ is the number of non-black pixels:

$$R = \frac{N_{\text{valid}}}{W \times H}. \quad (12)$$
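Both measures are straightforward to compute; a sketch follows (our own; taking the maximum absolute slope as the aggregation in formula (11) is our assumption):

```python
import numpy as np

def distortion_degree(centers):
    """Formula (11): slopes between consecutive transformed image
    centers; flatter center tracks indicate less panorama distortion."""
    slopes = [abs((y2 - y1) / (x2 - x1 + 1e-9))
              for (x1, y1), (x2, y2) in zip(centers, centers[1:])]
    return max(slopes)

def valid_pixel_ratio(panorama):
    """Formula (12): fraction of non-black pixels in the panorama."""
    h, w = panorama.shape[:2]
    flat = panorama.reshape(h * w, -1)
    return np.count_nonzero(flat.any(axis=1)) / (w * h)
```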

From Table 4, we can see that both $D$ and $R$ are pleasingly improved compared with the traditional method.

7. Conclusion

This paper increases the number of matching SIFT feature points by improving the traditional method, while reducing the unnecessary SIFT detection area in the reference image as well. In addition, the improved method accelerates panoramic stitching and obtains a high-resolution, high-quality panorama image. Compared with the traditional method, the improved method proposed in this paper starts stitching from the middle position of the image sequence, taking the middle image as the reference image. We then obtain the affine matrix from any image in the sequence to the reference image according to the set of affine transformation matrixes between all adjacent images. Meanwhile, the improved method dynamically selects the next new input image to join the stitching process based on the matching point counts of the adjacent images. The experimental results verify that the improved stitching process accelerates time efficiency and reduces the distortion of the panorama image.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by Chongqing Frontier and Applied Basic Research Project under Grant no. cstc2014jcyjA1347, Chongqing Science and Technology Research Project of Chongqing Municipal Education Commission under Grant no. KJ1402001, and Outstanding Achievements Transformation Projects of University in Chongqing under Grant no. KJZH14219. The authors wish to thank the associate editors and anonymous reviewers for their valuable comments and suggestions on this paper.