Mathematical Problems in Engineering
Volume 2014 (2014), Article ID 452803, 14 pages
http://dx.doi.org/10.1155/2014/452803
Research Article

Feature Based Stereo Matching Using Two-Step Expansion

1School of Instrumentation Science & Opto-Electronics Engineering, Beihang University, Beijing 100191, China
2National Institute of Metrology, Beijing 100029, China

Received 2 January 2014; Revised 19 June 2014; Accepted 21 July 2014; Published 18 December 2014

Academic Editor: Yi Chen

Copyright © 2014 Liqiang Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper proposes a novel feature-based stereo matching method that produces a dense disparity map through two expansion phases. It finds denser point correspondences than existing seed-growing algorithms and performs well in both short- and wide-baseline settings. The method assumes that, within each image segment corresponding to a single 3D surface, the horizontal pixel coordinates satisfy a 1D projective transformation. Firstly, a state-of-the-art feature matching method is used to obtain sparse support points, and an image-segmentation-based prior is employed to assist the first region expansion. Secondly, the first-step expansion finds more feature correspondences in each uniform region from the initial support points, based on the invariance of the cross ratio under 1D projective transformations. To find enough point correspondences, we use a regular seed-growing algorithm as the second-step expansion and produce a quasi-dense disparity map. Finally, two different methods are used to obtain a dense disparity map from the quasi-dense pixel correspondences. Experimental results show the effectiveness of our method.

1. Introduction

Stereo matching is a major research focus in computer vision [1]. It produces a disparity map from stereo images captured by cameras at different viewpoints. This technology is important in 3D reconstruction, virtual view rendering, and automatic navigation, and a central problem is how to compute a precise disparity map in a complex environment. Much excellent research has addressed this problem. However, some inherent challenges remain, such as unavoidable lighting variations, textureless regions, occluded areas, and nonplanar surfaces, that make disparity estimation difficult [2–4].

To address these inherent problems, numerous methods have been proposed in the past two decades. They fall into local and global methods [5, 6]. Local methods generally compute the correlation between a point and its candidates over an adequate window and then use a winner-takes-all (WTA) strategy to select the best candidate [7, 8]. They are fast and flexible for modeling parametric surfaces within a neighborhood but have difficulty handling poorly textured and ambiguous surfaces. Global methods, by contrast, integrate prior constraints into the optimization of point correspondences to handle poorly textured areas and lessen matching ambiguities. They produce the disparity map by an energy minimization algorithm and perform better in poorly textured and textureless regions, but are limited to modeling piecewise planar scenes [9]. Global methods perform well when the viewpoints are close [10] but do not cope well when the baseline between viewpoints becomes large [11, 12].

In large-baseline stereo images, ambiguous areas are more prevalent than in their short-baseline counterparts. Whether the viewpoints are close or far apart, some significant features, such as points of interest, remain invariant. An alternative approach uses reliable feature correspondences as seeds and expands them by a growing-like process to obtain more point correspondences [13–18]. These seed-growing (or region-growing) methods yield much better results under large perspective distortions and increased occlusion than traditional ones. Seed-growing methods have low computational complexity, since they avoid global optimization, but are sensitive to mismatches. To lessen the influence of wrong seeds, Cech and Sara [19] employed an optimal solution and introduced an improved growing method that can handle many difficult cases, such as repetitive or complex textures, without requiring every seed to be accurate. However, seed-growing algorithms generate only a semidense disparity map because feature points are sparse.

To overcome the drawbacks of traditional matching methods and seed-growing algorithms, matched features are naturally integrated into state-of-the-art stereo methods as soft constraints [3, 20]. In these methods, a primary task is to find accurate point correspondences as GCPs (ground control points) [21]. GCP-based approaches improve stereo matching accuracy and correctness but need considerable time to obtain an accurate disparity map.

In this paper, a robust dense matching algorithm based on two-step expansion is proposed, building on previous work [19, 22–24]. Sparse support points are obtained by state-of-the-art feature matching methods [22, 23]. Before the two-step expansion, a segmentation-based prior [24] is used to encode the assumption that a region of uniform color corresponds to a single 3D surface. The first step is a feature expansion based on the invariance of the cross ratio under projective transformation; the basic idea is to match more features from the initial support points in each uniform region via the cross ratio constraint. However, this step alone cannot find enough matched pixels for a dense disparity map. To obtain more point correspondences, the second step uses the matched features from the first step as seeds to grow a quasi-dense disparity map, which is denser than the feature correspondences of the first step but not fully dense. For the stage from quasi-dense to dense disparity, the paper introduces two methods: (i) a fitting process, in which planar surface fitting removes mismatches and fills blank occluded areas within a uniform region, and (ii) a synthesized method, in which an optimal solution incorporates the quasi-dense pixels into global energy methods to reduce matching ambiguities.

This work mainly focuses on the first step, a feature-expansion algorithm for stereo matching. In this step, we suppose that a sparse set of points lying on the same 3D surface is given and that the horizontal coordinates of their image projections satisfy a 1D projective transformation. Our motivation comes from the fact that collinear points are related by a 1D projective transformation, under which the cross ratio is invariant. Using this invariance, the approximate coordinate of each corresponding pixel can be estimated; the accurate coordinate is then found by a search model that computes a correlation statistic over neighboring pixels. In addition, to handle poorly textured regions, we employ a propagation algorithm to expand low-feature pixels. Occluded areas can be filled by a fitting process or a synthesized method, and the fitting process does not require cross-checking (checking and optimizing the disparity by comparing the left-to-right and right-to-left disparities). Experimental results demonstrate that the two-step expansion method compares favorably with existing ones: it produces denser disparity than existing seed-growing algorithms and gives good results in both short-baseline and wide-baseline stereo matching.

The paper is structured as follows. Related work is discussed in Section 2. In Section 3, we introduce a support-point-based expansion algorithm with a cross ratio constraint. Section 4 describes the two-step expansion method, focusing on the first step, the feature expansion. In Section 5, we describe two different methods to produce a dense disparity map. Section 6 gives the experimental validation supporting the feasibility of the method. Section 7 concludes and outlines future work.

2. Related Work

A large body of literature relates to this work. Scharstein and Szeliski [1] surveyed dense stereo methods and established an early test bed for stereo matching algorithms. Geiger et al. later provided a new outdoor challenge [25] for the quantitative evaluation of large-scale stereo matching. Seitz et al. [26] introduced a comprehensive study and comparison of stereo techniques, covering the two main strategies for obtaining stereo correspondence: local approaches based on feature correspondences and global methods based on energy minimization. In our method, the two-step expansion algorithm and the subsequent fitting process belong to the first strategy, while the synthesized method falls into the second.

Dense global methods based on energy minimization have performed well over the past decade. Local stereo algorithms based on feature correspondences estimate disparity quickly [1, 27] but cannot effectively handle blurry borders and mismatches [7]. Hence, most successful stereo matching algorithms first use local approaches to find pixel correspondences and then incorporate them into global constraints via dynamic programming (DP) [28–31], level sets [32], space carving [33], PDEs [12, 34], EM [35], or voxel coloring [36]. Recently, two global methods based on Markov random fields (MRFs) have served as baselines for improvement: Graph Cuts [37] and Belief Propagation [38]. Much research building on both has achieved desirable results [4, 39, 40]. Both methods are frequently used as reference points among the top contenders in dense stereo matching and are powerful tools for producing disparity maps, but they become intractable in wide-baseline stereo. In contrast, our method lessens matching ambiguities and remains efficient for large-scale stereo matching.

Approaches based on sparse local features are robust to large-scale images. Image features play an important role in computer vision and have already been used in wide-baseline stereo matching [41–43]. In a wide-baseline setup, the inherent problems are perspective distortion and occlusion. Feature-based matching methods are particularly effective because features are robust, distinctive, and invariant to various image and scene transformations [22, 23, 44–47]. However, traditional feature matching produces only sparse pixel correspondences. To find more matched points, a propagation algorithm from the matched points to their neighbors is introduced.

The principle of growing a region from primary seeds was first used for image segmentation [48]. It was introduced into stereo matching by Otto and Chau [49], O’Neill and Denos [50], and Kim and Muller [51] for the photogrammetric community. Lhuillier and Quan [15, 52] then employed the epipolar and uniqueness constraints to greedily propagate adjacent components into unmatched disparity regions from corresponding seeds; however, this growth algorithm performs poorly in areas of repetitive patterns. Zeng et al. [17, 18] replaced pixel-wise growth increments with a best-first strategy as an optimal solution, but the optimization cannot remove earlier match errors, especially in complex scenes. Kannala and Brandt [53] and Megyesi et al. [54] introduced a propagation algorithm based on affine deformation of image similarity patches, but wrong initial seeds produced inaccurate affine parameters and hence poor propagation. Cech and Sara [19] introduced an optimal solution and presented a seed-growing method that can recover from errors in the initial seeds; however, it produces only a semidense disparity map. In contrast, our method can both handle difficult cases (e.g., repetitive texture, complex scenes, and wrong initial seeds) and produce denser point correspondences than existing methods.

To compute an accurate dense disparity map, we incorporate the quasi-dense pixel correspondences as GCPs into a state-of-the-art global matching framework. In the stereo matching literature, GCP-based methods achieve precise results. Bobick and Intille [2] used GCPs to optimize a DP solution and reduce large occlusions. In the methods of Kim [3] and Wang et al. [20], GCPs were used in a preprocessing stage to guide the subsequent matching process and reduce false matches. In [21], GCP-based regularization was incorporated into a global method via Bayesian optimization. In contrast, our method does not require externally provided GCPs; it supplies quasi-dense pixel correspondences as GCPs itself.

Geiger et al. proposed a generative probabilistic model, ELAS [7], for wide-baseline stereo matching and offered the challenging KITTI dataset [25]. On KITTI, methods designed for optical flow [55–57] obtain better results. In contrast, our method is a strategy designed specifically for stereo matching and achieves results comparable to ELAS.

3. Efficient Expansion with Cross Ratio Constraint

3.1. Cross Ratio Constraint Model

The epipolar geometry of two views restricts the corresponding point to the epipolar line. To find the precise position of the corresponding point, traditional algorithms exhaustively search along the corresponding line and compute a correlation statistic for all candidates. To speed up the position estimation of the corresponding point on the line, we introduce a new constraint based on 1D projective geometry.

We assume a stereovision system as shown in Figure 1. There are three sets of four collinear points in the epipolar plane, and each set is related to the others by a line-to-line projective transformation. Since the cross ratio is invariant under a 1D projective transformation, it takes the same value in each of the three sets.

Figure 1: Two cameras are indicated by their centres and and their image planes and . There are 4 3D points , , , and in uniform 3D planar surface , and the point projects to and in the images and , respectively. Line in the right image and line in the left image are epipolar lines separately with respect to points and . Two camera centres, 3-space point , and its images and lie in an epipolar plane . The intersection of the planes and determines the line in 3D. The 3D points , , and are the closest points on the line to the 3D points , , and . Points , , and in the left image and points , , and in the right image are projected by 3D points , , and , and points , , and lie on the line , and points , , and lie on the line .
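Since the cross ratio underpins the whole expansion, a small numerical sketch may help; the coordinates and the 1D homography below are illustrative, not taken from the paper:

```python
def cross_ratio(x1, x2, x3, x4):
    # Cross ratio (x1, x2; x3, x4) of four collinear points given by
    # their 1D coordinates.
    return ((x1 - x3) * (x2 - x4)) / ((x1 - x4) * (x2 - x3))

def homography_1d(x, a=2.0, b=1.0, c=1.0, d=3.0):
    # An arbitrary 1D projective map h(x) = (a*x + b) / (c*x + d).
    return (a * x + b) / (c * x + d)

pts = [0.0, 1.0, 2.0, 3.0]
mapped = [homography_1d(x) for x in pts]
# The cross ratio is preserved by the projective map:
print(cross_ratio(*pts), cross_ratio(*mapped))  # both 1.3333...
```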
3.2. Estimation Model via Cross Ratio

The cross ratio constraint based on 1D projective geometry needs three or more known 3D points, which must first be obtained. We employ a feature matching algorithm as a prior to produce reliable point correspondences, which are used to calculate the fundamental matrix. The proportional coordinates of the known 3D points can then be estimated from the point correspondences and the fundamental matrix. This can introduce larger errors when the region containing the known 3D points is not a planar surface.

To lessen this error, we note that points on the same epipolar line satisfy a 1D projective transformation whether the surface is planar or not, and we introduce a search strategy that uses image points near the epipolar line instead of the 3D points, as shown in Figure 2. Suppose the images have been rectified so that corresponding points lie on the same row in both images. How is the corresponding point in the right image found? Firstly, we find the corresponding epipolar lines in the two images. Then we find the point on the right epipolar line that preserves the cross ratio. Because the reference points only lie near, not on, the epipolar line, the estimated point is in general not the corresponding point itself but is adjacent to it; the shorter the distances from the reference points to the epipolar line, the nearer the estimated position is to the true corresponding point. We therefore employ a search over the pixels contiguous to the estimated point along the epipolar line.

Figure 2: and are the images observed from the camera centres and separately. Given three sets of matched point pairs, , and , an unmatched point in the left image and the unknown corresponding point in the right image, and are the lines on which the corresponding points and lie. Points , , and are the closest points on the line to the points , , and in the left image, and points , , and are the closest points on the line to the points , , and in the right image.
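As a sketch of this estimation step (function and variable names are ours): given three matched x-coordinates on a rectified epipolar line and an unmatched x in the left image, the unknown right-image coordinate can be solved for directly from cross ratio equality, since the equality is linear in the unknown:

```python
def estimate_corresponding_x(xs_left, x_query, xs_right):
    # Estimate the x-coordinate in the right image corresponding to
    # x_query in the left image, from three matched coordinate pairs on
    # the same rectified epipolar line, via cross ratio invariance.
    x1, x2, x3 = xs_left
    y1, y2, y3 = xs_right
    # Cross ratio (x1, x2; x3, x_query) computed in the left image.
    k = ((x1 - x3) * (x2 - x_query)) / ((x1 - x_query) * (x2 - x3))
    # Impose the same cross ratio on the right image and solve the
    # resulting equation, which is linear in the unknown y4:
    #   (y1 - y3)(y2 - y4) = k (y1 - y4)(y2 - y3)
    num = k * (y2 - y3) * y1 - (y1 - y3) * y2
    den = k * (y2 - y3) - (y1 - y3)
    return num / den
```

For points related by an exact 1D homography the recovered coordinate is exact; when the surface is only approximately planar, it serves as the centre of the search described next.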
3.3. Search Strategy

The search strategy computes correlations with all neighbors of the estimated point and decides the position of the corresponding point. As shown in Figure 3, a set of candidate matches is built from the neighborhood centred at the estimated point. The search radius is decided by the maximum of the Euclidean distances from the three reference points to the epipolar line: the maximum distance is scaled by a nonzero proportionality constant and augmented by a fixed radius. We use the sum of absolute differences (SAD) [58] over a window as the image similarity statistic between the left point and all candidate points in the right image. The candidate with the minimum SAD is accepted as the corresponding point only if its SAD is smaller than a proportional constant times the second-smallest SAD and below the threshold for a correct correspondence; otherwise, there is no corresponding point.

Figure 3: and and and are separately the corresponding planes and lines. Given a pair of points whose is estimated by cross ratio model, is a set of neighborhoods of the point . is the size of window which is used to compute correlation.
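The search over the neighbourhood of the estimated position can be sketched as follows, simplified to a 1D search along the epipolar row, with a ratio test standing in for the paper's SAD thresholds; all names are ours:

```python
import numpy as np

def sad(left, right, p, q, w=3):
    # Sum of absolute differences between (2w+1) x (2w+1) windows
    # centred at pixel p = (row, col) in the left image and q in the right.
    (py, px), (qy, qx) = p, q
    a = left[py - w:py + w + 1, px - w:px + w + 1].astype(np.float64)
    b = right[qy - w:qy + w + 1, qx - w:qx + w + 1].astype(np.float64)
    return float(np.abs(a - b).sum())

def search_match(left, right, p, q_est, radius, w=3, ratio=0.8):
    # Evaluate all candidates within `radius` of the estimated position
    # q_est along the epipolar row; accept the best SAD candidate only
    # if it beats the second best by the given ratio, else report no match.
    qy, qx0 = q_est
    cands = sorted((sad(left, right, p, (qy, qx0 + dx), w), qx0 + dx)
                   for dx in range(-radius, radius + 1))
    best, second = cands[0], cands[1]
    return (qy, best[1]) if best[0] <= ratio * second[0] else None
```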

4. A Two-Step Expansion Method

In this section, we describe a two-step expansion algorithm based on image features to compute quasi-dense point correspondences between two views. Our method is inspired by the observation that all points on a uniform surface satisfy 1D projective geometry along the horizontal axis, and that under a 1D projective transformation the cross ratio of the projected points is invariant. Our algorithm proceeds as follows: firstly, a sparse set of initial support points is found by a proven feature matching method. Then, in the first-step expansion, we use the segmentation-based prior to partition the image into different regions and employ the invariance of the cross ratio as a constraint to find more corresponding feature points from the support points in the same region. Finally, a regular seed-growing approach is used as the second-step expansion to obtain more pixel correspondences.

Suppose there exists a pair of left and right images; this section aims at finding the quasi-dense disparity between them. To simplify the presentation, we suppose the input images are rectified, so that corresponding points lie on the same epipolar lines in the two images.

4.1. Initial Support Points

Before the expansion, we describe how to establish a sparse set of feature correspondences as initial support points. Most algorithms for extracting image features can be categorized as either corner detectors (such as Harris and Stephens [22] and SUSAN [46]) or descriptor extractors (such as SIFT [23], SURF [44], and DAISY [47]). Recently, a regional feature detector [59] based on the descriptor of [23] performed well on large-scale instances. In our method, we employ the regular Harris method [22] to obtain initial support points. When, in the presence of large disparity ranges, the number of successfully matched Harris points falls below a threshold (decided by the number of segmented regions; see Section 4.2), the scale-invariant feature transform (SIFT) algorithm is used to extract features, and a KD-tree with the best-bin-first (BBF) [60] algorithm is employed to index and match them. The feature matching method yields a set of matched point pairs, each consisting of one point from each image.

4.2. The First-Step Expansion

At this stage, our objective is to compute all possible feature point correspondences from the initial support points within each uniform region. The first-step expansion is based on segmented regions; thus, we employ the mean-shift method to segment the reference image before expanding feature points. The mean-shift algorithm, successfully used for image partitioning by Comaniciu and Meer [24], ensures that our method estimates regions correctly and localizes depth boundaries precisely. The segmentation assigns a different label to each segmented region, and the threshold in Section 4.1 is determined by the number of segmented regions.

In Section 3, we introduced an expansion model based on 1D projective geometry within a single planar surface. We now use this model as the first-step expansion algorithm. More formally, consider a set of labels corresponding to the different segmented regions of the left image; each pixel is assigned a label.

We assume the initial support points belonging to one label form a set of samples. In this step, we spread feature correspondences outward from the initial support points within the same region. In our method, a feature is a point whose gradient magnitude is greater than 1. Hence, our prior is a process that computes the gradient of each pixel in the image and selects the pixels whose gradient magnitude exceeds 1 as candidate feature points. Suppose we have found all the feature points and assigned each feature to its corresponding label; we now describe how to find its corresponding point.
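The gradient-based candidate selection can be sketched in a few lines; we use the full gradient magnitude here, since the paper states only that the absolute gradient must exceed 1:

```python
import numpy as np

def candidate_features(img, threshold=1.0):
    # Compute the per-pixel gradient magnitude and keep pixels above
    # the threshold as candidate feature points ((row, col) pairs).
    gy, gx = np.gradient(img.astype(np.float64))
    return np.argwhere(np.hypot(gy, gx) > threshold)
```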

The expansion algorithm is mainly based on epipolar geometry and the 1D projective transformation. The epipolar constraint has been used to rectify the images and restrict corresponding points to the same rows. We need only three support points to estimate the probable position of the corresponding point. The number of initial support points in each region is not fixed and falls into two cases: more than three points, or three points or fewer. This step mainly handles the first case.

When the region contains more than three support points, as shown in Figure 4, we can treat the horizontal axis on which each pixel lies as its epipolar line, because of image rectification. For each query point, the three support points are chosen to satisfy the following conditions: (i) the three points have the minimum summed distance to the epipolar line and (ii) no two of them share the same horizontal coordinate. For example, in Figure 4, each of the two query points has its own triple of support points satisfying these conditions, and each has its own search radius. The corresponding point can then be found by the method of Section 3.

Figure 4: In the region , a set of matched points has been known, and and are the points which need to search the corresponding point. The horizontal axes and are the epipolar lines of the points and , separately.
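Conditions (i) and (ii) amount to a small combinatorial search over the region's support points; a direct sketch (all names ours):

```python
from itertools import combinations

def pick_support_triple(supports, row):
    # supports: list of (x, y) support points in the region.
    # row: y-coordinate of the epipolar line of the query point.
    best_cost, best_trio = None, None
    for trio in combinations(supports, 3):
        if len({x for x, _ in trio}) < 3:          # (ii) distinct x-coords
            continue
        cost = sum(abs(y - row) for _, y in trio)  # (i) summed distance
        if best_cost is None or cost < best_cost:
            best_cost, best_trio = cost, trio
    return best_trio
```

A brute-force scan over all triples is quadratic or worse in the number of support points, but regions typically contain few support points, so this is cheap in practice.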
4.3. The Second-Step Expansion

The second step employs a regular seed-growing method to obtain stable correspondences in poorly textured regions. The first step produces a list of point correspondences, which we use as seeds from which to grow corresponding patches. Although the first step finds more effective point correspondences, it inevitably introduces errors in complex areas, and traditional seed-growing algorithms do not handle wrong initial seeds well. To overcome this, Cech and Sara [19] temporarily forwent the uniqueness constraint, propagated most disparity components, and then optimized them to remove the false components. Hence, the second step employs Cech's method to obtain the quasi-dense disparity.

Cech's method includes two phases: (i) growing and propagating as many seeds as possible regardless of their overlaps and (ii) optimizing the grown components and removing the false ones. This seed-growing method keeps accurate point correspondences and recovers most disparities from false seeds. A detailed description can be found in [19].
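Cech and Sara's algorithm is considerably more involved, but the best-first flavour of seed growing can be conveyed with a minimal sketch: fixed disparity per grown match, SAD cost, and a uniqueness check; these simplifications and all names are ours:

```python
import heapq
import numpy as np

def grow_seeds(left, right, seeds, w=2, tau=2.0):
    # seeds: list of (row, col, disparity) triples. Matches with the
    # lowest SAD cost are accepted first (best-first), and each accepted
    # match proposes its four neighbours with the same disparity.
    h, wid = left.shape
    disp = np.full(left.shape, np.nan)

    def cost(y, x, d):
        # SAD over a (2w+1)^2 window; inf if the window leaves the image.
        if not (w <= y < h - w and w <= x < wid - w and w <= x + d < wid - w):
            return np.inf
        a = left[y - w:y + w + 1, x - w:x + w + 1].astype(float)
        b = right[y - w:y + w + 1, x + d - w:x + d + w + 1].astype(float)
        return float(np.abs(a - b).sum())

    heap = [(cost(y, x, d), y, x, d) for (y, x, d) in seeds]
    heapq.heapify(heap)
    while heap:
        c, y, x, d = heapq.heappop(heap)
        if c > tau or not np.isnan(disp[y, x]):
            continue  # too dissimilar, or already matched (uniqueness)
        disp[y, x] = d
        for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < wid and np.isnan(disp[ny, nx]):
                heapq.heappush(heap, (cost(ny, nx, d), ny, nx, d))
    return disp
```

Unlike this sketch, Cech's method also proposes neighbouring disparities, defers the uniqueness decision, and prunes false components in a second optimization phase.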

5. Obtaining Dense Disparity Map

The two-step expansion method cannot find all pixel correspondences in some regions because of occlusion, so it cannot produce a completely dense disparity map. We introduce two different processes to compute a dense disparity map from the quasi-dense point correspondences: one is a filling process by regional 3D surface fitting; the other is a synthesized method that integrates the quasi-dense pixel correspondences as GCPs into a global optimization framework in a principled way.

5.1. Fitting Process

In Section 4.2, we obtained different regions of the image from the segmentation-based prior. The segmented regions may correspond to different 3D surfaces; we assume here that each 3D surface is planar. In some regions of the quasi-dense disparity map, there may be only a few corresponding points, along with piecewise patches of unmatched points due to occlusion. A 3D planar surface fit can be applied to fill the unmatched patches within the same region.

Assume there exists a set of pixel correspondences in an arbitrary region, and we use this regional data to fit a 3D plane. We describe each pixel of the quasi-dense disparity by its image coordinates (x, y) and its disparity d. Then we can fit a 3D planar surface a x + b y + c d + e = 0, where a, b, c, and e are the parameters describing the plane. Pixels belonging to the same region satisfy the plane equation, and their disparities can be computed from it.
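Under the planarity assumption, the fitting step is an ordinary least-squares problem; a sketch using the explicit form d = a x + b y + c (names ours):

```python
import numpy as np

def fit_disparity_plane(points):
    # Least-squares fit of d = a*x + b*y + c to the matched pixels of
    # one segmented region; points is an (N, 3) array of (x, y, d) rows.
    pts = np.asarray(points, dtype=np.float64)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    coeffs, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def fill_disparity(coeffs, xy):
    # Interpolate disparities of unmatched pixels (x, y) in the region.
    a, b, c = coeffs
    x, y = np.asarray(xy, dtype=np.float64).T
    return a * x + b * y + c
```

A robust variant (e.g., RANSAC over the regional correspondences) would also serve the mismatch-removal role the fitting process plays in the paper.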

5.2. Synthesized Method

Recently, mixed stereo models that use known point correspondences as GCPs to improve global matching results have performed well in textureless and occluded areas.

The synthesized method is inspired by the method of Wang [21] and formulates the stereo model as a MAP-MRF problem. Assume the quasi-dense disparity map is produced from a pair of images by the two-step expansion. By Bayes' rule, the posterior probability of the disparity map given the images and the quasi-dense correspondences can be written down, and finding the maximum a posteriori estimate amounts to minimizing the corresponding negative log likelihood. Computing the disparity map thus becomes the problem of minimizing an energy function with three terms: a data term estimating the probability of the disparity map, a smoothness term encouraging similar disparities at neighboring points in locally smooth regions, and a GCP term constraining the disparity map to agree with the quasi-dense correspondences. The details can be found in [21].
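Written out, a generic form of such a GCP-regularized energy (the symbol names are ours, not necessarily those of [21]) is

$$E(D) \;=\; E_{\mathrm{data}}(D) \;+\; \lambda_{s}\, E_{\mathrm{smooth}}(D) \;+\; \lambda_{g}\, E_{\mathrm{gcp}}(D, G),$$

where $D$ is the disparity map, $G$ is the set of quasi-dense correspondences used as GCPs, and $\lambda_{s}$, $\lambda_{g}$ weight the smoothness and GCP terms against the data term.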

5.3. The Overall Algorithm

The process of two-step expansion algorithm is summarized as in Algorithm 1.

Algorithm 1: Two-step expansion algorithm.

6. Experiments

We conducted different experiments to demonstrate the validity of our method. In Section 6.1, we compare our approach to the seed-growing method of Cech on real complex scenes [19]. Section 6.2 reports running times for different image resolutions. In Section 6.3, we separately evaluate the fitting process and the synthesized method on Middlebury benchmark short-baseline stereo images with known ground truth. In Section 6.4, we test our algorithm on large-scale stereo image pairs.

Throughout all experiments, the parameters were empirically determined and held fixed. All experiments were run on a computer with an Intel Core 2 Duo CPU at 2.93 GHz. Unless stated otherwise, we employed the regular Harris method to obtain initial matched points and performed mean-shift image segmentation using the EDISON code [61] implementation of Comaniciu's method [24].

6.1. Computing Quasi-Dense Disparities

First, we obtain a quasi-dense disparity map by the two-step expansion. We demonstrate the difference between seed-growing and our algorithm by comparing their performance on real data. Among known seed-growing algorithms, the method proposed by Cech and Sara [19] performs best, even in the presence of repetitive patterns; hence, we compare our approach to it. We tested different stereo pairs from the Cech dataset [62], namely, St. Martin, Head, and Larch. The resulting quasi-dense disparities are shown in Figure 5. Our algorithm produces a denser disparity map than Cech's algorithm; a comparison of the numbers of corresponding points in the different images is given in Table 1.

Table 1: Comparison of the results on number of corresponding points.
Figure 5: Results for quasi-dense disparities of Cech dataset are as follows: (a) St. Martin, (b) Head, and (c) Larch. Disparity maps are partitioned in different colors: colder color means smaller disparities, warmer color means larger disparities, and deeply blue areas are unassigned disparity.

This experiment demonstrates that our method can produce a quasi-dense disparity map from a sparse set of initial feature correspondences and does not require highly accurate matched features as seeds. In repeated experiments, our method always found more point correspondences than Cech's method.

6.2. Running Time

Running time depends on three factors: image resolution, the number of segmented regions, and the number of initial support points. We varied the image resolutions of Tsukuba, Teddy, Cones, and Venus from the Middlebury benchmark [63] and recorded running-time statistics with respect to each factor. We downscaled the images bicubically by 10%–90%, measured the running time at each resolution, and recorded the corresponding numbers of regions and points. Figures 6(a), 6(b), and 6(c) show the running time as a function of resolution, regions, and points: at the same resolution, more segmented regions and more initial points lead to shorter running times. Figure 6(d) shows the relation between segmented regions and matched points at different image resolutions.

Figure 6: The relations of running time to (a) image resolution, (b) number of regions, (c) number of support points on the Tsukuba, Teddy, Cones, and Venus image pairs, and (d) the relevant segmented regions and corresponding points in different resolutions of the images.
6.3. Short-Baseline Stereo Matching

We tested the fitting process and the synthesized method on several image pairs, namely, Tsukuba, Venus, Teddy, and Cones from the Middlebury benchmark [63]; the maximum disparity for these images is less than 100. First, we used the two-step method to produce the quasi-dense disparities of the images. Then, we computed the corresponding disparity maps by the fitting process and the synthesized method. In the fitting process, we restricted the maximal difference of disparity within the same region to less than 10. In the synthesized method, we employed Graph Cuts [37] as the assistant global method, with the goal of computing a disparity map by the function (5). The fitting process takes about 1.3 minutes to estimate a disparity map and the synthesized method about 1.8 minutes. Figure 7 shows the results for Tsukuba, Venus, Teddy, and Cones. As can be seen, the disparities produced by the synthesized method have a clear structure and few blurry areas.

Figure 7: Results of our two different methods on short-baseline dataset. (a) Left images. (b) Results of fitting process. (c) Results of synthesized method. (d) Ground truth disparities.
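
The fitting process above restricts the disparity spread within one segmented region to less than 10 pixels. One plausible reading of that constraint, sketched in Python (the function name and the median-based outlier rejection are our assumptions, not the paper's exact procedure):

```python
import numpy as np

def filter_region_disparities(disps, max_spread=10.0):
    """Accept a region's quasi-dense disparities as-is when their spread
    stays under `max_spread`; otherwise drop values far from the median
    (one plausible reading of the constraint, not the paper's exact rule)."""
    d = np.asarray(disps, dtype=float)
    if d.max() - d.min() < max_spread:
        return d
    med = np.median(d)
    return d[np.abs(d - med) < max_spread / 2]

# usage: the gross outlier 40 is rejected before fitting
print(filter_region_disparities([12, 13, 14, 40]))
```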

To evaluate the performance of our method, we used the quality measure proposed in [1] with known ground truth data to assess the synthesized results. The matching results rank 87 and 62 with respect to the 1-pixel and 0.5-pixel error thresholds on the Middlebury website. The competing algorithms there are mostly refinements of classical methods that integrate many additional techniques and therefore perform better. Our method is a first proposal without such refinements, so to verify its validity we compared it against the classical methods GC (graph cuts) [37], CSBP (constant-space belief propagation) [64], DP [29], and SO (scanline optimization) [1], as shown in Table 2, where a pixel is counted as bad if its absolute disparity error exceeds 2 pixels. The evaluation uses three performance measures: nonocc (bad pixels in nonoccluded regions), all (bad pixels over the entire image), and disc (bad pixels near discontinuities).
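
The bad-pixel measure of [1] used here reduces to thresholding the absolute disparity error over a validity mask. A minimal sketch (function name ours; the nonocc, all, and disc measures differ only in the mask supplied):

```python
import numpy as np

def bad_pixel_rate(disp, gt, mask, thresh=2.0):
    """Fraction of masked pixels whose absolute disparity error exceeds
    `thresh` pixels (the Middlebury-style bad-pixel measure of [1])."""
    valid = mask & np.isfinite(gt)
    err = np.abs(disp[valid] - gt[valid])
    return float(np.mean(err > thresh))

# toy example: one of four valid pixels is off by more than 2 px
gt   = np.array([[10.0, 20.0], [30.0, 40.0]])
disp = np.array([[10.5, 23.0], [30.0, 41.0]])
mask = np.ones_like(gt, dtype=bool)    # nonocc/all/disc would use different masks
print(bad_pixel_rate(disp, gt, mask))  # → 0.25
```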

Table 2: Comparative performance of stereo algorithms according to Middlebury methodology.

It can be seen from Table 2 that our method outperforms the traditional methods. The wider the baseline, the more pronounced the accuracy advantage of our disparities, for example, on the Teddy and Cones scenes; this is because our method is based on feature matching, which is efficient for large-scale images. On the Venus scene, the error of our method is not the lowest: global methods optimize the disparity map with an energy-minimization function and perform better than feature-based local methods in the short-baseline case, and the Venus pair is short-baseline. Accordingly, the global methods GC and CSBP achieve more accurate results than our method on Venus.

6.4. Large-Scale Stereo Matching

Though short-baseline stereo matching can yield accurate dense disparity maps, large-scale stereo images are much more challenging because of heavier occlusion. For large-scale stereo images, we computed the disparity using only the fitting process, without the synthesized method. Firstly, we evaluated the fitting process on wide-baseline, high-resolution images, namely, Aloe and Reindeer from the Middlebury benchmark [63], whose maximum disparity exceeds 200. In particular, we compared our method against the ELAS method proposed by Geiger et al. [7], as shown in Figure 8. We counted all erroneous pixels over the entire image whose absolute error exceeds 3 pixels. The error rates of our method are 13.03% and 20.36% on Aloe and Reindeer, respectively, while the corresponding ELAS results are 14.14% and 22.28%.

Figure 8: Comparison to Geiger’s method on the Aloe and Reindeer image pairs. (a) Left images. (b) ELAS results. (c) Fitting process results. (d) Ground truth disparities.

Then, we tested on the KITTI dataset [25], which consists of 194 training and 195 test pairs of urban images. The training images, with semidense ground truth disparities, are intended for tuning the parameters of stereo matching methods; our method has no parameters to train or modify. The test images, without ground truth, are used to evaluate participants in the challenge. On this dataset, the main problem is handling textureless areas. We computed the disparity maps of the test images with the fitting process; some results are shown in Figure 9. The average running time for computing a disparity map is about 4.7 minutes. The matching results rank 38 and 35 with respect to the 3-pixel and 5-pixel error thresholds on the KITTI website. We compared our method with the similar methods ELAS [7], GCSF (growing correspondence seeds flow) [55], and GCS (growing correspondence seeds) [19], as shown in Table 3, where Out-Noc is the percentage of erroneous pixels in nonoccluded areas, Out-All is the percentage of erroneous pixels overall, Avg-Noc is the average disparity (end-point) error in nonoccluded areas, and Avg-All is the average disparity (end-point) error overall. The qualitative results on this dataset are similar to those of the previous evaluation. We are able to robustly reconstruct large-scale scenes, which leads to low error rates on the street and on other slanted surfaces.
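
The KITTI measures reported in Table 3 can be sketched in a few lines: Out-* thresholds the absolute disparity error at tau pixels and Avg-* averages it, each over either the nonoccluded or the full valid pixel set (function name ours; KITTI marks pixels without ground truth with disparity 0):

```python
import numpy as np

def kitti_metrics(disp, gt, noc_mask, tau=3.0):
    """Sketch of the KITTI-style measures referenced in Table 3.
    Out-*: fraction of pixels with error > tau; Avg-*: mean absolute
    disparity error. `noc_mask` marks nonoccluded pixels."""
    valid = gt > 0                     # 0 means no ground truth on KITTI
    err = np.abs(disp - gt)
    out = {}
    for name, m in (("Noc", valid & noc_mask), ("All", valid)):
        out["Out-" + name] = float(np.mean(err[m] > tau))
        out["Avg-" + name] = float(np.mean(err[m]))
    return out

# toy example: one occluded valid pixel carries a 4 px error
gt   = np.array([[5.0, 0.0], [10.0, 20.0]])
disp = np.array([[5.0, 7.0], [14.0, 20.0]])
noc  = np.array([[True, True], [False, True]])
print(kitti_metrics(disp, gt, noc))
```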

Table 3: Comparative evaluation results on KITTI test dataset.
Figure 9: Results on urban scenes. (a) Left images. (b) Results of our method. Best viewed in color.

7. Conclusion

In this paper, we introduce a two-step expansion to produce precise disparity maps from stereo images, whether the stereo baseline is short or wide. Our method is based on feature matching and can cope with difficult cases such as large perspective distortions, increased occluded areas, and complex scenes. Our experiments on Cech’s dataset, the Middlebury benchmark, and the KITTI dataset demonstrate that the method achieves good results on real complex scenes and on both short- and wide-baseline image pairs. Importantly, we introduce a cross-ratio restraint model that expands feature correspondences starting from state-of-the-art feature matching.

Our method primarily performs point computation over large numbers of segmented regions, which makes it well suited to GPU implementation and to computing the disparity map of stereo images in real time.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

  1. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7–42, 2002.
  2. A. F. Bobick and S. S. Intille, “Large occlusion stereo,” International Journal of Computer Vision, vol. 33, no. 3, pp. 181–200, 1999.
  3. J. C. Kim, K. M. Lee, B. T. Choi, and S. U. Lee, “A dense stereo matching using two-pass dynamic programming with generalized ground control points,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1075–1082, June 2005.
  4. J. Sun, N. Zheng, and H. Shum, “Stereo matching using belief propagation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 787–800, 2003.
  5. H. Sadeghi, P. Moallem, and S. A. Monadjemi, “Feature based dense stereo matching using dynamic programming and color,” International Journal of Computational Intelligence, vol. 4, no. 3, p. 179, 2008.
  6. L. Valgaerts, A. Bruhn, M. Mainberger, and J. Weickert, “Dense versus sparse approaches for estimating the fundamental matrix,” International Journal of Computer Vision, vol. 96, no. 2, pp. 212–234, 2012.
  7. A. Geiger, M. Roser, and R. Urtasun, “Efficient large-scale stereo matching,” in Proceedings of the 10th Asian Conference on Computer Vision (ACCV '10), November 2010.
  8. B. M. Smith, L. Zhang, and H. Jin, “Stereo matching with nonparametric smoothness priors in feature space,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 485–492, Miami, Fla, USA, June 2009.
  9. L. Tang, H. T. Tsui, and C. K. Wu, “Dense stereo matching based on propagation with a Voronoi diagram,” in Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, vol. 22, 2002.
  10. M. Z. Brown, D. Burschka, and G. D. Hager, “Advances in computational stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, pp. 993–1008, 2003.
  11. J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761–767, 2004.
  12. C. Strecha, T. Tuytelaars, and L. van Gool, “Dense matching of multiple wide-baseline views,” in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 1194–1201, October 2003.
  13. Q. Chen and G. Medioni, “Volumetric stereo matching method: application to image-based modeling,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 1029–1034, Fort Collins, Colo, USA, June 1999.
  14. M. Gong and Y. Yang, “Fast stereo matching using reliability-based dynamic programming and consistency constraints,” in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 610–617, October 2003.
  15. M. Lhuillier and L. Quan, “Match propagation for image-based modeling and rendering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1140–1146, 2002.
  16. H. Wu, Z. Song, J. Yao, L. Li, and Y. Gu, “Stereo matching based on support points propagation,” in Proceedings of the IEEE International Conference on Information Science and Technology, pp. 23–25, IEEE, Hubei, China, March 2012.
  17. G. Zeng, S. Paris, L. Quan, and F. Sillion, “Accurate and scalable surface representation and reconstruction from images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 141–158, 2007.
  18. G. Zeng, S. Paris, L. Quan, and M. Lhuillier, “Surface reconstruction by propagating 3D stereo data in multiple 2D images,” in Proceedings of the European Conference on Computer Vision, pp. 163–174, 2004.
  19. J. Cech and R. Sara, “Efficient sampling of disparity space for fast and accurate matching,” in Proceedings of the International Workshop on Benchmarking Automated Calibration, Orientation, and Surface Reconstruction from Images, 2007.
  20. L. Wang, H. Jin, and R. Yang, “Search space reduction for MRF stereo,” in Proceedings of the European Conference on Computer Vision, 2008.
  21. L. Wang and R. Yang, “Global stereo matching leveraged by sparse ground control points,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3033–3040, June 2011.
  22. C. Harris and M. Stephens, “A combined corner and edge detector,” in Proceedings of the 4th Alvey Vision Conference, pp. 147–151, 1988.
  23. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
  24. D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
  25. A. Geiger, M. Roser, and R. Urtasun, “Urban Scenes Dataset,” 2013, http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php.
  26. S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, “A comparison and evaluation of multi-view stereo reconstruction algorithms,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 519–526, June 2006.
  27. M. Weber, M. Humenberger, and W. Kubinger, “A very fast census-based stereo matching implementation on a graphics processing unit,” in Proceedings of the 12th International Conference on Computer Vision Workshops (ICCV '09), pp. 786–793, IEEE, October 2009.
  28. A. Bensrhair, P. Miché, and R. Debrie, “Fast and automatic stereo vision matching algorithm based on dynamic programming method,” Pattern Recognition Letters, vol. 17, no. 5, pp. 457–466, 1996.
  29. S. Birchfield and C. Tomasi, “Depth discontinuities by pixel-to-pixel stereo,” International Journal of Computer Vision, vol. 35, no. 3, pp. 269–293, 1999.
  30. Y. Ohta and T. Kanade, “Stereo by intra- and inter-scanline search using dynamic programming,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 7, no. 2, pp. 139–154, 1985.
  31. O. Veksler, “Stereo correspondence by dynamic programming on a tree,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 384–390, June 2005.
  32. O. D. Faugeras and R. Keriven, “Complete dense stereovision using level set methods,” in Proceedings of the European Conference on Computer Vision, June 1998.
  33. K. N. Kutulakos and S. M. Seitz, “Theory of shape by space carving,” International Journal of Computer Vision, vol. 38, no. 3, pp. 199–218, 2000.
  34. L. Alvarez, R. Deriche, J. Sánchez, and J. Weickert, “Dense disparity map estimation respecting image discontinuities: a PDE and scale-space based approach,” Journal of Visual Communication and Image Representation, vol. 13, no. 1-2, pp. 3–21, 2002.
  35. C. Strecha, R. Fransens, and L. van Gool, “Combined depth and outlier estimation in multi-view stereo,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2394–2401, June 2006.
  36. S. M. Seitz and C. R. Dyer, “Photorealistic scene reconstruction by voxel coloring,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1067–1073, June 1997.
  37. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
  38. J. Yedidia, W. T. Freeman, and Y. Weiss, “Understanding belief propagation and its generalizations,” in Proceedings of the International Joint Conference on Artificial Intelligence, Distinguished Papers Track, 2001.
  39. V. Kolmogorov and R. Zabih, “Multi-camera scene reconstruction via graph cuts,” in Proceedings of the European Conference on Computer Vision, pp. 82–96, 2002.
  40. C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, “Fast cost-volume filtering for visual correspondence and beyond,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 3017–3024, June 2011.
  41. P. Pritchett and A. Zisserman, “Wide baseline stereo matching,” in Proceedings of the 6th International Conference on Computer Vision, pp. 754–760, IEEE, January 1998.
  42. E. Tola, V. Lepetit, and P. Fua, “A fast local descriptor for dense matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
  43. T. Tuytelaars and L. V. Gool, “Wide baseline stereo matching based on local, affinely invariant regions,” in Proceedings of the British Machine Vision Conference, pp. 412–425, 2000.
  44. H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, “Speeded-Up Robust Features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
  45. K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
  46. S. M. Smith and J. M. Brady, “SUSAN: a new approach to low level image processing,” International Journal of Computer Vision, vol. 23, no. 1, pp. 45–78, 1997.
  47. E. Tola, V. Lepetit, and P. Fua, “DAISY: an efficient dense descriptor applied to wide-baseline stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 815–830, 2010.
  48. R. M. Haralick and L. G. Shapiro, “Image segmentation techniques,” Computer Vision, Graphics, and Image Processing, vol. 29, no. 1, pp. 100–132, 1985.
  49. G. Otto and T. Chau, “‘Region-growing’ algorithm for matching of terrain images,” Image and Vision Computing, vol. 7, no. 2, pp. 83–94, 1989.
  50. M. O'Neill and M. Denos, “Practical approach to the stereo matching of urban imagery,” Image and Vision Computing, vol. 10, no. 2, pp. 89–98, 1992.
  51. T. Kim and J. Muller, “Automated urban area building extraction from high resolution stereo imagery,” Image and Vision Computing, vol. 14, no. 2, pp. 115–130, 1996.
  52. M. Lhuillier and L. Quan, “A quasi-dense approach to surface reconstruction from uncalibrated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 418–433, 2005.
  53. J. Kannala and S. S. Brandt, “Quasi-dense wide baseline matching using match propagation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
  54. Z. Megyesi, G. Kós, and D. Chetverikov, “Dense 3D reconstruction from images by normal aided matching,” Machine Graphics and Vision, vol. 15, no. 1, pp. 3–28, 2006.
  55. J. Čech, J. Sanchez-Riera, and R. Horaud, “Scene flow estimation by growing correspondence seeds,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 3129–3136, June 2011.
  56. C. Vogel, S. Roth, and K. Schindler, “Piecewise rigid scene flow,” in Proceedings of the International Conference on Computer Vision, 2013.
  57. K. Yamaguchi, D. McAllester, and R. Urtasun, “Robust monocular epipolar flow estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.
  58. M. J. Atallah, “Faster image template matching in the sum of the absolute value of differences measure,” IEEE Transactions on Image Processing, vol. 10, no. 4, pp. 659–663, 2001.
  59. K. Mikolajczyk, T. Tuytelaars, C. Schmid et al., “A comparison of affine region detectors,” International Journal of Computer Vision, vol. 65, no. 1-2, pp. 43–72, 2005.
  60. J. S. Beis and D. G. Lowe, “Shape indexing using approximate nearest-neighbour search in high-dimensional spaces,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1000–1006, June 1997.
  61. “Edge Detection and Image Segmentation (EDISON) System,” 2014, http://coewww.rutgers.edu/riul/research/code/EDISON/doc/overview.html.
  62. J. Cech and R. Sara, Cech GCS Dataset, 2013, http://cmp.felk.cvut.cz/~cechj/GCS/.
  63. D. Scharstein and R. Szeliski, “Middlebury Stereo Matching Benchmark,” 2013, http://vision.middlebury.edu/stereo/.
  64. Q. Yang, L. Wang, and N. Ahuja, “A constant-space belief propagation algorithm for stereo matching,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1458–1465, San Francisco, Calif, USA, June 2010.