Local Stereo Matching Using Adaptive Local Segmentation

Damjanović, Sanja; van der Heijden, Ferdinand; Spreeuwers, Luuk J.

doi:https://doi.org/10.5402/2012/163285

International Scholarly Research Notices


Research Article | Open Access

Volume 2012 | Article ID 163285 | https://doi.org/10.5402/2012/163285

Sanja Damjanović,¹ Ferdinand van der Heijden,¹ and Luuk J. Spreeuwers¹

Academic Editor: A. Bandera, E. Davies, Y. Zhuge, S. Mattoccia, B. K. Gunturk

Received 23 Mar 2012

Accepted 03 May 2012

Published 23 Aug 2012

Abstract

We propose a new dense local stereo matching framework for gray-level images based on an adaptive local segmentation using a dynamic threshold. We define a new validity domain of the frontoparallel assumption based on the local intensity variations in the 4-neighborhood of the matching pixel. The preprocessing step smooths low-textured areas and sharpens texture edges, whereas the postprocessing step detects and recovers occluded and unreliable disparities. The algorithm achieves high stereo reconstruction quality in regions with uniform intensities as well as in textured regions. The algorithm is robust against local radiometric differences and successfully recovers disparities around object edges, disparities of thin objects, and disparities of occluded regions. Moreover, our algorithm intrinsically prevents errors caused by occlusion from propagating into nonoccluded regions. It has only a small number of parameters. The performance of our algorithm is evaluated on the Middlebury test bed stereo images. It ranks highly on the evaluation list, outperforming many local and global stereo algorithms that use color images. Among the local algorithms relying on the frontoparallel assumption, our algorithm is the best-ranked algorithm. We also demonstrate that our algorithm works well on practical examples, such as disparity estimation of a tomato seedling and 3D reconstruction of a face.

1. Introduction

Stereo matching has been a popular topic in computer vision for more than three decades, ever since one of the first papers appeared in 1979 [1]. Stereo images are two images of the same scene taken from different viewpoints. Dense stereo matching is a correspondence problem with the aim to find for each pixel in one image the corresponding pixel in the other image. A map of all pixel displacements in an image is a disparity map. To solve the stereo correspondence problem, it is common to introduce constraints and assumptions, which regularize the stereo correspondence problem.

The most common constraints and assumptions for stereo matching are the epipolar constraint, the constant brightness or the Lambertian assumption, the uniqueness constraint, the smoothness constraint, the visibility constraint and the ordering constraint [2–4]. Stereo correspondence algorithms belong to one of two major groups, local or global, depending on whether the constraints are applied to a small local region or propagated throughout the whole image. Local stereo methods estimate the correspondence using a local support region or a window [5, 6]. Local algorithms generally rely on an approximation of the smoothness constraint assuming that all pixels within the matching region have the same disparity. This approximation of the smoothness constraint is known as the frontoparallel assumption. However, the frontoparallel assumption is not valid for highly curved surfaces or around disparity discontinuities. Global stereo methods consider stereo matching as a labeling problem where the pixels of the reference image are nodes and the estimated disparities are labels. An energy functional embeds the matching assumptions by its data, smoothness, and occlusion terms and propagates them along the scan line or through the whole image. The labeling problem is solved by energy functional minimization, using dynamic programming, graph cuts, or belief propagation [7–9]. A recent review of both local and global stereo vision algorithms can be found in [10].

Algorithms based on rectangular window matching give an accurate disparity estimation provided the majority of the window pixels belong to the same smooth object surface with only a slight curvature or inclination relative to the image plane. In all other cases, window-based matching produces an incorrect disparity map: the discontinuities are smoothed, and the disparities of the high-textured surfaces are propagated into low-textured areas [11]. Another restriction of window-based matching is the size of the objects whose disparity is to be determined. Whether the disparity of a narrow object can be correctly estimated depends mostly on the similarity between the occluded background, the visible background, and the object [12]. Algorithms that use suitably shaped matching areas for cost aggregation yield a more accurate disparity estimation [13–18]. The matching region is selected using pixels within certain fixed distances in RGB, CIELab color space, and/or Euclidean space.

To alleviate the frontoparallel assumption, some approaches allow the matching area to lie on the inclined plane, such as in [19, 20]. The alternative to the idea that properly shaped areas for cost aggregation can result in more accurate matching results is to allocate different weights to pixels in the cost aggregation step. In [21], the pixels closer in the color space and spatially closer to the central pixel are given proportionally more significance, whereas, in [22], the additional assumption of connectivity plays a role during weight assignment.

Our stereo algorithm belongs to the group of local stereo algorithms. Within the stereo framework, we rely on some standard and some modified matching constraints and assumptions. We use the epipolar constraint to convert the stereo correspondence into a one-dimensional problem. However, we modify the interpretation of the frontoparallel assumption and the Lambertian constraint. Our novel interpretation of the frontoparallel assumption is based on local intensity variations. By adaptive local segmentation in both matching windows, we constrain the frontoparallel assumption to the intersection of the central matching segments of the initial rectangular windows. This mechanism prevents the propagation of matching errors caused by occlusion and enables an accurate disparity estimation for narrow objects. The algorithm correctly estimates the disparities of textured and textureless surfaces, the disparities around depth discontinuities, and the disparities of small as well as large objects, independently of the initial window size. We apply the Lambertian constraint to local intensity differences and not to the original gray values of the pixels in the segment. In the postprocessing step, we apply the occlusion constraint without imposing the ordering constraint, which enables successful disparity estimation for narrow objects. Also, our stereo algorithm is suitable for a fast real-time implementation, because it is a local algorithm for gray-valued images that uses a local segmentation and only a small subset of window pixels for cost calculation.

Our main contribution is the introduction of the relationship between the frontoparallel assumption and the local intensity variation, and its application to stereo matching. In addition, we introduce a preprocessing step that smooths low-textured areas and sharpens texture edges, producing an image more favorable for a proper local adaptive segmentation.

The paper is organized as follows: in Section 2, we explain our stereo matching framework: the preprocessing step, the adaptive local segmentation, the matching region selection, the stereo matching, and the postprocessing step; in Section 3, we show and discuss the results of our algorithm on different stereo images; in Section 4, we draw conclusions.

2. Stereo Algorithm

Our algorithm consists of three steps: a preprocessing step, a matching step, and a postprocessing step. The flow chart of the algorithm is shown in Figure 1. The input to the algorithm is a pair of rectified stereo images, one of which is considered the reference image. For each pixel in the reference image, we perform matching along the epipolar line for each integer-valued disparity within the disparity range. First, the input images are preprocessed, as explained in Section 2.1. The preprocessing step is applied to each image individually. Next, we calculate the local intensity variation maps for the preprocessed images and use them to determine the dynamic threshold for adaptive local segmentation, as elaborated in Section 2.2. Further, the stereo matching comprises a final region selection from segments, a matching cost calculation for all disparities from the disparity range, and disparity estimation by a modification of the winner-take-all estimation method, see Section 2.3. The result of the matching is two disparity maps, corresponding to the left and right images of the stereo pair. Finally, the postprocessing step calculates the final disparity map corresponding to the reference image, as described in Section 2.4.

2.1. Preprocessing

We apply a nonlinear intensity transformation to the input images in order to make them more suitable for adaptive local segmentation. The presence of the Gaussian noise and the sampling errors in image can produce erroneous segments for matching. The noise is dominant in the low-textured and uniform regions, while the sampling errors are pronounced in the high-textured image regions. The sampling effects can be tackled by choosing a cost measure insensitive to sampling as in [23], or by interpolating the cost function as in [24]. We handle these problems differently and within the preprocessing step. The applied transformation suppresses the noise in low-textured regions while simultaneously suppressing the sampling effects in the high-textured regions.

The transformation is based on subpixel samples obtained by bicubic interpolation in the 4-neighborhood of each pixel, and on consistently replacing the central pixel value by the maximum or by the minimum value of the set, depending on the relation between the mean and the median of the set. We form a set of samples from the intensity of the observed pixel and the intensities of the horizontally and vertically interpolated image at the subpixel level:

The intensity transformation is performed by replacing the intensity with the new intensity as

All intensity values are corrected in the same manner. If the pixel intensity differs significantly from its four neighbors, as in the high-textured regions, it is replaced by the maximum value in the interpolated subpixel set, resulting in a sharpening effect. On the other hand, in low-textured regions, the intensity change is small, and systematically replacing the initial intensity value with the minimum value of the interpolated subpixel set produces a favorable denoising effect. These positive effects originate from the image resampling done by bicubic interpolation, because bicubic interpolation exhibits overshoots at locations with large differences between adjacent pixels; see the relevant chapters in [25] and [26]. These favorable effects are lacking if the interpolation method is linear.
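The overshoot that the preprocessing relies on can be illustrated in one dimension. The following is a minimal sketch, not the authors' implementation: it assumes a Catmull-Rom cubic kernel (one common bicubic variant, which may differ from the kernel used in the paper) and evaluates only the half-pixel positions of a hypothetical step edge. The function names and the test signal are illustrative.

```python
def clamp_index(i, n):
    """Clamp an index into [0, n - 1] (border replication)."""
    return max(0, min(n - 1, i))


def cubic_midpoint(signal, i):
    """Catmull-Rom cubic value halfway between signal[i] and signal[i + 1].

    At t = 0.5 the four kernel weights are (-1, 9, 9, -1) / 16.
    The kernel choice is an assumption made for this illustration.
    """
    n = len(signal)
    p0 = signal[clamp_index(i - 1, n)]
    p1 = signal[clamp_index(i, n)]
    p2 = signal[clamp_index(i + 1, n)]
    p3 = signal[clamp_index(i + 2, n)]
    return (-p0 + 9 * p1 + 9 * p2 - p3) / 16.0


def linear_midpoint(signal, i):
    """Linear interpolation halfway between signal[i] and signal[i + 1]."""
    n = len(signal)
    return 0.5 * (signal[clamp_index(i, n)] + signal[clamp_index(i + 1, n)])


# Hypothetical step edge between a dark and a bright region.
edge = [0, 0, 0, 10, 10, 10]
cubic = [cubic_midpoint(edge, i) for i in range(len(edge) - 1)]
linear = [linear_midpoint(edge, i) for i in range(len(edge) - 1)]

print(cubic)   # cubic samples leave the range [0, 10] next to the edge
print(linear)  # linear samples never leave the range [0, 10]
```

Next to the edge, the cubic samples fall below the darkest and rise above the brightest input value, whereas the linear samples stay inside the input range. Only the cubic samples therefore push the maximum and minimum of the subpixel set beyond the original intensities, which is consistent with the remark that the effect is lacking for linear interpolation.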

We illustrate the effect of the preprocessing step for an image from a stereo pair from the Middlebury evaluation database in Figure 2. The preprocessing step modifies regions with high intensity variations and results in a sharper image. Further, in Section 3, we show the influence of this step on the overall algorithm score.

2.2. Adaptive Local Segmentation

Adaptive local segmentation establishes a new relationship between the local intensity variation and the frontoparallel assumption applied to stereo matching. Adaptive local segmentation selects a central subset of pixels from a large rectangular window, the segment, for which we assume that the frontoparallel assumption holds. The segment contains the central window pixel and the pixels, spatially connected to the central pixel, whose intensities lie within the dynamic threshold from the intensity of the central window pixel. Starting from the segment, we form a final region selection for matching, see Section 2.3.

The idea behind the adaptive local segmentation is to prevent the matching region from containing pixels with significantly different disparities, prior to actually estimating the disparity. We accomplish this by choosing the segmentation threshold based on the local texture. If the local texture is uniform, with local intensity variations caused only by Gaussian noise, we opt for a small threshold value. Because the intensity variations are small, the segment will then comprise the whole uniform region. We assume that these pixels originate from the smooth surface of one object and therefore that the frontoparallel assumption holds for the segment. On the other hand, if the window is textured, that is, the intensity variations are significantly larger than the noise level, it is not possible to distinguish, based only on the pixel intensities and prior to matching, whether the pixels originate from one textured object or from several objects at different distances from the camera. In this case, relying on the high texture for an accurate matching result, it is preferable to select a small segment, so that the segment contains pixels from only one object and does not contain a depth discontinuity. Due to the high local intensity variations, this is achieved with a large threshold.

We introduce a local intensity variation measure in order to determine the level of local texture and, subsequently, the dynamic threshold. We define the local intensity variation measure as the sharpness of local edges in the 4-neighborhood of the central window pixel. The sharper the local edges are, the larger the local intensity variation is. We calculate the local intensity variation using the maximum of the first derivatives in the horizontal and the vertical directions of the half-pixel interpolated image, benefiting again from the overshooting effect of the bicubic interpolation.

The horizontal central difference for a pixel is calculated from the horizontal half-pixel shifts of the image to the left and to the right. The vertical central difference for a pixel is calculated in the same way from the two vertical half-pixel shifts of the image. We define the intensity variation measure as the maximum of the horizontal and vertical differences.

We divide local intensity variations into four ranges based on the preselected constant and define a dynamic threshold for each range by a look-up table:

Figure 3 shows a color-coded dynamic threshold map, or equivalently the local intensity variation ranges, for the left image of the Tsukuba stereo pair from the Middlebury stereo evaluation set [27].

The dynamic threshold defined by (6) for the reference pixel in the reference image is also used for the adaptive local segmentation in the nonreference image for all potentially corresponding pixels from the disparity range.
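As a rough illustration of how a local intensity variation could be mapped to a dynamic threshold, consider the sketch below. Several details are assumptions rather than the published definitions: the half-pixel samples use a Catmull-Rom cubic kernel, the variation is taken as the larger absolute central difference, and the four ranges and threshold values in `dynamic_threshold` are hypothetical placeholders for the look-up table in (6). The only property taken from the text is that a larger variation leads to a larger threshold.

```python
def cubic_mid(p0, p1, p2, p3):
    """Catmull-Rom value halfway between p1 and p2 (assumed kernel)."""
    return (-p0 + 9 * p1 + 9 * p2 - p3) / 16.0


def pixel(img, y, x):
    """Read img[y][x] with border replication."""
    h, w = len(img), len(img[0])
    return img[max(0, min(h - 1, y))][max(0, min(w - 1, x))]


def intensity_variation(img, y, x):
    """Larger of the horizontal and vertical half-pixel central differences."""
    right = cubic_mid(pixel(img, y, x - 1), pixel(img, y, x),
                      pixel(img, y, x + 1), pixel(img, y, x + 2))
    left = cubic_mid(pixel(img, y, x - 2), pixel(img, y, x - 1),
                     pixel(img, y, x), pixel(img, y, x + 1))
    down = cubic_mid(pixel(img, y - 1, x), pixel(img, y, x),
                     pixel(img, y + 1, x), pixel(img, y + 2, x))
    up = cubic_mid(pixel(img, y - 2, x), pixel(img, y - 1, x),
                   pixel(img, y, x), pixel(img, y + 1, x))
    # Taking absolute values is an assumption of this sketch.
    return max(abs(right - left), abs(down - up))


def dynamic_threshold(variation, base):
    """HYPOTHETICAL four-range look-up table: larger variation, larger threshold."""
    if variation < base:
        return base
    if variation < 2 * base:
        return 2 * base
    if variation < 4 * base:
        return 3 * base
    return 4 * base


flat = [[50] * 5 for _ in range(5)]                  # uniform region
edge = [[0, 0, 0, 100, 100] for _ in range(5)]       # vertical step edge

flat_threshold = dynamic_threshold(intensity_variation(flat, 2, 2), base=8)
edge_threshold = dynamic_threshold(intensity_variation(edge, 2, 2), base=8)
print(flat_threshold, edge_threshold)
```

In the uniform window the variation is zero and the smallest threshold is selected; next to the step edge the overshoot of the cubic samples inflates the central difference and the largest threshold is selected.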

The adaptive local segmentation pseudocode for the reference pixel in the left image is given by Algorithm 1. The segmentation is performed for the reference and nonreference windows independently, using the same dynamic threshold. Thus, in the window around the reference pixel in the reference image, we declare that a pixel belongs to the segment if its gray value differs from the central pixel's gray value by less than the dynamic threshold. The segment pixels in the nonreference window are chosen in a similar way, using the same threshold. The resulting masks are dilated by a square structuring element to include additional neighboring pixels in the segments and to merge isolated but closely spaced selected pixels. Next, the central connected components in the dilated masks are selected. The final segments are defined by two binary maps, one per window, with ones where the pixels belong to the segment.

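Algorithm 1: Adaptive local segmentation.
Step 1: Dynamic thresholding
for each row of the window do
  for each column of the window do
    if the gray value of the pixel differs from the central gray value by less than the dynamic threshold then
      set the mask entry of the pixel to 1
    end if
  end for
end for
Step 2: Dilation
Dilate the mask with a square structuring element
Step 3: Imposing connectivity
for each row of the window do
  for each column of the window do
    if the mask entry of the pixel is 1 and the pixel is not connected to the central pixel then
      set the mask entry of the pixel to 0
    end if
  end for
end for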
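The three steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the structuring element is taken to be 3 × 3 and connectivity is taken to be 8-connectivity, neither of which is specified in the text above, and the function name `adaptive_local_segment` is hypothetical.

```python
from collections import deque

NEIGHBORS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def adaptive_local_segment(window, threshold):
    """Return the binary mask of the central segment of a square, odd-sized window."""
    n = len(window)
    c = n // 2
    center = window[c][c]

    # Step 1: dynamic thresholding against the central gray value.
    mask = [[1 if abs(window[i][j] - center) < threshold else 0
             for j in range(n)] for i in range(n)]

    # Step 2: dilation with a 3 x 3 square structuring element (assumed size).
    dilated = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mask[i][j]:
                for di, dj in NEIGHBORS:
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        dilated[a][b] = 1

    # Step 3: keep only the component connected to the central pixel
    # (breadth-first flood fill, 8-connectivity assumed).
    segment = [[0] * n for _ in range(n)]
    segment[c][c] = 1
    queue = deque([(c, c)])
    while queue:
        i, j = queue.popleft()
        for di, dj in NEIGHBORS:
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n and dilated[a][b] and not segment[a][b]:
                segment[a][b] = 1
                queue.append((a, b))
    return segment


# Hypothetical 9 x 9 window: bright background, a small dark central
# structure, and an isolated dark pixel in the corner.
window = [[200] * 9 for _ in range(9)]
window[4][4] = 50
window[4][5] = 52
window[0][0] = 51
segment = adaptive_local_segment(window, threshold=8)
print(sum(map(sum, segment)))
```

In this example the corner pixel passes the threshold test but is discarded in Step 3, because even after dilation it is not connected to the central pixel.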

2.3. Stereo Correspondence

The matching region is defined by the overlap of the adaptive local segments in the reference and nonreference windows. Thus, the matching region is defined by a binary map that has ones if and only if both segment maps have ones at the same positions, as given in Algorithm 2.

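Algorithm 2: Matching region selection.
for each row of the window do
  for each column of the window do
    if both segment maps have a one at this position then
      set the entry of the matching region map to 1
    end if
  end for
end for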

We assume that corresponding pixels have similar intensities and that the differences exist only due to Gaussian noise with constant variance. Two one-dimensional vectors are formed from the pixels of the left and right matching windows at the positions of ones within the binary map of the matching region. Besides the noise, differences between the vectors can occur due to different offsets and due to occlusion. To make the matching vectors insensitive to local offsets, we subtract the central pixel values of the left and right windows from the respective vectors, as given by Algorithm 3. In this way, the intensity information is transformed from absolute intensities to differences of intensities with respect to the central window pixels. Further, we impose the Lambertian assumption on the pixels after the central pixel subtraction and not on the original pixel intensities. To prevent occlusion from influencing the matching, we eliminate the occlusion outliers by keeping only the vector coordinates that differ by less than a threshold, as given by Algorithm 4.

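Algorithm 3: Offset compensation.
Given: the length of the left and right vectors, and the central intensities in the left and in the right window
for each vector element do
  subtract the left central intensity from the element of the left vector
  subtract the right central intensity from the element of the right vector
end for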

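Algorithm 4: Occlusion outlier removal.
Given: the length of the initial left and right vectors
for each vector element do
  if the left and right elements differ by at least the threshold then
    Remove both elements
  end if
end for
Result: the length of the final left and right vectors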

We calculate the matching cost using the sum of squared differences (SSD) [7, 28]. To compare the costs of vectors of different lengths for different disparities, we introduce a normalized SSD, in which the SSD is normalized by the length of the vectors for the given disparity.
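The region selection, offset compensation, outlier removal, and normalized cost described above can be combined into one short sketch. It is illustrative only: the exact form of the normalization in (7) is not reproduced here, so dividing the SSD by the product of an assumed noise variance and the number of surviving pixels is an assumption, as are the function name `matching_cost` and the example values.

```python
def matching_cost(left_win, right_win, left_mask, right_mask,
                  threshold, noise_var=1.0):
    """Return (normalized SSD, number of supporting pixels) for one disparity."""
    n = len(left_win)
    c = n // 2
    left_center = left_win[c][c]
    right_center = right_win[c][c]

    diffs = []
    for i in range(n):
        for j in range(n):
            # Matching region: overlap of the two segment masks.
            if left_mask[i][j] and right_mask[i][j]:
                # Offset compensation: intensities relative to the window centers.
                dl = left_win[i][j] - left_center
                dr = right_win[i][j] - right_center
                # Outlier removal: keep only elements that differ by less
                # than the threshold.
                if abs(dl - dr) < threshold:
                    diffs.append(dl - dr)

    support = len(diffs)
    if support == 0:
        return float("inf"), 0
    ssd = sum(d * d for d in diffs)
    # Assumed normalization: SSD / (noise variance * support).
    return ssd / (noise_var * support), support


left = [[10, 12, 11], [13, 15, 14], [12, 11, 10]]
# Same patch with a constant radiometric offset of 20 gray levels.
right = [[v + 20 for v in row] for row in left]
ones = [[1] * 3 for _ in range(3)]

cost_offset, support_offset = matching_cost(left, right, ones, ones, threshold=8)

# One pixel of the right window is replaced by an occluding bright value.
right_occluded = [row[:] for row in right]
right_occluded[0][0] = 200
cost_occ, support_occ = matching_cost(left, right_occluded, ones, ones, threshold=8)
print(cost_offset, support_offset, cost_occ, support_occ)
```

The constant offset does not change the cost, and the occluded pixel is dropped from the vectors instead of inflating the cost; it only lowers the support count, which the disparity selection can then take into account.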

The winner-take-all (WTA) method selects the disparity with the minimal cost for the observed reference pixel. In our algorithm, besides the cost, the number of pixels participating in the cost calculation is also an indication of a correspondence. This ordinal measure cannot be used directly in the disparity estimation, because it is not always a reliable indication of the correspondence as in the case of occlusion. If the number of pixels used in the cost calculation is very low, it may be due to occlusion. However, a reliable match has a substantial ordinal support.

We combine the cost and the number of participating pixels in the disparity estimation and introduce a hybrid WTA: we consider only disparities supported by a sufficient number of pixels as potential candidates for a disparity estimate. Thus, the final disparity estimate is chosen from a subset of all possible disparities from the disparity range. We term these disparity candidates the reliable disparity candidates [13, 29].

The reliable disparity candidates must have at least a minimum number of supporting pixels. This minimum is derived from a ratio coefficient and from the set containing the number of pixels participating in the cost aggregation step for each possible disparity value from the disparity range. For each pixel of the image, the estimated disparity is the reliable disparity candidate with the minimal cost among all possible disparities from the disparity range.

The final result of the hybrid WTA is the disparity map

We calculate two disparity maps: one with the left image as the reference, and the other with the right image as the reference.
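The hybrid WTA selection for a single pixel can be sketched as below. The rule used here for the minimum support, the ratio coefficient times the largest support over the disparity range, is an assumption made for the illustration; the function name `hybrid_wta` and the example numbers are hypothetical.

```python
def hybrid_wta(costs, supports, ratio):
    """Pick the lowest-cost disparity among sufficiently supported candidates.

    costs[d] and supports[d] are the matching cost and the number of pixels
    used in the cost calculation for disparity d. Returns None if no
    candidate has a finite cost.
    """
    # Assumed reliability rule: at least ratio * (largest support).
    min_support = ratio * max(supports)
    best_d, best_cost = None, float("inf")
    for d, (cost, support) in enumerate(zip(costs, supports)):
        if support >= min_support and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Hypothetical costs and support counts for a disparity range 0..3.
costs = [0.9, 0.1, 0.4, 0.5]
supports = [40, 3, 35, 50]

plain = min(range(len(costs)), key=lambda d: costs[d])   # ordinary WTA
hybrid = hybrid_wta(costs, supports, ratio=0.5)
print(plain, hybrid)
```

Ordinary WTA picks disparity 1, whose low cost rests on only three pixels. The hybrid rule discards that weakly supported candidate and selects disparity 2, the cheapest among the reliable candidates.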

2.4. Postprocessing

In the postprocessing, we detect the disparity errors and correct them. Some areas have incorrect disparity values because they are low-textured and larger than the initial window. There are also isolated disparity errors whose disparity differs significantly from the neighboring disparities, so-called outliers; they affect isolated pixels or groups of several pixels for which the adaptive local segmentation did not produce a sufficiently large segment due to high local intensity variation. Finally, there are disparity errors caused by occlusion. Although the matching procedure is the same for both occluded and nonoccluded pixels, our stereo matching algorithm does not propagate errors caused by occlusions, because the boundaries of objects are taken into account by both the adaptive local segmentation and the final matching region selection. However, occluded pixels do not have corresponding pixels, and the estimated disparities for the occluded pixels are incorrect.

The postprocessing consists of several steps including median filtering of the initial disparity maps, disparity refinement of the individual disparity maps, consistency check, and propagation of the reliable disparities.

First, we apply a median filter to both disparity maps to eliminate disparity outliers. Second, we refine the filtered disparity maps individually, in an iterative procedure, to correct low-textured areas with erroneous disparities. The refinement step propagates disparities by histogram voting to the regions with close intensities, defined by a look-up table given in (10), across the whole image, as illustrated in the propagation scheme in Figure 4. Similar notions appear separately in the literature [18, 30], and we were inspired by them. In [30], the cost aggregation is done along the radial directions in disparity space, while in [18], histogram voting is used within the segment for disparity refinement. We refine our disparity maps by histogram voting, accumulating disparities along radial directions across the whole disparity map, with a constraint on the maximum allowed intensity difference from the pixel being refined. The maximum intensity difference is defined by a dynamic threshold that follows the same logic as the local intensity variation measure in Section 2.2, with the difference that here we distinguish three ranges of intensity differences. Thus, the histogram is formed using the disparities of the pixels with close intensities along radial directions, see Figure 4 and Table 1. Pixels are close in intensity, and their disparities are taken into account in forming the histogram, if their intensities lie within the threshold from the intensity of the pixel at the observed position. The threshold is selected based on a look-up table:

The histogram, with a number of bins equal to the number of disparities within the disparity range, is formed by counting the disparities along the radial directions for the pixels whose intensity is within the threshold; the radial directions are given by Table 1.

We calculate the new disparity as the disparity at the maximum of the normalized histogram. The initial disparity is replaced by the new value if it is significantly supported, that is, if the normalized histogram value is greater than a significance threshold; otherwise, it is left unchanged. The steps given by (11), (12), (13), and (14) are repeated iteratively until there are no more updates to the disparities in the map.
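One voting step of this refinement, for a single pixel, might look like the sketch below. The details that the text above does not pin down are assumptions: eight radial directions stand in for the directions of Table 1, pixels that fail the intensity test are skipped while the ray continues, and the histogram is normalized by the total number of votes. The function name `vote_disparity` and the test images are hypothetical.

```python
# Assumed set of radial directions (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def vote_disparity(image, disparity, y, x, num_disp, intensity_thr, significance):
    """Return the refined disparity of pixel (y, x) after one voting step."""
    h, w = len(image), len(image[0])
    hist = [0] * num_disp

    # Walk along each radial direction to the image border and collect the
    # disparities of pixels whose intensity is close to that of (y, x).
    for dy, dx in DIRECTIONS:
        a, b = y + dy, x + dx
        while 0 <= a < h and 0 <= b < w:
            if abs(image[a][b] - image[y][x]) < intensity_thr:
                hist[disparity[a][b]] += 1
            a += dy
            b += dx

    total = sum(hist)
    if total == 0:
        return disparity[y][x]          # no votes: keep the initial value

    best = max(range(num_disp), key=lambda d: hist[d])
    # Replace only if the winning bin is significantly supported.
    if hist[best] / total > significance:
        return best
    return disparity[y][x]


# Uniform 5 x 5 image with one erroneous disparity in the center.
image = [[100] * 5 for _ in range(5)]
disparity = [[3] * 5 for _ in range(5)]
disparity[2][2] = 7
refined = vote_disparity(image, disparity, 2, 2, num_disp=8,
                         intensity_thr=10, significance=0.5)

# If the center pixel has a very different intensity, nobody votes.
image_distinct = [row[:] for row in image]
image_distinct[2][2] = 0
kept = vote_disparity(image_distinct, disparity, 2, 2, num_disp=8,
                      intensity_thr=10, significance=0.5)
print(refined, kept)
```

In a full refinement pass this step would be applied to every pixel and repeated until no disparity changes, as described above.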

Next, we detect occluded disparities by the consistency check between two disparity maps:

If the condition in (15) is not satisfied for a disparity, we declare it inconsistent and eliminate it from the disparity map. The missing disparities are filled in by an iterative refinement procedure similar to the previously applied procedure for disparity propagation by histogram voting. In the iterative step that fills in the inconsistent disparities, we use the threshold look-up table (10), as in the disparity refinement step. We calculate the histogram of the consistent disparities with close intensities along radial directions, as given by (11) and (12). The missing disparity is filled in with the disparity with the largest support in the histogram, provided that the histogram is not empty. We fill in the remaining inconsistent disparities with the disparity of the nearest neighbor that has a known disparity and the smallest intensity difference. As a last step in the postprocessing, we apply a median filter to obtain the final disparity map.
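A left-right consistency check of this kind is commonly written as in the sketch below. The exact condition (15) is not reproduced here, so the convention that a left pixel at column x with disparity d corresponds to the right pixel at column x − d, and the `tolerance` parameter, are assumptions of this illustration.

```python
def consistency_mask(disp_left, disp_right, tolerance=0):
    """Mark left-image disparities that agree with the right disparity map."""
    h, w = len(disp_left), len(disp_left[0])
    consistent = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            xr = x - d                      # assumed corresponding column
            if 0 <= xr < w and abs(d - disp_right[y][xr]) <= tolerance:
                consistent[y][x] = True
    return consistent


# Hypothetical single-row disparity maps.
disp_left = [[0, 1, 1, 1]]
disp_right = [[1, 1, 1, 0]]
mask = consistency_mask(disp_left, disp_right)
print(mask)
```

Pixels marked `False` would be eliminated from the disparity map and later filled in by the histogram voting and nearest-neighbor steps described above.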

3. Experiments and Discussion

We have used the Middlebury stereo benchmark [4] to evaluate the performance of our stereo matching algorithm. The parameters of the algorithm are fixed for all four stereo pairs as required by the benchmark. There are five free parameters in our algorithm. The threshold value is set to . The half-window size is , and the window size is where . The noise variance is a small and constant scaling factor in (7). The ratio coefficient in hybrid WTA is . In the postprocessing step, the median filter parameter is , and the significance threshold in histogram voting is .

Figure 5 shows results for all four stereo pairs from the Middlebury stereo evaluation database: Tsukuba, Venus, Teddy, and Cones. The leftmost column contains the left images of the four stereo pairs. The ground truth (GT) disparity maps are shown in the second column, the estimated disparity maps are shown in the third column, and the error maps are shown in the fourth column. In the error maps, the white regions denote correctly calculated disparity values, which do not differ from the ground truth by more than the error threshold. If the estimated disparity differs from the ground truth value by more than the error threshold, it is marked as an error. The errors are shown in black and gray, where black represents errors in the nonoccluded regions, and gray represents errors in the occluded regions. The quantitative results in the Middlebury stereo evaluation framework are presented in Table 2.

The results show that our stereo algorithm preserves disparity edges. It successfully estimates the disparities of thin objects and deals with subtle radiometric differences between the images of the same stereo pair. Occlusion errors are not propagated, and occluded disparities are successfully filled in the postprocessing step. Narrow objects are best visible in the Tsukuba disparity map (the lamp construction) and in the Cones disparity map (pens in a cup in the lower right corner). Our algorithm correctly estimates the disparities of both textureless and textured surfaces; for example, the large uniform surfaces in the stereo pairs Venus and Teddy are successfully recovered.

The images in the Middlebury database have different sizes, different disparity ranges, and different radiometric properties. The stereo pairs Tsukuba and Venus (434 × 383 pixels) have disparity ranges from 0 to 15 and from 0 to 19, respectively. The radiometric properties of the images in these stereo pairs are almost identical, and the offset compensation given by Algorithm 3 is not significant for these two pairs, as we demonstrated in [13]. As required by the Middlebury evaluation framework, we apply the offset compensation to all four stereo pairs. The stereo pairs Teddy and Cones have disparity ranges from 0 to 59. The images of these stereo pairs are not radiometrically identical, and the offset compensation successfully deals with these radiometric differences [13].

The error percentages, together with the ranking in the Middlebury evaluation online list, are given in Table 2. The numbers show error percentages for nonoccluded regions (NONOCC), discontinuity regions (DISC), and the whole (ALL) disparity map. The overall ranking of our algorithm in the Middlebury evaluation table of stereo algorithms is 28th place among the evaluated algorithms. Thus, our stereo algorithm outperforms many local as well as global algorithms. Among the algorithms ranked in the Middlebury stereo evaluation, only two local algorithms are ranked higher than our algorithm, and neither of them imposes the frontoparallel assumption strictly: GeoSup [5], a local matching method using image geodesic-supported weights, and PatchMatch [31], a matching approach with slanted support windows. Both of these algorithms use color images, while our algorithm works with intensity images and achieves comparable results. Although these approaches have a better general ranking in the Middlebury stereo evaluation list, our approach with matching based on frontoparallel regions outperforms the PatchMatch algorithm for the Tsukuba stereo pair, and the GeoSup algorithm for the Tsukuba, Teddy, and Cones stereo pairs. Thus, our approach with region selection by threshold produces more accurate disparity maps for cluttered scenes than the GeoSup algorithm with region selection using geodesic support weights.

To investigate the contribution of the preprocessing and the postprocessing steps to the overall result, we show in Table 3 the results we obtained on the benchmark stereo pairs with and without the preprocessing and the postprocessing steps in the algorithm. We show the results when neither, only one, or both steps are applied. When our postprocessing step was omitted, only the median filter was applied. From the results in Table 3, we conclude that each step, if individually applied, improves the quality of the final disparity maps. If we apply both steps, the accuracy of the disparity maps is the highest. Furthermore, the improvement contributed by the preprocessing step is greater than that of the postprocessing step only for the Venus stereo pair. This is because the sampling effects were most pronounced in the Venus scene. In addition, we show in Figure 6 the disparity maps for the Tsukuba stereo pair for all four combinations of including or omitting the preprocessing and the postprocessing steps. We conclude that the preprocessing step plays a significant role in the accurate disparity estimation of textureless areas, while the postprocessing step especially helps in the accurate estimation of disparity discontinuities.

To illustrate the subtle features of our algorithm not captured in the standard test bed images, we apply our stereo algorithm, while retaining the parameter values, to some other images from the Middlebury site in Figure 7. For two other stereo pairs, Art and Dolls, we show the left images in the leftmost column. The ground truth (GT) disparity maps are in the second column. The third column shows our estimation of the disparity maps. The fourth column shows the error maps with regard to the ground truth. The algorithm successfully recovers the disparities of very narrow structures, as in the Art disparity map. The disparity of a cluttered scene is also successfully estimated, as in the Dolls disparity map.

Next, we demonstrate that the presented local stereo algorithm works well on practical problems. Examples of disparity map estimation and 3D reconstruction of a face are shown for stereo pair Sanja in Figure 8. The disparity map estimation of a plant in stereo pair Tomato seedling is shown in Figure 9. The parameters of the algorithm are kept the same as in the previous examples. Thus, our algorithm successfully estimates the disparity of the smooth low-textured objects and is suitable also for application to 3D face reconstruction, Figure 8(d). Our algorithm also successfully estimated the disparity map of the tomato seedling. Tomato seedling stereo images represent a challenging task for a stereo matching algorithm in general, because the viewpoints significantly differ and the structure of the plant is narrow, that is, much smaller than the window dimension.

As far as the initial window size is concerned, our algorithm is not influenced by the window size above a certain size. In principle, we could apply our algorithm using the whole image as the initial window around the reference pixel. This would result in a sufficiently large region selection for uniform regions in the image and make the ordinal measure within the hybrid WTA more reliable. On the other hand, in matching windows with high local intensity variations, the selected region is always significantly smaller than the window and does not change if the window is enlarged, because of the connectivity constraint with the reference central pixel.

4. Conclusion

In our local stereo algorithm, we have introduced a new approach for stereo correspondence based on adaptive local segmentation by a dynamic threshold, so that the frontoparallel assumption holds for a segment. Further, we have established a relationship between the local intensity variation in an image and the dynamic threshold. We have applied a novel preprocessing procedure to both stereo images to eliminate the influence of noise and sampling artifacts. The mechanism for the final matching region selection prevents error propagation due to disparity discontinuities and occlusion. In the postprocessing step, we have introduced a new histogram voting procedure for disparity refinement and for filling in the eliminated inconsistent disparities. Although the starting point in matching is a large rectangular window, the disparity of narrow structures is accurately estimated.
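As a rough illustration of filling in eliminated disparities by histogram voting, the sketch below replaces each eliminated value with the most frequent valid disparity in a small square neighborhood. It is a generic sketch, not the procedure of this paper: the function name `fill_by_histogram_voting`, the use of `None` to mark eliminated disparities, the square voting window, and the tie-breaking rule (prefer the smaller disparity) are all assumptions.

```python
from collections import Counter


def fill_by_histogram_voting(disparity, half_size=1):
    """Hypothetical helper: fill each None entry with the most frequent valid
    disparity found in the (2*half_size+1)^2 window around it."""
    height, width = len(disparity), len(disparity[0])
    filled = [list(line) for line in disparity]
    for r in range(height):
        for c in range(width):
            if disparity[r][c] is not None:
                continue  # valid disparities are left unchanged
            # Build the local disparity histogram from the ORIGINAL map,
            # so earlier fills do not influence later votes.
            votes = Counter()
            for nr in range(max(0, r - half_size), min(height, r + half_size + 1)):
                for nc in range(max(0, c - half_size), min(width, c + half_size + 1)):
                    d = disparity[nr][nc]
                    if d is not None:
                        votes[d] += 1
            if votes:
                # Highest count wins; ties go to the smaller disparity
                # (an assumption of this sketch).
                filled[r][c] = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))[0]
    return filled


# Toy disparity map with one eliminated (inconsistent) disparity.
disparity_map = [
    [5, 5, 5, 9],
    [5, None, 9, 9],
    [5, 5, 9, 9],
]
filled_map = fill_by_histogram_voting(disparity_map, half_size=1)
print(filled_map[1][1])  # 5
```

A pixel with no valid neighbor in its window stays `None`, so a larger window or a second pass would be needed for large eliminated areas.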

We evaluated our algorithm on the stereo pairs from the Middlebury database. It ranks highly on the list, outperforming many local and global algorithms that use color information, while we use only intensity images. Our algorithm is the best performing algorithm in the class of local algorithms which use intensity images and the frontoparallel assumption without weighting the intensities of the matching region. Furthermore, our algorithm matches textureless and textured surfaces equally well, handles local radiometric differences well, preserves edges in disparity maps, and successfully recovers the disparity of thin objects and the disparities of the occluded regions. We demonstrated the performance of our algorithm on two additional examples from the Middlebury database and on two practical examples. The results on these additional examples show that the disparity maps of scenes of different natures are successfully estimated: smooth low-textured objects as well as textured cluttered scenes, narrow structures, and textureless surfaces. Moreover, our algorithm has other positive aspects that make it suitable for real-time implementation: it is local, it has just five parameters, intensity variations are calculated locally, and no global segmentation algorithm is involved.

References

D. Marr and T. Poggio, “A computational theory of human stereo vision,” Proceedings of the Royal Society of London, vol. 204, no. 1156, pp. 301–328, 1979.
M. Z. Brown, D. Burschka, and G. D. Hager, “Advances in computational stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, pp. 993–1008, 2003.
O. D. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993.
D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7–42, 2002.
A. Hosni, M. Bleyer, M. Gelautz, and C. Rhemann, “Local stereo matching using geodesic support weights,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '09), pp. 2093–2096, November 2009.
K. Zhang, J. Lu, and G. Lafruit, “Cross-based local stereo matching using orthogonal integral images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 7, pp. 1073–1079, 2009.
P. N. Belhumeur, “A Bayesian approach to binocular stereopsis,” International Journal of Computer Vision, vol. 19, no. 3, pp. 237–260, 1996.
Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
J. Sun, N. N. Zheng, and H. Y. Shum, “Stereo matching using belief propagation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 787–800, 2003.
N. Lazaros, G. C. Sirakoulis, and A. Gasteratos, “Review of stereo vision algorithms: from software to hardware,” International Journal of Optomechatronics, vol. 2, no. 4, pp. 435–462, 2008.
C. Lawrence Zitnick and T. Kanade, “A cooperative algorithm for stereo matching and occlusion detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 7, pp. 675–684, 2000.
H. Hirschmüller, P. R. Innocent, and J. Garibaldi, “Real-time correlation-based stereo vision with reduced border errors,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 229–246, 2002.
S. Damjanović, F. van der Heijden, and L. J. Spreeuwers, “Sparse window local stereo matching,” in Proceedings of the International Conference on Computer Vision Theory and Application (VISAPP '11), pp. 689–693, March 2011.
R. K. Gupta and S. Y. Cho, “Real-time stereo matching using adaptive binary window,” in Proceedings of the 5th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT '10), 2010.
F. Tombari, S. Mattoccia, L. D. Stefano, and E. Addimanda, “Classification and evaluation of cost aggregation methods for stereo correspondence,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, June 2008.
X. Sun, X. Mei, S. Jiao, M. Zhou, and H. Wang, “Stereo matching with reliable disparity propagation,” in Proceedings of the IEEE International Conference on 3D Digital Imaging, Modeling, Processing, Visualisation and Transmission (3DIMPVT '11), 2011.
K. Zhang, J. Lu, and G. Lafruit, “Scalable stereo matching with locally adaptive polygon approximation,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '08), pp. 313–316, October 2008.
K. Zhang, J. Lu, G. Lafruit, R. Lauwereins, and L. Van Gool, “Accurate and efficient stereo matching with robust piecewise voting,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '09), pp. 93–96, IEEE Press, Piscataway, NJ, USA, July 2009.
M. Bleyer, C. Rother, and P. Kohli, “Surface stereo with soft segmentation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1570–1577, June 2010.
H. Tao, H. S. Sawhney, and R. Kumar, “A global matching framework for stereo computation,” in Proceedings of the 8th International Conference on Computer Vision, pp. 532–539, July 2001.
K. J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 650–656, 2006.
A. Hosni, M. Bleyer, M. Gelautz, and C. Rhemann, “Geodesic adaptive support weight approach for local stereo matching,” in Proceedings of the Computer Vision Winter Workshop, pp. 60–65, 2010.
S. Birchfield and C. Tomasi, “Depth discontinuities by pixel-to-pixel stereo,” International Journal of Computer Vision, vol. 35, no. 3, pp. 269–293, 1999.
R. Szeliski and D. Scharstein, “Sampling the disparity space image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, pp. 419–425, 2004.
Q. Wu, F. A. Merchant, and K. R. Castleman, Microscope Image Processing, Academic Press, 2008.
R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB, Gatesmark Publishing, 2nd edition, 2009.
Middlebury stereo, March 2012, http://vision.middlebury.edu/stereo/.
I. J. Cox, “Maximum likelihood N-camera stereo algorithm,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 733–739, June 1994.
S. Damjanović, F. van der Heijden, and L. J. Spreeuwers, “Sparse window local stereo matching,” in Proceedings of the International Workshop on Computer Vision Applications (CVA '11), pp. 83–86, 2011.
H. Hirschmüller, “Stereo processing by semiglobal matching and mutual information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008.
M. Bleyer, C. Rhemann, and C. Rother, “PatchMatch stereo - stereo matching with slanted support windows,” in Proceedings of the British Machine Vision Conference, 2011.

Copyright

Copyright © 2012 Sanja Damjanović et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
