Mathematical Problems in Engineering
Volume 2014, Article ID 127284, 9 pages
http://dx.doi.org/10.1155/2014/127284
Research Article

Efficient Stereo Matching with Decoupled Dissimilarity Measure Using Successive Weighted Summation

1School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China
2Department of Mechanical Engineering, Chang Gung University, Taoyuan 33302, Taiwan
3Department of Neurosurgery, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan

Received 31 October 2013; Accepted 16 December 2013; Published 16 January 2014

Academic Editor: Yi-Hung Liu

Copyright © 2014 Cheng-Tao Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Developing matching algorithms that obtain correct disparity maps from stereo image pairs for 3D reconstruction has been the focus of intensive research. A constant-computational-complexity algorithm that calculates dissimilarity aggregation for disparity assessment based on separable successive weighted summation (SWS) along the horizontal and vertical directions has been proposed but is still not satisfactory. This paper presents a novel method which enables a decoupled dissimilarity measure in the aggregation, further improving the accuracy and robustness of stereo correspondence. The aggregated cost is also used to refine disparities based on a local curve-fitting procedure. According to our experimental results on the Middlebury benchmark evaluation, the proposed approach has performance comparable to the selected state-of-the-art algorithms and has the lowest mismatch rate. Besides, the refinement procedure is shown to be capable of preserving object boundaries and depth discontinuities while smoothing out disparity maps.

1. Introduction

Stereo vision is the technique of constructing a 3D description of the scene from stereo image pairs, which is important in many computer vision tasks such as inspection [1], 3D object recognition [2], robot manipulation [3], and autonomous navigation [4]. Stereo vision systems can be active or passive. Active techniques utilize ultrasonic transducers and structured light or laser to simplify the stereo matching problem. On the other hand, passive stereo vision based only on stereo image pairs is less intrusive and typically able to provide a compact and affordable solution for range sensing.

For passive stereo vision systems, stereo matching algorithms, which find for each pixel in one image the corresponding pixel in the other image, are crucial for correct and accurate depth estimation. A 2D map of the displacements between corresponding pixels of a stereo image pair is called a disparity map [5].

Reference [6] is a widely cited classification of stereo matching algorithms for rectified image pairs. The paper divides most of the algorithms into four sequential steps: matching cost calculation, cost aggregation, disparity computation, and disparity refinement. Among these steps, cost aggregation largely determines the performance of an algorithm in terms of computational complexity and correctness.

Cost aggregation can be local [7–12] or global [13–16], based on differences in the range of supporting regions or windows. Global methods assume that the scene is piecewise smooth and search for disparity assignments over the whole stereo pair [6], which requires intensive computation. The local methods, also known as window-based methods, typically require less memory and computation. As a result, the window-based algorithms are popular for fast disparity calculations [17].

Local methods, however, tend to be sensitive to noise, and their correctness at regions with sparse texture or near depth discontinuities relies on proper selection of the window size. To overcome this problem, [7] proposed variable windows for matching calculation, while [8] proposed multiple windows to enhance correctness at regions near depth discontinuities. Nevertheless, the performance of these approaches is limited, since the same aggregation weights are applied over the windows.

In recent years, adaptive support-weight approaches [9] have been proposed to improve the quality of disparity maps. Unfortunately, these approaches require an independent support-weight calculation for each pixel and dramatically increase computational complexity.

To simplify computation, [10] introduced a joint histogram to reduce the disparity search region, and [11] proposed the usage of a sparse Census mask. A summed normalized cross-correlation was proposed in [12] to calculate matching cost in two stages. Segmentation and plane fitting on disparity planes [13–16] are also popular for improving disparity accuracy, but the performance relies on correctness in both segmentation and plane fitting.

An effective local stereo matching algorithm is introduced in [18], which significantly simplifies the intensity-dependent aggregation procedure of local methods. The algorithm aggregates cost values effectively, in terms of bilateral filtering, by only four passes over the image, called separable successive weighted summation (SWS), eliminating iteration and support-area dependency. However, the dissimilarity measures are coupled, which significantly restricts the flexibility in weighting the aggregated costs.

In this paper, we present an improved stereo matching algorithm. Similar to [18], our algorithm uses whole regions as matching primitives to assess disparity based on SWS along the horizontal and vertical directions. We also use basic metrics, such as the truncated sum of absolute differences and the truncated absolute gradient difference, as dissimilarity measures to provide a trade-off between accuracy and complexity.

The main contribution of this paper is to provide a decoupled aggregation algorithm to assess the stereo matching cost under the framework of SWS. The algorithm is simple yet efficient as well as robust. In addition, the resultant disparity map is in a discrete space, which is unsuitable for image-based rendering. We propose a subpixel refinement technique that employs inferior candidate disparities, rather than spatial neighbors, to smooth out discrete values in the disparity map. By this arrangement for curve fitting, even regions near depth discontinuities can be correctly refined. Moreover, this technique increases the resolution of a stereo algorithm with marginal additional computation.

2. Aggregation Algorithm Design

Our stereo matching algorithm consists of four main stages. First, initial cost values are calculated based on the dissimilarity measures between pixels in the reference and target images, and the costs are aggregated using the proposed method. Second, we perform initial disparity estimation by a winner-takes-all minimum search over the aggregated costs. Third, we check differences between the disparity values of corresponding pixel pairs to detect obscured regions and patch them with the smallest disparity values of nearby regions. Finally, the disparity map is refined by the proposed curve-fitting procedure.

2.1. Cost Definition

Assuming that the image pair is rectified and horizontally aligned, two dissimilarity measures between the pixels on the reference image and the target image are used in this work.

The truncated absolute difference cost, $C_{AD}(x,y,d)$, is defined as the absolute difference between the intensity of a pixel on the reference image and that of the corresponding pixel, shifted by $d$ pixels along the horizontal direction, on the target image:

$$C_{AD}(x,y,d) = \min\left(\sum_{c \in \{R,G,B\}} \left|I_c(x,y) - I'_c(x-d,y)\right|,\; T_{AD}\right), \quad (1)$$

where $I_c$ and $I'_c$ are the intensities of the pixels on the reference and target images with $c$ corresponding to the three color channels $R$, $G$, and $B$, respectively, $(x,y)$ are pixel coordinates, and $T_{AD}$ is a threshold value for $C_{AD}$ with $T_{AD} > 0$. The use of a threshold to restrict cost values has been a well-adopted practice to reduce the effects of noise and potential mismatch in obscured regions.

Besides, the truncated absolute gradient difference cost is defined as

$$C_{GRAD}(x,y,d) = \min\left(\left|\nabla_x I(x,y) - \nabla_x I'(x-d,y)\right| + \left|\nabla_y I(x,y) - \nabla_y I'(x-d,y)\right|,\; T_{GRAD}\right), \quad (2)$$

where $\nabla_x$ and $\nabla_y$ are the horizontal and vertical gradient operators, respectively, and $T_{GRAD}$ is a threshold value for $C_{GRAD}$ with $T_{GRAD} > 0$.
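As a concrete sketch of these two measures, the following Python snippet builds the truncated cost volumes for a grayscale pair (the function name `compute_costs`, the forward-difference gradient operator, and the default thresholds are our own illustrative choices; the paper sums the AD term over three color channels):

```python
import numpy as np

def compute_costs(ref, tgt, d_max, t_ad=22.0, t_grad=38.0):
    """Truncated absolute-difference and absolute-gradient-difference
    cost volumes for a rectified grayscale pair (illustrative sketch)."""
    h, w = ref.shape
    c_ad = np.full((d_max + 1, h, w), t_ad)
    c_grad = np.full((d_max + 1, h, w), t_grad)
    # forward-difference gradients (one simple choice of operator)
    gx = lambda im: np.diff(im, axis=1, append=im[:, -1:])
    gy = lambda im: np.diff(im, axis=0, append=im[-1:, :])
    rgx, rgy = gx(ref), gy(ref)
    tgx, tgy = gx(tgt), gy(tgt)
    for d in range(d_max + 1):
        # reference pixel (x, y) is matched against target pixel (x - d, y)
        ad = np.abs(ref[:, d:] - tgt[:, :w - d])
        gd = (np.abs(rgx[:, d:] - tgx[:, :w - d])
              + np.abs(rgy[:, d:] - tgy[:, :w - d]))
        c_ad[d][:, d:] = np.minimum(ad, t_ad)      # truncation by T_AD
        c_grad[d][:, d:] = np.minimum(gd, t_grad)  # truncation by T_GRAD
    return c_ad, c_grad
```

Columns with no valid match at disparity d (x < d) keep the truncation value, mirroring the role of the threshold at obscured regions.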

2.2. Cost Aggregation

Aggregation of primary costs determines the correctness and accuracy of disparity estimation [19–21]. Based on the observation that costs such as the absolute difference and the absolute gradient difference represent different dissimilarity characteristics, they should be independently weighted in the aggregation. This section proposes a new method that is compatible with separable successive weighted summation (SWS) [18] while efficiently providing decoupled dissimilarity aggregation for robust stereo matching.

Once the $C_{AD}$ and $C_{GRAD}$ cost measures are obtained, as defined in the last subsection, the aggregated cost function is set as the weighted sum of both measures according to a weighting factor, $\lambda \in [0,1]$:

$$C^{A}(x,y,d) = \lambda \sum_{(u,v)} w_{AD}(x,y,u,v)\, C_{AD}(u,v,d) + (1-\lambda) \sum_{(u,v)} w_{G}(x,y,u,v)\, C_{GRAD}(u,v,d). \quad (3)$$

In (3), the weightings are separable into horizontal and vertical factors, $w_{AD} = w^{h}_{AD}\, w^{v}_{AD}$ and $w_{G} = w^{h}_{G}\, w^{v}_{G}$, whose definitions are based on the operational principles of bilateral filters [22] to dramatically reduce computation. Each factor is the product of adjacent-pixel weights along the path from $(u,v)$ to $(x,y)$, with the adjacent-pixel weights given by

$$\mu^{h}_{AD}(x,y) = \exp\!\left(-\frac{|I(x,y) - I(x-1,y)|}{\sigma_{AD}}\right), \qquad \mu^{h}_{G}(x,y) = \exp\!\left(-\frac{|I(x,y) - I(x-1,y)|}{\sigma_{G}}\right), \quad (4)$$

and the vertical weights $\mu^{v}_{AD}$ and $\mu^{v}_{G}$ defined analogously between vertical neighbors.

The values of $\mu^{h}_{AD}$ and $\mu^{v}_{AD}$ increase with smaller intensity differences scaled by $\sigma_{AD}$, while the values of $\mu^{h}_{G}$ and $\mu^{v}_{G}$ rise with lesser differences scaled by $\sigma_{G}$. Also, since the weightings are the multiples of horizontal and vertical weightings, they decrease as the distance to the reference pixel increases. Hence, for each pixel, neighboring pixels with similar intensity have higher support during the aggregation.
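A sketch of such bilateral-style adjacent-pixel weights in Python (the function name `adjacent_weights` and the smoothing constant `sigma` are our own illustrative choices):

```python
import numpy as np

def adjacent_weights(img, sigma):
    """Per-pixel weights to the left and upper neighbors, falling off
    exponentially with the intensity difference (bilateral-style)."""
    w_h = np.ones_like(img)
    w_v = np.ones_like(img)
    # weight linking (x, y) to its left neighbor (x-1, y)
    w_h[:, 1:] = np.exp(-np.abs(img[:, 1:] - img[:, :-1]) / sigma)
    # weight linking (x, y) to its upper neighbor (x, y-1)
    w_v[1:, :] = np.exp(-np.abs(img[1:, :] - img[:-1, :]) / sigma)
    return w_h, w_v
```

Uniform regions get weights near 1 (full support), while strong intensity edges cut the support almost to zero, which is what preserves depth discontinuities.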

The aggregation is a two-dimensional convolution. To reduce the computational complexity, each convolution is further decomposed into four one-dimensional convolutions [18]. These one-dimensional convolutions operate from left to right, from right to left, from top to bottom, and from bottom to top, respectively.
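To see why this decomposition is exact, here is a small Python check (our own illustrative code, not the authors') that the two horizontal recursions reproduce the explicit weighted sum over a row:

```python
import numpy as np

def horizontal_sws(cost, w_h):
    """Left-to-right and right-to-left recursive passes over one row.
    cost, w_h: 1-D arrays; w_h[k] links pixel k to pixel k-1."""
    n = len(cost)
    s_lr = cost.astype(float).copy()
    s_rl = cost.astype(float).copy()
    for x in range(1, n):                 # left -> right recursion
        s_lr[x] += w_h[x] * s_lr[x - 1]
    for x in range(n - 2, -1, -1):        # right -> left recursion
        s_rl[x] += w_h[x + 1] * s_rl[x + 1]
    # each pixel's own cost appears in both passes: subtract it once
    return s_lr + s_rl - cost

def horizontal_bruteforce(cost, w_h):
    """Explicit O(n^2) weighted sum for checking the recursion."""
    n = len(cost)
    out = np.zeros(n)
    for x in range(n):
        for u in range(n):
            lo, hi = (u, x) if u < x else (x, u)
            weight = np.prod(w_h[lo + 1:hi + 1])  # product along the path
            out[x] += weight * cost[u]
    return out
```

The brute-force version costs O(n^2) per row, while the two recursive passes are O(n), which is the source of the constant per-pixel complexity.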

Let us take the absolute difference part of $C^{A}$ in (3), which we denote $A_{AD}(x,y,d)$, as an example. If we define the left-to-right weighted sum, $S_{LR}$, as

$$S_{LR}(x,y,d) = \sum_{u \le x} \left(\prod_{k=u+1}^{x} \mu^{h}_{AD}(k,y)\right) C_{AD}(u,y,d),$$

then, from the definition of the adjacent weights, (4), we have that $S_{LR}$ can be written in a recursive form:

$$S_{LR}(x,y,d) = C_{AD}(x,y,d) + \mu^{h}_{AD}(x,y)\, S_{LR}(x-1,y,d). \quad (10)$$

Similarly, we may define the right-to-left weighted sum, $S_{RL}$, as

$$S_{RL}(x,y,d) = \sum_{u \ge x} \left(\prod_{k=x+1}^{u} \mu^{h}_{AD}(k,y)\right) C_{AD}(u,y,d),$$

which can also be written in a recursive form:

$$S_{RL}(x,y,d) = C_{AD}(x,y,d) + \mu^{h}_{AD}(x+1,y)\, S_{RL}(x+1,y,d). \quad (12)$$

Combining both directions, we have that the total horizontal weighted sum is

$$S_{H}(x,y,d) = S_{LR}(x,y,d) + S_{RL}(x,y,d) - C_{AD}(x,y,d). \quad (13)$$

Note that, in (13), the local cost $C_{AD}(x,y,d)$ is subtracted because it is counted once in each of $S_{LR}$ and $S_{RL}$; we have defined $S_{H}$ to simplify the following derivation.

In the vertical direction, if we define the top-to-bottom and bottom-to-top weighted sums, $S_{TB}$ and $S_{BT}$, over the horizontal sums $S_{H}$, we have that

$$A_{AD}(x,y,d) = S_{TB}(x,y,d) + S_{BT}(x,y,d) - S_{H}(x,y,d). \quad (16)$$

In calculating (16), both $S_{TB}$ and $S_{BT}$ are recursively obtained as

$$S_{TB}(x,y,d) = S_{H}(x,y,d) + \mu^{v}_{AD}(x,y)\, S_{TB}(x,y-1,d), \qquad S_{BT}(x,y,d) = S_{H}(x,y,d) + \mu^{v}_{AD}(x,y+1)\, S_{BT}(x,y+1,d). \quad (17)$$

Hence, the first part of the aggregated cost, $A_{AD}$, can be efficiently calculated by (10), (12), and (17).

With a similar procedure, the gradient part of the aggregated cost, $A_{G}(x,y,d)$, is obtained by applying the same four passes to $C_{GRAD}$, with the adjacent-pixel weights $\mu^{h}_{G}$ and $\mu^{v}_{G}$ in place of $\mu^{h}_{AD}$ and $\mu^{v}_{AD}$.

These terms can all be written in the same recursive form: each pass updates the running sum at a pixel by adding the local cost to the weighted running sum of the previous pixel along the pass direction.
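To make the decoupling concrete, the sketch below runs the four passes on a 2-D cost slice and then blends the two independently aggregated measures (all function names and the `lam` parameter handling are our own, not the authors' implementation):

```python
import numpy as np

def sws_aggregate(cost, w_h, w_v):
    """Four-pass successive weighted summation over a 2-D cost slice.
    w_h[y, x] links (x, y) to (x-1, y); w_v[y, x] links (x, y) to (x, y-1)."""
    h, w = cost.shape
    s_lr = cost.astype(float).copy()
    s_rl = cost.astype(float).copy()
    for x in range(1, w):                 # left -> right
        s_lr[:, x] += w_h[:, x] * s_lr[:, x - 1]
    for x in range(w - 2, -1, -1):        # right -> left
        s_rl[:, x] += w_h[:, x + 1] * s_rl[:, x + 1]
    s_h = s_lr + s_rl - cost              # own cost counted once
    s_tb = s_h.copy()
    s_bt = s_h.copy()
    for y in range(1, h):                 # top -> bottom
        s_tb[y] += w_v[y] * s_tb[y - 1]
    for y in range(h - 2, -1, -1):        # bottom -> top
        s_bt[y] += w_v[y + 1] * s_bt[y + 1]
    return s_tb + s_bt - s_h

def decoupled_cost(c_ad, c_grad, wa_h, wa_v, wg_h, wg_v, lam=0.6):
    """Blend the two independently aggregated measures (the decoupling)."""
    return (lam * sws_aggregate(c_ad, wa_h, wa_v)
            + (1.0 - lam) * sws_aggregate(c_grad, wg_h, wg_v))
```

Because each measure carries its own weight maps, edges that matter for intensity and edges that matter for gradients can cut support independently.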

2.3. Disparity Computation

In the last section, the matching cost is aggregated through weighted summation over the entire image, and the disparity which provides the minimum cost is assigned to the corresponding pixel. That is, the assigned disparity for a pixel in the reference image, $d(x,y)$, is the one with the minimum aggregated matching cost:

$$d(x,y) = \arg\min_{d' \in D} C^{A}(x,y,d'),$$

where $D = \{0, 1, \ldots, d_{max}\}$ is the disparity search space with $d_{max}$ being the maximum possible disparity value.
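The winner-takes-all step is a single reduction over the disparity axis of the cost volume; a NumPy sketch (the volume layout `(d_max + 1, height, width)` is our own convention):

```python
import numpy as np

def wta_disparity(cost_volume):
    """Winner-takes-all: per pixel, pick the disparity with minimum
    aggregated cost. cost_volume has shape (d_max + 1, height, width)."""
    return np.argmin(cost_volume, axis=0)
```

Ties are resolved toward the smaller disparity, which is `np.argmin`'s behavior.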

The initial disparity map normally contains obscured outlier regions. The disparities at these regions are significantly different when they are found from different reference images. If $d_L$ and $d_R$ are the disparities with the left and right images as reference images, respectively, we can apply the Left-Right Consistency Check (LRC) [23] to determine if a pixel is located at an obscured region: a pixel $(x,y)$ is marked as obscured when

$$\left|d_L(x,y) - d_R\bigl(x - d_L(x,y),\, y\bigr)\right| > 1.$$

Once the obscured regions are found, occlusion handling [24] can be applied to patch them with the smallest disparity values of nearby regions, and the corresponding costs are also assigned for the disparity refinement used in the next subsection.
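A minimal sketch of the check and a simple patching pass follows; the tolerance of one level and the nearest-valid-neighbor fill are our illustrative choices, not the exact rules of [23, 24]:

```python
import numpy as np

def lrc_occlusions(d_left, d_right, tol=1):
    """Flag pixels whose left/right disparities disagree by more than
    tol levels; out-of-range matches also count as occluded."""
    h, w = d_left.shape
    xs = np.arange(w)
    occ = np.ones((h, w), dtype=bool)
    for y in range(h):
        xr = xs - d_left[y]                 # matching column in right map
        valid = xr >= 0
        occ[y, valid] = np.abs(
            d_left[y, valid] - d_right[y, xr[valid]]) > tol
    return occ

def fill_occlusions(disp, occ):
    """Patch flagged pixels with the nearest valid disparity to the left
    (a simple stand-in for 'smallest disparity of nearby regions')."""
    out = disp.astype(float).copy()
    out[occ] = np.nan
    for y in range(out.shape[0]):
        last = np.nan
        for x in range(out.shape[1]):
            if np.isnan(out[y, x]):
                out[y, x] = last            # propagate from the left
            else:
                last = out[y, x]
        row = out[y]
        mask = np.isnan(row)
        if mask.any() and not mask.all():
            row[mask] = row[~mask][0]       # leading gap: fill from the right
    return out
```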

2.4. Disparity Refinement

The disparity map obtained by the method proposed in the last section is discrete, since the disparity search space is an integer set. We propose a smoothing technique for the disparity refinement in this subsection.

Considering that the initial disparity has the smallest aggregation cost in the potential solution space, we may interpolate a refined value by fitting data sets to upward-opening curves. Besides, rather than directly using neighboring disparities for refinement, we use both the costs and disparities in the curve fitting.

Assuming that $c(d)$ is the cost value corresponding to disparity $d$ at pixel $(x,y)$ on the reference image, we denote $c_i = c(d_0 + i)$, where $d_0$ is the initial disparity, such that $c_0 \le c_{-1}$ and $c_0 \le c_1$, to simplify the following presentation.

Firstly, the disparity-cost sets around the pixel are fitted to a hyperbolic function. As the minimum value of the curve is located at its vertex, we take the vertex location as a first subpixel estimate, $d^{*}_1$.

Secondly, an upward parabola is used to fit other disparity-cost sets around the pixel. As the minimum value of the parabola is located at its vertex, we obtain a second estimate, $d^{*}_2$.

The averaged value, $d^{*} = (d^{*}_1 + d^{*}_2)/2$, is then used as the refined disparity.
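As an illustration of the parabola half of this refinement, the standard three-point vertex formula can be used (a sketch; the paper's exact hyperbolic fit and its choice of sample points are not reproduced here):

```python
def parabola_refine(c_m1, c_0, c_p1, d_0):
    """Vertex of the parabola through (d0-1, c_m1), (d0, c_0), (d0+1, c_p1):
    the standard three-point subpixel correction."""
    denom = c_m1 - 2.0 * c_0 + c_p1
    if denom <= 0:      # degenerate (flat or downward) fit: keep d0
        return float(d_0)
    return d_0 + (c_m1 - c_p1) / (2.0 * denom)
```

Because the correction lies in (-1, 1) whenever $c_0$ is the smallest of the three costs, the refinement smooths the map without moving a pixel across a whole disparity level, which is consistent with preserving depth discontinuities.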

3. Experimental Study

In contrast to the approach of [18], the proposed algorithm applies independent weights to the mismatch measures while preserving comparable computational efficiency. Below, we rewrite the aggregated cost function, (3), for comparison:

$$C^{A}(x,y,d) = \lambda \sum_{(u,v)} w_{AD}\, C_{AD}(u,v,d) + (1-\lambda) \sum_{(u,v)} w_{G}\, C_{GRAD}(u,v,d). \quad (28)$$

The aggregated cost function of [18] is equivalent to

$$C^{A}_{[18]}(x,y,d) = \sum_{(u,v)} w\, \bigl( C_{AD}(u,v,d) + \lambda\, C_{CEN}(u,v,d) \bigr), \quad (29)$$

where $C_{CEN}$ is the Census measure [25, 26] and a single weighting $w$ is shared by both measures. It is clear from the comparison between (28) and (29) that the proposed algorithm (28) enables separate weightings on different measures in the aggregation.

A performance comparison between the proposed method and the algorithm of [18], denoted as InfoPerm [18], using the Middlebury stereo test bench [27, 28] is presented in Figure 1. In the computation, the five parameters of the proposed algorithm are selected as 22, 38, 32, 23, and 0.6, respectively.

fig1
Figure 1: A comparison of disparity estimation performance between [18] and the proposed algorithm. From top to bottom: images (Tsukuba, Venus, Teddy, and Cones) from the Middlebury stereo database; ground truth disparity maps; disparity maps by the InfoPerm [18] algorithm; mismatches of the InfoPerm [18] algorithm; disparity maps by the proposed algorithm; and mismatches of the proposed algorithm.

In Figure 1, and hence in the following presentations, disparities with errors larger than 0.5 disparity levels are called mismatches, which are denoted in gray at non-occluded regions and in black at occluded regions. The percentages of mismatches are further calculated and summarized in Table 1, which shows that the proposed algorithm outperforms InfoPerm [18] in correctness at non-occluded regions in the benchmark tests.
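The mismatch percentage used in this evaluation can be computed as follows (a sketch with our own function name; the 0.5-level threshold matches the text):

```python
import numpy as np

def mismatch_rate(disp, gt, occluded, threshold=0.5):
    """Percentage of non-occluded pixels whose disparity error exceeds
    the threshold (0.5 levels, as in the evaluation above)."""
    valid = ~occluded
    bad = np.abs(disp[valid] - gt[valid]) > threshold
    return 100.0 * bad.mean()
```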

tab1
Table 1: Comparison of the percentage of mismatches between the proposed algorithm and [18] in non-occluded regions.

In addition to InfoPerm [18], several state-of-the-art methods, such as SNCC [12], HistAggr2 [10], RTCensus [11], AdaptWeight [9], FeatureGC [13], ObjectStereo [15], and AdaptingBP [16], were also implemented on the Middlebury stereo test bench [27, 28] for a complete performance comparison. Among them, SNCC [12], HistAggr2 [10], RTCensus [11], InfoPerm [18], and AdaptWeight [9] are local stereo matching algorithms, while FeatureGC [13], ObjectStereo [15], and AdaptingBP [16] are global techniques.

Figure 2 shows a comparison of the disparity maps among RTCensus [11], SNCC [12], AdaptWeight [9], and the proposed approach, where all the disparity maps have been refined. The results show that the proposed method has performance comparable to these state-of-the-art methods, and the refinement strategy, introduced in Section 2.4, is able to preserve clear boundaries.

fig2
Figure 2: Top-to-bottom: color views (Tsukuba, Venus, Teddy, and Cones), ground truth disparity maps, disparity maps via RTCensus [11], disparity maps via SNCC [12], disparity maps via AdaptWeight [9], and disparity maps via the proposed approach.

The complete comparison of the mismatch rates between these algorithms is summarized in Table 2. In the table, "nonocc." denotes the pixels in the non-occluded regions, and "disc." represents the discontinuous but visible pixels near the occluded regions. According to Table 2, the proposed algorithm outperforms AdaptWeight [9] in all of the mismatch evaluations, although it seems inferior to AdaptWeight [9] in generating sharp boundaries, as shown in Figure 2.

tab2
Table 2: Comparison of mismatch percentages among the proposed algorithm and several representative algorithms.

It is also interesting to note that the local stereo matching algorithms, such as RTCensus [11] and SNCC [12], outperform the other algorithms on the Teddy and Cones image pairs. Nevertheless, the global stereo matching algorithms, such as FeatureGC [13] and AdaptingBP [16], perform better on the Tsukuba and Venus image pairs. This observation indicates that the performance of both the local and global approaches depends on the scene. However, the proposed approach has comparable performance in all of the cases and has the lowest mismatch rate in the benchmark evaluation.

4. Conclusion

Stereo matching algorithms are crucial for correct and accurate depth estimation in passive stereo vision systems. A stereo matching algorithm processes rectified stereo image pairs to generate the disparity map, which is used to calculate the depth image (z-map), and hence the 3D point cloud in camera coordinates. For practical applications, the algorithms should require less computational resources and provide precise disparity maps.

In this paper, we proposed an efficient stereo matching algorithm and a refinement strategy for the disparity maps. The algorithm effectively aggregates cost values, in terms of bilateral filtering, by only four passes over the image, which provides a decoupled dissimilarity measure aggregation while preserving computational efficiency. Besides, the refinement strategy is a simple application of the aggregated costs that uses both the costs and disparities in the curve fitting, rather than directly using neighboring disparities for refinement.

Experimental results using the Middlebury stereo test bench [27, 28] show that the algorithm has performance comparable to the state-of-the-art algorithms and outperforms the representative algorithms in the overall mismatch rate.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper was sponsored by Chang Gung Memorial Hospital, Chang Gung University, and the National Science Council, Taiwan, under Contract nos. CMRPD1C0021, CMRPD2C0051, NSC 100-2221-E-182-008, NSC 101-2221-E-182-006, and NSC 102-2221-E-182-073 and the National Science Foundation of China, under Contract no. 61271326.

References

  1. Á. González, M. Á. Garrido, D. F. Llorca et al., “Automatic traffic signs and panels inspection system using computer vision,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 2, pp. 485–499, 2011.
  2. F. Tombari, F. Gori, and L. di Stefano, “Evaluation of stereo algorithms for 3D object recognition,” in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 990–997, Barcelona, Spain, November 2011.
  3. K. Ohno, K. Kense, E. Takeuchi, and Z. Lei, “Unknown object modeling on the basis of vision and pushing manipulation,” in Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO '11), pp. 1942–1948, Phuket, Thailand, December 2011.
  4. K. Schauwecker and A. Zell, “On-board dual-stereo-vision for autonomous quadrotor navigation,” in Proceedings of the International Conference on Unmanned Aircraft Systems (ICUAS '13), pp. 333–342, Atlanta, Ga, USA, May 2013.
  5. V. Murino, U. Castellani, and A. Fusiello, “Disparity map restoration by integration of confidence in Markov random fields models,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '01), pp. 29–32, Thessaloniki, Greece, October 2001.
  6. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7–42, 2002.
  7. Y. Boykov, O. Veksler, and R. Zabih, “A variable window approach to early vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1283–1294, 1998.
  8. H. Hirschmüller, P. R. Innocent, and J. Garibaldi, “Real-time correlation-based stereo vision with reduced border errors,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 229–246, 2002.
  9. K.-J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 650–656, 2006.
  10. D. Min, J. Lu, and M. N. Do, “Joint histogram based cost aggregation for stereo matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 10, pp. 2539–2545, 2013.
  11. M. Humenberger, C. Zinner, M. Weber, W. Kubinger, and M. Vincze, “A fast stereo matching algorithm suitable for embedded real-time systems,” Computer Vision and Image Understanding, vol. 114, no. 11, pp. 1180–1202, 2010.
  12. N. Einecke and J. Eggert, “A two-stage correlation method for stereoscopic depth estimation,” in Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA '10), pp. 227–234, Sydney, Australia, December 2010.
  13. G. Saygili, L. van der Maaten, and E. A. Hendriks, “Feature-based stereo matching using graph cuts,” in Proceedings of ASCI-IPA-SIKS Tracks, IPA Fall Days/ICT.OPEN, Veldhoven, The Netherlands, November 2011.
  14. Z.-F. Wang and Z.-G. Zheng, “A region based stereo matching algorithm using cooperative optimization,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
  15. M. Bleyer, C. Rother, P. Kohli, D. Scharstein, and S. Sinha, “Object stereo: joint stereo matching and object segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 3081–3088, Washington, DC, USA, June 2011.
  16. A. Klaus, M. Sormann, and K. Karner, “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 15–18, Hong Kong, August 2006.
  17. R. Yang and M. Pollefeys, “Multi-resolution real-time stereo on commodity graphics hardware,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), vol. 1, pp. I/211–I/217, Washington, DC, USA, June 2003.
  18. C. Cigla and A. A. Alatan, “Efficient edge-preserving stereo matching,” in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops '11), pp. 696–699, Barcelona, Spain, November 2011.
  19. C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, “Fast cost-volume filtering for visual correspondence and beyond,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 504–511, 2013.
  20. K. He, J. Sun, and X. Tang, “Guided image filtering,” in Computer Vision—ECCV 2010, vol. 6311 of Lecture Notes in Computer Science, pp. 1–14, Springer, 2010.
  21. L. De-Maeztu, S. Mattoccia, A. Villanueva, and R. Cabeza, “Linear stereo matching,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1708–1715, Barcelona, Spain, November 2011.
  22. C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proceedings of the 6th IEEE International Conference on Computer Vision (ICCV '98), pp. 839–846, Bombay, India, January 1998.
  23. P. Fua, “Combining stereo and monocular information to compute dense depth maps that preserve depth discontinuities,” in Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI '91), vol. 2, pp. 1292–1298, Sydney, Australia, 1991.
  24. L. D. Stefano, M. Marchionni, and S. Mattoccia, “A fast area-based stereo matching algorithm,” Image and Vision Computing, vol. 22, no. 12, pp. 983–1005, 2004.
  25. R. Zabih and J. Woodfill, “Non-parametric local transforms for computing visual correspondence,” in Computer Vision—ECCV 1994, Lecture Notes in Computer Science, pp. 151–158, Springer, 1994.
  26. N. Y.-C. Chang, T.-H. Tsai, B.-H. Hsu, Y.-C. Chen, and T.-S. Chang, “Algorithm and architecture of disparity estimation with mini-census adaptive support weight,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 6, pp. 792–805, 2010.
  27. “Middlebury stereo vision database,” http://vision.middlebury.edu/stereo/.
  28. F. Tombari, S. Mattoccia, L. D. Stefano, and E. Addimanda, “Classification and evaluation of cost aggregation methods for stereo correspondence,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, June 2008.