Mathematical Problems in Engineering

Special Issue

Artificial Intelligence and Its Applications


Research Article | Open Access

Volume 2013 | Article ID 654139 | https://doi.org/10.1155/2013/654139

Yimin Lin, Naiguang Lu, Xiaoping Lou, Fang Zou, Yanbin Yao, Zhaocai Du, "Matching Cost Filtering for Dense Stereo Correspondence", Mathematical Problems in Engineering, vol. 2013, Article ID 654139, 11 pages, 2013. https://doi.org/10.1155/2013/654139

Matching Cost Filtering for Dense Stereo Correspondence

Academic Editor: Vishal Bhatnaga
Received 04 Jul 2013
Accepted 27 Aug 2013
Published 30 Sep 2013

Abstract

Dense stereo correspondence, which enables the reconstruction of depth information in a scene, is of great importance in computer vision. Recently, some local solutions based on matching cost filtering with an edge-preserving filter have been shown to achieve higher accuracy than global approaches. Unfortunately, the computational complexity of these algorithms grows quadratically with the size of the window used to aggregate the matching costs, while the recent trend has been to pursue higher accuracy together with greater execution efficiency. Therefore, this paper proposes a new cost-aggregation module that computes the matching responses for all image pixels at a set of sampling points generated by a hierarchical clustering algorithm. The complexity of this implementation is linear in both the number of image pixels and the number of clusters. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art local methods in terms of both accuracy and speed. Moreover, performance tests indicate that parameters such as the height of the hierarchical binary tree and the spatial and range standard deviations have a significant influence on the run time and on the accuracy of the disparity maps.

1. Introduction

Stereo correspondence between two images yields a depth image, also called a disparity map, which can be either sparse or dense. Sparse disparity maps are obtained mainly with feature-based methods derived from human vision research [1]. These methods achieve high processing speeds and accurate disparities, but only at a limited set of points, which restricts their use in many applications. Dense stereo correspondence, which aims to determine which parts of one image correspond to which parts of the other, is a challenging problem in computer vision. The need for dense disparity maps is driven by many contemporary applications such as virtual reality, view synthesis, and vision-based robot navigation [2].

Dense stereo correspondence algorithms can be classified as global or local according to whether they compute disparities from global or local information. Global (energy-based) methods minimize a global cost function that combines matching costs and smoothness terms, using information derived from the whole image. These methods are very accurate but time consuming [3]. Local (area-based) methods, on the other hand, offer high speed at the expense of matching accuracy; they determine the disparity of each pixel from the information provided by the pixel itself and its neighbors. They are also referred to as window-based methods because the disparity computation between two matching pairs depends only on the intensity values within a matching window of fixed size and shape [4]. However, recent studies have shown that, by carefully selecting and aggregating the matching costs of neighboring pixels, a local approach can produce disparity maps that are more accurate than those generated by global methods [5]. The most noteworthy technique is local filtering, which effectively reduces matching noise and generates high-quality disparity maps.

This paper proposes a dense stereo correspondence approach, closely related to the original adaptive support weight (ASW) method [6], that obtains accurate disparity maps both at depth discontinuities and in smooth regions. The basic idea is to accept similar pixels within a matching window by assigning them relatively large support weights and to reject dissimilar pixels by giving them very small support weights, which requires dividing the neighboring pixels into similar and dissimilar groups. Here, the adaptive support weights are computed from the color image using a hierarchical clustering algorithm inspired by Gastal and Oliveira's work on real-time high-dimensional filtering of images and videos [7]; the filtered disparity maps are less noisy, and depth discontinuity boundaries are preserved well. In addition, the proposed algorithm improves on both the efficiency and the accuracy of the guided-image filter (GIF) [8] algorithm for stereo correspondence, which is by far the best existing algorithm.

The main contributions of this paper are as follows.
(1) A novel matching-cost filtering model is proposed based on an edge-preserving filter whose adaptive support weights are computed using a hierarchical clustering algorithm (Section 3.2). This solution reduces mismatches, especially around depth discontinuities, and reconstructs dense, high-accuracy disparity maps.
(2) The computational complexity of the proposed method is essentially linear in both the number of image pixels and the number of clusters, regardless of the matching window size and the intensity range (Section 3.3). Therefore, the method can easily be adapted to meet real-time requirements with contemporary graphics hardware (a graphics processing unit, GPU).
(3) A new disparity refinement method is presented, which proves robust and effective for improving the accuracy of coarse disparity maps (Section 3.5). This method can be applied to other coarse-to-fine frameworks, which are among the classic, simplest, and most popular stereo matching algorithms.
(4) The influence of the algorithm parameters on accuracy and efficiency is discussed, especially that of the weight coefficient, the height of the hierarchical binary tree, and the spatial and range standard deviations (Section 4.2). This study offers recommendations that can serve as a basis for future practical applications.

The rest of this paper is organized as follows. Section 2 reviews state-of-the-art local filtering methods, and Section 3 presents the proposed method. Section 4 presents experimental results that compare the proposed method with other state-of-the-art approaches and discusses the influence of the parameter settings. Finally, Section 5 gives conclusions and suggestions for future work.

In local algorithms, a disparity map is commonly obtained by selecting, for each pixel, the disparity with the lowest matching cost over a local matching window. Many local methods have recently been proposed to obtain dense disparity maps. For instance, adaptive-window methods [9, 10] try to find an optimal matching window for each pixel, and multiple-window methods [11] select an optimal window among multiple predefined windows of the same shape located at different positions. However, these methods share one limitation: the matching window is constrained to be rectangular, which is not appropriate for pixels near depth discontinuities. It is therefore difficult to find a matching window with an appropriate size and shape for all cases.

Instead of searching for an optimal matching window of arbitrary size and shape, the costs can be aggregated after local smoothing within a matching window to reduce matching noise. Most noise can be reduced effectively by a linear filter, such as a Gaussian filter, but the resulting disparity map always exhibits the well-known “edge-fattening” phenomenon, because the local filtering result is not a good representative of the neighborhood close to an edge. To address this problem, the recently proposed ASW algorithm [6] smooths the matching costs with an adaptive weighted filter in which the support weights are chosen according to both the color similarity and the Euclidean distance to the center pixel. This mimics the way humans weight pixels according to color or brightness when finding correspondences between the views of their two eyes. Such a filter is known as an edge-preserving filter in computer vision and is widely used for image denoising; examples include the SUSAN filter [12], the bilateral filter [13], and the nonlocal means filter [14, 15]. Experimental results show that this approach can produce disparity maps better than those generated using global optimization techniques without requiring many user-specified parameters. Although this method yields high-quality results, it is computationally expensive. Therefore, many improved and real-time solutions have been presented, such as the O(1) bilateral filter [16-18], the dual-cross-bilateral grid (DCBG) [19, 20], the GIF [21, 22], and the nonlocal filter [23].

3. Cost Aggregation with Local Filtering

A literature review [24] has provided a taxonomy and an evaluation of typical matching algorithms and has emphasized that such coarse-to-fine algorithms generally perform the following four steps:
(1) cost initialization, in which the matching costs for assigning different disparity hypotheses to different pixels are calculated;
(2) cost aggregation, in which the initial matching costs are aggregated spatially over matching windows;
(3) disparity optimization, in which a cost function is minimized to obtain the best disparity hypothesis for each pixel;
(4) disparity refinement, in which the coarse disparity maps are postprocessed to remove mismatches or to generate fine disparity maps.

Following these four steps, the cost aggregation with local filtering in this paper consists of five parts: matching cost initialization, cost aggregation with filtering, clustering of range values for the sampling points, disparity selection, and refinement. The computational complexity is also discussed.

3.1. Cost Initialization

Matching pairs in stereo images are generally identified by measuring their similarity. The most common matching cost functions used to establish a correspondence between two points are the sum of absolute intensity differences (SAD), the sum of squared intensity differences (SSD), and the normalized cross-correlation (NCC) [25].
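As an illustration only, the three classic window measures can be sketched in plain Python. The windows are assumed to be flattened lists of gray values of equal length, and the function names are hypothetical. SAD and SSD are costs (lower is better), whereas NCC is a similarity score (higher is better).

```python
import math


def sad(a, b):
    """Sum of absolute intensity differences between two equal-size windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of squared intensity differences between two equal-size windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]; 0.0 is returned for a flat window."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


left = [10, 20, 30, 40]
right = [12, 22, 32, 42]  # same pattern, brighter by 2
print(sad(left, right), ssd(left, right), ncc(left, right))  # 8 16 1.0
```

Because NCC subtracts the window means, the constant brightness offset leaves it at its maximum of 1, while SAD and SSD both penalize the offset.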

The cost initialization module computes the initial matching cost for assigning a disparity hypothesis $d = (d_x, d_y)$ to an image pixel $p = (x, y)$, where $d_x$ and $d_y$ define the displacements in the x- and y-directions, respectively. After stereo rectification there is no shift in the y-direction, only a displacement in the x-direction, so the cost can be written as $C(p, d)$ with a scalar disparity $d$. The costs are calculated from the truncated absolute differences of the range values (intensity or color) and of the gradients between corresponding pixels:

$$C(p,d) = \alpha \min\left(\lVert I_l(p) - I_r(p-d) \rVert, \tau_1\right) + (1-\alpha)\min\left(\lvert \nabla_x I_l(p) - \nabla_x I_r(p-d) \rvert, \tau_2\right), \tag{1}$$

where $\alpha$ is the weight coefficient, $I_l$ is the left image, $I_r(p-d)$ is the corresponding pixel of the right image at disparity $d$, $\nabla_x$ denotes the gray-scale gradient in the x-direction, and $\tau_1$ and $\tau_2$ are truncation values that balance the range and gradient terms. This matching cost model has been shown to be robust to illumination changes and is commonly used in stereo correspondence [26].
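A minimal sketch of the truncated cost in (1) for one rectified grayscale scanline is shown below. It is not the authors' MATLAB or CUDA code: the forward-difference gradient, the default values of alpha, tau1, and tau2, and the cost assigned when $p - d$ falls outside the image are all assumptions made for illustration.

```python
def x_gradient(row):
    """Forward-difference gradient along x; the last pixel reuses the previous difference."""
    if len(row) < 2:
        return [0.0] * len(row)
    g = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    return g + [g[-1]]


def cost_volume(left_row, right_row, max_disp, alpha=0.1, tau1=0.03, tau2=0.01):
    """C(p, d) of (1) for every pixel x of a scanline and d = 0..max_disp.

    alpha weights the truncated range term and 1 - alpha the truncated gradient
    term; the default parameter values are illustrative assumptions."""
    gl, gr = x_gradient(left_row), x_gradient(right_row)
    border = alpha * tau1 + (1 - alpha) * tau2  # hypothetical cost when x - d leaves the image
    volume = []
    for x in range(len(left_row)):
        costs = []
        for d in range(max_disp + 1):
            xr = x - d  # matching pixel p - d in the right image
            if xr < 0:
                costs.append(border)
                continue
            color = min(abs(left_row[x] - right_row[xr]), tau1)
            grad = min(abs(gl[x] - gr[xr]), tau2)
            costs.append(alpha * color + (1 - alpha) * grad)
        volume.append(costs)
    return volume


left = [0.1, 0.2, 0.8, 0.9, 0.3, 0.1]
right = left[1:] + [0.1]  # the right view is the left view shifted by one pixel
vol = cost_volume(left, right, max_disp=2)
```

For color images, the absolute range difference would be replaced by a norm over the channels, as the $\lVert \cdot \rVert$ in (1) indicates.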

3.2. Cost Aggregation with Filtering

The original local filtering approach computes the aggregated cost as a weighted average of the matching costs of adjacent pixels:

$$C'(i,d) = \frac{\sum_{j \in N_i} W_{ij}\, C(j,d)}{\sum_{j \in N_i} W_{ij}}, \tag{2}$$

where $i$ and $j$ are pixel indices and $N_i$ is the region (matching window) around the $i$-th pixel.

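The weights of this linear combination are given by two Gaussian kernels: a spatial weight based on the distance between the two pixels and a range weight based on their intensity difference. The filter weights can therefore be written as the product of a spatial and a range term:

$$W_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_s^2}\right)\exp\left(-\frac{\lVert I_i - I_j \rVert^2}{2\sigma_r^2}\right), \tag{3}$$

where $x_i$ and $I_i$ are the coordinates and the range value of pixel $i$, and $\sigma_s$ and $\sigma_r$ are two constants used to adjust the spatial and range similarities.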
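The brute-force aggregation in (2) with the weights in (3) can be sketched on a single scanline and a single disparity slice as follows. This is an illustrative sketch, not the authors' implementation; the toy guide and cost signals and all parameter values are assumptions. Note that the inner loop visits every pixel of the window, which is the per-pixel cost that grows with the window size.

```python
import math


def bilateral_weight(xi, xj, Ii, Ij, sigma_s, sigma_r):
    """W_ij of (3): spatial Gaussian on pixel distance times range Gaussian on intensity."""
    spatial = math.exp(-((xi - xj) ** 2) / (2 * sigma_s ** 2))
    rng = math.exp(-((Ii - Ij) ** 2) / (2 * sigma_r ** 2))
    return spatial * rng


def aggregate(costs, guide, radius, sigma_s, sigma_r):
    """C'(i, d) of (2) for one disparity slice along a scanline (window of 2 * radius + 1)."""
    n = len(costs)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = bilateral_weight(i, j, guide[i], guide[j], sigma_s, sigma_r)
            num += w * costs[j]
            den += w
        out.append(num / den)
    return out


guide = [0.2] * 5 + [0.8] * 5  # intensity step edge in the guidance image
costs = [0.0] * 5 + [1.0] * 5  # matching cost that changes at the same edge
agg = aggregate(costs, guide, radius=3, sigma_s=3.0, sigma_r=0.1)
blurred = aggregate(costs, guide, radius=3, sigma_s=3.0, sigma_r=1e6)  # nearly a plain Gaussian
```

Here agg[4] stays close to 0 because pixels across the edge receive almost no range weight, whereas blurred[4] is roughly 0.41: a plain Gaussian mixes costs across the edge, which is the edge-fattening effect described earlier.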

The Gaussian over the range similarity can be rewritten as a convolution of two Gaussian kernels:

$$\exp\left(-\frac{(I_i - I_j)^2}{2\sigma_r^2}\right) = \kappa \int_{-\infty}^{\infty} \exp\left(-\frac{(I_i - \eta)^2}{\sigma_r^2}\right)\exp\left(-\frac{(I_j - \eta)^2}{\sigma_r^2}\right) d\eta, \tag{4}$$

where $\kappa = \sqrt{2/\pi}/\sigma_r$ is a normalization factor and $\eta$ is a sampling range value. The integral in (4) can then be evaluated numerically with an approximation based on the Gauss-Hermite quadrature rule, (5), where $K$ is the number of sampling range values; increasing the number of sampling points gives a better approximation of the integral in (4). Assuming that pixel $i$ has a sampling set $\{\eta_k(i)\}_{k=1}^{K}$, the filter weights in (3) can be rewritten as a sum over these sampling points, (6). The normalization factor is omitted because both the numerator and the denominator in (2) contain it, so it cancels out after the division.
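The identity in (4) can be checked numerically. The sketch below deliberately swaps in a simple midpoint rule for the Gauss-Hermite rule named in the text, so it only illustrates the identity and the effect of the number of samples; the test values 0.3, 0.5, and 0.1 and the integration limits are arbitrary choices.

```python
import math


def range_kernel(a, b, sigma_r):
    """Left-hand side of (4): the range Gaussian exp(-(a - b)^2 / (2 sigma_r^2))."""
    return math.exp(-((a - b) ** 2) / (2 * sigma_r ** 2))


def range_kernel_as_integral(a, b, sigma_r, lo, hi, steps):
    """Right-hand side of (4), with the integral over eta replaced by a midpoint sum."""
    kappa = math.sqrt(2 / math.pi) / sigma_r  # normalization factor of (4)
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps):
        eta = lo + (s + 0.5) * h  # sampling range value
        total += math.exp(-((a - eta) ** 2) / sigma_r ** 2) * math.exp(-((b - eta) ** 2) / sigma_r ** 2)
    return kappa * total * h


exact = range_kernel(0.3, 0.5, 0.1)  # exp(-2)
fine = range_kernel_as_integral(0.3, 0.5, 0.1, lo=-1.0, hi=2.0, steps=30000)
coarse = range_kernel_as_integral(0.3, 0.5, 0.1, lo=0.0, hi=1.0, steps=5)
```

With many samples the sum reproduces the range Gaussian almost exactly, while five equally spaced samples give a visibly poorer value; this is the same trade-off between the number of sampling points and accuracy that the tree height controls later.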

3.3. Clustering Range Value for Sampling Points

As mentioned before, the key point of Yang’s algorithm [23] is that it accepts similar pixels within a matching window by assigning them relatively large support weights and rejects dissimilar pixels by giving them very small support weights. Clearly, this requires dividing neighboring pixels into similar and dissimilar groups. Inspired by this idea, the authors propose a hierarchical clustering algorithm similar to that developed by Gastal and Oliveira [7], which iteratively separates the whole set of image pixels into clusters according to their range values and performs cost aggregation with local filtering within these clusters. This is in effect an extension of adaptive manifold filtering to stereo correspondence, and it results in a modified clustering algorithm.

Assume that pixel $i$ and a neighboring pixel $j$ lie in the same cluster, that is, their $k$-th sampling points have similar range values:

$$\eta_k(i) \approx \eta_k(j). \tag{7}$$

Averaging values only from pixels that belong to the same cluster yields better estimates of the local filtering output. Therefore, after the range values have been clustered for the sampling points, the cost aggregation in (2) can be rewritten as (8) using the filter weights in (6) and the cluster constraint in (7). Whereas the per-pixel cost of the original bilateral filter in (3) grows quadratically with the matching-window size, the proposed filter in (8) reduces the complexity to $O(KN)$, where $K = 2^H - 1$ is the number of sampling points of a tree of height $H$ and $N$ is the number of pixels in the whole image.
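The complexity comparison can be made concrete with a rough operation count per disparity slice. This sketch ignores constant factors and assumes that each sampling point costs a constant amount of low-pass filtering work per pixel (for example, with a recursive or separable filter); the 19 × 19 window is a hypothetical example, not a setting from the paper.

```python
def num_sampling_points(height):
    """K = 2**H - 1: number of nodes in a full binary tree of height H."""
    return 2 ** height - 1


def nodes_at_level(k):
    """The k-th level of sampling points holds 2**(k - 1) nodes."""
    return 2 ** (k - 1)


def weight_evaluations(num_pixels, window_side, height):
    """Rough per-disparity work: brute-force window sum (2) versus clustered filter (8)."""
    brute_force = num_pixels * window_side ** 2  # grows quadratically with the window size
    clustered = num_pixels * num_sampling_points(height)  # O(KN), independent of the window
    return brute_force, clustered


# Tsukuba-sized image with a hypothetical 19 x 19 window and a tree of height 4.
bf, cl = weight_evaluations(384 * 288, window_side=19, height=4)
```

With $H = 4$, the clustered count is $15N$ regardless of the window, whereas the brute-force count here is $361N$ and keeps growing with the window side.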

After introducing the improved cost aggregation and complexity analysis, an algorithm for clustering the range values can be summarized as follows.

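Step 1. Generate the first sampling point $\eta_1(i)$ at pixel $i$ by low-pass filtering the input signal within the neighborhood $N_i$:

$$\eta_1(i) = \frac{\sum_{j \in N_i} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_s^2}\right) I_j}{\sum_{j \in N_i} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_s^2}\right)}, \tag{9}$$

where $I_j$ represents the range value at distance $\lVert x_i - x_j \rVert$ from pixel $i$.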

Step 2. Generate the $k$-th sampling point $\eta_k$. First, compute an optimal hyperplane with normal $v$, the eigenvector associated with the largest eigenvalue of the covariance matrix

$$\Sigma = \frac{1}{N}\sum_{i=1}^{N} (D_i - \mu)(D_i - \mu)^{T}, \tag{10}$$

where $D_i$ is the difference between the range value and the previous sampling point associated with each pixel $i$,

$$D_i = I_i - \eta_{k-1}(i), \tag{11}$$

and $\mu$ is the sum of the values $D_i$ divided by the number of pixels $N$:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} D_i. \tag{12}$$

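Step 3. Segment the pixels into two clusters $C_+$ and $C_-$ using the sign of the projection onto $v$:

$$C_+ = \{\, i \mid v^{T} D_i \ge 0 \,\}, \tag{13}$$

$$C_- = \{\, i \mid v^{T} D_i < 0 \,\}. \tag{14}$$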

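Step 4. Compute a new sampling point $\eta_{k+}$, also by low-pass filtering the input signal, but giving weight zero to pixels not in $C_+$:

$$\eta_{k+}(i) = \frac{\sum_{j \in N_i \cap C_+} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_s^2}\right)\psi_j\, I_j}{\sum_{j \in N_i \cap C_+} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_s^2}\right)\psi_j}. \tag{15}$$

The values $\psi_j$ are weights calculated from the range value and the previous sampling point:

$$\psi_j = \exp\left(-\frac{\lVert I_j - \eta_{k-1}(j) \rVert^2}{\sigma_r^2}\right). \tag{16}$$

Perform the same processing for $C_-$ using the pixels belonging to $C_-$; together, $\eta_{k+}$ and $\eta_{k-}$ form the whole set of sampling points $\eta_k$.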

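Step 5. The number of sampling range values $K$ determines whether more clusters are needed for the sampling points. Therefore, the procedure is repeated recursively from Step 2 onwards until the tree reaches height $H$, that is, until $K = 2^H - 1$ sampling points have been generated.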
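To make Steps 2 and 3 concrete for three-channel color values, the following standard-library sketch computes the differences from the previous sampling point, their mean and covariance, a dominant eigenvector, and the sign split. Power iteration is a choice of this sketch, not necessarily the eigen-solver the authors used, and the function names and toy colors are hypothetical.

```python
import math


def dominant_eigenvector(cov, iters=100):
    """Power iteration for the eigenvector of the largest eigenvalue of a symmetric 3 x 3 matrix."""
    v = [1.0, 0.5, 0.25]  # arbitrary start, assumed not orthogonal to the dominant eigenvector
    for _ in range(iters):
        w = [sum(cov[r][c] * v[c] for c in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate covariance: any direction will do
            return v
        v = [x / norm for x in w]
    return v


def split_cluster(colors, prev_eta):
    """Steps 2-3: split pixel indices by the sign of v^T D_i, where D_i = I_i - eta_{k-1}(i)."""
    diffs = [[c[t] - e[t] for t in range(3)] for c, e in zip(colors, prev_eta)]
    n = len(diffs)
    mu = [sum(d[t] for d in diffs) / n for t in range(3)]  # mean difference
    cov = [[sum((d[r] - mu[r]) * (d[c] - mu[c]) for d in diffs) / n for c in range(3)]
           for r in range(3)]  # covariance of the differences
    v = dominant_eigenvector(cov)  # normal of the splitting hyperplane
    plus, minus = [], []
    for i, d in enumerate(diffs):
        projection = sum(v[t] * d[t] for t in range(3))
        (plus if projection >= 0 else minus).append(i)
    return plus, minus


colors = [(0.9, 0.1, 0.1)] * 3 + [(0.1, 0.1, 0.9)] * 3  # three reddish and three bluish pixels
prev_eta = [(0.5, 0.1, 0.5)] * 6  # a shared previous sampling point
plus, minus = split_cluster(colors, prev_eta)
```

Because an eigenvector is defined only up to its sign, which half is labeled $C_+$ is a convention; the partition of the reddish and bluish pixels is what matters.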
Note that when the range value is a gray level, Steps 2 and 3 reduce directly to the sign (positive or negative) of the differences:

$$C_+ = \{\, i \mid I_i - \eta_{k-1}(i) \ge 0 \,\}, \qquad C_- = \{\, i \mid I_i - \eta_{k-1}(i) < 0 \,\}. \tag{17}$$

These five steps construct the hierarchical binary tree shown in Figure 1. The whole tree of height $H$ has $2^H - 1$ nodes, and the $k$-th sampling point $\eta_k$ has $2^{k-1}$ nodes. For example, for $k = 3$, the third sampling point $\eta_3$ has four nodes, $\eta_{++}$, $\eta_{+-}$, $\eta_{-+}$, and $\eta_{--}$; the first plus or minus subscript is inherited from the upper node, and the second is generated by the current clustering step. The remaining nodes are generated in the same manner.
At the top of the tree, the sampling points are better adapted to smooth regions; further down the tree, they become progressively better adapted to edge regions.
Figure 2 shows the first three levels of sampling points for the Tsukuba image, downloaded from the Middlebury benchmark database [27]. Because the range values are clustered around more than one sampling point, the filtering in (8) is guaranteed to be edge-preserving smoothing.
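The gray-level case of Steps 1 to 5 can be sketched on a 1-D signal as follows. This is a simplified illustration under stated assumptions: the low-pass filter is a truncated Gaussian with hypothetical parameters sigma_s = 2 and radius = 6, the $\psi$ weighting of Step 4 is omitted (cluster members get weight 1 and all other pixels 0), and a node whose cluster has no member inside a window falls back to its parent's value.

```python
import math


def masked_lowpass(signal, mask, sigma_s, radius, fallback=None):
    """Normalized Gaussian low-pass of a 1-D signal that ignores samples where mask is False."""
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            if mask[j]:
                w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
                num += w * signal[j]
                den += w
        if den > 0.0:
            out.append(num / den)
        else:  # no cluster member nearby: keep the parent sampling point
            out.append(fallback[i] if fallback is not None else signal[i])
    return out


def build_sampling_tree(signal, height, sigma_s=2.0, radius=6):
    """Gray-level tree of sampling points; keys are sign strings such as '', '+', '-+'."""
    everyone = [True] * len(signal)
    root = masked_lowpass(signal, everyone, sigma_s, radius)  # Step 1
    tree = {"": root}
    frontier = [("", root, everyone)]
    for _ in range(height - 1):  # Step 5: one new tree level per pass
        next_frontier = []
        for label, eta, members in frontier:
            # Steps 2-3 for gray values: split the cluster by the sign of I_i - eta(i)
            plus = [m and signal[i] >= eta[i] for i, m in enumerate(members)]
            minus = [m and signal[i] < eta[i] for i, m in enumerate(members)]
            for sign, sub in (("+", plus), ("-", minus)):
                child = masked_lowpass(signal, sub, sigma_s, radius, fallback=eta)  # Step 4
                tree[label + sign] = child
                next_frontier.append((label + sign, child, sub))
        frontier = next_frontier
    return tree


step_edge = [0.2] * 8 + [0.8] * 8
tree = build_sampling_tree(step_edge, height=3)
```

On the step edge, the root blends both sides near the edge (about 0.44 at index 7), while the '+' and '-' children stay close to 0.8 and 0.2 there, which mirrors the observation that nodes further down the tree adapt to edges. The tree of height 3 has $2^3 - 1 = 7$ nodes, with 1, 2, and 4 nodes per level.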

3.4. Disparity Optimization

Once the matching costs have been filtered using the clustering method, the disparity optimization step computes the disparity map using the local winner-takes-all (WTA) approach, which selects at each pixel the coarse disparity associated with the minimum cost value:

$$d_p = \arg\min_{d \in \{0, \dots, D-1\}} C'(p, d),$$

where $C'(p, d)$ is the matching cost obtained after cost aggregation for assigning a disparity hypothesis $d$ to pixel $p$, and $D$ is the number of disparity levels.
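The WTA rule is a per-pixel argmin over the aggregated cost volume. The minimal sketch below assumes the volume is stored as a list of per-pixel cost lists indexed by disparity; the tie-breaking rule is a choice of this sketch.

```python
def winner_takes_all(aggregated):
    """aggregated[p][d] is C'(p, d); return the disparity of minimum cost for every pixel.

    Ties are broken toward the smaller disparity, because min() keeps the first minimum."""
    return [min(range(len(costs)), key=costs.__getitem__) for costs in aggregated]


coarse = winner_takes_all([[0.5, 0.1, 0.3], [0.2, 0.2, 0.9], [0.7, 0.6, 0.4]])
```

Since no neighborhood term enters this selection, nothing enforces smoothness between adjacent pixels, which is why a refinement step follows.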

3.5. Disparity Refinement

The coarse disparity maps generated by WTA may contain some mismatches because local optimization does not enforce the smoothness constraint. Therefore, a two-step postprocessing method is proposed to obtain fine disparity maps.

The first step is a left-right cross-check for mismatches. Two disparity maps are computed, one with the left image and one with the right image as the reference. The left-right consistency check then divides all pixels into stable and unstable pixels: stable pixels have the same disparity value in the left and right disparity maps, and the remaining pixels are labeled unstable and represented by a value of zero at all disparity levels.
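The left-right consistency check can be sketched for one scanline as follows. The convention assumed here is that left pixel $x$ with disparity $d_L(x)$ matches right pixel $x - d_L(x)$, and the pixel is stable when the right disparity map returns the same value; the tolerance argument is a hypothetical addition.

```python
def left_right_check(disp_left, disp_right, tol=0):
    """Return a stability flag for each pixel of one scanline.

    Left pixel x matches right pixel x - d_L(x); it is stable when the right
    disparity map gives back the same disparity (within tol)."""
    stable = []
    for x, d in enumerate(disp_left):
        xr = x - d
        stable.append(0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol)
    return stable


flags = left_right_check([0, 1, 1, 2, 2], [1, 1, 2, 2, 2])
```

In this toy example, pixels 0 and 3 fail the check because the right map points back to a different disparity, so they would be treated as unstable in the second refinement step.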

Second, let $D_L$ represent the left disparity map. A new disparity space volume (DSV) [28] is then computed at each disparity level, separately for stable and unstable pixels. An edge-preserving filter such as GIF is applied to smooth the DSV at each disparity level, and each unstable pixel is assigned the disparity at which its smoothed DSV value is lowest.
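Because the exact DSV definition is not reproduced here, the following is only a hedged sketch of the filling idea on one scanline. It assumes DSV(p, d) = |d - D_L(p)| for stable pixels and 0 for unstable pixels, and it substitutes a plain box filter for the GIF; neither choice is taken from the paper.

```python
def refine_unstable(disp_left, stable, num_disp, radius=2):
    """Fill unstable pixels of one scanline from a smoothed disparity space volume (DSV).

    Assumptions: DSV(p, d) = |d - D_L(p)| for stable pixels and 0 for unstable pixels,
    and a box filter of width 2 * radius + 1 stands in for the edge-preserving GIF."""
    n = len(disp_left)
    dsv = [[abs(d - disp_left[p]) if stable[p] else 0.0 for d in range(num_disp)]
           for p in range(n)]
    refined = list(disp_left)
    for p in range(n):
        if stable[p]:
            continue  # stable pixels keep their cross-checked disparity
        lo, hi = max(0, p - radius), min(n, p + radius + 1)
        smoothed = [sum(dsv[q][d] for q in range(lo, hi)) for d in range(num_disp)]
        refined[p] = min(range(num_disp), key=smoothed.__getitem__)  # lowest smoothed DSV value
    return refined


refined = refine_unstable([3, 3, 0, 3, 3], [True, True, False, True, True], num_disp=6)
```

With the absolute-difference volume and a box filter, an unstable pixel receives a median-like vote of its stable neighbors; an edge-preserving filter such as GIF would additionally restrict that vote to neighbors of similar color.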

4. Experimental Results

In this section, the performance of the proposed method is evaluated on the Middlebury stereo benchmark, which provides stereo images with known ground truth [27]. The results are compared with other local filtering methods that have recently been shown, on the Middlebury benchmark website, to be among the best edge-preserving local stereo methods in terms of both speed and accuracy. The comparison therefore shows how well the proposed method performs among local stereo correspondence algorithms. Moreover, this section analyzes the impact of different parameter settings on the computational cost and on the accuracy of the dense disparity maps.

The proposed method was run with constant parameter settings for all four testing images: . To analyze and compare the quality of the stereo matching algorithms, a widely accepted quantitative evaluation criterion, the percentage of bad pixels (PBP), is used:

$$\mathrm{PBP} = \frac{1}{N}\sum_{(x,y)} \left[\, \lvert d_C(x,y) - d_T(x,y) \rvert > \delta_d \,\right] \times 100\%,$$

where $N$ is the total number of pixels, $d_C$ and $d_T$ are the computed disparity map and the ground truth, and $\delta_d$ is the absolute disparity error threshold. A value of $\delta_d = 1$ was chosen in these experiments because the same setting is used in several previously published studies. A smaller PBP therefore indicates a better-performing algorithm. PBP, which is considered the most representative measure of result quality, is used throughout this paper to make comparisons easier.
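The PBP criterion is simple to compute. The sketch below uses flattened disparity lists and adds a hypothetical optional mask so that the same function can score the nonoccluded, overall, or depth-discontinuous regions separately.

```python
def percentage_bad_pixels(computed, truth, delta=1.0, mask=None):
    """PBP: share (in %) of evaluated pixels whose absolute disparity error exceeds delta.

    mask is a hypothetical convenience argument that restricts the evaluation to a
    region such as the nonoccluded pixels."""
    pairs = [(c, t) for i, (c, t) in enumerate(zip(computed, truth)) if mask is None or mask[i]]
    bad = sum(1 for c, t in pairs if abs(c - t) > delta)
    return 100.0 * bad / len(pairs)


pbp_all = percentage_bad_pixels([1, 2, 3, 5], [1, 2, 5, 5])
pbp_masked = percentage_bad_pixels([1, 2, 3, 5], [1, 2, 5, 5], mask=[True, True, False, True])
```

The comparison is strict, so an error of exactly one disparity level is not counted as bad when $\delta_d = 1$.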

4.1. Comparison of Disparity Maps
4.1.1. Accuracy of the Dense Disparity Maps

The GIF-based cost-aggregation method and the proposed hierarchical clustering method were first used to aggregate the matching costs. Winner-takes-all and refinement operations were then used to obtain the dense disparity maps. As shown in Figure 3, both methods yield accurate results at depth discontinuities as well as in the smooth regions of the test images.

The corresponding quantitative results are presented in Table 1, which records the PBP in the nonoccluded (Non), overall (All), and depth-discontinuous (Disc) regions of the “Tsukuba,” “Venus,” “Teddy,” and “Cones” images. The rightmost column contains the average error (AE), computed as the mean PBP over all twelve columns. As the fourth and fifth rows of Table 1 show, the AE values obtained with the proposed method (HCN) and the GIF (GIN) without the refinement procedure are 7.77% and 8.78%, respectively. The first two rows show the errors obtained with the proposed method (HCR) and the GIF (GIR) with the refinement procedure; the AE values are 5.67% and 5.85%, respectively. This shows that the proposed method outperforms the GIF for filtering matching costs during cost aggregation. As expected, the proposed refinement method is effective at removing mismatches, and the improvement is evident. In particular, as Table 1 shows, HCR also outperforms the original ASW algorithm [6] (AE 6.66%) and the fast DCBG technique [20] (AE 10.89%). In the authors’ opinion, the proposed method may well achieve the top position among local stereo correspondence algorithms.


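Table 1: PBP (%) on the Middlebury test images (HCR and GIR: with refinement; HCN and GIN: without refinement).

| Method | Tsukuba Non | Tsukuba All | Tsukuba Disc | Venus Non | Venus All | Venus Disc | Teddy Non | Teddy All | Teddy Disc | Cones Non | Cones All | Cones Disc | AE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HCR | 1.56 | 1.78 | 8.07 | 0.22 | 0.34 | 2.96 | 6.36 | 11.93 | 15.62 | 2.88 | 8.14 | 8.22 | 5.67 |
| GIR | 1.87 | 2.23 | 7.92 | 0.27 | 0.47 | 2.60 | 6.74 | 12.28 | 16.20 | 2.94 | 8.35 | 8.36 | 5.85 |
| ASW | 1.38 | 1.85 | 6.90 | 0.71 | 1.19 | 6.13 | 7.88 | 13.30 | 18.60 | 3.97 | 9.79 | 8.26 | 6.66 |
| HCN | 2.14 | 2.94 | 9.16 | 1.25 | 1.94 | 9.36 | 7.22 | 15.28 | 17.96 | 3.41 | 12.93 | 9.61 | 7.77 |
| GIN | 2.53 | 3.32 | 8.63 | 1.98 | 3.13 | 15.81 | 8.35 | 16.87 | 18.81 | 3.64 | 12.64 | 9.70 | 8.78 |
| DCBG | 5.90 | 7.26 | 21.0 | 1.35 | 1.91 | 11.20 | 10.5 | 17.2 | 22.2 | 5.34 | 11.9 | 14.9 | 10.89 |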

To verify the stability of the algorithm, the GIF and the proposed method were compared on 27 additional Middlebury stereo images [27]. As described above, the PBP with a disparity error larger than one pixel over all regions was computed for each image and averaged over all 27 test images. The corresponding quantitative evaluation is summarized in Table 2. Note that both methods may be less accurate in large untextured regions, such as those in the Midd1 and Monopoly pairs; errors in untextured regions are due mostly to mismatches, which cause inconsistencies between the left and right disparity maps. Nevertheless, the proposed HC method is still the winner and slightly outperforms the GIF technique. Comparing HCN with HCR (AE 21.79% versus 14.92%) confirms that the proposed refinement method performs well.


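Table 2: PBP (%) in all regions for 27 additional Middlebury stereo pairs.

| Image | HCN | GIN | HCR | GIR |
|---|---|---|---|---|
| Aloe | 12.71 | 13.42 | 8.19 | 8.78 |
| Baby1 | 11.14 | 12.39 | 4.99 | 5.41 |
| Baby2 | 11.81 | 12.88 | 7.24 | 7.59 |
| Baby3 | 17.87 | 17.98 | 9.74 | 9.99 |
| Bowling1 | 26.70 | 27.37 | 20.69 | 20.26 |
| Bowling2 | 19.10 | 19.19 | 14.38 | 14.61 |
| Cloth1 | 9.71 | 10.36 | 4.61 | 5.03 |
| Cloth2 | 16.37 | 16.56 | 10.96 | 11.43 |
| Cloth3 | 11.15 | 11.30 | 5.34 | 5.43 |
| Cloth4 | 14.95 | 15.40 | 10.66 | 10.81 |
| Flowerpots | 23.60 | 23.41 | 18.48 | 18.81 |
| Lampshade1 | 23.03 | 24.13 | 15.86 | 16.67 |
| Lampshade2 | 30.95 | 32.64 | 23.46 | 24.01 |
| Midd1 | 45.66 | 46.58 | 43.95 | 44.35 |
| Midd2 | 41.66 | 42.90 | 37.30 | 38.62 |
| Monopoly | 36.51 | 34.78 | 22.71 | 25.01 |
| Plastic | 43.62 | 47.81 | 35.60 | 38.33 |
| Rocks1 | 11.90 | 11.72 | 5.72 | 5.55 |
| Rocks2 | 12.24 | 11.83 | 5.19 | 5.02 |
| Wood1 | 16.78 | 17.61 | 5.26 | 5.57 |
| Wood2 | 15.43 | 14.83 | 0.64 | 0.57 |
| Art | 26.26 | 26.41 | 18.60 | 18.59 |
| Books | 21.59 | 21.10 | 17.85 | 17.50 |
| Dolls | 17.32 | 16.68 | 11.90 | 11.96 |
| Laundry | 27.98 | 29.19 | 20.80 | 22.80 |
| Moebius | 20.32 | 20.06 | 14.68 | 14.22 |
| Reindeer | 21.96 | 23.12 | 8.19 | 8.36 |
| AE | 21.79 | 22.28 | 14.92 | 15.38 |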

4.1.2. Computational Complexity

We implemented two versions of the local matching filter described in this paper and tested them on the four benchmark images: a CPU version written in MATLAB and a GPU version written in CUDA. The performance figures reported in this paper were measured on a 2.99-GHz Intel Core 2 Duo processor with 3.25 GB of memory and on a GeForce 9500GT GPU with 512 MB of memory. All algorithms were run on the same test platform to ensure a fair comparison.

As the results in Table 3 show, the proposed method is slightly faster than GIF on the test images on both the CPU and the GPU platforms. The reason is that the total complexity of GIF on three-channel color images is O(17N) [21], whereas that of the proposed method is O(15N) for a tree height of $H = 4$ and therefore a constant $K = 2^4 - 1 = 15$. Moreover, the proposed method has the same linear time requirement as the GIF, regardless of the filter kernel size and the intensity range.


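Table 3: Run times (s) of GIF and the proposed method (HC) on the CPU and GPU platforms.

| Version | Method | Tsukuba | Venus | Teddy | Cones |
|---|---|---|---|---|---|
| CPU | GIF | 32 | 80 | 251 | 257 |
| CPU | HC | 28 | 62 | 204 | 203 |
| GPU | GIF | 0.330 | 0.543 | 1.677 | 1.695 |
| GPU | HC | 0.270 | 0.434 | 1.307 | 1.315 |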

Naturally, all run times increase with the size of the disparity volume; the “Tsukuba,” “Venus,” “Teddy,” and “Cones” volumes are 384 × 288 × 15, 434 × 383 × 19, 450 × 375 × 59, and 450 × 375 × 59, respectively. Our CPU implementation processes a 1-megapixel image in about 16 to 20 seconds, which is time consuming. Because our approach uses simple, parallel operations, the filter achieves significant performance gains on the GPU platform: filtering a 1-megapixel image takes 0.1 to 0.2 seconds, a speedup of 80 to 200 times over our CPU implementation.
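The per-image CPU-to-GPU speedups implied by Table 3 can be checked with a few lines of arithmetic. The numbers below are copied from Table 3; the ratios come out at roughly 97 to 156, inside the 80-to-200 range quoted above.

```python
# Run times in seconds copied from Table 3 (Tsukuba, Venus, Teddy, Cones).
cpu = {"GIF": [32, 80, 251, 257], "HC": [28, 62, 204, 203]}
gpu = {"GIF": [0.330, 0.543, 1.677, 1.695], "HC": [0.270, 0.434, 1.307, 1.315]}

# CPU time divided by GPU time, per method and image.
speedups = {m: [round(c / g, 1) for c, g in zip(cpu[m], gpu[m])] for m in cpu}
```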

Consequently, the proposed approach seems to perform slightly better than others in terms of accuracy and computational efficiency.

4.2. Influence of Parameter Settings
4.2.1. Robust Illumination-Independent Behavior

All the stereo benchmark images used in Section 4.1 were acquired under normal lighting conditions, with no significant variation in luminosity between the two images of a stereo pair. However, this condition often does not hold in real environments [29, 30]. Because of illumination effects, the color value is not always reliable for stereo matching. This is why the color term in (1) is supplemented by a gradient term, which is invariant to additive illumination changes.
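The invariance argument can be illustrated with a toy scanline. The sketch assumes zero disparity and a purely additive brightness offset of 0.05. It shows that the range term of (1) registers the offset while the gradient term does not. A multiplicative (percentage) change, as in the experiments below, scales the gradients instead, so this sketch covers only the additive case named in the text.

```python
def forward_gradient(row):
    """Forward difference along x; the last pixel reuses the previous difference."""
    g = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    return g + [g[-1]]


left = [0.10, 0.30, 0.35, 0.60, 0.62]
right = [v + 0.05 for v in left]  # same scene, zero disparity, additive brightness offset

range_term = [abs(a - b) for a, b in zip(left, right)]
gradient_term = [abs(a - b) for a, b in zip(forward_gradient(left), forward_gradient(right))]
```

Every entry of range_term equals the offset, whereas gradient_term is zero up to floating-point rounding.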

To confirm that the proposed method is robust when applied to illumination-variant stereo pairs, Table 4 presents the PBP results on altered Tsukuba images for different values of the weight coefficient $\alpha$. Following Nalpantidis [29], each stereo pair consists of the left image of the Tsukuba set and one of several versions of the right image whose luminosity was altered from −25% to +25% in 5% increments.


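Table 4: PBP (%) on altered Tsukuba images for different weight coefficients $\alpha$ (rows: luminosity alteration of the right image).

| Alteration | $\alpha=0$ Non | $\alpha=0$ All | $\alpha=0$ Disc | $\alpha=0.1$ Non | $\alpha=0.1$ All | $\alpha=0.1$ Disc | $\alpha=0.5$ Non | $\alpha=0.5$ All | $\alpha=0.5$ Disc | $\alpha=0.9$ Non | $\alpha=0.9$ All | $\alpha=0.9$ Disc | $\alpha=1$ Non | $\alpha=1$ All | $\alpha=1$ Disc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| −25% | 3.33 | 4.24 | 12.27 | 3.86 | 4.82 | 13.50 | 15.7 | 16.8 | 27.2 | 40.9 | 41.6 | 48.1 | 67.0 | 67.3 | 69.9 |
| −20% | 3.12 | 4.00 | 11.96 | 3.47 | 4.36 | 12.80 | 17.0 | 18.1 | 27.9 | 40.8 | 41.5 | 46.5 | 60.2 | 60.6 | 62.6 |
| −15% | 2.97 | 3.84 | 11.65 | 3.27 | 4.10 | 11.86 | 18.8 | 19.7 | 29.0 | 44.8 | 45.3 | 46.6 | 55.4 | 55.8 | 55.4 |
| −10% | 3.03 | 3.93 | 11.87 | 3.43 | 4.22 | 11.23 | 18.0 | 18.7 | 25.3 | 41.4 | 41.8 | 41.9 | 44.7 | 45.0 | 43.7 |
| −5% | 3.08 | 4.07 | 12.24 | 3.25 | 4.01 | 10.50 | 12.5 | 13.3 | 19.2 | 25.1 | 25.7 | 27.9 | 26.4 | 27.0 | 29.0 |
| 0% | 3.21 | 4.21 | 12.36 | 2.14 | 2.94 | 9.16 | 1.95 | 2.75 | 9.18 | 2.19 | 3.09 | 9.63 | 2.24 | 3.15 | 9.73 |
| 5% | 3.33 | 4.32 | 12.47 | 2.25 | 3.09 | 9.79 | 10.9 | 11.7 | 16.3 | 21.3 | 21.9 | 23.5 | 22.4 | 23.0 | 24.2 |
| 10% | 3.51 | 4.54 | 13.12 | 3.22 | 4.10 | 11.45 | 19.8 | 20.5 | 23.6 | 41.2 | 41.4 | 38.1 | 42.6 | 42.8 | 39.3 |
| 15% | 3.85 | 4.88 | 13.74 | 3.88 | 4.82 | 13.03 | 20.5 | 21.3 | 26.3 | 49.5 | 49.9 | 46.0 | 52.3 | 52.6 | 48.8 |
| 20% | 4.09 | 5.14 | 14.39 | 4.23 | 5.24 | 14.17 | 19.0 | 20.1 | 29.6 | 54.4 | 54.9 | 53.7 | 59.2 | 59.5 | 59.0 |
| 25% | 4.54 | 5.62 | 15.32 | 4.67 | 5.76 | 15.37 | 20.0 | 21.1 | 31.6 | 53.1 | 53.7 | 54.3 | 62.0 | 62.5 | 62.6 |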

Table 4 shows that the algorithm based only on color ($\alpha = 1$) produces many false matches under nonuniform lighting, while the quality of the algorithm that relies only on the gradient ($\alpha = 0$) remains almost the same for every tested lighting condition. Moreover, the algorithm that combines color with gradient produces the best results under ideal lighting conditions (0%). As a result, the proposed method (with $\alpha = 0.1$) is only mildly affected by differences in lighting conditions while maintaining satisfactory accuracy.

4.2.2. Selection of the Tree Height

The first step is to examine how the tree height affects the performance of the proposed method. “Tsukuba” was chosen as the test image, and the GPU run time and the PBP of the disparity maps were recorded for increasing tree heights, as shown in Table 5. The spatial parameter $\sigma_s$ and the range parameter $\sigma_r$ are held constant.


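Table 5: GPU run time and PBP (%) for “Tsukuba” with increasing tree height.

| Height | Time (s) | Non | All | Disc |
|---|---|---|---|---|
| 1 | 0.031 | 4.39 | 5.91 | 20.63 |
| 2 | 0.056 | 3.00 | 3.99 | 13.78 |
| 3 | 0.128 | 2.32 | 3.21 | 9.93 |
| 4 | 0.270 | 2.14 | 2.94 | 9.16 |
| 5 | 0.499 | 2.05 | 2.78 | 8.90 |
| 6 | 1.203 | 2.02 | 2.68 | 8.99 |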

The second column shows that the run time of the proposed algorithm increases greatly with the tree height: because the number of sampling points grows with the height, the larger number of summations in (8) becomes time consuming. In contrast, the accuracy of the disparity maps in the nonoccluded, overall, and depth-discontinuous regions, shown in the last three columns, improves dramatically with increasing height. The reason, as mentioned before, is that more sampling points reduce the error between the continuous integral (4) and the discrete summation (5). Figure 4 shows the first three levels of weights (16) for the test image, corresponding to the sampling tree in Figure 1. Similar pixels with relatively large weights are shown in white, while black denotes dissimilar pixel areas with very small weights. Moving down the tree, large weights are gradually assigned to edge regions, which preserves image integrity. For example, missing information in the edge regions of the lamp, which appear black (very small weights) at one node, can be compensated by the detail in the white regions (large weights) at other nodes of the tree.

However, the accuracy improves only slightly, or even worsens, between $H = 4$ and $H = 6$, because the constant spatial and range parameters are not suited to these tree heights. To confirm this cause, one of the two parameters was kept constant while the other was retuned for the larger trees; the PBP values for “non” then improve to 1.98 and 1.95, respectively. The spatial and range parameters therefore also affect performance, as discussed further below.

4.2.3. Influences of $\sigma_s$ and $\sigma_r$

$\sigma_s$ and $\sigma_r$ are the two standard deviations used to adjust the spatial similarity and the range similarity, respectively. The spatial spread $\sigma_s$ is chosen according to the desired amount of low-pass filtering: a large $\sigma_s$ removes more high-frequency components and blurs the image more. Similarly, the range spread $\sigma_r$ is set according to the desired amount of mixing of pixel range values. Generally speaking, pixels whose range differences are less than $\sigma_r$ are mixed together, and those with differences greater than $\sigma_r$ are suppressed [13].

Varying $\sigma_s$ and $\sigma_r$ in (8) is equivalent to adjusting the spatial and range spreads of a bilateral filter. However, changes in $\sigma_s$ and $\sigma_r$ also have a significant influence on the clustering weights (16). To analyze the error sources qualitatively, the following two propositions are defined.

Proposition 1. More sampling points are needed for good accuracy when the range spread $\sigma_r$ is small or the spatial spread $\sigma_s$ is large. If the tree height is kept constant, the matching error then suffers from the edge-losing effect (ELE).

Proof. By (16), the weight of each pixel is reduced when $\sigma_r$ is small relative to the overall range of values in the image, or when the set of sampling points differs from the image values because $\eta$ becomes more blurred with a larger $\sigma_s$ in (9) or (15). Moreover, each $\eta$ covers a limited sampling region, so more sampling points are needed to adapt to the signal [7].

Proposition 2. The filter weights (6) of the proposed method behave more like a low-pass filter when the range spread $\sigma_r$ is large or the spatial spread $\sigma_s$ is small. The matching error is then caused by the edge-smoothing effect (ESE).

Proof. By (16), the weights of all pixels increase when $\sigma_r$ is large relative to the overall range of values in the image, or when the set of sampling points is similar to the image values because $\eta$ is less blurred with a smaller $\sigma_s$ in (9) or (15). All pixel values in any given neighborhood then receive approximately the same range weight in (6), and the resulting filter approximates a standard Gaussian filter [13].

“Tsukuba” was chosen as the test image. A fast way to determine the best choice of σ_s and σ_r is to use the peak signal-to-noise ratio (PSNR) [31] of the filtering results:

$$\mathrm{PSNR} = 10\log_{10}\frac{255^{2}}{\frac{1}{N}\sum_{p}\bigl(F(p)-G(p)\bigr)^{2}},$$

where N is the image size, F is the local filtering result (as in Figures 3(a)–3(d)), and G is the ground truth (as in Figure 3(e)). Table 6 shows the PSNR distributions for σ_s ∈ {1, 10, 20, …, 90} and σ_r ∈ {0.01, 0.1, 0.2, …, 0.9}. The following can be determined.
(1) The PSNR decreases as σ_s or σ_r becomes smaller when σ_s ≤ 10 and σ_r ≤ 0.1. The former is explained by ESE, in line with Proposition 2: the proposed method behaves more like a low-pass filter as σ_s decreases. The latter is explained by ELE, in line with Proposition 1: with a limited number of sampling points, the filtering results lack information around edge regions, which reduces accuracy.
(2) The PSNR decreases with increasing σ_s or σ_r when σ_s ≥ 10 and σ_r ≥ 0.1. This agrees with Propositions 1 and 2: accuracy is reduced by ELE when σ_s is large in a constant-height tree and by ESE when σ_r is large for each sampling point.


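Table 6: PSNR (dB) of the filtering results for “Tsukuba” (rows: σ_r; columns: σ_s).

σ_r \ σ_s |     1 |    10 |    20 |    30 |    40 |    50 |    60 |    70 |    80 |    90
0.01      | 11.63 | 12.99 | 13.10 | 13.21 | 13.24 | 13.28 | 13.46 | 13.38 | 13.08 | 12.73
0.1       | 12.71 | 14.01 | 14.06 | 14.07 | 14.01 | 14.00 | 14.00 | 13.86 | 13.86 | 13.87
0.2       | 12.87 | 13.99 | 13.99 | 13.97 | 13.89 | 13.85 | 13.83 | 13.67 | 13.64 | 13.62
0.3       | 12.93 | 13.96 | 13.93 | 13.88 | 13.77 | 13.70 | 13.65 | 13.47 | 13.43 | 13.39
0.4       | 12.96 | 13.94 | 13.88 | 13.81 | 13.70 | 13.62 | 13.56 | 13.38 | 13.33 | 13.29
0.5       | 12.98 | 13.93 | 13.85 | 13.77 | 13.64 | 13.56 | 13.50 | 13.32 | 13.27 | 13.23
0.6       | 12.99 | 13.92 | 13.83 | 13.73 | 13.60 | 13.52 | 13.45 | 13.27 | 13.22 | 13.18
0.7       | 12.99 | 13.91 | 13.81 | 13.71 | 13.57 | 13.49 | 13.42 | 13.24 | 13.19 | 13.15
0.8       | 13.00 | 13.91 | 13.80 | 13.69 | 13.55 | 13.46 | 13.39 | 13.21 | 13.16 | 13.12
0.9       | 13.00 | 13.91 | 13.79 | 13.68 | 13.53 | 13.44 | 13.37 | 13.19 | 13.13 | 13.09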

From these two findings, the optimal values of σ_s and σ_r are confirmed to be approximately 10 and 0.1, respectively (the entry for σ_s = 10 and σ_r = 0.1 in Table 6).
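The PSNR criterion behind Table 6 can be sketched as follows. This assumes the standard definition with a peak value of 255 for 8-bit data. The map contents and the helper name `psnr` are hypothetical.

```python
import math


def psnr(result, ground_truth, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 2-D maps.

    peak is the largest representable value; 255 (8-bit data) is an
    assumption, not a value taken from the paper.
    """
    sq_err = 0.0
    count = 0
    for row_f, row_g in zip(result, ground_truth):
        for f, g in zip(row_f, row_g):
            sq_err += (f - g) ** 2
            count += 1
    mse = sq_err / count  # mean squared error over the N pixels
    if mse == 0:
        return math.inf  # identical maps
    return 10.0 * math.log10(peak ** 2 / mse)


ground_truth = [[10, 20], [30, 40]]
filtered = [[12, 20], [30, 38]]  # hypothetical filtering result
print(f"{psnr(filtered, ground_truth):.2f} dB")
```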

The PBP distributions for the “non,” “all,” and “disc” disparity maps were then recorded with the tree height fixed and with σ_s and σ_r varied over the same ranges as in Table 6, as shown in Figure 5. The results in Figure 5 can be summarized as follows.
(1) All the PBP curves follow the PSNR results: the PBP values increase as σ_s or σ_r becomes smaller, and they decrease with increasing σ_s or σ_r only up to a certain point, which constitutes the best parameter setting. Beyond that point, the PBP values gradually increase.
(2) Figure 5(c) differs more markedly from the first two PBP plots because it is calculated only over the edge (depth-discontinuity) regions: the accuracy reductions caused by ESE or ELE in the nonoccluded and overall regions are smaller than those in the depth-discontinuous regions.
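The PBP values count pixels whose disparity error exceeds a threshold within a region mask. The following minimal sketch assumes the 1-pixel threshold commonly used on the Middlebury benchmark [27], and uses toy masks standing in for the “non,” “all,” and “disc” regions. The function name and data are hypothetical.

```python
def percentage_bad_pixels(disparity, ground_truth, mask, threshold=1.0):
    """Percentage of bad pixels (PBP) inside a region mask.

    A pixel is 'bad' when |disparity - ground_truth| > threshold. The
    1-pixel default follows the usual Middlebury convention (an assumption
    here). mask selects the region: nonoccluded, all, or near discontinuities.
    """
    total = bad = 0
    for d_row, g_row, m_row in zip(disparity, ground_truth, mask):
        for d, g, m in zip(d_row, g_row, m_row):
            if m:
                total += 1
                if abs(d - g) > threshold:
                    bad += 1
    return 100.0 * bad / total if total else 0.0


ground_truth = [[5, 5, 9], [5, 9, 9]]
disparity = [[5, 7, 9], [5, 9, 6]]
all_mask = [[1, 1, 1], [1, 1, 1]]
disc_mask = [[0, 1, 1], [0, 0, 1]]  # hypothetical pixels near a depth edge
print(percentage_bad_pixels(disparity, ground_truth, all_mask))
print(percentage_bad_pixels(disparity, ground_truth, disc_mask))
```

Because errors concentrate near depth edges, the same two bad pixels weigh more heavily in the smaller "disc" mask than in the "all" mask.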

Consequently, accuracy decreases when σ_s and σ_r become too small or too large for a constant-height tree. In terms of computational cost, the running time depends linearly on the image size, regardless of the filter kernel used for each sampling layer. The authors therefore suggest first determining the tree height according to the time consumption and then using the PSNR of the filtering results to choose σ_s and σ_r.
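The suggested parameter-selection procedure (fix the tree height from the time budget, then pick σ_s and σ_r by PSNR) amounts to a grid search. In the sketch below, `evaluate` is a hypothetical callback standing in for running the filter and computing PSNR. `toy_psnr` is a made-up score used only to exercise the search, not data from the paper; the grid mirrors the ranges of Table 6.

```python
def select_sigmas(evaluate, sigma_s_values, sigma_r_values):
    """Exhaustive grid search for the (sigma_s, sigma_r) pair with the best score.

    evaluate(sigma_s, sigma_r) is a hypothetical callback that runs the filter
    with a tree height chosen beforehand (from the time budget) and returns the
    PSNR of its output against the ground truth.
    Returns (best_score, best_sigma_s, best_sigma_r).
    """
    best = None
    for s in sigma_s_values:
        for r in sigma_r_values:
            score = evaluate(s, r)
            if best is None or score > best[0]:
                best = (score, s, r)
    return best


# Same grid as Table 6.
sigma_s_grid = [1] + list(range(10, 100, 10))
sigma_r_grid = [0.01] + [round(0.1 * k, 1) for k in range(1, 10)]


def toy_psnr(sigma_s, sigma_r):
    # Stand-in score with a peak at (10, 0.1); NOT data from the paper.
    return 14.0 - 0.001 * (sigma_s - 10) ** 2 - 10.0 * (sigma_r - 0.1) ** 2


print(select_sigmas(toy_psnr, sigma_s_grid, sigma_r_grid))
```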

5. Conclusions and Future Work

In this paper, a new local solution for fast, high-quality dense stereo correspondence has been proposed, focusing on a matching cost filtering method based on a high-performance hierarchical clustering algorithm. Instead of filtering the matching costs with an edge-preserving smoothing operator such as the popular bilateral filter, the cost aggregation model was adjusted to compute the matching responses for all image pixels at a set of sampling points generated by a clustering method. The computational complexity of this filtering is linear in both the number of image pixels and the number of clusters. The comparative experiments have demonstrated that the proposed method outperforms the GIF-based matching algorithm, one of the best local methods on the Middlebury benchmark, in terms of both speed and accuracy. Moreover, the performance tests, which provide effective guidelines for parameter selection, indicate that good accuracy depends strongly on the weight coefficient, the height of the hierarchical binary tree, and the spatial and range standard deviations. The proposed approach is therefore capable of high-speed processing and of producing high-quality disparity maps for dense stereo correspondence.
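The linear cost in both the number of pixels N and the number of sampling points K can be illustrated with a simplified range-sampling filter. The sketch below is not the paper's hierarchical clustering. It assumes a fixed, hand-picked list of range samples, Gaussian range weights with standard deviation σ_r, and a forward-backward first-order recursive filter (hypothetical parameter `a`) as an O(N) stand-in for the spatial Gaussian. Each sample costs one splat, blur, and slice pass, so the total work is O(N·K) regardless of the spatial extent.

```python
import math


def recursive_smooth(x, a):
    """Forward + backward first-order recursive low-pass filter.

    a in (0, 1) sets the spatial extent (larger a = wider smoothing);
    the cost per sample is constant, whatever the value of a.
    """
    fwd = list(x)
    for i in range(1, len(x)):
        fwd[i] = (1 - a) * x[i] + a * fwd[i - 1]
    out = list(fwd)
    for i in range(len(x) - 2, -1, -1):
        out[i] = (1 - a) * fwd[i] + a * out[i + 1]
    return out


def sampled_bilateral_1d(signal, samples, a, sigma_r):
    """Splat-blur-slice approximation of a bilateral filter over K range samples."""
    num = [0.0] * len(signal)
    den = [0.0] * len(signal)
    for v in samples:  # K passes, each linear in the number of pixels
        # Splat: Gaussian range weight of every pixel w.r.t. sample value v.
        w = [math.exp(-((s - v) ** 2) / (2 * sigma_r ** 2)) for s in signal]
        # Blur: spatially smooth weighted values and weights.
        blurred_wi = recursive_smooth([wi * s for wi, s in zip(w, signal)], a)
        blurred_w = recursive_smooth(w, a)
        # Slice: read the responses back with the same weights.
        for i, wi in enumerate(w):
            num[i] += wi * blurred_wi[i]
            den[i] += wi * blurred_w[i]
    return [n / d for n, d in zip(num, den)]


step = [0.0] * 8 + [1.0] * 8
out = sampled_bilateral_1d(step, samples=[0.0, 1.0], a=0.8, sigma_r=0.1)
print([round(v, 3) for v in out])
```

With only two range samples and a small σ_r, the step edge survives the smoothing; adding samples raises K and the cost proportionally, while widening `a` leaves the cost unchanged.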

The experimental results show that both the GI and HC filtering methods produce some erroneous disparity values due to a lack of texture, which is a traditional challenge for stereo algorithms. The reason is that each pixel's disparity is obtained by selecting the point with the highest matching score, independently of the disparity assignments of neighboring pixels. Hence, most of the disparity values in low-texture areas may be incorrect with a local matching method. To overcome this bottleneck, the authors plan to make the algorithm capable of handling large untextured regions, which remains an active area of future research [32].
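The per-pixel decision described above is a winner-take-all (WTA) selection over the filtered cost volume; choosing the highest matching score is equivalent to choosing the lowest aggregated cost. In this sketch the cost-volume layout `cost_volume[d][y][x]` and the toy values are assumptions. The pixel whose costs are identical at every disparity mimics a textureless region: the choice there is arbitrary (ties resolve to the smallest disparity), which illustrates why local methods struggle in such areas.

```python
def winner_take_all(cost_volume):
    """Per-pixel winner-take-all disparity selection.

    cost_volume[d][y][x] is the aggregated (filtered) matching cost of pixel
    (x, y) at disparity d; the lowest cost corresponds to the highest
    matching score. Each pixel is decided independently of its neighbours,
    and ties (flat cost curves) resolve to the smallest disparity.
    """
    num_disparities = len(cost_volume)
    height, width = len(cost_volume[0]), len(cost_volume[0][0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            costs = [cost_volume[d][y][x] for d in range(num_disparities)]
            disparity[y][x] = min(range(num_disparities), key=costs.__getitem__)
    return disparity


cost_volume = [
    [[3.0, 1.0], [2.0, 2.0]],  # d = 0
    [[1.0, 1.0], [0.5, 2.0]],  # d = 1
    [[2.0, 4.0], [3.0, 2.0]],  # d = 2
]
print(winner_take_all(cost_volume))
```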

Acknowledgments

This work was supported by the open project of Beijing Key Laboratory on Measurement and Control of Mechanical and Electrical System (no. KF20121123206), Key Laboratory of Modern Measurement and Control Technology (BISTU), Ministry of Education, Funding Project for Academic Human Resources Development Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (no. PHR201106130), and Funding Project of Beijing Municipal Science & Technology Commission (no. Z121100001612011).

References

  1. S. Y. Chen, H. Tong, and C. Cattani, “Markov models for image labeling,” Mathematical Problems in Engineering, vol. 2012, Article ID 814356, 18 pages, 2012.
  2. J. A. Kalomiros, “Dense disparity features for fast stereo vision,” Journal of Electronic Imaging, vol. 21, no. 4, Article ID 043023, 2012.
  3. S. Park and H. Jeong, “High-speed parallel very large scale integration architecture for global stereo matching,” Journal of Electronic Imaging, vol. 17, no. 1, Article ID 010501, 2008.
  4. N. Lazaros, G. C. Sirakoulis, and A. Gasteratos, “Review of stereo vision algorithms: from software to hardware,” International Journal of Optomechatronics, vol. 2, no. 4, pp. 435–462, 2008.
  5. M. Gong, R. Yang, L. Wang, and M. Gong, “A performance study on different cost aggregation approaches used in real-time stereo matching,” International Journal of Computer Vision, vol. 75, no. 2, pp. 283–296, 2007.
  6. K.-J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 650–656, 2006.
  7. E. S. L. Gastal and M. M. Oliveira, “Adaptive manifolds for real-time high-dimensional filtering,” ACM Transactions on Graphics, vol. 31, no. 4, 2012.
  8. C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, “Fast cost-volume filtering for visual correspondence and beyond,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 504–511, 2012.
  9. T. Kanade and M. Okutomi, “Stereo matching algorithm with an adaptive window: theory and experiment,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 920–932, 1994.
  10. K.-H. Bae, J.-J. Kim, and E.-S. Kim, “New disparity estimation scheme based on adaptive matching windows for intermediate view reconstruction,” Optical Engineering, vol. 42, no. 6, pp. 1778–1786, 2003.
  11. S. A. Adhyapak, N. Kehtarnavaz, and M. Nadin, “Stereo matching via selective multiple windows,” Journal of Electronic Imaging, vol. 16, no. 1, Article ID 013012, 2007.
  12. S. M. Smith and J. M. Brady, “SUSAN—a new approach to low level image processing,” International Journal of Computer Vision, vol. 23, no. 1, pp. 45–78, 1997.
  13. C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 839–846, January 1998.
  14. A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 60–65, June 2005.
  15. Y. S. Heo, K. M. Lee, and S. U. Lee, “Simultaneous depth reconstruction and restoration of noisy stereo images using non-local pixel distribution,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
  16. F. Porikli, “Constant time O(1) bilateral filtering,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
  17. Q. Yang, K.-H. Tan, and N. Ahuja, “Real-time O(1) bilateral filtering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 557–564, Miami, Fla, USA, June 2009.
  18. M.-H. Ju and H.-B. Kang, “Constant time stereo matching,” in Proceedings of the 13th International Machine Vision and Image Processing Conference (IMVIP '09), pp. 13–17, Dublin, Republic of Ireland, September 2009.
  19. J. Chen, S. Paris, and F. Durand, “Real-time edge-aware image processing with the bilateral grid,” ACM Transactions on Graphics, vol. 26, no. 3, Article ID 1276506, pp. 103-1–103-9, 2007.
  20. C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. A. Dodgson, “Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid,” in Proceedings of the 11th European Conference on Computer Vision (ECCV '10), vol. 6313 of Lecture Notes in Computer Science, pp. 510–523, Springer, Heraklion, Greece, 2010.
  21. K. He, J. Sun, and X. Tang, “Guided image filtering,” in Proceedings of the 11th European Conference on Computer Vision (ECCV '10), vol. 6311 of Lecture Notes in Computer Science, pp. 1–14, Springer, Heraklion, Greece, 2010.
  22. L. De-Maeztu, S. Mattoccia, A. Villanueva, and R. Cabeza, “Linear stereo matching,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1708–1715, Barcelona, Spain, November 2011.
  23. Q. Yang, “A non-local cost aggregation method for stereo matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 1402–1409, Providence, RI, USA, 2012.
  24. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7–42, 2002.
  25. H. Hirschmüller and D. Scharstein, “Evaluation of cost functions for stereo matching,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.
  26. T. Brox and J. Malik, “Large displacement optical flow: descriptor matching in variational motion estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, pp. 500–513, 2011.
  27. “Middlebury stereo vision database,” http://vision.middlebury.edu/stereo/.
  28. K. Mühlmann, D. Maier, J. Hesser, and R. Männer, “Calculating dense disparity maps from color stereo images, an efficient implementation,” International Journal of Computer Vision, vol. 47, no. 1-3, pp. 79–88, 2002.
  29. L. Nalpantidis and A. Gasteratos, “Stereo vision for robotic applications in the presence of non-ideal lighting conditions,” Image and Vision Computing, vol. 28, no. 6, pp. 940–951, 2010.
  30. L. Nalpantidis and A. Gasteratos, “Biologically and psychophysically inspired adaptive support weights algorithm for stereo correspondence,” Robotics and Autonomous Systems, vol. 58, no. 5, pp. 457–464, 2010.
  31. K. Bae, J. Ko, and J. Lee, “Stereo image reconstruction using regularized adaptive disparity estimation,” Journal of Electronic Imaging, vol. 16, no. 1, Article ID 013013, 2007.
  32. Q. Yang, L. Wang, R. Yang, H. Stewénius, and D. Nistér, “Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 492–504, 2009.

Copyright © 2013 Yimin Lin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

