Abstract

In traditional adaptive-weight stereo matching, the rectangular support region consumes excessive memory and computation time. We propose a novel line-based stereo matching algorithm that obtains a more accurate disparity map with low computational complexity. The algorithm consists of two steps: disparity map initialization and disparity map refinement. In the initialization step, a new adaptive-weight model based on a linear support region is put forward for cost aggregation. In this model, a neural network is used to evaluate the spatial proximity, and the mean-shift segmentation method is used to improve the accuracy of the color similarity; the Birchfield pixel dissimilarity function and the census transform are combined to establish the dissimilarity measurement function. The initial disparity map is then obtained by loopy belief propagation. In the refinement step, the disparity map is optimized by an iterative left-right consistency checking method and a segmentation voting method. The parameter values involved in this algorithm are determined through extensive simulation experiments to further improve the matching quality. Simulation results indicate that the new matching method performs well on standard stereo benchmarks and that its running time is remarkably lower than that of the algorithm with a rectangle-shaped support region.

1. Introduction

Stereo vision is a fundamental technique for extracting 3D information about a scene from two or more 2D images. It is widely applied in robot navigation, remote sensing, and industrial automation. One of the key technologies of stereo vision is stereo matching, which produces a disparity map. Stereo matching algorithms can be classified into two broad categories: global-based and local-based algorithms.

Global-based matching algorithms follow the energy minimization principle. First, an energy function consisting of a data term and a smoothness term is established. Next, this function is minimized with a global optimization method. Dynamic programming [1], loopy belief propagation (LBP) [2, 3], and graph cut [4, 5] are commonly employed to find the minimum of this energy. Because a global-based algorithm exploits comprehensive global constraint information, it can produce a more accurate disparity map.
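
For reference, the energy minimized by global-based methods is commonly written in the generic form below. The symbols $D_p$, $V$, $\lambda$, and $\mathcal{N}$ are notation assumed here for illustration only; the individual methods in [1-5] differ in how they define each term.

```latex
E(d) = \sum_{p} D_p(d_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V(d_p, d_q)
```

Here $d$ assigns a disparity $d_p$ to every pixel $p$, $D_p$ is the data term (matching cost), $V$ is the smoothness term over the set $\mathcal{N}$ of neighboring pixel pairs, and $\lambda$ balances the two terms.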

A local-based matching algorithm is a simple, effective, and commonly used method for stereo matching. An important underlying principle of local-based matching is that pixels in a support region have approximately equal disparity. To satisfy this principle, it is very important to determine the size of the support region: it must contain enough pixels to capture intensity variation, yet include only pixels with the same disparity. The traditional local-based matching method is therefore prone to false matches in depth discontinuity regions, where the support region covers pixels from different depths. To ensure that local-based matching performs well in practical applications, various approaches have been proposed. For example, adaptive windows have been used to improve matching results. These methods search for an appropriate support region for each pixel, greatly improve the matching results, and outperform standard local-based methods [6-10]. However, it is difficult to find a support region with an arbitrary shape and size, and most of these methods have high computational complexity. Other researchers assign different support-weights to the pixels in a support region while keeping the size and shape of the support region constant [11-13].

In recent years, several methods that achieve satisfactory stereo matching results have been proposed. Yang et al. [17] presented a stereo matching algorithm that integrates color-weighting, belief propagation, left-right checking, color segmentation, plane fitting, and depth enhancement. Mei et al. [18] integrated the AD-census cost measurement function, cross-based cost aggregation, scanline optimization, multistep refinement, and CUDA-based acceleration into their algorithm.

The algorithm presented in this paper is inspired by the adaptive-weight matching algorithm. Our aim is to propose a stereo matching algorithm with low computational complexity and high accuracy. To this end, the rectangle-shaped support region is replaced by a line-shaped support region. The main weakness of the line-shaped support region is that it contains too little pixel information, which easily causes mismatches. Adaptive weights can make full use of this limited pixel information. By analyzing how the adaptive-weight model proposed in [13] affects disparity accuracy, we use a neural network (NN) to determine the spatial proximity and a mean-shift based segmentation method to describe the color similarity effectively.

In addition, several approaches are applied to complete the algorithm. We develop a new pixel dissimilarity measurement function that combines the Birchfield pixel dissimilarity measurement function and the census transform to compute the matching cost. The loopy belief propagation method proposed in [2], which is accelerated with min convolution and an image pyramid, is employed to estimate the initial disparity map. The initial disparity map can still be measurably improved: by analyzing its features, we refine it with iterative left-right consistency (LRC) checking and segmentation voting.

2. Algorithm Description

The algorithm presented in this paper can be divided into two steps: a disparity map initialization step and a refinement step. The framework of the algorithm is shown in Figure 1. A detailed description of this algorithm is given in the following sections.

2.1. Adaptive-Weight Based Cost Aggregation Method

Consider two pixels $p$ and $q$, where $p$ is the center pixel whose disparity is to be computed. $N_p$ is the support region of pixel $p$, and $q$ is a neighboring pixel of $p$ in the support region. Following [12], the support-weight of $q$ is assigned by
$$w(p, q) = f_s(p, q) \cdot f_c(p, q), \quad (1)$$
where $f_s(p, q)$ represents the spatial proximity, $f_c(p, q)$ represents the color similarity, and $w(p, q)$ is the support-weight. Our algorithm is designed on the basis of this framework. The list of variables used in this paper is given at the end of the paper.

2.1.1. The Model of Line-Based Adaptive-Weight

Wang et al. [13] noted that when the support region is large enough, color similarity plays a major role in computing the center pixel disparity within a certain range. As shown in Figure 2, red represents the support region . We used the pixels in to compute the disparity of . The effects of spatial proximity can be neglected in the pale blue region according to [13]. We call this region the transition area, represented by . To satisfy this principle, the neural network can be applied in the design of this spatial proximity model.

Figure 3 shows the spatial proximity model established by the neural network. The position of a pixel is the input, the spatial proximity is the output, and the connection weights are as shown in the figure. Strictly speaking, the distance is the input of the neural network. To simplify the notation, suppose that the center point is at ; the distance can then be simplified into , which represents the position of a pixel. The concrete form of the spatial proximity is expressed by where is the sigmoid function. Figure 4 shows how the spatial proximity varies with the position of a pixel.

In Figure 4, the space between the two blue lines is the transition area, and the support region is represented by the whole -axis. Figure 4 shows that the spatial proximity of a pixel in the transition area is significantly greater than that of a pixel elsewhere, which accords with the spatial proximity model of traditional adaptive-weight theory. Within the transition area, however, the spatial proximity differs little between pixels, which means that the influence of spatial proximity can be neglected there.
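
The plateau shape described for Figure 4 (nearly constant inside the transition area, decaying outside it) can be reproduced with a difference of two shifted sigmoids. The following is a minimal sketch under that assumption; it is not the paper's equation (2), and the names `spatial_proximity`, `half_transition`, and `steepness` are hypothetical stand-ins for the half-length of the transition region and the shape controlling parameter.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def spatial_proximity(x, half_transition, steepness):
    """Plateau-shaped weight for a pixel at signed offset x from the center.

    Assumed form (not taken from the paper): the difference of two shifted
    sigmoids, which is close to 1 for |x| < half_transition and decays
    towards 0 outside that interval.
    """
    rising = sigmoid(steepness * (x + half_transition))
    falling = sigmoid(steepness * (x - half_transition))
    return rising - falling


# Weights along a line-shaped support region of half-length 20 (illustrative).
weights = [spatial_proximity(x, half_transition=8, steepness=1.0)
           for x in range(-20, 21)]
```

A larger `steepness` makes the edges of the plateau sharper, so the weights approach a hard window over the transition area.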

According to the segmentation-based stereo matching principle [14, 19], a new model of color similarity is established by the following in [20]:

Equation (3) shows that color similarity contributes strongly to measuring the dissimilarity between the center pixel and its neighbor pixel when and belong to the same segment.

A color similarity model based on image segmentation can achieve good performance, as introduced in [13]. Mean-shift is a nonparametric, iterative estimation technique whose application domains include computer vision, clustering, and image processing [21]. In this work, we use mean-shift as the segmentation method.
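
To illustrate how a segmentation-aware color similarity could feed into the support-weight, the sketch below gives full weight to neighbors in the same segment and an exponentially decaying weight otherwise. The decay form, the parameter `gamma_c`, and all function names are assumptions made for illustration; equation (3) is not legible in this text, and the segment labels are assumed to come from a mean-shift segmentation computed beforehand.

```python
import math


def color_distance(c1, c2):
    """Euclidean distance between two color triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def color_similarity(color_p, color_q, label_p, label_q, gamma_c=10.0):
    """Assumed model: 1 inside the same segment, exponential decay otherwise."""
    if label_p == label_q:
        return 1.0
    return math.exp(-color_distance(color_p, color_q) / gamma_c)


def support_weights(colors, labels, center, spatial):
    """Support-weights along a line: spatial proximity times color similarity."""
    return [
        spatial[i] * color_similarity(colors[center], colors[i],
                                      labels[center], labels[i])
        for i in range(len(colors))
    ]


# Three pixels on a line; the last one lies in a different segment.
colors = [(10, 10, 10), (12, 10, 10), (200, 0, 0)]
labels = [0, 0, 1]
weights = support_weights(colors, labels, center=0, spatial=[1.0, 1.0, 1.0])
```

In this toy example the third pixel receives a weight close to zero, so it contributes almost nothing to the aggregated cost of the center pixel.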

2.1.2. Cost Aggregation

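The matching cost of pixel $p$ with disparity $d$ is represented by
$$E(p, d) = \frac{\sum_{q \in N_p} w(p, q)\, w(p_d, q_d)\, e(q, q_d)}{\sum_{q \in N_p} w(p, q)\, w(p_d, q_d)}, \quad (4)$$
where $w$ is the support-weight, $e(q, q_d)$ is the pixel dissimilarity, $N_p$ is the support region of $p$, and $p_d$ and $q_d$ are the corresponding pixels of $p$ and $q$, respectively, when the disparity of the center pixel is $d$.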
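
As a rough illustration of the aggregation step, the matching cost of a center pixel can be computed as a weighted average of raw pixel dissimilarities over the line-shaped support region. The sketch assumes the symmetric form used in adaptive-weight methods such as [12], where the weights from both images are multiplied; the function name and argument layout are hypothetical.

```python
def aggregate_cost(weights_ref, weights_tgt, raw_costs):
    """Weighted average of raw dissimilarities over a line-shaped support region.

    weights_ref[i]: support-weight of the i-th neighbor in the reference image
    weights_tgt[i]: support-weight of its corresponding pixel in the target image
    raw_costs[i]:   pixel dissimilarity between the two corresponding pixels
    """
    numerator = 0.0
    denominator = 0.0
    for w_ref, w_tgt, cost in zip(weights_ref, weights_tgt, raw_costs):
        numerator += w_ref * w_tgt * cost
        denominator += w_ref * w_tgt
    # With no supporting pixels the cost is undefined; treat it as infinite.
    return numerator / denominator if denominator > 0 else float("inf")


# The third neighbor has zero weight, so its large raw cost is ignored.
cost = aggregate_cost([1.0, 0.5, 0.0], [1.0, 1.0, 1.0], [2.0, 4.0, 100.0])
```

Because the support region is a line, the loop visits only as many pixels as the line is long for each pixel and each candidate disparity.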

The pixel dissimilarity measurement function in (4) is very important for cost aggregation. The absolute difference and the Birchfield function [22] are widely used for this purpose. To improve the matching accuracy in textureless and repetitive regions, we define the pixel dissimilarity measurement function by combining the Birchfield function and the census transform:

To validate the effect of this cost aggregation method, simulation results on Teddy and Cones with Birchfield method and our method are shown in Figure 5.
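
The two ingredients of the dissimilarity measure can be sketched as follows for grayscale images stored as nested lists. The Birchfield-Tomasi part follows the standard sampling-insensitive definition, and the census part compares 3x3 neighborhoods by Hamming distance. The way the two terms are merged here (a robust 1 - exp mapping of each term, then a sum, in the style of AD-census [18]) and the parameters `lam_bt` and `lam_census` are assumptions; the combination actually used in equation (5) may differ.

```python
import math


def bt_dissimilarity(row_l, row_r, x_l, x_r):
    """Birchfield-Tomasi sampling-insensitive dissimilarity on one scanline."""
    def half_range(row, x):
        # Min and max of the intensity interpolated over [x - 0.5, x + 0.5].
        left = row[max(x - 1, 0)]
        right = row[min(x + 1, len(row) - 1)]
        i_minus = 0.5 * (row[x] + left)
        i_plus = 0.5 * (row[x] + right)
        return min(i_minus, i_plus, row[x]), max(i_minus, i_plus, row[x])

    r_min, r_max = half_range(row_r, x_r)
    l_min, l_max = half_range(row_l, x_l)
    d_l = max(0.0, row_l[x_l] - r_max, r_min - row_l[x_l])
    d_r = max(0.0, row_r[x_r] - l_max, l_min - row_r[x_r])
    return min(d_l, d_r)


def census(image, y, x, radius=1):
    """Census bit list: 1 where a neighbor is darker than the center pixel."""
    h, w = len(image), len(image[0])
    center = image[y][x]
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            yy = min(max(y + dy, 0), h - 1)  # clamp at the image border
            xx = min(max(x + dx, 0), w - 1)
            bits.append(1 if image[yy][xx] < center else 0)
    return bits


def hamming(bits_a, bits_b):
    """Number of positions at which two census bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def combined_cost(img_l, img_r, y, x, d, lam_bt=10.0, lam_census=30.0):
    """Assumed combination: robust mapping of each term, then summed."""
    x_r = x - d  # corresponding column in the right image
    c_bt = bt_dissimilarity(img_l[y], img_r[y], x, x_r)
    c_census = hamming(census(img_l, y, x), census(img_r, y, x_r))
    return (1.0 - math.exp(-c_bt / lam_bt)) + (1.0 - math.exp(-c_census / lam_census))
```

The robust mapping bounds each term in [0, 1), so neither the intensity term nor the census term can dominate the sum because of a single outlier.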

2.2. Initial Disparity Determination

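The winner-take-all (WTA) searching strategy is a common method for determining disparity, which can be expressed by
$$d_p = \arg\min_{d} E(p, d), \quad (6)$$
where $E(p, d)$ is the aggregated matching cost of pixel $p$ at candidate disparity $d$ given by (4) and $d_p$ is the selected disparity.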

WTA tends to produce a low-accuracy disparity map. Therefore, we adopt the efficient LBP algorithm proposed in [2]. In this algorithm, min convolution and an image pyramid are integrated into LBP, which effectively decreases its complexity and improves the matching result. The procedure for determining the initial disparity is listed in Algorithm 1.

Algorithm 1
(1) Computing matching cost by (4)
    Initializing pyramid level and down sampling factor
(2) Pyramid initialization
    Pyramid().cost =
(3) Establishing pyramid structure with and
(4) For each level
(5)  For each iteration
(6)   Updating left, right, up and down direction message for each pixel
(7)  End
(8) End
(9) Determining disparity for each pixel by max-product principle
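
The message update in step (6) can be illustrated with a single min-sum update, the negative-log form of max-product. The sketch assumes a Potts smoothness term, for which the min convolution reduces to one pass over the disparities; the paper does not state its smoothness model here, so `penalty` and the function names are hypothetical. A plain WTA selection is included for comparison.

```python
def wta(costs):
    """Winner-take-all: index of the smallest cost in a per-pixel cost list."""
    return min(range(len(costs)), key=lambda d: costs[d])


def potts_message(data_cost, incoming, penalty):
    """One min-sum message under an assumed Potts smoothness term.

    data_cost: matching cost of the sending pixel for each disparity
    incoming:  messages from the sender's other neighbors (one list each)
    penalty:   cost of assigning different disparities to neighboring pixels
    """
    h = [c + sum(m[d] for m in incoming) for d, c in enumerate(data_cost)]
    floor = min(h) + penalty           # cheapest way to switch disparity
    msg = [min(v, floor) for v in h]   # min convolution for the Potts model
    lowest = min(msg)
    return [v - lowest for v in msg]   # normalize to keep values bounded


def belief(data_cost, messages):
    """Per-disparity belief: data cost plus all incoming messages."""
    return [c + sum(m[d] for m in messages) for d, c in enumerate(data_cost)]


# A neighbor with a clear minimum at disparity 1 sends a message ...
message = potts_message([5.0, 1.0, 4.0], [[0.0, 2.0, 0.0]], penalty=1.5)
# ... to an ambiguous pixel whose own data cost slightly prefers disparity 0.
ambiguous = [2.0, 2.1, 5.0]
without_bp = wta(ambiguous)
with_bp = wta(belief(ambiguous, [message]))
```

In this toy case the ambiguous pixel switches from disparity 0 to disparity 1 once the neighbor's message is taken into account, which is the smoothing effect that WTA alone lacks.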

2.3. Disparity Refinement

It is inevitable that the initial disparity map will contain many mismatched pixels. To refine the disparity map, a two-step postprocessing method is put forward in this section.

2.3.1. Left-Right Consistency Check

The disparity map for the left image is computed by the previous steps. The disparity map for the right image is computed in a similar manner. In [15, 17], pixels are classified into several types according to , , and to remove outliers. Different strategies are then designed for each type of pixel to determine its disparity.

In this work, we analyzed the properties of the initial disparity map. Figure 6 shows the distribution of bad pixels for Tsukuba after the disparity initialization step of our algorithm. Most of the bad pixels are concentrated in the occluded region. This result indicates that most pixels are matched correctly and that the initial disparity in the nonocc region can be trusted. We therefore propose an iterative left-right consistency check to handle the remaining bad pixels.

Pixels can be divided into two types: undependable pixels and dependable pixels. Pixel is classified as dependable when it meets the following condition:

Pixel is considered to be undependable if it fits the following condition: The new disparity of the undependable pixel can be computed as in Algorithm 2.

Algorithm 2
(1)   Establishing the table of pixel type according to (7)-(8)
(2)   
(3)    For each iteration
(4)     For each undependable pixel which is determined by
(5)      Searching the nearest dependable pixel and
(6)        //where represents initial disparity of pixel
(7)      If
(8)        Set as dependable pixel in
(9)      End
(10)    End
(11)   End
(12)  End

Figure 7 shows the result after iterative left-right consistency checking. Table 1 quantifies the effect of iterative left-right consistency checking.
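
The classification into dependable and undependable pixels can be sketched with the standard left-right consistency test, followed by a single fill pass from the nearest dependable pixel on the same row. This is only an approximation of the idea: conditions (7)-(8) and the update rule of Algorithm 2 are not legible in this text, the tolerance `tol` is an assumption, and the iterative re-checking step is omitted.

```python
def lrc_dependable(disp_l, disp_r, tol=0):
    """Mark each left-image pixel as dependable (True) or undependable (False)."""
    h, w = len(disp_l), len(disp_l[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x_r = x - disp_l[y][x]  # matching column in the right image
            if 0 <= x_r < w:
                mask[y][x] = abs(disp_l[y][x] - disp_r[y][x_r]) <= tol
    return mask


def fill_from_nearest(disp_l, mask):
    """Give each undependable pixel the disparity of the nearest dependable
    pixel on the same row (one pass, no iteration)."""
    out = [row[:] for row in disp_l]
    for y, row in enumerate(mask):
        good = [x for x, ok in enumerate(row) if ok]
        if not good:
            continue  # nothing reliable on this row; leave it unchanged
        for x, ok in enumerate(row):
            if not ok:
                nearest = min(good, key=lambda g: abs(g - x))
                out[y][x] = disp_l[y][nearest]
    return out


disp_l = [[1, 1, 1, 3, 1]]
disp_r = [[1, 1, 1, 1, 1]]
mask = lrc_dependable(disp_l, disp_r)
refined = fill_from_nearest(disp_l, mask)
```

In the example, the outlier with disparity 3 and the border pixel whose match falls outside the right image are both flagged and then replaced by the disparity of a dependable neighbor.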

2.3.2. Segmentation Voting

The pixelwise region shown in Figure 8 is established according to color consistency. This region can be represented by . Pixels in satisfy the following condition:

Let represent the frequency distribution of disparity in . The new disparity is updated by Algorithm 3.

Algorithm 3
(1)
(2) For each
(3)  For each
(4)   If
(5)    
(6)   End
(7)  End
(8) End
(9)

The median filter is applied to the left disparity map. Figure 9 shows the effect of segmentation voting on the accuracy of the disparity map.
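
The voting idea can be sketched on a single image row: each pixel collects the disparities of nearby pixels with a similar color and takes the most frequent one, after which a small median filter removes isolated outliers. The sketch assumes grayscale intensities, a horizontal segment, and a 3-tap median; `half_length` and `color_tol` are hypothetical stand-ins for the half-length of the voting segment and the color confidence level, and condition (9) may define the region differently.

```python
from collections import Counter


def vote_row(disp_row, color_row, half_length, color_tol):
    """Replace each disparity by the most frequent disparity among nearby
    pixels on the same row whose color is close to the pixel's own color."""
    n = len(disp_row)
    out = []
    for x in range(n):
        votes = Counter()
        for xx in range(max(0, x - half_length), min(n, x + half_length + 1)):
            if abs(color_row[xx] - color_row[x]) <= color_tol:
                votes[disp_row[xx]] += 1
        # The pixel always votes for itself, so votes is never empty.
        out.append(votes.most_common(1)[0][0])
    return out


def median3(row):
    """1D median filter of width 3 with replicated borders."""
    padded = [row[0]] + list(row) + [row[-1]]
    return [sorted(padded[i:i + 3])[1] for i in range(len(row))]


disp_row = [2, 2, 7, 2, 2, 5, 5, 5]
color_row = [10, 11, 10, 12, 11, 90, 91, 90]
voted = vote_row(disp_row, color_row, half_length=2, color_tol=5)
```

The outlier disparity 7 is voted away by its similarly colored neighbors, while the color edge between the two surfaces keeps the disparities 2 and 5 from mixing.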

3. Results and Discussion

3.1. Parameter Determination

The parameters involved in our algorithm greatly affect the performance of the algorithm. In this section, we present the parameter settings.

We considered eight main parameters: , , , , , , , and , which are kept constant for all benchmarks.

Figure 10 shows the influence of on the accuracy of disparity map obtained by our algorithm. When varies from 35 to 65, our algorithm is insensitive to .

Figure 11 demonstrates how the performance of our algorithm varies with and . Generally speaking, when is larger than 15, the influence of on algorithm performance is small for Tsukuba, Venus, and Teddy. Our algorithm performs well for Tsukuba and Teddy when and . The error percentage tends to decrease with decreasing when is smaller than 15. For Cones, the performance of the algorithm improves as increases and when lies between 15 and 30.

Figure 12 shows the influence of on performance. For Tsukuba, the error percentage follows a U-shaped trend when is smaller than 20 and bottoms out when is between 7 and 13. The error percentage for Venus follows a downward trend. For Teddy and Cones, the error percentage decreases as increases. When is larger than 12, the error percentage is insensitive to .

Figure 13 shows the performance of our algorithm according to and . These figures show that our algorithm is robust to different values of . When , the error percentage of the disparity map obtained by our algorithm is still fairly low.

For the disparity map refinement step, two main parameters must be set. and are previously introduced. Figure 14 shows the influence of and on performance. When and , all data sets show good performance.

3.2. Experimental Results

We evaluate our algorithm on the Middlebury benchmarks [23] with an error threshold of 1. The test platform consists of a T9600 CPU and 5 GB of memory; the software consists of MATLAB 2014a and VS2012. Parameters are shown in Table 2.
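
For readers unfamiliar with the metric, the error percentage at threshold 1 is the share of evaluated pixels whose disparity differs from the ground truth by more than one level. A minimal sketch follows; the function name and the optional `mask` argument (for restricting evaluation to a region such as nonocc or disc) are illustrative choices, not the benchmark's own code.

```python
def bad_pixel_percentage(disp, truth, mask=None, threshold=1.0):
    """Percentage of evaluated pixels whose absolute disparity error
    exceeds the threshold."""
    total = 0
    bad = 0
    for y, row in enumerate(truth):
        for x, t in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue  # pixel lies outside the evaluated region
            total += 1
            if abs(disp[y][x] - t) > threshold:
                bad += 1
    return 100.0 * bad / total if total else 0.0


disp = [[5, 5, 9], [3, 7, 3]]
truth = [[5, 6, 5], [3, 3, 3]]
all_error = bad_pixel_percentage(disp, truth)  # 2 of 6 pixels are bad
region = [[True, True, False], [True, True, True]]
region_error = bad_pixel_percentage(disp, truth, mask=region)  # 1 of 5
```

An error of exactly one disparity level is not counted as bad, which is why the second pixel in the first row passes.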

Simulation results on the Middlebury data sets are presented in Figure 15. The quantitative performance of our algorithm is shown in Tables 3 and 4. Our algorithm ranked 5th on the Middlebury benchmark as of July 1, 2014.

These results demonstrate that our algorithm performs well. However, it is difficult to decrease the error percentages in all three regions (nonocc, all, and disc) at the same time, because many pixels with correct disparity are classified as undependable according to (8). Tsukuba can be used as an example. Figure 16 shows the bad pixels detected by (7)-(8).

Figure 6 shows the bad pixels obtained by comparing the initial disparity map and ground truth. Figure 6 indicates that true bad pixels are distributed in the occlusion region. However, it can be inferred from Figure 16 that many pixels in nonocc and disc regions are mistakenly classified as undependable. When standard LRC checking is used, misclassified pixels in the disc region may be assigned a wrong disparity, as shown in Figure 17. Table 5 shows quantitative comparison results for standard LRC checking and iterative LRC checking.

Thus, it can be seen that the error percentage in the disc region greatly increases after applying standard LRC checking. Because the disc region is part of the nonocc region, the error percentage in the nonocc region also increases. Our iterative LRC checking method can effectively improve the performance of our stereo matching algorithm.

3.3. Running Time of Our Algorithm

In this section, we investigate the running time of our algorithm, which directly reflects its computational complexity. To obtain a stable measurement, the algorithm was run 50 times and the average running time was calculated. Results are shown in Table 6.

The new rectangle-based adaptive-weight method can be obtained by imposing the -direction constraint on the new weight model. Table 7 displays the running time of cost aggregation for our algorithm with the rectangle-shaped support region under different support region sizes. The rectangle-based algorithm was also run 50 times for each support region size.

Our algorithm with the rectangle-shaped support region performed best when the size of the support region is ; the error percentages of the initial disparity map in the nonocc, all, and disc regions are 1.45%, 3.42%, and 7.08%, respectively. Tables 5-7 show that our algorithm notably decreases computational complexity and improves performance.
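
The source of the saving can be seen from a simple count of the pixels aggregated per center pixel and per candidate disparity. The sketch assumes a square window with the same half-length in both directions; the half-length of 16 is a hypothetical value chosen for illustration, and measured running times such as those in Tables 6 and 7 also depend on the implementation.

```python
def support_size(half_length, line_based):
    """Number of pixels aggregated per center pixel and per disparity."""
    side = 2 * half_length + 1
    return side if line_based else side * side


half_length = 16  # hypothetical value, for illustration only
line = support_size(half_length, line_based=True)
square = support_size(half_length, line_based=False)
ratio = square / line
```

Under these assumptions the square window touches 2L + 1 times as many pixels as the line, where L is the half-length, so the gap widens as the support region grows.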

4. Conclusions

In this work, we proposed a new line-based adaptive-weight stereo matching algorithm that integrates several methods. The main conclusions that can be drawn from our results are as follows.
(1) Cost aggregation is the most time-consuming part of a stereo matching algorithm. Using a line-shaped support region can dramatically reduce the time spent on cost aggregation.
(2) The adaptive-weight model proposed in this paper can produce a rather satisfactory initial disparity map even when pixel information is limited.
(3) Experimental results show that the algorithm proposed in this paper can attain a better matching result with less running time.

Although our algorithm performs well on the Middlebury data sets, there is still much room for improvement.
(1) Our algorithm has too many parameters to adapt easily to different image pairs. In further research, we will analyze the intrinsic relationships among the parameters and reduce their number.
(2) Figure 13 indicates that image characteristics have a significant impact on the optimum values of and . In future studies, we will explore how the optimum values of and vary with the texture of the image.

List of Variables

: The half-length of support region
:Shape controlling parameter
:The half-length of transition region
:Spatial radius of mean-shift
:Color radius of mean-shift
:The half-length of segment for segmentation voting
:Confidence level of color similarity for segmentation voting.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors express their appreciation for the financial support of the Shandong Natural Science Foundation, Grant no. ZR2013FL033. They also extend their sincere gratitude to reviewers for their constructive suggestions.