Abstract

Adaptive cross-region-based guided image filtering (ACR-GIF) is a commonly used cost aggregation method. However, the weights of points in the adaptive cross-region (ACR) are generally not considered, which affects the accuracy of disparity results. In this study, we propose an improved cost aggregation method to address this issue. First, the orthogonal weight is proposed according to the structural feature of the ACR, and the orthogonal weight of each point in the ACR is computed. Second, the matching cost volume is filtered using ACR-GIF with orthogonal weights (ACR-GIF-OW). In order to reduce the computing time of the proposed method, an efficient weighted aggregation computing method based on orthogonal weights is proposed. Additionally, by combining ACR-GIF-OW with our recently proposed matching cost computation method and disparity refinement method, a local stereo matching algorithm is proposed as well. Results on the Middlebury evaluation platform show that, compared with ACR-GIF, the proposed cost aggregation method significantly improves the disparity accuracy with little additional time overhead, and that the proposed stereo matching algorithm outperforms other state-of-the-art local and nonlocal algorithms.

1. Introduction

Binocular stereo vision can acquire disparity information of the required accuracy using only image pairs of the same scene captured from different angles. It is widely applied in three-dimensional reconstruction [1], three-dimensional measurement [2], robot vision [3], unmanned driving [4], and so on. The purpose of stereo matching is to find corresponding points in a pair of images, and its accuracy directly affects the precision of the disparity results. It is therefore a critical procedure in binocular stereo vision systems and a topic of significant research interest in the field of computer vision.

Currently, stereo matching algorithms are mainly divided into two categories. The first category is based on deep learning. In particular, algorithms based on convolutional neural networks (CNNs) have developed rapidly in recent years. Žbontar and LeCun [5] proposed an algorithm that uses a Siamese network to compute the matching cost; pairs of small image patches were used to train the network to determine the similarity between patches. Pang et al. [6] proposed a cascade residual learning network divided into two stages, where each stage independently calculates disparity maps and multiscale residual signals. Chang et al. [7] proposed a pyramid stereo matching network comprising a spatial pyramid pooling module and a 3D CNN module. Swami et al. [8] proposed an end-to-end network model that exploits rich multiscale context information, which most existing methods cannot achieve: a large effective receptive field is used to extract multiscale context information while retaining the required spatial information throughout the network. Kim et al. [9] proposed a network architecture that uses both the matching cost volume and the disparity map as inputs. Their architecture contains two subnetworks, namely, the matching probability construction network and the confidence estimation network. Such methods achieve high matching accuracy, but ground truth disparity maps are required in advance, especially for end-to-end network models.

The second category is conventional algorithms, which can be classified as global, semiglobal, and local algorithms. Global algorithms usually obtain disparities by minimizing a global energy function, as in graph cuts [10] and belief propagation [11]. They are characterized by high matching accuracy and low computing efficiency. The first semiglobal algorithm, proposed by Hirschmüller [12], mainly used the idea of dynamic programming; the matching accuracy and computing efficiency of semiglobal algorithms lie between those of global and local algorithms. Local algorithms are based on cost aggregation within a specified support region, and their matching accuracy is usually lower than those of the first two types, although their computing efficiency is higher. Such algorithms generally employ the following four steps [13]: matching cost computation, cost aggregation, disparity computation, and disparity refinement.

Cost aggregation refers to summing or averaging the matching cost in the support region of each pixel, which directly influences the computing efficiency and accuracy of the algorithm. It is one of the most important steps and also the primary focus of many studies. Filtering-based cost aggregation methods are currently adopted by most local algorithms, in which cost aggregation is interpreted as the filtering of the matching cost volume. Compared with other filtering methods, guided image filtering (GIF) [14] is usually the preferred approach, owing to its superior filtering effect and computational efficiency.

Hosni et al. [15] were the first to apply GIF to cost aggregation, achieving good results, and various cost aggregation methods have been proposed on the basis of this approach. Based on weighted GIF [16], Hong et al. [17] proposed a cost volume filtering method in which an adaptive weight based on the local variance controls the linear coefficients according to local texture features; this yields better performance. Kordelas et al. [18] proposed a content-based GIF that employs two rectangular support regions of different sizes, where the support region is selected according to the texture homogeneity of the local area around the pixel. In order to improve the accuracy at object edges and in areas with discontinuous disparities, and to reduce the noise in the matching targets, Hamzah et al. [19] proposed an adaptive support weight based on iterative GIF. Moreover, Wu et al. [20] proposed a strategy to fuse GIF with an MST (minimum spanning tree) filter, embedding the local support-region-based GIF into the whole-image MST filter; this improved robustness in textureless and highly textured regions. Zhu et al. [21] introduced a weight into the energy function of GIF, thus considering the entire image as the support region, which resulted in better matching accuracy.

Adaptive cross-region-based guided image filtering (ACR-GIF) is a cost aggregation method currently adopted by many local stereo matching algorithms [22–26]. However, the weights of points in the adaptive cross-region (ACR) [27] are generally not taken into account, which affects the accuracy of the results. The main contributions of this paper to address this issue are summarized as follows:

(1) An improved cost aggregation method is proposed. First, according to the structural characteristic of the ACR, the orthogonal weight is proposed; then, the matching cost volume is filtered using ACR-GIF with orthogonal weights (ACR-GIF-OW).

(2) In order to improve the computational efficiency of the proposed method, an efficient weighted aggregation computing method based on orthogonal weights is proposed.

(3) Combining ACR-GIF-OW with our recently proposed matching cost computation and disparity refinement methods, a local stereo matching algorithm is proposed.

The main contributions of this paper differ from those of our previous paper [28]. There, our work focused on matching cost computation and disparity refinement, so a gradient calculation method and a multistep refinement method based on the ACR were proposed. The contributions of this paper, as summarized above, concern cost aggregation.

The remainder of this paper is structured as follows. The related work using ACR-GIF is discussed in Section 2. The proposed stereo matching algorithm using ACR-GIF-OW is described in Section 3. Experimental results and discussions are presented in Section 4, and finally, Section 5 concludes this paper.

2. Related Work

In this section, we discuss cost aggregation methods using ACR-GIF, as they are most relevant to our proposed approach. GIF is analyzed first, since ACR-GIF is an improved version of GIF.

GIF utilizes support regions with fixed shapes and sizes. As a result, the matching accuracy of textureless areas or regions with discontinuous disparities in images is affected. Therefore, it has become imperative to acquire adaptive support regions according to different regions and structures in images.

Due to its simple implementation and high computing efficiency, many local stereo matching algorithms use the ACR as the support region. Yang et al. [22] established a rectangular ACR whose boundary is determined by the endpoints of the support arms, where an endpoint is the point closest to the center pixel in the given direction at which the color difference exceeds a threshold. This approach ensures that most pixels in the support region are similar to the center pixel. Zhu et al. [23] adopted both the color difference and the distance as conditions in constructing the rectangular ACR; the support arm extends only when both conditions are met. This method can effectively reduce outliers from occluded regions or areas with discontinuous depths in the support region.

In addition to the rectangular ACR, the arbitrary-shaped ACR has also been commonly employed. Xu et al. [24] used an arbitrary-shaped ACR-based GIF, where the length of the support arm is determined by the color similarity of the RGB channels. Zhu et al. [25] added an exponential threshold to the arm-length decision rule to handle textureless regions and subsequently proposed an adaptive threshold based on image variance, addressing the problem that the support region is not constructed within the same image structure when the brightness changes. Furthermore, in order to improve the accuracy of textureless regions, Yan et al. [26] proposed an arm-length decision rule using dual-constraint linear variable thresholds to construct the arbitrary-shaped ACR.

The above works all put emphasis on how to construct the ACR, while the weight of each point in the ACR is not considered. Different from them, our method takes the orthogonal weight of each point in the ACR into consideration.

3. The Proposed Algorithm

The stereo matching algorithm proposed in this paper mainly includes five steps: (1) input image preprocessing, (2) matching cost computation, (3) cost aggregation using ACR-GIF-OW, (4) disparity computation, and (5) disparity refinement. A flowchart of the proposed algorithm is shown in Figure 1. Each step is detailed in the following sections.

3.1. Input Image Preprocessing

Since guided images are required in matching cost computation, it is necessary to preprocess the rectified input images. GIF has an edge-preserving property and a good smoothing effect, and its computation time is independent of the size of the support region. Hence, the guided images are obtained by using GIF.

3.2. Matching Cost Computation

To render the model more robust and achieve more accurate results, we adopt a matching cost computation method that combines the absolute differences (AD), census transformation [29], and the gradient.

AD is computed using the information on the RGB channels according to the following equation:

$C_{AD}(p,d) = \frac{1}{3} \sum_{i \in \{R,G,B\}} \left| I_i^{L}(p) - I_i^{R}(p-d) \right|$  (1)

where $I_i^{L}(p)$ is the value of point $p$ on the $i$ channel of the left image and $I_i^{R}(p-d)$ is the value of the corresponding point on the $i$ channel of the right image when the disparity is $d$.

Census transformation first compares the gray value of the center point with those of the other points in the window and uses 0 or 1 to represent each result; the results are then concatenated into a binary bit string. This can be formulated as

$CT(p) = \bigotimes_{q \in W(p)} \xi\big(I(p), I(q)\big), \qquad \xi\big(I(p), I(q)\big) = \begin{cases} 0, & I(p) \le I(q) \\ 1, & I(p) > I(q) \end{cases}$  (2)

Here, $\bigotimes$ represents the bitwise concatenation operation, $W(p)$ is the window centered on the point $p$, $q$ is an arbitrary point in $W(p)$, and $I(p)$ and $I(q)$ are the gray values of points $p$ and $q$, respectively.

Subsequently, the Hamming distance between the binary bit strings of corresponding points is computed to measure their similarity. We assume that $B^{L}(p)$ is the binary bit string of point $p$ in the left image and $B^{R}(p-d)$ is the binary bit string of the corresponding point in the right image when the disparity is $d$. The Hamming distance between them can be expressed as

$C_{cen}(p,d) = \mathrm{Ham}\big(B^{L}(p), B^{R}(p-d)\big)$  (3)
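For concreteness, the following sketch (Python with NumPy; all function names are hypothetical, and the window size is an assumption since the paper does not state it) implements the census transform of equation (2) and the Hamming-distance cost of equation (3):

```python
import numpy as np

def census_transform(gray, win=5):
    # Bit string per pixel: 1 where a neighbour's gray value is smaller
    # than the centre's, 0 otherwise (equation (2)); window size is assumed.
    r = win // 2
    H, W = gray.shape
    padded = np.pad(gray, r, mode="edge")
    bits = np.zeros((H, W, win * win), dtype=np.uint8)
    k = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            neighbour = padded[r + dy:r + dy + H, r + dx:r + dx + W]
            bits[:, :, k] = (neighbour < gray).astype(np.uint8)
            k += 1
    return bits

def census_cost(bits_left, bits_right, d):
    # Hamming distance between B_L(p) and B_R(p - d) (equation (3)).
    # np.roll wraps at the left border; a real implementation would mask it.
    shifted = np.roll(bits_right, d, axis=1)
    return np.sum(bits_left != shifted, axis=2)
```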

The gradient is calculated by our recently proposed method [28]. It combines the RGB gradients of the input image and of the guided image to compute the gradients $G_x$ and $G_y$ in the x and y directions, respectively, where the superscripts in the original expressions denote the input image and the guided image.

By weighted fusion of the above-mentioned terms, the matching cost computation function is obtained as follows:

$C(p,d) = \omega_{AD} C_{AD}(p,d) + \omega_{cen} C_{cen}(p,d) + \omega_{gx} C_{gx}(p,d) + \omega_{gy} C_{gy}(p,d)$  (5)

Here, $\omega_{AD}$, $\omega_{cen}$, $\omega_{gx}$, and $\omega_{gy}$ are the weights of AD, census transformation, the gradient in the x direction, and the gradient in the y direction, respectively; $C_{AD}$, $C_{cen}$, $C_{gx}$, and $C_{gy}$ are the matching costs of the corresponding terms.
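A minimal sketch of the fusion in equation (5); the weight values below are placeholders, not the settings used in the paper:

```python
def combined_cost(c_ad, c_cen, c_gx, c_gy,
                  w_ad=0.1, w_cen=0.4, w_gx=0.25, w_gy=0.25):
    # Weighted fusion of the four matching cost terms (equation (5)).
    return w_ad * c_ad + w_cen * c_cen + w_gx * c_gx + w_gy * c_gy
```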

3.3. Cost Aggregation Using ACR-GIF-OW
3.3.1. ACR Construction

Points with similar colors in the support region are likely to arise from the same structure of the image and thus have similar disparities [27]. In order to ensure that only points with similar colors are included in the support region, the ACR is adopted. Double thresholds on the distance and the color difference are used to restrict the arm length [30]. The decision rules are as follows:

(1) $D_c(p_i, p) < \tau_1$ and $D_c(p_i, p_{i-1}) < \tau_1$
(2) $D_s(p_i, p) < L_1$
(3) If $L_2 < D_s(p_i, p) < L_1$, then $D_c(p_i, p) < \tau_2$

Here, $p_i$ is an arbitrary point on the support arm with center point $p$, $p_{i-1}$ is the point preceding $p_i$ in the direction of the support arm, and $D_c(\cdot)$ and $D_s(\cdot)$ denote the color difference and the spatial distance, respectively. $\tau_1$, $\tau_2$, $L_1$, and $L_2$ are thresholds, with $\tau_2 < \tau_1$ and $L_2 < L_1$.

When the above rules are fulfilled, the support arms expand from the center point in four directions, and the expansion stops as soon as one of the decision rules is not satisfied. The ACR $U(p)$ can be expressed as the union of all horizontal support arms $H(q)$ whose center points $q$ lie on the vertical support arm $V(p)$ of point $p$, as shown in Figure 2:

$U(p) = \bigcup_{q \in V(p)} H(q)$
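The arm-length computation can be sketched as follows (Python; the color difference is taken here as the maximum absolute RGB channel difference, a common but assumed choice, and `img` is a float RGB array):

```python
import numpy as np

def arm_length(img, y, x, dy, dx, tau1, tau2, L1, L2):
    # Extend one support arm from centre (y, x) in direction (dy, dx)
    # until a decision rule (1)-(3) fails; returns the arm length.
    H, W = img.shape[:2]
    length = 0
    prev_y, prev_x = y, x
    while True:
        ny, nx = y + (length + 1) * dy, x + (length + 1) * dx
        if not (0 <= ny < H and 0 <= nx < W):
            break
        dc_centre = np.max(np.abs(img[ny, nx] - img[y, x]))          # D_c(p_i, p)
        dc_prev = np.max(np.abs(img[ny, nx] - img[prev_y, prev_x]))  # D_c(p_i, p_{i-1})
        ds = length + 1                                              # D_s(p_i, p)
        tau = tau1 if ds <= L2 else tau2   # rule (3): stricter threshold beyond L2
        if ds >= L1 or dc_centre >= tau or dc_prev >= tau1:
            break
        length += 1
        prev_y, prev_x = ny, nx
    return length
```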

3.3.2. The Orthogonal Weight Calculation

According to the construction process and the structural feature of the ACR, the path from any point to the center point consists of a horizontal segment followed by a vertical one. Therefore, the weight of each point relative to the center point can be computed by multiplying the weights between adjacent points along this path [31]. Since the weight decomposes into two parts (a horizontal weight and a vertical weight), we name it the orthogonal weight, as shown in Figure 3.

Here, $q$ is an arbitrary point in the ACR $U(p)$, $p$ is the center point, and $W_h(q)$ and $W_v(q)$ represent the horizontal and vertical weights of point $q$ relative to point $p$, respectively.

The orthogonal weight can be computed by multiplying the horizontal and vertical weights; thus, the orthogonal weight of $q$ can be expressed as

$W(q,p) = W_h(q)\, W_v(q)$  (6)

To carry out the calculations associated with $W_h$ and $W_v$ conveniently, we construct the weight matrix of horizontally adjacent points, $M_h$, and the weight matrix of vertically adjacent points, $M_v$. The information on the RGB channels is used to compute the weights of adjacent points, as shown in the following formulae:

$M_h(i,j) = f\big(\Delta_c(I(i,j), I(i,j+1))\big)$  (7)

$M_v(i,j) = f\big(\Delta_c(I(i,j), I(i+1,j))\big)$  (8)

where $i$ and $j$, respectively, represent the row and column indices of image $I$, and $\Delta_c(\cdot)$ denotes the color difference computed on the RGB channels. The function $f$ in equations (7) and (8) maps the color difference to a weight and is controlled by two constant parameters, $\sigma$ and $\eta$. The purpose of introducing the function $f$ is to avoid the loss of information that occurs when there is a significant local difference in intensity between adjacent pixels on the path [21].

After calculating the matrices $M_h$ and $M_v$, the horizontal and vertical weights of $q = (i_q, j_q)$ relative to the center point $p = (i_p, j_p)$ can be computed as follows:

$W_h(q) = \prod_{j=\min(j_q, j_p)}^{\max(j_q, j_p)-1} M_h(i_q, j)$  (10)

$W_v(q) = \prod_{i=\min(i_q, i_p)}^{\max(i_q, i_p)-1} M_v(i, j_p)$  (11)

According to equation (10), a recursive form of computing the horizontal weight is

$W_h(q) = W_h(q^{-})\, M_h(q^{-}, q)$  (12)

where $q^{-}$ is the point adjacent to $q$ on the horizontal path, one step closer to the center, $M_h(q^{-}, q)$ is the adjacent weight between them, and $W_h = 1$ on the vertical support arm.

Similarly, according to equation (11), a recursive form of computing the vertical weight is

$W_v(q) = W_v(q^{-})\, M_v(q^{-}, q)$  (13)

where $q^{-}$ is now the adjacent point one step closer to the center along the vertical path, and $W_v = 1$ at the center point.

According to equations (12) and (13), $W_h$ and $W_v$ can be computed outward from the center point in both directions, such that the weight of the previous point is reused and only one multiplication is required for each computation.
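The sketch below builds the adjacent-weight matrices of equations (7) and (8) and applies the recursion of equation (12) along one row. Since the exact form of $f$ is not given here, a clipped exponential controlled by $\sigma$ and $\eta$ is assumed purely for illustration:

```python
import numpy as np

def adjacent_weight_matrices(img, sigma, eta):
    # M_h(i, j): weight between pixels (i, j) and (i, j+1); M_v analogous
    # for vertical neighbours. f below is an assumed stand-in for the
    # paper's function, not its actual definition.
    f = lambda x: np.maximum(np.exp(-x / sigma), eta)
    diff_h = np.max(np.abs(img[:, 1:] - img[:, :-1]), axis=2)
    diff_v = np.max(np.abs(img[1:, :] - img[:-1, :]), axis=2)
    return f(diff_h), f(diff_v)

def row_weights(mh_row, c):
    # Recursion (12): weight of every column relative to centre column c,
    # reusing the previous point's weight (one multiply per point).
    n = mh_row.size + 1
    w = np.ones(n)
    for j in range(c + 1, n):          # expand to the right of the centre
        w[j] = w[j - 1] * mh_row[j - 1]
    for j in range(c - 1, -1, -1):     # expand to the left of the centre
        w[j] = w[j + 1] * mh_row[j]
    return w
```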

3.3.3. The Weighted Aggregation Computing Method

On the basis of the discussion in Section 3.3.2 and inspired by the orthogonal integral image [27], we propose a weighted aggregation computing method based on orthogonal weights. Because the orthogonal weight can be decomposed, the weighted aggregation is decomposed into two one-dimensional weighted aggregations along orthogonal directions, as follows (a code sketch is given after this list):

(1) The weighted sum of the horizontal support arm of each point is computed. To improve the computing efficiency, for any point $p$ in the image, the weighted sums of the left and right support arms are computed separately:

$E_l(p) = \sum_{q \in H^{-}(p)} W_h(q)\, S(q), \qquad E_r(p) = \sum_{q \in H^{+}(p)} W_h(q)\, S(q)$  (14)

where $E_l(p)$ and $E_r(p)$ are the weighted sums of the left and right support arms of point $p$, respectively; $H^{-}(p)$ and $H^{+}(p)$ are the left and right support arms, whose beginning and ending positions are determined by the arm lengths of point $p$; and $S$ is a single-channel image (for RGB images, one of the channels is selected according to the need of the calculation).

(2) The weighted sum of the whole horizontal support arm of point $p$ is computed and stored in $E_H$:

$E_H(p) = E_l(p) + S(p) + E_r(p)$  (15)

(3) Based on $E_H$, the weighted sum of the vertical support arm of each point is computed. Similarly, to improve the computing efficiency, the weighted sums of the up and bottom support arms of the center point are computed separately:

$E_u(p) = \sum_{q \in V^{-}(p)} W_v(q)\, E_H(q), \qquad E_b(p) = \sum_{q \in V^{+}(p)} W_v(q)\, E_H(q)$  (16)

Here, $E_u(p)$ and $E_b(p)$ are the weighted sums of the up and bottom support arms of point $p$, respectively, and $V^{-}(p)$ and $V^{+}(p)$ are the up and bottom support arms, whose beginning and ending positions are determined by the vertical arm lengths of point $p$.

(4) The weighted aggregation result of the ACR centered at point $p$ is obtained as

$E(p) = E_u(p) + E_H(p) + E_b(p)$  (17)
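A direct sketch of the four steps (Python; the arm-length arrays and weight matrices follow the notation above, and all names are hypothetical):

```python
import numpy as np

def aggregate_acr_ow(S, mh, mv, left, right, up, bottom):
    # S: single-channel image; mh/mv: adjacent weights (equations (7)-(8));
    # left/right/up/bottom: per-pixel arm lengths of the ACR.
    H, W = S.shape
    EH = np.zeros((H, W))
    for y in range(H):                      # steps (1)-(2): horizontal arms
        for x in range(W):
            s, w = S[y, x], 1.0
            for j in range(x - 1, x - 1 - left[y, x], -1):
                w *= mh[y, j]               # recursion (12)
                s += w * S[y, j]
            w = 1.0
            for j in range(x + 1, x + 1 + right[y, x]):
                w *= mh[y, j - 1]
                s += w * S[y, j]
            EH[y, x] = s
    E = np.zeros((H, W))
    for y in range(H):                      # steps (3)-(4): vertical arms
        for x in range(W):
            s, w = EH[y, x], 1.0
            for i in range(y - 1, y - 1 - up[y, x], -1):
                w *= mv[i, x]               # recursion (13)
                s += w * EH[i, x]
            w = 1.0
            for i in range(y + 1, y + 1 + bottom[y, x]):
                w *= mv[i - 1, x]
                s += w * EH[i, x]
            E[y, x] = s
    return E
```

Note that each point is visited once per arm element, so the cost grows with the arm lengths rather than with the full region size, which is what makes the decomposition efficient.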

3.3.4. ACR-GIF-OW

On the basis of the computing method described in Section 3.3.3, we adopt ACR-GIF-OW as the cost aggregation method. Since a color guidance image has a more obvious edge-preserving effect [14], we select the color image as the guidance image. We denote the guidance image as $G$ and the filtering input as the matching cost volume $C$. The linear model coefficients $a_p$ and $b_p$ can be acquired by minimizing the weighted local energy function

$E(a_p, b_p) = \sum_{q \in U(p)} W(q,p) \big(a_p^{T} G_q + b_p - C_q\big)^{2} + \epsilon\, a_p^{T} a_p$  (18)

where $U(p)$ is the ACR centered at pixel $p$, $\epsilon$ is a regularization parameter, and $W(q,p)$ is the orthogonal weight of point $q$ defined in equation (6).

The solution to this minimization is given as

$a_p = \big(\Sigma_p + \epsilon U\big)^{-1} \left( \frac{1}{N_p} \sum_{q \in U(p)} W(q,p)\, G_q C_q - \mu_p \bar{C}_p \right), \qquad b_p = \bar{C}_p - a_p^{T} \mu_p$  (19)

Here, $N_p = \sum_{q \in U(p)} W(q,p)$ is the total orthogonal weight of the ACR, $\mu_p$ and $\bar{C}_p$ are the weighted means of $G$ and $C$ in $U(p)$, $\Sigma_p$ is the covariance matrix of $G$ in $U(p)$, and $U$ is an identity matrix.

The linear model is then used to compute the filtered result, which is also the result of cost aggregation:

$C^{A}(q) = \bar{a}_q^{T} G_q + \bar{b}_q$  (20)

where $\bar{a}_q$ and $\bar{b}_q$ are the averages of $a_p$ and $b_p$ over all ACRs that contain point $q$, and $C^{A}$ is the matching cost volume after cost aggregation.
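With an aggregation routine like the one above, equations (18)-(20) can be sketched for a grayscale guidance image (the paper uses a color guidance image, which replaces the variance below by a 3x3 covariance matrix; `agg` stands for the orthogonal-weighted ACR aggregation of Section 3.3.3, and the final averaging of the coefficients is approximated here by aggregating over each pixel's own ACR):

```python
import numpy as np

def acr_gif_ow(G, C, agg, eps):
    # G: grayscale guidance image; C: one disparity slice of the cost
    # volume; agg(X): orthogonal-weighted sum of X over each pixel's ACR.
    N = agg(np.ones_like(G))             # total orthogonal weight per ACR
    mu_G, mu_C = agg(G) / N, agg(C) / N  # weighted means
    var_G = agg(G * G) / N - mu_G ** 2
    cov_GC = agg(G * C) / N - mu_G * mu_C
    a = cov_GC / (var_G + eps)           # equation (19), scalar case
    b = mu_C - a * mu_G
    # Equation (20): average a and b over the regions covering each pixel.
    return (agg(a) / N) * G + agg(b) / N
```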

A comparison of the results of cost aggregation using ACR-GIF [28] without and with orthogonal weights is shown in Figure 4.

3.4. Disparity Computation

We use the winner-take-all strategy [13] for the disparity computation, in which the disparity corresponding to the minimum matching cost of each point in $C^{A}$ is selected as the initial disparity:

$d(p) = \arg\min_{d \in D} C^{A}(p, d)$  (21)

Here, $d(p)$ is the initial disparity of point $p$ and $D$ is the set of candidate disparities.
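In code, the winner-take-all step of equation (21) is a single argmin over the disparity axis of the aggregated cost volume:

```python
import numpy as np

def winner_take_all(cost_volume):
    # cost_volume: H x W x D aggregated costs; returns per-pixel disparity.
    return np.argmin(cost_volume, axis=2)
```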

3.5. Disparity Refinement

There are many outliers in the initial disparity map that need to be detected and corrected by disparity refinement. In this study, our recently proposed multistep refinement method [28] is adopted; each step is elucidated in the following sections.

3.5.1. Left-Right Consistency Check and Outlier Classification

The left-right consistency check judges whether the disparities of corresponding points satisfy the condition

$\left| d^{L}(p) - d^{R}\big(p - d^{L}(p)\big) \right| \le 1$  (22)

where $d^{L}$ and $d^{R}$ represent the initial disparity maps of the left and right images, respectively, and $p$ and $p - d^{L}(p)$ are the indices of the corresponding points.
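A sketch of the check in equation (22) (Python; integer disparities are assumed, and border positions are clamped):

```python
import numpy as np

def lrc_outliers(disp_l, disp_r, tol=1):
    # A pixel p is an outlier if |d_L(p) - d_R(p - d_L(p))| > tol.
    H, W = disp_l.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xr = np.clip(xs - disp_l.astype(int), 0, W - 1)
    return np.abs(disp_l - disp_r[ys, xr]) > tol   # True marks an outlier
```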

Subsequently, the detected outliers are divided into two classes: those that have a corresponding point in the right image and those that do not. The first class is called corresponding outliers, and the second is called no-corresponding outliers. The steps described below correct the two classes separately.

3.5.2. ACR Voting

To replace the disparities of outliers with those of reliable points, we first use ACR voting, in which the total number of votes of reliable points and the highest number of votes among the different disparities are counted in each outlier's ACR. We then consider the following conditions:

$N_t > \tau_t$  (23)

$\frac{N_m}{N_t} > \tau_m$  (24)

Here, $N_t$ is the total number of votes, $N_m$ is the highest number of votes among the different disparities, and $\tau_t$ and $\tau_m$ are thresholds. If both equations (23) and (24) are satisfied, the disparity corresponding to the highest number of votes replaces the outlier's disparity, and the outlier is marked as reliable. In order to deal with as many outliers as possible, this step is iterated five times.
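A sketch of the voting test for one outlier (Python; since the comparison forms of equations (23) and (24) are reconstructed here as a minimum vote count and a winning-vote ratio, the thresholds `tau_t` and `tau_m` are assumptions):

```python
import numpy as np

def acr_vote(disp, reliable, region, tau_t, tau_m):
    # region: (y, x) coordinates inside the outlier's ACR.
    votes = [disp[y, x] for y, x in region if reliable[y, x]]
    if not votes:
        return None
    values, counts = np.unique(votes, return_counts=True)
    best_d, best_n = values[np.argmax(counts)], counts.max()
    if len(votes) > tau_t and best_n / len(votes) > tau_m:
        return best_d                  # disparity with the highest vote count
    return None                        # conditions (23)-(24) not met
```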

3.5.3. ACR Four-Direction Propagation Interpolation

For corresponding outliers, the nearest reliable points are found within their own ACR along the directions of the four support arms, and the corresponding disparities are marked as $d_1$, $d_2$, $d_3$, and $d_4$. The disparities of these outliers are then replaced according to equation (25), and the outliers are marked as reliable points. To deal with as many outliers as possible, this step is iterated three times.

3.5.4. Two-Direction Propagation Interpolation

For the remaining corresponding outliers, the nearest reliable points are found along the left and right directions, and the corresponding disparities are recorded as $d_l$ and $d_r$, respectively. The disparities of these outliers are then replaced according to equation (26), and the outliers are marked as reliable points.

3.5.5. No-Corresponding Outliers Interpolation

After the above-mentioned steps, the remaining outliers are mainly no-corresponding outliers. Since such outliers usually appear in the leftmost area of the image, we use one-direction propagation interpolation; that is, the nearest reliable point is found to the right of the outlier. The disparity of the outlier is then replaced with that of this point, and the outlier is marked as reliable.

3.5.6. Subpixel Refinement

To reduce the error caused by the discontinuity of the disparity levels, an approach based on quadratic polynomial interpolation is used for subpixel refinement:

$d^{*}(p) = d - \frac{C^{A}(p, d+1) - C^{A}(p, d-1)}{2 \big( C^{A}(p, d+1) + C^{A}(p, d-1) - 2 C^{A}(p, d) \big)}$  (27)

where $d$ is the disparity of point $p$ after the previous steps, and $C^{A}(p, d-1)$ and $C^{A}(p, d+1)$ are the cost aggregation results of point $p$ when the disparity is $d-1$ and $d+1$, respectively. Finally, a median filter is used to smooth the disparity result.
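A sketch of the interpolation in equation (27):

```python
def subpixel_disparity(d, c_minus, c0, c_plus):
    # Vertex of the parabola through C(d-1), C(d), and C(d+1) (equation (27)).
    denom = 2.0 * (c_plus + c_minus - 2.0 * c0)
    return d if denom == 0 else d - (c_plus - c_minus) / denom
```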

4. Experimental Results and Discussions

We carried out our experiments on the Middlebury evaluation platform [32], whose dataset includes two parts: training sets and test sets. Each part has fifteen image pairs of different resolutions. Owing to the high resolution, complicated scene structures, and different lighting or exposure conditions, results on this dataset faithfully reflect the robustness and accuracy of an algorithm.

The parameters and thresholds in the proposed stereo matching algorithm are fixed in all experiments; several of them are defined relative to the height $h$ and width $w$ of the input image. Among them, the values of the four thresholds used in ACR construction are referenced from [23], the values of six parameters used in matching cost computation and disparity refinement are referenced from [28], and the value of the regularization parameter $\epsilon$ is the same as in [15].

4.1. Efficiency of the Proposed Weighted Aggregation Computing Method

In order to verify the effectiveness of the proposed weighted aggregation computing method, we compare the computation time of straightforward computing (the weight of each point in the ACR is computed according to equations (12) and (13) and then summed by traversal) with that of the computing method described in Section 3.3.3. The experimental environment is Matlab 2018b on a computer with an Intel Core i7-8750H CPU and 16 GB of memory. The results on the training sets are shown in Figure 5.

The chart illustrates that the computation time of the proposed computing method is obviously less than that required for straightforward computing. Among them, the computation time for Shelves is reduced by a maximum of 80.2%; the computation time for Recycle, Vintage, Jadeplant, and Adirondack is, respectively, reduced by 79.9%, 79.1%, 78.9%, and 78%. Owing to the relatively low resolution and disparity level, the percentage reduction of Teddy and ArtL is comparatively low, that is, 64.6% and 55.6%, respectively. The computation time is reduced by 72.7% on average. The above data indicate that, compared with straightforward computing, the proposed computing method can effectively reduce computing time and improve computing efficiency.

4.2. Comparison of ACR-GIF and ACR-GIF-OW

To verify the effect of the proposed cost aggregation method, two stereo matching algorithms, adopting respectively the ACR-GIF used in [28] and ACR-GIF-OW for cost aggregation, are compared in terms of disparity results and time overhead. Except for cost aggregation, all other steps of the two algorithms are identical.

4.2.1. Comparison of Disparity Results

The metric bad 2.0 is used to quantitatively evaluate the accuracy of the disparity results. It is the default metric of the Middlebury evaluation platform and represents the percentage of bad pixels whose disparity error is greater than 2.0 pixels. The results on the training sets are shown in Figure 6.

As observed in Figure 6(a), in nonoccluded regions, the values of bad 2.0 for all images except Shelves and Teddy are reduced to varying degrees. Among them, the values for Motorcycle and MotorcycleE are reduced by more than 40%; the values for Adirondack, PlaytableP, and Recycle by more than 30%; and the values for Piano, Pipes, and Vintage by more than 20%. From Figure 6(b), we can see that, in all regions, the values of bad 2.0 for all images except ArtL, Shelves, and Teddy are also reduced to varying degrees. Among them, the values for Motorcycle and MotorcycleE are reduced by more than 30%; the values for Adirondack, PlaytableP, and Recycle by more than 20%; and the values for Piano, Playtable, and Vintage by more than 15%.

Figure 7 shows the comparison of the bad 2.0 weighted average on the training sets, which is obtained from the Middlebury evaluation platform (the weight of each image is given by the platform). It can be seen from Figure 7 that the value of ACR-GIF-OW is evidently lower than that of ACR-GIF; the weighted average is reduced by 24.1% and 16.3% in nonoccluded regions and all regions, respectively.

According to the above results, we can conclude that, compared to that of ACR-GIF, the accuracy of disparity results obtained by using ACR-GIF-OW is significantly superior in both nonoccluded regions and all regions.

Next, to compare the results of the two algorithms more intuitively, we select three images from the training sets and compare the disparity maps and the corresponding error maps produced by the two algorithms, as shown in Figure 8.

We find that, in the error maps of ACR-GIF-OW, the black regions in the red boxes are noticeably smaller in area (black indicates that the disparity error is greater than 2.0 pixels); these red boxes mainly correspond to weakly textured and textureless regions of the image. This result indicates that ACR-GIF-OW improves the disparity accuracy of such regions and thereby the overall disparity accuracy.

Furthermore, the performance on regions with low texture, repetitive patterns, plain colors, and disparity discontinuities is compared, as shown in Figure 9.

It can be seen in Figure 9 (compare the areas of the black regions in the red boxes) that the performance of ACR-GIF-OW is obviously better than that of ACR-GIF, especially in low-texture, plain-color, and disparity-discontinuous regions.

4.2.2. Comparison of Time Overhead

Using the same experimental environment and computer as described in Section 4.1, the time overhead of ACR-GIF and ACR-GIF-OW is compared. The result is shown in Figure 10.

The chart illustrates that the time overhead of ACR-GIF-OW is higher than that of ACR-GIF, owing to the weight computation. Among the images, Teddy has the lowest growth rate of time overhead at 4.5%, and Adirondack has the highest at 17.6%. The average growth rate of time overhead on the training sets is 12.7%.

Combining the results in Sections 4.2.1 and 4.2.2, we can conclude that, relative to its modest increase in time overhead, ACR-GIF-OW achieves an obvious improvement in disparity accuracy. Thus, considering both accuracy and time overhead, the proposed method is advantageous over ACR-GIF.

4.3. Analysis of Parameter Setting

Parameters $\sigma$ and $\eta$ are the two key parameters in the orthogonal weight calculation. Figure 11 shows the effect of different settings of $\sigma$ and $\eta$.

Figure 11(a) indicates that once $\sigma$ exceeds a certain value, the disparity accuracy in both nonoccluded and all regions becomes worse, whereas below that value the accuracy remains essentially unchanged. Figure 11(b) reveals the setting of $\eta$ at which the disparity accuracy achieves its best in both nonoccluded and all regions. According to these observations, $\sigma$ and $\eta$ are set to the best-performing values in Figure 11.

4.4. Effect of Each Step in the Proposed Algorithm

The proposed algorithm is composed of several steps. To analyze how each step affects the final result, in addition to bad 2.0, the weighted average of avgerr on the training sets is also used; avgerr is a metric measuring the average absolute disparity error in pixels. The results after performing each step are shown in Figure 12.

Figure 12 presents the contribution of each step to the reduction of disparity errors in both nonoccluded and all regions. After performing cost aggregation (CA), the value of bad 2.0 in nonoccluded and all regions is decreased by 39.4% and 31.1%, respectively, and the value of avgerr by 56.4% and 34.4%, respectively. After performing disparity refinement (DR), the value of bad 2.0 is further reduced by 27.7% and 22.9% in nonoccluded and all regions, respectively, and the value of avgerr by 29.5% and 47.6%, respectively.

Moreover, Figure 13 shows the effect of each step in DR. The charts indicate that the contribution of each step differs across metrics and regions. For bad 2.0, Step 5 is the most effective, whereas for avgerr the errors are most significantly reduced by Step 1. Steps 2, 3, and 4 are more effective in all regions than in nonoccluded regions. Thus, the combination of these steps guarantees a better result.

4.5. Comparison with State-of-the-Art Stereo Matching Algorithms
4.5.1. Comparison with Other Local Stereo Matching Algorithms

To verify the performance of the local stereo matching algorithm proposed in this paper, we select seven state-of-the-art local algorithms for comparison, namely, DAWA-F [33], FASW [20], IEBIMst [34], ADSM [35], DoGGuided [36], IGF [37], and ISM [38]. The disparity map comparison of five stereo images in Middlebury datasets is shown in Figure 14.

To make a quantitative comparison of the disparity results, the metric bad 2.0 is employed again. The comparison results on the whole datasets are shown in Tables 1 and 2, where bold font indicates the best results. The weighted average shown in the last row is computed over the training and test sets, with the weights given by the Middlebury evaluation platform.

The results in Tables 1 and 2 indicate that, in both nonoccluded regions and all regions, the number of best results obtained by the proposed method is higher than that of the other local algorithms, and the remaining results are also relatively good. Both weighted average values are the best as well. Furthermore, for image pairs with different illuminations, such as ArtL, PianoL, and DjembeL, and with different exposures, such as MotorcycleE and Classroom2E, better results are acquired by the proposed algorithm, demonstrating better robustness when the illumination or exposure changes within a pair of images. In summary, the performance of the proposed algorithm is evidently better than those of the other seven state-of-the-art local algorithms.

4.5.2. Comparison with Other Nonlocal Stereo Matching Algorithms

In order to make a more comprehensive comparison, in addition to the local algorithms, we also select six state-of-the-art nonlocal algorithms for comparison, namely, DDL [39], LS_ELAS [40], TSGO [41], DSGCA [42], SIGMRF [43], and SPPSMNet [44]. Among them, DSGCA, SIGMRF, and SPPSMNet are based on deep learning. The disparity map comparison of the same five stereo images is shown in Figure 15.

The comparison results of bad 2.0 are shown in Tables 3 and 4, where bold font again indicates the best results.

Similar to Tables 1 and 2, the results in Tables 3 and 4 indicate that the robustness and performance of the proposed algorithm are obviously better than those of the other six state-of-the-art nonlocal algorithms.

5. Conclusions

In this study, an improved cost aggregation method is proposed, in which the matching cost volume is filtered by ACR-GIF-OW. Different from other methods that adopt ACR-GIF, the proposed method takes the orthogonal weight of each point in the ACR into consideration. To improve the computational efficiency of the proposed method, a weighted aggregation computing method based on orthogonal weights is proposed. Moreover, a local stereo matching algorithm using ACR-GIF-OW is proposed as well. Experimental results demonstrate that, compared with ACR-GIF, the disparity accuracy of ACR-GIF-OW is significantly improved at the cost of a small increase in time overhead, and that the proposed stereo matching algorithm outperforms other state-of-the-art local and nonlocal algorithms. In future work, we will introduce the orthogonal weight into the disparity refinement to further improve the disparity accuracy.

Data Availability

The dataset used to support the findings of this study is included in the article and cited at the relevant places within the text as [32].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61901287), the Key Research and Development Project of Sichuan Province (Grant no. 2020YFG0112; Grant no. 2020YFG0306), the Major Science and Technology Project of Sichuan Province (Grant nos. 2018GZDZX0024, 2019ZDZX0039, and 2018GZDZX0029), and the Science and Technology Planning Project of Sichuan Province (Grant no. 2020YFG0288). The authors would like to thank Associate Professor Yue Wu for reviewing the manuscript.