Mathematical Problems in Engineering

Volume 2019, Article ID 3128172, 11 pages

https://doi.org/10.1155/2019/3128172

## Efficient Stereo Matching Based on Pervasive Guided Image Filtering

^{1}School of Microelectronics, Tianjin University, Tianjin 300072, China

^{2}Department of Mechanical Engineering, Chang Gung University, Taoyuan 33302, Taiwan

^{3}Department of Neurosurgery, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan

Correspondence should be addressed to Yau-Zen Chang; zen@mail.cgu.edu.tw

Received 4 March 2019; Accepted 3 April 2019; Published 17 April 2019

Academic Editor: Oscar Reinoso

Copyright © 2019 Chengtao Zhu and Yau-Zen Chang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper presents an effective cost aggregation strategy for dense stereo matching. Based on guided image filtering (GIF), we propose a new aggregation scheme called Pervasive Guided Image Filtering (PGIF), which introduces weights into the energy function of the filter so that the whole image pair is taken into account. The filter parameters of PGIF are calculated as two-dimensional convolutions using the brightness and spatial differences between corresponding pixels, which can be calculated incrementally for efficient aggregation. The complexity of the proposed algorithm is O(*N*), which is linear in the number of image pixels. Furthermore, the computation can be reduced to O(*N*/4) without significantly sacrificing accuracy if subsampling is applied in the stage of parameter calculation. We also found that a step function to attenuate noise is required in calculating the weights. Experimental evaluation on version 3 of the Middlebury stereo evaluation datasets shows that the proposed method achieves superior disparity accuracy over state-of-the-art aggregation methods with comparable processing speed.

#### 1. Introduction

Stereoscopic vision is a less invasive sensing approach that is valuable for many applications, such as 3D reconstruction and environmental detection for autonomous vehicles, as it relies only on pairs of images captured from different perspectives. In stereo vision systems, stereo matching algorithms are critical for accurate disparity estimation: for each pixel in the reference image, they find the corresponding pixel in the matching image.

According to [1], dense stereo matching algorithms fall into two categories: local methods and global methods. The global methods treat disparity calculation as a minimization problem, where the objective function consists of a measurement part and a penalty part. The measurement part indicates the similarity between patches in the pair of images, and the penalty part suppresses changes in disparity. Representative global methods include Belief Propagation [2], Graph Cut [3], and Dynamic Programming [4]. These techniques are computationally demanding and are not suitable for real-time applications.

In contrast, the local methods are popular for fast disparity calculations. Local approaches tackle the effects of light changes through local windows and are categorized into parametric [5] and nonparametric methods [6]. Local algorithms generally perform four stages [1]: (1) preliminary matching cost calculations, (2) cost aggregation over support regions, (3) disparity estimation, and (4) disparity refinement.

Cost aggregation plays a key role in local stereo matching algorithms. Early approaches, such as in [7], achieved limited performance, especially in discontinuous and occluded regions. To accommodate the assumption that pixels within the same aggregation window have similar disparity [8], many adaptive weighting aggregation methods, such as in [9–11], were presented.

Recent years have seen a trend to treat the cost aggregation as image filtering. The guided image filter [12] (GIF) is able to provide superior edge profiles without gradient-reversal artifacts, and was successfully applied to cost aggregation in [13]. In order to reduce the computational load of the GIF, [14] recommends subsampling the cost and the guidance image to calculate the coefficients. Ref. [15] proposed the Weighted Guided Image Filtering (WGIF) to improve GIF by modifying the regularization term of the energy function. Besides, [16, 17] applied WGIF for stereo matching with limited performance because of the lack of pixel information outside the fixed windows.

In order to improve the precision of stereo matching, a series of approaches with adaptive guided filters were proposed [18–20] to remove the limitation of the fixed-window formulation. Among them, [20] adaptively tunes the size of rectangular support windows based on both the intensity difference and the distance between the neighboring pixel and the central pixel. However, these approaches still suffer from the loss of information outside the windows. Ref. [21] uses the whole image for matching cost calculation, where the weights of aggregation are computed according to a measure of similarity between neighboring pixels in the guidance image. In addition, it proposes a new scheme called weight propagation to calculate the weights efficiently.

In this work, we extend the scheme of GIF [12] for disparity cost aggregation, as suggested in [13], by introducing bilateral weights that take distance and intensity differences into account, as suggested in [21]. This approach uses the whole image for aggregation. Similar to the convolution procedure of [21], the complexity of the proposed algorithm is O(*N*), which is linear in the image resolution. We call our approach Pervasive Guided Image Filtering, denoted as PGIF.

The main contribution of this paper is a new cost aggregation algorithm for stereo matching which can be summarized as follows:

(1) An innovative aggregation scheme is proposed that weights the cost function of the GIF to allow for consideration of the entire image pair.

(2) We demonstrate that a constraint modification of the aggregation weight by a step function is crucial to avoid value reduction in the weight propagation process. This minor modification further improves the accuracy of disparity.

(3) The proposed aggregation algorithm can be calculated as two-dimensional convolution with complexity O(*N*) using the weight-propagation method of [21]. In addition, the computation can be reduced to O(*N*/4) without significantly sacrificing accuracy if subsampling is applied in calculating the parameters of the guided image filtering, as suggested by [14].

(4) A performance evaluation on version 3 of the Middlebury stereo evaluation datasets demonstrates the superiority of the proposed method over most state-of-the-art aggregation methods in terms of disparity accuracy and processing speed.

#### 2. Aggregations of Preliminary Cost

##### 2.1. Cost Function Definition and GIF Aggregation

In a local stereo matching procedure, a disparity map is obtained through four steps: (1) computation of the preliminary matching cost, (2) aggregation of the cost via volumetric filtering, (3) disparity selection via winner-take-all (WTA), and (4) postprocessing for disparity refinement. In the first step, the preliminary matching cost is a three-dimensional array of dissimilarity measures for every pixel at every candidate disparity. When a pixel at $p = (x, y)$ is assigned a disparity value *d*, we use the notation $C(p, d)$ for the preliminary cost based on the left image.

There are many metrics that can be used to measure the degree of matching between image patches, such as sum of squared difference and normalized cross correlation. In the following investigations, we use a basic metric, the truncated absolute difference of the gradient, for the cost of the pixel $p = (x, y)$:

$$C(p, d) = \min\left(\left|\nabla_x I_L(x, y) - \nabla_x I_R(x - d, y)\right|,\; T\right), \tag{1}$$

where $I_L$ and $I_R$ are the left and right images of the stereo pair and $T$ is a truncation threshold, normally assigned as 2, which is applied to reduce mismatch in noisy or occluded regions. Using such a simple cost also helps isolate the effectiveness of the proposed aggregation scheme.
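The truncated gradient cost described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `gradient_cost_volume`, the use of central differences for the horizontal gradient, and the choice to fill positions without a valid match with the truncation value are not taken from the paper.

```python
import numpy as np

def gradient_cost_volume(left, right, max_disp, trunc=2.0):
    """Truncated absolute difference of horizontal gradients.

    left, right: 2-D grayscale arrays of the same shape.
    Returns an array of shape (max_disp + 1, H, W).
    """
    # Horizontal gradient by central differences (an assumption; the
    # text does not specify the gradient operator).
    grad_l = np.gradient(left.astype(np.float64), axis=1)
    grad_r = np.gradient(right.astype(np.float64), axis=1)

    h, w = left.shape
    # Positions with no valid match (x - d < 0) keep the truncation value.
    cost = np.full((max_disp + 1, h, w), trunc, dtype=np.float64)
    for d in range(max_disp + 1):
        # Left pixel x is compared with right pixel x - d.
        diff = np.abs(grad_l[:, d:] - grad_r[:, : w - d])
        cost[d, :, d:] = np.minimum(diff, trunc)
    return cost
```

Each slice `cost[d]` is the two-dimensional cost map for one candidate disparity, which is the form that the aggregation filters discussed next operate on.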

The general form of cost aggregation in the traditional local stereo matching algorithms can be written as a weighted sum of the preliminary matching cost:

$$C^{A}(p, d) = \sum_{q \in \omega_p} W(p, q)\, C(q, d), \tag{2}$$

where $C^{A}(p, d)$ is the aggregated cost and $q$ are pixels within a window $\omega_p$ centered at pixel $p$.

Equation (2) is a general form of window-based cost aggregation. When the aggregation weight is set to 1, we have the simplest stereo matching algorithm, such as in [1]. The performance of the window-based approaches depends on the correct choice of the size of the support window, since the effects of external pixels are ignored.
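With unit weights, window-based aggregation reduces to a box filter applied to each disparity slice, followed by winner-take-all selection. The sketch below assumes a cost volume of shape (D, H, W); the names `box_filter` and `aggregate_and_select` are hypothetical, and the window sum is divided by the window area so that clipped windows at the image border remain comparable.

```python
import numpy as np

def box_filter(img, radius):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = img.shape
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    y0 = np.clip(np.arange(h) - radius, 0, h)
    y1 = np.clip(np.arange(h) + radius + 1, 0, h)
    x0 = np.clip(np.arange(w) - radius, 0, w)
    x1 = np.clip(np.arange(w) + radius + 1, 0, w)
    total = (integral[y1][:, x1] - integral[y0][:, x1]
             - integral[y1][:, x0] + integral[y0][:, x0])
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    return total / area

def aggregate_and_select(cost, radius):
    """Box-aggregate every disparity slice, then pick the WTA disparity."""
    aggregated = np.stack([box_filter(c, radius) for c in cost])
    return aggregated.argmin(axis=0)
```

Because of the integral image, the cost per pixel does not depend on the window radius, but every pixel outside the window is ignored, which is the limitation noted above.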

Full Image-Guided Filtering [21] (FIF) is one of the methods that use the entire image as the support window. It implements a scheme called weight propagation for the cost aggregation:

$$W(p, q) = \prod_{(u, v) \in P_{p,q}} \exp\left(-\frac{D(u, v)}{\sigma}\right). \tag{3}$$

In (3), $\sigma$ is a constant, $P_{p,q}$ is the path between a pixel pair $(p, q)$, and $D(u, v)$ is the Euclidean distance between the intensities of two pixels, $u$ and $v$, along the path. Note that the guided image filtering [12] (GIF) scheme is not employed in this method.
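The reason a path-based weight can be aggregated in linear time is easiest to see on a single scanline. The sketch below is a one-dimensional illustration only, not the formulation of [21]: it assumes that the weight between two pixels is the product of terms exp(-|ΔI|/σ) over adjacent pixels on the path between them, and the names `propagate_1d` and `brute_force_1d` are hypothetical. A forward and a backward recursion then reproduce the full weighted sum.

```python
import numpy as np

def propagate_1d(cost, intensity, sigma):
    """O(N) aggregation on one scanline with path-product weights."""
    cost = np.asarray(cost, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)
    n = cost.size
    # t[i] is the weight linking pixel i and pixel i + 1.
    t = np.exp(-np.abs(np.diff(intensity)) / sigma)
    # Forward pass: contributions from pixels at or left of i.
    left = cost.copy()
    for i in range(1, n):
        left[i] += t[i - 1] * left[i - 1]
    # Backward pass: contributions from pixels at or right of i.
    right = cost.copy()
    for i in range(n - 2, -1, -1):
        right[i] += t[i] * right[i + 1]
    # The pixel itself was counted in both passes.
    return left + right - cost

def brute_force_1d(cost, intensity, sigma):
    """O(N^2) reference: explicit product of weights along each path."""
    cost = np.asarray(cost, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)
    n = cost.size
    t = np.exp(-np.abs(np.diff(intensity)) / sigma)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            lo, hi = min(i, j), max(i, j)
            out[i] += np.prod(t[lo:hi]) * cost[j]
    return out
```

Both functions return the same values up to rounding, but the recursive version touches each pixel a constant number of times. A two-dimensional scheme can apply the same recursion along rows and then along columns.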

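On the other hand, the Fast Cost-Volume Filtering method [13] (FCVF) achieves significant performance compared to most local stereo matching methods. In this method, the weights of (2) are not explicitly calculated. Instead, based on the principle of GIF [12], the filtered cost assumes a local linear model in terms of the guidance image $I$ such that

$$C^{A}(q, d) = a_p I_q + b_p, \quad \forall q \in \omega_p, \tag{4}$$

where $a_p$ and $b_p$ are obtained by minimizing an energy function within the supporting window $\omega_p$. The energy function for the parameters, *a* and *b*, is defined as

$$E(a_p, b_p) = \sum_{q \in \omega_p} \left( \left(a_p I_q + b_p - C(q, d)\right)^2 + \epsilon a_p^2 \right), \tag{5}$$

where $\epsilon$ is the regularization parameter used to limit the magnitude of $a_p$, usually chosen to be 0.0001. The optimal values of the linear parameters in (5), denoted as $a_p^{*}$ and $b_p^{*}$, are

$$a_p^{*} = \frac{\frac{1}{|\omega|} \sum_{q \in \omega_p} I_q\, C(q, d) - \mu_p \bar{C}_p}{\sigma_p^2 + \epsilon}, \qquad b_p^{*} = \bar{C}_p - a_p^{*} \mu_p, \tag{6}$$

where $|\omega|$ is the number of pixels in the support window centered on $p$, $\mu_p$ and $\sigma_p^2$ are the mean and variance of $I$ in $\omega_p$, and $\bar{C}_p$ is the mean of $C(\cdot, d)$ in $\omega_p$. This GIF-based aggregation can also be rearranged into the form of (2) by replacing the parameters of (4) with the optimal values of (6):

$$W(p, q) = \frac{1}{|\omega|^2} \sum_{k:\,(p, q) \in \omega_k} \left( 1 + \frac{(I_p - \mu_k)(I_q - \mu_k)}{\sigma_k^2 + \epsilon} \right), \tag{7}$$

where $\mu_k$ is the mean value of $I$ and $\sigma_k^2$ is the variance of $I$ in the window $\omega_k$.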
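The GIF-based aggregation of one disparity slice can be sketched with box means only. This is a minimal illustration assuming a grayscale guidance image scaled to [0, 1] and square windows clipped at the border; the names `box_mean` and `guided_filter_slice` are hypothetical, and the final averaging of the per-window coefficients follows the standard guided filter of [12].

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = img.shape
    s = np.zeros((h + 1, w + 1))
    s[1:, 1:] = img.cumsum(0).cumsum(1)
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    return total / ((y1 - y0)[:, None] * (x1 - x0)[None, :])

def guided_filter_slice(guide, cost_slice, r=4, eps=1e-4):
    """Filter one cost slice with the guidance image (local linear model)."""
    mean_i = box_mean(guide, r)
    mean_c = box_mean(cost_slice, r)
    # Window covariance of guide and cost, and window variance of the guide.
    cov_ic = box_mean(guide * cost_slice, r) - mean_i * mean_c
    var_i = box_mean(guide * guide, r) - mean_i ** 2
    # Regularized least-squares coefficients of the linear model.
    a = cov_ic / (var_i + eps)
    b = mean_c - a * mean_i
    # Each pixel lies in several windows; average their linear models.
    return box_mean(a, r) * guide + box_mean(b, r)
```

Applying `guided_filter_slice` to every disparity slice of the cost volume gives the aggregated volume. All operations are box means, so the run time is independent of the window radius, but information outside each window is still not used.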

##### 2.2. The Proposed Aggregation Scheme: Pervasive Guided Image Filtering (PGIF)

The preceding section shows that a proper image filtering algorithm is critical for the accuracy of stereo matching. However, most GIF-based local stereo matching methods suffer from the same problem of missing information outside the supporting window.

In this section, we propose a GIF-based scheme that uses the full image for the filtering and call it the Pervasive Guided Image Filtering (PGIF). Similar to (4), the scheme also approximates the filtered cost as a linear model in terms of the guidance image $I$. However, the corresponding energy function for the linear parameters is defined as a weighted version of (5), summed over all pixels $q$ of the image:

$$E(a_p, b_p) = \sum_{q} W(p, q) \left( \left(a_p I_q + b_p - C(q, d)\right)^2 + \epsilon a_p^2 \right), \tag{8}$$

where $\epsilon$ is a regularization parameter to restrict the value of $a_p$ and the weight $W(p, q)$ reflects the relative importance of the pixel *q* = (*i*, *j*) with respect to the pixel *p* = (*x*, *y*). Importantly, the weight is calculated as a multiplication of horizontal and vertical weighting factors, which are given in (9)–(11).

The parameter is a constant filter factor, where the larger value corresponds to a higher weight. As , successive multiplication of them ensures smaller weights for longer distances. Furthermore, the step function defined in (11) is introduced to avoid the loss of information when there is a significant local density difference along the path from pixel to pixel . By this arrangement, the values of the weightings stay within the interval , which are proportional to density similarity.

The necessity of introducing the function is illustrated in Figure 1, where all pixel values are 10, except , which is 99. Assuming , we have that However, if is not applied, we have that , making the intensity of invisible to both and