Abstract
A stereo similarity function based on local multi-model monogenic image feature descriptors (LMFD) is proposed to match interest points and estimate the disparity map of stereo images. Local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and the local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which extends the analytic signal to gray level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of a 2D image signal. The color monogenic signal extends the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images demonstrate the performance of the proposed approach.
1. Introduction
Shape and motion estimation from stereo images has been one of the core challenges in computer vision for decades. The robust and accurate computation of stereo depth is an important problem for many visual tasks such as machine vision, virtual reality, robot navigation, simultaneous localization and mapping, depth measurement, and 3D environment reconstruction. Most conventional approaches, such as intensity-based or correlation-based matching, feature-based matching, and matching function optimization techniques, estimate the disparity based only on local intensities and features of the stereo images, so the results may be susceptible to level shift, scaling, rotation, and noise [1, 2].
To overcome these drawbacks, we propose a new method for establishing spatial correspondences between a pair of color images. Unlike classical stereo-matching methods based on the brightness constancy assumption and the phase congruency constraint, we match feature points and estimate the disparity map between stereo images based on new local multi-model monogenic image feature descriptors in the Color Monogenic Signal framework [3, 4]. We first introduce the monogenic signal [5] of a 2D gray level image using the Dirac operator and the Laplace equation and extract local amplitude, local orientation, and instantaneous phase information in multiscale space [6]. At the same time, the 2D monogenic signal is extended to color images, and the color monogenic signal is introduced based on Clifford algebras. The local color phase is estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space with Clifford algebras [3, 4]. We then define new local multi-model monogenic image feature descriptors which contain local geometric (local orientation), structure (instantaneous phase), and color (local color phase and color values) information in the Color Monogenic Signal framework. Based on the proposed image feature descriptors, a stereo similarity function between two primitives is also defined to solve the stereo correspondence problem. Finally, we test the performance of the proposed approach on synthetic and natural stereo images, and the experimental results are given in detail.
2. Monogenic Signal Analysis in Poisson Scale Space
2.1. Modeling 2D Image Signal
Based on the results of Fourier theory and functional analysis, we assume that each 2D signal can be locally modeled by a superposition of arbitrarily oriented one-dimensional cosine waves [6]:
$$f_p(z; s) = \Big(P_s * \sum_{i=1}^{n} a \cos\big(\langle z, \bar{o}_i \rangle + \varphi\big)\Big)(z), \qquad (2.1)$$
with $*$ as the convolution operator and the orientation $\bar{o}_i = [\cos\theta_i, \sin\theta_i]^T$. Note that each cosine wave is determined by the same amplitude $a$ and phase $\varphi$. The Poisson convolution kernel reads
$$P_s(z) = \frac{s}{2\pi\left(s^2 + \|z\|^2\right)^{3/2}}. \qquad (2.2)$$
For a certain scale space parameter $s > 0$, the Poisson kernel acts as a low-pass filter on the original signal $f$. The Poisson scale space is naturally related to the generalized Hilbert transform by the Cauchy kernel. To filter a frequency interval of interest, the difference of Poisson (DoP) kernel is used in practice:
$$P_{s_f, s_c}(z) = P_{s_f}(z) - P_{s_c}(z), \qquad (2.3)$$
with $s_c > s_f > 0$, where $s_c$ is the coarse scale parameter and $s_f$ is the fine scale parameter. The filtered signal is defined by convolution of the original signal with the DoP kernel, so that only a small passband of the original signal spectrum is considered. Without loss of generality, the signal model in (2.1) degrades locally at the origin of a local coordinate system to
$$f_p(0; s) = a \sum_{i=1}^{n} \cos\varphi = n\, a \cos\varphi. \qquad (2.4)$$
In image analysis, lines, edges, junctions, and corners can be modeled in this way. The signal processing task is now to determine the local amplitude $a$, the local orientation $\theta$, and the local phase $\varphi$ for a certain scale space parameter $s$ and a certain location $z$. This problem has already been solved for one-dimensional signals by the classical analytic signal [7] by means of the Hilbert transform [8], and for intrinsically one-dimensional signals [9] by the two-dimensional monogenic signal by means of the generalized first-order Hilbert transforms [10].
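The band-pass behaviour of the difference of Poisson kernel can be illustrated with a minimal NumPy sketch. It assumes the filter is applied in the frequency domain, where a 2D Poisson kernel at scale s has the transfer function exp(-2π|u|s) with |u| in cycles per pixel, and it assumes periodic image boundaries (a consequence of using the FFT). The function names `poisson_transfer` and `dop_filter` and the scale values are illustrative, not taken from the paper.

```python
import numpy as np

def poisson_transfer(shape, s):
    """Frequency response exp(-2*pi*|u|*s) of a 2D Poisson kernel at scale s."""
    v = np.fft.fftfreq(shape[0])[:, None]  # vertical frequency, cycles/pixel
    u = np.fft.fftfreq(shape[1])[None, :]  # horizontal frequency, cycles/pixel
    return np.exp(-2.0 * np.pi * np.hypot(u, v) * s)

def dop_filter(image, s_fine, s_coarse):
    """Band-pass an image with a difference of Poisson (fine minus coarse) filter."""
    if not 0 < s_fine < s_coarse:
        raise ValueError("expected 0 < s_fine < s_coarse")
    image = np.asarray(image, dtype=float)
    response = poisson_transfer(image.shape, s_fine) - poisson_transfer(image.shape, s_coarse)
    # The response is real and symmetric, so the filtered image is real.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * response))
```

Because both Poisson responses equal 1 at zero frequency, their difference removes the mean gray level, which is one reason a band-pass representation is less sensitive to a constant brightness offset.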
2.2. The Analytic Signal
Let : be a real-valued signal, and let : [3] be a vector-valued signal such that [4]. The purpose is to construct a function fulfilling the Dirac equations whose real part is real-valued signal. It is equivalent to find the solution of a boundary value problem of the second kind (a Neumann problem): with and .
The first equation in (2.5) is the 2D Laplace equation restricted to the open domain . The second equation is called the boundary condition and the basis vector. is coherent with the embedding of complex functions as fields (the real part is embedded as the -component). Using the fundamental solution of the 2D Laplace equation, the solution of the problem leads to where is the 1D-Poisson kernel and is the Hilbert kernel. The variable is a scale parameter, and, taking it equal to zero, the classical analytic signal can be obtained.
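As a rough numerical illustration of this construction, the following sketch computes a scale-space analytic signal of a sampled 1D signal in the frequency domain. It assumes a periodic signal, uses the standard facts that the 1D Poisson kernel has the transfer function exp(-2π|u|s) and that adding i times the Hilbert transform keeps only non-negative frequencies, and represents the result as a complex array rather than with Clifford basis vectors. The function name is hypothetical.

```python
import numpy as np

def scale_space_analytic_signal(f, s):
    """Complex analytic signal of a 1D signal, smoothed by a Poisson kernel of scale s."""
    f = np.asarray(f, dtype=float)
    u = np.fft.fftfreq(f.size)                     # frequency in cycles/sample
    poisson = np.exp(-2.0 * np.pi * np.abs(u) * s) # low-pass at scale s
    one_sided = 1.0 + np.sign(u)                   # 2 for u > 0, 1 at DC, 0 for u < 0
    return np.fft.ifft(np.fft.fft(f) * poisson * one_sided)
```

With `s = 0` the Poisson factor is 1 everywhere and the function reduces to the classical analytic signal; the magnitude and angle of the returned array give the local amplitude and phase.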
2.3. Instantaneous Phase of the Monogenic Signal
Following the previous construction of the analytic signal, Michael Felsberg and Gerald Sommer proposed an extension to 2D signals and defined the monogenic signal, which combines a gray image with its Riesz transform [5]. Let $f$ be a real-valued signal and $\mathbf{f}$ a vector-valued signal, and let $\{e_1, e_2, e_3\}$ be the orthonormal basis of $\mathbb{R}^3$ such that $\mathbf{f} = f e_3$. According to the 3D Laplace equation restricted to the open half-space and the boundary condition of the second kind, we can obtain the monogenic signal as follows:
$$f_M(x, y; s) = (p * f)(x, y)\, e_3 + (h_x * p * f)(x, y)\, e_1 + (h_y * p * f)(x, y)\, e_2,$$
where $p(x, y; s) = \frac{s}{2\pi (x^2 + y^2 + s^2)^{3/2}}$ is the 2D Poisson kernel, $(h_x, h_y) = \frac{(x, y)}{2\pi (x^2 + y^2)^{3/2}}$ is the Riesz kernel, the extension to 2D of the Hilbert kernel, and the third coordinate is equal to the scale space parameter $s$. Now let $f_p = p * f$ and $(f_x, f_y) = (h_x * p * f,\ h_y * p * f)$.
The intrinsic dimension expresses the number of degrees of freedom necessary to describe local structure. Constant signals without any structure are of intrinsic dimension zero (i0D), arbitrarily oriented straight lines and edges are of intrinsic dimension one (i1D), and all other possible patterns such as corners and junctions are of intrinsic dimension two (i2D). In general, i2D signals can only be modeled by an infinite number of superimposed i1D signals. In the case of i1D signals (i.e., $n = 1$ in (2.1)), we obtain
$$[f_p,\ f_x,\ f_y]^T = a\,[\cos\varphi,\ \sin\varphi\cos\theta,\ \sin\varphi\sin\theta]^T,$$
where $a$, $\theta$, and $\varphi$ represent the local amplitude, local orientation, and instantaneous phase, respectively:
$$a = \sqrt{f_p^2 + f_x^2 + f_y^2}, \qquad \theta = \arctan\frac{f_y}{f_x}, \qquad \varphi = \operatorname{atan2}\Big(\sqrt{f_x^2 + f_y^2},\ f_p\Big).$$
However, we do not know the correct sign of the phase since it depends on the directional sense of $\theta$. The best possible solution is to project it onto as
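The extraction of local amplitude, local orientation, and instantaneous phase can be sketched as follows. This is a minimal frequency-domain illustration under stated assumptions, not the authors' implementation: the Riesz transform is applied with the transfer function -i·u/|u| (sign conventions differ between references), the band-pass is a difference of Poisson filter, image boundaries are treated as periodic, and the orientation is reduced modulo π because its directional sense is ambiguous. The function name and default scales are hypothetical.

```python
import numpy as np

def monogenic_features(image, s_fine=1.0, s_coarse=4.0):
    """Return (amplitude, orientation, phase) of a band-passed gray image."""
    image = np.asarray(image, dtype=float)
    v = np.fft.fftfreq(image.shape[0])[:, None]   # vertical frequency
    u = np.fft.fftfreq(image.shape[1])[None, :]   # horizontal frequency
    radius = np.hypot(u, v)
    bandpass = (np.exp(-2 * np.pi * radius * s_fine)
                - np.exp(-2 * np.pi * radius * s_coarse))
    safe_radius = np.where(radius == 0, 1.0, radius)  # avoid 0/0 at DC
    spectrum = np.fft.fft2(image) * bandpass
    f_p = np.real(np.fft.ifft2(spectrum))                              # even part
    f_x = np.real(np.fft.ifft2(spectrum * (-1j * u / safe_radius)))    # Riesz x
    f_y = np.real(np.fft.ifft2(spectrum * (-1j * v / safe_radius)))    # Riesz y
    odd = np.hypot(f_x, f_y)
    amplitude = np.hypot(f_p, odd)
    orientation = np.mod(np.arctan2(f_y, f_x), np.pi)  # in [0, pi), sense dropped
    phase = np.arctan2(odd, f_p)                       # in [0, pi], unsigned
    return amplitude, orientation, phase
```

For a pure horizontal cosine grating the amplitude is constant over the image and `amplitude * cos(phase)` reproduces the band-passed image, which matches the i1D decomposition described above.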
2.4. Local Color Phase of the Color Monogenic Signal
In 2009, Demarcq et al. constructed a scale-space signal for color images seen as vectors in [4]. Let be a real-valued signal and a vector-valued signal, and is the orthonormal basis of such that . Then a color image is decomposed in the RGB space represented as the subspace spanned by . Now we need to find a function which is monogenic and the -, - and -component, of which are the components , , and , respectively. with and . A scale-space signal which has independent scales in each component can be defined by splitting the problem into three boundary value problems in as follows:
Each solution of the system in (2.12) leads to monogenic functions , , : they satisfy the Dirac equation in each subspace and consequently the Dirac equation in . Let , then is still monogenic in (i.e., ) and satisfies the boundary conditions. So the scale-space color monogenic signal can be obtained by using the Dirac operator and the Laplace equation as follows: where is a 2D Poisson kernel and is the Riesz kernel. As for the analytic or monogenic signal, a color image can be represented in terms of local amplitude and local phase. Now our proposal is to use the geometric product in order to compare two vectors in .
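In the Clifford algebra of a Euclidean vector space, the product of two vectors $a$ and $b$ is given by [3, 4]
$$ab = a \cdot b + a \wedge b,$$
where $a \cdot b$ is the inner product and $a \wedge b$, the wedge product of $a$ and $b$, is a bivector. This product is usually called the geometric product of $a$ and $b$. The inner product is symmetric, and the wedge product is skew-symmetric. If $V$ is a chosen vector containing structure information and color information, then the geometric product of the color monogenic signal $f_M$ and $V$ can be given by
$$f_M V = \langle f_M V \rangle_0 + \langle f_M V \rangle_2 = \langle f_M V \rangle_0 + \frac{\langle f_M V \rangle_2}{|\langle f_M V \rangle_2|}\, |\langle f_M V \rangle_2|,$$
where $\langle \cdot \rangle_0$ denotes the scalar part, $\langle \cdot \rangle_2$ the bivector part, and $|\langle \cdot \rangle_2|$ the magnitude of the bivector part [6]. According to Clifford algebras, the geometric product reveals the relationship between bivectors and complex numbers [3]. This means that we can form the equivalent of a complex number by combining a scalar and a unit bivector. The local color phase can be computed as follows:
$$\varphi_c = \operatorname{atan2}\big(|\langle f_M V \rangle_2|,\ \langle f_M V \rangle_0\big).$$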
This phase describes the angular distance between and a given vector in , that is, it gives a correlation measure between a pixel fitted with color and structure information and a vector containing chosen color and structure.
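The angular-distance interpretation can be made concrete with a small sketch. It relies on two standard identities for vectors a and b: the scalar part of the geometric product is the inner product a·b, and the magnitude of the bivector part satisfies |a∧b|² = |a|²|b|² − (a·b)². The layout of the input (one feature vector per pixel in the last array axis, holding whatever structure and color components are chosen) and the function name are assumptions made for illustration.

```python
import numpy as np

def local_color_phase(signal, reference):
    """Angle between per-pixel vectors and a reference vector via the geometric product.

    signal:    array of shape (..., n), one n-dimensional vector per pixel
    reference: array of shape (n,), the chosen reference vector
    """
    signal = np.asarray(signal, dtype=float)
    reference = np.asarray(reference, dtype=float)
    scalar_part = signal @ reference                      # inner product a . b
    squared_bivector = (np.sum(signal ** 2, axis=-1) * np.dot(reference, reference)
                        - scalar_part ** 2)               # |a ^ b|^2
    bivector_norm = np.sqrt(np.maximum(squared_bivector, 0.0))  # clip rounding noise
    return np.arctan2(bivector_norm, scalar_part)         # phase in [0, pi]
```

A pixel whose vector is parallel to the reference gives phase 0, an orthogonal one gives π/2, and an antiparallel one gives π.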
3. Local Multi-Model Monogenic Image Feature Descriptors
We make use of a visual representation [11] which is motivated by feature processing in the human visual system and define new local multi-model monogenic image descriptors which give an explicit and condensed representation of the local image signal as follows: In fact, this representation performs a considerable condensation of information in a local image patch of pixels. The symbol represents central coordinates of the local image patch, is the instantaneous phase of the gray monogenic signal, is the local color phase, and is the color values in RGB color space.
Based on the local multi-model image feature descriptors, a stereo similarity function between two primitives is the weighted sum of squared differences of instantaneous phase, local color phase, and color vector for a pair of stereo images in the local patch: where represent the left and right images, respectively, is the distance measurement of instantaneous phase , is the distance measurement of local color phase , and is the distance measurement of color vector with and in RGB color space. The symbols are weighted coefficients with and . In order to achieve a better coherence with the real scene, we use an adaptive support-weight technique in the local patch [12]. The support weight for each pixel in the window is calculated based on the Gestalt Principles, which state that the grouping of pixels should be based on spatial proximity and chromatic similarity. The original formula proposed is given as follows [13]: where and are the pixel of interest in left and right images, is the pixel offset within the local window, represents the weight of neighboring pixel , is the color difference between pixel and , is the Euclidean distance between pixel and , and are user defined parameters and, is the aggregated cost between pixel in the left image and in the right image.
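A minimal sketch of the two ingredients described here, the weighted sum of squared differences and the adaptive support-weight aggregation, is given below. It assumes the commonly used exponential form of the support weight, exp(−(Δc/γ_c + Δg/γ_p)), with Δc the Euclidean color difference and Δg the Euclidean pixel distance to the window center, and it normalizes the aggregated cost by the sum of the weights. Plain squared differences are used for all three distance terms. The function names, the default coefficient values, and the γ values are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def raw_cost(phase_l, phase_r, cphase_l, cphase_r, color_l, color_r,
             coeffs=(1 / 3, 1 / 3, 1 / 3)):
    """Per-pixel weighted sum of squared differences of the three descriptor parts."""
    alpha, beta, gamma = coeffs
    d_phase = (np.asarray(phase_l, float) - np.asarray(phase_r, float)) ** 2
    d_cphase = (np.asarray(cphase_l, float) - np.asarray(cphase_r, float)) ** 2
    d_color = np.sum((np.asarray(color_l, float) - np.asarray(color_r, float)) ** 2, axis=-1)
    return alpha * d_phase + beta * d_cphase + gamma * d_color

def support_weights(patch, gamma_c=7.0, gamma_p=36.0):
    """Weights of a (k, k, 3) color window relative to its center pixel."""
    patch = np.asarray(patch, dtype=float)
    k = patch.shape[0]
    c = k // 2
    color_diff = np.linalg.norm(patch - patch[c, c], axis=-1)  # chromatic similarity
    yy, xx = np.mgrid[0:k, 0:k]
    spatial = np.hypot(yy - c, xx - c)                         # spatial proximity
    return np.exp(-(color_diff / gamma_c + spatial / gamma_p))

def aggregated_cost(left_patch, right_patch, cost_patch, **kwargs):
    """Support-weighted average of the raw costs inside corresponding windows."""
    w = support_weights(left_patch, **kwargs) * support_weights(right_patch, **kwargs)
    return float(np.sum(w * cost_patch) / np.sum(w))
```

Pixels that are close to the window center and similar in color to it dominate the aggregated cost, so windows that straddle a depth discontinuity are less likely to blend foreground and background evidence.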
In the proposed feature descriptors, there are several merits. Firstly, the instantaneous phase contains local orientation (or geometric information) and information about contrast transition so that it describes an intrinsically one-dimensional structure in a grey level image, that is, an image structure that is dominated by one orientation. Examples of different contrast transitions are a dark/bright (bright/dark) edge or a bright (dark) line on dark (bright) background. Of course, there is a continuum between these different grey level structures. The instantaneous phase as an additional feature allows us to take this information into account (as one parameter in addition to orientation) in a compact way [11]. Secondly, local color phase describes the angular distance between color monogenic signal of color image and a given color vector in and gives a correlation measure between a pixel fitted with color and structure information and a vector containing chosen color and structure [14]. Finally, the color vector indicates the mean color structure of local image because color is also an important cue to improve stereo matching. Because the stereo similarity function with the local multi-model feature descriptors contains local geometric, structure, and color information, it is much more robust against noise and brightness change than others in feature matching and 3D reconstruction.
4. Experimentation Results
Once the similarity function is given, a minimization process can be performed to find the optimal disparity. In order to reduce noise sensitivity and simultaneously achieve higher efficiency, the multiscale space and a winner-take-all technique are employed to optimize the disparity map for stereo matching. We test the performance of the proposed local multi-model feature descriptors and similarity function on synthetic and natural stereo images. In the first step, we estimate and compare the disparity maps of a pair of synthetic images using each of the three distance measurements and the proposed stereo similarity function. In the second step, we reconstruct the 3D shape and appearance of a natural object and scene by combining the proposed stereo similarity function with a multiview stereo technique [15].
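The winner-take-all step can be sketched in a few lines. The example below assumes rectified images, so that a left pixel at column x corresponds to a right pixel at column x − d for disparity d, and it uses a plain per-pixel sum of squared feature differences without aggregation or multiscale refinement. The function names and the feature-map layout are hypothetical.

```python
import numpy as np

def cost_volume(left_feat, right_feat, max_disp):
    """Per-pixel squared feature difference for each candidate disparity.

    left_feat, right_feat: arrays of shape (H, W, C); returns shape (max_disp + 1, H, W).
    """
    left_feat = np.asarray(left_feat, dtype=float)
    right_feat = np.asarray(right_feat, dtype=float)
    h, w, _ = left_feat.shape
    volume = np.full((max_disp + 1, h, w), np.inf)  # inf marks invalid candidates
    for d in range(max_disp + 1):
        diff = left_feat[:, d:] - right_feat[:, :w - d]
        volume[d, :, d:] = np.sum(diff ** 2, axis=-1)
    return volume

def winner_take_all(volume):
    """Pick, for every pixel, the disparity with the minimum cost."""
    return np.argmin(volume, axis=0)
```

In a coarse-to-fine scheme the same selection would be repeated at each scale, with the candidate range at a finer scale restricted to values near the estimate from the coarser one.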
4.1. Disparity Estimation Experiment
In the first experiment, we chose a pair of color images (cloth1) from the website [16]; the left image, right image, and ground-truth disparity map of the stereo pair are shown in Figures 1(a)–1(c), respectively.
Figure 1: (a) Left image, (b) right image, and (c) ground-truth disparity map of the cloth1 stereo pair.
Firstly, the left and right color images are converted to gray level images. The gray monogenic signals are computed at several scales, and the instantaneous phases are estimated for both gray images in Poisson scale space. Figure 2 shows three instantaneous phase maps of the left image. At the same time, the scale-space color monogenic signals are computed at several scales, and a chosen reference vector with local geometric structure and a unit vector in RGB color space are used to estimate the local color phase for both images [14]. Figure 3 shows three color phase maps of the left image.
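Figure 2: (a)–(c) Three instantaneous phase maps of the left image.
Figure 3: (a)–(c) Three local color phase maps of the left image.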
Secondly, the proposed stereo similarity function based on local multi-model monogenic image descriptors is employed to compute the weighted sum of squared differences of instantaneous phase, local color phase, and color vector. The scale-space color monogenic framework [14] and a local optimization technique [17] are employed to optimize the disparity map according to the adaptive support-weight cost aggregation as in (3.4). In the scale space, the disparity field at the coarser scale is used as the starting guess for estimation at the next finer scale, so that a subpixel disparity can be obtained. For example, we estimate the disparity map at the third scale using the cost functions (3.2) and (3.4). The disparities at the third scale directly subtending the second scale under scrutiny are all used as candidate starting guesses. The candidate leading to the best match is accepted for the regularization step. Estimation proceeds in this fashion, moving from coarser to finer scales, with matching followed by regularization, until the finest level of detail is reached. Figure 4 shows the dense disparity maps obtained with the proposed algorithm. Figures 4(a)–4(d) show four estimated disparity maps with different weighting coefficients.
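Figure 4: (a)–(d) Dense disparity maps estimated by the proposed algorithm with four different sets of weighting coefficients.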
Thirdly, a comparison is performed to further validate the claims about the performance of the proposed algorithm. The proposed algorithm is compared with a number of algorithms selected from the literature: Sum of Absolute Differences (SAD) [18], Sum of Squared Differences (SSD) [18], Graph Cuts (GC) [18], Discrete Wavelet (DWT) [19], Complex Discrete Wavelet (CWT) [19], and Quaternion Wavelet (QWT) [20]. To make the comparison quantitative, the root mean square error (RMSE) and the percentage of bad disparities (PBD) are calculated as [19]
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(x, y)} \big|d_E(x, y) - d_G(x, y)\big|^2}, \qquad \mathrm{PBD} = \frac{1}{N} \sum_{(x, y)} \Big(\big|d_E(x, y) - d_G(x, y)\big| > \delta_d\Big), \qquad (4.1)$$
where $d_E$ and $d_G$ are the estimated and ground-truth disparity maps, $N$ is the total number of pixels in an image, and $\delta_d$ represents the disparity error tolerance. The RMSE and PBD statistics for all of the above algorithms are presented in Table 1. As can be seen in Table 1, the proposed algorithm, for the selected weighting coefficients, obtains better results (RMSE = 1.82 and PBD = 0.09) than the others.
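Both statistics are straightforward to compute from an estimated and a ground-truth disparity map. The sketch below assumes the usual definitions: RMSE is the square root of the mean squared disparity error, and PBD is the fraction of pixels whose absolute error exceeds a tolerance. The default tolerance of 1 pixel and the choice to return PBD as a fraction in [0, 1] rather than a percentage are assumptions made here.

```python
import numpy as np

def rmse(d_est, d_gt):
    """Root mean square error between estimated and ground-truth disparities."""
    d_est = np.asarray(d_est, dtype=float)
    d_gt = np.asarray(d_gt, dtype=float)
    return float(np.sqrt(np.mean((d_est - d_gt) ** 2)))

def pbd(d_est, d_gt, tolerance=1.0):
    """Fraction of pixels whose absolute disparity error exceeds the tolerance."""
    d_est = np.asarray(d_est, dtype=float)
    d_gt = np.asarray(d_gt, dtype=float)
    return float(np.mean(np.abs(d_est - d_gt) > tolerance))
```

Multiplying the returned PBD by 100 gives the value as a percentage. Pixels without valid ground truth, if any, would need to be masked out before calling either function.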
4.2. The Robustness against Noise and Brightness
We now investigate the robustness of the proposed approach against noise and brightness change. In the first step, Gaussian noise is added to both color images, and the disparity maps are estimated using the proposed method with four different sets of weighting coefficients. The root mean square error (RMSE) and percentage of bad disparities (PBD) of each noisy disparity map are also calculated using (4.1). In the second step, a brightness offset is added to the right image, and the disparity maps, RMSE, and PBD are again computed with the four different sets of weighting coefficients. The experimental results in Table 2 show that the proposed approach is robust to noise and brightness change.
4.3. 3D Reconstruction Experimentation
In the last experiment, the proposed approach is used to reconstruct a set of oriented points for 3D natural scenes by multiview stereopsis. We first captured stereo images around a 3D object and scene (a static color paper cup and a deforming color texture paper) under natural light using two color cameras (AVT Guppy F146C). We also calibrated the intrinsic and extrinsic parameters of all cameras and images and estimated the corresponding epipolar lines among images. Feature points for all images are detected using Speeded-Up Robust Features (SURF) with a blob response threshold of 1000 [21] and are matched using the corresponding epipolar constraints and the proposed similarity function in (3.2) and (3.4). From these initially matched feature points, we reconstruct a set of sparse 3D oriented points for the object and scene, similar to multiview stereopsis in [15]. Then we expand and filter this set into a set of robust dense 3D oriented points over a grid of image cells and reconstruct the three-dimensional surface of the object. For the static color paper cup, we capture 16 images and reconstruct the dense 3D oriented points and shape. Figure 5 shows the captured images, the dense 3D oriented points, and the three-dimensional surface at the 1st, 7th, and 13th views, respectively. For the deforming color texture paper, we capture the stereo sequence using two cameras at a rate of 7 frames per second and reconstruct the time-varying 3D shape and motion. Figure 6 shows the captured image of one camera and the time-varying 3D shape at the 0th, 20th, 40th, 80th, and 100th frames.
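Figure 5: (a)–(i) Captured images, dense 3D oriented points, and three-dimensional surfaces of the static color paper cup at the 1st, 7th, and 13th views.
Figure 6: (a)–(l) Captured images of one camera and the time-varying 3D shape of the deforming color texture paper at selected frames.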
To further validate the performance of the proposed method, a new set of dense 3D patches for the static color paper cup is also reconstructed using the SSD-based photometric discrepancy function in [15]. For both algorithms, we calculated the mean feature points (MFP) and percentage of bad disparities (PBD) of each image, the number of 3D oriented points at the initial matching (MAT_3D), expanding (EXP_3D), and filtering (FIL_3D) stages, and the percentage of image cells that are not reconstructed (PNR). Statistical results are shown in Table 3.
5. Conclusions
In this paper, we propose a stereo similarity function based on local multi-model monogenic image feature descriptors to solve the stereo-matching problem. Local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and the local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which extends the analytic signal to gray level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of a 2D image signal. The color monogenic signal extends the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images demonstrate the performance of the proposed approach. A shortcoming is that the proposed method requires considerably more run time and storage space. Future work will therefore focus on improving its efficiency and on further evaluation of the method.
Acknowledgments
This work was supported by National High Technology Research and Development Program (2008AA04Z121), National Natural Science Foundation of China (50975228), and the Technical Innovation Research Program (2010CXY1007) of Xi'an, Shaanxi.