Abstract

A stereo similarity function based on local multi-model monogenic image feature descriptors (LMFD) is proposed to match interest points and estimate the disparity map for stereo images. Local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which is the extension of the analytic signal to gray level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of a 2D image signal. The color monogenic signal is the extension of the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images show the performance of the proposed approach.

1. Introduction

Shape and motion estimation from stereo images has been one of the core challenges in computer vision for decades. The robust and accurate computation of stereo depth is an important problem for many visual tasks such as machine vision, virtual reality, robot navigation, simultaneous localization and mapping, depth measurement, and 3D environment reconstruction. Most conventional approaches, such as intensity-based or correlation-based matching, feature-based matching, and matching function optimization techniques, estimate the disparity based only on local intensity and features between stereo images, so the results may be susceptible to level shift, scaling, rotation, and noise [1, 2].

To overcome these drawbacks, we propose a new method for establishing spatial correspondences between a pair of color images. Unlike classical stereo-matching methods based on the brightness constancy assumption and the phase congruency constraint, we match feature points and estimate the disparity map between stereo images based on new local multi-model monogenic image feature descriptors in the Color Monogenic Signal framework [3, 4]. We first introduce the monogenic signal [5] of a 2D gray level image using the Dirac operator and the Laplace equation and extract local amplitude, local orientation, and instantaneous phase information in multiscale space [6]. At the same time, the 2D monogenic signal is extended to color images, and the color monogenic signal is introduced based on Clifford algebras. The local color phase is estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space with Clifford algebras [3, 4]. Then we focus on defining new local multi-model monogenic image feature descriptors which contain local geometric (local orientation), structure (instantaneous phase), and color (local color phase and color values) information in the Color Monogenic Signal framework. Based on the proposed image feature descriptors, a stereo similarity function between two primitives is also defined to solve the stereo correspondence problem. Finally, we test the performance of the proposed approach on synthetic and natural stereo images, and experimental results are given in detail.

2. Monogenic Signal Analysis in Poisson Scale Space

2.1. Modeling 2D Image Signal

Based on the results of Fourier theory and functional analysis, we assume that each 2D signal f \in L^2(\mathbb{R}^2) \cap L^1(\mathbb{R}^2) can be locally modeled by a superposition of arbitrarily oriented one-dimensional cosine waves [6]:

(p_s * f)(x, y) = \sum_{v=1}^{n} a_v(x, y, s) \cos\left( \langle (x, y), o_v(x, y, s) \rangle + \phi_v(x, y, s) \right), (2.1)

with * as the convolution operator and the orientation o_v(x, y, s) = [\cos\theta_v(x, y, s), \sin\theta_v(x, y, s)]^T. Note that each cosine wave is determined by its own amplitude, orientation, and phase information. The Poisson convolution kernel reads

p_s(x, y) = \frac{s}{2\pi \left( s^2 + x^2 + y^2 \right)^{3/2}}. (2.2)

For a certain scale space parameter s \in \mathbb{R}^+, the Poisson kernel acts as a low-pass filter on the original signal f. The Poisson scale space is naturally related to the generalized Hilbert transform by the Cauchy kernel. To filter a frequency interval of interest, the difference of Poisson (DoP) kernel is used in practice:

p_{s_f, s_c}(x, y) = p_{s_f}(x, y) - p_{s_c}(x, y), (2.3)

with s_c > s_f > 0, where s_c is the coarse scale parameter and s_f is the fine scale parameter. The filtered signal is defined by convolution with the DoP kernel, so that only a small passband of the original signal spectrum is considered. Without loss of generality, the signal model in (2.1) degrades locally at the origin (x, y) = (0, 0) of a local coordinate system to

f_p(x, y, s) = \sum_{v=1}^{n} a_v(x, y, s) \cos\phi_v(x, y, s). (2.4)

In the case of image analysis, lines, edges, junctions, and corners can be modeled in this way. The signal processing task is now to determine the local amplitude a(x, y, s), the local orientation \theta_v(x, y, s), and the local phase \phi(x, y, s) for a certain scale space parameter s and a certain location (x, y).
This problem has already been solved for one-dimensional signals by classical analysis [7] by means of the Hilbert transform [8], and for intrinsically one-dimensional signals [9] by the two-dimensional monogenic signal by means of the generalized first-order Hilbert transforms [10].
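As an illustration of the Poisson scale space above, the kernels (2.2) and (2.3) can be sampled directly on a pixel grid. This is a minimal sketch (not the authors' code); the grid size and scale values are illustrative.

```python
# Sketch: the 2D Poisson kernel (2.2) and the difference-of-Poisson (DoP)
# band-pass kernel (2.3), sampled on a finite grid with numpy.
import numpy as np

def poisson_kernel(size, s):
    """p_s(x, y) = s / (2*pi*(s^2 + x^2 + y^2)^(3/2)) on a size x size grid."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    return s / (2.0 * np.pi * (s**2 + x**2 + y**2) ** 1.5)

def dop_kernel(size, s_f, s_c):
    """DoP kernel p_{s_f, s_c} = p_{s_f} - p_{s_c}, with s_c > s_f > 0."""
    assert s_c > s_f > 0
    return poisson_kernel(size, s_f) - poisson_kernel(size, s_c)

# Band-pass behaviour: positive near the center (fine scale dominates),
# negative in the surround (coarse scale dominates).
k = dop_kernel(31, s_f=1.0, s_c=3.0)
```

Convolving an image with `k` (e.g. via FFT) keeps only the passband between the two scales, as described for the DoP operator above.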

2.2. The Analytic Signal

Let s: \mathbb{R} \to \mathbb{R} be a real-valued signal, and let f: \mathbb{R} \to \mathbb{R}_{2,0} [3] be a vector-valued signal such that f(x) = s(x) e_2 [4]. The purpose is to construct a function fulfilling the Dirac equation whose real part is the real-valued signal. This is equivalent to finding the solution of a boundary value problem of the second kind (a Neumann problem):

\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad \text{if } y > 0,
e_2 \frac{\partial u}{\partial y} = f(x), \quad \text{if } y = 0, (2.5)

with D = e_1 (\partial / \partial x) + e_2 (\partial / \partial y) and \Delta = D^2.

The first equation in (2.5) is the 2D Laplace equation restricted to the open domain y > 0. The second equation is called the boundary condition, and the basis vector e_2 is coherent with the embedding of complex functions as fields (the real part is embedded as the e_2-component). Using the fundamental solution of the 2D Laplace equation, the solution of the problem leads to

f_A(x, y) = p_1 * f(x, y) + p_1 * h_1 * f(x, y), (2.6)

where p_1 = y / (\pi (x^2 + y^2)) is the 1D Poisson kernel and h_1 = (1 / (\pi x)) e_{12} is the Hilbert kernel. The variable y is a scale parameter; taking it equal to zero, the classical analytic signal is obtained.
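At scale y = 0, (2.6) reduces to the classical analytic signal f + i H{f}. A minimal numpy-only sketch (the FFT-based Hilbert transform is a standard construction, not taken from the paper):

```python
# Sketch: 1D analytic signal via an FFT-based Hilbert transform.
import numpy as np

def analytic_signal(f):
    """Return f + i*H{f}: zero out negative frequencies, double positive ones."""
    n = len(f)
    F = np.fft.fft(f)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(F * h)

t = np.linspace(0, 1, 512, endpoint=False)
f = np.cos(2 * np.pi * 8 * t)      # a pure cosine wave (8 full cycles)
fa = analytic_signal(f)
amplitude = np.abs(fa)             # local amplitude (constant for a pure tone)
phase = np.angle(fa)               # instantaneous phase
```

For the pure cosine the recovered local amplitude is 1 everywhere, matching the 1D model a cos(phi) with a = 1.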

2.3. Instantaneous Phase of the Monogenic Signal

Following the previous construction of the analytic signal, Michael Felsberg and Gerald Sommer proposed an extension to 2D signals and defined the monogenic signal, which is the combination of a gray image with its Riesz transform [5]. Let s: \mathbb{R}^2 \to \mathbb{R} be a real-valued signal and f: \mathbb{R}^2 \to \mathbb{R}_{3,0} a vector-valued signal, and let \{e_i\} (i = 1, 2, 3) be the orthonormal basis of \mathbb{R}^3 such that f(x, y) = f_3(x, y) e_3. According to the 3D Laplace equation restricted to the open half-space z > 0 and the boundary condition of the second kind, we obtain the monogenic signal as follows:

f_M(x, y, z) = p_2 * f(x, y) + p_2 * h_x * f(x, y) + p_2 * h_y * f(x, y), (2.7)

where p_2 = z / (2\pi (x^2 + y^2 + z^2)^{3/2}) is a 2D Poisson kernel, h_R = (h_x, h_y) = (x e_1, y e_2) / (2\pi (x^2 + y^2)^{3/2}) is the Riesz kernel (the 2D extension of the Hilbert kernel), and z is equal to the scale space parameter s. Now let f_M(x, y, s) = f_p(x, y, s) + f_x(x, y, s) + f_y(x, y, s) and f_q(x, y, s) = [f_x(x, y, s), f_y(x, y, s)]. (The intrinsic dimension expresses the number of degrees of freedom necessary to describe local structure. Constant signals without any structure are of intrinsic dimension zero (i0D), arbitrarily oriented straight lines and edges are of intrinsic dimension one (i1D), and all other possible patterns such as corners and junctions are of intrinsic dimension two (i2D). In general, i2D signals can only be modeled by an infinite number of superimposed i1D signals.)

In the case of intrinsic dimension one signals (i.e., n = 1 in (2.1)), we obtain

(f_p, f_x, f_y) = a (\cos\phi, \sin\phi \cos\theta, \sin\phi \sin\theta), (2.8)

where a(x, y, s), \theta(x, y, s), and \phi(x, y, s) represent the local amplitude, local orientation, and instantaneous phase, respectively:

a(x, y, s) = \sqrt{f_p^2(x, y, s) + f_x^2(x, y, s) + f_y^2(x, y, s)},
\theta(x, y, s) = \arctan\left( \frac{f_y(x, y, s)}{f_x(x, y, s)} \right),
\phi(x, y, s) = \arctan\left( \frac{\sqrt{f_x^2(x, y, s) + f_y^2(x, y, s)}}{f_p(x, y, s)} \right). (2.9)

However, we do not know the correct sign of the phase, since it depends on the directional sense of \theta(x, y, s). The best possible solution is to project it onto (\cos\theta, \sin\theta) as

\hat{f}_\phi(x, y, s) = \frac{f_q(x, y, s)}{|f_q(x, y, s)|} \arg\left( f_p(x, y, s) + i |f_q(x, y, s)| \right). (2.10)
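The quantities in (2.8)-(2.9) can be sketched with a frequency-domain Riesz transform. This is an assumed implementation for illustration (the DoP band-pass of Section 2.1 is omitted, and a raw image stands in for f_p):

```python
# Sketch: monogenic signal of a 2D image via the Riesz transform in the
# Fourier domain, plus local amplitude, orientation, and phase of (2.9).
import numpy as np

def monogenic(f):
    rows, cols = f.shape
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u**2 + v**2)
    radius[0, 0] = 1.0                                  # avoid divide-by-zero at DC
    F = np.fft.fft2(f)
    fx = np.real(np.fft.ifft2(F * (-1j * u / radius)))  # Riesz x-component
    fy = np.real(np.fft.ifft2(F * (-1j * v / radius)))  # Riesz y-component
    a = np.sqrt(f**2 + fx**2 + fy**2)                   # local amplitude
    theta = np.arctan2(fy, fx)                          # local orientation
    phi = np.arctan2(np.sqrt(fx**2 + fy**2), f)         # instantaneous phase
    return a, theta, phi

# An i1D test pattern: a vertical cosine grating.
x = np.arange(64)
img = np.cos(2 * np.pi * 4 * x / 64)[None, :].repeat(64, axis=0)
a, theta, phi = monogenic(img)
```

For the grating, the Riesz x-component is the matching sine grating, so the local amplitude is 1 everywhere, exactly as the i1D model predicts.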

2.4. Local Color Phase of the Color Monogenic Signal

In 2009, Demarcq et al. constructed a scale-space signal for color images seen as vectors in \mathbb{R}_{5,0} [4]. Let s: \mathbb{R}^2 \to \mathbb{R}^3 be a real-valued signal and f: \mathbb{R}^2 \to \mathbb{R}_{5,0} a vector-valued signal, and let \{e_i\} (i = 1, \ldots, 5) be the orthonormal basis of \mathbb{R}^5 such that f(x_1, x_2) = f_3(x_1, x_2) e_3 + f_4(x_1, x_2) e_4 + f_5(x_1, x_2) e_5. A color image is thus decomposed in the RGB space, represented as the subspace spanned by \{e_3, e_4, e_5\}. Now we need to find a function which is monogenic and whose e_3-, e_4-, and e_5-components are the components r, g, and b, respectively:

\Delta u = \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} + \frac{\partial^2 u}{\partial x_3^2} + \frac{\partial^2 u}{\partial x_4^2} + \frac{\partial^2 u}{\partial x_5^2} = 0,
e_3 \frac{\partial u}{\partial x_3} + e_4 \frac{\partial u}{\partial x_4} + e_5 \frac{\partial u}{\partial x_5} = f(x), (2.11)

with D = \sum_{i=1}^{5} e_i (\partial / \partial x_i) and \Delta = D^2. A scale-space signal which has independent scales in each component (f_3, f_4, f_5) = (r, g, b) can be defined by splitting the problem into three boundary value problems in \mathbb{R}_{5,0} as follows:

\frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} + \frac{\partial^2 u}{\partial x_i^2} = 0, \quad \text{if } x_i > 0, \; i = 3, 4, 5,
e_i \frac{\partial u}{\partial x_i} = f_i(x_1, x_2) e_i, \quad \text{if } x_i = 0, \; i = 3, 4, 5. (2.12)

Each solution of the system in (2.12) leads to monogenic functions S_1, S_2, S_3: they satisfy the Dirac equation in each subspace E_i = \text{span}\{e_1, e_2, e_i\} (i = 3, 4, 5) and consequently the Dirac equation in \mathbb{R}_{5,0} (D S_i = 0). Let f_C = S_1 + S_2 + S_3; then f_C is still monogenic in \mathbb{R}_{5,0} (i.e., D f_C = 0) and satisfies the boundary conditions. So the scale-space color monogenic signal can be obtained by using the Dirac operator and the Laplace equation as follows:

f_C = \sum_{i=1}^{5} A_i e_i = \sum_{i=3}^{5} p_2^i * h_x * f_i + \sum_{i=3}^{5} p_2^i * h_y * f_i + \sum_{i=3}^{5} p_2^i * f_i e_i, (2.13)

where p_2^i = x_i / (2\pi (x_1^2 + x_2^2 + x_i^2)^{3/2}) (i = 3, 4, 5) is a 2D Poisson kernel and h_R = (h_x, h_y) is the Riesz kernel. As for the analytic or monogenic signal, a color image f can be represented in terms of local amplitude and local phase. Our proposal is now to use the geometric product in order to compare two vectors in \mathbb{R}_{5,0}.

In the Clifford algebra of the Euclidean vector space \mathbb{R}^n, the product of two vectors a and b, embedded in \mathbb{R}_{n,0}, is given by [3, 4]

a b = a \cdot b + a \wedge b, (2.14)

where a \cdot b is the inner product and a \wedge b, the wedge product of a and b, is a bivector. This product is usually called the geometric product of a and b. The inner product is symmetric, and the wedge product is skew-symmetric. If V = u e_1 + v e_2 + a e_3 + b e_4 + c e_5 \in \mathbb{R}_{5,0} is a chosen vector containing structure information (u, v) and color information (a, b, c), then the geometric product of f_C and V is given by

f_C V = f_C \cdot V + f_C \wedge V = \langle f_C V \rangle_0 + \langle f_C V \rangle_2 = \langle f_C V \rangle_0 + \frac{\langle f_C V \rangle_2}{|\langle f_C V \rangle_2|} |\langle f_C V \rangle_2|, (2.15)

where \langle \cdot \rangle_0 denotes the scalar part, \langle \cdot \rangle_2 the bivector part, and |\cdot| the magnitude of the bivector part [6]. According to Clifford algebras, the geometric product reveals the relationship between bivectors and complex numbers [3]. This means that we can form the equivalent of a complex number, f_C V = \langle f_C V \rangle_0 + i |\langle f_C V \rangle_2|, by combining a scalar and a unit bivector. The local color phase can be computed as follows:

\varphi_c = \arg(f_C V) = \arctan\left( \frac{|\langle f_C V \rangle_2|}{\langle f_C V \rangle_0} \right). (2.16)

This phase describes the angular distance between f_C and a given vector V in \mathbb{R}_{5,0}; that is, it gives a correlation measure between a pixel fitted with color and structure information and a vector containing a chosen color and structure.
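Since for two vectors the scalar part of the geometric product is the inner product and the bivector magnitude is |a||b| sin of the angle between them, the color phase (2.16) is simply the angle between f_C and V. A minimal numpy sketch (vectors as length-5 arrays; the pixel values are illustrative):

```python
# Sketch of (2.14)-(2.16): local color phase as the angle between a pixel
# vector in R^{5,0} and a chosen reference vector V.
import numpy as np

def color_phase(f_c, V):
    dot = np.dot(f_c, V)                    # <f_C V>_0, the scalar part
    # |<f_C V>_2| = sqrt(|f_C|^2 |V|^2 - (f_C . V)^2), the bivector magnitude
    wedge = np.sqrt(max(np.dot(f_c, f_c) * np.dot(V, V) - dot**2, 0.0))
    return np.arctan2(wedge, dot)           # phi_c in [0, pi]

# Reference vector with structure (u, v) = (0, 0) and unit gray color,
# as used later in the experiments: V = (e3 + e4 + e5) / sqrt(3).
V = np.array([0.0, 0.0, 1.0, 1.0, 1.0]) / np.sqrt(3.0)
gray_pixel = np.array([0.0, 0.0, 0.5, 0.5, 0.5])  # aligned with V -> phase 0
red_pixel = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # pure red -> larger phase
```

A gray pixel aligned with V yields phase 0; a saturated red pixel yields the angle between (1, 0, 0) and (1, 1, 1)/sqrt(3), about 0.955 rad.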

3. Local Multi-Model Monogenic Image Feature Descriptors

We make use of a visual representation [11] which is motivated by feature processing in the human visual system and define new local multi-model monogenic image descriptors which give an explicit and condensed representation of the local image signal as follows:

\varepsilon = \left( X, \hat{\phi}, \varphi_c, C \right). (3.1)

In fact, this representation performs a considerable condensation of information in a local image patch of n \times n (n \in \mathbb{R}^+, n > 1) pixels. The symbol X = (x, y) represents the central coordinates of the local image patch, \hat{\phi} is the instantaneous phase of the gray monogenic signal, \varphi_c is the local color phase, and C = (r, g, b) is the color value in RGB color space.

Based on the local multi-model image feature descriptors, a stereo similarity function between two primitives is defined as the weighted sum of the distances between instantaneous phase, local color phase, and color vector for a pair of stereo images in the local patch:

e(\varepsilon_l, \varepsilon_r) = \alpha \, d_\phi(\hat{\phi}_l, \hat{\phi}_r) + \beta \, d_\varphi(\varphi_{cl}, \varphi_{cr}) + \gamma \, d_C(C_l, C_r) = \alpha |\hat{\phi}_l - \hat{\phi}_r| + \beta |\varphi_{cl} - \varphi_{cr}| + \gamma \sqrt{\sum_{i \in \{r, g, b\}} (c_{il} - c_{ir})^2}, (3.2)

where l, r denote the left and right images, respectively, d_\phi \in [0, \pi) is the distance measurement of the instantaneous phase \phi \in [-\pi, \pi), d_\varphi \in [0, \pi] is the distance measurement of the local color phase \varphi_c \in [0, \pi], and d_C \in [0, \sqrt{3}] is the distance measurement of the color vector with C \in [0, 1] \times [0, 1] \times [0, 1] and c_R + c_G + c_B = 1 in RGB color space. The symbols \alpha, \beta, \gamma are weighting coefficients with \alpha, \beta \in [0, 1] and \gamma \in [0, 0.5]. In order to achieve a better coherence with the real scene, we use an adaptive support-weight technique in the local patch [12]. The support weight for each pixel in the window is calculated based on the Gestalt principles, which state that the grouping of pixels should be based on spatial proximity and chromatic similarity.
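The similarity function (3.2) can be sketched directly; the descriptor tuple layout and parameter values below are illustrative, not the authors' code:

```python
# Sketch of the stereo similarity function (3.2): a descriptor is modeled as
# a tuple (X, phi, phi_c, C) following (3.1).
import numpy as np

def similarity(eps_l, eps_r, alpha=1.0, beta=1.0, gamma=0.5):
    (_, phi_l, pc_l, C_l), (_, phi_r, pc_r, C_r) = eps_l, eps_r
    d_phi = abs(phi_l - phi_r)                 # instantaneous-phase distance
    d_pc = abs(pc_l - pc_r)                    # local-color-phase distance
    d_C = np.sqrt(np.sum((np.asarray(C_l) - np.asarray(C_r)) ** 2))
    return alpha * d_phi + beta * d_pc + gamma * d_C

# Two identical descriptors give cost 0; any mismatch increases the cost.
eps_l = ((10, 12), 0.3, 1.1, (0.5, 0.3, 0.2))
eps_r = ((10, 12), 0.3, 1.1, (0.5, 0.3, 0.2))
```

Lower cost means a better match, so the disparity search below minimizes this function.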
The original formula proposed is given as follows [13]:

w(x, y, m, n) = \exp\left( - \left( \frac{\Delta c_{x,y,m,n}}{\gamma_c} + \frac{\Delta g_{x,y,m,n}}{\gamma_g} \right) \right), (3.3)

E(x_l, y_l, x_r, y_r) = \frac{\sum_{m,n \in [-p,p]} w_l(x_l, y_l, m, n) \cdot w_r(x_r, y_r, m, n) \cdot e(\varepsilon_l, \varepsilon_r)}{\sum_{m,n \in [-p,p]} w_l(x_l, y_l, m, n) \cdot w_r(x_r, y_r, m, n)}, (3.4)

where (x_l, y_l) and (x_r, y_r) are the pixels of interest in the left and right images, (m, n) is the pixel offset within the local window, w(x, y, m, n) represents the weight of the neighboring pixel (x + m, y + n), \Delta c_{x,y,m,n} is the color difference between pixels (x, y) and (x + m, y + n), \Delta g_{x,y,m,n} is the Euclidean distance between pixels (x, y) and (x + m, y + n), \gamma_c and \gamma_g are user-defined parameters, and E(x_l, y_l, x_r, y_r) is the aggregated cost between pixel (x_l, y_l) in the left image and (x_r, y_r) in the right image.
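The support weights (3.3) and the aggregation (3.4) can be sketched as follows; the parameter values gamma_c and gamma_g are illustrative defaults, not taken from the paper:

```python
# Sketch of the adaptive support weight (3.3) and cost aggregation (3.4),
# after the Yoon-Kweon scheme cited in the text.
import numpy as np

def support_weights(color_patch, gamma_c=7.0, gamma_g=36.0):
    """Weight of each neighbor from color difference and spatial distance.

    color_patch: (2p+1, 2p+1, 3) array centered on the pixel of interest.
    """
    p = color_patch.shape[0] // 2
    center = color_patch[p, p]
    dc = np.sqrt(((color_patch - center) ** 2).sum(axis=2))  # color difference
    m, n = np.meshgrid(np.arange(-p, p + 1), np.arange(-p, p + 1))
    dg = np.sqrt(m**2 + n**2)                                # spatial distance
    return np.exp(-(dc / gamma_c + dg / gamma_g))

def aggregate(w_l, w_r, cost_patch):
    """Aggregated cost E = sum(w_l * w_r * e) / sum(w_l * w_r), as in (3.4)."""
    ww = w_l * w_r
    return (ww * cost_patch).sum() / ww.sum()
```

On a patch of uniform color, the center pixel gets weight 1 and aggregating a constant per-pixel cost returns that same cost, as the normalization in (3.4) requires.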

The proposed feature descriptors have several merits. First, the instantaneous phase contains local orientation (or geometric information) and information about contrast transition, so that it describes an intrinsically one-dimensional structure in a grey level image, that is, an image structure that is dominated by one orientation. Examples of different contrast transitions are a dark/bright (bright/dark) edge or a bright (dark) line on a dark (bright) background. Of course, there is a continuum between these different grey level structures. The instantaneous phase as an additional feature allows us to take this information into account (as one parameter in addition to orientation) in a compact way [11]. Second, the local color phase describes the angular distance between the color monogenic signal f_C of the color image and a given color vector V in \mathbb{R}_{5,0}, and gives a correlation measure between a pixel fitted with color and structure information and a vector containing a chosen color and structure [14]. Finally, the color vector indicates the mean color structure of the local image, because color is also an important cue to improve stereo matching. Because the stereo similarity function with the local multi-model feature descriptors contains local geometric, structure, and color information, it is much more robust against noise and brightness changes than other methods in feature matching and 3D reconstruction.

4. Experimentation Results

Once the similarity function is given, a minimization process can be performed to find the optimal disparity. In order to reduce noise sensitivity and simultaneously achieve higher efficiency, the multiscale space and a winner-take-all technique are employed to optimize the disparity map for stereo matching. We test the performance of the proposed local multi-model feature descriptors and similarity function on synthetic and natural stereo images. In the first step, we estimate and compare the disparity maps of a pair of synthetic images using each of the three distance measurements and the proposed stereo similarity function. In the second step, we reconstruct the 3D shape and appearance of a natural object and scene by combining the proposed stereo similarity function with a multiview stereo technique [15].

4.1. Disparity Estimation Experiment

In the first experiment, we chose a pair of color images (cloth1) from the website [16]; the left image, right image, and ground truth disparity map of the stereo pair are shown in Figures 1(a)–1(c), respectively.

Firstly, the left and right color images are converted to gray level images. The gray monogenic signals with s = 3 scales are computed, and the instantaneous phases are estimated for both gray images in Poisson scale space. Figure 2 shows three instantaneous phase maps of the left image. At the same time, the scale-space color monogenic signals with s = 3 scales are computed, and a chosen reference vector V = (e_3 + e_4 + e_5)/\sqrt{3} with local geometric structure (u, v) = (0, 0) and a unit vector (a, b, c) = (1, 1, 1)/\sqrt{3} in RGB color space is used to estimate the local color phase for both images [14]. Figure 3 shows three color phase maps of the left image.

Secondly, the proposed stereo similarity function based on the local multi-model monogenic image descriptors is employed to compute the weighted sum of the differences of instantaneous phase, local color phase, and color vector. The scale-space color monogenic framework [14] and a local optimization technique [17] are employed to optimize the disparity map according to the adaptive support-weight cost aggregation in (3.4). In the scale space, the disparity field at the coarser scale is used as the starting guess for estimation at the next finer scale, so that a subpixel disparity can be obtained. For example, we estimate the disparity map at the third scale using the cost functions (3.2) and (3.4). The disparities at the third scale directly subtending the second scale under scrutiny are all used as candidate starting guesses. The candidate leading to the best match is accepted for the regularization step. Estimation proceeds in this fashion, decrementing scale from coarser to finer, with matching followed by regularization, until the finest level of detail is reached. Figure 4 shows the dense disparity map based on the proposed algorithm. Figures 4(a)–4(d) show four estimated disparity maps with different weighting coefficients (\alpha, \beta, \gamma).
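The winner-take-all step and coarse-to-fine seeding described above can be sketched roughly as follows. This is a simplified, assumed reading of the procedure (cost volume layout, the factor-of-2 upscaling of disparities, and the search radius are all illustrative choices, not specified in the paper):

```python
# Sketch: winner-take-all disparity selection over an aggregated cost volume,
# with the coarser-scale disparity seeding a narrow search at the finer scale.
import numpy as np

def wta_disparity(cost_volume):
    """cost_volume[d, y, x]: aggregated cost for disparity d at pixel (y, x)."""
    return np.argmin(cost_volume, axis=0)

def refine_from_coarse(coarse_disp, cost_volume, radius=1):
    """Search only within +/-radius of twice the coarse disparity."""
    d_max = cost_volume.shape[0]
    seed = np.clip(2 * coarse_disp, 0, d_max - 1)
    disp = np.empty_like(seed)
    for y in range(seed.shape[0]):
        for x in range(seed.shape[1]):
            lo = max(seed[y, x] - radius, 0)
            hi = min(seed[y, x] + radius + 1, d_max)
            disp[y, x] = lo + np.argmin(cost_volume[lo:hi, y, x])
    return disp
```

The narrow per-pixel search window is what makes the coarse-to-fine scheme cheaper and less noise-prone than an exhaustive search at the finest scale.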

Thirdly, a comparison is performed to further validate the claims about the performance of the proposed algorithm. This comparison is performed between the proposed algorithm and a number of selected algorithms from the literature, which are Sum of Absolute Differences (SAD) [18], Sum of Squared Differences (SSD) [18], Graph Cuts (GC) [18], Discrete Wavelet (DWT) [19], Complex Discrete Wavelet (CWT) [19], and Quaternion Wavelet (QWT) [20]. To create a better understanding of the comparison, the root mean square error (RMSE) and percentage of bad disparities (PBD) statistics are calculated as [19]

\text{RMSE} = \sqrt{ \frac{1}{N} \sum_{i,j} \left| d_E(i,j) - d_G(i,j) \right|^2 }, \quad \text{PBD} = \frac{1}{N} \sum_{i,j} \left( \left| d_E(i,j) - d_G(i,j) \right| > \zeta \right), (4.1)

where d_E and d_G are the estimated and ground truth disparity maps, N is the total number of pixels in an image, and \zeta represents the disparity error tolerance. The RMSE and PBD statistics for all of the above algorithms are presented in Table 1. As can be seen in Table 1, the proposed algorithm with the weighting coefficients \alpha = 1, \beta = 1, \gamma = 0.5 obtains better results (RMSE = 1.82 and PBD = 0.09) than the others.
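The two evaluation statistics in (4.1) translate directly to code:

```python
# Sketch of the evaluation statistics (4.1): root mean square error and
# percentage of bad disparities against a ground-truth disparity map.
import numpy as np

def rmse(d_est, d_gt):
    return np.sqrt(np.mean((d_est - d_gt) ** 2))

def pbd(d_est, d_gt, zeta=1.0):
    # Fraction of pixels whose absolute disparity error exceeds the tolerance.
    return np.mean(np.abs(d_est - d_gt) > zeta)
```

For example, a 4x4 map that is exact everywhere except one pixel off by 2 gives RMSE sqrt(4/16) = 0.5 and PBD 1/16 at zeta = 1.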

4.2. The Robustness against Noise and Brightness

Now we aim to investigate the robustness of the proposed approach against noise and brightness changes. In a first step, Gaussian noise with \sigma = 0.05, 0.10, 0.15, 0.20, 0.25, 0.30 is added to both color images, respectively, and the disparity maps are estimated using the proposed method with four different sets of weighting coefficients. The root mean square error (RMSE) and percentage of bad disparities (PBD) for each noisy disparity map are also calculated using (4.1). In a second step, the brightness values I = -30, -20, -10, 10, 20, 30 are added to the right image. We again compute the disparity maps, RMSE, and PBD with the four different sets of weighting coefficients. The experimental results in Table 2 show that the proposed approach with \alpha = 1, \beta = 1, \gamma = 0.5 is insensitive to noise and brightness changes.
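The perturbation protocol above can be sketched as a small harness (an assumed reading: images in [0, 1], with the 8-bit brightness offsets rescaled accordingly):

```python
# Sketch of the Section 4.2 robustness protocol: Gaussian noise added to both
# images, a constant brightness shift added to the right image only.
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    """Zero-mean Gaussian noise, clipped back to the [0, 1] range."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def shift_brightness(img, delta):
    """delta in 8-bit levels (e.g. -30..30), applied to a [0, 1] image."""
    return np.clip(img + delta / 255.0, 0.0, 1.0)
```

Each perturbed pair is then fed to the same matching pipeline and scored with the RMSE/PBD statistics of (4.1).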

4.3. 3D Reconstruction Experimentation

In the last experiment, the proposed approach is used to reconstruct a set of oriented points for 3D natural scenes by multiview stereopsis. We first captured stereo images of size 1024×768 around a 3D object and scene (a static color paper cup and a deforming color texture paper) under natural light with two color cameras (AVT Guppy F146C). We also calibrated the intrinsic and external parameters for all cameras and images and estimated the corresponding epipolar lines among the images. Feature points for all images are detected using Speeded-Up Robust Features (SURF) with a blob response threshold of 1000 [21] and are matched using the corresponding epipolar constraints and the proposed similarity function with \alpha = 1, \beta = 1, \gamma = 0.5 in (3.2) and (3.4). According to these initial matched feature points, we reconstruct a set of 3D sparse oriented points for the object and scene, similar to the multiview stereopsis in [15]. Then we expand and filter it to a set of robust 3D dense oriented points with an image cell of 2×2 pixels and reconstruct the three-dimensional surface of the object. For the static color paper cup, we capture 16 images and reconstruct the 3D dense oriented points and shape. Figure 5 shows the captured images, the 3D dense oriented points, and the three-dimensional surface at the 1st, 7th, and 13th views, respectively. For the deforming color texture paper, we capture the stereo sequence using two cameras at a rate of 7 frames per second and reconstruct the time-varying 3D shape and motion. Figure 6 shows the captured image of one camera and the time-varying 3D shape at the 0th, 20th, 40th, 80th, and 100th frames.

To further validate the performance of the proposed method, a new set of 3D dense patches for the static color paper cup is also reconstructed by the SSD-based photometric discrepancy function in [15]. For both algorithms, we calculated the mean number of feature points (MFP) and percentage of bad disparities (PBD) of each image, the number of 3D oriented points at the initial matching (MAT_3D), expanding (EXP_3D), and filtering (FIL_3D) stages, and the percentage of image cells not reconstructed (PNR). Statistical results are shown in Table 3.

5. Conclusions

In this paper, we propose a stereo similarity function based on local multi-model monogenic image feature descriptors to solve the stereo-matching problem. Local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which is the extension of the analytic signal to gray level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of a 2D image signal. The color monogenic signal is the extension of the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images show the performance of the proposed approach. One shortcoming is that the proposed method requires considerably more run time and storage space, so in future work we will focus on improving its efficiency and on further evaluating the method.

Acknowledgments

This work was supported by National High Technology Research and Development Program (2008AA04Z121), National Natural Science Foundation of China (50975228), and the Technical Innovation Research Program (2010CXY1007) of Xi’an, Shaanxi.