A Multi-Model Stereo Similarity Function Based on Monogenic Signal Analysis in Poisson Scale Space

Li, Jinjun; Zhao, Hong; Shi, Chengying; Zhou, Xiang

doi:https://doi.org/10.1155/2011/202653

Mathematical Problems in Engineering

Research Article | Open Access

Volume 2011 | Article ID 202653 | https://doi.org/10.1155/2011/202653

Jinjun Li,¹,² Hong Zhao,¹ Chengying Shi,² and Xiang Zhou¹

Academic Editor: Katica R. (Stevanovic) Hedrih

Received 28 Jul 2010

Revised 15 Nov 2010

Accepted 20 Mar 2011

Published 16 May 2011

Abstract

A stereo similarity function based on local multi-model monogenic image feature descriptors (LMFD) is proposed to match interest points and estimate the disparity map for stereo images. The local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and the local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which extends the analytic signal to gray-level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of a 2D image signal. The color monogenic signal extends the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images demonstrate the performance of the proposed approach.

1. Introduction

Shape and motion estimation from stereo images has been one of the core challenges in computer vision for decades. The robust and accurate computation of stereo depth is an important problem for many visual tasks such as machine vision, virtual reality, robot navigation, simultaneous localization and mapping, depth measurement, and 3D environment reconstruction. Most conventional approaches, such as intensity-based or correlation-based matching, feature-based matching, and matching function optimization techniques, estimate the disparity based only on local intensities and features of the stereo images, so their results may be susceptible to level shift, scaling, rotation, and noise [1, 2].

To overcome these drawbacks, we propose a new method for establishing spatial correspondences between a pair of color images. Unlike classical stereo-matching methods based on the brightness constancy assumption and the phase congruency constraint, we match feature points and estimate the disparity map between stereo images using new local multi-model monogenic image feature descriptors in the color monogenic signal framework [3, 4]. We first introduce the monogenic signal [5] of a 2D gray-level image using the Dirac operator and the Laplace equation and extract the local amplitude, local orientation, and instantaneous phase in multiscale space [6]. The 2D monogenic signal is then extended to color images, and the color monogenic signal is introduced based on Clifford algebras. The local color phase is estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space [3, 4]. Next, we define new local multi-model monogenic image feature descriptors that contain local geometric (local orientation), structure (instantaneous phase), and color (local color phase and color values) information in the color monogenic signal framework. Based on the proposed descriptors, a stereo similarity function between two primitives is defined to solve the stereo correspondence problem. Finally, we test the performance of the proposed approach on synthetic and natural stereo images and report the experimental results in detail.

2. Monogenic Signal Analysis in Poisson Scale Space

2.1. Modeling 2D Image Signal

Based on results from Fourier theory and functional analysis, we assume that each 2D signal $f$ can be locally modeled by a superposition of arbitrarily oriented one-dimensional cosine waves [6]:

$$f_p(z; s) = \Bigl(P(\cdot\,; s) * \sum_{\nu=1}^{n} a \cos\bigl(\langle \cdot\,, \bar{o}_\nu \rangle + \phi\bigr)\Bigr)(z), \tag{2.1}$$

with $*$ as the convolution operator and the orientation $\bar{o}_\nu = [\cos\theta_\nu, \sin\theta_\nu]^T$. Note that every cosine wave is determined by the same amplitude $a$ and the same phase $\phi$. The Poisson convolution kernel reads

$$P(z; s) = \frac{s}{2\pi\,\bigl(s^2 + \|z\|^2\bigr)^{3/2}}.$$

For a given scale space parameter $s > 0$, the Poisson kernel acts as a low-pass filter on the original signal $f$. The Poisson scale space is naturally related to the generalized Hilbert transform by the Cauchy kernel. To filter a frequency interval of interest, the difference of Poisson (DoP) kernel is used in practice:

$$P(z; s_f, s_c) = P(z; s_f) - P(z; s_c),$$

with $s_c > s_f > 0$, where $s_c$ is the coarse scale parameter and $s_f$ is the fine scale parameter. The filtered signal is defined by convolving the original signal with the DoP kernel, so that only a small passband of the original signal spectrum is considered. Without loss of generality, the signal model in (2.1) degrades locally, at the origin $z = 0$ of a local coordinate system, to a sum of $n$ identical terms, since every cosine term then reduces to $a\cos\phi$ before smoothing.

In image analysis, lines, edges, junctions, and corners can be modeled in this way. The signal processing task is now to determine the local amplitude $a$, the local orientation $\theta$, and the local phase $\phi$ for a certain scale space parameter $s$ and a certain location $z$. This problem has already been solved for one-dimensional signals by the classical analytic signal [7] by means of the Hilbert transform [8], and for intrinsically one-dimensional signals [9] by the two-dimensional monogenic signal by means of the generalized first-order Hilbert transforms [10].
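As a rough illustration of the band-pass filtering described above, the following sketch applies a difference-of-Poisson filter in the frequency domain. It relies on the standard result that the 2D Poisson kernel at scale s has the transfer function exp(-2π s |u|). The function names `poisson_transfer` and `dop_filter`, the use of NumPy's FFT (which implies periodic image boundaries), and the frequency unit (cycles per pixel) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def poisson_transfer(shape, s):
    """Frequency response exp(-2*pi*s*|u|) of the 2D Poisson kernel at scale s."""
    uy = np.fft.fftfreq(shape[0])[:, None]  # vertical frequencies, cycles per pixel
    ux = np.fft.fftfreq(shape[1])[None, :]  # horizontal frequencies, cycles per pixel
    radius = np.sqrt(ux ** 2 + uy ** 2)
    return np.exp(-2.0 * np.pi * s * radius)

def dop_filter(image, s_fine, s_coarse):
    """Band-pass an image with a difference-of-Poisson (DoP) filter."""
    if not 0.0 < s_fine < s_coarse:
        raise ValueError("expected 0 < s_fine < s_coarse")
    # Fine-scale low-pass minus coarse-scale low-pass leaves a passband.
    response = poisson_transfer(image.shape, s_fine) - poisson_transfer(image.shape, s_coarse)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * response))
```

Because both Poisson responses equal 1 at zero frequency, the DoP response vanishes there, so a constant image is mapped to zero and only structure inside the passband survives.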

2.2. The Analytic Signal

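Let $f:\mathbb{R}\to\mathbb{R}$ be a real-valued signal, and let $f_A:\mathbb{R}^2\to\mathbb{R}^2$ be a vector-valued signal such that $f_A = f_1 e_1 + f_2 e_2$ [3, 4]. The purpose is to construct a function $f_A$ that fulfills the Dirac equation and whose real part is the real-valued signal $f$. This is equivalent to finding the solution of a boundary value problem of the second kind (a Neumann problem):

$$\Delta u = 0 \quad \text{for } y > 0, \qquad e_2\,\frac{\partial u}{\partial y} = f\,e_2 \quad \text{for } y = 0, \tag{2.5}$$

with $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2$ the 2D Laplace operator and $u$ the harmonic potential whose gradient gives the sought function.

The first equation in (2.5) is the 2D Laplace equation restricted to the open domain $y > 0$. The second equation is called the boundary condition, and the basis vector $e_2$ is coherent with the embedding of complex functions as fields (the real part is embedded as the $e_2$-component). Using the fundamental solution of the 2D Laplace equation, the solution of the problem leads to

$$f_A(x, y) = (p_y * f)(x)\,e_2 + (h * p_y * f)(x)\,e_1,$$

where $p_y(x) = y / \bigl(\pi (x^2 + y^2)\bigr)$ is the 1D Poisson kernel and $h(x) = 1/(\pi x)$ is the Hilbert kernel. The variable $y$ is a scale parameter, and, taking it equal to zero, the classical analytic signal is obtained.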
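The construction above can be sketched numerically for a sampled 1D signal. The example below is a minimal sketch, assuming periodic boundaries and frequencies measured in cycles per sample; it smooths the signal with the 1D Poisson response exp(-2π s |u|) and adds the Hilbert transform of the smoothed signal as the imaginary part. The function name `analytic_signal` and the complex-number representation (instead of the e1/e2 vector embedding used in the text) are choices made for this example.

```python
import numpy as np

def analytic_signal(f, s=0.0):
    """Scale-space analytic signal of a real 1D array f at scale s >= 0."""
    f = np.asarray(f, dtype=float)
    u = np.fft.fftfreq(f.size)                    # cycles per sample
    lowpass = np.exp(-2.0 * np.pi * s * np.abs(u))  # 1D Poisson transfer function
    spectrum = np.fft.fft(f) * lowpass
    smooth = np.real(np.fft.ifft(spectrum))
    # The Hilbert kernel 1/(pi*x) has the transfer function -i*sign(u).
    hilbert = np.real(np.fft.ifft(-1j * np.sign(u) * spectrum))
    return smooth + 1j * hilbert

# Usage: at s = 0 a sampled cosine maps to a unit-modulus complex exponential.
x = np.arange(64)
z = analytic_signal(np.cos(2.0 * np.pi * 4 * x / 64))
local_amplitude = np.abs(z)
local_phase = np.angle(z)
```

Setting `s` to zero reproduces the classical analytic signal, in line with the remark that the scale parameter can be taken equal to zero.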

2.3. Instantaneous Phase of the Monogenic Signal

Following the construction of the analytic signal above, Felsberg and Sommer proposed an extension to 2D signals and defined the monogenic signal, which combines a gray image with its Riesz transform [5]. Let $f:\mathbb{R}^2\to\mathbb{R}$ be a real-valued signal and $f_M:\mathbb{R}^3\to\mathbb{R}^3$ a vector-valued signal, and let $\{e_1, e_2, e_3\}$ be the orthonormal basis of $\mathbb{R}^3$ such that $f_M = f_1 e_1 + f_2 e_2 + f_3 e_3$. From the 3D Laplace equation restricted to the open half-space $z > 0$ and the boundary condition of the second kind, we obtain the monogenic signal as follows:

$$f_M(x, y; s) = \bigl(h_1 * P(\cdot\,; s) * f\bigr)(x, y)\,e_1 + \bigl(h_2 * P(\cdot\,; s) * f\bigr)(x, y)\,e_2 + \bigl(P(\cdot\,; s) * f\bigr)(x, y)\,e_3,$$

where $P$ is the 2D Poisson kernel and $h = (h_1, h_2)$ is the Riesz kernel, the 2D extension of the Hilbert kernel; the coordinate $z$ is equal to the scale space parameter $s$. Now let $p = P * f$ and $q = (q_1, q_2) = (h_1 * p,\; h_2 * p)$.

The intrinsic dimension expresses the number of degrees of freedom necessary to describe local structure. Constant signals without any structure are of intrinsic dimension zero (i0D), arbitrarily oriented straight lines and edges are of intrinsic dimension one (i1D), and all other possible patterns such as corners and junctions are of intrinsic dimension two (i2D). In general, i2D signals can only be modeled by an infinite number of superimposed i1D signals.

In the case of intrinsic dimension one signals (i.e., $n = 1$ in (2.1)), we obtain

$$p = A\cos\phi, \qquad q_1 = A\sin\phi\cos\theta, \qquad q_2 = A\sin\phi\sin\theta,$$

where $A$, $\theta$, and $\phi$ represent the local amplitude, local orientation, and instantaneous phase, respectively:

$$A = \sqrt{p^2 + q_1^2 + q_2^2}, \qquad \theta = \arctan\frac{q_2}{q_1}, \qquad \phi = \arctan\frac{\sqrt{q_1^2 + q_2^2}}{p}.$$

However, we do not know the correct sign of the phase, since it depends on the directional sense of $\theta$. The best possible solution is to project it onto $(\cos\theta, \sin\theta)$ as

$$r = \phi\,[\cos\theta, \sin\theta]^T.$$
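A minimal sketch of this computation for a gray image is given below. It assumes a frequency-domain implementation with periodic boundaries, a difference-of-Poisson band-pass in place of a single Poisson kernel, and the Riesz transfer function -i u/|u| (sign conventions for the Riesz transform differ between authors). The function name `monogenic_signal` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def monogenic_signal(image, s_fine, s_coarse):
    """Return local amplitude, orientation, and phase of a 2D gray image."""
    rows, cols = image.shape
    uy = np.fft.fftfreq(rows)[:, None]   # vertical frequencies
    ux = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies
    radius = np.sqrt(ux ** 2 + uy ** 2)
    # Difference-of-Poisson band-pass (zero response at DC).
    bandpass = np.exp(-2 * np.pi * s_fine * radius) - np.exp(-2 * np.pi * s_coarse * radius)
    safe = np.where(radius == 0, 1.0, radius)   # avoid division by zero at DC
    spectrum = np.fft.fft2(image) * bandpass
    p = np.real(np.fft.ifft2(spectrum))                       # smoothed signal
    qx = np.real(np.fft.ifft2(-1j * ux / safe * spectrum))    # Riesz component along x
    qy = np.real(np.fft.ifft2(-1j * uy / safe * spectrum))    # Riesz component along y
    amplitude = np.sqrt(p ** 2 + qx ** 2 + qy ** 2)
    orientation = np.arctan2(qy, qx)
    phase = np.arctan2(np.hypot(qx, qy), p)   # lies in [0, pi]
    return amplitude, orientation, phase
```

For a single cosine wave the three components behave like A cos φ and A sin φ (cos θ, sin θ), so the amplitude is constant across the image while the phase carries the structural information. The phase returned here is unsigned; the sign ambiguity noted in the text remains and would be resolved by the projection onto the orientation direction.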

2.4. Local Color Phase of the Color Monogenic Signal

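In 2009, Demarcq et al. constructed a scale-space signal for color images seen as vectors in $\mathbb{R}^3$ [4]. Let $f:\mathbb{R}^2\to\mathbb{R}^3$ be a color image and $f_C:\mathbb{R}^5\to\mathbb{R}^5$ a vector-valued signal, and let $\{e_1, \ldots, e_5\}$ be the orthonormal basis of $\mathbb{R}^5$ such that $f_C = A_1 e_1 + A_2 e_2 + A_3 e_3 + A_4 e_4 + A_5 e_5$. The color image is decomposed in the RGB space, represented as the subspace spanned by $\{e_3, e_4, e_5\}$, so that $f = f_R e_3 + f_G e_4 + f_B e_5$. We now need to find a function that is monogenic and whose $e_3$-, $e_4$-, and $e_5$-components are $f_R$, $f_G$, and $f_B$, respectively. A scale-space signal with an independent scale in each component can be defined by splitting the problem into three boundary value problems in $\mathbb{R}^3$, one for each color channel $k \in \{3, 4, 5\}$ with scale coordinate $x_k$:

$$\Delta_3 u_k = 0 \quad \text{for } x_k > 0, \qquad e_k\,\frac{\partial u_k}{\partial x_k} = f_k\,e_k \quad \text{for } x_k = 0, \tag{2.12}$$

with $\Delta_3 = \partial^2/\partial x_1^2 + \partial^2/\partial x_2^2 + \partial^2/\partial x_k^2$ and $(f_3, f_4, f_5) = (f_R, f_G, f_B)$.

Each solution of the system in (2.12) leads to a monogenic function, giving $f_R^M$, $f_G^M$, and $f_B^M$: each satisfies the Dirac equation in its own subspace and consequently the Dirac equation in $\mathbb{R}^5$. Let $f_C = f_R^M + f_G^M + f_B^M$; then $f_C$ is still monogenic in $\mathbb{R}^5$ (i.e., $D f_C = 0$, with $D$ the Dirac operator) and satisfies the boundary conditions. The scale-space color monogenic signal can therefore be obtained by using the Dirac operator and the Laplace equation as follows:

$$A_1 = h_1 * (P * f_R + P * f_G + P * f_B), \quad A_2 = h_2 * (P * f_R + P * f_G + P * f_B), \quad A_3 = P * f_R, \quad A_4 = P * f_G, \quad A_5 = P * f_B,$$

where $P$ is the 2D Poisson kernel, evaluated at the scale of the respective channel, and $h = (h_1, h_2)$ is the Riesz kernel. As for the analytic or monogenic signal, a color image can be represented in terms of local amplitude and local phase. Our proposal is to use the geometric product in order to compare two vectors in $\mathbb{R}^5$.

In the Clifford algebra $\mathbb{R}_{5,0}$ of the Euclidean vector space $\mathbb{R}^5$, the product of two vectors $a$ and $b$, embedded in $\mathbb{R}_{5,0}$, is given by [3, 4]

$$ab = a \cdot b + a \wedge b,$$

where $a \cdot b$ is the inner product and $a \wedge b$, the wedge product of $a$ and $b$, is a bivector. This product is usually called the geometric product of $a$ and $b$. The inner product is symmetric, and the wedge product is skew symmetric. If $V = v_1 e_1 + v_2 e_2 + v_3 e_3 + v_4 e_4 + v_5 e_5$ is a chosen vector containing structure information $(v_1, v_2)$ and color information $(v_3, v_4, v_5)$, then the geometric product of $f_C$ and $V$ is given by

$$f_C V = \langle f_C V \rangle_0 + \langle f_C V \rangle_2,$$

where $\langle f_C V \rangle_0$ denotes the scalar part, $\langle f_C V \rangle_2$ the bivector part, and $|\langle f_C V \rangle_2|$ the magnitude of the bivector part [6]. According to Clifford algebras, the geometric product reveals the relationship between bivectors and complex numbers [3]. This means that we can form the equivalent of a complex number, $\langle f_C V \rangle_0 + |\langle f_C V \rangle_2|\,B$, by combining a scalar and a unit bivector $B$. The local color phase can be computed as follows:

$$\psi = \arctan\frac{|\langle f_C V \rangle_2|}{\langle f_C V \rangle_0}.$$

This phase describes the angular distance between $f_C$ and a given vector $V$ in $\mathbb{R}^5$; that is, it gives a correlation measure between a pixel fitted with color and structure information and a vector containing chosen color and structure.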
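The angular distance obtained from the geometric product can be sketched without an explicit Clifford algebra library. The example below is a minimal sketch, assuming that vectors are stored as NumPy arrays with their components along the last axis: the scalar part of the product of two vectors is their dot product, and the squared magnitude of the bivector part follows from the Lagrange identity |a ∧ b|² = |a|²|b|² − (a · b)². The function name `geometric_phase` is hypothetical.

```python
import numpy as np

def geometric_phase(a, b):
    """Angle between vectors a and b from the scalar and bivector parts of a*b.

    a and b may be single vectors or arrays of vectors (components on the
    last axis), for example one 5-component vector per pixel.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    scalar = np.sum(a * b, axis=-1)                    # <ab>_0 = a . b
    bivector_sq = np.sum(a * a, axis=-1) * np.sum(b * b, axis=-1) - scalar ** 2
    bivector = np.sqrt(np.maximum(bivector_sq, 0.0))   # |<ab>_2|, clipped for round-off
    return np.arctan2(bivector, scalar)

# Usage: one color monogenic vector against a reference vector
# (both made up for illustration).
signal_vector = np.array([0.1, 0.0, 0.5, 0.4, 0.3])
reference = np.array([0.0, 0.0, 1.0, 1.0, 1.0]) / np.sqrt(3.0)
psi = geometric_phase(signal_vector, reference)
```

Using `arctan2` instead of a plain arctangent keeps the result well defined when the scalar part is zero or negative, which gives a phase in the range from 0 to π.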

3. Local Multi-Model Monogenic Image Feature Descriptors

We make use of a visual representation [11] that is motivated by feature processing in the human visual system, and we define new local multi-model monogenic image descriptors that give an explicit and condensed representation of the local image signal as follows:

$$\Pi = (\mathbf{x}, \phi, \psi, \mathbf{c}).$$

This representation performs a considerable condensation of the information in a local image patch. The symbol $\mathbf{x}$ represents the central coordinates of the local image patch, $\phi$ is the instantaneous phase of the gray monogenic signal, $\psi$ is the local color phase, and $\mathbf{c}$ holds the color values in RGB color space.

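Based on the local multi-model image feature descriptors, the stereo similarity function between two primitives is defined as the weighted sum of squared differences of the instantaneous phase, the local color phase, and the color vector for a pair of stereo images in the local patch:

$$S(\Pi^L, \Pi^R) = \alpha\,d_\phi + \beta\,d_\psi + \gamma\,d_c, \tag{3.2}$$

where $L$ and $R$ denote the left and right images, respectively, $d_\phi = (\phi^L - \phi^R)^2$ is the distance measurement of the instantaneous phase $\phi$, $d_\psi = (\psi^L - \psi^R)^2$ is the distance measurement of the local color phase $\psi$, and $d_c = \|\mathbf{c}^L - \mathbf{c}^R\|^2$ is the distance measurement of the color vector, with $\mathbf{c}^L$ and $\mathbf{c}^R$ in RGB color space. The symbols $\alpha$, $\beta$, and $\gamma$ are weighted coefficients with $\alpha, \beta, \gamma \in [0, 1]$ and $\alpha + \beta + \gamma = 1$.

In order to achieve better coherence with the real scene, we use an adaptive support-weight technique in the local patch [12]. The support weight for each pixel in the window is calculated based on the Gestalt principles, which state that the grouping of pixels should be based on spatial proximity and chromatic similarity. The formulation is given as follows [13]:

$$w(p, p + o) = \exp\left(-\left(\frac{\Delta c_{p,\,p+o}}{\sigma_c} + \frac{\Delta g_{p,\,p+o}}{\sigma_g}\right)\right),$$

$$C(p, \bar{p}) = \frac{\sum_{o \in W} w(p, p + o)\,w(\bar{p}, \bar{p} + o)\,S(p + o, \bar{p} + o)}{\sum_{o \in W} w(p, p + o)\,w(\bar{p}, \bar{p} + o)}, \tag{3.4}$$

where $p$ and $\bar{p}$ are the pixels of interest in the left and right images, $o$ is the pixel offset within the local window $W$, $w(p, p + o)$ represents the weight of the neighboring pixel $p + o$, $\Delta c_{p,\,p+o}$ is the color difference between pixels $p$ and $p + o$, $\Delta g_{p,\,p+o}$ is the Euclidean distance between pixels $p$ and $p + o$, $\sigma_c$ and $\sigma_g$ are user-defined parameters, and $C(p, \bar{p})$ is the aggregated cost between pixel $p$ in the left image and pixel $\bar{p}$ in the right image. Here $S(p + o, \bar{p} + o)$ denotes the similarity function (3.2) evaluated on the descriptors at those two pixels.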
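The per-pixel cost and its adaptive support-weight aggregation can be sketched as follows. This is a minimal sketch under several assumptions that are not taken from the paper: phase differences are used without angular wrapping, the color difference in the support weight is a Euclidean distance in RGB, the patches are square with an odd side length, and the names `pixel_cost`, `support_weights`, and `aggregated_cost` as well as all parameter values in the usage lines are hypothetical.

```python
import numpy as np

def pixel_cost(phi_l, phi_r, psi_l, psi_r, c_l, c_r, alpha, beta, gamma):
    """Weighted sum of squared differences of phase, color phase, and color."""
    d_phi = (phi_l - phi_r) ** 2
    d_psi = (psi_l - psi_r) ** 2
    d_col = np.sum((np.asarray(c_l, float) - np.asarray(c_r, float)) ** 2, axis=-1)
    return alpha * d_phi + beta * d_psi + gamma * d_col

def support_weights(patch, sigma_c, sigma_g):
    """Support weights of a (k, k, 3) color patch relative to its center pixel."""
    patch = np.asarray(patch, dtype=float)
    k = patch.shape[0]
    c = k // 2
    color_diff = np.linalg.norm(patch - patch[c, c], axis=-1)  # chromatic similarity
    yy, xx = np.mgrid[:k, :k]
    spatial = np.hypot(yy - c, xx - c)                         # spatial proximity
    return np.exp(-(color_diff / sigma_c + spatial / sigma_g))

def aggregated_cost(left_patch, right_patch, raw_cost, sigma_c, sigma_g):
    """Normalized, weight-aggregated cost for one pair of candidate pixels."""
    w = support_weights(left_patch, sigma_c, sigma_g) * support_weights(right_patch, sigma_c, sigma_g)
    return float(np.sum(w * raw_cost) / np.sum(w))

# Usage with made-up data: 5x5 patches and a per-pixel cost map for the patch.
rng = np.random.default_rng(0)
left_patch = rng.random((5, 5, 3))
right_patch = rng.random((5, 5, 3))
raw = rng.random((5, 5))
cost = aggregated_cost(left_patch, right_patch, raw, sigma_c=0.5, sigma_g=3.0)
```

Because the aggregated cost is a normalized weighted average, it always lies between the smallest and largest raw cost in the window, and pixels that resemble the center pixel in color and lie close to it dominate the average.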

The proposed feature descriptors have several merits. First, the instantaneous phase contains local orientation (geometric information) and information about contrast transition, so it describes an intrinsically one-dimensional structure in a gray-level image, that is, an image structure that is dominated by one orientation. Examples of different contrast transitions are a dark/bright (bright/dark) edge or a bright (dark) line on a dark (bright) background. Of course, there is a continuum between these different gray-level structures. Using the instantaneous phase as an additional feature allows us to take this information into account (as one parameter in addition to orientation) in a compact way [11]. Second, the local color phase describes the angular distance between the color monogenic signal of a color image and a given color vector, and gives a correlation measure between a pixel fitted with color and structure information and a vector containing chosen color and structure [14]. Finally, the color vector indicates the mean color structure of the local image, because color is also an important cue for improving stereo matching. Because the stereo similarity function with the local multi-model feature descriptors contains local geometric, structure, and color information, it is expected to be more robust against noise and brightness change than single-cue measures in feature matching and 3D reconstruction.

4. Experimentation Results

Once the similarity function is given, a minimization process can be performed to find the optimal disparity. In order to reduce noise sensitivity and simultaneously achieve higher efficiency, the multiscale space and a winner-take-all technique are employed to optimize the disparity map for stereo matching. We test the performance of the proposed local multi-model feature descriptors and similarity function on synthetic and natural stereo images. In the first step, we estimate and compare the disparity maps of a pair of synthetic images using each of the three distance measurements and the proposed stereo similarity function. In the second step, we reconstruct the 3D shape and appearance of a natural object and scene by combining the proposed stereo similarity function with a multiview stereo technique [15].

4.1. Disparity Estimation Experiment

In the first experiment, we chose a pair of color images (Cloth1) from the website [16]; the left image, right image, and ground truth disparity map of the stereo pair are shown in Figures 1(a)–1(c), respectively.

Figure 1: (a) Left image, (b) right image, and (c) ground truth disparity map of the Cloth1 stereo pair.

First, the left and right color images are converted to gray-level images. The gray monogenic signals are computed at several scales, and the instantaneous phases are estimated for both gray images in Poisson scale space. Figure 2 shows three instantaneous phase maps of the left image. At the same time, the scale-space color monogenic signals are computed at several scales, and a chosen reference vector, which combines local geometric structure with a unit vector in RGB color space, is used to estimate the local color phase for both images [14]. Figure 3 shows three color phase maps of the left image.

Figure 2: (a)–(c) Instantaneous phase maps of the left image.

Figure 3: (a)–(c) Local color phase maps of the left image.

Second, the proposed stereo similarity function based on local multi-model monogenic image descriptors is employed to compute the weighted sum of squared differences of the instantaneous phase, local color phase, and color vector. The scale-space color monogenic framework [14] and a local optimization technique [17] are employed to optimize the disparity map according to the adaptive support-weight cost aggregation in (3.4). In the scale space, the disparity field at the coarser scale is used as the starting guess for estimation at the next finer scale, so that a subpixel disparity can be obtained. For example, we first estimate the disparity map at the third scale using the cost functions (3.2) and (3.4). The disparities at the third scale that directly subtend the second-scale location under scrutiny are all used as candidate starting guesses. The candidate leading to the best match is accepted for the regularization step. Estimation proceeds in this fashion from coarser to finer scales, with matching followed by regularization, until the finest level of detail is reached. Figure 4 shows the dense disparity maps based on the proposed algorithm. Figures 4(a)–4(d) show four estimated disparity maps obtained with different weighted coefficients.

Figure 4: (a)–(d) Estimated disparity maps for four different settings of the weighted coefficients.
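The winner-take-all step of the matching procedure can be sketched as follows. This is a minimal sketch that covers a single scale only and uses a plain sum of squared differences over feature channels as a stand-in for the aggregated cost (3.4); the coarse-to-fine propagation and the regularization step are not included. The function names `ssd_cost_volume` and `winner_take_all`, and the convention that a left pixel at column x matches the right pixel at column x − d, are assumptions of this example.

```python
import numpy as np

def ssd_cost_volume(left_feat, right_feat, max_disp):
    """Cost volume of shape (max_disp + 1, H, W) from (H, W, C) feature maps."""
    h, w = left_feat.shape[:2]
    volume = np.full((max_disp + 1, h, w), np.inf)   # inf marks invalid matches
    for d in range(max_disp + 1):
        # Compare left column x with right column x - d where both exist.
        diff = left_feat[:, d:] - right_feat[:, : w - d]
        volume[d, :, d:] = np.sum(diff ** 2, axis=-1)
    return volume

def winner_take_all(volume):
    """Pick, per pixel, the disparity with the smallest cost."""
    return np.argmin(volume, axis=0)

# Usage: a synthetic pair in which the right image is the left image
# shifted by three columns.
rng = np.random.default_rng(0)
left = rng.random((6, 20, 3))
right = np.roll(left, -3, axis=1)
disparity = winner_take_all(ssd_cost_volume(left, right, max_disp=5))
```

In a multiscale scheme, the disparity found at a coarse scale would restrict the range of `d` searched at the next finer scale instead of scanning the full range at every level.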

Third, a comparison is performed to further validate the claims about the performance of the proposed algorithm. The proposed algorithm is compared with a number of algorithms selected from the literature: Sum of Absolute Differences (SAD) [18], Sum of Squared Differences (SSD) [18], Graph Cuts (GC) [18], Discrete Wavelet (DWT) [19], Complex Discrete Wavelet (CWT) [19], and Quaternion Wavelet (QWT) [20]. To quantify the comparison, the root mean square error (RMSE) and the percentage of bad disparities (PBD) are calculated as [19]

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{(x,y)} \bigl|d_E(x, y) - d_G(x, y)\bigr|^2}, \qquad \mathrm{PBD} = \frac{1}{N}\sum_{(x,y)} \Bigl(\bigl|d_E(x, y) - d_G(x, y)\bigr| > \delta\Bigr), \tag{4.1}$$

where $d_E$ and $d_G$ are the estimated and ground truth disparity maps, $N$ is the total number of pixels in an image, and $\delta$ represents the disparity error tolerance. The RMSE and PBD statistics for all of the above algorithms are presented in Table 1. As can be seen in Table 1, the proposed algorithm, with the weighted coefficients $\alpha$, $\beta$, and $\gamma$ listed there, obtains better results (RMSE = 1.82 and PBD = 0.09) than the others.
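The two error statistics can be computed in a few lines. The sketch below assumes that both disparity maps are dense arrays of the same shape, that every pixel is included in the average, and that PBD is returned as a fraction between 0 and 1; the function names and the default tolerance of one pixel are choices made for this example.

```python
import numpy as np

def rmse(estimated, ground_truth):
    """Root mean square error between two disparity maps."""
    err = np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def bad_disparity_fraction(estimated, ground_truth, tolerance=1.0):
    """Fraction of pixels whose absolute disparity error exceeds the tolerance."""
    err = np.abs(np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float))
    return float(np.mean(err > tolerance))

# Usage on a tiny 2x2 example: one of four pixels is off by two levels.
est = np.array([[1, 2], [3, 4]])
gt = np.array([[1, 2], [3, 6]])
print(rmse(est, gt))                    # 1.0
print(bad_disparity_fraction(est, gt))  # 0.25
```

Evaluations on benchmark data often exclude occluded or unknown pixels before averaging; that masking is not shown here.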

4.2. The Robustness against Noise and Brightness

We now investigate the robustness of the proposed approach against noise and brightness change. In a first step, Gaussian noise is added to both color images, and the disparity maps are estimated using the proposed method with four different settings of the weighted coefficients. The RMSE and PBD for each noisy disparity map are also calculated using (4.1). In a second step, a brightness offset is added to the right image. We again compute the disparity maps, RMSE, and PBD with the four different settings of the weighted coefficients. The experimental results in Table 2 show that the proposed approach, with the combined weighted coefficients, is largely insensitive to noise and brightness change.
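The two perturbations used in this robustness test can be reproduced with a small helper. The sketch below assumes 8-bit style images with values in the range 0 to 255 and clips the result to that range; the function name `perturb`, the parameter names, and the noise and brightness levels in the usage lines are hypothetical, since the specific values used in the experiment are not reproduced here.

```python
import numpy as np

def perturb(image, noise_sigma=0.0, brightness=0.0, seed=None):
    """Add a constant brightness offset and zero-mean Gaussian noise to an image."""
    rng = np.random.default_rng(seed)
    out = np.asarray(image, dtype=float) + brightness
    if noise_sigma > 0:
        out = out + rng.normal(0.0, noise_sigma, size=out.shape)
    return np.clip(out, 0.0, 255.0)   # keep values in the valid intensity range

# Usage with a made-up image: noise on both views, brightness shift on the right view only.
left = np.full((4, 4, 3), 100.0)
right = np.full((4, 4, 3), 100.0)
noisy_left = perturb(left, noise_sigma=5.0, seed=1)
noisy_right = perturb(right, noise_sigma=5.0, seed=2)
brighter_right = perturb(right, brightness=20.0)
```

Fixing the seed makes a noise experiment repeatable, which is convenient when the same perturbed pair is evaluated under several settings of the weighted coefficients.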

4.3. 3D Reconstruction Experimentation

In the last experiment, the proposed approach is used to reconstruct a set of oriented points for natural 3D scenes by multiview stereopsis. We first captured stereo images around the 3D object and scene (a static color paper cup and a deforming color-textured paper) under natural light with two color cameras (AVT Guppy F146C). We also calibrated the intrinsic and extrinsic parameters for all cameras and images and estimated the corresponding epipolar lines among the images. Feature points in all images are detected using Speeded-Up Robust Features (SURF) with a blob response threshold of 1000 [21] and are matched using the corresponding epipolar constraints and the proposed similarity function in (3.2) and (3.4). From these initially matched feature points, we reconstruct a sparse set of 3D oriented points for the object and scene, similar to multiview stereopsis in [15]. We then expand and filter this set into a robust, dense set of 3D oriented points, one per image cell, and reconstruct the three-dimensional surface of the object. For the static color paper cup, we capture 16 images and reconstruct the dense 3D oriented points and shape. Figure 5 shows the captured images, the dense 3D oriented points, and the three-dimensional surface at the 1st, 7th, and 13th views, respectively. For the deforming color-textured paper, we capture the stereo sequence using two cameras at a rate of 7 frames per second and reconstruct the time-varying 3D shape and motion. Figure 6 shows the captured image of one camera and the time-varying 3D shape at the 0th, 20th, 40th, 80th, and 100th frames.

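Figure 5: (a)–(i) Captured images, dense 3D oriented points, and reconstructed three-dimensional surfaces of the paper cup at the 1st, 7th, and 13th views.

Figure 6: (a)–(l) Captured images from one camera and the time-varying 3D shape of the deforming paper at selected frames.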

To further validate the performance of the proposed method, a new set of dense 3D patches for the static color paper cup is also reconstructed with the SSD-based photometric discrepancy function in [15]. For both algorithms, we calculate the mean number of feature points (MFP) and the percentage of bad disparities (PBD) for each image, the number of 3D oriented points at the initial matching (MAT_3D), expanding (EXP_3D), and filtering (FIL_3D) stages, and the percentage of image cells that are not reconstructed (PNR). The statistical results are shown in Table 3.

5. Conclusions

In this paper, we propose a stereo similarity function based on local multi-model monogenic image feature descriptors to solve the stereo-matching problem. The local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and the local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which extends the analytic signal to gray-level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of a 2D image signal. The color monogenic signal extends the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images demonstrate the performance of the proposed approach. A shortcoming of the proposed method is that it requires considerably more run time and storage space. Future work will therefore focus on improving its efficiency and on further evaluation of the method.

Acknowledgments

This work was supported by the National High Technology Research and Development Program (2008AA04Z121), the National Natural Science Foundation of China (50975228), and the Technical Innovation Research Program (2010CXY1007) of Xi’an, Shaanxi.

References

[1] J. Li, H. Zhao, X. Zhou, and C. Shi, “Robust stereo image matching using a two-dimensional monogenic wavelet transform,” Optics Letters, vol. 34, no. 22, pp. 3514–3516, 2009.
[2] J. Li, H. Zhao, Q. Fu, and K. Jiang, “Space-time stereo analysis combining local structure and modulation features in the monogenic wavelet domain,” Optics Letters, vol. 35, no. 7, pp. 1049–1051, 2010.
[3] J. Vince, Geometric Algebra for Computer Graphics, Springer, London, UK, 2008.
[4] G. Demarcq, L. Mascarilla, and P. Courtellemont, “The color monogenic signal: A new framework for color image processing. Application to color optical flow,” in Proceedings of the IEEE International Conference on Image Processing (ICIP '09), pp. 481–484, November 2009.
[5] M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 3136–3144, 2001.
[6] L. Wietzke and G. Sommer, “The signal multi-vector,” Journal of Mathematical Imaging and Vision, vol. 37, pp. 132–150, 2010.
[7] D. Gabor, “Theory of communication,” Journal IEE (London), vol. 93, no. 26, pp. 429–457, 1946.
[8] S. L. Hahn, Hilbert Transforms in Signal Processing, Artech-House, Boston, Mass, USA, 1996.
[9] C. Zetzsche and E. Barth, “Fundamental limits of linear filters in the visual processing of two-dimensional signals,” Vision Research, vol. 30, no. 7, pp. 1111–1117, 1990.
[10] M. Felsberg, “Low-level image processing with the structure multivector,” Tech. Rep. 2016, Department of Computer Science, Kiel University, Kiel, Germany, 2002.
[11] N. Kruger and M. Felsberg, “An explicit and compact coding of geometric and structural image information applied to stereo processing,” Pattern Recognition Letters, vol. 25, no. 8, pp. 849–863, 2004.
[12] K. J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 650–656, 2006.
[13] Y. Zhang, M. Gong, and Y. H. Yang, “Real-time multi-view stereo algorithm using adaptive-weight parzen window and local winner-take-all optimization,” in Proceedings of the 5th Canadian Conference on Computer and Robot Vision (CRV '08), pp. 113–120, May 2008.
[14] J. Li, H. Zhao, K. Jiang, X. Zhou, and X. Tong, “Multiscale stereo analysis based on local-color-phase congruency in the color monogenic signal framework,” Optics Letters, vol. 35, no. 13, pp. 2272–2274, 2010.
[15] Y. Furukawa and J. Ponce, “Accurate, dense, and robust multi-view stereopsis,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), vol. 1–8, pp. 2118–2125, June 2007.
[16] http://vision.middlebury.edu/stereo/data/scenes2006/FullSize/Cloth1/.
[17] L. Wang, M. Gong, M. Gong, and R. Yang, “How far can we go with local optimization in real-time stereo matching - A performance study on different cost aggregation approaches,” in Proceedings of the 3rd International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT '06), pp. 129–136, June 2006.
[18] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7–42, 2002.
[19] A. Bhatti and S. Nahavandi, “Stereo image matching using wavelet scale-space representation,” in Proceedings of the International Conference on Computer Graphics, Imaging and Visualisation (CGIV '06), pp. 267–272, July 2006.
[20] Y. I. Xu, X. Yang, P. Zhang, L. I. Song, and L. Traversoni, “Cooperative stereo matching using quaternion wavelets and top-down segmentation,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '07), pp. 1954–1957, July 2007.
[21] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

Copyright

Copyright © 2011 Jinjun Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
