Abstract

We propose an entirely novel family of score functions for blind signal separation (BSS), based on the family of mixture generalized gamma density which includes generalized gamma, Weilbull, gamma, and Laplace and Gaussian probability density functions. To blindly extract the independent source signals, we resort to the FastICA approach, whilst to adaptively estimate the parameters of such score functions, we use Nelder-Mead for optimizing the maximum likelihood (ML) objective function without relaying on any derivative information. Our experimental results with source employing a wide range of statistics distribution show that Nelder-Mead technique produce a good estimation for the parameters of score functions.

1. Introduction

By definition, independent component analysis (ICA) is the statistical method that searches for a linear transformation, which can effectively minimize the statistical dependence between its components [1]. Under the physically plausible assumption of mutual statistical independence between these components, the most application of ICA is blind signal separation (BSS). In its simplest form, BSS aims to recover a set of unknown signals, the so-called original sources 𝐬(𝑡)=[𝑠1(𝑡),𝑠2(𝑡),,𝑠𝑛(𝑡)]𝑇𝐑𝑛, by relying exclusively on information that can be extracted from their linear and instantaneous mixtures 𝐱(𝑡)=[𝑥1(𝑡),𝑥2(𝑡),,𝑥𝑚(𝑡)]𝑇𝐑𝑚, given by 𝐱(𝑡)=𝐀𝐬(𝑡),𝐭=1,2,,𝑚,(1.1) where 𝐀𝐑mxn is an unknown mixing matrix of full rank and 𝑚𝑛. In doing so, BSS remains truly (blind) in the sense that very little to almost nothing be known a priori for the mixing matrix or the original source signals.

Often sources are assumed to be zero-mean and unit-variance signals with at most one having a Gaussian distribution. The problem of source estimation then boils down to determining the unmixing matrix 𝐖𝐑nxm such that the linear transformation of the sensor observation is 𝐮(𝑡)=𝐖𝐱(𝑡),𝑡=1,2,,𝑛,(1.2) where 𝐮(𝑡)=[𝑢1(𝑡),𝑢2(𝑡),,𝑢𝑛(𝑡)]𝑇𝐑𝑛 yield an estimate of vector 𝐬(𝑡) corresponding to the original or true sources. In general, the majority of BSS approaches perform ICA, by essentially optimizing the negative log-likelihood (objective) function with respect to the unmixing matrix 𝐖 such that L(𝐮,𝐖)=𝑛𝑙=1𝐸log𝑝𝑢l𝑢l||||logdet(𝐖),(1.3) where 𝐸[] represents the expectation operator and 𝑝𝑢l(𝑢l) is the model for the marginal probability density function (pdf) of 𝑢𝑙, for all 𝑙=1,2,,𝑛. Normally, matrix W is regarded as the parameter of interest and the pdfs of the sources are considered to be nuisance parameters. In effect, when correctly hypothesizing upon the distribution of the sources, the maximum likelihood (ML) principle leads to estimating functions, which in fact are the score functions of the sources [2]𝜑𝑙𝑢𝑙𝑑=𝑑𝑢𝑙log𝑝𝑢𝑙𝑢𝑙.(1.4) In principle, the separation criterion in (1.3) can be optimized by any suitable ICA algorithm where contrasts are utilized (see; e.g., [2]). A popular choice of such a contrast-based algorithm is the so-called fast (cubic) converging Newton-type (fixed-point) algorithm, normally referred to as FastICA [3], based on𝐖𝑘+1=𝐖𝑘𝐸𝜑+𝐃(𝐮)𝐮𝑇𝐸𝜑diag𝑙𝑢𝑙𝑢𝑙𝐖𝑘,(1.5) where, as defined in [4],1𝐃=diag𝐸𝜑𝑙𝑢𝑙𝑢𝑙𝜑E𝑙𝑢𝑙,(1.6) with 𝜑(𝑡)=[𝜑1(𝑢1),𝜑2(𝑢2),,𝜑𝑛(𝑢𝑛)]T being valid for all 𝑙=1,2,,𝑛. In the ICA framework, accurately estimating the statistical model of the sources at hand is still an open and challenging problem [2]. Practical BSS scenarios employ difficult source distributions and even situations where many sources with very different pdfs are mixed together. Towards this direction, a large number of parametric density models have been made available in recent literature. Examples of such models include the generalized Gaussian density (GGD) [5], the generalized lambda density (GLD), and the generalized beta distribution (GBD) or even combinations and generalizations such as super and generalized Gaussian mixture model (GMM) [6], the generalized gamma density (GGD) [7], the Pearson family of distributions [4], and even the so-called extended generalized lambda distribution (EGLD) which is an extended parameterizations of the aforementioned GLD and GBD models [8]. In the following section, we propose Mixture Generalized Gamma Density (MGΓD) for signal modeling in blind signal separation.

2. Mixture Generalized Gamma Density (MGΓD)

A Mixture Generalized Gamma Density (MGΓD) is a parametric statistical model which assumes that the data originates from weighted sum of several generalized gamma sources [9]. More specifically, a MGΓD is defined as𝑝(𝑢𝜽)=𝑘𝑖=1𝑚𝑖𝑝𝑖𝑢𝑐𝑖,𝛼𝑖,𝛽𝑖,𝛾𝑖,(2.1) where(i)𝜽=(𝑚𝑖,𝑐𝑖,𝛼𝑖,𝛽𝑖,𝛾𝑖),𝑖=1,2,,𝑘, (ii)𝑘 is the number of mixture density components,(iii)𝑚𝑖 is the 𝑖th mixture weight and satisfies 𝑚𝑖0,𝑘𝑖=1𝑚𝑖=1,(iv)𝑝𝑖(𝑢|𝑐𝑖,𝛼𝑖,𝛽𝑖,𝛾𝑖) is an individual density of the generalized gamma density which is characterized by [7],𝑝𝑖𝑢𝑐𝑖,𝛼𝑖,𝛽𝑖,𝛾𝑖=𝛾𝑖𝛽𝛼𝑖𝑖||𝑢𝑐𝑖||𝛼𝑖𝛾𝑖1𝛼2Γ𝑖exp𝛽𝑖||𝑢𝑐𝑖||𝛾𝑖,(2.2) where 𝑐𝑖 is the location parameter, 𝛽𝑖>0 is the scale parameter, 𝛼𝑖>0 is the shape/power parameter, and 𝛾𝑖>0 is the shape parameter. Γ(𝛼) is the gamma function, defined byΓ(𝑥)=0𝑡𝑥1𝑒𝑡𝑑𝑡.(2.3) By varying the parameters, it is possible to characterize a large class of distributions such as Gaussian, sub-Gaussian (more peaked, than Gaussian, heavier tail), and supergaussian (flatter, more uniform). It is noticed that for 𝛾=1, the (GΓD) define gamma density as special case. Furthermore, if 𝛾=2 and 𝛼=0.5, it become the Gaussian pdf, and if 𝛾=1and 𝛼=1, it represent the Laplacian pdf.

Figures 1 and 2 show some examples of pdf for MGΓD for 𝑘=1 and 𝑘=2. Thanks to the shape parameters, the MGΓD is more flexible and can approximate a large class of statistical distributions, this distribution requires to estimate 5𝑘 parameters, 𝜃(𝑚𝑖,𝑐𝑖,𝛼𝑖,𝛽𝑖,𝛾𝑖)𝑖=1,2,,𝑘. Particularly, we discuss the estimation of these parameters in detail in the following section.

3. Numerical Optimization of the Log-Likelihood Function to Estimate MGΓD Parameters

We propose in this section a generalization of the method proposed in [9] which address only the case of 2 components by setting the derivatives of the log-likelihood function to zeros. The log-likelihood function of (2.2), given by [10]𝐿(𝐮𝜽)=𝑘𝑖=1𝑁1𝑗=1𝑖,𝑗𝑚log𝑖𝑝𝑖𝑢𝑗𝑐𝑖,𝛼𝑖,𝛽𝑖,𝛾𝑖,(3.1) where 𝑁 the sample size and 𝑖,𝑗=𝑝(𝑖𝑢𝑗)(𝑖=1,,𝑘,𝑗[0,𝑁1]) represents the conditional expectation of 𝑝𝑖 given the observation 𝑢𝑗, this means the posterior probability that 𝑢𝑗 belongs to the 𝑖th component. In the case of generalized gamma distribution, if we substitute (2.2) into (3.1) and after some manipulation, we obtain the following form of 𝐿(𝐮𝜽)𝐿(𝐔𝜽)=𝑘𝑖=1𝑁𝑗=1𝑖,𝑗𝑚log𝑖+𝑘𝑖=1𝑁𝑗=1𝑖,𝑗𝛾log𝑖+𝛼𝑖𝛾𝑖||𝑢1log𝑗𝑐𝑖||log(2)+𝛼𝑖𝛽log𝑖Γ𝛼log𝑖𝛽𝑖||𝑢𝑗𝑐𝑖||𝛾𝑖.(3.2) Accordingly, we obtain for 𝑖=1,2,,𝑘 the following nonlinear equation related to the estimated parameters by derivatives of the log-likelihood function with respect to 𝑐𝑖,𝛼𝑖,𝛽𝑖, and 𝛾𝑖and setting these derivatives to zeros, we obtain 𝜕𝐿(𝐔𝜽)𝜕𝑐𝑖=0,𝜕𝐿(𝐔𝜽)𝜕𝛽𝑖=0,𝜕𝐿(𝐔𝜽)𝜕𝛾𝑖=0,𝜕𝐿(𝐔𝜽)𝜕𝛼𝑖𝜓𝛼=0,𝑖𝛽=log𝑖+𝑁𝑗=1𝑖𝑗||𝑢log𝑗𝑐𝑖||𝛾𝑖𝑁1𝑗=1𝑖𝑗,𝛽𝑖=𝛼𝑖𝑁𝑗=1𝑖𝑗𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝑖,(3.3)𝑁𝑗=1𝑖𝑗+𝛼𝑖𝑁𝑗=1𝑖𝑗||𝑢log𝑗𝑐𝑖||𝛾𝑖=𝛽𝑖𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖||𝑢log𝑗𝑐𝑖||𝛾𝑖,(3.4) where Ψ() is the digamma function (Ψ(𝑥)=Γ(𝑥)/Γ(𝑥)). After a little mathematical manipulation, the ML estimate of 𝛾𝑖 is obtained𝜓𝑁𝑗=1𝑖𝑗𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖𝑁𝑗=1𝑖𝑗𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖||𝑢log𝑗𝑐𝑖||𝛾𝑖𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖𝑁𝑘=1𝑖𝑘||𝑢log𝑘𝑐𝑖||𝛾𝑖log𝑁𝑗=1𝑖𝑗2𝑁𝑗=1𝑖𝑗𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖||𝑢log𝑗𝑐𝑖||𝛾𝑖𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖𝑁𝑘=1𝑖𝑘||𝑢log𝑘𝑐𝑖||𝛾𝑖𝑁𝑗=1𝑖𝑗||𝑢log𝑗𝑐𝑖||𝛾𝑖𝑁𝑗=1𝑖𝑗=0.(3.5) Given the estimate of 𝛾𝑖, it is straightforward to derive the estimate for 𝛼𝑖,𝛽𝑖, and 𝑐𝑖. Let ̂𝛾𝑖 be the estimate of 𝛾𝑖. Then, 𝛼𝑖=𝑁𝑗=1𝑖𝑗𝑁𝑗=1𝑖𝑗|𝑢𝑗𝑐𝑖|𝛾𝑖𝑁𝑗=1𝑖𝑗𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖||𝑢log𝑗𝑐𝑖||𝛾𝑖𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖𝑁𝑘=1𝑖𝑘||𝑢log𝑘𝑐𝑖||𝛾𝑖,̂𝛽𝑖=𝛼𝑖𝑁𝑗=1𝑖𝑗𝑁𝑗=1𝑖𝑗||𝑢𝑗𝑐𝑖||𝛾𝑖,𝑁𝑗=1𝑖,𝑗𝛼𝑖̂𝛾𝑖𝜂𝑢1𝑗̂𝛾𝑖̂𝛽𝑖𝜂𝑢𝑗||𝑢𝑗̂𝑐𝑖||̂𝛾𝑖=0,(3.6) where 𝜂(𝑥)=1if𝑢𝑗𝑐𝑖<0,1if𝑢𝑗𝑐𝑖0,(3.7) where 𝛼𝑖 and ̂𝛽𝑖 are the resulting estimates for 𝛼𝑖 and 𝛽𝑖, respectively, and to estimate the location parameter, we solve (3.4) by gradient ascent. The estimation of weight coefficient obtains directly from 𝑖,𝑗 as follows [10]m𝑖=1𝑁N𝑗=1𝑖,𝑗.(3.8)

However, (3.5) cannot be easily solved, so we adopt the gradient ascent algorithm to obtain the estimate of 𝛾𝑖 and determine the estimates of 𝛼𝑖, 𝛽𝑖, and 𝑐𝑖 uniquely using this value of 𝛾𝑖.

Alternative numerical method can be used to estimate the parameters is called NM, where the appeal of the NM optimization technique lies in the fact that it can minimize the negative of the log-likelihood objective function given in (3.2), essentially without relying on any derivative information. Despite the danger of unreliable performance (especially in high dimensions), numerical experiments have shown that the NM method can converge to an acceptably accurate solution with substantially fewer function evaluations than multidirectional search descent methods. Good numerical performance and a significant improvement in computational complexity for our estimation method, therefore, optimization with the NM technique, produce a good estimation for parameters in MGΓD. To show the performance of NM, we consider the next example.

3.1. Example

We generate random number from MGΓD with parameters 𝑘=2, 𝑚1=0.25, 𝑚2=0.75, 𝛼1=0.5, 𝛼2=0.5, 𝛽1=2, 𝛽2=2, 𝛾1=1, 𝛾2=4, 𝑐1=0, and 𝑐2=10. By performs NM, we obtain best estimation for parameters. As we show in Table 1, the first 5th values of estimated parameters after being sorted according the value of function. In the following section, we resolve to FastICA algorithm for blind signal separation (BSS), this algorithm depends on the estimated parameters and an unmixing matrix W which estimated by FastICA algorithm.

4. Application of MGΓD in Blind Signal Separation

Novel flexible score function is obtained, by substituting (2.1) into (1.4) for the source estimates 𝑢𝑙,𝑙=1,2,,𝑛, it quickly become obvious that our proposed score function inherits a generalized parametric structure, which in turn can be attributed to the highly flexible MGΓD parent model. In this case, a simple calculus yield the flexible BSS score function 𝜑𝑙𝑢l=𝜽1𝑝𝑢𝑙𝑢𝑙𝑘𝑖=1𝑚𝑖𝛾𝑖𝛽𝑖𝛼𝑖𝑢sign𝑙𝑐𝑖||𝑢𝑙𝑐𝑖||(𝛼𝑖𝛾𝑖2)𝛽exp𝑖||𝑢𝑙𝑐𝑖||𝛾𝑖𝛼2Γ𝑖×𝛼𝑖𝛾𝑖1𝛽𝑖𝛾𝑖||𝑢𝑙𝑐𝑖||𝛾𝑖.(4.1) In principle 𝜑𝑙(𝑢𝑙𝜽) is capable of modeling a large number of signals, such as speech or communication signals, as well as various other types of challenging heavy- and light-tailed distributions.

This is due to the fact that its characterization depends explicitly on all parameters 𝑚𝑖,𝑐𝑖,𝛼𝑖,𝛽𝑖,𝛾𝑖, 𝑖=1,2,,𝑘. Other commonly used score functions can be obtained by substituting appropriate values for 𝑚𝑖,𝑐𝑖,𝛼𝑖,𝛽𝑖, and 𝛾𝑖 in (4.1), for instance, when 𝑘=1, we have score function 𝜑𝑙ul=𝑢𝜽sign𝑙𝑐1||𝑢𝑙𝑐1||𝛽1𝛾1||𝑢𝑙𝑐1||𝛾1𝛼1𝛾1+1.(4.2) When 𝛼1𝛾1=1 and 𝛽1=1, we have a scaled form of the GGD-based score function constitutes such a special case of(4.2)𝜑𝑙𝑢𝑙𝜽=𝛾1𝑢sign𝑙𝑐1||𝑢𝑙𝑐1||𝛾11.(4.3) The function 𝜑𝑙(𝑢𝑙𝜽) could become singular, in some special cases, essentially those corresponding to heavy-tailed (or sparse) distribution defined for 𝛼𝑖𝛾𝑖1 with 𝛼𝑖=1 and |𝑢𝑙𝑐𝑖|=0. In practice, to circumvent such deficiency, the denominator in (4.1) can be modified slightly to read 𝜑𝑙𝑢l=𝜽1𝑝𝑢𝑙𝑢𝑙+𝜀𝑘𝑖=1𝑚𝑖𝛾𝑖𝑢sign𝑙𝑐𝑖||𝑢𝑙𝑐𝑖||(𝛼𝑖𝛾𝑖2)||𝑢exp𝑙𝑐𝑖||/𝛽𝑖𝛾𝑖2𝛽𝑖𝛼𝑖𝛾𝑖Γ𝛼𝑖×𝛼𝑖𝛾𝑖𝛾1𝑖𝛽𝛾𝑖𝑖||𝑢𝑙𝑐𝑖||𝛾𝑖,(4.4) where 𝜀 is a small positive parameter (typically around 104) which, when put to use, can almost always guarantee that the discontinuity of (4.1) or values in or approaching the region |𝑢𝑙𝑐𝑖|=0 is completely avoided.

5. Numerical Experiments

To investigate the separation performance of the proposed MGΓD-based FastICA BSS method, a set of numerical experiments are preformed, in which we consider only two cases when 𝑘=1, 𝑘=2, and we illustrate this in the following two examples.

5.1. Example  1

In this example, 𝑘=1 and the data set used consists of different realizations of independent signals, with distributions shown in Table 2. Note that this is a large-scale and substantially difficult separation problem, since it involves a Gaussian, various super- and sub-Gaussian symmetric PDFs, as well as asymmetric distributions. In all cases, the number of data samples has also been designed to be relatively small; for example, 𝑁=250. The source signals are mixed (noiselessly) with randomly generated full-rank mixing matrices A. The FastICA method is implemented in the so-called simultaneous separation mode whereas the stopping criterion is set to 𝜀=104. FastICA is executed using the flexible MGΓD model is used to model the distribution of the unknown sources, while (3.3)–(3.5) are employed to adaptively calculate the necessary parameters of the MGΓD-based score function defined in (4.4).

Now, to show the performance in this case, we consider three source signals (source), where these signals are generated randomly from Weilbull, gamma, and exponential distributions as follows:𝐬1𝐬=wblrnd(1250,1),2𝐬=exprnd(1,1,250),3=gamrnd(1,1,250).(5.1) Let the mixing matrix A and unmixing matrix W be defined as follows:𝐀=.56.79.37.75.65.86.17.32.48,𝐖=0.21280.21770.25660.21840.19240.16461.94851.87981.8687.(5.2) By using the equation 𝐱=𝐀𝐬, we obtain mixed signals as show in Figure 3, where mixing signals are in the left and source signals are in the right.

After using FastICA, we recover the sources, and we show the estimated signals in the left and original signals in the right in Figure 4 with different in scale only.

5.2. Example  2

In this example, 𝑘=2 in which the data set used consists of different realizations of independent signals. Note that each signal not only a Gaussian, super-, and sub-Gaussian PDFs, but it is mixed of this PDFs as shown in Table 3.

To show the performance in this case, we consider three source signals (source) where these signals are generated randomly from Gamma_Lapalce (𝐬1), Weilbull (𝐬2), and Gaussian_Laplace (𝐬3) distributions. Let the mixing matrix A and unmixing matrix W be defined as follows:𝐀=.56.79.37.75.65.86.17.32.48,𝐖=0.12932.65794.23221.44183.15339.99302.33108.410711.6627.(5.3) By using the equation 𝐗=𝐀𝐒,we obtain mixed signals as show in Figure 5, where mixed signals are in the left and source signals are in the right.

After using FastICA, we recover the source, and we show the estimated signals in the left with scale and original signals in the right in Figure 6.

6. Algorithm Performance

The separation performance for ICA algorithm is evaluated with the crosstalk error measure𝑃𝐼=𝑛𝑖=1𝑛𝑗=1||𝑝𝑖𝑗||2max1𝑙𝑛||𝑝𝑖𝑙||2+1𝑛𝑗=1𝑛𝑖=1||𝑝𝑖𝑗||2max1𝑙𝑛||𝑝𝑙𝑗||21.(6.1) Note that here, 𝑝𝑖𝑗 represents the elements of the permutation matrix 𝐏=𝐖𝐀, which after assuming that all sources have been successfully separated should ideally reduce to a permuted and scaled version of the identity matrix. The separation performance for the first example is 𝑃𝐼=12.68dB and for second example is 𝑃𝐼=16.68dB.

7. Conclusions

We have derived a novel parametric family of flexible score functions, based exclusively on the MGΓD model. To calculate the parameters of these functions in an adaptive BSS setup, we have chosen to maximize the ML equation with the NM optimization method. This alleviates excessive computational cost requirements and allows for a fast practical implementation of the FastICA. Simulation results show that the proposed approach is capable of separating mixtures of signals.