Abstract

The extraction of stochastic source signals whose probability density functions (PDFs) are skewed is important in many applications, such as biomedical signal processing and mechanical fault diagnosis. This paper shows that a proposed algorithm based on conditional expectation can quickly extract the skewed source signal with the maximal absolute value of skewness. The proposed algorithm has two main advantages over existing conditional expectation-based algorithms. First, it does not require prior knowledge of the positive support of the desired source, namely, the time indices at which the source of interest is positive. Second, it can be employed in both the determined and underdetermined cases. Furthermore, the proposed algorithm relies mainly on first- and second-order statistics and requires no preprocessing, so its computational cost is low. Simulation results show the superiority of the proposed algorithm over existing methods and indicate that it also performs well in the underdetermined case when the number of sensors is slightly less than the number of sources.

1. Introduction

The goal of independent source extraction is to estimate a specific source from observed mixtures of mutually independent source signals. It can be applied in various areas, such as speech and image processing, biomedical signal processing, mechanical fault diagnosis, and wireless communication. Owing to these wide applications, independent source extraction has attracted much attention over the past few decades [1, 2]. In some applications, especially biomedical signal processing and mechanical fault diagnosis, it is important to extract stochastic source signals with skewed probability density functions (PDFs). For instance, the mechanical vibrations produced by defective bearings, which are to be extracted in vibration analysis, may have asymmetric PDFs [3]. Similarly, the fetal electrocardiogram (FECG) signal has a skewness quite different from that of the maternal electrocardiogram (MECG) signals [4], and the FECG must be estimated from mixtures of the FECG and the MECG. Existing extraction algorithms mainly employ second- and higher-order statistics to exploit the statistical independence of the sources, and some of them, such as FastICA [5], need to preprocess the mixture data. Recently, Zarzoso et al. [6], Xu et al. [7], and Xu and Shen [8] proposed a class of more computationally efficient algorithms for independent source extraction based on first-order statistics and the conditional expectation. However, these algorithms require prior knowledge of the time indices at which the source of interest is positive, that is, the positive support of the desired source, which reduces their practicability.

In this paper, we propose a new extraction algorithm based on the conditional expectation for the skewed source signal with the maximal absolute value of skewness. It does not require prior knowledge of the positive support of the desired source and can be applied in both the determined and underdetermined cases. Note that, once the source signal with the maximal absolute value of skewness has been obtained, the source with the second largest absolute value of skewness can be extracted by the same algorithm from the mixtures after the component of the estimated source has been removed (e.g., through linear regression as in [6]). Likewise, if required, the remaining skewed source signals with different skewness values can be obtained sequentially. For simplicity, this paper assumes that the desired source is the skewed source signal with the maximal absolute value of skewness. The proposed algorithm obtains the column vector of the mixing matrix corresponding to the source of interest from the conditional expectation and retrieves the desired source by the minimum mean-squared error (MMSE) beamforming approach [9]. Starting from an initial column vector estimated from an interval that is approximately purely positive or negative, the algorithm estimates the desired source accurately within a few iterations. The proposed algorithm is cost-effective, since it relies mainly on first- and second-order statistics and requires no preprocessing. Simulation results validate the superiority of the proposed algorithm over existing methods and show that it performs well even in the underdetermined case when the number of sensors is close to the number of sources.
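
The deflation step mentioned above can be sketched as follows. This is a minimal pure-Python illustration, assuming that each mixture is regressed on the extracted signal by ordinary least squares and that the fitted part is subtracted; the function name `deflate` is hypothetical, and the exact regression used in [6] may differ.

```python
from typing import List

Matrix = List[List[float]]


def deflate(X: Matrix, s_hat: List[float]) -> Matrix:
    """Regress every mixture on s_hat by least squares and subtract the fit.

    c_m = sum_t x_m(t) s_hat(t) / sum_t s_hat(t)^2, then x_m(t) - c_m s_hat(t).
    """
    energy = sum(v * v for v in s_hat)
    if energy == 0.0:
        raise ValueError("s_hat must not be identically zero")
    deflated = []
    for row in X:
        c = sum(x * s for x, s in zip(row, s_hat)) / energy
        deflated.append([x - c * s for x, s in zip(row, s_hat)])
    return deflated


# Toy usage: the first mixture is exactly 3 * s_hat, so it is removed completely;
# the second row of the result is orthogonal to s_hat.
X = [[3.0, 6.0], [1.0, -1.0]]
s_hat = [1.0, 2.0]
X_deflated = deflate(X, s_hat)
```

When the extracted signal is itself a linear combination of the mixtures (as it is for an MMSE beamformer output), the deflated mixtures obey one linear constraint, so their covariance matrix is rank-deficient; a repeated extraction would then need something other than a plain matrix inverse, for example a pseudo-inverse.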

2. Data Model

Consider the instantaneous linear mixture

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \tag{1}$$

where $\mathbf{s}(t) = [s_1(t), \ldots, s_N(t)]^\top$ is composed of $N$ mutually independent source signals, $\mathbf{x}(t) = [x_1(t), \ldots, x_M(t)]^\top$ consists of the $M$ mixtures received by $M$ sensors, $\mathbf{A} \in \mathbb{R}^{M \times N}$ is the unknown mixing matrix, and the superscript $\top$ denotes the transpose. In this paper, we consider stochastic source signals with unimodal continuous PDFs, since many signals, such as some vibration signals [10] and ECG signals [11], have these characteristics. For convenience, we further assume that the source signals are stationary with zero mean and unit variance. The stationarity assumption is reasonable, since some nonstationary sources can be divided into several stationary blocks that are processed separately. For example, some ECG signals can be regarded as stationary within the duration of one heartbeat, although they are nonstationary overall [12]. Our goal is to extract the skewed source signal with the maximal absolute value of skewness from the available mixtures.
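
As an illustration of (1), the sketch below builds mixtures from standardized sources in plain Python (standard library only), with matrices stored as lists of rows. The helper names `mix` and `standardize` and the toy sizes are hypothetical; the example uses the underdetermined setting $M < N$.

```python
import random
from typing import List

Matrix = List[List[float]]


def mix(A: Matrix, S: Matrix) -> Matrix:
    """Instantaneous linear mixture x(t) = A s(t), eq. (1).

    A is M x N, S is N x T (row n holds s_n(t)); returns the M x T mixtures.
    """
    N = len(A[0])
    if len(S) != N:
        raise ValueError(f"A has {N} columns but {len(S)} sources were given")
    return [[sum(a_mn * s_n[t] for a_mn, s_n in zip(row, S)) for t in range(len(S[0]))]
            for row in A]


def standardize(s: List[float]) -> List[float]:
    """Zero mean and unit variance, as assumed for every source."""
    T = len(s)
    mu = sum(s) / T
    sd = (sum((v - mu) ** 2 for v in s) / T) ** 0.5
    return [(v - mu) / sd for v in s]


# Underdetermined toy case (assumed sizes): N = 3 sources, M = 2 sensors.
rng = random.Random(0)
S = [standardize([rng.gauss(0.0, 1.0) for _ in range(1000)]) for _ in range(3)]
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]
X = mix(A, S)
```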

3. The Proposed Algorithm

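Since the first-order statistics-based algorithms [6–8] generally extract the source of interest under the condition that the mixing matrix is unitary, they can only be used in the determined case. In this paper, we remove this condition and obtain the column vector of the mixing matrix corresponding to the desired source from the conditional expectations

$$\mathbb{E}[\mathbf{x}(t) \mid s_k(t) > 0] = \alpha\,\mathbf{a}_k, \qquad \mathbb{E}[\mathbf{x}(t) \mid s_k(t) < 0] = \beta\,\mathbf{a}_k, \tag{2}$$

where $\mathbb{E}[\cdot]$ denotes the expectation operator, $s_k(t)$ is the desired source, $\mathbf{a}_k$ is the $k$th column vector of the mixing matrix, $\alpha = \mathbb{E}[s_k(t) \mid s_k(t) > 0]$, and $\beta = \mathbb{E}[s_k(t) \mid s_k(t) < 0]$. Clearly, $\alpha > 0$ and $\beta < 0$ are constants under the stationarity assumption on the sources. Equation (2) follows directly from the assumptions on the sources: because the sources are mutually independent and zero-mean, $\mathbb{E}[s_n(t) \mid s_k(t) > 0] = \mathbb{E}[s_n(t)] = 0$ for $n \neq k$, so that

$$\mathbb{E}[\mathbf{x}(t) \mid s_k(t) > 0] = \mathbf{A}\,\mathbb{E}[\mathbf{s}(t) \mid s_k(t) > 0] = \alpha\,\mathbf{A}\,\mathbf{e}_k = \alpha\,\mathbf{a}_k, \tag{3}$$

where $\mathbf{e}_k$ is the unit vector whose $k$th entry is 1 and whose other entries are 0. The proof of the second equation in (2), for $s_k(t) < 0$, is similar to (3). We then estimate the desired source by the MMSE beamforming approach,

$$\hat{s}_k(t) = \mathbf{a}_k^\top \mathbf{R}_x^{-1} \mathbf{x}(t), \tag{4}$$

where $\hat{s}_k(t)$ is the estimate of $s_k(t)$, $\mathbf{R}_x = \mathbb{E}[\mathbf{x}(t)\mathbf{x}^\top(t)]$ is the covariance matrix of the mixtures, and the superscript $-1$ denotes matrix inversion. Equation (4) is the linear MMSE estimate of $s_k(t)$ because $\mathbb{E}[\mathbf{x}(t)s_k(t)] = \mathbf{a}_k$ for a unit-variance source. Thus, when $\alpha$, $\beta$, and the positive or negative support of the desired source are provided, the desired source can be estimated by (2) and (4). In fact, $\alpha$ and $\beta$ are easy to compute when the PDF of the desired source is known; their values for some normalized distributions are given in [6]. However, when the PDF of the desired source is unknown, $\alpha$ and $\beta$ cannot be obtained. In this case, (2) only yields the direction of the conditional-expectation vector, which coincides with the direction of $\mathbf{a}_k$ or $-\mathbf{a}_k$. Since the estimated $\mathbf{a}_k$ then has the correct direction but an unknown magnitude, the amplitude of $\hat{s}_k(t)$ obtained from (4) is ambiguous. Fortunately, this amplitude indeterminacy is acceptable in many applications. For simplicity, we set the unknown scale factor in (2) to unit magnitude, so that (2) gives the direction of $\mathbf{a}_k$. Unless stated otherwise, in the rest of this section, estimating $\mathbf{a}_k$ means estimating the direction of $\mathbf{a}_k$.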
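
The sketch below turns (2)–(4) into sample estimates: a conditional mean over a known support, and the MMSE beamformer computed by solving $\mathbf{R}_x \mathbf{w} = \mathbf{a}_k$ instead of forming the inverse explicitly. It is a minimal illustration under assumed data: the desired source is taken as $s_1(t) = E - 1$ with $E$ exponentially distributed, for which $\alpha = \mathbb{E}[s_1 \mid s_1 > 0] = 1$ by memorylessness, so the conditional mean should land close to $\mathbf{a}_1$ itself. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def conditional_mean(X: Matrix, support: Sequence[int]) -> List[float]:
    """Sample version of E[x(t) | t in support]; with support = {t : s_k(t) > 0}
    this approximates alpha * a_k in (2)."""
    if len(support) == 0:
        raise ValueError("support must be non-empty")
    return [sum(row[t] for t in support) / len(support) for row in X]


def covariance(X: Matrix) -> Matrix:
    """Sample R_x = (1/T) sum_t x(t) x(t)^T (mixtures assumed zero mean)."""
    T = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / T for xj in X] for xi in X]


def solve(R: Matrix, b: List[float]) -> List[float]:
    """Solve R w = b by Gaussian elimination with partial pivoting."""
    n = len(R)
    aug = [list(R[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def mmse_extract(X: Matrix, a_k: List[float], R: Matrix) -> List[float]:
    """Eq. (4): s_hat(t) = a_k^T R_x^{-1} x(t). R_x is symmetric, so the weight
    vector is w = R_x^{-1} a_k and s_hat(t) = w^T x(t)."""
    w = solve(R, a_k)
    T = len(X[0])
    return [sum(w[m] * X[m][t] for m in range(len(X))) for t in range(T)]


# Assumed illustration: two sources, 2 x 2 mixing matrix, true support known.
rng = random.Random(0)
T = 20000
s1 = [rng.expovariate(1.0) - 1.0 for _ in range(T)]
s2 = [rng.gauss(0.0, 1.0) for _ in range(T)]
A = [[1.0, 0.5], [-0.5, 2.0]]
X = [[A[m][0] * s1[t] + A[m][1] * s2[t] for t in range(T)] for m in range(2)]
support = [t for t in range(T) if s1[t] > 0]
a1_hat = conditional_mean(X, support)        # close to a_1 = [1.0, -0.5]
s1_hat = mmse_extract(X, a1_hat, covariance(X))
```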

In practical applications, complete information about the positive or negative support of the desired source is extremely hard to acquire. However, it is more feasible to obtain a subset of samples of the desired source in which the positive samples clearly outnumber the negative ones, or vice versa; such a subset is close to a purely positive or purely negative set. As in [6], we define the correct index classification ratio as $\rho = n^{+}/|\mathcal{S}|$, where $n^{+}$ is the number of positive samples in the subset $\mathcal{S}$ and $|\mathcal{S}|$ is the total number of samples in it. It was suggested in [6] that, when $\rho$ is close to 0 or 1, the subset can still be used to estimate the desired source, with performance only slightly worse than that obtained with complete information about the positive or negative support. Similarly, such a subset allows us to roughly estimate $\mathbf{a}_k$. Fortunately, for a signal with a unimodal continuous skewed distribution, the asymmetry of its PDF can be exploited to obtain a subset that is close to purely positive or negative.
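
The ratio $\rho$ is a simple count over the subset. The helpers below (hypothetical names) compute it together with the distance of the subset from being purely positive or negative, $\min(\rho, 1-\rho)$. Both need the true source, so they are only usable on synthetic data.

```python
from typing import Sequence


def classification_ratio(s_k: Sequence[float], subset: Sequence[int]) -> float:
    """rho = n_plus / |subset|: fraction of indices in `subset` at which s_k > 0."""
    if len(subset) == 0:
        raise ValueError("subset must be non-empty")
    n_plus = sum(1 for t in subset if s_k[t] > 0)
    return n_plus / len(subset)


def impurity(rho: float) -> float:
    """0 for a purely positive (rho = 1) or purely negative (rho = 0) subset,
    0.5 when positive and negative samples are balanced."""
    return min(rho, 1.0 - rho)
```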

Figure 1 shows the PDFs of unimodal continuous skewed distributions, where $\gamma$ denotes the skewness defined by

$$\gamma = \frac{\mathbb{E}[(y - \mu)^3]}{\sigma^3},$$

in which $\mu$ and $\sigma$ are the mean and standard deviation of the random variable $y$, respectively. The curves with $\gamma > 0$, $\gamma < 0$, and $\gamma = 0$ in Figure 1 represent a positively skewed distribution, a negatively skewed distribution, and a symmetric distribution, respectively. Since we consider zero-mean, unit-variance variables, the definition reduces to $\gamma = \mathbb{E}[y^3]$. When $y$ follows a positively skewed distribution ($\gamma > 0$), consider a finite set of samples $\mathcal{Y}$ drawn from the distribution of $y$. Since skewness reflects the asymmetry of a PDF, a larger absolute value of skewness means a more asymmetric PDF. It follows that most samples in $\mathcal{Y}$ are smaller than zero and that the proportion of negative samples increases as $\gamma$ grows. If some samples are drawn at random from $\mathcal{Y}$ to form a subset, the subset may be close to a purely negative set, and the larger $\gamma$ is, the more likely it is that all samples in the subset are negative. Similar results hold for $\gamma < 0$, with the roles of positive and negative samples exchanged. These observations make it possible to extract the source with the maximal absolute value of skewness based on the conditional expectation. Nevertheless, random selection is unreliable because it is probabilistic. Instead, we divide the samples into several intervals of equal length and test all of them to find the interval closest to purely positive or negative. The interval length must be chosen appropriately. If the intervals are too long, they are likely to contain both positive and negative values; if they are too short, they cannot reflect the skewness of the signal, since exhibiting skewness requires a certain number of samples. The proper length depends on the nature of the signal, mainly its PDF. In Section 4, we show that the proper interval length, and hence the number of intervals, can be determined by simulation.
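
The claim that a more positively skewed zero-mean PDF puts a larger share of its samples below zero can be checked numerically. The sketch assumes standardized Gamma variables, whose skewness is $2/\sqrt{k}$ for shape $k$; this family is chosen only for illustration and is not the one used in the paper.

```python
import random
from typing import Sequence


def sample_skewness(y: Sequence[float]) -> float:
    """gamma = E[(y - mu)^3] / sigma^3 with sample moments; for zero-mean,
    unit-variance data this is simply the sample mean of y^3."""
    T = len(y)
    mu = sum(y) / T
    m2 = sum((v - mu) ** 2 for v in y) / T
    m3 = sum((v - mu) ** 3 for v in y) / T
    return m3 / m2 ** 1.5


def negative_fraction(y: Sequence[float]) -> float:
    """Proportion of samples below zero."""
    return sum(1 for v in y if v < 0) / len(y)


# Standardized Gamma(shape, 1): smaller shape means a more positively skewed PDF.
rng = random.Random(0)
T = 20000
results = {}
for shape in (1.0, 4.0, 16.0):
    y = [(rng.gammavariate(shape, 1.0) - shape) / shape ** 0.5 for _ in range(T)]
    results[shape] = (sample_skewness(y), negative_fraction(y))
    print(f"shape={shape:4.1f}  skewness={results[shape][0]:.3f}  "
          f"fraction below zero={results[shape][1]:.3f}")
```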

Based on the above analysis, we divide the mixtures into $K$ intervals of equal length $L = \lfloor T/K \rfloor$, where $T$ is the number of samples and $\lfloor \cdot \rfloor$ denotes the floor function. The inaccessible desired source is divided into $K$ intervals accordingly. Assume that the $i$th interval of the desired source contains $n_i^{+}$ positive samples; the correct index classification ratio of the $i$th interval is then $\rho_i = n_i^{+}/L$. The objective is to find the optimal interval, in which the samples of the desired source are closest to purely positive or negative, so that $\mathbf{a}_k$ can be roughly estimated by (2) from the information in this interval. Denoting the optimal interval as the $i^\ast$th interval, its index can be expressed mathematically as

$$i^\ast = \arg\max_{1 \le i \le K} \left| \rho_i - \tfrac{1}{2} \right|. \tag{5}$$

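The interval partition and the oracle rule (5) can be sketched as below. Assumptions: the intervals are contiguous, indices are 0-based, and the $T - KL$ leftover samples are ignored. Since (5) needs the unknown source, this helper is useful only for checking the blind selection rule developed below on synthetic data; the function names are hypothetical.

```python
from typing import List, Sequence


def split_intervals(T: int, K: int) -> List[range]:
    """Index sets Omega_1..Omega_K of K contiguous intervals of length L = floor(T / K).
    Assumption: the last T - K*L samples are not assigned to any interval."""
    L = T // K
    if L == 0:
        raise ValueError("K must not exceed the number of samples T")
    return [range(i * L, (i + 1) * L) for i in range(K)]


def interval_ratios(s_k: Sequence[float], K: int) -> List[float]:
    """rho_i = n_i^+ / L for every interval of the desired source."""
    return [sum(1 for t in om if s_k[t] > 0) / len(om)
            for om in split_intervals(len(s_k), K)]


def oracle_best_interval(s_k: Sequence[float], K: int) -> int:
    """Eq. (5): i* = argmax_i |rho_i - 1/2| (0-based). Requires the true s_k."""
    rho = interval_ratios(s_k, K)
    return max(range(K), key=lambda i: abs(rho[i] - 0.5))
```
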
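We now propose a method to find the index $i^\ast$. Suppose that the $i$th interval of the desired source is purely positive or purely negative, namely, $\rho_i = 1$ or $\rho_i = 0$. According to (2), $\mathbf{a}_k$ is then estimated, up to a scale factor, by

$$\hat{\mathbf{a}}_k^{(i)} = \frac{1}{L} \sum_{t \in \Omega_i} \mathbf{x}(t), \tag{6}$$

where $\Omega_i$ is the set of time indices in the $i$th interval. In practice, however, $\rho_i$ is almost never exactly 0 or 1, so (6) yields an inaccurate estimate. Substituting (1), we rewrite the right-hand side of (6) as

$$\frac{1}{L} \sum_{t \in \Omega_i} \mathbf{x}(t) = \frac{1}{L}\,\mathbf{A} \sum_{t \in \Omega_i} \mathbf{s}(t). \tag{7}$$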

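According to (3), if $\sum_{t \in \Omega_i} \mathbf{s}(t)$ is a scalar multiple of $\mathbf{e}_k$, then $\hat{\mathbf{a}}_k^{(i)}$ is an accurate estimate of $\mathbf{a}_k$. Since the desired source has the maximal absolute value of skewness, its $i^\ast$th interval is closer to a purely positive or negative region, and the absolute value of its sample sum over that interval is much greater than that of the other sources over the same interval, namely,

$$\Big| \sum_{t \in \Omega_{i^\ast}} s_k(t) \Big| \gg \Big| \sum_{t \in \Omega_{i^\ast}} s_n(t) \Big|, \quad n \neq k. \tag{8}$$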

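A toy numerical check of (6)–(8): when the sources other than $s_k$ sum to zero over the interval, the interval mean of the mixtures is exactly parallel to $\mathbf{a}_k$; otherwise its direction is perturbed. The numbers and helper names below are made up for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def interval_estimate(X: Matrix, omega: Sequence[int]) -> List[float]:
    """Eq. (6): (1/L) * sum over t in Omega_i of x(t), with L = len(omega)."""
    return [sum(row[t] for t in omega) / len(omega) for row in X]


def abs_cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """|cos| of the angle between u and v (sign and scale are irrelevant here)."""
    dot = sum(p * q for p, q in zip(u, v))
    nu = sum(p * p for p in u) ** 0.5
    nv = sum(q * q for q in v) ** 0.5
    return abs(dot) / (nu * nv)


def mix(A: Matrix, S: Matrix) -> Matrix:
    """x(t) = A s(t) for every t, eq. (1)."""
    return [[sum(a * s[t] for a, s in zip(row, S)) for t in range(len(S[0]))] for row in A]


A = [[1.0, 2.0], [3.0, 4.0]]
a1 = [A[0][0], A[1][0]]                      # true first column a_1
# Over this 3-sample interval s_2 sums to 0, so sum_t s(t) = 4 e_1 exactly.
S_pure = [[1.0, 2.0, 1.0], [1.0, -1.0, 0.0]]
# Here s_2 sums to 3, comparable to the sum 4 of s_1, so (8) does not hold.
S_mixed = [[1.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
est_pure = interval_estimate(mix(A, S_pure), range(3))     # (4/3) * a_1
est_mixed = interval_estimate(mix(A, S_mixed), range(3))
```
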
Then $\sum_{t \in \Omega_{i^\ast}} \mathbf{s}(t) \approx c\,\mathbf{e}_k$, where $c$ is a scalar. Therefore, evaluating (6) over the indices in the $i^\ast$th interval yields an approximate estimate $\hat{\mathbf{a}}_k^{(i^\ast)}$ of $\mathbf{a}_k$, which is more accurate than the estimates $\hat{\mathbf{a}}_k^{(i)}$, $i \neq i^\ast$, obtained from the other intervals. Applying (4) with $\hat{\mathbf{a}}_k^{(i^\ast)}$ and $\hat{\mathbf{a}}_k^{(i)}$ gives the source estimates $\hat{s}_k^{(i^\ast)}(t)$ and $\hat{s}_k^{(i)}(t)$, respectively. Because $\hat{s}_k^{(i^\ast)}(t)$ is the more precise estimate, its absolute value of skewness is larger than that of $\hat{s}_k^{(i)}(t)$. Consequently, the optimal interval can be identified without knowing the source: $i^\ast$ is the index that maximizes $|\hat{\gamma}_i|$, where $\hat{\gamma}_i$ is the skewness of $\hat{s}_k^{(i)}(t)$, $i = 1, \ldots, K$.
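
The blind selection of $i^\ast$ can be sketched as follows: each interval estimate (6) is passed through the beamformer (4), and the interval whose output has the largest absolute sample skewness is kept. This is a minimal pure-Python sketch under assumed synthetic data (one exponential source among Gaussian ones); the linear solver, the 0-based indexing, and all names are our own choices.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def covariance(X: Matrix) -> Matrix:
    """Sample R_x = (1/T) sum_t x(t) x(t)^T (mixtures assumed zero mean)."""
    T = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / T for xj in X] for xi in X]


def solve(R: Matrix, b: List[float]) -> List[float]:
    """w = R^{-1} b by Gaussian elimination with partial pivoting."""
    n = len(R)
    aug = [list(R[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def beamform(X: Matrix, R: Matrix, a: List[float]) -> List[float]:
    """Eq. (4): s_hat(t) = w^T x(t) with w = R_x^{-1} a."""
    w = solve(R, a)
    return [sum(wm * xm for wm, xm in zip(w, col)) for col in zip(*X)]


def abs_skewness(y: List[float]) -> float:
    T = len(y)
    mu = sum(y) / T
    m2 = sum((v - mu) ** 2 for v in y) / T
    return abs(sum((v - mu) ** 3 for v in y) / T) / m2 ** 1.5


def select_initial_vector(X: Matrix, K: int) -> Tuple[int, List[float], List[float]]:
    """Return (i*, a_hat^(i*), scores), where scores[i] is the |skewness| of the
    beamformer output built from the ith interval estimate."""
    T = len(X[0])
    L = T // K
    R = covariance(X)
    candidates, scores = [], []
    for i in range(K):
        a_i = [sum(row[i * L:(i + 1) * L]) / L for row in X]    # eq. (6)
        candidates.append(a_i)
        scores.append(abs_skewness(beamform(X, R, a_i)))      # |skewness| of (4)
    i_star = max(range(K), key=lambda i: scores[i])
    return i_star, candidates[i_star], scores


rng = random.Random(1)
T, N = 1000, 3
S = [[rng.expovariate(1.0) - 1.0 for _ in range(T)]]
S += [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(N - 1)]
A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
X = [[sum(A[m][n] * S[n][t] for n in range(N)) for t in range(T)] for m in range(N)]
i_star, a0, scores = select_initial_vector(X, K=20)
```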

With the above method, $\mathbf{a}_k$ can be roughly estimated as $\hat{\mathbf{a}}_k^{(i^\ast)}$ from the information in the $i^\ast$th interval. To improve the estimation accuracy, we iterate (4) and (2), starting from the initial value $\hat{\mathbf{a}}_k^{(i^\ast)}$, until convergence. The proposed algorithm is outlined as follows.
(1) Calculate $\mathbf{R}_x$ and its inverse.
(2) Divide the mixtures into $K$ intervals, compute $\hat{\mathbf{a}}_k^{(i)}$ by (6) for $i = 1, \ldots, K$, and then obtain $\hat{s}_k^{(i)}(t)$ by (4) and the corresponding $|\hat{\gamma}_i|$.
(3) Select the index $i^\ast$ that maximizes $|\hat{\gamma}_i|$ and take $\hat{\mathbf{a}}_k^{(i^\ast)}$ from (6) as the initial value.
(4) Compute $\hat{s}_k(t)$ by (4) and determine the positive or negative support of $\hat{s}_k(t)$.
(5) Update the estimate of $\mathbf{a}_k$ by (2), using the positive or negative support obtained in Step (4).
(6) Repeat Steps (4) and (5) until convergence.
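
Steps (1)–(6) can be combined into a compact sketch. It is one possible reading of the outline, not the authors' code: it normalizes $\hat{\mathbf{a}}_k$ to unit length at every iteration (only its direction matters), uses the positive support in Step (4), and stops when successive directions agree to within a tolerance. The solver, tolerance, iteration cap, and synthetic data are assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def covariance(X: Matrix) -> Matrix:
    """Step (1): sample R_x = (1/T) sum_t x(t) x(t)^T (mixtures assumed zero mean)."""
    T = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / T for xj in X] for xi in X]


def solve(R: Matrix, b: List[float]) -> List[float]:
    """w = R^{-1} b by Gaussian elimination with partial pivoting."""
    n = len(R)
    aug = [list(R[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def beamform(X: Matrix, R: Matrix, a: List[float]) -> List[float]:
    """Eq. (4): s_hat(t) = w^T x(t) with w = R_x^{-1} a."""
    w = solve(R, a)
    return [sum(wm * xm for wm, xm in zip(w, col)) for col in zip(*X)]


def abs_skewness(y: List[float]) -> float:
    T = len(y)
    mu = sum(y) / T
    m2 = sum((v - mu) ** 2 for v in y) / T
    return abs(sum((v - mu) ** 3 for v in y) / T) / m2 ** 1.5


def unit(v: List[float]) -> List[float]:
    norm = sum(c * c for c in v) ** 0.5
    return [c / norm for c in v]


def extract_most_skewed(X: Matrix, K: int, max_iter: int = 50,
                        tol: float = 1e-8) -> Tuple[List[float], List[float], int]:
    T = len(X[0])
    L = T // K
    R = covariance(X)                                               # step (1)
    # Steps (2)-(3): one candidate per interval from (6); keep the max |skewness|.
    candidates = [[sum(row[i * L:(i + 1) * L]) / L for row in X] for i in range(K)]
    a = unit(max(candidates, key=lambda c: abs_skewness(beamform(X, R, c))))
    s_hat = beamform(X, R, a)
    it = 0
    for it in range(1, max_iter + 1):
        # Step (4): positive support of the current estimate (assumed choice).
        support = [t for t in range(T) if s_hat[t] > 0]
        if not support:
            break
        # Step (5): conditional-mean update of a_k from (2); only the direction is kept.
        a_new = unit([sum(row[t] for t in support) / len(support) for row in X])
        s_hat = beamform(X, R, a_new)
        converged = abs(sum(p * q for p, q in zip(a, a_new))) > 1.0 - tol
        a = a_new
        if converged:                                               # step (6)
            break
    return s_hat, a, it


# Synthetic example (assumed): one skewed source (Exp(1) - 1) among Gaussian ones.
rng = random.Random(0)
T, N = 2000, 4
S = [[rng.expovariate(1.0) - 1.0 for _ in range(T)]]
S += [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(N - 1)]
A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
X = [[sum(A[m][n] * S[n][t] for n in range(N)) for t in range(T)] for m in range(N)]
s_hat, a_hat, n_iter = extract_most_skewed(X, K=20)
```

With zero-mean mixtures, conditioning on the negative support instead gives the same direction up to sign, because $\mathbb{E}[\mathbf{x}(t)] = \mathbf{0}$ forces the two conditional means to point in opposite directions. As noted in the text, the sign and amplitude of the returned estimate remain ambiguous.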

4. Simulation Results

We consider five skewed source signals and fifteen symmetrically distributed source signals ($N = 20$). The five skewed source signals are generated as follows. First, we generate five skewed signals from noncentral $t$-distributions [13] whose degrees of freedom and noncentrality parameters are (10, 2), (10, 1), (12, −1), (15, −2), and (20, 1), respectively. Then, we center and standardize these signals so that they have zero mean and unit variance. The skewness values of the five skewed source signals are 0.6701, 0.4880, −0.5532, −0.3527, and 0.1486, respectively. The symmetrically distributed source signals are drawn from the standard normal distribution. The mixing matrix is generated randomly, with entries drawn from the standard normal distribution. We set the number of samples to $T = 600$ and the rate to 300. We aim to extract the skewed source signal with the maximal absolute value of skewness from the mixtures, and we compare the proposed algorithm with FastICA [5] and the algorithm in [6].
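
The source generation described above can be reproduced roughly as follows. A noncentral $t$ variable with $\nu$ degrees of freedom and noncentrality $\mu$ is $(Z + \mu)/\sqrt{V/\nu}$, with $Z$ standard normal and $V$ chi-square with $\nu$ degrees of freedom, drawn here as Gamma($\nu/2$, scale 2). The seeds, helper names, and use of Python's `random` module are assumptions, and the sample skewness values will differ from those reported above.

```python
import math
import random
from typing import List, Tuple

# (degrees of freedom, noncentrality) of the five skewed sources in the text
PARAMS: List[Tuple[float, float]] = [(10, 2), (10, 1), (12, -1), (15, -2), (20, 1)]


def noncentral_t(rng: random.Random, nu: float, mu: float) -> float:
    """(Z + mu) / sqrt(V / nu), Z ~ N(0, 1), V ~ chi-square(nu) = Gamma(nu/2, scale 2)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return (z + mu) / math.sqrt(v / nu)


def standardize(s: List[float]) -> List[float]:
    """Center and scale to zero mean and unit variance."""
    T = len(s)
    mu = sum(s) / T
    sd = (sum((v - mu) ** 2 for v in s) / T) ** 0.5
    return [(v - mu) / sd for v in s]


def skewness(y: List[float]) -> float:
    T = len(y)
    mu = sum(y) / T
    m2 = sum((v - mu) ** 2 for v in y) / T
    return sum((v - mu) ** 3 for v in y) / T / m2 ** 1.5


def make_sources(T: int, n_symmetric: int = 15, seed: int = 0) -> List[List[float]]:
    """Five standardized noncentral-t sources followed by n_symmetric Gaussian ones."""
    rng = random.Random(seed)
    skewed = [standardize([noncentral_t(rng, nu, mu) for _ in range(T)])
              for nu, mu in PARAMS]
    symmetric = [standardize([rng.gauss(0.0, 1.0) for _ in range(T)])
                 for _ in range(n_symmetric)]
    return skewed + symmetric


T, N, M = 600, 20, 20            # M = N: determined case; choose M < N for underdetermined
S = make_sources(T)
rng_a = random.Random(1)
A = [[rng_a.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
X = [[sum(A[m][n] * S[n][t] for n in range(N)) for t in range(T)] for m in range(M)]
k = max(range(N), key=lambda n: abs(skewness(S[n])))   # index of the desired source
```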

Hereafter, we refer to the algorithm in [6] as the first-order algorithm (FOA).

Figure 2 shows the average interference-to-signal ratio (ISR), defined in [6], versus the number of intervals $K$, obtained by the proposed algorithm over 200 Monte Carlo runs for different numbers of sensors $M$. Figure 2 shows that the extraction performance of the proposed algorithm degrades when $K$ is too small or too large. Since, as discussed above, the intervals should be neither too short nor too long, $K$ must be set properly. Based on Figure 2, we empirically choose $K$ in the range (8, 33) and keep it fixed in the following experiments. Figure 3 shows the average ISR versus the iteration number over 200 Monte Carlo runs for FastICA, for FOA with two values of $\rho$ when $M = N$, and for the proposed algorithm with different numbers of sensors $M$, where $\rho$ is the correct index classification ratio defined in [6]. As depicted in Figure 3, the proposed algorithm extracts the desired source successfully after a few iterations in both the determined and underdetermined cases. In the determined case, the proposed algorithm achieves better extraction performance and faster convergence than FastICA. Moreover, the proposed algorithm converges faster than FOA with one of the two tested values of $\rho$, and the proposed algorithm and FOA with either value of $\rho$ reach the same steady-state performance. Although FOA with the other value of $\rho$ converges faster than the proposed algorithm, it is hard in practice for FOA to obtain prior knowledge satisfying that value. Figures 2 and 3 also show that the extraction performance of the proposed algorithm deteriorates as the number of sensors $M$ decreases. However, the proposed algorithm still performs well in the underdetermined case when $M$ is slightly less than $N$.
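
Since the ISR is only referenced to [6], the sketch below uses an assumed, commonly used definition for single-source extraction with unit-variance sources: with the global vector $\mathbf{g} = \mathbf{A}^\top\mathbf{w}$, where $\hat{s}(t) = \mathbf{w}^\top \mathbf{x}(t)$, the ISR is $\sum_{n \ne k} g_n^2 / g_k^2$. Averaging over Monte Carlo runs in the linear domain before converting to dB is also an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def isr(A: Matrix, w: Sequence[float], k: int) -> float:
    """Assumed ISR for extracting source k with s_hat(t) = w^T x(t).

    The global vector g = A^T w gives s_hat(t) = sum_n g_n s_n(t); the ISR is the
    interference power over the desired-source power, sum_{n != k} g_n^2 / g_k^2.
    """
    g = [sum(A[m][n] * w[m] for m in range(len(A))) for n in range(len(A[0]))]
    return sum(gn * gn for n, gn in enumerate(g) if n != k) / g[k] ** 2


def average_isr_db(runs: Sequence[float]) -> float:
    """Mean ISR over Monte Carlo runs, converted to dB (averaged in linear scale)."""
    return 10.0 * math.log10(sum(runs) / len(runs))


# Toy check: a 10% amplitude leak of the second source gives ISR = 0.01 (about -20 dB).
print(average_isr_db([isr([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.1], k=0)]))
```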

Additionally, Table 1 compares the average computational cost over 200 Monte Carlo runs of FastICA, FOA in the determined case ($M = N$), and the proposed algorithm for different numbers of sensors $M$. Table 1 shows that the proposed algorithm is computationally cheaper than FastICA and that its cost decreases slightly as $M$ decreases. Table 1 also shows that FOA has a lower computational cost than the proposed algorithm, because FOA relies only on first-order statistics. However, FOA requires prior knowledge of the desired source that is difficult to obtain, and it cannot be applied in the underdetermined case. Therefore, the proposed algorithm is more practical than FOA.

5. Conclusion

In this paper, we proposed a cost-effective algorithm based on the conditional expectation for extracting the skewed source signal with the maximal absolute value of skewness. Simulation results demonstrate the superiority of the proposed algorithm and show that it performs well even in the underdetermined case when the number of sensors is close to the number of sources. Future work should extend this study to more complicated and practical mixing models, such as convolutive and nonlinear mixtures.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant nos. 61172061 and 61201242 and the National Natural Science Foundation of Jiangsu Province in China under Grant no. BK2012057.