Abstract

The main purpose of this paper is to establish the equivalence of two classical algorithms of independent component analysis (ICA) based on the information bottleneck (IB). From the viewpoint of information theory, we explain both algorithms in terms of the information bottleneck. Furthermore, numerical experiments with synthetic data, audio data, and images show that ICA, built on information theory, is an instructive way to solve blind source separation (BSS). Finally, two realistic numerical experiments are conducted with FastICA in order to illustrate the efficiency and practicality of the algorithm, as well as its drawbacks when recovering images from their mixtures.

1. Introduction

Information theory was founded by Claude Elwood Shannon (1948) in his famous paper, “A Mathematical Theory of Communication,” where he defined information and information entropy on the basis of probability theory, building a bridge between information theory and numerical mathematics. Some basic concepts of information theory (entropy, negentropy, mutual information, and so on) have been used successfully to characterize independent components (ICs) and to deal with applications of blind source separation (BSS). In the past decades, information theory has been applied successfully in many fields such as clustering [1], medical examination [2], independent component analysis [3], feature learning [4], and telecommunication [5–8]. The purpose of this paper is to use the information bottleneck to show that the maximum of the mutual information (MI) between the mixed data and the recovered data is no more than the MI between the recovered data and the original sources.

The rest of the paper is organized as follows. In Section 2, we review information theory and introduce some important formulas. In Section 3, based on entropy, mutual information (MI), and negentropy, the information bottleneck is used to show the equivalence of two classical algorithms, infomax [3] and FastICA [9]. Finally, through a series of experiments on synthetic data, audio data, and images in Section 4, we compare the accuracy and complexity of the two algorithms. However, the ambiguity in the sign and scale of the recovery matrix can invert the colors of the recovered image.

2. Information Theory

According to the explanation of communication theory by Warren Weaver, “information” is not related to what you do say but to what you could say. That is, information is a measure of one’s freedom of choice when one selects a message [6, 7, 10].

At first, attention focused on the “meaningful” or “relevant” information, which is crucial for solving the problem of transmitting information. Later, some scholars argued that lossy source compression provides a natural quantitative approach to “relevant information” [11, 12].

So the information bottleneck, which seeks a tradeoff between compressing the representation and preserving meaningful information, can be decomposed into the following aspects:
(1) how to define the “meaningful” or “relevant” information;
(2) how to extract an efficient representation of the relevant information in order to transmit it quickly;
(3) how to recover the information as exactly and comprehensively as possible based only on this efficient representation.

People regard the possible outcomes of an uncertain or fuzzy event as surprise or information [13]: the smaller the probability of an outcome, the greater the surprise and the more information is obtained. So entropy, a measure of the degree of disorder, is defined to quantify the uncertainty of information. Assume that $X$ is a discrete random variable with probability density function (pdf) $p(x)$; then the entropy is defined as
$$H(X) = -\sum_{x} p(x)\log p(x).$$
Moreover, it is easy to generalize this definition to several random variables, yielding the joint entropy. On the other hand, mutual information (MI), a measure of the dependency between two random variables, can be regarded as the reduction of uncertainty about one random variable given the other. Consider two random variables $X$ and $Y$ with joint pdf $p(x,y)$ and marginal pdfs $p(x)$ and $p(y)$, respectively. The MI can be written as follows:
$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}.$$

It is easy to prove the following identities relating MI and information entropy:
$$I(X;Y) = H(X) - H(X\mid Y) = H(Y) - H(Y\mid X) = H(X) + H(Y) - H(X,Y).$$

According to the last two expressions, we can see the relationship between MI and entropy: $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent.
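These quantities are easy to compute numerically. The following Python sketch (not part of the original paper; the joint distribution is an arbitrary toy example) evaluates the entropy and the MI of a discrete joint probability table using the identity $I(X;Y)=H(X)+H(Y)-H(X,Y)$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability table p_xy."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)            # marginal of X (rows)
    p_y = p_xy.sum(axis=0)            # marginal of Y (columns)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

# Example: a noisy binary channel; X and Y are dependent, so I(X;Y) > 0.
p_joint = np.array([[0.4, 0.1],
                    [0.1, 0.4]])
print("H(X)   =", entropy(p_joint.sum(axis=1)))   # 1.0 bit
print("I(X;Y) =", mutual_information(p_joint))    # about 0.278 bits
```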

3. The Equivalence of the Two ICA Algorithms Based on the IBN

The information bottleneck (IBN) [14] seeks a compressed representation $T$ of the original information $X$ that remains a good representation with respect to a relevance variable $Y$ delivered to the recipient, by solving a problem of the following type:
$$\min_{p(t\mid x)} \; \mathcal{L} = I(X;T) - \beta\, I(T;Y),$$
where $\beta > 0$ trades off compression against preserved relevant information.

Now, in the terminology of information theory and optimization, there are two competing objectives: on the one hand, we want to minimize the MI between the original information $X$ and the compressed information $T$; on the other hand, we want to capture the maximum of the mutual information between $T$ and the relevance variable $Y$. Obviously, the amount of information about $Y$ contained in $T$ is given by $I(T;Y)$, while the mutual information between the independent sources and the mixed signals is fixed but unknown under the preconditions of ICA.

ICA seeks recovered sources $Y$ that equal the original independent sources, ignoring the ambiguity of sign (direction) and scale. Furthermore, the independent sources are the most concise representation, while any linear transformation of the independent sources carries redundant information:
$$I(Y_1,\dots,Y_n) = \sum_{i=1}^{n} H(Y_i) - H(Y) \ge 0,$$
with equality if and only if the components $Y_1,\dots,Y_n$ are the independent sources. That is to say, we need to find the recovery matrix $W$, with $Y = WX$, in order to obtain the independent sources. Because the independent sources and the mixing matrix are both unknown, the optimization problem of ICA is written via the IBN [14] as follows:
$$\max_{W}\; I(X;Y), \qquad Y = WX,$$
where $I(S;X)$ is a theoretic maximum and $I(X;Y)$ is an approximate maximum.
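The sign and scale ambiguity can be seen directly in a small simulation. In the Python sketch below, the sources, the mixing matrix, and the rescaling matrix are arbitrary illustrative choices, not data from the paper: the exact inverse of the mixing matrix recovers the sources, and so does any scaled permutation of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian sources (uniform and Laplacian), n samples each.
n = 10_000
S = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])

A = np.array([[1.0, 0.5],          # a hypothetical mixing matrix (unknown in practice)
              [0.3, 2.0]])
X = A @ S                          # observed mixtures

# The ideal recovery matrix W = A^{-1} gives Y = W X = S exactly, so W A = I.
W = np.linalg.inv(A)
print(np.round(W @ A, 6))          # identity matrix

# Any scaled permutation of W is an equally valid recovery matrix.
P = np.array([[0.0, -2.0],         # swap the components and flip/scale one of them
              [0.5,  0.0]])
W2 = P @ W
Y = W2 @ X                         # rows are still the sources, merely swapped and rescaled
print(np.round(W2 @ A, 6))         # a scaled permutation matrix, not the identity
```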

3.1. Infomax Method

The infomax method [3] tackles the problem of separating the mixed signals $X$ by looking for a weight matrix $W$ without knowing either the mixing matrix $A$ or the original signals $S$. We illustrate BSS as follows. According to the optimization problem of ICA (16), we can rewrite it as
$$\max_{W}\; I(X;Y), \qquad Y = WX.$$
This can also be regarded as
$$I(X;Y) = H(Y) - H(Y\mid X).$$
Differentiating with respect to a parameter $w$ involved in the mapping from $X$ to $Y$, and noting that $H(Y\mid X)$ does not depend on $w$, gives
$$\frac{\partial}{\partial w} I(X;Y) = \frac{\partial}{\partial w} H(Y).$$
Therefore, the MI between the mixtures and the recoveries can be maximized by maximizing the entropy of the recoveries alone. The gradient method is then used to obtain the learning rule [3].
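As a concrete illustration of this idea, the sketch below implements a natural-gradient variant of the infomax update, $W \leftarrow W + \eta\,(I - g(Y)Y^{T})W$ with $g=\tanh$. The natural-gradient form and the tanh score function are assumptions suited to super-Gaussian sources; they are not the exact sigmoid-based rule of [3].

```python
import numpy as np

def infomax_ica(X, lr=0.01, n_iter=2000, rng=None):
    """Natural-gradient infomax-style update: W <- W + lr * (I - g(Y) Y^T / n) W,
    with g = tanh (a score function suited to super-Gaussian sources).
    X has shape (n_components, n_samples) and is assumed to be centered."""
    rng = np.random.default_rng(rng)
    d, n = X.shape
    W = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # small random start near I
    I = np.eye(d)
    for _ in range(n_iter):
        Y = W @ X
        grad = (I - np.tanh(Y) @ Y.T / n) @ W            # natural-gradient direction
        W = W + lr * grad
    return W   # Y = W @ X approximates the sources up to permutation and scale
```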

Considering the slow convergence and limited accuracy of this approach, the authors of [9] proposed FastICA, based on maximizing the negentropy, which is regarded as a measure of the independence of the signals.

3.2. FastICA Method

According to the infomax method and the IBN, BSS is equivalent to the following optimization problem:
$$\max_{W}\; I(X;Y) = \max_{W}\; H(Y), \qquad Y = WX.$$
The problem can then be adapted as
$$\min_{W}\; I(Y_1,\dots,Y_n) = \min_{W}\; \sum_{i=1}^{n} H(Y_i) - H(Y),$$
where $Y_1,\dots,Y_n$ are the ICs. How can we identify and measure the independence of the recovered data? The equivalence between the non-Gaussianity of the random variables and negentropy is illustrated by the Central Limit Theorem [9].

Theorem 1 (Central-Limit Theorem [15]). Given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed.

According to the Central Limit Theorem, the non-Gaussianity of the recovered data reaches its maximum if and only if the recovered data $Y$ is a permutation of the original independent sources $S$. Consider
$$Y = WX = WAS,$$
and normalize (whiten) the mixed data so that
$$Z = VX, \qquad E[ZZ^{T}] = I,$$
where $I$ is the identity matrix. So, (10) is rewritten as
$$\max_{W}\; \sum_{i=1}^{n} J(Y_i), \qquad Y = WZ,\; WW^{T} = I,$$
where $J(Y_i) = H(Y_i^{\mathrm{gauss}}) - H(Y_i)$ denotes the negentropy of $Y_i$ (the entropy gap between $Y_i$ and a Gaussian variable with the same variance), which is equivalent to (7) and (10). Thus we obtain the equivalence of the two classical algorithms from the viewpoint of the IBN.
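Whitening is easy to implement explicitly. The sketch below (an illustration, not the paper's code) centers the mixtures and uses the eigenvalue decomposition of the covariance matrix so that the returned signals have approximately identity covariance.

```python
import numpy as np

def whiten(X):
    """Center X (shape: n_components x n_samples) and whiten it so that the
    covariance of the returned signals Z is approximately the identity matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # whitening matrix
    Z = V @ Xc                                           # E[Z Z^T] is close to I
    return Z, V
```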

An approximation of the negentropy and a fixed-point iteration are then applied to derive the learning rule [9].
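For completeness, the following sketch shows the one-unit fixed-point iteration of FastICA on whitened data, $w \leftarrow E[Zg(w^{T}Z)] - E[g'(w^{T}Z)]\,w$, assuming the log-cosh contrast so that $g=\tanh$. A full implementation would additionally deflate or symmetrically decorrelate to extract several components.

```python
import numpy as np

def fastica_one_unit(Z, max_iter=200, tol=1e-6, rng=None):
    """One-unit FastICA fixed-point iteration on whitened data Z
    (shape: n_components x n_samples), using the log-cosh contrast,
    i.e. g(u) = tanh(u) and g'(u) = 1 - tanh(u)**2."""
    rng = np.random.default_rng(rng)
    d, _ = Z.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ Z
        g = np.tanh(u)
        g_prime = 1.0 - g ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w   # fixed-point update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:                 # converged (up to sign)
            return w_new
        w = w_new
    return w   # w @ Z is one estimated independent component
```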

4. Experiments

Based on the infomax learning rule, the first experiments use the synthetic data plotted in Figure 1(a) as the original data. The result obtained by infomax is shown in Figure 1(b), corresponding to the recovery matrix in (17). The product $WA$ of the recovery matrix and the mixing matrix should be a permutation of an approximately diagonal matrix, and indeed only one substantial entry (boxed) exists in each row and column.

In order to illustrate the efficiency of the FastICA algorithm and its limitation to at most one Gaussian variable, we list some numerical results on blindly mixed signals in Figures 2, 3, and 4, using a nonquadratic function $G$ to approximate the negentropy.
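The paper does not reproduce the particular nonquadratic function here; a common default in FastICA [9] is $G(u) = \frac{1}{a}\log\cosh(au)$ with $1 \le a \le 2$ (scikit-learn's fun='logcosh'). Under that assumption, the negentropy of a zero-mean, unit-variance signal can be approximated, up to a positive constant, as follows.

```python
import numpy as np

def G_logcosh(u, a=1.0):
    """Nonquadratic contrast G(u) = (1/a) * log(cosh(a * u)), with 1 <= a <= 2."""
    return np.log(np.cosh(a * u)) / a

def negentropy_approx(y, n_ref=100_000, rng=None):
    """J(y) is proportional to (E[G(y)] - E[G(nu)])**2, where nu is a standard
    Gaussian reference variable and y is assumed zero-mean with unit variance."""
    rng = np.random.default_rng(rng)
    nu = rng.standard_normal(n_ref)
    return (G_logcosh(y).mean() - G_logcosh(nu).mean()) ** 2
```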

Figure 2 clearly demonstrates the efficiency of the algorithm, which successfully separates the randomly mixed sinusoid, rectangular curve, and sawtooth curve. Moreover, (18) reveals that the matrix $W$ is an elementary transformation of an approximate inverse of the mixing matrix $A$.
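An experiment in the spirit of Figure 2 can be reproduced with an off-the-shelf FastICA implementation. The signals, mixing matrix, and parameters below (including whiten="unit-variance", which requires a recent scikit-learn) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA

# Synthetic sources resembling the Figure 2 setup: sinusoid, square wave, sawtooth.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t)), signal.sawtooth(2 * np.pi * t)]
S += 0.05 * np.random.default_rng(0).standard_normal(S.shape)   # a little noise
S /= S.std(axis=0)

A = np.array([[1.0, 1.0, 1.0],     # an arbitrary mixing matrix
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])
X = S @ A.T                        # observed mixtures, shape (n_samples, n_signals)

ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
Y = ica.fit_transform(X)           # recovered sources (up to permutation and scale)
W = ica.components_                # estimated unmixing operator
print(np.round(W @ A, 2))          # close to a scaled permutation matrix
```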

Next, it is meaningful to add a Gaussian variable to the original data to test the efficiency of the algorithm; the result is shown in Figure 3 and the corresponding product matrix is given in (19). Finally, when two Gaussian signals are included in the mixed data, the algorithm cannot separate them, as shown in Figure 4.

Furthermore, the average numbers of iterations of the FastICA algorithm for the first three numerical experiments are shown in Table 1.

After the experiments on the synthetic data, the algorithm also proves efficient on the real audio data in Figure 5 and the image data in Figure 6. In the process of separating the image data, the picture in Figure 7 is obtained with inverted colors, because the recovery matrix $W$, which alters the picture to the opposite color, is not the exact inverse of the mixing matrix $A$.
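The color inversion is a direct consequence of the sign and scale ambiguity: a recovered image component may come back negated and arbitrarily scaled. A hypothetical post-processing step such as the one below (not from the paper) rescales the component to [0, 1] and flips it back when needed.

```python
import numpy as np

def to_display(component, invert=False):
    """Rescale a recovered image component (a hypothetical 2-D array) to [0, 1].
    ICA recovers sources only up to sign and scale, so the component may appear
    with inverted intensities; pass invert=True to flip it back for display."""
    img = component - component.min()
    img = img / (img.max() + 1e-12)
    return 1.0 - img if invert else img
```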

5. Conclusion

The algorithm of independent component analysis originates from BSS and is a very successful application of information theory to speech recognition and image separation without knowledge of the linear transformation. However, there are also some disadvantages; for example, the method relies on the strong preconditions that the original sources are independent and the transformation is linear.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This investigation was supported by National Basic Research Program of China (973 Program) under Grant no. 2013CB329404, the Major Research Project of the National Natural Science Foundation of China under Grant no. 912300101, the National Natural Science Foundation of China under Grant no. 61075006, the Key Project of the National Natural Science Foundation of China under Grant no. 111311006, the Scientific Research Program Funded by Shaanxi Provincial Education Department (Program no. 2013JK1139), the China Postdoctoral Science Foundation (no. 2013M542370), and the Specialized Research Fund for the Doctoral Program of Higher Education of the People’s Republic of China (Grant no. 20136118120010).