Abstract

The main purpose of this paper is to establish the equivalence of two classical algorithms of independent component analysis (ICA) based on the information bottleneck (IB). From the viewpoint of information theory, we explain both algorithms in terms of the information bottleneck. Furthermore, numerical experiments with synthetic data, audio data, and images show that ICA, built on information theory, is an instructive way to solve blind source separation (BSS). Finally, two realistic numerical experiments are conducted with FastICA in order to illustrate the efficiency and practicality of the algorithm, as well as its drawbacks when recovering images from their mixtures.

1. Introduction

Information theory was founded by Claude Elwood Shannon (1948) in his famous paper, “A Mathematical Theory of Communication,” where he defined information and information entropy on the basis of probability theory, building a bridge between information theory and numerical mathematics. Some basic concepts of information theory (entropy, negentropy, mutual information, and so on) have been used successfully to characterize independent components (ICs) and to deal with applications of blind source separation (BSS). In the past decades, information theory has been applied successfully in many fields such as clustering [1], medical examination [2], independent component analysis [3], feature learning [4], and telecommunication [5–8]. The purpose of this paper is to use the information bottleneck to show that the maximum of the mutual information (MI) between the mixed data and the recovered data is no more than the MI between the recovered data and the original sources.

The rest of the paper is organized as follows. In Section 2, we review information theory and introduce some important formulas. In Section 3, based on entropy, mutual information (MI), and negentropy, the information bottleneck is used to show the equivalence of two classical algorithms, infomax [3] and FastICA [9]. Finally, through a series of experiments on synthetic data, audio data, and images in Section 4, we compare the accuracy and complexity of the two algorithms. However, the ambiguity in the sign and scale of the recovery matrix can invert the colors of the recovered image.

2. Information Theory

According to the explanation of communication theory by Warren Weaver, “information” is not related to what you do say but to what you could say. That is, information is a measure of one’s freedom of choice when one selects a message [6, 7, 10].

At first, attention focused on the “meaningful” or “relevant” information, which is crucial for solving the problem of transmitting information. Later, some scholars argued that lossy source compression provides a natural quantitative approach to “relevant information” [11, 12].

So the information bottleneck, which seeks a tradeoff between compressing the representation and preserving meaningful information, can be decomposed into the following aspects:
(1) how to define the “meaningful” or “relevant” information;
(2) how to extract an efficient representation of the relevant information in order to transmit it quickly;
(3) how to recover the information as exactly and comprehensively as possible based only on this efficient representation.

People regard the possible outcomes of an uncertain or fuzzy event as surprise or information [13]: the smaller the probability of an outcome, the greater the surprise and the more information is obtained. So entropy, a measure of the degree of disorder, is defined to quantify the uncertainty of information. Assume that $X$ is a discrete random variable with probability density function (pdf) $p(x)$; then the entropy is defined as
$$H(X) = -\sum_{x} p(x)\log p(x).$$
Moreover, it is easy to generalize this definition to several random variables, yielding the joint entropy. On the other hand, mutual information (MI), a measure of the dependency between two random variables, can be regarded as the reduction of uncertainty about one random variable given the other. Consider two random variables $X$ and $Y$ with joint pdf $p(x,y)$ and marginal pdfs $p(x)$ and $p(y)$, respectively. The MI can be written as follows:
$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}.$$

It is easy to prove the following identities relating MI and information entropy:
$$I(X;Y) = H(X) - H(X\mid Y) = H(Y) - H(Y\mid X) = H(X) + H(Y) - H(X,Y).$$

According to the last two expressions, we can see the relationship between MI and entropy: $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent.
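These quantities are easy to compute numerically. The following Python sketch (not part of the original paper; the joint distribution is an arbitrary toy example) evaluates the entropy and the MI of a discrete joint probability table using the identity $I(X;Y)=H(X)+H(Y)-H(X,Y)$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability table p_xy."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)            # marginal of X (rows)
    p_y = p_xy.sum(axis=0)            # marginal of Y (columns)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

# Example: a noisy binary channel; X and Y are dependent, so I(X;Y) > 0.
p_joint = np.array([[0.4, 0.1],
                    [0.1, 0.4]])
print("H(X)   =", entropy(p_joint.sum(axis=1)))   # 1.0 bit
print("I(X;Y) =", mutual_information(p_joint))    # about 0.278 bits
```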

3. The Equivalence of the Two ICA Algorithms Based on the IBN

The information bottleneck (IBN) [14] seeks a compressed representation $T$ of the original information $X$ that remains a good representation with respect to a relevance variable $Y$ delivered to the recipient, by solving a problem of the following type:
$$\min_{p(t\mid x)} \; \mathcal{L} = I(X;T) - \beta\, I(T;Y),$$
where $\beta > 0$ trades off compression against preserved relevant information.

Now, in the terminology of information theory and optimization, there are two competing objectives: on the one hand, we want to minimize the MI between the original information $X$ and the compressed information $T$; on the other hand, we want to capture the maximum of the mutual information between $T$ and the relevance variable $Y$. Obviously, the amount of information about $Y$ contained in $T$ is given by $I(T;Y)$, while the mutual information between the independent sources and the mixed signals is fixed but unknown under the preconditions of ICA.

ICA seeks recovered sources $Y$ that equal the original independent sources, ignoring the ambiguity of sign (direction) and scale. Furthermore, the independent sources are the most concise representation, while any linear transformation of the independent sources carries redundant information:
$$I(Y_1,\dots,Y_n) = \sum_{i=1}^{n} H(Y_i) - H(Y) \ge 0,$$
with equality if and only if the components $Y_1,\dots,Y_n$ are the independent sources. That is to say, we need to find the recovery matrix $W$, with $Y = WX$, in order to obtain the independent sources. Because the independent sources and the mixing matrix are both unknown, the optimization problem of ICA is written via the IBN [14] as follows:
$$\max_{W}\; I(X;Y), \qquad Y = WX,$$
where $I(S;X)$ is a theoretic maximum and $I(X;Y)$ is an approximate maximum.
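The sign and scale ambiguity can be seen directly in a small simulation. In the Python sketch below, the sources, the mixing matrix, and the rescaling matrix are arbitrary illustrative choices, not data from the paper: the exact inverse of the mixing matrix recovers the sources, and so does any scaled permutation of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian sources (uniform and Laplacian), n samples each.
n = 10_000
S = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])

A = np.array([[1.0, 0.5],          # a hypothetical mixing matrix (unknown in practice)
              [0.3, 2.0]])
X = A @ S                          # observed mixtures

# The ideal recovery matrix W = A^{-1} gives Y = W X = S exactly, so W A = I.
W = np.linalg.inv(A)
print(np.round(W @ A, 6))          # identity matrix

# Any scaled permutation of W is an equally valid recovery matrix.
P = np.array([[0.0, -2.0],         # swap the components and flip/scale one of them
              [0.5,  0.0]])
W2 = P @ W
Y = W2 @ X                         # rows are still the sources, merely swapped and rescaled
print(np.round(W2 @ A, 6))         # a scaled permutation matrix, not the identity
```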

3.1. Infomax Method

The infomax method [3] tackles the problem of separating the mixed signals $X$ by looking for a weight matrix $W$ without knowing either the mixing matrix $A$ or the original signals $S$. We illustrate BSS as follows. According to the optimization problem of ICA (16), we can rewrite it as
$$\max_{W}\; I(X;Y), \qquad Y = WX.$$
This can also be regarded as
$$I(X;Y) = H(Y) - H(Y\mid X).$$
Differentiating with respect to a parameter $w$ involved in the mapping from $X$ to $Y$, and noting that $H(Y\mid X)$ does not depend on $w$, gives
$$\frac{\partial}{\partial w} I(X;Y) = \frac{\partial}{\partial w} H(Y).$$
Therefore, the MI between the mixtures and the recoveries can be maximized by maximizing the entropy of the recoveries alone. The gradient method is then used to obtain the learning rule [3].
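As a concrete illustration of this idea, the sketch below implements a natural-gradient variant of the infomax update, $W \leftarrow W + \eta\,(I - g(Y)Y^{T})W$ with $g=\tanh$. The natural-gradient form and the tanh score function are assumptions suited to super-Gaussian sources; they are not the exact sigmoid-based rule of [3].

```python
import numpy as np

def infomax_ica(X, lr=0.01, n_iter=2000, rng=None):
    """Natural-gradient infomax-style update: W <- W + lr * (I - g(Y) Y^T / n) W,
    with g = tanh (a score function suited to super-Gaussian sources).
    X has shape (n_components, n_samples) and is assumed to be centered."""
    rng = np.random.default_rng(rng)
    d, n = X.shape
    W = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # small random start near I
    I = np.eye(d)
    for _ in range(n_iter):
        Y = W @ X
        grad = (I - np.tanh(Y) @ Y.T / n) @ W            # natural-gradient direction
        W = W + lr * grad
    return W   # Y = W @ X approximates the sources up to permutation and scale
```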

Considering the slow convergence and limited accuracy of this approach, the authors of [9] proposed FastICA, based on maximizing the negentropy, which is regarded as a measure of the independence of the signals.

3.2. FastICA Method

According to the infomax method and the IBN, BSS is equivalent to the following optimization problem:
$$\max_{W}\; I(X;Y) = \max_{W}\; H(Y), \qquad Y = WX.$$
The problem can then be adapted as
$$\min_{W}\; I(Y_1,\dots,Y_n) = \min_{W}\; \sum_{i=1}^{n} H(Y_i) - H(Y),$$
where $Y_1,\dots,Y_n$ are the ICs. How can we identify and measure the independence of the recovered data? The equivalence between the non-Gaussianity of the random variables and negentropy is illustrated by the Central Limit Theorem [9].

Theorem 1 (Central-Limit Theorem [15]). Given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed.

According to the Central Limit Theorem, the non-Gaussianity of the recovered data reaches its maximum if and only if the recovered data $Y$ is a permutation of the original independent sources $S$. Consider
$$Y = WX = WAS,$$
and normalize (whiten) the mixed data so that
$$Z = VX, \qquad E[ZZ^{T}] = I,$$
where $I$ is the identity matrix. So, (10) is rewritten as
$$\max_{W}\; \sum_{i=1}^{n} J(Y_i), \qquad Y = WZ,\; WW^{T} = I,$$
where $J(Y_i) = H(Y_i^{\mathrm{gauss}}) - H(Y_i)$ denotes the negentropy of $Y_i$ (the entropy gap between $Y_i$ and a Gaussian variable with the same variance), which is equivalent to (7) and (10). Thus we obtain the equivalence of the two classical algorithms from the viewpoint of the IBN.
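Whitening is easy to implement explicitly. The sketch below (an illustration, not the paper's code) centers the mixtures and uses the eigenvalue decomposition of the covariance matrix so that the returned signals have approximately identity covariance.

```python
import numpy as np

def whiten(X):
    """Center X (shape: n_components x n_samples) and whiten it so that the
    covariance of the returned signals Z is approximately the identity matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # whitening matrix
    Z = V @ Xc                                           # E[Z Z^T] is close to I
    return Z, V
```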

An approximation of the negentropy and a fixed-point iteration are then applied to derive the learning rule [9].
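For completeness, the following sketch shows the one-unit fixed-point iteration of FastICA on whitened data, $w \leftarrow E[Zg(w^{T}Z)] - E[g'(w^{T}Z)]\,w$, assuming the log-cosh contrast so that $g=\tanh$. A full implementation would additionally deflate or symmetrically decorrelate to extract several components.

```python
import numpy as np

def fastica_one_unit(Z, max_iter=200, tol=1e-6, rng=None):
    """One-unit FastICA fixed-point iteration on whitened data Z
    (shape: n_components x n_samples), using the log-cosh contrast,
    i.e. g(u) = tanh(u) and g'(u) = 1 - tanh(u)**2."""
    rng = np.random.default_rng(rng)
    d, _ = Z.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ Z
        g = np.tanh(u)
        g_prime = 1.0 - g ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w   # fixed-point update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:                 # converged (up to sign)
            return w_new
        w = w_new
    return w   # w @ Z is one estimated independent component
```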

4. Experiments

Based on the infomax learning rule, the first experiments use the synthetic data plotted in Figure 1(a) as the original data. The result obtained by infomax is shown in Figure 1(b), corresponding to the recovery matrix in (17). The product $WA$ of the recovery matrix and the mixing matrix should be a permutation of an approximately diagonal matrix, and indeed only one substantial entry (boxed) exists in each row and column.

In order to illustrate the efficiency of the FastICA algorithm and its limitation to at most one Gaussian variable, we list some numerical results on blindly mixed signals in Figures 2, 3, and 4, using a nonquadratic function $G$ to approximate the negentropy.
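The paper does not reproduce the particular nonquadratic function here; a common default in FastICA [9] is $G(u) = \frac{1}{a}\log\cosh(au)$ with $1 \le a \le 2$ (scikit-learn's fun='logcosh'). Under that assumption, the negentropy of a zero-mean, unit-variance signal can be approximated, up to a positive constant, as follows.

```python
import numpy as np

def G_logcosh(u, a=1.0):
    """Nonquadratic contrast G(u) = (1/a) * log(cosh(a * u)), with 1 <= a <= 2."""
    return np.log(np.cosh(a * u)) / a

def negentropy_approx(y, n_ref=100_000, rng=None):
    """J(y) is proportional to (E[G(y)] - E[G(nu)])**2, where nu is a standard
    Gaussian reference variable and y is assumed zero-mean with unit variance."""
    rng = np.random.default_rng(rng)
    nu = rng.standard_normal(n_ref)
    return (G_logcosh(y).mean() - G_logcosh(nu).mean()) ** 2
```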

Figure 2 clearly demonstrates the efficiency of the algorithm, which successfully separates the randomly mixed sinusoid, rectangular curve, and sawtooth curve. Moreover, (18) reveals that the matrix $W$ is an elementary transformation of an approximate inverse of the mixing matrix $A$.
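An experiment in the spirit of Figure 2 can be reproduced with an off-the-shelf FastICA implementation. The signals, mixing matrix, and parameters below (including whiten="unit-variance", which requires a recent scikit-learn) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA

# Synthetic sources resembling the Figure 2 setup: sinusoid, square wave, sawtooth.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t)), signal.sawtooth(2 * np.pi * t)]
S += 0.05 * np.random.default_rng(0).standard_normal(S.shape)   # a little noise
S /= S.std(axis=0)

A = np.array([[1.0, 1.0, 1.0],     # an arbitrary mixing matrix
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])
X = S @ A.T                        # observed mixtures, shape (n_samples, n_signals)

ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
Y = ica.fit_transform(X)           # recovered sources (up to permutation and scale)
W = ica.components_                # estimated unmixing operator
print(np.round(W @ A, 2))          # close to a scaled permutation matrix
```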

Next, it is meaningful to add a Gaussian variable to the original data to test the efficiency of the algorithm; the result is shown in Figure 3 and the corresponding product matrix is given in (19). Finally, when two Gaussian signals are included in the mixed data, the algorithm cannot separate them, as shown in Figure 4.

Furthermore, the average numbers of iterations of the FastICA algorithm for the first three numerical experiments are shown in Table 1.

After the experiments on the synthetic data, the algorithm also proves efficient on the real audio data in Figure 5 and the image data in Figure 6. In the process of separating the image data, the picture in Figure 7 is obtained with inverted colors, because the recovery matrix $W$, which alters the picture to the opposite color, is not the exact inverse of the mixing matrix $A$.
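The color inversion is a direct consequence of the sign and scale ambiguity: a recovered image component may come back negated and arbitrarily scaled. A hypothetical post-processing step such as the one below (not from the paper) rescales the component to [0, 1] and flips it back when needed.

```python
import numpy as np

def to_display(component, invert=False):
    """Rescale a recovered image component (a hypothetical 2-D array) to [0, 1].
    ICA recovers sources only up to sign and scale, so the component may appear
    with inverted intensities; pass invert=True to flip it back for display."""
    img = component - component.min()
    img = img / (img.max() + 1e-12)
    return 1.0 - img if invert else img
```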

5. Conclusion

The algorithm of independent component analysis originates from BSS and is a very successful application of information theory to speech recognition and image separation without knowledge of the linear transformation. However, there are also some disadvantages; for example, the method relies on the strong preconditions that the original sources are independent and the transformation is linear.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This investigation was supported by National Basic Research Program of China (973 Program) under Grant no. 2013CB329404, the Major Research Project of the National Natural Science Foundation of China under Grant no. 912300101, the National Natural Science Foundation of China under Grant no. 61075006, the Key Project of the National Natural Science Foundation of China under Grant no. 111311006, the Scientific Research Program Funded by Shaanxi Provincial Education Department (Program no. 2013JK1139), the China Postdoctoral Science Foundation (no. 2013M542370), and the Specialized Research Fund for the Doctoral Program of Higher Education of the People’s Republic of China (Grant no. 20136118120010).