Research Letters in Signal Processing
Volume 2008 (2008), Article ID 790607, 5 pages
http://dx.doi.org/10.1155/2008/790607
Research Letter

Generalized Cumulative Residual Entropy for Distributions with Unrestricted Supports

Lab-STICC (CNRS FRE 3167), Institut Telecom, Telecom Bretagne, Technopole Brest Iroise, CS 83818, 29238 Brest Cédex, France

Received 6 April 2008; Accepted 19 June 2008

Academic Editor: Andreas Jakobsson

Copyright © 2008 Noomane Drissi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We consider the cumulative residual entropy (CRE), a recently introduced measure of entropy. While previous works consider distributions with positive support, we generalize the definition of the CRE to distributions with general support. We show that several interesting properties of the earlier CRE remain valid, and we supply further properties and insight into problems such as the maximum CRE power moment problem. In addition, we show that this generalized CRE can be used as an alternative to differential entropy to derive information-based optimization criteria for system identification purposes.

1. Introduction

The concept of entropy is important for studies in many areas of engineering such as thermodynamics, mechanics, or digital communications. An early definition of a measure of entropy is the Shannon entropy [1, 2]. In Shannon's approach, discrete values and absolutely continuous distributions are treated in somewhat different ways through entropy and differential entropy, respectively. Considering the complementary cumulative distribution function (CCDF) instead of the probability density function in the definition of differential entropy leads to a new entropy measure named cumulative residual entropy (CRE) [3, 4]. In [3, 4], CRE is defined as

$$\mathcal{E}(X) = -\int_{\mathbb{R}^n_+} P(|X| > u)\,\log P(|X| > u)\,du, \qquad (1)$$
where $n$ is the dimension of the random vector $X$. Clearly, this formula is valid for a random variable (RV) that is discrete, absolutely continuous, or a mixture of both, because it only involves the CCDF of $|X|$. In addition, unlike the Shannon differential entropy, it is always positive, while preserving many interesting properties of the Shannon entropy. The concept of CRE has found nice interpretations and applications in the fields of reliability (see [5], where the concept of dynamic CRE is introduced) and image alignment [3].
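As a quick numerical sanity check of (1) in the scalar, positive-support case (a minimal sketch, not part of the original development), the CRE of an exponential variable with rate $\lambda$ can be computed from its CCDF and compared with the closed-form value $1/\lambda$:

```python
# Minimal numerical check of definition (1) for a scalar positive RV (a sketch).
# For X ~ Exp(lam), P(X > u) = exp(-lam*u) and the CRE integrates to 1/lam.
import numpy as np
from scipy.integrate import quad

def neg_plogp(p):
    """-p*log(p), with the continuous extension 0 at p = 0 and p = 1."""
    p = np.clip(p, 1e-300, 1.0)
    return -p * np.log(p)

lam = 2.0
cre, _ = quad(lambda u: neg_plogp(np.exp(-lam * u)), 0.0, np.inf)
print(f"numerical CRE = {cre:.6f}   closed form 1/lambda = {1.0 / lam:.6f}")
```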

Shannon entropy can be seen as a particular case of the exponential entropy, when the entropy order tends to 1. Thus, following the work in [4], a modified version of the exponential entropy, where the PDF is replaced by the CCDF, has been introduced in [6], leading to new entropy-type measures called survival entropies.

However, both Rao et al.'s CRE and its exponential-entropy generalization by Zografos and Nadarajah lead to entropy-type definitions that either assume positive-valued RVs or otherwise apply to $|X|$. Although the positive case is of great interest for many applications, CRE and exponential entropies entail difficulties when working with RVs whose supports are not restricted to positive values.

In this paper, we show that for an RV $X$, (1) remains a valid expression when $P(|X| > u)$ is replaced by $P(X > u)$ and integration is performed over $\mathbb{R}^n$, without further hypotheses than in [4]. In addition, this extended CRE definition enables some desirable properties. We also complete the power-moment-constrained maximum CRE distribution problem that was addressed in [7], for classes of distributions that have lower-unbounded supports. Finally, we illustrate the potential superiority of the proposed generalized CRE (GCRE) over differential entropy in mutual-information-based estimation problems.

The paper is organized as follows. Section 2 introduces the GCRE definition. Some properties of the GCRE are discussed in Section 3. In Section 4, we introduce the cumulative entropy rate and mutual information rate. Section 5 deals with maximum GCRE distributions. To illustrate the potential of the GCRE, Section 6 shows on a simple example a possible benefit of the GCRE for system identification.

2. Generalized Cumulative Residual Entropy (GCRE)

We will denote by $F^c_X(x)$ the complementary cumulative distribution function (survival function) of a multivariate RV $X = [X_1,\dots,X_n]^T$ of dimension $n$: $F^c_X(x) = P(X > x) = P(X_i > x_i,\ i = 1,\dots,n)$. We denote by $H_C(X)$ the GCRE of $X$, which we define by

$$H_C(X) = -\int_{\mathbb{R}^n} F^c_X(u)\,\log F^c_X(u)\,du. \qquad (2)$$
Clearly, like the CRE, the GCRE is a positive and concave function of $F^c_X$. In addition, the existence of the GCRE can be established without further assumptions on the distribution than those required for the CRE in [4].
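The following sketch evaluates definition (2) for $n = 1$ on a distribution with full support (a standard Gaussian, an assumed example) and, for comparison, the original CRE (1) applied to $|X|$; the two quantities differ in general:

```python
# Sketch of definition (2) for n = 1 on a distribution with full support, compared
# with the original CRE (1) applied to |X| (assumption: X ~ N(0, 1)).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def neg_plogp(p):
    """-p*log(p), with the continuous extension 0 at p = 0 and p = 1."""
    p = np.clip(p, 1e-300, 1.0)
    return -p * np.log(p)

# GCRE: integrate -F^c log F^c over the whole real line (F^c = survival function).
gcre, _ = quad(lambda x: neg_plogp(norm.sf(x)), -np.inf, np.inf)

# CRE of |X| as in (1): P(|X| > u) = 2*(1 - Phi(u)) for u >= 0.
cre_abs, _ = quad(lambda u: neg_plogp(2.0 * norm.sf(u)), 0.0, np.inf)

print(f"GCRE of N(0,1)       : {gcre:.4f}")
print(f"CRE of |X|, X ~ N(0,1): {cre_abs:.4f}")
```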

Theorem 1. $H_C(X) < \infty$ if $E\big[|X|^p\big] < \infty$ for some $p > n$.

Proof. First, let us remark that, as in the proof of the existence of the CRE in [4], it is sufficient to prove the result when $X$ is a scalar RV, that is, $n = 1$, and for $p > 1$. Then, letting $p^{-1} < \alpha < 1$, we use the following inequality:
$$-F^c_X(x)\log F^c_X(x) \le \frac{e^{-1}}{1-\alpha}\,\big[F^c_X(x)\big]^{\alpha}\,\mathbb{1}_{]0,\infty[}(x) + \big[1 - F^c_X(x)\big]\,\mathbb{1}_{]-\infty,0]}(x), \qquad (3)$$
where $\mathbb{1}_A(x) = 1$ if $x \in A$ and $\mathbb{1}_A(x) = 0$ otherwise. The existence of $\int \frac{e^{-1}}{1-\alpha}\,\big[F^c_X(x)\big]^{\alpha}\,\mathbb{1}_{[0,\infty[}(x)\,dx$ can be proven just as in [4]. Now, letting $u = -x$, we have
$$\big[1 - F^c_X(x)\big]\,\mathbb{1}_{]-\infty,0]}(x) = F_X(-u)\,\mathbb{1}_{[0,\infty[}(u) \le \mathbb{1}_{]0,1]}(u) + F_X(-u)\,\mathbb{1}_{]1,\infty[}(u) \le \mathbb{1}_{]0,1]}(u) + F^c_{|X|}(u)\,\mathbb{1}_{]1,\infty[}(u) \le \mathbb{1}_{]0,1]}(u) + u^{-p}\,E\big[|X|^p\big]\,\mathbb{1}_{]1,\infty[}(u). \qquad (4)$$
Thus,
$$\int \big[1 - F^c_X(x)\big]\,\mathbb{1}_{]-\infty,0]}(x)\,dx \le \int \Big(\mathbb{1}_{]0,1]}(u) + u^{-p}\,E\big[|X|^p\big]\,\mathbb{1}_{]1,\infty[}(u)\Big)\,du \le 1 + \int_{1}^{\infty} u^{-p}\,E\big[|X|^p\big]\,du < \infty. \qquad (5)$$
Finally, putting all the pieces together proves the convergence of the right-hand side of (2).

3. A Few Properties of GCRE

Let us now exhibit a few more interesting properties of the GCRE. First, it is easy to check that, like the Shannon entropy, the GCRE is invariant under translation:
$$\forall a \in \mathbb{R}^n, \quad H_C(X + a) = H_C(X). \qquad (6)$$
In the same way, it is clear that
$$\forall a \in \mathbb{R}_+, \quad H_C(aX) = a\,H_C(X). \qquad (7)$$
When $a < 0$, we do not have such a nice property in general. However, let us consider the important particular case where the distribution of $X$ has a symmetry of the form

$$\exists\,\mu \in \mathbb{R},\ \forall x \in \mathbb{R}, \quad F^c_X(\mu - x) = 1 - F^c_X(\mu + x). \qquad (8)$$
In this case, we get the following result.

Theorem 2. For an RV $X$ that satisfies the symmetry property (8), one has
$$\forall a \in \mathbb{R}, \quad H_C(aX) = |a|\,H_C(X). \qquad (9)$$

Proof. Since it is clear that $H_C(aX) = a\,H_C(X)$ for all $a \in \mathbb{R}_+$, we just have to check that $H_C(-X) = H_C(X)$, which can be established as follows:
$$\begin{aligned} H_C(-X) &= -\int_{\mathbb{R}} F^c_{-X}(x)\log F^c_{-X}(x)\,dx = -\int_{\mathbb{R}} F^c_{-X}(x-\mu)\log F^c_{-X}(x-\mu)\,dx \\ &= -\int_{\mathbb{R}} F_X(x+\mu)\log F_X(x+\mu)\,dx = -\int_{\mathbb{R}} \big[1-F^c_X(x+\mu)\big]\log\big[1-F^c_X(x+\mu)\big]\,dx \\ &= -\int_{\mathbb{R}} F^c_X(\mu-x)\log F^c_X(\mu-x)\,dx = -\int_{\mathbb{R}} F^c_X(x)\log F^c_X(x)\,dx = H_C(X), \end{aligned} \qquad (10)$$
where the third equality uses the change of variable $x \mapsto -x$ together with $F^c_{-X}(-x-\mu) = P(X < x+\mu) = F_X(x+\mu)$, and the fifth equality uses the symmetry property (8).
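A small numerical check of Theorem 2 (a sketch under assumed distributions): for a symmetric law, $H_C(aX) = |a|\,H_C(X)$ also holds for negative $a$, whereas for an asymmetric law such as the exponential, $H_C(-X) \ne H_C(X)$, in line with the remark preceding (8):

```python
# Numerical check of Theorem 2 (a sketch, not from the paper): for a symmetric X the
# GCRE satisfies H_C(aX) = |a| H_C(X), while for an asymmetric X, H_C(-X) != H_C(X).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, expon

def gcre_from_sf(sf):
    """GCRE (2) computed from a survival function on the real line."""
    def f(x):
        p = min(max(sf(x), 1e-300), 1.0)
        return -p * np.log(p)
    return quad(f, -np.inf, np.inf, limit=200)[0]

a = -3.0
h_x  = gcre_from_sf(norm(loc=1.0, scale=2.0).sf)                # symmetric X ~ N(1, 2^2)
h_ax = gcre_from_sf(norm(loc=a * 1.0, scale=abs(a) * 2.0).sf)   # distribution of aX
print(f"symmetric X : H_C(aX) = {h_ax:.4f}  vs  |a| H_C(X) = {abs(a) * h_x:.4f}")

h_e    = gcre_from_sf(expon().sf)                               # asymmetric X ~ Exp(1)
h_nege = gcre_from_sf(lambda x: expon().cdf(-x))                # survival function of -X
print(f"asymmetric X: H_C(-X) = {h_nege:.4f}  vs  H_C(X) = {h_e:.4f}")
```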

When the entries of vector 𝑋 are independent, it has been shown in [4] that if the 𝑋𝑖 are nonnegative, then

$$H_C(X) = \sum_{i} \prod_{j \ne i} \mathbb{E}\big[X_j\big]\, H_C\big(X_i\big). \qquad (11)$$
However, this formula does not extend to RVs with distributions carried by $\mathbb{R}^n$, because $F^c_X$ is in general integrable over $\mathbb{R}^n_+$ but never over $\mathbb{R}^n$. Nevertheless, if the $X_i$'s are independent and have lower-bounded supports with respective lower bounds $m_1,\dots,m_n$, then

$$H_C(X) = -\int_{\prod_i\, ]m_i,\infty[} F^c_X(x)\log F^c_X(x)\,dx = \sum_{i} \prod_{j \ne i}\Big(\mathbb{E}\big[X_j\big] - m_j\Big)\, H_C\big(X_i\big), \qquad (12)$$
because
$$\int_{m_i}^{\infty} F^c_{X_i}(u)\,du = \Big[u\,F^c_{X_i}(u)\Big]_{m_i}^{\infty} + \int_{m_i}^{\infty} u\,P_{X_i}(du) = -m_i + \mathbb{E}\big[X_i\big]. \qquad (13)$$
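The factorization (12) can be checked numerically; in the sketch below the marginals are assumed to be shifted exponentials $X_i = m_i + \mathrm{Exp}(\lambda_i)$, for which $H_C(X_i) = 1/\lambda_i$ and $\mathbb{E}[X_i] - m_i = 1/\lambda_i$:

```python
# Numerical check of factorization (12) (a sketch with assumed shifted exponential
# marginals): independent X_i = m_i + Exp(lam_i), so F^c_X(x1, x2) factorizes.
import numpy as np
from scipy.integrate import dblquad

m1, lam1 = -2.0, 1.0
m2, lam2 = 3.0, 0.5

sf1 = lambda x: np.exp(-lam1 * (x - m1))          # survival of X1 on ]m1, inf[
sf2 = lambda x: np.exp(-lam2 * (x - m2))          # survival of X2 on ]m2, inf[

def integrand(x2, x1):
    p = sf1(x1) * sf2(x2)
    return -p * np.log(p)

# Left-hand side of (12): integral over ]m1,inf[ x ]m2,inf[, truncated where the
# survival functions are ~exp(-40) and the integrand is negligible.
hi1 = m1 + 40.0 / lam1
hi2 = m2 + 40.0 / lam2
lhs, _ = dblquad(integrand, m1, hi1, m2, hi2)

# Right-hand side of (12): H_C(Exp(lam)) = 1/lam and E[X_i] - m_i = 1/lam_i.
rhs = (1.0 / lam2) * (1.0 / lam1) + (1.0 / lam1) * (1.0 / lam2)
print(f"direct 2-D integral = {lhs:.4f}   product formula (12) = {rhs:.4f}")
```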

The definition of the conditional GCRE is a direct extension of the definition of the conditional CRE: the conditional GCRE of $X$ knowing that $Y$ is equal to $y$ is defined by

$$H_C(X \mid Y = y) = -\int_{\mathbb{R}^n} P(X > x \mid Y = y)\,\log P(X > x \mid Y = y)\,dx. \qquad (14)$$
We recall an important result from [4] stating that conditioning reduces the entropy.

Theorem 3. For any $X$ and $Y$,
$$H_C(X \mid Y) \le H_C(X), \qquad (15)$$
with equality if and only if $X$ is independent of $Y$.

As a consequence, if $X \to Y \to Z$ is a Markov chain, we have the data processing inequality for the GCRE:
$$H_C(Z \mid X, Y) \le H_C(Z \mid X). \qquad (16)$$

4. Entropy and Mutual Information Rates

4.1. Entropy Rate

The GCRE of a stochastic process $\{X_i\}$ is defined by
$$H_C(X) = \lim_{n\to\infty} H_C\big(X_n \mid X_{n-1}, X_{n-2},\dots,X_1\big), \qquad (17)$$
when the limit exists.

Theorem 4. For stationary processes, the limit exists.

Proof. Consider
$$H_C\big(X_{n+1} \mid X_n,\dots,X_1\big) \le H_C\big(X_{n+1} \mid X_n,\dots,X_2\big) = H_C\big(X_n \mid X_{n-1},\dots,X_1\big). \qquad (18)$$
The inequality follows from the fact that conditioning reduces entropy, and the equality follows from stationarity (see [2] for the analogous proof in the case of the Shannon entropy). The sequence $H_C(X_n \mid X_{n-1},\dots,X_1)$ is thus nonincreasing and nonnegative, so the limit exists.

4.2. Mutual Information

Let $X$ and $Y$ be two RVs. We define the cumulative mutual information between $X$ and $Y$ as follows:
$$I_C(X;Y) = H_C(X) - H_C(X \mid Y). \qquad (19)$$

Theorem 5. $I_C$ is nonnegative, and it vanishes if and only if $X$ and $Y$ are independent.

Proof. $I_C$ is nonnegative because of Theorem 3, and it vanishes if and only if $H_C(X \mid Y) = H_C(X)$, which, by the equality case of Theorem 3, holds if and only if $X$ and $Y$ are independent.

For a random vector $\mathbf{X} = (X_1, X_2,\dots,X_n)$ of size $n$, the cumulative mutual information is defined by
$$I_C(\mathbf{X}) = \sum_{i=1}^{n} H_C\big(X_i\big) - H_C\big(X_n \mid X_{n-1},\dots,X_1\big). \qquad (20)$$
In the case of a stochastic process $\{X_i\}$, we have $H_C(X) = \lim_{n\to\infty} H_C(X_n \mid X_{n-1},\dots,X_1)$, and the limit exists for stationary processes. The mutual information rate for $\{X_i\}$ is then defined as
$$I_C(X) = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} H_C\big(X_t\big) - H_C(X), \qquad (21)$$
where $H_C(X_t)$ is the marginal GCRE of the process $X$.
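The paper later estimates cumulative MIs from empirical distributions without detailing the estimator; the sketch below is one plausible plug-in choice (an assumption, not the authors' implementation): $H_C(X)$ is computed from the empirical CCDF, and $H_C(X \mid Y)$ is approximated by partitioning $Y$ into quantile bins.

```python
# A plausible plug-in estimator of the cumulative mutual information (19) from samples
# (a sketch only). H_C(X|Y) is approximated by partitioning Y into quantile bins and
# averaging the GCRE of the empirical conditional CCDFs.
import numpy as np

def empirical_gcre(x):
    """GCRE (2) of the empirical CCDF of a 1-D sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    sf = 1.0 - np.arange(1, n) / n            # empirical P(X > t) between order statistics
    gaps = np.diff(x)                         # lengths of the intervals [x_(i), x_(i+1))
    return float(np.sum(-sf * np.log(sf) * gaps))

def cumulative_mutual_information(x, y, n_bins=8):
    """I_C(X;Y) = H_C(X) - E_Y[H_C(X | Y in bin)], with a quantile partition of Y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)
    h_cond = 0.0
    for b in range(n_bins):
        mask = labels == b
        if mask.sum() > 1:
            h_cond += mask.mean() * empirical_gcre(x[mask])
    return empirical_gcre(x) - h_cond

rng = np.random.default_rng(0)
y = rng.normal(size=4000)
x_dep = y + 0.3 * rng.normal(size=4000)       # dependent pair
x_ind = rng.normal(size=4000)                 # independent pair
print("dependent  :", round(cumulative_mutual_information(x_dep, y), 3))
print("independent:", round(cumulative_mutual_information(x_ind, y), 3))
```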

5. Maximum GCRE Distributions

In this section, we only consider the case of one-dimensional RVs ($n = 1$). The maximum entropy principle is useful in many scientific areas, and most important distributions can be derived from it [8]. The maximum CRE distribution has been studied in [7]. For an RV $X$ with a symmetric CCDF in the sense of (8), we are looking for the maximum GCRE distribution, that is, the CCDF that solves the following moment problem:

$$\max_{F^c} H_C\big(F^c\big) \quad \text{subject to} \quad \int r_i(x)\,p(x)\,dx = c_i, \quad i = 1,\dots,m, \qquad (22)$$
where $p(x) = -\frac{d}{dx}F^c(x)$, and $(r_i)_{i=1,\dots,m}$ and $(c_i)_{i=1,\dots,m}$ are fixed $C^1$ real-valued functions and real coefficients, respectively. The solution of this problem is supplied by the following result.

Theorem 6. When the symmetry property (8) holds, the solution of problem (22), when it can be reached, is of the form
$$F^c(x) = \frac{1}{1+\exp\Big(\sum_{i=1}^{m}\lambda_i\, r_i'(x-\mu)\Big)}, \quad x \ge \mu. \qquad (23)$$

Proof. Using the symmetry property (8),
$$\begin{aligned} H_C(X) &= -\int_{-\infty}^{\mu} F^c(x)\log F^c(x)\,dx - \int_{\mu}^{+\infty} F^c(x)\log F^c(x)\,dx \\ &= -\int_{-\infty}^{0} F^c(x+\mu)\log F^c(x+\mu)\,dx - \int_{0}^{+\infty} F^c(x+\mu)\log F^c(x+\mu)\,dx \\ &= -\int_{0}^{+\infty} F^c(\mu-x)\log F^c(\mu-x)\,dx - \int_{0}^{+\infty} F^c(\mu+x)\log F^c(\mu+x)\,dx \\ &= -\int_{0}^{+\infty} \Big\{\big[1-F^c(\mu+x)\big]\log\big[1-F^c(\mu+x)\big] + F^c(\mu+x)\log F^c(\mu+x)\Big\}\,dx. \end{aligned} \qquad (24)$$
Let us define $f$ by
$$f\big(F^c(x), p(x)\big) = -\big[1-F^c(\mu+x)\big]\log\big[1-F^c(\mu+x)\big] - F^c(\mu+x)\log F^c(\mu+x) + \sum_{i=1}^{m}\lambda_i\, r_i(x)\,p(x). \qquad (25)$$
Then, since $\big(F^c(x)\big)' = -p(x)$, the Euler-Lagrange equation [9] states that the solution $F^c$ of problem (22) is a solution of the equation
$$\frac{d}{dx} f_{p}(x) = f_{F^c}(x), \qquad (26)$$
where $f_u$ denotes the partial derivative of $f$ with respect to the component $u$. From (25), we get
$$\frac{d}{dx} f_{p}(x) = \sum_{i=1}^{m}\lambda_i\, r_i'(x), \qquad f_{F^c}(x) = \log\big[1-F^c(\mu+x)\big] - \log F^c(\mu+x). \qquad (27)$$
Then,
$$\log\frac{1-F^c(\mu+x)}{F^c(\mu+x)} = \sum_{i=1}^{m}\lambda_i\, r_i'(x), \qquad F^c(x) = \frac{1}{1+\exp\Big(\sum_{i=1}^{m}\lambda_i\, r_i'(x-\mu)\Big)}, \qquad (28)$$
for $x \in [\mu,\infty[$.

5.1. Example

We set the constraints $\mathbb{E}[X] = \mu$ and $\mathbb{E}\big[(X-\mu)^2\big] = \sigma^2$. Then the maximum GCRE symmetric solution for the CCDF of $X$ is given by
$$F^c_X(x) = \frac{1}{1+\exp\big(\lambda_1 + 2\lambda_2 x\big)}, \qquad (29)$$
for $x \ge \mu$, which extends by the symmetry (8) to the whole real line and is the CCDF of a logistic distribution. The moment constraints lead to $\lambda_1 + 2\lambda_2 x = \frac{\pi}{\sigma\sqrt{3}}(x-\mu)$. The corresponding PDF is defined on $\mathbb{R}$ by
$$p_X(x) = \frac{\pi}{\sigma\sqrt{3}}\, \frac{\exp\Big(\frac{\pi}{\sigma\sqrt{3}}(x-\mu)\Big)}{\Big[1+\exp\Big(\frac{\pi}{\sigma\sqrt{3}}(x-\mu)\Big)\Big]^{2}}. \qquad (30)$$
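As a numerical illustration of this maximum-GCRE property (a sketch; the Gaussian and uniform comparison laws are our own assumed choices), the logistic distribution of (29)-(30) should attain the largest GCRE among the distributions below sharing the same mean and variance:

```python
# Numerical illustration of the maximum-GCRE example (a sketch): among the symmetric
# laws below with the same mean mu and variance sigma^2, the logistic of (29)-(30)
# should have the largest GCRE.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import logistic, norm, uniform

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 15 * sigma, mu + 15 * sigma, 400_001)   # generous integration grid

def gcre(dist):
    p = np.clip(dist.sf(x), 1e-300, 1.0)                     # survival function values
    return trapezoid(-p * np.log(p), x)

candidates = {
    "logistic": logistic(loc=mu, scale=sigma * np.sqrt(3.0) / np.pi),
    "Gaussian": norm(loc=mu, scale=sigma),
    "uniform":  uniform(loc=mu - sigma * np.sqrt(3.0), scale=2 * sigma * np.sqrt(3.0)),
}
for name, dist in candidates.items():
    print(f"{name:8s} GCRE = {gcre(dist):.4f}")
```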

5.2. Positive Random Variables

It has been shown in [7] that the maximum CRE distribution (i.e., the maximum GCRE distribution under the additional nonnegativity constraint) has a CCDF of the form
$$F^c(x) = \exp\Big(-\sum_{i=1}^{m}\lambda_i\, r_i'(x)\Big), \qquad (31)$$
for $x \in [0,\infty[$. In [7], this result is derived from the log-sum inequality, but of course it can also be derived from the Euler-Lagrange equation along the same lines as in the proof of Theorem 6.

With a positive support constraint and under first and second moment constraints, it follows that the optimum CCDF is of the form $F^c(x) = \exp(-\lambda_1 - 2\lambda_2 x)$ for $x > 0$. Thus the solution, if it exists, is an exponential distribution. In fact, the first and second power moment constraints must be such that $\mathbb{E}[X^2] = 2(\mathbb{E}[X])^2$; otherwise, the problem has no exact solution.
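A numerical illustration (again a sketch with assumed comparison laws): the exponential with mean 1 satisfies $\mathbb{E}[X^2] = 2(\mathbb{E}[X])^2$ and has CRE equal to 1, and other positive laws matched to the same first two moments should not exceed this value:

```python
# Numerical illustration (a sketch with assumed comparison laws): the exponential with
# mean 1 has CRE = 1, and other positive laws with E[X] = 1, E[X^2] = 2 should not
# exceed it, in line with the maximum-CRE result recalled above.
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon, lognorm

def cre_positive(sf, hi):
    """CRE of a positive RV from its survival function, integrated over [0, hi]."""
    def f(x):
        p = min(max(sf(x), 1e-300), 1.0)
        return -p * np.log(p)
    return quad(f, 0.0, hi, limit=300)[0]

# Exponential with mean 1 (so E[X^2] = 2 automatically).
print("exponential :", round(cre_positive(expon(scale=1.0).sf, 60.0), 4))

# Lognormal matched to E[X] = 1, E[X^2] = 2: sigma^2 = ln 2, mu = -ln(2)/2.
s = np.sqrt(np.log(2.0))
print("lognormal   :", round(cre_positive(lognorm(s=s, scale=np.exp(-np.log(2.0) / 2)).sf, 400.0), 4))

# Two-point law P(X=0) = P(X=2) = 1/2 (same two moments): CRE = ln 2 exactly.
print("two-point   :", round(np.log(2.0), 4))
```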

6. Simulation Results

To emphasize the potential practical interest of the GCRE, we consider a simple system identification problem. Here, we consider an MA(1) process, denoted by $Y = (Y_n)_n$, generated by a white noise $X = (X_n)_n$ and corrupted by a white noise $N$:
$$Y_n = X_n - a X_{n-1} + N_n. \qquad (32)$$
The model input $X$ and output $Y$ are observed, and the system model (MA(1)) is assumed to be known. We want to estimate the coefficient $a$ without prior knowledge of the distributions of $X$ and $Y$. Thus, we resort to mutual information (MI) to estimate $a$ as the coefficient $\alpha$ such that the RVs $Y^{\alpha}_n = X_n - \alpha X_{n-1}$ and $Y_n$ show the highest dependence. The Shannon MI between $Y^{\alpha}_n$ and $Y_n$ is given by $f_S(\alpha) = I_S(Y^{\alpha}_n, Y_n) = H_S(Y^{\alpha}_n) - H_S(Y^{\alpha}_n \mid Y_n)$, where $H_S$ is the Shannon differential entropy. Similarly, for the GCRE, the MI is defined as $f_C(\alpha) = I_C(Y^{\alpha}_n, Y_n) = H_C(Y^{\alpha}_n) - H_C(Y^{\alpha}_n \mid Y_n)$. We compare the estimation performance for $a$ obtained by maximizing $f_S(\alpha)$ and $f_C(\alpha)$. Since the true values of $f_S(\alpha)$ and $f_C(\alpha)$ are not available, they are estimated from the empirical distributions of $(Y_n, Y^{\alpha}_n)$.

For the simulations, we have chosen $X$ Gaussian and $N$ with a Laplace distribution: $p_N(x) = (\lambda/2)\exp(-\lambda|x|)$. We consider an experiment with $a = 0.5$ and noise variance equal to 0.2. Estimation is carried out from the observation of $(X_n, Y_n)_{n=1,\dots,400}$. Here, the optimization of the MIs is realized on a fixed regular grid of 200 points over the interval $[0,1]$. Estimation performance is calculated from 200 successive experiments. Estimation of $a$ from the Shannon MI leads to a bias and standard deviation equal to 0.032 and 0.18, respectively, while they are equal to 0.004 and 0.06, respectively, for the GCRE MI.
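A single run of this experiment can be reproduced along the following lines (a self-contained sketch under stated assumptions: the quantile-bin plug-in MI estimator sketched after (21) is our own choice, not necessarily the authors' estimator, and the Laplace scale is set so that the noise variance $2b^2$ equals 0.2):

```python
# End-to-end sketch of the Section 6 experiment under stated assumptions: assumed
# plug-in cumulative-MI estimator (empirical CCDFs + quantile partition of Y_n);
# Laplace noise scale b chosen so that Var(N_n) = 2*b^2 = 0.2.
import numpy as np

def empirical_gcre(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    sf = 1.0 - np.arange(1, n) / n
    return float(np.sum(-sf * np.log(sf) * np.diff(x)))

def cumulative_mi(x, y, n_bins=8):
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)
    h_cond = sum((labels == b).mean() * empirical_gcre(x[labels == b])
                 for b in range(n_bins) if (labels == b).sum() > 1)
    return empirical_gcre(x) - h_cond

rng = np.random.default_rng(1)
a_true, n_samples = 0.5, 400
x = rng.normal(size=n_samples + 1)                       # white Gaussian input
noise = rng.laplace(scale=np.sqrt(0.2 / 2.0), size=n_samples)
y = x[1:] - a_true * x[:-1] + noise                      # observed MA(1) output, cf. (32)

alphas = np.linspace(0.0, 1.0, 200)                      # fixed regular grid over [0, 1]
scores = [cumulative_mi(x[1:] - alpha * x[:-1], y) for alpha in alphas]
a_hat = alphas[int(np.argmax(scores))]
print(f"true a = {a_true},  GCRE-MI estimate = {a_hat:.3f}")
```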

More importantly, we see in Figure 1(a) that the Shannon MI estimates are much more irregular than the GCRE MI estimates (Figure 1(b)), because of the smoothing brought by the density integration involved in the calculation of the CCDF. This difference is important since an iterative local optimization technique would in general fail to find the global optimum of the estimated Shannon MI, because of its many local maxima.

Figure 1: Ten estimates of (a) 𝑓𝑆(𝛼) and (b) 𝑓𝐶(𝛼). Dotted line: mean estimate averaged from 200 realizations.

Of course, this drawback can be partly overcome by kernel smoothing of the empirical distribution of $(Y_n, Y^{\alpha}_n)$, for instance by using the method proposed in [10]. However, we have checked that, for the above example, very strong smoothing is necessary, and the bias and variance performance then remains worse than with the GCRE MI estimator.

7. Conclusion

We have shown that the concept of cumulative residual entropy (CRE) introduced in [3, 4] can be extended to distributions with general supports. The generalized CRE (GCRE) shares many nice features of the CRE. We have also pointed out specific properties of the GCRE, such as its moment-constrained maximum distributions, and we have illustrated the practical interest of the GCRE by showing how it can be used in system identification procedures.

References

  1. C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
  2. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience, New York, NY, USA, 2nd edition, 2006.
  3. F. Wang, B. C. Vemuri, M. Rao, and Y. Chen, “A new & robust information theoretic measure and its application to image alignment,” in Proceedings of the 18th International Conference on Information Processing in Medical Imaging (IPMI '03), vol. 2732 of Lecture Notes in Computer Science, pp. 388–400, Springer, Ambleside, UK, July 2003.
  4. M. Rao, Y. Chen, B. C. Vemuri, and F. Wang, “Cumulative residual entropy: a new measure of information,” IEEE Transactions on Information Theory, vol. 50, no. 6, pp. 1220–1228, 2004.
  5. M. Asadi and Y. Zohrevand, “On the dynamic cumulative residual entropy,” Journal of Statistical Planning and Inference, vol. 137, no. 6, pp. 1931–1941, 2007.
  6. K. Zografos and S. Nadarajah, “Survival exponential entropies,” IEEE Transactions on Information Theory, vol. 51, no. 3, pp. 1239–1246, 2005.
  7. M. Rao, “More on a new concept of entropy and information,” Journal of Theoretical Probability, vol. 18, no. 4, pp. 967–981, 2005.
  8. J. N. Kapur, Maximum Entropy Methods in Science and Engineering, John Wiley & Sons, New York, NY, USA, 1989.
  9. J. L. Troutman, Variational Calculus and Optimal Control, Optimization with Elementary Convexity, Springer, New York, NY, USA, 2nd edition, 1996.
  10. D. T. Pham, “Fast algorithm for estimating mutual information, entropies and score functions,” in Proceedings of the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA '03), pp. 17–22, Nara, Japan, April 2003.