Lab-STICC (CNRS FRE 3167), Institut Telecom, Telecom Bretagne, Technopole Brest Iroise, CS 83818, 29238 Brest Cédex, France
We consider the cumulative residual entropy (CRE), a recently introduced
measure of entropy. While previous works consider distributions with positive support,
we generalize the definition of CRE to the case of distributions
with general support. We show that several interesting properties
of the earlier CRE remain valid and supply further properties and insight into
problems such as maximum CRE power moment problems. In addition, we
show that this generalized CRE can be used as an alternative to differential
entropy to derive information-based optimization criteria for system identification
purposes.
1. Introduction
The concept of entropy is important for studies in
many areas of engineering such as thermodynamics, mechanics, or digital
communications. An early definition of an entropy measure is the Shannon
entropy [1, 2]. In Shannon's approach, discrete and
absolutely continuous distributions are treated somewhat differently,
through entropy and differential entropy, respectively. Considering the
complementary cumulative distribution function (CCDF) instead of the
probability density function (PDF) in the definition of differential entropy leads to
a new entropy measure named cumulative residual entropy (CRE) [3, 4]. In [3, 4], CRE is defined as
$$\mathcal{E}(X) = -\int_{\mathbb{R}_+^N} P(|X| > \lambda)\log P(|X| > \lambda)\,d\lambda, \tag{1}$$
where $N$ is the dimension of the random vector $X = (X_1,\dots,X_N)^T$, $\lambda = (\lambda_1,\dots,\lambda_N)^T$, and $|X| > \lambda$ means $|X_i| > \lambda_i$ for $i = 1,\dots,N$.
Clearly, this formula is valid for a discrete or an absolutely continuous
random variable (RV), as well as for one with both a discrete and an absolutely continuous
part, because it relies on the CCDF of $|X|$.
In addition, unlike Shannon differential entropy, it is always nonnegative, while
preserving many interesting properties of Shannon entropy. The concept of CRE has found interpretations and
applications in reliability (see [5], where the concept of dynamic CRE is introduced) and image
alignment [3].
Shannon entropy can be seen as a particular case of
exponential entropy when the entropy order tends to 1. Thus, following the work in
[4], a modified version of the exponential entropy, where the
PDF is replaced by the CCDF, has been introduced in [6], leading to new entropy-type measures called
survival entropies.
However, both Rao et al.'s
CRE and its exponential entropy generalization by Zografos and Nadarajah lead
to entropy-type
definitions that either assume positive-valued RVs or otherwise apply to $|X|$. Although the positive case is of
great interest for many applications, CRE and exponential entropies entail
difficulties when working with RVs whose supports are not restricted to
positive values.
In this paper, we show that for an RV $X$, (1) remains a valid expression when $|X|$ is replaced by $X$ and integration is performed over $\mathbb{R}^N$,
without further hypotheses than in [4]. In addition, this extension of the CRE definition
enables some desirable properties. We also complete the power-moment-constrained
maximum CRE distribution problem addressed in [7] for
classes of distributions with
lower-unbounded supports. Finally, we illustrate the potential superiority of
the proposed generalized CRE (GCRE) over differential entropy in mutual
information-based estimation problems.
The paper is organized as follows. Section 2
introduces the GCRE definition. Some properties of GCRE are discussed in
Section 3. In Section 4, we introduce cumulative entropy rate and mutual
information rate. Section 5 deals with maximum GCRE distributions. To
illustrate the potential of GCRE, in Section 6 we show on a simple
example a possible benefit of GCRE for system identification.
2. Generalized Cumulative Residual Entropy (GCRE)
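We will denote
by $\bar F_X$ the complementary cumulative distribution
function (survival function) of a multivariate RV $X$ of dimension $N$: $\bar F_X(x) = P(X > x)$, where $X > x$ means $X_i > x_i$ for $i = 1,\dots,N$.
We denote by $\mathcal{E}(X)$ the GCRE of $X$, which we define by
$$\mathcal{E}(X) = -\int_{\mathbb{R}^N} \bar F_X(x)\log \bar F_X(x)\,dx. \tag{2}$$
Clearly, like
the CRE, the GCRE is a nonnegative and concave function of $\bar F_X$.
In addition, the existence of GCRE can be established under the same assumptions
on the distribution as those made for the CRE in [4].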
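As a sketch of how (2) can be evaluated on data when $N = 1$, the Python function below computes a plug-in estimate by replacing $\bar F_X$ with the empirical CCDF of a sample. The estimator and the names `phi` and `empirical_gcre` are our own illustrative assumptions, not a procedure taken from [3, 4]. Because the empirical CCDF equals 1 to the left of the smallest sample and 0 to the right of the largest, the integrand of (2) vanishes outside the sample range, so the estimate stays finite for samples of any sign.

```python
import math
from typing import Sequence


def phi(u: float) -> float:
    """Return -u*log(u), with the convention phi(0) = phi(1) = 0."""
    return 0.0 if u <= 0.0 or u >= 1.0 else -u * math.log(u)


def empirical_gcre(samples: Sequence[float]) -> float:
    """Plug-in estimate of the scalar GCRE (2) using the empirical CCDF.

    Between consecutive order statistics x_(k) <= x < x_(k+1), the empirical
    CCDF equals (n - k) / n, so the integral in (2) reduces to a finite sum.
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for k in range(1, n):  # k samples lie at or below x on [x_(k), x_(k+1))
        gap = xs[k] - xs[k - 1]
        total += gap * phi((n - k) / n)
    return total


data = [-1.3, 0.2, 0.9, -0.4, 2.1]
print(empirical_gcre(data))
# Shifting every sample should leave the estimate unchanged (translation invariance).
print(empirical_gcre([x + 5.0 for x in data]))
```

Up to floating-point rounding, the two printed values coincide, and scaling the sample by a positive factor scales the estimate by the same factor, mirroring the properties discussed in Section 3.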
Theorem 1. $\mathcal{E}(X) < \infty$ if $E[|X|^p] < \infty$ for some $p > N$.
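Proof. First let us remark that, from the proof of the existence of CRE in
[4], it is sufficient to prove the result when $X$ is a scalar RV, that is, $N = 1$,
and $E[|X|^p] < \infty$ for some $p > 1$.
Then, letting $\phi(u) = -u\log u$ and $\bar F_X(x) = P(X > x)$, we split the right-hand side of (2) into $\int_{-\infty}^0 \phi(\bar F_X(x))\,dx$ and $\int_0^\infty \phi(\bar F_X(x))\,dx$. The existence of the second integral can be proven just as in [4], since $\bar F_X(x) \le P(|X| > x)$ for $x \ge 0$. For the first one, we use the following inequality:
$$\phi(1 - u) = -(1-u)\log(1-u) \le u, \quad u \in [0, 1],$$
which holds because $h(u) = u + (1-u)\log(1-u)$ satisfies $h(0) = 0$ and $h'(u) = -\log(1-u) \ge 0$. Now, letting $u = P(X \le x) = 1 - \bar F_X(x)$ with $x < 0$,
we have
$$\phi(\bar F_X(x)) \le P(X \le x) \le P(|X| \ge |x|).$$
Thus,
$$\int_{-\infty}^0 \phi(\bar F_X(x))\,dx \le \int_0^\infty P(|X| \ge t)\,dt = E[|X|] \le \big(E[|X|^p]\big)^{1/p} < \infty.$$
Finally,
putting all pieces together proves convergence of the right-hand side
of (2).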
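The step of the proof that handles $x < 0$ can be checked numerically. The Python sketch below is our own illustration (the grid sizes, the truncation point `lower`, and the helper names are arbitrary assumptions): it verifies $-(1-u)\log(1-u) \le u$ on a grid of $[0, 1]$ and compares the left-half integral $\int_{-\infty}^0 \phi(\bar F_X(x))\,dx$ for a standard Gaussian $X$ with the bound $E[|X|] = \sqrt{2/\pi}$. Truncating the range at $x = -40$ is harmless here because the integrand decays like a Gaussian tail.

```python
import math
from typing import Callable


def phi(u: float) -> float:
    """-u*log(u), with the convention phi(0) = phi(1) = 0."""
    return 0.0 if u <= 0.0 or u >= 1.0 else -u * math.log(u)


def gaussian_ccdf(x: float) -> float:
    """P(X > x) for a standard Gaussian X."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def left_half_gcre(ccdf: Callable[[float], float], lower: float = -40.0,
                   steps: int = 200_000) -> float:
    """Trapezoidal approximation of the integral of phi(ccdf(x)) over [lower, 0]."""
    h = -lower / steps
    total = 0.5 * (phi(ccdf(lower)) + phi(ccdf(0.0)))
    for i in range(1, steps):
        total += phi(ccdf(lower + i * h))
    return total * h


# 1) Pointwise inequality used in the proof: phi(1 - u) <= u on a grid of [0, 1].
grid = [i / 10_000 for i in range(10_001)]
inequality_holds = all(phi(1.0 - u) <= u for u in grid)

# 2) Left-half integral of (2) for a standard Gaussian versus E|X| = sqrt(2/pi).
left = left_half_gcre(gaussian_ccdf)
bound = math.sqrt(2.0 / math.pi)
print(inequality_holds, f"left half = {left:.4f}, bound = {bound:.4f}")
```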
3. A Few Properties of GCRE
Let us now
exhibit a few more interesting properties of the GCRE. First, it is easy to
check that, like Shannon differential entropy, the GCRE is invariant under translation:
$$\mathcal{E}(X + c) = \mathcal{E}(X), \quad c \in \mathbb{R}^N.$$
In the same way, it is clear that
$$\mathcal{E}(aX) = a^N\,\mathcal{E}(X), \quad a > 0.$$
When $a < 0$,
we do not have such a nice property. However, let us consider the important
particular case where the distribution of $X$ has a symmetry of the form
$$P(X > x) = P(X < -x), \quad x \in \mathbb{R}^N. \tag{8}$$
In this case,
we get the following result.
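Theorem 2. For an RV $X$ that satisfies symmetry property (8), one has $\mathcal{E}(aX) = |a|^N\,\mathcal{E}(X)$ for all $a \neq 0$.
Proof. Since $aX = |a|(-X)$ for $a < 0$ and $\mathcal{E}(bY) = b^N\,\mathcal{E}(Y)$ for $b > 0$,
it is clear that for all $a < 0$, $\mathcal{E}(aX) = |a|^N\,\mathcal{E}(-X)$;
we just have to check that $\mathcal{E}(-X) = \mathcal{E}(X)$,
which can be established as follows:
$$\mathcal{E}(-X) = -\int_{\mathbb{R}^N} P(-X > x)\log P(-X > x)\,dx = -\int_{\mathbb{R}^N} P(X < -x)\log P(X < -x)\,dx = -\int_{\mathbb{R}^N} P(X > x)\log P(X > x)\,dx = \mathcal{E}(X).$$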
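The role of symmetry (8) in Theorem 2 can be illustrated numerically for $N = 1$. The Python sketch below assumes continuous distributions, so that $P(-X > x) = 1 - P(X > -x)$, and evaluates (2) by the trapezoidal rule on a truncated interval; the bounds, step count, and helper names are our own arbitrary choices. For a Laplace RV, which satisfies (8), it returns $\mathcal{E}(-X) = \mathcal{E}(X)$. For a unit-mean exponential RV, $\mathcal{E}(X) = 1$ while $\mathcal{E}(-X) = \pi^2/6 - 1 \approx 0.645$, so the scaling property fails for $a = -1$ without symmetry.

```python
import math
from typing import Callable


def phi(u: float) -> float:
    """-u*log(u), with the convention phi(0) = phi(1) = 0."""
    return 0.0 if u <= 0.0 or u >= 1.0 else -u * math.log(u)


def gcre(ccdf: Callable[[float], float], lo: float = -60.0, hi: float = 60.0,
         steps: int = 120_000) -> float:
    """Trapezoidal approximation of (2) for N = 1 on the truncated range [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (phi(ccdf(lo)) + phi(ccdf(hi)))
    for i in range(1, steps):
        total += phi(ccdf(lo + i * h))
    return total * h


def negated(ccdf: Callable[[float], float]) -> Callable[[float], float]:
    """CCDF of -X for a continuous X: P(-X > x) = 1 - P(X > -x)."""
    return lambda x: 1.0 - ccdf(-x)


def laplace_ccdf(x: float) -> float:
    """Standard Laplace distribution, symmetric about 0."""
    return 0.5 * math.exp(-x) if x >= 0 else 1.0 - 0.5 * math.exp(x)


def exponential_ccdf(x: float) -> float:
    """Unit-mean exponential distribution, supported on [0, inf)."""
    return math.exp(-x) if x >= 0 else 1.0


lap, lap_neg = gcre(laplace_ccdf), gcre(negated(laplace_ccdf))
expo, expo_neg = gcre(exponential_ccdf), gcre(negated(exponential_ccdf))
print(f"Laplace:     E(X) = {lap:.4f}, E(-X) = {lap_neg:.4f}")
print(f"Exponential: E(X) = {expo:.4f}, E(-X) = {expo_neg:.4f}")
```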
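When the entries $X_i$ of vector $X$ are independent, it has been shown in [4] that if
the $X_i$ are nonnegative, then
$$\mathcal{E}(X) = \sum_{i=1}^N \Big(\prod_{j \neq i} E[X_j]\Big)\,\mathcal{E}(X_i).$$
However, this
formula does not extend to RVs with distributions carried by $\mathbb{R}^N$ because $P(X_i > x)$ can in general be integrated over $\mathbb{R}_+$ but never over $\mathbb{R}$, since it tends to 1 as $x \to -\infty$.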
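The product formula of [4] can be checked numerically on a simple case. The Python sketch below is an illustration under our own choice of rates, truncation, and grid, not code from [4]: it takes two independent exponential RVs with rates $\mu_1 = 1$ and $\mu_2 = 2$, for which $\mathcal{E}(X_i) = E[X_i] = 1/\mu_i$, evaluates the two-dimensional integral (1) by a midpoint rule on a truncated square, and compares it with $E[X_2]\,\mathcal{E}(X_1) + E[X_1]\,\mathcal{E}(X_2) = 2/(\mu_1\mu_2) = 1$.

```python
import math


def phi(u: float) -> float:
    """-u*log(u), with the convention phi(0) = phi(1) = 0."""
    return 0.0 if u <= 0.0 or u >= 1.0 else -u * math.log(u)


def cre_2d_independent_exponentials(mu1: float, mu2: float,
                                    upper: float = 30.0, n: int = 600) -> float:
    """Midpoint-rule approximation of (1) for N = 2 on the square [0, upper]^2.

    S(x, y) = P(X1 > x, X2 > y) = P(X1 > x) * P(X2 > y) by independence.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        s1 = math.exp(-mu1 * (i + 0.5) * h)      # P(X1 > x) at the cell midpoint
        for j in range(n):
            s2 = math.exp(-mu2 * (j + 0.5) * h)  # P(X2 > y) at the cell midpoint
            total += phi(s1 * s2)
    return total * h * h


mu1, mu2 = 1.0, 2.0
numeric = cre_2d_independent_exponentials(mu1, mu2)
# For an exponential RV with rate mu, both E[X] and the CRE equal 1/mu.
formula = (1 / mu2) * (1 / mu1) + (1 / mu1) * (1 / mu2)
print(f"numerical = {numeric:.4f}, formula = {formula:.4f}")
```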
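However, if the $X_i$ are independent and have lower-bounded
supports with respective lower bounds $m_i$, then
$$\mathcal{E}(X) = \sum_{i=1}^N \Big(\prod_{j \neq i} E[X_j - m_j]\Big)\,\mathcal{E}(X_i),$$
because $\mathcal{E}(X) = \mathcal{E}(X - m)$ with $m = (m_1,\dots,m_N)^T$, the entries of $X - m$ are independent and nonnegative, and $\mathcal{E}(X_i - m_i) = \mathcal{E}(X_i)$.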
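The conditional GCRE definition is a direct extension of
the definition of conditional CRE: the conditional GCRE of $X$ knowing that $Y$ is equal to $y$ is defined by
$$\mathcal{E}(X|Y=y) = -\int_{\mathbb{R}^N} P(X > x|Y=y)\log P(X > x|Y=y)\,dx,$$
and $\mathcal{E}(X|Y)$ denotes the RV that takes the value $\mathcal{E}(X|Y=y)$ when $Y = y$.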
We recall an
important result from [4], which states that conditioning reduces
entropy.
Theorem 3. For any $X$ and $Y$, $E[\mathcal{E}(X|Y)] \le \mathcal{E}(X)$, and equality holds if and only if $X$ is independent of $Y$.
As a consequence, if $X \to Y \to Z$ is a Markov chain, we have the data processing
inequality for GCRE: $E[\mathcal{E}(X|Y)] \le E[\mathcal{E}(X|Z)]$.
4. Entropy and Mutual Information Rates
4.1. Entropy Rate
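The GCRE of a
stochastic process $\mathbf{X} = (X_n)_{n \ge 1}$ is defined by
$$\mathcal{E}(\mathbf{X}) = \lim_{n \to \infty} E[\mathcal{E}(X_n|X_{n-1},\dots,X_1)]$$
when the limit exists.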
Theorem 4. For stationary processes, the
limit exists.
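Proof. Consider
$$E[\mathcal{E}(X_{n+1}|X_n,\dots,X_1)] \le E[\mathcal{E}(X_{n+1}|X_n,\dots,X_2)] = E[\mathcal{E}(X_n|X_{n-1},\dots,X_1)].$$
The inequality follows from the
fact that conditioning reduces entropy, and the equality follows from
stationarity (see [2] for the equivalent proof in the case of Shannon
entropy). The sequence $E[\mathcal{E}(X_n|X_{n-1},\dots,X_1)]$ is thus nonincreasing and nonnegative, so it converges.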
4.2. Mutual Information
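Let $X$ and $Y$ be two RVs. We define the cumulative mutual
information between $X$ and $Y$ as follows:
$$I(X, Y) = \mathcal{E}(X) - E[\mathcal{E}(X|Y)].$$
Theorem 5. $I(X, Y)$ is nonnegative and it vanishes if and only if $X$ and $Y$ are independent.
Proof. It is
clear from Theorem 3 that $I(X, Y)$ is nonnegative and that it vanishes if and only if $X$ and $Y$ are independent.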
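As an illustration of the cumulative mutual information, the Python sketch below estimates $\mathcal{E}(X) - E[\mathcal{E}(X|Y)]$ for a scalar $X$ and a discrete $Y$ by plugging empirical CCDFs into (2) and into the conditional GCRE, with the expectation over $Y$ replaced by empirical frequencies. This plug-in estimator and the names `empirical_gcre` and `cumulative_mi` are our own assumptions, not the estimator used in Section 6. Since the empirical marginal CCDF is the frequency-weighted average of the empirical conditional CCDFs and $-u\log u$ is concave, the estimate is nonnegative, in line with Theorem 5.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def phi(u: float) -> float:
    """-u*log(u), with the convention phi(0) = phi(1) = 0."""
    return 0.0 if u <= 0.0 or u >= 1.0 else -u * math.log(u)


def empirical_gcre(samples: Sequence[float]) -> float:
    """Plug-in GCRE: the empirical CCDF is (n - k)/n between order statistics k and k+1."""
    xs = sorted(samples)
    n = len(xs)
    return sum((xs[k] - xs[k - 1]) * phi((n - k) / n) for k in range(1, n))


def cumulative_mi(xs: Sequence[float], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X, Y) = GCRE(X) - E[GCRE(X | Y)] for a discrete Y."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)  # samples of X observed with Y = y
    n = len(xs)
    conditional = sum(len(g) / n * empirical_gcre(g) for g in groups.values())
    return empirical_gcre(xs) - conditional


# Identical conditional samples: X carries no information about Y, so I = 0.
print(cumulative_mi([0.0, 1.0, 2.0, 0.0, 1.0, 2.0], [0, 0, 0, 1, 1, 1]))
# Constant conditional samples: X determines Y, so I = GCRE(X) = log(2)/2.
print(cumulative_mi([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1]))
```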
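For a random vector $(X_1,\dots,X_n)$ of size $n$,
mutual information is defined by
$$I(X_1,\dots,X_n) = \sum_{i=2}^{n} I\big(X_i, (X_{i-1},\dots,X_1)\big) = \sum_{i=2}^{n}\Big(\mathcal{E}(X_i) - E[\mathcal{E}(X_i|X_{i-1},\dots,X_1)]\Big).$$
In the case of stochastic
processes $\mathbf{X} = (X_n)_{n \ge 1}$,
we consider $\frac{1}{n} I(X_1,\dots,X_n)$, whose limit exists for stationary processes.
Then the mutual information rate for $\mathbf{X}$ is defined as
$$I(\mathbf{X}) = \lim_{n \to \infty} \frac{1}{n} I(X_1,\dots,X_n) = \mathcal{E}(X_1) - \mathcal{E}(\mathbf{X}),$$
where $\mathcal{E}(X_1)$ is the marginal GCRE of the process $\mathbf{X}$.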
5. Maximum GCRE Distributions
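In this
section, we only consider the case of one-dimensional RVs ($N = 1$). The maximum entropy principle is useful in
many scientific areas, and most important distributions can be derived from it [8].
The maximum CRE distribution has been studied in [7]. For an RV $X$ with a symmetric CCDF in the sense of (8), we look for
the maximum GCRE distribution, that is, the CCDF $\bar F_X$ that solves the
following moment problem:
$$\max_{\bar F_X} \mathcal{E}(X) \quad \text{subject to} \quad \int_0^{\infty} g_i(x)\,\bar F_X(x)\,dx = c_i, \quad i = 1,\dots,m, \tag{22}$$
where $g_i$ and $c_i$, $i = 1,\dots,m$,
are fixed real-valued functions and real coefficients,
respectively. Power moment constraints take this form: for instance, for a symmetric continuous $X$, $E[X^2] = 4\int_0^\infty x\,\bar F_X(x)\,dx$. The solution of this problem is supplied by the following result.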
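Theorem 6. When the symmetry property (8) holds, the
solution of problem (22), when it can be reached, is of the form
$$\bar F_X(x) = \Big(1 + \exp\Big(\sum_{i=1}^m \lambda_i g_i(x)\Big)\Big)^{-1}, \ x \ge 0, \qquad \bar F_X(x) = 1 - \bar F_X(-x), \ x < 0,$$
where the $\lambda_i$ are Lagrange multipliers.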
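Proof. Assuming that $X$ has no atoms, (8) gives $\bar F_X(-x) = 1 - \bar F_X(x)$, so that $\mathcal{E}(X) = -\int_0^\infty \big[\bar F_X\log\bar F_X + (1 - \bar F_X)\log(1 - \bar F_X)\big]\,dx$. Let us define $L$ by
$$L(x, u, v) = -u\log u - (1-u)\log(1-u) - \sum_{i=1}^m \lambda_i g_i(x)\,u.$$
Then, since $L$ does not depend on its third argument $v$, which stands for $\bar F_X'(x)$,
the Euler-Lagrange equation [9] states that the solution of problem (22) is a solution of equation
$$L_2\big(x, \bar F_X(x), \bar F_X'(x)\big) = 0, \tag{25}$$
where $L_k$ is the partial derivative of $L$ with respect to component $k$.
From (25), we get
$$\log\frac{1 - \bar F_X(x)}{\bar F_X(x)} = \sum_{i=1}^m \lambda_i g_i(x).$$
Then, $\bar F_X(x) = \big(1 + \exp(\sum_{i=1}^m \lambda_i g_i(x))\big)^{-1}$ for $x \ge 0$.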
5.1. Example
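We set the
constraints $E[X] = 0$ and $E[X^2] = \sigma^2$.
Then the maximum GCRE symmetric solution for the CCDF of $X$ is given
by $\bar F_X(x) = (1 + e^{\lambda x})^{-1}$ for $x \in \mathbb{R}$,
which is the CCDF of a logistic distribution. The moment constraints lead to $\lambda = \pi/(\sigma\sqrt{3})$.
The corresponding PDF is defined on $\mathbb{R}$ by
$$f_X(x) = \frac{\lambda e^{\lambda x}}{(1 + e^{\lambda x})^2}.$$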
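The optimality of the logistic law in this example can be checked numerically. The Python sketch below (our own illustration; the value of `sigma`, the truncation range, the step count, and the helper names are arbitrary assumptions) evaluates (2) for the logistic CCDF $(1 + e^{\lambda x})^{-1}$ with $\lambda = \pi/(\sigma\sqrt{3})$ and for a zero-mean Gaussian with the same variance $\sigma^2$. For the logistic case, the substitution $u = \bar F_X(x)$ gives the closed form $\mathcal{E}(X) = \frac{1}{\lambda}\int_0^1 \frac{-\log u}{1-u}\,du = \frac{\pi^2}{6\lambda} = \frac{\pi\sigma}{2\sqrt{3}}$, and the Gaussian value should come out smaller. The script also recovers $E[X^2] = 4\int_0^\infty x\,\bar F_X(x)\,dx = \sigma^2$ for the logistic CCDF.

```python
import math
from typing import Callable


def phi(u: float) -> float:
    """-u*log(u), with the convention phi(0) = phi(1) = 0."""
    return 0.0 if u <= 0.0 or u >= 1.0 else -u * math.log(u)


def gcre(ccdf: Callable[[float], float], lo: float, hi: float, steps: int = 200_000) -> float:
    """Trapezoidal approximation of (2) for N = 1 on the truncated range [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (phi(ccdf(lo)) + phi(ccdf(hi)))
    for i in range(1, steps):
        total += phi(ccdf(lo + i * h))
    return total * h


def second_moment_symmetric(ccdf: Callable[[float], float], hi: float,
                            steps: int = 200_000) -> float:
    """E[X^2] = 4 * integral over [0, hi] of x * ccdf(x), for a symmetric continuous X."""
    h = hi / steps
    total = 0.5 * hi * ccdf(hi)  # the x = 0 endpoint contributes 0
    for i in range(1, steps):
        x = i * h
        total += x * ccdf(x)
    return 4.0 * total * h


def logistic_ccdf(x: float, lam: float) -> float:
    """1 / (1 + exp(lam * x)), written to avoid overflow for large |lam * x|."""
    t = lam * x
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


sigma = 1.5
lam = math.pi / (sigma * math.sqrt(3.0))  # from the constraint E[X^2] = sigma^2
span = 40.0 * sigma                        # truncation of the real line
logistic = gcre(lambda x: logistic_ccdf(x, lam), -span, span)
gaussian = gcre(lambda x: 0.5 * math.erfc(x / (sigma * math.sqrt(2.0))), -span, span)
closed_form = math.pi * sigma / (2.0 * math.sqrt(3.0))
m2 = second_moment_symmetric(lambda x: logistic_ccdf(x, lam), span)
print(f"logistic: {logistic:.5f} (closed form {closed_form:.5f}), Gaussian: {gaussian:.5f}")
print(f"logistic E[X^2] = {m2:.5f}, sigma^2 = {sigma ** 2:.5f}")
```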
5.2. Positive Random Variables
It has been
shown in [7] that the maximum CRE distribution (i.e., the maximum GCRE distribution under an
additional nonnegativity constraint) has a CCDF of the
form $\bar F_X(x) = \exp\big(-1 - \sum_{i=1}^m \lambda_i g_i(x)\big)$ for $x \ge 0$.
In [7], this result is derived from the log-sum inequality,
but it can also be derived from the Euler-Lagrange equation along the
same lines as in the proof of Theorem 6.
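With a positive support constraint and under first and
second moment constraints, since $E[X] = \int_0^\infty \bar F_X(x)\,dx$ and $E[X^2] = \int_0^\infty 2x\,\bar F_X(x)\,dx$, the exponent is affine in $x$, and the normalization $\bar F_X(0) = 1$ shows that the optimum CCDF is of the form $\bar F_X(x) = e^{-\lambda x}$ for $x \ge 0$.
Thus the solution, if it exists, is an exponential distribution. In fact, the
first and second power moment constraints must satisfy $E[X^2] = 2E[X]^2$,
otherwise the problem has no exact solution.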
6. Simulation Results
To
emphasize the potential practical interest of GCRE, we consider a simple system
identification problem. Here, we consider an process, denoted by ,
generated by a white noise and corrupted by a white noise : The model input and output are observed and the system model () is assumed to be known. We want to estimate
the coefficient without prior knowledge of the distributions
of and .
Thus, we resort to mutual information (MI) to estimate as the coefficient such that RVs and show the highest dependence. Shannon MI
between and is given by ,
where is Shannon differential entropy. Similarly,
for GCRE, MI will be defined as .
We compare estimation performance for by maximizing both and .
Since true values of and are not available, they are estimated from
empirical distributions of .
For simulations, we have chosen Gaussian and with a Laplace distribution: .
We consider an experiment with and noise variance equal to 0.2. Estimation is
carried out from observation of .
Here, optimization of the MIs is performed on a fixed regular grid of 200 points
over the interval [0,1]. Estimation performance is calculated from 200 successive
experiments. Estimation of from Shannon MI yields a bias of 0.032 and a standard
deviation of 0.18, whereas GCRE MI yields a bias of 0.004 and a standard
deviation of 0.06.
More importantly, Figure 1(a) shows that Shannon MI
estimates are much more irregular than GCRE MI estimates (Figure 1(b)) because
of the smoothing brought by density integration in the calculation of the CCDF. This
difference is important since an iterative local optimization
technique would in general fail to find the global optimum of the estimated
Shannon MI, because of its many local maxima.
Figure 1: Ten estimates of (a) and (b) .
Dotted line: mean estimate averaged over 200 realizations.
Of course, this drawback can be partly overcome by kernel
smoothing of the empirical distribution of ,
for instance by using the method proposed in [10]. However, we have checked that, for the above
example, very strong smoothing is necessary, and bias and variance
performance then remain worse than with the GCRE MI estimator.
7. Conclusion
We have shown
that the concept of cumulative residual entropy (CRE) introduced in [3, 4] can be extended
to distributions with general
supports. Generalized CRE (GCRE) shares many nice features of CRE. We also
pointed out specific properties of GCRE, such as its moment-constrained
maximum distributions, and we have illustrated the practical interest of
GCRE by showing how it can be used in system identification procedures.