Research Article  Open Access
The Unifying Frameworks of Information Measures
Abstract
Information measures provide fundamental methodologies for analyzing uncertainty and unveiling the substantive characteristics of random variables. In this paper, we address different types of entropies through generalized Kolmogorov-Nagumo averages, which lead us to propose the survival Rényi entropy and the survival Tsallis entropy. We then make an inventory of eight types of entropies and classify them into two categories: the density entropy, which is defined on density functions, and the survival entropy, which is defined on survival functions. This study demonstrates that, for each type of density entropy, there exists a corresponding type of survival entropy. Furthermore, similarity measures and normalized similarity measures are proposed for each type of entropy. In general, the functionals of different types of information-theoretic metrics are diverse, but they also exhibit some unifying features in all their manifestations. We present unifying frameworks for entropies, similarity measures, and normalized similarity measures, which help us treat the available information measures as a whole and move from one functional to another as various applications require.
1. Introduction
Measures of probabilistic uncertainty and information have attracted growing attention since Hartley introduced a practical measure of information as the logarithm of the amount of uncertainty associated with a finite set of possible symbol sequences, where all events are considered equally probable [1]. Today, entropy plays a basic role in the definitions of information measures, with applications in many different areas. It has been recognized as a fundamentally important field at the intersection of mathematics, communication, physics, computer science, economics, and so forth [2–5].
The generalized information theory arising from the study of complex systems was intended to expand classical information theory, which is based on probability. The additive probability measures inherent in classical information theory are extended to various types of nonadditive measures, which result in different types of functionals that generalize Shannon entropy [6–8]. In general, the formalization of uncertainty functions involves considerable diversity; however, it also exhibits some unifying features [9].
1.1. Entropies Defined on Density Functions
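We consider $X$ and $Y$ as continuous random variables (r.v.s) over a state space $\mathcal{X}\times\mathcal{Y}$ with joint density function $f_{XY}(x,y)$ and marginal density functions $f_X(x)$ and $f_Y(y)$. We also consider the conditional density function $f_{X|Y}(x|y)$ of $X$ given $Y$, defined over $\mathcal{X}\times\mathcal{Y}$. Note that $f(x)$, $f(y)$, and $f(x|y)$ are also used to mean $f_X(x)$, $f_Y(y)$, and $f_{X|Y}(x|y)$, respectively, when their meanings are clear from context.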
Let $f(x)$ be a density function of an r.v. $X$ with $\int_{\mathcal{X}} f(x)\,dx = 1$. The Khinchin axioms [10] characterize the Shannon entropy uniquely. However, this may be too restrictive if one wants to describe complex systems. Therefore, a generalized measure of information of an r.v. $X$ with respect to Kolmogorov-Nagumo (KN) averages [11] can be deduced as
$$H_\varphi(X) = \varphi^{-1}\left(\int_{\mathcal{X}} f(x)\,\varphi\!\left(\log\frac{1}{f(x)}\right)dx\right), \tag{1}$$
where $\varphi$ is a continuous and strictly monotonic KN function [12] and hence has an inverse $\varphi^{-1}$.
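The KN averages can be extended in different ways to obtain more general information measures. We use the $q$-logarithm function [13], given as
$$\ln_q x = \frac{x^{1-q}-1}{1-q}, \quad x > 0,\ q \neq 1, \qquad \ln_1 x := \log x, \tag{2}$$
to replace the logarithm function in (1). Note that $\ln_q x \to \log x$ as $q \to 1$, and that $\ln_q$ satisfies pseudoadditivity, namely $\ln_q(xy) = \ln_q x + \ln_q y + (1-q)\ln_q x\,\ln_q y$. Hence we extend KN averages to a generalized measure of information with respect to generalized KN averages, defined by
$$H_{\varphi,q}(X) = \varphi^{-1}\left(\int_{\mathcal{X}} f(x)\,\varphi\!\left(\ln_q\frac{1}{f(x)}\right)dx\right). \tag{3}$$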
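As a quick sanity check on the $q$-logarithm and its pseudoadditivity, here is a minimal Python sketch. It assumes the Tsallis form $\ln_q x = (x^{1-q}-1)/(1-q)$ with $\ln_1 x = \log x$; the function names `ln_q` and `pseudoadditivity_gap` are illustrative and not taken from the paper.

```python
import math


def ln_q(x: float, q: float) -> float:
    """Tsallis q-logarithm (x**(1-q) - 1) / (1-q) for x > 0; equals log(x) when q == 1."""
    if x <= 0:
        raise ValueError("ln_q is only defined for x > 0")
    if q == 1:
        return math.log(x)
    return (x ** (1 - q) - 1) / (1 - q)


def pseudoadditivity_gap(x: float, y: float, q: float) -> float:
    """ln_q(x*y) minus [ln_q(x) + ln_q(y) + (1-q) ln_q(x) ln_q(y)]; zero up to rounding."""
    a, b = ln_q(x, q), ln_q(y, q)
    return ln_q(x * y, q) - (a + b + (1 - q) * a * b)


if __name__ == "__main__":
    for q in (0.5, 1.0, 2.0):
        print(f"q={q}: ln_q(6)={ln_q(6.0, q):.6f}, gap={pseudoadditivity_gap(2.0, 3.0, q):.1e}")
```

At $q = 1$ the correction term vanishes and the identity reduces to ordinary additivity of the logarithm.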
In terms of Rényi’s generalization of the axioms of KN averages [14], if $q = 1$ and $\varphi(x) = x$ in (3), we obtain the Shannon entropy (SE) [15], defined as
$$H(X) = -\int_{\mathcal{X}} f(x)\log f(x)\,dx. \tag{4}$$
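Based on Shannon entropy, the Shannon mutual information (SMI) [15, 16] of r.v.s $X$ and $Y$ is given by
$$I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y), \tag{5}$$
where $H(X,Y)$ is the joint Shannon entropy of $X$ and $Y$ and $H(X|Y)$ is the conditional Shannon entropy of $X$ given $Y$.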
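To make SMI concrete, the sketch below estimates it numerically for a standard bivariate normal pair with correlation $\rho$. It relies on the standard identity $H(X)+H(Y)-H(X,Y) = \iint f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy$ and on the known closed form $I(X;Y) = -\tfrac12\log(1-\rho^2)$ for this case; the grid size, truncation to $[-8,8]^2$, and helper names are assumptions of this sketch.

```python
import math


def normal_pdf(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)


def bivariate_normal_pdf(x: float, y: float, rho: float) -> float:
    """Density of a standard bivariate normal with correlation rho (|rho| < 1)."""
    det = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def shannon_mutual_information(rho: float, half_width: float = 8.0, n: int = 400) -> float:
    """Midpoint-rule estimate of the double integral of f(x,y) log[f(x,y) / (f(x) f(y))]
    over the square [-half_width, half_width]^2."""
    h = 2.0 * half_width / n
    grid = [-half_width + (i + 0.5) * h for i in range(n)]
    marginal = [normal_pdf(t) for t in grid]  # f_X and f_Y coincide here
    total = 0.0
    for x, fx in zip(grid, marginal):
        for y, fy in zip(grid, marginal):
            fxy = bivariate_normal_pdf(x, y, rho)
            if fxy > 0.0:
                total += fxy * math.log(fxy / (fx * fy))
    return total * h * h


rho = 0.6
print(shannon_mutual_information(rho), -0.5 * math.log(1 - rho**2))
```

For $\rho = 0$ the joint density factorizes and the estimate is zero up to rounding, matching independence.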
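If $q = 1$ and $\varphi(x) = e^{(1-\alpha)x}$ is chosen in (3), we obtain the Rényi entropy (RE) [14], defined by
$$R_\alpha(X) = \frac{1}{1-\alpha}\log\int_{\mathcal{X}} f^\alpha(x)\,dx, \tag{6}$$
where $\alpha > 0$ and $\alpha \neq 1$.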
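Shannon entropy and Rényi entropy are additive. If $q = \alpha$ and $\varphi(x) = x$ in (3), we get the pseudoadditive entropy, or Tsallis entropy (TE) [17], defined by
$$S_\alpha(X) = \frac{1}{1-\alpha}\left(\int_{\mathcal{X}} f^\alpha(x)\,dx - 1\right), \tag{7}$$
where $\alpha > 0$ and $\alpha \neq 1$.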
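The following Python sketch evaluates the generalized KN average numerically and checks that the three choices of $(q, \varphi)$ above recover SE, RE, and TE. It assumes the average takes the form $\varphi^{-1}\big(\int f(x)\,\varphi(\ln_q(1/f(x)))\,dx\big)$, approximates the integral with a midpoint rule on a truncated interval, and uses an exponential density with rate 2 as the test case; `generalized_kn_average` is a hypothetical helper name.

```python
import math
from typing import Callable


def ln_q(x: float, q: float) -> float:
    """Tsallis q-logarithm; the natural logarithm when q == 1."""
    return math.log(x) if q == 1 else (x ** (1 - q) - 1) / (1 - q)


def generalized_kn_average(
    f: Callable[[float], float],
    phi: Callable[[float], float],
    phi_inv: Callable[[float], float],
    q: float,
    a: float,
    b: float,
    n: int = 100_000,
) -> float:
    """phi^{-1}( integral over [a, b] of f(x) * phi(ln_q(1 / f(x))) dx ), midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        fx = f(a + (i + 0.5) * h)
        if fx > 0.0:  # convention: points where f(x) = 0 contribute nothing
            total += fx * phi(ln_q(1.0 / fx, q))
    return phi_inv(total * h)


def identity(t: float) -> float:
    return t


lam, alpha = 2.0, 0.5


def density(x: float) -> float:
    """Exponential density with rate lam; [0, 40] captures essentially all its mass."""
    return lam * math.exp(-lam * x)


# q = 1, phi(x) = x  -> Shannon entropy
shannon = generalized_kn_average(density, identity, identity, 1.0, 0.0, 40.0)
# q = 1, phi(x) = exp((1 - alpha) x)  -> Rényi entropy
renyi = generalized_kn_average(
    density,
    lambda t: math.exp((1 - alpha) * t),
    lambda y: math.log(y) / (1 - alpha),
    1.0, 0.0, 40.0,
)
# q = alpha, phi(x) = x  -> Tsallis entropy
tsallis = generalized_kn_average(density, identity, identity, alpha, 0.0, 40.0)

print(shannon, 1 - math.log(lam))
print(renyi, -math.log(lam) - math.log(alpha) / (1 - alpha))
print(tsallis, (lam ** (alpha - 1) / alpha - 1) / (1 - alpha))
```

The reference values $1-\log\lambda$, $-\log\lambda-\log\alpha/(1-\alpha)$, and $(\lambda^{\alpha-1}/\alpha-1)/(1-\alpha)$ follow from $\int_0^\infty \lambda^\alpha e^{-\alpha\lambda x}\,dx = \lambda^{\alpha-1}/\alpha$.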
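We obtain $\lim_{\alpha\to 1} R_\alpha(X) = H(X)$ and $\lim_{\alpha\to 1} S_\alpha(X) = H(X)$. Therefore, Rényi entropy and Tsallis entropy can be viewed as interpolating between the Shannon entropy and the Hartley entropy ($\alpha = 0$). A relation between the Rényi and Tsallis entropies is easily deduced:
$$S_\alpha(X) = \frac{e^{(1-\alpha)R_\alpha(X)} - 1}{1-\alpha}. \tag{8}$$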
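For a Gaussian density the relevant integrals are available in closed form, so the Rényi-Tsallis relation and the limit $\alpha \to 1$ can be checked directly. This sketch assumes $X \sim N(0, \sigma^2)$, for which $\int f^\alpha(x)\,dx = (2\pi\sigma^2)^{(1-\alpha)/2}\alpha^{-1/2}$; the function names are hypothetical.

```python
import math


def renyi_gaussian(sigma: float, alpha: float) -> float:
    """R_alpha of N(0, sigma^2): (1/2) log(2*pi*sigma^2) - log(alpha) / (2 (1 - alpha))."""
    return 0.5 * math.log(2 * math.pi * sigma**2) - math.log(alpha) / (2 * (1 - alpha))


def tsallis_gaussian(sigma: float, alpha: float) -> float:
    """S_alpha of N(0, sigma^2), using the closed-form integral of f^alpha."""
    power_integral = (2 * math.pi * sigma**2) ** ((1 - alpha) / 2) / math.sqrt(alpha)
    return (power_integral - 1) / (1 - alpha)


def shannon_gaussian(sigma: float) -> float:
    """Differential Shannon entropy of N(0, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


def tsallis_from_renyi(renyi: float, alpha: float) -> float:
    """S_alpha = (exp((1 - alpha) R_alpha) - 1) / (1 - alpha)."""
    return (math.exp((1 - alpha) * renyi) - 1) / (1 - alpha)


sigma = 1.5
for alpha in (0.5, 2.0, 1 - 1e-6, 1 + 1e-6):
    r, s = renyi_gaussian(sigma, alpha), tsallis_gaussian(sigma, alpha)
    print(f"alpha={alpha}: R={r:.6f}  S={s:.6f}  S from R={tsallis_from_renyi(r, alpha):.6f}")
print("Shannon:", shannon_gaussian(sigma))
```

Near $\alpha = 1$ both entropies agree with the Shannon value to within about $10^{-6}$, which illustrates the interpolation property numerically.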
More recently, interest in generalized information measures has increased dramatically. A considerable number of nonclassical entropies beyond Shannon entropy, Rényi entropy, and Tsallis entropy have been developed in the study of complex systems.
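The exponential entropy (EE) of order $\alpha$ [18] is defined by
$$E_\alpha(X) = \left(\int_{\mathcal{X}} f^\alpha(x)\,dx\right)^{\frac{1}{1-\alpha}}, \tag{9}$$
where $\alpha > 0$ and $\alpha \neq 1$.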
We obtain $\lim_{\alpha\to 1} E_\alpha(X) = e^{H(X)}$ and $\log E_\alpha(X) = R_\alpha(X)$.
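A numerical sketch of the exponential entropy $E_\alpha(X) = \left(\int f^\alpha\right)^{1/(1-\alpha)}$ for a standard normal density, checking that its logarithm matches the Rényi entropy and that it approaches $e^{H(X)}$ as $\alpha \to 1$. The midpoint-rule integration on $[-10, 10]$ and the helper names are assumptions of this sketch.

```python
import math


def power_integral(f, alpha: float, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f(x)**alpha over [a, b]."""
    h = (b - a) / n
    return h * math.fsum(f(a + (i + 0.5) * h) ** alpha for i in range(n))


def exponential_entropy(f, alpha: float, a: float, b: float) -> float:
    """E_alpha = (integral of f^alpha) ** (1 / (1 - alpha))."""
    return power_integral(f, alpha, a, b) ** (1.0 / (1.0 - alpha))


def renyi_entropy(f, alpha: float, a: float, b: float) -> float:
    """R_alpha = log(integral of f^alpha) / (1 - alpha)."""
    return math.log(power_integral(f, alpha, a, b)) / (1.0 - alpha)


def std_normal(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


shannon = 0.5 * math.log(2 * math.pi * math.e)  # H(X) for N(0, 1)
for alpha in (0.5, 2.0, 1 - 1e-5):
    e_alpha = exponential_entropy(std_normal, alpha, -10.0, 10.0)
    r_alpha = renyi_entropy(std_normal, alpha, -10.0, 10.0)
    print(f"alpha={alpha}: E={e_alpha:.6f}  log E={math.log(e_alpha):.6f}  R={r_alpha:.6f}")
print("exp(H) =", math.exp(shannon))
```

`math.fsum` is used for the summation because, close to $\alpha = 1$, the exponent $1/(1-\alpha)$ amplifies any rounding error in the integral.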
1.2. Entropies Defined on Survival Functions
As discussed in [19], information measures defined on the density function suffer from several drawbacks, since the distribution function is more regular than the density function. Therefore, the cumulative residual entropy, which is defined on the cumulative distribution function or, equivalently, on the survival function, was proposed as an alternative measure of uncertainty.
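Let $X = (X_1, \ldots, X_N)$ be a nonnegative r.v. in $\mathbb{R}_+^N$. We use the notation $X > x$ to mean $X_i > x_i$ for $i = 1, \ldots, N$. The multivariate survival function of a nonnegative r.v. $X$ is given as
$$\bar{F}(x) = P(X > x) = P(X_1 > x_1, \ldots, X_N > x_N),$$
where $x = (x_1, \ldots, x_N)$ with $x \in \mathbb{R}_+^N$.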
If the density function is replaced by the survival function, $q$ is set to 1, and $\varphi(x) = x$ in (3), we obtain the survival Shannon entropy (SSE) [19], defined as
$$\varepsilon(X) = -\int_{\mathbb{R}_+^N} \bar{F}(x)\log\bar{F}(x)\,dx.$$
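A minimal sketch of the survival Shannon entropy in the univariate case ($N = 1$), assuming $\varepsilon(X) = -\int_0^\infty \bar{F}(x)\log\bar{F}(x)\,dx$ and integrating with a midpoint rule on a truncated range. For an exponential r.v. with rate $\lambda$ the exact value is $1/\lambda$, and for a uniform r.v. on $(0, 1)$ it is $1/4$, which gives two easy checks; the helper name is hypothetical.

```python
import math
from typing import Callable


def survival_shannon_entropy(
    survival: Callable[[float], float], upper: float, n: int = 100_000
) -> float:
    """-integral over [0, upper] of S(x) log S(x) dx for a univariate nonnegative r.v."""
    h = upper / n
    total = 0.0
    for i in range(n):
        s = survival((i + 0.5) * h)
        if s > 0.0:  # S log S -> 0 as S -> 0, so zero-survival points are skipped
            total -= s * math.log(s)
    return total * h


lam = 2.0
expo = survival_shannon_entropy(lambda x: math.exp(-lam * x), upper=40.0)
unif = survival_shannon_entropy(lambda x: max(0.0, 1.0 - x), upper=1.0)
print(expo, 1 / lam)  # Exponential(rate lam): SSE = 1/lam
print(unif, 0.25)     # Uniform(0, 1): SSE = 1/4
```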
Since eight different types of entropies and their corresponding similarity measures are discussed in what follows, we point out that some notations and names of existing information measures are changed throughout this paper to agree with the unifying frameworks.
To define the conditional survival entropy, we denote by $F(x|y)$ the conditional distribution function of $X$ given $Y = y$, and by $\bar{F}(x|y) = P(X > x \mid Y = y)$ the corresponding conditional survival function.
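The cross survival Shannon entropy (CSSE) of r.v.s $X$ and $Y$ was given by [19]
$$C(X,Y) = \varepsilon(X) - E_Y\left[\varepsilon(X|Y)\right], \tag{13}$$
where $\varepsilon(X|Y)$ is the conditional survival Shannon entropy of $X$ given $Y$, defined as [19]
$$\varepsilon(X|Y) = -\int_{\mathbb{R}_+^N} \bar{F}(x|Y)\log\bar{F}(x|Y)\,dx,$$
and $E_Y$ denotes the expectation with respect to the r.v. $Y$. The nonnegativity of CSSE was proven in [19], and CSSE has therefore been used as a similarity measure in image registration [20]. Generalized versions of SSE for dynamic systems were discussed in [21, 22].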
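If the density function in (9) is replaced by the survival function, we obtain the survival exponential entropy (SEE) [23] of an r.v. $X$ with order $\alpha$, given by
$$M_\alpha(X) = \left(\int_{\mathbb{R}_+^N} \bar{F}^\alpha(x)\,dx\right)^{\frac{1}{1-\alpha}},$$
where $\alpha > 0$ and $\alpha \neq 1$.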
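The survival exponential entropy can be checked the same way. This sketch treats only the univariate case and assumes $M_\alpha(X) = \left(\int_0^\infty \bar{F}^\alpha(x)\,dx\right)^{1/(1-\alpha)}$; for an exponential r.v. with rate $\lambda$ the integral is $1/(\alpha\lambda)$, which provides the closed-form reference. The function names are hypothetical.

```python
import math
from typing import Callable


def survival_exponential_entropy(
    survival: Callable[[float], float], alpha: float, upper: float, n: int = 100_000
) -> float:
    """(integral over [0, upper] of S(x)**alpha dx) ** (1/(1-alpha)), midpoint rule."""
    h = upper / n
    integral = h * math.fsum(survival((i + 0.5) * h) ** alpha for i in range(n))
    return integral ** (1.0 / (1.0 - alpha))


def see_exponential_closed_form(lam: float, alpha: float) -> float:
    """Exponential(rate lam): the integral of exp(-alpha*lam*x) over [0, inf) is 1/(alpha*lam)."""
    return (1.0 / (alpha * lam)) ** (1.0 / (1.0 - alpha))


for lam, alpha in ((1.0, 0.5), (2.0, 3.0)):
    numeric = survival_exponential_entropy(lambda x: math.exp(-lam * x), alpha, upper=60.0)
    print(lam, alpha, numeric, see_exponential_closed_form(lam, alpha))
```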
As an ongoing research program, the study of generalized information measures offers us a steadily growing inventory of distinct entropy theories. Diversity and unity are two significant features of these theories. The growing diversity of information measures makes it increasingly likely that an information measure suited to a given condition can be found. The unity allows us to view all available information measures as a whole and to move from one measure to another as needed. To that end, motivated by the approaches used to study Shannon entropy, Shannon mutual information [2], SSE [19], and SEE [23], we study information-theoretic metrics in their various manifestations. On the one hand, we propose several new types of entropies and their similarity measures; on the other hand, for each existing type of entropy except Shannon entropy, we define the corresponding similarity measures (see Tables 1 and 2). Finally, we deduce unifying frameworks for the information measures emerging from the study of complex systems based on probability.


The remainder of this paper is organized as follows. Section 2 will propose the similarity measures defined on the density function. In Section 3, the survival Rényi entropy and survival Tsallis entropy are presented. In Section 4, we address the similarity measures defined on the survival function. The unifying frameworks of information measures and examples are provided in Section 5. Finally, we conclude this paper in Section 6.
2. Similarity Measures Defined on the Density Function
Shannon mutual information measures the information that an r.v. $X$ conveys about another r.v. $Y$. It has been widely used in image registration [24, 25] and pattern recognition [26, 27]. Since SMI is defined on Shannon entropy, each type of entropy should, in general, lead to corresponding similarity measures. In applications, an ideal similarity measure should be nonnegative. To that end, we follow [15, 19, 23] and define the similarity measures by the linear expectation operator rather than by the KN average operator weighted by the escort distribution [28]. This section presents the similarity measures defined on the density function corresponding to Rényi entropy, Tsallis entropy, and the exponential entropy, respectively.
2.1. Rényi Mutual Information
Lemma 1. Let $X$ and $Y$ be r.v.s and let $\varphi$ be a real convex function. Then If $\varphi$ is strictly convex, equality holds if and only if $X$ and $Y$ are independent. If $\varphi$ is concave, the inequality is reversed.
Proof. For a real convex function $\varphi$, using Jensen’s inequality [29], we obtain Equality holds if $X$ and $Y$ are independent. Since , it is immediate that Now consider the equality condition of the lemma. If $X$ and $Y$ are independent, then $f(x|y) = f(x)$, and thus equality holds in (15). Conversely, if equality holds in (15), then equality holds in Jensen’s inequality (16), which leads to . Then almost surely, since $\varphi$ is strictly convex. We obtain by (18); hence , which leads to the independence of $X$ and $Y$.
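The displayed inequality of Lemma 1 did not survive in this copy, so the following sketch checks only a discrete analogue of the Jensen step that the proof relies on, under our own assumption about its form: for convex $\varphi$ and a mixture $p(x) = \sum_y p(y)\,p(x|y)$, we have $\sum_x \varphi(p(x)) \le \sum_y p(y)\sum_x \varphi(p(x|y))$, with equality when $p(x|y)$ does not depend on $y$. The helper `mixture_jensen_gap` and the example distributions are hypothetical.

```python
import math


def mixture_jensen_gap(p_y, p_x_given_y, phi):
    """Return sum_y p(y) sum_x phi(p(x|y)) - sum_x phi(p(x)), where p(x) = sum_y p(y) p(x|y).
    Jensen's inequality, applied pointwise in x, makes this >= 0 for convex phi."""
    n_x = len(p_x_given_y[0])
    p_x = [sum(py * row[k] for py, row in zip(p_y, p_x_given_y)) for k in range(n_x)]
    lhs = sum(py * sum(phi(v) for v in row) for py, row in zip(p_y, p_x_given_y))
    rhs = sum(phi(v) for v in p_x)
    return lhs - rhs


p_y = [0.3, 0.7]
dependent = [[0.9, 0.1], [0.2, 0.8]]    # rows are p(x | y) for each y
independent = [[0.4, 0.6], [0.4, 0.6]]  # p(x | y) does not depend on y

print(mixture_jensen_gap(p_y, dependent, lambda t: t * t))    # convex phi: positive gap
print(mixture_jensen_gap(p_y, independent, lambda t: t * t))  # independence: zero gap
print(mixture_jensen_gap(p_y, dependent, math.sqrt))          # concave phi: negative gap
```

The sign flip for the concave choice $\varphi(t) = \sqrt{t}$ mirrors the last sentence of Lemma 1.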
Lemma 1 plays an important role in proving the nonnegativity of the similarity measures introduced below, which are defined on the density function.
Definition 2. Let $X$ and $Y$ be r.v.s; the conditional Rényi entropy of $X$ given $Y$ with order $\alpha$ is defined by where $\alpha > 0$ and $\alpha \neq 1$.
Motivated by the definitions of the joint Shannon entropy and the joint survival Shannon entropy, the joint Rényi entropy can be introduced similarly.
Definition 3. The joint Rényi entropy of r.v.s $X$ and $Y$ with order $\alpha$ is defined as , where $\alpha > 0$ and $\alpha \neq 1$.
Theorem 4. For r.v.s X and Y, we obtain and for all and .
Proof. Since and is concave of for all , using Lemma 1 and Jensen’s inequality, we have Since and is convex of for all , using Lemma 1 and Jensen’s inequality, we obtain Similarly, we have , and thus .
Definition 5. The Rényi mutual information (RMI) of r.v.s $X$ and $Y$ with order $\alpha$ is defined as where $\alpha > 0$ and $\alpha \neq 1$.
It is worth pointing out that the definition of RMI parallels the definitions of SMI (5) and CSSE (13). The nonnegativity of RMI is ensured by Theorem 4. Since Theorem 4 parallels (5), we can give another form of the definition of RMI as There are no essential differences between these two forms of the definition. Throughout this paper, we consider only definitions of the form (24) for similarity measures.
Using L’Hôpital’s rules, it is easy to obtain and .
The normalized Shannon mutual information (NSMI) [16] of r.v.s and was given as NSMI often acts as a robust similarity measure in image registration [16, 30], attribute abstraction [31], and clustering [32]. Note that . In a similar way, different forms of the normalized mutual information will be deduced in this work.
Definition 6. The normalized Rényi mutual information (NRMI) of r.v.s $X$ and $Y$ with order $\alpha$ is defined by where $\alpha > 0$ and $\alpha \neq 1$.
We immediately obtain by L’Hôpital’s rules.
2.2. Tsallis Mutual Information
Definition 7. The conditional Tsallis entropy of r.v. $X$ given $Y$ with order $\alpha$ is defined as where $\alpha > 0$ and $\alpha \neq 1$.
Definition 8. The joint Tsallis entropy of r.v.s $X$ and $Y$ with order $\alpha$ is defined as , where $\alpha > 0$ and $\alpha \neq 1$.
Theorem 9. For two r.v.s X and Y, we have and for all and .
Proof. Since and is concave of for all , using Lemma 1, we obtain The inequality holds for all , since and is convex of .
It is trivial to verify that .
Definition 10. The Tsallis mutual information (TMI) of r.v.s $X$ and $Y$ with order $\alpha$ is defined as , where $\alpha > 0$ and $\alpha \neq 1$.
Using Theorem 9, we have .
Definition 11. The normalized Tsallis mutual information (NTMI) of r.v.s $X$ and $Y$ with order $\alpha$ is defined by where $\alpha > 0$ and $\alpha \neq 1$.
It is easy to verify that by L’Hôpital’s rule, and .
2.3. Exponential Mutual Information
Definition 12. The conditional exponential entropy of r.v. $X$ given $Y$ with order $\alpha$ is defined by where $\alpha > 0$ and $\alpha \neq 1$.
Definition 13. The joint exponential entropy of r.v.s $X$ and $Y$ with order $\alpha$ is defined as , where $\alpha > 0$ and $\alpha \neq 1$.
Theorem 14. For two r.v.s X and Y, for all and .
Proof. Since is concave of for all , using Lemma 1, we obtain . The equality is true, since for all .
Similarly, since is convex of for all , using Lemma 1, we obtain . We complete the proof by considering that is decreasing in for all .
Theorem 15. For two r.v.s X and Y, we obtain and for all and .
Proof. Since is convex of for all , using (30) and Jensen’s inequality, we obtain Since is concave and is decreasing in for all , similarly we obtain It is trivial to verify that .
Definition 16. The exponential mutual information (EMI) of r.v.s $X$ and $Y$ with order $\alpha$ is defined as , where $\alpha > 0$ and $\alpha \neq 1$.
Using Theorem 15, we have .
Definition 17. The normalized exponential mutual information (NEMI) of r.v.s $X$ and $Y$ with order $\alpha$ is defined by where $\alpha > 0$ and $\alpha \neq 1$.
3. Entropies Defined on the Survival Function
The existing survival Shannon entropy and survival exponential entropy extend the corresponding functionals from the density function to the survival function. In this section, we propose the survival Rényi entropy and the survival Tsallis entropy, defined on the survival function, which parallel the classical Rényi entropy and Tsallis entropy defined on the density function, respectively.
3.1. Survival Rényi Entropy
If the density function is replaced by the survival function, $q$ is set to 1, and $\varphi(x) = e^{(1-\alpha)x}$ is chosen in (3), we obtain the survival Rényi entropy.
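Definition 18. The survival Rényi entropy (SRE) of an r.v. $X$ with order $\alpha$ is defined as
$$\bar{R}_\alpha(X) = \frac{1}{1-\alpha}\log\int_{\mathbb{R}_+^N} \bar{F}^\alpha(x)\,dx,$$
where $\alpha > 0$ and $\alpha \neq 1$.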
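It is immediate to see that $\bar{R}_\alpha(X) = \log M_\alpha(X)$; that is, SRE is the logarithm of the survival exponential entropy.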
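A small numerical illustration of the relation between SRE and SEE, in the univariate case only. It assumes $\bar{R}_\alpha(X) = \frac{1}{1-\alpha}\log\int_0^\infty \bar{F}^\alpha(x)\,dx$, obtained by replacing $f$ with $\bar{F}$ in the Rényi entropy, and uses an exponential survival function with rate $\lambda = 2$, for which $\bar{R}_\alpha = -\log(\alpha\lambda)/(1-\alpha)$. The helper names are hypothetical.

```python
import math
from typing import Callable


def survival_power_integral(
    survival: Callable[[float], float], alpha: float, upper: float, n: int = 100_000
) -> float:
    """Midpoint-rule approximation of the integral of S(x)**alpha over [0, upper]."""
    h = upper / n
    return h * math.fsum(survival((i + 0.5) * h) ** alpha for i in range(n))


def survival_renyi_entropy(survival, alpha: float, upper: float) -> float:
    """SRE = log(integral of S^alpha) / (1 - alpha)."""
    return math.log(survival_power_integral(survival, alpha, upper)) / (1.0 - alpha)


def survival_exponential_entropy(survival, alpha: float, upper: float) -> float:
    """SEE = (integral of S^alpha) ** (1 / (1 - alpha))."""
    return survival_power_integral(survival, alpha, upper) ** (1.0 / (1.0 - alpha))


lam = 2.0


def exp_survival(x: float) -> float:
    """Survival function of an exponential r.v. with rate lam."""
    return math.exp(-lam * x)


for alpha in (0.5, 3.0):
    sre = survival_renyi_entropy(exp_survival, alpha, 40.0)
    see = survival_exponential_entropy(exp_survival, alpha, 40.0)
    print(alpha, sre, math.log(see), -math.log(alpha * lam) / (1 - alpha))
```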
Definition 19. The conditional survival Rényi entropy of r.v. $X$ given $Y$ with order $\alpha$ is defined as where $\alpha > 0$ and $\alpha \neq 1$.
Definition 20. The joint survival Rényi entropy of r.v.s $X$ and $Y$ with order $\alpha$ is defined as , where $\alpha > 0$ and $\alpha \neq 1$.
Definition 21. The joint survival exponential entropy of r.v.s $X$ and $Y$ with order $\alpha$ is defined as , where $\alpha > 0$, $\alpha \neq 1$, and is the conditional survival exponential entropy of r.v. $X$ given $Y$.
Theorem 22 will show the relation between SRE and Shannon entropy.
Theorem 22. For an r.v. , one has
Proof. Using the log-sum inequality [2, 19], we obtain and thus . We obtain Since , (38) can be written as Note that for all and for all . We complete the proof by multiplying on both sides of (39).
Theorem 23. Let and be two r.v.s; then
Proof. If , using Theorem 22, one has Similarly, the inequality in (41) holds for all .
Since SRE is the logarithm of SEE, some properties of SRE can be deduced by the same proof approaches as those for SEE [23] and SSE [19]. We list these properties as propositions and omit their proofs unless we can provide improved, different, or more concise versions.
Proposition 24. If , for some , , then for all and .
Proof. For sets