Mathematical Problems in Engineering
Volume 2018, Article ID 1791954, 16 pages
https://doi.org/10.1155/2018/1791954
Research Article

The Unifying Frameworks of Information Measures

School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China

Correspondence should be addressed to Ting-Zhu Huang; tingzhuhuang@126.com

Received 23 July 2017; Accepted 6 February 2018; Published 8 March 2018

Academic Editor: Zhen-Lai Han

Copyright © 2018 Shiwei Yu and Ting-Zhu Huang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Information measures provide us with fundamental methodologies to analyze uncertainty and unveil the substantive characteristics of random variables. In this paper, we address the issues of different types of entropies through $q$-generalized Kolmogorov-Nagumo averages, which lead to the propositions of the survival Rényi entropy and survival Tsallis entropy. We thereby make an inventory of eight types of entropies and classify them into two categories: the density entropies, defined on density functions, and the survival entropies, defined on survival functions. This study demonstrates that, for each type of density entropy, there exists a kind of survival entropy corresponding to it. Furthermore, similarity measures and normalized similarity measures are proposed for each type of entropy. Generally, the functionals of different types of information-theoretic metrics are quite diverse, while, simultaneously, they also exhibit some unifying features in all their manifestations. We present unifying frameworks for entropies, similarity measures, and normalized similarity measures, which help us deal with the available information measures as a whole and move from one functional to another in harmony with various applications.

1. Introduction

Measures of probabilistic uncertainty and information have attracted growing attention since Hartley introduced the practical measure of information as the logarithm of the amount of uncertainty associated with finite possible symbol sequences, where the distribution of events is considered to be equally probable [1]. Today, entropy plays a basic role in the definitions of information measures, with various applications in different areas. It has been recognized as a fundamentally important field intersecting with mathematics, communication, physics, computer science, economics, and so forth [2–5].

The generalized information theory arising from the study of complex systems was intended to expand classical information theory based on probability. The additive probability measures, which are inherent in classical information theory, are extended to various types of nonadditive measures and thus result in different types of functionals that generalize Shannon entropy [6–8]. Generally, the formalization of uncertainty functions involves a considerable diversity. However, it also exhibits some unifying features [9].

1.1. Entropies Defined on Density Functions

We consider $X, Y$ as continuous random variables (r.v.s) over a state space $\mathcal{X} \times \mathcal{Y}$ with joint density function $p(x, y)$ and marginal density functions $p(x)$ and $p(y)$. We also consider the conditional density function $p(x \mid y)$ of $X$ given $Y$ defined over $\mathcal{X} \times \mathcal{Y}$. Note that $p(x)$, $p(y)$, and $p(x, y)$ are also used to mean $p_X(x)$, $p_Y(y)$, and $p_{X,Y}(x, y)$, respectively, if their meanings are clear in context.

Let $p(x)$ be a density function of an r.v. $X$ with $\int p(x)\,dx = 1$. The Khinchin axioms [10] are capable of obtaining the Shannon entropy in a unique way. However, this may be too restrictive if one wants to describe complex systems. Therefore, a generalized measure of an r.v. $X$ with respect to Kolmogorov-Nagumo (KN) averages [11] can be deduced as
$$H_\phi(X) = \phi^{-1}\left(\int p(x)\,\phi\left(\log\frac{1}{p(x)}\right)dx\right), \quad (1)$$
where $\phi$ is a continuous and strictly monotonic KN function [12] and hence has an inverse $\phi^{-1}$.

The KN averages can be extended in different manners to propose more generalized information measures. We use the $q$-logarithm function [13], given as
$$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q}, \quad x > 0,\ q \neq 1, \quad (2)$$
to replace the logarithm function in (1). Note that $\lim_{q \to 1}\ln_q(x) = \ln x$ and that $\ln_q$ satisfies pseudoadditivity; that is, $\ln_q(xy) = \ln_q x + \ln_q y + (1-q)\ln_q x \ln_q y$. Hence we extend KN averages to a generalized measure of information with respect to $q$-generalized KN averages defined by
$$H_{\phi,q}(X) = \phi^{-1}\left(\int p(x)\,\phi\left(\ln_q\frac{1}{p(x)}\right)dx\right). \quad (3)$$
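The $q$-logarithm and its pseudoadditivity can be checked numerically; the following is a minimal sketch (the function name `ln_q` is ours, not from the paper):

```python
import math

def ln_q(x, q):
    """q-logarithm: ln_q(x) = (x^(1-q) - 1) / (1 - q); reduces to log as q -> 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

# Pseudoadditivity: ln_q(xy) = ln_q(x) + ln_q(y) + (1-q) ln_q(x) ln_q(y)
x, y, q = 2.0, 3.0, 0.7
lhs = ln_q(x * y, q)
rhs = ln_q(x, q) + ln_q(y, q) + (1.0 - q) * ln_q(x, q) * ln_q(y, q)
print(abs(lhs - rhs) < 1e-12)  # True

# ln_q recovers the natural logarithm as q -> 1
print(abs(ln_q(5.0, 1.0 + 1e-9) - math.log(5.0)) < 1e-6)  # True
```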

In terms of Rényi’s generalization of the axioms of KN averages [14], if $q \to 1$ and $\phi(x) = x$ in (3), it yields the Shannon entropy (SE) [15], defined as
$$H(X) = -\int p(x)\log p(x)\,dx. \quad (4)$$

Based on Shannon entropy, the Shannon mutual information (SMI) [15, 16] of r.v.s $X$ and $Y$ was given by
$$I(X; Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y), \quad (5)$$
where $H(X, Y)$ is the joint Shannon entropy of $X$ and $Y$ and $H(X \mid Y)$ is the conditional Shannon entropy of $X$ given $Y$.

If $q \to 1$ and $\phi$ is chosen as an exponential function in (3), it yields the Rényi entropy (RE) [14], defined by
$$H_\alpha(X) = \frac{1}{1-\alpha}\log\int p^\alpha(x)\,dx, \quad (6)$$
where $\alpha > 0$ and $\alpha \neq 1$.

Shannon entropy and Rényi entropy are additive. If $\phi(x) = x$ and $q = \alpha$ in (3), we get the pseudoadditive entropy, or Tsallis entropy (TE) [17], defined by
$$T_\alpha(X) = \frac{1}{\alpha-1}\left(1 - \int p^\alpha(x)\,dx\right), \quad (7)$$
where $\alpha > 0$ and $\alpha \neq 1$.

We obtain $\lim_{\alpha\to1}H_\alpha(X) = H(X)$ and $\lim_{\alpha\to1}T_\alpha(X) = H(X)$. Therefore, Rényi entropy and Tsallis entropy can be viewed as interpolation formulas between the Shannon entropy and the Hartley entropy (the case $\alpha \to 0$). A relation between the Rényi and Tsallis entropies can be easily deduced as
$$T_\alpha(X) = \frac{e^{(1-\alpha)H_\alpha(X)} - 1}{1-\alpha}. \quad (8)$$
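The relation between the Rényi and Tsallis entropies, $T_\alpha = (e^{(1-\alpha)H_\alpha} - 1)/(1-\alpha)$, and their common Shannon limit can be verified numerically. The sketch below uses discrete analogues of the continuous functionals (helper names are ours):

```python
import math

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi(p, a):
    return math.log(sum(pi ** a for pi in p)) / (1.0 - a)

def tsallis(p, a):
    return (1.0 - sum(pi ** a for pi in p)) / (a - 1.0)

p = [0.5, 0.3, 0.2]
a = 0.8

# Relation between the two entropies: T_a = (exp((1-a)*H_a) - 1) / (1-a)
lhs = tsallis(p, a)
rhs = (math.exp((1.0 - a) * renyi(p, a)) - 1.0) / (1.0 - a)
print(abs(lhs - rhs) < 1e-12)  # True

# Both tend to the Shannon entropy as a -> 1
print(abs(renyi(p, 1.0 - 1e-8) - shannon(p)) < 1e-6)   # True
print(abs(tsallis(p, 1.0 - 1e-8) - shannon(p)) < 1e-6) # True
```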

More recently, interest in generalized information measures has increased dramatically in different directions. A respectable number of nonclassical entropies, beyond the Shannon, Rényi, and Tsallis entropies, have already been developed in the study of complex systems.

The exponential entropy (EE) of order $\alpha$ [18] was defined by
$$E_\alpha(X) = \left(\int p^\alpha(x)\,dx\right)^{1/(1-\alpha)}, \quad (9)$$
where $\alpha > 0$ and $\alpha \neq 1$.

We obtain $E_\alpha(X) = e^{H_\alpha(X)}$ and $\lim_{\alpha\to1}E_\alpha(X) = e^{H(X)}$.
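Assuming the exponential entropy takes the form $(\int p^\alpha\,dx)^{1/(1-\alpha)}$, its discrete analogue can be sketched and its limiting behavior checked (helper names are ours):

```python
import math

def exp_entropy(p, a):
    """Discrete analogue of the exponential entropy of order a (a > 0, a != 1)."""
    return sum(pi ** a for pi in p) ** (1.0 / (1.0 - a))

p = [0.5, 0.3, 0.2]
H = -sum(pi * math.log(pi) for pi in p)

# As a -> 1, the exponential entropy tends to exp(H(X))
print(abs(exp_entropy(p, 1.0 + 1e-8) - math.exp(H)) < 1e-5)  # True

# For a uniform distribution on n points it equals n for every order a
u = [0.25] * 4
print(abs(exp_entropy(u, 0.5) - 4.0) < 1e-12)  # True
```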

1.2. Entropies Defined on Survival Functions

As noted in [19], information measures defined on the density function suffer from several drawbacks, since the distribution function is more regular than the density function. Therefore, the cumulative residual entropy, which is defined on the cumulative distribution function, or equivalently the survival function, was proposed as an alternative measure of uncertainty.

Let $X = (X_1, \dots, X_m)$ be a nonnegative r.v. in $\mathbb{R}_+^m$. We use the notation $x \le y$ to mean that $x_i \le y_i$ for $i = 1, \dots, m$. The multivariate survival function of a nonnegative r.v. $X$ is given as
$$\bar{F}(x) = P(X > x) = P(X_1 > x_1, \dots, X_m > x_m), \quad (10)$$
where $x = (x_1, \dots, x_m)$ with $x_i \ge 0$.

If the density function is replaced by the survival function, $q$ is set as 1, and $\phi(x) = x$ in (3), it yields the survival Shannon entropy (SSE) [19], defined as
$$\mathcal{E}(X) = -\int_{\mathbb{R}_+^m} \bar{F}(x)\log\bar{F}(x)\,dx. \quad (11)$$
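As a numerical sanity check of the survival Shannon entropy: for $X \sim \mathrm{Exp}(\lambda)$ one has $\bar{F}(x) = e^{-\lambda x}$ and $-\int_0^\infty \bar{F}\log\bar{F}\,dx = 1/\lambda$. A sketch with a simple midpoint rule (the helper name is ours, for illustration):

```python
import math

def sse_exponential(lam, upper=20.0, n=200000):
    """Midpoint-rule value of -∫ F̄(x) log F̄(x) dx for F̄(x) = exp(-lam*x)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        s = math.exp(-lam * x)
        total -= s * math.log(s) * h
    return total

lam = 2.0
print(abs(sse_exponential(lam) - 1.0 / lam) < 1e-3)  # True
```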

Since eight different types of entropies and their corresponding similarity measures will be discussed subsequently, it is worth pointing out that some notations and names of the existing information measures will be changed in harmony with the unifying frameworks throughout this paper.

To consider the conditional survival entropy, we denote $F(x \mid y)$ as the conditional distribution function of $X$ given $Y = y$ and $\bar{F}(x \mid y)$ as the respective conditional survival function.

The cross survival Shannon entropy (CSSE) of r.v.s $X$ and $Y$ was given by [19]
$$\mathcal{C}(X, Y) = \mathcal{E}(X) - \mathcal{E}(X \mid Y),$$
where $\mathcal{E}(X \mid Y)$ is the conditional survival Shannon entropy of $X$ given $Y$, defined as [19]
$$\mathcal{E}(X \mid Y) = E_Y\left[-\int_{\mathbb{R}_+^m} \bar{F}(x \mid Y)\log\bar{F}(x \mid Y)\,dx\right],$$
and here $E_Y$ is the expectation with respect to the r.v. $Y$. The nonnegativity of CSSE was proven in [19], and thus CSSE was used as a similarity measure in image registration [20]. Generalized versions of SSE in dynamic systems were discussed in [21, 22].

If the density function in (9) is replaced by the survival function, this yields the survival exponential entropy (SEE) [23] of an r.v. $X$ with order $\alpha$, given by
$$M_\alpha(X) = \left(\int_{\mathbb{R}_+^m} \bar{F}^\alpha(x)\,dx\right)^{1/(1-\alpha)},$$
where $\alpha > 0$ and $\alpha \neq 1$.

As an ongoing research program, generalized information measures offer us a steadily growing inventory of distinct entropy theories. Diversity and unity are two significant features of these theories. The growing diversity of information measures makes it increasingly realistic to find a certain information measure suitable for a given condition. The unity allows us to view all available information measures as a whole and to move from one measure to another as needed. To that end, motivated by the research approaches on Shannon entropy and Shannon mutual information [2], SSE [19], and SEE [23], we attempt to study information-theoretic metrics in all their manifestations. On one hand, we propose several new types of entropies and their similarity measures; on the other hand, for each type of the existing entropies, except for Shannon entropy, we give the definitions of similarity measures (see Tables 1 and 2). Finally, we deduce the unifying frameworks for information measures emerging from the study of complex systems based on probability.

Table 1: Entropies, conditional entropies, and joint entropies defined on density functions and survival functions.
Table 2: The similarity measures and normalized similarity measures defined on density functions and survival functions.

The remainder of this paper is organized as follows. Section 2 will propose the similarity measures defined on the density function. In Section 3, the survival Rényi entropy and survival Tsallis entropy are presented. In Section 4, we address the similarity measures defined on the survival function. The unifying frameworks of information measures and examples are provided in Section 5. Finally, we conclude this paper in Section 6.

2. Similarity Measures Defined on the Density Function

Shannon mutual information measures the information that an r.v. $X$ conveys about another r.v. $Y$. It has been widely used in image registration [24, 25] and pattern recognition [26, 27]. Generally, just as SMI is defined on Shannon entropy, each type of entropy leads to a corresponding similarity measure. In applications, an ideal similarity measure should be nonnegative. To that end, we follow the approach of [15, 19, 23] and define the similarity measures by the linear expectation operator rather than the KN average operator weighted by the escort distribution [28]. This section will present the similarity measures defined on the density function corresponding to the Rényi entropy, Tsallis entropy, and exponential entropy, respectively.

2.1. Rényi Mutual Information

Lemma 1. Let $X$ and $Y$ be r.v.s and let $\phi$ be a real convex function. Then
$$E\left[\phi\left(\frac{p(X)p(Y)}{p(X, Y)}\right)\right] \ge \phi(1). \quad (15)$$
If $\phi$ is strictly convex, the equality holds if and only if $X$ and $Y$ are independent. If $\phi$ is concave, the inequality is reversed.

Proof. For a real convex function $\phi$, using Jensen’s inequality [29], we obtain
$$E\left[\phi\left(\frac{p(X)p(Y)}{p(X, Y)}\right)\right] \ge \phi\left(E\left[\frac{p(X)p(Y)}{p(X, Y)}\right]\right). \quad (16)$$
The equality holds if $X$ and $Y$ are independent. Since $E[p(X)p(Y)/p(X, Y)] = \iint p(x)p(y)\,dx\,dy = 1$, it is immediate that (15) holds. Now consider the "only if" part of the lemma. If $X$ and $Y$ are independent, then $p(x, y) = p(x)p(y)$, and thus the equality holds in (15). On the other hand, if the equality holds in (15), then the equality holds in Jensen’s inequality (16), which implies that $p(X)p(Y)/p(X, Y)$ is constant almost surely, since $\phi$ is strictly convex. Taking expectations shows that this constant equals 1; hence $p(x, y) = p(x)p(y)$, which leads to the independence of $X$ and $Y$.
Lemma 1 plays an important role in proving the nonnegativity of the similarity measures to be introduced that are defined on the density function.

Definition 2. Let X and Y be r.v.s; the conditional Rényi entropy of given with order is defined by where and .
Motivated by the definitions of the joint Shannon entropy and joint survival Shannon entropy , the joint Rényi entropy can be similarly introduced.

Definition 3. The joint Rényi entropy of r.v.s X and Y with order is defined as , where and .

Theorem 4. For r.v.s X and Y, we obtain and for all and .

Proof. Since and is concave of for all , using Lemma 1 and Jensen’s inequality, we have Since and is convex of for all , using Lemma 1 and Jensen’s inequality, we obtain Similarly, we have , and thus .

Definition 5. The Rényi mutual information (RMI) of r.v.s and with order is defined as where and .
It is worth pointing out that the definition of RMI parallels the definitions of SMI (5) and CSSE (13). The nonnegativity of RMI is ensured by Theorem 4. In view of Theorem 4, which parallels (5), we can give another form of the definition of RMI. There are no essential differences between these two forms of the definition; we only consider definitions of the form (24) for similarity measures throughout this paper.
Using L’Hôpital’s rule, it is easy to obtain and .
The normalized Shannon mutual information (NSMI) [16] of r.v.s $X$ and $Y$ was given as
$$\mathrm{NSMI}(X, Y) = \frac{H(X) + H(Y)}{H(X, Y)}.$$
NSMI often acts as a robust similarity measure in image registration [16, 30], attribute abstraction [31], and clustering [32]. Note that $1 \le \mathrm{NSMI}(X, Y) \le 2$. In a similar way, different forms of the normalized mutual information will be deduced in this work.
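The SMI identity and the NSMI bound can be illustrated on a small discrete joint distribution (a sketch: the numbers are illustrative, and Studholme's form $\mathrm{NSMI} = (H(X)+H(Y))/H(X,Y)$ is assumed):

```python
import math

def H(probs):
    """Discrete Shannon entropy (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Illustrative joint distribution of (X, Y) on a 2x2 grid
pxy = [0.4, 0.1, 0.1, 0.4]
px = [0.5, 0.5]  # marginal of X
py = [0.5, 0.5]  # marginal of Y

Hxy = H(pxy)
smi = H(px) + H(py) - Hxy      # I(X;Y) = H(X) + H(Y) - H(X,Y)
nsmi = (H(px) + H(py)) / Hxy   # normalized form

print(smi > 0.0)               # dependence yields positive mutual information
print(1.0 <= nsmi <= 2.0)      # NSMI lies between 1 and 2
```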

Definition 6. The normalized Rényi mutual information (NRMI) of r.v.s and with order is defined by where and .
We immediately obtain by L’Hôpital’s rule.

2.2. Tsallis Mutual Information

Definition 7. The conditional Tsallis entropy of r.v. given with order is defined as where and .

Definition 8. The joint Tsallis entropy of r.v.s and with order is defined as , where and .

Theorem 9. For two r.v.s X and Y, we have and for all and .

Proof. Since and is concave of for all , using Lemma 1, we obtain The inequality holds for all , since and is convex of .
It is trivial to verify that .

Definition 10. The Tsallis mutual information (TMI) of r.v.s and with order is defined as , where and .
Using Theorem 9, we have .

Definition 11. The normalized Tsallis mutual information (NTMI) of r.v.s and with order is defined by where and .
By L’Hôpital’s rule, it is easy to verify that and .

2.3. Exponential Mutual Information

Definition 12. The conditional exponential entropy of r.v. given with order is defined by where and .

Definition 13. The joint exponential entropy of r.v.s and with order is defined as , where and .

Theorem 14. For two r.v.s X and Y, for all and .

Proof. Since is concave of for all , using Lemma 1, we obtain . The equality is true, since for all .
Similarly, since is convex of for all , using Lemma 1, we obtain . We complete the proof by considering that is decreasing in for all .

Theorem 15. For two r.v.s X and Y, we obtain and for all and .

Proof. Since is convex of for all , using (30) and Jensen’s inequality, we obtain Since is concave and is decreasing in for all , similarly we obtain It is trivial to verify that .

Definition 16. The exponential mutual information (EMI) of r.v.s X and Y with order is defined as , where and .
Using Theorem 15, we have .

Definition 17. The normalized exponential mutual information (NEMI) of r.v.s X and Y with order is defined by where and .

3. Entropies Defined on the Survival Function

The existing survival Shannon entropy and the survival exponential entropy extended the corresponding functionals from the density function to the survival function. In this section, we will propose the survival Rényi entropy and the survival Tsallis entropy defined on the survival function, which, respectively, parallel the classical Rényi entropy and Tsallis entropy defined on the density function.

3.1. Survival Rényi Entropy

If the density function is replaced by the survival function, $q$ is set as 1, and $\phi$ is chosen as an exponential function in (3), it yields the survival Rényi entropy.

Definition 18. The survival Rényi entropy (SRE) of an r.v. $X$ with order $\alpha$ is defined as
$$\bar{H}_\alpha(X) = \frac{1}{1-\alpha}\log\int_{\mathbb{R}_+^m}\bar{F}^\alpha(x)\,dx,$$
where $\alpha > 0$ and $\alpha \neq 1$.
It is immediate to see that $\bar{H}_\alpha(X) = \log M_\alpha(X)$.
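Under the assumption that SRE $= \frac{1}{1-\alpha}\log\int\bar{F}^\alpha\,dx$ and SEE $= (\int\bar{F}^\alpha\,dx)^{1/(1-\alpha)}$, the relation SRE $= \log$ SEE can be checked for the exponential law, where $\int_0^\infty e^{-\alpha\lambda x}\,dx = 1/(\alpha\lambda)$ in closed form (helper names are ours):

```python
import math

def surv_pow_integral(lam, alpha):
    """∫_0^∞ exp(-lam*x)^alpha dx = 1/(alpha*lam) for the exponential law."""
    return 1.0 / (alpha * lam)

def sre(lam, alpha):
    # assumed form of the survival Renyi entropy
    return math.log(surv_pow_integral(lam, alpha)) / (1.0 - alpha)

def see(lam, alpha):
    # assumed form of the survival exponential entropy
    return surv_pow_integral(lam, alpha) ** (1.0 / (1.0 - alpha))

lam, alpha = 2.0, 0.4
print(abs(sre(lam, alpha) - math.log(see(lam, alpha))) < 1e-12)  # True
```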

Definition 19. The conditional survival entropy of r.v. given with order is defined as where and .

Definition 20. The joint survival Rényi entropy of r.v.s and with order is defined as , where and .

Definition 21. The joint survival exponential entropy of r.v.s and with order is defined as , where , , and is the conditional survival exponential entropy of r.v. given .
Theorem 22 will show the relation between SRE and Shannon entropy.

Theorem 22. For an r.v. , one has

Proof. Using the log-sum inequality [2, 19], we obtain and thus . We obtain Since , (38) can be written as Note that for all and for all . We complete the proof by multiplying on both sides of (39).

Theorem 23. Let and be two r.v.s; then

Proof. If , using Theorem 22, one has the stated inequality. Similarly, the inequality in (41) holds for all .
Since $\bar{H}_\alpha(X) = \log M_\alpha(X)$, some properties of SRE can be deduced by proof approaches similar to those for SEE [23] and SSE [19]. We list these properties as propositions and omit their proofs unless we can provide improved, different, or more concise versions.

Proposition 24. If , for some , , then for all and .

Proof. For sets , using Hölder’s inequality [33], we have For all , we obtain The inequality in (44) follows from Markov’s inequality [19, 34], where the integral on the right side exists if , that is, if . Hence, exists for all .
(i) If , note that for all ; then (ii) If , on one hand, for all , since exists for all , we see . On the other hand, for all , we obtain and . We complete the proof, since for all .

Proposition 25. If the components , of r.v. are independent, then for all and .
Proposition 25 is the immediate result of Theorem 15 in [23].
The Shannon entropy of a sum of independent variables is larger than that of either summand; that is, $H(X + Y) \ge \max\{H(X), H(Y)\}$. SRE has a similar property.

Proposition 26. Let X and Y be independent r.v.s; then for all and .

Proof. Since and are independent, one has . Since and is concave of for all , using Jensen’s inequality, one has Integrating both sides of from 0 to , Then Multiplying by on both sides of the above leads to , and exchanging and leads to .
We complete the proof by using Jensen’s inequality and considering that and is convex of for all in a similar way.

Proposition 27. Let and let be r.v.s with , for a constant vector ; then for all .

Proposition 28. Let be a sequence of m-dimensional nonnegative r.v.s converging in law to r.v. X. If all are bounded in for some , then for all and .

Proposition 29. If , of the r.v. are independent, then for all and .

3.2. Survival Tsallis Entropy

If the density function is replaced by the survival function, $\phi(x) = x$, and $q = \alpha$ in (3), or, equivalently, if the density function is replaced by the survival function and the logarithm function is replaced by the $q$-logarithm function in (7), it yields the survival Tsallis entropy.

Definition 30. The survival Tsallis entropy (STE) of an r.v. $X$ with order $\alpha$ is defined as
$$\bar{T}_\alpha(X) = \frac{1}{\alpha-1}\left(\int_{\mathbb{R}_+^m}\bar{F}(x)\,dx - \int_{\mathbb{R}_+^m}\bar{F}^\alpha(x)\,dx\right),$$
where $\alpha > 0$ and $\alpha \neq 1$.
Note that $\int_0^\infty \bar{F}(x)\,dx = E(X)$ for a nonnegative univariate r.v., which follows from the integration by parts formula. Hence, the survival Tsallis entropy can also be written as $\bar{T}_\alpha(X) = -\int \bar{F}^\alpha(x)\ln_\alpha\bar{F}(x)\,dx$. It is easy to see that $\lim_{\alpha\to1}\bar{T}_\alpha(X) = \mathcal{E}(X)$.
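Assuming STE takes the form $-\int\bar{F}^\alpha\ln_\alpha\bar{F}\,dx$, a closed form for the exponential law is $1/(\alpha\lambda)$, which a crude midpoint rule reproduces (a sketch; the helper name is ours):

```python
import math

def ste_exponential(lam, alpha, upper=20.0, n=200000):
    """-∫ F̄^a ln_a(F̄) dx with F̄(x) = exp(-lam*x), midpoint rule."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        s = math.exp(-lam * x)
        ln_a = (s ** (1.0 - alpha) - 1.0) / (1.0 - alpha)
        total -= (s ** alpha) * ln_a * h
    return total

lam, alpha = 2.0, 1.5
# Closed form for the exponential law: STE = 1 / (alpha * lam)
print(abs(ste_exponential(lam, alpha) - 1.0 / (alpha * lam)) < 1e-3)  # True
```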

Definition 31. The conditional survival Tsallis entropy of r.v. given with order is defined as where and .
Note that . Then .

Definition 32. The joint survival Tsallis entropy of r.v.s and with order is defined as , where and .
By L’Hôpital’s rule, it is easy to see that and .

Theorem 33. Let be an r.v.; then

Proof. It is easy to verify using (51) and Theorem 22.
Since there is a relation among , , and by (51), some properties of STE can be deduced by the theorems of SEE and SRE. We only list these properties as propositions and provide necessary explanations for their proofs.

Proposition 34. If for some , , then for all and .

Proof. It is easy to verify that . By the proof of Proposition 24, we see that and exist. Thus, for all and .

Proposition 35. Let and let be r.v.s with , and let be a constant vector; then for all and .
This proposition can be proven using (30) in [23].

Proposition 36. Let be a sequence of -dimensional nonnegative r.v.s converging to X: . If all are bounded in for some , then for all and .
It is immediate using (38) in [23].

Proposition 37. Let and be nonnegative and independent r.v.s; then for all and .

Proof. It can be proven in a similar way as Proposition 26 by considering and is convex of for all and and is concave of for all .

4. Similarity Measures Defined on the Survival Function

Paralleling the similarity measures and normalized similarity measures defined on the density function in Section 2, this section will focus on the corresponding similarity measures and normalized similarity measures defined on the survival function. As before, the key point is to prove the nonnegativity of the similarity measures to be introduced.

4.1. Cross Survival Rényi Entropy and Cross Survival Exponential Entropy

Lemma 38. Let X and Y be r.v.s and let be a real convex function; then If, moreover, is strictly convex, then equality holds in (54) if and only if X and Y are independent. The inequality is reversed if is concave.
Lemma 38 was proven in [23]. It is the cornerstone for proving the nonnegativity of each form of similarity measure defined on the survival function.

Theorem 39. Let X and Y be r.v.s; then and for all and .

Proof. Since is concave of and for all , and $\log t$ is strictly concave in $t$, using Lemma 38 and Jensen’s inequality, we obtain Similarly, considering that is convex of and for all , the conclusion is the same.
It is trivial to verify that .
For r.v.s X and Y, using Theorem 39, we have for all and .

Definition 40. The cross survival Rényi entropy (CSRE) of r.v.s and with order is defined as , where and .

Definition 41. The cross survival exponential entropy (CSEE) of r.v.s and with order is defined as , where and .
Using Theorem 39 and (56), we obtain and .

Definition 42. The normalized cross survival Rényi entropy (NCSRE) of r.v.s and with order is defined as where and .

Definition 43. The normalized cross survival exponential entropy (NCSEE) of r.v.s and with order is defined as where and .

Definition 44. The normalized cross survival Shannon entropy (NCSSE) of r.v.s and is defined as

Proposition 45. Let and be two r.v.s; then , , and , for and .

Proof. Since is convex of for , using Lemma 38, it is immediate that . The rest can be similarly proven by considering the range of and using Lemma 38.
Using L’Hôpital’s rule, it is easy to see that and .

4.2. Cross Survival Tsallis Entropy

Theorem 46. For two r.v.s and , one has and for all and .

Proof. Since , using Lemma 38, considering and is concave of for all , we obtain For , similarly the conclusion is the same.
The rest of the conclusion follows immediately from Definition 32.

Definition 47. The cross survival Tsallis entropy (CSTE) of r.v.s and with order is defined as , where and .
Using Theorem 46, we obtain immediately.

Definition 48. The normalized cross survival Tsallis entropy (NCSTE) of r.v.s and with order is defined as where and .
We obtain and by L’Hôpital’s rule.
Note that CSSE, CSRE, CSEE, and CSTE are not symmetric in general, whereas SMI is symmetric; that is, $I(X; Y) = I(Y; X)$. We can define symmetric versions of the similarity measures and normalized similarity measures, taking the cross survival Rényi entropy as an example, by averaging the two argument orderings. A similar approach can be used to define symmetric similarity measures and normalized similarity measures for those defined on the density function.
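One common way to symmetrize an asymmetric measure is to average the two argument orderings; the sketch below is generic, and the averaging choice is one common construction rather than one prescribed by the paper:

```python
def symmetrize(measure):
    """Symmetric version of a (possibly asymmetric) similarity measure,
    formed by averaging the two argument orderings."""
    return lambda x, y: 0.5 * (measure(x, y) + measure(y, x))

# Toy asymmetric "measure" purely for illustration
m = lambda x, y: x - 2.0 * y
m_sym = symmetrize(m)
print(m_sym(3.0, 5.0) == m_sym(5.0, 3.0))  # True
```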

5. Unifying Frameworks and Examples

In this section, based on the generalized notations for the entropies discussed previously, we will classify the aforementioned types of entropies into two categories and then deduce the unifying presentations for entropies, similarity measures, and normalized similarity measures. Examples are also provided to unveil some properties of the information measures.

As enumerated in Table 1, different types of entropies have been discussed in this paper. There are three components in each item: entropy, conditional entropy, and joint entropy. In general, entropies in Table 1 can be classified into two categories: one is defined on the density function and the other is defined on the survival function. For simplicity, we refer to them as the density entropy and survival entropy, respectively. It is demonstrated that, for each type of the density entropy in Column 2, there is a survival entropy in Column 4 corresponding to it.

5.1. The Unifying Frameworks of Information Measures

For convenience, we view Shannon entropy, Rényi entropy, Tsallis entropy, and the exponential entropy as the classical density entropy and view their corresponding survival entropies as the classical survival entropy. We can see that the classical density entropy, the classical survival entropy, and their conditional entropy and joint entropy share similar presentations.

Let be one type of the generalized density entropy or the generalized survival entropy of r.v. with order . If , then means the Shannon entropy or the survival Shannon entropy. In these notations, is the conditional entropy with order . The corresponding joint entropy of r.v.s and with order can be introduced as

For r.v.s and , one has for all .

Entropies, conditional entropies, and joint entropies are listed in Table 1. The similarity measures and the normalized similarity measures are shown in Table 2 in detail, where the similarity measure is followed by the normalized one in each item. In a similar way, the similarity measure can be classified into the density similarity measure defined on the density function and the survival similarity measure defined on the survival function and so can the normalized similarity measures. Therefore, the unifying presentations for a similarity and a normalized similarity measure associated with a type of entropies can be deduced as

Note that and .

We obtain for all . Their symmetric versions can be, respectively, given by

The unifying frameworks make it possible to view all the available entropies listed in Table 1 as a whole and to move from one to another as necessary. Subsequently, the similarity measures and the normalized similarity measures are simultaneously obtained.

5.2. Three Examples

Example 1. Let $X$ be an r.v. corresponding to the exponential distribution with mean $1/\lambda$ and density function $p(x) = \lambda e^{-\lambda x}$, $x \ge 0$. We obtain , and
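The exponential example can be probed numerically: for the exponential density $p(x) = \lambda e^{-\lambda x}$, the differential Shannon entropy is $1 - \ln\lambda$, which a midpoint rule confirms (a sketch; the helper name is ours):

```python
import math

def shannon_exponential(lam, upper=20.0, n=200000):
    """Midpoint-rule differential entropy -∫ p log p dx for p(x) = lam*exp(-lam*x)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        p = lam * math.exp(-lam * x)
        total -= p * math.log(p) * h
    return total

lam = 2.0
# Closed form: H(X) = 1 - ln(lam)
print(abs(shannon_exponential(lam) - (1.0 - math.log(lam))) < 1e-3)  # True
```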