Abstract

Some inequalities of the Slater type for convex functions defined on general linear spaces are given. Applications for norm inequalities and f-divergence measures are also provided.

1. Introduction

Suppose that $I$ is an interval of real numbers with interior $\mathring{I}$ and $f : I \to \mathbb{R}$ is a convex function on $I$. Then $f$ is continuous on $\mathring{I}$ and has finite left and right derivatives at each point of $\mathring{I}$. Moreover, if $x, y \in \mathring{I}$ and $x < y$, then $f'_{-}(x) \le f'_{+}(x) \le f'_{-}(y) \le f'_{+}(y)$, which shows that both $f'_{-}$ and $f'_{+}$ are nondecreasing functions on $\mathring{I}$. It is also known that a convex function must be differentiable except for at most countably many points.

For a convex function $f : I \to \mathbb{R}$, the subdifferential of $f$, denoted by $\partial f$, is the set of all functions $\varphi : I \to [-\infty, \infty]$ such that $\varphi(\mathring{I}) \subset \mathbb{R}$ and
$$f(x) \ge f(a) + (x - a)\varphi(a) \quad \text{for any } x, a \in I.$$

It is also well known that if $f$ is convex on $I$, then $\partial f$ is nonempty, $f'_{-}, f'_{+} \in \partial f$, and if $\varphi \in \partial f$, then
$$f'_{-}(x) \le \varphi(x) \le f'_{+}(x) \quad \text{for any } x \in \mathring{I}.$$
In particular, $\varphi$ is a nondecreasing function.

If $f$ is differentiable and convex on $\mathring{I}$, then $\partial f = \{f'\}$.

The following result is well known in the literature as the Slater inequality.

Theorem 1.1 (Slater, 1981, [1]). If $f : I \to \mathbb{R}$ is a nonincreasing (nondecreasing) convex function, $x_i \in I$ and $p_i \ge 0$ with $P_n := \sum_{i=1}^{n} p_i > 0$ and $\sum_{i=1}^{n} p_i f'_{+}(x_i) \ne 0$, where $i \in \{1, \ldots, n\}$, then
$$\frac{1}{P_n} \sum_{i=1}^{n} p_i f(x_i) \le f\left(\frac{\sum_{i=1}^{n} p_i x_i f'_{+}(x_i)}{\sum_{i=1}^{n} p_i f'_{+}(x_i)}\right).$$
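For the reader who wishes to experiment, the following is a minimal numerical sanity check of Theorem 1.1; the convex nondecreasing choice $f(x) = e^x$ and the sample data are illustrative assumptions of ours, not part of the theorem.

```python
# Sanity check of Slater's inequality for f(x) = exp(x), which is convex and
# nondecreasing with f'(x) = exp(x) > 0 (sample data chosen arbitrarily).
import math

def slater_sides(f, fprime, x, p):
    """Return (lhs, rhs) of
    (1/P_n) sum_i p_i f(x_i) <= f( sum_i p_i x_i f'(x_i) / sum_i p_i f'(x_i) )
    for weights p_i summing to 1 (so P_n = 1)."""
    num = sum(pi * xi * fprime(xi) for pi, xi in zip(p, x))
    den = sum(pi * fprime(xi) for pi, xi in zip(p, x))
    lhs = sum(pi * f(xi) for pi, xi in zip(p, x))
    return lhs, f(num / den)

x = [0.1, 0.5, 1.2, 2.0]
p = [0.1, 0.2, 0.3, 0.4]          # a probability distribution
lhs, rhs = slater_sides(math.exp, math.exp, x, p)
print(lhs <= rhs)                  # True (about 4.39 <= 5.25 here)
```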

As pointed out in [2] (see also [3, p. 64] and [4, p. 208]), the monotonicity assumption for the derivative can be replaced with the condition
$$\frac{\sum_{i=1}^{n} p_i x_i f'_{+}(x_i)}{\sum_{i=1}^{n} p_i f'_{+}(x_i)} \in I,$$
which is more general and can hold for suitable points in $I$ and for not necessarily monotonic functions.

For recent works on Slater's inequality, see [5–7].

The main aim of the present paper is to extend Slater's inequality to convex functions defined on general linear spaces. A reverse of Slater's inequality is also obtained. Natural applications for norm inequalities and $f$-divergence measures are provided as well.

2. Slater’s Inequality for Functions Defined on Linear Spaces

Assume that $f : X \to \mathbb{R}$ is a convex function on the real linear space $X$. Since for any vectors $x, y \in X$ the function $g_{x,y} : \mathbb{R} \to \mathbb{R}$, $g_{x,y}(t) := f(x + ty)$, is convex, it follows that the limits
$$\nabla_{+(-)} f(x)(y) := \lim_{t \to 0+(-)} \frac{f(x + ty) - f(x)}{t}$$
exist and they are called the right (left) Gâteaux derivatives of the function $f$ in the point $x$ over the direction $y$.

It is obvious that for any $x, y \in X$ we have
$$\nabla_{+} f(x)(y) = \inf_{t > 0} \frac{f(x + ty) - f(x)}{t} \ge \sup_{t < 0} \frac{f(x + ty) - f(x)}{t} = \nabla_{-} f(x)(y)$$
and, in particular,
$$\nabla_{-} f(v)(v - u) \ge f(v) - f(u) \ge \nabla_{+} f(u)(v - u) \tag{2.3}$$
for any $u, v \in X$. We call this the gradient inequality for the convex function $f$. It will be used frequently in the sequel in order to obtain various results related to Slater's inequality.
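As a quick illustration, the one-sided Gâteaux derivative can be approximated by a one-sided difference quotient and the gradient inequality checked numerically; the function and the vectors below are assumptions made for this sketch.

```python
# Approximate the right Gateaux derivative by (f(x + t*y) - f(x)) / t for a
# small t > 0, and check the gradient inequality
#     f(v) - f(u) >= grad_+ f(u)(v - u)
# for the convex function f(w) = ||w||^2 on R^2 (illustrative data).
import numpy as np

def gateaux_plus(f, x, y, t=1e-7):
    """One-sided difference quotient approximating grad_+ f(x)(y)."""
    return (f(x + t * y) - f(x)) / t

f = lambda w: float(np.dot(w, w))      # f(w) = ||w||^2, convex on R^2
u = np.array([1.0, -2.0])
v = np.array([0.5, 3.0])
print(f(v) - f(u) >= gateaux_plus(f, u, v - u) - 1e-6)  # True up to rounding
```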

The following properties are also of importance:
$$\nabla_{+} f(x)(-y) = -\nabla_{-} f(x)(y), \qquad \nabla_{+(-)} f(x)(\alpha y) = \alpha \nabla_{+(-)} f(x)(y) \tag{2.4}$$
for any $x, y \in X$ and $\alpha \ge 0$.

The right Gâteaux derivative is subadditive while the left one is superadditive in the second variable, that is,
$$\nabla_{+} f(x)(y + z) \le \nabla_{+} f(x)(y) + \nabla_{+} f(x)(z), \qquad \nabla_{-} f(x)(y + z) \ge \nabla_{-} f(x)(y) + \nabla_{-} f(x)(z)$$
for any $x, y, z \in X$.

Some natural examples can be provided by the use of normed spaces.

Assume that $(X, \|\cdot\|)$ is a real normed linear space. The function $f_0(x) := \frac{1}{2}\|x\|^2$, $x \in X$, is a convex function which generates the superior and the inferior semi-inner products
$$\langle y, x \rangle_{s(i)} := \lim_{t \to 0+(-)} \frac{\|x + ty\|^2 - \|x\|^2}{2t}.$$
For a comprehensive study of the properties of these mappings in the Geometry of Banach Spaces, see the monograph [8].

For the convex function $f_p(x) := \|x\|^p$, with $p > 1$, we have
$$\nabla_{+(-)} f_p(x)(y) = p \|x\|^{p-2} \langle y, x \rangle_{s(i)}$$
for any $y \in X$ and $x \in X \setminus \{0\}$.

If $p = 1$, then we have
$$\nabla_{+(-)} f_1(x)(y) = \frac{\langle y, x \rangle_{s(i)}}{\|x\|}$$
for any $y \in X$ and $x \in X \setminus \{0\}$.
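In an inner product space the superior and inferior semi-inner products both reduce to the inner product itself, so the formulas above can be tested numerically; the values of $p$, $x$, and $y$ below are assumed for the sketch.

```python
# Finite-difference check of grad f_p(x)(y) = p * ||x||^(p-2) * <y, x> for
# f_p(x) = ||x||^p on the inner product space R^3 (illustrative data).
import numpy as np

def fp(x, p):
    return np.linalg.norm(x) ** p

x = np.array([1.0, 2.0, -1.0])
y = np.array([0.3, -0.4, 2.0])
p, t = 3.0, 1e-7
numeric = (fp(x + t * y, p) - fp(x, p)) / t             # difference quotient
closed_form = p * np.linalg.norm(x) ** (p - 2) * np.dot(y, x)
print(np.isclose(numeric, closed_form, rtol=1e-4))      # True
```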

For a given convex function $f : X \to \mathbb{R}$ and a given $n$-tuple of vectors $\mathbf{x} = (x_1, \ldots, x_n) \in X^n$, we consider the sets
$$\operatorname{Sla}_{+(-)}(f, \mathbf{x}, \mathbf{p}) := \left\{ v \in X \;\middle|\; \sum_{i=1}^{n} p_i \nabla_{+(-)} f(x_i)(v - x_i) \ge 0 \right\},$$
where $\mathbf{p} = (p_1, \ldots, p_n)$ is a given probability distribution, that is, $p_i \ge 0$ for $i \in \{1, \ldots, n\}$ and $\sum_{i=1}^{n} p_i = 1$.

The following properties of these sets hold.

Lemma 2.1. For a given convex function $f : X \to \mathbb{R}$, a given $n$-tuple of vectors $\mathbf{x} = (x_1, \ldots, x_n) \in X^n$, and a given probability distribution $\mathbf{p} = (p_1, \ldots, p_n)$, one has (i) $\operatorname{Sla}_{-}(f, \mathbf{x}, \mathbf{p}) \subseteq \operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$; (ii) $v \in \operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$ whenever $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(v) \ge \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i)$, for all $v \in X$; (iii) the sets $\operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$ and $\operatorname{Sla}_{-}(f, \mathbf{x}, \mathbf{p})$ are convex.

Proof. The properties (i) and (ii) follow from the definition, the fact that $\nabla_{+} f(x)(y) \ge \nabla_{-} f(x)(y)$ for any $x, y \in X$, and the subadditivity of the map $y \mapsto \nabla_{+} f(x)(y)$.
(iii) Let us only prove that $\operatorname{Sla}_{-}(f, \mathbf{x}, \mathbf{p})$ is convex.
If we assume that $v_1, v_2 \in \operatorname{Sla}_{-}(f, \mathbf{x}, \mathbf{p})$ and $\alpha, \beta \ge 0$ with $\alpha + \beta = 1$, then by the superadditivity and positive homogeneity of the Gâteaux derivative $\nabla_{-} f(x)(\cdot)$ in the second variable we have
$$\sum_{i=1}^{n} p_i \nabla_{-} f(x_i)(\alpha v_1 + \beta v_2 - x_i) = \sum_{i=1}^{n} p_i \nabla_{-} f(x_i)\big(\alpha (v_1 - x_i) + \beta (v_2 - x_i)\big) \ge \alpha \sum_{i=1}^{n} p_i \nabla_{-} f(x_i)(v_1 - x_i) + \beta \sum_{i=1}^{n} p_i \nabla_{-} f(x_i)(v_2 - x_i) \ge 0,$$
which shows that $\alpha v_1 + \beta v_2 \in \operatorname{Sla}_{-}(f, \mathbf{x}, \mathbf{p})$.
The proof for the convexity of $\operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$ is similar and the details are omitted.

For the convex function $f_p(x) := \|x\|^p$, with $p \ge 1$, defined on the normed linear space $(X, \|\cdot\|)$ and for an $n$-tuple of nonzero vectors $\mathbf{x} = (x_1, \ldots, x_n)$ we have, by the well-known properties of the semi-inner products, that
$$S_0(\mathbf{x}) := \left\{ v \in X \mid \langle v - x_i, x_i \rangle_{i} \ge 0 \text{ for all } i \in \{1, \ldots, n\} \right\} \subseteq \operatorname{Sla}_{-}(f_p, \mathbf{x}, \mathbf{p}),$$
which, as can be seen, does not depend on $\mathbf{p}$. We observe, by the continuity of the semi-inner products in the first variable, that $S_0(\mathbf{x})$ is closed in $X$. Also, we should remark that if $v \in S_0(\mathbf{x})$, then for any $\alpha \ge 1$ we also have that $\alpha v \in S_0(\mathbf{x})$.

The larger classes, which are dependent on the probability distribution $\mathbf{p}$, are described by
$$\operatorname{Sla}_{+(-)}(f_p, \mathbf{x}, \mathbf{p}) = \left\{ v \in X \;\middle|\; \sum_{i=1}^{n} p_i \|x_i\|^{p-2} \langle v - x_i, x_i \rangle_{s(i)} \ge 0 \right\}.$$
If the normed space is smooth, that is, the norm is Gâteaux differentiable in any nonzero point, then the superior and inferior semi-inner products coincide with the Lumer-Giles semi-inner product $[\cdot, \cdot]$ that generates the norm and is linear in the first variable (see for instance [8]). In this situation,
$$\operatorname{Sla}(f_p, \mathbf{x}, \mathbf{p}) = \left\{ v \in X \;\middle|\; \sum_{i=1}^{n} p_i \|x_i\|^{p-2} [v, x_i] \ge \sum_{i=1}^{n} p_i \|x_i\|^{p} \right\}.$$
If $(X, \langle \cdot, \cdot \rangle)$ is an inner product space, then $\operatorname{Sla}(f_p, \mathbf{x}, \mathbf{p})$ can be described by
$$\operatorname{Sla}(f_p, \mathbf{x}, \mathbf{p}) = \left\{ v \in X \;\middle|\; \left\langle v, \sum_{i=1}^{n} p_i \|x_i\|^{p-2} x_i \right\rangle \ge \sum_{i=1}^{n} p_i \|x_i\|^{p} \right\},$$
and if the family $\{x_i\}_{i \in \{1, \ldots, n\}}$ is orthogonal, then obviously, by the Pythagoras theorem, we have that the sum $\sum_{j=1}^{n} x_j$ belongs to $S_0(\mathbf{x})$ and therefore to $\operatorname{Sla}(f_p, \mathbf{x}, \mathbf{p})$ for any $p \ge 1$ and any probability distribution $\mathbf{p}$.

We can state now the following results that provide a generalization of Slater’s inequality as well as a counterpart for it.

Theorem 2.2. Let $f : X \to \mathbb{R}$ be a convex function on the real linear space $X$, $\mathbf{x} = (x_1, \ldots, x_n) \in X^n$ an $n$-tuple of vectors, and $\mathbf{p} = (p_1, \ldots, p_n)$ a probability distribution. Then for any $v \in \operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$, one has the inequalities
$$\nabla_{-} f(v)\left(v - \sum_{i=1}^{n} p_i x_i\right) \ge f(v) - \sum_{i=1}^{n} p_i f(x_i) \ge 0. \tag{2.17}$$

Proof. If we write the gradient inequality (2.3) for $v$ and $x_i$, then we have that
$$f(v) - f(x_i) \ge \nabla_{+} f(x_i)(v - x_i) \tag{2.18}$$
for any $i \in \{1, \ldots, n\}$.
By multiplying (2.18) with $p_i \ge 0$ and summing over $i$ from 1 to $n$, we get
$$f(v) - \sum_{i=1}^{n} p_i f(x_i) \ge \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(v - x_i). \tag{2.19}$$
Now, since $v \in \operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$, the right hand side of (2.19) is nonnegative, which proves the second inequality in (2.17).
Further, the gradient inequality (2.3) also gives
$$\nabla_{-} f(v)(v - x_i) \ge f(v) - f(x_i), \tag{2.20}$$
which, by multiplying with $p_i \ge 0$ and summing over $i$ from 1 to $n$, together with the superadditivity and the positive homogeneity of the Gâteaux derivative $\nabla_{-} f(v)(\cdot)$ in the second variable, produces the inequality
$$\nabla_{-} f(v)\left(v - \sum_{i=1}^{n} p_i x_i\right) \ge \sum_{i=1}^{n} p_i \nabla_{-} f(v)(v - x_i) \ge f(v) - \sum_{i=1}^{n} p_i f(x_i). \tag{2.21}$$
Utilising (2.19) and (2.21), we deduce the desired result (2.17).

Remark 2.3. The above result, written for the convex function $f_0(x) := \frac{1}{2}\|x\|^2$, has the following form for normed linear spaces. Let $(X, \|\cdot\|)$ be a normed linear space, $\mathbf{x} = (x_1, \ldots, x_n) \in X^n$ an $n$-tuple of vectors from $X$, and $\mathbf{p} = (p_1, \ldots, p_n)$ a probability distribution. Then for any vector $v \in X$ with the property
$$\sum_{i=1}^{n} p_i \langle v - x_i, x_i \rangle_{s} \ge 0, \tag{2.22}$$
we have the inequalities
$$2 \left\langle v - \sum_{i=1}^{n} p_i x_i, v \right\rangle_{i} \ge \|v\|^2 - \sum_{i=1}^{n} p_i \|x_i\|^2 \ge 0. \tag{2.23}$$
Rearranging the first inequality in (2.23), we also have that
$$\sum_{i=1}^{n} p_i \|x_i\|^2 + 2 \left\langle v - \sum_{i=1}^{n} p_i x_i, v \right\rangle_{i} \ge \|v\|^2. \tag{2.24}$$
If the space is smooth, then the condition (2.22) becomes
$$\sum_{i=1}^{n} p_i [v, x_i] \ge \sum_{i=1}^{n} p_i \|x_i\|^2, \tag{2.25}$$
implying the inequality
$$2 \left[ v - \sum_{i=1}^{n} p_i x_i, v \right] \ge \|v\|^2 - \sum_{i=1}^{n} p_i \|x_i\|^2 \ge 0. \tag{2.26}$$
Notice also that the first inequality in (2.26) is equivalent with
$$\sum_{i=1}^{n} p_i \|x_i\|^2 + \|v\|^2 \ge 2 \left[ \sum_{i=1}^{n} p_i x_i, v \right]. \tag{2.27}$$
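The inner product case can be verified numerically; in the sketch below $X = \mathbb{R}^2$, the data are assumed, and $v$ is scaled along $\sum_i p_i x_i$ precisely so that condition (2.22) holds.

```python
# Check of Remark 2.3 in the inner product space R^2, where the semi-inner
# products reduce to the dot product: if sum_i p_i <v - x_i, x_i> >= 0, then
#     2 <v - xbar, v>  >=  ||v||^2 - sum_i p_i ||x_i||^2  >=  0.
import numpy as np

xs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]])
p = np.array([0.1, 0.2, 0.3, 0.4])              # probability distribution

xbar = p @ xs                                   # sum_i p_i x_i
norms2 = p @ (xs ** 2).sum(axis=1)              # sum_i p_i ||x_i||^2
v = (norms2 / (xbar @ xbar)) * xbar             # chosen so (2.22) holds

assert p @ (xs @ v) >= norms2 - 1e-12           # condition (2.22)
upper = 2 * (v - xbar) @ v
middle = v @ v - norms2
print(upper >= middle >= 0)                     # True
```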

Corollary 2.4. Let $f : X \to \mathbb{R}$ be a convex function on the real linear space $X$, $\mathbf{x} = (x_1, \ldots, x_n) \in X^n$ an $n$-tuple of vectors, and $\mathbf{p}$ a probability distribution. If $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i) \ge 0$ and there exists a vector $u \in X$ with
$$\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u) > 0$$
(or $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i) < 0$ and $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u) < 0$), then
$$\sum_{i=1}^{n} p_i f(x_i) \le f\left( \frac{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i)}{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u)}\, u \right).$$

Proof. Assume that $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i) \ge 0$ and $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u) > 0$, and define
$$v := \frac{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i)}{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u)}\, u.$$
We claim that $v \in \operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$.
By the subadditivity and positive homogeneity of the mapping $\nabla_{+} f(x)(\cdot)$ in the second variable, we have
$$\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(v - x_i) \ge \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(v) - \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i) = \frac{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i)}{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u)} \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u) - \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i) = 0,$$
as claimed. Applying Theorem 2.2 for this $v$, we get the desired result.
If $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i) < 0$ and $\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u) < 0$, then for the same $v$, whose scalar coefficient is again nonnegative, we also have that
$$\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(v) = \frac{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i)}{\sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u)} \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(u) = \sum_{i=1}^{n} p_i \nabla_{+} f(x_i)(x_i),$$
where we have used the positive homogeneity property in (2.4). Therefore, $v \in \operatorname{Sla}_{+}(f, \mathbf{x}, \mathbf{p})$ as above, and by Theorem 2.2 we get the desired result.

It is natural to consider the case of normed spaces.

Remark 2.5. Let $(X, \|\cdot\|)$ be a normed linear space, $\mathbf{x} = (x_1, \ldots, x_n)$ an $n$-tuple of nonzero vectors from $X$, $\mathbf{p}$ a probability distribution, and $p \ge 1$. Then for any vector $u \in X$ with the property that
$$\sum_{i=1}^{n} p_i \|x_i\|^{p-2} \langle u, x_i \rangle_{s} > 0,$$
we have the inequality
$$\sum_{i=1}^{n} p_i \|x_i\|^{p} \le \left( \frac{\sum_{i=1}^{n} p_i \|x_i\|^{p}}{\sum_{i=1}^{n} p_i \|x_i\|^{p-2} \langle u, x_i \rangle_{s}} \right)^{p} \|u\|^{p},$$
which follows from Corollary 2.4 applied to the convex function $f_p(x) = \|x\|^p$, for which $\sum_{i=1}^{n} p_i \nabla_{+} f_p(x_i)(x_i) = p \sum_{i=1}^{n} p_i \|x_i\|^{p} \ge 0$.
The case of smooth spaces can be easily derived from the above; however, the details are left to the interested reader.

3. The Case of Finite Dimensional Linear Spaces

Consider now the finite dimensional linear space $\mathbb{R}^m$ and assume that $C$ is an open convex subset of $\mathbb{R}^m$. Assume also that the function $f : C \to \mathbb{R}$ is differentiable and convex on $C$. Obviously, if $x \in C$, then for any $y \in \mathbb{R}^m$ we have
$$\nabla f(x)(y) = \langle \nabla f(x), y \rangle = \sum_{k=1}^{m} \frac{\partial f(x)}{\partial x_k}\, y_k.$$
For the convex function $f : C \to \mathbb{R}$ and a given $n$-tuple of vectors $\mathbf{x} = (x_1, \ldots, x_n)$ with $x_i \in C$, we consider the sets
$$\operatorname{Sla}(f, \mathbf{x}, \mathbf{p}) := \left\{ v \in C \;\middle|\; \sum_{i=1}^{n} p_i \langle \nabla f(x_i), v - x_i \rangle \ge 0 \right\},$$
where $\mathbf{p}$ is a given probability distribution.

As in the previous section, the sets $\operatorname{Sla}(f, \mathbf{x}, \mathbf{p})$ are convex and closed subsets of $\operatorname{clo}(C)$, the closure of $C$, for any $n$-tuple of vectors $\mathbf{x}$ with $x_i \in C$ and for any $\mathbf{p}$ that is a probability distribution.

Proposition 3.1. Let $f$ be a differentiable convex function on the open convex set $C$ in the finite dimensional space $\mathbb{R}^m$, $\mathbf{x} = (x_1, \ldots, x_n)$ an $n$-tuple of vectors with $x_i \in C$, and $\mathbf{p}$ a probability distribution. Then for any $v \in \operatorname{Sla}(f, \mathbf{x}, \mathbf{p})$, one has the inequalities
$$\left\langle \nabla f(v), v - \sum_{i=1}^{n} p_i x_i \right\rangle \ge f(v) - \sum_{i=1}^{n} p_i f(x_i) \ge 0.$$

The unidimensional case, that is, $m = 1$, is of interest for applications. We will state this case with the general assumption that $f$ is a differentiable convex function on an open interval $I$. For a given $n$-tuple of elements $\mathbf{x} = (x_1, \ldots, x_n) \in I^n$, we have
$$\operatorname{Sla}(f, \mathbf{x}, \mathbf{p}) = \left\{ v \in I \;\middle|\; v \sum_{i=1}^{n} p_i f'(x_i) \ge \sum_{i=1}^{n} p_i x_i f'(x_i) \right\},$$
where $\mathbf{p} = (p_1, \ldots, p_n)$ is a probability distribution. These sets inherit the general properties pointed out in Lemma 2.1. Moreover, if we make the assumption that $\sum_{i=1}^{n} p_i f'(x_i) \ne 0$, then for $\sum_{i=1}^{n} p_i f'(x_i) > 0$ we have
$$\operatorname{Sla}(f, \mathbf{x}, \mathbf{p}) = \left[ \frac{\sum_{i=1}^{n} p_i x_i f'(x_i)}{\sum_{i=1}^{n} p_i f'(x_i)}, \infty \right) \cap I,$$
while for $\sum_{i=1}^{n} p_i f'(x_i) < 0$ we have
$$\operatorname{Sla}(f, \mathbf{x}, \mathbf{p}) = \left( -\infty, \frac{\sum_{i=1}^{n} p_i x_i f'(x_i)}{\sum_{i=1}^{n} p_i f'(x_i)} \right] \cap I.$$

Also, if we assume that $f'(x_i) > 0$ for all $i \in \{1, \ldots, n\}$ (or $f'(x_i) < 0$ for all $i \in \{1, \ldots, n\}$), then
$$\frac{\sum_{i=1}^{n} p_i x_i f'(x_i)}{\sum_{i=1}^{n} p_i f'(x_i)} \in I,$$
due to the fact that this quotient is a convex combination of the $x_i$ and $I$ is a convex set.

Proposition 3.2. Let $f$ be a differentiable convex function on an open interval $I$. For a given $n$-tuple of elements $\mathbf{x} = (x_1, \ldots, x_n) \in I^n$ and a probability distribution $\mathbf{p}$, one has
$$f'(v)\left(v - \sum_{i=1}^{n} p_i x_i\right) \ge f(v) - \sum_{i=1}^{n} p_i f(x_i) \ge 0$$
for any $v \in \operatorname{Sla}(f, \mathbf{x}, \mathbf{p})$.
In particular, if one assumes that $\sum_{i=1}^{n} p_i f'(x_i) \ne 0$ and
$$w := \frac{\sum_{i=1}^{n} p_i x_i f'(x_i)}{\sum_{i=1}^{n} p_i f'(x_i)} \in I,$$
then
$$f'(w)\left(w - \sum_{i=1}^{n} p_i x_i\right) \ge f(w) - \sum_{i=1}^{n} p_i f(x_i) \ge 0. \tag{3.10}$$
Moreover, if $f'(x_i) > 0$ for all $i \in \{1, \ldots, n\}$ (or $f'(x_i) < 0$ for all $i \in \{1, \ldots, n\}$), then (3.10) holds true as well.
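A short numerical illustration of Proposition 3.2 follows; the function $f(x) = e^x$ and the data are assumptions made for this sketch.

```python
# Check of (3.10) for f(x) = exp(x) on R: with
#     w = sum_i p_i x_i f'(x_i) / sum_i p_i f'(x_i),
# both Slater's inequality and its reverse should hold:
#     f'(w) * (w - sum_i p_i x_i)  >=  f(w) - sum_i p_i f(x_i)  >=  0.
import math

x = [0.1, 0.5, 1.2, 2.0]
p = [0.1, 0.2, 0.3, 0.4]

den = sum(pi * math.exp(xi) for pi, xi in zip(p, x))       # sum p_i f'(x_i)
w = sum(pi * xi * math.exp(xi) for pi, xi in zip(p, x)) / den
middle = math.exp(w) - sum(pi * math.exp(xi) for pi, xi in zip(p, x))
upper = math.exp(w) * (w - sum(pi * xi for pi, xi in zip(p, x)))
print(upper >= middle >= 0)                                 # True
```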

Remark 3.3. We remark that the first inequality in (3.10) provides a reverse inequality for the classical result due to Slater.

4. Some Applications for $f$-Divergences

Given a convex function $f : [0, \infty) \to \mathbb{R}$, the $f$-divergence functional
$$I_f(\mathbf{p}, \mathbf{q}) := \sum_{i=1}^{n} q_i f\left(\frac{p_i}{q_i}\right),$$
where $\mathbf{p} = (p_1, \ldots, p_n)$ and $\mathbf{q} = (q_1, \ldots, q_n)$ are positive sequences, was introduced by Csiszár in [9] as a generalized measure of information, a "distance function" on the set of probability distributions $\mathbb{P}^n := \{\mathbf{p} \in \mathbb{R}^n \mid p_i \ge 0, \ \sum_{i=1}^{n} p_i = 1\}$. As in [9], we interpret undefined expressions by
$$f(0) := \lim_{t \to 0+} f(t), \qquad 0 f\left(\frac{0}{0}\right) := 0, \qquad 0 f\left(\frac{a}{0}\right) := \lim_{q \to 0+} q f\left(\frac{a}{q}\right) = a \lim_{t \to \infty} \frac{f(t)}{t}, \quad a > 0.$$
The following results were essentially given by Csiszár and Körner [10]: (i) if $f$ is convex, then $I_f(\mathbf{p}, \mathbf{q})$ is jointly convex in $\mathbf{p}$ and $\mathbf{q}$; (ii) for every $\mathbf{p}, \mathbf{q} \in \mathbb{R}_{+}^{n}$, we have
$$I_f(\mathbf{p}, \mathbf{q}) \ge \sum_{i=1}^{n} q_i\, f\left(\frac{\sum_{i=1}^{n} p_i}{\sum_{i=1}^{n} q_i}\right). \tag{4.3}$$

If $f$ is strictly convex, equality holds in (4.3) if and only if
$$\frac{p_1}{q_1} = \frac{p_2}{q_2} = \cdots = \frac{p_n}{q_n}.$$

If $f$ is normalized, that is, $f(1) = 0$, then for every $\mathbf{p}, \mathbf{q} \in \mathbb{R}_{+}^{n}$ with $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i$, we have the inequality
$$I_f(\mathbf{p}, \mathbf{q}) \ge 0. \tag{4.5}$$

In particular, if $\mathbf{p}, \mathbf{q} \in \mathbb{P}^n$, then (4.5) holds. This is the well-known positivity property of the $f$-divergence.
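The functional and its conventions are straightforward to implement; the sketch below (helper names and data are ours) evaluates $I_f$ for the normalized convex function $f(t) = t \ln t$ and two assumed distributions.

```python
# I_f(p, q) = sum_i q_i f(p_i / q_i), with the conventions 0*f(0/0) := 0 and
# 0*f(a/0) := a * lim_{t->inf} f(t)/t (passed in as slope_at_inf).
import math

def f_divergence(f, p, q, slope_at_inf=math.inf):
    total = 0.0
    for pi, qi in zip(p, q):
        if qi > 0:
            total += qi * f(pi / qi)
        elif pi > 0:
            total += pi * slope_at_inf   # 0 * f(a/0) convention, a > 0
    return total

# f(t) = t ln t is convex and normalized (f(1) = 0); I_f is then the
# Kullback-Leibler divergence discussed below.
f = lambda t: t * math.log(t) if t > 0 else 0.0
p = [0.2, 0.5, 0.3]
q = [0.25, 0.25, 0.5]
print(f_divergence(f, p, q) >= 0)        # True, by the positivity property
```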

It is obvious that the above definition of $I_f(\mathbf{p}, \mathbf{q})$ can be extended to any function $f : (0, \infty) \to \mathbb{R}$; however, the positivity property (4.5) will not generally hold in this case for normalized functions $f$ and $\mathbf{p}, \mathbf{q} \in \mathbb{R}_{+}^{n}$ with $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i$.

For a normalized differentiable convex function $f : (0, \infty) \to \mathbb{R}$ and two probability distributions $\mathbf{p}, \mathbf{q} \in \mathbb{P}^n$, we define the set
$$\operatorname{Sla}(f; \mathbf{p}, \mathbf{q}) := \left\{ v \in (0, \infty) \;\middle|\; \sum_{i=1}^{n} q_i f'\left(\frac{p_i}{q_i}\right)\left(v - \frac{p_i}{q_i}\right) \ge 0 \right\}.$$
Now, observe that $v \in \operatorname{Sla}(f; \mathbf{p}, \mathbf{q})$ is equivalent with
$$v \sum_{i=1}^{n} q_i f'\left(\frac{p_i}{q_i}\right) \ge \sum_{i=1}^{n} p_i f'\left(\frac{p_i}{q_i}\right), \quad \text{that is,} \quad v\, I_{f'}(\mathbf{p}, \mathbf{q}) \ge I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q}), \tag{4.8}$$
where, in the extended $f$-divergence notation, $I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q}) := \sum_{i=1}^{n} p_i f'(p_i / q_i)$. If $I_{f'}(\mathbf{p}, \mathbf{q}) > 0$, then (4.8) is equivalent with
$$v \ge \frac{I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q})}{I_{f'}(\mathbf{p}, \mathbf{q})};$$
therefore, in this case,
$$\operatorname{Sla}(f; \mathbf{p}, \mathbf{q}) = \left[ \frac{I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q})}{I_{f'}(\mathbf{p}, \mathbf{q})}, \infty \right) \cap (0, \infty).$$

If $I_{f'}(\mathbf{p}, \mathbf{q}) < 0$, then (4.8) is equivalent with
$$v \le \frac{I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q})}{I_{f'}(\mathbf{p}, \mathbf{q})};$$
therefore
$$\operatorname{Sla}(f; \mathbf{p}, \mathbf{q}) = \left( 0, \frac{I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q})}{I_{f'}(\mathbf{p}, \mathbf{q})} \right].$$

Utilising the extended $f$-divergences notation, we can state the following result.

Theorem 4.1. Let $f : (0, \infty) \to \mathbb{R}$ be a normalized differentiable convex function and $\mathbf{p}, \mathbf{q} \in \mathbb{P}^n$ two probability distributions. If $v \in \operatorname{Sla}(f; \mathbf{p}, \mathbf{q})$, then one has
$$f'(v)(v - 1) \ge f(v) - I_f(\mathbf{p}, \mathbf{q}) \ge 0.$$
In particular, if one assumes that $I_{f'}(\mathbf{p}, \mathbf{q}) \ne 0$ and
$$w := \frac{I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q})}{I_{f'}(\mathbf{p}, \mathbf{q})} \in (0, \infty),$$
then
$$f'(w)(w - 1) \ge f(w) - I_f(\mathbf{p}, \mathbf{q}) \ge 0. \tag{4.15}$$
Moreover, if $f'(p_i / q_i) > 0$ for all $i \in \{1, \ldots, n\}$ (or $f'(p_i / q_i) < 0$ for all $i \in \{1, \ldots, n\}$), then (4.15) holds true as well.

The proof follows immediately from Proposition 3.2 and the details are omitted.

The K. Pearson $\chi^2$-divergence is obtained for the convex function $f(t) = (1 - t)^2$ and given by
$$D_{\chi^2}(\mathbf{p}, \mathbf{q}) := \sum_{i=1}^{n} q_i \left(\frac{p_i}{q_i} - 1\right)^2 = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}.$$

The Kullback-Leibler divergence can be obtained for the convex function $f : (0, \infty) \to \mathbb{R}$, $f(t) = t \ln t$, and is defined by
$$D_{KL}(\mathbf{p}, \mathbf{q}) := \sum_{i=1}^{n} q_i \cdot \frac{p_i}{q_i} \ln\left(\frac{p_i}{q_i}\right) = \sum_{i=1}^{n} p_i \ln\left(\frac{p_i}{q_i}\right).$$

If we consider the convex function $f : (0, \infty) \to \mathbb{R}$, $f(t) = -\ln t$, then we observe that
$$I_f(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} q_i \ln\left(\frac{q_i}{p_i}\right) = D_{KL}(\mathbf{q}, \mathbf{p}).$$

For the function $f(t) = -\ln t$, we have $f'(t) = -1/t$, $I_{f'}(\mathbf{p}, \mathbf{q}) = -\sum_{i=1}^{n} q_i^2 / p_i < 0$, and $I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q}) = -1$, so we will obviously have that
$$\operatorname{Sla}(f; \mathbf{p}, \mathbf{q}) = \left( 0, \frac{1}{\sum_{i=1}^{n} q_i^2 / p_i} \right] = \left( 0, \frac{1}{D_{\chi^2}(\mathbf{q}, \mathbf{p}) + 1} \right].$$
Utilising the first part of Theorem 4.1, we can state the following.

Proposition 4.2. Let $\mathbf{p}, \mathbf{q} \in \mathbb{P}^n$ be two probability distributions. If $v \in \left( 0, \frac{1}{D_{\chi^2}(\mathbf{q}, \mathbf{p}) + 1} \right]$, then one has
$$\frac{1 - v}{v} \ge \ln\left(\frac{1}{v}\right) - D_{KL}(\mathbf{q}, \mathbf{p}) \ge 0.$$
In particular, for $v = \frac{1}{D_{\chi^2}(\mathbf{q}, \mathbf{p}) + 1}$, one gets
$$D_{\chi^2}(\mathbf{q}, \mathbf{p}) \ge \ln\left(D_{\chi^2}(\mathbf{q}, \mathbf{p}) + 1\right) - D_{KL}(\mathbf{q}, \mathbf{p}) \ge 0.$$
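A quick numerical check of the particular case above (the two distributions are assumed for illustration):

```python
# Check of Proposition 4.2's particular case:
#     D_chi2(q, p) >= ln(D_chi2(q, p) + 1) - D_KL(q, p) >= 0.
import math

p = [0.2, 0.5, 0.3]
q = [0.25, 0.25, 0.5]

chi2_qp = sum((qi - pi) ** 2 / pi for pi, qi in zip(p, q))   # D_chi2(q, p)
kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))  # D_KL(q, p)
middle = math.log(chi2_qp + 1) - kl_qp
print(chi2_qp >= middle >= 0)                                # True
```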

If we consider now the function $f : (0, \infty) \to \mathbb{R}$, $f(t) = t \ln t$, then $f'(t) = \ln t + 1$,
$$I_{f'}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} q_i \left[ \ln\left(\frac{p_i}{q_i}\right) + 1 \right] = 1 - D_{KL}(\mathbf{q}, \mathbf{p}),$$
and
$$I_{(\cdot) f'(\cdot)}(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} p_i \left[ \ln\left(\frac{p_i}{q_i}\right) + 1 \right] = 1 + D_{KL}(\mathbf{p}, \mathbf{q}).$$
We observe that if $\mathbf{p}, \mathbf{q}$ are two probability distributions such that $D_{KL}(\mathbf{q}, \mathbf{p}) < 1$, then
$$\operatorname{Sla}(f; \mathbf{p}, \mathbf{q}) = \left[ \frac{1 + D_{KL}(\mathbf{p}, \mathbf{q})}{1 - D_{KL}(\mathbf{q}, \mathbf{p})}, \infty \right).$$
If $D_{KL}(\mathbf{q}, \mathbf{p}) \ge 1$, then $\operatorname{Sla}(f; \mathbf{p}, \mathbf{q}) = \emptyset$.

By the use of Theorem 4.1, we can state now the following.

Proposition 4.3. Let $\mathbf{p}, \mathbf{q} \in \mathbb{P}^n$ be two probability distributions such that $D_{KL}(\mathbf{q}, \mathbf{p}) < 1$. If $v \ge \frac{1 + D_{KL}(\mathbf{p}, \mathbf{q})}{1 - D_{KL}(\mathbf{q}, \mathbf{p})}$, then one has
$$(\ln v + 1)(v - 1) \ge v \ln v - D_{KL}(\mathbf{p}, \mathbf{q}) \ge 0.$$
In particular, for $v = w := \frac{1 + D_{KL}(\mathbf{p}, \mathbf{q})}{1 - D_{KL}(\mathbf{q}, \mathbf{p})}$, one gets
$$(\ln w + 1)(w - 1) \ge w \ln w - D_{KL}(\mathbf{p}, \mathbf{q}) \ge 0.$$
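The same assumed distributions as before satisfy $D_{KL}(\mathbf{q}, \mathbf{p}) < 1$, so Proposition 4.3 can be checked directly:

```python
# Check of Proposition 4.3 at v = w = (1 + D_KL(p, q)) / (1 - D_KL(q, p)):
#     (ln w + 1)(w - 1) >= w ln w - D_KL(p, q) >= 0.
import math

p = [0.2, 0.5, 0.3]
q = [0.25, 0.25, 0.5]

kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
assert kl_qp < 1                               # hypothesis of Proposition 4.3
w = (1 + kl_pq) / (1 - kl_qp)
upper = (math.log(w) + 1) * (w - 1)
middle = w * math.log(w) - kl_pq
print(upper >= middle >= 0)                    # True
```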

Similar results can be obtained for other divergence measures of interest such as the Jeffreys divergence and Hellinger discrimination. However, the details are left to the interested reader.

Acknowledgment

The author would like to thank the anonymous referees for their valuable comments that have been implemented in the final version of the paper.