#### Abstract

Inspired by the work of Zhefei He and Mingjin Wang, published in the Journal of Inequalities and Applications in 2015, this paper generalizes their covariance inequality to the case of multidimensional random variables. The resulting inequality for covariance is then applied to different multidimensional statistical distributions (multiuniform, multinomial, and multinormal). Coordinate dependence of the inequality is also examined. The obtained formulas could be useful for making estimates in multivariate statistics.

#### 1. Introduction

The concept of covariance appears ubiquitously in probability theory and statistics as the basic measure of correlation between random variables (see, e.g., standard textbooks [1–5]). An intuitive though elusive idea of correlation was transformed into a sound mathematical language in the 19th century by Auguste Bravais and Francis Galton. Readers interested in the fascinating history of this subject are advised to consult the comprehensive work of Stigler [6] or two tutorial articles [7, 8]. The analysis of covariance is of great practical importance in applied sciences [9, 10], especially in engineering (error analysis, optimum control, probabilistic design, and system identification) [11, 12], in biotechnology [13] and medical sciences [14, 15], and in economics [16].

Naturally, the means/variances/covariances of any ensemble of random variables are uniquely determined by the probability distribution (PD) corresponding to the concrete problem under study. However, both in pure mathematics and in applications, one frequently encounters the case where the pertinent PD is unknown. Under such circumstances, it is highly desirable to provide at least well defined general estimates (constraints) on the mean/variance/covariance, namely, estimates which are independent of the specific PD. These constraints typically take the form of an inequality.

One remarkable result of the abovementioned sort was obtained by Chebyshev as early as 1882 [1–3, 6]. More explicitly, the so-called Chebyshev inequality bounds the maximum fraction of values of a given random variable that can lie farther than a prescribed distance from the mean. Closely related are the subsequently found Ostrowski and Grüss type inequalities [1–3] in all their different variants (as listed in comprehensive monographs [17–19]).

To this day, the works of Chebyshev, Ostrowski, and Grüss continue to inspire active mathematical research focused on inequalities/estimates of (co)variance. This fact is clearly documented by the rich literature dealing with the subject [20–37]. For the purposes of the present article we specifically highlight an inequality for covariance which was derived recently by He and Wang in [20]. Namely, the following theorem has been proven to hold.

Theorem I. *Here is a single random variable. One assumes that has a finite expectation value and a finite variance . Furthermore, are any bounded differentiable functions, such thatThe covariance is defined explicitly below in Section 2.*
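For orientation, a one-dimensional bound of this general type can be checked numerically. The sketch below assumes the form $|\mathrm{Cov}(f(X), g(X))| \le M_f M_g \mathrm{Var}(X)$ with $M_f = \sup|f'|$ and $M_g = \sup|g'|$; this form, the symbols, and the test functions are illustrative assumptions and are not quoted from [20]. A binomial random variable is used so that all expectations are exact finite sums.

```python
import math

def binomial_pmf(n, p):
    """Probabilities P(X = k), k = 0..n, for X ~ Binomial(n, p)."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def expect(values, probs):
    """Expectation of a discrete random variable from its values and probabilities."""
    return sum(v * q for v, q in zip(values, probs))

def covariance(u, v, probs):
    """Cov(U, V) = E[UV] - E[U] E[V] for U, V tabulated on the same outcomes."""
    uv = [a * b for a, b in zip(u, v)]
    return expect(uv, probs) - expect(u, probs) * expect(v, probs)

n, p = 10, 0.3                      # hypothetical parameters
support = list(range(n + 1))
probs = binomial_pmf(n, p)

f_vals = [math.sin(x) for x in support]   # |f'| = |cos x| <= 1
g_vals = [math.atan(x) for x in support]  # |g'| = 1 / (1 + x^2) <= 1
m_f, m_g = 1.0, 1.0                       # assumed derivative bounds

var_x = covariance(support, support, probs)
cov_fg = covariance(f_vals, g_vals, probs)
bound = m_f * m_g * var_x
print(abs(cov_fg), "<=", bound)
```

The variance computed here agrees with the closed form $np(1-p)$ of the binomial distribution, which gives an independent check on the helper functions.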

In addition, He and Wang have applied inequality (1) to several concrete probability distributions (uniform, Gamma, Beta, Poisson, and binomial) and derived in this way various other, more specific inequalities.

The purpose of the present article is to generalize the work of He and Wang to the situation when is replaced by an -dimensional random variable with . More explicitly, we wish to generalize statement (1) and examine subsequently different multidimensional statistical distributions (multiuniform, multinomial, and multinormal).

The paper is structured as follows: Section 2 reviews all the necessary (well known and standard) prerequisites needed for an appropriate multidimensional generalization of statement (1) of Theorem I, such as the concept and basic properties of an expectation, variance, and covariance; the Cauchy-Schwarz inequality; and the Lagrange mean value theorem in dimensions. Section 3 forms the core of our work, since it generalizes (1) as mentioned above. In Section 4 we apply our newly derived multidimensional random inequality to concrete multidimensional probability distributions (multiuniform, multinomial, normal, and multinormal). The purpose of Section 5 is to examine the coordinate dependence of our generalized inequality (33) stated by Theorem II. Section 6 contains a brief conclusion and prospects for our future research work.

#### 2. Preparatory Considerations

##### 2.1. Multidimensional Random Variables

###### 2.1.1. Discrete Case

Assume that

(i) is a countable set,

(ii) is an -dimensional random variable,

(iii) are the associated probabilities completely determining the statistical distribution of

Then

(a) the probabilities satisfy the normalization condition

(b) an expectation of is given by formula Hereafter we shall assume that is well defined and finite

(c) the variance of () is given by formula Hereafter we shall assume that is well defined and finite. Relation (5) can be recast into an equivalent form

Let be a function which is bounded in . Boundedness of means that

Then

(α) an expectation is finite and well defined, since

(β) also the variance is finite and well defined, since

###### 2.1.2. Continuous Case

Assume that

(i) is an open set,

(ii) is an -dimensional random variable,

(iii) are the associated probability densities completely determining the statistical distribution of

Then

(a) the probability densities satisfy the normalization condition

(b) an expectation of is given by formula Hereafter we shall assume that is well defined and finite

(c) the variance of () is given by formula Hereafter we shall assume that is well defined and finite. Similarly as above in (5), also relation (14) can be recast into an equivalent form

Let be a function which is measurable and bounded in . Boundedness of means that

Then

(α) an expectation is finite and well defined, since

(β) also the variance is finite and well defined, since

##### 2.2. Covariance

Let be an -dimensional random variable. We define the covariance by prescriptionwith . Equivalently one can writeRecall that as follows from (5)-(6).

After taking an absolute value of (21) one getsThe well known Cauchy-Schwarz inequality (to be reviewed in the following subsection) implies howeverHence one may conclude thatThis means also that is well defined and finite as long as all the variances are well defined and finite.
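As a compact reminder, the standard relations used in this subsection can be written as follows. The symbols $U$ and $V$ are assumed notation for two real random variables with finite variances (for instance, two components of the random vector, or two functions of it); they are introduced here for illustration only.

```latex
% Assumed notation: U, V are real random variables with finite variances.
\begin{align*}
\operatorname{Cov}(U,V) &= E\bigl[(U-E[U])\,(V-E[V])\bigr] = E[UV]-E[U]\,E[V],\\
\operatorname{Cov}(U,U) &= \operatorname{Var}(U),\\
\bigl|\operatorname{Cov}(U,V)\bigr| &\le \sqrt{\operatorname{Var}(U)\,\operatorname{Var}(V)}.
\end{align*}
```

The last line is the Cauchy-Schwarz bound, and it is what guarantees that the covariance is finite whenever both variances are.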

##### 2.3. The Cauchy-Schwarz Inequality

Let and be any two vectors in .The proof is discussed in all standard textbooks of functional analysis, e.g., in [38].
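In its usual finite-dimensional form, the inequality reads as sketched below. The vector names $a$, $b$ and the dimension symbol $n$ are assumed notation.

```latex
% Assumed notation: a = (a_1,...,a_n), b = (b_1,...,b_n) are vectors in R^n.
\[
\Bigl|\sum_{i=1}^{n} a_i b_i\Bigr|
\;\le\;
\Bigl(\sum_{i=1}^{n} a_i^{2}\Bigr)^{1/2}
\Bigl(\sum_{i=1}^{n} b_i^{2}\Bigr)^{1/2},
\]
% with equality exactly when a and b are linearly dependent.
```

For example, with $a = (1, 2)$ and $b = (3, 4)$ the left-hand side equals $11$, while the right-hand side equals $\sqrt{5}\cdot\sqrt{25} = 5\sqrt{5} \approx 11.18$.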

##### 2.4. The Lagrange Mean Value Theorem in Dimensions

Let , be real valued differentiable functions defined in an open convex set . Let , . Convexity of implies that the straight line segment connecting with is entirely contained within . The Lagrange mean value theorem states that there always exists a number with the basic property

Because it is not easy to find a proof of statement (27) in the standard textbooks, we supply our own short proof here. It is inspired by Theorem 4.2 given on page 378 of [39]. Direct calculation confirms that

The Lagrange mean value theorem of the integral calculus (see, e.g., [40]) implies however that

where is a generally unknown fixed number (depending of course not only upon and but also on the function ). Combining (28) and (29) immediately yields the desired claim (27).
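The statement can be illustrated numerically. The sketch below takes a hypothetical function $f(x, y) = x^3 + y$ and two hypothetical points $a$, $b$, and locates by bisection a number $\theta \in (0, 1)$ for which $f(b) - f(a) = \nabla f(a + \theta(b - a)) \cdot (b - a)$. The function, the points, and all names are assumptions chosen for illustration.

```python
def f(x, y):
    """Hypothetical test function on the plane."""
    return x**3 + y

def grad_f(x, y):
    """Gradient of f, computed by hand."""
    return (3 * x**2, 1.0)

def directional_gap(theta, a, b):
    """grad f(a + theta (b - a)) . (b - a) minus (f(b) - f(a))."""
    d = (b[0] - a[0], b[1] - a[1])
    point = (a[0] + theta * d[0], a[1] + theta * d[1])
    g = grad_f(*point)
    return g[0] * d[0] + g[1] * d[1] - (f(*b) - f(*a))

def find_theta(a, b, tol=1e-12):
    """Bisection on [0, 1]; assumes the gap changes sign between 0 and 1."""
    lo, hi = 0.0, 1.0
    gap_lo = directional_gap(lo, a, b)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gap_mid = directional_gap(mid, a, b)
        if (gap_mid > 0) == (gap_lo > 0):
            lo, gap_lo = mid, gap_mid   # root lies in the upper half
        else:
            hi = mid                    # root lies in the lower half
    return 0.5 * (lo + hi)

a, b = (0.0, 0.0), (1.0, 1.0)
theta = find_theta(a, b)
print(theta)
```

For this particular choice the gap equals $3\theta^2 - 1$, so the intermediate point is $\theta = 1/\sqrt{3}$, strictly inside the segment as the theorem requires.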

#### 3. Multidimensional Generalization of Theorem I

##### 3.1. Preliminaries

Before formulating the multidimensional generalization of Theorem I announced above, let us introduce some additional notation and conventions. Recall that is an -dimensional random variable with a finite expectation and a finite variance (see Section 2.1 for details). We define an auxiliary quantity

Symbol will hereafter stand for an open convex subset of (case is also allowed to occur). If is a discrete random vector, then we tacitly assume .

Assume now that two functions are continuous and differentiable in . Assume also that all the partial derivatives , are bounded in . Then one may define auxiliary symbolsSubsequently one may introduce additional notations and through the formulas

##### 3.2. Our Basic Theorem

Now we are ready to state our own multidimensional generalization of Theorem I.

Theorem II. *Assuming the above specified notations and conventions, one has*

*Proof.* Since the proof is a bit lengthy, we divide it into parts:

(i) Let be a fixed parameter. Then is just a fixed number, and Define a quantity The Lagrange mean value theorem (Section 2.4) states that where . Hence In the last line of (37) we have used the triangle inequality.

(ii) By definition (17), we have But the involved integral satisfies an inequality Relations (38)-(39) correspond of course to continuous statistical distributions. Analogous formulas apply also in the case of discrete random variables (one merely replaces by ). We leave the details to the reader.

(iii) After plugging (39) into (37) one finds that The Cauchy-Schwarz inequality (26) implies however Hence also where in accordance with (32)

(iv) Proceeding further, the variance relation implies , giving in turn Yet the variance formula yields . This is valid since and . We may thus conclude that

(v) Comparison of (35) and (47) provides an inequality So far, an entity was treated as a fixed parameter.

(vi) Now we shall set to be a random variable equivalent to . Consequently we take the expectation over (48). This results in Yet, by the definition of the variance (19), one can convert (49) into a simple outcome where of course in accordance with (30)

(vii) A completely analogous sequence of considerations can be applied also to the case of . One would arrive at an inequality where Combination of (25), (50), and (52) provides now the desired final claim This is exactly as stated above in (33). Thus, our Theorem II is proven.
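A bound of the same type as Theorem II can be checked numerically on a small example. The sketch below does not reproduce inequality (33) itself; it assumes the variant $|\mathrm{Cov}(f(X), g(X))| \le M_f M_g \sigma^2$, where $\sigma^2$ is the sum of the component variances and $M_f$, $M_g$ are the Euclidean norms of the componentwise bounds on the partial derivatives. The constant and the precise definitions in (30)-(33) may differ from this assumed variant. The joint distribution and the test functions are hypothetical.

```python
import math
from itertools import product

# Hypothetical joint distribution of (X1, X2) on {0,1,2} x {0,1,2}; the
# weights are deliberately not of product form, so X1 and X2 are dependent.
weights = [[4, 2, 1], [2, 3, 2], [1, 2, 3]]
total = sum(sum(row) for row in weights)
pmf = {(i, j): weights[i][j] / total for i, j in product(range(3), repeat=2)}

def expect(h):
    """Expectation of h(X1, X2) under the joint distribution."""
    return sum(h(x1, x2) * q for (x1, x2), q in pmf.items())

def cov(h1, h2):
    """Cov(h1(X), h2(X)) = E[h1 h2] - E[h1] E[h2]."""
    return expect(lambda a, b: h1(a, b) * h2(a, b)) - expect(h1) * expect(h2)

def f(x1, x2):
    return math.sin(x1) + 0.5 * x2   # |df/dx1| <= 1, |df/dx2| = 0.5

def g(x1, x2):
    return math.atan(x1 + x2)        # |dg/dx1| <= 1, |dg/dx2| <= 1

def first(x1, x2):
    return x1

def second(x1, x2):
    return x2

m_f = math.hypot(1.0, 0.5)           # assumed: Euclidean norm of the bounds
m_g = math.hypot(1.0, 1.0)
sigma2 = cov(first, first) + cov(second, second)  # assumed: sum of variances

lhs = abs(cov(f, g))
rhs = m_f * m_g * sigma2
print(lhs, "<=", rhs)
```

The assumed variant follows from the mean value theorem and the Cauchy-Schwarz inequality applied to two independent copies of the random vector, which is the same pair of tools used in the proof above.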

#### 4. Applications of Theorem II to Different Probability Distributions

In the present section, we shall derive some new inequalities by applying our basic Theorem II to three different types of probability distributions of an -dimensional random variable. The definitions of the distributions discussed below can be found in [4, 41].

##### 4.1. Multiuniform Distribution

###### 4.1.1. Definition

The probability density function is given by formula

Notice that

(a) the probability densities satisfy the normalization condition Therefore Define this is the volume of region . Then

(b) the expectation of () is given by formula combining (55), (59), and (13),

(c) the variance of () is then given by formula combining (15) and (60), After recalling (30) one can write

###### 4.1.2. An Application of Theorem II

**Theorem II-1.** *Assume that two functions are continuous and differentiable in . Assume also that all the partial derivatives , are bounded in . Recalling notations (31) and (32), one has where*

*Proof.* Let be an -dimensional random variable which possesses the multiuniform distribution. According to (55), (59), and (17) we have and therefore Then due to Theorem II, (33). After inserting (62) one gets Thus the proof is completed.

###### 4.1.3. Special Case: Defined within an -Dimensional Rectangular Box

Assume that corresponds to a rectangular box. Stated mathematically,

where and . Then,

(a) according to (58),

(b) the expectation of is equal to whereas the expectation of is

(c) the variance of can be expressed as Subsequently we have
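For a uniform distribution on a rectangular box the quantities listed above have well known closed forms, sketched here in assumed notation: the box is taken to be $(a_1, b_1) \times \dots \times (a_n, b_n)$ with $a_i < b_i$, and $V$ denotes its volume.

```latex
% Assumed notation: the box is (a_1,b_1) x ... x (a_n,b_n) with a_i < b_i.
\begin{align*}
V &= \prod_{i=1}^{n} (b_i-a_i),\\
E[X_i] &= \frac{1}{b_i-a_i}\int_{a_i}^{b_i} x\,dx = \frac{a_i+b_i}{2},\\
E[X_i^2] &= \frac{1}{b_i-a_i}\int_{a_i}^{b_i} x^2\,dx = \frac{a_i^2+a_i b_i+b_i^2}{3},\\
\operatorname{Var}(X_i) &= E[X_i^2]-E[X_i]^2 = \frac{(b_i-a_i)^2}{12}.
\end{align*}
```

Each component is uniform on its own interval because the density factorizes over the box, so the integrals over the remaining coordinates cancel against the corresponding factors of $1/V$.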

Now we are ready to make the following statement.

**Theorem II-2.** *One has where*

*Proof.* The proof follows immediately after substituting (69), (70), and (74) into (63) and (64).

##### 4.2. Multinomial Distribution

###### 4.2.1. Definition

Assume that

(i) , where

(a) is a given fixed number of independent trials,

(b) is the number of possible outcomes in each trial, and here the outcomes can be labeled as

(c) is the number of occurrences of the outcome over all trials;

(ii) is the prescribed probability of the outcome for a single trial, and one has

Then

(a) the probability distribution function is given by formula

(b) expectation is equal to

(c) variance is equal to
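The standard multinomial moments can be confirmed by direct enumeration. The sketch below assumes the usual closed forms, mean $N p_i$ and variance $N p_i (1 - p_i)$ for the $i$-th count, where $N$ is the number of trials; the symbol $N$, the parameter values, and the function names are assumptions for illustration.

```python
import math
from itertools import product

def multinomial_pmf(counts, probs):
    """P(X_1 = k_1, ..., X_m = k_m) for a multinomial with sum(counts) trials."""
    coeff = math.factorial(sum(counts))
    for k in counts:
        coeff //= math.factorial(k)   # exact: each partial quotient is an integer
    return coeff * math.prod(p**k for p, k in zip(probs, counts))

n_trials = 5                 # hypothetical number of trials
probs = (0.2, 0.3, 0.5)      # hypothetical outcome probabilities, summing to 1
m = len(probs)

# Enumerate every count vector (k_1, ..., k_m) with k_1 + ... + k_m = n_trials.
support = [k for k in product(range(n_trials + 1), repeat=m) if sum(k) == n_trials]
pmf = {k: multinomial_pmf(k, probs) for k in support}

total = sum(pmf.values())
means = [sum(k[i] * q for k, q in pmf.items()) for i in range(m)]
variances = [sum((k[i] - means[i]) ** 2 * q for k, q in pmf.items()) for i in range(m)]

for i in range(m):
    print(i, means[i], variances[i])
```

Summing the variances over all outcomes gives $N\,(1 - \sum_i p_i^2)$, which is the quantity that enters a bound expressed through the sum of the component variances.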

###### 4.2.2. An Application of Theorem II

**Theorem II-3.** *Assume that two functions are continuous and differentiable in where . Assume also that all the partial derivatives , are bounded in . Recalling notations (31) and (32), one has*

*Proof.* Let be an -dimensional random variable which possesses the multinomial distribution. According to (78) and (8) we have therefore Yet (33) of Theorem II implies Combination of (83), (84), and (80) yields the desired inequality (81); thus the proof is completed.

##### 4.3. One-Dimensional Normal Distribution

Since the normal distribution was not discussed in paper [20] of He and Wang, we shall discuss first the case of a single random variable. Later in the subsequent subsection we shall extend our result to the case of two random variables ().

###### 4.3.1. Definition

Assume here(i)(ii) is a one-dimensional random variable(iii) are given prescribed parameters

Then(a)the probability density function is given by formula(b)the expectation is equal to(c)the variance is equal to

###### 4.3.2. An Application of Theorem I

**Theorem I-1.** *Assume that two functions are continuous and differentiable in . Assume also that the derivatives , are bounded in . Then*

*Proof.* Let be a random variable which possesses the normal distribution. According to (85) and (17) we have therefore Yet (1) of Theorem I implies Combination of (90), (91), and (87) yields the desired inequality (88); thus the proof is completed.
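A simple worked case makes a bound of this kind concrete. The sketch below assumes a centered normal variable $X \sim N(0, \sigma^2)$, the choice $f = g = \sin$ (so that both derivative bounds equal 1), and a bound of the form $|\mathrm{Cov}(f(X), g(X))| \le \sup|f'| \, \sup|g'| \, \sigma^2$. The symbols and this form of the bound are assumptions for illustration.

```latex
% Assumed setting: X ~ N(0, sigma^2), f = g = sin, sup|f'| = sup|g'| = 1.
\begin{align*}
E[\sin X] &= 0 \quad\text{(by symmetry)},\\
E[\cos 2X] &= e^{-2\sigma^{2}} \quad\text{(real part of the characteristic function at } t=2\text{)},\\
\operatorname{Var}(\sin X) &= E[\sin^{2}X] = \frac{1-E[\cos 2X]}{2}
   = \frac{1-e^{-2\sigma^{2}}}{2} \;\le\; \sigma^{2}.
\end{align*}
```

The final inequality is the elementary estimate $1 - e^{-x} \le x$ with $x = 2\sigma^2$. For $\sigma = 1$ the left-hand side is approximately $0.432$, comfortably below the bound $1$.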

###### 4.3.3. Standard Normal Distribution

The standard normal distribution corresponds to choosing and . Accordingly we have After substituting (92) into (88) one arrives at the following outcome.

**Theorem I-2.**

##### 4.4. Standard Bivariate Normal Distribution

###### 4.4.1. Definition

Assume that(i)(ii) with and independent(iii) denotes the standard normal probability density function of , with :Then (a)the probability density function is given by formula(b)the expectation is equal to(c)the variance is equal to

###### 4.4.2. An Application of Theorem II

**Theorem II-4.** *Assume that two functions are continuous and differentiable in . Assume also that all the partial derivatives , are bounded in . Recalling (31) and (32), one has*

*Proof.* Let be a 2-dimensional random variable which possesses the standard bivariate normal distribution. According to (95) and (17), one has therefore Yet (33) of Theorem II implies Combination of (100), (101), and (97) yields the desired inequality (98); thus the proof is completed.

#### 5. Coordinate Dependence of Theorem II

Our basic relation (33) of Theorem II is formulated in terms of an -dimensional random variable . Clearly, for a given -dimensional statistical problem there are many different (mutually equivalent) choices of independent random variables (coordinates) in terms of which the probability distribution can be expressed. It is not a priori clear whether or not our basic random inequality (33) is coordinate dependent. As we show below by means of a concrete example, the answer is affirmative. This also means that inequality (33) can be optimized (made stronger) by carrying out a suitable coordinate transformation.

##### 5.1. Correlated Bivariate Normal Distribution

As an illustrative example we shall take the correlated bivariate normal distribution described in [4]. Correspondingly, we have , , and . The probability distribution of is characterized by formula where is a fixed correlation parameter.

Importantly, the correlated bivariate normal distribution introduced above can be converted into an uncorrelated normal distribution by performing a bijective coordinate transformation . Here, according to [4], The associated probability density takes the form already discussed above in (95), i.e., Direct calculation yields ; thus . Hence also
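One standard decorrelating map can be verified with a few lines of linear algebra. The sketch below assumes zero means, unit variances, and correlation $\rho$, and uses the linear map $y_1 = x_1$, $y_2 = (x_2 - \rho x_1)/\sqrt{1 - \rho^2}$. This is a common choice, but it is an assumption here and may differ from the transformation (104) taken from [4]. All names are hypothetical.

```python
import math

def decorrelate(rho):
    """Matrix A of the assumed map y = A x that turns Cov(x) into the identity."""
    s = math.sqrt(1.0 - rho**2)
    return [[1.0, 0.0], [-rho / s, 1.0 / s]]

def transform_cov(a, sigma):
    """Return A * Sigma * A^T for 2x2 matrices given as nested lists."""
    a_sigma = [[sum(a[i][k] * sigma[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return [[sum(a_sigma[i][k] * a[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

rho = 0.6                                  # hypothetical correlation parameter
sigma_x = [[1.0, rho], [rho, 1.0]]         # covariance matrix of the correlated pair
a = decorrelate(rho)
sigma_y = transform_cov(a, sigma_x)        # covariance matrix in the new coordinates
det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]

print(sigma_y)
print(det_a)
```

The determinant of the map equals $1/\sqrt{1 - \rho^2}$, which is the Jacobian factor that appears when the probability density is rewritten in the new coordinates.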

##### 5.2. An Application of Theorem II: Correlated Case

For the sake of maximum simplicity, we shall apply Theorem II to the concrete case of two elementary functions Direct calculation yields Relation (33) of Theorem II then boils down to where

##### 5.3. An Application of Theorem II: Uncorrelated Case

In the coordinates introduced above by (104) we haveDirect calculation yieldsAccording to our above derived statement (98) of Theorem II-4, relation (33) of Theorem II boils down towhere

##### 5.4. Analysis

Functions and represent the same random variable, just expressed in different coordinates. The same holds for and . Consequently, covariances (111) and (116) must be equal. This can easily be verified explicitly as a consistency check. One starts with (111) and performs a substitution following (104). The corresponding Jacobian is equal to Straightforward manipulations then confirm that exactly as claimed.

According to (118), the l.h.s. of (110) and (115) are equal. However, the r.h.s. of (110) differs from the r.h.s. of (115), the latter being -dependent. This means in turn that statement (33) of our basic Theorem II does generally depend upon the coordinates chosen. Coordinate transformations can be thus exploited to optimize (strengthen) inequality (33).

#### 6. Conclusion and Prospects

In summary, our basic Theorem II stated above generalizes a recently derived random inequality of [20] to the case of multidimensional random variables. Six additional results (Theorems I-1 and I-2, Theorems II-1 to II-4) then apply Theorem I or Theorem II to different frequently encountered statistical distributions (multiuniform, multinomial, normal, and multinormal). Furthermore, we show in Section 5 that the basic inequality (33) of Theorem II is coordinate dependent (and thus optimizable by carrying out suitable coordinate transformations). These formulas and insights could be useful for making estimates in multivariate statistics.

Finally, we list three open questions that may be worth examining in the future:

(i) We have assumed above that is a convex open set. A nonconvex path-connected open set can in some cases be made convex via a suitable coordinate transformation (this requires at least that the set be homeomorphic to a convex one, which excludes, e.g., an annulus). In such cases Theorem II is more general than stated in Section 3. It is not a priori clear, however, what would happen in the case when is a disconnected open set. This remains to be seen.

(ii) An optimization of inequality (33) via coordinate transformations represents a very promising direction of further research.

(iii) It might be desirable to add a real-life application that would demonstrate the practical value of our inequality (33). For example, one may think about applications in physics or economics.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by the Natural Science Foundation of Hainan Province (Grant no. 2018CXTD338; contribution rate: 30%), the National Natural Science Foundation of China (Grant no. 11761027; contribution rate: 50%), the Scientific Research Foundation of Hainan Province Education Bureau (Grant no. Hnky2016-14; contribution rate: 10%), and the Educational Reform Foundation of Hainan Province Education Bureau (Grant no. Hnjg2017ZD-13; contribution rate: 10%).