Journal of Inequalities and Applications
Volume 2009 (2009), Article ID 941936, 20 pages
doi:10.1155/2009/941936
Research Article

Bounds for Tail Probabilities of the Sample Variance

1Vilnius Pedagogical University, Studentu 39, LT-08106 Vilnius, Lithuania
2IMAPP, Radboud University Nijmegen, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands

Received 11 February 2009; Accepted 20 June 2009

Academic Editor: Andrei Volodin

Copyright © 2009 V. Bentkus and M. Van Zuijlen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We provide bounds for tail probabilities of the sample variance. The bounds are expressed in terms of Hoeffding functions and are the sharpest known. They are designed with applications in auditing and in the processing of environmental data in mind.

1. Introduction and Results

Let $X, X_1, \dots, X_n$ be a random sample of independent identically distributed observations. Throughout we write
$$\mu = \mathbb{E}X, \qquad \sigma^2 = \mathbb{E}(X-\mu)^2, \qquad \omega = \mathbb{E}(X-\mu)^4 \tag{1.1}$$
for the mean, variance, and fourth central moment of $X$, and assume that $n \ge 2$. Some of our results hold only for bounded random variables. In such cases we assume without loss of generality that $0 \le X \le 1$. Note that $0 \le X \le 1$ is a natural condition in audit applications.

The sample variance $\hat\sigma^2$ of the sample $X_1, \dots, X_n$ is defined as
$$\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(X_i - \bar X\bigr)^2, \tag{1.2}$$
where $\bar X$ is the sample mean, $n\bar X = X_1 + \dots + X_n$. We can rewrite (1.2) as
$$\hat\sigma^2 = \frac{1}{n(n-1)}\sum_{i \ne j,\ 1 \le i,j \le n}\frac{(X_i - X_j)^2}{2}. \tag{1.3}$$
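The equivalence of (1.2) and (1.3) can be checked numerically. The following is a minimal sketch; the function names and the sample `data` are illustrative assumptions, not part of the paper.

```python
from itertools import permutations


def sample_variance_centered(xs):
    """Sample variance via (1.2): squared deviations from the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_pairwise(xs):
    """Sample variance via (1.3): average of (x_i - x_j)^2 / 2 over ordered pairs i != j."""
    n = len(xs)
    total = sum((a - b) ** 2 / 2 for a, b in permutations(xs, 2))
    return total / (n * (n - 1))


# Hypothetical sample with 0 <= x <= 1, as in the audit setting.
data = [0.1, 0.4, 0.35, 0.8, 0.0]
print(sample_variance_centered(data), sample_variance_pairwise(data))
```

The pairwise form (1.3) exhibits the sample variance as a U-statistic with kernel $(x-y)^2/2$; the kernel values for a sample in $[0,1]$ lie in $[0,1/2]$.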

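We are interested in deviations of the statistic $\hat\sigma^2$ from its mean $\sigma^2 = \mathbb{E}\hat\sigma^2$, that is, in bounds for the tail probabilities of the statistic $T = \sigma^2 - \hat\sigma^2$,
$$\mathbb{P}\{T \ge t\} = \mathbb{P}\{\hat\sigma^2 \le \sigma^2 - t\}, \quad 0 \le t \le \sigma^2, \tag{1.4}$$
$$\mathbb{P}\{T \le -t\} = \mathbb{P}\{\hat\sigma^2 \ge \sigma^2 + t\}, \quad t \ge 0. \tag{1.5}$$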

The paper is organized as follows. In the introduction we give a description of bounds, some comments, and references. In Section 2 we obtain sharp upper bounds for the fourth moment. In Section 3 we give proofs of all facts and results from the introduction.

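If $0 \le X \le 1$, then the range of interest in (1.5) is $0 \le t \le \gamma^2$, where
$$\gamma^2 = \begin{cases} \dfrac14 - \sigma^2 + \dfrac{1}{4(n-1)}, & \text{if } n \text{ is even},\\[1ex] \dfrac14 - \sigma^2 + \dfrac{1}{4n}, & \text{if } n \text{ is odd}. \end{cases} \tag{1.6}$$
The restriction $0 \le t \le \sigma^2$ on the range of $t$ in (1.4) (resp., $0 \le t \le \gamma^2$ in (1.5) in cases where the condition $0 \le X \le 1$ is fulfilled) is natural. Indeed, $\mathbb{P}\{T \ge t\} = 0$ for $t > \sigma^2$, due to the obvious inequality $\hat\sigma^2 \ge 0$. Furthermore, in the case of $0 \le X \le 1$ we have $\mathbb{P}\{T \le -t\} = 0$ for $t > \gamma^2$ since $\hat\sigma^2 \le \gamma^2 + \sigma^2$ (see Proposition 2.3 for a proof of the latter inequality).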
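The largest possible value $\gamma^2 + \sigma^2$ of the sample variance in (1.6) can be checked by brute force for small $n$. The sketch below assumes that the maximum over $[0,1]^n$ is attained at a sample consisting of zeros and ones (the sample variance is a convex quadratic form, so its maximum over a cube sits at a vertex); the source instead refers to Proposition 2.3 for the proof. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def sample_variance(xs):
    """Exact sample variance (1.2) using rational arithmetic."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def max_sample_variance(n):
    """Largest sample variance over all 0/1 samples of size n (brute force)."""
    return max(sample_variance(xs) for xs in product((0, 1), repeat=n))


def claimed_maximum(n):
    """The value gamma^2 + sigma^2 read off from (1.6)."""
    if n % 2 == 0:
        return Fraction(1, 4) + Fraction(1, 4 * (n - 1))
    return Fraction(1, 4) + Fraction(1, 4 * n)


for n in range(2, 9):
    print(n, max_sample_variance(n), claimed_maximum(n))
```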

The asymptotic (as $n \to \infty$) properties of $T$ (see Section 3 for proofs of (1.7) and (1.8)) can be used to test the quality of bounds for tail probabilities. Under the condition $\mathbb{E}X^4 < \infty$ the statistic $T = \sigma^2 - \hat\sigma^2$ is asymptotically normal provided that $X$ is not a Bernoulli random variable symmetric around its mean. Namely, if $\omega > \sigma^4$, then
$$\lim_{n\to\infty} \mathbb{P}\bigl\{\sqrt{n}\,T \ge y\sqrt{\omega - \sigma^4}\bigr\} = 1 - \Phi(y), \quad y \in \mathbb{R}. \tag{1.7}$$
If $\omega = \sigma^4$ (which happens if and only if $X$ is a Bernoulli random variable symmetric around its mean), then asymptotically $T$ has a $\chi^2$-type distribution, that is,
$$\lim_{n\to\infty} \mathbb{P}\bigl\{nT \ge y\sigma^2\bigr\} = \mathbb{P}\bigl\{\eta^2 - 1 \ge y\bigr\}, \quad y \in \mathbb{R}, \tag{1.8}$$
where $\eta$ is a standard normal random variable, and $\Phi(y) = \mathbb{P}\{\eta \le y\}$ is the standard normal distribution function.

Let us recall the already known bounds for the tail probabilities of the sample variance (see (1.19)–(1.21)). We need notation related to certain functions going back to Hoeffding [1]. Let $0 < p \le 1$ and $q = 1 - p$. Write
$$H(x;p) = \Bigl(1 + \frac{qx}{p}\Bigr)^{-qx-p}(1-x)^{qx-q}, \quad 0 \le x \le 1. \tag{1.9}$$
For $x \le 0$ we define $H(x;p) = 1$. For $x > 1$ we set $H(x;p) = 0$. Note that our notation for the function $H$ is slightly different from the traditional one. Let $\lambda \ge 0$. Introduce as well the function
$$\Pi(x;\lambda) = e^{x}\Bigl(1 + \frac{x}{\lambda}\Bigr)^{-x-\lambda} \quad \text{for } x \ge 0, \tag{1.10}$$
and $\Pi(x;\lambda) = 1$ for $x \le 0$. One can check that
$$H(x;p) \le \Pi\Bigl(x; \frac{p}{q}\Bigr). \tag{1.11}$$
All our bounds are expressed in terms of the function $H$. Using (1.11), it is easy to replace them by bounds expressed in terms of the function $\Pi$, and we omit the related formulations.
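The functions in (1.9) and (1.10) are straightforward to evaluate numerically. The sketch below works on the logarithmic scale to avoid overflow; the names `hoeffding_H` and `poisson_Pi` are our own, the value at $x = 1$ is taken as the limit $H(1;p) = p$, and $\lambda > 0$ is assumed for $\Pi$.

```python
import math


def hoeffding_H(x, p):
    """H(x; p) from (1.9), with H = 1 for x <= 0 and H = 0 for x > 1."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    if x <= 0:
        return 1.0
    if x > 1:
        return 0.0
    if x == 1:
        return p  # (1 + q/p)^(-1) = p, and (1 - x)^0 is read as 1
    q = 1.0 - p
    log_h = (-(p + q * x) * math.log1p(q * x / p)
             - q * (1.0 - x) * math.log(1.0 - x))
    return math.exp(log_h)


def poisson_Pi(x, lam):
    """Pi(x; lambda) from (1.10), assuming lambda > 0; Pi = 1 for x <= 0."""
    if lam <= 0:
        raise ValueError("this sketch assumes lambda > 0")
    if x <= 0:
        return 1.0
    return math.exp(x - (x + lam) * math.log1p(x / lam))


# Inequality (1.11) at a sample point: H(x; p) <= Pi(x; p / q).
x, p = 0.5, 0.5
print(hoeffding_H(x, p), poisson_Pi(x, p / (1 - p)))
```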

Let $0 \le p < 1$ and $\sigma^2 \ge 0$. Assume that
$$p = \frac{\sigma^2}{1+\sigma^2}, \qquad q = \frac{1}{1+\sigma^2}, \qquad p + q = 1. \tag{1.12}$$
Let $\varepsilon$ be a Bernoulli random variable such that $\mathbb{P}\{\varepsilon = -\sigma^2\} = q$ and $\mathbb{P}\{\varepsilon = 1\} = p$. Then $\mathbb{E}\varepsilon = 0$ and $\mathbb{E}\varepsilon^2 = \sigma^2$. The function $H$ is related to the generating function (the Laplace transform) of binomial distributions since
$$H(x;p) = \inf_{h>0} \exp\{-hx\}\,\mathbb{E}\exp\{h\varepsilon\}, \tag{1.13}$$
$$H^n(x;p) = \inf_{h>0} \exp\{-hnx\}\,\mathbb{E}\exp\{h(\varepsilon_1 + \dots + \varepsilon_n)\}, \tag{1.14}$$
where $\varepsilon_1, \dots, \varepsilon_n$ are independent copies of $\varepsilon$. Note that (1.14) is an obvious corollary of (1.13). We omit the elementary calculations leading to (1.13). In a similar way
$$\Pi(x;\lambda) = \inf_{h>0} \exp\{-hx\}\,\mathbb{E}\exp\{h(\eta - \lambda)\}, \tag{1.15}$$
where $\eta$ is a Poisson random variable with parameter $\lambda$.

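The functions $H$ and $\Pi$ satisfy a kind of Central Limit Theorem. Namely, for given $0 < p < 1$ and $y \ge 0$ we have
$$\lim_{n\to\infty} H^n\Bigl(y\,n^{-1/2}\sqrt{p/q};\,p\Bigr) = \lim_{n\to\infty} \Pi^n\Bigl(y\,n^{-1/2}\sqrt{\lambda};\,\lambda\Bigr) = \exp\Bigl\{-\frac{y^2}{2}\Bigr\} \tag{1.16}$$
(we omit the elementary calculations leading to (1.16)). Furthermore, we have [1]
$$H\Bigl(y\sqrt{p/q};\,p\Bigr) \le \exp\Bigl\{-\frac{y^2}{2}\Bigr\}, \quad \frac12 \le p < 1, \quad y \ge 0, \tag{1.17}$$
and we also have [2]
$$H\Bigl(\frac{yp}{q};\,p\Bigr) \le \exp\Bigl\{-\frac{p\,y^2}{2q(y+1)}\Bigr\}, \quad 0 < p \le \frac12, \quad y \ge 0. \tag{1.18}$$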

Using the notation introduced above, we can recall the known results (see [2, Lemma 3.2]). Let $k = [n/2]$ be the integer part of $n/2$. Assume that $0 \le X \le 1$. If $\sigma^2$ is known, then
$$\mathbb{P}\{T \ge t\} \le U_0, \qquad U_0 \stackrel{\mathrm{def}}{=} H^k\Bigl(\frac{t}{\sigma^2};\,1 - 2\sigma^2\Bigr). \tag{1.19}$$
The right-hand side of (1.19) is an increasing function of $\sigma^2 \le 1/4$ (see Section 3 for a short proof of (1.19) as a corollary of Theorem 1.1). If $\sigma^2$ is unknown but $\mu$ is known, then
$$\mathbb{P}\{T \ge t\} \le U_1, \qquad U_1 \stackrel{\mathrm{def}}{=} H^k\Bigl(\frac{t}{\mu - \mu^2};\,1 - 2\mu + 2\mu^2\Bigr). \tag{1.20}$$
Using the obvious estimate $\sigma^2 \le \mu(1-\mu)$, the bound (1.20) is implied by (1.19). In cases where both $\mu$ and $\sigma^2$ are not known, we have
$$\mathbb{P}\{T \ge t\} \le U_2, \qquad U_2 \stackrel{\mathrm{def}}{=} H^k\Bigl(4t;\,\frac12\Bigr), \tag{1.21}$$
as follows from (1.19) using the obvious bound $\sigma^2 \le 1/4$.
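To see how the three known bounds compare for a finite sample, one can evaluate (1.19)–(1.21) directly. This is a sketch only: the helper names are ours, the parameters (a sample of size 60 from the uniform distribution on $[0, 1/2]$, with $t = 0.01$) are chosen purely for illustration, and $0 < \mu < 1$, $0 < \sigma^2 \le 1/4$ are assumed.

```python
import math


def hoeffding_H(x, p):
    """Hoeffding function H(x; p) of (1.9); H = 1 for x <= 0, H = 0 for x > 1."""
    if x <= 0:
        return 1.0
    if x > 1:
        return 0.0
    if x == 1:
        return p
    q = 1.0 - p
    return math.exp(-(p + q * x) * math.log1p(q * x / p)
                    - q * (1.0 - x) * math.log(1.0 - x))


def known_upper_bounds(n, t, mu, sigma2):
    """Return (U0, U1, U2) from (1.19)-(1.21) for sample size n."""
    k = n // 2                       # k = [n / 2]
    m = mu * (1.0 - mu)              # sigma^2 <= mu (1 - mu) when 0 <= X <= 1
    u0 = hoeffding_H(t / sigma2, 1.0 - 2.0 * sigma2) ** k
    u1 = hoeffding_H(t / m, 1.0 - 2.0 * m) ** k   # 1 - 2 mu + 2 mu^2 = 1 - 2 m
    u2 = hoeffding_H(4.0 * t, 0.5) ** k
    return u0, u1, u2


# Illustrative values: X uniform on [0, 1/2], n = 60, t = 0.01.
u0, u1, u2 = known_upper_bounds(n=60, t=0.01, mu=0.25, sigma2=1.0 / 48.0)
print(u0, u1, u2)
```

Since the right-hand side of (1.19) increases with $\sigma^2$, the three values come out ordered as $U_0 \le U_1 \le U_2$: the more that is known about the distribution, the sharper the bound.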

Let us note that the known bounds (1.19)–(1.21) are the best possible in the framework of an approach based on analysis of the variance, the use of exponential functions, and an inequality of Hoeffding (see (3.3)), which allows one to reduce the problem to the estimation of tail probabilities for sums of independent random variables. Our improvement is due to a careful analysis of the fourth moment, which turns out to be quite complicated; see Section 2. Briefly, the results of this paper are the following: we prove a general bound involving $\mu$, $\sigma^2$, and the fourth moment $\omega$; this general bound implies all other bounds, in particular a new precise bound involving $\mu$ and $\sigma^2$; we provide as well bounds for the lower tails $\mathbb{P}\{T \le -t\}$; we compare the bounds analytically, mostly for sufficiently large $n$.

From the mathematical point of view the sample variance is one of the simplest nonlinear statistics. Known bounds for tail probabilities are designed with linear statistics in mind, possibly also for dependent observations. See the seminal paper of Hoeffding [1] published in JASA. For further developments see Talagrand [3], Pinelis [4, 5], Bentkus [6, 7], Bentkus et al. [8, 9], and so forth. Our intention is to develop tools useful in the setting of nonlinear statistics, using the sample variance as a test statistic.

Theorem 1.1 extends and improves the known bounds (1.19)–(1.21). We can derive (1.19)–(1.21) from this theorem since we can estimate the fourth moment $\omega$ via various combinations of $\mu$ and $\sigma^2$ using the boundedness assumption $0 \le X \le 1$.

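Theorem 1.1. Let $k = [n/2]$ and $\omega_0 \ge 0$.
If $\mathbb{E}X^4 < \infty$ and $\omega \le \omega_0$, then $$\mathbb{P}\{T \ge t\} \le U, \qquad U \stackrel{\mathrm{def}}{=} H^k\Bigl(\frac{t}{\sigma^2};\,p\Bigr) \tag{1.22}$$ with $$p = \frac{\sigma^4 + \omega_0}{3\sigma^4 + \omega_0} = \frac{s^2}{1+s^2}, \qquad s^2 = \frac{\sigma^4 + \omega_0}{2\sigma^4}. \tag{1.23}$$
If $0 \le X \le 1$ and $\omega \le \omega_0$, then $$\mathbb{P}\{T \le -t\} \le L, \qquad L \stackrel{\mathrm{def}}{=} H^k\Bigl(\frac{2t}{1 - 2\sigma^2};\,p\Bigr) \tag{1.24}$$ with $$p = \frac{2\sigma^4 + 2\omega_0}{1 - 4\sigma^2 + 6\sigma^4 + 2\omega_0} = \frac{s^2}{1+s^2}, \qquad s^2 = \frac{2\sigma^4 + 2\omega_0}{(1 - 2\sigma^2)^2}. \tag{1.25}$$
Both bounds $U$ and $L$ are increasing functions of $p$, $\omega_0$, and $s^2$.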
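The two expressions for $p$ in each of (1.23) and (1.25) can be verified with exact rational arithmetic. The sketch below also illustrates one way the known parameters could arise: with the choice $\omega_0 = \sigma^2 - 3\sigma^4$ (our assumption, made for illustration; the source defers the derivation to Section 3) the parameters reduce to $1 - 2\sigma^2$ of (1.19) and $2\sigma^2$ of (1.26). Function names are hypothetical.

```python
from fractions import Fraction


def p_upper(sigma2, omega0):
    """Parameter p of (1.23): (closed form, s^2 / (1 + s^2))."""
    s2 = (sigma2 ** 2 + omega0) / (2 * sigma2 ** 2)
    closed = (sigma2 ** 2 + omega0) / (3 * sigma2 ** 2 + omega0)
    return closed, s2 / (1 + s2)


def p_lower(sigma2, omega0):
    """Parameter p of (1.25): (closed form, s^2 / (1 + s^2))."""
    s2 = (2 * sigma2 ** 2 + 2 * omega0) / (1 - 2 * sigma2) ** 2
    closed = ((2 * sigma2 ** 2 + 2 * omega0)
              / (1 - 4 * sigma2 + 6 * sigma2 ** 2 + 2 * omega0))
    return closed, s2 / (1 + s2)


# Moments of the uniform distribution on [0, 1/2], see (1.41).
sigma2, omega = Fraction(1, 48), Fraction(1, 1280)
print(p_upper(sigma2, omega), p_lower(sigma2, omega))

# Assumed choice omega_0 = sigma^2 - 3 sigma^4 (illustration only).
omega0 = sigma2 - 3 * sigma2 ** 2
print(p_upper(sigma2, omega0)[0], 1 - 2 * sigma2)
print(p_lower(sigma2, omega0)[0], 2 * sigma2)
```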

Remark 1.2. In order to derive upper confidence bounds we need only estimates of the upper tail $\mathbb{P}\{T \ge t\}$ (see [2]). To estimate the upper tail the condition $\mathbb{E}X^4 < \infty$ is sufficient. The lower tail $\mathbb{P}\{T \le -t\}$ has a different type of behavior, since to estimate it we do need the assumption that $X$ is a bounded random variable.

For $0 \le X \le 1$ Theorem 1.1 implies the known bounds (1.19)–(1.21) for the upper tail of $T$. It implies as well the bounds (1.26)–(1.29) for the lower tail. The lower tail has a slightly more complicated structure (cf. (1.26)–(1.29) with their counterparts (1.19)–(1.21) for the upper tail).

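If $\sigma^2$ is known, then
$$\mathbb{P}\{T \le -t\} \le L_0, \qquad L_0 \stackrel{\mathrm{def}}{=} H^k\Bigl(\frac{2t}{1 - 2\sigma^2};\,2\sigma^2\Bigr). \tag{1.26}$$
One can show (we omit details) that the bound $L_0$ is not an increasing function of $\sigma^2$. A slightly rougher inequality
$$\mathbb{P}\{T \le -t\} \le L_0', \qquad L_0' \stackrel{\mathrm{def}}{=} H^k\Bigl(2t;\,\frac{2\sigma^2}{1 + 2\sigma^2}\Bigr) \tag{1.27}$$
has the monotonicity property since $L_0'$ is an increasing function of $\sigma^2$. If $\mu$ is known, then using the obvious inequality $\sigma^2 \le \mu(1-\mu)$, the bound (1.27) yields
$$\mathbb{P}\{T \le -t\} \le L_1, \qquad L_1 \stackrel{\mathrm{def}}{=} H^k\Bigl(2t;\,\frac{2\mu - 2\mu^2}{1 + 2\mu - 2\mu^2}\Bigr). \tag{1.28}$$
If we have no information about $\mu$ and $\sigma^2$, then using $\sigma^2 \le 1/4$, the bound (1.27) implies
$$\mathbb{P}\{T \le -t\} \le L_2, \qquad L_2 \stackrel{\mathrm{def}}{=} H^k\Bigl(2t;\,\frac13\Bigr). \tag{1.29}$$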

The bounds above do not cover the situation where both $\mu$ and $\sigma^2$ are known. To formulate a related result we need additional notation. In the case of $0 \le X \le 1$ we use the notation
$$f_1 = (1-\mu)\Bigl(\frac12 - \mu\Bigr), \qquad f_3 = \mu\Bigl(\mu - \frac12\Bigr). \tag{1.30}$$
In view of the well-known upper bound $\sigma^2 \le \mu(1-\mu)$ for the variance of $0 \le X \le 1$, we can partition the set
$$D = \bigl\{(\mu,\sigma^2) \in \mathbb{R}^2 : 0 \le \mu \le 1,\ 0 \le \sigma^2 \le \mu(1-\mu)\bigr\} \tag{1.31}$$
of possible values of $\mu$ and $\sigma^2$ into a union $D = D_1 \cup D_2 \cup D_3$ of three subsets
$$D_1 = \bigl\{(\mu,\sigma^2) \in D : \sigma^2 \le f_1\bigr\}, \qquad D_3 = \bigl\{(\mu,\sigma^2) \in D : \sigma^2 \le f_3\bigr\}, \tag{1.32}$$
and $D_2 = D \setminus (D_1 \cup D_3)$; see Figure 1.

Figure 1: $D = D_1 \cup D_2 \cup D_3$.

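Theorem 1.3. Write $k = [n/2]$. Assume that $0 \le X \le 1$.
The upper tail of the statistic $T$ satisfies $$\mathbb{P}\{T \ge t\} \le U_3, \qquad U_3 \stackrel{\mathrm{def}}{=} H^k\Bigl(\frac{t}{\sigma^2};\,p_u\Bigr) \tag{1.33}$$ with $p_u = s^2/(1+s^2)$, where $$s^2 = \begin{cases} \dfrac{\sigma^4 + (1-\mu)^4}{2(1-\mu)^2\sigma^2}, & \text{if } (\mu,\sigma^2) \in D_1,\\[1.5ex] \dfrac{a + b\sigma^2 + 4\sigma^4}{8\sigma^4}, & \text{if } (\mu,\sigma^2) \in D_2,\\[1.5ex] \dfrac{\sigma^4 + \mu^4}{2\mu^2\sigma^2}, & \text{if } (\mu,\sigma^2) \in D_3, \end{cases} \tag{1.34}$$ and where one can write $$a = \mu(1-\mu)(2\mu-1)^2, \qquad b = 8\mu^2 - 8\mu + 3. \tag{1.35}$$
The lower tail of $T$ satisfies $$\mathbb{P}\{T \le -t\} \le L_3, \qquad L_3 \stackrel{\mathrm{def}}{=} H^k\Bigl(\frac{2t}{1 - 2\sigma^2};\,p_l\Bigr) \tag{1.36}$$ with $p_l = s^2/(c^2 + s^2)$, where $c = (1 - 2\sigma^2)/(2\sigma^2)$, and $s^2$ is defined by (1.34).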
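The case distinction in (1.34) is easy to mechanize. The following sketch classifies a pair $(\mu, \sigma^2)$ into $D_1$, $D_2$, or $D_3$ using (1.30) and evaluates $s^2$ exactly; the function names are ours, and $0 < \sigma^2 \le \mu(1-\mu)$ is assumed so that the regions $D_1$ and $D_3$ cannot both apply.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def region(mu, sigma2):
    """Return 1, 2 or 3 for D1, D2, D3, using f1 and f3 of (1.30)."""
    f1 = (1 - mu) * (HALF - mu)
    f3 = mu * (mu - HALF)
    if sigma2 <= f1:
        return 1
    if sigma2 <= f3:
        return 3
    return 2


def s2_D1(mu, sigma2):
    return (sigma2 ** 2 + (1 - mu) ** 4) / (2 * (1 - mu) ** 2 * sigma2)


def s2_D2(mu, sigma2):
    a = mu * (1 - mu) * (2 * mu - 1) ** 2      # (1.35)
    b = 8 * mu ** 2 - 8 * mu + 3               # (1.35)
    return (a + b * sigma2 + 4 * sigma2 ** 2) / (8 * sigma2 ** 2)


def s2_D3(mu, sigma2):
    return (sigma2 ** 2 + mu ** 4) / (2 * mu ** 2 * sigma2)


def s_squared(mu, sigma2):
    """s^2 of (1.34), assuming 0 < sigma2 <= mu (1 - mu)."""
    return {1: s2_D1, 2: s2_D2, 3: s2_D3}[region(mu, sigma2)](mu, sigma2)


# Uniform distribution on [0, 1]: mu = 1/2, sigma^2 = 1/12 lies in D2.
s2 = s_squared(HALF, Fraction(1, 12))
print(region(HALF, Fraction(1, 12)), s2, s2 / (1 + s2))
```

On the common boundary $\sigma^2 = f_1$ of $D_1$ and $D_2$ the two formulas give the same value, which is a useful sanity check when transcribing (1.34).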

The bounds above are obtained using the classical transform $G \mapsto HG$,
$$(HG)(x) = \inf_{h>0} \mathbb{E}\exp\{h(Y - x)\} \tag{1.37}$$
of survival functions $G(x) = \mathbb{P}\{Y \ge x\}$ (cf. definitions (1.13) and (1.14) of the related Hoeffding functions). The bounds expressed in terms of Hoeffding functions have a simple analytical structure and are easy to compute numerically.

All our upper and lower bounds satisfy a kind of Central Limit Theorem. Namely, if we consider an upper bound, say $U = U(t)$ (resp., a lower bound $L = L(t)$) as a function of $t$, then there exist limits
$$\lim_{n\to\infty} U\Bigl(\frac{t}{\sqrt n}\Bigr) = \exp\{-ct^2\}, \qquad \lim_{n\to\infty} L\Bigl(\frac{t}{\sqrt n}\Bigr) = \exp\{-dt^2\} \tag{1.38}$$
with some positive $c$ and $d$. The values of $c$ and $d$ can be used to compare the bounds: the larger these constants, the better the bound. To prove (1.38) it suffices to note that with $k = [n/2]$
$$\lim_{n\to\infty} H^k\Bigl(\frac{x}{\sqrt n};\,p\Bigr) = \exp\Bigl\{-\frac{q x^2}{4p}\Bigr\}. \tag{1.39}$$
The Central Limit Theorem in the form of (1.7) restricts the ranges of possible values of $c$ and $d$. Namely, using (1.7) it is easy to see that $c$ and $d$ have to satisfy
$$c, d \le a \stackrel{\mathrm{def}}{=} \frac{1}{2(\omega - \sigma^4)}. \tag{1.40}$$

We provide the values of these constants for all our bounds and give the numerical values of them in the following two cases.

(i) $X$ is a random variable uniformly distributed in the interval $[0, 1/2]$. The moments of this random variable satisfy
$$\mu = \frac14, \qquad \sigma^2 = \frac{1}{48}, \qquad (\mu,\sigma^2) \in D_1, \qquad \omega = \frac{1}{1280}, \qquad a = 1440. \tag{1.41}$$
For $\mu$, $\sigma^2$, $\omega$ defined by (1.41), we denote the constants $c$ and $d$ by $c_1$ and $d_1$.
(ii) $X$ is uniformly distributed in $[0, 1]$, and in this case
$$\mu = \frac12, \qquad \sigma^2 = \frac{1}{12}, \qquad (\mu,\sigma^2) \in D_2, \qquad \omega = \frac{1}{80}, \qquad a = 90. \tag{1.42}$$
For $\mu$, $\sigma^2$, $\omega$ defined by (1.42), we denote the constants $c$ and $d$ by $c_2$ and $d_2$.

We have
$$\begin{aligned} U_2 &:\ c = 4, & c_1 &= 4, & c_2 &= 4,\\ U_1 &:\ c = \frac{1}{(2\mu - 2\mu^2)(1 - 2\mu + 2\mu^2)}, & c_1 &= 4.26, & c_2 &= 4,\\ U_0 &:\ c = \frac{1}{2\sigma^2 - 4\sigma^4}, & c_1 &= 25.04, & c_2 &= 7.2,\\ U_3 &:\ c = \frac{1}{4\sigma^4 s^2}, & c_1 &= 42.60, & c_2 &= 18, \end{aligned} \tag{1.43}$$
$$U:\ c = \frac{1}{2\sigma^4 + 2\omega_0}, \qquad c_1 = 411.42, \qquad c_2 = 25.71, \tag{1.44}$$
$$\begin{aligned} L_2 &:\ d = 2, & d_1 &= 2, & d_2 &= 2,\\ L_1 &:\ d = \frac{1}{2\mu - 2\mu^2}, & d_1 &= 2.66, & d_2 &= 2,\\ L_0' &:\ d = \frac{1}{2\sigma^2}, & d_1 &= 24, & d_2 &= 6,\\ L_0 &:\ d = \frac{1}{2\sigma^2 - 4\sigma^4}, & d_1 &= 25.04, & d_2 &= 7.2,\\ L_3 &:\ d = \frac{1}{4\sigma^4 s^2}, & d_1 &= 42.60, & d_2 &= 18, \end{aligned} \tag{1.45}$$
$$L:\ d = \frac{1}{2\sigma^4 + 2\omega_0}, \qquad d_1 = 411.42, \qquad d_2 = 25.71. \tag{1.46}$$
In calculating the constants in (1.44) and (1.46) we choose $\omega_0 = \omega$. The quantity $s^2$ in (1.43) and (1.45) is defined by (1.34).
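The numerical constants above can be reproduced from the closed-form expressions. The sketch below does so with exact fractions for the two uniform examples; the dictionary keys and the function name are our own labels, the value of $s^2$ is computed inline from the relevant branch of (1.34), and `L0_rough` stands for the rougher bound (1.27).

```python
from fractions import Fraction as F


def limit_constants(mu, sigma2, omega, s2):
    """Constants c (upper bounds) and d (lower bounds), plus the ceiling a of (1.40)."""
    m = mu * (1 - mu)
    shared = {
        "0": 1 / (2 * sigma2 - 4 * sigma2 ** 2),
        "3": 1 / (4 * sigma2 ** 2 * s2),
        "general": 1 / (2 * sigma2 ** 2 + 2 * omega),   # omega_0 = omega
    }
    c = {"U2": F(4), "U1": 1 / (2 * m * (1 - 2 * m)),
         "U0": shared["0"], "U3": shared["3"], "U": shared["general"]}
    d = {"L2": F(2), "L1": 1 / (2 * m), "L0_rough": 1 / (2 * sigma2),
         "L0": shared["0"], "L3": shared["3"], "L": shared["general"]}
    a = 1 / (2 * (omega - sigma2 ** 2))
    return c, d, a


# Case (i): uniform on [0, 1/2], region D1 of (1.34).
mu1, var1, om1 = F(1, 4), F(1, 48), F(1, 1280)
s2_1 = (var1 ** 2 + (1 - mu1) ** 4) / (2 * (1 - mu1) ** 2 * var1)
c1, d1, a1 = limit_constants(mu1, var1, om1, s2_1)

# Case (ii): uniform on [0, 1], region D2 of (1.34) with a = 0, b = 1.
mu2, var2, om2 = F(1, 2), F(1, 12), F(1, 80)
s2_2 = (var2 + 4 * var2 ** 2) / (8 * var2 ** 2)
c2, d2, a2 = limit_constants(mu2, var2, om2, s2_2)

for name in c1:
    print(name, float(c1[name]), float(c2[name]))
for name in d1:
    print(name, float(d1[name]), float(d2[name]))
print("a", a1, a2)
```

The exact values show that the tabulated $4.26$ and $2.66$ are truncations of $64/15$ and $8/3$. Even the best constants ($2880/7 \approx 411.43$ and $180/7 \approx 25.71$) stay well below the ceilings $a = 1440$ and $a = 90$ permitted by (1.40).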

Conclusions
Our new bounds provide a substantial improvement of the known bounds. However, from the asymptotic point of view these bounds still seem to be rather crude. To improve the bounds further one needs new methods and approaches. Some preliminary computer simulations show that in applications where $n$ is finite and the random variables have small means and variances (as in auditing, where a typical value of $n$ is 60), the asymptotic behavior is not closely related to the behavior for small $n$. Therefore bounds specially designed to cover the case of finite $n$ have to be developed.

2. Sharp Upper Bounds for the Fourth Moment

Recall that we consider bounded random variables such that $0 \le X \le 1$, and that we write $\mu = \mathbb{E}X$ and $\sigma^2 = \mathbb{E}(X-\mu)^2$. In Lemma 2.1 we provide an optimal upper bound for the fourth moment of $X - \lambda$ given a shift $\lambda$, a mean $\mu$, and a variance $\sigma^2$. The maximizers of the fourth moment are either Bernoulli or trinomial random variables. It turns out that their distributions, say $\nu$, are of the following three types (i)–(iii):

(i) a two-point distribution such that
$$\nu(\{d\}) = r, \qquad \nu(\{1\}) = p, \qquad d = \mu - \frac{\sigma^2}{1-\mu}, \tag{2.1}$$
$$r = \frac{(1-\mu)^2}{(1-\mu)^2 + \sigma^2}, \qquad p = \frac{\sigma^2}{(1-\mu)^2 + \sigma^2}; \tag{2.2}$$
(ii) a family of three-point distributions depending on $1/4 < \lambda < 3/4$ such that
$$\nu(\{0\}) = q, \qquad \nu(\{d\}) = r, \qquad \nu(\{1\}) = p, \qquad d \equiv d_\lambda = 2\lambda - \frac12, \tag{2.3}$$
$$q = \frac{\sigma^2 - f_1}{d_\lambda}, \qquad r = \frac{\mu(1-\mu) - \sigma^2}{d_\lambda(1 - d_\lambda)}, \qquad p = \frac{\sigma^2 - f_3}{1 - d_\lambda}, \tag{2.4}$$
where we write
$$f_1 = (1-\mu)(\mu - d_\lambda), \qquad f_3 = \mu(d_\lambda - \mu); \tag{2.5}$$
notice that (2.4) supplies a three-point probability distribution only in cases where the inequalities $\sigma^2 > f_1$ and $\sigma^2 > f_3$ hold;
(iii) a two-point distribution such that
$$\nu(\{0\}) = q, \qquad \nu(\{d\}) = r, \qquad d = \mu + \frac{\sigma^2}{\mu}, \tag{2.6}$$
$$q = \frac{\sigma^2}{\mu^2 + \sigma^2}, \qquad r = \frac{\mu^2}{\mu^2 + \sigma^2}. \tag{2.7}$$

Note that the point $d$ in (2.2)–(2.7) satisfies $0 \le d \le 1$ and that the probability distribution $\nu$ has mean $\mu$ and variance $\sigma^2$.
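That each of the three distributions has total mass one, mean $\mu$, and variance $\sigma^2$ can be confirmed with exact arithmetic. The sketch below represents a distribution as a dictionary mapping atoms to probabilities; the function names and the test values of $\mu$, $\sigma^2$, $\lambda$ are illustrative assumptions, chosen so that the atoms are distinct.

```python
from fractions import Fraction as F


def two_point_at_one(mu, sigma2):
    """Type (i): atoms at d and 1, see (2.1)-(2.2)."""
    d = mu - sigma2 / (1 - mu)
    w = (1 - mu) ** 2 + sigma2
    return {d: (1 - mu) ** 2 / w, F(1): sigma2 / w}


def three_point(mu, sigma2, lam):
    """Type (ii): atoms at 0, d_lambda and 1, see (2.3)-(2.5)."""
    d = 2 * lam - F(1, 2)
    f1 = (1 - mu) * (mu - d)
    f3 = mu * (d - mu)
    return {F(0): (sigma2 - f1) / d,
            d: (mu * (1 - mu) - sigma2) / (d * (1 - d)),
            F(1): (sigma2 - f3) / (1 - d)}


def two_point_at_zero(mu, sigma2):
    """Type (iii): atoms at 0 and d, see (2.6)-(2.7)."""
    d = mu + sigma2 / mu
    w = mu ** 2 + sigma2
    return {F(0): sigma2 / w, d: mu ** 2 / w}


def moments(dist):
    """Return (total mass, mean, variance) of a finite discrete distribution."""
    total = sum(dist.values())
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return total, mean, var


mu = F(2, 5)
print(moments(two_point_at_one(mu, F(1, 20))))
print(moments(three_point(mu, F(1, 10), lam=F(2, 5))))
print(moments(two_point_at_zero(mu, F(1, 20))))
```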

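Introduce the set
$$D = \bigl\{(\mu,\sigma^2) \in \mathbb{R}^2 : \mu = \mathbb{E}X,\ \sigma^2 = \mathbb{E}(X-\mu)^2,\ 0 \le X \le 1\bigr\}. \tag{2.8}$$
Using the well-known bound $\sigma^2 \le \mu(1-\mu)$ valid for $0 \le X \le 1$, it is easy to see that
$$D = \bigl\{(\mu,\sigma^2) \in \mathbb{R}^2 : 0 \le \mu \le 1,\ 0 \le \sigma^2 \le \mu(1-\mu)\bigr\}. \tag{2.9}$$
Let $\lambda \in \mathbb{R}$. We represent the set $D \subset \mathbb{R}^2$ as a union $D = D_1^{\lambda} \cup D_2^{\lambda} \cup D_3^{\lambda}$ of three subsets setting
$$D_1^{\lambda} = \bigl\{(\mu,\sigma^2) \in D : \sigma^2 \le f_1\bigr\}, \qquad D_3^{\lambda} = \bigl\{(\mu,\sigma^2) \in D : \sigma^2 \le f_3\bigr\}, \tag{2.10}$$
and $D_2^{\lambda} = D \setminus (D_1^{\lambda} \cup D_3^{\lambda})$, where $f_1$ and $f_3$ are given in (2.5). Let us mention the following properties of the regions.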

(a) If $\lambda \le 1/4$, then $D = D_1^{\lambda}$ since for such $\lambda$ obviously $\mu(1-\mu) \le f_1$ for all $0 \le \mu \le 1$. The set $D_3^{\lambda} = \{(0,0)\}$ is a one-point set. The set $D_2^{\lambda}$ is empty.
(b) If $\lambda \ge 3/4$, then $D = D_3^{\lambda}$ since for such $\lambda$ clearly $\mu(1-\mu) \le f_3$ for all $0 \le \mu \le 1$. The set $D_1^{\lambda} = \{(1,0)\}$ is a one-point set. The set $D_2^{\lambda}$ is empty.

For $1/4 < \lambda < 3/4$ all three regions $D_1^{\lambda}$, $D_2^{\lambda}$, $D_3^{\lambda}$ are nonempty sets. The sets $D_1^{\lambda}$ and $D_3^{\lambda}$ have only one common point $(d_\lambda, 0) \in D$, that is, $D_1^{\lambda} \cap D_3^{\lambda} = \{(d_\lambda, 0)\}$.

Lemma 2.1. Let $\lambda \in \mathbb{R}$. Assume that a random variable $X$ satisfies $$0 \le X \le 1, \qquad \mathbb{E}X = \mu, \qquad \mathbb{E}(X-\mu)^2 = \sigma^2. \tag{2.11}$$ Then $$\mathbb{E}(X-\lambda)^4 \le \mathbb{E}(X^*-\lambda)^4 \tag{2.12}$$ with a random variable $X^*$ satisfying (2.11) and defined as follows:
(i) if $(\mu,\sigma^2) \in D_1^{\lambda}$, then $X^*$ is a Bernoulli random variable with distribution (2.2);
(ii) if $(\mu,\sigma^2) \in D_2^{\lambda}$, then $X^*$ is a trinomial random variable with distribution (2.4);
(iii) if $(\mu,\sigma^2) \in D_3^{\lambda}$, then $X^*$ is a Bernoulli random variable with distribution (2.7).

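Proof. Writing $Y = X - \lambda$, we have to prove that if
$$-\lambda \le Y \le 1 - \lambda, \qquad \mathbb{E}Y = \mu - \lambda, \qquad \mathbb{E}(Y - \mathbb{E}Y)^2 = \sigma^2, \tag{2.13}$$
then
$$\mathbb{E}Y^4 \le \mathbb{E}(Y^*)^4 \tag{2.14}$$
with $Y^* = X^* - \lambda$. Henceforth we write $a = d - \lambda$, so that $Y^*$ can assume only the values $-\lambda$, $a$, $1-\lambda$ with probabilities $q$, $r$, $p$ defined in (2.2)–(2.7), respectively. The distribution $\varrho = \mathcal{L}(Y^*)$ is related to the distribution $\nu = \mathcal{L}(X^*)$ as $\varrho(B) = \nu(B + \lambda)$ for all $B \subset \mathbb{R}$.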
Formally, in our proof we do not need the description (2.17) of measures $\varrho$ satisfying (2.15). However, the description helps to understand the idea of the proof. Let $a \in \mathbb{R}$ and $\sigma^2 \ge 0$. Assume that a signed measure $\varrho$ of subsets of $\mathbb{R}$ is such that the total variation measure $\varrho^{+} + \varrho^{-}$ is a discrete measure concentrated in a three-point set $\{-\lambda, a, 1-\lambda\}$ and
$$\int \varrho(dx) = 1, \qquad \int x\,\varrho(dx) = \mu - \lambda, \qquad \int (x - \mu + \lambda)^2\,\varrho(dx) = \sigma^2. \tag{2.15}$$
Then $\varrho$ is a uniquely defined measure such that
$$q \stackrel{\mathrm{def}}{=} \varrho(\{-\lambda\}), \qquad r \stackrel{\mathrm{def}}{=} \varrho(\{a\}), \qquad p \stackrel{\mathrm{def}}{=} \varrho(\{1-\lambda\}) \tag{2.16}$$
satisfy
$$q = \frac{\sigma^2 + (a - \mu + \lambda)(1-\mu)}{a + \lambda}, \qquad r = \frac{\mu(1-\mu) - \sigma^2}{(a+\lambda)(1 - a - \lambda)}, \qquad p = \frac{\sigma^2 - (a - \mu + \lambda)\mu}{1 - a - \lambda}. \tag{2.17}$$
We omit the elementary calculations leading to (2.17). The calculations amount to solving systems of linear equations.
Let $a, b, c \in \mathbb{R}$. Consider the polynomial
$$P(t) = (t - c)(b - t)(t - a)^2 \equiv c_0 + c_1 t + c_2 t^2 + c_3 t^3 - t^4, \quad t \in \mathbb{R}. \tag{2.18}$$
It is easy to check that
$$c_3 = 0 \iff b + c + 2a = 0. \tag{2.19}$$
The proofs of (i)–(iii) differ only in technical details. In all cases we find $a$, $b$, and $c$ (depending on $\lambda$, $\mu$, and $\sigma^2$) such that the polynomial $P$ defined by (2.18) satisfies $P(t) \ge 0$ for $-\lambda \le t \le 1 - \lambda$, and such that the coefficient $c_3$ in (2.18) vanishes, $c_3 = 0$. Using $c_3 = 0$, the inequality $P(t) \ge 0$ is equivalent to $t^4 \le c_0 + c_1 t + c_2 t^2$, which, after taking expectations, leads to $\mathbb{E}Y^4 \le c_0 + c_1(\mu - \lambda) + c_2\,\mathbb{E}Y^2$, where $\mathbb{E}Y^2 = \sigma^2 + (\mu-\lambda)^2$ by (2.13). We note that the random variable $Y^*$ assumes its values in the set
$$\{t : P(t) = 0\} = \bigl\{t : c_0 + c_1 t + c_2 t^2 = t^4\bigr\}. \tag{2.20}$$
Since $Y^*$ has the same first two moments as $Y$, we therefore have
$$\mathbb{E}Y^4 \le c_0 + c_1(\mu - \lambda) + c_2\bigl(\sigma^2 + (\mu-\lambda)^2\bigr) = \mathbb{E}(Y^*)^4, \tag{2.21}$$
which proves the lemma.
(i) Now $(\mu,\sigma^2) \in D_1^{\lambda}$. We choose $c = 1 - \lambda$ and $a = \mu - \lambda - \sigma^2/(1-\mu)$. In order to ensure $c_3 = 0$ (cf. (2.19)) we have to take
$$b = -c - 2a \equiv -2\mu - 1 + 3\lambda + \frac{2\sigma^2}{1-\mu}. \tag{2.22}$$
If $b \le -\lambda$, then $P(t) \ge 0$ for all $-\lambda \le t \le 1 - \lambda$. The inequality $b \le -\lambda$ is equivalent to
$$\sigma^2 \le (1-\mu)\Bigl(\mu - 2\lambda + \frac12\Bigr) \equiv f_1 \iff (\mu,\sigma^2) \in D_1^{\lambda}. \tag{2.23}$$
To complete the proof we note that the random variable $Y^* = X^* - \lambda$ with $X^*$ defined by (2.2) assumes its values in the set $\{a, 1-\lambda\} \subset \{t : P(t) = 0\}$. To find the distribution of $Y^*$ we use (2.17). Setting $a = \mu - \lambda - \sigma^2/(1-\mu)$ in (2.17) we obtain $q = 0$ and $r$, $p$ as in (2.2).
(ii) Now $(\mu,\sigma^2) \in D_2^{\lambda}$ or, equivalently, $\sigma^2 > f_1$ and $\sigma^2 > f_3$. Moreover, we can assume that $1/4 < \lambda < 3/4$ since only for such $\lambda$ is the region $D_2^{\lambda}$ nonempty. We choose $c = 1 - \lambda$ and $b = -\lambda$. Then