Abstract

The paper is a contribution to the problem of estimating the deviation of two discrete probability distributions in terms of the supremum distance between their generating functions over the interval $[0,1]$. The deviation can be measured by the difference of the $n$th terms or by the total variation distance. Our new bounds have a better order of magnitude than those proved previously, and they are even sharp in certain cases.

1. Introduction

Dealing with random combinatorial structures often requires estimating the deviation of two discrete probability distributions in terms of the maximal distance between their generating functions over the interval $[0,1]$. This is typically the case when, given a collection of not necessarily independent random events, one needs to estimate the number of those that occur. Among the many popular methods of Poisson approximation, sieve methods with generalized Bonferroni bounds, such as the graph sieve [1], are at hand. They provide estimates not only for the probability that none of the events occur, but also for the difference between the generating function of the number of occurring events and that of the corresponding Poisson distribution over the interval $[0,1]$ (see [2] for more details). This raises the following problem.

Let $P=(p_0,p_1,\dots)$ and $Q=(q_0,q_1,\dots)$ be discrete probability distributions with generating functions $f(x)=\sum_{n=0}^{\infty}p_nx^n$ and $g(x)=\sum_{n=0}^{\infty}q_nx^n$, respectively. Given $\varepsilon=\sup_{0\le x\le1}|f(x)-g(x)|$, estimate the distance of the $n$th terms, $|p_n-q_n|$, or the total variation distance between $P$ and $Q$ defined as $d_{TV}(P,Q)=\frac12\sum_{n=0}^{\infty}|p_n-q_n|$.
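By way of illustration, the following sketch (mine, not from the paper; the choice of distributions, the perturbation, and the grid are arbitrary) approximates the supremum distance on a grid of $[0,1]$ and computes the total variation distance for a Poisson distribution and a small perturbation of it.

```python
# Approximate eps = sup_{0<=x<=1} |f(x) - g(x)| and compute d_TV(P, Q)
# for two concrete distributions (illustrative example only).
import numpy as np
from math import exp

n_max = 60
lam = 1.0
q = np.empty(n_max + 1)
q[0] = exp(-lam)
for k in range(1, n_max + 1):          # Poisson(1) probabilities, iteratively
    q[k] = q[k - 1] * lam / k
p = q.copy()
p[3] -= 0.001                          # a small perturbation keeping sum(p) = 1
p[4] += 0.001

x = np.linspace(0.0, 1.0, 10001)
powers = x[None, :] ** np.arange(n_max + 1)[:, None]   # x^n for each n
f, g = p @ powers, q @ powers          # generating functions on the grid

eps = np.max(np.abs(f - g))            # supremum distance (grid approximation)
tv = 0.5 * np.sum(np.abs(p - q))       # total variation distance
print(f"sup distance ~ {eps:.2e}, total variation = {tv:.2e}")
```

Note that the supremum distance can be much smaller than the total variation distance, which is exactly the phenomenon the paper quantifies.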

The difficulty lies in the constraint that the difference of the generating functions is only available over the real interval $[0,1]$, and not over the whole complex unit disc, which would make it possible to apply standard methods of characteristic functions.

Several positive and negative results have been achieved in the last three decades, beginning with [3]; see Section 2. Lower and upper estimates have come closer and closer, but the final answer is still out of reach. The aim of the present note is to provide new bounds that have a better order of magnitude than those proved previously. They are even sharp in certain cases.

The paper is organized as follows. In Section 2 we introduce the necessary notions and notation and cite some earlier results. Section 3 is devoted to the case where $|p_n-q_n|$ is estimated in terms of $\varepsilon$, while in Section 4 the total variation distance is treated.

2. Preliminaries

The difference of two probability generating functions belongs to the following class. Let $\mathcal{H}$ denote the set of all functions $h(x)=\sum_{n=0}^{\infty}c_nx^n$ such that $$\sum_{n=0}^{\infty}|c_n|\le2,\qquad h(1)=\sum_{n=0}^{\infty}c_n=0.$$ On the other hand, every $h\in\mathcal{H}$ can be obtained as the difference of two probability generating functions, namely, $h=f-g$, where $f$ and $g$ can be constructed in the following way. Let $c_n^+=\max(c_n,0)$ and $c_n^-=\max(-c_n,0)$ (note that $c_n^+c_n^-=0$ for every $n$) such that $$\sum_{n=0}^{\infty}c_n^+=\sum_{n=0}^{\infty}c_n^-\le1,$$ where the equality of the two sums follows from $h(1)=0$. Then set $p_n=c_n^++r_n$ and $q_n=c_n^-+r_n$, where the $r_n$ are arbitrary nonnegative numbers with $\sum_{n=0}^{\infty}r_n=1-\sum_{n=0}^{\infty}c_n^+$. For $\varepsilon>0$ let $\mathcal{H}(\varepsilon)=\bigl\{h\in\mathcal{H}:\sup_{0\le x\le1}|h(x)|\le\varepsilon\bigr\}$, and for $n\ge0$ and $\varepsilon>0$ define $$D_n(\varepsilon)=\sup\bigl\{|c_n|:h\in\mathcal{H}(\varepsilon)\bigr\}.$$
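A minimal sketch of this decomposition, assuming the version just described (the exact recipe for distributing the leftover mass $r_n$ is not essential; here all of it is placed on the constant term):

```python
# Split coefficients c (sum(c) == 0, sum(|c|) <= 2) into two probability
# distributions p, q with p - q == c, as described above.
import numpy as np

def split(c):
    assert abs(c.sum()) < 1e-12 and np.abs(c).sum() <= 2 + 1e-12
    c_plus, c_minus = np.maximum(c, 0.0), np.maximum(-c, 0.0)
    r = np.zeros_like(c)
    r[0] = 1.0 - c_plus.sum()          # leftover probability mass, put at n = 0
    return c_plus + r, c_minus + r

c = np.array([0.0, 0.3, -0.3, 0.2, -0.2])
p, q = split(c)
print(p, q, p.sum(), q.sum(), np.allclose(p - q, c))
```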

In [2] it was shown that, for every $n$, $$\frac{c}{\log(1/\varepsilon)}\le D_n(\varepsilon)\le\frac{C\,\log\log(1/\varepsilon)}{\log(1/\varepsilon)} \tag{5}$$ holds with suitable positive constants $c$ and $C$, if $\varepsilon$ is sufficiently small. In fact, the upper estimate is valid for every $\varepsilon\in(0,1)$, but it becomes trivial for larger $\varepsilon$ by elementary calculus. Though in both the upper and lower bounds the multiplier of $1/\log(1/\varepsilon)$ is slowly varying as $\varepsilon\to0$, the two bounds are not of the same order of magnitude.

It is easy to see that $\sum_{n=0}^{\infty}|c_n|$ cannot be estimated in a nontrivial way, because for arbitrary $\varepsilon>0$ there exists a function $h\in\mathcal{H}(\varepsilon)$ such that $h\ne0$ and $\sum_{n=0}^{\infty}|c_n|=2$, maximal. Indeed, let $m\ge1/\varepsilon$ and $$h(x)=x^m-x^{m+1}.$$ Then $|h(x)|\le\varepsilon$ for $0\le x<1$, because $$h(x)=x^m(1-x)\le\frac{m^m}{(m+1)^{m+1}}<\frac{1}{m+1}\le\varepsilon$$ by the choice of $m$, provided $\varepsilon\le1$. For $x=1$ the estimation obviously holds, since $h(1)=0$. Hence the corresponding distributions, the point masses $P=\delta_m$ and $Q=\delta_{m+1}$, satisfy $d_{TV}(P,Q)=1$ however small $\varepsilon$ is.
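The following computation (my sketch; the values of $m$ are arbitrary) confirms the phenomenon numerically: the supremum distance of the two generating functions decays like $1/(em)$, while the total variation distance remains equal to $1$.

```python
# Point masses at m and m+1: f(x) = x^m, g(x) = x^(m+1), so
# f - g = x^m (1 - x), whose maximum on [0, 1] is about 1/(e*m),
# while d_TV stays equal to 1.
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
for m in (10, 100, 1000):
    sup_dist = np.max(x**m * (1.0 - x))
    print(f"m = {m:4d}: sup |f - g| = {sup_dist:.3e} (~ {1 / (np.e * m):.3e}), d_TV = 1")
```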

However, if $h=f-g$, and one of the distributions, say $Q$, is fixed (as in the case of Poisson approximation), then the class of feasible functions $h$ is smaller; thus the bounds for $|p_n-q_n|$ may decrease, and even the total variation distance can be estimated nontrivially. The following results can be found in [2].

Let $Q$ be a fixed discrete probability distribution such that $q_n>0$ for every $n$, and $$\liminf_{n\to\infty}\frac{q_{n+1}}{q_n}>0. \tag{8}$$

Let $n$ be a positive integer and $c$ a sufficiently small positive constant. Then for every sufficiently small positive $\varepsilon$ there exists a discrete probability distribution $P$, such that $\sup_{0\le x\le1}|f(x)-g(x)|\le\varepsilon$, and $$|p_n-q_n|\ge\frac{c}{\log(1/\varepsilon)}. \tag{9}$$

If the tail of $Q$ is lighter than exponential, the lower estimate decreases.

Instead of (8), suppose that $$\lim_{n\to\infty}\frac{-\log q_n}{\psi(n)} \tag{10}$$ is positive and finite, where $\psi$ is a positive, continuous, and increasing function, regularly varying at infinity, and $\psi(n)/n\to\infty$. Let $n$ be a positive integer and $c$ a sufficiently small positive constant. Then for every sufficiently small positive $\varepsilon$ there exists a discrete probability distribution $P$, such that $\sup_{0\le x\le1}|f(x)-g(x)|\le\varepsilon$, and $$|p_n-q_n|\ge\varepsilon\,e^{c\,\psi^{-1}(\log(1/\varepsilon))}. \tag{11}$$

Particularly, when $Q$ is the Poisson distribution with parameter $\lambda$, it follows that for every sufficiently small positive $\varepsilon$ there exists a discrete probability distribution $P$, such that $\sup_{0\le x\le1}|f(x)-g(x)|\le\varepsilon$, and $$|p_n-q_n|\ge\varepsilon^{1-c/\log\log(1/\varepsilon)}. \tag{12}$$ (The constant $c$ does not depend on $n$ and $\lambda$. The parameter $\lambda$ only appears in the bounds implicit in the phrase “sufficiently small.”)

Let us turn to the case of total variation. In [2], for every fixed $Q$ an increasing function $w$ was constructed in such a way that $w(0+)=0$ and $d_{TV}(P,Q)\le w(\varepsilon)$ for arbitrary $P$. However, apart from the case where the tail of $Q$ was extremely light, the function $w$ proved to be slowly varying at $0$, which is just a little bit better than nothing. For example, in the case of Poissonian $Q$, the following inequality was obtained: $$d_{TV}(P,Q)\le\frac{C}{\log\log(1/\varepsilon)}, \tag{13}$$ as $\varepsilon\to0$ in a suitable relation to $\lambda$.

Since $d_{TV}(P,Q)\ge\frac12|p_n-q_n|$, every lower estimate obtained for fixed $n$ will do for the total variation. However, if $Q$ does not decrease faster than exponential, that is, condition (8) is fulfilled, there is a lower estimate of the form $$d_{TV}(P,Q)\ge\varepsilon^{\gamma} \tag{14}$$ with an exponent $\gamma$ depending on $Q$.

When the tail of $Q$ is lighter than exponential, namely, condition (10) holds, then for every sufficiently small positive $\varepsilon$ the following lower estimate is valid: $$d_{TV}(P,Q)\ge\varepsilon\,e^{c\,\psi^{-1}(\log(1/\varepsilon))}, \tag{15}$$ with a constant $c$ depending on $Q$. Particularly, in the case where $Q$ is Poisson, $$d_{TV}(P,Q)\ge\varepsilon^{1-c/\log\log(1/\varepsilon)} \tag{16}$$ was proved.

3. Estimation for the Difference of the $n$th Terms

The following important result can be traced back to Markoff, 1892 [4], who dealt with the extremal properties of Chebyshev polynomials over the interval $[0,1]$; see Chapter 2 of [5]. The proof can be found in [6] or in [7].

Theorem 1. Let $P(x)=\sum_{k=0}^{n}a_kx^k$ be a polynomial of degree less than or equal to $n$, and suppose that $|P(x)|\le1$ for $0\le x\le1$. Then $$|a_k|\le|t_{n,k}|,\qquad k=0,1,\dots,n,$$ where $t_{n,k}$ denotes the coefficient of $x^k$ in the shifted Chebyshev polynomial $T_n(2x-1)$.
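A quick numerical illustration of this bound (my sketch, assuming the classical form of the theorem as restated above; the degree and the random trials are arbitrary choices):

```python
# Check that polynomials bounded by 1 on [0, 1] have coefficients dominated
# by those of the shifted Chebyshev polynomial T_n(2x - 1).
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial

n = 6
# Power-basis coefficients of T_n(2x - 1).
t_shift = Chebyshev.basis(n, domain=[0, 1]).convert(kind=Polynomial).coef

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 2001)
worst = 0.0
for _ in range(1000):
    p = Polynomial(rng.standard_normal(n + 1))
    p = p / np.max(np.abs(p(x)))       # rescale so that sup |p| = 1 on [0, 1]
    worst = max(worst, np.max(np.abs(p.coef) / np.abs(t_shift)))
print(f"largest coefficient ratio over 1000 trials: {worst:.3f} (should be <= 1)")
```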

Using this result, an upper bound can be proved without any restriction on the coefficients, which is of the same order as the lower bound on the left-hand side of (5).

Theorem 2. Let $h\in\mathcal{H}(\varepsilon)$. Then $$D_n(\varepsilon)\le\frac{C}{\log(1/\varepsilon)}$$ with an absolute constant $C$; in particular, $|p_n-q_n|\le C/\log(1/\varepsilon)$ for every $n$ whenever $\sup_{0\le x\le1}|f(x)-g(x)|\le\varepsilon$.

Proof. Suppose first that $\varepsilon$ is sufficiently small. Let $N$ be a suitably chosen integer and let $P$ be the corresponding auxiliary polynomial of degree at most $N$. Then the supremum of $P$ over $[0,1]$ can be estimated, and the resulting right-hand side is small enough. By Theorem 1 we then obtain the bound of the theorem, as claimed.
If $\varepsilon$ is not that small, then the upper bound, being greater than $2$, is trivial. Indeed, in that interval the bound is decreasing; hence it attains its minimum at the right endpoint, where its value can be computed directly. Stepping further from one integer to the next, the right-hand side gets multiplied by a certain factor; since the function involved is increasing, the bound remains not less than $2$.
Finally, it is easy to see that the right-hand side is decreasing for small $\varepsilon$, from which the second inequality follows.
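For orientation (my computation, not from the paper): Markov-type coefficient bounds amplify a supremum bound by roughly the size of the Chebyshev coefficients, and the sum of the absolute coefficients of $T_n(2x-1)$ equals $T_n(3)$, which grows like $(1+\sqrt2)^{2n}/2$.

```python
# Growth of the coefficient amplification factor: sum_k |t_nk| = T_n(3).
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial

for n in range(1, 11):
    coef = Chebyshev.basis(n, domain=[0, 1]).convert(kind=Polynomial).coef
    s = np.abs(coef).sum()
    approx = (1 + np.sqrt(2)) ** (2 * n) / 2
    print(f"n = {n:2d}: sum |t_nk| = {s:14.1f} ~ (1 + sqrt 2)^(2n) / 2 = {approx:14.1f}")
```

This exponential amplification is the reason a supremum distance of $\varepsilon$ translates only into a logarithmically small coefficient bound.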

If $Q$ satisfies (8), that is, its tail cannot decrease faster than exponential, then (9) implies that the estimate of Theorem 2 is sharp in the order of magnitude.

Theorem 3. Let $h=f-g$, where $P$ and $Q$ are discrete probability distributions. Suppose that $$\sum_{k>n}q_k\le\varphi(n),\qquad n=0,1,\dots, \tag{23}$$ where $\varphi$ is a positive, continuous, and strictly decreasing function tending to zero. Then $$|p_n-q_n|\le K^{N}\bigl(\varepsilon+\varphi(N)\bigr)$$ for every $n$ and every positive integer $N$, where $K$ is an absolute constant.

Proof. Let $N$ be a positive integer, and introduce the tail of the fixed distribution, $$v(x)=\sum_{k>N}q_kx^k.$$ Then $0\le v(x)\le\varphi(N)$ for $0\le x\le1$; hence it suffices to control the remaining part of $h$. Let $x_i=i/N$, $i=0,1,\dots,N$, and let us approximate $v$ with its Lagrange interpolation polynomial over the uniform grid $x_0,\dots,x_N$. That is, $$L(x)=\sum_{i=0}^{N}v(x_i)\prod_{j\ne i}\frac{x-x_j}{x_i-x_j}.$$ Then $v-L$ has $N+1$ zeros in the interval $[0,1]$; hence, for every $x\in[0,1]$ there exists a $\xi\in(0,1)$ such that $$v(x)-L(x)=\frac{v^{(N+1)}(\xi)}{(N+1)!}\prod_{i=0}^{N}(x-x_i).$$ Since $v^{(N+1)}$ has nonnegative coefficients, it is increasing in the interval $[0,1]$, and it can thus be estimated by its value at $1$; hence the interpolation error admits a uniform bound for all $x\in[0,1]$. Moreover, $0\le v(x_i)\le\varphi(N)$; thus the supremum of $L$ over $[0,1]$ is under control as well. From all these it follows that $h$ can be approximated on $[0,1]$, within an error of order $\varepsilon+\varphi(N)$, by a polynomial of degree at most $N$. Applying Theorem 1 in the same way as in the proof of Theorem 2 we get the corresponding coefficient bound if $n\le N$. Investigating the ratio of consecutive terms one can easily see that the bound can be brought to the form stated in the theorem.
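The interpolation step can be seen in action in the following sketch (mine; the Poisson tail and the degree are arbitrary choices): a rapidly decaying power-series tail is interpolated on the uniform grid, and the uniform error stays below the tail mass.

```python
# Interpolate the tail v(x) = sum_{k>N} q_k x^k of a Poisson generating
# function on the uniform grid i/N and measure the error on [0, 1].
import numpy as np
from math import exp

lam, N, K_MAX = 1.0, 8, 60
q = [exp(-lam)]
for k in range(1, K_MAX):
    q.append(q[-1] * lam / k)          # Poisson(1) probabilities

def v(t):                              # tail of the generating function
    return sum(q[k] * t**k for k in range(N + 1, K_MAX))

nodes = np.arange(N + 1) / N
coef = np.polyfit(nodes, [v(t) for t in nodes], N)     # degree-N interpolant
x = np.linspace(0.0, 1.0, 2001)
err = np.max(np.abs(np.polyval(coef, x) - [v(t) for t in x]))
print(f"tail mass phi(N) = {sum(q[N + 1:]):.2e}, uniform interpolation error = {err:.2e}")
```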

Remark 4. Let us write $\varphi$ in exponential form: $\varphi(n)=e^{-\psi(n)}$, where $\psi$ is nonnegative, continuous, and tends increasingly to infinity. Then, choosing $N=\psi^{-1}(\log(1/\varepsilon))$, the bound of Theorem 3 becomes $2\,\varepsilon\,K^{\psi^{-1}(\log(1/\varepsilon))}$. The estimate of Theorem 3 is better than that of Theorem 2 in the order of magnitude only if the distribution has a tail lighter than exponential; that is, $\psi(n)/n\to\infty$.

Remark 5. If $\psi(n)/(n\log n)$ is bounded away from zero and infinity as well, then both the upper estimate of Theorem 3 and the lower estimate of (11) are applicable and they are of the same order of magnitude. Use the fact that for regularly varying functions $\psi$ of order $\rho$ we have $\psi^{-1}(ct)\sim c^{1/\rho}\,\psi^{-1}(t)$ as $t\to\infty$.

Remark 6. Reference [6] proved similar bounds with different constants, but of the same order of magnitude. However, they imposed conditions similar to (10) on the sequence $(c_n)$, rather than on $Q$, which is less useful for applications in probability. Besides, for the estimate of Theorem 2, which is true without any restriction on the coefficients, they needed exponential decay of the sequence $(c_n)$.

Remark 7. If $\psi$ is linear, that is, $\varphi(n)=e^{-an}$, and it holds with a sufficiently large $a$ ($a>\log K$ will do), the upper bound of Theorem 3 is better for small $\varepsilon$ than that of Theorem 2. (For distributions with bounded support a bound of order $\varepsilon$ is obviously the best possible.)

Particularly, let $Q$ be the Poisson distribution with mean $\lambda$. Then, for $n\ge2\lambda$, we have $$\sum_{k\ge n}q_k\le2q_n=\frac{2\lambda^ne^{-\lambda}}{n!}$$ (see Theorem A.15 in [8]); hence condition (23) holds with $\psi(n)=n\log n-O(n)$, and $$\psi^{-1}\bigl(\log(1/\varepsilon)\bigr)\sim\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)} \tag{35}$$ as $\varepsilon\to0$. In addition, the corresponding $\varphi$ is strictly decreasing eventually. Let us plug this back into Theorem 3 to get the following estimate.

Corollary 8. If $Q$ is Poisson, then, uniformly in $n$, one has $$|p_n-q_n|\le\varepsilon^{1-C/\log\log(1/\varepsilon)}$$ for every $\lambda$, if $\varepsilon$ is sufficiently small.

Note that the order of this upper bound is the same as that of the lower bound (12).
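To see the decay rate of the Poisson tail behind this corollary concretely, one can compare the logarithm of the tail with the first-order Stirling approximation $n\log n-n$ (my computation; $\lambda$ and the values of $n$ are arbitrary):

```python
# -log of the Poisson(1) tail versus the first-order approximation n log n - n.
from math import exp, log

lam = 1.0
pmf = [exp(-lam)]
for k in range(1, 100):
    pmf.append(pmf[-1] * lam / k)
for n in (5, 10, 20, 40):
    tail = sum(pmf[n:])
    print(f"n = {n:2d}: -log(tail) = {-log(tail):7.2f},  n log n - n = {n * log(n) - n:7.2f}")
```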

4. Estimation for the Total Variation Distance

Let again $h=f-g$, where $P$ and $Q$ are discrete probability distributions. As we have seen, if nothing is known about $P$ and $Q$, it is impossible to give a nontrivial upper bound for the total variation distance $$d_{TV}(P,Q)=\frac12\sum_{n=0}^{\infty}|p_n-q_n|.$$ However, when $Q$ is fixed, the situation is completely different.

Theorem 9. Let $h=f-g$, where $Q$ satisfies condition (23). Then $$d_{TV}(P,Q)\le2(N+1)K^{N}\bigl(\varepsilon+\varphi(N)\bigr)$$ for every positive integer $N$, where $K$ is the constant of Theorem 3.

The right-hand side tends to $0$ as $\varepsilon\to0$ only if the tail of $Q$ is not too heavy; namely, $\varphi(n)$ has to decay faster than $K^{-n}$.

The method of proof will be applied a couple of times in the sequel with different parameters. Therefore we formulate its essence in a separate lemma as a master inequality.

Lemma 10. Let $u$ and $v$ be positive real numbers, and let $N$ be a positive integer. Suppose $$|p_n-q_n|\le u\ \text{ for every }n\le N,\qquad\sum_{n>N}q_n\le v.$$ Then $$d_{TV}(P,Q)\le(N+1)u+v.$$

Proof. Let $A=\{0,1,\dots,N\}$; then $\sum_{n\notin A}q_n\le v$. Clearly, $$2\,d_{TV}(P,Q)=\sum_{n\in A}|p_n-q_n|+\sum_{n\notin A}|p_n-q_n|.$$ By supposition we have $$\sum_{n\in A}|p_n-q_n|\le(N+1)u,\qquad\sum_{n\notin A}|p_n-q_n|\le\sum_{n\notin A}p_n+\sum_{n\notin A}q_n\le(N+1)u+2v,$$ because $\sum_{n\notin A}p_n=\sum_{n\notin A}q_n+\sum_{n\in A}(q_n-p_n)$. Hence $$d_{TV}(P,Q)\le(N+1)u+v.$$
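A numerical spot-check of the master inequality in the form reconstructed above (my sketch; the example distributions and the cutoff are arbitrary):

```python
# Verify d_TV(P, Q) <= (N + 1) u + v on random examples, where
# u = max_{n <= N} |p_n - q_n| and v = sum_{n > N} q_n.
import numpy as np

rng = np.random.default_rng(1)
N = 20
for _ in range(5):
    p = rng.random(50); p /= p.sum()
    q = rng.random(50); q /= q.sum()
    u = np.max(np.abs(p[:N + 1] - q[:N + 1]))
    v = np.sum(q[N + 1:])
    tv = 0.5 * np.sum(np.abs(p - q))
    print(f"d_TV = {tv:.4f} <= (N + 1) u + v = {(N + 1) * u + v:.4f}")
```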

Proof of Theorem 9. Starting from Theorem 3, let us apply Lemma 10 with $u=K^{N}(\varepsilon+\varphi(N))$, $v=\varphi(N)$, and $N$ arbitrary. We get that $$d_{TV}(P,Q)\le(N+1)K^{N}\bigl(\varepsilon+\varphi(N)\bigr)+\varphi(N)\le2(N+1)K^{N}\bigl(\varepsilon+\varphi(N)\bigr).$$

Particularly, let $Q$ be the Poisson distribution with mean $\lambda$. As we have seen in (35), $$\psi^{-1}\bigl(\log(1/\varepsilon)\bigr)\sim\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)}.$$ Let us plug this into Theorem 9. Writing $2C$ in place of $C$ in the exponent we can get rid of the term $N+1$, and even the constant multiplier gets eventually absorbed. Thus we obtain the following estimate.

Corollary 11. Let $Q$ be Poisson; then, uniformly in $\lambda$, one has $$d_{TV}(P,Q)\le\varepsilon^{1-2C/\log\log(1/\varepsilon)}$$ if $\varepsilon$ is sufficiently small.

This is already similar to the lower bound (16), and is much better than (13).

If the tail of $Q$ is subexponential, that is, $\psi(n)/n\to0$, then the estimate of Theorem 3 is useless: it tends to infinity as $\varepsilon\to0$. However, with suitably chosen parameters in the master inequality, a reasonable upper bound can be obtained: not really sharp, but at least not trivial.

Theorem 12. Suppose only that $h=f-g$, with $\varepsilon=\sup_{0\le x\le1}|f(x)-g(x)|$. Then, for every positive integer $N$, $$d_{TV}(P,Q)\le\frac{C(N+1)}{\log(1/\varepsilon)}+\sum_{n>N}q_n.$$

Proof. This time we use Theorem 2. Let $u=C/\log(1/\varepsilon)$, $v=\sum_{n>N}q_n$, and $N$ arbitrary. Then Lemma 10 gives the stated bound. Since $N$ can only be chosen moderately large, the first term on the right-hand side can be estimated by a positive power of $1/\log(1/\varepsilon)$, while for a subexponential tail the second term decreases more slowly than any positive power of $1/\log(1/\varepsilon)$ as $\varepsilon\to0$; thus it will eventually dominate.

Corollary 13. If the $r$th moment $\mu_r=\sum_{n}n^rq_n$ of $Q$ is finite, then $\sum_{n>N}q_n\le\mu_rN^{-r}$; thus $$d_{TV}(P,Q)\le C_r\bigl(\log(1/\varepsilon)\bigr)^{-r/(r+1)}.$$
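The moment-to-tail step is just Markov's inequality; the following check (my sketch; the distribution $q_n\propto n^{-3.5}$ and $r=2$ are arbitrary choices) illustrates it numerically.

```python
# Markov's inequality for tails: sum_{n > N} q_n <= mu_r / N^r.
import numpy as np

n = np.arange(1, 20001)
q = 1.0 / n**3.5
q /= q.sum()                     # a distribution with finite 2nd moment
r = 2
mu_r = np.sum(n**r * q)
for N in (10, 100, 1000):
    tail = q[n > N].sum()
    print(f"N = {N:5d}: tail = {tail:.3e} <= mu_r / N^r = {mu_r / N**r:.3e}")
```

Balancing the two terms of Theorem 12 with $N\asymp(\log(1/\varepsilon))^{1/(r+1)}$ then yields the exponent $-r/(r+1)$.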

Finally, let us deal with the case of exponentially decaying tails. Let $\varphi(n)=e^{-an}$, with $a>0$. Then the total variation distance can be estimated by a positive power of $\varepsilon$. This follows from Theorem 9, but only for $a>\log K$. The following theorem is valid for all positive $a$.

Theorem 14. Suppose $\sum_{k>n}q_k\le e^{-an}$, with some $a>0$. Then $$d_{TV}(P,Q)\le\varepsilon^{\delta}$$ for sufficiently small $\varepsilon$, with a positive exponent $\delta$ depending on $a$.

As it is not sharp, the concrete form of the exponent holds no interest; what really matters is that it is positive and tends to $1$ as $a\to\infty$. Note that if $q_n$ tends to $0$ at an exponential rate, (14) provides a lower bound which is also a positive power of $\varepsilon$ (from Theorem 2.4 of [2] it follows that an explicit exponent depending on $Q$ can be used).

Proof. Firstly, suppose $a$ is below the threshold. Based on Theorem 2, apply Lemma 10 with suitable $u$, $v$, and $N$, obtaining a two-term bound. Let us deal with the first term on the right-hand side: by the choice of the parameters it can be estimated by a positive power of $\varepsilon$. Secondly, let $a$ exceed the threshold. Now we apply Theorem 3. Note that condition (23) holds with $\varphi(n)=e^{-an}$, and set $u=K^{N}(\varepsilon+e^{-aN})$, $v=e^{-aN}$, and $N$ proportional to $\log(1/\varepsilon)$. Then we obtain a similar two-term bound. Again, the first term can be estimated by the second one, because of the choice of $N$. Thus, it follows that $d_{TV}(P,Q)\le\varepsilon^{\delta}$ for sufficiently small $\varepsilon$, as needed.

Competing Interests

The author declares that there are no competing interests regarding the publication of this paper.

Acknowledgments

This research has been supported by the Hungarian Scientific Research Fund OTKA, Grant no. K 108615. The idea of the proof of Theorem 3 is due to Gábor Halász, to whom the author is indebted for his invaluable help.