A Winner's Mean Earnings in Lottery and Inverse Moments of the Binomial Distribution
We study the mean earnings of a lottery winner as a function of the number $N$ of participants in the lottery and of the success probability $p$. We show, in particular, that, for fixed $p$, there exists an optimal value of $N$ where the mean earnings are maximized. We also establish a relation with the inverse moments of a binomial distribution and suggest new formulas (exact and approximate) for them.
1. Introduction
The game of lottery is both popular and simple. Focusing on the essentials, and leaving aside secondary features that vary across lottery implementations, the rules of the game are as follows. Each player submits to the lottery organizers a ticket consisting of $k$ integers (selected by the player, without repetitions, selection order being unimportant) from the range $\{1, \dots, n\}$, within the prespecified time period the game is set to last. When this period expires, no more tickets are accepted, and the lottery organizers select a $k$-tuple of distinct integers (the “winning set”) uniformly at random. Each submitted ticket is then compared against the winning set, and, if they match, the corresponding player “wins.” This is known as a $k/n$ lottery system, and the winning probability is clearly $p = 1/\binom{n}{k}$. The money the winners earn depends on the number of submitted tickets. Each submitted $k$-tuple incurs a certain fee, which we assume, without loss of generality, to be equal to 1. Some fixed ratio of the total sum collected is returned as prize money to the winners and split equally among them; we assume, again without loss of generality, that this ratio is 100%, namely, the entire sum. As long as no winner is found, the earnings of earlier games accumulate until winners are found, who then split the entire sum equally. This feature, known as rollover, is important in the implementation of lottery systems in practice.
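The following is a minimal Python sketch of the winning probability $p = 1/\binom{n}{k}$ for a ticket of $k$ numbers drawn from $\{1,\dots,n\}$. The helper name `winning_probability` is our own, and the 6/49 parameters serve only as an illustration.

```python
from math import comb


def winning_probability(n: int, k: int) -> float:
    """Probability that a single ticket of k distinct numbers from {1, ..., n}
    coincides with the winning k-set, which is drawn uniformly at random."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return 1 / comb(n, k)


# Example: a 6/49 system has comb(49, 6) equally likely tickets.
p = winning_probability(49, 6)
print(comb(49, 6), p)
```

Exact integer binomials from `math.comb` avoid any rounding in the count of tickets; only the final reciprocal is a float.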
For a given success probability $p$, what is the effect of the total number $N$ of participants on a winner's mean earnings? This question is the object of study of this article. Clearly, more participants lead not only to more prize money but also to more potential winners. Intuitively, and by the elementary properties of the binomial distribution, we expect the region of the $(N,p)$-plane where $Np \approx 1$ to be an important borderline. As long as $Np \ll 1$, the existence of winners is highly improbable, so the mean earnings will most likely be trivially 0. As long as $Np \gg 1$, the law of large numbers applies and suggests that there will be approximately $Np$ winners, each of whom will receive an amount of money equal to $N/(Np) = 1/p$. We first analyze the independent draws scenario (without rollover), and then use this result to deduce the corresponding result with rollover.
Rollover presents an additional complication. The number of participants may vary across draws, and this variation can also be strongly correlated, specifically strictly increasing: as a rule, larger money prizes motivate more extensive participation [1, 2]. A reliable model for this variation can only be obtained through detailed statistical analysis and the psychology of gambling, both of which lie beyond the scope of the present article (see, e.g., [1–4] for an analysis of the playing style of lottery participants). Perhaps surprisingly, though, some analysis we performed on Greek lottery data  did not invariably show a clear upward trend in the number of participants across the games of a rollover round (a similar phenomenon is observed in ). On many occasions, the fluctuations indeed appear to be random and, more importantly, the order of magnitude of the number of participants does not vary. Accordingly, we will derive the general formula for an arbitrary fluctuation of the number of participants during the lottery games of a single rollover round. We will then focus on the special case where this number is constant, as it is not only interesting but also mathematically tractable.
Needless to say, actual lottery implementations are more complicated, incorporating a plethora of secondary features. For example, lower winning levels may be introduced, corresponding to partial matches between the tickets and the winning set; the prize money for these levels may be either a smaller fraction of the total sum or simply a fixed sum. Alternatively, an extra “bonus number” may be introduced, acting, in a sense, like a “second chance.” Players are allowed to submit a guess for this number in addition to their guess for the winning set, and the selection of the winning set is followed by the selection of the bonus number (also uniformly at random among the remaining integers). If a player's choice has an overlap of $k-1$ integers with the winning set, but the player also guessed the bonus number correctly, a lesser prize is offered to the player. This may, of course, be combined with all lower winning levels, whereby an overlap of $m$ integers ($m < k-1$) plus a correct guess of the bonus number leads to a smaller prize than an overlap of $m+1$ integers.
To the best of our knowledge, the issue of winners' mean earnings has not been considered in the lottery literature before. It is a problem of mathematical interest, relating the lottery to the study of inverse moments of the positive binomial distribution (as will be seen below). For this reason, this article belongs to the same group of publications by the present author [6–8], as well as by other authors (e.g., [4, 9]), whose objective is to consider nontrivial probability issues related to the lottery. We should perhaps state that the purpose of this article is not to consider winning strategies or to advise a prospective player about the investment value of the lottery. For readers interested in these aspects of the lottery, we recommend the study in , a recent and mathematically sophisticated analysis of the lottery's viability as an investment, and also those in [10, 11].
One final point that needs to be addressed is our choice to assume throughout this article that players choose their numbers independently of each other. That is, we assume that the probability distribution describing the choice of a $k$-tuple of integers by any player is uniform over all possible $k$-tuples, even though this assumption has been extensively investigated and found, perhaps surprisingly, not to hold in practice [1, 2, 10–13]! Indeed, humans seem to favor some classes of $k$-tuples over others (for several reasons), and the net effect is that the probability distribution of the number of winners depends on the winning set. Lottery organizers seem to encourage this phenomenon, as it leads to more frequent rollovers, which have been documented to increase participation, at least in some lotteries (e.g., in the US and the UK) [1, 2]. This already suggests a playing strategy: assuming that players have information on which $k$-tuples are “unpopular,” a player should choose one of those. Indeed, the probability distribution of the winning set is uniform (assuming that the lottery organizers are honest), and, if the chosen “unpopular” $k$-tuple wins, the likelihood of other winners existing will be minimal. In this article, however, we choose to ignore this complication, as mentioned above, for three reasons:

(a) there appears to be no universal and conclusive statistical evidence that the skewness in the players' choices is strong enough to unquestionably reject the uniform model of players' choices, at least as an approximation;

(b) as lottery players become more mature and aware, this skewness may progressively disappear (there is already evidence that, in some lotteries, more than 70% of the choices entered are picked randomly by computers );

(c) as mentioned earlier, this article focuses on the principles of the lottery rather than its actual implementations.
2. Results without Rollover
Consider the lottery game as described in the Introduction: the number $W$ of winners follows the binomial distribution $B(N,p)$. Assuming that $w \ge 1$ winners exist, each one's earnings will be $N/w$. For $w = 0$ there is no winner to earn anything, so the earnings are conventionally taken to be 0. It follows that the mean earnings are
$$E(N,p) = \sum_{w=1}^{N}\frac{N}{w}\binom{N}{w}p^{w}(1-p)^{N-w}. \tag{2.1}$$
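This expression is essentially the inverse first moment of the positive binomial distribution. Let $X \sim B(N,p)$ and set $q = 1-p$. The positive binomial distribution is that of $X$ conditioned on $X > 0$,
$$P(X = w \mid X > 0) = \frac{\binom{N}{w}p^{w}q^{N-w}}{1-q^{N}}, \qquad w = 1, \dots, N,$$
and the inverse $r$th moment is defined as
$$\mu_{-r}(N,p) = E\left[X^{-r} \mid X > 0\right] = \frac{1}{1-q^{N}}\sum_{w=1}^{N}\frac{1}{w^{r}}\binom{N}{w}p^{w}q^{N-w}. \tag{2.3}$$
Clearly, (2.1) and (2.3) (for $r = 1$) consist of the same sum, with different multiplicative factors:
$$E(N,p) = N\left(1-q^{N}\right)\mu_{-1}(N,p). \tag{2.4}$$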
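To make the link between (2.1) and the inverse moment concrete, the sketch below evaluates both sums numerically and checks relation (2.4), $E(N,p) = N(1-q^N)\mu_{-1}(N,p)$. It is an illustration under our own naming: `binom_pmf`, `mean_earnings_direct` and `inverse_moment` are hypothetical helpers, not functions from any library. Probabilities are computed in log space via `math.lgamma` so that large $N$ does not overflow, and $0 < p < 1$ is assumed.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(N: int, w: int, p: float) -> float:
    """P(W = w) for W ~ B(N, p), evaluated in log space to avoid overflow."""
    return exp(lgamma(N + 1) - lgamma(w + 1) - lgamma(N - w + 1)
               + w * log(p) + (N - w) * log1p(-p))


def mean_earnings_direct(N: int, p: float) -> float:
    """E(N, p) from (2.1): each of w >= 1 winners receives N / w; no winner pays 0."""
    return sum(N / w * binom_pmf(N, w, p) for w in range(1, N + 1))


def inverse_moment(N: int, p: float, r: int = 1) -> float:
    """mu_{-r}(N, p) from (2.3): E[X^(-r) | X > 0] for X ~ B(N, p)."""
    s = sum(binom_pmf(N, w, p) / w**r for w in range(1, N + 1))
    return s / (1 - (1 - p) ** N)


N, p = 200, 0.01
lhs = mean_earnings_direct(N, p)
rhs = N * (1 - (1 - p) ** N) * inverse_moment(N, p)  # relation (2.4)
print(lhs, rhs)
```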
The problem of the inverse moments of the positive binomial distribution was first considered by Stephan in 1946 , and has received a fair share of attention in the literature ever since. For example, Grab and Savage  provided a recursive formula in for , which was later generalized by Govindarajulu , while Chao and Strawderman  considered , where such that . Recently, Wuyungaowa and Wang provided asymptotic expansions of .
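Theorem 2.1. With $q = 1-p$,
$$E(N,p) = N\sum_{j=1}^{N}\frac{q^{N-j}\left(1-q^{j}\right)}{j}.$$
Furthermore, setting $\lambda = Np$,
$$E(N,p) \approx \tilde E(N,p) = Ne^{-\lambda}\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx.$$
For , in particular, , and .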
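Proof. We can transform (2.1) using integration. With $q = 1-p$, and since $1/w = \int_{0}^{1}t^{w-1}\,dt$,
$$E(N,p) = N\int_{0}^{1}\sum_{w=1}^{N}\binom{N}{w}(pt)^{w}q^{N-w}\,\frac{dt}{t} = N\int_{0}^{1}\frac{(q+pt)^{N}-q^{N}}{t}\,dt.$$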
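We now use the identity $a^{N}-b^{N} = (a-b)\sum_{i=0}^{N-1}a^{i}b^{N-1-i}$, with $a = q+pt$ and $b = q$ (so that $a-b = pt$), to get
$$E(N,p) = Np\sum_{i=0}^{N-1}q^{N-1-i}\int_{0}^{1}(q+pt)^{i}\,dt = N\sum_{i=0}^{N-1}q^{N-1-i}\,\frac{1-q^{i+1}}{i+1} = N\sum_{j=1}^{N}\frac{q^{N-j}\left(1-q^{j}\right)}{j}, \tag{2.7}$$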
which proves our first claim.
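The first claim of Theorem 2.1 states that the binomial sum (2.1) equals the single sum $N\sum_{j=1}^{N}q^{N-j}(1-q^{j})/j$. This can be checked in exact rational arithmetic for small $N$. The check below is a sanity test rather than a proof, and the function names are ours.

```python
from fractions import Fraction
from math import comb


def earnings_binomial_sum(N: int, p: Fraction) -> Fraction:
    """E(N, p) computed exactly from the binomial sum (2.1)."""
    q = 1 - p
    return sum(Fraction(N, w) * comb(N, w) * p**w * q ** (N - w)
               for w in range(1, N + 1))


def earnings_single_sum(N: int, p: Fraction) -> Fraction:
    """E(N, p) from the single sum N * sum_j q^(N-j) (1 - q^j) / j."""
    q = 1 - p
    return N * sum(q ** (N - j) * (1 - q**j) / j for j in range(1, N + 1))


for N in range(1, 16):
    for p in (Fraction(1, 7), Fraction(1, 2), Fraction(9, 10)):
        assert earnings_binomial_sum(N, p) == earnings_single_sum(N, p)
print("identity verified exactly for N = 1..15")
```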
We now approximate the sum in (2.7) by an integral (considering it to be a Riemann sum, or, even better, through the Euler–Maclaurin formula ): where But this can be further simplified as implying that
$$E(N,p) \approx \tilde E(N,p) = Ne^{-Np}\int_{0}^{Np}\frac{e^{x}-1}{x}\,dx. \tag{2.11}$$
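The approximation (2.11) needs the integral $\int_0^{\lambda}(e^x-1)/x\,dx$. Integrating the Taylor series of $e^x-1$ term by term shows that this integral equals $\sum_{n\ge1}\lambda^n/(n\cdot n!)$. The sketch below uses that series and compares the approximation with the exact single sum of Theorem 2.1 for $p = 10^{-4}$. The choice of $p$ and of the $N$ values is ours; for these values we expect relative differences well below $10^{-3}$.

```python
from math import exp


def g(lam: float) -> float:
    """int_0^lam (e^x - 1)/x dx, via the everywhere-convergent series
    sum_{n >= 1} lam^n / (n * n!)."""
    total, term, n = 0.0, 1.0, 0
    while True:
        n += 1
        term *= lam / n                 # term == lam**n / n!
        total += term / n
        if n > lam and term / n <= 1e-17 * total:
            return total


def mean_earnings_approx(N: float, p: float) -> float:
    """Approximation (2.11): N e^{-Np} int_0^{Np} (e^x - 1)/x dx."""
    lam = N * p
    return N * exp(-lam) * g(lam)


def mean_earnings_exact(N: int, p: float) -> float:
    """Exact single sum (2.7) from Theorem 2.1."""
    q = 1 - p
    return N * sum(q ** (N - j) * (1 - q**j) / j for j in range(1, N + 1))


p = 1e-4
for N in (5_000, 30_000, 100_000):
    exact, approx = mean_earnings_exact(N, p), mean_earnings_approx(N, p)
    print(N, exact, approx, abs(approx / exact - 1))
```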
The error of this approximation is readily given by the lowest-order Newton-Cotes numerical integration formula known as the midpoint rule , which, applied in the function at hand, yields where as the second derivative is monotonically increasing, and, therefore, the maximum corresponds to the right endpoint of the interval, namely, . For large , the contribution of the two integral error terms becomes insignificant, while, asymptotically, Assuming further that , we find that , and ; hence the total error is approximately bounded above by This completes the proof.
The approximation (2.11) proved in the theorem above is compared in Figure 1 against the exact (2.6), or, equivalently, (2.7). Our experiments show that for the approximation is almost exact, while for larger values of , where (2.15) does not yet hold, there is noticeable deviation between the two curves.
Figure 1 suggests that, for fixed $p$, $E(N,p)$ attains a global maximum for some $N$. We now prove this to be the case.
Theorem 2.2. For fixed $p$, $E(N,p)$ attains a global maximum for some $N$.
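Proof. To begin with, note that by (2.1). Furthermore, with $q = 1-p$, we obtain from (2.7) that
$$E(N,p) = N\sum_{j=1}^{N}\frac{q^{N-j}}{j} - Nq^{N}H_{N} = N\sum_{i=0}^{N-1}\frac{q^{i}}{N-i} - Nq^{N}H_{N}, \tag{2.16}$$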
where $H_{N} = \sum_{j=1}^{N}1/j$ denotes the $N$th harmonic number. We further break the last sum into two sums:
where is a positive integer to be specified later. Since for , for any there exists a suitable such that
On the other hand, for this specific , and for any , there exists such that, for any ,
Putting (2.17), (2.18), and (2.19) together, we obtain that, for any , there exist and such that, for any , Furthermore, so that, for any such that, for any , Finally, for any , and we know that this decays to 0 much faster than any power: in particular, for any , there exists such that, for any , From (2.16), then, using (2.22) and (2.24), we obtain for , whence, if we ensure that , it follows that On the other hand, To recapitulate, we have shown that , whereas, as , converges to from above: more specifically, we have shown that, for all sufficiently large, Hence, exhibits a global maximum for some value of , for fixed , and this concludes the proof.
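Theorem 2.2 can also be observed numerically. The exact mean earnings can be written as $E(N,p) = N(S_N - q^N H_N)$ with $S_N = \sum_{j=1}^{N}q^{N-j}/j$, which is a rearrangement of the single sum of Theorem 2.1. Both $S_N$ and $H_N$ obey one-step recursions, so computing $E(N,p)$ for all $N \le M$ costs $O(M)$ operations. The sketch below uses $p = 10^{-3}$, chosen arbitrarily. It locates the discrete maximizer and checks that $E(N,p)$ is still above $1/p$ for large $N$. It illustrates the theorem but does not prove it.

```python
def mean_earnings_all(M: int, p: float) -> list[float]:
    """Exact E(N, p) for N = 0, 1, ..., M in O(M) time.

    Uses E(N, p) = N * (S_N - q^N H_N), where S_N = sum_{j=1}^N q^(N-j) / j
    and H_N is the N-th harmonic number, with the recursions
    S_N = q * S_{N-1} + 1/N and H_N = H_{N-1} + 1/N.
    """
    q = 1 - p
    S = H = 0.0
    qN = 1.0
    values = [0.0]                      # E(0, p) = 0: no tickets, no prize
    for N in range(1, M + 1):
        S = q * S + 1 / N
        H += 1 / N
        qN *= q
        values.append(N * (S - qN * H))
    return values


p, M = 1e-3, 20_000
E = mean_earnings_all(M, p)
N_star = max(range(M + 1), key=E.__getitem__)
print(N_star * p, E[N_star] * p, E[M] * p)
```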
Alternatively, the eventual positivity of  also follows immediately from the more general asymptotic formulas presented in . Having now demonstrated that $E(N,p)$ has a global maximum, the next step is to locate this maximum and calculate its value. To do this, we do not work on $E(N,p)$ directly, but rather on the approximation $\tilde E(N,p) = Ne^{-Np}\int_{0}^{Np}\frac{e^{x}-1}{x}\,dx$ derived in Theorem 2.1.
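Theorem 2.3. For fixed $p$, $\tilde E(N,p)$ is maximized for $N \approx 4.17/p$, and $\max_{N}\tilde E(N,p) \approx 1.295/p$. Furthermore, $\tilde E(N,p)/N$ is maximized for $N \approx 1.50/p$, and $\max_{N}\tilde E(N,p)/N \approx 0.517$.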
In particular, the maximal mean earnings possible are about 29.5% above the asymptotic value $1/p$, and amount to about 31% of the total prize money $N$ collected at the optimal participation level.
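Proof. Considering $N$ to be a continuous quantity, we determine the global maximum of $\tilde E(N,p)$ from the condition $\partial\tilde E/\partial N = 0$:
$$\frac{\partial\tilde E}{\partial N} = e^{-\lambda}\left[(1-\lambda)\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx + e^{\lambda}-1\right] = 0 \iff (\lambda-1)\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx = e^{\lambda}-1,$$
where $\lambda = Np$. This equation can be solved numerically to yield $\lambda^{*} \approx 4.17$. It follows that the optimal $N$ is asymptotically given by the formula
$$N^{*} \approx \frac{\lambda^{*}}{p} \approx \frac{4.17}{p}.$$
The value of the maximum, consequently, is
$$\tilde E(N^{*},p) = \frac{1}{p}\,\frac{\lambda^{*}\left(1-e^{-\lambda^{*}}\right)}{\lambda^{*}-1} \approx \frac{1.295}{p}.$$
What about maximizing the relative mean earnings $\tilde E(N,p)/N$? Going back to (2.11), we need to determine the roots of the derivative of $\tilde E(N,p)/N = e^{-\lambda}\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx$:
$$\frac{d}{d\lambda}\left[e^{-\lambda}\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx\right] = e^{-\lambda}\left[\frac{e^{\lambda}-1}{\lambda}-\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx\right] = 0.$$
Solving numerically, we obtain $\lambda \approx 1.50$, whence
$$\max_{N}\frac{\tilde E(N,p)}{N} = \frac{1-e^{-\lambda}}{\lambda} \approx 0.517.$$
This completes the proof.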
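The two stationarity equations in the proof of Theorem 2.3 are one-dimensional in $\lambda = Np$, so plain bisection suffices. The sketch below reuses the series for $g(\lambda) = \int_0^\lambda (e^x-1)/x\,dx$. The brackets $[2,6]$ and $[0.5,3]$ are our own choices, picked so that each function changes sign inside them. The helper names are hypothetical.

```python
from math import exp


def g(lam: float) -> float:
    """int_0^lam (e^x - 1)/x dx via the series sum_{n>=1} lam^n / (n * n!)."""
    total, term, n = 0.0, 1.0, 0
    while True:
        n += 1
        term *= lam / n
        total += term / n
        if n > lam and term / n <= 1e-17 * total:
            return total


def bisect(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Root of f in [a, b]; assumes f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


# Stationary point of p * E~ = lam e^{-lam} g(lam):  (lam - 1) g(lam) = e^lam - 1.
lam1 = bisect(lambda lam: (lam - 1) * g(lam) - (exp(lam) - 1), 2.0, 6.0)
peak1 = lam1 * exp(-lam1) * g(lam1)      # p * max E~
# Stationary point of E~/N = e^{-lam} g(lam):  g(lam) = (e^lam - 1) / lam.
lam2 = bisect(lambda lam: g(lam) - (exp(lam) - 1) / lam, 0.5, 3.0)
peak2 = exp(-lam2) * g(lam2)             # max E~/N
print(f"lam1={lam1:.4f}  p*maxE={peak1:.4f}  maxE/N at optimum={peak1 / lam1:.4f}")
print(f"lam2={lam2:.4f}  max(E/N)={peak2:.4f}")
```

Bisection is chosen over Newton's method here only because it needs no derivative and cannot leave the bracket.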
As a brief numerical example, consider the Greek lottery, a 6/49 lottery system, for which $p = 1/\binom{49}{6} = 1/13\,983\,816 \approx 7.15\times10^{-8}$. Theorem 2.3 shows that $\tilde E(N,p)$ is maximized for $N \approx 58.3$ million, while $\tilde E(N,p)/N$ is maximized for $N \approx 21.0$ million. The actual number of tickets normally played, however, is significantly lower, ranging from 1 to 8 million . Hence, for the given level of participation, a 6/49 system is not optimal.
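For the Greek example, a brute-force grid search over $N$ is enough to reproduce the optimal participation levels without any root finding. The sketch assumes a 6/49 system ($p = 1/\binom{49}{6}$) and a grid step of 10,000 tickets, both our choices. It also evaluates $p\,\tilde E$ at 8 million tickets, the upper end of the typical participation quoted above.

```python
from math import comb, exp


def g(lam: float) -> float:
    """int_0^lam (e^x - 1)/x dx via the series sum_{n>=1} lam^n / (n * n!)."""
    total, term, n = 0.0, 1.0, 0
    while True:
        n += 1
        term *= lam / n
        total += term / n
        if n > lam and term / n <= 1e-17 * total:
            return total


p = 1 / comb(49, 6)                                  # 6/49 lottery
grid = [i * 10_000 for i in range(1, 10_001)]        # N from 0.01 to 100 million


def e_tilde(N: int) -> float:
    """Approximation (2.11) to the mean earnings E(N, p)."""
    lam = N * p
    return N * exp(-lam) * g(lam)


best_abs = max(grid, key=e_tilde)                    # maximizes E~
best_rel = max(grid, key=lambda N: e_tilde(N) / N)   # maximizes E~/N
print(f"E~ peaks near {best_abs / 1e6:.2f} million tickets")
print(f"E~/N peaks near {best_rel / 1e6:.2f} million tickets")
print(f"at 8 million tickets, p*E~ = {p * e_tilde(8_000_000):.3f}")
```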
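Before we conclude this section, let us investigate the variance of a winner's earnings. Our goal is to obtain a formula similar to the one presented in Theorem 2.1, and then to generalize it into a formula for the $r$th moment, for any $r$. We begin with the mean square earnings
$$E_{2}(N,p) = \sum_{w=1}^{N}\frac{N^{2}}{w^{2}}\binom{N}{w}p^{w}(1-p)^{N-w}.$$
It follows that the variance is $V(N,p) = E_{2}(N,p) - E(N,p)^{2}$, where $E(N,p)$ is given by (2.1).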
Asymptotics can be found as in Theorem 2.1:
The generalization is clear. For any , while the asymptotic expression as becomes
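The relation to the inverse moments follows from the generalization of (2.4): with $q = 1-p$,
$$E_{r}(N,p) = \sum_{w=1}^{N}\frac{N^{r}}{w^{r}}\binom{N}{w}p^{w}q^{N-w} = N^{r}\left(1-q^{N}\right)\mu_{-r}(N,p).$$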
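The moment generalization can be checked exactly for small $N$. The sketch computes $E_r(N,p) = \sum_{w\ge1}(N/w)^r\binom{N}{w}p^wq^{N-w}$, the inverse moment $\mu_{-r}(N,p)$ of (2.3), and the variance $E_2 - E^2$. Rational arithmetic is used so that the relation $E_r = N^r(1-q^N)\mu_{-r}$ holds with exact equality. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf(N: int, w: int, p: Fraction) -> Fraction:
    """P(W = w) for W ~ B(N, p), exactly."""
    return comb(N, w) * p**w * (1 - p) ** (N - w)


def earnings_moment(N: int, p: Fraction, r: int) -> Fraction:
    """E_r(N, p): r-th moment of a winner's earnings N/W (taken as 0 if W = 0)."""
    return sum(Fraction(N, w) ** r * pmf(N, w, p) for w in range(1, N + 1))


def inverse_moment(N: int, p: Fraction, r: int) -> Fraction:
    """mu_{-r}(N, p) = E[X^(-r) | X > 0] for X ~ B(N, p)."""
    s = sum(Fraction(1, w**r) * pmf(N, w, p) for w in range(1, N + 1))
    return s / (1 - (1 - p) ** N)


N, p = 12, Fraction(1, 5)
for r in (1, 2, 3):
    assert earnings_moment(N, p, r) == N**r * (1 - (1 - p) ** N) * inverse_moment(N, p, r)
variance = earnings_moment(N, p, 2) - earnings_moment(N, p, 1) ** 2
print(float(variance))
```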
3. Results with Rollover
When rollover is introduced, a sequence of lottery draws is played until a winner is found. At that point, the total prize money is split equally among the winners; it consists of the prize money of the current draw plus the jackpot, namely, the prize money accumulated over the previous draws. We assume that the (infinite) sequence $\mathbf{N} = (N_1, N_2, \dots)$ of the numbers of participants in the various draws is known and fixed. We denote by $E_d(\mathbf{N},p)$ the mean earnings in the $d$th lottery draw, assuming that no winner was found in the first $d-1$ draws and that a winner was found in the $d$th draw. Clearly, this quantity depends not on the full $\mathbf{N}$ but only on its initial segment $(N_1, \dots, N_d)$. Hence, for example, $E(N,p)$ of the previous section equals $E_1(\mathbf{N},p)$ for any $\mathbf{N}$ with $N_1 = N$.
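We denote the mean earnings under rollover by $R(\mathbf{N},p)$. Let $D$ be the random variable denoting, for a fixed sequence $\mathbf{N} = (N_1, N_2, \dots)$, the number of draws played until a winner is found. Then
$$R(\mathbf{N},p) = \sum_{d=1}^{\infty}P(D=d)\,E_{d}(\mathbf{N},p). \tag{3.1}$$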
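It is, in fact, possible to determine the probability distribution of $D$: $D = d$ if and only if no winner is found in the first $d-1$ draws and a winner is found in the $d$th draw. The probability that no winner is found in the $i$th draw is $q^{N_i}$, with $q = 1-p$, for $i \ge 1$. We immediately find
$$P(D=d) = \left(1-q^{N_d}\right)\prod_{i=1}^{d-1}q^{N_i}. \tag{3.2}$$
Furthermore, revisiting (2.11) in Theorem 2.1, we see that the first factor of $N$ in the equation denotes the prize money. Every other occurrence of $N$ is within the expression $Np$ and denotes the number of tickets played. Hence, in the present case, the asymptotic expression for $E_d$ is given by
$$E_{d}(\mathbf{N},p) \approx \left(\sum_{i=1}^{d}N_i\right)e^{-N_dp}\int_{0}^{N_dp}\frac{e^{x}-1}{x}\,dx. \tag{3.3}$$
To sum up, the asymptotic expression for $R$ is given by
$$R(\mathbf{N},p) \approx \sum_{d=1}^{\infty}\left(1-q^{N_d}\right)\left(\prod_{i=1}^{d-1}q^{N_i}\right)\left(\sum_{i=1}^{d}N_i\right)e^{-N_dp}\int_{0}^{N_dp}\frac{e^{x}-1}{x}\,dx. \tag{3.4}$$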
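The following sketches (3.2) and a truncated version of (3.4) for an arbitrary, finite list of participation numbers. The increasing participation profile below is purely hypothetical: it is our own choice, not lottery data. Truncating the infinite sum assumes that the probability of reaching the last listed draw without a winner is negligible. The helper names are ours.

```python
from math import exp


def g(lam: float) -> float:
    """int_0^lam (e^x - 1)/x dx via the series sum_{n>=1} lam^n / (n * n!)."""
    total, term, n = 0.0, 1.0, 0
    while True:
        n += 1
        term *= lam / n
        total += term / n
        if n > lam and term / n <= 1e-17 * total:
            return total


def draws_distribution(Ns: list[int], p: float) -> list[float]:
    """P(D = d), d = 1, ..., len(Ns), as in (3.2): no winner in draws 1..d-1
    and at least one winner in draw d, where draw d has Ns[d-1] tickets."""
    q = 1 - p
    probs, no_winner_yet = [], 1.0
    for N_d in Ns:
        probs.append(no_winner_yet * (1 - q**N_d))
        no_winner_yet *= q**N_d
    return probs


def rollover_earnings_approx(Ns: list[int], p: float) -> float:
    """(3.4) truncated to the draws listed in Ns (tail assumed negligible)."""
    total, pot = 0.0, 0
    for P_d, N_d in zip(draws_distribution(Ns, p), Ns):
        pot += N_d                       # prize money accumulated up to draw d
        lam_d = N_d * p
        total += P_d * pot * exp(-lam_d) * g(lam_d)
    return total


p = 1e-6
rising = [3_000_000 + 100_000 * d for d in range(200)]   # hypothetical profile
print(sum(draws_distribution(rising, p)), rollover_earnings_approx(rising, p))
```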
It would be, of course, far more interesting to enrich the model by allowing to be random, following some infinite-dimensional discrete probability distribution , in which case the averaged would be
In general, (3.4) cannot be simplified further in a meaningful way. This is, however, possible in the special case where, for some $N$, $N_i = N$ for all $i \ge 1$. In that case, we may replace the vector $\mathbf{N} = (N_1, N_2, \dots)$ appearing in the preceding expressions by the value $N$ of its first coordinate, since all coordinates are equal. We can then prove the following theorem, which corresponds to Theorem 2.3.
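Theorem 3.1. Assume that the number $N$ of tickets submitted remains constant throughout the rollover draws. Then the asymptotic expression for $R(N,p)$, for large $N$ and small $p$, is
$$\tilde R(N,p) = \frac{N}{e^{Np}-1}\int_{0}^{Np}\frac{e^{x}-1}{x}\,dx.$$
This function is maximized at $N \approx 3.75/p$, and $\max_{N}\tilde R(N,p) \approx 1.32/p$.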
In particular, the maximal mean earnings possible are about 32% above the asymptotic value $1/p$, and about 35% of the (first draw) total prize money $N$.
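Proof. Under the stated assumptions, $P(D=d) = (1-q^{N})q^{N(d-1)}$ with $q = 1-p$, and (3.4) becomes
$$R(N,p) \approx \sum_{d=1}^{\infty}\left(1-q^{N}\right)q^{N(d-1)}\,dN\,e^{-Np}\int_{0}^{Np}\frac{e^{x}-1}{x}\,dx = \frac{Ne^{-Np}}{1-q^{N}}\int_{0}^{Np}\frac{e^{x}-1}{x}\,dx \approx \frac{N}{e^{Np}-1}\int_{0}^{Np}\frac{e^{x}-1}{x}\,dx, \tag{3.7}$$
where we used $\sum_{d\ge1}d\,r^{d-1} = (1-r)^{-2}$ for $r = q^{N}$, together with $q^{N} \approx e^{-Np}$,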
and this proves our first claim.
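We now note that $\tilde R(N,p)$ and $\tilde E(N,p)$ have identical asymptotic behavior as $N \to \infty$, while $\tilde R(N,p) \to 0$ as $N \to 0$ ($N$ is treated here as a continuous quantity). As in Theorem 2.2, it follows that $\tilde R(N,p)$ has a global maximum. As before, we locate it through the condition $\partial\tilde R/\partial N = 0$, which, with $\lambda = Np$, reads
$$\left(e^{\lambda}-1-\lambda e^{\lambda}\right)\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx + \left(e^{\lambda}-1\right)^{2} = 0.$$
This equation can be solved numerically to yield $\lambda^{*} \approx 3.75$, whence
$$N^{*} \approx \frac{3.75}{p}.$$
The value of the maximum, consequently, is
$$\tilde R(N^{*},p) = \frac{1}{p}\,\frac{\lambda^{*}\left(e^{\lambda^{*}}-1\right)}{(\lambda^{*}-1)e^{\lambda^{*}}+1} \approx \frac{1.32}{p}.$$
This completes the proof.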
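Under the constant-participation assumption of Theorem 3.1, the stationarity condition is again a scalar equation in $\lambda = Np$. The sketch below solves it by bisection on $[2, 6]$, a bracket of our choosing. It then converts the result to the 6/49 case used in the Greek example. The function names are ours.

```python
from math import comb, exp


def g(lam: float) -> float:
    """int_0^lam (e^x - 1)/x dx via the series sum_{n>=1} lam^n / (n * n!)."""
    total, term, n = 0.0, 1.0, 0
    while True:
        n += 1
        term *= lam / n
        total += term / n
        if n > lam and term / n <= 1e-17 * total:
            return total


def bisect(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Root of f in [a, b]; assumes f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


# Stationarity of p * R~ = lam g(lam) / (e^lam - 1):
#   (e^lam - 1 - lam e^lam) g(lam) + (e^lam - 1)^2 = 0
lam_r = bisect(lambda lam: (exp(lam) - 1 - lam * exp(lam)) * g(lam)
               + (exp(lam) - 1) ** 2, 2.0, 6.0)
peak_r = lam_r * g(lam_r) / (exp(lam_r) - 1)     # p * max R~
print(f"lam*={lam_r:.4f}  p*maxR={peak_r:.4f}  maxR/N*={peak_r / lam_r:.4f}")
print(f"6/49 lottery: N* = {lam_r * comb(49, 6) / 1e6:.1f} million")
```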
Continuing the numerical example for the Greek lottery given right below Theorem 2.3, $\tilde R(N,p)$ is maximized for $N \approx 52.4$ million. This is also significantly higher than the actual number of tickets normally played.
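We may attempt, as in Theorem 2.3, to maximize the mean earnings relative to the prize money accumulated in the first draw. To do this, we refer back to (3.7) and ask for the roots of the derivative of $\tilde R(N,p)/N$, with $\lambda = Np$:
$$\frac{d}{d\lambda}\left[\frac{1}{e^{\lambda}-1}\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx\right] = \frac{1}{\left(e^{\lambda}-1\right)^{2}}\left[\frac{\left(e^{\lambda}-1\right)^{2}}{\lambda}-e^{\lambda}\int_{0}^{\lambda}\frac{e^{x}-1}{x}\,dx\right] = 0.$$
This equation, however, has no root! It turns out that $\tilde R(N,p)/N$ is strictly decreasing. A more appropriate quantity to consider is the mean earnings relative to the mean accumulated prize money $N\,\mathbb{E}[D] = N/(1-q^{N}) \approx N/(1-e^{-Np})$, where $q = 1-p$:
$$\frac{\tilde R(N,p)}{N/\left(1-e^{-Np}\right)} = e^{-Np}\int_{0}^{Np}\frac{e^{x}-1}{x}\,dx = \frac{\tilde E(N,p)}{N}.$$
It follows that this ratio coincides with $\tilde E(N,p)/N$ and is therefore maximized at the same value of $N$ as in Theorem 2.3.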
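The two relative quantities discussed above can be compared numerically, with $g(\lambda) = \int_0^\lambda (e^x-1)/x\,dx$. The sketch checks, on a grid of $\lambda = Np$ values of our choosing, that $\tilde R/N = g(\lambda)/(e^\lambda-1)$ decreases. It also checks that dividing instead by the mean accumulated prize $N/(1-e^{-\lambda})$ gives back $e^{-\lambda}g(\lambda) = \tilde E/N$. A grid check is evidence, not a proof of monotonicity; the function names are hypothetical.

```python
from math import exp


def g(lam: float) -> float:
    """int_0^lam (e^x - 1)/x dx via the series sum_{n>=1} lam^n / (n * n!)."""
    total, term, n = 0.0, 1.0, 0
    while True:
        n += 1
        term *= lam / n
        total += term / n
        if n > lam and term / n <= 1e-17 * total:
            return total


def ratio_first_draw(lam: float) -> float:
    """R~/N = g(lam) / (e^lam - 1): earnings relative to the first-draw prize N."""
    return g(lam) / (exp(lam) - 1)


def ratio_mean_pot(lam: float) -> float:
    """R~ / (N E[D]) with E[D] ~ 1/(1 - e^{-lam}): earnings relative to the mean pot."""
    return ratio_first_draw(lam) * (1 - exp(-lam))


grid = [0.05 * i for i in range(1, 401)]        # lam = Np in (0, 20]
first = [ratio_first_draw(lam) for lam in grid]
decreasing = all(a > b for a, b in zip(first, first[1:]))
gap = max(abs(ratio_mean_pot(lam) - exp(-lam) * g(lam)) for lam in grid)
print(decreasing, gap)
```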
Figure 2 plots , and for .
We investigated the mean earnings of a lottery winner as a function of the number $N$ of players participating in a lottery with success probability $p$. We assumed that the total amount of money offered as a prize to the winners is equal (or, in general, linearly proportional) to $N$. We considered two versions of the lottery game, namely, with and without rollover. We concluded that, both with and without rollover, two quantities are maximized when $Np$ equals a certain constant: the absolute mean earnings of an individual winner, and the fraction of the total accumulated sum that these earnings represent. This constant is, in general, different in each of the four cases, and we determined it numerically. In the course of our investigation, we linked the mean earnings formula to the classical problem of determining the inverse moments of a binomial distribution, offering novel formulas for them, both exact and approximate.
The author would like to thank the two anonymous referees for their detailed and insightful comments, which greatly improved this paper; in particular, for bringing to the author's attention that the problem studied here is related to the classical problem of determining the inverse moments of the binomial distribution, and for providing an extensive list of additional references.
J. Simon, “An analysis of the distribution of combinations chosen by UK National Lottery players,” Journal of Risk and Uncertainty, vol. 17, no. 3, pp. 243–276, 1998.
L. DeBoer, “Lotto sales stagnation: product maturity or small jackpots?” Growth and Change, vol. 21, no. 1, pp. 73–77, 2006.
N. Henze and H. Riedwyl, How to Win More: Strategies for Increasing a Lottery Win, A.K. Peters, Natick, Mass, USA, 1998.
K. Drakakis, “A note on the appearance of consecutive numbers amongst the set of winning numbers in Lottery,” Facta Universitatis: Mathematics and Informatics, vol. 22, no. 1, pp. 1–10, 2007.
K. Drakakis, “On the maximal distance between consecutive choices in the set of winning numbers in Lottery,” Applied Mathematical Sciences, vol. 3, no. 55, pp. 2725–2738, 2009.
K. Drakakis and K. Taylor, “A statistical test to detect tampering with lottery results,” in Proceedings of the 2nd International Conference on Mathematics in Sport, Groningen, The Netherlands, 2009.
N. Henze, “The distribution of spaces on lottery tickets,” Fibonacci Quarterly, vol. 33, pp. 426–431, 1995.
H. Joe, “A winning strategy for lotto games?” The Canadian Journal of Statistics, vol. 18, pp. 233–244, 1990.
H. Joe, “An ordering of dependence for distribution of k-tuples, with applications to lotto games,” The Canadian Journal of Statistics, vol. 15, pp. 227–238, 1987.
H. Joe, “Tests of uniformity for sets of lotto numbers,” Statistics and Probability Letters, vol. 16, no. 3, pp. 181–188, 1993.
F. F. Stephan, “The expected value and variance of the reciprocal and other negative powers of a positive Bernoullian variate,” Annals of Mathematical Statistics, vol. 16, pp. 50–61, 1946.
E. L. Grab and I. R. Savage, “Tables of the expected value of 1/X for positive Bernoulli and Poisson variables,” Journal of the American Statistical Association, vol. 49, pp. 169–177, 1954.
Z. Govindarajulu, “Recurrence relations for the inverse moments of the positive binomial variable,” Journal of the American Statistical Association, vol. 58, pp. 463–473, 1963.
M. Y. Chao and W. E. Strawderman, “Negative moments of positive random variables,” Journal of the American Statistical Association, vol. 67, pp. 429–431, 1972.
R. Graham, D. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, Addison-Wesley, Harlow, UK, 2nd edition, 1994.
E. Isaacson and H. Keller, Analysis of Numerical Methods, Dover, New York, NY, USA, 1994.