
Volume 2009 |Article ID 970284 | https://doi.org/10.1155/2009/970284

Werner Hürlimann, "Generalizing Benford's Law Using Power Laws: Application to Integer Sequences", International Journal of Mathematics and Mathematical Sciences, vol. 2009, Article ID 970284, 10 pages, 2009. https://doi.org/10.1155/2009/970284

# Generalizing Benford's Law Using Power Laws: Application to Integer Sequences

Academic Editor: Kenneth Berenhaut
Received 25 Mar 2009
Revised 16 Jul 2009
Accepted 19 Jul 2009
Published 10 Aug 2009

#### Abstract

Many distributions for first digits of integer sequences are not Benford. A simple method to derive parametric analytical extensions of Benford's law for first digits of numerical data is proposed. Two generalized Benford distributions are considered, namely, the two-sided power Benford (TSPB) distribution, which was introduced in Hürlimann (2003), and the new Pareto Benford (PB) distribution. Based on minimum chi-square estimators, the fitting capabilities of these generalized Benford distributions are illustrated and compared on some interesting and important integer sequences. In particular, it is significant that many of the analyzed integer sequences follow the generalized Benford distributions with a high $p$-value. While the sequences of prime numbers less than 1000, respectively, 10 000 are not at all Benford or TSPB distributed, they are approximately PB distributed with high $p$-values of 93.3% and 99.9% and reveal, after a deeper analysis of longer sequences, a new interesting property. On the other hand, Benford's law for a mixture of data sets is rejected at the 5% significance level while the PB law is accepted with a 93.6% $p$-value, which improves on the $p$-value of 25.2% obtained previously for the TSPB law.

#### 1. Introduction

Since Newcomb [1] and Benford [2] it has been known that many numerical data sets follow Benford’s law or are closely approximated by it. To be specific, if the random variable $X$, which describes the first significant digit in a numerical table, is Benford distributed, then

$$P(X = d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, \ldots, 9\}.$$
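As a quick numerical illustration, the following Python sketch evaluates the Benford probabilities $\log_{10}(1 + 1/d)$ for $d = 1, \ldots, 9$. The function name `benford_pmf` is chosen here for illustration only and is not part of the paper.

```python
import math


def benford_pmf():
    """Return {d: P(X = d)} for the first significant digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


pmf = benford_pmf()
for digit, prob in pmf.items():
    # Print the probabilities as percentages with one decimal.
    print(f"{digit}: {100 * prob:.1f}%")
```

The nine values sum to one because the logarithms telescope to $\log_{10} 10$.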

Mathematical explanations of this law have been proposed by Pinkham [3], Cohen [4], Hill [5–9], Allaart [10], Janvresse and de la Rue [11], and Kossovsky [12]. The latter author has raised some conjectures, which have been proved in some special cases by Jang et al. [13]. Other explanations of the prevalence of Benford’s law exist. For example, Miller and Nigrini [14] obtain it through the study of products of random variables and Kafri [15] through the maximum entropy principle. In recent years an upsurge of applications of Benford’s law has appeared, as can be seen from the compiled bibliography by Hürlimann [16] and the recent online bibliography by Berg and Hill [17]. Among them one might mention Judge and Schechter [18], Judge et al. [19], and Nigrini and Miller [20]. As in the present paper, the latter authors also consider power laws.

Hill also suggested shifting attention to probability distributions that follow or closely approximate Benford’s law. Papers along this path include Leemis et al. [21] and Engel and Leuenberger [22]. Some survival distributions that satisfy Benford’s law exactly are known. However, there are not many simple analytical distributions that include Benford’s law as a special case. Combining facts from Leemis et al. [21] and van Dorp and Kotz [23], such a simple one-parameter family of distributions has been considered in Hürlimann [24]. In a sequel to this, a further generalization of Benford’s law is considered here.

It is important to note that many distributions for first digits of integer sequences are not Benford but are power laws or something close. Thus there is a need for statistical tests for analyzing such hypotheses. In this respect the interest of enlarged Benford laws is twofold. First, parametric extensions may provide a better fit of the data than Benford’s law itself. Second, they yield a simple statistical procedure to validate Benford’s law. If Benford’s model is sufficiently “close” to the one-parameter extended model, then it will be retained. These points will be illustrated through our application to integer sequences.

#### 2. Generalizing Benford’s Distribution

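If $T$ denotes a random lifetime with survival distribution $S(t) = P(T \geq t)$, then the value $X$ of the first significant digit in the lifetime $T$ has the probability distribution

$$P(X = x) = \sum_{i=-\infty}^{\infty} \left[ S\left(x \cdot 10^{i}\right) - S\left((x+1) \cdot 10^{i}\right) \right], \quad x \in \{1, \ldots, 9\}.$$

Alternatively, if $D$ denotes the integer-valued random variable satisfying

$$10^{D} \leq T < 10^{D+1},$$

then the first significant digit can be written in terms of $T$ and $D$ as

$$X = \left\lfloor T \cdot 10^{-D} \right\rfloor,$$

where $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$. In particular, if the random variable $Y = \log_{10} T - D$ is uniformly distributed as $U(0,1)$, then the first significant digit $X$ is exactly Benford distributed. Starting from the uniform random variable $W \sim U(0,2)$ or the triangular random variable $W \sim \mathrm{Triangular}(0,1,2)$ with probability density function $f_W(w) = w$ if $0 < w \leq 1$ and $f_W(w) = 2 - w$ if $1 \leq w < 2$, one shows that the random lifetime $T = 10^{W}$ generates the first digit Benford distribution (Leemis et al. [21, Examples 1 and 2]).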
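The two examples can be checked by simulation. The sketch below, which is not part of the original paper, draws $W$ from the uniform distribution on $(0,2)$ and from the triangular distribution on $(0,2)$ with mode 1, takes the first digit of $10^{W}$, and compares the empirical frequencies with Benford’s law. The helper names, the seed, and the sample size are illustrative assumptions.

```python
import math
import random


def first_digit_of_lifetime(w):
    """First significant digit of T = 10**w, computed as floor(10**(w - floor(w)))."""
    mantissa = 10 ** (w - math.floor(w))
    # Guard against a rounding artefact that could return 10 instead of 9.
    return min(9, int(mantissa))


def digit_frequencies(sampler, n, rng):
    """Empirical first-digit frequencies (digits 1..9) of 10**W for n draws of W."""
    counts = [0] * 10
    for _ in range(n):
        counts[first_digit_of_lifetime(sampler(rng))] += 1
    return [c / n for c in counts[1:]]


rng = random.Random(2009)  # fixed seed so the run is repeatable
n_draws = 200_000
uniform_freq = digit_frequencies(lambda r: r.uniform(0.0, 2.0), n_draws, rng)
triangular_freq = digit_frequencies(lambda r: r.triangular(0.0, 2.0, 1.0), n_draws, rng)
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

for d in range(9):
    print(d + 1, round(benford[d], 4), round(uniform_freq[d], 4), round(triangular_freq[d], 4))
```

Both simulated columns should agree with the Benford column up to sampling noise.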

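A simple parametric distribution, which includes as special cases both the above uniform and triangular distributions, is the two-sided power random variable $W$ considered in van Dorp and Kotz [23] with probability density function

$$f_W(w) = \begin{cases} \dfrac{c}{2}\, w^{c-1}, & 0 < w \leq 1, \\[2ex] \dfrac{c}{2}\, (2-w)^{c-1}, & 1 \leq w < 2, \end{cases} \qquad c > 0.$$

If $c = 1$ then $W \sim U(0,2)$, and if $c = 2$ then $W \sim \mathrm{Triangular}(0,1,2)$. This observation shows that the random lifetime $T = 10^{W}$ will generate first digit distributions closely related to Benford’s distribution, at least if $c$ is close to 1 or 2.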

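Theorem 2.1. Let $W$ be the two-sided power random variable with parameter $c > 0$ and probability density function as given above, and let the integer-valued random variable $D$ satisfy $D \leq W < D + 1$. Then the first digit random variable $X = \lfloor 10^{W-D} \rfloor$ has the one-parameter two-sided power Benford (TSPB) probability density function

$$f_X(x) = \frac{1}{2}\left\{ \left[\log_{10}(1+x)\right]^{c} - \left[\log_{10} x\right]^{c} + \left[1 - \log_{10} x\right]^{c} - \left[1 - \log_{10}(1+x)\right]^{c} \right\}, \quad x \in \{1, \ldots, 9\}.$$

Proof. This has been shown in Hürlimann [24].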
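A small Python sketch of the TSPB probabilities may help. It assumes the closed form obtained by integrating a two-sided power density on $(0,2)$ with mode 1 and shape parameter $c$ over the intervals $[\log_{10} x, \log_{10}(x+1))$ and $[1 + \log_{10} x, 1 + \log_{10}(x+1))$, that is, $\frac{1}{2}\{[\log_{10}(1+x)]^{c} - [\log_{10} x]^{c} + [1-\log_{10} x]^{c} - [1-\log_{10}(1+x)]^{c}\}$. The function name `tspb_pmf` is hypothetical.

```python
import math


def tspb_pmf(c):
    """Two-sided power Benford probabilities for digits 1..9 (assumed closed form)."""
    if c <= 0:
        raise ValueError("the shape parameter c must be positive")
    pmf = {}
    for x in range(1, 10):
        lo, hi = math.log10(x), math.log10(x + 1)
        # Left branch of the density on (0, 1], right branch on [1, 2).
        pmf[x] = 0.5 * (hi ** c - lo ** c + (1 - lo) ** c - (1 - hi) ** c)
    return pmf


for c in (1.0, 2.0, 3.0):
    print(c, [round(p, 4) for p in tspb_pmf(c).values()])
```

Under this form, both $c = 1$ and $c = 2$ give back Benford’s law, in line with the uniform and triangular examples.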

#### 3. From the Geometric Brownian Motion to the Pareto Benford Law

Another interesting distribution, which also takes the form of a two-sided power law, is the double Pareto random variable considered in Reed [25] with probability density function

$$f(x) = \begin{cases} \dfrac{\alpha\beta}{(\alpha+\beta)\,x_0} \left(\dfrac{x}{x_0}\right)^{\beta-1}, & 0 < x \leq x_0, \\[2ex] \dfrac{\alpha\beta}{(\alpha+\beta)\,x_0} \left(\dfrac{x}{x_0}\right)^{-\alpha-1}, & x \geq x_0. \end{cases} \tag{3.1}$$

Recall the stochastic mechanism and the natural motivation that generate this distribution. It is often assumed that the time evolution of a stochastic phenomenon $X_t$ involves a variable but size-independent proportional growth rate and can thus be modeled by a geometric Brownian motion (GBM) described by the stochastic differential equation

$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t,$$

where $dB_t$ is the increment of a Wiener process. Since the proportional increment of a GBM in time $dt$ has a systematic component $\mu\,dt$ and a random white noise component $\sigma\,dB_t$, GBM can be viewed as a stochastic version of a simple exponential growth model. The GBM has long been used to model the evolution of stock prices (Black-Scholes option pricing model), firm sizes, city sizes, and individual incomes. It is well known that empirical studies on such phenomena often exhibit power-law behavior. However, the state of a GBM after a fixed time $t$ follows a lognormal distribution, which does not exhibit power-law behavior.

Why does one observe power-law behavior for phenomena apparently evolving like a GBM? A simple mechanism that generates power-law behavior in the tails is to assume that the time of observation $\tau$ is itself a random variable with an exponential distribution. The distribution of $X_\tau$ with fixed initial state $x_0$ is described by the double Pareto distribution [25] with density function (3.1), where $\alpha > 0$ and $-\beta < 0$ are the roots of the characteristic equation

$$\frac{\sigma^{2}}{2}\, z^{2} + \left(\mu - \frac{\sigma^{2}}{2}\right) z - \lambda = 0,$$

where $\lambda$ is the parameter of the exponentially distributed random variable $\tau$. Setting $x_0 = 1$ one obtains the following generalized Benford distribution.
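To make the link between the GBM parameters and the two Pareto exponents concrete, here is a minimal Python sketch. It assumes the characteristic equation has the form $\frac{\sigma^2}{2} z^2 + (\mu - \frac{\sigma^2}{2}) z - \lambda = 0$ (as in Reed’s construction) with one positive root $\alpha$ and one negative root $-\beta$. The function name and the numerical inputs are purely illustrative.

```python
import math


def double_pareto_exponents(mu, sigma, lam):
    """Return (alpha, beta) with alpha > 0 and -beta < 0 the two roots of
    (sigma^2 / 2) z^2 + (mu - sigma^2 / 2) z - lam = 0  (assumed form)."""
    if sigma <= 0 or lam <= 0:
        raise ValueError("sigma and lam must be positive")
    a = 0.5 * sigma ** 2
    b = mu - 0.5 * sigma ** 2
    disc = math.sqrt(b * b + 4 * a * lam)  # always real because a * lam > 0
    alpha = (-b + disc) / (2 * a)          # positive root
    beta = (b + disc) / (2 * a)            # minus the negative root
    return alpha, beta


# Illustrative values only: drift 5%, volatility 20%, observation rate 0.1.
alpha, beta = double_pareto_exponents(mu=0.05, sigma=0.2, lam=0.1)
print(alpha, beta)
```

Because the product of the roots equals $-2\lambda/\sigma^2 < 0$, the two roots always have opposite signs, so both exponents returned here are positive.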

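Theorem 3.1. Let $W$ be the double Pareto random variable with probability density function

$$f_W(w) = \begin{cases} \dfrac{\alpha\beta}{\alpha+\beta}\, w^{\beta-1}, & 0 < w \leq 1, \\[2ex] \dfrac{\alpha\beta}{\alpha+\beta}\, w^{-\alpha-1}, & w \geq 1. \end{cases}$$

Let the integer-valued random variable $D$ satisfy $D \leq W < D + 1$. Then the first digit random variable $X = \lfloor 10^{W-D} \rfloor$ has the two-parameter Pareto Benford (PB) probability density function

$$f_X(x) = \frac{\alpha}{\alpha+\beta}\left\{ \left[\log_{10}(1+x)\right]^{\beta} - \left[\log_{10} x\right]^{\beta} \right\} + \frac{\beta}{\alpha+\beta} \sum_{k=1}^{\infty} \left\{ \left[k + \log_{10} x\right]^{-\alpha} - \left[k + \log_{10}(1+x)\right]^{-\alpha} \right\}, \quad x \in \{1, \ldots, 9\}. \tag{3.5}$$

Proof. The probability density function of $T = 10^{W}$ is given by

$$f_T(t) = \frac{f_W(\log_{10} t)}{t \ln 10} = \frac{\alpha\beta}{\alpha+\beta} \cdot \frac{1}{t \ln 10} \cdot \begin{cases} (\log_{10} t)^{\beta-1}, & 1 < t \leq 10, \\[1ex] (\log_{10} t)^{-\alpha-1}, & t \geq 10. \end{cases}$$

It follows that the first significant digit of $T$, namely, $X = \lfloor T \cdot 10^{-D} \rfloor$, has probability density

$$f_X(x) = \sum_{k=0}^{\infty} \int_{x \cdot 10^{k}}^{(x+1) \cdot 10^{k}} f_T(t)\,dt, \quad x \in \{1, \ldots, 9\}.$$

Making the change of variable $w = \log_{10} t$, one obtains (3.5) as follows:

$$f_X(x) = \frac{\alpha\beta}{\alpha+\beta}\left\{ \int_{\log_{10} x}^{\log_{10}(1+x)} w^{\beta-1}\,dw + \sum_{k=1}^{\infty} \int_{k+\log_{10} x}^{k+\log_{10}(1+x)} w^{-\alpha-1}\,dw \right\} = \frac{\alpha}{\alpha+\beta}\left\{ \left[\log_{10}(1+x)\right]^{\beta} - \left[\log_{10} x\right]^{\beta} \right\} + \frac{\beta}{\alpha+\beta} \sum_{k=1}^{\infty} \left\{ \left[k + \log_{10} x\right]^{-\alpha} - \left[k + \log_{10}(1+x)\right]^{-\alpha} \right\}.$$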

One notes that on setting $\beta = 1$ and letting $\alpha$ go to infinity, the Pareto Benford distribution converges to Benford’s law. Other important papers link Benford’s law to GBMs (Kontorovich and Miller [26]) and to Black-Scholes processes (Schürger [27]). Another law that includes Benford’s law as a special case is the Planck distribution of photons at a given frequency, as shown recently by Kafri [28, 29].
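The PB probabilities can be evaluated numerically once the infinite series is truncated. The following Python sketch assumes the form $\frac{\alpha}{\alpha+\beta}\{[\log_{10}(1+x)]^{\beta} - [\log_{10} x]^{\beta}\} + \frac{\beta}{\alpha+\beta}\sum_{k=1}^{m}\{[k+\log_{10} x]^{-\alpha} - [k+\log_{10}(1+x)]^{-\alpha}\}$, which follows from integrating a double Pareto density with unit scale over the first-digit intervals. The function name `pareto_benford_pmf` and the default truncation index `m=100` are assumptions made for illustration.

```python
import math


def pareto_benford_pmf(alpha, beta, m=100):
    """Pareto Benford probabilities for digits 1..9, series truncated at k = m."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    weight_head = alpha / (alpha + beta)  # mass of the branch 0 < w <= 1
    weight_tail = beta / (alpha + beta)   # mass of the Pareto tail w >= 1
    pmf = {}
    for x in range(1, 10):
        lo, hi = math.log10(x), math.log10(x + 1)
        head = hi ** beta - lo ** beta
        tail = sum((k + lo) ** (-alpha) - (k + hi) ** (-alpha) for k in range(1, m + 1))
        pmf[x] = weight_head * head + weight_tail * tail
    return pmf


# Parameters reported in Table 2 for the primes below 1000.
primes_fit = pareto_benford_pmf(22.99754, 2.28436)
print([round(100 * p, 1) for p in primes_fit.values()])

# beta = 1 and a very large alpha should be close to Benford's law.
near_benford = pareto_benford_pmf(1e4, 1.0)
print([round(100 * p, 1) for p in near_benford.values()])
```

With a truncated series the probabilities sum to $1 - \frac{\beta}{\alpha+\beta}(m+1)^{-\alpha}$ rather than exactly one, which is negligible for large $\alpha$ but may call for a larger $m$ when $\alpha$ is small.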

#### 4. Fitting the First Digit Distributions of Integer Sequences

Minimum chi-square estimation of the generalized Benford distributions is straightforward with modern computer algebra systems. The fitting capabilities of the new distributions are illustrated on some interesting and important integer sequences. The first digit occurrences of the analyzed integer sequences are listed in Table 1. The minimum chi-square estimators of the generalized distributions as well as an assumed summation index $m$ for the infinite series (3.5) are displayed in Table 2. Statistical results are summarized in Table 3. For comparison we list the chi-square values and their corresponding $p$-values. The obtained results are discussed below.
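For readers who want to reproduce a goodness-of-fit entry, the sketch below computes the Pearson chi-square statistic of the first 100 squares against Benford’s law, using the counts implied by the percentages in Table 1 with sample size 100. The choice of 8 degrees of freedom (nine digit classes minus one) is an assumption, and the closed-form tail probability used here is valid only for an even number of degrees of freedom. Function names are illustrative.

```python
import math


def chi_square(observed, probabilities):
    """Pearson chi-square statistic for observed counts against model probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))


def chi_square_p_value_even_df(stat, df):
    """Upper tail of the chi-square law for even df:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = stat / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


square_counts = [21, 14, 12, 12, 9, 9, 8, 7, 8]  # first digits of the first 100 squares
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

stat = chi_square(square_counts, benford)
p_value = chi_square_p_value_even_df(stat, df=8)
print(round(stat, 3), round(100 * p_value, 2))
```

Under these assumptions the statistic and the $p$-value come out close to the Benford entries of the square row in Table 3 (9.096 and 33.43%).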

Table 1

| Name of sequence | Sample size | 1 (%) | 2 (%) | 3 (%) | 4 (%) | 5 (%) | 6 (%) | 7 (%) | 8 (%) | 9 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Benford law | | 30.1 | 17.6 | 12.5 | 9.7 | 7.9 | 6.7 | 5.8 | 5.1 | 4.6 |
| Square | 100 | 21.0 | 14.0 | 12.0 | 12.0 | 9.0 | 9.0 | 8.0 | 7.0 | 8.0 |
| Cube | 500 | 28.2 | 14.8 | 11.4 | 9.8 | 8.8 | 7.8 | 6.6 | 6.8 | 5.8 |
| Cube | 1000 | 22.6 | 15.9 | 12.4 | 10.6 | 9.4 | 8.3 | 7.4 | 7.1 | 6.3 |
| Cube | 10000 | 22.5 | 15.8 | 12.6 | 10.6 | 9.3 | 8.3 | 7.5 | 7.0 | 6.4 |
| Square root | 99 | 19.2 | 17.2 | 15.2 | 13.1 | 11.1 | 9.1 | 7.1 | 5.1 | 3.0 |
| Prime < 100 | 25 | 16.0 | 12.0 | 12.0 | 12.0 | 12.0 | 8.0 | 16.0 | 8.0 | 4.0 |
| Prime < 1000 | 168 | 14.9 | 11.3 | 11.3 | 11.9 | 10.1 | 10.7 | 10.7 | 10.1 | 8.9 |
| Prime < 10000 | 1229 | 13.0 | 11.9 | 11.3 | 11.3 | 10.7 | 11.0 | 10.2 | 10.3 | 10.3 |
| Princeton number | 25 | 28.0 | 8.0 | 12.0 | 12.0 | 8.0 | 12.0 | 8.0 | 4.0 | 8.0 |
| Mixing sequence | 618 | 28.3 | 14.6 | 11.5 | 9.9 | 7.6 | 7.8 | 8.1 | 6.6 | 5.7 |
| Pentagonal number | 100 | 35.0 | 12.0 | 10.0 | 8.0 | 10.0 | 6.0 | 8.0 | 5.0 | 6.0 |
| Keith number | 71 | 32.4 | 14.1 | 14.1 | 7.0 | 4.2 | 7.0 | 12.7 | 2.8 | 5.6 |
| Bell number | 100 | 31.0 | 15.0 | 10.0 | 12.0 | 10.0 | 8.0 | 5.0 | 6.0 | 3.0 |
| Catalan number | 100 | 33.0 | 18.0 | 11.0 | 11.0 | 8.0 | 8.0 | 4.0 | 3.0 | 4.0 |
| Lucky number | 45 | 42.2 | 17.8 | 8.9 | 4.4 | 2.2 | 6.7 | 8.9 | 2.2 | 6.7 |
| Ulam number | 44 | 45.5 | 13.6 | 6.8 | 6.8 | 4.5 | 6.8 | 4.5 | 6.8 | 4.5 |
| Numeri ideoni | 65 | 30.8 | 18.5 | 13.8 | 10.8 | 6.2 | 3.1 | 7.7 | 6.2 | 3.1 |
| Fibonacci number | 100 | 30.0 | 18.0 | 13.0 | 9.0 | 8.0 | 6.0 | 5.0 | 7.0 | 4.0 |
| Partition number | 94 | 28.7 | 17.0 | 14.9 | 9.6 | 7.4 | 6.4 | 7.4 | 5.3 | 3.2 |

Table 2

| Name of sequence | Sample size | TSPB parameter c | PB parameter alpha | PB parameter beta | PB summation index m |
|---|---|---|---|---|---|
| Square | 100 | 0.79837 | 15.55957 | 1.74552 | 100 |
| Cube | 500 | 2.46519 | 5.55849 | 1.69860 | 100 |
| Cube | 1000 | 2.26798 | 20.56506 | 1.47082 | 100 |
| Cube | 10000 | 2.27054 | 20.53577 | 1.475760 | 100 |
| Square root | 99 | 1.40176 | 89491723 | 1.34334 | 100 |
| Prime < 100 | 25 | 2.68581 | 23.13952 | 2.14449 | 100 |
| Prime < 1000 | 168 | 2.95216 | 22.99754 | 2.28436 | 100 |
| Prime < 10000 | 1229 | 3.03542 | 29.76729 | 2.30760 | 100 |
| Princeton number | 25 | 2.76170 | 6.94595 | 2.36119 | 100 |
| Mixing sequence | 618 | 2.53958 | 4.78641 | 1.83119 | 100 |
| Pentagonal number | 100 | 2.94847 | 2.06797 | 3.31268 | 100 |
| Keith number | 71 | 2.73338 | 2.16107 | 2.63720 | 1000 |
| Bell number | 100 | 1.08191 | 10.14820 | 1.24828 | 100 |
| Catalan number | 100 | 1.13522 | 0.67095 | 1.15377 | 5000 |
| Lucky number | 45 | 3.15721 | 7.56962 | 0.94576 | 100 |
| Ulam number | 44 | 3.55375 | 9.99445 | 0.81215 | 100 |
| Numeri ideoni | 65 | 1.12410 | 1297612.16 | 0.98591 | 100 |
| Fibonacci number | 100 | 2.05365 | 257000.42 | 1.00560 | 100 |
| Partition number | 94 | 1.23268 | 0.65651 | 1.71409 | 1000 |

Table 3

| Name of sequence | Sample size | Benford chi-square | Benford p-value (%) | TSPB chi-square | TSPB p-value (%) | PB chi-square | PB p-value (%) |
|---|---|---|---|---|---|---|---|
| Square | 100 | 9.096 | 33.43 | 7.837 | 34.72 | 0.362 | 99.91 |
| Cube | 500 | 9.696 | 28.70 | 5.808 | 56.23 | 0.286 | 99.96 |
| Cube | 1000 | 46.459 | 0.00 | 43.725 | 0.00 | 0.48 | 99.81 |
| Cube | 10000 | 443.745 | 0.00 | 472.011 | 0.00 | 3.138 | 79.13 |
| Square root | 99 | 8.612 | 37.61 | 7.002 | 42.86 | 2.778 | 83.61 |
| Prime < 100 | 25 | 7.741 | 45.91 | 7.299 | 39.84 | 1.849 | 93.30 |
| Prime < 1000 | 168 | 45.016 | 0.00 | 36.651 | 0.00 | 0.333 | 99.93 |
| Prime < 10000 | 1229 | 387.194 | 0.00 | 307.322 | 0.00 | 3.297 | 77.07 |
| Princeton number | 25 | 3.452 | 90.29 | 2.762 | 89.72 | 1.302 | 97.16 |
| Mixing sequence | 618 | 15.550 | 4.93 | 9.014 | 25.17 | 1.819 | 93.55 |
| Pentagonal number | 100 | 5.277 | 72.76 | 2.127 | 95.24 | 1.968 | 92.26 |
| Keith number | 71 | 9.215 | 32.45 | 7.688 | 36.09 | 7.402 | 28.53 |
| Bell number | 100 | 3.069 | 93.00 | 3.014 | 88.37 | 2.607 | 85.63 |
| Catalan number | 100 | 2.404 | 96.61 | 2.304 | 94.11 | 1.934 | 92.57 |
| Lucky number | 45 | 7.693 | 46.40 | 5.165 | 63.98 | 5.564 | 47.37 |
| Ulam number | 44 | 6.350 | 60.81 | 2.520 | 92.56 | 2.526 | 86.56 |
| Numeri ideoni | 65 | 2.594 | 95.72 | 2.522 | 92.54 | 2.584 | 85.89 |
| Fibonacci number | 100 | 1.029 | 99.81 | 1.021 | 99.45 | 1.027 | 98.46 |
| Partition number | 94 | 1.394 | 99.43 | 1.132 | 99.24 | 1.513 | 95.86 |

The definition, origin, and comments on the mathematical interest of a large part of these integer sequences have been discussed in Hürlimann [24]. Further details on all sequences can be retrieved from the considerable related literature. The mixing sequence represents the aggregate of the integer sequences considered in Hürlimann [24].

All of the 19 considered integer sequences are quite well fitted by the new PB distribution. For 14 sequences the PB minimum chi-square is the smallest among the three comparative values, and in the other 5 cases its value does not differ much from the chi-square of the TSPB distribution (bold cells in Table 3 and Table 5).
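A minimum chi-square fit can be sketched without a computer algebra system. The Python example below fits the one-parameter TSPB family to the first-digit counts of the first 100 squares by a plain grid search over $c$. It assumes the closed form $\frac{1}{2}\{[\log_{10}(1+x)]^{c} - [\log_{10} x]^{c} + [1-\log_{10} x]^{c} - [1-\log_{10}(1+x)]^{c}\}$ for the TSPB probabilities. The search interval, the grid step, and all function names are illustrative choices, not the procedure used for Table 2.

```python
import math


def tspb_probabilities(c):
    """TSPB probabilities for digits 1..9 (assumed closed form, c > 0)."""
    probs = []
    for x in range(1, 10):
        lo, hi = math.log10(x), math.log10(x + 1)
        probs.append(0.5 * (hi ** c - lo ** c + (1 - lo) ** c - (1 - hi) ** c))
    return probs


def chi_square(observed, probabilities):
    """Pearson chi-square statistic for observed counts against model probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))


def fit_tspb_by_grid(observed, c_min=0.5, c_max=4.0, steps=3500):
    """Return (c, chi-square) minimizing the statistic over an evenly spaced grid."""
    best_c, best_stat = None, math.inf
    for i in range(steps + 1):
        c = c_min + (c_max - c_min) * i / steps
        stat = chi_square(observed, tspb_probabilities(c))
        if stat < best_stat:
            best_c, best_stat = c, stat
    return best_c, best_stat


square_counts = [21, 14, 12, 12, 9, 9, 8, 7, 8]  # Table 1, square row, n = 100
benford_stat = chi_square(square_counts, tspb_probabilities(1.0))  # c = 1 is Benford
best_c, best_stat = fit_tspb_by_grid(square_counts)
print(round(best_c, 3), round(best_stat, 3), round(benford_stat, 3))
```

Because $c = 1$ lies on the grid, the fitted statistic can never exceed the Benford statistic; for comparison, Table 2 reports $c = 0.79837$ for this sequence. A grid search is slow but robust; a one-dimensional optimizer could replace it when finer resolution is needed.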

Strong numerical evidence for the Benford property of the Fibonacci, Bell, Catalan, and partition numbers is observed (corresponding italic cells in Tables 2 and 3). In particular, the values of the parameters of the PB distribution for the Fibonacci sequence are close to $\beta = 1$ and $\alpha = \infty$, which means that the PB distribution is almost Benford, as remarked after Theorem 3.1. It is well known that the Fibonacci sequence is Benford distributed (e.g., Brown and Duncan [30], Wlodarski [31], Sentance [32], Webb [33], Raimi [34], Brady [35], and Kunoff [36]). The same result for Bell numbers has been derived formally in Hürlimann [24, Theorem 4.1]. More generally, a proof that a generic solution of a generic difference equation is Benford is found in Miller and Takloo-Bighash [37] (see also Jolissaint [38, 39]). Results for squares and cubes are also obtained. Recall that the exact probability distribution of the first digit of integer powers with a fixed exponent and a bounded number of digits is known and asymptotically related to Benford’s law (e.g., Hürlimann [40]). The fit of the PB distribution is very good when restricted to finite sequences but breaks down for longer sequences. A further remarkable result is that Benford’s law for the mixing sequence is rejected at the 5% significance level while the PB law is accepted with a 93.6% $p$-value, which improves on the $p$-value of 25.2% obtained for the TSPB law in Hürlimann [24].

The sequence of primes merits a deeper analysis. The Benford property for it has long been studied. Diaconis [41] shows that primes are not Benford distributed. However, it is known that the sequence of primes is Benford distributed with respect to densities other than the usual natural density [42–44]. According to Serre [45, page 76], Bombieri has noted that the analytical density of primes with first digit 1 is $\log_{10} 2$, and this result can be easily generalized to Benford behavior for any first digit. Table 3 shows that the primes less than 1000, respectively 10 000, are not at all Benford or TSPB distributed, but they are approximately PB distributed with high $p$-values of 93.3% and 99.9%. Does this statistical result reveal a new property of the prime number sequence? To answer this question it is necessary to take into account longer sequences and to look at cutoffs other than $10^{k}$ for an integer $k$. Our calculations show that among the prime sequences below $10^{k}$ for fixed $k$ there is exactly one sequence with minimum chi-square value, with an optimal cutoff at a prime with first digit 9. Tables 4 and 5 summarize our results for the primes up to $10^{8}$. Besides the PB best fit with minimum chi-square we also list the PB “linear best” fit, obtained from the PB best fit by letting the number of primes decrease linearly with the first digit while keeping the same numbers of primes with first digits 1 and 9 as in the PB best fit. Though the $p$-value goes to zero very rapidly, the ratio of the minimum chi-square value to the sample size is more stable. For the PB linear best fit this goodness-of-fit statistic, which is also considered in Leemis et al. [21], even decreases and therefore indicates that the first digits of the prime number sequence might be distributed this way. It remains to test, using more powerful computing, whether the mentioned property still holds for even longer sequences of primes. One observes that the best fit parameters are quite stable as the sample size grows and increase only slightly.

Table 4

| Sample size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 25 | 4 | 3 | 3 | 3 | 3 | 2 | 4 | 2 | 1 |
| 168 | 25 | 19 | 19 | 20 | 17 | 18 | 18 | 17 | 15 |
| 1216 | 160 | 146 | 139 | 139 | 131 | 135 | 125 | 127 | 114 |
| 9486 | 1193 | 1129 | 1097 | 1069 | 1055 | 1013 | 1027 | 1003 | 900 |
| 77736 | 9585 | 9142 | 8960 | 8747 | 8615 | 8458 | 8435 | 8326 | 7468 |
| 657934 | 80020 | 77025 | 75290 | 74114 | 72951 | 72257 | 71564 | 71038 | 63675 |
| 5701502 | 686048 | 664277 | 651085 | 641594 | 633932 | 628206 | 622882 | 618610 | 554868 |

Table 5

| Sample size | PB alpha | PB beta | PB best fit: chi-square/sample size | PB best fit: p-value (%) | PB linear best fit: chi-square/sample size | PB linear best fit: p-value (%) |
|---|---|---|---|---|---|---|
| 25 | 23.13952 | 2.14449 | 7.396% | 93.30 | 8.407% | 91.01 |
| 168 | 22.99754 | 2.28436 | 0.198% | 99.93 | 0.781% | 97.10 |
| 1216 | 30.15504 | 2.25800 | 0.175% | 90.76 | 0.152% | 93.34 |
| 9486 | 32.59544 | 2.28442 | 0.172% | 1.20 | 0.084% | 23.86 |
| 77736 | 33.26550 | 2.31262 | 0.175% | 0.00 | 0.075% | 0.00 |
| 657934 | 33.82622 | 2.32908 | 0.185% | 0.00 | 0.070% | 0.00 |
| 5701502 | 34.28132 | 2.34148 | 0.188% | 0.00 | 0.065% | 0.00 |
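The first-digit counts of primes are easy to recompute. The Python sketch below uses a sieve of Eratosthenes to count first digits of all primes below a cutoff and reproduces the rows with sample sizes 25 and 168 (primes below 100 and below 1000). The function names are illustrative, and the optimal cutoffs used for the longer rows are not reconstructed here.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: list of all primes p < limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            multiples = range(p * p, limit, p)
            # Cross out every multiple of p starting from p * p.
            sieve[p * p::p] = bytearray(len(multiples))
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def first_digit_counts(numbers):
    """Counts of leading digits 1..9 (list index 0 corresponds to digit 1)."""
    counts = [0] * 9
    for n in numbers:
        counts[int(str(n)[0]) - 1] += 1
    return counts


for cutoff in (100, 1000):
    primes = primes_below(cutoff)
    print(cutoff, len(primes), first_digit_counts(primes))
```

Dividing a chi-square statistic by the sample size `len(primes)` gives the stabilized goodness-of-fit ratio reported in Table 5.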

Finally, it might be worthwhile to mention another recent intriguing result by Kafri, which shows that the digit distribution of prime numbers obeys the Planck distribution, which is another generalized Benford law, as already mentioned at the end of Section 3.

#### References

1. S. Newcomb, “Note on the frequency of use of the different digits in natural numbers,” American Journal of Mathematics, vol. 4, no. 1–4, pp. 39–40, 1881.
2. F. Benford, “The law of anomalous numbers,” Proceedings of the American Philosophical Society, vol. 78, pp. 551–572, 1938.
3. R. S. Pinkham, “On the distribution of first significant digits,” Annals of Mathematical Statistics, vol. 32, pp. 1223–1230, 1961.
4. D. I. A. Cohen, “An explanation of the first digit phenomenon,” Journal of Combinatorial Theory. Series A, vol. 20, no. 3, pp. 367–370, 1976.
5. T. P. Hill, “Base-invariance implies Benford's law,” Proceedings of the American Mathematical Society, vol. 123, no. 3, pp. 887–895, 1995.
6. T. P. Hill, “The significant-digit phenomenon,” The American Mathematical Monthly, vol. 102, no. 4, pp. 322–327, 1995.
7. T. P. Hill, “A statistical derivation of the significant-digit law,” Statistical Science, vol. 10, no. 4, pp. 354–363, 1995.
8. T. P. Hill, “Benford's law,” Encyclopedia of Mathematics Supplement, vol. 1, p. 102, 1997.
9. T. P. Hill, “The first digit phenomenon,” The American Scientist, vol. 86, no. 4, pp. 358–363, 1998.
10. P. C. Allaart, “An invariant-sum characterization of Benford's law,” Journal of Applied Probability, vol. 34, no. 1, pp. 288–291, 1997.
11. É. Janvresse and T. de la Rue, “From uniform distributions to Benford's law,” Journal of Applied Probability, vol. 41, no. 4, pp. 1203–1210, 2004.
12. A. E. Kossovsky, “Towards a better understanding of the leading digits phenomena,” preprint, 2008, http://arxiv.org/abs/math/0612627.
13. D. Jang, J. U. Kang, A. Kruckman, J. Kudo, and S. J. Miller, “Chains of distributions, hierarchical Bayesian models and Benford's law,” Journal of Algebra, Number Theory: Advances and Applications, vol. 1, no. 1, pp. 37–60, 2009.
14. S. J. Miller and M. J. Nigrini, “The modulo 1 central limit theorem and Benford's law for products,” International Journal of Algebra, vol. 2, no. 1–4, pp. 119–130, 2008.
15. O. Kafri, “Entropy principle in direct derivation of Benford's law,” preprint, 2009, http://arxiv.org/ftp/arxiv/papers/0901/0901.3047.pdf.
16. W. Hürlimann, “Benford's law from 1881 to 2006: a bibliography,” 2006, http://arxiv.org/abs/math/0607168.
17. A. Berg and T. Hill, “Benford Online Bibliography,” 2009, http://www.benfordonline.net.
18. G. Judge and L. Schechter, “Detecting problems in survey data using Benford's law,” Journal of Human Resources, vol. 44, no. 1, pp. 1–24, 2009.
19. G. Judge, L. Schechter, and M. Grendar, “An information theoretic family of data based Benford-like distributions,” Physica A, vol. 97, pp. 201–207, 2007.
20. M. J. Nigrini and S. J. Miller, “Benford's law applied to hydrology data—results and relevance to other geophysical data,” Mathematical Geology, vol. 39, no. 5, pp. 469–490, 2007.
21. L. M. Leemis, B. W. Schmeiser, and D. L. Evans, “Survival distributions satisfying Benford's law,” The American Statistician, vol. 54, no. 4, pp. 236–241, 2000.
22. H.-A. Engel and C. Leuenberger, “Benford's law for exponential random variables,” Statistics & Probability Letters, vol. 63, no. 4, pp. 361–365, 2003.
23. J. R. van Dorp and S. Kotz, “The standard two-sided power distribution and its properties: with applications in financial engineering,” The American Statistician, vol. 56, no. 2, pp. 90–99, 2002.
24. W. Hürlimann, “A generalized Benford law and its application,” Advances and Applications in Statistics, vol. 3, no. 3, pp. 217–228, 2003.
25. W. J. Reed, “The Pareto, Zipf and other power laws,” Economics Letters, vol. 74, no. 1, pp. 15–19, 2001.
26. A. V. Kontorovich and S. J. Miller, “Benford's law, values of $L$-functions and the $3x+1$ problem,” Acta Arithmetica, vol. 120, no. 3, pp. 269–297, 2005.
27. K. Schürger, “Extensions of Black-Scholes processes and Benford's law,” Stochastic Processes and Their Applications, vol. 118, no. 7, pp. 1219–1243, 2008.
28. O. Kafri, “The second law as a cause of the evolution,” preprint, 2007, http://arxiv.org/ftp/arxiv/papers/0711/0711.4507.pdf.
29. O. Kafri, “Sociological inequality and the second law,” preprint, 2008, http://arxiv.org/ftp/arxiv/papers/0805/0805.3206.pdf.
30. J. L. Brown Jr. and R. L. Duncan, “Modulo one uniform distribution of the sequence of logarithms of certain recursive sequences,” The Fibonacci Quarterly, vol. 8, no. 5, pp. 482–486, 1970.
31. J. Wlodarski, “Fibonacci and Lucas numbers tend to obey Benford's law,” The Fibonacci Quarterly, vol. 9, pp. 87–88, 1971.
32. W. A. Sentance, “A further analysis of Benford's law,” The Fibonacci Quarterly, vol. 11, pp. 490–494, 1973.
33. W. Webb, “Distribution of the first digits of Fibonacci numbers,” The Fibonacci Quarterly, vol. 13, no. 4, pp. 334–336, 1975.
34. R. A. Raimi, “The first digit problem,” American Mathematical Monthly, vol. 83, pp. 521–538, 1976.
35. W. G. Brady, “More on Benford's law,” The Fibonacci Quarterly, vol. 16, pp. 51–52, 1978.
36. S. Kunoff, “$N$! has the first digit property,” The Fibonacci Quarterly, vol. 25, no. 4, pp. 365–367, 1987.
37. S. J. Miller and R. Takloo-Bighash, An Invitation to Modern Number Theory, Princeton University Press, Princeton, NJ, USA, 2006.
38. P. Jolissaint, “Loi de Benford, relations de récurrence et suites équidistribuées,” Elemente der Mathematik, vol. 60, no. 1, pp. 10–18, 2005.
39. P. Jolissaint, “Loi de Benford, relations de récurrence et suites équidistribuées. II,” Elemente der Mathematik, vol. 64, no. 1, pp. 21–36, 2009.
40. W. Hürlimann, “Integer powers and Benford's law,” International Journal of Pure and Applied Mathematics, vol. 11, no. 1, pp. 39–46, 2004.
41. P. Diaconis, “The distribution of leading digits and uniform distribution mod 1,” Annals of Probability, vol. 5, pp. 72–81, 1977.
42. R. E. Whitney, “Initial digits for the sequence of primes,” The American Mathematical Monthly, vol. 79, no. 2, pp. 150–152, 1972.
43. P. Schatte, “On ${H}_{\infty }$-summability and the uniform distribution of sequences,” Mathematische Nachrichten, vol. 113, pp. 237–243, 1983.
44. D. I. A. Cohen and T. M. Katz, “Prime numbers and the first digit phenomenon,” Journal of Number Theory, vol. 18, no. 3, pp. 261–268, 1984.
45. J.-P. Serre, A Course in Arithmetic, Springer, New York, NY, USA, 1996.