Abstract

The Hill estimator is often used to infer the power behavior in the tails of experimental distribution functions. This estimator is known to produce bad results in certain situations, which has led to the so-called Hill horror plots. In this brief note, we propose an improved estimator which is simple and coherent and often provides an efficient remedy in the bad situations, especially when the distribution is decreasing slowly, when the data is restricted by external cuts to lie within a finite domain, or even when the distribution is increasing.

1. Introduction

It has been advocated that self-organization, first discovered to dominate sand pile formation [1], may very well apply to many financial, economic, traffic control, or social phenomena. The outcome of these systems is in general an asymptotic power-like behavior of the experimental distribution in some variable. The value of the critical exponent is often a major prediction of the models. Let us also mention models leading to Pareto [2, 3], Zipf [4], Levy-flight [5, 6], or Padé [7–9] type distributions which also provide asymptotic power-like behaviors with some definite critical exponent.

A particularly simple estimator of the critical exponent for a set of data measures [10] has been provided by Hill [11]. It has proved to be very useful and has been widely used. See, for example, the KU Leuven web page [12]. Following Hill, the estimator is first applied to a subset of measures corresponding to the highest values of the variable. In order to increase the statistics, this subset is then extended to include progressively further experimental values of the variable, every set providing a value of . These values are then plotted in a so-called Hill plot [13]. The subset is extended as long as it includes values which can be considered to lie in the asymptotic domain where the distribution is power like. The values obtained when they are stable provide the best value of the Hill estimator, as usually seen on the Hill plot. It is then obvious that the estimator usually takes into account all the highest experimental values of the variable larger than a certain value . The data points with do not belong to the asymptotic region or turn out to be unsafe to use.

Various properties, aspects, and generalizations of the Hill estimator have been discussed and studied in numerous publications. One should mention that many questions about the asymptotic properties, about asymptotic normality, and about the volatility of the index have been addressed (see, e.g., [14–18]). Some authors have proposed, in order to reduce the high volatility of the Hill plots, to smooth the result by averaging the Hill estimator values corresponding to different numbers of order statistics [19]. Other authors have smoothed the Hill estimator by convolving the experimental random variables with a kernel function together with a bandwidth parameter [20]. An optimal choice of the kernel and of the parameter improves the situation greatly.

However, it should be stressed that, in many cases, experimental distributions are related to discrete phenomena, which carry their own natural limits. In other cases, large values in the data may be biased, unsafe or unreachable for technical reasons. Let us give two examples.

One example of considerable importance relates to the evaluation of risk in finance, in particular the risk related to the variations of interest rates. Starting from their known daily variation in the past, one may try to evaluate the probability of a major overnight variation of say 3 or 4 percent in the future. Using a normal law, as often used by risk managers, the catastrophe should occur of the order of once every 10,000 years. Starting from the same available data but with a power law, this may happen three or four times per century. The precise evaluation of the parameter is obviously of paramount importance. In data of the Federal Reserve System, if on any day the variation is larger than a certain predefined limit, the quotation is suspended. Hence the data is artificially cut for high values of the variable. The relevant values must be restricted to a finite domain extending from some value where asymptotics begins to some value where the data is artificially cut.

Another example occurs in sociology. Suppose that some phenomenon depends on the number of inhabitants of cities. It is clear that no data on earth today will be obtained for towns larger than say thirty million inhabitants or smaller than say ten inhabitants. The available data will be restricted to a finite domain.

To conclude, in various cases, the variable is not only discrete but also restricted to lie within a finite domain. Data outside the domain is either not available or unsafe to use or does not correspond to the asymptotic region. Note that the domain may not only have a lowest (left) bound but also a highest (right) bound. Though the lower bound is taken into account by the Hill estimator, the possible presence of the highest bound has not been addressed by Hill.

In this paper, we thus wish to provide a very simple improvement of the Hill estimator (and of the Hill plot) which takes into account, in a perfectly symmetrical way, a lowest value and a highest value defining a safe domain where the power behavior is at work, where data exists, and from which the critical exponent should be inferred.

The problem of existing limits on the data has been subject to much less scrutiny than the Hill estimator itself. We would like to cite the work of Beirlant and Guillou [21] and Beirlant et al. [22] dealing in particular with insurance policies where policy provisions (deduction, limits) constrain the data. They suppose censored data, that is, that the number of events above the constraint value is known, and discuss the influence of censoring. This is not exactly the aim of our article.

We have focused our attention on data which exist in a finite range only and have obtained the simplest estimator which takes into account in a symmetrical way the lower and higher bounds of the domain.

Finally, we would like to mention the related problems of variance and bias of the estimators and especially the question of their experimental determination. For the Hill estimator, this is delicate and it has given rise to much research, as is testified by numerous references [23–26]. Starting from the basic definition of the variance-bias parameters applied to the improved estimator, a detailed discussion following the paths set up for the Hill estimator is certainly worth further work and publications, both in theory and when actual data, which carry their own uncertainty, are used. Indeed, the experimental precision of each data point is not always easy to determine, may in fact depend on the value of the variable itself, and has to be taken into account precisely. On the other hand, in simulations, the result depends on the details of the random number generators.

2. The Hill Estimator

One supposes that, based on some theoretical model, one predicts a distribution density which behaves asymptotically as where and are real and the critical exponent is a positive real parameter. When , the function is supposed to be a rather smooth function which, for large (say ), is often assumed to become essentially a constant . In those cases and within a certain margin of error, the distribution is thus approximated in the form which is scale free. As explained in the introduction, it has been conjectured by different authors that this type of power like distributions and in particular those based on self-organized critical models, rather than the often used Gauss like exponential forms, could very well dominate certain financial, economic, or social phenomena.

Let () be a random sample obtained from experimental data for a phenomenon which is supposed to follow a distribution law satisfying the requirements (2.1) and/or (2.2). The question is to draw inference on the critical exponent from the random sample.

This most important question was discussed very carefully by Hill [11] both from a Bayesian and a frequentist approach. He showed that, in a first approach, both points of view lead to the same very useful answer. His recipe for estimating can be outlined as follows.
(i) Let the set be the set reordered (reversed order statistics) in such a way that that is, the set is ordered in a decreasing fashion, being the highest value.
(ii) Construct the sets of numbers , , and for . The set is defined by which is identical to the set proposed by Hill (see [11]). The set is defined by and the set defined by
(iii) For quite many distributions, Hill showed that is an estimator of improving when is increased until “it seems unwise to proceed.” It is the point at the left of which the form of the actual distribution is no longer approximated safely enough by (2.2). In other words, this occurs when the variable is no longer large enough to be in the asymptotic region of the distribution. Both the phenomenon and the limitation can easily be seen by constructing specific examples.
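The recipe above can be illustrated with a short numerical sketch. It is not taken from the original text: it assumes the convention that the density falls off like x**(-mu), so that the estimate built from the k largest values is 1 plus the inverse of the mean of ln(x_i / x_k), and the function name `hill_plot` and the test data are hypothetical. Other variants of the Hill estimator differ in whether the k-th value itself is included in the average.

```python
import math
import random


def hill_plot(sample):
    """Return (k, mu_k) pairs computed from the k largest values, k = 2..N.

    Assumed convention: the density behaves like x**(-mu), so that
    mu_k = 1 + 1 / (mean of ln(x_i / x_k) over i <= k).
    """
    xs = sorted(sample, reverse=True)           # x_1 >= x_2 >= ... >= x_N
    logs = [math.log(x) for x in xs]
    points = []
    running_sum = logs[0]
    for k in range(2, len(xs) + 1):
        running_sum += logs[k - 1]
        excess = running_sum / k - logs[k - 1]  # <ln x> - ln x_k
        if excess > 0.0:                        # skip exact ties at the top
            points.append((k, 1.0 + 1.0 / excess))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    # Pure power law: density proportional to x**(-3) for x >= 1.
    data = [rng.paretovariate(2.0) for _ in range(5000)]
    for k, mu_k in hill_plot(data)[99::1000]:
        print(k, round(mu_k, 3))
```

Plotting the second component against k gives a Hill plot; for the pure power law used here the values should settle near 3 as k grows.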

Unfortunately, there are simple but somewhat more subtle distributions for which the set does not converge well when is increased. This type of situation has led to what has been called “horror Hill plots.” Such horror plots can easily be constructed by considering simple examples or by looking at [27, page 194]. As argued in the introduction, there are also cases when the discrete experimental distributions carry their own domain, bounded on the left and on the right.

In this short note, we show how to remedy some of these unfortunate situations in a very straightforward way. We construct some numerical examples to show this explicitly. This result will be achieved by taking into account, not only a left boundary but also a right boundary . It is supposed that the data cannot be trusted and/or is not power like at the left of or at the right of .

3. Improvement on the Hill Estimator

We first derive a simple heuristic formula and then show how to apply it to improve the Hill estimator.

3.1. A Simple and Exact Formula

Let us first suppose that is exactly, for , of a power law form with an arbitrary normalization constant . Take two arbitrary positive numbers and define the average value of on the interval as After some algebra, defining and for later convenience one finds, from (3.1), the exact relation which depends on the correction function Equation (3.5) is our basic equation which is exactly valid for an exact power law distribution (3.1) and approximately correct for a distribution which satisfies (2.1) and/or (2.2).
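Since the displayed equations are not reproduced here, the following is a sketch of the algebra behind such a relation, written in notation that is our assumption rather than necessarily the author's: a density $p(x) = C\,x^{-\mu}$ on the interval $[x_L, x_R]$ with $\mu \neq 1$. The grouping of terms in the original equations (3.2)–(3.6) may differ.

```latex
% Average of ln x over [x_L, x_R] for p(x) = C x^{-\mu}, \mu \neq 1 (C cancels):
\[
\langle \ln x \rangle
  = \frac{\int_{x_L}^{x_R} \ln x \; x^{-\mu}\, dx}{\int_{x_L}^{x_R} x^{-\mu}\, dx}
  = \frac{1}{\mu - 1}
    + \frac{x_L^{\,1-\mu} \ln x_L - x_R^{\,1-\mu} \ln x_R}
           {x_L^{\,1-\mu} - x_R^{\,1-\mu}} .
\]
% Equivalent form, isolating the Hill term and a correction depending on x_R:
\[
\langle \ln x \rangle - \ln x_L
  = \frac{1}{\mu - 1}
    - \frac{\ln (x_R / x_L)}{(x_R / x_L)^{\mu - 1} - 1} .
\]
% For \mu > 1 and x_R \to \infty the correction vanishes, leaving
% \mu = 1 + 1 / ( \langle \ln x \rangle - \ln x_L ).
```

The first form is manifestly unchanged when $x_L$ and $x_R$ are exchanged, and in the limit $\mu \to 1$ it tends to $(\ln x_L + \ln x_R)/2$.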

3.2. The Basic Hill Estimator

The basic Hill estimator is obtained when the upper limit is chosen at infinity. Indeed, when is larger than 1 and , (3.5) reduces to

The formula (3.7) can easily be used to draw inference for and from the highest values of the random sample with the chosen order statistics which we consider as the conditional event. Take as the experimental average value of from to the highest empirical value , We find the formula for the Hill estimator (2.4) exactly.

Remark that by using a Simpson rule an experimental average, slightly better than (3.8), is obtained by The difference for the Hill estimator using the slightly better (3.10) rather than (3.8) is usually too minute to care.

3.3. The Improvement

From (3.5), we see that the Hill estimator can be improved by taking into account not only a lowest value in the reduced empirical set but also a highest value . Taking again a sample estimate for in the reduced set, one obtains the solution of as an inference for ,

This obviously provides an easy generalization of the Hill procedure. It makes sense when it is known theoretically that the density function behaves as times a constant or times a slowly varying function and that the random sample is secure for . It takes into account the facts that the exact theoretical form of is not known on the left of and that the data points do not extend beyond because of limited statistics or when the data is poorly known outside a finite domain. In this case, the set of derived from the random sample should include the values inside the interval only and thus take the 's between the two limit points () into account.
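As an illustration of how such an inference could be computed, here is a minimal sketch that solves the truncated power-law relation for the exponent by bisection. It assumes the relation takes the form ⟨ln x⟩ − ln x_L = 1/(μ − 1) − ln(x_R/x_L) / ((x_R/x_L)^(μ−1) − 1), which follows from a density proportional to x**(-μ) on [x_L, x_R]; the function names are hypothetical, and bisection is our choice of solver, not one prescribed by the text.

```python
import math


def mean_log_excess(a, log_ratio):
    """<ln x> - ln x_L for a density x**(-(a + 1)) on [x_L, x_R].

    log_ratio is ln(x_R / x_L). The value decreases monotonically in a,
    from log_ratio (a -> -infinity) to 0 (a -> +infinity).
    """
    t = a * log_ratio
    if abs(t) < 1e-4:                 # series around a = 0 avoids cancellation
        return log_ratio * (0.5 - t / 12.0)
    if t > 700.0:                     # correction term is negligible here
        return 1.0 / a
    return 1.0 / a - log_ratio / math.expm1(t)


def improved_exponent(sample, x_left, x_right):
    """Estimate mu from the sample values lying inside [x_left, x_right]."""
    inside = [x for x in sample if x_left <= x <= x_right]
    log_ratio = math.log(x_right / x_left)
    d = sum(math.log(x / x_left) for x in inside) / len(inside)
    if not 0.0 < d < log_ratio:
        raise ValueError("mean of ln x must lie strictly between the bounds")
    lo, hi = -1.0, 1.0
    while mean_log_excess(lo, log_ratio) < d:   # widen bracket to the left
        lo *= 2.0
    while mean_log_excess(hi, log_ratio) > d:   # widen bracket to the right
        hi *= 2.0
    for _ in range(200):                        # plain bisection
        mid = 0.5 * (lo + hi)
        if mean_log_excess(mid, log_ratio) > d:
            lo = mid
        else:
            hi = mid
    return 1.0 + 0.5 * (lo + hi)
```

Because the left-hand side is monotonic in μ, the root is unique whenever the sample mean of ln x lies strictly between ln x_L and ln x_R. Negative values of μ (increasing distributions) are handled by the same code.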

The generalised Hill plot is obtained by taking first , then increasing until “it seems unwise to proceed” and plotting the as a function of . Otherwise can be increased and decreased until “it seems unwise to proceed.” When the data is not biased for the largest values of , the choice is optimal.

It is finally worth noting the very important fact that our basic equation (3.11) is perfectly symmetrical under the exchange . As a result, it also applies and provides a meaningful answer for distributions which increase ( negative) rather than decrease with , as will be seen in the examples.

Our fundamental equation (3.11) is transcendental and hence requires a further treatment. The estimation of the value of inferred from (3.11) can, for example, be obtained numerically in the two following ways.

Method. Direct Evaluation
By using standard ad hoc computer programs, (3.11) can be solved numerically for .

Method. Iteration
A simple way to achieve the same result is as follows. Define the function as the derivative of the correction function with respect to Take the first order as the Hill solution and define the successive approximations by iteration as In the right hand side, the correction function and its derivative are estimated for the value of of the preceding iteration. Then for . Empirically, for as low as [5, 6] or [79], the approximation for and hence for is already very good.
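The idea of starting from the Hill value and refining it with the correction function and its derivative can be sketched as a Newton iteration. This is our stand-in and not necessarily the exact scheme of the iteration described above: it assumes the relation ⟨ln x⟩ − ln x_L = 1/(μ − 1) − ln(x_R/x_L) / ((x_R/x_L)^(μ−1) − 1), and the function names are hypothetical.

```python
import math


def excess_and_slope(a, log_ratio):
    """Return h(a) and dh/da, where h(a) = <ln x> - ln x_L for a density
    x**(-(a + 1)) on [x_L, x_R] and log_ratio = ln(x_R / x_L)."""
    t = a * log_ratio
    if abs(t) < 1e-4:                       # series around a = 0
        h = log_ratio * (0.5 - t / 12.0)
        dh = log_ratio ** 2 * (-1.0 / 12.0 + t * t / 240.0)
    elif t > 700.0:                         # upper bound is irrelevant here
        h, dh = 1.0 / a, -1.0 / a ** 2
    else:
        em1 = math.expm1(t)
        h = 1.0 / a - log_ratio / em1
        dh = -1.0 / a ** 2 + log_ratio ** 2 * (em1 + 1.0) / em1 ** 2
    return h, dh


def iterate_exponent(mean_log, x_left, x_right, steps=4):
    """Successive approximations of mu; the first entry is the Hill value."""
    log_ratio = math.log(x_right / x_left)
    d = mean_log - math.log(x_left)
    a = 1.0 / d                             # Hill start: x_right -> infinity
    history = [1.0 + a]
    for _ in range(steps):
        h, dh = excess_and_slope(a, log_ratio)
        a -= (h - d) / dh                   # Newton update
        history.append(1.0 + a)
    return history
```

For moderate cases a handful of steps is enough in our checks, but a Newton scheme carries no general convergence guarantee, so a bracketing solver is a safer fallback when the starting value is very poor.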

4. Theoretical Comparison between the Hill Estimator and the New Estimator

When should the new estimator obviously be preferred to the Hill estimator? We remark that the latter (see (2.4)) depends critically on two quantities only. These are the average value of the logarithm of the data points and the smallest (highest ) included in the sample. The improved estimator depends on these two quantities but also crucially on the third one: the highest attained value .

Let us justify this by discussing, more carefully than in the introduction, three examples of situations pertaining to three completely different domains of research (sociology, economy, high energy physics), where both limits and should be taken into account.
(i) Suppose that some sociological theory predicts an asymptotic power law for some phenomenon (crime rate, economic growth, power consumption) as a function of the number of inhabitants in towns, either when this number is high or when it is small. Obviously, since every country has a finite number of inhabitants, the number of people in any city on earth is less than some number, actually inhabitants for Bombay (India). There can be no data for larger numbers. The data is cut artificially on this higher side essentially by the fact that the earth has a finite population. On the other end of the statistics, when is small, no aggregation of human beings is called a city if it contains fewer than about ten inhabitants, depending on the rules for defining towns in different countries. Even more, no town will ever be defined with a fraction of one inhabitant. Hence the data is cut on the lower side by country-dependent administrative decisions.
(ii) Suppose that some economic theory predicts a power law for the daily variation of the interest rates (or of the price of some share) when they are large (the tail of the variations). Usually, if this variation on some day exceeds some predefined number, the floor has rules to suspend the quotation. Hence, no data will ever be produced with higher variations. This artificial limit, which is usually not taken into account in the models, is included in the analysis of the data by the dependence of the new estimator on the highest value .
(iii) In high energy physics, when two beams of initial particles are made to scatter head on at very high speed along a certain direction, final particles emerge from the scattering region at an angle with respect to the direction of the initial beams. The probability distribution is often plotted as a function of the transverse momentum which is related to this angle. The interaction region is surrounded by detectors. In the forward and in the backward directions there is a blind cone where the scattered particles cannot be identified among those of the intensive initial beams. Hence there are artificial limits both at low and at high transverse momenta outside which no data is produced because of the physics of the measuring devices. When the tails of the distributions of the scattered particles with respect to beam are studied, these artificial limitations have to be taken into account.
(iv) In practical situations, to obtain the improved estimator, the bounds and , which limit the data points on the left and on the right, have to be chosen in an appropriate manner. There is, a priori, no unique method for choosing them. For actual phenomena, related to some definite model (e.g., to solutions of a differential equation), the model itself should provide some information on the bounds. The onset of the asymptotic domain of the model provides an indication on . On the other hand, the measurements themselves may carry a natural right bound where the measurements become meaningless (e.g., a town of more than inhabitants). The estimator should be evaluated for various values of the bounds and educated guesses have to be applied. In general, it is convenient to simply use the highest data point as unless there are good reasons to reject it. In high energy physics, “good experimentalists” are known to apply suitable and correct cuts to their data, depending on a deep inside knowledge of their apparatus.

5. Tests of the New Estimator

In order to test the new estimator, we have used it for some trial functions. The new estimator performs as well as or even slightly better than the original Hill estimator in cases where the latter is known to be good. When applied to cases where the Hill estimator is known to perform more poorly, the improvement is often spectacular and the new estimator converges to a much better approximate value. Using the method () above (see (3.15)), a good value is usually obtained after three steps only and always after four steps, that is, from .

Let us give here a few examples where we have used the following procedure. The examples are divided into sets. The examples ()–() are related to very simple distributions where the best value of is obtained by letting become the largest available value in the data and the lowest, that is, . In the examples ()–(), the Hill plot is drawn for somewhat more complicated distributions where the deviation from a purely power behavior shows up for small . In all the cases, (see (3.11)) has been chosen.
(1) We have used FORTRAN to do numerical calculations.
(2) We have used the DRNGCS module of the International Mathematical and Statistical Libraries (IMSL) to generate randomly a certain number of , inside a predefined interval and following a given distribution . It should be noted that the normalization of is inessential. As required by DRNGCS, this normalization is chosen in such a way that the cumulated distribution has . In our examples, the distribution and the cumulated distribution have always been defined on a grid of 1000 to 10,000 points regularly separated between and . This is precise enough for our purpose.
(3) For each example ()–(), we give the results in Table 1 which includes the following.
    (i) The trial distribution , up to an arbitrary factor, fed to the random generator.
    (ii) The predefined limits and chosen for the data produced by the random generator and ordered according to (2.3).
    (iii) The number of randomly produced .
    (iv) The smallest value and highest value obtained from the random generator. Obviously .
    (v) The average value of the natural logarithm of the as produced by the random generator.
    (vi) The value expected for , the exponent of the power decrease of the starting distribution .
    (vii) The value produced by the Hill estimator (see (3.14)).
    (viii) The values produced by the iteration. Quite generally, the values produced by the iteration are already very good except in example () below where we had to proceed to .
In almost all cases (except in example ()), it turns out that for with at least four significant digits well within the errors of the method.
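The data-generation step of this procedure can be mimicked in a few lines. The sketch below is not the FORTRAN/IMSL setup described above: instead of tabulating the cumulated distribution on a grid, it swaps in the closed-form inverse of the cumulated distribution of a pure power law restricted to an interval, which is an assumption that only covers that simple family. The function names and the numerical values (exponent 2.5, interval from 1 to 4, 5000 points) are hypothetical.

```python
import math
import random


def sample_truncated_power_law(mu, x_left, x_right, n, rng):
    """Draw n values with density proportional to x**(-mu) on [x_left, x_right]."""
    a = mu - 1.0
    values = []
    for _ in range(n):
        u = rng.random()
        if abs(a) < 1e-12:
            # mu = 1: ln x is uniformly distributed between the two bounds
            x = x_left * (x_right / x_left) ** u
        else:
            lo, hi = x_left ** (-a), x_right ** (-a)
            x = (lo - u * (lo - hi)) ** (-1.0 / a)   # inverse cumulated distribution
        values.append(x)
    return values


def table_row(values):
    """Summary quantities of the kind listed in the table description."""
    n = len(values)
    x_min, x_max = min(values), max(values)
    mean_log = sum(math.log(x) for x in values) / n
    # Hill value with the smallest data point as left bound and no right bound
    hill = 1.0 + 1.0 / (mean_log - math.log(x_min))
    return {"n": n, "x_min": x_min, "x_max": x_max,
            "mean_log": mean_log, "hill": hill}


if __name__ == "__main__":
    rng = random.Random(0)
    row = table_row(sample_truncated_power_law(2.5, 1.0, 4.0, 5000, rng))
    print(row)
```

With a narrow interval such as this one, the Hill value comes out well above the exponent fed to the generator, which is the kind of failure the finite right bound is meant to correct.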

Example
Let us first give an example where the old and the new estimators perform both well. We have supposed a power law (3.1) with , , and . In this case, is large enough and can be essentially ignored in all the formulas. Following the procedure outlined above we obtain the line () of Table 1. We see that in the values produced by the Hill estimator and by the new estimator are almost equal and close to the expected value . The new estimator is slightly better. The same behavior is found when and the related are large enough.

Example
The second example is almost identical to the first example except that the data is artificially reduced to the interval , . We find the line () of Table 1. Here we see that in this extreme case, where the interval allowed has been reduced drastically, the Hill estimator has produced a very bad result while the new estimator converges toward a much better value close to the expected one.

Example
The third example in line () of Table 1 is identical to the second except that the number of , is increased to 5000. The new estimator behaves even better while the Hill one is still very bad. This behavior was expected. It should be noted that, when is increased from 4 to over 100, the two estimators tend gradually to produce compatible results. There is a weak dependency on the value chosen for provided that it is chosen large enough.

Examples, (), and ()
When the probability distribution is chosen to decrease very slowly, here , even if is chosen rather large , or , we see in lines (), (), and () of Table 1 that the new estimator is definitely better than the Hill one. An increase of and/or of does not alter this fact drastically.

Examples and ()
Let us now give an example where the initial distribution involves a slowly varying function as the consequence of a Padé (1,4) type distribution in the positive region : We expect a type behavior provided that is chosen sufficiently large. The actual left limit to be used depends on the values of the parameters . We analyze the problem in the case of an experimental symmetric distribution of variation of interest rates of a Padé type (see [7–9] and especially the results obtained in the first reference). Data which were extracted from the Federal Reserve System lead to and for a lag of one day and a maturity of one year. Here, is expressed in the convenient unit of percent/year. The parameters and as well as and are given in percent/year. It is easy to see that for the term in to dominate over the term in the denominator, a value of of about one percent is needed. Hence has to be chosen greater than one percent. We do not expect a behavior for smaller values. In lines () and () of Table 1, the left boundary is taken to be one percent while the right one , when the floor would be supposed to suspend the quotation, is respectively two and five percent.

Examples , (), (), and ()
Let us now turn to two examples when there is a logarithmic factor, namely, and when there is an inverse logarithmic factor In both cases the distribution decreases rather slowly so that taking into account the right boundary has usually a major effect. In these cases, we have increased the value of to 5000. The results depend rather weakly on this choice. In lines () and () of Table 1, the presence of the factor in the denominator leads to an effective decrease of the probability faster than . Hence, the effective is expected and turns out to be somewhat higher than one. In the opposite way, in lines () and (), when the logarithm appears in the numerator, the effective is expected to be somewhat lower than one and it is.


Since the treatment of the new estimator (3.11) is perfectly symmetrical under the exchange of the two limits and , it can be applied directly without any change to infer the asymptotic power behavior of distributions which increase with and hence have a power behavior with negative up to a smooth multiplicative function. Let us give a last example for , and . Obviously the Hill estimator produces a wrong sign for . The new estimator is perfect but the iterations had to be carried effectively to the fourth level since the starting is so bad.

Figure 1
In Figure 1, we show the original Hill plot and the improved plot related to the Padé example (). The parameters in (5.3) are again and . Here we have chosen percent and percent and generated 2000 data points. The continuous line is the original Hill plot, the dotted line is the improved plot, and the dashed line is the expected asymptotic value .

Figure 2
We have taken a distribution The random data with has been collected in the interval and . The original Hill estimate and the improved estimate are plotted in terms of ranging up to in Figure 2. The continuous line is the original Hill plot, the dotted line is the improved plot and the dashed line is the expected asymptotic value .

Figure 3
In Figure 3, the distribution is taken as related to example (). The random data with has been collected in the interval and . The continuous line is the original Hill plot, the dotted line is the improved plot and the dashed line is the expected asymptotic value (regardless of the ). In fact one sees that when reaches about 4000, the value of is close to 1. For larger values of , decreases below . This is due to the fact that the slowly varying function in the numerator leads to a slightly decreased value of whose effect increases with . Indeed, since is an increasing function of , the distribution is expected to decrease somewhat more slowly than .

Figure 4
In Figure 4, the distribution is taken as related to Examples (), (), and (). The random data with has been collected in the interval and . The continuous line is the original Hill plot, the dotted line is the improved plot and the dashed line is the expected asymptotic value .

6. Conclusion

We have shown that, by a very simple alteration, the Hill estimator, which provides a useful inference for the power behavior of the tail of a distribution, can be easily improved to cover cases where it performs badly. This includes cases when the tail is not of the form of a pure power law but is multiplied by a slowly varying function, including logarithms for example. It also applies when the power is of the order of or smaller than one. It even works for inferring the tails of increasing distributions, that is, when is negative.

Needless to say, the approach outlined in this paper is one of efficiency. From the frequentist point of view, it is perfectly justified. However, in some way, the Bayesian point of view is also, but granted not completely, met as the theoretical and empirical knowledge of extreme tails of the distribution is somehow better taken into account. It should be stressed that the new estimator applies particularly when the data is confined by artificial external conditions to lie within a finite domain bounded not only by a lowest value of the variable but also by a highest value.

Acknowledgments

This work was supported in part by the Belgian F.N.R.S. (Fonds National de la Recherche Scientifique). The author would like to thank Professor I. Platten for carrying out detailed numerical tests in an early phase of this research and Professor F. Grard for a critical reading of the manuscript.