International Scholarly Research Notices


Research Article | Open Access

Volume 2012 | Article ID 746203

Salim Bouzebda, Mohamed Cherfi, "Dual Divergence Estimators of the Tail Index", International Scholarly Research Notices, vol. 2012, Article ID 746203, 14 pages, 2012.

Dual Divergence Estimators of the Tail Index

Academic Editor: M. Galea
Received 17 Aug 2012
Accepted 30 Sep 2012
Published 01 Nov 2012


Abstract

The main purpose of the present paper is to propose new estimators of the tail index using $\phi$-divergences and the duality technique. The robustness of these estimators is explored through the influence function approach. The empirical performance of the proposed estimators is illustrated by simulation.

1. Introduction

In extreme value statistics, emphasis lies on the modelling of rare events, mostly events with a low frequency but a high impact. Common practice is to characterize the size and frequency of such extreme events mainly by the extreme value index, and the main problem is then to estimate this unknown parameter. Since only the upper tail of the distribution is involved, it is reasonable to construct estimators based on the top extreme values of a sample $X_1, \ldots, X_n$. The most commonly used estimator of this kind is the one proposed by Hill [1]. The most prominent estimators of this real-valued parameter are maximum likelihood estimators of specific parametric models fitted to excesses over large thresholds (see [2]). Alternatives to the Hill estimator are discussed by Smith [2]; one of his conclusions (see, e.g., [2], pp. 1181-1182) is that in general the Hill estimator compares favourably with its competitors. These maximum likelihood estimators often prove to be highly efficient, but they are not robust against deviations of the actual distribution from the assumed parametric model. This is, for instance, the case in the presence of outliers or suspicious data, where the performance of the maximum likelihood estimators and the quality of the corresponding estimates of the tail index are often seriously affected. Maximum likelihood estimation is known to be very sensitive to deviations from the theoretical distribution, and the class of heavy-tailed distributions is, not surprisingly, no exception: under such deviations it can fail to provide a reasonable parameter estimate; refer to Alexander [3] among others.

Robustness is an important issue in extreme value theory; see for instance Dell'Aquila and Embrechts [4]. As shown in Brazauskas and Serfling [5], small errors in the estimation of the tail index can already produce large errors in the estimation of quantities based on the tail index. Hence, to overcome the lack of robustness to outliers of this kind of estimator, some robust methods for extreme values have been discussed in the recent literature. The interested reader may refer to Brazauskas and Serfling [6] for robust estimation in the context of strict Pareto distributions. Dupuis and Field [7] derived robust estimation methods for the case where the observations follow a generalized extreme value distribution, and Peng and Welsh [8] and Juárez and Schucany [9] did so for the generalized Pareto distribution, light or heavy tailed. Vandewalle et al. [10] considered a robust estimation method based on the minimization of the integrated squared error criterion using an incomplete density mixture model; Kim and Lee [11] used the minimum density power divergence approach of Basu et al. [12] to estimate the tail index in the dependent case; more recently, Hubert et al. [13] proposed a method to detect outliers that can influence the Hill estimator [1].

In this paper, we propose a new robust tail index estimation procedure, based on $\phi$-divergences and the duality technique, for the semiparametric setting of Pareto-type (or heavy-tailed) distributions. Here, the strict Pareto distribution is assumed to hold only asymptotically, that is, for excess distributions over high enough threshold values. The proposed method extends the maximum likelihood procedure: it will be seen that the latter corresponds to the particular choice of the $KL_m$-divergence, which leads to Hill's estimator [1].

The remainder of this paper is organized as follows. Section 2 is devoted to preliminary results on $\phi$-divergences and to the introduction of our estimator. Section 3 presents our new results on the influence function. In Section 4, we investigate the finite-sample performance of the newly proposed estimators. To avoid interrupting the flow of the presentation, all technical arguments are deferred to the Appendix.

2. Extreme Value Statistics and $\phi$-Divergence Setting

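A widely used family of divergences is the so-called “power divergences”, introduced by Cressie and Read [14] (see also Liese and Vajda [15, Chapter 2]; the paper of Rényi [16] should also be mentioned here). They are defined through the class of convex real-valued functions, for $\gamma$ in $\mathbb{R} \setminus \{0, 1\}$:

$$x \in \mathbb{R}_+^* \mapsto \phi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}. \tag{2.1}$$

We have $\phi_0(x) := -\log x + x - 1$ and $\phi_1(x) := x \log x - x + 1$. (For all $\gamma \in \mathbb{R}$, we define $\phi_\gamma(0) := \lim_{x \downarrow 0} \phi_\gamma(x)$.) So, the $KL$-divergence is associated to $\phi_1$, the $KL_m$ to $\phi_0$, the $\chi^2$ to $\phi_2$, the $\chi^2_m$ to $\phi_{-1}$, and the Hellinger distance to $\phi_{1/2}$. In the monograph by Liese and Vajda [15] the reader may find detailed ingredients of the modeling theory as well as surveys of the commonly used divergences.

We recall some basic definitions for the reader's convenience. Throughout, $\{P_\theta : \theta \in \Theta\}$ denotes a parametric model and $\theta_0$ the true value of the parameter. Unless otherwise specified, we will assume that $\phi$ is a function of class $C^2$, strictly convex, such that, for fixed $\theta$,

$$\int \left| \phi'\!\left( \frac{dP_\theta}{dP_\alpha} \right) \right| dP_\theta < \infty \quad \text{for all } \alpha \in \Theta. \tag{2.2}$$

According to Broniatowski and Keziou [17], if the function $\phi$ satisfies the following condition: there exists $0 < \delta < 1$ such that, for all $c$ in $[1 - \delta, 1 + \delta]$, we can find numbers $c_1, c_2, c_3$ such that

$$\phi(cx) \le c_1 \phi(x) + c_2 |x| + c_3 \quad \text{for all real } x, \tag{2.3}$$

then the assumption (2.2) is satisfied whenever $D_\phi(\theta, \alpha) < \infty$, where $D_\phi(\theta, \alpha) := \int \phi(dP_\theta / dP_\alpha) \, dP_\alpha$ stands for the $\phi$-divergence between $P_\theta$ and $P_\alpha$; we refer the reader to Broniatowski and Keziou [18, Lemma 3.2]. The real convex functions (2.1), associated with the class of power divergences, all satisfy the condition (2.2), including all standard divergences.

Under assumption (2.2), using the Fenchel duality technique, the divergence $D_\phi(\theta, \theta_0)$ can be represented as resulting from an optimization procedure,

$$D_\phi(\theta, \theta_0) = \sup_{\alpha \in \Theta} \int h(\theta, \alpha) \, dP_{\theta_0}, \tag{2.4}$$

where $h(\theta, \alpha) : x \mapsto h(\theta, \alpha, x)$ and

$$h(\theta, \alpha, x) := \int \phi'\!\left( \frac{dP_\theta}{dP_\alpha} \right) dP_\theta - \left[ \frac{dP_\theta}{dP_\alpha}(x) \, \phi'\!\left( \frac{dP_\theta}{dP_\alpha}(x) \right) - \phi\!\left( \frac{dP_\theta}{dP_\alpha}(x) \right) \right]. \tag{2.5}$$

This result was elegantly proven in Keziou [19], Liese and Vajda [20], and Broniatowski and Keziou [17]. Broniatowski and Keziou [18] called it the dual form of a divergence, due to its connection with convex analysis. Furthermore, the supremum in the display (2.4) is unique and reached at $\alpha = \theta_0$, independently of the value of $\theta$.

Let $X_1, \ldots, X_n$ be an independent, identically distributed (i.i.d.) sample with unknown distribution $P_{\theta_0}$, and let $\mathbb{P}_n$ denote the associated empirical measure, so that $\mathbb{P}_n h(\theta, \alpha) = n^{-1} \sum_{i=1}^{n} h(\theta, \alpha, X_i)$. Naturally, a class of estimators of $\theta_0$, called “dual $\phi$-divergence estimators” (DDE's), is defined by

$$\hat{\alpha}_\phi(\theta) := \arg\sup_{\alpha \in \Theta} \mathbb{P}_n h(\theta, \alpha), \quad \theta \in \Theta, \tag{2.6}$$

where $h(\theta, \alpha)$ is the function defined in (2.5). The class of estimators $\hat{\alpha}_\phi(\theta)$ satisfies

$$\mathbb{P}_n \frac{\partial}{\partial \alpha} h(\theta, \hat{\alpha}_\phi(\theta)) = 0. \tag{2.7}$$

Formula (2.6) defines a family of $M$-estimators indexed by the function $\phi$ specifying the divergence and by some instrumental value of the parameter $\theta$. Applications of the dual representation of $\phi$-divergences have been considered by many authors. We cite, among others, Keziou and Leoni-Aubin [21] for semiparametric two-sample density ratio models; bootstrapped $\phi$-divergence estimates are considered in Bouzebda and Cherfi [22]; an extension of dual $\phi$-divergence estimators to right-censored data is introduced in Cherfi [23]; for estimation and tests in copula models we refer to Bouzebda and Keziou [24, 25] and the references therein. The performance of dual $\phi$-divergence estimators for normal models is studied in Cherfi [26].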
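As a small illustration of the power-divergence family recalled above, the following Python sketch evaluates the generator $\phi_\gamma$, with the two limiting cases $\gamma = 0$ and $\gamma = 1$ handled separately. The function name `phi` and the dictionary `NAMED` are illustrative choices, not notation from the paper.

```python
import math

def phi(gamma: float, x: float) -> float:
    """Power-divergence generator phi_gamma(x) for x > 0 (Cressie-Read family)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if gamma == 0:   # modified Kullback-Leibler (KL_m)
        return -math.log(x) + x - 1.0
    if gamma == 1:   # Kullback-Leibler (KL)
        return x * math.log(x) - x + 1.0
    return (x**gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

# Standard divergences and the value of gamma that generates each of them.
NAMED = {"KL": 1.0, "KL_m": 0.0, "chi2": 2.0, "chi2_m": -1.0, "Hellinger": 0.5}

if __name__ == "__main__":
    for name, g in NAMED.items():
        print(f"{name:10s} gamma={g:4.1f}  phi(1)={phi(g, 1.0):.3f}  phi(2)={phi(g, 2.0):.4f}")
```

Every generator vanishes at $x = 1$, and the general formula approaches the two special cases as $\gamma$ tends to 0 or 1, which is why a single index $\gamma$ is enough to label the whole family.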

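In what follows, we describe the procedure used to obtain the DDE for the tail index. Let $X_1, \ldots, X_n$ be an independent, identically distributed (i.i.d.) sample from an unknown distribution function (d.f.) $F$. It is well known that a distribution is in the domain of attraction of a Fréchet distribution if and only if it has a regularly varying tail. We can therefore assume that $1 - F$ is regularly varying at infinity, with an exponent governed by the tail index of $F$; equivalently, $1 - F$ is a power function multiplied by a function $L$ that is slowly varying at infinity, namely,

$$\lim_{t \to \infty} \frac{L(tx)}{L(t)} = 1 \quad \text{for all } x > 0.$$

In the Pareto-type case, the conditional distribution of the relative excesses $X/u$ over a threshold $u$, given $X > u$, is ultimately (as $u \to \infty$) of strict Pareto form.

A popular choice for the threshold in threshold-based methods is $X_{n-k,n}$, the $(k+1)$th largest observation of the sample, where $X_{1,n} \le \cdots \le X_{n,n}$ denote the order statistics and $k = k_n$ is an integer depending on $n$. Here and elsewhere, $\lfloor y \rfloor$ denotes the largest integer $\le y$. The quantile function pertaining to $F$ is defined, for $0 < t < 1$, by

$$Q(t) := \inf\{x : F(x) \ge t\}.$$

The empirical quantile function is given, for each $n \ge 1$ and $0 < t \le 1$, by

$$Q_n(t) := X_{i,n} \quad \text{for } \frac{i-1}{n} < t \le \frac{i}{n}, \quad i = 1, \ldots, n.$$

The threshold $X_{n-k,n}$ is easily seen to be equal to $Q_n(1 - k/n)$.

The idea of constructing the DDE for the tail index is to assume that the above Pareto approximation holds exactly as a model for the conditional distribution of the relative excesses $X_{n-i+1,n} / X_{n-k,n}$, $i = 1, \ldots, k$. That is, we fit a Pareto model to the relative excesses. In this framework, the estimation of the tail index can be handled through dual divergence techniques, which provide a wide range of estimators, including the Hill estimator; all of them can be compared with respect to their robustness properties. Consider the Pareto density indexed by the tail index. Specializing (2.4) to this setting, an elementary calculation gives, for $\gamma$ in $\mathbb{R} \setminus \{0, 1\}$, a closed-form expression for the function $h$ of (2.5); using it, the DDE of the tail index is obtained by maximizing the empirical mean of $h$ over the $k$ relative excesses. We now consider an interesting particular case of the previous setup: for $\gamma = 0$, the criterion reduces to the Pareto log-likelihood of the relative excesses, which leads to the famous Hill estimator [1], built from the mean log-excess

$$H_{k,n} := \frac{1}{k} \sum_{i=1}^{k} \log X_{n-i+1,n} - \log X_{n-k,n},$$

independently of $\theta$.

Mason [27] showed that the Hill estimator is consistent if $k = k_n$ is a sequence of positive integers satisfying

$$1 \le k_n \le n, \quad k_n \to \infty, \quad \frac{k_n}{n} \to 0 \quad \text{as } n \to \infty. \tag{2.22}$$

Further investigations concerning the asymptotic distribution of the Hill estimator have been made by Hall [28], Csörgő and Mason [29], Haeusler and Teugels [30], Beirlant and Teugels [31], and Bouzebda [32]. These results hold under certain additional regularity conditions on $F$ and on $k$ satisfying (2.22).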
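The construction described above can be sketched numerically. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the strict Pareto density $p_\alpha(y) = \alpha y^{-\alpha-1}$, $y > 1$, for the relative excesses (the paper's own parametrization is not recoverable here), for which the integral in the dual form has the closed form $\int (p_\theta/p_\alpha)^{\gamma-1} p_\theta = \theta^\gamma \alpha^{1-\gamma} / (\gamma\theta - (\gamma-1)\alpha)$ whenever the denominator is positive. The names `h`, `dde_tail_index`, `hill_tail_index`, and `_argmax`, as well as the search interval, are hypothetical.

```python
import math

def h(gamma, theta, alpha, y):
    """Dual-form integrand h(theta, alpha, y) for the assumed Pareto density
    p_a(y) = a * y**(-a - 1), y > 1, and the power divergence phi_gamma."""
    ratio = (theta / alpha) * y ** (alpha - theta)     # p_theta(y) / p_alpha(y)
    if gamma == 0:                                     # KL_m: Pareto log-likelihood ratio
        return -math.log(ratio)
    if gamma == 1:                                     # KL
        return math.log(theta / alpha) + (alpha - theta) / theta - ratio + 1.0
    c = gamma * theta - (gamma - 1.0) * alpha          # integrability constraint c > 0
    if c <= 0:
        return -math.inf
    first = (theta**gamma * alpha ** (1.0 - gamma) / c - 1.0) / (gamma - 1.0)
    return first - (ratio**gamma - 1.0) / gamma

def _argmax(f, lo, hi, grid=400, iters=80):
    """Coarse grid search followed by golden-section refinement (maximization)."""
    step = (hi - lo) / grid
    best = max((lo + i * step for i in range(grid + 1)), key=f)
    a, b = max(lo, best - step), min(hi, best + step)
    r = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - r * (b - a), a + r * (b - a)
        if f(c) < f(d):
            a = c
        else:
            b = d
    return 0.5 * (a + b)

def relative_excesses(sample, k):
    """Top k observations divided by the threshold X_{n-k,n} (needs k < n, positive data)."""
    xs = sorted(sample)
    u = xs[-k - 1]
    return [x / u for x in xs[-k:]]

def dde_tail_index(sample, k, gamma, theta, lo=0.05, hi=10.0):
    """Maximize the empirical mean of h over alpha in [lo, hi] (assumed search range)."""
    ys = relative_excesses(sample, k)
    return _argmax(lambda a: sum(h(gamma, theta, a, y) for y in ys) / k, lo, hi)

def hill_tail_index(sample, k):
    """Hill-type estimate of alpha: reciprocal of the mean log-excess."""
    return k / sum(math.log(y) for y in relative_excesses(sample, k))

if __name__ == "__main__":
    n, k, alpha0 = 2000, 500, 2.0
    # Deterministic "ideal" Pareto sample built from quantiles, for illustration only.
    sample = [(1.0 - (i - 0.5) / n) ** (-1.0 / alpha0) for i in range(1, n + 1)]
    print("Hill:", hill_tail_index(sample, k))
    for g in (0.0, 0.5, 1.0, 2.0):
        print("gamma =", g, "DDE:", dde_tail_index(sample, k, g, theta=2.5))
```

Under these assumptions, the case $\gamma = 0$ reproduces the Hill-type estimate whatever the value of `theta`, while the other values of $\gamma$ give estimators that depend on the escort value `theta`.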

3. Influence Function

In this section we study the robustness properties of the proposed estimators theoretically. In particular, we derive their influence functions, from which the asymptotic variance follows. Recall that the influence function of a functional $T$ at a distribution $F$ describes the effect on the estimate of an infinitesimal contamination of $F$ at the point $x$ and is given by

$$\mathrm{IF}(x; T, F) := \lim_{\varepsilon \downarrow 0} \frac{T(F_{\varepsilon, x}) - T(F)}{\varepsilon}, \tag{3.1}$$

where $F_{\varepsilon, x} := (1 - \varepsilon) F + \varepsilon \delta_x$, $\delta_x$ is the Dirac measure putting all its mass at $x$, and $0 < \varepsilon < 1$. In the following, we will derive the influence function for the functional form of the newly proposed estimator in an analogous way as for the classical $M$-estimators [33]. General results on influence functions of the dual $\phi$-divergence estimators can be found in Toma and Broniatowski [34]. In the notation used below, $\mathbb{1}_A$ stands for the indicator function of the event $A$. Our results are summarized in the following proposition; its proof is given in the Appendix.

Proposition 3.1. The influence function of the functional corresponding to an estimator is given by

To illustrate the behavior of the obtained influence functions we restrict ourselves to the strict Pareto case; simulation results for other heavy-tailed distributions are presented in the next section. Figure 1 plots the influence functions of our estimators for the Pareto distribution. Observe that, for the Hellinger distance ($\gamma = 0.5$) and the -divergence (), the influence for the tail index becomes negligible for large outliers. The influence functions are bounded, making the associated functionals robust, in contrast with the Hill estimator ($\gamma = 0$) and the other divergences.
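A finite-sample counterpart of the influence function is the sensitivity curve, which measures how much an estimate moves when a single observation $x$ is added to the sample. The Python sketch below computes it for the Hill statistic (the mean log-excess); it is only an illustration of the lack of robustness discussed above, and the function names and the rescaling by $n + 1$ are conventional choices rather than the paper's definitions.

```python
import math

def hill_statistic(sample, k):
    """Mean log-excess of the top k observations over the threshold X_{n-k,n}."""
    xs = sorted(sample)
    u = xs[-k - 1]
    return sum(math.log(x / u) for x in xs[-k:]) / k

def sensitivity_curve(sample, k, x):
    """Rescaled change in the Hill statistic when one point x is added."""
    n = len(sample)
    return (n + 1) * (hill_statistic(list(sample) + [x], k) - hill_statistic(sample, k))

if __name__ == "__main__":
    n, k, alpha0 = 200, 50, 2.0
    # Deterministic Pareto-quantile sample, used here instead of random draws.
    sample = [(1.0 - (i - 0.5) / n) ** (-1.0 / alpha0) for i in range(1, n + 1)]
    for x in (1e2, 1e4, 1e8):
        print(f"x = {x:8.0e}   SC = {sensitivity_curve(sample, k, x):.3f}")
```

For contaminating points beyond the sample maximum the curve keeps growing like $\log x$, which is the finite-sample signature of an unbounded influence function.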

4. Simulation

In order to illustrate the robustness of the proposed statistical method, its finite-sample behavior is investigated on both contaminated and uncontaminated data. For the tail index, a comparison is made between the well-known Hill estimator [1] and the newly proposed estimator.

We consider simulated samples without contamination, each containing observations, from the two Pareto-type distributions. (i)The Fréchet distribution given by . (ii)The Burr distribution given by In the simulations, we have chosen , for the Burr distribution , , . The means of the DDE (left) and the corresponding empirical mean squared errors (right), also as a function of are plotted. The horizontal line indicates the true value of .
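Samples from the two families can be drawn by inverse transform sampling. The Python sketch below assumes the common forms $F(x) = \exp(-x^{-\alpha})$ for the Fréchet distribution and $1 - F(x) = (\beta / (\beta + x^{\tau}))^{\lambda}$ for the Burr distribution (whose tail index is then $\tau\lambda$); these parametrizations, the parameter values, and the function names are assumptions made for illustration, since the exact settings of the study are not reproduced here.

```python
import math
import random

def frechet_quantile(u, alpha):
    """Inverse of the assumed Frechet d.f. F(x) = exp(-x**(-alpha)), x > 0."""
    return (-math.log(u)) ** (-1.0 / alpha)

def burr_quantile(u, beta, tau, lam):
    """Inverse of the assumed Burr d.f. F(x) = 1 - (beta / (beta + x**tau))**lam."""
    return (beta * ((1.0 - u) ** (-1.0 / lam) - 1.0)) ** (1.0 / tau)

def draw(quantile, n, rng, **params):
    """Inverse transform sampling; uniforms equal to 0 are rejected."""
    out = []
    while len(out) < n:
        u = rng.random()          # uniform on [0, 1)
        if u > 0.0:
            out.append(quantile(u, **params))
    return out

if __name__ == "__main__":
    rng = random.Random(2012)     # arbitrary seed
    print(draw(frechet_quantile, 3, rng, alpha=1.0))
    print(draw(burr_quantile, 3, rng, beta=1.0, tau=2.0, lam=0.5))
```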

When the data are uncontaminated, most robust estimators are known to be less efficient at the true model than maximum likelihood estimators [1]. Nevertheless, the estimates seem to be fairly stable for intermediate values of $k$, which makes the choice of $k$ less troublesome. Even with respect to mean squared error, the newly proposed estimator does not seem to lose much accuracy, although a close look at Figures 2 and 3 shows a slight tendency to overestimation. Overall, the newly proposed DDE's perform about as well as the Hill estimator [1].
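The kind of summary described here, the mean of an estimator and its empirical mean squared error as functions of $k$, can be produced with a short Monte Carlo loop. The Python sketch below does this for a Hill-type estimate of $\alpha$ on uncontaminated strict Pareto samples; the sample size, number of replications, grid of $k$ values, and seed are arbitrary illustrative choices and not those of the study.

```python
import math
import random

def hill_tail_index(sample, k):
    """Hill-type estimate of alpha from the top k observations (needs k < n)."""
    xs = sorted(sample)
    u = xs[-k - 1]                                  # threshold X_{n-k,n}
    return k / sum(math.log(x / u) for x in xs[-k:])

def mc_mean_mse(alpha0, n, ks, reps, seed=0):
    """Monte Carlo mean and MSE of the estimate for each k, on strict Pareto data."""
    rng = random.Random(seed)
    estimates = {k: [] for k in ks}
    for _ in range(reps):
        sample = [rng.paretovariate(alpha0) for _ in range(n)]
        for k in ks:
            estimates[k].append(hill_tail_index(sample, k))
    summary = {}
    for k, vals in estimates.items():
        mean = sum(vals) / reps
        mse = sum((v - alpha0) ** 2 for v in vals) / reps
        summary[k] = (mean, mse)
    return summary

if __name__ == "__main__":
    for k, (mean, mse) in mc_mean_mse(2.0, 500, [50, 100, 200], reps=100).items():
        print(f"k = {k:3d}   mean = {mean:.3f}   MSE = {mse:.4f}")
```

Replacing `hill_tail_index` by another estimator, or replacing the sampler by a contaminated one, gives the corresponding curves for comparison.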

However, a slight contamination () is sufficient to make the DDE associated to the -divergence () more appealing in terms of low MSE; see Figures 4, 5, 6, and 7. Furthermore when contamination increases, the DDE performs remarkably better.

Overall, the simulation results in this section provide supporting evidence of the adequacy of the DDE associated with the -divergence with observations drawn from Fréchet and Burr distributions. Moreover, the sensitivity of this estimator for the choice of is low.


Appendix

This section is devoted to the proof of our result.

Proof of Proposition 3.1. For convenience, we recall the definition of the empirical measure associated with the random variables , , which is given by We define the estimator as the value which maximizes, independently of , the following estimating equation: or, equivalently, as the solution, in , of the following equation: In this view, the estimator may be written in the form of a functional , given by We continue by rewriting (A.4) for the contaminated distribution as given in the definition of the influence function defined in (3.1), that is, Observe that Keeping in mind the definition of as in (A.4), the last term in (A.7) disappears. We next evaluate the first term in the right side of (A.7). We infer readily by using Leibnitz's integral rule, that: In a similar way, we can therefore write, The proof of Proposition 3.1 is therefore completed.


Acknowledgment

The authors would like to thank the four editors for their helpful comments on the paper.


References

  1. B. M. Hill, “A simple general approach to inference about the tail of a distribution,” The Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.
  2. R. L. Smith, “Estimating tails of probability distributions,” The Annals of Statistics, vol. 15, no. 3, pp. 1174–1207, 1987.
  3. C. Alexander, “Assessment of operational risk capital,” in Risk Management: Challenge and Opportunity, M. Frenkel, U. Hommel, and M. Rudolf, Eds., 2nd edition, 2005.
  4. R. Dell'Aquila and P. Embrechts, “Extremes and robustness: a contradiction?” Financial Markets and Portfolio Management, vol. 20, no. 1, pp. 103–118, 2006.
  5. V. Brazauskas and R. Serfling, “Robust estimation of tail parameters for two-parameter and exponential models via generalized quantile statistics,” Extremes, vol. 3, pp. 231–249, 2000.
  6. V. Brazauskas and R. Serfling, “Robust and efficient estimation of the tail index of a single-parameter Pareto distribution,” The North American Actuarial Journal, vol. 4, pp. 12–27, 2000.
  7. D. J. Dupuis and C. A. Field, “Robust estimation of extremes,” The Canadian Journal of Statistics, vol. 26, no. 2, pp. 199–215, 1998.
  8. L. Peng and A. H. Welsh, “Robust estimation of the generalized Pareto distribution,” Extremes, vol. 4, no. 1, pp. 53–65, 2001.
  9. S. F. Juárez and W. R. Schucany, “Robust and efficient estimation for the generalized Pareto distribution,” Extremes, vol. 7, no. 3, pp. 237–251, 2004.
  10. B. Vandewalle, J. Beirlant, A. Christmann, and M. Hubert, “A robust estimator for the tail index of Pareto-type distributions,” Computational Statistics & Data Analysis, vol. 51, no. 12, pp. 6252–6268, 2007.
  11. M. Kim and S. Lee, “Estimation of a tail index based on minimum density power divergence,” Journal of Multivariate Analysis, vol. 99, no. 10, pp. 2453–2471, 2008.
  12. A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones, “Robust and efficient estimation by minimising a density power divergence,” Biometrika, vol. 85, no. 3, pp. 549–559, 1998.
  13. M. Hubert, G. Dierckx, and D. Vanpaemel, “Detecting influential data points for the Hill estimator in Pareto-type distributions,” Computational Statistics & Data Analysis. In press.
  14. N. Cressie and T. R. C. Read, “Multinomial goodness-of-fit tests,” Journal of the Royal Statistical Society B, vol. 46, no. 3, pp. 440–464, 1984.
  15. F. Liese and I. Vajda, Convex Statistical Distances, vol. 95 of Teubner Texts in Mathematics, BSB B. G. Teubner Verlagsgesellschaft, Leipzig, Germany, 1987.
  16. A. Rényi, “On measures of entropy and information,” in Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 547–561, University of California Press, Berkeley, Calif, USA, 1961.
  17. M. Broniatowski and A. Keziou, “Parametric estimation and tests through divergences and the duality technique,” Journal of Multivariate Analysis, vol. 100, no. 1, pp. 16–36, 2009.
  18. M. Broniatowski and A. Keziou, “Minimization of φ-divergences on sets of signed measures,” Studia Scientiarum Mathematicarum Hungarica, vol. 43, no. 4, pp. 403–442, 2006.
  19. A. Keziou, “Dual representation of φ-divergences and applications,” Comptes Rendus Mathématique, vol. 336, no. 10, pp. 857–862, 2003.
  20. F. Liese and I. Vajda, “On divergences and informations in statistics and information theory,” IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4394–4412, 2006.
  21. A. Keziou and S. Leoni-Aubin, “On empirical likelihood for semiparametric two-sample density ratio models,” Journal of Statistical Planning and Inference, vol. 138, no. 4, pp. 915–928, 2008.
  22. S. Bouzebda and M. Cherfi, “General bootstrap for dual φ-divergence estimates,” Journal of Probability and Statistics, vol. 2012, Article ID 834107, 33 pages, 2012.
  23. M. Cherfi, “Dual divergences estimation for censored survival data,” Journal of Statistical Planning and Inference, vol. 142, no. 7, pp. 1746–1756, 2012.
  24. S. Bouzebda and A. Keziou, “New estimates and tests of independence in semiparametric copula models,” Kybernetika, vol. 46, no. 1, pp. 178–201, 2010.
  25. S. Bouzebda and A. Keziou, “A new test procedure of independence in copula models via χ2-divergence,” Communications in Statistics, vol. 39, no. 1-2, pp. 1–20, 2010.
  26. M. Cherfi, “Dual ϕ-divergences estimation in normal models,” Computation. In press.
  27. D. M. Mason, “Laws of large numbers for sums of extreme values,” The Annals of Probability, vol. 10, no. 3, pp. 754–764, 1982.
  28. P. Hall, “On some simple estimates of an exponent of regular variation,” Journal of the Royal Statistical Society B, vol. 44, no. 1, pp. 37–42, 1982.
  29. S. Csörgő and D. M. Mason, “Central limit theorems for sums of extreme values,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 98, no. 3, pp. 547–558, 1985.
  30. E. Haeusler and J. L. Teugels, “On asymptotic normality of Hill's estimator for the exponent of regular variation,” The Annals of Statistics, vol. 13, no. 2, pp. 743–756, 1985.
  31. J. Beirlant and J. L. Teugels, “Asymptotics of Hill's estimator,” Teoriya Veroyatnosteĭ i ee Primeneniya, vol. 31, no. 3, pp. 530–536, 1986.
  32. S. Bouzebda, “Bootstrap de l'estimateur de Hill: théorèmes limites,” Annales de l'I.S.U.P., vol. 54, no. 1-2, pp. 61–72, 2010.
  33. F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1986.
  34. A. Toma and M. Broniatowski, “Dual divergence estimators and tests: robustness results,” Journal of Multivariate Analysis, vol. 102, no. 1, pp. 20–36, 2011.

Copyright © 2012 Salim Bouzebda and Mohamed Cherfi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
