Abstract

The main purpose of the present paper is to propose new estimators of the tail index using $\phi$-divergences and the duality technique. The robustness of these estimators is explored through the influence function approach, and their empirical performance is illustrated by simulation.

1. Introduction

In extreme value statistics, emphasis lies on the modelling of rare events, mostly events with a low frequency but a high impact. Common practice is to characterize the size and frequency of such extreme events mainly by the extreme value index, and the main problem is then to estimate this unknown parameter. Since only the upper tail of the distribution is involved, it is reasonable to construct estimators based on the top extreme values of a sample. The most commonly used estimator of this kind is the one proposed by Hill [1]. The most prominent estimators of this real-valued parameter are maximum likelihood estimators of specific parametric models fitted to excesses over large thresholds (see [2]). Alternatives to the Hill estimator are discussed by Smith [2]; one of his conclusions (see, e.g., [2], pp. 1181-1182) is that, in general, the Hill estimator compares favourably with its competitors. These maximum likelihood estimators often prove to be highly efficient, but they are not robust against deviations of the actual distribution from the assumed parametric model. This is, for instance, the case in the presence of outliers or suspicious data, where the performance of the maximum likelihood estimators and the quality of the corresponding estimates of the tail index are often seriously affected. Maximum likelihood estimation is known to be very sensitive to such deviations, in particular for the class of heavy-tailed distributions, where it can fail to provide reasonable parameter estimates; refer to Alexander [3] among others.

Robustness is an important issue in extreme value theory; see, for instance, Dell'Aquila and Embrechts [4]. As shown in Brazauskas and Serfling [5], small errors in the estimation of the tail index can already produce large errors in the estimation of quantities based on it. Hence, to overcome the lack of robustness to outliers of this kind of estimator, some robust methods for extreme values have been discussed in the recent literature. The interested reader may refer to Brazauskas and Serfling [6] for robust estimation in the context of strict Pareto distributions. Dupuis and Field [7] derived robust estimation methods for the case where the observations follow a generalized extreme value distribution, and Peng and Welsh [8] and Juárez and Schucany [9] for the case of a generalized Pareto distribution, light or heavy tailed. Vandewalle et al. [10] considered a robust estimation method based on the minimization of the integrated squared error criterion using an incomplete density mixture model. Kim and Lee [11] used the minimum density power divergence approach of Basu et al. [12] to estimate the tail index in the dependent case. More recently, Hubert et al. [13] proposed a method to detect outliers that can influence the Hill [1] estimator.

In this paper, we propose a new robust tail index estimation procedure, based on $\phi$-divergences and the duality technique, for the semi-parametric setting of Pareto-type (or heavy-tailed) distributions. Here, the strict Pareto distribution is assumed to hold only asymptotically, that is, for excess distributions over high enough threshold values. The proposed method extends the maximum likelihood procedure: as will be seen, maximum likelihood corresponds to a particular choice of the divergence, which leads to the Hill [1] estimator.

The remainder of this paper is organized as follows. Section 2 is devoted to preliminary results on $\phi$-divergences and to the introduction of our estimator. Section 3 presents our new results on the influence function. In Section 4, we investigate the finite-sample performance of the newly proposed estimators. To avoid interrupting the flow of the presentation, all technical arguments are deferred to the Appendix.

2. Extreme Value Statistics and $\phi$-Divergence Setting

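A widely used family of divergences is the so-called “power divergences”, introduced by Cressie and Read [14] (see also Liese and Vajda [15, Chapter 2] and the paper of Rényi [16]), which are defined through the class of convex real-valued functions, for $\gamma$ in $\mathbb{R} \setminus \{0, 1\}$:

$$x \in \mathbb{R}_+^* \mapsto \phi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}. \qquad (2.1)$$

We have $\phi_0(x) := -\log x + x - 1$ and $\phi_1(x) := x \log x - x + 1$. (For all $\gamma \in \mathbb{R}$, we define $\phi_\gamma(0) := \lim_{x \downarrow 0} \phi_\gamma(x)$.) So, the $KL$-divergence is associated to $\phi_1$, the $KL_m$ to $\phi_0$, the $\chi^2$ to $\phi_2$, the $\chi^2_m$ to $\phi_{-1}$, and the Hellinger distance to $\phi_{1/2}$. In the monograph by Liese and Vajda [15], the reader may find detailed ingredients of the modeling theory as well as surveys of the commonly used divergences.

We recall some basic definitions for the reader's convenience. Throughout, $\{P_\theta : \theta \in \Theta\}$ denotes a parametric model with densities $p_\theta$, and $\theta_0$ denotes the true value of the parameter. Unless otherwise specified, we will assume that $\phi$ is a function of class $C^2$, strictly convex, such that, for fixed $\theta$,

$$\int \left| \phi'\!\left( \frac{p_\theta(x)}{p_\alpha(x)} \right) \right| \, dP_\theta(x) < \infty, \quad \text{for all } \alpha \in \Theta. \qquad (2.2)$$

According to Broniatowski and Keziou [17], if the function $\phi$ satisfies the following condition: there exists $0 < \delta < 1$ such that, for all $c$ in $[1 - \delta, 1 + \delta]$, we can find numbers $c_1$, $c_2$, $c_3$ such that

$$\phi(cx) \le c_1 \phi(x) + c_2 |x| + c_3, \quad \text{for all real } x, \qquad (2.3)$$

then assumption (2.2) is satisfied whenever $D_\phi(\theta, \alpha) < \infty$, where $D_\phi(\theta, \alpha)$ stands for the $\phi$-divergence between $P_\theta$ and $P_\alpha$; we refer the reader to Broniatowski and Keziou [18, Lemma 3.2]. Also, the real convex functions (2.1), associated with the class of power divergences, all satisfy condition (2.3), including all standard divergences.

Under assumption (2.2), using the Fenchel duality technique, the divergence $D_\phi(\theta, \theta_0)$ can be represented as resulting from an optimization procedure,

$$D_\phi(\theta, \theta_0) = \sup_{\alpha \in \Theta} \int h(\theta, \alpha) \, dP_{\theta_0}, \qquad (2.4)$$

where $h(\theta, \alpha) : x \mapsto h(\theta, \alpha, x)$ and

$$h(\theta, \alpha, x) := \int \phi'\!\left( \frac{p_\theta}{p_\alpha} \right) dP_\theta - \left[ \frac{p_\theta(x)}{p_\alpha(x)} \, \phi'\!\left( \frac{p_\theta(x)}{p_\alpha(x)} \right) - \phi\!\left( \frac{p_\theta(x)}{p_\alpha(x)} \right) \right]. \qquad (2.5)$$

This result was elegantly proven in Keziou [19], Liese and Vajda [20], and Broniatowski and Keziou [17]. Broniatowski and Keziou [18] called it the dual form of a divergence, due to its connection with convex analysis. Furthermore, the supremum in display (2.4) is unique and reached at $\alpha = \theta_0$, independently of the value of $\theta$.

Let $X_1, \ldots, X_n$ be an independent, identically distributed (i.i.d.) sample with unknown distribution $P_{\theta_0}$, and let $P_n$ denote the associated empirical measure. Naturally, a class of estimators of $\theta_0$, called “dual $\phi$-divergence estimators” (DDEs), is defined by

$$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha \in \Theta} \int h(\theta, \alpha) \, dP_n, \quad \theta \in \Theta, \qquad (2.6)$$

where $h(\theta, \alpha)$ is the function defined in (2.5). The class of estimators $\widehat{\alpha}_\phi(\theta)$ satisfies

$$\int \frac{\partial}{\partial \alpha} h\big(\theta, \widehat{\alpha}_\phi(\theta)\big) \, dP_n = 0. \qquad (2.7)$$

Formula (2.6) defines a family of $M$-estimators indexed by the function $\phi$ specifying the divergence and by some instrumental value of the parameter $\theta$.

Applications of the dual representation of $\phi$-divergences have been considered by many authors. We cite, among others, Keziou and Leoni-Aubin [21] for semiparametric two-sample density ratio models; Bouzebda and Cherfi [22] for bootstrapped $\phi$-divergence estimates; Cherfi [23] for the extension of dual $\phi$-divergence estimators to right-censored data; and Bouzebda and Keziou [24, 25], and the references therein, for estimation and tests in copula models. The performance of dual $\phi$-divergence estimators for normal models is studied in Cherfi [26].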
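As a small numerical companion to the power divergence family, the following Python sketch evaluates the generator $\phi_\gamma$ and its two limiting cases. It assumes the standard Cressie-Read parametrization, $\phi_\gamma(x) = (x^\gamma - \gamma x + \gamma - 1)/(\gamma(\gamma - 1))$, with $\phi_0$ and $\phi_1$ obtained by continuity. The names `phi_gamma` and `NAMED` are illustrative and do not come from the paper.

```python
import math


def phi_gamma(gamma: float, x: float) -> float:
    """Generator of the power divergence of index gamma, evaluated at x > 0."""
    if x <= 0.0:
        raise ValueError("x must be strictly positive")
    if gamma == 0.0:  # modified Kullback-Leibler (KL_m)
        return -math.log(x) + x - 1.0
    if gamma == 1.0:  # Kullback-Leibler (KL)
        return x * math.log(x) - x + 1.0
    return (x**gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))


# Index gamma of the named members of the family.
NAMED = {"KL_m": 0.0, "Hellinger": 0.5, "KL": 1.0, "chi2": 2.0, "chi2_m": -1.0}

for name, g in NAMED.items():
    print(name, [round(phi_gamma(g, x), 4) for x in (0.5, 1.0, 2.0)])
```

Every generator vanishes at $x = 1$, which is why the corresponding divergence is zero when the two distributions coincide.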

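In what follows, we describe the procedure used to obtain the DDE for the tail index. Let $X_1, \ldots, X_n$ be an independent, identically distributed (i.i.d.) sample from an unknown distribution function (d.f.) $F$, and let $X_{1,n} \le \cdots \le X_{n,n}$ denote the corresponding order statistics. Since it is well known that a distribution is in the domain of attraction of a Fréchet distribution if and only if it has a regularly varying tail, we can assume that $1 - F$ is regularly varying at $\infty$ with exponent $-\alpha$, where $\alpha > 0$ is called the tail index of the distribution $F$:

$$\lim_{t \to \infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-\alpha}, \quad x > 0,$$

or equivalently,

$$1 - F(x) = x^{-\alpha} L(x), \quad x > 0,$$

where $L$ is slowly varying at $\infty$, namely,

$$\lim_{t \to \infty} \frac{L(tx)}{L(t)} = 1, \quad x > 0.$$

In the Pareto-type case, the conditional distribution of the relative excesses over a threshold $u$ satisfies

$$\mathbb{P}\left( \frac{X}{u} > y \,\Big|\, X > u \right) = \frac{1 - F(uy)}{1 - F(u)} = y^{-\alpha} \, \frac{L(uy)}{L(u)}, \quad y > 1,$$

and it is easily seen that, ultimately, as $u \to \infty$,

$$\mathbb{P}\left( \frac{X}{u} > y \,\Big|\, X > u \right) \to y^{-\alpha}, \quad y > 1.$$

A popular choice for the threshold $u$ in threshold-based methods is $X_{n-k,n}$, the $(k+1)$th largest observation of the sample, where $k = k_n$ is the integer part of a suitable function of $n$. Here and elsewhere, $\lfloor u \rfloor$ denotes the largest integer $\le u$. The quantile function pertaining to $F$ is defined, for $0 < t < 1$, by

$$Q(t) := \inf\{x : F(x) \ge t\}.$$

The empirical quantile function is given, for each $n \ge 1$ and $0 < t \le 1$, by

$$Q_n(t) := X_{i,n} \quad \text{for } \frac{i-1}{n} < t \le \frac{i}{n}, \quad i = 1, \ldots, n.$$

The threshold $X_{n-k,n}$ is easily seen to be equal to $Q_n(1 - k/n)$.

The idea of constructing the DDE for the tail index is to assume that the above Pareto approximation holds exactly as a model for the conditional distribution of the relative excesses

$$Y_j := \frac{X_{n-j+1,n}}{X_{n-k,n}}, \quad j = 1, \ldots, k.$$

That is, we can fit a Pareto model to the relative excesses. In this framework, the estimation of $\alpha$ can be handled through dual divergence techniques, which provide a wide range of estimators, including the Hill estimator, all of which can be compared with respect to their robustness properties. Consider the Pareto density

$$p_\alpha(y) := \alpha \, y^{-(\alpha + 1)}, \quad y > 1.$$

Specializing (2.4) to this setting, elementary calculation, for $\gamma$ in $\mathbb{R} \setminus \{0, 1\}$, gives

$$\int \phi_\gamma'\!\left( \frac{p_\theta}{p_\alpha} \right) dP_\theta = \frac{1}{\gamma - 1} \int \left( \frac{p_\theta}{p_\alpha} \right)^{\gamma - 1} dP_\theta - \frac{1}{\gamma - 1}.$$

Using this last equality, one finds

$$\widehat{\alpha}_{\phi_\gamma}(\theta) = \arg\sup_{\alpha} \left\{ \frac{1}{\gamma - 1} \int \left( \frac{p_\theta}{p_\alpha} \right)^{\gamma - 1} dP_\theta - \frac{1}{\gamma k} \sum_{j=1}^{k} \left( \frac{p_\theta(Y_j)}{p_\alpha(Y_j)} \right)^{\gamma} - \frac{1}{\gamma(\gamma - 1)} \right\}.$$

We now consider an interesting particular case of the previous setup. For $\gamma = 0$, one obtains

$$\widehat{\alpha}_{KL_m}(\theta) = \arg\sup_{\alpha} \left\{ -\frac{1}{k} \sum_{j=1}^{k} \log \frac{p_\theta(Y_j)}{p_\alpha(Y_j)} \right\},$$

which leads to the famous Hill estimator [1], given by

$$\widehat{\alpha}_{k,n} := \left( \frac{1}{k} \sum_{j=1}^{k} \log X_{n-j+1,n} - \log X_{n-k,n} \right)^{-1},$$

independently of $\theta$.

Mason [27] showed the consistency of the Hill estimator whenever $k = k_n$ is a sequence of positive integers satisfying

$$1 \le k_n < n, \quad k_n \to \infty, \quad \frac{k_n}{n} \to 0 \quad \text{as } n \to \infty. \qquad (2.22)$$

Further investigations concerning the asymptotic distribution of the Hill estimator have been made by Hall [28], Csörgő and Mason [29], Haeusler and Teugels [30], Beirlant and Teugels [31], and Bouzebda [32]. The asymptotic distribution is obtained under certain additional regularity conditions on $F$ and on $k_n$ satisfying (2.22).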
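The Hill estimator discussed above can be sketched in a few lines of Python. The sketch assumes the parametrization in which the tail index $\alpha$ is the exponent of a tail behaving like $x^{-\alpha}$, so that the estimate is the reciprocal of the mean log relative excess over the threshold $X_{n-k,n}$. The function name `hill_estimator`, the value `alpha_true`, the sample size, and the values of `k` are illustrative choices, not taken from the paper.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    xs = sorted(sample)              # xs[i - 1] is the order statistic X_{i,n}
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]        # X_{n-k,n}, the (k+1)th largest observation
    if threshold <= 0.0:
        raise ValueError("the threshold must be positive")
    # Mean log relative excess of the top k observations over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in xs[n - k:]) / k
    return 1.0 / mean_log_excess


rng = random.Random(0)
alpha_true = 2.0                     # hypothetical value, for illustration only
sample = [rng.paretovariate(alpha_true) for _ in range(5000)]
for k in (100, 500, 2000):
    print(k, round(hill_estimator(sample, k), 3))
```

For a strict Pareto sample the relative excesses over any order statistic are again Pareto distributed, so the estimates should stay close to `alpha_true` across `k`; for Pareto-type data the choice of `k` trades bias against variance.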

3. Influence Function

In this section, we study the robustness properties of the proposed estimators theoretically. In particular, we derive their influence functions, from which the asymptotic variance follows. The following definition is needed for the statement of our forthcoming result. Recall that the influence function of a functional $T$ at a distribution $F$ describes the effect on the estimate of an infinitesimal contamination of $F$ at the point $x$ and is given by

$$\mathrm{IF}(x; T, F) := \lim_{\varepsilon \downarrow 0} \frac{T(F_{\varepsilon, x}) - T(F)}{\varepsilon}, \qquad (3.1)$$

where $F_{\varepsilon, x} := (1 - \varepsilon) F + \varepsilon \delta_x$, $\delta_x$ is the Dirac measure putting all its mass at $x$, and $0 < \varepsilon < 1$. In the following, we derive the influence function for the functional form of the newly proposed estimator in the same way as for classical $M$-estimators [33]. General results on influence functions of the dual $\phi$-divergence estimators can be found in Toma and Broniatowski [34]. In the notation used below, $\mathbb{1}_A$ stands for the indicator function of the event $A$. Our results are summarized in the following proposition; its proof is given in the Appendix.

Proposition 3.1. The influence function of the functional corresponding to an estimator is given by

To illustrate the behavior of the obtained influence functions, we restrict ourselves to the strict Pareto case; simulation results for other heavy-tailed distributions are presented in the next section. Figure 1 plots the influence functions of our estimators for the Pareto distribution. Observe that, for the Hellinger distance ($\gamma = 1/2$) and the -divergence (), the influence for the tail index becomes negligible for large outliers. These influence functions are bounded, making the associated functionals robust, in contrast with the Hill estimator ($\gamma = 0$) and the other divergences.
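To make the lack of robustness of the maximum likelihood case concrete, consider the simplest possible setting: a strict Pareto model with density $\alpha x^{-(\alpha+1)}$ on $x > 1$ and known threshold 1, for which the maximum likelihood functional is $T(F) = 1 / E_F[\log X]$. This is a simplifying assumption made here for illustration; it is not the thresholded functional of Proposition 3.1. Applying definition (3.1) directly gives $\mathrm{IF}(x; T, P_\alpha) = \alpha (1 - \alpha \log x)$, which is unbounded as $x \to \infty$. The Python sketch below checks this closed form against a finite-difference approximation; all function names are hypothetical.

```python
import math


def mle_functional_contaminated(alpha, x, eps):
    """T((1 - eps) * P_alpha + eps * delta_x) for T(F) = 1 / E_F[log X]."""
    # Under the strict Pareto law P_alpha on (1, inf), E[log X] = 1 / alpha.
    mean_log = (1.0 - eps) / alpha + eps * math.log(x)
    return 1.0 / mean_log


def influence_numeric(alpha, x, eps=1e-7):
    """Finite-difference approximation of the influence function at x."""
    return (mle_functional_contaminated(alpha, x, eps) - alpha) / eps


def influence_closed_form(alpha, x):
    """Closed form alpha * (1 - alpha * log x), obtained by differentiating in eps."""
    return alpha * (1.0 - alpha * math.log(x))


alpha = 2.0  # hypothetical tail index, for illustration only
for x in (1.5, 10.0, 1e3, 1e6):
    print(x, influence_numeric(alpha, x), influence_closed_form(alpha, x))
```

The influence decreases like $-\alpha^2 \log x$, so a single sufficiently large observation can move the estimate arbitrarily far in the linearized sense.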

4. Simulation

In order to illustrate the robustness of the proposed statistical method, its finite-sample behavior is investigated on both contaminated and uncontaminated data. For the tail index, a comparison is made between the well-known Hill [1] estimator and the newly proposed estimator.

We consider simulated samples without contamination, each containing $n$ observations, from the following two Pareto-type distributions.
(i) The Fréchet distribution, given by $F(x) = \exp(-x^{-\alpha})$, $x > 0$.
(ii) The Burr distribution, given by $1 - F(x) = \left( \beta / (\beta + x^{\tau}) \right)^{\lambda}$, $x > 0$.
In the simulations, we have chosen , for the Burr distribution , , . The means of the DDE (left) and the corresponding empirical mean squared errors (right), both as functions of $k$, are plotted. The horizontal line indicates the true value of the tail index.
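Samples from the two distributions can be generated by inverse transform sampling, as in the following Python sketch. It assumes the parametrizations $F(x) = \exp(-x^{-\alpha})$ for the Fréchet distribution and $1 - F(x) = (\beta/(\beta + x^\tau))^\lambda$ for the Burr distribution, whose tail index is then $\lambda\tau$. The parameter values and the sample size below are hypothetical placeholders, not the settings used in the paper.

```python
import math
import random


def _open_uniform(rng):
    """Uniform draw restricted to the open interval (0, 1)."""
    u = rng.random()
    while u <= 0.0:
        u = rng.random()
    return u


def frechet_cdf(x, alpha):
    return math.exp(-x ** (-alpha))


def frechet_quantile(u, alpha):
    """Solve exp(-x^(-alpha)) = u for x, with 0 < u < 1."""
    return (-math.log(u)) ** (-1.0 / alpha)


def burr_cdf(x, beta, tau, lam):
    return 1.0 - (beta / (beta + x ** tau)) ** lam


def burr_quantile(u, beta, tau, lam):
    """Solve 1 - (beta / (beta + x^tau))^lam = u for x, with 0 < u < 1."""
    return (beta * ((1.0 - u) ** (-1.0 / lam) - 1.0)) ** (1.0 / tau)


rng = random.Random(42)
n = 1000                              # hypothetical sample size
alpha = 1.0                           # hypothetical Frechet parameter
beta, tau, lam = 1.0, 2.0, 0.5        # hypothetical Burr parameters, tail index lam * tau = 1
frechet_sample = [frechet_quantile(_open_uniform(rng), alpha) for _ in range(n)]
burr_sample = [burr_quantile(_open_uniform(rng), beta, tau, lam) for _ in range(n)]
print(max(frechet_sample), max(burr_sample))
```

Both distributions are of Pareto type but not strict Pareto, so tail index estimates computed from them are biased for large `k`; this is what makes the choice of `k` matter in such experiments.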

When the data are uncontaminated, although most robust estimators are known to be less efficient at the true model than maximum likelihood estimators [1], we notice that the estimates are fairly stable for intermediate values of $k$, which makes the choice of $k$ less troublesome. Even with respect to mean squared error, the newly proposed estimator does not seem to lose much accuracy, although a close look at Figures 2 and 3 shows a slight tendency to overestimation. Overall, the newly proposed DDEs perform nearly as well as the Hill [1] estimator.

However, a slight contamination () is sufficient to make the DDE associated to the -divergence () more appealing in terms of low MSE; see Figures 4, 5, 6, and 7. Furthermore, as the contamination increases, the DDE performs markedly better.
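The effect of contamination can be reproduced in miniature with the following Python sketch, which maximizes the dual criterion for a power divergence over a grid and compares the result with the maximum likelihood estimate. Everything specific here is an assumption made for illustration: the data are strict Pareto with density $\alpha y^{-(\alpha+1)}$ on $y > 1$ rather than relative excesses of a Pareto-type sample, the divergence is the Hellinger case $\gamma = 1/2$, the instrumental value is $\theta = 3$, and the contamination replaces 2% of the observations by the value $10^6$. None of these settings is taken from the paper.

```python
import math
import random


def dual_criterion(alpha, theta, gamma, logs):
    """Empirical dual criterion for strict Pareto densities and index gamma."""
    denom = gamma * theta + (1.0 - gamma) * alpha
    if alpha <= 0.0 or denom <= 0.0:
        return -math.inf             # the integral term diverges
    # Closed form of the integral of (p_theta / p_alpha)^(gamma - 1) dP_theta.
    integral = theta ** gamma * alpha ** (1.0 - gamma) / denom
    # (p_theta / p_alpha)^gamma at y equals (theta / alpha)^gamma * y^(gamma * (alpha - theta)).
    ratio = (theta / alpha) ** gamma
    empirical = sum(ratio * math.exp(gamma * (alpha - theta) * ly) for ly in logs) / len(logs)
    return integral / (gamma - 1.0) - empirical / gamma - 1.0 / (gamma * (gamma - 1.0))


def dde_pareto(sample, theta, gamma, grid):
    """Grid maximizer of the dual criterion (gamma must differ from 0 and 1)."""
    if gamma in (0.0, 1.0):
        raise ValueError("this sketch covers gamma outside {0, 1} only")
    logs = [math.log(y) for y in sample]
    return max(grid, key=lambda a: dual_criterion(a, theta, gamma, logs))


def mle_pareto(sample):
    """Maximum likelihood estimate of alpha (the case gamma = 0)."""
    return len(sample) / sum(math.log(y) for y in sample)


rng = random.Random(1)
alpha_true, theta, gamma = 2.0, 3.0, 0.5          # hypothetical settings
clean = [rng.paretovariate(alpha_true) for _ in range(1000)]
contaminated = [1e6] * 20 + clean[20:]            # hypothetical 2% gross outliers
grid = [0.2 + 0.01 * i for i in range(581)]       # candidate values in [0.2, 6.0]

dde_clean = dde_pareto(clean, theta, gamma, grid)
dde_cont = dde_pareto(contaminated, theta, gamma, grid)
mle_clean, mle_cont = mle_pareto(clean), mle_pareto(contaminated)
print("clean:        DDE", round(dde_clean, 2), "MLE", round(mle_clean, 2))
print("contaminated: DDE", round(dde_cont, 2), "MLE", round(mle_cont, 2))
```

With these settings each observation enters the criterion through a weight proportional to $y^{\gamma(\alpha - \theta)}$, which decays for large $y$ when $\theta$ exceeds $\alpha$; this is what limits the pull of the outliers, whereas the maximum likelihood estimate is dragged down by their logarithms. A grid search is used only for transparency; any one-dimensional optimizer would do.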

Overall, the simulation results in this section provide supporting evidence of the adequacy of the DDE associated with the -divergence for observations drawn from Fréchet and Burr distributions. Moreover, the sensitivity of this estimator to the choice of $k$ is low.

Appendix

This section is devoted to the proof of our result.

Proof of Proposition 3.1. For convenience, we recall the definition of the empirical measure associated with the random variables , , which is given by We define the estimator as the value which maximizes, independently of , the following criterion: or, equivalently, as the solution, in , of the following estimating equation: In this view, the estimator may be written in the form of a functional , given by We continue by rewriting (A.4) for the contaminated distribution as given in the definition of the influence function in (3.1), that is, Observe that Keeping in mind the definition of as in (A.4), the last term in (A.7) vanishes. We next evaluate the first term on the right-hand side of (A.7). Using Leibniz's integral rule, we readily infer that In a similar way, we can write The proof of Proposition 3.1 is therefore complete.

Acknowledgments

The authors would like to thank the four editors for their helpful comments on the paper.