Research Article | Open Access
An Analysis of a Heuristic Procedure to Evaluate Tail (in)dependence
Measuring tail dependence is an important issue in many applied sciences in order to quantify the risk of simultaneous extreme events. A usual measure is given by the tail dependence coefficient. The characteristics of events behave quite differently as these become more extreme, depending on whether we are in the class of asymptotic dependence or in the class of asymptotic independence. The literature has emphasized the asymptotic dependent class, but wrongly inferring tail dependence will result in the overestimation of extreme value dependence and consequently of the risk. In this paper we analyze this issue through simulation based on a heuristic procedure.
1. Introduction
The degree of association between concurrent rainfall extremes at different locations may lead to a better understanding of extreme rainfall events, a very important matter due to their severe impacts on the economy and the environment. Globalization and an absence of market regulation have increased the dependence between financial asset returns and thus the risk of simultaneous crashes. Pearson’s correlation is not an appropriate measure of dependence whenever extreme realizations are important. It gives the same weight to extreme values as to all other observations, whereas the dependence characteristics for extreme realizations may differ from those of the rest of the sample. For more details see, for example, Embrechts et al. [1]. The most used measure of tail dependence is the so-called tail dependence coefficient (TDC), a concept introduced by Sibuya [2], which is defined as follows:
$$\lambda = \lim_{t \downarrow 0} P\left(F_X(X) > 1 - t \mid F_Y(Y) > 1 - t\right), \tag{1}$$
where $F_X$ and $F_Y$ are the distribution functions (d.f.’s) of the random variables (r.v.’s) $X$ and $Y$, respectively, which are considered continuous. Observe that the TDC can also be formulated through the copula function introduced by Sklar [3]. A copula function $C$ is a d.f. whose marginals are standard uniform; that is, if $C$ is the copula function of $(X, Y)$, having joint d.f. $F$, then
$$F(x, y) = C\left(F_X(x), F_Y(y)\right), \tag{2}$$
and thus
$$\lambda = 2 - \lim_{t \downarrow 0} \frac{1 - C(1 - t, 1 - t)}{t}. \tag{3}$$
If $\lambda > 0$, the r.v.’s $X$ and $Y$ are said to be tail dependent, with the degree of dependence measured through $\lambda$ ($\lambda = 1$ means total dependence in the tail). The case $\lambda = 0$ corresponds to asymptotic independence in the tail. However, as noticed in Ledford and Tawn [4, 5], a residual tail dependence may occur, captured through the rate of convergence of $P(F_X(X) > 1 - t \mid F_Y(Y) > 1 - t)$ towards zero, as $t \downarrow 0$. More precisely, consider
$$P\left(F_X(X) > 1 - t, F_Y(Y) > 1 - t\right) = t^{1/\eta} L(t), \quad \eta \in (0, 1], \tag{4}$$
where $L$ is a slowly varying function at $0$, that is, $L(tx)/L(t) \to 1$, as $t \downarrow 0$, for all $x > 0$. The parameter $\eta$, known as the Ledford and Tawn coefficient, measures the residual tail dependence, and the function $L$ measures the relative strength of dependence given a particular value of $\eta$. Observe that $\eta = 1$ and $L(t)$ converging to some positive constant corresponds to tail dependence ($\lambda > 0$), whilst $\eta < 1$ means tail independence.
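If $\eta = 1/2$, we have (almost) perfect independence (perfect if $L(t) = 1$), and for $\eta < 1/2$ or $\eta > 1/2$ we have association, respectively, negative (i.e., joint exceedances of a high threshold are less likely than under independence) or positive (i.e., joint exceedances are more likely than under independence).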
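As a quick check of these statements, two textbook cases can be worked out directly. The sketch below assumes that relation (4) has the usual Ledford and Tawn form $P(U > 1 - t, V > 1 - t) = t^{1/\eta} L(t)$, and writes $U = F_X(X)$ and $V = F_Y(Y)$ (notation introduced here only for the illustration).

```latex
% Independent margins: U and V are independent standard uniforms.
P(U > 1-t,\ V > 1-t) = t \cdot t = t^{2} = t^{1/(1/2)}
\quad\Longrightarrow\quad
\eta = \tfrac{1}{2},\quad L(t) \equiv 1,\quad
\lambda = \lim_{t \downarrow 0} \frac{t^{2}}{t} = 0 .

% Comonotone margins: U = V almost surely.
P(U > 1-t,\ V > 1-t) = P(U > 1-t) = t = t^{1/1}
\quad\Longrightarrow\quad
\eta = 1,\quad L(t) \equiv 1,\quad
\lambda = \lim_{t \downarrow 0} \frac{t}{t} = 1 .
```

The first case is the exact-independence benchmark and the second the total-dependence one; residual dependence under asymptotic independence sits strictly between them.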
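Relation (4) also means that the function $t \mapsto P(1 - F_X(X) < t, 1 - F_Y(Y) < t)$ is regularly varying (of first order) at $0$ with index $1/\eta$. In Draisma et al. [6], a refinement of this relation is considered under a second order regular variation condition. More precisely, it is assumed that the limit
$$\lim_{t \downarrow 0} \frac{P\left(1 - F_X(X) < tx, 1 - F_Y(Y) < ty\right)/q(t) - c(x, y)}{q_1(t)} =: c_1(x, y) \tag{5}$$
exists, for all $x, y \geq 0$ and $x + y > 0$, with $q_1(t) \to 0$, as $t \downarrow 0$, $q$ being a regularly varying function of index $1/\eta$ and $c_1$ a nonconstant function that is not a multiple of $c$. It is also assumed that the convergence is uniform in $\{(x, y) \in [0, \infty)^2 : x^2 + y^2 = 1\}$, that $l := \lim_{t \downarrow 0} q(t)/t$ exists, and, without loss of generality, that $c(1, 1) = 1$. In addition, the function $c$ is homogeneous of order $1/\eta$; that is, $c(tx, ty) = t^{1/\eta} c(x, y)$.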
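Now observe that
$$P\left(1 - F_X(X) < t, 1 - F_Y(Y) < t\right) = P(T > 1/t), \tag{7}$$
with
$$T = \min\left(\frac{1}{1 - F_X(X)}, \frac{1}{1 - F_Y(Y)}\right), \tag{8}$$
and hence, by (4), we can write $P(T > t) = t^{-1/\eta} L(1/t)$. Therefore, $\eta$ corresponds to the tail index of $T$ and can thus be estimated as such. The second order regular variation condition in (5) allows the asymptotics of the estimator to be derived (Draisma et al. [6, Theorem 2.1]). This will be addressed in Section 2.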
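An alternative measure for the residual tail dependence, here denoted $\bar{\chi}$, was introduced in Coles et al. [7]. By considering (4) and applying logarithms to both members, we derive
$$\eta = \lim_{t \downarrow 0} \frac{\log t}{\log P\left(F_X(X) > 1 - t, F_Y(Y) > 1 - t\right)}$$
or
$$\bar{\chi} = 2\eta - 1 = \lim_{t \downarrow 0} \frac{2 \log t}{\log C^{s}(t, t)} - 1, \tag{11}$$
with $\bar{\chi} \in (-1, 1]$ and $C^{s}$ corresponding to the survival copula; that is, $C^{s}(u, v) = P(F_X(X) > 1 - u, F_Y(Y) > 1 - v)$. Observe that $\bar{\chi} < 1$ means tail independence ($\lambda = 0$) and, if $\bar{\chi} = 1$, we have tail dependence ($\lambda > 0$). We also have positive and negative associations whenever $\bar{\chi} > 0$ and $\bar{\chi} < 0$, respectively, with $\bar{\chi} = 0$ corresponding to (almost) exact independence. Estimators for $\lambda$ and $\bar{\chi}$ based on the expressions (3) and (11), respectively, will also be presented in Section 2.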
The behavior of events within the class of asymptotic dependence is quite different from the one detected in the class of asymptotic independence. Both forms allow dependence between moderately large values of each variable, but only when the variables exhibit tail dependence can the very largest values of each variable occur together. If we wrongly infer tail dependence, we will overestimate the dependence between the high values and consequently the risk. This overestimation is related to the degree of residual dependence, which is measured through $\eta$ or $\bar{\chi}$. Therefore, it is important to assess whether a data set presents tail dependence or independence and to quantify the degree of dependence for the appropriate dependence class. This can be done through the estimation of $\lambda$ and of $\eta$ (or $\bar{\chi}$) together with tests for tail independence. These topics can be found in many references, such as Huang [8], Joe [9], Coles et al. [7], Frahm et al. [10], and Schmidt and Stadtmüller [11] for the TDC estimation, Ledford and Tawn [4, 5] and Peng [12] concerning the estimation of $\eta$, and Coles et al. [7] for the $\bar{\chi}$ estimation. Tail independence tests can be found in, for example, Poon et al. [13] and Draisma et al. [6].
Most nonparametric estimation of extremal parameters requires the choice of the number $k$ of upper order statistics to be used. A paradigmatic example is the univariate tail index estimation of regularly varying distributions (for a survey, see Beirlant et al. [14] and references therein). A similar problem exists for tail dependence estimation. In practice, we have to deal with a trade-off between variance and bias, since small values of $k$ correspond to larger variance whilst large values of $k$ increase the bias of the estimators. Figure 1 illustrates this issue. Observe that the true value (horizontal line) can be inferred from a kind of first stability region within the sample path of the estimators. In order to overcome this problem, Frahm et al. [10] developed a heuristic procedure, where $k$ is estimated based on a simple plateau-finding algorithm after smoothing the latter plot by some box kernel. They proposed some values for the bandwidth, but no study was carried out in order to evaluate possible choices. In this paper we address this issue through a simulation study, by applying the heuristic procedure to nonparametric estimators of the TDC. In addition, we also analyze the performance of the procedure when applied to the estimation of $\eta$ and $\bar{\chi}$, as well as within the context of the referred tail independence tests (Section 4). An illustration with financial data is presented in Section 5. We end with some final remarks (Section 6).
2. Inference on the Extremal (in)dependence
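Consider $(X_1, Y_1), \ldots, (X_n, Y_n)$ independent and identically distributed (i.i.d.) copies of the random pair $(X, Y)$. From (3), it is possible to deduce the estimator $\hat{\lambda}^{\mathrm{SEC}}$:
$$\hat{\lambda}^{\mathrm{SEC}} = 2 - \frac{1 - \hat{C}\left(1 - k/n, 1 - k/n\right)}{k/n}, \quad 1 \leq k < n. \tag{13}$$
By using $\log(1 - t) \sim -t$, with $t \approx 0$, the estimator $\hat{\lambda}^{\mathrm{LOG}}$ can be derived:
$$\hat{\lambda}^{\mathrm{LOG}} = 2 - \frac{\log \hat{C}\left(1 - k/n, 1 - k/n\right)}{\log\left(1 - k/n\right)}, \quad 1 \leq k < n, \tag{14}$$
where $\hat{C}$ denotes the empirical copula given by
$$\hat{C}(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{\hat{F}_X(X_i) \leq u,\ \hat{F}_Y(Y_i) \leq v\}}, \tag{15}$$
with $\mathbf{1}$ denoting the indicator function and $\hat{F}_X$, $\hat{F}_Y$ the marginal empirical d.f.’s of $X$ and $Y$, respectively. For more accurate estimates, it is considered
$$\hat{F}_X(u) = \frac{1}{n + 1} \sum_{i=1}^{n} \mathbf{1}_{\{X_i \leq u\}}, \qquad \hat{F}_Y(u) = \frac{1}{n + 1} \sum_{i=1}^{n} \mathbf{1}_{\{Y_i \leq u\}}. \tag{16}$$
See Beirlant et al. [16, Section ] for more details. Note that both estimators depend on the parameter $k$, the number of upper order statistics involved in the estimation. The choice of $k$ is the major difficulty with these estimators because of the compromise between variance and bias explained in the introduction. To ensure properties such as asymptotic normality and consistency, it is necessary to assume that $k$ is an intermediate sequence, that is, $k \equiv k_n \to \infty$ and $k_n/n \to 0$, as $n \to \infty$.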
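The two TDC estimators can be sketched in a few lines of plain Python. The sketch assumes the forms $2 - (1 - \hat{C}(t, t))/(1 - t)$ for (13) and $2 - \log \hat{C}(t, t)/\log t$ for (14), with $t = (n - k)/n$, and marginal empirical d.f.’s rescaled by $n + 1$; the function names are illustrative and not taken from the cited works.

```python
import math


def pseudo_observations(values):
    """Ranks divided by n + 1, i.e. the rescaled empirical d.f. at each data point."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1) for r in ranks]


def empirical_copula(u, v, a, b):
    """Fraction of pseudo-observations with u_i <= a and v_i <= b."""
    return sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / len(u)


def tdc_sec(x, y, k):
    """Secant-type TDC estimate based on k upper order statistics (1 <= k < n)."""
    n = len(x)
    u, v = pseudo_observations(x), pseudo_observations(y)
    t = (n - k) / n
    return 2.0 - (1.0 - empirical_copula(u, v, t, t)) / (1.0 - t)


def tdc_log(x, y, k):
    """Log-type TDC estimate; requires a positive empirical copula value at (t, t)."""
    n = len(x)
    u, v = pseudo_observations(x), pseudo_observations(y)
    t = (n - k) / n
    c = empirical_copula(u, v, t, t)
    if c <= 0.0:
        raise ValueError("empirical copula is zero at this threshold; choose a smaller k")
    return 2.0 - math.log(c) / math.log(t)
```

Evaluating either function for $k = 1, \ldots, n - 1$ gives the sample path whose stability region is discussed in the introduction. For a comonotone sample (`y` equal to `x`) both functions return 1, which is a convenient sanity check.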
We have already seen that, by considering (7), the coefficient $\eta$ corresponds to the tail index of the r.v. $T$ defined in (8). Tail index estimation has been widely studied in the literature, and a survey on this topic can be found in, for example, Beirlant et al. [14]. The most used estimator for positive tail indexes is the Hill estimator (Hill [17]). More precisely, replacing the quantities in (8) by their empirical counterparts, we have
$$T_i^{(n)} = \min\left(\frac{1}{1 - \hat{F}_X(X_i)}, \frac{1}{1 - \hat{F}_Y(Y_i)}\right), \quad i = 1, \ldots, n,$$
with $\hat{F}_X$ and $\hat{F}_Y$ given in (16). Thus, considering the order statistics $T_{1:n}^{(n)} \leq \cdots \leq T_{n:n}^{(n)}$, the Hill estimator for the coefficient $\eta$ is given by
$$\hat{\eta} = \frac{1}{k} \sum_{i=1}^{k} \log \frac{T_{n-i+1:n}^{(n)}}{T_{n-k:n}^{(n)}}. \tag{19}$$
Observe that $\hat{\eta}$ is also a function of the parameter $k$, under the same conditions described above, and thus suffers from the same bias and variance problem.
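A minimal sketch of the Hill-type estimate of the Ledford and Tawn coefficient follows. It assumes that the variable in (8) is the minimum of the two margins after transformation to a unit Pareto scale, $\min(1/(1 - \hat{F}_X(X_i)), 1/(1 - \hat{F}_Y(Y_i)))$, and that ranks are rescaled by $n + 1$; the helper names are illustrative.

```python
import math


def pseudo_observations(values):
    """Ranks divided by n + 1 (rescaled empirical d.f. at each data point)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1) for r in ranks]


def hill_eta(x, y, k):
    """Hill estimate of eta from the k largest values of T (1 <= k < n)."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    # Componentwise minimum on the unit Pareto scale, sorted increasingly.
    t = sorted(min(1.0 / (1.0 - ui), 1.0 / (1.0 - vi)) for ui, vi in zip(u, v))
    n = len(t)
    threshold = t[n - k - 1]   # the (n - k)-th order statistic
    top = t[n - k:]            # the k largest order statistics
    return sum(math.log(value / threshold) for value in top) / k
```

For independent margins the estimate should be close to 1/2, and for a comonotone sample close to 1 (slightly below, because of the finite $k$). Values above 1 can occur in finite samples, so a user may want to truncate the estimate at 1.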
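Observe that, from the first equality in (11), we can derive the estimator $2\hat{\eta} - 1$, with $\hat{\eta}$ given in (19). From the second equality in (11), the estimator
$$\hat{\bar{\chi}} = \frac{2 \log(k/n)}{\log \hat{C}^{s}(k/n, k/n)} - 1 \tag{21}$$
is obtained, where $\hat{C}^{s}$ denotes the empirical survival copula,
$$\hat{C}^{s}(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{\hat{F}_X(X_i) > 1 - u,\ \hat{F}_Y(Y_i) > 1 - v\}},$$
with $\hat{F}_X$, $\hat{F}_Y$ given in (16). Once again, we have dependency on the parameter $k$.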
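The survival-copula version of the estimator can be sketched as follows, assuming the form $2 \log(k/n)/\log \hat{C}^{s}(k/n, k/n) - 1$, where $\hat{C}^{s}(t, t)$ is the fraction of observations whose two rescaled ranks both exceed $1 - t$. The names and the handling of an empty joint tail are choices made for this illustration.

```python
import math


def pseudo_observations(values):
    """Ranks divided by n + 1 (rescaled empirical d.f. at each data point)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1) for r in ranks]


def chi_bar(x, y, k):
    """Estimate of chi-bar from the empirical survival copula at (k/n, k/n), 1 <= k < n."""
    n = len(x)
    u, v = pseudo_observations(x), pseudo_observations(y)
    t = k / n
    joint = sum(1 for ui, vi in zip(u, v) if ui > 1.0 - t and vi > 1.0 - t) / n
    if joint == 0.0:
        # No joint exceedances: the ratio tends to 0, so chi-bar tends to -1.
        return -1.0
    return 2.0 * math.log(t) / math.log(joint) - 1.0
```

A comonotone sample gives exactly 1 (the joint tail fraction equals $k/n$), while independent margins give values near 0.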
In most cases, the TDC estimators do not behave well under asymptotic independence, that is, whenever $\lambda = 0$ (see, e.g., Frahm et al. [10] and Ferreira [18]). A possible way to deal with this problem is to consider preliminary tests for tail independence. Poon et al. [13] suggest testing $H_0: \eta = 1$ versus $H_1: \eta < 1$, that is, dependence versus independence, based on the estimator $\hat{\eta}$ in (19). Considering an intermediate sequence $k \equiv k_n$ and under some quite general additional conditions, we have approximately $\sqrt{k}(\hat{\eta} - \eta) \sim N(0, \eta^2)$, as $n \to \infty$. Thus, we reject $H_0$ in favor of $H_1$, at the significance level $\alpha$, if
$$\hat{\eta} + z_{1-\alpha} \frac{\hat{\eta}}{\sqrt{k}} < 1, \tag{23}$$
where $z_{1-\alpha}$ denotes the $(1 - \alpha)$-quantile of $N(0, 1)$.
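The decision rule of this first test can be written compactly. The sketch below assumes a one-sided rule of the form "reject tail dependence when $\hat{\eta} + z_{1-\alpha}\,\hat{\eta}/\sqrt{k} < 1$", with the standard deviation of the Hill estimate approximated by $\hat{\eta}/\sqrt{k}$; this is one reading of the test, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def reject_tail_dependence(eta_hat, k, alpha=0.05):
    """Return True when H0 (eta = 1, tail dependence) is rejected at level alpha.

    Assumes the Hill estimate eta_hat is approximately normal with
    standard deviation eta_hat / sqrt(k).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # (1 - alpha)-quantile of N(0, 1)
    return eta_hat + z * eta_hat / math.sqrt(k) < 1.0
```

For instance, an estimate of 0.5 based on 200 order statistics leads to rejection at the 5% level, whereas an estimate of 0.96 based on 50 order statistics does not.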
An analogous test was developed in Draisma et al. [6], based on relation (5). More precisely, assuming that (5) holds for a function with first derivatives and and considering an intermediate sequence such that , with , then is asymptotically normal with zero mean and variance: Consider with and , , the order statistics of Defining similarly , if (5) holds under the above mentioned conditions, then , where denotes convergence in probability. Moreover, if , then where with corresponding to the Hill estimator of , given in (19). Therefore, for the same test hypotheses, we reject if
Observe that the variance in test (29) includes a correction factor when compared with the one in (23). This renders its value slightly smaller, making the test more accurate under tail independence, as shall be seen in the simulations below.
3. The Heuristic Procedure
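In this section we describe the “plateau-finding” heuristic procedure presented in Frahm et al. [10]. Stability in the sample path of the graph $(k, \hat{\lambda}_k)$, $k = 1, \ldots, n$, is observed for high thresholds (small values of $k$), since the diagonal section of the copula is expected to be smooth in the neighborhood of $(1, 1)$ and its first derivative approximately constant. However, in order to decrease variance, $k$ cannot be too small (see Figure 1). The algorithm proposed in Frahm et al. [10] aims to identify the plateau, that is, the stability region which is induced by the homogeneity. More precisely, first we smooth the graph $(k, \hat{\lambda}_k)$ by a box kernel with bandwidth $b \in \mathbb{N}$, consisting of the means of $2b + 1$ successive points of $\hat{\lambda}_1, \ldots, \hat{\lambda}_n$. Now, in the smoothed moving average values, $\bar{\lambda}_1, \ldots, \bar{\lambda}_{n-2b}$, the plateaus with length $m = \lfloor \sqrt{n - 2b} \rfloor$ are defined as $p_k = (\bar{\lambda}_k, \ldots, \bar{\lambda}_{k+m-1})$, $k = 1, \ldots, n - 2b - m + 1$. The algorithm stops at the first plateau $p_k$ fulfilling the criterion
$$\sum_{i=k+1}^{k+m-1} \left|\bar{\lambda}_i - \bar{\lambda}_k\right| \leq 2\sigma,$$
with $\sigma$ corresponding to the standard deviation of $\bar{\lambda}_1, \ldots, \bar{\lambda}_{n-2b}$, and the TDC estimate corresponding to
$$\hat{\lambda} = \frac{1}{m} \sum_{i=1}^{m} \bar{\lambda}_{k+i-1}.$$
If no plateau fulfills the stopping condition, the TDC is estimated as zero.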
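The procedure can be sketched in plain Python as follows. The sketch assumes a moving-average window of $2b + 1$ points, plateaus of length $\lfloor \sqrt{n - 2b} \rfloor$, a stopping threshold of twice the sample standard deviation of the smoothed values, and the plateau mean as the final estimate; the function name and the choice of the sample (rather than population) standard deviation are made for this illustration.

```python
import math
import statistics


def plateau_estimate(path, b):
    """Plateau-finding estimate from a sample path of estimates (one value per k).

    path: estimates for k = 1, ..., n, in that order.
    b:    box-kernel bandwidth; each smoothed value averages 2b + 1 points.
    """
    n = len(path)
    window = 2 * b + 1
    if n - 2 * b < 2:
        raise ValueError("path too short for this bandwidth")
    # Box-kernel smoothing: means of 2b + 1 successive points.
    smoothed = [sum(path[i:i + window]) / window for i in range(n - 2 * b)]
    m = math.isqrt(n - 2 * b)              # plateau length
    sigma = statistics.stdev(smoothed)     # sample standard deviation (an assumption)
    for start in range(len(smoothed) - m + 1):
        plateau = smoothed[start:start + m]
        deviation = sum(abs(value - plateau[0]) for value in plateau[1:])
        if deviation <= 2.0 * sigma:
            return sum(plateau) / m        # first plateau that is flat enough
    return 0.0                             # no plateau found
```

The function accepts any sample path, so it can be applied unchanged to the paths of the other estimators considered in Section 2. The direct slice sums cost $O(nb)$ operations, which is adequate for the small bandwidths typically used; a cumulative-sum implementation would be faster but introduces rounding noise that can matter when the path is exactly flat.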
Observe that, if the diagonal section of the copula follows a power law, the homogeneity of $\hat{\lambda}_k$ still holds for larger $k$, and larger bandwidths may be chosen in order to reduce the variance.
4. Simulation Study
We simulate independent random samples of sizes , from the models:
(i) bivariate Normal with and (; , resp.);
(ii) bivariate Student t with , and , (, resp.; );
(iii) logistic with dependence parameter (; ) (Ledford and Tawn [4, 5]);
(iv) asymmetric logistic with dependence parameter and asymmetry parameters and (; ) (Ledford and Tawn [4, 5]);
(v) Morgenstern with dependence parameter (; ) (Ledford and Tawn [4, 5]).
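For readers who wish to reproduce this kind of experiment, samples from the first two models can be drawn with the standard library alone. The sketch below is only illustrative: the correlation `rho` and the degrees of freedom `nu` are free parameters here, and the values used in the usage note are hypothetical rather than those of the study.

```python
import math
import random


def bivariate_normal(n, rho, rng):
    """n pairs from a standard bivariate Normal with correlation rho (|rho| < 1)."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs


def bivariate_student_t(n, rho, nu, rng):
    """n pairs from a bivariate Student t with correlation parameter rho and nu d.f."""
    pairs = []
    for z1, z2 in bivariate_normal(n, rho, rng):
        # Chi-square with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
        w = rng.gammavariate(nu / 2.0, 2.0)
        scale = math.sqrt(nu / w)   # common scaling shared by both components
        pairs.append((z1 * scale, z2 * scale))
    return pairs
```

For example, `bivariate_normal(1000, 0.5, random.Random(0))` gives a tail independent sample for any correlation below one, whereas `bivariate_student_t(1000, 0.5, 3, random.Random(0))` gives a tail dependent one, because the common random scaling makes large values of both components occur together.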
We apply the algorithm described in Section 3 to the estimators of the tail dependence coefficients defined in (13), (14), (19), and (21), as well as to the tail independence tests (23) and (29). In the sequel we denote (23) as test 1 and (29) as test 2. The variances within test 1 and test 2 (the latter with the estimator given in (28)) are estimated by applying the algorithm to their plots against $k$, but we pick the plateau at the same location as the one given by the respective coefficient estimation. In all the cases we consider the values . The boundary cases of a bivariate Normal with (tail independent model but with ), and a bivariate Student t with and (tail dependent model with a very low TDC of ) are included in simulations in order to assess the robustness of the method.
Observe in Figures 2 and 3 that the two TDC estimators, (13) and (14), behave quite similarly, although the former seems slightly better. The largest bias occurring for the smallest sample size is around but for the largest one is close to zero, which indicates a good performance. The exception relates to the Normal model, in particular the boundary case of . In the Normal model with , the largest bias is about . For small samples it is preferable to choose bandwidths with or . In all the other simulation results presented here, there are no significant differences between the considered bandwidths.
Concerning the estimators in (19) and (21), the first one is clearly better (Figures 4 and 5). It is also robust within the boundary cases of Student t (, ) and Normal (), for large sample sizes. Observe that the bias and the root mean squared error results are very close to the ones obtained in Draisma et al. [6], where $k$ was chosen, through an intensive simulation study, in a range where the overall performance seems best. The estimator in (21) only slightly outperforms the one in (19) in the Normal model for . The proportion of samples in which tail dependence (the null hypothesis) is rejected at a 5% significance level is plotted in Figure 6. The heuristic procedure has an overall good performance in both tests for large sample sizes. We can see that, under tail independence, test 2 outperforms test 1 as expected (see Section 2), whereas in the tail dependent case, test 1 is slightly better. However, they do not seem to be robust given the results within the above mentioned boundary cases, particularly in the Normal case.
5. An Application: Dependence of Large Losses within Stock Markets
We consider five years of negative daily log-returns (from 1996 to 2000) of Intel (INTC), Microsoft (MSFT), and General Electric (GE) stocks, which amounts to a sample size . These data were analyzed in McNeil et al. [19, Chapter 5]. We aim to quantify the contagion risk of large losses within the pairs (INTC, MSFT), (INTC, GE), and (MSFT, GE), that is, to investigate whether each pair presents tail dependence or independence and to quantify the respective degree of extremal dependence. As a preliminary step, we analyze the scatter plots in Figure 7. Observe that the largest values of one variable correspond to moderately large values of the same sign of the other variable, suggesting that the variables are asymptotically independent, though not perfectly independent. Table 5 presents the estimates of , , , , and . The results correspond to ; those obtained with the other bandwidths () are very close and are thus omitted. Both tests reject dependence in (INTC, MSFT) and (INTC, GE). Observe the small values provided by the TDC estimators. In the case of (MSFT, GE), test 2 rejects the dependence hypothesis, while test 1 only narrowly fails to reject it. The values of and are also small, indicating that tail independence may be the more plausible conclusion. Therefore, we find that the contagion risk of large losses is residual, particularly in the case (INTC, GE).
6. Final Remarks
In this paper we address the tail dependence inference problem, since it is important to distinguish the type of tail dependence in order to correctly evaluate the risk of simultaneous extreme events. Most nonparametric estimators have to deal with the choice of the number $k$ of order statistics to be considered in the production of an estimate. This is not an easy task since it requires a trade-off between variance and bias (small values of $k$ cause large variance and large values of $k$ increase the bias). An optimal choice of $k$ that leads to the smallest mean squared error is difficult to derive and, in practice, this is frequently solved through intensive simulation studies (see, e.g., Draisma et al. [6]). This is also a very common problem in the estimation of the tail index, a parameter of major importance within extreme value theory (see, e.g., Beirlant et al. [14] and references therein). Since the nonparametric estimators yield a characteristic plateau when the estimates are plotted for successive $k$, Frahm et al. [10] introduced a simple plateau-finding algorithm, applied after smoothing the latter plot by some box kernel, in order to find the optimal threshold $k$. Here we have applied this heuristic procedure to estimators of the TDC in (1), as well as to estimators of measures of tail independence such as the Ledford and Tawn coefficient $\eta$ in (4) and the coefficient $\bar{\chi}$ in (11), for several box kernel bandwidths. We have also analyzed this methodology in two tests for tail independence given in (23) and (29). We conclude that the procedure has an overall good performance, especially for large samples. Some care must be taken with the tests as they might not be robust, in particular for boundary cases within the Normal model. We draw attention to the very good performance of the $\eta$ estimation. We recall that it is based on a tail index estimator (the Hill estimator), which may be an indication that this procedure can also work well within tail index estimation.
Since this very simple heuristic procedure revealed some potential, we intend to develop it further and compare it with other heuristic methods such as, for instance, the graphical method in de Sousa and Michailidis [20] and bootstrap methods (see, e.g., Peng and Qi [21] and Gomes and Oliveira [22] and references therein). This will be addressed in future work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors are very grateful to the referees for their significant suggestions and corrections. Marta Ferreira was financed by FEDER Funds through “Programa Operacional Factores de Competitividade, COMPETE,” and by Portuguese Funds through FCT, “Fundação para a Ciência e a Tecnologia,” within the Project PEst-OE/MAT/UI0013/2014.
References
[1] P. Embrechts, A. J. McNeil, and D. Straumann, “Correlation and dependence in risk management: properties and pitfalls,” in Risk Management: Value at Risk and Beyond, M. A. H. Dempster, Ed., pp. 176–223, Cambridge University Press, Cambridge, UK, 2002.
[2] M. Sibuya, “Bivariate extreme statistics, I,” Annals of the Institute of Statistical Mathematics, vol. 11, pp. 195–210, 1960.
[3] A. Sklar, “Fonctions de répartition à n dimensions et leurs marges,” Publications de l'Institut de statistique de l'Université de Paris, vol. 8, pp. 229–231, 1959.
[4] A. W. Ledford and J. A. Tawn, “Statistics for near independence in multivariate extreme values,” Biometrika, vol. 83, no. 1, pp. 169–187, 1996.
[5] A. W. Ledford and J. A. Tawn, “Modelling dependence within joint tail regions,” Journal of the Royal Statistical Society B, vol. 59, no. 2, pp. 475–499, 1997.
[6] G. Draisma, H. Drees, A. Ferreira, and L. de Haan, “Bivariate tail estimation: dependence in asymptotic independence,” Bernoulli, vol. 10, no. 2, pp. 251–280, 2004.
[7] S. Coles, J. Heffernan, and J. Tawn, “Dependence measures for extreme value analysis,” Extremes, vol. 2, pp. 339–366, 1999.
[8] X. Huang, Statistics of bivariate extreme values [Ph.D. thesis], Tinbergen Institute Research, Erasmus University, Rotterdam, The Netherlands, 1992.
[9] H. Joe, Multivariate Models and Dependence Concepts, vol. 73 of Monographs on Statistics and Applied Probability, Chapman and Hall, London, UK, 1997.
[10] G. Frahm, M. Junker, and R. Schmidt, “Estimating the tail-dependence coefficient: properties and pitfalls,” Insurance: Mathematics & Economics, vol. 37, no. 1, pp. 80–100, 2005.
[11] R. Schmidt and U. Stadtmüller, “Non-parametric estimation of tail dependence,” Scandinavian Journal of Statistics, vol. 33, no. 2, pp. 307–335, 2006.
[12] L. Peng, “Estimation of the coefficient of tail dependence in bivariate extremes,” Statistics & Probability Letters, vol. 43, no. 4, pp. 399–409, 1999.
[13] S.-H. Poon, M. Rockinger, and J. Tawn, “Extreme value dependence in financial markets: diagnostics, models, and financial implications,” Review of Financial Studies, vol. 17, no. 2, pp. 581–610, 2004.
[14] J. Beirlant, F. Caeiro, and M. I. Gomes, “An overview and open research topics in statistics of univariate extremes,” RevStat, vol. 10, no. 1, pp. 1–31, 2012.
[15] H. Joe, R. L. Smith, and I. Weissman, “Bivariate threshold methods for extremes,” Journal of the Royal Statistical Society B: Methodological, vol. 54, no. 1, pp. 171–183, 1992.
[16] J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels, Statistics of Extremes: Theory and Application, John Wiley & Sons, New York, NY, USA, 2004.
[17] B. M. Hill, “A simple general approach to inference about the tail of a distribution,” Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.
[18] M. Ferreira, “Nonparametric estimation of the tail-dependence coefficient,” Revstat Statistical Journal, vol. 11, no. 1, pp. 1–16, 2013.
[19] A. J. McNeil, R. Frey, and P. Embrechts, Quantitative Risk Management, Princeton Series in Finance, Princeton University Press, Princeton, NJ, USA, 2005.
[20] B. de Sousa and G. Michailidis, “A diagnostic plot for estimating the tail index of a distribution,” Journal of Computational and Graphical Statistics, vol. 13, no. 4, pp. 974–995, 2004.
[21] L. Peng and Y. Qi, “Bootstrap approximation of tail dependence function,” Journal of Multivariate Analysis, vol. 99, no. 8, pp. 1807–1824, 2008.
[22] M. I. Gomes and O. Oliveira, “The bootstrap methodology in statistics of extremes: choice of the optimal sample fraction,” Extremes, vol. 4, no. 4, pp. 331–358, 2001.
Copyright © 2014 Marta Ferreira and Sérgio Silva. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.