Journal of Probability and Statistics
Volume 2010 (2010), Article ID 754851, 26 pages
Local Likelihood Density Estimation and Value-at-Risk
¹CREST and University of Toronto, Canada
²York University, Canada
Received 5 October 2009; Accepted 9 March 2010
Academic Editor: Ričardas Zitikis
Copyright © 2010 Christian Gourieroux and Joann Jasiak. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper presents a new nonparametric method for computing the conditional Value-at-Risk, based on a local approximation of the conditional density function in a neighborhood of a predetermined extreme value for univariate and multivariate series of portfolio returns. For illustration, the method is applied to intraday VaR estimation on portfolios of two stocks traded on the Toronto Stock Exchange. The performance of the new VaR computation method is compared to the historical simulation, variance-covariance, and J. P. Morgan methods.
The Value-at-Risk (VaR) is a measure of market risk exposure for portfolios of assets. It was introduced by the Basle Committee on Banking Supervision (BCBS) and implemented in the financial sector worldwide in the late nineties. By definition, the VaR equals the dollar loss on a portfolio that will not be exceeded by the end of a holding time with a given probability. Initially, the BCBS recommended a 10-day holding time (and allowed for computing the VaR at horizon 10 days by rescaling the VaR at a shorter horizon) and a loss probability of 1% (see , page 3). Banks use the VaR to determine the required capital to be put aside for coverage of potential losses. (The required capital reserve is defined as $\max\left[\mathrm{VaR}_{t-1},\ (M+k)\,\frac{1}{60}\sum_{i=1}^{60}\mathrm{VaR}_{t-i}\right]$ (see , page 14 and , page 2), where $M$ is a multiplier set equal to 3, and $k$ takes a value between 0 and 1 depending on the predictive quality of the internal model used by the bank.) The VaR is also used in portfolio management and internal risk control. For these purposes, some banks compute intradaily VaRs, at horizons of one or two hours, and at risk levels of 0.5% or less.
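The capital rule can be illustrated in a few lines. Under the rule reserve $= \max[\mathrm{VaR}_{t-1}, (M+k) \times$ average of the last 60 daily VaRs$]$, the sketch below is a minimal illustration. It assumes that daily VaR figures are stored as positive loss amounts with the most recent last, that $M = 3$ as stated in the text, and that the plus factor $k$ is supplied by the user. The function name and the numbers in the usage line are hypothetical.

```python
def required_capital(var_history, k, multiplier=3.0):
    """Capital reserve max[VaR_{t-1}, (M + k) * average of the last 60 VaRs].

    var_history: daily VaRs as positive loss amounts, most recent last
    (at least 60 values). k: plus factor in [0, 1] (user-supplied).
    """
    if len(var_history) < 60:
        raise ValueError("need at least 60 daily VaR figures")
    if not 0.0 <= k <= 1.0:
        raise ValueError("plus factor k must lie in [0, 1]")
    last_var = var_history[-1]                # VaR_{t-1}
    avg_60 = sum(var_history[-60:]) / 60.0    # 60-day average VaR
    return max(last_var, (multiplier + k) * avg_60)


# Hypothetical history: 59 days at 1.0 million, then a spike to 5.0 million.
history = [1.0] * 59 + [5.0]
print(required_capital(history, k=0.5))  # 5.0
```

In this hypothetical history, the single large VaR on the last day exceeds $(3 + 0.5)$ times the 60-day average, so it determines the reserve.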
Formally, the conditional Value-at-Risk is the lower-tail conditional quantile and satisfies the following expression:
$$P_t\left[y_{t+1} < \mathrm{VaR}_t(\alpha)\right] = \alpha, \qquad (1.1)$$
where $y_{t+1}$ is the portfolio return between $t$ and $t+1$, $\alpha$ denotes the loss probability, and $P_t$ represents the conditional distribution of $y_{t+1}$ given the information available at time $t$. Usually, the information set contains the lagged values of portfolio returns. It can also contain lagged returns on individual assets, or on the market portfolio.
While the definition of the VaR as a market risk measure is common to all banks, the VaR computation method is not. In practice, there exist a variety of parametric, semiparametric, and nonparametric methods, which differ with respect to the assumptions on the dynamics of portfolio returns. They can be summarized as follows (see, e.g., ).
(a) Marginal VaR Estimation
The approach relies on the assumption of i.i.d. returns and comprises the following methods.
(1) Gaussian Approach
The VaR is the $\alpha$-quantile, obtained by inverting the Gaussian cumulative distribution function:
$$\mathrm{VaR}(\alpha) = m + \sigma\,\Phi^{-1}(\alpha), \qquad (1.2)$$
where $m$ is the expected return on a portfolio, $\sigma^2$ is the variance of portfolio returns, and $\Phi^{-1}(\alpha)$ is the $\alpha$-quantile of the standard normal distribution. This method assumes the normality of returns and generally underestimates the VaR, because the tails of the normal distribution are much thinner than the tails of the empirical marginal distribution of portfolio returns.
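The Gaussian formula $\mathrm{VaR}(\alpha) = m + \sigma\,\Phi^{-1}(\alpha)$ translates directly into code. The sketch below uses the standard-library `statistics.NormalDist` for $\Phi^{-1}$. The function name and the inputs (zero mean, 1% volatility) are illustrative assumptions. Under this convention, the VaR is a negative return quantile, and the corresponding loss per unit invested is its negative.

```python
from statistics import NormalDist

def gaussian_var(mean, std, alpha):
    """Gaussian VaR: the alpha-quantile mean + std * Phi^{-1}(alpha)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return mean + std * NormalDist().inv_cdf(alpha)

# Zero mean, 1% volatility, loss probability 1% (hypothetical inputs).
print(round(gaussian_var(0.0, 0.01, 0.01), 5))  # -0.02326
```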
(2) Historical Simulation (see )
The VaR is approximated by a sample quantile at probability $\alpha$, obtained from historical data collected over an observation period not shorter than one year. The advantage of this method is that it relaxes the normality assumption. Its major drawback is that it approximates small quantiles poorly, at $\alpha$'s such as 1%, because extreme values are very infrequent. Therefore, a very large sample is required to collect enough information about the true shape of the tail. (According to the asymptotic properties of the empirical quantile, at least 200–300 observations, that is, approximately one year, are needed for $\alpha$ = 5%, and at least 1000, that is, 4 years, are needed for $\alpha$ = 1%, both for a Gaussian tail. For fatter tails, even more observations can be required (see, e.g., the discussion in ).)
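Here is a minimal sketch of the historical simulation. It assumes the simple order-statistic definition of the sample quantile (the $\lceil \alpha n \rceil$-th smallest return). Other interpolation rules are common and would give slightly different values. The window length of 250 is a hypothetical default. With $n = 250$ and $\alpha = 1\%$, the estimate rests on the third smallest return alone, which illustrates the drawback discussed above.

```python
import math

def historical_var(returns, alpha):
    """Empirical alpha-quantile of past returns: the ceil(alpha * n)-th smallest.

    This order-statistic rule is one of several conventions (an assumption here).
    """
    if not returns:
        raise ValueError("empty sample")
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))  # rank of the order statistic
    return ordered[k - 1]

def rolling_historical_var(returns, alpha, window=250):
    """VaR for each date t >= window, computed from the `window` returns before t."""
    return [historical_var(returns[t - window:t], alpha)
            for t in range(window, len(returns) + 1)]

# With 250 observations and alpha = 1%, only the 3rd smallest return matters.
sample = [float(v) for v in range(250)]   # hypothetical values
print(historical_var(sample, 0.01))       # 2.0
```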
(3) Tail Model Building
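The marginal quantile at a small risk level $\alpha$ is computed from a parametric model of the tail and from the sample quantile(s) at a larger $\alpha$. For example, McKinsey Inc. suggests inferring the 99th quantile from the 95th quantile by multiplying the latter by 1.5. This weight is motivated by a zero-mean Gaussian model of the tail, under which the exact ratio $\Phi^{-1}(1\%)/\Phi^{-1}(5\%)$ is about 1.41. The method can be improved by using two tail quantiles. If a Gaussian model with mean $m$ and variance $\sigma^2$ is assumed to fit the tail for $\alpha < 10\%$, then, for any $\alpha < 10\%$, $\mathrm{VaR}(\alpha) = m + \sigma\,\Phi^{-1}(\alpha)$. Let $\hat q(5\%)$ and $\hat q(10\%)$ denote the sample quantiles at risk levels 5% and 10%. From (1.2), the estimated mean and variance in the tail arise as the solutions of the system
$$\hat q(5\%) = \hat m + \hat\sigma\,\Phi^{-1}(5\%), \qquad \hat q(10\%) = \hat m + \hat\sigma\,\Phi^{-1}(10\%).$$
The marginal VaR at any loss probability $\alpha$ less than 10% is calculated as
$$\widehat{\mathrm{VaR}}(\alpha) = \hat m + \hat\sigma\,\Phi^{-1}(\alpha),$$
where $\hat m$ and $\hat\sigma$ are the solutions of the above system. Equivalently, we get
$$\widehat{\mathrm{VaR}}(\alpha) = \frac{\Phi^{-1}(5\%) - \Phi^{-1}(\alpha)}{\Phi^{-1}(5\%) - \Phi^{-1}(10\%)}\,\hat q(10\%) + \frac{\Phi^{-1}(\alpha) - \Phi^{-1}(10\%)}{\Phi^{-1}(5\%) - \Phi^{-1}(10\%)}\,\hat q(5\%).$$
Thus, $\widehat{\mathrm{VaR}}(\alpha)$ is a linear combination of the sample quantiles $\hat q(5\%)$ and $\hat q(10\%)$, with weights determined by the Gaussian model of the tail.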
This method is parametric as far as the tail is concerned and nonparametric for the central part of the distribution, which is left unspecified.
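The two-quantile extrapolation can be sketched as follows. The sketch takes the 5% and 10% sample quantiles as inputs, solves the $2 \times 2$ system for $\hat m$ and $\hat\sigma$, and then evaluates $\hat m + \hat\sigma\,\Phi^{-1}(\alpha)$ at the target $\alpha$. The function name and the sample quantile values are ours, for illustration only. The last line computes the zero-mean Gaussian ratio between the 1% and 5% quantiles that underlies rules of the McKinsey type.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

def tail_model_var(q05, q10, alpha):
    """VaR(alpha), alpha < 10%, from the 5% and 10% sample quantiles under a
    Gaussian model of the tail: solve for (m, sigma), then evaluate the quantile."""
    sigma = (q05 - q10) / (z(0.05) - z(0.10))
    m = q10 - sigma * z(0.10)
    return m + sigma * z(alpha)

# Hypothetical sample quantiles of returns at 5% and 10%.
print(tail_model_var(-0.020, -0.015, 0.01))

# Zero-mean Gaussian ratio behind McKinsey-type rules (99th vs 95th quantile).
print(round(z(0.01) / z(0.05), 3))  # 1.414
```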
The marginal VaR estimation methods discussed so far do not account for the serial dependence in financial returns documented in the literature. (These methods are often applied by rolling, that is, by computing them from the observations in a moving window of fixed length. This implicitly assumes independent returns with time-dependent distributions.)
(b) Conditional VaR Estimation
These methods accommodate serial dependence in financial returns.
(1) J. P. Morgan
The VaR at 5% is computed by inverting a Gaussian distribution with conditional mean zero and variance equal to an estimated conditional variance $\sigma_t^2$ of returns, that is, $\mathrm{VaR}_t(5\%) = \sigma_t\,\Phi^{-1}(5\%)$. The conditional variance is estimated from a conditionally Gaussian IGARCH-type model of volatility, called the Exponentially Weighted Moving Average:
$$\sigma_t^2 = \theta\,\sigma_{t-1}^2 + (1-\theta)\,y_{t-1}^2,$$
where $y_{t-1}$ is the lagged portfolio return and the parameter $\theta$ is arbitrarily fixed at 0.94 for any portfolio.
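The recursion $\sigma_t^2 = \theta\sigma_{t-1}^2 + (1-\theta)y_{t-1}^2$ with $\theta = 0.94$ can be sketched as follows. This is a minimal illustration, not RiskMetrics code. The initial variance `init_var` must be supplied by the user, and its choice is an assumption. The conditional mean is zero, as in the text. `path[i]` is the VaR for the return that follows `returns[i]`, so each value uses past data only.

```python
import math
from statistics import NormalDist

def ewma_var(returns, init_var, alpha=0.05, theta=0.94):
    """J. P. Morgan-style VaR path with zero conditional mean and
    sigma2_t = theta * sigma2_{t-1} + (1 - theta) * y_{t-1}^2.

    path[i] = sigma * Phi^{-1}(alpha) for the return following returns[i].
    """
    z = NormalDist().inv_cdf(alpha)
    sigma2 = init_var
    path = []
    for y in returns:
        sigma2 = theta * sigma2 + (1.0 - theta) * y * y  # update with the latest return
        path.append(math.sqrt(sigma2) * z)
    return path

# Hypothetical returns: a calm period followed by a 5% shock.
path = ewma_var([0.01] * 5 + [0.05], init_var=1e-4)
print([round(v, 4) for v in path])
```

After the shock, the VaR magnitude jumps at once. With $\theta = 0.94$, it then decays only slowly.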
(2) CAViaR
The CAViaR model is an autoregressive specification of the conditional quantile. The model is estimated independently for each value of $\alpha$, and is nonparametric in that respect.
(3) Dynamic Additive Quantile (DAQ) 
This is a parametric, dynamic factor model of the conditional quantile function.
Table 1 summarizes all the aforementioned methods.
This paper fills the empty cell in Table 1 by extending the tail model building method to the conditional Value-at-Risk. To do that, we introduce a parametric pseudomodel of the conditional portfolio return distribution that is assumed valid in a neighborhood of the VaR of interest. Next, we estimate the pseudodensity locally and use the result to calculate the conditional VaRs in the tail.
The local nonparametric approach appears preferable to the fully parametric approaches for two reasons. First, the fully parametric methods are too sensitive to specification errors. Second, the theoretical rate of convergence may appear smaller than that of a fully parametric method (under the assumption of no specification error in the latter). Even so, the estimator proposed in this paper is based on a local approximation of the density in a neighborhood where more observations are available than at the quantile of interest.
The paper is organized as follows. Section 2 presents the local estimation of a probability density function from a misspecified parametric model. By applying this technique to a Gaussian pseudomodel, we derive the local drift and local volatility, which can be used as inputs in expression (1.2). Section 3 extends the approach from marginal to conditional density analysis. In Section 4, the new method is used to compute the intraday conditional Value-at-Risk for portfolios of two stocks traded on the Toronto Stock Exchange. Next, in Monte Carlo experiments, the performance of the new method of VaR computation is compared to other methods: the historical simulation, the Gaussian variance-covariance method, the J. P. Morgan IGARCH, and ARCH-based VaR estimation. Section 5 discusses the asymptotic properties of the new nonparametric estimator of the log-density derivatives. The final section concludes the paper. The proofs are gathered in the Appendices.
2. Local Analysis of the Marginal Density Function
The local analysis of a marginal density function is based on a family of pseudodensities. Among these, we define the pseudodensity that is locally closest to the true density. Next, we define the estimators of the local pseudodensity and show the specific results obtained for a Gaussian family of pseudodensities.
2.1. Local Pseudodensity
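Let us consider a univariate or multivariate random variable $Y$ of dimension $d$, with unknown density $f_0$, and a parametric family of densities $\{f(y;\theta),\ \theta \in \Theta\}$, called the family of pseudodensities, where the parameter set $\Theta \subset \mathbb{R}^p$. This family is generally misspecified. Our method consists in finding the pseudodensity that is locally closest to the true density. To do that, we look for the local pseudo-true value of the parameter $\theta$.
In the first step, let us assume that the variable $Y$ is univariate and consider an approximation on an interval $A = [c-h, c+h]$, centered at some value $c$ of the variable $Y$. The pseudodensity is derived by optimizing the Kullback-Leibler criterion evaluated from the pseudo and true densities truncated over $A$. The pseudo-true value of $\theta$ is
$$\theta^*_{c,h} = \arg\max_{\theta} \left\{ E_0\left[\mathbb{1}_{c-h<Y<c+h}\log f(Y;\theta)\right] - E_0\left[\mathbb{1}_{c-h<Y<c+h}\right]\log\int_{c-h}^{c+h} f(y;\theta)\,dy \right\}, \qquad (2.1)$$
where $E_0$ denotes the expectation taken with respect to the true probability density function (pdf, henceforth) $f_0$. The pseudo-true value depends on the pseudofamily, the true pdf, the bandwidth $h$, and the location $c$. The above formula can be equivalently rewritten in terms of the uniform kernel $K(u) = \frac{1}{2}\mathbb{1}_{[-1,1]}(u)$. This leads to the following extended definition of the pseudo-true value of the parameter. It is valid for a vector $Y$ of any dimension $d$, kernel $K$, bandwidth $h$, and location $c$:
$$\theta^*_{c,h} = \arg\max_{\theta} \left\{ E_0\left[\frac{1}{h^d}K\!\left(\frac{Y-c}{h}\right)\log f(Y;\theta)\right] - E_0\left[\frac{1}{h^d}K\!\left(\frac{Y-c}{h}\right)\right]\log\int \frac{1}{h^d}K\!\left(\frac{y-c}{h}\right) f(y;\theta)\,dy \right\}. \qquad (2.2)$$
Let us examine the behavior of the pseudo-true value when the bandwidth $h$ tends to zero.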
Definition 2.1. (i) The local parameter function (l.p.f.) is the limit of $\theta^*_{c,h}$ when $h$ tends to zero, given by
$$\theta(c) = \lim_{h \to 0} \theta^*_{c,h},$$
when this limit exists.
(ii) The local pseudodensity is $f(y;\theta(c))$.
The local parameter function provides the set of local pseudo-true values indexed by $c$, while the local pseudodensity approximates the true pdf in a neighborhood of $c$. Let us now discuss some properties of the l.p.f.
Proposition 2.2. Let one assume the following:
(A.1) There exists a unique solution $\theta^*_{c,h}$ to the objective function maximized in (2.2) for any $h$, and the limit $\theta(c) = \lim_{h\to 0}\theta^*_{c,h}$ exists.
(A.2) The kernel $K$ is continuous, of order 2, such that $\int uu'K(u)\,du$ is positive definite.
(A.3) The density functions $f_0$ and $f(\cdot;\theta)$ are positive and third-order differentiable with respect to $y$.
(A.4) …, and, for any $y$ in the support of $f_0$, …
(A.5) For small $h$ and any $\theta$, the following integrals exist: …, and are twice differentiable under the integral sign with respect to $\theta$.
Then, the local parameter function is a solution of the following system of equations:
$$\frac{\partial \log f(c;\theta(c))}{\partial y} = \frac{\partial \log f_0(c)}{\partial y}. \qquad (2.7)$$
Proof. See Appendix A.
The first-order conditions in Proposition 2.2 show that the functions $f(\cdot;\theta(c))$ and $f_0$ have the same log-derivatives at $c$. When $p$ is strictly larger than $d$, the first-order conditions are not sufficient to characterize the l.p.f.
Assumption (A.1) is a local identification condition of the parameter $\theta$. As shown in the application given later in the text, it holds for standard pseudofamilies of densities such as the Gaussian, where $\theta(c)$ has a closed form. (The existence of a limit $\theta(c)$ is assumed for expository purposes. However, the main result concerning the first-order conditions is easily extended to the case when $\theta^*_{c,h}$ exists for any $h$, with a compact parameter set $\Theta$. The proof in Appendix A shows that, even if $\lim_{h\to 0}\theta^*_{c,h}$ does not exist, we get
$$\lim_{h\to 0}\left[\frac{\partial \log f(c;\theta^*_{c,h})}{\partial y} - \frac{\partial \log f_0(c)}{\partial y}\right] = 0.$$
This condition would be sufficient to define a local approximation to the log-derivative of the density.)
A distribution is characterized by the log-derivative of its density. The log-derivative determines the density up to a multiplicative constant, and that constant is pinned down by the unit mass restriction. This implies the following corollary.
Corollary 2.3. The local parameter function characterizes the true distribution.
2.2. Estimation of the Local Parameter Function and of the Log-Density Derivative
Suppose that $Y_1, \dots, Y_T$ are observations on a strictly stationary process $(Y_t)$ of dimension $d$. Let us denote by $f_0$ the true marginal density of $Y_t$ and by $\{f(y;\theta),\ \theta \in \Theta\}$ a (misspecified) pseudoparametric family used to approximate $f_0$. We now consider the l.p.f. characterization of $f_0$ and introduce nonparametric estimators of the l.p.f. and of the marginal density.
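The estimator of the l.p.f. is obtained from formula (2.2), where the theoretical expectations are replaced by their empirical counterparts:
$$\hat\theta_T(c) = \arg\max_{\theta} \left\{ \sum_{t=1}^T \frac{1}{h^d}K\!\left(\frac{Y_t-c}{h}\right)\log f(Y_t;\theta) - \sum_{t=1}^T \frac{1}{h^d}K\!\left(\frac{Y_t-c}{h}\right)\log\int \frac{1}{h^d}K\!\left(\frac{y-c}{h}\right) f(y;\theta)\,dy \right\}.$$
The above estimator depends on the selected kernel $K$ and bandwidth $h$. Combined with Proposition 2.2, it yields a new nonparametric consistent estimator of the log-density derivative, defined as
$$\widehat{\frac{\partial \log f_0(c)}{\partial y}} = \frac{\partial \log f(c;\hat\theta_T(c))}{\partial y}.$$
The asymptotic properties of the estimators of the l.p.f. and of the log-density derivatives are discussed in Section 5 for the exactly identified case $p = d$. In that case, $\theta(c)$ is characterized by the system of first-order conditions (2.7).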
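The estimator can be illustrated in the simplest exactly identified case. The sketch below is a minimal illustration, not the paper's implementation. It assumes univariate data, the pseudomodel $N(\mu, 1)$ with unit variance (so $p = d = 1$), and the uniform kernel. Under these assumptions, the empirical objective is a truncated-normal log-likelihood on $[c-h, c+h]$. Its first-order condition equates the mean of the observations in the window with the mean of $N(\mu, 1)$ truncated to the window, and we solve it by bisection. The log-derivative estimate is $\hat\mu - c$, since $\partial \log\phi(y-\mu)/\partial y = \mu - y$. The simulated data are $N(0,1)$, so the pseudomodel contains the truth, and the output should be close to the true value $1.5$.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi

def truncated_mean(mu, a, b):
    """Mean of N(mu, 1) truncated to [a, b]; increasing in mu."""
    p = STD.cdf(b - mu) - STD.cdf(a - mu)
    return mu + (STD.pdf(a - mu) - STD.pdf(b - mu)) / p

def local_log_derivative(sample, c, h, iters=80):
    """Local likelihood estimate of d log f0(c) / dy for univariate data.

    Assumptions of this sketch: pseudomodel N(mu, 1) (unit variance, p = d = 1)
    and uniform kernel on [c - h, c + h]. The first-order condition equates the
    window mean of the data with the truncated mean of N(mu, 1).
    """
    window = [y for y in sample if c - h <= y <= c + h]
    if not window:
        raise ValueError("no observations in [c - h, c + h]")
    target = sum(window) / len(window)
    lo, hi = c - 4.0, c + 4.0            # search bracket for mu
    for _ in range(iters):               # bisection on the first-order condition
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid, c - h, c + h) < target:
            lo = mid
        else:
            hi = mid
    mu_hat = 0.5 * (lo + hi)
    return mu_hat - c                    # d/dy log phi(y - mu) at y = c

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20000)]
est = local_log_derivative(data, c=-1.5, h=0.5)
print(est)  # true log-derivative of N(0, 1) at -1.5 is 1.5
```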
The quantity $f(c;\hat\theta_T(c))$, that is, the estimated local pseudodensity evaluated at $c$, is generally a nonconsistent estimator of the density $f_0(c)$ (see, e.g., for a discussion of such a bias in an analogous framework). However, a consistent estimator of the log-density (and thus of the density itself, obtained as the exponential function of the log-density) is derived directly by integrating the estimated log-density derivatives under the unit mass restriction. This offers a correction for the bias and is an alternative to including additional terms in the objective function (see, e.g., [7, 8]).
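The bias correction, integrating the log-derivative and imposing unit mass, can be sketched numerically. This is a minimal illustration on a bounded grid; truncating the support to $[lo, hi]$ is our assumption. In practice, `g` would be the estimated log-derivative evaluated at each grid point. Here we plug in the exact log-derivative of the standard normal, $g(y) = -y$, so that the result can be checked against the known density.

```python
import math

def density_from_log_derivative(g, lo, hi, n=1601):
    """Rebuild a density on [lo, hi] from its log-derivative g.

    Steps: integrate g with the trapezoid rule (log-density up to a constant),
    exponentiate, and rescale so that the trapezoid integral equals one.
    """
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    log_f = [0.0]
    for i in range(1, n):
        log_f.append(log_f[-1] + 0.5 * step * (g(grid[i - 1]) + g(grid[i])))
    top = max(log_f)                                  # guard against overflow
    unnorm = [math.exp(v - top) for v in log_f]
    mass = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))  # unit mass
    return grid, [u / mass for u in unnorm]

# The log-derivative of the N(0, 1) density is g(y) = -y.
grid, dens = density_from_log_derivative(lambda y: -y, -8.0, 8.0)
print(round(dens[800], 4))  # grid[800] is 0.0; the N(0, 1) pdf there is about 0.3989
```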
2.3. Gaussian Pseudomodel
A Gaussian family is a natural choice of pseudomodel for local analysis, as the true density is locally characterized by a local mean and a local variance-covariance matrix. Below, we provide an interpretation of the Gaussian local density approximation. Next, we consider a Gaussian pseudomodel parametrized by the mean only, and show the relationship between the l.p.f. estimator and two well-known nonparametric estimators of regression and density, respectively.
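For a Gaussian pseudomodel indexed by mean $m$ and variance $\Sigma$, we have
$$\frac{\partial \log f(y;m,\Sigma)}{\partial y} = -\Sigma^{-1}(y-m), \qquad \frac{\partial^2 \log f(y;m,\Sigma)}{\partial y\,\partial y'} = -\Sigma^{-1}.$$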
Thus, the approximation associated with a Gaussian pseudofamily is the standard one, where the partial derivatives of the log-density are replaced by a family of hyperplanes parallel to the tangent hyperplanes. These tangent hyperplanes are not independently defined, due to the Schwarz equality
$$\frac{\partial^2 \log f_0(y)}{\partial y_i\,\partial y_j} = \frac{\partial^2 \log f_0(y)}{\partial y_j\,\partial y_i}, \qquad i, j = 1, \dots, d.$$
The Schwarz equalities are automatically satisfied by the approximated densities because of the symmetry of the matrix $\Sigma$.
(ii) Gaussian Pseudomodel Parametrized by the Mean and Gaussian Kernel
Let us consider a Gaussian kernel of dimension $d$: $K(u) = \prod_{j=1}^{d}\phi(u_j)$, where $\phi$ denotes the pdf of the standard normal $N(0,1)$.
Proposition 2.4. The l.p.f. estimator for a Gaussian pseudomodel parametrized by the mean and with a Gaussian kernel can be written as where is the Nadaraya-Watson estimator of the conditional mean , and is the Gaussian kernel estimator of the unknown value of the true marginal pdf at .
Proof. See Appendix B.
In this special case, the asymptotic properties of follow directly from the asymptotic properties of and . In particular, converges to , when and tend to infinity and zero, respectively, with .
Alternatively, the asymptotic behavior can be inferred from the Nadaraya-Watson estimator [10, 11] in the degenerate case when the regressor and the regressand are identical. Section 5 will show that similar relationships are asymptotically valid for non-Gaussian pseudofamilies.
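The link between the l.p.f. estimator and the two kernel estimators can be checked numerically in a simple case. The sketch below rests on our own derivation for univariate data, assuming the pseudomodel $N(\mu, 1)$ with unit variance. It illustrates the kind of relationship stated in Proposition 2.4 and need not coincide with the exact expression of the proposition. With a Gaussian kernel, the normalizing integral in the empirical objective equals the $N(\mu, 1+h^2)$ density at $c$. The first-order condition then gives $\hat\mu(c) = c + (1 + 1/h^2)(\hat m(c) - c)$, where $\hat m(c)$ is the Nadaraya-Watson estimator with regressor and regressand both equal to $Y_t$. Since $\hat m(c) - c = h^2 \hat f'(c)/\hat f(c)$ for a Gaussian kernel, the same quantity equals $c + (1+h^2)\hat f'(c)/\hat f(c)$. The log-derivative estimate is $\hat\mu(c) - c$.

```python
import math
import random

def phi(u):
    """Standard normal pdf."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def gaussian_kernel_local_mean(sample, c, h):
    """Local estimate of mu(c) for a N(mu, 1) pseudomodel and a Gaussian kernel
    (univariate), computed two ways:
      via_nw  = c + (1 + 1/h^2) * (m_hat(c) - c),  Nadaraya-Watson, regressand = regressor
      via_kde = c + (1 + h^2) * fhat'(c) / fhat(c), Gaussian kernel density estimator
    """
    n = len(sample)
    u = [(y - c) / h for y in sample]
    w = [phi(v) for v in u]
    m_hat = sum(wi * y for wi, y in zip(w, sample)) / sum(w)
    f_hat = sum(w) / (n * h)
    df_hat = sum(v * wi for v, wi in zip(u, w)) / (n * h * h)  # derivative of f_hat in c
    via_nw = c + (1.0 + 1.0 / h ** 2) * (m_hat - c)
    via_kde = c + (1.0 + h ** 2) * df_hat / f_hat
    return via_nw, via_kde

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(5000)]
c = -1.0
mu_nw, mu_kde = gaussian_kernel_local_mean(data, c=c, h=0.5)
print(mu_nw - c, mu_kde - c)  # both estimate d log f0(-1)/dy = 1 for N(0, 1)
```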
2.4. Pseudodensity over a Tail Interval
Instead of using the local parameter function and calibrating the pseudodensity locally about a value, one could calibrate the pseudodensity over an interval in the tail. (We thank an anonymous referee for this suggestion.) More precisely, we could define a pseudo-true parameter value
$$\tilde\theta(c) = \arg\max_{\theta}\left\{ E_0\left[\mathbb{1}_{Y>c}\log f(Y;\theta)\right] - E_0\left[\mathbb{1}_{Y>c}\right]\log S(c;\theta)\right\},$$
where $S(c;\theta) = \int_c^{\infty} f(y;\theta)\,dy$ denotes the survival function, and consider an approximation of the true distribution over a tail interval $(c, +\infty)$. From a theoretical point of view, this approach can be criticized, as it provides different approximations of $f_0(y)$ depending on the selected value of $c$, $c < y$.
3. From Marginal to Conditional Analysis
Section 2 described the local approach to marginal density estimation. Let us now show the passage from the marginal to conditional density analysis and the application to the conditional VaR.
3.1. General Approach to VaR Computation
The VaR analysis concerns the future return on a given portfolio. Let $y_t$ denote the return on that portfolio at date $t$. In practice, the prediction of $y_t$ is based on a few summary statistics computed from past observations, such as a lagged portfolio return, realized market volatility, or realized idiosyncratic volatility in a previous period. The application of our method consists in approximating locally the joint density of the series $Y_t$, whose first component is $y_t$ and whose second component contains the summary statistics. Next, from the marginal density of $Y_t$, that is, the joint density of $y_t$ and the summary statistics, we derive the conditional density of $y_t$ given the summary statistics, and the conditional VaR.
The joint density is approximated locally about a vector of two components, $c = (c_1, c_2)$. The first component $c_1$ is a tail value of portfolio returns, such as the 5% quantile of the historical distribution of portfolio returns, if the conditional VaR at a small loss probability needs to be found. The second component $c_2$ is the value of the conditioning set, which is fixed, for example, at the last observed value of the summary statistics. Due to the difference in interpretation, the bandwidths for the portfolio return and for the summary statistics need to be different.
The approach above does not suffer from the curse of dimensionality. Indeed, in practice, the portfolio return is univariate and the number of summary statistics is small (often fewer than 3), while the number of observations is sufficiently large (250 per year) for a daily VaR.
3.2. Gaussian Pseudofamily
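When the pseudofamily is Gaussian, the local approximation of the joint density is characterized by the local mean and variance-covariance matrix. For $c = (c_1, c_2)$, with index 1 referring to the portfolio return and index 2 to the summary statistics, these moments are decomposed by blocks as follows:
$$\mu(c) = \begin{pmatrix}\mu_1(c)\\ \mu_2(c)\end{pmatrix}, \qquad \Sigma(c) = \begin{pmatrix}\Sigma_{11}(c) & \Sigma_{12}(c)\\ \Sigma_{21}(c) & \Sigma_{22}(c)\end{pmatrix}. \qquad (3.1)$$
The local conditional first- and second-order moments are functions of these joint moments:
$$\mu_{1|2}(c) = \mu_1(c) + \Sigma_{12}(c)\,\Sigma_{22}(c)^{-1}\left(c_2 - \mu_2(c)\right), \qquad (3.2)$$
$$\Sigma_{1|2}(c) = \Sigma_{11}(c) - \Sigma_{12}(c)\,\Sigma_{22}(c)^{-1}\,\Sigma_{21}(c). \qquad (3.3)$$
When the portfolio return is univariate, these local conditional moments can be used as inputs in the basic Gaussian VaR formula (1.2).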
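For a scalar return and a single summary statistic, the local conditional moments and the Gaussian VaR reduce to a few arithmetic steps: $\mu_{1|2} = \mu_1 + (\sigma_{12}/\sigma_{22})(c_2 - \mu_2)$, $\sigma^2_{1|2} = \sigma_{11} - \sigma_{12}^2/\sigma_{22}$, and $\mathrm{VaR} = \mu_{1|2} + \sigma_{1|2}\Phi^{-1}(\alpha)$. The sketch below assumes that the local moments have already been estimated at $c$. All names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def local_conditional_var(mu1, mu2, s11, s12, s22, c2, alpha):
    """Scalar Gaussian conditioning followed by the Gaussian VaR.

    mu1, mu2: local means of the return and of the summary statistic;
    s11, s12, s22: local (co)variances, all estimated at c = (c1, c2).
    Returns (conditional mean, conditional variance, VaR).
    """
    beta = s12 / s22                      # Sigma_12 Sigma_22^{-1}
    m_cond = mu1 + beta * (c2 - mu2)      # local conditional mean
    v_cond = s11 - beta * s12             # local conditional variance
    if v_cond <= 0.0:
        raise ValueError("local covariance matrix is not positive definite")
    var = m_cond + math.sqrt(v_cond) * NormalDist().inv_cdf(alpha)
    return m_cond, v_cond, var

print(local_conditional_var(-0.002, 0.0, 4e-6, 1e-6, 2e-6, c2=0.001, alpha=0.01))
```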
The method is convenient for practitioners, as it allows them to keep using the misspecified Gaussian VaR formula. The only modifications are the inputs, which become the local conditional mean and variance in the tail. These are easy to calculate from the closed-form expressions given above.
Even though the theoretical approach is nonparametric, its practical implementation is semiparametric. This is because, once an appropriate location $c$ has been selected, the local pseudodensity estimated at $c$ is used to calculate any VaR in the tail. Therefore, the procedure can be viewed as a model building method in which the two benchmark loss probabilities are arbitrarily close. Compared with other model building approaches, it allows for choosing a location with more data points in its neighborhood than the quantile of interest.
4. Application to Value-at-Risk
The nonparametric feature of our localized approach requires a sufficient number of observations in a neighborhood of the selected $c$. This requirement is easily satisfied when high-frequency data are used and an intraday VaR is computed. We first consider an application of this type. It is followed by a Monte Carlo study, which provides information on the properties of the estimator when the number of observations is about 200, the sample size used in practice for computing the daily VaR.
4.1. Comparative Study of Portfolios
We apply the local conditional mean and variance approach to intraday data on financial returns and calculate the intraday Value-at-Risk. The financial motivation for intraday risk analysis is that banks carry out internal control of trading desks and portfolio management continuously. This is due to algorithmic trading, which implements automatic portfolio management based on high-frequency data. Also, the BCBS [2, page 3] suggests that a weakness of the current (daily) risk measure is that it is based on end-of-day positions and disregards intraday trading risk. It is known that intraday stock price variation can often be as high as the variation of market closing prices over 5 to 6 consecutive days.
Our analysis concerns two stocks traded on the Toronto Stock Exchange, the Bank of Montreal (BMO) and the Royal Bank (ROY), from October 1 to October 31, 1998, and all portfolios with nonnegative allocations in these two stocks. This analysis under the no-short-sell constraint suffices to show that the allocations of the least risky portfolios differ depending on the method of VaR computation.
From the tick-by-tick data, we select stock prices at a sampling interval of two minutes and compute the two-minute returns. The data contain a large proportion of zero price movements. These are not deleted from the sample, because the current portfolio values have to be computed from the most recent trading prices.
The BMO and ROY sample consists of 5220 observations on both returns from October 1 to October 31, 1998. Both series have means equal to zero. The standard deviations are 0.0015 and 0.0012 for BMO and ROY, respectively. To detect the presence of fat tails, we calculate the kurtosis, which is 5.98 for BMO and 3.91 for ROY, and the total range, which is 0.0207 for BMO and 0.0162 for ROY. With an interquartile range of 0.0007 in both samples, the total range is approximately 30 (for BMO) and 23 (for ROY) times the interquartile range.
The objective is to compute the VaR for any portfolio that contains these two assets. Therefore, $Y_t$ has two components, the current returns $Y_{1,t}$ and the lagged returns $Y_{2,t}$, each of which is a bivariate vector. We are interested in finding a local Gaussian approximation of the conditional distribution of $Y_{1,t}$ given $Y_{2,t}$ in a neighborhood of values $c_1$ of $Y_{1,t}$ and $c_2$ of $Y_{2,t}$. This does not mean that the conditional distribution itself is Gaussian. We fix $c_2 = (0, 0)'$. Because a zero return is generally due to nontrading, by conditioning on zero past returns we investigate the occurrence of extreme price variations after a nontrading period. As a significant proportion of returns is equal to zero, we eliminate smoothing with respect to these conditioning values in our application.
The local conditional mean and variance estimators were computed from formulae (3.2)-(3.3) for values of $c_1$ equal to the 90% upper percentiles of the sample distribution of each return on the dates preceded by zero returns. The bandwidth for the current returns was fixed proportionally to the difference between the 10% and 1% quantiles.
The localized moments can be compared to the global conditional moments of the returns, that is, the moments of the current returns given zero lagged returns computed from the whole sample. The conditional distribution of the current returns given zero lagged returns has a sharp peak at zero. It is therefore no surprise that the global conditional moment estimators based on the whole sample lead to smaller Values-at-Risk than the localized ones. More precisely, for loss probability 5% and a portfolio with allocations $a$ and $1-a$, $a \in [0,1]$, in the two assets, the Gaussian VaR is given by
$$\mathrm{VaR}(5\%; a) = w'\mu + \Phi^{-1}(5\%)\left(w'\Sigma\,w\right)^{1/2}, \qquad w = (a, 1-a)',$$
where $\mu$ and $\Sigma$ are the (localized or global) conditional mean and variance-covariance matrix. This VaR determines the required capital reserve for loss probability 5%. Figure 1 presents the Values-at-Risk computed from the localized and unlocalized conditional moments for all admissible portfolios with nonnegative allocations. The proportion invested in BMO is measured on the horizontal axis.
As expected, the localized VaR lies far above the unlocalized one, which means that the localized VaR implies a larger required capital reserve. We also note that, under the unlocalized VaR, the least risky portfolio contains equal allocations in both assets. In contrast, the localized measure suggests investing the whole portfolio in a single asset to avoid extreme risks (under the no-short-sell constraint).
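The allocation comparison in Figure 1 can be mimicked with a short sketch. It evaluates the Gaussian VaR $w'\mu + \Phi^{-1}(\alpha)(w'\Sigma w)^{1/2}$ of the portfolio $w = (a, 1-a)'$ over a grid of allocations and picks the least risky one. Because VaR is a return quantile here, the least risky allocation is the one with the largest (least negative) VaR. The moments in the usage line are hypothetical, not the estimates of the paper.

```python
import math
from statistics import NormalDist

def portfolio_var(a, mu, sigma, alpha=0.05):
    """Gaussian VaR of the portfolio w = (a, 1 - a): w'mu + Phi^{-1}(alpha) * sqrt(w' Sigma w)."""
    w = (a, 1.0 - a)
    mean = w[0] * mu[0] + w[1] * mu[1]
    variance = sum(w[i] * sigma[i][j] * w[j] for i in range(2) for j in range(2))
    return mean + NormalDist().inv_cdf(alpha) * math.sqrt(variance)

def least_risky_allocation(mu, sigma, alpha=0.05, steps=100):
    """Grid search over a in [0, 1] (no short sales). VaR is a return quantile,
    so the least risky allocation maximizes it (smallest loss)."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: portfolio_var(a, mu, sigma, alpha))

# Hypothetical moments: equal means and variances, zero covariance.
print(least_risky_allocation((0.0, 0.0), [[1e-6, 0.0], [0.0, 1e-6]]))  # 0.5
```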
4.2. Monte-Carlo Study
The previous application was based on a large number of observations (more than 5000) on trades in October 1998 and on a risk level of 5%. It is natural to compare the performance of the new method with other methods of VaR computation for smaller samples and for a smaller risk level of 1%. Examples are samples of 200 (resp., 400) observations, which correspond to one year (resp., two years) of daily returns.
A univariate series of 1000 simulated portfolio returns is generated from an ARCH(1) model with i.i.d. errors following a double exponential (Laplace) distribution. The error distribution has exponential tails that are slightly heavier than the tails of a Gaussian distribution. The data generating process is assumed to be unknown to the person who estimates the VaR. In practice, that person will apply a method based on a misspecified model, such as the i.i.d. Gaussian model of returns in the Gaussian variance-covariance method, or the IGARCH model of squared returns by J. P. Morgan with an ad hoc fixed parameter 0.94. Such a procedure leads to either biased or inefficient estimators of the VaR level.
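The simulation design can be sketched as follows. The parameter values and the unit-variance scaling of the Laplace errors are our own hypothetical choices, since the paper's values are not reproduced here. For a unit-variance Laplace law, the lower-tail quantile is $\ln(2p)/\sqrt{2}$ for $p < 1/2$. The true conditional 1% VaR is therefore the conditional standard deviation times $\ln(0.02)/\sqrt{2} \approx -2.77$.

```python
import math
import random

def laplace_unit_variance(rng):
    """Double exponential draw with mean 0 and variance 1 (scale 1/sqrt(2))."""
    magnitude = rng.expovariate(math.sqrt(2.0))  # rate sqrt(2) -> mean 1/sqrt(2)
    return magnitude if rng.random() < 0.5 else -magnitude

def laplace_quantile(p):
    """Lower-tail quantile (p < 1/2) of the unit-variance Laplace law."""
    return math.log(2.0 * p) / math.sqrt(2.0)

def simulate_arch1(n, omega, beta, seed=0):
    """Simulate y_t = sqrt(omega + beta * y_{t-1}^2) * eps_t with Laplace eps_t.

    Also returns the true conditional 1% VaR of each y_t given y_{t-1}.
    """
    rng = random.Random(seed)
    q01 = laplace_quantile(0.01)
    y_prev, ys, true_var = 0.0, [], []
    for _ in range(n):
        sigma = math.sqrt(omega + beta * y_prev ** 2)
        true_var.append(sigma * q01)
        y_prev = sigma * laplace_unit_variance(rng)
        ys.append(y_prev)
    return ys, true_var

ys, true_var = simulate_arch1(1000, omega=1.0, beta=0.3)  # hypothetical parameters
print(round(laplace_quantile(0.01), 3))  # -2.766
```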
The following methods of VaR computation at risk level of 1% are compared. Methods 1 to 4 are based on standard routines used in banks, while method 5 is the one proposed in this paper.
(1) The historical simulation based on a rolling window of 200 observations. We will see later (Figure 2) that this approach results in heavy smoothing with respect to time. A larger bandwidth would entail even more smoothing.
(2) The Gaussian variance-covariance approach based on the same window.
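(3) The IGARCH-based method by J. P. Morgan: $\widehat{\mathrm{VaR}}_t = \hat\sigma_t\,\Phi^{-1}(1\%)$, with $\hat\sigma_t^2 = 0.94\,\hat\sigma_{t-1}^2 + 0.06\,y_{t-1}^2$, where $y_{t-1}$ is the lagged return.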
(4) Two conditional ARCH-based procedures that consist of the following steps. First, we use a subset of observations to estimate an ARCH(1) model, $y_t = (a_0 + a_1 y_{t-1}^2)^{1/2}\,\varepsilon_t$, where the $\varepsilon_t$ are i.i.d. with an unknown distribution. Then the parameters $a_0$ and $a_1$ are estimated by quasi-maximum likelihood, and the residuals $\hat\varepsilon_t = y_t/(\hat a_0 + \hat a_1 y_{t-1}^2)^{1/2}$ are computed. From the residuals we infer the empirical 1% quantile, $\hat q$, say. The VaR is computed as $\hat q\,(\hat a_0 + \hat a_1 y_{t-1}^2)^{1/2}$. We observe that the ARCH parameter estimators are very inaccurate, which is due to the exponential tails of the error distribution. Two subsets of data were used to estimate the ARCH parameters and the 1% quantile. The estimator values based on a sample of 200 observations are . The estimator values based on a sample of 800 observations are . We find that the ratios are quite far from the true value used to generate the data, which is likely due to fat tails.
(5) Localized VaR.
We use a Gaussian pseudofamily, a Gaussian kernel, and two different bandwidths for the current and lagged values of returns, respectively. The bandwidths were set proportional to the difference between the 10% and 1% quantiles, and the bandwidth for the lagged return is 4 times the bandwidth for the current return. Their values are 1.16 and 4.64, respectively. We use a Gaussian kernel (resp., a simple bandwidth) instead of an optimal kernel (resp., an optimal bandwidth) for the sake of robustness. Indeed, an optimal approach may not be sufficiently robust for fixing the required capital. The threshold $c$ is set equal to the 3% quantile of the marginal empirical distribution. The localized VaRs are computed by rolling with a window of 400 observations.
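The ARCH-based procedure (4) can be sketched as follows. This is a minimal illustration under stated assumptions. It uses an ARCH(1) with parameters labeled `omega` and `beta` (our labels), and it replaces a proper numerical optimizer by a brute-force grid search over hypothetical grids. As described above, it estimates once and then scales the empirical 1% residual quantile by the predicted volatility. The simulated data use hypothetical parameter values.

```python
import math
import random

def arch1_qml_grid(ys, omegas, betas):
    """Gaussian quasi-ML for y_t = sqrt(omega + beta * y_{t-1}^2) * eps_t.

    Maximizes the Gaussian quasi-log-likelihood by brute-force grid search,
    a crude stand-in for a numerical optimizer.
    """
    best, best_ll = None, -math.inf
    for om in omegas:
        for be in betas:
            ll = 0.0
            for prev, cur in zip(ys[:-1], ys[1:]):
                s2 = om + be * prev * prev
                ll -= 0.5 * (math.log(s2) + cur * cur / s2)
            if ll > best_ll:
                best, best_ll = (om, be), ll
    return best

def arch_var(ys, omega, beta, alpha=0.01):
    """VaR of the return following ys[-1]: empirical alpha-quantile of the
    standardized residuals times the predicted conditional volatility."""
    resid = sorted(cur / math.sqrt(omega + beta * prev * prev)
                   for prev, cur in zip(ys[:-1], ys[1:]))
    q_hat = resid[max(1, math.ceil(alpha * len(resid))) - 1]
    return q_hat * math.sqrt(omega + beta * ys[-1] ** 2)

# Simulated ARCH(1) data with unit-variance Laplace errors (hypothetical omega=1.0, beta=0.3).
rng = random.Random(1)
ys, prev = [], 0.0
for _ in range(800):
    eps = rng.expovariate(math.sqrt(2.0)) * (1.0 if rng.random() < 0.5 else -1.0)
    prev = math.sqrt(1.0 + 0.3 * prev * prev) * eps
    ys.append(prev)

om_hat, be_hat = arch1_qml_grid(ys, [0.25 * i for i in range(1, 13)],
                                [0.05 * i for i in range(13)])
print(om_hat, be_hat, arch_var(ys, om_hat, be_hat))
```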
For each method, Figures 2, 3, 4, 5, 6 and 7 report the evolution of the true VaR corresponding to the data generating model along with the evolution of the estimated VaR. For clarity, only 200 data points are plotted.
The true VaR features persistence and admits extreme values. The rolling methods, such as the historical simulation and the variance-covariance method, produce stepwise patterns of VaR, as already noted, for example, by Hull and White. These patterns result from the i.i.d. assumption that underlies the computations. The J. P. Morgan IGARCH approach creates spurious long memory in the estimated VaR and is unable to recover the dynamics of the true VaR series. The comparison of the two ARCH-based VaRs shows that the estimated paths strongly depend on the estimated ARCH coefficients. When the estimators are based on 200 observations, we observe excess smoothing. When the estimators are based on 800 observations, the model recovers the general pattern but overestimates the VaR when it is small and underestimates it when it is large. The outcomes of the localized VaR method are similar to those of the second ARCH model, with a weaker tendency to overestimate the VaR when it is small.
The comparison of the different approaches shows the good mimicking properties of the ARCH-based methods and of the localized VaR. However, these methods also need to be compared with respect to their tractability. It is important to note that the ARCH parameters were estimated only once and were kept fixed for future VaR computations. The approach would become very time consuming if the ARCH model were re-estimated at each point in time. In contrast, it is very easy to update the localized VaR regularly.
5. Properties of the Estimator of the Local Parameter Function
5.1. Asymptotic Properties
In this section, we discuss the asymptotic properties of the local pseudomaximum likelihood estimator under the following strict stationarity assumption.
Assumption 5.1. The process $(Y_t)$ is strictly stationary, with marginal pdf $f_0$.
Let us note that the strict stationarity assumption is compatible with nonlinear dynamics, such as in ARCH-GARCH models, stochastic volatility models, and so forth. All proofs are gathered in the Appendices.
The asymptotic properties of the local P.M.L. estimator of $\theta(c)$ are derived along the following lines. First, we find the asymptotic equivalents of the objective function and of the estimator that depend only on a limited number of kernel estimators. Next, we derive the properties of the local P.M.L. estimator from the properties of these basic kernel estimators. The assumptions for the existence and asymptotic normality of the basic kernel estimators for multivariate dependent observations can be found in the literature (see the study by Bosq in ). We therefore list in detail only the additional assumptions that are necessary to ensure the asymptotic equivalence. The results are derived under the assumption that $\theta$ is exactly identified, $p = d$ (see Assumptions 5.2 and 5.3). (In the overidentified case $p > d$, the asymptotic analysis can be performed by considering higher-order terms in the expansion of the objective function (see Appendix A), which is beyond the scope of this paper.)
Let us introduce the additional assumptions.
Assumption 5.2. The parameter set $\Theta$ is a compact set and $p = d$.
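Assumption 5.3. (i) There exists a unique solution $\theta(c)$ of the system of equations:
$$\frac{\partial \log f(c;\theta)}{\partial y} = \frac{\partial \log f_0(c)}{\partial y},$$
and this solution belongs to the interior of $\Theta$.
(ii) The matrix $\dfrac{\partial^2 \log f(c;\theta(c))}{\partial y\,\partial\theta'}$ is nonsingular.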
Assumption 5.4. The following kernel estimators are strongly consistent: (i)(ii)(iii)
Assumption 5.5. In any neighborhood of $\theta(c)$, the third-order derivatives, $\theta$ varying, are dominated by a function such that is integrable.
Proof. See Appendix C.
It is possible to replace Assumption 5.4 by sufficient assumptions concerning directly the kernel, the true density function $f_0$, the bandwidth $h$, and the process $(Y_t)$. In particular, it is common to assume that the process is geometrically strong mixing, together with a rate condition on the bandwidth $h$ as $T$ tends to infinity (see [13–15]).
Proposition 5.7. Under Assumptions 5.1–5.5 the local pseudomaximum likelihood estimator is asymptotically equivalent to the solution of the equation: where: is the Nadaraya-Watson estimator of based on the kernel .
Proof. See Appendix D.
Therefore the asymptotic distribution of may be derived from the properties of , which are the properties of the Nadaraya-Watson estimator in the degenerate case when the regressand and the regressor are identical. Under standard regularity conditions , the numerator and denominator of have the following asymptotic properties.
Assumption 5.8. If , , , and , we have the limiting distribution
The formulas of the first- and second-order asymptotic moments are easy to verify (see Appendix E). (Assumption 5.8 is implied by sufficient conditions concerning the kernel, the process, and so on. In particular, it requires some conditions on the multivariate distribution of the process, such as where denotes the joint p.d.f. of and the associated product of marginal distributions, and , where denotes the joint p.d.f. of .) Note that the numerator converges at a slower rate than the denominator, because we study a degenerate case in which the Nadaraya-Watson estimator is applied to a regression whose regressor equals the regressand.
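To make the degenerate case concrete, the sketch below computes the numerator and the denominator of a Nadaraya-Watson estimator of E[Y | Y = x] in which the regressor and the regressand are the same series. It is a minimal univariate sketch under our own assumptions (a Gaussian kernel, a hand-picked bandwidth `h`, and the hypothetical helper name `nw_degenerate`), not the paper's code. The estimate itself collapses to x as h shrinks; the information about the shape of the density sits in the centred numerator, which is of order h². Rescaling it by h² (so that, with a kernel of unit variance, it targets f'(x)) inflates its noise relative to the denominator, which is one way to see the slower rate mentioned above.

```python
import math
from typing import Sequence, Tuple


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_degenerate(sample: Sequence[float], x: float, h: float) -> Tuple[float, float, float]:
    """Nadaraya-Watson estimate of E[Y | Y = x] when regressor and regressand coincide.

    Returns (numerator, denominator, estimate) with
      denominator = (1/(T h)) sum_t K((y_t - x)/h)              (kernel density estimate at x)
      numerator   = (1/(T h)) sum_t K((y_t - x)/h) (y_t - x)    (centred numerator)
      estimate    = x + numerator / denominator
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    T = len(sample)
    weights = [gaussian_kernel((y - x) / h) for y in sample]
    denominator = sum(weights) / (T * h)
    numerator = sum(w * (y - x) for w, y in zip(weights, sample)) / (T * h)
    return numerator, denominator, x + numerator / denominator


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # simulated N(0, 1) returns
    x = -2.0  # a point in the left tail
    for h in (1.0, 0.5, 0.25):
        num, den, est = nw_degenerate(data, x, h)
        # est drifts towards x as h shrinks; num / h**2 carries the slope information
        print(f"h={h:<5} estimate={est:.4f}  numerator/h^2={num / h**2:.4f}  density={den:.4f}")
```

The design choice of returning the numerator and denominator separately mirrors the way their limiting distributions are stated separately above.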
We deduce that the asymptotic distribution of is equal to the asymptotic distribution of which is .
By the δ-method, we find the asymptotic distribution of the local pseudomaximum likelihood estimator and the asymptotic distribution of the estimator of the log-derivative of the true p.d.f.
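As a reminder of the mechanics of the δ-method: if √T(β̂ − β) is asymptotically N(0, V) and g is differentiable, then √T(g(β̂) − g(β)) is asymptotically N(0, J V J′), where J is the Jacobian of g at β. The sketch below uses a central-difference Jacobian so that it works for any smooth mapping. The Gaussian log-derivative mapping `gaussian_log_derivative` and the identity covariance in the demo are hypothetical inputs chosen only for illustration; they are not quantities taken from the paper.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def numerical_jacobian(g: Callable[[Sequence[float]], List[float]],
                       beta: Sequence[float], eps: float = 1e-6) -> Matrix:
    """Central-difference Jacobian (q x p) of g: R^p -> R^q evaluated at beta."""
    base = g(beta)
    jac = [[0.0] * len(beta) for _ in base]
    for j in range(len(beta)):
        up, down = list(beta), list(beta)
        up[j] += eps
        down[j] -= eps
        g_up, g_down = g(up), g(down)
        for i in range(len(base)):
            jac[i][j] = (g_up[i] - g_down[i]) / (2.0 * eps)
    return jac


def delta_method_cov(g: Callable[[Sequence[float]], List[float]],
                     beta: Sequence[float], cov: Matrix) -> Matrix:
    """If sqrt(T)(beta_hat - beta) -> N(0, cov), return J cov J', the asymptotic
    covariance of sqrt(T)(g(beta_hat) - g(beta)), with J the Jacobian of g at beta."""
    J = numerical_jacobian(g, beta)
    p = len(beta)
    JV = [[sum(row[k] * cov[k][j] for k in range(p)) for j in range(p)] for row in J]
    return [[sum(JV[i][k] * J[m][k] for k in range(p)) for m in range(len(J))]
            for i in range(len(J))]


def gaussian_log_derivative(theta: Sequence[float], x0: float = 1.0) -> List[float]:
    """Hypothetical example: for the pseudofamily N(m, s2), the log-derivative
    d/dy log f(y; m, s2) evaluated at y = x0 equals -(x0 - m) / s2."""
    m, s2 = theta
    return [-(x0 - m) / s2]


if __name__ == "__main__":
    cov_theta = [[1.0, 0.0], [0.0, 1.0]]  # placeholder covariance, not an estimate
    print(delta_method_cov(gaussian_log_derivative, [0.0, 2.0], cov_theta))
```

When analytic derivatives of the mapping are available, they can replace `numerical_jacobian` without changing `delta_method_cov`.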
The first-order asymptotic properties of the estimator of the log-derivative of the density function do not depend on the pseudofamily, whereas the value of the estimator does. (It is beyond the scope of this paper to discuss the effect of the pseudofamily when the dimension is strictly larger than . Nevertheless, by analogy to the literature on local estimation of nonparametric regression and density functions (see, e.g., the discussion in ), we expect the finite sample bias of the associated density estimator to diminish when the pseudofamily is enlarged, that is, when the dimension of the pseudoparameter vector increases.) For a univariate process , the functional estimator of the log-derivative may be compared to the standard estimator where is the derivative of the kernel of the standard estimator. The standard estimator has the same rate of convergence as the estimator introduced in this paper, and the following asymptotic distribution: The asymptotic distributions of the two estimators of the log-derivative of the density function are in general different, except when , which arises in particular when the kernel is Gaussian. In that case, the asymptotic distributions of the two estimators are identical.
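For the univariate case, the standard estimator of the log-derivative is the ratio f̂′(x)/f̂(x) of a kernel density estimate and its derivative. The sketch below assumes a Gaussian kernel, for which K′(u) = −uK(u), and uses hypothetical function names. With that kernel the ratio is algebraically equal to (m̂(x) − x)/h², where m̂(x) is the degenerate Nadaraya-Watson estimate of E[Y | Y = x]; this identity is offered only as intuition for the remark on the Gaussian kernel above, not as a reproduction of the paper's estimator.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gaussian_kernel_derivative(u: float) -> float:
    """K'(u) = -u K(u) for the standard normal kernel."""
    return -u * gaussian_kernel(u)


def kde(sample: Sequence[float], x: float, h: float) -> float:
    """Kernel density estimate f_hat(x) = (1/(T h)) sum_t K((x - y_t)/h)."""
    return sum(gaussian_kernel((x - y) / h) for y in sample) / (len(sample) * h)


def log_derivative_standard(sample: Sequence[float], x: float, h: float) -> float:
    """Standard ratio estimator f_hat'(x) / f_hat(x)."""
    f_prime = sum(gaussian_kernel_derivative((x - y) / h) for y in sample) / (len(sample) * h * h)
    return f_prime / kde(sample, x, h)


def log_derivative_nw(sample: Sequence[float], x: float, h: float) -> float:
    """(m_hat(x) - x) / h^2, with m_hat the degenerate Nadaraya-Watson estimate."""
    w = [gaussian_kernel((y - x) / h) for y in sample]
    m_hat = sum(wi * y for wi, y in zip(w, sample)) / sum(w)
    return (m_hat - x) / (h * h)


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    # For N(0, 1) data the true log-derivative at x is -x.
    print(log_derivative_standard(data, -2.0, 0.4), log_derivative_nw(data, -2.0, 0.4))
```

With a non-Gaussian kernel, K′(u) is no longer proportional to uK(u), so the two expressions generally differ.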
5.2. Asymptotic versus Finite Sample Properties
In kernel-based estimation methods, the asymptotic distributions of estimators do not depend on serial dependence and are computed as if the data were i.i.d. However, serial dependence affects the finite sample properties of estimators and the accuracy of the theoretical approximation. Pritsker [16] (see also Conley et al. [17]) illustrates this point by considering the finite sample properties of Aït-Sahalia's test [18] of continuous-time models of the short-term interest rate, applied to data generated by the Vasicek model.
The impact of serial correlation depends on the parameter of interest, in particular on whether this parameter characterizes the marginal or the conditional density. This problem is not specific to kernel-based approaches; it also arises in other methods, such as OLS. To see this, consider a simple autoregressive model , where is IIN. The expected value of is commonly estimated by the empirical mean, which has asymptotic variance , where . In contrast, the autoregressive coefficient is estimated by and has asymptotic variance .
If serial dependence is disregarded, the two estimators and have similar asymptotic efficiencies, namely and , respectively. However, when tends to one while remains fixed, the variance of tends to infinity, whereas the variance of tends to zero. This simple example shows that omitting serial dependence does not have the same effect on marginal parameters as on conditional ones. The problems considered by Conley et al. [17] and Pritsker [16] concern the marginal (long-run) distribution of , while our application focuses on a conditional parameter, namely the conditional VaR. This parameter is derived from the analysis of the joint pdf , just as in the previous example was derived from the bivariate vector . Because of the cointegration between and in the case of extreme persistence, we can reasonably expect the estimator of the conditional VaR to have good finite sample properties, even when the point estimators do not. The example shows that, in finite samples, the estimator of a conditional parameter can even perform better than predicted under the i.i.d. assumption.
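The contrast in the autoregressive example can be checked numerically. The sketch below assumes the parameterization y_t = μ + ρ(y_{t−1} − μ) + ε_t with ε_t ~ IIN(0, σ²), which may differ from the paper's notation, and uses standard AR(1) results for √T times the estimation error: the long-run variance σ²/(1 − ρ)² of the empirical mean, the marginal variance σ²/(1 − ρ²) that would be used if the observations were treated as i.i.d., and the variance 1 − ρ² of the OLS estimator of ρ. A small Monte Carlo with a fixed seed compares the first and third formulas with simulated variances; all function names are our own.

```python
import random
from typing import List, Tuple


def asymptotic_variances(rho: float, sigma2: float = 1.0) -> Tuple[float, float, float]:
    """Standard AR(1) results, |rho| < 1, for sqrt(T) times the estimation error:
    (mean with serial dependence, mean if treated as i.i.d., OLS estimator of rho)."""
    return sigma2 / (1.0 - rho) ** 2, sigma2 / (1.0 - rho ** 2), 1.0 - rho ** 2


def simulate_ar1(T: int, rho: float, mu: float, rng: random.Random) -> List[float]:
    """Simulate T observations with sigma2 = 1, starting in the stationary distribution."""
    y = mu + rng.gauss(0.0, 1.0) / (1.0 - rho ** 2) ** 0.5
    path = [y]
    for _ in range(T - 1):
        y = mu + rho * (y - mu) + rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def ols_rho(path: List[float]) -> float:
    """OLS slope in the regression of y_t on a constant and y_{t-1}."""
    lagged, current = path[:-1], path[1:]
    mx = sum(lagged) / len(lagged)
    mz = sum(current) / len(current)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(lagged, current))
    sxx = sum((a - mx) ** 2 for a in lagged)
    return sxz / sxx


def monte_carlo(rho: float, T: int = 500, reps: int = 400, seed: int = 1) -> Tuple[float, float]:
    """Return T * Var(empirical mean) and T * Var(rho_hat) across replications."""
    rng = random.Random(seed)
    means, rhos = [], []
    for _ in range(reps):
        path = simulate_ar1(T, rho, mu=0.0, rng=rng)
        means.append(sum(path) / T)
        rhos.append(ols_rho(path))

    def var(v: List[float]) -> float:
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    return T * var(means), T * var(rhos)


if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99):
        v_mean, v_iid, v_rho = asymptotic_variances(rho)
        print(f"rho={rho}: mean {v_mean:.1f} (i.i.d. formula {v_iid:.2f}), rho_hat {v_rho:.4f}")
    print("Monte Carlo at rho=0.5 (T*Var(mean), T*Var(rho_hat)):", monte_carlo(0.5))
```

As ρ approaches one with σ² fixed, the first formula grows without bound while the third shrinks to zero, which is the asymmetry between the marginal and the conditional parameter described in the text.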
This paper introduces a local likelihood method of VaR computation for univariate or multivariate data on portfolio returns. Our approach relies on a local approximation of the unknown density of returns by means of a misspecified model. The method allows us to estimate locally the conditional density of returns and to find the local conditional moments, such as the tail mean and tail variance. For a Gaussian pseudofamily, these tail moments can replace the global moments in the standard Gaussian formula used for computing VaRs. Therefore, our method based on the Gaussian pseudofamily is convenient for practitioners, as it justifies computing the VaR from the standard Gaussian formula, although with a different input, which accommodates both the thick tails and the path dependence of financial returns. The Monte Carlo experiments indicate that tail-adjusted VaRs are more accurate than other VaR approximations used in the industry.
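The standard Gaussian formula mentioned above fits in a one-line function. In this minimal sketch, the function name and the sign convention are our own assumptions: the VaR is reported as a positive loss per unit of portfolio value at loss probability α, so that P(R < −VaR) = α for R ~ N(mean, std²). The same formula is called once with global moments and once with tail moments; the numerical inputs are made up for illustration, and in the paper's approach the tail moments would come from the local likelihood fit, which is not reproduced here.

```python
from statistics import NormalDist


def gaussian_var(mean: float, std: float, alpha: float) -> float:
    """Gaussian VaR, as a positive loss, at loss probability alpha.

    For R ~ N(mean, std^2), P(R < -VaR) = alpha gives VaR = -(mean + std * z_alpha),
    where z_alpha is the alpha-quantile of the standard normal distribution.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if std <= 0.0:
        raise ValueError("std must be positive")
    return -(mean + std * NormalDist().inv_cdf(alpha))


if __name__ == "__main__":
    alpha = 0.005
    # Made-up inputs for illustration only (not estimates from the paper):
    global_moments = (0.0, 0.010)   # mean and standard deviation of the return
    tail_moments = (-0.004, 0.012)  # hypothetical local (tail) mean and standard deviation
    print("VaR from global moments:", gaussian_var(*global_moments, alpha))
    print("VaR from tail moments:  ", gaussian_var(*tail_moments, alpha))
```

Multiplying the result by the portfolio value gives the Dollar loss used in the definition of the VaR.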
A. Proof of Proposition 2.2
Let us derive the expansion of the objective function when the bandwidth approaches zero. By using the equivalence (see Assumption (A.1)) where is the trace operator, we find that The result follows.
The expansion above provides a local interpretation of the asymptotic objective function at order as a distance between the first-order derivatives of the logarithms of the pseudo and true pdfs. In this respect, the asymptotic objective function clearly differs from the objective function proposed by Hjort and Jones [7], whose expansion defines an -distance between the true and pseudo pdfs.
B. Proof of Proposition 2.4
For a Gaussian kernel of dimension , we get
We have Moreover we have
C. Consistency
Let us consider the normalized objective function It can be written as where are the residual terms in the expansion. We deduce: Under the assumptions of Proposition 5.7, the residual terms tend almost surely to zero, uniformly on , while the main terms tend almost surely, uniformly on , to which is identical to (see Appendix A).
Then, by the Jennrich theorem [19] and the identifiability condition, we conclude that the estimator exists and is strongly consistent.
D. Asymptotic Equivalence
The main part of the objective function may also be written as We deduce that the local parameter function can be asymptotically replaced by the solution of
E. The First- and Second-Order Asymptotic Moments
Let us restrict the analysis to the numerator term , which is responsible for the nonstandard rate of convergence.
(1) First-Order Moment
(2) Asymptotic Variance
We have which provides the rate of convergence of the standard error. Moreover, the second term of the bias is negligible if or .
(3) Asymptotic Covariance
Finally, we also have to consider:
The authors gratefully acknowledge financial support of NSERC Canada and of the Chair AXA/Risk Foundation: “Large Risks in Insurance”.
[1] Basle Committee on Banking Supervision, An Internal Model-Based Approach to Market Risk Capital Requirements, Basle, Switzerland, 1995.
[2] Basle Committee on Banking Supervision, Overview of the Amendment to the Capital Accord to Incorporate Market Risk, Basle, Switzerland, 1996.
[3] C. Gourieroux and J. Jasiak, “Value-at-risk,” in Handbook of Financial Econometrics, Y. Ait-Sahalia and L. P. Hansen, Eds., pp. 553–609, Elsevier, Amsterdam, The Netherlands, 2009.
[4] J. P. Morgan, RiskMetrics Technical Manual, J.P. Morgan Bank, New York, NY, USA, 1995.
[5] R. F. Engle and S. Manganelli, “CAViaR: conditional autoregressive value at risk by regression quantiles,” Journal of Business & Economic Statistics, vol. 22, no. 4, pp. 367–381, 2004.
[6] C. Gourieroux and J. Jasiak, “Dynamic quantile models,” Journal of Econometrics, vol. 147, no. 1, pp. 198–205, 2008.
[7] N. L. Hjort and M. C. Jones, “Locally parametric nonparametric density estimation,” The Annals of Statistics, vol. 24, no. 4, pp. 1619–1647, 1996.
[8] C. R. Loader, “Local likelihood density estimation,” The Annals of Statistics, vol. 24, no. 4, pp. 1602–1618, 1996.
[9] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1986.
[10] E. A. Nadaraya, “On estimating regression,” Theory of Probability and Its Applications, vol. 10, pp. 186–190, 1964.
[11] G. S. Watson, “Smooth regression analysis,” Sankhyā. Series A, vol. 26, pp. 359–372, 1964.
[12] J. Hull and A. White, “Incorporating volatility updating into the historical simulation for VaR,” The Journal of Risk, vol. 1, pp. 5–19, 1998.
[13] D. Bosq, Nonparametric Statistics for Stochastic Processes, vol. 110 of Lecture Notes in Statistics, Springer, New York, NY, USA, 1996.
[14] G. G. Roussas, “Nonparametric estimation in mixing sequences of random variables,” Journal of Statistical Planning and Inference, vol. 18, no. 2, pp. 135–149, 1988.
[15] G. G. Roussas, “Asymptotic normality of the kernel estimate under dependence conditions: application to hazard rate,” Journal of Statistical Planning and Inference, vol. 25, no. 1, pp. 81–104, 1990.
[16] M. Pritsker, “Nonparametric density estimation and tests of continuous time interest rate models,” Review of Financial Studies, vol. 11, no. 3, pp. 449–487, 1998.
[17] T. G. Conley, L. P. Hansen, and W.-F. Liu, “Bootstrapping the long run,” Macroeconomic Dynamics, vol. 1, no. 2, pp. 279–311, 1997.
[18] Y. Aït-Sahalia, “Testing continuous-time models of the spot interest rate,” Review of Financial Studies, vol. 9, no. 2, pp. 385–426, 1996.
[19] R. I. Jennrich, “Asymptotic properties of non-linear least squares estimators,” The Annals of Mathematical Statistics, vol. 40, pp. 633–643, 1969.