Quantile Regression and Beyond in Statistical Analysis of the DataView this Special Issue
Fully Bayesian Estimation of Simultaneous Regression Quantiles under Asymmetric Laplace Distribution Specification
In this paper, we are interested in estimating several quantiles simultaneously in a regression context via the Bayesian approach. Assuming that the error term has an asymmetric Laplace distribution and using the relation between two distinct quantiles of this distribution, we propose a simple fully Bayesian method that satisfies the noncrossing property of quantiles. For implementation, we use Metropolis-Hastings within Gibbs algorithm to sample unknown parameters from their full conditional distribution. The performance and the competitiveness of the underlying method with other alternatives are shown in simulated examples.
Quantile regression has received increasing attention from both theoretical and empirical points of view. It is a statistical procedure intended to estimate conditional quantile of a response distribution, given a set of covariates, and to conduct inference about them.
In regression, when we deal with highly skewed data distribution, traditional mean regression may not explore interesting aspects of the relationship between the response variable and the available covariates. Furthermore, in the presence of outliers, the evaluation of the response average becomes much more complicated. However, since a set of quantiles often provides a complete description of the response distribution conditionally to the given covariates, quantile regression offers a practical alternative to the traditional mean regression.
Quantile regression was initially introduced by Koenker and Bassett , where inference proceeds by minimizing a check loss function. Based on their theory, many other approaches have been emerged to study the regression quantiles in both frequentist and Bayesian points of view.
In the frequentist framework, different techniques have been developed in the literature to infer regression quantiles. To estimate linear regression quantiles, Koenker and Bassett  used linear programming technique, whereas Gutenbrunner and Jurecková  considered the corresponding dual problem whose solution represents the quantile coefficients estimator. From theoretical point of view, asymptotic normality and consistency properties are proven in Koenker and Bassett , Zhou et al. , and Koenker and Xiao . For nonparametric quantile regression, spline methods are commonly used for quantile regression fitting (see Nychka et al. , Koenker and Portnoy , and Koenker et al. ). Oh et al.  proposed, in univariate and bivariate settings, an approximation of the nonlinear optimization problem by a sequence of penalized least squares type nonparametric mean estimation problems. Koenker and Mizera  explored the estimation of triograms in bivariate setting. Koenker et al.  proposed an additive models estimation method for quantile regression based on selecting the smoothing parameter and constructing confidence bands for nonparametric components, in univariate and bivariate settings. The univariate regression quantile spline is extended to problems with several covariates in He and Ng  and Li et al.  where, in this latter, regression quantiles are considered in reproducing kernel Hilbert spaces (RKHS).
In the Bayesian framework, different methods have been successfully emerged to study quantile regression. This was pioneered by Yu and Moyeed , where the authors specified the Bayesian quantile regression model and assumed that the error terms are iid according to the asymmetric Laplace distribution (ALD), denoted by with being its th quantile, is the scale parameter, and are the asymmetry parameters. They also showed that the Bayesian approach yields a proper full conditional distribution, even for an improper uniform prior on the quantile coefficients and they used the Metropolis-Hastings algorithm for implementation. Further, Kozumi and Kobayashi  and Bernardi et al.  considered that the scale parameter of the ALD is unknown. They used the location-scale mixture representation of the ALD, which enables developing a Gibbs sampling algorithm for the Bayesian quantile regression model. In the literature, we can also find other methods that extended quantile regression model. For censored data, Kobayashi et al.  proposed Tobit quantile regression with endogenous variables and Li and Bradic  developed regression adjustment for local adaptive representation of random forests. In continuous dependent variable case and for variable selection and coefficient estimation, Alhamzawi et al.  used adaptive LASSO quantile regression and, recently, Sayed-Ahmed  applied this approach for small sample sizes, whereas Abbas and Thaher  developed Bayesian adaptive Tobit regression. Benoit and Van den Poel  proposed Bayesian quantile regression methods for binary response data and Alhamzawi and Ali  adapted the quantile regression model to deal with longitudinal ordinal data.
All these methods mentioned above, from both frequentist and Bayesian points of view, studied single regression quantile. However, there are many applications where tackling several quantiles at the same time is needed. For this direction, if one uses methods that deal with single quantile regression to estimate regression quantiles separately, results may not be satisfactory. Indeed, it is known that the separate estimation of a set of quantiles may break the quantile monotonicity property (see He ), which means that quantile curves may cross.
Fortunately, many approaches have been proposed to overcome the crossing problem. In the frequentist framework, Liu and Wu  and Bondell et al.  considered monotonicity constraints for a fixed number of quantiles in both linear and nonparametric cases. Sangnier et al.  proposed a nonparametric method that estimates multiple conditional quantiles simultaneously when assuming that the quantile functions spans an RKHS. In this work, they choose an appropriate matrix-valued kernel taking into account the distance between quantiles.
On the other hand, in simultaneous Bayesian framework, Reich et al.  proposed a two-stage semiparametric method for linear quantile regression, which is one of the first methods addressing the crossing problem of quantiles. At the first stage, they used Koenker and Bassett’s  method to estimate quantile coefficients. At the second stage, these coefficients are reestimated with Bernstein polynomial basis function with constraints on their coefficients. Later, Reich  proposed a fully Bayesian approach based on interpolating the quantile function on different quantile levels using a piecewise Gaussian basis function. Tokdar et al.  proposed a simultaneous linear quantile regression method for a univariate explanatory variable with a bounded and convex domain. They showed that a linear quantile, , is monotonically increasing if and only if and are linear combination of two monotonic increasing functions in . Afterwards, Yang and Tokdar  extended this method to the multivariate explanatory case. Another two-stage method is proposed by Rodrigues and Fan . At the first stage, they used the standard Bayesian quantile regression of Yu and Moyeed  fitted separately at the different quantile levels. They introduced induced quantiles, at the second stage, using a relation between quantiles of the ALD distribution. At the same stage, they used the Gaussian process regression that monotonizes quantile functions by borrowing strength from the nearby ones. However, this method is still affected by the outputs of the first stage since different likelihoods are used for initial estimates. For nonparametric quantile regression, a cubic B-spline method is considered by Muggeo et al. , solving an penalisation problem with monotonicity constraint on spline coefficients. Many recent works on simultaneous Bayesian quantile regression can be found in literature such as Das and Ghosal , Das and Ghosal , Rodrigues et al. , and Petrella and Raponi .
In this paper, our contribution aims at developing a fully Bayesian approach that estimates multiple quantiles simultaneously in one step. Unlike the other Bayesian methods, our proposed method is not based on misspecification, i.e., the ALD is the real underlying response distribution. We used a relation between quantiles of the ALD distribution and Metropolis-Hastings within Gibbs algorithm for implementation. Our proposed method provides parallel quantile functions estimators and, thus, the noncrossing is a natural result. Our numerical results show, using specific empirical criteria, that the estimated regression quantiles respect the quantile level ordering.
The paper is organised as follows. Section 2 describes the method presenting the model and establishing the relation between quantiles of the ALD distribution used in the estimation procedure. The algorithm associated with the proposed method is given in Section 3, where we introduce the location-scale mixture representation of the ALD. In Section 4, simulated examples are provided to show the performance and the flexibility of the method, and the final section presents our conclusion.
2. Simultaneous Bayesian Estimation of Quantiles
We consider the quantile regression modelwith , where , , and , respectively, are the location, the scale, and the asymmetry parameters. This leads to assuming that the response variable , given the covariable , follows an ALD distribution:whose probability density function (p.d.f.) is given bywhere is the check loss function. The explanatory variables ’s are supposed to be iid distributed according to an arbitrary continuous distribution, , whose support is , , . In what follows, the function and the parameters , , and are supposed to be unknown.
In all subsequent sections, any quantity in bold represents a vector; capital letters denote random variables or vectors, whereas lowercase letters denote observed values of the corresponding random vectors.
2.2. Bayesian Procedure
To infer simultaneously distinct quantiles with a fully Bayesian approach, say with distinct orders , of the conditional distribution of , one needs to characterize the likelihood through all these unknown quantiles.
As explained below, this is done first by partitioning the whole sample into subsamples, which requires a sufficient number of observations, second by using the relation between any two quantiles of the ALD distribution, and third by rewriting the whole likelihood through terms, where the -th term only depends on .(1)Consider the following partition of the whole sample through subsamples: where with , , and ; suppose that is integer, without loss of generality.(2)To characterize the likelihood through all quantiles of interest, we use the relation that links any th quantile to the th quantile of the underlying ALD distribution (see Yu and Zhang  and Rodrigues and Fan ): where .(3)From (5), we rewrite the model given by (1) as follows: where Then, the likelihood associated with the model given by (1) is rewritten as the product of likelihoods; each one only depends on one subsample:
In this section, we specify the prior distributions for all unknown quantities. It is worth mentioning that is unknown even it is not our primary interest, and therefore it should require a prior; but since any prior on that is independent of the prior on would disappear upon marginalization of the posterior of relatively to , we drop it in the sequel. Thus, it suffices to choose a prior distribution for , for , , and .(1)For the quantiles, we distinguish between the linear and nonlinear cases.(a) Linear case. , , where denotes the transpose of and . Due to (5), one should note that distinct quantiles differ only by the intercept , so that all quantiles of interest have the same slope ; then we consider the unknown vector parameter of dimension with Gaussian prior, with being a positive definite square matrix of dimension .(b) Nonlinear case. We define the function on , by . We put a Gaussian process prior on , , with a zero-mean function and a covariance function : ; following Sangnier et al. , we choose to be decomposable and its form is given by where , and are positive hyperparameters, and is the Euclidean norm on . Here, , , encodes the relation between the conditional quantiles and . As explained by Sangnier et al. , if , the quantile curves are parallel so they do not cross. If , the quantiles are learned independently and then they may cross. Thus, the choice of is important to control the crossing.(2) and , where denotes the inverse Gamma distribution with positive hyperparameters and .(3) and .
To compute the full conditional distributions, we shall make use of the location-scale mixture of the ALD distribution (see Kozumi and Kobayashi ).
Let be an exponential latent variable with parameter , denoted by , and let be a standard normal variable such that and are independent. If has an ALD distribution, ; then, it can be represented as a mixture of normal variable:where and .
Exploiting this augmented data structure, the model defined by the system of equations given by (6) admits, conditionally on , the following Gaussian representation:where .
3.1. Full Conditional Distributions
From the equations given in (12) and the priors defined in Section 2.2.2, we follow Bernardi et al.  to derive the full conditional distributions of and . Yet, the full conditionals of and are not tractable and this is carried out by including a Metropolis-Hasting step to the Gibbs sampler algorithm.(1) For all , as well as , set , and considering the distribution as a prior on , with where stands for the Inverse Gaussian distribution with and as location and shape parameters.(2) The conjugate Gaussian prior on quantiles provides Gaussian full conditional distributions in both linear and nonlinear cases.(a) Linear case. Let us introduce some notations:(i) ,(ii) is the design matrix defined by (iii) This allows to rewrite the system of equations (12) into the following format: with and where . With the zero-mean Gaussian prior distribution, the full conditional distribution on is then Gaussian: with (b) Nonlinear case. We shall use other extra notations: This allows to rewrite the system of equations (12) into the following vector format: Combining (20) with the Gaussian process prior on leads to the following joint distribution of conditional on : Finally, classical calculations lead to the desired full conditional distribution: where(3) The full conditional distribution, of , is proportional to where .(4) The full conditional, of , is proportional to
Due to (24) and (25), it is not possible to generate and directly from their full conditional distribution. Therefore, they are simulated by incorporating a random walk Metropolis-Hastings step within Gibbs, as described in Algorithm 1.
|1: Initialization : , .|
|2: Loop :|
|1. (i) ,|
|2. (i) ,|
For a subset , denote by the truncated version on of the corresponding Gaussian distribution. Note that the choice of the truncated Gaussian distribution is classical.
It is worth to mention that the values of the scale parameters and are calibrated to achieve the equilibrium of the random walk Metropolis-Hastings step quickly; in fact, they are chosen neither too small nor too large so that the acceptance rate becomes practically stable.
The conditional posterior ratios of and are given bywhere and , are respectively, given up to a constant in (24) and (25), and the transition probabilities are given bywhere denotes the pdf of and denotes the cumulative distribution function of the standard normal distribution.
4. Simulation Study
In this section, we study the performance of our method in both linear and nonlinear quantile regressions. For the model given by (1), we shall consider three different designs for the th quantile:(1)Univariate linear quantile: , with .(2)Multivariate linear quantile: , with , , and either or , .(3)Nonlinear quantile: for ,
For all designs, we generate independently 300 observations issued from the model defined in (1). All runs of Metropolis-Hastings within Gibbs algorithm consist in 20000 iterations, one-third of which are burn-in. The prior hyperparameters are chosen as follows: for the inverse Gamma of , and ; for the Beta prior of , and and is set to be the identity matrix for linear quantile and and for the nonlinear case. The choice of and is typical; indeed, whatever the value of is, minimizes the empirical root mean integrated square error, RMISE; that is, , where stands for the posterior mean quantile regression. In order to test the robustness of our procedure with respect to the model parameters, different values of and are considered for the three designs.
The first design is a straightforward example and is carried out just to check the convergence of the algorithm from different tools: of Gelman and Rubin diagnostic, the autocorrelation analysis, and the posterior plots of the various parameters.
Through the second design, we use the crossing loss criterion (see Sangnier et al. ) to compare the performance, against crossing, of our method, denoted by “SBQR" with other approaches: the frequentist single quantile method of Koenker and Bassett , denoted by “K&B”, the Bayesian single quantile regression method of Yu and Moyeed , denoted by “Y&M", and the simultaneous Bayesian method of Reich and Smith , denoted by “R&S”; in addition, we use the RMISE to evaluate the estimation performance. These other methods are performed using available codes in R Core Team (2017): rq function available in quantreg package Koenker  for “K&B”, bayesQR function in bayesQR package Benoit et al.  for “Y&M", and qreg function in BSquare package Smith et al.  for “R&S".
For design 3, we compare our “SBQR" method with both the nonparametric quantile regression method of Muggeo et al. , denoted by “M&ST", and the simultaneous noncrossing method of Rodrigues and Fan , denoted by “F&R”. We have implemented “M&ST" with R package (see Muggeo ) and “F&R” with an own made code in R. We use features of the RMISE criterion to show how well “SBQR” performs among the other considered methods.
We note that the number of observations, , is chosen carefully to illustrate the intended objectives in all designs.
4.1. Design 1: Univariate Linear Quantile Regression
The iid sample is generated according to . We propose to infer three quantiles that are close, namely, the quantiles of order and , respectively.
We fix . To evaluate the convergence of our algorithm, we use three different seeds and parameter starting values to run three different chains and calculate of Gelman convergence diagnostic. Besides, we use other convergence diagnostics such as the autocorrelation analysis and the posterior plots.
As shown in the top panel of Figures 1, 2, and 3, all posterior distributions shrink at the true parameters value. Furthermore, in the middle panel of Figures 1, 2, and 3, the decrease of the empirical autocorrelation of posterior samples proves that the underlying chains are stationary. The bottom panels of Figures 1, 2, and 3 show that goes to 1 through the iterations, which confirms the convergence of the algorithm.
4.2. Design 2: Multivariate Linear Quantile Regression
The second design is dedicated to the multivariate linear case; hence, we consider the model given by (1) with , , , , and (1.6, 2.2, 2.8, 3.4, 4, 4.5, 5.1, 5.7, 6.3, 6.9).
As commonly known, crossing quantiles is a practical problem that often occurs when there is a large number of covariates. We propose to infer two close quantiles of order and and to study the crossing throughout the following four methods: “K&B", “Y&M", “R&S", and “SBQR".
To achieve the desired posterior distribution through MCMC methods, we perform Y&M and R&S with different number of iterations: 1000 for Y&M and 10000 for R&S. For “R&S”, we use the logistic base distribution with 4 basis functions. For “SBQR", we fix and to be the identity matrix.
To compare the methods, we make use of the crossing loss criterion (see Sangnier et al. ) that measures how far goes below :
For a given approach, the less the crossing loss is, the better is the method. As shown in Table 1, the crossing loss, by “K&B", is significantly of 0.43%, which corresponds to 34 data points of which are above . This percentage is considerably weakened when applying the separate Bayesian method “Y&M" (0.38%) but still has crossing quantiles (26 data points of are above ). However, for simultaneous estimation methods, as our proposed “SBQR" or “R&S” methods, the crossing loss becomes zero; this means that the simultaneous quantile estimation has the potential to make quantile crossing vanish. Moreover, the estimated quantiles are in the right order. While simultaneous approaches control the monotonicity property of quantiles in a certain sense, separate approaches do not, and they provide more than 8% of violation according to results previously discussed (11.33% of violation by “K&B" and 8.66% of violation by “Y&M").
To go further, we compute the RMISE for each quantile order and for all methods. The “SBQR" has roughly the smallest RMISE in both quantile levels and . The fact is that “R&S” may oversmooth when estimating the quantiles simultaneously and here, for this linear case, the oversmoothness impacts the flexibility of the method (high RMISE).
To see if the covariable support has an impact on results, we consider another simulation set in which , that is, support(). Table 2 shows that “SBQR" behaves like in the previous example: a zero crossing loss and the smallest RMISE among all the methods.
We also consider another case with a different pair of quantile orders: and . We turn back to the support for . Table 3 shows similar results as the ones obtained for and . Thus, “SBQR” still has the best behavior among the other methods in terms of crossing loss and RMISE.
It should be noted that the estimation of and by “SBQR" is quite good, since their estimated values are near the true ones in the different treated cases.
4.3. Design 3: Nonparametric Quantile Regression
Considering the third design with and , we are interested in estimating quantile functions for orders 0.10, 0.12, 0.15, and 0.20. We fix and and we compare our “SBQR" method with two others: the “M&ST” method with three-order cubic B-splines and the “F&R" approach with smoothness parameter value equal to 0.1.
For each value of , we evaluate the performance of these methods through the RMISE criterion. Table 4 shows that the RMISE values are significantly smaller for “SBQR" than the ones for “M&ST” and “F&R”. It is worth noting, therefore, that our method is significantly better than the two others in quantiles estimation; besides, it provides a reasonable estimation of () and ().
Figure 4 gives the quantile curves estimators. While our “SBQR" method (left panel) provides quantile curves estimates that are close to the true ones (red-dashed lines), the th quantile curve estimate given by “F&R" method (middle panel) is fairly distant from the true curve, especially when . The same happens for “M&ST" method when . However, there is no scarred crossing by any of these three methods, since they are tackling simultaneous quantile estimation techniques.
The issue that makes “F&R" method less flexible in simultaneous quantiles fitting is that, in the second stage, the final estimators are quite affected by the first stage output that may be badly estimated. For “M&ST", the estimators are constructed iteratively, when solving the minimization problem, by adding constraints so that each subsequent quantile function does not cross with the previous one; this may cause overestimation of quantile curves. Our “SBQR" method does not suffer from these mentioned disadvantages, which explains its significant flexibility.
Our proposed estimation procedure, “SBQR", for simultaneous Bayesian quantile regression guarantees the fundamental property of noncrossing. Assuming that the ALD is the underlying data distribution, this method enables characterizing the likelihood function by all quantiles of interest using the relation between two distinct quantiles. Using the location-scale normal mixture representation of the ALD distribution, we develop a Metropolis-Hastings within Gibbs algorithm to implement our method. Our simulation studies show good results that reflect the good performance of the method and the convergence of the algorithm. Against the crossing problem of estimated quantiles, our method has good performance compared with single quantile estimation methods like Koenker and Bassett  and Yu and Moyeed  methods. From the RMISE point of view, “SBQR" is very competitive in both linear and nonlinear quantile regression cases.
As future perspectives, we envisage extending this work to deal with any conditional distribution of .
The simulated data used to support the findings of this study are provided by R software and are included within the attached document.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
The author would like to thank the Ph.D. supervisor Professor Ghislaine Gayraud, University of Technology of Compiegne, France, for her support and comments that greatly assisted this paper.
The simulated data used to support the findings of this study are provided by R software and are included within the attached document. (Supplementary Materials)
R. Koenker and S. Portnoy, Nonparametric Estimation of Conditional Quantile Functions, 1992.View at: MathSciNet
N. Sayed-Ahmed, “Efficiency of bayesian approaches in quantile regression with small sample size,” Asian Journal of Probability and Statistics, pp. 1–13, 2018.View at: Google Scholar
H. K. Abbas and R. J. Thaher, “Bayesian adaptive lasso tobit regression,” Journal of Al-Qadisiyah for Computer Science and Mathematics, vol. 110, no. 1, 2019.View at: Google Scholar
X. He, “Quantile curves without crossing,” The American Statistician, vol. 51, no. 2, pp. 186–192, 1997.View at: Google Scholar
M. Sangnier, O. Fercoq, and F. D'Alché-Buc, “Joint quantile regression in vector-valued RKHSs,” in Proceedings of the 30th Annual Conference on Neural Information Processing Systems, NIPS 2016, pp. 3693–3701, Spain, December 2016.View at: Google Scholar
V. M. R. Muggeo, M. Sciandra, A. Tomasello, and S. Calvo, “Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology,” Environmental and Ecological Statistics, vol. 20, no. 4, pp. 519–531, 2013.View at: Publisher Site | Google Scholar | MathSciNet
R. Koenker, Quantile Regression in R: A Vignette, 2015, https://cran.r-projectorg/web/packages/quantreg/vignettes/rq.pdf.View at: Publisher Site | MathSciNet
D. Benoit, R. Al-Hamzawi, K. Yu, and D. Van den Poel, bayesqr: Bayesian Quantile Regression, 2011.
L. Smith, B. Reich, and M. L. Smith, “Package ’bsquare’,” 2013.View at: Google Scholar
V. M. R. Muggeo, Quantreggrowth-Package: Growth Charts via Regression Quantiles, 2018.