Journal of Probability and Statistics

Volume 2019, Article ID 8610723, 12 pages

https://doi.org/10.1155/2019/8610723

## Fully Bayesian Estimation of Simultaneous Regression Quantiles under Asymmetric Laplace Distribution Specification

Sorbonne Universités, Université de Technologie de Compiègne, LMAC, Compiègne Cedex, France

Correspondence should be addressed to Josephine Merhi Bleik; rf.ctu@kielb-ihrem.enihpesoj

Received 7 December 2018; Accepted 26 February 2019; Published 2 June 2019

Guest Editor: Rahim Alhamzawi

Copyright © 2019 Josephine Merhi Bleik. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper, we are interested in estimating several quantiles simultaneously in a regression context via the Bayesian approach. Assuming that the error term has an asymmetric Laplace distribution and using the relation between two distinct quantiles of this distribution, we propose a simple fully Bayesian method that satisfies the noncrossing property of quantiles. For implementation, we use Metropolis-Hastings within Gibbs algorithm to sample unknown parameters from their full conditional distribution. The performance and the competitiveness of the underlying method with other alternatives are shown in simulated examples.

#### 1. Introduction

Quantile regression has received increasing attention from both theoretical and empirical points of view. It is a statistical procedure intended to estimate conditional quantile of a response distribution, given a set of covariates, and to conduct inference about them.

In regression, when we deal with highly skewed data distribution, traditional mean regression may not explore interesting aspects of the relationship between the response variable and the available covariates. Furthermore, in the presence of outliers, the evaluation of the response average becomes much more complicated. However, since a set of quantiles often provides a complete description of the response distribution conditionally to the given covariates, quantile regression offers a practical alternative to the traditional mean regression.

Quantile regression was initially introduced by Koenker and Bassett [1], where inference proceeds by minimizing a check loss function. Based on their theory, many other approaches have been emerged to study the regression quantiles in both frequentist and Bayesian points of view.

In the frequentist framework, different techniques have been developed in the literature to infer regression quantiles. To estimate linear regression quantiles, Koenker and Bassett [1] used linear programming technique, whereas Gutenbrunner and Jurecková [2] considered the corresponding dual problem whose solution represents the quantile coefficients estimator. From theoretical point of view, asymptotic normality and consistency properties are proven in Koenker and Bassett [1], Zhou et al. [3], and Koenker and Xiao [4]. For nonparametric quantile regression, spline methods are commonly used for quantile regression fitting (see Nychka et al. [5], Koenker and Portnoy [6], and Koenker et al. [7]). Oh et al. [8] proposed, in univariate and bivariate settings, an approximation of the nonlinear optimization problem by a sequence of penalized least squares type nonparametric mean estimation problems. Koenker and Mizera [9] explored the estimation of triograms in bivariate setting. Koenker et al. [10] proposed an additive models estimation method for quantile regression based on selecting the smoothing parameter and constructing confidence bands for nonparametric components, in univariate and bivariate settings. The univariate regression quantile spline is extended to problems with several covariates in He and Ng [11] and Li et al. [12] where, in this latter, regression quantiles are considered in reproducing kernel Hilbert spaces (RKHS).

In the Bayesian framework, different methods have been successfully emerged to study quantile regression. This was pioneered by Yu and Moyeed [13], where the authors specified the Bayesian quantile regression model and assumed that the error terms are iid according to the asymmetric Laplace distribution (ALD), denoted by with being its th quantile, is the scale parameter, and are the asymmetry parameters. They also showed that the Bayesian approach yields a proper full conditional distribution, even for an improper uniform prior on the quantile coefficients and they used the Metropolis-Hastings algorithm for implementation. Further, Kozumi and Kobayashi [14] and Bernardi et al. [15] considered that the scale parameter of the ALD is unknown. They used the location-scale mixture representation of the ALD, which enables developing a Gibbs sampling algorithm for the Bayesian quantile regression model. In the literature, we can also find other methods that extended quantile regression model. For censored data, Kobayashi et al. [16] proposed Tobit quantile regression with endogenous variables and Li and Bradic [17] developed regression adjustment for local adaptive representation of random forests. In continuous dependent variable case and for variable selection and coefficient estimation, Alhamzawi et al. [18] used adaptive LASSO quantile regression and, recently, Sayed-Ahmed [19] applied this approach for small sample sizes, whereas Abbas and Thaher [20] developed Bayesian adaptive Tobit regression. Benoit and Van den Poel [21] proposed Bayesian quantile regression methods for binary response data and Alhamzawi and Ali [22] adapted the quantile regression model to deal with longitudinal ordinal data.

All these methods mentioned above, from both frequentist and Bayesian points of view, studied single regression quantile. However, there are many applications where tackling several quantiles at the same time is needed. For this direction, if one uses methods that deal with single quantile regression to estimate regression quantiles separately, results may not be satisfactory. Indeed, it is known that the separate estimation of a set of quantiles may break the quantile monotonicity property (see He [23]), which means that quantile curves may cross.

Fortunately, many approaches have been proposed to overcome the crossing problem. In the frequentist framework, Liu and Wu [24] and Bondell et al. [25] considered monotonicity constraints for a fixed number of quantiles in both linear and nonparametric cases. Sangnier et al. [26] proposed a nonparametric method that estimates multiple conditional quantiles simultaneously when assuming that the quantile functions spans an RKHS. In this work, they choose an appropriate matrix-valued kernel taking into account the distance between quantiles.

On the other hand, in simultaneous Bayesian framework, Reich et al. [27] proposed a two-stage semiparametric method for linear quantile regression, which is one of the first methods addressing the crossing problem of quantiles. At the first stage, they used Koenker and Bassett’s [1] method to estimate quantile coefficients. At the second stage, these coefficients are reestimated with Bernstein polynomial basis function with constraints on their coefficients. Later, Reich [28] proposed a fully Bayesian approach based on interpolating the quantile function on different quantile levels using a piecewise Gaussian basis function. Tokdar et al. [29] proposed a simultaneous linear quantile regression method for a univariate explanatory variable with a bounded and convex domain. They showed that a linear quantile, , is monotonically increasing if and only if and are linear combination of two monotonic increasing functions in . Afterwards, Yang and Tokdar [30] extended this method to the multivariate explanatory case. Another two-stage method is proposed by Rodrigues and Fan [31]. At the first stage, they used the standard Bayesian quantile regression of Yu and Moyeed [13] fitted separately at the different quantile levels. They introduced induced quantiles, at the second stage, using a relation between quantiles of the ALD distribution. At the same stage, they used the Gaussian process regression that monotonizes quantile functions by borrowing strength from the nearby ones. However, this method is still affected by the outputs of the first stage since different likelihoods are used for initial estimates. For nonparametric quantile regression, a cubic B-spline method is considered by Muggeo et al. [32], solving an penalisation problem with monotonicity constraint on spline coefficients. Many recent works on simultaneous Bayesian quantile regression can be found in literature such as Das and Ghosal [33], Das and Ghosal [34], Rodrigues et al. [35], and Petrella and Raponi [36].

In this paper, our contribution aims at developing a fully Bayesian approach that estimates multiple quantiles simultaneously in one step. Unlike the other Bayesian methods, our proposed method is not based on misspecification, i.e., the ALD is the real underlying response distribution. We used a relation between quantiles of the ALD distribution and Metropolis-Hastings within Gibbs algorithm for implementation. Our proposed method provides parallel quantile functions estimators and, thus, the noncrossing is a natural result. Our numerical results show, using specific empirical criteria, that the estimated regression quantiles respect the quantile level ordering.

The paper is organised as follows. Section 2 describes the method presenting the model and establishing the relation between quantiles of the ALD distribution used in the estimation procedure. The algorithm associated with the proposed method is given in Section 3, where we introduce the location-scale mixture representation of the ALD. In Section 4, simulated examples are provided to show the performance and the flexibility of the method, and the final section presents our conclusion.

#### 2. Simultaneous Bayesian Estimation of Quantiles

##### 2.1. Model

We consider the quantile regression modelwith , where , , and , respectively, are the location, the scale, and the asymmetry parameters. This leads to assuming that the response variable , given the covariable , follows an ALD distribution:whose probability density function (p.d.f.) is given bywhere is the check loss function. The explanatory variables ’s are supposed to be iid distributed according to an arbitrary continuous distribution, , whose support is , , . In what follows, the function and the parameters , , and are supposed to be unknown.

In all subsequent sections, any quantity in bold represents a vector; capital letters denote random variables or vectors, whereas lowercase letters denote observed values of the corresponding random vectors.

##### 2.2. Bayesian Procedure

###### 2.2.1. Likelihood

To infer simultaneously distinct quantiles with a fully Bayesian approach, say with distinct orders , of the conditional distribution of , one needs to characterize the likelihood through all these unknown quantiles.

As explained below, this is done first by partitioning the whole sample into subsamples, which requires a sufficient number of observations, second by using the relation between any two quantiles of the ALD distribution, and third by rewriting the whole likelihood through terms, where the -th term only depends on .(1)Consider the following partition of the whole sample through subsamples: where with , , and ; suppose that is integer, without loss of generality.(2)To characterize the likelihood through all quantiles of interest, we use the relation that links any th quantile to the th quantile of the underlying ALD distribution (see Yu and Zhang [37] and Rodrigues and Fan [31]): where .(3)From (5), we rewrite the model given by (1) as follows: where Then, the likelihood associated with the model given by (1) is rewritten as the product of likelihoods; each one only depends on one subsample:

###### 2.2.2. Priors

In this section, we specify the prior distributions for all unknown quantities. It is worth mentioning that is unknown even it is not our primary interest, and therefore it should require a prior; but since any prior on that is independent of the prior on would disappear upon marginalization of the posterior of relatively to , we drop it in the sequel. Thus, it suffices to choose a prior distribution for , for , , and .(1)For the quantiles, we distinguish between the linear and nonlinear cases.(a) *Linear case. *, , where denotes the transpose of and . Due to (5), one should note that distinct quantiles differ only by the intercept , so that all quantiles of interest have the same slope ; then we consider the unknown vector parameter of dimension with Gaussian prior, with being a positive definite square matrix of dimension .(b) *Nonlinear case.* We define the function on , by . We put a Gaussian process prior on , , with a zero-mean function and a covariance function : ; following Sangnier et al. [26], we choose to be decomposable and its form is given by where , and are positive hyperparameters, and is the Euclidean norm on . Here, , , encodes the relation between the conditional quantiles and . As explained by Sangnier et al. [26], if , the quantile curves are parallel so they do not cross. If , the quantiles are learned independently and then they may cross. Thus, the choice of is important to control the crossing.(2) and , where denotes the inverse Gamma distribution with positive hyperparameters and .(3) and .

#### 3. Computations

To compute the full conditional distributions, we shall make use of the location-scale mixture of the ALD distribution (see Kozumi and Kobayashi [14]).

Let be an exponential latent variable with parameter , denoted by , and let be a standard normal variable such that and are independent. If has an ALD distribution, ; then, it can be represented as a mixture of normal variable:where and .

Exploiting this augmented data structure, the model defined by the system of equations given by (6) admits, conditionally on , the following Gaussian representation:where .

##### 3.1. Full Conditional Distributions

From the equations given in (12) and the priors defined in Section 2.2.2, we follow Bernardi et al. [15] to derive the full conditional distributions of and . Yet, the full conditionals of and are not tractable and this is carried out by including a Metropolis-Hasting step to the Gibbs sampler algorithm.(1) For all , as well as , set , and considering the distribution as a prior on , with where stands for the Inverse Gaussian distribution with and as location and shape parameters.(2) The conjugate Gaussian prior on quantiles provides Gaussian full conditional distributions in both linear and nonlinear cases.(a) *Linear case.* Let us introduce some notations:(i) ,(ii) is the design matrix defined by (iii) This allows to rewrite the system of equations (12) into the following format: with and where . With the zero-mean Gaussian prior distribution, the full conditional distribution on is then Gaussian: with (b) *Nonlinear case.* We shall use other extra notations: This allows to rewrite the system of equations (12) into the following vector format: Combining (20) with the Gaussian process prior on leads to the following joint distribution of conditional on : Finally, classical calculations lead to the desired full conditional distribution: where(3) The full conditional distribution, of , is proportional to where .(4) The full conditional, of , is proportional to

##### 3.2. Algorithm

Due to (24) and (25), it is not possible to generate and directly from their full conditional distribution. Therefore, they are simulated by incorporating a random walk Metropolis-Hastings step within Gibbs, as described in Algorithm 1.