Abstract

We address a quantile dependent prior for Bayesian quantile regression. We extend the idea of the power prior distribution in Bayesian quantile regression by employing the likelihood function that is based on a location-scale mixture representation of the asymmetric Laplace distribution. The propriety of the power prior is one of the critical issues in Bayesian analysis. Thus, we discuss the propriety of the power prior in Bayesian quantile regression. The methods are illustrated with both simulation and real data.

1. Introduction

Quantile regression models have been widely used for a variety of applications (Koenker [1]; Yu et al. [2]). Like standard or mean regression models, dealing with parameter and model uncertainty as well as updating information is of great importance for quantile regression and application. Since Yu and Moyeed [3] Bayesian inference quantile regression has attracted a lot of attention in the literature (Hanson and Johnson [4]; Tsionas [5]; Scaccia and Green [6]; Schennach [7]; Dunson and Taylor [8]; Geraci and Bottai [9]; Taddy and Kottas [10]; Yu and Stander [11]; Kottas and Krnjajić [12]; Lancaster and Jun [13]). These Bayesian inference models include Bayesian parametric, Bayesian semiparametric as well as Bayesian nonparametric models. However, almost all these models set priors independent of the values of quantiles, or the prior is the same for modelling different quantiles. This approach may result in inflexibility in quantile modelling. For example, a 95% quantile regression model should have different parameter values from the median quantile, and thus the priors used for modelling the quantiles should be different. It is therefore more reasonable to set different priors for different quantiles. In this paper, we address a quantile dependent prior for Bayesian quantile regression. Our idea is to set priors based on historical data. Although one can use improper prior in Bayesian quantile regression, the inference on current data could be more reliable and sensitive if there exist historical data gathered from similar previous studies. There are several methods to incorporate the historical data in the analysis of a current study. One of these methods is the power prior proposed by Ibrahim and Chen [14] which is constructed by raising the likelihood function of the historical data to a power parameter between 0 and 1. The power parameter represents the proportion of the historical data needed in the current study. The a priori idea for the power prior distribution belongs to Diaconis and Ylvisaker [15] and Morris [16] who studied conjugate priors for the exponential families, where they considered the power parameter as fixed constant which can be determined in advance. Ibrahim and Chen [14] developed this idea and considered the uncertainty case of the power parameter. They applied it in generalized linear mixed models, semiparametric proportional hazards models, and cure rate models for survival data. Chen et al. [17] examined the theoretical properties of power prior distribution for generalized linear models, while Ibrahim et al. [18] studied the optimality properties of the power prior, and Chen and Ibrahim [19] studied the relation between the power prior and hierarchical models and provided a formal justification of the power prior by examining formal analytical relationships between the power prior and hierarchical modelling in linear models.

Following the standard setup and notation for the power prior by Ibrahim and Chen [14], suppose that there exist historical data gathered from previous studies similar to the current study denoted by 𝐷0=(𝑛0,𝑦0,𝑥0) along with a precision parameter 𝑎0, 0𝑎01, where 𝑛0 denotes the sample size of the historical data, 𝑦0 is an 𝑛0×1 historical data response vector, and 𝑥0𝑖=(1,𝑥0𝑖1,𝑥0𝑖2,,𝑥0𝑖𝑛) represent the 𝑘+1 known covariates from the historical data. The power parameter 𝑎0; represents how much data from the previous study is to be used in the current study. There are two special cases for 𝑎0; the first case 𝑎0=0 corresponds to no incorporation of the data from previous study relative to the current study. The second case 𝑎0=1 corresponds to full incorporation of the data from previous study relative to the current study. Therefore, 𝑎0 controls the influence of the data gathered from previous studies that is similar to the current study; such control is important when the sample size of the current data is quite different from the sample size of the historical data or where there is heterogeneity between two studies (Ibrahim and Chen [14]). In generalized linear models, Ibrahim and Chen [14] defined the power prior of unknown parameters 𝛽 based on the historical data as 𝜋𝛽𝐷0,𝑎0𝐿𝛽𝐷0𝑎0𝜋0𝛽𝑐0,(1.1) where 𝑐0 is a specified hyperparameter for the initial prior. Formulation (1.1) was initially elicited for 𝑎0 as known parameter which can be determined previously, for example, by using expert beliefs or via a meta-analytic approach. Ibrahim and Chen [14] extend this idea by treating 𝑎0 as random that is why the formulation becomes quite complicated. However, a random 𝑎0 gives the researcher more freedom and flexibility in weighting the data gathered from previous studies. Thus Ibrahim and Chen [14] proposed a joint power prior distribution for (𝛽,𝑎0) in generalized linear model of the form 𝜋𝛽,𝑎0𝐷0𝐿𝛽𝐷0𝑎0𝜋0𝛽𝑐0𝜋𝑎0𝛾0,(1.2) where 𝑐0 and 𝛾0 are specified hyperparameter vectors. Power priors (1.1) and (1.2) will not have a closed form in general; however Ibrahim and Chen [14] suggested using a uniform prior for 𝜋0(𝛽𝑐0) and a beta prior for 𝜋(𝑎0𝛾0), or other choices, such as truncated normal or gamma priors. The advantage of employing these three priors for 𝜋(𝑎0𝛾0) is due to their similar theoretical and computational properties. Furthermore, the authors extend the original power prior to a situation where the set of covariates measured in the previous study is a subset from a set of covariates in the current data or when the historical data are not available. In addition they generalized power prior (1.2) to multiple data from previous studies, and power prior (1.2) becomes 𝜋𝛽,𝑎0𝐷0𝑀𝑗=1𝐿𝛽𝐷0𝑗𝑎0𝑗𝜋𝑎0𝑗𝛾0𝜋0𝛽𝑐0,(1.3) where 𝑀 represent the size of previous studies, 𝑎0=(𝑎01,,𝑎0𝑀), 𝐷0𝑗 is the historical data for 𝑗th study, 𝑗=1,2,,𝑀, and 𝐷0=(𝐷01,,𝐷0𝑀).

Section 2 of the paper gives a brief overview of likelihood function based on asymmetric type of Laplace distribution, and we define the power prior for Bayesian quantile regression. In Section 3, we discuss the propriety of the power prior. In Section 4 we describe in detail the location-scale mixture of normal representation, and we propose power priors by using this representation for Bayesian quantile regression. Section 5 contains two simulation studies with one real data, and we end with a short discussion in Section 6.

2. The Power Prior

Consider the quantile linear regression model𝑦𝑖=𝑥𝑖𝛽𝑝+𝜀𝑖,(2.1) where {(𝑥𝑖,𝑦𝑖),𝑖=1,2,,𝑛} are independent observations, 𝑦𝑖 is the response variable, 𝑥𝑖=(1,𝑥𝑖1,𝑥𝑖2,,𝑥𝑖𝑘) represent the (𝑘+1) known covariates, 𝛽𝑝=(𝛽0(𝑝),𝛽1(𝑝),,𝛽𝑘(𝑝)) is the (𝑘+1) unknown parameters, and 𝜀𝑖, 𝑖=1,,𝑛, represent error terms which are independent and identically distributed errors. The distribution of the error is assumed unknown and is restricted to have the 𝑝th quantile equal to zero and 0<𝑝<1. Let 𝑞𝑝(𝑦𝑥) represent the conditional quantile of 𝑦𝑖 given 𝑥𝑖. Then the relation between 𝑞𝑝(𝑦𝑥) and 𝑥 can be modelled as 𝑞𝑝(𝑦𝑥)=𝑥𝑖𝛽𝑝.

Following Yu and Moyeed [3], we suppose that 𝜀𝑖 has an asymmetric Laplace distribution with the density 𝑓(𝜀𝑝)=𝑝(1𝑝)exp𝜌𝑝(𝜀),(2.2) where 𝜌𝑝(𝑢)=𝑝|𝑢|if𝑢0,(1𝑝)|𝑢|if𝑢<0.(2.3)

We refer to Kotz et al. [20] for a nice comprehensive review about the asymmetric Laplace distribution. The mean and variance of the asymmetric Laplace distribution are, respectively, given by 𝐸𝜀𝑖=12𝑝𝜀𝑝(1𝑝),Var𝑖=12𝑝+2𝑝2𝑝2(1𝑝)2.(2.4)

It is known that the probability density function of the asymmetric Laplace distribution of 𝑦𝑖 given a location parameter 𝜇𝑖=𝑥𝑖𝛽𝑝 is given by 𝑓𝑦𝑖𝛽𝑝𝑦=𝑝(1𝑝)exp𝑖𝑥𝑖𝛽𝑝𝑝𝐼𝑦𝑖𝑥𝑖𝛽𝑝.(2.5)

Let 𝐷=(𝑛,𝑦𝑖,𝑥𝑖) denote the data from the current study. Then, the likelihood function for the current study is given by 𝑓𝛽𝑝𝐷=𝑝𝑛(1𝑝)𝑛𝑛𝑖=1𝑦exp𝑖𝑥𝑖𝛽𝑝𝑝𝐼𝑦𝑖𝑥𝑖𝛽𝑝=𝑝𝑛(1𝑝)𝑛exp𝑛𝑖=1𝑦𝑖𝑥𝑖𝛽𝑝𝑝𝐼𝑦𝑖𝑥𝑖𝛽𝑝.(2.6) Suppose that there exists historical data from a previous study denoted by 𝐷0=(𝑛0,𝑦0,𝑥0) measuring the same response variable and covariates as the current study, where 𝑛0 denotes the sample size of the previous study, 𝑦0 is an 𝑛0×1 response vector of the previous study, and 𝑥𝑖=(1,𝑥0𝑖1,𝑥0𝑖2,,𝑥0𝑖𝑘) represent the 𝑘+1 known covariates from the previous study. Then the likelihood function based on the data from the previous study is defined by 𝐿𝛽𝑝𝐷0=𝑝𝑛0(1𝑝)𝑛0exp𝑛0𝑖=1𝑦𝑖𝑥0𝑖𝛽𝑝𝑝𝐼𝑦0𝑖𝑥0𝑖𝛽𝑝.(2.7)

From Ibrahim and Chen [14] we define the joint prior distribution of 𝛽𝑝 and 𝑎0 for Bayesian quantile regression as 𝜋𝛽𝑝,𝑎0𝐷0𝐿𝛽𝑝𝐷0𝑎0𝜋0𝛽𝑝𝑐0𝜋𝑎0𝛾0,(2.8) where 𝐿(𝛽𝑝𝐷0) is the likelihood function for the historical data for quantile regression which is given by (2.7). We assume that the initial prior for 𝛽𝑝 is uniform. However, other choices, including multivariate normal or a double exponential can be used. Yu and Stander [11] prove that all posterior moments for 𝛽𝑝 exist under these priors.

3. The Propriety of Power Prior Distribution in Quantile Regression

The power prior proposed by Ibrahim and Chen [14] has been constructed to be a useful class of informative prior in Bayesian analysis. This prior depends on the availability of the historical data, and in the context of Bayesian analysis when such data are available the prior distribution should be proper because it is well known that any informative Bayesian analysis requires a proper prior distribution; thus the propriety of the power prior is of critical importance. In this section we discuss the propriety of the power prior distribution in Bayesian quantile regression.

Theorem 3.1. Suppose that the initial prior distribution for 𝛽𝑝 is a uniform prior and 𝑎0 has a beta prior with hyperparameters (𝛿0>0,𝜆0>0). Then, the joint prior distribution (2.8) in quantile regression for (𝛽𝑝,𝑎0) is proper. In other words 0<10𝐿𝛽𝑝𝐷0𝑎0𝑎𝛿0011𝑎0𝜆01𝑑𝑎0𝑑𝛽𝑝<.(3.1)

Proof. See the appendix.

Corollary 3.2. Suppose that the initial prior distribution for 𝛽𝑝 is a uniform prior and the random variable 𝑎0 has a uniform prior. Then, the joint power prior distribution (2.8) in quantile regression for (𝛽𝑝,𝑎0) is proper. In other words 0<10𝐿𝛽𝑝𝐷0𝑎0𝑑𝑎0𝑑𝛽𝑝<.(3.2) This corollary is derived directly from Theorem 3.1 because the uniform distribution is the special case of the beta distribution when (𝛿0=1,𝜆0=1) and the proof is omitted.

Corollary 3.3. Suppose that the initial prior distribution for 𝛽𝑝 is uniform prior and 𝑎0 is constant. Then, power prior (1.1) in quantile regression for 𝛽𝑝 is proper. In other words 0<𝐿𝛽𝑝𝐷0𝑎0𝑑𝛽𝑝<.(3.3) This corollary is derived directly from Corollary 3.2, and the proof is omitted. It is straightforward to verify that the joint prior 𝜋(𝛽𝑝,𝑎0𝐷0) when 𝛽𝑝 has a uniform prior is always proper in quantile regression, which also ensures the proper propriety of the joint posterior of (𝛽𝑝,𝑎0).

Theorem 3.4. Suppose that the initial prior distribution for 𝛽𝑝 is assumed to be independent, and each 𝜋0(𝛽𝑖(𝑝)𝑐0)exp{(1/𝜆𝑖)|𝛽𝑖(𝑝)𝜇𝑖|}, a double-exponential with fixed 𝜇𝑖, 𝜆𝑖>0, and 𝑎0 has a beta prior with hyperparameters (𝛿0,𝜆0). Then, the joint prior distribution (2.8) in quantile regression for (𝛽𝑝,𝑎0) is proper.

4. Mixture Representation

Consider the linear model for quantile regression (2.1), where the error term 𝜀 has an asymmetric Laplace distribution with the 𝑝th quantile equal to zero. The probability density function of the asymmetric Laplace distribution with location parameter 𝜇 and skewness parameter 𝑝, 𝑝(0,1) is given by (2.2).

It is well known that the asymmetric Laplace distribution (2.2) can be viewed as a mixture of an exponential and a scaled normal distribution (Reed and Yu [21] and Kotz et al. [20]). This can be recognized in the following lemma.

Lemma 4.1. Suppose that 𝑋 is a random variable that follows the asymmetric Laplace distribution with density (2.2), 𝜉 is a standard normal random variable, and 𝑧 is a standard exponential random variable. Then, one can represent 𝑋 as a location-scale mixture of normals given by 𝑋=𝑑12𝑝𝑝(1𝑝)𝑧+2𝑧𝑝(1𝑝)𝜉.(4.1)
From this result we can equivalently represent the error term 𝜀𝑖 as a mixture of normal distributions, given by 𝜀𝑖=𝜃𝑧𝑖+𝜙𝑧𝑖𝜉𝑖,(4.2) where 𝜃=12𝑝𝑝(1𝑝),𝜙2=2𝑝(1𝑝).(4.3)

Following Reed and Yu [21], we assume that the conditional distribution of each 𝑦𝑖 given 𝑧𝑖 is normal with mean 𝑥𝑖𝛽𝑝+𝜃𝑧𝑖 and variance 𝜙2𝑧𝑖 and the 𝑧𝑖 given 𝛽𝑝 are independent standard exponential variables. Letting 𝑦=(𝑦1,,𝑦𝑛) and 𝑧=(𝑧1,,𝑧𝑛), then, the joint density of (𝑦,𝑧) is given by 𝑓𝑦,𝑧𝛽𝑝=𝑛𝑖=1𝑓𝑦𝑖𝛽𝑝,𝑧𝑖𝜋𝑧𝑖𝛽𝑝𝑓,(4.4)𝑦,𝑧𝛽𝑝𝑛𝑖=1𝑧𝑖1/2𝑦exp𝑖𝑥𝑖𝛽𝑝𝜃𝑧𝑖22𝜙2𝑧𝑖exp𝑧𝑖=𝑛𝑖=1𝑧𝑖1/2exp𝑛𝑖=1𝑦𝑖𝑥𝑖𝛽𝑝𝜃𝑧𝑖22𝜙2𝑧𝑖exp𝑛𝑖=1𝑧𝑖.(4.5)

We then integrate out the exponential variable 𝑧, which leads to the likelihood 𝑓(𝑦𝛽𝑝), where 𝑓𝑦𝛽𝑝=𝑓𝑦,𝑧𝛽𝑝𝑑𝑧.(4.6)

4.1. The Power Prior for Mixture Representation

Suppose that we are interested in making inference about 𝛽𝑝 on the normal distribution with unknown variance, by incorporating both the previous and current studies.

Following the standard setup and notation for the power prior distribution for mixture representation, we assume that only one historical data set exists, and it is given by 𝐷0=(𝑛0,𝑦0,𝑥0), where 𝑛0 is the sample size of the historical data, 𝑦0 is the 𝑛0×1 response vector, and 𝑥0 is the 𝑛0×(𝑘+1) matrix of covariates.

Let 𝑧0=(𝑧01,,𝑧0𝑛0), where 𝑧01,,𝑧0𝑛0 are standard exponential random variables. As a mixture representation, the joint density for the historical data of 𝑦0𝑖 given 𝑧0𝑖 is normal with mean 𝑥0𝑖𝛽𝑝+𝜃𝑧0𝑖 and variance 𝜙2𝑧0𝑖, and each 𝑧0𝑖 given 𝛽𝑝 is independently and identically standard exponential distribution, which can be viewed as the prior distribution on 𝑧0𝑖. For 𝜋0(𝛽𝑝𝑐0) we choose a normal density as initial prior with mean 0 and variance 𝐵=𝑐0𝐼, that is, 𝜋0(𝛽𝑝𝑐0)exp((1/2𝑐0)𝛽𝑝𝛽𝑝). The purpose of this choice is due to the fact that all posterior moments of 𝛽𝑝 exist under the above prior as provided in the studies of Yu and stander [11]. It is also convenient if all covariates are measured on the same scale parameter. As a special case one may choose a uniform improper prior which is special case of beta distribution when (𝛿0=1,𝜆0=1) for 𝜋0(𝛽𝑝𝑐0), that is, 𝜋0(𝛽𝑝|𝑐0)1; this corresponds to 𝑐0, and this choice is very convenient with the partially Gibbs sampler as provided by Reed and Yu [21]. We propose a prior distribution of 𝛽𝑝 taking the form 𝜋𝛽𝑝𝐷0,𝑎0𝑛0𝑖=1𝑧0𝑖𝑓𝑦0𝑖𝛽𝑝,𝑧0𝑖𝑎0𝜋𝑧0𝑖𝛽𝑝𝑑𝑧0𝑖𝜋0𝛽𝑝𝑐0,(4.7) where 𝑓(𝑦0𝑖𝛽𝑝,𝑧0𝑖) and 𝑓(𝑧0𝑖𝛽𝑝) are the same 𝑓(𝑦𝑖𝛽𝑝,𝑧𝑖) and 𝑓(𝑧𝑖𝛽𝑝) in (4.4) with (𝑦0𝑖,𝑧0𝑖) in place of (𝑦𝑖,𝑧𝑖) to represent the historical data. Since we view 𝑎0 as a random quantity, the prior specification is completed by specifying a prior distribution for 𝑎0. We take a beta prior for 𝑎0 with parameter (𝛿0,𝜆0), or one may choose a uniform prior. Thus we propose a joint prior distribution for 𝛽𝑝 and 𝑎0 of the form 𝜋𝛽𝑝,𝑎0𝐷0𝑛0𝑖=1𝑧0𝑖𝑓𝑦0𝑖𝛽𝑝,𝑧0𝑖𝑎0𝜋0𝑧0𝑖𝛽𝑝𝑑𝑧0𝑖𝜋0𝛽𝑝𝑐0𝜋𝑎0𝛾0,(4.8)𝑛0𝑖=1𝑧0𝑖𝑧1/20𝑖exp𝑎0𝑦0𝑖𝑥0𝑖𝛽𝑝𝜃𝑧0𝑖22𝜙2𝑧0𝑖exp𝑧0𝑖𝑑𝑧0𝑖1×exp2𝑐0𝛽𝑝𝛽𝑝×𝑎𝛿0011𝑎0𝜆01.(4.9)

We see that (4.8) will not have a closed form in general because it depends on the initial priors that we choose. Thus the joint posterior distribution of 𝛽𝑝 and 𝑎0 is given by 𝑝𝛽𝑝,𝑎0𝐷,𝐷0𝑛𝑖=1𝑓𝑦𝑖𝛽𝑝,𝑧𝑖𝜋𝛽𝑝,𝑎0𝐷0.(4.10) Power prior (4.8) is constructed for one historical data, and this power prior can be easily generalized to multiple historical data. To generalized power prior (4.8) to multiple historical data, we assume that there are 𝑀 historical studies denoted by 𝐷0=(𝐷01,,𝐷0𝑀), where 𝐷0𝑗=(𝑛0𝑗,𝑦0𝑗,𝑥0𝑗) represent the historical data based on the 𝑗 study, 𝑗=1,,𝑀. Let 𝑧0𝑗=(𝑧01𝑗,,𝑧0𝑛0𝑗), where 𝑧01𝑗,,𝑧0𝑛0𝑗 are standard exponential random variables. We define 𝑎0𝑗 to be the power parameter for the 𝑗th study with beta prior distribution. Hence, the prior can be generalized as 𝜋𝛽𝑝,𝑎0𝐷0𝑀𝑗=1𝑛0𝑗𝑖=1𝑧0𝑖𝑗𝑓𝑦0𝑖𝑗𝛽𝑝,𝑧0𝑖𝑗𝑎0𝑗𝜋0𝑧0𝑖𝑗𝛽𝑝𝑑𝑧0𝑖𝑗×𝜋0𝛽𝑝𝑐0𝜋𝑎0𝑗𝛾0,(4.11) where 𝑎0=(𝑎01,,𝑎0𝑀), and each 𝑎0𝑗 has a beta prior with the same hyperparameters (𝛿0,𝜆0).

4.2. Inference with Scale Parameter

In the previous section, we have considered the power prior distribution in quantile regression model without taking into account a scale parameter. One may be interested to introduce a scale parameter into the model for the proposed Bayesian inference. Suppose that 𝜏>0 is the scale parameter. From now on, it is more convenient to work with 𝑣𝑖=𝜏𝑧𝑖 for the current data and with 𝑣0𝑖=𝜏𝑧0𝑖 for the historical data. We assume that only one historical data set exists, and it is given by 𝐷0=(𝑛0,𝑦0,𝑥0). Let 𝑣0=(𝑣01,,𝑣0𝑛0). Then, the conditional distribution for each 𝑦0𝑖 given 𝑣0𝑖,𝛽𝑝, and 𝜏 is normal with mean 𝑥0𝑖𝛽𝑝+𝜃𝑣0𝑖 and variance 𝜏𝜙2𝑣0𝑖, that is, 𝑦0𝑖𝑣0𝑖,𝛽𝑝,𝜏𝑁(𝑥0𝑖𝛽𝑝+𝜃𝑣0𝑖,𝜏𝜙2𝑣0𝑖), and the 𝑣0𝑖 given 𝛽𝑝 and 𝜏 are independent and identically distributed exponential variables with rate parameter 𝜏. The conditional distribution of 𝑣0𝑖 given 𝛽𝑝 and 𝜏 can be viewed as prior distribution on 𝑣0𝑖. It will be more convenient to work with the following priors:𝑙𝜏Γ0,𝑠0,𝛽𝑝𝜏𝑁𝑘0,𝐵0,𝐵0=𝑐0𝐼,𝑐0,(4.12) where 𝑙0, 𝑠0, and 𝐵0 are known parameters. For 𝑎0 we take a beta prior with parameter (𝛿0,𝜆0). Now, the specification of the power prior distribution is completed, and thus we propose a joint prior distribution for 𝛽𝑝,𝜏, and 𝑎0 of the form 𝜋𝛽𝑝,𝜏,𝑎0𝐷0𝑛0𝑖=1𝑣0𝑖𝑓𝑦0𝑖𝛽𝑝,𝜏,𝑣0𝑖𝑎0𝜋0𝑣0𝑖𝛽𝑝,𝜏𝑑𝑣0𝑖×𝜋0𝛽𝑝𝑐0𝑎𝜋(𝜏)𝜋0𝛾0,(4.13)𝑛0𝑖=1𝑣0𝑖𝜏𝑣0𝑖1/2exp𝑎0𝑦0𝑖𝑥0𝑖𝛽𝑝𝜃𝑣0𝑖22𝜙2𝜏𝑣0𝑖𝜏exp𝜏𝑣0𝑖𝑑𝑣0𝑖1×exp2𝑐0𝛽𝑝𝛽𝑝×(𝜏)𝑙01exp𝑠0𝜏𝑎𝛿0011𝑎0𝜆01.(4.14) Then, the joint posterior distribution of 𝛽𝑝, 𝜏, and 𝑎0 is given by 𝑝𝛽𝑝,𝜏,𝑎0𝐷,𝐷0𝑛𝑖=1𝑓𝑦𝑖𝛽𝑝,𝜏,𝑣𝑖𝜋𝛽𝑝,𝜏,𝑎0𝐷0.(4.15)

Power prior (4.13) can be easily generalized to 𝑀 historical data, and the generalized distribution can be given as𝜋𝛽𝑝,𝜏,𝑎0𝐷0𝑀𝑗=1𝑛0𝑗𝑖=1𝑣0𝑖𝑗𝑓𝑦0𝑖𝑗𝛽𝑝,𝜏,𝑣0𝑖𝑗𝑎0𝑗𝜋0𝑣0𝑖𝑗𝛽𝑝,𝜏𝑑𝑣0𝑖𝑗×𝜋0𝛽𝑝𝑐0𝑎𝜋(𝜏)𝜋0𝑗𝛾0.(4.16)

5. Numerical Examples

In this section, our aim is to compare the posterior means of parameters of interest after incorporating the current and historical data with the mean of true values for both studies. In addition, we will demonstrate the behaviour of the prior under several choices of prior parameters.

Example 5.1. We simulate two data sets, the first one for the current study and the second for the previous study. For the current study we generate 100 observations from the model 𝑦𝑖=𝜇+𝜀𝑖 assuming that 𝜇=5.0 and 𝜀𝑖𝑁(0,1).
For the historical data we use the same model with 50 observations and 𝜇=6.0. In this example we have used only one parameter 𝜇. Table 1 compares the posterior means with the means of true values for 𝑞𝑝(𝑦𝑖)=𝛽𝑝 at 5 different quantiles, namely, 90%, 75%, 50%, 25%, and 10%. We conduct sensitive analysis with respect to five different choices for (𝛿0,𝜆0) for five different quantiles. For computation we construct a Markov chain via the Metropolis-Hastings (MH) algorithm. We ran the algorithm for 15000 iterations and discarded the first 5000 as burn in. Figures 1, 2, and 3 compare the posterior densities of 𝛽𝑝 for 𝑝=0.90,0.50, and 0.10, respectively, for improper prior with the posterior densities of 𝛽𝑝 for the power prior with parameters (𝜇𝑎0,𝜎𝑎0)=(0.50,0.078) and (𝜇𝑎0,𝜎𝑎0)=(0.99,0.010). Clearly, the power prior is more informative than improper prior, due to the small range of posterior densities.
Note that as shown in Chen et al. [17] it is easier to specify the prior mean and standard deviation of 𝑎0 from the following equations: 𝜇𝑎0=𝛿0𝛿0+𝜆0,𝜎𝑎0=𝜇𝑎01𝜇𝑎01/2𝛿0+𝜆0+11/2.(5.1)
Furthermore they have shown that the investigator must choose 𝜇𝑎0 small if he/she wishes low weight to the historical data and must choose 𝜇𝑎00.5 if he/she wishes more weight to the historical data.
In this example we use power prior (2.8), taking uniform prior for 𝛽𝑝 and beta prior for 𝑎0. Under specific quantile level, we see that as the weight for the historical data increases the posterior mean of 𝛽𝑝 increases. This is a comforting feature because it is consistent with what we expect from the data. This implies that the posterior mean for the parameters of interest is quite robust for the different weights for power parameter.
More noticeably, when (𝛿0=100,𝜆0=1), that is, we give more weight to the historical data, we see that the posterior mean is very close to the mean of the true values. In addition, under specific quantile level, we found that as the weight for the historical data increases the standard deviation tends to decrease.

Example 5.2. For a mixture representation with scale parameter, we simulate two data sets, the first one for the current study and the second for the previous study. For the current study we generate a data set of 𝑛=50 observations from the model 𝑦𝑖=𝛽0(𝑝)+𝛽1(𝑝)𝑥𝑖+1/11(11+𝑥𝑖)𝜀𝑖, where 𝑥𝑖 are random uniform numbers on the interval (0, 10) and 𝜀𝑖𝑁(0,1). We restricted 𝛽0(𝑝)=10 and 𝛽1(𝑝)=1. For the previous study we generate 𝑛0=150 observations from the above model with 𝛽0(𝑝)=9 and 𝛽1(𝑝)=1.2.
We use initial prior 𝑁(0,106) on all regression parameters and Γ(103,103) on all scale parameters. Then we ran MCMC algorithm for 11000 iterations and discarded the first 1000 as burn in. We then compute the posterior means of the parameters at 5 different quantiles, namely, 90%, 75%, 50%, 25%, and 10%. We conduct sensitive analysis with respect to five different weights for the power parameter, namely, 10%, 25%, 50%, 75%, and 90%. The results are summarized in Table 2. Based on the results in Table 2 for each quantile, it is consistent in the sense that the posterior mean of 𝛽𝑝 either increases or decreases steadily as the weight of the historical data increases. Under specific quantile level, we also found that as the weight for the historical data increases the posterior standard deviations for all parameters tend to decrease.

Example 5.3. We consider data from the British Household Panel Survey. The data were originally collected by the ESRC Research Centre on Microsocial Change at the University of Essex and analyzed by Yu et al. [22]. The data represent the wage distribution among British workers between 1991 and 2001. We use the data for the year 2000 as current data and for 1994 as historical data. Four covariates and intercept are included in the analysis. The relation between response variable and covariates are given by the following model: 𝑌ln𝑖=𝛽0+𝛽1𝑆𝑖+𝛽2𝐸𝑖+𝛽3𝐸2𝑖+𝛽4𝐷𝑖+𝜀𝑖,(5.2) where 𝑆𝑖 is the number of years of schooling, 𝐸𝑖 is the potential experience (approximated by the age minus years of schooling minus 6), and 𝐷𝑖 is equal to 1 for public sector workers and 0 otherwise. In this example we fixed the power parameter at five weights, namely, 0.10, 0.25, 0.50, 0.75, and 0.90. The results are summarized in Table 3. From Table 3, we see that as the weight for the historical data increases, the posterior mean for each regression coefficient either decreases or increases. We also found that as the weight for the historical data increases, the posterior standard deviations for all parameters tend to decrease.

6. Discussion

In this paper, we have demonstrated the use of power prior in Bayesian quantile regression that incorporates both historical and current data. The advantage of the method is that the prior distribution is changing automatically when we change the quantile. Thus, we have prior distribution for each quantile, and the prior is proper. In addition, we proposed joint prior distributions using a mixture of normal representation of the asymmetric Laplace distribution. The behavior of the power prior is clearly quite robust with different weights for power parameter. We use random power parameter in the first example that can be determined via the hyperparameters of beta distribution, and we compare the posterior mean of the intercept with the mean of true values. In the second example we show the behavior of the power prior distribution when the power parameter is a fixed parameter and can be determined using expert beliefs or via a meta-analytic approach, and we compare the posterior mean of parameter of interest with the mean of true values for both studies. In the third example, we also use fixed power parameter, and we compare the posterior mean for different weights for the historical data. The power prior is a very useful class of informative prior distribution for Bayesian quantile regression. It also seems to be useful in many applications such as model selection and carcinogenicity studies.

Appendix

Proof of Theorem 3.1

To prove the joint prior distribution is proper prior, that is, 0<10𝐿𝛽𝑝𝐷0𝑎0𝑎𝛿0011𝑎0𝜆01𝑑𝑎0𝑑𝛽𝑝<,(A.1) note that10𝐿𝛽ln𝑝𝐷0𝑎0𝑎𝛿0011𝑎0𝜆01𝑑𝑎0𝑑𝛽𝑝=𝑛0𝑖=1𝑦0𝑖𝑥0𝑖𝛽𝑝𝑝𝐼{𝑦0𝑖𝑥0𝑖𝛽𝑝}𝑑𝛽𝑝10𝑎0𝑑𝑎0+10𝑎ln𝛿0011𝑎0𝜆01𝑑𝑎0𝑑𝛽𝑝=lnexp𝑛0𝑖=1𝑦0𝑖𝑥0𝑖𝛽𝑝𝑝𝐼{𝑦0𝑖𝑥0𝑖𝛽𝑝}12𝑑𝛽𝑝+𝐾,(A.2) where 𝐾=10𝑎ln𝛿0011𝑎0𝜆01𝑑𝑎0𝑑𝛽𝑝.(A.3)

Then 10𝐿𝛽𝑝𝐷0𝑎0𝑎𝛿0011𝑎0𝜆01𝑑𝑎0𝑑𝛽𝑝=𝐾1exp2𝑛0𝑖=1𝑦0𝑖𝑥0𝑖𝛽𝑝𝑝𝐼{𝑦0𝑖𝑥0𝑖𝛽𝑝}𝑑𝛽𝑝.(A.4) Following Yu and Moyeed [3], this integral is finite:0<10𝐿𝛽𝑝𝐷0𝑎0𝑎𝛿0011𝑎0𝜆01𝑑𝑎0𝑑𝛽𝑝<.(A.5)

Acknowledgments

The authors wish to thank Professor Tomasz J. Kozubowski and two anonymous referees for helpful comments and suggestions, which have led to an improvement of this paper.