Quantile Regression and Beyond in Statistical Analysis of the DataView this Special Issue
Research Article | Open Access
V. F. Miranda-Soberanis, T. W. Yee, "New Link Functions for Distribution–Specific Quantile Regression Based on Vector Generalized Linear and Additive Models", Journal of Probability and Statistics, vol. 2019, Article ID 3493628, 11 pages, 2019. https://doi.org/10.1155/2019/3493628
New Link Functions for Distribution–Specific Quantile Regression Based on Vector Generalized Linear and Additive Models
In the usual quantile regression setting, the distribution of the response given the explanatory variables is unspecified. In this work, the distribution is specified and we introduce new link functions to directly model specified quantiles of seven 1–parameter continuous distributions. Using the vector generalized linear and additive model (VGLM/VGAM) framework, we transform certain prespecified quantiles to become linear or additive predictors. Our parametric quantile regression approach adopts VGLMs/VGAMs because they can handle multiple linear predictors and encompass many distributions beyond the exponential family. Coupled with the ability to fit smoothers, the underlying strong assumption of the distribution can be relaxed so as to offer a semiparametric–type analysis. By allowing multiple linear and additive predictors simultaneously, the quantile crossing problem can be avoided by enforcing parallelism constraint matrices. This article gives details of a software implementation called the VGAMextra package for R. Both the data and recently developed software used in this paper are freely downloadable from the internet.
Much of modern regression analysis for estimating conditional quantile functions may be viewed as starting from Koenker and Bassett , who offered a systematic strategy for examining how covariates influence the entire response distribution. The fundamental idea is based on the linear specification of the th quantile function and finding that solves the optimization problemfor independent and identically distributed (i.i.d.) observations from a family of linear quantile regression models, say . Equation (1) can be reformulated as a linear programming problem using the piecewise linear function for . More details can be found in Koenker .
In the spirit of quantile regression, the conditional distribution is usually unspecified, although it relies on normal–based asymptotic theory that is used for inference, whilst the assumption of homoskedasticity of the error terms is dropped. In this paper we use an alternative approach of conditional–quantile regression based on assuming a prespecified distribution for the response. Parametric quantile regression has some advantages over many nonparametric approaches, including overcoming the quantile crossing problem. Two examples are Noufaily and Jones  which is based on the generalized gamma distribution and generalized additive models for location, scale, and shape (GAMLSS; ). Further examples are the LMS-BCN method involving the standard normal distribution and a three–parameter Box–Cox transformation  and the classical method of quantile regression based on the asymmetric Laplace distribution (ALD).
Our approach uses the vector generalized linear and additive model (VGLM/VGAM; [6, 7]) framework. We develop new link functions, , for the quantile regression modelfor a vector of quantiles . Our methodology relies on the prespecification of the distribution . We will also show that the quantile crossing problem can be overcome by this modelling framework. Equations (2)–(3) state that the conditional distribution of the response at a given value of has a distribution involving a parameter and that the transformed quantile of the distribution becomes a linear predictor of the form (5). This can be achieved by defining link functions that connect (3) to (5). The reason for the linear predictors is that generalized linear modelling  is a very well-established method for regression modelling. GLMs are estimated by iteratively reweighted least squares (IRLS) and Fisher scoring, and this algorithm is also adopted by VGLMs and VGAMs.
The method presented in this paper differs from conventional quantile regression  in that we assume is known whereas the conventional case does not but use an empirical method instead to obtain the quantiles : the expectation of the check function results in the property which defines the -quantile ( is the cumulative distribution function (CDF) of ). In this paper we consider the s listed in Table 2.
1.2. VGLMs and VGAMs
VGLMs/VGAMs provide the engine and overall modelling framework in this work—the VGAM R package described below fits over 150 models and distributions—therefore we only sketch the details here. VGLMs are defined in terms of linear predictors, , as any statistical model for which the conditional density of given a –dimensional vector of explanatory variables, has the formfor some known function , with , a matrix of unknown regression coefficients. Ordinarily, for an intercept.
In general, the of VGLMs may be applied directly to the parameters, , of any distribution, transformed if necessary, as the th linear predictorwhere is a VGLM–parameter link function, as in Table 1 (see  for further choices) and is the th element of . Prior to this work the were ‘raw’ parameters such as location, scale, and shape parameters; however, in this present work we define them to be quantiles or a very simple function of quantiles.
This is the VGLM–link for the mean function of the logarithmic distribution.
qgamma() is the quantile function of the standard gamma distribution in R.
‡, with erf() denoting the error function.
In matrix form one can write where , . Sometimes, for some , it may be required to model as intercept–only, that is, , and for .
VGAMs are a nonparametric extension of VGLMs, that is, (6) is generalized towith . Usually the component functions are estimated by splines. Here, are known full–column rank constraint matrices, and is a vector of unknown intercepts. With no constraints at all, (the order- identity matrix). For VGLMs, the are linear so that, cf. (6),The can enforce a wide range of linear constraints such as parallelism and exchangeability.
VGLMs are estimated by maximum likelihood performed by IRLS using the expected information. The VGLM log–likelihood is given byfor known fixed positive prior weights , and a Newton–like algorithm for maximizing (9) has the form , where is the overall expected information matrix (EIM), is the score vector, and is the iteration number. The vector is obtained as the solution of the generalized least squares problem , where the quantity minimized at each IRLS iteration is the weighted (or residual) sum of squares, RSS =The () are known as the working weight matrices and they have th element given byThe use of individual EIMs instead of observed information matrices means that Fisher scoring is used rather than the Newton–Raphson algorithm.
VGAMs are also estimated by IRLS, where the difference with respect to VGLMs is that a vector additive model is now fitted to the pseudo–response with explanatory variables and working weight matrices at each IRLS iteration. Two approaches are currently used by VGAM to estimate the component functions : regression splines and vector smoothing methods with vector backfitting. Rudimentary P-splines  are almost operational, albeit this work is not yet complete. Compared to VGLMs, the VGAM log–likelihood includes a penalty if used with vector smoothing splines. In VGAM the objective function maximized isHere, the are nonnegative smoothing parameters, and are endpoints covering the values of each covariate. The basic penalty approach adopted here is described in Green and Silverman Green and Silverman .
Let be a 1-parameter statistical model as in (4) parametrized by for some parameter space residing in . Also let be the corresponding quantile function with . Crucially, note that (5) handles suitable transformations of in the linear predictor by parameter link functions. In contrast our proposal focusses on directly modelling via a smooth and one–to–one function , in the form ofwhich is to be incorporated in the VGLM/VGAM log–likelihood, namely, (9) and (12). Here, is a prespecified vector of quantiles of interest. Examples of (13) are , , and .
Equation (13) is central to this work. It allows modelling choices via for the quantile function , and it represents a new modification to the VGLM/VGAM framework. Note that resembles a link function within the VGLM/VGAM framework as in Table 1. Two notes: first, without any loss of generality, (13) can be seen (strictly) as a function of since the quantiles and the covariates are known. Secondly, is monotonic and one–to–one, as a result of the composite of and which also hold such properties. However, during the fitting process, the IRLS algorithm internally requires the inverse of . Working with 1–parameter distribution at this stage eases the implementation via Fisher scoring because the inverse can be derived manually and then incorporated in the IRLS algorithm. In a few cases, the inverse of does not have a closed form, such as the 1–parameter gamma distribution, and an alternative iterative method is employed to approximate . To achieve this efficiently, two choices are available. These are (a) newtonRaphson.basic() from VGAMextra and (b) VGAM::bisection.basic(), two vectorized implementations of the well-known Newton–Raphson and bisection algorithms, to solve the roots of a real-valued function in a given interval . Further details are given in Section 2.2.
One advantage of this work is that the VGLM/VGAM framework can circumvent the quantile crossing problem (e.g., [2, 11], Sect. 2.5) by choosing and (an -vector of ones). Under this parallelism assumption the method borrows strength across the entire data set so that the additive models for with respect to are parallel. Each family function in Tables 2 and 3 has a parallel argument which is FALSE by default. Using the syntax of VGAM based on Chambers and Hastie , setting parallel = TRUE (or parallel = FALSE ~ 1) results in and ; i.e., it is false for only the intercept.
denotes the inverse of the logit() transformation.
It is noted that for some distributions such as the exponential and Maxwell the are naturally parallel with respect to because has the form . If this is so then only the intercepts will change as a function of and the MLEs for are the same. Other distributions such as the 1-parameter gamma do not possess this property, and then it is necessary to constrain to avoid the quantile crossing problem.
2.1. Two Derivations
Ideally the link transforms the support of to because should be unbounded. The three most common cases are as follows. For a log link is recommended, for a logit link is a good choice, and means that an identity link is natural. These cases have been implemented for seven 1–parameter distributions. The selection of the function for each is shown in the 5th column of Table 2, whilst the resulting quantile links as functions of are shown in the last column.
We now describe the quantile links for the exponential and the Topp–Leone distributions as examples. Firstly, for , with a rate parameter , the density and CDF are given by and . With a slight change in notation, the quantile function is given by , i.e.,which lies in regardless of the values of and . Given that values of are known (prespecified by the user), (14) becomes a function of . Thus, the new quantile link for the exponential distribution as shown in Table 2 is simply obtained by taking as the logarithmic transformation, as follows:This quantile link has been implemented in VGAMextra via the function expQlink(), as shown in Table 3. Its inverse (denoted as ) can be manually obtained from the inverse of (15). Note that the corresponding family function (exponential()) implemented in VGAM includes a (known) location parameter , which gives the density . By default , and it is handled by the argument location.
Secondly, consider the Topp–Leone distribution whose support is andwith . Here, . To verify this restriction note that , for any shape parameter , and hence for any ,Thus, to allow the quantile function to be modelled by covariates, we take the logit transformation as . The resulting quantile link for this distribution is simply , shown in Table 2. The distribution has CDF for , and density . The quantile function derives from solving the equation , for , which leads to the quadratic equation . The solution must lie in and is in fact (16), as a function of . The family function topple() from VGAM estimates , where the default link is .
2.2. Software Implementation
For practical use by others, we have implemented seven VGLM–quantile links, in the R package VGAMextra. They are summarized in Table 2. The package VGAM is a requirement of VGAMextra because the modelling functions vglm() and vgam(), and all but the last family function of Table 2, reside there. For this paper VGAMextra 0.0-2 and VGAM 1.1-0 or later are required; they are available at www.stat.auckland.ac.nz/~vmir178 and www.stat.auckland.ac.nz/~yee/VGAM/prerelease/ whilst older versions of both are available on CRAN (http://CRAN.R-project.org).
One special case is gamma1Qlink(), for the 1–parameter (shape) gamma distribution, defined aswhose primary arguments are and . Its inverse (Table 3) does not admit a closed form and it is approximated by the function VGAM::newtonRaphson.basic(), a vectorized implementation of the Newton–Raphson algorithm. Almost all implementations elsewhere of this are for a scalar argument, but we operate on vectors of length . It works as follows. Our data is effectively , whilst the quantiles of interest, or , must be entered by the user. The shape parameter is estimated by IRLS and therefore it is available at each iteration. Thus, for each , the ‘inverse’ is given by the root, , of the function
Finally, the inverse of all the VGLM–quantile links is shown in Table 3, as well as the name of the corresponding implementation in VGAMextra. The inverse–links are required at different stages of the IRLS by Fisher scoring, which internally switches between (namely, Table 2) and (namely, Table 3). Specifically, the algorithm requires the score vector and the EIMs at each IRLS iteration, which are given by the following chain–rule formulas:
Internally, the functions utilized to compute the inverse are VGAM::eta2theta() or VGAM::theta2eta(). The VGAMextra Manual and Miranda-Soberanis  give further details about the derivation of the quantile links, whilst Yee  describes in the IRLS and Fisher scoring algorithms for estimating VGLMs and VGAMs. Complements at the second author’s homepage give additional details on link functions.
2.3. Software Use
For the user, this methodology runs as usual by calling the modelling functions VGAM::vglm() and VGAM::vgam(), except for two modifications that are described below.
To start with, we give the following output that shows the central arguments handled by VGAM::vglm():
The first adjustment takes place with the argument formula, a symbolic description of the model to be fit. Usually, an expression like y ~ x2 + x3 should suffice for a response y and covariates x2 and x3. This effectively works for univariate and even for multiple responses say y1, y2, and y3, where the only change is to set cbind(y1, y2, y3) ~ x2 + x3. Here, the right hand side (RHS) of the formula is applied to each linear predictor.
For quantile modelling using VGLMs and VGAMs, Q.reg() must be incorporated in the formula, whose arguments are shown in Table 4. For a given set of quantiles of interest, entered through , Q.reg() replicates the response matrix Y into columns, where denotes the number of columns of Y. Then, the RHS of the formula applies to every set of columns according to the number of quantiles of interest. Ordinarily the response is a vector so that and .
As an example, suppose that we have two responses and sampled from a prespecified distribution , as per Table 3, and the quantiles of interest are . Then Q.reg(cbind(Y1, Y2), pvector = p) will return a matrix with six columns, with the first three columns being , one for each quantile, and similarly the last three columns equal . Thus vglm() handles this model as a multiple responses fit.
The second adjustment is related to the argument family, a function that describes the statistical model to be fitted. Each family has at least one argument for the link functions to be used in the fitting process (the name changes from family to family). For example, for VGAM::exponential() this is called link, whilst for the family function VGAM::benini1() (see the third column of Table 3), it is called lshape. When VGLM–quantile modelling is to be performed, the corresponding link (last column of Table 3) must be entered into the family accordingly. All the quantile links manage the same arguments, including p, the vector of quantiles, except by benini1Qlink() which has the additional argument y0.
With both modifications, a typical call has the following form:
Further fitting variants can be incorporated here, e.g., categorical covariates and the use of smoothers such as regression splines. These and a few other features are illustrated in the following section.
3.1. Maxwell Data
We use simulation to generate random variates from a Maxwell distribution whose rate parameter is a function of a single covariate . To account for a nonlinear trend in the dataset, additive models with cubic smoothing splines appear to be a better choice over linear schemes such as with VGLMs. In this example we perform the following steps to confirm the performance of the methodology.(1)Generate random deviates from the Maxwell distribution.(2)Run conditional VGAM–quantile modelling using maxwellQlink() based on the VGAM family function VGAM::maxwell(), which estimates the Maxwell distribution by Fisher scoring.(3)Perform ordinary quantile regression using VGAM::alaplace1() that estimates the 1–parameter ALD by Fisher scoring. Here, the special argument tau will be employed.(4)Plot the artificial data with the estimated quantile functions, (from (2)), and the estimated quantile curves (from (3)) superimposed.
We will consider the quantiles 25%, 50%, and 75% for simplicity, so that .
Regarding (1), the data is generated by VGAM::rmaxwell(), which gives random deviates from the Maxwell distribution whose density is . We use the rate functionwhere , . The following code chunk sets things up and the dataset is saved as maxdata.
The following code chunk performs steps (2) and (3). Note the fitting of additive models via VGAM::vgam() with smooth terms defined by VGAM::s() where is to be smoothed. To compare both fits, they are saved in fit.Qmodelling (from (2)) and fit.Qregression (from (3)).
Figure 1 shows the simulated data, the estimated quantile functions, and the fitted quantile curves from fit.Qmodelling and fit.Qregression, obtained from vector smoothing spline fits . The results are similar for , but our present work performs better at the bottom LHS tail. The data coverage from each modelling framework is summarized in Table 5. Once again our work outperforms the ALD method.
We conclude with a few remarks.(1)The argument p is available for all quantile links in Table 3 and not only for maxwellQlink(). It can be assigned any vector of percentile values.(2)Under the conditional VGAM–quantile modelling framework, the arguments to handle the parallelism assumption such as the arguments parallel.locat and parallel.scale in family functions are no longer required. This is internally managed by the new quantile links rather than being managed by the family function.(3)If fit is a Qlink fit, then fitted(fit) returns the fitted quantiles. This is in the form of a matrix. Similarly, predict(fit) returns a matrix where the th row is .
3.2. Comparison with the Quantreg Package
For checking purposes, the results are compared with quantreg too. Figure 2 gives the results based on the following code.
The results should be the similar to Section 3.1 because the ALD and the classical quantile regression method are essentially the same. It can be seen that the bottom LHS corner is not modelled well with quantreg either. Once again our method performs best, which is not surprising given the strong distributional assumption.
3.3. Exponential Data
Feigl and Zelen  fit an exponential distribution to a data set comprising the time to death (in weeks) and white blood cell counts for two groups of leukaemia patients, and a binary variable for AG-positive and AG-negative. The two groups were not created by random allocation. The variable AG is the morphological variable, the AG factor; a numeric vector where 1 means AG-positive and 2 means AG-negative. We create AG01 which is AG - 1. We take the log of the white blood cell count (WBC) because it is very highly skewed. The data are found in GLMsData on CRAN, which supports Dunn and Smyth .
One benefit of quantile modelling with VGLMs is that it easily allows comparisons of the effect of AG01 or any other indicator variable, at different quantiles. First note that, for AG-positive patients with logWBC = 9, the 25% percentile for time to death is around weeks, whilst the 75% percentile is about weeks. Secondly, the coefficient of AG011 measures the influence of the AG factor on the time to death. Keeping the levels of WBC constant, for patients at either the 25% or 75% percentiles, the time to death for AG–negatives compared to AG–positives is multiplicative by a factor of , i.e., a % reduction in lifetime.
For further illustration’s sake, we fit a 1-parameter gamma distribution to these data and interpret the results. Unlike the Maxwell and exponential distributions, where simple mathematics shows that different quantiles are parallel because their logarithm is additive with respect to , the 1-parameter gamma does not possess this property.
Here, keeping the level of WBC constant, for patients at the 25% percentile, the time to death for AG–negatives compared to AG–positives is multiplicative by a factor of , i.e., a % reduction. In comparison, for patients at the 75% percentile, the time to death for AG–negatives compared to AG–positives is multiplicative by a factor of , i.e., a % reduction. This suggests that the effect of AG is greater for more severe cases than those who live longer in general.
Finally, just to check, we obtain the constraint matrices for each predictor:
There is a parallelism assumption made for logWBC but not for any of the other explanatory variables.
4. Discussion and Future Work
This work in parametric quantile regression is blighted by the strong assumption of the assumed distribution. In theory, this might be ameliorated somewhat by implementing as many distributions as possible. Some of the distributions listed in Table 2 have real applications, for example, in kinetic-molecular theory the speed of individual molecules of idealized gases follows the Maxwell distribution and the average kinetic speed is directly related to Kelvin temperature. In experiments that do not satisfy the various postulates made (such as the effects of the container) one might model the median particle speed with comprising temperature and other covariates such as volume of the container and density of the container walls. Different forms of gases, such as plasmas and rarefied gases, could be modelled as such too. Another example is the Rayleigh distribution which is similar to the Maxwell distribution. In two-dimensions, and in applications of magnetic resonance imaging (MRI), complex images are often viewed in terms of the background data, which is Rayleigh distributed. Nonstandard background information could be included in and their effects on the distribution examined.
In the current software implementation there are limitations due to its internal design. For example, it would be good if
worked like many other VGAM models. The difficulty here is that the @linkinv S4 slot of a VGAM family function has eta as an argument, and in our implementation this could only possibly be created by supplying the new percentiles to predict() beforehand.
Another minor deficiency in our software implementation is that the response vector is replicated times so that is a form of recycling. Possibly this could be avoided because the memory requirement might be excessive when either or are very large.
At present, the VGAM framework has infrastructure to afford 1–parameter quantile links. For quantile functions depending on 2 or more parameter, such as the two-parameter gamma distribution, the quantiles will be bivariate functions whose inverse would probably not admit a closed form. Nevertheless, future work includes being able to write links for two-parameter distributions, of which the normal distribution would be the most important. For this, the methodology behind Yee and Miranda-Soberanis  could be employed; they solve a decades-old problem implementing the two-parameter canonical link function of the negative binomial distribution. We have already commenced work in this direction, e.g., with the 2-parameter gamma distribution.
The data used to support the findings of this study are available in the Supplementary Materials.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
VM’s work was supported by University of Auckland Doctoral Scholarship.
The file is a text file that comprises R commands that can be copied into R. The file contains information about what R packages need to be installed and loaded to run properly. There are no copyright issues. Some of the data is contained in the R packages, and some of the data is simulated data. (Supplementary Materials)
- R. Koenker and G. Bassett Jr., “Regression quantiles,” Econometrica, vol. 46, no. 1, pp. 33–50, 1978.
- R. Koenker, Quantile Regression, vol. 38, Cambridge University Press, New York, USA, 2005.
- A. Noufaily and M. C. Jones, “Parametric quantile regression based on the generalized gamma distribution,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 62, no. 5, pp. 723–740, 2013.
- R. A. Rigby and D. M. Stasinopoulos, “Generalized additive models for location, scale and shape,” Journal of Applied Statistics, vol. 54, no. 3, pp. 507–554, 2005.
- T. J. Cole and P. J. Green, “Smoothing reference centile curves: the LMS methods and penalized likelihood,” Statistics in Medicine, vol. 110, no. 5, pp. 1305–1319, 1992.
- T. W. Yee, Vector Generalized Linear and Additive Models with an Implementation in R, Springer, New York, USA, 2015.
- T. W. Yee and T. J. Hastie, “Reduced-rank vector generalized linear models,” Statistical Modelling. An International Journal, vol. 3, no. 1, pp. 15–41, 2003.
- J. Nelder and R. Wedderburn, “Generalized linear models,” Journal of the Royal Statistical Society, vol. 1350, no. 3, pp. 370–384, 1972.
- P. H. C. Eilers and B. D. Marx, “Flexible smoothing with B-splines and penalties,” Statistical Science, vol. 11, no. 2, pp. 89–121, 1996.
- P. J. Green and B. W. Silverman, Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, Chapman & Hall, London, UK, 1994.
- X. He, “Quantile Curves without Crossing,” The American Statistician, vol. 51, no. 2, pp. 186–192, 1997.
- J. M. Chambers and T. J. Hastie, Statistical Models in S, Chapman & Hall, New York, USA, 1993.
- V. Miranda-Soberanis, Vector Generalized Linear Time Series Models with an Implementation in R [Ph.D. thesis], Department of Statistics, University of Auckland, New Zealand, 2018.
- T. W. Yee and C. J. Wild, “Vector generalized additive models,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 3, pp. 481–493, 1996.
- P. Feigl and M. Zelen, “Estimation of exponential survival probabilities with concomitant information,” Biometrics, vol. 21, no. 4, pp. 826–838, 1965.
- P. K. Dunn and G. K. Smyth, Generalized Linear Models with Examples in R, Springer, Springer, New York, 2018.
- T. W. Yee and V. F. Miranda-Soberanis, “Vector generalized linear models and negative binomial regression,” Australian & New Zealand Journal of Statistics, 2018.
Copyright © 2019 V. F. Miranda-Soberanis and T. W. Yee. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.