Abstract

Associated longitudinal response variables are faced with variations caused by repeated measurements over time along with the association between the responses. To model a longitudinal ordinal outcome using generalized linear mixed models, integrating over a normally distributed random intercept in the proportional odds ordinal logistic regression does not yield a closed form. In this paper, we combined a longitudinal count and an ordinal response variable with Bridge distribution for the random intercept in the ordinal logistic regression submodel. We compared the results to that of a normal distribution. The two associated response variables are combined using correlated random intercepts. The random intercept in the count outcome submodel follows a normal distribution. The random intercept in the ordinal outcome submodel follows Bridge distribution. The estimations were carried out using a likelihood-based approach in direct and conditional joint modelling approaches. To illustrate the performance of the model, a simulation study was conducted. Based on the simulation results, assuming a Bridge distribution for the random intercept of ordinal logistic regression results in accurate estimation even if the random intercept is normally distributed. Moreover, considering the association between longitudinal count and ordinal responses resulted in estimation with lower standard error in comparison to univariate analysis. In addition to the same interpretation for the parameter in marginal and conditional estimates thanks to the assumption of a Bridge distribution for the random intercept of ordinal logistic regression, more efficient estimates were found compared to that of normal distribution.

1. Introduction

Many longitudinal studies are designed so that more than one response variable is recorded for the same subject. Thanks to various statistical tools, different types of outcomes such as count, nominal, and continuous can be analyzed jointly. For instance, patients after coronary artery bypass graft surgery are monitored several times a day, and their vital signs and bleeding condition are checked. In the case of kidney transplant data which is well explained in the following sections, the frequency of acute kidney transplant rejection as the count variable and estimated glomerular filtration rate as an ordinal variable are followed up for one year after the transplantation. In addition, the impact of several predictive variables such as patients’ diabetes condition, antithymocyte globulin, urinary tract infection, and hypercalcemia on the response variables were assessed.

Univariate analysis of longitudinal outcomes has been developed by many authors during the last decades. The within-subject effect can be taken into account using several approaches such as covariance pattern model in generalized estimating equations or a random effect in subject-specific models [13]. Copula-based approaches have been utilized to account for the serial dependence of repeated observations [4, 5]. Various methods have been developed for jointly modeling longitudinal and survival data [6]. Kassahun et al. used generalized linear mixed models to model two longitudinal outcomes simultaneously. They considered weight and days of illness as the continuous and overdispersed count responses, respectively, [7]. Seyoum et al. considered the determinants of CD4 cell count change and adherence to highly active antiretroviral therapy among HIV adult patients, and they utilized a generalized linear mixed model to determine joint predictors of two longitudinal response variables over time. The adherence and CD4 cell were considered as ordinal and count outcomes respectively [8]. Joint modeling of discrete outcomes has been developed and discussed by Molenberghs and Verbeke. In addition to several approaches for jointly modeling two longitudinal responses, the association among the repeated measurements and between the two longitudinal processes can be considered via correlated random effects [9].

Copulas are widely used to integrate separate univariate regression models into a joint regression analysis of response variables [10, 11]. Copulas are very useful to combine two or more response variables from different types such as continuous, ordinal, and count [11, 12].

Several modeling approaches have been introduced to model clustered and hierarchical settings of response variables such as ordinal and overdispersed and/or zero inflated longitudinal count outcomes [1315]. For count data, a Poisson distribution with a log-linear link function is commonly assumed. In the case of inequality of mean and variance of count response due to the extra heterogeneity, overdispersion would be considered in the modeling process. Negative binomial is a common choice to address this issue [16]. One of the key points in negative binomial regression, compared to other methods for analyzing overdispersed data, is that the mean is a single parameter. The negative binomial regression can be an extension of the Poisson-gamma model in which the mean in the Poisson distribution follows a gamma distribution in order to take the overdispersion into account [17, 18]. In the case of inflated zeros in the data, zero-inflated models are used [1, 19]. Analyzing methods for binary data have been extended to nominal and ordinal categorical outcomes [20].

Most commonly, a normally distributed random effect is considered for the ordinal outcome. In the case of a binary response, Wang and Louis showed that assuming a Bridge distribution for the random intercept in a logistic mixed model forces the fixed effects to have a marginal similar to conditional interpretation (conditional on the random intercepts) [21]. This idea was then applied by Lin et al. for evaluating the association between binary and continuous clustered data [22]. Bridge distribution is a useful technique for conditional assessment while allowing for meaningful marginal regression effects, especially for logistic regression. The key advantage of using Bridge distribution is that it allows the marginal probability of the binary response, integrated over the random intercept, to have a logistic structure with an odds ratio interpretation of the marginal regression effect. The regression parameters in the marginal logistic regression model are proportional to the corresponding regression parameters in the subject-specific conditional logistic model.

In this study, we aim to combine an ordinal and a possibly overdispersed count response variables by combining their correlated random intercepts. We assume a Bridge distribution for the ordinal outcome random intercept which is an extension of a hierarchical binary outcome. A Gaussian copula is used to integrate the two random intercepts [12]. This model can be fitted using commercially available statistical software, including the NLMIXED procedure of the Statistical Analysis System (SAS).

2. Materials and Methods

2.1. Notations and Distributions

For subject at occasion , let represent the count response and is the ordinal response variable which can take the values . The count response variable follows a Poisson distribution. When overdispersion is present, a negative binomial distribution with overdispersion parameter () and success probability () is used instead. The dispersion statistic can be determined as the ratio of the Pearson to its degrees of freedom and is a common method used to estimate overdispersion [23]. It has been argued that overdispersion exists if the dispersion statistic is over 1.2 and negative binomial outperforms simple Poisson model [24]. A logarithmic function is used to link the expected count response () to the systematic component () where is the matrix of independent variables and is the vector of corresponding coefficients. The random intercept of count response submodel () follows a normal distribution (mean zero and variance). The ordinal response variable is assumed to follow multinomial distribution and is linked to a linear function of covariates () via a traditional logit function. The random intercept of the ordinal response submodel () follows a Bridge or normal distribution (mean zero and variance). In the case of two associated response variables, a correlation parameter () takes the association between the random intercepts into account. Details about the notations have been prepared in the Appendix.

2.2. Model Specification and Estimation

The two associated submodels are shown in equation (1) in which the mean count and ordinal response variables are linked to their systematic components through logarithm and logit functions. Let, then the model can be specified as follows:

Bridge and normal distributions are considered for the random intercept of the ordinal and count outcome submodels. A copula approach is used in the case of different distributions to combine the random intercepts. The Bridge distribution proposed by Wang and Louis [21] allows both the conditional and marginal probabilities of the binary response to follow a logistic structure. Here, we have extended this assumption to an ordinal response variable. The Bridge distribution is well illustrated by Wang and Louis [21].

2.3. Bridge Distribution

Let be the cumulative Bridge distribution for subject-specific random effect with the attenuation parameter and let be an inverse link function with the characteristics of being monotone, increasing, and twice differentiable.

In equation (2), is an unknown constant parameter and is equal to zero when the link unction is symmetric. To find the Bridge function, we need to use Fourier transformation and convolution operation. After differentiating and applying the convolution operation and Fourier transformation to equation (2), one can determine the general density function of Bridge distribution for any link function as shown in [3] where.

In the case of logit link function, the density function of Bridge distribution is as [4]:

The mean and variance of the Bridge distribution are zero and , respectively. The intraclass correlation (ICC) can be determined by 1-.

The bridge density is symmetric with heavier tails and higher kurtosis than the normal density. The conditional and marginal probabilities have the same logistic form. The parameter of the bridge distribution is the same for each subject which assures the exchangeability of subject random effects on the binary responses. More details about the Bridge distribution for binary response variables is well discussed in more details elsewhere [21, 22].

2.4. Gaussian Copula

Among many of copula families, Gaussian copula provides a convenient way to describe a complex relationship [25]. The Gaussian copula function is where standard normal cumulative distribution function. Let and follow a standard bivariate normal distribution so that , and . Therefore, the bivariate distribution function of the random intercepts is

The variance matrix of two response variables can easily be seen to be given by where is the diagonal overdispersion matrix, is the diagonal variance matrix of response variables assuming zero random effects, and (here an identity matrix) is the matrix denoting the correlation among residual errors.

2.5. Likelihood and Estimation

The likelihood function can be written as below where is the standard normal density function:

2.6. Real Data Application

We applied the proposed model to two read datasets in which associated count and ordinal response variables are assessed using several independent variables longitudinally.

2.7. Migraine Data

As a first example, we consider a prospective, two-arm, randomized, triple-blind, placebo-controlled trial in the neurology clinic of Shohadaye-Tajrish hospital, Tehran, Iran [26]. The patients were randomly divided into two equal groups, coriander fruit syrup and control (33 patients in each group). In addition to 500 mg of sodium valproate per day, the patients received either 15 mL of coriander fruit syrup or 15 mL of placebo syrup, three times a day, for a month, according to the code provided by the department of traditional pharmacy in Tehran University of Medical Sciences, Tehran, Iran. The subjects were measured at weeks 1, 2, 3, and 4. Each time, the mean severity of pain was evaluated, on a ten-point visual analog scale (VAS). Severity was categorized into three levels (0-0.30, 0.30-0.60, and 0.60-1) as the ordinal response. Moreover, the frequency of migraine attacks were recorded and used as the count response variable. The severity and frequency of migraine attacks were significantly associated at four time points, and hence, the joint modeling approach was utilized. At the end of each week, patients were referring to the neurology clinic to report the requested items. Time, intervention, and their interaction were considered as the predictive variables.

2.8. Kidney Transplant Data

In this historical cohort study, patients referred to the kidney transplant center of Urmia University of Medical Sciences from 2003 to 2014 were investigated [27]. Two main longitudinal response variables, the frequency of acute kidney transplant rejection as the count variable and estimated glomerular filtration rate as an ordinal variable in 5 levels, were assessed. These variables were recorded every 4 months after the transplantation, during one year. Glomerular filtration rate was estimated from abbreviated prediction equation provided by the Modification of Diet in Renal Disease study (MDRD). The stages were determined using the National Kidney Foundation (NKF) criteria as stage 1 with normal or high eGFR (), stage 2 mild chronic kidney disease (), stage 3 moderate chronic kidney disease (), stage 4 Severe chronic kidney disease (), and stage 5 end stage chronic kidney disease () [28]. The independent and predictor variables were the type of kidney donor (relative/nonrelative), recipient’s age and sex, anemia (yes/no), type of medication (Azathioprine/Cellcept/both/none), diabetes (yes/no), and antithymocyte globulin (yes/no), as well as complications after transplantation, such as proteinuria (yes/no), hyperkalemia (yes/no), hyperuricemia (yes/no), leukopenia (yes/no), myocardial infarction(yes/no), delayed graft function (yes/no), acute tubular necrosis (yes/no), urinary tract infection (yes/no), chronic allograft necrosis (yes/no), dyslipidemia (TG/CHOL), liver dysfunction (BIL-T/BIL-D), and hypercalcemia (yes/no).

2.9. Simulation Settings

A simulation study is conducted to assess the impact of different magnitudes of overdispersion, sample size, and the correlation between the random intercepts, in joint modelling of hierarchical ordinal and count outcome. Moreover, we assessed the difference in performance of the joint model regarding the distribution of (normal vs. Bridge) random intercept in the ordinal logistic regression submodel.

Bivariate normal distribution was used to generate associated normally distributed random intercept with () as the correlation coefficient. Gaussian copula was utilized to generate associated random intercepts where the associated random intercepts in the count and ordinal submodels follow normal and Bridge distributions, respectively. The correlations between the random intercepts () were chosen in three levels as zero (no correlation between the random intercepts), 0.4 (medium correlation between the random intercepts), and 0.8 (high correlation between the random intercepts). A continuous independent variable (time) was generated from uniform distribution (between 1 and 5) in 5 different points (5 occasions) so that the occasions differ for each subject, and time intervals are not equal. A binary independent variable (group) was generated from binomial distribution with 0.5 as the probability. The count response variable was generated from a negative binomial distribution with the probability equal to the exponential of its systematic component and three different negative binomial parameter values (1 as high, 5 as medium, and 150 as low overdispersion). The ordinal response variable with three levels (: threshold 1, : threshold 2) was generated from a multinomial distribution using the probabilities associated with logit link function. The model was fitted on three different sample sizes 50, 200, and 500. The models were specified as

Moreover, and using the same generated datasets as explained above, the simulation results from the joint models were compared with separate univariate models to demonstrate the outperformance of joint modeling approach. The comparisons were made using the sets of data in which the sample sizes were 50 and 200 and the correlations between the random intercepts were chosen as zero, 0.4, and 0.8. Absolute value of bias () and mean square error (MSE) were used to compare the estimated coefficients with their true values.

2.10. Computational Support

The R software version 3.3.1 was used to generate the datasets using several packages including “copula,” “bridgedist,” “MASS,” and “mnormt.” In addition, the NLMIXED procedure in SAS program version 9.2 was utilized to fit the proposed joint models.

3. Results

3.1. Migraine Data

Descriptive statistics of continuous and categorical characteristics of the two groups are described in details elsewhere [26]. The frequency of migraine attacks followed a Poisson distribution without any evidence of overdispersion and zero inflation. The longitudinal joint model was used to combine the frequency and severity of migraine attacks, and the results are shown in Table 1. However, the intensity of migraine attacks has reduced significantly over time; the coriander fruit syrup reduced the odds of higher stages of severity compared to the control group over time. Likelihood ratio test confirmed the use of random intercepts in the model. A significant association was found between the random effects (, ) which confirms the use of a joint modeling approach. Based on the results, the odds of higher categories of migraine attack severity in the intervention group was 0.012 times than that of control group after each week. Moreover, the mean number of migraine attacks after each week in the intervention group was 0.28 less than that of control.

3.2. Kidney Transplant Data

The longitudinal joint model was used to combine the eGFR and frequency of rejections and the results are shown in Table 2. The frequency of rejections was overdispersed. The odds ratio of being in a higher stage of eGFR for one-year-older patient is 1.05 (). Females were less prone to experience a lower stage of eGFR compared to males. Being in a higher stage of eGFR for females was 2.42 times more than men. Patients without chronic allograft necrosis experienced higher stages of eGFR with the odds ratio of 0.44. The results exposes that time, DGF, ATN, and being diabetic did not affect eGFR. The acute rejection of kidney transplantation was assessed by ATG, DGF, UTI, CAN, liver dysfunction, and time. The expected number of rejections for a patient without ATG is 2.63 times the expected number of rejections for a patient with ATG (). In patients without hypercalcemia, the expected number of rejections would increase by a factor of 2.622. A positive correlation between eGFR stages and the frequency of acute kidney transplant rejection was observed. Moreover, the negative binomial parameter was significant. The results are shown in Table 2.

3.3. Simulation Results

Tables 38 show the simulation results. Regarding different sample sizes, different magnitude of overdispersion, and different amount of correlation between the random intercepts, the estimated values were almost close to the true values in various settings. Regarding AVB and MSE, the performance of the proposed joint model with the assumption of Bridge distribution for the random intercept of the ordinal outcome did not differ whether the random effect follows normal or Bridge distributions. The simulation results can be discussed in different aspects. We assumed three different scenarios for sample size, and the simulation results exposed that a sample size of 50 yields to considerable less accurate estimations than those of 200 and 500 subjects. In other words, considering 5 occasions for each subject, a total number of 250 observations was less efficient than that of 1000 and 2500. We considered three levels of dispersion as high, low, and medium. The higher the amount of negative binomial parameter, the bigger the estimated MSE and AVB. However, the large sample size decreases the amount of MSE and AVB in high value (low overdispersion). Generally and considering different scenarios, the magnitude of overdispersion did not affect the accuracy of the estimations after our proposed joint model. The standard deviation of the random intercepts and the estimated correlation coefficients between the random intercepts are very close to their true values with a relatively low MSE and AVB whether the random intercepts followed bivariate normal or Gaussian copula distributions. Moreover, the results were robust in different amount of correlation scenarios.

The results of comparing the joint and univariate models are presented in the Appendix. When the association between the random intercepts is high, the estimated coefficients in the joint modeling approach is significantly more robust than that of univariate approaches. The results when the sample is 50 is less accurate than that of 200. In the absence of correlation between the random intercepts, the estimations are almost the same for joint and univariate methods. For both of the joint and univariate approaches, our proposed model outputs reliable and accurate estimations whether the random intercept of the ordinal outcome was generated from normal or Bridge distributions.

4. Discussion

Longitudinal approaches consider the variation caused by the repeated measurements over the time. Two or more longitudinal response variables are frequently recorded in medical and clinical area. To combine the associated response variables, several approaches have been introduced and applied such as correlated random effects in a generalized linear mixed-effect model [1, 29, 30]. Primary studies were performed by Tate, and consequent researches have well discussed the conditional models as a major approach toward joint methods [31, 32]. Other approaches combine the responses directly [33]. In addition to GLMMs, the extension of Placket-Dale model to other mixed responses is straightforward and is well described elsewhere [29]. Besides, the generalizability of GLMM makes the extensions possible to other settings of combined discrete and continuous outcomes [1]. This article combined the ordinal and count outcomes through a bivariate distribution of their random intercepts. We assumed that the random intercept of the ordinal logistic regression follows Bridge distribution. Later, the normally distributed random intercept of the Poisson (negative binomial in case of overdispersion) was combined with the random intercept from the ordinal logistic submodel to take the association between the outcomes into account. Moreover, most of the studies assume that the random effects in a mixed model follow normal distribution, and we let the random intercepts follow Bridge and normal distributions. We compared the results of Bridge distribution with the frequently used assumption of normal. However, many of other distributions can be assumed based on the aim of research. The idea of using Bridge distribution for the random intercept of a longitudinal ordinal outcome can be used for both univariate and multivariate purposes.

In the case of a binary response variable, it had been shown that integrating over a normally distributed random intercept in a logistic model does not result in a closed form [22]. To make similar subject-specific and population average interpretations in terms of odds ratio, Bridge distribution was introduced [21]. In this paper, we have extended the use of Bridge distribution for the random intercept of binary logistic model to an ordinal proportional-odds cumulative logit model. Extra thresholds are estimated in the ordinal logistic model compared to the case of binary. The ordinal logistic regression requires some attentions due to its multiple categories in comparison to the case of binary. In the current study, the coefficients are interpreted in terms of proportional-odds cumulative logit model which is popular due to its relation to the concept of a continuous latent response variable [34].

The modeling of overdispersed count response was built upon several previous studies such as those of Molenberghs et al. in which normal and gamma random effects were used to take the subject-specific effect and overdispersion into account [35, 36]. Regarding the overdispersion, we assumed the count response to follow a negative binomial distribution which can be defined as a Poisson distribution with a gamma-distributed rate [17].

The simulation results showed that even if the random intercepts follow a bivariate normal distribution, assuming a Bridge distribution for the random intercept of the ordinal logistic submodel leads to estimations with lower standard error. Moreover, ignoring the association between the response variables resulted in estimations with higher MSE. It has been shown a rational association between the response variables.

In this study, we longitudinally combined ordinal and overdispersed count response variables in which the random intercepts follow Bridge and normal distributions, respectively. The random intercepts followed a bivariate normal copula. A maximum likelihood estimation approach was used to estimate the coefficients. The integration over the two random intercepts can be carried out via combination of analytical and numerical techniques. The SAS procedure NLMIXED is available for the estimation and integration processes.

5. Conclusion

Considering a Bridge distribution for the random intercept of ordinal logistic regression yields to accurate estimation even if the random intercept follows normal distribution. In the presence of any association between longitudinal count and ordinal responses, the estimations have lower standard error in comparison to univariate analysis.

Data Availability

The data is available on request contacting Dr. Payam Amini via email: [email protected].

Conflicts of Interest

No potential conflict of interest was reported by the authors.

Acknowledgments

The authors are grateful to Z. Ghorbanifar and H. Sayyadi for the permission to use the data.

Supplementary Materials

The Appendix section includes three sections as A: SAS codes for the joint model; B: simulation results to compare the efficiency of estimation from the joint and univariate models; and C: notations in the text. Appendix A describes the NLMIXED procedure for our proposed model. Appendix B includes Tables 1 and 2 in which the results of comparing the joint and univariate models are presented for 50 and 200 sample sizes, respectively. Appendix C describes the notations used in this manuscript. (Supplementary Materials)