Table of Contents Author Guidelines Submit a Manuscript
Journal of Probability and Statistics
Volume 2012, Article ID 194194, 18 pages
Research Article

A Two-Stage Joint Model for Nonlinear Longitudinal Response and a Time-to-Event with Application in Transplantation Studies

1Department of Biostatistics, Erasmus University Medical Center, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
2I-Biostat, Catholic University of Leuven, B-3000 Leuven, Belgium

Received 7 July 2011; Revised 27 October 2011; Accepted 6 November 2011

Academic Editor: Grace Y. Yi

Copyright © 2012 Magdalena Murawska et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


In transplantation studies, often longitudinal measurements are collected for important markers prior to the actual transplantation. Using only the last available measurement as a baseline covariate in a survival model for the time to graft failure discards the whole longitudinal evolution. We propose a two-stage approach to handle this type of data sets using all available information. At the first stage, we summarize the longitudinal information with nonlinear mixed-effects model, and at the second stage, we include the Empirical Bayes estimates of the subject-specific parameters as predictors in the Cox model for the time to allograft failure. To take into account that the estimated subject-specific parameters are included in the model, we use a Monte Carlo approach and sample from the posterior distribution of the random effects given the observed data. Our proposal is exemplified on a study of the impact of renal resistance evolution on the graft survival.

1. Introduction

Many medical studies involve analyzing responses together with event history data collected for each patient. A well-known and broadly studied example can be found in AIDS research, where CD4 cell counts taken at different time points are related to the time to death. These data need to be analyzed using a joint modeling approach in order to properly take into account the association between the longitudinal data and the event times. The requirement for a joint modeling approach is twofold. Namely, when focus is on the longitudinal outcome, events cause nonrandom dropout that needs to be accounted for in order to obtain valid inferences. When focus is on the event times, the longitudinal responses cannot be simply included in a relative risk model because they represent the output of an internal time-dependent covariate [1].

In this paper, we focus on a setting that shares some similarities with the standard joint modeling framework described above, but also has important differences. In particular, we are interested in the association between longitudinal responses taken before the actual followup for the time-to-event has been initiated. This setting is frequently encountered in transplantation studies, where patients in the waiting list provide a series of longitudinal outcomes that may be related to events occurring after transplantation. A standard analysis in transplantation studies is to ignore the longitudinal information and use only the last available measurement as a baseline covariate in a model for the allograft survival. It is, however, evident that such an approach discards valuable information. An alternative straightforward approach is to put all longitudinal measurements as covariates in the survival model. Nevertheless, there are several disadvantages with this approach. First, it would require spending many additional degrees of freedom, one for each of the longitudinal measurements. Second, patients with at least one missing longitudinal response need to be discarded, resulting in a great loss of efficiency. Finally, we may encounter multicollinearity problems due to the possibly high correlation between the longitudinal measurements at different time points.

Nowadays, when it comes to measuring the association between a longitudinal marker and an event-time outcome, a standard approach is to use the joint model postulated by Faucett and Thomas [2] and Wulfsohn and Tsiatis [3]. In this setting, the longitudinal responses are considered realizations of an endogenous time-dependent covariate (Kabfleish and Prentice [1]), which is measured with error and for which we do not have the complete history of past values available. To account for these features we estimate in the joint modeling framework the joint distribution of the survival and longitudinal processes. Unlike in the multivariate approach, where we have to interpret the estimates for each longitudinal measurement separately, the joint modeling approach allows to get more insight in the nature of the relation between the two analyzed processes since it assumes some underlying process for the longitudinal measures.

However, in contrast with the standard joint modeling setting, in our case (i.e., transplantation studies) the longitudinal responses do not constitute an endogenous time-dependent variable measured at the same period as the time to event. In particular, since the longitudinal measurements are collected prior to transplantation, occurrence of an event (i.e., graft failure after transplantation) does not cause nonrandom dropout in the longitudinal outcome. Nevertheless, the problem of measurement error still remains. Ignoring the measurement error affects not only the standard errors of the estimates of interest, but also it can cause attenuation of the coefficients towards zero [4]. To overcome this, we propose a two-stage modeling approach that appropriately summarizes the longitudinal information before the start of followup by means of a mixed effects model and then uses this information to model the time to event by including the Empirical Bayes estimates of the subject-specific parameters as predictors in the Cox model. To account for the fact that we include the estimates and not the true values of the parameters, we use a Monte Carlo approach and sample from the posterior distribution of the random effects. The proposed method does not require joint maximization neither fitting the random effects model for each event time as in the two-stage approach of Tsiatis et al. [5]. We compare this approach with the “naive” one when the uncertainty about the estimates from the first step is not taken into account, as well as with the full Bayesian approach. Our approach shares similarities with the two-stage approach of Albert and Shih [6]. They considered a model, in which a discrete event time distribution is modeled as a linear function of the random slope of the longitudinal process estimated from the linear-mixed model. The bias from informative dropout was reduced by using the conditional distribution of the longitudinal process given the dropout time to construct the complete data set. To account for the measurement error in the mean of the posterior distribution of the random effects, the variance, that incorporates the error in estimating the fixed effects in the longitudinal model, was used. However, we use sampling not to impute missing values and correct for nonrandom dropout but in order to account for the variability in the predicted longitudinal covariates that are then used in survival model. A method of adjusting for measurement error in covariates, which was used by Albert and Shih, does not apply in our case since it requires the discrete time-to-event and linear model for longitudinal data. The time-to-event could be discretized but still we have a nonlinear model for longitudinal data.

Our research is motivated by data from an international prospective trial on kidney-transplant patients. The study has two arms, where in the first arm donors’ kidneys were administered to cold storage and in the second arm they were administered to machine perfusion (MP). The advantage of machine perfusion is the possibility of measuring different kidney’s parameters reflecting the state of the organ. One of the parameters of interest is renal resistance level (RR), which has been measured at 10 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, and just before transplantation. Our aim here is to study the association of the renal resistance evolution profile with the risk of graft failure. The time of last measurement was different for different patients and often unknown exactly. However, based on the medical consult and visual inspection of the individual profiles, the last measurement was chosen to be taken at 6 hours for each patient.

The rest of the paper is organized as follows. Section 2 provides the general modeling framework with the definition of the two submodels for the longitudinal and survival data, respectively. Section 3 describes the estimation methods for the full likelihood and the proposed two-stage approach. In Section 4, we apply the two-stage approach to the renal data. Section 5 contains the setup and the results for the simulation study. Finally, in Section 6 we discuss the proposed methodology.

2. Joint Modeling Framework

Let 𝑌𝑖(𝑢) denote the longitudinal profiles for individual 𝑖, 𝑖=1,2,,𝑁. We assume that 𝑌𝑖(𝑢) are collected for the 𝑖th individual prior to the specified time 𝑡𝑖, 𝑢(0,𝑡𝑖). Let 𝑡=0 denote the time of the first longitudinal measurement and 𝑡𝑖 the time of the last collected measurement. 𝑡𝑖 can be different for different individuals, and we denote by 𝑚𝑖 the number of longitudinal measurements for subject 𝑖 collected until time 𝑡𝑖 and by 𝑢𝑖𝑗 the time of 𝑗th measurement. Denote by 𝑇𝑖𝑡𝑖 the true survival time for individual 𝑖. Since the survival time is right censored, we observe only 𝑇𝑖=min(𝑇𝑖,𝐶𝑖), where 𝐶𝑖𝑡𝑖 is the censoring time with the failure indicator Δ𝑖, which equals to 1 if the failure is observed and 0 otherwise, that is, Δ𝑖=𝐼(𝑇𝑖𝐶𝑖) with 𝐼() denoting the indicator function. We will assume that censoring is independent of all other survival and covariate information. In addition, we assume that the observed longitudinal responses 𝑌𝑖(𝑢) are measured with error (i.e., biological variation) around the true longitudinal profile 𝑊𝑖(𝑢), that is, 𝑌𝑖(𝑢)=𝑊𝑖(𝑢)+𝜀𝑖(𝑢),with𝜀𝑖(𝑢)𝑁0,𝜎2,𝜀cov𝑖(𝑢),𝜀𝑖𝑢=0,𝑢𝑢.(2.1) We will consider the longitudinal response that exhibits nonlinear profiles in time. Therefore, we approximate 𝑊𝑖(𝑢) by means of a nonlinear mixed effects model: 𝑊𝑖(𝑢)=𝑓𝑢;𝜙𝑖,with𝜙𝑖=𝐗𝑖𝜷+𝐙𝑖𝜶𝑖,(2.2) where 𝑓() is a nonlinear function, parameterized by the vector 𝜙𝑖. The parameters 𝜙𝑖 control the shape of the nonlinear function, and subscript 𝑖 denotes that each subject may have its own nonlinear evolution in time in the family 𝑓(;𝜙). For the subject-specific parameter 𝜙𝑖, we assume a standard mixed model structure with 𝐗𝑖 denoting the fixed effects design matrix with corresponding regression coefficients 𝜷, 𝐙𝑖 the random effects design matrix, and 𝜶𝑖 the random effects. The random effects 𝜶𝑖 are assumed to be independent and normally distributed with mean zero and variance-covariance matrix 𝐃.

For the event process, we postulate the standard relative risk model of the form: 𝜆𝑖(𝑡)=𝜆0𝜸(𝑡)exp𝑇𝜙𝑖,(2.3) where 𝜆𝑖(𝑡) is the hazard function and 𝜆0(𝑡) is the baseline hazard, which can be modeled parametrically or left completely unspecified. The subject-specific parameters 𝜙𝑖 summarize the longitudinal evolutions of the response for each subject, and therefore coefficients 𝜸 measure the strength of the association between the different characteristics of the underlying subject-specific nonlinear evolution of the longitudinal profiles and the risk for an event. Within the formulation of the two submodels (2.2) and (2.3), the same random effects now account for both the association between the longitudinal and event outcomes, and the correlation between the repeated measurements in the longitudinal process.

In the particular transplantation setting that will be analyzed in the following study, 𝑌𝑖(𝑢) are the renal resistance level measurements collected for the 𝑖th donor prior to the transplantation time 𝑡𝑖 and the same index 𝑖 is used to denote the allograft transplanted to the 𝑖th patient. Time 𝑡=0 represents the time that the kidney is removed from the donor and put in cold storage or in a perfusion machine.

3. Estimation

3.1. Full Likelihood Framework: Bayesian Approach

In the standard joint modeling framework, the estimation is typically based on maximum likelihood or Bayesian methods (MCMC). This proceeds under the following set of conditional independence assumptions:𝑝𝑇𝑖,Δ𝑖,𝐘𝑖𝜶𝑖𝑇;𝜽=𝑝𝑖,Δ𝑖𝜶𝑖;𝜽𝑡𝑝𝐘𝑖𝜶𝑖;𝜽𝑦,𝑝𝐘𝑖𝜶𝑖;𝜽𝑦=𝑚𝑖𝑗=1𝑝𝑌𝑖𝑢𝑖𝑗𝜶𝑖;𝜽𝑦.(3.1) In particular, we assume that given the random effects the longitudinal process is independent from the event times, and moreover, the longitudinal measurements are independent from each other.

Maximum likelihood methods use the joint likelihood and maximize the log-likelihood function 𝑙𝑖(𝜽)=𝑖log𝑝(𝑇𝑖,Δ𝑖,𝐘𝐢;𝜽). This requires numerical integration and optimization, which makes the fit difficult, especially in high-dimensional random effects settings. Standard options for numerical integration are Gaussian quadrature, Laplace approximation, or Monte Carlo sampling [7, 8]. Maximization of the approximated log-likelihood is based on an EM algorithm [3, 5, 911]. Several authors proposed a Bayesian approach (MCMC) [2, 12, 13]. Bayesian estimation, that generalizes a joint model for the case with multivariate longitudinal data, has been discussed by Ibrahim et al. [14]. Brown and Ibrahim [15] considered semiparametric model relaxing the distributional assumption for the random effects. In most papers, the longitudinal submodel is a linear mixed-effects model. Joint models with nonlinear mixed-effects submodels have been less studied in the literature [16]. Nonlinear mixed models are more common in pharmacokinetics and pharmacodynamics, where they are jointly modeled with nonrandom dropout using NONMEM software. Several authors considered a Bayesian approach with a nonlinear mixed model and informative missingness [17, 18].

Here we will proceed under the Bayesian paradigm to estimate the model parameter. Under the conditional independence assumption (3.1), the posterior distribution of the parameters and the latent terms, conditional on the observed data, are derived as 𝑝𝜽,𝜶𝑖𝑇𝑖;Δ𝑖;𝐘𝑖𝑁𝑚𝑖=1𝑖𝑗=1𝑝𝑌𝑖𝑢𝑖𝑗𝜶𝑖;𝜽𝑦𝑝𝑇𝑖,Δ𝑖𝜶𝑖;𝜽𝑡𝑝𝜶𝑖;𝜽𝛼𝑝𝜽𝑦,𝜽𝑡,𝜽𝛼,(3.2) where 𝜽𝑇=(𝜽𝑇𝑦,𝜽𝑇𝑡,𝜽𝑇𝛼) is a vector of parameters from the longitudinal and survival models and the vector of the random effects, respectively, and 𝑝() denotes the appropriate probability density function. The likelihood contribution for the 𝑖th subject conditionally on the random terms is given by𝑝𝐘𝐢,𝑇𝑖,Δ𝑖𝜶𝑖𝐘;𝜽=𝑝𝐢𝜶𝑖;𝜽𝑦𝑝𝑇𝑖,Δ𝑖𝜶𝑖;𝜽𝑡=𝜆0𝑇𝑖exp{𝜸𝑇𝜙𝑖𝜶𝑖}Δ𝑖exp𝑇𝑖0𝜆0𝜸(𝑡)exp𝑇𝜙𝑖𝜶𝑖×1𝑑𝑡2𝜋𝜎2𝑚𝑖/2exp𝑚𝑖𝑗=1𝑊𝑖𝑢𝑖𝑗,𝜶𝑖𝑌𝑖𝑢𝑖𝑗22𝜎2.(3.3) The baseline hazard can be assumed of a specific parametric form, for example, the Weibull hazard. For the priors of the model parameters, we make standard assumptions following Ibrahim et al. [14]. In particular, for the regression coefficients 𝜷 of the longitudinal submodel and for the coefficients 𝜸 of survival submodel, we used multivariate normal priors. For variance-covariance matrices, we assumed an inverse Wishart distribution and for the variance-covariance parameters we took as a prior an inverse gamma. For all parameters, the vague priors have been chosen.

The implementation of the Cox and piecewise constant hazard models is typically based on the counting process notation introduced by Andersen and Gill [19] and formulated by Clayton [20]. In particular, we treat the counting process increments 𝑑𝑁𝑖(𝑡) in the time interval [𝑡,𝑡+Δ𝑡] as independent Poisson random variables with means Λ𝑖(𝑡)𝑑𝑡:Λ𝑖(𝑡)𝑑𝑡=𝜔𝑖𝜸(𝑡)exp𝑇𝜙𝑖𝑑Λ0(𝑡),(3.4) where 𝜔𝑖(𝑡) is an observed process taking the value 1 if subject 𝑖 is observed at time 𝑡 and 0 otherwise and 𝑑Λ0(𝑡) is the increment in the integrated baseline hazard function occurring during the time interval [𝑡,𝑡+Δ𝑡]. Since the conjugate prior for the Poisson mean is the gamma distribution, we assume the conjugate-independent increments prior suggested by Kalbfleisch [21], namely,𝑑Λ0(𝑡)Gamma𝑐𝑑Λ0(𝑡),𝑐,(3.5) where 𝑑Λ0(𝑡) is a prior mean hazard with 𝑐 being a scaling parameter representing the “strength" of our prior beliefs. The gamma prior was also chosen for the baseline risk parameter of the Weibull distribution in parametric survival model. Alternatively to implement the Cox model in a fully Bayesian approach, one may use the “multinomial-Poisson trick” described in the OpenBUGS manual that is equivalent to assuming independent increments in the cumulative hazard function. The increments are treated as failure times, and noninformative priors are given for their logarithms. Analogically to the Cox model, a piecewise constant hazard model was implemented. We have modeled baseline hazard using a step function with 3 quantiles 𝑡1, 𝑡2, and 𝑡3 as changing points assuring the same number of events in between. Let 𝑡0 denote the start of the followup, 𝑡4 the maximum censoring time, and 𝑑Λ0𝑘(𝑡) the increment in the integrated baseline hazard function occurring during the time interval [𝑡𝑘,𝑡𝑘+1],𝑘=0,1,2,3. Then for different intervals, we specify a separate prior hazard mean 𝑑Λ0(𝑡) and𝑑Λ0𝑘(𝑡)Gamma𝑐𝑑Λ0𝑘(𝑡),𝑐.(3.6) Similarly as for the Cox model, the results were not sensitive with respect to the choice of the hyperparameters as long as the priors were sufficiently diffuse. The above nonparametric approach can be criticized as having the independent priors for the hazard distribution. However, as suggested by Kalbfleisch [21] a mixture of gamma priors can be considered as an alternative. Moreover, in a piecewise constant model one can also include change points as unknown parameters in the model as proposed in a Bayesian context by Patra and Dey [22] and applied by Casellas [23].

In order to assess convergence for the full Bayesian model, standard MCMC diagnostic plots were used. The burn-in size was set to 10000 iterations, which was chosen based on the visual inspection of the trace plots and confirmed by the the Raftery and Lewis diagnostics. The same number of iterations were used for constructing the summary statistics. Based on the autocorrelation plots, we have chosen every 30th iteration. Therefore, in total to obtain 10000 iterations for the final inferenc 300000 iterations were required after the burn-in part. Additionally, we run a second parallel chain and used Gelman and Rubin diagnostic plots to assess the convergence.

3.2. Two-Stage Approach

As mentioned in Section 1, the longitudinal measurements in our setting do not constitute an internal time-dependent covariate, since the events took place after the last longitudinal measurement was collected. In particular, since events do not cause nonrandom dropout, the event process does not carry any information for the longitudinal outcome. Mathematically, this means that information for the random effects 𝜶𝐢 is actually only coming from the longitudinal responses, that is,𝑝𝜶𝑖𝑌𝑖𝑢𝑖𝑗;𝑇𝑖;Δ𝑖;𝜽𝑦𝜶=𝑝𝑖𝑌𝑖𝑢𝑖𝑗;𝜽𝑦.(3.7) Thus, we can avoid the computational complexity of the full likelihood framework presented in Section 3.1 by employing a two-stage approach. More specifically, at Stage I, we obtain 𝜽𝑦 by maximizing the log-likelihood:𝑙𝑦𝜽𝑦=𝑁𝑖=1𝑝𝐘𝐢𝜶𝑖;𝜽𝑦𝑝𝜶𝑖;𝜽𝑦𝑑𝜶𝑖.(3.8) This requires numerical integration, and we use a Gaussian quadrature for that purpose. Then we obtain the corresponding empirical Bayes estimates:𝜶𝑖=argmax𝛼𝑌log𝑝𝑖𝜽𝜶;𝑦𝜽+log𝑝𝜶;𝑦(3.9) and compute the predictions:𝜙𝑖=𝐗𝜷+𝐙𝑖𝜶𝑖.(3.10) At Stage II, we fit the relative risk model:𝜆𝑖(𝑡)=𝜆0𝜸(𝑡)exp𝑇𝜙𝑖.(3.11) However, a potential problem in the above is that 𝜙𝑖 is not the true subject-specific parameters but rather an estimate with a standard error. If we ignore this measurement error, the regression coefficients 𝜸𝑖 will be possibly attenuated. To overcome this problem, we propose here a sampling approach to account for the variability in 𝜙𝑖, very close in spirit to the Bayesian approach of Section 3.1. In particular, we use the following sampling scheme.

Step 1. simulate 𝜽𝑦(𝑚)𝜽𝑁(𝑦,𝜽var(𝑦)).

Step 2. simulate 𝜶𝑖(𝑚)[𝜶𝑖𝐘𝐢,𝜽𝑦(𝑚)].

Step 3. calculate 𝜙𝑖(𝑚)=𝐗𝜷(𝑚)+𝐙𝑖𝜶𝑖(𝑚) and fit the relative risk model 𝜆𝑖(𝑡)=𝜆0(𝑡)exp{𝜸𝑇𝜙𝑖(𝑚)} from which 𝜽𝑡(𝑚)=̂𝜸(𝑚) and 𝜽var(𝑡(𝑚)) are kept.
Steps 13 are repeated 𝑚=1,,𝑀 times.

Step 1 takes into account the variability of the MLEs, and Step 2 the variability of 𝜶𝐢. Moreover, because the distribution in Step 2 is not of a standard form, we use a independent Metropolis-Hastings algorithm to sample from it with multivariate 𝑡-proposal density centered at an Empirical Bayes estimates 𝜶𝑖, covariance matrix 𝜶var(𝑖), and 𝑑𝑓=4. The low number of degrees of freedom was chosen to ensure that the proposal density has heavy tails to provide sufficient coverage of the target density [𝜶𝑖𝐘𝑖,𝜽𝑦]. The variance-covariance matrix estimated from the nonlinear mixed model was additionally scaled by some parameter 𝑆𝑐𝑎𝑙𝑒. The tuning parameter allows to control the acceptance rate through the range of the proposed distribution. If the range is too narrow, the proposed values will be close to the current ones leading to low rejection rate. On the contrary, if the range is too large, the proposed values are far away from the current ones leading to high rejection rate. We chose the acceptance rate to be 0.5 following Carlin and Louis [24] that suggests a desirable acceptance rates of Metropolis-Hastings algorithms to be around 1/4 for the dependence (random walk) M-H version and 1/2 for the independent M-H. Roberts et al. [25] recommended to use the acceptance rate close to 1/4 for high dimensions and 1/2 for the models with dimensions 1 or 2. They discussed this issue in the context of the random walk proposal density. The authors showed that if the target and proposal densities are normal, then the scale of the latter should be tuned so that the acceptance rate is approximately 0.45 in one-dimensional problems and approximately 0.23 as the number of dimensions approaches infinity, with the optimal acceptance rate being around 0.25 in as low as six dimensions. In our case, an independent version of Metropolis-Hastings algorithm is applied. The proposal density in the algorithm does not depend on the current point as in the random-walk Metropolis algorithm. Therefore, for this sampler to work well, we want to have a proposal density that mimics the target distribution and have the acceptance rate be as high as possible. In order to obtain a desirable acceptance rate one needs to run a sampling algorithm for a number of iterations for a given data set and compute an acceptance rate and then repeat the procedure changing the tuning parameter until the chosen acceptance rate, is obtained. Usually a small number of iterations (100–500) is sufficient for the purpose of calibration. More details about the Metropolis-Hastings acceptance-rejection procedure can be found in the supplementary material (available online at A final estimate of 𝜽𝑡 is obtained using the mean of the estimates from all 𝑀 iterations:𝜽𝑡=𝑀𝑚=1𝜽𝑚𝑡𝑀.(3.12) To obtain the SE of 𝜽𝑡, we use the variance-covariance matrix 𝐕:𝐁𝐕=𝐖+(𝑀+1)𝑀,(3.13) where 𝐖 is the average within-iteration variance and 𝐁 is the between-iteration variance, that is,𝐖=𝑀𝑚=1𝐔𝑚𝑀,1𝐁=𝑀1𝑀𝑚=1𝜽𝑚𝑡𝜽𝑡𝜽𝑚𝑡𝜽𝑡𝑇.(3.14)𝐔𝑚 represents a variance-covariance matrix estimated in iteration 𝑚 for ̂𝜸𝑚.

4. Analysis of the RR Data

4.1. Models’ Specification

We apply the proposed two-stage procedure and a fully Bayesian approach to the transplantation study introduced in Section 1. The data was taken from an international prospective trial on 337 kidney pairs, which aimed to compare two different types of storage, namely, cold storage and machine perfusion (MP). Here we focus on the second arm. Our main outcome of interest is graft survival time censored after 1 year. At the end of the study, only 26 graft failures were observed. The renal resistance level (RR) was expected to be an important risk factor for graft failure. It was measured using the perfusion machine at the moment of taking the organ out from a donor (𝑡=0), and thereafter at 10 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, and just before transplantation. As mentioned in the Section 1, the time of last measurement was different for different patients and sometimes unknown. However, there was a clear asymptote visible from the individual profiles that was reached after about 5 hours by each patient. Based on that behavior and after the medical consult, we chose the last measurement to be taken at 6 hours for each patient. Other variables of interest include the age of the donor, donor’s region (3 countries considered), and donor’s type (heart beating or non-heart-beating).

The individual profiles of 50 randomly selected kidney donors are presented in Figure 1. This plot confirms the biological expectation that allografts exhibit their highest renal resistance levels just after being extracted from the donor. Thereafter, they show a smooth decrease in RR until they reach an asymptote above zero indication that there is no “perfect flow" through the kidney. Furthermore, we observe that the initial RR level, the rate of decrease, and the final RR level differ from donor to donor. Additional descriptive plots for our data are presented in the supplementary material.

Figure 1: Individual profiles of renal resistance level for 50 sampled donors.

In the first step of our analysis, we aim to describe the evolution of the renal resistance level in time. Motivated by both biological expectation and Figure 1, we postulate the following nonlinear function: 𝑓(𝑡)=𝜙1+𝜙2𝑒𝜙3𝑡,(4.1) where 𝜙1 is a lower asymptote, 𝜙1+𝜙2 is an initial value for 𝑡=0, and 𝜙3 is the rate of decrease from 𝜙2 to 𝜙1 (see also Figure  2 in supplementary material).

To accommodate for the shapes of RR evolutions observed in Figure 1, we need to constraint 𝜙1, 𝜙2, and 𝜙3 to be positive. Moreover, in order to allow for individual donor effects, we use the following formulation:𝑌𝑖(𝑡)=𝑊𝑖(𝑡)+𝜀(𝑡),with𝑊𝑖(𝑡)=𝑓𝑖𝜙(𝑡)=exp1𝑖𝜙+exp2𝑖𝑒exp(𝜙3𝑖)𝑡,(4.2) where𝜙1=𝛽10+𝛽11DonorAge+𝛽12DonorType+𝛽13DonorReg1+𝛽14DonorReg2+𝛼1,𝜙2=𝛽20+𝛽21DonorAge+𝛽22DonorType+𝛽23DonorReg1+𝛽24DonorReg2+𝛼2,𝜙3=𝛽30+𝛽31DonorAge+𝛽32DonorType+𝛽33DonorReg1+𝛽34DonorReg2+𝛼3,(4.3) and 𝜶𝐢𝑁(0,𝐷),𝜀(𝑡)𝑁(0,𝜎2) with 𝜶=(𝛼1,𝛼2,𝛼3) and cov(𝛼𝑖,𝜀(𝑡))=0. In the second step, the predicted parameters (𝜙1,𝜙2,𝜙3) summarizing the RR evolution of the nonlinear mixed model are included in the graft survival model. The final model for graft survival was of the form:𝜆𝑖(𝑢)=𝜆0𝛾(𝑢)exp1𝜙1𝑖+𝛾2𝜙2𝑖+𝛾3𝜙3𝑖.(4.4) To investigate the impact of ignoring that the covariate 𝜙𝑖 is measured with error, we compared the naive approach in which we ignored this measurement error and our proposal that accounts for the uncertainty in 𝜙𝐢 via Monte Carlo sampling. We used Metropolis-Hastings algorithm with independent 𝑡-proposal and acceptance rate around 50% for the reason given in Section 3.2. We simulated 𝑀=10000 samples with additional initial step of the scaling parameter calibration in order to achieve the desirable acceptance rate. The final estimates (and SE) of the parameters associated with RR covariates were calculated using the formulas described in Section 3.2. We compared the results from the two-stage procedure with the estimates obtained from the fully Bayesian joint model fitted for the data using the priors specified in Section 3.1.

The analysis was performed using 𝑅 Statistical Software. Packages survival and nlme were used for the two submodels fit, and a separate code was written by the first author for the sampling part. The fully Bayesian model was fitted using OpenBUGS software with the priors specified in Section 3.1. In particular, for the 𝑝×𝑝 variance-covariance matrices of multivariate normal priors, we used inverse Wishart distribution with 𝑝 degrees of freedom. For the variance-covariance parameter of the normal longitudinal response, we took as a prior an inverse-Gamma (103,103). For the baseline risk parameter of the Weibull distribution in survival submodel, a Gamma(103,103) prior was used. To analyze the data using the fully Bayesian Cox model described in Section 3.1, we chose the scaling parameter 𝑐 in a gamma prior for the independent increments to be equal 0.001 and a prior mean 𝑑Λ0(𝑡)=0.1. We did not observe any substantial difference for the different values of parameter 𝑐 as long as 𝑐 was small enough to keep the prior noninformative. We do not recommend too small values of the scaling parameter 𝑐 as they can lead to the computation problems. Analogically we have chosen gamma priors for the piecewise constant hazard model. The code for the Bayesian full joint model, as well the R codes for the sampling two-stage procedure, is available from the authors on request.

4.2. Results

The results for the nonlinear-mixed model are presented in Table 1, for the two-stage approach and in supplementary material, for the full Bayesian approach with Weibull survival model. The results for the longitudinal part for the full Bayesian approach with Cox and piecewise constant hazard models were similar (not presented). It can be observed, based on two-stage model results, that only Donor Age had a significant impact on the RR asymptote. Donor Type and Region had a significant impact on the steepness parameter. However, we keep all the covariates in the model for the purpose of prediction for the second stage. The mean RR profiles for Heart-Beating and Non-Heart-Beating donors from different regions together with fitted values based on the obtained nonlinear mixed model are given in the supplementary material.

Table 1: Parameter estimates, standard errors, and 95% confidence intervals from the nonlinear mixed model for RR.

In the second step of the analysis, we applied at first the naive approach and used the estimates 𝜙1,𝜙2, and 𝜙3 from the nonlinear mixed model as fixed covariates in the final Cox models for graft survival. Table 2 presents the results for the survival submodel for the all approaches, namely, the plug-in method, two-stage approach, and the fully Bayesian model. For the fully Bayesian approach, the results for the parametric Weibull model together with Cox and piecewise constant hazard models are listed. The results from Table 2 indicate that only the RR asymptote could have a significant impact on graft survival.

Table 2: Parameter estimates, SE, and 95% confidence/credibility intervals from proportional hazards Cox model for graft survival for plug-in method (a), sampled covariates (b), and fully Bayesian approach (c, d, e).

We observe that the estimates are close or almost identical as in plug-in model. SE of the Cox regression coefficients for the model with sampling are greater than SE from the plug-in model in Table 2(a), especially for the parameter 𝜙3. The increase in SE is somewhat the expected and is caused by the additional variability in covariates captured by the sampling approach. The fully Bayesian model produces similar results to our semi-Bayesian sampling model with somewhat lower SE. We do not observe substantial difference between fully parametric and nonparametric models in a fully Bayesian approach. Since in the analyzed real data the number of events is small, the fully Bayesian Cox and piecewise constant hazard Bayesian models were expected to produce similar results. We did not observe any substantial difference for the different values of hyperparameters.

Even though it is hard to compare exactly the computational time for the two approaches, the rough estimation of the total computational time needed to estimate and assess the convergence (2 chains) of the full Bayesian model was about 21.6 hours and depended on the implemented survival model. A similar computational time was needed for the full Bayesian model with the Cox survival model and piecewise constant hazard model with a slightly more time needed for the parametric Weibull model. For the two-stage approach, the total computational time was about 10 hours using the Intel(R) Core(TM)2 Duo T9300 2.5 GHz and 3.5 GB RAM.

5. Simulations

5.1. Design

We have conducted a number of simulations to investigate the performance of our proposed two-stage method. In particular, we compared the plug-in method that uses the Empirical Bayes estimates 𝜙𝑖 from the nonlinear mixed model and ignores the measurement error, the two-stage Monte Carlo sampling approach that accounts for the variability in 𝜙𝑖, and the fully Bayesian approach. In the fully Bayesian approach, only the parametric Weibull model was considered.

For the longitudinal part, the data were simulated for 500 patients from model (5.1) with 𝜙1𝑖=𝛽10+𝛼1𝑖, 𝜙2𝑖=𝛽20+𝛼2𝑖 and 𝜙3𝑖=𝛽30+𝛼3𝑖, 𝜶𝑖𝑁(0,𝐃), 𝑌𝑁(𝑓(𝑡),𝜎2). The variance-covariance matrix 𝐃 of the random effects was chosen to be 𝐃=vech(0.6,0.01,0.01,0.6,0.01,0.3). We kept 7 measurement points as in the original analyzed data set as well as the nonequal distances between them. The variance of the measurement error 𝜎2 was chosen to be 0.25, 1, and 25. Survival times were simulated for each patient using the exponential model with the rate parameter equal exp(𝜆), where𝜆=𝛾1𝜙exp1+𝛾2𝜙exp2+𝛾3𝜙exp3.(5.1) We considered scenarios with fixed coefficients 𝛾1=0.5,𝛾2=0.5, and 𝛾3=0.2. The censoring mechanism was simulated independently using an exponential distribution Exp(𝜆𝐶). Parameter 𝜆𝐶 was changed in order to control proportion of censored observations. Different scenarios with 40% and 20% of censoring were examined. For each simulated data set we fitted four survival models, namely, the gold standard model that uses the true simulated values 𝜙𝑖, the plug-in model, the sampling model, and fully Bayesian joint model. Neither nonparametric Cox nor piecewise constant hazard model were considered in a fully Bayesian approach since we have simulated the data from the parametric exponential model and wanted to compare the proposed two-stage approach with the “best" fully Bayesian model. All the prior settings, size of burn-in, number of iterations, and so forth, for the fully Bayesian model were the same as for the real data analysis.

5.2. Results

In Table 3, we present the average results for 200 simulations of different scenarios are presented. The results from our sampling model were very close to the results obtained for the fully Bayesian model with slightly smaller bias and RMSE for the fully Bayesian approach. That was due to the better estimation of random effects variability in fully Bayesian approach. The plug-in method produced the biggest bias that sometimes with combination with the small variability of the estimates around the biased mean resulted in larger RMSE than in sampling approach. As the measurement error or the percentage of censored observations increased, the estimates of survival submodel were more biased with larger RMSE for all approaches. The increase in bias was more severe when the measurement error variance was increased rather than when the percentage of to censoring was higher. This bias was, however, decreased when the number of repeated measures per individual was increased. This has to do with the amount of information that is available in the data for the estimation of 𝜙𝑖. As it is known from the standard mixed models literature [26], the degree of shrinkage in the subject-specific predicted values is proportional to 𝜎 and inversely proportional to 𝑛𝑖 and 𝜎𝛼. To compare the relation between variance of the random effects and variance of the measurement error, one can use intraclass correlation (ICC) defined as the proportion of the total variability that is explained by the clustering with a given random effect. ICC was similar for the simulated and the real data for the biggest 𝜎 and increased in a simulation data as 𝜎 decreased.

Table 3: Bias and residual mean squared error (RMSE) for the method with true 𝜙𝑖 (GS), Empirical Bayes estimates 𝜙𝑖 (Plug-in), sampled 𝜙𝑖, and fully Bayesian approach.

Since the calculations for the simulation study were highly computationally intensive, we have used the cluster with about 20 nodes with AMD Quad-Core Opteron 835X, 4 × 2 GHz, and 16 GB RAM per node. The analysis for the the 200 simulated data sets for a single scenario took about 65.5 hours using the Bayesian approach and 31.2 hours using the two-stage approach.

6. Discussion

We have proposed a two-stage method that can be used in a joint analysis of longitudinal and time to event data when the longitudinal data are collected before the start of followup for survival, and the interest is in estimation of the impact of longitudinal profiles on survival. The modeling strategy is based on specification of two separate submodels for the longitudinal and time to event data. First, the longitudinal outcome is modeled using a random effects model. Then the survival outcome is modeled using the Empirical Bayes estimates of the subject-specific effects from the first stage. The variability of the estimates from the first stage is properly taken into account using a Monte Carlo approach by sampling from the posterior distribution of the random effects given the data.

As it was demonstrated, ignoring the additional variability of the subject-specific estimates when modeling survival leads to some bias, and in particular, attenuates the regression coefficients towards zero [4]. That was also confirmed by our simulation study. In comparison with the fully Bayesian approach, the proposed partially Bayesian method produced similar results with substantially less number of iterations required. This is due to the fact that sampling was conducted already around the EB estimates, and there is no needed for a burn-in part as in a standard fully Bayesian approach. We used 10000 iterations per subject, which was about the size of burn-in needed in the fully Bayesian models. No thinning was used in our approach, based on the visual inspection of the trace plots. Though it is hard to compare the fully Bayesian approach and the two-stage approach with respect to the computational time precisely, the rough approximation of the total computational time required for the two-stage approach was about half in comparison with the fully Bayesian approach. The fully Bayesian approach provided similar results with the two-stage approach for the special setting we have considered here. However, fitting a fully Bayesian model was a bit of “overdone" in the sense that by design the longitudinal data could not be affected by the survival. Since in many transplantation studies, the longitudinal data are collected before the start of followup for survival; therefore, using our method in that cases seems to be more appropriate than using a fully Bayesian approach. We recommend the proposed approach not only for the particular transplantation studies but in any setting that shares the similarity of the separated followup periods for the two analyzed endpoints. That is, for example, when the event process does not carry any information for the longitudinal outcome and the condition (3.7) from Section 3.2 holds. The simulation results indicate that even if the data come from the real joint setting in which (3.7) may not hold, the proposed two-stage procedure can be comparable to the fully Bayesian approach.

Since the sampling in the proposed method relies strongly on the results of the first part, the accurate estimation of all parameters of nonlinear mixed model is a key feature and should be performed carefully. This can be problematic when the deviation from normality of the random effects, is suspected. However, it was shown that even for the nonnormal random effects one can still use a standard software such as nlmixed in SAS with just a small change in a standard program code. In such cases, the probability integral transformation (PIT) proposed by Nelson et al. [27] can be used or the reformulation of the likelihood proposed by Liu and Yu [28]. An alternative is fitting a Bayesian model only to estimate the longitudinal submodel in the first stage, instead of the likelihood methods. Fitting nonlinear mixed models using Bayesian standard software can be, however, problematic due to the high nonlinearity in random effects that is caused both by the nonlinear function of the longitudinal profiles and by the possible restrictions on parameters [29].

In comparison with the two-stage approach proposed by Tsiatis et al. [5], our method is less computationally intensive since it does not require fitting as many mixed models as there are event times in the data. An alternative, that is somewhat simpler to implement and does not require any assumption about the distribution on the underlying random effects, is the conditional score approach proposed by Tsiatis and Davidian [11]. However, this method is less efficient than the methods based on likelihood. The focus in the discussed approaches is on the association between the longitudinal and event time processes. However, in transplantation studies when the two followup periods for longitudinal and survival outcomes are often separated, the interest is rather in making an inference on the marginal event-time distribution. This is similar to the Bayesian approach proposed by Xu and Zeger [12], that uses the longitudinal data as auxiliary information or surrogate for time-to-event data. Our approach is particularly useful in this setting. Since each of the two submodels is fitted separately, a standard software can be used to implement our method with just a small part of additional programming for Monte Carlo sampling.

Another advantage of the proposed two-stage method is that it can be easily generalized from survival to other types of models as it was applied for the binary Delayed Graft Failure (DGF) indicator (results not shown) in the analysis of the renal data. For that purpose in the second step of the two-stage procedure, the survival model was replaced by the logistic regression model for the DGF indicator. The first stage of the proposed approach could be also modified allowing for other types of longitudinal response and other types of mixed models. Therefore, instead of using a nonlinear mixed model a linear mixed model or generalized linear mixed model (GLMMs) can be considered depending on the type and the shape of the longitudinal response. In the presented real data example, we have chosen the three parameters that described the evolution of the longitudinal response. However, for the particular question of interest, one can easily choose the most convenient parametrization for the longitudinal model and use the selected parameters to analyze the nonlongitudinal response in the second stage.


The authors thank J. M. Smits from Eurotransplant International Foundation in Leiden for providing the data set analyzed in the paper and for the medical consult regarding the application of the proposed method.


  1. J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, Second edition, 2002. View at Zentralblatt MATH
  2. C. L. Faucett and D. C. Thomas, “Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach,” Statistics in Medicine, vol. 15, no. 15, pp. 1663–1685, 1996. View at Publisher · View at Google Scholar
  3. M. S. Wulfsohn and A. A. Tsiatis, “A joint model for survival and longitudinal data measured with error,” Biometrics. Journal of the International Biometric Society, vol. 53, no. 1, pp. 330–339, 1997. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at MathSciNet
  4. R. L. Prentice, “Covariate measurement errors and parameter estimation in a failure time regression model,” Biometrika, vol. 69, no. 2, pp. 331–342, 1982. View at Publisher · View at Google Scholar · View at Zentralblatt MATH
  5. A. Tsiatis, V. DeGruttola, and M. Wulfsohn, “Modeling the relationship of survival to longitudinal data measured with error: applications to survival and CD4 counts in patients with AIDS,” Journal of the American Statistical Association, vol. 90, pp. 27–37, 1995. View at Google Scholar
  6. P. S. Albert and J. H. Shih, “On estimating the relationship between longitudinal measurments and time-to-event data using a simple two-stage procedure,” Biometrics, vol. 66, pp. 983–991, 2009. View at Google Scholar
  7. D. Rizopoulos, G. Verbeke, and E. Lesaffre, “Fully exponential Laplace approximations for the joint modelling of survival and longitudinal data,” Journal of the Royal Statistical Society B, vol. 71, no. 3, pp. 637–654, 2009. View at Publisher · View at Google Scholar
  8. R. Henderson, P. Diggle, and A. Dobson, “Joint modelling of longitudinal measurements and event time data,” Biostatistics, vol. 4, pp. 465–480, 2000. View at Google Scholar
  9. V. DeGruttola and X. Tu, “Modeling progression of CD4 lymphocyte count and its relationship to survival time,” Biometrics, vol. 50, pp. 1003–1014, 1994. View at Google Scholar
  10. S. Self and Y. Pawitan, “Modeling a marker of disease progression and onset of disease,” in AIDS Epidemiology: Methodological Issues, N. P. Jewell, K. Dietz, and V.T. Farewell, Eds., Birkhauser, Boston, Mass, USA, 1992. View at Google Scholar
  11. A. A. Tsiatis and M. Davidian, “A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error,” Biometrika, vol. 88, no. 2, pp. 447–458, 2001. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at MathSciNet
  12. J. Xu and S. L. Zeger, “Joint analysis of longitudinal data comprising repeated measures and times to events,” Journal of the Royal Statistical Society C, vol. 50, no. 3, pp. 375–387, 2001. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at MathSciNet
  13. Y. Wang and J. M. G. Taylor, “Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome,” Journal of the American Statistical Association, vol. 96, no. 455, pp. 895–905, 2001. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at MathSciNet
  14. J. G. Ibrahim, M.-H. Chen, and D. Sinha, “Bayesian methods for joint modeling of longitudinal and survival data with applications to cancer vaccine trials,” Statistica Sinica, vol. 14, no. 3, pp. 863–883, 2004. View at Google Scholar · View at Zentralblatt MATH
  15. E. R. Brown and J. G. Ibrahim, “A Bayesian semiparametric joint hierarchical model for longitudinal and survival data,” Biometrics, vol. 59, no. 2, pp. 221–228, 2003. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at MathSciNet
  16. L. Wu, X. J. Hu, and H. Wu, “Joint inference for nonlinear mixed-effects models and time to event at the presence of missing data,” Biostatistics, vol. 9, pp. 308–320, 2008. View at Google Scholar
  17. C. Hu and M. E. Sale, “A joint model for nonlinear longitudinal data with informative dropout,” Journal of Pharmacokinetics and Pharmacodynamics, vol. 30, no. 1, pp. 83–103, 2003. View at Publisher · View at Google Scholar
  18. N. A. Kaciroti, T. E. Raghunathan, M. A. Schork, and N. M. Clark, “A Bayesian model for longitudinal count data with non-ignorable dropout,” Journal of the Royal Statistical Society C, vol. 57, no. 5, pp. 521–534, 2008. View at Publisher · View at Google Scholar · View at PubMed
  19. P. K. Andersen and R. D. Gill, “Cox's regression model for counting processes: a large sample study,” The Annals of Statistics, vol. 10, no. 4, pp. 1100–1120, 1982. View at Publisher · View at Google Scholar · View at Zentralblatt MATH
  20. D. Clayton, “Bayesian analysis of frailty models,” Tech. Rep., Medical Research Council Biostatistics Unit., Cambridge, UK, 1994. View at Google Scholar
  21. J. D. Kalbfleisch, “Non-parametric Bayesian analysis of survival time data,” Journal of the Royal Statistical Society B, vol. 40, no. 2, pp. 214–221, 1978. View at Google Scholar · View at Zentralblatt MATH
  22. K. Patra and D. K. Dey, “A general class of change point and change curve modeling for life time data,” Annals of the Institute of Statistical Mathematics, vol. 54, no. 3, pp. 517–530, 2002. View at Publisher · View at Google Scholar · View at MathSciNet
  23. J. Casellas, “Bayesian inference in a piecewise Weibull proportional hazards model with unknown change points,” Journal of Animal Breeding and Genetics, vol. 124, no. 4, pp. 176–184, 2007. View at Publisher · View at Google Scholar · View at PubMed
  24. B. P. Carlin and T. A. Louis, Bayes and Empirical Bayes Methods for Data Analysis, Chapman & Hall, New York, NY, USA, 2000.
  25. G. O. Roberts, A. Gelman, and W. R. Gilks, “Weak convergence and optimal scaling of random walk Metropolis algorithms,” The Annals of Applied Probability, vol. 7, no. 1, pp. 110–120, 1997. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at MathSciNet
  26. G. Verbeke and G. Molenberghs, Linear mixed models for longitudinal data, Springer Series in Statistics, Springer, New York, NY, USA, 2000.
  27. K. P. Nelson, S. R. Lipsitz, G. M. Fitzmaurice, J. Ibrahim, M. Parzen, and R. Strawderman, “Using the probability integral transformation for non-normal random effects in non-linear mixed models,” Journal of Computational and Graphical Statistics, vol. 15: 3957, 2004. View at Google Scholar
  28. L. Liu and Z. Yu, “A likelihood reformulation method in non-normal random effects models,” Statistics in Medicine, vol. 27, no. 16, pp. 3105–3124, 2008. View at Publisher · View at Google Scholar · View at PubMed
  29. M. Davidian and D. M. Giltinan, Nonlinear Models for Repeated Measurment Data, Chapman and Hall, New York, NY, USA, 1998.