Abstract

In the past two decades, joint models of longitudinal and survival data have received much attention in the literature. These models are often desirable in the following situations: (i) survival models with measurement errors or missing data in time-dependent covariates, (ii) longitudinal models with informative dropouts, and (iii) a survival process and a longitudinal process are associated via latent variables. In these cases, separate inferences based on the longitudinal model and the survival model may lead to biased or inefficient results. In this paper, we provide a brief overview of joint models for longitudinal and survival data and commonly used methods, including the likelihood method and two-stage methods.

1. Introduction

Longitudinal data and survival data frequently arise together in practice. For example, in many medical studies, we often collect patients’ information (e.g., blood pressures) repeatedly over time and we are also interested in the time to recovery or recurrence of a disease. Longitudinal data and survival data are often associated in some ways. The time to event may be associated with the longitudinal trajectories. Separate analyses of longitudinal data and survival data may lead to inefficient or biased results. Joint models of longitudinal and survival data, on the other hand, incorporate all information simultaneously and provide valid and efficient inferences.

Figure 1 shows a longitudinal dataset in which CD4 cell counts are measured repeatedly over time in an AIDS study. Here, the time to event could be time to viral rebound, time to dropout, or time to death, depending on the research objectives. Data analysis can mainly focus on either the longitudinal data or the survival data or both. When the analysis focuses on longitudinal data, we often need to address informative dropouts since dropouts are very common in longitudinal studies. When the analysis focuses on survival data, we often need to incorporate time-dependent covariates such as CD4 since the times to event may be associated with the covariate trajectories. Sometimes, the main interest may lie in the association between the longitudinal process and survival process. In any of these cases, joint models are required to feature correlated longitudinal and survival data.

Typically, joint models for longitudinal and survival data are required in the following situations:(i)survival models with measurement errors in time-dependent covariates;(ii)longitudinal models with informative dropouts;(iii)longitudinal and survival processes are governed by a common latent process;(iv)the use of external information for more efficient inference.

Joint models of longitudinal and survival data have attracted increasing attention over the last two decades. Tsiatis and Davidian [1] provided a nice overview of early work on joint models, including De Gruttola and Tu [2], Wulfsohn and Tsiatis [3], Henderson et al. [4], and Wang and Taylor [5], among others. More recent work includes Ding and Wang [6], Nathoo and Dean [7], Ye et al. [8], Albert and Shih [9], Jacqmin-Gadda et al. [10], Rizopoulos et al. [11], Wu et al. [12], Huang et al. [13], and Pan and Yi [14], among others. A typical model setting is to assume a mixed-effects model for the longitudinal data and a Cox model or an accelerated failure time (AFT) model for the survival data, with the two models sharing some random effects or variables. The likelihood method is often used, implemented by EM algorithms. Another common approach is based on two-stage methods, which are computationally simpler. Henderson et al. [4] allow different random effects in the longitudinal and survival models, but assume that the random effects are correlated. Bayesian methods have also been proposed [13, 15, 16].

Since the literature on joint models is quite extensive, it is difficult to review all references here. In this paper, we provide a brief review of the joint model literature. In Section 2, we describe a standard formulation of joint models. In Section 3, we review a commonly used two-stage method. In Section 4, we describe the standard likelihood method. In Section 5, we discuss some extensions of standard joint models. A real data example and a simulation study are presented in Section 6 to illustrate and evaluate the methods. We conclude the paper in Section 7 with discussion.

2. A Standard Formulation of a Joint Model

In this section, we consider a standard formulation of a joint model. In the literature, a typical setup is a survival model with measurement errors in time-dependent covariates, in which a linear mixed-effects (LME) model is often used to model time-dependent covariates to address covariate measurement errors and a Cox proportional hazards (PH) model is used for modelling the survival data. We focus on this setup to illustrate the basic ideas.

Consider a longitudinal study with 𝑛 individuals in the sample. The objective is to model the time to an event of interest or survival time. Time-varying covariates are used in the survival model to partially explain the variation in the event times. Let 𝑠𝑖 be the survival time for individual 𝑖, 𝑖=1,2,…,𝑛. Some individuals may not experience any events by the end of the study so their event times may be right censored. We assume that the censoring is random or noninformative. For individual 𝑖, let 𝑐𝑖 be the censoring time, and let 𝛿𝑖=𝐼(𝑠𝑖≀𝑐𝑖) be the censoring indicator such that 𝛿𝑖=0 if the survival time for individual 𝑖 is right censored and 𝛿𝑖=1 otherwise. The observed survival data are {(𝑑𝑖,𝛿𝑖),𝑖=1,2,…,𝑛}, where 𝑑𝑖=min(𝑠𝑖,𝑐𝑖).

In survival models, some time-dependent covariates may be measured with errors. For simplicity, we consider a single time-dependent covariate. Let 𝑧𝑖𝑗 be the observed covariate value for individual 𝑖 at time 𝑒𝑖𝑗, subject to measurement errors, 𝑖=1,2,…,𝑛; 𝑗=1,2,…,π‘šπ‘–. Let the corresponding unobserved true covariate value be π‘§βˆ—π‘–π‘—. Denote 𝐳𝑖=(𝑧𝑖1,…,π‘§π‘–π‘šπ‘–)𝑇, and π³βˆ—π‘–=(π‘§βˆ—π‘–1,…,π‘§βˆ—π‘–π‘šπ‘–)𝑇. In many cases, the longitudinal covariate measurements are terminated at the event or censoring time 𝑑𝑖. For example, this is the case when the events are dropouts. In this case, we have π‘’π‘–π‘šπ‘–β‰€π‘‘π‘–, and the covariate values after the event time 𝑑𝑖 are all missing. Let 𝐱𝑖 be covariates without measurement errors.

We consider the following Cox model for the survival data:πœ†π‘–(𝑑)=πœ†0𝑧(𝑑)expβˆ—π‘–(𝑑)𝛽1+π±π‘‡π‘–πœ·2ξ€Έ,𝑖=1,…,𝑛,(2.1) where πœ†π‘–(𝑑) is the hazard function, πœ†0(𝑑) is the unspecified baseline hazard function, and 𝜷=(𝛽1,πœ·π‘‡2)𝑇 are unknown parameters. Survival model (2.1) assumes that the hazard function πœ†π‘–(𝑑) depends on the unobserved true covariate values π‘§βˆ—π‘–(𝑑), rather than the observed but mismeasured covariate value 𝑧𝑖(𝑑).

In Cox model (2.1), the time-dependent covariate value 𝑧𝑖(𝑑) should be available at any event time 𝑑. In practice, however, covariates are usually measured intermittently at times (say) {𝑒𝑖𝑗,𝑗=1,2,…,π‘šπ‘–} for individual 𝑖, with the measurement times possibly varying across individuals, leading to possible β€œmissing covariates.” Moreover, covariates may be measured with errors. Therefore, it is common to have both missing data and measurement for errors in time-dependent covariates, which must be addressed when conducting inference based on the Cox model (2.1). For simplicity, we assume that the missing covariate data are missing at random [17].

To address either missing data or measurement error or both, a standard approach is to model the time-dependent covariates. A common choice is the following LME model:𝐳𝑖=π‘ˆπ‘–πœΆ+π‘‰π‘–πšπ‘–+ππ‘–β‰‘π³βˆ—π‘–+𝝐𝑖,𝑖=1,…,𝑛,(2.2) where π‘ˆπ‘– and 𝑉𝑖 are known design matrices, 𝜢 is a vector of fixed-effects parameters, πšπ‘– is a vector of random effects, 𝝐𝑖=(πœ–π‘–1,…,πœ–π‘–π‘šπ‘–)𝑇 is a vector of measurement errors, and the unobserved true covariates are π³βˆ—π‘–=π‘ˆπ‘–πœΆ+π‘‰π‘–πšπ‘–. Often, we assume thatπšπ‘–βˆΌπ‘(0,𝐴),ππ‘–ξ€·βˆΌπ‘0,𝜎2𝐼,(2.3) and πšπ‘– and 𝝐𝑖 are independent, where 𝐴 is a covariance matrix, 𝜎2 is a parameter, and 𝐼 is an identity matrix. Here, we focus on the case that the observations of the covariate process is truncated by the event time; that is, no covariate data are available after the event occurs (such as death or dropout).

Note that the survival model (2.1) and the longitudinal model (2.2) are linked through the shared random effects πšπ‘–. In some applications, not necessarily in the measurement error context, the shared random effects may be viewed as a latent process that governs both the longitudinal process and the survival process. The shared random effects induce the dependence between the longitudinal and survival processes, and this dependence suggests the need of joint modelling.

There are two commonly used approaches for inference of joint models:(i)two-stage methods,(ii)likelihood methods.

In the following sections, we describe these two approaches in detail. Other approaches for joint modes have also been proposed, such as those based on estimating equations, but we omit them here for space consideration.

3. Two-Stage Methods

In the joint modelling literature, various two-stage methods have been proposed. A simple (naive) two-stage method is as follows.

Stage 1. Fit a LME model to the longitudinal covariate data, and estimate the missing or mismeasured covariates based on the fitted model.

Stage 2. Fit the survival model separately, with the missing or unobserved true covariate values substituted by their estimates from the first stage as if they were observed values and then proceed with the usual survival analysis.

Main advantages of the two-stage methods, including the modified two-stage methods as described below, are the simplicity and that they can be implemented with existing software. The limitation of those methods is that they may lead to biased inference for several reasons. First, in the estimation of the longitudinal covariate model parameters, the truncations resulted from the events are not incorporated. That is, the longitudinal covariate trajectories of subjects who experience an event may be different from those who do not experience that event, so estimation of the parameters associated with the longitudinal covariate model in the first stage, based only on observed covariate data, may be biased. Second, the uncertainty of the estimation in the first stage is not incorporated in the second stage of the survival model estimation. Thus, standard errors of the parameter estimates of the survival model may be underestimated. Third, all information in the longitudinal process and the survival process is not fully combined in each model fitting to produce the most efficient estimates.

The bias in the estimation of the longitudinal model parameters caused by ignoring the informative truncations from the events may depend on the strength of the association between the longitudinal process and the survival process. The bias resulted from ignoring the estimation uncertainty in Stage 1 may depend on the magnitude of measurement errors in covariates. To address these issues, various modified two-stage methods have been proposed, leading to better two-stage methods.

Self and Pawitan [18] considered a two-stage method in which the least-square method was used to fit individual longitudinal covariate trajectories; the resulting estimates were used to impute the β€œtrue” covariate values in the survival models and inference was then based on the usual partial likelihood. Tsiatis et al. [19] considered an approximation to the hazard function and then using the approximation to construct the partial likelihood. They replaced the true covariate π‘§βˆ—π‘–(𝑑) by an empirical Bayes estimate of the conditional expectation 𝐸(π‘§βˆ—π‘–(𝑑)βˆ£π‘§π»π‘–(𝑑),𝑑≀𝑠𝑖), where 𝑧𝐻𝑖(𝑑)={𝑧𝑖(𝑒),𝑒≀𝑑} is the covariate history up to time 𝑑. They obtained the empirical Bayes estimate from a standard fit of the LME model to the covariate data up to time 𝑑 for all subjects still at risk at time 𝑑. Similar two-stage methods were also proposed in Bycott and Taylor [20] and Dafni and Tsiatis [21].

More recently, other two-stage methods have been developed in the literature. In the sequel, we review some of these recent methods. Following Prentice [22], we rewrite the survival model (2.1) asπœ†π‘–ξ€·π‘‘;𝑧𝑖(𝑑),𝐱𝑖=πœ†0𝑧(𝑑)𝐸expβˆ—π‘–(𝑑)𝛽1+π±π‘‡π‘–πœ·2ξ€Έβˆ£π‘§π‘–(𝑑),𝐱𝑖,𝑑𝑖>𝑑,(3.1) which involves an intractable conditional expectation. Following Dafni and Tsiatis [21] and Ye et al. [8], we approximate the above conditional expectation by𝐸𝑧expβˆ—π‘–(𝑑)𝛽1+π±π‘‡π‘–πœ·2ξ€Έβˆ£π‘§π‘–(𝑑),𝐱𝑖,𝑑𝑖𝐸𝑧>π‘‘β‰ˆexpβˆ—π‘–(𝑑)𝛽1+π±π‘‡π‘–πœ·2βˆ£π‘§π‘–(𝑑),𝐱𝑖,𝑑𝑖.>𝑑(3.2) A two-stage method may then proceed as follows. In the first step, we estimate the conditional expectation 𝐸(π‘§βˆ—π‘–(𝑑)𝛽1+π±π‘‡π‘–πœ·2βˆ£π‘§π‘–(𝑑),𝐱𝑖,𝑑𝑖>𝑑) by fitting the covariate model (2.2) to the observed longitudinal data and survival data. In the second step, we then substitute the conditional expectation (3.2) in (3.1) by its estimate from the first step and then we proceed with standard inference for the Cox model. Ye et al. [8] proposed two approaches for the first step, called risk set regression calibration (RRC) method and ordinary regression calibration (ORC) method, respectively. The idea is to fit the LME covariate model (2.2) to either the observed covariate data in the risk set or all observed covariate data.

Note that the bias resulted from the naive two-stage method is caused by the fact that the covariate trajectory is related to the length of followup. For example, subjects who drop out early or die early may have different trajectories than those who stay in the study. Thus, much of the bias may be removed if we can recapture these missing covariate measurements due to truncation by incorporating the event time information. Albert and Shih [9] proposed to recapture the missing measurements by generating data from the conditional distribution of the covariate given the event time:π‘“ξ€·π³π‘–βˆ£π‘ π‘–ξ€Έ=ξ€œπ‘“ξ€·π³;πœ½π‘–βˆ£πšπ‘–ξ€Έπ‘“ξ€·πš;πœ½π‘–βˆ£π‘ π‘–ξ€Έ;πœ½π‘‘πšπ‘–,(3.3) where the covariate 𝐳𝑖 and event time 𝑠𝑖 are assumed to be conditionally independent given the random effects πšπ‘–, and 𝜽 contains all unknown parameters. They approximate the conditional density 𝑓(π³π‘–βˆ£π‘ π‘–;𝜽) using a LME model, and then use standard software to simulate missing data from 𝑓(π³π‘–βˆ£π‘ π‘–;𝜽). Once the missing measurements are simulated, the covariate model is then fitted to the β€œcomplete data,” which are used in the second step. The procedure is iterated several times to incorporate the missing data uncertainty. Thus, the idea is similar to a multiple imputation method with nonignorable missing data. Such an approach may reduce the bias resulted from truncations.

To incorporate the estimation uncertainty in the first step, we may consider a parametric bootstrap method as follows.

Step 1. Generate covariate values based on the assumed covariate model, with the unknown parameters substituted by their estimates.

Step 2. Generate survival times from the fitted survival model.

Step 3. For each generated bootstrap dataset from Steps 1 and 2, fit the models using the two-stage method and obtain new parameter estimates.

Repeating the procedure 𝐡 times (say, 𝐡=500), we can obtain the standard errors for the fixed parameters from the sample covariance matrix across the 𝐡 bootstrap datasets. This Bootstrap method may produce more reliable standard errors than the naive two-stage method if the assumed models are correct.

Two-stage methods have bearing with the regression calibration method in measurement error literature. Many of these two-stage methods may not completely remove biases. Moreover, they rely on certain assumptions and approximations. The validity of these assumptions and the accuracy of these approximations need to be further investigated.

4. Likelihood Methods

The likelihood method is perhaps the most widely used approach in the joint model literature. It provides a unified approach for inference, and it produces valid and the most efficient inference if the assumed models are correct. The likelihood method is based on the likelihood for both longitudinal data and survival data. However, since the likelihood function can be complicated, a main challenge for the likelihood method is computation.

4.1. The Likelihood

All the observed data are {(𝑑𝑖,𝛿𝑖,𝐳𝑖,𝐱𝑖),𝑖=1,2,…,𝑛}. Let 𝜽=(𝜷,𝜢,𝜎,𝐴,𝝀0) denote the collection of all unknown parameters in the models, where 𝝀0={πœ†0(𝑑𝑖),𝑖=1,2,…,𝑛}. We assume that the censoring of survival data and the assessment process of longitudinal measurements are noninformative. The (overall) likelihood for all the observed data is given by𝐿(𝜽)=𝑛𝑖=1ξ€œπ‘“ξ€·π‘‘π‘–,π›Ώπ‘–βˆ£π³βˆ—π‘–,𝝀0𝑓𝐳,πœ·π‘–βˆ£πšπ‘–,𝜢,𝜎2ξ€Έπ‘“ξ€·πšπ‘–ξ€Έβˆ£π΄π‘‘πšπ‘–,(4.1) where𝑓𝑑𝑖,π›Ώπ‘–βˆ£π³βˆ—π‘–,𝝀0ξ€Έ=ξ€Ίπœ†,𝜷0𝑑𝑖𝑧expβˆ—π‘–ξ€·π‘‘π‘–ξ€Έπ›½1+𝐱𝑇𝑖𝛽2ξ€Έξ€»π›Ώπ‘–ξ‚Έβˆ’ξ€œΓ—exp𝑑𝑖0πœ†0(𝑧π‘₯)expβˆ—π‘–ξ€·π‘‘π‘–ξ€Έπ›½1+𝐱𝑇𝑖𝛽2ξ€Έξ‚Ή,𝑓𝐳𝑑π‘₯π‘–βˆ£πšπ‘–,𝜢,𝜎2ξ€Έ=ξ€·2πœ‹πœŽ2ξ€Έβˆ’π‘šπ‘–/2ξƒ¬βˆ’ξ€·π³expπ‘–βˆ’π³βˆ—π‘–ξ€Έπ‘‡ξ€·π³π‘–βˆ’π³βˆ—π‘–ξ€Έ2𝜎2ξƒ­,π‘“ξ€·πšπ‘–ξ€Έ=ξ€·||𝐴||ξ€Έβˆ£π΄2πœ‹βˆ’1/2ξƒ¬βˆ’ξ€·πšexpπ‘‡π‘–π΄βˆ’1πšπ‘–ξ€Έ2ξƒ­.(4.2)

Parameter estimation can then be based on the observed-data likelihood 𝐿(𝜽) via the maximum likelihood method. Note that the baseline hazard πœ†0(𝑑) in the Cox model is unspecified. It can be estimated using the nonparametric maximum likelihood method by assuming that πœ†0(𝑑) takes discrete mass at each failure time 𝑑𝑖. Thus, the dimension of the parameter vector 𝝀0 is equal to the number of unique failure times. This converts the semiparametric Cox model to a parametric model, but it introduces a major challenge since standard asymptotic theory for the maximum likelihood estimators (MLEs) may not apply due to the infinitely dimensional nature of 𝝀0.

MLEs of the model parameters can either be obtained by a direct maximization of the observed data log likelihood or by using an EM algorithm. Since the observed data log likelihood involves an intractable integral, a direct maximization is often based on numerical integration techniques such as the Gaussian Hermite quadrature or Monte Carlo methods. These methods, however, can be quite computationally intensive if the dimension of the unobservable random effects πšπ‘– is not low. The EM algorithm is known for its stability and generality, so it is widely used for likelihood inference of joint models [1, 3, 11, 23]. Since the E-step of an EM algorithm still involves an intractable integral, Monte Carlo methods or Laplacian approximations are often used to approximate the conditional expectation in the E-step. In the M-step, the Newton-Raphson method is often used.

Hsieh et al. [24] noted that standard errors for estimators of the parameters (𝜢,𝜷,𝜎) based on the Fisher information matrix may be problematic, because of the semiparametric nature of the joint model. They recommended a bootstrap method to obtain standard errors.

4.2. Computational Issues

A main challenge in the likelihood inference for joint models is the computational complexity, since numerical methods or Monte Carlo methods can be very computationally intensive when the dimension of the random effects πšπ‘– is not small. Moreover, convergence of the EM algorithms can sometimes be an issue. Tsiatis and Davidian [1] and Tsiatis and Davidian [25] proposed an alternative approach, a so-called conditional score method, which makes no distributional assumption for the random effects πšπ‘–. Their method treats πšπ‘– as β€œnuisance parameters” and conditions on an appropriate β€œsufficient statistic” for πšπ‘–. Conditioning on this sufficient statistic would remove the dependence of the conditional distribution on the random effects πšπ‘–. This approach leads to a set of unbiased estimating equations, similar to the usual partial likelihood score equations or generalized estimating equation (GEE). The resulting estimates are, under certain regularity conditions, consistent and asymptotically normal, although they may not be the most efficient. Moreover, their method is relatively simple to implement. Song et al. [26] considered an alternative approach to relax the normality assumption of πšπ‘–. They assume that the random effects follow distributions in a class of smooth density family, including the standard normal density as a special case. They use the likelihood method for inference via an EM algorithm, but the computation is considerably more intensive than the conditional score method. Song and Wang [27] also proposed a local corrected score estimator and a local conditional score estimator, for which no distributional assumptions are needed for the underlying true covariates.

Approximate but computationally more efficient methods for joint models have also appeared in the literature, such as those based on Laplace approximations (e.g., [11, 12, 28]). When the dimension of the integration in the likelihood for joint models is high, the Laplace approximations offer considerably computational advantages over numerical or Monte Carlo methods. Note that the order of the Laplace approximation error is 𝑂(π‘šπ‘–βˆ’1), which cannot be made arbitrarily accurate for a given dataset, where π‘šπ‘– is the number of within-individual measurements for individual 𝑖. Therefore, the Laplace approximation works well if the number of within-individual measurements is large. Approximate methods based on Taylor series approximations are similar to the Laplace approximation; that is, their performance improves as π‘šπ‘– increases.

Rizopoulos et al. [11] proposed to use the full exponential Laplace approximation in the E-step of the EM algorithm. Compared to the standard (first-order) Laplace approximation, the full exponential Laplace approximate method has approximation error of order 𝑂(π‘šπ‘–βˆ’2) and requires a much smaller number of within-individual longitudinal measurements to produce reliable results. Lee et al. [28] suggested second-order Laplace approximations. However, these Laplace approximation methods cannot control the magnitude of the approximation errors, unlike Gaussian quadrature or Monte Carlo integration techniques.

5. Bayesian Methods

Bayesian joint models have also been studied by various authors, including Faucett and Thomas [29], Xu and Zeger [15], Wang and Taylor [5], Law et al. [30], Ibrahim et al. [16], and Huang et al. [13]. Joint models may contain many unknown parameters, which may lead to potential problems in inference. A main advantage of Bayesian methods is that they can borrow additional information from similar studies or from experts and incorporate this information in the current analysis, in the forms of prior distributions for the current model parameters. Thus, Bayesian methods can be very useful for inference of joint models.

For Bayesian joint models, the model parameters are assumed to follow some prior distributions, and inference is then based on the posterior distribution given the observed data. Let 𝜽 denote the collection of unknown parameters in the joint model, and let 𝑓(𝜽) denote the prior distribution. Let π’Ÿ={(𝑑𝑖,𝛿𝑖,𝐳𝑖,𝐱𝑖),𝑖=1,2,…,𝑛} denote all observed data. The joint posterior distribution for all unknown parameters 𝜽 and random effects 𝐚={πšπ‘–,𝑖=1,…,𝑛} is then given by𝑓(𝜽,πšβˆ£π’Ÿ)βˆπ‘›ξ‘π‘–=1𝑓𝑑𝑖,π›Ώπ‘–βˆ£π³βˆ—π‘–,𝐱𝑖𝑓𝐳,πœ½π‘–βˆ£πšπ‘–ξ€Έπ‘“ξ€·πš,πœ½π‘–π‘“ξ€·βˆ£π΄ξ€Έξ€»πœ½βˆ£πœ½0ξ€Έ,(5.1) where 𝜽0 are known hyperparameters. Bayesian inference is then based on Monte Carlo samples drawn from the posterior distribution 𝑓(𝜽,πšβˆ£π’Ÿ) using an MCMC algorithm such as the Gibbs sampler. For example, the posterior means and variances of the parameters can be estimated based on these Monte Carlo samples, and Bayesian inference can then be based on these estimated posterior means and variances. This Monte Carlo sampling can be done using the publically available WinBUGS software [31], which is quite general, flexible, and easy to use.

Like other Bayesian methods, it is desirable to check if the final results are sensitive to the choices of prior distributions. Sometimes, in the absence of prior information, noninformative priors or flat priors may be desirable.

6. Other Joint Models

In the previous sections, we have focused on joint models based on a Cox model for right-censored survival data and a LME model for longitudinal data. Other models for survival data and longitudinal data can also be considered in joint models. For example, for survival data, we may consider accelerated failure time (AFT) models and models for interval censored data and models for recurrent events. For longitudinal data, nonlinear, generalized linear mixed models or semiparametric/nonparametric mixed models can be utilized. Although the different survival models and longitudinal models can be employed, basic ideas and approaches for inference remain essentially the same. In the following, we briefly review some of these joint models.

6.1. Joint Models Based on an LME Model and an AFT Model

In joint modelling of longitudinal and survival data, we can use the AFT model to feature survival data. Here, we focus on an AFT model with measurement errors in time-dependent covariates. For longitudinal data, we again consider LME models for simplicity. The description below is based on Tseng et al. [23]. A semiparametric AFT model can be written in a form similar to the Cox model:β„Žπ‘–(𝑑)=β„Ž0ξ‚Έξ€œπ‘‘0ξ€½expβˆ’π‘§βˆ—π‘–(𝑒)𝛽𝑑𝑒expβˆ’π‘§βˆ—π‘–(𝑑)𝛽,(6.1) where β„Žπ‘–(𝑑) is the hazard function of the 𝑖th individual at time 𝑑, β„Ž0(𝑑) is the baseline hazard function, and π‘§βˆ—π‘–(𝑑) is the unobserved true covariate value at time 𝑑. For the observed measurements 𝑧𝑖(𝑑), we again consider the LME model (2.2).

Tseng et al. [23] proposed a likelihood method using an EM algorithm. The likelihood for all observed data is given by𝐿(𝜽)=𝑛𝑖=1ξ€œπ‘“ξ€·π‘‘π‘–,π›Ώπ‘–βˆ£π³βˆ—π‘–,β„Ž0𝑓𝐳,π›½π‘–βˆ£πšπ‘–,𝜢,𝜎2ξ€Έπ‘“ξ€·πšπ‘–ξ€Έβˆ£π΄π‘‘πšπ‘–,(6.2) where 𝑓(π³π‘–βˆ£πšπ‘–,𝜢,𝜎2) and 𝑓(πšπ‘–βˆ£π΄) are the same as those for (4.1) and𝑓𝑑𝑖,π›Ώπ‘–βˆ£π³βˆ—π‘–,β„Ž0ξ€Έ=ξƒ¬β„Ž,𝛽0ξ€½πœ™ξ€·π‘‘π‘–;𝜽,πšπ‘–ξ€·π‘‘ξ€Έξ€Ύπœ•πœ™π‘–;π³βˆ—π‘–ξ€Έ,π›½πœ•π‘‘π‘–ξƒ­π›Ώπ‘–ξƒ―βˆ’ξ€œexpπœ™(𝑑𝑖;π³βˆ—π‘–0,𝛽)β„Ž0ξƒ°,(𝑒)𝑑𝑒(6.3) where π³βˆ—π‘– denotes the covariate history and πœ™ is a known function.

Handling the AFT structure in the joint modelling setting is more difficult than for the Cox model, since 𝑓(𝑑𝑖,π›Ώπ‘–βˆ£π³βˆ—π‘–,β„Ž0,𝜷) is more complicated and the baseline function β„Ž0{πœ™(𝑑𝑖;π³βˆ—π‘–,𝜷)} involves unknown quantities (𝜷,𝜢,𝐴,πšπ‘–), while this is not the case in the Cox model. One cannot use the point mass function with masses assigned to all uncensored survival times 𝑑𝑖 for the baseline hazard function β„Ž0. In other words, in Cox models, the baseline hazard β„Ž0 can be represented by a collection of parameters which are point masses, but this approach is not feasible for the AFT model because of its dependence on covariates via function πœ™(𝑑𝑖;π³βˆ—π‘–,𝜷). To circumvent this, Tseng et al. [23] assumed the baseline hazard function β„Ž0 to be a step function, taking constant values between two consecutive failure times.

Tseng et al. [23] used an Monte Carlo EM algorithm to obtain the MLEs. The framework is similar to that in the previous section. They used a Monte Carlo method to approximate the conditional expectations in the E-step. The M-step involves more complicated computations due to the complicated baseline hazard β„Ž0. To obtain the standard errors of the MLEs, the usual asymptotic formula based on Fisher information matrix may be questionable, so they used a bootstrap method.

6.2. Joint Models with Interval Censored Survival Data

In the previous sections, we have focused on right censored survival data and assume that either the exact survival times or censoring times are observed. In practice, however, we often cannot observe the exact survival nor censoring times, but we only know that events have occurred over certain time intervals. Such survival data are called interval censored. For simplicity, we assume that all individuals are assessed at the same times. Again, let 𝑆𝑖 be the time to an event (survival time) for individual 𝑖, with observed value 𝑠𝑖. Let 𝐫𝑖=(π‘Ÿπ‘–1,…,π‘Ÿπ‘–π‘š)𝑇 be the vector of event indicators such that π‘Ÿπ‘–π‘—=1 if subject 𝑖 has an event occurred from time π‘‘π‘—βˆ’1 to time 𝑑𝑗, and let π‘Ÿπ‘–π‘—=0 otherwise, 𝑖=1,2,…,𝑛; 𝑗=1,2,…,π‘š. We assume that π‘Ÿπ‘–1=0 for all 𝑖. Let 𝑝𝑖𝑗=𝑃(π‘‘π‘—βˆ’1≀𝑆𝑖<𝑑𝑗), and letπœ‹π‘–π‘—ξ€·π‘‘=π‘ƒπ‘—βˆ’1≀𝑆𝑖<π‘‘π‘—βˆ£π‘†π‘–β‰₯π‘‘π‘—βˆ’1𝑆=1βˆ’π‘ƒπ‘–β‰₯π‘‘π‘—βˆ£π‘†π‘–β‰₯π‘‘π‘—βˆ’1ξ€Έ.(6.4) Then, we have 𝑝𝑖𝑗=(1βˆ’πœ‹π‘–1)(1βˆ’πœ‹π‘–2)β‹―(1βˆ’πœ‹π‘–,π‘—βˆ’1)πœ‹π‘–π‘—. The probability function for the event indicator vector 𝐫𝑖 can be written as𝑓𝐫𝑖=π‘šξ‘π‘—=1π‘π‘Ÿπ‘–π‘—π‘–π‘—=π‘šξ‘π‘—=1πœ‹π‘Ÿπ‘–π‘—π‘–π‘—ξ€·1βˆ’πœ‹π‘–π‘—ξ€Έ1βˆ’π‘Ÿπ‘–π‘—,(6.5) which is the probability function for a Bernoulli distribution. We can introduce observed error-prone covariate value 𝐳𝑖, with true value π³βˆ—π‘–, and assumeξ€½ξ€·logβˆ’log1βˆ’πœ‹π‘–π‘—ξ€Έξ€Ύ=πœ·π‘‡π³βˆ—π‘–+𝛾𝑗,(6.6) where 𝜷 and 𝜸=(𝛾1,…,π›Ύπ‘š)𝑇 are unknown parameters. Then, we can write the probability function of 𝐫𝑖 as 𝑓(π«π‘–βˆ£π³βˆ—π‘–,𝜷,𝜸). Alternatively, we can assume that 𝐫𝑖 depends on π³βˆ—π‘– only through the random effects πšπ‘– and writing the probability function of 𝐫𝑖 as 𝑓(π«π‘–βˆ£πšπ‘–,𝜷,𝜸).

Let 𝑓(π³π‘–βˆ£πšπ‘–,𝜢,𝜎) be the conditional probability density function, given the random effects πšπ‘–, and 𝑓(πšπ‘–βˆ£π΄) be the marginal probability density function for πšπ‘– with covariance matrix 𝐴. Let 𝜽 denote the collection of all parameters in all models. Then, the likelihood for all the observed data can be written asπΏπ‘œ(𝜽)=𝑛𝑖=1ξ‚Έξ€œπ‘“ξ€·π³π‘–βˆ£πšπ‘–ξ€Έπ‘“ξ€·π«,𝜢,πœŽπ‘–βˆ£πšπ‘–ξ€Έπ‘“ξ€·πš,𝜷,πœΈπ‘–ξ€Έβˆ£π΄π‘‘πšπ‘–ξ‚Ή.(6.7) MLE of parameters 𝜽 can be obtained by maximizing the observed data likelihood πΏπ‘œ(𝜽). Because the observed-data likelihood πΏπ‘œ(𝜽) can be difficult to evaluate due to its involvement of an intractable and possibly high-dimensional integral, one may proceed with Monte Carlo EM algorithms or other computationally more efficient approximate methods.

6.3. GLMM and NLME Models for Longitudinal Data

We have focused on LME models for modelling the longitudinal data. Other models for longitudinal data can also be considered. For example, one may consider nonlinear mixed-effects (NLME) models for modelling the longitudinal data in joint models [12, 32]. NLME models are often mechanistic models in the sense that they are typically based on the underlying mechanisms which generate the data. On the other hand, LME models are typically empirical models; that is, they are usually used to approximately describe the observed data without considering possible data-generation mechanisms. Thus, NLME models may be scientifically more desirable if such models exist. Similarly, for nonnormal longitudinal data, generalized linear mixed models (GLMMs) can be considered, which are special nonlinear models but are essentially empirical models as well.

When the longitudinal models are nonlinear, the general ideas of the two-stage methods and likelihood methods for joint models can still be applied. The complication is that computation becomes more demanding, because of the nonlinearity of the longitudinal models.

6.4. Joint Models with Missing Data

For longitudinal data, missing values are very common. When missing data are nonignorable in the sense that the missingness probability may be related to the missing values or the random effects, the missing data process is often needed to be incorporated in inferential procedures in order to obtain valid results. For likelihood methods, it is straightforward to incorporate missing data mechanisms in joint model inference. However, the computation becomes even more challenging. Wu et al. [12, 32] considered the missing data problems for joint models, using Monte Carlo EM algorithms and Laplace approximations.

7. Example and Simulation

7.1. An Illustrating Example

As an illustration, we consider an AIDS dataset which includes 46 HIV infected patients receiving an anti-HIV treatment. Viral load (i.e., plasma HIV RNA) and CD4 cell count were repeatedly measured during the treatment. The number of viral load measurements for each individual varies from 4 to 10. It is known that CD4 is measured with substantial errors. About 11% viral load measurements are below the detection limit after the initial period of viral decay. We call the viral load below the detection limit as β€œviral suppression.” We wish to check if the initial CD4 trajectories are predictive for the time to viral suppression.

Let 𝑠𝑖 be the time to viral suppression, that is, the time from the start of the treatment to the first scheduled time when the viral load drops below the detection limit. The viral suppression times for patients whose viral loads never dropped below detection limit may be regarded as right censored, with the study end time as the censoring time. We employ two models, the Cox model (2.1) or the AFT model (6.1), to feature the time to viral suppression, where π‘§βˆ—π‘–(𝑑) is the unobserved true CD4 cell count at time 𝑑 for individual 𝑖. We do not consider additional covariates here.

To address the measurement error in the time-dependent covariate CD4 cell count, we use the LME model to model the CD4 trajectories:CD4𝑖𝑗=𝛼0+π‘Žπ‘–1ξ€Έ+𝛼1+π‘Žπ‘–2𝑒𝑖𝑗+πœ–π‘–π‘—,(7.1) where the parameters, random effects and random errors, and the assumed distributions are the same as those described in the previous sections. The fixed parameters (𝛼0,𝛼1) are population intercept and slope of the CD4 process and (π‘Žπ‘–1,π‘Žπ‘–2) are individual variations from the population averages. To avoid very small (large) estimates, which may be unstable, we standardize the CD4 cell counts and rescale the original time 𝑑 (in days) so that the new time scale is between 0 and 1. We estimate the model parameters using the joint model method and the two-stage method with/out bootstrap standard error correction. The number 𝐡=500 of the bootstrap samples is taken. For the joint model method, we consider the Cox model (2.1) and the AFT model (6.1) with β„Ž0(β‹…) being the Weibull baseline risk function. On the other hand, only the Cox model (2.1) is employed for the two-stage method for comparison. These analyses may be implemented by using the functions π‘π‘œπ‘₯π‘β„Ž(), π‘™π‘šπ‘’(), and π‘—π‘œπ‘–π‘›π‘‘π‘€π‘œπ‘‘π‘’π‘™() in R software.

Table 1 presents the resulting estimates of main parameters of interest and their standard errors. We can see from Table 1 that under either the Cox or the AFT survival models, both the two-stage and the joint model methods produce similar estimates for the covariate (CD4) longitudinal model. However, the two-stage method may underestimate the standard errors of the parameter estimates since it does not incorporate the survival information in the estimation procedure. Note that the parameters in the Cox model and the AFT model have different interpretations, due to different model formulations, so they are not directly comparable.

The parameter 𝛽1 in the survival models measures the effect of the true time-dependent covariate CD4 values on event times, so its estimate and the associated 𝑃 value can be used to check if the true CD4 values are predictive for the times to viral suppression. Since the covariate CD4 is measured with errors, addressing the measurement error is the main focus of joint modelling in this application. Thus, the estimation of 𝛽1 is of primary interest. For the joint model method, under either the Cox or the AFT models, there is some evidence that covariate CD4 is associated with the time to viral suppression, after measurement error has been addressed. It is seen that evidence of significance of the covariate effect is the strongest under the Cox model. On the other hand, the two-stage method may severely underestimate the covariate CD4 effect (the small value of ̂𝛽1). Moreover, the naive two-stage method underestimates the standard error of ̂𝛽1, due to failing to incorporate the estimating uncertainty from the first step. This underestimation of standard error is somewhat corrected by the bootstrap method.

7.2. A Simulation Study

In this section, we conduct a simulation study to compare the joint model method and the two-stage method with/out bootstrap standard error correction. We generate 500 datasets from the time-dependent covariate CD4 process (7.1) in the example of the previous section and the Cox model (2.1) with constant baseline hazard function πœ†0(𝑑)≑1 with emphasis on the effect of the time-dependent covariate. The measurement time points used in the simulation are the same as those in the example of the previous section. The true values of model parameters, given in Table 2, are similar to those in the example, and the variance-covariance matrix 𝐴 of the random effect πšπ‘– is set to be diagonal. Again, we take 𝐡=500 bootstrap samples for the bootstrap standard error in the two-stage method.

In Table 2, we report the simulation results of averages parameter estimates (Est), empirical standard errors (ESE), averages of asymptotic standard errors (ASE), average of bootstrap standard errors (BSE), empirical biases (Bias), mean square errors (MSE), and coverage rates (CR) for the 95% confidence intervals. We can see from Table 2 that the two-stage and the joint model methods produce similar parameter estimates (close to true parameters) except the one for the covariate effect 𝛽1. In particular, the estimate ̂𝛽1 based on the joint model method is very close to its true value, while the estimate based on the two-stage method is about one-third of its true value, which indicates that the two-stage method may underestimate the time-dependent covariate effect severely. The joint model method provides smaller mean square errors and more reasonable coverage rates for the 95% confidence intervals than the two-stage method. Moreover, the two-stage method may underestimate the standard deviation of ̂𝛽1 and the bootstrap correction on this standard error seems plausible.

From the above results, we see that the joint likelihood method produces less biased estimates and more reliable standard errors than the two-stage method. These results have important implications. For example, if one uses Wald-type tests for model selection, the likelihood method would give more reliable results. However, two-stage methods are generally simpler and computationally quicker to output estimates than likelihood methods. We can also compare the two methods with Bayesian methods. Note that, however, Bayesian methods are equivalent to the likelihood method when noninformative priors are used. We expect that Bayesian methods have similar performance to likelihood methods.

8. Discussion

We have provided a brief review of common joint models and methods for inference. In practice, when we need to consider a longitudinal process and an event process and suspect that the two processes may be associated, such as survival models with time-dependent covariates or longitudinal models with informative dropouts, it is important to use joint model methods for inference in order to avoid biased results. The literature on model selection for joint models is quite limited. In practice, the best longitudinal model can be selected based on the observed longitudinal data, and the best survival model can be selected based on the survival data, using standard model selection procedures for these models. Then, we specify reasonable link between the two models, such as shared random effects. To choose methods for inference, the joint likelihood method generally produces most reliable results if the assumed models and distributions are correct. On the other hand, the two-stage methods may be computationally simpler, and many existing models and methods for longitudinal data and survival data can be easily adapted. However, two-stage methods may not completely eliminate the biases in parameter estimates in some cases.

When the longitudinal covariate process terminates at event times, that is, when the longitudinal values are unavailable at and after the event times such as deaths or dropouts, the covariates are sometimes called internal time-dependent covariates. Sometimes, however, longitudinal covariate information is available at and after the event times. For example, CD4 measurements may be still available after patients have been diagnosed with AIDS. Such covariates are sometimes called external covariates. In joint models, it is important to distinguish internal and external time-dependent covariates. In particular, for internal time-dependent covariates, joint models are more desirable since separate analysis in this case may lead to more severe bias.

Survival models with measurement errors in time-dependent covariates have received much attention in the joint models literature. Another common situation is longitudinal models with informative dropouts, in which survival models can be used to model the dropout process. Both situations focus on characterizing the association between the longitudinal and survival processes. Some authors have also considered joint models in which the focus is on more efficient inference of the survival model, using longitudinal data as auxiliary information [15, 33, 34] or assume that the longitudinal process and the survival process are governed by a common latent process [4]. Nathoo and Dean [7] considered an interesting joint model in which an NLME model is used to model tree growth, with spatial correlation incorporated.

Joint models can also be extended to multivariate cases, in which more than one longitudinal processes and more than one event processes can be modelled simultaneously. Extensions are often conceptually straightforward, but computation and implementation can be more tedious than univariate cases. See Henderson et al. [4], Xu and Zeger [35], and Song et al. [26].

Zeng and Cai [36] derived some asymptotic results for maximum likelihood estimators in joint analysis of longitudinal and survival data. They showed the consistency of the maximum likelihood estimators, derived their asymptotic distributions, and showed that the maximum likelihood estimators in joint analysis are semiparametrically efficient.

Although there has been extensive research in joint models in the last two decades and the importance of joint models has been increasingly recognized, joint models are still not widely used in practice. A main reason is perhaps lack of software. Recently, Dimitris Rizopoulos has developed an R package called JM that can be used to fit joint models with normal longitudinal responses and event times under a maximum likelihood approach. Various options for the survival model and optimization/integration algorithms are provided, such as Cox models and AFT models for survival data and the Gauss-Hermite integration methods and Laplace approximations.

Acknowledgments

The authors are grateful to the editor and two referees for helpful and constructive comments. The research was partially supported by the Canada Natural Sciences and Engineering Research Council (NSERC) discovery grants to L. Wu, W. Liu, and G. Y. Yi and by NIAID/NIH Grant AI080338 and MSP/NSA Grant H98230-09-1-0053 to Y. Huang.