#### Abstract

In longitudinal studies, clinicians usually collect longitudinal biomarkers’ measurements over time until an event such as recovery, disease relapse, or death occurs. Joint modeling approaches are increasingly used to study the association between one longitudinal and one survival outcome. However, in practice, a patient may experience multiple disease progression events successively. So instead of modeling of a single event, progression of the disease as a multistate process should be modeled. On the other hand, in such studies, multivariate longitudinal outcomes may be collected and their association with the survival process is of interest. In the present study, we applied a joint model of various longitudinal biomarkers and transitions between different health statuses in patients who underwent renal transplantation. The full joint likelihood approaches are faced with the complexities in computation of the likelihood. So, here, we have proposed two-stage modeling of multivariate longitudinal outcomes and multistate conditions to avoid these complexities. The proposed model showed reliable results compared to the joint model in case of joint modeling of univariate longitudinal biomarker and the multistate process.

#### 1. Introduction

In longitudinal medical studies, repeated measurements of biomarkers are usually collected until the occurrence of a clinical event such as recovery, disease relapse, or death. The main objective of these studies is to study the association between two correlated processes or to use the information of biomarkers to predict or explain the time-to-event. In such analyses, joint models are needed to accurately study the association between two processes. Joint modeling approaches, which have been developed to handle the association between a single longitudinal biomarker and a time-to-event outcome, have received considerable attention recently [1–3].

In practice, a patient may experience multiple disease progression events prior to the event of interest; for example, the patient with cancer may experience a local recurrence followed by a distant recurrence and then death. So, instead of the occurrence of a single event, the progression of the disease should be modeled as a multistate process, focusing on the transitions between different health states and the impact of the longitudinal biomarker on them. Moreover, in such studies, multiple longitudinal biomarkers may be collected for each patient and the correlation structure between them should be taken into account in the model [4].

There are several studies that have extended classical joint models for when there are multiple longitudinal outcomes. Chi and Ibrahim [5] extended a joint model for multivariate longitudinal data and multivariate survival data. Then, Andrinopoulou et al. [6] extended a joint model for competing risks and two longitudinal biomarkers. On the other hand, there are few studies that extended the joint models for multistate data. Dantan et al. [7] proposed a joint multistate model with the latent state for one longitudinal response and illness-death data. Ferrer et al. [4] developed a joint model with the shared random effect framework for joint modeling of longitudinal and multistate processes. However, based on our knowledge, there is no study to model multivariate longitudinal and multistate processes jointly. So, in this paper, we are interested to simultaneous modeling of these two correlated processes.

The joint model based on the shared random effects framework, which is the most commonly used framework of joint modeling of longitudinal and survival data, is faced with implementation issues when the number of the longitudinal outcomes grows [8]. Moreover, modeling the transitions between different health states instead of the time to a single event can make the issue more complicated.

The primary joint models have been based on the two-stage approaches in which the likelihood calculation is implemented in two steps [1, 8, 9]. These initial approaches are not faced with the problem of computational difficulties on the full joint likelihood calculation [1]. So, here, we use the idea of these initial approaches to avoid the computational complexities and propose a two-stage base model for joint modeling of multivariate longitudinal and multistate data. The rest of the paper is organized as follows. The motivating example is described in Section 2. The basic concepts of joint modeling of longitudinal and survival data are provided in Section 3. In Section 4, we illustrate our proposed two-stage based model for multivariate longitudinal and multistate data. The application of the model to renal transplantation data set is presented in Section 5, and a discussion and conclusion is finally given in Section 6.

#### 2. Motivating Dataset

Our motivating example includes end-stage renal disease (ESRD) patients who underwent renal transplantation. The information of all renal transplantations performed at Shahid Beheshti hospital in Hamadan province (western of Iran) from July 1994 to February 2017 (*n* = 408) was collected retrospectively. Patients were followed from the time of transplantation at regular visits (first week, first month, third month, sixth month, and every year thereafter). The number of visits for each patient was variable with the median number of 12 visits and varied between 3 and 37 visits. Donor-Recipient Gender (DRG) was matched in 221 (54.2%) transplantations and was mismatched in 187 (45.8%) transplantations. DRG match included both male donor to male recipient and female donor to female recipient, though DRG mismatch included female donor to male recipient and vice versa. The mean (±SD) age of patients was 37.24 ± 13.46 years.

Several parameters, so-called biomarkers, such as creatinine, hemoglobin, and blood urea nitrogen (BUN), were collected in regular visits to monitor renal transplantation. These biomarkers are variably associated with renal transplantation function and mortality. Often, it is the trend of these biomarkers rather than the measure of their level at a specific time point that affect the risk of an event. Moreover, acute or chronic graft rejection is an unwanted outcome of transplantation, which can affect the overall survival of patients. Indeed, a patient may experience graft failure rejection prior to death and this graft failure may cause an increase in the risk of mortality. So, in this study, there were two types of outcomes, including (1) multiple longitudinal outcomes represent the longitudinal biomarkers and (2) multistate outcome that represents the survival process. To take into account the correlation between longitudinal biomarkers as well as the correlation between longitudinal and multistate processes, it was essential to use the joint modeling approaches.

Multistate process is shown in Figure 1. After transplantation (state 1), a patient may experience graft rejection (state 2) and death (state 3) due or not to ESRD. It should be noted that here all patients with acute or chronic rejection were considered as the same states, due to the small sample sizes. The matrix represents the amount of direct observed transitions between different states. In total, the graft of 53 patients was rejected after transplantation. Overall, 49 patients have died during the follow-up; among them, 14 patients died after rejection, and 35 patients died without graft rejection. Moreover, 359 patients were censored during the follow-up; among them, 320 patients were censored for both rejection and death events, and 39 patients were censored for death after rejection. These 39 patients were returned to dialysis.

The observed levels of the longitudinal biomarkers are called observed values and the predicted levels of longitudinal biomarkers based on the multivariate linear mixed effects model are called expected values. The mean of observed and expected values of the longitudinal biomarkers over time are depicted in Figure 2. In this figure, patients are divided into four groups: those who experienced graft rejection, those whose graft was not rejected during the follow-up, those who died, and those who were alive at the end of the study.

**(a)**

**(b)**

#### 3. Simultaneous Modeling of Longitudinal and Time-to-Event Data

In this section, we will briefly describe the joint modeling of a single longitudinal biomarker and time to a single event of interest, in the shared random effects framework. This model consists of two submodels, including a linear mixed submodel for longitudinal data (repeated measurements of the longitudinal biomarker) and a proportional hazard submodel for time-to-event data.

##### 3.1. Longitudinal Submodel

Let us assume that is the measure of a biomarker on subject , at time , *.* The trajectory of this longitudinal variable is modeled by a linear mixed model as follows:where and are the vector of covariates associated with the vector of fixed effects and the vector of random effects , respectively, where ^{T} denotes the transpose operator. We assumed that ( and are the random intercept and random slope effects, respectively) and , where is the identity matrix of order ; and are independent [10].

##### 3.2. Proportional Hazard Submodel

Let represent the observed failure time for the th subject, where denotes the true failure time, and is the censoring time for th subject. Moreover, an event indicator is defined as , which equals 1 if the failure is observed and 0 otherwise. To model the time-to-event occurrence, a proportional hazard Cox model [11], which takes into account the biomarker’s dynamics via shared random effects, , is used. Thus, the model is defined as follows:where is the hazard at baseline and is the vector of possibly time-dependent fixed effects covariates associated with the vector of coefficients . The multivariate function defines the association between the longitudinal and survival processes, and is the coefficient that quantifies this association. Here, different (most commonly used) association functions which are used to describe the association structure between longitudinal and time-to-event data are presented as follows (note, for ease of writing, here only the time variable is included in the model; however other covariates can also be included in the model):(i)The association structure takes into account the random time trend in the time-to-event model. This association structure is defined as So, the time-to-event submodel is rewritten as in which represents the association between the longitudinal biomarker measurements and the risk of the event at time *t* [2]. This association structure expresses the subject-specific deviations from the average intercept and average slope.(ii)The association structure includes the true value of the longitudinal biomarker (at time *t*) in the time-to-event model, as So, the time-to-event submodel is rewritten as in which is the association between the longitudinal biomarker and the risk of the event at time *t* taking into account fixed and random effects predictions of the true value of the longitudinal biomarker [1].(iii)The association structure includes both the true value of the longitudinal biomarker and the slope of the true biomarker’s trajectory at time *t* in the time-to-event model. This association structure is defined as So, the time-to-event submodel is rewritten as in which represents the association between the current true value of longitudinal biomarker and the risk of the event at time *t* like the previous association function, and defines the association between the slope of the true trajectory at time *t* and the risk of the event. Note that this association structure can catch situations in which two patients have a similar true level of the biomarker at time *t*, but they may have different rate of change of the biomarker [12].(iv)The association structure takes into account the cumulative effect of the longitudinal biomarker (the area under the biomarker trajectory) in the time-to-event model. In this association structure, the integral of the longitudinal trajectory that represents the cumulative effect of the biomarker up to time *t* is calculated via So, the time-to-event submodel is rewritten as in which represents the association between the whole history of the longitudinal biomarker up to time *t* and the risk of the event at time *t*. Note that, in this association structure, the whole history of the biomarker is associated with the risk of the event, while in the three previous structures, the risk of an event is just allowed to be associated with the features of the biomarker at a specific time point [1].

#### 4. Simultaneous Modeling of Multivariate Longitudinal and Multistate Data

As there are multiple longitudinal biomarkers with a multistate process in our motivating dataset, the clinical interest is on the correlation between the longitudinal biomarkers and their association with transitions between different health states. However, the implementation of the joint model within the shared random effects framework, mentioned in the previous section, is difficult when the number of biomarkers is large. This computational problem is due to the high number of parameters in the variance-covariance matrix of the random effects. Moreover, modeling the transitions between different health states instead of the time to a single event can make the issue more complicated. In order to overcome this issue, a quick and approximate estimation method is required.

The initial joint models are based on the two-stage methods, where the likelihoods of the longitudinal and time-to-event data are calculated separately [12–14]. Some investigators have shown in some particular studies that the performance and the resulting estimations of these initial two-stage based models are similar to the joint models (based on the full joint likelihood calculation) [15] in the case of a single longitudinal outcome and single time-to-event outcome. So, in the current paper, we will propose an extension of the joint models for multivariate longitudinal and multistate data based on the main idea of the two-stage approaches.

The proposed model is implemented in two stages. At the first stage, a multivariate linear mixed model is used for multiple longitudinal biomarkers. Then, at the second stage, a multistate model with proportional hazards that incorporate the dynamics of each longitudinal biomarker through association structures (Section 3.2) is used for multistate data.

##### 4.1. Stage 1: Multivariate Linear Mixed Effects Model

At the first stage, a multivariate linear mixed effects model is used for modeling the longitudinal biomarkers measurements. The multivariate linear mixed effects model is defined aswhere and are the vector of covariates associated with the vector of fixed effects and the vector of random effects for the th biomarkers, respectively. It is assumed that when , and (the “” operator vectorizes a matrix by stacking its columns). Note that is a matrix in which the rows are independently distributed as , where is the number of longitudinal measurements for subject . To estimate fixed and random effects coefficients, maximum likelihood and empirical Bayes methods are used, respectively [11]. The main interest of this stage is to obtain the prediction of the longitudinal biomarkers from the multivariate mixed effects model to be included as covariates in the multistate model at Stage 2.

##### 4.2. Stage 2: Multistate Model

At this stage, a multistate proportional hazards model, including the estimation of the longitudinal biomarkers from Stage 1, is fitted to the data. In the following, we describe the multistate process and Markovian multistate model which is used at this stage.

For each individual, a multistate process is observed. Let be the multistate process that takes values in finite state space and denotes the state which is occupied by subject at time *t*. The multistate process is assumed to be continuous, and is a non-homogeneous Markov process. The Markov assumption ensures that the future of the process depends only on the past via the current state. Let define as the vector of the observed transition times for the th subject, with ; i.e., the time of each transition should be greater than the previous one. The vector of observed transition indicators for subject is defined by , with equal to 1 if a direct transition to state is observed at time and 0 otherwise. There will be direct transition for subject, if the last observed state is an absorbing state (i.e., it is impossible to leave once it is entered (e.g., death state)). Otherwise, there are direct transitions, and is equal to the right censoring .

A Markov multistate model with proportional hazards is used to model the transition times. The transition intensity for the transition from state to state at time *t* is given bywhere is the completely unspecified baseline hazard for the transition from *h* to *k*, is the vector of possibly time-dependent transition-specific covariates, and is the associated vector of fixed regression coefficients. The multivariate function (mentioned in Section 3) defines the association between the -th longitudinal biomarker and the transition from state *h* to *k*, and its corresponding coefficient quantifies this association [4]. The maximum likelihood method is used to estimate the parameters of this model.

The details of the implementation procedure are given in Section 5, when the model is used to analyze the renal transplantation data.

#### 5. Application to Renal Transplantation Data

In this section, we first describe the implementation of our proposed two-stage based model and then give its fitting results for our motivating renal transplantation data.

##### 5.1. Implementation of the Two-Stage Model

*Stage 1.* A multivariate linear mixed effects model is fitted. Use the *mlmmm.em()* function from the *R mlmmm* package to fit the multivariate linear mixed effects model. Joining the following equations showed the multivariate mixed model for subject at time :

As described in Section 3, different association structures can be used to explain the dependence of the two processes. Here, we focus on the four association structures described in Section 3.2. So, the estimation of the parameters is used to obtain the linear predictors of the longitudinal biomarkers via the following equations:

These linear predictors are used as the covariates in Stage 2 for the multistate model.

*Stage 2.* The transitions intensity between different states (shown in Figure 1) including the estimation of the longitudinal biomarkers (linear predictors obtained at Stage 1) is defined as follows:

Note that, in the two-stage based models, the likelihood calculation for longitudinal and time-to-event submodel are done separately.

At this stage, the *mstate* package is used to fit the Markov multistate model. The *msprep()* and *expands.covs()* functions are used to prepare the multistate data, and then the *coxph()* function is used to estimate the parameters of the transition-specific multistate model.

##### 5.2. Results

The results of the multivariate longitudinal model (Stage 1) are presented in Table 1. As shown in this table, the time variable had significant linear and quadratic effects on creatinine, hemoglobin, and BUN measurements. Furthermore, the results showed that DRG was associated with the creatinine trajectory and age of recipient was associated with hemoglobin, and BUN measurements (Table 1). Since the association between three biomarkers was at primary interest, the correlation of the random intercepts and random slopes of longitudinal biomarkers was examined in the multivariate model. The correlation matrix of the random effects based on joint modeling of the longitudinal biomarkers is given in Table 2. As shown in this table, the creatinine and BUN measurements were correlated on both the intercept (*r* = 0.67) and the slope (*r* = 0.75). Moreover, there was a moderate inverse correlation between the random slope of hemoglobin and the random slope of creatinine (*r* = −0.52) as well as between the random slope of hemoglobin and the random slope of BUN (*r* = −0.55).

The results of the multistate model (Stage 2) fitted with various association structures are given in Table 3. We can observe the effect of different association structures on the results of two-stage based model through comparing their estimations. For instance, we had not found any significant effect of the true slope of creatinine and hemoglobin on time to graft rejection at time (time of transition from state 1 to state 2), when we used their prediction based on the multivariate longitudinal model. However, the cumulative effect of these longitudinal biomarkers, as well as the effect of their true values on the mentioned transition, was significant. Such results display that the choice of association structure (mentioned in Section 2), which is used in the multistate model, is of particular importance and may lead to conflicting results.

As, in practice, the association structure is unknown, we can compare different association structures and select the best one via various model selection criteria. The log-likelihood values of the multistate models can be used to select the best structure. The values of the log-likelihood for different structures (presented in Table 3) revealed that including the estimation of the cumulative effect of longitudinal biomarkers in the multistate model had the best fit to data. It should be noted that choosing the appropriate association structure between longitudinal and multistate model also depends on the clinical aspects. Moreover, we can use different association structures for each of the longitudinal outcomes in a single multistate model and even in each of the transitions of a unique model.

Based on the results of the presented innovative method, we found that creatinine, hemoglobin, and BUN had predictive power in ESRD patients who underwent renal transplantation. The results showed that, after adjustments for fixed covariates, the cumulative effects (the whole area under the curve of the longitudinal biomarkers trajectories) of creatinine (HR = 1.10; 95% CI: 1.01–1.18 and; ), hemoglobin (HR = 0.90; 95% CI: 0.84–0.98 and; ), and BUN (HR = 1.13; 95% CI: 1.02–1.24 and; ) rather than their measurements at a particular time affected the risk of graft rejection. Moreover, the results revealed that the cumulative effect of hemoglobin on the survival after rejection was significant (HR = 1.13; 95% CI: 1.02–1.24 and; ). The results also showed that an increase in age at the time of transplantation was associated with a lower risk of graft rejection (HR = 0.95; 95% CI: 0.91–0.99 and; ), and a higher risk of death after rejection (HR = 1.14; 95% CI: 1.03–1.26 and; ).

#### 6. Discussion and Conclusion

Traditional joint models for longitudinal and time-to-event data often consist of one longitudinal and one survival outcome [1–3]. However, in longitudinal health studies, a patient may experience a succession of clinical events instead of a single event and multiple longitudinal biomarkers may be measured over time. Here, we proposed a two-stage based model for joint modeling of multivariate longitudinal and multistate data. With the use of two-stage based modeling, we avoid the computational complexities of likelihood calculation in the full joint likelihood approaches [8, 16]. Furthermore, as we have used multivariate longitudinal model that incorporated the correlation between biomarkers at the first stage and a multistate model for the disease progression process at the second stage, we have introduced an extension of the nave two-stage method in which univariate models are used at both stages.

The proposed model enabled us to study the complex association structure between multiple longitudinal biomarkers and transition between different health states in our motivating renal transplantation dataset. The strategy of this proposed model was based on the specification of two separate submodels for multivariate longitudinal and multistate data. At the first stage, a multivariate linear mixed effects model proposed by Yücel [17] was used to model biomarkers trajectories. This model could handle the missing values which is a crucial issue in longitudinal data. Then, at the second stage, a multistate model with proportional hazards was used to model transition intensities incorporating the estimation of the random effects. In our case study, we assumed a Markov continuous multistate process as this assumption held for our renal data. However, a semi-Markov model, in which the hazard of entry into the next state depends on the sojourn time on the current state [18] or a non-Markov model [19], can be used simply instead. Moreover, as we did not have any information about graft retransplantation among patients who experienced graft rejection, we did not use a reversible multistate model, although this type of multistate model also can be used. So, the main advantage of the two-stage based proposed model is that it can be easily generalized to other types of longitudinal and survival models.

Guler et al. [8] have mentioned that the main limitation of two-stage methods is bias arising with the unignorable informative censoring in the longitudinal process which is caused by the survival process. Here, we compare joint and two-stage based models to see whether this bias is considerable. To achieve this aim, joint models for multistate and univariate longitudinal models (for each biomarker) are fitted and the results compare with two-stage based model. As presented in Appendix A (TableA.1, two approaches had similar likelihood values and the biases were minimal. Moreover, we showed that the impact of longitudinal biomarkers and the effects of covariates on the transition intensities between different states were the same in terms of the association direction and significance. On the other hand, the effect of incorporating the correlation between longitudinal biomarkers in the model was examined by comparing the use of univariate and multivariate prediction of biomarkers in the multistate model (Appendix A (Table A.2)). The results showed that the estimation of associations and their significant levels differ in two types of models. So, we can conclude that incorporating the correlation between longitudinal outcomes in the model is of particular importance. In fact, our proposed model is an extension of the two-stage model-based approach introduced by Guler et al. [8] for multivariate longitudinal and a single event of interest. Besides, we used multivariate linear mixed effects model to model the trajectories of longitudinal biomarkers, while they have used a pairwise approach as an alternative. Other features of our proposed model are similar to the model introduced by Guler et al., discussed in detail in [8].

In our case study, as it was expected, the level of creatinine and BUN can be used to monitoring post-transplant renal graft function. In fact, the levels of these biomarkers will increase in patients with a dysfunctional graft, and this increase is translated to a higher risk of rejection or mortality [20–23]. On the other hand, less hemoglobin level in the post-transplant period has been known to be associated with an increased risk of graft loss and mortality [24, 25] that was consistent with our findings. Moreover, some transplant medications are known to be associated with low levels of hemoglobin (anemia) [24], and the increased risk may be caused by these treatments. We showed that the prediction of measurements of investigated biomarkers at a fixed time point could be useful for monitoring the outcomes of transplantation. However, the ability of these measurements to predict the outcomes of interest is limited. Instead, here the cumulative effects of these biomarkers on the transplantations’ outcomes are demonstrated. So, it is advised that the measurements of these biomarkers should be recorded in such a way that the calculation of cumulative effects is possible.

#### Data Availability

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

#### Disclosure

This study has been adapted from the PhD thesis at the Hamadan University of Medical Sciences.

#### Conflicts of Interest

The authors declare that there are no conflicts of Interest.

#### Acknowledgments

The authors would like to thank the Vice-Chancellor of Research and Technology, Hamadan University of Medical Sciences, for the approval and support of the study. The study was funded by Vice-Chancellor for Research and Technology, Hamadan University of Medical Sciences (Grant no. 9812209742).

#### Supplementary Materials

Table A.1: results of comparison of joint models and two-stage based models for each longitudinal biomarker separately. Table A.2: results of comparison of two multistate models which included the estimation of the longitudinal biomarkers based on separated linear mixed effects models (Model 1) and a multivariate linear mixed effects model (Model 2) for longitudinal biomarkers.* (Supplementary Materials)*