BioMed Research International

Volume 2018, Article ID 7409284, 13 pages

https://doi.org/10.1155/2018/7409284

## A Proposed Approach for Joint Modeling of the Longitudinal and Time-To-Event Data in Heterogeneous Populations: An Application to HIV/AIDS’s Disease

Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran

Correspondence should be addressed to Seyyed Mohammad Taghi Ayatollahi; ayatolahim@sums.ac.ir

Received 13 July 2017; Revised 15 November 2017; Accepted 5 December 2017; Published 9 January 2018

Academic Editor: Momiao Xiong

Copyright © 2018 Narges Roustaei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In recent years, joint models have been widely used for modeling longitudinal and time-to-event data simultaneously. In this study, we proposed an approach (PA) to study the longitudinal and survival outcomes simultaneously in heterogeneous populations. PA relaxes the assumption of conditional independence (CI). We also compared PA with the joint latent class model (JLCM) and the separate approach (SA) for various sample sizes (150, 300, and 600) and different association parameters (0, 0.2, and 0.5). The average bias of parameter estimation (AB-PE), the average SE of parameter estimation (ASE-PE), and the coverage probability of the 95% confidence interval (CP) were compared among the three approaches. In most cases, as the sample size increased, AB-PE and ASE-PE decreased for the three approaches, and CP got closer to the nominal level of 0.95. When there was a considerable association, PA performed better than SA and JLCM in the sense that it had the smallest AB-PE and ASE-PE for the longitudinal submodel among the three approaches for the small and moderate sample sizes. Moreover, JLCM was preferable when there was no association and the sample size was large. Finally, the evaluated approaches were applied to a real HIV/AIDS dataset for validation, and the results were compared.

#### 1. Introduction

In many studies, the repeated measures of a biomarker are recorded together with time to an event of interest. For example, in HIV/AIDS studies, the trajectories of CD4 counts and time-to-death are collected. In such studies, the interest often lies in understanding the relationships between the longitudinal history of a process and its effect on the risk of an event [1–9].

Classical approaches such as separate analysis have been applied to these types of data; however, the association between the longitudinal and survival outcomes is then neglected, because the linear mixed model for the repeated measurements and the Cox model for the time-to-event outcome are fitted separately [6, 10, 11]. Other approaches do consider the dependency between the two outcomes: the extended Cox model incorporates the repeated measures as time-varying covariates [4]. In this method, time-varying covariates are assumed to be observed continuously until the study terminates. In practice, this assumption is usually not satisfied. Moreover, longitudinal biomarkers tend to be measured with error. Modeling the longitudinal measures with a mixed model accounts for this measurement error, whereas the extended Cox model neglects it, which leads to biased and inefficient estimates [4, 10, 12–14].

In recent years, joint models have been used to analyze the longitudinal and survival outcomes simultaneously and thereby account for the association between the two outcomes [1, 14, 15]. Joint models have some advantages over classical approaches such as the Cox and linear mixed models used alone, and they provide more powerful, accurate, efficient, and robust estimates [4, 10, 12, 16].

Most joint models allow subjects to follow only one pattern [5, 6, 13], and the baseline hazard is assumed to be the same for all subjects. Thus, they become inappropriate when there are subgroups with different patterns of response profiles [13].

The joint latent class model (JLCM) is a type of joint model that assumes the population of subjects to be heterogeneous and composed of several homogeneous subgroups. Each subgroup is known as a latent class (subpopulation, subtype, or subgroup) and has its own longitudinal trajectory and survival curve [2, 5, 6, 17].

Conditional independence (CI), a fundamental assumption of the JLCM, states that the entire association between the longitudinal and survival outcomes is captured by the latent class structure. Thus, given the latent classes, the two types of outcomes are independent [17–20]. However, the CI assumption may not sufficiently reflect the strength of the association and might underestimate the association between the longitudinal and survival processes [13]. Furthermore, to satisfy the CI assumption, JLCM has to be examined for various numbers of latent classes, which may ultimately lead to choosing an inappropriate and meaningless number of classes.

We designed a simulation study to evaluate a proposed approach (PA) that combines the joint model with the latent class framework for heterogeneous populations and is free from the CI assumption. First, the class membership of each subject was identified from the latent class framework for an appropriate number of latent classes. Then, for PA, the joint model for the longitudinal and survival processes was fitted separately in each latent class. For the separate approach (SA), the linear mixed model for the longitudinal data and the extended Cox model for the survival outcome were fitted separately in each latent class. Finally, we compared PA with JLCM and SA for various sample sizes and different association parameters, focusing on both the longitudinal and survival outcomes.

#### 2. Materials and Methods

##### 2.1. Models Framework

###### 2.1.1. Joint Latent Class Model (JLCM)

JLCM assumes that the subjects in each latent class have their own specific longitudinal trajectory and risk of the event, which is useful in many types of research with different patterns of the longitudinal and survival outcomes. In addition, JLCM can be fitted to normal, nonnormal, and ordinal outcomes [6, 21]. This model does not require the assumption of normally distributed random effects, which is unrealistic when the population consists of several subpopulations [22].

JLCM includes three components: the latent class membership submodel, the longitudinal submodel, and the survival submodel. Given the latent class, there is no association between the longitudinal and survival processes; consequently, the dependency between the time-to-event and longitudinal processes is captured entirely by the latent class structure [5]. Several methods have been introduced to evaluate the CI assumption: evaluation based on the posterior classification, analysis of the residuals conditional on the event, and a score test [19, 23, 24]. Among these approaches, the score test is the most powerful for assessing the CI assumption [2, 5].

In practice, JLCM is fitted with one to three latent classes; the appropriate number of latent classes is the one that gives the lowest Bayesian information criterion (BIC) while satisfying the CI assumption [6, 20].

Each subject is assigned to the latent class for which it has the highest class membership probability [25]. A subject that is assigned to the wrong class is said to be misclassified [13].
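The assignment rule and the resulting misclassification rate can be illustrated with a minimal Python sketch. The function names `assign_class` and `misclassification_rate` and the probability values are illustrative assumptions and are not taken from the original analysis.

```python
def assign_class(posterior_probs):
    """Return the 1-based index of the class with the highest membership probability."""
    best = max(range(len(posterior_probs)), key=lambda g: posterior_probs[g])
    return best + 1


def misclassification_rate(assigned, true_classes):
    """Proportion of subjects whose assigned class differs from the true class."""
    wrong = sum(1 for a, t in zip(assigned, true_classes) if a != t)
    return wrong / len(true_classes)


# Hypothetical membership probabilities for three subjects and two classes.
posteriors = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
assigned = [assign_class(p) for p in posteriors]
```

In a simulation the true classes are known, so the misclassification rate can be computed directly; in a real dataset only the assigned classes are available.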

###### 2.1.2. Separate Approach (SA)

Commonly, the linear mixed model is used for continuous longitudinal measurements, and parametric or semiparametric survival models are used for modeling the time-to-event data [11]. In SA, the probability that a subject belongs to a latent class is first modeled via the latent class framework. Next, within each latent class, the linear mixed model is fitted to the longitudinal measurements, and the extended Cox model, which incorporates the repeated measurements into the survival data, is fitted to the time-to-event outcome.

###### 2.1.3. Proposed Approach (PA)

We incorporated the latent class framework to identify the subgroups underlying the observed longitudinal measurements and survival outcome. PA provides a way to obtain an appropriate number of latent classes in heterogeneous populations without requiring the CI assumption. The appropriate number of latent classes is chosen according to which solution is most suitable and easiest to interpret in the researcher's judgment. For PA, each subject was allocated to the class for which it had the highest class membership probability. Then, a joint model was fitted in each class; in each latent class, the association between the longitudinal and time-to-event data was modeled by including the entire longitudinal trajectory as a covariate in the survival submodel.

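*(1) Latent Class Framework*. Let $c_i$ represent the latent class variable for subject $i$, with $G$ latent classes. The class membership probability for a subject belonging to latent class $g$ can be modeled via a multinomial logistic regression with covariate vector $X_{pi}$:

$$\pi_{ig} = P(c_i = g \mid X_{pi}) = \frac{\exp\left(\xi_{0g} + X_{pi}^{T}\xi_{1g}\right)}{\sum_{l=1}^{G}\exp\left(\xi_{0l} + X_{pi}^{T}\xi_{1l}\right)},$$

where $\xi_{0g}$ is the intercept for class $g$ and $\xi_{1g}$ is the vector of class-specific parameters associated with the set of covariates $X_{pi}$. To ensure identifiability, $\xi_{0G} = 0$ and $\xi_{1G} = 0$; that is, the last latent class serves as the reference [2, 5, 25].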
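As an illustration of the multinomial logistic membership model, the following Python sketch computes class membership probabilities with the last class as the reference (its intercept and slopes fixed at zero). The function name `class_membership_probs` and all numerical values are hypothetical.

```python
import math


def class_membership_probs(x, intercepts, slopes):
    """Multinomial logistic class probabilities with the last class as reference.

    x          : covariate values for one subject
    intercepts : one intercept per non-reference class
    slopes     : one coefficient vector per non-reference class
    """
    scores = [
        b0 + sum(b * xi for b, xi in zip(b1, x))
        for b0, b1 in zip(intercepts, slopes)
    ]
    scores.append(0.0)  # reference class: intercept and slopes are zero
    shift = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-class example with one binary and one continuous covariate.
probs = class_membership_probs([1.0, 0.3], intercepts=[0.2], slopes=[[-0.4, 1.0]])
```

With all parameters of the non-reference class set to zero, the two classes are equally likely, which matches a design in which about half of the subjects belong to each class.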

In the application, the parameters of the latent class framework are estimated by maximizing the log-likelihood function using the Expectation-Maximization (EM) algorithm with Newton-Raphson steps [26, 27].

*(2) Longitudinal Submodel*. The longitudinal submodel is specified as a class-specific linear mixed model. Let $N$ be the total number of subjects and let $n_i$ be the number of repeated measurements for subject $i$. Given latent class $c_i = g$, the longitudinal submodel can be written as

$$Y_{ij} \mid_{c_i = g} = Z_{ij}^{T} b_{ig} + X_{ij}^{T}\beta_g + \varepsilon_{ij}, \quad i = 1, \dots, N,\; j = 1, \dots, n_i,$$

where $Y_{ij}$ is the longitudinal outcome for subject $i$ at time $t_{ij}$; $Z_{ij}$ is the random-effects covariate vector at time $t_{ij}$ for subject $i$, associated with the $q$-vector of random effects $b_{ig}$; and $X_{ij}$ is the fixed-effects covariate vector at time $t_{ij}$, associated with the $p$-vector of fixed effects $\beta_g$. The random error term $\varepsilon_{ij}$ is usually assumed to be normally distributed.
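A minimal Python sketch of a class-specific linear mixed model with a random intercept is given below. It assumes a single time covariate and independent normal random intercept and error terms; the function name `simulate_trajectory`, the measurement times, and all parameter values are hypothetical.

```python
import random


def simulate_trajectory(times, beta0, beta1, sd_intercept, sd_error, rng):
    """Simulate one subject's repeated measurements from a random-intercept model.

    y_ij = beta0 + b_i + beta1 * t_ij + e_ij, with b_i and e_ij normally distributed.
    """
    b_i = rng.gauss(0.0, sd_intercept)  # subject-specific random intercept
    return [beta0 + b_i + beta1 * t + rng.gauss(0.0, sd_error) for t in times]


rng = random.Random(2018)
times = [0.5 * k for k in range(11)]  # hypothetical schedule with 11 measurements
# Hypothetical class-specific coefficients with opposite time trends in two classes.
y_class1 = simulate_trajectory(times, beta0=5.0, beta1=-0.5, sd_intercept=1.0, sd_error=1.0, rng=rng)
y_class2 = simulate_trajectory(times, beta0=3.0, beta1=0.5, sd_intercept=1.0, sd_error=1.0, rng=rng)
```

Giving the classes opposite slopes is one simple way to make the subpopulations well separated.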

*(3) Survival Submodel*. The survival submodel is specified as a Cox model or any parametric survival model. Given latent class $c_i = g$, the survival submodel is specified as

$$h_i(t \mid c_i = g) = h_{0g}(t)\exp\left\{W_i^{T}\gamma_g + \alpha_g m_{ig}(t)\right\},$$

where $h_{0g}(t)$ is the baseline hazard function for class $g$ and $W_i$ is the covariate vector associated with the $r$-vector of parameters $\gamma_g$ for latent class $g$.

The quantity $m_{ig}(t)$ is the trajectory function of the longitudinal process for class $g$, which connects the longitudinal process with the survival outcome. The parameter $\alpha_g$ links the longitudinal and time-to-event outcomes in each class.
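The way the trajectory enters the hazard can be sketched in Python as follows. This is an illustration under stated assumptions: the Weibull baseline hazard is parameterized as scale × shape × t^(shape − 1), which may differ from the parameterization used in the original software, and the function names, the trajectory, and the covariate values are hypothetical.

```python
import math


def weibull_baseline_hazard(t, shape, scale):
    """Assumed parameterization: h0(t) = scale * shape * t**(shape - 1), for t > 0."""
    return scale * shape * t ** (shape - 1.0)


def class_hazard(t, w, gamma, alpha, trajectory, shape, scale):
    """Class-specific hazard: baseline hazard times exp(w'gamma + alpha * m(t))."""
    linear_predictor = sum(g * wi for g, wi in zip(gamma, w))
    linear_predictor += alpha * trajectory(t)  # association with the longitudinal process
    return weibull_baseline_hazard(t, shape, scale) * math.exp(linear_predictor)


def trajectory(t):
    """Hypothetical error-free longitudinal trajectory for one subject."""
    return 3.0 + 0.5 * t


h_no_assoc = class_hazard(2.0, w=[1.0], gamma=[-0.5], alpha=0.0,
                          trajectory=trajectory, shape=1.0, scale=0.001)
h_assoc = class_hazard(2.0, w=[1.0], gamma=[-0.5], alpha=0.5,
                       trajectory=trajectory, shape=1.0, scale=0.001)
```

When the association parameter is zero, the hazard reduces to an ordinary proportional hazards model; otherwise the hazard at time t is multiplied by the exponential of the association parameter times the current trajectory value.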

##### 2.2. Simulation Studies

We conducted this simulation study to examine the bias, the SE, the average bias of parameter estimation (AB-PE), the average SE of parameter estimation (ASE-PE), and the coverage probability of the 95% confidence interval (CP) of three approaches (PA, JLCM, and SA) for the longitudinal and survival submodels. AB-PE is the average of the absolute biases of all parameter estimates. CP is the proportion of replications in which the confidence interval contains the true value.
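These summary measures can be computed from the replications as in the following Python sketch. It assumes Wald-type 95% intervals (estimate ± 1.96 × SE), which is an assumption about how the intervals were formed; the function names and the numbers are hypothetical.

```python
def summarize_replications(estimates, std_errors, true_value, z=1.96):
    """Bias, average SE, and coverage of Wald-type intervals for one parameter."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    average_se = sum(std_errors) / n
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return {"bias": bias, "ase": average_se, "cp": covered / n}


def average_absolute_bias(biases):
    """AB-PE: mean of the absolute biases over all parameters of a submodel."""
    return sum(abs(b) for b in biases) / len(biases)


# Hypothetical estimates of one parameter (true value 1.0) from three replications.
summary = summarize_replications([0.9, 1.1, 1.5], [0.1, 0.1, 0.1], true_value=1.0)
ab_pe = average_absolute_bias([summary["bias"], -0.05])
```

ASE-PE can be obtained in the same way by averaging the `ase` values over all parameters of a submodel.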

A multinomial logistic model was considered for the latent class membership for each subject: We considered a binary and a continuous covariate, where is called a treatment effect, which was assumed as a binomial distribution with and . We assumed two latent classes (), where approximately 50% of the subjects belonged to class 1.

The longitudinal outcome was generated from a linear mixed model, where time of measurements was fixed at with a maximum of 11 measurements. The longitudinal submodel given to each latent class isTo achieve appropriate heterogeneous classes and to decrease misclassification rate, we considered the parameters with opposite direction in two classes from a previous study [13]. Thus, in the first class, we set coefficients to be and assumed subject-specific unobservable heterogeneity in class 1, . The error term had normal standard distribution. In the second class, we set coefficients to be and assumed that , and where the random intercept effect was assumed independent from the error term. The is called trajectory function for each class.

The survival submodel assumed a Cox model with a Weibull baseline hazard function. The event time was generated using the inverse cumulative hazard function [15, 28, 29]. The censoring time is noninformative and is a uniformly distributed random variable on 2.5+ uniform . Therefore, the observed failure time for the th subject was taken as the minimum of the true event time and the censoring time [20, 30]. As in some previous studies, the censoring rate was set at around 60% in this simulation study [13, 28].
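The inverse cumulative hazard method can be sketched in Python for the simplest case, in which the linear predictor does not change over time (for example, no association with the longitudinal trajectory). The sketch assumes the cumulative baseline hazard scale × t^shape; the function names, the linear predictor, and the censoring time are hypothetical. With a time-varying trajectory in the hazard, the inversion generally has no closed form and would need a numerical root finder.

```python
import math
import random


def weibull_event_time(u, linear_predictor, shape, scale):
    """Solve H(T) = -log(u), where H(t) = scale * t**shape * exp(linear_predictor)."""
    return (-math.log(u) / (scale * math.exp(linear_predictor))) ** (1.0 / shape)


def observed_time(event_time, censor_time):
    """Return the observed time and the event indicator (1 = event, 0 = censored)."""
    return min(event_time, censor_time), int(event_time <= censor_time)


rng = random.Random(7)
u = 1.0 - rng.random()  # uniform draw on (0, 1], avoids log(0)
t_event = weibull_event_time(u, linear_predictor=-0.5, shape=0.6, scale=0.001)
t_obs, status = observed_time(t_event, censor_time=5.0)  # hypothetical censoring time
```

Because the survival function evaluated at the true event time is uniformly distributed, drawing a uniform value and inverting the cumulative hazard yields event times with the intended distribution.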

The survival submodel was generated for each latent class as follows:The treatment effect on the time-to-event was and −0.5 in classes 1 and 2, respectively. The shape and scale parameters, , of baseline hazard function were (0.6, 0.001) and (1, 0.001) in classes 1 and 2, respectively.

Simulated datasets were generated for three sample sizes (150, 300, and 600, representing small, moderate, and large sample sizes). Similar to a previous study, three association parameters between the longitudinal and survival outcomes (0, 0.2, and 0.5) were considered, representing no, moderate, and considerable association, respectively [12]. The magnitude of the association parameter was assumed to be the same in the two classes. For each simulated dataset, the three approaches (PA, SA, and JLCM) were fitted. We ran 1000 replications for each simulation setting.

There are several methods to estimate the parameters of joint models, including maximum likelihood (ML), restricted maximum likelihood (REML), and Bayesian methods [18]. In PA, the log likelihood of the joint distribution was approximated by Gauss-Hermite integration and maximized with EM iterations or quasi-Newton iterations. In JLCM, ML with the EM algorithm was used to estimate the parameters. For SA, ML in the longitudinal submodel and REML in the survival submodel were used for parameter estimation. The JM and lcmm packages in R version 3.1.1 were used in this study.

#### 3. Results of Simulation Study

##### 3.1. Effect of Sample Size

Simulation results showed that, in most cases and for all three approaches, AB-PE and ASE-PE decreased as the sample size increased, and CP moved closer to the nominal level of 0.95. Tables 1–3 and Figures 1 and 2 present detailed information.