Research Article | Open Access
Guogen Shan, Hua Zhang, Tao Jiang, "Correlation Coefficients for a Study with Repeated Measures", Computational and Mathematical Methods in Medicine, vol. 2020, Article ID 7398324, 11 pages, 2020. https://doi.org/10.1155/2020/7398324
Correlation Coefficients for a Study with Repeated Measures
Repeated measures are increasingly collected in a study to investigate the trajectory of measures over time. One of the first research questions is to determine the correlation between two measures. The following five methods for correlation calculation are compared: (1) Pearson correlation; (2) correlation of subject means; (3) partial correlation for subject effect; (4) partial correlation for visit effect; and (5) a mixed model approach. Pearson correlation coefficient is traditionally used in a cross-sectional study. Pearson correlation is close to the correlations computed from mixed-effects models that consider the correlation structure, but Pearson correlation may not be theoretically appropriate in a repeated-measure study as it ignores the correlation of the outcomes from multiple visits within the same subject. We compare these methods with regard to the average of correlation and the mean squared error. In general, correlation under the mixed-effects model with the compound symmetric structure is recommended as its correlation is close to the nominal level with small mean square error.
Repeated-measure designs are increasingly used in practice to evaluate the trajectory of measures. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) study is a longitudinal study to investigate the progression of Alzheimer’s disease (AD) [1, 2]. This study evaluates the normal cognitive aging with the focus on mild cognitive impairment (MCI) and early AD. Brain structure and function are two research areas of interest in the ADNI study. As expected, brain structure volumes are often highly associated with results from cognitive tests [3–5]. In a longitudinal study, correlation for repeated measures should be calculated and reported. However, recent articles still only reported the Pearson correlation coefficient that ignores the correlation of outcomes from the same subject. For these reasons, it is important to compare the existing correlations for repeated measures and make recommendations for other researchers to use.
Bland and Altman [6, 7] discussed several approaches to compute correlations for repeated measures. They proposed calculating subject means to compute the Pearson correlation, where subject means eliminate the correlation of outcomes from the same subject. The second approach is to fit a linear regression model with one measure as the dependent variable and the other measure and the subject as the predictor variables. The second approach is similar to the one proposed by Christensen  who suggested computing correlation after adjusting for the subject effect [9–12]. In a repeated-measure study, the visit effect is the correlation within the subject. Lipsitz et al.  proposed computing partial correlation adjusting the visit effect. When data are correlated, mixed-effects models may be utilized to analyze data while controlling for these additional correlations. Lam et al.  were among the first to propose computing correlation between repeated measures under the compound symmetric (CS) correlation structure. Later, Hamlett et al.  developed programs to compute correlation under the CS structure by using the commercially available statistical software, SAS. In the work by Lam et al. , they also computed the correlation under the autoregressive correlation structure, AR(1). After that, Roy  developed SAS macros to compute correlation under the AR(1) structure and compared the correlations for repeated measures under these two correlation structures with limited simulation studies.
The objective of this manuscript is to conduct extensive simulation studies to compare the existing correlations for repeated measures with regard to the average of correlation and the mean squared error (MSE) and identify the correlation method that has the best performance to be used in practice. In addition to the parameter of interest (correlation for repeated measures), there are several nuisance parameters in the variance-covariance matrix: variances, correlations within each outcome, and correlation between outcomes from different visits [17–20]. It is computationally intensive for these comparisons. We have to use supercomputers for simulation studies. However, it is computationally feasible to calculate correlations for an observed data set. We use one example from the ADNI study to illustrate the application of the considered methods to calculate correlation between hippocampal volumes and a neuropsychological assessment to evaluate verbal memory.
We organize this article as follows. In Section 2, we introduce the existing methods to calculate correlations for repeated measures. In Section 3, we conduct extensive Monte Carlo simulation studies to compare the performance of the considered correlations with regard to the average of correlation and the MSE. A real example from the ADNI study is then used to illustrate the application of these correlations. Lastly, we provide conclusions in Section 4 on computing correlation for repeated measures when heterogeneity of correlation is observed across visits.
For a repeated-measure study with n participants, each participant has several scheduled visits (mi visits for the i-th subject). Suppose U and are the two measures in a repeated-measure study and Uij and are the outcomes of the i-th subject at the j-th visit, where i = 1, 2, …, n and j = 1, 2, …, mi. The correlation between U and , , is the parameter of interest to quantify a relationship between them. Several methods have been proposed to calculate , including independence models, partial correlation models, and mixed-effects models.
2.1. Independent Assumption
Bland and Altman [6, 7] were among the first to provide methods to compute longitudinal correlation coefficient. One of their approaches assumes the independence between outcomes from the same subject: Uij ⊥ Uij′ and . The longitudinal correlation is computed as the Pearson correlation by ignoring the correlation structure from repeated measures. This approach is referred to as the I approach, with the computed correlation as . This is a naive approach that is easy to apply. Irimata and Li  found that for a pharmacokinetics data set is very close to other correlations computed from other complicated models.
2.2. Subject Means
As suggested by Bland and Altman , the correlation can be computed by using the averages at the subject level to eliminate the subject effect in repeated measures. This correlation is able to address the research question whether the average of one measure is related to the average of another. When correlation within each measure is large, at different visits should be similar to each other, and this average correlation model would have good performance. We refer to this correlation approach as the M approach with the notation of .
These two correlations for repeated measures, and , are the Pearson correlation and can be computed by using many statistical software: such as the Proc corr procedure in SAS and the function cor or cor.test in R . The next five correlations are computed from regression models (e.g., mixed-effects models), and we would like to suggest using SAS Proc mixed procedure for implementation. Detailed SAS programs are provided in the Appendix.
2.3. Correlation Adjusting for the Subject Effect
Christensen  proposed computing correlation for repeated measures by partialling out the subject effect. The subject effect can be removed from the two measures by fitting a multivariate regression model with both measures being the outcomes and the subject ID as the only covariate. The residuals are used to compute the final correlation, which is essentially a partial correlation method for repeated data. This correlation is referred to as the PS correlation that partials out the subject effect, .
2.4. Correlation Adjusting for the Visit Effect
In the calculation, the correlation between the two measures is included in the multivariate model. In addition to that correlation, another correlation between measures at different visits may be considered. Lipsitz et al.  proposed computing partial correlation between outcome and one of the covariates by using this approach. When one of the two measures (e.g., measure U) is considered as the dependent variable, the other measure () is considered as the covariate. The correlation structure between visits is assumed to be compound symmetric. We refer this correlation as the correlation. We use for another correlation when is considered as the dependent variable in the model. One of the properties for correlation is , but this property is not met here: is generally not equal to .
2.5. Mixed-Effects Model
Let be the outcomes from the i-th subject, with the vector length of 2mi. The complete data can be reorganized in a long format, with the columns subject ID, visit, mtype, and outcome, where mtype = “U” for the U measure and mtype = “” for the measure. The long format utilizes 2mi rows for the outcomes from Yi.
The linear mixed-effects model is presented aswhere Xi and Zi are the design matrices for the fixed effect and the random effect, respectively. The random effect bi follows a multivariate normal distribution N (0, D), and the measurement error ϵi follows a multivariate normal distribution N (0, Ri). The detailed formula for D and Ri may be found in the article by Hamlett et al. . The fixed effect is β = (β0, βU, βW)′, where β0 is the intercept, and βU and βW are the fixed effects of U and W, respectively. Correlation between U and is computed aswhich is assumed to be independent of the visit.
Each subject has multiple visits, correlation within U is , and the correlation within is , where d (j − j′) = 1 for the CS structure and d (j − j′) = |j − j′| for the AR(1) structure. Since is correlated with both Uij and , therefore, Uij and are correlated and their correlation is assumed to be δρUW, where δ is a factor which is generally less than 1. Let and be the variances of U and , respectively. These variances and covariances are used to derive the variance-covariance matrix under the CS structure (see Lam et al.  and Hamlett et al. ) and that under the AR(1) structure (see Lam et al.  and Roy ).
We conduct simulation studies to compare the performance of the considered 7 methods for the correlation between repeated measures for a study with four visits. The mean values of U and are assumed to be (2.0, 1.9, 1.7, 1.4) and (0.8, 0.7, 0.6, 0.5), with both measures decreasing as time goes. Such data are commonly available from cognitive tests on elderly population and other studies. The prespecified correlation for repeated measures is , 0.5, and 0.8.
In the simulation studies for the AR(1) structure for the visit effect, the correlation within U is , with , 0.5, and 0.8, and the correlation within is , with , 0.5, and 0.8. The factor δ in the correlation between Uij and is assumed to be 0.6 in all simulations. The considered variances are and 3 and and 1. The variance-covariance matrix can be separated into two parts: and Ri. We assume that a quarter of variance is from Ri and the remaining is from . This weight is needed in order to calculate the covariances. For each configuration, we simulate B = 2,000 data sets.
Under the AR(1) structure for the visit effect, Figure 1 presents the average of correlation and the MSE when , , and n = 60 subjects. The MSE is defined aswhere is the estimator of by using the b-th simulated data set. It can be seen that the correlations adjusting the visit effect, and , often underestimate the correlation, while the correlation adjusting the subject effect, , always overestimate the correlation. The remaining methods have correlations close to the nominal level. Although is the best with the correlation around the nominal level, its MSE is much larger than the ones that have the correlations close to the nominal level. In the calculation of , each subject only has one outcome for each measure, as compared to multiple outcomes in other correlation calculations. Due to the reduced number of outcomes, the variance of is much large that leads to a large MSE. It is noted that or could have the lowest MSE in some cases, but their estimated correlations are generally much below the nominal level. For this reason, we exclude and in the following simulation studies. When a study has the same number of visits for each subject, the estimated correlation by using the mixed-effects model with the CS structure, , is very similar to under the independent assumption. The other mixed-effects model correlation has a similar correlation as and . The MSE of is slightly smaller than the MSEs of and when the correlations within U or are small, and this trend is reversed when and are large. Similar results are observed when is increased to 3.
When is increased to 0.5 (the top plot in Figure 2), the averages of , , and are generally above the nominal level, and the first two correlations are closer to the nominal level as compared to the third correlation . We also present the correlation estimates when sample size n is 100 in Figure 2. It can be seen that the MSEs become smaller as compared to the MSEs in the top plot (Figure 2) when sample size is 60.
Figure 3 shows the results when data sets are simulated under the CS structure given , , and n = 60. Correlation does not perform well with the average correlations much below the nominal level in many configurations. We also found that is likely to overestimate the correlation. It seems that and have different trajectories as increases. Both of these methods do not have satisfactory performance with regard to correlation under the CS structure, although has very good correlation estimates under the AR(1) structure. The other three correlations (, , and ) have similar good performance with regard to correlation and the MSE. It should be noted that the variance-covariance matrix is not positively defined when . Therefore, data sets cannot be generated for that configuration. We also simulate data under the unstructured correlation structure and found that , , and are still the best correlation estimates.
The aforementioned simulations have data sets that each subject has the same number of visits. In practice, it is possible that the number of visits may not be exactly the same for all subjects. We assume the number of visits is either 2, 3, or 4. Each subject is randomly assigned to have 2, 3, or 4 visits with the same probability. We present the results with n = 60 in Figure 4 when variances are small ( and and 1) and large ( and and 30). The MSE of is slightly smaller than that of ρI, and their biggest difference occurs when both and are large. is more likely to overestimate the correlation. Although has the correlation very close to the nominal level, it has the largest MSE as compared to other correlations. When variance is large, and are the best correlations with the estimated correlations much closer to the nominal level as to the configurations with small variances. The mixed-effects model correlation performs slightly better than with regard to the average of correlation and the MSE.
We use one data set from the ADNI study to illustrate the application of the considered correlation methods, with 47 participants who had 5-year visits and completed imaging volumes and memory scores. Hippocampal volumes are found to be highly associated with the delayed recall scores from the Rey Auditory Verbal Learning Test (RAVLT delayed recall) . The RAVLT delayed recall has the possible integer score from 0 to 15, which is often used to assess verbal memory. The higher the score is, the better the memory is.
The computed correlations are presented in Table 1. Participants in this data set have the same number of visits. For this reason, is very similar to . is slightly larger than them, and is smaller than them. Correlation adjusted by the subject effect is much smaller than . Correlations adjusted by the visit effect highly depend on which variable is considered as the dependent variable in the linear regression model. When hippocampal volumes are used as the dependent variable, the estimated correlation is high (0.686), and it becomes too low (0.016) when RAVLT delayed recalls are considered as the dependent variable.
It was reported by Wang et al.  that the Pearson correlation between hippocampal volumes and RAVLT delayed recall scores is slightly above 0.4. They also provided the Pearson correlations for each group (AD, MCI, and control) which are all below the correlation using combined samples. The correlation within the control group is the lowest.
From Table 1, RAVLT delayed recall scores always have a larger correlation with left hippocampal volumes than the correlation with right hippocampal volumes for each correlation method. We also add RAVLT immediate recall scores to further illustrate the application of the considered methods. Its correlation with left hippocampal volumes is often larger than its correlation with right hippocampal volumes. The estimated between hippocampal volumes and RAVLT delayed recalls is larger than that between hippocampal volumes and RAVLT immediate recalls.
From the simulation studies, under the independence assumption and using the mixed-effects model with CS variance-covariance structure are shown to have similar correlation estimates when subjects have the same number of visits. But, is appropriate as it models the data properly. The mixed-effects model correlation is recommended for use as its correlation is close to the nominal level with small mean square error.
Lam et al.  derived the detailed variance and covariance. The variances and and covariance are used to calculate . These variances and covariance estimates are not exactly the same from the independent model and the mixed-effects model with the CS structure: in the calculation and 16.6136 from the CS model. Because these estimated variances and covariance are very close between these two methods, the final estimated correlations are very similar. When a study has different number of follow-up for each participant, ρI and differ from each other [18, 24–26]. For a study with some possible outliers as seen in the data testing association between pH and PaCO2 , their difference is substantial. We provide the SAS programs by using that example in the Appendix.
When CS or AR(1) correlation structure for the visit effect is applied in the mixed-effects models [10, 25, 27], the computed correlation is the same at different visits. In the observation of the heterogeneity of correlations at different visits, the unstructured correlation may be considered for the visit effect. Under the heterogeneity assumption, correlation can be computed at each visit from a mixed-effects model [28–30]. This brings some challenges to explain the results, such as the overall correlation and the trend of correlation. Alternatively, when a study has a monotonic relationship between correlation and visit, one may include an additional predictor: visit, in the statistical model, to calculate a monotonic correlation for repeated measures.
The data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pd.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI was funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study was coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data were disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Shan’s research was partially supported by grants from the National Institute of General Medical Sciences from the National Institutes of Health: P20GM109025. Jiang’s work was supported by the National Natural Foundation of China under grant 11971433, and the First Class Discipline of Zhejiang –A (Zhejiang Gongshang University‐Statistics).
- E. Bayram, N. Kaplan, G. Shan, and J. Z. K. Caldwell, “The longitudinal associations between cognition, mood and striatal dopaminergic binding in Parkinson’s disease,” Aging, Neuropsychology, and Cognition, pp. 1–14, 2019.
- C. Bernick, H. Zetterberg, G. Shan, S. Banks, and K. Blennow, “Longitudinal performance of plasma neurofilament light and tau in professional fighters: the professional fighters brain health study,” Journal of Neurotrauma, vol. 35, no. 20, pp. 2351–2356, 2018.
- G. Shan, “More efficient unconditional tests for exchangeable binary data with equal cluster sizes,” Statistics & Probability Letters, vol. 83, no. 2, pp. 644–649, 2013.
- G. Shan and C. Ma, “Exact methods for testing the equality of proportions for binary clustered data from otolaryngologic studies,” Statistics in Biopharmaceutical Research, vol. 6, no. 1, pp. 115–122, 2014.
- G. Shan and G. E. Wilding, “Powerful exact unconditional tests for agreement between two raters with binary endpoints,” PLoS One, vol. 9, no. 5, Article ID e97386, 2014.
- J. M. Bland and D. G. Altman, “Calculating correlation coefficients with repeated observations: part 1–correlation within subjects,” BMJ, vol. 310, no. 6977, p. 446, 1995.
- J. M. Bland and D. G. Altman, “Statistics notes: calculating correlation coefficients with repeated observations: part 2—correlation between subjects,” BMJ, vol. 310, no. 6980, p. 633, 1995.
- R. Christensen, Plane Answers to Complex Questions: The Theory of Linear Models, Springer, New York, NY, USA, 1996.
- S. Litwin, B. Stanley, and E. A. Ross, “Two-sample binary phase 2 trials with low type I error and low sample size,” Statistics in Medicine, vol. 36, no. 9, 2017.
- G. Shan, Exact Statistical Inference for Categorical Data, Academic Press, San Diego, CA, USA, 1st edition, 2015.
- G. Shan, G. E. Wilding, A. D. Hutson, and S. Gerstenberger, “Optimal adaptive two-stage designs for early phase II clinical trials,” Statistics in Medicine, vol. 35, no. 8, pp. 1257–1266, 2016.
- G. Shan, H. Zhang, T. Jiang, H. Peterson, D. Young, and C. Ma, “Exact values for Simon’s two-stage designs in clinical trials,” Statistics in Biosciences, vol. 8, no. 2, pp. 351–357, 2016.
- S. R. Lipsitz, T. Leong, J. Ibrahim, and S. Lipshultz, “A partial correlation coefficient and coefficient of determination for multivariate normal repeated measures data,” Journal of the Royal Statistical Society: Series D, vol. 50, no. 1, pp. 87–95, 2001.
- M. Lam, K. A. Webb, and Donnell, “Correlation between two variables in repeated measures,” in Proceedings of the American Statistical Association Biometrics Section, pp. 213–218, Alexandria, VA, USA, 1999.
- A. Hamlett, L. Ryan, and R. Wolfinger, “On the use of PROC MIXED to estimate correlation in the presence of repeated measures,” Proc Statistics and Data Analysis, pp. 129–198, 2004.
- A. Roy, “Estimating correlation coefficient between two variables with repeated observations using mixed effects model,” Biometrical Journal, vol. 48, no. 2, pp. 286–301, 2006.
- G. Shan, “A better confidence interval for the sensitivity at a fixed level of specificity for diagnostic tests with continuous endpoints,” Statistical Methods in Medical Research, vol. 26, no. 1, pp. 268–279, 2017.
- G. Shan, “Exact confidence limits for the probability of response in two-stage designs,” Statistics, vol. 52, no. 5, pp. 1086–1095, 2018.
- G. Shan and J. J. Chen, “Optimal inference for Simon's two-stage design with over or under enrollment at the second stage,” Communications in Statistics—Simulation and Computation, vol. 47, no. 4, pp. 1157–1167, 2018.
- G. Shan and S. Gerstenberger, “Fisher’s exact approach for post hoc analysis of a chi-squared test,” PLoS One, vol. 12, no. 12, Article ID e0188709, 2017.
- K. Irimata, W. Paul, and X. Li, Estimation of Correlation Coefficient in Data with Repeated Measures, SAS Global, Washington, DC, USA, 2018.
- G. Shan and W. Wang, “ExactCIdiff: an R package for computing exact confidence intervals for the difference of two proportions,” The R Journal, vol. 5, no. 2, pp. 62–71, 2013.
- X. Wang, W. Zhou, T. Ye, X. Lin, and J. Zhang, “Alzheimer’s disease neuroimaging initiative. The relationship between hippocampal volumes and delayed recall is modified by APOE ϵ4 in mild cognitive impairment,” Frontiers in Aging Neuroscience, vol. 11, p. 36, 2019.
- G. Shan, “Exact confidence limits for the response rate in two-stage designs with over- or under-enrollment in the second stage,” Statistical Methods in Medical Research, vol. 27, no. 4, pp. 1045–1055, 2018.
- G. Shan, S. Banks, J. B. Miller et al., “Statistical advances in clinical trials and clinical research,” Alzheimer's & Dementia: Translational Research & Clinical Interventions, vol. 4, no. 1, pp. 366–371, 2018.
- G. Shan, H. Zhang, and T. Jiang, “Adaptive two-stage optimal designs for Phase II clinical studies that allow early futility stopping,” Sequential Analysis, vol. 38, 2019, In press.
- G. Shan and A. Hutson, Tests for Comparing Two Ordered Multinomials, Wiley StatsRef: Statistics Reference Online, Hoboken, NJ, USA, 2018.
- G. Shan, C. Dodge-Francis, and G. E. Wilding, “Exact unconditional tests for dichotomous data when comparing multiple treatments with a single control,” Therapeutic Innovation & Regulatory Science, vol. 54, Article ID 216847901881469, pp. 411–417, 2020.
- G. Shan and W. Wang, “Exact one-sided confidence limits for Cohen's kappa as a measurement of agreement,” Statistical Methods in Medical Research, vol. 26, no. 2, pp. 615–632, 2017.
- G. Shan, H. Zhang, and T. Jiang, “Minimax and admissible adaptive two-stage designs in phase II clinical trials,” BMC Medical Research Methodology, vol. 16, no. 1, p. 90, 2016.
Copyright © 2020 Guogen Shan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.