Research Article  Open Access
Testing the Correlation and Heterogeneity for Hierarchical Nonlinear Mixed-Effects Models
Abstract
Nonlinear mixed-effects models are useful for analyzing repeated-measures data and have received considerable attention. It is often of interest to test for correlation within clusters and heterogeneity across clusters. In this paper, we address these problems by proposing a class of score tests for the null hypothesis that all components of the within- and between-subject variance are zero in a hierarchical nonlinear mixed-effects model, and we study the asymptotic properties of the proposed tests. The finite-sample performance of the tests is examined through simulation studies, and an illustrative example is presented.
1. Introduction
Repeated-measures data arise frequently in many areas of investigation, such as economics and pharmacokinetics. In longitudinal studies, for instance, observations on the same subject are usually made at different times. Analysis of such data must account for within-cluster correlation and between-subject heterogeneity. Random-effects models are commonly used for analyzing clustered and repeated-measures data, and linear mixed-effects models undoubtedly play an important role in this context; see, for example, Laird and Ware [1]. However, many repeated-measures data, such as growth, dose-response, and pharmacokinetic data, are inherently nonlinear in the response regression function. Several nonlinear mixed-effects models and various inference procedures have been proposed [2–5]. For the models considered in this literature, an issue of interest is whether there is correlation within clusters and/or heterogeneity between subjects. It is well known that model misspecification can seriously affect statistical inference. Two common assumptions in regression models are homogeneity and independence, and violating either can reduce the efficiency of the estimators, so it is important to check both assumptions whenever possible. For repeated-measures and clustered data, if one can establish that there is neither heterogeneity nor correlation induced by random effects, then a simpler model can be used to fit the data and efficient statistical inference can be obtained.
However, if heterogeneity and correlation are present among the outcomes but go undetected, the model parameters may be over- or underestimated, the precision and reliability of statistical inference are affected, and erroneous conclusions may even be drawn. It is therefore important to test for heteroscedasticity and correlation among outcomes in nonlinear mixed-effects models for repeated-measures data.
For testing explainable heterogeneity of variance, Cook and Weisberg [6] and Simonoff and Tsai [7] investigated the score test in classical linear models, and Cai et al. [8], Eubank and Thomas [9], and Oyet and Sutradhar [10] considered related problems in nonparametric regression models. A number of tests for unexplainable heterogeneity of variance have also been proposed. Liang [11] applied a score test for homogeneity across many strata. Jacqmin-Gadda and Commenges [12] extended Liang's work to canonical generalized linear models (GLMs) with random effects and to generalized estimating equations (GEEs) with a single correlation parameter. Lin [13] developed a unified theory for testing correlation and heterogeneity in GLMs with random effects, using the Laplace expansion of the integrated log-quasi-likelihood. Zhu and Fung [14] extended Lin's work [13] to semiparametric mixed models. Zhang and Weiss [15] divided the heterogeneity of variance into explainable and unexplainable components and considered testing for explainable heterogeneity. To our knowledge, however, such tests have not been developed for nonlinear mixed-effects models with repeated-measures data. Our aim here is to develop a class of score tests for correlation within clusters and heterogeneity across subjects in a nonlinear mixed-effects model.
Following Lin [13], we use the Laplace expansion to derive a class of score tests of homogeneity and correlation for hierarchical nonlinear mixed-effects models. An advantage of the score test is that it does not require specifying the joint distribution of the random effects, and it requires fitting the model only under the null hypothesis; thus the test is easy to carry out. We give the asymptotic distribution of the test statistic under the null hypothesis and examine its finite-sample performance through a Monte Carlo simulation study. The test performs satisfactorily in terms of both size and power, even for moderate sample sizes. The rest of this paper is organized as follows. In Section 2, the nonlinear mixed-effects model is introduced. In Section 3, we derive score test statistics for correlation within clusters and heterogeneity across subjects and give their asymptotic distributions under the null hypothesis. In Section 4, we examine the finite-sample performance of these tests through a Monte Carlo simulation study and analyze a real example.
2. Nonlinear Mixed-Effects Model
In this paper, we consider the following nonlinear mixed-effects model proposed by Pinheiro and Bates [5]. In the first stage, the $j$th observation on the $i$th subject is modeled as
$$y_{ij} = f(\phi_i, x_{ij}) + \varepsilon_{ij}, \quad i = 1, \dots, m, \; j = 1, \dots, n_i, \tag{2.1}$$
where $f$ is a twice-differentiable smooth nonlinear function of a subject-specific parameter vector $\phi_i$ and the predictor $x_{ij}$, $\varepsilon_{ij}$ is a normally distributed noise term, $m$ is the total number of subjects, $n_i$ is the number of observations on the $i$th subject, and $N = \sum_{i=1}^{m} n_i$ is the total number of observations. In the second stage, the subject-specific vector is modeled as
$$\phi_i = A_i \beta + B_i b_i, \tag{2.2}$$
where $\beta$ is a vector of unknown fixed-effects parameters, $b_i$, $i = 1, \dots, m$, are independent random effects associated with the $i$th subject, and $A_i$ and $B_i$ are design matrices for the fixed and random effects, respectively. The random effects account for the correlation within the same cluster. We further make the following distributional assumptions. The errors $\varepsilon_{ij}$ are independently normally distributed with mean zero and a variance that depends, through a twice-differentiable function, on known design vectors $z_{ij}$ and a vector $\lambda$ of unknown parameters; the error variance is homogeneous if and only if $\lambda = 0$, and the errors are independent of the random effects. Let $b$ denote the vector obtained by stacking the cluster-specific random effects $b_i$, and assume that $b$ is generated from some distribution with mean zero and covariance matrix $D(\theta)$, where $\theta$ is a $q$-vector of unknown variance components. The magnitude of $\theta$ measures the degree of correlation and heterogeneity of the cluster-specific response vector within each subject. We postulate that each component of $D(\theta)$ is a function of $\theta$ such that $D(\theta) = 0$ if $\theta = 0$. We further assume that the third- and higher-order moments of the random effects are of order $o(\|\theta\|)$. These conditions are consistent with Lin [13], Hall and Praestgaard [16], and Zhu and Fung [14]. This hierarchical model generalizes, in some respects, both the linear mixed-effects model of Laird and Ware [1] and the usual nonlinear model for independent data of Bates and Watts [17].
It should be pointed out that the random errors in (2.1) need not be normally distributed in general. Nevertheless, in this paper we assume them to be normal, mainly to keep the computations and mathematical derivations relatively simple. As shown below, the score test statistics are derived from the complete-data log-likelihood, which requires specifying the distribution of the random errors, and this specification directly affects the complexity of the computations and derivations. In nonlinear mixed-effects models, the random effects enter the model nonlinearly, which makes likelihood analysis more difficult than for linear mixed-effects models; even with normal errors, computing the score test is cumbersome, let alone with other error distributions. In addition, although the score test itself does not rely on normality of the random errors, convenient properties of the normal distribution help us obtain explicit test statistics. We therefore assume normally distributed random errors.
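To make the two-stage structure of (2.1) and (2.2) concrete, the following Python sketch simulates data from a model of this form. Everything specific in it is a hypothetical choice made for illustration, not the specification used in this paper: the exponential-decay mean function f(φ, x) = φ1·exp(−φ2·x), identity design matrices, independent normal random effects with variances `theta`, and the error-variance function `sigma2 * exp(lam * z)`, which is constant exactly when `lam = 0`. The function names `f` and `simulate` are likewise our own.

```python
import math
import random

def f(phi, x):
    """Hypothetical nonlinear mean function: exponential decay phi1 * exp(-phi2 * x)."""
    return phi[0] * math.exp(-phi[1] * x)

def simulate(m, n, beta, theta, sigma2, lam, seed=0):
    """Simulate y_ij = f(phi_i, x_ij) + e_ij with phi_i = beta + b_i.

    Illustrative assumptions only: A_i = B_i = identity, b_i has independent
    normal components with variances theta, and Var(e_ij) = sigma2 * exp(lam * z_ij).
    """
    rng = random.Random(seed)
    data = []  # list of (subject i, x_ij, z_ij, y_ij)
    for i in range(m):
        # Second stage: subject-specific parameters phi_i = beta + b_i.
        b = [rng.gauss(0.0, math.sqrt(t)) for t in theta]
        phi = [bk + bb for bk, bb in zip(beta, b)]
        for j in range(n):
            x = (j + 1) / n          # design points in (0, 1]
            z = rng.uniform(-1, 1)   # covariate entering the error variance
            sd = math.sqrt(sigma2 * math.exp(lam * z))
            # First stage: nonlinear mean plus heteroscedastic normal noise.
            data.append((i, x, z, f(phi, x) + rng.gauss(0.0, sd)))
    return data

data = simulate(m=20, n=5, beta=[2.0, 1.0], theta=[0.1, 0.05], sigma2=0.25, lam=0.5)
print(len(data))  # 100 observations in total (N = m * n)
```

Setting `theta` to zeros removes the random effects, and setting `lam = 0` makes the errors homoscedastic, which corresponds to the situation described by the null hypotheses studied below.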
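Under models (2.1) and (2.2), the observations may exhibit correlation within clusters, heterogeneity among observations, or both. First, we investigate whether observations on the same subject are correlated; this is equivalent to testing whether the variance components of the random effects are zero. Thus, correlation within clusters in the nonlinear mixed-effects model can be tested through the null hypothesis
$$H_0: \theta = 0, \tag{2.3}$$
where $\theta$ is the vector of variance components of the random effects. Second, we study whether heterogeneity of between-subject variance and correlation within clusters are present simultaneously. To address this problem, we use the composite hypothesis
$$H_0: \theta = 0 \ \text{and} \ \lambda = 0, \tag{2.4}$$
where $\lambda$ is the parameter vector of the error-variance function, so that $\lambda = 0$ corresponds to homogeneous error variance.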
Testing for random effects in nonlinear mixed-effects models has been discussed in the literature (e.g., Jacqmin-Gadda and Commenges [12], Hall and Praestgaard [16], and Zhu and Fung [14]). However, the models investigated there are restricted to additive nonlinear mixed-effects models, in which the random effects are added to a nonlinear function or the random noise is i.i.d. Moreover, these works generally studied hypothesis (2.3), that is, whether correlation exists within clusters. In this paper, we consider testing for random effects under considerably more general conditions for hierarchical nonlinear mixed-effects models: we study not only correlation within clusters but also, simultaneously, heterogeneity of between-subject variance.
To derive score tests for the null hypotheses (2.3) and (2.4), we first study the properties of the log-likelihood of models (2.1) and (2.2); throughout, the vector obtained by stacking the cluster-specific entries of a quantity is denoted by the same symbol. For given random effects, the conditional log-likelihood of the nonlinear mixed-effects model and its first and second derivatives with respect to the random effects can be written in terms of the first and second partial derivatives of the nonlinear function with respect to the subject-specific parameters.
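Under normal errors, the conditional log-likelihood of cluster i given its random effects has the familiar Gaussian form −½ Σ_j [log(2π v_ij) + (y_ij − f_ij)²/v_ij], where v_ij denotes the error variance (our notation), and its gradient with respect to the random effects is, by the chain rule through the second-stage model, the transposed random-effects design matrix times Σ_j (y_ij − f_ij)/v_ij · ∂f_ij/∂φ. The sketch below checks this gradient against central finite differences. The mean function, the identity design matrices, the data values, and all function names are hypothetical.

```python
import math

def mean_and_grad(phi, x):
    """Hypothetical mean f = phi1*exp(-phi2*x) and its gradient with respect to phi."""
    e = math.exp(-phi[1] * x)
    return phi[0] * e, [e, -phi[0] * x * e]

def cond_loglik(b, beta, xs, ys, variances):
    """Conditional log-likelihood of one cluster given b, with phi = beta + b (A = B = I)."""
    phi = [bt + bb for bt, bb in zip(beta, b)]
    ll = 0.0
    for x, y, v in zip(xs, ys, variances):
        mu, _ = mean_and_grad(phi, x)
        ll -= 0.5 * (math.log(2 * math.pi * v) + (y - mu) ** 2 / v)
    return ll

def cond_score_b(b, beta, xs, ys, variances):
    """Analytic gradient: sum_j (y_j - f_j) / v_j * df_j/dphi (premultiplied by B^T = I)."""
    phi = [bt + bb for bt, bb in zip(beta, b)]
    g = [0.0, 0.0]
    for x, y, v in zip(xs, ys, variances):
        mu, d = mean_and_grad(phi, x)
        for k in range(2):
            g[k] += (y - mu) / v * d[k]
    return g

def numeric_grad(fun, b, h=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    out = []
    for k in range(len(b)):
        bp, bm = list(b), list(b)
        bp[k] += h
        bm[k] -= h
        out.append((fun(bp) - fun(bm)) / (2 * h))
    return out

xs, ys, vs = [0.2, 0.5, 1.0], [1.7, 1.3, 0.8], [0.25, 0.3, 0.2]
beta, b0 = [2.0, 1.0], [0.0, 0.0]
analytic = cond_score_b(b0, beta, xs, ys, vs)
numeric = numeric_grad(lambda b: cond_loglik(b, beta, xs, ys, vs), b0)
print(analytic, numeric)  # the two gradients agree up to finite-difference error
```

The gradient is evaluated at zero random effects because that is where the expansion leading to the score tests is taken.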
3. Score Test for Correlation and Heterogeneity within Clusters
In this section, we first use the Laplace expansion to develop a score test for the null hypothesis (2.3) under models (2.1) and (2.2), which corresponds to no correlation within clusters. Then, using the same approach, we obtain a score test statistic for the composite null hypothesis (2.4), which corresponds to no correlation within clusters and no heterogeneity of between-subject variance.
Consider the response vector of a cluster following models (2.1) and (2.2); its likelihood function (3.1) is obtained by integrating the conditional likelihood over the distribution of the random effects. Using the moment assumptions on the random effects and a Laplace expansion similar to those of Lin [13] and Hall and Praestgaard [16], we expand the integrated likelihood (3.1) and obtain the marginal log-likelihood (3.2). Note that (3.2) mimics the Laplace-type expansion of Solomon and Cox [18], which many authors have extended to a variety of models; see Lin [13] for an example. It is worth pointing out that (3.2) is a second-order expansion of the marginal distribution of the response about zero variance components. For models (2.1) and (2.2), some calculations based on the expansion (3.2) give the efficient score (3.3) under the null hypothesis, evaluated at the maximum likelihood estimator of the nuisance parameters under the null hypothesis.
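The small-variance expansion behind (3.2) can be checked numerically in a toy case. In the form used by Lin [13] (as we recall it), the integrated log-likelihood of a cluster is approximated about θ = 0 by ℓ(b = 0) + ½ tr{D(θ)[∂²ℓ/∂b∂bᵀ + (∂ℓ/∂b)(∂ℓ/∂b)ᵀ]}, evaluated at b = 0, so that the score for θ_k at θ = 0 is ½ tr{(∂D/∂θ_k)[∂²ℓ/∂b∂bᵀ + (∂ℓ/∂b)(∂ℓ/∂b)ᵀ]}. The sketch below is not the paper's model: it assumes a scalar random intercept with y | b ~ N(b, 1) and, only so that an exact answer is available, b ~ N(0, θ), giving y ~ N(0, 1 + θ). The approximation error should then shrink like θ².

```python
import math

def exact_marginal_loglik(y, theta):
    """Exact log-density of y when y | b ~ N(b, 1) and b ~ N(0, theta), i.e. y ~ N(0, 1 + theta)."""
    v = 1.0 + theta
    return -0.5 * math.log(2 * math.pi * v) - y * y / (2 * v)

def expanded_loglik(y, theta):
    """Expansion about theta = 0: l(b=0) + 0.5 * theta * (d2l/db2 + (dl/db)**2) at b = 0.

    Here l(b) = log N(y; b, 1), so dl/db = y and d2l/db2 = -1 at b = 0.
    """
    l0 = -0.5 * math.log(2 * math.pi) - y * y / 2
    return l0 + 0.5 * theta * (-1.0 + y * y)

y = 1.5
for theta in (1e-1, 1e-2, 1e-3):
    err = exact_marginal_loglik(y, theta) - expanded_loglik(y, theta)
    print(theta, err)  # the error shrinks roughly like theta**2
```

Only the first two moments of b enter the expansion, which is why the resulting score test does not need the full random-effects distribution.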
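To test (2.3), we construct the score statistic
$$ST = U_\theta^\top \tilde{I}_{\theta\theta}^{-1} U_\theta, \tag{3.4}$$
where $U_\theta$ is the efficient score for the variance components $\theta$ and $\tilde{I}_{\theta\theta}$ is the efficient information matrix of $\theta$, both evaluated under (2.3). Here,
$$\tilde{I}_{\theta\theta} = I_{\theta\theta} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta},$$
where $I_{\theta\theta}$, $I_{\theta\gamma} = I_{\gamma\theta}^\top$, and $I_{\gamma\gamma}$ are the blocks of the Fisher information matrix for $\theta$ and the nuisance parameters $\gamma$, and the scores and expectations are all calculated under (2.3). Using properties of the normal distribution, after some computations we obtain explicit expressions (3.6) for the entries of these blocks. The detailed derivation of (3.6) is given in Appendix A; the others are similar and omitted.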
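Computationally, (3.4) reduces to two linear-algebra steps, a Schur complement for the efficient information and a quadratic form, followed by a chi-square tail probability. The Python sketch below uses only the standard library; the function names, the toy information blocks, and the series-based chi-square tail (adequate for moderate arguments, not a production routine) are our own illustrative choices. In practice, the score vector and information blocks would come from the expressions in (3.6).

```python
import math

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def efficient_information(I_tt, I_tg, I_gg):
    """Schur complement I_tt - I_tg I_gg^{-1} I_gt, with I_gt = I_tg^T."""
    q, p = len(I_tt), len(I_gg)
    # Column s of I_gg^{-1} I_gt solves I_gg x = (row s of I_tg).
    cols = [solve(I_gg, I_tg[s]) for s in range(q)]
    return [[I_tt[r][s] - sum(I_tg[r][k] * cols[s][k] for k in range(p))
             for s in range(q)] for r in range(q)]

def score_statistic(U, I_eff):
    """ST = U^T I_eff^{-1} U, computed by solving I_eff w = U."""
    w = solve(I_eff, U)
    return sum(u * wi for u, wi in zip(U, w))

def chi2_sf(x, df, max_terms=10000, tol=1e-15):
    """P(chi2_df > x) via the series for the regularized lower gamma P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term, total = 1.0, 1.0
    for n in range(1, max_terms):
        term *= t / (a + n)
        total += term
        if term < tol * total:
            break
    lower = math.exp(a * math.log(t) - t - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

I_eff = efficient_information([[2.0]], [[1.0]], [[2.0]])
st = score_statistic([3.0], I_eff)
print(I_eff, st, chi2_sf(st, df=1))  # [[1.5]] 6.0 and a p-value of about 0.014
```

Solving linear systems rather than forming explicit inverses is a common numerical choice; with a numerical library one would use its dedicated solver and chi-square distribution functions instead.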
An important feature of the proposed score test statistic is that a detailed specification of the distribution of the random effects is not necessary. The test is therefore robust against arbitrary mixed-model alternatives in which only the first two moments of the random effects are specified. The following theorem gives the asymptotic properties of the proposed statistic; here, “asymptotic” refers to the number of clusters tending to infinity with bounded cluster sizes in models (2.1) and (2.2).
Theorem 3.1. For models (2.1) and (2.2), under the regularity conditions in Appendix B, when the null hypothesis (2.3) is true, the score test statistic ST is asymptotically chi-square distributed, with degrees of freedom equal to the dimension of the variance-component vector.
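Now consider testing the composite hypothesis (2.4) to detect whether heterogeneity of between-subject variance and correlation within clusters are present simultaneously. Let $\eta$ denote the vector formed by the variance components of the random effects and the parameters of the error-variance function, let $\gamma$ denote the remaining (nuisance) parameters, and let $\hat\gamma$ be the maximizer of the log-likelihood (3.2) under (2.4). The Fisher information matrix can be obtained by differentiating (3.2) twice, taking the expectation under (2.4), and evaluating at $\hat\gamma$. Partitioning it as
$$I = \begin{pmatrix} I_{\eta\eta} & I_{\eta\gamma} \\ I_{\gamma\eta} & I_{\gamma\gamma} \end{pmatrix},$$
where the blocks and the score $U_\eta$ for $\eta$ are all evaluated under (2.4), we construct, analogously to (3.4), the score statistic
$$U_\eta^\top \tilde{I}_{\eta\eta}^{-1} U_\eta, \qquad \tilde{I}_{\eta\eta} = I_{\eta\eta} - I_{\eta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\eta},$$
where $\tilde{I}_{\eta\eta}$ is the efficient information matrix of $\eta$. Its explicit form can be derived in the same way as that of ST; to save space, the details are omitted. The asymptotic properties of this score test statistic are given as follows.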
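Corollary 3.2. For models (2.1) and (2.2), under some regularity conditions, when the null hypothesis (2.4) is true, the score test statistic for (2.4) is asymptotically chi-square distributed, with degrees of freedom equal to the total number of parameters set to zero under (2.4), that is, the dimension of the variance-component vector plus that of the variance-function parameter vector.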
Remark 3.3. Obtaining the MLEs of the parameters under the null hypothesis is crucial for the score test. To obtain the MLEs of the unknown parameters in the hierarchical nonlinear mixed-effects models (2.1) and (2.2), we can use the Newton-Raphson method or the method of scoring based on the corresponding score functions (see Appendix A).
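Denote by $\psi$ the vector of unknown parameters, let $\ell(\psi)$ be the log-likelihood (3.2) under the null hypothesis, and let $U(\psi) = \partial \ell(\psi)/\partial \psi$ and $H(\psi) = \partial^2 \ell(\psi)/\partial \psi\, \partial \psi^\top$ be the score function and the Hessian matrix, respectively. If $\psi^{(k)}$ is the value of the parameter vector at the previous iteration, the Newton-Raphson method obtains the new estimate as
$$\psi^{(k+1)} = \psi^{(k)} - H(\psi^{(k)})^{-1} U(\psi^{(k)}), \tag{3.12}$$
and the method of scoring, which replaces $-H(\psi)$ by the Fisher information matrix $I(\psi) = -E\{H(\psi)\}$, obtains
$$\psi^{(k+1)} = \psi^{(k)} + I(\psi^{(k)})^{-1} U(\psi^{(k)}). \tag{3.13}$$
Applying (3.12) or (3.13) iteratively, we obtain the MLE $\hat\psi$. For its asymptotic distribution, a Taylor expansion shows that $\hat\psi - \psi_0$ is asymptotically normal with mean zero and covariance matrix $I(\psi_0)^{-1}$, where $\psi_0$ is the true value of $\psi$ under the null hypothesis and $I(\psi_0)$ is the Fisher information matrix.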
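The difference between the Newton-Raphson and scoring updates can be seen on a one-parameter toy problem. The sketch below is an illustration under stated assumptions, not the estimation routine used for (2.1) and (2.2): it fits the hypothetical model y = exp(beta·x) + e with known error variance to noise-free data, where the Newton step uses the observed Hessian (including the residual term) and the scoring step uses the expected information, in which that term has mean zero. The names `score`, `hessian`, `fisher_info`, and `iterate` are our own.

```python
import math

# Hypothetical toy model (not the paper's): y_j = exp(beta * x_j) + e_j,
# e_j ~ N(0, sigma2) with sigma2 known, so only the scalar beta is estimated.
xs = [0.1 * k for k in range(1, 11)]
ys = [math.exp(0.5 * x) for x in xs]  # noise-free data: the MLE is exactly 0.5
sigma2 = 1.0

def score(beta):
    """U(beta) = d loglik / d beta."""
    return sum((y - math.exp(beta * x)) * x * math.exp(beta * x)
               for x, y in zip(xs, ys)) / sigma2

def hessian(beta):
    """Observed H(beta) = d^2 loglik / d beta^2, including the residual term."""
    return sum(-(x * math.exp(beta * x)) ** 2
               + (y - math.exp(beta * x)) * x * x * math.exp(beta * x)
               for x, y in zip(xs, ys)) / sigma2

def fisher_info(beta):
    """I(beta) = -E[H(beta)]: the residual term has expectation zero."""
    return sum((x * math.exp(beta * x)) ** 2 for x in xs) / sigma2

def iterate(update, beta0=0.0, tol=1e-12, max_iter=100):
    """Apply a one-step update until successive iterates differ by less than tol."""
    beta = beta0
    for _ in range(max_iter):
        new = update(beta)
        if abs(new - beta) < tol:
            return new
        beta = new
    raise RuntimeError("did not converge")

newton = iterate(lambda b: b - score(b) / hessian(b))       # Newton-Raphson step
scoring = iterate(lambda b: b + score(b) / fisher_info(b))  # Fisher scoring step
print(newton, scoring)  # both approximately 0.5
```

Scoring drops the residual term, so its step is an ascent direction whenever the information is positive definite, which is one reason it is often preferred far from the optimum; Newton-Raphson typically converges faster close to the optimum.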
It should be pointed out that estimating the parameters of a nonlinear model is challenging, and iterative procedures are commonly used. For example, Wong et al. [19] used a Newton-Raphson iteration to obtain maximum likelihood estimates of ARMA models with error processes for replicated observations. In this paper, we adopt a similar method to obtain the MLEs of the hierarchical nonlinear mixed-effects model. We do not give a detailed proof of convergence of the algorithm, only the idea of a proof, but we explore its behavior both in the simulation study and in the analysis of a real data set in Section 4. In these applications the algorithm converged with good precision, which suggests that it is reasonable and feasible in practice.
4. Empirical Investigations
4.1. Simulation Studies
We perform simulation studies to evaluate the sizes and powers of the score tests proposed in Section 3. Data are generated for a number of subjects with repeated measurements on each unit from a hierarchical nonlinear model in which the covariates are independent random variates, the errors are random noise, and the random effects are independent draws from a distribution whose covariance matrix involves an identity matrix. We vary the variance parameters from 0 to 0.2 to study the sizes and powers of the two score tests.
We consider four different sample sizes, and the experiment is replicated 2000 times for each parameter configuration. The nominal size of both tests is 0.05. The empirical sizes and powers of the two test statistics are presented in Tables 1 and 2, respectively.


The results in Tables 1 and 2 show that the empirical sizes of the tests are very close to 0.05. As the sample sizes and the variance parameters increase, the power of the tests increases quickly and approaches 1. Furthermore, the score test of the composite hypothesis (2.4) has lower power when the variance components of the random effects are zero but the variance-function parameters are not. We conjecture that power is higher when random effects are present because the discrepancy between the true model and the null model is then larger, and lower otherwise. These findings are consistent with the theoretical results and suggest that random effects may be a main factor influencing inferential performance.
To further examine the performance of the two tests, Figures 1 and 2 present plots of the two test statistics for one simulation setting; the plots for other settings are omitted to save space. The plots also support the chi-square asymptotic distribution of both statistics, consistent with the theoretical results.
Figure 1: panels (a) and (b).
Figure 2: panels (a) and (b).
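The Monte Carlo logic of the simulation design above (simulate under a configuration, compute the statistic, record how often it exceeds the 5% chi-square critical value) can be sketched on a much simpler problem than the one studied here. The code below assumes a linear random-intercept model y_ij = b_i + e_ij with e_ij ~ N(0, 1) known and Var(b_i) = θ. At θ = 0 the score is U = ½ Σ_i {(Σ_j y_ij)² − n_i}, with null variance ½ Σ_i n_i², and ST = U²/Var(U) is compared with the chi-square distribution with 1 degree of freedom. This two-sided quadratic form mirrors (3.4); because θ ≥ 0, one-sided versions such as the order-restricted tests of Hall and Praestgaard [16] are also used. The numbers of subjects, cluster size, replicates, seed, and function names are arbitrary choices.

```python
import random

CRIT_5PCT_DF1 = 3.841458820694124  # 95th percentile of the chi-square distribution with 1 df

def score_stat(clusters):
    """ST = U^2 / Var0(U) for H0: theta = 0 in y_ij = b_i + e_ij, e_ij ~ N(0, 1)."""
    u = 0.5 * sum(sum(c) ** 2 - len(c) for c in clusters)
    var_u = 0.5 * sum(len(c) ** 2 for c in clusters)
    return u * u / var_u

def rejection_rate(theta, m=30, n=5, reps=2000, seed=1):
    """Monte Carlo proportion of replicates with ST above the 5% chi-square(1) cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        clusters = []
        for _ in range(m):
            b = rng.gauss(0.0, theta ** 0.5)  # random intercept with variance theta
            clusters.append([b + rng.gauss(0.0, 1.0) for _ in range(n)])
        if score_stat(clusters) > CRIT_5PCT_DF1:
            rejections += 1
    return rejections / reps

print(rejection_rate(0.0))  # empirical size, should be near 0.05
print(rejection_rate(0.5))  # empirical power under theta = 0.5
```

With 2000 replicates, the Monte Carlo standard error of an empirical size near 0.05 is about 0.005, which is the precision to keep in mind when comparing empirical sizes with the nominal level.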
4.2. An Illustrative Example
We illustrate the use of the tests in the analysis of a longitudinal study, using the guinea pig data from Johansen [20]. In the experiment, 50 tissue samples were taken from the intestine of each of eight guinea pigs. For each guinea pig, five tissue samples were randomly assigned to each concentration of β-methylglucoside, and the uptake volume was measured in micromoles per milligram of fresh tissue per 2 min. Only the mean of the five tissue samples at each concentration for each animal was used. These data were previously analyzed by Lee and Xu [21] to investigate diagnostic measures through local influence analysis. They proposed a nonlinear mixed-effects model for uptake volume as a function of concentration, in which the response is the jth uptake volume for individual i, the covariate is the jth concentration level for individual i, and each individual has a vector of random effects. Applying the algorithm of Section 3 under the null hypothesis (2.3), that is, with no random effects in the model, we obtain the parameter estimates and the score test statistic for (2.3), which has 3 degrees of freedom. We also calculate the score test statistic for the composite hypothesis (2.4), which has 4 degrees of freedom. The test results strongly suggest rejecting both the null hypothesis of independence (2.3) and that of independence and homogeneity (2.4). These results indicate that heterogeneity of between-subject variance and correlation within clusters are both present in the guinea pig data. Our results are consistent with those of Lee and Xu [21], who also found dependence and heterogeneity in these data.
These results suggest that, for the guinea pig data, the original model, which allows both correlation within clusters and heterogeneity across clusters, should be adopted. The example also shows that the score tests can detect random effects in practice.
Once the presence of random effects has been established, estimating them is of interest. However, this paper focuses on testing whether random effects exist, so their estimation is beyond our scope; Lee and Xu [21] obtained such estimates with a SAMCMC algorithm.
Appendices
Appendix A
In what follows, all expectations are taken under the null hypothesis. According to (3.3), the kth component of the efficient score is expressed in terms of the quantities defined in Section 3.
Note the following properties of the normal distribution: if , then for any nonnegative definite matrices and . After some calculations, we obtain Therefore, (3.6) holds immediately.
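The properties of the normal distribution used here are presumably of the standard quadratic-form type. For reference, if Y ~ N_n(μ, Σ) and A, B are symmetric n × n matrices (nonnegative definite matrices being a special case), the following textbook identities hold; they are recorded as a reader aid rather than as the exact statements of this appendix.

```latex
% Y ~ N_n(\mu, \Sigma); A, B symmetric n x n matrices
E(Y^\top A Y) = \operatorname{tr}(A\Sigma) + \mu^\top A \mu,
\qquad
\operatorname{Cov}(Y,\, Y^\top A Y) = 2\,\Sigma A \mu,
\qquad
\operatorname{Cov}(Y^\top A Y,\, Y^\top B Y) = 2\operatorname{tr}(A\Sigma B\Sigma) + 4\,\mu^\top A \Sigma B \mu .
```

With μ = 0, the last identity gives E{(YᵀAY)(YᵀBY)} = tr(AΣ)tr(BΣ) + 2tr(AΣBΣ), which is the kind of fourth-moment term that appears in the information entries for the variance components.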
Appendix B
Proof of the Asymptotic Distribution of ST
Here we study the asymptotic distribution of ST under the null hypothesis (2.3), using the efficient score and the expressions given in (3.6), evaluated at the true parameter value. To obtain the asymptotic properties of the score test statistic, we assume the following regularity conditions under the null hypothesis; they are similar to those given by Lin [13].
Condition 1. The cluster sizes form a bounded sequence of positive integers, and the first- and second-order partial derivatives of the nonlinear function with respect to the parameters are bounded. The components of the associated design quantities are uniformly bounded.
Condition 2. There exists a neighborhood of the true parameter value in which the relevant components are bounded.
Condition 3. The log-quasi-likelihood has the usual asymptotic properties, including consistency of the null maximum likelihood estimator and the usual linear expansion.
Condition 4. There exists a positive definite matrix , such that
Proof. For any given constant vector , where is an vector, is a constant, is an vector, and is an vector, we have
Because the random-effects covariance matrix is block diagonal, the linear combination above is, by the definition of an m-dependent sequence, a sum of m-dependent random variables; that is, it can be written as
where the summands form an m-dependent sequence. From Conditions 1 and 2, the summands are uniformly bounded in the neighborhood for any given constant vector. By verifying the required moment conditions and applying a theorem of Chung [22] together with Condition 4, we have
in distribution as the number of clusters tends to infinity. By the Cramér-Wold device, the normalized score vector is therefore asymptotically multivariate normal.
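Using the linear expansion of the efficient score about the true parameter value together with Condition 3, the normalized efficient score is asymptotically normal with covariance given by the efficient information. By the consistency of the null estimator and Slutsky's theorem, ST therefore converges in distribution to a chi-square distribution whose degrees of freedom equal the dimension of the variance-component vector as the number of clusters tends to infinity.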
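The step from Condition 3 to the chi-square limit follows the standard efficient-score argument, sketched here in our own notation: U_θ and U_γ are the total scores for the variance components θ (of dimension q) and the nuisance parameters γ under (2.3), γ_0 is the true nuisance value, γ̂ is its null MLE, the I-blocks are the corresponding Fisher information blocks, and Condition 4 supplies a positive definite limit for m⁻¹ times the efficient information.

```latex
% Linear expansion of the efficient score about \gamma_0 (Condition 3):
\frac{1}{\sqrt{m}}\,U_\theta(\hat\gamma)
  = \frac{1}{\sqrt{m}}\Bigl\{U_\theta(\gamma_0) - I_{\theta\gamma} I_{\gamma\gamma}^{-1} U_\gamma(\gamma_0)\Bigr\} + o_p(1)
% Variance of the leading term, using Cov(U_\theta, U_\gamma) = I_{\theta\gamma} and Var(U_\gamma) = I_{\gamma\gamma}:
\operatorname{Var}\Bigl\{U_\theta(\gamma_0) - I_{\theta\gamma} I_{\gamma\gamma}^{-1} U_\gamma(\gamma_0)\Bigr\}
  = I_{\theta\theta} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta} = \tilde{I}_{\theta\theta}
% Hence, by the central limit result above and Slutsky's theorem,
ST = U_\theta(\hat\gamma)^\top \tilde{I}_{\theta\theta}^{-1} U_\theta(\hat\gamma) \xrightarrow{d} \chi^2_q
```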
Acknowledgments
The authors thank the editor and two anonymous referees for their careful reading of the paper and constructive comments. This work was partially supported by the Natural Science Foundation of China (10671038).
References
[1] N. M. Laird and J. H. Ware, “Random-effects models for longitudinal data,” Biometrics, vol. 38, no. 4, pp. 963–974, 1982.
[2] L. B. Sheiner and S. L. Beal, “Evaluation of methods for estimating population pharmacokinetic parameters. I. Michaelis-Menten model: routine clinical pharmacokinetic data,” Journal of Pharmacokinetics and Biopharmaceutics, vol. 8, no. 6, pp. 553–571, 1980.
[3] M. J. Lindstrom and D. M. Bates, “Nonlinear mixed effects models for repeated measures data,” Biometrics, vol. 46, no. 3, pp. 673–687, 1990.
[4] W. K. Wong and R. B. Miller, “Analysis of ARIMA-noise models with repeated time series,” Journal of Business and Economic Statistics, vol. 8, pp. 243–250, 1990.
[5] J. C. Pinheiro and D. M. Bates, “Approximations to the log-likelihood function in the nonlinear mixed-effects model,” Journal of Computational and Graphical Statistics, vol. 4, pp. 12–35, 1995.
[6] R. D. Cook and S. Weisberg, “Diagnostics for heteroscedasticity in regression,” Biometrika, vol. 70, no. 1, pp. 269–274, 1983.
[7] J. S. Simonoff and C. L. Tsai, “Improved tests for nonconstant variance in regression based on the modified profile likelihood,” Applied Statistics, vol. 42, pp. 31–41, 1994.
[8] Z. W. Cai, C. M. Hurvich, and C. L. Tsai, “Score tests for heteroscedasticity in wavelet regression,” Biometrika, vol. 85, no. 1, pp. 229–234, 1998.
[9] R. L. Eubank and W. Thomas, “Detecting heteroscedasticity in nonparametric regression,” Journal of the Royal Statistical Society. Series B, vol. 55, no. 1, pp. 145–155, 1993.
[10] A. J. Oyet and B. Sutradhar, “Testing variances in wavelet regression models,” Statistics & Probability Letters, vol. 61, no. 1, pp. 97–109, 2003.
[11] K. Y. Liang, “A locally most powerful test for homogeneity with many strata,” Biometrika, vol. 74, no. 2, pp. 259–264, 1987.
[12] H. Jacqmin-Gadda and D. Commenges, “Tests of homogeneity for generalized linear models,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1237–1246, 1995.
[13] X. Lin, “Variance component testing in generalised linear models with random effects,” Biometrika, vol. 84, no. 2, pp. 309–326, 1997.
[14] Z. Y. Zhu and W. K. Fung, “Variance component testing in semiparametric mixed models,” Journal of Multivariate Analysis, vol. 91, no. 1, pp. 107–118, 2004.
[15] F. Zhang and R. E. Weiss, “Diagnosing explainable heterogeneity of variance in random-effects models,” The Canadian Journal of Statistics, vol. 28, no. 1, pp. 3–18, 2000.
[16] D. B. Hall and J. T. Præstgaard, “Order-restricted score tests for homogeneity in generalised linear and nonlinear mixed models,” Biometrika, vol. 88, no. 3, pp. 739–751, 2001.
[17] D. M. Bates and D. G. Watts, Nonlinear Regression Analysis and Its Applications, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1988.
[18] P. J. Solomon and D. R. Cox, “Nonlinear component of variance models,” Biometrika, vol. 79, no. 1, pp. 1–11, 1992.
[19] W. K. Wong, R. B. Miller, and K. Shrestha, “Maximum likelihood estimation of ARMA model with error processes for replicated observations,” Journal of Applied Statistical Science, vol. 10, no. 4, pp. 287–297, 2001.
[20] S. Johansen, Functional Relations, Random Coefficients, and Nonlinear Regression with Application to Kinetic Data, vol. 22 of Lecture Notes in Statistics, Springer, New York, NY, USA, 1984.
[21] S. K. Lee and L. Xu, “Influence analyses of nonlinear mixed-effects models,” Computational Statistics & Data Analysis, vol. 45, no. 2, pp. 321–341, 2004.
[22] K. L. Chung, A Course in Probability Theory, London, UK, 2nd edition, 1974.
Copyright
Copyright © 2011 Qingming Zou and Zhongyi Zhu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.