Research Article | Open Access
Testing the Correlation and Heterogeneity for Hierarchical Nonlinear Mixed-Effects Models
Nonlinear mixed-effects models are widely used to analyze repeated-measures data and have received considerable attention. It is often of interest to test for correlation within clusters and heterogeneity across clusters. In this paper, we address these problems by proposing a class of score tests for the null hypothesis that all within- and between-subject variance components are zero in a class of nonlinear mixed-effects models, and we study the asymptotic properties of the proposed tests. The finite-sample performance of the tests is examined through simulation studies, and an illustrative example is presented.
Repeated-measures data arise frequently in many areas of investigation, such as economics and pharmacokinetics. For instance, in longitudinal studies, observations on the same subject are usually made at different times. Analysis of such data requires accounting for the within-cluster correlation and the between-subject heterogeneity. Random-effects models are commonly used for analyzing clustered and repeated-measures data, and linear mixed-effects models undoubtedly play an important role in such analyses; for an example, see Laird and Ware. However, many repeated-measures data, such as growth data, dose-response data, and pharmacokinetic data, are inherently nonlinear with respect to a given regression function. Several nonlinear mixed-effects models and various inference procedures have been proposed [2–5]. For the models considered in this literature, an interesting issue is whether there is correlation within clusters and/or heterogeneity between subjects. It is well known that model misspecification can have a serious impact on statistical inference. In regression models, two common assumptions are homogeneity and independence; violating them can reduce the efficiency of the estimators, so it is important to check these two assumptions whenever possible. For repeated-measures and clustered data, if one can establish that there is no heterogeneity or correlation induced by random effects, then a simpler model can be fitted and efficient statistical inference can be obtained.
However, if heterogeneity and correlation are present among the outcomes but go undetected, the model parameters may be overestimated or underestimated, the precision of the statistical inference is affected, and mistaken conclusions may even be drawn. Therefore, it is important and meaningful to test for heteroscedasticity and correlation among outcomes in nonlinear mixed-effects models with repeated-measures data.
For testing explainable heterogeneity of variance, Cook and Weisberg and Simonoff and Tsai investigated the score test in classical linear models. Cai et al., Eubank and Thomas, and Oyet and Sutradhar considered related problems in nonparametric regression models. A number of tests for unexplainable heterogeneity of variance have also been proposed. Liang applied the score test for homogeneity across groups. Jacqmin-Gadda and Commenges extended Liang's work to canonical generalized linear models (GLMs) with random effects and to GEEs with a single correlation parameter. Lin developed a unified theory for testing correlation and heterogeneity in the framework of GLMs with random effects, using the Laplace expansion of the integrated log-quasilikelihood. Zhu and Fung extended Lin's work to semiparametric mixed models. Zhang and Weiss divided the heterogeneity of variance into explainable and unexplainable parts and considered tests of explainable heterogeneity. However, for nonlinear mixed-effects models with repeated-measures data, such tests have not been developed. Our aim here is to develop a class of score tests for correlation within clusters and heterogeneity across subjects in a nonlinear mixed-effects model.
Following Lin, we use the Laplace expansion to derive a class of score tests of homogeneity and correlation for hierarchical nonlinear mixed-effects models. The advantage of the score test is that it does not require the user to specify the joint distribution of the random effects, so the test is rather easy to carry out. We give the asymptotic distribution of the test statistic under the null hypothesis and examine the finite-sample performance of the test through a Monte Carlo simulation study. The performance of the test is found to be satisfactory in terms of both size and power, even for samples of medium size. The rest of this paper is organized as follows. In Section 2, the nonlinear mixed-effects model is introduced. In Section 3, we derive the score test statistics for correlation within clusters and heterogeneity across subjects and give their asymptotic distribution under the null hypothesis. In Section 4, we examine the finite-sample performance of these tests through a Monte Carlo simulation study and analyze a real example.
2. Nonlinear Mixed-Effects Model
In this paper, we consider the following nonlinear mixed-effects model as proposed by Pinheiro and Bates. In the first stage the th observation on the th subject is modeled as where is a twice-differentiable smooth nonlinear function of a subject-specific parameter vector and the predictor is a normally distributed noise term, is the total number of subjects, is the number of observations on the th subject, and is the total number of observations. In the second stage, the subject-specific vector is modeled as where is a vector of unknown parameters, , are independent of random effects associated with the th subject, and and are design matrices for fixed and random effects, respectively. The random effects account for the correlation within the same cluster. Furthermore, we make the following distributional assumptions. The , are independently distributed as , where is a twice differentiable function, is a vector of unknown parameters, are the known design vectors, and if and only if , and they are independent of . Let denote the vector obtained from stacking up the cluster-specific entries and assume that is generated from some distribution with mean zero and covariance matrix with , where is a vector of unknown variance components. The magnitude of can be used to measure the degree of correlation and heterogeneity of the cluster-specific response vector within each subject. We postulate that each component of is a function of such that if . We further assume that the third- and higher-order moments of the random effects are of order . These conditions are consistent with Lin, Hall and Praestgaard, and Zhu and Fung. This model can be regarded as a hierarchical model that in some respects generalizes both the linear mixed-effects model of Laird and Ware and the usual nonlinear model for independent data of Bates and Watts.
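As a reading aid, a generic two-stage specification consistent with the verbal description above can be sketched as follows. The symbols ($y_{ij}$, $f$, $\phi_i$, $x_{ij}$, $\varepsilon_{ij}$, $A_i$, $B_i$, $\beta$, $b_i$, $m$, $n_i$) are notation assumed here for illustration; the authors' displays (2.1) and (2.2), and in particular their error-variance function, may differ.

```latex
\begin{align*}
y_{ij} &= f(\phi_i, x_{ij}) + \varepsilon_{ij}, \qquad i = 1,\dots,m,\quad j = 1,\dots,n_i, \\
\phi_i &= A_i \beta + B_i b_i ,
\end{align*}
```

Here the first line plays the role of the first-stage model for the response, and the second line links the subject-specific parameter vector to fixed effects $\beta$ and random effects $b_i$ through design matrices.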
It should be pointed out that the distribution of the random errors need not be normal. In this paper we nevertheless assume normal errors, mainly to keep the computations and mathematical derivations relatively simple. As the results below show, the derivation of the score test statistics is based on the complete-data log-likelihood, which requires a distributional specification for the random error, and that specification directly determines the complexity of the derivations. In a nonlinear mixed-effects model the random effects enter the model nonlinearly, which makes likelihood analysis more difficult and complicated than in the linear counterpart. Even under normal errors, the computation and derivation of the score test are cumbersome, let alone under other error distributions. In addition, although the score test does not depend on the normality of the random error, some convenient properties of the normal distribution help us obtain the test statistics. Therefore, we take the random error to be normally distributed.
For model (2.1) and (2.2), there may be correlation within clusters, heterogeneity among observations, or both. First, we investigate whether there is correlation among observations on the same subject; this is equivalent to testing whether or not. Thus we can use the hypothesis to test for correlation within clusters in the nonlinear mixed-effects model. Second, we study whether heterogeneity of between-subject variance and correlation within clusters are present at the same time. To address this problem, we can use the composite hypothesis
The testing of random effects in nonlinear mixed-effects models has been discussed in the literature (e.g., Jacqmin-Gadda and Commenges, Hall and Praestgaard, and Zhu and Fung). However, the models investigated there are confined to the additive nonlinear mixed-effects model; that is, the random effects are added to a nonlinear function or the random noise is i.i.d. Moreover, these authors generally studied hypothesis (2.3), which tests whether there is correlation within clusters. In this paper, we consider the testing of random effects under considerably more general conditions for hierarchical nonlinear mixed-effects models. We study not only the test for correlation within clusters but also whether heterogeneity of between-subject variance is present at the same time.
Let , and . Denote the vector obtained from stacking up the cluster-specific entries of the same symbol by , and . To derive score tests for the null hypotheses in which and , we first study the properties of the log-likelihood in model (2.1) and (2.2). For a given , the conditional log-likelihood of the nonlinear mixed-effects model and its derivatives are as follows: where is the first partial derivative of with respect to , and where and is the second partial derivative of with respect to evaluated at , and then
3. Score Test for Correlation and Heterogeneity within Clusters
In this section, we first use the Laplace expansion to develop a score test for the null hypothesis under model (2.1) and (2.2), which corresponds to having no correlation within clusters. Then, using the same approach, we obtain a score test statistic for the null hypothesis under model (2.1) and (2.2), which corresponds to having neither correlation within clusters nor heterogeneity of between-subject variance.
Let , for an data vector following model (2.1) and (2.2). The likelihood function is Using the moment assumptions on the random effects and a Laplace expansion similar to that of Lin and Hall and Praestgaard, we expand the integrated likelihood (3.1) and obtain the marginal log-likelihood for as follows: where and . Note that (3.2) mimics a Laplace expansion of . Many authors have extended this expansion to a variety of models; see Lin for an example. It is worth pointing out that (3.2) is a second-order expansion of the marginal distribution of about . For model (2.1) and (2.2), from the log-quasilikelihood expansion (3.2), some calculations give the efficient score under the null hypothesis as follows: where is the magnitude of evaluated at and is the maximum likelihood estimator of under .
To test for , we construct a score statistic as follows: where is the efficient information matrix of evaluated under . Here, where , and the scores , and the expectations are all calculated at . Noting the properties of the normal distribution, after some computations, we obtain, for , where , , , , , , and , , , . The detailed derivation of (3.6) is supplied in the Appendix, and the others are similar and omitted.
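The general shape of such a statistic can be sketched in generic notation. This is the standard form of a score statistic with nuisance parameters, written under assumed symbols ($\theta$ for the variance components being tested, $\delta$ for the remaining parameters, $U$ for scores, $I$ for information blocks); it is not a reproduction of the authors' displays (3.4) to (3.6).

```latex
\begin{align*}
ST &= U_{\theta}(\hat{\delta})^{\top}\, \tilde{I}_{\theta\theta}^{-1}\, U_{\theta}(\hat{\delta}), \\
\tilde{I}_{\theta\theta} &= I_{\theta\theta} - I_{\theta\delta}\, I_{\delta\delta}^{-1}\, I_{\delta\theta},
\qquad I_{ab} = E\!\left(U_a U_b^{\top}\right),
\end{align*}
```

with the score $U_{\theta}$ and all expectations evaluated at $\theta = 0$ and at the null-hypothesis estimate $\hat{\delta}$. The matrix $\tilde{I}_{\theta\theta}$ is the efficient information for $\theta$, that is, the information left after adjusting for estimation of $\delta$.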
One important feature of the proposed score test statistic is that a detailed specification of the distribution of the random effects is not necessary. Therefore, the test is robust against arbitrary mixed-model alternatives in which only the first two moments of the random effects are specified. The following gives the asymptotic properties of the proposed score test statistic. The “asymptotic” in the theorem refers to the number of clusters tending to infinity with cluster sizes bounded in model (2.1) and (2.2).
Theorem 3.1. For model (2.1) and (2.2), under the regularity conditions in the Appendix, when in (2.3) is true, the asymptotic distribution of the score test statistic ST is a chi-square distribution with degrees of freedom.
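In practice, referring a computed score statistic to its asymptotic chi-square distribution amounts to evaluating an upper-tail probability. The following is a minimal sketch using only the Python standard library; the function names `chi2_sf` and `score_test_decision` are hypothetical, and the numbers in the usage lines are illustrative, not values from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    # Start from the closed forms for df = 2 (even case) or df = 1 (odd case) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    # ... and step up with Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def score_test_decision(statistic, df, level=0.05):
    """Return (p_value, reject) for an asymptotic chi-square score test."""
    p_value = chi2_sf(statistic, df)
    return p_value, p_value < level


# Illustrative numbers only (assumed, not taken from the paper).
p, reject = score_test_decision(statistic=12.3, df=3)
print(f"p-value = {p:.4f}, reject H0 at the 5% level: {reject}")
```

The recurrence avoids any dependency on a statistics library; with SciPy available, the survival function of its chi-square distribution would serve the same purpose.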
We now consider testing the composite hypothesis to detect whether heterogeneity of between-subject variance and correlation within clusters are present at the same time. Let , and , and let be the maximizer of under . The Fisher information matrix can be obtained by differentiating (3.2) twice, taking the expectation under , and evaluating at . If one partitions the information matrix as here where and the score , and the expectations are all evaluated at , then, similar to (3.4), one constructs a score statistic as follows: where is the efficient information matrix of . The explicit form of the score statistic can be obtained in the same way as in the derivation of ; to save space, the detailed derivation of is omitted. The asymptotic properties of the score test statistic are given as follows.
Remark 3.3. Obtaining the MLEs of the parameters under the null hypothesis is crucial for the score test. To obtain the MLEs of the unknown parameters in the hierarchical nonlinear mixed-effects model (2.1) and (2.2), we can use the Newton-Raphson method or the method of scoring based on the score functions and (see the Appendix).
Denote by the unknown parameter vector, let be the likelihood function defined in (3.2) under the null hypothesis, and let and be the score function and Hessian matrix, respectively. If is the value of the parameter vector at the previous iteration, the Newton-Raphson method gives the new estimate such that and the method of scoring gives
Applying (3.12) or (3.13) iteratively, we can obtain the MLEs . For the asymptotic distribution of the estimate of , we can use the Taylor expansion to show that where is the true value of under the null hypothesis and is the Fisher information matrix.
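The two iterations can be illustrated on a deliberately simple problem. The sketch below is not the paper's model: it assumes a toy one-parameter nonlinear regression y = exp(-theta * x) + error with known error variance, and all function names are hypothetical. Newton-Raphson divides the score by the observed information (minus the Hessian), while the method of scoring divides it by the expected (Fisher) information.

```python
import math
import random


def mean_fn(theta, x):
    """Toy nonlinear mean function f(x; theta) = exp(-theta * x)."""
    return math.exp(-theta * x)


def score(theta, xs, ys, sigma2):
    """Score dl/dtheta = sum (y - f) * f' / sigma2, with f' = -x * f."""
    return sum((y - mean_fn(theta, x)) * (-x * mean_fn(theta, x))
               for x, y in zip(xs, ys)) / sigma2


def observed_info(theta, xs, ys, sigma2):
    """Minus the Hessian: sum [f'^2 - (y - f) * f''] / sigma2, with f'' = x^2 * f."""
    total = 0.0
    for x, y in zip(xs, ys):
        f = mean_fn(theta, x)
        total += (x * f) ** 2 - (y - f) * (x * x * f)
    return total / sigma2


def expected_info(theta, xs, sigma2):
    """Fisher information: sum f'^2 / sigma2 (the residual term has mean zero)."""
    return sum((x * mean_fn(theta, x)) ** 2 for x in xs) / sigma2


def iterate(theta, score_fn, info_fn, tol=1e-12, max_iter=200):
    """Repeat theta <- theta + score / information until the step is tiny."""
    for _ in range(max_iter):
        step = score_fn(theta) / info_fn(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta


# Simulated data under assumed settings: true theta = 0.5, error sd = 0.02.
random.seed(0)
sigma2 = 0.02 ** 2
xs = [0.5 * k for k in range(1, 11)]
ys = [mean_fn(0.5, x) + random.gauss(0.0, 0.02) for x in xs]

theta_nr = iterate(0.4, lambda t: score(t, xs, ys, sigma2),
                   lambda t: observed_info(t, xs, ys, sigma2))
theta_fs = iterate(0.4, lambda t: score(t, xs, ys, sigma2),
                   lambda t: expected_info(t, xs, sigma2))
print(theta_nr, theta_fs)
```

Both iterations solve the same score equation, so they should agree at convergence; they differ only in the curvature used to scale each step. In a multi-parameter model the scalar division becomes a linear solve.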
It should be pointed out that parameter estimation in nonlinear models is challenging. Iterative procedures are the usual way of obtaining estimates of the unknown parameters in such models. For example, Wong et al. used Newton-Raphson iteration to obtain the maximum likelihood estimates of an ARMA model with error processes for replicated observations. In this paper, we adopt a similar method to obtain the MLEs for the hierarchical nonlinear mixed-effects model. We do not show the convergence of the algorithm in detail and only give the idea of the proof, but we examine the behavior of the algorithm both in the simulation study and in the analysis of a real data set in Section 4 and find that it converged with good precision in the cases considered. These practical results suggest that the algorithm is reasonable and feasible.
4. Empirical Investigations
4.1. Simulation Studies
We perform simulation studies to evaluate the sizes and powers of the score tests proposed in Section 3. We first draw data for subjects with measurements on each unit from the model where are independent random variates from , the are random noise terms having the , and with as independent random draws from , and the random effects are independent draws from the distribution , where is an identity matrix. We vary from 0 to 0.2 to study the sizes and powers of the score tests and .
We consider four different sample sizes, . The experiment is replicated 2000 times for each parameter configuration. The nominal sizes of the tests are set to be 0.05. The empirical sizes and powers of test statistics and are presented in Tables 1 and 2 respectively.
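The logic of estimating empirical size and power by replication can be sketched on a toy problem. The example below does not use the paper's model or statistics: it assumes a score test of a zero mean in a unit-variance normal sample, whose statistic is chi-square with one degree of freedom under the null, and all names are hypothetical. Only the replication count (2000) and the nominal level (0.05) follow the text.

```python
import random

CHI2_1_CRIT_95 = 3.841458820694124  # 95% quantile of the chi-square(1) distribution


def toy_score_statistic(sample):
    """Score statistic for H0: mean = 0 in a N(mean, 1) sample."""
    n = len(sample)
    u = sum(sample)  # score evaluated at mean = 0
    info = n         # Fisher information at mean = 0
    return u * u / info


def rejection_rate(true_mean, n, n_rep=2000, seed=1):
    """Fraction of replications in which the test rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if toy_score_statistic(sample) > CHI2_1_CRIT_95:
            rejections += 1
    return rejections / n_rep


size = rejection_rate(true_mean=0.0, n=30)   # data generated under H0
power = rejection_rate(true_mean=0.5, n=30)  # data generated under an alternative
print(f"empirical size = {size:.3f}, empirical power = {power:.3f}")
```

With 2000 replications the Monte Carlo standard error of an empirical size near 0.05 is about 0.005, which gives a sense of how close to the nominal level the tabulated sizes can be expected to fall.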
The results in Tables 1 and 2 show that the empirical sizes of the tests are very close to 0.05. As the , and increase, the power of the test increases quickly and approaches 1. Furthermore, we notice that, for the score test , when but , the test has lower power. We speculate that power is higher when random effects are present because the discrepancy between the true model and the postulated model is then larger, and lower otherwise. These findings are consistent with the theoretical results and also suggest that random effects in the model may be a main factor influencing inference.
To further confirm the performance of the test and , we present the plots of and with in Figures 1 and 2, respectively. The others are omitted for the sake of space. The plots also confirm that the asymptotic distribution of the test and is , which is consistent with the theoretical results.
4.2. An Illustrative Example
We illustrate the use of the test in the analysis of a longitudinal study. The data are taken from the guinea pig data in Johansen. In the experiment, 50 tissue samples were taken from the intestine of each of eight guinea pigs. For each guinea pig, five tissue samples were assigned randomly to each different concentration of B-methyl-glucoside, and the uptake volume was measured in micromoles per milligram of fresh tissue per 2 min. Only the means of the five tissue samples at each concentration for each animal were used. The data were previously analyzed by Lee and Xu to investigate diagnostic measures through local influence analysis. The nonlinear mixed-effects model they proposed for uptake volume as a function of concentration is where is the th uptake volume for individual , is the th concentration level for individual , , and is a vector of individual random effects with , where . In this paper, we assume that and . According to the algorithm proposed in Section 3, under the null hypothesis ; that is, there exist no random effects in the model, we obtain , , , , , and . The degrees of freedom of is 3. We also calculate the test statistic and obtain , , , , and under . The degrees of freedom of is 4. The small p-values of the test statistics strongly suggest rejecting the null hypotheses of independence and homogeneity and . These results demonstrate that heterogeneity of the between-subject variance and correlation within clusters are both present in the guinea pig data. It should be pointed out that our results are consistent with those of Lee and Xu, who also illustrated the dependence and heterogeneity in the guinea pig data. These results suggest that, for the guinea pig data, the original model with both correlation within clusters and heterogeneity across clusters should be adopted. The illustrative example also demonstrates that the score test can effectively detect random effects among the outcomes in practice.
Once the existence of random effects has been established, estimating them becomes an interesting topic. In this paper, however, our interest is in testing whether the random effects exist, so their estimation is beyond the scope of our study. Lee and Xu gave an estimate of through the SA-MCMC algorithm.
Appendix
Note the following properties of the normal distribution: if , then for any nonnegative definite matrices and . After some calculations, we obtain Therefore, (3.6) holds immediately.
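For reference, standard moment identities for quadratic forms in a zero-mean normal vector, of the kind this step relies on, can be written as follows. The symbols $\varepsilon$, $\Sigma$, $A$, and $B$ are assumed notation, with $A$ and $B$ symmetric; these are well-known results and not necessarily the exact identities displayed by the authors.

```latex
\begin{align*}
\varepsilon \sim N(0,\Sigma) \;\Longrightarrow\;
E\!\left(\varepsilon^{\top} A\, \varepsilon\right) &= \operatorname{tr}(A\Sigma), \\
\operatorname{cov}\!\left(\varepsilon^{\top} A\, \varepsilon,\; \varepsilon^{\top} B\, \varepsilon\right) &= 2\operatorname{tr}(A\Sigma B\Sigma).
\end{align*}
```

Identities of this type are what allow the expectations of products of scores to be reduced to traces of matrix products.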
Proof of the Asymptotic Distribution of .
Here we study the asymptotic distribution of under . Let where and is given in (3.6). Assume is the true value of . To obtain the asymptotic properties of the score test statistic , we assume the following regularity conditions under . These assumptions are similar to those given by Lin.
Condition 1. The size of cluster is a finite sequence of positive integers, and the first- and second-order partial derivatives of with respect to parameter are bounded. The components of , are uniformly bounded.
Condition 2. There exists a neighborhood . The components of and are bounded in .
Condition 3. The log-quasilikelihood of has the usual asymptotic properties, including consistency of and the linear expansion
Condition 4. There exists a positive definite matrix , such that
Proof. For any given constant vector , where is an vector, is a constant, is an vector, and is an vector, we have
Because is a block diagonal matrix, according to the definition of -dependent sequence, then is the summation of a sequence of -dependent random variables; that is, can be written as
where is an -dependent sequence. From Conditions 1 and 2, we have that are uniformly bounded in for any given . It can be shown that . By applying Theorem of Chung  and applying the Condition 4 to , we have
in distribution as . Using the Cramér-Wold device, we conclude that in distribution.
Note the linear expansion of the efficient score statistic about and use Condition 3. It follows that . This implies that in distribution, where . According to the consistency of and Slutsky's theorem, we have that converges in distribution to a chi-square distribution with degrees of freedom as , that is, .
Acknowledgments
The authors thank the editor and two anonymous referees for their careful reading of the paper and constructive comments. This work was partially supported by the Natural Science Foundation of China (10671038).
References
- N. M. Laird and J. H. Ware, “Random-effects models for longitudinal data,” Biometrics, vol. 38, no. 4, pp. 963–974, 1982.
- L. B. Sheiner and S. L. Beal, “Evaluation of methods for estimating population pharmacokinetic parameters. I. Michaelis-Menten model: routine clinical pharmacokinetic data,” Journal of Pharmacokinetics and Biopharmaceutics, vol. 8, no. 6, pp. 553–571, 1980.
- M. J. Lindstrom and D. M. Bates, “Nonlinear mixed effects models for repeated measures data,” Biometrics, vol. 46, no. 3, pp. 673–687, 1990.
- W. K. Wong and R. B. Miller, “Analysis of ARIMA-Noise models with repeated time series,” Journal of Business and Economic Statistics, vol. 8, pp. 243–250, 1990.
- J. C. Pinheiro and D. M. Bates, “Approximations to the log-likelihood function in the nonlinear mixed-effects model,” Journal of Computational and Graphical Statistics, vol. 4, pp. 12–35, 1995.
- R. D. Cook and S. Weisberg, “Diagnostics for heteroscedasticity in regression,” Biometrika, vol. 70, no. 1, pp. 269–274, 1983.
- J. S. Simonoff and C. L. Tsai, “Improved tests for nonconstant variance in regression based on the modified profile likelihood,” Applied Statistics, vol. 42, pp. 31–41, 1994.
- Z. W. Cai, C. M. Hurvich, and C. L. Tsai, “Score tests for heteroscedasticity in wavelet regression,” Biometrika, vol. 85, no. 1, pp. 229–234, 1998.
- R. L. Eubank and W. Thomas, “Detecting heteroscedasticity in nonparametric regression,” Journal of the Royal Statistical Society. Series B, vol. 55, no. 1, pp. 145–155, 1993.
- A. J. Oyet and B. Sutradhar, “Testing variances in wavelet regression models,” Statistics & Probability Letters, vol. 61, no. 1, pp. 97–109, 2003.
- K. Y. Liang, “A locally most powerful test for homogeneity with many strata,” Biometrika, vol. 74, no. 2, pp. 259–264, 1987.
- H. Jacqmin-Gadda and D. Commenges, “Tests of homogeneity for generalized linear models,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1237–1246, 1995.
- X. Lin, “Variance component testing in generalised linear models with random effects,” Biometrika, vol. 84, no. 2, pp. 309–326, 1997.
- Z. Y. Zhu and W. K. Fung, “Variance component testing in semiparametric mixed models,” Journal of Multivariate Analysis, vol. 91, no. 1, pp. 107–118, 2004.
- F. Zhang and R. E. Weiss, “Diagnosing explainable heterogeneity of variance in random-effects models,” The Canadian Journal of Statistics, vol. 28, no. 1, pp. 3–18, 2000.
- D. B. Hall and J. T. Præstgaard, “Order-restricted score tests for homogeneity in generalised linear and nonlinear mixed models,” Biometrika, vol. 88, no. 3, pp. 739–751, 2001.
- D. M. Bates and D. G. Watts, Nonlinear Regression Analysis and Its Applications, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1988.
- P. J. Solomon and D. R. Cox, “Nonlinear component of variance models,” Biometrika, vol. 79, no. 1, pp. 1–11, 1992.
- W. K. Wong, R. B. Miller, and K. Shrestha, “Maximum likelihood estimation of ARMA model with error processes for replicated observations,” Journal of Applied Statistical Science, vol. 10, no. 4, pp. 287–297, 2001.
- S. Johansen, Functional Relations, Random Coefficients, and Nonlinear Regression with Application to Kinetic Data, vol. 22 of Lecture Notes in Statistics, Springer, New York, NY, USA, 1984.
- S. K. Lee and L. Xu, “Influence analyses of nonlinear mixed-effects models,” Computational Statistics & Data Analysis, vol. 45, no. 2, pp. 321–341, 2004.
- K. L. Chung, A Course in Probability Theory, London, UK, 2nd edition, 1974.
Copyright © 2011 Qingming Zou and Zhongyi Zhu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.