Research Article | Open Access
The Empirical Cressie-Read Test Statistics for Longitudinal Generalized Linear Models
For the marginal longitudinal generalized linear models (GLMs), we develop the empirical Cressie-Read (ECR) test statistic approach which has been proposed for the independent identically distributed (i.i.d.) case. The ECR test statistic includes empirical likelihood as a special case. By adopting this ECR test statistic approach and taking into account the within-subject correlation, the efficiency theory results of estimation and testing based on ECR are established under some regularity conditions. Although a working correlation matrix is assumed, there is no need to estimate the nuisance parameters in the working correlation matrix based on the quadratic inference function (QIF). Therefore, the proposed ECR test statistic is asymptotically a standard limit under the null hypothesis. It is shown that the proposed method is more efficient even when the working correlation matrix is misspecified. We also evaluate the finite sample performance of the proposed methods via simulation studies and a real data analysis.
Longitudinal studies are increasingly common in many scientific research fields, including epidemiology, econometrics, medicine, and life and social sciences. For example, longitudinal studies are often used in psychology to study developmental trends across the life span, in sociology to study life events throughout lifetimes or generations, and in medical area to take different treatments at the start of the study and to see what kind of effects the assigned patients have with each treatment by week or by year.
Longitudinal data modeling is a statistical method often used in the experiments that are designed such that responses on the same experimental units are observed at each repetition. However, generalized linear models (GLMs ) have become a favored tool for the modelling of clustered and longitudinal data by allowing for the non-Gaussian data and nonlinear link functions, such as binomial or Poisson’s type responses. The intrinsic complexity of longitudinal GLMs makes them challenging. First, one characteristic of longitudinal data is the within-subject correlation among the repeated measurements. Ignoring this within-subject correlation causes a loss of efficiency in general problems, and the presence of correlation makes it hard to establish the underlying asymptotic theory. Second, the full likelihood for longitudinal data is often difficult to specify, particularly for the correlated non-Gaussian data. Researchers had developed appropriate methods and criteria for longitudinal GLMs, where generalized estimating equations (GEE [2, 3]) or a recently proposed related method based on quadratic inference functions (QIF; see ) is an attractive option for longitudinal regression, particularly when marginal relationships rather than within-subject correlation are of primary interest.
In the present paper, we consider the popular marginal longitudinal GLMs. Suppose that is the multivariate response for the subject and is the matrix of the covariates for the th subject (). Observations from different subjects are independent, but those from the same subjects are correlated. Assume that the marginal mean of is where is a known link function, is the unknown parameter vector of interest, and is a vector for . In order to simplify notation, but without loss of generality, we assume . Recently, there has been considerable interest of investigating the GLMs, such as Liang and Zeger , Zeger and Liang , Fitzmaurice , Qu et al. , Pan , Balan and Schiopu-Kratina , Qu and Li , Wang and Qu , Wang , and Li et al. .
The empirical likelihood method was introduced by Owen [12–14] as a nonparametric method of inference based on a data-driven likelihood ratio function in the i.i.d. case and had been applied to various statistical models. For example, Kolaczyk  and Chen and Cui  made an extension to the generalized linear models. As for other studies, see Baggerly , DiCiccio et al. , Chen and Qin , Qin and Lawless , Zhu and Xue , and Li et al.  among others. Owen  is one of the best references for this field. Although the empirical likelihood method had been investigated by many authors, those researches had mostly focused on independent observations. For the analysis of longitudinal data, Xue and Zhu  applied the empirical likelihood approach to the varying coefficient models with longitudinal data, but they did not consider the within-subject correlation structure of the longitudinal data. Li et al.  and Li et al.  proposed the generalized empirical likelihood method for longitudinal data by introducing the known within-subject correlation structure. Wang et al.  proposed two generalized empirical likelihood methods for the longitudinal linear model by taking into consideration within-subject correlations. They showed that, by plugging the estimator of the working correlation matrix into the empirical log-likelihood ratio (ELR), the resulting ELR is asymptotically a weighted sum of independent -variables with unknown weights. Thus, the empirical likelihood method needs to be adjusted so that it can be efficiently used to accommodate the correlation inherent in longitudinal data. However, in many applications, how to estimate the working correlation matrix effectively is a challenging problem.
Instead of the empirical likelihood ratio statistic, Baggerly  used the empirical Cressie-Read (ECR) test statistic and showed that it is also asymptotically a Chi-squared distribution in the i.i.d. case. Bravo  extended the ECR approach to inference for -mixing processes. The empirical Cressie-Read test statistic has user-specified parameter and encompasses several commonly used testing statistics as special cases, such as the empirical likelihood statistic (), the maximum entropy, minimum information or the Kullback-Leibler statistic (), the Neyman-modified statistic or the Euclidean likelihood statistic (), the Freeman-Tukey statistic (), and Pearson’s statistic (), the first two being defined in a limiting sense. Thus, we can use these results to construct confidence regions of the parameter of interest.
In this paper we develop Baggerly’s results on the ECR test statistic for the longitudinal data by combining the idea of QIF in Qu et al.  and apply the proposed method to the marginal longitudinal GLMs (1). Specifically, we derive the asymptotic distributions of the maximum empirical Cressie-Read likelihood estimator (MECRLE) and the ECR test statistic. Although the working correlation matrix is assumed, the proposed method needs not to estimate the nuisance parameters associated with correlations. This is because the inverse of the commonly used working correlation matrix can be exactly represented or approximated by a linear combination of basis matrices such that the dimension of the unbiased estimating functions is greater than the number of unknown parameters. Thus, the efficiency results can be obtained by combining the ECR method with the generalized method of moments (GMM) proposed by Hansen . Therefore, the proposed method inherits the advantages of the empirical likelihood, QIF, and GMM methods, and the proposed method represents a good alternative in this area as well.
The paper is organized as follows. In Section 2, we propose the more generalized method of the ECR test statistic for marginal longitudinal GLMs by combining with the QIF approach. The technical conditions and some asymptotic properties are given in Section 3. In Section 4, we present the results for simulation studies and a real dataset to illustrate the proposed methods. The technical proofs of the main results are presented in Section 5.
2. Model and ECR Test Statistic
Suppose that the population comes from the marginal longitudinal generalized linear model (1). Then the marginal mean of is and the marginal variance of is where is a variance function and is a scale parameter.
With assumptions on the first marginal moments, the GEE  estimator of is defined as the solution of where , , is an matrix and is a working covariance matrix with being the diagonal matrix of marginal variances and as the working correlation matrix, where is a vector which fully characterizes . The main advantage of the GEE method is that it yields a consistent estimator even if the working correlation matrix is misspecified.
Qu et al.  introduced the quadratic inference function (QIF) by assuming that the inverse of the working correlation can be approximated by a linear combination of several basis matrices; that is, where is the identity matrix, are symmetric basis matrices which are determined by the structure of , and are constant coefficients. The advantage of this approach is that it does not require estimation of linear coefficients ’s which can be viewed as nuisance parameters.
To implement the QIF approach, we need to choose the basis for the inverse of the correlation matrix . Qu et al.  and Qu and Song  found that, if is the exchangeable working correlation matrix, , where is the identity matrix and is a matrix with 0 on the diagonal and 1 off diagonal. If the working correlation matrix is AR(1) with , then can be written as a linear combination of three basis matrices, that is, , where is the identity matrix, can be 1 on the subdiagonal and 0 elsewhere, and can be 1 on the corners and and 0 elsewhere. However, can often be dropped out of the model, as removing does not affect the efficiency of the estimator too much, and this can simplify the estimation procedure. More details can be found in Qu et al. , Qu and Song , and Qu and Li  among others. In addition, Qu and Lindsay  developed an adaptive estimating equation approach to find a reliable approximation to the inverse of the variance matrix.
Let It is easy to check that the GEE defined in (4) becomes a linear combination of the above extended score vector of . Note that the dimension of is , and it is greater than the number of unknown parameters; thus the GEE method is unavailable. Therefore, Qu et al.  proposed the quadratic inference function by extending the method of GMM proposed by Hansen  to obtain the estimator of .
In this paper, we consider an alternate method using the following empirical Cressie-Read test statistics [17, 32]: where is a user-specified parameter. ECR (7) is a generalization of the empirical log-likelihood ratio statistic and can also be defined as Let be a multinomial likelihood supported on the observed data, and suppose that we are interested in testing the null hypothesis , where is the true parameter vector. The empirical Cressie-Read family of test statistics for can be based on the following constrained minimization problem: As noted by Baggerly , a unique value for the right-hand side of (9) exists, provided that zero is inside the convex hull of the for given . Applying the Lagrange multiplier method, we can obtain the solution of by minimizing (9). Let where and are the Lagrange multipliers. Let ; we have where and the vector are determined by In the present paper, we only are interested in the case of . When , the estimator of is viewed as the empirical exponential tilting estimator (see [33, 34]). When and substituting (11) into (7), we obtain the following empirical Cressie-Read test statistic: Baggerly  had applied the empirical Cressie-Read test statistic to the general parameter case and shown that all members of the empirical Cressie-Read family have a Chi-squared calibration and enjoy the advantages of the empirical likelihood method. From the proof of Theorem 1 in this paper, we can obtain that where and . If satisfies , the estimator is called the maximum empirical Cressie-Read likelihood estimator (MECRLE) of the parameter .
3. Asymptotic Properties
To establish the asymptotic properties, we need the following regularity conditions.(C1)The matrix converges a.s. to an invertible positive definite matrix . This condition holds based on the weak law of large number when tends to infinity and the cluster size is fixed.(C2)The domain is a compact subset of and the true parameter value is in its interior. There exists a unique satisfying mean zero model assumption .(C3)Let where ; then is a.s. continuously differentiable in . Denote this derivative matrix by ; then has full column rank a.s.(C4)Assume that is continuous in a neighborhood of the true value and the rank of is . In addition, converges in probability to when converges in probability to .
Conditions (C1)–(C4) are actually quite mild and can be easily satisfied, and these conditions are also found in Qin and Lawless  and Qu et al. . Conditions (C1) and (C4) ensure that the asymptotic variance exists for the proposed estimator. Condition (C2) ensures that there exists a -consistent solution in the compact subset . (C3) is a common condition used in GLMs with longitudinal data.
Theorem 1. Assume that conditions (C1–C4) hold. Then, as , with probability tending to 1, attains its minimum value at some point in the interior of the ball and , , and satisfy where
Theorem 2. Suppose that the conditions of Theorem 1 hold, and further assume that is continuous about in a neighborhood of the true value . Then, as , one has where “” denotes the convergence in distribution and
Remark 3. When , the empirical log-likelihood ratio statistic is the special case of the empirical Cressie-Read test statistic .
Remark 4. The proposed method provides a way to find efficient estimates for marginal longitudinal GLMs when the within-subject correlation is considered. It is known that we can construct the confidence regions of using Theorem 2. The asymptotic variance of does not depend on any and can be consistently estimated by or by the same expression with the ’s, replaced by .
Theorem 5. Suppose that the conditions of Theorem 2 hold; then, under the hypothesis , is asymptotically a Chi-squared distribution with degrees of freedom as .
Theorem 5 allows us to use the empirical Cressie-Read test statistic for testing or obtaining the confidence regions for the parameter . For any , the confidence region of with asymptotic coverage can be determined by
We can construct a goodness-of-fit statistic to test the model assumption: We call the empirical Cressie-Read likelihood ratio statistic and call the model test statistic.
Theorem 6. Suppose that the conditions of Theorem 2 hold and has dimension and has dimension with . Then, under the model assumption (21), is asymptotically a Chi-squared distribution with degrees of freedom as ; under the null hypothesis , is asymptotically a Chi-squared distribution with degrees of freedom as .
4. Numerical Studies
4.1. Simulation Studies
In this subsection, we report the simulation study to illustrate the finite sample properties of the proposed ECR test statistic. For simplicity, we only compute the empirical likelihood (EL, ) and the empirical exponential tilting (ET, ) and compare the proposed methods with the GEE and QIF methods. Throughout the simulation study, each dataset comprised and 100 subjects and observations per subject over time. For each case, we repeat the experiment 500 times.
Consider the following logistic regression model. The response variable is binary and its marginal expectation given is where and the covariate has a multivariate normal distribution with mean zero, marginal variance 1, and an AR(1) correlation matrix with autocorrelation coefficient 0.7. The binary response vector for each cluster has mean specified by (22) and an AR(1) correlation structure with an autocorrelation coefficient . Table 1 reports the simulation results for the average coverage probabilities and the average lengths of the confidence intervals for , , and when the nominal level is 0.95 and the true correlation structure is AR(1) structure.
From Table 1, it is easy to see that the empirical likelihood method performs much better in terms of coverage accuracies of the confidence intervals even when the working correlation structure is misspecified. And when the working correlation structure is correctly specified, the performances of all the methods are usually slightly better. Table 1 also shows that the average coverage probabilities obtained by all methods tend to the nominal level 0.95 and the average lengths decrease as increases.
When sample size is 60 and the true correlation structure is AR(1), the histograms and the QQ plots of the 500 maximum empirical likelihood estimators , , and based on the empirical likelihood method are plotted in Figure 1 and the histograms and the QQ plots of the 500 maximum empirical exponential tilting estimators , , and based on the empirical exponential tilting method are plotted in Figure 2 under the misspecified and correct correlation structures, respectively. Figures 1 and 2 show that empirically these estimators are asymptotically normal even when the working correlation structure is misspecified.
4.2. Application to Real Data
To examine the performance of the proposed method, we analyze real data [35, 36] from a six-week frequent magnetic resonance imaging (MRI) substudy of the Betaseron clinical trial conducted at the University of British Columbia in relapsing-remitting multiple sclerosis involving 52 patients. The real data concerns a longitudinal clinical trial to assess the effects of neutralizing antibodies on interferon beta-1b (IFNB) in relapsing-remitting multiple sclerosis (MS), which is a disease that destroys the myelin sheath that surrounds the nerves. All patients were randomized into three treatment groups with allocation of 17 patients being treated by placebo, 17 by low dose, and 16 by high dose. This dataset has been studied by Song  and Li et al. . There exist the missing values in this dataset; for convenience, we only analyze the balanced longitudinal data which contain 39 patients.
For the analysis of this data, the binary response variable is exacerbation, which refers to whether an exacerbation began since the previous MRI scan, and is 1 for yes and 0 for no. Several baseline covariates are included in the model. They are treatment (trt), time () in weeks, squared time (), and duration of disease (dur) in years. Here trt is treated as an ordinal covariate with scale 0, 1, 2 representing zero (placebo), low, and high dosage of the drug treatment. We consider the following marginal logistic model for the data: where is the probability of exacerbation at visit for subject . Two correlation structures (exchangeable (CS) and AR(1)) are considered in this analysis. Table 2 reports the coefficient estimators and the corresponding standard errors for CS and AR(1) correlation structures, respectively. Table 3 reports the 95% confidence intervals and the lengths of the confidence intervals for CS and AR(1) correlation structures.
From Tables 2 and 3, we can see that the parameter estimators are very similar for all four methods under the CS and AR(1) working correlation structures. For the CS working correlation structure, the empirical likelihood and the empirical exponential tilting methods have smaller standard errors and smaller interval lengths than the GEE and QIF methods. However, the GEE method has the smaller standard errors and the smaller interval lengths under the AR(1) working correlation structure. Similar to the results in Song , we also find that the baseline disease severity measured as the duration of disease before the trial is an important explanatory variable associated with the risk of exacerbation, and we also do not find strong evidence that the drug treatment is effective in reducing the risk of exacerbation. Therefore, all the methods have comparable performance, and these findings are close to the existing analysis in Song .
5. Proofs of the Theorems
In this section, we present rigorous proofs of our results stated in Section 3.
Proof of Theorem 1. Assume that ; defined by (10) takes derivatives with respect to ; we have
which implies that
Multiplying at both sides of (25) and taking sum and noting that and , we can derive that
Substituting (26) into (25), we have
By Theorem 1 in Baggerly , there exists a constant such that
which implies that , where . Substituting this expression into (27), we have
Therefore, we can obtain that
where and . The next step is to bound the convergence rate of . Let , where . Invoking Lemma 3.2 in Künsch  and , it is easy to show that . Applying the Taylor expansion to , we have
where . Assume that , . By the central limit theorem, we have and . Let ; it is easy to show that
If (or ), then the dominant term in the right-hand side of (31) is , while , which implies that the sum cannot be zero unless . Therefore, we obtain that . By (31), we have
For convenience, let . We now find an approximation for in terms of . By and applying the Taylor expansion, we have where and is defined by (33). Rearranging (34) and ignoring terms of order , we have That is, From (33) and (36), we have By (33) and (37) and applying the Taylor expansion for , it is easy to show that
Similar to the proof of Lemma 1 in Qin and Lawless , let , for , where . We first give a lower bound for on the surface of the ball. When and , we can obtain that holds uniformly for . From (32), (38), and (39), we have where and is the smallest eigenvalue of Similarly, It is known that, when , is a continuous function about and has a minimum value in the interior of this ball. In addition, satisfies
Proof of Theorem 2. We take derivatives for , , and with respect to , , and ; we have By the conditions and Theorem 1 and the Taylor expanding of , , and at , we have where . We further have By the above expression, we can obtain that where