Research Article | Open Access
Adjusted Empirical Likelihood for Varying Coefficient Partially Linear Models with Censored Data
By constructing a suitably adjusted auxiliary vector, we propose an adjusted empirical likelihood ratio function for the parametric components of varying coefficient partially linear models with censored data. We show that its limiting distribution is a standard central chi-squared distribution, which allows confidence intervals for the parametric components to be constructed. A simulation study and a real data analysis are undertaken to assess the finite sample performance of the proposed method.
Outcome censoring occurs in many disciplines, such as econometrics, biostatistics, and bioinformatics, and there has been much recent research on statistical inference for censored data. Wang and Zheng [1] and Wang and Li [2] considered the estimation problem for partly linear models based on different methods. Yang et al. [3] proposed an empirical likelihood method for a partially linear single-index model with right censored data. More results on statistical inference for censored data can be found in [4–6]. In this paper, we consider empirical likelihood inference for a varying coefficient partially linear model with right censored data. Specifically, let $Y$ be the response variable and let $(X, Z, U)$ be the associated covariates. We consider the following varying coefficient partially linear model:
$$Y = X^{T}\beta + Z^{T}\theta(U) + \varepsilon, \tag{1}$$
where $\beta$ is a $p \times 1$ vector of unknown parameters, $\theta(\cdot)$ is a $q \times 1$ vector of unknown functions, and $\varepsilon$ is the model error with $E(\varepsilon \mid X, Z, U) = 0$. Due to the curse of dimensionality, as in [7], we assume that $U$ is univariate. In this paper, we assume that the available data $\{(T_i, \delta_i, X_i, Z_i, U_i),\ i = 1, \ldots, n\}$ are independent and identically distributed, where $T_i = \min(Y_i, C_i)$, $\delta_i = I(Y_i \le C_i)$, and $C_i$ is a censoring variable. We also assume that the $C_i$ are i.i.d. variables with a distribution function $G$ and that $C_i$ is independent of the prognostic variables $(X_i, Z_i, U_i)$ and the response $Y_i$.
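To make the right-censoring scheme concrete, the following Python sketch builds the observed pair from latent responses and censoring times: the observed time is the minimum of the response and the censoring time, and the indicator records whether the response was observed. The function name `censor`, the normal distributions, and the sample size are illustrative assumptions and are not the paper's simulation design.

```python
import numpy as np

def censor(y, c):
    """Return observed times min(y, c) and indicators 1{y <= c} (1 = uncensored)."""
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    t = np.minimum(y, c)
    delta = (y <= c).astype(int)
    return t, delta

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=500)  # hypothetical latent responses
c = rng.normal(loc=3.0, scale=1.0, size=500)  # hypothetical censoring times
t, delta = censor(y, c)
censoring_rate = 1.0 - delta.mean()           # fraction of censored responses
```

Only `t` and `delta` would be available to the analyst; the latent `y` is kept here solely to check the construction.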
Model (1) has proved to be very useful because it combines the flexibility of nonparametric models with the interpretability of linear models. Recently, a variety of methods have been proposed for estimating the parameters and nonparametric functions in model (1) based on uncensored data (see [7, 8]). In this paper, we use the empirical likelihood method to make inference for the parametric components of model (1) under right censoring. We define an empirical log-likelihood ratio function based on synthetic data and show that its asymptotic distribution is a mixture of central chi-squared distributions. Hence, confidence regions for the parametric components can be constructed once the unknown weights are estimated. In addition, by constructing a suitably adjusted auxiliary vector, we propose an adjusted empirical likelihood ratio function for the parametric components and show that its limiting distribution is a standard central chi-squared distribution. Confidence intervals for the parametric components can then be constructed without estimating the unknown weights. To assess the performance of the proposed method, we compare it with the normal approximation method in a simulation study and a real data analysis. In these experiments, the confidence intervals based on the empirical likelihood method perform well relative to those based on the normal approximation.
Confidence intervals obtained by the empirical likelihood method have several attractive features compared with conventional Wald-type confidence intervals: they do not require estimating the asymptotic variance to compute a standard error, their shape is determined by the data, and they are range preserving (see [9]). Wang and Li [2] considered adjusted empirical likelihood inference for a class of partially linear models with right censored data, and Yang et al. [3] did so for a partially linear single-index model with right censored data. This paper contributes to the rapidly growing literature on adjustment-based empirical likelihood methods: it establishes the Wilks phenomenon for the parametric components of varying coefficient partially linear models with censored data, which extends the range of models to which the adjusted empirical likelihood method applies. Further work on empirical likelihood methods can be found in [10, 11].
2. Methodology and Results
As in , let ; then we have . This implies that where . When is known as well, model (2) is a standard semiparametric varying coefficient partially linear model. For given , using the same arguments as Fan and Huang [7], we can get the weighted local least-squares estimator of by minimizing where , is a kernel function, is a bandwidth, and denotes the th component of .
Let , let , let be identity matrix, let be zero matrix, let be diagonal matrix. Then, the solution to minimize (3) can be given by where .
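The weighted local least-squares step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's exact matrices: it assumes the usual local linear form in the style of Fan and Huang, with hypothetical partial residuals `r` (response minus the linear part), a design with rows built from `Z` and `Z` times the scaled distance to the evaluation point, and an Epanechnikov kernel. All function and variable names are introduced here for illustration.

```python
import numpy as np

def epanechnikov(v):
    """Epanechnikov kernel 0.75 * (1 - v^2) on [-1, 1], zero elsewhere."""
    return 0.75 * np.clip(1.0 - v**2, 0.0, None)

def local_linear_coef(u0, U, Z, r, h):
    """Local linear estimate of the coefficient functions at u0.

    U: (n,) index variable, Z: (n, q) covariates with varying coefficients,
    r: (n,) partial residuals, h: bandwidth.
    """
    d = (U - u0) / h
    w = epanechnikov(d) / h                 # kernel weights K_h(U_i - u0)
    D = np.hstack([Z, Z * d[:, None]])      # local linear design, shape (n, 2q)
    WD = D * w[:, None]
    sol = np.linalg.solve(D.T @ WD, WD.T @ r)
    return sol[: Z.shape[1]]                # first q entries estimate theta(u0)

# Hypothetical noise-free example: theta_1(u) = sin(2*pi*u), theta_2(u) = u.
rng = np.random.default_rng(0)
n = 2000
U = rng.uniform(0.0, 1.0, size=n)
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.column_stack([np.sin(2 * np.pi * U), U])
r = np.sum(Z * theta_true, axis=1)
theta_hat = local_linear_coef(0.5, U, Z, r, h=0.1)
```

Keeping only the first block of the solution corresponds to selecting the intercept terms of the local linear fit with an identity block and a zero block.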
Let , let , and let . Then we have . Substituting this into (2) and by a simple calculation we have where , . To construct the empirical likelihood ratio function for , we introduce the auxiliary random vector
By (5) we have that is true if and only if . Using such information, we can define a profile empirical log-likelihood ratio function for . However, is usually unknown, and cannot be used directly to make inference for . To solve this problem, we replace in by its estimator. In this paper, we employ the Kaplan-Meier estimator where . Hence, an estimator of can be defined as where and . Then, we can define the profile empirical log-likelihood ratio function for as where , . A unique value for exists, provided that is inside the convex hull of the points . Based on the method of Lagrange multipliers to find the optimal , then the empirical log-likelihood ratio function can be represented as where is a vector, which satisfies Let The following theorem gives the asymptotic distribution of .
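As an illustration of the plug-in step, the sketch below computes a Kaplan-Meier estimate of the censoring survival function and uses it to form synthetic responses. It assumes the commonly used synthetic-data transformation (uncensored time divided by the censoring survival probability at that time, and zero for censored observations), no ties among the observed times, and hypothetical uniform and exponential distributions. The function names are introduced here and do not come from the paper.

```python
import numpy as np

def km_censoring_survival(t, delta):
    """Kaplan-Meier estimate of 1 - G at each observed time.

    Censoring events (delta == 0) play the role of 'failures' here,
    because the target is the distribution of the censoring variable.
    """
    n = len(t)
    order = np.argsort(t, kind="stable")
    d_sorted = delta[order]
    ranks = np.arange(1, n + 1)
    factors = np.where(d_sorted == 0, (n - ranks) / (n - ranks + 1.0), 1.0)
    surv = np.empty(n)
    surv[order] = np.cumprod(factors)
    return surv

def synthetic_response(t, delta, surv):
    """Assumed transformation: delta * t / (1 - G_hat(t)); zero when censored."""
    y_syn = np.zeros(len(t))
    mask = delta == 1
    y_syn[mask] = t[mask] / surv[mask]
    return y_syn

rng = np.random.default_rng(1)
y = rng.uniform(0.0, 2.0, size=5000)       # hypothetical responses, mean 1
c = rng.exponential(scale=4.0, size=5000)  # hypothetical censoring times
t = np.minimum(y, c)
delta = (y <= c).astype(int)
surv = km_censoring_survival(t, delta)
y_syn = synthetic_response(t, delta, surv)
```

In this example the average of `y_syn` should be close to the mean of the latent responses, which is the property that lets the synthetic responses stand in for the unobserved ones.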
Theorem 1. Suppose that conditions (C1)–(C5) in the Appendix hold. If is the true parameter, then where are the eigenvalues of , , and are independent standard chi-square random variables with 1 degree of freedom.
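A limiting distribution of this weighted form has no closed-form quantile in general, which is why the weights must be estimated before it can be used. As a rough sketch, assuming the weights are already available (the values passed below are hypothetical), a critical value can be approximated by Monte Carlo:

```python
import numpy as np

def mixture_chi2_quantile(weights, level=0.95, n_draws=200_000, seed=0):
    """Monte Carlo quantile of sum_i w_i * chi2_1 with independent terms."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    draws = rng.chisquare(df=1, size=(n_draws, w.size)) @ w
    return float(np.quantile(draws, level))

q_one = mixture_chi2_quantile([1.0])        # reduces to a chi-square with 1 df
q_two = mixture_chi2_quantile([1.0, 1.0])   # reduces to a chi-square with 2 df
q_mix = mixture_chi2_quantile([0.5, 1.5])   # hypothetical unequal weights
```

When all weights equal one, the mixture collapses to an ordinary chi-square distribution, which is the situation the adjusted statistic is designed to reach.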
From Theorem 1, we can see that the asymptotic distribution of is a mixture of central chi-square distributions. Hence, confidence regions for the parametric components can be constructed only if the unknown weights are estimated. Next, we give an adjusted empirical log-likelihood ratio function that has an asymptotic standard chi-square distribution, so that confidence regions for the parametric components can be constructed without estimating the unknown weights. Let be the Kaplan-Meier estimator for the distribution function of , and Then, as in , an adjusted empirical log-likelihood ratio function can be defined by where . The following theorem shows that the asymptotic distribution of can be approximated by a standard chi-square distribution.
Theorem 2. Suppose that conditions (C1)–(C5) in the Appendix hold. If is the true parameter, then
Let be the quantile of , . Theorem 2 implies that an approximate confidence region for can be defined by
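To show how a region of this kind is evaluated in practice, the sketch below computes a generic empirical log-likelihood ratio by solving for the Lagrange multiplier with a damped Newton iteration and then compares the statistic with a chi-square critical value. It is a simplified illustration in the spirit of Owen's construction: the input `g` is any array of auxiliary vectors (here, hypothetically, observations minus a candidate mean), and the paper's adjustment factor and censoring-specific auxiliary vector are not reproduced.

```python
import numpy as np

def _log_sum(lam, g):
    """Sum of log(1 + lam'g_i), or -inf if any implied weight is nonpositive."""
    arg = 1.0 + g @ lam
    if np.any(arg <= 0.0):
        return -np.inf
    return float(np.sum(np.log(arg)))

def el_log_ratio(g, max_iter=100, tol=1e-10):
    """Return (-2 log EL ratio, multiplier) for auxiliary vectors g of shape (n, p).

    Assumes the zero vector lies inside the convex hull of the rows of g;
    otherwise the maximization has no finite solution.
    """
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]
    lam = np.zeros(g.shape[1])
    for _ in range(max_iter):
        scaled = g / (1.0 + g @ lam)[:, None]
        grad = scaled.sum(axis=0)            # estimating equation for the multiplier
        if np.linalg.norm(grad) < tol:
            break
        step = np.linalg.solve(scaled.T @ scaled, grad)  # Newton direction
        current = _log_sum(lam, g)
        t = 1.0
        while _log_sum(lam + t * step, g) < current and t > 1e-12:
            t *= 0.5                         # halve until feasible and not worse
        lam = lam + t * step
    return 2.0 * _log_sum(lam, g), lam

x = np.array([1.0, 2.0, 3.0, 4.0])           # hypothetical scalar sample
stat_at_mean, _ = el_log_ratio(x - x.mean())
stat_off, lam_off = el_log_ratio(x - 2.0)
CHI2_1_95 = 3.841459                         # 0.95 quantile of chi-square, 1 df
in_region = stat_off <= CHI2_1_95
```

A candidate parameter value belongs to the approximate confidence region exactly when its statistic does not exceed the critical value, so scanning candidates traces out the region.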
3. Numerical Results
In this section, we conduct several simulation experiments to illustrate the finite sample performances of the proposed method and consider a real data set analysis for further illustration.
3.1. Simulation Studies
To evaluate the performance of the proposed adjustment-based empirical likelihood (AEL) method, we present some simulation results. The data are simulated from the following model: where , . We generated and subjects, respectively, with covariates , , and . is generated according to the model with . The censoring variable , where , and , respectively, such that the corresponding censoring rates (CR) are about , and . We use the Epanechnikov kernel function , and the cross-validation bandwidth is obtained by minimizing where and are estimators of and computed from all of the measurements except those of the th subject. can be obtained by minimizing (10), and can be obtained by replacing , and in (4) by and , respectively.
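The leave-one-out idea behind the cross-validation bandwidth can be sketched as follows. This is a simplified illustration under stated assumptions: the parametric part is treated as already removed, so `r` holds hypothetical partial residuals, whereas the criterion described above also re-estimates the parametric component with each subject deleted. The data-generating choices, the bandwidth grid, and all names are assumptions made for the example.

```python
import numpy as np

def epanechnikov(v):
    """Epanechnikov kernel 0.75 * (1 - v^2) on [-1, 1], zero elsewhere."""
    return 0.75 * np.clip(1.0 - v**2, 0.0, None)

def loo_cv_score(h, U, Z, r):
    """Mean squared leave-one-out prediction error of a local linear fit."""
    n, q = Z.shape
    score = 0.0
    for i in range(n):
        d = (U - U[i]) / h
        w = epanechnikov(d) / h
        w[i] = 0.0                           # delete the i-th subject from the fit
        D = np.hstack([Z, Z * d[:, None]])
        WD = D * w[:, None]
        # lstsq tolerates a near-singular local design at small bandwidths
        coef = np.linalg.lstsq(D.T @ WD, WD.T @ r, rcond=None)[0]
        score += (r[i] - Z[i] @ coef[:q]) ** 2
    return score / n

def cv_bandwidth(grid, U, Z, r):
    """Return the grid value minimizing the leave-one-out score, plus all scores."""
    scores = np.array([loo_cv_score(h, U, Z, r) for h in grid])
    return float(grid[int(np.argmin(scores))]), scores

rng = np.random.default_rng(2)
n = 150
U = rng.uniform(0.0, 1.0, size=n)
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = np.column_stack([np.sin(2 * np.pi * U), U])
r = np.sum(Z * theta, axis=1) + 0.3 * rng.normal(size=n)
grid = np.array([0.1, 0.15, 0.2, 0.3, 0.5])
h_cv, scores = cv_bandwidth(grid, U, Z, r)
```

A grid search is used because the score is cheap to evaluate at this sample size; a continuous optimizer could replace it for larger problems.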
For comparison, we consider two approaches for constructing the confidence intervals: the AEL method proposed in this paper and the normal approximation (NA) method proposed by [12]. To construct confidence intervals based on the NA method, we need to estimate the asymptotic variance. Because the asymptotic variance of the estimator has a complicated structure, we estimate it by the bootstrap method. The average lengths and coverage probabilities of the confidence intervals are shown in Table 1. Here the nominal level is taken as 0.95, and the simulation is computed with simulation runs. From Table 1, we make the following observations.
(i) For any given CR and sample size, although the coverage probabilities of the confidence intervals obtained by the AEL method and the NA method are similar, the confidence intervals obtained by the AEL method are shorter than those obtained by the NA method. This implies that the confidence intervals obtained by the AEL method outperform those obtained by the NA method.
(ii) For the given sample size , the coverage probabilities of the confidence intervals obtained by the AEL method under moderate censoring rates are all close to the nominal level of 95%. This implies that the adjustment scheme is workable for moderate censoring rates. However, the simulation results also suggest that larger samples would be needed when the censoring rate is relatively high, such as .
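The bootstrap variance estimate used for the NA intervals can be sketched generically as below. This is an illustration under assumptions: `estimator` is any function mapping a data array (one row per subject) to an estimate, the sample mean is a hypothetical stand-in for the actual estimator of the parametric component, and the number of bootstrap replications is arbitrary.

```python
import numpy as np

def bootstrap_se(estimator, data, n_boot=500, seed=0):
    """Bootstrap standard error: resample whole subjects (rows) with replacement."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        reps.append(estimator(data[idx]))
    return np.std(np.asarray(reps), axis=0, ddof=1)

def wald_interval(estimate, se, z=1.959964):
    """Normal-approximation interval at the 0.95 level."""
    return estimate - z * se, estimate + z * se

def mean_est(d):
    """Hypothetical stand-in estimator: the mean of the first column."""
    return d[:, 0].mean()

rng = np.random.default_rng(3)
data = rng.normal(loc=1.0, scale=1.0, size=(400, 1))
est = mean_est(data)
se = bootstrap_se(mean_est, data)
lower, upper = wald_interval(est, se)
```

With censored data, each resampled row would carry the full record for a subject (observed time, censoring indicator, and covariates) so that the censoring structure is preserved in every bootstrap sample.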
3.2. Application to CGD Data
We illustrate the applicability of our proposed method using the chronic granulomatous disease (CGD) data set from the International CGD Cooperative Study Group. The trial was designed to have a single interim analysis when the follow-up data as of July 15, 1989 were complete. The monitoring committee terminated the trial at a meeting on September 22, 1989. The treatment given to each patient was unblinded at the first scheduled visit for the patient following the decision of the monitoring committee. More details on these data and their analysis can be found in [13].
The variables contained here are: treatment code (rIFN or placebo); pattern of inheritance (X-linked or autosomal recessive); age, in years; height, in cm; weight, in kg; use of corticosteroids at time of study entry (yes or no); use of prophylactic antibiotics at time of study entry (yes or no); sex (male or female); hospital category (US-NIH; US-other; Europe, Amsterdam; or Europe, other); elapsed time (in days) from randomization to diagnosis of a serious infection or, for a censored observation, from randomization to the censoring date; censoring indicator (noncensored or censored observation); and sequence number. For each patient, the infection records are in sequence number order.
Based on generalized likelihood ratio testing, Jiang and Qian [14] showed that this data set can be fitted by the following model: where , , and as the intercept term. The estimators and confidence intervals are shown in Table 2. From Table 2, we see that the covariates , , and have somewhat stronger associations with the response. In addition, the confidence intervals obtained by the AEL method are workable.
A. Proof of Theorems
For convenience and simplicity, let denote a positive constant whose value may differ at each appearance throughout this paper. Before we prove our main theorems, we list the regularity conditions used in this paper.
(C1) The bandwidth , for some constant . The kernel is a symmetric probability density function, and .
(C2) , , , and are twice continuously differentiable on , where .
(C3) The density function of , say , is bounded away from zero and infinity on and is continuously differentiable on .
(C4) For , and do not have common jumps, where . Moreover, we assume that
(C5) For given , is a positive definite matrix, and is nonsingular.
The proofs of theorems rely on the following lemmas.
Lemma 3. Let be i.i.d. random vectors, where are scalar random variables. Assume that and , where denotes the joint density of . Let be a bounded positive function with bounded support, satisfying a Lipschitz condition. Then provided that , for some .
Proof. This follows immediately from the result obtained by Mack and Silverman [15].
Lemma 4. Suppose that conditions (C1)–(C5) hold. Then where .
Proof. Let . Similar to the proof of Lemma 7.2 in , a simple calculation yields
Note that , by Lemma 3, we obtain
uniformly for , where is the Kronecker product. Using the same argument, we have
uniformly for . Combining (A.5) and (A.6) yields
uniformly for .
Invoking , by a similar argument, we can prove . This completes the proof of Lemma 4.
Lemma 5. Suppose that conditions (C1)–(C5) hold. If is the true parameter, then where is defined in Theorem 1.
Proof. By (8), we have
A simple calculation yields
By the central limit theorem, it is easy to prove . Next, we prove . Using Abel's inequality, invoking , and using Lemma 4, we can prove that that is, . Invoking , by a similar argument, we can prove . In addition, by Lemma 4, we have . Hence, we get that By Wang and Zheng [1], we have Then, using an argument similar to that of Yang et al. [3], we can prove that Together with (A.12), this completes the proof of Lemma 5.
Proof of Theorem 1. From the proof of Lemma 5, it is easy to show that Then, using arguments similar to those of Owen [9], we can obtain . Together with Lemma 5, applying the Taylor expansion to (10), we get that By (11), it follows that Then, it is easy to show that Invoking (A.16)–(A.18), after some algebra, we have where . Invoking the proof of Lemma 5, we can obtain that . Hence, we get where . Let be the eigenvalues of , and let . Note that has the same eigenvalues as . Hence, there exists an orthogonal matrix such that . Then, by a simple calculation, we get that Note that is an orthogonal matrix; hence, by Lemma 5, Theorem 1 follows.
This research was supported by the National Natural Science Foundation of China (11101119 and 11126332), the National Social Science Foundation of China (11CTJ004), the Natural Science Foundation of Guangxi (2010GXNSFB013051), and the Philosophy and Social Sciences Foundation of Guangxi (11FTJ002).
1. Q. Wang and Z. Zheng, “Asymptotic properties for the semiparametric regression model with randomly censored data,” Science in China A, vol. 40, no. 9, pp. 945–957, 1997.
2. Q. H. Wang and G. Li, “Empirical likelihood semiparametric regression analysis under random censorship,” Journal of Multivariate Analysis, vol. 83, no. 2, pp. 469–486, 2002.
3. Y. P. Yang, L. G. Xue, and W. H. Cheng, “An empirical likelihood method in a partially linear single-index model with right censored data,” Acta Mathematica Sinica, vol. 28, no. 5, pp. 1041–1060, 2012.
4. X. Lu and T. L. Cheng, “Randomly censored partially linear single-index models,” Journal of Multivariate Analysis, vol. 98, no. 10, pp. 1895–1922, 2007.
5. Q. Wang and L. Xue, “Statistical inference in partially-varying-coefficient single-index model,” Journal of Multivariate Analysis, vol. 102, no. 1, pp. 1–19, 2011.
6. Z. Huang, “Empirical likelihood-based inference in varying-coefficient single-index models,” Journal of the Korean Statistical Society, vol. 40, no. 2, pp. 205–215, 2011.
7. J. Fan and T. Huang, “Profile likelihood inferences on semiparametric varying-coefficient partially linear models,” Bernoulli, vol. 11, no. 6, pp. 1031–1057, 2005.
8. W. Zhang, S. Y. Lee, and X. Song, “Local polynomial fitting in semivarying coefficient model,” Journal of Multivariate Analysis, vol. 82, no. 1, pp. 166–188, 2002.
9. A. B. Owen, “Empirical likelihood ratio confidence regions,” Annals of Statistics, vol. 18, no. 1, pp. 90–120, 1990.
10. L. Xue, “Empirical likelihood local polynomial regression analysis of clustered data,” Scandinavian Journal of Statistics, vol. 37, no. 4, pp. 644–663, 2010.
11. Y. P. Yang, L. G. Xue, and W. H. Cheng, “Empirical likelihood for a partially linear model with covariate data missing at random,” Journal of Statistical Planning and Inference, vol. 139, no. 12, pp. 4143–4153, 2009.
12. X. Luo, Y. Li, Y. Ma, and Y. Zhou, “Varying-coefficient partially linear models with censored data,” Acta Mathematica Scientia, vol. 26, pp. 79–92, 2010.
13. T. R. Fleming and D. P. Harrington, Counting Processes and Survival Analysis, John Wiley and Sons, New York, NY, USA, 2005.
14. R. Jiang and W. M. Qian, “Generalized likelihood ratio tests for varying-coefficient models with censored data,” Open Journal of Statistics, vol. 1, no. 1, pp. 19–23, 2011.
15. Y. P. Mack and B. W. Silverman, “Weak and strong uniform consistency of kernel regression estimates,” Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 61, no. 3, pp. 405–415, 1982.
Copyright © 2013 Peixin Zhao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.