Abstract
Imputation is a popular technique for handling missing data, especially when many values are missing. The empirical log-likelihood ratio statistic under imputation is usually asymptotically scaled chi-squared because the imputed data are not i.i.d. Recently, a bias-corrected technique has been used to study the linear regression model with missing response data, and the resulting empirical likelihood ratio is asymptotically chi-squared. However, because it relies on a nonparametric estimator of the selection probability function, it may suffer from “the curse of dimensionality” in multidimensional linear regression models. In this paper, a parametric selection probability function is introduced to avoid the dimension problem. With a similar bias-corrected method, the proposed empirical likelihood statistic is asymptotically chi-squared when the selection probability is specified correctly and asymptotically scaled chi-squared when it is specified incorrectly. In addition, our empirical likelihood estimator is consistent whether or not the selection probability is specified correctly, and it achieves full efficiency when the specification is correct. A simulation study indicates that the proposed method is comparable in terms of coverage probabilities.
1. Introduction
Consider the following multidimensional linear regression model: where is a scalar response variable, is a vector of design variables, is a vector of regression parameters, and the errors are independent random variables with , . Suppose that we have incomplete observations , , from this model, where all the are observed, and if is missing, otherwise. Throughout this paper, we assume that is missing at random (MAR); that is, the probability that is missing may depend on but not on , that is, . We focus on constructing confidence regions for with the incomplete data , .
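The MAR setup described above can be illustrated with a minimal simulation sketch. Everything specific in it is an assumption made for illustration only: the helper name `simulate_mar_data`, the logistic form of the selection probability, the coefficient values, and the convention that the indicator equals 1 when the response is observed and 0 when it is missing.

```python
import numpy as np

def simulate_mar_data(n, beta, pi_fn, sigma=1.0, seed=0):
    """Draw (X, Y_obs, delta) from a linear model with responses missing at random."""
    rng = np.random.default_rng(seed)
    p = len(beta)
    X = rng.standard_normal((n, p))                # covariates, always observed
    Y = X @ beta + sigma * rng.standard_normal(n)  # full responses before masking
    prob = pi_fn(X)                                # P(delta = 1 | X): depends on X only (MAR)
    delta = (rng.random(n) < prob).astype(int)     # 1 = observed, 0 = missing
    Y_obs = np.where(delta == 1, Y, np.nan)        # missing responses become NaN
    return X, Y_obs, delta

def logistic_pi(X):
    """Hypothetical selection probability, chosen only for this illustration."""
    return 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * X[:, 0])))

X, Y_obs, delta = simulate_mar_data(500, np.array([1.0, -1.0]), logistic_pi)
missing_rate = 1.0 - delta.mean()
```

Because the missingness probability is computed from the covariates alone, the indicator carries no extra information about the response once the covariates are given, which is the MAR assumption used throughout.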
The empirical likelihood (EL) method, introduced by Owen [1], has many advantages over normal approximation methods for constructing confidence intervals [1]. One is that it produces confidence intervals and regions whose shape and orientation are determined entirely by the data; another is that empirical likelihood regions are range preserving and transformation respecting. However, it cannot be applied directly when responses are missing. A natural approach, proposed by Wang and Rao [2], is to impute the predictor of based on the completely observed pairs. Unfortunately, the empirical log-likelihood ratio under this imputation is always asymptotically scaled chi-squared. Therefore, it cannot be applied directly to make statistical inference on the parameter.
Recently, a bias-corrected technique that combines the imputation method with the Horvitz-Thompson inverse-selection weighted method was explored separately by Xue [3] and Qin et al. [4]. The resulting empirical likelihood ratios obey Wilks' theorem, and the empirical likelihood estimator is consistent. However, the true selection probability function is ordinarily unknown, and its nonparametric estimator may suffer from “the curse of dimensionality” in multidimensional linear regression models.
To avoid “the curse of dimensionality”, it is customary to assume a parametric selection probability function, at the risk that the function is misspecified. In this paper, we consider two situations: in the first, the selection probability is specified correctly; in the second, it is specified incorrectly. To the best of our knowledge, this issue has rarely been discussed. With a similar bias-corrected method, the proposed empirical likelihood statistic is asymptotically chi-squared in the first situation and asymptotically a weighted sum of chi-squared variables in the second. In addition, the following desirable feature is worth mentioning. The auxiliary random vector used to construct the empirical likelihood statistic has the same form as that proposed by Robins et al. [5]. The estimator of Robins et al. is “doubly robust”; that is, it is asymptotically unbiased if either the underlying missing data mechanism or the underlying regression function is correctly specified. Because the underlying regression function is asymptotically correct in our situation, our estimator is consistent in both cases. Moreover, our estimator achieves full asymptotic efficiency in the first situation. In this sense, it is feasible to use a parametric selection probability function to construct an EL statistic.
The rest of the paper is organized as follows. In Section 2, the EL statistic for is constructed and the asymptotic results are given for each of the two situations. Section 3 reports simulation results on the performance of the proposed EL confidence region. The proofs of the main results are given in the Appendix.
2. Main Result
For simplicity, denote the true selection probability function by and a specified probability distribution function by for given , an unknown vector parameter. Thus, the first situation means for some , where is the true parameter, and the second situation means for any . Let be the least squares estimator of based on the completely observed pairs , that is, .
2.1. Empirical Likelihood
In this subsection, an empirical likelihood statistic is constructed, and some asymptotic results are given for the case where the selection probability is specified correctly, that is, for some .
Since the design variable is observable for each subject, the maximum likelihood estimator can be obtained by maximizing the likelihood function
The following regularity assumptions on are sufficiently strong to ensure both consistency and asymptotic normality of the maximum likelihood estimator. Suppose is some neighborhood of , where is the true parameter:
(C1) , and exist in for all ;
(C2) in , and ;
(C3) , , .
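As an illustration of estimating a parametric selection probability by maximum likelihood, the sketch below fits a logistic working model by Newton-Raphson on the Bernoulli log-likelihood of the missingness indicators. The logistic form, the intercept term, and the function name `fit_logistic_selection` are assumptions made here for illustration; the text does not fix a particular parametric family at this point.

```python
import numpy as np

def fit_logistic_selection(X, delta, n_iter=50, tol=1e-10):
    """Maximum likelihood fit of P(delta = 1 | x) = 1 / (1 + exp(-(a + x'b)))."""
    n = X.shape[0]
    W = np.column_stack([np.ones(n), X])   # design matrix with an intercept column
    theta = np.zeros(W.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(W @ theta)))
        score = W.T @ (delta - prob)                       # gradient of the log-likelihood
        info = (W * (prob * (1.0 - prob))[:, None]).T @ W  # observed information matrix
        step = np.linalg.solve(info, score)
        theta = theta + step                               # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Illustrative check on simulated indicators with hypothetical true parameters.
rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 2))
theta_true = np.array([1.0, 0.5, -0.5])
p_true = 1.0 / (1.0 + np.exp(-(theta_true[0] + X @ theta_true[1:])))
delta = (rng.random(5000) < p_true).astype(int)
theta_hat = fit_logistic_selection(X, delta)
pi_hat = 1.0 / (1.0 + np.exp(-(theta_hat[0] + X @ theta_hat[1:])))
```

Only the covariates and the indicators enter the fit, which is why the estimator is available even though some responses are missing.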
Then, we use , , as the “complete” data set for to construct the auxiliary random vectors Thus, an empirical log-likelihood ratio is defined as Further, the maximum empirical likelihood estimator of is obtained by maximizing .
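The construction above can be sketched numerically as follows. The exact auxiliary vectors are not recoverable from the text here, so the form used in `auxiliary_vectors` is an assumption: an inverse-probability-weighted response augmented with the complete-case least squares prediction, in the spirit of the Robins-type form mentioned in the Introduction. The function names, the simulated data, and the use of the true selection probability in place of an estimate are likewise illustrative. The empirical log-likelihood ratio is computed through its Lagrange multiplier with a damped Newton iteration.

```python
import numpy as np

def auxiliary_vectors(X, Y_obs, delta, pi_hat, beta):
    """Assumed bias-corrected auxiliary vectors Z_i(beta) = X_i (Y_tilde_i - X_i' beta)."""
    obs = delta == 1
    # Least squares estimator based on the completely observed pairs.
    beta_r = np.linalg.lstsq(X[obs], Y_obs[obs], rcond=None)[0]
    w = delta / pi_hat                           # inverse selection weights
    Y_filled = np.where(obs, Y_obs, 0.0)         # missing entries get weight 0 below
    Y_tilde = w * Y_filled + (1.0 - w) * (X @ beta_r)
    return X * (Y_tilde - X @ beta)[:, None]

def el_log_ratio(Z, n_iter=100, tol=1e-10):
    """Return 2 * sum(log(1 + lam' Z_i)), where lam solves sum Z_i / (1 + lam' Z_i) = 0."""
    lam = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        w = 1.0 + Z @ lam
        Zw = Z / w[:, None]
        grad = Zw.sum(axis=0)
        step = np.linalg.solve(Zw.T @ Zw, grad)  # Newton direction for the concave dual
        t, current = 1.0, np.log(w).sum()
        while t > 1e-12:                         # halve until weights stay positive
            w_new = 1.0 + Z @ (lam + t * step)
            if np.all(w_new > 0) and np.log(w_new).sum() >= current:
                break
            t /= 2.0
        else:
            t = 0.0                              # no acceptable step was found
        lam = lam + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return 2.0 * np.log(1.0 + Z @ lam).sum()

# Illustrative data; coefficients and selection probability are hypothetical.
rng = np.random.default_rng(0)
n, beta_true = 400, np.array([1.0, -1.0])
X = rng.standard_normal((n, 2))
pi_true = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * X[:, 0])))
delta = (rng.random(n) < pi_true).astype(int)
Y_obs = np.where(delta == 1, X @ beta_true + rng.standard_normal(n), np.nan)

stat_true = el_log_ratio(auxiliary_vectors(X, Y_obs, delta, pi_true, beta_true))
stat_far = el_log_ratio(auxiliary_vectors(X, Y_obs, delta, pi_true, beta_true + 0.5))
# With 2 degrees of freedom the 0.95 chi-square quantile equals -2 * log(0.05).
inside_95 = stat_true <= -2.0 * np.log(0.05)
```

The iteration assumes that the zero vector lies inside the convex hull of the auxiliary vectors; when it does not, the statistic is unbounded and the loop simply stops after `n_iter` steps with a large value.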
To ensure the asymptotic results, the following assumptions are needed:
(C4) is uniformly continuous in for all ;
(C5) is a positive definite matrix, where and ;
(C6) is bounded almost surely and in .
Condition (C4) is a common assumption on the selection probability function. Conditions (C5)-(C6) are necessary for asymptotic normality of the maximum empirical likelihood estimator.
Theorem 2.1. Suppose that conditions (C1)–(C6) above hold. If is the true parameter and is specified correctly, then
where denotes a chi-square variable with degrees of freedom and denotes convergence in distribution.
Let be the quantile of for . Using Theorem 2.1, we obtain an approximate confidence region for , defined by
Theorem 2.1 can also be used to test the hypothesis . One rejects if .
Remark 2.2. In general, a plug-in empirical likelihood statistic converges asymptotically to a weighted sum of variables with unknown weights. However, when is specified correctly, the EL statistic with two plug-ins has the limiting distribution of , for the following reasons. First, the bias-correction method, which uses the selection function as an inverse weight, eliminates the influence of . Second, the estimating function has a special structure: the influence of is also eliminated if is included in the denominator of the function.
Theorem 2.3. Under Conditions (C1)–(C6), if is specified correctly, then where .
To apply Theorem 2.3 to construct the confidence region of , we give the estimator of , say , where and are defined by , .
It is easily proved that is a consistent estimator of . Thus, by Theorem 2.3, we have where is an identity matrix of order . Using (10.2d) in Arnold [6], we can obtain Therefore, the confidence region of can be constructed by using (2.8).
2.2. Adjusted Empirical Likelihood When
In this subsection, we also construct the empirical likelihood statistic and discuss some asymptotic results when the selection probability is specified incorrectly, that is, for any .
Since the design variable is observable for each subject, the quasi-maximum likelihood estimator , rather than the maximum likelihood estimator, can be obtained by maximizing the likelihood function under some regularity assumptions.
The following regularity assumptions are sufficiently strong to ensure both consistency and asymptotic normality of the quasi-maximum likelihood estimator. Let , , , . Naturally, is the true density function of , and does not contain the true structure . The Kullback-Leibler Information Criterion (KLIC) can be defined by . Here, and in what follows, expectations are taken with respect to the true distribution . When expectations of the partial derivatives exist, we define the matrices , .
(C7) is measurable on .
(C8) are measurable in for every in , a compact subset of a , and continuous for every in .
(C9) (a) exists and for all in , where is integrable with respect to ; (b) has a unique minimum at in .
(C10) , , are measurable functions of for each in and twice continuously differentiable functions of for each in .
(C11) and , are dominated by functions integrable with respect to for all in and in .
(C12) (a) is interior to ; (b) is nonsingular; (c) is a regular point of .
Then, we use , as the “complete” data set for to construct the auxiliary random vectors Thus, an empirical log-likelihood ratio is defined as The maximum empirical likelihood estimator of is obtained by maximizing .
To ensure the asymptotic results, the following assumptions are needed:
(C4′) is uniformly continuous in for all ;
(C5′) is a positive definite matrix, where , , , , , , and ;
(C6′) is bounded almost surely and .
Condition (C4′) is a common assumption on the selection probability function. Conditions (C5′)-(C6′) are necessary for asymptotic normality of the maximum empirical likelihood estimator.
Theorem 2.4. Suppose that conditions (C7)–(C12) and (C4′)–(C6′) above hold. If is the true parameter and is specified incorrectly, then where denotes a chi-square variable with 1 degree of freedom, the weights , are the eigenvalues of the matrix , and denotes convergence in distribution.
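To see how a weighted sum of independent chi-square variables with 1 degree of freedom differs from a plain chi-square limit, its quantiles can be approximated by Monte Carlo. The sketch below is only an illustration: the function name `weighted_chi2_quantile` and the example weights are hypothetical, and in practice the weights would have to be estimated as the eigenvalues of the matrix in Theorem 2.4.

```python
import numpy as np

def weighted_chi2_quantile(weights, q=0.95, n_draws=200_000, seed=0):
    """Monte Carlo q-quantile of sum_j weights[j] * chi2_1 with independent terms."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    # Each row holds one independent chi-square(1) draw per weight.
    draws = rng.chisquare(df=1, size=(n_draws, w.size)) @ w
    return float(np.quantile(draws, q))

# Equal unit weights recover the ordinary chi-square case with 2 degrees of freedom.
q_equal = weighted_chi2_quantile([1.0, 1.0])
# Hypothetical unequal weights, as could arise under a misspecified selection probability.
q_unequal = weighted_chi2_quantile([1.5, 0.5])
```

When all weights equal one, the result should be close to the usual chi-square quantile, which gives a quick sanity check on the simulation size.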
Let be the adjustment factor. Along the lines of [7], it is straightforward to show that .
Let , , . We define an adjusted log-likelihood ratio by
Corollary 2.5. Under the conditions of Theorem 2.4, one has
Let be the quantile of for . Using Corollary 2.5, we obtain an approximate confidence region for , defined by Corollary 2.5 can also be used to test the hypothesis . One rejects if .
Note that when is close to the correct one. In fact, the adjustment factor reflects the information loss due to the misspecification of .
Theorem 2.6. Under the conditions of Theorem 2.4, one has where .
To apply Theorem 2.6 to construct the confidence region of , we give the estimator of , say , where and are defined by , , , , , , .
It is easily proved that is a consistent estimator of . Thus, by Theorem 2.6, we have where is an identity matrix of order . Using (10.2d) in Arnold [6], we can obtain Therefore, the confidence region of can be constructed by using (2.18).
Remark 2.7. The estimator proposed by Robins et al. [5] in this situation is the solution of Note that , that is, in this case the underlying regression function is asymptotically correctly specified. Therefore, whether or not the missing data mechanism is specified correctly, the estimator is consistent, and it achieves full asymptotic efficiency when the mechanism is specified correctly.
3. Simulation
Because of the curse of dimensionality in nonparametric estimation, Xue’s method may be hard to implement. We therefore conducted an extensive simulation study with four-dimensional covariates to compare the performance of the weighted-corrected empirical likelihood (WCEL) proposed in this paper with that of the method of Wang and Rao [2] (AEL).
We considered the linear model (1.1) with and , where was generated from a four-dimensional standard normal distribution, and was generated from the normal distribution with mean zero and variance 0.04.
In the first case, the true selection probability function was taken to be . We set equal to each of the following three values: , , .
In the second case, the true selection probability function was taken to be one of the following three:
Case 1. if , and 0.9 elsewhere.
Case 2. if , and 0.9 elsewhere.
Case 3. for all .
We generated 5000 Monte Carlo random samples of size , 200, and 500 for each of the above six selection probability functions . With the working model , the empirical coverage probabilities for , at nominal level 0.95, were computed for the two methods from the 5000 simulation runs. The results are reported in Table 1.
From Table 1, we draw the following conclusions. First, in both cases, WCEL performs better than AEL: its confidence regions have uniformly higher coverage probabilities. Second, all the empirical coverage probabilities increase as increases for every fixed missing rate. The missing rate also affects the coverage probability: in general, the coverage probability decreases as the missing rate increases for every fixed sample size. However, under Case I, the values hardly change for either method because of the exponential selection probability function.
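When comparing empirical coverage probabilities such as those discussed above, it helps to keep the Monte Carlo error in mind. The following sketch computes the binomial standard error of an empirical coverage probability from the number of simulation runs; the function name is hypothetical, and the calculation assumes independent runs.

```python
import math

def coverage_standard_error(coverage, n_runs):
    """Binomial standard error of an empirical coverage probability."""
    return math.sqrt(coverage * (1.0 - coverage) / n_runs)

# Nominal level 0.95 with 5000 independent simulation runs.
se = coverage_standard_error(0.95, 5000)
# Rough two-standard-error band around the nominal level.
band = (0.95 - 2.0 * se, 0.95 + 2.0 * se)
```

With 5000 runs the standard error is about 0.003, so differences between two methods of less than roughly 0.006 are within simulation noise.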
4. Conclusion
In this paper, a parametric selection probability function is introduced to avoid the dimension difficulty, and a bias-corrected technique leads to an empirical likelihood (EL) statistic that is asymptotically chi-squared when the selection probability is specified correctly and asymptotically a weighted sum of chi-squared variables when it is specified incorrectly. Moreover, our estimator is consistent in both cases and achieves full asymptotic efficiency when the selection probability function is specified correctly.
Appendix
A. Proofs
Lemma A.1. Suppose that (C1)–(C4) hold. Then .
Proof of Lemma A.1. Under conditions (C1)–(C3), it can be shown that is consistent: Together with condition (C4), we have By a Taylor expansion, it is easy to show that
Lemma A.2. Suppose that (C7)–(C12) hold. Then .
Proof of Lemma A.2. From Theorem 3 of White [8], under conditions (C7)–(C12), it can be shown that is consistent: Similar to Lemma A.1, it is easy to show that .
Lemma A.3. Suppose that (C7)–(C12) and (C4′)–(C6′) hold. If is specified incorrectly, then
Proof of Lemma A.3. We prove (A.5) only; (A.6) can be proved similarly.
By direct calculation, we have
It is easy to show that
Note that
so we have
Therefore, by using (A.8)–(A.10), the proof of (A.5) is completed.
We prove only Theorems 2.4 and 2.6; the proofs of Theorems 2.1 and 2.3 are similar, and one only needs to replace by , respectively.
Proof of Theorem 2.4. By the Lagrange multiplier method, can be represented as
where is a vector given as the solution to
By Lemma A.2, and using the same arguments as in the proof of (A.4) in [9], we can show that
Applying the Taylor expansion to (A.11) and (A.13), we get that
By (A.12), it follows that
This together with Lemma A.3 and (A.13) proves that
Therefore, from (A.14) we have
This together with Lemma A.3 completes Theorem 2.4.
Proof of Theorem 2.6. From Theorem 1 of Qin [10] and (A.5), we obtain the result of Theorem 2.6 directly.
Acknowledgment
This paper is supported by NSF projects (2011YJYB005, 2011SSQD002 and 2011YJQT01) of Changji College Xinjiang Uygur Autonomous Region of China.