Abstract

Imputation is a popular technique for handling missing data, especially when many values are missing. Usually, the empirical log-likelihood ratio statistic under imputation is asymptotically scaled chi-squared because the imputed data are not i.i.d. Recently, a bias-corrected technique was used to study the linear regression model with missing response data, and the resulting empirical likelihood ratio is asymptotically chi-squared. However, the nonparametric estimator of the selection probability function may suffer from the "curse of high dimensionality" in multidimensional linear regression models. In this paper, a parametric selection probability function is introduced to avoid this dimension problem. With a similar bias-corrected method, the proposed empirical likelihood statistic is asymptotically chi-squared when the selection probability is specified correctly, and asymptotically scaled chi-squared when it is specified incorrectly. In addition, our empirical likelihood estimator is consistent whether or not the selection probability is specified correctly, and it achieves full efficiency when the specification is correct. A simulation study indicates that the proposed method is comparable in terms of coverage probabilities.

1. Introduction

Consider the following multidimensional linear regression model:
$$Y_i = X_i^T\beta + \epsilon_i, \quad 1 \le i \le n, \tag{1.1}$$
where $Y_i$ is a scalar response variable, $X_i$ is a $p\times 1$ vector of design variables, $\beta$ is a $p\times 1$ vector of regression parameters, and the errors $\epsilon_i$ are independent random variables with $E[\epsilon_i\mid X_i]=0$ and $\operatorname{Var}[\epsilon_i\mid X_i]=\sigma^2$. Suppose that we have incomplete observations $(X_i,Y_i,\delta_i)$, $i=1,2,\ldots,n$, from this model, where all the $X_i$'s are observed, and $\delta_i=0$ if $Y_i$ is missing, $\delta_i=1$ otherwise. Throughout this paper, we assume that $Y$ is missing at random (MAR); that is, the probability that $Y$ is missing may depend on $X$ but not on $Y$: $P(\delta=1\mid X,Y)=P(\delta=1\mid X)$. We focus on constructing confidence regions for $\beta$ from the incomplete data $(X_i,Y_i,\delta_i)$, $i=1,2,\ldots,n$.
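
To make the setting concrete, the following minimal Python sketch (ours, not part of the original paper) generates data from model (1.1) with MAR responses. The logistic form of the selection probability is an assumption borrowed from the working model of Section 3, and all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_data(n, beta, theta, sigma=0.2):
    """Draw (X_i, Y_i, delta_i) from model (1.1) with responses missing at random.

    The selection probability P(delta=1 | X=x) is taken to be logistic in x
    (an assumption matching the working model of Section 3); any function of
    x alone satisfies MAR: P(delta=1 | X, Y) = P(delta=1 | X).
    """
    p = len(beta)
    X = rng.standard_normal((n, p))                 # design vectors X_i
    Y = X @ beta + sigma * rng.standard_normal(n)   # model (1.1)
    prob = 1.0 / (1.0 + np.exp(-X @ theta))         # p_0(x), logistic here
    delta = rng.binomial(1, prob)                   # 1 = observed, 0 = missing
    return X, Y, delta
```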

The empirical likelihood (EL) method, introduced by Owen [1], has many advantages over normal-approximation methods for constructing confidence intervals [1]. One is that it produces confidence intervals and regions whose shape and orientation are determined entirely by the data; another is that empirical likelihood regions are range preserving and transformation respecting. However, it cannot be used directly when responses are missing. A natural approach, proposed by Wang and Rao [2], is to impute the predicted value of $Y$ based on the completely observed pairs. Unfortunately, the empirical log-likelihood ratio under this imputation is asymptotically scaled chi-squared. Therefore, the empirical log-likelihood ratio cannot be applied directly to make statistical inference on the parameter.

Recently, a bias-corrected technique combined with the imputation method and with the Horvitz-Thompson inverse-selection weighting method was explored by Xue [3] and Qin et al. [4], respectively. The resulting empirical likelihood ratios obey Wilks' theorem, and the empirical likelihood estimators are consistent. However, the true selection probability function is ordinarily unknown, and its nonparametric estimator may suffer from the "curse of high dimensionality" in multidimensional linear regression models.

To avoid the "curse of high dimensionality", it is customary to posit a parametric selection probability function, but this carries the risk of misspecification. In this paper, we consider the following two situations: one in which the selection probability is specified correctly, and one in which it is specified incorrectly. To the best of our knowledge, this issue has rarely been discussed. With a similar bias-corrected method, the proposed empirical likelihood statistic is asymptotically chi-squared in the first situation and asymptotically a weighted sum of chi-squared variables in the second. In addition, the following desirable feature is worth mentioning. The auxiliary random vector used to construct the empirical likelihood statistic has the same form as that proposed by Robins et al. [5]. Robins' estimator is "doubly robust"; that is, the estimator is asymptotically unbiased if either the underlying missing-data mechanism or the underlying regression function is correctly specified. Because the underlying regression function is asymptotically correct in our setting, our estimator is consistent in both cases, and it achieves asymptotically full efficiency in the first situation. From this point of view, it is feasible to use a parametric selection probability function to construct an EL statistic.

The rest of the paper is organized as follows. In Section 2, the EL statistic for $\beta$ is constructed and the asymptotic results are presented for each of the two situations. Section 3 reports some simulation results on the performance of the proposed EL confidence region. The proofs of the main results are given in the Appendix.

2. Main Result

For simplicity, denote the true selection probability function by $p_0(x)$ and a specified selection probability function by $p(x,\theta)$ for given $\theta$, a $p\times 1$ unknown parameter vector. Thus, the first situation means $p_0(x)=p(x,\theta_0)$ for some $\theta_0$, where $\theta_0$ is the true parameter, and the second situation means $p_0(x)\neq p(x,\theta)$ for any $\theta$. Let $\hat\beta_r$ be the least squares estimator of $\beta$ based on the completely observed pairs $(X_i,Y_i,\delta_i=1)$, $i=1,2,\ldots,n$; that is,
$$\hat\beta_r=\Big(\sum_{i=1}^n\delta_iX_iX_i^T\Big)^{-1}\sum_{i=1}^n\delta_iX_iY_i.$$
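
As an illustration, a short sketch of the complete-case estimator $\hat\beta_r$ follows (it reuses the numpy setup from the sketch in Section 1; the function name is ours):

```python
import numpy as np

def beta_r(X, Y, delta):
    """Least squares estimator of beta from the completely observed pairs:
    solves (sum_i delta_i X_i X_i^T) b = sum_i delta_i X_i Y_i."""
    Xc, Yc = X[delta == 1], Y[delta == 1]
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)
```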

2.1. Empirical Likelihood

In this subsection, an empirical likelihood statistic is constructed, and some asymptotic results are presented for the case where the selection probability is specified correctly, that is, $p_0(x)=p(x,\theta_0)$ for some $\theta_0$.

Since the design variable $X_i$ is observed for each subject, the maximum likelihood estimator $\hat\theta$ can be obtained by maximizing the likelihood function
$$L(\theta)=\prod_{i=1}^n p(X_i,\theta)^{\delta_i}\big(1-p(X_i,\theta)\big)^{1-\delta_i}.\tag{2.1}$$
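
A minimal sketch of this step, assuming the logistic working model $p(x,\theta)=\exp(x^T\theta)/(1+\exp(x^T\theta))$ used in Section 3 (maximizing (2.1) is then ordinary logistic regression of $\delta$ on $X$; the numerically stable log-likelihood form and the function name are ours):

```python
import numpy as np
from scipy.optimize import minimize

def fit_theta(X, delta):
    """Maximize the binomial likelihood (2.1) under a logistic working model."""
    def neg_loglik(theta):
        eta = X @ theta
        # log L(theta) = sum_i [delta_i * eta_i - log(1 + exp(eta_i))]
        return -(delta @ eta - np.logaddexp(0.0, eta).sum())
    res = minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS")
    return res.x
```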

The following regularity assumptions on $L(\theta)$ are sufficiently strong to ensure both consistency and asymptotic normality of the maximum likelihood estimator. Suppose $U(\theta_0)$ is a neighborhood of $\theta_0$, where $\theta_0$ is the true parameter:
(C1) $\partial\ln L(\theta)/\partial\theta$, $\partial^2\ln L(\theta)/\partial\theta^2$, and $\partial^3\ln L(\theta)/\partial\theta^3$ exist in $U(\theta_0)$ for all $X$;
(C2) $|\partial^3\ln L(\theta)/\partial\theta^3|\le H(X)$ in $U(\theta_0)$, with $EH(X)<\infty$;
(C3) $E_{\theta_0}[\partial\ln L(\theta_0)/\partial\theta]=0$, $E_{\theta_0}\big[\partial^2\ln L(\theta_0)/\partial\theta^2+(\partial\ln L(\theta_0)/\partial\theta)^2\big]=0$, and $I(\theta_0)=E_{\theta_0}\big[(\partial\ln L(\theta_0)/\partial\theta)^2\big]>0$.

Then we use $\hat Y_i=\dfrac{\delta_i}{p(X_i,\hat\theta)}Y_i+\Big(1-\dfrac{\delta_i}{p(X_i,\hat\theta)}\Big)X_i^T\hat\beta_r$, $i=1,2,\ldots,n$, as a "complete" data set for $Y$ to construct the auxiliary random vectors
$$Z_{in}(\beta)=X_i\big(\hat Y_i-X_i^T\beta\big)=X_i\Big[\frac{\delta_i}{p(X_i,\hat\theta)}\big(Y_i-X_i^T\beta\big)+\Big(1-\frac{\delta_i}{p(X_i,\hat\theta)}\Big)X_i^T\big(\hat\beta_r-\beta\big)\Big].\tag{2.2}$$
Thus, an empirical log-likelihood ratio is defined as
$$l_n(\beta)=-2\max\Big\{\sum_{i=1}^n\log(n\omega_i)\,:\,\omega_i\ge 0,\ \sum_{i=1}^n\omega_i=1,\ \sum_{i=1}^n\omega_iZ_{in}(\beta)=0\Big\}.\tag{2.3}$$
Further, the maximum empirical likelihood estimator $\tilde\beta$ of $\beta$ is obtained by minimizing $l_n(\beta)$.
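
The following sketch computes the auxiliary vectors (2.2) and evaluates $l_n(\beta)$ through the dual problem in the Lagrange multiplier $\lambda$ (Owen's standard construction; the Newton iteration with backtracking is one common way to solve it, the helper names are ours, and `fit_theta` and `beta_r` refer to the earlier sketches):

```python
import numpy as np

def Z_aux(beta, X, Y, delta, theta_hat, b_r):
    """Auxiliary vectors Z_in(beta) of (2.2) built from the imputed responses."""
    prob = 1.0 / (1.0 + np.exp(-X @ theta_hat))     # p(X_i, theta_hat)
    Y0 = np.where(delta == 1, Y, 0.0)               # missing Y_i never enter
    Y_hat = (delta / prob) * Y0 + (1.0 - delta / prob) * (X @ b_r)
    return X * (Y_hat - X @ beta)[:, None]

def el_logratio(Z, tol=1e-10, max_iter=50):
    """l_n(beta) = 2 sum_i log(1 + lambda' Z_i), where lambda maximizes the
    concave dual sum_i log(1 + lambda' Z_i), i.e., solves (A.12)."""
    n, p = Z.shape
    lam = np.zeros(p)
    for _ in range(max_iter):
        denom = 1.0 + Z @ lam
        grad = (Z / denom[:, None]).sum(axis=0)
        hess = -(Z / denom[:, None] ** 2).T @ Z     # negative definite
        step = np.linalg.solve(hess, -grad)         # Newton ascent direction
        t = 1.0
        while np.any(1.0 + Z @ (lam + t * step) <= tol):
            t /= 2.0                                # keep all weights positive
        lam = lam + t * step
        if np.linalg.norm(grad) < tol:
            break
    return 2.0 * np.sum(np.log1p(Z @ lam))
```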

To ensure the asymptotic results, the following assumptions are needed:
(C4) $p(x,\theta)$ is uniformly continuous in $U(\theta_0)$ for all $x$;
(C5) $A$ and $D_1$ are positive definite matrices, where $A=E\{XX^T\}$ and $D_1=E\{[1/p(X,\theta_0)]XX^T\varepsilon^2\}$;
(C6) $p(x,\theta_0)$ is bounded almost surely and $\inf_x p(x,\theta_0)>0$;

where condition (C4) is common for selection probability functions, and conditions (C5)-(C6) are necessary for the asymptotic normality of the maximum empirical likelihood estimator.

Theorem 2.1. Suppose that conditions (C1)-(C6) above hold. If $\beta$ is the true parameter and $p(x,\theta)$ is specified correctly, then
$$l_n(\beta)\xrightarrow{D}\chi_p^2,\tag{2.4}$$
where $\chi_p^2$ denotes a chi-square variable with $p$ degrees of freedom, and $\xrightarrow{D}$ represents convergence in distribution.

Let $\chi_p^2(1-\alpha)$ be the $1-\alpha$ quantile of $\chi_p^2$ for $0<\alpha<1$. Using Theorem 2.1, we obtain an approximate $1-\alpha$ confidence region for $\beta$, defined by
$$\big\{\check\beta:\ l_n(\check\beta)\le\chi_p^2(1-\alpha)\big\}.\tag{2.5}$$
Theorem 2.1 can also be used to test the hypothesis $H_0:\beta=\beta_0$. One rejects $H_0$ if $l_n(\beta_0)>\chi_p^2(1-\alpha)$.
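
In code, the test and the region check reduce to comparing $l_n$ with a chi-square quantile (a sketch reusing the helpers above; `scipy.stats.chi2` provides the quantile):

```python
from scipy.stats import chi2

def el_test(beta0, X, Y, delta, alpha=0.05):
    """Reject H0: beta = beta0 when l_n(beta0) exceeds chi2_p(1 - alpha)."""
    stat = el_logratio(Z_aux(beta0, X, Y, delta,
                             fit_theta(X, delta), beta_r(X, Y, delta)))
    return stat, stat > chi2.ppf(1.0 - alpha, df=X.shape[1])
```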

Remark 2.2. In general, plugging estimators into an empirical likelihood leads asymptotically to a sum of weighted $\chi_1^2$ variables with unknown weights. However, when $p(x,\theta)$ is specified correctly, the EL statistic with two plug-ins ($\hat\theta$ and $\hat\beta_r$) has the limiting distribution $\chi_p^2$, for the following reasons. First, the bias-correction method, that is, using the selection function as an inverse weight, eliminates the influence of $\hat\beta_r$. Second, the estimating function has a special structure: the influence of $\hat\theta$ is also eliminated, because $\theta$ enters only through the denominator of the function.

Theorem 2.3. Under conditions (C1)-(C6), if $p(x,\theta)$ is specified correctly, then
$$\sqrt n\,\big(\tilde\beta-\beta\big)\xrightarrow{D}N(0,\Sigma_1),\tag{2.6}$$
where $\Sigma_1=A^{-1}D_1A^{-1}$.

To apply Theorem 2.3 to construct the confidence region of $\beta$, we give the estimator of $\Sigma_1$, say $\hat\Sigma_1=\hat A^{-1}\hat D_1\hat A^{-1}$, where $\hat A$ and $\hat D_1$ are defined by $\hat A=(1/n)\sum_{i=1}^nX_iX_i^T$ and $\hat D_1=(1/n)\sum_{i=1}^n\big[1/p(X_i,\hat\theta)\big]X_iX_i^T\big(Y_i-X_i^T\hat\beta_r\big)^2$.

It is easily proved that $\hat\Sigma_1$ is a consistent estimator of $\Sigma_1$. Thus, by Theorem 2.3, we have
$$\hat\Sigma_1^{-1/2}\sqrt n\,\big(\tilde\beta-\beta\big)\xrightarrow{D}N(0,I_p),\tag{2.7}$$
where $I_p$ is the identity matrix of order $p$. Using (10.2d) in Arnold [6], we obtain
$$n\,\big(\tilde\beta-\beta\big)^T\hat\Sigma_1^{-1}\big(\tilde\beta-\beta\big)\xrightarrow{D}\chi_p^2.\tag{2.8}$$
Therefore, a confidence region for $\beta$ can be constructed by using (2.8).
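
A sketch of the plug-in estimator $\hat\Sigma_1$ follows. One caveat: the displayed $\hat D_1$ involves residuals for all $i$, while $Y_i$ is observable only when $\delta_i=1$; the sketch therefore keeps complete cases and reweights by $\delta_i/p^2(X_i,\hat\theta)$, which estimates the same limit $D_1$ (this substitution is our assumption, not the paper's formula):

```python
import numpy as np

def sigma1_hat(X, Y, delta, theta_hat, b_r):
    """Plug-in estimator of Sigma_1 = A^{-1} D_1 A^{-1} (Theorem 2.3)."""
    n = len(Y)
    prob = 1.0 / (1.0 + np.exp(-X @ theta_hat))
    r2 = np.where(delta == 1, (Y - X @ b_r) ** 2, 0.0)  # delta_i * residual^2
    A = X.T @ X / n
    # complete-case surrogate: (1/n) sum (delta_i/p^2) X_i X_i' e_i^2 -> D_1
    D1 = (X * (r2 / prob**2)[:, None]).T @ X / n
    Ainv = np.linalg.inv(A)
    return Ainv @ D1 @ Ainv
```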

2.2. Adjusted Empirical Likelihood When $p_0(x)\neq p(x,\theta)$

In this subsection, we likewise construct the empirical likelihood statistic and discuss some asymptotic results when the selection probability is specified incorrectly, that is, $p_0(x)\neq p(x,\theta)$ for any $\theta$.

Since the design variable $X_i$ is observed for each subject, the quasi-maximum likelihood estimator $\tilde\theta$, rather than the maximum likelihood estimator, can be obtained by maximizing the likelihood function
$$L(\theta)=\prod_{i=1}^n p(X_i,\theta)^{\delta_i}\big(1-p(X_i,\theta)\big)^{1-\delta_i}\tag{2.9}$$
under some regularity assumptions.

The following regularity assumptions are sufficiently strong to ensure both consistency and asymptotic normality of the quasi-maximum likelihood estimator. Let $u=(x,\delta)$, $\Omega=R^p\times\{0,1\}$, $g(u)=p_0(x)^{\delta}(1-p_0(x))^{1-\delta}$, and $f(u,\theta)=p(x,\theta)^{\delta}(1-p(x,\theta))^{1-\delta}$. Thus $g(u)$ is the true density function of $u$, and $f(u,\theta)$ does not contain the true structure $g(u)$. The Kullback-Leibler Information Criterion (KLIC) is defined by $I(g:f,\theta)=E\big(\log[g(U)/f(U,\theta)]\big)$; here, and in what follows, expectations are taken with respect to the true distribution $g(u)$. When expectations of the partial derivatives exist, we define the matrices $A(\theta)=E\big(\partial^2\log f/\partial\theta_i\partial\theta_j\big)$ and $B(\theta)=E\big((\partial\log f/\partial\theta_i)(\partial\log f/\partial\theta_j)\big)$.
(C7) $g(u)$ is measurable on $\Omega$.
(C8) $f(u,\theta)$ is measurable in $u$ for every $\theta$ in $\Theta$, a compact subset of $R^p$, and continuous in $\theta$ for every $u$ in $\Omega$.
(C9) (a) $E[\log g(U)]$ exists and $|\log f(u,\theta)|\le m(u)$ for all $\theta$ in $\Theta$, where $m$ is integrable with respect to $g$; (b) $I(g:f,\theta)$ has a unique minimum at $\theta^*$ in $\Theta$.
(C10) $\partial\log f(u,\theta)/\partial\theta_i$, $i=1,\ldots,p$, are measurable functions of $u$ for each $\theta$ in $\Theta$ and continuously second-order differentiable functions of $\theta$ for each $u$ in $\Omega$.
(C11) $|\partial^2\log f/\partial\theta_i\partial\theta_j|$ and $|(\partial\log f/\partial\theta_i)(\partial\log f/\partial\theta_j)|$, $i,j=1,\ldots,p$, are dominated by functions integrable with respect to $g$ for all $u$ in $\Omega$ and $\theta$ in $\Theta$.
(C12) (a) $\theta^*$ is interior to $\Theta$; (b) $B(\theta^*)$ is nonsingular; (c) $\theta^*$ is a regular point of $A(\theta)$.

Then we use $\hat Y_i=\dfrac{\delta_i}{p(X_i,\tilde\theta)}Y_i+\Big(1-\dfrac{\delta_i}{p(X_i,\tilde\theta)}\Big)X_i^T\hat\beta_r$, $i=1,2,\ldots,n$, as a "complete" data set for $Y$ to construct the auxiliary random vectors
$$Z_{in}(\beta)=X_i\big(\hat Y_i-X_i^T\beta\big)=X_i\Big[\frac{\delta_i}{p(X_i,\tilde\theta)}\big(Y_i-X_i^T\beta\big)+\Big(1-\frac{\delta_i}{p(X_i,\tilde\theta)}\Big)X_i^T\big(\hat\beta_r-\beta\big)\Big].\tag{2.10}$$
Thus, an empirical log-likelihood ratio is defined as
$$l_n(\beta)=-2\max\Big\{\sum_{i=1}^n\log(n\omega_i)\,:\,\omega_i\ge 0,\ \sum_{i=1}^n\omega_i=1,\ \sum_{i=1}^n\omega_iZ_{in}(\beta)=0\Big\},\tag{2.11}$$
and the maximum empirical likelihood estimator $\tilde\beta$ of $\beta$ is obtained by minimizing $l_n(\beta)$.

To ensure the asymptotic results, the following assumptions are needed:
(C4′) $p(x,\theta)$ is uniformly continuous in $U(\theta^*)$ for all $x$;
(C5′) $A$, $C$, $E$, $F$, $G$, $D_2$, and $D_3$ are positive definite matrices, where $A=E\{XX^T\}$, $C=E\big[XX^T\big(1-p_0(X)/p(X,\theta^*)\big)\big]$, $E=E\big[p_0(X)XX^T\big]$, $F=E\big[p_0(X)XX^T\varepsilon^2\big]$, $G=E\big[\big(p_0(X)/p(X,\theta^*)\big)XX^T\varepsilon^2\big]$, $D_2=E\big\{\big(p_0(X)/p^2(X,\theta^*)\big)XX^T\varepsilon^2\big\}$, and $D_3=D_2+CE^{-1}FE^{-1}C^T+2CE^{-1}G$;
(C6′) $p_0(x)$ is bounded almost surely and $\inf_x p_0(x)>0$,

where condition (C4′) is common for selection probability functions, and conditions (C5′)-(C6′) are necessary for the asymptotic normality of the maximum empirical likelihood estimator.

Theorem 2.4. Suppose that conditions (C7)-(C12) and (C4′)-(C6′) above hold. If $\beta$ is the true parameter and $p(x,\theta)$ is specified incorrectly, then
$$l_n(\beta)\xrightarrow{D}\sum_{i=1}^p\omega_i\chi_{1,i}^2,\tag{2.12}$$
where $\chi_{1,i}^2$, $i=1,2,\ldots,p$, are independent chi-square variables with one degree of freedom, the weights $\omega_i$, $i=1,2,\ldots,p$, are the eigenvalues of the matrix $D_3^{-1}D_2$, and $\xrightarrow{D}$ represents convergence in distribution.

Let $r(\beta)=p/\operatorname{tr}\big(D_3^{-1}D_2\big)$ be the adjustment factor. Along the lines of [7], it is straightforward to show that $r(\beta)\sum_{i=1}^p\omega_i\chi_{1,i}^2\xrightarrow{D}\chi_p^2$.

Let $S_{n1}(\beta)=\sum_{i=1}^nZ_{in}(\beta)Z_{in}^T(\beta)$, $S_{n2}(\beta)=\big(\sum_{i=1}^nZ_{in}(\beta)\big)\big(\sum_{i=1}^nZ_{in}(\beta)\big)^T$, and $\hat r(\beta)=p/\operatorname{tr}\big(S_{n1}^{-1}(\beta)S_{n2}(\beta)\big)$. We define an adjusted empirical log-likelihood ratio by
$$l_{n,\mathrm{ad}}(\beta)=\hat r(\beta)\,l_n(\beta).\tag{2.13}$$
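
A sketch of the adjusted statistic (2.13), reusing `Z_aux` and `el_logratio` from the sketches in Section 2.1 (the trace form of $\hat r(\beta)$ follows the reconstruction above):

```python
import numpy as np

def adjusted_el(beta, X, Y, delta, theta_tilde, b_r):
    """Adjusted empirical log-likelihood ratio l_{n,ad} = r_hat(beta) * l_n(beta)."""
    Z = Z_aux(beta, X, Y, delta, theta_tilde, b_r)
    S1 = Z.T @ Z                                  # S_n1 = sum Z_i Z_i'
    s = Z.sum(axis=0)
    S2 = np.outer(s, s)                           # S_n2 = (sum Z_i)(sum Z_i)'
    r_hat = Z.shape[1] / np.trace(np.linalg.solve(S1, S2))
    return r_hat * el_logratio(Z)
```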

Corollary 2.5. Under the conditions of Theorem 2.4, one has
$$l_{n,\mathrm{ad}}(\beta)\xrightarrow{D}\chi_p^2.\tag{2.14}$$
Let $\chi_p^2(1-\alpha)$ be the $1-\alpha$ quantile of $\chi_p^2$ for $0<\alpha<1$. Using Corollary 2.5, we obtain an approximate $1-\alpha$ confidence region for $\beta$, defined by
$$\big\{\check\beta:\ l_{n,\mathrm{ad}}(\check\beta)\le\chi_p^2(1-\alpha)\big\}.\tag{2.15}$$
Corollary 2.5 can also be used to test the hypothesis $H_0:\beta=\beta_0$. One rejects $H_0$ if $l_{n,\mathrm{ad}}(\beta_0)>\chi_p^2(1-\alpha)$.

Note that $r(\beta)\to1$ when $p(x,\theta)$ is close to the correct one. In fact, the adjustment factor $r(\beta)$ reflects the information loss due to the misspecification of $p(x,\theta)$.

Theorem 2.6. Under the conditions of Theorem 2.4, one has
$$\sqrt n\,\big(\tilde\beta-\beta\big)\xrightarrow{D}N(0,\Sigma_2),\tag{2.16}$$
where $\Sigma_2=A^{-1}D_3A^{-1}$.

To apply Theorem 2.6 to construct the confidence region of $\beta$, we give the estimator of $\Sigma_2$, say $\hat\Sigma_2=\hat A^{-1}\hat D_3\hat A^{-1}$, where $\hat A$ and $\hat D_3$ are defined by $\hat A=(1/n)\sum_{i=1}^nX_iX_i^T$, $\hat C=(1/n)\sum_{i=1}^nX_iX_i^T\big(1-\delta_i/p(X_i,\tilde\theta)\big)$, $\hat E=(1/n)\sum_{i=1}^n\delta_iX_iX_i^T$, $\hat F=(1/n)\sum_{i=1}^n\delta_iX_iX_i^T\big(Y_i-X_i^T\hat\beta_r\big)^2$, $\hat G=(1/n)\sum_{i=1}^n\big(\delta_i/p(X_i,\tilde\theta)\big)X_iX_i^T\big(Y_i-X_i^T\hat\beta_r\big)^2$, $\hat D_2=(1/n)\sum_{i=1}^n\big(\delta_i/p^2(X_i,\tilde\theta)\big)X_iX_i^T\big(Y_i-X_i^T\hat\beta_r\big)^2$, and $\hat D_3=\hat D_2+\hat C\hat E^{-1}\hat F\hat E^{-1}\hat C^T+2\hat C\hat E^{-1}\hat G$.

It is easily proved that $\hat\Sigma_2$ is a consistent estimator of $\Sigma_2$. Thus, by Theorem 2.6, we have
$$\hat\Sigma_2^{-1/2}\sqrt n\,\big(\tilde\beta-\beta\big)\xrightarrow{D}N(0,I_p),\tag{2.17}$$
where $I_p$ is the identity matrix of order $p$. Using (10.2d) in Arnold [6], we obtain
$$n\,\big(\tilde\beta-\beta\big)^T\hat\Sigma_2^{-1}\big(\tilde\beta-\beta\big)\xrightarrow{D}\chi_p^2.\tag{2.18}$$
Therefore, a confidence region for $\beta$ can be constructed by using (2.18).
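
The components of $\hat\Sigma_2$ translate directly into code (a sketch under the logistic working model; note that each sum involving a residual carries a factor $\delta_i$, so only complete cases contribute):

```python
import numpy as np

def sigma2_hat(X, Y, delta, theta_tilde, b_r):
    """Plug-in estimator of Sigma_2 = A^{-1} D_3 A^{-1} (Theorem 2.6)."""
    n = len(Y)
    prob = 1.0 / (1.0 + np.exp(-X @ theta_tilde))
    r2 = np.where(delta == 1, (Y - X @ b_r) ** 2, 0.0)   # delta_i * residual^2
    A = X.T @ X / n
    C = (X * (1.0 - delta / prob)[:, None]).T @ X / n
    E = (X * delta[:, None]).T @ X / n
    F = (X * r2[:, None]).T @ X / n
    G = (X * (r2 / prob)[:, None]).T @ X / n
    D2 = (X * (r2 / prob**2)[:, None]).T @ X / n
    CEinv = C @ np.linalg.inv(E)
    D3 = D2 + CEinv @ F @ CEinv.T + 2.0 * CEinv @ G      # hat D_3
    Ainv = np.linalg.inv(A)
    return Ainv @ D3 @ Ainv
```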

Remark 2.7. The estimator proposed by Robins et al. [5] in this situation solves
$$\sum_{i=1}^n\Big[\frac{\delta_i}{p(X_i,\hat\theta)}X_i\big(Y_i-X_i^T\beta\big)+\Big(1-\frac{\delta_i}{p(X_i,\hat\theta)}\Big)X_i\big(X_i^T\hat\beta_r-X_i^T\beta\big)\Big]=0.\tag{2.19}$$
Note that the regression function used for imputation satisfies $X^T\hat\beta_r-X^T\beta=X^T\big(\hat\beta_r-\beta\big)=O_p\big(n^{-1/2}\big)$ at the true $\beta$; that is, in this case the underlying regression function is asymptotically correctly specified. So whether the missing-data mechanism is specified correctly or not, the estimator is always consistent, and it achieves asymptotic full efficiency when the mechanism is specified correctly.

3. Simulation

Because of the curse of dimensionality in nonparametric estimation, Xue's method [3] may be hard to implement here. We conducted an extensive simulation study to compare the performance of the weight-corrected empirical likelihood (WCEL) proposed in this paper with that of the adjusted empirical likelihood (AEL) proposed in Wang and Rao [2], with covariates of dimension four.

We considered the linear model (1.1) with $p=4$ and $\beta=(0.8,1.5,1,2)^T$, where $X$ was generated from a four-dimensional standard normal distribution and $\varepsilon$ was generated from the normal distribution with mean zero and variance 0.04.

In the first case, the true selection probability function was taken to be $p_0(x)=p(x,\theta)=\exp(x^T\theta)/(1+\exp(x^T\theta))$. We let $\theta$ equal the three following values: $\theta_1=(0.5,0.5,0.5,0.5)^T$, $\theta_2=(0,0,0,0)^T$, and $\theta_3=(-0.5,-0.5,-0.5,-0.5)^T$, respectively.

In the second case, the true selection probability function $p_0(x)$ was taken to be one of the following three forms:

Case 1. $p_1(x)=0.8+0.2\big(|x_1|+|x_2|+|x_3|+|x_4|\big)$ if $|x_1|+|x_2|+|x_3|+|x_4|\le1$, and $0.9$ elsewhere.

Case 2. $p_2(x)=0.9-0.1\big(|x_1|+|x_2|+|x_3|+|x_4|\big)$ if $|x_1|+|x_2|+|x_3|+|x_4|\le4$, and $0.9$ elsewhere.

Case 3. $p_3(x)=0.6$ for all $x$.

We generated 5000 Monte Carlo random samples of sizes $n=100$, 200, and 500 under each of the above six selection probability functions. With the working model $p(x,\theta)=\exp(x^T\theta)/(1+\exp(x^T\theta))$, the empirical coverage probabilities for $\beta$, at a nominal level of 0.95, were computed for the two methods over the 5000 simulation runs. The results are reported in Table 1.
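
A scaled-down sketch of one cell of this study (the first case with $\theta_1$; it reuses the helpers from the earlier sketches, and the replication count is reduced from 5000 to keep the illustration fast):

```python
import numpy as np
from scipy.stats import chi2

def coverage(n=100, n_rep=500, alpha=0.05):
    """Monte Carlo coverage of the WCEL region at the true beta."""
    beta = np.array([0.8, 1.5, 1.0, 2.0])
    theta = np.full(4, 0.5)                     # theta_1 in the first case
    crit = chi2.ppf(1.0 - alpha, df=4)
    hits = 0
    for _ in range(n_rep):
        # sigma = 0.2 gives the error variance 0.04 used in the paper
        X, Y, delta = generate_data(n, beta, theta)
        stat = el_logratio(Z_aux(beta, X, Y, delta,
                                 fit_theta(X, delta), beta_r(X, Y, delta)))
        hits += stat <= crit
    return hits / n_rep
```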

From Table 1, we can draw the following conclusions. First, in both cases, WCEL performs better than AEL, since its confidence regions have uniformly higher coverage probabilities. Second, all the empirical coverage probabilities increase as $n$ increases for each fixed missing rate. The missing rate also noticeably affects the coverage probability: in general, the coverage probability decreases as the missing rate increases for each fixed sample size. However, in the first case, the values hardly change for either method, because of the logistic selection probability function.

4. Conclusion

In this paper, a parametric selection probability function is introduced to avoid the dimension difficulty, and a bias-corrected technique leads to an empirical likelihood (EL) statistic that is asymptotically chi-squared when the selection probability is specified correctly and asymptotically weighted chi-squared when it is specified incorrectly. Moreover, our estimator is always consistent, and it achieves asymptotic full efficiency when the selection probability function is specified correctly.

Appendix

A. Proofs

Lemma A.1. Suppose that (C1)-(C4) hold. Then $1/p(X,\hat\theta)=\big[1/p(X,\theta_0)\big]\big(1+o_p(1)\big)$.

Proof of Lemma A.1. Under conditions (C1)-(C3), it can be shown that $\hat\theta$ is $\sqrt n$-consistent:
$$\hat\theta-\theta_0=O_p\big(n^{-1/2}\big)=o_p(1).\tag{A.1}$$
Together with condition (C4), we have
$$p\big(X,\hat\theta\big)=p(X,\theta_0)+o_p(1).\tag{A.2}$$
By a Taylor expansion, it is easily shown that
$$\frac{1}{p(X,\hat\theta)}=\frac{1}{p(X,\theta_0)}\big(1+o_p(1)\big).\tag{A.3}$$

Lemma A.2. Suppose that (C7)-(C12) hold. Then $1/p(X,\tilde\theta)=\big[1/p(X,\theta^*)\big]\big(1+o_p(1)\big)$.

Proof of Lemma A.2. From Theorem 3 of White [8], under conditions (C7)-(C12), it can be shown that $\tilde\theta$ is $\sqrt n$-consistent:
$$\tilde\theta-\theta^*=O_p\big(n^{-1/2}\big)=o_p(1).\tag{A.4}$$
Similar to Lemma A.1, it is easy to show that $1/p(X,\tilde\theta)=\big[1/p(X,\theta^*)\big]\big(1+o_p(1)\big)$.

Lemma A.3. Suppose that (C7)-(C12) and (C4′)-(C6′) hold. If $p(X,\theta)$ is specified incorrectly, then
$$\frac{1}{\sqrt n}\sum_{i=1}^nZ_{in}(\beta)\xrightarrow{D}N(0,D_3),\tag{A.5}$$
$$\frac{1}{n}\sum_{i=1}^nZ_{in}(\beta)Z_{in}^T(\beta)\xrightarrow{P}D_2,\qquad \max_i\big|Z_{in}(\beta)\big|=o_p\big(n^{1/2}\big).\tag{A.6}$$

Proof of Lemma A.3. We prove (A.5) only; (A.6) can be proved similarly.
By direct calculation, Lemma A.2, and the definition of $\hat\beta_r$, we have
$$\begin{aligned}
\frac{1}{\sqrt n}\sum_{i=1}^nZ_{in}(\beta)
&=\frac{1}{\sqrt n}\sum_{i=1}^n\frac{\delta_i}{p(X_i,\tilde\theta)}X_i\big(Y_i-X_i^T\beta\big)
+\frac{1}{\sqrt n}\sum_{i=1}^n\Big(1-\frac{\delta_i}{p(X_i,\tilde\theta)}\Big)X_iX_i^T\big(\hat\beta_r-\beta\big)\\
&=\frac{1}{\sqrt n}\sum_{i=1}^n\frac{\delta_i}{p(X_i,\theta^*)}X_i\varepsilon_i\big(1+o_p(1)\big)
+\Big[\frac{1}{n}\sum_{i=1}^n\Big(1-\frac{\delta_i}{p(X_i,\theta^*)}\Big)X_iX_i^T\Big]
\Big[\frac{1}{n}\sum_{i=1}^n\delta_iX_iX_i^T\Big]^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\delta_iX_i\varepsilon_i\big(1+o_p(1)\big)\\
&=\frac{1}{\sqrt n}\sum_{i=1}^n\frac{\delta_i}{p(X_i,\theta^*)}X_i\varepsilon_i
+\Big[\frac{1}{n}\sum_{i=1}^n\Big(1-\frac{\delta_i}{p(X_i,\theta^*)}\Big)X_iX_i^T\Big]
\Big[\frac{1}{n}\sum_{i=1}^n\delta_iX_iX_i^T\Big]^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\delta_iX_i\varepsilon_i+o_p(1).
\end{aligned}\tag{A.7}$$
It is easily shown that
$$\frac{1}{\sqrt n}\sum_{i=1}^n\frac{\delta_i}{p(X_i,\theta^*)}X_i\varepsilon_i\xrightarrow{D}N(0,D_2),\qquad
\frac{1}{\sqrt n}\sum_{i=1}^n\delta_iX_i\varepsilon_i\xrightarrow{D}N(0,F),$$
$$\frac{1}{n}\sum_{i=1}^n\Big(1-\frac{\delta_i}{p(X_i,\theta^*)}\Big)X_iX_i^T=C+o_p(1),\qquad
\frac{1}{n}\sum_{i=1}^n\delta_iX_iX_i^T=E+o_p(1).\tag{A.8}$$
Note that
$$\Big[\frac{1}{n}\sum_{i=1}^n\Big(1-\frac{\delta_i}{p(X_i,\theta^*)}\Big)X_iX_i^T\Big]
\Big[\frac{1}{n}\sum_{i=1}^n\delta_iX_iX_i^T\Big]^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\delta_iX_i\varepsilon_i
=CE^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\delta_iX_i\varepsilon_i+o_p(1),\qquad
E\Big[\big(\delta_iX_i\varepsilon_i\big)\Big(\frac{\delta_i}{p(X_i,\theta^*)}X_i\varepsilon_i\Big)^T\Big]=G,\tag{A.9}$$
so we have
$$CE^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\delta_iX_i\varepsilon_i\xrightarrow{D}N\big(0,CE^{-1}FE^{-1}C^T\big),\qquad
\operatorname{Cov}\Big(\frac{1}{\sqrt n}\sum_{i=1}^n\frac{\delta_i}{p(X_i,\theta^*)}X_i\varepsilon_i,\ CE^{-1}\frac{1}{\sqrt n}\sum_{i=1}^n\delta_iX_i\varepsilon_i\Big)=CE^{-1}G+o_p(1).\tag{A.10}$$
Therefore, by using (A.8)-(A.10), the proof of (A.5) is completed.

We prove only Theorems 2.4 and 2.6; the proofs of Theorems 2.1 and 2.3 are similar, replacing $\tilde\theta$, $\theta^*$, $D_3$, $Z_{in}(\beta)$, and $C$ by $\hat\theta$, $\theta_0$, $D_1$, $Z_{in}(\beta)$, and $0$, respectively.

Proof of Theorem 2.4. By the Lagrange multiplier method, $l_n(\beta)$ can be represented as
$$l_n(\beta)=2\sum_{i=1}^n\log\big(1+\lambda^TZ_{in}(\beta)\big),\tag{A.11}$$
where $\lambda=\lambda(\beta)$ is a $p\times1$ vector given as the solution to
$$\sum_{i=1}^n\frac{Z_{in}(\beta)}{1+\lambda^TZ_{in}(\beta)}=0.\tag{A.12}$$
By Lemma A.2, and using the same arguments as in the proof of (A.4) in [9], we can show that
$$\lambda=O_p\big(n^{-1/2}\big).\tag{A.13}$$
Applying a Taylor expansion to (A.11) together with (A.13), we get
$$l_n(\beta)=2\sum_{i=1}^n\Big[\lambda^TZ_{in}(\beta)-\frac{\big(\lambda^TZ_{in}(\beta)\big)^2}{2}\Big]+o_p(1).\tag{A.14}$$
By (A.12), it follows that
$$0=\sum_{i=1}^n\frac{Z_{in}(\beta)}{1+\lambda^TZ_{in}(\beta)}
=\sum_{i=1}^nZ_{in}(\beta)-\sum_{i=1}^nZ_{in}(\beta)Z_{in}^T(\beta)\lambda+o_p\big(n^{1/2}\big).\tag{A.15}$$
This, together with Lemma A.3 and (A.13), proves that
$$\sum_{i=1}^n\big(\lambda^TZ_{in}(\beta)\big)^2=\sum_{i=1}^n\lambda^TZ_{in}(\beta)+o_p(1),\qquad
\lambda=\Big[\sum_{i=1}^nZ_{in}(\beta)Z_{in}^T(\beta)\Big]^{-1}\sum_{i=1}^nZ_{in}(\beta)+o_p\big(n^{-1/2}\big).\tag{A.16}$$
Therefore, from (A.14) we have
$$l_n(\beta)=\Big[\frac{1}{\sqrt n}\sum_{i=1}^nZ_{in}(\beta)\Big]^T\Big[\frac{1}{n}\sum_{i=1}^nZ_{in}(\beta)Z_{in}^T(\beta)\Big]^{-1}\Big[\frac{1}{\sqrt n}\sum_{i=1}^nZ_{in}(\beta)\Big]+o_p(1).\tag{A.17}$$
This together with Lemma A.3 completes the proof of Theorem 2.4.

Proof of Theorem 2.6. From Theorem 1 of Qin [10] and (A.5), we obtain the result of Theorem 2.6 directly.

Acknowledgment

This work is supported by NSF projects (2011YJYB005, 2011SSQD002, and 2011YJQT01) of Changji College, Xinjiang Uygur Autonomous Region of China.