Research Article | Open Access

Yuye Zou, Chengxin Wu, "Statistical Inference for the Heteroscedastic Partially Linear Varying-Coefficient Errors-in-Variables Model with Missing Censoring Indicators", *Discrete Dynamics in Nature and Society*, vol. 2021, Article ID 1141022, 26 pages, 2021. https://doi.org/10.1155/2021/1141022

# Statistical Inference for the Heteroscedastic Partially Linear Varying-Coefficient Errors-in-Variables Model with Missing Censoring Indicators

**Academic Editor:** Chris Goodrich

#### Abstract

In this paper, we focus on heteroscedastic partially linear varying-coefficient errors-in-variables models under right-censored data with censoring indicators missing at random. Based on regression calibration, imputation, and inverse probability weighted methods, we define a class of modified profile least squares estimators of the parameter and local linear estimators of the coefficient function. These are then used to construct estimators of the error variance function. To improve estimation accuracy and account for the heteroscedastic error, reweighted estimators of the parameter and coefficient function are developed. We also apply the empirical likelihood method to construct confidence regions and maximum empirical likelihood estimators of the parameter. Under appropriate assumptions, we establish the asymptotic normality of the proposed estimators and the strong uniform convergence rate of the estimators of the error variance function. We also prove that the empirical log-likelihood ratio statistics are asymptotically chi-squared. A simulation study is conducted to evaluate the finite sample performance of the proposed estimators, and a real data example is provided to illustrate our methods.

#### 1. Introduction

In regression analysis, flexible and refined statistical regression models have long been widely used in both theoretical studies and practical applications. The main results for parametric and nonparametric regression models are rather mature. Semiparametric regression models reduce the high risk of misspecification associated with parametric regression models. They also avoid the “curse of dimensionality” of nonparametric regression models. Owing to these advantages, semiparametric regression models have received considerable attention from statisticians, and they take various forms. A typical example is the partially linear varying-coefficient errors-in-variables (PLVCEV) model, introduced by You and Chen [1], which has the following form:

$$Y = X^{\top}\beta + Z^{\top}\alpha(U) + \varepsilon, \qquad W = X + e, \tag{1}$$

where $Y$ is the response variable; $X$, $Z$, and $U$ are the covariates, with $X$ observed only through the surrogate $W$; $\beta$ is a $p$-dimensional vector of unknown parameters; $\alpha(\cdot) = (\alpha_1(\cdot), \ldots, \alpha_q(\cdot))^{\top}$ is an unknown $q$-dimensional vector of coefficient functions; and $\varepsilon$ is the random error. The measurement error $e$ is independent of $(X, Z, U, \varepsilon)$, with mean zero and covariance matrix $\Sigma_e$. To identify the model, $\Sigma_e$ is assumed to be known.

As a general and flexible semiparametric model, model (1) includes a variety of models of interest:

- When $X$ is observed exactly, model (1) reduces to the PLVC model [2, 3].
- When $q = 1$, $Z \equiv 1$, and $X$ is observed exactly, model (1) reduces to the partially linear regression model [4].
- When $X$ is observed exactly and $\alpha(\cdot)$ is a constant vector, model (1) becomes a linear regression model.
- When $q = 1$ and $Z \equiv 1$, model (1) reduces to the partially linear EV model [5].

For model (1), You and Chen [1] proposed estimators of the parametric and nonparametric components and established their asymptotic properties. Liu and Liang [6] established the asymptotic normality of a jackknife estimator of the error variance and the standard chi-squared limit of the jackknife empirical log-likelihood statistic. Fan et al. [7] developed penalized profile least squares estimation of the parametric and nonparametric components of the model.

The literature mentioned above assumes that the random errors are homoscedastic, that is, that the random error is independent of the covariates. However, in many practical applications, the error variance may change with the covariates, and heteroscedastic error models have therefore attracted much attention. For example, You et al. [8] considered estimation of the parametric and nonparametric parts of partially linear regression models with heteroscedastic errors. Fan et al. [9] constructed empirical likelihood confidence regions for the parameter of the heteroscedastic PLVCEV model. Shen et al. [10] discussed estimation and inference for the PLVC model with heteroscedastic errors. Xu and Duan [11] extended the results of Shen et al. [10] to efficient estimation for the PLVCEV model with heteroscedastic errors.

The works above assume that the responses are completely observed. However, in many practical fields, especially biomedical studies and survival analysis, the response cannot be completely observed because of censoring. Huang and Huang [12, 13] used the empirical likelihood method to construct confidence regions for the parameters of the varying-coefficient single-index model and the partially linear single-index EV model, respectively, under censored data. These results require the censoring indicators to be always observed, but in practice the censoring indicators may be only partially observed. For example, determining whether the death of an individual is attributable to the cause of interest may require information that was not collected or was lost for various reasons [14].

In this paper, we assume that the censoring indicators are missing at random (MAR), which is a common and reasonable assumption in statistical analysis with missing data [15]. There is a considerable literature on missing censoring indicators. For example, Wang and Dinse [16] and Li and Wang [17] proposed weighted least squares estimators of the unknown parameter in the linear regression model and proved their asymptotic normality. Shen and Liang [18] discussed estimation and variable selection for the PLVC quantile regression model. Wang et al. [19] considered composite quantile regression for the linear regression model. However, to the best of our knowledge, no work has studied estimation and confidence regions for heteroscedastic error models with right-censored data when the censoring indicators are MAR.

In this paper, we consider modified profile least squares (PLS) estimators of the unknown parameter and local linear estimators of the coefficient function. Besides point estimation, we are also interested in interval estimation via the empirical likelihood (EL) method. First introduced by Owen [20], EL is a very effective method for constructing confidence regions and has many desirable properties compared with normal approximation-based methods and the bootstrap. Owing to these advantages, EL methods have been studied extensively. For instance, Fan et al. [21] considered penalized EL for the high-dimensional PLVCEV model. Wang and Drton [22] developed EL-based estimation for linear structural equation models with dependent errors. Fan et al. [23] discussed weighted EL for the heteroscedastic varying-coefficient partially nonlinear model with missing data. Zou et al. [24] considered EL inference for the partially linear single-index EV model with missing censoring indicators.

It is worth pointing out that studying the PLVCEV model with heteroscedastic errors when the censoring indicators are MAR is both novel and of interest. We therefore consider estimation based on the modified PLS method and confidence regions based on EL inference. The main aims of this paper are as follows:

1. Define a class of modified PLS estimators of the parameter and local linear estimators of the coefficient function based on regression calibration, imputation, and inverse probability weighted approaches, and prove the asymptotic normality of the proposed estimators.
2. Construct reweighted estimators of the parameter and coefficient function based on estimators of the error variance function, and establish their asymptotic properties.
3. Show that the empirical log-likelihood ratio statistics are asymptotically standard chi-squared, construct confidence regions for the parameter, and derive the asymptotic distribution of the corresponding maximum EL estimators.

Finally, a simulation study and a real data analysis are conducted to demonstrate the finite sample performance of the proposed procedures.

The rest of this paper is organized as follows. In Section 2, we construct modified PLS estimators of the parameter, local linear estimators of the coefficient function, estimators of the error variance function, and reweighted estimators. In Section 3, we propose empirical log-likelihood ratio statistics and maximum EL estimators. The main results are presented in Section 4. Section 5 presents a simulation study and a real data analysis. Section 6 gives some conclusions. The proofs of the main results are given in the Appendix.

#### 2. Methodology

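Suppose that $\{(Y_i, W_i, Z_i, U_i), 1 \le i \le n\}$ is a sample from model (1), that is,

$$Y_i = X_i^{\top}\beta + Z_i^{\top}\alpha(U_i) + \varepsilon_i, \qquad W_i = X_i + e_i, \qquad i = 1, \ldots, n, \tag{2}$$

where the model error satisfies $E(\varepsilon_i \mid X_i, Z_i, U_i) = 0$ and $\operatorname{Var}(\varepsilon_i \mid X_i, Z_i, U_i) = \sigma^2(U_i)$. Here $\sigma^2(\cdot)$ is an unknown function of $U_i$ representing the heteroscedastic error. In practical applications, the response $Y_i$ may be right censored for various reasons. Let $C_i$ be the censoring time with distribution function (df) $G(\cdot)$. One can observe only $T_i = \min(Y_i, C_i)$, with df $H(\cdot)$, and the censoring indicator $\delta_i = I(Y_i \le C_i)$. Define the missing indicator $\xi_i$, which is 0 if $\delta_i$ is missing and 1 otherwise. Throughout this article, we assume that $C_i$ is independent of the response and the covariates, and that $\delta_i$ is MAR. This implies that $\xi_i$ and $\delta_i$ are conditionally independent given the always-observed variables.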

##### 2.1. Modified Profile Least Squares Estimation

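The local linear regression technique is employed to estimate the coefficient function $\alpha(\cdot)$. If $\alpha_j(\cdot)$ has a continuous second derivative at the point $u$, then for $t$ in a small neighborhood of $u$, a Taylor expansion gives

$$\alpha_j(t) \approx \alpha_j(u) + \alpha_j'(u)(t - u) \equiv a_j + b_j(t - u), \qquad j = 1, \ldots, q,$$

where $a_j = \alpha_j(u)$ and $b_j = \alpha_j'(u)$. The local coefficients $\{(a_j, b_j), j = 1, \ldots, q\}$ can then be estimated by minimizing a kernel-weighted local least squares objective function, where $K(\cdot)$ is a kernel function and $h$ is a bandwidth sequence. Because some censoring indicators $\delta_i$ are missing, model (2) cannot be applied directly. Instead, one can replace the missing $\delta_i$ with its conditional expectation given the observed data and define the local linear estimator as the minimizer of the resulting objective function.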

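However, in practice, the conditional expectation $m(\cdot)$ of the censoring indicator given the observed data is usually unknown. Parametric or nonparametric methods can be used to estimate it, but when the covariates are high-dimensional, nonparametric estimation may suffer from the “curse of dimensionality.” Hence, throughout this paper, we assume that $m(\cdot)$ follows a parametric model $m(\cdot, \theta)$, where $\theta$ is an unknown parameter vector. Following Wang and Dinse [16], the estimator $\hat{\theta}$ of $\theta$ can be obtained by maximizing the likelihood function

$$L_n(\theta) = \prod_{i=1}^{n} \{m_i(\theta)\}^{\xi_i \delta_i}\{1 - m_i(\theta)\}^{\xi_i(1 - \delta_i)}.$$

Here $m_i(\theta)$ denotes $m(\cdot, \theta)$ evaluated at the $i$th observation, and $\xi_i = 1$ if $\delta_i$ is observed. Consequently, only the cases with observed censoring indicators contribute to the likelihood.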

Let . Since , we replace with its estimator . Hence, can be estimated by minimizing the following objective function: where is the estimator of , which is defined by , which is the Nadaraya–Watson estimator of with the kernel function and bandwidth sequence .
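The local linear step used in this subsection amounts to a kernel-weighted least squares fit of the local intercepts and slopes of the coefficient functions. The following is a minimal Python sketch under several assumptions:

- a fully observed working response $R_i = Z_i^{\top}\alpha(U_i) + \text{error}$ (for example, a response from which the parametric part has already been removed);
- an Epanechnikov kernel;
- a fixed bandwidth.

The censoring and missing-indicator adjustments of the paper are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel 0.75 * (1 - t^2) on [-1, 1]; an assumed choice of K."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def local_linear_vc(u0, U, Z, R, h):
    """Local linear estimate of alpha(u0) in R_i = Z_i^T alpha(U_i) + error.

    Minimizes sum_i [R_i - sum_j {a_j + b_j (U_i - u0)} Z_ij]^2 K_h(U_i - u0)
    over (a, b) and returns a_hat, the estimate of alpha(u0)."""
    q = Z.shape[1]
    d = U - u0
    D = np.hstack([Z, Z * d[:, None]])         # local design: (Z_i, Z_i (U_i - u0))
    w = epanechnikov(d / h) / h                # K_h(U_i - u0)
    DtW = D.T * w                              # D^T K without forming the n x n diagonal
    coef = np.linalg.solve(DtW @ D, DtW @ R)   # weighted least squares normal equations
    return coef[:q]                            # coef[q:] would estimate alpha'(u0)
```

Evaluating `local_linear_vc` over a grid of `u0` values traces out the estimated coefficient curves.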

For notational simplicity, let , , ,

If is known, one can obtain the local linear estimator of coefficient function by

Substituting (11) into the original model (8) and eliminating the bias produced by the measurement error, we obtain the following modified PLS estimator of the parameter based on the regression calibration method:

Then, the local linear regression estimator of is defined as follows:

Let . Since under the missing mechanism, we can impute with in expression (6). Hence, can be estimated by minimizing the following objective function:

If is known, one can obtain the local linear estimator of the coefficient function by , where . Substituting (21) into the original model and eliminating the bias produced by the measurement error, we obtain the following modified PLS estimator of based on the imputation method:

Thus, the local linear regression estimator of is defined as follows:

Let . Note that under the MAR assumption. Hence, we substitute with , where is a nonparametric estimator of with kernel function and bandwidth sequence . Then can be estimated by minimizing the following objective function:

If is known, one can obtain the local linear estimator of the coefficient function by , where . Substituting (19) into the original model and eliminating the bias produced by the measurement error, we obtain the following modified PLS estimator of based on the inverse probability weighted method:

Hence, the local linear regression estimator of is defined as follows:

##### 2.2. Estimation for Error Variance

To improve the estimation of the parametric and nonparametric parts, in this subsection we construct local linear estimators of the error variance function. Note that . By minimizing the following objective function with respect to , the local linear regression estimator of based on the regression calibration method is defined by , where the weight function is defined by , with

Note that

By minimizing the following objective function with respect to , the local linear regression estimator of based on the imputation method is defined by , where the weight function is defined by , with

Note that

By minimizing the following objective function with respect to , the local linear regression estimator of based on the inverse probability weighted method is defined by , where the weight function is defined by , with

##### 2.3. Reweighted Estimation

In this subsection, we construct reweighted estimators of the parametric and nonparametric parts based on the error variance estimator given in (23). By minimizing the following objective function, we obtain the following reweighted estimator of based on the regression calibration method:

Furthermore, the reweighted estimator of the coefficient function is defined by

Similarly, based on the error variance estimator given in (28), minimizing the following objective function gives the reweighted estimator of based on the imputation method:

Hence, the reweighted estimator of the coefficient function is defined by

Likewise, based on the error variance estimator given in (33), minimizing the following objective function gives the reweighted estimator of based on the inverse probability weighted method:

Thus, the reweighted estimator of the coefficient function is defined by

#### 3. Empirical Likelihood

Confidence regions for the parameter can be constructed from the asymptotic distributions in Theorems 1 and 4. However, estimating the asymptotic covariance is quite complicated. In this section, we employ the EL method to construct confidence regions for the parameter, which avoids estimating this complicated covariance.
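All three EL constructions below share one computational core. For a candidate parameter value, the weights $p_i$ that maximize $\prod_i np_i$ subject to $\sum_i p_i = 1$ and $\sum_i p_i\eta_i = 0$ are $p_i = 1/\{n(1 + \lambda^{\top}\eta_i)\}$, where the Lagrange multiplier $\lambda$ solves $n^{-1}\sum_i \eta_i/(1 + \lambda^{\top}\eta_i) = 0$ (Owen [20]). In the sketch below, $\eta_i$ is a generic stand-in for the method-specific auxiliary vectors of Sections 3.1 to 3.3, supplied as rows of an array. The sketch solves for $\lambda$ by a damped Newton iteration (one common choice, not necessarily the authors') and returns $-2\log R = 2\sum_i \log(1 + \lambda^{\top}\eta_i)$. The function name is hypothetical.

```python
import numpy as np

def el_log_ratio(eta, max_iter=100, tol=1e-12):
    """Empirical likelihood statistic -2 log R for the constraint sum_i p_i eta_i = 0.

    eta : (n, r) array whose i-th row is the auxiliary vector eta_i at a candidate value.
    Returns (2 * sum_i log(1 + lambda^T eta_i), p), where lambda solves
    (1/n) sum_i eta_i / (1 + lambda^T eta_i) = 0 and p_i = 1 / {n (1 + lambda^T eta_i)}."""
    eta = np.asarray(eta, dtype=float)
    n, r = eta.shape
    lam = np.zeros(r)
    for _ in range(max_iter):
        denom = 1.0 + eta @ lam
        grad = (eta / denom[:, None]).sum(axis=0)        # multiplier equation (times n)
        hess = -(eta / denom[:, None] ** 2).T @ eta       # its Jacobian, negative definite
        step = np.linalg.solve(hess, -grad)               # Newton step
        t = 1.0
        # damp the step so every 1 + lambda^T eta_i stays above 1/n (keeps 0 < p_i < 1)
        while np.any(1.0 + eta @ (lam + t * step) <= 1.0 / n):
            t /= 2.0
        lam = lam + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    denom = 1.0 + eta @ lam
    return 2.0 * np.sum(np.log(denom)), 1.0 / (n * denom)
```

At a value where the $\eta_i$ average exactly to zero, $\lambda = 0$, every $p_i = 1/n$, and the statistic is zero. Elsewhere the statistic is positive and, for large $n$, close to the quadratic form $n\bar{\eta}^{\top}S^{-1}\bar{\eta}$.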

##### 3.1. Regression Calibration Empirical Likelihood

We introduce the following auxiliary random vector based on regression calibration method:

Thus, we define the empirical log-likelihood ratio function as follows:

The optimal value of satisfying (46) is given by , where is the solution to the equation . By the Lagrange multiplier method, the corresponding empirical log-likelihood ratio function is represented as

By maximizing , we can obtain a maximum EL estimator of with regression calibration method.

##### 3.2. Imputation Empirical Likelihood

We introduce the following auxiliary random vector based on imputation method:

Hence, we define the empirical log-likelihood ratio function as follows:

The optimal value of satisfying (49) is given by , where is the solution to the equation . By the Lagrange multiplier method, the corresponding empirical log-likelihood ratio function is

By maximizing , we can obtain a maximum EL estimator of with imputation method.

##### 3.3. Inverse Probability Weighted Empirical Likelihood

We introduce the following auxiliary random vector based on inverse probability weighted method:

Then, we define the empirical log-likelihood ratio function as follows:

The optimal value of satisfying (52) is given by , where is the solution to the equation . By the Lagrange multiplier method, the corresponding empirical log-likelihood ratio function is represented as

By maximizing , we can obtain a maximum EL estimator of with inverse probability weighted method.
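The three EL versions above, like the three estimators of Section 2.1, differ only in how a missing censoring indicator is handled. The sketch below uses the forms that are standard in the missing-censoring-indicator literature (e.g., Wang and Dinse [16]); they are assumed here rather than taken from the displays of Section 2.1. With $\xi_i = 1$ when $\delta_i$ is observed:

- regression calibration (CA) uses an estimate $\hat m_i$ of $P(\delta_i = 1 \mid \text{observed data})$;
- imputation (IM) uses $\xi_i\delta_i + (1 - \xi_i)\hat m_i$;
- inverse probability weighting (IPW) uses $\xi_i\delta_i/\hat\pi_i$, where $\hat\pi_i$ estimates $P(\xi_i = 1 \mid \text{observed data})$.

For illustration only, $m$ follows a logistic working model fitted by maximum likelihood on the cases with $\xi_i = 1$. The estimate $\hat\pi_i$ is a Nadaraya–Watson estimate with a Gaussian kernel that depends on $T_i$ alone. The helper names are hypothetical.

```python
import numpy as np

def expit(s):
    return 1.0 / (1.0 + np.exp(-s))

def fit_logistic_complete_cases(V, delta, xi, n_iter=50, tol=1e-10):
    """ML fit of an assumed logistic working model m(v, theta) = expit(v^T theta)
    for P(delta = 1 | V), using only cases whose censoring indicator is observed."""
    obs = xi == 1
    Vo, do = V[obs], delta[obs]
    theta = np.zeros(V.shape[1])
    for _ in range(n_iter):
        m = expit(Vo @ theta)
        # Newton-Raphson: information matrix and score of the complete-case log-likelihood
        step = np.linalg.solve((Vo.T * (m * (1 - m))) @ Vo, Vo.T @ (do - m))
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def nadaraya_watson(t_eval, t, y, b):
    """Nadaraya-Watson estimate of E(y | t) at t_eval (Gaussian kernel, bandwidth b)."""
    K = np.exp(-0.5 * ((t_eval[:, None] - t[None, :]) / b) ** 2)
    return K @ y / K.sum(axis=1)

def delta_surrogates(delta, xi, m_hat, pi_hat):
    """CA: m_hat;  IM: xi*delta + (1 - xi)*m_hat;  IPW: xi*delta/pi_hat.
    Entries of delta with xi == 0 are never used (they may be NaN)."""
    obs = xi == 1
    d = np.where(obs, delta, 0.0)
    ipw = np.zeros_like(m_hat)
    ipw[obs] = d[obs] / pi_hat[obs]
    return m_hat, np.where(obs, d, m_hat), ipw
```

Under MAR, each surrogate has the same conditional mean as $\delta_i$ (exactly when $m$ and $\pi$ are known, approximately once they are estimated). This is what lets any of them stand in for $\delta_i$.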

#### 4. Main Results

For convenience and simplicity, we use and generically to represent any positive constants, which may take different values for each appearance. Let , , , and . Denote

The following assumptions are used to establish the main results.

(C1) The random variable has bounded support, and its density function is Lipschitz continuous and bounded away from zero on its support.

(C2) There is such that ., ., . and are nonsingular matrices.

(C3) has continuous second derivatives in .

(C4) The variance function is uniformly bounded, has a continuous second-order derivative, and is bounded away from zero.

(C5) The kernel is a symmetric density function with compact support , is Lipschitz continuous, and satisfies .

(C6) Denote and . Let and is continuous. is continuous for .

(C7) The kernel functions and are bounded with bounded compact supports, and , , and .

(C8) and have bounded derivatives of order 1, and there exists such that .

(C9) is positive definite. is continuous at .

(C10) The bandwidth satisfies , , and for .

*Remark 1.* (a) Assumptions (C1) and (C2) are used to establish the asymptotic normality and the oracle property of the estimators. Assumptions (C3) and (C4) are common conditions for varying-coefficient models with heteroscedastic errors. Assumption (C5) requires the kernel function to be a proper density with finite second moment, which is needed to derive the asymptotic variance of the estimators. Assumption (C6) implies that is bounded away from zero. Assumptions (C7)–(C9) are needed for the properties of and . Assumption (C10) specifies the relationship between the bandwidth and the sample size , and it allows the optimal bandwidth in nonparametric estimation.

(b) From the Taylor expansion and the conclusion in Li and Wang [17], one can obtain , which, together with assumption (C9), gives .

The asymptotic properties of the proposed estimators are shown in the following theorems.

Theorem 1. *Suppose that assumptions (C1)–(C10) are satisfied; then, we havewhere is taken to be , and . correspond to , and , respectively.*
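Theorem 1 concerns the modified PLS estimators of Section 2.1. The bias correction behind the word "modified" can be seen in a stripped-down setting:

- complete (uncensored) data;
- a known measurement-error covariance $\Sigma_e$;
- a profile step written with a local linear smoother matrix $S$.

The sketch subtracts $n\Sigma_e$ from the profiled Gram matrix, which is the usual attenuation correction for additive measurement error. The paper's estimators additionally carry the censoring and missing-indicator adjustments, which are omitted here, and all function names are hypothetical.

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def smoother_matrix(U, Z, h):
    """n x n matrix S whose i-th row maps a vector v to Z_i^T alpha_hat(U_i),
    the local linear fit of v on Z at U_i."""
    n, q = Z.shape
    S = np.empty((n, n))
    for i in range(n):
        d = U - U[i]
        D = np.hstack([Z, Z * d[:, None]])
        DtW = D.T * (epanechnikov(d / h) / h)
        A = np.linalg.solve(DtW @ D, DtW)[:q]    # q x n: rows giving alpha_hat(U_i)
        S[i] = Z[i] @ A
    return S

def corrected_pls(Y, W, Z, U, h, Sigma_e):
    """Profile least squares for beta, without and with the measurement-error correction."""
    n = len(Y)
    S = smoother_matrix(U, Z, h)
    Wt, Yt = W - S @ W, Y - S @ Y                # (I - S) W and (I - S) Y
    G = Wt.T @ Wt
    naive = np.linalg.solve(G, Wt.T @ Yt)
    corrected = np.linalg.solve(G - n * Sigma_e, Wt.T @ Yt)   # remove the E(e e^T) bias
    return naive, corrected
```

For instance, suppose $\operatorname{Var}(X) = I$, $\Sigma_e = 0.25I$, and $X$ is independent of $(Z, U)$. Ignoring smoothing effects, the uncorrected estimator then converges to $(I + \Sigma_e)^{-1}\beta = 0.8\beta$, which is the attenuation that the correction removes.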

Theorem 2. *Suppose that assumptions (C1)–(C10) are satisfied; then, we havewhere is taken to be , and . correspond to , and , respectively.*

Theorem 3. *Suppose that assumptions (C1)–(C10) are satisfied; let ; then, we havewhere is taken to be one of , and .*
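Theorem 3 concerns the local linear estimators of the error variance function from Section 2.2. The following is a minimal sketch of such an estimator. It assumes residuals $r_i = Y_i - W_i^{\top}\beta - Z_i^{\top}\alpha(U_i)$ computed from a fully observed response. Since $r_i = \varepsilon_i - e_i^{\top}\beta$ and $e_i$ is independent of $(X_i, Z_i, U_i, \varepsilon_i)$ with mean zero, $E(r_i^2 \mid U_i) = \sigma^2(U_i) + \beta^{\top}\Sigma_e\beta$. The local linear smooth of $r_i^2$ is therefore shifted by $\beta^{\top}\Sigma_e\beta$. The weights are the usual local linear weights $W_{ni}(u)$. The Epanechnikov kernel, the bandwidth, and the function names are assumptions, and the censoring adjustments of the paper are omitted.

```python
import numpy as np

def local_linear_weights(u0, U, h):
    """Local linear weights W_ni(u0) = K_h(U_i - u0){S_2 - (U_i - u0) S_1} / (S_0 S_2 - S_1^2),
    with S_l = sum_j K_h(U_j - u0)(U_j - u0)^l and an Epanechnikov kernel."""
    d = U - u0
    k = np.where(np.abs(d) <= h, 0.75 * (1.0 - (d / h) ** 2), 0.0) / h
    S0, S1, S2 = k.sum(), (k * d).sum(), (k * d ** 2).sum()
    return k * (S2 - d * S1) / (S0 * S2 - S1 ** 2)

def variance_function(u0, U, resid, h, beta, Sigma_e):
    """sigma^2(u0): smooth the squared residuals, then remove beta^T Sigma_e beta."""
    return local_linear_weights(u0, U, h) @ resid ** 2 - beta @ Sigma_e @ beta
```

The weights sum to one and reproduce linear functions of $U$ exactly, which is why the local linear fit has no first-order bias, including near the boundary.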

Theorem 4. *Suppose that assumptions (C1)–(C10) are satisfied; then, we havewhere is taken to be one of , and . correspond to , and , respectively.*

Theorem 5. *Suppose that assumptions (C1)–(C10) are satisfied; then, we havewhere is taken to be one of , and . correspond to , and , respectively.*

Theorem 6. *Suppose that assumptions (C1)–(C10) are satisfied; if is the true value, then we havewhere denotes one of , , and . is a standard chi-squared random variable with 1 degree of freedom.*
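Theorem 6 is what turns the EL statistic into a confidence region: a parameter value is retained when its statistic does not exceed the chi-squared critical value. The following minimal sketch for a scalar parameter rests on several assumptions:

- the hypothetical estimating function $\eta_i(\mu) = X_i - \mu$ for a mean, a stand-in for the paper's auxiliary vectors;
- grid search for the inversion;
- a 95% level.

In the scalar case, the multiplier equation is monotone in $\lambda$. Bisection on the interval implied by $0 \le p_i \le 1$ therefore suffices. The function names are hypothetical.

```python
import numpy as np

CHI2_1_UPPER_05 = 3.841458820694124   # upper 5% point of the chi-squared(1) distribution

def el_stat_scalar(eta, n_bisect=100):
    """-2 log R for a scalar estimating function eta_i, with lambda found by bisection."""
    eta = np.asarray(eta, dtype=float)
    n = eta.size
    if eta.min() >= 0.0 or eta.max() <= 0.0:
        return np.inf                        # 0 not inside the convex hull: R = 0
    # p_i <= 1 forces 1 + lambda * eta_i >= 1/n, which brackets lambda
    lo = (1.0 / n - 1.0) / eta.max()
    hi = (1.0 / n - 1.0) / eta.min()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.sum(eta / (1.0 + mid * eta)) > 0.0:   # the sum is decreasing in lambda
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * eta))

def el_interval(x, grid):
    """Grid points mu whose EL statistic for E(X) = mu is below the chi-squared(1) cutoff."""
    stats = np.array([el_stat_scalar(x - mu) for mu in grid])
    inside = grid[stats <= CHI2_1_UPPER_05]
    return inside.min(), inside.max()
```

No variance estimate enters the interval: its width is determined by the data through the likelihood ratio. This is the practical advantage over normal approximation noted in Section 3.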

Theorem 7. *Suppose that assumptions (C1)–(C10) are satisfied; then, we havewhere denotes one of , and . correspond to , and , respectively.*

*Remark 2.* (a) From Theorems 1 and 4, the asymptotic variance of the reweighted estimator is not greater than that of the modified profile LS estimator ; that is, is a positive semidefinite matrix. The asymptotic variance of the reweighted estimator is smaller than that of , and is larger than that of , which indicates that performs the best and performs the worst. The same conclusion holds for the modified PLS estimators , .

(b) From Theorems 2 and 5, the local linear estimator and the reweighted estimator have the same asymptotic distribution, which reflects a characteristic of local regression in nonparametric models.

(c) From Theorem 6, the EL confidence region for can be constructed as , where is the upper -quantile of the distribution of .
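Remark 2(a) states that reweighting by the inverse variance function cannot increase the asymptotic variance. The mechanism is already visible in the simplest heteroscedastic linear model. The Monte Carlo sketch below compares ordinary and inverse-variance-weighted least squares using the true $\sigma^2(U)$, with an assumed variance function. It therefore illustrates the principle rather than the paper's reweighted PLS estimators, which plug in an estimate of $\sigma^2(\cdot)$. The function names are hypothetical.

```python
import numpy as np

def wls(Xd, y, w):
    """Weighted least squares: argmin_b sum_i w_i (y_i - Xd_i^T b)^2."""
    Xw = Xd.T * w
    return np.linalg.solve(Xw @ Xd, Xw @ y)

def compare_efficiency(n=200, reps=300, seed=0):
    """Monte Carlo variances of unweighted vs inverse-variance-weighted least squares."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0])
    est_ols, est_wls = [], []
    for _ in range(reps):
        U = rng.uniform(0.0, 1.0, n)
        Xd = np.column_stack([np.ones(n), rng.normal(size=n)])
        sigma2 = (0.2 + 2.0 * U) ** 2                 # assumed variance function sigma^2(U)
        y = Xd @ beta + np.sqrt(sigma2) * rng.normal(size=n)
        est_ols.append(wls(Xd, y, np.ones(n)))        # ignores the heteroscedasticity
        est_wls.append(wls(Xd, y, 1.0 / sigma2))      # reweights by 1 / sigma^2(U_i)
    return np.var(est_ols, axis=0), np.var(est_wls, axis=0)
```

With this variance function, $E\{\sigma^2(U)\} \approx 1.77$ while $1/E\{\sigma^{-2}(U)\} \approx 0.44$. The weighted estimator's variance should therefore be roughly a quarter of the unweighted one.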

#### 5. Simulation

In this section, we carry out numerical simulations to investigate the finite sample behavior of the proposed estimators. We compare the estimators based on the regression calibration method (CA), the imputation method (IM), and the inverse probability weighted method (IPW), together with their reweighted counterparts (R-CA, R-IM, R-IPW). In addition, we compare the EL method with the normal approximation (NA) approach in terms of coverage probabilities (CP) and average interval lengths (AL) under different settings. A real data analysis is also given. The kernel functions are taken as , and . The bandwidths , and are set to the same value, selected by leave-one-out cross-validation. The simulation is based on 500 replications, with sample sizes of 100 and 400.
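The leave-one-out cross-validation used to pick the bandwidths can be sketched for a single scalar local linear smoother. The Epanechnikov kernel, the bandwidth grid, and the function names are assumptions; the paper's kernel choices are not reproduced here. In the paper's setting, the criterion would be applied to the working responses of the estimators above rather than to a fully observed $y$.

```python
import numpy as np

def loclin_fit(u0, U, y, h):
    """Local linear estimate of E(y | U = u0) with an Epanechnikov kernel."""
    d = U - u0
    k = np.where(np.abs(d) <= h, 0.75 * (1.0 - (d / h) ** 2), 0.0)
    S0, S1, S2 = k.sum(), (k * d).sum(), (k * d ** 2).sum()
    return (k * (S2 - d * S1)) @ y / (S0 * S2 - S1 ** 2)

def loo_cv_score(U, y, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        errs[i] = y[i] - loclin_fit(U[i], U[keep], y[keep], h)
    return np.mean(errs ** 2)

def select_bandwidth(U, y, grid):
    """Bandwidth in grid minimizing the LOO-CV score (undefined fits count as +inf)."""
    scores = []
    with np.errstate(invalid="ignore", divide="ignore"):
        for h in grid:
            s = loo_cv_score(U, y, h)
            scores.append(s if np.isfinite(s) else np.inf)
    return grid[int(np.argmin(scores))], scores
```

Too small a bandwidth leaves too few points in each window once observation $i$ is removed, and too large a bandwidth oversmooths. The cross-validation score penalizes both.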

##### 5.1. Simulation Experiments

The data are generated from the following PLVCEV model: where , , , the covariates and are from and pairwise covariance