A Least Squares Method for Variance Estimation in Heteroscedastic Nonparametric Regression
Interest in variance estimation in nonparametric regression has grown greatly in the past several decades. Among the existing methods, the least squares estimator in Tong and Wang (2005) is shown to have nice statistical properties and is also easy to implement. Nevertheless, their method only applies to regression models with homoscedastic errors. In this paper, we propose two least squares estimators for the error variance in heteroscedastic nonparametric regression: the intercept estimator and the slope estimator. Both estimators are shown to be consistent and their asymptotic properties are investigated. Finally, we demonstrate through simulation studies that the proposed estimators perform better than the existing competitor in various settings.
Consider the nonparametric regression model where are observations, are design points with , is an unknown mean function, and are independent random errors with mean zero and variance , respectively. In the special case when are all the same, model (1) reduces to a homoscedastic nonparametric regression. In this paper, we are interested in estimating the variance in the situation when are not all the same but known constants. Note that such a setting can arise in various situations. As an illustration, we consider a regression model with repeated observations on design points , respectively, where the measurement errors are normal. If in practice we only report the average values on each design point, we have the new model as , where with .
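The repeated-observation example can be checked numerically. The sketch below assumes normal measurement errors with a common standard deviation `sigma` and hypothetical replicate counts `reps` (neither value is taken from the paper). It simulates the averaged errors at each design point and compares their empirical variances with sigma^2 / n_i, which is what makes the averaged model heteroscedastic with known weights.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 2.0                    # common measurement-error standard deviation (assumed value)
reps = np.array([1, 4, 16])    # hypothetical numbers of repeated observations n_i
n_sim = 50_000                 # Monte Carlo replications per design point

# For each design point, average n_i normal errors and record the
# empirical variance of that average across the replications.
empirical = np.array([
    rng.normal(0.0, sigma, size=(n_sim, r)).mean(axis=1).var()
    for r in reps
])
theoretical = sigma**2 / reps  # variance of an average of n_i i.i.d. errors

for r, e, t in zip(reps, empirical, theoretical):
    print(f"n_i = {r:2d}: empirical {e:.3f}, theoretical {t:.3f}")
```

Design points with more replicates have proportionally smaller error variance, so the weights are known as soon as the replicate counts are reported.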
Needless to say, an accurate estimate of variance is important in nonparametric regression. For instance, it is required in constructing confidence bands, in choosing the amount of smoothing, in testing the goodness of fit, and in estimating the detection limits of immunoassay [1–8]. In the past several decades, researchers have proposed many methods for estimating , especially when the regression model is homoscedastic. Among the existing methods, one popular class is referred to as difference-based estimators. The first-order difference-based estimator was proposed in Rice , Assume that is a Lipschitz continuous function and . Note that as . Therefore, is an asymptotically unbiased estimator of . Since then, many difference-based estimators have been proposed in the literature. For instance, Gasser et al.  proposed a second-order difference-based estimator. Hall et al.  proposed an th-order difference-based estimator with a finite number. Other significant works include Dette et al. , Müller et al. , Tong et al. , Du and Schick , and Wang et al. , among others. Furthermore, Brown and Levine , Wang et al. , and Cai and Wang  considered the difference-based kernel and wavelet estimators for the variance function in nonparametric regression. Note that the difference-based estimators do not require an estimate of the mean function and so are popular in practice.
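As an illustration of the first-order difference-based idea, the following sketch implements the estimator in its standard form, the sum of squared successive differences divided by 2(n - 1). The mean function, sample size, and noise level in the demonstration are hypothetical choices, and the errors are homoscedastic.

```python
import numpy as np

def rice_estimator(y):
    """First-order difference-based variance estimate:
    sum of squared successive differences divided by 2(n - 1)."""
    y = np.asarray(y, dtype=float)
    diffs = np.diff(y)  # y_i - y_{i-1}, i = 2, ..., n
    return np.sum(diffs**2) / (2 * (len(y) - 1))

# Demonstration on a smooth mean function with homoscedastic normal errors.
rng = np.random.default_rng(1)
n = 2000
x = np.arange(1, n + 1) / n   # equally spaced design
sigma = 0.5                   # true error standard deviation (assumed value)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, size=n)

estimate = rice_estimator(y)
print(f"estimate = {estimate:.4f}, true variance = {sigma**2:.4f}")
```

No estimate of the mean function is needed: differencing neighbouring observations removes most of the smooth signal and leaves a quantity whose expectation is close to twice the error variance.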
As a variation of difference-based estimation, Tong and Wang proposed a least squares estimator of $\sigma^2$. Let the lag-$k$ Rice estimator be $$s_k = \frac{1}{2(n-k)} \sum_{i=k+1}^{n} (y_i - y_{i-k})^2 .$$ For the equally spaced design with $x_i = i/n$, it can be shown that for any $k = o(n)$, $$E(s_k) = \sigma^2 + J d_k + o(d_k),$$ where $J = \int_0^1 \{g'(x)\}^2 \, dx / 2$ and $d_k = k^2/n^2$. This indicates that the lag-$k$ Rice estimators are always positively biased estimators of $\sigma^2$, especially when the sample size is small. To reduce the bias, Tong and Wang regressed $s_k$ on $d_k$ using a simple linear regression and then estimated $\sigma^2$ as the intercept. The least squares estimator achieves the asymptotically optimal rate that is usually possessed by residual-based estimators only. In addition, Tong et al. established the asymptotic normality and also demonstrated the efficiency of the least squares estimator. We also note that Park et al. investigated the least squares method in small sample nonparametric regression via a local quadratic approximation to determine the regressor and weights.
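The least squares idea for the homoscedastic case can be sketched as follows. The code assumes an equally spaced design, regressor k^2/n^2, and regression weights proportional to n - k (the number of pairs entering each lag-k estimator); the function names are illustrative and not taken from the paper.

```python
import numpy as np

def lag_k_rice(y, k):
    """Lag-k Rice estimator: mean squared lag-k difference divided by 2."""
    y = np.asarray(y, dtype=float)
    d = y[k:] - y[:-k]
    return np.sum(d**2) / (2 * (len(y) - k))

def least_squares_variance(y, m):
    """Regress the lag-k Rice estimators (k = 1..m) on k^2/n^2 by weighted
    least squares; return (intercept, slope). The intercept estimates the
    error variance in the homoscedastic model."""
    n = len(y)
    ks = np.arange(1, m + 1)
    s = np.array([lag_k_rice(y, k) for k in ks])
    d = ks**2 / n**2
    v = (n - ks) / np.sum(n - ks)            # weights proportional to number of pairs
    d_bar, s_bar = np.sum(v * d), np.sum(v * s)
    slope = np.sum(v * (d - d_bar) * s) / np.sum(v * (d - d_bar)**2)
    return s_bar - slope * d_bar, slope

# Noise-free check with a linear mean function g(x) = 3x on x_i = i/n:
# every lag-k estimator equals 4.5 * k^2/n^2, so the fitted intercept is 0.
n = 50
x = np.arange(1, n + 1) / n
intercept, slope = least_squares_variance(3.0 * x, m=7)
print(intercept, slope)
```

The noise-free check makes the bias correction visible: each lag-k estimator alone is positive even though the error variance is zero, while the fitted intercept removes that bias.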
The aforementioned methods have significantly advanced our understanding of difference-based estimation of the error variance. Nevertheless, most of them, including the least squares method, apply only to nonparametric regression models with homoscedastic errors. In practice, it is not uncommon for the errors to have different variances. In such situations, we note that the bias term of the least squares estimator in Tong and Wang is significantly enlarged; for more details, see Sections 2 and 3. Motivated by this, we propose two adaptive least squares estimators for the error variance in heteroscedastic nonparametric regression.
The remainder of this paper is organized as follows. In Section 2, we propose two least squares estimators for the error variance: the intercept estimator and the slope estimator. In Section 3, we investigate the asymptotic properties of the proposed estimators and present some theoretical results, including the asymptotic normality of the estimators. In Section 4, we conduct simulation studies to evaluate the proposed estimators and compare them with the existing competitor in the literature. We then conclude the paper in Section 5 with a brief discussion and provide the technical proofs in Section 6.
For model (1), without loss of generality, we assume that . In matrix notation, the model is written as where , , and . The covariance matrix of is , where When , namely, for all , it reduces to the homoscedastic setting in Tong and Wang. In this paper, we assume that the values are not all the same.
For this setting, one naive approach is to apply the transformation . Through this transformation the errors become homoscedastic. At the same time, however, the transformed mean function is no longer Lipschitz continuous. Specifically, if and , the difference will not be negligible when tends to zero. As a consequence, the difference-based methods do not apply in such situations.
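A small numerical illustration of why rescaling fails is given below. It assumes that the transformation divides each observation by the square root of its known weight, and it uses a hypothetical smooth mean function g(x) = 1 + x with a single weight equal to 4; both are assumptions made for this sketch. Adjacent differences of the original mean function shrink as n grows, whereas those of the rescaled mean function do not.

```python
import numpy as np

def max_adjacent_gaps(n, big_weight=4.0):
    """Largest adjacent difference of the mean function before and after
    dividing by sqrt(w_i), with one weight different from the rest."""
    x = np.arange(1, n + 1) / n
    g = 1.0 + x                      # hypothetical smooth mean function
    w = np.ones(n)
    w[n // 2] = big_weight           # a single design point with a larger variance weight
    raw_gap = np.max(np.abs(np.diff(g)))
    transformed_gap = np.max(np.abs(np.diff(g / np.sqrt(w))))
    return raw_gap, transformed_gap

raw_small, trans_small = max_adjacent_gaps(100)
raw_large, trans_large = max_adjacent_gaps(10_000)
print(f"n = 100:    raw {raw_small:.5f}, transformed {trans_small:.5f}")
print(f"n = 10000:  raw {raw_large:.5f}, transformed {trans_large:.5f}")
```

The jump in the rescaled mean function stays at roughly g(x)(1 - 1/2) no matter how fine the design is, so squared differences no longer isolate the error variance.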
To proceed, we reconsider the lag- Rice estimator defined in Tong and Wang. Suppose that has a bounded first derivative. For model (4), the expectation of the lag- Rice estimator is where , , and . Note that when . Therefore, for model (4) with heteroscedastic errors, it is not guaranteed that is an asymptotically unbiased estimator of .
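The effect of unequal variances on the lag-k Rice estimator can be worked out directly. The derivation below is a sketch under the assumption that the i-th error has mean zero and variance $w_i \sigma^2$ with known $w_i$, and that the errors are independent; the symbols $B_k$ and $c_k$ are notation introduced here for illustration and may differ from the paper's.

```latex
\begin{align*}
E(s_k) &= \frac{1}{2(n-k)} \sum_{i=k+1}^{n} E\,(y_i - y_{i-k})^2 \\
       &= \frac{1}{2(n-k)} \sum_{i=k+1}^{n} \{g(x_i) - g(x_{i-k})\}^2
          + \frac{\sigma^2}{2(n-k)} \sum_{i=k+1}^{n} (w_i + w_{i-k}) \\
       &= B_k + c_k \sigma^2,
\qquad
c_k = \frac{1}{2(n-k)} \sum_{i=k+1}^{n} (w_i + w_{i-k}).
\end{align*}
```

Here $B_k$ is the mean-function term, which is of order $k^2/n^2$ for an equally spaced design when the mean function has a bounded first derivative. When all $w_i = 1$ we get $c_k = 1$ and recover the homoscedastic expansion; otherwise $c_k$ generally differs from 1 and varies with $k$, so the lag-k Rice estimator need not be asymptotically unbiased for $\sigma^2$.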
In what follows, we develop two new estimators for : the first method estimates as the intercept and the second method estimates as the slope. For the first method, we let and . Then, for any , we have Now treating as the response variable and as the independent variable, we fit the following simple linear regression and estimate as the fitted intercept, where are the random errors and is the total number of pairs used for the fit. Since involves pairs of differences, we assign weights , where , to the response variable . We then fit the linear model (8) by weighted least squares, minimizing the weighted sum of squares . Specifically, the estimated error variance is where , , and . Let and The quadratic form of can be represented as , where is a symmetric matrix with for , for , and otherwise.
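One way to read the first method is sketched below. It assumes error variances w_i * sigma^2 with known w_i, rescales both the lag-k Rice estimator and the regressor k^2/n^2 by c_k (the average of w_i + w_{i-k} over the pairs, divided by 2), and uses weights proportional to n - k. The names `lag_quantities` and `intercept_estimator`, and the exact rescaling, are assumptions of this sketch and not necessarily the paper's definitions.

```python
import numpy as np

def lag_quantities(y, w, m):
    """Return s_k, c_k, d_k and regression weights v_k for lags k = 1..m."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(y)
    ks = np.arange(1, m + 1)
    s = np.array([np.sum((y[k:] - y[:-k]) ** 2) / (2 * (n - k)) for k in ks])
    c = np.array([np.sum(w[k:] + w[:-k]) / (2 * (n - k)) for k in ks])
    d = ks ** 2 / n ** 2
    v = (n - ks) / np.sum(n - ks)
    return s, c, d, v

def intercept_estimator(y, w, m):
    """Weighted least squares intercept from regressing s_k/c_k on d_k/c_k."""
    s, c, d, v = lag_quantities(y, w, m)
    t, h = s / c, d / c
    t_bar, h_bar = np.sum(v * t), np.sum(v * h)
    slope = np.sum(v * (h - h_bar) * t) / np.sum(v * (h - h_bar) ** 2)
    return t_bar - slope * h_bar

# Monte Carlo check with a linear mean function and known unequal weights.
rng = np.random.default_rng(2)
n, m, sigma = 200, 14, 1.0
x = np.arange(1, n + 1) / n
w = rng.uniform(0.5, 1.5, size=n)      # hypothetical known weights
estimates = np.array([
    intercept_estimator(2.0 * x + np.sqrt(w) * rng.normal(0.0, sigma, size=n), w, m)
    for _ in range(300)
])
mean_estimate = estimates.mean()
print(f"average estimate = {mean_estimate:.3f}, true variance = {sigma**2:.3f}")
```

With a linear mean function the rescaled response has expectation exactly sigma^2 plus a multiple of the rescaled regressor, so the fitted intercept is unbiased in this sketch, which matches the behaviour the text describes for the intercept estimator.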
For the second method, we fit the linear regression with two independent variables and and with no intercept term. Specifically, we fit where are the random errors associated with the linear regression. We then estimate as the fitted slope . For ease of notation, let . By minimizing the weighted sum of squares , we have the second estimator of as Let and It is easy to verify that has the quadratic form , where is a symmetric matrix with for , for , and otherwise.
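The second method can be sketched in the same spirit. The code assumes that the two regressors are c_k (the average of w_i + w_{i-k} over the pairs, divided by 2) and d_k = k^2/n^2, that the weights are proportional to n - k, and that the error variance is the fitted coefficient of c_k. These identifications and the function names are assumptions made for this illustration.

```python
import numpy as np

def fit_no_intercept(s, c, d, v):
    """Weighted least squares fit of s_k = alpha * c_k + beta * d_k
    without an intercept; returns (alpha, beta)."""
    X = np.column_stack([c, d])
    XtV = X.T * v                        # X^T V with V = diag(v)
    alpha, beta = np.linalg.solve(XtV @ X, XtV @ s)
    return alpha, beta

def slope_estimator(y, w, m):
    """Estimate the error variance as the coefficient of c_k."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(y)
    ks = np.arange(1, m + 1)
    s = np.array([np.sum((y[k:] - y[:-k]) ** 2) / (2 * (n - k)) for k in ks])
    c = np.array([np.sum(w[k:] + w[:-k]) / (2 * (n - k)) for k in ks])
    d = ks ** 2 / n ** 2
    v = (n - ks) / np.sum(n - ks)
    return fit_no_intercept(s, c, d, v)[0]

# Exact-recovery check on synthetic lag quantities: s_k = 2 c_k + 3 d_k.
ks = np.arange(1, 11)
c_demo = 1.0 + 0.1 * np.sin(ks)
d_demo = ks ** 2 / 100 ** 2
v_demo = (100 - ks) / np.sum(100 - ks)
alpha_demo, beta_demo = fit_no_intercept(2.0 * c_demo + 3.0 * d_demo, c_demo, d_demo, v_demo)
print(alpha_demo, beta_demo)
```

When all weights equal 1, c_k is identically 1 and the fit reduces to an ordinary regression with an intercept, so the sketch agrees with the homoscedastic least squares estimator in that case.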
3. Main Results
This section investigates the statistical properties of the proposed least squares estimators. Note that in (9) and in (12) are two similar estimators, except that (9) treats as i.i.d. random errors and (12) treats as i.i.d. random errors. For simplicity, in what follows, we present the asymptotic results for only. To evaluate the achievement of the proposed estimators, we will also investigate the behavior of in Tong and Wang  under the new model (4). Recall that for , we have where , , and is a symmetric matrix with for , for , and otherwise.
Theorem 1. For the equally spaced design, the estimator in (9) is an unbiased estimator of when is a linear function, regardless of the choice of and . Under the same setting, however, the estimator in (14) does not preserve the unbiasedness property. More specifically, the bias term of has the expression
Theorem 2. Assume that has a bounded second derivative and with . When , for any with and the equally spaced design, then where As a comparison, the bias and variance of are where
Theorem 3. Assume that has a bounded second derivative and with . When with , for any with and the equally spaced design, then As a comparison,
Theorem 4. Assume that has a bounded second derivative and with . For with and any with , then where , , and denotes convergence in distribution.
The proofs of the theorems are given in Section 6. Theorems 1 and 2 indicate that is an unbiased or asymptotically unbiased estimator of whereas is not. The comparison of the asymptotic variances, or equivalently of and , is presented in Section 4. Furthermore, when the heteroscedasticity level is high, Theorem 3 shows that the bias term of becomes more severe, so that it is no longer a consistent estimator. The asymptotic normality in Theorem 4 can be used to construct confidence intervals for . When , an approximate confidence interval for is where is the upper -th percentile of the standard normal distribution. When are from a normal distribution with variance , we have so that the confidence interval is fully specified. In general, we need to give an estimate for the unknown .
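The mechanics of a normal-approximation interval can be sketched generically. The function below assumes a two-sided interval and that a standard error for the variance estimate is supplied by the user; it does not reproduce the asymptotic variance from Theorem 4, and the function name and arguments are hypothetical.

```python
from statistics import NormalDist

def normal_confidence_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval: estimate +/- z * std_error,
    where z is the upper alpha/2 quantile of the standard normal."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error

# Example with made-up numbers: an estimate of 1.0 with standard error 0.1.
lower, upper = normal_confidence_interval(1.0, 0.1)
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

In practice the standard error would be obtained by plugging estimates of the unknown quantities into the asymptotic variance.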
4. Simulation Studies
In this section, we conduct simulation studies to evaluate the finite sample performance of the proposed estimators, and . Their performance will also be compared with the estimator . Let for . Throughout the simulations, we choose the bandwidth , as suggested in Tong and Wang .
Our first simulation study considers the case in which only one value differs from the others. Specifically, for a given location , we let and for any , where is a constant. Note that is satisfied. In this study, we let . To investigate the behavior of the estimators along with the variance pattern, we consider the mean function and and and , respectively. Given the and values, we then simulate independently from . With 1000 repetitions, we plot the relative mean squared errors, , against the location for in Figure 1. It is evident that our estimators and perform better than in most locations. To check the behavior near the boundary, we also plot the values of and against the location for (chosen ) and (chosen ) in Figure 2. Combining Figures 1 and 2, we recommend the new estimators when no markedly different variance occurs near the boundaries.
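A scaled-down version of such an experiment can be sketched as follows. Every concrete choice here is an assumption made for illustration: the mean function 5 sin(pi x), sample size 100, a single weight equal to 5 at location 50 with the other weights rescaled so that all weights sum to n, the bandwidth taken as the integer part of the square root of n, and the relative MSE defined as the mean squared error divided by sigma^4. The code compares the homoscedastic least squares estimator with an intercept-type estimator that rescales by c_k.

```python
import numpy as np

def lag_quantities(y, w, m):
    """Return s_k, c_k, d_k and regression weights v_k for lags k = 1..m."""
    n = len(y)
    ks = np.arange(1, m + 1)
    s = np.array([np.sum((y[k:] - y[:-k]) ** 2) / (2 * (n - k)) for k in ks])
    c = np.array([np.sum(w[k:] + w[:-k]) / (2 * (n - k)) for k in ks])
    d = ks ** 2 / n ** 2
    v = (n - ks) / np.sum(n - ks)
    return s, c, d, v

def wls_intercept(response, regressor, v):
    """Intercept of a weighted simple linear regression."""
    r_bar, g_bar = np.sum(v * response), np.sum(v * regressor)
    slope = (np.sum(v * (regressor - g_bar) * response)
             / np.sum(v * (regressor - g_bar) ** 2))
    return r_bar - slope * g_bar

def relative_mse(n=100, sigma=0.5, big_weight=5.0, location=50, reps=500, seed=3):
    """Relative MSE (MSE / sigma^4) of the homoscedastic and the rescaled estimator."""
    rng = np.random.default_rng(seed)
    m = int(np.sqrt(n))                         # assumed bandwidth
    x = np.arange(1, n + 1) / n
    g = 5.0 * np.sin(np.pi * x)                 # hypothetical mean function
    w = np.full(n, (n - big_weight) / (n - 1))  # remaining weights, so that sum(w) = n
    w[location - 1] = big_weight
    err_homo, err_resc = [], []
    for _ in range(reps):
        y = g + np.sqrt(w) * rng.normal(0.0, sigma, size=n)
        s, c, d, v = lag_quantities(y, w, m)
        err_homo.append(wls_intercept(s, d, v) - sigma ** 2)
        err_resc.append(wls_intercept(s / c, d / c, v) - sigma ** 2)
    scale = sigma ** 4
    return np.mean(np.square(err_homo)) / scale, np.mean(np.square(err_resc)) / scale

rel_homo, rel_resc = relative_mse()
print(f"relative MSE: homoscedastic LS {rel_homo:.4f}, rescaled intercept {rel_resc:.4f}")
```

Varying `location` from the interior toward 1 or n is a simple way to explore the boundary behaviour discussed in the text; the numbers produced by this sketch are not the paper's results.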
Our second simulation study is to investigate the average improvement of and over when one or more variances are different from the others. To proceed, we consider three mean functions, two standard deviations, and , and three sample sizes, , , and , respectively. In total, there are 18 combinations. The values corresponding to and are , , and , respectively. We then randomly sample one location or five locations from the set without replacement. For , the choice of the values follows the previous study. For with the five locations , we let for and for . This results in . For each combination setting, we repeat the simulation 1000 times and report the relative MSEs in Table 1 for and in Table 2 for . From the simulation results, we observe that and have smaller relative MSEs than in all the settings. In addition, we note that the performances of and are almost identical.
5. Conclusion
In this paper, we have proposed two least squares estimators for the error variance in heteroscedastic nonparametric regression: the intercept estimator and the slope estimator. Both estimators are shown to be consistent, and their asymptotic properties, including asymptotic normality, are investigated. Simulation studies indicate that the proposed estimators perform better than the existing competitor in most settings. Near the boundaries of the design points, however, the proposed estimators do not behave as well as expected when markedly different variances occur there. As a practical rule, we have suggested adopting the boundaries as and . Further research may be necessary in this direction.
6. Proofs
This section provides the technical proofs of the theorems in Section 3. To prove the theorems, we first establish two lemmas. For ease of notation, let and .
Lemma 5. Assume that has a bounded second derivative. When , for any with and the equally spaced design, then (a), ;(b);(c);(d);(e).
Proof. For simplicity, we prove only for . Let and . By the definition of and , we have
First, we consider the upper bound of . We know Thus, we have
Next, we consider the lower bound of . By the definition, we can know Let and . Then, and are monotonically increasing about , and Note that is a monotonically increasing function of for with and . Let be the unique integer such that and . Therefore, we have
Let with . It is easy to verify that . Then, . Let . Since for , then for . Then, Consequently, we obtain and . Note that So, we can get .
Note that where . By , for any , we have
Let , . We know For any , we have
We know Note that Thus,
We know Note that Therefore, we have
Lemma 6. Assume that has a bounded second derivative. When with , for any with and the equally spaced design, then (i), ;(ii);(iii);(iv);(v).
Proof. (i) Here we only consider the proof of . For , it can be verified, similarly to in Lemma 5, that and . Thus, we get
(ii) According to , we have
(iii) By part in Lemma 5, we know For any , we have
(iv) By in Lemma 5, we have
(v) Similar to in Lemma 5, it is easy to get
Lemma 7 (see ). Let be entries of a real symmetric matrix , let be i.i.d. random variables, and . Assume that and ; then where and are the Euclidean norm and the spectral norm of the matrix , respectively.
6.1. Proof of Theorem 1
6.2. Proof of Theorem 2
It is easy to verify that . This leads to Thus, By Lemma 5, for any , we have In what follows, we calculate . Note that For , we have where . For , by Lemma 5, for any , we have Finally, we consider , Combining (61), (62), and (63), we know Note that, for and any with , we have Therefore, we get where
Let , . Then, we have
6.3. Proof of Theorem 3
By Lemma 6, we know According to (63), for , we have Note that, under the condition and , it can be shown that In addition, by the Cauchy-Schwarz inequality, we know Thus, When and satisfy and , namely, and , then
Next, we consider the order of the bias and variance of . By (55), we have Note that By in Lemma 6, we have Consequently, Thus, we get By (68) and in Lemma 6, for and , we obtain, similarly to (74),
This completes the proof of the theorem.
6.4. Proof of Theorem 4
Now we consider the third term. Let