Abstract

In this paper, we propose a new test for detecting heteroscedasticity of the error term in nonparametric regression. Simulation experiments are then conducted to evaluate the performance of the proposed method. Finally, a real-world data set is analyzed to demonstrate its application.

1. Introduction

In recent years, nonparametric regression models have been widely applied in a variety of areas for data analysis. The estimation of the regression function and related statistical inferences in nonparametric models are usually based on the assumption that the error term is homoscedastic. However, in many real-world problems, we rarely know a priori whether this assumption can be guaranteed. Therefore, it is necessary to develop a method for detecting heteroscedasticity in the error terms before we embark on the model fitting and inferential issues.

In the literature on nonparametric regression, there have been many papers on testing heteroscedasticity (see, e.g., [1–8]). Among these papers, a procedure was developed by Dette and Munk [2] based on an estimator for the best $L^2$-approximation of the variance function by a constant and was extended by You and Chen [5] to partially linear regression models. Dette [1] proposed a test for heteroscedasticity in nonparametric regression. A residual-based statistic was suggested by Eubank and Thomas [3] to detect heteroscedasticity of the error term in nonparametric models. Furthermore, Zhang and Mei [7] obtained a test for the constant variance of the model errors based on residual analysis.

Most of the existing procedures, including those mentioned above, belong to the class of parametric hypothesis test methods. That is, the methods work quite well when the model errors follow the preassumed distribution, while their performance decreases significantly when this distribution cannot be guaranteed. Therefore, it is necessary to develop a test which is robust to the error distribution. To the best of our knowledge, however, little work has been done on this issue.

In this paper, we propose a completely nonparametric hypothesis test for detecting heteroscedasticity of the error term in nonparametric regression. In this method, the test statistic is constructed on the basis of an appropriate transformation of the residuals after fitting the regression model with the local linear estimation. To evaluate the performance of the proposed method, we conduct a simulation comparison with Zhang and Mei’s procedure [7], and a real-world data set is analyzed to show the application of the method.

The remainder of this paper is organized as follows. In Section 2, we briefly describe the local linear estimation method. By using the residuals after fitting the regression model with the local linear estimation and applying the idea of trend analysis in nonparametric statistics, a testing procedure is described in Section 3. In Section 4, we conduct some simulations to assess the performance of the test. A real-world data set is analyzed in Section 5 to demonstrate the application of the proposed method. The paper concludes with some final remarks.

2. A Brief Description of the Local Linear Estimation

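Consider the univariate nonparametric regression model
$$Y_i = m(X_i) + \sigma(X_i)\varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{1}$$
where $Y$ and $X$ indicate the response and explanatory variable, respectively, and $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is a random sample from model (1). Here $m(\cdot)$ and $\sigma^2(\cdot)$ are the unknown regression and variance functions. The errors $\varepsilon_i$ are generally assumed to be independently and identically distributed random variables with zero mean and unit variance. Also, $\varepsilon_i$ and $X_i$ are independent.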

Owing to its attractive mathematical properties (see [9–11] for details), the local linear estimation procedure is used to calibrate the model in (1). Specifically, suppose that the second-order derivative of the regression function $m(x)$ in model (1) is continuous on the domain of the variable $X$, say $\mathcal{D}$, and that $x_0$ is a given point in $\mathcal{D}$. According to Taylor’s expansion, we have in a neighborhood of $x_0$ that
$$m(x) \approx m(x_0) + m'(x_0)(x - x_0), \tag{2}$$
where $m'(x_0)$ denotes the first-order derivative of $m(x)$ at $x_0$. By replacing $m(X_i)$ in model (1) with its linear approximation in (2) and applying the least-squares procedure, the local linear estimate of the regression function at $x_0$ is obtained by solving the weighted least-squares problem
$$\min_{a,\,b} \sum_{i=1}^{n} \left[ Y_i - a - b(X_i - x_0) \right]^2 K_h(X_i - x_0) \tag{3}$$
with respect to $a$ and $b$, where $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is a given kernel function that is generally taken to be a symmetric probability density function, and $h$ is the bandwidth, which can be determined by data-driven methods such as cross-validation, generalized cross-validation, and the corrected Akaike information criterion (see [12–14] for more details). Specifically, in the cross-validation procedure, the optimal value of the bandwidth $h$ is chosen to minimize
$$\mathrm{CV}(h) = \sum_{i=1}^{n} \left[ Y_i - \hat{Y}_{(i)}(h) \right]^2, \tag{4}$$
where $\hat{Y}_{(i)}(h)$ stands for the $i$th predicted value of the response under the bandwidth $h$ with the $i$th observation omitted from the calibration process.

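For convenience, we introduce matrix notation. Let
$$\mathbf{X}(x_0) = \begin{pmatrix} 1 & X_1 - x_0 \\ \vdots & \vdots \\ 1 & X_n - x_0 \end{pmatrix}, \quad \mathbf{W}(x_0) = \mathrm{diag}\left( K_h(X_1 - x_0), \ldots, K_h(X_n - x_0) \right), \quad \mathbf{Y} = (Y_1, \ldots, Y_n)^{T}. \tag{5}$$
By solving the weighted least-squares problem in (3), we obtain the local linear estimate of $m(x)$ at $x_0$ as
$$\hat{m}(x_0) = e_1^{T} \left[ \mathbf{X}^{T}(x_0) \mathbf{W}(x_0) \mathbf{X}(x_0) \right]^{-1} \mathbf{X}^{T}(x_0) \mathbf{W}(x_0) \mathbf{Y}, \tag{6}$$
where $e_1 = (1, 0)^{T}$ indicates a two-dimensional vector with its first element being 1 and the other being 0.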
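The closed-form solution in (6) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the Gaussian kernel is assumed as the symmetric density (it is the kernel used later in the simulations).

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, a symmetric probability density kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def local_linear_estimate(x0, x, y, h):
    """Local linear estimate of the regression function at x0 with bandwidth h."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1 and (X_i - x0).
    design = np.column_stack([np.ones_like(x), x - x0])
    # Kernel weights K_h(X_i - x0) = K((X_i - x0) / h) / h.
    w = gaussian_kernel((x - x0) / h) / h
    # Solve the weighted normal equations (X^T W X) beta = X^T W Y.
    xtw = design.T * w
    beta = np.linalg.solve(xtw @ design, xtw @ y)
    # beta[0] estimates the regression function at x0; beta[1] estimates its slope.
    return beta[0]
```

Because the fit is locally linear, data lying exactly on a straight line are reproduced without error for any bandwidth, which gives a convenient sanity check for an implementation.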

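Taking $x_0$ in (6) to be $X_1$, $X_2$, $\ldots$, and $X_n$, respectively, we obtain the fitted value of $\mathbf{Y} = (Y_1, \ldots, Y_n)^{T}$, denoted by $\hat{\mathbf{Y}}$, as
$$\hat{\mathbf{Y}} = \left( \hat{m}(X_1), \ldots, \hat{m}(X_n) \right)^{T} = \mathbf{L}\mathbf{Y}, \tag{7}$$
where
$$\mathbf{L} = \begin{pmatrix} e_1^{T} \left[ \mathbf{X}^{T}(X_1) \mathbf{W}(X_1) \mathbf{X}(X_1) \right]^{-1} \mathbf{X}^{T}(X_1) \mathbf{W}(X_1) \\ \vdots \\ e_1^{T} \left[ \mathbf{X}^{T}(X_n) \mathbf{W}(X_n) \mathbf{X}(X_n) \right]^{-1} \mathbf{X}^{T}(X_n) \mathbf{W}(X_n) \end{pmatrix} \tag{8}$$
is called the “hat” matrix or smoothing matrix.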

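Further, the residual vector can be computed from
$$\hat{\mathbf{e}} = (\hat{e}_1, \ldots, \hat{e}_n)^{T} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{L})\mathbf{Y}, \tag{9}$$
where $\mathbf{I}$ is the identity matrix of order $n$. This residual vector will be used in the next section.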
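As an illustration of this section, the smoothing matrix, the residual vector, and a leave-one-out cross-validation score can be sketched as below. All function names are hypothetical, the Gaussian kernel is an assumption, and the cross-validation score is taken here as a plain sum of squared leave-one-out prediction errors, which selects the same bandwidth as an averaged version.

```python
import numpy as np


def local_linear_weights(x0, x, h):
    """Weights applied to the responses for the local linear fit at x0."""
    design = np.column_stack([np.ones(x.size), x - x0])
    # Gaussian kernel weights; the normalising constant cancels in the solution.
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    xtw = design.T * w
    # First row of (X^T W X)^{-1} X^T W.
    return np.linalg.solve(xtw @ design, xtw)[0]


def smoother_matrix(x, h):
    """Smoothing ("hat") matrix: row i holds the weights for the fit at x[i]."""
    x = np.asarray(x, dtype=float)
    return np.vstack([local_linear_weights(xi, x, h) for xi in x])


def residual_vector(x, y, h):
    """Residuals y - L y of the local linear fit."""
    y = np.asarray(y, dtype=float)
    return y - smoother_matrix(x, h) @ y


def cv_score(x, y, h):
    """Sum of squared prediction errors with each observation left out in turn."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = 0.0
    for i in range(x.size):
        keep = np.arange(x.size) != i
        pred = local_linear_weights(x[i], x[keep], h) @ y[keep]
        total += (y[i] - pred) ** 2
    return total


def select_bandwidth(x, y, h_grid):
    """Bandwidth on a candidate grid with the smallest cross-validation score."""
    scores = [cv_score(x, y, h) for h in h_grid]
    return h_grid[int(np.argmin(scores))]
```

The grid of candidate bandwidths is left to the user; very small bandwidths relative to the spacing of the design points can make the weighted normal equations ill-conditioned.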

3. A Procedure for Detecting Heteroscedasticity in Nonparametric Regression

As mentioned in the introduction, in real-world data analysis we rarely know in advance whether the error term is homoscedastic, which raises the problem of testing for heteroscedasticity. That is, the hypothesis to be tested is
$$H_0: \sigma^2(x) = \sigma^2 \quad \text{versus} \quad H_1: \sigma^2(x) \neq \sigma^2, \tag{10}$$
where $\sigma^2$ is a certain constant.

Let $\hat{\mathbf{e}} = (\hat{e}_1, \ldots, \hat{e}_n)^{T}$ be the residual vector described in (9). In order to construct a test statistic suitable for quantifying the heteroscedasticity of the error term in nonparametric regression, we use the transformed residuals where with “tr” standing for the trace of a matrix and is the $i$th diagonal element of the matrix .

If the null hypothesis in (10) is true, which means that the variance of the error term in model (1) is constant, the values of should not have any trend, whereas there will be some variations in if heteroscedasticity is present. Therefore, we can test heteroscedasticity of the error term by analyzing the trend of . Along this line of thinking, the hypothesis in (10) amounts to the hypothesis

According to Diblasi and Bowman [15] and Wei et al. [16], the random variables are approximately independent and identically distributed. Let Then are approximately independent under $H_0$ and . Therefore, the test statistic is constructed as follows: where is the indicator function.

If the null hypothesis in (10) (or (13)) is true, which means that the model error term is homoscedastic, we have where denotes the binomial distribution with the parameter being 1/2 and the sample size being . By noting the fact that the test statistic is symmetric or approximately symmetric with respect to , the value of the test statistic tends to be large if error heteroscedasticity is present. Therefore, the $p$-value of testing $H_0$ versus $H_1$ based on the statistic is where is the observed value of computed by (15). For a given significance level $\alpha$, reject $H_0$ if $p < \alpha$; otherwise, do not reject $H_0$.
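To illustrate how a $p$-value arises from a binomial null distribution with success probability 1/2, the following sketch counts positive signs among a set of differences and evaluates a symmetric two-sided tail probability. It is not the exact statistic or $p$-value formula of this section: the way the differences are formed, the two-sided tail, and all function names are assumptions made only for illustration.

```python
from math import comb


def sign_count(differences):
    """Return (number of positive differences, number of nonzero differences)."""
    nonzero = [d for d in differences if d != 0]
    return sum(d > 0 for d in nonzero), len(nonzero)


def two_sided_binomial_p_value(t_obs, n_trials):
    """P(|T - N/2| >= |t_obs - N/2|) for T ~ Binomial(N, 1/2)."""
    deviation = abs(t_obs - n_trials / 2)
    # Add up the binomial coefficients of all outcomes at least as extreme.
    favourable = sum(
        comb(n_trials, k)
        for k in range(n_trials + 1)
        if abs(k - n_trials / 2) >= deviation
    )
    return favourable / 2**n_trials


# Assumed decision rule: reject homoscedasticity when the p-value is below alpha.
t_obs, n_trials = sign_count([0.3, -0.1, 0.4, 0.2, 0.5, 0.1, 0.6, 0.2, -0.3, 0.7])
p_value = two_sided_binomial_p_value(t_obs, n_trials)
```

Exact integer arithmetic is used for the binomial coefficients, so the result does not rely on a normal approximation even for small numbers of differences.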

4. Simulation Studies

As mentioned in the introduction, Zhang and Mei [7] also proposed a test method for detecting heteroscedasticity in nonparametric models. The particular method that they used is the -test applied to the squared residuals , which are shown in (9). A comparison with Zhang and Mei’s method is conducted in this section to assess the validity of the proposed test method.

The following three types of regression and variance functions are considered:(1) , ;(2) , ;(3) , ,

where and is a constant.

Using the above regression and variance functions, we can formulate three models to generate the experimental data. For convenience, the models that correspond to those three settings of regression and variance functions are denoted by Model 1, Model 2, and Model 3, respectively.

In each model, the observations of the explanatory variable are equidistantly taken on the interval ; that is, , . The constant in the variance functions is considered to be 0, 0.5, and 1.0, respectively. Note that refers to the model with the error term being homoscedastic, and the variance function deviates from homoscedasticity more and more significantly with the value of increasing. The sample sizes are taken to be and , respectively.

Furthermore, in order to evaluate the robustness of the two test methods (the proposed method and Zhang and Mei’s method) to the error distribution, the random numbers are independently drawn from , , and the standardized chi-square distribution with 4 degrees of freedom, respectively.
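The standardized chi-square errors mentioned above can be generated as in the following sketch, which uses the fact that a chi-square variable with $k$ degrees of freedom has mean $k$ and variance $2k$. The function names are hypothetical, and the regression and standard deviation functions passed to the helper are placeholders supplied by the user, not the functions of Models 1 to 3.

```python
import numpy as np


def standardized_chi2_errors(size, df=4, rng=None):
    """Chi-square(df) draws shifted and scaled to zero mean and unit variance."""
    rng = np.random.default_rng() if rng is None else rng
    # Subtract the mean df and divide by the standard deviation sqrt(2 * df).
    return (rng.chisquare(df, size=size) - df) / np.sqrt(2.0 * df)


def simulate_responses(x, mean_fn, sd_fn, errors):
    """Responses mean_fn(x) + sd_fn(x) * errors for user-supplied functions."""
    x = np.asarray(x, dtype=float)
    return mean_fn(x) + sd_fn(x) * np.asarray(errors, dtype=float)
```

Standardizing the errors keeps their first two moments equal across distributions, so differences in the rejection frequencies reflect the shape of the error distribution rather than its scale.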

For each of Models 1, 2, and 3 and each combination of the value of the constant , the error distribution, and the sample size, we ran replications of the testing procedure for both our proposed method and Zhang and Mei’s method, in which the Gaussian kernel function is adopted and the bandwidth is selected by the cross-validation procedure. Over the replications, we recorded the frequency of rejecting the null hypothesis at the significance level , and the results are reported in Table 1.
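The replication scheme described above can be sketched as a generic Monte Carlo loop. The function name, its arguments, and the seed are hypothetical; any data generator and any procedure returning a $p$-value can be plugged in, and the number of replications and the significance level are left as parameters rather than set to the values used in the study.

```python
import numpy as np


def rejection_frequency(simulate, p_value, n_rep, alpha, seed=0):
    """Fraction of replications in which the null hypothesis is rejected.

    simulate(rng) must return one simulated sample (x, y);
    p_value(x, y) must return the p-value of the test applied to that sample.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x, y = simulate(rng)
        if p_value(x, y) < alpha:
            rejections += 1
    return rejections / n_rep
```

Under the null hypothesis the returned frequency estimates the size of the test and should be close to the significance level; under an alternative it estimates the power.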

We see from Table 1 that, when the error term is normally distributed, the rejection frequency of both methods under $H_0$ (i.e., ) is reasonably close to the corresponding significance level for both sample sizes. On the other hand, the two test methods perform quite differently for different types of variance functions under the alternative hypothesis. Although the rejection frequency of our method tends to be lower for a monotone variance function (see Model 3), it is much larger than that of Zhang and Mei’s method for high-frequency variance functions (see Models 1 and 2), which means that our method has satisfactory power in detecting heteroscedasticity, especially when the variance function shows many alternations.

When the distribution of the model error term is nonnormal, we see from Table 1 that, under $H_0$ (i.e., ), the rejection frequencies of our method are more stable and closer to the corresponding significance levels than those obtained by Zhang and Mei’s method across the different error distributions, which indicates that the proposed test is more robust to the choice of the error distribution. Furthermore, the rejection frequencies of both test methods under $H_1$ show the same patterns as in the normal case, which demonstrates that our test is more powerful in detecting high-frequency variance functions.

5. An Example on the Application of the Proposed Method

A real-world data set is analyzed in this section to demonstrate the application of the proposed method. Specifically, from the observed daily average temperature (AT) in Xi’an, China, from January 1, 1951, to December 31, 2000, the mean of the average temperatures recorded on the same calendar day over the 50 years is taken as the value of the response (unit: degree). Note that the data for February 29 during the 50 years have been excluded. Furthermore, the observations of the explanatory variable in the regression function are taken as the time order of the days from January 1 to December 31.
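The data preparation described above can be sketched as follows. The function name is hypothetical, and the records generated in the example are synthetic placeholders (each value is simply the calendar year), not the Xi’an temperature observations.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_climatology(records):
    """Average values sharing the same calendar day, skipping February 29.

    records is an iterable of (date, value) pairs; the result is a list of
    ((month, day), mean value) pairs sorted by calendar day.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in records:
        key = (day.month, day.day)
        if key == (2, 29):
            continue  # leap days are excluded, as in the text
        sums[key] += value
        counts[key] += 1
    return [(key, sums[key] / counts[key]) for key in sorted(sums)]


# Synthetic placeholder series covering January 1, 1951 to December 31, 2000.
start, end = date(1951, 1, 1), date(2000, 12, 31)
days = [start + timedelta(days=k) for k in range((end - start).days + 1)]
records = [(d, float(d.year)) for d in days]

climatology = daily_climatology(records)
x_obs = list(range(1, len(climatology) + 1))  # time order of the days
y_obs = [mean for _, mean in climatology]     # averaged response values
```

With leap days removed, every year contributes exactly one value to each of the 365 calendar days, so the explanatory variable runs from 1 to 365.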

Based on the observations, which are graphically shown in Figure 1, the following nonparametric regression model is considered: where is assumed to satisfy and . Here, we test whether or not the model error term is homoscedastic. Using the proposed test method with the Gaussian kernel function, the optimal value of the bandwidth selected by the cross-validation procedure is and the resulting $p$-value is . Because of the extremely small $p$-value, we conclude that the heteroscedasticity of the model error is significant over the period from January 1 to December 31.

6. Final Remarks

In this paper, a distribution-free test is developed for detecting heteroscedasticity in nonparametric regression models. Specifically, the statistic is constructed on the basis of an appropriate transformation of the residuals after fitting the regression model with the local linear estimation, together with the idea of trend analysis in nonparametric statistics. To assess the performance of the proposed method, we conducted a simulation comparison with Zhang and Mei’s procedure, and the results are satisfactory, especially for high-frequency variance functions.

Compared with Zhang and Mei’s method, the proposed test tends to have lower power for monotone variance functions when heteroscedasticity is present. This is reasonable because Zhang and Mei’s method is mainly formulated on the monotone trend of the squared residuals, whereas the proposed method is a sign-based test.

Nevertheless, owing to its conceptual simplicity and easy implementation, our method is useful for testing heteroscedasticity in nonparametric regression, especially for variance functions with many alternations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding publication of this paper.

Acknowledgments

This research is supported by the National Natural Science Foundations of China (no. 11326181 and no. 11201123), the International Cooperative Project in Henan Province (no. 134300510034), and the start fund of doctoral scientific research (no. 09001624). The authors are especially grateful to the reviewer and editor for their valuable comments and suggestions, which led to significant improvements in the paper.