Special Issue: Recent Advances in Univariate and Multivariate Models
Research Article | Open Access
Least Absolute Deviation Estimate for Functional Coefficient Partially Linear Regression Models
The functional coefficient partially linear regression model is a useful generalization of the nonparametric model, partial linear model, and varying coefficient model. In this paper, the local linear technique and the least absolute deviation (LAD) method are employed to estimate all the functions in the functional coefficient partially linear regression model. The asymptotic properties of the proposed estimators are studied. Simulation studies are conducted to show the validity of the estimation procedure.
In this paper, we are concerned with the functional coefficient partially linear regression (FCPLR) model, in which the response depends on random explanatory variables through an intercept function and a vector of coefficient functions, each a measurable function of a covariate. As usual, the errors have zero mean and fixed variance.
The FCPLR model, first introduced by Wong et al., is a generalization of the nonparametric model, partial linear model, and varying coefficient model. Zhu et al. studied a similar functional coefficient model, a functional mixed model, using a new Bayesian method. Model (1.1) reduces to a varying coefficient regression model if the intercept function is constant, and to a partially linear regression model when the coefficient functions are constants. Many researchers, for example, Aneiros-Pérez and Vieu, have contributed to the study of this kind of model. When the two covariates in model (1.1) coincide, the model becomes the semiparametric varying coefficient model discussed by Ahmad et al. Since the FCPLR model combines the nonparametric and functional coefficient regression models, its flexibility makes it attractive in various regression problems.
Statistical inference for the FCPLR model mainly involves estimating the intercept function and the coefficient functions. To estimate the unknown functions in nonparametric/semiparametric regression models, many statistical inference methods have been proposed over the past decades, such as the kernel estimation method [5–7], spline smoothing [8, 9], and the two-step estimation method [10, 11]. Wong et al. employed the local linear regression method and the integrated method to give initial estimates of all the functions in the FCPLR model. All the papers mentioned above used the least-squares technique to obtain estimators of the unknown coefficient functions. The least-squares estimators, of course, have some good properties, especially in the case of normal random errors. It is well known, however, that the least-squares method performs poorly when the random errors have a heavy-tailed distribution, since it is highly sensitive to extreme values and outliers. This motivates us to seek more robust estimation methods for model (1.1).
Local linear approximation is a good method for nonparametric regression problems, and the method based on least absolute deviations overcomes the sensitivity caused by outliers. As noted in Wang and Scott and in Fan and Gijbels, among many robust estimation methods, the method based on local least absolute deviations behaves quite well. In this paper, we adopt the least absolute deviation (LAD) method, combined with the local linear technique and the integrated method, to estimate all the unknown functions in model (1.1). Furthermore, the estimation problem can be reduced to a linear programming problem, so numerical solutions can be obtained quickly by standard software (e.g., Matlab is well suited to this kind of problem). The main difficulty in the proof of the asymptotic normality results is that the estimates have no closed form. This paper establishes the asymptotic normality of the estimators through a method completely different from those used for least-squares estimators, and the simulation results show that the LAD method is indeed robust.
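To see this sensitivity concretely, here is a small numerical sketch (synthetic data, not the paper's model) comparing a least-squares slope with an LAD slope when a few high-leverage observations are contaminated; the variable names and the grid-search solver are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.3, size=100)  # true slope is 2

# Contaminate the five highest-leverage points with large outliers
y[np.argsort(x)[-5:]] += 50.0

# Least-squares slope (through the origin): pulled far from 2
b_ls = np.sum(x * y) / np.sum(x * x)

# LAD slope via a grid search on the L1 criterion: stays near 2
grid = np.linspace(0.0, 6.0, 6001)
b_lad = grid[np.argmin([np.sum(np.abs(y - b * x)) for b in grid])]
```

With the contaminated points sitting at the largest design values, the least-squares slope is pulled well away from the true value 2, while the LAD slope barely moves.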
The rest of this paper is organized as follows. In Section 2, we describe the estimation method and the associated bandwidth selection procedure. Section 3 gives the asymptotic theory of the estimators. Simulation studies are conducted in Section 4. A real application is given in Section 5. Section 6 gives the proofs of the main results.
2. Least Absolute Deviation Estimate
This section gives the main idea of the proposed estimation method: local linear polynomials are used to approximate the nonparametric function and the functional coefficients, and the least absolute deviation technique is used to find the best approximation. A bandwidth selection technique is also discussed. Throughout this paper, we suppose that the data are an i.i.d. sample from model (1.1) and assume the following conditions.
Assumptions.
(1) is a positive definite matrix, .
(2) The bandwidth satisfies .
(3) The random error , with zero mean and zero median, is independent of conditional on . The conditional probability density of given and is continuous in a neighborhood of the point 0, and . The errors are independent and identically distributed.
(4) The density functions of , , and are continuous in neighborhoods of and , and .
(5) All the functions are twice continuously differentiable in neighborhoods of and , respectively.
(6) The kernel functions are bounded, nonnegative, and compactly supported.
(7) .
To simplify typesetting, we introduce the following symbols:
2.1. Local Linear Estimate Based on Least Absolute Deviation
The main idea is to approximate the functional coefficients by linear functions: each coefficient function is approximated by a linear function in a neighborhood of a point within the closed support of its argument, and the intercept function is approximated similarly. The local linear least absolute deviation estimate (the LAD estimate) of the unknown parameters is the optimal solution of the following minimization problem, where the kernel functions and the bandwidth are given. The optimization problem is equivalent to the following linear programming problem. There are many algorithms available for solving problem (2.5); for example, the feasible direction method can be used directly to compute the optimal solution, and the numerical solution of (2.5) can be computed quickly by a series of Matlab functions.
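The residual-splitting reformulation behind problems of the form (2.5) can be sketched as a generic weighted-LAD solver. This uses SciPy's `linprog` rather than the Matlab routines mentioned above, and the design matrix `X` and kernel weights `w` stand in for the local linear design, so the function is an illustration rather than the paper's exact procedure:

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y, w=None):
    """Solve min_beta sum_i w_i |y_i - x_i' beta| as a linear program.

    Split each residual r_i = u_i - v_i with u_i, v_i >= 0; then
    |r_i| = u_i + v_i and the objective becomes linear.
    """
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    # Decision variables: [beta (p, free), u (n, >=0), v (n, >=0)]
    c = np.concatenate([np.zeros(p), w, w])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])  # X beta + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]
```

For an intercept-only design, the solver reproduces the (weighted) sample median, which is the defining property of the L1 criterion.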
By the integrated method, the estimator of the intercept function is defined by and the estimators of the coefficient functions are defined by . Our main task is to establish the asymptotic distributions of these estimators.
2.2. Selection of Bandwidth
It is well known that the choice of bandwidth strongly influences the adequacy of the estimators. We use an automatic bandwidth selection procedure, namely, the absolute cross-validation (ACV) method. The ACV bandwidth is defined as the minimizer of the leave-one-out absolute prediction error, where the estimators are constructed from the sample with the th observation left out. According to Wang and Scott, this bandwidth is preferable to the cross-validation (CV) bandwidth, which was suggested by Rice and Silverman and is often used in curve regression, as in Hoover et al. and Wu et al.
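The leave-one-out idea behind the ACV criterion can be sketched as follows. For brevity the fit here is a kernel-weighted local median (a zeroth-order LAD smoother) rather than the full FCPLR estimate, and all names are illustrative:

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.maximum(1.0 - u * u, 0.0)

def loo_local_median(x, y, h):
    """Leave-one-out kernel-weighted local median fits at each x_i."""
    fits = np.empty(len(x))
    for i in range(len(x)):
        w = epanechnikov((x - x[i]) / h)
        w[i] = 0.0  # leave the i-th observation out
        idx = w > 0
        # weighted median = minimizer of sum_j w_j |y_j - m|
        order = np.argsort(y[idx])
        ws, ys = w[idx][order], y[idx][order]
        cum = np.cumsum(ws)
        fits[i] = ys[np.searchsorted(cum, 0.5 * cum[-1])]
    return fits

def acv_bandwidth(x, y, grid):
    """Pick h minimizing the mean absolute leave-one-out error."""
    scores = [np.mean(np.abs(y - loo_local_median(x, y, h))) for h in grid]
    return grid[int(np.argmin(scores))]
```

The absolute (rather than squared) leave-one-out error keeps the bandwidth choice consistent with the L1 estimation criterion.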
3. Asymptotic Theory
This section gives the asymptotic distribution theory of the estimators. Using Taylor's expansion, for and , we have where is between and , and is between and . Let , , ; then we have
The aim of this paper is to study the asymptotic behavior of the estimators. For technical reasons, we first introduce the new variables and form an equivalent problem as follows:
Let where is the objective function of the equality above and is the sign function.
Since the estimators have no closed forms, we first give the limit form of the function , which is critical to obtain the asymptotic properties of the estimators.
Theorem 3.1. Suppose Assumptions (1)–(7) hold, and , then for any fixed , , , , converges to , which is defined as
Remark 3.2. If the kernel functions , are symmetric about zero and Lipschitz continuous, the limit form of can be simplified as
Now we are in a position to state the asymptotic properties of the estimators.
Theorem 3.3. Suppose Assumptions (1)–(7) hold and , then one has where with and .
Remark 3.4. If the kernel functions are symmetric about zero and Lipschitz continuous, the results of Theorem 3.3 can be simplified as
Remark 3.5. Here we have considered the estimation method and asymptotic distributions for the case in which the two bandwidths are the same. It is important to note that similar asymptotic theories can be obtained when the two bandwidths differ but are of the same order.
Remark 3.6. Suppose we use different bandwidths for the two kernel functions, and that Assumptions (2) and (7) are replaced by corresponding conditions on each bandwidth. Similar results will then be obtained, except that all the second-order derivatives disappear from the results.
Remark 3.7. This paper restricts the study to a one-dimensional variable. The ideas used here can be adapted to a higher-dimensional variable; for example, one may consider a -dimensional variable in the case of equal bandwidths. Similar asymptotic distribution results can be obtained with Assumptions (2) and (7) modified accordingly.
4. Simulation Studies
In this section, we carry out some simulations to illustrate the performance of the LAD method and to compare it with that of the least-squares method. All the following simulations are conducted with the sample sizes specified below.
The following example is considered: where , , , , , are normally distributed with correlation coefficient , the marginal distributions of and are standard normal, , and and , are mutually independent.
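The correlated covariates in such a design can be generated as in the following sketch; the correlation value, sample size, and variable names are illustrative, since the paper's numerical settings are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 200, 0.5  # sample size and correlation coefficient (illustrative)

# (U, V) jointly normal with standard normal marginals and correlation rho
z = rng.normal(size=(n, 2))
u = z[:, 0]
v = rho * z[:, 0] + np.sqrt(1.0 - rho ** 2) * z[:, 1]
```

This is the standard Cholesky-style construction for a bivariate normal pair with prescribed correlation.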
In each simulation, the estimators were computed by solving the minimization problem (2.5) and using the integrated method described in (2.6) and (2.7). We use the Epanechnikov kernel for every kernel function. All bandwidths are selected by the method proposed in Section 2.
To evaluate the asymptotic results given in Theorem 3.3, quantile-quantile plots of the estimators are constructed. Figure 1 presents the quantile-quantile plots with the given sample size and 100 replications, and these plots reveal that the asymptotic approximation is reasonable.
Figure 2 displays the true function curves of , and and their estimated curves with sample size and one replication. We can see from the figure that the estimates perform well.
To illustrate the robustness of the LAD method, Figure 3 displays the estimated curves in the presence of four outliers. From Figure 3, we can see that the LAD estimate still performs well even with four large singular points in the data. Since the outliers have little influence on the estimates, the LAD method is robust.
By solving the following minimization problem, we can similarly obtain the least-squares estimators of the functions via (2.6) and (2.7). To compare the least-squares method with the LAD method, we simulated the function by the least-squares method and display the fitted curves with (and without) outliers for the given sample size and 1000 replications in Figure 4. The least-squares estimate does not perform well under data sparsity and singularity: a few outlying data points make the estimated curve deviate significantly from the true curve. Combining Figures 2, 3, and 4, we conclude that the LAD method performs better than the least-squares method and is robust.
Finally, to compare the LAD estimate with the least-squares estimate further, we also assess their performance via the weighted average squared error (WASE), whose weights are based on the ranges of the functions and account for their different scales. We conducted 200 replications with the same sample size. For the bandwidths and the Epanechnikov kernels used in the simulations, the mean and standard deviation of the WASE are 0.1201 and 0.0183 for the LAD method, and 1.6613 and 0.8936 for the least-squares method. We can see that the LAD method outperforms the least-squares method.
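The WASE criterion can be computed along the following lines; since the paper's display is not reproduced here, this sketch uses the common choice of dividing each component's mean squared error by the squared range of the true function:

```python
import numpy as np

def wase(true_fns, est_fns, grid):
    """Weighted average squared error over an evaluation grid.

    Each component's mean squared error is scaled by the squared range
    of the true function, so that functions on different scales
    contribute comparably.
    """
    total = 0.0
    for f, fhat in zip(true_fns, est_fns):
        ft, fe = f(grid), fhat(grid)
        r = ft.max() - ft.min()  # range of the true function
        total += np.mean((ft - fe) ** 2) / r ** 2
    return total / len(true_fns)
```

For instance, an estimate off by a constant 0.1 from a true function of range 1 contributes a scaled squared error of 0.01.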
5. A Real Application
A real data set is analyzed by the proposed LAD method in this section. The classic gas furnace data was studied recently by Wong et al. The data set includes 296 samples, measured at a fixed interval of 9 seconds, where the inputs represent the gas rate in cubic feet per minute, and the outputs represent the concentration of carbon dioxide in the gas out of the furnace. Following the procedures of Wong et al., the original data are transformed so that both series are limited to the interval , and the model is used to fit the data. The first 250 samples are used to establish the model, and the remaining 46 samples are used for prediction.
In the proposed method, the Epanechnikov kernel is used and all the bandwidths are selected as 0.14 via cross-validation for simplicity. The mean absolute error (MAE) and the mean squared error (MSE) for fitting and forecasting are listed as follows.
Fitting: , ; forecasting: , .
Since the model is chosen based on the errors, the results are not as good as those in Wong et al. Moreover, the data set does not contain obvious outliers, so the advantage of the LAD estimation method is not apparent. Compared to the results shown in Wong et al., the difference between the fitting MAE/MSE and the forecasting MAE/MSE is small; the reason is that the LAD criterion is employed in our method. The fitted values and predictive values are shown in Figure 5. These results indicate that the estimated results are reasonable.
6. Proofs of the Main Results
Before completing the proofs of the main results, we give the following useful lemma first.
Lemma 6.1. Suppose Assumptions (1)–(7) hold, then for any fixed , converges to 0 in probability, that is, as .
Proof of Lemma 6.1. Let
By and Assumptions (2), (5), and (7), for any fixed , , and , we have
Let It can easily be seen that if . Hence we have where is the indicator function. For any , , we have Since , we have where . Combining (6.2) and (6.4), we have when . Combining (6.2), (6.5), and the argument as and , the desired conclusion follows by Chebyshev's inequality. This completes the proof of Lemma 6.1.
Proof of Theorem 3.1. Set
By Lemma 6.1, under Assumptions (1)–(7), and converge to zero as .
Start from the equality. We first give the limit form of : for fixed , , , , we have By the Integral Mean Value Theorem (refer to the Appendix), we have where is the conditional probability distribution function of , and the remainder terms converge to zero as and . Then, by Assumptions (1)–(5), for any small enough , we have Note that, for any fixed , , , , as , we obtain the desired conclusion. This completes the proof of Theorem 3.1.
Proof of Theorem 3.3. Since the proofs of (3.7) and (3.8) are quite similar, we only give the proof of (3.8).
Start from the equality. By Lemma 6.1 and Theorem 3.1, for fixed , , , , we have hence Note that We obtain that is bounded in probability for any fixed , , , . Thus, for any fixed , the random convex function converges to . According to the convexity lemma, we can deduce that, for any compact set , when . By the proof of Theorem 2 in Wang, the "limit" here is meant not only as the limit of a sequence of random variables but also as the limit of a sequence of stochastic processes, and the minimizer of converges to the minimizer of .
By the convexity of the function , we have
Thus we obtain By interchanging summation signs and noting that can be rewritten as