Research Article | Open Access
Integral Least-Squares Inferences for Semiparametric Models with Functional Data
Inference for semiparametric models with functional data is investigated. We propose an integral least-squares technique for estimating the parametric components and study the asymptotic normality of the resulting integral least-squares estimator. For the nonparametric components, a local integral least-squares estimation method is proposed, and the asymptotic normality of the resulting estimator is also established. Based on these results, confidence intervals for the parametric and nonparametric components are constructed. Finally, simulation studies and a real data analysis are undertaken to assess the finite sample performance of the proposed estimation method.
In the recent literature, there has been increased interest in regression modeling for functional data, where both the predictor and the response are random functions. Compared with discrete multivariate analysis, functional data analysis can take into account the smoothness of high dimensional covariates and can suggest new approaches to problems that have not been addressed before. Examples of functional data can be found in application fields such as biomedicine, economics, and archaeology (see Ramsay and Silverman). Statistical analysis of regression models with functional data has also been considered by many authors. For example, Ramsay and Silverman studied the linear regression model with functional data. Ait-Saïdi et al. proposed a cross-validated estimation procedure for the single-functional index model. Ferraty et al. and Chen et al. considered inference for single and multiple index functional regression models by using the functional projection pursuit regression technique. In addition, Ferraty and Vieu and Rachdi and Vieu considered nonparametric regression modeling for functional data. Further work on functional data analysis can be found in [8–10], among others.
However, the linear functional model, which assumes a linear relationship between the response and the covariates, may be too restrictive. The semiparametric model with functional data is therefore a useful extension of both functional linear regression models and functional nonparametric regression models. More specifically, let , , and be continuous random functions of index ; then, the semiparametric regression model with functional data has the following structure: where is the response variable, is the covariate vector, is the covariate vector, is a vector of unknown parameters, is a vector of unknown functions of , and is a zero-mean stochastic process. Here, without loss of generality, we assume that index ranges over a nondegenerate compact interval such as .
Because the samples of the response and the covariates are functions of index , the Euclidean distance cannot measure the distance between and . Hence, the ordinary least-squares method cannot be implemented directly. Recently, such semiparametric regression models with functional data have been studied in several papers (see Aneiros-Pérez and Vieu, Shin, and Lian). In this paper, we provide further results on inference for semiparametric models with functional data and extend the range of application of the classical least-squares technique. More specifically, we propose an integral least-squares method for estimating the parametric components and the nonparametric component. Furthermore, the asymptotic normality of the integral profile least-squares estimators is studied. Simulation studies and a real data application indicate that the proposed method works in practice.
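To illustrate why an integral criterion replaces the Euclidean one, the following minimal sketch measures the discrepancy between an observed curve and a fitted curve by an integrated squared difference, approximated with the trapezoidal rule on a grid. The function names and the choice of quadrature rule are assumptions made for this illustration and are not taken from the paper.

```python
def trapezoid(values, grid):
    """Approximate the integral of a curve sampled on `grid` (trapezoidal rule)."""
    total = 0.0
    for k in range(1, len(grid)):
        total += 0.5 * (values[k] + values[k - 1]) * (grid[k] - grid[k - 1])
    return total


def integrated_squared_error(observed, fitted, grid):
    """Integral over the index range of the squared difference of two curves."""
    squared = [(a - b) ** 2 for a, b in zip(observed, fitted)]
    return trapezoid(squared, grid)


# Hypothetical example: a constant curve compared with the zero curve on [0, 1].
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(integrated_squared_error([1.0] * 5, [0.0] * 5, grid))
```

Summing this quantity over the sampled curves gives a least-squares criterion that is defined for functional observations; any reasonable quadrature rule could be substituted for the trapezoidal one.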
The rest of this paper is organized as follows. In Section 2, we introduce the integral least-squares based estimation procedure for the parametric components and the nonparametric components. The asymptotic distributions of these estimators are also derived under some regularity conditions. In Section 3, some simulations and a real data analysis are carried out to assess the performance of the proposed estimation method. The technical proofs of all asymptotic results are provided in the Appendix.
2. Estimation and Asymptotic Distributions
Suppose is a random sample of size . From (1), we have that In this paper, we assume that, for given , and are i.i.d. for different and .
We now turn to model (2). For in a small neighborhood of , can be locally approximated by a linear function For given , applying the integral least-squares method, we obtain the weighted local integral least-squares estimator of by minimizing where , is a kernel function, is a bandwidth, and denotes the th component of . Let Then, the solution to (4) is given by where is the identity matrix and is the zero matrix. Substituting (6) into (2), a simple calculation gives where Applying the integral least-squares technique to linear model (7), we obtain the integral least-squares estimator of , say , by minimizing Let If the matrix is invertible, can be given by Let where , , , and . The following result states the asymptotic normality of .
Theorem 1. Suppose that conditions in the Appendix hold; then, one has where .
To construct a confidence interval for by Theorem 1, we give the estimator of , say , where is defined in (11) and Invoking , with an argument similar to that of Lemma A.6, we can prove that is a consistent estimator of . Thus, by Theorem 1, we have where is an identity matrix of order . Therefore, the confidence region of can be constructed by using (17).
Furthermore, substituting into (6), we can get the integral least-squares estimator of as We state the asymptotic normality of in the following theorem.
Theorem 2. Suppose that conditions in the Appendix hold. For given , then one has where , .
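The estimation steps of this section (a local linear fit of the nonparametric part for a given parametric value, a profile least-squares step for the parametric part, and back-substitution) can be sketched as follows for scalar covariates. The notation Y_i(t) = X_i(t) β + Z_i(t) θ(t) + ε_i(t), the function names, the common equally spaced grid, and the replacement of integrals by sums over that grid are all assumptions made for this illustration; it is a simplified sketch, not the authors' implementation.

```python
import random


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear_coef(curves, z, grid, t0, h):
    """Intercept a of the kernel-weighted fit curves_i(t) ~ z_i(t) * (a + b (t - t0))."""
    s00 = s01 = s11 = r0 = r1 = 0.0
    for k, t in enumerate(grid):
        w = epanechnikov((t - t0) / h) / h
        if w == 0.0:
            continue
        d = t - t0
        for ci, zi in zip(curves, z):
            zz = w * zi[k] * zi[k]
            s00 += zz
            s01 += zz * d
            s11 += zz * d * d
            r0 += w * zi[k] * ci[k]
            r1 += w * zi[k] * d * ci[k]
    det = s00 * s11 - s01 * s01  # needs at least two grid points in the window
    return (s11 * r0 - s01 * r1) / det


def profile_fit(y, x, z, grid, h):
    """Return (beta_hat, theta_hat on the grid) by profile integral least squares."""
    m_y = [local_linear_coef(y, z, grid, t0, h) for t0 in grid]
    m_x = [local_linear_coef(x, z, grid, t0, h) for t0 in grid]
    num = den = 0.0
    for yi, xi, zi in zip(y, x, z):
        for k in range(len(grid)):
            y_res = yi[k] - zi[k] * m_y[k]  # response with the smooth part removed
            x_res = xi[k] - zi[k] * m_x[k]  # covariate with the smooth part removed
            num += x_res * y_res            # equal grid spacing cancels in the ratio
            den += x_res * x_res
    beta_hat = num / den
    theta_hat = [a - b * beta_hat for a, b in zip(m_y, m_x)]
    return beta_hat, theta_hat


# Hypothetical data: beta = 2, theta(t) = 1 + 2 t, small noise.
random.seed(0)
grid = [k / 20 for k in range(21)]
x = [[random.gauss(0, 1) for _ in grid] for _ in range(30)]
z = [[random.uniform(0.5, 1.5) for _ in grid] for _ in range(30)]
y = [[2.0 * xi[k] + zi[k] * (1.0 + 2.0 * t) + random.gauss(0, 0.1)
      for k, t in enumerate(grid)] for xi, zi in zip(x, z)]
beta_hat, theta_hat = profile_fit(y, x, z, grid, h=0.2)
print(beta_hat, theta_hat[0], theta_hat[-1])
```

Because the local fit is linear in the response, the nonparametric estimate can be recovered from the two smoothed quantities without refitting, which is why `theta_hat` is computed by a simple subtraction. The bandwidth must be wide enough for each window to contain at least two grid points.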
Let Using the law of large numbers, with an argument similar to those of Lemma A.2 and Proposition 4.1 of Xue and Zhu and Lemma 1 of Wu et al., it can be shown that , , and are consistent estimators of , , and , respectively. Finally, we can obtain the estimator by replacing and in with and , respectively. It can be shown that and are consistent estimators of and , respectively. By Theorem 2, we have that where is the unit matrix of order .
Using (21), a pointwise confidence interval for can be given by where is the th component of , is the th element of , and is the quantile value of the standard normal distribution.
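A pointwise interval of this normal-quantile type can be computed as in the sketch below. The function name and its arguments are hypothetical: `estimate` stands for a component of the nonparametric estimator at a given index value, and `std_error` for its plug-in standard error, already scaled by the sample size and bandwidth as the asymptotic result requires.

```python
from statistics import NormalDist


def pointwise_interval(estimate, std_error, alpha=0.05):
    """Two-sided normal-approximation interval with nominal level 1 - alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical values for illustration only.
lower, upper = pointwise_interval(estimate=1.2, std_error=0.1)
print(lower, upper)
```

Repeating the call over a grid of index values gives the pointwise band; it is not a simultaneous band for the whole curve.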
3. Numerical Results
In this section, we conduct several simulation experiments to illustrate the finite sample performances of the proposed method and consider a real data set analysis for further illustration.
3.1. Simulation Studies
To evaluate the performance of the proposed method, we consider the following model: where and . To perform the simulation, we generated samples, respectively. The covariates and are generated according to the model where , , , and , respectively.
We use the Epanechnikov kernel function and use the cross-validation method to determine bandwidth . Let and be the integral least-squares estimators of and , respectively, which are computed with all of the measurements except the th observation. Define the integral least-squares cross-validation function The cross-validation bandwidth is the one that minimizes (25).
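The leave-one-out idea behind this bandwidth choice can be sketched in a deliberately simplified setting. The example below assumes a model with only a smooth mean curve (no parametric part and no covariates), uses a local constant kernel smoother instead of the local linear fit, and approximates the integral by a sum on an equally spaced grid; all function names and the candidate bandwidths are hypothetical.

```python
import math
import random


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smooth_mean(curves, grid, t0, h):
    """Kernel-weighted (local constant) estimate of the mean curve at t0."""
    num = den = 0.0
    for k, t in enumerate(grid):
        w = epanechnikov((t - t0) / h)
        if w > 0.0:
            num += w * sum(c[k] for c in curves)
            den += w * len(curves)
    return num / den  # den > 0 because t0 itself lies on the grid


def cv_score(curves, grid, h):
    """Integrated squared prediction error, leaving out one curve at a time."""
    step = grid[1] - grid[0]
    score = 0.0
    for i in range(len(curves)):
        rest = curves[:i] + curves[i + 1:]
        for k, t0 in enumerate(grid):
            score += (curves[i][k] - smooth_mean(rest, grid, t0, h)) ** 2 * step
    return score


def select_bandwidth(curves, grid, candidates):
    """Return the candidate bandwidth with the smallest cross-validation score."""
    return min(candidates, key=lambda h: cv_score(curves, grid, h))


# Hypothetical noisy curves around sin(2 pi t).
random.seed(0)
grid = [k / 10 for k in range(11)]
curves = [[math.sin(2 * math.pi * t) + random.gauss(0, 0.2) for t in grid]
          for _ in range(8)]
print(select_bandwidth(curves, grid, [0.15, 0.3, 0.6]))
```

In the semiparametric setting of this paper, the smoother would be replaced by the leave-one-out versions of both estimators, with the same outer loop over left-out observations.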
For the parametric component , the average and standard deviation of the estimator , based on 1000 simulations, are reported in Table 1. In addition, the average length and coverage probability of the confidence interval , with a nominal level , are computed with 1000 simulation runs. The results are also summarized in Table 1.
For the nonparametric component , the average pointwise confidence intervals, based on 1000 simulations, with a nominal level are presented in Figure 1, and the corresponding coverage probabilities are presented in Figure 2.
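The two summary measures reported for the confidence intervals, empirical coverage probability and average length over simulation runs, can be computed as in the following sketch. The function name and the toy intervals are hypothetical and serve only to show the calculation.

```python
def coverage_and_length(intervals, truth):
    """Empirical coverage and mean length of (lower, upper) intervals."""
    hits = sum(1 for lower, upper in intervals if lower <= truth <= upper)
    lengths = [upper - lower for lower, upper in intervals]
    return hits / len(intervals), sum(lengths) / len(lengths)


# Three hypothetical simulation runs for a true parameter value of 1.0.
runs = [(0.0, 2.0), (1.5, 3.0), (0.5, 1.5)]
coverage, mean_length = coverage_and_length(runs, truth=1.0)
print(coverage, mean_length)
```

For the nonparametric component, the same function would be applied separately at each grid point, which gives the pointwise coverage curve.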
Table 1 shows that, for the parametric component, our method gives short confidence intervals with coverage probabilities close to the nominal level. Figures 1 and 2 show that the average interval length decreases as the sample size increases, while the corresponding coverage probability increases. In addition, for the nonparametric component, the proposed estimation method works well except at boundary points.
3.2. Application to Spectrometric Curves Data
In this section, we apply the proposed estimation method to spectrometric curves data. The data come from a quality control problem in the food industry. The data set concerns samples of finely chopped meat; each food sample contains finely chopped pure meat with different fat, protein, and moisture (water) contents. The sample size of this data set is , and, for each food sample, the functional data consist of channel spectrum of absorbances, which were recorded on the Tecator Infratec Food and Feed Analyzer working in the wavelength range 850–1050 nm by the near infrared transmission (NIT) principle. Because of the fineness of the grid, we can treat each observation as a continuous curve. Thus, each spectrometric analysis can be summarized by a continuous curve giving the observed absorbance as a function of the wavelength. More details of the data can be found in Ferraty and Vieu.
The aim is to find the relationship between the percentage of fat content and the corresponding percentages of protein content , the moisture content , and the spectrometric curve . The results, obtained by Aneiros-Pérez and Vieu , indicate that there is a strong linear relationship between the fat content and the protein and moisture contents, but the spectrometric curve has a functional effect on the fat content. Hence, we consider the following semiparametric model:
We computed the estimators of the parametric components and and the nonparametric component by using the proposed integral least-squares method. The results for the parametric components are reported in Table 2, and the results for the nonparametric component are reported in Figure 3, where the solid curve is the estimator of and the dashed curves show the pointwise confidence interval of . From Table 2, we can see that there is a significant negative relationship between the fat content and the protein and moisture contents. In addition, Figure 3 indicates that the baseline function varies with the spectrometric curve . This finding is broadly consistent with the results of Aneiros-Pérez and Vieu.
Proof of Theorems
For convenience and simplicity, let denote a positive constant which may take a different value at each appearance throughout this paper. Before proving the main theorems, we list the regularity conditions used in this paper.
- The bandwidth satisfies , for some constant .
- The kernel is a symmetric probability density function, and .
- , , and are twice continuously differentiable on .
- , , and and are continuous at , , where is the th component of .
- For given , is a positive definite matrix.
Lemma A.1. Let and be two sequences of real numbers. Let ; then, one has
Lemma A.2. Let , , be a sequence of mutually independent random variables with and . Then, one has Further, let be a permutation of . Then, we have
Lemma A.3. Let be i.i.d. random vectors, where the are scalar random variables. Further, assume that , , where denotes the joint density of . Let be a bounded positive function with a bounded support, satisfying a Lipschitz condition. Then, provided that for some .
Lemma A.4. Suppose that conditions hold. Then, one has where , are defined in (9) and .
then, a simple calculation yields
together with (A.6), (A.7), by using Lemma A.3, we obtain
uniformly for , where is the Kronecker product. By using the same argument, we have
uniformly for . Combining (A.9), (A.10), and (9) yields
uniformly for . With a proof similar to that of (A.11), we have
Combining (A.12) with (5), we can prove that
uniformly for .
Invoking (A.9) and (A.13), it is easy to show that This completes the proof of Lemma A.4.
Lemma A.5. Suppose that conditions hold. Then, one has that where is defined by (14) and .
Proof. By (8), a simple calculation yields where . Then, we have Hence, we get Note that , , are i.i.d. It is easy to show that , . Using the central limit theorem, we have Hence, to prove this lemma, we only need to prove , . Now, we deal with . Let be the component of and let be the th component of , . Moreover, let Let be the th component of . By Lemmas A.1–A.4, we obtain That is, . By Lemma A.4, a simple calculation yields Together with , and similar to the proof of , we have . In addition, by Lemma A.4, we have . This completes the proof of Lemma A.5.
Proof. Combining (8) and (11), a simple calculation yields By the law of large numbers, we can derive that . We now show . From Lemma A.4, we have that Together with , using an argument similar to that of Lemma A.5, we can prove . Similarly, we can prove that . In addition, by (A.25), it is easy to show that . This completes the proof.
Proof of Theorem 2. For in a small neighborhood of , such that , by using Taylor expansion, we can get
Together with (5) and (1), we can derive that
Invoking (18), (A.9), and (A.29), it is easy to show that
Let ; we can prove that and , where By the central limit theorem, we have that Combining this with (A.9), we can prove that Using an argument similar to that for (A.9) in Lemma A.4, we can get Together with (A.9) and , it is easy to prove that Invoking (A.31), (A.34), and (A.36), a simple calculation yields This completes the proof of Theorem 2.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This paper is supported by the National Natural Science Foundation of China (11101119), the Higher-Education Reform Project of Guangxi (Grant no. 2014JGA209), and the Training Program for Excellent Young Teachers in Guangxi Universities.
[1] J. O. Ramsay and B. W. Silverman, Applied Functional Data Analysis: Methods and Case Studies, Springer Series in Statistics, Springer, Berlin, Germany, 2002.
[2] J. O. Ramsay and B. W. Silverman, Functional Data Analysis, Springer, New York, NY, USA, 1997.
[3] A. Ait-Saïdi, F. Ferraty, and P. Vieu, “Cross-validated estimations in the single-functional index model,” Statistics, vol. 42, no. 6, pp. 475–494, 2008.
[4] F. Ferraty, A. Goia, E. Salinelli, and P. Vieu, “Functional projection pursuit regression,” TEST, vol. 22, no. 2, pp. 293–320, 2013.
[5] D. Chen, P. Hall, and H. Müller, “Single and multiple index functional regression models with nonparametric link,” The Annals of Statistics, vol. 39, no. 3, pp. 1720–1747, 2011.
[6] F. Ferraty and P. Vieu, “Nonparametric models for functional data, with application in regression, time-series prediction and curve discrimination,” Journal of Nonparametric Statistics, vol. 16, no. 1-2, pp. 111–125, 2004.
[7] M. Rachdi and P. Vieu, “Nonparametric regression for functional data: automatic smoothing parameter selection,” Journal of Statistical Planning and Inference, vol. 137, no. 9, pp. 2784–2801, 2007.
[8] M. Escabias, A. Aguilera, and M. Valderrama, “Functional PLS logit regression model,” Computational Statistics & Data Analysis, vol. 51, no. 10, pp. 4891–4902, 2007.
[9] F. Yao, H. Müller, and J. Wang, “Functional linear regression analysis for longitudinal data,” The Annals of Statistics, vol. 33, no. 6, pp. 2873–2903, 2005.
[10] J. T. Zhang and J. Chen, “Statistical inferences for functional data,” The Annals of Statistics, vol. 35, no. 3, pp. 1052–1079, 2007.
[11] G. Aneiros-Pérez and P. Vieu, “Semi-functional partial linear regression,” Statistics and Probability Letters, vol. 76, no. 11, pp. 1102–1110, 2006.
[12] H. Shin, “Partial functional linear regression,” Journal of Statistical Planning and Inference, vol. 139, no. 10, pp. 3405–3418, 2009.
[13] H. Lian, “Functional partial linear model,” Journal of Nonparametric Statistics, vol. 23, no. 1, pp. 115–128, 2011.
[14] L. Xue and L. Zhu, “Empirical likelihood for a varying coefficient model with longitudinal data,” Journal of the American Statistical Association, vol. 102, no. 478, pp. 642–654, 2007.
[15] C. O. Wu, C. T. Chiang, and D. R. Hoover, “Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1388–1402, 1998.
[16] F. Ferraty and P. Vieu, Nonparametric Functional Data Analysis, Springer, New York, NY, USA, 2006.
Copyright © 2014 Limian Zhao and Peixin Zhao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.