Abstract

We introduce a new partially linear functional additive model and consider the problem of variable selection for this model. Based on the functional principal components method and a centered spline basis function approximation, a new variable selection procedure is proposed using the smooth-threshold estimating equation (SEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero and simultaneously estimates the nonzero regression coefficients by solving the SEE. The approach avoids nonsmooth optimization, and it is flexible and easy to implement. We establish the asymptotic properties of the resulting estimators under some regularity conditions. We apply the proposed procedure to analyze a real data set: the Tecator data set.

1. Introduction

Functional data may be viewed as realizations of observed stochastic processes, and they are commonly encountered in many fields of applied science, such as econometrics, biomedical studies, and physics experiments. The Tecator data set was collected by the Tecator company and is publicly available at http://lib.stat.cmu.edu/datasets/tecator. This data set consists of 215 meat samples. The measurements were made with a spectrometer named the Tecator Infratec Food and Feed Analyzer, and the spectral curves were recorded at wavelengths ranging from 850 nm to 1050 nm. For each meat sample, the data consist of a 100-channel spectrum of absorbances as well as the contents of moisture (water), fat, and protein. The three contents of fat, protein, and moisture (water), measured in percentages, are determined by analytic chemistry. We aim to predict the fat content of a meat sample. In this paper, we propose a new partially linear functional additive model and apply the SEE procedure to analyze the Tecator data set.

With the development of computer technology, much progress has been made on developing methodologies for analyzing functional data by many researchers, like Ramsay and Silverman [1], Cardot, Ferraty, and Sarda [2], Lian and Li [3], Fan, James, and Radchenko [4], Feng and Xue [5], Yu, Zhang, and Du [6], Zhou, Du, and Sun [7], among others. Regression models play a major role in functional data analysis. The most widely used regression model is the following functional linear model: $Y = \int_0^1 X(t)\beta(t)\,dt + \varepsilon$, (1) where $Y$ is a scalar response, the functional predictor $X(t)$ is a smooth and square-integrable random function defined on a compact domain, taken to be $[0,1]$ for simplicity, $\beta(t)$ is the square-integrable regression parameter function, and $\varepsilon$ is a random error, which is independent of $X(t)$. A commonly adopted approach for fitting model (1) is the basis expansion; that is, $X(t) = \sum_{k=1}^{\infty} \xi_k \phi_k(t)$, where $\xi_k = \int_0^1 X(t)\phi_k(t)\,dt$. Model (1) is then transformed to a linear form with the coefficients $\xi_k$: $Y = \sum_{k=1}^{\infty} b_k \xi_k + \varepsilon$, where $b_k = \int_0^1 \beta(t)\phi_k(t)\,dt$. The basis function set $\{\phi_k\}$ can be either predetermined (e.g., Fourier basis, wavelets, or B-spline basis) or data-driven. One convenient choice of data-driven basis is the eigenbasis of the autocovariance operator of $X(t)$, in which case the random coefficients $\xi_k$ are called the functional principal component (FPC) scores. The FPC scores have zero means and variances equal to the corresponding eigenvalues $\lambda_k$. We focus on the FPC representation of the functional regression throughout this paper.
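As a concrete illustration of the FPC representation described above, the following sketch estimates FPC scores and eigenvalues from curves observed on a common grid via an eigendecomposition of the sample covariance. The variable names and the toy data are illustrative, and grid quadrature weights are taken to be one for simplicity:

```python
# Sketch: estimating functional principal component (FPC) scores from
# discretized curves (unit grid spacing assumed; names are illustrative).
import numpy as np

def fpc_scores(X, n_components=3):
    """X: (n_samples, n_grid) matrix of curves on a common grid.
    Returns centered FPC scores and the leading eigenvalues."""
    Xc = X - X.mean(axis=0)                       # center the curves
    n = X.shape[0]
    C = Xc.T @ Xc / n                             # discretized covariance operator
    vals, vecs = np.linalg.eigh(C)                # eigendecomposition
    order = np.argsort(vals)[::-1][:n_components]
    lam, phi = vals[order], vecs[:, order]        # leading eigenpairs
    xi = Xc @ phi                                 # scores xi_ik = <X_i - mean, phi_k>
    return xi, lam

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100)).cumsum(axis=1)  # toy random-walk "curves"
xi, lam = fpc_scores(X)
```

By construction the scores have zero sample means, and their sample variances equal the corresponding eigenvalues, matching the FPC properties stated above.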

Müller and Yao [8] relaxed the linearity assumption and proposed a functional additive model (FAM). This leads to a more widely applicable and flexible framework for functional regression models. In the case of a scalar response, the linear structure is replaced by a sum of nonlinear functional components; that is, $Y = \mu + \sum_{k=1}^{\infty} f_k(\xi_k) + \varepsilon$, (2) where the $f_k$ are unknown smooth functions. Model (2) was fitted in Müller and Yao [8] by estimating the FPC scores $\xi_k$ by functional principal component analysis and estimating the component functions $f_k$ by local polynomial smoothing. Zhu, Yao, and Zhang [9] proposed a new regularization framework for the structure estimation of the FAM in the context of reproducing kernel Hilbert spaces. The selection was achieved by penalized least squares using a penalty which encourages a sparse structure of the additive components, and the rate of convergence was investigated.

However, in many real world problems, it is common to collect information on a large number of nonfunctional predictors. How to incorporate scalar predictors into the functional regression and perform model selection are important issues. In this paper, we combine the linear model with the functional additive model and introduce a new partially linear functional additive model (PLFAM).

Traditional real-valued additive models were studied in Stone [10], Wang and Yang [11], Huang, Horowitz, and Wei [12], and Zhao and Xue [13]. When the explanatory variables are of a functional nature, Ferraty and Vieu [14] used a two-step procedure to estimate an additive model with two functional predictors. Fan, James, and Radchenko [4] suggested a new penalized least squares method to fit the nonlinear functional additive model. This method can efficiently fit high-dimensional functional models while simultaneously performing variable selection to identify the relevant predictors. Febrero-Bande and Gonzalez-Manteiga [15] extended the ideas of generalized additive models with multivariate data to functional data covariates. The proposed algorithm was a modified version of the local scoring and backfitting algorithm that allows for nonparametric estimation of the link function.

In the last decades, variable selection has received substantial attention and has become a very important topic in regression analysis. Generally speaking, most variable selection procedures are based on penalized estimation with some penalty function, like the Lasso penalty [16], the SCAD penalty [17], the adaptive Lasso [18], and so on. However, these penalty functions have a singularity at zero, so the resulting penalized estimation procedures require solving nonsmooth (and possibly nonconvex) optimization problems, which adds to the computational burden. To overcome this problem, Ueki [19] developed a new variable selection procedure called the smooth-threshold estimating equation that can automatically eliminate irrelevant parameters by setting them to zero. The method has been successfully applied to a large class of models; for example, Lai, Wang, and Lian [20] explored generalized estimating equation (GEE) estimation and smooth-threshold generalized estimating equation (SGEE) variable selection for single-index models with clustered data. Li et al. [21] considered SGEE variable selection for the generalized linear model with longitudinal data. Tian, Xue, and Xu [22] proposed a smooth-threshold estimating equation variable selection for varying coefficient models with longitudinal data.

As we know, functional regression models have been widely applied to engineering problems. For example, Escabias, Aguilera, and Valderrama [23] used functional logistic regression to deal with an environmental problem, namely estimating the risk of drought in a specific zone from the time evolution of temperatures. Sonja, Branimir, and Dražen [24] dealt with tool wear in the milling process and the prediction of its behaviour by utilizing functional data analysis (FDA) methodology. Pokhrel and Tsokos [25] applied functional data analysis techniques to model the age-specific brain cancer mortality trend and forecast entire age-specific functions using exponential smoothing state-space models.

In this article, we propose a new functional regression model, consider the variable selection problem for this model, and apply the proposed procedure to analyze the Tecator data set. Motivated by the idea of Ueki [19], and based on functional principal components analysis and a centered spline basis function approximation, an automatic variable selection procedure is proposed using the smooth-threshold estimating equation. The proposed procedure automatically eliminates the irrelevant parameters in the model while estimating the nonzero regression coefficients. Our approach is flexible and can be implemented without solving any nonsmooth optimization problem, which reduces the computational burden, and it shares desirable features including the oracle property. Finally, the proposed method is applied to analyze a real data set, the Tecator data set, and the validity of the partially linear functional additive model and the SEE method is confirmed.

The rest of this paper is organized as follows. In Section 2, we propose a variable selection procedure for PLFAM and study the asymptotic properties under some regularity conditions. In Section 3, we give the computation of the estimators as well as the choice of the tuning parameters. In Section 4, we apply the proposed method to analyze the Tecator data set. Concluding remarks are presented in Section 5. The technical proofs of all asymptotic results are provided in the Appendix.

2. Methodology and Main Results

Let $Y$ be a real-valued random variable, $Z=(Z_1,\dots,Z_p)^T$ be a $p$-dimensional vector of random variables, and $X(t)$ be a zero mean and square-integrable random function defined on the interval $[0,1]$. Assume that $\{(Y_i,Z_i,X_i),\ i=1,\dots,n\}$ are independent and identically distributed realizations of $(Y,Z,X)$. Denote by $\{\xi_k\}_{k\ge 1}$ the FPC score sequence of $X$, which is associated with eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge 0$ with $\sum_{k=1}^{\infty}\lambda_k<\infty$.

2.1. Smooth-Threshold Estimating Equation

For the convenience of model regularization, we would like to restrict the predictor variables to take values in $[0,1]$ without loss of generality. This is achieved by taking a transformation of the FPC scores $\xi_k$ through a cumulative distribution function (CDF) for all $k$. We take the normal CDF, denoted by $\Phi_{\lambda_k}$, with zero mean and variance $\lambda_k$, so that the transformed score is $\Phi_{\lambda_k}(\xi_k)=\Phi(\xi_k/\sqrt{\lambda_k})$. It is easy to see that, if the $\xi_k$ follow a normal distribution, the normal CDF leads to transformed variables uniformly distributed on $[0,1]$.
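The CDF transform just described can be sketched in a few lines; the sample size and eigenvalues below are illustrative:

```python
# Sketch of the CDF transform zeta_k = Phi(xi_k / sqrt(lambda_k)), which maps
# each FPC score into (0, 1); for Gaussian scores the result is Uniform(0, 1).
import numpy as np
from scipy.stats import norm

def transform_scores(xi, lam):
    # columns of xi are scaled by the square root of their eigenvalue,
    # then pushed through the standard normal CDF
    return norm.cdf(xi / np.sqrt(lam))

rng = np.random.default_rng(1)
lam = np.array([4.0, 1.0])                          # illustrative eigenvalues
xi = rng.standard_normal((2000, 2)) * np.sqrt(lam)  # Gaussian FPC scores
zeta = transform_scores(xi, lam)
```

For Gaussian scores, each column of `zeta` is approximately uniform on $(0,1)$, which is exactly the property used to justify the transform in the text.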

Denoting the transformed variable of $\xi_k$ by $\zeta_k$, i.e., $\zeta_k=\Phi_{\lambda_k}(\xi_k)$, and denoting $\zeta=(\zeta_1,\zeta_2,\dots)$, we propose a partially linear functional additive model as follows: $Y = Z^T\beta + \sum_{k=1}^{\infty} f_k(\zeta_k) + \varepsilon$, (3) where $\varepsilon$ is an independent error with zero mean and variance $\sigma^2$, $\beta$ is a $p$-dimensional vector of unknown regression coefficients, and each $f_k$ is a smooth function. To ensure identifiability, we assume that $E f_k(\zeta_k)=0$ for all $k$. In this paper, we assume that the underlying true model of the PLFAM has a sparse structure, and this assumption is critical in the context of functional data analysis. It means that the number of important functional additive components that contribute to the response is finite, but not necessarily restricted to the leading terms. In particular, we denote by $\mathcal{S}=\{k: f_k\neq 0\}$ the index set of the important FPC scores and assume that $|\mathcal{S}|<\infty$, where $|\cdot|$ denotes the cardinality of a set. In other words, there is a sufficiently large $d$ such that $\mathcal{S}\subseteq\{1,\dots,d\}$, which implies that $f_k\equiv 0$ as long as $k>d$. Model (3) is thus equivalent to $Y = Z^T\beta + \sum_{k=1}^{d} f_k(\zeta_k) + \varepsilon$. (4) We replace each $f_k$ with its basis function approximation. Let $\{B_l(z),\ l=1,\dots,L\}$ be the centralized B-spline basis functions of order $m$, where $L=K_n+m$ and $K_n$ is the number of interior knots. Thus, $f_k$ can be approximated by $f_k(z)\approx\sum_{l=1}^{L}\gamma_{kl}B_l(z)$. Substituting this into model (4), we have $Y \approx Z^T\beta + \sum_{k=1}^{d}\sum_{l=1}^{L}\gamma_{kl}B_l(\zeta_k) + \varepsilon$.
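A centered B-spline design matrix of the kind used in the approximation above can be sketched as follows; the spline order and number of interior knots are illustrative choices, and centering the columns empirically enforces the zero-mean identifiability constraint on each additive component:

```python
# Sketch: centered B-spline design matrix for one additive component f_k,
# evaluated at transformed scores z in (0, 1). Order/knot choices are
# illustrative, not taken from the paper.
import numpy as np
from scipy.interpolate import BSpline

def centered_bspline_design(z, n_interior=5, degree=3):
    """z: values in (0, 1). Returns the column-centered design matrix."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    # clamped knot vector on [0, 1]
    t = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    B = BSpline.design_matrix(z, t, degree).toarray()
    return B - B.mean(axis=0)        # center each basis column (E f_k = 0)

rng = np.random.default_rng(2)
z = rng.uniform(0.01, 0.99, size=200)
Bc = centered_bspline_design(z)      # 200 x (n_interior + degree + 1) matrix
```

With 5 interior knots and cubic splines this yields $L = K_n + m = 9$ basis columns per component, each with zero sample mean.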

Let $\Pi_i=(B_1(\zeta_{i1}),\dots,B_L(\zeta_{i1}),\dots,B_1(\zeta_{id}),\dots,B_L(\zeta_{id}))^T$, $W_i=(Z_i^T,\Pi_i^T)^T$, and $\theta=(\beta^T,\gamma^T)^T$ with $\gamma=(\gamma_{11},\dots,\gamma_{1L},\dots,\gamma_{d1},\dots,\gamma_{dL})^T$. Because $\zeta_{ik}$ is unknown, we substitute it by the estimator $\hat\zeta_{ik}=\Phi_{\hat\lambda_k}(\hat\xi_{ik})$, where $\hat\xi_{ik}$ and $\hat\lambda_k$ are the estimators of $\xi_{ik}$ and $\lambda_k$, respectively. Let $\hat\Pi_i$ and $\hat W_i$ denote $\Pi_i$ and $W_i$ with $\zeta_{ik}$ replaced by $\hat\zeta_{ik}$, and define the following estimating function of $\theta$: $U_n(\theta)=\frac{1}{n}\sum_{i=1}^{n}\hat W_i(Y_i-\hat W_i^T\theta)$.

Motivated by the idea of Ueki [19], we propose the following smooth-threshold estimating equation: $(I_{p+dL}-\hat\Delta)U_n(\theta)+\hat\Delta\theta=0$, (8) where $I_{p+dL}$ is the $(p+dL)$-dimensional identity matrix, and $\hat\Delta=\operatorname{diag}(\hat\Delta_1,\hat\Delta_2)$ is a diagonal matrix, where $\hat\Delta_1=\operatorname{diag}(\hat\delta_{11},\dots,\hat\delta_{1p})$ is a $p\times p$ diagonal matrix and $\hat\Delta_2=\operatorname{diag}(\hat\delta_{21}I_L,\dots,\hat\delta_{2d}I_L)$ is a $dL\times dL$ diagonal matrix. Note that (8) reduces to $U_n(\theta)=0$ for $\hat\Delta=0$, and its $j$th component reduces to $\theta_j=0$, that is, the $j$th parameter is eliminated, for $\hat\delta_j=1$. Therefore, (8) can yield a sparse solution. Unfortunately, we cannot directly obtain the estimator of $\theta$ by solving the smooth-threshold estimating equation (8). The reason is that (8) involves unknown threshold quantities, which need to be chosen using some data-driven criteria.

For the choice of the thresholds, Ueki [19] suggested that they may be determined by the data as $\hat\delta_{1j}=\min\{1,\hat\lambda/|\tilde\beta_j|^{1+\tau}\}$ and $\hat\delta_{2k}=\min\{1,\hat\lambda/\|\tilde\gamma_k\|^{1+\tau}\}$, with initial estimators $\tilde\beta_j$ and $\tilde\gamma_k$, respectively. The initial estimators $\tilde\beta$ and $\tilde\gamma$ are the solutions of the estimating equation $U_n(\theta)=0$. Note that this choice involves two tuning parameters $(\lambda,\tau)$. In Section 3, we will propose a BIC-type criterion to select the tuning parameters. Replacing $\hat\Delta$ in (8) by this data-driven version with diagonal elements $\hat\delta_{1j}$ and $\hat\delta_{2k}$, the smooth-threshold estimating equation (8) becomes $(I_{p+dL}-\hat\Delta)U_n(\theta)+\hat\Delta\theta=0$. (9)
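For a linear estimating function, the smooth-threshold estimating equation can be solved in closed form. The following sketch implements Ueki's thresholds $\delta_j=\min\{1,\lambda/|\tilde\theta_j|^{1+\tau}\}$ for an ordinary linear model; the design, tuning values, and grouping (scalar rather than group-wise thresholds) are simplifying assumptions relative to the paper:

```python
# Sketch of the smooth-threshold estimating equation (SEE) for the linear
# estimating function U(theta) = X^T (y - X theta) / n.  Thresholds with
# delta_j = 1 force theta_j = 0; small delta_j leave theta_j nearly unshrunk.
import numpy as np

def see_estimate(X, y, lam=0.1, tau=1.0):
    n = X.shape[0]
    theta_init = np.linalg.lstsq(X, y, rcond=None)[0]   # solves U(theta) = 0
    delta = np.minimum(1.0, lam / np.abs(theta_init) ** (1 + tau))
    D, I = np.diag(delta), np.eye(X.shape[1])
    # (I - D) X^T X theta / n + D theta = (I - D) X^T y / n
    A = (I - D) @ (X.T @ X) / n + D
    b = (I - D) @ (X.T @ y) / n
    theta = np.linalg.solve(A, b)
    theta[delta >= 1.0] = 0.0        # delta_j = 1 eliminates the parameter
    return theta

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
beta = np.array([2.0, 0.0, 1.5, 0.0, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(200)
theta = see_estimate(X, y)
```

Coefficients whose initial estimates are tiny get $\delta_j=1$ and are set exactly to zero, while large coefficients are only mildly shrunk, illustrating how (8) yields a sparse solution without nonsmooth optimization.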

The solution of (9), denoted by $\hat\theta=(\hat\beta^T,\hat\gamma^T)^T$, is called the SEE estimator. Thus, $\hat\beta$ is the estimator of $\beta$, and $\hat f_k(z)=\sum_{l=1}^{L}\hat\gamma_{kl}B_l(z)$ is the estimator of $f_k(z)$. For convenience of notation, we define $\hat A_1=\{j:\hat\beta_j\neq 0\}$ as the set of indices of nonzero regression coefficient estimators and $\hat A_2=\{k:\hat f_k\neq 0\}$ as the set of indices of nonzero function estimators; $\hat A_1^c$ and $\hat A_2^c$ are the sets of indices of zero regression coefficient estimators and zero function estimators, respectively. Since $U_n(\theta)$ is linear in $\theta$, the SEE estimator has the closed form $\hat\theta=\{(I-\hat\Delta)\hat W^T\hat W/n+\hat\Delta\}^{-1}(I-\hat\Delta)\hat W^T Y/n$, where $\hat W=(\hat W_1,\dots,\hat W_n)^T$, $Y=(Y_1,\dots,Y_n)^T$, and $I$ is the identity matrix with the same dimension as $\hat\Delta$.

2.2. Asymptotic Properties

We first introduce some notation. Let $\beta_0$ and $f_{0k}$ denote the true values of $\beta$ and $f_k$, respectively, and let $\gamma_0$ be the spline coefficient vector from the spline approximation to $f_{0k}$; write $\theta_0=(\beta_0^T,\gamma_0^T)^T$. For a square-integrable function $f$ on $[0,1]$, its norm is defined as $\|f\|=(\int_0^1 f^2(z)\,dz)^{1/2}$.

For convenience and simplicity, let denote a positive constant that may be different at each appearance throughout this paper. We list some regularity conditions that are used in this paper.

(C1) The transformation function is differentiable at and , and satisfies the fact that and for some positive constants and negative constants .

(C2) The second derivative is continuous on with probability 1 and with probability 1, for

(C3) The spline regression parameter is identified; that is, there is a unique , where the parameter space is compact.

(C4) The inner knots satisfy where , , , and .

(C5) is the th continuously differentiable on , where

The following theorem gives the consistency of our proposed estimators.

Theorem 1. Suppose that regularity conditions (C1)-(C5) hold. For any positive and , and , as , and if , one has

In the following theorem, we will show that such consistent estimators enjoy the sparsity property.

Theorem 2. Under the regularity conditions of Theorem 1, one has(i),(ii)

Next, we will show that the estimators of the nonzero coefficients for the parametric components have the same asymptotic distribution as that based on the correct model.

Theorem 3. Suppose that the regularity conditions of Theorem 1 hold; as , one has, where is defined in (A.23) in the Appendix, is the number of , is the identity matrix, and “” represents the convergence in distribution.

3. Issues in Practical Implementation

3.1. Computational Algorithm

Since the transformed FPC scores cannot be observed, we first need to estimate the FPC scores before the estimation and selection of . In what follows, we propose the algorithm to implement the estimation procedure.

Step 1. Apply functional principal component analysis (see Zhu, Yao, and Zhang [9]) to estimate the FPC scores of $X_i(t)$, denoted by $\hat\xi_{ik}$. Then we can obtain the transformed variables $\hat\zeta_{ik}=\Phi_{\hat\lambda_k}(\hat\xi_{ik})$, where $\hat\lambda_k$ is the estimated eigenvalue, and the number of retained components is chosen so that they explain a prespecified proportion of the total variation.

Step 2. Calculate the initial estimate $\tilde\theta$ of $\theta$ by solving the estimating equation $U_n(\theta)=0$.

Step 3. Choose the tuning parameters and by the BIC-type criterion in the next subsection.

Step 4. Solve the smooth-threshold estimating equation (9) and update the estimator of $\theta$ as $\hat\theta=\{(I-\hat\Delta)\hat W^T\hat W/n+\hat\Delta\}^{-1}(I-\hat\Delta)\hat W^T Y/n$.
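The truncation rule in Step 1 (retaining enough FPCs to explain a target share of the variation) can be sketched as follows; the eigenvalues and the target fraction are illustrative:

```python
# Sketch for Step 1: choose the number of FPCs whose cumulative variance
# share first reaches a target fraction (the target is an assumption here,
# not a value from the paper).
import numpy as np

def n_components_for(lam, target=0.95):
    """lam: eigenvalues in decreasing order. Returns the number of leading
    components needed to explain at least `target` of total variation."""
    frac = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(frac, target) + 1)

lam = np.array([5.0, 2.0, 1.0, 0.5, 0.25])   # toy decreasing eigenvalues
```

For these toy eigenvalues, four components are needed to reach 95% of the variation, while the first component alone already covers 50%.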

3.2. Selection of the Tuning Parameters

Following Ueki [19], we minimize a BIC-type criterion to choose the tuning parameters $(\lambda,\tau)$, where $\hat\theta_{\lambda,\tau}$ is the SEE estimator for given $(\lambda,\tau)$, and the model complexity is measured by the number of nonzero coefficients of $\hat\theta_{\lambda,\tau}$.
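A grid search over the tuning parameters can be sketched as below. The criterion form $n\log(\mathrm{RSS}/n)+\mathrm{df}\cdot\log n$ is a common BIC-type choice assumed here for illustration (the paper's exact criterion is not reproduced), and `toy_fit` is a stand-in for the SEE solver:

```python
# Sketch: selecting (lam, tau) by minimizing a BIC-type criterion
# n * log(RSS / n) + df * log(n), where df counts nonzero coefficients.
# The grid, criterion form, and the toy fitting routine are illustrative.
import numpy as np

def bic_select(X, y, fit, lams, taus):
    n = X.shape[0]
    best = None
    for lam in lams:
        for tau in taus:
            theta = fit(X, y, lam, tau)
            rss = np.sum((y - X @ theta) ** 2)
            df = np.count_nonzero(theta)
            bic = n * np.log(rss / n) + df * np.log(n)
            if best is None or bic < best[0]:
                best = (bic, lam, tau, theta)
    return best[1], best[2], best[3]

def toy_fit(X, y, lam, tau):
    # thresholded least squares standing in for the SEE solver
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    theta[np.abs(theta) ** (1 + tau) < lam] = 0.0
    return theta

rng = np.random.default_rng(4)
X = rng.standard_normal((150, 4))
y = X @ np.array([1.0, 0.0, -1.0, 0.0]) + 0.1 * rng.standard_normal(150)
lam_hat, tau_hat, theta_hat = bic_select(X, y, toy_fit,
                                         [0.01, 0.1, 0.5], [0.5, 1.0])
```

The `log n` penalty per active coefficient favors sparser fits whenever dropping a coefficient barely changes the residual sum of squares, which is why BIC-type criteria are popular for threshold selection.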

4. Application to Real Data

We demonstrate the effectiveness of the proposed method by an application to the Tecator data set. The Tecator data set contains 215 samples; each sample contains finely chopped pure meat with different moisture, fat, and protein contents, which are measured in percentages and are determined by analytic chemistry. The functional covariate $X_i(t)$ for each food sample consists of a 100-channel spectrum of absorbances recorded on a Tecator Infratec Food and Feed Analyzer working in the wavelength range 850-1050 nm by the near-infrared transmission (NIT) principle. In this analysis, the response $Y_i$ is the fat content, $\hat\xi_{ik}$ are the functional principal component (FPC) scores of $X_i(t)$, and we take the protein and moisture contents as the scalar covariates $Z_{i1}$ and $Z_{i2}$, respectively.

In order to predict the fat content of a meat sample, many models and algorithms have been proposed to fit the data; see, for example, Aneiros-Pérez and Vieu [26]. In this paper, to fit the data, we consider the following partially linear functional additive model: $Y_i = \beta_1 Z_{i1} + \beta_2 Z_{i2} + \sum_{k=1}^{d} f_k(\zeta_{ik}) + \varepsilon_i$. (16) To compare the performance of different models, the sample is divided into two data sets: the training sample is used to obtain the estimators of the parameters and the nonparametric functions, and the testing sample is used to verify the quality of prediction by the mean squared error of prediction (PMSE).

In this section, we consider the variable selection problem for model (16). We apply the proposed smooth-threshold estimating equation approach to eliminate the irrelevant parameters in the model while estimating the nonzero regression coefficients. The steps are as follows.

Step 1. Estimate the FPC scores of $X_i(t)$, denoted by $\hat\xi_{ik}$. Then we can obtain the transformed variables $\hat\zeta_{ik}=\Phi_{\hat\lambda_k}(\hat\xi_{ik})$, where $\hat\lambda_k$ is the estimated eigenvalue, and the number of retained components is chosen so that they explain a prespecified proportion of the total variation.

Step 2. Calculate the initial estimate $\tilde\theta$ of $\theta$ by solving the estimating equation $U_n(\theta)=0$.

Step 3. Choose the tuning parameters and by the BIC-type criterion.

Step 4. Solve the smooth-threshold estimating equations (9).
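The train/test evaluation by PMSE used in this analysis can be sketched as follows; the split size is illustrative rather than taken from the paper:

```python
# Sketch: random train/test split of the 215 Tecator samples and the mean
# squared error of prediction (PMSE) on the held-out set.  The split size
# (160 training samples) is an illustrative assumption.
import numpy as np

def pmse(y_test, y_pred):
    """Mean squared error of prediction on the held-out sample."""
    y_test, y_pred = np.asarray(y_test), np.asarray(y_pred)
    return float(np.mean((y_test - y_pred) ** 2))

rng = np.random.default_rng(5)
n = 215                                   # Tecator sample size
idx = rng.permutation(n)                  # random split of the indices
train_idx, test_idx = idx[:160], idx[160:]
```

Model comparisons such as those in Table 1 then reduce to fitting each candidate model on `train_idx` and reporting `pmse` on `test_idx`.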

The SEE method selects both parametric components (protein and moisture) and three nonparametric components, with the remaining coefficients estimated to be 0. Hence, in conclusion, there is a strong linear relationship between the fat content and the protein and moisture contents, and three of the functional additive components form an important nonparametric part. Our approach can be easily implemented and reduces the computational burden.

Aneiros-Pérez and Vieu [26] proposed several additive semifunctional models in which the scalar covariates enter through nonparametric functions. To assess the performance of the proposed PLFAM and the SEE method, we compare them with these additive semifunctional models as well as with the SCAD-penalized method and the Lasso-penalized method. The PMSE results are reported in Table 1. From Table 1 we can see the following.

The real-valued explanatory variables (protein and moisture) can be used to improve the accuracy of the prediction; this is consistent with the conclusion of Aneiros-Pérez and Vieu [26].

Compared with these additive semifunctional models, the PLFAM has a much smaller PMSE. The proposed PLFAM performs better than other models.

Compared to the SCAD-penalized method and the Lasso-penalized method, the PMSE of the PLFAM fitted by the SEE method is smaller than that of either.

These conclusions confirm the validity of the proposed PLFAM and SEE method.

5. Concluding Remarks

This article develops a SEE procedure for automatic variable selection in the partially linear functional additive model. The proposed procedure can identify significant variables in the parametric and nonparametric components simultaneously; it automatically eliminates the irrelevant parameters by setting them to zero while estimating the nonzero regression coefficients. It is noteworthy that the proposed procedure avoids nonsmooth optimization, and the resulting estimator enjoys the oracle property. The application to the Tecator data set confirms the validity of the proposed PLFAM and the SEE method.

Appendix

Proofs of the Main Results

In this Appendix, we will prove the main results stated in Section 2.

Proof of Theorem 1. Let , and . We will prove that, , there exists a constant , such that for large enough. This will be sufficient to ensure that there exists a local solution to the equation such that with probability at least . That is, and with probability at least . We will evaluate the sign of in the ball . Note that where lies between and . By some simple calculations, we have By the Cauchy-Schwarz inequality, we obtain that Since and , we only need to show the convergence rate of and . Given the initial estimator , and we assume that satisfies . By , for any and , we have This means that for each . Similarly, we get for each . Therefore, we have By this, (A.4), and (A.5), similar to the proof of Theorem 3.6 in Wang [27], we have , then
For , we have Hence Now consider , we can derive that where . For sufficiently large on the ball is asymptotically dominated in probability 1 by , which is positive for the sufficiently large . Thus, (A.1) holds. By some simple calculations, we have where for , and is a matrix with . From conditions (C3)-(C5) and Corollary 6.21 in Schumaker [28], we get that . Invoking the same arguments, we can get that . By , a simple calculation yields that In addition, it is easy to show that By conditions (C1) and (C2), and Lemma 2 in Zhu et al. [9], we get that Invoking (A.9)-(A.11), we complete the proof of Theorem 1.

Proof of Theorem 2. Similar to Theorem 1 in Zhao and Xue [29], it is known that the initial estimator obtained by solving the estimating equation is -consistent. Noting that , we can derive that which implies that . Invoking the same argument, we can derive that . Thus, we finish the proof of Theorem 2.

Proof of Theorem 3. As shown in Theorem 2, for with probability tending to 1. In addition, is the solution of the smooth-threshold estimating equation Applying a Taylor expansion to (A.13), we can show that can be approximated by On the other hand, where . Using the same argument, we obtain that By the simple calculation, we get By the fact , we have For simplicity, let By the inverse of the block matrix, we have where Consequently, we have where the elements of are for For convenience, we denote By (A.21), we have Similar to the proof of (A.16) in Tian, Xue, and Hu [30], we can show that We complete the proof of Theorem 3.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

Yuping Hu’s research was supported by the National Social Science Foundation of China (18BTJ021). Liugen Xue’s research was supported by the National Natural Science Foundation of China (11571025, Key Grant: 11331011) and the Beijing Natural Science Foundation (1182002). Sanying Feng’s research was supported by the National Natural Science Foundation of China (11501522) and the Startup Research Fund of Zhengzhou University (1512315004).