Abstract

Most previous research in nonparametric regression has developed a single type of estimator. In practice, however, data with mixed patterns are often encountered, in particular data in which one part of the pattern changes at certain subintervals while another part follows a recurring pattern along a certain trend. The estimator used for such data was a mixed estimator of smoothing spline and Fourier series, in which the regression model was approached by a smoothing spline component and a Fourier series component. The mixed estimator was obtained in two estimation stages: the first stage used penalized least squares (PLS), and the second stage used least squares (LS). The estimators were then applied to simulated data, generated from two different functions, namely, a polynomial and a trigonometric function, with a sample size of 100. The whole process was repeated 50 times. The two functions were modeled using a mixture of the smoothing spline and Fourier series estimators with various smoothing and oscillation parameters, and the model with the minimum generalized cross validation (GCV) was selected as the best model. The simulation results showed that the mixed estimator gave a minimum GCV value of 11.98. At the minimum GCV, the mean square error (MSE) was 0.71 and R² was 99.48%. These results indicated that the mixture estimator of smoothing spline and Fourier series modeled the data well.

1. Introduction

Two approaches are commonly used to estimate a regression curve: parametric regression and nonparametric regression. However, not all relationship patterns can be handled with a parametric approach, because there may be no information about the form of the relationship between the response variable and the predictor variable. If the shape of the curve is unknown and the data pattern is scattered, the regression curve can be estimated with a nonparametric regression model. Some nonparametric regression models that are widely used are spline [1] and Fourier series estimators [2].

Estimation methods have attracted a lot of attention from nonparametric regression researchers and have become popular among them. One of these methods is the smoothing spline estimator. The smoothing spline estimates nonparametric regression functions that are assumed to be smooth in the sense that the function belongs to a particular function space, usually a Sobolev space. The smoothing spline is also very good at handling data whose behavior changes at certain subintervals [3–7]. In addition to the spline estimator, another popular estimation technique in nonparametric regression is the Fourier series estimator. A Fourier series is a trigonometric polynomial that is flexible enough to adapt effectively to the local nature of the data. The Fourier series estimator is generally used when the pattern of the data is unknown and there is a tendency toward repeated patterns [8, 9]. Among the nonparametric regression models, the Fourier series is one that has a good statistical and visual interpretation [10]. The advantages of the Fourier series estimator are that it can handle data that follow repeated patterns at certain trend intervals and that it has a good statistical interpretation. In previous research studies, only one type of estimator was developed.

Along with the development of research on nonparametric regression, mixed estimators have recently been developed. Sudiarsa et al. developed a combination estimator of Fourier series and truncated spline [11]. Budiantara et al., Rismal et al., and Ratnasari et al. developed a mixture of kernel and truncated spline estimators [12–14]. Other research on mixed estimators was conducted by Afifah et al. and Nisa et al., who developed a mixture of kernel and Fourier series [15, 16]. A more recent mixture estimator, combining local polynomial and truncated spline, was developed by Suparti and Santoso [17]. A mixed smoothing spline and kernel estimator was developed by Hidayat et al. [18–20]. In practice, mixed data patterns often appear: some data patterns change at certain subintervals, and some follow recurring patterns at certain trends. To handle such data, in this study we developed a combination estimator of smoothing spline and Fourier series. Building on the previous research described above, this paper focuses on the nonparametric regression model that combines smoothing spline and Fourier series, obtained through optimization of penalized least squares (PLS). The combined estimator is then applied to simulated data. These data are generated from two different functions to represent two different patterns of predictor variables, so that the setting matches the combined estimator that was developed.

2. Materials and Methods

The data are given as paired observations consisting of predictor variables and a response variable. The relationship between the predictor variables and the response variable follows a multivariable nonparametric regression model:

Assume the multivariable nonparametric regression model is additive, so the regression model is obtained as follows:

In the estimation process using the PLS function, the Fourier series component is treated as a fixed model. This component is approximated by the Fourier series function, while the smoothing spline component g is assumed to be in a Sobolev space.

Based on equation (2), the following equation is obtained for each observation:

Its matrix form can be written as

The regression model in equation (4) can be written as

Furthermore, equation (5) can be written as

with

The smoothing spline component of the regression curve is assumed to be smooth in the sense that it is contained in a Sobolev space.

The Fourier series component of the regression curve is approximated by the Fourier series function:
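As a minimal sketch of how a Fourier series component with oscillation parameter K can be turned into a design matrix, the block below assumes the cosine form h(t) = b t + a0/2 + a1 cos(t) + ... + aK cos(K t). This form, the function name `fourier_design`, and the column ordering are illustrative assumptions and are not taken from the equations of this paper.

```python
import numpy as np

def fourier_design(t, K):
    """Design matrix with columns [t, 1/2, cos(t), ..., cos(K t)].

    Assumed (hypothetical) form of the Fourier series component:
    h(t) = b*t + a0/2 + sum_{k=1..K} a_k * cos(k*t).
    """
    t = np.asarray(t, dtype=float)
    columns = [t, np.full_like(t, 0.5)]                   # trend and a0/2 terms
    columns += [np.cos(k * t) for k in range(1, K + 1)]   # K cosine terms
    return np.column_stack(columns)

t = np.linspace(0.0, 2.0 * np.pi, 5)
X = fourier_design(t, K=2)
print(X.shape)  # (5, 4)
```

Each additional unit of K adds one cosine column, so the oscillation parameter directly controls how many coefficients the least squares stage has to estimate.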

The combined smoothing spline and Fourier series estimator in nonparametric regression is obtained in two stages. In the first stage, the smoothing spline component is estimated using the PLS method; in the second stage, the Fourier series components are estimated using the LS method. To estimate the smoothing spline components, equation (6) is modified to the following form:

where

The smoothing spline component is then estimated through the PLS optimization, which combines a goodness of fit with a penalty:

where the goodness of fit is expressed as

with

Next, the estimate from the first stage is substituted into regression equation (8). In the second stage, the Fourier series components are estimated by LS optimization. Finally, the results of the two-stage estimation are substituted into equation (6) to obtain the combined smoothing spline and Fourier series estimator in multivariable nonparametric regression.
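The two-stage idea can be sketched numerically for one spline predictor x and one Fourier predictor t. This is a simplified illustration under stated assumptions, not the estimator derived in this paper: the first stage uses the smoother matrix S of a natural cubic smoothing spline (Reinsch form), so that for fixed Fourier coefficients a the spline fit is S(y - Xa); the second stage substitutes this back and solves the least squares problem for a on (I - S)X and (I - S)y. The function names, the penalty scaling, the value of `lam`, and the test function are all hypothetical. The constant column is left out of X because the spline component already absorbs an intercept.

```python
import numpy as np

def spline_smoother_matrix(x, lam):
    """Hat matrix S of the natural cubic smoothing spline at sorted, distinct x.

    S minimises sum_i (y_i - g(x_i))^2 + lam * integral of g''(u)^2 du.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.diff(x)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    penalty = Q @ np.linalg.solve(R, Q.T)   # roughness penalty matrix
    return np.linalg.inv(np.eye(n) + lam * penalty)

def two_stage_fit(y, S, X):
    """Return (spline fit, Fourier coefficients) from the two-stage scheme."""
    M = np.eye(y.size) - S
    # Stage 2: least squares for the Fourier coefficients after the
    # stage-1 spline fit S(y - X a) has been substituted into the model.
    a_hat, *_ = np.linalg.lstsq(M @ X, M @ y, rcond=None)
    # Stage-1 spline estimate evaluated at the estimated coefficients.
    g_hat = S @ (y - X @ a_hat)
    return g_hat, a_hat

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)             # sorted spline predictor
t = rng.uniform(0.0, 2.0 * np.pi, n)     # Fourier predictor
X = np.column_stack([t, np.cos(t)])      # K = 1, no constant column
y = 4.0 * x**2 + 0.5 * t + 2.0 * np.cos(t) + rng.normal(0.0, 0.1, n)

S = spline_smoother_matrix(x, lam=1e-4)
g_hat, a_hat = two_stage_fit(y, S, X)
print(np.round(a_hat, 2))  # should be near the assumed coefficients [0.5, 2.0]
```

The fitted values are `g_hat + X @ a_hat`. For data that are not already ordered in x, the observations would need to be sorted by x before the smoother matrix is built.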

3. Results and Discussion

The function g in equation (6) is a function whose form is unknown and which is assumed to be smooth in the sense of being contained in the space W. The space W can be decomposed into the direct sum of two mutually perpendicular spaces W0 and W1, that is, W = W0 ⊕ W1 with W0 ⊥ W1. The form of the function g and the goodness of fit are given in the following theorem.

Theorem 1. Given the goodness of fit in equation (9), i.e.,

the goodness of fit can be written as

where

The proof of Theorem 1 is provided in Appendix A.

Lemma 1. Given the penalty of the PLS in equation (9), the penalty can be written as

where

The proof of Lemma 1 is provided in Appendix B.

Then, based on Theorem 1 and Lemma 1, the estimators of the smoothing spline component, the Fourier series component, and the combined smoothing spline and Fourier series are derived. The result is stated in Theorem 2.

Theorem 2. Given the regression model

the estimators of the smoothing spline and Fourier series components obtained through the PLS optimization of equation (9) are given by

where

The proof of Theorem 2 is provided in Appendix C.

3.1. Simulation

In this research, a simulation is conducted to show the ability of the combined estimator of smoothing spline and Fourier series in multivariable nonparametric regression. Data are generated from a polynomial function for the smoothing spline component and from trigonometric functions for the Fourier series component, with normally distributed errors. The simulation uses a sample size of n = 100 and oscillation parameters K = 1, K = 2, and K = 3. Each oscillation parameter is repeated fifty times. The regression equation designed for this simulation study is as follows:

with

The errors are generated from a normal distribution. A scatterplot of the simulated data is shown in Figure 1.
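A data-generating step of this kind can be sketched as follows. The sample size n = 100 and the 50 replications follow the text; the polynomial, the trigonometric function, the predictor ranges, and the error standard deviation `sigma` are hypothetical placeholders, since the functions actually used in this study are not reproduced here.

```python
import numpy as np

def simulate(n=100, sigma=0.5, seed=0):
    """Generate one simulated data set (x, t, y) with normal errors."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)            # predictor of the spline part
    t = rng.uniform(0.0, 2.0 * np.pi, n)    # predictor of the Fourier part
    g = 3.0 * x**2 - 2.0 * x                # hypothetical polynomial component
    h = 2.0 * np.cos(t)                     # hypothetical trigonometric component
    eps = rng.normal(0.0, sigma, n)         # normally distributed errors
    return x, t, g + h + eps

# Fifty replications, each with its own seed so that runs are reproducible.
replicates = [simulate(seed=r) for r in range(50)]
print(len(replicates), replicates[0][2].shape)  # 50 (100,)
```

Fixing a seed per replication makes it possible to refit the same data sets for each value of K, so that differences in GCV come from the model and not from the random draw.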

Figure 1 shows the simulated data in two dimensions. It can be seen that part of the data tends to change on particular subintervals, as in a smoothing spline pattern, and part of the data tends to repeat, as in a Fourier series pattern. The two-stage estimation is applied to the simulated data for various smoothing parameters and oscillation parameters, and the combination that gives the minimum GCV value is chosen as the best model. The GCV values for the oscillation parameters K = 1, K = 2, and K = 3 are given in Table 1.
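Selection by minimum GCV can be sketched for any linear estimator whose fitted values are ŷ = A y. The block below uses the standard form GCV = (RSS/n) / (trace(I - A)/n)^2; whether this matches the exact scaling used in this paper is an assumption. For brevity the candidate models are plain Fourier least squares fits for K = 1, 2, 3 on hypothetical data, and the helper names `gcv` and `projection_hat` are illustrative.

```python
import numpy as np

def gcv(y, hat):
    """Generalized cross validation score for fitted values hat @ y."""
    n = y.size
    resid = y - hat @ y
    mse = resid @ resid / n
    return mse / (np.trace(np.eye(n) - hat) / n) ** 2

rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 100)
y = 2.0 * np.cos(2.0 * t) + rng.normal(0.0, 0.1, t.size)  # hypothetical data

def projection_hat(K):
    """Hat matrix of a least squares fit on [1, cos(t), ..., cos(K t)]."""
    cols = [np.ones_like(t)] + [np.cos(k * t) for k in range(1, K + 1)]
    X = np.column_stack(cols)
    return X @ np.linalg.pinv(X)

scores = {K: gcv(y, projection_hat(K)) for K in (1, 2, 3)}
best_K = min(scores, key=scores.get)
print(best_K, scores)
```

For the combined estimator the same function applies once the hat matrix of the two-stage fit is available, and the search runs over both the smoothing parameter and K.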

Table 1 and Figure 2 show that the minimum GCV value is 39.79, obtained with the smoothing parameter 0.9 and the oscillation parameter K = 1. This model gives satisfactory results, with GCV = 39.79, MSE = 0.13, and R² = 99.48%.
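For reference, the two summary measures reported alongside GCV can be computed as in the sketch below. It assumes the usual definitions, MSE as the mean of squared residuals and R² as one minus the ratio of the residual to the total sum of squares; the paper does not state its exact formulas, and the numbers in the example are made up.

```python
import numpy as np

def mse_and_r2(y, y_hat):
    """Mean square error and coefficient of determination of a fit."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    resid = y - y_hat
    mse = np.mean(resid**2)
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    return mse, r2

mse, r2 = mse_and_r2([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(mse, r2)  # 0.25 0.8
```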

The plot of the estimation results against the original simulation data is presented in Figure 3. Based on Figure 3(a), the estimated values are very close to the original data, so the model predicts the simulated data well. Furthermore, on the left side of Figure 3(b), surface plots are formed from equation (C.15), which is an equation to generate simulated data. Figures 3(a) and 3(b) convey the same pattern, so the estimation formula in equation (C.15) can be used appropriately to estimate the simulation data.

Based on the theory, a mixed estimator of smoothing spline and Fourier series was obtained. The theory was then supported by simulation, in which the estimator produced a high value of R². This research can be extended by other studies using different estimators on data with mixed patterns. In addition, the optimization method can be used to solve other mixed estimator problems, multiresponse models, and longitudinal data.

4. Conclusions

Based on the discussion, the following conclusions can be drawn:
(a) Based on PLS optimization, estimators are obtained for the smoothing spline component, the Fourier series component, and the combined smoothing spline and Fourier series, which are given by

(b) The simulation shows that the mixed smoothing spline and Fourier series estimator performs well, with R² = 99.48% and MSE = 0.13.

Appendix

A. Proof for Theorem 1

Let g be a smoothing spline component. Given a basis of the space W0, where m is the order of the polynomial spline, and a basis of the space W1, where n is the number of observations, each function g in W can be written as follows:

where the component in W0 can be written as

while the component in W1 can be written as follows:

where the coefficients are constants. So, each function g can be described as follows:

with

and, by defining a bounded linear functional on the space W, the equation can be presented as follows:

Because this functional is a bounded linear functional on the space W, there is a unique element that represents it and satisfies the following equation:

Based on equation (A.3), (A.7) can be written as

For j = 1, equation (A.8) can be stated as follows:

and then for i = 1, we obtain

Continuing in the same way up to i = n, we obtain

The vector can be stated in the following form:

In the same way, for the remaining values of j, we can obtain the following equation:

Remembering , the matrix can be written as

Based on equations (A.12) and (A.13), spline estimator form in equation (A.3) can be stated in the following form:

Furthermore, the Fourier series component of the regression curve is a curve of unknown shape that is contained in the space of continuous functions. It is approximated by Fourier series functions as follows:

The Fourier series regression equation can be written as follows:

Equation (A.17) can be written in the following form:

In equation (A.16), the Fourier series function in the nonparametric regression component can be expressed in terms of its predictor in the following form:

where

The goodness of fit component is as follows:

where

So, it is proven that the goodness of fit can be written as

The penalty component of the PLS optimization is treated next.

B. Proof for Lemma 1

Penalty component can be obtained with the following explanation:

By substituting equation (B.1) into the penalty component, the following can be obtained:

So, it is proven that

C. Proof for Theorem 2

Theorem 1 gives the form of the spline function and its goodness of fit. In the first step, the mixed estimator is constructed using PLS, as described below.

Based on the goodness of fit in Theorem 1 and the penalty in Lemma 1, the combined estimator is derived as follows:

Introducing shorthand notation, equation (C.2) becomes

Substituting equation (C.3) into equation (C.4), we obtain

It has previously been described that ; then, .

Equation (C.5) can be rewritten as

Substituting equation (C.6) into equation (C.3), we get

so that

After the smoothing spline estimate is obtained, the Fourier series coefficients are sought in the second step below. Next, we substitute equation (C.8) into equation (4) so that it becomes

Substituting equation (A.19) into equation (C.9), we get

After getting equation (C.10), the next step is to find the estimator of the Fourier series coefficients, as follows:

The next step is to differentiate equation (C.11):

Substituting equation (C.12) into equation (A.19), we get

After obtaining the Fourier series estimate, we substitute equation (C.13) into equation (C.8), thus obtaining

Based on equation (C.14), we obtain a combination estimator of smoothing spline and Fourier series:

where

So, it is proven that

where

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Ni Putu Ayu Mirah Mariati thanks the Kemendikbud (DIKTI), the Republic of Indonesia, which has given the BPPDN scholarship. The authors thank Direktorat Jenderal Penguatan Riset dan Pengembangan, Kemendikbud (DIKTI), the Republic of Indonesia, which funded this study via grant Penelitian Disertasi Doktor (PDD) in 2020.