Abstract

Mixed estimators in nonparametric regression have so far been developed only for models with one response. However, biresponse cases in which the predictor variables follow different patterns, and which therefore call for mixed estimators, are often encountered. Therefore, in this article, we propose a biresponse nonparametric regression model with mixed spline smoothing and kernel estimators. This mixed estimator is suitable for modeling biresponse data in which some patterns (response vs. predictors) tend to change at certain subintervals, as in spline smoothing, while other patterns tend to be random and are commonly modeled using kernel regression. The mixed estimator is obtained through two-stage estimation, i.e., penalized weighted least squares (PWLS) followed by weighted least squares (WLS). Furthermore, the proposed biresponse model with mixed estimators is validated using simulated data. The estimator is also applied to data on the percentage of the poor population and the human development index. The results show that the proposed model can be appropriately implemented and gives satisfactory results.

1. Introduction

Regression analysis is one of the most popular statistical methods for prediction. It is commonly used to determine the functional relationship between independent variables (predictors) and dependent variables (responses) [1]. The functional relationship between predictor variables and response variables can have a known or an unknown pattern; if the pattern is unknown, the appropriate type of regression analysis is nonparametric regression [2]. In nonparametric regression, the regression curve is assumed to be smooth. This approach is highly flexible because the data themselves determine the estimate of the regression curve, without subjective input from the researcher [3]. Researchers have proposed several methods for estimating nonparametric regression functions, such as spline, kernel, and Fourier series functions. Spline nonparametric regression has been developed by Eubank [3], Becher et al. [4], and Wang et al. [5]. Hall and Huang [6], Okumura and Naito [7], Du et al. [8], Chamidah and Saifudin [9], and Erçelik and Nadar [10] developed kernel nonparametric regression. Bilodeau [11] and Amato et al. [12] estimated the nonparametric regression function with the Fourier series function.

In applying nonparametric regression modeling, researchers sometimes assume that every predictor variable has the same pattern. In real cases, however, the pattern between the response and each predictor often differs; if the researcher still insists on applying one type of estimator to all predictor variables, the estimation results can be inaccurate and produce a large error. Researchers have therefore begun to develop nonparametric regression with mixed estimators, including Hidayat et al. [13], Mariati et al. [14], and Octavanny et al. [15]. These mixed estimators are formed by referring to the idea of semiparametric regression. The semiparametric regression model is an additive regression model that consists of a parametric component and a nonparametric component; see the work of Green and Yandell [16], Roozbeh and Arashi [17], and Roozbeh and Najarian [18]. In these mixed estimators, the additive model concept of semiparametric regression is adapted by using two different nonparametric components. However, these mixed estimators use only one response variable, even though some biresponse cases also have different patterns among the predictor variables. At present, nonparametric regression with mixed estimators for biresponse cases has not yet been developed. Therefore, this study develops a new theory of the mixed spline smoothing and kernel estimator in biresponse nonparametric regression. This estimator extends the mixed estimator proposed by Hidayat et al. [13], in which the kernel estimator is considered to be fixed, whereas the kernel function in this paper is estimated. In addition, the new mixed estimator can be applied to biresponse cases. It is obtained through a two-stage estimation. The first stage uses penalized weighted least squares (PWLS) to obtain the spline smoothing component, and the second stage employs weighted least squares (WLS) to estimate the kernel component.

The spline smoothing estimator depends strongly on the smoothing parameter, while the kernel estimator depends strongly on the bandwidth parameter. These smoothing and bandwidth parameters are tuning parameters, and their optimal values produce the best regression model. In nonparametric and semiparametric regression, there are several methods for determining the optimal parameter values. Two popular methods are cross-validation (CV) and generalized cross-validation (GCV). The CV method selects the best model based on its predictive ability across all the different held-out datasets. This method is widely used, but its calculation becomes more complex as the number of datasets increases. In addition, for partial linear models, including mixed estimator models, leave-one-out cross-validation tends to be time consuming even for moderate sample sizes [19]. Craven and Wahba [20] modified the CV method to make the calculation simpler, and the result of this modification is called the GCV method. This method is widely used by researchers because it has several advantages: it is simple and efficient to calculate, it is invariant to transformation, and it does not require information about the variance. It also has optimal asymptotic properties compared with other methods [21, 22]. Some researchers have developed specific GCV methods for the models in their research, such as the GCV for semiparametric ridge regression with kernel smoothing [23–25]; several types of GCV have also been developed for uniresponse nonparametric regression with mixed estimators, including the mixed estimator of spline smoothing and kernel [13], the mixed estimator of spline smoothing and Fourier series [14], and the mixed estimator of truncated spline and Fourier series for longitudinal data [15]. Therefore, in this study, the best model is determined using a GCV method developed specifically for the mixed spline smoothing and kernel estimator in biresponse nonparametric regression.

Next, the proposed mixed estimator is applied to simulated data. The formula for generating the data contains two different functions to represent two different patterns of predictor variables. The estimator is also used to model the percentage of the poor population and the human development index in Papua Province, Indonesia. Empirical results indicate that the proposed mixed estimator performs very well for modeling data with two different patterns: one pattern (response vs. predictors) tends to change at certain subintervals, and the other appears to be random and is commonly modeled using kernel regression.

The rest of this paper is organized as follows. Section 2 presents the materials and methods for the two-stage estimation, i.e., PWLS followed by WLS. The proposed mixed spline smoothing and kernel estimator in biresponse nonparametric regression is explained in Section 3.1. The selection of smoothing and bandwidth parameters using generalized cross-validation (GCV) is described in Section 3.2. A simulation study and a real data analysis illustrate the performance of the proposed biresponse mixed estimator in Sections 3.3 and 3.4. The conclusions and further research are presented in the last section.

2. Materials and Methods

Paired data are given in which some predictor variables have a pattern that changes at certain subintervals, and the remaining predictors have a typically random pattern. This work adopts the idea of the semiparametric regression model developed by Green and Yandell [16], which employed an additive model to combine parametric and nonparametric models. Thus, the additive model for nonparametric biresponse regression with two different estimators can be formulated as follows: where is a regression curve. In the biresponse cases, if and are in pairs, then there is a correlation between the error of the h-response and the error of the h′-response . Error correlation between responses can be defined as , where [5].

Each regression curve is assumed to be unknown and additive so that it can be written as

If equation (2) is substituted into equation (1), then equation (1) can be expressed in the following vector form:
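For orientation, one way to write the model described above is sketched here. The symbols ($y_{hi}$, $x_i$, $t_i$, $\mu_h$, $f_h$, $g_h$, $\varepsilon_{hi}$) are notation assumed for this sketch rather than quoted from the displayed equations; $f_h$ stands for the component with the subinterval-changing pattern and $g_h$ for the component with the random pattern.

```latex
\begin{align}
y_{hi} &= \mu_h(x_i, t_i) + \varepsilon_{hi}, \qquad h = 1, 2;\; i = 1, \dots, n, \\
\mu_h(x_i, t_i) &= f_h(x_i) + g_h(t_i), \\
\mathbf{y} &= \mathbf{f} + \mathbf{g} + \boldsymbol{\varepsilon},
\qquad \mathbf{y} = (\mathbf{y}_1^{\top}, \mathbf{y}_2^{\top})^{\top}.
\end{align}
```

Under these assumptions, the three lines play the roles of the response model, the additive regression curve, and the stacked vector form, in that order.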

The component of the regression curve is approximated by the spline smoothing function, which is assumed to be smooth and contained in the Sobolev space , while the component of the regression curve is approximated by the kernel function. The mixed spline smoothing and kernel estimator in biresponse nonparametric regression can be obtained through the two-stage estimation method. The first stage estimates the spline smoothing component using the PWLS method, and the second stage estimates the kernel component using the WLS method. To estimate the spline smoothing part, equation (3) is modified to the following form: where .

Estimation of the spline smoothing component can be performed by applying PWLS optimization to equation (5), and then, the estimation results in the first stage are substituted into equation (3).

The estimation of the kernel component is obtained in the second stage using WLS optimization in equation (6). Furthermore, the results of the two-stage estimation are substituted into equation (3) to obtain the mixed spline smoothing and kernel estimator in biresponse nonparametric regression, where is a weighting matrix for the kernel estimator and in equations (5) and (6) is a weighting matrix for the biresponse nonparametric regression, as in the work of Wang et al. [5].
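The two-stage idea can be illustrated with a simplified, single-response sketch. It assumes that the first stage yields a linear smoother, f_hat = A (y - g), and that the second component is linear in a coefficient vector, g = X beta. The second-difference penalty smoother below is only a discrete stand-in for the smoothing spline, and `penalty_smoother`, `two_stage_fit`, `X`, and `omega` are hypothetical names rather than the paper's notation.

```python
import numpy as np

def penalty_smoother(n, lam):
    """Return A(lam) = (I + lam * D2'D2)^{-1} for n equally spaced, sorted points.

    D2 is the (n-2) x n second-difference matrix, so A leaves linear trends unchanged.
    """
    d2 = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * d2.T @ d2)

def two_stage_fit(y, A, X, omega=None):
    """Two-stage fit of y = f + X beta + error with a linear smoother A for f.

    Stage 1: for any fixed g, the smooth part is f_hat = A (y - g).
    Stage 2: substituting stage 1 into the model leaves
             (I - A) y = (I - A) X beta + error,
             which is solved by weighted least squares with weight omega.
    """
    n = y.size
    omega = np.eye(n) if omega is None else omega
    R = np.eye(n) - A
    Z, y_tilde = R @ X, R @ y
    beta = np.linalg.solve(Z.T @ omega @ Z, Z.T @ omega @ y_tilde)
    g_hat = X @ beta
    f_hat = A @ (y - g_hat)
    return f_hat, g_hat, beta

# Noise-free toy data (illustrative values only).
n = 30
t = np.linspace(0.0, 2.0, n)
f_true = 1.0 + 0.5 * np.arange(n)   # linear trend, left unchanged by A
X = np.exp(-t)[:, None]             # one hypothetical basis column for g
beta_true = np.array([2.0])
y = f_true + X @ beta_true

A = penalty_smoother(n, lam=100.0)
f_hat, g_hat, beta_hat = two_stage_fit(y, A, X)
```

Because the toy data contain no noise and the trend lies in the null space of the penalty, both components are recovered up to rounding error; with noisy data the split between the two components depends on the tuning parameter `lam`.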

3. Results and Discussion

3.1. Mixed Spline Smoothing and Kernel Estimator in Biresponse Nonparametric Regression

Before conducting the estimation to obtain biresponse mixed spline smoothing and kernel estimator, it is necessary to obtain the function form for each component of this mixed estimator. The structure of the spline smoothing component is explained in Lemma 1, while the kernel component is explained in Lemma 2.

Lemma 1. If the regression curve is assumed to be smooth and contained in the Sobolev space [22], then the function form of the spline smoothing component in the biresponse nonparametric regression can be stated as with where and are bases of spaces in the Sobolev space.

Proof. If is a function lying in a Hilbert space H, then H can be decomposed into a direct sum of two spaces H0 and H1, where and . If is the basis of and is the basis of H1, then, according to Wahba [22], for each function with and , we obtain where . Equation (10) is a bounded linear functional on H and ; therefore, equation (10) can be stated as and, using inner product properties, equation (12) can be written as and, for each response with all observations , we can obtain the following vector : where where and are equal to with ; ; ;
Therefore, for all responses , we can obtain the function form of the spline smoothing component in the biresponse regression curve as follows:

Lemma 2. If the regression curve is approximated by the kernel function, then the function form of the kernel component in the biresponse nonparametric regression can be expressed as where with . The function is approximated by a Taylor series with around .

Proof. The form of the regression function is derived from the component in equation (2). The form of the function is unknown and is approximated using the kernel estimator. The function for can be approximated by a Taylor series with around as follows [9]: and if , then equation (20) can be stated as where , is the predictor value for prediction.
The kernel estimator can be obtained when the polynomial order . Therefore, the function form for each response involving all observations can be stated as . Then, we can obtain the function form of component in biresponse nonparametric regression as follows: where ; .
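As an illustration of a kernel estimator written as a linear smoother, the following sketch builds the local-constant (Nadaraya-Watson) smoother matrix with a Gaussian kernel for a single response. It is a standard textbook construction offered under that assumption, not the exact matrix of Lemma 2, and the names `gaussian_kernel`, `kernel_smoother_matrix`, and `bandwidth` are chosen here for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_smoother_matrix(t, bandwidth):
    """Return the n x n Nadaraya-Watson smoother matrix for predictor values t.

    Row i holds the normalized kernel weights used to predict at t[i],
    so the fitted values are obtained as D @ y.
    """
    t = np.asarray(t, dtype=float)
    u = (t[:, None] - t[None, :]) / bandwidth
    weights = gaussian_kernel(u)
    return weights / weights.sum(axis=1, keepdims=True)

# Illustrative use on a smooth test signal.
t = np.linspace(0.0, 1.0, 50)
D = kernel_smoother_matrix(t, bandwidth=0.1)
g_hat = D @ np.sin(2.0 * np.pi * t)
```

Each row of `D` sums to one, so a constant signal is reproduced exactly; a smaller `bandwidth` concentrates the weights near the diagonal and gives a rougher fit.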

Theorem 1. The biresponse nonparametric regression model is given in equation (1), where each component of the regression curve is additive as stated in equation (2). The function form of the component is presented in Lemma 1, and the function form of the component is presented in Lemma 2. Using PWLS in the first-stage estimation and WLS in the second-stage estimation, the mixed spline smoothing and kernel estimator in biresponse nonparametric regression is obtained as follows: where while is obtained from the first-stage estimation and is obtained from the second-stage estimation.

Proof. The first-stage estimation of the mixed spline smoothing and kernel estimator in biresponse nonparametric regression is performed by estimating the spline smoothing component using the PWLS in equation (5). The penalty component in equation (5) can be obtained through the following decomposition [26]: where is the orthogonal projection of to H1 in H space. By substituting equation (26) into the penalty component, we obtain and, furthermore, the PWLS optimization in equation (5) can be written in matrix notation as follows: The solution of the PWLS optimization can be obtained from the partial derivatives with respect to and . The partial derivative with respect to gives the following result: if ; thus, we can obtain . The partial derivative of with respect to gives the result and, by substituting equation (30) into equation (31), we can obtain considering that so matrix V can be stated as ; then, we can modify . Equation (33) is substituted into equation (32), and then, we solve it and get the following result: Furthermore, by substituting equation (34) into equation (30), we obtain . and are substituted into the function form of the spline smoothing component in equation (16), and then, the following spline smoothing estimator component in the biresponse nonparametric regression model is obtained: where . Because , the first-stage estimation results can be stated as and recall that ; then, the model in equation (4) can be written as .
In the second stage of estimation, the function as the kernel component of the mixed spline smoothing and kernel estimator in biresponse nonparametric regression is estimated using the WLS method with the following formula: where is a weighting matrix for biresponse nonparametric regression and is a weighting matrix for the kernel estimator with the following structure: where and is the kernel function.
By substituting the results of the first-stage estimation (38) and the function form (Lemma 2) into the model of the mixed estimator (equation (3)), the error of this mixed estimator model can be written as follows: Furthermore, by supposing and substituting equation (42) into equation (40), we can get the following equation for WLS optimization: The solution of the WLS optimization can be obtained from the partial derivative with respect to . The optimization result is obtained as follows: Therefore, the estimate of the kernel component can be written as where . Based on the first-stage estimation results in equation (38) and the second-stage estimation results in equation (45), the estimate of the additive regression curve in equation (2) with the mixed spline smoothing and kernel estimator in biresponse nonparametric regression can be stated as and we can write equation (47) as where .

3.2. Selection of Smoothing and Bandwidth Parameters

The best model for the biresponse mixed spline smoothing and kernel estimator depends on the optimal smoothing parameters and optimal bandwidth parameters , where and are tuning parameters. These optimal parameters can be obtained from the model with the smallest generalized cross-validation (GCV) value; see [13–15, 23–25]. The GCV criterion is one of the methods for determining the best model in nonparametric regression [20]. The GCV formula for the mixed spline smoothing and kernel estimator in biresponse nonparametric regression in equation (48) can be stated as follows:
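A minimal sketch of GCV-based selection over a grid of tuning parameters follows. It assumes the common form GCV = MSE / (trace(I - C) / N)^2 for fitted values C y, with an optional weight matrix in the MSE. The combined smoother `toy_hat_matrix` is a hypothetical way of joining two linear smoothers and is not the matrix derived in Theorem 1; the data and grid values are illustrative only.

```python
import itertools
import numpy as np

def gcv_score(y, hat_matrix, weight=None):
    """GCV = MSE / (trace(I - C) / N)^2, with MSE = r' W r / N and r = y - C y."""
    y = np.asarray(y, dtype=float)
    n_obs = y.size
    weight = np.eye(n_obs) if weight is None else weight
    residual = y - hat_matrix @ y
    mse = residual @ weight @ residual / n_obs
    return mse / (1.0 - np.trace(hat_matrix) / n_obs) ** 2

def select_by_gcv(y, build_hat_matrix, grid):
    """Evaluate GCV for every (lam, phi) pair in grid and return the minimizer."""
    scored = [(gcv_score(y, build_hat_matrix(lam, phi)), (lam, phi))
              for lam, phi in grid]
    best_score, best_params = min(scored, key=lambda item: item[0])
    return best_params, best_score

# Toy data: x is sorted and equally spaced, t is a second predictor.
n = 40
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, n)
t = rng.uniform(0.0, 1.0, n)
y = (2.0 * x - 1.0) ** 2 + np.exp(-3.0 * t) + 0.1 * rng.standard_normal(n)

second_diff = np.diff(np.eye(n), n=2, axis=0)

def toy_hat_matrix(lam, phi):
    """Hypothetical combined smoother C = A (I - D) + D (not the paper's matrix)."""
    A = np.linalg.inv(np.eye(n) + lam * second_diff.T @ second_diff)
    u = (t[:, None] - t[None, :]) / phi
    D = np.exp(-0.5 * u**2)
    D /= D.sum(axis=1, keepdims=True)
    return A @ (np.eye(n) - D) + D

grid = list(itertools.product([0.1, 1.0, 10.0], [0.05, 0.1, 0.3]))
best_params, best_score = select_by_gcv(y, toy_hat_matrix, grid)
```

In practice the grid would be refined around the minimizer, in the same spirit as reporting parameter combinations around the optimal values.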

3.3. Simulation Study

In this simulation, the mixed spline smoothing and kernel estimator in biresponse nonparametric regression, as formulated in equation (48), is applied to simulated data. The data are generated from a formula that contains two different functions, i.e., a polynomial function and an exponential function, to represent two different patterns for each predictor. The polynomial function is used to generate spline smoothing-like patterns, and the exponential function is used to generate kernel-like patterns. Using two response variables and two predictor variables, the formula for generating data is defined as follows: with

The predictors are generated from and with the sample size and the random errors are generated from bivariate normal distributions with , and . The scatterplots of the simulated data are shown in Figure 1. It can be seen that the pattern between against tends to change at certain subintervals such as the spline smoothing pattern, while the scatterplot between against tends to have a random pattern that is commonly modeled with kernel regression.
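A data-generating scheme of this kind can be sketched as follows. Every numerical choice here is a hypothetical placeholder, since the specific functions, predictor distributions, sample size, and error parameters are not reproduced in the text above: uniform predictors, the polynomial and exponential functions in `f` and `g`, and the defaults for `n`, `rho`, and `sigma` are all assumptions of this sketch.

```python
import numpy as np

def simulate_biresponse(n=100, rho=0.7, sigma=(0.5, 0.5), seed=0):
    """Generate two correlated responses from polynomial and exponential parts.

    Returns x, t (predictors), y (n x 2 responses), and eps (n x 2 errors).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)   # predictor for the polynomial part (assumed)
    t = rng.uniform(0.0, 1.0, n)   # predictor for the exponential part (assumed)

    # Bivariate normal errors with correlation rho between the two responses.
    cov = np.array([[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
                    [rho * sigma[0] * sigma[1], sigma[1] ** 2]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n)

    # Hypothetical polynomial (f) and exponential (g) components per response.
    f = np.column_stack([4.0 * (2.0 * x - 1.0) ** 2,
                         -3.0 * (2.0 * x - 1.0) ** 2 + 2.0 * x])
    g = np.column_stack([np.exp(-2.0 * t), 2.0 * np.exp(-t ** 2)])

    y = f + g + eps
    return x, t, y, eps

x, t, y, eps = simulate_biresponse(n=5000, rho=0.7, seed=1)
```

Plotting each column of `y` against `x` and against `t` gives scatterplots analogous in purpose to those in Figure 1, though not the same data.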

In this simulation, the Gaussian kernel was employed. Based on the empirical results from the two-stage estimation, we obtain combinations of smoothing parameter and bandwidth parameter values around the optimal values (Table 1). Only some of the combinations are shown because of limited space. The best model is chosen based on the smallest GCV value, which results from the optimal smoothing parameters and along with the optimal bandwidth parameters and . This model produces the lowest GCV = 3.239 with and .

The results of modeling the simulated data using the mixed spline smoothing and kernel estimator are compared with those obtained using either a spline smoothing estimator or a kernel estimator only. The results of these models are presented in Table 2. Table 2 shows that the model with the smallest GCV value is the one with the mixed spline smoothing and kernel estimator. In addition, this model has the largest R² and the lowest MSE value.

The plot of the estimation results against the original simulated data is presented in Figure 2, where the estimated values (red triangles) are very close to the original data (black squares). Thus, the proposed model and estimation procedure can be used to predict the responses accurately. Furthermore, the surface plots on the left side of Figure 3 are formed using equation (50), which is the equation for generating the simulated data, whereas the right side of Figure 3 shows two surface plots, one for each response, formed from equation (48) with its parameters estimated using two-stage estimation, i.e., PWLS and WLS. The two sides of Figure 3 show similar surface shapes. This evidence indicates that the estimation procedure proposed in equation (48) can appropriately estimate the function that generated the simulated data.

3.4. Data Application

The biresponse mixed spline smoothing and kernel estimator proposed in this paper is applied to regress the percentage of the poor population (PPP), as the first response, and the human development index (HDI), as the second response, on several predictors. These two response variables are important because they are indicators of the success level of a country’s development. Biresponse modeling is adopted for the two variables because earlier studies found a negative correlation between PPP and HDI: the lower the PPP in a region, the higher the HDI in that region [27, 28].

Several variables that typically affect the two response variables are the gross regional domestic product (GRDP) and the population growth rate. Several researchers have pointed out factors that can affect the PPP and HDI. Grubaugh [29] stated that the variables influencing the growth of HDI in developing countries are population, population growth, and the initial level of the gross domestic product (GDP). Meanwhile, Mallick and Ghani [30] found that high population growth is the cause of poverty in Pakistan. In North Sumatra Province, Indonesia, the GRDP and the level of university education have a positive and significant influence on reducing the PPP [31]. Based on Malthus’s theory, poverty is considered an impact of high population growth rates [32]. Also, the growth of life-support resources is considered slower than population growth. A high rate of population growth decreases the quality of natural resources and reduces the opportunity for people to access life-support facilities. This situation can reduce the quality of human life and make it difficult for people to prosper. Rapid economic growth is one way to alleviate poverty [33]. The GDP or GRDP has a close relationship with economic growth because the economic growth of a region is related to an increase in production or an increase in income per capita. In addition, a higher GRDP increases the income per capita in the region, which improves the ability of the community to meet its needs and improve its quality of life.

The application of this mixed estimator model to the PPP and HDI in this study follows a preliminary study conducted by the authors. That study indicates that the GRDP has a pattern that changes at certain subintervals, like a spline pattern. In contrast, the population growth rate has a random pattern that is usually modeled by the kernel. Therefore, the PPP and HDI of Papua Province in 2017 are used as response variables, while the predictor variables are the GRDP and the population growth rate. The data were obtained from Statistics Indonesia (Badan Pusat Statistik–BPS), Papua Province, with 29 regencies/cities as the observation units. The biresponse mixed spline smoothing and kernel estimator in equation (48) was applied to the data. This modeling produces a minimum GCV value of 62.75, an R² of 96.54%, and an RMSE of 3.166. The R² value indicates that the model explains 96.54% of the variation in the responses. This finding shows that the biresponse mixed spline smoothing and kernel estimator is suitable for modeling the PPP and HDI in Papua Province. Also, the bar charts in Figures 4 and 5 show that the estimated values of PPP and HDI in Papua Province are close to their actual values.

Furthermore, to determine the predictive ability of this modeling, we use the model that has been obtained from the data in 2017 to predict the PPP and HDI values of Papua Province in 2018 and 2019. This prediction is carried out by applying the model that has been obtained to the GRDP and the population growth rate of Papua Province in 2018 and 2019. One of the criteria for the predictive ability of a model that can be obtained from this prediction is the Mean Absolute Percentage Error (MAPE) value. The MAPE value obtained from the predictions in 2018 is 3.8042% or the level of accuracy is 96.1958%, and the MAPE value from the predictions in 2019 is 5.1658% or the level of accuracy is 94.8342%. These MAPE values are less than 10%, which indicates that the biresponse nonparametric regression model with mixed spline smoothing and kernel estimators has a good predictive ability to predict PPP and HDI in Papua Province.
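The MAPE criterion used above can be computed as in the following sketch, which also derives the accuracy level as 100% minus MAPE, matching the way the two figures are paired in the text. The function name `mape` and the input values are illustrative and are not the Papua Province data.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, because each error is scaled by it.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# Illustrative values only.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
error_pct = mape(actual, predicted)
accuracy_pct = 100.0 - error_pct
```

Because each error is divided by the actual value, MAPE is scale-free, which allows responses measured on different scales, such as PPP and HDI, to be summarized together.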

4. Conclusions

This paper presents the biresponse nonparametric regression model with mixed spline smoothing and kernel estimators. This new mixed estimator is obtained through two-stage estimation: the first stage uses PWLS to obtain the spline smoothing component, and the second stage employs WLS to estimate the kernel component. The mixed estimator is designed to handle different data patterns among the predictors in the biresponse case, so it can provide better estimation results. The best model for the proposed estimator is the one that produces the minimum GCV value. The simulation results show that the biresponse mixed spline smoothing and kernel estimator provides better results than the biresponse spline smoothing estimator or the biresponse kernel estimator alone. Furthermore, the proposed estimator can be appropriately applied to model the percentage of the poor population (PPP) and the human development index (HDI) in Papua Province and gives satisfactory results. A limitation of this study is that we use only one predictor variable for each component of the estimator, i.e., one for spline smoothing and one for the kernel. In future work, this biresponse mixed estimator can be developed with more than one predictor for each estimator component. Apart from this limitation, this study contributes to the understanding of mixed estimators in biresponse nonparametric regression.

Data Availability

The data in this article are available in the repository of BPS, Papua Province (https://papua.bps.go.id).

Conflicts of Interest

The authors declare that there are no conflicts of interest in this paper.

Acknowledgments

The authors thank the Ministry of Research, Technology, and Higher Education (Ristekdikti), Republic of Indonesia, for supporting this research through the PMDSU scholarship.