Abstract

This study develops a new hypothesis test of model conformity between truncated spline nonparametric regression influenced by spatial heterogeneity and global truncated spline nonparametric regression. The test aims to determine the most appropriate model for the analysis of spatial data. The test statistic is constructed from the likelihood ratio of the parameter set under H0, whose components are parameters not influenced by the geographical factor, to the parameter set under the population, whose components are parameters influenced by the geographical factor. We derive the distribution of the test statistic by showing that its numerator and its denominator each follow a chi-square distribution. Because the matrix S is symmetric and idempotent, the result for the numerator can be proved directly. The matrix in the denominator is positive semidefinite and contains a weighting matrix that takes different values at every location; therefore it is not idempotent. For such a matrix and a normally distributed random vector, there exist constants k and r that determine the distribution of the quadratic form; it follows that the test statistic follows an F distribution. The method is applied to identify factors that influence the unemployment rate in 38 districts/cities in East Java, Indonesia.

1. Introduction

This study examines, from a theoretical standpoint, multivariate nonparametric regression influenced by spatial heterogeneity using a truncated spline approach. The model extends truncated spline nonparametric regression by taking geographic (spatial) factors into account. A truncated spline is a function built from polynomial components and truncated components, i.e., polynomial pieces joined at knot points, which allows it to follow changes in the behavior of the data. The truncated spline approach addresses a common problem in spatial data modeling: the relationship between the response variable and the predictor variables does not follow a known pattern, and the pattern changes within certain subintervals. In the model, the regression coefficients of the predictor variables depend on the location where the data are observed, because environmental and geographic characteristics differ between observation sites; each observation therefore has a different variation (spatial heterogeneity). Spatial data are one type of dependent data, in which data at one location are influenced by measurements at other locations (spatial dependency).

This study derives a model conformity hypothesis test between multivariable nonparametric regression influenced by spatial heterogeneity with a truncated spline approach and multivariable nonparametric regression in general. The test aims to determine the model that is most suitable for spatial data analysis. The test statistic is derived using the maximum likelihood ratio test (MLRT) method. The first step is to formulate the hypotheses to be tested and then to define the parameter set under H0, whose components are parameters not influenced by geographical factors, and the parameter set under the population, whose components are parameters influenced by geographical factors. The likelihood ratio is constructed as the ratio of the maximum of the likelihood function under H0 (numerator) to the maximum under the population (denominator). The test statistic V is obtained from this likelihood ratio, and its distribution is then determined. To establish the distribution of V, we first prove that its numerator and its denominator are each chi-square distributed.

The purpose of this study is to obtain a new hypothesis test of model conformity between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and multivariate nonparametric truncated spline regression in general, so that the model most suitable for spatial data analysis can be identified.

2. Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity

Truncated spline nonparametric regression influenced by spatial heterogeneity extends nonparametric regression to spatial data, with parameter estimators that are local to each observation location. The truncated spline approach is used for spatial analysis problems whose regression curve is unknown [1]. The regression model assumes normally distributed errors with mean zero and variance at each location . Location coordinates are an important factor in determining the weights used to estimate the parameters of the model. Given data and relationship between and , it is assumed to follow the multivariate nonparametric regression model as follows:

is the response variable and is the unknown regression curve, which is assumed to be additive and is approximated by a truncated spline function. Mathematically, the relation between the response variable and the predictor variables at the i-th location for the multivariate nonparametric truncated spline regression model can be expressed as follows [2]: with truncated function: Equation (2) is a multivariate nonparametric truncated spline regression model of degree m over n areas. The components in (2) are described as follows:

is a response variable at i-th location, where .

is a p-th predictor variable at i-th location with .

is an h-th knot point in p-th predictor variable component with .

is a polynomial component parameter of a multivariate nonparametric truncated spline regression. is a k-th parameter from p-th predictor variable at i-th location. is a truncated component from multivariate nonparametric truncated spline regression. is an -th parameter in h-th knot point and p-th predictor variable at i-th location.
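The truncated component described above can be illustrated numerically. The following is a minimal sketch, assuming the usual truncated power form, in which the term for a knot K equals (x - K)^m when x >= K and 0 otherwise. The function names `truncated_power` and `spline_basis`, the knot values, and the degree are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def truncated_power(x, knot, degree):
    """Return (x - knot)_+^degree: (x - knot)**degree where x >= knot, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= knot, (x - knot) ** degree, 0.0)


def spline_basis(x, knots, degree):
    """Basis for one predictor: polynomial terms x^1..x^degree, then one
    truncated term per knot. The intercept is left out on purpose so that
    several predictors can share a single constant column."""
    x = np.asarray(x, dtype=float)
    poly = [x ** k for k in range(1, degree + 1)]
    trunc = [truncated_power(x, K, degree) for K in knots]
    return np.column_stack(poly + trunc)


# Hypothetical predictor values and knots, linear spline (degree 1).
x = np.array([0.0, 1.0, 2.0, 3.0])
basis = spline_basis(x, knots=[1.0, 2.5], degree=1)
print(basis)
```

Each row holds the basis values for one observation; stacking such blocks for every predictor, together with a constant column, gives a design matrix of the kind used in the additive model.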

Multivariate nonparametric truncated spline regression in (2) is described as follows: Equation (4) can also be expressed as follows: Thus (5) can be expressed by where the vector contains the truncated spline function with geographical weighting, sized ; the response variable and error, respectively, are given by vectors as follows: Vectors and are, respectively, sized .

Meanwhile matrices and are, respectively, given by Vectors and are, respectively, given by Matrix is sized ; matrix contains the predictor variables of the truncated function and is sized . Vector is a parameter vector sized . Vector is a parameter vector of the truncated function sized . The estimators , , and are given in Theorem 1 and Corollary 2 [2].

Theorem 1. If the regression model (2) has normally distributed errors with zero mean and variance , then the maximum likelihood estimator (MLE) yields the estimators and as follows: where

Corollary 2. If and are given by Theorem 1, then the estimator for the regression curve is given bywhere

The estimator of the regression curve contains the polynomial components represented by matrix and the truncated components represented by matrix [3]. If the matrix , then the estimator of the multivariable nonparametric regression curve in the Geographically Weighted Regression (GWR) model with truncated spline approach, , reduces to the estimator of the polynomial parametric regression curve in the GWR model. Furthermore, if and matrix contains a linear function, the estimator of the multivariable spline nonparametric regression curve in the GWR model, , reduces to the estimator of the linear parametric regression curve in the GWR model, i.e., multiple linear regression in the GWR model, as developed by many researchers such as Brunsdon and Fotheringham [4], Fotheringham, Brunsdon, and Charlton (2003), Demsar, Fotheringham, and Charlton [5], Yan Li, Yan Jiao, and Joan A. Browder [6], Shan-shan Wu, Hao Yang, Fei Guo, and Rui-Ming Han [7], and Benassi and Naccarato [8].
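To make the idea of location-specific estimation concrete, the following is a minimal sketch of a locally weighted least squares fit of the kind used in GWR. It assumes the standard GWR form, in which the estimate at location i solves (X'W_i X) b = X'W_i y, and a Gaussian kernel exp(-0.5 (d/b)^2) for the weights. The bandwidth, the simulated coordinates, and the function names are hypothetical; the estimator formulas of Theorem 1 are not reproduced here.

```python
import numpy as np


def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all locations relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)  # Euclidean distances
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def local_estimate(X, y, coords, i, bandwidth):
    """Weighted least squares at location i: solve (X'WX) b = X'Wy."""
    w = gaussian_weights(coords, i, bandwidth)
    XtW = X.T * w  # same as X.T @ diag(w), without building the diagonal
    return np.linalg.solve(XtW @ X, XtW @ y)


rng = np.random.default_rng(0)
n = 30
coords = rng.uniform(0, 10, size=(n, 2))  # hypothetical (u, v) coordinates
x = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x])      # intercept plus one predictor
y = 2.0 + 3.0 * x                         # noiseless, spatially constant

beta_0 = local_estimate(X, y, coords, i=0, bandwidth=2.0)
print(beta_0)
```

In this toy example the true coefficients do not vary over space, so every location returns the same estimate; with real data the estimates differ by location, and in the spline setting X would hold the polynomial and truncated columns.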

This study continues the previous research [2]. It derives the test statistic to be used in modeling truncated spline nonparametric regression influenced by spatial heterogeneity and then establishes the distribution of the test statistic and the rejection area.

3. Method

The hypothesis test for model conformity between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and nonparametric truncated spline regression is derived in the following steps.

Step 1. Formulating hypothetical model:: and ,  ; ; ; : at least, there is one of or ,; ; ; .

Step 2. Defining the set of parameters under population .

Step 3. Determining estimators and which are parameters in the space under population .

Step 4. Obtaining maximum likelihood function under population .

Step 5. Defining parameter space under H0, i.e., .

Step 6. Determining estimators and which are parameters under H0.

Step 7. Obtaining maximum likelihood function under space H0.

Step 8. Obtaining likelihood ratio .

Step 9. Obtaining test statistic from model conformity testing.

Step 10. Specifying the distribution of numerator from test statistic .

Step 11. Specifying the distribution of denominator from test statistic .

Step 12. Specifying the distribution of test statistic .

Step 13. Deciding the rejection area of and writing the conclusion.
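Steps 1 to 9 follow the usual likelihood ratio construction. As a sketch only, and assuming notation that is not taken from the text (coefficients $\beta_{pk}$ and $\delta_{p,m+h}$, location coordinates $(u_i, v_i)$, $\omega$ for the parameter space under H0, $\Omega$ for the parameter space under the population, and $\Lambda_0$ for the cutoff), the hypotheses and the ratio could be written as:

```latex
\begin{aligned}
H_0 &: \beta_{pk}(u_i, v_i) = \beta_{pk} \ \text{and}\
       \delta_{p,m+h}(u_i, v_i) = \delta_{p,m+h}
       \quad \text{for all } p, k, h, i, \\
H_1 &: \text{at least one } \beta_{pk}(u_i, v_i) \neq \beta_{pk}
       \ \text{or}\ \delta_{p,m+h}(u_i, v_i) \neq \delta_{p,m+h}, \\
\Lambda &= \frac{\max_{\omega} L(\omega)}{\max_{\Omega} L(\Omega)}, \qquad
\text{reject } H_0 \text{ when } \Lambda < \Lambda_0
\ \text{for a constant } 0 < \Lambda_0 < 1 .
\end{aligned}
```

Because the space under H0 is contained in the space under the population, the ratio lies between 0 and 1, and small values count as evidence against H0.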

4. Parameter Estimation under Space H0 and Space Population in the Model

A hypothesis testing of model conformity for nonparametric spline regression with spatial heterogeneity was designed by using hypothesis formulation:: and ,: or .

This hypothesis test is derived using the maximum likelihood ratio test method by defining the parameter spaces under H0 and under population . The parameter space under H0 is given by where . The parameter space under the population is given by Obtaining the test statistic for the hypothesis above requires the following lemmas.

Lemma 3. If is a parameter under population from nonparametric spline regression with spatial heterogeneity (2), then estimator is given by

Proof. To obtain estimator we form the likelihood function under the population parameter space . Since has a normal distribution with mean and variance , the probability functions are given by The likelihood function obtained is as follows: Equation (20) in matrix form is Estimator is obtained on the basis of the following derivative results: Then the following is obtained: Therefore,

Furthermore, estimator is shown in Lemma 4.

Lemma 4. If is a parameter in the space under population from nonparametric spline regression with spatial heterogeneity (2), then estimator , which is obtained from the likelihood function: is given by

Proof. Estimator is obtained using likelihood function:The ln likelihood function is given byFurthermore, estimator is obtained on the basis of the following derivative results:Then the following was obtained:Therefore,

Based on estimators and which are given by Lemmas 3 and 4 the following is obtained:

Lemma 5. If and are parameters under from multivariate nonparametric truncated spline influenced by spatial heterogeneity model (2), then estimator for is given byand estimator for is given by

Proof. To obtain estimators and we form likelihood function under parameter space H0. Therefore has normal distribution with meanand variance ; then probability functions are given byThe following likelihood functions were obtained:The equation is in the form of a matrixEstimators and are obtained:

Based on Lemma 5, the maximum likelihood function is obtained as follows: and are the parameter estimators under H0 from multivariate nonparametric regression with truncated spline approach.

5. Test Statistic for Truncated Spline Nonparametric Regression with Spatial Heterogeneity

The test statistic for the model conformity hypothesis test can be obtained by using Lemmas 3, 4, and 5. In the next step, we show the likelihood ratio for test statistic presented in Lemma 6.

Lemma 6. If and , respectively, are given by (32) and (40), then the likelihood ratio is given bywhere

Proof. Based on Lemmas 3, 4, and 5, and also (32) and (40), the likelihood ratio is obtained:Based on (3) and (5), the likelihood ratio

The test statistic for the model conformity hypothesis is presented in Theorem 7.

Theorem 7. If likelihood ratio is given by Lemma 6, then test statistic for versus in (2) is given by

Proof. Based on Lemma 6, the likelihood ratio is as follows: Based on the MLRT method, H0 is rejected if For a constant , (43) is equivalent to On both sides of the inequality above, each numerator is divided by and each denominator is divided by ; then the following inequality is obtained: Based on (44), the test statistic for H0 versus H1 is given by

Next, the distribution of the test statistic is derived.

The test statistic given in Theorem 7 is developed from the truncated spline approach in the GWR model. It differs from those developed by Leung, Mei, and Zhang [9], Leung, Mei, and Zhang [10], and Mennis and Jordan [11], which use GWR without the truncated spline approach.

6. Distribution of Test Statistic and Critical Area of Hypothesis

To prove the distribution of test statistic , we first prove and . The proofs are presented in Theorems 8 and 9 as follows.

Theorem 8. If is a matrix given by Lemma 6 then statistic is

Proof. To prove this theorem, the following steps are taken.
Matrix is shown to be a symmetric and idempotent matrix as follows: Based on the equation above, it is proved that matrix is symmetric. It is proved that matrix is idempotent. Furthermore, is calculated as follows: Therefore, it is proved that
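The argument in Theorem 8 rests on a standard fact: for a symmetric idempotent matrix A and a normal vector with independent components of common variance, the scaled quadratic form follows a chi-square distribution whose degrees of freedom equal the rank of A, which is also its trace. The sketch below checks these matrix properties numerically. It uses the ordinary least squares hat matrix as a stand-in; it is not the paper's matrix S, and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])

# Hat matrix of an ordinary least squares fit (a stand-in, not the paper's S).
H = X @ np.linalg.solve(X.T @ X, X.T)
A = np.eye(n) - H  # residual-maker matrix

is_symmetric = np.allclose(A, A.T)
is_idempotent = np.allclose(A @ A, A)
dof = np.trace(A)  # for a symmetric idempotent matrix, trace equals rank

print(is_symmetric, is_idempotent, dof, np.linalg.matrix_rank(A))
```

The trace and the rank agree (both equal n minus the number of columns of X), which is the degrees of freedom of the associated chi-square distribution.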

Theorem 9. If is a matrix given by Lemma 6 then statistic is

Proof. Based on (24), we obtain And estimator is obtained: in which Further, the error vector is given as follows: The sum of squared errors (SSE) of the model is obtained by squaring the following error vectors: Furthermore, Since SSE is a quadratic form in a random variable: hence, matrix is positive semidefinite but not idempotent. Next, we obtain Since , hence , and since matrix is not idempotent, the distribution of statistic is For constants k and r, based on (55), we obtain Since is symmetric and positive semidefinite, there is an orthogonal matrix such that is a diagonal matrix in which are the eigenvalues of matrix . Hence in which . The random variables are independent and identically normally distributed; therefore with mean 1 and variance 2; therefore Since , Hence the values of k and r are as follows: and ; as a result, Hence
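One common way to obtain constants such as k and r for a quadratic form in a non-idempotent matrix is to match its first two moments to those of a scaled chi-square variable k times chi-square(r). The sketch below assumes this moment-matching reading of the proof, which gives k = tr(D^2)/tr(D) and r = tr(D)^2/tr(D^2). The matrix D is a randomly generated positive semidefinite matrix, not the paper's matrix, and the function name is hypothetical.

```python
import numpy as np


def scaled_chi2_constants(D):
    """Match mean and variance of z'Dz (z ~ N(0, I)) to k * chi2(r).

    E[z'Dz] = tr(D) and Var[z'Dz] = 2 tr(D^2) for symmetric D, so
    k * r = tr(D) and 2 * k^2 * r = 2 tr(D^2).
    """
    t1 = np.trace(D)
    t2 = np.trace(D @ D)
    return t2 / t1, t1 ** 2 / t2  # (k, r)


rng = np.random.default_rng(2)
B = rng.normal(size=(8, 8))
D = B @ B.T / 8.0  # symmetric positive semidefinite, not idempotent

k, r = scaled_chi2_constants(D)

# Monte Carlo check of the two matched moments.
z = rng.normal(size=(200_000, 8))
q = np.einsum("ij,jk,ik->i", z, D, z)  # z_i' D z_i for every draw
print(k, r, q.mean(), k * r, q.var(), 2 * k**2 * r)
```

The value of r is generally not an integer, which is consistent with the fractional degrees of freedom reported for the denominator in the empirical study.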

Corollary 10. If statistic V is given by Theorem 7, then

Proof. Based on Theorem 8, statistic is obtained:Based on Theorem 9, statistic is obtained:Hence

The critical area for the model conformity hypothesis is given in Lemma 11.

Lemma 11. If given test statistic V is as in Theorem 7, then the critical area for is given byA constant c is obtained according toin which is a determined level of significance and

Proof. Based on Theorem 7, the following relationship is obtained: for a constant According to Corollary 10, statistic is obtained: For a given level of significance , H0 is rejected if
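The decision rule in Lemma 11 compares the statistic with an upper critical value of an F distribution. The following sketch approximates that critical value by simulation with NumPy, which also accepts non-integer degrees of freedom; an exact quantile function such as `scipy.stats.f.ppf` could be used instead. The degrees of freedom, the statistic values, and the function names are hypothetical and do not reproduce the paper's figures.

```python
import numpy as np


def f_critical_mc(alpha, df_num, df_den, n_draws=400_000, seed=0):
    """Approximate the upper-alpha critical value of F(df_num, df_den)
    by the empirical (1 - alpha) quantile of simulated draws."""
    rng = np.random.default_rng(seed)
    draws = rng.f(df_num, df_den, size=n_draws)
    return float(np.quantile(draws, 1.0 - alpha))


def reject_h0(v, alpha, df_num, df_den):
    """Decision rule: reject H0 when the statistic exceeds the critical value."""
    return v > f_critical_mc(alpha, df_num, df_den)


c = f_critical_mc(0.05, df_num=5, df_den=10)
print(c, reject_h0(4.0, 0.05, 5, 10), reject_h0(2.0, 0.05, 5, 10))
```

The simulated quantile carries Monte Carlo error, so it is suitable for illustration but not for reporting a critical value to several decimals.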

Having formulated the hypothesis test, we apply the model conformity test between the truncated spline nonparametric regression model influenced by spatial heterogeneity and the global nonparametric regression model to unemployment rate data from 38 districts/cities in East Java, Indonesia.

7. Empirical Study on Unemployment Rate in East Java, Indonesia

7.1. Description of Research Data

In this study, the nonparametric truncated spline regression model influenced by spatial heterogeneity was applied to Open Unemployment Rate (OUR) data from East Java Province, Indonesia, and to predictor variables suspected to affect it, i.e., population density (X1), percentage of the poor (X2), percentage of population with low education (X3), percentage of population working in the agriculture sector (X4), area of agricultural land (X5), economic growth rate (X6), regional minimum wage (X7), and ratio of the number of large industries to the number of labor force (X8). The amount of data used is 382 from 38 districts/cities and 8 predictor variables. Table 1 describes the research data, covering the OUR and the predictor variables.

Figure 1 shows the spatial distribution of the Open Unemployment Rate in East Java, expressed as the percentage unemployment rate in 2015.

7.2. Spatial Heterogeneity Test

Each region has different characteristics, different parameters, and different functional forms, which indicates a spatial effect. The Breusch-Pagan test is used to assess spatial heterogeneity across locations. Table 2 shows the Breusch-Pagan test results.
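For readers unfamiliar with the Breusch-Pagan test, the following is a minimal sketch of one common variant, the studentized (Koenker) form, in which the statistic is n times the R-squared of an auxiliary regression of the squared least squares residuals on the predictors. The choice of variant, the simulated data, and the function name are assumptions; the text does not state which version was used. Under constant error variance the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of predictors.

```python
import numpy as np


def breusch_pagan_lm(X, y):
    """Studentized (Koenker) Breusch-Pagan statistic: n * R^2 of the
    auxiliary regression of squared OLS residuals on X.
    X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    fitted = X @ gamma
    ss_res = np.sum((e2 - fitted) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    return len(y) * (1.0 - ss_res / ss_tot)


rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(size=n)

y_hetero = 1.0 + 2.0 * x + x * eps  # error spread grows with x
y_homo = 1.0 + 2.0 * x + eps        # constant error spread

lm_hetero = breusch_pagan_lm(X, y_hetero)
lm_homo = breusch_pagan_lm(X, y_homo)
print(lm_hetero, lm_homo)
```

A large statistic, as for the first simulated series, points to non-constant variance; in a spatial application the auxiliary regressors would be the predictors observed at each location.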

Since the spatial effect test indicates spatial heterogeneity, the case can be handled using the point approach. The analysis was therefore performed using the nonparametric truncated spline regression model influenced by spatial heterogeneity.

7.3. Model Conformity Test

Hypotheses for model conformity test between multivariate nonparametric truncated spline regression model influenced by heterogeneity spatial and multivariate nonparametric truncated spline regression model (global) are as follows:: and ,; ; ; : at least, there is one of or ,; ; ;

The test statistic is given by Theorem 7 as follows:

Matrix was constructed by multivariate nonparametric truncated spline. Matrix was constructed by multivariate nonparametric truncated spline regression. Hence, the numerator is obtained: with degree of freedom . Meanwhile, the denominator is obtained: with degree of freedom 27.0762. The test statistic obtained is 2.06; at a significance level of 0.05, H0 is rejected because the statistic exceeds the critical value 1.88. Therefore, there is a significant difference between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and nonparametric truncated spline regression. Because geographical factors influence the model, the appropriate model is multivariate nonparametric truncated spline regression influenced by spatial heterogeneity.

The modeling application used Open Unemployment Rate (OUR) data from 38 districts/cities in East Java. The empirical study showed that the OUR data are subject to a geographical influence, namely spatial heterogeneity, and, based on the model conformity hypothesis test, the appropriate model is the multivariable nonparametric truncated spline regression model influenced by spatial heterogeneity with a Gaussian kernel weighting function. The model produced a coefficient of determination of 80.42%.

8. Conclusion

Multivariate nonparametric regression with truncated spline approach influenced by spatial heterogeneity is given as follows: Based on the results of the discussion and data analysis, the following conclusions can be drawn:

(1) The hypotheses for model conformity between the multivariate nonparametric truncated spline regression model influenced by spatial heterogeneity and nonparametric truncated spline regression (global) are as follows: : and ,; ;; : at least, there is one of or ,; ;; The test statistic derived using the maximum likelihood ratio test (MLRT) is obtained as follows:

(2) The distribution of the test statistic for the multivariate nonparametric truncated spline regression model influenced by spatial heterogeneity is as follows: with level of significance ; therefore H0 is rejected if

Data Availability

The data of 38 districts/cities in East Java used to support the findings of this study have been deposited in https://www.bps.go.id/.

Conflicts of Interest

The authors of this work declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors thank The Ministry of Research, Technology and Higher Education, Republic of Indonesia/Kementerian Riset, Teknologi dan Pendidikan Tinggi Republik Indonesia (Kemenristekdikti RI) for funding this work.