Abstract and Applied Analysis

Volume 2018, Article ID 9769150, 13 pages

https://doi.org/10.1155/2018/9769150

## A New Method of Hypothesis Test for Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity and Application

^{1}Department of Mathematics, Faculty of Mathematics and Natural Sciences, Mulawarman University, Samarinda, Indonesia

^{2}Department of Statistics, Faculty of Mathematics, Computing and Data Sciences, Sepuluh Nopember Institute of Technology, Surabaya, Indonesia

^{3}Department of Mathematics, Faculty of Mathematics and Natural Sciences, Gadjah Mada University, Yogyakarta, Indonesia

Correspondence should be addressed to I. N. Budiantara; nyomanbudiantara65@gmail.com

Received 19 April 2018; Revised 17 July 2018; Accepted 5 August 2018; Published 12 September 2018

Academic Editor: Giovanni P. Galdi

Copyright © 2018 Sifriyani et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This study developed a new method of hypothesis testing of model conformity between truncated spline nonparametric regression influenced by spatial heterogeneity and truncated spline nonparametric regression. This hypothesis test aims to determine the most appropriate model for the analysis of spatial data. The test statistic for model conformity was constructed from the likelihood ratio of the parameter set under H_{0}, whose components are parameters not influenced by the geographical factor, and the parameter set under the population, whose components are parameters influenced by the geographical factor. We derived the distribution of the test statistic V and verified that its numerator and denominator each follow a chi-square distribution. Because there is a symmetric and idempotent matrix S, the corresponding quadratic form could be proved to be chi-square distributed. The second matrix is positive semidefinite and contains a weighting matrix that takes different values at every location; it is therefore not idempotent. For a nonnegative quadratic form whose matrix is not idempotent and whose random vector is normally distributed, there are constants k and r for which the quadratic form is distributed as a scaled chi-square variable; it was therefore concluded that the test statistic V follows an F distribution. The model is applied to find factors that influence the unemployment rate in 38 areas in Java, Indonesia.

#### 1. Introduction

This study examines, theoretically, multivariate nonparametric regression influenced by spatial heterogeneity with a truncated spline approach. The model extends truncated spline nonparametric regression to take geographic or spatial factors into account. A truncated spline is a function constructed from polynomial components and truncated components, i.e., polynomial pieces joined at knot points, which allows it to follow changes in the behavior of the data. The truncated spline approach addresses a common problem in spatial data modeling: the relationship between the response variable and the predictor variables does not follow a fixed pattern, and the pattern changes in certain subintervals. In the model, the regression coefficient of each predictor variable depends on the location where the data are observed, because environmental and geographic characteristics differ between observation sites; each observation therefore has a different variation (spatial heterogeneity). Spatial data are also one type of dependent data, in which data at one location are influenced by measurements at other locations (spatial dependency).

This study derives the model conformity hypothesis test between multivariable nonparametric regression influenced by spatial heterogeneity with a truncated spline approach and multivariable nonparametric regression in general. This hypothesis test aims to determine the model that is most suitable for spatial data analysis. The test statistic was derived using the maximum likelihood ratio test (MLRT) method. The first step was to formulate the hypothesis to be tested and then define the set of parameters under H_{0}, whose components are parameters not influenced by geographical factors, and the set of parameters under the population, whose components are parameters influenced by geographical factors. The likelihood ratio was constructed as the ratio of the maximum of the likelihood function under H_{0} (numerator) to the maximum under the population (denominator). From the likelihood ratio, the test statistic V was obtained, and its distribution was then determined. To derive the distribution of V, we first proved that its numerator and denominator are each chi-square distributed.

The purpose of this study is to obtain a new method for the determination of hypothesis test of model conformity between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity versus multivariate nonparametric truncated spline regression in general. This hypothesis test aims to determine what model is most suitable for spatial data analysis.

#### 2. Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity

Truncated spline nonparametric regression influenced by spatial heterogeneity is a development of nonparametric regression for spatial data in which the parameter estimators are local to each observation location. The truncated spline approach is used to solve spatial analysis problems whose regression curve is unknown [1]. The regression model assumes normally distributed errors with mean zero and variance $\sigma^2(u_i, v_i)$ at each location $(u_i, v_i)$. Location coordinates are an important factor in determining the weights used to estimate the parameters of the model. Given data $(x_{1i}, x_{2i}, \ldots, x_{li}, y_i)$, the relationship between $(x_{1i}, x_{2i}, \ldots, x_{li})$ and $y_i$ is assumed to follow the multivariate nonparametric regression model

$$y_i = f(x_{1i}, x_{2i}, \ldots, x_{li}) + \varepsilon_i, \quad i = 1, 2, \ldots, n. \tag{1}$$

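Here $y_i$ is the response variable and $f$ is the unknown regression curve, which is assumed to be additive. When $f$ is approximated by a truncated spline function, the relation between the response variable and the predictor variables at the *i*-th location for the multivariate nonparametric truncated spline regression model can be expressed as follows [2]:

$$y_i = \beta_0(u_i, v_i) + \sum_{p=1}^{l} \sum_{k=1}^{m} \beta_{pk}(u_i, v_i)\, x_{pi}^{k} + \sum_{p=1}^{l} \sum_{h=1}^{r} \delta_{p,m+h}(u_i, v_i)\, (x_{pi} - K_{ph})_{+}^{m} + \varepsilon_i \tag{2}$$

with truncated function

$$(x_{pi} - K_{ph})_{+}^{m} = \begin{cases} (x_{pi} - K_{ph})^{m}, & x_{pi} \geq K_{ph}, \\ 0, & x_{pi} < K_{ph}. \end{cases} \tag{3}$$

Equation (2) is a multivariate nonparametric truncated spline regression model of degree *m* with *n* areas. The components in (2) are described as follows:

$y_i$ is the response variable at the *i*-th location, where $i = 1, 2, \ldots, n$.

$x_{pi}$ is the *p*-th predictor variable at the *i*-th location, with $p = 1, 2, \ldots, l$.

$K_{ph}$ is the *h*-th knot point in the *p*-th predictor variable component, with $h = 1, 2, \ldots, r$.

$\beta_{pk}(u_i, v_i)$ is a polynomial component parameter of the multivariate nonparametric truncated spline regression: the *k*-th parameter of the *p*-th predictor variable at the *i*-th location. $(x_{pi} - K_{ph})_{+}^{m}$ is a truncated component of the multivariate nonparametric truncated spline regression, and $\delta_{p,m+h}(u_i, v_i)$ is the $(m+h)$-th parameter, belonging to the *h*-th knot point of the *p*-th predictor variable at the *i*-th location.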
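To make the truncated component concrete, the following minimal sketch builds the design matrix of a truncated spline of a given degree for a single predictor: polynomial columns first, then one truncated power column per knot. The function names `truncated_power` and `spline_design`, the sample values, and the knot positions are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def truncated_power(x, knot, degree):
    """Truncated power function: (x - knot)^degree where x >= knot, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= knot, (x - knot) ** degree, 0.0)

def spline_design(x, knots, degree):
    """Design matrix with columns 1, x, ..., x^degree, then one truncated term per knot."""
    x = np.asarray(x, dtype=float)
    poly = [x ** k for k in range(degree + 1)]
    trunc = [truncated_power(x, K, degree) for K in knots]
    return np.column_stack(poly + trunc)

# Illustrative data: five observations, two knots, linear spline (degree 1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Z = spline_design(x, knots=[1.5, 3.0], degree=1)
```

Each truncated column stays at zero until the predictor passes its knot, which is what lets the fitted curve change slope in a subinterval. With several predictors, one such block of columns would be stacked per predictor.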

The multivariate nonparametric truncated spline regression in (2) is described as follows:

Equation (4) can also be expressed as follows:

Thus (5) can be expressed by

whose vector contains the truncated spline function with geographical weighting sized ; the response variable and error, respectively, are given by vectors as follows:

Vectors and are, respectively, sized .

Meanwhile matrices and are, respectively, given by

Vectors and are, respectively, given by

Matrix is sized ; matrix contains the predictor variables of the truncated function and is sized . Vector is a parameter vector sized . Vector is a parameter vector of the truncated function sized . The estimators , , and are given in full in Theorem 1 and Corollary 2 [2].

Theorem 1. *If the errors of regression model (2) are normally distributed with zero mean and variance , then maximum likelihood estimation (MLE) yields the estimators and as follows,*

*where*

Corollary 2. *If and are given by Theorem 1, then the estimator for the regression curve is given by*

*where*

The estimator of the regression curve contains polynomial components, represented by one matrix, and truncated components, represented by another [3]. If the truncated-component matrix is zero, the estimator of the multivariable nonparametric regression curve in the Geographically Weighted Regression (GWR) model with truncated spline approach reduces to the estimator of a polynomial parametric regression curve in the GWR model. If, in addition, the polynomial-component matrix contains only a linear function, it reduces further to the estimator of the linear parametric regression curve in the GWR model, i.e., multiple linear regression in the GWR model, which has been developed by many researchers such as Brunsdon and Fotheringham [4], Fotheringham, Brunsdon, and Charlton (2003), Demsar, Fotheringham, and Charlton [5], Yan Li, Yan Jiao, and Joan A. Browder [6], Shan-shan Wu, Hao Yang, Fei Guo, and Rui-Ming Han [7], and Benassi and Naccarato [8].

This study continues the previous research [2]. It derives the test statistic to be used in modeling with truncated spline nonparametric regression influenced by spatial heterogeneity, and then the distribution of that test statistic and its rejection region.
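Under normal errors, maximizing a locally weighted likelihood leads to a weighted least squares estimate at each location, as in standard GWR. The sketch below assumes that usual form, $(Z^T W_i Z)^{-1} Z^T W_i y$, with the polynomial and truncated columns stacked in a single matrix `Z`. The Gaussian kernel, the bandwidth value, the simulated data, and all function names are assumptions made for illustration; the exact expressions of Theorem 1 are not reproduced here.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all locations relative to location i (assumed kernel)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def local_fit(Z, y, coords, i, bandwidth):
    """Weighted least squares estimate of the coefficients at location i."""
    w = gaussian_weights(coords, i, bandwidth)
    ZtW = Z.T * w                      # same as Z.T @ diag(w)
    return np.linalg.solve(ZtW @ Z, ZtW @ y)

# Simulated, noise-free data: every local fit should recover the same coefficients.
rng = np.random.default_rng(0)
n = 30
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.uniform(0, 4, size=n)
Z = np.column_stack([np.ones(n), x, np.where(x >= 2.0, x - 2.0, 0.0)])
theta_true = np.array([1.0, 2.0, -3.0])
y = Z @ theta_true
theta_0 = local_fit(Z, y, coords, 0, bandwidth=3.0)
```

When the third (truncated) column is dropped from `Z`, the same routine gives an ordinary linear GWR fit, which mirrors the reduction described in the paragraph above.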

#### 3. Method

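The hypothesis test for model conformity between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and nonparametric truncated spline regression is derived in the following steps.

*Step 1. *Formulating the hypotheses for the polynomial parameters $\beta_{pk}(u_i, v_i)$ and the truncated parameters $\delta_{p,m+h}(u_i, v_i)$ of model (2). $H_0$: $\beta_{pk}(u_i, v_i) = \beta_{pk}$ and $\delta_{p,m+h}(u_i, v_i) = \delta_{p,m+h}$, $p = 1, 2, \ldots, l$; $k = 1, 2, \ldots, m$; $h = 1, 2, \ldots, r$; $i = 1, 2, \ldots, n$. $H_1$: there is at least one $\beta_{pk}(u_i, v_i) \neq \beta_{pk}$ or $\delta_{p,m+h}(u_i, v_i) \neq \delta_{p,m+h}$, $p = 1, 2, \ldots, l$; $k = 1, 2, \ldots, m$; $h = 1, 2, \ldots, r$; $i = 1, 2, \ldots, n$.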

*Step 2. *Defining the set of parameters under population .

*Step 3. *Determining estimators and which are parameters in the space under population .

*Step 4. *Obtaining maximum likelihood function under population .

*Step 5. *Defining parameter space under H_{0}, i.e., .

*Step 6. *Determining estimators and which are parameters under H_{0}.

*Step 7. *Obtaining maximum likelihood function under space H_{0}.

*Step 8. *Obtaining likelihood ratio .

*Step 9. *Obtaining test statistic V from model conformity testing.

*Step 10. *Specifying the distribution of the numerator of test statistic V.

*Step 11. *Specifying the distribution of the denominator of test statistic V.

*Step 12. *Specifying the distribution of test statistic V.

*Step 13. *Deciding the rejection area of H_{0} and writing the conclusion.
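Steps 4, 7, and 8 can be illustrated in the classical setting of a normal linear model with one common error variance. This is a simplifying assumption and not the location-dependent setting of this paper. The symbols $\omega$ and $\Omega$ for the parameter spaces under H_{0} and under the population, and $\mathrm{SSE}_{\omega}$ and $\mathrm{SSE}_{\Omega}$ for the corresponding residual sums of squares, are introduced here for illustration only. In that setting the maximized likelihoods and their ratio take the following form:

```latex
\begin{align*}
\max_{\omega} L &= \left(2\pi\hat{\sigma}^2_{\omega}\right)^{-n/2} e^{-n/2},
  & \hat{\sigma}^2_{\omega} &= \frac{\mathrm{SSE}_{\omega}}{n}, \\
\max_{\Omega} L &= \left(2\pi\hat{\sigma}^2_{\Omega}\right)^{-n/2} e^{-n/2},
  & \hat{\sigma}^2_{\Omega} &= \frac{\mathrm{SSE}_{\Omega}}{n}, \\
\Lambda &= \frac{\max_{\omega} L}{\max_{\Omega} L}
         = \left(\frac{\mathrm{SSE}_{\Omega}}{\mathrm{SSE}_{\omega}}\right)^{n/2}.
\end{align*}
```

Small values of $\Lambda$ correspond to large values of $\mathrm{SSE}_{\omega}/\mathrm{SSE}_{\Omega}$, which is why a test statistic built from a ratio of residual quadratic forms arises in Step 9.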

#### 4. Parameter Estimation under Space H_{0} and Space Population in the Model

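A hypothesis test of model conformity for nonparametric spline regression with spatial heterogeneity was designed using the following hypotheses for the polynomial parameters $\beta_{pk}(u_i, v_i)$ and the truncated parameters $\delta_{p,m+h}(u_i, v_i)$ of model (2). $H_0$: $\beta_{pk}(u_i, v_i) = \beta_{pk}$ and $\delta_{p,m+h}(u_i, v_i) = \delta_{p,m+h}$; $H_1$: $\beta_{pk}(u_i, v_i) \neq \beta_{pk}$ or $\delta_{p,m+h}(u_i, v_i) \neq \delta_{p,m+h}$.

This hypothesis test was derived using the maximum likelihood ratio test method by defining the parameter spaces under H_{0} and under population . The parameter space under H_{0} is given by

where . The parameter space under the population is given by

Obtaining the test statistic for the hypothesis above requires the following lemmas.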

Lemma 3. *If is a parameter under population from nonparametric spline regression with spatial heterogeneity (2), then estimator is given by*

*Proof. *To obtain estimator we form the likelihood function under population parameter space . Because has a normal distribution with mean

and variance , the probability functions are given by

The likelihood function obtained is as follows:

Equation (20) in matrix form is

Estimator is obtained on the basis of the following derivative results:

Then the following is obtained:

Therefore,

Furthermore, estimator is shown in Lemma 4.

Lemma 4. *If is a parameter in the space under population from nonparametric spline regression with spatial heterogeneity (2), then estimator which is obtained from likelihood function*

*is given by*

*Proof. *Estimator is obtained using the likelihood function:

The ln likelihood function is given by

Furthermore, estimator is obtained on the basis of the following derivative results:

Then the following was obtained:

Therefore,

Based on estimators and which are given by Lemmas 3 and 4 the following is obtained:

Lemma 5. *If and are parameters under H_{0} from the multivariate nonparametric truncated spline model influenced by spatial heterogeneity (2), then the estimator for is given by*

*and the estimator for is given by*

*Proof. *To obtain estimators and we form the likelihood function under parameter space H_{0}. Because has a normal distribution with mean

and variance , the probability functions are given by

The following likelihood function was obtained:

In matrix form, the equation is

Estimators and are obtained:
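Under H_{0} the coefficients no longer depend on location, so with normal errors and one common variance the maximum likelihood estimates coincide with ordinary least squares for the coefficients and with the mean squared residual for the variance. The sketch below assumes exactly that simplified setting; the function name `global_mle` and the small data set are hypothetical and only illustrate the computation.

```python
import numpy as np

def global_mle(Z, y):
    """MLE under a normal linear model with one variance: OLS coefficients, SSE / n."""
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ theta
    sigma2 = resid @ resid / len(y)   # divides by n, not n - q (MLE, not unbiased)
    return theta, sigma2

# Illustrative data: intercept plus one linear predictor.
Z = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 8.0])
theta_hat, sigma2_hat = global_mle(Z, y)
```

The variance estimate divides by the sample size rather than by the residual degrees of freedom, which is the form that enters a likelihood ratio.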

Based on Lemma 5, the maximum likelihood function is obtained as follows:

where and are the parameter estimators under H_{0} from multivariate nonparametric regression with truncated spline approach.

#### 5. Test Statistic for Truncated Spline Nonparametric Regression with Spatial Heterogeneity

The test statistic for the model conformity hypothesis test can be obtained by using Lemmas 3, 4, and 5. As the next step, the likelihood ratio underlying the test statistic is presented in Lemma 6.

Lemma 6. *If and , respectively, are given by (32) and (40), then the likelihood ratio is given by*

*where*

*Proof. *Based on Lemmas 3, 4, and 5, and also (32) and (40), the likelihood ratio is obtained:

Based on (3) and (5), the likelihood ratio

The test statistic for the model conformity hypothesis is given in Theorem 7.

Theorem 7. *If likelihood ratio is given by Lemma 6, then the test statistic for H_{0} versus H_{1} in (2) is given by*

*Proof. *Based on Lemma 6, the likelihood ratio is as follows:

Based on the MLRT method, H_{0} is rejected if

For a constant , (43) is equivalent to

On both sides of the inequality above, each numerator is divided by and each denominator is divided by ; the following inequality is then obtained:

Based on (44), the test statistic for H_{0} versus H_{1} is given by
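The following sketch shows one way a ratio of residual quadratic forms of this kind could be computed. It rests on several assumptions that are not confirmed by the text above: the numerator uses the residual-maker matrix of the global (H_{0}) fit, the denominator uses the local residuals of a Gaussian-kernel GWR fit, and each quadratic form is scaled by the trace of its matrix. All function names, the bandwidth, and the simulated data are hypothetical.

```python
import numpy as np

def residual_maker(Z):
    """S = I - Z (Z'Z)^{-1} Z' for the global fit."""
    n = Z.shape[0]
    return np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

def local_hat_matrix(Z, coords, bandwidth):
    """Row i holds z_i' (Z' W_i Z)^{-1} Z' W_i, with assumed Gaussian kernel weights."""
    n = Z.shape[0]
    L = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        ZtW = Z.T * w
        L[i] = Z[i] @ np.linalg.solve(ZtW @ Z, ZtW)
    return L

def conformity_ratio(Z, y, coords, bandwidth):
    """Ratio of trace-scaled residual quadratic forms: global fit over local fit."""
    n = Z.shape[0]
    S = residual_maker(Z)
    R = np.eye(n) - local_hat_matrix(Z, coords, bandwidth)
    D = R.T @ R                        # symmetric, positive semidefinite
    sse_global = y @ S @ y
    sse_local = y @ D @ y
    return (sse_global / np.trace(S)) / (sse_local / np.trace(D))

# Simulated data whose slope drifts with the first coordinate (spatial heterogeneity).
rng = np.random.default_rng(3)
n = 40
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.uniform(0, 4, size=n)
Z = np.column_stack([np.ones(n), x, np.where(x >= 2.0, x - 2.0, 0.0)])
slope = 1.0 + 0.5 * coords[:, 0]
y = 2.0 + slope * x + rng.normal(scale=0.3, size=n)
V = conformity_ratio(Z, y, coords, bandwidth=2.0)
```

A large ratio indicates that the global model leaves much more unexplained variation than the location-specific model, which is the situation in which H_{0} would be doubted.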

Next, the distribution of the test statistic V is derived.

The test statistic given in Theorem 7 is developed from the truncated spline approach in the GWR model. It differs from those developed by Leung, Mei, and Zhang [9], Leung, Mei, and Zhang [10], and Mennis and Jordan [11], which use GWR without the truncated spline approach.

#### 6. Distribution of Test Statistic and Critical Area of Hypothesis

To derive the distribution of test statistic V, we first derive the distributions of its numerator and denominator. The proofs are presented in Theorems 8 and 9 as follows.

Theorem 8. *If is a matrix given by Lemma 6 then statistic is*

*Proof. *To prove this theorem, the following steps are taken.

Matrix S is first shown to be symmetric and idempotent, as follows:

Based on the equation above, it is proved that matrix S is symmetric.

It is proved that matrix S is idempotent. Furthermore, is calculated as follows:

Therefore, it is proved that
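The two properties used in this proof can be checked numerically. The sketch below assumes that the symmetric idempotent matrix is the usual residual-maker matrix of a full-rank design, $I - Z(Z^TZ)^{-1}Z^T$; that specific form, the random design, and the variable names are assumptions for illustration. For such a matrix the trace equals the rank, which supplies the degrees of freedom of the chi-square quadratic form.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 12, 4                                  # observations and design columns (illustrative)
Z = rng.normal(size=(n, q))

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)         # projection onto the column space of Z
S = np.eye(n) - H                             # residual-maker matrix

is_symmetric = np.allclose(S, S.T)
is_idempotent = np.allclose(S @ S, S)
dof = np.trace(S)                             # equals n - q for a full-rank design
```

Because an idempotent symmetric matrix has only the eigenvalues 0 and 1, a quadratic form in standard normal variables with this matrix is a sum of `dof` independent squared normals.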

Theorem 9. *If is a matrix given by Lemma 6 then statistic is*

*Proof. *Based on (24), we obtain

and estimator is obtained:

in which

Further, the error vector is given as follows:

The sum of squared errors (SSE) of the model is obtained by squaring the error vector:

Furthermore,

Since SSE is a quadratic form in a random variable,

matrix is positive semidefinite but not idempotent. Next, we obtain

Since , hence , and since matrix is not idempotent, the distribution of statistic is

For constants *k* and *r*, based on (55), we obtain

Since is symmetric and positive semidefinite, there is an orthogonal matrix ; therefore, is a diagonal matrix in which are eigenvalues of matrix . Hence

in which . Random variables are independent, identically and normally distributed; therefore

with mean 1 and variance 2; therefore

Since ,

Hence the values of *k* and *r* are as follows: and ; as a result,

Hence
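A common way to obtain constants *k* and *r* for a quadratic form whose matrix is not idempotent is to match the first two moments of the quadratic form with those of a scaled chi-square variable. The sketch below assumes this moment-matching construction, in which the mean is the sum of the eigenvalues and the variance is twice the sum of their squares. Whether these are exactly the constants derived in the proof above is an assumption, and the function name and the random test matrix are hypothetical.

```python
import numpy as np

def scaled_chi2_constants(D):
    """Match mean and variance of y'Dy (y standard normal) to those of k * chi2_r."""
    t1 = np.trace(D)            # sum of eigenvalues   -> mean of y'Dy
    t2 = np.trace(D @ D)        # sum of squared eigenvalues -> variance / 2
    k = t2 / t1
    r = t1 ** 2 / t2
    return k, r

# Illustrative positive semidefinite, non-idempotent matrix.
rng = np.random.default_rng(2)
A = rng.normal(size=(8, 8))
D = A.T @ A / 8
k, r = scaled_chi2_constants(D)

eig = np.linalg.eigvalsh(D)
mean_q = eig.sum()              # E[y'Dy]
var_q = 2.0 * (eig ** 2).sum()  # Var[y'Dy]
```

For an idempotent matrix the same formulas give a scale of 1 and degrees of freedom equal to the rank, so the construction agrees with the exact chi-square result of Theorem 8 in that special case.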

Corollary 10. *If statistic V is given by Theorem 7, then *

*Proof. *Based on Theorem 8, statistic is obtained:

Based on Theorem 9, statistic is obtained:

Hence

The critical region for the model conformity hypothesis is given in Lemma 11.

Lemma 11. *If test statistic V is as given in Theorem 7, then the critical area for H_{0} is given by*
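As an illustration of how a critical region for an F-distributed statistic can be used, the sketch below approximates the upper quantile of an F distribution by simulation, building each draw as a ratio of scaled chi-square variables, and rejects when the statistic exceeds it. Rejecting for large values, the degrees of freedom used, and the function names `f_upper_quantile` and `reject_h0` are assumptions for illustration; the exact region stated in Lemma 11 is not reproduced here. The simulation also accepts non-integer degrees of freedom.

```python
import numpy as np

def f_upper_quantile(df_num, df_den, alpha, draws=400_000, seed=0):
    """Monte Carlo estimate of the upper-alpha quantile of F(df_num, df_den)."""
    rng = np.random.default_rng(seed)
    num = rng.chisquare(df_num, size=draws) / df_num
    den = rng.chisquare(df_den, size=draws) / df_den
    return np.quantile(num / den, 1.0 - alpha)

def reject_h0(V, df_num, df_den, alpha=0.05):
    """Assumed decision rule: reject when V exceeds the upper-alpha F quantile."""
    return V > f_upper_quantile(df_num, df_den, alpha)

# Illustrative degrees of freedom.
crit = f_upper_quantile(5, 20, alpha=0.05)
```

A tabulated or library quantile would normally replace the simulation; it is used here only to keep the sketch dependent on NumPy alone.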