International Journal of Mathematics and Mathematical Sciences
Volume 2016 (2016), Article ID 2601601, 16 pages
http://dx.doi.org/10.1155/2016/2601601
Research Article

Asymptotic Theory in Model Diagnostic for General Multivariate Spatial Regression

1Department of Mathematics, Haluoleo University, Kendari, Indonesia
2Department of Geological Engineering, Haluoleo University, Kendari, Indonesia

Received 25 March 2016; Revised 13 July 2016; Accepted 28 July 2016

Academic Editor: Andrei I. Volodin

Copyright © 2016 Wayan Somayasa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We establish an asymptotic approach for checking the appropriateness of an assumed multivariate spatial regression model by considering the set-indexed partial sums process of the least squares residuals of the vector of observations. In this work, we assume that the components of the observation, whose mean is generated by a certain basis, are correlated. For this reason, more effort is needed in deriving the results. To obtain the limit process we apply the multivariate analog of the well-known Prohorov's theorem. To test the hypothesis we define tests given by Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) functionals of the partial sums processes. The calibration of the probability distribution of the tests is conducted by a residual-based bootstrap resampling technique. We study the finite sample performance of the KS and CvM tests by simulation. The application of the proposed test procedure to real data is also discussed.

1. Introduction

As mentioned in the literature on model checks for multivariate regression, the appropriateness of an assumed model is mostly verified by analyzing the least squares residuals of the observations; see, for example, Zellner [1], Christensen [2], pp. 1–22, Anderson [3], pp. 187–191, and Johnson and Wichern [4], pp. 395–398. A common feature of these works is the comparison between the length of the matrix of the residuals under the null hypothesis and that of the residuals under a proposed alternative.

Instead of considering the residuals directly, MacNeill [5, 6] proposed a method for model checks in univariate polynomial regression based on the partial sums process of the residuals. These popular approaches were generalized to the spatial case by MacNeill and Jandhyala [7] for the ordinary partial sums process and by Xie and MacNeill [8] for the set-indexed partial sums process of the residuals. Bischoff and Somayasa [9] and Somayasa et al. [10] derived the limit process in the spatial case by a geometric method generalizing a univariate approach due to Bischoff [11, 12]. These results can be used to establish asymptotic tests of Cramér-von Mises and Kolmogorov-Smirnov type for model checks and change-point problems. Model checks for univariate regression with random design, using the empirical process of the explanatory variable marked by the residuals, were established in Stute [13] and Stute et al. [14]. In the papers mentioned above the limit processes were explicitly expressed as complicated functions of the univariate Brownian motion (sheet).

The purpose of the present article is to study the application of the set-indexed partial sums technique to simultaneously check the goodness-of-fit of a multivariate spatial linear regression defined on a high-dimensional compact rectangle. In contrast to the normal multivariate model studied in the standard literature, such as in Christensen [2], Anderson [3], and Johnson and Wichern [4], or in the model selection literature, such as in Bedrick and Tsai [15] and Fujikoshi and Satoh [16], in this paper we consider a multivariate regression model in which the components of the mean of the response vector are assumed to lie in different spaces and the underlying distribution model of the vector of random errors is unknown.

To state the problem in more detail, let the number of response components $p \geq 1$ be fixed. A $p$-dimensional random vector is observed independently over an experimental design given by a regular lattice of points in the $d$-dimensional unit cube $I^{d} := [0,1]^{d}$. Let $\mathbf{g}$ be the true but unknown $\mathbb{R}^{p}$-valued regression function on $I^{d}$ which represents the mean function of the observations, the observation and the corresponding mean being attached to each experimental condition of the lattice. Under the null hypothesis $H_{0}$ we assume that $\mathbf{g}$ follows a multivariate linear model. That is, for $\ell = 1, \ldots, p$, the $\ell$th component of the mean is a linear combination of a vector of known regression functions with a vector of unknown parameters, where the regression functions are assumed to be square integrable with respect to the Lebesgue measure on $I^{d}$. The vectors of random errors are mutually independent, are defined on a common probability space, and have mean zero and a common covariance matrix $\Sigma$. Collecting the observations and the random errors into $d$-dimensional arrays with values in $\mathbb{R}^{p}$, under $H_{0}$ the array of observations can be represented as the sum of the array of means and the array of random errors. Under the alternative $H_{1}$ a multivariate nonparametric regression model is assumed, in which the components of $\mathbf{g}$ are only required to be square integrable. By an argument similar to that in Christensen [2] and Johnson and Wichern [4], the $d$-dimensional array of the least squares residuals of the observations is given by a component-wise projection: for $\ell = 1, \ldots, p$, the $\ell$th component of the array of observations is projected onto the orthogonal complement of the subspace spanned by the arrays obtained by evaluating the regression functions of the $\ell$th component on the lattice. Thereby the space of $d$-dimensional arrays is furnished with the usual Euclidean inner product, defined component-wise over the lattice points.
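The component-wise projection above can be illustrated numerically. The following Python fragment is a minimal sketch, restricted to a bivariate response on a two-dimensional lattice; names such as `lattice_points` and `residuals_componentwise` are illustrative and not taken from the paper. It computes the least squares residuals when each response component is allowed to have its own set of regression functions.

```python
import numpy as np

def lattice_points(n1, n2):
    """Regular lattice on the unit square: points (i/n1, j/n2)."""
    t1 = np.arange(1, n1 + 1) / n1
    t2 = np.arange(1, n2 + 1) / n2
    T1, T2 = np.meshgrid(t1, t2, indexing="ij")
    return T1.ravel(), T2.ravel()          # each of length n1*n2

def residuals_componentwise(Y, bases):
    """Least squares residuals, one projection per response component.

    Y     : (N, p) array of observations (N = n1*n2 lattice points).
    bases : list of length p; bases[l] is an (N, d_l) design matrix built
            from the regression functions assumed for component l.
    Returns an (N, p) array of residuals.
    """
    N, p = Y.shape
    R = np.empty_like(Y, dtype=float)
    for l in range(p):
        X = bases[l]
        # project Y[:, l] onto the column space of X via least squares
        beta_hat, *_ = np.linalg.lstsq(X, Y[:, l], rcond=None)
        R[:, l] = Y[:, l] - X @ beta_hat
    return R

# example: first component has a planar trend, second a constant mean
n1 = n2 = 20
t1, t2 = lattice_points(n1, n2)
rng = np.random.default_rng(0)
errors = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=n1 * n2)
Y = np.column_stack([1 + 2 * t1 - t2, 0.5 * np.ones_like(t1)]) + errors
bases = [np.column_stack([np.ones_like(t1), t1, t2]),   # f_1 = (1, t1, t2)
         np.ones((n1 * n2, 1))]                          # f_2 = (1)
R = residuals_componentwise(Y, bases)
```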

Next we define the set-indexed partial sums operator. Let $\mathcal{A}$ be the family of convex subsets of $I^{d}$, and let $\rho$ be the Lebesgue pseudometric on $\mathcal{A}$, given by the Lebesgue measure of the symmetric difference of two sets. Let $C(\mathcal{A})$ be the set of real-valued functions on $\mathcal{A}$ that are continuous with respect to $\rho$. We embed the array of residuals into a $p$-dimensional stochastic process indexed by $\mathcal{A}$ by applying the component-wise set-indexed partial sums operator: for any $A \in \mathcal{A}$, each component of the process is the suitably scaled sum of the corresponding residuals, weighted by the portion of each lattice cell contained in $A$. Let us call this process the $p$-dimensional set-indexed least squares residual partial sums process. The product space $C(\mathcal{A})^{p}$ is furnished with the uniform topology induced by the metric given by the maximum over the components of the supremum distance.
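For index sets of the special form $[0, a_{1}] \times [0, a_{2}]$ with lattice-valued corners, the set-indexed partial sums reduce to cumulative sums of the residual array. The following sketch, restricted to $d = 2$ rectangles and using the scaling $1/\sqrt{n_{1}n_{2}}$ (an assumption of this illustration rather than a formula quoted from the paper), evaluates the process of the previous sketch on that subfamily.

```python
import numpy as np

def partial_sums_rectangles(R, n1, n2):
    """Set-indexed partial sums of the residual array, evaluated on the
    subfamily of rectangles [0, i/n1] x [0, j/n2] (a tractable subclass of
    the convex index sets A), scaled by 1/sqrt(n1*n2).

    R : (n1*n2, p) residual array ordered lattice-row-wise (index = i*n2 + j).
    Returns an (n1+1, n2+1, p) array S with S[i, j] equal to the scaled sum
    of the residuals whose design points lie in [0, i/n1] x [0, j/n2].
    """
    p = R.shape[1]
    cube = R.reshape(n1, n2, p)
    S = np.zeros((n1 + 1, n2 + 1, p))
    S[1:, 1:, :] = cube.cumsum(axis=0).cumsum(axis=1) / np.sqrt(n1 * n2)
    return S
```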

We notice that, in the works of Bischoff and Somayasa [9], Bischoff and Gegg [17], and Somayasa and Adhi Wibawa [18], the limit process of the partial sums process of the least squares residuals has been investigated by applying the existing geometric method of Bischoff [11, 12]. However, that method is no longer applicable for deriving the limit process of the residual partial sums when the dimension of the response varies. Therefore, in this work, we adopt the vectorial analog of Prohorov's theorem (see, e.g., the corresponding theorem in Billingsley [19]) to obtain the limit process. For our result we need to extend the ordinary partial sums formula to the $d$-dimensional case defined on $I^{d}$ as follows. For $1 \le m \le d$, let $\mathcal{K}_{m}$ be the set of all $m$-combinations of the set $\{1, \ldots, d\}$. For a chosen value of $m$, we denote the $k$th element of $\mathcal{K}_{m}$ by an $m$-tuple, for $k = 1, \ldots, \binom{d}{m}$, where $\binom{d}{m}$ is the number of elements of $\mathcal{K}_{m}$. For example, let $d = 3$. Then, for $m = 2$, the elements of $\mathcal{K}_{2}$ are $\{1,2\}$, $\{1,3\}$, and $\{2,3\}$. In a similar way, the elements of $\mathcal{K}_{1}$ are $\{1\}$, $\{2\}$, and $\{3\}$, and the single element of $\mathcal{K}_{3}$ is $\{1,2,3\}$. Hence the $d$-dimensional ordinary partial sums operator transforms any $d$-dimensional array into a continuous function on $I^{d}$, obtained by interpolating the cumulative sums of the array between neighboring lattice points, with the interpolation terms organized according to the combinations introduced above. It is clear that the partial sums process of the residuals obtained using (11) is a special case of (8), since for rectangles generated by the lattice the two constructions coincide.
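A direct way to realize this operator numerically is to take cumulative sums of the array and interpolate them multilinearly between lattice points; the $2^{d}$ cell corners that appear in the interpolation correspond to the combinations discussed above. The sketch below is an illustration under these assumptions (the name `partial_sum_function` and the multilinear form of the interpolation match the verbal description rather than transcribing the paper's formula).

```python
import numpy as np
from itertools import product

def partial_sum_function(X):
    """d-dimensional ordinary partial sums operator (a sketch).

    X : d-dimensional array of shape (n_1, ..., n_d).
    Returns a function on [0,1]^d obtained by multilinear interpolation of the
    cumulative sums of X, so that at the lattice point (j_1/n_1, ..., j_d/n_d)
    it equals the cumulative sum of X up to index (j_1, ..., j_d).
    """
    d = X.ndim
    n = X.shape
    # cumulative sums, padded with zeros at index 0 in every direction
    S = X.astype(float).copy()
    for axis in range(d):
        S = S.cumsum(axis=axis)
    S = np.pad(S, [(1, 0)] * d)

    def evaluate(t):
        t = np.asarray(t, dtype=float)          # point in [0,1]^d
        u = t * np.array(n)                     # position in lattice units
        lo = np.minimum(np.floor(u).astype(int), np.array(n) - 1)
        frac = u - lo
        # multilinear interpolation over the 2^d corners of the enclosing cell
        value = 0.0
        for corner in product((0, 1), repeat=d):
            idx = tuple(lo + np.array(corner))
            weight = np.prod(np.where(corner, frac, 1.0 - frac))
            value += weight * S[idx]
        return value

    return evaluate
```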

It is worth noting that the extension of the study from the univariate to the multivariate model, and also the expansion of the dimension of the lattice points, is strongly motivated by prediction problems in the mining industry and the geosciences. As an example, Tahir [20] recently presented data provided by PT Antam Tbk (a mining company in Southeast Sulawesi). The data consist of joint measurements of the percentages of several chemical elements and substances, such as Ni, Co, Fe, MgO, and CaO, recorded at every point of a three-dimensional lattice defined over the exploration region of the company. Hence, because of the inherent correlation among the variables, the statistical analysis must be conducted for all variables simultaneously.

Many methods have been proposed in the literature for testing $H_{0}$. Most of them have been derived under normally distributed random errors. The generalized likelihood ratio test, which leads to Wilks's lambda statistic or variants of it, can be found in Zellner [1], Christensen [2], pp. 1–22, Anderson [3], pp. 187–191, and Johnson and Wichern [4], pp. 395–398. Mardia and Goodall [21] derived the maximum likelihood estimation procedure for the parameters of the general normal spatial multivariate model with stationary observations. This approach can be straightforwardly extended to obtain the associated likelihood ratio test for model checks. Unfortunately, in practice, and especially when dealing with mining data, the normal distribution is sometimes found to be unsuitable for describing the distribution of the observations, so that the test procedures mentioned above are no longer applicable. The asymptotic method established in Arnold [22] for multivariate regression can be generalized in such a way that it is valid for the general model considered here. Although this would seem to be a well-known topic in statistics, we could not find literature in which it has been studied.

The rest of the paper is organized as follows. In Section 2 we show that, when $H_{0}$ is true, the sequence of residual partial sums processes converges weakly to a projection of the $p$-dimensional set-indexed Brownian sheet. The limit process is shown to be useful for testing $H_{0}$ asymptotically based on the Kolmogorov-Smirnov (KS-test) and Cramér-von Mises (CvM-test) functionals of the set-indexed $p$-dimensional least squares residual partial sums process, given, respectively, by a supremum-type and an integral-type functional of the process. For both tests, $H_{0}$ is rejected for large values of the KS and CvM statistics, respectively. Under a localized alternative the sequence of processes converges weakly to the above limit process with an additional deterministic trend (see Section 3). In Section 4 we define a consistent estimator for $\Sigma$. In Section 5 we investigate a residual-based bootstrap method for the calibration of the tests. A Monte Carlo simulation studying the finite sample behavior of the KS and CvM tests is reported in Section 6. An application of the test procedure to real data is presented in Section 7. The paper closes in Section 8 with conclusions and some remarks on future research. Auxiliary results needed for deriving the limit process are presented in the appendix. We note that all convergence results derived throughout this paper hold when the sample sizes in all $d$ directions go to infinity simultaneously; otherwise this will be stated explicitly. Convergence in distribution and convergence in probability are denoted in the conventional way.
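As a concrete illustration of the two functionals, the sketch below computes discretized KS- and CvM-type statistics from the rectangle-indexed partial sums array of the earlier sketch. The optional quadratic weighting by an inverse covariance matrix and the Riemann-sum approximation of the integral functional are assumptions of this illustration, not formulas quoted from the paper.

```python
import numpy as np

def ks_cvm_statistics(S, Sigma_inv=None):
    """KS- and CvM-type statistics from a rectangle-indexed partial sums array.

    S : (m1+1, m2+1, p) array of the p-variate partial sums process evaluated
        on rectangles [0, i/m1] x [0, j/m2] (see the earlier sketch).
    Sigma_inv : optional (p, p) matrix defining a quadratic norm of the
        p-variate process; the identity is used when omitted.
    Returns (KS, CvM): KS is the maximum norm over the index rectangles and
    CvM is a Riemann-sum approximation of the integral of the squared norm.
    """
    p = S.shape[2]
    if Sigma_inv is None:
        Sigma_inv = np.eye(p)
    quad = np.einsum("ijk,kl,ijl->ij", S, Sigma_inv, S)   # squared norm field
    ks = np.sqrt(quad.max())
    cvm = quad[1:, 1:].mean()     # average over the m1*m2 index rectangles
    return ks, cvm
```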

2. The Limit of VR under $H_{0}$

Let $B$ be the one-dimensional set-indexed Brownian sheet having sample paths in $C(\mathcal{A})$. We refer the reader to Pyke [23], Bass and Pyke [24], and Alexander and Pyke [25] for the definition and the existence of $B$. Let us consider a subspace of $C(\mathcal{A})$ which is closely related to the space of square integrable functions on $I^{d}$, consisting of the set functions obtained by integrating a square integrable function over the index sets. Under the inner product and the norm inherited from the square integrable functions, the two spaces are isometric. For our result we also need the subspaces associated with the regression functions of the individual components.
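For intuition, the one-dimensional set-indexed Brownian sheet restricted to lower-left rectangles of $[0,1]^{2}$ can be approximated on a grid by double cumulative sums of independent Gaussian increments. The sketch below is such an approximation; the grid construction and scaling are assumptions of the illustration, not formulas from the paper.

```python
import numpy as np

def brownian_sheet_on_grid(m1, m2, rng=None):
    """Approximate a Brownian sheet on [0,1]^2, evaluated on the rectangles
    [0, i/m1] x [0, j/m2]: cell increments are iid N(0, 1/(m1*m2)) and the
    sheet is their double cumulative sum.
    """
    rng = np.random.default_rng(rng)
    increments = rng.normal(scale=1.0 / np.sqrt(m1 * m2), size=(m1, m2))
    B = np.zeros((m1 + 1, m2 + 1))
    B[1:, 1:] = increments.cumsum(axis=0).cumsum(axis=1)
    return B
```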

Now we are ready to state the limit process of the sequence of $p$-dimensional set-indexed residual partial sums processes for the model specified under $H_{0}$.

Theorem 1. For $\ell = 1, \ldots, p$, let an orthonormal basis (ONB) of the subspace associated with the regression functions of the $\ell$th component be given, and assume that the stated regularity conditions hold for every $\ell$. Let $\mathbf{B}$ be the $p$-dimensional set-indexed Brownian sheet whose covariance function is given by the Lebesgue measure of the intersection of the index sets times the $p \times p$ identity matrix. Suppose the basis functions belong to the space of continuous functions on $I^{d}$ (see Definition A.4). Then under $H_{0}$ the residual partial sums process converges weakly to a component-wise projection of $\mathbf{B}$, where each component of the projection is obtained by subtracting from the corresponding component of $\mathbf{B}$ the Riemann-Stieltjes integrals of the basis functions with respect to that component, multiplied by the integrals of the basis functions over the index set. Moreover, the limit process is a centered Gaussian process whose covariance function is determined by $\Sigma$ and the basis functions.

Proof. By applying the linearity of the partial sums operator and Lemma C.2, we have under $H_{0}$ a decomposition of the residual partial sums process into the partial sums process of the random errors minus its projection, as in (21). It can be shown, by extending the uniform central limit theorem studied in Pyke [23], Bass and Pyke [24], and Alexander and Pyke [25] to its vectorial analog, that the first term on the right-hand side of (21) converges weakly to the $p$-dimensional set-indexed Brownian sheet. Therefore we only need to show that the second term converges weakly to a $p$-dimensional centered Gaussian process with the covariance matrix displayed below. By Prohorov's theorem it is sufficient to show that, for any finite collection of convex sets in $\mathcal{A}$ and any real coefficients, the corresponding linear combination of the process converges in distribution, where the left-hand side has a covariance which can be expressed as a sum over the design points. Fix two convex sets from the collection. Then, by a simultaneous application of the definition of the set-indexed partial sums operator (see Lemma C.3), (11), the definition of the Riemann-Stieltjes integral (cf. Stroock [26], pp. 7–17), and the independence of the array of random errors, we obtain an expression which, by recalling Lemma C.3, clearly converges to the corresponding limiting integral. Hence the covariance matrix converges element-wise to a symmetric matrix which can be represented as a Hadamard product of the error covariance matrix $\Sigma$ and a matrix of integrals of the basis functions; the Hadamard product is defined, for example, in Magnus and Neudecker [27], pp. 53-54. Thus we have shown that the covariance converges to the covariance of the claimed limit. Next we observe that, by applying the definition of the partial sums operator and the definition of the Riemann-Stieltjes integral, the linear combination can also be written as a sum of independent, centered random vectors. By considering the stochastic independence of the array of $p$-dimensional random errors, the Lindeberg condition is satisfied, since by the well-known bounded convergence theorem (cf. Athreya and Lahiri [28], pp. 57-58) the corresponding Lindeberg term converges to zero. Therefore, by the multivariate Lindeberg-Lévy central limit theorem studied, for example, in van der Vaart [29], p. 16, the linear combination converges in distribution to a $p$-variate normal random vector with mean zero and the covariance given by (29).
The tightness of the sequence can be shown as follows. By its definition, the process can also be expressed as the image of the partial sums process of the random errors under a fixed mapping. Since the partial sums process of the random errors is tight, by recalling the corresponding lemma in Billingsley [19], pp. 38–40, it is sufficient to show that this mapping is continuous on $C(\mathcal{A})^{p}$. Proposition C.4 finishes the proof.

Corollary 2. By Theorem 1 and the well-known continuous mapping theorem (cf. the corresponding theorem in Billingsley [19]), the distributions of the KS and CvM statistics can be approximated, respectively, by those of the same functionals applied to the limit process. Let $k_{1-\alpha}$ and $c_{1-\alpha}$ be the $(1-\alpha)$-quantiles of the distributions of these limiting functionals, respectively. When the KS test is used, $H_{0}$ is rejected at level $\alpha$ if and only if the observed KS statistic exceeds $k_{1-\alpha}$. Likewise, if the CvM test is used, then $H_{0}$ is rejected at level $\alpha$ if and only if the observed CvM statistic exceeds $c_{1-\alpha}$.
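When the error distribution is approximated by a Gaussian law with known covariance $\Sigma$, the quantiles $k_{1-\alpha}$ and $c_{1-\alpha}$ can be approximated by Monte Carlo simulation of the discretized null process. The sketch below does this for $d = 2$; it is an illustrative alternative to, not a replacement for, the bootstrap calibration of Section 5, and names such as `mc_critical_values` and `bases` are assumptions of the illustration.

```python
import numpy as np

def mc_critical_values(bases, n1, n2, Sigma, alpha=0.05, reps=2000, rng=None):
    """Monte Carlo approximation of the (1 - alpha) quantiles of the KS- and
    CvM-type statistics under H0, assuming Gaussian errors with covariance
    Sigma. bases[l] is the (n1*n2, d_l) design matrix of component l.
    """
    rng = np.random.default_rng(rng)
    p = Sigma.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    ks_vals, cvm_vals = np.empty(reps), np.empty(reps)
    for b in range(reps):
        E = rng.multivariate_normal(np.zeros(p), Sigma, size=n1 * n2)
        # component-wise least squares residuals (the mean drops out under H0)
        R = np.column_stack([
            E[:, l] - bases[l] @ np.linalg.lstsq(bases[l], E[:, l], rcond=None)[0]
            for l in range(p)])
        # rectangle-indexed partial sums, scaled by 1/sqrt(n1*n2)
        S = np.zeros((n1 + 1, n2 + 1, p))
        S[1:, 1:, :] = (R.reshape(n1, n2, p).cumsum(0).cumsum(1)
                        / np.sqrt(n1 * n2))
        quad = np.einsum("ijk,kl,ijl->ij", S, Sigma_inv, S)
        ks_vals[b], cvm_vals[b] = np.sqrt(quad.max()), quad[1:, 1:].mean()
    return np.quantile(ks_vals, 1 - alpha), np.quantile(cvm_vals, 1 - alpha)
```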

3. The Limit of VR under the Localized Alternative

The test procedures derived above are consistent in the sense of the corresponding definition in Lehmann and Romano [30]. That is, the probability of rejecting $H_{0}$ under a fixed competing alternative converges to 1. As an immediate consequence, we cannot observe the performance of the tests when the model moves only slightly away from $H_{0}$. Therefore, to be able to investigate the behavior of the tests, we consider a localized model in which the deviation from the hypothesized mean is scaled down with the sample sizes, as specified in (35). When $H_{0}$ is true we get least squares residuals of the same form as those given in Section 1. Therefore, observing Model (35), the test problem is not altered.

In the following theorem, we present the limit process of the $p$-dimensional set-indexed partial sums process of the residuals under the localized alternative associated with Model (35).

Theorem 3. For $\ell = 1, \ldots, p$, let an ONB of the subspace associated with the regression functions of the $\ell$th component be given, satisfying the assumptions of Theorem 1. If the localizing function satisfies the regularity condition of Definition A.3, then, observing (35), we have under the localized alternative that the residual partial sums process converges weakly to the limit process of Theorem 1 shifted by a deterministic trend, where the trend is given by the component-wise projection of the integrated localizing function.

Proof. Considering the linearity of the partial sums operator and Lemma C.2, when the localized alternative is true the residual partial sums process decomposes into a deterministic term coming from the localizing function and the random terms already treated in the proof of Theorem 1. Since the localizing function is sufficiently regular for each component, it can be shown that the deterministic term converges uniformly to its limit. Also, the last two terms on the right-hand side of the preceding decomposition converge in distribution to the limit process by Theorem 1. Thus it remains only to show that the deterministic term converges to the claimed trend. By the definition of the component-wise projection, the deterministic term admits a representation which is obtained directly from the definition of the ordinary partial sums (11) and the definition of the Riemann-Stieltjes integral on $I^{d}$. Since the integrand converges uniformly and the basis functions have bounded variation on $I^{d}$ for every component, the last expression clearly converges component-wise to the trend. We are done.

Corollary 4. By Theorem 3, the power functions of the KS and CvM tests at level $\alpha$ can now be approximated by computing the probabilities that the corresponding functionals of the shifted limit process exceed the critical values $k_{1-\alpha}$ and $c_{1-\alpha}$, respectively, for a fixed localizing function. In Section 6 we investigate the empirical power functions of the KS and CvM tests by simulation.
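A finite sample version of this power computation can be sketched as follows: data are generated under the localized model, the test statistics are computed, and the rejection frequencies against given critical values are recorded. The function below is such a sketch for $d = 2$; the names `empirical_power` and `delta`, and the use of Gaussian errors, are assumptions of the illustration rather than the simulation design of Section 6.

```python
import numpy as np

def empirical_power(delta, bases, n1, n2, Sigma, crit_ks, crit_cvm,
                    reps=1000, rng=None):
    """Empirical power of the KS and CvM tests under the localized model:
    the true mean deviates from the hypothesized model by delta / sqrt(n1*n2),
    where delta is an (n1*n2, p) array of deviations evaluated on the lattice.
    crit_ks and crit_cvm are critical values, e.g. from mc_critical_values.
    """
    rng = np.random.default_rng(rng)
    p = Sigma.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    rej_ks = rej_cvm = 0
    for _ in range(reps):
        E = rng.multivariate_normal(np.zeros(p), Sigma, size=n1 * n2)
        Y = delta / np.sqrt(n1 * n2) + E   # hypothesized mean is absorbed by
                                           # the projection, so it is omitted
        R = np.column_stack([
            Y[:, l] - bases[l] @ np.linalg.lstsq(bases[l], Y[:, l], rcond=None)[0]
            for l in range(p)])
        S = np.zeros((n1 + 1, n2 + 1, p))
        S[1:, 1:, :] = (R.reshape(n1, n2, p).cumsum(0).cumsum(1)
                        / np.sqrt(n1 * n2))
        quad = np.einsum("ijk,kl,ijl->ij", S, Sigma_inv, S)
        rej_ks += np.sqrt(quad.max()) > crit_ks
        rej_cvm += quad[1:, 1:].mean() > crit_cvm
    return rej_ks / reps, rej_cvm / reps
```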

4. Estimating the Population Covariance Matrix

If the covariance matrix $\Sigma$ is unknown, as it usually is, the KS and CvM tests cannot be used directly in practice. What we propose is to employ a consistent estimator of $\Sigma$. We need some further notation for expressing the residuals of the model. For each component $\ell = 1, \ldots, p$, we collect the observations, the evaluated regression functions, and the random errors into column vectors over the lattice points, and we arrange these column vectors into matrices whose $\ell$th columns correspond to the $\ell$th component. Then Model (35) can also be represented in matrix form as in (41).

Associated with the subspace of the $\ell$th component we define the design matrix whose columns are given by the regression functions of that component evaluated at the design points. For the sake of brevity we denote the column space of this design matrix by the $\ell$th model space. We also define the column-wise projection of any matrix onto the product of these model spaces. A reasonable estimator of the covariance matrix is denoted by $\hat{\Sigma}_{n}$ and is defined as the normalized cross-product of the matrix of least squares residuals, where the residual matrix is obtained by the component-wise orthogonal projection onto the orthogonal complement of the product space.
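Concretely, the estimator can be computed as the normalized cross-product of the residual matrix, as in the sketch below; the names and the choice of normalizing by the number of lattice points rather than by a degrees-of-freedom correction are assumptions of the illustration.

```python
import numpy as np

def estimate_sigma(Y, bases):
    """Plug-in estimator of the error covariance matrix: the normalized
    cross-product of the component-wise least squares residual matrix.

    Y     : (N, p) observation matrix (N lattice points, p components).
    bases : list of per-component (N, d_l) design matrices.
    """
    N, p = Y.shape
    R = np.column_stack([
        Y[:, l] - bases[l] @ np.linalg.lstsq(bases[l], Y[:, l], rcond=None)[0]
        for l in range(p)])
    return R.T @ R / N        # (p, p) estimate of Sigma
```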

Zellner [1] and Arnold [22] investigated the consistency of this type of estimator toward $\Sigma$ in the case of the multivariate regression model in which all components share the same design. Some difficulties appear when the situation is extended to the case of component-wise different designs, since it involves the problem of finding the limit of matrices whose entries are given by inner products of two vectors.

Theorem 5. Suppose the localized model (41) is observed. If $H_{0}$ is true, then, under the conditions of Theorem 1, $\hat{\Sigma}_{n}$ converges in probability to $\Sigma$.

Proof. If $H_{0}$ is true, it can be easily shown that the residual matrix decomposes into the error matrix minus its component-wise projections. For technical reasons we assume, without loss of generality, that the design matrix of each component has orthogonal columns. Hence we further get a representation of $\hat{\Sigma}_{n}$ as the sample covariance of the errors plus remainder terms. Since the error vectors are independent and identically distributed with mean zero, the sample covariance converges in probability to $\Sigma$ by the well-known weak law of large numbers. Note that in practice we consider the polynomial regression model. Hence, for every component, the design matrix satisfies the so-called Huber condition (cf. Pruscha [31], pp. 115–117). For this reason, for the remaining terms, we can immediately apply the technique proposed in Arnold [22] to show that they converge in probability to the zero matrix. Therefore, we finally get the component-wise convergence of $\hat{\Sigma}_{n}$ to $\Sigma$.

Remark 6. Since $\hat{\Sigma}_{n}$ converges in probability to $\Sigma$, the population variance-covariance matrix can be replaced directly by the consistent estimator $\hat{\Sigma}_{n}$ without altering the convergence result presented in Theorem 1.

5. Calibration of the Tests

The limits of the test statistics are not distribution-free, and we therefore need to calibrate the distributions of the test statistics. For the calibration we adapt the idea of the residual-based bootstrap for multivariate regression studied in Shao and Tu [32] to approximate the distributions of the KS and CvM statistics.
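The scheme can be sketched as follows, under stated assumptions: the residual vectors are centered, rows are resampled with replacement, bootstrap observations are rebuilt from the fitted null model, and the test statistics are recomputed to obtain bootstrap critical values. The exact resampling and studentization steps follow Shao and Tu [32] and the description in the text; the function and variable names below, as well as the restriction to $d = 2$ and the studentization by the bootstrap covariance estimate, are assumptions of this illustration.

```python
import numpy as np

def bootstrap_critical_values(Y, bases, n1, n2, alpha=0.05, B=1000, rng=None):
    """Residual-based bootstrap approximation of the critical values of the
    KS- and CvM-type statistics (a sketch of the scheme described in the text).
    Y : (N, p) observation matrix with N = n1*n2 lattice points.
    """
    rng = np.random.default_rng(rng)
    N, p = Y.shape

    def residuals(Z):
        return np.column_stack([
            Z[:, l] - bases[l] @ np.linalg.lstsq(bases[l], Z[:, l], rcond=None)[0]
            for l in range(p)])

    def statistics(R, Sigma_inv):
        S = np.zeros((n1 + 1, n2 + 1, p))
        S[1:, 1:, :] = R.reshape(n1, n2, p).cumsum(0).cumsum(1) / np.sqrt(N)
        quad = np.einsum("ijk,kl,ijl->ij", S, Sigma_inv, S)
        return np.sqrt(quad.max()), quad[1:, 1:].mean()

    R = residuals(Y)
    R_centered = R - R.mean(axis=0)          # centered residual vectors
    fitted = Y - R                           # fitted values under H0
    ks_star, cvm_star = np.empty(B), np.empty(B)
    for b in range(B):
        idx = rng.integers(0, N, size=N)     # resample residual vectors
        Y_star = fitted + R_centered[idx]
        R_star = residuals(Y_star)
        Sigma_star_inv = np.linalg.inv(R_star.T @ R_star / N)
        ks_star[b], cvm_star[b] = statistics(R_star, Sigma_star_inv)
    return np.quantile(ks_star, 1 - alpha), np.quantile(cvm_star, 1 - alpha)
```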

For fixed sample sizes, let the empirical distribution function of the vectors of least squares residuals, centered at the zero vector, be given. Let