Research Article | Open Access
Wayan Somayasa, "Accessing the Power of Tests Based on Set-Indexed Partial Sums of Multivariate Regression Residuals", Journal of Applied Mathematics, vol. 2018, Article ID 2071861, 13 pages, 2018. https://doi.org/10.1155/2018/2071861
Accessing the Power of Tests Based on Set-Indexed Partial Sums of Multivariate Regression Residuals
This paper establishes an approximation method for the limiting power functions of tests based on Kolmogorov-Smirnov and Cramér-von Mises functionals of set-indexed partial sums of multivariate regression residuals. The limiting powers appear as vectorial boundary crossing probabilities. Their upper and lower bounds are derived by extending existing results for shifted univariate Gaussian processes documented in the literature. The application of the multivariate Cameron-Martin translation formula on the space of high-dimensional set-indexed continuous functions is demonstrated. The rate of decay of the power function to a preassigned value is also studied. We mainly consider the trend plus signal model, including the multivariate set-indexed Brownian sheet and pillow. The simulation shows that the approach is useful for analyzing the performance of the test.
Investigating the partial sums of least squares residuals has been shown to be a reasonable and powerful tool for testing the adequacy of an assumed multivariate regression model; see Somayasa et al. [1–4]. The development of the technique was motivated by works aimed mainly at detecting changes in parameters and at detecting the existence of boundaries in univariate spatial regression; see [5–8]. The rejection region is constructed based on either Kolmogorov-Smirnov (KS) or Cramér-von Mises (CvM) functionals of the processes. It was shown in the literature cited above that the limiting power function of the test appears as a type of boundary crossing probability involving a shifted multidimensional Gaussian process.
To describe the objective of this paper in more detail, we briefly review how this kind of probability appears. Let be the dimensional set-indexed Brownian sheet defined on a probability space , say with sample paths in and the control measure , where is a probability measure on , , and , for . We refer the reader to [9, 10] for a well-documented notion of . In the literature on Gaussian processes, is frequently called dimensional Gaussian white noise having as the control measure, cf. , p. 13-14. Let and , where for any , is defined as . Under mild conditions, [1–3] showed, after a suitable localization of the regression function, that the sequence of the partial sums of the least squares residuals obtained from the multivariate regression model converges, when , to a dimensional signal plus noise model defined by where means that is positive definite, and for , provided that builds an ONB of in . Thereby and is the space of functions on with bounded variation in the sense of Hardy. It is worth mentioning that the notion of is a direct extension of the definition of formulated in  to higher dimensional space. Here the notation stands for , cf. . Throughout the paper will be denoted by and by , for the sake of brevity. It was established in [1–3] that is a projection of onto the orthogonal complement of , which is a finite dimensional subspace of the so-called reproducing kernel Hilbert space (RKHS) of , denoted by , given by with . In the literature mentioned above the process is called the -dimensional set-indexed residual partial sums limit process with the control measure . Hence, the process itself and the dimensional set-indexed Brownian pillow , with , are special cases of that correspond to and , respectively, with and . The control measure appearing in the process determines the design under which the experiment was constructed; see  for details.
Using the well-known continuous mapping theorem, it was shown that the limiting power functions of size KS and CvM type tests for testing the hypothesis are given, respectively, by the following complicated formulas: where stands for the Euclidean norm, whereas and are constants that satisfy . Because the computation of as well as and the power of the test becomes difficult as the dimension of the experimental region and get large, the practical implementation of the test is restricted. Approximation by Monte Carlo simulation has been proposed in [1–3]. Attempts to establish a concrete computation procedure by generalizing the principal component approach proposed, e.g., by MacNeill [5, 6] and Stute  for some univariate Gaussian processes on a line have led us to incorrect results.
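The Monte Carlo approximation mentioned above can be illustrated with a minimal Python sketch. It is not the procedure of [1–3]: it assumes a univariate Brownian sheet on the unit square with Lebesgue control measure, restricts the index sets to rectangles anchored at the origin, and ignores the projection that defines the residual limit process. The function names and the grid size are hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_brownian_sheet(n, rng):
    """Approximate a univariate Brownian sheet on an n x n grid of [0,1]^2."""
    # Independent cell increments with variance 1/n^2 (Lebesgue control
    # measure is an assumption of this sketch).
    increments = rng.normal(0.0, 1.0 / n, size=(n, n))
    # Summing increments over both axes gives the sheet on the rectangles
    # [0, i/n] x [0, j/n].
    return increments.cumsum(axis=0).cumsum(axis=1)

def ks_cvm_statistics(sheet):
    """Discrete KS (sup) and CvM (mean square) functionals of one path."""
    ks = np.abs(sheet).max()
    cvm = np.mean(sheet ** 2)
    return ks, cvm

def monte_carlo_critical_values(alpha=0.05, n=50, n_rep=2000, seed=0):
    """Estimate the (1 - alpha)-quantiles of both functionals by simulation."""
    rng = np.random.default_rng(seed)
    stats = np.array(
        [ks_cvm_statistics(simulate_brownian_sheet(n, rng)) for _ in range(n_rep)]
    )
    ks_crit, cvm_crit = np.quantile(stats, 1.0 - alpha, axis=0)
    return ks_crit, cvm_crit
```

Calling `monte_carlo_critical_values(alpha=0.05)` returns simulated critical values for this simplified null model. The cost grows with the grid resolution and the number of replications, which is one reason analytical bounds on the power are of interest.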
Since analytical computation of and is impossible, the purpose of the present paper is to establish an approximation procedure for these functions. As suggested in , p. 315, and , p. 423-424, studying the power function is important for evaluating the performance of the test, especially its rate of decay to . Therefore, in this paper we investigate the upper and lower bounds for (6) by building on the results for the univariate Brownian sheet and Brownian pillow presented in Janssen  and Hashorva [18, 19]. Upper and lower bounds for the power function of goodness-of-fit tests involving multiparameter Brownian processes have been studied by Bass . The RKHS of is crucial for our results. By Theorem 4.1 in  (factorization theorem), if there exists a family , such that the covariance function of admits the representation then the corresponding RKHS is given by Furthermore, the inner product and the corresponding norm on are denoted, respectively, by and . For example, the RKHS of is given by with such that .
The rest of the present paper is organized as follows. In Section 2 we derive the upper and lower bounds for the power functions of and tests by applying the Cameron-Martin translation formula of the multivariate process . The rate of decay of the power to is also discussed. An alternative method of obtaining the bounds of the power function is proposed in Section 3. In Section 4 we propose a Neyman-Pearson test, which is a most powerful test. The rate of decay of its power to is also compared with those of the KS and CvM tests. The results are justified by simulation in Section 5. The paper closes with concluding remarks in Section 6.
2. Rate of Decay of the Power of Tests
Our final goal in this section is to obtain an expression for the rate of decay of both and to the preassigned number representing the size of the test. First we derive their upper and lower bounds by generalizing the method proposed in  concerning bounds for the probability of a shifted event; see also Theorem 7.3 in  for comparison. Second, we apply the technique studied in  to get the result. As reported in  and the references cited therein, the upper and lower bounds for the power of a signal detection test were studied by applying the Cameron-Martin density formula for a shifted measure. The rate of decay was obtained by means of the mean value theorem.
Throughout this work let be the probability distributions of and let be a probability measure on , defined by Then the Cameron-Martin density of with respect to for any is given by where is a bilinear form, such that This general formula can be obtained by extending the formula for the univariate model presented either in , Theorem 5.1 of , and  or  to higher dimensional set-indexed Gaussian processes.
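A finite-dimensional analogue may help to fix ideas. For a canonical Gaussian vector Z in Euclidean space and a shift h, the density of the law of Z + h with respect to the law of Z is exp(⟨h, z⟩ − ‖h‖²/2), so the probability of a shifted event can be written as a weighted expectation under the unshifted law. The Python sketch below checks this identity by simulation. It illustrates the translation formula only in this simplified setting, not for the set-indexed process considered here, and all function names are hypothetical.

```python
import numpy as np

def shifted_probability_direct(h, event, n_rep, rng):
    """Estimate P(Z + h in A) by simulating the shifted vector directly."""
    z = rng.standard_normal((n_rep, h.size))
    return event(z + h).mean()

def shifted_probability_cameron_martin(h, event, n_rep, rng):
    """Estimate P(Z + h in A) as E[1_A(Z) * exp(<h, Z> - |h|^2 / 2)]."""
    z = rng.standard_normal((n_rep, h.size))
    # Logarithm of the finite-dimensional Cameron-Martin density at each draw.
    log_density = z @ h - 0.5 * (h @ h)
    return np.mean(event(z) * np.exp(log_density))

def sup_norm_event(x, level=2.0):
    """Indicator of the KS-type event {max_i |x_i| <= level} for each row."""
    return np.abs(x).max(axis=1) <= level
```

With a common shift such as `h = np.array([0.3, -0.2, 0.5])` and a large number of replications, the two estimators agree up to Monte Carlo error.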
The following theorem is already well known in the literature mentioned above; however, the proof is given only for the case of a Gaussian random vector in with zero mean and identity covariance matrix (canonical Gaussian Euclidean random vector); see  and , p. 53. In this paper we restate the theorem for the process on . Although the result for follows directly from that of [11, 21], we nevertheless present the proof in the appendix of this work to show how the inequality for the higher dimensional set-indexed Gaussian process is derived.
Theorem 1 (Li and Kuelbs  and Lifshits ). Let be any subset of and be any constant, such that . Then for any , it holds true that where is the cumulative distribution function of the standard normal distribution.
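For orientation, the shift inequality of Li and Kuelbs is usually stated as follows: if the Gaussian measure of a set A equals Φ(θ), then the measure of A shifted by an element h of the RKHS lies between Φ(θ − ‖h‖) and Φ(θ + ‖h‖). The sketch below evaluates these two bounds numerically. It assumes that Theorem 1 takes this standard form, which cannot be confirmed from the displayed text, and the function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1

def shifted_probability_bounds(prob_a, shift_norm):
    """Bounds on the probability of a shifted set (assumed Li-Kuelbs form).

    prob_a     : probability of the unshifted set, strictly between 0 and 1
    shift_norm : RKHS norm of the shift, nonnegative
    """
    # theta is the constant with Phi(theta) = P(A).
    theta = _STD_NORMAL.inv_cdf(prob_a)
    lower = _STD_NORMAL.cdf(theta - shift_norm)
    upper = _STD_NORMAL.cdf(theta + shift_norm)
    return lower, upper
```

For a set of probability 0.95 and a shift of norm 0.5, `shifted_probability_bounds(0.95, 0.5)` returns an interval that contains 0.95 and shrinks to that value as the norm of the shift tends to zero.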
The following corollary which gives an expression regarding the rate of decay of to , for any and , is an immediate implication of Theorem 1. Rate of decay describes how fast the distance between and vanishes, cf. [17–19].
Corollary 2. Let be an arbitrary subset of and be any constant, such that . Then under the assumption , we have, for any ,
Proof. We apply the technique of proving Lemma 5 of . By (14) presented in Theorem 1 and by using the symmetry of , it holds that for some mean value , where is the probability density function of . Since , then we have Conversely, by the inequality of (14), we can derive the following result: for some mean value . Since , by the preceding result, we get which establishes the proof.
When the model is either , with , or , with , such that , for and , then Hence, when we are dealing with the -dimensional set-indexed Brownian sheet and -dimensional set-indexed Brownian pillow, Inequality (14), respectively, becomes The corresponding rates of decay can be obtained, respectively, as follows:
In light of the preceding results, we can state the upper and lower bounds as well as the rates of decay for the power and , when is given by either or . Let be a subset of , defined by then for , we get Since is the distribution of , then is equivalent to Analogously, let Then . Thus, by considering these two representations, Theorem 1 and Corollary 2 yield the following summary of the bounds for the power of the KS and CvM type tests.
Corollary 3. Suppose that ; then, for , it holds that Furthermore, we have simple formulas for the rate of decay of and to where in the context of model check, the norm of related to the process and is given by
Corollary 3 says that the rate of decay or convergence of the power function to in the case of as well as depends on the norm of the trend. A model with a small-norm trend leads to faster decay. Conversely, a model with a large-norm trend results in slower decay. For both models, the norm can be concretely calculated. It is clear that both tests achieve their sizes as the trends vanish. Indeed, the work of Samorodnitsky  can be incorporated in the estimation of , for any large real number . In Section 5 we demonstrate by simulation the behavior of the power functions of the KS and CvM tests as summarized in Corollary 3 to give an empirical study of their rate of decay.
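The dependence of the bounds on the norm of the trend can be illustrated numerically. The Python sketch below makes several simplifying assumptions that are not taken from the corollary itself: the trend is univariate, the control measure is Lebesgue measure on the unit square so that the squared RKHS norm of the Brownian sheet is the integral of the squared mixed derivative of the trend, the trend vanishes on the coordinate axes, and the power bounds have the form Φ(Φ⁻¹(α) ∓ ‖h‖) implied by the standard Li and Kuelbs shift inequality. The function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()

def sheet_rkhs_norm(h_grid):
    """RKHS norm of a trend sampled on a regular (n+1) x (n+1) grid of [0,1]^2.

    Assumes Lebesgue control measure and a trend vanishing on the axes.
    """
    n = h_grid.shape[0] - 1
    # Increments of the trend over the n x n grid cells (mixed differences).
    cell_increments = np.diff(np.diff(h_grid, axis=0), axis=1)
    # Riemann sum of the squared mixed derivative:
    # (increment / cell area)^2 * cell area = increment^2 * n^2.
    return math.sqrt(float((cell_increments ** 2).sum()) * n * n)

def power_bounds(alpha, trend_norm):
    """Lower bound, upper bound, and a decay bound for the power (assumed form)."""
    z_alpha = _STD_NORMAL.inv_cdf(alpha)
    lower = _STD_NORMAL.cdf(z_alpha - trend_norm)
    upper = _STD_NORMAL.cdf(z_alpha + trend_norm)
    # Mean value theorem: |Phi(z + r) - Phi(z)| <= r * sup(pdf) = r / sqrt(2 pi).
    decay_bound = trend_norm / math.sqrt(2.0 * math.pi)
    return lower, upper, decay_bound
```

For the bilinear trend h(t, s) = c·t·s the mixed derivative is the constant c, so `sheet_rkhs_norm` returns |c|. Feeding this norm into `power_bounds` shows the interval around α widening as |c| grows and collapsing to α as the trend vanishes.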
3. Alternative Approaches
In this section other formulas for the upper and lower bounds of the power of KS and CvM tests involving the -dimensional set-indexed Brownian sheet and pillow models are derived. Our results are obtained by generalizing the approach studied in [18, 19], which confined the investigation to the one-dimensional Kolmogorov type boundary noncrossing probability involving the so-called univariate ordinary Brownian sheet and pillow.
To simplify the notation we restrict attention to the case of a two-dimensional experimental region .
Theorem 4. Let the ONB of be in and let , such that are constant on the boundary of . Then for the model it holds that where where and is the th component of , which is given by with denoting the th element of , say, for . Furthermore, for the model, we have where
Proof. By using the rule for the probability of a complement, we get for the model where by using transformation of variables, it can be further expressed as Next, the Cameron-Martin formula (12) for the -dimensional set-indexed Brownian sheet implies Since means , then under the indicator we get by recalling the integration by parts formula on , cf. [24, 25] and the assumption that is constant throughout the boundary of ; for the model we get Thus, the lower bound in (30) is established. To prove the upper bound, we start with the following inequality: By applying a technique similar to that used in deriving the preceding result and the implication under the indicator we have, by integration by parts, the following inequality: completing the proof for the model. To prove the lower and upper bounds (33) for the model, we start with the equality Next, by integration by parts and the assumption that and are constant on the boundary of , we have under and the fact that Hence, , establishing the lower bound in (33). An argument similar to that used in the case of model can be applied in deriving the upper bound of as follows: establishing the proof.
Now we can derive other formulas for the rate of decay of and to by applying a method similar to that used in deriving the formula in Corollary 3. However, Theorem 4 leads to computationally more complicated results.
Corollary 5. Under the condition of Theorem 4, it holds true that for some mean values In particular, if the mean values and are taken to be the same, then
Proof. From Inequality (30), we have, by applying the mean value theorem, for some mean value lying in the interval Conversely, based on the lower bound formula (30), we get for some mean value within the interval Thus it can be concluded that lies in the following closed interval: In particular, if the mean values and are taken to be the same, then establishing the proof.
Analogously, from (33), we get for some . Conversely, for some . Particularly, for