Accessing the Power of Tests Based on Set-Indexed Partial Sums of Multivariate Regression Residuals

Somayasa, Wayan

doi:https://doi.org/10.1155/2018/2071861

Journal of Applied Mathematics

Research Article | Open Access

Volume 2018 | Article ID 2071861 | https://doi.org/10.1155/2018/2071861

Academic Editor: Lucas Jodar

Received: 27 Feb 2018

Accepted: 04 Jun 2018

Published: 02 Sept 2018

Abstract

The intention of the present paper is to establish an approximation method for the limiting power functions of tests based on Kolmogorov-Smirnov and Cramér-von Mises functionals of set-indexed partial sums of multivariate regression residuals. The limiting powers appear as vectorial boundary crossing probabilities. Their upper and lower bounds are derived by extending some existing results for shifted univariate Gaussian processes documented in the literature. The application of the multivariate Cameron-Martin translation formula on the space of high-dimensional set-indexed continuous functions is demonstrated. The rate of decay of the power function to a preassigned value is also studied. Our consideration is mainly for the trend plus noise model, including the multivariate set-indexed Brownian sheet and pillow. The simulation shows that the approach is useful for analyzing the performance of the test.

1. Introduction

Investigating the partial sums of least squares residuals has been shown to be a reasonable and powerful tool for testing the adequacy of an assumed multivariate regression model; see Somayasa et al. [1–4]. The technique was motivated by work aimed mainly at detecting changes in parameters and the existence of boundaries in univariate spatial regression; see [5–8]. The rejection region is constructed from either Kolmogorov-Smirnov (KS) or Cramér-von Mises (CvM) functionals of the processes. The works cited above showed that the limiting power function of the test takes the form of a boundary crossing probability involving a shifted multidimensional Gaussian process.

To explain the objective of this paper in more detail, we briefly review how such a probability arises. Let be the dimensional set-indexed Brownian sheet defined on a probability space , say with sample paths in and the control measure , where is a probability measure on , , and , for . We refer the reader to [9, 10] for a well-documented treatment of . In the literature on Gaussian processes, is frequently called the dimensional Gaussian white noise with as the control measure, cf. [11], pp. 13–14. Let and , where for any , is defined as . Under mild conditions, [1–3] showed that, after a suitable localization of the regression function, the sequence of partial sums of the least squares residuals obtained from the multivariate regression model converges, when , to a dimensional signal plus noise model defined by where means that is positive definite, and for , provided that forms an orthonormal basis (ONB) of in . Here and is the space of functions on with bounded variation in the sense of Hardy. It is worth mentioning that the notion of is a direct extension of the definition of formulated in [12] to higher-dimensional spaces. Here the notation stands for , cf. [8]. Throughout the paper, for brevity, will be denoted by and by . It was established in [1–3] that is a projection of onto the orthogonal complement of , which is a finite-dimensional subspace of the so-called reproducing kernel Hilbert space (RKHS) of , denoted by , given by with . In the works mentioned above, the process is called the -dimensional set-indexed residual partial sums limit process with the control measure . Hence, the process itself and the dimensional set-indexed Brownian pillow , with , are special cases of that correspond to and , respectively, with and . The control measure appearing in the process determines the design under which the experiment is conducted; see [4] for details.

By the well-known continuous mapping theorem, the limiting power functions of size KS and CvM type tests for testing the hypothesis are given, respectively, by the following complicated formulas: where stands for the Euclidean norm, whereas and are constants that satisfy . Computing as well as and the power of the test becomes difficult as the dimension of the experimental region and get large, and this restricts the practical implementation of the test. Approximation by Monte Carlo simulation has been proposed in [1–3]. Some attempts to establish a concrete computational procedure by generalizing the principal component approach proposed, e.g., by MacNeill [5, 6] and Stute [13] for some univariate Gaussian processes on the line have led us to incorrect results.

Since an analytical computation of and is impossible, the purpose of the present paper is to establish approximation procedures for these functions. As suggested in [14], p. 315, and [15], pp. 423–424, studying the power function is important for evaluating the performance of a test, especially its rate of decay to . Therefore, in this paper we investigate upper and lower bounds for (6), building on the results for the univariate Brownian sheet and Brownian pillow presented in Janssen and Ünlü [17] and Hashorva [18, 19]. Upper and lower bounds for the power functions of goodness-of-fit tests involving multiparameter Brownian processes have been studied by Bass [20]. The RKHS of is crucial for our results. By Theorem 4.1 in [11] (the factorization theorem), if there exists a family , such that the covariance function of admits the representation then the corresponding RKHS is given by Furthermore, the inner product and the corresponding norm on are denoted, respectively, by and . For example, the RKHS of is given by with such that .
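The following Python sketch makes an RKHS norm concrete. It assumes the standard description of the RKHS of a bivariate set-indexed Brownian sheet with covariance matrix Σ and Lebesgue control measure on [0, 1]². Under that description, the elements are h(A) = ∫_A g dλ and the squared norm is ∫ gᵀ Σ⁻¹ g dλ. This form, the trend density g, the value of Σ, and the helper names `rkhs_norm_sq` and `cosine_density` are illustrative assumptions, not the paper's notation. The integral is approximated by the midpoint rule and compared with its closed form.

```python
import math


def rkhs_norm_sq(g, sigma, m=200):
    """Midpoint-rule value of the integral of g^T Sigma^{-1} g over [0, 1]^2.

    Assumes h(A) = integral of g over A (Lebesgue control measure) and a bivariate
    sheet with covariance matrix sigma (2x2, positive definite).
    """
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 ** 2
    total = 0.0
    for i in range(m):
        for j in range(m):
            g1, g2 = g((i + 0.5) / m, (j + 0.5) / m)
            # quadratic form g^T Sigma^{-1} g via the explicit inverse of a 2x2 matrix
            total += (s22 * g1 * g1 - 2.0 * s12 * g1 * g2 + s11 * g2 * g2) / det
    return total / (m * m)


def cosine_density(u, v, a=1.0, b=2.0):
    """Hypothetical mean-zero trend density g(u, v) = (a*cos(2*pi*u), b*cos(2*pi*v))."""
    return a * math.cos(2 * math.pi * u), b * math.cos(2 * math.pi * v)


rho = 0.5
approx = rkhs_norm_sq(cosine_density, ((1.0, rho), (rho, 1.0)))
exact = (1.0 ** 2 + 2.0 ** 2) / (2 * (1 - rho ** 2))  # closed form (a^2 + b^2) / (2(1 - rho^2))
print(approx, exact)
```

For this particular g the midpoint rule is exact up to rounding. The reason is that the sums of cos and cos² over equally spaced midpoints match the corresponding integrals.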

The rest of the present paper is organized as follows. In Section 2 we derive upper and lower bounds for the power functions of and tests by applying the Cameron-Martin translation formula of the multivariate process . The rate of decay of the power to is also discussed. An alternative method of obtaining bounds for the power function is proposed in Section 3. In Section 4 we propose the Neyman-Pearson test, which is a most powerful test, and compare the rate of decay of its power to with those of the KS and CvM tests. In Section 5 the results are examined by simulation. The paper closes with a concluding remark in Section 6.

2. Rate of Decay of the Power of Tests

Our final goal in this section is to obtain an expression for the rate of decay of both and to the preassigned number representing the size of the test. First, we derive their upper and lower bounds by generalizing the method proposed in [21] for bounding the probability of a shifted event; see also Theorem 7.3 in [11] for comparison. Second, we apply the technique studied in [17] to obtain the rate of decay. The authors of [17], and of the references cited therein, studied upper and lower bounds for the power of signal detection tests by applying the Cameron-Martin density formula for a shifted measure, and obtained the rate of decay by means of the mean value theorem.

Throughout this work, let be the probability distribution of and let be a probability measure on , defined by Then the Cameron-Martin density of with respect to for any is given by where is a bilinear form such that This general formula can be obtained by extending the formula for the univariate model presented in [20], Theorem 5.1 of [11], [17], or [22] to higher-dimensional set-indexed Gaussian processes.
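The density formula can be checked numerically in a finite-dimensional analogue. The Python sketch below assumes X ~ N(0, Σ) in R². In that setting the Cameron-Martin density of the law of X + h with respect to the law of X is exp(hᵀ Σ⁻¹ x − ½ hᵀ Σ⁻¹ h). The sketch estimates P(X + h ∈ A) twice: once by shifting the samples, and once by reweighting unshifted samples with this density. The disc A, the shift h, the matrix Σ, and the name `cameron_martin_check` are hypothetical choices for illustration.

```python
import math
import random


def cameron_martin_check(h, sigma, radius, n_samples=100_000, seed=0):
    """Two estimates of P(X + h in A), X ~ N(0, sigma) in R^2, A = {|x| <= radius}.

    'direct' shifts the samples; 'weighted' keeps them unshifted and multiplies the
    indicator by the Cameron-Martin density exp(h^T Sigma^{-1} x - |h|^2 / 2).
    """
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 ** 2
    w = ((s22 * h[0] - s12 * h[1]) / det, (s11 * h[1] - s12 * h[0]) / det)  # Sigma^{-1} h
    h_norm_sq = w[0] * h[0] + w[1] * h[1]  # squared RKHS norm h^T Sigma^{-1} h
    l11 = math.sqrt(s11)  # Cholesky factor of sigma
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    rng = random.Random(seed)
    direct = weighted = 0.0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (l11 * z1, l21 * z1 + l22 * z2)  # a draw from N(0, sigma)
        if math.hypot(x[0] + h[0], x[1] + h[1]) <= radius:
            direct += 1.0
        if math.hypot(x[0], x[1]) <= radius:
            weighted += math.exp(w[0] * x[0] + w[1] * x[1] - 0.5 * h_norm_sq)
    return direct / n_samples, weighted / n_samples


direct_est, weighted_est = cameron_martin_check((1.0, 0.5), ((1.0, 0.5), (0.5, 1.0)), radius=1.5)
print(direct_est, weighted_est)
```

The two estimates should agree up to Monte Carlo error. In the set-indexed setting, the inner product hᵀ Σ⁻¹ x is replaced by the bilinear form appearing in the density formula.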

The following theorem is well known in the literature mentioned above; however, the proof there is given only for a Gaussian random vector in with zero mean and identity covariance matrix (the canonical Gaussian random vector in Euclidean space); see [21] and [11], p. 53. Here we restate the theorem specifically for the process on . Although the result for follows straightforwardly from [11, 21], we nevertheless present its proof in the Appendix to show how the inequality is derived for higher-dimensional set-indexed Gaussian processes.

Theorem 1 (Li and Kuelbs [21] and Lifshits [11]). Let be any subset of and be any constant, such that . Then for any , it holds true that where is the cumulative distribution function of the standard normal distribution.
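The Python sketch below evaluates bounds of the type in Theorem 1. It assumes that (14) has the standard Li-Kuelbs form Φ(α − ‖h‖) ≤ P(A + h) ≤ Φ(α + ‖h‖) with Φ(α) = P(A). It uses a standard Gaussian measure on R², where the RKHS norm is the Euclidean norm. The bounds are checked by Monte Carlo for a disc A, whose probability 1 − exp(−r²/2) is known in closed form. The disc, the shift, and the helper names `shift_bounds` and `shifted_disc_prob` are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf, inv_cdf, pdf


def shift_bounds(prob_a, h_norm):
    """Bounds Phi(alpha - |h|) <= P(A + h) <= Phi(alpha + |h|), where Phi(alpha) = P(A)."""
    alpha = STD.inv_cdf(prob_a)
    return STD.cdf(alpha - h_norm), STD.cdf(alpha + h_norm)


def shifted_disc_prob(radius, h, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(X in A + h) for A = {|x| <= radius} and X ~ N(0, I_2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # X lies in A + h exactly when X - h lies in A
        hits += math.hypot(x1 - h[0], x2 - h[1]) <= radius
    return hits / n_samples


radius, h = 1.5, (1.0, 0.0)
prob_a = 1.0 - math.exp(-radius ** 2 / 2.0)  # exact P(|X| <= radius) in R^2
lower, upper = shift_bounds(prob_a, math.hypot(*h))
estimate = shifted_disc_prob(radius, h)
print(f"{lower:.4f} <= {estimate:.4f} <= {upper:.4f}")
```

For a half-space {x : x₁ ≤ α} shifted along the first axis, the upper bound is attained exactly. This is why the bounds cannot be improved in general.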

The following corollary, which gives an expression for the rate of decay of to , for any and , is an immediate consequence of Theorem 1. The rate of decay describes how fast the distance between and vanishes; cf. [17–19].

Corollary 2. Let be an arbitrary subset of and be any constant, such that . Then under the assumption , we have, for any ,

Proof. We apply the technique used to prove Lemma 5 of [17]. By (14) in Theorem 1 and the symmetry of , it holds that for some mean value , where is the probability density function of . Since , we have Conversely, by Inequality (14), we can derive the following result: for some mean value . Since , by the preceding result, we get which completes the proof.
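The mean value step in the proof can be illustrated numerically. The bound Φ(α ± ‖h‖) differs from Φ(α) by φ(ξ)‖h‖ for some ξ between α and α ± ‖h‖. Hence the difference quotient tends to φ(α) as ‖h‖ → 0. The Python sketch below, with the hypothetical helper `difference_quotient`, shows this convergence for P(A) = 0.95. It illustrates the mechanism only and does not restate the corollary's exact display.

```python
from statistics import NormalDist

STD = NormalDist()


def difference_quotient(prob_a, h_norm, sign=1):
    """(Phi(alpha + sign*|h|) - Phi(alpha)) / (sign*|h|), where Phi(alpha) = prob_a."""
    alpha = STD.inv_cdf(prob_a)
    step = sign * h_norm
    return (STD.cdf(alpha + step) - prob_a) / step


prob_a = 0.95
target = STD.pdf(STD.inv_cdf(prob_a))  # phi(alpha): the limiting slope
for t in (0.5, 0.1, 0.01, 0.001):
    # by the mean value theorem each quotient equals phi(xi) for some xi near alpha
    print(t, difference_quotient(prob_a, t, 1), difference_quotient(prob_a, t, -1), target)
```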

When the model is either , with , or , with , such that , for and , then Hence, when we are dealing with the -dimensional set-indexed Brownian sheet and the -dimensional set-indexed Brownian pillow, Inequality (14) becomes, respectively, The corresponding rates of decay are, respectively, as follows:

In light of the preceding results, we can state the upper and lower bounds as well as the rates of decay for the powers and , when is given by either or . Let be a subset of , defined by then for , we get Since is the distribution of , is equivalent to Analogously, let Then . Thus, combining these two representations with Theorem 1 and Corollary 2, we obtain the following summary of the bounds for the power of the KS and CvM type tests.

Corollary 3. Suppose that ; then, for , it holds that Furthermore, we have simple formulas for the rate of decay of and to where, in the context of model checking, the norm of related to the process and is given by

Corollary 3 says that the rate of decay, or convergence, of the power function to in the case of as well as depends on the norm of the trend. A trend with a small norm leads to faster decay; conversely, a trend with a large norm results in slower decay. For both models, the norm can be calculated explicitly. Clearly, both tests attain their sizes as the trends vanish. Moreover, the results of Samorodnitsky [23] on tail probabilities of Gaussian extrema can be used to estimate , for any large real number . In Section 5 we use simulation to examine the behavior of the power functions of the KS and CvM tests summarized in Corollary 3, as an empirical study of their rate of decay.
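The bounds can be tabulated directly. The Python sketch below assumes that Corollary 3 takes the form obtained by applying Theorem 1 to the acceptance region of a size-α test, whose probability under the null hypothesis is 1 − α. Under that assumption, the power of the KS or CvM test lies between 1 − Φ(z + ‖h‖) and 1 − Φ(z − ‖h‖), where z = Φ⁻¹(1 − α). `power_bounds` is a hypothetical helper name.

```python
from statistics import NormalDist

STD = NormalDist()


def power_bounds(size, h_norm):
    """Lower and upper bounds for the power of a size-`size` test with P(acceptance) = 1 - size."""
    z = STD.inv_cdf(1.0 - size)
    # P(acceptance region shifted by -h) lies in [Phi(z - |h|), Phi(z + |h|)],
    # so the power 1 - P(shifted acceptance region) is bracketed as follows:
    return 1.0 - STD.cdf(z + h_norm), 1.0 - STD.cdf(z - h_norm)


for h_norm in (0.0, 0.5, 1.0, 2.0, 3.0):
    low, up = power_bounds(0.05, h_norm)
    print(f"|h| = {h_norm:3.1f}  lower = {low:.4f}  upper = {up:.4f}")
```

At ‖h‖ = 0 both bounds equal the size α. This is consistent with the remark that both tests attain their size as the trend vanishes.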

3. Alternative Approaches

In this section, other formulas for the upper and lower bounds of the power of the KS and CvM tests involving the -dimensional set-indexed Brownian sheet and pillow models are derived. Our results are obtained by generalizing the approach of [18, 19], where the investigation was confined to one-dimensional Kolmogorov type boundary noncrossing probabilities involving the so-called univariate ordinary Brownian sheet and pillow.

To simplify the notation, we restrict attention to the case of a two-dimensional experimental region .

Theorem 4. Let the ONB of be in and let , such that are constant on the boundary of . Then for the model it holds that where where and is the th component of , which is given by with denoting the th element of , say, for . Furthermore, for the model, we have where

Proof. By the complement rule, we get for the model where, by a change of variables, it can be further expressed as Next, the Cameron-Martin formula (12) for the -dimensional set-indexed Brownian sheet implies Since means , under the indicator we obtain, by the integration by parts formula on , cf. [24, 25], and the assumption that is constant throughout the boundary of , the following for the model: Thus, the lower bound in (30) is established. To prove the upper bound, we start with the following inequality: By applying a technique similar to that used in deriving the preceding result, together with the implication under the indicator, we have, by integration by parts, the following inequality: completing the proof for the model. To prove the lower and upper bounds (33) for the model, we start with the equality Next, by integration by parts and the assumption that and are constant on the boundary of , we have under and the fact that Hence, , establishing the lower bound in (33). An argument similar to that used for the model can be applied to derive the upper bound of as follows: establishing the proof.

We can now derive other formulas for the rate of decay of and to by applying a method similar to that used in deriving Corollary 3. However, Theorem 4 leads to computationally more complicated results.

Corollary 5. Under the condition of Theorem 4, it holds true that for some mean values In particular, if the mean values and are taken to be the same, then

Proof. From Inequality (30), we have, by applying the mean value theorem, for some mean value lying in the interval Conversely, based on the lower bound formula (30), we get for some mean value within the interval Thus it can be concluded that lies in the following closed interval: In particular, if the mean values and are taken to be the same, then establishing the proof.

Analogously, from (33), we get for some . Conversely, for some . In particular, for , we get Thus, we have proved the following corollary.

Corollary 6. Under the condition of Theorem 4, we have for some and specified above. If and are chosen to be the same, then

4. Comparison to Neyman-Pearson Test

Our aim in this section is to establish a nonrandomized Neyman-Pearson test for the hypothesis defined in the preceding section. It is well known in testing theory that the Neyman-Pearson test is a most powerful (MP) test for simple hypotheses; see, e.g., Theorem 3.2.1 in [15]. If certain conditions are satisfied, the test can be extended to a uniformly most powerful (UMP) test for composite hypotheses. In this section the behavior of the power function, including its rate of decay to , is investigated and compared with those of the KS and CvM type tests studied in the preceding sections.

Let be a linear subspace generated by a set of known orthonormal regression functions including . In this section we consider the hypothesis against instead of against . The former is the common framework of model checking for multivariate regression, in which one tests whether or not while observing ; see [26] for reference. Suppose there exist and , such that . It is enough to consider the simple hypotheses Hence the -dimensional set-indexed partial sums process of the residuals is given by , when is true; otherwise .

The following theorem presents an MP test of size for testing (59). Here we again apply the Cameron-Martin density formula of the shifted measure with respect to , for . Recently, [4] investigated the asymptotic optimality of a test for the mean vector in multivariate regression by means of the Neyman-Pearson test.

Theorem 7. Suppose . The Neyman-Pearson test of size for testing (59) rejects , if and only if Furthermore, suppose is the corresponding power function of the test. Then the value of the power, evaluated at any , is given by and otherwise, .

Proof. Let and be the densities of with respect to under and , respectively. By Theorem 3.2.1 in [15], an MP test of size for testing (59) rejects , if and only if , for a constant such that Since and , by recalling the fact , we get establishing the rejection region of the test. Next, we compute the power function for any . By the definition of and the symmetry of , we have The last formula reduces to , when vanishes. This completes the proof.
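The Python sketch below gives a finite-dimensional analogue. It assumes X ~ N(0, Σ) under the null hypothesis and X ~ N(h, Σ) under the alternative. The likelihood ratio is then the Cameron-Martin density exp(hᵀ Σ⁻¹ x − ½‖h‖²). The size-α Neyman-Pearson test therefore rejects when hᵀ Σ⁻¹ X > ‖h‖ Φ⁻¹(1 − α), and its power is 1 − Φ(Φ⁻¹(1 − α) − ‖h‖). The code compares this closed form with a Monte Carlo estimate. Σ, h, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def np_power_exact(h_norm, size):
    """Closed-form power 1 - Phi(z_{1-size} - |h|) of the Neyman-Pearson test."""
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - size) - h_norm)


def np_power_mc(h, sigma, size, n_samples=100_000, seed=0):
    """Monte Carlo power of: reject when h^T Sigma^{-1} X > |h| z_{1-size}, X ~ N(h, Sigma)."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 ** 2
    w = ((s22 * h[0] - s12 * h[1]) / det, (s11 * h[1] - s12 * h[0]) / det)  # Sigma^{-1} h
    h_norm = math.sqrt(w[0] * h[0] + w[1] * h[1])  # RKHS (Mahalanobis) norm of h
    crit = h_norm * STD.inv_cdf(1.0 - size)
    l11 = math.sqrt(s11)  # Cholesky factor of sigma
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (h[0] + l11 * z1, h[1] + l21 * z1 + l22 * z2)  # a draw from the alternative N(h, Sigma)
        rejections += w[0] * x[0] + w[1] * x[1] > crit
    return rejections / n_samples, h_norm


mc_power, h_norm = np_power_mc((1.0, 0.5), ((1.0, 0.5), (0.5, 1.0)), size=0.05)
exact_power = np_power_exact(h_norm, 0.05)
print(mc_power, exact_power)
```

With ‖h‖ = 0 the closed form reduces to the size α, matching the last step of the proof.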

The test presented in Theorem 7 depends on the choice of specified under . For example, if we consider , for some , then is rejected at level , if This means that the test cannot be extended to a uniformly most powerful (UMP) test for the composite alternative . Nor is it a UMP test for the more specific one-sided alternatives or .

As discussed in the preceding section, we are also interested in the rate of decay of to . To this end, Theorem 7 leads to the following corollary. The proof is omitted, since it uses the same technique as the proof of Corollary 3.

Corollary 8. Let be an element of , such that and . Then for every preassigned , it holds that In the case , we have where .

Corollary 8 states that how fast the power function decays to depends on a value determined by , whose structure is influenced by the type of . For comparison, suppose that the simple hypotheses (59) are tested using the KS or CvM type test. Then, by virtue of Corollary 2, we have Thus, in contrast to Corollary 8, the rate of decay of the KS and CvM type tests does not depend on at all. Consequently, unlike the Neyman-Pearson test, the KS and CvM tests give no indication of whether should be taken larger or smaller than in order to obtain faster or slower decay.

The results of this section become more transparent when we consider the model involving the -dimensional set-indexed Brownian sheet or pillow. For example, suppose we observe the model , for testing (59). Then is rejected at level , if where, for the -dimensional set-indexed Brownian pillow, we have Furthermore, we get, for fixed , where the first inequality follows from Hölder’s inequality, whereas the second follows from the fact that and represent continuous linear transformations on ; therefore they are uniformly bounded, cf. [27], pp. 26–27. Since is continuous on the closed subset , is bounded on . Thus there exists such that is a uniform upper bound for . Clearly, is also a uniform upper bound for as well as .

5. Simulation Study

In this section we investigate the behavior of and with respect to their lower and upper bounds derived in Corollary 3. The simulated model is the trend plus noise process where is the two-dimensional Brownian pillow and Such a model appears as the limit of the two-dimensional set-indexed partial sums processes of the residuals of a bivariate regression model when testing whether a constant model holds. That is, we test the hypothesis that where , with , for . For fixed , the arrays of observations are generated from the model according to an experimental design given by a regular lattice with points on . Let and , for ; then we equivalently have for . Hence, if , the observations are from the model assumed under ; otherwise, they support . In this simulation, the random vector is generated independently from the two-dimensional centered normal distribution with covariance matrix given by so that, after some computation, we have Now, the norm of for the matrix can be computed explicitly as

The simulation results, using a sample of size with runs, are shown in Table 1 and Figure 1 for . The entries of Table 1 are the values of the power functions of the KS and CvM tests together with the associated lower () and upper () bounds, evaluated at each given value of using the formulas in Corollary 3, where in this case The values of never exceed the corresponding powers. Likewise, the values of never fall below those of the corresponding power functions, as predicted by the theory. Figure 1 plots (dot-dash line), (dotted line), (smoothed line), and (dashed line) together in one panel. The curves of the power functions lie within the band formed by the curves of and , as they should.
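The following Python sketch is a simplified, univariate analogue of this experiment. It uses one response variable with unit error variance on an n × n lattice of [0, 1]². Observations are y_ij = g(i/n, j/n)/n + e_ij, with the hypothetical trend g(u, v) = a cos(2πu). The test statistics are the KS and CvM functionals of the residual partial sums (1/n) Σ_{i≤k, j≤l} (y_ij − ȳ). Critical values come from simulation under the null hypothesis. The trend vector has Euclidean norm a/√2 in the error space, and this dominates the RKHS norm of the resulting shift. The shift-inequality bounds 1 − Φ(z ± a/√2) should therefore bracket both powers up to Monte Carlo error. The lattice size, number of runs, trend, and all function names are assumptions, not the paper's settings.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def ks_cvm(y, n):
    """KS and CvM functionals of Z(r, c) = (1/n) * sum_{i<=r, j<=c} (y_ij - mean of y)."""
    mean = sum(map(sum, y)) / (n * n)
    z = [[0.0] * (n + 1) for _ in range(n + 1)]
    ks = cvm = 0.0
    for row in range(1, n + 1):
        for col in range(1, n + 1):
            # two-dimensional cumulative sum of the residuals y_ij - mean
            z[row][col] = ((y[row - 1][col - 1] - mean) / n
                           + z[row - 1][col] + z[row][col - 1] - z[row - 1][col - 1])
            ks = max(ks, abs(z[row][col]))
            cvm += z[row][col] ** 2
    return ks, cvm / (n * n)


def simulate(n, runs, a, rng):
    """(KS, CvM) pairs for y_ij = g(i/n, j/n)/n + e_ij with hypothetical g(u, v) = a*cos(2*pi*u)."""
    trend = [a * math.cos(2 * math.pi * i / n) / n for i in range(1, n + 1)]
    return [ks_cvm([[trend[i] + rng.gauss(0.0, 1.0) for _ in range(n)] for i in range(n)], n)
            for _ in range(runs)]


def empirical_quantile(values, p):
    """Empirical p-quantile, used as a Monte Carlo critical value."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, math.ceil(p * len(ordered)) - 1)]


def power_study(n=12, runs=2000, a=3.0, size=0.05, seed=1):
    """Monte Carlo powers of the KS and CvM tests and the shift-inequality bounds."""
    rng = random.Random(seed)
    null = simulate(n, runs, 0.0, rng)
    alt = simulate(n, runs, a, rng)
    powers = []
    for idx in (0, 1):  # 0: KS statistic, 1: CvM statistic
        crit = empirical_quantile([s[idx] for s in null], 1.0 - size)
        powers.append(sum(s[idx] > crit for s in alt) / runs)
    # Euclidean norm of the trend vector: sum_ij (a*cos(2*pi*i/n)/n)^2 = a^2/2 for n >= 3
    h_norm = a / math.sqrt(2.0)
    z = STD.inv_cdf(1.0 - size)
    return powers[0], powers[1], 1.0 - STD.cdf(z + h_norm), 1.0 - STD.cdf(z - h_norm)


ks_power, cvm_power, lower, upper = power_study()
print(f"KS {ks_power:.3f}  CvM {cvm_power:.3f}  bounds [{lower:.4f}, {upper:.4f}]")
```

Because the norm used here can only overstate the RKHS norm of the shift, the bracket may be wider than in the paper's setting. It remains valid up to the error in the estimated critical values.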

6. Concluding Remark

We have established upper and lower bounds for the boundary crossing probability involving a multivariate trend plus noise model. Our results contribute not only to statistics but also to other disciplines, such as financial mathematics and statistical physics, where such probability models are frequently encountered. It is important to note that the Cameron-Martin translation formula is valid only if the trend function lies in the RKHS of the corresponding Gaussian process. In practice this is not always the case. Therefore, further research is needed to handle such situations.

Appendix

Proof of Theorem 1. Let , for a fixed . Then, by recalling , we have The last equality implies We will show that . For this purpose we use the Cameron-Martin formula (12), the fact that , whenever , and (A.2). Hence, we get Next, by the definition of , the term on the right-hand side of the last inequality is greater than the following one: which is exactly the same as , establishing , where, by definition, We note that the lower bound holds for any and any constant . Hence it also holds for the complement . That is, with . Since and , for any (by the symmetry of ), we get the following: On the other hand, by the equality and the symmetry of , it holds that Thus, we get the upper bound, which completes the proof.

Data Availability

No data were used to support this study.

Conflicts of Interest

The author declares that they have no conflicts of interest.

Acknowledgments

This research was supported by the Ministry of Research, Technology, and Higher Education of the Republic of Indonesia through the KLN Research Award 2018. The author wishes to thank Professor Bischoff for valuable discussion during the preparation of the manuscript.

References

[1] W. Somayasa, G. N. A. Wibawa, and Y. B. Pasolon, “Multidimensional set-indexed partial sums method for checking the appropriateness of a multivariate spatial regression,” Mathematical Models and Methods in Applied Sciences, vol. 9, pp. 700–713, 2015.
[2] W. Somayasa and G. N. Adhi Wibawa, “Asymptotic model-check for multivariate spatial regression with correlated responses,” Far East Journal of Mathematical Sciences, vol. 98, no. 5, pp. 613–639, 2015.
[3] W. Somayasa, G. N. A. Wibawa, L. Hamimu, and L. O. Ngkoimani, “Asymptotic theory in model diagnostic for general multivariate spatial regression,” International Journal of Mathematics and Mathematical Sciences, vol. 2016, Article ID 2601601, 16 pages, 2016.
[4] W. Somayasa and H. Budiman, “Testing the mean in multivariate regression using set-indexed Gaussian white noise,” Statistics and Its Interface, vol. 11, no. 1, pp. 61–77, 2018.
[5] I. B. MacNeill, “Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times,” The Annals of Statistics, vol. 6, no. 2, pp. 422–433, 1978.
[6] I. B. MacNeill, “Limit processes for sequences of partial sums of regression residuals,” The Annals of Probability, vol. 6, no. 4, pp. 695–698, 1978.
[7] I. B. MacNeill and V. K. Jandhyala, “Change-point methods for spatial data,” in Multivariate Environmental Statistics, G. P. Patil and C. R. Rao, Eds., pp. 288–306, Elsevier Science Publishers, 1993.
[8] L. Xie and I. B. MacNeill, “Spatial residual processes and boundary detection,” South African Statistical Journal, vol. 40, no. 1, pp. 33–53, 2006.
[9] K. S. Alexander and R. Pyke, “A uniform central limit theorem for set-indexed partial-sum processes with finite variance,” The Annals of Probability, vol. 14, no. 2, pp. 582–597, 1986.
[10] R. Pyke, “A uniform central limit theorem for partial-sum processes indexed by sets,” in Probability, Statistics and Analysis, vol. 79 of London Math. Soc. Lecture Note Ser., pp. 219–240, Cambridge Univ. Press, Cambridge-New York, 1983.
[11] M. Lifshits, Lectures on Gaussian Processes, SpringerBriefs in Mathematics, Springer, Berlin, Heidelberg, 2012.
[12] J. A. Clarkson and C. R. Adams, “On definitions of bounded variation for functions of two variables,” Transactions of the American Mathematical Society, vol. 35, no. 4, pp. 824–854, 1933.
[13] W. Stute, “Nonparametric model checks for regression,” The Annals of Statistics, vol. 25, no. 2, pp. 613–641, 1997.
[14] R. J. Serfling, Approximation Theorems of Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1980.
[15] E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses, Springer, New York, 3rd edition, 2005.
[16] P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York, NY, USA, 2nd edition, 1999.
[17] A. Janssen and H. Ünlü, “Regions of alternatives with high and low power for goodness-of-fit tests,” Journal of Statistical Planning and Inference, vol. 138, no. 8, pp. 2526–2543, 2008.
[18] E. Hashorva, “Boundary non-crossings of Brownian pillow,” Journal of Theoretical Probability, vol. 23, no. 1, pp. 193–208, 2010.
[19] E. Hashorva and Y. Mishura, “Boundary noncrossings of additive Wiener fields,” Lithuanian Mathematical Journal, vol. 54, no. 3, pp. 277–289, 2014.
[20] R. F. Bass, “Probability estimates for multiparameter Brownian processes,” The Annals of Probability, vol. 16, no. 1, pp. 251–264, 1988.
[21] W. V. Li and J. Kuelbs, “Some shift inequalities for Gaussian measures,” Progress in Probability, vol. 43, pp. 233–243, 1998.
[22] J. A. Wellner, “Gaussian white noise models: some results for monotone functions,” in Crossing Boundaries: Statistical Essays in Honor of Jack Hall, vol. 43 of IMS Lecture Notes Monogr. Ser., pp. 87–104, Inst. Math. Statist., Beachwood, OH, 2003.
[23] G. Samorodnitsky, “Probability tails of Gaussian extrema,” Stochastic Processes and Their Applications, vol. 38, no. 1, pp. 55–84, 1991.
[24] F. Móricz, “Pointwise behavior of double Fourier series of functions of bounded variation,” Monatshefte für Mathematik, vol. 148, pp. 51–59, 2006.
[25] J. Yeh, “Cameron-Martin translation theorems in the Wiener space of functions of two variables,” Transactions of the American Mathematical Society, vol. 107, no. 3, pp. 409–420, 1963.
[26] S. F. Arnold, The Theory of Linear Models and Multivariate Analysis, John Wiley & Sons, New York, NY, USA, 1981.
[27] J. Conway, A Course in Functional Analysis, Graduate Texts in Mathematics, Springer, New York, NY, USA, 1990.

Copyright

Copyright © 2018 Wayan Somayasa. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
