Research Article | Open Access
Distributions Escaping to Infinity and the Limiting Power of the Cliff-Ord Test for Autocorrelation
We consider a family of proper random variables which converges to an improper random variable. The limit in distribution is found and applied to obtain a closed-form expression for the limiting power of the Cliff-Ord test for autocorrelation. The applications include the theory of characteristic functions of proper random variables, the theory of almost periodic functions, and the test for spatial correlation in a linear regression model.
Improper random variables do not satisfy the condition $P(|X|<\infty)=1$; that is, they may take values outside the real line $\mathbb{R}$. They are rarely used on their own, but they can arise as limits of proper random variables. In such cases, we say that a distribution escapes to infinity. The main problem considered in this paper is first illustrated in a simplified setting.
Throughout the paper we denote by $1_A$ the indicator of a set $A$. Let $f$ be the density of a uniform distribution on a bounded segment, and consider the family of densities $f_t(x)=t^{-1}f(x/t)$, where $t>0$. The total mass is constant: $\int f_t(x)\,dx=1$ for any $t>0$. In Bayesian estimation, improper priors are obtained by letting $t\to\infty$. In this case, two effects are at work: the support stretches out indefinitely and the height of the density goes to zero. One might be tempted to think of the limit of $f_t$, as $t\to\infty$, as an infinitesimally thin layer smeared over the whole real line. This notion would be wrong because $\int g(x)f_t(x)\,dx\to0$ for any $g\in C_0(\mathbb{R})$ (the set of continuous functions on $\mathbb{R}$ with bounded support). Thus, if the limit in distribution of $f_t$ exists, it vanishes on all elements of $C_0(\mathbb{R})$. By the definition of the support of a distribution [1, Chapter 1, Section 13], the support of the limit contains no point of $\mathbb{R}$. So instead of saying that the mass spreads over $\mathbb{R}$, it is more correct to say that the mass escapes to infinity.
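As a quick numerical check of this argument, the sketch below assumes, for illustration only, that $f$ is the uniform density on $[-1,1]$ and uses a hypothetical test function `tent` supported on $[-2,2]$. For the stretched densities $f_t(x)=t^{-1}f(x/t)$ the total mass stays equal to 1, while the integral of `tent` against $f_t$ equals $1/t$ once $t\ge2$ and therefore tends to zero.

```python
# Minimal sketch: the uniform density on [-1, 1] and the test function `tent`
# are illustrative choices, not taken from the paper.

def uniform_density(x):
    """Density of the uniform distribution on [-1, 1]."""
    return 0.5 if -1.0 <= x <= 1.0 else 0.0

def stretched(f, t):
    """Return the stretched density x -> f(x / t) / t (same total mass as f)."""
    return lambda x: f(x / t) / t

def tent(x):
    """Hypothetical continuous test function with bounded support [-2, 2]."""
    return max(0.0, 1.0 - abs(x) / 2.0)

def integrate(h, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return sum(h(a + (k + 0.5) * step) for k in range(n)) * step

for t in (1, 10, 100, 1000):
    f_t = stretched(uniform_density, t)
    mass = integrate(f_t, -t, t)                             # stays equal to 1
    pairing = integrate(lambda x: tent(x) * f_t(x), -2, 2)   # equals 1/t for t >= 2
    print(f"t={t:5d}  mass={mass:.4f}  integral of tent*f_t={pairing:.6f}")
```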
In this paper we provide a rigorous framework for treating more complex situations. To illustrate the complexities that arise, consider a standard normal variable $X$ with the density $\varphi(x)=(2\pi)^{-1/2}e^{-x^2/2}$. Let $X_t=tX$, which has the density $\varphi_t(x)=t^{-1}\varphi(x/t)$. Its characteristic function is $Ee^{isX_t}=e^{-t^2s^2/2}$, and its moment generating function is $Ee^{sX_t}=e^{t^2s^2/2}$. None of these expressions is useful for characterizing the limit as $t\to\infty$. Further, let $g$ denote an arbitrary continuous and bounded function on $\mathbb{R}$. In the expression $\int g(x)\varphi_t(x)\,dx$ the height of the density goes to zero, so the integrand converges to zero everywhere. However, the graph of the density stretches out from the origin. Therefore, the best majorant for the integrand is $\sup_{t>0}|g(x)|\varphi_t(x)=\varphi(1)|g(x)|/|x|$, which is generally not integrable. Thus, the dominated convergence theorem cannot be used to obtain convergence in distribution. While the case of a collapsing density ($t\to0$) is easily handled by the existing theory, the case of stretching-out requires new tools, which we develop here.
Problem 1. Describe the limit in distribution when the stretching-out is applied to a density along all or some variables.
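This problem is solved in Theorem 2.2 in Section 2 (case of all variables) and Theorem 2.6 (case of some variables). Their simplest versions already reveal the main ideas. Let $f$ be any summable even function on $\mathbb{R}$ (it may change sign and need not integrate to unity). Let $M(g)$ denote the generalized mean of $g$ over $\mathbb{R}$:
$$M(g)=\lim_{t\to\infty}\frac{1}{2t}\int_{-t}^{t}g(x)\,dx. \tag{1.4}$$
Then for a continuous and bounded function $g$ on $\mathbb{R}$ such that $M(g)$ exists, one has
$$\lim_{t\to\infty}\int g(x)\,t^{-1}f(x/t)\,dx=M(g)\int f(x)\,dx. \tag{1.5}$$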
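The next sketch illustrates the generalized mean and the stretched integrals in one concrete case of our own choosing: $g(x)=\cos^2x$, which has no limit at infinity but has $M(g)=1/2$, and $f$ the standard normal density, for which the stretched integral equals $\tfrac12+\tfrac12e^{-2t^2}$ in closed form. Both quantities approach $1/2=M(g)\int f\,dx$; the truncation width and grid sizes are ad hoc choices.

```python
import math

def generalized_mean(g, t, n=100_000):
    """Average of g over [-t, t] (midpoint rule); tends to M(g) as t grows."""
    step = 2.0 * t / n
    return sum(g(-t + (k + 0.5) * step) for k in range(n)) * step / (2.0 * t)

def stretched_integral(g, f, t, half_width=8.0, n=100_000):
    """Integral of g(x) * f(x / t) / t over the real line.

    After the substitution x = t * y this equals the integral of g(t * y) * f(y),
    which we truncate to |y| <= half_width (adequate for rapidly decaying f).
    """
    step = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        y = -half_width + (k + 0.5) * step
        total += g(t * y) * f(y)
    return total * step

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def g(x):
    # Bounded and continuous, no limit at infinity, generalized mean 1/2.
    return math.cos(x) ** 2

print(generalized_mean(g, 1000.0))
for t in (0.5, 1.0, 3.0):
    exact = 0.5 + 0.5 * math.exp(-2.0 * t * t)   # closed form for this f and g
    print(t, stretched_integral(g, normal_pdf, t), exact)
```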
Now suppose that the function $f$ depends on two variables, $f=f(x_1,x_2)$, and the stretching-out is applied with respect to the first of them. Let $g$ be continuous and bounded on $\mathbb{R}^2$. Denote by $\tilde g(x_2)=M\big(g(\cdot,x_2)\big)$ the result of applying (1.4) to $g$ with respect to the first argument, and let $\tilde f(x_2)=\int f(x_1,x_2)\,dx_1$. In cases when $\tilde f$ is not a density, we call this function a marginal “density.” Then (1.5) can be used to prove
$$\lim_{t\to\infty}\iint g(x_1,x_2)\,t^{-1}f(x_1/t,x_2)\,dx_1\,dx_2=\int\tilde g(x_2)\tilde f(x_2)\,dx_2 \tag{1.7}$$
if $f$ is integrable and spherically symmetric. As we show, the right sides of (1.5) and (1.7) determine distributions supported at infinity.
Theorem 2.2 is followed by two applications. The first generalizes results from [2] on the link between the jumps of the distribution function of a (proper) random variable and its characteristic function. The second concerns the fundamental theorem of H. Bohr on the Parseval identity for almost-periodic functions; see [3]. Theorem 2.6 is applied to the limiting power of the Cliff-Ord test for autocorrelation. To formulate the related problems, we need some notation.
Consider a linear regression model where is an matrix of rank ; a vector and a number are unknown parameters. The matrix is assumed to be a nonnegative function of the parameter , with some . In applications, this parameter characterizes the degree of autocorrelation. Testing for autocorrelation takes the form versus . The case is of special interest for determining the limiting power of tests.
Assuming that is positive definite for , denote and let be the density of . The density of is then given by [4, Equation (D.2)]
Problem 2. Assuming that
describe the limit in distribution of .
Martellosio [4] considered this problem as an intermediate step in a proof, and his answer is reproduced as follows. Let denote the null space of a matrix and denote . Let be the translation by of the null space of . Reference [4] states that in case (1.10) relation (1.11) holds; see page 182. Equation (1.11) arises from confusing stretching-out with collapsing. On page 159, lines 10 and 11, he remarks that (1.11) can be extended to the case . He does not specify the mode of convergence, but, as we argue in Section 2, convergence in distribution is the right one for studying the limiting power of tests for autocorrelation. Unfortunately, (1.11) does not hold under convergence in distribution. We prove this and solve Problem 2 in Theorem 2.7 for the general case .
Let be a critical region for rejecting in favor of . Denote the probability content of under density (1.9) and define the limiting power as the limit of this probability content. Specifically, we consider critical regions that arise from the Cliff-Ord test, described next. With the regressor matrix $X$ from (1.8), denote $M_X=I-X(X'X)^{-1}X'$. Under the spatial autocorrelation assumption, the regression disturbances follow $u=\rho Wu+\varepsilon$, where $\varepsilon$ is a new disturbance and $W$ is some known matrix. The unknown scalar $\rho$ determines the degree of correlation among the components of $u$. For testing the null $\rho=0$ against the alternative $\rho>0$, Cliff and Ord proposed a test that rejects the null if $y'M_XWM_Xy/y'M_Xy>c$ for a critical value $c$. Denote
Problem 4. Describe the cases when the limiting power disappears, that is, when it equals zero.
This problem has been the main motivation for this paper. Spatial models in general are peculiar in many respects, and the possibility that the limiting power vanishes is one of those peculiarities that have lately attracted researchers' attention. Krämer was the first to suggest that the limiting power of tests for spatial autocorrelation may vanish for some combinations of the regressor matrix and the spatial matrix. Unfortunately, the terms in which Krämer expressed his results do not adequately reflect all the possibilities, and his proof contains an incorrect argument; see [7, Footnote 5]. Martellosio [4] was the first to suggest that the answer is better described in terms of the geometrical relationship between the eigenvectors of and the critical region . However, both his main result [4, Theorem 1] and its proof contain errors (see Remark 2.12 in Section 2). Our answer to Problem 4 is given in Theorem 2.11, and it corrects [4, Theorem 1]: where he excludes the extreme values 0 and 1 for the limiting power, we give examples showing that these extreme values are possible. In light of our result, several statements in [4, 7] have to be reconsidered. The analysis for the Cliff-Ord test is very involved, and it is not feasible to indicate corrections for all of Martellosio's main results that depend on the wrong intermediate statement or on his Theorem 2.9. The complexity of our analysis requires more detailed notation than that used by Martellosio. In particular, some of his verbal definitions and statements are cast in a more formal way. As a result, our citations are not word for word.
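The Cliff-Ord test rejects for large values of $I(y)=\hat u'W\hat u/\hat u'\hat u$, where $\hat u=M_Xy$ is the vector of OLS residuals. A minimal pure-Python sketch follows; the function names, the ring-shaped weight matrix `W`, the regressors and the data are hypothetical, and in practice the critical value `c` would be taken from the null distribution of the statistic. The last lines check that $I$ is unchanged under $y\mapsto ay+Xb$ with $a\ne0$, the invariance behind the cylinder structure of the rejection region discussed in Section 2.

```python
# Minimal pure-Python sketch of the Cliff-Ord statistic; all names and the toy
# data below are hypothetical illustrations, not the paper's notation.

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_residuals(X, y):
    """Return M_X y = y - X (X'X)^{-1} X'y for an n-by-k matrix X (list of rows) of rank k."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]

def cliff_ord_statistic(y, X, W):
    """u'Wu / u'u with u = M_X y."""
    u = ols_residuals(X, y)
    wu = [sum(wij * uj for wij, uj in zip(row, u)) for row in W]
    return sum(ui * wui for ui, wui in zip(u, wu)) / sum(ui * ui for ui in u)

def cliff_ord_rejects(y, X, W, c):
    """Reject rho = 0 in favour of rho > 0 when the statistic exceeds the critical value c."""
    return cliff_ord_statistic(y, X, W) > c

# Toy example: 6 units on a ring, each linked to its two neighbours with weight 0.5.
n = 6
W = [[0.5 if abs(i - j) in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
X = [[1.0, float(i)] for i in range(n)]          # intercept and trend
y = [1.3, 0.2, -0.7, 2.1, 0.4, -1.5]
stat = cliff_ord_statistic(y, X, W)
# a*y + X b with a = -3 and b = (2, -0.5): the statistic should not change.
transformed = [-3.0 * yi + 2.0 - 0.5 * i for i, yi in enumerate(y)]
print(stat, cliff_ord_statistic(transformed, X, W))
```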
Although convergence in distribution is the right mode for Problem 4, it would be interesting to know what kind of convergence produces the limit indicated by Martellosio. This question is formulated next.
Problem 5. Design the mode of convergence that leads to (1.11).
To this end, we introduce a new convergence concept (given its purpose, it could be called retrofit convergence) which may not look intuitively appealing but allows us to prove (1.11) in Theorem 2.13. Under this alternative convergence, there is no analog of (1.12). Therefore, we do not consider Problems 3 and 4 for this convergence.
2. Main Statements
In the multidimensional version of the generalized mean (1.4), instead of averaging over segments, we have to average over balls. The shape of those balls depends on the norm in $\mathbb{R}^n$. Let $\|\cdot\|$ be an arbitrary norm in $\mathbb{R}^n$. The balls are defined by $B_n(t)=\{x\in\mathbb{R}^n:\|x\|\le t\}$, where the indication of the space dimension will be important when dealing with more than one space. As an example, one can think of the $l_p$-norm defined by $\|x\|_p=\left(\sum_{i=1}^n|x_i|^p\right)^{1/p}$ for $1\le p<\infty$ and $\|x\|_\infty=\max_i|x_i|$. In the case of the Euclidean norm $\|\cdot\|_2$, we obtain the usual balls; for $p=\infty$ (and for $p=1$ when $n=2$), the balls are cubes. Another useful example is $\|x\|=(x'Ax)^{1/2}$, where $A$ is a symmetric, positive definite matrix. We say that a function $f$ on $\mathbb{R}^n$ is $\|\cdot\|$-spherically symmetric if $f(x)=h(\|x\|)$ with some function $h$ defined on the half-axis $[0,\infty)$. Conditions involving spherical symmetry below are similar to Conditions (2.1) and (2.2) from .
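Let $\theta$ denote an element of the unit sphere $S=\{x:\|x\|=1\}$ and let $x=r\theta$ be the representation of a point $x$ in the polar system of coordinates, so that $r=\|x\|$ and $\theta=x/\|x\|$ for $x\ne0$. Then the Lebesgue measure of $B_n(t)$ is $|B_n(t)|=v_nt^n$, where $v_n=|B_n(1)|$ is the volume of the unit ball. The norm gives rise to the averages $m_t(g)=|B_n(t)|^{-1}\int_{B_n(t)}g(x)\,dx$ and to the generalized mean $M_n(g)=\lim_{t\to\infty}m_t(g)$. In the one-dimensional case all balls are segments, and we write simply $M$ instead of $M_1$.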
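To make the balls and averages concrete, here is a sketch for the $l_p$-norms. It relies on the standard closed form $v_n=(2\Gamma(1+1/p))^n/\Gamma(1+n/p)$ for the volume of the unit $l_p$ ball (with $v_n=2^n$ for $p=\infty$), brought in for illustration, and estimates averages over balls by Monte Carlo; the function names, sample sizes and test function are hypothetical.

```python
import math
import random

def lp_norm(x, p):
    """l_p norm of the vector x; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def lp_ball_volume(n, p, radius=1.0):
    """Volume of {x in R^n : ||x||_p <= radius}, i.e. v_n * radius**n."""
    if p == math.inf:
        v_n = 2.0 ** n
    else:
        v_n = (2.0 * math.gamma(1.0 + 1.0 / p)) ** n / math.gamma(1.0 + n / p)
    return v_n * radius ** n

def ball_average(g, n, p, t, samples=50_000, seed=1):
    """Monte Carlo estimate of the average of g over the l_p ball of radius t.

    Points drawn uniformly from the cube [-t, t]^n and kept when they fall in the
    ball are uniformly distributed on the ball.
    """
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(samples):
        x = [rng.uniform(-t, t) for _ in range(n)]
        if lp_norm(x, p) <= t:
            total += g(x)
            kept += 1
    return total / kept

print(lp_ball_volume(2, 2), lp_ball_volume(2, 1), lp_ball_volume(3, 2))
# g depends on x_1 only; its averages over growing Euclidean balls approach 1/2.
for t in (1.0, 10.0, 100.0):
    print(t, ball_average(lambda x: math.cos(x[0]) ** 2, 2, 2, t))
```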
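We consider the set of continuous bounded functions on $\mathbb{R}^n$ that satisfy the Lipschitz condition
$$|g(x)-g(y)|\le c\,\|x-y\|,\quad x,y\in\mathbb{R}^n. \tag{2.6}$$
By [9, Theorem 3.6.1], convergence in distribution of random elements is equivalent to the convergence of the expected values of all functions from this set; [10, Theorem 2.1] asserts that this set can be replaced by the set of bounded uniformly continuous functions. $L_p$ is the space of $p$-summable functions on $\mathbb{R}^n$ provided with the norm $\|f\|_p=\left(\int|f|^p\,dx\right)^{1/p}$, $1\le p<\infty$.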
For our applications, in the multidimensional version of (1.5), we need to allow to depend on the parameter , as in Here , . , , are matrices and , , are vectors such that and tend to and , respectively, sufficiently quickly, as stipulated in the next assumption.
Condition 1. (a) is-spherically symmetric, .(b) , , .
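Remark 2.1. Using polar coordinates, we see that item (a) imposes a certain integrability restriction on $h$, where $f(x)=h(\|x\|)$: $\int_{\mathbb{R}^n}|f(x)|\,dx=|S|\int_0^\infty|h(r)|r^{n-1}\,dr<\infty$, where $|S|$ stands for the surface of the unit sphere $S$. The class of densities satisfying (a) includes the contaminated normal distribution, the multivariate $t$-distribution, and the multivariate Cauchy distribution, among others; see . The choice of norms in item (b) does not matter much, since all norms on a finite-dimensional space are equivalent.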
Remark 2.3. The triangle inequality is not used in the proof, and a slight generalization of Theorem 2.2 in terms of the geometry of balls is possible. The Lipschitz condition (2.6) can be omitted if are constant ( can be assumed just bounded and continuous).
The first application of Theorem 2.2 is to the theory of characteristic functions. Let $F$ be the distribution function of a (proper, real-valued) random variable $X$. Denote by $p(x)=F(x)-F(x-0)$ the jump of $F$ at the point $x$, and let $\psi(t)=Ee^{itX}$ be the characteristic function.
Corollary 2.4. If is even, then where the sum on the right is over all jump points of .
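Corollary 2.4 is stated in terms of the jumps $p(x)=F(x)-F(x-0)$. Its exact weighted form is not reproduced here; as a hedged numerical illustration we check instead two classical identities of this kind, $\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}e^{-itx}\psi(t)\,dt=p(x)$ and $\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}|\psi(t)|^2\,dt=\sum_xp^2(x)$, where $\psi$ is the characteristic function. The mixture (weight 0.5 on a standard normal, atoms 0.2 at 0 and 0.3 at 2), the value of `T` and the names `char_fn` and `time_average` are our own illustrative choices.

```python
import cmath
import math

# Illustrative mixture: 0.5 * N(0, 1) + atom 0.2 at x = 0 + atom 0.3 at x = 2.
ATOMS = {0.0: 0.2, 2.0: 0.3}
CONT_WEIGHT = 0.5

def char_fn(t):
    """Characteristic function E exp(itX) of the mixture."""
    value = CONT_WEIGHT * math.exp(-0.5 * t * t)
    for x, p in ATOMS.items():
        value += p * cmath.exp(1j * t * x)
    return value

def time_average(h, T, n=100_000):
    """(1/2T) * integral of h over [-T, T] by the midpoint rule (h may be complex)."""
    step = 2.0 * T / n
    return sum(h(-T + (k + 0.5) * step) for k in range(n)) * step / (2.0 * T)

T = 2000.0
energy = time_average(lambda t: abs(char_fn(t)) ** 2, T)
jump_at_2 = time_average(lambda t: cmath.exp(-2j * t) * char_fn(t), T)
print(energy, sum(p * p for p in ATOMS.values()))  # both close to 0.13
print(jump_at_2.real)                               # close to 0.3
```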
The second application is to the theory of almost-periodic functions. A complex-valued continuous function $g$ on $\mathbb{R}$ is called almost-periodic if for each $\varepsilon>0$ there exists $l=l(\varepsilon)>0$ such that each interval of length $l$ contains at least one number $\tau$ for which $\sup_{x\in\mathbb{R}}|g(x+\tau)-g(x)|<\varepsilon$. In the space of almost-periodic functions, the formula $(g_1,g_2)=M(g_1\overline{g_2})$ defines a scalar product ($\overline{g_2}$ is the complex conjugate of $g_2$), and the numbers $a(\lambda)=(g,e^{i\lambda x})$, $\lambda\in\mathbb{R}$, are called the Fourier coefficients of $g$. It is proved that for each $g$ at most a countable number of these coefficients are nonzero. Denoting them by $a_k=a(\lambda_k)$, $k=1,2,\ldots$, Bohr's theorem states that $M(|g|^2)=\sum_k|a_k|^2$; see [3, page 134]. This fact together with our Theorem 2.2 gives the next corollary.
Corollary 2.5. If is even, then for any almost-periodic function Further, if , the formula defines a scalar product which is equivalent to .
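Bohr's identity $M(|g|^2)=\sum_k|a_k|^2$ can be checked numerically. In this sketch (our own example) $g(x)=e^{ix}+\tfrac12e^{i\sqrt2x}$ is almost periodic but not periodic, its only nonzero Fourier coefficients are $a(1)=1$ and $a(\sqrt2)=\tfrac12$, and both sides of the identity equal $1.25$; the means are approximated by averages over $[-T,T]$ with a finite, hypothetical $T$.

```python
import cmath
import math

def mean_value(h, T=2000.0, n=100_000):
    """(1/2T) * integral of h over [-T, T] by the midpoint rule; approximates M(h)."""
    step = 2.0 * T / n
    return sum(h(-T + (k + 0.5) * step) for k in range(n)) * step / (2.0 * T)

def g(x):
    """Illustrative almost-periodic function with incommensurable frequencies 1 and sqrt(2)."""
    return cmath.exp(1j * x) + 0.5 * cmath.exp(1j * math.sqrt(2.0) * x)

def fourier_coefficient(h, lam):
    """a(lambda) = M(h(x) * exp(-i * lambda * x))."""
    return mean_value(lambda x: h(x) * cmath.exp(-1j * lam * x))

coefficients = {lam: fourier_coefficient(g, lam) for lam in (1.0, math.sqrt(2.0), 3.0)}
energy = mean_value(lambda x: abs(g(x)) ** 2)
print(coefficients)
print(energy, sum(abs(a) ** 2 for a in coefficients.values()))  # both close to 1.25
```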
Note that the first part of this corollary applies to characteristic functions of purely discrete distribution functions because, by [2, Corollary 2 of Theorem 3.2.3], any such characteristic function is almost periodic. We now turn to the multidimensional version of (1.7). For the density, we assume a stronger condition than Condition 1(a).
Condition 2. is - spherically symmetric, .
This assumption allows us to show that when some coordinates of are fixed, as a function of the remaining coordinates satisfies Condition 1(a).
We now provide the intuition behind the next condition. The stretching-out applied in (1.7) is described by the transformation where is the analog of because . Here, the matrix has two eigenvalues. The limit is a singular matrix because one of these eigenvalues tends to zero as . Generalizing this situation and also having in mind applications to invariant tests, in the -dimensional case we consider a symmetric nonnegative matrix of size , where the parameter belongs to the segment . Denote its eigenvalues and let be diagonalized as , where and is an orthogonal matrix. The matrix degenerates at the right end of the segment , owing to the following assumption.
Condition 3. (a) The matrix is positive definite for .
(b) The first eigenvalues tend to zero at the same rate, ; as ; the remaining ones have positive limits: as .
(c) The matrices and converge sufficiently quickly as . Namely, with the matrix one has , , .
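As a heuristic illustration in a simplified setting of our own (not the paper's general one), take a centered Gaussian vector in $\mathbb{R}^2$ whose inverse covariance matrix is $\mathrm{diag}(\lambda_1,\lambda_2)$ with $\lambda_1\to0$ and $\lambda_2$ fixed, mimicking eigenvalues that vanish as in part (b). The probability of any fixed box tends to zero: the mass escapes to infinity along the eigenvector of the vanishing eigenvalue, while the distribution of the second coordinate is unaffected. The function name and box size are hypothetical.

```python
import math

def box_probability(lam1, lam2, half_side=1.0):
    """P(|X1| <= a, |X2| <= a) for independent X_i ~ N(0, 1/lam_i).

    lam1 and lam2 are the eigenvalues of a diagonal precision (inverse covariance)
    matrix; the coordinates are independent, so the probability factorizes.
    """
    def centered_prob(lam):
        # P(|X| <= a) with X ~ N(0, 1/lam) equals P(|Z| <= a * sqrt(lam)), Z ~ N(0, 1).
        return math.erf(half_side * math.sqrt(lam) / math.sqrt(2.0))
    return centered_prob(lam1) * centered_prob(lam2)

for lam1 in (1.0, 1e-2, 1e-4, 1e-6):
    print(lam1, box_probability(lam1, 1.0))
```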
Note that does not exist because of part (b), but part (c) allows us to set by continuity. In line with part (b), we use the partitions Also, partition conformably with (2.14) ( is and is ). Define a transformation by . denotes the result of application of the generalized mean operator with respect to , keeping fixed: and denotes a marginal "density":
Next, we turn to the solution of Problem 2. Denoting and assuming that is positive definite for , we see that condition (1.10) corresponds to the case of our Condition 3. The density (1.9) fits the framework of our Theorem 2.6 because the stretching-out is applied along one variable. Thus, in Theorem 2.7 we apply Theorem 2.6 to (a) characterize the limit in distribution of in case and (b) show that (1.11) does not hold under weak convergence. By implication, (1.11) is also wrong for any mode of convergence stronger than weak convergence (e.g., uniform, almost sure, in probability, or in $L_p$). Even though we use Theorem 2.6, the assumption on the density in part (b) of the next theorem is weaker than that in Theorem 2.6.
Condition 4. is a density on bounded by an integrable, -spherically symmetric function, .
An example of such a density is , where is a nonnegative and nonincreasing function on , which decays at infinity sufficiently quickly for to be integrable. By monotonicity, inequality (3.14) below implies that .
Theorem 2.7. Let Condition 3 hold in which . (a) Denote the projector onto the subspace spanned by the last eigenvectors of and let be defined by is obtained by replacing with in (2.16). If Condition 2 holds and is such that exists for almost all , then with defined in (2.17). (b) If Condition 4 holds, then (1.11) cannot hold when convergence is understood as convergence in distribution.
Remark 2.8. Because of the identity , (1.11) correctly captures one feature of the limit distribution: it depends on only through .
Before giving the solution to Problem 3, we need more notation and definitions. Obtaining a closed-form formula for involves a meticulous analysis of based on the representation of given in the following.
Let be the image of . If for a given set (a) the space is represented as an orthogonal sum of two subspaces and and (b) there is a set such that , then we say that is a cylinder with the base and element . For any set , denote . Using , we see that . We say that is cone-like if . In particular, a cone-like set contains, together with each of its elements , its opposite . The next representation is a stronger statement than saying that defined in (1.14) is invariant with respect to transformations , , , .
The rejection region for the Cliff-Ord test is a cylinder with a cone-like base , where .
It is convenient to call the set in representation (2.21) an aperture. The function is continuous on the unit sphere of and the set is open. By the general property of continuous mappings [11, Chapter 2, § 5, Section 5, Theorem 6], the preimage of an open set under a continuous mapping is open. Thus, the aperture is an open set (in the relative topology) of the unit sphere of . We need the notions of the interior of a set (the set of its points that belong to it together with some neighborhood), its closure (the set of all its limit points), and its boundary (the closure minus the interior) .
Writing (1.12) in the form we see that Theorem 2.7(a) will be applicable if we manage to extend it from continuous Lipschitz functions to discontinuous functions of the type . This is done in Theorem 2.9. In Theorems 2.9 and 2.11, we assume that . Up to notation, this is the same assumption as (1.10). At least within our method, generalizations of the results below to the case are hard to obtain.
Let Condition 3 hold for and let . Then, as and all other eigenvalues have positive limits. Denote the orthonormal eigenvectors of corresponding to the eigenvalues . The partitions (2.14) become , , , . The vector will be called a shift because its role is to shift the line . In the next theorem, we extend (2.20) to with described by (2.21). In the notation of marginal densities, the subscript will indicate the number of integrated-out variables. For example, , .
Remark 2.10. (a) As can be seen from the proof, the theorem holds for any test with a critical region satisfying (2.21). (b) Conditions (2.23) and (2.25) are technical assumptions that ensure integrability, near the origin, of the marginal densities arising in the course of the proof.
To avoid triviality, in the following theorem we assume that the inclusion is strict. This implies that in representation (2.21) the set is a nonempty proper subset of .
Theorem 2.11. Let the conditions of Theorem 2.9 hold, with (2.23) accompanying and (2.25) accompanying . Besides, we require the function from Condition 2 to be positive on . (1) If , then . (2) Suppose . (2.1) If , then . (2.2) If , then examples can be presented such that , or . (3) If , then .
Remark 2.12. Here, we compare this theorem with [4, Theorem 1].
(1) Our conditions on the density and the critical region are much more restrictive. Our proof reveals the distinction between the cases and . In particular, in case (2.2), Martellosio excludes the extreme values 0 and 1, while we provide counterexamples showing that they are possible. Note also that we do not impose any conditions on the structure of . What happens to the limiting power in case of that arises in practice requires further investigation.
(2) Martellosio's proof is based on (1.11), which we disprove in Theorem 2.7(b). A series of other propositions from the same paper (see Lemmas D.2, D.3 and E.4, Corollary 1 and Propositions 1, 2 and 5), as well as from [7] (see Lemma 3.2, Theorems 3.3, 3.5 and 4.1, and Proposition 3.6), depend on (1.11) and need revision. In particular, his claim that his results are true for any invariant critical region and any continuous density that is unimodal at the origin is unwarranted.
(3) Even if statement (1.11) were right, the proof of [4, Theorem 1] would be incomplete because it incorrectly uses the -dimensional Lebesgue measure. Its use is inappropriate because, for a degenerate density, even a single point may carry positive mass. In our proof, we justify the use of the -dimensional Lebesgue measure.
Now we turn to the description of the alternative approach (solution to Problem 5). The mode is a poor characteristic of a distribution when two unimodal densities with the same mode have very different spreads. A set of points where the density is close to its maximum might be a better characteristic in this case, at least for bell-shaped distributions. Let and let be the maximum of a continuous density . We call an -maximizing set of . The idea of this density-maximizing approach is close to the maximum likelihood principle.
Suppose a decision is taken if the statistic belongs to a set . In case of a favorable decision, is chosen in such a way that the probability is high. The use of probability in this decision rule presumes that the statistic can be calculated repeatedly. However, in practice, especially in economics, the decision is based on just one value of the statistic in question. In such a case it may be preferable to choose so that is sufficiently close to . For the -maximizing set, we have . Requiring to be close to 1 in the density-maximizing approach is similar to requiring to be close to 1 in the probability-maximizing approach (although normally as ).
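To see how differently probabilities and density levels behave, consider (our own example) the standard normal density $f$ and the set $\{x:f(x)\ge q\max f\}$ for a level $q\in(0,1)$; this parametrization is an assumption and need not coincide with the one used in the definition above. The set is $[-r,r]$ with $r=\sqrt{-2\ln q}$, and its probability tends to zero as $q\to1$, even though the density at every point of the set is close to its maximum.

```python
import math

def level_set_normal(q):
    """For the N(0, 1) density f with maximum m = f(0), return (r, P), where
    {x : f(x) >= q * m} = [-r, r] and P is its probability under f.

    q in (0, 1) is our hypothetical level parameter.
    """
    r = math.sqrt(-2.0 * math.log(q))    # solves exp(-r**2 / 2) = q
    prob = math.erf(r / math.sqrt(2.0))  # P(|Z| <= r) for Z ~ N(0, 1)
    return r, prob

for q in (0.5, 0.9, 0.99, 0.999):
    r, prob = level_set_normal(q)
    print(f"q={q}: half-width={r:.4f}, probability={prob:.4f}")
```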
As before, we assume that the matrix of size is symmetric for and positive definite for . The idea is to impose conditions ensuring that -maximizing sets are ellipsoids which in the limit give a set of the desired shape. This idea is realized through a delicate balance of the limit behavior of the eigenvalues and density contained in Conditions 5 through 7 and (2.32).
Condition 5. In the diagonal representation of , the orthogonal matrix satisfies the first eigenvalues vanish as power functions with positive constants ; the remaining eigenvalues tend to positive constants where .
Condition 6. The function in the definition of the set vanishes as a power function where are positive constants.
Condition 7. The density is -spherically symmetric, , where is continuous and monotonically decreasing on and such that where .
This assumption implies that , that the inverse function is continuous and monotonically decreasing on , and that
The idea of the proof is to approximate with a step function , prove the statement for and then pass to the limit to obtain the statement for . A step function, by definition, is a finite linear combination of indicators of measurable sets. Due to the spherical symmetry of , these sets turn out to be balls. For the method to work, the radii of the balls should be positive and finite. Approximation of by a continuous function in Step 1 is a trick to make sure that takes a finite number of values.
Step 1 ( can be assumed continuous on ). Let and denote the space of measurable functions on such that . satisfies the conditions of [12, Theorem 1]: (1) . (2) Minkowski inequality: if is a measurable set and is a function measurable on , then . (3) A multiplication operator by a function is bounded in : . (4) Any compactly supported function in is translation-continuous: .
The first three properties are standard facts of the theory of spaces; the last one follows from the fact that if is a compact subset of , then the weight in the definition of the norm of satisfies on and therefore for such an the norms and are equivalent. Functions from are known to be translation-continuous.
By Burenkov’s theorem there exists a sequence such that . Defining , from (2.9) we have . Hence, uniformly in . Thus, if we establish then (2.8) will follow.
Step 2 (approximating with a step function). Take an arbitrary . We can assume that is continuous on . (a) Suppose . We approximate by a step function , which will vanish where is large or small. By summability of , there exist such that is uniformly continuous on the ring . By Condition 1(a), for any natural we can find and split this ring into smaller rings in such a way that in each ring is close to its value on the inner boundary: Put if or ; for , . Combining (3.3), (3.4), and (3.5) leads to Consequently, we can fix so that (b) If , we just put to get (3.7).
Step 3 (replacing rings by balls in the representation of ). Suppose is not identically zero. The sets in (3.4) are concentric rings with finite positive radii of the inner and outer boundaries, and by construction, is a finite linear combination of indicators of such rings. Hence, it can be written as , where are some real numbers and the radii satisfy
Therefore, with some new constants , we can write as a linear combination of indicators of balls:
If , we put formally , , . Note that , and all depend on and that the constants may deviate from the values of significantly.
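The passage from rings to balls in Step 3 rests on the telescoping identity $1_{\{r_{j-1}<\|x\|\le r_j\}}=1_{\{\|x\|\le r_j\}}-1_{\{\|x\|\le r_{j-1}\}}$. The sketch below (helper names are ours; the profile $e^{-r}$ and the radii are arbitrary) builds a ring step function whose value on each ring is the profile at the inner radius, rewrites it as a combination of ball indicators, and checks that the two representations agree as functions of $r=\|x\|$.

```python
import bisect
import math

def ring_step_function(h, radii):
    """Values c_1, ..., c_m of a step approximation of the radial profile h.

    radii = [r_0, r_1, ..., r_m] with 0 < r_0 < ... < r_m; the value on the ring
    r_{j-1} < r <= r_j is h at the inner radius, and the function vanishes elsewhere.
    """
    return [h(radii[j - 1]) for j in range(1, len(radii))]

def eval_rings(c, radii, r):
    """Evaluate sum_j c_j * 1{r_{j-1} < r <= r_j}."""
    if r <= radii[0] or r > radii[-1]:
        return 0.0
    j = bisect.bisect_left(radii, r)   # index with r_{j-1} < r <= r_j
    return c[j - 1]

def rings_to_balls(c, radii):
    """Coefficients d_0, ..., d_m with sum_j d_j * 1{r <= r_j} equal to the ring sum."""
    m = len(c)
    d = [0.0] * (m + 1)
    for j in range(1, m + 1):
        d[j] += c[j - 1]       # + indicator of the outer ball
        d[j - 1] -= c[j - 1]   # - indicator of the inner ball
    return d

def eval_balls(d, radii, r):
    """Evaluate sum_j d_j * 1{r <= r_j}."""
    return sum(dj for dj, rj in zip(d, radii) if r <= rj)

radii = [0.5, 1.0, 1.5, 2.0, 3.0]   # hypothetical ring radii
c = ring_step_function(lambda r: math.exp(-r), radii)
d = rings_to_balls(c, radii)
print(c)
print(d)   # the ball coefficients can differ markedly from the ring values
```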
Step 4 (introducing residuals for generalized means). Define the residual by
and in both cases
By the Lipschitz condition (2.6) and Condition 1(b), Here, we have used the fact that in finite-dimensional spaces any two norms are equivalent, so with some . Equations (2.3) and (3.13) imply that where and do not depend on . From this bound and (3.11), it follows that where is a new residual satisfying
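For the norms most often used here, the equivalence constants are explicit: $\|x\|_\infty\le\|x\|_2\le\|x\|_1\le\sqrt n\,\|x\|_2\le n\|x\|_\infty$ on $\mathbb{R}^n$. The sketch checks this chain on random vectors; it is an illustration only, not part of the proof, and `norms` and `check_equivalence` are hypothetical names.

```python
import math
import random

def norms(x):
    """Return the l_1, l_2 and l_infinity norms of the vector x."""
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    linf = max(abs(v) for v in x)
    return l1, l2, linf

def check_equivalence(n, trials=1000, seed=2):
    """Check ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2 <= n ||x||_inf on random vectors."""
    rng = random.Random(seed)
    tol = 1e-12  # slack for floating-point rounding
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        l1, l2, linf = norms(x)
        if not (linf <= l2 + tol and l2 <= l1 + tol
                and l1 <= math.sqrt(n) * l2 + tol
                and math.sqrt(n) * l2 <= n * linf + tol):
            return False
    return True

print(check_equivalence(3), check_equivalence(50))
```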
Step 5 (proving (2.8) for the approximating function). Let . For one term in (3.9), by (2.2) and the first equation in (3.16), we have
Summation of these equations produces
Here, by (3.8), (3.12), and (3.17), , , , as . From these relations and (3.19), we see that for the given there exists such that
In case , the only difference is that the second equation in (3.16) is applied instead. The conclusion is (3.20) with .
3.1. Proof of Theorem 2.6
Step 2 (changing the limit variable). Let us have a closer look at the expression in the square brackets in (3.25): Since , here , and by Condition 3(c), Considering the independent variable, take as its function. The limit relation translates to , and (3.26) and (3.27) become, respectively.
Step 3 (proving a preliminary version of (2.18)). Now we check that Theorem 2.2 applies to (3.28) for any fixed . Letting we see that is -spherically symmetric: Condition 1(a) is satisfied. As for Condition 1(b), partition conformably with (2.14) and put