Abstract

Ripley’s K function is the classical tool to characterize the spatial structure of point patterns and is widely used in vegetation studies. Testing its values against a null hypothesis usually relies on Monte-Carlo simulations, since little is known about its distribution. We introduce a statistical test against complete spatial randomness (CSR). The test returns the p value for rejecting the null hypothesis of independence between point locations. It is more rigorous and faster than classical Monte-Carlo simulations. We show how to apply it to a tropical forest plot. The necessary R code is provided.

1. Introduction

The commonest tool used to characterize the spatial structure of a point set is Ripley's K statistic [1, 2]. It has been widely used in ecology and other scientific fields and is well covered in handbooks [3–7]. Classically, an observed set of points is tested against a homogeneous Poisson point process taken as a null model. Since little is known about the distribution of the K function, the test of rejection of the null hypothesis relies on Monte-Carlo simulations. Large point patterns are difficult to deal with because computation time is roughly proportional to the square of the number of points (to calculate the distances between all pairs of points) multiplied by the number of simulations. Moreover, the test is valid for a single distance, so using it repeatedly at all distances where the function is calculated dramatically increases the risk of wrongly rejecting the null hypothesis [8]. Duranton and Overman [9] provided a heuristic global test, followed by Marcon and Puech [10]. Loosmore and Ford [8] proposed a goodness-of-fit test to eliminate this error, but it still relies on Monte-Carlo simulations.

We showed in [11] that a global test was able to return a classical p value, that is to say, the probability of erroneously rejecting the null model, relying on exact values of the biases and variances of the statistics. We derived its asymptotic properties in a growing square window. In this paper we develop it so that it can be used in a rectangular window, as most applications require. We show that it can be applied to real point patterns, even with a small number of points, and discuss in depth the way to employ it, so that it can be used by empirical researchers.

We first present our motivating example: a tropical forest plot where we want to elucidate the spatial structure of two tree species. We then provide the mathematical framework supporting the test, apply it to the dataset, and present the results. In the Discussion, we review the literature on previous tests to show why this one is a significant improvement, and we verify that the test actually works. Finally, we provide a user guide to allow researchers to easily apply the test with the provided R [12] code.

2. Materials and Methods

2.1. Data to Analyze

We consider the distribution of two tree species in the Paracou field station, French Guiana [13]. All trees over 10 cm DBH have been plotted. We use data from a 400.6 by 522.3 meter rectangle included in the four plots of the southern block of the experimental design. A map of the trees is shown in Figure 1.

Dicorynia guianensis Amsh. is a widely studied species in French Guiana; its spatial structure has been characterized for a long time [14, 15]. As visual inspection of the map suggests, Dicorynia guianensis is an aggregated species.

Much less is known about Tachigali melinonii (Harms) Barneby. The species has been studied for its special biomechanical behavior [16] or leaf trait plasticity [17]. The spatial structure of its saplings has been reported by Flores et al. [18] but the structure of adult trees is not clear.

2.2. Mathematical Framework

We consider a point pattern of n points in a rectangular window. L1 and L2 are the sides of the window (width and length). λ is the intensity of the underlying point process; it is estimated by λ̂, the number of observed points divided by the area of the window. We denote by r the vector of the d distances (r1, r2, …, rd). We omit the subscript for readability when there is no ambiguity: r is one of the distances, and K̂(r) is the estimator of K at distance r. Points are denoted Xi, and 1(‖Xi − Xj‖ ≤ r) is an indicator function equal to 1 when the distance between two points is less than or equal to r, and 0 otherwise. Details of the calculation are not provided, as we follow exactly Lang and Marcon [11], but its important steps and intuitions are presented here.

Ripley’s K function is estimated from the data for each distance r, without correction for edge effects:

K̂(r) = 1/(λ̂n) ∑i ∑j≠i 1(‖Xi − Xj‖ ≤ r).    (1)
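As an illustration, the uncorrected estimator can be computed directly from pairwise distances. The sketch below is in Python (the article's supplementary code is in R); the function name `k_hat` and the normalization by λ̂n are illustrative assumptions, not the article's own code:

```python
import numpy as np

def k_hat(points, r, area):
    """Uncorrected Ripley K estimator: count ordered pairs of points
    closer than r, normalized by lambda-hat * n (no edge correction)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    lam = n / area                       # estimated intensity
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude self-pairs
    return (d <= r).sum() / (lam * n)

# four corners of a unit square: each point has 2 neighbours within r = 1.2
k = k_hat([(0, 0), (0, 1), (1, 0), (1, 1)], r=1.2, area=1.0)  # 8/(4*4) = 0.5
```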

Assumptions are those of Ripley’s K function: we test the independence of locations of an observed point pattern, assumed to be a realization of a homogeneous point process. Homogeneity means both stationarity (the process is unchanged by translation) and isotropy (the process is unchanged by rotation). Thus the null hypothesis of complete spatial randomness (CSR) is that the point process is a homogeneous Poisson process.

The expectation of K(r) under CSR is πr². Edge effects introduce a bias in K̂(r), which can be computed exactly for the null model; for r ≤ min(L1, L2),

E[K̂(r)] = πr² − 4(L1 + L2)r³/(3L1L2) + r⁴/(2L1L2).    (2)

Estimated values can be corrected for the bias of the null model to test them against it. We get a vector of results of length d:

T = (K̂(r1) − E[K̂(r1)], …, K̂(rd) − E[K̂(rd)]).    (3)

For a homogeneous point process, the vector T is asymptotically normal, with expectation zero and an explicit variance matrix Σ, whose terms are given in Appendix A.

Consequently, TᵀΣ⁻¹T follows a χ² law with d degrees of freedom. The asymptotic value of the variance is only reached with tens of thousands of points, so it is of little use, but normality is acceptable with very few points, so the test can be used on real data since the exact value of Σ can be calculated. The exact calculation of the variance and covariance is quite painful because it must take edge effects into account, resulting in the formulas given in Appendix A.

2.3. Application

The test is applied as follows: (i) compute T from the observed point pattern, following (3); (ii) compute Σ according to the window’s size and the number of observed points; (iii) finally, compare TᵀΣ⁻¹T to a χ² distribution with d degrees of freedom and return the p value.
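The three steps can be sketched as follows, assuming the vector of estimates, its expectation under CSR, and the variance matrix Σ have already been computed (all names here are illustrative; the article's supplementary code is in R):

```python
import numpy as np
from scipy.stats import chi2

def csr_test(k_hat, k_expected, sigma):
    """Return the p value of the chi-squared CSR test:
    compare T' Sigma^{-1} T to a chi2 with d degrees of freedom."""
    t = np.asarray(k_hat) - np.asarray(k_expected)    # vector T, step (i)
    stat = t @ np.linalg.solve(np.asarray(sigma), t)  # T' Sigma^{-1} T
    return chi2.sf(stat, df=len(t))                   # step (iii): p value

# no departure from the null expectation gives p = 1
p = csr_test([3.14, 12.57], [3.14, 12.57], np.eye(2))  # p == 1.0
```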

We provide a classical plot of the transformed function L̂ [19] against r, computed every 5 meters up to 250 m (to illustrate the discussion). 1,000 simulations of a binomial process with the same number of points as the real data are run. At each distance r, the 25 lowest and 25 greatest values are eliminated to build the local 5% confidence interval. The global confidence interval is built iteratively [9, 10]: simulations corresponding to extreme values (maximum or minimum) at any distance are eliminated, and this process is repeated until 5% of the simulations have been eliminated. The extreme remaining values are plotted. Interpolation is used if the last iteration eliminates more simulations than required.
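The envelope construction just described can be sketched as follows; the simulated curves below are stand-ins (random cumulative sums) for the 1,000 simulated statistics, and the overshoot of the last iteration is left uninterpolated for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
sims = rng.normal(size=(1000, 50)).cumsum(axis=1)  # 1,000 fake cumulative curves

# local 5% interval: at each distance, drop the 25 lowest and 25 greatest values
s = np.sort(sims, axis=0)
local_lo, local_hi = s[25], s[974]

# global interval: iteratively eliminate every simulation that reaches the
# current minimum or maximum at any distance, until 5% have been eliminated
alive = np.ones(len(sims), dtype=bool)
while alive.sum() > 950:
    cur = sims[alive]
    extreme = (cur == cur.min(axis=0)) | (cur == cur.max(axis=0))
    alive[np.flatnonzero(alive)[extreme.any(axis=1)]] = False
global_lo = sims[alive].min(axis=0)  # the extreme remaining values are plotted
global_hi = sims[alive].max(axis=0)
```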

We apply our test to these two point sets up to 150 meters, following Collinet [15], who characterized the spatial structure of many species in Paracou and detected possible dependence at this scale. The vector of distances is r = (10, 20, …, 150) meters; we discuss this choice later. Finally, we apply Loosmore and Ford’s test based on the same r and 1,000 simulations.

3. Results

Aggregation of Dicorynia guianensis is obvious in Figure 2(b). Our test applied with the vector of distances (10, 20, …, 150 meters) returns a p value equal to zero; that is to say, the quantile of the χ² distribution with 15 degrees of freedom corresponding to the observed statistic is so extreme that R returns 0. Loosmore and Ford’s test gives the same result, with a value of their statistic much greater than in all simulations.

Figure 2(a) shows a less clear structure for Tachigali melinonii. The curve leaves the local confidence interval many times, but not the global one. Our test applied with the same distance vector (10 to 150 m) returns a p value equal to 2.5%; aggregation is significant.

Loosmore and Ford’s test applied to the same distance range also rejects CSR (the observed statistic is ranked 984th among the 1,000 simulations).

4. Discussion

4.1. Foundations of the Test

Many methods exist to test a point pattern against CSR [7, pages 83–98], among which are the relative variance [20, 21] and tests based on the nearest neighbors [22–24]. Tests based on the K function are particularly appealing because K provides information about the relative position of points at different scales, so the test statistic simultaneously gives useful information about the point process.

4.1.1. The Distribution of K̂ under CSR Was Unknown

The classical, local test consists in comparing each observed K̂(r) to the confidence interval of K(r) obtained by Monte-Carlo simulations of the null model (which should be a homogeneous Poisson point process but is usually approximated by a binomial process for simplicity, artificially reducing variability). The null model is rejected at the chosen significance level, 5%, when the observed value is outside the corresponding confidence interval.

To avoid simulations, approximate confidence intervals for K̂ were proposed by Ripley [25], then refuted by Koen [26], whose own errors were finally corrected by Chiu [27]. All these confidence intervals were built on simulations.

The variance of K̂ was investigated early on (see [5, page 58]). The asymptotic variance was calculated and asymptotic normality was proved, allowing a confidence interval for K̂ to be calculated, but the exact variance remains far from its asymptotic value for usual point sets [11]. We give the exact variance in Appendix A.

4.1.2. Testing along Many Values of r Is Not Correct

In our examples, we have 30 values of r. If we draw a point pattern from a Poisson process, we can expect 5% of them, that is, 1 or 2, to be out of the confidence interval. As a consequence, the local significance level of the test should be decreased dramatically to reach a global significance level of 5%. Actually, the K̂(r) values are highly correlated because K is a cumulative function; roughly speaking, most of each value comes from that of the previous one (see Figure 3). This reduces the need for a correction but does not eliminate it completely [28]. Since no quantification of the correction is available, the local test is used, keeping in mind that the global significance level of the test is somewhat higher than announced (see [7, page 456] for a discussion). Each of the local confidence interval values is correct, but testing a curve made of 30 points against a local confidence envelope is not [8].
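A back-of-the-envelope calculation shows how large the global risk would be if the 30 pointwise tests were independent (they are not, being positively correlated, which is why the true global level lies somewhere between 5% and this bound):

```python
# probability of at least one false rejection among d independent 5% tests
d = 30
fwer = 1 - 0.95 ** d
print(round(fwer, 3))  # 0.785: far above the announced 5% level
```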

Solutions have been proposed to address this issue. Duranton and Overman [9] proposed a test consisting in eliminating simulated vectors globally when one of their values is an extreme one. The global confidence intervals plotted in the figures are heuristic; they do not rely on any mathematical proof. They appear to be too conservative for Tachigali melinonii.

4.1.3. Goodness-of-Fit Tests Are a Solution

Goodness-of-fit (GoF) tests measure the discrepancy between the expected curve of K under the null hypothesis and the actual curve. This value can be compared to its quantiles under CSR. Loosmore and Ford’s [8] test is a GoF test, already proposed by Diggle [4]. Its quantiles are not known, so the test relies on Monte-Carlo simulations. Three GoF tests were proposed by Heinrich [29], but all of them are asymptotic, so none can be used with real data. Our test is very similar to Heinrich’s, but since we derived the bias due to edge effects and the exact variance-covariance matrix of K̂ under CSR, we were able to compute the statistic, which follows a χ² law whose quantiles are well known.

4.1.4. Graphical Interpretation of the Test

Figure 3(a) shows the correlation of the values of K̂ for two different values of r. Each point represents a simulation of a Poisson process. The plot should be imagined in a number of dimensions equal to the number of r values. As K is a cumulative function, its values are highly autocorrelated. Figure 3(a) presents the results of simulations of a Poisson point process. Some are slightly aggregated (positive values); others are dispersed (negative values), due to stochasticity. Multiplying T by Σ^(−1/2) yields values that are independent, centered, and of variance 1 (Figure 3(b)). We denote by Ti the elements of T and by T̃i those of the transformed vector.

The circle’s radius is the square root of the 5% critical value of a χ² distribution with 2 degrees of freedom. Point patterns corresponding to plots outside the circle are rejected. Thus, the test detects significant regularity of points (example not shown) as well as aggregation.

Transforming the correlated K̂ values into independent ones whose squared norm follows a χ² distribution relies on the exact, not asymptotic, variance matrix.
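The decorrelation step can be sketched with a toy 2×2 variance matrix (the Cholesky route shown here is one standard way to apply Σ^(−1/2); the article does not prescribe a particular factorization):

```python
import numpy as np

def whiten(t, sigma):
    """Map a centered vector with covariance sigma to independent,
    unit-variance components; the squared norm of the result is
    the test statistic T' sigma^{-1} T."""
    L = np.linalg.cholesky(np.asarray(sigma))  # sigma = L L'
    return np.linalg.solve(L, np.asarray(t))

sigma = np.array([[1.0, 0.9], [0.9, 1.0]])  # strong positive correlation
t = np.array([1.0, 1.0])
w = whiten(t, sigma)
stat = w @ w  # equals t @ inv(sigma) @ t
```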

4.1.5. Correcting Edge Effects for the Null Model Is a Better Choice

Classically, edge effects are corrected when estimating K̂. Corrections assume that unseen neighbors exist beyond the window’s limits and evaluate their number according to the observed data inside the window. We prefer to calculate the bias of the null model and compare empirical, uncorrected values of K̂ to their expected, biased values; see (3). We follow Gignoux et al. [24], who showed that testing data against CSR with nearest-neighbor functions was more powerful when ignoring edge effects (i.e., when neither the actual point pattern nor the Monte-Carlo simulations of the null hypothesis were corrected). Heuristically, correcting edge effects for each point means adding neighbors uniformly, reducing the power of the test against CSR.

4.2. Test of the Test

Although we use the exact variance matrix, we only proved asymptotic normality. This is a common issue for statistical tests; the confidence interval of an average value calculated from 30 repeated measures is usually evaluated assuming normality, which is only proved asymptotically.

We evaluated the minimum number of points necessary to validate the level of the test. We simulated 10,000 realizations of a Poisson point process in a rectangular window and tested them. K̂ was calculated at distances 1, 2, 3, 4, and 5. Intensity was chosen between 1/3 and 10, so that the expected number of points ranged between 50 and 1,500.

Figure 4 shows the actual rejection levels of the test (only extreme intensities are shown for readability). When points are few, the number of simulated patterns whose statistic is above the χ² critical value at low risk levels (e.g., 1%) is a little higher than the risk level. At the 5% risk level, we believe the test results are acceptable for practical purposes even with very few points: we expect 500 of the 10,000 simulations to be rejected, and Table 1 shows the rejection level is always under 6%.

We also wanted to evaluate the power of the test, that is to say, its ability to reject point patterns that are not completely random but are hard to detect. Grabarnik and Chiu [30] proposed to use a mixture of Matérn and Strauss processes as a counterfactual. They tested the ability of different statistics, including Ripley’s K, to distinguish them from a Poisson process. We followed them to design a power test, presented in Appendix B. Unsurprisingly, we find that our test’s power is very similar to that of K in Grabarnik and Chiu’s tests.

4.3. Choosing the Distance Vector

The choice of r values (the vector of distances) is arbitrary. If the point process is actually a homogeneous Poisson process, the results are identical whatever r is. Since it may not be, some rules should be followed: choosing the distances up to the expected range of interactions, with uniform steps, allows an “objective” analysis of the data [29], better than selecting r values from the plots.

The statistic is the sum of the contributions of all r values, which are made independent by construction. Taking into account distances above the maximum range of interaction between points limits the power of the test, since a fraction of the values are then purely stochastic. This is normal behavior for a goodness-of-fit test. When used on the whole distance range 0–250 m, Loosmore and Ford’s [8] test applied to the Tachigali example loses its power and returns a p value equal to 23.5%. In the same conditions, our test returns a p value equal to 6.3%; it appears to lose much less power when noisy data (there is no interaction between points at such distances) are introduced in the analysis. We chose to investigate distances up to 150 meters because we knew this was the possible range of interest.

The last question is the number of distances considered. Too many values increase stochasticity relative to the number of points (the number of new point pairs at each new value of r becomes more variable, and is often zero), while too few values do not allow all scales of the pattern to be detected. In order to integrate all the information, the steps between r values should be coherent with the processes to detect. Fifteen steps of 10 meters appear to be a correct way to test the structure of Tachigali: 5-meter steps are too small considering the density of the pattern (see Figure 1), and 20-meter steps are too large for the process we consider.

4.4. Application of the Test: A User’s Guide

We provide a test of a point pattern observed in a rectangular window against CSR. The code to run it in R is available as supplementary material, online at http://dx.doi.org/10.1155/2013/753475. The test function accepts a point pattern object as defined in the spatstat package [31] and a vector of distances r. It returns a p value (the risk of error if CSR is rejected).

Distances should be chosen up to the range of possible interactions, with equal intervals small enough to describe the pattern correctly, but without too many steps if points are few. The maximum distance at which K̂ is computed must be less than or equal to half the width of the rectangle. This is a classical geometrical limitation, already faced by local edge-effect corrections [4] ([32] corrects up to half the length). Only rectangular windows are supported.

The test does not give information on which values of r are responsible for its significance. Practically, in order not to lose the usual references, a graphical representation of K̂ or, better, of its transformed function L̂ should be provided with local confidence intervals. If the number of points is large, the number of Monte-Carlo simulations can be reduced, since these intervals are not used for the test.

5. Conclusion

Characterizing the spatial structure of a dataset representing the location of plants in an experimental plot is a common task for ecologists [33]. We provide a rigorous statistical test to reject the null hypothesis that the K̂ values of an observed point pattern in a rectangular window are those of a realization of a homogeneous Poisson point process. This test advantageously replaces the classical Monte-Carlo one. In practical applications it will rather complement it, since Monte-Carlo simulations provide useful local information on the point process.

The test is ready to use with the provided R code to be found in the electronic appendices.

Future work includes both supporting more complex window shapes, probably as triangle assemblies following Pélissier and Goreaud [34], and other point processes as null hypotheses. Although it is still limited to the simplest applications, we believe this test is an important step towards more rigorous spatial statistics, based on analytical results rather than simulations.

Appendices

A. Variance and Covariance

We consider a point pattern in a rectangular window as described in Section 2.2. r1 and r2 are two distances at which the K function is estimated; r2 is larger than r1. 1(‖Xi − Xj‖ ≤ r) is an indicator function equal to 1 when the distance between two points is less than or equal to r, and 0 otherwise. E denotes the expectation of a random variable.

K̂(r) is the estimator of Ripley’s K function at distance r. The formulas for the variance of K̂(r1) and the covariance of K̂(r1) and K̂(r2) are made explicit here.

A.1. Variance

One has  where

The leading term corresponds to the expectation divided by the window’s area; the other terms correspond to the bias due to edge effects.

The intensity λ in (A.1) is unknown, so it is estimated by λ̂.

A.2. Covariance

One has  where

λ and λ² are estimated by λ̂ = n/(L1L2) and by n(n − 1)/(L1L2)², as n follows a Poisson law [11].

B. Power Test

Grabarnik and Chiu [30] proposed a new statistic to test data against complete spatial randomness. It is of little use in practical situations because it suffers from edge effects with no correction for them. We are interested here in the power test proposed by the authors. From a mathematical point of view, mixtures of a clumped and a repulsive point process are an interesting challenge for a test built to reject Poisson processes, whose values are intermediate. From an ecological point of view, these patterns make sense if we think of the processes responsible for, say, tree locations: aggregation is expected from regeneration processes and repulsion from competition processes. For example, Aldrich et al. [35] studied the relative importance of the two processes over 60 years of the life of a forest.

We drew the same simulations as a power test. n (100, 200, or 300) is the number of points of the simulated point pattern in the window. Half of them are drawn from a Matérn [36] process whose centers are drawn uniformly in the window (centers are not included in the point pattern) and whose offspring are drawn less than 0.06 away from their center. The number of offspring around each center is drawn from a Poisson law of expectation μ (1, 2, 3, or 4). Centers are added until the number of offspring reaches n/2. The other half of the points is drawn from a Strauss [37] process with interaction radius 0.06. Actually, Grabarnik and Chiu used a Gibbs process with a fixed number of points [7], defined by its pair potential (from 0 for no interaction to 1.2 for strong repulsion); the interaction parameter in usual presentations of the Strauss process (see [38, page 85]) is the exponential of the opposite of the pair potential. For each parameter set, 10,000 point sets are drawn and tested against CSR. Results are summarized in Figure 5.
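The Matérn half of the simulated patterns can be sketched as follows (a unit square window is assumed here for illustration; the Strauss half, which requires Markov chain simulation, is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

def matern_half(n_target, mu, radius=0.06):
    """Add Matern cluster centers (not kept) until n_target offspring are
    drawn; each center gets Poisson(mu) offspring, placed uniformly in a
    disc of the given radius and kept only if inside the unit window."""
    pts = []
    while len(pts) < n_target:
        center = rng.uniform(0, 1, size=2)
        for _ in range(rng.poisson(mu)):
            rad = radius * np.sqrt(rng.uniform())  # uniform in the disc
            ang = rng.uniform(0, 2 * np.pi)
            p = center + rad * np.array([np.cos(ang), np.sin(ang)])
            if 0 <= p[0] <= 1 and 0 <= p[1] <= 1:
                pts.append(p)
    return np.array(pts[:n_target])

half = matern_half(100, mu=3)  # shape (100, 2)
```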

Increasing the number of points (going right in the figure) improves the power of the test. Clustering increases with μ (going down the figure), while repulsion increases with the pair potential (going right along the curves). The result is the ratio of rejected simulations at a 5% risk level. It would be above 95% if the test were perfect, but the point pattern is designed to be difficult to test, especially when the parameters are all small (few clusters, no repulsion; the process is close to Poisson) or intermediate (clustering of some points compensates for the repulsion of others).

Grabarnik and Chiu rejected CSR when the maximum departure of K̂ from its expectation exceeded that of 95% of simulations of a binomial process of n points. A comparison between Figure 5 and their Figure 2 shows that the powers are similar, even though their test is too optimistic according to Loosmore and Ford [8].

In summary, we can see here that the tendencies observed for the K function’s power with Monte-Carlo simulations remain valid. K is not very efficient at disentangling mixed clustered and repulsive point processes when both μ and the repulsion are high (Grabarnik and Chiu showed that Diggle’s statistic is better for that purpose) but is rather powerful at detecting clustering.

Acknowledgment

This work has benefited from an “Investissement d’Avenir” Grant managed by Agence Nationale de la Recherche (CEBA, ref. ANR-10-LABX-0025).

Supplementary Materials

The supplementary material contains the code for R to perform the test presented in the article.