Abstract

Hotspot detection has been widely adopted in the health sciences for disease surveillance but rarely in natural resource disciplines. In this paper, two spatial scan statistics (SaTScan and ClusterSeer) and a nonspatial classification and regression trees method were evaluated as techniques for identifying chestnut oak (Quercus montana) regeneration hotspots among 50 mixed-oak stands in the central Appalachian region of the eastern United States. Hotspots defined by the three methods had a moderate level of conformity and revealed similar chestnut oak regeneration site affinity. Chestnut oak regeneration hotspots were positively associated with the abundance of chestnut oak trees in the overstory and a moderate cover of heath species (Vaccinium and Gaylussacia spp.) but were negatively associated with the abundance of hayscented fern (Dennstaedtia punctilobula) and mountain laurel (Kalmia latifolia). In general, hotspot detection is a viable tool for helping natural resource managers identify areas with significantly high or low tree regeneration.

1. Introduction

A hotspot can mean an unusual phenomenon, anomaly, aberration, outbreak, elevated cluster, or critical area [1]. Spatial scan statistics have been widely adopted in the health sciences to detect hotspots for disease surveillance [2], and they are also widely applied to critical issues such as homeland security, public health, and disaster management. Identifying hotspots can provide early warning of disease outbreaks or other emerging issues. With innovative modifications, hotspot analysis can be used in any field [1]. Yet applications of hotspot detection methods in forestry and other natural resource disciplines are rare. By identifying areas with unusually high (hotspot) or low (cold-spot) measures of issues or events of interest, natural resource managers can allocate limited resources to the areas that require the most attention. In this study, oak (Quercus spp.) regeneration was used to illustrate the feasibility of applying hotspot detection methods in forestry.

Oak regeneration is one of the critical issues in forest health throughout eastern North America. Oaks are the most prevalent forest species in the eastern deciduous forests. However, natural regeneration of oaks is often difficult to obtain, a problem common throughout eastern North America [3]. The lack of oak regeneration presumably involves large-scale, exogenous, and unprecedented factors such as new disturbance regimes, fire suppression, invasion of exotic species, climate change, and/or modern wildlife and forest management practices [4]. The precise causes are unclear and likely include a complex interaction of factors. A decline in oak can have detrimental economic and ecological effects because oaks are the major timber species in the eastern U.S. and their seeds are a key part of the diet of many vertebrate species. The increasing interest in hardwood regeneration, particularly in mixed-oak stands, has underscored the importance of understanding the relationship between the abundance of tree seedlings and site conditions. Knowledge of the relationships between regeneration abundance and biotic and abiotic factors can help resource managers maintain a healthy, diverse, and compositionally stable forest ecosystem. Just as health-related surveillance identifies areas with abnormally high rates of disease, hotspot detection may help natural resource managers locate areas with high or low abundance of oak regeneration.

A variety of spatial scan statistics have been developed to identify spatial and spatiotemporal hotspots. SaTScan (http://www.satscan.org/) is one of the most popular circle-based scan statistics (i.e., the scanning windows, and therefore the resulting clusters, are circles) and has been broadly applied in the health sciences. SaTScan analyzes spatial, temporal, and space-time data using a variety of models, such as the Poisson-based model (the number of events in an area is Poisson distributed), the Bernoulli model (0/1 event data such as cases and controls), and the space-time permutation model (case data only). SaTScan is designed for any of the following interrelated purposes: (1) to evaluate reported spatial or space-time clusters to see whether they are statistically significant, (2) to test whether a disease is randomly distributed over space, over time, or over space and time, and (3) to perform geographical surveillance of disease by detecting areas of significantly high or low rates [2]. ClusterSeer (TerraSeer Inc., Ann Arbor, Michigan, U.S., http://www.terraseer.com/) is another popular software package for evaluating disease hotspots and nondisease events such as crime or sales data. ClusterSeer can be used to determine whether a cluster is significantly different from a random spatial distribution or an underlying spatial pattern, where it is located, and when it arose. ClusterSeer provides a large selection of spatial, temporal, and space-time cluster detection methods. Other hotspot detection methods, such as CrimeStat [5], GeoDa (http://geodacenter.asu.edu/), and ULS [6], are also available; however, an extensive methodological review is beyond the scope of this paper.

In this study, the two most popular spatial scan statistics, SaTScan and ClusterSeer, were applied to identify hotspots of chestnut oak (Q. montana) regeneration. A regeneration hotspot was defined as a region represented by a cluster of stands with significantly high oak regeneration abundance. In addition, a nonspatial method, classification and regression trees (CART), was applied for comparison with these spatial scan statistics. CART is a nonparametric method for grouping sites that ignores the georeferencing of the data. It partitions the dataset recursively into subsets that are increasingly homogeneous with respect to the defined groups, providing a tree-like classification and an associated dichotomous key to classify unknown samples into the groups [7]. Identifying hotspots of chestnut oak regeneration not only helps regional forest managers better understand the relationships between regeneration abundance and site conditions, but also helps them allocate limited resources to areas where management activities can have the largest impact on the sustainability of oak.

2. Methods

2.1. Data Source

Chestnut oak regeneration abundance and associated biotic and abiotic factors were collected from 50 widely distributed mixed-oak stands in the central Appalachian region of the eastern United States (latitude to N, and longitude to W). Within each stand, depending on stand size, 15 to 40 permanent plots with an 8.02 m radius (0.02 ha) were systematically installed on a square grid to represent the whole stand. Four permanent subplots with a 1.13 m radius (4.047 m²) were established within each plot at a fixed distance from plot center in each cardinal direction. In total, 5,732 subplots were established in the study area. At the subplot level, chestnut oak regeneration was recorded by height class; the percentage cover of competing understory vegetation, including hayscented fern (Dennstaedtia punctilobula), mountain laurel (Kalmia latifolia), and heath shrubs (blueberry (Vaccinium spp.) and huckleberry (Gaylussacia spp.)), was estimated; and the presence or absence of deer browsing was recorded. At the plot level, slope percent and aspect were measured, and the diameter at breast height of all overstory trees was tallied by species. Chestnut oak basal area and the total basal area of other canopy trees were then calculated for each plot. In addition, soil type and elevation were derived from GIS layers for each plot.
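The plot-level basal area variables (BA chestnut and BA other) follow from the DBH tally with the standard stem cross-section formula, BA = π(DBH/2)². The sketch below is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes DBH is recorded in centimeters, expands plot totals to a per-hectare basis using the exact area of an 8.02 m radius plot, and uses a hypothetical tally format and hypothetical function names.

```python
import math

# Plot radius from the text; the per-hectare expansion uses the exact circle area.
PLOT_RADIUS_M = 8.02
PLOT_AREA_HA = math.pi * PLOT_RADIUS_M ** 2 / 10_000  # about 0.0202 ha


def tree_basal_area_m2(dbh_cm):
    """Stem cross-sectional area at breast height (m^2), assuming DBH in cm."""
    radius_m = dbh_cm / 200.0  # diameter in cm -> radius in m
    return math.pi * radius_m ** 2


def plot_basal_area_per_ha(tally, focal_species="chestnut oak"):
    """Chestnut oak vs. all-other overstory basal area for one plot, in m^2/ha.

    tally: iterable of (species, dbh_cm) pairs, one per overstory tree (hypothetical format).
    """
    totals = {"BA chestnut": 0.0, "BA other": 0.0}
    for species, dbh_cm in tally:
        key = "BA chestnut" if species == focal_species else "BA other"
        totals[key] += tree_basal_area_m2(dbh_cm)
    # Convert plot totals to per-hectare values.
    return {key: value / PLOT_AREA_HA for key, value in totals.items()}
```

For example, `plot_basal_area_per_ha([("chestnut oak", 20.0), ("red maple", 10.0)])` returns both totals scaled by roughly 49.5 (the inverse of the plot area in hectares).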

Aggregate height, the total height of all the individual seedlings of a given species, was used to describe chestnut oak seedling population abundance. Aggregate height is a composite measure of seedling size and density, which provides a comprehensive measure of cumulative regeneration potential for a given species or species group [8]. Average stand aggregate height (AH, in m/m²) was calculated as

$$\mathrm{AH} = \frac{1}{k}\sum_{i=1}^{k}\frac{\sum_{j=1}^{n_i} h_{ij}}{4.047},$$

where $h_{ij}$ is the height (in meters) of the $j$th seedling on the $i$th subplot, $n_i$ is the number of seedlings on subplot $i$, $k$ is the number of sampled subplots, and the coefficient 4.047 is the area of each subplot in square meters. Because of the nature of the field survey design (subplots are nested within plots and plots are nested within stands), spatial autocorrelation is inevitable. Therefore, hotspot detection was carried out at the stand level. To make the three methods comparable, an a priori threshold of 15 percent of stands with above-average regeneration abundance was set for all methods (i.e., stands within the highest 15 percent of aggregate height could be defined as hotspots). However, the actual number of stands included in the hotspots identified by each method varied according to statistical significance at an alpha level of 0.05.
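The stand-level aggregate height described above (the summed seedling heights on each subplot divided by the 4.047 m² subplot area, then averaged over the k sampled subplots) can be sketched as follows. This is a minimal illustration that assumes seedling heights are already expressed in meters (for example, height-class midpoints); the input layout and function name are hypothetical.

```python
SUBPLOT_AREA_M2 = 4.047  # subplot area in m^2 (the 4.047 coefficient in the text)


def stand_aggregate_height(subplot_heights):
    """Average aggregate height (m of seedling height per m^2) for one stand.

    subplot_heights: one entry per sampled subplot (k entries); each entry is the
    list of chestnut oak seedling heights h_ij in meters (empty if no seedlings).
    """
    k = len(subplot_heights)
    if k == 0:
        raise ValueError("at least one subplot is required")
    # Aggregate height per unit area on each subplot, then the mean over subplots.
    per_subplot = [sum(heights) / SUBPLOT_AREA_M2 for heights in subplot_heights]
    return sum(per_subplot) / k
```

Subplots without seedlings still count toward k, so they pull the stand average down, as the formula requires.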

2.2. Hotspot Detection

SaTScan can be used to detect areas of significantly higher or lower rates and to evaluate reported spatial, temporal, or spatiotemporal clusters [2]. Aggregate height, the sum of the heights of all the individual seedlings, does not look like a count variable. However, because height was recorded by height class, aggregate height can be considered a total number of height units, or biomass units, in a given area; hence, it can be treated as a count variable. A Poisson probability model was used under the assumption that aggregate height in each stand follows a Poisson distribution. Because the maximum potential regeneration of each stand cannot be obtained, the same fixed maximum potential regeneration population was assumed for every stand, using the observed maximum aggregate height of 8 m/m². Kulldorff's [2] scan statistic considers the collection of circles of all radii, centered on each sampling unit and constrained to contain less than x percent of the population at risk. In this study, x was initially set at 15 percent. The actual population at risk (the population within the identified clusters) is determined by the SaTScan program and can be higher or lower than 15 percent.
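The Poisson log-likelihood ratio that Kulldorff's circular scan statistic maximizes can be sketched as below. This is a minimal illustration, not SaTScan's implementation: it assumes stand centroids in projected coordinates, one "case" value per stand (its aggregate height), and an equal population at risk per stand (the assumed maximum of 8 m/m²). It scores only high-rate windows and omits the Monte Carlo replication that SaTScan uses to assess significance. Function names are hypothetical.

```python
import math


def poisson_llr(c, expected, total):
    """Kulldorff's Poisson log-likelihood ratio for one window (high rates only).

    c: observed "cases" inside the window; expected: cases expected inside under
    a uniform rate; total: all cases in the study area.
    """
    if c <= expected:
        return 0.0  # windows with no excess are not scored as hotspots
    inside = c * math.log(c / expected)
    outside_obs = total - c
    outside_exp = total - expected
    outside = outside_obs * math.log(outside_obs / outside_exp) if outside_obs > 0 else 0.0
    return inside + outside


def circular_scan(coords, cases, population, max_pop_fraction=0.15):
    """Return (best_llr, member_indices) over circles centred on every unit.

    Each circle grows by adding the next-nearest unit while the population inside
    stays at or below max_pop_fraction of the total population at risk.
    """
    total_cases = sum(cases)
    total_pop = sum(population)
    best_llr, best_members = 0.0, []
    for centre in coords:
        # Units ordered by distance from the centre define circles of growing radius.
        order = sorted(range(len(coords)), key=lambda j: math.dist(centre, coords[j]))
        members, pop_in, c_in = [], 0.0, 0.0
        for j in order:
            if pop_in + population[j] > max_pop_fraction * total_pop:
                break
            members.append(j)
            pop_in += population[j]
            c_in += cases[j]
            expected = total_cases * pop_in / total_pop
            llr = poisson_llr(c_in, expected, total_cases)
            if llr > best_llr:
                best_llr, best_members = llr, list(members)
    return best_llr, best_members
```

With 50 stands of equal population and x = 15 percent, each candidate circle can hold at most seven stands. The most likely cluster is the window with the largest ratio. Its p-value would come from repeating the scan on data simulated under the null hypothesis, which this sketch leaves out.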

ClusterSeer provides spatial, temporal, and spatiotemporal clustering methods to evaluate disease clusters and nondisease events. Because aggregate height represents group-level oak regeneration abundance, Besag and Newell's [9] method was used to detect global spatial clusters. As with the SaTScan method, a maximum potential regeneration population was assumed, corresponding to an aggregate height of 8 m/m². For each sampling unit, Besag and Newell's method considers only the smallest circle that contains at least k cases, for an arbitrarily selected k. The cluster cutoff size (k) was set at 14 in this study. This value both provided enough power to detect clusters and made the total number of stands included in the clusters comparable to that of the other methods.
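Besag and Newell's procedure can be sketched in its usual nearest-neighbor form. Around each unit, neighbors are added until at least k cases are enclosed. The significance of that circle is then the Poisson probability 1 − Σ_{s=0}^{k−1} e^{−λu}(λu)^s/s!, where λ is the overall case rate and u is the population inside the circle. This is a minimal sketch, not ClusterSeer's implementation. It assumes continuous aggregate-height values are treated as case counts and that every stand has the same population at risk. Function names are hypothetical.

```python
import math


def poisson_sf_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean) and integer k >= 0."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for s in range(k):
        cdf += term               # add P(X = s)
        term *= mean / (s + 1)    # advance to P(X = s + 1)
    return max(0.0, 1.0 - cdf)


def besag_newell(coords, cases, population, k):
    """Besag-Newell style search around every unit.

    For each centre, nearest units are added until at least k cases are enclosed.
    The p-value is the Poisson probability of seeing >= k cases in that much
    population at the overall rate; small values flag compact, case-rich areas.
    Returns a list of (centre_index, member_indices, p_value).
    """
    rate = sum(cases) / sum(population)
    results = []
    for i, centre in enumerate(coords):
        order = sorted(range(len(coords)), key=lambda j: math.dist(centre, coords[j]))
        members, c_in, pop_in = [], 0.0, 0.0
        for j in order:
            members.append(j)
            c_in += cases[j]
            pop_in += population[j]
            if c_in >= k:
                break
        p_value = poisson_sf_at_least(k, rate * pop_in) if c_in >= k else 1.0
        results.append((i, members, p_value))
    return results
```

A larger k gives each test more power but produces wider circles. This trade-off is why the cutoff (14 here) also controls how many stands end up in the reported clusters.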

As a nonspatial, nonparametric method, CART offers an alternative for exploring chestnut oak regeneration hotspots. CART addresses two companion questions in one analysis: do groups of stands differ significantly from one another, and which variables best account for these differences? The CART method partitions data recursively into subsets and provides a tree-like classification. The recursive approach allows it to capture relationships that make sense ecologically but are difficult to reconcile with conventional linear models. SPSS 13.0 (SPSS Inc., Chicago, Illinois, U.S.) was used to run the CART analysis. Because the response variable, aggregate height, was treated as an ordinal variable, node splitting and category merging were determined by a chi-square test calculated with the likelihood-ratio method (CHAID criterion).
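The likelihood-ratio chi-square split criterion mentioned above can be illustrated in pure Python. This sketch is not SPSS's CHAID. It scores each categorical predictor against a categorical response (for example, aggregate-height classes) using G² = 2 Σ O ln(O/E), converts G² to a p-value, and selects the predictor with the smallest p-value. CHAID's category merging, Bonferroni adjustment, and stopping rules are omitted. Function names and example data are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, y), built up by Q(a+1) = Q(a) + y^a e^-y / a!
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0             # Q(1, y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y)
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def likelihood_ratio_g2(predictor, response):
    """G^2 and degrees of freedom for a predictor-by-response contingency table."""
    n = len(response)
    cells = Counter(zip(predictor, response))  # empty cells contribute 0 to G^2
    row, col = Counter(predictor), Counter(response)
    g2 = 0.0
    for (r, c), observed in cells.items():
        expected = row[r] * col[c] / n
        g2 += 2.0 * observed * math.log(observed / expected)
    return g2, (len(row) - 1) * (len(col) - 1)


def best_split(predictors, response):
    """Return (p_value, name, g2) of the predictor most associated with the response."""
    scored = []
    for name, values in predictors.items():
        g2, df = likelihood_ratio_g2(values, response)
        if df > 0:
            scored.append((chi2_sf(g2, df), name, g2))
    return min(scored) if scored else None
```

Applying `best_split` again inside each resulting group, until no split is significant, gives the recursive partitioning that produces the tree.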

Chestnut oak regeneration site affinity was then inferred from the relative frequency distributions of associated factors among the hotspots identified by the three methods. For each variable, relative frequency was calculated as the number of stands identified as hotspots divided by the total number of stands within each range of that variable. In this study, relative frequency distributions of elevation, overstory basal area of chestnut oak (BA chestnut), overstory basal area of other species (BA other), and percentage cover of competing understory species were calculated.
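The relative frequency calculation above amounts to a binned ratio. The sketch below illustrates it with hypothetical bin edges and a hypothetical function name. The source does not give the exact ranges used for each variable.

```python
from bisect import bisect_right


def relative_frequency(values, is_hotspot, bin_edges):
    """Share of stands flagged as hotspots within each bin of one variable.

    bin_edges: sorted interior edges, e.g. [400, 500] gives the bins
    (< 400, 400 to < 500, >= 500). Returns one fraction per bin
    (None where a bin holds no stands).
    """
    n_bins = len(bin_edges) + 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for value, flag in zip(values, is_hotspot):
        b = bisect_right(bin_edges, value)  # index of the bin containing value
        totals[b] += 1
        hits[b] += int(flag)
    return [h / t if t else None for h, t in zip(hits, totals)]
```

Because the denominator is the number of stands in each bin, a bin with few stands can show a high relative frequency from only one or two hotspot stands. Such values should be read with caution.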

3. Results

3.1. Identified Hotspots

SaTScan identified three chestnut oak regeneration hotspots in the study area. These hotspots were concentrated in the southwestern part of the study area (Figure 1(a)). Each dashed circle represents a regeneration hotspot. The number of stands within each of the three hotspots ranged from one to six, and hotspot radii ranged from 0.2 to 15.9 km. Hotspot radius was inversely related to the average aggregate height in each hotspot. In total, eight stands were included in the three hotspots, and their average aggregate height ranged from 0.16 to 1.68 m/m² (Table 1). Chestnut oak regeneration hotspots were concentrated in the Ridge and Valley province, and no hotspot was detected in the Allegheny Plateau physiographic province.

ClusterSeer identified five nonoverlapping regeneration hotspots, concentrated in the southern part of the study area in the Ridge and Valley province (Figure 1(b)). The number of stands within each hotspot ranged from one to three. Again, hotspot radius was inversely related to the average aggregate height in each hotspot. In total, eight stands were included in the five hotspots, and their average aggregate height ranged from 0.09 to 1.68 m/m² (Table 1).

The CART analysis produced two major branches split by physiographic province (Figure 2). The Allegheny Plateau branch and the Ridge and Valley branch each consisted of two terminal nodes. Physiographic province and deer browsing were the two major factors significantly associated with regeneration abundance. For comparability with the scan statistics, stands in Node 6, which had the highest average chestnut oak aggregate height, were defined as regeneration hotspots; this node included a total of 10 stands. Because stand location is not considered in this method, stands within the same hotspot may not be spatially close to one another. Average aggregate height of stands in Node 6 ranged from 0.04 to 0.76 m/m² (Table 1). The regional distribution of stands classified as high-regeneration areas by the CART analysis is mapped in Figure 1(c). Chestnut oak regeneration hotspots were again found only in the Ridge and Valley province; however, they were distributed throughout that province.

3.2. Conformity of Hotspots

Oak regeneration hotspots defined by the three methods had a moderate level of conformity. The percentage of stands common to the hotspots of different methods ranged from 44 to 56 percent (Table 2). Overall, hotspots defined by the SaTScan method had the highest matching percentage with the other methods. The regional distributions of hotspots defined by the three methods showed both similarities and differences. All stands identified as hotspots by the three methods were located in the Ridge and Valley province (Figure 1). However, apart from the three stands in the southwestern part of the study area that were consistently recognized as regeneration hotspots, the spatial patterns of hotspots identified by the three methods were quite different.
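Pairwise conformity can be computed from the sets of stands that each method flags. The text does not state which denominator was used for the matching percentages. The sketch below therefore assumes an asymmetric definition: the percentage of method A's hotspot stands that method B also flags. The stand IDs in the usage note are hypothetical and do not reproduce Table 2.

```python
from itertools import permutations


def pairwise_conformity(hotspots):
    """Percent of each method's hotspot stands that another method also flagged.

    hotspots: dict mapping method name -> set of stand IDs.
    For the ordered pair (A, B) the value is |A & B| / |A| * 100 (assumed definition).
    """
    return {
        (a, b): 100.0 * len(hotspots[a] & hotspots[b]) / len(hotspots[a])
        for a, b in permutations(hotspots, 2)
        if hotspots[a]
    }
```

For example, if one method flags stands {1, 2, 3, 4} and another flags {1, 2, 5, 6, 7}, the two ordered percentages are 50 and 40. Under this definition, a method that flags fewer stands tends to show a higher matching percentage.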

3.3. Associated Factors

As identified by the CART method, physiographic province was the dominant factor associated with chestnut oak regeneration abundance. Deer browsing was the other significant factor; however, it was not reflected in the hotspots identified by the SaTScan and ClusterSeer methods, which do not use explanatory variables. Fifty and 37 percent of stands within hotspots identified by ClusterSeer and SaTScan, respectively, had significant deer browsing problems.

Relative frequency distributions of other biotic and abiotic factors associated with the hotspots detected by the different methods are presented in Figure 3. About 25 to 57 percent of stands with elevations between 400 and 500 m were identified as chestnut oak regeneration hotspots (Figure 3(a)). The relative frequency of stands identified as chestnut oak hotspots increased as the basal area of mature overstory chestnut oak increased (Figure 3(b)). This trend held for all three methods when overstory chestnut oak basal area was below 0.4 square meters per hectare. Relative frequency distributions of non-chestnut-oak overstory basal area were less consistent among the three methods. Except for the CART method, the percentage of stands identified as hotspots was highest (50 percent) when the basal area of other overstory trees was below 0.75 square meters per hectare (Figure 3(c)). The relative frequency of stands identified as chestnut oak hotspots generally decreased as the percent cover of hayscented fern and mountain laurel increased (Figures 3(d) and 3(e)). It was greatest when the cover of heath species (mainly blueberry and huckleberry) was around 20 percent (Figure 3(f)).

4. Discussion

Site affinity analyses indicate that the three methods were in general agreement about the association between regeneration abundance and environmental factors. It is not surprising that chestnut oak regeneration hotspots were concentrated in the southern portion of the study area. Oak forests are the dominant natural vegetation in the Ridge and Valley region, and they transition into Allegheny hardwoods moving from south to north on the Allegheny Plateau [10]. The elevation range in which the hotspots occur coincides with the elevation of ridge tops in the Ridge and Valley province. This agrees with other research findings that abundant chestnut oak regeneration is commonly associated with ridge tops and upper slope positions [11]. Chestnut oak regeneration is favored by the presence of mature overstory trees of its own species and inhibited by other species. This association is mainly explained by the biological characteristics of chestnut oak. Although small mammals and birds may disperse acorns over long distances [12, 13], most acorns simply fall to the ground and remain near the parent tree. More adult oak trees produce more acorns, which increases the likelihood that more oak seedlings become established. Another biological characteristic of chestnut oak is that its seedlings are intolerant of shade [14]. As overstory density increases, less sunlight penetrates the canopy to reach the forest floor, which can reduce oak seedling growth and survival. Thus, an abundance of overstory nonoak species reduces the establishment and growth of chestnut oak regeneration, whereas an abundance of overstory chestnut oak provides a sufficient seed source that promotes the establishment of oak seedlings.

Hayscented fern in mixed-oak forest understories appears to suppress desirable tree seedlings by decreasing light quantity and quality beneath the herbaceous layer [15]. Hayscented fern has been classified as a competitor species because it can respond aggressively to sudden increases in resource availability, both through vegetative expansion by rhizomes and through sexual reproduction [16, 17]. Mountain laurel is another strong competitor of tree regeneration because of its aggressive vegetative growth habit [18]. Chapman [19] reported that light levels beneath mountain laurel canopies may be only about two percent of full sunlight. Blueberry and huckleberry, on the other hand, had a positive association with chestnut oak regeneration when their cover was moderate. These results match previous research that found abundant chestnut oak regeneration was normally associated with blueberry and huckleberry cover [20, 21]. Rogers [22] pointed out that heath communities dominated by blueberry and huckleberry have an affinity for infertile sites with well-drained acidic soils. The affinity of chestnut oak for similar environmental conditions may at least partially explain why chestnut oak regeneration was associated with blueberry and huckleberry.

Hotspots defined by the two spatial scan statistics and the CART method had a moderate level of conformity. Both SaTScan and ClusterSeer are designed to assess the significance of spatial clusters. Although hotspots detected by the two methods overlap to some extent, the overall patterns of the identified hotspots are rather different, because SaTScan and ClusterSeer apply two different scan statistics. SaTScan applies Kulldorff's circular scan statistic, which considers the collection of circles of all radii, centered on each sampling unit and constrained to contain less than x percent of the population at risk. ClusterSeer, in contrast, applies Besag and Newell's circular scan statistic, which considers the smallest circle that contains at least k cases, for an arbitrarily selected k. Both methods use circles as the scanning window, which unavoidably includes some low-regeneration areas in hotspots. For example, although deer browsing is known to hinder oak regeneration, 50 percent of the stands identified by ClusterSeer as regeneration hotspots had deer browsing problems, suggesting that some stands identified by ClusterSeer were not true chestnut oak regeneration hotspots. Similarly, hotspots detected by SaTScan suggest that some level of hayscented fern cover favors chestnut oak regeneration (Figure 3(d)), which indicates that some stands identified by SaTScan did not have high chestnut oak regeneration potential.

CART analysis simultaneously defined subgroups and considered associated biotic and abiotic variables. However, it did not directly define which subgroups are hotspots, and it ignored the spatial association among stands. CART partitioned stands recursively into relatively homogeneous groups sharing a common set of biotic and abiotic variables. This process allowed areas with high regeneration potential but low regeneration abundance to be considered regeneration hotspots. Meanwhile, because the results of CART analysis depend strongly on the biotic and abiotic variables included in the partitioning process, hotspots identified by the CART method can be biased by the investigator's selection of variables, or can be misleading if important variables are omitted. Nevertheless, if appropriate and sufficient variables are included in the model, the CART method is attractive for this type of study. In addition, although CART is not designed for detecting hotspot clusters, it provides an alternative when geospatial references are lacking.

Because the results of the three methods are region- and species-specific, it would be unwise to draw a general conclusion about which method is best for detecting regeneration hotspots, nor was that the intention of this study. Each method used in this study has its merits and limitations (Table 3). Among the three methods, only the CART method requires explanatory variables to be included in the analysis, but it also provides biological implications in its results. ClusterSeer requires users to have good knowledge of different statistical algorithms, but it gives users more flexibility to customize their model to better fit their data. Results from all methods are relatively easy to interpret.

This study provides an example of how hotspot analysis can be used in a natural resource management context. Results suggest that all three hotspot detection methods may be useful to natural resource managers as a means of identifying areas with high (hotspot) or low (cold-spot) measures. However, results from these methods are not ideal. Most natural resource data are complex and often cannot be adequately represented by a Poisson probability model, and hotspots are irregularly shaped, so a hotspot cluster can rarely be represented by a circle. As hotspot detection methods improve, for example through the development of irregularly shaped hotspot detection and the application of Bayesian statistics, hotspot detection will likely be more widely applied in forestry and other natural resource disciplines.

Acknowledgment

The author thanks Dr. G.P. Patil and Dr. K.C. Steiner for their intellectual support in developing this paper and Dr. J.M. Lhotka for his critical review of an earlier draft of this article.