Abstract

The areal extent of a biological community is usually determined using statistical techniques that only give reliable results where samples contain similar and high numbers of specimens. This paper presents a simple, inexpensive method for determining the geographical limits of biological communities that is applicable where adjacent samples contain widely differing numbers of specimens. The method is a development of SHE Analysis, which discerns boundaries between adjacent abundance biozones (ABs), an AB being an area with a distinct community structure. As originally conceived, SHEbi (SHE Analysis for the identification of Biozones) commences with species' absolute abundances and works best with large samples of equal sizes. If the variance in N (the number of specimens per sample) is high, SHEbi may place AB boundaries in unexpected locations. A modification is developed here using species' proportional abundances ($p_i = n_i/N$) for each sample, where $n_i$ is the number of specimens of the ith species in the sample. For intertidal foraminifera from the Caroni Swamp, Trinidad, where N fluctuates widely between samples, the modification (SHEbip) gives ecologically more sensible results than does traditional SHEbi.

1. Introduction

“a statistical analysis or test is not endowed with metaphysical properties; it cannot create good results from bad data!” [20, page 9]

A biological community is a group of interdependent organisms that lives and interacts within a habitat, such as fishes on a coral reef, birds in a forest canopy, or foraminifera within a mangrove swamp. A primary goal of both ecology and paleoecology is to understand the patterns by which groups of species are associated and distributed on the biosphere, both at present and through geologic time. The development of robust quantitative methods for grouping similar samples taken from the same biological community is vital for the recognition of biological communities that are real and not mere statistical artefacts. The boundaries between adjacent biological communities are detected using variations in assemblages of species across an area, adjacent communities containing either different dominant species or different species altogether. The methods commonly used to determine these boundaries are, however, reliable only in the limited circumstances where sample sizes are uniform and large. This paper presents a novel technique for discerning the boundaries between biological communities that requires only Microsoft Excel, or a similar spreadsheet programme, and can be applied to data where the variance in sample size is high.

2. The Basis of SHEbi

SHE Analysis for Biozone Identification (hereafter abbreviated as SHEbi) is a relatively new technique that groups samples within an abundance biozone (AB) by accumulating species' abundance data one sample at a time along a transect [1], an AB being an area within which the proportional abundance of a particular species or group of species differs significantly from that in adjacent areas.

To demonstrate how SHEbi is conducted, this paper first defines some basic terms (see also Glossary of Symbols), using as an example the steps followed in this study of intertidal foraminifera in the Caroni Swamp, Trinidad, a small island developing state located in the SE Caribbean Sea.

The intertidal area in this swamp supports a population of foraminifera, the population being the set of all the foraminifera living within the study area. First, a cupful of intertidal sediment is collected from the study area. It is washed to remove clay, silt, and large fragments of organic matter, leaving sand and a sample of intertidal foraminifera, the sample being a subset of the population. One specimen (a single foraminifer) is picked from the sample and identified. A second specimen is then picked and identified. Because biological communities usually comprise a number of species, the second specimen may belong either to the same species as the first or to a new one. Consequently, the number of species S of foraminifera identified from the sample will increase as the number of specimens N increases. As further foraminifera are picked and identified, a total of N specimens is accumulated from the one sample, for which $N = \sum_{i=1}^{S} n_i$, where $n_i$ is the number of specimens of the ith species in the sample. Species richness S has been considered a measure of diversity (e.g., [2]). Unfortunately, when comparing samples of different sizes, the number of species S identified is not a helpful parameter [3, 4], because within a sample S typically increases with N.

A better measure of diversity is Shannon's [5] information function H, which is based on proportional rather than absolute abundances. To obtain H for a sample, the proportional abundance $p_i$ of the ith species in the sample is first calculated from $p_i = n_i/N$. H is then defined by

$$H = -\sum_{i=1}^{S} p_i \ln p_i .$$

This term is also known as the Shannon-Wiener Index [6, 7]. Values of H are typically 1.5–3.5 and rarely exceed 4.5 [8]; only when a very large number of species is present (which would require an extremely large sample size N) does H exceed this value [9].

Once H has been obtained for the single sample, $e^H$ (the exponential of H) can be calculated. Jost [10] termed $e^H$ the “number equivalent” or “effective number of elements” of the information function. It gives the number of equally abundant species that would be needed to produce the calculated value of H [11]. Thus, when all species in a sample are equally abundant, $p_i$ is constant across all species and $e^H = S$. In practice, species vary in abundance within a population, such that some are common and some are rare in any particular sample, and $e^H < S$. The extent to which a few species dominate the sample (thus decreasing $e^H$) or, conversely, the degree to which species abundances are equitably distributed within it (thus increasing $e^H$) is termed the sample's population structure [1]. The value of $e^H/S$ gives a measure of the degree to which one or a few species dominate, and is termed the equitability index E [12, 13]. This index lies between 0 and 1 (strictly, between 1/S and 1, since $e^H \geq 1$), with lower values indicating greater dominance by a few species.
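The single-sample quantities defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' spreadsheet procedure; the function name `she_single` and the example counts are hypothetical, and natural logarithms are assumed throughout, as in the relation ln E = H − ln S used below.

```python
import math

def she_single(counts):
    """Return (N, S, H, e^H, E) for one sample, given per-species counts n_i.

    Species with a count of zero are ignored when computing S and H.
    """
    present = [n for n in counts if n > 0]
    N = sum(present)                                        # N = sum of n_i
    S = len(present)                                        # species richness
    H = -sum((n / N) * math.log(n / N) for n in present)    # p_i = n_i / N, natural log
    eH = math.exp(H)                                        # number equivalent of H
    E = eH / S                                              # equitability index
    return N, S, H, eH, E

# Hypothetical counts, for illustration only (not data from this study)
for counts in ([25, 25, 25, 25], [85, 5, 5, 5]):
    N, S, H, eH, E = she_single(counts)
    print(f"N={N} S={S} H={H:.3f} e^H={eH:.3f} E={E:.3f}")
```

For the perfectly even hypothetical sample, $e^H = S = 4$ and E = 1; concentrating most specimens in one species lowers $e^H$ well below S, and E falls accordingly.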

Thus we have SHE: S (species), H (information function), and E (equitability index). Figure 1 gives a cartoon of the entire process outlined above.

Taking the natural logarithm of the equitability index gives ln E = H − ln S. This shows that ln E (which is negative whenever $e^H < S$) is the residual remaining when ln S is subtracted from H. Sheldon [14] showed that E depends on the number of species S and that, within any one sample, E becomes progressively smaller as N (and S) increase. It follows that, as ln N increases, the increase in ln S must be balanced by changes in H, ln E or both. Buzas and Hayek [15] outline possible behaviours of H and ln E.
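The decomposition follows in two steps from the definition of E, with all logarithms natural:

```latex
E = \frac{e^{H}}{S}
\quad\Longrightarrow\quad
\ln E = H - \ln S
\quad\Longrightarrow\quad
\ln S = H - \ln E
```

Since $e^H \le S$ for any sample, ln E ≤ 0 and H can never exceed ln S; any rise in ln S that is not matched by a rise in H must therefore appear as a fall in ln E.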

If a graph of S against N is plotted for a single sample, the relationship between increasing S and N is usually so strong that under ideal circumstances the plot is asymptotic (e.g., [16]). Thus, a plot of ln S against ln N forms a straight line [1, 17]. In practice, the world is a noisy place and some deviation from a straight line usually occurs. Also, it is frequently necessary to accumulate several hundred specimens before the values of H become almost constant.

Buzas [18] hypothesised that most populations have a logarithmic series population structure. Hayek and Buzas [1] demonstrated that within a population with a logarithmic series structure, H becomes constant beyond a critical but variable value of ln N (see also [19]). So if, as an increasing number of specimens N is accumulated, a graph of H versus ln N is plotted, it will not be horizontal throughout, but will slope upwards until this critical value of ln N is attained (see [17, Figure , Station ]). In practice, beyond this critical value most additional species encountered are singletons (i.e., represented by single specimens only). The addition of a singleton to a large sample has negligible impact on H, the singleton having a very low proportional abundance [4]. Buzas [18] suggested that the logarithmic series population structure should be used as the null model for determining population structures.
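The negligible effect of a late-arriving singleton can be checked numerically. The sketch below uses hypothetical, geometrically declining counts (not data from this study) and compares the change in H when one singleton species is added to a small and to a large sample; the function names are illustrative only.

```python
import math

def shannon_H(counts):
    """Shannon information function H = -sum(p_i ln p_i); zero counts are ignored."""
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts if n > 0)

def singleton_effect(counts):
    """Change in H when one new species, represented by a single specimen, is added."""
    return shannon_H(counts + [1]) - shannon_H(counts)

# Hypothetical samples with geometrically declining counts
small = [8, 4, 2, 1, 1]                          # N = 16
large = [256, 128, 64, 32, 16, 8, 4, 2, 1, 1]    # N = 512

print(f"small sample: dH = {singleton_effect(small):.4f}")
print(f"large sample: dH = {singleton_effect(large):.4f}")
```

The added singleton raises H by roughly an order of magnitude less in the larger sample, in line with its much smaller proportional abundance there.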

Where a sample is large, usually only an aliquot (a fraction of the total N specimens) is picked. It is nevertheless assumed that these specimens have been taken randomly from an effectively infinite population [20], so that the sample, or an aliquot of it, is statistically representative. Where the population being studied comprises a taxonomically related set of species (e.g., foraminifera) within a community that includes other organisms such as birds, gastropods, and mangrove trees, the taxonomically restricted population (in this case limited to foraminifera) is termed a taxocene [21].

3. SHEbi

With the above in mind, SHEbi may now be introduced. It is a statistical technique used to identify the point at which the population structure of a taxocene changes as a linear transect of sequential samples crosses a boundary between adjacent abundance biozones (ABs), that is, crosses the boundary between two areas supporting populations with differing structures (i.e., with species present at differing proportional abundances) or compositions (i.e., with new species added in significant quantities). SHEbi can be used to define either modern (ecological; [15]) or ancient (both paleoecological and ecostratigraphic) AB boundaries based on changes in population structure or composition.

That SHEbi is not used widely may in part arise from the tediousness of calculating successive measures using spreadsheets [15]. It may also arise, however, from confusion engendered by the failure of previous workers to distinguish statistical measures obtained from a single sample from those derived from two or more accumulated but discrete samples. To overcome this confusion, several symbols are introduced here. N (where $N = \sum n_i$) denotes the number of specimens in a single sample, L the number of samples in the series, and M the number of specimens in the L accumulated samples. SA, HA and EA are used to distinguish (a) the values of these measures as computed from accumulated samples from (b) their values S, H and E as calculated from single samples.

In SHEbi, samples are accumulated along a line across the study area (a line transect), and ln NA, ln SA, and ln EA are recalculated as each new sample is added. Buzas and Hayek ([15, Figure ]) showed, using graphs of ln SA, HA and ln EA versus ln NA, that all three measures can change within an area with a uniform population structure (i.e., within an abundance biozone). HA will vary until enough specimens have been accumulated to exceed the critical value of M in an area with a logarithmic series population structure. This possibility notwithstanding, ln SA, HA and ln EA change more markedly at the point where the line transect moves between ABs having different population structures. Buzas and Hayek [15] concluded that ln EA versus ln NA is the most sensitive indicator of such a transition. The graph of ln EA versus ln NA is essentially linear within an AB but shows a marked change in slope at an AB boundary where (a) enough species have joined the accumulated samples to alter markedly the values of $p_i$ for at least some species, (b) species proportions within the accumulated assemblage have changed markedly without new species joining the community, or (c) both have occurred.
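The accumulation step of SHEbi can be sketched as follows. This is an illustrative Python version of what the text describes being done in a spreadsheet, not the published procedure: the function name `shebi_table` is hypothetical, samples are assumed to be count vectors aligned to a common species list and ordered along the transect, and the cumulative specimen count M is assumed to be the quantity plotted on the ln NA axis.

```python
import math

def shebi_table(samples):
    """Accumulate absolute abundances sample by sample (SHEbi-style sketch).

    samples: list of per-sample count lists, aligned to one species order and
    arranged in transect order. Returns one row per accumulation step:
    (L, M, ln M, ln S_A, H_A, ln E_A).
    """
    pooled = [0] * len(samples[0])
    rows = []
    for L, sample in enumerate(samples, start=1):
        pooled = [a + b for a, b in zip(pooled, sample)]   # add raw counts
        M = sum(pooled)                                     # cumulative specimens
        present = [n for n in pooled if n > 0]
        S_A = len(present)
        H_A = -sum((n / M) * math.log(n / M) for n in present)
        ln_E_A = H_A - math.log(S_A)                        # ln E_A = H_A - ln S_A
        rows.append((L, M, math.log(M), math.log(S_A), H_A, ln_E_A))
    return rows

# Hypothetical transect of three samples and four species (illustration only)
transect = [[40, 30, 20, 10], [40, 30, 20, 10], [5, 5, 10, 80]]
for L, M, ln_M, ln_S, H, ln_E in shebi_table(transect):
    print(f"L={L} M={M} lnM={ln_M:.3f} lnS_A={ln_S:.3f} H_A={H:.3f} lnE_A={ln_E:.3f}")
```

Plotting the last column against ln M and looking for a break in slope is the step at which an AB boundary would be drawn. In this toy transect the second sample repeats the first, so H_A is unchanged between L = 1 and L = 2, while the third, differently structured sample shifts it.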

SHEbi uses the successive addition of samples in a series, recalculating the information function HA and related measures (species richness SA and the equitability index EA) as samples are accumulated. Where an additional sample has the same composition as the previous samples, there is no significant change in the value of HA. This contrasts with raw species richness, which increases with the greater overall sample size and is balanced by a decrease in the equitability index [7, 10, 22]. Crossing an AB boundary results in sampling of a new community, with sharp jumps in SA, HA and EA indicating significant changes in the composition and structure of the population sampled. One challenge facing this cumulative approach is that the accumulated list eventually becomes so large that even the addition of a sample with a substantially different composition need not have a large effect on HA and EA [23]. This paper introduces a method termed SHEbip for use where the standard deviation of the sample size is high (75%) relative to the mean. Under such circumstances, SHEbip balances the samples by weighting them all equally, increasing the accuracy with which SHE Analysis identifies ecologically meaningful ABs where N varies widely among samples. SHEbip recognizes dissimilar assemblages that are represented by relatively few specimens. Although this phenomenon is investigated using data from intertidal foraminifera from Trinidad, West Indies, the method is applicable to all communities.

4. The Proposed Development: SHEbip

SHEbi commences with a table of samples and species' absolute abundances, such that M comprises the cumulative number of specimens encountered as the samples L are accumulated. In SHEbip (SHE Analysis for Biozone Identification based on Proportional Abundances), analysis instead commences with a table of proportional abundance ($p_i$) data. Since for a single sample $p_i = n_i/N$, S, H and E do not differ for that sample whether $n_i$ or $p_i$ is used. When a table of proportional abundances is used for SHEbip, however, the accumulation runs over L = 1, 2, 3, …, x, where x is the total number of samples accumulated, and the pooled proportional abundance of the ith species is

$$\bar{p}_i = \frac{1}{L}\sum_{j=1}^{L} p_{ij},$$

where $\sum_{j} p_{ij}$ is the sum of the proportional abundances of the ith species across all the samples accumulated and L is the number of samples accumulated. As successive samples are accumulated, HA is recalculated using each species' mean proportional abundance in those samples. Where N varies widely from sample to sample, this will induce differences in HA as compared with HA computed using SHEbi (which uses raw abundance data). Nevertheless, because ln SA is the same for both methods, the relation ln SA = HA − ln EA holds true whether SHEbi or SHEbip is employed, and it follows that any differences in HA between SHEbi and SHEbip must be matched by differences in ln EA. Whereas in SHEbi an AB boundary is drawn wherever a graph of ln EA versus ln NA shows a break in slope, in SHEbip it is drawn where there is a break in slope on a graph of ln EA versus ln NS.
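A matching sketch of the SHEbip accumulation, under the same kind of assumptions (hypothetical function name `shebip_table`; count vectors aligned to one species list and ordered along the transect), converts each sample to proportions before pooling, so that the pooled value for each species is its mean proportional abundance over the L samples accumulated, as described above. Because the ln NS axis is not defined further in the text, the sketch simply reports L.

```python
import math

def shebip_table(samples):
    """Accumulate proportional abundances sample by sample (SHEbip-style sketch).

    Each sample is first converted to proportions p_ij = n_ij / N_j, so every sample
    carries equal weight regardless of N. After L samples the pooled proportion of
    species i is sum_j(p_ij) / L. Returns rows of (L, ln S_A, H_A, ln E_A).
    """
    sum_p = [0.0] * len(samples[0])
    rows = []
    for L, sample in enumerate(samples, start=1):
        N = sum(sample)
        sum_p = [s + n / N for s, n in zip(sum_p, sample)]  # running sum of p_ij
        mean_p = [s / L for s in sum_p]                     # mean proportions, sum to 1
        present = [p for p in mean_p if p > 0]
        S_A = len(present)
        H_A = -sum(p * math.log(p) for p in present)
        ln_E_A = H_A - math.log(S_A)                        # ln E_A = H_A - ln S_A
        rows.append((L, math.log(S_A), H_A, ln_E_A))
    return rows

# Two hypothetical samples of equal size (N = 60): pooling proportions and pooling
# raw counts give the same assemblage, so H_A matches what SHEbi would compute.
equal_n = [[30, 20, 10, 0], [20, 30, 0, 10]]
for row in shebip_table(equal_n):
    print(row)
```

When N is the same in every sample the two methods coincide; they diverge only as N varies between samples, which is the situation SHEbip is intended for.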

The difference is illustrated here using two model data sets (Table 1, Figure 2) that show how the calculations are made. We used Microsoft Excel for our calculations. In Data Set 1, N is constant at 375 specimens per sample and M across all four samples is 1500. The addition of abundant Species E in sample S3 marks the move from one AB to another. This is reflected by a change in slope (here an increase) in the graph of ln EA versus ln NA, whether ln EA is calculated using SHEbi or SHEbip (Figures 2(a) and 2(b), resp.). This will not be the case, however, where there are insufficient specimens in the added sample S3. In Data Set 2, sample S3 yielded only N = 25 specimens but marked the first proportionally abundant occurrence of Species F. When examined using SHEbi (Figure 2(c)), there is only a slight step between samples S2 and S3 that may be dismissed as being too subtle to be significant (cf. [15, page 237]). The significance of this break can be tested using simultaneous confidence intervals [24], but this can be tedious where many species are involved, because simultaneous confidence intervals must be calculated for every species. Re-examination with SHEbip instead reveals a marked step between S2 and S3 indicative of an AB boundary (Figure 2(d)).
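The effect exploited in Data Set 2 can be reproduced with a small hypothetical example. The counts below are invented for illustration and are not the values in Table 1, and the helper names are likewise hypothetical. A small third sample dominated by a new species (here called F) is pooled first by summing counts, as in SHEbi, and then by averaging per-sample proportions, as in SHEbip.

```python
import math

def pooled_proportions(samples, proportional):
    """Pool aligned per-sample counts into one set of species proportions.

    proportional=False: sum raw counts, so large samples dominate (SHEbi-style).
    proportional=True: average per-sample proportions, so each sample has equal
    weight (SHEbip-style).
    """
    if proportional:
        shares = [[n / sum(s) for n in s] for s in samples]
        return [sum(col) / len(samples) for col in zip(*shares)]
    totals = [sum(col) for col in zip(*samples)]
    M = sum(totals)
    return [n / M for n in totals]

def info_H(props):
    """Information function H = -sum(p ln p) over species with p > 0."""
    return -sum(p * math.log(p) for p in props if p > 0)

# Hypothetical data in the spirit of Data Set 2 (NOT the values in Table 1):
# species A-F; sample S3 is small (N = 25) but dominated by species F.
S1 = [150, 100, 75, 50, 0, 0]
S2 = [140, 110, 70, 55, 0, 0]
S3 = [2, 2, 1, 0, 0, 20]

for proportional in (False, True):
    before = pooled_proportions([S1, S2], proportional)
    after = pooled_proportions([S1, S2, S3], proportional)
    label = "proportional (SHEbip-style)" if proportional else "absolute (SHEbi-style)"
    print(f"{label}: share of F = {after[5]:.3f}, "
          f"H_A {info_H(before):.3f} -> {info_H(after):.3f}")
```

With raw counts, F makes up under 3% of the pooled assemblage and H_A rises only modestly when S3 is added; with mean proportions, F contributes about a quarter of the pooled assemblage and H_A rises considerably more, the kind of difference that, by the relation ln SA = HA − ln EA, is carried through to ln EA.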

5. Materials and Methods

Wilson et al. [25] provide a description of the study area, which lies near the mouth of the Blue River in Caroni Mangrove Swamp, Trinidad. Samples of 75 mL each were taken along three line transects (Figure 3), each sample comprising the top centimetre of sediment. Samples from transects T1 and T2 were taken at 1 m horizontal intervals, while from the less steeply shelving transect T3 they were collected at 2 m horizontal intervals. Sample altitudes relative to annual mean sea level (AMSL) were determined using levelling and GPS. Transect T1 lay ~1 m south of transect C1A of Wilson et al. [25]. Within 48 hours of collection all samples were washed and sieved over a 1 mm mesh to remove coarse organic fragments, and a 63 µm mesh to remove mud and silt. Because this study examined total (live + dead) foraminiferal assemblages, the washed sample residues were stored in fresh water but not stained with rose Bengal.

Foraminifera were picked from the wet residues. An attempt was made to pick ~250 specimens from all residues, but some yielded considerably fewer. Specimens were identified to species level using especially Todd and Bronnimann [26], Saunders [27, 28], and Boltovskoy and Hincapié de Martínez [29]. Wilson et al. [25] gave brief taxonomic details.

As the aim of this paper is to compare how SHEbi and SHEbip behave where N varies markedly between samples, and not to document how AB boundaries differed between the three line transects, all three transects were spliced on the basis of increasing altitude relative to AMSL only. (Other splicing methods, such as ordering samples using detrended canonical analysis, might indicate different AB boundaries.) SHEbi and SHEbip were conducted for the three spliced transects and the results compared. ABs discerned by SHEbi are distinguished using italicised uppercase letters, and those indicated by SHEbip using italicised numerals.

6. Results

6.1. General Characteristics of the Fauna

A total of 34 samples were recovered from the three transects and yielded a total of 3638 specimens of benthonic foraminifera in 33 species. The altitudes of the individual samples relative to AMSL ranged from βˆ’1.18 m to 0.34 m. For the 34 samples, N varied from 0 to 377 foraminifera (mean = 107, standard deviation [S.D.] = 120.7).

Further analyses were, therefore, restricted to those L = 23 samples (~68% of those collected) that yielded ≥20 specimens (Table 2), on the grounds that within these samples H was not correlated with N. Within these samples the total number of specimens recovered (M) was 3547 foraminifera. The most abundant species were Ammonia sp. (31% of total recovery from these 23 samples), Trochammina advena (22%), Arenoparrella mexicana (20%), and T. inflata (10%). Ammonia sp. dominated the four samples farthest below AMSL (T1-11 through T1-8), which collectively yielded ~30% of the total specimens recovered from the 23 samples analysed.

Transect T1 contained 7 of the 23 samples (mean N = 199 foraminifera, S.D. = 125), T2 contained 6 (mean N = 119 foraminifera, S.D. = 101), and T3 contained 10 (mean N = 144 foraminifera, S.D. = 131). Three samples (T1-10, T3-9 and T3-3, Figure 4) each yielded ~10% or more of the total recovery from the 23 samples, and were spread throughout the transects. Five samples each yielded ~1% or less of the total recovery. Ammonia sp. was most abundant towards the base of the combined transects, Trochammina advena towards the middle, and T. inflata towards the top (Figure 5). Arenoparrella mexicana showed two peaks in proportional abundance.

There was no significant difference between the mean yields of samples from transect T1, with the highest mean, and T2, with the lowest (Student's t-test; t = 1.255, critical t = 2.201 at α = 0.05, d.f. = 11). Thus, the observed variations in N have not arisen from amalgamating transects with differing mean population densities. For all 23 samples, the mean N was 154 and the S.D. 121, the S.D. being ~78% of the mean. N was not significantly correlated with S (r = 0.365, P = .087) or H (r = 0.237, P = .277), but was significantly correlated with E (r = −0.755, P = .001). S did not show any trend throughout these samples, but H and E were markedly lower in the four samples near the base of the merged transects that were dominated by Ammonia sp. (Figure 3). For the 23 samples, per sample N as a percentage of the total recovery (i.e., total M) varied between 1% and 11% (mean 4.4%, S.D. 3.4%; Figure 5).
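The reported t statistic can be approximately reproduced from the rounded summary values given for transects T1 and T2. The sketch assumes the equal-variance (pooled) form of Student's t-test, which is consistent with the stated d.f. = 11 = 7 + 6 − 2; the value 2.201 is the two-tailed critical t at α = 0.05 for 11 degrees of freedom. The function name is hypothetical.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean1 - mean2) / se, df

# Rounded means, S.D.s and sample counts reported for transects T1 and T2
t, df = pooled_t_from_summary(199, 125, 7, 119, 101, 6)
print(f"t = {t:.3f}, d.f. = {df}")
```

The result (about 1.25) is close to the published 1.255, the small difference presumably arising from rounding of the means and standard deviations, and well below 2.201. Where SciPy is available, `scipy.stats.ttest_ind_from_stats` performs the same calculation and also returns a P value.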

7. SHEbi and SHEbip

Both SHEbi and SHEbip were applied to the 23 samples with ≥20 specimens. They indicated complex but markedly different patterns of abundance biozones (ABs), SHEbi suggesting there were eight ABs and SHEbip nine (Table 3, Figures 6 and 7). The number of samples per AB as indicated by SHEbi ranged from two to five, whereas for SHEbip the number ranged from one to five. Sample T3-9, although comprising ~10% of the recovery, was not differentiated as a separate AB by either SHEbi or SHEbip. Only four AB boundaries indicated by SHEbip coincided with those from SHEbi, and only two ABs were identical between the two methods (AB1 from SHEbip with ABA from SHEbi, and AB9 from SHEbip with ABH from SHEbi).

8. Discussion

The results from both SHEbi and SHEbip reflect complex fluctuations in the proportional abundances of species (Figure 5). Examination of the raw data shows, however, that, owing to fluctuations in N, use of SHEbi induced spurious placement of AB boundaries along the merged transects T1 through T3.

The lowest four samples (T1-11 through T1-8) yielded 1083 foraminifera (~30% of the total recovery), in which the proportion of Ammonia sp. per sample ranged from 87% to 97% (mean 94%). In the fourth sample, N = 200. Arenoparrella mexicana was represented in these four samples by four specimens only (i.e., ~0.4% of the recovery from them), and Trochammina advena by forty (~3.7% of recovery). Neither Miliammina fusca nor Siphotrochammina lobata was recovered from the lowest four samples. In the succeeding samples T1-7 (N = 25 foraminifera) and T3-12 (N = 117), the proportional abundance of Ammonia sp. dropped markedly, to only 16% and 20% of the recovery from T1-7 and T3-12, respectively. Meanwhile the proportional abundance of A. mexicana increased to 28% and 51%, respectively. Trochammina advena was also more abundant in T1-7 and T3-12 than below, comprising 12% and 18% of these two samples, respectively, while M. fusca and S. lobata formed 24% and 16% of the recovery from T1-7, respectively.

The statistical validity of the changes in the proportional abundances of Ammonia sp. and T. advena between T1-8 and T1-7 was tested using simultaneous confidence intervals [24], with the significance level chosen to avoid a Type II statistical error. (This test could not be applied to A. mexicana, S. lobata and M. fusca because these were not recovered from T1-8.) The results indicate that the decrease in the proportional abundance of Ammonia sp. between T1-8 and T1-7 is statistically significant, but that the change in T. advena is not. Because the decrease in Ammonia sp. is coupled with the appearance of A. mexicana, S. lobata and M. fusca, it is concluded that there is a change in population structure and composition between samples T1-8 and T1-7.

In line with the above, both SHEbi and SHEbip placed an AB boundary after the first three samples. However, whereas SHEbip placed a boundary (between ABs 2 and 3) after the first four samples, coincident with the fall in the proportional abundance of Ammonia sp., SHEbi did not, but instead placed the succeeding AB boundary between the fifth (T1-7) and sixth (T3-12) samples. With SHEbi, it is not until after the data from the fifth sample have been accumulated that enough specimens of other species have amassed to overpower the high numbers and proportions of Ammonia sp. in the fourth sample. Thus, this difference in boundary placement is due to the coupling of the dominant Ammonia sp. in sample T1-8 with the relatively small N for samples T1-7 and T3-12. Only SHEbip was able to overcome the impact of the difference in sample sizes N and delineate an AB boundary at this point.

In the preceding example, per sample N decreased markedly across the AB boundary detected using SHEbip. A second example shows that SHEbi may also miss AB boundaries across which per sample N increases. Both SHEbip and SHEbi placed a boundary between samples T3-4 and T3-7 (between ABs 5 and 6, and ABs E and F, respectively). Above this boundary, SHEbi grouped the next five samples as ABF. In contrast, SHEbip grouped the succeeding two samples T3-7 (N = 67) and T3-5 (N = 38) as AB6, and then distinguished the succeeding sample T3-3 (N = 355) as a separate AB7. The samples in AB6 contained means of ~25% T. advena, ~18% Ammotium distinctum, ~12% T. inflata, and ~12% Triloculina oblonga, together with 13%–24% Ammonia sp. and 0%–24% Trochammina inflata. The assemblage in the single-sample AB7, in contrast, contained ~60% T. advena, 18% T. oblonga and 0% each of Ammonia sp., T. inflata and A. distinctum. Wright and Hay [30] estimated that a sample size of N = 300 is needed to ensure with 95% confidence that all species with an abundance of 1.0% have been detected. Given that N in T3-3 exceeds this, it is concluded that the disappearance of Ammonia sp., T. inflata and A. distinctum from AB7 is a statistically significant phenomenon. Simultaneous confidence intervals showed that the difference in the proportional abundances of T. advena between AB6 and AB7 was statistically significant. There thus occurred a distinct change in the assemblage between AB6 and AB7 that warrants the placement of an AB boundary between them, as given by SHEbip, even though this was not detected by SHEbi.
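The basis of the N = 300 rule of thumb can be illustrated with a simple binomial argument. This is not necessarily the derivation Wright and Hay [30] used; it assumes specimens are drawn independently from an effectively infinite population, so that a species with proportional abundance p is missed by all N specimens with probability $(1-p)^N$. The function names are hypothetical.

```python
import math

def detection_probability(p, N):
    """Probability that a species with proportional abundance p appears at least once
    among N independently drawn specimens (binomial sampling assumption)."""
    return 1 - (1 - p) ** N

def min_sample_size(p, confidence=0.95):
    """Smallest N for which detection_probability(p, N) >= confidence."""
    # (1 - p) ** N <= 1 - confidence  <=>  N >= ln(1 - confidence) / ln(1 - p)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

print(min_sample_size(0.01))                       # N needed for a species at 1%
print(round(detection_probability(0.01, 355), 3))  # sample T3-3
```

For p = 0.01 and 95% confidence this gives N = 299, in close agreement with the figure quoted above, and for sample T3-3 (N = 355) the chance of detecting a species present at 1% is about 97%.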

It might be argued that SHEbip inserts an AB boundary wherever there is a large change in per sample N. One final example demonstrates that this is not the case. Both SHEbi and SHEbip place sample T3-9 (~11% of total recovery) within an AB with the preceding sample, despite the fact that in the underlying sample (T2-5) N was only 28 (~1% of total recovery).

The above examples demonstrate that SHEbip is useful where N varies significantly between samples. It must be stressed, however, that SHEbip is not intended to replace SHEbi, but rather to allow the recognition of AB boundaries under marginal circumstances where sample quality is poor and SHEbi cannot function fully. This ability to re-examine poor-quality data is surely to be welcomed (just as patients with rare diseases welcome any advances made in their treatment, despite these being based on studies with small sample sizes). SHEbip must not, however, be used indiscriminately or seen as a correction for SHEbi to be applied under all circumstances. Whereas the values of M, HA, SA, and EA from SHEbi can be further analysed using SHE Analysis for Community Structure Identification (SHECSI; see [18]), those from SHEbip cannot.

9. Conclusions

If the number of specimens in the samples taken along a line transect varies markedly, SHEbi may place an AB boundary at an unexpected position. In these cases SHEbi may be modified by using a table of proportional abundances as a starting point, the new method being termed SHEbip. Thus, abundance biozone boundaries can now be detected with confidence in situations where specimen recovery from samples is highly variable. Although SHEbip was here applied to intertidal foraminifera, it can be applied to any community in which N (per sample) fluctuates markedly. Both SHEbi and SHEbip can be conducted using widely available spreadsheet programmes such as Microsoft Excel.

Glossary of Symbols

N:the number of specimens picked from a sample
ni:the number of specimens of the ith species in a sample
pi:the proportional abundance of the ith species in a sample, ni/N
M:the number of specimens in an accumulated series of samples
L:the number of samples in an accumulated series of samples
S:the number of species present in a single sample
SA:the number of species in an accumulated series of samples
H:the value of the information function for a single sample, H = −Σ pi ln pi
HA:the value of the information function for an accumulated series of samples
E:the value of the equitability index for a single sample, E = e^H/S
EA:the value of the equitability index for an accumulated series of samples
SHEbi:SHE Analysis for Biozone Identification conducted using a matrix of species absolute abundances
SHEbip:SHE Analysis for Biozone Identification conducted using a matrix of species proportional abundances.

Acknowledgments

Thanks are due to bpTT Limited, who financed this paper and to Marty Buzas (Smithsonian Institution, Washington, DC) for comments during this paper’s very early stages. A contribution from the Campus Research and Publications Fund of the University of the West Indies, Trinidad, is gratefully acknowledged.