Abstract

We modelled the distributions of two toads (Bufo bufo and Epidalea calamita) in the Iberian Peninsula using the favourability function, which makes predictions directly comparable for different species and allows fuzzy logic operations to relate different models. The fuzzy intersection between individual models, representing favourability for the presence of both species simultaneously, was compared with another favourability model built on the presences shared by both species. The fuzzy union between individual models, representing favourability for the presence of any of the two species, was compared with another favourability model based on the presences of either or both of them. The fuzzy intersections between favourability for each species and the complementary of favourability for the other (corresponding to the logical operation “A and not B”) were compared with models of exclusive presence of one species versus the exclusive presence of the other. The results of modelling combined species data were highly similar to those of fuzzy logic operations between individual models, proving fuzzy logic and the favourability function valuable for comparative distribution modelling. We highlight several advantages of fuzzy logic over other forms of combining distribution models, including the possibility to combine multiple species models for management and conservation planning.

1. Introduction

Comparative distribution modelling (i.e., building models that combine or compare the distributions of different species) is a useful tool to assess differences and similarities between species' distribution areas and environmental correlates. It has been applied, for example, to species with partially overlapping distributions [1], genetically differentiated subspecific forms [2], cryptic species whose distribution data are difficult to assign [3, 4], and species linked by close biotic interactions [5].

Comparative modelling has mostly been done in pairs, by regressing presences of one taxon against presences of the other [14]. However, this poses clear limitations to the modelling procedure: sample size may become considerably smaller than the whole study sample, because only localities with presence of either one or the other taxon (not sites where both are either present or absent) can be used, and only two taxa can be directly compared at a time.

Relatively recent developments in distribution modelling [6] provided tools to obtain environmental favourability values that can be directly compared among species, even when these have different prevalence (i.e., proportion of presences) within the study area. Environmental favourability models have the additional advantage of allowing operations of fuzzy logic (a form of multivalued logic where the truth value may range in degree between 0 and 1) between the predictions for different species [6], opening a range of possibilities for comparative distribution modelling [5, 7].

In this paper, we test fuzzy logic operations as a tool in comparative modelling using two amphibians with partially overlapping distributions, the common toad (Bufo bufo) and the natterjack toad (Epidalea calamita, formerly Bufo calamita), in the Iberian Peninsula (SW Europe). Both species have widespread distributions in the study area, but with local differences that have been related to macroenvironmental factors [1, 8]. We modelled the Iberian distributions of these species, both individually and in different combinations, and then compared the results of these combination models with those of fuzzy logic operations between the two initial individual models. We illustrate and discuss the applicability of fuzzy logic in comparative distribution modelling.

2. Materials and Methods

The study area was the Iberian Peninsula, at the south-western edge of Europe (Figure 1). It is a nearly 600,000 km2 heterogeneous region comprising the mainland territories of Portugal and Spain and linked to the continent by a narrow and mountainous isthmus. It thus constitutes a discrete biogeographical unit appropriate for studies on species distributions [5, 9].

Species distribution data, consisting of presences and absences on Universal Transverse Mercator (UTM) 10×10km grid cells (Figure 1), were taken from the herpetological atlases of Portugal [10] and Spain [11] and were collected in a roughly similar way. Although some of the absences may result from insufficient surveying effort (false absences), many others are due to ecological or historical reasons, all of which are relevant factors in biogeography. As long as false absences are not spatially structured due to geographically biased sampling effort, they do not reduce model reliability [12]. In any case, false absences are the same as missing true presences, so they affect presence-only models as well.

The UTM 10×10 km grid and the limits of the study area were downloaded from the EDIT Geoplatform [13]. We used Quantum GIS 1.7 [14] and its GRASS (Geographic Resources Analysis Support System) plugin [15] to clip the grids with the limits of the study area. Predictor variables, representative of physiography, climate, and human activity (Table 1), were digitized and interpolated in previous studies [16, 17]. We corrected the values of solar radiation [18]. Data management and statistical analyses were carried out in R 2.11 [19] except where otherwise stated.

We built generalized linear models with a binomial distribution and the logit link of the favourability function [6], which may be written as follows: 𝑒𝐹=𝑦𝑛1/𝑛0+𝑒𝑦,(1) where 𝐹 is predicted favourability, 𝑛1 and 𝑛0 are the numbers of presences and absences, respectively, 𝑒 is the basis of the natural logarithm, and 𝑦 is a logit function combining several variables and obtained using logistic regression. Basically, it is a generalized linear model that assesses the local variations in presence probability with respect to the overall species prevalence. This makes the models independent of the species' presence/absence ratio in the study area, enabling direct model comparison and combination when more than one species are involved [5, 7].

To avoid a spurious effect of surface area on the probability of the species being present, only complete UTM cells, and not those that are cut by the study area borders or the unions between UTM zones, were used for the inductive stage of the modelling. Models were then applied to the whole study area [5, 17].

Variables were included in the models using a forward-backward stepwise procedure [4, 20, 21]. Stepwise selection is a useful and effective tool to infer distribution patterns inductively from observed data, when no theory or previous hypotheses exist about the importance of each variable [5, 22]. Variable selection was based on Akaike's Information Criterion (AIC [23]), and we checked that the same models were obtained when using AIC corrected for large numbers of predictors relative to sample size (AICc [24]). In case any nonsignificant variables remained in a model after AIC-based selection, the model was further updated by removing them step by step, starting with the least significant variable [25]. The following models were built: (A)a favourability model for B. bufo, with 1 = presence and 0 = absence of this species as target data, (B)a favourability model for E. calamita, with 1 = presence and 0 = absence, (C1) a favourability model for the occurrence of both species together, where 1 = presence of both and 0 = absence of at least one of them, (D1) a model of favourability for either of the two species, where 1 = presence of at least one and 0 = absence of both species, (E1) a model of favourability for the presence of B. bufo instead of E. calamita, where 1 = presence of B. bufo only, 0 = presence of E. calamita only, and cells where both species are either present or absent were left out of the analysis, (F1) a model of favourability for the presence of E. calamita instead of B. bufo, where 1 = presence of E. calamita only, 0 = presence of B. bufo only, and cells where both species are either present or absent were left out.

Models C1 to F1 were compared, respectively, with their fuzzy logic counterparts from C2 to F2, resulting from the following operations between models A and B: (C2) fuzzy intersection between the individual models (logic “A and B”), (D2) fuzzy union of the individual models (logic “A or B”), (E2) fuzzy intersection between model A and the complementary of model B (logic “A and (not B)”), (F2) fuzzy intersection between model B and the complementary of model A (logic “B and (not A)”).

Note that models E1 and F1, which use presence-only data, are bound to be the same with contrary signs of the variables’ coefficients, but their counterparts E2 and F2 will probably be different. This is why we built both models.

The capacity of each model to discriminate between the modelled events (i.e., presence versus absence or presence of one species versus presence of the other) was assessed with the Area Under the receiver operating characteristic (ROC) Curve (AUC). This is a widely used model evaluation measure that provides a single-number discrimination measure across all possible classification thresholds for each model, thus avoiding the subjective selection of one threshold [26]. We must keep in mind that, as any discrimination measure, the AUC depends on thresholds (just not on one particular threshold) to convert continuous model predictions into binary classifications, and is strongly conditioned by species prevalence or relative occurrence area [27]. However, this does not affect our pair wise comparisons between models based on combined distribution data and those based on fuzzy logic operations, as the set of data used to assess the AUC is the same in each comparison.

We also compared the favourability values predicted by the models of combined species data and the corresponding fuzzy logic operations between individual species models, using two different measures: Spearman’s nonparametric rank correlation between favourability values, with Dutilleul’s [28] sample size adjustment for spatial autocorrelation, implemented in the SAM software [29], and the average overall similarity between maps, calculated with the Map Comparison Kit 3.2.2 (Geonamica/RIKS, The Netherlands), which performs pattern recognition considering logical coherence, local and global similarities [30]. As predictions were numerical, we used the fuzzy numerical comparison, which considers fuzziness of location (the notion that the representation of a cell depends on the cell itself and, to a lesser extent, also the cells in its neighbourhood) in the same manner as the Fuzzy Kappa [29] but is applied to numerical maps, without using a categorical similarity matrix. The following formula was employed to find the fuzzy similarity (FS) of two values 𝑎 and 𝑏 [31]: FS(𝑎,𝑏)||||=1𝑎𝑏||𝑏||max|𝑎|,.(2) We used the default values for neighbourhood radius and decay, although we tried also a few different values to check that the results were robust.

3. Results

There were 3554 presences of B. bufo and 3131 presences of E. calamita (Figure 1) in the 5464 complete UTM cells used for building models A to D (see also Figure 2 for the distribution of the presences of both species together and the presences of either of the two species). For models E and F, based on complete UTM cells where one and only one of the two species was present, the number of analysed cases dropped to 1861. The ratios between the compared events varied among models (Table 2).

The individual models obtained for B. bufo and E. calamita reflect some areas of general agreement between environmental favourability for the two species, in line with the substantial overlap in their distributions; however, there are also areas of disagreement, where one of the two species is clearly more favoured than the other (Figures 1 and 2). The variables included in the models, their coefficient estimates and associated statistics are shown in the Appendix.

The B. bufo model had an AUC of 0.711, while the E. calamita model scored a slightly lower 0.629. Spatial autocorrelation in model residuals was negligible (maximum absolute Moran’s I was 0.003 for B. bufo and 0.002 for E. calamita). The models of combined species data and the corresponding fuzzy logic operations between individual species models produced similarly shaped ROC curves and largely overlapping AUC in all four comparisons (Figure 3(a)).

The predicted values derived from modelling combined species distribution data were also generally similar to the results of fuzzy logic operations between the two single-species models (Figure 2). The similarity between these map pairs is also attested, in all four cases, by both rank correlation and fuzzy numerical comparison of predicted values (Table 2 and Figure 3(b)). For the models of presence of one species against the other, fuzzy logic operations generated less dispersed predictions, with a smaller variation interval (Figure 3(c)).

4. Discussion

The relatively low AUC values obtained for both B. bufo and E. calamita are in line with those generally obtained for species with widespread distributions in the study area [5], as the AUC is known to correlate negatively with species prevalence [27]. Expanding the study area to include the complete distributions of both species could allow obtaining models with larger AUC. However, this would require distribution data at the same resolution from the rest of the distribution areas of both species, which are not available. In addition, higher AUC values do not necessarily mean better calibrated models; they simply reflect the fact that the modelled species does not distinguish so clearly between “good” and “bad” habitat within the studied region. Moreover, as we have pointed out before, this does not affect the pair wise model comparisons, which were the focus of this paper.

Models confronting the presence of B. bufo and E. calamita have been built previously, on a narrower spatial scale, in Southern Spain [1]. Analogous models have also been built for other amphibian pairs, such as cryptic species of frogs (Discoglossus galganoi and D. jeanneae [3]) and newts (Triturus marmoratus and T. pygmaeus [4]) and genetically differentiated forms of a salamander (Chioglossa lusitanica [2]). This may be the adequate approach when the aim of modelling is to assess which environmental parameters distinguish the distribution areas of two organisms. But when the prediction of their potential distributions is the main aim, fuzzy logic operations between the single-species models may be preferable, as they present a series of advantages. (1)They avoid the need to build additional models: the single-species models are enough. (2)They allow using all distribution data available, that is, all the localities in the study area, and not only those with exclusive presence of one of the species. This increase in sample size allows a better model calibration and thus can enhance the predictive power of the models. (3)They allow the possibility of simultaneous multispecies comparisons, instead of comparing species only two by two; models such as C1 may be impracticable when applied to many species, as the number of localities where all the species have been recorded decreases with the number of species analysed, whereas models such a C2 are not affected by this. (4)Modelling the presence of any of two species (as in model D1 in our study) gives greater weight to the species with higher number of presences, while combining individual species models with fuzzy logic gives the same importance to all species involved.

Our results showed that favourability models for two species combined by means of fuzzy logic operations perform similarly to models of combined data for these species. Although we have not tested this specifically, we may assume that the method will work in other situations, differing, for example, in number of species, the magnitude of the differences between their distribution areas, species prevalence, or the geographical extent of the study area. The modelling method, however, should provide directly comparable numerical predictions, as is the case with the favourability function [6].

A fuzzy classification technique (fuzzy envelope model, FEM) has been applied [32] for predicting species’ distributions by using presence-only records, although recognizing that when absence records are available, models built using presence-absence data may perform better than presence-only models. In any case, our conclusions are likely applicable to the use of fuzzy logic operations to their fuzzy models, although this needs to be specifically tested.

Favourability values are here considered as the degree of membership to the fuzzy set of localities favourable to the analysed event (presence of one species, of any of them, of both together, and of one instead of the other). Degrees of membership are sometimes confused with probability values, in part because both take values between 0 and 1. However, the conceptual consequences of this difference between degree of membership and probability are relevant. Local favourability denotes a measure of the degree to which local conditions cause local probability to differ from the probability expected at random, that is, from that expected according to the prevalence of the event [6]. Consequently, favourability values should not be taken as probability values independent of sample prevalence. Local probability depends both on the response of the analysed event to the predictors and on the prevalence of the event [33], whereas favourability depends only on the response to the predictors in the study area [6]. Thus, favourability is aimed at complementing probability, by providing a comparable measure of the response of the event to the predictors for events differing in prevalence.

The mathematical consequences of this difference between degree of membership and probability are also relevant. The probability of simultaneous occurrence of several events is calculated by multiplying the individual probabilities of each event, which inevitably yields increasingly lower output values as more events are taken into account. The use of fuzzy logic operations avoids this mathematical problem, as favourability for the simultaneous occurrence of several events is computed as the favourability for the least favourable event [34]. This is important when the aim is to identify areas that are simultaneously favourable for groups of several species, as it is the case, for example, in the identification of favourability hotspots [7]. This is especially relevant at a time when distribution modelling of multiple species is increasingly necessary to design effective conservation strategies for both present and future scenarios.

Appendix

Variables included in each environmental favourability model, their parameter estimates (coefficients) and associated standard error, 𝑧 test, and significance (𝑃) values. Variable codes as in Table 1. ***𝑃<0.001; **𝑃<0.01; *𝑃<0.05; .𝑃<0.1. (See Tables 3, 4, 5, 6, 7, and 8).

Acknowledgments

Neftalí Sillero merged and kindly shared the species distribution data from Portugal and Spain. Christoph Scherber adapted and kindly shared the script for AICc-based model selection. A. M. Barbosa is supported by a postdoctoral fellowship (SFRH/BPD/40387/2007) from Fundação para a Ciência e a Tecnologia (Portugal), cofinanced by the European Social Fund. The “Rui Nabeiro” Biodiversity Chair is financed by Delta Cafés and an FCT project (PTDC/AAC-AMB/098163/2008).