Research Article | Open Access
Applying Fuzzy Logic to Comparative Distribution Modelling: A Case Study with Two Sympatric Amphibians
We modelled the distributions of two toads (Bufo bufo and Epidalea calamita) in the Iberian Peninsula using the favourability function, which makes predictions directly comparable for different species and allows fuzzy logic operations to relate different models. The fuzzy intersection between individual models, representing favourability for the presence of both species simultaneously, was compared with another favourability model built on the presences shared by both species. The fuzzy union between individual models, representing favourability for the presence of any of the two species, was compared with another favourability model based on the presences of either or both of them. The fuzzy intersections between favourability for each species and the complementary of favourability for the other (corresponding to the logical operation “A and not B”) were compared with models of exclusive presence of one species versus the exclusive presence of the other. The results of modelling combined species data were highly similar to those of fuzzy logic operations between individual models, proving fuzzy logic and the favourability function valuable for comparative distribution modelling. We highlight several advantages of fuzzy logic over other forms of combining distribution models, including the possibility to combine multiple species models for management and conservation planning.
1. Introduction
Comparative distribution modelling (i.e., building models that combine or compare the distributions of different species) is a useful tool to assess differences and similarities between species' distribution areas and environmental correlates. It has been applied, for example, to species with partially overlapping distributions, genetically differentiated subspecific forms, cryptic species whose distribution data are difficult to assign [3, 4], and species linked by close biotic interactions.
Comparative modelling has mostly been done in pairs, by regressing presences of one taxon against presences of the other [1–4]. However, this poses clear limitations to the modelling procedure: sample size may become considerably smaller than the whole study sample, because only localities with presence of either one or the other taxon (not sites where both are either present or absent) can be used, and only two taxa can be directly compared at a time.
Relatively recent developments in distribution modelling provided tools to obtain environmental favourability values that can be directly compared among species, even when these have different prevalence (i.e., proportion of presences) within the study area. Environmental favourability models have the additional advantage of allowing operations of fuzzy logic (a form of multivalued logic where the truth value may range in degree between 0 and 1) between the predictions for different species, opening a range of possibilities for comparative distribution modelling [5, 7].
In this paper, we test fuzzy logic operations as a tool in comparative modelling using two amphibians with partially overlapping distributions, the common toad (Bufo bufo) and the natterjack toad (Epidalea calamita, formerly Bufo calamita), in the Iberian Peninsula (SW Europe). Both species have widespread distributions in the study area, but with local differences that have been related to macroenvironmental factors [1, 8]. We modelled the Iberian distributions of these species, both individually and in different combinations, and then compared the results of these combination models with those of fuzzy logic operations between the two initial individual models. We illustrate and discuss the applicability of fuzzy logic in comparative distribution modelling.
2. Materials and Methods
The study area was the Iberian Peninsula, at the south-western edge of Europe (Figure 1). It is a nearly 600,000 km² heterogeneous region comprising the mainland territories of Portugal and Spain and linked to the continent by a narrow and mountainous isthmus. It thus constitutes a discrete biogeographical unit appropriate for studies on species distributions [5, 9].
Species distribution data, consisting of presences and absences on Universal Transverse Mercator (UTM) 10 × 10 km grid cells (Figure 1), were taken from the herpetological atlases of Portugal and Spain and were collected in a roughly similar way. Although some of the absences may result from insufficient surveying effort (false absences), many others are due to ecological or historical reasons, all of which are relevant factors in biogeography. As long as false absences are not spatially structured due to geographically biased sampling effort, they do not reduce model reliability. In any case, false absences are the same as missing true presences, so they affect presence-only models as well.
The UTM 10 × 10 km grid and the limits of the study area were downloaded from the EDIT Geoplatform. We used Quantum GIS 1.7 and its GRASS (Geographic Resources Analysis Support System) plugin to clip the grids with the limits of the study area. Predictor variables, representative of physiography, climate, and human activity (Table 1), were digitized and interpolated in previous studies [16, 17]. We corrected the values of solar radiation. Data management and statistical analyses were carried out in R 2.11 except where otherwise stated.
We built generalized linear models with a binomial distribution and a logit link, and applied the favourability function, which may be written as follows:
$$F = \frac{e^{y}}{n_1/n_0 + e^{y}},$$
where $F$ is predicted favourability, $n_1$ and $n_0$ are the numbers of presences and absences, respectively, $e$ is the base of the natural logarithm, and $y$ is a logit function combining several variables and obtained using logistic regression. Basically, it is a generalized linear model that assesses the local variations in presence probability with respect to the overall species prevalence. This makes the models independent of the species' presence/absence ratio in the study area, enabling direct model comparison and combination when more than one species is involved [5, 7].
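As an illustration of how favourability relates to logistic regression output, the following minimal Python sketch converts a predicted probability (or a logit value) into favourability. The function names are hypothetical and not taken from the study, whose analyses were run in R; the absence count in the usage note is simply the number of complete cells minus the number of B. bufo presences reported in the Results.

```python
import math


def favourability_from_logit(y, n1, n0):
    """Favourability F = e^y / (n1/n0 + e^y) for a logit value y.

    n1 and n0 are the numbers of presences and absences in the
    training data (hypothetical argument names).
    """
    if n1 <= 0 or n0 <= 0:
        raise ValueError("n1 and n0 must both be positive")
    odds = math.exp(y)
    return odds / (n1 / n0 + odds)


def favourability_from_probability(p, n1, n0):
    """Same conversion, starting from a predicted probability p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n1 <= 0 or n0 <= 0:
        raise ValueError("n1 and n0 must both be positive")
    odds = p / (1.0 - p)  # equals e^y when p comes from a logit model
    return odds / (n1 / n0 + odds)
```

For example, with 3554 presences and 1910 absences, a cell whose predicted probability equals the prevalence (3554/5464) receives a favourability of 0.5, which is the sense in which favourability measures departure from the probability expected from prevalence alone.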
To avoid a spurious effect of surface area on the probability of the species being present, only complete UTM cells, and not those that are cut by the study area borders or the unions between UTM zones, were used for the inductive stage of the modelling. Models were then applied to the whole study area [5, 17].
Variables were included in the models using a forward-backward stepwise procedure [4, 20, 21]. Stepwise selection is a useful and effective tool to infer distribution patterns inductively from observed data, when no theory or previous hypotheses exist about the importance of each variable [5, 22]. Variable selection was based on Akaike's Information Criterion (AIC), and we checked that the same models were obtained when using AIC corrected for large numbers of predictors relative to sample size (AICc). In case any nonsignificant variables remained in a model after AIC-based selection, the model was further updated by removing them step by step, starting with the least significant variable. The following models were built:
(A) a favourability model for B. bufo, with 1 = presence and 0 = absence of this species as target data;
(B) a favourability model for E. calamita, with 1 = presence and 0 = absence;
(C1) a favourability model for the occurrence of both species together, where 1 = presence of both and 0 = absence of at least one of them;
(D1) a model of favourability for either of the two species, where 1 = presence of at least one and 0 = absence of both species;
(E1) a model of favourability for the presence of B. bufo instead of E. calamita, where 1 = presence of B. bufo only, 0 = presence of E. calamita only, and cells where both species are either present or absent were left out of the analysis;
(F1) a model of favourability for the presence of E. calamita instead of B. bufo, where 1 = presence of E. calamita only, 0 = presence of B. bufo only, and cells where both species are either present or absent were left out.
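The target coding of the six models built on species data can be sketched as follows. This is only an illustration of the coding rules described in the text, using a hypothetical helper name; None marks cells excluded from the exclusive-presence models.

```python
def encode_targets(bufo_present, calamita_present):
    """Return the target value of one grid cell for models A to F1.

    Arguments are truthy/falsy presence flags for B. bufo and
    E. calamita. A value of None means the cell is left out of
    that model (hypothetical convention for this sketch).
    """
    bufo = bool(bufo_present)
    calamita = bool(calamita_present)
    exclusive = bufo != calamita  # exactly one of the two species present
    return {
        "A": int(bufo),
        "B": int(calamita),
        "C1": int(bufo and calamita),   # both species together
        "D1": int(bufo or calamita),    # at least one species
        "E1": int(bufo) if exclusive else None,      # B. bufo only vs E. calamita only
        "F1": int(calamita) if exclusive else None,  # E. calamita only vs B. bufo only
    }
```

Only cells with exactly one of the two species contribute to the last two targets, which is why the sample size of those models is smaller than that of the others.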
Models C1 to F1 were compared, respectively, with their fuzzy logic counterparts from C2 to F2, resulting from the following operations between models A and B:
(C2) fuzzy intersection between the individual models (logic “A and B”);
(D2) fuzzy union of the individual models (logic “A or B”);
(E2) fuzzy intersection between model A and the complementary of model B (logic “A and (not B)”);
(F2) fuzzy intersection between model B and the complementary of model A (logic “B and (not A)”).
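The four fuzzy operations can be sketched in a few lines of Python. The sketch assumes the standard operators of fuzzy set theory (minimum for intersection, maximum for union, and 1 minus the value for the complement); the function and key names are hypothetical.

```python
def fuzzy_and(a, b):
    """Fuzzy intersection: the lower of the two membership degrees."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy union: the higher of the two membership degrees."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy complement of a membership degree in [0, 1]."""
    return 1.0 - a


def combine(f_a, f_b):
    """Combine favourability for species A (f_a) and species B (f_b)
    in one cell into the four fuzzy logic counterparts."""
    return {
        "C2": fuzzy_and(f_a, f_b),             # A and B
        "D2": fuzzy_or(f_a, f_b),              # A or B
        "E2": fuzzy_and(f_a, fuzzy_not(f_b)),  # A and (not B)
        "F2": fuzzy_and(f_b, fuzzy_not(f_a)),  # B and (not A)
    }
```

For a cell with favourability 0.8 for one species and 0.3 for the other, this gives 0.3 for the intersection, 0.8 for the union, 0.7 for "A and (not B)", and approximately 0.2 for "B and (not A)".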
Note that models E1 and F1, which use presence-only data, are bound to be the same with contrary signs of the variables’ coefficients, but their counterparts E2 and F2 will probably be different. This is why we built both models.
The capacity of each model to discriminate between the modelled events (i.e., presence versus absence or presence of one species versus presence of the other) was assessed with the Area Under the receiver operating characteristic (ROC) Curve (AUC). This is a widely used model evaluation measure that provides a single-number discrimination measure across all possible classification thresholds for each model, thus avoiding the subjective selection of one threshold. We must keep in mind that, like any discrimination measure, the AUC depends on thresholds (just not on one particular threshold) to convert continuous model predictions into binary classifications, and is strongly conditioned by species prevalence or relative occurrence area. However, this does not affect our pairwise comparisons between models based on combined distribution data and those based on fuzzy logic operations, as the set of data used to assess the AUC is the same in each comparison.
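For readers unfamiliar with the measure, the AUC can be computed without drawing the ROC curve, as the proportion of (event, non-event) pairs in which the event receives the higher prediction, with ties counted as one half. The following minimal sketch uses that pairwise definition; it is not the routine used in the study, and the function name is hypothetical.

```python
def auc(predictions, observed):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    predictions: continuous model outputs, one per cell.
    observed: 1 for the modelled event, 0 otherwise, in the same order.
    """
    events = [p for p, o in zip(predictions, observed) if o == 1]
    non_events = [p for p, o in zip(predictions, observed) if o == 0]
    if not events or not non_events:
        raise ValueError("both classes must be represented")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0   # event ranked above non-event
            elif e == n:
                score += 0.5   # tie counts as half
    return score / (len(events) * len(non_events))
```

The double loop is quadratic in the number of cells, which is acceptable for a sketch; a rank-based implementation would be preferable for large grids.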
We also compared the favourability values predicted by the models of combined species data and the corresponding fuzzy logic operations between individual species models, using two different measures: Spearman’s nonparametric rank correlation between favourability values, with Dutilleul’s sample size adjustment for spatial autocorrelation, implemented in the SAM software, and the average overall similarity between maps, calculated with the Map Comparison Kit 3.2.2 (Geonamica/RIKS, The Netherlands), which performs pattern recognition considering logical coherence, local and global similarities. As predictions were numerical, we used the fuzzy numerical comparison, which considers fuzziness of location (the notion that the representation of a cell depends on the cell itself and, to a lesser extent, also the cells in its neighbourhood) in the same manner as the Fuzzy Kappa but is applied to numerical maps, without using a categorical similarity matrix. The following formula was employed to find the fuzzy similarity ($s$) of two values $a$ and $b$:
$$s(a, b) = 1 - \frac{|a - b|}{\max(|a|, |b|)}.$$
We used the default values for neighbourhood radius and decay, although we also tried a few different values to check that the results were robust.
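A simplified sketch of the value-level similarity may help here. It assumes the similarity of two values a and b takes the form 1 − |a − b| / max(|a|, |b|), and it deliberately omits the neighbourhood radius and distance decay that the Map Comparison Kit applies, so it compares cells only with their exact counterparts. Function names are hypothetical.

```python
def value_similarity(a, b):
    """Similarity of two numerical values: 1 - |a - b| / max(|a|, |b|).

    Two zeros are treated as identical (similarity 1) to avoid
    division by zero; this convention is an assumption of the sketch.
    """
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 1.0
    return 1.0 - abs(a - b) / largest


def mean_similarity(map_a, map_b):
    """Average cell-by-cell similarity of two equally sized maps,
    given as flat sequences of favourability values.
    No neighbourhood (fuzziness of location) is considered."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal length")
    total = sum(value_similarity(a, b) for a, b in zip(map_a, map_b))
    return total / len(map_a)
```

Because the difference is scaled by the larger of the two values, the same absolute difference counts for more where favourability is low than where it is high.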
3. Results
There were 3554 presences of B. bufo and 3131 presences of E. calamita (Figure 1) in the 5464 complete UTM cells used for building models A to D (see also Figure 2 for the distribution of the presences of both species together and the presences of either of the two species). For models E and F, based on complete UTM cells where one and only one of the two species was present, the number of analysed cases dropped to 1861. The ratios between the compared events varied among models (Table 2).
The individual models obtained for B. bufo and E. calamita reflect some areas of general agreement between environmental favourability for the two species, in line with the substantial overlap in their distributions; however, there are also areas of disagreement, where one of the two species is clearly more favoured than the other (Figures 1 and 2). The variables included in the models, their coefficient estimates and associated statistics are shown in the Appendix.
The B. bufo model had an AUC of 0.711, while the E. calamita model scored a slightly lower 0.629. Spatial autocorrelation in model residuals was negligible (maximum absolute Moran’s I was 0.003 for B. bufo and 0.002 for E. calamita). The models of combined species data and the corresponding fuzzy logic operations between individual species models produced similarly shaped ROC curves and largely overlapping AUC in all four comparisons (Figure 3(a)).
The predicted values derived from modelling combined species distribution data were also generally similar to the results of fuzzy logic operations between the two single-species models (Figure 2). The similarity between these map pairs is also attested, in all four cases, by both rank correlation and fuzzy numerical comparison of predicted values (Table 2 and Figure 3(b)). For the models of presence of one species against the other, fuzzy logic operations generated less dispersed predictions, with a smaller variation interval (Figure 3(c)).
4. Discussion
The relatively low AUC values obtained for both B. bufo and E. calamita are in line with those generally obtained for species with widespread distributions in the study area, as the AUC is known to correlate negatively with species prevalence. Expanding the study area to include the complete distributions of both species could yield models with larger AUC. However, this would require distribution data at the same resolution from the rest of the distribution areas of both species, which are not available. In addition, low AUC values do not necessarily indicate poor models; they simply reflect the fact that the modelled species does not distinguish so clearly between “good” and “bad” habitat within the studied region. Moreover, as we have pointed out before, this does not affect the pairwise model comparisons, which were the focus of this paper.
Models confronting the presence of B. bufo and E. calamita have been built previously, on a narrower spatial scale, in Southern Spain. Analogous models have also been built for other amphibian pairs, such as cryptic species of frogs (Discoglossus galganoi and D. jeanneae) and newts (Triturus marmoratus and T. pygmaeus) and genetically differentiated forms of a salamander (Chioglossa lusitanica). This may be the adequate approach when the aim of modelling is to assess which environmental parameters distinguish the distribution areas of two organisms. But when the prediction of their potential distributions is the main aim, fuzzy logic operations between the single-species models may be preferable, as they present a series of advantages:
(1) They avoid the need to build additional models: the single-species models are enough.
(2) They allow using all distribution data available, that is, all the localities in the study area, and not only those with exclusive presence of one of the species. This increase in sample size allows a better model calibration and thus can enhance the predictive power of the models.
(3) They allow simultaneous multispecies comparisons, instead of comparing species only two by two; models such as C1 may be impracticable when applied to many species, as the number of localities where all the species have been recorded decreases with the number of species analysed, whereas models such as C2 are not affected by this.
(4) Modelling the presence of any of two species (as in model D1 in our study) gives greater weight to the species with the higher number of presences, while combining individual species models with fuzzy logic gives the same importance to all species involved.
Our results showed that favourability models for two species combined by means of fuzzy logic operations perform similarly to models of combined data for these species. Although we have not tested this specifically, we may assume that the method will work in other situations, differing, for example, in number of species, the magnitude of the differences between their distribution areas, species prevalence, or the geographical extent of the study area. The modelling method, however, should provide directly comparable numerical predictions, as is the case with the favourability function.
A fuzzy classification technique (fuzzy envelope model, FEM) has been applied for predicting species’ distributions by using presence-only records, although its authors recognize that, when absence records are available, models built using presence-absence data may perform better than presence-only models. In any case, our conclusions are likely applicable to the use of fuzzy logic operations on such fuzzy models, although this needs to be specifically tested.
Favourability values are here considered as the degree of membership to the fuzzy set of localities favourable to the analysed event (presence of one species, of any of them, of both together, and of one instead of the other). Degrees of membership are sometimes confused with probability values, in part because both take values between 0 and 1. However, the conceptual consequences of this difference between degree of membership and probability are relevant. Local favourability denotes a measure of the degree to which local conditions cause local probability to differ from the probability expected at random, that is, from that expected according to the prevalence of the event. Consequently, favourability values should not be taken as probability values independent of sample prevalence. Local probability depends both on the response of the analysed event to the predictors and on the prevalence of the event, whereas favourability depends only on the response to the predictors in the study area. Thus, favourability is aimed at complementing probability, by providing a comparable measure of the response of the event to the predictors for events differing in prevalence.
The mathematical consequences of this difference between degree of membership and probability are also relevant. The probability of simultaneous occurrence of several events (assuming they are independent) is calculated by multiplying the individual probabilities of each event, which inevitably yields increasingly lower output values as more events are taken into account. The use of fuzzy logic operations avoids this mathematical problem, as favourability for the simultaneous occurrence of several events is computed as the favourability for the least favourable event. This is important when the aim is to identify areas that are simultaneously favourable for groups of several species, as is the case, for example, in the identification of favourability hotspots. This is especially relevant at a time when distribution modelling of multiple species is increasingly necessary to design effective conservation strategies for both present and future scenarios.
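The contrast between the two calculations can be made concrete with a small sketch. The values below are invented for illustration only (ten species, each with a value of 0.8 in the same cell), and the function names are hypothetical.

```python
import math


def joint_probability(probabilities):
    """Probability that all events occur, assuming independence:
    the product of the individual probabilities."""
    return math.prod(probabilities)


def fuzzy_intersection(favourabilities):
    """Favourability for all events together: the value of the
    least favourable event (fuzzy minimum)."""
    return min(favourabilities)


# Invented example: ten species, each with a value of 0.8 in one cell.
values = [0.8] * 10
product_result = joint_probability(values)   # shrinks as species are added
minimum_result = fuzzy_intersection(values)  # stays at the lowest input value
```

With these inputs the product falls to roughly 0.11, whereas the fuzzy intersection remains 0.8, so a cell that is clearly favourable for every species is still identified as favourable for the whole group.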
Appendix
Variables included in each environmental favourability model, with their parameter estimates (coefficients) and associated standard errors, test statistics, and significance values. Variable codes as in Table 1. (See Tables 3, 4, 5, 6, 7, and 8.)
Neftalí Sillero merged and kindly shared the species distribution data from Portugal and Spain. Christoph Scherber adapted and kindly shared the script for AICc-based model selection. A. M. Barbosa is supported by a postdoctoral fellowship (SFRH/BPD/40387/2007) from Fundação para a Ciência e a Tecnologia (Portugal), cofinanced by the European Social Fund. The “Rui Nabeiro” Biodiversity Chair is financed by Delta Cafés and an FCT project (PTDC/AAC-AMB/098163/2008).
- J. Romero and R. Real, “Macroenvironmental factors as ultimate determinants of distribution of common toad and natterjack toad in the south of Spain,” Ecography, vol. 19, no. 3, pp. 305–312, 1996.
- J. W. Arntzen and J. Alexandrino, “Ecological modelling of genetically differentiated forms of the Iberian endemic golden-striped salamander, Chioglossa lusitanica,” Herpetological Journal, vol. 14, no. 3, pp. 137–141, 2004.
- R. Real, A. M. Barbosa, I. Martínez-Solano, and M. García-Paris, “Distinguishing the distributions of two cryptic frogs (Anura: Discoglossidae) using molecular data and environmental modeling,” Canadian Journal of Zoology, vol. 83, no. 4, pp. 536–545, 2005.
- J. W. Arntzen and G. Espregueira Themudo, “Environmental parameters that determine species geographical range limits as a matter of time and space,” Journal of Biogeography, vol. 35, no. 7, pp. 1177–1186, 2008.
- R. Real, A. M. Barbosa, A. Rodríguez et al., “Conservation biogeography of ecologically interacting species: the case of the Iberian lynx and the European rabbit,” Diversity and Distributions, vol. 15, no. 3, pp. 390–400, 2009.
- R. Real, A. M. Barbosa, and J. M. Vargas, “Obtaining environmental favourability functions from logistic regression,” Environmental and Ecological Statistics, vol. 13, no. 2, pp. 237–245, 2006.
- A. Estrada, R. Real, and J. M. Vargas, “Using crisp and fuzzy modelling to identify favourability hotspots useful to perform gap analysis,” Biodiversity and Conservation, vol. 17, no. 4, pp. 857–871, 2008.
- A. Antunez, R. Real, and J. M. Vargas, “Biogeographical analysis of the amphibians in the southern valleys of the Betic Cordillera,” Miscellania Zoologica, vol. 12, pp. 261–272, 1988.
- A. M. Barbosa, R. Real, and J. M. Vargas, “Use of coarse-resolution models of species' distributions to guide local conservation inferences,” Conservation Biology, vol. 24, no. 5, pp. 1378–1387, 2010.
- A. Loureiro, N. F. de Almeida, M. A. Carretero, and O. S. Paulo, Atlas dos Anfíbios e Répteis de Portugal, Instituto da Conservação da Natureza e da Biodiversidade, Lisbon, Portugal, 2008.
- J. M. Pleguezuelos, R. Márquez, and M. Lizana, Atlas y Libro Rojo de los Anfibios y Reptiles de España, Dirección General de la Conservación de la Naturaleza-Asociación Herpetológica Española, Madrid, Spain, 2002.
- G. C. Reese, K. R. Wilson, J. A. Hoeting, and C. H. Flather, “Factors affecting species distribution predictions: a simulation modeling experiment,” Ecological Applications, vol. 15, no. 2, pp. 554–564, 2005.
- P. Sastre, P. Roca, and J. M. Lobo, “A geoplatform for improving accessibility to environmental cartography,” Journal of Biogeography, vol. 36, p. 568, 2009.
- Quantum GIS Development Team, “Quantum GIS geographic information system,” Open Source Geospatial Foundation Project, http://qgis.osgeo.org/.
- GRASS Development Team, “Geographic Resources Analysis Support System (GRASS) Software,” Open Source Geospatial Foundation Project, http://grass.osgeo.org/.
- A. M. Barbosa, R. Real, J. Olivero, and J. M. Vargas, “Otter (Lutra lutra) distribution modeling at two resolution scales suited to conservation planning in the Iberian Peninsula,” Biological Conservation, vol. 114, no. 3, pp. 377–387, 2003.
- A. M. Barbosa, R. Real, and J. M. Vargas, “Transferability of environmental favourability models in geographic space: the case of the Iberian desman (Galemys pyrenaicus) in Portugal and Spain,” Ecological Modelling, vol. 220, no. 5, pp. 747–754, 2009.
- A. M. Barbosa, R. Real, and J. M. Vargas, “Erratum to "Transferability of environmental favourability models in geographic space: the case of the Iberian desman (Galemys pyrenaicus) in Portugal and Spain" [Ecological Modelling 220, 747–754, 2009],” Ecological Modelling, vol. 222, no. 4, p. 1067, 2011.
- R Development Core Team, “R: a language and environment for statistical computing,” R Foundation for Statistical Computing, 2009, http://www.r-project.org/.
- M. B. Araujo, W. Thuiller, P. H. Williams, and I. Reginster, “Downscaling European species atlas distributions to a finer resolution: implications for conservation planning,” Global Ecology and Biogeography, vol. 14, no. 1, pp. 17–30, 2005.
- L. Bulluck, E. Fleishman, C. Betrus, and R. Blair, “Spatial and temporal variations in species occurrence rate affect the accuracy of occurrence models,” Global Ecology and Biogeography, vol. 15, no. 1, pp. 27–38, 2006.
- A. Guisan and N. E. Zimmermann, “Predictive habitat distribution models in ecology,” Ecological Modelling, vol. 135, no. 2-3, pp. 147–186, 2000.
- H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, B. N. Petrov and F. Csaki, Eds., pp. 267–281, Akadémia Kiadó, Budapest, Hungary, 1973.
- K. P. Burnham and D. R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, 2nd edition, 2002.
- M. J. Crawley, The R Book, John Wiley & Sons, Chichester, UK, 2007.
- A. H. Fielding and J. F. Bell, “A review of methods for the assessment of prediction errors in conservation presence/absence models,” Environmental Conservation, vol. 24, no. 1, pp. 38–49, 1997.
- J. M. Lobo, A. Jiménez-Valverde, and R. Real, “AUC: a misleading measure of the performance of predictive distribution models,” Global Ecology and Biogeography, vol. 17, no. 2, pp. 145–151, 2008.
- P. Dutilleul, “Modifying the t test for assessing the correlation between two spatial processes,” Biometrics, vol. 49, no. 1, pp. 305–314, 1993.
- T. F. Rangel, J. A. F. Diniz-Filho, and L. M. Bini, “SAM: a comprehensive application for Spatial Analysis in Macroecology,” Ecography, vol. 33, no. 1, pp. 46–50, 2010.
- H. Visser and T. De Nijs, “The map comparison kit,” Environmental Modelling and Software, vol. 21, no. 3, pp. 346–358, 2006.
- A. Hagen-Zanker, G. Engelen, J. Hurkens, R. Vanhout, and I. Uljee, Map Comparison Kit 3 User Manual, RIKS Bv, Maastricht, The Netherlands, 2006.
- M. P. Robertson, M. H. Villet, and A. R. Palmer, “A fuzzy classification technique for predicting species' distributions: applications using invasive alien plants and indigenous insects,” Diversity and Distributions, vol. 10, no. 5-6, pp. 461–474, 2004.
- J. S. Cramer, “Predictive performance of the binary logit model in unbalanced samples,” Journal of the Royal Statistical Society Series D, vol. 48, no. 1, pp. 85–94, 1999.
- L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, no. 3, pp. 338–353, 1965.
Copyright © 2012 A. Márcia Barbosa and Raimundo Real. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.