Geographic Information Systems (GISs) are composed of useful tools to map and to model the spatial distribution of events that have geographic importance as schistosomiasis. This paper is a review of the use the indicator kriging, implemented on the Georeferenced Information Processing System (SPRING) to make inferences about the prevalence of schistosomiasis and the presence of the species of Biomphalaria, intermediate hosts of Schistosoma mansoni, in areas without this information, in the Minas Gerais State, Brazil. The results were two maps. The first one was a map of Biomphalaria species, and the second was a new map of estimated prevalence of schistosomiasis. The obtained results showed that the indicator kriging can be used to better allocate resources for study and control of schistosomiasis in areas with transmission or the possibility of disease transmission.

1. Introduction

Schistosomiasis mansoni is an endemic disease, typical of developing countries [1, 2]. In Brazil, the schistosomiasis is caused by the etiological agent Schistosoma mansoni, whose intermediate host is species of mollusk of the Biomphalaria genus.

The S. mansoni was introduced in Brazil by the African slavery trade during the sixteenth century [3]. The Brazilian port of Salvador and Recife received most of the slaves [4], originated from endemic regions. In the early eighteenth century, there was a large migration of slave labor due to the decline of sugar production in the Northeast of Brazil and the discovery of gold and diamond in the Minas Gerais State. It is estimated that one fifth of the population at that time moved to Minas Gerais [5], using the “ways of São Francisco” [6] as the main access route. Probably, in these early migrants also came to schistosomiasis.

In Minas Gerais, there are seven species of Biomphalaria genus: B. glabrata, B. straminea, B. tenagophila, B. peregrina, B. schrami, B. intermedia, and B. occidentalis [79]. Come in these, only B. glabrata, B. tenagophila, and B. straminea have been found naturally infected by S. mansoni [10, 11]. B. glabrata is of great epidemiologic importance, due to its extensive geographic distribution, high infection indices, and effectiveness in the schistosomiasis transmission. Moreover, its distribution is almost always associated with disease occurrence [12]. B. tenagophila was found naturally infected by S. mansoni in state of Minas Gerais, and it is responsible for the focus maintenance in the city of Itajubá [13]. B. straminea, although had not been found infected in state of Minas Gerais, was considered responsible for Paracatu’s focus [14].

They are commonly found in a wide of habitats, both natural (streams, creeks, ponds, swamps) and artificial (irrigation ditches, small dams), particularly in shallow and slow running waters (less than 30 cm/s), where the substratum can be the muddy or rocky bed and with floating or rooted vegetation, pH between 6 and 8, NaCl content below 3 by 1000, and mean temperature between 20 and 25 degrees C [1517].

The study of the habitat of these mollusks, as well as their behavior in relation to the climate, results in valuable information when the goal is the disease transmission control [18].

Environmental and socioeconomic factors may influence the spatial distribution of schistosomiasis. Under these circumstances, the Geographic Information System (GIS) can be applied to characterize, to better understand the interconnection of these factors, and to provide a more complete picture of disease transmission [19]. GIS allows a complex analysis of a large number of information and displays the results of this analysis in graphical maps. These techniques have become important tools for the design and implementation of control programs [20], enabling a better distribution of state resources to allow a direction more suitable for disease control [2123]. Among these tools, we can cite the indicator kriging, which allows to data spatialization aiming at map generation. It also gives information about inference uncertainties that can be used as quality restrictions of the classification process.

This study is a review of the use the indicator kriging of the Georeferenced Information Processing System (SPRING) to make inferences about the presence of the species of Biomphalaria (B. glabrata, B. tenagophila, and/or B. straminea), intermediate hosts of Schistosoma mansoni. Also, using numerical indicator kriging, a new map of estimated prevalence of schistosomiasis, in areas without information in the Minas Gerais State, Brazil, is presented.

2. Methodology

Kriging may be defined as a technique of statistical inference, which allows the estimation of values and the uncertainties associated with the attribute during the spatialization of a sample property [24].

To achieve the objectives, two approaches have been considered: categorical and numerical indicator kriging. The categorical indicator kriging was based on the information of the mollusks species, and the numerical indicator kriging used data from the prevalence of schistosomiasis.

The procedure for adjustment of the semivariogram is not straightforward and automatic, but interactive, because the interpreter does the first adjustment and checks the adequacy of the theoretical model [25]. After the models fitted for each class (categorical) or quartile (numerical), the indicator kriging was applied to obtain an approximation of the conditional distribution function of random variables.

The numerical indicator kriging was conducted in the entire state using the schistosomiasis prevalence data (lower quantile, median and upper quantile) from 999 localities.

The categorical indicator kriging was performed in each of the fifteen river basins (Buranhém, Doce, Grande, Itabapoana, Itanhém, Itapemirim, Jequitinhonha, Jucuruçu, Mucuri, Paraíba do Sul, Paranaíba, Pardo, Piracicaba/Jaguari, São Francisco and, São Mateus) using the mollusk data. The mollusk attributes (class of species and localization) were distributed along the drainage network of 15 River Basins, according to the methodology used by Guimarães et al. [22]. The classes used for this study were defined as B. glabrata, B. tenagophila, B. straminea, B. glabrata + B. tenagophila, B. glabrata + B. straminea, B. tenagophila + B. straminea, B. glabrata + B. tenagophila + B. straminea, and without Biomphalaria. The class without Biomphalaria includes information about the nonoccurrence of Biomphalaria species or information about non-transmitter species in Brazil, such as B. peregrina, B. schrammi, B. intermedia, and B. occidentalis.

The indicator kriging was done in the software SPRING [26]. In Appendix A is described the geostatistical modeling used in the indicator kriging.

2.1. Data Set

Schistosomiasis prevalence values (Pv) were obtained from the Brazilian Schistosomiasis Control Program (PCE) through the Annual Reports of the Secretary of Public Health Surveillance (SVS) and the Secretary of Health in the State of Minas Gerais (SESMG). The PCE in Minas Gerais had its beginning in 1986, and since 2000 has been under the coordination of the SESMG in collaboration with Municipal Health Systems. The PCE prevalence information is available for municipalities and localities [18]. The Kato Katz technique is the methodology used to determine prevalence, examining one slide per person. The spatial distribution of the schistosomiasis prevalence is presented in Figure 1(a), for the 255 municipalities and for the 999 localities used in this study.

Data on the distribution of Biomphalaria mollusks were provided by the Laboratory of Helminthiasis and Medical Malacology of the René Rachou Research Center (CPqRR/Fiocruz-MG). Mollusks were collected in breeding places from different municipalities in Minas Gerais at different periods, using scoops and tweezers, and then packed to be transported to the laboratory [27]. Specific identification was performed according to the morphology of the shells, reproductive system, and renal ridge of the mollusks [2833], and also by low stringency polymerase chain reaction and restriction fragment length polymorphism [34]. The spatial distribution of the Biomphalaria species data are presented in Figure 1(b).

2.2. Test of the Hypothesis for Differences between PCE and Indicator Kriging Estimated Prevalences

To assess whether the estimates made by kriging methods were close to those obtained by PCE, initially estimates were made by municipalities, by averaging the estimated prevalence for all the grid points belonging to that municipality.

A regression line was then adjusted, using the prevalence provided by the PCE as dependent variable and the prevalence provided by kriging as independent variable, that is,𝑃PCE=𝛽0+𝛽1𝑃𝐾,(1) where 𝑃PCE and 𝑃𝐾 are the PCE and kriging prevalences, respectively.

A hypothesis test was then performed to determine, with a 95% confidence level, whether the intercept was zero and the slope parameter was equal to 1:𝐻0𝛽0=0,𝛽1𝐻=1,1𝛽00or𝛽11.(2)

If the null hypothesis is accepted, it can be concluded that, in average, the kriging estimates are equal to PCE prevalences at a 95% confidence level.

3. Results and Discussions

The indicator kriging procedure, based on the fitted semivariograms, was applied using the sample data presented in Figure 1, to generate a regular grid of 250 meters of resolution (𝑥,𝑦) over the Minas Gerais State.

The following results were obtained to achieve the objectives.

3.1. Using Categorical Data (Thematic)

The resulting map of the species distribution generated by applying the mode estimator (Appendix A.2.—(A.6)) is presented in Figure 2(a).

Figure 2(b) presents a map of the uncertainties associated with the classification, computed by using (A.7) of Appendix A.2. The map of uncertainties shows that the higher uncertainties are concentrated among class transition areas.

The methodology was validated using a sampling procedure. The fieldwork was conducted in the São Francisco and Paraíba do Sul River Basins where no information existed about the presence of the mollusks. More details about the results for the São Francisco and Paraíba do Sul River Basins can be found in Guimarães et al. [22] and Guimarães [18] and Tibiriçá et al. [9].

The research of mollusks was accomplished in five municipalities in the São Francisco River Basin (SFRB) and in nine municipalities in the Paraíba do Sul River Basin (PSRB). The mollusks collected were sent to the analysis of the species in the Laboratory of Helminthiasis and Medical Malacology of the René Rachou Research Center (CPqRR/Fiocruz-MG). Also, the mollusks collected in the PSRB were identified at the Parasitology Laboratory in the Federal University of Juiz de Fora (UFJF) and the Entomology Laboratory of the GRS/JF, Secretary of Health in the State of Minas Gerais (SESMG). Collection and identification of the mollusks were performed according to the methodology described in Section 2.1.

Figures 2(c) and 2(d) show the historical Biomphalaria species (Figure 1(b)) at the surveyed municipalities of São Francisco and Paraíba do Sul River Basins, respectively.

Table 1 presents the estimated and found species, as well as the value of the uncertainty mean for searched municipality, and the collection points, where these species had been found.

To explain the differences in the two basins, some considerations should be made. Figure 2 shows the spatial distribution of Biomphalaria species according to historical data. We can observe from this figure that the SFRB (Figure 2(c)) has a better spatial distribution of species surveyed than the PSRB (Figure 2(d)).

The municipalities surveyed in the SFRB had 100% accuracy with at least one specie estimated, but this value for the PSRB was 66.67%.

About 50% of the municipalities of SFRB have historical information about the Biomphalaria species in one of eight classes (B. glabrata, B. tenagophila, B. straminea, B. glabrata + B. tenagophila, B. glabrata + B. straminea, B. tenagophila + B. straminea, B. glabrata + B. tenagophila + B. straminea, and without Biomphalaria). However, for PSRB, this information is only 32% of the municipalities and one of five classes (B. glabrata, B. tenagophila, B. glabrata + B. tenagophila, B. glabrata + B. tenagophila + B. straminea, and without Biomphalaria).

As kriging is affected by the amount and spatial distribution of input data, this may explain the differences in the two basins.

This fact is also reflected in the uncertainties of the estimates. Comparing the values presented in Table 1, the SFRB had an overall uncertainty mean of 0.232, and the PSRB had an overall uncertainty mean of 0.332. Therefore, the overall uncertainty for PSRB is 43.1% greater than for SFRB.

3.2. Using Numerical Data

Figure 3 shows the spatial distribution of schistosomiasis-estimated prevalence by kriging (Figure 3(a)), map of the uncertainties (Figure 3(b)), estimated prevalence by kriging by type of classes (Figure 3(c)), and in Figure 3(d) the mean estimated prevalence by kriging for the 255 municipalities where the PCE prevalence information is available.

The PCE prevalence values (Figure 1(a)) and the respective kriging estimates (Figure 3(d)) were plotted together. Figure 4 shows the scatter plot as well as the regression line. The hypothesis test (A.12) was performed, and the null hypothesis was accepted, indicating that there is no significant difference between the PCE prevalence and the kriging prevalence means, with a significance level of 0.05.

Table 2 presents the comparison between the prevalence of PCE and the prevalence estimated by kriging by type of classes: low (prevalence among 0.001 to 5), medium (5.001 to 15), and high prevalence (above 15).

From Table 2, it can be noted that(i)60.8% of the municipalities are estimated in the same class as they belong;(ii)37.2% of the municipalities had the prevalence estimated in the adjacent class, that is, from low to medium class, from medium to high class, from high to medium class, or from medium to high class;(iii)less than 2% of the municipalities had the low prevalence estimated a high class;(iv)when the estimated class is not in the same class of the PCE, the kriging has a trend of about 25.9% to overestimate and about 13.3% to underestimate the prevalence values.

4. Conclusions and Future Work

Indicator kriging showed to be a rather robust tool since its results presented a very good agreement with the field findings. This tool allowed to determine and to delimit, respectively, the distribution of the Biomphalaria species and the areas of risk (map of uncertainty of the Biomphalaria species).

Kriging is an auxiliary useful tool to guide the fieldwork, indicating the places with higher probability of occurrence of the considered species, with particular attention to those species that are more important for disease transmission. The results of this tool can be used to better allocate the always limited resources for distribution studies and the development of strategies for mollusk control.

Some important issues, related to the nature and precision of the Biomphalaria species data, need to be considered when looking at the results: the data were obtained from historical records (most occurring before the broad usage of GPS equipment), and the information is given in a municipality level basis. Because of this, an assumption was made that the species found in the municipalities are uniformly distributed inside the municipality drainage network. The authors believe, however, that other type of distribution hypothesis would not greatly affect the results.

To improve the accuracy of an estimate using kriging, it would be necessary to obtain data with better location and spatial distribution of the information collected in the fieldwork.

Also, the kriging proved to be a suitable tool, and their results showed a good agreement with the PCE data.

This technique can be used to estimate the schistosomiasis prevalence in the municipalities of Minas Gerais where the prevalence is not determined by the PCE. The results of this tool can be used to better allocate resources for studies in areas with medium and high prevalence.

The entire methodology of this study used free software allowing the playback of the methodology in other states of Brazil where there is no information about the type of Biomphalaria and/or schistosomiasis prevalence at no cost.

Conditioned to appropriate funds existence, an extensive malacological survey is recommended for better evaluation of the methodology and also GPS utilization in all future fieldworks.

It is also recommended to obtain data on the schistosomiasis prevalence in western Minas Gerais (nonendemic region). Thus, one can obtain a better estimate of prevalence at the state level and not only in the endemic area.



A.1. Indicator Kriging

A spatial attribute inside a region 𝐴 of the earth surface can be modeled, from a geostatistical point of view, as a random function [35, 36]. For each spatial position 𝐮𝐴, the attribute value of a spatial data is modeled as a random variable (RV) 𝑍(𝐮), which can assume different values with an associated probability of occurrence. In the 𝑛 sample positions 𝐮𝛼, 𝛼=1,2,,𝑛, the values 𝑍(𝐮) are considered deterministic, that is, they can be considered RV’s with probability 100% of occurrence. The distribution function of 𝑍(𝐮), conditioned to 𝑛 sample points, can be defined by Deutsch and Journel [36]𝐹(𝐮;𝑧(𝑛))=Prob{𝑍(𝐮)𝑧(𝑛)}foranumericalattribute,𝐹(𝐮;𝑧(𝑛))=Prob{𝑍(𝐮)=𝑧(𝑛)}foracategoricalattribute.(A.1)

The function 𝐹(𝐮;𝑧(𝑛)) models the uncertainty about the values of 𝑧(𝐮), in nonsampled positions 𝐮, conditioned to the 𝑛 samples. This function can be evaluated from the inference procedure-denominated indicator kriging.

The indicator codification of the RV 𝑍(𝐮), for a cutoff value 𝑧=𝑧𝑘, generates the indicator RV𝐼(𝐮;𝑧𝑘) using the following nonlinear mapping function:𝐼𝐮;𝑧𝑘=1if𝑍(𝐮)𝑧𝑘0if𝑍(𝐮)>𝑧𝑘𝐼foranumericalattribute,𝐮;𝑧𝑘=1if𝑍(𝐮)=𝑧𝑘0if𝑍(𝐮)𝑧𝑘foracategoricalattribute.(A.2)

The cutoff values, 𝑧𝑘,𝑘=1,2,𝐾, are defined depending on the number of samples. It is necessary that the amount of codified samples with value 1 are enough for defining, with success, a variography model for each cutoff value [37].

The conditional expectation of the indicator RV𝐼(𝐮;𝑧𝑘) is computed by𝐸𝐼𝐮;𝑧𝑘𝐼(𝑛)=1Prob𝐮;𝑧𝑘𝐼=1(𝑛)+0Prob𝐮;𝑧𝑘𝐼=0(𝑛)=1Prob𝐮;𝑧𝑘=1(𝑛)=𝐹𝐮;𝑧𝑘.(𝑛)(A.3)

This estimation, made through ordinary kriging over the indicator values, yields a least-square estimate for the distribution function at the cutoff value 𝑧𝑘 [36]. A set of 𝐹(𝐮;𝑧𝑘(𝑛)) estimates, obtained for different 𝑘 cutoff values, can lead to an approximation of the conditional distribution function (cdf) of 𝑍(𝐮).

The expression of the ordinary indicator kriging estimator is given by:𝐹𝑂𝐮;𝑧𝑘=(𝑛)𝑛(𝐮)𝛼=1𝜆𝑂𝛼𝐮;𝑧𝑘𝑖𝐮𝛼;𝑧𝑘.(A.4)

The weights 𝜆𝑂𝛼(𝐮;𝑧𝑘) are obtained by solving the following system of equations:𝑛(𝐮)𝛽=1𝜆𝑂𝛽𝐮;𝑧𝑘𝐶1𝛼𝛽;𝑧𝑘+𝜙𝐮;𝑧𝑘=𝐶1𝛼;𝑧𝑘𝛼=1,2,,𝑛(𝐮),𝑛(𝐮)𝛽=1𝜆𝑂𝛽𝐮;𝑧𝑘=1,(A.5) where 𝜙(𝐮;𝑧𝑘) is a Lagrange parameter, 𝛼𝛽 is the vector defined by the positions 𝐮𝛼and𝐮𝛽,𝛼 is the vector defined by the positions 𝐮𝛼and𝐮,𝐶1(𝛼;𝑧𝑘) is the autocovariance defined by 𝛼𝛽,and𝐶1(𝛼;𝑧𝑘) is the covariance defined by 𝛼. The autocovariance is determined by the theoretical variography model defined by the set 𝐼 when 𝑧=𝑧𝑘.

The indicator kriging, simple or ordinary, gives, for each cutoff value 𝑘, an estimator which is also the better least square estimator of the conditional expectation of the RV 𝐼(u; 𝑧𝑘). Using this property, it is possible to compute estimates of the cdf values of 𝑍(u) for several values of 𝑧𝑘, belonging to the 𝑍(u) domain. The set of the estimated values of the cdf’s of 𝑍(u), for the cutoff values, is considered a discrete approximation of the real cdf of 𝑍(u). The greater the number of cutoff values, the better the approximation.

The indicator kriging is nonparametric. It does not consider any type of a prior distribution for the random variable. Instead, it permits the construction of a discrete approximation of the cdf of 𝑍(u). The values of the discrete probabilities can be directly used to estimate the values that characterize the distribution, such as average, variance, mode, and quantiles.

A.2. Estimation and Uncertainty Evaluation for Categorical (Thematic) Data

The optimal estimation 𝑧(𝐮), for a categorical attribute represented by 𝐾 classes 𝑐𝑘,𝑘=1,,𝐾, can be defined as𝑧(𝐮)=𝑐𝑘i𝑝𝑘(𝐮)>𝑝𝑖(𝐮)𝑖,𝑘=1,,𝐾,𝑖𝑘,(A.6) where 𝑝𝑘(𝐮)=𝐹(𝐮;𝑧𝑘(𝑛)), when 𝑧𝑘=𝑐𝑘, is the estimated probability of the class 𝑘 at the location u. This estimator is known as the mode estimator since it considers the highest probability, and the classifier that assigns classes at each position 𝐮 using the mode estimator is known as mode classifier.

The species classification probability for each position, inside the region of interest, is obtained with the values for indication (0 or 1) of the neighboring known samples (nearer) of u. This probability is estimated by using ordinary kriging, where the weighted value of each neighboring sample of 𝐮 is calculated considering the variability model defined by the semivariograms for each class.

Usually, inferences about local uncertainties for categorical attributes, Unc(𝐮), are made using the complement of the mode probability, which is defined asUnc(𝐮)=1𝐾Max𝑘=1𝑝𝑘(𝐮).(A.7)

The mode estimator, as defined in (A.4), has the advantage of classifying all locations 𝐮 inside the region of interest. An alternate methodology consists in considering the uncertainties as restrictions to the classification process. In this case, only locations with an uncertainty below a predefined threshold (𝑈max) are classified, that is𝑧𝑐(𝐮)=𝑘𝑝i𝑘(𝐮)>𝑝𝑖(𝐮)Unc(𝐮)<𝑈max𝑖,𝑘=1,,𝐾,𝑖𝑘,𝜙otherwise,(A.8) where 𝑧(𝐮)=𝜙 means that the value was not estimated or the location was not classified.

A.3. Estimation and Uncertainty Evaluation for Numerical Data

For numerical attributes are estimated univariate cumulative probabilities 𝑝𝑘(𝐮),𝑘=1,,𝐾, of the cutoff values 𝑘. These probabilities are used to infer the value and uncertainty in the RV location u not sampled. The estimated value of the numerical RV may be the mean or median of the distribution. The uncertainty Unc(u) can be determined by the variance 𝜎2=𝐸{(𝑍(𝐮)𝐸{(𝑍(𝐮)})2}, and this variance can be used to define confidence intervals of type𝑍𝑧Prob(𝐮)(𝐮)±2𝜎(𝐮)0.95(A.9) when the variable presents a symmetry level that allows to assume the normality hypothesis.

For distributions highly asymmetrical, a more robust measure is the interquartile range defined as the difference between the highest and lowest quartile:𝑄=𝑧0.75(𝐮)𝑧0.25(𝐮),where𝑧𝑝=𝐹1(𝐮,𝑝(𝑛)).(A.10)

From the inferred cdf, 𝐹(𝐮;𝑧(𝑛)), it is possible to derive various probability intervals such as the 95% interval [𝑞0.025; 𝑞0.975], such that,𝑍𝑞Prob(𝐮)0.025𝑞0.975(𝑛)=0,95(A.11) with 𝑞0.025 and 𝑞0.975 being the quantiles 0.025 and 0.975, respectively, of cdf. Attribute values referring to the quantiles are estimated from the fit function and of the cutoff values used in the indicator kriging.

Also, uncertainties can be estimated for ranges of attribute values. The probability of a value 𝑍(u) be within an interval (𝑎,𝑏] is computed as the difference between the values of cdf for the 𝑏 and 𝑎 thresholds:]Prob{𝑍(𝐮)(𝑎,𝑏(𝑛)}=𝐹(𝐮,𝑏(𝑛))𝐹(𝐮,𝑎(𝑛)).(A.12)

After model fittings, indicator kriging procedures were applied to obtain an approximation of the conditional distribution function of the random variables. Based on the estimated function, maps of mollusk spatial distributions along with the corresponding uncertainties for the entire state and also map of estimated prevalence of schistosomiasis were built.


This study was supported by CNPq (Grant nos. 300679/2011-4, 384571/2010-7, 302966/2009-9, and 308253/2008-6).