Abstract

Evaluating the quality of spatial databases is still very difficult, mainly because of the heterogeneity of the input data. We define a fuzzy process for evaluating the reliability of a spatial database: the area of study is partitioned into isoreliable zones, defined as zones that are homogeneous in terms of data quality and environmental characteristics. We model a spatial database as a collection of thematic datasets; each thematic dataset concerns a specific spatial domain and includes a set of layers. We estimate the reliability of each thematic dataset and, from these, the overall reliability of the spatial database. We have tested this method on the spatial database of the town of Cava de' Tirreni (Italy).

1. Introduction

Fuzzy rule-based models are applied in geographical information systems (GIS) [13], and we use our previous approach [4, 5] for estimating the reliability of spatial databases. There the concept of geodata “reliability” was introduced as a fuzzy measure of the quality of geodata, based on the analysis of the uncertainty and quality of the data. Specifically, in [5] the authors implement a tool called Fuzzy-SRA (Fuzzy Spatial Reliability Analysis) [6] for studying the reliability of the intrinsic vulnerability of aquifers by using the DRASTIC model, encapsulated in a GIS; in [4] Fuzzy-SRA is used for estimating the reliability of the aerophotogrammetric set of geographic layers of the island of Procida (near Naples, Italy); and in [7] Fuzzy-SRA is applied in a GIS tool implementing a fuzzy rule-based system for analyzing the eruption risk of the Vesuvius volcano.

As the first step, we need to divide the geographic area of study into isoreliable zones, that is, into zones having (quasi) homogeneous data quality and geographical characteristics. An expert sets the characteristics related to the quality of each layer (e.g., the percentage of uncoded spot elevation features). Each characteristic, called a “parameter,” is a measurable entity that could affect the quality of the dataset. After calculating the value of a parameter, a fuzzification process is applied for estimating the quality of the set of layers, where each fuzzy set is given by a triangular fuzzy number (TFN), which in turn is identified by a linguistic label. In other words, an isoreliable zone is a subarea of the area of study in which the quality of the geodata is homogeneous; that is, the values of the parameters are similar and the geographical characteristics are the same (e.g., flat country).

The expert creates a fuzzy partition for each parameter, labelling the TFNs with linguistic labels (see, e.g., Table 3). An isoreliable zone is associated with the linguistic label of the TFN for which the membership degree of the parameter is the highest. This process is iterated for each parameter of each layer. The elements of Fuzzy-SRA associated with each parameter are fuzzy attributes represented by a string. To clarify how these strings are composed, suppose, as an example, that we have partitioned the area of study into 5 isoreliable zones and that we create a fuzzy partition of the domain of discourse of the parameter into 6 fuzzy sets, each with its own linguistic label. After the fuzzification process, we associate each isoreliable zone with a TFN as shown in Table 1.
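The assignment of a zone to the label with the highest membership degree can be illustrated with a minimal sketch. The partition, the zone names, and the parameter values below are hypothetical and are not taken from Table 1 or Table 3; the class `TFN` and the function `best_label` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number with support [a, c] and central value b."""
    label: str
    a: float
    b: float
    c: float

    def membership(self, x: float) -> float:
        """Membership degree of x in this triangular fuzzy set."""
        if x == self.b:
            return 1.0
        if self.a < x < self.b:
            return (x - self.a) / (self.b - self.a)
        if self.b < x < self.c:
            return (self.c - x) / (self.c - self.b)
        return 0.0


def best_label(partition, value):
    """Label of the TFN in which `value` has the highest membership degree."""
    return max(partition, key=lambda t: t.membership(value)).label


# Hypothetical fuzzy partition of a parameter normalized to [0, 1].
PARTITION = [
    TFN("low", 0.0, 0.0, 0.5),
    TFN("medium", 0.0, 0.5, 1.0),
    TFN("high", 0.5, 1.0, 1.0),
]

# Hypothetical parameter values measured in three zones.
zone_values = {"Z1": 0.1, "Z2": 0.55, "Z3": 0.9}
zone_labels = {z: best_label(PARTITION, v) for z, v in zone_values.items()}
print(zone_labels)
```

In this sketch ties are broken in favour of the first TFN in the partition; the source does not say how ties are handled.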

A string is then created for the parameter, listing for each TFN the isoreliable zones associated with it; the symbol “−” indicates the absence of isoreliable zones to be associated with the corresponding TFN.
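One possible in-memory representation of such a string is an ordered list of (label, zones) pairs, with the empty class rendered as “−”. This is only a sketch: the label names `A1` to `A6`, the zone names, and the textual rendering are assumptions chosen for illustration, not the notation of Fuzzy-SRA.

```python
EMPTY = "−"  # symbol used in the text for a TFN with no associated zone


def build_string(labels, zone_labels):
    """Group zones by label, keeping the order of the fuzzy partition."""
    classes = {label: [] for label in labels}
    for zone, label in zone_labels.items():
        classes[label].append(zone)
    return [(label, sorted(classes[label])) for label in labels]


def render(string):
    """Hypothetical textual rendering of a string, one class per label."""
    parts = []
    for label, zones in string:
        parts.append(f"{label}:{'+'.join(zones) if zones else EMPTY}")
    return "  ".join(parts)


# Hypothetical example: 6 labels, 5 zones, two labels left without zones.
labels = ["A1", "A2", "A3", "A4", "A5", "A6"]
zone_labels = {"Z1": "A2", "Z2": "A2", "Z3": "A4", "Z4": "A5", "Z5": "A6"}

string = build_string(labels, zone_labels)
print(render(string))
```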

Now suppose that we partition the domain of a second attribute into four TFNs, obtaining a corresponding string. The two strings are combined by means of a new operation (details are given in [6] and in Section 2.1), referred to below as (2), which yields a new string.

In this string new TFNs, labeled as , are obtained as well. Generally, if (resp., ) is the number of TFNs for the string (resp., ), then the string contains () TFNs. The strings obtained for each layer are recombined as in (2) for obtaining a final string used for evaluating the reliability of the whole set of layers. For instance, we can consider (see [4] for details) four aerophotogrammetric layers as given in Table 2.

After calculating the strings for each layer, they are combined again by using operator (2) in order to obtain a final string. In this calculation a weight is associated with each layer, reflecting the role of that layer in the spatial database. For instance, a layer “buildings” can be more relevant than the layer “infrastructures”; in that case the quality of the first layer affects the reliability of the spatial database more than that of the second.

A further consideration is that the weight associated with an attribute can differ between isoreliable zones. For instance, the quality of the dataset “spot elevation” affects the reliability of the spatial database more in zones with steep slopes than in flat zones. After assigning the weights, the final string is recalculated as shown in Section 2.3. The reliability index assigned to each isoreliable zone is given by the central value of the TFN in which that zone appears in the final string.
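Reading the reliability index off a final string can be sketched as a simple lookup. The labels, central values, and zone names below are hypothetical placeholders, and the final string is represented as an ordered list of (label, zones) pairs, which is an assumption about the data layout rather than the format used by Fuzzy-SRA.

```python
# Hypothetical central values of the TFNs (not the values of Table 3).
CENTRAL_VALUE = {"poor": 0.2, "fair": 0.5, "good": 0.8}

# Hypothetical final string: each label with the zones that fall in its class.
final_string = [
    ("poor", ["Z4", "Z5"]),
    ("fair", ["Z3"]),
    ("good", ["Z1", "Z2"]),
]


def reliability_index(final_string, central_value):
    """Map each zone to the central value of the TFN whose class contains it."""
    return {
        zone: central_value[label]
        for label, zones in final_string
        for zone in zones
    }


index = reliability_index(final_string, CENTRAL_VALUE)
print(index)
```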

In our method we model a spatial database in three hierarchical levels: spatial database, thematic dataset, and layer. A spatial database is composed of several thematic datasets (e.g., geology, hydrology, etc.). Each thematic dataset contains several thematic layers (Figure 1).

Our method starts at the layer level, assigning the individual parameters to each layer. After determining the strings for each layer of a thematic dataset, they are combined with formula (2) to obtain the final string of the thematic dataset; this string is then recalculated by considering the weights assigned to each layer. After calculating the final string of each thematic dataset, we combine these strings by using formula (2) again and obtain the final string for the whole spatial database. This string is in turn recalculated by considering the weights assigned to each thematic dataset.

For each isoreliable zone, we then obtain the corresponding isoreliability index by taking the central value of the TFN related to that zone in the final string. We test our method on the spatial database of the town of Cava de’ Tirreni, near Salerno (Italy).

In Section 2 we present the algebraic structure given in [6]. Section 3 describes our method, Section 4 gives the results of our tests, and Section 5 presents our conclusions.

2. Definition of the Algebraic Structure

2.1. The Operations

We recall the main properties of the algebraic structure given in [6]. Let be the universe of discourse and an ordered -tuple of linguistic labels, each composed from one or more linguistic modifiers and a variable, as, for example, “ = False,” “ = More or Less Good,” …,“ = Good,” “ = Very Good,” …,“ = Completely Good,” and each represented by suitable TFNs denoted also by , (see, e.g., Table 3). Let be a fuzzy attribute, that is, a map , represented by a string of the following type: where is a subset of , also called “class” in the sequel. If , then we write . Let be another fuzzy attribute represented by the following string: where the used symbols have a similar meaning to the above ones. In accordance with [6], we define the operation between and by setting where, by assuming without loss of generality, the subsets are given from the following formulas for :

As suggested in [6], the subsets ci can be calculated by using a simple rule based on the usual arithmetical multiplication. The TFNs , for , are indeed given by with the above coefficients , for , defined by

The index (resp., ) represents the number of subsets (resp., ) of the string (resp., ) involved in the operation of union performed to obtain the subsets of the resulting fuzzy attribute , whereas the index (resp., ) stands for the total number of subsets of (resp., of ) involved in the operation of intersection which gives the subsets of .

2.2. The Weights of the Attributes

The first step, which precedes the above-mentioned operations on the strings, is to determine the weights of each attribute for a fixed zone, because the weights can vary from zone to zone. More precisely, the above model requires computing the mean of the weights of the zones that have the same linguistic label in an attribute. This mean is taken as the weight of that linguistic label, which in turn is multiplied by the middle point of the TFN representing the same label, giving a number , of which we consider the smallest integer contained in it, that is, . At the right of the same linguistic label, thus we create -linguistic labels “approximated” with the procedure of Section 2.3. For example, we consider six zones in which the fuzzy attribute has received six values with the related weights in accordance with Table 3. Then if , then the fuzzy attribute is represented by the following string: and consider the linguistic label . For simplicity, let us denote by W1i the weight of the attribute for the zones with . Then the mean value for is equal to 2, to be multiplied by 1.0 (cf. Table 2) giving which represents the number of new linguistic labels, inserted at the right of . Other new linguistic labels shall not be inserted at the right of the three remaining labels since we have, with evident meaning of the symbology, , , and obtaining , , and . Then we obtain the following finer string for the attribute :

This methodology has the advantage of refining the position of the objects (in our case study, the isoreliable zones) in the set of attributes, thanks to the new linguistic labels with which the objects can be associated. The membership functions of the TFNs representing the new linguistic labels are calculated as follows.

Let be the considered linguistic label present in the attribute and let be the number of the new linguistic labels obtained with the above procedure. Let be the linguistic label immediately following in the linguistic labels of . For every , we put and similarly for and . Then is the TFN representative of the linguistic label .
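The count of new linguistic labels described in this section (mean of the zone weights for a label, multiplied by the middle point of its TFN, then reduced to an integer) can be sketched as follows. Two points are assumptions: the phrase "the smallest integer contained in it" is read here as the integer part (floor), and the individual weights in the examples are hypothetical, chosen only so that the first case reproduces the mean of 2 and the middle point of 1.0 quoted in the text.

```python
import math
from statistics import mean


def new_label_count(zone_weights, tfn_midpoint):
    """Number of new labels inserted to the right of a linguistic label.

    zone_weights: weights of the attribute in the zones sharing the label.
    tfn_midpoint: middle point of the TFN representing that label.
    """
    return math.floor(mean(zone_weights) * tfn_midpoint)


# Mean weight 2 and middle point 1.0, as in the worked example of the text.
print(new_label_count([1, 3], 1.0))

# Hypothetical label with a smaller product: no new label is inserted.
print(new_label_count([1, 2], 0.5))
```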

2.3. Approximation of the Linguistic Labels

Some TFNs obtained in the final fuzzy attribute, after the successive composition of several strings, must be converted back into linguistic labels. They can be approximated by known TFNs using the following procedure.

Let be the TFN to be approximated and , TFNs known (i.e., the meaning of their linguistic labels is known) such that . By setting and if , then we put ; if , then we say is “Next To ” and we write ; if , then we say is “Included Between and ” and we write ; if , then we say is “Before To ” and we write ; if , then we put . For instance, taking in account the TFNs of Table 3, let as in Section 2.1. Since and , it is easily seen that .

We note that no matter of comparison between , , and and similarly for , , and is requested in this procedure.

3. Fuzzy Reliability for Spatial Databases

We model a spatial database as a three-level hierarchical structure, as shown in Figure 1. The spatial database is composed of thematic datasets, each referring to a specific spatial domain. Each thematic dataset is composed of layers, that is, of georeferenced vector or raster themes.

As in [4], we first apply the algebraic structure to the strings corresponding to each parameter and recalculate the final string considering the weights assigned to the parameters. We then reuse the operator of the algebraic structure, applying it to the final strings associated with the layers of a thematic dataset and recalculating the resulting string considering the weights associated with these layers. To obtain the reliability index of the spatial database, we apply the operator to the final strings associated with each thematic dataset and recalculate the resulting string considering the weights assigned to each thematic dataset. In this way we estimate the reliability of the spatial database in each isoreliable zone, applying the calculus of the algebraic structure described in Section 2 at each level of our spatial database model. The individual steps of our method are as follows.

(1) The domain expert partitions the area of study into isoreliable zones; each isoreliable zone is a geographical area that is homogeneous in terms of data quality and environmental characteristics.
(2) For each layer the parameters, that is, the observables that affect the quality of the layer, are identified and assigned; for each isoreliable zone, the labels of the corresponding TFNs and the weights of each parameter are assigned as well. The expert creates a fuzzy partition in TFNs of the domain of each parameter.
(3) For each layer the operator of the algebraic structure [6] is applied to the parameters, obtaining a final string that is recalculated (as described in Section 2.2) by considering the weights assigned to the same parameters. We obtain the index of reliability of each layer; we call the map of this index the reliability map of the layer.
(4) In each thematic dataset its layers are treated as parameters; the string associated with a layer is the final string calculated for this layer. For each isoreliable zone the expert assigns the weight of each layer, considering the impact of the layer on the quality of its thematic dataset.
(5) For each thematic dataset the operator of the algebraic structure [6] is applied to the related strings, obtaining a final string that is recalculated (as described in Section 2.2) by considering the weights assigned to the same layers. We obtain the index of reliability and the corresponding reliability map of the thematic dataset.
(6) The thematic datasets of the spatial database are now treated as parameters; the string associated with a thematic dataset is the final string calculated for this thematic dataset. The expert assigns, for each isoreliable zone, the weight of each thematic dataset, considering the impact of the thematic dataset on the quality of the spatial database.
(7) The operator of the algebraic structure [6] is applied to the strings assigned to the thematic datasets, obtaining a final string that is recalculated (as described in Section 2.2) by considering the weights assigned to the same thematic datasets. We obtain the index of reliability and the related reliability map of the spatial database.
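The bottom-up flow of steps (3) to (7) can be outlined as the same fold applied at three levels. The sketch below is only structural: `combine` and `reweigh` are parameters standing in for the operator of the algebraic structure [6] and for the weight-based recalculation of Section 2.2, neither of which is implemented here, and all class, function, and data names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List


@dataclass
class Layer:
    name: str
    parameter_strings: List[list]  # one string per parameter of the layer


@dataclass
class ThematicDataset:
    name: str
    layers: List[Layer]


@dataclass
class SpatialDatabase:
    name: str
    datasets: List[ThematicDataset]


def evaluate(db: SpatialDatabase, combine: Callable, reweigh: Callable):
    """Combine strings level by level: parameters, layers, thematic datasets."""

    def fold(strings, level, name):
        # Combine all strings of one level, then recalculate with the weights.
        return reweigh(reduce(combine, strings), level, name)

    dataset_strings = []
    for ds in db.datasets:
        layer_strings = [
            fold(layer.parameter_strings, "layer", layer.name)
            for layer in ds.layers
        ]
        dataset_strings.append(fold(layer_strings, "dataset", ds.name))
    return fold(dataset_strings, "database", db.name)


# Stand-ins that only trace the order of the calls; they are NOT the
# operator of [6] nor the weighting procedure of Section 2.2.
def trace_combine(s, t):
    return s + t


def trace_reweigh(s, level, name):
    return s + [f"{level}:{name}"]


db = SpatialDatabase(
    "demo",
    [ThematicDataset("cadastral", [Layer("parcels", [["p1"], ["p2"]])])],
)
result = evaluate(db, trace_combine, trace_reweigh)
print(result)
```

Passing the two operations as functions keeps the hierarchy traversal independent of how the strings are actually combined and reweighted.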

Section 4 presents the results obtained by applying our method, based on the tool Fuzzy-SRA, to a spatial database.

4. Test Results

In our tests the area of study is the town of Cava de’ Tirreni (Italy). Considering the data quality and the environmental and climatic characteristics, the area of study is partitioned into five isoreliable zones as shown in Figure 2.

We consider the most significant thematic datasets and layers of the spatial database, for a total of 5 thematic datasets. Table 5 shows the thematic datasets and the parameters chosen for each layer.

In choosing the parameters, particular attention was paid to the absence of primary information connected to geographical entities (e.g., the height of a building or of a spot elevation point).

Other characteristics that affect the quality of a layer are geometric and topological errors (e.g., isolated street lines or parcels that intersect each other).

To simplify the calculation, we create for each parameter a fuzzy partition in five TFNs, labeled as shown in Table 3. For brevity we show only the TFNs set for the parameter “density of the isolated lines” of the layer 1.1 (Street) and for the parameter “density of intersecting polygons” of the layer 2.1 (Terrain parcels).

For each isoreliable zone the weights of the parameters are assigned by an expert. For brevity, we show only the weights assigned to the two previous parameters for each isoreliable zone.

After combining the strings related to each parameter of a layer, we obtain a final string that is recalculated considering the weights assigned to the same parameters. By using this string we obtain the reliability map of the layer. For brevity, considering Tables 4, 6, 7, 8, 9, we show the isoreliability maps for the layers 1.1 (Figure 3) and 2.1 (Figure 4) with final string, respectively, given by

In these maps we note that the two less reliable zones are and ; in fact, in these zones the data are imprecise. After obtaining the final string for each layer of a thematic dataset, we combine them for obtaining the final string for the thematic dataset. Then we recalculate this final string by considering the weights assigned to the layers for each isoreliable zone. In Table 10 we show the weights assigned to each layer for the isoreliable zone .

Figure 5 shows the thematic map of the thematic dataset “Aerial photogrammetric data.” We consider as isoreliability values the central values of the TFN formed in the final string.

In Figure 6 we show the thematic map of the thematic dataset “Cadastral data.”

The two reliability maps show that in the isoreliable zones and the quality of the data is poor. This result is confirmed for all the thematic datasets. In Table 11 we show the reliability values obtained for the five thematic datasets in each isoreliable zone.

After calculating the final strings for each thematic dataset, we combine them to obtain the final string of the spatial database of Cava de’ Tirreni (Italy). This final string is recalculated by considering the weights assigned to the thematic datasets for each isoreliable zone. The weights assigned for the five isoreliable zones are shown in Table 12.

The weights of the thematic datasets 3 and 4 for the isoreliable zones and are different with respect to the ones assigned for the isoreliable zones , , and . Indeed in the isoreliable zones and the terrain slopes are significant and there are many hydrographic features. Finally in Figure 7 we show the reliability map for the spatial database.

The results in Figure 7 confirm the previous ones corresponding to the single thematic datasets. The reliability index is good in the isoreliable zones and , fairly good in the isoreliable zone , and poor in the isoreliable zones and .

5. Conclusions

Evaluating the reliability of a spatial database is a complex problem because of the lack of homogeneity of the spatial datasets and the variation of data quality across the area of study. A fuzzy logic approach is therefore well suited to measuring the quality of spatial information. In this research we adopt the fuzzy algebraic structure [6] and the fuzzy reliability method applied in [4, 5] for evaluating the reliability of spatial datasets, in order to estimate the reliability of a whole spatial database.

We structure the spatial database in three hierarchical levels, evaluating the reliability of the single layers, of the thematic datasets, and finally of the spatial database. We test our method on the spatial database of Cava de’ Tirreni (Italy). An expert identifies the isoreliable zones and assigns the weights to each parameter, to the layers, and to the thematic datasets. We present the results obtained and the final reliability map of the spatial database on the area of study.

Acknowledgment

This work was performed in the context of the Project FARO 2010–2013 under the auspices of the “Polo delle Scienze e delle Tecnologie” dell’Università degli Studi di Napoli Federico II.