Hotspots Detection in Spatial Analysis via the Extended Gustafson-Kessel Algorithm

Di Martino, Ferdinando; Sessa, Salvatore

doi:https://doi.org/10.1155/2013/876073

Advances in Fuzzy Systems


Special Issue

Fuzzy Methods and Approximate Reasoning in Geographical Information Systems


Research Article | Open Access

Volume 2013 | Article ID 876073 | https://doi.org/10.1155/2013/876073


Ferdinando Di Martino¹ and Salvatore Sessa¹,²

Academic Editor: Sabrina Senatore

Received 31 Oct 2013

Accepted 10 Nov 2013

Published 09 Dec 2013

Abstract

We present a new approach for detecting hotspots in spatial analysis based on the extended Gustafson-Kessel clustering method encapsulated in a Geographic Information System (GIS) tool. In the bidimensional case this algorithm gives ellipses as cluster prototypes, which we treat as hotspots on the geographic map and whose spatiotemporal evolution we study. The data consist of georeferenced patterns corresponding to the positions of Taliban attacks against civilians and soldiers in Afghanistan during the period 2004–2010. We analyze the formation of new hotspots through time, the movement of the related centroids, and the variation of the surface covered, the inclination angle, and the eccentricity of each hotspot.

1. Introduction

Hotspot detection is a known spatial clustering process whose goal is to detect the spatial areas in which specific events concentrate [1]; the patterns are the events georeferenced as points on the map, and the features are the geographical coordinates (latitude and longitude) of each event. Hotspot detection is used in many disciplines: in crime analysis [2–4], for analyzing where crimes occur with a certain frequency; in fire analysis [5], for studying the phenomenon of forest fires; and in disease analysis [6–9], for studying the localization and the foci of diseases. Generally speaking, density-based algorithms [10, 11], which measure the spatial distribution of patterns over the area of study, are used to detect the geometrical shapes of hotspot areas more accurately, but these algorithms have a high computational complexity.

In [5, 12, 13] a new hotspot detection method based on the extended fuzzy C-means algorithm (EFCM) [14, 15] was proposed; EFCM is a variation of the well-known fuzzy C-means (FCM) algorithm that detects cluster prototypes as hyperspheres. With respect to the FCM algorithm, the EFCM algorithm has the advantages of determining the optimal number of clusters iteratively and of being robust in the presence of noise and outliers. In [5, 12, 13] the EFCM is encapsulated in a GIS tool for detecting hotspots as circles displayed on the map. The event dataset is partitioned according to the time of each event's detection, so that each subset corresponds to a specific time interval. The authors compare the hotspots obtained in two consecutive years by studying their intersection on the map. In this way it is possible to follow the evolution of a particular phenomenon by studying how its incidence shifts and spreads through time.

In this paper we present a new hotspot detection method based on the extended Gustafson-Kessel algorithm (EGK) [14, 15] for studying the spatiotemporal evolution of hotspots. Our aim is to improve the shape of the hotspots, maintaining a good computational complexity: indeed the EGK algorithm gives the cluster prototypes as hyperellipsoids and ellipses in the bidimensional case. The EGK algorithm is an extension of the Gustafson-Kessel (GK) algorithm [16] which we briefly present.

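Let $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^n$ be a dataset composed of $N$ patterns $x_j = (x_{1j}, \ldots, x_{nj})^T$, where $x_{kj}$ is the $k$th component (feature) of the pattern $x_j$. The GK algorithm minimizes the following objective function:

$$J(X, U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2}, \tag{1}$$

where $C$ is the number of clusters fixed a priori, $u_{ij}$ is the membership degree of the pattern $x_j$ to the $i$th cluster ($i = 1, \ldots, C$), $V = \{v_1, \ldots, v_C\} \subset \mathbb{R}^n$ is the set of points given by the centroids of the $C$ clusters, $m$ is the fuzzifier parameter, and $d_{ij}$ is the distance between $v_i$ and $x_j$. The general form of this distance is given by

$$d_{ij}^{2} = (x_j - v_i)^T A_i (x_j - v_i), \tag{2}$$

where $A_i$ is the norm matrix, defined to be symmetric and positive definite. In the FCM algorithm $A_i$ is equal to the identity matrix $I$. In the GK algorithm the following Mahalanobis distance [17] is used:

$$d_{ij}^{2} = (x_j - v_i)^T \left[ \det(P_i)^{1/n} P_i^{-1} \right] (x_j - v_i), \tag{3}$$

where $P_i$ is the covariance matrix of the $i$th cluster given by

$$P_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} (x_j - v_i)(x_j - v_i)^T}{\sum_{j=1}^{N} u_{ij}^{m}}. \tag{4}$$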
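As an illustration of the distance used in the objective function (1), the following minimal Python sketch covers the bidimensional case only. It assumes the standard GK norm matrix $\det(P_i)^{1/n} P_i^{-1}$ with $n = 2$; the function names are hypothetical and are not taken from the GIS tool described in this paper.

```python
def fuzzy_covariance(points, memberships, centroid, m=2.0):
    """Fuzzy covariance matrix of one cluster for 2-D points (x, y)."""
    sxx = sxy = syy = total = 0.0
    for (x, y), u in zip(points, memberships):
        w = u ** m                      # membership raised to the fuzzifier
        dx, dy = x - centroid[0], y - centroid[1]
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
        total += w
    return ((sxx / total, sxy / total), (sxy / total, syy / total))


def gk_squared_distance(point, centroid, cov):
    """Squared GK distance det(P)^(1/n) * (x-v)^T P^-1 (x-v), with n = 2."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = point[0] - centroid[0], point[1] - centroid[1]
    # (x-v)^T P^-1 (x-v) written out for a symmetric 2x2 matrix
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return (det ** 0.5) * quad
```

With the identity matrix as covariance the function returns the squared Euclidean distance, which is the FCM case mentioned above.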

The covariance matrix $P_i$ provides information about the shape and orientation of the cluster. The length of the $k$th axis of the hyperellipsoid is given by the square root of the $k$th eigenvalue $\lambda_{ik}$ of $P_i$. The directions of the axes of the hyperellipsoid are given by the directions of the eigenvectors of the matrix $P_i$. In Figure 1 we show an example of an ellipsoidal cluster prototype.
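For the bidimensional case, the relation between the covariance matrix and the ellipse can be sketched as follows. This is a minimal illustration using the closed-form eigen-decomposition of a symmetric 2x2 matrix; the function name and the convention of measuring the angle counterclockwise from the horizontal axis are assumptions of this sketch.

```python
import math


def ellipse_from_covariance(cov):
    """Return (semi_major, semi_minor, angle_degrees) for a symmetric 2x2 matrix."""
    (a, b), (_, d) = cov
    mean = (a + d) / 2.0
    root = math.hypot((a - d) / 2.0, b)
    lam_major = mean + root              # larger eigenvalue
    lam_minor = max(mean - root, 0.0)    # smaller eigenvalue, clipped at zero
    # Direction of the eigenvector associated with the larger eigenvalue
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return math.sqrt(lam_major), math.sqrt(lam_minor), math.degrees(angle)
```

For example, the matrix ((2, 1), (1, 2)) has eigenvalues 3 and 1, so the sketch returns semiaxes of length √3 and 1 with the major axis inclined at 45 degrees.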

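Using the Lagrange multipliers for minimizing the objective function (1), we obtain the following solution for the centroids of each cluster prototype:

$$v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} x_j}{\sum_{j=1}^{N} u_{ij}^{m}}, \tag{5}$$

where, for $i = 1, \ldots, C$ and $j = 1, \ldots, N$, the membership degrees are given by

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}. \tag{6}$$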

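Initially the $u_{ij}$'s and the $v_i$ are assigned randomly and updated in each iteration. If $U^{(l)}$ is the matrix $U = [u_{ij}]$ calculated at the $l$th step, the iterative process stops when

$$\max_{i,j} \left| u_{ij}^{(l)} - u_{ij}^{(l-1)} \right| < \varepsilon, \tag{7}$$

where $\varepsilon$ is a prefixed parameter.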
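The alternating update of centroids and membership degrees, together with the stopping test, can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the GIS tool: the function names are hypothetical, the stopping test assumes the largest absolute change of a membership degree as the norm, and a pattern at zero distance from one or more centroids is assumed to share its membership equally among them.

```python
def update_centroids(points, u, m=2.0):
    """Weighted means of the patterns, with weights u_ij ** m."""
    centroids = []
    for row in u:
        w = [uij ** m for uij in row]
        total = sum(w)
        centroids.append(tuple(
            sum(wj * p[k] for wj, p in zip(w, points)) / total
            for k in range(len(points[0]))
        ))
    return centroids


def update_memberships(sq_dist, m=2.0):
    """sq_dist[i][j] is the squared distance of pattern j from centroid i."""
    c, n = len(sq_dist), len(sq_dist[0])
    u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        zeros = [i for i in range(c) if sq_dist[i][j] == 0.0]
        if zeros:                       # pattern coincides with a centroid
            for i in zeros:
                u[i][j] = 1.0 / len(zeros)
            continue
        inv = [sq_dist[i][j] ** (-1.0 / (m - 1.0)) for i in range(c)]
        total = sum(inv)
        for i in range(c):
            u[i][j] = inv[i] / total
    return u


def converged(u_new, u_old, eps=0.01):
    """Stop when no membership degree changed by eps or more."""
    change = max(abs(a - b)
                 for row_new, row_old in zip(u_new, u_old)
                 for a, b in zip(row_new, row_old))
    return change < eps
```

Each column of the matrix returned by `update_memberships` sums to 1, so every pattern distributes a total membership of 1 over the clusters.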

This algorithm is sensitive to the presence of outliers and noise, and the number of clusters is fixed a priori; as in the FCM algorithm, a validity index is needed to determine an optimal value for the number of clusters $C$. To overcome these shortcomings, the EGK algorithm, a variation of the GK algorithm, was proposed in [14, 15]: there the optimal number of clusters is obtained during the iteration process. Furthermore, the EGK algorithm is robust with respect to the presence of noise and outliers.

In this paper we propose a new approach based on the EGK clustering method for detecting hotspots and studying their spatiotemporal evolution. In the bidimensional case we obtain ellipses, which approximate hotspot areas better than the circular areas produced by the EFCM method.

Figure 2 shows an example of two intersecting elliptical hotspots, obtained as clusters detected by means of EGK method in two consecutive periods.

Denoting by $A$ the hotspot detected in the earlier period and by $B$ the hotspot detected in the later period, Figure 2 shows three different regions:
(i) the area in which the hotspot $A$ is not intersected by the hotspot $B$ (corresponding to $A - B$): this region can be considered as a set of geographical areas in which the previously detected event subsequently disappears;
(ii) the area of intersection $A \cap B$: this area can be considered a geographical area in which the event persists in the course of time;
(iii) the area in which the hotspot $B$ is not intersected by the hotspot $A$ (corresponding to $B - A$): this region can be considered as a set of geographical areas, not previously affected, into which the event subsequently propagates.
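The areas of the three regions can be estimated numerically. The sketch below is a minimal illustration that counts the cells of a regular grid whose centers fall in each region; it assumes planar (projected) coordinates, and the tuple layout `(cx, cy, semi_major, semi_minor, angle_degrees)` and the function names are hypothetical.

```python
import math


def inside(ellipse, x, y):
    """True if (x, y) lies in ellipse = (cx, cy, semi_major, semi_minor, angle_degrees)."""
    cx, cy, a, b, deg = ellipse
    t = math.radians(deg)
    dx, dy = x - cx, y - cy
    xr = dx * math.cos(t) + dy * math.sin(t)    # rotate into the ellipse frame
    yr = -dx * math.sin(t) + dy * math.cos(t)
    return (xr / a) ** 2 + (yr / b) ** 2 <= 1.0


def region_areas(old, new, steps=300):
    """Grid estimate of the areas covered by old only, by both, and by new only."""
    reach_old, reach_new = max(old[2], old[3]), max(new[2], new[3])
    xmin = min(old[0] - reach_old, new[0] - reach_new)
    xmax = max(old[0] + reach_old, new[0] + reach_new)
    ymin = min(old[1] - reach_old, new[1] - reach_new)
    ymax = max(old[1] + reach_old, new[1] + reach_new)
    hx, hy = (xmax - xmin) / steps, (ymax - ymin) / steps
    counts = {"old_only": 0, "shared": 0, "new_only": 0}
    for i in range(steps):
        x = xmin + (i + 0.5) * hx               # cell center
        for j in range(steps):
            y = ymin + (j + 0.5) * hy
            in_old, in_new = inside(old, x, y), inside(new, x, y)
            if in_old and in_new:
                counts["shared"] += 1
            elif in_old:
                counts["old_only"] += 1
            elif in_new:
                counts["new_only"] += 1
    return {name: count * hx * hy for name, count in counts.items()}
```

The ratio `shared / (shared + old_only)` then gives the fraction of the earlier hotspot that is still covered in the later period. A finer grid (larger `steps`) reduces the estimation error at the cost of running time.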

We can study the spatiotemporal evolution of the hotspots by analyzing the interactions between elliptical hotspots detected for consecutive periods, by verifying the presence of clusters in areas in which clusters have not yet been detected previously and the disappearance of clusters in areas previously covered by hotspots.

In this research we present a method for studying the spatiotemporal evolution of hotspot areas of war in Afghanistan; we apply the EGK algorithm to compare the event datasets of consecutive years, corresponding to the positions of Taliban attacks against civilians and soldiers. Each event corresponds to the geolocalization of the site where an attack happened.

We study the spatiotemporal evolution of the hotspots by analyzing the intersections of hotspots corresponding to two consecutive years, the displacement of the centroids, the increase or reduction of the hotspots areas, and the emergence of new hotspots.

In Section 2 we give an overview of the EGK algorithm. In Section 3 we present our method for studying the spatiotemporal evolution of hotspots in spatial analysis. In Section 4 we present the results of the spatiotemporal evolution of hotspots. Our conclusions are contained in Section 5.

2. The EGK Algorithm

In the EGK algorithm we consider clustering prototypes given by hyperellipsoids in the $n$-dimensional feature space. The $i$th hyperellipsoidal cluster prototype $V_i$ is characterized by a centroid $v_i$ and a mean radius $r_i$, and we say that $x_j$ belongs to $V_i$ if $d_{ij} \le r_i$.

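The radius $r_i$ is obtained considering the covariance matrix $P_i$ of the $i$th cluster, defined by (4), whose determinant gives the volume of the $i$th cluster. Since $P_i$ is symmetric and positive, it can be decomposed in the form

$$P_i = Q_i \Lambda_i Q_i^T, \tag{8}$$

where $Q_i$ is an orthonormal matrix and $\Lambda_i = \operatorname{diag}(\lambda_{i1}, \ldots, \lambda_{in})$ is a diagonal matrix. The mean radius $r_i$ is given as [15]

$$r_i = \sqrt{\prod_{k=1}^{n} \lambda_{ik}^{1/n}}. \tag{9}$$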
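The membership test of a pattern in a hyperellipsoidal prototype can be illustrated for the bidimensional case. The sketch below assumes the volume-based mean radius $r_i = \det(P_i)^{1/(2n)}$, that is, the square root of the geometric mean of the eigenvalues of $P_i$ as in [15], together with the GK distance $\det(P_i)^{1/n}(x_j - v_i)^T P_i^{-1} (x_j - v_i)$; both are assumptions of this sketch, and the function names are hypothetical.

```python
def mean_radius_2d(cov):
    """Mean radius det(P)^(1/(2n)) for n = 2 (assumed volume-based definition)."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    return det ** 0.25


def belongs(point, centroid, cov):
    """True if the GK distance of the point from the centroid is at most the mean radius."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - centroid[0], point[1] - centroid[1]
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    sq_dist = (det ** 0.5) * quad            # squared GK distance, n = 2
    return sq_dist <= mean_radius_2d(cov) ** 2
```

Under these assumptions the condition reduces to $(x_j - v_i)^T P_i^{-1} (x_j - v_i) \le 1$, so the prototype is the ellipse whose semiaxes are the square roots of the eigenvalues of $P_i$.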

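In the EGK algorithm the objective function to be minimized is

$$J(X, U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \left( d_{ij}^{2} - r_i^{2} \right), \tag{10}$$

where $d_{ij}$ is given by the Mahalanobis distance (3). The solutions obtained for the centroids are given by formula (5) and the functions $u_{ij}$ are obtained by

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{d_{ij}^{2} - r_i^{2}}{d_{kj}^{2} - r_k^{2}} \right)^{1/(m-1)}}, \tag{11}$$

where $d_{ij} > r_i$ for $i = 1, \ldots, C$. By setting $\Phi_j = \{ i \in \{1, \ldots, C\} : d_{ij} \le r_i \}$ and $\varphi_j = \operatorname{card}(\Phi_j)$ for any $j = 1, \ldots, N$, we have that formula (11) holds if $\varphi_j = 0$, while if $\varphi_j > 0$, one defines $u_{ij}$ in the following way:

$$u_{ij} = \begin{cases} 0 & \text{if } i \notin \Phi_j, \\ 1/\varphi_j & \text{if } i \in \Phi_j. \end{cases} \tag{12}$$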

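Formula (12) has the negative effect of diminishing the objective function (10) when a meaningful number of patterns are placed in a cluster; this effect can prevent the separation of the clusters. To solve this problem, in [15] one starts with small cluster radii and enlarges them gradually through a factor $\beta^{(l)}$ defined recursively from the number of clusters $C^{(l)}$ at the $l$th iteration ($C^{(0)}$ is the initial number of clusters). By setting

$$I_{ik} = \frac{\sum_{j=1}^{N} \min(u_{ij}, u_{kj})}{\sum_{j=1}^{N} u_{ij}}, \tag{13}$$

one defines the symmetric matrix $S = [S_{ik}]$, where $S_{ik} = \max(I_{ik}, I_{ki})$. If $S^{(l)}$ is the matrix $S$ at the $l$th iteration ($l = 1, 2, \ldots$) and the threshold $\alpha^{(l)} = 1/(C^{(l)} - 1)$ is introduced, two indexes $i^*$ and $k^*$ are determined such that $S^{(l)}_{i^*k^*} = \max_{i \ne k} S^{(l)}_{ik}$. If $S^{(l)}_{i^*k^*} > \alpha^{(l)}$, these clusters are merged by setting, for any $j = 1, \ldots, N$,

$$u^{(l)}_{i^*j} = u^{(l)}_{i^*j} + u^{(l)}_{k^*j}; \tag{14}$$

thus, the $k^*$th row can be removed from the matrix $U^{(l)}$. In conclusion the following steps hold for the EGK algorithm.
(1) The user assigns the initial parameters, among them the initial number of clusters $C^{(0)}$, the fuzzifier $m$ (usually $m = 2$), and the stopping threshold $\varepsilon$.
(2) The membership degrees $u_{ij}$ are fixed randomly.
(3) The centroids $v_i$ and the radii $r_i$ are calculated with formulae (5) and (9), respectively.
(4) The membership degrees $u_{ij}$ are calculated with formulae (11) and (12).
(5) Determine $i^*$ and $k^*$ such that $S^{(l)}_{i^*k^*}$ is maximal.
(6) If $S^{(l)}_{i^*k^*} > \alpha^{(l)}$, then the $i^*$th and $k^*$th clusters are merged via formula (14) and the $k^*$th row is deleted from $U^{(l)}$.
(7) If formula (7) is satisfied, then the process stops; otherwise go to step (3) for the $(l+1)$th iteration.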
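The merging step can be sketched in Python under explicitly stated assumptions. The similarity used here is the inclusion-based measure of the adaptive cluster merging scheme in [15] (the summed minimum of two membership rows divided by the sum of one row, made symmetric by taking the larger of the two ratios), and the threshold is assumed to be $1/(C - 1)$ for $C$ current clusters. Neither choice is confirmed by the text above, the function names are hypothetical, and every cluster is assumed to have a nonzero total membership.

```python
def similarity(u, i, k):
    """Symmetric inclusion-based similarity of clusters i and k (assumed form)."""
    overlap = sum(min(a, b) for a, b in zip(u[i], u[k]))
    return max(overlap / sum(u[i]), overlap / sum(u[k]))


def merge_step(u):
    """Merge the most similar pair of rows of u if it exceeds the threshold.

    Returns (new_u, merged_pair); merged_pair is None when nothing is merged.
    """
    c = len(u)
    if c < 2:
        return u, None
    best, pair = max(
        (similarity(u, i, k), (i, k))
        for i in range(c) for k in range(i + 1, c)
    )
    if best <= 1.0 / (c - 1):            # assumed adaptive threshold
        return u, None
    i, k = pair
    merged = [a + b for a, b in zip(u[i], u[k])]
    # Row i receives the summed memberships, row k is removed
    new_u = [merged if r == i else row for r, row in enumerate(u) if r != k]
    return new_u, pair
```

Because the two rows are added, the memberships of every pattern still sum to 1 after a merge, and the number of clusters decreases by one.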

3. Hotspots Detection and Evolution in Military War

Each pattern is given by the event corresponding to a place in which an attack has occurred. The two features of the pattern are the geographic coordinates of this place.

We divide the event dataset into subsets corresponding to the events that occurred in a specific year or set of years. For each subset of events we apply the EGK algorithm to detect the final cluster prototypes.
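The partitioning of the event dataset can be sketched as follows. This is a minimal illustration: the period boundaries are those used in the test section of this paper, while the event layout `(year, longitude, latitude)` and the function name are assumptions of the sketch and do not reflect the actual schema of the source dataset.

```python
from collections import defaultdict

# Periods used in the tests reported in Section 4
PERIODS = [(2004, 2006), (2007, 2007), (2008, 2008), (2009, 2009), (2010, 2010)]


def split_by_period(events, periods=PERIODS):
    """Group (year, longitude, latitude) events into one point list per period."""
    subsets = defaultdict(list)
    for year, lon, lat in events:
        for first, last in periods:
            if first <= year <= last:
                subsets[(first, last)].append((lon, lat))
                break                     # events outside every period are dropped
    return dict(subsets)
```

Each resulting list of coordinate pairs is then clustered separately, so that the prototypes of consecutive periods can be compared.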

The dataset is extracted from the URL http://www.acleddata.com/data/asia/; the data are the geolocalizations of Taliban’s attacks in Afghanistan during the period 2004–2010. The EGK algorithm is encapsulated in the ESRI/ArcGIS tool. Figure 3 shows the mask used for setting the parameters and running the EGK algorithm.

We can select other numerical fields to add further features to the geographical coordinates. Initially we set the initial number of clusters, the fuzzifier (equal to 2 by default), and the error threshold for stopping the iterations (equal to 0.01 by default). At the end of the process the form displays the number of iterations, the final number of clusters, and the error calculated at the last iteration. The resultant clusters are shown as ellipses on the geographical map and can be saved in a new geographical layer.

In Figure 4 we show the mask used for displaying the information of each elliptical prototype detected: centroid’s coordinates, length of each semiaxis, and orientation of the ellipses with respect to the horizontal plane on the geographical map.

The final process concerns the comparative analysis of the hotspots obtained by the final clusters resulting for each subset of events. Figure 5 shows an example of the display of hotspots obtained as final clusters corresponding to three consecutive years.

In order to assess the expansion and the displacement of any hotspot, we measure the area covered by each hotspot, the distance between the centroids of two intersecting hotspots detected in consecutive periods, and the variation of the inclination angle, the eccentricity, and the length of both semiaxes.
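These measures follow from elementary formulas, sketched below in Python. The area and eccentricity are the standard expressions for an ellipse with given semiaxes; the centroid distance uses the haversine formula on a spherical Earth of radius 6371 km, which is an assumption of this sketch and not necessarily how the GIS tool computes distances. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def ellipse_area(semi_a, semi_b):
    """Area of an ellipse from its two semiaxes."""
    return math.pi * semi_a * semi_b


def eccentricity(semi_a, semi_b):
    """Eccentricity in [0, 1): 0 for a circle, close to 1 for an elongated ellipse."""
    major, minor = max(semi_a, semi_b), min(semi_a, semi_b)
    return math.sqrt(1.0 - (minor / major) ** 2)


def centroid_distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two centroids given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
```

The semiaxes passed to `ellipse_area` must be expressed in kilometers for the area to come out in km².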

4. Test Results

After partitioning the dataset in the five periods 2004–2006, 2007, 2008, 2009, and 2010, respectively, we apply the EGK algorithm for detecting the sequences of elliptical cluster prototypes. We fix , , and . Table 1 shows the results obtained for each period.

We present the details relating to the comparison of the hotspots by considering the event data that occurred in the five periods. In Figures 6, 7, 8, 9, and 10 we show the hotspots detected.

By analyzing Figures 6–8 we can deduce that seven hotspot areas, approximated as ellipses, are present in the period 2004–2008; across these periods each hotspot changed its angle, width, and centroid position only slightly. In the years 2009 and 2010 a new hotspot is detected in a region bordering Turkmenistan. The hotspots obtained for the two consecutive years 2009 and 2010 are also overlaid: in blue (resp., red) we enumerate the hotspots corresponding to the year 2009 (resp., 2010); see Figure 11. The hotspots are labeled, and hotspot number 8 is the newly detected one, obtained from the overlap of the related hotspots of the two years.

In Table 2 the first column shows the labels of each hotspot; the second and third columns show the area, in km², of the hotspot detected in 2009 and 2010, respectively. The fourth column (resp., fifth) shows the intersection area of the two hotspots (resp., the percentage of area of the hotspot detected in 2009 covered by the corresponding hotspot detected in 2010, that is, the ratio “intersection area/area hotspot detected in 2009”).

The results in Table 2 show that over 65% of the area of each hotspot detected in 2009 is also covered by the corresponding hotspot detected in 2010. Another significant result is the increase of the area of the hotspot 8, which exceeds 2 × 10⁴ km² in 2010. In Table 3 we show the eccentricity of each hotspot and the distance between the centroids of each hotspot detected in 2009 and the corresponding one detected in 2010.

The results show that the eccentricity increases significantly in 2010 for hotspots 4 and 6, whereas it decreases for hotspot 3; the eccentricity remains almost unchanged for the remaining hotspots in 2010. Another significant result is the distance exceeding 40 km between the centroid of hotspot 4 detected in 2009 and the centroid of the corresponding hotspot detected in 2010.

5. Conclusions

We present a new approach for detecting hotspots in spatial analysis using the EGK clustering method encapsulated in a GIS tool. Like the EFCM algorithm, the EGK method is robust with respect to noise and outliers, and the optimal number of clusters is obtained iteratively during the process; furthermore, it has the advantage of detecting hotspots of elongated shape. In our experiments we consider the sites of Taliban attacks in Afghanistan during the period 2004–2010. The spatial dataset is partitioned into subsets in order to study the evolution of the hotspots through time. We study the evolution of each hotspot in terms of movement of the centroids, surface covered, inclination, and eccentricity. The results show the formation, starting from 2009, of a new hotspot in the north-western zone bordering Turkmenistan. The comparison of the hotspots detected in 2009 and 2010 shows that this hotspot has grown to an extent of about 2 × 10⁴ km².

Acknowledgment

This work was performed in the context of the project FARO 2010–2013 under the auspices of the “Polo delle Scienze e delle Tecnologie” of Università degli Studi di Napoli Federico II, Napoli, Italy.

References

[1] T. H. Grubesic and A. T. Murray, “Detecting hotspots using cluster analysis and GIS,” Annual Conference of CMRC, Dallas, 2001, http://www.ojp.usdoj.gov/cmrc.
[2] S. P. Chainey, S. Reid, and N. Stuart, “When is a hotspot a hotspot? A procedure for creating statistically robust hotspot geographic maps of crime,” in Innovations in GIS 9: Socioeconomic Applications of Geographic Information Science, D. Kidner, G. Higgs, and S. White, Eds., Taylor and Francis, London, UK, 2002.
[3] K. Harries, Geographic Mapping Crime: Principle and Practice, National Institute of Justice, Washington, DC, USA, 1999.
[4] A. T. Murray, I. McGuffog, J. S. Western, and P. Mullins, “Exploratory spatial data analysis techniques for examining urban crime,” British Journal of Criminology, vol. 41, no. 2, pp. 309–329, 2001.
[5] F. Di Martino and S. Sessa, “The extended fuzzy C-means algorithm for hotspots in spatio-temporal GIS,” Expert Systems with Applications, vol. 38, no. 9, pp. 11829–11836, 2011.
[6] M. R. Barillari, U. E. Barillari, F. Di Martino, R. Mele, I. Perfilieva, and S. Senatore, “Spatio-temporal hotspot analysis for exploring evolution of diseases: an application to oto-laryingo-pharingeal diseases,” Advances in Fuzzy Systems, vol. 2013, Article ID 385974, 7 pages, 2013.
[7] R. M. Mullner, K. Chung, K. G. Croke, and E. K. Mensah, “Geographic information systems in public health and medicine,” Journal of Medical Systems, vol. 28, no. 3, pp. 215–221, 2004.
[8] K. Polat, “Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets,” Journal of Medical Systems, vol. 36, no. 4, pp. 2657–2673, 2012.
[9] C.-K. Wei, S. Su, and M.-C. Yang, “Application of data mining on the development of a disease distribution map of screened community residents of Taipei county in Taiwan,” Journal of Medical Systems, vol. 36, no. 3, pp. 2021–2027, 2012.
[10] I. Gath and A. B. Geva, “Unsupervised optimal fuzzy clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 773–780, 1989.
[11] R. Krishnapuram and J. Kim, “Clustering algorithms based on volume criteria,” IEEE Transactions on Fuzzy Systems, vol. 8, no. 2, pp. 228–236, 2000.
[12] F. Di Martino, V. Loia, and S. Sessa, “Extended fuzzy c-means clustering algorithm for hotspot events in spatial analysis,” International Journal of Hybrid Intelligent Systems, vol. 4, pp. 1–14, 2007.
[13] F. Di Martino and S. Sessa, “Implementation of the extended fuzzy C-means algorithm in geographic information systems,” Journal of Uncertain Systems, vol. 3, no. 4, pp. 298–306, 2009.
[14] U. Kaymak, R. Babuska, M. Setnes, H. B. Verbruggen, and H. M. van Nauta Lemke, “Methods for simplification of fuzzy models,” in Intelligent Hybrid Systems, D. Ruan, Ed., pp. 91–108, Kluwer Academic, Dordrecht, The Netherlands, 1997.
[15] U. Kaymak and M. Setnes, “Fuzzy clustering with volume prototypes and adaptive cluster merging,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 6, pp. 705–712, 2002.
[16] D. E. Gustafson and W. C. Kessel, “Fuzzy clustering with a fuzzy covariance matrix,” in Proceedings of the 17th IEEE Conf Decis Control Incl Symp Adapt Processes, pp. 761–766, San Diego, Calif, USA, January 1979.
[17] R. Gnanadesikan and J. R. Kettenring, “Robust estimates, residuals, and outlier detection with multiresponse data,” Biometrics, vol. 28, pp. 81–124, 1972.

Copyright

Copyright © 2013 Ferdinando Di Martino and Salvatore Sessa. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
