Computational Intelligence and Neuroscience

Volume 2016 (2016), Article ID 4835932, 14 pages

http://dx.doi.org/10.1155/2016/4835932

## Improved Ant Colony Clustering Algorithm and Its Performance Study

Key Laboratory of Ministry of Education for Geomechanics and Embankment Engineering, College of Civil and Transportation Engineering, Hohai University, Nanjing 210098, China

Received 25 May 2015; Revised 18 August 2015; Accepted 16 September 2015

Academic Editor: Jussi Tohka

Copyright © 2016 Wei Gao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering.

#### 1. Introduction

Clustering divides data into homogeneous subgroups, with some details disregarded to simplify the data. Clustering can be viewed as a data modeling technique that provides for concise data summaries. The objective of the division is twofold: data items within one cluster must be similar to each other, whereas those within different clusters should be dissimilar. Problems of this type arise in a variety of disciplines ranging from sociology and psychology to commerce, biology, computer science, and civil engineering. Clustering is thus utilized in many disciplines and plays an important role in a broad range of applications; because of this, clustering algorithms continue to be the subject of active research. Consequently, numerous clustering algorithms exist that can be classified into four major traditional categories: partitioning, hierarchical, density-based, and grid-based clustering methods [1].

The ant-based clustering algorithm is a relatively new method inspired by the clustering of corpses and larval sorting activities observed in actual ant colonies. The first studies in this field were conducted by Deneubourg et al. [2], who proposed a basic model that allowed ants to randomly move, pick up, and deposit objects in clusters according to the number of similar surrounding objects. This basic model has been successfully applied in robotics. Lumer and Faieta [3] modified the basic model into the LF algorithm, which was extended to numerical data analysis. The algorithm’s basic principles are straightforward: ants are modeled as simple agents that randomly move in their environment, a square grid with periodic boundary conditions. Data items that are scattered within this environment can be picked up, transported, and dropped by the agents. The picking and dropping operations are biased by the similarity and density of the data items within the ants’ local neighborhood: ants are likely to pick up data items that are either isolated or surrounded by dissimilar ones, and they tend to drop them in the vicinity of similar ones. In this way, clustering and sorting of the elements are obtained on the grid.

As a recently developed bionic optimization algorithm, the ant colony clustering algorithm possesses several advantages over traditional methods, such as flexibility, robustness, decentralization, and self-organization [4–6]. These properties are well suited to distributed real-world environments. It has thus been applied in many fields such as data mining [4], graph partitioning [7], and text mining [8].

Considerable research has recently been conducted on improving the performance and widening the applications of ant colony clustering algorithms.

Ramos and Merelo [8] studied ant-based clustering with different ant speeds in the clustering of text documents. Wu and Shi [13] studied similarity coefficients and proposed a simpler probability conversion function. Moreover, the clustering algorithm was combined with a *K*-means method to solve document clustering. The new algorithm was called CSIM [14]. Xu et al. [15] suggested an artificial ant sleeping model (ASM) and an adaptive artificial ant clustering algorithm (A$^{4}$C) to solve the clustering problem in data mining. In the ASM model, each datum was represented by an agent within a two-dimensional grid environment. In A$^{4}$C, the agents formed into high-quality clusters by making simple moves based on local information within neighborhoods. An improved ant clustering algorithm called Adaptive Time-Dependent Transporter Ants (ATTA) was proposed [16] that incorporated adaptive and heterogeneous ants and time-dependent transporting activities. Yang et al. [17, 18] proposed a multi-ant colony approach for clustering data that consisted of parallel and independent ant colonies and a queen ant agent. Each ant colony had a different moving speed and probability conversion function. A hypergraph model was used to combine the results of all parallel ant colonies. Kuo et al. [19] proposed a novel clustering method called the ant *K*-means (AK) algorithm. The AK algorithm modified the *K*-means method to locate objects in a cluster with a probability that was updated by the pheromone, whereas the pheromone updating rule was based on total cluster variance. An improved ant colony optimization-based clustering technique was proposed using nearest-neighborhood interpolation, and an efficient arrhythmia clustering and detection algorithm based on a medical experiment and a new ant colony clustering technique for a QRS complex was also presented [20]. Ramos et al. [21] proposed a new clustering algorithm called Hyperbox Clustering with Ant Colony Optimization (HACO) that clustered unlabeled data by placing hyperboxes in the feature spaces optimized by the ant colony optimization. A novel ant-based clustering algorithm called ACK was proposed [9] that incorporated the merits of kernel-based clustering into ant-based clustering. Tan et al. [22] proposed a simplified ant-based clustering (SABC) method based on existing research of a state-of-the-art ant-based clustering system. Tao et al. [12] redefined the distance between two data objects and improved the strategy by which ants drop and pick up data objects, thus proposing an improved ant colony clustering algorithm. Wang et al. [10] proposed an improvement to the ATTA called Logic-Based Cold Ants (LCA). In LCA, ant populations initially pick up data objects and calculate the current locations suitable for dropping; they then carry the data objects that are not suitable for dropping directly to the objects that maximize the similarity value of the position. Moreover, to allow for the rapid formation of class cluster centers, a logic-based similarity measure was proposed in which an ant classifies objects as similar or dissimilar and groups similar objects while detaching dissimilar ones. Xu et al. [23] proposed a constrained ant clustering algorithm that was embedded with a heuristic walk mechanism based on a random walk to address constrained clustering problems with pairwise must-link and cannot-link constraints. More recently, Inkaya et al. [24] presented a novel clustering methodology based on ant colony optimization (ACO-C). In ACO-C, two new objective functions were used that adjusted for compactness and relative separation. Each objective function evaluated the clustering solution with respect to the local characteristics of the neighborhoods.

Although many of these recently created methods appear promising, there are still shortcomings with ant colony clustering algorithms. Because ants move randomly and spend significant time finding proper places to drop or pick up objects, the computational efficiency and accuracy of ant colony clustering algorithms are low, particularly for large and complicated engineering problems. To overcome these shortcomings, a new abstraction ant colony clustering algorithm is proposed that uses a data combination mechanism. In this new algorithm, the random projections of the patterns are modified to improve computational efficiency and accuracy. The performance of the new algorithm is verified by actual datasets and compared with those of the ant colony clustering algorithm and other algorithms proposed in previous studies.

#### 2. Ant Colony Clustering Algorithm and Abstraction Ant Colony Clustering Algorithm

##### 2.1. Ant Colony Clustering Algorithm

To correctly describe the proposed algorithm, the basic principle underlying the ant colony clustering algorithm must be introduced.

First, data objects are randomly projected onto a single plane. Next, each ant chooses an object at random and picks up, moves, and drops the object according to a picking-up or dropping probability based on the similarity of the current object to objects in the local region. Finally, clusters are collected from the plane.

The ant colony clustering algorithm is described by the following pseudocode:

(1) Initialization: initialize the number of ants, the total number of iterations, the local region side length $s$, the constant parameters $\alpha$ and $c$, and the maximum speed $v_{\max}$.

(2) Project the data objects onto a plane; that is, assign a random pair of coordinates $(x, y)$ to each object.

(3) Each ant that is currently unloaded chooses an object at random.

(4) Each ant is given a random speed $v$.

(5) For each iteration and, within it, for each ant:

- The average similarity of the current object to the objects in the local region is calculated.
- If the ant is unloaded, the picking-up probability $P_p$ is computed. If $P_p$ is greater than a random probability and the object is not simultaneously picked up by another ant, the ant picks up this object, marks itself as loaded, and moves this object to a new position; otherwise, the ant does not pick up this object and randomly selects another object.
- If the ant is loaded, the dropping probability $P_d$ is computed. If $P_d$ is greater than a random probability, the ant drops the object, marks itself as unloaded, and randomly selects a new object; otherwise, the ant continues moving the object to a new position.

(6) For all objects [18]: if an object is isolated (i.e., the number of neighbors it possesses is less than a given constant), it is labeled as an outlier; otherwise, give this object a cluster labeling number and recursively label the same number to those objects that are neighbors of this object within the local region.
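The final labeling step (6) can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function name `label_clusters`, the parameter `min_neighbors`, and the use of `-1` for outliers are assumptions made here, the recursion is written iteratively with a queue, outliers are identified before labels are spread, and the periodic boundary of the grid is ignored for brevity.

```python
from collections import deque

def label_clusters(positions, s=3, min_neighbors=1):
    """Label objects on the plane; positions is a list of (x, y) grid sites.

    Returns one label per object; -1 marks an outlier (assumed convention).
    """
    half = s // 2  # neighbors lie in the s x s square centered on the object
    n = len(positions)

    def neighbors(i):
        xi, yi = positions[i]
        return [j for j in range(n)
                if j != i
                and abs(positions[j][0] - xi) <= half
                and abs(positions[j][1] - yi) <= half]

    adjacency = [neighbors(i) for i in range(n)]
    # Isolated objects (too few neighbors) are outliers; the rest are unlabeled.
    labels = [-1 if len(adjacency[i]) < min_neighbors else None
              for i in range(n)]

    next_label = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        # Spread one label to every object reachable through neighbor links.
        labels[i] = next_label
        queue = deque([i])
        while queue:
            cur = queue.popleft()
            for j in adjacency[cur]:
                if labels[j] is None:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels
```

For example, `label_clusters([(0, 0), (1, 0), (1, 1), (10, 10), (11, 10), (20, 20)])` groups the first three objects, groups the next two, and marks the last one as an outlier.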

The operations of the algorithm are described in detail in the following subsections.

###### 2.1.1. The Average Similarity Function

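We assume that an ant is located at site $r$ at time $t$ and that it finds an object $o_i$ at that site. The average similarity density of object $o_i$ with the other objects $o_j$ present in its neighborhood is given by

$$f(o_i) = \max\left\{0,\ \frac{1}{s^{2}} \sum_{o_j \in \mathrm{Neigh}_{s \times s}(r)} \left[1 - \frac{d(o_i, o_j)}{\alpha\left(1 + (v - 1)/v_{\max}\right)}\right]\right\} \tag{1}$$

where $\alpha$ defines a parameter used to adjust the similarity between objects. The parameter $v$ defines the speed of the ants, and $v_{\max}$ is the maximum speed. $v$ is distributed randomly in $[1, v_{\max}]$. $\mathrm{Neigh}_{s \times s}(r)$ denotes a square of $s \times s$ sites surrounding site $r$. $d(o_i, o_j)$ is the distance between two objects $o_i$ and $o_j$ in the space of attributes. The Euclidean distance is used, which can be determined as

$$d(o_i, o_j) = \sqrt{\sum_{k=1}^{m} \left(o_{ik} - o_{jk}\right)^{2}} \tag{2}$$

where $m$ defines the number of attributes.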
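The average similarity can be sketched in Python as follows. The sketch assumes the Lumer-Faieta-style form with a speed term, that is, the sum of $1 - d(o_i, o_j) / (\alpha (1 + (v - 1)/v_{\max}))$ over the neighbors, divided by $s^2$ and clamped at zero. The function names and the representation of objects as tuples of attribute values are illustrative choices, not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two objects in attribute space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_similarity(obj, neighbors, s, alpha, v, v_max):
    """Average similarity of obj to the objects found in its s x s neighborhood.

    Assumed form: max(0, (1/s^2) * sum(1 - d / (alpha * (1 + (v - 1) / v_max)))).
    """
    scale = alpha * (1.0 + (v - 1.0) / v_max)
    total = sum(1.0 - euclidean(obj, other) / scale for other in neighbors)
    return max(0.0, total / (s * s))
```

Dividing by $s^2$ rather than by the number of neighbors means that sparsely populated neighborhoods yield low similarity even when the few objects present are close in attribute space.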

From (1), we note that the parameter $\alpha$ affects the number of clusters and the algorithm convergence rate. With a larger $\alpha$, objects have greater degrees of similarity and tend to cluster. Thus, the number of clusters decreases, and the algorithm becomes faster. On the contrary, if $\alpha$ is smaller, the objects have smaller degrees of similarity, and the larger group will split into smaller groups. Thus, the number of clusters will increase, and the algorithm will become slower.

###### 2.1.2. The Probability Conversion Function

The probability conversion function is a function of $f(o_i)$, and its purpose is to convert the average similarity into picking-up and dropping probabilities. This approach is based on the following: the smaller the similarity of a data object is (i.e., fewer objects belong to the same cluster in its neighborhood), the higher the picking-up probability is, and the lower the dropping probability is. Conversely, the larger the similarity is, the lower the picking-up probability is (i.e., objects are unlikely to be removed from dense clusters), and the higher the dropping probability is. According to this principle, the sigmoid function is used as the probability conversion function.

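The picking-up probability for a randomly moving ant that is currently not carrying an object to pick up an object is given by

$$P_p = 1 - \operatorname{sigmoid}\left(f(o_i)\right) \tag{3}$$

where $f(o_i)$ is the average similarity function.

Using the same method, the dropping probability for a randomly moving, loaded ant to deposit an object is given by

$$P_d = \operatorname{sigmoid}\left(f(o_i)\right) \tag{4}$$

The sigmoid function has a natural exponential form of

$$\operatorname{sigmoid}(x) = \frac{1 - e^{-cx}}{1 + e^{-cx}} \tag{5}$$

where $c$ is a slope constant that can speed up the algorithm convergence if increased.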
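A short Python sketch of the probability conversion follows. It assumes the sigmoid takes the form $(1 - e^{-cx})/(1 + e^{-cx})$, with the picking-up probability equal to one minus the sigmoid of the average similarity and the dropping probability equal to the sigmoid itself. The function names are illustrative.

```python
import math

def sigmoid(x, c):
    """Assumed sigmoid form: (1 - exp(-c*x)) / (1 + exp(-c*x))."""
    e = math.exp(-c * x)
    return (1.0 - e) / (1.0 + e)

def pick_up_probability(f, c):
    """Low similarity gives a high chance of picking the object up."""
    return 1.0 - sigmoid(f, c)

def drop_probability(f, c):
    """High similarity gives a high chance of dropping the object."""
    return sigmoid(f, c)
```

Under this form, an object with zero similarity is always picked up and never dropped, and increasing the slope constant raises the dropping probability for any positive similarity, which is how a larger slope constant can force outliers to be dropped.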

It must be pointed out that, during the clustering procedure, some objects (called outliers) may exist that are highly dissimilar to all other data elements. Ants carrying outliers rarely find a place to drop them, which slows down the algorithm convergence. Here, we choose a larger parameter $c$ to force the ants to drop the outliers at the later stage of the algorithm.

##### 2.2. Abstraction Ant Colony Clustering Algorithm

The process behind the abstraction ant colony clustering algorithm is described as follows.

*(1) Initialization*. The data objects are placed randomly into data reactors, where each data reactor corresponds to one data type.

*(2) Iteration*. Initially, ants are assigned to one data reactor, and this data reactor is the first visited data reactor. Each ant then visits the data reactors for a number of steps given by the maximum iteration step. In this process, the most dissimilar data objects in each visited data reactor are selected and placed into a more suitable data reactor.

During the iteration process, each ant should abide by the following rules:

(1) If an ant visits a data reactor that contains only one data object, this data object is picked up with probability 1 and then dropped at a suitable data reactor.

(2) If an ant is not carrying a data object and the visited data reactor contains more than one data object, then the average similarity of each data object in the current data reactor (the similarity of one data object to the other data objects in the current data reactor) is computed. The ant picks up the most dissimilar data object with the picking-up probability and randomly visits another data reactor.

The average similarity of the data object in the current data reactor can be described as where is the average similarity of data to other data objects in the current data reactor, is the number of data objects in the data reactor visited by the current ant, is the data reactor that the data object belongs to, is the Euclidean distance, and defines a parameter used to adjust the similarity between objects.

The picking-up probability of a data object in the current data reactor can be described as where is a threshold for picking up one data object.

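If the average similarity is much smaller than the threshold, the picking-up probability is close to 1. That is to say, the ant will pick up this data object, which is not similar to the other data objects in this data reactor, with a very high probability. On the contrary, if the average similarity is much larger than the threshold, the picking-up probability is close to 0, which shows that the object is similar to the other data objects in the data reactor and has a very small probability of being picked up.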

(3) If an ant that is carrying a data object visits a data reactor that contains more than one data object, the ant places the data object into the current data reactor, and the average similarity of each data object in the current data reactor is computed. Next, the most dissimilar data object in the current data reactor is picked up with the picking-up probability. Finally, the ant loads this new data object and visits the next data reactor.

(4) If an ant carrying a data object has not found a data reactor in which to drop it after the prescribed number of steps, the ant constructs a new data reactor and places the data object into it.
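The selection of the most dissimilar data object in rules (2) and (3) can be sketched in Python as follows. The exact similarity and probability formulas are not reproduced here, so the sketch relies on two loudly stated assumptions: the average similarity of an object is taken as the mean of $1 - d/\alpha$ over the other objects in the reactor, clamped at zero, and the picking-up probability uses a Lumer-Faieta-style threshold response $(k/(k + f))^2$, where `k` is a hypothetical name for the picking-up threshold. Both forms only illustrate the qualitative behavior described above.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reactor_similarities(reactor, alpha):
    """Average similarity of each object to the others in the same reactor.

    Assumed form: max(0, mean over other objects of (1 - d / alpha)).
    """
    n = len(reactor)
    if n < 2:
        raise ValueError("rule (2) applies only to reactors with 2+ objects")
    sims = []
    for i, oi in enumerate(reactor):
        total = sum(1.0 - euclidean(oi, oj) / alpha
                    for j, oj in enumerate(reactor) if j != i)
        sims.append(max(0.0, total / (n - 1)))
    return sims

def most_dissimilar(reactor, alpha):
    """Index and similarity of the object least similar to its reactor."""
    sims = reactor_similarities(reactor, alpha)
    idx = min(range(len(reactor)), key=sims.__getitem__)
    return idx, sims[idx]

def pick_up_probability(f, k):
    """Assumed threshold response: near 1 when f << k, near 0 when f >> k."""
    return (k / (k + f)) ** 2
```

An ant following rule (2) would call `most_dissimilar` on the visited reactor and then remove the returned object when a uniform random number falls below `pick_up_probability(f, k)`.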

When the number of clustering types is larger than the practical number of data object types, a data reactor combination principle is applied. Before the most dissimilar data object in the current data reactor visited by the ant is selected, the current data reactor is compared with the other data reactors, and data reactors that are similar to a given degree are combined with some probability.

The combination probability of data reactors can be described as where is the similarity function of the two data reactors and , which can be described as where is the Euclidean distance between the two data reactors’ centers, is the center of data reactor and is the center of data reactor , is a parameter used to adjust the similarity between data objects, and is a threshold parameter.

If , the combination probability will be , and if , the combination probability will be 1.
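The combination of similar data reactors can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the similarity of two reactors is taken as $1 - d/\alpha$, where $d$ is the Euclidean distance between their centers, and reactors are merged deterministically whenever that similarity reaches a threshold, here called `beta` (a hypothetical name). The probabilistic combination rule itself is not reproduced.

```python
import math

def center(reactor):
    """Attribute-wise mean of the objects in a reactor."""
    n = len(reactor)
    return tuple(sum(col) / n for col in zip(*reactor))

def reactor_similarity(r1, r2, alpha):
    """Assumed form: 1 - (distance between centers) / alpha."""
    c1, c2 = center(r1), center(r2)
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return 1.0 - d / alpha

def combine_similar(reactors, alpha, beta):
    """Merge reactors whose center similarity is at least beta.

    Simplification: a hard threshold replaces the combination probability.
    """
    merged = [list(r) for r in reactors]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if reactor_similarity(merged[i], merged[j], alpha) >= beta:
                    merged[i].extend(merged.pop(j))
                    changed = True
                    break
            if changed:
                break  # centers changed, so restart the pairwise scan
    return merged
```

Restarting the scan after each merge keeps the comparison consistent, because the center of the combined reactor differs from the centers of the two reactors it replaces.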

*(3) Termination*. The algorithm terminates when the difference between the clustering results of neighboring iterations is less than a given threshold.

The flowchart of the abstraction ant colony clustering algorithm is shown in Figure 1.