Advances in Fuzzy Systems

Volume 2016, Article ID 8198915, 16 pages

http://dx.doi.org/10.1155/2016/8198915

## Fuzzy Rules for Ant Based Clustering Algorithm

^{1}REGIM-Lab.: Research Groups in Intelligent Machines, University of Sfax, ENIS, BP 1173, 3038 Sfax, Tunisia

^{2}Polytech Tours, University of Tours, Tours, France

Received 8 July 2016; Accepted 8 September 2016

Academic Editor: Katsuhiro Honda

Copyright © 2016 Amira Hamdi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper presents a new intelligent technique for the semisupervised data clustering problem that combines the Ant System (AS) algorithm with the fuzzy c-means (FCM) clustering algorithm. Our proposed approach, called the F-ASClass algorithm, is a distributed algorithm inspired by the foraging behavior observed in ant colonies. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we call graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in the dataset. In the second step, the partition matrix of the dataset found by the artificial ants is given to the fuzzy c-means technique in order to assign the objects left unclassified in the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with that of the k-means, k-medoid, and FCM algorithms. The experimental section shows that F-ASClass performs better according to the classification error rate, accuracy, and separation index.

#### 1. Introduction

How do ants optimize food search? How do social spiders build a communal nest? Why does a flock of birds fly in a V-shaped formation? How do termites collectively build their sophisticated nest structure? How do honey bee swarms cooperatively select their new nesting site? How does a firefly flash its light in a wonderful pattern? How does a colony coordinate its behavior? How is it possible for social insects and animals to coordinate their actions and create complex patterns? How do such agents perform complex tasks without any direction or coordination between themselves? How do agents in a colony work locally toward a global goal with sufficient flexibility, given that they are not controlled centrally? Collective behaviors in swarms of insects or animals have attracted the attention of researchers, who have proposed several intelligent models to solve a wide range of complex problems. This branch of artificial intelligence is known as swarm intelligence. The key components of swarm intelligence are self-organization, emergence, and stigmergy.

Self-organization is “a process whereby pattern at the global level of a system emerges solely from numerous interactions among the lower-level components of the system. Moreover, the rules specifying interactions among the system’s components are executed using only local information, without reference to the global pattern” [1]. In short it can be “a set of dynamical mechanisms whereby structures appear at the global level of a system from interactions of its lower-level components” [2].

Emergence appears to be the explanation of what self-organizing systems produce. In this context the whole is not just the sum of its parts; it carries a surplus of meaning that is not captured by its parts alone. The idea of stigmergy was first developed in [3] to explain indirect task coordination in the context of the building behavior of termites. Grassé [3] showed that the coordination of building activities does not depend on the workers themselves but is mainly achieved through the nest structure.

The underlying idea of this paper is to propose a new approach to the data clustering problem. We will show that the use of fuzzy logic combined with swarm intelligence techniques yields robust results.

The remainder of the paper is organized as follows. In Section 2, we present an overview of the data clustering problem and of swarm intelligence tools for data clustering. A general description of our proposed approach, called F-ASClass, is given in Section 3. Experimental results and a comparison with hard and fuzzy algorithms are reported in Section 4. Finally, concluding remarks are drawn in Section 5.

#### 2. Literature Review

##### 2.1. Problem Definition

Cluster analysis is a technique that organizes data by abstracting the underlying structure as a grouping of objects. Each group consists of objects that are similar to one another and dissimilar to objects of other groups.

Each object corresponds to a vector of numerical values, one per numerical attribute. The relationships between objects are gathered into a dissimilarity matrix in which rows and columns correspond to objects. As objects can be represented by points in numerical space, the dissimilarity between two objects can be defined as the distance between the two corresponding points. Any distance can be used as a dissimilarity measure. The most commonly used dissimilarity measure is the *Minkowski* metric:

$$d_p(o_i, o_j) = \left( \sum_{k=1}^{M} \omega_k \, |o_{ik} - o_{jk}|^{p} \right)^{1/p},$$

where $o_{ik}$ is the value of attribute $k$ for object $o_i$, $M$ is the number of attributes, and $\omega_k$ is a weighting factor that will be set to 1 thereafter. According to the value of $p$ ($p \geq 1$), the following measures are obtained: Manhattan distance ($p = 1$), Euclidean distance ($p = 2$), and *Chebyshev* distance ($p = \infty$). As mentioned in [4], Euclidean distance is the most commonly used *Minkowski* metric.
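
As an illustration of the definitions above, the following minimal Python sketch computes the weighted Minkowski distance and the resulting dissimilarity matrix. The function names `minkowski` and `dissimilarity_matrix` are hypothetical, chosen here for illustration only, and the Chebyshev case is handled as the unweighted limit.

```python
import math

def minkowski(x, y, p=2.0, weights=None):
    """Weighted Minkowski distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(x)  # weighting factor set to 1, as in the text
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Chebyshev distance: largest coordinate difference (weights ignored)
        return max(diffs)
    return sum(w * d ** p for w, d in zip(weights, diffs)) ** (1.0 / p)

def dissimilarity_matrix(objects, p=2.0):
    """Square matrix whose rows and columns both correspond to objects."""
    return [[minkowski(a, b, p) for b in objects] for a in objects]
```

For instance, `minkowski([0, 0], [3, 4], p=1)` gives the Manhattan distance and `p=float("inf")` the Chebyshev distance between the same two points.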

##### 2.2. Clustering Algorithms

The grouping step can be performed in a number of ways. In [5] different approaches to clustering data are described:

(i) *Partitioning/Hierarchical Classification*. A partitional clustering technique identifies the partition that optimizes a clustering criterion defined on a subset of objects (locally) or over all of the objects (globally). A hierarchical clustering technique builds a sequence of nested partitions that can be visualized, for example, by a dendrogram.

(ii) *Hard/Fuzzy Classification*. A hard clustering algorithm allocates each object to a single cluster during its operation; hence the clusters are disjoint. A fuzzy clustering algorithm associates each object with every cluster using a membership function. The output of such an algorithm is a clustering but not a partition.

(iii) *Deterministic/Stochastic*. Optimization in the partitional approach can be accomplished using traditional techniques or through a random search of the state space consisting of all possible labelings.

(iv) *Supervised/Unsupervised Classification*. An unsupervised classification uses only the dissimilarity matrix. No information on the object class is provided to the method (objects are said to be unlabeled). In supervised classification, objects are labeled and their dissimilarities are known. The problem is then to construct hyperplanes separating objects according to their class. The objective of unsupervised classification differs from that of the supervised case: in the first case, the goal is to discover groups of objects, while in the second, known groups are considered and the goal is to discover what makes them different or to classify new objects whose class is unknown.

The technique proposed in this paper, which we call F-ASClass, belongs to the family of fuzzy-semisupervised partitional clustering techniques. It uses fuzzy rules and stochastic behavior to partition a dataset into a specified number of clusters. For the present paper, it suffices to note that the following techniques (k-means, k-medoid, and FCM) are used as points of comparison for the F-ASClass algorithm. A comparative study among them will be presented in Section 4.

The k-means algorithm is a hard-unsupervised learning algorithm that partitions a dataset into a specified number of clusters. The technique presented in [6] consists of starting with k groups, each of which consists of a single randomly selected object, and thereafter adding each new object to its closest cluster center. After an object is added to a group, the mean of that group is adjusted to take the newly added object into account. The algorithm is deemed to have converged when the assignments no longer change.
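
The procedure can be sketched in a few lines of Python. This is a minimal sketch, not the formulation of [6]: it assumes the common batch variant, which recomputes every mean after a full assignment pass rather than after each single addition, and the names `kmeans`, `max_passes`, and `seed` are hypothetical.

```python
import math
import random

def kmeans(objects, k, max_passes=100, seed=0):
    """Batch k-means sketch: returns (labels, centers)."""
    rng = random.Random(seed)
    # Start with k groups, each seeded by one randomly selected object.
    centers = [list(o) for o in rng.sample(objects, k)]
    labels = None
    for _ in range(max_passes):
        # Assign every object to its closest cluster center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(x, centers[c]))
            for x in objects
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Adjust each mean to account for the objects now in the group.
        for c in range(k):
            members = [x for x, lab in zip(objects, labels) if lab == c]
            if members:  # keep the old center if a group becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Because the initial centers are drawn at random, different seeds can lead to different partitions on less well-separated data, which illustrates the sensitivity to initial conditions.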

The k-medoid algorithm described in [7] is based on the search for representative objects of each cluster (called medoids), which should represent the various aspects of the structure of the data. The k-medoid algorithm is related to the k-means algorithm; the main difference between the two lies in how the cluster center is calculated. The medoid is the member of a dataset whose average dissimilarity to all the other members of the set is minimal. Therefore a medoid, unlike a mean, is always a member of the dataset. It represents the most centrally located data item of the dataset.
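
The difference between a mean and a medoid can be made concrete with a short Python sketch. It assumes Euclidean distance as the dissimilarity, and the helper names `medoid` and `mean` are hypothetical.

```python
import math

def medoid(cluster):
    """Member whose total (hence average) dissimilarity to the others is minimal."""
    return min(cluster, key=lambda x: sum(math.dist(x, y) for y in cluster))

def mean(cluster):
    """Coordinate-wise mean; generally not a member of the cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]
```

For the cluster `[[0, 0], [1, 0], [10, 0]]`, the medoid is the existing object `[1, 0]`, whereas the mean lies between the objects and coincides with none of them.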

The fuzzy c-means (FCM) algorithm, first presented in [8] and improved in [9, 10], allows one sample to belong to two or more clusters. The underlying idea, which aims to characterize an individual object's similarity to all the clusters, was introduced in [11]. In this context, the similarity an object shares with each cluster is represented by a membership function whose values lie between zero and one. Each object in the dataset has a membership in every cluster; memberships close to unity indicate a high degree of similarity between the object and a cluster, while memberships close to zero imply little similarity between the object and that cluster.

FCM differs from the previously presented k-means and k-medoid algorithms in that the centroid of a cluster is the mean of all samples in the dataset, weighted by their degree of belonging to the cluster. The degree of belonging is a function of the distance of the sample from the centroid and includes a parameter controlling how much weight is given to the closest samples. All these techniques are sensitive to initial conditions.
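
The two alternating FCM updates can be sketched as follows. This is a minimal sketch of the standard textbook update rules, assuming Euclidean distance and a fuzzifier `m > 1`; the function names are hypothetical, and it is not claimed to match the exact variants of [8-10].

```python
import math

def update_memberships(objects, centers, m=2.0):
    """u[j][i] is the membership of object j in cluster i; each row sums to 1."""
    u = []
    for x in objects:
        d = [math.dist(x, c) for c in centers]
        if 0.0 in d:
            # Object coincides with a center: share full membership there.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            u.append([h / sum(hits) for h in hits])
            continue
        # Closer centers receive larger weights; m controls how sharply.
        inv = [di ** (-2.0 / (m - 1.0)) for di in d]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u

def update_centers(objects, u, m=2.0):
    """Each center is the mean of all objects, weighted by membership ** m."""
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append([
            sum(wj * xj[k] for wj, xj in zip(w, objects)) / total
            for k in range(len(objects[0]))
        ])
    return centers
```

Iterating the two functions until the memberships stop changing gives the usual FCM loop; values of `m` closer to 1 make the memberships nearly hard, while larger values make them fuzzier.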

Another fuzzy classification model is studied in [12] which constructs the membership function on the basis of available statistical data by using an extension of the well-known contamination neighborhood. Reference [13] presents a new fuzzy technique using an adaptive network of fuzzy logic connectives to combine class boundaries generated by sets of discriminant functions in order to address the “curse of dimensionality” in data analysis and pattern recognition.

Reference [14] addresses the dependence of clustering results on the use of simple, predetermined geometrical models for clusters. In this context, the proposed algorithm computes a suitable convex hull representing each cluster. It determines suitable membership functions and hence represents fuzzy clusters based on the adopted geometrical model, which is used during the fuzzy data partitioning within an online sequential procedure in order to calculate the membership function.

##### 2.3. Swarm Intelligence Tools for Data Clustering Problem

We start with an illustration of swarm intelligence tools that have been developed to solve clustering problems: Particle Swarm Optimization [15], Artificial Bee Colony [16], Firefly algorithm [17], Fish swarm algorithm [18], and Ant Colony Algorithm. In [19] the basic data mining terminologies are linked with some of the works using swarm intelligence techniques. A comprehensive review of the state-of-the-art ant based clustering methods can be found in [20].

The first model of ants' sorting behavior was proposed by Deneubourg et al. [21], where a population of ants moves randomly on a 2-dimensional grid and is allowed to drop or load objects using simple local decision rules and without any central control. The general idea is that isolated items should be picked up and dropped at some other location where more items of that type are present. Lumer and Faieta [22] extended this work to data clustering problems. The idea is to define a dissimilarity between objects in the space of object attributes. Each ant remembers a small number of locations where it has successfully picked up an object. When depositing a new item, this memory is used to bias the direction in which the ant will move: the ant tends to move towards the location where it last dropped a similar item. Building on these basic models, Monmarché proposed in [23] an ant based clustering algorithm, namely, AntClass, which introduces clustering in a population of artificial ants capable of carrying heaps of objects. Furthermore, this ant algorithm is hybridized with the k-means algorithm. In [24], a number of modifications were introduced to both the LF and AntClass algorithms, and the authors proposed AntClust, an ant based clustering algorithm for image segmentation. In AntClust, the rectangular grid is replaced by a discrete array of cells. Each pixel is placed in a cell and all cells of the array are connected to the nest of the ants' colony. Each ant performs a number of moves between its nest and the array and decides with a probabilistic rule whether or not to drop its pixel. If the ant becomes free, it searches for a new pixel to pick up [24].
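
The local decision rules of the basic sorting model can be illustrated with a short Python sketch. It assumes the pick-up and drop probabilities commonly attributed to Deneubourg et al., where `f` is the fraction of similar items an ant perceives in its neighborhood; the threshold constants `k1` and `k2` and their default values are illustrative assumptions, not values taken from this paper.

```python
def pick_probability(f, k1=0.1):
    """Chance that an unladen ant picks up an item.

    f is the perceived local fraction of similar items (0 = isolated item).
    """
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.3):
    """Chance that a laden ant drops its item where similar items are present."""
    return (f / (k2 + f)) ** 2
```

With these rules an isolated item (`f` near 0) is almost always picked up and almost never dropped, while the opposite holds in a dense neighborhood of similar items, which is what gradually forms heaps.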

In [25], another important collective behavior of real ants, namely, the chemical recognition system of ants, was used to solve an unsupervised clustering problem. In [26], Azzag et al. consider another biologically observed behavior in which ants are able to build mechanical structures thanks to a self-assembling behavior. This can be observed in the formation of drops made up of ants or in the building of chains by ants with their bodies in order to link leaves together. The main idea here is to consider that each ant represents a data item and is initially placed on a fixed point, called the support, which corresponds to the root of the tree. The behavior of an ant consists of moving over already fixed ants to fix itself at a convenient location in the tree. This behavior is directed by the local structure of the tree and by the similarity between the data represented by the ants. When all ants are fixed in the tree, this hierarchical structure can be interpreted as a partitioning of the data [26].

Bird flocks and fish schools clearly display structural order and appear to move as a single coherent entity [27]. In [28, 29], it has been demonstrated that the behavior of flying animals can be used to solve the data clustering problem. The main idea is to consider that individuals represent the data to cluster and that they move following a local behavior rule in such a way that, after a few movements, homogeneous clusters of individuals appear and move together. In [30] Abraham et al. propose a novel fuzzy clustering algorithm, called MPSO, which is based on a variant of the PSO algorithm.

Social phenomena also exist in the case of spiders: in the *Anelosimus eximius* case, individuals live together, share the same web, and cooperate in various activities such as collective web building. Spiders gather in small clusters under the leaves included in the web and are distributed over the whole silky structure. In [31], the environment models the natural vegetation and is implemented as a square grid in which each position corresponds to a stake. Stakes can be of different heights to model the environmental diversity of the vegetation. Spiders are always situated on top of stakes and behave according to three independent items (a movement item, a silk fixing item, and a return to web item). This model was transposed to region detection in images.

In [32] Hamdi et al. propose a new swarm-based algorithm for clustering, based on the existing work of [21, 23, 28], which uses ants' segregation behavior to group similar objects together, birds' moving behavior to control the next relative positions of a moving ant, and spiders' homing behavior to manage ants' movements in conflicting situations.

In [33] we proposed using the stochastic principles of ant colonies in conjunction with the geometric characteristics of the bee's honeycomb and the basic principles of stigmergy. This algorithm was called AntBee. It was improved in [34], where an extended version called FractAntBee incorporates the main characteristics of fractal theory and uses fractal rules to improve the convergence of the algorithm.

In [35], a novel approach to image segmentation based on the Ant Colony System (ACS) is proposed. In the ACS algorithm an artificial ant colony is capable of solving the traveling salesman problem (TSP) [36]. As in ACO for the TSP, in ACO-based algorithms for clustering, each ant tries to find a cost-minimizing path, where the nodes of the path are the data points to be clustered. As in the TSP, the cost of moving from one data point to another is the distance between these points, measured by some appropriate dissimilarity metric. Thus, the next point to be added to the path tends to be similar to the last point on the path. An important way in which these algorithms deviate from ACO algorithms for the TSP is that the ants do not necessarily visit all data points [37].
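
How an ant might choose the next data point can be sketched in Python. This is a minimal sketch assuming the classical Ant System random-proportional rule, in which the attractiveness of an edge is its pheromone level raised to `alpha` times its inverse distance raised to `beta`; the function names, the parameter defaults, and the matrix layout are assumptions for illustration and are not taken from [35-37]. Data points are assumed to be pairwise distinct, so that no distance is zero.

```python
import random

def transition_probabilities(current, unvisited, dist, pheromone,
                             alpha=1.0, beta=2.0):
    """Probability of moving from `current` to each node in `unvisited`."""
    weights = [
        (pheromone[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
        for j in unvisited
    ]
    total = sum(weights)
    return [w / total for w in weights]

def choose_next(current, unvisited, dist, pheromone, rng=None,
                alpha=1.0, beta=2.0):
    """Sample the next node; closer, pheromone-rich nodes are more likely."""
    rng = rng or random
    probs = transition_probabilities(current, unvisited, dist, pheromone,
                                     alpha, beta)
    return rng.choices(unvisited, weights=probs, k=1)[0]
```

With uniform pheromone, a point at half the distance of another is four times as likely to be chosen when `beta=2`, which is why the next point on a path tends to be similar to the last one.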

#### 3. Proposed Methodology

In the ASClass algorithm, we assume that a graph of objects has been collected by the domain expert, where each object is a vector of numerical values. To measure the similarity between objects, we use the Euclidean distance between the two corresponding vectors, which serves as the edge weight in the graph [38]. The complete set of parameters of our model will be presented in Section 3.3.

Initially all the objects are scattered randomly on the graph; each node in the graph represents an object in the dataset. The edge that connects two objects in the graph-data represents a measure of dissimilarity between these objects in the database. A class is represented by a route connecting a set of objects. In ASClass we chose to use more than one colony of ants; the number of colonies needed here is equal to the number of classes in the database. Initially, for each colony, artificial ants are placed on a selected object, called the "nest-object." For each cluster we randomly choose one and only one "nest-object." The simulation model is detailed in Figure 1.
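
The initial setup can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names `build_graph_data` and `pick_nest_objects` are hypothetical, and class labels are assumed to be available for selecting one nest-object per class, a detail this section does not spell out.

```python
import math
import random
from itertools import combinations

def build_graph_data(objects):
    """Complete graph: edge (i, j) is weighted by the Euclidean distance."""
    return {
        (i, j): math.dist(objects[i], objects[j])
        for i, j in combinations(range(len(objects)), 2)
    }

def pick_nest_objects(labels, seed=0):
    """Pick exactly one random nest-object index per class (one per colony)."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    return {label: rng.choice(members) for label, members in by_class.items()}
```

A dataset of n objects yields n(n-1)/2 weighted edges, and the number of entries returned by `pick_nest_objects` equals the number of colonies.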