Advances in Fuzzy Systems

Volume 2015 (2015), Article ID 265135, 13 pages

http://dx.doi.org/10.1155/2015/265135

## Fuzzy Clustering Using the Convex Hull as Geometrical Model

Department of Information Engineering, Electronics and Telecommunications (DIET), University of Rome “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy

Received 22 March 2015; Accepted 3 April 2015

Academic Editor: Katsuhiro Honda

Copyright © 2015 Luca Liparulo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A new approach to fuzzy clustering is proposed in this paper. It aims to relax some constraints imposed by known algorithms using a generalized geometrical model for clusters that is based on the convex hull computation. A method is also proposed in order to determine suitable membership functions and hence to represent fuzzy clusters based on the adopted geometrical model. The convex hull is not only used at the end of clustering analysis for the geometric data interpretation but also used during the fuzzy data partitioning within an online sequential procedure in order to calculate the membership function. Consequently, a pure fuzzy clustering algorithm is obtained where clusters are fitted to the data distribution by means of the fuzzy membership of patterns to each cluster. The numerical results reported in the paper show the validity and the efficacy of the proposed approach with respect to other well-known clustering algorithms.

#### 1. Introduction

Clustering algorithms have always been an efficient and important means of analysing both small and large amounts of data, namely, of dividing groups of objects into clusters by using measures of similarity or dissimilarity computed on a suitable number of features representing the data [1–3]. Applications of clustering span every field of science and technology, especially machine learning, computer science, statistics, engineering, physics, mathematics, and medicine. Since the early twentieth century, a huge number of algorithms and related variants have been proposed in the literature, each adapted to a specific field of application [4–16].

Clustering techniques deal with unsupervised learning as they are used when it is not possible to define data labels a priori. They utilize several metrics for the determination of similar objects (patterns) belonging to the same group (a cluster) that, in turn, are different from patterns of other clusters [17]. Clearly, the shape of a cluster is influenced by the chosen metric, such as Euclidean, Manhattan, Chebyshev, or Mahalanobis distance; in fact, two patterns can be “close” (or “similar”) using one metric and “far” (or “dissimilar”) by using another one.
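The dependence of "closeness" on the metric can be checked with a small numerical sketch. The points `origin`, `p`, and `q` below are hypothetical values chosen only for illustration, and the Mahalanobis distance is omitted because it additionally requires a covariance estimate.

```python
import math

def euclidean(a, b):
    # L2 distance: square root of the summed squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # L-infinity distance: largest absolute coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical patterns: p lies on the diagonal, q lies on a coordinate axis
origin, p, q = (0.0, 0.0), (3.0, 3.0), (4.0, 0.0)

metrics = {"Euclidean": euclidean, "Manhattan": manhattan, "Chebyshev": chebyshev}
nearest = {}
for name, dist in metrics.items():
    d_p, d_q = dist(origin, p), dist(origin, q)
    nearest[name] = "p" if d_p < d_q else "q"
    print(f"{name:10s} d(origin,p)={d_p:.3f}  d(origin,q)={d_q:.3f}  nearer: {nearest[name]}")
```

With these values, `q` is the nearer pattern under the Euclidean and Manhattan distances, whereas `p` is nearer under the Chebyshev distance, so the same pair of patterns would be ranked differently by different metrics.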

Similar considerations are also valid when clusters are considered as fuzzy sets [18]. In this case, the patterns are assigned to several clusters in a nonexclusive way by determining the degree of fuzzy membership of every pattern in each of the existing clusters. However, the geometrical constraints imposed by the membership function (MF) may represent a considerable obstacle for the clustering analysis. In this regard, most algorithms tend to create spherical, ellipsoidal, or polygonal fuzzy clusters having a simple geometry that is computationally affordable but possibly ill suited to the actual distribution of data.

Different taxonomies hold for both fuzzy and crisp algorithms; the most commonly considered aspects are as follows:

(i) $k$-clustering or free-clustering techniques, according to whether the number ($k$) of clusters is determined a priori;

(ii) partitional or hierarchical (agglomerative/divisive) procedures for cluster generation, where the dataset is partitioned directly into a set of disjoint clusters or else each solution depends on the previous or successive ones in a hierarchical sequence;

(iii) sequential (online) or batch (iterative) algorithms, in which clusters are either updated sequentially at each presentation of a new pattern or updated iteratively considering a given set of data. As discussed later, there may exist hybrid cases in which a dataset is used to determine clusters sequentially but several times, as, for example, in a learning process by epochs or in parameter tuning procedures;

(iv) model-based, distribution-based, or density-based clusters, according to whether clusters are associated with geometric models defined in the data space or with suitable statistical distributions or density functions;

(v) point-to-centroid or point-to-boundary based metrics, where the distances of patterns from clusters are either computed with respect to a single prototype (i.e., a point or centroid) representing each cluster or scaled according to the actual extension of clusters in the data space, independently of the use of model-based, distribution-based, or density-based clusters.

At present, there is no clustering algorithm whose performance is universally recognized to be satisfactory for all problems. A trade-off is often necessary among computational complexity, model fitting, and explanatory tools of the clusters’ structure, depending on the nature of the data under analysis and the specific field of application. Iterative algorithms perform clustering until a stopping rule is satisfied; they tend to be more accurate than sequential algorithms, which, in turn, are faster but depend on the pattern presentation order. In this regard, well-known online clustering methods recently proposed in the literature include the recursive fuzzy $c$-means [19], recursive Gustafson-Kessel clustering [20], the recursive subtractive clustering (eTS) method [21], the evolving clustering method (ECM) [22], and the dynamic evolving neural-fuzzy inference system (DENFIS) method [23].

Furthermore, $k$-clustering techniques have a major limitation, since they are useful only for those problems in which the number of clusters is known in advance [24–26]. Indeed, a large body of literature focuses on the problem of “cluster validity,” that is, how to determine the optimal value of $k$ for a given dataset [27–29]. These methods evaluate whether one final clustering result is better than another by means of suitable criteria such as the compactness and separability of clusters. Therefore, they usually work by defining an index and then finding the minimum (or maximum) of the values associated with each clustering solution.
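The index-then-select procedure can be sketched as follows. The index used here is a crisp compactness-to-separation ratio (mean squared distance of patterns from their own centroid, divided by the smallest squared distance between two centroids), similar in spirit to the Xie-Beni index; it is an illustrative choice and is not claimed to be one of the indices of [27–29]. The data and the three candidate partitions are hypothetical and written by hand rather than produced by a clustering algorithm.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def validity_index(points, labels):
    """Compactness / separation; lower is better. Needs at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Compactness: mean squared distance of each pattern from its own centroid
    compactness = sum(
        sq_dist(pt, centroids[lab]) for pt, lab in zip(points, labels)
    ) / len(points)
    # Separation: smallest squared distance between two distinct centroids
    keys = list(centroids)
    separation = min(
        sq_dist(centroids[a], centroids[b])
        for i, a in enumerate(keys) for b in keys[i + 1:]
    )
    return compactness / separation

# Hypothetical 2D data forming three well-separated groups of three patterns
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (0, 10), (0, 11), (1, 10)]

# Hand-written candidate partitions, keyed by their number of clusters
candidates = {
    2: [0, 0, 0, 1, 1, 1, 0, 0, 0],  # first and third groups merged
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],  # one cluster per group
    4: [0, 0, 3, 1, 1, 1, 2, 2, 2],  # first group split in two
}

scores = {k: validity_index(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, "selected number of clusters:", best_k)
```

On this toy dataset the index is smallest for the three-cluster partition; merging two groups inflates the compactness term, while splitting a group shrinks the separation term.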

The underlying idea of this paper is to propose a new approach to fuzzy clustering, with the aim of relaxing some constraints imposed by known algorithms and using a new method for the computation of MFs. The starting point is Simpson’s idea of the well-known “Fuzzy Min-Max” clustering algorithm [30]: we propose a free-clustering, partitional, online algorithm using model-based clusters whose shape is determined in a new way by the convex hull computation. Our contribution comes from the awareness that Simpson’s method is very efficient but it has an important constraint, the shape of clusters, given that it creates hyperboxes parallel to the coordinate axes of the data reference frame only. This constraint will be removed by using the convex hull computation of clusters and, necessarily, an original methodology in order to define a metric associated with the MFs.
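The effect of the axis-parallel constraint can be illustrated with a minimal 2D sketch that compares the area of the smallest axis-aligned box around a set of patterns with the area of their convex hull. The hull is computed with Andrew's monotone chain algorithm, which is used here only for illustration and is not necessarily the procedure adopted in the paper; the `cluster` coordinates are hypothetical.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); positive for a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    # Shoelace formula for a simple polygon given by its ordered vertices
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

def hyperbox_area(points):
    # Area of the smallest box with sides parallel to the coordinate axes
    xs, ys = zip(*points)
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# Hypothetical elongated cluster lying along the diagonal of the plane
cluster = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (2, 2),
           (3, 2), (2, 3), (3, 3), (4, 3), (3, 4), (4, 4)]

hull = convex_hull(cluster)
print("hull vertices:", hull)
print("hyperbox area:", hyperbox_area(cluster))
print("convex hull area:", polygon_area(hull))
```

For this diagonal cluster the axis-aligned box has area 16, whereas the convex hull has area 7, so more than half of the box covers regions of the data space that contain no patterns.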

The use of unconstrained clusters in the analysis of large datasets allows patterns to be sorted into extremely compact clusters [31, 32]. Moreover, we will show that the use of fuzzy logic combined with a more flexible geometry of clusters yields results that are robust with respect to the uncertainty of data, by means of computationally efficient procedures [33–35]. The approach proposed herein is applied essentially to the class of online algorithms and model-based clusters fitted by convex geometrical polytopes, but it can also be generalized to a wider choice of algorithms, including hierarchical procedures, iterative algorithms, and nonconvex models of clusters.

The paper is organized as follows. In Section 2, we introduce and discuss well-known techniques for convex hull computation with regard to their application in the field of pattern recognition, in particular for data clustering; an overview of the most relevant works presented in the literature is reported in Section 3. The new fuzzy clustering algorithm proposed in the paper is illustrated in detail in Section 4, where the use of the convex hull is demonstrated by means of simple toy tests. Subsequently, the way in which MFs are determined in order to represent fuzzy clusters based on the adopted geometrical models is explained in Section 5, while the performance of the proposed algorithm and its comparison with other popular clustering algorithms, considering different datasets, are reported in Section 6. Finally, our conclusions are drawn and discussed in Section 7.

#### 2. Convex Hull Computation and Data Clustering

In this paper, we propose a novel and generalized fuzzy clustering algorithm, which is useful for analysing data in online and real-time applications [36]. The shape of clusters is generalized by using less regular structures, which are seemingly more complex but more computationally affordable. These structures fit the local distribution of data better, are less sparse geometrically, and allow more flexible and dynamic clustering rules. In this regard, we propose the use of the convex hull for the determination of irregular convex polytopes.

The convex hull of a set of points is the smallest convex set that contains these points, as illustrated in Figure 1 for 2D and 3D datasets. We can represent an $n$-dimensional convex hull by a set of points in $\mathbb{R}^n$ called “vertices” or, equivalently, by $(n-1)$-dimensional faces called “facets.” Each facet is characterized by the following:

(i) a set of vertices;

(ii) a set of neighboring facets;

(iii) a hyperplane equation.

The $(n-2)$-dimensional faces are the “ridges” of the convex hull; each ridge is the intersection of the vertices of two neighboring facets. The relationship between the number of vertices and the number of facets of convex polytopes is not trivial in higher dimensions; for this reason the convex hull determination is also referred to as the “vertex enumeration” or “facet enumeration” problem.
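The equivalence between the vertex and the facet representation can be sketched in 2D, where facets are the edges of a convex polygon and ridges reduce to single points. The sketch below assumes that the hull vertices are already available in counter-clockwise order; the polygon `hull_vertices`, the dictionary layout of a facet, and the helper names are hypothetical choices made for illustration, not the data structures of any specific library.

```python
def facets_from_vertices(vertices):
    """Build the facet representation of a convex polygon.

    `vertices` must list the hull vertices in counter-clockwise order.
    Each facet stores its vertices, the indices of its neighboring facets,
    and a hyperplane equation normal . x = offset with an outward normal.
    """
    n = len(vertices)
    facets = []
    for i in range(n):
        a, b = vertices[i], vertices[(i + 1) % n]
        # Rotating the edge direction (dx, dy) clockwise gives (dy, -dx),
        # which points outward for a counter-clockwise polygon
        normal = (b[1] - a[1], a[0] - b[0])
        offset = normal[0] * a[0] + normal[1] * a[1]
        facets.append({
            "vertices": (a, b),
            "neighbors": ((i - 1) % n, (i + 1) % n),
            "normal": normal,
            "offset": offset,
        })
    return facets

def ridge(facet_a, facet_b):
    # A ridge is the intersection of the vertex sets of two neighboring facets
    return set(facet_a["vertices"]) & set(facet_b["vertices"])

def contains(facets, point):
    # A point belongs to the hull if it lies on the inner side of every facet
    return all(
        f["normal"][0] * point[0] + f["normal"][1] * point[1] <= f["offset"]
        for f in facets
    )

# Hypothetical convex polygon, vertices in counter-clockwise order
hull_vertices = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]
facets = facets_from_vertices(hull_vertices)

for index, facet in enumerate(facets):
    print(index, facet["vertices"], "neighbors:", facet["neighbors"],
          "hyperplane:", facet["normal"], facet["offset"])
print("ridge between facets 0 and 1:", ridge(facets[0], facets[1]))
print("(2, 2) inside:", contains(facets, (2, 2)))
print("(5, 0) inside:", contains(facets, (5, 0)))
```

In the plane the number of facets always equals the number of vertices, which is why the conversion above is a simple loop; this one-to-one correspondence does not carry over to polytopes in higher-dimensional spaces.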