Advances in Fuzzy Systems

Volume 2017, Article ID 7094046, 23 pages

https://doi.org/10.1155/2017/7094046

## An Extension of the Fuzzy Possibilistic Clustering Algorithm Using Type-2 Fuzzy Logic Techniques

Tijuana Institute of Technology, Tijuana, BC, Mexico

Correspondence should be addressed to Oscar Castillo; ocastillo@tectijuana.mx

Received 20 July 2016; Accepted 9 January 2017; Published 31 January 2017

Academic Editor: Ning Xiong

Copyright © 2017 Elid Rubio et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This work presents an extension of the Fuzzy Possibilistic C-Means (FPCM) algorithm using Type-2 Fuzzy Logic Techniques, with the aim of improving the efficiency of the FPCM algorithm. To compare the proposal against the Interval Type-2 Fuzzy C-Means (IT2FCM) algorithm, several experiments were carried out with both algorithms on well-known datasets: Wine, WDBC, Iris Flower, Ionosphere, Abalone, and Cover type. Further experiments were performed on a set of test images to observe the behavior of both algorithms in image preprocessing. The comparisons are used to determine whether the proposed approach performs better than the IT2FCM algorithm.

#### 1. Introduction

Clustering algorithms have been widely used in different research areas for purposes such as image segmentation [1, 2], data mining [3], pattern recognition [4], classification [5], and modeling [6]. These algorithms arise from the need to find groups of data that share similar features in a given dataset. Several fuzzy clustering algorithms are currently available, such as FCM [4], PCM [7], FPCM [8], and PFCM [8]. Their acceptance is due to the fact that they permit a datum to belong to different clusters within a given dataset.

However, the algorithms mentioned above cannot handle the uncertainty that lies within a dataset during the clustering process. For this reason, some of them (FCM and PCM) have been improved using Type-2 Fuzzy Logic Techniques [9, 10]; the improved algorithms are called Interval Type-2 Fuzzy C-Means (IT2FCM) [11, 12] and Interval Type-2 Possibilistic C-Means (IT2PCM) [12, 13], respectively. These algorithms have been used for different purposes, such as modeling [14–17], creation of membership functions [18, 19], image processing [20, 21], and classification [22]. In recent years, research has also been carried out on extending other clustering algorithms using Type-2 Fuzzy Logic Techniques, such as the ones proposed in [13, 23–27].

In this work we present an extension of the FPCM algorithm using Type-2 Fuzzy Logic Techniques, which gives the method the capability of handling a higher degree of uncertainty in a dataset when solving real-world problems that involve data clustering. Other clustering algorithms have been extended using Type-2 Fuzzy Logic Techniques, but the FPCM algorithm has not previously been extended in this way.

This paper is organized as follows. Section 2 describes the extension of the FPCM algorithm presented in this paper, Section 3 introduces the cluster validation indices used to measure the performance of the clustering algorithms, Section 4 shows the results obtained by the IT2FPCM algorithm and its comparison with the IT2FCM algorithm, and Section 5 contains the conclusions and future work.

#### 2. Interval Type-2 Fuzzy Possibilistic C-Means Algorithm

The proposed algorithm extends the FPCM algorithm proposed by N. R. Pal et al. in 1997 using Type-2 Fuzzy Logic Techniques. FPCM produces memberships and possibilities using the weighting exponents $m$ and $\eta$ for fuzziness and possibility, respectively. In the extension, each exponent is represented by a range rather than a precise value; that is, $m = [m_1, m_2]$, where $m_1$ and $m_2$ are the lower and upper limits of the weighting exponent for fuzziness, and $\eta = [\eta_1, \eta_2]$, where $\eta_1$ and $\eta_2$ are the lower and upper limits of the weighting exponent for possibility.
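For reference, the Type-1 FPCM that this section extends is commonly written as below. This is a sketch in assumed notation, which may differ from the symbols used by the authors: $u_{ik}$ is the membership and $t_{ik}$ the typicality (possibility) of datum $x_k$ in cluster $i$, $v_i$ is the centroid of cluster $i$, $d_{ik} = \lVert x_k - v_i \rVert$, $c$ is the number of clusters, $n$ is the number of data points, and $m, \eta > 1$ are the weighting exponents for fuzziness and possibility.

```latex
\begin{align}
J_{m,\eta} &= \sum_{i=1}^{c}\sum_{k=1}^{n}\left(u_{ik}^{m} + t_{ik}^{\eta}\right) d_{ik}^{2}, \\
u_{ik} &= \left[\sum_{j=1}^{c}\left(\frac{d_{ik}}{d_{jk}}\right)^{2/(m-1)}\right]^{-1}, \\
t_{ik} &= \left[\sum_{j=1}^{n}\left(\frac{d_{ik}}{d_{ij}}\right)^{2/(\eta-1)}\right]^{-1}, \\
v_{i} &= \frac{\sum_{k=1}^{n}\left(u_{ik}^{m} + t_{ik}^{\eta}\right)x_{k}}
              {\sum_{k=1}^{n}\left(u_{ik}^{m} + t_{ik}^{\eta}\right)}.
\end{align}
```

In this formulation the memberships of each datum sum to one over the clusters, whereas the typicalities of each cluster sum to one over the data points.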

Because the value of $m$ is represented by an interval, the fuzzy partition matrix $\mu_i(x_k)$ must be calculated for the interval $[m_1, m_2]$. Consequently, $\mu_i(x_k)$ is given by the belonging interval $[\underline{\mu}_i(x_k), \overline{\mu}_i(x_k)]$, where $\underline{\mu}_i(x_k)$ and $\overline{\mu}_i(x_k)$ are the lower and upper limits of the belonging interval of datum $x_k$ to cluster $v_i$. Both limits of the fuzzy membership matrix are updated at each iteration.

Because the value of $\eta$ is likewise represented by an interval, the possibilistic partition matrix $\tau_i(x_k)$ must be calculated for the interval $[\eta_1, \eta_2]$. It is therefore given by the belonging interval $[\underline{\tau}_i(x_k), \overline{\tau}_i(x_k)]$, where $\underline{\tau}_i(x_k)$ and $\overline{\tau}_i(x_k)$ are the lower and upper limits of the belonging interval of datum $x_k$ to cluster $v_i$. Both limits of the possibilistic membership matrix are updated at each iteration.

Updating the positions of the cluster centroids must take into account the belonging intervals of the fuzzy and possibilistic matrices, which results in an interval of coordinates for each centroid position. The procedure for updating cluster prototypes in IT2FPCM therefore requires calculating one centroid for the lower limit and one for the upper limit of the interval, $\underline{v}_i$ and $\overline{v}_i$, using the fuzzy and possibilistic membership matrices.

Type-reduction and defuzzification use type-2 fuzzy operations: the centroid matrix and the fuzzy partition matrix are obtained from the interval limits by the type-reduction operation.

This extension of the FPCM algorithm is intended to show that the algorithm is capable of handling uncertainty and is less susceptible to noise. Figure 3 shows a block diagram of the steps of the FPCM algorithm, in which the operation of the Fuzzy Possibilistic C-Means algorithm can be followed step by step.
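One iteration of the update cycle described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not necessarily the authors' exact update rules: the interval limits are taken as the element-wise minimum and maximum of the Type-1 updates evaluated at the two exponents (the convention commonly used for IT2FCM), the centroid weights follow the Type-1 FPCM form (membership plus typicality, each raised to its exponent), and type-reduction is taken as the average of the lower and upper limits. The function names and the default exponent intervals are hypothetical.

```python
import numpy as np

def fcm_membership(dist, m):
    """Type-1 FCM membership for distances dist[i, k] (cluster i, datum k)."""
    power = 2.0 / (m - 1.0)
    # ratio[i, j, k] = (d_ik / d_jk) ** power, summed over clusters j
    ratio = (dist[:, None, :] / dist[None, :, :]) ** power
    return 1.0 / ratio.sum(axis=1)

def fpcm_typicality(dist, eta):
    """Type-1 FPCM typicality; each cluster's row sums to one over the data."""
    power = 2.0 / (eta - 1.0)
    # ratio[i, k, j] = (d_ik / d_ij) ** power, summed over data points j
    ratio = (dist[:, :, None] / dist[:, None, :]) ** power
    return 1.0 / ratio.sum(axis=2)

def it2fpcm_step(X, V, m=(1.5, 2.5), eta=(1.5, 2.5), eps=1e-12):
    """One assumed IT2FPCM iteration for data X (n, d) and centroids V (c, d)."""
    dist = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
    dist = np.maximum(dist, eps)  # guard against division by zero

    # Evaluate the Type-1 updates at both ends of each exponent interval.
    u1, u2 = fcm_membership(dist, m[0]), fcm_membership(dist, m[1])
    t1, t2 = fpcm_typicality(dist, eta[0]), fpcm_typicality(dist, eta[1])

    # Assumption: lower/upper limits are the element-wise min/max.
    u_lo, u_hi = np.minimum(u1, u2), np.maximum(u1, u2)
    t_lo, t_hi = np.minimum(t1, t2), np.maximum(t1, t2)

    # Assumption: FPCM-style centroid weights for each limit of the interval.
    w_lo = u_lo ** m[0] + t_lo ** eta[0]
    w_hi = u_hi ** m[1] + t_hi ** eta[1]
    v_lo = (w_lo @ X) / w_lo.sum(axis=1, keepdims=True)
    v_hi = (w_hi @ X) / w_hi.sum(axis=1, keepdims=True)

    # Assumption: type-reduction by averaging the two limits.
    V_new = (v_lo + v_hi) / 2.0
    U = (u_lo + u_hi) / 2.0
    return V_new, U, (u_lo, u_hi), (t_lo, t_hi)
```

Calling `it2fpcm_step` repeatedly until the centroids stop moving gives a complete clustering loop. The typicality computation builds a `(c, n, n)` array, so this sketch is only practical for small datasets.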

#### 3. Cluster Validation

Cluster validation is one of the main topics in data clustering; the problem consists in finding an objective criterion to determine how good a partition generated by a clustering algorithm is. Several validation indices exist [28–32], but they were proposed to validate clusters found by Type-1 Fuzzy clustering algorithms. To evaluate the lower and upper bounds of the interval of clusters found by the IT2FPCM and IT2FCM algorithms, the following validation indices are modified so that they can evaluate the partitions found by the Interval Type-2 Fuzzy clustering proposed in this work: (i) the partition entropy index, (ii) the Xie-Beni index, and (iii) the MPE-DMFP index.

The partition entropy (PE) was proposed by Bezdek [2, 5, 6] as a validation index for the Fuzzy C-Means algorithm. In general, an optimal number of clusters $c^*$ is found by minimizing PE over the candidate numbers of clusters, which gives a better clustering of the dataset $X$. To make this index able to evaluate the lower and upper bounds, it is computed separately for the upper and lower bounds of the interval.

Xie and Beni proposed in 1991 a validation index (XB) based on compactness and separation [2, 5, 6]. In general, an optimal number of clusters $c^*$ is found by minimizing XB, which gives a better clustering of the dataset $X$. To make this index able to evaluate the lower and upper bounds, it is likewise computed separately for the upper and lower bounds of the interval.

Elid Rubio et al. proposed the MPE-DMFP index, which is composed of two metrics: the modified partition entropy (MPE) index and the sum of the distances between the means of the fuzzy partitions (DMFP). The modified partition entropy represents the variation of the data within the clusters of the dataset, and the sum of the distances between the means of the fuzzy partitions represents the separation between clusters in the dataset, where the means are those of the fuzzy partitions generated by the Fuzzy C-Means algorithm. In general, an optimal number of clusters $c^*$ is defined by the solution of this index that gives a better clustering of the dataset $X$. To make this index able to evaluate the lower and upper bounds of the interval of clusters, it is computed separately for the upper and lower bounds: the upper-bound and lower-bound modified partition entropies represent the variation of the data within the clusters, and the upper-bound and lower-bound sums of distances represent the separation between clusters, for the upper and lower bounds of the interval of clusters, respectively.
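The two standard indices named in this section can be sketched in code as follows. The Type-1 definitions used here are the usual textbook forms of Bezdek's partition entropy and the Xie-Beni index (with squared memberships); applying them separately to the lower and upper partitions and averaging the two values as a defuzzified score are assumptions made for illustration, and the helper `interval_index` is hypothetical. The MPE-DMFP index is not sketched because its defining equations are not given in the text above.

```python
import numpy as np

def partition_entropy(U, base=np.e):
    """Bezdek's partition entropy: -(1/n) * sum_i sum_k u_ik * log(u_ik).

    U has shape (c, n); lower values indicate a crisper partition.
    """
    n = U.shape[1]
    safe = np.clip(U, 1e-12, 1.0)  # avoid log(0); 0 * log(.) still gives 0
    return -np.sum(U * np.log(safe)) / (n * np.log(base))

def xie_beni(U, V, X):
    """Xie-Beni index: compactness / (n * minimum squared centroid separation).

    U has shape (c, n), V has shape (c, d), X has shape (n, d).
    """
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (c, n)
    compactness = np.sum(U ** 2 * d2)
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # (c, c)
    np.fill_diagonal(sep, np.inf)  # ignore the distance of a centroid to itself
    return compactness / (X.shape[0] * sep.min())

def interval_index(index_fn, lower_args, upper_args):
    """Evaluate a Type-1 index on the lower and upper bounds of the interval.

    Assumption: the defuzzified value is the average of the two bounds.
    """
    lo = index_fn(*lower_args)
    hi = index_fn(*upper_args)
    return lo, hi, (lo + hi) / 2.0
```

For example, `interval_index(xie_beni, (U_lo, V_lo, X), (U_hi, V_hi, X))` would return the index for each bound and their average, where `U_lo`, `V_lo`, `U_hi`, and `V_hi` stand for the lower and upper partition and centroid matrices. Note that the columns of a lower or upper partition matrix need not sum to one, so the bound values are not directly comparable with Type-1 index values.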

#### 4. Results of the Implementation of the IT2FPCM Algorithm

The IT2FPCM algorithm was tested on several benchmark datasets and images to determine whether it performs better than the IT2FCM algorithm. We performed 30 experiments using the Wine, WDBC, Iris Flower, Ionosphere, Abalone, and Cover type datasets. Each dataset was clustered with both algorithms, and the results were compared using the validation indices described in the previous section.

Tables 1, 2, and 3 show the results obtained for the WDBC dataset, which has 569 samples, 30 dimensions, and 2 clusters. This dataset was tested with 2 to 10 clusters using the IT2FPCM and IT2FCM algorithms, and different validation indices were used to evaluate the performance of both algorithms. The results shown are the mean of 30 experiments for each number of clusters tested with both algorithms. Tables 1 and 2 show that both algorithms find the correct number of clusters for the lower and upper bounds of the interval and for its defuzzification using the IT2PE and IT2MPE-DMFP validation indices. Table 3 shows that, with the IT2XB validation index, the IT2FPCM algorithm did not find the correct number of clusters for the lower bound of the interval, but it did find the correct number for the upper bound and for the defuzzification of the lower and upper bounds.
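The selection protocol described above (average an index over repeated runs for each candidate number of clusters, then pick the best average) can be sketched as follows. The function name and the input format are hypothetical, and the `lower_is_better` flag is an assumption: it holds for the partition entropy and Xie-Beni indices, while the direction for other indices should be checked against their definitions.

```python
import numpy as np

def select_cluster_count(index_runs, lower_is_better=True):
    """Pick the cluster count whose mean index value over the runs is best.

    index_runs maps each candidate cluster count c to the index values
    obtained in the repeated runs for that c (hypothetical input format).
    """
    means = {c: float(np.mean(values)) for c, values in index_runs.items()}
    pick = min if lower_is_better else max
    best_c = pick(means, key=means.get)
    return best_c, means
```

The same function can be applied three times per index, once to the lower-bound values, once to the upper-bound values, and once to the defuzzified values, to obtain the three selections reported per table.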