Abstract

Intuitionistic fuzzy sets (IFSs) provide a mathematical framework, built on fuzzy sets, for describing vagueness in data, and they find interesting and promising applications in different domains. Here, we develop an intuitionistic fuzzy possibilistic C means (IFPCM) algorithm to cluster IFSs by hybridizing the concepts of fuzzy possibilistic C means (FPCM), IFSs, and distance measures. IFPCM resolves the inherent problems encountered with information regarding the membership values of objects to each cluster by generalizing membership and nonmembership with a hesitancy degree. The algorithm is extended to the clustering of interval valued intuitionistic fuzzy sets (IVIFSs), leading to interval valued intuitionistic fuzzy possibilistic C means (IVIFPCM), in which the membership and nonmembership degrees are intervals. The algorithms provide information regarding the membership and typicality degrees of samples to all clusters. Experiments are performed on both real and simulated datasets. The proposed approach generates valuable information, produces overlapped clusters with different membership degrees, and takes into account the inherent uncertainty in information captured by IFSs. Its advantages include simplicity, flexibility, and low computational complexity. The algorithm is evaluated through cluster validity measures, and its clustering accuracy is investigated on classification datasets with labeled patterns. The algorithm maintains appreciable performance compared to other methods in terms of pureness ratio.

1. Introduction

Clustering algorithms [1, 2] form an integral part of computational intelligence and pattern recognition research. Cluster analysis is commonly used as an important tool to classify a collection of objects into homogeneous groups, such that objects within a given group are similar to each other whereas objects in different groups are dissimilar. The concept is based on the notion of similarity, which is a basic component of intelligence and ubiquitous in scientific endeavor. Clustering finds numerous applications [3] across a variety of disciplines such as taxonomy, image processing, information retrieval, data mining, pattern recognition, microbiology, archaeology, and geographical analysis. It is an exploratory tool for deducing the nature of data by providing labels to individual objects that describe how the data separate into groups. It has improved the performance of other systems by separating the problem domain into manageable subgroups [4]. Researchers are often confronted with challenging datasets that are large and unlabeled. There are many methods available in exploratory data analysis [5, 6] by which researchers can elucidate such data.

Clustering an unlabeled dataset $X = \{x_1, x_2, \ldots, x_n\}$ is the partitioning of $X$ into $c$ subgroups such that each subgroup represents a natural substructure in $X$. This is done by assigning labels to the vectors in $X$ and hence to the objects generating $X$. A partition of $X$ is a set of $cn$ values that can be conveniently represented as a $c \times n$ matrix $U = [u_{ik}]$. There are generally three sets of partition matrices [7, 8]:
$$M_{pcn} = \left\{ U \in \mathbb{R}^{c \times n} : 0 \le u_{ik} \le 1 \;\; \forall i, k; \;\; \forall k \; \exists i \;\; u_{ik} > 0 \right\}, \tag{1}$$
$$M_{fcn} = \left\{ U \in M_{pcn} : \sum_{i=1}^{c} u_{ik} = 1 \;\; \forall k \right\}, \tag{2}$$
$$M_{hcn} = \left\{ U \in M_{fcn} : u_{ik} \in \{0, 1\} \;\; \forall i, k \right\}. \tag{3}$$
The set in (1) has the property that, for any $k$, there exists at least one index $i$ such that $u_{ik}$ is greater than 0. The set in (2) requires each column to sum to 1, so it is obvious that at least one entry in each column is greater than 0. The set in (3) is formed by the Boolean matrices and is a subset of the set in (2). Equations (1), (2), and (3) thus define the sets of possibilistic, fuzzy or probabilistic, and crisp partitions of $X$, respectively. Hence, there are four kinds of label vectors, but fuzzy and probabilistic label vectors are mathematically identical, having entries between 0 and 1 that sum to 1 over each column. The reason these matrices are called partitions follows from the interpretation of their entries. If $U$ is crisp or fuzzy, $u_{ik}$ is taken as the membership of $x_k$ in the $i$th partitioning fuzzy subset (cluster) of $X$. If $U$ in (2) is probabilistic, $u_{ik}$ is the posterior probability of class $i$ given $x_k$. If $U$ in (1) is possibilistic, it has entries between 0 and 1 that do not necessarily sum to 1 over any column. In this case, $u_{ik}$ is taken as the possibility that $x_k$ belongs to class $i$. An alternate interpretation is that the possibility measures the typicality of $x_k$ to cluster $i$. It is observed that $M_{hcn} \subset M_{fcn} \subset M_{pcn}$. A clustering algorithm finds the $U$ which best explains and represents an unknown structure in $X$ with respect to the model that defines $U$. For $c = 1$, the partition is represented uniquely by the hard 1-partition, which unequivocally assigns all objects to a single cluster, and for $c = n$ it is represented uniquely by the identity matrix, up to a permutation of columns; in this case, each object is in its own singleton cluster. Choosing $c = 1$ or $c = n$ therefore rejects the hypothesis that $X$ contains clusters.
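As a small numerical illustration of the three partition types just described, the following Python sketch uses a hypothetical toy case with $c = 2$ clusters and $n = 3$ objects and checks the defining column constraints of (1)-(3); the matrices are invented for illustration only.

```python
import numpy as np

# Rows index clusters, columns index objects, as in the text.
U_crisp = np.array([[1, 0, 1],
                    [0, 1, 0]])          # entries in {0, 1}, columns sum to 1

U_fuzzy = np.array([[0.7, 0.2, 0.5],
                    [0.3, 0.8, 0.5]])    # entries in [0, 1], columns sum to 1

U_poss  = np.array([[0.7, 0.1, 0.4],
                    [0.6, 0.9, 0.1]])    # entries in [0, 1], each column has a positive entry

assert np.allclose(U_fuzzy.sum(axis=0), 1.0)   # fuzzy/probabilistic column-sum constraint in (2)
assert (U_poss.max(axis=0) > 0).all()          # possibilistic constraint in (1): no all-zero column
assert set(np.unique(U_crisp)) <= {0, 1}       # crisp partitions in (3) are Boolean
```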

In the last few decades, a variety of clustering techniques [3, 5, 6, 9, 10] have been developed to classify data. Clustering techniques are broadly divided into hierarchical and partition methods. Hierarchical clustering [5] generates a hierarchical tree of clusters called a dendrogram and can be either divisive or agglomerative [3]. The former is a top-down splitting technique which starts with all objects in one cluster and forms the hierarchy by dividing objects into smaller clusters in an iterative procedure, until the desired number of clusters is achieved or each object constitutes its own cluster. The latter starts by considering each object as a cluster and then comparing the clusters amongst themselves using a distance measure. The clusters with the smallest distance are considered as constituting a unique group and are merged. The merging procedure is repeated until the desired number of clusters is achieved or only one cluster is left containing all the objects. Partition clustering methods give a single partition of the objects, with the number of clusters being predefined [11]. One of the most widely used partition clustering algorithms is fuzzy C means (FCM). FCM is a combination of the C means clustering algorithm and fuzzy logic [1, 7]. It works iteratively, with the desired number of clusters and the initial seeds predefined. The FCM algorithm assigns memberships to $x_k$ that are inversely proportional to the relative distance of $x_k$ to the point prototypes, that is, the cluster centers. If $x_k$ is equidistant from two prototypes, its membership in each of the two clusters will be the same, regardless of the absolute distance of $x_k$ from the two centroids as well as from the other points. The problem this creates is that noise points, far from but equidistant to the central structure of two clusters, are nonetheless given equal membership in both, when it seems far more natural that such points be given very low or no membership in either cluster. This problem was overcome by Krishnapuram and Keller [8], who proposed possibilistic C means (PCM), which relaxes the column sum constraint in (2) so that each element of the $k$th column can take any value between 0 and 1, as long as at least one of them is positive. They suggested that the value $u_{ik}$ should be interpreted as the typicality of $x_k$ relative to cluster $i$ and interpreted each row of $U$ as a possibility distribution over $X$. The objective function of the PCM algorithm sometimes helps to identify outliers, that is, noise points. However, Barni et al. [12] pointed out that PCM pays a price for its freedom to ignore noise points: it is very sensitive to initialization and sometimes generates coincident clusters. Moreover, typicality can be very sensitive to the choice of the additional parameters needed by the PCM algorithm. The coincident cluster problem of the PCM algorithm was avoided by two possibilistic fuzzy clustering algorithms proposed by Timm et al. [13-15]. They modified the PCM objective function by adding an inverse function of the distances between the cluster centers. This term acts in a repulsive manner and avoids coincident clusters. In [13, 14], Timm et al. applied the same modification to the objective function of the Gustafson-Kessel [16] clustering algorithm. These algorithms exploit the benefits of both fuzzy and possibilistic clustering. Pal et al. [17] justified the need for both possibility (typicality) and membership values and proposed a model and corresponding algorithm called fuzzy possibilistic C means (FPCM). This algorithm normalizes the possibility values, so that the sum of the possibilities of all data points in a cluster is 1. Although FPCM is much less prone to the errors encountered by FCM and PCM, the possibility values become very small as the size of the dataset increases.

The notion of intuitionistic fuzzy set (IFS), coined by Atanassov [22] as a generalization of fuzzy sets, has interesting and useful applications in different domains such as logic programming, decision making problems, and medical diagnostics [23-26]. This generalization represents degrees of membership and nonmembership together with a degree of hesitancy. Thus, knowledge and semantic representation become more meaningful and applicable [27, 28]. Sometimes it is not appropriate to assume that the membership and nonmembership degrees of an object are exactly defined [29]; rather, value ranges or value intervals can be assigned. In such cases, the IFS can be generalized to the interval valued intuitionistic fuzzy set (IVIFS) [29], whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful for describing and dealing with vague and uncertain data [28, 30]. With this motivation, it is desirable to develop some practical approaches to clustering IFSs and IVIFSs. An intuitionistic fuzzy similarity matrix was defined in [31], and thereby an intuitionistic fuzzy equivalence matrix was developed. The work in [31] gave an approach to transform intuitionistic fuzzy similarity matrices into intuitionistic fuzzy equivalence matrices, based on which a procedure for clustering intuitionistic fuzzy sets was proposed. Some methods for calculating association coefficients of IFSs or IVIFSs and a corresponding clustering algorithm were introduced in [32]. The algorithm used the derived association coefficients of IFSs or IVIFSs to construct an association matrix and utilized a procedure to transform it into an equivalent association matrix. Reference [33] introduced an intuitionistic fuzzy hierarchical algorithm for clustering IFSs which is based on the traditional hierarchical clustering procedure and an intuitionistic fuzzy aggregation operator. These algorithms, however, cannot provide information about the membership degrees of objects to each cluster.

In this work, an intuitionistic fuzzy possibilistic C means (IFPCM) algorithm to cluster IFSs is developed. IFPCM is obtained by applying IFSs to FPCM, a well known clustering method, using basic distance measures between IFSs [34, 35]. At each stage of the algorithm the seeds are modified, and for each IFS the membership and typicality degrees to each of the clusters are estimated. The algorithm ends when all given IFSs are clustered according to the estimated membership and typicality degrees. It overcomes the inherent problems encountered with information regarding the membership values of objects to each cluster by generalizing membership and nonmembership with the hesitancy degree. The algorithm is then extended to interval valued intuitionistic fuzzy possibilistic C means (IVIFPCM) for clustering IVIFSs. The algorithms are illustrated through experiments on different datasets. The evaluation of the algorithm is performed through cluster validity measures. The clustering accuracy of the algorithm is determined on classification datasets with labeled patterns. The IFPCM algorithm is simple and flexible in nature and provides information about the membership and typicality degrees of samples to all clusters with low computational complexity.

This paper is organized as follows. In the next section, the concepts of IFSs and IVIFSs are defined. The FPCM clustering algorithm is given in Section 3. Section 4 presents the IFPCM and IVIFPCM clustering algorithms for IFSs and IVIFSs, respectively. The experimental results on both real world and simulated datasets are illustrated in Section 5. Finally, conclusions are given in Section 6.

2. Intuitionistic Fuzzy Sets and Interval Valued Intuitionistic Fuzzy Sets

In this section, we present some basic definitions associated with IFSs and IVIFSs.

Definition 1. Let $X = \{x_1, x_2, \ldots, x_n\}$ be the universe of discourse [22]; then an IFS $A$ in $X$ is defined as
$$A = \left\{ \left\langle x, \mu_A(x), \nu_A(x) \right\rangle \mid x \in X \right\}. \tag{4}$$
In (4), $\mu_A(x)$ and $\nu_A(x)$ are the membership and nonmembership degrees, respectively, satisfying the following constraint:
$$0 \le \mu_A(x) + \nu_A(x) \le 1. \tag{5}$$
Equation (5) is subject to the condition that $\mu_A(x), \nu_A(x) \in [0, 1]$ for all $x \in X$.

Definition 2. For each IFS $A$ in $X$, the quantity $\pi_A(x) = 1 - \mu_A(x) - \nu_A(x)$ is called the hesitation degree (or intuitionistic index) [36] of $x$ to $A$. Obviously $\pi_A(x)$ lies in the range $[0, 1]$; in particular, if $\pi_A(x) = 0$, then the IFS $A$ is reduced to a fuzzy set. For example, if $\mu_A(x) = 0.6$ and $\nu_A(x) = 0.3$, then $\pi_A(x) = 0.1$. If $\mu_A(x)$ and $\nu_A(x)$ both have the value 0, so that $\pi_A(x) = 1$, then the IFS $A$ is completely intuitionistic.

Considering the fact that the elements $x_j$ ($j = 1, 2, \ldots, n$) in the universe $X$ may have different importance, let $w = (w_1, w_2, \ldots, w_n)^T$ be the weight vector of the $x_j$ ($j = 1, 2, \ldots, n$), with
$$w_j \ge 0, \qquad \sum_{j=1}^{n} w_j = 1. \tag{6}$$
Xu [37] defined the following weighted Euclidean distance between IFSs $A$ and $B$:
$$d(A, B) = \left( \frac{1}{2} \sum_{j=1}^{n} w_j \left[ \left( \mu_A(x_j) - \mu_B(x_j) \right)^2 + \left( \nu_A(x_j) - \nu_B(x_j) \right)^2 + \left( \pi_A(x_j) - \pi_B(x_j) \right)^2 \right] \right)^{1/2}. \tag{7}$$
In particular, if $w = (1/n, 1/n, \ldots, 1/n)^T$, then (7) reduces to the normalized Euclidean distance [34], which is defined as follows:
$$d(A, B) = \left( \frac{1}{2n} \sum_{j=1}^{n} \left[ \left( \mu_A(x_j) - \mu_B(x_j) \right)^2 + \left( \nu_A(x_j) - \nu_B(x_j) \right)^2 + \left( \pi_A(x_j) - \pi_B(x_j) \right)^2 \right] \right)^{1/2}. \tag{8}$$
Atanassov and Gargov [29] pointed out that sometimes it is not appropriate to assume that the membership and nonmembership degrees of an element are exactly defined, but value ranges or value intervals can be given. In this context, Atanassov and Gargov [29] extended the IFS and introduced the concept of the IVIFS, which is characterized by a membership degree and a nonmembership degree whose values are intervals rather than exact numbers.
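As a companion to (7) and (8), the following Python sketch computes the weighted Euclidean distance between two IFSs, deriving the hesitation degrees internally; the function name and the example arrays are ours and are purely illustrative.

```python
import numpy as np

def ifs_weighted_euclidean(mu_a, nu_a, mu_b, nu_b, w):
    """Weighted Euclidean distance between IFSs A and B over a universe of n elements.

    mu_*, nu_* are arrays of membership and nonmembership degrees; w is a nonnegative
    weight vector summing to 1. Hesitation degrees pi = 1 - mu - nu are included,
    following the form of the distance in (7).
    """
    pi_a = 1.0 - mu_a - nu_a
    pi_b = 1.0 - mu_b - nu_b
    sq = (mu_a - mu_b) ** 2 + (nu_a - nu_b) ** 2 + (pi_a - pi_b) ** 2
    return np.sqrt(0.5 * np.sum(w * sq))

# Hypothetical example with n = 3 elements and equal weights,
# in which case the distance reduces to the normalized form in (8).
mu_a, nu_a = np.array([0.6, 0.4, 0.7]), np.array([0.3, 0.5, 0.1])
mu_b, nu_b = np.array([0.5, 0.6, 0.2]), np.array([0.4, 0.3, 0.6])
w = np.full(3, 1.0 / 3)
print(ifs_weighted_euclidean(mu_a, nu_a, mu_b, nu_b, w))
```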

Definition 3. An IVIFS $\tilde{A}$ over $X$ is an object having the following form [29]:
$$\tilde{A} = \left\{ \left\langle x, \tilde{\mu}_{\tilde{A}}(x), \tilde{\nu}_{\tilde{A}}(x) \right\rangle \mid x \in X \right\}. \tag{9}$$
Here, $\tilde{\mu}_{\tilde{A}}(x) = [\mu_{\tilde{A}}^{L}(x), \mu_{\tilde{A}}^{U}(x)] \subset [0, 1]$ and $\tilde{\nu}_{\tilde{A}}(x) = [\nu_{\tilde{A}}^{L}(x), \nu_{\tilde{A}}^{U}(x)] \subset [0, 1]$ are intervals, with $\mu_{\tilde{A}}^{U}(x) + \nu_{\tilde{A}}^{U}(x) \le 1$ for all $x \in X$. In particular, if $\mu_{\tilde{A}}^{L}(x) = \mu_{\tilde{A}}^{U}(x)$ and $\nu_{\tilde{A}}^{L}(x) = \nu_{\tilde{A}}^{U}(x)$, then $\tilde{A}$ is reduced to an IFS.

Now we extend the weighted Euclidean distance measure given in (7) to IVIFS theory, which yields (10). In particular, if $w = (1/n, 1/n, \ldots, 1/n)^T$, then (10) reduces to the normalized Euclidean distance for IVIFSs given in (11).
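Since (10) and (11) are not reproduced above, the sketch below shows one commonly used interval valued extension of the distance, averaging the squared differences of the lower and upper bounds of the membership, nonmembership, and hesitation intervals; the 1/4 normalization and the derivation of the hesitation intervals are assumptions on our part, not taken verbatim from the paper.

```python
import numpy as np

def ivifs_weighted_euclidean(muL_a, muU_a, nuL_a, nuU_a,
                             muL_b, muU_b, nuL_b, nuU_b, w):
    """Assumed interval valued extension of the weighted Euclidean IFS distance.

    Each argument is an array over the universe; [muL, muU] and [nuL, nuU] are the
    membership and nonmembership intervals. Hesitation intervals are derived as
    [1 - muU - nuU, 1 - muL - nuL]. The 1/4 factor keeps the distance in [0, 1];
    this normalization is an assumption, not the paper's stated equation.
    """
    piL_a, piU_a = 1 - muU_a - nuU_a, 1 - muL_a - nuL_a
    piL_b, piU_b = 1 - muU_b - nuU_b, 1 - muL_b - nuL_b
    sq = ((muL_a - muL_b) ** 2 + (muU_a - muU_b) ** 2 +
          (nuL_a - nuL_b) ** 2 + (nuU_a - nuU_b) ** 2 +
          (piL_a - piL_b) ** 2 + (piU_a - piU_b) ** 2)
    return np.sqrt(0.25 * np.sum(w * sq))
```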

3. Fuzzy Possibilistic C Means Clustering Algorithm

This section illustrates the FPCM clustering algorithm proposed by Pal et al. [17] in 1997 to exploit the benefits of fuzzy and possibilistic modeling while circumventing their weaknesses. To correctly interpret the data substructure, FPCM clustering uses both memberships (relative typicality) and possibilities (absolute typicality). When we want to crisply label a data point, membership is a plausible choice, as it is natural to assign a point to the cluster whose prototype is closest to it. On the other hand, while estimating the centroids, typicality is an important means for alleviating the undesirable effects of outliers. Here, the number of clusters is fixed a priori to a value chosen in view of the dataset used in the application, so that the choice remains data driven. Generally it is advisable to avoid trivial clusters, which may be either too large or too small.

FPCM extends the FCM clustering algorithm [17] by normalizing the possibility values so that the sum of the possibilities of all data points in a cluster is 1. Although FPCM is much less prone to the problems of both FCM and PCM, the possibility values become very small as the size of the dataset increases. Analogous to the FCM clustering algorithm, the membership term in FPCM is a function of a data point and all centroids, whereas the typicality term in FPCM is a function of a data point and one cluster prototype alone. That is, the membership term is influenced by the positions of all cluster centers whereas the typicality term is affected by only one. Incorporating the abovementioned facets, the FPCM model is defined by the following optimization problem [17]:
$$\min_{U, T, V} \; J_{m,\eta}(U, T, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right) \left\| x_k - v_i \right\|^2, \tag{12}$$
subject to $\sum_{i=1}^{c} u_{ik} = 1$ for all $k$, $\sum_{k=1}^{n} t_{ik} = 1$ for all $i$, and $u_{ik}, t_{ik} \ge 0$, with $m > 1$ and $\eta > 1$. An admissible typicality matrix $T$ therefore has rows that sum to 1, so that its transpose belongs to the set of fuzzy partitions in (2) with the roles of clusters and data points interchanged; $T$ is viewed as a typicality assignment of objects to clusters. The possibilistic term distributes $t_{ik}$ with respect to all data points, but not with respect to all clusters. Under the usual conditions placed on c-means optimization problems, the first order necessary conditions for extrema of $J_{m,\eta}$ are stated in terms of the following theorem.

Theorem FPCM (see [17]). If $m, \eta > 1$ and $X$ contains at least $c$ distinct data points, then $(U, T, V)$ may minimize $J_{m,\eta}$ only if
$$u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right)^{-1}, \qquad t_{ik} = \left( \sum_{j=1}^{n} \left( \frac{d_{ik}}{d_{ij}} \right)^{2/(\eta-1)} \right)^{-1}, \qquad v_i = \frac{\sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right) x_k}{\sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right)},$$
where $d_{ik} = \left\| x_k - v_i \right\|$.

The proof of the above theorem follows from [38]. FPCM has the same type of singularity as FCM. FPCM does not suffer from the sensitivity problem that PCM seems to exhibit. Unfortunately, when the number of data points is large, the typicality values will be very small. Thus, after the FPCM-AO algorithm [38] for approximating solutions to (12), based on iteration through the conditions of Theorem FPCM, terminates, the typicality values may need to be scaled up. Conceptually, this is no different from scaling typicality as is done in PCM. While scaling seems to solve the small value problem caused by the row sum constraint on $T$, the scaled values do not carry any additional information about points in the data. Thus scaling is an artificial fix for a mathematical drawback of FPCM.
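To make the alternating optimization concrete, the following is a minimal Python sketch of one pass of the standard FPCM updates on numeric data; the function name, default parameter values, and the small epsilon used to avoid division by zero are ours, and the exact singularity handling of [17] is not reproduced.

```python
import numpy as np

def fpcm_iteration(X, V, m=2.0, eta=2.0, eps=1e-10):
    """One FPCM alternating-optimization pass.

    X: (n, p) data matrix; V: (c, p) current prototypes.
    Returns updated memberships U (c, n), typicalities T (c, n), and prototypes.
    """
    # Squared distances d2[i, k] between prototype i and point k.
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps

    # Membership: for each point k, normalize over the c clusters.
    U = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)

    # Typicality: for each cluster i, normalize over the n data points.
    T = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (eta - 1.0))).sum(axis=2)

    # Prototypes: weighted means with weights u^m + t^eta.
    W = U ** m + T ** eta
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, T, V_new
```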

4. Intuitionistic Fuzzy Possibilistic C Means Clustering Algorithms

In this section, we discuss intuitionistic fuzzy possibilistic C means clustering algorithms for IFSs and IVIFSs, respectively.

4.1. Intuitionistic Fuzzy Possibilistic C Means Algorithm for IFSs

We develop the intuitionistic fuzzy possibilistic C means (IFPCM) model and the corresponding algorithm for IFSs. We take the basic distance measure in (7) as the proximity function of IFPCM; the objective function of the IFPCM model can then be defined as follows:
$$\min \; J_{m,\eta}(U, T, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right) d^{2}(A_k, V_i), \qquad \text{subject to} \;\; \sum_{i=1}^{c} u_{ik} = 1 \;\; \forall k, \quad \sum_{k=1}^{n} t_{ik} = 1 \;\; \forall i. \tag{18}$$
Here $A_1, A_2, \ldots, A_n$ are IFSs, each with $s$ elements; $c$ is the number of clusters; and $V_1, V_2, \ldots, V_c$ are the prototypical IFSs, that is, the centroids of the clusters. The parameter $m$ is the fuzzy factor, $u_{ik}$ is the membership degree of the $k$th sample to the $i$th cluster, $U = [u_{ik}]$ is a matrix of order $c \times n$, the parameter $\eta$ is the typicality factor, $t_{ik}$ is the typicality of the $k$th sample to the $i$th cluster, and $T = [t_{ik}]$ is the typicality matrix.

To solve the optimization problem stated in (18), we make use of the Lagrange multiplier method [39], which is discussed below. Consider the Lagrangian function
$$L = \sum_{i=1}^{c} \sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right) d_{ik}^{2} - \sum_{k=1}^{n} \lambda_k \left( \sum_{i=1}^{c} u_{ik} - 1 \right) - \sum_{i=1}^{c} \gamma_i \left( \sum_{k=1}^{n} t_{ik} - 1 \right),$$
where $d_{ik} = d(A_k, V_i)$ is the distance in (7) between the $k$th sample and the $i$th prototype, and $\lambda_k$ and $\gamma_i$ are the Lagrange multipliers associated with the membership and typicality constraints, respectively.

Furthermore, setting the partial derivatives of $L$ with respect to $u_{ik}$, $t_{ik}$, $\lambda_k$, and $\gamma_i$ equal to zero yields a system of equations. From this system, we obtain the following expressions for the membership and typicality degrees:
$$u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right)^{-1}, \tag{22}$$
$$t_{ik} = \left( \sum_{j=1}^{n} \left( \frac{d_{ik}}{d_{ij}} \right)^{2/(\eta-1)} \right)^{-1}. \tag{23}$$
Now we proceed to compute $V_i$ ($i = 1, 2, \ldots, c$), the prototypical IFSs. Setting the derivative of the objective function with respect to each prototype equal to zero shows that each $V_i$ is a weighted average of the samples $A_1, A_2, \ldots, A_n$ with weights proportional to $u_{ik}^{m} + t_{ik}^{\eta}$. For simplicity, we define a weighted average operator for IFSs as follows.

Let $A_1, A_2, \ldots, A_n$ be a set of IFSs, each with $s$ elements, and let $\omega_1, \omega_2, \ldots, \omega_n$ be a set of weights for these IFSs, with $\omega_k \ge 0$ and $\sum_{k=1}^{n} \omega_k = 1$; the weighted average operator then combines the IFSs component-wise, producing an IFS whose membership and nonmembership degrees at each element are the weighted averages of the corresponding degrees of $A_1, A_2, \ldots, A_n$. According to the preceding derivation, if the weights are taken proportional to $u_{ik}^{m} + t_{ik}^{\eta}$, the prototypical IFSs of the IFPCM model can be computed as
$$V_i = \frac{\sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right) A_k}{\sum_{k=1}^{n} \left( u_{ik}^{m} + t_{ik}^{\eta} \right)}, \qquad i = 1, 2, \ldots, c, \tag{30}$$
where the sum and the scalar multiplication are understood component-wise on the membership and nonmembership degrees. Since equations (22), (23), and (30) are computationally interdependent, we exploit an iterative procedure similar to the FPCM algorithm to solve them. The steps of the algorithm are as follows.

IFPCM Algorithm

Step  1. Initialize the seed values $V_i^{(0)}$ ($i = 1, 2, \ldots, c$); choose a small termination threshold and set the iteration counter $l = 0$.

Step  2(i). Calculate the membership degrees $u_{ik}^{(l)}$, where (a) if the distance from $A_k$ to every current prototype $V_i^{(l)}$ is positive, then $u_{ik}^{(l)}$ is computed from (22) for $i = 1, 2, \ldots, c$ and $k = 1, 2, \ldots, n$; (b) if there exists a prototype $V_i^{(l)}$ such that $d(A_k, V_i^{(l)}) = 0$, then let $u_{ik}^{(l)} = 1$ and set the remaining memberships of $A_k$ to 0.

Step  2(ii). Calculate the typicality degrees $t_{ik}^{(l)}$, where (a) if the distance from every sample to the prototype $V_i^{(l)}$ is positive, then $t_{ik}^{(l)}$ is computed from (23) for $i = 1, 2, \ldots, c$ and $k = 1, 2, \ldots, n$; (b) if there exists a sample $A_k$ such that $d(A_k, V_i^{(l)}) = 0$, then let $t_{ik}^{(l)} = 1$ and set the remaining typicalities for $V_i^{(l)}$ to 0.

Step  3. Calculate the updated prototypes $V_i^{(l+1)}$ ($i = 1, 2, \ldots, c$) using the weighted average operator in (30).

Step  4. If the change between successive iterations falls below the chosen threshold, then go to Step 5; otherwise, let $l = l + 1$ and return to Step 2.

Step  5. End

The pseudocode of the IFPCM algorithm is given in Algorithm 1.

Given an unlabeled dataset of IFSs $A_1, A_2, \ldots, A_n$, partition it into $c$ clusters such
that the objective function in (18) is minimized
(1) Input: Consider the seed values $V_i^{(0)}$, assume a termination threshold, and set the iteration counter to 0
(2) Output: Generate $c$ clusters using the IFPCM clustering algorithm for IFSs
(3) begin procedure
(4) repeat
(5)  calculate the membership degrees $u_{ik}$
(6)   begin
(7)   if the distance from $A_k$ to every current prototype is positive  then
(8)    compute $u_{ik}$ from (22)
(9)   end if
(10)  if $A_k$ coincides with some prototype (zero distance)  then
(11)   set that membership to 1 and the remaining memberships of $A_k$ to 0
(12)  end if
(13)  end
(14)  calculate the typicality degrees $t_{ik}$
(15)  begin
(16)  if the distance from every sample to the prototype $V_i$ is positive  then
(17)   compute $t_{ik}$ from (23)
(18)  end if
(19)  if some sample coincides with the prototype $V_i$ (zero distance)  then
(20)   set that typicality to 1 and the remaining typicalities for $V_i$ to 0
(21)  end if
(22)  end
(23)  calculate the updated prototypes $V_i$ using the weighted average operator in (30)
(24)  increment the iteration counter
(25)  until the change between successive iterations falls below the threshold
(26)  end procedure
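For readers who prefer executable form, the following is a minimal Python sketch of one IFPCM pass under the assumptions spelled out above: the distance is the weighted Euclidean IFS distance of (7) (with hesitation degrees), and each prototype is the component-wise weighted average of the samples with weights proportional to $u^{m} + t^{\eta}$. All names and default parameter values are ours, and the zero-distance special cases of Steps 2(i) and 2(ii) are only approximated via a small epsilon.

```python
import numpy as np

def ifs_distance_matrix(MU, NU, VMU, VNU, w, eps=1e-12):
    """Pairwise weighted Euclidean IFS distances d[i, k] between prototypes and samples.

    MU, NU: (n, s) membership/nonmembership degrees of the n samples over s elements.
    VMU, VNU: (c, s) degrees of the c prototypical IFSs. w: (s,) element weights.
    Hesitation degrees pi = 1 - mu - nu are included, as in (7).
    """
    PI, VPI = 1 - MU - NU, 1 - VMU - VNU
    sq = ((VMU[:, None, :] - MU[None, :, :]) ** 2 +
          (VNU[:, None, :] - NU[None, :, :]) ** 2 +
          (VPI[:, None, :] - PI[None, :, :]) ** 2)
    return np.sqrt(0.5 * (sq * w).sum(axis=2)) + eps

def ifpcm_iteration(MU, NU, VMU, VNU, w, m=2.0, eta=2.0):
    """One IFPCM pass: update memberships, typicalities, and prototypical IFSs."""
    d2 = ifs_distance_matrix(MU, NU, VMU, VNU, w) ** 2      # (c, n) squared distances
    U = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1 / (m - 1))).sum(axis=1)
    T = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1 / (eta - 1))).sum(axis=2)
    W = U ** m + T ** eta                                    # prototype weights
    W = W / W.sum(axis=1, keepdims=True)
    VMU_new, VNU_new = W @ MU, W @ NU                        # component-wise weighted averages
    return U, T, VMU_new, VNU_new

# After convergence, each sample can be assigned to its maximum-membership cluster:
# labels = U.argmax(axis=0)
```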

4.2. Interval Valued Intuitionistic Fuzzy Possibilistic C Means Algorithm for IVIFSs

If the collected data are expressed as IVIFSs, then we extend IFPCM to the interval valued intuitionistic fuzzy possibilistic C means (IVIFPCM) model. We take the basic distance measure in (10) as the proximity function of IVIFPCM. The objective function of the IVIFPCM model is defined analogously to (18), with the IFSs replaced by IVIFSs and the distance taken from (10). Here $\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_n$ are IVIFSs, each with $s$ elements; $c$ is the number of clusters; and $\tilde{V}_1, \tilde{V}_2, \ldots, \tilde{V}_c$ are the prototypical IVIFSs, that is, the centroids of the clusters. The parameter $m$ is the fuzzy factor, $u_{ik}$ is the membership degree of the $k$th sample to the $i$th cluster, $U = [u_{ik}]$ is a matrix of order $c \times n$, the parameter $\eta$ is the typicality factor, $t_{ik}$ is the typicality of the $k$th sample to the $i$th cluster, and $T = [t_{ik}]$ is the typicality matrix.

To solve this optimization problem, we again make use of the Lagrange multiplier method [39]. Considering the corresponding Lagrangian function, in which the distances $d_{ik} = d(\tilde{A}_k, \tilde{V}_i)$ are now computed with the IVIFS distance in (10), and proceeding similarly to the IFPCM model, we establish the system of partial derivative equations. The solution of this system yields the membership degrees, the typicality degrees, and the prototypical IVIFSs given in (41) and (42). Because (41) and (42) are computationally interdependent, we exploit a similar iterative procedure as follows.

IVIFPCM Algorithm

Step  1. Initialize the seed values $\tilde{V}_i^{(0)}$ ($i = 1, 2, \ldots, c$); choose a small termination threshold and set the iteration counter $l = 0$.

Step  2(i). Calculate the membership degrees $u_{ik}^{(l)}$, where (a) if the distance from $\tilde{A}_k$ to every current prototype $\tilde{V}_i^{(l)}$ is positive, then $u_{ik}^{(l)}$ is computed from the membership expression obtained above, for $i = 1, 2, \ldots, c$ and $k = 1, 2, \ldots, n$; (b) if there exists a prototype $\tilde{V}_i^{(l)}$ such that $d(\tilde{A}_k, \tilde{V}_i^{(l)}) = 0$, then let $u_{ik}^{(l)} = 1$ and set the remaining memberships of $\tilde{A}_k$ to 0.

Step  2(ii). Calculate the typicality degrees $t_{ik}^{(l)}$, where (a) if the distance from every sample to the prototype $\tilde{V}_i^{(l)}$ is positive, then $t_{ik}^{(l)}$ is computed from the typicality expression obtained above, for $i = 1, 2, \ldots, c$ and $k = 1, 2, \ldots, n$; (b) if there exists a sample $\tilde{A}_k$ such that $d(\tilde{A}_k, \tilde{V}_i^{(l)}) = 0$, then let $t_{ik}^{(l)} = 1$ and set the remaining typicalities for $\tilde{V}_i^{(l)}$ to 0.

Step  3. Calculate the updated prototypes $\tilde{V}_i^{(l+1)}$ ($i = 1, 2, \ldots, c$) using the prototype expression obtained above.

Step  4. If the change between successive iterations falls below the chosen threshold, then go to Step 5; otherwise, let $l = l + 1$ and return to Step 2.

Step  5. End

The pseudocode of the IVIFPCM algorithm is given in Algorithm 2.

Given an unlabeled dataset of IVIFSs $\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_n$, partition it into $c$ clusters such
that the objective function is minimized
(1) Input: Consider the seed values $\tilde{V}_i^{(0)}$, assume a termination threshold, and set the iteration counter to 0
(2) Output: Generate $c$ clusters using the IVIFPCM clustering algorithm for IVIFSs
(3) begin procedure
(4) repeat
(5)   calculate the membership degrees $u_{ik}$
(6)   begin
(7)   if the distance from $\tilde{A}_k$ to every current prototype is positive  then
(8)    compute $u_{ik}$ from the membership expression obtained above
(9)   end if
(10)  if $\tilde{A}_k$ coincides with some prototype (zero distance)  then
(11)   set that membership to 1 and the remaining memberships of $\tilde{A}_k$ to 0
(12)  end if
(13)  end
(14)  calculate the typicality degrees $t_{ik}$
(15)  begin
(16)  if the distance from every sample to the prototype $\tilde{V}_i$ is positive  then
(17)   compute $t_{ik}$ from the typicality expression obtained above
(18)  end if
(19)  if some sample coincides with the prototype $\tilde{V}_i$ (zero distance)  then
(20)   set that typicality to 1 and the remaining typicalities for $\tilde{V}_i$ to 0
(21)  end if
(22)  end
(23)  calculate the updated prototypes $\tilde{V}_i$ using the prototype expression obtained above
(24)  increment the iteration counter
(25) until the change between successive iterations falls below the threshold
(26) end procedure

5. Experimental Results

In this section, we present the results of experiments performed on both real world and simulated datasets [32] in order to demonstrate the effectiveness of the IFPCM clustering algorithm. The IFPCM algorithm is implemented in MATLAB. We first explain the steps of the algorithm using some experimental data, with the results evaluated through cluster validity measures. Next, the algorithm is applied to some classification datasets, that is, data with labeled patterns, in order to examine its clustering accuracy.

5.1. Application of IFPCM Algorithm on Experimental Data

The parameter settings of the IFPCM algorithm are shown in Table 1. It is to be noted that if the hesitation degrees are all zero, so that the IFSs reduce to ordinary fuzzy sets, then IFPCM reduces to the FPCM algorithm. Hence, we present a comparative performance of both algorithms.

The experimental data used here is an investment portfolio dataset, which contains information from ICICI Prudential Financial Services, India, regarding ten investments at the disposal of an investor who wishes to invest some money. Let $A_k$ ($k = 1, 2, \ldots, 10$) be the investments, described by six attributes, namely, $x_1$: investment price; $x_2$: advance mobilization; $x_3$: time period; $x_4$: return on investment; $x_5$: risk factor; and $x_6$: security factor. A weight vector is specified for these attributes. The characteristics of the ten investments under the six attributes are represented by IFSs in Table 2. A simulated dataset is used for comparison with the experimental data. We assume that there are three classes in the simulated dataset, $C_1$, $C_2$, and $C_3$. The number of IFSs in each class is taken as 300. The different classes contain different IFSs, which are characterized as follows: (a) IFSs in $C_1$ have relatively high and positive scores; (b) IFSs in $C_2$ have relatively high and uncertain scores; (c) IFSs in $C_3$ have relatively high and negative scores. Considering this, we generate the simulated dataset by sampling the membership and nonmembership degrees of each class from uniform distributions, where $U(a, b)$ denotes the uniform distribution on the interval $[a, b]$. We thus generate a simulated dataset which consists of 3 classes comprising 900 IFSs in total.
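The specific uniform intervals used for the three classes are not reproduced above, so the following Python sketch only mimics the shape of the simulated dataset (three classes of 300 IFSs each, six elements per IFS); the interval choices are placeholders invented for illustration and should be replaced by the intended ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_class(n, mu_range, nu_range, s=6):
    """Draw n IFSs over s elements with U(mu_range) memberships and U(nu_range) nonmemberships.

    The interval choices passed below are placeholders, not the paper's values; nonmemberships
    are clipped so that mu + nu <= 1 always holds.
    """
    mu = rng.uniform(*mu_range, size=(n, s))
    nu = rng.uniform(*nu_range, size=(n, s))
    nu = np.minimum(nu, 1.0 - mu)          # enforce the IFS constraint mu + nu <= 1
    return mu, nu

# Hypothetical intervals: "positive", "uncertain", and "negative" classes, 300 IFSs each.
c1 = simulate_class(300, (0.6, 0.9), (0.0, 0.1))
c2 = simulate_class(300, (0.3, 0.5), (0.3, 0.5))
c3 = simulate_class(300, (0.0, 0.1), (0.6, 0.9))
```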

5.1.1. Cluster Validity Measures

In the IFPCM algorithm a big challenge lies in setting the parameter $c$, that is, the number of clusters. To resolve this, we use two relative measures of fuzzy cluster validity mentioned in [40], namely, the partition coefficient (PC) and the classification entropy (CE). The descriptions of these two measures are given in Table 3. In PC and CE, $n$ is the number of samples in the dataset.
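For reference, the sketch below computes the two measures under their usual definitions, which we assume match Table 3: the partition coefficient is the mean squared membership (higher is better) and the classification entropy is the mean membership entropy (lower is better), both computed from the $c \times n$ membership matrix.

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/n) * sum over clusters i and samples k of u_ik^2."""
    return float((U ** 2).sum() / U.shape[1])

def classification_entropy(U, eps=1e-12):
    """CE = -(1/n) * sum over clusters i and samples k of u_ik * log(u_ik)."""
    return float(-(U * np.log(U + eps)).sum() / U.shape[1])
```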

5.1.2. IFPCM Algorithm on Investment Portfolio Dataset

The IFPCM algorithm is used to cluster the ten investments $A_k$ ($k = 1, 2, \ldots, 10$), involving the following steps.

Step  1. Set the required parameters and randomly select the initial centroids from the dataset:

Step  2(i). Calculate the membership degrees and centroids iteratively. According to (15), we have

Step  2(ii). Calculate the typicality degrees and centroids iteratively. According to (16), we have

Step  3. According to (17), we update the centroids as follows:

Step  4. We check whether to stop the iterations. Since the computed value exceeds the chosen threshold, we continue with the next iteration.

At the next iteration, the computed value again exceeds the chosen threshold, so we continue with a further iteration.

At the following iteration, the computed value is less than the threshold, so we stop the iterations and calculate the final values of $U$ and $T$. According to $U$ and $T$, the cluster validity measures PC and CE are calculated. If we further assign each investment $A_k$ to the cluster $C_i$ in which it has the largest membership degree, then we obtain the clusters shown in Table 4.

5.1.3. Convergence of IFPCM Algorithm

Now, we proceed to investigate the convergence of the IFPCM algorithm on the investment portfolio dataset. The evolution of the objective function value along the iterations is shown in Figure 1. As evident from Figure 1, the IFPCM algorithm decreases the objective function value continuously by iterating two phases, namely, updating the membership and typicality degrees in (15) and (16) and updating the prototypical IFSs in (17). The IFPCM algorithm has lower computational complexity compared to other clustering algorithms [1-3, 5-7, 9]. The space and time complexities of the IFPCM algorithm depend on $n$, the number of samples; the number of elements in each IFS; $c$, the number of clusters; and the maximum number of iterations preset for the optimal value search process. Some advantages of the IFPCM algorithm include simplicity and flexibility, the information it provides about the membership and typicality degrees of samples to all clusters, and relatively low computational complexity.

5.1.4. Comparative Performance of IFPCM and FPCM on Investment Portfolio Dataset

In this subsection, we present a comparative performance of the IFPCM and FPCM algorithms. We first run the IFPCM algorithm on the simulated dataset. Here, we set a series of values of $c$ in the range 2 to 10 and compute the PC and CE measures for each clustering result. The results are given in Table 5, where $J$ is the objective function value after convergence of the IFPCM algorithm. The optimal values of the cluster validity measures are highlighted.

As evident from Table 5, PC reaches its optimal value 0.9664 (maximum) at the same number of clusters at which CE reaches its optimal value 0.1866 (minimum); this implies that both PC and CE are capable of finding the optimal number of clusters. However, this is not the case for the objective function value. From Figure 2, as the number of clusters increases, $J$ decreases continuously and finally reaches 1.1969 at the largest value of $c$ considered. Hence, the usage of PC and CE is justified in the evaluation of the clustering results produced by the IFPCM algorithm. Next, we run the FPCM algorithm on the simulated dataset for comparison purposes. The results are given in Table 6. The optimal values of the cluster validity measures are highlighted. As indicated by the PC and CE values in Table 6, the FPCM algorithm prefers to cluster the modified simulated dataset into three clusters, which is actually away from the four true clusters in the data. In other words, the FPCM algorithm cannot identify all four classes precisely. This further signifies the importance of the uncertainty information in IFSs.

5.2. Examining Clustering Accuracy of IFPCM Algorithm

To assess the ability of the IFPCM algorithm to explore natural clusters in real world data, 11 classification and two clustering datasets with numerical attributes are chosen from the University of California, Irvine, Machine Learning Repository [41] and the Knowledge Extraction based on Evolutionary Learning Repository [42]. These datasets are summarized in Table 7.

The accuracy of the IFPCM algorithm is assessed by removing the class labels of the data before applying the algorithm. Each attribute value of all datasets is rescaled to the unit interval via a linear transformation. The clustering results of applying the IFPCM algorithm on the 11 classification datasets are shown in Table 8, where the results of the FPCM algorithm, as a benchmark fuzzy clustering method, are also provided. The threshold for the effectiveness measure is set to 0.1 for all the datasets, provided that at least two clusters are explored. For fairness of comparison between the IFPCM and FPCM algorithms, the number of clusters needed by FPCM as a parameter for each dataset is set to the number of clusters explored by IFPCM. In this table, the two super cells for each dataset are confusion matrices [43], which represent the clustering accuracy of the IFPCM and FPCM algorithms on that data. In a confusion matrix, a cell contains the number of patterns with a given class label which are grouped into a given cluster. Accordingly, the cells in each row (actual class label) sum to the number of patterns in that class. In addition, the summation of each column's cells represents the number of patterns in that cluster. Ideally, optimal clusters are achieved when the patterns of each class are covered by only one cluster and each cluster contains patterns of just one class. Such a case occurred for the first class of the Iris dataset in both clustering methods. As evident from the results in Table 8, the performance of IFPCM is better compared to the FPCM algorithm on all datasets.

Although the confusion matrices for the Vehicle and WDBC datasets show almost identical overall performance, the clustering accuracy of the IFPCM algorithm for the Iris, Thyroid, Cancer, Glass, and Sonar datasets is comparatively better. On the other hand, the FPCM algorithm obtains better performance for the Ecoli, Vowel, Wine, and Ionosphere datasets. The IFPCM algorithm explores potential clusters that are embedded in the datasets and needs only a distinguishing threshold for the effectiveness measure, while the number of clusters in the FPCM algorithm must be provided in advance. The clusters obtained by the FPCM algorithm convey no specific cognitive interpretation, while the clusters explored by the IFPCM algorithm are identified by an intuitionistic measure. This intuitionistic interpretability of the clusters justifies the claim that the IFPCM algorithm is more suitable for knowledge discovery in datasets. The IFPCM algorithm is also more robust to outliers and noise in the data. On the other hand, the computational cost of the IFPCM algorithm is higher than that of the FPCM algorithm, as given in Table 9. Although some of the datasets used in the experiments are high dimensional, they are not too large; the application of the IFPCM algorithm on large datasets consumes greater CPU time. Since the threshold for the effectiveness measure is set to 0.1 for all data in Table 8, the number of clusters explored for multiclass datasets is less than the number of their classes. Consequently, the IFPCM algorithm needs a lower threshold to explore more clusters. Table 10 illustrates the clustering results obtained for these datasets when the threshold is set to 0.01.

To compare the effectiveness of the IFPCM algorithm with other fuzzy clustering methods, some recently developed algorithms have been considered, and their results on some real world datasets are presented in Table 11. The performance of these methods is expressed in terms of the pureness ratio, which is the average pureness of the clusters after cluster labeling based on the maximum number of sample classes in each cluster. Along with the FPCM algorithm, the other clustering algorithms run on these datasets are FCM, PCM [8], α-cut FCM (AFCM) [18], entropy based fuzzy clustering (EFC) [19], fuzzy mixture of Student's t factor analyzers (FMSFA) [20], and fuzzy principal component analysis guided robust k-means (FPRk) [21]. The IFPCM algorithm maintains appreciable performance compared to the other methods in terms of pureness ratio, although this is not true for clustering accuracy. The specification of the threshold for the effectiveness measure in the IFPCM algorithm is more intuitionistic and less data dependent in nature.
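As a reading aid for Table 11, the sketch below computes a pureness ratio from a confusion matrix in the sense described above: each cluster is labeled by its majority class, and its pureness is the share of its patterns belonging to that class. Whether the average is taken over patterns or over clusters is not stated, so both variants are offered as assumptions.

```python
import numpy as np

def pureness_ratio(confusion, weighted=True):
    """Pureness from a confusion matrix with rows = true classes, columns = clusters.

    Each cluster is labeled by its majority class; its pureness is the fraction of its
    patterns in that class. weighted=True averages over patterns, weighted=False over
    clusters (which averaging the paper uses is an assumption on our part).
    """
    confusion = np.asarray(confusion, dtype=float)
    cluster_sizes = confusion.sum(axis=0)
    per_cluster = confusion.max(axis=0) / np.maximum(cluster_sizes, 1)
    if weighted:
        return float(confusion.max(axis=0).sum() / confusion.sum())
    return float(per_cluster.mean())

# Example with a hypothetical 2-class, 2-cluster confusion matrix.
print(pureness_ratio([[45, 5], [10, 40]]))   # 0.85 when averaged over patterns
```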

6. Conclusion

In this paper, we have proposed the IFPCM and IVIFPCM algorithms to cluster IFSs and IVIFSs, respectively. Both algorithms are developed by integrating the concepts of FPCM, IFSs, IVIFSs, and basic distance measures. In interval valued intuitionistic fuzzy environments, the clustering algorithm has membership and nonmembership degrees as intervals rather than exact numbers. The algorithms overcome the problems involved with the membership values of objects to each cluster by generalizing the degrees of membership of objects to each cluster; this is achieved by extending the membership and nonmembership degrees with the hesitancy degree. The algorithms also provide information about the membership and typicality degrees of samples to all clusters. Experiments on both real world and simulated datasets show that IFPCM has some notable advantages over FPCM. The IFPCM algorithm is simple and flexible. It generates valuable information and produces overlapped clusters in which instances have different membership degrees, in accordance with different real world applications. The algorithm has relatively low computational complexity. It also takes into account the inherent uncertainty in information captured by IFSs, which becomes crucial for the success of some clustering tasks. The evaluation of the algorithm is performed through cluster validity measures. The clustering accuracy of the algorithm is determined on classification datasets with labeled patterns. IFPCM maintains appreciable performance compared to other methods in terms of pureness ratio, although this is not true for clustering accuracy. For multiclass datasets there is a chance of exploring fewer clusters than classes; this is handled by decreasing the value of the threshold for the effectiveness measure. The specification of the threshold is more intuitionistic and less data dependent in nature. For an unknown dataset, IFPCM must compute the cluster accuracy measure for all potential clusters; a sudden drop in its values should be considered a stopping criterion, whereby the number of clusters that can be explored is determined. The different drawbacks of FPCM are effectively handled by the possibilistic fuzzy C means (PFCM) model proposed by Pal et al. in 2005. Our future work entails the development of an IFS framework for PFCM.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.