Abstract

Fuzzy C-means (FCM) is an important clustering algorithm with broad applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. In particular, the parameters of FCM influence the clustering results; however, many FCM variants do not address how these parameters should be set. In this study, we present a method for computing parameter values according to the role the parameters play in the clustering process. New weight parameters are assigned to membership and typicality to modify the objective function, on the basis of which the Lagrange equation is constructed and the iterative equations for membership, typicality, and cluster centers are derived. Finally, a new possibilistic fuzzy C-means algorithm based on weight parameters (WPFCM) is proposed. To test the efficiency of the algorithm, experiments on different datasets were conducted to compare WPFCM with FCM, possibilistic C-means (PCM), and possibilistic fuzzy C-means (PFCM). Experimental results show that WPFCM requires about 25% fewer iterations than FCM and about 65% fewer than PFCM on dataset X12, and that its resubstitution errors are about 19% lower than FCM, 74% lower than PCM, and 10% lower than PFCM on the IRIS dataset.

1. Introduction

Clustering is a method of unsupervised learning and has been applied in various fields, including data mining, pattern recognition, computer vision, and bioinformatics. Clustering methods can be summarized as partition-based [1, 2], hierarchy-based [3], density-based [4–6], and grid-based [7]. Partition methods include hard partition [8, 9] and soft partition [10–12]. Soft partition is represented by fuzzy membership, whose value lies in the interval [0, 1]. Many fuzzy clustering algorithms have been developed and widely used in a variety of areas [13–16], such as data mining and pattern recognition. Ruspini [17] regarded fuzzy C-means (FCM) as a clustering algorithm, and Dunn [18] analyzed the fuzzy exponent m and set its value to 2; Bezdek later generalized the algorithm to any fuzzy exponent m > 1. The probabilistic constraint of FCM may cause memberships that conflict with the intuitive degree of belonging; furthermore, it makes the clustering results sensitive to noise. To overcome this defect, Krishnapuram and Keller [19] relaxed the constraint and proposed a new algorithm named possibilistic C-means (PCM) [20], which reduces the influence of noise on clustering and has good robustness. However, PCM relies on the initialization condition and may produce coincident clusters [21]. Many algorithms were developed to overcome the coincidence problem. For example, studies [22, 23] modified the PCM objective function by adding an inverse function of the distances between cluster centers. The study [23] proposed a new model named fuzzy possibilistic C-means (FPCM), which introduced, for unlabeled data, both membership and typicality values $t_{ij}$ subject to the row-sum constraint $\sum_{j=1}^{n} t_{ij} = 1$. FPCM reduced the sensitivity to noise in FCM and resolved the coincidence problem in PCM; however, because of the row-sum constraint, the typicality values become very small as the dataset scale increases.

The study by Pal [24] proposed a new algorithm named possibilistic fuzzy C-means (PFCM), which is a hybridization of PCM and FCM and overcomes the problems of PCM, FCM, and FPCM, in particular the sensitivity to noise. PFCM has been widely applied in many fields [25–27] and has solved some problems well. PFCM added coefficients a and b for membership and possibility, which measure their relative importance in the computation of centroids; however, the values of a and b were simply fixed at 1, meaning that membership and possibility had the same importance during the computation of centroids. This setting makes the clustering results less evident in some cases, and PFCM gives no scientific and rational method to compute these parameters. The main objective of this study is to generalize the FCM, PCM, and PFCM algorithms and propose a new algorithm named weight possibilistic fuzzy C-means (WPFCM). We designed a new objective function on the basis of PFCM. Following the requirement of minimizing the objective function, the iterative functions of membership, typicality, and centroid were obtained by constructing the Lagrange function and setting its partial derivatives to zero.

This study is quite different from the literature [28]. First, this study focuses on clustering with possibilistic fuzzy C-means, while the study by Schneider [28] addressed the possibilistic C-means algorithm; the underlying algorithms are different. Second, and more importantly, the design of the weight parameter is different. The algorithm in this study can automatically allocate weight values to inlier and outlier samples according to the calculation method of the weight parameters, which maximizes the membership values of inliers and reduces the influence of outliers on estimation. The weight parameter satisfies the optimization of the objective function, makes it iterate faster, and avoids the coincidence problem. The method in the study by Schneider [28] does not have these advantages.

Experiments on different datasets show that the new algorithm not only makes clustering results more evident but also partitions overlapping data better, reduces the number of iterations, and speeds up convergence. The rest of this study is organized as follows: Section 2 reviews the FCM, PCM, FPCM, and PFCM clustering algorithms. Section 3 provides a new method for the computation of parameters and proposes WPFCM. Section 4 experimentally demonstrates the performance improvement of WPFCM on some UCI datasets. Section 5 offers the conclusion.

2. Related Algorithms

Since fuzzy set theory was introduced by Zadeh, it has been rapidly applied in clustering algorithms. FCM is one of the most famous such algorithms; it obtains clustering results by minimizing an objective function and iterating membership and centroid updates. The objective function of FCM is designed as follows:

$$J_{FCM}(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, \tag{1}$$

where the fuzzy exponent m is subject to m > 1 and the Euclidean distance is defined as $d_{ij} = \|x_j - v_i\|$. The membership can be obtained by minimizing objective function (1). The following equations are the iterative functions of membership and centroid:

$$u_{ij} = \left[\sum_{k=1}^{c} \left(\frac{d_{ij}}{d_{kj}}\right)^{2/(m-1)}\right]^{-1}, \qquad v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}. \tag{2}$$

The clustering performance is good; however, the algorithm is subject to the following three constraints: $u_{ij} \in [0, 1]$, $\sum_{i=1}^{c} u_{ij} = 1$ for every j, and $0 < \sum_{j=1}^{n} u_{ij} < n$ for every i, which make the algorithm sensitive to noise and often lead to center deviation caused by individual anomalous data points.
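For concreteness, the following is a minimal sketch of one FCM iteration per equations (1) and (2); it assumes NumPy, a data matrix X of shape (n, d), and a centroid matrix V of shape (c, d). The names are illustrative, not taken from the original paper.

```python
import numpy as np

def fcm_step(X, V, m=2.0):
    """One FCM iteration: update memberships, then centroids."""
    # Squared Euclidean distances d_ij^2 between centers V (c, d) and points X (n, d)
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # shape (c, n)
    d2 = np.fmax(d2, 1e-12)                                  # avoid division by zero
    # Equation (2): u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)); columns of U sum to 1
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=1)
    # Equation (2): v_i = sum_j u_ij^m x_j / sum_j u_ij^m
    W = U ** m
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, V_new
```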

The constraints of FCM require each data point to consider its relation to points in the current cluster as well as in the other clusters; therefore, the membership may conflict with the intuitive degree of belonging and does not directly reflect the real clustering results. The FCM algorithm is sensitive to noise and obtains poor clustering results in noisy data environments. Krishnapuram and Keller [19] improved FCM and proposed the possibilistic C-means algorithm, which relaxes the constraint. The objective function is designed as follows:

$$J_{PCM}(T, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} t_{ij}^{q} d_{ij}^{2} + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - t_{ij})^{q}, \tag{3}$$

where $\eta_i$ is the scaling parameter of the ith class, defined as $\eta_i = K \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} / \sum_{j=1}^{n} u_{ij}^{m}$ (commonly K = 1), the exponent q is subject to the constraint q > 1, and the Euclidean distance is defined as $d_{ij} = \|x_j - v_i\|$. The iterative functions of typicality and centroid are obtained by minimizing objective function (3). Equations (4) and (5) are the iterative functions:

$$t_{ij} = \frac{1}{1 + \left(d_{ij}^{2} / \eta_i\right)^{1/(q-1)}}, \tag{4}$$

$$v_i = \frac{\sum_{j=1}^{n} t_{ij}^{q} x_j}{\sum_{j=1}^{n} t_{ij}^{q}}. \tag{5}$$

In equation (3), $t_{ij}$ is not a membership but a possibility, and the clustering results are easy to interpret. PCM [29] relaxes the probabilistic constraint $\sum_{i=1}^{c} u_{ij} = 1$ and only requires $\max_i t_{ij} > 0$ for each j, so the rows and columns are independent and the data structure becomes loose. Therefore, the algorithm is insensitive to noise and can deal with datasets including outliers; on the other hand, it has another weakness: experiments show that PCM's clustering results depend on initialization and can generate coincident clusters. Pal [30] held that the clustering centroids are close to the data centers due to the effect of membership and proposed a new algorithm, FPCM, on the basis of FCM and PCM, which uses the data center as the clustering center. This is feasible to a great extent. Membership is a good means when data points need to be labeled clearly, because it is natural to assign a point to the cluster whose prototype is nearest to it, while possibility is important for estimating the cluster centers, as it effectively reduces the influence of anomalous data points. The objective function was designed as follows:

$$J_{FPCM}(U, T, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left(u_{ij}^{m} + t_{ij}^{q}\right) d_{ij}^{2}, \tag{6}$$

where the membership is subject to the constraint $\sum_{i=1}^{c} u_{ij} = 1$ (j = 1, …, n), the typicality is subject to the constraint $\sum_{j=1}^{n} t_{ij} = 1$ (i = 1, …, c), the other constraints are m > 1, q > 1, and 0 < $u_{ij}$, $t_{ij}$ < 1, and the Euclidean distance is defined as $d_{ij} = \|x_j - v_i\|$. The iterative functions of membership, typicality, and prototype can be obtained by minimizing the objective function. Equations (7)–(9) are the iterative functions, respectively:

$$u_{ij} = \left[\sum_{k=1}^{c} \left(\frac{d_{ij}}{d_{kj}}\right)^{2/(m-1)}\right]^{-1}, \tag{7}$$

$$t_{ij} = \left[\sum_{k=1}^{n} \left(\frac{d_{ij}}{d_{ik}}\right)^{2/(q-1)}\right]^{-1}, \tag{8}$$

$$v_i = \frac{\sum_{j=1}^{n} \left(u_{ij}^{m} + t_{ij}^{q}\right) x_j}{\sum_{j=1}^{n} \left(u_{ij}^{m} + t_{ij}^{q}\right)}. \tag{9}$$

Although FPCM overcomes the weaknesses of PCM and FCM, the typicality values become very small as the number of samples increases: on a large dataset, the typicality values are inconsistent with the real values because of the row-sum constraint. Pal [24] improved FPCM by relaxing the row-sum constraint on typicality while retaining the column-sum constraint on membership, and proposed a new algorithm named possibilistic fuzzy C-means (PFCM). The objective function is designed as follows:

$$J_{PFCM}(U, T, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left(a\, u_{ij}^{m} + b\, t_{ij}^{q}\right) d_{ij}^{2} + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - t_{ij})^{q}, \tag{10}$$

where the parameters are subject to the constraints m > 1, q > 1, η_i > 0, 0 < $u_{ij}$, $t_{ij}$ < 1, 1 ≤ i ≤ c, and 1 ≤ j ≤ n, the Euclidean distance is defined as $d_{ij} = \|x_j - v_i\|$, and the parameters a and b are constants. The iterative functions of membership, typicality, and prototype can be obtained by minimizing the objective function. Equations (11)–(13) are the iterative functions:

$$u_{ij} = \left[\sum_{k=1}^{c} \left(\frac{d_{ij}}{d_{kj}}\right)^{2/(m-1)}\right]^{-1}, \tag{11}$$

$$t_{ij} = \frac{1}{1 + \left(b\, d_{ij}^{2} / \eta_i\right)^{1/(q-1)}}, \tag{12}$$

$$v_i = \frac{\sum_{j=1}^{n} \left(a\, u_{ij}^{m} + b\, t_{ij}^{q}\right) x_j}{\sum_{j=1}^{n} \left(a\, u_{ij}^{m} + b\, t_{ij}^{q}\right)}, \tag{13}$$

where $\eta_i$ is defined as $\eta_i = K \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} / \sum_{j=1}^{n} u_{ij}^{m}$, and usually K is a constant (K = 1).
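The PFCM update step can be sketched in the same style as the FCM snippet above; the following follows equations (11)–(13) under the same NumPy assumptions, with eta passed in as a precomputed vector of the η_i values.

```python
import numpy as np

def pfcm_step(X, V, eta, a=1.0, b=1.0, m=2.0, q=2.0):
    """One PFCM iteration per equations (11)-(13)."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # (c, n)
    d2 = np.fmax(d2, 1e-12)
    # Equation (11): FCM-style membership (column sums equal 1)
    U = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
    # Equation (12): typicality, with the relaxed row-sum constraint
    T = 1.0 / (1.0 + (b * d2 / eta[:, None]) ** (1.0 / (q - 1.0)))
    # Equation (13): centroids weighted by a*u^m + b*t^q
    W = a * U ** m + b * T ** q
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, T, V_new
```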

3. WPFCM Algorithm

This section consists of three parts. The motivation for the weight parameters is introduced first; the calculation method of the weight parameters is presented in the second part; and the last part gives the objective function and the steps of the algorithm.

3.1. Motivation for Weight Parameters

PFCM integrates the merits of PCM and FCM by including both membership and typicality. PFCM reduces the sensitivity to noise in FCM, overcomes the coincidence problem in PCM, and avoids the problem in FPCM that the typicality values become very small as the number of data points increases. After analyzing the parameters a and b, we found that their values influence membership and typicality and thus affect the clustering results. If a is greater than b, the prototype is affected more by membership than by typicality; conversely, if b is greater than a, the prototype is affected more by typicality than by membership. Therefore, if we want to reduce the influence of outliers on the clustering results, we should choose a lower than b. Determining suitable parameter values is difficult. Usually, both are fixed at 1, which means that membership and typicality have the same importance for the clustering results; in that case, introducing the two parameters a and b becomes meaningless. In many situations, we do not know which values fit the parameters, and determining a and b then depends on experience. Assigning the values of a and b lacks a mathematical basis, so the choice in PFCM is arbitrary and unscientific, and the clustering results become unstable. Another weakness of PFCM is that all data vectors share the same parameter values in the clustering process, although different vectors have different importance for clustering; it is therefore unreasonable for a and b to be fixed at 1. To overcome these weaknesses, we propose a new method to compute the values of weight parameters that replace a and b in PFCM. The new parameters account for the importance of each sample in the clustering process, so the calculation method is more reasonable. The importance of these parameters lies in the fact that the values of a and b directly affect the typicality value $t_{ij}$ and the centroid value $v_i$, affect the membership indirectly, and thereby influence the clustering results.

3.2. Calculation Method of Weight Parameters

Several studies have proposed methods for calculating weight parameters [31–33]. The study by Fan et al. [32] assigned weights to properties according to the importance of each property to the clustering process. For example, in the IRIS dataset [34], the third and fourth properties are beneficial for obtaining evident clustering results, so they are assigned a high weight value and the others a low one. The premise is that we must know in advance which properties are important and which are not; for an unknown dataset, this method cannot be applied. The study by Nock and Nielsen [33] estimated the probability density of all samples by using the analogy method, which requires a great deal of computation. The study by Hung [29] gave a prototype-driven learning of the parameter $\eta_i$, based on the exponential separation strength between clusters and updated at each iteration, to improve the performance of FCM. Equation (14) is the definition of the parameter $\eta_i$:

$$\eta_i = \exp\left(-\min_{k \ne i} \frac{\|v_k - v_i\|^{2}}{\beta}\right), \tag{14}$$

where the normalization parameter $\beta$ is defined through the distances from the data points $x_j$ to the sample mean:

$$\beta = \frac{1}{n} \sum_{j=1}^{n} \|x_j - \bar{x}\|^{2}, \tag{15}$$

and $\bar{x}$ can be defined as the sample mean:

$$\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j. \tag{16}$$
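A short sketch of these quantities, under the reconstruction of equations (14)–(16) given above (the exact published form may differ slightly); all names are illustrative.

```python
import numpy as np

def separation_eta(X, V):
    """eta_i per the reconstructed equations (14)-(16): exponential separation strength."""
    x_bar = X.mean(axis=0)                                   # equation (16): sample mean
    beta = ((X - x_bar) ** 2).sum(axis=1).mean()             # equation (15)
    # equation (14): eta_i = exp(-min_{k != i} ||v_k - v_i||^2 / beta)
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)                            # exclude k == i
    return np.exp(-sep.min(axis=1) / beta)
```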

Definition 1. A given sample set to be classified is denoted by X (X = {x1, x2, …, xn} ⊂ F(X)); X is partitioned into c (0 < c < n) fuzzy subsets, where c is the number of clusters.

Definition 2. According to the importance of the data point xj (xj ∈ X) during the clustering process, the weight parameter can be defined as γij, the weight of xj with respect to class i. The following equation is the calculation method:

$$\gamma_{ij} = 1 - \exp\left(-\frac{c\, d^{2}(x_j, v_i)}{\beta}\right), \tag{17}$$

where $\beta$ comes from equation (15).

Theorem 1. The distance from xj to the center can be regarded as a weight: if the distance is long, the value of the weight is high; conversely, if the distance is short, the value of the weight is low.

Proof. $\bar{x}$ is the sample mean, and the difference between xj and $\bar{x}$ is reflected by the distance from xj to $\bar{x}$, which is constant for a given point. The smaller the value of $d^{2}(x_j, v_i)$, the shorter the distance from xj to class i. We can deduce that the larger the value of $\exp(-c\, d^{2}(x_j, v_i)/\beta)$, the smaller the value of γij; conversely, the longer the distance from xj to class i, the larger the value of γij. Optimization of the objective function requires a minimum value, and the weight parameter should satisfy this optimization objective, so a long distance should receive a large γij. It is therefore appropriate to use γij as the weight parameter.
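The following sketch computes γij under the exponential form reconstructed above for equation (17); the formula and names are assumptions consistent with Theorem 1, not a verbatim transcription of the original equation.

```python
import numpy as np

def weights(X, V, beta):
    """Typicality weights gamma_ij = 1 - exp(-c * d_ij^2 / beta), in (0, 1)."""
    c = V.shape[0]
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (c, n) squared distances
    gamma = 1.0 - np.exp(-c * d2 / beta)                     # grows with distance
    return gamma                                             # membership weight is 1 - gamma
```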

3.3. Design of the Objective Function

According to the rule of classification, there should be little difference among data in the same class and great difference between different classes. When designing the objective function, in order to assign a data point, the nearest distance from the data point to a center should be selected, which corresponds to maximizing the membership value; the typicality value can be used to reduce the influence of outliers on estimation. The new objective function should therefore meet two requirements: on the one hand, the role of membership in the objective function should be increased when a sample is an inlier; on the other hand, the role of the typicality value should be increased when a sample is an outlier. Accordingly, the objective function is designed as in the following equation, which includes two parts: the first part is the fuzzy term weighted by the membership weight parameter, and the second part is the typicality term weighted by the typicality weight parameter.

Definition 3. The new objective function, based on FCM, PCM, and PFCM, is designed as follows:

$$J_{WPFCM}(U, T, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left[(1 - \gamma_{ij})\, u_{ij}^{m} + \gamma_{ij}\, t_{ij}^{q}\right] d_{ij}^{2} + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - t_{ij})^{q}, \tag{18}$$

where γij (0 < γij < 1) denotes the weight between data point xj and class i, which comes from equation (17). Different data points have different weight values, so the clustering results are more reasonable and the coincidence problem is avoided. U, T, and V denote the membership matrix (c×n), the typicality matrix (c×n), and the centroid matrix (c×1), respectively. Here, uij (0 < uij < 1) is the membership of feature point xj in cluster ci and tij (0 < tij < 1) is the typicality of xj in cluster ci. $d_{ij} = \|x_j - v_i\|$ is the Euclidean distance between data point xj and centroid $v_i$. The parameters m (m > 1) and q (q > 1) are the fuzzy exponents. The parameter ηi (ηi > 0) is a constant, defined by $\eta_i = K \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} / \sum_{j=1}^{n} u_{ij}^{m}$, where K is usually fixed at 1.
According to the preceding analysis, the nearer the distance between data point xj and cluster ci, the smaller the value of the weight parameter γij. A short distance from xj to cluster ci indicates that xj belongs to the ith cluster, so the weight of the membership term should be increased and is set to (1−γij). On the contrary, the farther the distance between xj and the ith cluster, the greater the difference between them, and xj may be an anomalous point; the typicality weight should then be increased to reduce the effect of xj on clustering, and it is set to γij. As the membership weight increases (decreases), the typicality weight decreases (increases). The weight parameter is calculated per sample data point, which overcomes the unreasonable fixed values of a and b in PFCM and resolves the coincidence problem caused by a small value of a and poor initialization of the centroids.
According to Definition 3, the Lagrangian multiplier method is used to construct the Lagrange equation. To minimize equation (18), the partial derivatives with respect to uij and tij are computed under the constraints $\sum_{i=1}^{c} u_{ij} = 1$ and 0 < tij < 1, giving the membership uij, the typicality tij, and the centroid $v_i$ as follows:

$$u_{ij} = \left[\sum_{k=1}^{c} \left(\frac{(1 - \gamma_{ij})\, d_{ij}^{2}}{(1 - \gamma_{kj})\, d_{kj}^{2}}\right)^{1/(m-1)}\right]^{-1}, \tag{19}$$

$$t_{ij} = \frac{1}{1 + \left(\gamma_{ij}\, d_{ij}^{2} / \eta_i\right)^{1/(q-1)}}, \tag{20}$$

$$v_i = \frac{\sum_{j=1}^{n} \left[(1 - \gamma_{ij})\, u_{ij}^{m} + \gamma_{ij}\, t_{ij}^{q}\right] x_j}{\sum_{j=1}^{n} \left[(1 - \gamma_{ij})\, u_{ij}^{m} + \gamma_{ij}\, t_{ij}^{q}\right]}. \tag{21}$$
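A sketch of one WPFCM iteration implementing the update rules (19)–(21) as reconstructed above, with γij held fixed within the iteration (consistent with treating the weights as constants when differentiating); it assumes NumPy and the same illustrative shapes as the earlier snippets.

```python
import numpy as np

def wpfcm_step(X, V, gamma, eta, m=2.0, q=2.0):
    """One WPFCM iteration per the reconstructed equations (19)-(21)."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (c, n)
    d2 = np.fmax(d2, 1e-12)
    # Equation (19): membership with per-pair weights (1 - gamma_ij)
    num = (1.0 - gamma) * d2
    U = 1.0 / ((num[:, None, :] / num[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
    # Equation (20): typicality damped by gamma_ij
    T = 1.0 / (1.0 + (gamma * d2 / eta[:, None]) ** (1.0 / (q - 1.0)))
    # Equation (21): centroids weighted by (1-gamma)*u^m + gamma*t^q
    W = (1.0 - gamma) * U ** m + gamma * T ** q
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, T, V_new
```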

3.4. WPFCM Algorithm

According to the objective function, the steps of the algorithm are provided in Algorithm 1.

4. Experiments

In order to validate the efficiency of the algorithm, experiments on different datasets were carried out. The initial values of the parameters are set as follows: ε = 0.000001, maximum number of iterations max_iter = 100, constant K = 1, number of clusters Cluster_n = 2 for dataset X12, and Cluster_n = 3 for dataset IRIS [34].

Experiment 1. Dataset: X12 [35]; algorithms: FCM, PCM, PFCM, and WPFCM; initialization:
X12 is a two-dimensional dataset with 12 data points. The coordinates of X12 are given in Table 1, and Figure 1 shows their distribution. Ten points form two clusters of five points each on the left and right sides of the y axis. Data points x6 and x12 are considered noise, and each has the same distance to the two clusters.
Table 2 presents the centroids generated by running FCM, PFCM, and WPFCM on X12. Define the distance DistX = ||VX12 − VX||2, which denotes the distance from the real centroid VX12 to the centroid VX generated by algorithm X. Computing this distance for each algorithm gives DistFCM = 0.4212, DistPFCM = 0.3860, and DistWPFCM = 0.1537. Comparing the three distances, although each algorithm obtains a good result, DistWPFCM is the minimum; that is to say, VWPFCM is nearer to the real centroid than the other VX and reflects the real cluster centers better.
Table 3 provides the minimum iteration counts of FCM, PFCM, and WPFCM with the optimal given parameters. The iteration count of WPFCM is slightly less than that of FCM and far less than that of PFCM. Therefore, WPFCM has less running time on large datasets and a high convergence speed.
Table 4 presents the membership values obtained by running FCM, PFCM, and WPFCM. By comparison, the membership values of WPFCM are better than those of the other two algorithms; in particular, for data points x3 and x9, the membership values are equal to one. Data points x3 and x9 are the centers of the two clusters, which shows that WPFCM recognizes cluster centers more easily. Membership values cannot distinguish the noisy data points x6 and x12, but the noisy data are identified by the typicality values in Table 5. Analyzing the data in Table 5, the typicality values of WPFCM are greater than those of PFCM; a data point with a larger typicality value is more likely to belong to the cluster. The typicality values of x3 and x9 reach 1 in Table 5, which shows that each of these points belongs to its respective cluster with high possibility.
The membership values of the noisy points x6 and x12 are equal to 0.5. Figure 1 shows that the distance from x6 to the two cluster centers is far smaller than that from x12, but Table 4 cannot show this difference. Table 5 shows that the typicality values of the ten normal data points are greater than 0.9, while those of x6 and x12 are far smaller, so we consider x6 and x12 to be noise. We also find in Table 5 that the typicality value of x12 is far smaller than that of x6, which shows that the noisy point x12 belongs to the two clusters with less possibility than x6 and reflects the distribution of x6 and x12 in Figure 1. WPFCM thus improves on the defect of FCM. From Table 5, we also find that the typicality values of WPFCM are better than those of PFCM, so WPFCM obtains more evident clustering results.
Table 6 presents the centroids and iteration counts obtained by running WPFCM on dataset X12 with different parameters. The clustering results of WPFCM are better than those of FCM and PFCM as a whole, and the iteration counts are somewhat smaller. When m is kept unchanged and q varies from 2 to 5, the membership values show a slight increasing tendency, while the typicality values decrease evidently. The cluster centers also change considerably, increasing and moving nearer to the real centroid, and the iteration counts decrease. As q increases, the influence of the weight parameter γij on the clustering results grows. The weight parameter γij is generated in the iterative procedure and the initial centroids are generated randomly, so WPFCM overcomes the defect of randomly selecting a and b and reduces the uncertainty of the clustering results. The clustering results in Table 7 are better than those in Table 6; however, the membership values decrease greatly and the iteration counts increase. Comprehensive consideration suggests setting m and q to 1.5 and 5, respectively.

Experiment 2. Dataset: IRIS; algorithms: FCM, PCM, PFCM, and WPFCM.
IRIS is a four-dimensional dataset including three classes: setosa, versicolor, and virginica. Each class has 50 data points, adding up to 150 data points. The first class, setosa, is well separated from the other two classes without overlapping; there is some overlap between versicolor and virginica.
The data in Tables 8 and 9 were acquired by running FCM, PCM, PFCM, and WPFCM many times on IRIS. Each algorithm obtained good clustering centroids. Compared with the other algorithms, WPFCM acquired more evident membership and typicality values and better separation. The two centroids of versicolor and virginica obtained by PCM almost overlap. It is difficult to see the separation between clusters directly in Tables 8 and 9, so in order to compare the separation between different classes, we define the distance between classes as Distij = ||Vi − Vj||2, which denotes the distance from the ith cluster to the jth cluster.
Table 10 provides the distance values between the centroids generated by FCM, PCM, PFCM, and WPFCM on IRIS. The values of Dist12 and Dist13 computed with Distij = ||Vi − Vj||2 for WPFCM, FCM, and PFCM reflect the fact that setosa is separated from the other two classes, versicolor and virginica. In PCM, however, Dist12 and Dist13 are almost identical and Dist23 is nearly zero, so the results do not reflect the features of the dataset; this is caused by the coincident clusters of PCM. Although FCM, PFCM, and WPFCM all reflect the separation of setosa from the other two classes and the overlap between versicolor and virginica, comparing Dist23 shows that the value in WPFCM is the nearest to the real one. We conclude that WPFCM reflects the characteristics of the dataset better than the other algorithms and more easily obtains a good partition, especially for the classes versicolor and virginica.
Table 11 provides the distances between the centroids generated by FCM, PCM, PFCM, and WPFCM and the real centroids. Formula (24) defines the sum of the distances between the centroids acquired by each algorithm and the real centroids:

$$Dist_{X} = \sum_{i=1}^{c} Dist_{xi}, \tag{24}$$

where Distxi represents the distance from the ith cluster center to the real centroid. Each Distxi of WPFCM is less than that of the other algorithms. Comparing the values of DistX, we obtain the relation DistWPFCM < DistPFCM < DistFCM < DistPCM in Table 11, which shows that there is little difference between the centroids of WPFCM and the real centroids.
The iteration counts of FCM, PCM, PFCM, and WPFCM on IRIS are given in Table 12. The iteration count of WPFCM is slightly larger than that of FCM but far less than those of PCM and PFCM. The WPFCM algorithm thus acquires the cluster centers quickly and converges fast.
The numbers of resubstitution errors of FCM, PCM, PFCM, and WPFCM on the dataset are given in Table 13. The resubstitution errors of WPFCM are slightly less than those of FCM and PFCM but far less than those of PCM, with regard to both membership and typicality values. Table 13 shows two relations: UeWPFCM < UePFCM < UeFCM < UePCM and TeWPFCM < TePFCM < TeFCM < TePCM. The resubstitution errors of membership and typicality reach 50 in PCM, far greater than in the other algorithms; the reason is that PCM suffers from the coincident clusters problem and there are overlapping data between versicolor and virginica.

5. Conclusions

A new possibilistic fuzzy C-means algorithm based on weight parameters was proposed according to the importance of membership and typicality in the clustering process. First, aiming at the unreasonable parameters a and b, we designed the weight parameter γij based on the literature [23] and provided a concrete calculation method. The weight parameters (1−γij) and γij were assigned to membership and typicality, respectively, the objective function (equation (18)) was improved, and the new algorithm (Algorithm 1) was provided. Experiments on different datasets show that the new algorithm performs well in dealing with noisy data and obtains better clustering results; WPFCM resolves the coincidence problem and overcomes the sensitivity to noisy data. We also examined the influence of different values of the exponent parameters m and q on the membership values, typicality values, and centroids, and determined the exponents by comprehensively comparing these quantities. The experiments also compare the iteration counts of the different algorithms: WPFCM requires fewer iterations and converges fast. The resubstitution errors of WPFCM are close to those of FCM and PFCM but far less than those of PCM. Taken together, the performance indexes suggest that WPFCM overcomes the noise sensitivity of FCM, the coincidence problem of PCM, and the unreasonable weight parameters of PFCM. Future work is to extend the new algorithm to nonpoint prototype clustering models such as the spherical, quadric, and shell prototypes.

(1) Initializing the parameters m (m > 1), q (q > 1), ε, and c (0 < c < n), setting the maximum cycle number max_iter, setting the initial cycle number to 1, and randomly generating the centroids V0
(2) Computing the distances according to $d_{ij} = \|x_j - v_i\|$
(3) Computing the weight parameters γij and (1−γij) by using equation (17)
(4) Computing the membership values uij and typicality values tij by using equations (19) and (20)
(5) Computing the objective function obj_fcn by using equation (18)
(6) If |obj_fcn(i) − obj_fcn(i−1)| < ε or the number of iterations reaches max_iter, then stop
 Else obj_fcn(i) ⟶ obj_fcn(i−1)
(7) Computing the centroids by using equation (21) and going to step 2
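Assembling the pieces, the loop below sketches Algorithm 1 end to end; it reuses the helper functions sketched earlier (separation_eta, weights, wpfcm_step) and is an illustrative assembly under the reconstructed equations, not the authors' reference implementation.

```python
import numpy as np

def wpfcm(X, c, m=1.5, q=5.0, eps=1e-6, max_iter=100, seed=0):
    """Run the sketched WPFCM loop on data X (n, d) with c clusters."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]          # step 1: random centroids
    beta = ((X - X.mean(axis=0)) ** 2).sum(axis=1).mean()     # equation (15)
    prev_obj = np.inf
    for _ in range(max_iter):                                 # step 6 bounds the loop
        gamma = weights(X, V, beta)                           # steps 2-3: distances and gamma_ij
        eta = separation_eta(X, V)                            # eta_i per equation (14)
        U, T, V = wpfcm_step(X, V, gamma, eta, m=m, q=q)      # steps 4 and 7
        d2 = np.fmax(((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2), 1e-12)
        obj = (((1 - gamma) * U**m + gamma * T**q) * d2).sum() \
              + (eta[:, None] * (1 - T) ** q).sum()           # step 5: objective (18)
        if abs(obj - prev_obj) < eps:                         # step 6: convergence test
            break
        prev_obj = obj
    return U, T, V
```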

Data Availability

The data used to support the findings of this study are available at http://archive.ics.uci.edu/ml/datasets/iris.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by “Petrel Program of Lianyungang Jiangsu Province, China” (KK18088), and “the Program of Science and Technology Associate Chief Engineer of Jiangsu Province of China” (FZ20200458).