Abstract
Fuzzy C-means (FCM) is an important clustering algorithm with broad applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. In particular, the parameters in FCM influence the clustering results; however, many FCM variants do not address the problem of how to set these parameters. In this study, we present a method for computing parameter values according to the role each parameter plays in the clustering process. New parameters are assigned to membership and typicality so as to modify the objective function; on this basis, the Lagrange equation is constructed and the iterative equations for membership, typicality, and centers are derived. Finally, a new possibilistic fuzzy C-means algorithm based on weight parameters (WPFCM) is proposed. To test its efficiency, experiments on different datasets compare WPFCM with FCM, possibilistic C-means (PCM), and possibilistic fuzzy C-means (PFCM). Experimental results show that WPFCM requires about 25% fewer iterations than FCM and about 65% fewer than PFCM on dataset X_{12}. Resubstitution errors of WPFCM are about 19% lower than FCM, about 74% lower than PCM, and about 10% lower than PFCM on the IRIS dataset.
1. Introduction
Clustering is a method of unsupervised learning and has been applied in various fields, including data mining, pattern recognition, computer vision, and bioinformatics. Clustering methods may be summarized as partition-based [1, 2], hierarchy-based [3], density-based [4–6], and grid-based [7]. Partition methods include hard partition [8, 9] and soft partition [10–12]. Soft partition is represented by fuzzy membership, with membership values lying in the interval [0, 1]. Many fuzzy clustering algorithms have been developed and widely used in a variety of areas [13–16], such as data mining and pattern recognition. Ruspini [17] regarded fuzzy C-means (FCM) as a clustering algorithm, and Dunn [18] analyzed the fuzzy exponent m and fixed its value at 2. Bezdek generalized the fuzzy exponent to any m > 1. The probabilistic constraint of FCM may cause membership to conflict with the intuitive degree of belonging; furthermore, it makes the clustering results sensitive to noise. To overcome this defect, Krishnapuram and Keller [19] relaxed the constraint and proposed a new algorithm named possibilistic C-means (PCM) [20], which reduced the influence of noise on clustering and showed good robustness. However, PCM relies on the initialization condition and may produce coincident clusters [21]. Many algorithms have been developed to overcome the coincidence problem. For example, studies [22, 23] modified the PCM objective function by adding an inverse function of the distances between cluster centers. The study [23] proposed a new model named fuzzy possibilistic C-means (FPCM), which introduced membership together with a typicality value t_{ij} subject to the row sum constraint Σ_{j=1}^{n} t_{ij} = 1 for unlabeled data. FPCM reduced the sensitivity to noise of FCM and resolved the coincidence problem of PCM; however, the typicality values become very small as the dataset scale increases because of the row sum constraint.
The study by Pal [24] proposed a new algorithm named possibilistic fuzzy C-means (PFCM), which is a hybridization of PCM and FCM and overcomes the problems of PCM, FCM, and FPCM, including noise sensitivity. PFCM has therefore been widely applied in many fields [25–27]. PFCM adds coefficients a and b for membership and possibility, which measure their relative importance in the computation of centroids; however, the values of a and b are simply fixed at 1, which means that membership and possibility are given the same importance when computing centroids. This setting makes the clustering results less distinct in some cases, and PFCM does not give a principled method to compute these parameters. The main objective of this study is to generalize the FCM, PCM, and PFCM algorithms and propose a new algorithm named weighted possibilistic fuzzy C-means (WPFCM). We design a new objective function on the basis of PFCM. To minimize this objective function, the iterative functions of membership, typicality, and centroid are obtained by constructing the Lagrange function and setting its partial derivatives to zero.
This study differs substantially from the literature [28]. First, this study focuses on clustering with possibilistic fuzzy C-means, while the study by Schneider [28] addressed the possibilistic C-means algorithm; the algorithms studied are different. Second, and more importantly, the design of the weight parameter is different. The algorithm in this study allocates weight values to inlier and outlier samples automatically according to the calculation method of the weight parameters, which maximizes the membership values of inliers and reduces the influence of outliers on estimation. The weight parameter satisfies the optimization of the objective function, makes it iterate faster, and avoids the coincidence problem. The method in the study by Schneider [28] does not have these advantages.
Experiments on different datasets show that the new algorithm not only makes clustering results more distinct but also partitions overlapping data better, reduces the number of iterations, and speeds up convergence. The rest of this study is organized as follows: Section 2 reviews the FCM, PCM, FPCM, and PFCM clustering algorithms. Section 3 presents a new method for computing the parameters and proposes WPFCM. Section 4 experimentally demonstrates the performance improvement of WPFCM on several UCI datasets. Section 5 concludes.
2. Related Works
Since fuzzy set theory was introduced by Zadeh, it has been rapidly applied to clustering. FCM is one of the most famous algorithms; it obtains clustering results by minimizing an objective function and iterating over membership and centroid values. The objective function of FCM is designed as follows:

J_{FCM}(U, V) = Σ_{i=1}^{c} Σ_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, (1)

where the fuzzy exponent m is subject to m > 1 and the Euclidean distance is defined as d_{ij} = ||x_{j} − v_{i}||. Membership can be obtained by minimizing objective function (1). The following equations are the iterative functions of membership and centroid:

u_{ij} = 1 / Σ_{k=1}^{c} (d_{ij}/d_{kj})^{2/(m−1)}, v_{i} = Σ_{j=1}^{n} u_{ij}^{m} x_{j} / Σ_{j=1}^{n} u_{ij}^{m}. (2)
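As a concrete illustration, one FCM iteration can be sketched in a few lines of Python. This is a minimal 1-D sketch under our own naming (the function `fcm_updates` and the toy data are not from the paper), using the membership and centroid update equations above:

```python
def fcm_updates(X, V, m=2.0):
    """One FCM iteration: update memberships u_ij from the current
    centroids V, then recompute centroids from the memberships.
    X: list of points, V: list of c centroids (floats here, a 1-D
    sketch). Returns (U, V_new)."""
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    for j, x in enumerate(X):
        d = [abs(x - v) for v in V]  # Euclidean distance in 1-D
        for i in range(c):
            if d[i] == 0.0:  # point coincides with a centroid: crisp column
                for k in range(c):
                    U[k][j] = 1.0 if k == i else 0.0
                break
        else:
            # u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
            for i in range(c):
                U[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
    # v_i = sum_j u_ij^m x_j / sum_j u_ij^m
    V_new = [sum(U[i][j] ** m * X[j] for j in range(n)) /
             sum(U[i][j] ** m for j in range(n)) for i in range(c)]
    return U, V_new
```

Each membership column sums to 1 by construction, which is exactly the probabilistic constraint that makes FCM sensitive to noise.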
The clustering performance of FCM is good; however, the algorithm is subject to the following three constraints: Σ_{i=1}^{c} u_{ij} = 1 for every j, u_{ij} ∈ [0, 1], and 0 < Σ_{j=1}^{n} u_{ij} < n, which make the algorithm sensitive to noise and often lead to center deviation caused by individual anomalous data points.
The constraints of FCM require each data point to account for its relation to points in the current cluster and in other clusters; therefore, membership may conflict with the intuitive degree of belonging and does not directly reflect the real clustering results. The FCM algorithm is sensitive to noise and obtains poor clustering results in noisy data environments. Krishnapuram and Keller [19] improved FCM and proposed the possibilistic C-means algorithm, which relaxes the membership constraint. The objective function is designed as follows:

J_{PCM}(T, V) = Σ_{i=1}^{c} Σ_{j=1}^{n} t_{ij}^{q} d_{ij}^{2} + Σ_{i=1}^{c} η_{i} Σ_{j=1}^{n} (1 − t_{ij})^{q}, (3)

where η_{i} is the scaling parameter of the i^{th} class, defined as η_{i} = K Σ_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} / Σ_{j=1}^{n} u_{ij}^{m} (commonly K = 1), the exponent q is subject to the constraint q > 1, and the Euclidean distance is defined as d_{ij} = ||x_{j} − v_{i}||. The iterative functions of typicality and centroid are obtained by minimizing objective function (3). Equations (4) and (5) are the iterative functions:

t_{ij} = 1 / (1 + (d_{ij}^{2}/η_{i})^{1/(q−1)}), (4)

v_{i} = Σ_{j=1}^{n} t_{ij}^{q} x_{j} / Σ_{j=1}^{n} t_{ij}^{q}. (5)
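The two PCM quantities above can be sketched directly in Python. This is a 1-D illustration with function names of our own choosing; eta is typically bootstrapped from a prior FCM run, as the definition of η_{i} suggests:

```python
def pcm_eta(u_row, d2_row, m=2.0, K=1.0):
    """Scale parameter eta_i = K * sum_j u_ij^m d_ij^2 / sum_j u_ij^m,
    computed from memberships of a prior FCM run (u_row: memberships
    for cluster i, d2_row: squared distances to centroid v_i)."""
    num = sum(u ** m * d2 for u, d2 in zip(u_row, d2_row))
    den = sum(u ** m for u in u_row)
    return K * num / den

def pcm_typicality(d2, eta, q=2.0):
    """Typicality update t_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(q-1))).
    Each t_ij depends only on its own distance, so columns need not
    sum to 1; this is what makes PCM robust to noise but prone to
    coincident clusters."""
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (q - 1.0)))
```

Note that t_{ij} = 0.5 exactly when the squared distance equals η_{i}, so η_{i} acts as the "bandwidth" of each cluster.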
In equation (3), t_{ij} is not a membership but a possibility, and the clustering results are easy to interpret. PCM [29] relaxes the column sum constraint Σ_{i=1}^{c} u_{ij} = 1 and requires only 0 ≤ t_{ij} ≤ 1, so the rows and columns are independent and the data structure becomes loose. The algorithm is therefore insensitive to noise and can deal with datasets including outliers; on the other hand, there is a weakness: experiments show that PCM's clustering results depend on initialization and can generate the coincidence problem. Pal [30] held that clustering centroids are attracted toward data centers by the effect of membership, and proposed a new algorithm, FPCM, on the basis of FCM and PCM, which uses the data center as the clustering center. This is feasible to a great extent: membership is a good device when data points need to be labeled clearly, because it is natural to assign a point to the cluster whose prototype is nearest to it, while possibility is important for estimating clustering centers and effectively reduces the influence of abnormal data points. The objective function is designed as follows:

J_{FPCM}(U, T, V) = Σ_{i=1}^{c} Σ_{j=1}^{n} (u_{ij}^{m} + t_{ij}^{q}) d_{ij}^{2}, (6)

where membership is subject to the constraint Σ_{i=1}^{c} u_{ij} = 1 (j = 1,…,n), typicality is subject to the constraint Σ_{j=1}^{n} t_{ij} = 1 (i = 1,…,c), the other constraints are m > 1, q > 1, and 0 < u_{ij}, t_{ij} < 1, and the Euclidean distance is defined as d_{ij} = ||x_{j} − v_{i}||. The iterative functions of membership, typicality, and prototype can be obtained by minimizing the objective function. Equations (7)–(9) are the iterative functions, respectively:

u_{ij} = 1 / Σ_{k=1}^{c} (d_{ij}/d_{kj})^{2/(m−1)}, (7)

t_{ij} = 1 / Σ_{k=1}^{n} (d_{ij}/d_{ik})^{2/(q−1)}, (8)

v_{i} = Σ_{j=1}^{n} (u_{ij}^{m} + t_{ij}^{q}) x_{j} / Σ_{j=1}^{n} (u_{ij}^{m} + t_{ij}^{q}). (9)
Although FPCM overcomes the weaknesses of PCM and FCM, its typicality values become very small as the number of samples increases: on a large dataset, the typicality values are inconsistent with their real values because of the row sum constraint. Pal [24] improved FPCM by relaxing the typicality row sum constraint while retaining the membership column sum constraint, and proposed a new algorithm named possibilistic fuzzy C-means (PFCM). The objective function is designed as follows:

J_{PFCM}(U, T, V) = Σ_{i=1}^{c} Σ_{j=1}^{n} (a u_{ij}^{m} + b t_{ij}^{q}) d_{ij}^{2} + Σ_{i=1}^{c} η_{i} Σ_{j=1}^{n} (1 − t_{ij})^{q}, (10)

where the parameters are subject to the constraints m > 1, q > 1, η_{i} > 0, 0 < u_{ij}, t_{ij} < 1, 1 ≤ i ≤ c, and 1 ≤ j ≤ n, the Euclidean distance is defined as d_{ij} = ||x_{j} − v_{i}||, and the parameters a and b are constants. The iterative functions of membership, typicality, and prototype are obtained by minimizing the objective function. Equations (11)–(13) are the iterative functions:

u_{ij} = 1 / Σ_{k=1}^{c} (d_{ij}/d_{kj})^{2/(m−1)}, (11)

t_{ij} = 1 / (1 + (b d_{ij}^{2}/η_{i})^{1/(q−1)}), (12)

v_{i} = Σ_{j=1}^{n} (a u_{ij}^{m} + b t_{ij}^{q}) x_{j} / Σ_{j=1}^{n} (a u_{ij}^{m} + b t_{ij}^{q}), (13)

where η_{i} is defined as η_{i} = K Σ_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} / Σ_{j=1}^{n} u_{ij}^{m}, and usually K is a constant (K = 1).
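The role of a and b in PFCM is easiest to see in the centroid update, where every point contributes a·u^m + b·t^q. The following 1-D sketch (function names ours, not the paper's) makes that weighting explicit:

```python
def pfcm_typicality(d2, eta, b=1.0, q=2.0):
    """PFCM typicality t_ij = 1 / (1 + (b*d_ij^2/eta_i)^(1/(q-1)))."""
    return 1.0 / (1.0 + (b * d2 / eta) ** (1.0 / (q - 1.0)))

def pfcm_centroid(X, u_row, t_row, a=1.0, b=1.0, m=2.0, q=2.0):
    """PFCM centroid (1-D sketch): a weighted mean in which each point
    contributes a*u_ij^m + b*t_ij^q, so the constants a and b set the
    relative influence of membership versus typicality."""
    w = [a * u ** m + b * t ** q for u, t in zip(u_row, t_row)]
    return sum(wj * xj for wj, xj in zip(w, X)) / sum(w)
```

Setting b = 0 recovers an FCM-style centroid, and setting a = 0 recovers a PCM-style one, which is precisely why fixing a = b = 1 for all points is a meaningful (and, as Section 3 argues, questionable) design choice.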
3. WPFCM Algorithm
This section includes three parts. The motivation for the weight parameters is introduced first; the calculation method of the weight parameters is presented in the second part; and the last part gives the objective function and the steps of the algorithm.
3.1. Motivation for Weight Parameters
PFCM integrates the merits of PCM and FCM through its use of both membership and typicality. It reduces the sensitivity to noise of FCM, overcomes the coincidence problem of PCM, and avoids the FPCM problem of typicality values becoming very small as the number of data points increases. After analyzing the parameters a and b, we found that their values influence membership and typicality and thereby affect the clustering results. If a is greater than b, the prototype is affected more by membership than by typicality; conversely, if b is greater than a, the prototype is affected more by typicality than by membership. Therefore, to reduce the influence of outliers on the clustering results, a should be chosen smaller than b. Determining the parameter values is, however, difficult. Usually both are fixed at 1, which means that membership and typicality are given the same importance; at that point, having two separate parameters a and b becomes meaningless. In many situations, we do not know whether given values fit the parameters, so determining a and b depends on experience. Assigning values to a and b in PFCM lacks a mathematical basis, so it is arbitrary and unscientific, and the clustering results become unstable. PFCM has another weakness: all data vectors share the same parameter values during clustering, yet different vectors have different importance for clustering, so fixing a and b at 1 for all points is unreasonable. To overcome these weaknesses, we propose a new method to compute the values of weight parameters that replace a and b in PFCM. The new parameters account for the importance of each sample point in the clustering process, and the new calculation method is more reasonable.
The importance of these parameters lies in the fact that the values of a and b directly affect the typicality value t_{ij} and the centroid value v_{i}, indirectly affect membership, and thereby influence the clustering results.
3.2. Calculation Method of Weight Parameters
Several studies have proposed methods for calculating weight parameters [31–33]. The study by Fan et al. [32] assigned weights to attributes according to the importance of each attribute to the clustering process. For example, in the IRIS dataset [34], the third and fourth attributes are helpful for obtaining distinct clustering results, so they are assigned a high weight value and the others a low weight value. The premise is that we must know in advance which attributes are important; for an unknown dataset, this method cannot be applied. The study by Nock and Nielsen [33] estimated the probability density of all samples by an analogy method, which requires a great deal of computation. The study by Hung [29] gave a prototype-driven learning of a parameter based on the exponential separation strength between clusters, updated at each iteration to improve the performance of FCM; its equation (14) defines the parameter in terms of the distance from data point x_{j} to the sample mean, where the sample mean is defined as x̄ = (1/n) Σ_{j=1}^{n} x_{j}.
Definition 1. A given sample set to be classified is denoted by X = {x_{1}, x_{2},…,x_{n}}, and X is partitioned into c (0 < c < n) fuzzy subsets, where c is the number of clusters.
Definition 2. According to the importance of the data point x_{j} (x_{j} ∈ X) during the clustering process, the weight parameter is defined as γ_{ij}, the weight of x_{j} with respect to class i. Equation (17) gives its calculation method.
Theorem 1. The distance from x_{j} to a center can be regarded as a weight: if the distance is long, the value of the weight is high; conversely, if the distance is short, the value of the weight is low.
Proof. x̄ is the sample mean, and the difference between x_{j} and x̄ is reflected by the distance from x_{j} to x̄, which is constant for a given dataset. The shorter the distance from x_{j} to class i, the smaller the value of γ_{ij} should be; conversely, the longer the distance from x_{j} to class i, the larger the value of γ_{ij}. Optimization of the objective function seeks a minimum value, and the weight parameter should be consistent with this objective, so a long distance should receive a large γ_{ij}. It is therefore appropriate to use γ_{ij} as the weight parameter.
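The paper's concrete formula for γ_{ij} (equation (17)) did not survive extraction, so the sketch below is a hypothetical weight of our own construction that satisfies the monotonicity stated in Theorem 1: γ_{ij} increases with the squared distance from x_{j} to centroid v_{i} and stays in [0, 1). Squared distances are scaled by their overall mean so that the weight is dimensionless:

```python
def weight_params(X, V):
    """Hypothetical weight gamma_ij consistent with Theorem 1 (NOT the
    paper's equation (17)): gamma_ij = d_ij^2 / (d_ij^2 + s), where s
    is the mean squared distance over all point/centroid pairs.
    1-D sketch; returns a c x n matrix of weights in [0, 1)."""
    d2 = [[(x - v) ** 2 for x in X] for v in V]
    # mean squared distance; fall back to 1.0 if all distances are zero
    s = sum(sum(row) for row in d2) / (len(V) * len(X)) or 1.0
    return [[dij / (dij + s) for dij in row] for row in d2]
```

With this form, a point near a centroid gets a small γ_{ij} (so its membership term dominates), while a far point gets γ_{ij} close to 1 (so its typicality term dominates), matching the design intent of Section 3.3.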
3.3. Design Objective Function
According to the rule of classification, there is little difference among data in the same class and great difference between classes. When designing the objective function, each data point should be assigned to the center at the nearest distance, which is expressed by maximizing the membership value, while the typicality value can be used to reduce the influence of outliers on estimation. The new objective function should meet two requirements: on the one hand, the role of membership should be increased when the sample is an inlier; on the other hand, the role of typicality should be increased when the sample is an outlier. Therefore, the objective function is designed as in the following equation, which includes two parts: the first is the fuzzy term weighted by the fuzzy weight parameter, and the second is the typicality term weighted by the typicality weight parameter.
Definition 3. The new objective function, based on FCM, PCM, and PFCM, is designed as follows:

J_{WPFCM}(U, T, V) = Σ_{i=1}^{c} Σ_{j=1}^{n} [(1 − γ_{ij}) u_{ij}^{m} + γ_{ij} t_{ij}^{q}] d_{ij}^{2} + Σ_{i=1}^{c} η_{i} Σ_{j=1}^{n} (1 − t_{ij})^{q}, (18)

where γ_{ij} (0 < γ_{ij} < 1) denotes the weight between data point x_{j} and class i, which comes from equation (17). Different data points have different weight values, so the clustering results are more reasonable and the coincidence problem is avoided. U, T, and V denote the membership matrix (c × n), the typicality matrix (c × n), and the centroid matrix (c × 1), respectively. Here, u_{ij} (0 < u_{ij} < 1) is the membership of feature point x_{j} in cluster c_{i}, and t_{ij} (0 < t_{ij} < 1) is the typicality of x_{j} in cluster c_{i}. d_{ij} = ||x_{j} − v_{i}|| is the Euclidean distance between data point x_{j} and centroid v_{i}. The parameters m (m > 1) and q (q > 1) are the fuzzy exponents. The parameter η_{i} (η_{i} > 0) is a constant defined by η_{i} = K Σ_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} / Σ_{j=1}^{n} u_{ij}^{m}, where K is usually fixed at 1.
According to the preceding analysis, the nearer the distance between data point x_{j} and cluster c_{i}, the smaller the value of the weight parameter γ_{ij}. A short distance from x_{j} to cluster c_{i} indicates that x_{j} belongs to the i^{th} cluster, so the weight on membership should be increased and is set to (1 − γ_{ij}). Conversely, the farther x_{j} is from the i^{th} cluster, the greater the difference between them, and x_{j} may be an anomalous point; the typicality weight should then be increased to reduce the effect of x_{j} on clustering, and it is set to γ_{ij}. As the membership weight increases (decreases), the typicality weight decreases (increases). The weight parameter is calculated per sample data point, which overcomes the unreasonable fixed values of a and b in PFCM and resolves the coincidence problem caused by a small value of a together with a poor initial centroid.
According to Definition 3, the Lagrangian multiplier method is used to construct the Lagrange equation. To minimize equation (18) under the constraints Σ_{i=1}^{c} u_{ij} = 1 and 0 < t_{ij} < 1, the partial derivatives with respect to u_{ij} and t_{ij} are set to zero, which yields the membership u_{ij}, typicality t_{ij}, and centroid v_{i} as follows:

u_{ij} = [(1 − γ_{ij}) d_{ij}^{2}]^{−1/(m−1)} / Σ_{k=1}^{c} [(1 − γ_{kj}) d_{kj}^{2}]^{−1/(m−1)}, (19)

t_{ij} = 1 / (1 + (γ_{ij} d_{ij}^{2}/η_{i})^{1/(q−1)}), (20)

v_{i} = Σ_{j=1}^{n} [(1 − γ_{ij}) u_{ij}^{m} + γ_{ij} t_{ij}^{q}] x_{j} / Σ_{j=1}^{n} [(1 − γ_{ij}) u_{ij}^{m} + γ_{ij} t_{ij}^{q}]. (21)
3.4. WPFCM Algorithm
According to the objective function, the steps of the algorithm (Algorithm 1) are as follows: initialize the centroids, then repeatedly update the weight parameters, memberships, typicalities, and centroids until the change in centroids falls below the threshold ε or the maximum number of iterations is reached.
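The step list of Algorithm 1 did not survive extraction, so the following sketch shows the usual alternating scheme implied by the update equations of Section 3.3. Both the weight γ_{ij} (a normalized form of our own, standing in for equation (17)) and the exact update formulas are our reading of this section, not the paper's verbatim pseudocode; the data are 1-D for brevity:

```python
def wpfcm(X, V, m=1.5, q=5.0, K=1.0, eps=1e-6, max_iter=100):
    """Sketch of the WPFCM loop. Per iteration: (1) distances and
    weights gamma_ij from the current centroids; (2) membership and
    typicality updates; (3) centroid update; (4) stop when centroids
    move less than eps or max_iter is reached."""
    c, n = len(V), len(X)
    u = t = None
    for _ in range(max_iter):
        d2 = [[(x - v) ** 2 + 1e-12 for x in X] for v in V]
        s = sum(sum(r) for r in d2) / (c * n)          # distance scale
        g = [[d / (d + s) for d in row] for row in d2]  # gamma_ij (ours)
        # membership: u_ij ~ [(1-g)d^2]^(-1/(m-1)), columns sum to 1
        raw = [[((1 - g[i][j]) * d2[i][j]) ** (-1 / (m - 1))
                for j in range(n)] for i in range(c)]
        u = [[raw[i][j] / sum(raw[k][j] for k in range(c))
              for j in range(n)] for i in range(c)]
        # scale eta_i and typicality t_ij = 1/(1+(g d^2/eta)^(1/(q-1)))
        eta = [K * sum(u[i][j] ** m * d2[i][j] for j in range(n)) /
               sum(u[i][j] ** m for j in range(n)) for i in range(c)]
        t = [[1 / (1 + (g[i][j] * d2[i][j] / eta[i]) ** (1 / (q - 1)))
              for j in range(n)] for i in range(c)]
        # centroid: weighted mean with weights (1-g)u^m + g t^q
        w = [[(1 - g[i][j]) * u[i][j] ** m + g[i][j] * t[i][j] ** q
              for j in range(n)] for i in range(c)]
        V_new = [sum(w[i][j] * X[j] for j in range(n)) / sum(w[i])
                 for i in range(c)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < eps:
            return V_new, u, t
        V = V_new
    return V, u, t
```

The defaults m = 1.5 and q = 5 follow the values recommended at the end of Section 4 for dataset X_{12}; ε and max_iter match the experimental settings in Section 4.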
4. Experiments
To validate the efficiency of the algorithm, experiments were carried out on different datasets. The initial parameter values are set as follows: ε = 0.000001, maximum number of iterations max_iter = 100, constant K = 1, number of classes Cluster_n = 2 for dataset X_{12}, and Cluster_n = 3 for dataset IRIS [34].
Experiment 1. Dataset: X_{12} [35]; algorithms: FCM, PCM, PFCM, and WPFCM. X_{12} is a two-dimensional dataset with 12 data points. The coordinates of X_{12} are given in Table 1, and Figure 1 shows their distribution. Ten points form two clusters of five points each, on the left and right sides of the y-axis. Data points x_{6} and x_{12} are considered noise, and each has the same distance to the two clusters.
Table 2 presents the centroids generated by running FCM, PFCM, and WPFCM on X_{12}. Define the distance measure Dist_{X} = ||V_{X12} − V_{X}||^{2}, which denotes the distance from the real centroid V_{X12} to the centroid V_{X} generated by an algorithm. Using this measure, the distances are Dist_{FCM} = 0.4212, Dist_{PFCM} = 0.3860, and Dist_{WPFCM} = 0.1537. Comparing the three distances, although each algorithm obtains a good result, Dist_{WPFCM} is the minimum; that is, V_{WPFCM} is nearer to the real centroid than the other V_{X} and reflects the real cluster centers best.
Table 3 reports the minimum number of iterations of FCM, PFCM, and WPFCM under optimal parameter settings. WPFCM requires slightly fewer iterations than FCM and far fewer than PFCM; it therefore has lower running time on large datasets and a high convergence speed.
Table 4 presents the membership values obtained by running FCM, PFCM, and WPFCM. By comparison, the membership values of WPFCM are better than those of the other two algorithms; in particular, for data points x_{3} and x_{9}, the membership values equal one. Data points x_{3} and x_{9} are the centers of the two clusters, which shows that WPFCM recognizes cluster centers more easily. Membership values cannot identify the noisy data points x_{6} and x_{12}, but the noisy data are identified by the typicality values in Table 5. Analyzing the data in Table 5, the typicality values of WPFCM are greater than those of PFCM; a data point with a larger typicality value is more likely to belong to the cluster. The typicality values of x_{3} and x_{9} each reach 1 in Table 5, which shows that these points belong to their respective clusters with high possibility.
The membership values of the noisy data x_{6} and x_{12} equal 0.5. Figure 1 shows that the distance from x_{6} to the two cluster centers is far less than that from x_{12}, but Table 4 cannot show this difference. Table 5 shows that the typicality values of the ten regular data points are greater than 0.9, while those of x_{6} and x_{12} are far lower, so we consider x_{6} and x_{12} to be noise. We also find in Table 5 that the typicality value of x_{12} is far less than that of x_{6}, which shows that x_{12} belongs to the two clusters with less possibility than x_{6} and reflects the distribution of x_{6} and x_{12} in Figure 1. WPFCM thus remedies this defect of FCM. Table 5 also shows that the typicality values of WPFCM are better than those of PFCM, so WPFCM obtains more distinct clustering results.
Table 6 presents the centroids and iteration counts obtained by running WPFCM on dataset X_{12} with different parameters. The clustering results of WPFCM are better than those of FCM and PFCM as a whole, and WPFCM needs somewhat fewer iterations. When m is kept fixed and q varies from 2 to 5, the membership values show a slight increasing tendency, while the typicality values decrease markedly. The cluster centers also change considerably, moving closer to the real centroids, and the iteration count decreases. As q increases, the influence of the weight parameter γ_{ij} on the clustering results grows. Since γ_{ij} is generated during the iterative procedure and the initial centroids are generated randomly, WPFCM overcomes the defect of arbitrarily chosen a and b and reduces the vulnerability to uncertain clustering results. The clustering results in Table 7 are better than those in Table 6; however, the membership values decrease considerably while the iteration counts increase. On balance, we suggest setting m and q to 1.5 and 5, respectively.
Experiment 2. Dataset: IRIS; algorithm: FCM, PCM, PFCM, and WPFCM
IRIS is a four-dimensional dataset including three classes: setosa, versicolor, and virginica. Each class has 50 data points, 150 in total. The first class, setosa, is well separated from the other two without overlap; there is some overlap between versicolor and virginica.
The data in Tables 8 and 9 were acquired by running FCM, PCM, PFCM, and WPFCM many times on the IRIS dataset. Each algorithm obtained good clustering centroids. Compared with the other algorithms, WPFCM acquired more distinct membership and typicality values and better separation. The two centroids of versicolor and virginica obtained by PCM almost overlap. Because separation between clusters is difficult to see directly in Tables 8 and 9, we define the distance between classes as Dist_{ij} = ||V_{i} − V_{j}||^{2}, the distance from the i^{th} cluster to the j^{th} cluster.
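The separation measure just defined is a plain squared Euclidean distance between centroid vectors; a minimal sketch (function name ours):

```python
def between_class_dist(Vi, Vj):
    """Dist_ij = ||V_i - V_j||^2: squared Euclidean distance between
    two cluster centroids, used here to compare class separation.
    A value near zero signals coincident centroids, as with PCM on
    the overlapping IRIS classes."""
    return sum((a - b) ** 2 for a, b in zip(Vi, Vj))
```

Applying this to the PCM centroids of versicolor and virginica would yield a value near zero, which is exactly the coincidence symptom discussed for Table 10.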
Table 10 provides the distances between the centroids generated by FCM, PCM, PFCM, and WPFCM on IRIS. The values of Dist_{12} and Dist_{13} computed with Dist_{ij} = ||V_{i} − V_{j}||^{2} for WPFCM, FCM, and PFCM reflect the fact that setosa is separated from the other two classes, versicolor and virginica. In PCM, however, Dist_{12} and Dist_{13} are almost identical and Dist_{23} is nearly zero, so the results do not reflect the features of the dataset; this is caused by the coincidence problem of PCM. Although FCM, PFCM, and WPFCM all reflect the separation of setosa from the other two classes and the overlap between versicolor and virginica, comparing Dist_{23} shows that the value for WPFCM is nearest to the real one. We conclude that WPFCM reflects the characteristics of the dataset better than the other algorithms and more easily obtains a good partition, especially between versicolor and virginica.
Table 11 provides the distances between the centroids generated by FCM, PCM, PFCM, and WPFCM and the real centroids. Formula (24) defines Dist_{X} as the sum of the distances between the centroids acquired by each algorithm and the real centroids, where Dist_{xi} represents the distance from the i^{th} cluster center to the real centroid. Each Dist_{xi} of WPFCM is less than that of the other algorithms. Comparing the values of Dist_{X} in Table 11 gives the relation Dist_{WPFCM} < Dist_{PFCM} < Dist_{FCM} < Dist_{PCM}, which shows that there is little difference between the centroids of WPFCM and the real centroids.
The iteration counts of FCM, PCM, PFCM, and WPFCM on IRIS are given in Table 12. WPFCM needs slightly more iterations than FCM but far fewer than PCM and PFCM; it acquires the cluster centers quickly and converges fast.
The numbers of resubstitution errors of FCM, PCM, PFCM, and WPFCM on the IRIS dataset are given in Table 13. The resubstitution errors of WPFCM are slightly lower than those of FCM and PFCM and far lower than those of PCM, for both membership and typicality values. Table 13 exhibits two relations: U_{eWPFCM} < U_{ePFCM} < U_{eFCM} < U_{ePCM} and T_{eWPFCM} < T_{ePFCM} < T_{eFCM} < T_{ePCM}. The resubstitution errors of membership and typicality reach 50 in PCM, far greater than in the other algorithms; the reason is the coincident cluster problem of PCM together with the overlapping data in versicolor and virginica.
5. Conclusions
A new possibilistic fuzzy C-means algorithm based on weight parameters was proposed according to the importance of membership and typicality in the clustering process. First, to address the unreasonable parameters a and b, we designed the weight parameter γ_{ij} based on the literature [23] and provided a concrete calculation method. The weight parameters (1 − γ_{ij}) and γ_{ij} were assigned to membership and typicality, the objective function (equation (18)) was improved, and the new algorithm (Algorithm 1) was presented. Experiments on different datasets show that the new algorithm deals well with noisy data and obtains better clustering results: WPFCM resolves the coincidence problem and overcomes the sensitivity to noisy data. The study also examined the influence of different values of the exponent parameters m and q on membership values, typicality values, and centroids; the exponents were determined by comprehensively comparing these quantities. Comparisons of iteration counts show that WPFCM needs fewer iterations and converges faster, and its resubstitution errors are close to those of FCM and PFCM but far below those of PCM. Taken together, these performance indexes suggest that WPFCM overcomes the noise sensitivity of FCM, the coincidence problem of PCM, and the unreasonable weight parameters of PFCM. Future work is to extend the new algorithm to nonpoint prototype clustering models such as the spherical, quadric, and shell prototypes.

Data Availability
The data used to support the findings of this study are available at http://archive.ics.uci.edu/ml/datasets/iris.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by “Petrel Program of Lianyungang Jiangsu Province, China” (KK18088), and “the Program of Science and Technology Associate Chief Engineer of Jiangsu Province of China” (FZ20200458).