Research Article

A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters

Figure 2

Demonstration of the density-based algorithm. (a) shows the initial distribution of a synthetic dataset consisting of two 2-dimensional Gaussian clusters with centroids at (2, 3) and (7, 8), each containing 100 samples. In (b), the blue circle marks the highest-density core point, taken as the centroid of the first cluster, and the red plus signs mark the objects belonging to the first cluster. In (c), the red circle marks the core point taken as the centroid of the second cluster, and the blue asterisks mark the objects belonging to the second cluster. In (d), the purple circle marks the core point taken as the centroid of the third cluster, the green times signs mark the objects belonging to the third cluster, and the black dots mark the remaining border points, which do not belong to any cluster. For the chosen cutoff distance, the maximum number of clusters is 3, whereas the empirical rule would give a maximum of 14. The algorithm therefore effectively reduces the number of iterations of the FCM algorithm.
(a) Initial data
(b) Iteration 1
(c) Iteration 2
(d) Iteration 3
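
The comparison in the caption can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the caption: the empirical rule is taken to be the commonly used bound c_max ≤ √n (which gives 14 for n = 200), and the density step is approximated by a simple "peeling" loop. The names `empirical_cmax`, `density_peel`, `cutoff`, and `min_density`, as well as the unit standard deviation of the synthetic data, are hypothetical choices made for illustration and are not the authors' implementation.

```python
import math
import random


def empirical_cmax(n_samples):
    """Assumed empirical rule: c_max = floor(sqrt(n))."""
    return math.isqrt(n_samples)


def density_peel(points, cutoff, min_density=2):
    """Hypothetical density peeling (an illustrative assumption).

    Repeatedly take the densest remaining point as a centroid and remove
    every remaining point closer than `cutoff` to it. Points left over when
    no remaining point has at least `min_density` neighbours are returned
    as border points.
    """
    remaining = list(points)
    centroids = []
    while remaining:
        # Density = number of other remaining points closer than the cutoff.
        density = [
            sum(math.dist(p, q) < cutoff for q in remaining) - 1
            for p in remaining
        ]
        best = max(range(len(remaining)), key=density.__getitem__)
        if density[best] < min_density:
            break  # the rest are treated as border points
        core = remaining[best]
        centroids.append(core)
        remaining = [q for q in remaining if math.dist(core, q) >= cutoff]
    return centroids, remaining


# Synthetic data resembling panel (a): two Gaussian clusters of 100 samples
# centred at (2, 3) and (7, 8). The standard deviation of 1 is an assumption.
random.seed(0)
data = [(random.gauss(2, 1), random.gauss(3, 1)) for _ in range(100)]
data += [(random.gauss(7, 1), random.gauss(8, 1)) for _ in range(100)]

centroids, border = density_peel(data, cutoff=2.0)
print("density-based candidate clusters:", len(centroids))
print("empirical upper bound:", empirical_cmax(len(data)))
```

The number of centroids returned depends on the cutoff distance and the density threshold, so it will not necessarily reproduce the value of 3 shown in the figure. The point of the sketch is only that a density-based bound of this kind is usually much smaller than √n, which shortens the range of cluster counts over which FCM has to be run.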