Fuzzy Functions, Relations, and Fuzzy Transforms: Theoretical Aspects and Applications to Fuzzy Systems
Makoto Yasuda, "Deterministic Annealing Approach to Fuzzy C-Means Clustering Based on Entropy Maximization", Advances in Fuzzy Systems, vol. 2011, Article ID 960635, 9 pages, 2011. https://doi.org/10.1155/2011/960635
Deterministic Annealing Approach to Fuzzy C-Means Clustering Based on Entropy Maximization
Abstract
This paper deals with a fuzzy clustering method that combines the deterministic annealing (DA) approach with an entropy, in particular the Shannon entropy and the Tsallis entropy. By maximizing the Shannon entropy, the fuzzy entropy, or the Tsallis entropy within the framework of the fuzzy c-means (FCM) method, membership functions similar to the statistical mechanical distribution functions are obtained. We examine the characteristics of these entropy-based membership functions from the statistical mechanical point of view. Both the Shannon- and Tsallis-entropy-based FCMs are then formulated as DA clustering using the very fast annealing (VFA) method as a cooling schedule. Experimental results indicate that the Tsallis-entropy-based FCM is stable under very fast deterministic annealing and is suitable for this annealing process.
1. Introduction
Statistical mechanics investigates the macroscopic properties of a physical system consisting of several elements. Recently, research activities that attempt to apply statistical mechanical models or tools to information science have become popular.
The deterministic annealing (DA) method [1] is a deterministic variant of the simulated annealing (SA) method [2]. DA recasts the minimization of the cost function as the minimization of a temperature-dependent free energy and tracks the minimum of that free energy while the temperature is lowered, so the cost function is optimized deterministically at each temperature. DA is therefore more efficient than SA, but it does not guarantee a globally optimal solution.
There exists a strong relationship between the statistical mechanical distribution functions and the membership functions of fuzzy c-means (FCM) clustering [3] obtained with the maximum entropy or entropy regularization methods [4, 5]. That is, FCM regularized with the Shannon entropy gives a membership function similar to the Boltzmann (or Gibbs) distribution function [1, 4], and FCM regularized with the fuzzy entropy [6] gives a membership function similar to the Fermi-Dirac distribution function [7]. These membership functions are suitable for annealing methods because they contain a parameter corresponding to the system temperature.
Tsallis [8] achieved a nonextensive extension of the Boltzmann-Gibbs statistics. Tsallis postulated a generalized form of entropy with a generalization parameter $q$, which, in the limit $q \to 1$, reduces to the Shannon entropy. Later, Ménard et al. [9] derived a membership function by regularizing FCM with the Tsallis entropy.
In this study, a membership function that takes the familiar form of a statistical mechanical distribution function is derived by maximizing the Shannon and fuzzy entropies within the framework of FCM. Similarly, the Tsallis-entropy-based FCM membership function is derived [10, 11] by maximizing the Tsallis entropy. Then, the free energies for these membership functions are formulated and examined from the statistical mechanical viewpoint.
On the other hand, there are several representative cooling schedules for the temperature in SA; for example, schedules inversely proportional to a logarithmic function and inversely proportional to an exponential function are widely adopted. Rosen [12] proposed a more effective schedule for SA known as very fast annealing (VFA).
However, the applicability of VFA to DA is not yet known. To achieve good clustering by DA, a reliable annealing process is desirable. Therefore, by introducing VFA to DA, we formulate the Shannon- and Tsallis-entropy-based FCMs as very fast DA clustering and examine their reliability.
Experiments are performed on numerical data and the iris data [13], and the results indicate that Tsallis-entropy-based FCM clustering is suitable for very fast DA clustering because of the shape of its membership function.
2. Entropy Maximization Method
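Let $X = \{x_1, \ldots, x_n\}$ ($x_k \in \mathbb{R}^p$) be a data set in the $p$-dimensional real space, which should be divided into $c$ clusters. In addition, let $V = \{v_1, \ldots, v_c\}$ be the centers of the clusters, and let $u_{ik} \in [0, 1]$ ($i = 1, \ldots, c$; $k = 1, \ldots, n$) be the membership functions. Furthermore, let
$$J = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m} d_{ik} \tag{1}$$
be the objective function of FCM, where $d_{ik} = \lVert x_k - v_i \rVert^2$.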
2.1. Shannon Entropy Maximization of FCM
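First, we introduce the Shannon entropy into the FCM clustering. The Shannon entropy is given by
$$S = -\sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} \log u_{ik}. \tag{2}$$
Under the normalization constraint
$$\sum_{i=1}^{c} u_{ik} = 1 \quad (\forall k) \tag{3}$$
and setting $m$ to 1, the entropy functional is given by
$$\delta S - \sum_{k=1}^{n} \alpha_k\, \delta\!\left(\sum_{i=1}^{c} u_{ik} - 1\right) - \beta\, \delta\!\left(\sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} d_{ik}\right), \tag{4}$$
where $\alpha_k$ and $\beta$ are the Lagrange multipliers and $\alpha_k$ must be determined so as to satisfy (3). The stationary condition for (4) leads to the following membership function
$$u_{ik} = \frac{e^{-\beta d_{ik}}}{\sum_{j=1}^{c} e^{-\beta d_{jk}}} \tag{5}$$
and the cluster centers
$$v_i = \frac{\sum_{k=1}^{n} u_{ik} x_k}{\sum_{k=1}^{n} u_{ik}}. \tag{6}$$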
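The Boltzmann-like membership function and the weighted-mean cluster centers of this subsection can be sketched in a few lines. The following is a minimal illustration, assuming squared Euclidean distances and the standard softmax form $u_{ik} \propto e^{-\beta d_{ik}}$; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def sq_dist(x, v):
    """Squared Euclidean distance between a data point x and a center v."""
    return sum((a - b) ** 2 for a, b in zip(x, v))

def shannon_memberships(data, centers, beta):
    """Return u[i][k] proportional to exp(-beta * d_ik), normalized over clusters i."""
    u = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        d = [sq_dist(x, v) for v in centers]
        d_min = min(d)  # shift the exponents for numerical stability
        w = [math.exp(-beta * (dj - d_min)) for dj in d]
        z = sum(w)
        for i, wi in enumerate(w):
            u[i][k] = wi / z
    return u

def update_centers(data, u):
    """Weighted means of the data, with the memberships u[i][k] as weights."""
    dim = len(data[0])
    centers = []
    for row in u:
        total = sum(row)
        centers.append([sum(w * x[j] for w, x in zip(row, data)) / total
                        for j in range(dim)])
    return centers

# Toy data (hypothetical): two well-separated groups of two points each.
data = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]]
centers = [[0.0, 0.0], [4.0, 4.0]]
u_hot = shannon_memberships(data, centers, beta=0.01)   # high temperature: fuzzy
u_cold = shannon_memberships(data, centers, beta=10.0)  # low temperature: nearly hard
new_centers = update_centers(data, u_cold)
```

With a small `beta` the memberships of every point stay close to uniform, whereas a large `beta` drives them toward a hard assignment, which is the behavior the annealing process relies on.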
2.2. Fuzzy Entropy Maximization of FCM
We then introduce the fuzzy entropy into the FCM clustering.
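The fuzzy entropy is given by
$$\hat{S} = -\sum_{k=1}^{n} \sum_{i=1}^{c} \left\{ \hat{u}_{ik} \log \hat{u}_{ik} + (1 - \hat{u}_{ik}) \log (1 - \hat{u}_{ik}) \right\}. \tag{7}$$
The fuzzy entropy functional is given by
$$\delta \hat{S} - \sum_{k=1}^{n} \alpha_k\, \delta\!\left(\sum_{i=1}^{c} \hat{u}_{ik} - 1\right) - \beta\, \delta\!\left(\sum_{k=1}^{n} \sum_{i=1}^{c} \hat{u}_{ik} d_{ik}\right), \tag{8}$$
where $\alpha_k$ and $\beta$ are the Lagrange multipliers [14]. The stationary condition for (8) leads to the following membership function:
$$\hat{u}_{ik} = \frac{1}{e^{\alpha_k + \beta d_{ik}} + 1} \tag{9}$$
and the cluster centers
$$v_i = \frac{\sum_{k=1}^{n} \hat{u}_{ik} x_k}{\sum_{k=1}^{n} \hat{u}_{ik}}. \tag{10}$$
In (9), $\beta$ defines the extent of the distribution [7]. Equation (9) is formally normalized as
$$\tilde{u}_{ik} = \frac{\hat{u}_{ik}}{\sum_{j=1}^{c} \hat{u}_{jk}}. \tag{11}$$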
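As an illustration of the Fermi-Dirac-like membership function, the sketch below evaluates $1 / (e^{\alpha + \beta d} + 1)$ for one data point and shows two ways of making the memberships sum to one: the formal normalization described above, and, as an assumption of this sketch rather than a procedure stated in the paper, a bisection search for the multiplier $\alpha$ that enforces the constraint directly. All names and numbers are hypothetical.

```python
import math

def fermi(alpha, beta, d):
    """Fermi-Dirac-like membership 1 / (exp(alpha + beta * d) + 1)."""
    t = alpha + beta * d
    if t > 700.0:  # exp would overflow; the membership is effectively zero
        return 0.0
    return 1.0 / (math.exp(t) + 1.0)

def formally_normalized(dists, alpha, beta):
    """Divide each membership by the sum over clusters (formal normalization)."""
    raw = [fermi(alpha, beta, d) for d in dists]
    total = sum(raw)
    return [r / total for r in raw]

def solve_alpha(dists, beta, tol=1e-12):
    """Bisection on alpha so that the raw memberships sum to one.

    The sum decreases monotonically in alpha, from about len(dists) at the
    lower bracket to almost zero at the upper bracket (needs >= 2 clusters).
    """
    lo, hi = -50.0 - beta * max(dists), 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(fermi(mid, beta, d) for d in dists) > 1.0:
            lo = mid  # sum too large: alpha must increase
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Squared distances from one data point to three cluster centers (hypothetical).
dists = [0.5, 2.0, 8.0]
beta = 1.0
alpha = solve_alpha(dists, beta)
u_hat = [fermi(alpha, beta, d) for d in dists]
u_tilde = formally_normalized(dists, alpha=0.0, beta=beta)
```

Both variants give memberships that decrease with distance; they differ in whether the flat-topped shape of the raw function is preserved exactly (`solve_alpha`) or rescaled (`formally_normalized`).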
2.3. Tsallis Entropy Maximization of FCM
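Let $v_i$ and $u_{ik}$ be the centers of clusters and the membership functions, respectively.
The Tsallis entropy is defined as
$$S_q = -\frac{1}{q - 1} \sum_{k=1}^{n} \left( \sum_{i=1}^{c} u_{ik}^{q} - 1 \right), \tag{12}$$
where $q \in \mathbb{R}$ is any real number. The objective function is rewritten as
$$U_q = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{q} d_{ik}, \tag{13}$$
where $d_{ik} = \lVert x_k - v_i \rVert^2$.
Accordingly, the Tsallis entropy functional is given by
$$\delta S_q - \sum_{k=1}^{n} \alpha_k\, \delta\!\left(\sum_{i=1}^{c} u_{ik} - 1\right) - \beta\, \delta\!\left(\sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{q} d_{ik}\right). \tag{14}$$
The stationary condition for (14) yields the following membership function:
$$u_{ik} = \frac{\{1 - \beta (1 - q) d_{ik}\}^{1/(1-q)}}{\bar{Z}_k}, \tag{15}$$
where
$$\bar{Z}_k = \sum_{j=1}^{c} \{1 - \beta (1 - q) d_{jk}\}^{1/(1-q)}. \tag{16}$$
In this case, the cluster centers are given by
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{q} x_k}{\sum_{k=1}^{n} u_{ik}^{q}}. \tag{17}$$
In the limit of $q \to 1$, the Tsallis entropy recovers the Shannon entropy [8] and the membership function (15) approaches that in (5).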
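The limit statement above can be checked numerically. The sketch below assumes the Tsallis-type membership $u \propto \{1 - \beta(1-q)d\}^{1/(1-q)}$ and restricts itself to $q > 1$, so that the base of the power is always positive; this restriction, the function names, and the sample distances are assumptions made only for illustration.

```python
import math

def tsallis_memberships(dists, beta, q):
    """Memberships of one data point for squared distances `dists` (q > 1 assumed)."""
    if q <= 1.0:
        raise ValueError("this sketch only handles q > 1")
    w = [(1.0 - beta * (1.0 - q) * d) ** (1.0 / (1.0 - q)) for d in dists]
    z = sum(w)
    return [wi / z for wi in w]

def shannon_memberships(dists, beta):
    """Boltzmann-like memberships exp(-beta * d), normalized over clusters."""
    w = [math.exp(-beta * d) for d in dists]
    z = sum(w)
    return [wi / z for wi in w]

dists = [0.5, 1.0, 4.0]  # hypothetical squared distances to three centers
beta = 1.0
u_shannon = shannon_memberships(dists, beta)
u_near_one = tsallis_memberships(dists, beta, q=1.0001)  # close to the Shannon case
u_q2 = tsallis_memberships(dists, beta, q=2.0)           # power-law tail
max_gap = max(abs(a - b) for a, b in zip(u_near_one, u_shannon))
```

For `q=2.0` the most distant cluster keeps a noticeably larger membership than in the Shannon case, which reflects the gentler base slope of the Tsallis-entropy-based membership function.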
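3. Statistical Mechanical Interpretation of Entropy-Based FCM
3.1. Shannon-Entropy-Based FCM Statistics
In the Shannon-entropy-based FCM, the sum of the states (the partition function) for the grand canonical ensemble of fuzzy clustering can be written as
$$Z = \prod_{k=1}^{n} \sum_{i=1}^{c} e^{-\beta d_{ik}}. \tag{18}$$
By substituting (18) into $F = -(1/\beta) \log Z$ [15], the free energy becomes
$$F = -\frac{1}{\beta} \sum_{k=1}^{n} \log \sum_{i=1}^{c} e^{-\beta d_{ik}}. \tag{19}$$
Stable thermal equilibrium requires a minimization of the free energy. By formulating deterministic annealing as a minimization of the free energy, $\partial F / \partial v_i = 0$ yields
$$v_i = \frac{\sum_{k=1}^{n} u_{ik} x_k}{\sum_{k=1}^{n} u_{ik}}. \tag{20}$$
This cluster center is the same as that in (6).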
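To make the role of the free energy concrete, the following sketch evaluates the log-sum-exp form $F = -(1/\beta) \sum_k \log \sum_i e^{-\beta d_{ik}}$ for a small, hypothetical table of squared distances. It is only an illustration under that assumed form; it shows that $F$ approaches the hard-clustering cost (the sum of each point's distance to its nearest center) as the temperature $1/\beta$ goes to zero.

```python
import math

def free_energy(dist_rows, beta):
    """Free energy for distances dist_rows[k][i] = d_ik at inverse temperature beta."""
    total = 0.0
    for row in dist_rows:
        d_min = min(row)
        # log-sum-exp with the smallest distance factored out for stability
        total += -beta * d_min + math.log(
            sum(math.exp(-beta * (d - d_min)) for d in row))
    return -total / beta

# Three data points, two clusters (hypothetical squared distances).
dist_rows = [[0.5, 9.0], [7.0, 1.0], [0.2, 6.0]]
hard_cost = sum(min(row) for row in dist_rows)
f_hot = free_energy(dist_rows, beta=0.1)
f_mid = free_energy(dist_rows, beta=1.0)
f_cold = free_energy(dist_rows, beta=100.0)
```

The free energy increases monotonically toward `hard_cost` as `beta` grows, so minimizing it at a sequence of decreasing temperatures gradually turns a smooth problem into the original hard-clustering problem.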
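3.2. Fuzzy-Entropy-Based FCM Statistics
In the fuzzy-entropy-based FCM, by analogy with statistical mechanics, the grand partition function for the grand canonical ensemble of fuzzy clustering can be written as
$$\Xi = \prod_{k=1}^{n} \prod_{i=1}^{c} \left( 1 + e^{-\alpha_k - \beta d_{ik}} \right) \tag{21}$$
because data can belong to any cluster. By substituting (21) into $F = -(1/\beta) \left( \log \Xi - \sum_k \alpha_k\, \partial \log \Xi / \partial \alpha_k \right)$ [15], the free energy becomes
$$F = -\frac{1}{\beta} \sum_{k=1}^{n} \left\{ \sum_{i=1}^{c} \log \left( 1 + e^{-\alpha_k - \beta d_{ik}} \right) + \alpha_k \right\}. \tag{22}$$
It should be noted that $U - T\hat{S}$ (with $U = \sum_k \sum_i \hat{u}_{ik} d_{ik}$ and $T = 1/\beta$), the Legendre transform of the fuzzy entropy, gives the same form for the free energy.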
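3.3. The Tsallis-Entropy-Based FCM Statistics
On the other hand, $S_q$ and $U_q$ satisfy
$$S_q - \beta U_q = \sum_{k=1}^{n} \frac{\bar{Z}_k^{1-q} - 1}{1 - q}, \tag{23}$$
which leads to
$$\frac{\partial S_q}{\partial U_q} = \beta. \tag{24}$$
Equation (24) makes it possible to regard $\beta^{-1}$ as an artificial system temperature $T$ [15]. Then, the free energy can be defined as
$$F_q = U_q - T S_q = -\frac{1}{\beta} \sum_{k=1}^{n} \frac{\bar{Z}_k^{1-q} - 1}{1 - q}. \tag{25}$$
$U_q$ can be derived from $F_q$ as
$$U_q = -\frac{\partial}{\partial \beta} \sum_{k=1}^{n} \frac{\bar{Z}_k^{1-q} - 1}{1 - q}. \tag{26}$$
$\partial F_q / \partial v_i = 0$ also gives
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{q} x_k}{\sum_{k=1}^{n} u_{ik}^{q}}. \tag{27}$$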
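A quick numerical check can make the relation between the Tsallis and Shannon free energies tangible. The sketch below assumes the forms $F_q = -(1/\beta) \sum_k (\bar{Z}_k^{1-q} - 1)/(1-q)$, with $\bar{Z}_k = \sum_i \{1 - \beta(1-q) d_{ik}\}^{1/(1-q)}$, and $F = -(1/\beta) \sum_k \log \sum_i e^{-\beta d_{ik}}$; these forms, the restriction to $q > 1$, and all names are assumptions of the sketch. Since $(z^{1-q} - 1)/(1-q) \to \log z$ as $q \to 1$, the two values should agree in that limit.

```python
import math

def shannon_free_energy(dist_rows, beta):
    """-(1/beta) * sum_k log sum_i exp(-beta * d_ik), with dist_rows[k][i] = d_ik."""
    return -sum(math.log(sum(math.exp(-beta * d) for d in row))
                for row in dist_rows) / beta

def tsallis_free_energy(dist_rows, beta, q):
    """-(1/beta) * sum_k (Z_k**(1-q) - 1) / (1 - q), for q > 1 (assumed form)."""
    total = 0.0
    for row in dist_rows:
        z = sum((1.0 - beta * (1.0 - q) * d) ** (1.0 / (1.0 - q)) for d in row)
        total += (z ** (1.0 - q) - 1.0) / (1.0 - q)
    return -total / beta

dist_rows = [[0.5, 1.0, 4.0], [0.2, 3.0, 2.5]]  # hypothetical squared distances
beta = 1.0
f_shannon = shannon_free_energy(dist_rows, beta)
f_tsallis = tsallis_free_energy(dist_rows, beta, q=1.0001)
gap = abs(f_tsallis - f_shannon)
```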
4. Effects of Annealing Temperature
4.1. Dependency of Shapes of Membership Functions on Temperature
By reducing the temperature according to the annealing schedule, the deterministic annealing method achieves the thermal equilibrium that minimizes the free energy. At absolute zero, the particle system settles down to the ground state, that is, the state of minimum energy. Figure 1 shows the forms of the Shannon, fuzzy, and Tsallis entropy functions. Figure 2 shows the forms of the corresponding membership functions.
In the deterministic annealing method, cluster distribution which minimizes the free energy is searched at the given temperature. At high temperature, the membership functions are widely distributed and clusters to which a data belongs are fuzzy. In case of with , the width of the distribution is roughly proportional to . At the limit of low temperature, on the other hand, fuzzy clustering reaches hard clustering. The relationship suggests that the higher temperature causes the larger entropy state, that is, chaotic state. This increase of the entropy is the result of the extent of the membership function.
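In Figure 2, it can be seen that the fuzzy-entropy-based membership function has a flat peak, whereas both the Shannon- and Tsallis-entropy-based membership functions have Gaussian-like forms. It can also be seen that the Tsallis-entropy-based membership function has a gentler base slope than the Shannon-entropy-based one.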
4.2. Cooling Schedule
4.2.1. Representative Annealing Methods
In SA, the temperature decreases according to a cooling schedule. The representative cooling schedules [16] for SA are
(i) proportional to an exponential function,
(ii) an inversely linear function,
(iii) inversely proportional to a logarithmic function,
(iv) inversely proportional to an exponential function.
Each schedule is specified by a sufficiently high initial temperature, a parameter that defines the temperature reduction speed, and the number of iterations.
4.2.2. Very Fast Annealing
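Rosen proposed another schedule that is inversely proportional to an exponential function, known as very fast annealing (VFA).
In VFA, $T$ is given by
$$T = T_{high} \exp\left( -r\, t^{1/D} \right), \tag{32}$$
where $T_{high}$ is the initial temperature, $t$ is the number of iterations, $r$ is a temperature reduction parameter, and $D$ is the dimension of the state space. Equations (31) and (32) are compared in Figure 3. It is observed that VFA lowers the temperature very steeply in the initial stage.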
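The difference between an ordinary exponential schedule and VFA can be seen from a few values. The sketch below assumes the commonly quoted VFA form $T = T_{high} \exp(-r\, t^{1/D})$ and, for comparison, a plain inverse-exponential schedule $T = T_{high} / e^{r t}$; both forms, the parameter names `t_high`, `r`, and `dim`, and the sample values are assumptions of this sketch and may differ from the exact equations used in the paper.

```python
import math

def exponential_schedule(t, t_high, r):
    """Temperature inversely proportional to exp(r * t) (assumed form)."""
    return t_high / math.exp(r * t)

def vfa_schedule(t, t_high, r, dim):
    """Very fast annealing: t_high * exp(-r * t**(1/dim)) (assumed form)."""
    return t_high * math.exp(-r * t ** (1.0 / dim))

# Temperatures for the first few iterations with t_high = 1, r = 1, dim = 2.
vfa_temps = [vfa_schedule(t, 1.0, 1.0, 2) for t in range(6)]
exp_temps = [exponential_schedule(t, 1.0, 1.0) for t in range(6)]
first_drop = vfa_temps[0] - vfa_temps[1]
second_drop = vfa_temps[1] - vfa_temps[2]
```

Because $t^{1/D}$ rises steeply near $t = 0$ and flattens afterwards, most of the temperature reduction under VFA happens in the first few iterations, after which the temperature changes only slowly.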
Figure 3: (a) Inversely proportional to an exponential function; (b) very fast annealing.
In Section 6, we apply VFA as a cooling schedule of entropy-based FCM clustering using DA.
5. Fuzzy C-Means as Clustering Algorithm Using Very Fast Annealing DA
The very fast deterministic annealing algorithm for the Tsallis-entropy-based FCM is given as follows.
(1) Set the number of clusters $c$, the highest temperature $T_{high}$, the temperature reduction rate $r$, and the two thresholds of the convergence tests;
(2) generate $c$ initial clusters at random positions and assign each data point to the nearest cluster. Set the current temperature $T$ to $T_{high}$;
(3) calculate $u_{ik}$ by (15);
(4) calculate the cluster centers $v_i$ by (17);
(5) compare the current centers with those obtained at the previous iteration. If the convergence condition on this difference is satisfied, go to (6); otherwise go back to (3);
(6) if the termination condition is satisfied, stop. Otherwise decrease the temperature with (32) and go back to (3).
In the case of the Shannon-entropy-based FCM, (15) is replaced by (5) and (17) is replaced by (6).
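The steps above can be sketched as a short program. This is a minimal illustration, not the author's implementation: it assumes the Tsallis-type membership with $q > 1$, centers weighted by $u_{ik}^q$, the VFA form $T = T_{high} \exp(-r\, t^{1/D})$ with $D$ taken to be the data dimension, a convergence test on the largest center shift, and termination once the temperature falls below a hypothetical `t_low`. Fixed initial centers replace the random initialization so that the run is reproducible.

```python
import math

def sq_dist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))

def tsallis_u(data, centers, beta, q):
    """u[i][k] proportional to (1 + beta*(q-1)*d_ik)**(-1/(q-1)), q > 1 assumed."""
    u = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        w = [(1.0 + beta * (q - 1.0) * sq_dist(x, v)) ** (-1.0 / (q - 1.0))
             for v in centers]
        z = sum(w)
        for i, wi in enumerate(w):
            u[i][k] = wi / z
    return u

def centers_from(data, u, q):
    """Centers as means of the data weighted by u_ik**q."""
    dim = len(data[0])
    out = []
    for row in u:
        wq = [w ** q for w in row]
        total = sum(wq)
        out.append([sum(w * x[j] for w, x in zip(wq, data)) / total
                    for j in range(dim)])
    return out

def very_fast_da(data, centers, t_high, r, q=2.0,
                 delta=1e-6, t_low=1e-3, max_inner=200):
    """Anneal from t_high until the temperature drops to t_low (assumed rule)."""
    dim = len(data[0])  # assumption: state-space dimension D = data dimension
    step, temp = 0, t_high
    while True:
        for _ in range(max_inner):  # steps (3) to (5) at a fixed temperature
            u = tsallis_u(data, centers, 1.0 / temp, q)
            new = centers_from(data, u, q)
            shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(new, centers))
            centers = new
            if shift < delta:
                break
        if temp <= t_low:  # step (6): hypothetical termination test
            return centers, temp
        step += 1
        temp = t_high * math.exp(-r * step ** (1.0 / dim))  # VFA cooling

# Two square-shaped groups of points (hypothetical toy data).
data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
        [10.0, 10.0], [11.0, 10.0], [10.0, 11.0], [11.0, 11.0]]
final_centers, final_temp = very_fast_da(
    data, centers=[[2.0, 1.0], [8.0, 9.0]], t_high=5.0, r=1.0)
final_centers.sort()
```

On this toy data the two centers end close to the means of the two groups, (0.5, 0.5) and (10.5, 10.5). Replacing `tsallis_u` and `centers_from` with their Shannon counterparts gives the Shannon-entropy-based variant of the same loop.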
6. Experiments
6.1. Experiment 1
In experiment 1, we generated five randomly placed clusters composed of 2,000 data points shown in Figure 4. We set to be 10, to be 50, and to be 2 (measured by the scale of Figure 4). We also set or .
First, we applied the inversely exponential scheduling method to the Tsallis-entropy-based FCM clustering. The cooling schedule is illustrated in Figure 5. The changes are parameterized by the temperature reduction rate, which ranges from 1 to 1000.
At higher levels of $T$ (Figure 5 (A)), clusters are created near the center of gravity of the data because the inverse temperature $\beta = 1/T$ is comparatively small and the membership function extends over the whole data area and is extremely uniform. As $T$ is lowered from Figure 5 (B) to (C), the width of the membership functions becomes narrower; that is, the Tsallis entropy decreases, and the associations become less fuzzy. Finally, the desired result is obtained.
In case of or (Figure 5 (E) or (F)), it is observed that and converge more rapidly. In case of , however, the initial distribution of becomes too wide and the algorithm is not converged with and (indicated by “not converged” in Figure 5). Thus, it is important to set and values properly.
To examine the effectiveness of VFA as a cooling schedule of DA, we performed numerical experiments on the Shannon- and Tsallis-entropy-based FCM clustering.
The shifts of cluster centers with decreasing temperature are illustrated in Figures 7 and 8.
Initially, clusters are located randomly. At higher levels of $T$, clusters move toward the center of gravity of the data because the inverse temperature $\beta = 1/T$ is comparatively small and the membership function extends over the whole data area and is extremely uniform.
As $T$ is lowered, the width of the membership functions becomes narrower and the associations of data become less fuzzy. In this process, in the Shannon-entropy-based FCM clustering, the clusters move to their nearest local data distribution centers. In the Tsallis-entropy-based FCM clustering, however, clusters can move a long distance to optimal positions because of the gentle base slopes of the membership functions.
Figures 9 and 10 illustrate three-dimensional plots of the membership functions during the progress of very fast DA clustering.
At the higher temperature, roughness of is smaller than that of . After that, the shapes of both membership functions do not change greatly, because VFA reduces the temperature extremely only at the early annealing stage.
Consequently, because the membership function of the Tsallis-entropy-based FCM has a gentle slope in the region far from the origin, clusters can move long distances to optimal positions stably, and the temperature can be reduced rapidly. This feature makes it possible to use VFA as a cooling schedule of DA for the Tsallis-entropy-based FCM. On the other hand, the final cluster positions obtained by the Shannon-entropy-based FCM tend to depend on their initial positions.
6.2. Experiment 2
In experiment 2, the iris data set [13], consisting of 150 four-dimensional vectors of iris flowers, is used. The three clusters of flowers to be detected are Versicolor, Virginica, and Setosa. Each cluster consists of 50 vectors.
The Shannon- and Tsallis-entropy-based FCMs with DA are examined. VFA is used as a cooling schedule of DA. We set the parameters as follows: , , , , and .
The minimum, maximum, and average numbers of misclassified data over 100 trials are summarized in Table 1. The Shannon-entropy-based FCM gives slightly better results than the Tsallis-entropy-based FCM. However, it is found that the Tsallis-entropy-based FCM gives the best results when the temperature reduction rate or 2.0, though the best results for the Shannon-entropy-based FCM are obtained only when . Furthermore, the variances of the Tsallis-entropy-based FCM are smaller than those of the Shannon-entropy-based FCM. These features indicate that a wide range of values is applicable to the Tsallis-entropy-based FCM.

Figure 6 shows the reduction of the objective values of the Tsallis- and Shannon-entropy-based FCMs as the temperature is decreased by VFA. The Shannon-entropy-based FCM does not converge properly when for and for . That is, with larger values, the Shannon-entropy-based FCM becomes unstable.
7. Conclusion
By maximizing the Tsallis entropy, the membership function of the Tsallis-entropy-based FCM is formulated. It has a gentler base slope in the region far from the origin than those of the Shannon- and fuzzy-entropy-based FCMs. This feature allows clusters to move long distances and allows the temperature to be reduced rapidly in the Tsallis-entropy-based FCM.
Next, the deterministic annealing (DA) method using very fast annealing (VFA) as its cooling schedule is applied to the Tsallis-entropy-based FCM. VFA lowers the temperature very steeply in the initial stage, and the experimental results show that the Tsallis-entropy-based FCM is suitable for DA combined with VFA.
Our future work includes convergence and computational time tests of the Tsallis-entropy-based FCM using very fast deterministic annealing under various conditions (temperatures and parameter values). It also includes experiments on and examinations of its applications.
Acknowledgment
This research has been supported by the Kayamori Foundation of Informational Science Advancement.
References
[1] K. Rose, E. Gurewitz, and G. C. Fox, “A deterministic annealing approach to clustering,” Pattern Recognition Letters, vol. 11, no. 9, pp. 589–594, 1990.
[2] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
[3] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, NY, USA, 1981.
[4] R.-P. Li and M. Mukaidono, “A maximum-entropy approach to fuzzy clustering,” in Proceedings of the 4th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE/IFES ’95), pp. 2227–2232, March 1995.
[5] S. Miyamoto and M. Mukaidono, “Fuzzy c-means as a regularization and maximum entropy approach,” in Proceedings of the 7th International Fuzzy Systems Association World Congress, vol. II, pp. 86–92, 1997.
[6] A. De Luca and S. Termini, “A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory,” Information and Control, vol. 20, no. 4, pp. 301–312, 1972.
[7] M. Yasuda, T. Furuhashi, M. Matsuzaki, and S. Okuma, “Fuzzy clustering using deterministic annealing method and its statistical mechanical characteristics,” in Proceedings of the 10th IEEE International Conference on Fuzzy Systems, pp. 797–800, December 2001.
[8] C. Tsallis, “Possible generalization of Boltzmann-Gibbs statistics,” Journal of Statistical Physics, vol. 52, no. 1-2, pp. 479–487, 1988.
[9] M. Ménard, V. Courboulay, and P. A. Dardignac, “Possibilistic and probabilistic fuzzy clustering: unification within the framework of the nonextensive thermostatistics,” Pattern Recognition, vol. 36, no. 6, pp. 1325–1342, 2003.
[10] M. Yasuda, “Entropy maximization and deterministic annealing approach to fuzzy c-means clustering,” in Proceedings of Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems (SCIS&ISIS '08), 2008.
[11] M. Yasuda, “Entropy maximization and very fast deterministic annealing approach to fuzzy c-means clustering,” in Proceedings of the Joint 5th International Conference on Soft Computing and Intelligent Systems and 11th International Symposium on Advanced Intelligent Systems, 2008.
[12] B. E. Rosen, “Function optimization based on advanced simulated annealing,” in Proceedings of the IEEE Workshop on Physics and Computation (PhysComp '92), pp. 289–293, 1992.
[13] R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, part 2, pp. 179–188, 1936.
[14] M. Yasuda, T. Furuhashi, and S. Okuma, “Entropy based fuzzy c-means clustering: analogy with statistical mechanics,” Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, vol. 17, no. 4, pp. 468–476, 2005.
[15] L. E. Reichl, A Modern Course in Statistical Physics, John Wiley & Sons, New York, NY, USA, 1998.
[16] H. Szu and R. Hartley, “Fast simulated annealing,” Physics Letters A, vol. 122, no. 3-4, pp. 157–162, 1987.
Copyright
Copyright © 2011 Makoto Yasuda. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.