`Advances in Fuzzy Systems, Volume 2011 (2011), Article ID 960635, 9 pages, http://dx.doi.org/10.1155/2011/960635`
Research Article

## Deterministic Annealing Approach to Fuzzy C-Means Clustering Based on Entropy Maximization

Makoto Yasuda

Department of Electrical and Computer Engineering, Gifu National College of Technology, Kamimakuwa 2236-2, Motosu, Gifu 501-0495, Japan

Received 13 June 2011; Revised 26 August 2011; Accepted 27 August 2011

Academic Editor: Salvatore Sessa

Copyright © 2011 Makoto Yasuda. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper deals with a fuzzy clustering method that combines the deterministic annealing (DA) approach with an entropy, in particular the Shannon entropy and the Tsallis entropy. By maximizing the Shannon entropy, the fuzzy entropy, or the Tsallis entropy within the framework of the fuzzy c-means (FCM) method, membership functions similar to the statistical mechanical distribution functions are obtained. We examine the characteristics of these entropy-based membership functions from the statistical mechanical point of view. Both the Shannon- and Tsallis-entropy-based FCMs are then formulated as DA clustering using the very fast annealing (VFA) method as a cooling schedule. Experimental results indicate that the Tsallis-entropy-based FCM remains stable under very fast deterministic annealing and is well suited to this annealing process.

#### 1. Introduction

Statistical mechanics investigates the macroscopic properties of a physical system consisting of several elements. Recently, research activities that attempt to apply statistical mechanical models or tools to information science have become popular.

The deterministic annealing (DA) method [1] is a deterministic variant of the simulated annealing (SA) method [2]. DA recasts the minimization of a cost function as the minimization of a temperature-dependent free energy and tracks the minimum of that free energy while the temperature is lowered, so the cost function is optimized deterministically at each temperature. Hence, DA is more efficient than SA, but it does not guarantee a globally optimal solution.

There exists a strong relationship between the membership functions of the fuzzy c-means (FCM) clustering [3] with the maximum entropy or entropy regularization methods [4, 5] and the statistical mechanical distribution function. That is, FCM regularized with the Shannon entropy gives a membership function similar to the Boltzmann (or Gibbs) distribution function [1, 4], and FCM regularized with the fuzzy entropy [6] gives a membership function similar to the Fermi-Dirac distribution function [7]. These membership functions are suitable for the annealing methods because they contain a parameter corresponding to the system temperature.

Tsallis [8] achieved a nonextensive extension of the Boltzmann-Gibbs statistics. Tsallis postulated a generalized form of entropy with a generalization parameter $q$, which, in the limit of $q \to 1$, reaches the Shannon entropy. Later on, Ménard et al. [9] derived a membership function by regularizing FCM with the Tsallis entropy.

In this study, the membership function which takes the familiar form of the statistical mechanical distribution function is derived by maximizing the Shannon and fuzzy entropy within the framework of FCM. Similarly, the Tsallis entropy-based FCM membership function is derived [10, 11] by maximizing the Tsallis entropy. Then, the formulations of the free energy for these membership functions are calculated and examined from the statistical mechanical viewpoint.

As for SA, several representative cooling schedules of the temperature exist; for example, schedules inversely proportional to a logarithmic function and schedules inversely proportional to an exponential function are widely adopted. Rosen [12] proposed a more effective schedule for SA known as very fast annealing (VFA).

However, the applicability of VFA to DA is not yet known. To achieve good clustering by DA, a reliable annealing process is desirable. Therefore, by introducing VFA into DA, we formulate the Shannon- and Tsallis-entropy-based FCMs as very fast DA clustering and examine their reliability.

Experiments are performed on numerical data and on the iris data [13], and the results indicate that Tsallis-entropy-based FCM clustering is suitable for very fast DA clustering because of the shape of its membership function.

#### 2. Entropy Maximization Method

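Let $X = \{x_1, \ldots, x_n\}$ ($x_k \in \mathbb{R}^p$) be a data set in the $p$-dimensional real space, which should be divided into $c$ clusters. In addition, let $V = \{v_1, \ldots, v_c\}$ be the centers of clusters, and let $u_{ik} \in [0, 1]$ ($i = 1, \ldots, c$; $k = 1, \ldots, n$) be the membership functions. Furthermore, let

$$J = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m}\, d_{ik} \tag{1}$$

be the objective function of FCM, where $d_{ik} = \|x_k - v_i\|^2$.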

##### 2.1. Shannon Entropy Maximization of FCM

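First, we introduce the Shannon entropy into the FCM clustering. The Shannon entropy is given by

$$S = -\sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} \log u_{ik}. \tag{2}$$

Under the normalization constraint of

$$\sum_{i=1}^{c} u_{ik} = 1 \quad (\forall k), \tag{3}$$

and setting $m$ to 1, the entropy functional is given by

$$\delta S - \sum_{k=1}^{n} \alpha_k\, \delta\!\left( \sum_{i=1}^{c} u_{ik} - 1 \right) - \beta \sum_{k=1}^{n} \sum_{i=1}^{c} \delta\left( u_{ik} d_{ik} \right), \tag{4}$$

where $\alpha_k$ and $\beta$ are the Lagrange multipliers, and $\alpha_k$ must be determined so as to satisfy (3). The stationary condition for (4) leads to the following membership function

$$u_{ik} = \frac{e^{-\beta d_{ik}}}{\sum_{j=1}^{c} e^{-\beta d_{jk}}} \tag{5}$$

and the cluster centers

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}\, x_k}{\sum_{k=1}^{n} u_{ik}}. \tag{6}$$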
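As a minimal sketch of the update that (5) and (6) describe, the Python fragment below assumes the Boltzmann-type form $u_{ik} \propto e^{-\beta d_{ik}}$ with squared Euclidean distances $d_{ik}$, and cluster centers given by membership-weighted means. The function names and the toy data are illustrative assumptions, not part of the original formulation.

```python
import math

def sq_dist(x, v):
    """Squared Euclidean distance d_ik = ||x_k - v_i||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))

def shannon_memberships(points, centers, beta):
    """Return u[i][k] = exp(-beta*d_ik) / sum_j exp(-beta*d_jk)."""
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        d = [sq_dist(x, v) for v in centers]
        d_min = min(d)  # shifting the exponents avoids underflow; the ratios are unchanged
        w = [math.exp(-beta * (di - d_min)) for di in d]
        z = sum(w)
        for i, wi in enumerate(w):
            u[i][k] = wi / z
    return u

def update_centers(points, u):
    """Return the membership-weighted means v_i = sum_k u_ik x_k / sum_k u_ik."""
    dim = len(points[0])
    centers = []
    for row in u:
        total = sum(row)
        centers.append(tuple(sum(m * x[j] for m, x in zip(row, points)) / total
                             for j in range(dim)))
    return centers

# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centers = [(0.0, 0.5), (5.0, 5.5)]
u = shannon_memberships(points, centers, beta=1.0)
new_centers = update_centers(points, u)
```

A small `beta` (high temperature) makes every column of `u` nearly uniform, while a large `beta` drives the memberships toward a hard assignment.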

##### 2.2. Fuzzy Entropy Maximization of FCM

We then introduce the fuzzy entropy into the FCM clustering.

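The fuzzy entropy is given by

$$\hat{S} = -\sum_{k=1}^{n} \sum_{i=1}^{c} \left\{ u_{ik} \log u_{ik} + (1 - u_{ik}) \log (1 - u_{ik}) \right\}. \tag{7}$$

The fuzzy entropy functional is given by

$$\delta \hat{S} - \sum_{k=1}^{n} \alpha_k\, \delta\!\left( \sum_{i=1}^{c} u_{ik} - 1 \right) - \beta \sum_{k=1}^{n} \sum_{i=1}^{c} \delta\left( u_{ik} d_{ik} \right), \tag{8}$$

where $\alpha_k$ and $\beta$ are the Lagrange multipliers [14]. The stationary condition for (8) leads to the following membership function:

$$u_{ik} = \frac{1}{e^{\alpha_k + \beta d_{ik}} + 1}, \tag{9}$$

and the cluster centers

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}\, x_k}{\sum_{k=1}^{n} u_{ik}}. \tag{10}$$

In (9), $\beta$ defines the extent of the distribution [7]. Equation (9) is formally normalized as

$$u_{ik} = \frac{\left( e^{\alpha_k + \beta d_{ik}} + 1 \right)^{-1}}{\sum_{j=1}^{c} \left( e^{\alpha_k + \beta d_{jk}} + 1 \right)^{-1}}. \tag{11}$$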
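The role of the multiplier $\alpha_k$ can be illustrated numerically. The sketch below assumes the Fermi-Dirac-type form $u_{ik} = 1/(e^{\alpha_k + \beta d_{ik}} + 1)$ and finds, for a single data point, the $\alpha_k$ for which the memberships sum to one. Solving for $\alpha_k$ by bisection is a choice made here for illustration and is not taken from the paper, which normalizes (9) formally; the function names are hypothetical.

```python
import math

def fermi(z):
    """Numerically stable evaluation of 1 / (exp(z) + 1)."""
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fermi_dirac_memberships(d, beta, iters=200):
    """Memberships of one data point with squared distances d to c >= 2 centers.

    alpha is chosen by bisection so that sum_i 1/(exp(alpha + beta*d_i) + 1) = 1.
    """
    c = len(d)
    if c < 2:
        raise ValueError("at least two clusters are required")
    lo = -beta * max(d)                      # every term >= 1/2, so the sum is >= 1
    hi = -beta * min(d) + math.log(c - 1.0)  # every term <= 1/c, so the sum is <= 1
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        total = sum(fermi(alpha + beta * di) for di in d)
        if total > 1.0:
            lo = alpha  # the sum decreases with alpha, so the root lies above
        else:
            hi = alpha
    alpha = 0.5 * (lo + hi)
    return [fermi(alpha + beta * di) for di in d], alpha

# Hypothetical squared distances from one data point to three centers.
memberships, alpha = fermi_dirac_memberships([1.0, 4.0, 9.0], beta=1.0)
```

Because each term is bounded by one, the membership saturates for nearby centers instead of peaking sharply, which is the flat-peak behavior associated with this distribution.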

##### 2.3. Tsallis Entropy Maximization of FCM

Let $\tilde{v}_i$ and $\tilde{u}_{ik}$ be the centers of clusters and the membership functions, respectively.

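The Tsallis entropy is defined as

$$S_q = -\frac{1}{q - 1} \left( \sum_{k=1}^{n} \sum_{i=1}^{c} \tilde{u}_{ik}^{\,q} - 1 \right), \tag{12}$$

where $q$ is any real number. The objective function is rewritten as

$$U_q = \sum_{k=1}^{n} \sum_{i=1}^{c} \tilde{u}_{ik}^{\,q}\, \tilde{d}_{ik}, \tag{13}$$

where $\tilde{d}_{ik} = \|x_k - \tilde{v}_i\|^2$.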

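Accordingly, the Tsallis entropy functional is given by

$$\delta S_q - \sum_{k=1}^{n} \alpha_k\, \delta\!\left( \sum_{i=1}^{c} \tilde{u}_{ik} - 1 \right) - \beta \sum_{k=1}^{n} \sum_{i=1}^{c} \delta\left( \tilde{u}_{ik}^{\,q}\, \tilde{d}_{ik} \right). \tag{14}$$

The stationary condition for (14) yields the following membership function:

$$\tilde{u}_{ik} = \frac{\left\{ 1 - \beta (1 - q)\, \tilde{d}_{ik} \right\}^{1/(1-q)}}{\tilde{Z}_k}, \tag{15}$$

where

$$\tilde{Z}_k = \sum_{j=1}^{c} \left\{ 1 - \beta (1 - q)\, \tilde{d}_{jk} \right\}^{1/(1-q)}. \tag{16}$$

In this case, the cluster centers are given by

$$\tilde{v}_i = \frac{\sum_{k=1}^{n} \tilde{u}_{ik}^{\,q}\, x_k}{\sum_{k=1}^{n} \tilde{u}_{ik}^{\,q}}. \tag{17}$$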

In the limit of $q \to 1$, the Tsallis entropy recovers the Shannon entropy [8] and $\tilde{u}_{ik}$ approaches $u_{ik}$ in (5).
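The limit can be checked numerically. The sketch below assumes the Tsallis-type membership $\tilde{u}_{ik} \propto \{1 - \beta(1-q)\, d_{ik}\}^{1/(1-q)}$ and the Boltzmann-type membership $u_{ik} \propto e^{-\beta d_{ik}}$, and is restricted to $q > 1$ so that the base of the power stays positive. The function names and the distances are hypothetical.

```python
import math

def tsallis_membership(d, beta, q):
    """Memberships of one data point: {1 - beta*(1-q)*d_i}^(1/(1-q)), normalized."""
    if q <= 1.0:
        raise ValueError("this sketch only covers q > 1")
    w = [(1.0 - beta * (1.0 - q) * di) ** (1.0 / (1.0 - q)) for di in d]
    z = sum(w)
    return [wi / z for wi in w]

def shannon_membership(d, beta):
    """Memberships of one data point: exp(-beta*d_i), normalized."""
    w = [math.exp(-beta * di) for di in d]
    z = sum(w)
    return [wi / z for wi in w]

# q close to 1: the Tsallis form is numerically close to the Boltzmann form.
near_limit = tsallis_membership([1.0, 4.0], beta=1.0, q=1.0 + 1e-6)
boltzmann = shannon_membership([1.0, 4.0], beta=1.0)

# q = 2: the far center keeps a much larger share than under the Boltzmann form.
heavy_tail = tsallis_membership([1.0, 9.0], beta=1.0, q=2.0)
light_tail = shannon_membership([1.0, 9.0], beta=1.0)
```

For $q = 2$ the weights reduce to $1/(1 + \beta d)$, which decays polynomially rather than exponentially with distance.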

#### 3. Statistical Mechanical Interpretation of Entropy-Based FCM

##### 3.1. Shannon-Entropy-Based FCM Statistics

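In the Shannon-entropy-based FCM, the sum of the states (the partition function) for the grand canonical ensemble of fuzzy clustering can be written as

$$Z = \prod_{k=1}^{n} \sum_{i=1}^{c} e^{-\beta d_{ik}}. \tag{18}$$

Substituting (18) into $F = -(1/\beta) \log Z$ [15], the free energy becomes

$$F = -\frac{1}{\beta} \sum_{k=1}^{n} \log \sum_{i=1}^{c} e^{-\beta d_{ik}}. \tag{19}$$

Stable thermal equilibrium requires a minimization of the free energy. By formulating deterministic annealing as a minimization of the free energy, $\partial F / \partial v_i = 0$ yields

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}\, x_k}{\sum_{k=1}^{n} u_{ik}}. \tag{20}$$

This cluster center is the same as that in (6).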
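The step from the free energy to the cluster centers can be written out explicitly. The following is a sketch that assumes the Boltzmann-type partition function $Z = \prod_k \sum_i e^{-\beta d_{ik}}$ with $d_{ik} = \|x_k - v_i\|^2$, the relation $F = -\beta^{-1} \log Z$, and the membership function $u_{ik} = e^{-\beta d_{ik}} / \sum_j e^{-\beta d_{jk}}$.

```latex
\begin{aligned}
\frac{\partial F}{\partial v_i}
  &= -\frac{1}{\beta} \sum_{k=1}^{n} \frac{\partial}{\partial v_i}
     \log \sum_{j=1}^{c} e^{-\beta d_{jk}}
   = \sum_{k=1}^{n} \frac{e^{-\beta d_{ik}}}{\sum_{j=1}^{c} e^{-\beta d_{jk}}}\,
     \frac{\partial d_{ik}}{\partial v_i} \\
  &= -2 \sum_{k=1}^{n} u_{ik}\, (x_k - v_i) = 0
  \quad \Longrightarrow \quad
  v_i = \frac{\sum_{k=1}^{n} u_{ik}\, x_k}{\sum_{k=1}^{n} u_{ik}} .
\end{aligned}
```

Only the $i$-th term of the inner sum depends on $v_i$, which is why a single membership $u_{ik}$ appears as the weight of each data point.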

##### 3.2. Fuzzy-Entropy-Based FCM Statistics

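In the fuzzy-entropy-based FCM, by analogy with statistical mechanics, the grand partition function for the grand canonical ensemble of fuzzy clustering can be written as

$$\Xi = \prod_{k=1}^{n} \prod_{i=1}^{c} \left( 1 + e^{-\alpha_k - \beta d_{ik}} \right), \tag{21}$$

because data can belong to any cluster. Substituting (21) into $F = -(1/\beta) \left( \log \Xi - \sum_{k} \alpha_k\, \partial \log \Xi / \partial \alpha_k \right)$ [15], and using the normalization $\sum_i u_{ik} = 1$, the free energy becomes

$$F = -\frac{1}{\beta} \left\{ \sum_{k=1}^{n} \sum_{i=1}^{c} \log \left( 1 + e^{-\alpha_k - \beta d_{ik}} \right) + \sum_{k=1}^{n} \alpha_k \right\}. \tag{22}$$

It should be noted that $J - \hat{S}/\beta$, the Legendre transform of the fuzzy entropy, gives the same form for the free energy.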

##### 3.3. The Tsallis-Entropy-Based FCM Statistics

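In the Tsallis-entropy-based FCM, on the other hand, $U_q$ and $S_q$ satisfy

$$S_q - \beta U_q = \frac{\sum_{k=1}^{n} \tilde{Z}_k^{\,1-q} - 1}{1 - q}, \tag{23}$$

where $\tilde{Z}_k$ is the normalization factor in (16), which leads to

$$\frac{\partial S_q}{\partial U_q} = \beta. \tag{24}$$

Equation (24) makes it possible to regard $\beta^{-1}$ as an artificial system temperature $T$ [15]. Then, the free energy can be defined as

$$F_q = U_q - T S_q = -\frac{1}{\beta}\, \frac{\sum_{k=1}^{n} \tilde{Z}_k^{\,1-q} - 1}{1 - q}. \tag{25}$$

$U_q$ can be derived from $F_q$ as

$$U_q = \frac{\partial (\beta F_q)}{\partial \beta} = -\frac{\partial}{\partial \beta}\, \frac{\sum_{k=1}^{n} \tilde{Z}_k^{\,1-q} - 1}{1 - q}. \tag{26}$$

$\partial F_q / \partial \tilde{v}_i = 0$ also gives

$$\tilde{v}_i = \frac{\sum_{k=1}^{n} \tilde{u}_{ik}^{\,q}\, x_k}{\sum_{k=1}^{n} \tilde{u}_{ik}^{\,q}}. \tag{27}$$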

#### 4. Effects of Annealing Temperature

##### 4.1. Dependency of Shapes of Membership Functions on Temperature

By reducing the temperature according to the annealing schedule, the deterministic annealing method reaches, at each temperature, the thermal equilibrium that minimizes the free energy. At absolute zero, the particle system settles into the ground state, that is, the state of minimum energy. Figure 1 shows the forms of the Shannon, fuzzy, and Tsallis entropy functions. Figure 2 shows the forms of the corresponding membership functions.

Figure 1: Plots of the Shannon, fuzzy, and Tsallis entropy functions at (a) high and (b) low temperature.
Figure 2: Plots of the Shannon-, fuzzy-, and Tsallis-entropy-based membership functions at (a) high and (b) low temperature.

In the deterministic annealing method, the cluster distribution that minimizes the free energy is sought at each given temperature. At high temperature, the membership functions are widely distributed and the clusters to which a data point belongs are fuzzy. In case of with , the width of the distribution is roughly proportional to . In the limit of low temperature, on the other hand, fuzzy clustering approaches hard clustering. The thermodynamic relationship $F = U - TS$ between the free energy $F$, the internal energy $U$, the temperature $T$, and the entropy $S$ suggests that a higher temperature favors a state of larger entropy, that is, a more disordered state. This increase of the entropy results from the broadening of the membership function.

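In Figure 2, the fuzzy-entropy-based membership function has a flat peak, whereas the Shannon- and Tsallis-entropy-based membership functions both have Gaussian forms. The Tsallis-entropy-based membership function also has a gentler base slope than the Shannon-entropy-based one.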

##### 4.2. Cooling Schedule
###### 4.2.1. Representative Annealing Methods

In SA, the temperature decreases according to a cooling schedule. The representative cooling schedules [16] for SA are

(i) proportional to an exponential function

$$T = T_0\, r^{t}, \tag{28}$$

where $T_0$ is a sufficiently high initial temperature, $r$ is a parameter which defines the temperature reduction speed, and $t$ is the number of iterations,

(ii) inversely linear function

$$T = \frac{T_0}{t}, \tag{29}$$

(iii) inversely proportional to a logarithmic function

$$T = \frac{T_0}{\ln t}, \tag{30}$$

(iv) inversely proportional to an exponential function

$$T = \frac{T_0}{e^{rt}}. \tag{31}$$

###### 4.2.2. Very Fast Annealing

Rosen [12] proposed another schedule that is inversely proportional to an exponential function, known as very fast annealing (VFA).

In VFA, $T$ is given by

$$T = T_0 \exp\left( -c\, t^{1/D} \right), \tag{32}$$

where $c$ is a temperature reduction parameter and $D$ is the dimension of the state space. Equations (31) and (32) are compared in Figure 3. VFA reduces the temperature very sharply during the first iterations.
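The shape of the VFA schedule can be tabulated with a few lines of Python. This is a sketch that assumes the commonly quoted VFA form $T = T_0 \exp(-c\, t^{1/D})$; the function name and the parameter values are illustrative assumptions and are not the settings used in the experiments.

```python
import math

def vfa_temperature(t0, c, t, dim):
    """Very fast annealing schedule (assumed form): T = T0 * exp(-c * t**(1/D))."""
    return t0 * math.exp(-c * t ** (1.0 / dim))

# Hypothetical settings: T0 = 100, c = 1, D = 2.
temps = [vfa_temperature(100.0, 1.0, t, 2) for t in range(6)]
drops = [temps[t] - temps[t + 1] for t in range(5)]

for t, temp in enumerate(temps):
    print(f"t = {t}: T = {temp:.3f}")
```

The first decrement is by far the largest and later decrements shrink quickly, which matches the description of a schedule that cools sharply only at the early annealing stage.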

Figure 3: Plots of the cooling functions of (a) the exponential schedule and (b) the very fast annealing method.

In Section 6, we apply VFA as a cooling schedule of entropy based FCM clustering using DA.

#### 5. Fuzzy C-Means as Clustering Algorithm Using Very Fast Annealing DA

The very fast deterministic annealing algorithm for the Tsallis-entropy-based FCM is given as follows.

(1) Set the number of clusters $c$, the highest temperature, the temperature reduction rate, and the two thresholds of the convergence tests;

(2) generate initial clusters at random positions and assign each data point to the nearest cluster. Set the current temperature $T$ to the highest temperature;

(3) calculate the membership functions by (15);

(4) calculate the cluster centers by (17);

(5) compare the current centers with those obtained at the previous iteration. If the convergence condition on their difference is satisfied, then go to (6); otherwise go back to (3);

(6) if the termination condition is satisfied, then stop. Otherwise decrease the temperature with (32) and go back to (3).

For the Shannon-entropy-based FCM, (15) and (17) are replaced by (5) and (6), respectively.
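The steps above can be sketched in plain Python. The sketch assumes the Tsallis-type membership $\tilde{u}_{ik} \propto \{1 - \beta(1-q)\, d_{ik}\}^{1/(1-q)}$ with $q > 1$, centers weighted by $\tilde{u}_{ik}^{\,q}$, and the VFA form $T = T_{\mathrm{high}} \exp(-c\, t^{1/D})$ with $D$ taken as the data dimension. The stopping rule in step (6) (a lower temperature bound `t_low`), the optional `init_centers` argument, and all names and values are assumptions made for this illustration.

```python
import math
import random

def sq_dist(x, v):
    """Squared Euclidean distance between a data point and a center."""
    return sum((a - b) ** 2 for a, b in zip(x, v))

def tsallis_memberships(points, centers, beta, q):
    """u[i][k] proportional to {1 - beta*(1-q)*d_ik}^(1/(1-q)); assumes q > 1."""
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        w = [(1.0 - beta * (1.0 - q) * sq_dist(x, v)) ** (1.0 / (1.0 - q))
             for v in centers]
        z = sum(w)
        for i, wi in enumerate(w):
            u[i][k] = wi / z
    return u

def update_centers(points, u, q):
    """Centers as means of the data weighted by u_ik^q."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [m ** q for m in row]
        total = sum(w)
        centers.append(tuple(sum(wk * x[j] for wk, x in zip(w, points)) / total
                             for j in range(dim)))
    return centers

def vfa_da_fcm(points, n_clusters, t_high, rate, q=2.0, t_low=0.01,
               delta=1e-6, max_inner=500, init_centers=None, seed=0):
    """Very fast DA clustering sketch; returns (centers, final temperature)."""
    rng = random.Random(seed)
    dim = len(points[0])
    if init_centers is None:
        # Step (2): random initial positions inside the bounding box of the data.
        lo = [min(x[j] for x in points) for j in range(dim)]
        hi = [max(x[j] for x in points) for j in range(dim)]
        centers = [tuple(rng.uniform(lo[j], hi[j]) for j in range(dim))
                   for _ in range(n_clusters)]
    else:
        centers = [tuple(v) for v in init_centers]
    level = 0
    while True:
        # VFA schedule (assumed form): T = T_high * exp(-rate * level**(1/D)).
        temp = t_high * math.exp(-rate * level ** (1.0 / dim))
        for _ in range(max_inner):
            # Steps (3)-(5): alternate memberships and centers until they settle.
            u = tsallis_memberships(points, centers, 1.0 / temp, q)
            new_centers = update_centers(points, u, q)
            shift = max(math.sqrt(sq_dist(a, b))
                        for a, b in zip(new_centers, centers))
            centers = new_centers
            if shift < delta:
                break
        # Step (6), assumed stopping rule: stop once T has fallen to t_low.
        if temp <= t_low:
            return centers, temp
        level += 1

# Toy data (hypothetical): two square blobs around (0.5, 0.5) and (10.5, 10.5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, final_temp = vfa_da_fcm(points, 2, t_high=10.0, rate=1.0,
                                 init_centers=[(2.0, 2.0), (9.0, 9.0)])
```

The explicit assignment to the nearest cluster in step (2) is omitted because the memberships are recomputed from the centers in step (3). Fixed initial centers are used in the example only to make it reproducible; leaving `init_centers` as `None` follows the random initialization of step (2).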

#### 6. Experiments

##### 6.1. Experiment 1

In experiment 1, we generated five randomly placed clusters composed of 2,000 data points shown in Figure 4. We set to be 10, to be 50, and to be 2 (measured by the scale of Figure 4). We also set or .

Figure 4: The numerical data.

First, we have applied the inversely exponential scheduling method to the Tsallis-entropy-based FCM clustering. The cooling schedule is illustrated in Figure 5. The changes of are parameterized by the temperature reduction rate : from 1 to 1000.

Figure 5: The inversely exponential cooling schedule of DA. The temperature decreases from . (Inverse of temperature increases from or to ) The curves are parameterized by the temperature reduction rate .

At the higher levels of $T$ (Figure 5 (A)), clusters are created near the center of gravity of the data because $\beta$ is comparatively small and the membership function extends over the whole data area and is extremely uniform. As $T$ is lowered from Figure 5 (B) to (C), the width of the membership functions becomes narrower; that is, the Tsallis entropy decreases, and the associations become less fuzzy. Finally, the desired result is obtained.

In case of or (Figure 5 (E) or (F)), it is observed that and converge more rapidly. In case of , however, the initial distribution of becomes too wide and the algorithm is not converged with and (indicated by “not converged” in Figure 5). Thus, it is important to set and values properly.

To examine the effectiveness of VFA as a cooling schedule of DA, we conducted numerical experiments with the Shannon- and Tsallis-entropy-based FCM clustering.

The shifts of cluster centers with decreasing temperature are illustrated in Figures 7 and 8.

Initially, clusters are located randomly. At the higher levels of $T$, clusters move toward the center of gravity of the data because $\beta$ is comparatively small and the membership function extends over the whole data area and is extremely uniform.

As $T$ is lowered, the width of the membership functions becomes narrower and the associations of data become less fuzzy. In this process, in the Shannon-entropy-based FCM clustering, the clusters move to their nearest local data distribution centers. In the Tsallis-entropy-based FCM clustering, however, clusters can move a long distance to optimal positions because of the gentle base slopes of the membership functions.

Figures 9 and 10 illustrate three-dimensional plots of the Shannon- and Tsallis-entropy-based membership functions, respectively, in the progress of very fast DA clustering.

At the higher temperature, roughness of is smaller than that of . After that, the shapes of both membership functions do not change greatly, because VFA reduces the temperature extremely only at the early annealing stage.

Consequently, because the membership function of the Tsallis-entropy-based FCM has a gentle slope in the region far from the origin, clusters can move long distances to optimal positions stably, and the temperature can be reduced rapidly. This feature makes it possible to use VFA as a cooling schedule of DA for the Tsallis-entropy-based FCM. On the other hand, the final cluster positions obtained by the Shannon-entropy-based FCM tend to depend on their initial positions.

##### 6.2. Experiment 2

In experiment 2, the iris data set [13], consisting of 150 four-dimensional vectors of iris flowers, is used. The three clusters of flowers to be detected are Versicolor, Virginica, and Setosa. Each cluster consists of 50 vectors.

The Shannon- and Tsallis-entropy-based FCM with DA are examined. VFA is used as a cooling schedule of DA. We set the parameters as follows: , , , , and .

The minimum, maximum, and average values of misclassified data of 100 trials are summarized in Table 1. The Shannon-entropy-based FCM gives slightly better results than the Tsallis-entropy-based FCM. However, it is found that the Tsallis-entropy-based FCM gives the best results when the temperature reduction rate or 2.0, though the best results for the Shannon-entropy-based FCM are obtained only when . Furthermore, variances of the Tsallis-entropy-based FCM are smaller than those of the Shannon-entropy-based FCM. These features indicate that a wide range of values are applicable to Tsallis-entropy-based FCM.

Table 1: Comparison of minimum, maximum, and average values of misclassified iris data (100 trials).

Figure 6 shows the reduction of the objective values of the Tsallis- and Shannon-entropy-based FCM with decreasing the temperature by VFA. The Shannon-entropy-based FCM does not converge properly when for and for . That is, with larger values, the Shannon-entropy-based FCM becomes unstable.

Figure 6: Reduction of the objective values for the iris data as the temperature is decreased by VFA. The curves are parameterized by the temperature reduction rate.
Figure 7: The shifts of cluster centers in Shannon-entropy-based clustering as the temperature is decreased by VFA.
Figure 8: The shifts of cluster centers in Tsallis-entropy-based clustering as the temperature is decreased by VFA.
Figure 9: The changes of the landscape of the Shannon-entropy-based membership function as the temperature decreases.
Figure 10: The changes of the landscape of the Tsallis-entropy-based membership function as the temperature decreases.

#### 7. Conclusion

By maximizing the Tsallis entropy, the membership function of the Tsallis-entropy-based FCM is formulated. In the region far from the origin, it has a gentler base slope than the membership functions of the Shannon- and fuzzy-entropy-based FCMs. This feature allows clusters to move long distances and allows the temperature to be reduced rapidly in the Tsallis-entropy-based FCM.

Next, the deterministic annealing (DA) method using very fast annealing (VFA) as its cooling schedule is applied to the Tsallis-entropy-based FCM. VFA reduces the temperature very sharply at the early annealing stage, and the experimental results showed that the Tsallis-entropy-based FCM is suitable for DA combined with VFA.

Our future work includes convergence and computational time tests of the Tsallis-entropy-based FCM using very fast deterministic annealing under various conditions (temperatures and parameters, especially the $q$-value). It also includes experiments on, and examinations of, its applications.

#### Acknowledgment

This research has been supported by the Kayamori Foundation of Informational Science Advancement.

#### References

1. K. Rose, E. Gurewitz, and G. C. Fox, “A deterministic annealing approach to clustering,” Pattern Recognition Letters, vol. 11, no. 9, pp. 589–594, 1990.
2. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
3. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, NY, USA, 1981.
4. R.-P. Li and M. Mukaidono, “A maximum-entropy approach to fuzzy clustering,” in Proceedings of the 4th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE/IFES '95), pp. 2227–2232, March 1995.
5. S. Miyamoto and M. Mukaidono, “Fuzzy c-means as a regularization and maximum entropy approach,” in Proceedings of the 7th International Fuzzy Systems Association World Congress, vol. II, pp. 86–92, 1997.
6. A. De Luca and S. Termini, “A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory,” Information and Control, vol. 20, no. 4, pp. 301–312, 1972.
7. M. Yasuda, T. Furuhashi, M. Matsuzaki, and S. Okuma, “Fuzzy clustering using deterministic annealing method and its statistical mechanical characteristics,” in Proceedings of the 10th IEEE International Conference on Fuzzy Systems, pp. 797–800, December 2001.
8. C. Tsallis, “Possible generalization of Boltzmann-Gibbs statistics,” Journal of Statistical Physics, vol. 52, no. 1-2, pp. 479–487, 1988.
9. M. Ménard, V. Courboulay, and P. -A. Dardignac, “Possibilistic and probabilistic fuzzy clustering: unification within the framework of the non-extensive thermostatistics,” Pattern Recognition, vol. 36, no. 6, pp. 1325–1342, 2003.
10. M. Yasuda, “Entropy maximization and deterministic annealing approach to fuzzy c-means clustering,” in Proceedings of Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems (SCIS&ISIS '08), 2008.
11. M. Yasuda, “Entropy maximization and very fast deterministic annealing approach to fuzzy c-means clustering,” in Proceedings of the Joint 5th International Conference on Soft Computing and Intelligent Systems and 11th International Symposium on Advanced Intelligent Systems, 2008.
12. B. E. Rosen, “Function optimization based on advanced simulated annealing,” in Proceedings of the IEEE Workshop on Physics and Computation, (PhysComp '92), pp. 289–293, 1992.
13. R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, part 2, pp. 179–188, 1936.
14. M. Yasuda, T. Furuhashi, and S. Okuma, “Entropy based fuzzy c-means clustering: analogy with statistical mechanics,” Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, vol. 17, no. 4, pp. 468–476, 2005.
15. L. E. Reichl, A Modern Course in Statistical Physics, John Wiley & Sons, New York, NY, USA, 1998.
16. H. Szu and R. Hartley, “Fast simulated annealing,” Physics Letters A, vol. 122, no. 3-4, pp. 157–162, 1987.