Quantitative Analyses and Development of a $q$-Incrementation Algorithm for FCM with Tsallis Entropy Maximization
Tsallis entropy is a $q$-parameter extension of Shannon entropy. By extremizing the Tsallis entropy within the framework of fuzzy $c$-means clustering (FCM), a membership function similar to the statistical mechanical distribution function is obtained. The Tsallis entropy-based DA-FCM algorithm was developed by combining this membership function with the deterministic annealing (DA) method. One of the challenges of this method is to determine an appropriate initial annealing temperature and $q$ value according to the data distribution. This is complex, because the membership function changes its shape both when the temperature is decreased and when $q$ is increased. Quantitative relationships between the temperature and $q$ are examined, and the results show that, in order to change the membership function equally, inverse changes must be made to the temperature and $q$. Accordingly, in this paper, we propose and investigate two kinds of combinatorial methods for $q$-incrementation and the reduction of temperature for use in the Tsallis entropy-based FCM. In the proposed methods, $q$ is defined as a function of the temperature. Experiments are performed using Fisher's iris dataset, and the proposed methods are confirmed to determine an appropriate $q$ value in many cases.
1. Introduction
Statistical mechanics investigates the macroscopic properties of a physical system consisting of multiple elements. In recent years, a popular area of research has been the application of statistical mechanical models or tools to information science.
There exists a strong relationship between the membership functions of fuzzy $c$-means clustering (FCM) combined with the maximum entropy or entropy regularization methods [2, 3] and the statistical mechanical distribution functions. In other words, FCM, when regularized or maximized with a Shannon-like entropy, yields a membership function that is similar to the Boltzmann (or Gibbs) distribution function [2, 4], and when regularized or maximized with a fuzzy-like entropy, FCM yields a membership function similar to the Fermi-Dirac distribution function. These membership functions are suitable for annealing methods, because they contain a parameter corresponding to a system temperature. The advantage of using entropy maximization methods is that fuzzy clustering can be interpreted and analyzed from both statistical physical and information-processing points of view.
Tsallis achieved a nonextensive extension of Boltzmann-Gibbs statistics by postulating a generalized form of entropy with a generalization parameter $q$, which, in the limit as $q$ goes to 1, approaches the Shannon entropy. Tsallis entropy is applicable to numerous fields, including physics, chemistry, bioscience, networks, and computer science, and it has proved to be useful [8–10]. For example, Tsallis entropy can be applied to attribute selection in network intrusion detection. It can also be used as an optimization function for thresholding in image segmentation. In [13, 14], Menard et al. discussed fuzzy clustering in the framework of nonextensive thermostatistics. By taking the possibilistic constraint into account, the possibilistic membership function was derived, and its properties were considered from various viewpoints.
On the other hand, based on the Tsallis entropy, another form of entropy (or a measure of fuzziness) for a membership function can be defined. A form of the membership function can then be derived by extremizing (maximizing) this entropy within the framework of FCM. In comparison with the conventional entropy maximization methods [2, 3], this method yields superior results.
Deterministic annealing (DA) is a deterministic variant of simulated annealing, and it can be applied to clustering. By applying DA to FCM using Tsallis entropy, a DA-FCM algorithm using Tsallis entropy has been developed. As another application of DA, a $q$-parameterized DA expectation maximization algorithm has been proposed.
One of the important characteristics of the membership function of this method is that the centers of the clusters are given as a weighted function of the membership function to the power of $q$. We also note that the membership function changes its shape in a similar way when the system temperature is decreased (annealing) and when $q$ is increased. However, it remains unknown how an appropriate $q$ value and initial annealing temperature should be determined according to the data distribution.
The purpose of the present study is to overcome the above problem. This involves quantitative analyses of the relationships between the temperature and $q$ and the development of $q$-incrementation algorithms that integrate $q$ and the temperature.
The analyses show that the temperature and $q$ affect the shape of the membership function almost inversely. Based on these results, we developed two kinds of $q$-incrementation algorithms for Tsallis entropy-based FCM, in which $q$ is defined as a function of the temperature. These algorithms are compared with the conventional Tsallis entropy-based DA-FCM method.
In the first algorithm, $q$ is increased so as to maintain shapes of the membership function similar to those obtained with the conventional $T$-reduction method. In the second algorithm, $q$ is defined as the inverse of a decreasing pseudo-temperature.
Experiments were performed using Fisher's iris dataset, and it was confirmed that, in many cases, an appropriate $q$ value is determined automatically from the temperature. Furthermore, the proposed methods improve the accuracy of classification and are superior to the conventional method.
However, it was also found that the number of computation iterations depends on , and sometimes it becomes greater than that of the conventional method; this suggests that should be optimized to some extent.
2. Entropy Maximization Method
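Let $X = \{x_1, \ldots, x_n\}$ ($x_k \in \mathbb{R}^p$) be a data set in $p$-dimensional real space, which is to be divided into $c$ clusters. In addition, let $V = \{v_1, \ldots, v_c\}$ ($v_i \in \mathbb{R}^p$) be the centers of the clusters, and let $u_{ik} \in [0, 1]$ ($i = 1, \ldots, c$; $k = 1, \ldots, n$) be the membership functions. Furthermore, let
$$J = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m} \, \|x_k - v_i\|^2 \quad (m > 1) \tag{1}$$
be the FCM objective function that is to be minimized.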
2.1. Entropy Maximization for FCM
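The Tsallis entropy is defined as
$$S_q = -\frac{1}{q-1}\left(\sum_{i=1}^{W} p_i^{q} - 1\right), \tag{2}$$
where $W$ is the total number of microscopic possibilities of the system and $p_i$ is the probability of the $i$th possibility.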
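As a quick numerical illustration of this definition, the following minimal sketch evaluates the Tsallis entropy of a small probability vector and compares it with the Shannon entropy as $q$ approaches 1. The function names and the example distribution are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q = -(sum_i p_i^q - 1) / (q - 1), for q != 1."""
    p = np.asarray(p, dtype=float)
    return -(np.sum(p ** q) - 1.0) / (q - 1.0)

def shannon_entropy(p):
    """Shannon entropy -sum_i p_i ln p_i (terms with p_i = 0 are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical example distribution over W = 3 possibilities.
p = np.array([0.5, 0.3, 0.2])
for q in (2.0, 1.1, 1.001):
    print(f"q={q}: S_q={tsallis_entropy(p, q):.6f}, Shannon={shannon_entropy(p):.6f}")
```

The printed values of $S_q$ move toward the Shannon entropy as $q$ gets closer to 1, which is the limit mentioned in the Introduction.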
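Based on (2), the entropy (or a measure of the fuzziness) of a membership function is defined as follows:
$$S_q = -\frac{1}{q-1}\left(\sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}^{q} - 1\right). \tag{3}$$
The objective function can be written as
$$U_q = \sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}^{q} d_{ik}, \tag{4}$$
where $d_{ik} = \|x_k - v_i\|^2$.
Under the normalization constraint of
$$\sum_{i=1}^{c} u_{ik} = 1 \quad (k = 1, \ldots, n), \tag{5}$$
the Tsallis entropy functional is given by
$$\delta S_q - \sum_{k=1}^{n} \alpha_k \, \delta\!\left(\sum_{i=1}^{c} u_{ik} - 1\right) - \beta \, \delta U_q = 0, \tag{6}$$
where $\alpha_k$ and $\beta$ are the Lagrange multipliers and $\alpha_k$ must be determined so as to satisfy (5).
By extremizing (6) with respect to $u_{ik}$, the stationary condition yields the following membership function:
$$u_{ik} = \frac{\{1 - \beta(1-q)d_{ik}\}^{1/(1-q)}}{Z_k}, \tag{7}$$
where
$$Z_k = \sum_{j=1}^{c} \{1 - \beta(1-q)d_{jk}\}^{1/(1-q)}. \tag{8}$$
In the same way, the center of the cluster is given by
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{q} x_k}{\sum_{k=1}^{n} u_{ik}^{q}}. \tag{9}$$
$S_q$ and $U_q$ satisfy
$$S_q - \beta U_q = \frac{\sum_{k=1}^{n} Z_k^{1-q} - 1}{1-q}, \tag{10}$$
which leads to
$$\frac{\partial S_q}{\partial U_q} = \beta. \tag{11}$$
By analogy with statistical mechanics, this relationship makes it possible to regard $U_q$ as the internal energy and $\beta^{-1}$ as an artificial system temperature $T$.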
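The membership function and center update of this section can be sketched numerically as follows. This is a minimal sketch under stated assumptions: the membership is taken to be the stationary solution of the entropy functional, $u_{ik} \propto \{1 - \beta(1-q)d_{ik}\}^{1/(1-q)}$ normalized over the clusters, with $d_{ik}$ the squared Euclidean distance and $q > 1$. The function names, the random data, and the parameter values are hypothetical. The last lines check numerically that $S_q - \beta U_q$ equals $(\sum_k Z_k^{1-q} - 1)/(1-q)$, where $Z_k$ is the normalization constant for data point $k$.

```python
import numpy as np

def squared_distances(X, V):
    """d[i, k] = ||x_k - v_i||^2 for data X (n, p) and centers V (c, p)."""
    return ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)

def membership(X, V, beta, q):
    """Tsallis-type membership u[i, k] and normalization Z[k] (assumes q > 1)."""
    d = squared_distances(X, V)
    num = (1.0 - beta * (1.0 - q) * d) ** (1.0 / (1.0 - q))
    Z = num.sum(axis=0)
    return num / Z, Z

def centers(X, U, q):
    """Cluster centers as means of the data weighted by u[i, k]^q."""
    W = U ** q
    return (W @ X) / W.sum(axis=1, keepdims=True)

# Hypothetical data and parameters, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
V = rng.normal(size=(3, 2))
beta, q = 0.5, 2.0

U, Z = membership(X, V, beta, q)
d = squared_distances(X, V)
S_q = -(np.sum(U ** q) - 1.0) / (q - 1.0)   # entropy of the membership function
U_q = np.sum(U ** q * d)                    # objective ("internal energy")
lhs = S_q - beta * U_q
rhs = (np.sum(Z ** (1.0 - q)) - 1.0) / (1.0 - q)
print(lhs, rhs)
print(centers(X, U, q))
```

The identity holds for any set of centers, because it only uses the functional form of the membership and its normalization.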
3. Dependencies of $u_{ik}^q$ on Temperature and $q$
In (9), works as a weight value to each , and it determines . In this paper, for simplicity, is set to be . This makes the denominator of (7) become the sum of the same forms of its numerator. In Figures 1(a) and 1(b), the numerator of is plotted as a function of , parameterized by and , respectively. In these figures, in order to examine the shape of (the subscript is omitted in this formula) as a function of the distance between the center of the cluster and various data points, is considered to be a continuous variable .
The extent of $u_{ik}^q$ becomes narrower with increasing $q$, and the distribution likewise becomes narrower as the temperature decreases. This suggests that $q$-incrementation can be used for clustering instead of annealing ($T$-reduction).
4. Quantitative Relationship between Temperature and $q$
As stated in the previous section, $T$ and $q$ inversely affect the extent of $u_{ik}^q$, which changes in a similar way with increasing $q$ or decreasing $T$. Accordingly, in order to examine the quantitative relationship between $T$ and $q$, we change them independently, as follows.
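The narrowing described above can be checked with a small sketch. It assumes that the plotted quantity is the numerator of the membership function raised to the power $q$, written as $f(x; T, q) = \{1 - (1-q)x/T\}^{q/(1-q)}$ with $x$ the squared distance and $q > 1$. Under that assumption the half-width, that is, the $x$ at which $f = 1/2$, has the closed form $x_{1/2} = T\,(2^{(q-1)/q} - 1)/(q-1)$. The function names and parameter values are illustrative.

```python
import numpy as np

def weight(x, T, q):
    """Assumed numerator of u^q: {1 - (1 - q) x / T}^(q / (1 - q)), for q > 1."""
    return (1.0 - (1.0 - q) * x / T) ** (q / (1.0 - q))

def half_width(T, q):
    """Distance x at which weight(x, T, q) falls to 1/2."""
    return T * (2.0 ** ((q - 1.0) / q) - 1.0) / (q - 1.0)

# Narrowing with increasing q at a fixed temperature.
for q in (1.5, 2.0, 3.0, 5.0):
    print(f"T=1.0, q={q}: half-width={half_width(1.0, q):.4f}")

# Narrowing with decreasing temperature at a fixed q.
for T in (2.0, 1.0, 0.5):
    print(f"T={T}, q=2.0: half-width={half_width(T, 2.0):.4f}")
```

Both loops print decreasing half-widths, so increasing $q$ and decreasing $T$ act on the width in the same direction.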
First, we define
Then, is calculated by fixing and to some constants and . Next, by decreasing , we determine the values that minimize the sum of squares of the residuals of these two functions:
In these calculations, the parameters are set as follows: is set to ; the domain of is set to ; the number of sampling points of the sum of residuals is ( and in (13) are set to and , resp.).
For values of , , , and and for decreasing from , the value that minimizes the sum of squares of the residuals (expressed by ) is shown in Figure 2(a).
Figure 2(b), on the other hand, shows the results of cases in which is set to and is lowered from , 20.0, 100.0, 200.0.
Approximate curves in Figures 2(a) and 2(b) are obtained by fitting the data to the following formula:where and are the fitting parameters. Optimal values for these parameters are summarized in Tables 1 and 2. It was found that is nearly equal to , suggesting that is inversely proportional to . In addition, it can be seen that though does not change its value, increases with increasing . Accordingly, by using the relationship of and as shown in Tables 1 and 2, -incrementation clustering is possible.
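The fitting procedure of this section can be sketched as follows. This is a hedged illustration, not the paper's exact setup: it assumes that the compared function is $f(x; T, q) = \{1 - (1-q)x/T\}^{q/(1-q)}$, that the $T$-reduction curve is $f(x; T, q_0)$ and the $q$-incrementation curve is $f(x; T_0, q)$, and that the approximate curve is a power law $q = aT^{-b}$. The names `a`, `b`, `T0`, `q0`, the sampling domain, and the grids are all hypothetical choices.

```python
import numpy as np

def weight(x, T, q):
    """Assumed form of the compared function, valid for q > 1."""
    return (1.0 - (1.0 - q) * x / T) ** (q / (1.0 - q))

def best_q(T, T0=2.0, q0=2.0, x=None, q_grid=None):
    """Grid search for the q that makes weight(x, T0, q) closest, in the
    least-squares sense, to the T-reduction curve weight(x, T, q0)."""
    if x is None:
        x = np.linspace(0.0, 5.0, 101)            # hypothetical sampling points
    if q_grid is None:
        q_grid = np.linspace(1.01, 60.0, 5900)    # hypothetical search grid
    target = weight(x, T, q0)
    candidates = weight(x[None, :], T0, q_grid[:, None])
    sse = ((candidates - target[None, :]) ** 2).sum(axis=1)
    return q_grid[np.argmin(sse)]

def fit_power_law(T, q):
    """Least-squares fit of q = a * T**(-b) in log-log coordinates."""
    slope, intercept = np.polyfit(np.log(T), np.log(q), 1)
    return np.exp(intercept), -slope

temps = np.array([2.0, 1.5, 1.0, 0.75, 0.5])
q_min = np.array([best_q(T) for T in temps])
a, b = fit_power_law(temps, q_min)
print("q_min:", q_min)
print("a =", a, "b =", b)
```

The fitted values depend on the assumed sampling domain and starting point, so they should not be compared directly with the values reported in Tables 1 and 2.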
5. $q$-Incrementation FCM Algorithm
In this section, we develop a Tsallis entropy-based FCM that uses $q$-incrementation instead of annealing. We begin by considering parameters and in (14) for and . In this case, $q$ is derived from $T$ by the following equation:
The temperature is held at during the clustering.
The $q$-incrementation FCM algorithm using Tsallis entropy maximization is presented as follows:
(1) Set the number of clusters $c$, the highest temperature, the temperature reduction parameter, the two thresholds of the convergence test, and the $q$-incrementation parameter.
(2) Generate $c$ initial clusters at random positions and assign each data point to the nearest cluster. Set the current temperature to the highest temperature.
(3) Calculate the membership function $u_{ik}$ using (7).
(4) Calculate the cluster centers $v_i$ using (9).
(5) Compare the current centers with the centers of the previous iteration. If the convergence condition is satisfied, then go to Step (6). Otherwise, return to Step (3).
(6) Compare the current centers with the centers obtained in the previous iteration. If the convergence condition is satisfied, then stop. Otherwise, update $q$ using (15) and return to Step (3).
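The steps above can be sketched in code as follows. This is a minimal sketch under several assumptions that are not taken from the paper: the update of $q$ is modeled as a power law `q = a * T**(-b)` of a nominal temperature that is reduced geometrically, the temperature used inside the membership function is held at `T_fixed`, the inner and outer convergence tests compare the largest center displacement with `delta1` and `delta2`, and the initial centers are randomly chosen data points. All parameter names and default values are hypothetical.

```python
import numpy as np

def membership(X, V, T, q):
    """u[i, k] proportional to {1 - (1 - q) d_ik / T}^(1 / (1 - q)), for q > 1."""
    d = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    num = (1.0 - (1.0 - q) * d / T) ** (1.0 / (1.0 - q))
    return num / num.sum(axis=0, keepdims=True)

def update_centers(X, U, q):
    """Centers as means of the data weighted by u[i, k]^q."""
    W = U ** q
    return (W @ X) / W.sum(axis=1, keepdims=True)

def q_incrementation_fcm(X, c, T_fixed=1.0, T_high=2.0, cooling=0.8,
                         a=4.0, b=1.0, delta1=1e-4, delta2=1e-4,
                         max_outer=20, max_inner=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step (2): random initial centers (here: randomly chosen data points).
    V = X[rng.choice(len(X), size=c, replace=False)].copy()
    T_nominal = T_high
    q = a * T_nominal ** (-b)            # assumed form of the q update
    for _ in range(max_outer):
        V_outer = V.copy()
        for _ in range(max_inner):
            V_prev = V.copy()
            U = membership(X, V, T_fixed, q)        # Step (3)
            V = update_centers(X, U, q)             # Step (4)
            if np.abs(V - V_prev).max() < delta1:   # Step (5)
                break
        if np.abs(V - V_outer).max() < delta2:      # Step (6)
            break
        T_nominal *= cooling                        # lower the nominal temperature
        q = a * T_nominal ** (-b)                   # ...which increases q
    return V, U, q

# Hypothetical demo data: three Gaussian blobs in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(40, 2))
               for m in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])
V, U, q_final = q_incrementation_fcm(X, c=3)
print(V)
print("final q:", q_final)
```

The cap on the number of outer iterations keeps $q$ in a range where $u_{ik}^q$ does not underflow; a production implementation would need a more careful treatment of large $q$.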
6. Experiment 1
In Experiment 1, the classification results of the conventional $T$-reduction method and the $q$-incrementation method of the previous section are compared to examine whether they give similar results.
In the experiment, we used Fisher's iris dataset, consisting of 150 four-dimensional vectors of iris flowers. The dataset contains three clusters of flowers: versicolor, virginica, and setosa. Each cluster consists of 50 vectors.
The parameters were set as follows:
6.1. Classification Results of the $T$-Reduction and $q$-Incrementation Methods
The maximum, minimum, and average numbers of misclassified data points of the conventional $T$-reduction and the $q$-incrementation methods for 1000 trials are summarized in Table 3.
The maximum, minimum, and average numbers of computation iterations required for the conventional $T$-reduction and the $q$-incrementation methods for 1000 trials are summarized in Table 4.
Tables 3 and 4 show that the $q$-incrementation method reduces misclassifications and requires fewer computation iterations than does the $T$-reduction method; this is true even though $q$ is increased so as to minimize the sum of squares of the residuals between the two forms of $u_{ik}^q$. This suggests that there exists a significant difference between the shapes of $u_{ik}^q$ in $T$-reduction and those in $q$-incrementation.
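The paper does not spell out how misclassified points are counted. One common way, shown here purely as an assumption, is to harden the memberships by taking the cluster with the largest $u_{ik}$ for each data point and then choose the cluster-to-class assignment that gives the fewest errors. The function name and the toy data are hypothetical.

```python
from itertools import permutations

import numpy as np

def count_misclassified(true_labels, U):
    """Minimum number of label mismatches over all cluster-to-class mappings.

    true_labels: integer class labels in {0, ..., c-1}, shape (n,)
    U:           membership matrix, shape (c, n)
    """
    true_labels = np.asarray(true_labels)
    pred = U.argmax(axis=0)                 # hard assignment per data point
    c = U.shape[0]
    best = len(true_labels)
    for perm in permutations(range(c)):     # feasible for small c such as 3
        mapped = np.array(perm)[pred]
        best = min(best, int((mapped != true_labels).sum()))
    return best

# Toy example with six points and three clusters (one-hot memberships).
labels = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([1, 1, 0, 0, 2, 0])
U_toy = np.eye(3)[pred].T
print(count_misclassified(labels, U_toy))
```

Enumerating all permutations is only practical for a small number of clusters; for larger $c$, an assignment solver would be the usual replacement.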
By comparing the plots of and and those of and , it can be seen that when , both ’s have a similar shape. However, when and is large, has a steeper slope than does ; this results in a lack of agreement of the clustering results.
7. Modified $q$-Incrementation FCM Algorithm
In the previous section, it was confirmed that $q$-incrementation clustering is feasible for FCM using Tsallis entropy maximization.
In this section, we consider a very simple and general algorithm, in which $T$ is held fixed and $q$ is defined as the inverse of $T'$, where $T'$ is a pseudo-temperature that is decreased using the DA method. That is, $q$ is given as
$$q = \frac{1}{T'} + \varepsilon, \tag{16}$$
where $\varepsilon$ is any small constant (the small constant is added in order to prevent $q$ from reaching 1 when $T' = 1$; this needs to be avoided because, in the limit of $q \to 1$, the Tsallis entropy equals the Shannon entropy). Steps (1), (2), and (6) in the algorithm presented in Section 5 should be changed as follows:
(1) Set the number of clusters $c$, the highest temperature, the temperature reduction parameter, the two thresholds of the convergence test, and the initial $q$ value.
(2) Generate $c$ initial clusters at random positions and assign each data point to the nearest cluster. Set the current temperature to the highest temperature.
(6) Compare the current centers with the centers obtained in the previous iteration. If the convergence condition is satisfied, then stop. Otherwise, decrease $T'$, update $q$ using (16), and return to Step (3).
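The modified update of $q$ can be sketched as follows. The sketch assumes a geometric reduction of the pseudo-temperature (the experiments use a different cooling schedule), and the names `t_prime`, `eps`, and `cooling` as well as all numerical values are hypothetical. Because $q$ can be smaller than 1 at high pseudo-temperatures, the weight function below applies the usual cutoff that sets a negative base to zero; this cutoff is an assumption of the sketch.

```python
import numpy as np

def q_from_pseudo_temperature(t_prime, eps=0.01):
    """q as the inverse of the pseudo-temperature plus a small constant."""
    return 1.0 / t_prime + eps

def weight(x, T, q):
    """Assumed numerator of u^q, with a cutoff at zero for q < 1."""
    base = np.maximum(1.0 - (1.0 - q) * x / T, 0.0)
    return base ** (q / (1.0 - q))

# Hypothetical geometric schedule for the pseudo-temperature.
t_prime, cooling = 2.0, 0.8
schedule = []
for _ in range(8):
    schedule.append((t_prime, q_from_pseudo_temperature(t_prime)))
    t_prime *= cooling

for t, q in schedule:
    print(f"T'={t:.3f}  q={q:.3f}  weight(x=1, T=1)={weight(1.0, 1.0, q):.4f}")
```

With this schedule, $q$ starts below 1 and grows as the pseudo-temperature falls, and the small constant keeps $q$ from being exactly 1 when the pseudo-temperature equals 1.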
8. Experiment 2
In Experiment 2, we compare the classification results of the $T$-reduction method and those of the modified $q$-incrementation method (hereafter the proposed method) that was presented in Section 7.
Fisher’s iris dataset was clustered using the same parameters as were used in Experiment 1, with the exception that was changed from to for the conventional method with , and was changed from to for the proposed method. VFA was used as the cooling schedule for both methods, and in (16) was set to .
8.1. Classification Results of the $T$-Reduction and Modified $q$-Incrementation Methods
The maximum, minimum, and average numbers of misclassified data points and required numbers of computation iterations for 1000 trials each of the conventional $T$-reduction method and the proposed method are summarized in Tables 5 and 6, respectively. In both cases, was set to .
The numbers of misclassified data points and the number of computation iterations required for the proposed method with are close to the minimums of those numbers for the conventional method with and , respectively.
In the conventional method, as $q$ increases, the number of misclassified data points decreases; this occurs because $u_{ik}^q$ is narrowly distributed when $q$ is large, and thus clustering is done locally and optimally. On the other hand, the number of required computation iterations tends to decrease with decreasing $q$, because when $q$ is small, clustering is done widely and efficiently.
In the proposed method, $q$ is initially given as , which is nearly equal to . Thus, at an early stage in annealing, clustering is automatically done widely. This is why the modified method does not require as many iterations.
In summary, the conventional method has an inconsistency, in that the $q$ value that minimizes the number of misclassified data points increases the number of computation iterations that are required. However, by setting to be the same as the value used in the conventional method, the proposed method is better able than the conventional method to balance the number of misclassified data points with the number of computation iterations that are required.
8.2. Properties of the Modified $q$-Incrementation Method
In this subsection, we examine why the proposed method improves clustering.
Tables 7 and 8 summarize, for changing from to , the maximum, minimum, and average numbers of misclassified data points and the number of computation iterations required for 1000 trials of the proposed method.
In Table 8, the number of computation iterations required by the proposed method decreases with decreasing until . After that point, it begins to increase, suggesting that there exist minima in . The reasons for this property are considered to be as follows. When is as high as , the relative size of is too small to change the shape of , and this increases the required number of computation iterations. However, when , the width of is very narrow, and a long time is required for the centers of the clusters to converge. For these reasons, we assume there is at least one minimum in .
As stated in the previous subsection, the proposed method can limit the number of misclassified data points to as few as to points. Thus, we conclude that does not significantly affect the number of misclassifications.
In summary, our experiments confirmed that when $q$ increases as the inverse of the decreasing pseudo-temperature, the proposed method works at least as well as the conventional method. However, the number of computation iterations required by the proposed method apparently depends on the value of , and it is not yet known how to determine an appropriate value of for a given dataset.
9. Conclusion
We formulated the membership function of Tsallis entropy-based FCM by maximizing the Tsallis entropy functional. In this formulation, the $q$-parameter of the Tsallis entropy strongly affects the accuracy of the clustering.
In order to determine an appropriate value of $q$ for a given data distribution, it is first necessary to examine quantitatively the effect of $q$ on the extent of $u_{ik}^q$. We determined that, in order to minimize the square of the residual of $u_{ik}^q$ between $T$-reduction and $q$-incrementation, $q$ must be increased as the inverse of the temperature.
Based on this relationship, we proposed two kinds of $q$-incrementation methods and combined them with the DA method. In the first method, $q$ is increased according to the approximation function that minimizes the square of the residual of $u_{ik}^q$. The experimental results show that, compared to the conventional annealing method, the proposed method reduces both the number of misclassifications and the number of required computation iterations.
In the second method, $q$ is simply defined as the inverse of the decreasing pseudo-temperature. The experimental results reveal that, in most cases, this method determines an appropriate $q$ value, improves accuracy, and is superior to the conventional method.
However, the results also confirm that the number of computation iterations depends on , and, in some cases, it can become greater than that of the conventional method. This should be avoidable by using the value of that minimizes the number of iterations.
In the future, first we intend to estimate the validity of our approximation method used in Sections 3 and 4 accurately. We then intend to explore ways to increase efficiency by performing convergence and computation time tests using various formulas of and . Furthermore, we intend to develop a better schedule for annealing and a method for optimizing .
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The present study was supported by JSPS KAKENHI Grant no. 25330297.
References
J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, NY, USA, 1981.
R.-P. Li and M. Mukaidono, “A maximum entropy approach to fuzzy clustering,” in Proceedings of the 4th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE/IFES '95), pp. 2227–2232, 1995.
S. Miyamoto and M. Mukaidono, “Fuzzy C-means as a regularization and maximum entropy approach,” in Proceedings of the 7th International Fuzzy Systems Association World Congress (IFSA '97), vol. 2, pp. 86–92, Prague, Czech Republic, June 1997.
M. Yasuda, T. Furuhashi, M. Matsuzaki, and S. Okuma, “Fuzzy clustering using deterministic annealing method and its statistical mechanical characteristics,” in Proceedings of the 10th IEEE International Conference on Fuzzy Systems, vol. 3, pp. 797–800, December 2001.
S. Abe and Y. Okamoto, Eds., Nonextensive Statistical Mechanics and Its Applications, Springer, 2001.
M. Gell-Mann and C. Tsallis, Eds., Nonextensive Entropy—Interdisciplinary Applications, Oxford University Press, New York, NY, USA, 2004.
C. Tsallis, Ed., Introduction to Nonextensive Statistical Mechanics, Springer, 2009.
C. F. L. Lima, F. M. Assis, and C. P. de Souza, “A comparative study of use of Shannon, Rényi and Tsallis entropy for attribute selecting in network intrusion detection,” in Proceedings of the IEEE International Workshop on Measurements and Networking, pp. 77–82, IEEE, Anacapri, Italy, October 2011.
M. Menard, P. Dardignac, and C. C. Chibelushi, “Non-extensive thermostatistics and extreme physical information for fuzzy clustering,” International Journal of Computational Cognition, vol. 2, no. 4, pp. 1–63, 2004.
M. Yasuda, “Entropy maximization and very fast deterministic annealing approach to fuzzy C-means clustering,” in Proceedings of the 5th Joint International Conference on Soft Computing and 11th International Symposium on Intelligent Systems, SU-B1-3, pp. 1515–1520, 2010.
L. E. Reichl, A Modern Course in Statistical Physics, John Wiley & Sons, New York, NY, USA, 1998.