ISRN Artificial Intelligence

Volume 2012 (2012), Article ID 929085, 6 pages

http://dx.doi.org/10.5402/2012/929085

## Generalized Fuzzy C-Means Clustering with Improved Fuzzy Partitions and Shadowed Sets

^{1}Machine Vision Laboratory, Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad 9177948944, Iran

^{2}Departments of Electrical Engineering and Computer Engineering, Ferdowsi University of Mashhad, Mashhad 9177948944, Iran

Received 13 September 2011; Accepted 18 October 2011

Academic Editor: I. Buciu

Copyright © 2012 Seyed Mohsen Zabihi and Mohammad-R Akbarzadeh-T. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Clustering groups data points according to some measure of similarity. It is one of the most significant unsupervised learning problems and does not need any labeled data. There are many clustering algorithms, among which fuzzy c-means (FCM) is one of the most popular. FCM has an objective function based on Euclidean distance. Several improved versions of FCM with rather different objective functions have been proposed in recent years. Generalized improved fuzzy partitions FCM (GIFP-FCM) is one of them; it uses a norm distance measure and competitive learning and outperforms the previous algorithms in this field. In this paper, we present a novel FCM clustering method with improved fuzzy partitions that utilizes shadowed sets and aims to improve GIFP-FCM on noisy data sets. It enhances the efficiency of GIFP-FCM and improves the clustering results by eliminating most outliers during the clustering steps. We name the novel fuzzy clustering method shadowed set-based GIFP-FCM (SGIFP-FCM). Several experiments on vessel segmentation in retinal images of the DRIVE database illustrate the efficiency of the proposed method.

#### 1. Introduction

Clustering is one of the first mental activities of humans; its goal is to find structure in data and to assign similar data to a group (cluster). It puts unlabeled data into groups so that the data points in a group are more similar to each other than to those in other groups. In other words, the goal is to maximize the intracluster similarity while minimizing the intercluster similarity.

Clustering has been used in diverse areas such as machine vision and pattern recognition as well as in medical applications. There is a wide array of clustering algorithms [1–8], but they can generally be classified into the following groups:

(i) exclusive clustering: each data point belongs to exactly one cluster, with no overlapping (K-means and any linear classifier belong to this kind),

(ii) overlapping clustering: a data point can belong to two or more clusters (fuzzy C-means),

(iii) hierarchical clustering,

(iv) probabilistic clustering (model based).

Among the above, overlapping clustering or partition-based clustering methods that group objects with some membership degrees are often used to handle noisy and uncertain data and hence can have a good utility in many practical applications.

Probabilistic and fuzzy clustering are two kinds of overlapping clustering methods that take the uncertainty in data into account. Crisp clustering can be considered a special case of these two methods. FCM [9] is the most popular of the fuzzy clustering methods, and the fuzzy clustering methods proposed later are generally based on it. FCM and its derived methods cluster data according to an objective function and several constraints. For instance, the memberships of each data point to all clusters must sum to one.

A novel way of constraining the membership functions is presented in [10], together with a new FCM algorithm based on improved fuzzy partitions, named IFP-FCM. This algorithm is driven by crisp membership degrees, so it appears more resistant to noise and outliers. However, the fuzziness index *m* in [10] is fixed and cannot be changed. Since the clusters in a data set have different densities, the performance of FCM may depend significantly on the choice of the fuzziness index. Therefore, a value of this parameter should be adopted that suits the data distribution in the data set [11], which calls for another objective function that allows *m* to be adjusted. GIFP-FCM, proposed in [12], does not restrict *m* to a fixed value. A parameter of this algorithm ($\alpha$) connects it to FCM and IFP-FCM. In short, this algorithm retains the benefit of IFP-FCM and generalizes it, because *m* can take various values.

In this paper, we present a novel shadowed set-based generalized improved fuzzy partitions FCM (SGIFP-FCM). The new algorithm uses shadowed sets; its key idea is to improve the performance of GIFP-FCM at the stage where the new cluster centers are determined in each iteration. This is accomplished by removing the outliers and unsuitable data that have negative effects on the structure of the clusters. We will see that it improves the clustering results on noisy data and also decreases the clustering time in comparison with GIFP-FCM and the previous methods.

Clustering methods are often used in image processing applications such as image segmentation [7]. To investigate the proficiency and utility of the presented method, the algorithm is applied here to vessel extraction from retinal images. These images are often used in medical applications to diagnose diseases such as diabetes. The proposed approach is compared against other competing clustering algorithms on this database as well as on an artificial data set.

Some fuzzy clustering algorithms are briefly reviewed in Section 2. Section 3 describes the proposed algorithm (SGIFP-FCM) in detail. Section 4 shows the experimental results of our work. Finally, Section 5 concludes the paper.

#### 2. Previous Fuzzy Clustering Algorithms

Many areas such as pattern recognition and machine vision use fuzzy clustering to solve their problems. There are a great number of fuzzy clustering algorithms, and most of them use distance criteria. One of the most common and popular of these algorithms is FCM [13], which uses inverse distances to compute the fuzzy memberships.

In FCM, each feature vector can belong to every cluster with a coefficient between zero and one. Finally, the algorithm labels each data point (feature vector) according to its maximum coefficient over all clusters.

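The fuzzy membership matrix and the cluster centers are computed by minimizing the following partition functional:

$$J_{\mathrm{FCM}} = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, \quad \text{subject to } \sum_{i=1}^{c} u_{ij} = 1 \text{ for all } j. \tag{1}$$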

In this equation, $n$ denotes the number of data points, $c$ the number of clusters, $u_{ij}$ the fuzzy membership of the $j$th data point to the $i$th cluster, $d_{ij}$ the Euclidean distance between the data point $x_j$ and the cluster center $v_i$, and $m$ a fuzzy weighting factor that defines the degree of fuzziness of the results. The data class becomes fuzzier and less discriminating with increasing $m$. In general, $m = 2$ is chosen (although this value of $m$ does not produce the optimal solution for all problems).

The constraint in (1) implies that each point must distribute its entire membership among all the clusters. The cluster centers (centroids) are determined as the fuzzy-weighted center of gravity of the data $x_j$:

$$v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}. \tag{2}$$

Since $u_{ij}$ affects the computation of the cluster center $v_i$, data with a high membership influence the prototype location more than points with a low membership. For the fuzzy C-means algorithm, the distance $d_{ij}$ is defined as follows:

$$d_{ij}^{2} = \lVert x_j - v_i \rVert^{2} = (x_j - v_i)^{T} (x_j - v_i). \tag{3}$$

The cluster centers $v_i$ represent the typical values of each cluster, whereas the component $u_{ij}$ of the membership matrix denotes the extent to which the data point $x_j$ is similar to the prototype $v_i$. Minimizing the partition functional (1) gives the following expression for the membership:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}. \tag{4}$$

Equation (4) is evaluated iteratively, since the distance $d_{ij}$ depends on the membership $u_{ij}$ through the cluster centers.

The FCM procedure is as follows.

(1) Choose the number of clusters $c$ and the fuzziness index $m > 1$, and initialize the partition matrix $U = [u_{ij}]$.

(2) Calculate the cluster centers using (2).

(3) Calculate the new partition matrix using (4).

(4) Compare the new and the previous partition matrices. If the variation of the membership degrees $u_{ij}$, calculated with an appropriate norm, is smaller than a given threshold, terminate the algorithm; otherwise go back to step (2).
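The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `fcm`, the random initialization, the max-norm stopping test, and the default values of `tol` and `max_iter` are all assumptions, and the partition matrix is stored with one row per cluster.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means sketch. X has shape (n, p); returns U (c, n) and V (c, p)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: random initial partition matrix whose columns sum to one.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        # Step 2: cluster centers as fuzzy-weighted means of the data.
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Squared Euclidean distance between center i and data point j.
        D2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        D2 = np.maximum(D2, 1e-12)  # guard against division by zero
        # Step 3: membership update, u_ij proportional to d_ij^(-2/(m-1)).
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Step 4: stop when the partition matrix barely changes (max norm).
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return U, V
```

Hard labels follow from the maximum membership of each data point, for example `labels = U.argmax(axis=0)`.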

FCM clustering has a shortcoming in the membership functions it produces. For instance, these functions are not bounded and do not have to decay rapidly, so they cannot be interpreted locally. The IFP-FCM method [10] changes FCM's objective function so that distances to the Voronoi cell of the cluster are used instead of distances to the cluster prototypes. The objective function of IFP-FCM is

$$J_{\mathrm{IFP\text{-}FCM}} = \sum_{j=1}^{n} \sum_{i=1}^{c} u_{ij}^{2} d_{ij}^{2} - \sum_{j=1}^{n} a_j \sum_{i=1}^{c} \left( u_{ij} - \tfrac{1}{2} \right)^{2}. \tag{5}$$

In this equation, $a_j$ is a rewarding parameter. IFP-FCM appears more resistant to outliers and even to noise.

The fuzziness index *m* is equal to 2 in many cases for FCM and IFP-FCM. However, since different values of this parameter are needed to obtain optimal or near-optimal results, the fuzziness index *m* should be flexible and generalized. This requires another objective function.

For that reason, the GIFP-FCM clustering approach was presented in [12]. It is based on the rival-penalized competitive learning (RPCL) concept [14]:

$$J_{\mathrm{GIFP\text{-}FCM}} = \sum_{j=1}^{n} \sum_{i=1}^{c} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{n} a_j \sum_{i=1}^{c} u_{ij} \left( 1 - u_{ij}^{m-1} \right). \tag{6}$$

The new objective function follows the idea of RPCL: in the minimization, only one specified $u_{ij}$ gets the maximum reward, and the other $u_{kj}$ (rivals) get the minimum reward.

The authors of [12] showed that GIFP-FCM reduces to FCM or IFP-FCM for proper choices of some parameters and that it converges more quickly than these two clustering algorithms, while its computational complexity is the same as theirs. The fuzziness index in IFP-FCM is fixed at 2, whereas in GIFP-FCM this parameter can be changed as needed.
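The reduction of GIFP-FCM to FCM can be made explicit with a short derivation. The following is a sketch under stated assumptions, not a reproduction of [12]: it takes the GIFP-FCM objective in the form $J = \sum_j \sum_i u_{ij}^m d_{ij}^2 + \sum_j a_j \sum_i u_{ij}(1 - u_{ij}^{m-1})$ with $a_j = \alpha \min_s d_{sj}^2$ and $0 \le \alpha < 1$, uses $i$ for clusters and $j$ for data points, and treats $a_j$ as a constant when differentiating.

```latex
\begin{aligned}
J &= \sum_{j=1}^{n}\sum_{i=1}^{c}\Bigl[u_{ij}^{m}\bigl(d_{ij}^{2}-a_j\bigr) + a_j u_{ij}\Bigr]
   = \sum_{j=1}^{n}\sum_{i=1}^{c} u_{ij}^{m}\bigl(d_{ij}^{2}-a_j\bigr) + \sum_{j=1}^{n} a_j
   && \text{since } \sum_{i=1}^{c} u_{ij} = 1, \\
u_{ij} &= \frac{1}{\sum_{k=1}^{c}\left(\dfrac{d_{ij}^{2}-a_j}{d_{kj}^{2}-a_j}\right)^{1/(m-1)}},
\qquad
v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}.
\end{aligned}
```

With $\alpha = 0$ every $a_j$ vanishes and the membership update becomes the FCM update $u_{ij} = 1 / \sum_k (d_{ij}/d_{kj})^{2/(m-1)}$. As $\alpha$ approaches 1, $d_{ij}^2 - a_j$ tends to zero for the nearest center, which pushes the memberships toward crisp values.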

#### 3. The Proposed Method

In this section, we briefly describe shadowed sets and then present our new clustering algorithm, SGIFP-FCM.

##### 3.1. Shadowed Sets

Suppose that $A$ is a fuzzy set on a universe $X$. A shadowed set $B$ induced by $A$ is an interval-valued set that maps the elements of $X$ into 0, 1, and the unit interval $[0, 1]$; that is, $B$ is a mapping $B: X \to \{0, 1, [0, 1]\}$, where 0, 1, and $[0, 1]$ denote complete exclusion from $B$, complete inclusion in $B$, and complete ignorance, respectively. Shadowed sets have some notable characteristics. For example, they are isomorphic with a three-valued logic, and operations on shadowed sets are the same as in this logic. (For more details, see [15, 16].)

Figure 1 depicts a fuzzy set and also its shadowed set.

##### 3.2. Creating a Shadowed Set

Pedrycz [17] presented a way of creating a shadowed set from a fuzzy set. First, a lower and an upper threshold are defined; membership grades below the lower threshold are then reduced to 0, and grades above the upper threshold are elevated to 1. Finally, the memberships between the two thresholds are mapped to the whole unit interval $[0, 1]$ and form the shadow. Figure 2 shows this process for a unimodal fuzzy set.

The thresholds must be chosen properly, since they are an essential part of constructing a shadowed set; finding an optimal threshold is therefore of utmost importance. Pedrycz uses the balance of uncertainty for this purpose: the uncertainty removed by changing membership grades to zero and one is compensated by the shadow, which “absorbs” the eliminated partial membership at the low and high ranges of membership.

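According to Figure 2, for a fuzzy set with a discrete membership function the balance equation (7) requires that the total membership removed by reducing low grades to 0, plus the total membership added by elevating high grades to 1, equals the number of elements placed in the shadow.

It is more convenient to recast this as the optimization problem (8): find the threshold value that minimizes the absolute difference between the two sides of the balance equation.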
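The threshold search described above can be sketched as follows. This is an illustration under assumptions, not the paper's exact formulation: it follows Pedrycz's balance of uncertainty with a single symmetric threshold `lam` (grades at or below `lam` go to 0, grades at or above `1 - lam` go to 1), searches `lam` on a uniform grid in (0, 0.5), and marks shadow elements with `nan`. The names `balance`, `optimal_threshold`, and `to_shadowed` are hypothetical.

```python
import numpy as np

def balance(u, lam):
    """Imbalance |reduced + elevated - shadow size| for threshold lam."""
    u = np.asarray(u, dtype=float)
    low = u <= lam                 # grades reduced to 0
    high = u >= 1.0 - lam          # grades elevated to 1
    shadow = ~(low | high)         # grades left in the shadow
    reduced = u[low].sum()
    elevated = (1.0 - u[high]).sum()
    return abs(reduced + elevated - shadow.sum())

def optimal_threshold(u, num=200):
    """Grid search over (0, 0.5) for the threshold minimizing the imbalance."""
    grid = np.linspace(0.0, 0.5, num, endpoint=False)[1:]
    scores = [balance(u, lam) for lam in grid]
    return float(grid[int(np.argmin(scores))])

def to_shadowed(u, lam):
    """Map grades to 0 (excluded), 1 (included), or nan (shadow)."""
    u = np.asarray(u, dtype=float)
    out = np.full(u.shape, np.nan)
    out[u <= lam] = 0.0
    out[u >= 1.0 - lam] = 1.0
    return out
```

A grid search is used here because the imbalance is piecewise constant in the threshold, so gradient-based minimization would not apply.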

##### 3.3. Applying Shadowed Sets to GIFP-FCM

As we saw, the cluster centers are modified in each iteration of the clustering algorithm, and all data participate in the computation of each cluster center, which increases the running time of the algorithm. To decrease the running time and improve the efficiency, we use shadowed sets to remove unsuitable data, outliers, and data on cluster borders before determining the cluster centers. We denote the partition matrix by $U$; each row of this matrix describes a cluster. We compute an optimal threshold for each row and then remove the outliers. The threshold of each row is obtained by solving the optimization problem (9) for the memberships in that row.

##### 3.4. SGIFP-FCM Algorithm

We can now summarize the proposed algorithm (SGIFP-FCM) in the following steps.

*Step 1.* Set the number of clusters *c* (between 1 and *n*) and choose proper values for the stopping threshold, the parameter $\alpha \in [0, 1)$ that determines the rate of rewarding, the number of iterations, the fuzziness index *m*, and the initial values of $u_{ij}$.

*Step 2.* Compute the optimum threshold for each row of the partition matrix using (9). If a membership $u_{ij}$ in that row falls below the threshold, the corresponding data point plays no role in determining that cluster center.

*Step 3.* Compute the cluster centers from the data points that were not excluded in Step 2.

*Step 4.* Compute the membership degrees.

*Step 5.* If the terminating condition is satisfied, the algorithm terminates; otherwise, increase the iteration number and go back to Step 2.
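The five steps can be put together in a compact NumPy sketch. It is not the authors' code and rests on several assumptions that the text does not confirm: the row threshold is found by a grid search on Pedrycz's balance with a symmetric threshold in (0, 0.5); points whose membership is at or below the row threshold are dropped from that cluster's center; the center and membership updates are the GIFP-FCM updates with $a_j = \alpha \min_i d_{ij}^2$; and the names `row_threshold` and `sgifp_fcm` and the default `tol` and `max_iter` are hypothetical.

```python
import numpy as np

def row_threshold(u, num=100):
    """Assumed balance-of-uncertainty threshold for one row of U."""
    grid = np.linspace(0.0, 0.5, num, endpoint=False)[1:]

    def imbalance(lam):
        low, high = u <= lam, u >= 1.0 - lam
        shadow = np.sum(~(low | high))
        return abs(u[low].sum() + (1.0 - u[high]).sum() - shadow)

    return min(grid, key=imbalance)

def sgifp_fcm(X, c, m=2.0, alpha=0.9, tol=1e-5, max_iter=100, seed=0):
    """Sketch of SGIFP-FCM. Returns U with shape (c, n) and V with shape (c, p)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Step 1: random initial partition matrix (columns sum to one).
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    V = np.zeros((c, p))
    for _ in range(max_iter):
        for i in range(c):
            # Step 2: drop points whose membership is below the row threshold.
            keep = U[i] > row_threshold(U[i])
            if not keep.any():      # fallback: never leave a cluster empty
                keep[:] = True
            # Step 3: center from the retained points only.
            w = U[i, keep] ** m
            V[i] = (w @ X[keep]) / w.sum()
        # Step 4: GIFP-FCM membership update with a_j = alpha * min_i d_ij^2.
        D2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        A = alpha * D2.min(axis=0, keepdims=True)
        G = np.maximum(D2 - A, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = G / G.sum(axis=0, keepdims=True)
        # Step 5: stop when the partition matrix barely changes.
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return U, V
```

Because each center is computed from a subset of the data, the inner sums in Step 3 touch fewer points than in GIFP-FCM, which is the source of the time saving the method aims for; the threshold search itself adds some cost per iteration.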

#### 4. Results

In the first experiment, we create a random artificial data set with three clusters, shown in Figure 3. Figure 4 shows the results of clustering these data with three algorithms: FCM, GIFP-FCM, and SGIFP-FCM. FCM and SGIFP-FCM give almost the same results and detect the three clusters correctly. In GIFP-FCM, by contrast, all data participate in determining the centers, and the data are not clustered accurately. This is due to the effect of outliers and border data on the centers; SGIFP-FCM removes these data.

In Figure 4, clusters are distinguished by different colors, and the cluster centers are marked with “+”. The parameters $m$ and $\alpha$ are set to 2 and 0.9, respectively.

In the second experiment, we use retinal images of the DRIVE database [18]. We want to detect vessels in these images using clustering methods. For this purpose, we first extract features for each pixel of the processed image and then pass these feature vectors to the clustering algorithm, which divides the pixels into two clusters: vessels and nonvessels. There are many feature extraction methods; we use the LBP (Local Binary Patterns) method [19]. The feature vector of each pixel is composed of the gray-level intensity, the variance, and the LBP value in three local windows around the central pixel [20].
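To make the LBP feature concrete, the following sketch computes the basic 8-neighbour LBP code of each interior pixel. It is only an illustration: the text does not specify which LBP variant or window sizes were used, so this is the plain 3×3 operator rather than the multiresolution, rotation-invariant operator of [19], and the function name `lbp_3x3` and the bit ordering are assumptions.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left corner (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the center.
        code |= (neigh >= center).astype(np.uint8) * np.uint8(1 << bit)
    return code
```

The returned array is two pixels smaller than the input in each dimension, because border pixels lack a full neighbourhood.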

A sample retinal image segmented by the proposed algorithm (SGIFP-FCM) is presented in Figure 5, along with the manually segmented version of the image.

To evaluate the algorithm, we calculate the true positive ratio (TPR) and the false positive ratio (FPR). TPR is the number of pixels in the resulting image that are correctly clustered as vessel (according to the image generated by a human expert) divided by the total number of pixels labeled as vessel in the expert-generated image. FPR is the number of pixels in the resulting image that are incorrectly clustered as vessel divided by the total number of pixels labeled as background in the expert-generated image.
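The two ratios can be computed directly from a pair of binary masks. A minimal sketch, assuming both the segmentation result and the expert image are arrays of the same shape in which nonzero means vessel; the function name `tpr_fpr` is hypothetical.

```python
import numpy as np

def tpr_fpr(pred, truth):
    """True and false positive ratios of a binary vessel mask against ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)       # vessel pixels found correctly
    fp = np.sum(pred & ~truth)      # background pixels labeled as vessel
    tpr = tp / truth.sum()          # relative to all expert vessel pixels
    fpr = fp / (~truth).sum()       # relative to all expert background pixels
    return float(tpr), float(fpr)
```

For example, with `pred = [[1, 1, 0], [0, 1, 0]]` and `truth = [[1, 0, 0], [0, 1, 1]]`, two of the three vessel pixels are found and one of the three background pixels is mislabeled, so the function returns approximately `(0.667, 0.333)`.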

Table 1 compares the performance of retinal vessel extraction with SGIFP-FCM and GIFP-FCM against the second human observer in terms of TPR, FPR, and clustering time.

The results in Table 1 are averages over the 20 test images of the DRIVE database.

As Table 1 shows, SGIFP-FCM performs better than GIFP-FCM both in the extracted vessels and in running time.

#### 5. Conclusion

In this paper, we presented a generalized fuzzy C-means clustering method with improved fuzzy partitions and shadowed sets (SGIFP-FCM). As illustrated, the new algorithm improves on previous clustering algorithms such as GIFP-FCM on noisy data sets. In the proposed algorithm, shadowed sets reduce the effect of outliers and noisy data on the determination of the cluster centers. The new algorithm was tested in several experiments. Its performance was evaluated on an image segmentation application (retinal images), and because these images contain considerable noise, it gave better results than the GIFP-FCM algorithm.

#### References

1. R. Ng and J. Han, “Efficient and effective clustering methods for spatial data mining,” in *Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94)*, pp. 144–155, Santiago, Chile, 1994.
2. J. C. Bezdek, R. J. Hathaway, M. J. Sabin, and W. T. Tucker, “Convergence theory for fuzzy c-means: counterexamples and repairs,” *IEEE Transactions on Systems, Man and Cybernetics*, vol. 17, no. 5, pp. 873–877, 1987.
3. G. Karypis, E. H. Han, and V. Kumar, “Chameleon: hierarchical clustering using dynamic modeling,” *Computer*, vol. 32, no. 8, pp. 68–75, 1999.
4. T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: a new data clustering algorithm and its applications,” *Data Mining and Knowledge Discovery*, vol. 1, no. 2, pp. 141–182, 1997.
5. G. Sheikholeslami, S. Chatterjee, and A. Zhang, “WaveCluster: a multiresolution clustering approach for very large spatial databases,” in *Proceedings of the 24th International Conference on Very Large Data Bases (VLDB '98)*, pp. 428–439, New York, NY, USA, 1998.
6. Y. Xiu, S. T. Wang, X. Wu, and D. Hu, “The directional similarity-based clustering method DSCM,” *Journal of Computer Research and Development*, vol. 43, no. 8, pp. 1425–1431, 2006 (Chinese).
7. F. L. Chung, S. T. Wang, M. Xu, D. Hu, and Q. Lin, “Possibility theoretic clustering and its preliminary application to large image segmentation,” *Soft Computing*, vol. 11, no. 2, pp. 103–113, 2007.
8. J. L. Fan, W. Z. Zhen, and W. X. Xie, “Suppressed fuzzy c-means clustering algorithm,” *Pattern Recognition Letters*, vol. 24, no. 9-10, pp. 1607–1612, 2003.
9. J. C. Bezdek, *Pattern Recognition With Fuzzy Objective Function Algorithms*, Plenum, New York, NY, USA, 1981.
10. F. Höppner and F. Klawonn, “Improved fuzzy partitions for fuzzy regression models,” *International Journal of Approximate Reasoning*, vol. 32, no. 2, pp. 85–102, 2003.
11. C. Hwang and F. Rhee, “Uncertain fuzzy clustering: interval type-2 fuzzy approach to C-means,” *IEEE Transactions on Fuzzy Systems*, vol. 15, no. 1, pp. 107–120, 2007.
12. L. Zhu, F.-L. Chung, and S. Wang, “Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions,” *IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics*, vol. 39, no. 3, pp. 578–591, 2009.
13. F. Höppner and F. Klawonn, “A contribution to convergence theory of fuzzy c-means and derivatives,” *IEEE Transactions on Fuzzy Systems*, vol. 11, no. 5, pp. 682–694, 2003.
14. L. Xu, A. Krzyzak, and E. Oja, “Rival penalized competitive learning for clustering analysis, RBF net, and curve detection,” *IEEE Transactions on Neural Networks*, vol. 4, no. 4, pp. 636–649, 1993.
15. W. Pedrycz, “Shadowed sets: representing and processing fuzzy sets,” *IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics*, vol. 28, no. 1, pp. 103–109, 1998.
16. W. Pedrycz and G. Vukovich, “Granular computing with shadowed sets,” *International Journal of Intelligent Systems*, vol. 17, no. 2, pp. 173–197, 2002.
17. W. Pedrycz, “Shadow sets: bridging fuzzy and rough sets,” in *Rough Fuzzy Hybridization: A New Trend in Decision-Making*, S. K. Pal and A. Skowron, Eds., Springer, Singapore, 1999.
18. M. Niemeijer and B. van Ginneken, http://www.isi.uu.nl/Research/Databases/DRIVE/results.php, 2002.
19. T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 24, no. 7, pp. 971–987, 2002.
20. S. H. Rezatofighi, A. Roodaki, and H. A. Noubari, “An enhanced segmentation of blood vessels in retinal images using Contourlet,” in *Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS '08)*, pp. 3530–3533, August 2008.