Fuzzy Functions, Relations, and Fuzzy Transforms: Theoretical Aspects and Applications to Fuzzy Systems
Research Article | Open Access
Takeshi Yamamoto, Katsuhiro Honda, Akira Notsu, Hidetomo Ichihashi, "A Comparative Study on TIBA Imputation Methods in FCMdd-Based Linear Clustering with Relational Data", Advances in Fuzzy Systems, vol. 2011, Article ID 265170, 10 pages, 2011. https://doi.org/10.1155/2011/265170
A Comparative Study on TIBA Imputation Methods in FCMdd-Based Linear Clustering with Relational Data
Abstract
Relational fuzzy clustering has been developed for extracting intrinsic cluster structures of relational data and was extended to a linear fuzzy clustering model based on the Fuzzy c-Medoids (FCMdd) concept, in which a Fuzzy c-Means (FCM)-like iterative algorithm is performed by defining linear cluster prototypes spanned by two representative medoids per prototypical line. In this paper, the FCMdd-type linear clustering model is further modified in order to handle incomplete data including missing values, and the applicability of several imputation methods is compared. Several numerical experiments demonstrate that some pre-imputation strategies contribute to properly selecting the representative medoids of each cluster.
1. Introduction
Relational fuzzy clustering is a relational extension of fuzzy clustering for revealing cluster structures buried in relational data. Relational Fuzzy c-Means (RFCM) [1] extended the Fuzzy c-Means (FCM) [2] clustering criterion by using mutual dissimilarity measures instead of the object-type observations of FCM. Although FCM and other variants of k-Means [3] use the distance between a data point and a cluster prototype as the clustering criterion, RFCM defines the clustering criterion by using mutual dissimilarities only. When the dissimilarities among objects are measured by squared Euclidean distances, the RFCM criterion is equivalent to the centroid-less formulation of the FCM criterion. With other dissimilarity measures, however, the RFCM criterion has no clear connection with distances between data points and prototypes. In k-Medoids [4], cluster prototypes are selected from the data points themselves, and the clustering criterion coincides with a measure of mutual dissimilarity among objects. Therefore, k-Medoids can be directly extended to relational data analysis even when averaging cannot be performed, as in non-Euclidean spaces. Fuzzy c-Medoids (FCMdd) [5] is a fuzzy extension of k-Medoids that can deal with various dissimilarity measures.
Linear fuzzy clustering models [6, 7] extract linear substructures by modifying the point prototypes of FCM into lines, planes, and linear varieties. Because the subspace learning model in each cluster can be identified with fuzzy principal component analysis (fuzzy PCA) [8], they are often regarded as a kind of local principal component analysis (local PCA) [9]. This paper studies the FCMdd-based linear clustering model [10], which can reveal local linear substructures buried in relational data. In [10], Haga et al. defined each prototypical line by using two representative medoids and demonstrated that the clustering model can be applied to Euclidean relational data. The FCMdd-type linear clustering model was further modified for dealing with non-Euclidean relational data [11, 12], in which a data transformation called spread transformation was performed before applying the clustering algorithm, in a similar manner to Non-Euclidean Relational Fuzzy (NERF) c-Means [13].
In this paper, a comparative study on the applicability of spread transformation is performed in FCMdd-based linear clustering of incomplete relational data. Hathaway and Bezdek [14] proposed several methods for imputing (predicting and substituting) missing elements of incomplete relational data and showed that imputation errors can be compensated by spread transformation in NERF c-Means. This paper demonstrates, through several comparative experiments including a document clustering example, that the performance of FCMdd-type linear fuzzy clustering for incomplete relational data can also be improved by spread transformation.
The remaining part of this paper is organized as follows. In Section 2, linear clustering and relational clustering are briefly reviewed. Section 3 introduces the FCMdd-type linear clustering model and applies several imputation methods called TIBA. Comparative results are shown in Section 4, and conclusions are given in Section 5.
2. Linear Clustering and Relational Clustering
2.1. FCM-Type Linear Clustering
Assume that we have $p$-dimensional observations of $n$ patterns $x_i$, $i = 1, \dots, n$. With the goal of partitioning the patterns into $C$ clusters, the objective function for FCM-type clustering is defined as
$$J_{\mathrm{fcm}} = \sum_{c=1}^{C} \sum_{i=1}^{n} u_{ci}^m d_{ci},$$
where $u_{ci}$ is the fuzzy membership degree of pattern $i$ to cluster $c$, and $m$ ($m > 1$) is the fuzzification parameter. The larger the $m$, the fuzzier the membership assignment. $d_{ci}$ is the clustering criterion, which measures the deviation between pattern $i$ and the prototype of cluster $c$. In the original FCM clustering [2], cluster prototypes are given by the centroid vectors $b_c$, and $d_{ci}$ is the squared Euclidean distance:
$$d_{ci} = \|x_i - b_c\|^2.$$
The FCM model reduces to the hard (nonfuzzy) k-Means model [3] when $m \to 1$, in which cluster memberships are given by the nearest-prototype principle.
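As a concrete illustration, the alternating updates above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the function name `fcm` and the setup choices (fixed iteration count, random membership initialization) are ours.

```python
import numpy as np

def fcm(X, C, m=2.0, n_iter=100, seed=0):
    """Sketch of FCM: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((C, n))
    U /= U.sum(axis=0)                       # memberships of each pattern sum to 1
    for _ in range(n_iter):
        Um = U ** m
        B = (Um @ X) / Um.sum(axis=1, keepdims=True)            # centroids b_c
        D = ((X[None, :, :] - B[:, None, :]) ** 2).sum(axis=2)  # d_ci = ||x_i - b_c||^2
        D = np.fmax(D, 1e-12)                # guard against zero distances
        U = D ** (-1.0 / (m - 1))            # u_ci proportional to d_ci^(-1/(m-1))
        U /= U.sum(axis=0)
    return U, B
```

Running it on two well-separated point groups assigns each group to one cluster with near-crisp memberships.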
Besides the point-type prototypes of FCM, Fuzzy c-Lines (FCL) [6] extracts linear clusters using linear prototypes defined as
$$P_c(t) = b_c + t a_c,$$
where $a_c$ is the basis vector of the principal subspace and $b_c$ is the centroid through which the linear prototype passes. The clustering criterion is calculated as
$$d_{ci} = \|x_i - b_c\|^2 - \left( a_c^\top (x_i - b_c) \right)^2.$$
The updating rules for the memberships and the cluster centers are derived as
$$u_{ci} = \left[ \sum_{l=1}^{C} \left( \frac{d_{ci}}{d_{li}} \right)^{\frac{1}{m-1}} \right]^{-1}, \qquad b_c = \frac{\sum_{i=1}^{n} u_{ci}^m x_i}{\sum_{i=1}^{n} u_{ci}^m}.$$
The basis vectors $a_c$ are the principal eigenvectors of the generalized fuzzy scatter matrices
$$S_c = \sum_{i=1}^{n} u_{ci}^m (x_i - b_c)(x_i - b_c)^\top.$$
This linear clustering model has a close relation with local PCA [9]. Indeed, when we consider only a single cluster ($C = 1$), the FCL clustering model is equivalent to conventional PCA, and the basis vector reduces to the principal component vector. In this sense, FCL is a type of local PCA, which simultaneously performs membership estimation (local fuzzy group extraction) and fuzzy PCA [8] in each local fuzzy group, taking the fuzzy membership degrees $u_{ci}$ into account. From the local PCA viewpoint, the prototypical line can be identified with the principal subspace spanned by the fuzzy principal component vector.
When $m \to 1$, the FCL model also reduces to the hard (nonfuzzy) local PCA model [15, 16], in which cluster memberships are given by the nearest-prototype principle.
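The local PCA interpretation can be made concrete: in the standard FCL/FCV formulation, the cluster center is the membership-weighted mean and the basis vector is the principal eigenvector of the fuzzy scatter matrix. The helper below is an illustrative sketch (the function name is ours), assuming vector data and a membership vector for a single cluster.

```python
import numpy as np

def fuzzy_principal_axis(X, u, m=2.0):
    """Fuzzy mean b_c and principal eigenvector a_c of the fuzzy scatter matrix S_c."""
    w = u ** m
    b = (w @ X) / w.sum()                 # membership-weighted (fuzzy) centroid
    Xc = X - b
    S = (Xc * w[:, None]).T @ Xc          # fuzzy scatter matrix S_c
    vals, vecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    return b, vecs[:, -1]                 # eigenvector of the largest eigenvalue
```

For points lying on a line through the origin with unit memberships, the returned axis is (up to sign) the line's direction, matching the conventional PCA result.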
2.2. FCM-Type Relational Clustering
RFCM [1] is the relational extension of FCM. When we have relational data $D = (d_{jk})$ composed of mutual dissimilarities among $n$ patterns, the FCM-type objective function is redefined as
$$J_{\mathrm{rfcm}} = \sum_{c=1}^{C} \frac{\sum_{j=1}^{n} \sum_{k=1}^{n} u_{cj}^m u_{ck}^m d_{jk}}{2 \sum_{k=1}^{n} u_{ck}^m}.$$
$d_{jk}$ can be any type of dissimilarity between patterns $j$ and $k$ but is assumed to be Euclidean-like in RFCM. Indeed, this model is equivalent to FCM only when $d_{jk}$ is the squared Euclidean distance, and the clustering model derives only poor results if the relational information is highly non-Euclidean.
In order to modify RFCM for handling non-Euclidean dissimilarities, Hathaway and Bezdek [13] considered NERF, which includes the following spread transformation:
$$D_\beta = D + \beta (M - I),$$
where $\beta$ is added to the off-diagonal elements of the non-Euclidean relational data $D$. $I$ is the unit matrix, $\beta$ is a suitably chosen scalar, and $M$ is the matrix whose elements are all 1. Hathaway and Bezdek discussed that $D_\beta$ is Euclidean if $P D_\beta P$, with $P = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, is negative semidefinite; that is, if $\beta$ is greater than or equal to the largest eigenvalue of $PDP$. On the other hand, the basic RFCM iteration can be continued as long as the clustering criteria are all nonnegative. In NERF, $\beta$ is therefore gradually increased from 0 to a certain value by monitoring negative values of the clustering criteria.
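A small NumPy sketch of the spread transformation and the Euclidean test follows. It relies on the property stated above: $D$ is Euclidean exactly when $PDP$ is negative semidefinite, and adding $\beta$ to all off-diagonal elements shifts the eigenvalues of $PDP$ on the range of $P$ down by $\beta$, so $\beta = \lambda_{\max}(PDP)$ is the smallest repairing value. Function names are illustrative.

```python
import numpy as np

def spread_transform(D, beta):
    """D_beta = D + beta * (M - I): add beta to every off-diagonal element."""
    n = D.shape[0]
    return D + beta * (np.ones((n, n)) - np.eye(n))

def is_euclidean(D, tol=1e-9):
    """D is Euclidean when P D P is negative semidefinite, P = I - (1/n) 1 1^T."""
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(P @ D @ P).max() <= tol
```

For example, the squared-distance matrix of three "points" with $d_{12} = d_{23} = 1$ and $d_{13} = 16$ is non-Euclidean ($\lambda_{\max}(PDP) = 4$), and the spread transformation with $\beta = 4$ makes it Euclidean (three collinear points at spacing $\sqrt{5}$).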
3. FCMdd-Type Linear Clustering and TIBA Imputation
3.1. FCMdd-Type Linear Clustering
Assume that $d_{jk}$ is the mutual squared Euclidean distance such that $d_{jk} = \|x_j - x_k\|^2$. FCMdd [5] is a fuzzy extension of k-Medoids [4], which performs an FCM-like clustering by selecting the cluster prototypes from the patterns $x_i$, $i = 1, \dots, n$, themselves. The representative objects are called “medoids” and are given by solving combinatorial optimization problems. Haga et al. [10] applied this idea to linear fuzzy clustering, in which each linear prototype is spanned by two representative medoids $v_{c1}$ and $v_{c2}$ as
$$P_c(t) = t v_{c1} + (1 - t) v_{c2}.$$
The squared Euclidean distance between object $x_i$ and the prototypical line gives the clustering criterion $d_{ci}$ of (12), which can be computed from the mutual distances $d_{jk}$ alone.
With fixed fuzzy memberships $u_{ci}$, the optimal medoids are derived by solving the combinatorial optimization problem of minimizing the membership-weighted sum of the clustering criteria, $\sum_{i=1}^{n} u_{ci}^m d_{ci}$, with respect to the medoid pair. The optimal medoid pair of each cluster is searched by enumerating all pairs of objects. In order to reduce the computational cost, a simplified medoid search process was also proposed, in which the medoids are selected only from a restricted candidate subset of objects.
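The exhaustive pair search can be sketched as follows. Since the relational point-to-line criterion of (12) is not reproduced here, this sketch instead assumes that object coordinates are available (as in the artificial data of Section 4) and computes the point-to-line distance geometrically; the search structure, enumerating all pairs and keeping the pair with the smallest membership-weighted cost, is the same. Function names are ours, and candidate objects are assumed distinct.

```python
import numpy as np
from itertools import combinations

def point_line_sqdist(X, p, q):
    """Squared Euclidean distance from each row of X to the line through p and q."""
    d = (q - p) / np.linalg.norm(q - p)   # unit direction (p and q assumed distinct)
    V = X - p
    proj = V @ d
    return (V ** 2).sum(axis=1) - proj ** 2

def best_medoid_pair(X, u, m=2.0):
    """Enumerate all object pairs; keep the pair minimizing the weighted criterion."""
    w = u ** m
    best, best_pair = np.inf, None
    for j, k in combinations(range(len(X)), 2):
        cost = (w * point_line_sqdist(X, X[j], X[k])).sum()
        if cost < best:
            best, best_pair = cost, (j, k)
    return best_pair
```

With collinear high-membership points and a low-membership outlier, the selected medoid pair lies on the dominant line.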
This linear fuzzy clustering model was also extended to the 2D prototype case by spanning 2D prototypical planes using three medoids [10].
Although non-Euclidean relational data may bring negative values of the clustering criterion (12), from the practical viewpoint we have no trouble operating the conventional FCMdd-type linear clustering algorithm as long as no clustering criterion is negative.
Yamamoto et al. [11] proposed a spread transformation procedure for avoiding negative criterion values in FCMdd-type linear clustering. Because a negative criterion value implies a non-Euclidean situation, the relational data should be revised so that the criterion values are always nonnegative. In the previous research [12], it was shown that the clustering criterion is always nonnegative if the triangle inequality is satisfied. Spread transformation should therefore be performed so that the triangle inequality is satisfied for all triples of objects. A plausible value of $\beta$ in each iteration step is obtained from the current criterion values, as in (16): the increment of $\beta$ is positive when some criteria $d_{ci}$ are negative and zero when all of them are nonnegative, so that $\beta$ is monotonically nondecreasing.
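The repairing effect of the spread transformation on the triangle inequality can be made concrete: adding $\beta$ to every off-diagonal element shrinks each violation $d_{jk} - d_{js} - d_{sk}$ by exactly $\beta$, so the worst violation over all triples is the smallest $\beta$ restoring the inequality everywhere. The brute-force $O(n^3)$ helper below illustrates this; it is not the paper's incremental update rule (16), and the function name is ours.

```python
import numpy as np

def beta_for_triangle_inequality(R):
    """Smallest beta such that R + beta*(M - I) satisfies r_jk <= r_js + r_sk
    for all triples: each violation shrinks by exactly beta."""
    n = R.shape[0]
    worst = 0.0
    for j in range(n):
        for k in range(n):
            if j == k:
                continue
            for s in range(n):
                if s in (j, k):
                    continue
                worst = max(worst, R[j, k] - R[j, s] - R[s, k])
    return worst
```

After applying the spread transformation with the returned value, no triple violates the inequality.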
A sample procedure including the automated spread transformation can be summarized as follows:
Step 1. Set $\beta = 0$. Randomly initialize the prototypical medoids (two representative objects) of each cluster.
Step 2. Calculate the clustering criteria $d_{ci}$ by (12).
Step 3. If at least one object has a negative clustering criterion $d_{ci} < 0$, update $\beta$ by (16).
Step 4. Update the fuzzy memberships $u_{ci}$ by (5).
Step 5. Search the medoids in each cluster.
Step 6. Repeat Steps 2–5 until a certain stopping criterion is satisfied.
In Step 6, a stopping criterion such as the maximum change of the memberships falling below a small positive value $\varepsilon$ is used.
Although the proposed model is in the fuzzy clustering category, it is easily seen that a hard (nonfuzzy) version is covered when $m \to 1$, in which cluster memberships are given by the nearest-prototype principle.
3.2. Missing Value Imputation by TIBA
Hathaway and Bezdek [14] demonstrated that spread transformation is also useful for handling missing elements in relational data matrices. Although pre-imputation of missing elements may cause imputation errors and bring adverse effects into the clustering process, spread transformation can decrease these adverse effects.
This paper considers the applicability of several imputation techniques in FCMddtype linear clustering.
Hathaway and Bezdek [14] used three imputation techniques based on triangle-inequality-based approximation (TIBA). The triangle inequality, which Euclidean relational data always satisfy, is represented as follows:
$$d_{jk} \le d_{js} + d_{sk}.$$
Assume that an element $d_{jk}$ of the relational matrix $D$ is missing and is to be pre-imputed before applying the clustering algorithm. Let $S_{jk}$ be the corresponding index set:
$$S_{jk} = \{\, s \mid s \ne j, k; \ d_{js} \text{ and } d_{sk} \text{ are both observed} \,\}.$$
For each $s \in S_{jk}$, the triangle inequality (17) gives $d_{js} + d_{sk}$ as an upper bound of $d_{jk}$. Missing elements are replaced with the minimum upper bound of $d_{jk}$:
$$d_{jk} = \min_{s \in S_{jk}} \left( d_{js} + d_{sk} \right),$$
which is called minimax TIBA. Note that $d_{jk}$ is imputed by a zero value if $S_{jk}$ is empty.
The triangle inequality is also represented as follows:
$$d_{js} \le d_{jk} + d_{ks}, \qquad d_{sk} \le d_{sj} + d_{jk},$$
and brings the following inequalities:
$$d_{jk} \ge d_{js} - d_{sk}, \qquad d_{jk} \ge d_{sk} - d_{js}.$$
So, the lower bound of $d_{jk}$ is given as $|d_{js} - d_{sk}|$. Missing elements are replaced with the maximum lower bound of $d_{jk}$:
$$d_{jk} = \max_{s \in S_{jk}} \left| d_{js} - d_{sk} \right|,$$
which is called maximin TIBA.
It is also possible to combine the previous two imputation values for predicting a reasonable estimate of a missing value: the average of the minimax TIBA and maximin TIBA values is used for imputing each missing element. This variant is called average TIBA.
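The three TIBA schemes can be sketched together. Missing elements are encoded as NaN, and each missing $d_{jk}$ is replaced using the observed two-leg paths $j$–$s$–$k$. The function name and the NaN encoding are choices made here; note also that imputed values are written back into the matrix, so with several missing elements the result can depend on the processing order.

```python
import numpy as np

def tiba_impute(R, method="average"):
    """TIBA imputation sketch for a symmetric dissimilarity matrix with NaN gaps."""
    R = R.copy()
    n = R.shape[0]
    miss = [(j, k) for j in range(n) for k in range(j + 1, n) if np.isnan(R[j, k])]
    for j, k in miss:
        # index set S_jk: objects s with both legs d_js and d_sk observed
        legs = [(R[j, s], R[s, k]) for s in range(n)
                if s not in (j, k) and not np.isnan(R[j, s]) and not np.isnan(R[s, k])]
        if not legs:
            val = 0.0                                 # empty index set: impute zero
        else:
            upper = min(a + b for a, b in legs)       # minimax TIBA
            lower = max(abs(a - b) for a, b in legs)  # maximin TIBA
            val = {"minimax": upper, "maximin": lower,
                   "average": 0.5 * (upper + lower)}[method]
        R[j, k] = R[k, j] = val
    return R
```

For three "points" at positions 0, 1, 3 on a line with $d_{13}$ missing, the single two-leg path gives the minimax value $1 + 2 = 3$, the maximin value $|1 - 2| = 1$, and the average value 2.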
These imputation techniques based on triangle inequalities can be easily applied to relational clustering problems. In the next section, the three imputation approaches are compared in FCMdd-type linear clustering tasks in conjunction with spread transformation.
4. Numerical Experiments
Two experimental results are shown in order to consider the applicability of the three TIBA imputation techniques in FCMdd-type linear clustering with spread transformation.
In previous research, it has been shown that “soft” clustering models outperform “hard” ones in local PCA tasks [15–17], and that “fuzzy” models can be more useful than probabilistic ones [9]. Therefore, in this paper, the characteristics of the fuzzy version are investigated.
4.1. Artificial Data Set
An artificial relational data set composed of 60 patterns was generated from the 2D data set shown in Figure 1, in which the patterns form two line-shaped clusters. It is obvious that the local linear structures cannot be extracted by conventional point-prototype models such as FCM-like models and FCMdd. We made two relational data matrices: the first was generated with the Euclidean norm, and the second with a non-Euclidean norm. The iterative algorithm was performed until the medoids became unchanged, and the model parameters were set to fixed values. In order to demonstrate the characteristics of the algorithm, the initial memberships were given in a supervised manner, that is, $u_{1i} = 1$ for patterns in the first visual cluster and $u_{2i} = 1$ for those in the second one.
In the previous research [12], it was demonstrated that the two linear substructures can be successfully revealed by the FCMdd-based linear clustering algorithm without spread transformation for the Euclidean relational data, whereas for the non-Euclidean norm they can be revealed only with spread transformation.
First, incomplete Euclidean relational data matrices were generated by removing a part of the off-diagonal elements while keeping every index set $S_{jk}$ nonempty. In order to protect the tridiagonal part of the relational data, an upper limit was imposed on the number of missing elements.
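A sketch of such a removal procedure follows, assuming symmetric deletion of randomly chosen off-diagonal pairs while protecting the diagonal and the tridiagonal band; the paper's exact cap on the number of missing elements is not reproduced here, and the function name is ours.

```python
import numpy as np

def remove_elements(R, n_missing, seed=0):
    """Mark randomly chosen symmetric off-diagonal pairs of R as missing (NaN),
    never touching the diagonal or the tridiagonal band."""
    rng = np.random.default_rng(seed)
    R = R.astype(float).copy()
    n = R.shape[0]
    pairs = [(j, k) for j in range(n) for k in range(j + 2, n)]  # |j - k| >= 2 only
    for t in rng.choice(len(pairs), size=n_missing, replace=False):
        j, k = pairs[t]
        R[j, k] = R[k, j] = np.nan
    return R
```

Each deletion removes a symmetric pair, so the missing pattern stays symmetric and the tridiagonal entries remain observed.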
Clustering results with and without spread transformation are compared in Figure 2. Objects were partitioned into two clusters depicted by circles and crosses, where smaller crosses mean that the patterns were shared almost equally by the two clusters. Medoids and prototypical lines are indicated by black circles and lines, respectively.
(a) Minimax TIBA (number of missing elements: 500)
(b) Maximin TIBA (number of missing elements: 400)
(c) Average TIBA (number of missing elements: 500)
Each approximation method with spread transformation could estimate cluster medoids capturing the two visual linear prototypes as long as the number of missing elements was less than about 30%, although the number of patterns with ambiguous memberships increased compared with the complete relational data. Spread transformation was performed with a different $\beta$ value for each approximation. The maximum eigenvalues of $PDP$ after imputation were minimax TIBA: 0.053808, maximin TIBA: 0.1711612, average TIBA: 0.083694. Thus, the TIBA imputation brought a slightly non-Euclidean situation, and spread transformation successfully modified the data set.
On the other hand, without spread transformation, only average TIBA made it possible to extract the linear substructures, while minimax TIBA and maximin TIBA brought inappropriate results in which some patterns, depicted by black diamonds in Figure 2, had negative clustering criterion values.
These results imply that FCMdd-type linear clustering can successfully extract linear substructures of incomplete Euclidean relational data using spread transformation, even though the three imputation techniques can cause non-Euclidean relational matrices.
Second, FCMdd-type linear fuzzy clustering was applied to the non-Euclidean relational data. Incomplete relational data matrices were generated in the same manner as in the Euclidean case. Clustering results are depicted in Figure 3.
(a) Minimax TIBA (number of missing elements: 1000)
(b) Maximin TIBA (number of missing elements: 700)
(c) Average TIBA (number of missing elements: 1100)
With spread transformation, minimax TIBA and average TIBA could extract the linear prototypes as long as the number of missing elements was less than about 60%, and maximin TIBA up to about 40%. A different $\beta$ value was derived for each approximation, and the derived values were still smaller than the corresponding maximum eigenvalues of $PDP$ (e.g., 0.619776 for average TIBA).
Without spread transformation, however, all three TIBAs brought inappropriate partitions because many patterns had negative clustering criterion values.
In this way, spread transformation also works well in incomplete situations.
4.2. Document Clustering
In the second experiment, the TIBA imputation methods are compared in a document classification task. A relational data set was generated using a famous Japanese novel, “Kokoro” by Soseki Natsume. The novel is composed of 3 chapters (“Sensei and I,” “My Parents and I,” “Sensei and His Testament”), and the chapters include 36, 18, and 56 sections, respectively. The text data (in Japanese) can be downloaded from Aozora Bunko (http://www.aozora.gr.jp/). The sections were used as individual text documents, which should be partitioned without the chapter information. The text documents were preprocessed using the “ChaSen” morphological analysis system (http://chasen.naist.jp/hiki/ChaSen/), which segments Japanese text strings into morphemes. Wada et al. [18] performed a PCA-based structural analysis with the 83 most frequently used substantives and verbs weighted by tf-idf and revealed that the chapter structure can be emphasized by using 10 meaningful keywords, as shown in Figure 4, which gives 2D biplots of the principal components. Chapters 2 and 3 form two linear clusters in the 10D data space, and chapter 1 lies on their intersection. In this experiment, the parameters were set with the goal of revealing the two linear substructures, using $C = 2$ clusters.
Two relational data matrices were generated considering the co-occurrence information of the 10 keywords. The Jaccard coefficient and the Dice coefficient are similarity measures for asymmetric information on binary variables [19]. Assume that the co-occurrence information of the keywords in two text documents is summarized in a 2 × 2 contingency table as shown in Table 1, where “1” means occurrence of the keyword.

Let $a$ denote the number of keywords occurring in both documents, and $b$ and $c$ the numbers of keywords occurring in only the first or only the second document, respectively. Jaccard's coefficient is the similarity represented as
$$s^{J}_{ij} = \frac{a}{a + b + c}.$$
Dice's coefficient is also a similarity, represented as
$$s^{D}_{ij} = \frac{2a}{2a + b + c}.$$
Because the linear clustering model uses distance (dissimilarity) measures, the similarity measures were transformed into dissimilarities as $d_{ij} = 1 - s_{ij}$.
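Both coefficients and the assumed $1 - s$ dissimilarity transform can be computed from two binary keyword-occurrence vectors as follows (a sketch; the function name is ours):

```python
import numpy as np

def jaccard_dice_dissim(x, y):
    """Dissimilarities 1 - s from the 2x2 co-occurrence table of two binary
    keyword-occurrence vectors: a = both 1, b = only x, c = only y."""
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    a = np.sum(x & y)
    b = np.sum(x & ~y)
    c = np.sum(~x & y)
    jaccard = a / (a + b + c) if (a + b + c) else 0.0
    dice = 2 * a / (2 * a + b + c) if (2 * a + b + c) else 0.0
    return 1.0 - jaccard, 1.0 - dice
```

For instance, with occurrence vectors (1, 1, 0, 0) and (1, 0, 1, 0), the table gives a = b = c = 1, so the Jaccard dissimilarity is 2/3 and the Dice dissimilarity is 1/2.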
Before applying the FCMdd-based linear fuzzy clustering, randomly selected elements were withheld from the relational matrix of 11,772 elements and imputed by the three TIBA methods. The cluster partitions for Jaccard's coefficient were then derived as shown in Figure 5. The two clusters are depicted by circles and crosses, and small crosses mean ambiguous assignment. The documents were properly partitioned into two clusters reflecting the linear substructures.
(a) Minimax TIBA (number of missing elements: 6200)
(b) Maximin TIBA (number of missing elements: 8000)
(c) Average TIBA (number of missing elements: 7200)
Minimax TIBA achieved a good partition with 50% missing values or fewer, average TIBA with 60% or fewer, and maximin TIBA with 68% or fewer. A different $\beta$ value was derived in spread transformation for each approximation, and the derived values were still smaller than the maximum eigenvalues of $PDP$ without missing elements: minimax TIBA: 5.662, maximin TIBA: 4.548, average TIBA: 2.427.
Clustering results for the Dice coefficient are depicted in Figure 6. Our approach also extracted the linear substructures from the incomplete relational data of the Dice coefficient: minimax TIBA achieved a good partition with 48% missing values or fewer, average TIBA with 55% or fewer, and maximin TIBA with 63% or fewer. A different $\beta$ value was again derived for each approximation, and the derived values were still smaller than the corresponding maximum eigenvalues of $PDP$.
(a) Minimax TIBA (number of missing elements: 5600)
(b) Maximin TIBA (number of missing elements: 7400)
(c) Average TIBA (number of missing elements: 6600)
In the experiments, it was demonstrated that the TIBA imputation methods work well for incomplete non-Euclidean relational data in conjunction with spread transformation.
Finally, a comparison with other methods is discussed. Although many clustering algorithms already exist, some of which are used in document clustering tasks [20], most of them are designed for finding groups composed of similar patterns from the viewpoint of “point prototypes” or “hierarchical aggregation.” For example, Fuzzy c-Medoids (FCMdd) [5], a representative point-prototype model, can be applied to the relational data set of this subsection. Figure 7 shows the clustering results of finding the three chapter structures, depicted by circles, crosses, and triangles; small crosses again mean ambiguous assignment. Such conventional clustering methods are useful for finding document groups based on the mutual similarity among documents (or sometimes keyword groups).
(a) Jaccard (number of missing elements: 8000)
(b) Dice (number of missing elements: 7400)
On the other hand, the proposed method is designed for the different purpose of finding “local linear structures” from the viewpoint of local PCA, which is useful for cluster-wise information summarization such as local feature map construction. In this sense, the proposed method has a different future application area from the conventional clustering tools.
5. Conclusion
This paper compared the applicability of TIBA imputation methods and spread transformation for handling incomplete relational data in FCMdd-type linear clustering. In numerical experiments, the three imputation techniques of minimax TIBA, maximin TIBA, and average TIBA were compared using two data sets. The experimental results indicated that FCMdd-type linear clustering still works well for incomplete data when the imputation is combined with spread transformation, and that all three TIBAs are useful for imputing incomplete non-Euclidean relational data.
From the viewpoint of the local PCA concept, the proposed method can be used for local information summarization or local feature map construction, where data structures are visually summarized in a low-dimensional space in conjunction with data clustering. This application remains for future work. Another potential direction is an extension to multidimensional prototype models, which would be useful for constructing 2D feature maps.
Acknowledgment
This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology, Japan, under Grant-in-Aid for Scientific Research (23500283).
References
[1] R. J. Hathaway, J. W. Davenport, and J. C. Bezdek, “Relational duals of the c-means clustering algorithms,” Pattern Recognition, vol. 22, no. 2, pp. 205–212, 1989.
[2] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, 1981.
[3] J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.
[4] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley-Interscience, 1990.
[5] R. Krishnapuram, A. Joshi, O. Nasraoui, and L. Yi, “Low-complexity fuzzy relational clustering algorithms for Web mining,” IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 595–607, 2001.
[6] J. C. Bezdek, C. Coray, R. Gunderson, and J. Watson, “Detection and characterization of cluster substructure. I. Linear structure: fuzzy c-lines,” SIAM Journal on Applied Mathematics, vol. 40, no. 2, pp. 339–357, 1981.
[7] J. C. Bezdek, C. Coray, R. Gunderson, and J. Watson, “Detection and characterization of cluster substructure. II. Fuzzy c-varieties and convex combinations thereof,” SIAM Journal on Applied Mathematics, vol. 40, no. 2, pp. 358–372, 1981.
[8] Y. Yabuuchi and J. Watada, “Fuzzy principal component analysis and its application,” Biomedical Fuzzy and Human Sciences, vol. 3, pp. 83–92, 1997.
[9] K. Honda and H. Ichihashi, “Regularized linear fuzzy clustering and probabilistic PCA mixture models,” IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 508–516, 2005.
[10] N. Haga, K. Honda, A. Notsu, and H. Ichihashi, “Local subspace learning by extended fuzzy c-medoids clustering,” International Journal of Knowledge Engineering and Soft Data Paradigms, vol. 2, no. 2, pp. 169–181, 2010.
[11] T. Yamamoto, K. Honda, A. Notsu, and H. Ichihashi, “An FCMdd-based linear clustering model for non-Euclidean relational data,” in Proceedings of the 5th International Conference on Soft Computing and Intelligent Systems and 11th International Symposium on Advanced Intelligent Systems, pp. 243–247, 2010.
[12] T. Yamamoto, K. Honda, A. Notsu, and H. Ichihashi, “Non-Euclidean extension of FCMdd-based linear clustering for relational data,” Journal of Advanced Computational Intelligence and Intelligent Informatics, in press.
[13] R. J. Hathaway and J. C. Bezdek, “NERF c-means: non-Euclidean relational fuzzy clustering,” Pattern Recognition, vol. 27, no. 3, pp. 429–437, 1994.
[14] R. J. Hathaway and J. C. Bezdek, “Clustering incomplete relational data using the non-Euclidean relational fuzzy c-means algorithm,” Pattern Recognition Letters, vol. 23, no. 1–3, pp. 151–160, 2002.
[15] N. Kambhatla and T. K. Leen, “Dimension reduction by local principal component analysis,” Neural Computation, vol. 9, no. 7, pp. 1493–1516, 1997.
[16] G. E. Hinton, P. Dayan, and M. Revow, “Modeling the manifolds of images of handwritten digits,” IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 65–74, 1997.
[17] M. E. Tipping and C. M. Bishop, “Mixtures of probabilistic principal component analyzers,” Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.
[18] H. Wada, K. Honda, A. Notsu, and H. Ichihashi, “Document map construction and keyword selection based on local PCA,” in Proceedings of the 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems, pp. 682–685, 2008.
[19] M. R. Anderberg, Cluster Analysis for Applications, Academic Press, 1973.
[20] A. Hotho, A. Nürnberger, and G. Paaß, “A brief survey of text mining,” Journal for Computational Linguistics and Language Technology, vol. 20, pp. 19–62, 2005.
Copyright
Copyright © 2011 Takeshi Yamamoto et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.