Abstract

Relational fuzzy clustering has been developed for extracting intrinsic cluster structures of relational data and was extended to a linear fuzzy clustering model based on the Fuzzy c-Medoids (FCMdd) concept, in which a Fuzzy c-Means- (FCM-) like iterative algorithm is performed by defining linear cluster prototypes using two representative medoids for each line prototype. In this paper, the FCMdd-type linear clustering model is further modified in order to handle incomplete data including missing values, and the applicability of several imputation methods is compared. Several numerical experiments demonstrate that some pre-imputation strategies contribute to properly selecting the representative medoids of each cluster.

1. Introduction

Relational fuzzy clustering is a relational extension of fuzzy clustering for revealing cluster structures buried in relational data. Relational Fuzzy c-Means (RFCM) [1] extended the Fuzzy c-Means (FCM) [2] clustering criterion to mutual dissimilarity measures instead of the object-type observations used in FCM. Although FCM and other variants of k-Means [3] use a clustering criterion based on the distance between a data point and a cluster prototype, RFCM defines its clustering criterion using mutual dissimilarities only. When the dissimilarities among objects are measured by squared Euclidean distances, the RFCM criterion is equivalent to the centroid-less formulation of the FCM criterion. With other dissimilarity measures, however, the RFCM criterion has no clear connection with distances between data points and prototypes. In k-Medoids [4], cluster prototypes are selected from the data points themselves, and the clustering criterion is written directly in terms of the mutual dissimilarities among objects. Thus, k-Medoids can be directly extended to relational data analysis even when averaging cannot be performed, as in non-Euclidean spaces. Fuzzy c-Medoids (FCMdd) [5] is a fuzzy extension of k-Medoids and can deal with various dissimilarity measures.

Linear fuzzy clustering models [6, 7] extract linear substructures by modifying the point prototypes of FCM into lines, planes, and linear varieties. Because the subspace learning model in each cluster can be identified with fuzzy principal component analysis (fuzzy PCA) [8], they are often regarded as a kind of local principal component analysis (local PCA) [9]. This paper studies the FCMdd-based linear clustering model [10], which can reveal local linear substructures buried in relational data. In [10], Haga et al. defined each prototypical line by two representative medoids and demonstrated that the clustering model can be applied to Euclidean relational data. The FCMdd-type linear clustering model was further modified for dealing with non-Euclidean relational data [11, 12], in which a data transformation, called β-spread transformation, was performed before applying the clustering algorithm, in a similar manner to Non-Euclidean Relational Fuzzy c-Means (NERF c-Means) [13].

In this paper, a comparative study on the applicability of β-spread transformation is performed for FCMdd-based linear clustering of incomplete relational data. Hathaway and Bezdek [14] proposed several methods for imputing (predicting and substituting) missing elements of incomplete relational data and showed that imputation errors can be compensated by β-spread transformation in NERF c-Means. This paper demonstrates, through several comparative experiments including an example of document clustering, that the performance of FCMdd-type linear fuzzy clustering for incomplete relational data can also be improved by β-spread transformation.

The remaining part of this paper is organized as follows. In Section 2, linear clustering and relational clustering are briefly reviewed. Section 3 introduces the FCMdd-type linear clustering model and applies several imputation methods based on the triangle inequality-based approximation (TIBA). Comparative results are shown in Section 4, and conclusions are given in Section 5.

2. Linear Clustering and Relational Clustering

2.1. FCM-Type Linear Clustering

Assume that we have p-dimensional observations x_i, i = 1, ..., n, of n patterns. With the goal of partitioning the patterns into C clusters, the objective function for FCM-type clustering is defined as
\[ J = \sum_{c=1}^{C} \sum_{i=1}^{n} u_{ci}^{\theta}\, d_{ci}, \]
where u_{ci} is the fuzzy membership degree of pattern i to cluster c, and θ > 1 is the fuzzification parameter. The larger θ is, the fuzzier the membership assignment becomes. d_{ci} is the clustering criterion, which measures the deviation between pattern i and the prototype of cluster c. In the original FCM clustering [2], cluster prototypes are given by the centroid vectors b_c, and d_{ci} is the squared Euclidean distance
\[ d_{ci} = \| x_i - b_c \|^2 . \]
The FCM model is reduced to the hard (nonfuzzy) k-Means model [3] when θ = 1, in which cluster memberships are given by the nearest prototype principle.
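For concreteness, the following is a minimal Python sketch of one alternating-optimization step of FCM under the notation above; the variable names (X, U, theta) are illustrative and are not taken from the original formulation.

```python
import numpy as np

def fcm_iteration(X, U, theta=2.0, eps=1e-12):
    """One alternating-optimization step of FCM.

    X : (n, p) data matrix, U : (C, n) fuzzy memberships in [0, 1].
    Returns updated prototypes B (C, p), criteria D (C, n), memberships U.
    """
    W = U ** theta
    # centroid prototypes b_c = sum_i u_ci^theta x_i / sum_i u_ci^theta
    B = (W @ X) / W.sum(axis=1, keepdims=True)
    # squared Euclidean criteria d_ci = ||x_i - b_c||^2
    D = ((X[None, :, :] - B[:, None, :]) ** 2).sum(axis=2)
    # standard FCM membership update for theta > 1
    ratio = (D[:, None, :] + eps) / (D[None, :, :] + eps)   # d_ci / d_li
    U_new = 1.0 / (ratio ** (1.0 / (theta - 1.0))).sum(axis=1)
    return B, D, U_new
```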

Besides the point-type prototypes of FCM, Fuzzy c-Lines (FCL) [6] extracts linear clusters using linear prototypes defined as
\[ P_c = \{\, b_c + t\, a_c \mid t \in \mathbb{R} \,\}, \]
where a_c is the (unit-length) basis vector of the principal subspace and b_c is the centroid through which the linear prototype passes. The clustering criterion is calculated as
\[ d_{ci} = \| x_i - b_c \|^2 - \big( a_c^{\top} (x_i - b_c) \big)^2 . \]
The updating rules for the memberships and the cluster centers are derived as
\[ u_{ci} = \left[ \sum_{l=1}^{C} \left( \frac{d_{ci}}{d_{li}} \right)^{1/(\theta - 1)} \right]^{-1}, \qquad b_c = \frac{\sum_{i=1}^{n} u_{ci}^{\theta} x_i}{\sum_{i=1}^{n} u_{ci}^{\theta}} . \]
The basis vectors a_c are the principal eigenvectors of the generalized fuzzy scatter matrices
\[ S_c = \sum_{i=1}^{n} u_{ci}^{\theta} (x_i - b_c)(x_i - b_c)^{\top} . \]
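The FCL prototype update can be sketched in Python as follows; the notation mirrors the reconstruction above and is illustrative only.

```python
import numpy as np

def fcl_prototype_update(X, U, theta=2.0):
    """Update the FCL line prototypes (b_c, a_c) for fixed memberships U.

    X : (n, p) data, U : (C, n) memberships.
    Returns centroids B (C, p), unit basis vectors A (C, p), criteria D (C, n).
    """
    W = U ** theta
    B = (W @ X) / W.sum(axis=1, keepdims=True)          # fuzzy centroids
    A = np.empty_like(B)
    D = np.empty_like(U)
    for c in range(B.shape[0]):
        Z = X - B[c]                                     # centered data
        S = (W[c][:, None] * Z).T @ Z                    # fuzzy scatter matrix S_c
        eigvals, eigvecs = np.linalg.eigh(S)
        A[c] = eigvecs[:, -1]                            # principal eigenvector
        # squared distance from each point to the line b_c + t a_c
        D[c] = (Z ** 2).sum(axis=1) - (Z @ A[c]) ** 2
    return B, A, D
```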

This linear clustering model is closely related to local PCA [9]. Indeed, when we consider only a single cluster (C = 1), the FCL clustering model is equivalent to conventional PCA, and the basis vector reduces to the principal component vector. In this sense, FCL is a type of local PCA, which simultaneously performs membership estimation (local fuzzy group extraction) and fuzzy PCA [8] in each local fuzzy group by taking the fuzzy membership degrees u_{ci} into account. From the local PCA viewpoint, the prototypical line can be identified with the principal subspace spanned by the fuzzy principal component vector.

When θ = 1, the FCL model also reduces to the hard (nonfuzzy) local PCA model [15, 16], in which cluster memberships are given by the nearest prototype principle.

2.2. FCM-Type Relational Clustering

RFCM [1] is the relational extension of FCM. When we have relational data D = (d_{ij}) composed of mutual dissimilarities d_{ij} among the n patterns, the FCM-type objective function is redefined as
\[ J = \sum_{c=1}^{C} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u_{ci}^{\theta} u_{cj}^{\theta}\, d_{ij}}{2 \sum_{j=1}^{n} u_{cj}^{\theta}} . \]
d_{ij} can be any type of dissimilarity between patterns i and j but is assumed to be Euclidean-like in RFCM. Indeed, this model is equivalent to FCM only when d_{ij} is the squared Euclidean distance, and the clustering model yields only poor results if the relational information is highly non-Euclidean.
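A minimal sketch of this centroid-less relational objective, assuming D is an n × n symmetric dissimilarity matrix and U holds fuzzy memberships (names are illustrative):

```python
import numpy as np

def rfcm_objective(D, U, theta=2.0):
    """Centroid-less RFCM objective for relational data.

    D : (n, n) symmetric dissimilarity matrix, U : (C, n) memberships.
    """
    W = U ** theta
    total = 0.0
    for w in W:                       # loop over clusters
        num = w @ D @ w               # sum_i sum_j w_i w_j d_ij
        den = 2.0 * w.sum()
        total += num / den
    return total
```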

In order to modify RFCM for handling non-Euclidean dissimilarities, Hathaway and Bezdek [13] proposed NERF c-Means, which includes the following β-spread transformation:
\[ D_{\beta} = D + \beta\, (\mathbf{1}\mathbf{1}^{\top} - I), \]
where β is added to the off-diagonal elements of the non-Euclidean relational data D. I is the unit matrix, β is a suitably chosen scalar, and \mathbf{1}\mathbf{1}^{\top} is a matrix whose elements are all 1. Hathaway and Bezdek showed that D_β is Euclidean if P D_β P, with P = I - (1/n) \mathbf{1}\mathbf{1}^{\top}, is negative semidefinite; that is, if β is greater than or equal to the largest eigenvalue of P D P. On the other hand, the basic RFCM iteration can be continued as long as the clustering criteria are all nonnegative. In NERF c-Means, β is therefore gradually increased from 0 to a certain value by monitoring the negative elements of the clustering criteria.
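The Euclidean check and the β-spread transformation can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def beta_spread(D, beta):
    """Apply the beta-spread transformation: add beta to all off-diagonal elements."""
    n = D.shape[0]
    return D + beta * (np.ones((n, n)) - np.eye(n))

def minimal_euclidean_beta(D):
    """Smallest beta that makes D_beta Euclidean: the largest eigenvalue of P D P,
    where P = I - (1/n) 1 1^T is the centering projector."""
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return max(0.0, np.linalg.eigvalsh(P @ D @ P).max())
```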

3. FCMdd-Type Linear Clustering and TIBA Imputation

3.1. FCMdd-Type Linear Clustering

Assume that d_{ij} is the mutual Euclidean distance among objects such that d_{ij} = \| x_i - x_j \|^2. FCMdd [5] is a fuzzy extension of k-Medoids [4], which performs an FCM-like clustering by selecting the cluster prototypes from the patterns x_i, i = 1, ..., n. The representative objects are called "medoids" and are given by solving combinatorial optimization problems. Haga et al. [10] applied this idea to linear fuzzy clustering, in which each linear prototype is spanned by two representative medoids x_{q_c} and x_{r_c} as
\[ P_c = \{\, x_{q_c} + t\,( x_{r_c} - x_{q_c} ) \mid t \in \mathbb{R} \,\}. \]
The squared Euclidean distance between object i and the prototypical line is then given, in terms of mutual distances only, as
\[ d_{ci} = d_{i q_c} - \frac{\left( d_{i q_c} - d_{i r_c} + d_{q_c r_c} \right)^2}{4\, d_{q_c r_c}}, \]
where q_c and r_c denote the indices of the two medoids of cluster c.
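Under the squared-Euclidean assumption above, the distance from an object to the line through two medoids can be computed from the relational matrix alone. The following sketch uses illustrative names and is not the authors' code.

```python
import numpy as np

def line_criterion(D, q, r, eps=1e-12):
    """Squared distance from every object to the line through medoids q and r,
    computed purely from a squared-Euclidean relational matrix D (n x n)."""
    a = D[:, q]          # d(i, q)
    b = D[:, r]          # d(i, r)
    c = D[q, r]          # d(q, r)
    # foot-of-perpendicular argument: subtract the squared projection length
    return a - (a - b + c) ** 2 / (4.0 * c + eps)
```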

With fixed fuzzy memberships u_{ci}, the optimal medoids are derived from the following combinatorial optimization problem:
\[ (q_c, r_c) = \arg\min_{q \neq r} \sum_{i=1}^{n} u_{ci}^{\theta}\, d_{ci}(q, r), \]
where d_{ci}(q, r) denotes the clustering criterion computed with medoid candidates q and r. The optimal medoid pair of each cluster is found by enumerating all pairs of objects. In order to reduce the computational cost, a simplified medoid search process was also proposed, in which the medoid candidates are restricted to a subset of the objects.
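A brute-force version of this medoid search could look as follows (an illustrative sketch building on line_criterion above; the exhaustive pair enumeration is O(n^2) per cluster, and the optional candidate list stands in for the simplified search).

```python
import itertools
import numpy as np

def search_medoids(D, u_c, theta=2.0, candidates=None):
    """Exhaustively search the medoid pair minimizing the weighted criterion
    sum_i u_ci^theta * d_ci(q, r) for one cluster.

    D : (n, n) relational matrix, u_c : (n,) memberships of this cluster,
    candidates : optional iterable of candidate indices (simplified search).
    """
    n = D.shape[0]
    idx = range(n) if candidates is None else candidates
    w = u_c ** theta
    best, best_pair = np.inf, None
    for q, r in itertools.combinations(idx, 2):
        crit = line_criterion(D, q, r)          # see the sketch above
        value = float(w @ crit)
        if value < best:
            best, best_pair = value, (q, r)
    return best_pair, best
```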

This linear fuzzy clustering model was also extended to the 2D prototype case by spanning 2D prototypical planes using three medoids [10].

Although non-Euclidean relational data may yield negative values for the clustering criteria of (12), from a practical viewpoint the conventional FCMdd-type linear clustering algorithm can be operated without trouble as long as all clustering criteria remain nonnegative.

Yamamoto et al. [11] proposed a procedure for β-spread transformation so as to avoid negative criterion values in FCMdd-type linear clustering. Because a negative criterion value implies a non-Euclidean situation, the relational data should be revised so that the criterion values are always nonnegative. In the previous research [12], it was shown that the clustering criterion is always nonnegative if the triangle inequality
\[ \sqrt{d_{ik}} \leq \sqrt{d_{ij}} + \sqrt{d_{jk}} \]
is satisfied. Then, β-spread transformation should be performed so that this triangle inequality is satisfied for all objects. A plausible value of β in an iteration step is obtained from the degree of violation of the inequality as in (16). The increment Δβ is positive when some clustering criteria are negative, while it is zero when all criteria are nonnegative, so that β is monotonically increasing.

A sample procedure including the automated β-spread transformation can be summarized as follows:

Step 1. Set β = 0. Randomly initialize the prototypical medoids (two representative objects) of each cluster.

Step 2. Calculate the clustering criteria by (12).

Step 3. If at least one object has a negative clustering criterion, update β by (16).

Step 4. Update fuzzy memberships by (5).

Step 5. Search medoids in each cluster.

Step 6. Repeat Steps 2–5 until a certain stopping criterion is satisfied.

In Step 6, a stopping criterion such as \max_{c,i} | u_{ci}^{\mathrm{NEW}} - u_{ci}^{\mathrm{OLD}} | < \varepsilon is used, where ε is a small positive value.
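The procedure can be sketched end to end as follows. This is a minimal illustration under the assumptions made above (squared-Euclidean relational entries; re-using beta_spread, line_criterion, and search_medoids sketched earlier); in particular, the simple loop that enlarges β until all criteria are nonnegative is only a stand-in for the update rule (16) of [12].

```python
import numpy as np

def fcmdd_linear_clustering(D, C=2, theta=2.0, eps=1e-4, max_iter=100, seed=None):
    """FCMdd-type linear fuzzy clustering with a simple automated beta-spread.

    D : (n, n) relational matrix of (squared) mutual dissimilarities.
    Returns memberships U (C, n), medoid pairs, and the final beta.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    beta = 0.0
    # Step 1: random medoid initialization
    medoids = [tuple(rng.choice(n, size=2, replace=False)) for _ in range(C)]
    U = np.full((C, n), 1.0 / C)
    for _ in range(max_iter):
        Db = beta_spread(D, beta)                         # current beta-spread data
        crit = np.array([line_criterion(Db, q, r) for q, r in medoids])
        # Step 3: enlarge beta until all criteria are nonnegative
        # (a stand-in heuristic, not necessarily the update (16) of [12])
        while crit.min() < 0.0:
            beta += abs(crit.min())
            Db = beta_spread(D, beta)
            crit = np.array([line_criterion(Db, q, r) for q, r in medoids])
        # Step 4: FCM-like membership update
        ratio = (crit[:, None, :] + 1e-12) / (crit[None, :, :] + 1e-12)
        U_new = 1.0 / (ratio ** (1.0 / (theta - 1.0))).sum(axis=1)
        # Step 5: medoid search in each cluster
        medoids = [search_medoids(Db, U_new[c], theta)[0] for c in range(C)]
        if np.abs(U_new - U).max() < eps:                 # Step 6: stopping criterion
            return U_new, medoids, beta
        U = U_new
    return U, medoids, beta
```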

Although the proposed model is in the fuzzy clustering category, it is easily seen that a hard (nonfuzzy) version is covered when θ = 1, in which cluster memberships are given by the nearest prototype principle.

3.2. Missing Value Imputation by TIBA

Hathaway and Bezdek [14] demonstrated that the β-spread transformation is also useful for handling missing elements in relational data matrices. Although preimputation of missing elements may introduce imputation errors and have adverse effects on the clustering process, β-spread transformation can reduce these adverse effects.

This paper considers the applicability of several imputation techniques in FCMdd-type linear clustering.

Hathaway and Bezdek [14] used three imputation techniques based on triangle inequality-based approximation (TIBA). The triangle inequality, which Euclidean relational data always satisfy, is represented as follows:
\[ d_{jk} \leq d_{ji} + d_{ik} . \]
Assume that an element d_{jk} of the relational matrix is missing and is to be preimputed before applying the clustering algorithm. Let R_{jk} be the corresponding index set
\[ R_{jk} = \{\, i \mid i \neq j,\ i \neq k,\ d_{ji} \text{ and } d_{ik} \text{ are both observed} \,\}. \]
For each i ∈ R_{jk}, the triangle inequality (17) gives an upper bound of d_{jk}. Missing elements are replaced with the minimum upper bound of d_{jk}:
\[ \hat{d}_{jk} = \min_{i \in R_{jk}} \left( d_{ji} + d_{ik} \right), \]
which is called minimax TIBA. d_{jk} is imputed by the zero value if R_{jk} is empty.

The triangle inequality can also be written as
\[ d_{ji} \leq d_{jk} + d_{ik}, \qquad d_{ik} \leq d_{ij} + d_{jk}, \]
which brings the following inequalities:
\[ d_{jk} \geq d_{ji} - d_{ik}, \qquad d_{jk} \geq d_{ik} - d_{ji} . \]
So, a lower bound of d_{jk} is given as | d_{ji} - d_{ik} |. Missing elements are replaced with the maximum lower bound of d_{jk}:
\[ \hat{d}_{jk} = \max_{i \in R_{jk}} \left| d_{ji} - d_{ik} \right|, \]
which is called maximin TIBA.

It is also possible to combine the previous two imputation values to predict a reasonable estimate of a missing value. The average of the minimax TIBA and maximin TIBA values is used for imputing missing elements; this variant is called average TIBA.
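A compact sketch of the three TIBA schemes, assuming missing entries of the symmetric relational matrix are marked with NaN (names and the marking convention are illustrative):

```python
import numpy as np

def tiba_impute(D, mode="average"):
    """Impute missing (NaN) off-diagonal entries of a symmetric relational matrix
    by minimax, maximin, or average TIBA, using only originally observed entries."""
    D = D.copy()
    n = D.shape[0]
    obs = ~np.isnan(D)
    for j in range(n):
        for k in range(j + 1, n):
            if obs[j, k]:
                continue
            # index set R_jk: intermediaries with both d_ji and d_ik observed
            idx = [i for i in range(n) if i not in (j, k) and obs[j, i] and obs[i, k]]
            if not idx:
                value = 0.0                                       # empty index set
            else:
                upper = min(D[j, i] + D[i, k] for i in idx)       # minimax TIBA
                lower = max(abs(D[j, i] - D[i, k]) for i in idx)  # maximin TIBA
                value = {"minimax": upper, "maximin": lower,
                         "average": 0.5 * (upper + lower)}[mode]
            D[j, k] = D[k, j] = value
    return D
```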

These imputation techniques based on triangle inequalities can be easily applied to relational clustering problems. In the next section, the three imputation approaches are compared in FCMdd-type linear clustering tasks in conjunction with β-spread transformation.

4. Numerical Experiments

Two experimental results are shown in order to consider the applicability of the three TIBA imputation techniques in FCMdd-type linear clustering with β-spread transformation.

In previous research, it has been shown that "soft" clustering models outperform "hard" ones in local PCA tasks [15–17], and "fuzzy" models can be more useful than probabilistic ones [9]. Therefore, in this paper, the characteristics of the fuzzy version are investigated.

4.1. Artificial Data Set

An artificial relational data set composed of 60 patterns was generated from the 2D data set shown in Figure 1, in which the patterns form two line-shaped clusters. It is obvious that these local linear structures cannot be extracted by conventional point-prototype models such as FCM-like models and FCMdd. We made two relational data matrices: the first was generated with the Euclidean norm, and the second with a non-Euclidean norm. The iterative algorithm was performed until the medoids became unchanged, the number of clusters was set to two, and the fuzzification parameter was fixed. In order to demonstrate the characteristics of the algorithm, the initial memberships were given in a supervised manner; that is, each pattern was initially assigned with full membership to the first or the second visual cluster according to its visual group.

In the previous research [12], it was demonstrated that the two linear substructures can be successfully revealed by the FCMdd-based linear clustering algorithm without β-spread transformation for the Euclidean relational data, while they can be revealed only with β-spread transformation for the non-Euclidean norm.

First, incomplete Euclidean relational data matrices were generated by removing a part of the off-diagonal elements in such a way that the corresponding index sets used in TIBA remained nonempty. In order to protect the tridiagonal part of the relational data, an upper bound was imposed on the number of missing elements.

Clustering results with and without β-spread transformation are compared in Figure 2. Objects were partitioned into two clusters depicted by circles and crosses (×), and smaller crosses mean that the pattern was shared almost equally by the two clusters. Medoids and prototypical lines are indicated by black circles and lines, respectively.

Each approximation method with β-spread transformation could estimate cluster medoids capturing the two visual linear prototypes as long as the number of missing elements was less than about 30%, although the number of patterns with ambiguous memberships increased compared with the complete relational data. β-spread transformation was performed for each approximation with the automatically derived β values. Here, the maximum eigenvalues of PDP after imputation were 0.053808 for minimax TIBA, 0.1711612 for maximin TIBA, and 0.083694 for average TIBA. So, the TIBA imputation brought a slightly non-Euclidean situation, and β-spread transformation successfully modified the data set.

On the other hand, without β-spread transformation, only average TIBA made it possible to extract the linear substructures, while minimax TIBA and maximin TIBA brought inappropriate results in which some patterns, depicted by black diamonds in Figure 2, had negative clustering criterion values.

These results imply that FCMdd-type linear clustering can successfully extract linear substructures of incomplete Euclidean relational data using β-spread transformation, although the three imputation techniques make the relational matrices non-Euclidean.

Second, FCMdd-type linear fuzzy clustering was applied to non-Euclidean relational data.

Incomplete relational data matrices were generated in the same manner as in the Euclidean case. Clustering results are depicted in Figure 3.

With β-spread transformation, minimax TIBA and average TIBA could extract the linear prototypes as long as the number of missing elements was less than about 60%, and maximin TIBA could do so up to about 40%. β-spread transformation was performed with the automatically derived β values, which are still smaller than the corresponding maximum eigenvalues of PDP (e.g., 0.619776 for average TIBA).

Without β-spread transformation, however, all three TIBAs brought inappropriate partitions because many patterns had negative clustering criterion values.

In this way, β-spread transformation also works well in incomplete situations.

4.2. Document Clustering

In the second experiment, the TIBA imputation methods are compared in a document classification task. A relational data set was generated using the famous Japanese novel "Kokoro" by Soseki Natsume. The novel is composed of 3 chapters (Sensei and I, My Parents and I, and Sensei and His Testament), and the chapters include 36, 18, and 56 sections, respectively. The text data (Japanese language) can be downloaded from Aozora Bunko (http://www.aozora.gr.jp/). The sections were used as individual text documents, which should be partitioned without the chapter information. The text documents were preprocessed using the "ChaSen" morphological analysis software (http://chasen.naist.jp/hiki/ChaSen/), which segments Japanese text strings into morphemes. Wada et al. [18] performed a PCA-based structural analysis with the 83 most frequently used substantives and verbs weighted by tf-idf and revealed that the chapter structure can be emphasized by using 10 meaningful keywords, as shown in Figure 4, which gives 2D biplots of the principal components. Chapters 2 and 3 form two linear clusters in the 10D data space, and chapter 1 lies on their intersection. In this experiment, the number of clusters was therefore set to two with the goal of revealing the two linear substructures.

Two relational data matrices were generated considering the cooccurrence information of the 10 keywords. The Jaccard coefficient and the Dice coefficient are similarity measures for asymmetric information on binary variables [19]. Assume that the cooccurrence information of the keywords between two text documents is summarized in a 2 × 2 contingency table as shown in Table 1, where "1" means occurrence of the keyword.

Jaccard's coefficient is the similarity represented as
\[ s^{\mathrm{Jaccard}} = \frac{a}{a + b + c}, \]
where a is the number of keywords occurring in both documents and b and c are the numbers of keywords occurring in only one of the two documents.

Dice's coefficient is also a similarity, represented as
\[ s^{\mathrm{Dice}} = \frac{2a}{2a + b + c} . \]

Because the linear clustering model uses distance (dissimilarity) measures, the similarity measures were transformed into dissimilarity ones.
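A sketch of building such a relational matrix from a binary document–keyword matrix follows; the similarity-to-dissimilarity conversion shown here (1 − s) is one common choice and not necessarily the one used in the original experiment.

```python
import numpy as np

def keyword_dissimilarity(B, measure="jaccard"):
    """Pairwise dissimilarity between documents from a binary matrix B (n_docs, n_keywords),
    where B[i, k] = 1 if keyword k occurs in document i."""
    n = B.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a = np.sum((B[i] == 1) & (B[j] == 1))   # keywords in both documents
            b = np.sum((B[i] == 1) & (B[j] == 0))   # only in document i
            c = np.sum((B[i] == 0) & (B[j] == 1))   # only in document j
            if measure == "jaccard":
                s = a / (a + b + c) if (a + b + c) > 0 else 0.0
            else:                                    # Dice
                s = 2 * a / (2 * a + b + c) if (2 * a + b + c) > 0 else 0.0
            D[i, j] = D[j, i] = 1.0 - s              # one common similarity-to-dissimilarity map
    return D
```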

Before applying the FCMdd-based linear fuzzy clustering, randomly selected elements were withheld from the relational matrix with 11,772 elements and were imputed by the three TIBA methods. The cluster partitions obtained for Jaccard's coefficient are shown in Figure 5. The two clusters are depicted by circles and crosses (×), and small crosses mean ambiguous assignment. The documents were properly partitioned into two clusters reflecting the linear substructures.

Minimax TIBA gave a good partition with 50% missing values or fewer, average TIBA with 60% or fewer, and maximin TIBA with 68% or fewer. β-spread transformation was performed with the automatically derived β values, which are still smaller than the maximum eigenvalues of PDP without missing elements: 5.662 for minimax TIBA, 4.548 for maximin TIBA, and 2.427 for average TIBA.

Clustering results for the Dice coefficient are depicted in Figure 6. Our approach also extracted the linear substructures from the incomplete relational data based on the Dice coefficient. Minimax TIBA gave a good partition with 48% missing values or fewer, average TIBA with 55% or fewer, and maximin TIBA with 63% or fewer. β-spread transformation was again performed with the automatically derived β values, which are still smaller than the corresponding maximum eigenvalues of PDP.

These experiments demonstrated that the TIBA imputation methods work well for incomplete non-Euclidean relational data in conjunction with β-spread transformation.

Finally, comparison with other methods is discussed. Although many clustering algorithms are available, some of which are used in document clustering tasks [20], most of them are designed for finding groups composed of similar patterns from the viewpoint of "point prototypes" or "hierarchical aggregation". For example, Fuzzy c-Medoids (FCMdd) [5], which is a representative point-prototype model, can be applied to the relational data set of this subsection. Figure 7 shows the clustering results of finding the three chapter structures, depicted by circles, crosses, and triangles; small crosses again mean ambiguous assignment. The conventional clustering methods are useful for finding such document groups (or sometimes keyword groups) based on the mutual similarity among documents.

On the other hand, the proposed method is designed for the different purpose of finding "local linear structures" from the viewpoint of local PCA, which is useful for cluster-wise information summarization such as local feature map construction. In this sense, the proposed method has a different future application area from the conventional clustering tools.

5. Conclusion

This paper compared the applicability of TIBA imputation methods and β-spread transformation for handling incomplete relational data in FCMdd-type linear clustering. In numerical experiments, the three imputation techniques of minimax TIBA, maximin TIBA, and average TIBA were compared using two data sets. The experimental results indicated that the TIBA imputation methods work well for incomplete data in conjunction with β-spread transformation, and all three TIBAs are useful for imputing incomplete non-Euclidean relational data.

From the viewpoint of the local PCA concept, the proposed method can be used for local information summarization or local feature map construction, where data structures are visually summarized in a low-dimensional space in conjunction with data clustering. This application remains as future work. Another potential future work is an extension to multidimensional prototype models, which would be useful for constructing 2D feature maps.

Acknowledgment

This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology, Japan, under Grant-in-Aid for Scientific Research (23500283).