Abstract
Fuzzy clustering allows an object to belong to multiple clusters and represents the affiliation of objects to clusters by memberships. It is extended to fuzzy coclustering by assigning membership functions to both objects and features. In this paper we propose a new fuzzy triclustering (FTC) algorithm for automatic categorization of three-dimensional data collections. FTC specifies a membership function for each dimension and generates fuzzy clusters simultaneously on all three dimensions. FTC thus divides a three-dimensional cube into many small blocks, which should be triclusters with strong coherent bonding among their members. Experimental studies on MovieLens demonstrate the strength of FTC in terms of accuracy compared with some recent popular fuzzy clustering and coclustering approaches.
1. Introduction
Clustering is one of the most important techniques of exploratory data mining and is used in many applications, such as automatic categorization of documents, grouping of search engine results, and detection of social communities [1, 2]. Clustering algorithms try to detect intrinsic structures of data objects so that a set of clusters is generated in which intracluster similarity is maximized and intercluster similarity is minimized.
There is already a large body of work that investigates approaches to clustering objects [3–5]. According to the rules by which data objects are accumulated and the methods that employ these rules, clustering algorithms can be divided into four types: hierarchical clustering, partitional clustering, density- and grid-based clustering, and others [6]. Hierarchical and partitional clustering methods are used in many applications, and their representative algorithms are the agglomerative hierarchical clustering algorithm and the K-Means algorithm, respectively. Generally, these algorithms put each object into a single cluster. However, one object may contain multiple subjects and characteristics, so it could belong to multiple clusters [7]. This motivated the development of soft clustering algorithms.
Unlike hard clustering algorithms, where each object belongs to exactly one cluster, soft clustering allows an object to belong to multiple clusters. Introducing fuzziness into clustering gives flexible solutions for soft clustering [8]. Fuzzy clustering extends traditional clustering and represents the affiliation of objects to clusters by memberships. Fuzzy C-Means (FCM) [9] is a representative fuzzy clustering algorithm, and many variants are based on it [5, 8]. Nowadays, as most web datasets are large and high dimensional, it is increasingly challenging to develop a satisfactory clustering process. Still, some new clustering algorithms have been designed, such as fuzzy coclustering, which combines fuzzy clustering with coclustering. Coclustering is a technique for extracting object-feature structures from cooccurrence information of objects and features, and it handles high dimensional data effectively by simultaneously grouping objects and features based on the cooccurrence information [10–12]. Several fuzzy coclustering algorithms have been proposed. Oh et al. [8] presented fuzzy clustering for categorical multivariate data (FCCM). Tjhi and Chen [13] designed fuzzy coclustering with Ruspini’s condition (FCR). In 2013, Hanmandlu et al. [14] proposed a fuzzy coclustering algorithm for images (FCCI). Besides FCR, Tjhi and Chen introduced two further algorithms: possibilistic fuzzy coclustering (PFCC) [15] and robust fuzzy coclustering (RFCC) [16]. In 2013, Yan et al. [17] proposed a heuristic semisupervised fuzzy coclustering algorithm (SSHFCR) for categorization of large web documents. Keeping the benefits of both coclustering and fuzzy clustering, these algorithms improve the representation of overlapping clusters by using fuzzy membership functions and are suitable for categorizing documents, particularly web documents on the Internet.
Although the above clustering algorithms show good clustering quality, they are insufficient in applications such as time-location-type environmental sensor monitoring, time-author-keywords social network analysis, and source-destination-text web graph mining. In these applications, the data space has three dimensions. None of the existing coclustering algorithms is designed to work in this scenario.
Motivated by these three-dimensional clustering requirements, we propose a novel fuzzy triclustering algorithm, called FTC. The approach builds on popular algorithms for standard fuzzy clustering and fuzzy coclustering and extends them to support three dimensions.
The remainder of this paper is organized as follows. In Section 2, we provide a literature review of fuzzy standard clustering and coclustering algorithms. Section 3 introduces in detail the proposed FTC. Section 4 presents our experimental results on the MovieLens dataset. Finally, we conclude our work.
2. Related Work
In this section, we briefly review two types of clustering algorithms that have been proposed earlier: fuzzy clustering and fuzzy coclustering. These algorithms help in understanding FTC, which is introduced in the next section. The mathematical notations used in this paper are explained in the Mathematical Notations section.
2.1. Fuzzy Clustering
FCM is the classical fuzzy clustering algorithm; it is known as the fuzzy version of K-Means and has been studied by many researchers. FCM aims at minimizing the objective function in (1), subject to the membership constraint in (2). The quantities appearing in (1) are the objects, the cluster centroids, and a measure of the distances between objects and clusters. The former item in (1) means that each object should be put into a nearby cluster, which achieves the purpose of clustering: intracluster similarity is maximized and intercluster similarity is minimized. The latter item controls the fuzziness of the clustering through a weighting parameter that specifies the degree of fuzziness.
Lagrange’s method of undetermined multipliers is used to optimize the objective function. The solution of this constrained optimization problem can be approximated by Picard iteration and alternating optimization through equations (3) and (4), which are the update equations for the object memberships and cluster centroids, respectively, in each iteration. Consider the following:
In FCM, every point has a set of coefficients giving its degree of membership in each cluster, and the centroid of a cluster is the mean of all points, weighted by their degrees of belonging to that cluster.
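The alternating updates described above can be illustrated with the textbook form of fuzzy C-means. This is a minimal sketch rather than the exact formulation of (1)–(4): it assumes the common objective with a fuzzifier exponent `m`, squared Euclidean distance, and a stopping rule on the largest membership change. The function name `fcm` and all parameter names are illustrative.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, max_iter=100, tol=1e-4, seed=0):
    """Textbook fuzzy C-means (assumed form). X has shape (N, D).

    Returns U with shape (C, N), whose columns sum to 1, and the centroids P (C, D).
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each object's memberships sum to 1.
    U = rng.random((n_clusters, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Centroid update: mean of all points weighted by u_ci^m.
        P = (W @ X) / W.sum(axis=1, keepdims=True)
        # Squared Euclidean distance between every centroid and every object.
        d2 = ((P[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ci proportional to d_ci^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, P
```

Calling `fcm(X, 2)` on two well-separated groups of points yields memberships close to 0 or 1; overlapping groups yield intermediate values, which is the behaviour that distinguishes fuzzy from hard clustering.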
Hundreds of FCM-variant algorithms designed on the basic principle of FCM have appeared in the literature, which shows that FCM remains important and popular in clustering analysis.
2.2. Fuzzy Coclustering
Coclustering is an important technique in data mining that simultaneously clusters objects and features. Compared with standard clustering, coclustering offers several benefits [13, 16], including dimensionality reduction, interpretable document clusters, and improved accuracy due to local models in clustering.
Fuzzy coclustering extends coclustering by adding fuzzy set theory to coclusters. Besides the above three benefits, fuzzy coclustering can generate coclusters that are more realistic, because one object may contain multiple subjects and one feature may contain multiple concepts.
Because of these advantages, fuzzy coclustering algorithms have been widely studied and developed. Prominent algorithms proposed earlier include FCCM, FCR, FCCI, and PFCC. We briefly review FCCM and FCCI here, since they form the theoretical basis of FTC.
FCCM tries to maximize an objective function defined in (5) to complete coclustering, subject to the membership constraints in (6) and (7). Consider the following:
In (5), there are three terms. The first term is the degree of aggregation, which tries to group highly interrelated objects and features together. It involves two membership functions, indicating object membership and feature membership, respectively. The second and third terms control the fuzziness of the clustering through two parameters that adjust the levels of fuzziness of the object and feature memberships, respectively.
By applying the Lagrange multiplier method, the solution of this constrained optimization problem can be approximated by iteratively updating the following equations in an alternating fashion:
To make fuzzy coclustering easier to understand, Oh et al. gave a numerical example. They applied FCCM to a literature retrieval data set that records the cooccurrence relations between documents and keywords; the rows represent the documents and the columns represent the keywords. The results of FCCM were shown as two tables. The first table listed the object memberships of the documents, and the second listed the feature memberships of the keywords. Larger memberships mean that documents and keywords are more likely to belong to a cluster; thus the final clusters can be generated from the two tables.
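The alternating scheme can be sketched in a few lines. The sketch below assumes the entropy-regularised form usually given for FCCM: the aggregation term is the sum of `u_ci * v_cj * d_ij`, object memberships sum to 1 over clusters, feature memberships sum to 1 over features within each cluster, and each update is a softmax scaled by its fuzziness parameter. The names `fccm`, `T_u`, and `T_v` are illustrative, and the toy matrix is not the data set of Oh et al.

```python
import numpy as np

def _softmax(Z, axis):
    Z = Z - Z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def fccm(D, n_clusters, T_u=0.1, T_v=0.1, max_iter=50, tol=1e-4, seed=0):
    """FCCM-style fuzzy coclustering (assumed form) of a cooccurrence matrix D (N, K).

    Returns U (C, N) with columns summing to 1 and V (C, K) with rows summing to 1.
    """
    rng = np.random.default_rng(seed)
    N, K = D.shape
    V = rng.random((n_clusters, K))
    V /= V.sum(axis=1, keepdims=True)
    U = np.full((n_clusters, N), 1.0 / n_clusters)
    for _ in range(max_iter):
        # Object memberships: softmax over clusters of sum_j v_cj * d_ij / T_u.
        U_new = _softmax((V @ D.T) / T_u, axis=0)
        # Feature memberships: softmax over features of sum_i u_ci * d_ij / T_v.
        V = _softmax((U_new @ D) / T_v, axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Toy document-keyword matrix with two obvious blocks.
D = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
U, V = fccm(D, 2)
```

Reading `U` and `V` side by side plays the role of the two membership tables. One design point worth noting: when the fuzziness parameters are large relative to the scale of `D`, the entropy terms dominate and every membership drifts toward the uniform value, so the parameters have to be chosen with the data scale in mind.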
Besides FCCM, FCCI is a color segmentation technique based on fuzzy coclustering, in which both the objects and the features are assigned membership functions. In FCCI, the objective function in (9) is minimized subject to the membership constraints in (10) and (11). Consider the following:
The first term in (9) captures the distance relationship. It uses a distance equal to the squared Euclidean distance between a feature data point and the corresponding feature cluster centroid. Minimizing this term assigns an object a higher membership value for the cluster whose feature centroid it is closest to, with more weight on the features that are more relevant to that cluster. The second and third terms in (9) contribute to the fuzziness of the resulting clusters through two weighting parameters that specify the degree of fuzziness.
The solution of this constrained optimization problem can be approximated by iteratively updating the following equations in an alternating fashion:
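A distance-based variant of the same alternating scheme can be sketched as follows. It assumes an FCCI-style objective in which the cost of placing object `i` in cluster `c` is the membership-weighted sum over features of the squared difference between `x_ij` and the centroid `p_cj`, with entropy terms weighted by `T_u` and `T_v`. The function name, parameter names, and the centroid update (a membership-weighted mean) are assumptions for illustration, not a transcription of (9)–(12).

```python
import numpy as np

def _softmax(Z, axis):
    Z = Z - Z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def fcci(X, n_clusters, T_u=1.0, T_v=1.0, max_iter=50, tol=1e-4, seed=0):
    """FCCI-style fuzzy coclustering (assumed form). X has shape (N, K).

    Returns U (C, N), V (C, K), and the feature centroids P (C, K).
    """
    rng = np.random.default_rng(seed)
    N, K = X.shape
    U = rng.random((n_clusters, N))
    U /= U.sum(axis=0, keepdims=True)
    V = np.full((n_clusters, K), 1.0 / K)
    for _ in range(max_iter):
        # Centroids: membership-weighted mean of each feature.
        P = (U @ X) / np.maximum(U.sum(axis=1, keepdims=True), 1e-12)
        # Squared distance of every feature value to every centroid, shape (C, N, K).
        Dc = (X[None, :, :] - P[:, None, :]) ** 2
        # Smaller weighted distance gives larger membership, hence the minus sign.
        U_new = _softmax(-(Dc * V[:, None, :]).sum(axis=2) / T_u, axis=0)
        V = _softmax(-(Dc * U_new[:, :, None]).sum(axis=1) / T_v, axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V, P
```

Compared with the aggregation-based scheme, the only structural differences are the extra centroid update and the sign inside the softmax, which reflects minimising a distance instead of maximising a relatedness score.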
3. Proposed FTC Algorithm
Although experimental results show that the clustering quality of the above algorithms is encouraging, they are insufficient in some scenarios. Two examples follow.
Scenario 1. Users submit queries to search engines and obtain search results. When grouping users, we generally employ the user-document matrix and neglect the queries submitted. Sometimes, however, the query information is useful: a user selects a document precisely because he submitted a related query beforehand. The problem thus becomes one of conditional probability, and users, queries, and documents form a three-dimensional space.
Scenario 2. We could model user interests by mining the documents a user clicked. However, the interests of a user may change over time. If we analyze all the documents he browsed without regard to time, the drift in user interest cannot be mined. Time is therefore an important factor. When we take time into account, the data format changes from a user-document contingency table to a user-document-time three-dimensional matrix.
In these scenarios, the data space has three dimensions, and the data provided to us is not a contingency table of data objects and features but a three-dimensional matrix. Current clustering algorithms primarily target contingency tables, and few address three-dimensional data. Gnatyshak et al. [18] proposed two methods for biclustering and triclustering data collected from online social networks; they could reveal users’ interests as tags and use them to describe Vkontakte groups with triclustering. In the context of two relational datasets that share labels along one of the dimensions, Mahiskar et al. [19] processed the two datasets simultaneously to unveil triclusters and presented a triclustering algorithm that searches for meaningful combinations of biclusters in two related datasets. Guigourès et al. [20] introduced a technique to track structures in time-evolving graphs, based on a parameter-free approach for three-dimensional coclustering of the source vertices, the target vertices, and the time. The above algorithms implement triclustering without fuzzy set theory, which means they concentrate on hard clustering.
We propose a novel fuzzy triclustering algorithm, FTC, in this paper. With FTC, we aim to keep the benefits of coclustering/triclustering and of fuzzy clustering simultaneously. As indicated in Figure 1, the new algorithm essentially aims at fuzzy three-dimensional clustering (Figure 1(e)). Figures 1(a) and 1(b) refer to hard clustering and hard coclustering, respectively. Figures 1(c) and 1(d) both indicate fuzzy coclustering. The latter is more flexible than the former because, in the latter mode, each cluster can contain only part of the elements of some rows and columns. We extend the setting of Figure 1(d) from two dimensions to three and in turn extend fuzzy coclustering algorithms to the fuzzy triclustering (FTC) algorithm.
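Figure 1: (a) hard clustering; (b) hard coclustering; (c) and (d) fuzzy coclustering; (e) fuzzy three-dimensional clustering.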
The goal of FTC is to maximize the objective function in (13), subject to the following constraints in (14), (15), and (16). Consider the following:
The first term in (13) is the degree of aggregation to be maximized by the triclustering; it is intended to group highly related elements of the cube in Figure 1(e) into the same tricluster. In other words, the cube in Figure 1(e) is divided into many small blocks, which should be triclusters with strong coherent bonding among their members along all three dimensions. The term involves three membership functions, one for each dimension. It is similar to the first term in the objective function of FCCM, which is the componentwise inner product of two matrices. The second, third, and fourth terms are entropy regularization factors, one for each of the three sets of memberships. They control the degree of fuzziness in the final clusters through three weighting parameters.
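For orientation, one objective that matches this description is written out below. It is a hedged reconstruction, assuming that FTC extends the FCCM objective directly to three dimensions; it should not be read as the authors' exact equations. The symbols $w_{ck}$, $T_w$, $d_{ijk}$, the dimension sizes $N$, $M$, $K$, and the multiplier $\lambda_i$ are naming assumptions introduced here.

```latex
% Assumed form: FCCM-style objective extended to three dimensions.
\begin{aligned}
\max_{U,V,W}\; J ={}& \sum_{c=1}^{C}\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k=1}^{K}
      u_{ci}\,v_{cj}\,w_{ck}\,d_{ijk}
   - T_u \sum_{c,i} u_{ci}\ln u_{ci}
   - T_v \sum_{c,j} v_{cj}\ln v_{cj}
   - T_w \sum_{c,k} w_{ck}\ln w_{ck} \\
\text{s.t.}\;& \sum_{c=1}^{C} u_{ci} = 1 \;\;\forall i, \qquad
   \sum_{j=1}^{M} v_{cj} = 1 \;\;\forall c, \qquad
   \sum_{k=1}^{K} w_{ck} = 1 \;\;\forall c \\[6pt]
% Stationarity of the Lagrangian in u_ci, with multiplier lambda_i:
0 ={}& \sum_{j,k} v_{cj}\,w_{ck}\,d_{ijk} - T_u\,(\ln u_{ci} + 1) + \lambda_i \\
\Longrightarrow\; u_{ci} ={}&
   \frac{\exp\!\big(\tfrac{1}{T_u}\sum_{j,k} v_{cj}\,w_{ck}\,d_{ijk}\big)}
        {\sum_{f=1}^{C}\exp\!\big(\tfrac{1}{T_u}\sum_{j,k} v_{fj}\,w_{fk}\,d_{ijk}\big)} \\[6pt]
v_{cj} ={}&
   \frac{\exp\!\big(\tfrac{1}{T_v}\sum_{i,k} u_{ci}\,w_{ck}\,d_{ijk}\big)}
        {\sum_{q=1}^{M}\exp\!\big(\tfrac{1}{T_v}\sum_{i,k} u_{ci}\,w_{ck}\,d_{iqk}\big)},
\qquad
w_{ck} =
   \frac{\exp\!\big(\tfrac{1}{T_w}\sum_{i,j} u_{ci}\,v_{cj}\,d_{ijk}\big)}
        {\sum_{r=1}^{K}\exp\!\big(\tfrac{1}{T_w}\sum_{i,j} u_{ci}\,v_{cj}\,d_{ijr}\big)}
\end{aligned}
```

Under these assumptions each membership update is a softmax: the multiplier only contributes a normalising constant, which is fixed by the corresponding sum-to-one constraint.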
The constrained optimization of FTC can be solved by applying one set of Lagrange multipliers to each of the constraints in (14), (15), and (16). Consider the following:
Taking the partial derivatives of the Lagrangian in (17) with respect to the three membership matrices (U, V, and that of the third dimension) and setting the gradient to zero, we have
Solving the above equations yields the formulae for the three memberships as
Equations (19), (20), and (21) are the update equations for the memberships of the three dimensions, through which the solution of the constrained optimization problem in (17) can be approximated by Picard iteration. Therefore FTC can be written as Pseudocode 1.
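A runnable sketch of such an iteration is given below. It is not Pseudocode 1: it assumes an FCCM-style objective extended to three dimensions (an aggregation term summing `u_ci * v_cj * w_ck * d_ijk`, minus one entropy term per dimension), so that each update is a softmax. The names `ftc`, `W`, `T_w`, the initialisation, and the stopping rule on the objective value are all illustrative assumptions.

```python
import numpy as np

def _softmax(Z, axis):
    Z = Z - Z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def _plogp(A):
    # Sum of a * ln(a), with 0 * ln(0) treated as 0.
    return float((A * np.log(np.maximum(A, 1e-300))).sum())

def ftc(D, n_clusters, T_u=1.0, T_v=1.0, T_w=0.5, max_iter=50, tol=1e-4, seed=0):
    """Fuzzy triclustering sketch (assumed form) for a tensor D of shape (N, M, K).

    Returns U (C, N), V (C, M), W (C, K), and the final objective value.
    """
    rng = np.random.default_rng(seed)
    N, M, K = D.shape
    V = rng.random((n_clusters, M))
    V /= V.sum(axis=1, keepdims=True)
    W = rng.random((n_clusters, K))
    W /= W.sum(axis=1, keepdims=True)
    U = np.full((n_clusters, N), 1.0 / n_clusters)
    prev, obj = -np.inf, -np.inf
    for _ in range(max_iter):
        # Each membership is updated with the other two held fixed.
        U = _softmax(np.einsum('cj,ck,ijk->ci', V, W, D) / T_u, axis=0)
        V = _softmax(np.einsum('ci,ck,ijk->cj', U, W, D) / T_v, axis=1)
        W = _softmax(np.einsum('ci,cj,ijk->ck', U, V, D) / T_w, axis=1)
        obj = (float(np.einsum('ci,cj,ck,ijk->', U, V, W, D))
               - T_u * _plogp(U) - T_v * _plogp(V) - T_w * _plogp(W))
        if abs(obj - prev) < tol:
            break
        prev = obj
    return U, V, W, obj

# Toy cube: objects 0-1 with items 0-1 in slice 0, objects 2-3 with items 2-3 in slice 1.
D = np.zeros((4, 4, 2))
D[:2, :2, 0] = 1.0
D[2:, 2:, 1] = 1.0
U, V, W, obj = ftc(D, 2, T_u=0.1, T_v=0.1, T_w=0.1)
```

Because the iteration can stop at different local maxima depending on the initial memberships, calling `ftc` with several values of `seed` and keeping the run with the largest returned objective is one simple way to run multiple trials.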

Pseudocode 1 shows that the time complexity of FTC depends on the number of iterations and is higher than that of fuzzy (co)clustering algorithms such as FCM and FCCM. The main reason is that FTC generates fuzzy clusters simultaneously on three dimensions, whereas the other algorithms work on at most two dimensions.
4. Experiments
To test the effectiveness of FTC, we carried out a set of experiments. The results are compared with five well-received approaches in the literature: K-Means, FCM, FCCM, RFCC, and FCCI. Of the five algorithms, K-Means is a traditional standard clustering algorithm, FCM is a fuzzy standard clustering algorithm, and the other three are fuzzy coclustering algorithms.
4.1. Experimental Setup
We employ the MovieLens dataset [21] to evaluate the performance of FTC in categorizing real-world data. This dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000. The user-movie matrix is very sparse, so we construct two dense subsets from MovieLens. The first subset contains the top 100 users, who gave the most ratings, and the top 100 movies, which were rated most often. In addition, by counting the genres of all movies and of the top 100 movies, we observe that some genres are popular and rated most. The concise movie genre information (Table 1) is analyzed, and five genres (drama, comedy, action, thriller, and romance) are added to the first subset to support the experiments. This subset is therefore a 100 × 100 × 5 matrix. The second subset is larger and has the top 500 users, the top 500 movies, and all 18 genres. Our experiments are based on these user-movie-genre three-dimensional matrices.
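The construction of such a dense subset can be sketched as follows. The text does not state how the entries of the three-dimensional matrix are filled, so this sketch makes an explicit assumption: the entry for (user, movie, genre) holds the user's rating of the movie if the movie carries that genre, and 0 otherwise. The input layout (rating tuples and a movie-to-genres mapping) and the function name `build_tensor` are hypothetical.

```python
import numpy as np
from collections import Counter

def build_tensor(ratings, movie_genres, genres, n_users, n_movies):
    """Build a dense user-movie-genre tensor from (user, movie, rating) tuples.

    Keeps the n_users users with the most ratings and the n_movies movies
    rated most often. Entry semantics are an assumption (see the lead-in).
    """
    user_counts = Counter(u for u, _, _ in ratings)
    movie_counts = Counter(m for _, m, _ in ratings)
    top_users = [u for u, _ in user_counts.most_common(n_users)]
    top_movies = [m for m, _ in movie_counts.most_common(n_movies)]
    u_idx = {u: i for i, u in enumerate(top_users)}
    m_idx = {m: j for j, m in enumerate(top_movies)}
    g_idx = {g: k for k, g in enumerate(genres)}
    D = np.zeros((len(top_users), len(top_movies), len(genres)))
    for u, m, r in ratings:
        if u in u_idx and m in m_idx:
            for g in movie_genres.get(m, ()):
                if g in g_idx:
                    D[u_idx[u], m_idx[m], g_idx[g]] = r
    return D, top_users, top_movies

# Tiny made-up example, not MovieLens data.
ratings = [(1, 10, 5), (1, 11, 3), (1, 12, 4), (2, 10, 4), (2, 11, 1), (3, 10, 2)]
movie_genres = {10: ["Drama"], 11: ["Comedy", "Drama"], 12: ["Action"]}
D, users, movies = build_tensor(ratings, movie_genres, ["Drama", "Comedy"], 2, 2)
```

With the real data, the same call with `n_users=100`, `n_movies=100`, and the five selected genres would produce a 100 × 100 × 5 array.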
Although we could obtain movie clusters from coclustering, or even genre clusters from triclustering, the absence of a ground-truth genre categorization makes it difficult to evaluate the accuracy of the resulting genre clusters. We therefore mainly obtain the user and movie clusters with the above six algorithms, whose performance is evaluated using the following measures.
4.2. Evaluation Criteria
There are several ways of numerically scoring cluster quality, such as Entropy, F-Measure, and Overall Similarity. We choose F-Measure and Entropy as the criteria for evaluating the performance of FTC.
F-Measure is the weighted harmonic mean of precision and recall, and it is often used to measure clustering quality. The higher the F-Measure value, the better the clustering quality. For standard datasets, class information is known, and clusters are obtained after clustering. When objects in class $i$ are grouped into cluster $j$, the F-Measure value is given by
$$F(i,j) = \frac{2\,P(i,j)\,R(i,j)}{P(i,j) + R(i,j)},$$
where the precision $P(i,j)$ and the recall $R(i,j)$ are computed using the following equations, respectively:
$$P(i,j) = \frac{n_{ij}}{n_j}, \qquad R(i,j) = \frac{n_{ij}}{n_i},$$
where $n_{ij}$ is the number of members of class $i$ in cluster $j$, $n_j$ is the number of members of cluster $j$, and $n_i$ is the number of members of class $i$. The overall F-Measure value is given by the following:
$$F = \sum_{i} \frac{n_i}{n} \max_{j} F(i,j),$$
where $n$ is the total number of documents.
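As a worked illustration, the overall F-Measure can be computed from two label lists. The sketch assumes the balanced form of the harmonic mean (equal weight on precision and recall) and hard cluster labels, for example obtained by assigning each object to its highest-membership cluster; the function name is illustrative.

```python
from collections import Counter

def overall_f_measure(classes, clusters):
    """Overall F-Measure of a clustering against known class labels.

    For each class, take the best F value over all clusters, then average
    these values weighted by class size.
    """
    n = len(classes)
    n_i = Counter(classes)                  # class sizes
    n_j = Counter(clusters)                 # cluster sizes
    n_ij = Counter(zip(classes, clusters))  # class/cluster overlaps
    total = 0.0
    for i, size_i in n_i.items():
        best = 0.0
        for j, size_j in n_j.items():
            common = n_ij.get((i, j), 0)
            if common == 0:
                continue
            p = common / size_j  # precision of cluster j for class i
            r = common / size_i  # recall of cluster j for class i
            best = max(best, 2 * p * r / (p + r))
        total += size_i / n * best
    return total

score = overall_f_measure([0, 0, 1, 1], [0, 0, 0, 1])
```

In this example the first class is best matched by cluster 0 (precision 2/3, recall 1, F = 0.8) and the second class by cluster 1 (precision 1, recall 1/2, F = 2/3), so the overall value is 0.5 × 0.8 + 0.5 × 2/3, approximately 0.733.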
Entropy was originally devised to measure the randomness of molecules in a thermodynamic system. In information theory, it can also be used to evaluate the distribution of clusters produced by clustering. If documents are distributed uniformly and there is little difference between clusters, the value of Entropy will be high. Conversely, if there are great differences between clusters, the value of Entropy will be low. The purpose of clustering is to enlarge the differences between the clusters, so the lower the value of Entropy, the higher the clustering quality. The Entropy of the whole clustering result is
$$E = \sum_{j=1}^{k} \frac{n_j}{n} E_j,$$
where $E$ is the whole Entropy value, $n_j$ is the number of documents in cluster $j$, $n$ is the number of all the documents, $k$ is the number of clusters, and $E_j$ is the Entropy value of cluster $j$, which is calculated using the following formula:
$$E_j = -\sum_{i} p_{ij} \log p_{ij},$$
where $p_{ij}$ is the probability that a document belonging to class $i$ is put into cluster $j$ during the partition.
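The Entropy criterion can be sketched in the same style. Two assumptions are made here: the probability for class `i` and cluster `j` is estimated as the fraction of cluster `j` whose members belong to class `i`, and the natural logarithm is used without further normalisation. Published Entropy values may use a different base or normalisation, so absolute numbers from this sketch are not directly comparable with reported ones.

```python
import math
from collections import Counter

def clustering_entropy(classes, clusters):
    """Size-weighted average of per-cluster class entropy (natural log)."""
    n = len(classes)
    n_j = Counter(clusters)                 # cluster sizes
    n_ij = Counter(zip(classes, clusters))  # class/cluster overlaps
    total = 0.0
    for (_, j), common in n_ij.items():
        p = common / n_j[j]                 # share of cluster j taken by this class
        total -= (n_j[j] / n) * p * math.log(p)
    return total

mixed = clustering_entropy([0, 1, 0, 1], [0, 0, 1, 1])
pure = clustering_entropy([0, 0, 1, 1], [0, 0, 1, 1])
```

Here `mixed` equals ln 2 because each cluster is an even mix of the two classes, while `pure` is 0 because every cluster contains a single class.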
4.3. Experimental Results
We divided the first subset into 2 classes by user gender and into 3 classes by user age, where Class 1 includes users aged 24 or younger, Class 2 includes users aged 25 to 49, and Class 3 includes users aged 50 or older. The second subset is divided into 7 classes by user age, corresponding to the 7 age ranges tagged 1, 18, 25, 35, 45, 50, and 56 in MovieLens. We conducted three groups of experiments on the subsets: the first two groups divide the first subset into 2 and 3 clusters, respectively, and the third group divides the second subset into 7 clusters.
Before discussing the performance comparisons, we inspect the value of the objective function. Since FTC tries to maximize the objective function in (13), we run a 10-trial simulation on the two subsets and choose the trial that results in the best performance. For all the experiments, we set the three weighting parameters to 1, 1, and 0.5, the maximum number of iterations to 50, and the convergence indicator to 0.0001. Figure 2 shows how the value of the objective function changes over the iterations.
Figure 2: Value of the objective function over the iterations. (a) Exp. Group 1; (b) Exp. Group 2; (c) Exp. Group 3.
Figure 2(a) shows the value of the objective function in the first group of experiments, which divides the first subset into 2 clusters; Figure 2(b) shows the second group, which divides the first subset into 3 clusters; and Figure 2(c) shows the third group, which divides the second subset into 7 clusters. In Figure 2, different trials converge to different local maxima. The second, third, eighth, and tenth trials give the best performance in Figure 2(a), the first trial gives the best performance in Figure 2(b), and the fifth, sixth, and eighth trials give the best performance in Figure 2(c). Figure 2 also shows that each trial nearly reaches a local maximum within the first 10 iterations, which indicates that FTC converges rapidly to a local maximum.
The clustering performance of K-Means, FCM, FCCM, RFCC, FCCI, and FTC is compared in Figure 3. In the three groups of experiments, FTC achieves performance that is better than or comparable to that of the other algorithms, which strengthens our argument that the proposed fuzzy triclustering algorithm can generally cluster better than existing formulations such as fuzzy clustering and fuzzy coclustering. K-Means is a hard clustering algorithm that clusters on only one dimension. In the first group of experiments, where the number of clusters is low, K-Means achieves higher clustering quality. When the number of clusters is greater, the performance of K-Means drops dramatically and falls below that of the fuzzy algorithms. Of the five fuzzy algorithms, FCM consistently gives poor clustering quality except in the first group of experiments, which further confirms that coclustering can achieve better performance than standard clustering. Figure 3 also shows that FCCM, RFCC, and FCCI are competitive and can generate clusters of relatively high quality.
Figure 3: Clustering performance on user clusters. (a) F-Measure; (b) Entropy.
Besides clustering users, the coclustering algorithms and FTC also simultaneously generate fuzzy movie clusters, and FTC even generates genre clusters. We use the genre information of the movies as the class labels for movies and focus on evaluating the quality of the fuzzy movie clusters. The results for movie clusters in the above experiments are illustrated in Figure 4. K-Means and FCM cannot generate clusters in the movie dimension, so they do not take part in this comparison. Figure 4 shows that FTC achieves the best performance among the four algorithms, and the improvement in terms of F-Measure is substantial. In the second group of experiments in particular, the F-Measure value of FTC is 0.77, whereas the values of FCCM, RFCC, and FCCI are 0.50, 0.50, and 0.48, respectively. The Entropy values of FTC, FCCM, RFCC, and FCCI in this group of experiments are 0.28, 0.46, 0.45, and 0.44, respectively.
Figure 4: Clustering performance on movie clusters. (a) F-Measure; (b) Entropy.
In our experiments, we observe that some algorithms are sensitive to the value of a user-defined parameter: different values may generate different results. We extend the above three groups of experiments to investigate its effect. In these experiments, the parameter ranges from 0.1 to 1.5, and the clustering performance is illustrated in Figure 5. The results indicate that FTC is not very sensitive to this parameter; the F-Measure value varies only slightly as the parameter changes.
We analyzed the computational complexity of FTC above: its theoretical time complexity is higher than that of fuzzy (co)clustering algorithms such as FCM and FCCM. FTC is more time-consuming in theory because FCCM, RFCC, and FCCI group objects simultaneously on two dimensions, whereas FTC groups objects on three dimensions, so its clustering process takes one more dimension into account. Table 2 shows the actual runtime of the six algorithms in the above experiments. FTC is sometimes faster than the other fuzzy coclustering algorithms, even though its theoretical time complexity is higher. In the first and second groups of experiments, FTC generates clusters in 80 and 235 ms, respectively, whereas FCCM takes 154 and 287 ms and RFCC takes 151 and 350 ms. Even in the third group of experiments, the clustering time of FTC is less than that of coclustering algorithms such as FCCM and FCCI. In addition, the clustering time of FTC can be managed by adjusting a user-specified parameter. Therefore, FTC may complete the clustering process more rapidly in practice than its theoretical complexity suggests.
5. Conclusion
Recently, some new clustering algorithms have been proposed, such as fuzzy clustering and fuzzy coclustering. They still focus primarily on contingency tables with two dimensions and are insufficient in three-dimensional scenarios.
In this paper, a novel fuzzy triclustering approach, FTC, based on simultaneous clustering of three-dimensional memberships is proposed. Under the proposed framework, a new objective function and its update rules are formulated. FTC offers several benefits that derive from (fuzzy) coclustering, such as dimensionality reduction, interpretable clusters, and overlapping clusters. We ran three groups of experiments on MovieLens subsets to evaluate FTC and compared it with several popular fuzzy (co)clustering algorithms; FTC achieved better or comparable performance.
Determining the number of clusters remains a challenging problem in the literature. In our study, the number of clusters is still specified manually by the user, which means that FTC is not fully unsupervised. In the future, we intend to extend our approach with techniques for estimating the number of clusters.
Mathematical Notations
, and :  Numbers of (co)clusters, objects, and features ( and ) 
:  Fuzzy object partitioning membership 
:  Fuzzy feature partitioning membership 
:  Relatedness measure between an object and a feature 
, and :  Coclustering user-defined parameters 
:  Number of iterations 
:  Maximum number of iterations parameter 
:  Convergence indicator. 
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was financially supported by the NSFC Fund (no. 61202286) and Special Fund for Fast Sharing of Science Paper in Net Era by CSTD (no. 2013117).