Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 235790, 10 pages

http://dx.doi.org/10.1155/2015/235790

## A Partitioning Based Algorithm to Fuzzy Tricluster

School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China

Received 17 October 2014; Revised 30 December 2014; Accepted 1 January 2015

Academic Editor: Zhan Shu

Copyright © 2015 Yongli Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Fuzzy clustering allows an object to exist in multiple clusters and represents the affiliation of objects to clusters by memberships. It is extended to fuzzy coclustering by assigning membership functions to both objects and features. In this paper we propose a new fuzzy triclustering (FTC) algorithm for automatic categorization of three-dimensional data collections. FTC specifies a membership function for each dimension and is able to generate fuzzy clusters simultaneously on three dimensions. Thus FTC divides a three-dimensional cube into many small blocks, each of which should be a tricluster with strong coherent bonding among its members. The experimental studies on *MovieLens* demonstrate the strength of FTC in terms of accuracy compared to some recent popular fuzzy clustering and coclustering approaches.

#### 1. Introduction

Clustering is one of the most important techniques of exploratory data mining, used in many applications such as automatic categorization of documents, grouping of search engine results, and detection of social communities [1, 2]. Clustering algorithms try to detect the intrinsic structure of data objects so that a set of clusters is generated in which intracluster similarity is maximized and intercluster similarity is minimized.

There is already a large body of work that investigates approaches to clustering objects [3–5]. According to the rules by which data objects are accumulated and the methods that employ these rules, clustering algorithms can be divided into four types: hierarchical clustering, partitional clustering, density and grid-based clustering, and others [6]. Hierarchical and partitional clustering methods are used in many applications, and their representative algorithms are the agglomerative hierarchical clustering algorithm and the *K*-Means algorithm, respectively. Generally, these algorithms put each object into a single cluster. However, one object may contain multiple subjects and characteristics, so it could belong to multiple clusters [7]. This motivates soft clustering algorithms.

Unlike hard clustering algorithms, in which each object belongs to exactly one cluster, soft clustering allows an object to exist in multiple clusters. Introducing fuzziness to clustering gives flexible solutions for soft clustering algorithms [8]. Fuzzy clustering extends traditional clustering and represents the affiliation of objects to clusters by memberships. Fuzzy *C*-Means (FCM) [9] is a representative fuzzy clustering algorithm, and many variants are based on it [5, 8]. Nowadays, as most web datasets are large and high dimensional, it is increasingly challenging to develop a satisfactory clustering process. Still, some new clustering algorithms have been designed, such as fuzzy coclustering, which combines fuzzy clustering with coclustering. Coclustering is a technique for extracting object-feature structures from cooccurrence information of objects and features, and it is very effective at handling high dimensional data because it simultaneously groups objects and features based on that cooccurrence information [10–12]. Several fuzzy coclustering algorithms have been proposed. Oh et al. [8] presented fuzzy clustering for categorical multivariate data (FCCM). Tjhi and Chen [13] designed fuzzy coclustering with Ruspini’s condition (FCR). In 2013, Hanmandlu et al. [14] proposed a fuzzy coclustering algorithm for images (FCCI). Besides FCR, Tjhi and Chen introduced two further algorithms: possibilistic fuzzy coclustering (PFCC) [15] and robust fuzzy coclustering (RFCC) [16]. In 2013, Yan et al. [17] proposed a heuristic semisupervised fuzzy coclustering algorithm (SS-HFCR) for categorization of large web documents. Keeping the benefits of both coclustering and fuzzy clustering, these algorithms improve the representation of overlapping clusters by using fuzzy membership functions and are suitable for categorizing documents, particularly web documents on the Internet.

Although the above clustering algorithms show good clustering quality, they are insufficient in applications such as time-location-type environmental sensor monitoring, time-author-keywords social network analysis, and source-destination-text web graph mining. In these applications, the data space has three dimensions. None of the existing coclustering algorithms is designed to work in this scenario.

Motivated by these three-dimensional clustering requirements, we propose a novel fuzzy triclustering algorithm, called FTC. The approach builds on popular algorithms for standard fuzzy clustering and fuzzy coclustering and extends them to support three dimensions.

The remainder of this paper is organized as follows. In Section 2, we provide a literature review of standard fuzzy clustering and fuzzy coclustering algorithms. Section 3 introduces the proposed FTC in detail. Section 4 presents our experimental results on the *MovieLens* dataset. Finally, we conclude our work.

#### 2. Related Work

In this section, we briefly review two types of clustering algorithms that have been proposed earlier: fuzzy clustering and fuzzy coclustering. These algorithms help in understanding FTC, which is introduced in the next section. The mathematical notation used in this paper is explained in the Mathematical Notations section.

##### 2.1. Fuzzy Clustering

FCM is the most classical fuzzy clustering algorithm. It is known as the fuzzy version of *K*-Means and has been studied by many researchers. FCM aims at minimizing the objective function in (1), subject to the membership constraint in (2). The objective function involves each object, the centroid of each cluster, and a measure of the distance between objects and clusters. The first term in (1) means that each object should be placed in a nearby cluster, which serves the purpose of clustering: intracluster similarity is maximized and intercluster similarity is minimized. The second term controls the fuzziness of the clustering through a weighting parameter that specifies the degree of fuzziness.

The method of Lagrange multipliers is used to optimize the objective function under this constraint. The solution of the constrained optimization problem can be approximated by Picard iteration and alternating optimization using (3) and (4), which are the update equations for the object memberships and cluster centroids, respectively, in each iteration.

In FCM, every point has a set of membership coefficients, each giving its degree of belonging to one cluster, and the centroid of a cluster is the mean of all points, weighted by their degrees of belonging to that cluster.
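The alternating scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the exact formulation of (1)–(4): it assumes an entropy-regularized variant of FCM, which matches the two-term description of the objective, with squared Euclidean distance and a hypothetical fuzziness weight `t_u`. The function name `entropy_fcm` and the toy data are also assumptions made for illustration.

```python
import math

def entropy_fcm(points, centroids, t_u=1.0, n_iter=50):
    """Alternate membership and centroid updates (assumed entropy-regularized FCM)."""
    n_clusters = len(centroids)
    dim = len(points[0])
    centroids = [list(p) for p in centroids]
    memberships = []
    for _ in range(n_iter):
        # Membership update: for each point, a softmax over clusters of the
        # negative squared distances divided by the fuzziness weight t_u.
        memberships = []
        for x in points:
            dists = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in centroids]
            d_min = min(dists)  # shift by the minimum for numerical stability
            weights = [math.exp(-(d - d_min) / t_u) for d in dists]
            total = sum(weights)
            memberships.append([w / total for w in weights])
        # Centroid update: mean of all points weighted by their memberships.
        for c in range(n_clusters):
            mass = sum(u[c] for u in memberships)
            centroids[c] = [
                sum(u[c] * x[k] for u, x in zip(memberships, points)) / mass
                for k in range(dim)
            ]
    return memberships, centroids

# Toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (4.1, 3.9)]
memberships, centroids = entropy_fcm(points, centroids=[(0.0, 0.0), (4.0, 4.0)])
```

Each row of `memberships` sums to one, which corresponds to the membership constraint. A larger `t_u` makes the memberships more uniform, and a smaller one pushes them toward a hard assignment.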

Hundreds of FCM variants based on the basic principle of FCM have appeared in the literature, which shows that FCM remains important and popular in clustering analysis.

##### 2.2. Fuzzy Coclustering

Coclustering is an important technique in data mining which simultaneously clusters objects and features. Compared to standard clustering, coclustering can offer several benefits [13, 16], including dimensionality reduction, interpretable document clusters, and improved accuracy due to the local model of clustering.

Fuzzy coclustering extends coclustering by incorporating fuzzy set theory into coclusters. Besides the above three benefits, fuzzy coclustering can generate coclusters that are more realistic, because one object may contain multiple subjects and one feature may contain multiple concepts.

Due to these advantages, fuzzy coclustering algorithms have been widely studied and developed. Prominent algorithms proposed earlier include FCCM, FCR, FCCI, and PFCC. In this paper we briefly review FCCM and FCCI, which form the theoretical basis of FTC.

FCCM performs coclustering by maximizing the objective function defined in (5), subject to the membership constraints in (6) and (7).

In (5), there are three terms. The first term is the degree of aggregation, which tries to group highly interrelated objects and features together. It involves two membership functions, indicating object membership and feature membership, respectively. The second and third terms control the fuzziness of clustering through two parameters that adjust the levels of fuzziness of the object and feature memberships, respectively.

By applying the Lagrange multiplier method, the solution of this constrained optimization problem can be approximated by iteratively updating the object and feature memberships in an alternating fashion.

To make fuzzy coclustering easier to understand, Oh et al. gave a numerical example. They applied FCCM to a literature retrieval data set that records the cooccurrence relations between articles and keywords. The rows represent the articles and the columns represent the keywords. The results of FCCM were shown as two tables. The first table listed the memberships of the articles, that is, the object memberships, and the second table listed the memberships of the keywords, that is, the feature memberships. Larger memberships mean that articles and keywords are more likely to belong to a cluster; thus the final clusters could be generated from the two tables.
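A tiny article-keyword example in the same spirit can be sketched as follows. This is a hedged illustration rather than the published algorithm or data: it assumes the commonly cited entropy-regularized form of FCCM, in which object memberships sum to one over clusters and feature memberships sum to one over features within each cluster. The names `fccm`, `softmax`, `t_u`, and `t_w`, the cooccurrence matrix, and the initial memberships are all assumptions.

```python
import math

def softmax(scores, temperature):
    """Normalized exponentials of scores / temperature, shifted for stability."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def fccm(d, u, t_u=0.1, t_w=1.0, n_iter=30):
    """d[i][j]: cooccurrence of article i and keyword j.
    u[c][i]: initial membership of article i in cluster c."""
    n_obj, n_feat, n_clu = len(d), len(d[0]), len(u)
    u = [row[:] for row in u]
    w = []
    for _ in range(n_iter):
        # Keyword memberships: within each cluster, a softmax over keywords
        # of the membership-weighted cooccurrence sum_i u[c][i] * d[i][j].
        w = [
            softmax([sum(u[c][i] * d[i][j] for i in range(n_obj))
                     for j in range(n_feat)], t_w)
            for c in range(n_clu)
        ]
        # Article memberships: for each article, a softmax over clusters
        # of sum_j w[c][j] * d[i][j].
        for i in range(n_obj):
            column = softmax([sum(w[c][j] * d[i][j] for j in range(n_feat))
                              for c in range(n_clu)], t_u)
            for c in range(n_clu):
                u[c][i] = column[c]
    return u, w

# Articles 0-1 use keywords 0-1; articles 2-3 use keywords 2-3.
d = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
u0 = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]]
u, w = fccm(d, u0)
```

Here `u` plays the role of the first table (article memberships) and `w` the role of the second (keyword memberships). Reading off the largest entries in each row of `u` and `w` gives the two coclusters.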

FCCI, in turn, is a color segmentation technique based on fuzzy coclustering, in which both objects and features are assigned membership functions. In FCCI, the objective function in (9) is minimized subject to the membership constraints in (10) and (11).

The first term in (9) expresses the distance relationship. In this term, the distance is the squared Euclidean distance between a feature data point and the corresponding feature cluster centroid. Minimizing this term assigns an object a higher membership value in the cluster whose feature centroid it is closest to, and a feature a higher membership value in the cluster for which it is more relevant than the other features. The second and third terms in (9) contribute to the fuzziness of the resulting clusters, with two weighting parameters specifying the degree of fuzziness.

The solution of this constrained optimization problem can be approximated by iteratively applying the update equations in an alternating fashion.
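As a rough illustration of such an alternating scheme, the sketch below assumes an entropy-regularized FCCI-style objective with per-feature squared distances, object memberships that sum to one over clusters, and feature memberships that sum to one over features within each cluster. The function names, the weights `t_u` and `t_v`, and the toy data are assumptions for illustration, not taken from [14].

```python
import math

def softmax(scores, temperature):
    """Normalized exponentials of scores / temperature, shifted for stability."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def fcci(x, p, t_u=1.0, t_v=1.0, n_iter=30):
    """x[i][j]: feature j of object i; p[c][j]: initial centroid of cluster c."""
    n_obj, n_feat, n_clu = len(x), len(x[0]), len(p)
    p = [row[:] for row in p]
    v = [[1.0 / n_feat] * n_feat for _ in range(n_clu)]  # uniform start
    u = []
    for _ in range(n_iter):
        # Squared distance per cluster, object, and feature.
        dist = [[[(x[i][j] - p[c][j]) ** 2 for j in range(n_feat)]
                 for i in range(n_obj)] for c in range(n_clu)]
        # Object memberships: softmax over clusters of the negative
        # feature-weighted distance.
        u = [[0.0] * n_obj for _ in range(n_clu)]
        for i in range(n_obj):
            column = softmax([-sum(v[c][j] * dist[c][i][j] for j in range(n_feat))
                              for c in range(n_clu)], t_u)
            for c in range(n_clu):
                u[c][i] = column[c]
        # Feature memberships: softmax over features of the negative
        # object-weighted distance within each cluster.
        v = [softmax([-sum(u[c][i] * dist[c][i][j] for i in range(n_obj))
                      for j in range(n_feat)], t_v) for c in range(n_clu)]
        # Centroids: membership-weighted mean of each feature.
        for c in range(n_clu):
            mass = sum(u[c])
            p[c] = [sum(u[c][i] * x[i][j] for i in range(n_obj)) / mass
                    for j in range(n_feat)]
    return u, v, p

# Feature 0 separates the two groups; feature 1 is nearly constant.
x = [[0.0, 5.0], [0.2, 5.1], [4.0, 5.0], [4.1, 4.9]]
u, v, p = fcci(x, p=[[0.0, 5.0], [4.0, 5.0]])
```

In this sketch the three updates are applied in a fixed order within each iteration; other orderings and stopping criteria are possible.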

#### 3. Proposed FTC Algorithm

Although experimental results show that the clustering quality of the above algorithms is encouraging, they are insufficient in some scenarios. We give two examples.

*Scenario 1*. Users submit queries to search engines and obtain search results. When grouping users, we generally employ the user-document matrix and neglect the queries submitted. However, the query information can be useful: a user often selects a document precisely because of a related query submitted beforehand. Thus it becomes an issue of conditional probability, and users, queries, and documents form a three-dimensional space.

*Scenario 2*. We could model user interests by mining the documents a user clicked. However, the interests of a user may change over time. If we analyze all the documents a user browsed without regard to time, user interest drift cannot be detected. Therefore, time is an important factor. When we take the time factor into account, the data format changes from a user-document contingency table to a user-document-time three-dimensional matrix.
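The difference between the two data formats in Scenario 2 can be made concrete with a small sketch. The click records, the names `build_cube` and `collapse_time`, and the monthly time periods below are hypothetical and serve only to illustrate the data structure.

```python
def build_cube(records, users, documents, periods):
    """Count (user, document, period) click records into a dense 3D list."""
    u_idx = {name: k for k, name in enumerate(users)}
    d_idx = {name: k for k, name in enumerate(documents)}
    t_idx = {name: k for k, name in enumerate(periods)}
    cube = [[[0] * len(periods) for _ in documents] for _ in users]
    for user, doc, period in records:
        cube[u_idx[user]][d_idx[doc]][t_idx[period]] += 1
    return cube

def collapse_time(cube):
    """Sum over the time axis, giving the user-document contingency table."""
    return [[sum(cell) for cell in row] for row in cube]

# Hypothetical click log: "ann" moves from document d1 to d2 over time.
records = [
    ("ann", "d1", "jan"), ("ann", "d1", "jan"),
    ("ann", "d2", "feb"), ("bob", "d2", "jan"),
]
cube = build_cube(records, ["ann", "bob"], ["d1", "d2"], ["jan", "feb"])
table = collapse_time(cube)
```

The collapsed `table` only records that "ann" clicked both documents, whereas `cube` also shows that the clicks on `d1` came first and the click on `d2` came later, which is the kind of interest drift the scenario describes.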

In these scenarios, we have data spaces with three dimensions, and the data provided to us is not a contingency table of data objects and features but a three-dimensional matrix. Current clustering algorithms primarily target contingency tables, and few address three-dimensional data. Gnatyshak et al. [18] proposed two novel methods for biclustering and triclustering data collected from online social networks; with triclustering, they could reveal users’ interests as tags and use them to describe Vkontakte groups. In the context of two relational datasets that share labels along one of the dimensions, Mahiskar et al. [19] processed the two datasets simultaneously to unveil triclusters and presented a triclustering algorithm that searches for meaningful combinations of biclusters in the two related datasets. Guigourès et al. [20] introduced a novel technique to track structures in time-evolving graphs, based on a parameter-free approach for three-dimensional coclustering of the source vertices, the target vertices, and the time. The above algorithms perform triclustering without fuzzy set theory, which means they are restricted to hard clustering.

We propose a novel fuzzy triclustering algorithm, FTC, in this paper. In FTC, we aim to keep the benefits of coclustering/triclustering and fuzzy clustering simultaneously. As indicated in Figure 1, the new algorithm aims at achieving fuzzy three-dimensional clustering (Figure 1(e)). Figures 1(a) and 1(b) refer to hard clustering and hard coclustering, respectively. Figures 1(c) and 1(d) both indicate fuzzy coclustering. The latter is more flexible than the former because, in the latter mode, each cluster can contain only part of the elements of some rows and columns. We extend the setting of Figure 1(d) from two dimensions to three and in turn extend fuzzy coclustering algorithms to the fuzzy triclustering (FTC) algorithm.