Advances in Fuzzy Systems

Volume 2017, Article ID 9842127, 8 pages

https://doi.org/10.1155/2017/9842127

## FCM-Type Fuzzy Coclustering for Three-Mode Cooccurrence Data: 3FCCM and 3Fuzzy CoDoK

^{1}Graduate School of Engineering, Osaka Prefecture University, Sakai, Osaka 599-8531, Japan

^{2}Graduate School of Humanities and Sustainable System Sciences, Osaka Prefecture University, Sakai, Osaka 599-8531, Japan

Correspondence should be addressed to Katsuhiro Honda; honda@cs.osakafu-u.ac.jp

Received 25 August 2017; Accepted 27 November 2017; Published 18 December 2017

Academic Editor: Ferdinando Di Martino

Copyright © 2017 Katsuhiro Honda et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Cocluster structure analysis is a basic technique for revealing intrinsic structural information from cooccurrence data among objects and items, in which coclusters are composed of mutually familiar pairs of objects and items. In many real applications, we have not only cooccurrence information among objects and items but also intrinsic relations among items and other ingredients. For example, in food preference analysis, users’ preferences for foods should be found by considering not only user-food cooccurrences but also the implicit relation among users and cooking ingredients. In this paper, two FCM-type fuzzy coclustering models, FCCM and Fuzzy CoDoK, are extended for revealing intrinsic cocluster structures from three-mode cooccurrence data, where the aggregation degree of three elements in each cocluster is maximized through iterative updating of three types of fuzzy memberships for objects, items, and ingredients. The characteristic features of the proposed methods are demonstrated through a numerical experiment.

#### 1. Introduction

In many web data analyses, we often have cooccurrence information among objects and items instead of multidimensional observations on objects. For example, web document summarization and web market purchase summarization are reduced to document-keyword cooccurrence analysis and customer-product basket analysis, respectively. FCM-type fuzzy coclustering is an extension of fuzzy $c$-Means (FCM) [1], where the degree of belongingness to clusters is represented by fuzzy memberships under the fuzzy partition concept [2]. Fuzzy clustering for categorical multivariate data (FCCM) [3] replaced the FCM clustering criterion with the aggregation degree of objects and items in coclusters by adopting entropy-based fuzzification [4, 5]. In fuzzy coclustering of documents and keywords (Fuzzy CoDoK) [6], the FCCM criterion was maximized with quadratic regularization-based fuzzification [7], so that it can be applied to large data sets.

Despite their usefulness in many applications, the conventional fuzzy coclustering models may not work well under the strong influence of other intrinsic features. For example, in food preference analysis, users’ preferences for foods cannot be revealed by considering only user-food cooccurrences; they should be found by also considering the implicit relation among users and the cooking ingredients that compose the foods. Thus, when we have not only cooccurrence information among objects and items but also intrinsic relations among items and other ingredients, we can expect to find more useful cocluster structures in three-mode cooccurrence information data.

In this paper, two FCM-type fuzzy coclustering models are extended for analyzing three-mode cooccurrence information data, in which FCM-like alternating optimization schemes are performed considering the cooccurrence relations among objects, items, and other ingredients. First, the FCCM algorithm is extended to the three-mode FCCM (3FCCM) algorithm by utilizing three types of fuzzy memberships for objects, items, and ingredients, where the aggregation degree of three features in each cocluster is maximized through iterative updating of memberships supported by the entropy-based fuzzification. Second, the 3FCCM algorithm is further extended to the three-mode Fuzzy CoDoK (3Fuzzy CoDoK) by introducing the quadratic regularization-based fuzzification. The characteristic features of the proposed methods are demonstrated through a numerical experiment.

The remainder of this paper is organized as follows: Section 2 gives a brief review on the conventional FCM-type fuzzy coclustering models and Section 3 proposes the novel extensions of the FCM-type coclustering models for three-mode cooccurrence information data. The experimental result is shown in Section 4 and a summary conclusion is presented in Section 5.

#### 2. FCM-Type Fuzzy Coclustering

Fuzzy $c$-Means (FCM) [1, 5] is a fuzzy extension of the conventional crisp $k$-Means [8] obtained by introducing the fuzzy partition concept [2]. When we have multidimensional observations on $n$ objects $x_i$, $i = 1, \ldots, n$, they are partitioned into $C$ fuzzy clusters by estimating fuzzy memberships $u_{ci}$ for each object, where $u_{ci}$ represents the degree of belongingness of object $i$ to cluster $c$ and is generally calculated under the probabilistic constraint $\sum_{c=1}^{C} u_{ci} = 1$. In FCM, each cluster is represented by a prototypical centroid, and objects are partitioned so that the membership-weighted within-cluster errors from the prototypes are minimized in the multidimensional data space. On the other hand, in the coclustering context, we have only relational information among elements and do not use any cluster prototypes in a multidimensional space. In this paper, two variants of FCM-type fuzzy coclustering are considered.

##### 2.1. FCCM

Assume that we have $n \times m$ cooccurrence information $R = \{r_{ij}\}$ among $n$ objects and $m$ items; for example, in document-keyword analysis, $r_{ij}$ can be the frequency of keyword (item) $j$ in document (object) $i$. The goal is to extract $C$ coclusters composed of mutually familiar pairs of objects and items by simultaneously estimating fuzzy memberships of objects $u_{ci}$ and items $w_{cj}$, such that mutually familiar objects and items with large $r_{ij}$ tend to have large memberships in the same cluster, considering the aggregation degree of each cocluster. The sum of aggregation degrees to be maximized is defined as [3]

$$\sum_{c=1}^{C}\sum_{i=1}^{n}\sum_{j=1}^{m} u_{ci}\, w_{cj}\, r_{ij}. \tag{1}$$

This objective function is based on a concept similar to that of relational matrix decomposition methods such as correspondence analysis (CA) [9] and nonnegative matrix factorization (NMF) [10], where relational matrices are decomposed into two component matrices having orthogonal columns. Whereas both objects and items are equally forced to be exclusive in the matrix decomposition methods, FCM-type coclustering models adopt different kinds of partition constraints [11]. Here, object memberships $u_{ci}$ have a similar role to those of FCM under the same condition, such that $\sum_{c=1}^{C} u_{ci} = 1$. If item memberships $w_{cj}$ also obeyed a similar condition of $\sum_{c=1}^{C} w_{cj} = 1$, the aggregation criterion would have a trivial maximum in which $u_{ci} = 1$ and $w_{cj} = 1$ for all $i$ and $j$ in a particular cluster $c$. Then, in order to avoid trivial solutions, $w_{cj}$ are forced to be exclusive in each cluster, such that $\sum_{j=1}^{m} w_{cj} = 1$, and so $w_{cj}$ represent the relative typicalities of items in each cluster. As a result, object partitioning is mainly targeted in FCM-type coclustering, while CA and NMF equally force an exclusive nature on the partitions of both objects and items.

Because of its linear nature with respect to $u_{ci}$ and $w_{cj}$, (1) is maximized with crisp memberships $u_{ci} \in \{0, 1\}$ and $w_{cj} \in \{0, 1\}$ in a similar manner to $k$-Means. In order to find a fuzzy partition, some fuzzification mechanism must be introduced, as in FCM.

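In [3], the linear aggregation criterion of (1) was nonlinearized with respect to $u_{ci}$ and $w_{cj}$ by entropy-based penalties [4, 5] for fuzzification of the two types of memberships, and the objective function for Fuzzy Clustering for Categorical Multivariate data (FCCM) was proposed as

$$L_{\mathrm{fccm}} = \sum_{c=1}^{C}\sum_{i=1}^{n}\sum_{j=1}^{m} u_{ci}\, w_{cj}\, r_{ij} - \lambda_u \sum_{c=1}^{C}\sum_{i=1}^{n} u_{ci}\log u_{ci} - \lambda_w \sum_{c=1}^{C}\sum_{j=1}^{m} w_{cj}\log w_{cj}, \tag{2}$$

where $\lambda_u$ and $\lambda_w$ are the fuzzification weights for object memberships and item memberships, respectively. Larger $\lambda_u$ and $\lambda_w$ bring fuzzier partitions of objects and items.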

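Based on the alternating optimization principle, $u_{ci}$ and $w_{cj}$ are iteratively updated until convergence using the following updating rules:

$$u_{ci} = \frac{\exp\left(\lambda_u^{-1}\sum_{j=1}^{m} w_{cj}\, r_{ij}\right)}{\sum_{\ell=1}^{C}\exp\left(\lambda_u^{-1}\sum_{j=1}^{m} w_{\ell j}\, r_{ij}\right)}, \qquad w_{cj} = \frac{\exp\left(\lambda_w^{-1}\sum_{i=1}^{n} u_{ci}\, r_{ij}\right)}{\sum_{\ell=1}^{m}\exp\left(\lambda_w^{-1}\sum_{i=1}^{n} u_{ci}\, r_{i\ell}\right)}. \tag{3}$$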

Although the two updating rules always yield valid memberships under the constraints, they can be numerically unstable due to overflow, because the $\exp(\cdot)$ function can take extremely large values when its argument becomes very large.
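The overflow issue can be illustrated with a minimal pure-Python sketch of one FCCM sweep. It assumes the notation `u[c][i]` for object memberships, `w[c][j]` for item memberships, and `R[i][j]` for cooccurrences. Subtracting the maximum score before exponentiating is a common stabilization trick added here as an assumption; it is not part of the original algorithm description, and the toy data are hypothetical.

```python
import math

def softmax(scores, weight):
    """Return exp(score / weight) normalized to sum to one.

    The maximum is subtracted first (an added stabilization, not from
    the paper); this leaves the normalized result unchanged.
    """
    scaled = [s / weight for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fccm_step(R, w, lam_u, lam_w):
    """One FCCM sweep: update object memberships u, then item memberships w."""
    C, n, m = len(w), len(R), len(R[0])
    # u[c][i] is normalized over clusters c for each object i.
    u = [[0.0] * n for _ in range(C)]
    for i in range(n):
        scores = [sum(w[c][j] * R[i][j] for j in range(m)) for c in range(C)]
        for c, value in enumerate(softmax(scores, lam_u)):
            u[c][i] = value
    # w[c][j] is normalized over items j within each cluster c.
    new_w = []
    for c in range(C):
        scores = [sum(u[c][i] * R[i][j] for i in range(n)) for j in range(m)]
        new_w.append(softmax(scores, lam_w))
    return u, new_w

# Hypothetical toy data: objects 0-1 use items 0-1, object 2 uses items 2-3.
R = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1]]
w = [[0.4, 0.3, 0.2, 0.1],
     [0.1, 0.2, 0.3, 0.4]]
for _ in range(20):
    u, w = fccm_step(R, w, lam_u=0.1, lam_w=1.0)
```

With the shift, a call such as `softmax([1000.0, 0.0], 0.001)` returns a valid distribution, whereas a direct `math.exp(1000.0 / 0.001)` would raise `OverflowError`.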

##### 2.2. Fuzzy CoDoK

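As an alternative approach, Kummamuru et al. [6] extended FCCM by introducing the quadratic term-based fuzzification mechanism [7] instead of the entropy-based fuzzification, so that it can handle larger data sets. The objective function of fuzzy coclustering of documents and keywords (Fuzzy CoDoK) was proposed as

$$L_{\mathrm{codok}} = \sum_{c=1}^{C}\sum_{i=1}^{n}\sum_{j=1}^{m} u_{ci}\, w_{cj}\, r_{ij} - \lambda_u \sum_{c=1}^{C}\sum_{i=1}^{n} u_{ci}^{2} - \lambda_w \sum_{c=1}^{C}\sum_{j=1}^{m} w_{cj}^{2}, \tag{4}$$

where $\lambda_u$ and $\lambda_w$ play similar roles to those in FCCM.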

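Based on the Lagrangian multiplier method, the updating rules are obtained as

$$u_{ci} = \frac{1}{C} + \frac{1}{2\lambda_u}\left(\sum_{j=1}^{m} w_{cj}\, r_{ij} - \frac{1}{C}\sum_{\ell=1}^{C}\sum_{j=1}^{m} w_{\ell j}\, r_{ij}\right), \qquad w_{cj} = \frac{1}{m} + \frac{1}{2\lambda_w}\left(\sum_{i=1}^{n} u_{ci}\, r_{ij} - \frac{1}{m}\sum_{\ell=1}^{m}\sum_{i=1}^{n} u_{ci}\, r_{i\ell}\right). \tag{5}$$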

The updating rules are numerically more stable than those of FCCM because the memberships are computed on a linear scale rather than through the exponential function. However, $u_{ci}$ and $w_{cj}$ can become negative, in which case they are not valid memberships under the constraints. In practice, negative memberships are therefore set to zero, and the remaining positive memberships are renormalized so that their sum is one.
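The clipping and renormalization step can be sketched as follows. This is a minimal illustration, assuming the quadratic-regularized update has the form "uniform value plus centered score divided by twice the fuzzification weight"; the function names `quadratic_update` and `project` and the example scores are hypothetical.

```python
def quadratic_update(scores, weight):
    """Raw Fuzzy CoDoK-style update: 1/K + (score - mean score) / (2 * weight).

    The raw values always sum to one but may be negative.
    """
    K = len(scores)
    mean = sum(scores) / K
    return [1.0 / K + (s - mean) / (2.0 * weight) for s in scores]

def project(values):
    """Set negative memberships to zero and renormalize the rest to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)  # positive, because the raw values sum to one
    return [v / total for v in clipped]

# Hypothetical aggregated cooccurrence scores for three candidates.
raw = quadratic_update([2.0, 1.0, 0.0], weight=1.0)
memberships = project(raw)
```

The same two helpers apply to object memberships (normalized over clusters for each object) and to item memberships (normalized over items within each cluster); only the list of scores passed in differs.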

Despite the usefulness of these fuzzy coclustering models in handling two-mode cooccurrence information, their cocluster structures may be influenced by other, third elements. Specifically, if each item is related to some other ingredients, the partition quality is expected to be improved by considering the intrinsic relation among the three-mode elements. In the following section, the FCM-type coclustering algorithms are extended for analyzing such three-mode cooccurrence information data.

#### 3. Extension of FCM-Type Coclustering for Three-Mode Cooccurrence Data Analysis

Assume that we have $n \times m$ cooccurrence information $R = \{r_{ij}\}$ among $n$ objects and $m$ items, and the items are characterized by $p$ other ingredients, where the cooccurrence information among $m$ items and $p$ ingredients is summarized in an $m \times p$ matrix $S = \{s_{jk}\}$, with $s_{jk}$ representing the cooccurrence degree of item $j$ and ingredient $k$. For example, in food preference analysis, $R$ can be an evaluation matrix by $n$ users on $m$ foods, and $S$ may be the appearance/absence of $p$ cooking ingredients in $m$ foods. The goal of three-mode cocluster analysis is to reveal the cocluster structures among the objects, items, and ingredients considering $R$ and $S$ and the intrinsic relation among objects and ingredients.

In order to extend the conventional FCCM and Fuzzy CoDoK algorithms to three-mode cocluster analysis, additional memberships $z_{ck}$ are introduced for representing the membership degree of ingredient $k$ to cocluster $c$. In addition to familiar pairs of objects and items occurring in the same cluster, typical ingredients of the items should also belong to the same cluster. Then, the aggregation degree to be maximized in three-mode coclustering can be defined as

$$\sum_{c=1}^{C}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{p} u_{ci}\, w_{cj}\, z_{ck}\, r_{ij}\, s_{jk}, \tag{6}$$

where each cluster should be composed of a familiar group of objects, items, and ingredients, such that they are assigned to the same cluster when object $i$ cooccurs with item $j$ composed of ingredient $k$, implying an intrinsic connection between object $i$ and ingredient $k$.
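As a small worked illustration of the three-mode aggregation degree, the sketch below sums the products of object, item, and ingredient memberships with both cooccurrence matrices. The variable names (`u`, `w`, `z`, `R`, `S`) and the toy matrices are assumptions made for this example. The sum over objects and the sum over ingredients are factored per item, which gives the same value as the full quadruple sum.

```python
def aggregation(u, w, z, R, S):
    """Sum over c, i, j, k of u[c][i] * w[c][j] * z[c][k] * R[i][j] * S[j][k]."""
    total = 0.0
    for c in range(len(u)):
        for j in range(len(w[c])):
            # Support for item j from objects in cluster c.
            obj = sum(u[c][i] * R[i][j] for i in range(len(R)))
            # Support for item j from ingredients in cluster c.
            ing = sum(z[c][k] * S[j][k] for k in range(len(S[j])))
            total += w[c][j] * obj * ing
    return total

# Hypothetical toy data: object 0 uses item 0, which contains ingredient 0;
# object 1 uses item 1, which contains ingredient 1.
R = [[1, 0], [0, 1]]
S = [[1, 0], [0, 1]]
u = [[1.0, 0.0], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]

aligned = aggregation(u, w, [[1.0, 0.0], [0.0, 1.0]], R, S)
mismatched = aggregation(u, w, [[0.0, 1.0], [1.0, 0.0]], R, S)
```

In this toy case the criterion is larger when each ingredient sits in the same cluster as the item that contains it, which is the behavior the aggregation degree is meant to reward.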

In the following parts of this section, the conventional FCCM and Fuzzy CoDoK algorithms are extended to their three-mode versions utilizing the above aggregation criterion.

##### 3.1. Three-Mode Extension of FCCM

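First, the FCCM algorithm is extended by using the modified aggregation criterion of (6) supported by the entropy-based fuzzification scheme. The objective function for three-mode FCCM (3FCCM) is constructed by modifying the FCCM objective function of (2) as

$$L_{\mathrm{3fccm}} = \sum_{c=1}^{C}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{p} u_{ci}\, w_{cj}\, z_{ck}\, r_{ij}\, s_{jk} - \lambda_u \sum_{c=1}^{C}\sum_{i=1}^{n} u_{ci}\log u_{ci} - \lambda_w \sum_{c=1}^{C}\sum_{j=1}^{m} w_{cj}\log w_{cj} - \lambda_z \sum_{c=1}^{C}\sum_{k=1}^{p} z_{ck}\log z_{ck}, \tag{7}$$

where $\lambda_z$ is the additional penalty weight for fuzzification of ingredient memberships $z_{ck}$. The larger the value of $\lambda_z$ is, the fuzzier the ingredient memberships are.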

Here, it should be noted that we can adopt two different types of constraints on ingredient memberships $z_{ck}$: the object-type probabilistic constraint $\sum_{c=1}^{C} z_{ck} = 1$ or the item-type typicality constraint $\sum_{k=1}^{p} z_{ck} = 1$. In such cases as food preference analysis, some common ingredients may be widely used in many foods, while other rare ingredients can be negligible in all clusters. Then, from the viewpoint of typical ingredient selection for characterizing cocluster features, the item-type typicality constraint is adopted in this paper, such that $\sum_{k=1}^{p} z_{ck} = 1$.

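The clustering algorithm is an iterative process of updating $u_{ci}$, $w_{cj}$, and $z_{ck}$ under the alternating optimization principle. Considering the necessary conditions for optimality, $\partial L_{\mathrm{3fccm}}/\partial u_{ci} = 0$, $\partial L_{\mathrm{3fccm}}/\partial w_{cj} = 0$, and $\partial L_{\mathrm{3fccm}}/\partial z_{ck} = 0$, under the sum-to-one constraints, the updating rules for the three memberships are given as

$$u_{ci} = \frac{\exp\left(\lambda_u^{-1}\sum_{j=1}^{m}\sum_{k=1}^{p} w_{cj}\, z_{ck}\, r_{ij}\, s_{jk}\right)}{\sum_{\ell=1}^{C}\exp\left(\lambda_u^{-1}\sum_{j=1}^{m}\sum_{k=1}^{p} w_{\ell j}\, z_{\ell k}\, r_{ij}\, s_{jk}\right)}, \tag{8}$$

$$w_{cj} = \frac{\exp\left(\lambda_w^{-1}\sum_{i=1}^{n}\sum_{k=1}^{p} u_{ci}\, z_{ck}\, r_{ij}\, s_{jk}\right)}{\sum_{\ell=1}^{m}\exp\left(\lambda_w^{-1}\sum_{i=1}^{n}\sum_{k=1}^{p} u_{ci}\, z_{ck}\, r_{i\ell}\, s_{\ell k}\right)}, \tag{9}$$

$$z_{ck} = \frac{\exp\left(\lambda_z^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m} u_{ci}\, w_{cj}\, r_{ij}\, s_{jk}\right)}{\sum_{\ell=1}^{p}\exp\left(\lambda_z^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m} u_{ci}\, w_{cj}\, r_{ij}\, s_{j\ell}\right)}. \tag{10}$$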
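The object-membership rule can be checked with a short Lagrange-multiplier derivation. The sketch below assumes the notation $u_{ci}$, $w_{cj}$, $z_{ck}$ for the three memberships, $r_{ij}$ and $s_{jk}$ for the cooccurrence degrees, and $\lambda_u$ for the object fuzzification weight. The shorthand $a_{ci}$ and the multiplier $\gamma_i$ are introduced only for this illustration.

```latex
% Shorthand (an assumption of this sketch): a_{ci} collects the terms multiplying u_{ci}.
a_{ci} = \sum_{j=1}^{m}\sum_{k=1}^{p} w_{cj}\, z_{ck}\, r_{ij}\, s_{jk}

% Lagrangian for object i, with multiplier \gamma_i for \sum_c u_{ci} = 1:
\mathcal{L}_i = \sum_{c=1}^{C} a_{ci}\, u_{ci}
              - \lambda_u \sum_{c=1}^{C} u_{ci}\log u_{ci}
              - \gamma_i \Bigl(\sum_{c=1}^{C} u_{ci} - 1\Bigr)

% Stationarity with respect to u_{ci}:
\frac{\partial \mathcal{L}_i}{\partial u_{ci}}
  = a_{ci} - \lambda_u(\log u_{ci} + 1) - \gamma_i = 0
\;\Longrightarrow\;
u_{ci} = \exp\Bigl(\frac{a_{ci}}{\lambda_u}\Bigr)\exp\Bigl(-1 - \frac{\gamma_i}{\lambda_u}\Bigr)

% Eliminating \gamma_i through the sum-to-one constraint:
u_{ci} = \frac{\exp(a_{ci}/\lambda_u)}{\sum_{\ell=1}^{C}\exp(a_{\ell i}/\lambda_u)}
```

The item and ingredient rules follow in the same way, with the normalization running over items or ingredients within each cluster instead of over clusters.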

##### 3.2. Three-Mode Extension of Fuzzy CoDoK

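Next, Fuzzy CoDoK is extended to the three-mode coclustering model named three-mode Fuzzy CoDoK (3Fuzzy CoDoK). The objective function of (4) is modified as

$$L_{\mathrm{3codok}} = \sum_{c=1}^{C}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{p} u_{ci}\, w_{cj}\, z_{ck}\, r_{ij}\, s_{jk} - \lambda_u \sum_{c=1}^{C}\sum_{i=1}^{n} u_{ci}^{2} - \lambda_w \sum_{c=1}^{C}\sum_{j=1}^{m} w_{cj}^{2} - \lambda_z \sum_{c=1}^{C}\sum_{k=1}^{p} z_{ck}^{2}, \tag{11}$$

where $\lambda_z$ plays a similar role to that in 3FCCM, and the three types of fuzzy memberships follow the same constraints as in 3FCCM.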

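The updating rules are given in a similar manner to the previous section as follows:

$$u_{ci} = \frac{1}{C} + \frac{1}{2\lambda_u}\left(\sum_{j=1}^{m}\sum_{k=1}^{p} w_{cj}\, z_{ck}\, r_{ij}\, s_{jk} - \frac{1}{C}\sum_{\ell=1}^{C}\sum_{j=1}^{m}\sum_{k=1}^{p} w_{\ell j}\, z_{\ell k}\, r_{ij}\, s_{jk}\right), \tag{12}$$

$$w_{cj} = \frac{1}{m} + \frac{1}{2\lambda_w}\left(\sum_{i=1}^{n}\sum_{k=1}^{p} u_{ci}\, z_{ck}\, r_{ij}\, s_{jk} - \frac{1}{m}\sum_{\ell=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{p} u_{ci}\, z_{ck}\, r_{i\ell}\, s_{\ell k}\right), \tag{13}$$

$$z_{ck} = \frac{1}{p} + \frac{1}{2\lambda_z}\left(\sum_{i=1}^{n}\sum_{j=1}^{m} u_{ci}\, w_{cj}\, r_{ij}\, s_{jk} - \frac{1}{p}\sum_{\ell=1}^{p}\sum_{i=1}^{n}\sum_{j=1}^{m} u_{ci}\, w_{cj}\, r_{ij}\, s_{j\ell}\right). \tag{14}$$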

As with Fuzzy CoDoK, the above updating rules are computationally more stable than those of 3FCCM because they do not involve the $\exp(\cdot)$ function. However, $u_{ci}$, $w_{cj}$, and $z_{ck}$ can become negative. In practice, the negative memberships should then be set to zero, and the remaining positive memberships renormalized so that their sum is one.
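One sweep of the three-mode quadratic-regularized updates, including the clipping of negative values, might look like the following pure-Python sketch. It assumes each update has the form "uniform value plus centered three-mode score divided by twice the fuzzification weight"; the names `codok3_sweep`, `quadratic_update`, and the toy matrices are hypothetical.

```python
def quadratic_update(scores, weight):
    """1/K + centered score / (2 * weight), then clip negatives and renormalize."""
    K = len(scores)
    mean = sum(scores) / K
    raw = [1.0 / K + (s - mean) / (2.0 * weight) for s in scores]
    clipped = [max(v, 0.0) for v in raw]
    total = sum(clipped)  # positive, because the raw values sum to one
    return [v / total for v in clipped]

def codok3_sweep(R, S, u, w, z, lam_u, lam_w, lam_z):
    """Update u (over clusters), then w (over items), then z (over ingredients)."""
    C, n, m, p = len(u), len(R), len(S), len(S[0])
    for i in range(n):
        scores = [sum(w[c][j] * z[c][k] * R[i][j] * S[j][k]
                      for j in range(m) for k in range(p)) for c in range(C)]
        for c, value in enumerate(quadratic_update(scores, lam_u)):
            u[c][i] = value
    for c in range(C):
        scores = [sum(u[c][i] * z[c][k] * R[i][j] * S[j][k]
                      for i in range(n) for k in range(p)) for j in range(m)]
        w[c] = quadratic_update(scores, lam_w)
    for c in range(C):
        scores = [sum(u[c][i] * w[c][j] * R[i][j] * S[j][k]
                      for i in range(n) for j in range(m)) for k in range(p)]
        z[c] = quadratic_update(scores, lam_z)
    return u, w, z

# Hypothetical block-structured toy data (4 objects, 4 items, 4 ingredients).
block = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
R, S = block, block
u = [[0.5] * 4, [0.5] * 4]
w = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
z = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
for _ in range(10):
    u, w, z = codok3_sweep(R, S, u, w, z, lam_u=0.5, lam_w=1.0, lam_z=1.0)
```

On this toy input, ingredients that never occur with a cluster's items receive negative raw values and are clipped to exactly zero, which shows how the quadratic regularizer tends to produce sparse memberships.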

##### 3.3. A Sample Algorithm for FCM-Type Fuzzy Coclustering for Three-Mode Cooccurrence Data

Following the above derivation, a sample algorithm is represented as follows:

*[FCM-Type Fuzzy Coclustering for Three-Mode Cooccurrence Data: 3FCCM and 3Fuzzy CoDoK]*

(1) Given $n \times m$ cooccurrence matrix $R$ and $m \times p$ cooccurrence matrix $S$, let $C$ be the number of clusters. Choose the fuzzification weights $\lambda_u$, $\lambda_w$, and $\lambda_z$.

(2) *[Initialization]* Randomly initialize $u_{ci}$, $w_{cj}$, and $z_{ck}$, such that $\sum_{c=1}^{C} u_{ci} = 1$, $\sum_{j=1}^{m} w_{cj} = 1$, and $\sum_{k=1}^{p} z_{ck} = 1$.

(3) *[Iterative process]* Iterate the following process until convergence of all $u_{ci}$.

(a) Update $u_{ci}$ with (8) for 3FCCM or (12) for 3Fuzzy CoDoK.

(b) Update $w_{cj}$ with (9) for 3FCCM or (13) for 3Fuzzy CoDoK.

(c) Update $z_{ck}$ with (10) for 3FCCM or (14) for 3Fuzzy CoDoK.
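The three steps above can be sketched for the 3FCCM variant as follows. This is a minimal pure-Python illustration, not the authors' implementation. The function name `fccm3`, the stopping tolerance `tol`, the iteration cap `max_iter`, the seeded random initialization, and the max-shift inside `softmax` are all assumptions introduced for the example.

```python
import math
import random

def softmax(scores, weight):
    """exp(score / weight), normalized; shifted by the max to avoid overflow."""
    scaled = [s / weight for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def random_rows(rows, cols, rng):
    """Random positive rows, each normalized to sum to one."""
    out = []
    for _ in range(rows):
        raw = [rng.random() + 0.1 for _ in range(cols)]
        total = sum(raw)
        out.append([v / total for v in raw])
    return out

def fccm3(R, S, C, lam_u, lam_w, lam_z, max_iter=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    n, m, p = len(R), len(S), len(S[0])
    # Step (2): w and z sum to one within each cluster, u over clusters.
    w = random_rows(C, m, rng)
    z = random_rows(C, p, rng)
    per_object = random_rows(n, C, rng)
    u = [[per_object[i][c] for i in range(n)] for c in range(C)]
    for _ in range(max_iter):
        old = [row[:] for row in u]
        # Step (3a): object memberships, normalized over clusters.
        for i in range(n):
            scores = [sum(w[c][j] * z[c][k] * R[i][j] * S[j][k]
                          for j in range(m) for k in range(p)) for c in range(C)]
            for c, value in enumerate(softmax(scores, lam_u)):
                u[c][i] = value
        # Step (3b): item memberships, normalized over items.
        for c in range(C):
            w[c] = softmax([sum(u[c][i] * z[c][k] * R[i][j] * S[j][k]
                                for i in range(n) for k in range(p))
                            for j in range(m)], lam_w)
        # Step (3c): ingredient memberships, normalized over ingredients.
        for c in range(C):
            z[c] = softmax([sum(u[c][i] * w[c][j] * R[i][j] * S[j][k]
                                for i in range(n) for j in range(m))
                            for k in range(p)], lam_z)
        # Stop when no object membership changes by more than tol.
        if max(abs(u[c][i] - old[c][i])
               for c in range(C) for i in range(n)) < tol:
            break
    return u, w, z

# Hypothetical block-structured toy data.
block = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
u, w, z = fccm3(block, block, C=2, lam_u=0.1, lam_w=1.0, lam_z=1.0)
```

Swapping `softmax` for a quadratic-regularized update with clipping would give the corresponding 3Fuzzy CoDoK loop; the outer iteration and the order of the three updates stay the same.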

#### 4. Experimental Results

##### 4.1. Experimental Design

In order to demonstrate the characteristics of the proposed algorithms, a numerical experiment was performed with an artificially generated three-mode data set, in which 40 objects ($n = 40$) have relational connections with 50 items ($m = 50$) and the items are related to 30 ingredients ($p = 30$). The artificial three-mode cooccurrence matrices were generated under the assumption that objects and ingredients have intrinsic (unknown) connections, as shown in the matrix of Figure 1(a), where black and white cells represent full connection and no connection, respectively. (Note that all the following gray-scale figures depict visual images of matrices, where black and white cells represent maximum and minimum values.)