Complexity

Volume 2018, Article ID 9653404, 16 pages

https://doi.org/10.1155/2018/9653404

## Multityped Community Discovery in Time-Evolving Heterogeneous Information Networks Based on Tensor Decomposition

Correspondence should be addressed to Hongbin Huang; hbhuang@nudt.edu.cn

Received 29 August 2017; Revised 15 January 2018; Accepted 31 January 2018; Published 6 March 2018

Academic Editor: Manlio De Domenico

Copyright © 2018 Jibing Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Heterogeneous information networks, which consist of multiple types of objects connected by semantically rich links, are omnipresent in real-world applications. Community discovery is an effective method for extracting the hidden structures in networks. Heterogeneous information networks are usually time-evolving: their objects and links are dynamic and vary gradually. Community discovery in such time-evolving heterogeneous information networks is challenging and considerably more difficult than in traditional static homogeneous information networks. In contrast to communities in traditional approaches, which contain only one type of object and link, communities in heterogeneous information networks contain multiple types of dynamic objects and links. Some recent studies focus on dynamic heterogeneous information networks and achieve satisfactory results. However, they assume that heterogeneous information networks follow simple schemas, such as the bityped network schema or the star network schema. In this paper, we propose a multityped community discovery method for time-evolving heterogeneous information networks with general network schemas. A tensor decomposition framework, which integrates tensor CP factorization with a temporal evolution regularization term, is designed to model the multityped communities and address their evolution. Experimental results on both synthetic and real-world datasets demonstrate the efficiency of our framework.

#### 1. Introduction

Most artificial online systems, such as the World Wide Web, social networks, and collaboration networks, can be represented as information networks, which describe the interactions and relationships between numerous objects, for example, hyperlinks between web pages, friendships between users, and coauthorships between researchers. Information network analysis is attracting an increasing number of researchers from a variety of fields, such as social science [1, 2], machine learning [3–5], and recommendation systems [6, 7]. Community discovery, one of the most significant topics in information network analysis, aims to discover interpretable hidden structures, patterns of interaction among objects, and their evolution over time. Although community detection in networks has been studied for many years, most existing approaches are designed to analyze static information networks [1, 8, 9] and homogeneous information networks [10–12]. That is, the network contains only one type of object and link, and the objects and links do not vary over time.

However, in real-world scenarios, information networks are typically heterogeneous and time-evolving. In contrast to communities in traditional approaches, which contain only one type of static object and link, communities in time-evolving heterogeneous information networks contain multiple types of dynamic objects and links. For example, the DBLP network, an open resource covering most bibliographic information on computer science, is a typical time-evolving heterogeneous information network. The DBLP network contains four types of objects: author, paper, venue (i.e., conference or journal), and term. The links between different object types represent different semantic relationships, such as “an author wrote a paper” and “a paper is published in a conference.” The most intriguing communities in DBLP are research areas, which contain the authors with similar research interests, the papers they wrote, the conferences they attended, and the terms they used. As new authors and new hot topics are added, the structures of the communities change gradually.

Traditional community discovery methods can be applied to a time-evolving heterogeneous information network by converting it into a set of homogeneous information networks and aggregating the time-evolving objects and links across all timestamps into one snapshot. However, this conversion loses the rich semantic relationships among different object types and the dynamic properties of the communities. In recent years, community discovery in time-evolving heterogeneous information networks has emerged as an outstanding challenge and attracted the attention of many researchers. For instance, Sun et al. used net-clusters [13] to describe the communities and proposed Evo-NetClus [14, 15], an algorithm based on the Dirichlet process mixture model, to detect the communities in heterogeneous information networks with the star network schema. In the star network schema, links appear only between target objects and attribute objects.

In this paper, we focus on community discovery in time-evolving heterogeneous information networks with general network schemas, which presents several challenges:

(i) Heterogeneity: the communities in heterogeneous information networks are themselves heterogeneous and contain multityped objects and links.

(ii) Time-varying: the communities are constantly changing, with new objects arriving and old objects vanishing. We assume that the evolution of communities between two adjacent snapshots should be smooth.

(iii) Suitability for general network schemas: the network schema of a heterogeneous information network is often more complex than the star network schema. The community discovery method should be able to handle general network schemas.

(iv) Online mode: although some offline frameworks can produce a global view of community evolution over time by capturing all historical information, an online framework is more realistic.

To overcome the aforementioned challenges, we propose a tensor decomposition framework for modeling the multityped communities and addressing their evolution in time-evolving heterogeneous information networks with general network schemas. Essentially, a time-evolving heterogeneous information network consists of a sequence of network snapshots. We model it as a sequence of multiway arrays, that is, tensors. Tensors are a highly effective and faithful way to model higher-order (multiway) data and can naturally express the complex structures and interactions in heterogeneous information networks. By integrating tensor CP factorization with a temporal evolution regularization term, the multityped communities and their evolution over time can be formalized as a tensor decomposition problem. A second-order stochastic gradient descent algorithm is presented to solve the problem, and the experimental results on both synthetic and real-world datasets demonstrate the efficiency of our framework.
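The idea of combining a CP fit with a temporal smoothness penalty can be illustrated with a minimal sketch. It rests on assumptions that are not taken from this section: a 3-way snapshot tensor, a squared-error fit term, and a squared Frobenius penalty on the drift of the factor matrices between adjacent snapshots. The names `cp_reconstruct`, `regularized_loss`, and the weight `lam` are hypothetical, and the sketch only evaluates an objective; it does not implement the second-order stochastic gradient descent solver.

```python
import numpy as np

def cp_reconstruct(factors):
    """Rebuild a 3-way tensor from CP factor matrices A (I x K), B (J x K), C (L x K)."""
    A, B, C = factors
    # Sum over the K rank-one components: X[i, j, l] = sum_k A[i, k] * B[j, k] * C[l, k]
    return np.einsum("ik,jk,lk->ijl", A, B, C)

def regularized_loss(tensor, factors, prev_factors, lam):
    """Squared CP fit error plus an assumed smoothness penalty on factor drift."""
    fit = np.sum((tensor - cp_reconstruct(factors)) ** 2)
    # Assumed penalty: squared distance to the previous snapshot's factors.
    drift = sum(np.sum((F - P) ** 2) for F, P in zip(factors, prev_factors))
    return fit + lam * drift

# Toy snapshot: 4 x 3 x 5 objects across three types, K = 2 communities.
rng = np.random.default_rng(0)
prev = [rng.random((n, 2)) for n in (4, 3, 5)]
X = cp_reconstruct(prev)                 # a snapshot that the old factors fit exactly
shifted = [F + 0.1 for F in prev]        # candidate factors that drift from the old ones

loss_same = regularized_loss(X, prev, prev, lam=0.5)
loss_shifted = regularized_loss(X, shifted, prev, lam=0.5)
```

In this toy setting `loss_same` is zero because the previous factors reproduce the snapshot and there is no drift, while `loss_shifted` is positive. A larger `lam` penalizes drift more heavily and therefore favors smoother community evolution.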

The rest of this paper is organized as follows. In Section 2, we discuss the related work on community discovery in time-evolving heterogeneous information networks. Section 3 formalizes the problem as tensor decomposition, which integrates tensor CP factorization with a temporal evolution regularization term. A second-order stochastic gradient descent algorithm is presented in Section 4. Section 5 discusses some implementation issues, including dead and new objects, online deployment, and time complexity analysis. The experimental results on both synthetic and real-world datasets are presented in Section 6. Finally, the conclusions are drawn in Section 7.

#### 2. Related Work

Community discovery is a fundamental technique in information network analysis. Many creative methods for discovering communities in static and homogeneous networks have been developed in the past decades. The stochastic block model [16, 17] and the mixed membership model [18] are powerful probabilistic community discovery models for analyzing static networks. These two models, however, cannot handle time-evolving networks and cannot be directly applied to heterogeneous information networks.

Tracking the evolution of communities [11, 19] takes the dynamic properties of time-evolving networks into consideration. A commonly used framework [20–22] applies a static community detection algorithm to each snapshot of the time-evolving network and then derives the evolution of communities by matching the communities of adjacent snapshots. Another approach to tracking community evolution is the multiobjective optimization model [23–25], which integrates measures of community quality and temporal smoothness into a multiobjective cost function. Nevertheless, these methods are designed for homogeneous networks.
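The snapshot-matching step of the first framework can be sketched as follows. This is an illustration only, not the procedure of any cited method: it assumes that communities are plain sets of object identifiers and that two communities are matched by Jaccard similarity. The function names and the `threshold` parameter are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of object ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_communities(prev, curr, threshold=0.3):
    """Map each current community index to its best previous match, or None.

    A current community with no sufficiently similar predecessor is
    treated as newly formed (assumed rule, for illustration).
    """
    matches = {}
    for j, community in enumerate(curr):
        best_score, best_i = max(
            ((jaccard(p, community), i) for i, p in enumerate(prev)),
            default=(0.0, None),
        )
        matches[j] = best_i if best_score >= threshold else None
    return matches

# Communities detected independently in two adjacent snapshots.
prev_snapshot = [{"a1", "a2", "p1"}, {"a3", "p2"}]
curr_snapshot = [{"a1", "a2", "p1", "p3"}, {"a4", "p4"}]
matches = match_communities(prev_snapshot, curr_snapshot)
```

Here the first current community continues the first previous one (similarity 0.75), and the second has no predecessor. Because each snapshot is clustered independently, nothing in this scheme enforces smooth evolution between snapshots.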

Recently, community discovery in heterogeneous information networks has become a hot topic. Tang et al. studied community evolution in multimode networks and proposed a framework that partitions the multimode network into a set of bityped networks [26, 27]. Sun et al. used net-clusters [13] to describe the communities and proposed Evo-NetClus [14, 15] to detect the communities automatically. However, net-clusters and Evo-NetClus are only suitable for the star network schema, where links appear only between target objects and attribute objects.

For heterogeneous information networks with general network schemas, tensor factorization offers a promising way to extract hidden communities. A tensor can effectively express the complicated yet interpretable structures among the different dimensions of a heterogeneous information network. For instance, Lin et al. proposed MetaGraph Factorization [28, 29] to detect communities in dynamic social networks. In addition, a tensor factorization based mixed membership framework [30] models the generation of communities with a Dirichlet distribution and can identify the communities automatically. However, this method requires the heterogeneous network to be manually partitioned into four parts and organized as a 3-star network. Moreover, the 3-star count tensor must be converted to an orthogonal symmetric tensor. Thus the capability of this method to deal with time-evolving heterogeneous information networks could be degraded.

Our prior works [31–33] have also focused on clustering heterogeneous information networks based on tensor decomposition, which can cluster multityped objects simultaneously. However, these methods treat heterogeneous information networks as static and integrate the time-evolving networks into one snapshot, which loses the dynamic properties of multityped objects and links.

Another line of work related to ours is incremental tensor factorization [34]. Though tensor factorization has been widely studied in many domains, such as image processing [35] and computer vision [36], incremental tensor factorization remains challenging [34]. Sun et al. proposed a general framework of incremental tensor analysis [34] for mining higher-order data streams, which includes three methods: dynamic tensor analysis, streaming tensor analysis, and window-based tensor analysis. Even though higher-order data streams can be effectively analyzed in such a framework, the smooth evolution of latent patterns cannot be guaranteed.

#### 3. Problem Formulation

Following the work of Sun et al. [15] and our prior work [33], we first introduce some definitions concerning heterogeneous information networks and the construction of a tensor from a given heterogeneous information network.

A *heterogeneous information network* [15] is a graph $G = (V, E)$ consisting of more than one type of objects $V$ or links $E$. Assume that $V$ belongs to $T$ object types $\mathcal{V} = \{V_t\}_{t=1}^{T}$, and $E$ belongs to $M$ link types $\mathcal{E} = \{E_m\}_{m=1}^{M}$. That is, in a heterogeneous information network, $T > 1$ or $M > 1$. Otherwise, the network becomes a homogeneous information network.

$V_t$ denotes the set of objects of the $t$th type. We denote an arbitrary object in $V_t$ as $v_n^t$, for $n = 1, 2, \ldots, N_t$ and $t = 1, 2, \ldots, T$, where $N_t$ is the number of objects of type $V_t$; that is, $N_t = |V_t|$. Thus, the total number of objects in the heterogeneous information network is given by $N = \sum_{t=1}^{T} N_t$.
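As a rough illustration of these definitions, the sketch below stores objects grouped by type and links grouped by link type, then checks the heterogeneity condition (more than one object type or more than one link type) and counts the total number of objects as the sum of the per-type counts. The class `HeteroNetwork`, its methods, and the DBLP-style identifiers are hypothetical and chosen only for this example.

```python
from collections import defaultdict

class HeteroNetwork:
    """Toy container: objects grouped by type, links grouped by link type."""

    def __init__(self):
        self.objects = defaultdict(set)  # object type -> set of object ids
        self.links = defaultdict(set)    # link type -> set of (source, target) pairs

    def add_link(self, link_type, src, dst):
        """Add a typed link; src and dst are (object_type, object_id) pairs."""
        for obj_type, obj_id in (src, dst):
            self.objects[obj_type].add(obj_id)
        self.links[link_type].add((src, dst))

    def is_heterogeneous(self):
        """True if there is more than one object type or more than one link type."""
        return len(self.objects) > 1 or len(self.links) > 1

    def total_objects(self):
        """Total object count: the sum of the per-type object counts."""
        return sum(len(ids) for ids in self.objects.values())

net = HeteroNetwork()
net.add_link("writes", ("author", "a1"), ("paper", "p1"))
net.add_link("writes", ("author", "a2"), ("paper", "p1"))
net.add_link("published_in", ("paper", "p1"), ("venue", "v1"))

# A network with a single object type and a single link type is homogeneous.
friends = HeteroNetwork()
friends.add_link("friend_of", ("user", "u1"), ("user", "u2"))
```

The first network has three object types and two link types, so it is heterogeneous and holds four objects in total; the friendship network is homogeneous.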

The *network schema* [15] for a given heterogeneous information network $G$ is a metatemplate that describes the object types and link types in the network. The network schema is denoted by $S_G$. In other words, $G$ is an instance of $S_G$. For example, the star network schema shown in Figure 1 is a typical network schema with four types of objects: author, paper, venue, and term. In Figure 1, paper is the target object, and the others are attribute objects. The defining feature of the star network schema is that links appear only between the target object and the attribute objects.
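The star property can be checked mechanically once a schema is written down as a list of object types and a list of link types. The sketch below is a hypothetical helper, not part of the paper: it represents each link type as a pair of object types and looks for a type that appears in every link type and is always paired with a different type.

```python
def find_star_target(object_types, link_types):
    """Return the target type if the schema is a star schema, else None.

    A schema is treated as a star (assumed reading of the definition)
    when one type takes part in every link type and each link type
    joins it to a different type.
    """
    if not link_types:
        return None
    for target in object_types:
        if all(target in link and len(set(link)) == 2 for link in link_types):
            return target
    return None

dblp_types = ["author", "paper", "venue", "term"]

# Star schema: every link type touches "paper".
star_links = [("paper", "author"), ("paper", "venue"), ("paper", "term")]

# General schema: an extra author-venue link type breaks the star shape.
general_links = star_links + [("author", "venue")]

star_result = find_star_target(dblp_types, star_links)
general_result = find_star_target(dblp_types, general_links)
```

For the star schema the helper identifies `"paper"` as the target type, and for the general schema it returns `None`, which is the case that methods restricted to star schemas cannot handle.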