Mathematical Problems in Engineering

Volume 2016 (2016), Article ID 3790590, 15 pages

http://dx.doi.org/10.1155/2016/3790590

## A Neighborhood-Impact Based Community Detection Algorithm via Discrete PSO

Aeronautics and Astronautics Engineering College, Air Force Engineering University, Xi’an 710038, China

Received 30 August 2015; Revised 6 December 2015; Accepted 31 December 2015

Academic Editor: Daniel Aloise

Copyright © 2016 Dongqing Zhou and Xing Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper applies particle swarm optimization (PSO) to the community detection problem and proposes an algorithm based on a new label strategy. In contrast with other label propagation strategies, the main contribution of this paper is to define the impact of a node and put it to use. Special initialization and update approaches are designed to make full use of this definition. Experiments on synthetic and real-life networks show the effectiveness of the proposed strategy. Furthermore, the strategy is extended to signed networks, and the corresponding objective function, called modularity density, is modified for use in signed networks. Experiments on real-life networks also demonstrate that it is an effective way to solve the community detection problem.

#### 1. Introduction

Complex networks have attracted increasing attention from researchers in different fields, such as physics, sociology, mathematics, and computer science [1]. The related theory has been widely applied in many areas, including the Internet [2], communication [3], biology [4], and economy [5]. Networks are usually composed of subgroup structures whose intraconnections are dense and whose interconnections are sparse. This property is called community structure. Detecting community structure is one of the fundamental issues in the study of networks; it can reveal latent meaningful structure in networks. It is particularly important to detect the structure of commonly used networks, such as daily social networks, recommendation systems, and national power distribution networks [6].

Community detection is an NP-hard problem [7]. Traditional methods for detecting communities in networks fall into two categories: graph partitioning and hierarchical clustering. Graph partitioning has been widely used in computer science and related fields [8]. However, it requires the number and sizes of the communities to be known before partitioning. Hierarchical clustering methods include agglomerative and divisive clustering algorithms; they do not require the number or size of the communities [9, 10]. However, their results depend on the specific similarity measure adopted.

In recent years, considerable effort has gone into developing new approaches and algorithms to detect and quantify community structure in complex networks. Newman originally introduced modularity as a stopping criterion, and it has become one of the most commonly used and best known quality functions. Many modularity based methods have been proposed, including modularity optimization, simulated annealing [11], extremal optimization [12], and spectral optimization [13]. These methods provide an effective way to solve the community detection problem, and much research builds on them. Pizzuti models community detection as a single-objective optimization problem in [14] and as a multiobjective optimization problem in [15]. Arenas and Díaz-Guilera investigate the connection between the dynamics of synchronization and the modularity on complex networks in [16]. However, modularity suffers from the resolution limit found by Fortunato and Barthélemy in [17]: modularity has an intrinsic scale, and modules smaller than this scale may not be resolved, even in extreme cases.

Li et al. propose a quantitative measure called the modularity density, which uses the average modularity degree to evaluate the partition of a network in [18]. This method provides a mesoscopic way to describe the network structure and overcomes the resolution limit of modularity. The larger the value of modularity density is, the more accurate a partition is. Thus, the community detection can be viewed as an optimization problem of finding a partition of a network which has the maximum modularity density.

Particle swarm optimization (PSO) has been used successfully in many optimization fields [19–22]. It was first proposed to solve continuous optimization problems; Kennedy and Eberhart then extended the continuous PSO to a discrete binary PSO in [23]. Chen et al. propose the candidate solution and velocity based on discrete PSO in [24]. Recently, Gong et al. used discrete PSO to solve the community detection problem in [25]. Compared with traditional optimization methods, discrete PSO is simple to implement and converges quickly. It needs to know neither the number nor the size of the communities. What is more, it needs hardly any of the mathematical assumptions that conventional methods may require. These advantages make it feasible and promising for solving community detection problems.

In this paper, a new algorithm based on discrete PSO is proposed to solve community detection problems by optimizing the modularity density function. We introduce a definition called the impact of node, which takes the neighborhood information into consideration. A new initialization and updating strategy based on the impact of node is proposed. Finally, we adapt the modularity density function to signed networks according to the character of communities, and the proposed strategies are also extended to signed networks.

The rest of this paper is organized as follows. Section 2 reviews community structure and modularity density. Section 3 gives a detailed description of the proposed method. Section 4 presents the experimental results of the proposed algorithm in comparison with other approaches. Finally, the conclusions are summarized in Section 5.

#### 2. Community Structure and Modularity Density

A network can be modeled as $G = (V, E)$, where $V$ and $E$ represent the vertices and edges, respectively. Assume that $A$ is the adjacency matrix of $G$. If there is a link between nodes $i$ and $j$, $A_{ij} = 1$; otherwise $A_{ij} = 0$. Suppose that $S$ is a subgraph of $G$, and $i$ is a node which belongs to $S$. $k_i$ is the degree of the node $i$; $k_i^{\text{in}} = \sum_{j \in S} A_{ij}$, $k_i^{\text{out}} = \sum_{j \notin S} A_{ij}$. The community of a network usually has the following property:

$$\sum_{i \in S} k_i^{\text{in}} > \sum_{i \in S} k_i^{\text{out}}.$$

It means that the sum of all degrees within the community is larger than the sum of all degrees toward the rest of the network.
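As a minimal sketch of the property above, the following Python snippet sums the internal and external degrees of a candidate node set and compares them. The adjacency-list dictionary `adj`, the helper names `degree_split` and `is_community`, and the toy graph are illustrative assumptions and do not come from the original paper.

```python
def degree_split(adj, community):
    """Return (sum of internal degrees, sum of external degrees) of a node set.

    adj maps each node to the list of its neighbors (undirected, unweighted).
    """
    members = set(community)
    k_in = sum(1 for i in members for j in adj[i] if j in members)
    k_out = sum(1 for i in members for j in adj[i] if j not in members)
    return k_in, k_out


def is_community(adj, community):
    """True if the summed internal degree exceeds the summed external degree."""
    k_in, k_out = degree_split(adj, community)
    return k_in > k_out


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

print(degree_split(adj, {0, 1, 2}))  # internal and external degree sums
print(is_community(adj, {2, 3}))     # the bridge pair is not a community
```

Each internal edge contributes to the internal sum twice (once per endpoint), which matches summing $k_i^{\text{in}}$ over the nodes of $S$.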

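If the links between nodes carry positive or negative signs, the network is called a signed network. A signed network can be modeled as $G = (V, PE, NE)$, where $V$ is the set of nodes and $PE$ and $NE$ are the sets of positive and negative links, respectively. Let $e_{ij}$ be the link between nodes $i$ and $j$; then the adjacency matrix $A$ can be formulated as follows: if $e_{ij} \in PE$, $A_{ij} = 1$; if $e_{ij} \in NE$, $A_{ij} = -1$. Then the communities of signed networks usually have the following property:

$$\sum_{i \in S} (k_i^{+})^{\text{in}} > \sum_{i \in S} (k_i^{-})^{\text{in}}, \qquad \sum_{i \in S} (k_i^{-})^{\text{out}} > \sum_{i \in S} (k_i^{+})^{\text{out}},$$

in which

$$(k_i^{+})^{\text{in}} = \sum_{j \in S,\, e_{ij} \in PE} A_{ij}, \quad (k_i^{-})^{\text{in}} = \sum_{j \in S,\, e_{ij} \in NE} |A_{ij}|, \quad (k_i^{+})^{\text{out}} = \sum_{j \notin S,\, e_{ij} \in PE} A_{ij}, \quad (k_i^{-})^{\text{out}} = \sum_{j \notin S,\, e_{ij} \in NE} |A_{ij}|.$$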

Intuitively, take the real-life signed networks SPP and GGS as examples (please refer to Section 4.5), where the solid and dashed lines represent positive and negative links, respectively. The positive links inside a community are dense, and the negative links between different communities are dense.

Community detection can be formulated as a *modularity* ($Q$) optimization problem. $Q$ was proposed by Newman in [13] and aims at finding a partition of the network. Suppose that there is a graph $G'$ whose edges are drawn at random and which has the same degree distribution as $G$. Modularity measures the number of inner edges over all the modules of $G$ minus the expected number of inner edges in $G'$ [26]. As $Q$ is an evaluation function for the community structure of the network, the larger the value of $Q$ is, the clearer the community structure is; otherwise, the structure is more obscure. The mathematical description of $Q$ is as follows:

$$Q = \frac{1}{2M} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2M} \right) \delta(i, j),$$

where $M$ is the number of edges in the network. If $i$ and $j$ are in the same community, $\delta(i, j) = 1$. Otherwise, $\delta(i, j) = 0$. According to [18], another form of $Q$ is as follows:

$$Q = \sum_{i=1}^{m} \left[ \frac{L(V_i, V_i)}{L(V, V)} - \left( \frac{L(V_i, V)}{L(V, V)} \right)^{2} \right],$$

where $m$ is the number of communities and $L(V_i, V_i) = \sum_{j \in V_i,\, k \in V_i} A_{jk}$, $L(V_i, V) = \sum_{j \in V_i,\, k \in V} A_{jk}$, and $L(V, V) = \sum_{j \in V,\, k \in V} A_{jk}$.
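To make the two forms of modularity concrete, the sketch below evaluates both on a small graph and checks that they agree. It assumes the standard Newman form $Q = \frac{1}{2M}\sum_{i,j}(A_{ij} - k_i k_j / 2M)\,\delta(i,j)$ and the community-wise form from [18]; the function names, the `labels` dictionary, and the toy graph are hypothetical.

```python
def modularity_pairwise(adj, labels):
    """Newman's Q summed over ordered node pairs in the same community."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    q = 0.0
    for i in adj:
        for j in adj:
            if labels[i] != labels[j]:
                continue
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m


def modularity_by_community(adj, labels):
    """Community-wise form: sum of L(Vi,Vi)/L(V,V) - (L(Vi,V)/L(V,V))^2."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # equals L(V, V)
    q = 0.0
    for c in set(labels.values()):
        members = {i for i in adj if labels[i] == c}
        l_in = sum(1 for i in members for j in adj[i] if j in members)
        l_tot = sum(len(adj[i]) for i in members)
        q += l_in / two_m - (l_tot / two_m) ** 2
    return q


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in adj}

print(modularity_pairwise(adj, two_triangles))      # 5/14, about 0.357
print(modularity_by_community(adj, two_triangles))  # same value
print(modularity_pairwise(adj, one_block))          # 0.0 for a single community
```

The pairwise form costs quadratic time in the number of nodes, so the community-wise form is the more practical of the two when a fitness function is evaluated repeatedly.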

A class of methods aiming to maximize the modularity has been developed. However, the modularity suffers from the resolution limit: it contains an intrinsic scale which depends on the number of links in the network. Modules smaller than this scale cannot be detected exactly.

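*Modularity density* ($D$) was proposed by Li et al. in [18] to evaluate the partition of a network based on the concept of average modularity degree. It overcomes the resolution limit in community detection:

$$D = \sum_{i=1}^{m} \frac{L(V_i, V_i) - L(V_i, \overline{V_i})}{|V_i|},$$

where $\overline{V_i} = V \setminus V_i$, $L(V_i, \overline{V_i}) = \sum_{j \in V_i,\, k \in \overline{V_i}} A_{jk}$, and $|V_i|$ is the number of nodes in the $i$th community. $L(V_i, V_i)/|V_i|$ and $L(V_i, \overline{V_i})/|V_i|$ mean the average internal and external degrees of the $i$th community, respectively. $D$ tries to maximize the average internal degree and minimize the average external degree of the communities. $D$ is related to the density of subgraphs; it provides a way to overcome the problem that $Q$ is sensitive to the size of the network and the interconnections of modules. Thus, we could use $D$ to decide whether the networks are partitioned into correct communities. According to the definition of modularity density $D$, the larger the value of $D$ is, the more accurate a partition is. Then Li et al. improved $D$ to a general version $D_\lambda$ by introducing a parameter $\lambda$ that sets the proportion of average internal degree and external degree:

$$D_\lambda = \sum_{i=1}^{m} \frac{2\lambda L(V_i, V_i) - 2(1 - \lambda) L(V_i, \overline{V_i})}{|V_i|}.$$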

$D_\lambda$ is a convex combination of ratio cut and ratio association. It tries to maximize the density of links inside a community and minimize the density of links among different communities. When $\lambda = 1$, $D_\lambda$ is equal to ratio association; when $\lambda = 0$, $D_\lambda$ is equal to ratio cut; when $\lambda = 0.5$, $D_\lambda$ is equal to $D$. The network is decomposed into large communities when a small $\lambda$ is used; otherwise, small communities are obtained. Varying $\lambda$ therefore reveals more details and levels of the network. In this paper we use the discrete PSO algorithm to optimize $D_\lambda$ to detect community structure.
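The role of the parameter can be seen in a short sketch. It assumes the general modularity density from [18] has the form $D_\lambda = \sum_{i=1}^{m} \bigl(2\lambda L(V_i, V_i) - 2(1-\lambda) L(V_i, \overline{V_i})\bigr) / |V_i|$; the function name `modularity_density`, the `labels` dictionary, and the toy graph are illustrative assumptions rather than the authors' implementation.

```python
def modularity_density(adj, labels, lam=0.5):
    """General modularity density D_lambda of a partition (assumed form).

    adj maps each node to its neighbor list; labels maps node -> community.
    lam = 0.5 gives D, lam = 1 ratio association, lam = 0 ratio cut.
    """
    d = 0.0
    for c in set(labels.values()):
        members = {i for i in adj if labels[i] == c}
        # L(Vi, Vi): each internal edge is counted once per endpoint.
        l_in = sum(1 for i in members for j in adj[i] if j in members)
        # L(Vi, complement of Vi): links leaving the community.
        l_out = sum(1 for i in members for j in adj[i] if j not in members)
        d += (2 * lam * l_in - 2 * (1 - lam) * l_out) / len(members)
    return d


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in adj}

print(modularity_density(adj, two_triangles))  # 10/3
print(modularity_density(adj, one_block))      # 7/3, so the split scores higher
```

In a PSO setting, a function of this kind would serve as the fitness of a particle whose position is the label vector.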

#### 3. The Description of Proposed Algorithm

In this section, the proposed algorithm for community detection is described in detail. First, the objective function of the algorithm is given. Next, the impact of node is described. Then the particle swarm initialization and updating are presented. Finally, the framework of the proposed algorithm is elaborated, and the complexity analysis is presented.

##### 3.1. Objective Function

In this paper, we adopt $D_\lambda$ as the objective function because of its efficiency in detecting community structure, as described in detail in Section 2. To handle signed networks, we extend it to (8), which is consistent with the character of signed networks:

##### 3.2. The Impact of Node

In order to solve the community detection problem with discrete PSO, label propagation is introduced in [25]. The label of a node assigns the node to a community, and nodes which have the same label are considered to be in the same community. This label propagation strategy updates the label of the current node according to the number of nodes with the same label in its neighborhood. For example, consider a node $i$ whose neighbors form the set $N(i)$. To update the label of node $i$ with this label propagation, check all the labels in $N(i)$ to find the label which appears the most, and then assign node $i$ this label. In practice, however, the neighbors do not all contribute equally to the node we choose. In this subsection, a new definition called the “impact of node” is introduced, and a new label propagation based on the “impact of node” is proposed. Consider a node $i$ whose neighbors form the set $N(i)$; then the impact of a node $j \in N(i)$ on node $i$ can be defined as follows: where $ion$ is the impact of node, $k_j$ is the degree of node $j$, $l_j$ means the label of node $j$, and $n(l_j)$ refers to the number of neighbors of $i$ that have the same label as node $j$. Note that the impact of a node usually differs depending on whose neighborhood it is considered in. For signed networks, we consider only the positive links, because the number of negative links inside a community is usually smaller than the number of negative links between communities, and two nodes connected by a negative link are usually located in different communities.

The ion describes the effectiveness of the node in the network, and the value of the ion can show the connection level in the corresponding local network. The bigger the ion is, the tighter the connection will be. So, the ion can be used as a measure to detect which community the node belongs to.

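Suppose that the neighborhood $N(i)$ of node $i$ has the label set $\{l_j : j \in N(i)\}$; the new label propagation based on the impact of node is as follows:

$$l_i = l_{j^{*}}, \qquad j^{*} = \arg\max_{j \in N(i)} ion_j,$$

where $ion_j$ denotes the impact of neighbor $j$ on node $i$. It means that the label of the current node is decided by its neighbor which has the biggest impact of node.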
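The update rule can be sketched as follows. The exact formula for the ion is not reproduced here, so the sketch assumes a hypothetical form, the degree of the neighbor multiplied by the number of neighbors of the current node that share the neighbor's label, chosen only because it uses the two quantities named in the text. The function names, the tie-breaking rule (smallest node id), and the toy graph are also assumptions.

```python
def impact(j, i, adj, labels):
    """Hypothetical ion of neighbor j on node i (assumed product form):
    degree of j times the number of i's neighbors sharing j's label."""
    same_label = sum(1 for u in adj[i] if labels[u] == labels[j])
    return len(adj[j]) * same_label


def update_label(i, adj, labels):
    """Return the label of the neighbor with the largest impact on node i.

    Ties are broken in favor of the smallest node id (an assumption).
    """
    best = max(adj[i], key=lambda j: (impact(j, i, adj, labels), -j))
    return labels[best]


# Hypothetical toy graph: node 0 has neighbors 1 (degree 3), 2 and 3 (degree 2).
adj = {0: [1, 2, 3], 1: [0, 4, 5], 2: [0, 3], 3: [0, 2], 4: [1], 5: [1]}

# All neighbors carry distinct labels: the highest-degree neighbor wins.
unique = {node: node for node in adj}
print(update_label(0, adj, unique))

# Neighbors 2 and 3 share a label: together they outweigh node 1.
shared = {0: "z", 1: "x", 2: "y", 3: "y", 4: "x", 5: "x"}
print(update_label(0, adj, shared))
```

With distinct labels a plain majority vote would tie three ways, whereas the impact-based rule still returns a single label.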

We choose the karate network to illustrate the rationality of the impact of node. The karate network was compiled by Zachary after two years of observing a karate club. The club split into two groups because of a dispute between the administrator and the instructor of the club. Figure 1 shows the real community structure of the karate network; nodes with the same mark belong to the same community. Vertices 1 and 34 represent the instructor and the administrator, respectively. When deciding the label of a node, there are three cases to consider. In the first case, all the neighbors of the node have different labels, and the degrees of the neighbors decide the label of the current node. In the second case, some of the neighbors of the current node share the same label and have similar degrees, and the impact of these neighbors on the current node depends on the number of neighbors that share the same label. Otherwise, both factors (the degree of a neighbor and the number of neighbors sharing its label) play an important role in deciding the label of the current node. Take node 3 as an example; its neighbors are $\{1, 2, 4, 8, 9, 10, 14, 28, 29, 33\}$, and their degrees are $\{16, 9, 6, 4, 5, 2, 5, 4, 3, 12\}$. If all the labels of these neighbors are different, node 3 is assigned the same label as node 1, since the ion of node 1 is the biggest under our proposed strategy. If all the neighbors of node 3 have been assigned to the right communities, node 1 again has the highest impact on node 3, and node 3 is assigned to the same community as node 1. However, the label propagation method in [25] cannot decide the label of node 3 in either case.
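The node 3 example can be checked with a few lines of Python. The degrees and the two-faction split below are taken from the standard Zachary karate club data; the ion itself is again an assumed product form (neighbor degree times the count of neighbors sharing that neighbor's label), since the exact formula is not reproduced here, and the helper names are hypothetical.

```python
from collections import Counter

# Neighbors of node 3 in the karate network and their degrees.
degree = {1: 16, 2: 9, 4: 6, 8: 4, 9: 5, 10: 2, 14: 5, 28: 4, 29: 3, 33: 12}

# Ground-truth factions of those neighbors: "A" contains node 1, "B" node 34.
true_label = {1: "A", 2: "A", 4: "A", 8: "A", 14: "A",
              9: "B", 10: "B", 28: "B", 29: "B", 33: "B"}


def majority_labels(labels):
    """Labels tied for the most frequent among the neighbors (majority vote)."""
    counts = Counter(labels.values())
    top = max(counts.values())
    return sorted(label for label, c in counts.items() if c == top)


def most_impactful(labels):
    """Neighbor with the largest assumed ion: degree times same-label count."""
    counts = Counter(labels.values())
    return max(degree, key=lambda j: degree[j] * counts[labels[j]])


# Case 1: every neighbor has its own label, so the majority vote ties ten ways.
unique = {j: j for j in degree}
print(len(majority_labels(unique)), most_impactful(unique))

# Case 2: neighbors carry their true labels, so the vote ties five against five.
print(majority_labels(true_label), most_impactful(true_label))
```

In both cases the majority vote returns more than one candidate label, while the impact-based choice singles out node 1, in line with the discussion above.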