Abstract
This paper applies particle swarm optimization (PSO) to the community detection problem and proposes an algorithm based on a new label strategy. In contrast with other label propagation strategies, the main contribution of this paper is the definition of the impact of a node and its use in the search. Special initialization and update approaches based on this definition are designed to make full use of it. Experiments on synthetic and real-life networks show the effectiveness of the proposed strategy. Furthermore, the strategy is extended to signed networks, and the corresponding objective function, called modularity density, is modified for use in signed networks. Experiments on real-life signed networks also demonstrate that it is an efficacious way to solve the community detection problem.
1. Introduction
Complex networks have attracted increasing attention from researchers in different fields, such as physics, sociology, mathematics, and computer science [1]. The related theory has been widely applied in many areas, including the Internet [2], communication [3], biology [4], and economics [5]. Networks are usually composed of subgroups whose internal connections are dense and whose connections to the rest of the network are sparse. This property is called community structure. Detecting community structure is one of the fundamental issues in network studies, because it can reveal latent meaningful structure in networks. It is particularly important for commonly used networks, such as daily social networks, recommendation systems, and national power distribution networks [6].
Community detection is an NP-hard problem [7]. Traditional methods for detecting communities in networks fall into two categories: graph partitioning and hierarchical clustering. Graph partitioning has been widely used in computer science and related fields [8]; however, it requires the number and size of the communities to be known before partitioning. Hierarchical clustering methods, which include agglomerative and divisive algorithms, do not require the number or size of the communities [9, 10]; however, their results depend on the specific similarity measure adopted.
In recent years, much effort has been made to develop new approaches and algorithms to detect and quantify community structure in complex networks. Newman originally introduced modularity as a stopping criterion, and it has since become one of the most commonly used and best known quality functions. Many modularity-based methods have been proposed, including modularity optimization by simulated annealing [11], extremal optimization [12], and spectral optimization [13]. These methods provide an effective way to solve the community detection problem, and much research builds on them. Pizzuti models community detection as a single-objective optimization problem in [14] and as a multi-objective optimization problem in [15]. Arenas and Díaz-Guilera investigate the connection between the dynamics of synchronization and modularity in complex networks in [16]. However, Fortunato and Barthélemy showed in [17] that modularity suffers from a resolution limit: it has an intrinsic scale, and modules smaller than this scale may not be resolved, even in extreme cases where they are clearly defined.
Li et al. propose a quantitative measure called the modularity density, which uses the average modularity degree to evaluate the partition of a network in [18]. This method provides a mesoscopic way to describe the network structure and overcomes the resolution limit of modularity. The larger the value of modularity density is, the more accurate a partition is. Thus, the community detection can be viewed as an optimization problem of finding a partition of a network which has the maximum modularity density.
The particle swarm optimization (PSO) algorithm has been used successfully in many optimization fields [19–22]. It was originally proposed for continuous optimization problems; Kennedy and Eberhart later extended continuous PSO to discrete binary PSO in [23]. Chen et al. propose new definitions of the candidate solution and velocity for discrete PSO in [24]. Recently, Gong et al. used discrete PSO to solve the community detection problem in [25]. Compared with traditional optimization methods, discrete PSO is simple to implement and converges quickly. It requires neither the number nor the size of the communities to be known in advance. Moreover, it does not rely on the mathematical assumptions that conventional methods may need. These advantages make it a feasible and promising approach to community detection.
In this paper, a new algorithm based on discrete PSO is proposed to solve community detection problems by optimizing the modularity density function. We define the impact of a node, which takes neighborhood information into consideration, and propose new initialization and updating strategies based on it. Finally, we adapt the modularity density function to signed networks according to the characteristics of their communities, and extend the proposed strategies to signed networks.
The rest of this paper is organized as follows. Section 2 reviews community structure and modularity density. Section 3 describes the proposed method in detail. Section 4 presents the experimental results of the proposed algorithm in comparison with other approaches. Finally, Section 5 summarizes the conclusions.
2. Community Structure and Modularity Density
A network can be modeled as $G=(V,E)$, where $V$ and $E$ represent the vertices and edges, respectively. Assume that $A$ is the adjacency matrix of $G$: if there is a link between nodes $i$ and $j$, $A_{ij}=1$; otherwise $A_{ij}=0$. Suppose that $S$ is a subgraph of $G$ and $i$ is a node in $S$. Let $k_i$ be the degree of node $i$, with $k_i^{in}=\sum_{j\in S}A_{ij}$ and $k_i^{out}=\sum_{j\notin S}A_{ij}$. A community of a network usually has the following property:

$$\sum_{i\in S}k_i^{in} > \sum_{i\in S}k_i^{out}.$$
It means that the sum of all degrees within the community is larger than the sum of all degrees toward the rest of the network.
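If the links between nodes carry positive or negative signs, the network is called a signed network. A signed network can be modeled as $G=(V,PE,NE)$, where $V$ is the set of nodes and $PE$ and $NE$ are the sets of positive and negative links, respectively. Let $e_{ij}$ be the link between nodes $i$ and $j$; then the adjacency matrix is given by $A_{ij}=1$ if $e_{ij}\in PE$ and $A_{ij}=-1$ if $e_{ij}\in NE$ ($A_{ij}=0$ if there is no link). A community $S$ of a signed network usually has the following properties:

$$\sum_{i\in S}(k_i^{in})^{+} > \sum_{i\in S}(k_i^{out})^{+}, \qquad \sum_{i\in S}(k_i^{out})^{-} > \sum_{i\in S}(k_i^{in})^{-},$$

in which $(k_i^{in})^{+}=\sum_{j\in S,\,A_{ij}>0}A_{ij}$, $(k_i^{out})^{+}=\sum_{j\notin S,\,A_{ij}>0}A_{ij}$, $(k_i^{in})^{-}=\sum_{j\in S,\,A_{ij}<0}|A_{ij}|$, and $(k_i^{out})^{-}=\sum_{j\notin S,\,A_{ij}<0}|A_{ij}|$.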
Intuitively, consider the real-life signed networks SPP and GGS (see Section 4.5), drawn with solid and dashed lines for positive and negative links, respectively: positive links are dense inside each community, and negative links are dense between different communities.
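Both community conditions can be checked directly for a candidate node set $S$. The following Python sketch is a minimal illustration, assuming an unsigned graph stored as a dict of neighbour sets and a signed graph stored as a dict mapping each node to `{neighbour: +1 or -1}`; the function names are hypothetical.

```python
def is_weak_community(adj, S):
    """Unsigned property: sum of k_i^in over S exceeds sum of k_i^out."""
    S = set(S)
    k_in = sum(1 for i in S for j in adj[i] if j in S)
    k_out = sum(1 for i in S for j in adj[i] if j not in S)
    return k_in > k_out


def is_signed_community(wadj, S):
    """Signed properties: positive links stay inside S, negative links leave S."""
    S = set(S)
    pos_in = pos_out = neg_in = neg_out = 0
    for i in S:
        for j, w in wadj[i].items():
            if w > 0:
                if j in S:
                    pos_in += w
                else:
                    pos_out += w
            elif w < 0:
                if j in S:
                    neg_in += -w
                else:
                    neg_out += -w
    return pos_in > pos_out and neg_out > neg_in


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(is_weak_community(adj, {0, 1, 2}))  # True
print(is_weak_community(adj, {2, 3}))     # False
```

For a signed graph, the same two triangles with positive internal links and a negative link 2-3 satisfy both signed conditions for $S=\{0,1,2\}$.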
Community detection can be formulated as a modularity ($Q$) optimization problem. $Q$ was proposed by Newman in [13] to evaluate a partition of a network. Suppose that $G'$ is a graph with the same degree distribution as $G$ but with its edges placed at random. Modularity measures the fraction of edges that fall inside the modules of $G$ minus the expected fraction in $G'$ [26]. Since $Q$ evaluates the community structure of a network, the larger the value of $Q$, the clearer the community structure; the smaller the value, the more obscure the structure. $Q$ is defined as

$$Q=\frac{1}{2m}\sum_{i,j}\left(A_{ij}-\frac{k_ik_j}{2m}\right)\delta(i,j),$$

where $m$ is the number of edges in the network, and $\delta(i,j)=1$ if nodes $i$ and $j$ are in the same community and $\delta(i,j)=0$ otherwise. According to [18], $Q$ can also be written as

$$Q=\sum_{i=1}^{c}\left[\frac{L(V_i,V_i)}{L(V,V)}-\left(\frac{L(V_i,V)}{L(V,V)}\right)^{2}\right],$$

where $c$ is the number of communities, $V_i$ is the node set of the $i$th community, $L(V_1,V_2)=\sum_{i\in V_1,\,j\in V_2}A_{ij}$, and $\bar{V}_i=V-V_i$.
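As a worked check of the $L(\cdot,\cdot)$ form of $Q$, the sketch below computes it for an undirected, unweighted edge list without self-loops; `modularity` is a hypothetical helper, not a library function. For two triangles joined by one edge and split into the two triangles, each community has $L(V_i,V_i)=6$ and $L(V_i,V)=7$, with $L(V,V)=2m=14$, so $Q=2\,(6/14-(7/14)^2)=5/14\approx 0.357$.

```python
def modularity(edges, labels):
    """Q = sum_i [L(Vi,Vi)/L(V,V) - (L(Vi,V)/L(V,V))**2] for an undirected graph.

    edges:  list of (u, v) pairs, each undirected edge listed once
    labels: dict node -> community label
    """
    two_m = 2 * len(edges)  # L(V, V) = 2m
    internal = {}    # L(Vi, Vi): each internal edge counted twice (ordered pairs)
    degree_sum = {}  # L(Vi, V): total degree of community i
    for u, v in edges:
        for node in (u, v):
            c = labels[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 2
    return sum(internal.get(c, 0) / two_m - (d / two_m) ** 2
               for c, d in degree_sum.items())


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, labels), 4))  # 0.3571
```

Putting every node into one community gives $Q=0$, which matches the intuition that a single community carries no structural information.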
A class of methods that maximize modularity has been developed. However, modularity suffers from a resolution limit: it contains an intrinsic scale that depends on the total number of links in the network, and modules smaller than this scale cannot be detected reliably.
Modularity density ($D$) was proposed by Li et al. in [18] to evaluate a partition of a network based on the concept of average modularity degree, and it overcomes the resolution limit in community detection:

$$D=\sum_{i=1}^{c}\frac{L(V_i,V_i)-L(V_i,\bar{V}_i)}{|V_i|},$$

where $V_i$ is the node set of the $i$th community, $\bar{V}_i=V-V_i$, and $L(V_i,V_i)/|V_i|$ and $L(V_i,\bar{V}_i)/|V_i|$ are the average internal and external degrees of the $i$th community, respectively. $D$ tries to maximize the average internal degree and minimize the average external degree of the communities. Because $D$ is related to the density of subgraphs, it avoids the sensitivity of $Q$ to the size of the network and to the interconnections between modules, so it can be used to decide whether a network is partitioned into correct communities. According to the definition of $D$, the larger its value, the more accurate the partition. Li et al. further generalized $D$ by introducing a parameter $\lambda$ that weights the average internal degree against the average external degree:

$$D_\lambda=\sum_{i=1}^{c}\frac{2\lambda L(V_i,V_i)-2(1-\lambda)L(V_i,\bar{V}_i)}{|V_i|},\qquad 0\le\lambda\le 1.$$
$D_\lambda$ is a convex combination of ratio association and ratio cut. It tries to maximize the density of links inside communities and minimize the density of links between different communities. When $\lambda=1$, maximizing $D_\lambda$ is equivalent to maximizing the ratio association; when $\lambda=0$, it is equivalent to minimizing the ratio cut; and when $\lambda=0.5$, $D_\lambda$ is equal to $D$. Small values of $\lambda$ tend to decompose the network into large communities, whereas large values yield small communities, so the network can be examined at different levels of detail. In this paper, we use the discrete PSO algorithm to optimize $D_\lambda$ and detect community structure.
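To make the role of $\lambda$ concrete, the sketch below evaluates $D_\lambda$ for a partition, assuming a symmetric adjacency stored as a dict of neighbour sets; `modularity_density` is a hypothetical helper. Iterating over every node and its neighbours counts each internal edge twice, which matches $L(V_i,V_i)$ being a sum over ordered pairs. For two triangles joined by one edge and split into the triangles, $D=D_{0.5}=2\,(6-1)/3=10/3$.

```python
from collections import defaultdict


def modularity_density(adj, labels, lam=0.5):
    """D_lambda = sum_i [2*lam*L(Vi,Vi) - 2*(1-lam)*L(Vi,Vi_bar)] / |Vi|.

    adj:    dict node -> set of neighbours (undirected, symmetric)
    labels: dict node -> community label
    lam=0.5 gives the original modularity density D.
    """
    size = defaultdict(int)   # |Vi|
    l_in = defaultdict(int)   # L(Vi, Vi): ordered pairs, so each internal edge counts twice
    l_out = defaultdict(int)  # L(Vi, Vi_bar): links from Vi to the rest of the network
    for u, nbrs in adj.items():
        c = labels[u]
        size[c] += 1
        for v in nbrs:
            if labels[v] == c:
                l_in[c] += 1
            else:
                l_out[c] += 1
    return sum((2 * lam * l_in[c] - 2 * (1 - lam) * l_out[c]) / size[c]
               for c in size)


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
for lam in (0.3, 0.5, 0.8):
    print(lam, round(modularity_density(adj, split, lam), 4))
```

Raising $\lambda$ rewards internal links more heavily, which is why larger values favour splitting the network into more, smaller communities.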
3. The Description of the Proposed Algorithm
In this section, the proposed algorithm for community detection is described in detail. First, the objective function of the algorithm is given. Next, the impact of node is defined, and the particle swarm initialization and updating strategies are presented. Finally, the framework of the proposed algorithm is described and its complexity is analyzed.
3.1. Objective Function
In this paper, we adopt $D_\lambda$, described in detail in Section 2, as the objective function because of its effectiveness in detecting community structure. To handle signed networks, we extend it to the form given in (8), which is consistent with the characteristics of signed networks.
3.2. The Impact of Node
To solve the community detection problem with discrete PSO, label propagation is introduced in [25]. The label of a node assigns it to a community, and nodes with the same label are considered to be in the same community. This label propagation strategy updates the current node's label according to how many of its neighbors carry each label. For example, consider a node $v$ whose neighbors are $u_1,\dots,u_k$: to update the label of $v$, the labels of $u_1,\dots,u_k$ are checked to find the label that appears most often, and $v$ is assigned this label. In practice, however, the neighbors of a node do not contribute equally to it. In this subsection, a new definition, the "impact of node," is introduced, and a new label propagation strategy based on it is proposed. Consider a node $v$ whose neighbors are $u_1,\dots,u_k$. The impact $I(u_j,v)$ of node $u_j$ on node $v$, given by (9), combines two quantities: $k_{u_j}$, the degree of node $u_j$, and $n_v(l_{u_j})$, the number of neighbors of $v$ that have the same label $l_{u_j}$ as node $u_j$. Note that the impact of a given node usually differs from one neighborhood to another. For signed networks, we consider only the positive links, because negative links inside a community are usually fewer than negative links between communities, and two nodes connected by a negative link usually belong to different communities.
The impact of node describes the influence of a node in the network, and its value reflects how tightly the corresponding local network is connected: the larger the impact, the tighter the connection. Therefore, the impact of node can be used to decide which community a node belongs to.
Suppose that the neighborhood $N(v)$ of node $v$ has the label set $\{l_u : u\in N(v)\}$. The new label propagation based on the impact of node is

$$l_v = l_{u^{*}},\qquad u^{*}=\arg\max_{u\in N(v)} I(u,v). \qquad (10)$$

That is, the label of the current node is decided by the neighbor with the largest impact.
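Rule (10) can be sketched in a few lines of Python. Equation (9) itself is not reproduced in this text, so the sketch ASSUMES the product form $I(u,v)=k_u\cdot n_v(l_u)$, which uses exactly the two quantities named above; the functions `impact` and `impact_propagation` are hypothetical names, and for signed networks `adj` would hold positive links only.

```python
import random


def impact(adj, labels, u, v):
    """Impact of neighbour u on node v.

    ASSUMPTION: I(u, v) = k_u * n_v(l_u), i.e. the degree of u times the number
    of v's neighbours that carry u's label. This is a stand-in for equation (9).
    """
    n_same = sum(1 for w in adj[v] if labels[w] == labels[u])
    return len(adj[u]) * n_same


def impact_propagation(adj, labels, v, rng=random):
    """Rule (10): give v the label of its neighbour with the largest impact."""
    if not adj[v]:
        return labels[v]  # an isolated node keeps its own label
    scores = {u: impact(adj, labels, u, v) for u in adj[v]}
    best = max(scores.values())
    tied = sorted({labels[u] for u, s in scores.items() if s == best})
    return rng.choice(tied)  # ties between different labels are broken at random


# Node 0 has three neighbours with distinct labels; node 1 is the hub.
adj = {0: {1, 2, 3}, 1: {0, 4, 5, 6}, 2: {0}, 3: {0}, 4: {1}, 5: {1}, 6: {1}}
labels = {n: n * 10 for n in adj}
print(impact_propagation(adj, labels, 0))  # 10
```

A plain majority vote would see a three-way tie at node 0; the impact-based rule resolves it in favour of the high-degree neighbour, and it still prefers node 1 even when nodes 2 and 3 share a label.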
We use the karate network to illustrate the rationality of the impact of node. The karate network was collected by Zachary [31] after two years of observing a karate club, which split into two groups because of a dispute between the administrator and the instructor of the club. Figure 1 shows the real community structure of the karate network; nodes with the same mark belong to the same community. Vertices 1 and 34 represent the administrator and the instructor, respectively. When deciding the label of a node, three cases need to be considered. First, when all the neighbors of a node have different labels, their degrees decide the label of the current node. Second, when some neighbors of the current node share the same label and have similar degrees, their impact on the current node depends on the number of neighbors sharing that label. Otherwise, both factors ($k$ and $n$) play an important role in deciding the label of the current node. Take node 3 as an example: its neighbors are nodes 1, 2, 4, 8, 9, 10, 14, 28, 29, and 33, whose degrees are 16, 9, 6, 4, 5, 2, 5, 4, 3, and 12, respectively. If all the labels of these neighbors are different, node 3 is assigned the label of node 1, since node 1 has the largest impact under the proposed strategy. If all the neighbors of node 3 have been assigned to their correct communities, node 1 again has the highest impact on node 3, and node 3 is assigned to the same community as node 1. In both cases, the label propagation method in [25] cannot decide the label of node 3.
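The node-3 example can be reproduced from its neighbourhood alone. The sketch below compares plain majority voting with the impact rule, using the degrees of node 3's ten neighbours in Zachary's network (1: 16, 2: 9, 4: 6, 8: 4, 9: 5, 10: 2, 14: 5, 28: 4, 29: 3, 33: 12). It ASSUMES the product form $I(u,v)=k_u\cdot n_v(l_u)$ for (9), and for the second case it assumes a five-and-five split of the neighbours, $\{1,2,4,8,14\}$ versus $\{9,10,28,29,33\}$, which is the tie the text attributes to [25]. The function names are hypothetical.

```python
from collections import Counter

# Degrees of the ten neighbours of node 3 in Zachary's karate club network.
NEIGHBOUR_DEGREE = {1: 16, 2: 9, 4: 6, 8: 4, 9: 5, 10: 2,
                    14: 5, 28: 4, 29: 3, 33: 12}


def majority_winners(labels):
    """Labels tied for the most votes (plain label propagation)."""
    votes = Counter(labels.values())
    top = max(votes.values())
    return {label for label, n in votes.items() if n == top}


def impact_winners(labels):
    """Labels of the neighbours with maximal impact, ASSUMING impact = k_u * n_v(l_u)."""
    votes = Counter(labels.values())
    scores = {u: NEIGHBOUR_DEGREE[u] * votes[lab] for u, lab in labels.items()}
    top = max(scores.values())
    return {labels[u] for u, s in scores.items() if s == top}


# Case 1: every neighbour of node 3 carries a different label.
case1 = {u: u for u in NEIGHBOUR_DEGREE}
# Case 2 (assumed split): five neighbours on each side of the club.
case2 = {u: "A" if u in {1, 2, 4, 8, 14} else "B" for u in NEIGHBOUR_DEGREE}

print(len(majority_winners(case1)), impact_winners(case1))  # 10 {1}
print(sorted(majority_winners(case2)), impact_winners(case2))  # ['A', 'B'] {'A'}
```

Majority voting leaves a ten-way tie in the first case and a two-way tie in the second, whereas the impact rule selects node 1's label both times.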
3.3. Particle Swarm Initialization
In the PSO algorithm, a proper initialization scheme can generate a set of high-quality solutions and significantly reduce the search time. Traditional initialization methods generate solutions randomly, ignoring the adjacency matrix of the network, which usually provides important information for the optimization. To make use of this prior knowledge, we propose a new particle swarm initialization based on the impact of node. The procedure is as follows.
Step 1. Initialize every node with a unique label, $l_i=i$, $i=1,\dots,n$.
Step 2. Sort all the nodes in descending order of degree.
Step 3. To initialize the $k$th particle, select the node with the $k$th highest degree (denoted as node $v_k$), and find the neighbor of $v_k$ with the maximal degree (denoted as node $u_k$). Assign the same label to nodes $v_k$ and $u_k$ and to all the nodes connected with both of them.
Step 4. Repeat Step 3 for the next particle until all the nodes have been traversed or the termination criterion is met.
If the number of nodes is less than the number of particles, each remaining particle is initialized by randomly selecting two linked nodes and assigning them the same label. For signed networks, only the positive links are considered in Steps 2 and 3.
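The steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a dict of neighbour sets (positive links only for signed networks), reads the group merged in Step 3 as $v_k$, $u_k$ and their common neighbours (which reproduces the Figure 2 example), and uses the hypothetical name `init_swarm`.

```python
import random


def init_swarm(adj, pop_size, rng=random):
    """Degree-guided initialization following Steps 1-4 (a sketch).

    adj: dict node -> set of neighbours (positive links only for signed networks).
    ASSUMPTION: the group merged in Step 3 is v_k, u_k and their common
    neighbours, which reproduces the Figure 2 example.
    """
    order = sorted(adj, key=lambda x: len(adj[x]), reverse=True)  # Step 2
    linked = [x for x in adj if adj[x]]
    swarm = []
    for k in range(pop_size):
        labels = {x: x for x in adj}  # Step 1: unique label per node
        group = set()
        if k < len(order):
            v = order[k]  # Step 3: node with the k-th highest degree
            group = {v}
            if adj[v]:
                u = max(adj[v], key=lambda w: len(adj[w]))  # highest-degree neighbour
                group |= {u} | (adj[v] & adj[u])
        elif linked:
            # More particles than nodes: merge a randomly chosen linked pair.
            v = rng.choice(linked)
            group = {v, rng.choice(list(adj[v]))}
        for x in group:
            labels[x] = v
        swarm.append(labels)
    return swarm


adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 5}, 2: {0, 1}, 3: {0, 1}, 4: {0}, 5: {1}}
swarm = init_swarm(adj, pop_size=8, rng=random.Random(0))
print(swarm[0])  # {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 5}
```

As in the Figure 2 discussion, the pendant node 4 is not pulled into the group of its high-degree neighbour, because it is not connected to both $v_k$ and $u_k$.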
Figure 2 illustrates the proposed initialization method. According to the adjacency graph, node 6 has the greatest degree among all the nodes, so it is selected first. Node 10 has the largest degree among the neighbors of node 6. Thus, nodes 6, 7, 8, 9, 10, and 11 are assigned the same label. If we simply assigned the same label to the nodes connected to node 6 based on the label propagation in [25], nodes 4, 5, 6, 7, 8, 9, 10, and 11 would share the same label. Actually, nodes 4 and 5 should have the same label as node 3 according to the structure of the network. This example shows the advantage of the proposed initialization scheme.
3.4. Particle Status Updating
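Particle status updating is a key process in the PSO algorithm. It consists of velocity updating and position updating. In this paper, velocity updating uses the update rule in [25], and position updating depends on the updated velocity and the proposed impact of node. The velocity updating rule in discrete form is

$$V_i \leftarrow \omega V_i + c_1 r_1 (Pbest_i \oplus X_i) + c_2 r_2 (Gbest \oplus X_i),$$

where $X_i$ and $V_i$ are the position and velocity of the $i$th particle, $Pbest_i$ is its personal best position, $Gbest$ is the global best position, $\omega$ is the inertia weight, and $c_1$ and $c_2$ are the cognitive and social components, respectively. $r_1$ and $r_2$ are two random numbers in $[0,1]$. In this paper, the inertia weight $\omega$ is randomly generated within a fixed interval, and $c_1$ and $c_2$ are set to the typical value of 1.494 [25]. $\oplus$ is defined as an XOR operator: suppose that $X_1=(x_{11},\dots,x_{1n})$ and $X_2=(x_{21},\dots,x_{2n})$; then $X_1\oplus X_2=(h_1,\dots,h_n)$, where

$$h_j=\begin{cases}0, & x_{1j}=x_{2j},\\ 1, & x_{1j}\neq x_{2j}.\end{cases}$$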
The sigmoid function is defined as

$$\mathrm{sig}(x)=\frac{1}{1+e^{-x}}.$$
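A minimal sketch of the discrete velocity update is shown below, with positions stored as lists of labels. Several details are ASSUMPTIONS not fixed by the text: $\omega$ is drawn uniformly from $[0,1)$ because the paper's interval is not reproduced, $r_1$ and $r_2$ are drawn once per update, and each velocity bit is set to 1 with probability $\mathrm{sig}(\cdot)$ of its raw value, following the usual binary-PSO convention. `xor` and `update_velocity` are hypothetical names.

```python
import math
import random


def xor(a, b):
    """X1 (+) X2: 0 where the two label vectors agree, 1 where they differ."""
    return [0 if x == y else 1 for x, y in zip(a, b)]


def update_velocity(v, x, pbest, gbest, c1=1.494, c2=1.494, rng=random):
    """Discrete velocity update followed by a sigmoid-based binarization.

    ASSUMPTIONS: omega ~ U[0, 1), r1 and r2 drawn once per update, and bit j
    is set to 1 with probability sig(raw_j), as in standard binary PSO.
    """
    omega, r1, r2 = rng.random(), rng.random(), rng.random()
    dp, dg = xor(pbest, x), xor(gbest, x)
    new_v = []
    for vj, pj, gj in zip(v, dp, dg):
        raw = omega * vj + c1 * r1 * pj + c2 * r2 * gj
        prob = 1.0 / (1.0 + math.exp(-raw))  # sig(raw)
        new_v.append(1 if rng.random() < prob else 0)
    return new_v


x = [0, 0, 0, 3, 3, 3]
print(update_velocity([0] * 6, x, pbest=x, gbest=[0, 0, 0, 0, 3, 3],
                      rng=random.Random(42)))
```

Because every term of the raw velocity is non-negative here, each bit is set with probability at least 0.5, and positions that disagree with $Pbest_i$ or $Gbest$ are pushed even higher.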
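The position updating rule is defined in the following discrete form:

$$X_i \leftarrow X_i \otimes V_i,$$

where the new position is generated by "position $\otimes$ velocity." That is, given a position $X=(x_1,\dots,x_n)$ and a velocity $V=(v_1,\dots,v_n)$, the element $x'_j$ of the new position $X'=X\otimes V$ is defined as

$$x'_j=\begin{cases}x_j, & v_j=0,\\ l^{*}_j, & v_j=1,\end{cases}$$

where $l^{*}_j$ is calculated by the new label propagation in (10), which is based on the impact of node.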
The procedure of the new label propagation is as follows:
(1) Collect the labels of all the nodes connected with the current node (called neighborhood nodes below).
(2) Calculate the impact of these nodes according to (9).
(3) Find the label of the node with the greatest impact and assign it to the current node; if two or more labels tie, select one of them randomly and assign it to the current node.
Note that this update strategy is more effective in the later stage of community detection, where a node's neighbors often belong to different communities. However, we do not know in advance when the algorithm converges to a stable state on a given network. We therefore use the update strategy in [25] in odd-numbered generations because of its simplicity, and the proposed update strategy in even-numbered generations because of its effectiveness.
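The position update and the odd/even alternation can be sketched together. Nodes are indexed $0,\dots,n-1$, `adj` is a list of neighbour sets, and the velocity is a list of bits. The impact rule again ASSUMES the product form $k_u\cdot n_v(l_u)$ for (9), labels are read from the old position (a synchronous update), and all function names are hypothetical.

```python
import random
from collections import Counter


def majority_label(adj, x, j, rng):
    """Label propagation of [25]: the most frequent neighbour label."""
    votes = Counter(x[u] for u in adj[j])
    top = max(votes.values())
    return rng.choice(sorted(l for l, n in votes.items() if n == top))


def impact_label(adj, x, j, rng):
    """Impact rule; ASSUMES impact = degree(u) * votes for u's label."""
    votes = Counter(x[u] for u in adj[j])
    score = {u: len(adj[u]) * votes[x[u]] for u in adj[j]}
    top = max(score.values())
    return rng.choice(sorted({x[u] for u in adj[j] if score[u] == top}))


def update_position(adj, x, velocity, generation, rng=random):
    """X' = X (x) V: only nodes whose velocity bit is 1 are relabelled.

    Odd generations use the rule of [25]; even generations use the impact rule.
    Labels are read from the old position x, i.e. the update is synchronous.
    """
    rule = impact_label if generation % 2 == 0 else majority_label
    return [rule(adj, x, j, rng) if bit == 1 and adj[j] else x[j]
            for j, bit in enumerate(velocity)]


adj = [{1, 2, 3}, {0, 4, 5, 6}, {0}, {0}, {1}, {1}, {1}]
x = [0, 10, 20, 30, 40, 50, 60]
print(update_position(adj, x, [1, 0, 0, 0, 0, 0, 0], generation=2))
# [10, 10, 20, 30, 40, 50, 60]
```

In an odd generation the same move falls back to majority voting, which sees a three-way tie at node 0 and picks one of the tied labels at random.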
3.5. Framework of the Proposed Algorithm
After illustrating all the corresponding strategies in detail, the framework of the proposed algorithm is presented in Algorithm 1.

3.6. Complexity Analysis
The main complexity of the proposed algorithm lies in the iterative process. Suppose that the tested network has $n$ nodes and $m$ edges. In Algorithm 1, Step can be finished in linear time. The time complexity of Steps and is and in the worst case, respectively. Step can be accomplished in time; Steps and need operation. So, the total time complexity of the proposed algorithm is , which can be simplified as .
4. Experiment Discussion
In this section, detailed experiments are carried out to test the effectiveness of the proposed algorithm against five other algorithms: CNM, FTQ, Infomap, and two nature-inspired algorithms, GAnet and the label propagation algorithm of Gong et al. [25].
CNM was presented by Clauset et al. in [10]; it is essentially a fast implementation of Newman's greedy modularity maximization. GAnet is a single-objective genetic algorithm proposed by Pizzuti in [14]; it optimizes a simple but efficacious fitness function, called community score, to identify densely connected groups. The fine-tuned modularity density algorithm (denoted as FTQ), proposed by Chen et al., detects community structure by splitting and merging the network [6]. Infomap was introduced by Rosvall and Bergstrom in [27]; it uses an information-theoretic approach to reveal community structure in weighted and directed networks. In [25], MOEA/D and PSO are combined into a multi-objective algorithm to detect community structure. Since the objective function adopted here is single-objective, only the label propagation and particle updating strategies of [25] are used, combined with the modularity density function (denoted as DPSO in the following), for comparison with the proposed algorithm.
4.1. Experimental Settings
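In this paper, we use modularity $Q$ and normalized mutual information (NMI) as the evaluation measures when the ground truth of a network is known; otherwise, only modularity is adopted. For signed networks, the signed modularity $Q_s$ of [28] is adopted to evaluate performance:

$$Q_s=\frac{1}{2w^{+}+2w^{-}}\sum_{i,j}\left[w_{ij}-\left(\frac{w_i^{+}w_j^{+}}{2w^{+}}-\frac{w_i^{-}w_j^{-}}{2w^{-}}\right)\right]\delta(i,j),$$

where $w_i^{+}$ and $w_i^{-}$ are the sums of all positive and negative weights of node $i$, respectively, $2w^{+}=\sum_i w_i^{+}$, $2w^{-}=\sum_i w_i^{-}$, $w_{ij}$ is the weight of the adjacency matrix of the signed network, and $\delta(i,j)=1$ if nodes $i$ and $j$ are in the same community and 0 otherwise.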
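A minimal sketch of $Q_s$ follows, assuming an undirected signed edge list with each edge listed once and $w_i^{-}$ stored as a positive magnitude; `signed_modularity` is a hypothetical helper. It uses the identity $\sum_{i,j}w_i^{+}w_j^{+}\delta(i,j)=\sum_{c}\big(\sum_{i\in c}w_i^{+}\big)^2$ to avoid the double loop over node pairs. For two positive triangles joined by one negative link and split into the triangles, $Q_s=(12-72/12+2/2)/14=0.5$.

```python
from collections import defaultdict


def signed_modularity(edges, labels):
    """Q_s for an undirected signed, weighted graph.

    edges:  list of (i, j, w) with i != j and w != 0, each edge listed once
    labels: dict node -> community label
    """
    pos = defaultdict(float)  # w_i^+
    neg = defaultdict(float)  # w_i^- (stored as a positive magnitude)
    inside = 0.0              # sum of w_ij over ordered pairs in the same community
    for i, j, w in edges:
        for node in (i, j):
            if w > 0:
                pos[node] += w
            else:
                neg[node] -= w
        if labels[i] == labels[j]:
            inside += 2 * w   # counts both (i, j) and (j, i)
    two_wp, two_wn = sum(pos.values()), sum(neg.values())  # 2w^+, 2w^-
    comm_pos, comm_neg = defaultdict(float), defaultdict(float)
    for node, s in pos.items():
        comm_pos[labels[node]] += s
    for node, s in neg.items():
        comm_neg[labels[node]] += s
    # sum_{i,j} w_i^+ w_j^+ delta(i, j) = sum over communities of (sum of w_i^+)^2
    expected_pos = sum(s * s for s in comm_pos.values()) / two_wp if two_wp else 0.0
    expected_neg = sum(s * s for s in comm_neg.values()) / two_wn if two_wn else 0.0
    return (inside - expected_pos + expected_neg) / (two_wp + two_wn)


edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, -1)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(signed_modularity(edges, split))  # 0.5
```

Merging all six nodes into one community drops $Q_s$ to 0, since the negative link is then counted as internal.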
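The normalized mutual information (NMI) described in [29] estimates the similarity between two community detection results. Assume that $A$ and $B$ are two partitions of a network and $C$ is their confusion matrix, where $C_{ij}$ is the number of nodes in community $i$ of $A$ that are also in community $j$ of $B$. $NMI(A,B)$ measures the similarity between $A$ and $B$: the larger the value of NMI, the more similar the partitions. If $A=B$, $NMI(A,B)=1$; if $A$ and $B$ are completely different, $NMI(A,B)=0$. NMI is calculated as

$$NMI(A,B)=\frac{-2\sum_{i=1}^{c_A}\sum_{j=1}^{c_B}C_{ij}\log\left(\frac{C_{ij}N}{C_{i\cdot}C_{\cdot j}}\right)}{\sum_{i=1}^{c_A}C_{i\cdot}\log\left(\frac{C_{i\cdot}}{N}\right)+\sum_{j=1}^{c_B}C_{\cdot j}\log\left(\frac{C_{\cdot j}}{N}\right)},$$

where $N$ is the number of nodes, $c_A$ ($c_B$) is the number of communities in partition $A$ ($B$), and $C_{i\cdot}$ ($C_{\cdot j}$) is the sum of the elements of $C$ in row $i$ (column $j$).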
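The NMI formula translates directly into code once the confusion matrix is built by counting label pairs. The sketch below is a minimal illustration, assuming two label sequences over the same node order; `nmi` is a hypothetical name, only non-zero confusion entries are summed (the usual $0\log 0=0$ convention), and the degenerate case where both partitions are a single community is treated as identical.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two labelings of the same nodes.

    a, b: sequences of community labels; a[k] and b[k] refer to node k.
    Only non-zero confusion-matrix entries are summed (0 * log 0 is taken as 0).
    """
    n = len(a)
    conf = Counter(zip(a, b))  # C_ij
    rows = Counter(a)          # C_i.
    cols = Counter(b)          # C_.j
    num = -2.0 * sum(c * math.log(c * n / (rows[i] * cols[j]))
                     for (i, j), c in conf.items())
    den = (sum(r * math.log(r / n) for r in rows.values())
           + sum(c * math.log(c / n) for c in cols.values()))
    if den == 0:  # both partitions put every node into a single community
        return 1.0
    return num / den


truth = [0, 0, 0, 1, 1, 1]
print(round(nmi(truth, [7, 7, 7, 4, 4, 4]), 6))  # 1.0
print(round(nmi(truth, [0, 0, 0, 0, 0, 0]), 6))  # 0.0
```

NMI depends only on which nodes are grouped together, so renaming the communities does not change the score.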
In this paper, the parameter $\lambda$ in the objective function increases from 0.3 to 0.8 with an interval of 0.1, and the parameter $r$ of GAnet ranges from 1 to 1.5 with an interval of 0.1. For each value of $\lambda$ or $r$, each algorithm is run 30 independent times on each test problem, with the population size (pop) set to 100 and the maximum number of generations (maxgen) set to 100. For each network, the best result among all runs is reported in the following experiments.
4.2. Experiments on GN Extended Benchmark Networks
The GN extended benchmark network was proposed by Lancichinetti et al. in [30] as an extension of the classical GN benchmark proposed by Girvan and Newman in [4]. It consists of 128 nodes divided into four communities of 32 nodes each. The average node degree is 16, and the mixing parameter $\mu$ controls the fraction of each node's links that connect to nodes in other communities. When $\mu$ is small, the network has a strong community structure; as $\mu$ increases, the community structure becomes vague and harder to detect.
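For readers who want to experiment, the sketch below generates a GN-style test network under a simplifying ASSUMPTION: every node pair is linked independently, with probabilities chosen so that the expected degree is 16 and the expected fraction of links leaving a node's community is $\mu$. This planted-partition approximation is not the exact generator of [30], and `gn_benchmark` is a hypothetical name.

```python
import random


def gn_benchmark(mu, n_groups=4, group_size=32, avg_degree=16, seed=None):
    """Planted-partition sketch of the GN extended benchmark.

    ASSUMPTION: every pair is linked independently, with probabilities chosen so
    that the expected degree is avg_degree and the expected fraction of a node's
    links leaving its community is mu. The generator in [30] may differ.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    p_in = (1 - mu) * avg_degree / (group_size - 1)  # 31 possible partners inside
    p_out = mu * avg_degree / (n - group_size)       # 96 possible partners outside
    truth = [v // group_size for v in range(n)]      # ground-truth community of v
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if truth[u] == truth[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, truth


edges, truth = gn_benchmark(mu=0.3, seed=1)
mixing = sum(truth[u] != truth[v] for u, v in edges) / len(edges)
print(len(edges), round(mixing, 3))
```

The realized edge count and mixing fraction fluctuate around 1024 and $\mu$ from seed to seed; a generator that fixes degrees exactly would remove this noise.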
Experiments on GN extended benchmark networks are conducted to test the performance of our algorithm. CNM, FTQ, GAnet, Infomap, DPSO, and the proposed algorithm are tested on 10 GN extended networks with the mixing parameter $\mu$ increasing from 0.05 to 0.5 with an interval of 0.05. Figure 3 shows the average maximum NMI and $Q$ values obtained by the different algorithms.
Figure 3: (a) NMI and (b) $Q$ versus the mixing parameter on GN extended benchmark networks.
As shown in Figure 3(a), when the mixing parameter is small, Infomap, FTQ, CNM, GAnet, DPSO, and the proposed algorithm all find the true partition (NMI equals 1). As the mixing parameter increases, the community structure becomes fuzzy and the true structure becomes harder to detect. Infomap and DPSO are the first to show weakness, with their detection ability decreasing rapidly; the detection ability of CNM then decreases, and at larger mixing parameters GAnet and FTQ also show their limitations. Compared with them, the proposed algorithm shows its superiority. As seen from Figure 3(b), the same conclusion can be drawn from $Q$.
More detailed experiments on the GN extended benchmark networks further illustrate the performance of the proposed algorithm. In our objective function, $\lambda$ is a tuning parameter; generally, the larger $\lambda$ is, the more communities are detected.
Figure 4 shows how NMI changes with the mixing parameter for different values of $\lambda$. DPSO obtains almost the same NMI values for all $\lambda$ from 0.3 to 0.8. In contrast, with a suitable $\lambda$, the proposed algorithm keeps a strong detection ability as the community structure becomes more and more obscure. These figures demonstrate the superiority of the proposed algorithm. In our view, the designed definition takes the topology of the community structure into consideration, which allows the algorithm, with a suitable value of the tuning parameter in the objective function, to detect more obscure structure than the other algorithms.
Figure 4: (a) to (f) NMI versus the mixing parameter for different values of $\lambda$.
To discuss the convergence of the nature-inspired algorithms (the proposed algorithm, DPSO, and GAnet), we use two GN extended benchmark networks with different mixing parameters. As shown in Figures 5(a) and 5(c), the number of communities obtained by all three algorithms converges to a stable value on both networks, from about the 20th iteration onward. Meanwhile, the node labels keep changing to obtain better results, and for the proposed algorithm NMI finally converges to 1 or nearly 1 at about the 30th iteration, as shown in Figures 5(b) and 5(d). For DPSO, NMI values are quite low (close to 0) because it cannot detect the correct number of communities. GAnet obtains a satisfactory result on the network with the smaller mixing parameter but converges slowly, and when the mixing parameter increases to 0.4, it fails to detect the community structure exactly. Figure 5 shows that the proposed algorithm significantly outperforms the other algorithms and can effectively detect the real structure of a network.
Figure 5: number of communities (a, c) and NMI (b, d) versus iteration on the two GN extended benchmark networks.
4.3. Experiments on LFR Benchmark Network
In the GN extended benchmark network, all communities have exactly the same size and the network is small, so it cannot reflect some important features of real networks. For this reason, Lancichinetti et al. proposed the LFR benchmark networks in [30], in which the parameters $\gamma$ and $\beta$ tune the distributions of node degree and community size, respectively. Each node shares a fraction $1-\mu$ of its links with other nodes in the same community and a fraction $\mu$ with nodes in other communities.
In this paper, we use 16 LFR benchmark networks whose mixing parameter $\mu$ increases from 0.05 to 0.8 with an interval of 0.05. Each network consists of 1000 nodes, with community sizes ranging from 10 to 50. The average node degree is 20, and the maximum node degree is 50.
As shown in Figure 6, the proposed algorithm and DPSO perform better than the other four algorithms on the LFR networks, which we attribute to the modularity density objective function they share. Moreover, as the mixing parameter increases, the detection ability of the proposed algorithm becomes stronger than that of DPSO, which indicates that our initialization and updating schemes reveal community structure more exactly than the label propagation in DPSO. GAnet performs the worst of the six algorithms at small mixing parameters, but it performs the best at the largest mixing parameters because it uses a different encoding scheme, whereas DPSO and the proposed algorithm tend to treat all the nodes as one community when the structure is too obscure. The experiment on LFR networks also shows the strong detection ability of the proposed algorithm.
Figure 6: (a) and (b) results on the LFR benchmark networks.
4.4. Experiments on Real-Life Networks
We apply our algorithm to six well-known real-life networks: Zachary's Karate Club network [31], the Dolphin social network [32], the American College Football network, Krebs' Books on US Politics network, the Santa Fe Institute (SFI) network [4], and the Netscience network [33]. The characteristics of the networks are shown in Table 1.
Table 2 shows the comparison results on the six real-life networks. On the Zachary's Karate Club, Dolphin, and American College Football networks, the proposed algorithm performs better than the other five algorithms. On Krebs' Books on US Politics network, the performance of the proposed algorithm is slightly worse than that of FTQ and Infomap. On the two real-life networks without ground-truth labels, the SFI network and the Netscience network, the proposed algorithm also outperforms the other algorithms.
To analyze the time cost of the proposed algorithm, Figure 7 compares the running times of the three nature-inspired algorithms. DPSO and the proposed algorithm have comparable time costs, whereas GAnet is the most time-consuming because its decoding step has a higher time complexity than the corresponding steps of the others.
We further analyze the effect of the proposed definition (the impact of node) and the adopted objective function on real-life networks. Figure 8 shows the results obtained with different initialization and updating strategies; panels (a) to (d) represent, respectively, the proposed initialization with the proposed updating strategy, the proposed initialization with the updating strategy in [25], the initialization in [25] with the proposed updating strategy, and the initialization and updating strategies in [25]. Comparing (a) with (b) and (c) with (d) clearly shows that the proposed updating strategy performs better than that of [25]. Comparing (a) with (c) and (b) with (d) shows a slight advantage for the proposed initialization strategy, which indicates that it generates a set of solutions of better quality. The figure also indicates that the proposed algorithm converges faster than the other strategy combinations.
Figure 8: (a) to (d) results of the four combinations of initialization and updating strategies.
To analyze the benefit of the adopted modularity density objective function, Figure 9 compares two objective functions: modularity density and community score (adopted by GAnet). It shows that, with the designed initialization and updating strategies, the modularity density function is more effective than the community score function.
Figure 10 displays the detection results of the proposed algorithm on Zachary's Karate Club network. As shown in this figure, the proposed algorithm finds the true partition of the network for one value of $\lambda$. For another value of $\lambda$, our algorithm detects four communities, which is still reasonable since the original communities are further divided into subcommunities.
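Figure 10: (a) and (b) partitions of Zachary's Karate Club network obtained with two different values of $\lambda$.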
4.5. Experiments on Signed Networks
In this subsection, we apply the proposed algorithm to three real-world signed networks: the illustrative signed network [34], the Slovene Parliamentary Party (SPP) network [35], and the Gahuku-Gama Subtribes (GGS) network [36].
Only DPSO and the proposed algorithm are tested on these three signed networks, because the objective functions of the other algorithms cannot handle signed networks. The best average statistical results are shown in Table 3.
As shown in Table 3, both DPSO and the proposed algorithm successfully detect the community structure of the SPP and GGS networks, but on the illustrative network, the proposed algorithm performs better than DPSO.
The SPP network consists of 10 Slovene parliamentary parties, whose relations were set up by a group of experts on parliamentary activities in 1994. Figure 11 shows the community structure of the SPP network found by our algorithm, which divides the network into two communities.
The GGS network consists of 16 nodes representing 16 Gahuku-Gama subtribes involved in warfare in a particular area. Figure 12 displays the detection results of the proposed algorithm, which finds three communities.
The illustrative signed network consists of 28 nodes and is divided into three communities. The community structure detected by the proposed algorithm depends on the value of $\lambda$: the network is divided into three communities with one value and four communities with another. The corresponding results are shown in Figure 13.
Figure 13: (a) and (b) partitions of the illustrative signed network for two values of $\lambda$.
In summary, the results on synthetic and real-life networks show that the proposed algorithm can deal with community detection problems effectively.
5. Conclusion
Based on the detailed description and experimental analysis above, the following conclusions can be drawn.
Firstly, the impact of node is defined and a new label propagation strategy based on it is designed. Experiments on synthetic and real-life networks demonstrate the effectiveness of this definition.
Secondly, special initialization and updating strategies based on the impact of node are designed. With these strategies, community structure is detected more accurately than with the compared methods.
Thirdly, we adapt the modularity density function to signed networks according to their characteristics. Combined with the proposed initialization and updating strategies, this objective function detects the community structure of signed networks accurately.
Additionally, the proposed strategy can easily be applied to other graph-related fields. This paper adopts a single-objective function, modularity density, to detect community structure; the approach can easily be extended to multi-objective optimization, where we believe it would also show its advantages. Moreover, applying community detection to our radar communication networks to improve cooperation quality is part of our future work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.