The Scientific World Journal

Special Issue: Recent Advances in Information Technology

Research Article | Open Access

Volume 2014 | Article ID 121609 | https://doi.org/10.1155/2014/121609

Zhixiao Wang, Ya Zhao, Zhaotong Chen, Qiang Niu, "An Improved Topology-Potential-Based Community Detection Algorithm for Complex Network", The Scientific World Journal, vol. 2014, Article ID 121609, 7 pages, 2014. https://doi.org/10.1155/2014/121609

An Improved Topology-Potential-Based Community Detection Algorithm for Complex Network

Academic Editor: F. Yu
Received 27 Aug 2013
Accepted 17 Nov 2013
Published 29 Jan 2014

Abstract

Topology potential theory is a new community detection theory for complex networks, which divides a network into communities by spreading outward from each local maximum potential node. At present, almost all topology-potential-based community detection methods ignore the differences between nodes and assume that all nodes have the same mass. This assumption makes the topology potential calculation inaccurate and, in turn, reduces the precision of community detection. Inspired by the PageRank algorithm, this paper puts forward a novel mass calculation method for complex network nodes. A node’s mass obtained by our method effectively reflects its importance and influence in the network: the more important the node is, the larger its mass. Simulation results show that, after node mass is taken into consideration, the topology potential of each node is more accurate, the distribution of topology potential is more reasonable, and the community detection results are more precise.

1. Introduction

Most complex networks show community structure, that is, groups of vertices with a higher density of edges within them and a lower density of edges between groups [1]. Identifying community structure is crucial for understanding the structural and functional properties of complex networks [2]. Many works inspired by different paradigms are devoted to the development of community detection [3]. Recently, topology potential theory was introduced to the complex network area for community detection [4]. Because of its inherent advantages, such as low time complexity and good performance, this novel theory has attracted considerable attention [5–9].

Gan et al. [4] used topology potential theory to describe the interaction and association among complex network nodes and put forward a community detection algorithm based on topology potential. The community structure is uncovered by detecting all local high potential areas bounded by low potential nodes.

Han et al. [5] proposed an overlapping community detection algorithm based on topology potential. A complex network is divided into separate communities by spreading outward from each local maximum potential node. The algorithm holds that different nodes play different roles in a complex network, such as seed node, overlapping node, and isolated node. These roles are identified during the spreading process.

Zhang et al. [6] proposed a variable scale network overlapping community identification method based on topology potential. This method defines an identity uncertainty measure to identify overlapping nodes and uses a parameter to control community scale.

Topology potential calculation is the foundation and key step of the above topology-potential-based community detection methods. Consider a network $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes, $n$ is the total number of nodes, $E$ is the set of edges, and $|E|$ is the total number of edges. The topology potential of any node $v_i \in V$ can be computed as follows:

$$\varphi(v_i) = \sum_{j=1}^{n} m(v_j) \, e^{-\left(d_{ij}/\sigma\right)^{2}}, \tag{1}$$

where $\varphi(v_i)$ is the topology potential of node $v_i$; $d_{ij}$ is the distance between node $v_i$ and node $v_j$; $m(v_j)$ is the mass of node $v_j$; and $\sigma$ is the impact factor, which controls the number of hops over which a node exerts influence. The optimal impact factor can be obtained by using the method described in [4].
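The calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it assumes the Gaussian form in which each node j contributes its mass times exp(-(d_ij/σ)²), takes the distance to be the hop count found by breadth-first search, includes the node's own term (distance 0), and lets unreachable nodes contribute nothing. The function names and the toy graph are hypothetical.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topology_potential(adj, sigma, mass=None):
    """Potential of node i: sum over reachable j of mass[j] * exp(-(d_ij / sigma)^2)."""
    if mass is None:
        # Equal-mass assumption used by the earlier methods (here: mass 1).
        mass = {v: 1.0 for v in adj}
    potential = {}
    for i in adj:
        dist = hop_distances(adj, i)
        potential[i] = sum(
            mass[j] * math.exp(-((d / sigma) ** 2)) for j, d in dist.items()
        )
    return potential


# Hypothetical toy graph: a path a - b - c, stored as an adjacency list.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
phi = topology_potential(toy, sigma=1.0)
```

In this toy graph the middle node `b` has the largest potential because both other nodes are one hop away. Passing a `mass` dictionary replaces the equal-mass assumption with per-node masses.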

In formula (1), the node mass is an important parameter, which directly affects the value of the topology potential. However, almost all the above topology-potential-based community detection methods ignore the differences between nodes and assume that all nodes have the same mass. This assumption is debatable, for the following reasons.

On one hand, a node’s mass reflects its inherent properties, such as importance and influence, and different nodes have different inherent properties. For example, in a social network, the importance of different people differs significantly, and public figures obviously have more influence than ordinary people.

On the other hand, (1) shows that topology potential depends on the distance and the mass (the impact factor is a constant). If all nodes are assumed to have the same mass, the calculated topology potential will deviate from the actual value, and this deviation may affect the precision of community detection.

To solve the above problems, this paper puts forward a mass calculation method for complex network nodes, inspired by the PageRank algorithm [10]. Node mass calculated by this method effectively reflects the importance and influence of nodes in a complex network: the more important a node is, the larger its mass. Simulation results show that, after node mass is taken into consideration, the topology potential of each node is more accurate, the distribution of topology potential is more reasonable, and the community detection results are more precise.

This paper is organized as follows: Section 2 describes the node mass calculation method; Section 3 analyzes the influence of node mass on topology-potential-based community detection; and Section 4 concludes the paper.

2. Node Mass Calculation

A physical particle has an inherent mass, but how should the mass of a network node be measured? A node’s mass should reflect its importance and influence in the complex network: the more important a node is, the larger its mass should be. Inspired by the PageRank algorithm, this paper puts forward a mass calculation method for complex network nodes.

The PageRank algorithm has been successfully used by Google to evaluate the importance of web pages. Each web page is assigned a PR value that reflects its importance. The algorithm holds that the PR value of a web page can be measured by the number and importance of the web pages linking to it. Generally speaking, the more web pages link to a page, the more important it is. The contributions of the linking pages differ: the more important a linking page is, the more it contributes.

Similarly, the importance of a network node can be measured by the number and importance of its neighbor nodes. The more neighbor nodes the node has, the more important it is. The more important its neighbors themselves are, the more important the node is.

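Definition 1. Consider a network $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes, $n$ is the total number of nodes, $E$ is the set of edges, and $|E|$ is the total number of edges. The mass of any node $v_i \in V$ is defined as follows:

$$m(v_i) = (1 - d) + d \sum_{v_j \in N(v_i)} \frac{m(v_j)}{\deg(v_j)},$$

where $m(v_i)$ is the mass of node $v_i$; $N(v_i)$ is the set of neighbors of node $v_i$; $m(v_j)$ is the mass of node $v_j$; $\deg(v_j)$ is the degree of node $v_j$; and $d$ is the damping factor, $0 \le d \le 1$.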
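One way to evaluate Definition 1 is by fixed-point iteration, as is commonly done for PageRank. The sketch below assumes the PageRank-style update in which a node's mass is (1 - damping) plus damping times the sum, over its neighbors, of each neighbor's mass divided by that neighbor's degree. The starting value of 1 for every node, the tolerance, the iteration cap, and the star-shaped example graph are all hypothetical choices made for illustration.

```python
def node_mass(adj, damping, tol=1e-10, max_iter=1000):
    """Iterate mass[v] = (1 - damping) + damping * sum(mass[u] / deg(u)) over neighbors u."""
    mass = {v: 1.0 for v in adj}  # assumed starting point: every node has mass 1
    for _ in range(max_iter):
        new_mass = {
            v: (1.0 - damping)
            + damping * sum(mass[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
        delta = max(abs(new_mass[v] - mass[v]) for v in adj)
        mass = new_mass
        if delta < tol:
            break
    # Note: with damping = 1 on a bipartite graph this simple iteration can
    # oscillate instead of converging; max_iter then bounds the run time.
    return mass


# Hypothetical example: a hub connected to three leaves.
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
masses = node_mass(star, damping=0.5)
print(masses)
```

For this star graph the hub ends up with a mass above 1 and each leaf with a mass below 1, while the masses still sum to the number of nodes. With a damping factor of 0, every node keeps mass 1.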

Definition 1 shows that the value of the damping factor influences the distribution of node mass. The PageRank algorithm sets the damping factor to 0.85 based on a large number of experiments and experience. Likewise, a suitable damping factor is needed for node mass calculation.

This paper uses a representative social network, the Zachary network, to analyze the relationship between the damping factor and node mass. The Zachary network is a karate club network with 34 members. The club eventually split into two communities because of a conflict between its chairman and its coach. Table 1 shows the mass of nodes 1 to 7 for different damping factors.


Table 1: Mass of nodes 1 to 7 in the Zachary network for different damping factors.

| Damping factor | Node 1 | Node 2 | Node 3 | Node 4 | Node 5 | Node 6 | Node 7 |
|---|---|---|---|---|---|---|---|
| 0.00 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 0.10 | 1.395255 | 1.126644 | 1.112363 | 1.021542 | 0.966275 | 1.013809 | 1.013809 |
| 0.20 | 1.747079 | 1.237061 | 1.219573 | 1.041998 | 0.935492 | 1.025750 | 1.025750 |
| 0.30 | 2.061667 | 1.335101 | 1.323713 | 1.062184 | 0.907004 | 1.035298 | 1.035298 |
| 0.40 | 2.343918 | 1.424051 | 1.426751 | 1.082954 | 0.880128 | 1.041800 | 1.041800 |
| 0.50 | 2.597738 | 1.506913 | 1.530693 | 1.105297 | 0.854066 | 1.044337 | 1.044337 |
| 0.60 | 2.826218 | 1.586712 | 1.637788 | 1.130467 | 0.827759 | 1.041493 | 1.041493 |
| 0.70 | 3.031637 | 1.666869 | 1.750830 | 1.160193 | 0.799622 | 1.030909 | 1.030909 |
| 0.80 | 3.215021 | 1.751764 | 1.873724 | 1.197029 | 0.766916 | 1.008239 | 1.008239 |
| 0.90 | 3.374151 | 1.847568 | 2.012650 | 1.244893 | 0.724111 | 0.964480 | 0.964480 |
| 1.00 | 3.495605 | 1.963056 | 2.178759 | 1.309709 | 0.658029 | 0.878151 | 0.878151 |

As can be seen from Table 1, when the damping factor is 0, the masses of the seven nodes are all 1, which means that there is no difference between these nodes. As the damping factor increases, the mass difference between nodes gradually becomes apparent. When the damping factor reaches 1, the mass difference reaches its maximum.

Figure 1 shows the gap between the maximum and minimum node mass in the Zachary network for different damping factors. The gap grows as the damping factor increases, and the trend is almost linear. To preserve the mass difference between nodes and highlight important nodes while avoiding extreme mass differences, this paper selects as the optimal damping factor the value corresponding to the halfway point (B in Figure 1) between no mass difference (C in Figure 1) and the largest mass difference (A in Figure 1). For the Zachary network, the corresponding optimal damping factor is 0.38 (D in Figure 1).
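The selection rule can be sketched as a simple grid search. This is one possible reading, not necessarily the authors' exact procedure: it assumes the "half position" means the damping factor at which the gap between maximum and minimum mass equals half of the gap observed at a damping factor of 1. The mass update is the PageRank-style iteration assumed for Definition 1, and the grid resolution and the toy graph (a triangle with one pendant node) are hypothetical.

```python
def node_mass(adj, damping, tol=1e-12, max_iter=10000):
    """Assumed update: mass[v] = (1 - damping) + damping * sum(mass[u] / deg(u))."""
    mass = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        new_mass = {
            v: (1.0 - damping)
            + damping * sum(mass[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
        delta = max(abs(new_mass[v] - mass[v]) for v in adj)
        mass = new_mass
        if delta < tol:
            break
    return mass


def mass_gap(adj, damping):
    """Gap between the largest and smallest node mass for a given damping factor."""
    mass = node_mass(adj, damping)
    return max(mass.values()) - min(mass.values())


def half_gap_damping(adj, steps=100):
    """Grid point whose gap is closest to half of the gap at damping = 1."""
    target = 0.5 * mass_gap(adj, 1.0)
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda d: abs(mass_gap(adj, d) - target))


# Hypothetical toy graph: triangle a-b-c plus a pendant node p attached to a.
toy = {"a": ["b", "c", "p"], "b": ["a", "c"], "c": ["a", "b"], "p": ["a"]}
best = half_gap_damping(toy)
print(best, mass_gap(toy, best), mass_gap(toy, 1.0))
```

The toy graph contains a triangle so that the iteration also converges at a damping factor of 1. A finer grid or interpolation between grid points would give a more precise value.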

Node mass calculated by our method effectively reflects the importance and influence of nodes in a complex network: the more important a node is, the larger its mass. After node mass is taken into consideration, the topology potential of each node is more accurate, and the distribution of topology potential is more reasonable. Because mass reflects the influence of a node in the whole network, the topology potential, which depends on both distance and mass, takes on a global character to some extent. This global character is meaningful for community detection.

3. Simulation Experiments

This section empirically analyzes the influence of node mass on three typical topology-potential-based community detection methods, taken from [4], [5], and [6]. In this paper, they are called Gan, Han, and Zhang, respectively.

The simulation program was implemented in MATLAB on Windows. The experimental data comprise two complex networks: a real-world network, the Dolphin social network, which comes from http://www-personal.umich.edu/~mejn/netdata/, and an artificial network generated by the LFR-Benchmark generator [11]. LFR-Benchmark is a network generator that produces networks with a power-law degree distribution and implanted communities.

For each network, there are two schemes: the “without mass” scheme, which ignores node differences and assigns all nodes the same mass, and the “with mass” scheme, which takes node mass into consideration and computes it according to Definition 1. We analyzed the topology potential of nodes and the community detection results under both schemes.

3.1. Artificial Complex Network

The artificial complex network is generated by the LFR-Benchmark generator. The node number is 100, the edge number is 230, the average degree is 4.6, and the implanted community number is 2. The structure of the artificial complex network is shown in Figure 2.

3.1.1. The Influences of Node Mass on Topology Potential

Table 2 shows the topology potential of nodes 1 to 20 under the two schemes. As seen from Table 2, the topology potential of the artificial network nodes changes noticeably after node mass is taken into consideration.


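Table 2: Topology potential of nodes 1 to 20 in the artificial network under the two schemes.

| Node | Without mass | With mass | Node | Without mass | With mass |
|---|---|---|---|---|---|
| Node 1 | 2.4140 | 1.8383 | Node 11 | 2.5175 | 2.0233 |
| Node 2 | 2.1035 | 1.4705 | Node 12 | 2.2415 | 1.6444 |
| Node 3 | 2.5520 | 1.7553 | Node 13 | 2.5520 | 1.3730 |
| Node 4 | 2.5175 | 1.8146 | Node 14 | 2.5865 | 1.4425 |
| Node 5 | 2.5175 | 1.8885 | Node 15 | 2.4140 | 1.9363 |
| Node 6 | 1.9140 | 1.2719 | Node 16 | 2.2415 | 1.9283 |
| Node 7 | 2.6900 | 1.9954 | Node 17 | 2.4140 | 1.5655 |
| Node 8 | 2.6210 | 1.7898 | Node 18 | 3.3280 | 1.5591 |
| Node 9 | 2.2760 | 1.5757 | Node 19 | 3.0175 | 1.6472 |
| Node 10 | 2.0000 | 1.4835 | Node 20 | 2.8105 | 1.4862 |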

Table 3 shows the 20 nodes with the largest topology potential under each scheme. As seen from Table 3, the ordering of the top 20 nodes differs from the fourth position onward after node mass is taken into consideration. This change in ordering implies a change in the distribution of topology potential, which may affect the community detection results.


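Table 3: Top 20 nodes of the artificial network ranked by topology potential under the two schemes.

| Serial number | Without mass | With mass | Serial number | Without mass | With mass |
|---|---|---|---|---|---|
| 1 | Node 99 | Node 99 | 11 | Node 91 | Node 91 |
| 2 | Node 97 | Node 97 | 12 | Node 90 | Node 88 |
| 3 | Node 100 | Node 100 | 13 | Node 85 | Node 85 |
| 4 | Node 98 | Node 94 | 14 | Node 87 | Node 87 |
| 5 | Node 95 | Node 98 | 15 | Node 88 | Node 86 |
| 6 | Node 94 | Node 96 | 16 | Node 86 | Node 90 |
| 7 | Node 96 | Node 95 | 17 | Node 81 | Node 84 |
| 8 | Node 93 | Node 93 | 18 | Node 82 | Node 76 |
| 9 | Node 92 | Node 89 | 19 | Node 84 | Node 81 |
| 10 | Node 89 | Node 92 | 20 | Node 78 | Node 78 |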
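The comparison of the two rankings in Table 3 can be checked with a few lines of Python. The node lists below are copied from Table 3; the helper name `first_divergence` is hypothetical and is only meant to illustrate how the position of the first change can be located.

```python
def first_divergence(rank_a, rank_b):
    """Return the 1-based position at which two rankings first differ, or None."""
    for position, (a, b) in enumerate(zip(rank_a, rank_b), start=1):
        if a != b:
            return position
    return None


# Top 20 nodes of the artificial network, in rank order, as listed in Table 3.
without_mass = [99, 97, 100, 98, 95, 94, 96, 93, 92, 89,
                91, 90, 85, 87, 88, 86, 81, 82, 84, 78]
with_mass = [99, 97, 100, 94, 98, 96, 95, 93, 89, 92,
             91, 88, 85, 87, 86, 90, 84, 76, 81, 78]

print(first_divergence(without_mass, with_mass))   # 4
print(sorted(set(without_mass) - set(with_mass)))  # [82]
print(sorted(set(with_mass) - set(without_mass)))  # [76]
```

Besides the reordering from the fourth position onward, the membership of the top 20 changes by a single node: node 82 drops out and node 76 enters.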

3.1.2. The Influences of Node Mass on Community Detection Results

The artificial complex network contains two communities, whose representative nodes are node 97 and node 99, respectively.

(1) The Gan Method. The Gan method first identifies internal nodes and boundary nodes and then uses a defined benefit function to determine which community each boundary node belongs to. Under the “without mass” scheme, the Gan method identifies 12 boundary nodes; under the “with mass” scheme, it identifies 11. Taking node mass into consideration thus reduces the number of boundary nodes from 12 to 11, which lightens the work of assigning boundary nodes to communities. As can be seen from Figure 2, node 41 is clearly an internal node of its community, but if node mass is ignored, it is mistakenly regarded as a boundary node.

(2) The Zhang Method. The Zhang method uses the same strategy as the Gan method to identify internal nodes and boundary nodes; the only difference is how it determines which community a boundary node belongs to. Therefore, under the “without mass” scheme, the Zhang method identifies the same 12 boundary nodes as the Gan method, and under the “with mass” scheme, it identifies the same 11.

(3) The Han Method. The community detection results are the same under both schemes. The reason is as follows: the Han method uses topology potential only to find the local maximum topology potential nodes, that is, the representative nodes of communities, and then uses a strategy similar to modularity to determine which community each node belongs to. Whether or not node mass is taken into consideration, the local maximum topology potential nodes do not change (they are always node 97 and node 99), and the network structure is fixed; therefore, the community detection results remain the same.
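The argument for the Han method can be illustrated with a small sketch. It assumes a simple reading of "local maximum topology potential node": a node whose potential is strictly greater than that of every neighbor. The graph and both sets of potential values are hypothetical, chosen only to show that the set of local maxima can stay the same even when every individual potential value changes.

```python
def local_maxima(adj, potential):
    """Nodes whose potential is strictly greater than that of every neighbor.

    Assumption: ties do not count as maxima, and an isolated node (no
    neighbors) is trivially reported as a local maximum.
    """
    return sorted(
        v for v in adj if all(potential[v] > potential[u] for u in adj[v])
    )


# Hypothetical graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
graph = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}

# Hypothetical potentials under the two schemes; the values differ everywhere.
potential_without_mass = {1: 2.1, 2: 2.0, 3: 2.9, 4: 2.5, 5: 2.8, 6: 2.2}
potential_with_mass = {1: 1.4, 2: 1.5, 3: 2.2, 4: 1.6, 5: 1.9, 6: 1.3}

print(local_maxima(graph, potential_without_mass))  # [3, 5]
print(local_maxima(graph, potential_with_mass))     # [3, 5]
```

Because a method of this kind uses the potential only to pick these seed nodes, any change in potential that leaves the local maxima in place leaves its input, and hence its result, unchanged.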

3.2. Dolphin Social Network

The Dolphin social network describes the frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand. The structure of the Dolphin social network is shown in Figure 3.

3.2.1. The Influences of Node Mass on Topology Potential

Table 4 shows the topology potential of nodes 1 to 20 under the two schemes. As seen from Table 4, the topology potential of the Dolphin network nodes changes noticeably after node mass is taken into consideration.


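Table 4: Topology potential of nodes 1 to 20 in the Dolphin social network under the two schemes.

| Node | Without mass | With mass | Node | Without mass | With mass |
|---|---|---|---|---|---|
| Node 1 | 6.2895 | 4.5944 | Node 11 | 5.3610 | 3.9025 |
| Node 2 | 7.0212 | 5.4907 | Node 12 | 2.3973 | 1.7321 |
| Node 3 | 4.1513 | 2.9848 | Node 13 | 2.3973 | 1.6296 |
| Node 4 | 4.1605 | 2.8547 | Node 14 | 6.4585 | 5.5909 |
| Node 5 | 2.3973 | 1.7321 | Node 15 | 9.6098 | 7.6395 |
| Node 6 | 3.8699 | 3.2161 | Node 16 | 6.4678 | 4.7977 |
| Node 7 | 5.4456 | 4.6391 | Node 17 | 6.1957 | 4.8912 |
| Node 8 | 5.4548 | 3.8740 | Node 18 | 7.1057 | 6.1104 |
| Node 9 | 6.2895 | 4.6118 | Node 19 | 6.9367 | 5.4700 |
| Node 10 | 5.8114 | 5.0576 | Node 20 | 4.3388 | 3.1913 |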

Table 5 shows the 20 nodes with the largest topology potential under each scheme. As seen from Table 5, the ordering of the top 20 nodes differs from the fifth position onward after node mass is taken into consideration. This change may affect the community detection results.


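Table 5: Top 20 nodes of the Dolphin social network ranked by topology potential under the two schemes.

| Serial number | Without mass | With mass | Serial number | Without mass | With mass |
|---|---|---|---|---|---|
| 1 | Node 15 | Node 15 | 11 | Node 39 | Node 51 |
| 2 | Node 38 | Node 38 | 12 | Node 18 | Node 39 |
| 3 | Node 46 | Node 46 | 13 | Node 2 | Node 14 |
| 4 | Node 34 | Node 34 | 14 | Node 19 | Node 44 |
| 5 | Node 21 | Node 52 | 15 | Node 44 | Node 2 |
| 6 | Node 41 | Node 30 | 16 | Node 58 | Node 19 |
| 7 | Node 30 | Node 18 | 17 | Node 16 | Node 37 |
| 8 | Node 52 | Node 21 | 18 | Node 14 | Node 22 |
| 9 | Node 37 | Node 41 | 19 | Node 22 | Node 10 |
| 10 | Node 51 | Node 58 | 20 | Node 9 | Node 25 |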

3.2.2. The Influences of Node Mass on Community Detection Results

The Dolphin social network contains two communities, whose representative nodes are node 15 and node 18, respectively.

(1) The Gan Method. Under the “without mass” scheme, the Gan method identifies 13 boundary nodes; under the “with mass” scheme, it identifies 9. Taking node mass into consideration thus reduces the number of boundary nodes from 13 to 9, which lightens the work of assigning boundary nodes to communities.

As can be seen from Figure 3, node 7, node 26, node 27, node 42, and node 57 are clearly internal nodes of one community, but if node mass is ignored, these nodes are mistakenly regarded as boundary nodes.

Some new boundary nodes emerge under the “with mass” scheme, such as node 24 and node 60. In Figure 3, these two nodes lie in the overlapping area of the two communities, and both connect directly to node 37, which is clearly an overlapping node. It is therefore reasonable to treat node 24 and node 60 as boundary nodes.

(2) The Zhang Method. Under both the “without mass” scheme and the “with mass” scheme, the boundary nodes identified by the Zhang method are the same as those identified by the Gan method, so taking node mass into consideration again reduces the number of boundary nodes from 13 to 9. The reason is the same as that explained in Section 3.1.

(3) The Han Method. The community detection results are the same under both schemes. The reason is the same as that explained in Section 3.1.

4. Conclusions

Topology potential theory is a new community detection theory for complex networks. At present, almost all topology-potential-based community detection methods assume that network nodes have the same mass. This assumption makes the topology potential calculation inaccurate and, in turn, reduces the precision of community detection. Inspired by the PageRank algorithm, this paper puts forward a novel mass calculation method for complex network nodes. A node’s mass obtained by our method effectively reflects its importance and influence in the network: the more important a node is, the larger its mass. Simulation results show that, after node mass is taken into consideration, the topology potential of each node is more accurate, the distribution of topology potential is more reasonable, and the community detection results are more precise.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Fundamental Research Funds for the Central Universities (2014QNB23).

References

  1. S. Zhang, “Hierarchical modular structure identification with its applications in gene co-expression networks,” The Scientific World Journal, vol. 2012, Article ID 523706, 8 pages, 2012.
  2. R. W. Myster, “A refined methodology for defining plant communities using postagricultural data from the neotropics,” The Scientific World Journal, vol. 2012, Article ID 365409, 9 pages, 2012.
  3. G. K. Orman, V. Labatut, and H. Cherifi, “Comparative evaluation of community detection algorithms: a topology approach,” Journal of Statistical Mechanics, vol. 2012, Article ID P08001, 2012.
  4. W.-Y. Gan, N. He, D.-Y. Li, and J.-M. Wang, “Community discovery method in networks based on topological potential,” Journal of Software, vol. 20, no. 8, pp. 2241–2254, 2009.
  5. Y. Han, D. Li, and T. Wang, “Identifying different community members in complex networks based on topology potential,” Frontiers of Computer Science in China, vol. 5, no. 1, pp. 87–99, 2011.
  6. J. Zhang, H. Li, J. Yang, J. Bai, L. Zhang, and Y. Chu, “Variable scale network overlapping community identification based on identity uncertainty,” Acta Electronica Sinica, vol. 40, no. 12, pp. 2512–2518, 2012.
  7. J. Zhang, H. Li, J. Yang, J. Bai, Y. Chu, and L. Zhang, “Community discovery method with uncertainty measure of overlapping nodes based on topology potential,” Journal of Harbin Institute of Technology, vol. 19, no. 2, pp. 16–22, 2012.
  8. J. Zhang, H. Li, J. Yang, J. Bai, and L. Zhang, “An importance-sorting algorithm of network community nodes based on topology potential,” Journal of Harbin Engineering University, vol. 33, no. 6, pp. 745–752, 2012.
  9. J. Zhang, H. Li, J. Yang, J. Bai, and Y. Chu, “Network soft partition based on topological potential,” in Proceedings of the 6th International ICST Conference on Communications and Networking in China (CHINACOM '11), pp. 725–729, Harbin, China, August 2011.
  10. S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks, vol. 30, no. 1–7, pp. 107–117, 1998.
  11. A. Lancichinetti and S. Fortunato, “Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities,” Physical Review E, vol. 80, no. 1, Article ID 016118, 2009.

Copyright © 2014 Zhixiao Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

