Mathematical Problems in Engineering

Volume 2018 (2018), Article ID 2942054, 12 pages

https://doi.org/10.1155/2018/2942054

## An Autonomous Divisive Algorithm for Community Detection Based on Weak Link and Link-Break Strategy

Correspondence should be addressed to Jianpei Zhang; zhangjianpei@hrbeu.edu.cn

Received 4 June 2017; Accepted 14 December 2017; Published 15 January 2018

Academic Editor: Sebastian Anita

Copyright © 2018 Xiaoyu Ding et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Divisive algorithms are widely used for community detection. A common strategy of divisive algorithms is to remove the external links which connect different communities so that communities become disconnected from each other. Divisive algorithms have been investigated for several decades, but some challenges remain unsolved: (1) how to efficiently identify external links, (2) how to efficiently remove external links, and (3) how to end a divisive algorithm without the help of predefined parameters or community definitions. To overcome these challenges, we introduce the concepts of the weak link and autonomous division. The implementation of the proposed divisive algorithm adopts a new link-break strategy similar to a tug-of-war contest, where communities act as contestants and weak links act as breakable ropes. Empirical evaluations on artificial and real-world networks show that the proposed algorithm achieves a better accuracy-efficiency trade-off than some of the latest divisive algorithms.

#### 1. Introduction

The study of networks is now one of the most active interdisciplinary research fields [1, 2]. In computer science and sociology, complex systems are abstracted as networks or graphs. The basic components of a network are nodes and links. Nodes represent entities of interest. Links represent associations among entities. Community structure is one of the most important properties of complex systems, and community detection is an effective approach to study this property. The goal of detecting community structure is to obtain an appropriate partition in which the links among nodes within a community are dense, while the links to nodes outside the community are sparse [3–7].

Many community detection algorithms have been proposed [1, 2], such as divisive algorithms [8–11], clustering algorithms [5, 12–15], modularity optimization algorithms [16–20], and label propagation algorithms [21–24]. This paper focuses on divisive algorithms, which separate communities by detecting and removing links. Girvan and Newman [25] proposed a significant algorithm based on betweenness, which can identify external links [10]. However, as a global centrality index, betweenness is time-consuming to calculate, and each iteration of the algorithm removes only one link from the network. To improve the efficiency of divisive algorithms, Radicchi et al. [9] proposed the edge-clustering coefficient, which is a local centrality index. Based on the edge-clustering coefficient, their algorithm can remove multiple links from the network at each iteration. However, the algorithm tends to produce a large number of trivial partitions. To achieve a trade-off between accuracy and efficiency, Yang et al. [11] proposed an algorithm based on closed walks. However, the termination of that algorithm depends on the quality function modularity [16, 26, 27].

This paper focuses on three challenges: (1) how to detect external links efficiently, (2) how to remove external links efficiently, and (3) how to end a divisive algorithm without the help of predefined parameters or community definitions. If communities can distinguish between internal and external links, then they can remove external links, keep internal links, and thereby define themselves. Based on this idea, we present the concepts of the weak link and autonomous division. The implementation of the autonomous divisive (AD) algorithm adopts a new link-break strategy similar to a tug-of-war contest.

We summarize the main contributions of this paper as follows:

(i) We propose the concept of the weak link. We define a weak link as a link that lies on the boundary of a community and is likely to connect to another community. By removing weak links, communities become disconnected from each other. The experimental results on both artificial and real-world networks show that the weak link improves the efficiency of detecting external links.

(ii) We propose a link-break strategy based on the weak link. The link-break strategy achieves high efficiency by detecting and removing multiple weak links at each iteration of the proposed algorithm, which reduces the number of iterations of a divisive algorithm.

(iii) We propose an autonomous divisive algorithm based on the weak link and the link-break strategy. “Autonomous” means that the proposed divisive algorithm requires no parameters, nontopological information, or community definitions, and can terminate without their help.

The rest of the paper is organized as follows. Section 2 reviews related works of divisive algorithms. Section 3 introduces the proposed definitions and algorithm. We test our algorithm and compare it with other divisive algorithms in Section 4. Section 5 concludes our study.

#### 2. Related Works

##### 2.1. Betweenness (GN) Algorithm

Girvan and Newman [25] proposed the GN algorithm. In their work, they proposed betweenness focusing on the links that are most “between” communities. Each iteration of GN removes the link with the highest betweenness and then recalculates the betweenness of all the links affected by the removal. For further study, they considered alternative definitions of betweenness. Experimental results showed that the proposed algorithm based on the shortest path betweenness shows the best performance [10].
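The removal step described above can be illustrated with a minimal sketch. It assumes an unweighted, undirected simple graph stored as a dictionary of neighbor sets, and it computes shortest-path link betweenness with a Brandes-style accumulation. The function names `edge_betweenness`, `gn_step`, and `two_triangles` are hypothetical names chosen for this illustration and do not come from [25].

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every link of an undirected simple graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered node pair was visited from both of its endpoints.
    return {link: value / 2.0 for link, value in score.items()}

def gn_step(adj):
    """Remove the link with the highest betweenness and return it."""
    bet = edge_betweenness(adj)
    u, v = max(bet, key=bet.get)
    adj[u].discard(v)
    adj[v].discard(u)
    return frozenset((u, v))

def two_triangles():
    """Toy graph (illustrative only): two triangles joined by the link (2, 3)."""
    return {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
            3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

On the toy graph, the link (2, 3) carries every shortest path between the two triangles, so it receives the highest betweenness and is removed first. A full GN run would repeat `gn_step` and recalculate betweenness after each removal, which is the source of the high cost noted above.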

##### 2.2. Distance Dissimilarity (DD) Algorithm

Zhou [8] proposed the DD algorithm to quantify the differences between communities. Zhou introduced the dissimilarity index to measure the possibility that two adjacent nodes belong to the same community. Besides, Zhou also introduced a resolution threshold value known as the dissimilarity threshold. At each iteration of DD, the value of dissimilarity threshold decreases differentially. Based on the dissimilarity threshold, DD can remove multiple links at each iteration and get hierarchically organized communities characterized by upper and lower dissimilarity thresholds [8].

##### 2.3. Information Centrality (IC) Algorithm

Fortunato et al. [28] proposed the IC algorithm based on information centrality defined as the relative decrement of network efficiency caused by the removal of a link. IC expects the link locating between communities to have high information centrality and the link locating within a community to have low information centrality [28]. Each iteration of IC accomplishes two tasks: calculating the information centrality of each link and removing the link with the highest information centrality. Experimental results showed that IC is effective at discovering community structures when the communities are cohesively connected with each other [28].
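As an illustration of the definition above, the following sketch computes the relative decrement of network efficiency caused by removing a link. It assumes the usual definition of efficiency as the average of inverse shortest-path distances over all ordered node pairs, with unreachable pairs contributing zero. The function names and the toy graph are hypothetical and serve only this example.

```python
from collections import deque

def efficiency(adj):
    """Average of 1/d(i, j) over all ordered pairs of distinct nodes."""
    n = len(adj)
    total = 0.0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        # Unreachable nodes are absent from dist and contribute nothing.
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))

def information_centrality(adj, u, v):
    """Relative drop in efficiency when the link (u, v) is removed."""
    base = efficiency(adj)
    pruned = {x: set(nbrs) for x, nbrs in adj.items()}
    pruned[u].discard(v)
    pruned[v].discard(u)
    return (base - efficiency(pruned)) / base

def two_triangles():
    """Toy graph (illustrative only): two triangles joined by the link (2, 3)."""
    return {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
            3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

On the toy graph, the link (2, 3) between the two triangles has information centrality 13/31 (about 0.42), whereas the internal link (0, 1) has a much smaller value, in line with the expectation stated in [28].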

##### 2.4. Edge-Clustering Coefficient (CD) Algorithm

Radicchi et al. [9] proposed the CD algorithm to solve two problems. The first problem is the quantitative definition of community, and the other problem is the time-consuming nature of divisive algorithms. To solve the first problem, they introduced two alternative quantitative definitions of community. To solve the second problem, they suggested a local centrality index edge-clustering coefficient. Based on the edge-clustering coefficient, CD can remove multiple links at each iteration.
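A minimal sketch of the triangle-based edge-clustering coefficient follows. It assumes the commonly cited form from [9], namely the number of triangles containing the link plus one, divided by the largest number of triangles the link could belong to, which is min(k_u - 1, k_v - 1). The function names and the toy graph are hypothetical.

```python
def edge_clustering(adj, u, v):
    """Triangle-based edge-clustering coefficient of the link (u, v)."""
    triangles = len(adj[u] & adj[v])          # common neighbors close triangles
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if possible == 0:
        # A link to a degree-one node cannot belong to any triangle.
        return float("inf")
    return (triangles + 1) / possible

def weakest_links(adj):
    """Links with the smallest coefficient, the candidates for removal."""
    coeff = {frozenset((u, v)): edge_clustering(adj, u, v)
             for u in adj for v in adj[u]}
    lowest = min(coeff.values())
    return {link for link, value in coeff.items() if value == lowest}

def two_triangles():
    """Toy graph (illustrative only): two triangles joined by the link (2, 3)."""
    return {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
            3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Because the coefficient only inspects the neighborhoods of the two endpoints, it can be evaluated for all links without any global shortest-path computation. On the toy graph, the link (2, 3) has coefficient 0.5 and every link inside a triangle has coefficient 2.0.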

##### 2.5. Closed Walks (CW) Algorithm

Yang et al. [11] proposed the CW algorithm and introduced closed walks as a local centrality index. CW considers the closed walks of orders three and four based on three convincing pieces of evidence. The first evidence comes from statistical data where, in complex networks, the proportion of the links that participated in closed walks of orders 3 and 4 reaches ninety percent [11]. The second evidence comes from the *three degrees of influence* property of sociological significance [29]. The third evidence comes from the property that information usually propagates along paths without repeated nodes. Experimental results showed that CW is an effective way to solve the double peak structure problem [11].

#### 3. An Autonomous Divisive Algorithm for Community Detection

##### 3.1. Motivation

In real-world networks, it is often easier to discriminate between internal links and external links than to recognize overlapping nodes [1]. Defining communities as sets of links rather than nodes may be a promising strategy to analyze networks with overlapping communities [30, 31]. Based on this idea, many community detection algorithms [32–36] aim to find the differences in the properties of links to extract high-quality community structures from networks. Based on network topology information, this paper discusses the difference between the properties of internal links and external links. We introduce the concept of the weak link to locate external links. In addition, we introduce a new link-break strategy and an autonomous division, so that the proposed divisive algorithm is free from parameters, nontopological information, and community definitions.

##### 3.2. Definition of Weak Link

Many real-world complex systems can be represented as a graph $G = (V, E)$, where $V$ is the set of nodes, $E$ is the set of links, $n = |V|$, and $m = |E|$. Most community detection algorithms are based on the notion that a community should have more internal connections than external connections [1, 2]. This notion skillfully generalizes the difference of the density distribution between internal links and external links. However, more properties of links are urgently needed to make divisive algorithms free from parameters, nontopological information, and community definitions.

First, to tell the difference between the properties of internal links and external links, as a baseline, we investigate the expected contribution of a node to its neighbors for spreading information. If a node can only get its neighbors’ information, then the node will expect that its neighbors’ contribution for spreading information is uniform. We define the expected contribution that node $v$ makes to its neighbor for spreading information as

$$C_{\mathrm{exp}}(v) = \frac{1}{k_v},$$

where $k_v$ is the degree of node $v$.

Second, we investigate the property of internal links. In a community, core members, hub members, and outlier members play different roles in spreading information [37]. *Core members* contribute greatly to spreading information inside communities; *hub members* serve as hubs for spreading information both inside and outside communities; *outlier members* prefer to receive information rather than send information. In a community, from core members to outlier members, the node’s contribution for spreading information declines. Therefore, if node $u$ and node $v$ are two endpoints of an internal link and $C_{\mathrm{real}}(u)$ is the real contribution of node $u$ to its neighbor for spreading information, we expect that $C_{\mathrm{real}}(u) \geq C_{\mathrm{exp}}(u)$ and $C_{\mathrm{real}}(v) \leq C_{\mathrm{exp}}(v)$, or $C_{\mathrm{real}}(u) \leq C_{\mathrm{exp}}(u)$ and $C_{\mathrm{real}}(v) \geq C_{\mathrm{exp}}(v)$.

Lastly, we investigate the property of external links. The biggest difference between the properties of internal links and external links is that external links connect different communities. As the two endpoints of an external link play an important role in spreading information between communities, we expect that both of the endpoints have a real contribution which is greater than the expected contribution. Hence, if node $u$ and node $v$ are the two endpoints of an external link, we expect that $C_{\mathrm{real}}(u) > C_{\mathrm{exp}}(u)$ and $C_{\mathrm{real}}(v) > C_{\mathrm{exp}}(v)$. We define the weak link as follows.

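*Definition 1 (weak link).* A link with two endpoints $u$ and $v$ is a weak link if $C_{\mathrm{real}}(u) > C_{\mathrm{exp}}(u)$ and $C_{\mathrm{real}}(v) > C_{\mathrm{exp}}(v)$, that is, if the real contribution of each endpoint is greater than its expected contribution.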
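Definition 1 can be read as a simple predicate on a link. The sketch below assumes that the expected contribution of a node is the uniform share 1/k, where k is its degree, and it takes the real contributions as a given input, since their computation from the shortest path coverage is introduced in the next subsection. The names `expected_contribution`, `is_weak_link`, and `real`, as well as the numeric values, are hypothetical and purely illustrative.

```python
def expected_contribution(adj, v):
    """ASSUMPTION: a node expects a uniform share, one over its degree."""
    return 1.0 / len(adj[v])

def is_weak_link(adj, real, u, v):
    """Definition 1: both endpoints contribute more than expected."""
    return (real[u] > expected_contribution(adj, u)
            and real[v] > expected_contribution(adj, v))

# Toy graph (illustrative only): two triangles joined by the link (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

# Made-up real contributions, used only to exercise the predicate.
real = {0: 0.3, 1: 0.3, 2: 0.5, 3: 0.5, 4: 0.3, 5: 0.3}

weak = sorted(tuple(sorted((u, v)))
              for u in adj for v in adj[u]
              if u < v and is_weak_link(adj, real, u, v))
```

With these illustrative values, only the link (2, 3) satisfies both inequalities, so it is the only link flagged as weak.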

##### 3.3. Determination of Weak Link

To determine whether a link is a weak link, it is essential to quantify the real contribution of the two endpoints of a link for spreading information. Thus, we investigate the structure of the shortest path tree of each endpoint and introduce the shortest path coverage as a measure.

We use the shortest path coverage to estimate whether a node is at the edge of a potential community based on the following observations. Figure 1 contains three subgraphs. The graphs in Figures 1(b) and 1(c) are isomorphic to the graph in Figure 1(a). In Figures 1(b) and 1(c), the solid lines represent the shortest path trees of two different source nodes. If we consider Figure 1(a) as a community, then the source node in Figure 1(b) is a core member. Figure 1(a) contains eight links, of which four belong to the shortest path tree in Figure 1(b) and six belong to the shortest path tree in Figure 1(c). The remaining four links in Figure 1(b) and two links in Figure 1(c), drawn as dashed lines, make no contribution to the shortest path tree. Moreover, the length of the shortest path from the source node is 4 in Figure 1(b) and 6 in Figure 1(c). From Figure 1, we conclude that, in a community, a core member reaches the other members more quickly than a less important member, and the shortest path tree of a core member is shallower than that of a less important member.
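The observation about tree depth can be reproduced with a plain breadth-first search. The sketch below builds a standard BFS shortest path tree on a small hypothetical graph, which is not the graph of Figure 1, and compares the depth seen from a central node with the depth seen from a peripheral one. It illustrates the depth argument only and is not the shortest path coverage measure itself.

```python
from collections import deque

def shortest_path_tree(adj, source):
    """Breadth-first search tree: parent and depth of every reachable node."""
    parent = {source: None}
    depth = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in sorted(adj[v]):      # sorted only to make the tree deterministic
            if w not in parent:
                parent[w] = v
                depth[w] = depth[v] + 1
                queue.append(w)
    return parent, depth

def tree_depth(adj, source):
    """Largest distance from the source to any node it can reach."""
    _, depth = shortest_path_tree(adj, source)
    return max(depth.values())

# Hypothetical community: node "c" is adjacent to every other member.
community = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d", "e"},
             "d": {"c", "e"}, "e": {"c", "d"}}

core_depth = tree_depth(community, "c")
edge_depth = tree_depth(community, "a")
```

Here the central node `"c"` reaches every member in one step, while the peripheral node `"a"` needs two steps to reach `"d"` and `"e"`, so the tree rooted at the core member is shallower.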