Mathematical Problems in Engineering

Volume 2015, Article ID 675713, 8 pages

http://dx.doi.org/10.1155/2015/675713

## Identifying Super-Spreader Nodes in Complex Networks

^{1}Department of Computer Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan

^{2}Department of Computer Science and Information Engineering, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, 259 Wen Hwa 1st Road, Taoyuan 333, Taiwan

Received 26 May 2014; Accepted 25 September 2014

Academic Editor: He Huang

Copyright © 2015 Yu-Hsiang Fu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Identifying the most influential individuals spreading information or infectious diseases can assist or hinder information dissemination, product exposure, and contagious disease detection. Hub nodes, high betweenness nodes, high closeness nodes, and high $k$-shell nodes have been identified as good initial spreaders, but efforts to use node diversity within network structures to measure spreading ability are few. Here we describe a two-step framework that combines global diversity and local features to identify the most influential network nodes. Results from susceptible-infected-recovered epidemic simulations indicate that our proposed method performs well and stably in single initial spreader scenarios associated with various complex network datasets.

#### 1. Introduction

Network-spreading studies range from information diffusion via online social media sites, to viral marketing, to epidemic disease identification and control, among many others [1–9]. Key spreader identification strategies are being established and tested to accelerate information dissemination, increase product exposure, detect contagious disease outbreaks, and execute early intervention strategies [10]. Topological structure is a core concept in this identification process [1, 2, 11–15].

Centrality measures for identifying influential social network nodes are broadly categorized as local or global [3, 7, 16]. Degree centrality (the number of nodes that a focal node is connected to) measures node involvement in a network. However, as a local measure it does not account for global topological structure. Betweenness (which assesses the degree to which a node lies on the shortest paths between other pairs of nodes) and closeness (the inverse of the sum of the shortest distances from a focal node to all other nodes) are the two most widely used measures for overcoming this limitation. Influence is tied to advantageous network positions, including high degree, high closeness, and high betweenness. In simple network structures, these advantages tend to covary; in complex networks, significant disjunctures can emerge among position characteristics, so that a spreader’s location may be advantageous in some respects and disadvantageous in others.

Results from $k$-shell decomposition analyses indicate that network nodes in core layers are capable of spreading throughout much broader areas compared to those in peripheral layers [1, 2]. Although spreading capability differs among nodes, those with similar $k$-shell values are perceived as having equal importance. To rank spreaders, a method called mixed degree decomposition adds otherwise ignored degree nodes to the decomposition process [3, 6, 8, 17]. Still, researchers have tended to overlook the importance of network topology and node diversity, despite their positive correlations with events such as community economic development [18].

We used the concept of entropy to develop a robust and reliable method for measuring the spreading capability of nodes and identifying super-spreader nodes in complex networks. It can be used to analyze numbers of global network topological layers and local neighborhood nodes affected by specific individual nodes. Our assumption is that $k$-shell decomposition [1, 2] can be used for global analysis, with high global diversity/high local centrality nodes capable of penetrating multiple global layers and influencing large numbers of neighbors in local layers of complex networks.

To measure node influence, we propose a two-step framework for acquiring global and local node information within complex networks. Global node information is initially obtained using algorithms (e.g., a community detection algorithm for complex networks [5, 19, 20] or a $k$-shell decomposition algorithm for core/periphery network layers), after which entropy is used to evaluate network node global diversity. Next, local node information is acquired using various types of local centrality. Last, global diversity and local features are combined to determine node influence. In our experiments, spreading ability equaled the total number of recovered nodes over time. We used a susceptible-infective-recovered (SIR) epidemic simulation with various social network datasets [21–25] to compare the spreading capabilities of our proposed measure and social network local/global centralities [2, 26, 27].

#### 2. Background

To represent a complex network, let $G = (V, E)$ be an undirected graph, where $V$ is the network node set and $E$ the edge set. $n = |V|$ indicates the number of network nodes and $m = |E|$ the number of edges. Network structure is represented as an adjacency matrix $A = \{a_{ij}\}$ with $A \in \mathbb{R}^{n \times n}$, where $a_{ij} = 1$ if a link exists between nodes $i$ and $j$, otherwise $a_{ij} = 0$.
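As a small illustration of this representation, the following sketch builds the adjacency matrix of a toy undirected graph. The four-node edge list and the helper name `adjacency_matrix` are assumptions made for the example and do not come from the datasets used in this study.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix of an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # undirected: the matrix is symmetric
    return a


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)

# Row sums of the adjacency matrix give the node degrees.
degrees = [sum(row) for row in A]
```

Each undirected edge contributes two entries to the matrix, so the degrees sum to twice the number of edges.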

Degree (or local) centrality is a simple yet effective method for measuring node influence in a complex network. Let $C_D(i)$ denote the degree centrality of node $i$. Higher $C_D(i)$ values indicate larger numbers of connections between a node and its neighbors. $\Gamma_h(i)$ denotes the set of neighbors of node $i$ at an $h$-hop distance. Node degree centrality is therefore defined as

$$C_D(i) = |\Gamma_h(i)|, \tag{1}$$

where $|\Gamma_h(i)|$ is the number of node neighbors at an $h$-hop distance; in most cases, $h = 1$ [7].
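The definition above can be sketched with a breadth-first search. This is a minimal illustration rather than the implementation used in the experiments: it assumes that "neighbors at a given hop distance" means nodes whose shortest-path distance equals that number of hops exactly, and the function names and toy graph are hypothetical.

```python
from collections import deque


def neighbors_at_hop(adj, node, h=1):
    """Return the set of nodes whose shortest-path distance from `node` is exactly h."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # nodes beyond distance h are not needed
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == h}


def degree_centrality(adj, node, h=1):
    """Number of neighbors at an h-hop distance (plain degree when h = 1)."""
    return len(neighbors_at_hop(adj, node, h))


# Hypothetical toy graph: triangle 0-1-2 with pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
hub_degree = degree_centrality(adj, 2)           # ordinary degree of node 2
two_hop_from_3 = neighbors_at_hop(adj, 3, h=2)   # nodes exactly two hops from node 3
```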

Betweenness centrality or dependency measures the proportion of shortest paths going through a node in a complex network. $C_B(i)$ denotes the betweenness centrality of node $i$. Higher $C_B(i)$ values indicate that a complex network node is located along an important communication path. Accordingly, node betweenness centrality is defined as

$$C_B(i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}, \tag{2}$$

where $\sigma_{st}(i)$ is the number of shortest paths from node $s$ to node $t$ through node $i$ and $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ [3, 7, 16].
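For unweighted graphs, betweenness can be computed with one breadth-first search per source node followed by a backward accumulation of path dependencies (the approach usually attributed to Brandes). The sketch below is an illustrative implementation under two assumptions: the graph is undirected and unweighted, and each unordered node pair is counted once. The function name and toy graphs are hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths and recording predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Backward pass: accumulate dependencies from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy graph: triangle 0-1-2 with pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
bc = betweenness_centrality(adj)
```

In the toy graph only node 2 lies between other nodes, because every shortest path to the pendant node passes through it.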

Closeness (or global) centrality measures the average length of the shortest paths from one node to other nodes. Let $C_C(i)$ denote the closeness centrality of node $i$. Higher $C_C(i)$ values indicate node location in the center of a complex network, with a shorter average distance from that node to other nodes. Node closeness centrality is thus defined as

$$C_C(i) = \frac{1}{\bar{d}_i}, \qquad \bar{d}_i = \frac{1}{n - 1} \sum_{j \neq i} d_{ij}, \tag{3}$$

where $\bar{d}_i$ is the average length of the shortest paths from node $i$ to the other nodes, $n$ is the number of network nodes, and $d_{ij}$ is the distance from node $i$ to node $j$ [16].
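A minimal sketch of closeness as the inverse of the average shortest-path length is given below. It assumes an unweighted, connected graph, so that the average is taken over all other nodes; on a disconnected graph this sketch averages over reachable nodes only, which is a simplification. Names and the toy graph are hypothetical.

```python
from collections import deque


def closeness_centrality(adj, node):
    """Inverse of the average shortest-path distance from `node` to the nodes it can reach."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    if total == 0:
        return 0.0  # isolated node: no other node is reachable
    return (len(dist) - 1) / total


# Hypothetical toy graph: triangle 0-1-2 with pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
closeness = {v: closeness_centrality(adj, v) for v in adj}
```

Node 2 is adjacent to every other node and therefore has the highest closeness in this example, while the pendant node has the lowest.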

$k$-shell decomposition [1, 2] iteratively assigns $k$-shell layer values to all nodes in a complex network. During the first step, let $k = 1$ and remove all nodes with degree $\le k$. Following removal, some remaining network node degrees may be $\le k$. Nodes are continuously pruned until there are no nodes with degree $\le k$. All removed nodes are assigned a $k$-shell value of 1. The next step is similar: let $k = 2$, prune nodes, and assign a $k$-shell value of 2 to all removed nodes. Repeat the procedure until all network nodes are removed and assigned $k$-shell indexes. This method reveals the significant features of a complex network; for example, all Internet nodes can be classified as nuclei, peer-connected components, or isolated components [1].
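The pruning procedure described above can be sketched as follows. This is an illustrative implementation, not the code used in the study; it assumes nodes are pruned when their remaining degree is at most the current shell index, which also places isolated nodes in the first shell. The function name and the six-node example graph are hypothetical.

```python
def k_shell(adj):
    """Return {node: shell index} obtained by iterative pruning."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 1
    while remaining:
        # Keep pruning until no remaining node has degree <= k.
        while True:
            to_remove = [v for v in remaining if degree[v] <= k]
            if not to_remove:
                break
            for v in to_remove:
                shell[v] = k
                remaining.discard(v)
                for w in adj[v]:
                    if w in remaining:
                        degree[w] -= 1  # removing v lowers its neighbors' degrees
        k += 1
    return shell


# Hypothetical example: a 4-clique (0-3), node 4 attached to nodes 0 and 1,
# and node 5 attached only to node 4.
adj = {
    0: [1, 2, 3, 4],
    1: [0, 2, 3, 4],
    2: [0, 1, 3],
    3: [0, 1, 2],
    4: [0, 1, 5],
    5: [4],
}
shells = k_shell(adj)
```

In this example node 4 has degree 3 but ends up in the second shell, because it loses a neighbor when node 5 is pruned; this is the cascading effect that the iterative pruning is meant to capture.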

The SIR epidemic model [2, 26, 27] is used in many fields to study the spreading processes of information, rumors, biological diseases, and other phenomena. The model consists of three states: susceptible ($S$), infective ($I$), and recovered ($R$). $S$ nodes are susceptible to information or diseases, $I$ nodes are capable of infecting neighbors, and $R$ nodes are immune and cannot be reinfected. Initially, almost all network nodes are in the $S$ set, with a small number of infected nodes acting as spreaders. During each time step, $I$ nodes infect their neighbors at a preestablished infection rate $\beta$, after which they become recovered nodes at a recovery rate of $\gamma$. The total number of nodes in an SIR model is $n = S(t) + I(t) + R(t)$, with $S(t)$ denoting the number of susceptible nodes at time $t$, $I(t)$ the number of infected nodes at time $t$, $R(t)$ the number of recovered nodes at time $t$, and $R(t)/n$ the proportion of immune nodes.
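A discrete-time SIR process on a network can be sketched as below. This is a simplified illustration under stated assumptions: updates are synchronous, each infective node makes one independent infection attempt per susceptible neighbor per step, and recovery is tested after the infection attempts. The function name, parameter names, and the path graph are hypothetical.

```python
import random


def sir_simulation(adj, seed_node, beta, gamma=1.0, steps=50, rng=None):
    """Run one SIR simulation; return the number of recovered nodes after each step."""
    rng = rng or random.Random()
    infected = {seed_node}
    recovered = set()
    history = []
    for _ in range(steps):
        # Infective nodes try to infect each susceptible neighbor with probability beta.
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and v not in recovered and rng.random() < beta:
                    newly_infected.add(v)
        # Infective nodes then recover with probability gamma.
        newly_recovered = {u for u in infected if rng.random() < gamma}
        recovered |= newly_recovered
        infected = (infected - newly_recovered) | newly_infected
        history.append(len(recovered))
    return history


# Hypothetical path graph 0-1-2-3 with node 0 as the single initial spreader.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
full_spread = sir_simulation(path, 0, beta=1.0, gamma=1.0, steps=5)
no_spread = sir_simulation(path, 0, beta=0.0, gamma=1.0, steps=5)
```

With an infection rate of 1 the outbreak advances one hop per step along the path, and with an infection rate of 0 only the initial spreader ever recovers; intermediate rates produce random outcomes that have to be averaged over many runs.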

#### 3. The Proposed Measure

Our two-step method for obtaining global and local node information in a complex network is illustrated in the following steps. In step 1, global algorithms (e.g., community detection, graph clustering, and $k$-shell decomposition) are used to analyze the global features of nodes, and results are used to compute their global diversity. In step 2, degree centrality is used to measure local node features. Global diversity and local features are then combined to determine the influence of complex network nodes.

In step 1, $k$-shell decomposition was used as an example for obtaining global node information in a complex network, with Shannon’s entropy [28] used to measure the diversity of the $k$-shell values of a node’s neighbors and thus to determine how many network layers are affected by the node. According to (4), maximum entropy indicates a case in which a node is capable of connecting with all layers of a complex network, and minimum entropy (0) indicates a case in which all node connections are in the same network layer. The $k$-shell entropy of node $i$, which is larger when its neighbors’ $k$-shell values are more diverse, is defined as

$$E_i(X_i) = -\sum_{k} p_i(k) \log p_i(k), \qquad \hat{E}_i = \frac{E_i(X_i)}{\log n_{ks}}, \tag{4}$$

where $X_i$ are the $k$-shell values of the neighbors of node $i$, $p_i(k)$ the probability of the $k$-core layer among those neighbors, $n_{ks}$ the number of $k$-core layers in the complex network, and $\hat{E}_i$ the normalized $k$-core entropy required for the case under consideration.
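The entropy computation in step 1 can be sketched as follows. This is one reading of the definition rather than the authors' code: it assumes the probability of a layer is the fraction of a node's neighbors that fall in that layer, and that normalization divides by the logarithm of the number of layers in the network so that values lie between 0 and 1. The function name and inputs are hypothetical.

```python
import math
from collections import Counter


def k_shell_entropy(neighbor_shells, num_layers):
    """Normalized Shannon entropy of the shell values of a node's neighbors.

    neighbor_shells: list of shell indexes, one per neighbor.
    num_layers: number of layers in the whole network (assumed normalizer).
    """
    counts = Counter(neighbor_shells)
    if len(counts) <= 1 or num_layers < 2:
        return 0.0  # no neighbors, or all neighbors in the same layer
    total = len(neighbor_shells)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_layers)


# Hypothetical nodes in a network with three layers.
same_layer = k_shell_entropy([2, 2, 2], num_layers=3)    # neighbors share one layer
all_layers = k_shell_entropy([1, 2, 3], num_layers=3)    # one neighbor in each layer
mixed = k_shell_entropy([3, 3, 3, 2], num_layers=3)      # mostly one layer
```

The first case gives the minimum value and the second the maximum, which matches the interpretation given in the text; a node whose neighbors are unevenly spread over layers falls in between.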

In step 2, the node’s degree centrality is used to analyze the value of local features in the complex network; the degree centralities of neighbors are also considered. High influence values indicate high degree centralities of a node and its neighbors, meaning that the node is capable of reaching the widest possible local range. The local feature of node $i$ is defined as

$$L_i = \sum_{j \in \Gamma_h(i)} C_D(j), \tag{5}$$

where $C_D(j)$ is the degree centrality of neighbor $j$ and $\Gamma_h(i)$ is the node neighbor set at an $h$-hop distance. $L_i$ can be extended to become a “neighbor’s neighbor” version, meaning that all node neighbors with a 2-hop distance are considered.
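A minimal sketch of the local feature is shown below, assuming it is the sum of the degrees of a node's neighbors. The "neighbor's neighbor" variant is interpreted here as summing over all nodes within two hops, excluding the node itself; that interpretation, the function names, and the toy graph are assumptions made for illustration.

```python
def local_feature(adj, node):
    """Sum of the degrees of the node's direct (1-hop) neighbors."""
    return sum(len(adj[j]) for j in adj[node])


def local_feature_two_hop(adj, node):
    """Variant summing degrees over all nodes within two hops of `node`."""
    reach = set(adj[node])
    for j in adj[node]:
        reach.update(adj[j])
    reach.discard(node)  # the node itself is not one of its own neighbors
    return sum(len(adj[j]) for j in reach)


# Hypothetical toy graph: triangle 0-1-2 with pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
local = {v: local_feature(adj, v) for v in adj}
pendant_two_hop = local_feature_two_hop(adj, 3)
```

Because the sum has one term per neighbor, a node with many well-connected neighbors scores highest, which is consistent with the description of the local feature as a measure of local reach.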

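Finally, the normalized $k$-shell entropy $\hat{E}_i$ and the local feature $L_i$ are combined to denote $IF_i$, the final influence of node $i$, defined as

$$IF_i = \hat{E}_i \times L_i. \tag{6}$$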
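To show how the two steps fit together, the sketch below computes an influence score for every node of a small graph. It is an illustration under several assumptions and not the authors' implementation: global diversity is the normalized entropy of neighbors' shell values, the normalizer is the logarithm of the number of non-empty layers, the local feature is the sum of neighbors' degrees, and the two are combined by multiplication. All names and the example graph are hypothetical.

```python
import math
from collections import Counter


def k_shell(adj):
    """Shell index of every node, obtained by iterative pruning."""
    degree = {v: len(adj[v]) for v in adj}
    remaining, shell, k = set(adj), {}, 1
    while remaining:
        while True:
            to_remove = [v for v in remaining if degree[v] <= k]
            if not to_remove:
                break
            for v in to_remove:
                shell[v] = k
                remaining.discard(v)
                for w in adj[v]:
                    if w in remaining:
                        degree[w] -= 1
        k += 1
    return shell


def influence(adj):
    """Combine global diversity (entropy) and local feature (neighbor degrees)."""
    shell = k_shell(adj)
    num_layers = len(set(shell.values()))  # assumed normalizer: non-empty layers
    scores = {}
    for i in adj:
        counts = Counter(shell[j] for j in adj[i])
        total = sum(counts.values())
        entropy = 0.0
        if len(counts) > 1:
            entropy = -sum(c / total * math.log(c / total) for c in counts.values())
            entropy /= math.log(num_layers)
        local = sum(len(adj[j]) for j in adj[i])
        scores[i] = entropy * local  # assumed combination: product of the two terms
    return scores


# Hypothetical example: a 4-clique (0-3), node 4 attached to nodes 0 and 1,
# and node 5 attached only to node 4.
adj = {
    0: [1, 2, 3, 4],
    1: [0, 2, 3, 4],
    2: [0, 1, 3],
    3: [0, 1, 2],
    4: [0, 1, 5],
    5: [4],
}
scores = influence(adj)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, nodes 0 and 1 rank first because they combine many well-connected neighbors with links into a second layer, whereas nodes 2 and 3 score zero since all of their neighbors sit in the same layer. A multiplicative combination has this property by construction, which is worth keeping in mind when interpreting low scores.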

#### 4. Results and Discussion

Basic complex network properties and results from an analysis of each network’s giant connected component (GCC) structure are shown in Table 1. We used three network dataset classifications: scientific collaboration, traditional social, and “other.” Measures were degree, betweenness, and closeness centralities; $k$-shell decomposition; neighbor’s core (also known as *coreness*) [29]; PageRank [30]; and our proposed method. Spreading experiment and SIR epidemic model parameters were 1,000 simulations for each dataset and 50 time steps per simulation, with the top-1 node for each measure serving as the initial spreader. Infection rates $\beta$ are shown in Table 1. According to at least one study, a large infection rate leaves little difference among spreading measures [2]. To assign a suitable infection rate for each network dataset, rates were determined by comparing the theoretical epidemic threshold with the values used in referenced studies [29]. Recovery rate was always $\gamma = 1$, meaning that every node in set $I$ entered set $R$ immediately after infecting its neighbors.
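The experimental protocol described above (repeated SIR runs from a single initial spreader, a fixed number of time steps, and a recovery rate of 1) can be sketched as follows. This is a simplified illustration: the star graph, the infection rate of 0.5, the fixed random seed, and the function names are assumptions for the example and are not values from Table 1.

```python
import random


def final_outbreak_size(adj, seed_node, beta, steps, rng):
    """One SIR run with recovery rate 1; return the final number of recovered nodes."""
    infected, recovered = {seed_node}, set()
    for _ in range(steps):
        newly_infected = {
            v
            for u in infected
            for v in adj[u]
            if v not in infected and v not in recovered and rng.random() < beta
        }
        recovered |= infected  # recovery rate 1: infective nodes recover after one step
        infected = newly_infected
        if not infected:
            break  # the outbreak has died out
    return len(recovered)


def mean_outbreak_size(adj, seed_node, beta, runs=1000, steps=50, seed=0):
    """Average final outbreak size over repeated runs from one initial spreader."""
    rng = random.Random(seed)
    total = sum(final_outbreak_size(adj, seed_node, beta, steps, rng) for _ in range(runs))
    return total / runs


# Hypothetical star graph: hub 0 connected to leaves 1-4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
hub_spread = mean_outbreak_size(star, 0, beta=0.5)
leaf_spread = mean_outbreak_size(star, 1, beta=0.5)
```

Comparing the averaged outbreak sizes for the top-ranked node of each measure is how the spreading ability of different measures can be contrasted; in this toy example the hub is expected to produce a larger average outbreak than a leaf.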