Abstract

Identifying influential spreaders in complex networks is crucial for containing virus spread, accelerating information diffusion, and promoting new products. In this paper, inspired by the effect of leaders on social ties, we propose the most influential neighbors’ $k$-shell (MINK) index. It is defined as the weighted sum of the products between the $k$-shell value of a node itself and the $k$-shell values of the nodes with the maximum $k$-shell value. We apply the classical Susceptible-Infected-Recovered (SIR) model to verify the performance of our method. Experimental results on both real and artificial networks show that the proposed method quantifies node influence more accurately than degree centrality, betweenness centrality, closeness centrality, and the $k$-shell decomposition method.

1. Introduction

Identifying influential spreaders is of theoretical and practical significance for understanding the dynamics of spreading in complex networks. It is conducive to containing epidemic spread, studying information dissemination, and controlling virus diffusion [1–14]. To this end, researchers have developed various methods, such as degree centrality (DC) [15], betweenness centrality (BC) [16], closeness centrality (CC) [17], and semilocal centrality [18], to identify the most influential spreaders in a network.

Kitsak et al. [1] argue that the most efficient spreaders are located at the core of a network. Using $k$-shell decomposition (KS) analysis [19–21], the location of each node is described by an integer index, its $k$-shell value, according to the successive layers of the network. A small $k$-shell value corresponds to the periphery of the network, whereas a large $k$-shell value identifies the network core, where the most influential spreaders reside. However, many nodes with an identical $k$-shell value have different spreading influence; in other words, the $k$-shell method has relatively poor monotonicity. To address this issue, many methods have been proposed to improve on the $k$-shell method. For example, Zeng et al. designed a mixed degree decomposition method [22] that ranks spreaders according to their links to both the remaining nodes and the removed nodes. Bae et al. [23] defined a coreness centrality index by summing the $k$-shell values of all neighbors and found that it provides a more monotonic ranking list than other ranking methods. Ma et al. [24] proposed a gravity model that treats the $k$-shell value of each node as its mass and the shortest-path distance between two nodes as their distance.

In this paper, we propose a novel influence measure, the most influential neighbors’ $k$-shell (MINK) method, to quantify the spreading capability of a node. Here, we refer to the nodes with the largest $k$-shell value as the most influential neighbors. Inspired by the effect of leaders on social ties, we evaluate a spreader’s influence by focusing on its interaction with the most influential neighbors. Because these neighbors are defined by the $k$-shell method, our proposal also accounts for a node’s position within the network as a whole. It is worth noting that, unlike the gravity model, which considers a node’s interactions with neighbors within a given distance $r$, MINK does not depend on any subjectively chosen parameter. According to structural holes theory [25], structural equivalence occurs among a node’s neighbors in networks that lack structural holes (“structural holes” are gaps between unconnected nodes that create opportunities for unique information access and control); that is, the neighbors tend to have strong ties with each other and bring redundant information to the node. The MINK index originates from the idea that an influential neighbor generally has more unique information and therefore exerts more influence on a node’s spreading capacity than other neighbors. By excluding the repeated information from less influential neighbors, the MINK index is more refined and less costly to compute.

The rest of this paper is organized as follows. In Section 2, we briefly review previous studies and present our method. In Section 3, we apply the Susceptible-Infected-Recovered (SIR) model to evaluate the performance of the proposed method on both real and synthetic networks. Conclusions are drawn in Section 4.

2. The MINK Index

Consider a network $G = (V, E)$ with $n = |V|$ nodes and $|E|$ edges. It can be described by an adjacency matrix $A = (a_{ij})_{n \times n}$, where $a_{ij} = 1$ if node $i$ is connected to node $j$, and $a_{ij} = 0$ otherwise. For simplicity, $G$ is viewed as an undirected, unweighted, and simple network.

The degree centrality, based on local information, defines the influence of a node as the number of its adjacent vertices. The degree of node $i$ can be expressed as follows:

$$DC(i) = \sum_{j=1}^{n} a_{ij}.$$

The betweenness centrality measures the fraction of shortest paths between node pairs that pass through the considered node $i$. It can be described as

$$BC(i) = \sum_{s \neq i \neq t} \frac{g_{st}(i)}{g_{st}},$$

where $g_{st}$ denotes the total number of shortest paths between vertex $s$ and vertex $t$, and $g_{st}(i)$ is the number of those shortest paths that pass through vertex $i$. The higher the BC score of a node, the more likely it is a hub vertex, that is, an information transfer station in the network.
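Both definitions can be computed with a short script. The following is a minimal sketch, not the authors’ implementation: it assumes the network is stored as a Python dict mapping each node to a list of its neighbors (equivalent to the adjacency matrix $A$), and it computes $BC(i)$ with Brandes’ algorithm, counting each unordered pair $\{s, t\}$ once. The function names are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """DC(i) = sum_j a_ij, i.e. the number of neighbors of node i."""
    return {i: len(nbrs) for i, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized BC(i) = sum over s != i != t of g_st(i) / g_st.

    Uses Brandes' algorithm: one BFS per source s, then back-propagation
    of pair dependencies. Sums over ordered pairs are halved so that each
    unordered pair {s, t} of the undirected graph is counted once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                    # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}  # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths (g_sv)
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies from the farthest nodes inward
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}


path = {0: [1], 1: [0, 2], 2: [1]}
print(degree_centrality(path))       # {0: 1, 1: 2, 2: 1}
print(betweenness_centrality(path))  # {0: 0.0, 1: 1.0, 2: 0.0}
```

On the path 0-1-2, the middle node lies on the only shortest path between the two end nodes, so its betweenness is 1.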

The closeness centrality measures how quickly the information held by a node can propagate through the network. The closeness centrality of node $i$ is defined as the reciprocal of the average shortest-path distance from $i$ to all the other nodes:

$$CC(i) = \frac{n - 1}{\sum_{j \neq i} d_{ij}},$$

where $n$ is the number of nodes and $d_{ij}$ is the geodesic distance between vertex $i$ and vertex $j$.
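A minimal sketch of $CC(i)$, again assuming a hypothetical dict-of-neighbor-lists representation of the network. Because the network is unweighted, breadth-first search gives every $d_{ij}$. The sketch assumes a connected graph with at least two nodes and raises an error otherwise, since some $d_{ij}$ would be undefined.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """CC(i) = (n - 1) / sum_{j != i} d_ij for a connected graph."""
    n = len(adj)
    cc = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) != n:
            raise ValueError("graph is disconnected: some d_ij is undefined")
        cc[i] = (n - 1) / sum(dist.values())  # d_ii = 0 adds nothing
    return cc


print(closeness_centrality({0: [1], 1: [0, 2], 2: [1]}))
```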

Bao et al. [18] put forward a semilocal centrality that simultaneously incorporates the effects of shortest-path distance, the number of shortest paths, and the transmission rate. Its definition involves the number of shortest paths between nodes $i$ and $j$, the average degree $\langle k \rangle$ of the network, and the neighborhood set of node $i$, that is, the set of nodes whose distance to $i$ is less than or equal to a coverage radius; the specific value of the radius is set in [18].

The $k$-shell decomposition method [1] assigns each node a $k$-shell value by iteratively removing nodes as follows. First, we remove all nodes with degree $k = 1$ and keep removing nodes until no node with $k \le 1$ remains in the network. All removed nodes are assigned $k_s = 1$. Second, we iteratively remove all nodes with degree $k = 2$ until no node with $k \le 2$ remains in the network. All of these removed nodes are assigned $k_s = 2$. We repeat this process until all nodes are removed and assigned a $k$-shell value. In the end, each node is characterized by its KS index according to its relative topological location in the network.
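The pruning procedure above can be sketched as follows. This is an illustrative implementation under the same hypothetical dict-of-neighbor-lists assumption; "degree" here means the residual degree after earlier removals. It starts at $k = 0$ so that any isolated node receives $k_s = 0$, which does not affect connected networks.

```python
def k_shell(adj):
    """Return {node: k_s} by iteratively pruning nodes of residual degree <= k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}  # residual degrees
    remaining = set(adj)
    ks = {}
    k = 0
    while remaining:
        frontier = [v for v in remaining if degree[v] <= k]
        if not frontier:
            k += 1  # nothing left to prune at this level: move to the next shell
            continue
        while frontier:
            v = frontier.pop()
            if v not in remaining:  # a node may be queued more than once
                continue
            remaining.remove(v)
            ks[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:  # removal may expose new nodes in this shell
                        frontier.append(w)
    return ks


# A triangle {0, 1, 2} with a pendant node 3 attached to node 0.
print(k_shell({0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}))
```

The pendant node is pruned in the first shell, after which every triangle node has residual degree 2, so all three fall into the second shell.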

However, with $k$-shell decomposition, too many nodes with different spreading influence are assigned the same KS index. To improve the monotonicity of $k$-shell decomposition, we propose the most influential neighbors’ $k$-shell (MINK) index, which is inspired by the effect of leaders on social ties. Specifically, leaders in social networks share information, provide advice, assign work, and collaborate with other members of the network, so the influence of the other members is largely determined by their connection to the leaders. Therefore, we measure the spreading ability of a node based on its interaction with its most influential neighbors, that is, the nodes with the largest $k$-shell value. On the one hand, a node has greater influence if it or its most influential neighbors have higher KS values. On the other hand, this effect increases as the distance between them shortens. Accordingly, the influence of node $i$ is measured as a weighted sum, over all nodes $j \in S$, of the products $k_s(i)\,k_s(j)$, with weights that decrease as $d_{ij}$ grows, where $d_{ij}$ is the shortest-path distance between node $i$ and node $j$ and $S$ is the set of spreaders with the maximum $k$-shell value.
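To make the structure of the score concrete, here is a hypothetical sketch. It is not a reproduction of the paper’s equation, whose exact distance weighting is not preserved in this text. The sketch assumes a weight of $1/d_{ij}$, exposed as a replaceable `weight` parameter, and skips $j = i$ because $d_{ii} = 0$. The $k$-shell values are passed in precomputed, and the graph must be connected, as the method itself requires.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def mink(adj, ks, weight=lambda d: 1.0 / d):
    """Hypothetical MINK-style score.

    score(i) = sum over j in S, j != i, of weight(d_ij) * ks[i] * ks[j],
    where S is the set of nodes with the maximum k-shell value.
    ASSUMPTION: weight(d) = 1/d by default; the paper's exact form may differ.
    """
    ks_max = max(ks.values())
    top = [j for j in adj if ks[j] == ks_max]  # the set S
    scores = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        total = 0.0
        for j in top:
            if j == i:
                continue  # d_ii = 0, so the self term is skipped
            if j not in dist:
                raise ValueError("graph is disconnected: d_ij is undefined")
            total += weight(dist[j]) * ks[i] * ks[j]
        scores[i] = total
    return scores


# Triangle {0, 1, 2} (k_s = 2) with pendant node 3 (k_s = 1) attached to node 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(mink(adj, {0: 2, 1: 2, 2: 2, 3: 1}))
```

Keeping `weight` as a parameter makes it easy to swap in another decreasing function of distance (for example $1/d_{ij}^2$, as in the gravity model) once the exact MINK form is confirmed against the original equation.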

3. Empirical Results

The Susceptible-Infected-Recovered (SIR) model [26] is a standard model for simulating epidemic spreading. It is widely used to evaluate the spreading capacity of nodes and has also been adopted in studies of vaccination strategies and infection control [27–29]. The rationale is that key nodes play an indispensable role in information and viral transmission, so an effective ranking should be consistent with the actual spreading coverage observed in simulation.

We therefore employ the standard SIR model to evaluate the performance of the proposed method. Initially, one node is set as infected and all remaining nodes are susceptible. At each step, every infected node infects each of its susceptible neighbors with probability $\beta$ (the spreading rate) and then recovers with probability $\gamma$. The process continues until all infected nodes have recovered and no infected nodes remain in the network. The spreading influence of the initial node is measured by the number of nodes that have been infected by the end of the process. We use a relatively small infection probability $\beta$ to avoid the situation in which most nodes of the network are easily infected, which would make the differences in node influence undetectable.
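The simulation loop above can be sketched as a synchronous discrete-time process. This is an illustrative sketch, not the authors’ simulator. It assumes the hypothetical dict-of-neighbor-lists representation, counts recovered nodes at the end as the outbreak size, and requires $\gamma > 0$ so that every run terminates. Because a single run is noisy, a node’s influence is usually estimated by averaging many runs; `average_spread` and its `runs` and `random_seed` parameters are hypothetical names.

```python
import random


def sir_spread(adj, seed_node, beta, gamma, rng=None):
    """Final outbreak size of one SIR run started from seed_node."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must be in (0, 1] for the process to end")
    if rng is None:
        rng = random.Random()
    state = {v: "S" for v in adj}
    state[seed_node] = "I"
    infected = {seed_node}
    while infected:
        newly_infected = set()
        for v in infected:  # each infected node tries each susceptible neighbor once
            for w in adj[v]:
                if state[w] == "S" and rng.random() < beta:
                    state[w] = "I"
                    newly_infected.add(w)
        still_infected = set()
        for v in infected:  # then the previously infected nodes may recover
            if rng.random() < gamma:
                state[v] = "R"
            else:
                still_infected.add(v)
        infected = still_infected | newly_infected
    return sum(1 for s in state.values() if s == "R")


def average_spread(adj, seed_node, beta, gamma, runs=1000, random_seed=None):
    """Mean outbreak size over independent runs (reproducible if seeded)."""
    rng = random.Random(random_seed)
    total = sum(sir_spread(adj, seed_node, beta, gamma, rng) for _ in range(runs))
    return total / runs


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(average_spread(path, 0, beta=0.3, gamma=1.0, runs=1000, random_seed=1))
```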

To check the performance of the proposed method, we use six real networks: Dolphins (friendship) [30], USAir97 (US air flight network), C.elegans (neural) [31], Email (communication) [32], PGP (an encrypted communication network) [33], and Internet (router level). For simplicity, we treat all of them as simple, undirected, and unweighted networks. Their statistical properties are listed in Table 1, including the numbers of nodes and edges, the degree heterogeneity, the degree assortativity, the clustering coefficient, and the average shortest path length.

Next, on these real networks, we compare the effectiveness of the proposed method with that of degree centrality, the $k$-shell method, betweenness centrality, and closeness centrality, examining both the resolution and the correctness of each ranking method.

First, following [24], we quantify the resolution of a ranking method with the monotonicity index

$$M(R) = \left(1 - \frac{\sum_{r \in R} n_r (n_r - 1)}{n (n - 1)}\right)^2,$$

where $R$ is the ranking produced by the method, $n$ is the size of the network, and $n_r$ is the number of nodes sharing the same rank $r$. By definition, a ranking method whose monotonicity index is closer to 1 has a higher resolution for distinguishing the influence of different nodes. If $M(R) = 1$, the ranking method is perfectly monotonic, and each node receives a different index value. The monotonicity indexes of the different ranking methods are summarized in Table 2. The results suggest that the proposed method achieves higher resolution than degree centrality, the $k$-shell method, and betweenness centrality in all six real networks, and its $M(R)$ is close to 1 in C.elegans, Dolphins, and Internet. In addition, we find that although the $k$-shell method may identify the most influential spreaders, its resolution is relatively low in all six networks, implying that it cannot distinguish spreaders with different influence. This shows the need for alternative methods that overcome this disadvantage of the $k$-shell method.
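The monotonicity index can be computed directly from a score vector. The sketch below assumes scores are given as a dict from node to value (a hypothetical representation) and treats two nodes as tied only when their scores are exactly equal, so floating-point scores may need rounding first. It needs at least two nodes.

```python
from collections import Counter


def monotonicity(scores):
    """M(R) = (1 - sum_r n_r (n_r - 1) / (n (n - 1)))^2 for {node: score}."""
    n = len(scores)
    ties = Counter(scores.values())  # n_r for every distinct score r
    tied_pairs = sum(c * (c - 1) for c in ties.values())
    return (1 - tied_pairs / (n * (n - 1))) ** 2


print(monotonicity({0: 1, 1: 2, 2: 3}))        # 1.0: every node distinct
print(monotonicity({0: 5, 1: 5, 2: 5}))        # 0.0: every node tied
print(monotonicity({0: 1, 1: 1, 2: 2, 3: 3}))  # (1 - 2/12)^2
```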

Second, Kendall’s tau rank correlation coefficient [34] is used to quantify the correctness of the ranking methods. Let $(x_i, y_i)$ and $(x_j, y_j)$ be a pair of joint observations taken from ranking lists $X$ and $Y$. The pair is concordant if both $x_i > x_j$ and $y_i > y_j$, or both $x_i < x_j$ and $y_i < y_j$. It is discordant if $x_i > x_j$ and $y_i < y_j$, or $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant. Kendall’s tau is defined as

$$\tau = \frac{2 (n_c - n_d)}{n (n - 1)},$$

where $n_c$ is the number of concordant pairs, $n_d$ is the number of discordant pairs, and $n$ is the size of the network. Kendall’s tau lies in $[-1, 1]$, and larger values indicate a stronger correlation between the SIR spreading results and the ranking produced by the compared method. Because Kendall’s tau depends on the infection rate, we vary the infection rate $\beta$ over a range of values to obtain Kendall’s tau under different infection rates. Note that the infection rate cannot be too large: with a large $\beta$, the whole network is easily infected, so the influence of different nodes cannot be distinguished. The average values of Kendall’s tau over this range of $\beta$ for the different ranking methods are summarized in Table 3. The results indicate that the proposed method generally outperforms existing methods and is particularly effective in USAir97 and C.elegans.
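The pair-counting definition of $\tau$ translates into a short $O(n^2)$ function. In this sketch, `x` and `y` are assumed to be dicts keyed by node, for example a centrality score and the averaged SIR outbreak size. Tied pairs count as neither concordant nor discordant, exactly as defined above. This is the plain tau from the definition, not the tie-corrected tau-b that some statistics libraries report by default.

```python
from itertools import combinations


def kendall_tau(x, y):
    """tau = 2 (n_c - n_d) / (n (n - 1)); pairs tied in x or y count as neither."""
    nodes = list(x)
    n = len(nodes)
    n_c = n_d = 0
    for i, j in combinations(nodes, 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            n_c += 1  # both orderings agree
        elif s < 0:
            n_d += 1  # the orderings disagree
    return 2 * (n_c - n_d) / (n * (n - 1))


print(kendall_tau({0: 1, 1: 2, 2: 3}, {0: 10, 1: 20, 2: 30}))  # 1.0
print(kendall_tau({0: 1, 1: 2, 2: 3}, {0: 3, 1: 2, 2: 1}))     # -1.0
```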

Figure 1 shows how Kendall’s tau varies with the infection rate for the different methods. In most cases, the proposed method performs better than the other methods. As the infection rate increases, Kendall’s tau of the $k$-shell method generally follows the same trend as that of our method but remains lower, which implies that our method yields higher correctness than the $k$-shell method.

Besides real networks, we also check the effectiveness of our method on a typical synthetic network generated by the Barabási-Albert (BA) model [35]. The BA network starts from a network with $m_0$ nodes. Then, at each step, a new node is added to the network and connected to $m$ existing nodes according to the preferential attachment mechanism.
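The growth process above can be sketched as follows. The text does not specify the initial network, so this sketch assumes the $m_0$ initial nodes form a complete graph. It samples attachment targets from a list in which each node appears once per incident edge, which makes selection probability proportional to degree. The function name and parameter values are hypothetical.

```python
import random


def barabasi_albert(n, m0, m, seed=None):
    """Grow a BA network of n nodes as {node: set(neighbors)}.

    ASSUMPTION: the m0 initial nodes form a complete graph (requires m0 >= 2).
    Each new node links to m distinct existing nodes chosen with
    probability proportional to their current degree.
    """
    if not (2 <= m0 and 1 <= m <= m0 < n):
        raise ValueError("need 2 <= m0, 1 <= m <= m0 < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(m0)}
    endpoints = []  # each node appears once per incident edge
    for u in range(m0):
        for v in range(u + 1, m0):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:  # draw until m distinct targets are found
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


g = barabasi_albert(100, 3, 2, seed=42)
print(sum(len(nbrs) for nbrs in g.values()) // 2)  # 3 + 97 * 2 = 197 edges
```

With $m_0 = m + 1$, every node has degree at least $m$ and each node has at most $m$ neighbors that joined before it, so the whole network is its own $m$-core and every node gets $k_s = m$. This is consistent with the observation that the $k$-shell method cannot separate nodes in a BA network.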

For the BA network, we calculate Kendall’s tau for DC, CC, BC, and the proposed method. Figure 2 shows that MINK performs better than CC and much better than DC and BC. Note that all nodes in the BA network are assigned the same $k$-shell value, so the $k$-shell method is not included in the comparison. The average tau values of the different methods are listed in Table 4, and the results indicate that our method outperforms the compared methods.

4. Conclusion

In this paper, we propose the MINK index, which measures the spreading ability of nodes in complex networks using the neighbors with the largest $k$-shell values. Our method is based on the premise that a node’s spreading ability is proportional to the $k$-shell values of itself and its most influential neighbors and decreases with the distances between the node and these neighbors. Using six real networks and a synthetic BA network, we compare our method with degree centrality, betweenness centrality, closeness centrality, and the $k$-shell decomposition method. The empirical results suggest that our method produces a more monotonic ranking than degree centrality, the $k$-shell method, and betweenness centrality in all six real networks. Moreover, in most cases, the ranking produced by our method is more highly correlated with the epidemic spreading range than those of other well-known methods.

Our method has some limitations. First, we only investigated its performance on a few typical networks, using the classical SIR model to mimic the epidemic spreading process. In practice, network structures and spreading dynamics can differ, so the effectiveness of the method needs to be tested more broadly. Second, the MINK index is weighted by the distance between a node and its most influential neighbors, and this distance is undefined when the nodes are not connected. Therefore, our method is not appropriate for identifying spreaders’ influence in a disconnected network.

Data Availability

The Dolphins and Internet network data used to support the findings of this study are available from Mark Newman’s network data repository (http://www-personal.umich.edu/~mejn/netdata/). The C.elegans, Email, and PGP network data used to support the findings of this study are available from Alex Arenas’ data sets (http://deim.urv.cat/~alexandre.arenas/data/welcome.htm). The USAir97 network data used to support the findings of this study are available from the Pajek datasets of Vladimir Batagelj and Andrej Mrvar (2006) (http://vlado.fmf.uni-lj.si/pub/networks/data/).

Disclosure

Any errors in this work are our own; the funders bear no responsibility for them.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful to Tian Bian for kindly providing help. Financial support from the National Natural Science Foundation of China (no. 71771154 and no. 71702109), the Natural Science Foundation of Guangdong Province (no. 2017A030310304 and no. 2017A030310566), and the Research Foundation of Shenzhen University (no. CCSEZR1810) is also gratefully acknowledged.