Abstract

Influential spreader identification is a vital research area in complex network theory with important practical applications. Existing methods each have their own advantages and disadvantages, and new methods continue to be proposed. In this paper, we propose a new centrality for influential spreader identification based on network connectivity and efficiency (CEC). The consequences of spreader deletion can generally be divided into two parts: the connectivity of the network topology is destroyed, and the network’s performance is degraded, which makes the network unable to meet its functional requirements. Therefore, the relative changes in the connectivity and efficiency of the network before and after removing spreaders are used to represent the influence of spreaders. We adopt the susceptible-infected (SI) model, a well-known infectious disease model, to verify the effectiveness of CEC by simulating the spreading ability of spreaders in real networks. The simulation results demonstrate the superiority of CEC.

1. Introduction

At present, complex networks are closely associated with real life, for example, networks [1, 2], traffic systems [3, 4], power grids [5, 6], and ecological networks [7, 8]. Influential spreader identification remains an open and vital research issue that has attracted increasing attention, because it helps us understand the structure of networks and control propagation processes. Some hazards caused by load propagation and cascading effects, for example, the North American blackout and the spread of WannaCry, often begin with a small portion of spreaders but spread rapidly to the entire network [9, 10]; this small portion of spreaders has a great impact on the network. Therefore, accurate quantification and identification of influential spreaders is very important. For instance, we can effectively suppress the spread of a virus and prevent a large-scale outbreak by vaccinating key individuals in an infectious disease network [11]. In power grids, we can effectively prevent cascading failures by taking precautions in advance for circuits in vital areas [12]. In social networks, such as MicroBlog and Twitter, we can control the dissemination of information to guide public discussion [13].

A variety of approaches have been proposed over the past few decades. Most of these approaches measure the influence of spreaders from the structural information of network.

Many methods have been proposed to find these key spreaders [14]. Degree centrality (DC) [15], one of the simplest and earliest methods, only counts the number of directly connected spreaders and therefore has low complexity. Closeness centrality (CC) [16] measures a spreader’s capability to affect others through the network, but it fails when applied to disconnected networks. Betweenness centrality (BC) [17] measures a spreader’s influence from the perspective of shortest paths. Besides these classical measures, some new methods have been proposed, such as H-index centrality [18] and evidence theory [19]. Roberts et al. [20] suggested a centrality which considered the fourth-level neighbors as a trade-off measure. However, these centralities ignore the connections among spreaders, so ClusterRank [21] was proposed to take the effect of the clustering coefficient into consideration. Moreover, Kitsak et al. [22] measured the influence of spreaders from the perspective of location and put forward a new method named K-shell decomposition (Ks), in which spreaders are removed layer by layer based on continuously updated DC values. The biggest problem of Ks is that its centrality values distinguish spreaders poorly, i.e., poor monotonicity. Several approaches were then put forward to solve this issue. Zeng and Zhang [23] came up with the MDD approach by considering the degree of both initial spreaders and removed spreaders. Bae and Kim [24] summed the Ks values of neighbors to measure the importance of spreaders. In addition, there are also iteration-based approaches such as PageRank [25], LeaderRank [26], and HITS [27]. Because each centrality reflects the influence of spreaders from only a limited perspective, some researchers have proposed multiattribute ranking approaches that combine several centralities to rank the influence of spreaders comprehensively. Liu et al. [28] proposed an improved Ks and used TOPSIS to fuse DC, CC, BC, and the improved K-shell decomposition. Yang et al. [29] combined DC, CC, and BC with the VIKOR method and adopted the entropy weighting method to obtain reasonable attribute weights. Wen and Deng proposed a local information dimensionality (LD) to rank key spreaders [30]. Wang et al. focused on the contribution of spreaders to network efficiency and proposed the EffC method to identify influential spreaders [31].

In this paper, we consider the importance of spreaders from a global information perspective and put forward a novel centrality called connectivity and efficiency centrality (CEC). The consequences of removing spreaders from a network can generally be divided into two aspects [32, 33]: the connectivity of the network topology is destroyed, and the performance of the network is degraded, which makes the network unable to meet its service requirements. Therefore, we consider the relative changes in the connectivity and efficiency of the network before and after removing spreaders, and their combination is taken as an indicator of the influence of spreaders. Note that removing a spreader also deletes the links connected to it. To assess the effectiveness of CEC, we adopt the susceptible-infected (SI) model [34] to measure the spreading ability of spreaders in real-world datasets, and we compare the performance of CEC with that of other centralities to verify its superiority.

2. Centralities

Given a network $G = (V, E)$, where $V$ and $E$, respectively, represent the set of spreaders and the set of edges, they meet $|V| = N$ and $|E| = M$. $A = (a_{ij})$ indicates the adjacency matrix; if spreader $i$ and spreader $j$ are connected by edge $e_{ij}$, $a_{ij} = 1$; otherwise, $a_{ij} = 0$.

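Degree centrality [15], one of the simplest and earliest local centralities, only counts the number of directly connected spreaders and therefore has low complexity. For spreader $i$, it can be written as $DC(i) = \sum_{j} a_{ij}$, where $a_{ij}$ is the corresponding entry of the adjacency matrix.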

Degree centrality indicates spreaders’ ability to communicate directly with others.
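As a minimal sketch, assuming the standard unnormalised form in which the degree of a spreader is the sum of its row in the adjacency matrix, degree centrality can be computed as follows. The function name and the toy matrix are hypothetical and only serve as an illustration.

```python
def degree_centrality(a):
    """Return the degree of every node, assuming DC(i) = sum_j a[i][j]."""
    return [sum(row) for row in a]


# Hypothetical 4-node undirected network: node 0 is linked to 1, 2 and 3,
# and nodes 1 and 2 are also linked to each other.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
dc = degree_centrality(A)
```

Some authors divide the result by the number of other spreaders to obtain a value between 0 and 1; the ranking is the same either way.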

Closeness centrality [16] considers the influence of spreaders based on the distance between them. It measures a spreader’s capability to affect others through the network, and it can be written as $CC(i) = \frac{N-1}{\sum_{j \neq i} d_{ij}}$, wherein $d_{ij}$ represents the shortest-path distance between spreader $i$ and spreader $j$ and $N$ is the number of spreaders. CC uses the average transmission time of information to determine the influence of a spreader.
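The following sketch illustrates closeness centrality under two assumptions that are not stated in the text: distances are hop counts obtained by breadth-first search, and the value is the number of other spreaders divided by the sum of distances to them. It returns None for a disconnected network, which mirrors the failure case mentioned in the introduction. All names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj, node):
    """Assumed form CC(i) = (N - 1) / sum_j d_ij; None if some node is unreachable."""
    dist = bfs_distances(adj, node)
    if len(dist) < len(adj):
        return None  # disconnected network: the distance sum is undefined
    total = sum(dist.values())
    return (len(adj) - 1) / total if total > 0 else 0.0


# Hypothetical path network 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cc = {v: closeness_centrality(path, v) for v in path}
```

In this toy path the two inner spreaders obtain a higher value than the two end spreaders, since they are closer on average to everyone else.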

Betweenness centrality [17] measures a spreader’s influence from the perspective of shortest paths. BC considers a spreader influential if it acts as a “bridge.” It can be written as $BC(k) = \sum_{i \neq j \neq k} \frac{g_{ij}(k)}{g_{ij}}$, wherein $g_{ij}$ represents the number of shortest paths between spreader $i$ and spreader $j$ and $g_{ij}(k)$ indicates the number of those shortest paths passing through spreader $k$. BC can reflect the degree of independence between spreaders.
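A small sketch of betweenness centrality is given below. It assumes an undirected, unweighted network, counts each unordered pair of spreaders once, and applies no normalisation; these choices are assumptions for illustration. It uses the fact that a shortest path between i and j passes through k exactly when the distances from i to k and from k to j add up to the distance from i to j. The function names are hypothetical, and this direct pair-by-pair version is meant for small networks only.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """BFS returning (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness_centrality(adj):
    """Sum over unordered pairs (i, j) of the share of shortest paths through k."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    bc = {k: 0.0 for k in adj}
    for i, j in combinations(adj, 2):
        dist_i, sigma_i = info[i]
        if j not in dist_i:
            continue  # i and j are not connected
        for k in adj:
            if k == i or k == j:
                continue
            dist_k, sigma_k = info[k]
            if k in dist_i and j in dist_k and dist_i[k] + dist_k[j] == dist_i[j]:
                bc[k] += sigma_i[k] * sigma_k[j] / sigma_i[j]
    return bc


# Hypothetical networks: a path 0 - 1 - 2 - 3 and a 4-cycle.
bc_path = betweenness_centrality({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}})
bc_cycle = betweenness_centrality({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}})
```

For larger networks, Brandes' accumulation scheme gives the same values far more efficiently.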

K-shell decomposition [22] measures the influence of spreaders from the perspective of location and is regarded as a milestone. Spreaders are removed layer by layer based on continuously updated DC values.
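The layer-by-layer removal can be sketched as follows. This is the usual pruning procedure for k-shell decomposition written under the assumption of an undirected, unweighted network; the function name and the toy network are hypothetical.

```python
def k_shell(adj):
    """Assign each node its k-shell index by repeatedly pruning low-degree nodes."""
    # Work on a copy so the caller's network is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    shell = {}
    k = 0
    while remaining:
        # The next shell index is the smallest degree left in the residual network.
        k = max(k, min(len(nbrs) for nbrs in remaining.values()))
        peel = [v for v, nbrs in remaining.items() if len(nbrs) <= k]
        while peel:
            for v in peel:
                shell[v] = k
                for u in remaining.pop(v):
                    if u in remaining:
                        remaining[u].discard(v)  # update the neighbour's degree
            # Removing nodes may push other nodes down to degree <= k.
            peel = [v for v, nbrs in remaining.items() if len(nbrs) <= k]
    return shell


# Hypothetical network: a triangle 0-1-2 with a chain 0-3-4 attached.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
ks = k_shell(toy)
```

In the toy network the three triangle spreaders share one Ks value and the two chain spreaders share another, which illustrates the poor monotonicity discussed in the introduction.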

3. The Proposed Centrality

We consider the influence of spreaders from a global information perspective. The influence of a spreader can be measured by the relative changes of some global characteristic parameters of the network before and after removing that spreader. The consequences of spreader deletion can generally be divided into two parts: the connectivity of the network topology is destroyed, and the network efficiency is degraded, which makes the network unable to meet its service requirements. Both aspects should be taken into consideration to give comprehensive identification results.

Definition 1. The network connectivity $C(G)$ represents the average ability of the network to maintain connectivity, which is indicated as the ratio of the number of connected spreader pairs to the total number of spreader pairs in the network: $$C(G) = \frac{1}{N(N-1)} \sum_{i \neq j} c_{ij},$$ wherein $N$ is the number of spreaders and $c_{ij}$ represents the connection parameter from spreader $i$ to spreader $j$; if they have a connected path, including a directly connected path and an indirectly connected path, then $c_{ij} = 1$; otherwise, $c_{ij} = 0$.
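A minimal sketch of Definition 1 follows, reading it as the fraction of ordered spreader pairs that are joined by some path (the normalisation by the total number of ordered pairs is an assumption). Instead of testing every pair, it finds the connected components and counts the pairs inside each one. The function name and test networks are hypothetical.

```python
def network_connectivity(adj):
    """Fraction of ordered node pairs (i, j), i != j, joined by a path."""
    n = len(adj)
    if n < 2:
        return 0.0
    seen = set()
    connected_pairs = 0
    for start in adj:
        if start in seen:
            continue
        # Depth-first search to measure the size of this component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        # Every ordered pair inside one component is connected.
        connected_pairs += size * (size - 1)
    return connected_pairs / (n * (n - 1))


# Hypothetical networks: a connected path and two separate edges.
c_path = network_connectivity({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}})
c_split = network_connectivity({0: {1}, 1: {0}, 2: {3}, 3: {2}})
```

A connected network scores 1, and the value drops as the network breaks into more or smaller pieces.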

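Definition 2. The residual network is denoted as $G_i$ after removing spreader $i$ from the original network $G$, and the relative change of network connectivity can be defined as $$\Delta C_i = \frac{C(G) - C(G_i)}{C(G)},$$ where $C(\cdot)$ denotes the network connectivity of Definition 1.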

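Definition 3. The network efficiency refers to the effectiveness of information transmission on the network. It is denoted as $$\mathrm{Eff}(G) = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}},$$ wherein $N$ is the number of spreaders and $d_{ij}$ refers to the shortest distance between spreader $i$ and spreader $j$. Note that if spreader $i$ and spreader $j$ have no connected path, $d_{ij} = \infty$ and $1/d_{ij} = 0$.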
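The sketch below illustrates Definition 3 under the assumption that efficiency is the mean of the inverse hop-count distance over all ordered spreader pairs, with unreachable pairs contributing zero. The function name and toy networks are hypothetical.

```python
from collections import deque


def network_efficiency(adj):
    """Mean of 1 / d_ij over ordered pairs; unreachable pairs contribute 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        # Breadth-first search gives hop distances from this source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Nodes missing from dist are unreachable and add nothing.
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


# Hypothetical networks: a triangle and a 3-node path.
e_triangle = network_efficiency({0: {1, 2}, 1: {0, 2}, 2: {0, 1}})
e_path = network_efficiency({0: {1}, 1: {0, 2}, 2: {1}})
```

The triangle reaches the maximum value because every pair is directly linked, while the path scores lower because its two end spreaders are two hops apart.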

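Definition 4. The residual network is denoted as $G_i$ after removing spreader $i$ from the original network $G$, and the relative change of network efficiency can be written as $$\Delta \mathrm{Eff}_i = \frac{\mathrm{Eff}(G) - \mathrm{Eff}(G_i)}{\mathrm{Eff}(G)},$$ where $\mathrm{Eff}(\cdot)$ denotes the network efficiency of Definition 3.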
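The removal step in Definitions 2 and 4 can be sketched as follows. The sketch assumes that a relative change is the value before removal minus the value after removal, divided by the value before removal, and that the residual network is normalised by its own number of spreader pairs; both are assumptions. It stops at the two relative changes and leaves their combination into CEC to Definition 5. All names are hypothetical.

```python
from collections import deque


def connectivity_and_efficiency(adj):
    """Return (C, Eff): fraction of connected ordered pairs and mean inverse distance."""
    n = len(adj)
    if n < 2:
        return 0.0, 0.0
    pairs = 0
    inv_sum = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        pairs += len(dist) - 1  # nodes reachable from source, excluding itself
        inv_sum += sum(1.0 / d for d in dist.values() if d > 0)
    norm = n * (n - 1)
    return pairs / norm, inv_sum / norm


def remove_node(adj, node):
    """Residual network: drop the node and every link attached to it."""
    return {u: {v for v in nbrs if v != node} for u, nbrs in adj.items() if u != node}


def relative_changes(adj):
    """Map each node to (relative connectivity change, relative efficiency change)."""
    c0, e0 = connectivity_and_efficiency(adj)
    if c0 == 0 or e0 == 0:
        raise ValueError("the network has no connected pair")
    changes = {}
    for node in adj:
        c1, e1 = connectivity_and_efficiency(remove_node(adj, node))
        changes[node] = ((c0 - c1) / c0, (e0 - e1) / e0)
    return changes


# Hypothetical star network: centre 0 with leaves 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
changes = relative_changes(star)
```

Removing the centre of the star destroys all connectivity and efficiency, so both relative changes equal 1. Removing a leaf leaves connectivity unchanged and, under the assumed per-network normalisation, gives a slightly negative efficiency change because the remaining spreaders are closer together on average.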

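Definition 5. The proposed connectivity and efficiency centrality (CEC) of spreader $i$, denoted as $CEC(i)$, combines the relative change of network connectivity from Definition 2 and the relative change of network efficiency from Definition 4. The greater the value of $CEC(i)$, the more influential the spreader $i$.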

4. Simulation and Analysis

4.1. Datasets

We choose four actual networks to conduct experiments and simulations, which cover multiple fields and network scales. (i) Karate club [35]: it is a widely used dataset describing the relationship between karate club members. (ii) Jazz musicians [36]: it is a social dataset describing the cooperative relationship between jazz musicians. (iii) USAir97 [37]: it is a transportation dataset representing the airline relationship of American airports in 1997. (iv) Email: it describes the email exchange in a university.

4.2. Experiment and Analysis
4.2.1. Experiment 1: Comparison of Top 10 Spreaders Ranked by Different Centralities

The influence of each spreader in the network is calculated using CEC and the classical centralities. The actual spreading ability I(t) calculated by the SI model is used as the benchmark; the definition of I(t) is introduced later. We focus on the top 10 spreaders ranked by each centrality. As shown in Table 1, in the karate club network, CEC and CC give the best identification results, since all 10 of their top spreaders match those of I(t); DC and BC share 9 spreaders with I(t), while Ks shares 8. In the Jazz musician network, shown in Table 2, DC and CC each share 5 spreaders with the top 10 list of I(t), while Ks shares only 2. CEC shares 7 spreaders with I(t), which is slightly worse than BC. Besides, the top 2 spreaders of CEC are the same as those of I(t). In the USAir97 network (Table 3), DC and CC both share 6 spreaders with I(t); CEC, BC, and Ks share 9, 7, and 3, respectively. In the email network, depicted in Table 4, CEC, CC, and BC share 6 spreaders with I(t), which is slightly more than DC, while Ks and I(t) have no spreader in common. Overall, CEC produces the ranking most similar to the actual ranking results; that is, CEC can identify spreaders more accurately.
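The counts of shared spreaders reported above amount to the size of the intersection of two top-k lists. A minimal sketch, with hypothetical names and made-up scores, and with ties broken by insertion order (an assumption):

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring nodes."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def top_k_overlap(scores, benchmark, k=10):
    """Number of nodes that appear in the top-k lists of both rankings."""
    return len(top_k(scores, k) & top_k(benchmark, k))


# Hypothetical centrality values and benchmark spreading abilities.
centrality = {"a": 5, "b": 4, "c": 3, "d": 2}
actual = {"a": 9, "c": 8, "d": 7, "b": 1}
shared_top2 = top_k_overlap(centrality, actual, k=2)
shared_top3 = top_k_overlap(centrality, actual, k=3)
```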

4.2.2. Experiment 2: Comparison of Capability of Different Centralities to Distinguish Spreaders’ Spreading Ability

When ranking the influence of spreaders, we find that some spreaders have the same centrality value, so it is impossible to distinguish them. This phenomenon reduces the accuracy of a centrality. We use the frequency of spreaders with the same rank as an index to assess the distinguishing capability: the lower the frequency, the better the method. The experimental results of the different centralities are shown in Figure 1. In all four networks, CEC has the lowest frequency; that is to say, CEC performs best in distinguishing spreaders’ spreading ability. In contrast, the frequencies of DC and Ks are greater than those of the other methods. The experimental results indicate the superiority of our method in distinguishing spreading ability.
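One possible reading of this index, offered only as an assumption since the text does not give a formula, is to count how many spreaders share each centrality value. The sketch below reports the shared values and the fraction of spreaders that cannot be told apart from at least one other spreader. The names and scores are hypothetical.

```python
from collections import Counter


def tie_frequency(scores):
    """Map each centrality value held by several nodes to the number of nodes holding it."""
    counts = Counter(scores.values())
    return {value: count for value, count in counts.items() if count > 1}


def tied_fraction(scores):
    """Fraction of nodes that share their centrality value with another node."""
    return sum(tie_frequency(scores).values()) / len(scores)


# Hypothetical centrality values: three nodes share the value 2.
values = {"a": 3, "b": 2, "c": 2, "d": 1, "e": 2}
ties = tie_frequency(values)
fraction = tied_fraction(values)
```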

4.2.3. Experiment 3: Comparison of the Average Spreading Ability of Top 10 Spreaders

We conduct transmission simulations with the SI model [34] to examine the spreading ability of spreaders. We take spreader $i$ as the source spreader, and the spreading process starts from it. The total number of infected spreaders reaches $n_i(t)$ after $t$ time steps. Then, the spreading ability, denoted as $I_i(t)$, is expressed as the ratio of infected spreaders to network size, $I_i(t) = n_i(t)/N$, where $N$ is the number of spreaders in the network. The average spreading ability of the top 10 spreaders is the mean of $I_i(t)$ over those ten spreaders. The simulation results under the chosen parameter setting are presented in Figure 2.
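The SI process described above can be sketched as a simple discrete-time simulation. The sketch assumes that, at every time step, each infected spreader independently infects each susceptible neighbour with probability beta, and that the spreading ability is averaged over repeated runs. The parameter names beta, steps, runs, and seed, and their values, are hypothetical and are not the settings used in the experiments.

```python
import random


def si_spread(adj, source, beta, steps, rng):
    """One SI realisation; returns the infected fraction after each time step."""
    infected = {source}
    fractions = []
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                # Each infected-susceptible contact succeeds with probability beta.
                if v not in infected and rng.random() < beta:
                    newly_infected.add(v)
        infected |= newly_infected  # infected nodes never recover in the SI model
        fractions.append(len(infected) / len(adj))
    return fractions


def spreading_ability(adj, source, beta, steps, runs=100, seed=0):
    """Average infected fraction per time step over several independent runs."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        for t, fraction in enumerate(si_spread(adj, source, beta, steps, rng)):
            totals[t] += fraction
    return [total / runs for total in totals]


# Hypothetical path network 0 - 1 - 2 - 3 with node 0 as the source.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
certain = spreading_ability(path, 0, beta=1.0, steps=3, runs=5)
blocked = spreading_ability(path, 0, beta=0.0, steps=3, runs=5)
sampled = spreading_ability(path, 0, beta=0.3, steps=5)
```

With beta equal to 1 the infection advances exactly one hop per step, and with beta equal to 0 only the source stays infected, which gives two easy sanity checks for the simulation.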

From Figure 2, the average spreading ability of the top 10 spreaders increases with $t$, and eventually almost the entire network is infected. In the karate club network, the black curve and the blue curve overlap; that is to say, the spreading ability of CEC is the same as that of CC, because their top 10 spreaders are the same. It is clear that the spreading ability of CEC is superior to that of DC, BC, and Ks. In the Jazz musician network, CEC infects more spreaders than the other methods, which demonstrates that the spreading ability of CEC is better than that of the other methods. In the USAir97 network, CEC is marginally better than CC, DC, and Ks, and BC is the poorest because its average number of infected spreaders is much lower than that of the others. In the email network, the number of infected spreaders at each step for CEC is marginally greater than for DC, CC, and BC.

4.2.4. Experiment 4: Comparison of the Correlation between Centralities and the Actual Ranking Result

We choose Kendall’s tau coefficient ($\tau$) [36] as a rank correlation coefficient between the five methods and the actual ranking result. The value of $\tau$ ranges over $[-1, 1]$; the larger the value of $\tau$, the more similar the two sequences are. Consider two sequences $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$. A pair of elements $(x_i, y_i)$ and $(x_j, y_j)$ is regarded as a positive sequence pair when $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$; otherwise, it is considered a negative sequence pair. Then, Kendall’s tau can be written as $$\tau = \frac{n_c - n_d}{n(n-1)/2},$$ where $n_c$ and $n_d$ indicate the number of positive sequence pairs and negative sequence pairs, respectively, and $n$ is the length of the sequences.
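A minimal sketch of this coefficient is given below, assuming the common form in which the difference between the numbers of positive and negative pairs is divided by the total number of pairs. Following the wording above, every pair that is not positive, including tied pairs, is counted as negative. The function name and sample sequences are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau with every non-positive pair counted as negative."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length of at least 2")
    positive = 0
    negative = 0
    for i, j in combinations(range(n), 2):
        # The product is positive exactly when both sequences order i and j alike.
        if (x[i] - x[j]) * (y[i] - y[j]) > 0:
            positive += 1
        else:
            negative += 1
    return (positive - negative) / (0.5 * n * (n - 1))


# Hypothetical score sequences for the same three spreaders.
tau_same = kendall_tau([1, 2, 3], [10, 20, 30])
tau_reversed = kendall_tau([1, 2, 3], [30, 20, 10])
tau_mixed = kendall_tau([1, 2, 3], [1, 3, 2])
```

Identical orderings give 1 and fully reversed orderings give -1, so values close to 1 indicate that a centrality ranks spreaders almost exactly as the simulation does.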

We take the ranking list obtained by the SI model as the actual ranking result and then calculate the correlation $\tau$ between this ranking and each centrality. As shown in Figure 3, CEC outperforms the other centralities before spreading probability 0.07 in the karate club network, and it is lower than CC after spreading probability 0.08. In the Jazz musician network, DC has the greatest value, although it performs very poorly in the email network, and the value of CEC is similar to that of CC. In the USAir97 network, CEC outperforms the other centralities across the whole range of spreading probability. In the email network, the value of CEC is lower than that of CC and Ks before spreading probability 0.04, and it is similar to that of Ks after spreading probability 0.04. Overall, CEC has the best correlation with the actual ranking result across the four networks.

5. Conclusion

Identifying influential spreaders is essential for network invulnerability. In this paper, we focus on identifying influential spreaders based on global information, and the connectivity and efficiency centrality (CEC) is put forward to achieve this goal. Removing spreaders and the corresponding links leads to two consequences: the destruction of network connectivity and the decline of network efficiency. Therefore, we consider both aspects to provide a novel centrality for identifying influential spreaders. The relative changes of network connectivity and efficiency before and after removing spreaders are taken as indicators of the influence of spreaders, and we combine them to give comprehensive identification results. The greater the relative changes, the more influential the spreader. We conduct several experiments based on real datasets, and the results show that CEC performs better than the other methods.

Data Availability

All data are available in the manuscript references, which can be accessed through PubMed, Google Scholar, and other web resources.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.