Identification of Influential Nodes via Effective Distance-based Centrality Mechanism in Complex Networks

Ullah, Aman; Wang, Bin; Sheng, Jinfang; Long, Jun; Khan, Nasrullah

doi:https://doi.org/10.1155/2021/8403738

Complexity

Research Article | Open Access

Volume 2021 | Article ID 8403738 | https://doi.org/10.1155/2021/8403738

Aman Ullah,¹ Bin Wang,¹ Jinfang Sheng,¹ Jun Long,¹,² and Nasrullah Khan³,⁴

Academic Editor: Lucia Valentina Gambuzza

Received 13 Jul 2020

Revised 07 Sept 2020

Accepted 25 Jan 2021

Published 09 Feb 2021

Abstract

Efficient identification of influential nodes is an essential problem in the field of complex networks, with considerable theoretical and practical significance. Many approaches have been developed and deployed in this area, although only a few rely on centrality measures, and those come with their own deficiencies and limitations. To address these issues, we propose a novel effective distance-based centrality (EDBC) algorithm for identifying influential nodes in complex networks. The EDBC algorithm combines the power of K-shell, node degree, effective distance, and several levels of neighbor influence (neighborhood potential). The performance of the proposed algorithm is evaluated on nine real-world networks, where a susceptible-infected-recovered (SIR) epidemic model is employed to examine the spreading dynamics of each node. Simulation results demonstrate that the proposed algorithm outperforms existing techniques such as eigenvector, betweenness, and closeness centralities, hyperlink-induced topic search, H-index, K-shell, PageRank, profit leader, and gravity by a considerable margin.

1. Introduction

In recent years, complex networks have become an attractive and active research area by virtue of their wide range of practical and theoretical applications in many major fields [1–5]. Systems in management science, chemistry, economics, and finance [6, 7], computer science, biological science [8, 9], and many similar fields can be regarded as complex networks [10, 11]. Finding the most important nodes helps to analyze the entire network efficiently; for example, rapidly detecting the most vital nodes in disease transmission can help control the spread of a disease, and identifying key nodes can support new marketing tools [12–15]. Commonly used approaches to identify influential nodes include closeness centrality (CC) [16], degree centrality (DC) [17], betweenness centrality (BC) [18], information centrality (IC) [19], load centrality (LC) [20], eigenvector centrality (EC) [21], PageRank (PR) [22], H-index [23], K-shell decomposition, and hyperlink-induced topic search (HITS) [24, 25]. Several of these algorithms are not precise, while the precise ones have comparatively high computational time complexity. For example, DC is a very simple method of finding influential nodes, but it considers only limited information and is therefore unsuitable in some cases. Similarly, BC and CC are not applicable to large networks because of their high computational time complexity. PR is based on global information; it works very well in directed networks but is not suitable for undirected ones [26]. The K-shell algorithm takes into account the core or periphery position of the nodes in the network. It is a straightforward index for finding influential nodes and, owing to its low complexity, can be applied to large-scale networks, although it cannot differentiate among key nodes in the same core layer [27–29].
All in all, the above approaches have low ranking accuracy and are ineffective for some networks. There are numerous ranking approaches for identifying influential or key nodes, but it is not easy to find all influential nodes. Two tools are frequently used to evaluate the ranking of important nodes: (i) the SIR model [30] and (ii) Kendall’s correlation coefficient [31]. In the SIR model, the top-k ranked nodes are chosen as seed nodes; if the seed nodes selected by one technique make the spread through the network faster, that technique is considered better than the others. Kendall’s correlation coefficient measures the agreement between two rankings. A positive coefficient means the two rankings are positively correlated, and a negative coefficient means they are negatively correlated. A coefficient of 1 means the rankings are identical, and -1 means they are completely opposite. A technique is considered more accurate when its Kendall’s correlation coefficient is higher. Motivated by the above discussion, we propose a new, efficient, and effective method, termed the EDBC algorithm, for identifying key or influential nodes in complex networks. EDBC combines effective distance, K-shell, node degree, and several levels of neighbor influence (neighborhood potential). We applied EDBC to nine real-world networks. The simulation results show that our algorithm performs better than conventional algorithms as well as recently developed ones such as gravity centrality and profit leader, which were developed in 2016 and 2018, respectively.

The core contributions of the proposed EDBC algorithm are summarized as follows.

(1) A new ranking centrality perspective: over the last two decades, several models and algorithms for identifying influential or key nodes have been developed, but the problem remains challenging. We propose a novel effective distance-based centrality algorithm that combines several structural features to rank the important nodes, and we evaluate it on unweighted networks.

(2) Accuracy: we compared the proposed algorithm with various ranking methods, and the experiments show that EDBC is comparatively more effective and efficient. In comprehensive experiments on nine real-world networks, EDBC outperformed existing algorithms such as BC, CC, EC, PR, HITS, H-index, K-shell, gravity (GR), and profit leader (PL).

(3) Parameter-free: the proposed EDBC neither depends on prior knowledge nor relies on parameter tuning, yet it identifies key nodes quickly and efficiently.

(4) Applicability: the proposed EDBC keeps the calculation cost low because it is based on two levels of neighbor nodes (nearest and next-nearest). It is therefore applicable to any type of network, i.e., directed or undirected.

The rest of this paper is organized as follows. Section 2 gives an overview of the related work. Section 3 describes the proposed EDBC algorithm, and Section 4 discusses its performance evaluation. Finally, Section 5 presents the conclusion and future recommendations.

2. Related Work

Detecting the most important nodes in complex networks has been a challenging issue for the past few decades. Two types of centrality measures can be used to identify key nodes: (i) matrix-based centralities, which use, e.g., the random walk matrix [32], Laplacian matrix [33], or adjacency matrix [34], and (ii) centralities based on superficial properties, such as CC, BC, DC, and motif centrality [35]. A vast number of studies have addressed the identification of key nodes, e.g., node removal, profit leader [36], and local neighbor contribution [3]; each has its own advantages and disadvantages. Bao et al. [37] proposed a heuristic clustering (HC) method for detecting the most influential nodes; HC uses a similarity index to categorize nodes into clusters, and the center nodes of the clusters are taken as multiple spreaders. Mo et al. [38] proposed an evidential method for influential node identification based on the Dempster–Shafer evidence theory. Bian and Yong [39] introduced a new evidential centrality (NEC) algorithm, which extends the evidential method. Tian et al. [40] applied the analytic hierarchy process (AHP), a multiple attribute decision-making (MADM) method; AHP is used to detect the important branch of every decision and choose the best nodes in the entire network. Zeng et al. [41] proposed a mixed degree decomposition algorithm for ranking nodes; however, this technique is limited by its high time complexity and by high-degree peripheral nodes. Lin et al. [42] improved the K-shell decomposition method by considering the shortest distance between a node and the set of nodes with the highest K-shell index. To improve the K-shell decomposition method further, Bae et al. [43] suggested the neighborhood coreness centrality (cn) and extended neighborhood coreness (cn+) algorithms.
In [44], the authors proposed a centrality method for influential node identification based on the gravity formula. In this method, mass is replaced by the K-shell value of each node, and distance is the shortest path length between two nodes. According to the authors, the gravity-based method gives better results than other existing methods. Liu et al. [45] proposed a weight degree centrality (WDC) method, which considers the degree of nodes, the neighborhoods of the nodes, and one tuning parameter. They compared WDC with algorithms such as BC, DC, K-shell decomposition, neighborhood coreness, and extended neighborhood coreness. Liu et al. [46] proposed a neighborhood centrality method for influential node identification, which considers several levels of neighbor influence based on benchmark centrality measures. In addition, some centrality indices use the potential of edges to enhance the effectiveness of ranking measures [47, 48]. Pei et al. [49] claimed that the K-shell decomposition algorithm performs well for finding key nodes in real networks. In [50], a bio-inspired centrality method was proposed that combines the K-shell index with Physarum centrality to identify key nodes. Gomez et al. [51] improved BC to take multiple dimensions into account explicitly. Furthermore, Zang et al. [52] proposed an advanced unbiased betweenness centrality method, which uses a reverse propagation algorithm to choose the key nodes in real networks. Numerous studies have thus sought to improve existing algorithms for influential node identification, and each has its own advantages and disadvantages. Still, efficient and effective identification of influential nodes remains a nontrivial challenge.
Inspired by [44, 53], we propose an efficient algorithm called EDBC, which effectively identifies highly influential nodes in complex networks of various scales.

3. EDBC Model

In this section, we introduce the basic concepts behind the proposed method, which considers node degree, effective distance, power of K-shell, and several levels of neighbor influence. The EDBC calculation is presented in Sections 3.1 and 3.2.

3.1. Preliminaries

Definition 1. (degree $k$). In graph theory, the degree of a node is the number of edges connected to that node. For a graph $G$, let $A = (a_{ij})_{N \times N}$ denote the adjacency matrix of $G$; the degree is then defined as follows:
$$k_i = \sum_{j=1}^{N} a_{ij}, \tag{1}$$
where $k_i$ represents the degree of node $i$, $i, j \in \{1, \dots, N\}$, and $a_{ij}$ indicates the element of $A$.

Definition 2. (power of K-shell $ks_p$). For two nodes $i$ and $j$ with K-shell values $ks_i$ and $ks_j$, we add $ks_i$ and $ks_j$ in equation (2) to measure the power of influence:
$$ks_p(i, j) = \sqrt{ks_i + ks_j}, \tag{2}$$
where $ks_i$ and $ks_j$ represent the K-shell of node $i$ and node $j$. Here, the square root normalizes the influence factor for the power of K-shell.

Definition 3. (effective distance $D_{j|i}$, see [54]). Effective distance is an essential parameter in network spreading processes, and we employ it in the EDBC algorithm. To calculate the distance between nodes, we do not use existing shortest-path algorithms such as Dijkstra and Bellman–Ford because their time complexity is very high. Instead, we use the effective distance between nodes, which can be calculated through the following formula:
$$D_{j|i} = 1 - \log_2 P_{j|i}, \tag{3}$$
where $P_{j|i}$ denotes the ratio of information flow from node $i$ to node $j$: $P_{j|i} = a_{ij} / k_i$. As can be seen from the formula, the effective distance is in general asymmetric, i.e., $D_{j|i} \neq D_{i|j}$ whenever $k_i \neq k_j$.

Definition 4. (influence $I$). Normally, when the neighbors of a node have high influence, the influence of the node itself increases. Besides, the influence of a node on its neighbors decreases as the distance between them increases. Inspired by the inverse square law, we compute the influence of node $i$ on node $j$ as follows:
$$I(i, j) = \frac{k_i \, ks_p(i, j)}{D_{j|i}^{2}}, \tag{4}$$
where $k_i$ is the degree of node $i$, $ks_p(i, j)$ is the power of K-shell from equation (2), and $D_{j|i}$ is the effective distance from equation (3). The influence of a node is then measured by the sum of its influence on all neighbor nodes:
$$EDBC(i) = \sum_{j \in \Gamma_i} I(i, j), \tag{5}$$
where $\Gamma_i$ is the set of neighbors of node $i$.
Algorithm 1 depicts the working mechanism of our proposed solution.

Input: G = (N, M)
(1)	for each node v in G = (N, M) do
(2)	  calculate the degree of v
(3)	  calculate the K-shell of v
(4)	  //Computing the interaction with the nearest neighbor nodes
(5)	  for each node u in G.neighbors of node v do
(6)	    evaluate the power of K-shell (v) using equation (2)
(7)	    evaluate the power of K-shell (u) using equation (2)
(8)	    evaluate ED(v,u) using equation (3)
(9)	    evaluate influence(v,u) using equation (4)
(10)	    //Computing the interaction with the next-nearest neighbor nodes
(11)	    for each node w in G.neighbors of node u do
(12)	      evaluate the power of K-shell (v) using equation (2)
(13)	      evaluate the power of K-shell (w) using equation (2)
(14)	      evaluate ED(v,w) using equation (3)
(15)	      evaluate influence(v,w) using equation (4)
(16)	    end for
(17)	  end for
(18)	  //Sum the influence of v over all of its neighbor nodes
(19)	  evaluate EDBC(v) using equation (5)
(20)	end for
(21)	return EDBC(v) for every node v
Output: Ranked nodes
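
The steps of Algorithm 1 can be sketched in Python, the language the experiments were implemented in. This is a minimal illustration under stated assumptions, not the authors' implementation. The graph is a plain dict mapping each node to a set of neighbors; the power of K-shell is taken as sqrt(ks_v + ks_u); the effective distance of an edge leaving v is taken as 1 - log2(1/k_v); the distance to a next-nearest neighbor is assumed to be the sum of the two edge distances along the path; and the influence is assumed to be k_v * power / distance^2. The function names `k_shell` and `edbc` and the toy star graph are hypothetical.

```python
import math


def k_shell(adj):
    """Return the K-shell index of every node by iterative pruning."""
    deg = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if deg[v] <= k]
        if not peel:
            k += 1  # nothing left to prune at this level, move to the next shell
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via a duplicate queue entry
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
                    if deg[u] <= k:
                        peel.append(u)
    return shell


def edbc(adj):
    """EDBC-style score per node, using the assumed formulas from the lead-in."""
    ks = k_shell(adj)
    deg = {v: len(adj[v]) for v in adj}

    def ks_power(a, b):
        # Assumed form of the power of K-shell: sqrt(ks_a + ks_b)
        return math.sqrt(ks[a] + ks[b])

    def edge_distance(a):
        # Assumption: P = 1 / k_a for every edge leaving a (unweighted graph)
        return 1.0 - math.log2(1.0 / deg[a])

    scores = {}
    for v in adj:
        total = 0.0
        for u in adj[v]:  # nearest neighbors
            d_vu = edge_distance(v)
            total += deg[v] * ks_power(v, u) / d_vu ** 2
            for w in adj[u]:  # next-nearest neighbors reached through u
                if w == v:
                    continue
                d_vw = d_vu + edge_distance(u)  # assumption: path distances add
                total += deg[v] * ks_power(v, w) / d_vw ** 2
        scores[v] = total
    return scores


# Hypothetical toy graph: a hub (node 0) with four leaves
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
scores = edbc(star)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The nested loops follow the listing literally, so a next-nearest neighbor reachable through several intermediate nodes contributes once per path; whether the original method deduplicates such nodes is not stated in the text.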

3.2. EDBC Algorithm

The influence of a node in a network depends on three aspects. (i) The location of the node: when a node is in the center of the network, its influence is high, whereas a node located on the periphery of the network has relatively small influence. (ii) The number of neighbors: the more neighbors a node has, the greater its influence. (iii) The distance between nodes: the smaller the distance, the greater the probability of information transmission between a node and its neighbors. On this basis, we propose the EDBC algorithm, as described in Algorithm 1. The algorithm mainly comprises the power of K-shell, node degree, effective distance, and several levels of neighbor influence. In the first stage, it calculates the degree and K-shell of the nodes in the network and then finds the power of K-shell according to equation (2). Next, it calculates the distance between nodes (nearest and next-nearest neighbors) through equation (3), and finally, it calculates the influence of each node using equations (4) and (5). To demonstrate the calculation process in detail, we consider a simple network. Figure 1 presents a synthetic network with 16 nodes and 21 edges, where node V16 has a higher influence than the other nodes in the network. A control flowchart of the EDBC algorithm is presented in Figure 2: first, we construct the network; then, we calculate the degree and K-shell of each node; after that, we calculate the power of K-shell and the effective distance, considering only the nearest and next-nearest neighbor nodes. In the last step, the influence of each node in the entire network is calculated.

3.2.1. Computation of the Power of K-Shell

For calculating the power of K-shell, here we apply equation (2). For node , the K-shell value is 2; first of all, we calculate the K-shell of node and node . Table 1 shows the K-shell of given toy network. We have , , and . Similarly, in this way, other K-shell power can be calculated.

3.2.2. Calculation of the Effective Distance

ED (between nearest and next-nearest neighbor of nodes) depends on the closeness among the nodes and its neighbors, to calculate effective distance among the neighbors and the next neighbors of V16 by using equation (3):where

Now, to calculate the distance between the next neighbors of , the distance can be

Similarly,

3.2.3. Calculation of the Neighbor Nodes’ Influence

In this context, EDBC algorithm considers only the nearest and the next-nearest neighbor’s nodes. Here, we take V9 as an example of the nearest neighbor node, according to equation (4):where is the degree of V16, which is 8. Now, we take the next-nearest neighbor of V9. Therefore, the influence of node V16 to node V7 or node V5 can be computed as follows:

By this rule, the influence of V16 on all of its nearest and next-nearest neighbor nodes can be calculated.

3.2.4. Computation of the Influence for Each Node

The computation of the total influence of the nodes in the entire network means the sum of the total number of influence nodes on the nearest and next-nearest neighbor nodes. By using equation (5), we can compute the total influence of the node V16 for a given graph:where and show the nearest and next-nearest neighbor nodes. Applying all of the above procedure, we get all nodes influence the ranking results of the given toy network, as shown in Table 2.

3.3. Computational Complexity

EDBC takes three components. In the first phase, it calculates the degree and K-shell, so the time complexity becomes . In the second phase, it computes the effective distance among the nodes and their adjacent or neighbors. The distance between the nearest and next-nearest neighbors in the entire network is calculated. Therefore, the computational time complexity is O , and in the third and fourth phases, the node’s influence of all neighbors will be calculated. Therefore, EDBC algorithm’s total time complexity can become , where N and k denote the number of nodes and average degree of nodes in the network, respectively.

4. Experimentation and Results’ Analysis

In this section, we describe the experimental setup used on various real-world networks and datasets, and give an overview of the performance evaluation and comparison between the proposed EDBC algorithm and the nine baseline algorithms. We implemented the EDBC algorithm in Python 3.7 and ran the experiments on a Feiteng Server (1, 16-core, 1.5 GHz) with the Kirin Operating System in the main lab of our school.

4.1. Comparison of Some Benchmark Centralities

We compared EDBC with nine baseline algorithms, which we briefly summarize and classify according to their characteristics in this section. The existing centrality measures fall into two categories:

(a) Structure-based centralities: the influence of a node is significantly affected by the network topology, and many centralities use structural information to identify influential nodes. Structure-based measures are further divided into two categories: (i) those based on the neighborhood of each node, for example, K-shell and H-index, and (ii) those based on paths between nodes, such as betweenness and closeness centralities.

K-shell (KS) decomposes a network into sublayers that are directly linked to centrality [55]. The algorithm assigns an integer index Ks to each node, which represents the node’s location in the network: nodes with high Ks values are placed at the center, and nodes with low Ks values at the periphery. In this way, the network is described by a layered structure that reveals the complete hierarchy of its nodes. The innermost nodes are known as the core or nucleus, and the remaining nodes are placed into internal and external layers. KS is a straightforward index for identifying influential nodes and can be applied to any network; however, it cannot differentiate the influential nodes within the same core layer. Therefore, KS is not suitable for some networks, and our experiments provide evidence of this.

The H-index [23] is commonly used to evaluate the academic achievements of researchers and scientists: an author has an H-index of H if H of the author’s published articles have at least H citations each. A high H-index reflects that the node has greater influence. Still, this algorithm has problems in practice; e.g., edge weights need to be in an appropriate range, otherwise the desired ranking cannot be achieved.

Betweenness centrality (BC) [18] identifies important nodes using global information. It is based on the shortest paths between nodes, and a node with a higher BC value is considered more important than other nodes in the network. However, it is not appropriate for large and complex networks, whereas our experimental results indicate that EDBC is suitable for networks of any size.

Closeness centrality (CC) [16] also depends on global information; it uses the relative distance between each pair of nodes to detect important nodes. CC can identify influential nodes well but is very difficult to apply to large networks. A further limitation is that it is not applicable to networks with disconnected components. In short, these kinds of centralities have high computational complexity and are not suitable for large-scale networks.

(b) Eigenvector-based centralities: these centralities consider not only the number of neighbors of a node but also the influence of those neighbors. They include PageRank, eigenvector centrality, profit leader, HITS, and the gravity index.

Eigenvector centrality (EC) [21] is based on the information gain method to choose important nodes in the network. It depends not only on the number of neighbors of a node but also on the impact of each neighbor. This algorithm has many applications, both theoretical and practical. However, if several nodes with high degree exist in the graph, the problematic phenomenon of fractional convergence can occur; therefore, EC is not an appropriate solution for such networks.

PageRank (PR) [22] is one of the best-known centrality algorithms and underlies the Google search engine. Like EC, PR supposes that the influence of a web page depends on both the quantity and the quality of the pages linking to it [55]. It is suitable for directed networks but not for undirected, unweighted ones, whereas EDBC is appropriate for any type of network, directed or undirected.

Hyperlink-induced topic search (HITS) [25] is a link analysis technique that uses several metrics concurrently. It evaluates the influence of a node via two attributes, the hub value and the authority value: the hub value reflects the role of the node in information transmission, while the authority value reflects the originality of the node’s information. These two attributes interact and converge through an iterative process.

Profit leader (PL) [36] is one of the most recent algorithms, proposed by Yu et al. in May 2018. It is based on the profit leader concept and chooses the important nodes in the network by calculating the profit each node can make. The algorithm is very simple and applicable to some networks; however, it does not work very well on small networks.

Gravity (GR) was proposed by Ma et al. in 2016 [44] and works on the principle of the gravity formula. Here, mass is replaced by the K-shell value of each node, and distance is the shortest path length between two nodes. Its drawback is a computational time complexity that is too high for large-scale networks.
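
The two neighborhood-based baselines described above, K-shell and H-index, are simple enough to sketch directly. The following is an illustrative sketch rather than the code used in the experiments. It assumes the graph is a dict mapping each node to a set of neighbors, and it uses the usual network adaptation of the H-index (the largest h such that a node has at least h neighbors of degree at least h), which the text above only states in its citation form. The function names and the toy graph are hypothetical.

```python
def k_shell(adj):
    """K-shell index per node: repeatedly prune nodes of degree <= k."""
    deg = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if deg[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # duplicate queue entry
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
                    if deg[u] <= k:
                        peel.append(u)
    return shell


def h_index(adj):
    """H-index per node: largest h with at least h neighbors of degree >= h."""
    deg = {v: len(adj[v]) for v in adj}
    result = {}
    for v in adj:
        neighbor_degrees = sorted((deg[u] for u in adj[v]), reverse=True)
        h = 0
        for rank, d in enumerate(neighbor_degrees, start=1):
            if d >= rank:
                h = rank
            else:
                break
        result[v] = h
    return result


# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d attached to a
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
shells = k_shell(toy)
h_values = h_index(toy)
print(shells)
print(h_values)
```

In this toy graph, nodes a, b, and c end up in the same shell even though a has a higher degree, which illustrates the limitation noted above: K-shell alone cannot separate nodes within one core layer.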

4.2. Data Description

We evaluated EDBC on nine real-world networks. The data were collected from different fields, and their basic properties, scales, and structures are presented in Table 3. These datasets are publicly available and can be downloaded from http://konect.uni-koblenz.de/networks/, http://snap.stanford.edu/data/as-caida.html, and http://networkrepository.com/web-spam.php.

Physicians [56]. This dataset contains 241 nodes and 1098 edges, collected in 1966 by Ron Burt. A node denotes a physician, and a link between two physicians represents that the left-hand physician is a friend of the right-hand physician. At most one link exists between any two nodes.

e-mail [57]. This network represents the e-mail system of the University Rovira i Virgili (URV) in Tarragona, Spain. It consists of 1133 nodes and 5451 links. Nodes denote users, and a link indicates that at least one e-mail was sent.

Subelij-euroroad [58]. This is the European E-road network, which contains 1174 nodes and 1417 edges. Nodes represent cities, and an edge between two nodes indicates that they are linked via an E-road.

Air-traffic control [59]. This network was created from the FAA (Federal Aviation Administration) system in the USA. Nodes represent service centers or airports, and edges are formed from strings of preferred routes.

Petster-friendships [60]. This network consists of friendships among users of the website hamsterster.com. Nodes represent users, and edges represent the closeness among users.

US-Powergrid [61]. This is an undirected network containing information about the power grid of the western states of the USA. It contains 4941 nodes and 6594 links. Nodes represent generators or transformers, and edges denote power supply lines.

Web-spam [62]. This dataset is constructed by the Purdue University network repository. It consists of 4767 nodes and 37,375 edges. Nodes and edges represent pages and hyperlinks, respectively.

PGP algorithm [63]. This is the user communication network for the Pretty Good Privacy (PGP) algorithm and consists of 10,680 nodes and 24,316 links. It comprises only the giant connected component of the network.

CAIDA-project [64]. This is the CAIDA-project network, collected in 2007, containing 26,475 nodes and 53,381 edges. Nodes represent autonomous systems (AS), and edges represent communication between them.
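
As an illustration of how such datasets might be prepared, the sketch below reads a whitespace-separated edge list into an undirected, unweighted graph and reports a few basic properties of the kind a summary table would list. It is a hedged example, not the authors' preprocessing: the assumption that comment lines start with `%` or `#`, the choice to drop self-loops and ignore extra columns, and the names `load_edge_list` and `summarize` are all hypothetical. The exact properties reported in Table 3 are not recoverable from the text.

```python
def load_edge_list(lines):
    """Build an undirected adjacency dict from 'u v [extra...]' lines."""
    adj = {}
    for line in lines:
        line = line.strip()
        if not line or line[0] in "%#":
            continue  # skip blank lines and assumed comment/header lines
        parts = line.split()
        if len(parts) < 2:
            continue  # malformed line
        u, v = parts[0], parts[1]  # extra columns (weights, timestamps) are ignored
        if u == v:
            continue  # drop self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def summarize(adj):
    """Return node count, edge count, average degree, and maximum degree."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2 * m / n if n else 0.0,
        "max_degree": max((len(neighbors) for neighbors in adj.values()), default=0),
    }


# Hypothetical in-memory sample; a real file would be passed as open(path)
sample = ["% sym unweighted", "# comment", "1 2", "2 3", "3 1", "3 4 1.0", "", "5 5"]
graph = load_edge_list(sample)
stats = summarize(graph)
print(stats)
```

Because sets are used for the neighbor lists, repeated edges collapse into one, which matches the note above that at most one link exists between any two nodes.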

4.3. Evaluation Metrics

4.3.1. SIR Model

To evaluate how well influential nodes are identified, we employ the SIR model to simulate the spreading influence of the ranked nodes in our experiments [65, 66]. The model consists of three states: S (susceptible) denotes a healthy node that may be infected by others; I (infected) denotes an infected node that can infect other individuals; and R (recovered) denotes a recovered node that cannot be infected again. Initially, the seed nodes are infected and all other nodes in the network are susceptible. At each time step, every infected node may infect each of its susceptible neighbors with a given infection probability; infected nodes then recover (enter the recovered state) with a given recovery probability. This process continues until no infected nodes remain. Finally, the recovered nodes are used to measure the real influence of the seed node.

S(t), I(t), and R(t) denote the numbers of nodes in the susceptible, infected, and recovered states at time t, respectively. So, for a network with N nodes,
$$S(t) + I(t) + R(t) = N.$$

The spreading influence of the node is as follows:where denotes iteration numbers; here, we set independent run, and and represent the number of infected and recovered nodes, respectively.
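
The SIR procedure described in this subsection can be sketched as a discrete-time simulation. This is an illustrative sketch under stated assumptions, not the authors' simulation code: the names `beta` (infection probability) and `gamma` (recovery probability), the default number of runs, and the function name `sir_spread` are hypothetical, and the spreading influence is taken to be the number of infected plus recovered nodes at the end of a run, averaged over independent runs.

```python
import random


def sir_spread(adj, seeds, beta, gamma=1.0, runs=100, max_steps=1000, rng=None):
    """Average final outbreak size (infected + recovered) over independent runs."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(runs):
        infected = set(seeds)
        recovered = set()
        steps = 0
        while infected and steps < max_steps:
            newly_infected = set()
            for v in infected:
                for u in adj[v]:
                    susceptible = (
                        u not in infected
                        and u not in recovered
                        and u not in newly_infected
                    )
                    if susceptible and rng.random() < beta:
                        newly_infected.add(u)
            # Nodes infected before this step recover with probability gamma
            healed = {v for v in infected if rng.random() < gamma}
            infected = (infected - healed) | newly_infected
            recovered |= healed
            steps += 1
        total += len(infected) + len(recovered)
    return total / runs


# Hypothetical toy graph: a path 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
full = sir_spread(path, [0], beta=1.0, gamma=1.0, runs=5)
none = sir_spread(path, [0], beta=0.0, gamma=1.0, runs=5)
print(full, none)
```

With the recovery probability fixed at 1, each node is infectious for exactly one step, so differences between seed nodes come only from the network structure and the infection probability.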

4.3.2. Kendall’s Correlation Coefficient

Kendall’s correlation coefficient [67, 68] is used to measure the agreement between the results of two ranking methods. In this paper, we use Kendall’s correlation coefficient to measure the performance of the proposed algorithm. Suppose two sequences with the same number of nodes $n$, $X = (x_1, x_2, \dots, x_n)$ and $Y = (y_1, y_2, \dots, y_n)$. A pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ is said to be concordant if the rankings of both components agree, i.e., if both $x_i > x_j$ and $y_i > y_j$ or both $x_i < x_j$ and $y_i < y_j$. The pair is said to be discordant if $x_i > x_j$ and $y_i < y_j$ or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant. Kendall’s $\tau$ is defined as follows:
$$\tau = \frac{n_c - n_d}{0.5\, n (n - 1)},$$
where $n_c$ and $n_d$ represent the number of concordant and discordant pairs, respectively. If $\tau$ is positive, the two rankings are positively correlated, and if $\tau$ is negative, they are negatively correlated. The coefficient is 1 when the two rankings are identical and -1 when they are completely opposite. Higher values of Kendall’s $\tau$ indicate more precise and better performance.
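
The concordant and discordant pair counting described above translates directly into code. The sketch below is a straightforward quadratic-time illustration, assuming the two inputs are equal-length lists of scores (for example, centrality values and simulated spreading influence for the same nodes in the same order); the function name `kendall_tau` is hypothetical, and tied pairs are counted as neither concordant nor discordant, as in the text.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / (0.5 * n * (n - 1))."""
    if len(x) != len(y):
        raise ValueError("both sequences must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both components order the pair the same way
        elif sign < 0:
            discordant += 1  # the components disagree on the order
        # sign == 0 means a tie in x or y: neither concordant nor discordant
    return (concordant - discordant) / (0.5 * n * (n - 1))


same = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
opposite = kendall_tau([1, 2, 3, 4], [40, 30, 20, 10])
mixed = kendall_tau([1, 2, 3], [1, 3, 2])
print(same, opposite, mixed)
```

For large networks a quadratic loop over all pairs becomes slow; a library routine with a faster algorithm would be a natural replacement, at the cost of a possibly different treatment of ties.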

4.4. Performance Evaluation

In this experiment, we used the SIR model and Kendall’s correlation coefficient to verify the effectiveness of EDBC. First, we used the toy network shown in Figure 1 as an example: we applied EDBC to identify the influential nodes, compared it with nine baseline algorithms, and listed the ranking results and SIR values of each node in Table 4. It can be seen that EDBC performs better than the other ranking algorithms in terms of ranking influential nodes. Figure 3 illustrates Kendall’s correlation coefficient ($\tau$) for the ten algorithms, where the baseline rankings are generated by K-shell, HITS, H-index, PageRank, BC, CC, EC, PL, and GR.

Figure 3: Kendall’s correlation coefficient of the ten algorithms, panels (a)–(i).

In Figure 3, EDBC achieves the highest Kendall’s correlation coefficients on the physicians, e-mail, Subelij-euroroad, petster-friendships, air-traffic control, US-powergrid, and web-spam networks. The results lie in the range between 0.8 and 1: 0.9 for physicians and web-spam; 0.95 for e-mail, Subelij-euroroad, CAIDA-project, and US-powergrid; and 1.0 for PGP algorithm and petster-friendships. Figure 3 compares the ten algorithms in terms of Kendall’s $\tau$, and EDBC performs well on all kinds of networks. We note that BC always performs worst on all networks, because BC is defined in terms of the shortest paths between nodes, whereas information in most networks does not flow only along the shortest paths [69].

To further examine the performance of EDBC, we used the SIR model to measure the spreading impact of the ranked nodes. We used a relatively small infection probability for big networks such as PGP algorithm and CAIDA-project, because with a bigger value the propagation would cover the whole network [70], making it hard to separate the influence of different nodes. For small networks such as web-spam, US-powergrid, physicians, e-mail, Subelij-euroroad, petster-friendships, and air-traffic control, we set the infection probability separately to estimate the influence of each node. To obtain the propagation efficiency of all nodes in the entire network, we keep the recovery probability equal to 1 and the time equal to 1000.

Figure 4 shows the ranking results of the ten algorithms in terms of the average number of infected nodes. A more influential node can infect more nodes; therefore, an efficient algorithm produces a curve that decreases from left to right. By this criterion, EDBC achieves the best performance on the physicians, e-mail, Subelij-euroroad, air-traffic control, US-powergrid, and web-spam networks. On the petster-friendships network, the GR and EDBC algorithms perform approximately the same, while EDBC works well on the other networks, which is why its curves are smooth with negligible variation. For CAIDA-project, all the algorithms show similar effects, but EDBC still provides a greater spreading effect than the other algorithms. Moreover, the ranking results of the ten algorithms on the Subelij-euroroad network are presented in Table 5, where each node is considered a seed node that recursively infects its neighbor nodes. It can be seen that EDBC performs better than the other baseline algorithms.

In addition, we compared the influence of the top-10 nodes selected by EDBC and by the other algorithms. All top-10 distinct nodes are selected as seed nodes, and the time t ranges from 1 to 20. As shown in Table 6, EDBC has the highest propagation capability for all nodes; the number of infections increases as time t increases and finally reaches a steady value after consecutive time points. Table 7 shows the top-10 ranked nodes; due to the limited space, we show only the top-10 nodes of the PGP algorithm network. Consequently, there are ten seed nodes, and most network propagation arrives at a steady state at time t = 20, where we can examine the spreading effect of EDBC and the other baseline algorithms.

Moreover, Figure 5 indicates that the EDBC algorithm has a good spreading efficiency for the top-10 nodes. Specifically, EDBC has better performance on the physicians, e-mail, Subelij-euroroad, air-traffic control, petster-friendships, PGP algorithm, and US-Powergrid networks. In the petster-friendships and CAIDA-project networks, PR and EC have the best propagation effect. EDBC nevertheless has better spreading efficiency than the other baseline algorithms.
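The top-10 comparison over t = 1 to 20 can be sketched as follows. This is only an illustration under stated assumptions: all seed nodes start infected together, the recovery probability is 1, and the spreading effect at time t is measured as the number of infected plus recovered nodes, averaged over repeated runs. The function name `spread_curve`, its parameters, and the toy path network are hypothetical and not taken from the paper.

```python
import random


def spread_curve(adj, seeds, beta, t_max=20, runs=100, rng_seed=0):
    """Return the average count of infected + recovered nodes for t = 1..t_max.

    Assumptions: every node in `seeds` is infected at t = 0, and each infected
    node recovers after one step (recovery probability 1).
    """
    rng = random.Random(rng_seed)
    totals = [0.0] * t_max
    for _ in range(runs):
        infected = set(seeds)
        recovered = set()
        for t in range(t_max):
            newly = set()
            for u in infected:
                for v in adj[u]:
                    susceptible = v not in infected and v not in recovered
                    if susceptible and rng.random() < beta:
                        newly.add(v)
            recovered |= infected
            infected = newly
            totals[t] += len(infected) + len(recovered)
    return [total / runs for total in totals]


# Toy path network (hypothetical data) with one seed at the end of the path.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
curve = spread_curve(path, seeds=[0], beta=1.0, t_max=6, runs=3)
```

Because recovered nodes never become susceptible again, the curve is non-decreasing and flattens once no infected nodes remain, which corresponds to the steady state described in the text. Comparing the curves produced by the top-10 seeds of two ranking methods shows which seed set spreads further.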

5. Conclusion and Future Recommendations

This paper investigates the problem of ranking influential (key) nodes from a performance evaluation viewpoint. In related studies, several centrality-based methods have been proposed to detect key or influential nodes. However, each of these centrality algorithms has its own advantages and disadvantages. In this study, we proposed an effective distance-based centrality (EDBC) method for detecting influential nodes in complex networks. The proposed algorithm considers the K-shell, node degree, effective distance, and several levels of neighbors’ influence or neighborhood potential. Consequently, this algorithm can be employed in any network, i.e., directed or undirected. To evaluate the performance and effectiveness of the proposed algorithm, we applied it to several types of real-world networks and used two standard evaluation criteria, the SIR model and Kendall’s correlation coefficient, to analyze the spreading influence of the ranked nodes. The experimental results demonstrated that the proposed algorithm is reasonable and significant in terms of accuracy and effectiveness compared with the classical ranking algorithms and several recently proposed relevant algorithms. However, challenging issues remain that need to be addressed to improve the quality of the current work. For instance, adding more parameters to adjust the intensity between nodes to yield better performance is a challenge. In the future, we plan to improve the proposed algorithm by considering weighted formal concept analysis.
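As an illustration of the second evaluation criterion, the following minimal sketch computes Kendall’s correlation coefficient between a list of centrality scores and a list of simulated spreading influences. It assumes the simple tau-a form, in which tied pairs count as neither concordant nor discordant; whether the paper uses this variant is not stated here. The function name `kendall_tau` and the sample values are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / (number of pairs).

    Assumption: ties contribute to neither count (no tie correction).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores for four nodes: centrality values and SIR outbreak sizes.
centrality = [0.9, 0.7, 0.4, 0.1]
sir_influence = [35.0, 28.0, 30.0, 5.0]
tau = kendall_tau(centrality, sir_influence)
```

A value close to 1 means the centrality ranking orders the nodes almost exactly as the simulated spreading influence does, while a value close to -1 means the two orderings are reversed.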

Data Availability

All relevant data are publicly available at http://konect.uni-koblenz.de/networks/, http://snap.stanford.edu/data/as-caida.html and http://networkrepository.com/web-spam.php.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China under Grant no. 2018YFB1003602.

References

J. Zhao, Y. Wang, and Y. Deng, “Identifying influential nodes in complex networks from global perspective,” Chaos, Solitons & Fractals, vol. 133, Article ID 109637, 2020.
N. Zhao, J. Bao, and N. Chen, “Ranking influential nodes in complex networks with information entropy method,” Complexity, vol. 2020, Article ID 5903798, 15 pages, 2020.
J. Dai, B. Wang, J. Sheng et al., “Identifying influential nodes in complex networks based on local neighbor contribution,” IEEE Access, vol. 7, pp. 131719–131731, 2019.
X. Chen, M. Tan, J. Zhao, T. Yang, D. Wu, and R. Zhao, “Identifying influential nodes in complex networks based on a spreading influence related centrality,” Physica A: Statistical Mechanics and Its Applications, vol. 536, Article ID 122481, 2019.
S. Wang, Y. Du, and Y. Deng, “A new measure of identifying influential nodes: efficiency centrality,” Communications in Nonlinear Science and Numerical Simulation, vol. 47, pp. 151–163, 2017.
F. Schweitzer, G. Fagiolo, D. Sornette, F. Vega-Redondo, A. Vespignani, and D. R. White, “Economic networks: the new challenges,” Science, vol. 325, no. 5939, p. 422, 2009.
R. C. T. M. Garas and H. S. Argyrakis, “Worldwide spreading of economic crisis,” New Journal of Physics, vol. 12, no. 11, 2010.
U. Alon, “Biological networks: the tinkerer as an engineer,” Science, vol. 301, no. 5641, p. 1866, 2003.
R. Milo and S. Shen-Orr, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, p. 824, 2002.
H. Yu, X. Cao, Z. Liu, and Y. Li, “Identifying key nodes based on improved structural holes in complex networks,” Physica A: Statistical Mechanics and Its Applications, vol. 486, pp. 318–327, 2017.
J. Liu, Q. Xiong, W. Shi, X. Shi, and K. Wang, “Evaluating the importance of nodes in complex networks,” Physica A: Statistical Mechanics and Its Applications, vol. 452, pp. 209–219, 2016.
J. Liu, F. Lian, and M. Mallick, “Distributed compressed sensing based joint detection and tracking for multistatic radar system,” Information Sciences, vol. 369, pp. 100–118, 2016.
A. Sheikhahmadi, M. A. Nematbakhsh, and A. Shokrollahi, “Improving detection of influential nodes in complex networks,” Physica A: Statistical Mechanics and Its Applications, vol. 436, pp. 833–845, 2015.
D. Y. Kenett, M. Perc, and S. Boccaletti, “Networks of networks - an introduction,” Chaos, Solitons & Fractals, vol. 80, pp. 1–6, 2015.
T. Bian and Y. Deng, “Identifying influential nodes in complex networks: a node information dimension approach,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 28, no. 4, Article ID 043109, 2018.
G. Sabidussi, “The centrality index of a graph,” Psychometrika, vol. 31, no. 4, pp. 581–603, 1966.
P. Bonacich, “Factoring and weighting approaches to status scores and clique identification,” Journal of Mathematical Sociology, vol. 2, no. 1, pp. 113–120, 1972.
L. C. Freeman, “A set of measures of centrality based on betweenness,” Sociometry, vol. 40, no. 1, pp. 35–41, 1977.
E. Estrada and N. Hatano, “Resistance distance, information centrality, node vulnerability and vibrations in complex networks,” in Network Science: Complexity in Nature and Technology, p. 13, Springer, London, UK, 2010.
M. E. J. Newman, “Scientific collaboration networks. ii. shortest paths, weighted networks, and centrality,” Physical Review E, vol. 64, Article ID 016132, 2001.
E. Estrada and J. A. Rodríguez-Velázquez, “Subgraph centrality in complex networks,” Physical Review E, vol. 71, Article ID 056103, 2005.
S. Brin and L. Page, “Reprint of: the anatomy of a large-scale hypertextual web search engine,” Computer Networks, vol. 56, no. 18, pp. 3825–3833, 2012.
D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang, and T. Zhou, “Vital nodes identification in complex networks,” Physics Reports, vol. 650, pp. 1–63, 2016.
M. Kitsak, L. K. Gallos, S. Havlin et al., Identification of Influential Spreaders In Complex Networks, 2010.
J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM, vol. 46, no. 5, pp. 604–632, 1999.
L. Lv, K. Zhang, T. Zhang, D. Bardou, J. Zhang, and Y. Cai, “Pagerank centrality for temporal networks,” Physics Letters A, vol. 383, no. 12, pp. 1215–1222, 2019.
M. Kitsak, L. K. Gallos, S. Havlin et al., “Identification of influential spreaders in complex networks,” Nature Physics, vol. 6, no. 11, p. 888, 2010.
B. L. H. S. Shao and S. H. E. Buldyrev S, “Structure of shells in complex networks,” Physical Review E, vol. 80, no. 3, 2009.
B. A. Alvarez-Hamelin and V. A. Dall’Asta, “K-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases,” NHM, vol. 3, no. 2, p. 371, 2008.
J. E. Cohen, “Infectious diseases of humans: dynamics and control,” JAMA: The Journal of the American Medical Association, vol. 268, no. 23, p. 3381, 1992.
M. G. Kendall, “A new measure of rank correlation,” Biometrika, vol. 30, no. 1-2, pp. 81–93, 1938.
O. Fercoq, M. Akian, M. Bouhtou, and S. Gaubert, “Ergodic control and polyhedral approaches to pagerank optimization,” IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 134–148, 2013.
Z. Jovanović, E. Milovanović, and I. Milovanović, “Some remarks on laplacian eigenvalues of connected graphs,” Linear Algebra and Its Applications, vol. 503, pp. 48–55, 2016.
C. Canali and R. Lancellotti, “A quantitative methodology based on component analysis to identify key users in social networks,” International Journal of Social Network Mining, vol. 1, no. 1, pp. 27–50, 2012.
R. Elhesha, T. Kahveci, and B. Baiser, “Motif centrality in food web networks,” Journal of Complex Networks, vol. 5, no. 4, pp. 641–664, 2017.
Z. Yu, J. Shao, Q. Yang, and Z. Sun, “Profitleader: identifying leaders in networks with profit capacity,” World Wide Web, vol. 22, no. 2, pp. 533–553, 2019.
Z.-K. Bao, J.-G. Liu, and H.-F. Zhang, “Identifying multiple influential spreaders by a heuristic clustering algorithm,” Physics Letters A, vol. 381, no. 11, pp. 976–983, 2017.
H. Mo, C. Gao, and Y. Deng, “Evidential method to identify influential nodes in complex networks,” Journal of Systems Engineering and Electronics, vol. 26, no. 2, p. 381, 2015.
T. Bian and Y. Deng, “A new evidential methodology of identifying influential nodes in complex networks,” Chaos, Solitons & Fractals, vol. 103, pp. 101–110, 2017.
T. Bian, J. Hu, and Y. Deng, “Identifying influential nodes in complex networks based on ahp,” Physica A: Statistical Mechanics and Its Applications, vol. 479, pp. 422–436, 2017.
A. Zeng and C.-J. Zhang, “Ranking spreaders by decomposing complex networks,” Physics Letters A, vol. 377, no. 14, pp. 1031–1035, 2013.
J.-H. Lin, Q. Guo, W.-Z. Dong, L.-Y. Tang, and J.-G. Liu, “Identifying the node spreading influence with largest k-core values,” Physics Letters A, vol. 378, no. 45, pp. 3279–3284, 2014.
J. Bae and S. Kim, “Identifying and ranking influential spreaders in complex networks by neighborhood coreness,” Physica A: Statistical Mechanics and Its Applications, vol. 395, pp. 549–559, 2014.
L.-l. Ma, C. Ma, H.-F. Zhang, and B.-H. Wang, “Identifying influential spreaders in complex networks based on gravity formula,” Physica A: Statistical Mechanics and Its Applications, vol. 451, pp. 205–212, 2016.
Y. Liu, B. Wei, Y. Du, F. Xiao, and Y. Deng, “Identifying influential spreaders by weight degree centrality in complex networks,” Chaos, Solitons & Fractals, vol. 86, pp. 1–7, 2016.
Y. Liu, M. Tang, T. Zhou, and Y. Do, “Identify influential spreaders in complex networks, the role of neighborhood,” Physica A: Statistical Mechanics and Its Applications, vol. 452, pp. 289–298, 2016.
J. Wang, X. Hou, K. Li, and Y. Ding, “A novel weight neighborhood centrality algorithm for identifying influential spreaders in complex networks,” Physica A: Statistical Mechanics and Its Applications, vol. 475, pp. 88–105, 2017.
B. Wei, J. Liu, D. Wei, C. Gao, and Y. Deng, “Weighted k-shell decomposition for complex networks based on potential edge weights,” Physica A: Statistical Mechanics and Its Applications, vol. 420, pp. 277–283, 2015.
P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems, vol. 06, no. 04, pp. 565–573, 2003.
C. Gao, X. Lan, X. Zhang, and Y. Deng, “A bio-inspired methodology of identifying influential nodes in complex networks,” PLoS One, vol. 8, no. 6, 2013.
D. Gómez, J. R. Figueira, and A. Eusébio, “Modeling centrality measures in social network analysis using bi-criteria network flow optimization problems,” European Journal of Operational Research, vol. 226, no. 2, pp. 354–365, 2013.
W. Zang, P. Zhang, C. Zhou, and L. Guo, “Locating multiple sources in social networks under the sir model: a divide-and-conquer approach,” Journal of Computational Science, vol. 10, pp. 278–287, 2015.
L. Fei, Q. Zhang, and Y. Deng, “Identifying influential nodes in complex networks based on the inverse-square law,” Physica A: Statistical Mechanics and Its Applications, vol. 512, pp. 1044–1059, 2018.
D. Brockmann and D. Helbing, “The hidden geometry of complex, network-driven contagion phenomena,” Science, vol. 342, no. 6164, pp. 1337–1342, 2013.
V. Batagelj and M. Zaveršnik, “Fast algorithms for determining (generalized) core groups in social networks,” Advances in Data Analysis and Classification, vol. 5, no. 2, pp. 129–145, 2011.
J. Coleman, E. Katz, and H. Menzel, “The diffusion of an innovation among physicians,” Sociometry, vol. 20, no. 4, pp. 253–270, 1957.
R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E, vol. 68, no. 6, Article ID 065103, 2003.
L. Šubelj and M. Bajec, “Robust network community detection using balanced propagation,” European Physical Journal B, vol. 81, no. 3, pp. 353–362, 2011.
Aviation Administration, “Air traffic control system command center,” 2017, http://www.fly.faa.gov/.
Hamsterster friendships network dataset–KONECT (Apr. 2017), http://konect.uni-koblenz.de/networks/petster-friendships-hamster.
D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 1, pp. 440–442, 1998.
C. Castillo, K. Chellapilla, and L. Denoyer, “Web spam challenge,” in Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web, AIRWeb, Beijing, China, 2008.
M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E, vol. 70, no. 5, Article ID 056122, 2004.
J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graph evolution: densification and shrinking diameters,” ACM Trans. Knowledge Discovery from Data, vol. 1, no. 1, pp. 1–40, 2007.
L. J. Allen, “Some discrete-time si, sir, and sis epidemic models,” Mathematical Biosciences, vol. 124, no. 1, pp. 83–105, 1994.
Y. Gang, Z. Tao, W. Jie, F. Zhong-Qian, and W. Bing-Hong, “Epidemic spread in weighted scale-free networks,” Chinese Physics Letters, vol. 22, no. 2, p. 510, 2005.
M. G. Kendall, “The treatment of ties in ranking problems,” Biometrika, vol. 33, no. 3, pp. 239–251, 1945.
W. R. Knight, “A computer method for calculating kendall’s tau with ungrouped data,” Journal of the American Statistical Association, vol. 61, no. 314, pp. 436–439, 1966.
Y. Yang, X. Wang, Y. Chen, and M. Hu, “Identifying key nodes in complex networks based on global structure,” IEEE Access, vol. 8, pp. 32904–32913, 2020.
M. Kitsak, L. K. Gallos, S. Havlin et al., “Identification of influential spreaders in complex networks,” Nature Physics, vol. 6, no. 11, pp. 888–893, 2010.

Copyright

Copyright © 2021 Aman Ullah et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
