Mining Important Nodes in Directed Weighted Complex Networks
Mining important nodes in complex networks has long been a concern of scholars. In recent years, research has focused on mining important nodes in undirected unweighted complex networks, but most of these methods are not applicable to directed weighted complex networks. This paper therefore proposes a Two-Way-PageRank method, based on PageRank, for mining important nodes in directed weighted complex networks. We mainly consider the frequency of contact between nodes and the length of time of contact between nodes, and we consider the source of a node's links (in-degree) and their destination (out-degree) simultaneously. We define node importance performance indicators and, through numerical examples, analyze how the variation of some parameters affects them. Finally, the accuracy and validity of the method are verified on empirical network data.
Complex networks are composed of nodes and edges between nodes. The importance of each node, however, differs in most cases; that is, different nodes have different weights. In real networks, identifying the important nodes is crucial for understanding and controlling the whole network [2–5].
Currently, most algorithms for mining important nodes focus on undirected unweighted complex networks. For example, degree centrality holds that the more neighboring nodes a node has, the more important it is. The information index depends on the amount of information passing along a node's propagation paths. Kitsak et al. proposed $k$-shell decomposition. In addition, many other concepts, such as closeness centrality, subgraph centrality, eigenvector centrality, and cumulative nomination, have been proposed to evaluate the importance of nodes in networks. Ren and Lv, Wang and Zhang, and He et al. have written excellent summaries of these measures. Further, Sun and Luo and Liu et al. also reviewed the history and the research results of methods for mining important nodes in complex networks. Most of these discussions focus on undirected unweighted complex networks. However, undirected unweighted complex networks only reflect the connections between nodes and the topology of the network. They cannot describe the direction and intensity of the interaction between nodes: when a network is abstracted into a simple undirected unweighted network, it loses much of the information that would make the analysis more accurate. Some scholars [18–21] have therefore begun to study directed weighted networks. Among them, Hu proposed the DWNodeRank evaluation method, a PageRank-based method for evaluating the importance of a node in directed weighted complex networks. Chen improved the DWNodeRank algorithm and proposed the B-DWNodeRank algorithm. Another study analyzed the structural characteristics of weighted complex networks, considering the influence of the weight of an edge on nodes. Since many factors bear on identifying influential nodes, this issue can be seen as a multiattribute decision making (MADM) problem [25, 26].
Many MADM methods, such as fuzzy sets and evidence theory [28, 29], are widely used to rank the nodes in complex networks. The author gave a new definition of the importance degree of weighted nodes.
These studies did not consider the out-degree of a node. Likewise, PageRank, developed by Google founders Brin and Page at Stanford University, holds that the most important pages on the Internet are the pages with the most links leading to them. In other words, the importance of a web page is determined by its inbound links (in-degree) rather than its outbound links (out-degree). In fact, the importance of a node depends on both its in-degree and its out-degree. For example, the importance of a school is determined by its appeal to students (its in-degree) and by the employment of its students (its out-degree). The importance of a person depends on how many people he or she can attract and on what he or she is concerned about. Another major feature of this paper is that we take the frequency of contact between nodes and the length of time of contact between nodes as the weight of an edge.
This paper proposes the Two-Way-PageRank method based on PageRank. First, we analyze the two factors that affect the importance of nodes. Secondly, we give the definition and expression of the importance of nodes. Subsequently, the effects of some parameters on the results are analyzed through numerical simulation. Finally, the conclusions are given.
A directed weighted network is a tuple $G = (V, E, W)$. $V$ is a finite set, and the elements of $V$ are called nodes; that is, $v_i$ represents node $i$. $E$ is a set of ordered pairs in $V \times V$. The elements of $E$ are called edges, with $|E| = m$ ($|\cdot|$ is the cardinality of a set). The indices $i$, $j$ run from 1 to $n$, where $n$ is the size of the network. In a directed network, the edges are formed by ordered pairs of nodes, so $(v_i, v_j)$ represents an edge from $v_i$ to $v_j$. $W$ is a set of edge-weights, and $w_{ij}$ is the weight of edge $(v_i, v_j)$. $G$ is defined as a weighted network if $w_{ij}$ can be any real number greater than 0. In this paper, we consider a directed weighted network, with $w_{ij} = 0$ when there is no edge from $v_i$ to $v_j$.
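Because $G$ is a directed network, each node $v_i$ has a weighted-out-degree $s_i^{\mathrm{out}}$ and a weighted-in-degree $s_i^{\mathrm{in}}$. $s_i^{\mathrm{out}}$ is the sum of the weights of the edges which point out from $v_i$. Similarly, $s_i^{\mathrm{in}}$ is the sum of the weights of the edges which point to $v_i$. The weighted-out-degree and weighted-in-degree of nodes are thus given by the following expressions:
$$s_i^{\mathrm{out}} = \sum_{j \in \Gamma_i^{\mathrm{out}}} w_{ij}, \qquad s_i^{\mathrm{in}} = \sum_{j \in \Gamma_i^{\mathrm{in}}} w_{ji},$$
where $\Gamma_i^{\mathrm{out}}$ is the set of nodes to which $v_i$ points. Analogously, $\Gamma_i^{\mathrm{in}}$ is the set of nodes which point to $v_i$.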
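As a small illustration of these two quantities, the following Python sketch computes the weighted-out-degree (row sums) and weighted-in-degree (column sums) from a weight matrix. The function name `weighted_degrees` and the three-node matrix `W` are hypothetical and chosen only for illustration; `W[i][j]` is assumed to hold the weight of the edge from node $i$ to node $j$, with 0 meaning no edge.

```python
def weighted_degrees(W):
    """Return (out_strength, in_strength) for a square weight matrix W.

    Assumption: W[i][j] > 0 is the weight of the edge from node i to node j,
    and W[i][j] == 0 means there is no such edge.
    """
    s_out = [sum(row) for row in W]        # weights leaving each node
    s_in = [sum(col) for col in zip(*W)]   # weights arriving at each node
    return s_out, s_in


# Hypothetical 3-node directed weighted network (not taken from the paper).
W = [
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]
s_out, s_in = weighted_degrees(W)
print(s_out)  # [3.0, 3.0, 4.0]
print(s_in)   # [4.0, 2.0, 4.0]
```

Both lists sum to the same total, since every edge weight is counted once as outgoing and once as incoming.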
Generally speaking, the strength of a relationship depends primarily on two factors: the frequency of contact and the length of contact (intimacy). In different practical networks, $w_{ij}$ has different meanings. In this paper, it represents the strength of the relationship between $v_i$ and $v_j$. If $v_i$ and $v_j$ represent persons, $w_{ij}$ means the closeness between them. The larger the frequency of contact and the length of time of contact (intimacy) are, the closer the relationship between them is. These two factors are calculated as follows.
(a) Frequency Factor. It depends on the frequency of $v_i$ pointing to $v_j$, that is, the number of times that $v_i$ takes the initiative to meet with $v_j$:
$$f_{ij} = \frac{n_{ij}}{\sum_{k} n_{kj}}.$$
Here, $f_{ij}$ is the frequency factor of $v_i$ pointing to $v_j$, and $n_{ij}$ is the number of times that $v_i$ takes the initiative to meet with $v_j$. $\sum_{k} n_{kj}$ is the number of times that all of the nodes take the initiative to meet with $v_j$.
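(b) Length of Contact (Intimacy). This is the length of time of contact. It depends on the length of time that $v_i$ takes the initiative to contact $v_j$:
$$l_{ij} = \frac{t_{ij}}{\sum_{k} t_{kj}},$$
where $l_{ij}$ is the length of contact of $v_i$ pointing to $v_j$, $t_{ij}$ is the length of contact time that $v_i$ takes the initiative to meet with $v_j$, and $\sum_{k} t_{kj}$ is the length of contact time that all of the nodes take the initiative to meet with $v_j$. Then
$$w_{ij} = \alpha f_{ij} + \beta l_{ij},$$
with $\alpha$, $\beta$ variable parameters, and $\alpha + \beta = 1$, $\alpha \ge 0$, $\beta \ge 0$.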
We should note that, generally speaking, $w_{ij} \neq w_{ji}$.
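To make the two factors concrete, the sketch below derives edge weights from a log of contacts. It rests on assumptions that the text here does not confirm: each factor is taken as the share of the contacts (or contact time) received by the target node, and the two factors are combined as a convex sum with a hypothetical parameter `alpha` (and `1 - alpha` for the length factor). The function name `edge_weights` and the sample records are likewise illustrative only.

```python
from collections import defaultdict


def edge_weights(contacts, alpha=0.5):
    """Build edge weights from (source, target, duration) contact records.

    Assumptions (not confirmed by the paper's formulas):
      * frequency factor = contacts i -> j / all contacts received by j
      * length factor    = contact time i -> j / all contact time received by j
      * weight           = alpha * frequency + (1 - alpha) * length
    """
    count = defaultdict(int)         # number of contacts i -> j
    length = defaultdict(float)      # total contact time i -> j
    count_in = defaultdict(int)      # contacts received by each target
    length_in = defaultdict(float)   # contact time received by each target
    for src, dst, duration in contacts:
        count[(src, dst)] += 1
        length[(src, dst)] += duration
        count_in[dst] += 1
        length_in[dst] += duration

    weights = {}
    for (src, dst), n in count.items():
        freq = n / count_in[dst]
        # Guard against targets whose recorded contact time is zero.
        dur = length[(src, dst)] / length_in[dst] if length_in[dst] > 0 else 0.0
        weights[(src, dst)] = alpha * freq + (1.0 - alpha) * dur
    return weights


# Hypothetical contact log: (initiator, target, minutes).
contacts = [
    ("a", "b", 10.0),
    ("a", "b", 30.0),
    ("c", "b", 40.0),
    ("b", "a", 5.0),
]
w = edge_weights(contacts, alpha=0.5)
print(w[("a", "b")], w[("b", "a")])  # the two directions differ
```

In this toy log, the weight from `a` to `b` differs from the weight from `b` to `a`, which is the asymmetry noted above.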
3. The Two-Way-PageRank for Mining Important Nodes in Directed Weighted Complex Networks
Indeed, the importance of a node depends on its in-degree (where its links come from) and its out-degree (where its links go). Hence, we consider the importance of a node from both its weighted-in-degree and its weighted-out-degree simultaneously.
3.1. The Definition of the Importance of a Node in Directed Weighted Complex Networks
Let be the importance of in directed weighted complex networks. We assume that the nodes which point to are (including ), and then the sum of the weighted-in-degrees of is as follows:Then has got an importance of value from the importance of :
Likewise, we assume that the nodes to which points are (including node ), and then the sum of the weighted-out-degrees of is defined by Then node has got an importance of value from the importance of : Then with and own importance of , . Moreover, are random jump factors and . in (9a) is the ending node of the edge that starts from node . As well, in (9b) is the node which points into node . is weight value of edge and is weight value of edge .
3.2. The Algorithm of the Two-Way-PageRank
Let $A = (a_{ij})_{n \times n}$ be the adjacency matrix of the directed weighted complex network $G$, whose element $a_{ij}$ is the weight $w_{ij}$ on the edge connecting $v_i$ to $v_j$, and 0 otherwise. We first normalize $A$ by rows; in other words, each element of the matrix is divided by the sum of the elements in its row. Thus we get the probability transition matrix $P$, which can be written as
$$P = (p_{ij})_{n \times n}, \qquad p_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}},$$
with $p_{ij}$ the transfer probability from $v_i$ to $v_j$. Obviously, each element of $P$ is nonnegative, and the sum of the elements of each row is 1, that is, $\sum_{j=1}^{n} p_{ij} = 1$. So $P$ is a stochastic matrix. Transposing $P$ gives the probability transition matrix $P^{T}$. The reason for transposing the matrix is that we consider the weighted-in-degrees of the nodes.
In addition, we normalize $A$ by columns; namely, each element of the matrix is divided by the sum of the elements in its column. We then obtain the probability transition matrix $Q$,
$$Q = (q_{ij})_{n \times n}, \qquad q_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}},$$
where $q_{ij}$ is the corresponding transfer probability. It is not difficult to see that each element of $Q$ is nonnegative and that the sum of the elements of each column is 1, that is, $\sum_{i=1}^{n} q_{ij} = 1$. Likewise, $Q$ is a stochastic matrix. The reason for normalizing by columns is that we consider the weighted-out-degrees of the nodes. Then, according to (9a) and (9b), the equations for the matrix $P^{T}$ and the matrix $Q$ can be solved explicitly; the resulting matrices involve $E$, the matrix whose elements are all 1.
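The two normalizations can be sketched in a few lines of Python. The helper names `row_normalize`, `transpose`, and `column_normalize`, and the example matrix, are hypothetical. One point the text leaves open is what to do with a node whose row (or column) sums to zero; the sketch assumes such a row is replaced by a uniform distribution, which is a common PageRank convention and not necessarily the authors' choice.

```python
def row_normalize(A):
    """Divide every row of A by its sum, giving a row-stochastic matrix."""
    n = len(A)
    P = []
    for row in A:
        total = sum(row)
        # Assumption: an all-zero row is replaced by a uniform distribution.
        P.append([a / total for a in row] if total > 0 else [1.0 / n] * n)
    return P


def transpose(M):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def column_normalize(A):
    """Divide every column of A by its sum, giving a column-stochastic matrix."""
    # Normalizing the columns of A is the same as normalizing the rows of
    # its transpose and transposing the result back.
    return transpose(row_normalize(transpose(A)))


# Hypothetical 3-node weighted adjacency matrix (not taken from the paper).
A = [
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]
P = row_normalize(A)      # rows sum to 1
P_T = transpose(P)        # columns sum to 1, used for the in-degree side
Q = column_normalize(A)   # columns sum to 1, used for the out-degree side
print(P)
print(Q)
```

Note that `P_T` and `Q` are generally different matrices even though both are column-stochastic: the first shares a node's importance along its outgoing edges, the second along the incoming edges of its targets.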
It is not difficult to see that the two resulting matrices are irreducible stochastic matrices, and that each has an eigenvalue equal to 1. The eigenvectors belonging to eigenvalue 1 are the stationary distributions of the two matrices.
We can use the power iteration method to compute the stationary distributions of the two matrices. Writing $M$ for either matrix and $x^{(k)}$ for the $k$th iterate, the iterative formula is
$$x^{(k+1)} = M x^{(k)}, \qquad k = 0, 1, 2, \ldots$$
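Set the initial values of the out-degree-based importance and the in-degree-based importance of each node $v_i$. Here, for simplicity, let the initial value for $v_i$ be the ratio of its weighted-out-degree (respectively, weighted-in-degree) to the sum of the weighted-out-degrees (respectively, weighted-in-degrees) of all nodes in the network:
$$x_{\mathrm{out},i}^{(0)} = \frac{s_i^{\mathrm{out}}}{\sum_{k=1}^{n} s_k^{\mathrm{out}}}, \qquad x_{\mathrm{in},i}^{(0)} = \frac{s_i^{\mathrm{in}}}{\sum_{k=1}^{n} s_k^{\mathrm{in}}},$$
where $s_i^{\mathrm{out}}$ and $s_i^{\mathrm{in}}$ denote the weighted-out-degree and the weighted-in-degree of $v_i$.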
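Given a precision error $\varepsilon > 0$, the iteration stops when the change between two successive iterates falls below $\varepsilon$ for both vectors, that is, when $\lVert x^{(k+1)} - x^{(k)} \rVert < \varepsilon$. At this point we obtain approximations of the two stationary distributions. Finally, we combine the two approximations, each with its own weight, into a single importance vector and rank its elements from largest to smallest. This is the order of the importance of the nodes.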
It is worth noticing, however, that this indicates .
The steps of the algorithm for mining important nodes in directed weighted complex networks are as follows.
Step 1. Give the adjacency matrix $A$ of the network $G$.
Step 2. Normalize the adjacency matrix $A$ by rows and by columns, and get the probability transition matrices $P$ and $Q$.
Step 3. Transpose the probability transition matrix $P$ and get the probability transition matrix $P^{T}$.
Step 5. Solve for the stationary distributions of the two resulting matrices using the power iteration method; that is, calculate the in-degree-based and the out-degree-based importance vectors.
Step 6. Compute the combined importance vector and rank its elements from largest to smallest. This is the order of the importance of the nodes.
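The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the damping value `d` stands in for the random jump factor, the final score is assumed to be a convex combination with a hypothetical mixing parameter `theta` (weight on the out-degree side), all-zero rows or columns are assumed to be replaced by a uniform distribution, and the network is assumed to contain at least one edge. All function and parameter names are illustrative.

```python
def normalize_rows(M):
    """Row-normalize M; an all-zero row becomes uniform (assumption)."""
    n = len(M)
    return [[x / sum(r) for x in r] if sum(r) > 0 else [1.0 / n] * n for r in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def stationary(M, x0, d, eps, max_iter=10_000):
    """Power iteration x <- d*M*x + (1-d)/n * E*x for column-stochastic M.

    Stops when the L1 change between successive iterates is below eps.
    """
    n = len(M)
    x = list(x0)
    for _ in range(max_iter):
        jump = (1.0 - d) * sum(x) / n
        y = [d * sum(M[i][j] * x[j] for j in range(n)) + jump for i in range(n)]
        if sum(abs(a - b) for a, b in zip(x, y)) < eps:
            return y
        x = y
    return x


def two_way_pagerank(A, theta=0.5, d=0.85, eps=1e-10):
    """Return (scores, ranking); theta weights the out-degree side (assumed)."""
    n = len(A)
    total = sum(map(sum, A))  # assumed > 0, i.e. the network has an edge
    s_out = [sum(row) / total for row in A]        # initial out-side vector
    s_in = [sum(col) / total for col in zip(*A)]   # initial in-side vector

    P_T = transpose(normalize_rows(A))             # in-degree side
    Q = transpose(normalize_rows(transpose(A)))    # out-degree side

    in_score = stationary(P_T, s_in, d, eps)
    out_score = stationary(Q, s_out, d, eps)

    scores = [theta * o + (1.0 - theta) * i for o, i in zip(out_score, in_score)]
    ranking = sorted(range(n), key=lambda k: -scores[k])  # most important first
    return scores, ranking


# Hypothetical network: nodes 1 and 2 both point to node 0, with weights 1 and 2.
A = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
]
for theta in (0.0, 0.5, 1.0):
    scores, ranking = two_way_pagerank(A, theta=theta)
    print(theta, ranking, [round(s, 4) for s in scores])
```

With `theta=0.0` only the in-degree side counts and node 0, which receives every edge, ranks first; with `theta=1.0` only the out-degree side counts and node 2, which sends the heaviest edge, ranks first. Intermediate values trade the two views off against each other.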
4. Experiment Simulation
In this section, we apply the method to a directed weighted network (see Figure 1), which is described by its adjacency matrix $A$.
According to (16a) and (16b), we get and as shown below = , , = , . Further, , , and are given when , , and , respectively, as shown in Table 1. The ranks of the importance nodes are defined by , , and , respectively, as shown in Table 2.
In a real network, we consider the weighted-out-degree and the weighted-in-degree of nodes at the same time, according to the actual situation, in order to mine important nodes more accurately. Figures 2–4 compare the combined importance of the nodes with the ranks given by the out-degree-based and the in-degree-based importance for several networks as the weighting parameter takes different values. In Figure 2, the rank of the combined importance basically tallies with the rank of the out-degree-based importance. This is because we mainly consider the impact of the out-degree of a node on its importance. For realizations of random networks, the out-degree provides almost complete information. In Figure 3, the rank of the combined importance is associated with both the out-degree-based and the in-degree-based importance at the same time. This is because we give them almost the same weight. In Figure 4, the rank of the combined importance basically tallies with the rank of the in-degree-based importance. This is because we mainly consider the impact of the in-degree of a node on its importance. For realizations of random networks, the in-degree provides almost complete information. In this case, the numerical calculation tallies with the result obtained using PageRank.
The results relate to both the in-degree and the out-degree of a node, so considering only the in-degree or only the out-degree is clearly not enough. When the in-degree and out-degree of the nodes are considered simultaneously, we can better find the important nodes. The example shows that the Two-Way-PageRank method mines the important nodes well.
5. Conclusions and Discussion
Recently, research on complex networks has shown that some real networks contain important nodes, that is, nodes that play an important role in the actual network and control the entire network. Different physical quantities are considered in the definitions of important nodes of complex networks. However, existing studies of important nodes have mainly focused on undirected unweighted complex networks, and such analyses do not accurately reflect the actual information in the networks.
In this paper, we therefore addressed the problem of mining important nodes in directed weighted complex networks by constructing a novel Two-Way-PageRank analysis method. We have presented a quantifiable metric and shown how it can be used to analyze the relative importance of nodes in a network with respect to the contributions that nodes make to the overall network connectivity. Numerical examples of directed weighted complex networks show that, when only the in-degree or only the out-degree of a node is considered, the importance of the node cannot be characterized well. The proposed Two-Way-PageRank analysis method can reveal the importance of the nodes of directed weighted complex networks such as infectious disease networks and social networks. These results not only deepen our understanding of the interplay between network topology and dynamical processes but also have implications in all areas where ranking plays a role, from social networks to marketing.
Our algorithm has been verified on small networks. In future work, we will build a real data set and verify the algorithm further. In addition, what is the relationship between the accuracy of the results and the number of iterations? How can we mine important nodes in directed weighted dynamic complex networks? We hope to address these problems more systematically in future work.
The authors declare that they have no competing interests.
This work was supported by the Innovation Foundations of Education for Graduate Students of Shanxi Province (no. 2016BY061) and by the National Natural Science Foundation of China (nos. 61503271, 61402319, and 61603267). Additionally, the authors would like to thank Z. M. Gao for valuable insights on the analysis of the experimental results and useful feedback on the manuscript. Finally, they particularly thank Y. P. Liang, who has expertise in technical English editing, for improving the English of this manuscript.
K. Thulasiraman and M. N. S. Swamy, Graphs: Theory and Algorithms, John Wiley & Sons, 2011.
E. Estrada and J. A. Rodriguez-Velazquez, “Subgraph centrality in complex networks,” Physical Review E, vol. 71, no. 5, pp. 1539–3755, 2005.
X. L. Ren and L. Y. Lv, “Review of ranking nodes in complex networks,” Chinese Science Bulletin, vol. 13, pp. 4–7, 2014.
L. Wang and J. J. Zhang, “Centralization of complex networks,” Complex System and Complex Science, vol. 3, no. 1, pp. 13–20, 2006.
N. He, D. Y. Li, W. Y. Gan, and X. Zhu, “Mining vital nodes in complex networks,” Computer Science, vol. 34, no. 12, pp. 1–5, 2008.
R. Sun and W. B. Luo, “Review on evaluation of node importance in public opinion,” Application Research of Computers, vol. 29, no. 10, pp. 3606–3608, 2012.
J. G. Liu, Z. M. Ren, Q. Guo, and B. H. Wang, “Node importance ranking of complex networks,” Acta Physica Sinica, vol. 62, no. 17, Article ID 178901, 2013.
M. Y. Hu, Identification Method for Key Nodes in Directed-Weighted Complex Networks Based on Link Structures, Nanjing University of Science and Technology, 2012.
L. H. Chen, Research of Stabiligy and Identification of Key Nodes in Directed-weighted Complex Networks, Nanjing University of Science and Technology, 2014.
S. W. Li, Research of Weighted Complex Network Evolution Model and Vital Nodes, Hefei University of Technology, 2010.
S. Wang, Y. Du, and Y. Deng, “A new measure of identifying influential nodes: efficiency centrality,” Communications in Nonlinear Science and Numerical Simulation, vol. 47, pp. 151–163, 2017.
Y. Yang and G. Xie, “Efficient identification of node importance in social networks,” Information Processing & Management, vol. 52, no. 5, pp. 911–922, 2016.
X. Zhou, Y. Shi, X. Deng, and Y. Deng, “D-DEMATEL: a new method to identify critical success factors in emergency management,” Safety Science, vol. 91, pp. 93–104, 2017.
S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine,” Computer Networks and ISDN Systems, vol. 30, no. 1, pp. 107–117, 1998.
K. Zhang, P. Li, B. Zhu, and M. Hu, “Evaluation method for node importance in directed-weighted complex networks based on PageRank,” Journal of Nanjing University of Aeronautics and Astronautics, vol. 45, no. 3, pp. 429–434, 2013.