Research Article  Open Access
Mining Important Nodes in Directed Weighted Complex Networks
Abstract
Mining important nodes in complex networks has long been a concern of scholars. In recent years, most work has focused on undirected unweighted complex networks, and most of the resulting methods are not applicable to directed weighted complex networks. This paper therefore proposes TwoWayPageRank, a method based on PageRank, for mining important nodes in directed weighted complex networks. Edge weights are built mainly from the frequency of contact between nodes and the length of time of contact between nodes. The method considers the source of a node's links (indegree) and their destination (outdegree) simultaneously, and we give performance indicators of node importance. Through numerical examples, we analyze how varying some parameters affects these indicators. Finally, the paper verifies the accuracy and validity of the method on empirical network data.
1. Introduction
Complex networks are composed of nodes and edges between nodes [1]. In most cases, however, nodes differ in importance; that is, different nodes carry different weights. In real networks, identifying the important nodes is crucial for understanding and controlling the whole network [2–5].
Currently, most algorithms for mining important nodes focus on undirected unweighted complex networks. For example, degree centrality [6] holds that the more neighboring nodes a node has, the more important it is. The information index [7] depends on the amount of information passing along a node's propagation paths. Kitsak et al. [8] proposed k-shell decomposition. In addition, many other concepts, such as closeness centrality [9], subgraph centrality [10], eigenvector centrality [11], and cumulative nomination [12], have been proposed to evaluate the importance of nodes in networks. Ren and Lv [13], Wang and Zhang [14], and He et al. [15] have written excellent summaries of this work. Further, Sun and Luo [16] and Liu et al. [17] reviewed the history of methods for mining important nodes in complex networks and summarized the research results. Most of these discussions focus on undirected unweighted complex networks. However, undirected unweighted complex networks reflect only the connections between nodes and the topology of the network. They cannot describe the directions and intensities of the interactions between nodes, because abstracting a network into a simple undirected unweighted one discards much of the information needed for accurate analysis. Some scholars [18–21] have therefore begun to study this problem. Among them, Hu [22] proposed DWNodeRank, a PageRank-based method for evaluating the importance of a node in directed weighted complex networks. Chen [23] improved the DWNodeRank algorithm and proposed the BDWNodeRank algorithm. Li [24] analyzed the structural characteristics of weighted complex networks, considering the influence of edge weights on nodes, and gave a new definition of the importance degree of weighted nodes. Since many factors bear on identifying influential nodes, the issue can also be seen as a multiattribute decision making (MADM) problem [25, 26]. Many MADM methods, such as fuzzy sets [27] and evidence theory [28, 29], are widely used to rank the nodes in complex networks [30].
These studies did not consider the outdegree of a node. In addition, PageRank [31], developed by Google founders Brin and Page at Stanford University, holds that the most important pages on the Internet are the pages with the most links leading to them. In other words, the importance of a web page is determined by its inbound links (indegree) rather than its outbound links (outdegree). In fact, the importance of a node depends on both its indegree and its outdegree. For example, the importance of a school is determined by its appeal to students (its indegree) and by its students' employment (its outdegree). The importance of a person depends on how many people he or she can attract and on what he or she pays attention to. Another major feature of this paper is that the weight of an edge is built mainly from the frequency of contact between nodes and the length of time of contact between nodes.
This paper proposes the TwoWayPageRank method based on PageRank. First, we analyze the two factors that affect the importance of nodes and define node importance. Second, we give the expression for the importance of nodes. Subsequently, the effects of some parameters on the results are analyzed through numerical simulation. Finally, the conclusions are given.
2. Preliminaries
A directed weighted network is a tuple $G = (V, E, W)$. $V$ is a finite set whose elements are called nodes; that is, $v_i$ represents node $i$. $E$ is a set of ordered pairs in $V \times V$. The elements of $E$ are called edges, with $|E| = m$ ($|\cdot|$ is the cardinality of a set). The indices $i$, $j$ run from 1 to $N$, where $N$ is the size of the network. In a directed network, the edges are formed by ordered pairs of nodes, so $(v_i, v_j)$ represents an edge from $v_i$ to $v_j$. $W$ is a set of edge weights, and $w_{ij}$ is the weight of the edge $(v_i, v_j)$. $G$ is defined as a weighted network if $w_{ij}$ may be any real number greater than 0. In this paper, we consider a directed weighted network, in which $w_{ij} > 0$ if there is an edge from $v_i$ to $v_j$ and $w_{ij} = 0$ otherwise.
Because $G$ is a directed network, each node $v_i$ has a weighted outdegree $s_i^{out}$ and a weighted indegree $s_i^{in}$. $s_i^{out}$ is the sum of the weights of the edges which point out from $v_i$. Similarly, $s_i^{in}$ is the sum of the weights of the edges which point to $v_i$. The weighted indegree and weighted outdegree of the nodes are thus given by the following expressions:
$$ s_i^{out} = \sum_{j \in \Gamma_i^{out}} w_{ij}, \qquad s_i^{in} = \sum_{j \in \Gamma_i^{in}} w_{ji}, $$
where $\Gamma_i^{out}$ is the set of nodes to which $v_i$ points. Analogously, $\Gamma_i^{in}$ is the set of nodes which point to $v_i$.
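The weighted outdegree and weighted indegree described above are simply row sums and column sums of the weight matrix. The following minimal sketch illustrates this on a small made-up network; the function name `weighted_degrees` and the example matrix are hypothetical and not taken from the paper.

```python
def weighted_degrees(W):
    """Return (out_strength, in_strength) for a square weight matrix W.

    W[i][j] is the weight of the edge from node i to node j (0 if absent).
    """
    n = len(W)
    # Weighted outdegree of node i: sum of the weights leaving i (row sum).
    out_strength = [sum(W[i]) for i in range(n)]
    # Weighted indegree of node j: sum of the weights entering j (column sum).
    in_strength = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return out_strength, in_strength


# Hypothetical 3-node network, used only for illustration.
W = [
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]
s_out, s_in = weighted_degrees(W)
print(s_out)  # [3.0, 3.0, 4.0]
print(s_in)   # [4.0, 2.0, 4.0]
```

Both lists add up to the same total, since every edge weight is counted once as outgoing and once as incoming.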
Generally speaking, the strength of a relationship depends primarily on two factors: the frequency of contact and the length of contact (intimacy). In different practical networks, the edge weight $w_{ij}$ has different meanings. In this paper, it represents the strength of the relationship between $v_i$ and $v_j$. If $v_i$ and $v_j$ represent persons, $w_{ij}$ measures the closeness between them. The larger the frequency of contact and the length of time of contact (intimacy) are, the closer the relationship between them is. These two factors are calculated based on [32] as follows.
(a) Frequency Factor. It depends on the frequency with which $v_i$ points to $v_j$, that is, the number of times that $v_i$ takes the initiative to meet with $v_j$:
$$ f_{ij} = \frac{n_{ij}}{\sum_{k} n_{kj}}. $$
Here, $f_{ij}$ is the frequency factor of $v_i$ pointing to $v_j$, and $n_{ij}$ is the number of times that $v_i$ takes the initiative to meet with $v_j$. $\sum_{k} n_{kj}$ is the number of times that all of the nodes take the initiative to meet with $v_j$.
(b) Length of Contact (Intimacy). This is the length of contact time. It depends on the length of time for which $v_i$ takes the initiative to contact $v_j$:
$$ t_{ij} = \frac{\tau_{ij}}{\sum_{k} \tau_{kj}}, $$
where $t_{ij}$ is the length of contact of $v_i$ pointing to $v_j$. $\tau_{ij}$ is the length of contact time for which $v_i$ takes the initiative to meet with $v_j$. $\sum_{k} \tau_{kj}$ is the length of contact time for which all of the nodes take the initiative to meet with $v_j$. Then
$$ w_{ij} = \alpha f_{ij} + \beta t_{ij}, $$
with $\alpha$ and $\beta$ variable parameters.
We should note that, generally speaking, $w_{ij} \neq w_{ji}$.
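As an illustration of how the two factors might be turned into edge weights, here is a minimal sketch. It rests on assumptions that are not confirmed by the text: each factor is normalized over all nodes that initiate contact with the same target, and the two factors are blended linearly with weights `alpha` and `1 - alpha`. The helper names and the contact logs are hypothetical.

```python
def column_share(M):
    """Divide each entry M[i][j] by the total of column j (all contacts received by j)."""
    n = len(M)
    totals = [sum(M[k][j] for k in range(n)) for j in range(n)]
    return [[M[i][j] / totals[j] if totals[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def edge_weights(counts, durations, alpha=0.5):
    """Blend the frequency and duration factors; beta = 1 - alpha is an assumption."""
    beta = 1.0 - alpha
    F = column_share(counts)     # frequency factor
    T = column_share(durations)  # length-of-contact factor
    n = len(counts)
    return [[alpha * F[i][j] + beta * T[i][j] for j in range(n)] for i in range(n)]


# Hypothetical contact logs: counts[i][j] = times i initiated contact with j,
# durations[i][j] = total time (for example, minutes) of those contacts.
counts = [[0, 3, 1], [1, 0, 1], [2, 1, 0]]
durations = [[0, 30, 5], [10, 0, 15], [10, 10, 0]]
W = edge_weights(counts, durations, alpha=0.5)
print(W[0][1], W[1][0])  # the two directions generally differ
```

Because contact initiated by one node need not be reciprocated equally, the resulting weight matrix is in general not symmetric.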
3. The TwoWayPageRank for Mining Important Nodes in Directed Weighted Complex Networks
Indeed, the importance of a node depends on its indegree (the source of its links) and its outdegree (the destination of its links). Hence, we consider the importance of a node from both its weighted indegree and its weighted outdegree simultaneously.
3.1. The Definition of the Importance of $v_i$ in Directed Weighted Complex Networks
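Let $R(v_i)$ be the importance of $v_i$ in directed weighted complex networks. We assume that the nodes which point to $v_j$ are $v_1, v_2, \ldots, v_k$ (including $v_i$), and then the weighted indegree of $v_j$ is as follows:
$$ s_j^{in} = \sum_{l=1}^{k} w_{lj}. $$
Then $v_i$ has got an importance of value $\frac{w_{ij}}{s_j^{in}} R(v_j)$ from the importance of $v_j$.
Likewise, we assume that the nodes to which $v_j$ points are $v_1, v_2, \ldots, v_h$ (including node $v_i$), and then the weighted outdegree of $v_j$ is defined by
$$ s_j^{out} = \sum_{l=1}^{h} w_{jl}. $$
Then node $v_i$ has got an importance of value $\frac{w_{ji}}{s_j^{out}} R(v_j)$ from the importance of $v_j$. Then
$$ R^{out}(v_i) = \frac{1 - d_1}{N} + d_1 \sum_{j \in \Gamma_i^{out}} \frac{w_{ij}}{s_j^{in}} R^{out}(v_j), \quad (9a) $$
$$ R^{in}(v_i) = \frac{1 - d_2}{N} + d_2 \sum_{j \in \Gamma_i^{in}} \frac{w_{ji}}{s_j^{out}} R^{in}(v_j), \quad (9b) $$
with $R^{out}(v_i)$ and $R^{in}(v_i)$ the importance of $v_i$ obtained from its outgoing and incoming edges, respectively, $\Gamma_i^{out}$ the set of nodes to which $v_i$ points, and $\Gamma_i^{in}$ the set of nodes which point to $v_i$. Moreover, $d_1$ and $d_2$ are random jump factors. $v_j$ in (9a) is the ending node of the edge that starts from node $v_i$. Likewise, $v_j$ in (9b) is the node which points into node $v_i$. $w_{ij}$ is the weight of the edge $(v_i, v_j)$ and $w_{ji}$ is the weight of the edge $(v_j, v_i)$.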
3.2. The Algorithm of the TwoWayPageRank
Let $A = (a_{ij})_{N \times N}$ be the adjacency matrix of the directed weighted complex network $G$, whose element $a_{ij}$ is the weight on the edge connecting $v_i$ to $v_j$, and 0 otherwise. We first normalize $A$ by rows; in other words, each element of the matrix is divided by the sum of the elements in its row. Thus we get the probability transition matrix $P = (p_{ij})_{N \times N}$, which can be written as
$$ p_{ij} = \frac{a_{ij}}{\sum_{k=1}^{N} a_{ik}}, $$
with $p_{ij}$ the transfer probability from $v_i$ to $v_j$. Obviously, each element of $P$ is nonnegative, and the sum of the elements of each row is 1, that is, $\sum_{j=1}^{N} p_{ij} = 1$. So $P$ is a stochastic matrix. Transposing $P$ gives the probability transition matrix $P^{T}$. The reason for transposing the matrix is that we consider the weighted indegrees of the nodes.
In addition, we normalize $A$ by columns; namely, each element of the matrix is divided by the sum of the elements in its column. We thus obtain the probability transition matrix $Q = (q_{ij})_{N \times N}$,
$$ q_{ij} = \frac{a_{ij}}{\sum_{k=1}^{N} a_{kj}}, $$
where $q_{ij}$ is the transfer probability from $v_j$ to $v_i$. It is not difficult to see that each element of $Q$ is nonnegative, and the sum of the elements of each column is 1, that is, $\sum_{i=1}^{N} q_{ij} = 1$. Likewise, $Q$ is a stochastic matrix. The reason for normalizing by columns is that we consider the weighted outdegrees of the nodes. Then, according to (9a) and (9b), the corresponding matrices can be written explicitly as
$$ G^{out} = d_1 Q + \frac{1 - d_1}{N} E, \qquad G^{in} = d_2 P^{T} + \frac{1 - d_2}{N} E, $$
in which $E$ is the $N \times N$ matrix whose elements are all 1 and $d_1$, $d_2$ are the random jump factors.
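The two normalizations can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the random jump is assumed to take the usual PageRank form (follow a link with probability `d`, jump uniformly otherwise), every node is assumed to have at least one incoming and one outgoing edge, and the function names and example matrix are hypothetical.

```python
def row_normalize(A):
    """Divide each entry by its row sum (all-zero rows stay zero)."""
    out = []
    for row in A:
        total = sum(row)
        out.append([x / total if total > 0 else 0.0 for x in row])
    return out


def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def col_normalize(A):
    """Divide each entry by its column sum, via the row version on the transpose."""
    return transpose(row_normalize(transpose(A)))


def with_random_jump(S, d):
    """Return d * S + (1 - d) / n * E, where E is the all-ones matrix (assumed form)."""
    n = len(S)
    return [[d * S[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


# Hypothetical 3-node weighted adjacency matrix: A[i][j] is the weight of i -> j.
A = [
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]
P_T = transpose(row_normalize(A))  # indegree side: columns sum to 1
Q = col_normalize(A)               # outdegree side: columns sum to 1
G_in = with_random_jump(P_T, d=0.85)
G_out = with_random_jump(Q, d=0.85)
```

Both resulting matrices have columns that sum to 1, which is what allows the iteration x <- G x to preserve the total importance.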
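It is not difficult to see that the matrices $G^{out}$ and $G^{in}$ obtained from (9a) and (9b) are irreducible stochastic matrices, and they have an eigenvalue equal to 1. The eigenvectors for eigenvalue 1 are the stationary distributions of $G^{out}$ and $G^{in}$.
We can use the power iteration method to compute the stationary distributions of $G^{out}$ and $G^{in}$. The iterative formulas are as follows:
$$ R^{out}_{k+1} = G^{out} R^{out}_{k}, \quad (16a) $$
$$ R^{in}_{k+1} = G^{in} R^{in}_{k}. \quad (16b) $$
Set the initial values of the importance of weighted outdegree and weighted indegree to $R^{out}_{0}$ and $R^{in}_{0}$, respectively. Here, for simplicity, let the initial vector be the ratio of the weighted outdegree (indegree) of $v_i$ to the sum of the weighted outdegrees (indegrees) of all nodes in the network:
$$ R^{out}_{0}(v_i) = \frac{s_i^{out}}{\sum_{j=1}^{N} s_j^{out}}, \qquad R^{in}_{0}(v_i) = \frac{s_i^{in}}{\sum_{j=1}^{N} s_j^{in}}. $$
Given a precision error $\varepsilon$, the iteration stops when $\| R^{out}_{k+1} - R^{out}_{k} \| < \varepsilon$ and $\| R^{in}_{k+1} - R^{in}_{k} \| < \varepsilon$. At this point we get the approximations $R^{out}$ and $R^{in}$. Finally, the overall importance vector $R$ is calculated as a weighted combination of $R^{out}$ and $R^{in}$. Further, we rank the elements of $R$ from largest to smallest. This is the order of the importance of the nodes.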
It is worth noticing, however, that this indicates .
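The power iteration with a precision threshold can be sketched as follows. The stopping rule used here (L1 distance between successive iterates) and the iteration cap `max_iter` are assumptions made for the sketch; the 2-by-2 matrix is a made-up example, not data from the paper.

```python
def power_iteration(G, x0, eps=1e-10, max_iter=10_000):
    """Iterate x <- G x until the L1 change between iterates is below eps."""
    x = list(x0)
    for _ in range(max_iter):
        # One matrix-vector product: entry i is the dot product of row i with x.
        x_next = [sum(g * v for g, v in zip(row, x)) for row in G]
        change = sum(abs(a - b) for a, b in zip(x_next, x))
        x = x_next
        if change < eps:
            break
    return x


# Hypothetical 2-state column-stochastic matrix (each column sums to 1).
G = [
    [0.5, 0.25],
    [0.5, 0.75],
]
x = power_iteration(G, x0=[0.5, 0.5])
print(x)  # close to [1/3, 2/3], the eigenvector for eigenvalue 1
```

Because the columns of the matrix sum to 1, the entries of the iterate keep summing to 1 throughout, so no renormalization step is needed.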
The steps of the algorithm for mining important nodes in directed weighted complex networks are as follows.
Step 1. Give the adjacency matrix $A$ of the network $G$.
Step 2. Normalize the adjacency matrix $A$ by rows and by columns to get the probability transition matrices $P$ and $Q$.
Step 3. Transpose the probability transition matrix $P$ to get the probability transition matrix $P^{T}$.
Step 4. Calculate the matrices $G^{out}$ and $G^{in}$ of the directed weighted complex network according to (9a) and (9b).
Step 5. Solve for the stationary distributions of $G^{out}$ and $G^{in}$ using the power iteration method; that is, calculate $R^{out}$ and $R^{in}$.
Step 6. Compute the overall importance vector $R$ and rank its elements from largest to smallest. This is the order of the importance of the nodes.
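The six steps above can be put together in one short function. This is a sketch under several assumptions that the text does not confirm: a single damping value `d` is used on both sides, the final score is the convex combination `theta * r_out + (1 - theta) * r_in`, and every node has at least one incoming and one outgoing edge. The name `two_way_pagerank` and the example matrix are hypothetical.

```python
def two_way_pagerank(A, d=0.85, theta=0.5, eps=1e-12, max_iter=10_000):
    """Sketch of the six steps for a weight matrix A (A[i][j]: weight of i -> j).

    Assumes every node has at least one incoming and one outgoing edge.
    Returns (scores, ranking), with ranking listing node indices, best first.
    """
    n = len(A)
    s_out = [sum(A[i]) for i in range(n)]                      # weighted outdegrees
    s_in = [sum(A[i][j] for i in range(n)) for j in range(n)]  # weighted indegrees

    # Steps 2-4: column-stochastic matrices with a uniform random jump.
    # g_in[i][j]: share of j's outgoing weight that goes to i (transposed row normalization).
    # g_out[i][j]: share of j's incoming weight that comes from i (column normalization).
    jump = (1.0 - d) / n
    g_in = [[d * A[j][i] / s_out[j] + jump for j in range(n)] for i in range(n)]
    g_out = [[d * A[i][j] / s_in[j] + jump for j in range(n)] for i in range(n)]

    def stationary(G, x):
        # Step 5: power iteration, stopping on a small L1 change.
        for _ in range(max_iter):
            nxt = [sum(g * v for g, v in zip(row, x)) for row in G]
            done = sum(abs(a - b) for a, b in zip(nxt, x)) < eps
            x = nxt
            if done:
                break
        return x

    total = sum(s_out)  # equals sum(s_in): every edge is counted once on each side
    r_out = stationary(g_out, [s / total for s in s_out])
    r_in = stationary(g_in, [s / total for s in s_in])

    # Step 6: weighted combination (the form is an assumption) and ranking.
    scores = [theta * o + (1.0 - theta) * i for o, i in zip(r_out, r_in)]
    ranking = sorted(range(n), key=lambda k: scores[k], reverse=True)
    return scores, ranking


# Hypothetical 3-node network, used only for illustration.
A = [
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]
scores, ranking = two_way_pagerank(A)
print(scores, ranking)
```

A quick sanity check is a directed cycle with equal weights, where symmetry forces every node to receive the same score of 1/N.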
4. Experiment Simulation
In this section, we show the application of the method on a directed weighted network (see Figure 1), whose adjacency matrix is denoted by $A$.
According to (16a) and (16b), we obtain the stationary vectors for the weighted outdegree and for the weighted indegree of the nodes. Further, the combined importance values are computed for three different values of the combination parameter, as shown in Table 1. The corresponding ranks of the nodes by importance are shown in Table 2.


In an actual network, the weighted outdegree and the weighted indegree of nodes should be considered together, weighted according to the actual situation, in order to mine important nodes more accurately. Figures 2–4 compare the combined importance $R$ with the outdegree-based importance $R^{out}$ and the indegree-based importance $R^{in}$ for several networks when the combination parameter takes different values. In Figure 2, the rank of $R$ basically agrees with the rank of $R^{out}$. This is because we mainly consider the impact of the outdegree of a node on the importance of the node. For realizations of random networks, the outdegree provides almost complete information. In Figure 3, the rank of $R$ is associated with both $R^{out}$ and $R^{in}$ at the same time. This is because we give them almost the same weight. In Figure 4, the rank of $R$ basically agrees with the rank of $R^{in}$. This is because we mainly consider the impact of the indegree of a node on the importance of the node. For realizations of random networks, the indegree provides almost complete information. In this case, the numerical calculation agrees with the result obtained using PageRank.
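The effect of the combination parameter on the ranking can be illustrated with a small sketch. The convex combination used here is an assumed form, and the two input vectors are made-up values rather than results from the paper.

```python
def rank_nodes(r_out, r_in, theta):
    """Rank node indices by theta * r_out + (1 - theta) * r_in, largest first."""
    scores = [theta * o + (1.0 - theta) * i for o, i in zip(r_out, r_in)]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Hypothetical stationary vectors for three nodes (not taken from the paper).
r_out = [0.5, 0.3, 0.2]
r_in = [0.1, 0.2, 0.7]

for theta in (1.0, 0.5, 0.0):
    print(theta, rank_nodes(r_out, r_in, theta))
```

With these made-up vectors the ranking moves from [0, 1, 2] at theta = 1 (outdegree side only) through [2, 0, 1] at theta = 0.5 to [2, 1, 0] at theta = 0 (indegree side only), mirroring the qualitative behavior described for Figures 2–4.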
The combined importance depends on both the indegree and the outdegree of a node, so considering only the indegree or only the outdegree is clearly not enough. When the indegree and outdegree of the nodes are considered simultaneously, we can better find the important nodes. The example shows that the TwoWayPageRank method mines the important nodes well.
5. Conclusions and Discussion
Recently, research on complex networks has shown that some real networks contain important nodes, that is, nodes that play an important role in the actual network and can control the entire network. Different physical quantities have been used in the definitions of important nodes of complex networks. However, existing studies on important nodes have mainly focused on undirected unweighted complex networks, and such analyses do not accurately reflect the actual information in the networks.
In this paper, we therefore addressed the problem of mining important nodes in directed weighted complex networks by constructing a novel TwoWayPageRank analysis method. We have presented a quantifiable metric and shown how it can be used to analyze the relative importance of nodes in a network with respect to the contributions that nodes make to the overall network connectivity. Numerical examples of real directed weighted complex networks show that when only the indegree or only the outdegree of a node is considered, the importance of the node cannot be well characterized. The proposed TwoWayPageRank analysis method can reveal the importance of nodes in directed weighted complex networks such as infectious disease networks and social networks. These results not only deepen our understanding of the interplay between network topology and dynamical processes but also have implications in all areas where ranking has a role, from social networks to marketing.
Our algorithm has been verified on small networks. In future work, we will build a real data set and further verify the algorithm. In addition, two questions remain open: what is the relationship between the accuracy of the results and the number of iterations, and how can important nodes be mined in directed weighted dynamic complex networks? We hope to address these problems more systematically in future work.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
This work was supported by the Innovation Foundations of Education for Graduate Students of Shanxi Province (no. 2016BY061) and by the National Natural Science Foundation of China (nos. 61503271, 61402319, and 61603267). Additionally, the authors would like to thank Z. M. Gao for valuable insights on the analysis of the experimental results and useful feedback on the manuscript. Finally, they particularly thank Y. P. Liang, who has expertise in technical English editing, for improving the English of this manuscript.
References
[1] K. Thulasiraman and M. N. S. Swamy, Graphs: Theory and Algorithms, John Wiley & Sons, 2011.
[2] P. S. Dodds, R. Muhamad, and D. J. Watts, “An experimental study of search in global social networks,” Science, vol. 301, no. 5634, pp. 827–829, 2003.
[3] M. E. Newman, Networks: An Introduction, Oxford University Press, Oxford, UK, 2010.
[4] H. Wang, J. Huang, X. Xu, and Y. Xiao, “Damage attack on complex networks,” Physica A: Statistical Mechanics and its Applications, vol. 408, pp. 134–148, 2014.
[5] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free networks,” Physical Review Letters, vol. 86, no. 14, pp. 3200–3203, 2001.
[6] P. Bonacich, “Factoring and weighting approaches to status scores and clique identification,” The Journal of Mathematical Sociology, vol. 2, no. 1, pp. 113–120, 1972.
[7] P. Bonacich, “Power and centrality: a family of measures,” American Journal of Sociology, vol. 92, no. 5, pp. 1170–1182, 1987.
[8] M. Kitsak, L. K. Gallos, S. Havlin et al., “Identification of influential spreaders in complex networks,” Nature Physics, vol. 6, no. 11, pp. 888–893, 2010.
[9] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social Networks, vol. 1, no. 3, pp. 215–239, 1978.
[10] E. Estrada and J. A. Rodriguez-Velazquez, “Subgraph centrality in complex networks,” Physical Review E, vol. 71, no. 5, pp. 1539–3755, 2005.
[11] K. Stephenson and M. Zelen, “Rethinking centrality: methods and examples,” Social Networks, vol. 11, no. 1, pp. 1–37, 1989.
[12] R. Poulin, M.-C. Boily, and B. R. Mâsse, “Dynamical systems to define centrality in social networks,” Social Networks, vol. 22, no. 3, pp. 187–220, 2000.
[13] X. L. Ren and L. Y. Lv, “Review of ranking nodes in complex networks,” Chinese Science Bulletin, vol. 13, pp. 4–7, 2014.
[14] L. Wang and J. J. Zhang, “Centralization of complex networks,” Complex System and Complex Science, vol. 3, no. 1, pp. 13–20, 2006.
[15] N. He, D. Y. Li, W. Y. Gan, and X. Zhu, “Mining vital nodes in complex networks,” Computer Science, vol. 34, no. 12, pp. 1–5, 2008.
[16] R. Sun and W. B. Luo, “Review on evaluation of node importance in public opinion,” Application Research of Computers, vol. 29, no. 10, pp. 3606–3608, 2012.
[17] J. G. Liu, Z. M. Ren, Q. Guo, and B. H. Wang, “Node importance ranking of complex networks,” Acta Physica Sinica, vol. 62, no. 17, Article ID 178901, 2013.
[18] N. Sett, S. Ranbir Singh, and S. Nandi, “Influence of edge weight on node proximity based link prediction methods: an empirical analysis,” Neurocomputing, vol. 172, pp. 71–83, 2016.
[19] L. Lü, D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang, and T. Zhou, “Vital nodes identification in complex networks,” Physics Reports, vol. 650, pp. 1–63, 2016.
[20] D. J. Robinaugh, A. J. Millner, and R. J. McNally, “Identifying highly influential nodes in the complicated grief network,” Journal of Abnormal Psychology, vol. 125, no. 6, pp. 747–757, 2016.
[21] L. Lü, T. Zhou, Q.-M. Zhang, and H. E. Stanley, “The H-index of a network node and its relation to degree and coreness,” Nature Communications, vol. 7, Article ID 10168, 2016.
[22] M. Y. Hu, Identification Method for Key Nodes in Directed-Weighted Complex Networks Based on Link Structures, Nanjing University of Science and Technology, 2012.
[23] L. H. Chen, Research of Stability and Identification of Key Nodes in Directed-Weighted Complex Networks, Nanjing University of Science and Technology, 2014.
[24] S. W. Li, Research of Weighted Complex Network Evolution Model and Vital Nodes, Hefei University of Technology, 2010.
[25] S. Wang, Y. Du, and Y. Deng, “A new measure of identifying influential nodes: efficiency centrality,” Communications in Nonlinear Science and Numerical Simulation, vol. 47, pp. 151–163, 2017.
[26] Y. Yang and G. Xie, “Efficient identification of node importance in social networks,” Information Processing & Management, vol. 52, no. 5, pp. 911–922, 2016.
[27] R. Zhang, X. Ran, C. Wang, and Y. Deng, “Fuzzy evaluation of network vulnerability,” Quality and Reliability Engineering International, vol. 32, no. 5, pp. 1715–1730, 2016.
[28] X. Zhou, Y. Shi, X. Deng, and Y. Deng, “D-DEMATEL: a new method to identify critical success factors in emergency management,” Safety Science, vol. 91, pp. 93–104, 2017.
[29] H. M. Mo and Y. Deng, “A new aggregating operator for linguistic information based on D numbers,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 24, no. 6, pp. 831–846, 2016.
[30] H. Mo, C. Gao, and Y. Deng, “Evidential method to identify influential nodes in complex networks,” Journal of Systems Engineering and Electronics, vol. 26, no. 2, Article ID 7111175, pp. 381–387, 2015.
[31] S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine,” Computer Networks and ISDN Systems, vol. 30, no. 1, pp. 107–117, 1998.
[32] K. Zhang, P. Li, B. Zhu, and M. Hu, “Evaluation method for node importance in directed-weighted complex networks based on PageRank,” Journal of Nanjing University of Aeronautics and Astronautics, vol. 45, no. 3, pp. 429–434, 2013.
Copyright
Copyright © 2017 Yunyun Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.