Abstract

We define several novel centrality metrics: the high-order degree and combined degree of undirected networks, and the high-order out-degree and in-degree and the combined out-degree and in-degree of directed networks. These metrics measure node importance in terms of the number of a node's neighbors. We also explore these centrality metrics in the context of several well-known networks. We prove that both degree centrality and eigenvector centrality are special cases of the high-order degree of undirected networks, and that both the in-degree and the PageRank algorithm without damping factor are special cases of the high-order in-degree of directed networks. Finally, we discuss the significance of the high-order out-degree of directed networks. Our centrality metrics distinguish nodes better than degree does and reduce the computational load compared with either eigenvector centrality or the PageRank algorithm.

1. Introduction

Network theory has developed rapidly since the late 1990s. One of the most active topics is the study of network attributes. Node degree has always been considered one of the most important and fundamental attributes. Many related studies have defined other network attributes based on node degree [1, 2], such as the degree distribution [3, 4], the clustering coefficient [5, 6], the characteristic path length [6], and so on. As early as the 1960s, Rapoport [3, 4] emphasized the importance of the degree distribution in all kinds of real networks. Wasserman and Faust [5] introduced the fraction of transitive triples in social networks in 1994. In order to describe the cliquishness of a typical neighborhood, Watts and Strogatz [6] defined the clustering coefficient of general complex networks in 1998 based on the fraction of transitive triples. Watts and Strogatz [6] also defined the characteristic path length to measure the typical separation between two nodes in the network. In more recent years, numerous studies have paid attention to the degree distribution [7–10]. Early studies assumed that the degree distribution follows a Poisson distribution, as random network theory describes [11, 12]. More recent studies have found that the degree distributions of a large number of networks, such as the World Wide Web [13], the Internet [14], metabolic networks [15], genome-wide disruption networks for yeast [16], and the network of interregional direct investment stocks across Europe [17], have a power-law tail. Such networks are called scale-free [18].

When a practical problem is transformed into a complex network model, node centrality is commonly used to describe the importance and influence of a node or to sort the nodes [19]. Node degree (or degree centrality) is one of the basic methods of sorting nodes [20–29]. Other common methods are also based on node degree [1, 2]. As long ago as 1948, Bavelas [30] studied the center of social networks. Sabidussi et al. [31] defined the closeness of a network node. Freeman [32] used node degree and betweenness to define two kinds of node centrality and, furthermore, used node centrality to define graph centrality. Eigenvector centrality [33, 34] is often used to describe the importance of nodes in social networks. In 1998, Brin and Page [35, 36] simplified the eigenvector centrality for undirected networks into the PageRank algorithm, which is widely used in the Google search engine [36] and in many other directed networks [37–42]. By using the degree distribution of neighbor nodes, Ai [43] defined the neighbor vector centrality. Zeng [44] implemented the Mixed Degree Decomposition (MDD) procedure by using coreness centrality and defined the mixed degree of nodes. Bae [45] defined the kernel centrality of a node by using the kernel of its neighbor nodes.

Degree centrality has the advantage of being simple to calculate when used for sorting nodes, but the results are not accurate enough. Therefore, it may require verification by other methods [20] or other network attributes [27, 28]. Closeness centrality and betweenness centrality are often reported alongside degree centrality for comparison [27, 28]. Computing closeness centrality and betweenness centrality is so expensive that large networks often need fast approximate algorithms [46–48]. Even though eigenvector centrality on undirected networks and PageRank on directed networks give satisfying results in node sorting, these two methods often involve expensive computations, such as iterations [42]. In the following sections, we define several novel centrality metrics that are cheaper to compute than closeness centrality, betweenness centrality, eigenvector centrality on undirected networks, and PageRank on directed networks. We show that node degree and eigenvector centrality on undirected networks, and in-degree and PageRank on directed networks, are all special cases of (or equivalent to) one of our novel centrality metrics (see Sections 4.1 and 4.2). It should be pointed out that the combined degree defined in this paper differs from the mixed degree of Zeng [44], which is based on the Mixed Degree Decomposition of coreness centrality (see Section 4.4).

2. Methods

Random walks have always been one of the most important tools in the study of complex networks [49–53]. Noh et al. [49] derived the mean first passage time (MFPT) between any two nodes by using random walks on complex networks. Tejedor [50] computed the MFPT of a network based on a broad class of random walks. Rosvall et al. [51] used the probability flow of random walks on a network as a proxy for information flow in the real system. Saramäki et al. [52] generated scale-free networks by selecting parent nodes with random walks. Weng [53] used the newly defined mean first traverse distance (MFTD) to describe anomalous random walks. Now consider an idealized random walk on a simple network (which may be undirected or directed): choose a node of the network, which we call the origin node, and put a ball at the origin node. The ball then splits repeatedly and walks randomly according to the following rules.

(i) Splitting: at a node $v$, the ball splits into $d(v)$ (or $d_{\mathrm{out}}(v)$) balls, where $d(v)$ is the degree of $v$ in an undirected network and $d_{\mathrm{out}}(v)$ is the out-degree of $v$ in a directed network.

(ii) Random walk: after every split, the balls move to the adjacent nodes along the edges, one ball along each edge; in a directed network they can only move along outgoing edges.

(iii) Disappearance: balls that can no longer walk disappear, for example, at isolated nodes or at dead-end nodes of a directed network.

The number of times the random walk can be repeated depends on the choice of network and of origin node, as illustrated in Figure 1.

In an undirected network, the number of balls after $s$ splits is defined as the $s$-order degree of the origin node $v$, denoted by $d^{(s)}(v)$. Clearly, if $s = 1$, $d^{(1)}(v)$ is the degree of $v$ (see Theorem 2, Section 4.1). In a directed network, the number of balls after $s$ splits is called the $s$-order out-degree of the origin node $v$, denoted by $d_{\mathrm{out}}^{(s)}(v)$. Likewise, when $s = 1$, $d_{\mathrm{out}}^{(1)}(v)$ is the out-degree of $v$ (see Theorem 3, Section 4.2).

Consider a more complicated random walk on a directed network: select a network node $u$, which we call the sink node. Then we put one ball at every node in the network (including the sink node $u$), and all of the balls follow the above rules to split repeatedly and to walk randomly. The number of balls at the sink node after $s$ steps of the random walk is defined as the $s$-order in-degree of the sink node $u$, denoted by $d_{\mathrm{in}}^{(s)}(u)$. Clearly, if $s = 1$, $d_{\mathrm{in}}^{(1)}(u)$ is the in-degree of $u$ (see Theorem 3, Section 4.2).
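The split-and-walk rules can be simulated directly. The following is a minimal Python sketch, not the authors' implementation; the function names and the adjacency-list representation are assumptions made for illustration, and each ball is assumed to send exactly one copy along every outgoing edge.

```python
from collections import Counter

def split_and_walk(out_neighbors, start_counts, steps):
    """Apply the split-and-move rule `steps` times and return ball counts.

    out_neighbors maps each node to the nodes its balls can move to
    (all neighbors in an undirected network, out-neighbors in a directed one).
    start_counts maps nodes to the number of balls they hold initially.
    """
    counts = Counter(start_counts)
    for _ in range(steps):
        nxt = Counter()
        for node, balls in counts.items():
            # Every ball sends one copy along each outgoing edge; a node with
            # no outgoing edge sends nothing, so its balls disappear.
            for nb in out_neighbors.get(node, []):
                nxt[nb] += balls
        counts = nxt
    return counts

def order_out_degree(out_neighbors, origin, s):
    """Balls in the network after s splits of a single ball placed at origin."""
    return sum(split_and_walk(out_neighbors, {origin: 1}, s).values())

def order_in_degree(out_neighbors, sink, s):
    """Balls at sink after s steps when every node starts with one ball."""
    nodes = set(out_neighbors) | {v for nbs in out_neighbors.values() for v in nbs}
    return split_and_walk(out_neighbors, {v: 1 for v in nodes}, s)[sink]

# Directed path 1 -> 2 -> 3 (a hypothetical example network).
directed_path = {1: [2], 2: [3], 3: []}
print([order_out_degree(directed_path, v, 2) for v in (1, 2, 3)])  # [1, 0, 0]
print([order_in_degree(directed_path, v, 2) for v in (1, 2, 3)])   # [0, 0, 1]
```

For an undirected network the same functions apply if every edge is listed in both directions, in which case `order_out_degree` returns the high-order degree.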

2.1. 2-Order Degree

For an undirected network, according to the above definition, the 2-order degree of a node $v$ is the number of two-edge paths connected to $v$. The two edges of such a path may coincide. For example, in Figure 1(e), each node has a single two-edge path that traverses the same edge twice, so the 2-order degree of both nodes is 1. Note that the 2-order degree is not necessarily equal to the number of neighbors of a node's neighbors, since there may be more than one two-edge path between two nodes (and the two nodes may be the same one). The computation of the 2-order degree is relatively simple. Suppose $A$ is the adjacency matrix of the undirected network $G$; we call $A^2$ the 2-order adjacency matrix, and the sum of the elements in each row (or each column) of $A^2$ is the 2-order degree of the corresponding node. Let $r_i^{(2)}$ denote the sum of the $i$-th row of $A^2$; then the 2-order degree of the $i$-th node $v_i$ of the undirected network is $d^{(2)}(v_i) = r_i^{(2)}$.

A directed graph has both a 2-order in-degree and a 2-order out-degree for each node, which are the numbers of incoming and outgoing two-edge paths, respectively. Still denote the adjacency matrix by $A$ (with $A_{ij} = 1$ for an edge from node $i$ to node $j$), and let $r_i^{(2)}$ and $c_i^{(2)}$ denote the sums of the $i$-th row and the $i$-th column of the 2-order adjacency matrix $A^2$, respectively. Then the 2-order out-degree and the 2-order in-degree of the $i$-th node in this directed network are $r_i^{(2)}$ and $c_i^{(2)}$, respectively.
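As a small illustration of the row-sum and column-sum computation, the sketch below squares an adjacency matrix in plain Python. The helper names are hypothetical, and the convention that `adj[i][j] = 1` marks an edge from node `i` to node `j` is an assumption chosen so that row sums correspond to out-degrees.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def two_order_degrees(adj):
    """Return (row sums, column sums) of the 2-order adjacency matrix adj^2."""
    sq = mat_mul(adj, adj)
    row_sums = [sum(row) for row in sq]        # 2-order out-degrees
    col_sums = [sum(col) for col in zip(*sq)]  # 2-order in-degrees
    return row_sums, col_sums

# Directed 3-cycle 1 -> 2 -> 3 -> 1 (a hypothetical example network).
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
out2, in2 = two_order_degrees(cycle)
print(out2, in2)  # [1, 1, 1] [1, 1, 1]
```

For a symmetric adjacency matrix the two returned lists coincide and give the 2-order degree.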

If $G$ has vertices $v_1, v_2, \ldots, v_n$, the sequence $(d(v_1), d(v_2), \ldots, d(v_n))$ is called the degree sequence of $G$ [54]. As is well known, the mean and the distribution of the degree sequence give the mean degree and the degree distribution. In the same way, the definition of the 2-order degree yields the mean 2-order degree and the 2-order degree distribution of the network.

In Figure 1, (a) the 2-order degree of each of the three nodes is 2; (b) the 2-order out-degree and 2-order in-degree of each of the three nodes are 1; (c) the 2-order out-degree and 2-order in-degree of each of the three nodes are 0; (d) the 2-order out-degrees of the three nodes are 1, 0, 0, and the 2-order in-degrees of the three nodes are 0, 0, 1; (e) the 2-order degree of each of the two nodes is 1.

Figure 2 shows a relatively complicated undirected network (a) and a relatively complicated directed network (b). Table 1 lists the 2-order (in-/out-)degrees of all nodes and the mean 2-order degrees of the two networks in Figure 2. As Table 1 shows, node 5 has the highest 2-order degree in Figure 2(a), which agrees with the intuition that node 5 is the most important node in Figure 2(a). This conclusion cannot be obtained simply by calculating node degree, although it can be obtained by calculating betweenness centrality, eigenvector centrality, and so forth. Table 1 also shows that the nodes with the highest 2-order out-degree are 1, 2, and 5, and that the node with the highest 2-order in-degree is 10. Therefore, nodes 1, 2, and 5 can be regarded as the source nodes of Figure 2(b), and node 10 as its collection node. In general, reaching such conclusions requires running the PageRank algorithm on the entire network.

2.2. $s$-Order Degree and Its Computation

As with the 2-order degree, suppose $A$ is the adjacency matrix of a network $G$; we call $A^s$ the $s$-order adjacency matrix of $G$. The matrix $A^s$ has been used to count the walks of length $s$ between nodes [54–56]. Let $r_i^{(s)}$ and $c_i^{(s)}$ denote the sums of the $i$-th row and the $i$-th column of $A^s$, respectively; then the $s$-order degree of node $v_i$ in an undirected network is $r_i^{(s)} = c_i^{(s)}$ (if $G$ is a directed network, then the $s$-order out-degree is $r_i^{(s)}$ and the $s$-order in-degree is $c_i^{(s)}$). The mean $s$-order (out-/in-)degree and the $s$-order (out-/in-)degree distribution are defined in the same way.

The computation of the $s$-order degree can make use of the adjacency matrix, but this involves matrix exponentiation. The order of the matrix equals the number of nodes in the network, and a network may have thousands or tens of thousands of nodes. Therefore, even though computing the $s$-order adjacency matrix involves only matrix exponentiation and integer additions, it places heavy demands on the computer's CPU and memory. A simple way to tackle this problem is to compute the lower-order degree sequence of the network first and then compute the higher-order degree sequence from it, as stated in Theorem 1, which can be proved by mathematical induction.

Theorem 1. For the $s$-order degree sequence (written as a column vector), the following conclusions hold.

(i) If $D^{(s)}$ is the $s$-order degree sequence of an undirected network $G$ and $A$ is its adjacency matrix, then the $(s+1)$-order degree sequence of $G$ is $A D^{(s)}$; in particular, if $D$ is the degree sequence of $G$, then the $s$-order degree sequence of $G$ is $A^{s-1} D$.

(ii) Similarly, if $D_{\mathrm{out}}^{(s)}$ and $D_{\mathrm{in}}^{(s)}$ are the $s$-order out-degree sequence and the $s$-order in-degree sequence of a directed network $G$, respectively, and $A$ is its adjacency matrix, then the $(s+1)$-order out-degree sequence and the $(s+1)$-order in-degree sequence are $A D_{\mathrm{out}}^{(s)}$ and $A^{T} D_{\mathrm{in}}^{(s)}$, respectively; in particular, if $D_{\mathrm{out}}$ and $D_{\mathrm{in}}$ are the out-degree sequence and the in-degree sequence of $G$, then the $s$-order out-degree sequence and the $s$-order in-degree sequence are $A^{s-1} D_{\mathrm{out}}$ and $(A^{T})^{s-1} D_{\mathrm{in}}$, respectively.
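The recurrence in Theorem 1 can be applied without forming any matrix power. The sketch below is one possible Python rendering under the assumption that the network is stored as adjacency lists; the function names are hypothetical. Each step sums the previous sequence over the neighbors of every node, so its cost is proportional to the number of edges.

```python
def next_order(neighbors, seq):
    """One recurrence step: entry v of A times seq, i.e. seq summed over v's neighbors."""
    return {v: sum(seq[u] for u in nbs) for v, nbs in neighbors.items()}

def degree_sequence(neighbors, s):
    """Return the s-order degree sequence (s >= 1) as a dict node -> value."""
    seq = {v: len(nbs) for v, nbs in neighbors.items()}  # 1-order: plain degree
    for _ in range(s - 1):
        seq = next_order(neighbors, seq)
    return seq

# Undirected path 0 - 1 - 2 - 3 (a hypothetical example network).
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(degree_sequence(path4, 3))  # {0: 3, 1: 5, 2: 5, 3: 3}
```

Passing out-neighbor lists gives the high-order out-degree sequence of a directed network, and passing in-neighbor lists gives the high-order in-degree sequence.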

2.3. Combined Degree

Based on the $s$-order degree defined above, we define the combined degree of a node $v$ of an undirected network as

$$d_c(v) = \sum_{k=1}^{m} \lambda_k \, d^{(k)}(v),$$

where $d^{(k)}(v)$ denotes the $k$-order degree of node $v$, and the constants $\lambda_1, \ldots, \lambda_m$ are nonnegative real numbers that sum to 1. If $\lambda_1 = 1$ and all other $\lambda_k = 0$, the combined degree is the ordinary node degree; if $\lambda_2 = 1$ and all other $\lambda_k = 0$, the combined degree is the 2-order degree; and so forth. For the values of the parameters, we usually consider the case $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, where $\lambda_k \ge \lambda_{k+1}$ indicates that the influence on $v$ of a single $k$-order neighbor is no less than that of a single $(k+1)$-order neighbor. Notice that the constants $\lambda_k$ need not be integers; therefore the combined degree is usually not an integer. Since the combined degree is a combination of the ordinary degree and various high-order degrees, we need not discuss the mean combined degree and the combined degree distribution. In the same way, we can define the combined in-degree and combined out-degree of a directed network. According to Theorem 1, the combined degree sequence is given by

$$D_c = \left(\lambda_1 I + \lambda_2 A + \cdots + \lambda_m A^{m-1}\right) D,$$

where $D$ is the degree sequence, $A$ is the adjacency matrix, and $I$ is the identity matrix of order $n$. Here we omit the formulas for the combined in-degree and combined out-degree sequences of directed networks.
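A combined degree can be accumulated while the degree sequences of successive orders are generated. The following Python sketch assumes an adjacency-list representation and a list of weights ordered from the 1-order degree upwards; the function name and the example weights are illustrative assumptions, not values taken from the paper.

```python
def combined_degree(neighbors, weights):
    """Weighted sum of the 1-order, 2-order, ... degree sequences.

    weights[k] multiplies the (k+1)-order degree; the weights are expected
    to be nonnegative and to sum to 1.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    seq = {v: len(nbs) for v, nbs in neighbors.items()}  # 1-order degree
    total = {v: 0.0 for v in neighbors}
    for w in weights:
        for v in neighbors:
            total[v] += w * seq[v]
        # Move on to the next order by summing over neighbors.
        seq = {v: sum(seq[u] for u in nbs) for v, nbs in neighbors.items()}
    return total

# Undirected path 0 - 1 - 2 - 3 with equal weight on the first two orders.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(combined_degree(path4, [0.5, 0.5]))  # {0: 1.5, 1: 2.5, 2: 2.5, 3: 1.5}
```

With weights `[1.0]` the function returns the ordinary degree, and with `[0.0, 1.0]` it returns the 2-order degree.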

3. Results

In order to compare the behavior of degree with that of high-order degree and combined degree, we apply these attributes to several well-known networks. We regard a network attribute as better behaved if it reveals the differences among nodes more clearly. Among the measures of the spread of a data set, the standard deviation is the most important. The greater the standard deviation, the more diverse the data, the better the discrimination, and the better the attribute behaves. As a complement to the standard deviation, we define a novel parameter, the overflow ratio: the ratio of the number of elements lying outside a given interval around the mean value of the data set to the number of elements in the data set. The greater the overflow ratio, the less the data cluster around the mean, the better the discrimination, and the better the attribute behaves. These two parameters therefore both reflect the diversity of a data set to some extent. Clearly, if a network attribute shows more diversity, it is easier to distinguish or sort the nodes with it.
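The two discrimination measures can be sketched in a few lines of Python. The exact interval used for the overflow ratio is not recoverable from the text here, so the version below assumes a symmetric band of half-width `half_width` around the mean; that parameter, the function name, and the use of the population standard deviation are all assumptions for illustration.

```python
from statistics import mean, pstdev

def overflow_ratio(values, half_width):
    """Fraction of values farther than half_width from the mean (assumed form)."""
    mu = mean(values)
    outside = sum(1 for x in values if abs(x - mu) > half_width)
    return outside / len(values)

# A hypothetical degree sequence.
degrees = [1, 3, 3, 3, 5]
print(pstdev(degrees))             # population standard deviation
print(overflow_ratio(degrees, 1))  # 0.4
```

Both numbers grow as the values spread away from the mean, which is the sense in which they are used as discrimination measures.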

3.1. Random Network

The random network model was first put forward by Paul Erdős and Alfréd Rényi in 1960 (the ER random network) [11]. Here we generate random networks and investigate how different values of the node connection probability affect degree and high-order degree. Figure 3 shows the results of 1000 simulations for the node degree distribution, 2-order degree distribution, 4-order degree distribution, and combined degree distribution. The results show that the two high-order degrees and the combined degree of a random network preserve the shape of the degree distribution, which resembles a Poisson distribution (or normal distribution) peaked at the mean degree or mean high-order degree.

Table 2 gives the discrimination of some attributes for a fixed number of nodes and a fixed connection probability. Since the node degree distribution and the high-order degree and combined degree distributions all resemble a Poisson distribution (or normal distribution), we take large values for the overflow ratios. Table 2 also shows that high-order degree discriminates better than ordinary node degree, which remains true when these parameters change (see Supplementary Table 1).

3.2. Small World Model

Watts [57] gave the basic attributes of the small world model in 1999. Watts and Strogatz [6] proposed constructing a small world network by rewiring edges, which yields the WS small world network. Monasson [58] and Newman et al. [59] initiated the construction of small world networks by adding edges, which yields the NW small world network. In this subsection we construct a small world network by adding edges and investigate the behavior of high-order degrees in it. Firstly, we construct a regular network in which each node is connected to a fixed number of its nearest nodes. Then, for a given probability, we randomly add edges to produce the small world network. Figure 4 shows that the scatter plots of the 2-order, 3-order, and 4-order degree distributions all resemble a Poisson distribution (or normal distribution), as does the degree distribution of the small world network.
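The construction by edge addition can be sketched as follows. The text does not fix the details, so this Python sketch makes several assumptions: the regular network is a ring in which each node is joined to its `k` nearest nodes (`k` even), every pair of nodes that is not yet adjacent receives an extra edge independently with probability `p`, and the names `n`, `k`, `p`, and `seed` are illustrative.

```python
import random

def small_world_by_addition(n, k, p, seed=None):
    """Ring lattice on n nodes (k nearest nodes each) plus random extra edges."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    rng = random.Random(seed)
    neighbors = {v: set() for v in range(n)}
    # Regular part: join each node to the k/2 nearest nodes on either side.
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            neighbors[v].add(u)
            neighbors[u].add(v)
    # Random part: add each missing edge with probability p.
    for v in range(n):
        for u in range(v + 1, n):
            if u not in neighbors[v] and rng.random() < p:
                neighbors[v].add(u)
                neighbors[u].add(v)
    return neighbors

network = small_world_by_addition(n=20, k=4, p=0.05, seed=1)
print(sorted(len(nbs) for nbs in network.values()))
```

Because edges are only added and never removed, every node keeps at least its `k` lattice neighbors, which is the main difference from construction by rewiring.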

For the small world network, as for the random network, we are mainly concerned with discrimination. Table 3 shows the mean discrimination over 1000 simulations for degree, 2-order degree, 4-order degree, and 8-order degree, for given values of the number of nodes, the initial edge connection number, and the connection probability. In all situations, 4-order degree and 8-order degree discriminate better than the others. Sometimes, however, the 2-order degree does not discriminate better than degree, probably because one of the parameter values is small. In Supplementary Table 2, we increase that value and show the discrimination of degree, 2-order degree, and 4-order degree as the number of nodes and the number of initial connection edges vary over 54 cases. In all of these cases the standard deviation gets larger (better); the 4-order degree shows a smaller overflow ratio than node degree in only two cases, and the 2-order degree in only four, which suggests that 4-order degree and 2-order degree are better than node degree in terms of overflow ratio. Moreover, as the order of the high-order degree increases, the overflow ratio discriminates the nodes better.

3.3. Undirected Scale-Free Network

Barabási and Albert [18, 60, 61] proposed the scale-free network (the BA scale-free network), whose node degree distribution follows a power law. In this subsection, we construct an undirected scale-free network to investigate the behavior of degrees of various orders. Firstly, we produce a random network with $n_0$ nodes and connection probability $p$. Then we add one node at a time, with degree $m$, by the preferential attachment mechanism. Preferential attachment means that the more connected a node is, the more likely it is to receive new links: nodes with higher degree have a stronger ability to grab links added to the network. In this paper, the probability that a new node connects to an existing node $v$ is proportional to the degree of $v$. Repeating this process $t$ times, we get an undirected scale-free network with $n_0 + t$ nodes and about $p n_0 (n_0 - 1)/2 + m t$ edges.
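The growth procedure can be sketched in Python as below. This is an illustrative sketch rather than the authors' code: the parameter names `n0`, `p`, `m`, and `t` (seed size, seed connection probability, degree of each new node, number of growth steps) are assumptions, and targets are drawn one at a time with probability proportional to degree, with repeated draws discarded so that the `m` targets are distinct.

```python
import random

def scale_free_network(n0, p, m, t, seed=None):
    """Grow a network by preferential attachment from a random seed network."""
    rng = random.Random(seed)
    neighbors = {v: set() for v in range(n0)}
    # Seed: random network on n0 nodes with connection probability p.
    for v in range(n0):
        for u in range(v + 1, n0):
            if rng.random() < p:
                neighbors[v].add(u)
                neighbors[u].add(v)
    # Each node appears in `ends` once per incident edge, so a uniform draw
    # from `ends` selects a node with probability proportional to its degree.
    ends = [v for v, nbs in neighbors.items() for _ in nbs]
    for new in range(n0, n0 + t):
        available = len(set(ends))  # nodes that currently have an edge
        targets = set()
        while len(targets) < min(m, available):
            targets.add(rng.choice(ends))
        neighbors[new] = set()
        for u in targets:
            neighbors[new].add(u)
            neighbors[u].add(new)
            ends.extend((new, u))
    return neighbors

net = scale_free_network(n0=5, p=1.0, m=2, t=20, seed=1)
print(len(net), sum(len(nbs) for nbs in net.values()) // 2)  # 25 50
```

Nodes that are isolated in the seed network can never be chosen under this rule, which is one reason a reasonably dense seed is assumed here.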

The high-order degree distributions and the combined degree distribution also follow power laws; see Figure 5. Supplementary Table 3 gives the discrimination results for degrees of various orders under different values of the initial number of nodes, the initial node connection probability, the number of running times, and the degree of the node added at each step. As a matter of fact, the higher-order degree shows better discrimination results only in some of these cases. The reason may be that the range of the node degree sequence is relatively large for scale-free networks (compared with random networks and small world networks), which coincides with our knowledge [18, 60, 61].

3.4. Directed Scale-Free Network

In this subsection, we construct a directed scale-free network. Firstly, we produce a random network with a given number of nodes and connection probability. Secondly, we assign a direction to each existing edge at random (with equal probability) to obtain a directed network. Thirdly, we repeat the following step a given number of times: add a new node, and attach its in-edges by an out-degree priority mechanism and its out-edges by an in-degree priority mechanism. Figure 6 shows that both the in-degree distribution and the out-degree distribution of the resulting directed network follow power-law distributions; therefore it is a directed scale-free network [56, 57]. There are, however, variations at the beginning of the curves for high-order in-degree and high-order out-degree (especially the out-degree), even though the general shapes still resemble power-law distributions; see Figure 6. We suspect that this is because the numbers of in-edges and out-edges added at each step are constants rather than random numbers. Since the discrimination comparison between high-order degree and node degree in the directed scale-free network is similar to that in the undirected scale-free network, we do not present the results here.

4. Discussion

The high-order degree of a network accounts for the influence exerted on a node over different path distances. In a social network, let a node be a person and an edge be a friendship. The degree of a node $v$ indicates the influence of $v$'s friends on $v$. The 2-order degree of $v$ indicates the influence of the friends of $v$'s friends on $v$. An interesting point is that, if $u$ and $v$ are friends, so that an undirected edge connects $u$ and $v$, then $v$ influences $u$, which in return influences $v$ itself. If $v$ has many friends, $v$ influences all of them, and they in return affect $v$ many times.

We know from Section 3 that high-order degrees and combined degree are superior to degree in discriminating nodes, although degree is simpler to compute. Compared with eigenvector centrality in undirected networks and PageRank in directed networks, the high-order (out-/in-)degree and combined (out-/in-)degree defined in this paper do not need to be iterated to convergence. Their computation involves only a fixed number of matrix-vector multiplications, which is clearly cheaper than eigenvector centrality or PageRank.

4.1. The High-Order Degree and Eigenvector Centrality/Betweenness in Undirected Network

In the undirected network, node degree is used to measure the influence of all the nodes connected to a node. 2-order degree considers the influence of a node’s neighbor’s neighbor on the node. 3-order degree considers the influence of a node’s neighbors’ neighbors’ neighbor on that node. As a result, the lower the order of the high-order degree is, the closer it is to the node degree. On the other hand, the higher the order of the high-order degree is, the closer it is to eigenvector centrality.

Theorem 2. For the $s$-order degree in an undirected network, if $s = 1$, it is the node degree of the network. As $s$ approaches infinity, the $s$-order degree is equivalent to eigenvector centrality.

We only need to show that, as $s$ approaches infinity, the $s$-order degree is equivalent to eigenvector centrality. Suppose $A$ is the adjacency matrix of an undirected network and $D$ is its degree sequence. Then by Theorem 1, the $s$-order degree sequence is $A^{s-1} D$, which is exactly the iterate produced when the largest eigenvalue and its eigenvector are computed by the power method [62]. Therefore, the normalized limit of $A^{s-1} D$ is the eigenvector centrality of the nodes. If eigenvector centrality is deemed the most accurate method for ranking nodes, then from node degree to high-order degree to eigenvector centrality the accuracy increases, and so does the computational complexity. Therefore, if the requirements on ranking accuracy are moderate and the computational cost matters, high-order degree is a relatively good choice.
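The link to the power method can be checked numerically. The Python sketch below normalizes the degree sequence after every multiplication by the adjacency matrix; the function names and the four-node example network are assumptions for illustration. Convergence of this iteration is assumed to hold for the example because the network is connected and contains a triangle (so it is not bipartite).

```python
import math

def normalize(seq):
    """Scale a node -> value dict to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in seq.values()))
    return {v: x / norm for v, x in seq.items()}

def normalized_order_degree(neighbors, s):
    """Normalized s-order degree sequence, i.e. s - 1 power-method steps from D."""
    seq = normalize({v: float(len(nbs)) for v, nbs in neighbors.items()})
    for _ in range(s - 1):
        seq = normalize({v: sum(seq[u] for u in nbs)
                         for v, nbs in neighbors.items()})
    return seq

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
network = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
for s in (1, 2, 4, 50):
    print(s, {v: round(x, 4) for v, x in normalized_order_degree(network, s).items()})
```

The printed vectors change less and less as `s` grows, and for large `s` one further multiplication by the adjacency matrix only rescales the vector, which is the defining property of the eigenvector centrality.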

In fact, high-order degrees are not simply an alternative to eigenvector centrality. Sometimes they are more intuitive than other methods, including eigenvector centrality. For example, in Figure 2(a), the top three nodes sorted by both 2-order degree and 4-order degree are 5, 6, 4, as can be seen from Tables 1 and 4; this agrees with our expectation. By contrast, the top three nodes sorted by both betweenness and eigenvector centrality are 6, 5, 4.

4.2. The High-Order Degree and PageRank for Directed Network

Theorem 3. For the $s$-order out-degree in a directed network, if $s = 1$, it is the out-degree of the directed network. For the $s$-order in-degree in a directed network, if $s = 1$, it is the in-degree of the directed network. As $s$ approaches infinity, the $s$-order in-degree is equivalent to PageRank (without damping factor).

Theorem 3 is clearly true because PageRank is the simplification of eigenvector centrality for directed networks [63]. Moreover, we reach a similar conclusion: when ranking network nodes, high-order in-degree is a good choice if the requirements on accuracy are moderate and the computational cost matters.

As in Section 4.1, high-order in-degrees are not simply an alternative to PageRank. Sometimes high-order in-degrees may be more intuitive than PageRank. For example, in Figure 2(b), the top node sorted by both 2-order in-degree and 3-order in-degree is 9, as can be seen from Tables 1 and 4; this agrees with our expectation. The top node sorted by both PageRank without damping factor and PageRank with damping factor is 9.

4.3. The Significance of Network High-Order Out-Degree

PageRank (high-order in-degree) indicates that the value of a node $v$ in a directed network is larger when more nodes point to $v$ (quantity hypothesis) and/or when the nodes pointing to $v$ have larger values (quality hypothesis) [35, 36]. Conversely, we call the high-order out-degree Reverse PageRank. Under Reverse PageRank, the value of a node $v$ in a directed network is determined by how many nodes $v$ points to (reverse quantity hypothesis) and/or by how large the values of those nodes are (reverse quality hypothesis). For example, navigation websites on the World Wide Web have a large value: when someone who surfs the Internet does not know which website is worth visiting (has a larger value), he should start with a navigation website (which itself has a larger value). There is much research on the high-order in-degree (PageRank) of directed networks, but, as far as we know, none on the high-order out-degree (Reverse PageRank).

In Figure 2(b), although nodes 1, 2, and 5 have the same 2-order out-degree, only node 1 has the largest out-degree at higher orders. Node 1 is the only node from which every other node of the graph can be reached.

4.4. The Combined Degree We Defined and the Mixed Degree Zeng Defined

Zeng [44] used the weighted sum of the residual degree and the exhausted degree to define the mixed degree of a node of a network. On the one hand, Zeng's definition applies Mixed Degree Decomposition (MDD) to the whole network by using coreness centrality, while the combined degree we define is a linear combination of various high-order degrees. Even though our method involves more terms in the weighted sum, it does not need to consider network decomposition and is therefore cheaper to compute than Zeng's definition. On the other hand, Zeng divides the nodes connected to a node into two classes according to the decomposition, while we divide them into a finite number of classes according to path distance, which is a finer classification and gives more accurate results.

5. Conclusion

In this paper, we define several novel centrality metrics: the high-order (out-/in-)degree and the combined degree. For the values of the combined degree's parameters, we usually consider the case described in Section 2.3. We prove that both degree centrality and eigenvector centrality are special cases of the high-order degree of undirected networks, and that both the in-degree and the PageRank algorithm without damping factor are special cases of the high-order in-degree of directed networks. We present several experiments on the performance of our novel centrality metrics. The experiments show that the centrality metrics we define are easy to calculate and perform better than degree centrality. In large-scale complex network studies, our centrality metrics can be an effective alternative to eigenvector centrality and the PageRank algorithm. This manuscript is limited to introducing the definitions of the new metrics; we hope to discuss their efficacy and computational cost further in future work.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

All authors worked together to produce the results and read and approved the final manuscript.

Acknowledgments

The research is supported by the National Natural Science Foundation of China (61572522 and 11371230) and Shandong Provincial Natural Science Foundation (ZR2018PF004).

Supplementary Materials

Supplementary 1 . Supplementary Table 1: in the random network, we increase the value of and show the discrimination of degree, 2-order degree, and 4-order degree, with the number of nodes and initial connection edges changing over 27 cases.

Supplementary 2 . Supplementary Table 2: in the small world network, we increase the value of and show the discrimination of degree, 2-order degree, and 4-order degree, with the number of nodes and initial connection edges changing over 54 cases.

Supplementary 3 . Supplementary Table 3: in the undirected scale-free network, the discrimination results of various orders of degree, with initial number of nodes , initial node connection probability , running times , and the new node added to the network each time have a degree of over 90 cases.