Abstract

Key nodes play important roles in information propagation and opinion evolution in social networks. Previous work has rarely incorporated multiple relationships and multiple features into key node discovery algorithms at the same time. Based on the relational networks in a social network, including the forwarding, replying, and mentioning networks, this paper first proposes an overlapping user relational network algorithm that extracts different relational networks sharing the same nodes. These relational networks are then integrated into a multirelationship network. Subsequently, a key node discovery (KND) algorithm is presented on the basis of the shortest path, degree centrality, and random walk features of the multirelationship network. The advantages of the proposed KND algorithm are demonstrated with the SIR propagation model and the normalized discounted cumulative gain on multirelationship and single-relation networks. The experimental results show that the proposed KND method outperforms the baseline methods at finding key nodes on different networks.

1. Introduction

With the rapid development of social networks (e.g., Facebook, Twitter, and Sina Weibo), these platforms have become the main channels through which people obtain, spread, and exchange information. In social networks, how do we quickly spread information? How do we effectively control the speed of virus diffusion? How do we efficiently suppress the width of rumor propagation? How do we correctly control and guide the evolution trend of public opinion? For these practical application scenarios, key nodes play important roles in the structures and functions of networks [1–3]. In recent decades, scholars have mainly focused on single-relation networks [4, 5], that is, networks that consist of a single type of node and a single type of relationship between nodes. Traditional single-feature mining relies on degree centrality and its variants [6]. Chen et al. [7] proposed a degree discount centrality algorithm for effective influence maximization. Sheikhahmadi et al. [8] proposed a degree distance centrality algorithm; their experiments on eight large-scale networks showed that it outperformed other measures, including high degree and betweenness. Wang et al. [9] proposed a degree punishment method to select spreaders and adopted the SIR (Susceptible-Infected-Recovered) model to assess its performance. Combining the number of neighbors, the neighbors' influence, and the clustering coefficient, Chen et al. [10] proposed a cluster rank algorithm; their experiments showed that it performs significantly better than degree centrality and k-core decomposition. Besides, there are many other methods for finding key nodes in networks, such as the feature vector method [11], shortest path increment [12], spreading influence related centrality [13], PageRank [14], LeaderRank [15], HITS [16], k-shell centrality [17], and improved k-shell algorithms [18–21]. These results show that key node identification has been very successful in single-relation networks.

However, in real social networks, users often participate in different social activities in various ways and form various connections with different users [4, 22–24]. These different interactive relationships form multirelationship networks [25]. In a multirelationship network, there are many types of relationships between nodes, and the types of nodes may also differ. Multirelationship networks therefore carry more information than single-relation networks and can express the diversity of relationships in social networks more effectively. Battiston et al. [25] first proposed the relevant symbolic representation of the multirelation network. Boccaletti et al. [26] further enhanced the understanding of multirelationship networks. Al-Garadi et al. [27] applied the node identification methods of single-relation networks to multirelation networks; however, they mined only a single feature of the networks and thus failed to identify the key nodes accurately. Chen et al. [13, 28] integrated multiple centrality measures into a hierarchical ranking method to identify the key nodes in complex social networks. Wang et al. [29] proposed a new metric for measuring key nodes in multilayer networks and verified it in single-layer networks, multirelationship networks, and aggregated networks. Li et al. [30] proposed a key node identification method for multilayer networks based on evidence theory, but its high computational complexity makes it inapplicable to large social networks. Pedroche et al. [31, 32] extended some algorithms for single-layer networks to two-layer networks. Fu et al. [33] used representation learning to learn the global and local structural features of networks, which represented the characteristics of nodes well. Singh et al. [34] proposed a new multirelationship network aggregation method to identify key nodes; their experiments showed that the method has an obvious advantage for influence maximization across multiple social networks. Huang et al. [35] treated each community as a node through an extended neighbor strategy, transformed the multilayer network into a single-layer network, and then proposed an algorithm to find the key nodes in monolayer or multilayer networks. Thus, key node discovery helps in understanding the structural properties of multirelation networks.

At present, there is no unified concept of key node discovery in multirelation networks [36–40]. Existing key node discovery methods for multirelationship networks have not fully exploited the importance of the relationships between different layers or the importance of the edges between layers. Therefore, key node discovery algorithms for multirelation networks remain to be studied.

In this paper, we design a novel algorithm for identifying the top-K key nodes in social networks. Firstly, we propose an algorithm of overlapping user relational networks to establish different relational networks in social networks. Subsequently, we construct a multirelationship network based on these relational networks. Then, we build a node influence measure based on multiple relationships and multiple features to rank influential nodes. On the multirelationship network, we propose a key node discovery (KND) algorithm to obtain the key nodes. Comparison experiments are conducted to verify the effectiveness of the KND method on social networks. The main contributions of this paper are as follows:
(1) An algorithm of overlapping user relational network is proposed. This algorithm is able to extract different relational networks with the same nodes.
(2) An algorithm for establishing a multirelationship network is proposed based on the different relational networks. The multirelationship network fully integrates the multiple relationships and multiple features in social networks.
(3) A key node discovery (KND) algorithm with multiple relationships and multiple features is proposed. In this method, a novel node influence measure is built on the multirelationship network. Based on the node influence measure and the algorithms of overlapping user relational network and multirelationship network, the key node discovery algorithm is designed to find the key nodes on the multirelationship network. Comparison experiments on different datasets verify that the proposed KND algorithm performs better than the baseline methods. Evaluated by the normalized discounted cumulative gain (NDCG), the proposed KND algorithm obtains the best NDCG score on different networks. These results show that the proposed KND algorithm can accurately find the top-K nodes in social networks.

The rest of the paper is organized as follows. Section 2 presents related definitions and the proposed algorithms. Section 3 explains the experimental setup and analyzes the results of comparison experiments on four datasets. Section 4 draws conclusions and outlines future directions.

2. Key Node Discovery Algorithm Based on Multiple Relationships and Multiple Features of Social Networks

2.1. Related Definitions

In a social network, there exist many relational networks. The $i$th relational network can be represented as an undirected graph $G_i = (V_i, E_i, R_i, W_i)$ for $i = 1, 2, \ldots, n$, where $V_i$ represents the set of users in the $i$th relational network; $E_i$ represents the set of edges between nodes in the $i$th relational network; $R_i$ represents the set of relationships among users at each layer in the social network, whose values belong to [0, 1]; and $W_i$ represents the set of weights in the $i$th relational network.

Integrated with the relational networks, the multirelationship network is defined, where and . Specifically, represents the set of interlayer edges connected between the th and th relational networks; and , respectively, represent the nodes of the th and th relational networks; and is the set of the weights of all user pairs in different relationships. In order to better measure the weights of the relationships in the social network, their values will be normalized in the following. Figure 1 shows an example of a social network and the multirelationship network. In Figure 1(a), a social network has 3-layer relational networks, where each layer network has 10 nodes, and the dotted lines represent the edges between layers.

Each layer represents a type of relationship: the first layer is a forwarding network, in which the blue lines represent the relationships between users; the second layer is a replying network, in which the relationships between users are represented by red lines; and the third layer is a mentioning network, in which the relationships between users are represented by green lines. Assume that different relational networks are connected through users who participate in two or more relationships. Then all the relational networks are aggregated into the multirelationship network, as shown in Figure 1(b).

Definition 1 (Neighbors) [34]. The neighbors of a node $v$ are denoted by $\Gamma(v)$.

Definition 2 (Degree centrality) [34]. For a node $v$, the degree centrality $DC(v)$ is the number of links incident upon node $v$; that is, $DC(v) = |\Gamma(v)|$.

Definition 3 (Shortest path of a node) [12]. Let $d(v, u)$ represent the distance from node $v$ to node $u$. Then the sum of the distances from node $v$ to the other nodes in a graph $G = (V, E)$, denoted by $SP(v)$, is called the shortest path of node $v$; that is,

$$SP(v) = \sum_{u \in V,\ u \neq v} d(v, u).$$

Definition 4 (Shortest path in the graph $G$) [12]. The shortest path in graph $G$, denoted by $SP(G)$, is the sum of the shortest paths of all nodes of $G$. That is,

$$SP(G) = \sum_{v \in V} SP(v).$$
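Definitions 3 and 4 can be illustrated with a short breadth-first search. The following is a minimal sketch, assuming an unweighted, undirected graph stored as an adjacency dictionary; the function names are hypothetical, and skipping unreachable nodes is an assumption of the sketch rather than a rule stated in the definitions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def node_shortest_path(adj, v):
    """Definition 3: sum of distances from v to the other nodes.

    Assumption: nodes that cannot be reached from v are skipped.
    """
    return sum(bfs_distances(adj, v).values())


def graph_shortest_path(adj):
    """Definition 4: sum of the shortest paths of all nodes."""
    return sum(node_shortest_path(adj, v) for v in adj)


# Toy path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(node_shortest_path(path, "a"))  # 1 + 2 + 3 = 6
print(graph_shortest_path(path))      # 6 + 4 + 4 + 6 = 20
```

Each breadth-first search costs time linear in the number of nodes and edges, so the graph-level sum needs one search per node.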

2.2. Overlapping User Relational Networks

In social networks, users actively participate in different relational networks. These users are called overlapping users. The subgraph induced by the overlapping users in the $i$th relational network is called the $i$th overlapping user relational network for $i = 1, 2, \ldots, n$. The network aggregated from all overlapping user relational networks is called the multirelationship network. In the following, for convenience, the $i$th overlapping user relational network is simply called the $i$th relational network.

In the th relationship network (), the weight of the edge between users and is defined as

Then, the algorithm of overlapping user relational networks that are abstracted from a social network with relational networks is represented by Algorithm 1.

Input: Relational networks in the social network ,
Output: Each overlapping relational network: ,
(1), , ,
(2)for each graph do
(3)
(4)fordo
(5)  for in each graph do
(6)   
(7)   
(8)for each edge do
(9) the weight of the edge in the th overlapping relational network:
(10)
(11)Return
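
The idea behind Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each relational network is given as a list of `(u, v, count)` interaction records, overlapping users are taken to be the users who appear in every relational network, and edge weights are scaled into [0, 1] by dividing by the largest interaction count in the layer. That scaling is a stand-in, not the paper's edge weight formula, and the function and variable names are hypothetical.

```python
def overlapping_user_networks(layers):
    """Restrict every relational network to the users shared by all layers.

    layers: dict mapping a layer name to a list of (u, v, count) records.
    Returns the set of overlapping users and, per layer, a dict that maps
    each undirected edge (u, v) to a weight in [0, 1].
    """
    node_sets = [{n for u, v, _ in edges for n in (u, v)}
                 for edges in layers.values()]
    # Assumption: an overlapping user appears in every relational network.
    overlap = set.intersection(*node_sets) if node_sets else set()

    networks = {}
    for name, edges in layers.items():
        counts = {}
        for u, v, count in edges:
            if u in overlap and v in overlap and u != v:
                key = tuple(sorted((u, v)))  # undirected edge
                counts[key] = counts.get(key, 0) + count
        top = max(counts.values(), default=0)
        # Assumption: scale by the largest count (placeholder weighting).
        networks[name] = {e: (c / top if top else 0.0)
                          for e, c in counts.items()}
    return overlap, networks


layers = {
    "forwarding": [("a", "b", 4), ("b", "c", 2), ("c", "d", 1)],
    "replying": [("a", "b", 1), ("a", "c", 3), ("c", "e", 5)],
    "mentioning": [("b", "c", 2), ("a", "c", 2), ("a", "f", 1)],
}
overlap, networks = overlapping_user_networks(layers)
print(sorted(overlap))         # ['a', 'b', 'c']
print(networks["forwarding"])  # {('a', 'b'): 1.0, ('b', 'c'): 0.5}
```

Users d, e, and f each appear in only one layer, so they and their edges are dropped, and every resulting network is defined on the same node set.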
2.3. Establishing Multirelationship Network

In a social network containing relational networks, the weight of the edge between users and is calculated as follows:

Based on the definition of the multirelationship network, we can further establish a multirelationship network with weights of edges through aggregating the relational networks. For example, Figure 2 gives four relational networks: following, forwarding, replying, and mentioning networks. Figure 3 illustrates the process of aggregating the four relational networks and the multirelationship network. The multirelationship network can be established by Algorithm 2.

Input: () obtained from Algorithm 1
Output: Multirelationship network
(1), , ,
(2)for each graph , do
(3)for each edge do
(4)   is updated by formula (4)
(5)  
(6)  
(7)end for
(8)end for
(9)
(10)for each edge do
(11)
(12)
(13)end for
(14)Return
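
The aggregation step of Algorithm 2 can be sketched as follows. The sketch assumes that every relational network is already restricted to the same users and stored as a dictionary from undirected edges to weights in [0, 1]. It averages the weights of an edge over all layers, which is an illustrative choice and not necessarily the weight formula that Algorithm 2 applies; `aggregate_layers` is a hypothetical name.

```python
def aggregate_layers(layer_weights):
    """Merge per-layer edge weights into one weighted, undirected graph.

    layer_weights: dict mapping a layer name to {(u, v): weight}.
    Returns an adjacency dict {u: {v: weight}}.
    Assumption: the aggregated weight is the mean over all layers, with a
    missing edge counted as weight 0 in that layer.
    """
    n_layers = len(layer_weights)
    totals = {}
    for weights in layer_weights.values():
        for (u, v), w in weights.items():
            key = tuple(sorted((u, v)))
            totals[key] = totals.get(key, 0.0) + w

    adjacency = {}
    for (u, v), total in totals.items():
        w = total / n_layers
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    return adjacency


layer_weights = {
    "forwarding": {("a", "b"): 1.0, ("b", "c"): 0.5},
    "replying": {("a", "b"): 0.5, ("a", "c"): 1.0},
}
multi = aggregate_layers(layer_weights)
print(multi["a"])  # {'b': 0.75, 'c': 0.5}
```

Averaging keeps the aggregated weights in [0, 1]; a sum or a maximum over layers would be equally easy to substitute in the same loop.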
2.4. Key Node Discovery of the Multirelationship Network

Let be the graph removing node . Based on the definition of the shortest path for a node, the shortest path increment for a node , denoted by , is the following:

Let and denote the degree centrality [6] and PageRank value [14] of a node in graph , respectively; and let denote 0-1 normalization method. Then, we havewhere stands for the degree of node ; and is the damping coefficient, generally 0.85.

Since a multirelationship network is aggregated by different relational social networks through the overlapping users, the node influence score of a user in the network can be calculated by combining with the degree centrality, PageRank value, and shortest path increment as follows:where , , and are parameters and .

The shortest path increment is based on global features; the degree centrality is based on local features; and PageRank is a random walk feature. Thus, this formula combines both local and global features.
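
To make the combination of the three features concrete, here is a minimal sketch. It assumes that the score is a weighted sum of min-max normalized features, that the shortest path increment of a node is the growth in total distance among the remaining nodes once that node is removed, and that an unreachable pair costs a finite penalty equal to the number of nodes. The weights 0.4, 0.2, and 0.4 follow the experimental setting, but their assignment to the features here is an assumption, and all function names are hypothetical.

```python
from collections import deque


def pair_distance_sum(adj, excluded, blocked=None):
    """Sum of hop distances over ordered pairs that avoid `excluded`.

    Paths may not pass through `blocked`. Assumption: an unreachable pair
    costs len(adj), which exceeds any finite distance in the graph.
    """
    penalty, total = len(adj), 0
    for s in adj:
        if s == excluded:
            continue
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w != blocked and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reached = [d for t, d in dist.items() if t != excluded]
        total += sum(reached) + penalty * (len(adj) - 1 - len(reached))
    return total


def shortest_path_increment(adj, v):
    """Assumed form: distance growth among the other nodes when v is removed."""
    return pair_distance_sum(adj, v, blocked=v) - pair_distance_sum(adj, v)


def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration on an undirected graph without isolated nodes."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        rank = {v: (1 - damping) / n
                + damping * sum(rank[u] / len(adj[u]) for u in adj[v])
                for v in adj}
    return rank


def min_max(values):
    """0-1 normalization; a constant feature maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: ((x - lo) / span if span else 0.0) for k, x in values.items()}


def influence_scores(adj, w_dc=0.4, w_pr=0.2, w_spi=0.4):
    """Weighted sum of normalized degree, PageRank, and path increment."""
    dc = min_max({v: len(adj[v]) for v in adj})
    pr = min_max(pagerank(adj))
    spi = min_max({v: shortest_path_increment(adj, v) for v in adj})
    return {v: w_dc * dc[v] + w_pr * pr[v] + w_spi * spi[v] for v in adj}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = influence_scores(path)
print(sorted(scores, key=scores.get, reverse=True)[:2])  # the two inner nodes
```

On the path graph, the two inner nodes dominate all three features, so they receive the highest scores while the end nodes score zero.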

In multirelationship network, based on the weight of edge in the th relationship network (), the node weight of node , denoted by , can be defined as follows:

Considering the edges of each relational network and the characteristics of the network structure of different relational networks, the final node influence score of a node , denoted by , is aggregated with the node weight and the node influence score in the multirelationship network.where .

By the algorithms on the overlapping user relational network and multirelationship network, the key node discovery algorithm based on multiple relationships and multiple features of social networks (KND algorithm) can be stated by Algorithm 3.

Input:, : which obtained from Algorithm 1
Output: top-K nodes:
(1), , ,
(2)for each graph do
(3)for each edge do
(4)   by formula (10)
(5)end for
(6)end for
(7)Obtain the multirelationship network by Algorithm 2
(8)Compute by formula (6)
(9)Compute by formula (7)
(10)Compute by formula (8)
(11) Compute by formula (9)
(12)fordo
(13)Compute by formula (11)
(14)end for
(15)Sort ()
(16)Return top-K nodes

3. Experiments

In this section, we describe the experimental datasets, the baseline methods, and the evaluation experiments of the KND algorithm.

3.1. Datasets

Four datasets are employed to verify the performance of the proposed KND algorithm. They are classified into two groups: single-relationship networks and multirelationship networks. The single-relationship networks include two LFR artificial synthetic networks and a karate club network. The multirelationship network is the Higgs Twitter network, which includes following, forwarding, replying, and mentioning relational networks. The statistics of the two groups of datasets are shown in Tables 1 and 2.

3.1.1. Karate Club Network (https://blog.csdn.net/weixin_41857995/article/details/105454517)

This is a friendship network among the 34 members of a karate club at a college in the United States.

3.1.2. LFR Artificial Synthesis Networks (LFR Networks) [41]

Since these networks follow a power-law distribution, they can be used to simulate real networks. An LFR network with M nodes is denoted as the LFR-M network.

3.1.3. Higgs Twitter Network (http://snap.stanford.edu/data/higgs-twitter.html)

The tweets concern the discovery of a new particle with Higgs boson features and were posted before, on, and after July 4, 2012. This dataset has four types of networks: following, forwarding, replying, and mentioning networks.

3.2. Baseline Methods

To evaluate the accuracy of the KND algorithm, six baseline methods are selected as follows:
(1) PageRank (PR) [14]: All nodes are initialized with the same score. Each score is then repeatedly updated by the PR iteration formula. When the iteration converges to a stable state, the top-K nodes with the highest scores are selected as the key nodes.
(2) Degree centrality (DC) [11]: It computes the degree centrality of every node in a network. Then, the top-K nodes with the highest centrality are selected as the key nodes.
(3) SPIS [12]: It computes the shortest path increments of all nodes in a network. Then, the nodes with the top-K shortest path increments are selected as the key nodes.
(4) MCIM [42]: This method considers four metrics to evaluate a node. Both the overlapping influence and the influence between nodes and their neighbors are taken into account. Then, the nodes with the top-K evaluation values are selected as the key nodes.
(5) Eigenvector [17]: The importance of a node depends on both the number of its neighbor nodes and the importance of those neighbors. Both the topology and the properties of the node are considered. Then, the nodes with the top-K eigenvector scores are selected as the key nodes.
(6) Random [43]: It randomly selects K nodes as the key nodes.

3.3. Evaluation Experiments of KND Algorithm

The SIR (Susceptible-Infected-Removed) model [9] is adopted to compare transmission ability on the KND algorithm and baseline methods. The susceptible population S is converted to the infected population with the probability , and the infected population I recovers to be immune to the information with probability . When there are no infected nodes in the network or a preset number of iterations is reached, the propagation process stops in networks. Based on different sizes of dataset, we adopt different values of and according to [44]: , where indicates average degree in the whole network and indicates the mean of the squared degrees in all nodes of the network. For convenient comparison, we set and . The values of for each network are shown in Table 3.

Moreover, the three weighting parameters of the node influence score are set to 0.4, 0.2, and 0.4, respectively. Since all experiments are probabilistic, we repeat each experiment 50 times and take the average as the final result to reduce the effect of random error. For the Higgs Twitter network, to reduce the complexity of the experiments, we only consider three important relationships: replying, mentioning, and forwarding. The three relational networks are aggregated into the Higgs multirelationship network by Algorithm 2. Under the above parameters, we conduct comparison experiments for the KND, DC, PR, Eigenvector, MCIM, SPIS, and Random methods on the karate club, LFR-500, LFR-1000, replying, mentioning, forwarding, and Higgs multirelationship networks. The more nodes are eventually infected, the stronger the transmission ability of the initial infected nodes. Hence, for the same number of initial infected nodes, a method that leads to more infected nodes performs better.
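
The evaluation procedure described above can be sketched as a discrete-time SIR simulation that is averaged over repeated runs. This is a minimal sketch under assumptions: the names `beta` (infection probability) and `gamma` (recovery probability) are chosen for the sketch, the graph is an unweighted adjacency dictionary, and the spread is measured as the number of nodes that were ever infected. The helper `degree_moments` only returns the average degree and the mean squared degree; the threshold expression built from them is not reproduced here.

```python
import random


def sir_spread(adj, seeds, beta, gamma, max_steps=1000, rng=None):
    """Run one SIR cascade and return how many nodes were ever infected."""
    if rng is None:
        rng = random.Random()
    infected, recovered = set(seeds), set()
    for _ in range(max_steps):
        if not infected:
            break  # no infected nodes remain, so the process stops
        newly_infected = set()
        for u in infected:
            for w in adj[u]:
                susceptible = w not in infected and w not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(w)
        newly_recovered = {u for u in infected if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return len(infected) + len(recovered)


def average_spread(adj, seeds, beta, gamma, runs=50, seed=0):
    """Average the spread over repeated runs to damp random error."""
    rng = random.Random(seed)
    total = sum(sir_spread(adj, seeds, beta, gamma, rng=rng)
                for _ in range(runs))
    return total / runs


def degree_moments(adj):
    """Return the average degree and the mean of the squared degrees."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(degrees)
    return sum(degrees) / n, sum(d * d for d in degrees) / n


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(degree_moments(path))                    # (1.5, 2.5)
print(sir_spread(path, ["a"], 1.0, 1.0))       # 4: every node is reached
print(average_spread(path, ["a"], 0.5, 0.5))   # between 1 and 4
```

Comparing two ranking methods then amounts to calling `average_spread` with each method's top-K nodes as `seeds` and the same `beta` and `gamma`.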

From Figure 4, on the karate club network, whatever the initial numbers of infected nodes are, the total number of infected nodes stays the same, close to 34. The reason is that the size of the karate club network is too small; and the initial infected nodes easily infect other nodes of the network. The performances of the KND, DC, PR, Eigenvector, MCIM, and SPIS methods are almost the same and better than the performance of the Random method on the karate club network.

On the LFR-500 and LFR-1000 networks, as the initial number of infected nodes increases, the total number of infected nodes rises for the KND method and all baseline methods; the KND method performs best, and the Random method performs worst. Since the two networks differ in size, for the same initial number of infected nodes, the total number of infected nodes on LFR-500 is larger than that on LFR-1000. This indicates that, for the same initial number of infected nodes, the more nodes a network has, the weaker the transmission ability is.

On the replying, mentioning, and forwarding networks, the KND method performs better than all baseline methods, and the Random method performs worst. On the replying and mentioning networks, as the initial number of infected nodes increases, the total number of infected nodes rises steadily. On the forwarding network, however, the total number of infected nodes rises as the initial number grows from 5 to 10; from 10 to 20, it stays at about 280; from 20 to 25, it declines; and then it increases slowly. Overall, the KND method has the best performance on the three single networks.

On the Higgs multirelationship network, from 5 to 10, the performance of the MCIM method is the best, while the performance of the KND method is the second; from beginning to end, the total numbers of infected nodes continuously rise; from about 10 to end, the performance of the KND method is the best; and, from beginning to end, the Eigenvector method is the worst.

When the initial numbers of infected nodes are the same, the total number of infected nodes on the Higgs multirelationship network is lower than those on the replying, mentioning, and forwarding networks. This implies that the transmission abilities on single networks are stronger than that on the multirelationship network. The reason is that real social networks always contain multiple relationships and features, which reduces the transmission capacity to some extent. Therefore, the overall performance of the KND method is superior to that of the baseline methods on both single-relationship and multirelationship networks, and the multirelationship network represents real social networks better.

Summing up these discussions, compared with all baseline methods, the proposed KND method can effectively discover the key nodes in social networks.

3.4. Evaluation Indicator and Its Analysis

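To evaluate the ranking quality of the key nodes (top-K nodes) for all methods, the normalized discounted cumulative gain (NDCG) [45] is adopted. Suppose that NDCG@K denotes the normalized discounted cumulative gain of the first K nodes; DCG@K represents the discounted cumulative gain of these K nodes; and IDCG@K represents the maximum DCG@K in the ideal case. Then

$$\mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K},$$

where DCG@K accumulates the position-discounted gains of the first K nodes, and $rel_i$, the relevance of the node at position $i$, represents the correlation between that node and the final result.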
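
As an illustration of the metric, the sketch below computes NDCG@K from a list of relevance values given in ranked order. It uses the common exponential-gain form of DCG with a base-2 logarithmic discount; this particular form is an assumption of the sketch, and the relevance values in the example are made up.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items.

    Assumption: gain 2**rel - 1 with discount log2(position + 1),
    where positions start at 1.
    """
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0


# Relevance of each returned node, in the order the method ranked them.
ranked = [3, 2, 3, 0, 1, 2]
print(ndcg_at_k(ranked, 5))
print(ndcg_at_k(sorted(ranked, reverse=True), 5))  # 1.0 for a perfect ranking
```

A score of 1 means that the method returned the nodes in the ideal order; placing highly relevant nodes further down the list lowers the score.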

Figure 5 shows the values of on different networks. From Figure 5(a), on the karate club network, when , the KND, PR, and MCIM methods get the largest NDCG scores. When , the Random method is the worst. When , the NDCG score of the MCIM method is the highest, whereas the NDCG score of the KND method is very close to the highest value. When , the NDCG scores of all methods are slightly different. The reason is that the size of the network is very small. It can be concluded that the advantage of the KND method is not obvious in small size networks.

From Figure 5(b), on the LFR-500 network, when , the KND method is the best; the Random method is the worst; and the NDCG scores of the KND, DC, PR, Eigenvector, and MCIM methods are subtly different. They show that the KND method is the best.

From Figure 5(c), on the LFR-1000 network, when , the NDCG scores of the KND are the largest; and the NDCG score of the Random method is the smallest. When , the NDCG scores of the PR, MCIM, and SPIS methods are slightly different. When , the DC, Eigenvector, and SPIS methods almost obtain the same NDCG@30. They show that the KND method can take the best performance on the LFR-1000 network.

From Figure 5(d), on the replying network, when , the NDCG score of the KND method is the same as that of the MCIM, the highest. When , the DC method gets the largest NDCG@10, but the NDCG score of the KND method is slightly lower than that of the DC method. When , the KND and MCIM methods obtain the highest NDCG@30. When , the NDCG scores for Random method are the lowest. These imply that the overall performance of the KND method is superior to those of the other methods on the replying network.

From Figure 5(e), on the mentioning network, when , the SPIS method gets the largest NDCG@10; and the KND method obtains the second NDCG@10. When , the NDCG score of the MCIM method is the largest and slightly more than KND method. When , the NDCG scores of the KND and MCIM methods are almost the same and the largest. When , the Random method gets the worst performance. These imply that the more the value of is, the better the KND method obtains the performance on mentioning network.

From Figure 5(f), on the forwarding network, when , the NDCG scores of the KND, DC, PR, MCIM, and SPIS methods are slightly different; the KND method gets the best performance; and the Random method obtains the worst performance. Thus, the performance of the KND method is the best on the forwarding network.

From Figure 5(g), on the Higgs multirelationship network, when , the NDCG score for the KND method far exceeds all baseline methods. Thus, the KND method can get the best performance on the Higgs multirelationship network.

Therefore, the overall ranking quality of the KND method is superior to that of all baseline methods on both the single-relationship networks and the multirelationship network. In particular, the ranking quality of the KND method far exceeds that of all baseline methods on the multirelationship network. It is concluded that the KND method is very helpful for finding the key nodes in social networks.

4. Conclusion

In social networks, a user always has multiple relationships and features. Considering that different relationships constitute different relational networks, this paper first proposed the algorithm of overlapping user relational network, which finds different relational networks consisting of the same nodes in social networks, and then the algorithm for establishing the multirelationship network from these relational networks. On this basis, we proposed the key node discovery algorithm with multiple relationships and multiple features to find the top-K nodes in social networks. The experiments with the KND algorithm and six baseline methods on the karate club, LFR-500, LFR-1000, replying, mentioning, forwarding, and multirelationship networks show that the KND algorithm obtains the best performance on the multirelationship network. Key node discovery methods that combine multiple relationships, multiple features, and users' attributes are worthwhile research directions for the future.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was partially supported by the National Natural Science Foundation (nos. 61802316, 61872298, and 61902324), Chunhui Plan Cooperation and Research Project, Ministry of Education of China (nos. Z2015109 and Z2015100), “Young Scholars Reserve Talents” program of Xihua University, Science and Technology Department of Sichuan Province (no. 2021YFQ0008), and Key Scientific Research Fund of Xihua University (no. z1422615).