Complexity

Volume 2018, Article ID 1528341, 16 pages

https://doi.org/10.1155/2018/1528341

## A Comprehensive Algorithm for Evaluating Node Influences in Social Networks Based on Preference Analysis and Random Walk

School of Software and Communication Engineering, Jiangxi University of Finance and Economics, 330013 Nanchang, China

Correspondence should be addressed to Chengying Mao; maochy@yeah.net

Received 18 March 2018; Revised 3 August 2018; Accepted 14 August 2018; Published 8 October 2018

Academic Editor: Ana Meštrović

Copyright © 2018 Chengying Mao and Weisong Xiao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In the era of big data, social networks have become an important reflection of human communication and interaction on the Internet. Identifying the influential spreaders in networks plays a crucial role in various areas, such as disease outbreaks, virus propagation, and public opinion control. Based on three basic centrality measures, a comprehensive algorithm named PARW-Rank is proposed for evaluating node influences by applying preference relation analysis and the random walk technique. For each basic measure, the preference relation between every node pair in a network is analyzed to construct a partial preference graph (PPG). Then, the comprehensive preference graph (CPG) is generated by combining the preference relations with respect to the three basic measures. Finally, the ranking of nodes is determined by conducting a random walk on the CPG. Five public social networks are used for comparative analysis. The experimental results show that the PARW-Rank algorithm achieves higher precision and better stability than existing methods based on a single centrality measure.

#### 1. Introduction

With the rapid development of network and information technology, rich-media applications have become involved in all aspects of our lives. Accordingly, interaction and communication between individuals have become more and more convenient and frequent. For example, platforms such as Facebook, WeChat, QQ, and WhatsApp help users deliver their messages, opinions, or pictures. As a result, the individuals in society have been tied together in an invisible way, that is, the so-called *social network* [1, 2]. Driven by intelligent mobile terminals such as the iPhone, the scale of social networks has increased sharply in recent years. As reported by TechCrunch, the monthly active users of Facebook had climbed to 2 billion by the middle of 2017 [3]. Similarly, WeChat, one of the most influential mobile products, now has over 980 million monthly users [4]. Faced with such huge and complex social networks, it is usually hard to analyze their overall features and to understand the behaviors of the individuals in them. Consequently, the analysis and modeling of social networks have attracted much attention in the past two decades [5].

In the early stage, studies mainly focused on the static statistical properties that characterize the structure of social networks [6]. Concepts such as degree distribution, clustering coefficient, and average path length were proposed and widely applied to the measurement of social networks. Although these metrics reflect the features of the overall network or of a single node well, they usually ignore the dynamic interaction behaviors of the individuals in a network [7]. Nowadays, the structure of social networks is not just a mathematical toy; it has been employed extensively as a model of real-world networks of various types, such as friendship networks, telephone call networks, and networks in epidemiology. In most of these application scenarios, the dynamic features of nodes or communities need to be investigated in depth in order to make scientific decisions. For example, in public opinion emergencies, recognizing special individuals such as opinion leaders and evaluating the spread of their influence can contribute to understanding and controlling opinion (or rumor) transmission [8, 9]. Similarly, the identification of influential individuals is also very helpful in controlling the spread of disease.

As reported in [10–12], many mechanisms such as cascading, spreading, and synchronizing in social networks are highly affected by a tiny fraction of influential nodes. In other words, *identifying influential nodes* is an effective way to reveal the underlying patterns behind the spreading of information, rumors, or disease over social networks. Due to its theoretical and practical significance, how to identify the influential nodes in a social network has been widely investigated in recent years [13–15]. At present, quite a few centrality indicators have been presented to address this problem. Typically, *degree centrality* [16], *betweenness centrality* [16, 17], and *closeness centrality* [16] are three well-known measures. However, most of them quantify the influence of nodes in a network from the perspective of a single indicator. Although a single measure is reasonable from its own point of view, it usually lacks the ability to provide a comprehensive evaluation. At the same time, each measure has its own advantages and limitations. Thus, it is more appropriate to consider multiple different measures simultaneously. In this paper, we attempt to integrate the above three representative measures by preference analysis and then adopt the random walk algorithm to rank the nodes in a network according to their spreading influences. In order to validate the effectiveness of our proposed algorithm, we use the Susceptible–Infected–Recovered (SIR) [18] model to evaluate the rationality of the node ranking results.

Recently, some hybrid approaches have been proposed for the influence maximization problem. In these solutions, several different measures such as degree centrality are usually taken into account to design a comprehensive model for evaluating the influence spread of a node. Typically, Jalayer et al. proposed a “greedy TOPSIS and community-based” (GTaCB) algorithm [19] for this problem. The TOPSIS procedure in [19, 20] is a greedy technique and therefore generates only a locally optimal solution for identifying the influential nodes in a social network. By contrast, the random walk technique used in our solution is a global optimization algorithm. In theory, the rank generated by random walk is more reasonable than that of the TOPSIS-based method. In [21], Ko et al. proposed the Hybrid-IM algorithm to maximize the influence spread over a social network by combining PB-IM (path-based influence maximization) and CB-IM (community-based influence maximization). In general, it is not easy to collect the information about paths and communities from a social network. By contrast, our algorithm uses three basic and typical measures for the preference analysis, so it can be easily implemented and has a certain advantage in efficiency. The main contribution of our work is a comprehensive evaluation framework for node influences that combines *preference analysis* and *random walk*. Although we adopt only three basic centrality measures in this paper, the measures used in the framework can be replaced and extended according to specific requirements. That is, our approach is readily extensible in the integration of multiple measures.

The remainder of this paper is organized as follows. In Section 2, we describe the problem to be solved and review some background knowledge. Section 3 first presents the overall framework of a comprehensive algorithm for evaluating node influences and then addresses the technical details. Subsequently, the experimental comparison and analysis are conducted in Section 4. Section 5 discusses the threats to validity and the potential extensions. The related studies on the evaluation of node influences are addressed in Section 6. Finally, the conclusions and further research directions are stated in Section 7.

#### 2. Preliminaries

##### 2.1. Problem Description

During the spreading process of diseases or rumors, their influences are usually sparked by one or several initial nodes in a social network. Because nodes occupy different locations in the overall network structure, they have different abilities to transmit diseases or rumors and thus exert different influences on the network. Therefore, it is necessary to evaluate the influences of nodes and then rank them. This measurement is helpful in scientific decision-making on social networks, such as monitoring public opinion transmission and controlling disease propagation.

In this paper, we assume that the initial source of spreading is only one node in a network. Then, the *node influence analysis* in a social network can be formally described as below: given a social network represented by a directed graph $G = (V, E)$, for each node $v_i \in V$, its influence is first measured by considering its location and its connections to other nodes in $G$, and then all nodes are ranked according to their influence metrics. Here, $V$ and $E$ are the node set and edge set of the network, respectively.

It should be noted that information or disease propagation may be caused by several original source nodes in a social network, and hence identifying multiple influential nodes is also an interesting problem [22]. In this paper, however, we mainly focus on the influence evaluation for a single source node.

##### 2.2. Three Typical Measures for Node Influences

In this study, our objective is to design a framework for evaluating node influences in a social network by comprehensively considering some basic measures. In the past, quite a few measures have been presented to capture the importance of each node in a network. *Degree centrality* [16], *betweenness centrality* [16, 17], and *closeness centrality* [16] are three basic, representative, and widely used measures of node influence. As a result, these three measures are well suited for use in our comprehensive model of influential node identification. Similarly, in [23–26], the three measures are also used in MADM (multiple-attribute decision-making) models to identify influential nodes. Here, we first give a brief review of them.

###### 2.2.1. Degree Centrality

The degree centrality is the earliest and simplest method to depict the influence of a node in a network. For node $v_i$, the influence is directly reflected by its degree, that is, the so-called degree centrality. Here, it is denoted as $DC(v_i)$ and formally defined as

$$DC(v_i) = \frac{k_i}{n - 1}, \tag{1}$$

where $k_i$ is the degree of node $v_i$, and $n$ is the number of nodes in the given network.

Degree centrality measures the node’s importance from the perspective of degree. Its inherent limitation lies in that it can only reflect the local structure around a given node, i.e., the node and its neighbors, but the reachability from it to the nodes beyond its neighborhood is completely ignored.
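As a minimal sketch of this measure, the snippet below computes degree centrality assuming the usual normalized form, degree divided by $n - 1$. The adjacency-dictionary layout, the function name `degree_centrality`, and the five-node toy network are illustrative assumptions, and the network is treated as undirected for simplicity; on a directed graph one would have to choose in-degree or out-degree.

```python
def degree_centrality(adj):
    """Return k_i / (n - 1) for every node of an undirected graph.

    `adj` maps each node to the set of its neighbours
    (a hypothetical layout chosen for this sketch).
    """
    n = len(adj)
    if n < 2:
        # A single node has no possible neighbours.
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}


# Toy network (invented for illustration): "a" is a hub, "e" hangs off "d".
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
dc_scores = degree_centrality(toy)
```

Here the hub `a` touches three of the four other nodes, so it obtains the largest score, while the measure says nothing about how far `a` can reach beyond its neighbours.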

###### 2.2.2. Betweenness Centrality

The betweenness centrality is used to capture how well situated a node is in terms of the paths that it lies on. Specifically, for a node $v_i$ in network $G$, its betweenness centrality (denoted as $BC(v_i)$) is the fraction of the shortest paths passing through node $v_i$ to all shortest path pairs in network $G$:

$$BC(v_i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(v_i)}{\sigma_{st}}, \tag{2}$$

where $\sigma_{st}$ is the number of the shortest paths between nodes $v_s$ and $v_t$, and $\sigma_{st}(v_i)$ denotes the number of the shortest paths between $v_s$ and $v_t$ which pass through node $v_i$.

It is easy to see that betweenness centrality is a measure to reflect the gateway feature of a node. But it has poor capability to express the strength of connections from the node of interest to its neighbors.
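The following sketch shows one common way to compute this quantity, namely Brandes' algorithm, which is a standard technique and not something prescribed by the text above. It assumes an unweighted, undirected network stored as an adjacency dictionary, and it returns the unnormalized sum of shortest-path fractions over unordered node pairs; the function name and the toy network are illustrative assumptions.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    For each node v, returns the sum over unordered pairs {s, t}
    (s != v != t) of sigma_st(v) / sigma_st.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        dist = {v: -1 for v in adj}   # hop distance from s (-1 = unreached)
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []                    # nodes in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


# Toy network (invented for illustration).
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
bc_scores = betweenness_centrality(toy)
```

In this toy network every shortest path between two of `b`, `c`, `d`, and `e`, except the one between `d` and `e`, runs through `a`, which is the gateway behaviour the measure is meant to expose.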

###### 2.2.3. Closeness Centrality

The closeness centrality is a measure of how close a given node is to all other nodes in a network. For node $v_i$, its closeness centrality, denoted as $CC(v_i)$, can be defined as

$$CC(v_i) = \frac{n - 1}{\sum_{j \neq i} d_{ij}}, \tag{3}$$

where $d_{ij}$ represents the distance between node $v_i$ and node $v_j$, and $n$ is the number of nodes in the network.

According to the definition in (3), closeness centrality can characterize the speed of information propagation for a given node, but it cannot distinguish the difference in node location information like the gateway.
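A minimal sketch of closeness centrality follows, assuming the common normalized form in which $n - 1$ is divided by the sum of hop distances from the node to all others. The network is assumed to be unweighted and undirected; assigning 0.0 to a node that cannot reach every other node is a convention chosen here, not something stated in the text. Names and the toy network are illustrative.

```python
from collections import deque


def closeness_centrality(adj):
    """Return (n - 1) / sum_j d(i, j) for each node, using BFS hop distances."""
    n = len(adj)
    result = {}
    for source in adj:
        # Breadth-first search gives shortest hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        # Assumption: a node that cannot reach all others scores 0.0.
        reachable_all = len(dist) == n and total > 0
        result[source] = (n - 1) / total if reachable_all else 0.0
    return result


# Toy network (invented for illustration).
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
cc_scores = closeness_centrality(toy)
```

The hub `a` is at most two hops from every node and therefore scores highest, whereas the peripheral node `e` scores lowest.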

Based on the analysis of the above three measures, we find that each measure has its own strengths in reflecting information (or disease) propagation, but each also has shortcomings. Therefore, combining these representative measures into a comprehensive measure is probably a rational way to identify influential nodes. As mentioned earlier, the basic measures in our framework can be extended or replaced according to specific requirements. Besides the above three centrality measures, quite a few other measures have been presented in recent years, such as diffusion centrality [27], sociability centrality [28], and BridgeRank [29]. In fact, all these basic measures can be applied in our comprehensive framework for evaluating the influences of nodes in a social network. For the sake of simplicity, we take only the three basic and representative centrality measures into consideration in this study.

##### 2.3. Random Walk Model

The random walk model is a special case of a Markov chain, namely a finite and time-reversible Markov chain. It arises in many models in mathematics and physics [30]. In the field of computer science, the random walk is usually modeled in the following way: suppose there is a system with $n$ states, and the initial probability distribution of these states is represented as $\pi^{(0)}$. In this system, the states can transit to each other. Specifically, the transition probabilities from state $i$ to all states should sum to 1.0, that is, $\sum_{j=1}^{n} p_{ij} = 1$, where $p_{ij}$ is the transition probability from state $i$ to state $j$. The transition probabilities of all state pairs can be represented as a state transition probability matrix, i.e., $P = [p_{ij}]_{n \times n}$. Thus, the random walk model can be clearly described using matrix notation [31]. Let $\pi^{(t)}$ be the probability distribution of states after walking $t$ steps; then it can be iteratively calculated from the initial distribution $\pi^{(0)}$ as below.

$$\pi^{(t)} = \pi^{(t-1)} P = \pi^{(0)} P^{t}. \tag{4}$$

In fact, the random walk model can be easily applied to a directed graph [30]. The nodes in the directed graph are considered as states, and the connection strength of a directed edge between two nodes is treated as a transition probability. Once an initial distribution over all nodes is determined, the stationary distribution can be obtained by repeatedly applying the transition shown in (4) for a finite number of steps.
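The iteration in (4) can be sketched in a few lines. The snippet below assumes the row-vector convention, in which the current distribution is multiplied on the right by a row-stochastic matrix, and stops once successive distributions differ by less than a tolerance. The function names, the tolerance, and the three-state example matrix are illustrative assumptions.

```python
def walk_step(pi, P):
    """One random-walk step: return the row vector pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(P, pi0=None, tol=1e-12, max_steps=10_000):
    """Iterate pi <- pi * P until the distribution stops changing."""
    n = len(P)
    # Every row of a transition matrix must sum to 1.
    for row in P:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of P must sum to 1")
    # Default to a uniform initial distribution.
    pi = list(pi0) if pi0 is not None else [1.0 / n] * n
    for _ in range(max_steps):
        nxt = walk_step(pi, P)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Three-state example chain (invented for illustration).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi_star = stationary_distribution(P)
```

For this example chain the iteration settles on the distribution (0.25, 0.5, 0.25), which can be checked by hand because multiplying it by `P` leaves it unchanged. Convergence is not guaranteed for every transition matrix; a periodic or reducible chain may need extra treatment.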

#### 3. Comprehensive Algorithm for Evaluating Node Influences

##### 3.1. The Overall Framework

In this paper, we attempt to design a comprehensive algorithm for evaluating node influences by jointly considering three basic and independent measures of influence. Thus, the three basic measures are the input data for further processing in our algorithm. Here, assume that the basic measures, namely $DC$, $BC$, and $CC$, are obtained by degree counting and path analysis on the given network according to (1), (2), and (3).

As shown in Figure 1, the procedure of comprehensively ranking nodes according to their influences can be divided into the following two steps. In the first step, for each basic measure, the metrics of all nodes are first normalized into the interval from 0 to 1. Then, the preference relation of each node pair is analyzed by comparing the metrics of the two nodes in the pair. Based on the preference relations, a subgraph of the preference relation (also known as the partial preference relation graph) can be built. In this graph, the nodes are still the nodes of the original network, but each edge represents the preference relation of two nodes with respect to the given basic measure. In the second step, a complete preference relation graph is formed by adding the $m$ subgraphs together. In this paper, $m$ is set to 3 because we mainly combine three centrality measures (i.e., $DC$, $BC$, and $CC$) in our algorithm. Then, the complete graph is converted to a matrix, which is normalized for further computation. Finally, a ranked list of nodes is generated by applying a random walk on the complete model of preference relations.
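The two steps above can be sketched end to end as follows. This is only one possible reading of the framework, not the authors' exact construction, and it rests on several assumptions that the overview does not specify: min-max scaling for the normalization, a unit-weight preference edge from the lower-scoring node to the higher-scoring node, row normalization of the summed matrix, a uniform row for nodes with no outgoing preference edge, and a PageRank-style damping factor added purely so that the walk converges. The function names and the toy centrality values are illustrative.

```python
def min_max(scores):
    """Rescale a {node: value} dict into the interval [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {v: (x - lo) / span if span else 0.0 for v, x in scores.items()}


def comprehensive_rank(measures, damping=0.85, tol=1e-12, max_steps=10_000):
    """Rank nodes from a list of {node: value} centrality dicts."""
    nodes = sorted(measures[0])
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    # Step 1: one partial preference graph per measure, summed directly
    # into the weight matrix W of the complete graph.
    # ASSUMPTION: an edge u -> v of weight 1 means "v is preferred to u".
    W = [[0.0] * n for _ in range(n)]
    for raw in measures:
        scaled = min_max(raw)
        for u in nodes:
            for v in nodes:
                if scaled[u] < scaled[v]:
                    W[index[u]][index[v]] += 1.0
    # Step 2a: turn W into a transition matrix.
    # ASSUMPTION: empty rows become uniform; damping is added for convergence.
    P = []
    for row in W:
        total = sum(row)
        base = [w / total for w in row] if total else [1.0 / n] * n
        P.append([damping * p + (1.0 - damping) / n for p in base])
    # Step 2b: random walk from a uniform start until it stabilizes.
    pi = [1.0 / n] * n
    for _ in range(max_steps):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        done = max(abs(a - b) for a, b in zip(nxt, pi)) < tol
        pi = nxt
        if done:
            break
    score = dict(zip(nodes, pi))
    return sorted(nodes, key=lambda v: score[v], reverse=True), score


# Illustrative centrality values for a five-node toy network.
dc = {"a": 0.75, "b": 0.25, "c": 0.25, "d": 0.50, "e": 0.25}
bc = {"a": 5.0, "b": 0.0, "c": 0.0, "d": 3.0, "e": 0.0}
cc = {"a": 0.80, "b": 0.50, "c": 0.50, "d": 0.667, "e": 0.444}
ranking, score = comprehensive_rank([dc, bc, cc])
```

With these toy values, `a` is preferred under all three measures and ends up first, followed by `d`, while `e` loses on closeness to every other node and ends up last. Swapping in a different edge-weighting rule, for example the difference between the two normalized scores, only changes how `W` is filled.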