Complexity

Volume 2017 (2017), Article ID 5049836, 14 pages

https://doi.org/10.1155/2017/5049836

## On the Shoulders of Giants: Incremental Influence Maximization in Evolving Social Networks

School of Computer, National University of Defense Technology, Changsha 410073, China

Correspondence should be addressed to Xiaodong Liu; liuxiaodong@nudt.edu.cn

Received 13 March 2017; Revised 4 July 2017; Accepted 1 August 2017; Published 28 September 2017

Academic Editor: Piotr Brodka

Copyright © 2017 Xiaodong Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The influence maximization problem aims to identify the most influential individuals in a social network in order to support effective viral marketing strategies. Previous studies mainly focus on designing efficient algorithms or heuristics for static social networks. In fact, real-world social networks keep evolving over time, and recalculating on every changed network inevitably leads to long running times. In this paper, we propose an incremental approach, IncInf, which efficiently locates the top-$k$ influential individuals in evolving social networks by reusing previous information instead of calculating from scratch. In particular, IncInf quantitatively analyzes the changes in the nodes' influence spread by localizing the impact of topology evolution to local regions, and a pruning strategy further narrows the search space to nodes that either experience major increases in influence spread or have high degrees. To evaluate its efficiency and effectiveness, we carried out extensive experiments on three real-world dynamic social networks: Facebook, NetHEPT, and Flickr. Experimental results demonstrate that, compared with the state-of-the-art static algorithm, IncInf achieves remarkable speedup in execution time while matching its influence spread.

#### 1. Introduction

The increasing popularity of online social networks has promoted the diffusion of information and opinions and the adoption of new products, and it has provided great opportunities for intelligent viral marketing. To make the most of the word-of-mouth effect, influence maximization (IM) is a fundamental problem that aims to identify a small set of influential individuals who can maximize the influence over a given social network, so as to support effective viral marketing strategies [1]. In fact, real-world social networks keep evolving over time. For example, on Facebook, new people might join, existing members might leave, and people might make new friends with each other. Moreover, real-world social networks evolve at a surprising speed; it is reported that as many as 1 million new accounts are created on Twitter every day [2]. Such massive evolution of network topology may, in turn, significantly transform the network structure, raising a natural need for efficient reidentification of influential individuals.

Existing research on influence maximization focuses mainly on developing effective and efficient algorithms for a given static social network. Although one could rerun any static influence maximization method, such as [3–6], to find the new top-$k$ influential individuals whenever the network is updated, this approach has inherent drawbacks that cannot be neglected: (i) the running time of a static method can be extremely long and unacceptable, especially on large-scale social networks, and (ii) whenever the network topology changes, the influence spreads of all nodes must be recalculated, which is very costly. Can we quickly and efficiently identify the influential nodes in evolving social networks? Can we incrementally update the influential nodes based on previously known information instead of frequently recalculating from scratch?

Unfortunately, the rapidly and unpredictably changing topology of a dynamic social network poses several challenges for the reidentification of influential users. On the one hand, the edges of real-world social graphs are intricately interconnected; as a result, even one small change in topology may affect the influence spreads of a large number of nodes, not to mention the massive changes in large-scale social networks. It is therefore very difficult to efficiently compute the changes in influence spread for all nodes after the evolution. On the other hand, since large-scale social networks contain a huge number of nodes, effectively limiting the set of potential influential nodes and minimizing the amount of calculation is a very challenging problem.

To address these challenges, we investigate the dynamic characteristics exhibited during the evolution of real-world social networks. Through tests on three real-world dataset traces, Facebook, NetHEPT, and Flickr, we observe, first, that the growth of social networks mainly follows the preferential attachment principle [7]; that is, new edges prefer to attach to nodes with higher degrees, which naturally leads to the “rich-get-richer” phenomenon; and, second, that the top-$k$ influential nodes are mainly selected from these high-degree nodes. These observations suggest that the influence changes of some nodes have no impact on the top-$k$ selection and can thus be pruned to reduce the amount of calculation. Motivated by this, we propose IncInf, an incremental method that identifies the top-$k$ influential nodes in evolving social networks instead of recalculating from scratch, thus significantly improving efficiency and scalability on extraordinarily large-scale networks. To sum up, the main contributions of IncInf are as follows.
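The preferential attachment principle can be made concrete with a small sketch. It assumes the standard formulation (as in the Barabási–Albert model) in which a new edge attaches to an existing node with probability proportional to that node's current degree; the function names `attachment_probabilities` and `grow` are hypothetical and are not part of IncInf.

```python
import random
from collections import Counter


def attachment_probabilities(degree):
    """Probability that a new edge attaches to each node, proportional to its degree."""
    total = sum(degree.values())
    return {node: d / total for node, d in degree.items()}


def grow(degree, new_edges, rng):
    """Attach `new_edges` edges one at a time by degree-proportional sampling.

    Only the target's degree is tracked (the new edge's other endpoint is
    ignored in this toy model). Each attachment makes the chosen node a more
    likely future target, which is the "rich-get-richer" feedback loop.
    """
    degree = Counter(degree)
    for _ in range(new_edges):
        nodes = list(degree)
        target = rng.choices(nodes, weights=[degree[n] for n in nodes], k=1)[0]
        degree[target] += 1
    return degree


print(attachment_probabilities({"a": 8, "b": 1, "c": 1}))
# {'a': 0.8, 'b': 0.1, 'c': 0.1}
```

Here node `a`, which already holds 80% of the degree mass, receives each new edge with probability 0.8, so high-degree nodes tend to stay on top as the network grows.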

First, we design an efficient approach that quantitatively analyzes the changes in influence spread caused by network topology evolution by adopting the idea of localization. A tunable parameter is provided to trade off efficiency against effectiveness.

Second, we propose a pruning strategy that effectively narrows the search space to nodes that either experience major increases in influence spread or have high degrees, based on the changes in influence spread and the previous top-$k$ information.

Third, we conduct extensive experiments on three dynamic real-world social networks. Compared with the state-of-the-art static algorithm, IncInf achieves remarkable speedup in execution time while providing matching influence spread. Moreover, IncInf scales better to extraordinarily large-scale networks.
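The localization and pruning ideas can be illustrated with a toy sketch built on simplifying assumptions of our own; it is not the IncInf procedure detailed in Section 5. Under the IC model, a newly added edge (u, v) can only change the influence spread of nodes that can reach u, so the hypothetical `affected_region` collects the nodes within a hop limit `h` upstream of each added edge (a larger `h` is more accurate but slower). The hypothetical `prune_candidates` then keeps only nodes whose spread gain exceeds a threshold `delta` or whose degree ranks among the `top_degree` highest. All names and parameters here are illustrative.

```python
from collections import deque


def affected_region(in_neighbors, added_edges, h):
    """Nodes within h reverse hops of the source u of any added edge (u, v).

    `in_neighbors[x]` lists the predecessors of x in the new snapshot. Only
    nodes that can reach u may see their IC influence spread change; capping
    the reverse breadth-first search at h hops trades accuracy for speed.
    """
    region = set()
    for u, _v in added_edges:
        seen = {u}
        frontier = deque([(u, 0)])
        while frontier:
            node, dist = frontier.popleft()
            if dist == h:
                continue  # hop limit reached; do not expand further
            for pred in in_neighbors.get(node, ()):
                if pred not in seen:
                    seen.add(pred)
                    frontier.append((pred, dist + 1))
        region |= seen
    return region


def prune_candidates(spread_gain, degree, delta, top_degree):
    """Keep nodes whose spread gain exceeds delta or whose degree is among the top_degree highest."""
    by_degree = sorted(degree, key=degree.get, reverse=True)
    high_degree = set(by_degree[:top_degree])
    major_gain = {node for node, gain in spread_gain.items() if gain > delta}
    return major_gain | high_degree
```

For a path a → b → c that gains the edge (c, d), a hop limit of 1 marks only b and c as affected, while a limit of 2 also includes a.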

A preliminary version of this paper appeared in [8], where we presented the basic idea of the IncInf algorithm. In this paper, we make the following additional contributions. First, we add experiments comparing IncInf with IMM [9] in terms of influence spread and running time. Second, we test the pruning strategy to demonstrate its effectiveness. Third, we add a new experiment that evaluates the sensitivity of influence spread and running time to the localization parameter and the pruning threshold.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the preliminaries and the problem definition. Section 4 shows the structural evolution characteristics of dynamic social networks that we observe in three datasets: Facebook, NetHEPT, and Flickr. Section 5 details the design of our incremental algorithm, IncInf. Section 6 evaluates the performance of IncInf through comprehensive experiments, and Section 7 concludes the paper.

#### 2. Related Work

Influence maximization on static networks has attracted a great deal of attention. The hill-climbing greedy algorithm proposed by Kempe et al. [29] suffers from low efficiency, and many efficient algorithms have been proposed recently to address this problem. Leskovec et al. [5] exploit the submodularity of the influence spread function and develop an optimized greedy algorithm, CELF, which is much faster than the basic greedy algorithm. Chen et al. [3] propose MixGreedy, which computes the influence spread for each seed set in a single simulation and incorporates the CELF optimization. MIA [4] uses local arborescence structures of each node to approximate the influence spread, thereby gaining efficiency by restricting computations and updates to local regions. However, MIA only considers static networks, whereas in this paper we specifically design an incremental algorithm for evolving social networks. Recently, Wang et al. [10] proposed a Community Greedy Algorithm (CGA) that takes community structure into account. Goyal et al. propose CELF++ [11], which further exploits the submodularity of the spread function to avoid unnecessary recomputations of marginal gains and considerably improves the efficiency of the CELF algorithm. IRIE [12], proposed by Jung et al., is a heuristic that integrates influence ranking with influence estimation to achieve scalability. Liu et al. [13] design a new framework that accelerates influence maximization by leveraging the parallel processing capability of GPUs. Chen et al. [14] develop a community-based framework for the influence maximization problem with an emphasis on efficiency. Tang et al. [9] design a martingale approach that aims to find the top-$k$ nodes in near-linear time. In [15], Wang proposes a method that obtains each node’s marginal contribution through the Owen value and deploys it in online terrorist network analysis.

Lu et al. study the complexity of the influence maximization problem under the deterministic linear threshold model in [16]. In [17], Lu et al. show how to efficiently estimate the influence spread for influence maximization under the linear threshold model. In [18], Nguyen and Zheng focus on the budgeted influence maximization (BIM) problem, which aims to select seed nodes whose total cost does not exceed a fixed budget. Han et al. [19] study influence maximization in timeliness networks and design a novel algorithm that incorporates time delay for timeliness and opportunistic selection for acceptance ratio. Liu et al. [20] propose the time-constrained influence maximization problem and develop a set of parallel algorithms to achieve further time savings. Pei et al. [21] take advantage of the concept of subcritical paths and propose CI-TM, a collective influence algorithm of optimal percolation for second-order transitions.
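The lazy-evaluation idea behind CELF [5] can be sketched generically. Because the spread function is submodular, a node's marginal gain can only shrink as the seed set grows, so a gain computed in an earlier round is an upper bound, and only the node at the top of a max-heap ever needs re-evaluation. The sketch below is an illustration rather than CELF's original code; it assumes a caller-supplied set function `spread` (a hypothetical stand-in for, e.g., a Monte Carlo estimator, which is only approximately submodular).

```python
import heapq


def celf(nodes, spread, k):
    """Lazy-forward greedy selection of k seeds for a submodular set function.

    `spread(seeds)` returns the (estimated) spread of a frozenset of nodes.
    Nodes must be mutually comparable so that heap ties can be broken.
    """
    seeds = frozenset()
    current = spread(seeds)
    # Max-heap entries: (-gain, node, seed-set size when the gain was computed).
    heap = [(-(spread(frozenset([v])) - current), v, 0) for v in nodes]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # The gain is fresh; by submodularity no stale gain can exceed it.
            seeds = seeds | {v}
            current -= neg_gain  # i.e., current + gain
        else:
            # Stale upper bound: recompute against the current seed set.
            gain = spread(seeds | {v}) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds
```

With a set-coverage function as `spread`, the sketch re-evaluates only one stale node before confirming the second seed, whereas the basic greedy algorithm would recompute the gain of every remaining node.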

The influence maximization problem on dynamic social networks remains largely unexplored. Habiba et al. [22] and Michalski et al. [23] propose dynamic social network models that differ from ours: in their proposals, the network keeps evolving during influence propagation, and the goal is to find the top-$k$ influential nodes over such a dynamic network. In contrast to [22, 23], our work is based on a snapshot graph model, and our goal is to incrementally identify the top-$k$ influential nodes based on the topology changes between two adjacent snapshots. Chen et al. [24] extend the IC model to incorporate time delays in influence diffusion among individuals in social networks and consider time-critical influence maximization, in which one wants to maximize the influence spread within a given deadline. Meanwhile, in [25], the authors consider a continuous-time formulation of the influence maximization problem in which information or influence can spread at different rates across different edges. Aggarwal et al. [26] aim to discover influential nodes in dynamic social networks and design a stochastic approach that determines the information flow authorities using a globally forward approach and a locally backward approach; their influence model and target differ from ours. Zhuang et al. [27] argue that the evolution of online social networks cannot be fully observed and design a probing strategy so that the actual influence diffusion process can be best uncovered with the probed nodes. Tong et al. [28] mainly focus on the fact that diffusion processes in real-world dynamic social networks involve many kinds of uncertainty and propose a method that selects seed users adaptively.
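In the snapshot graph model, the input to an incremental method is the difference between two adjacent snapshots. A minimal sketch, assuming each snapshot is given as a set of directed `(u, v)` edge pairs (a representation chosen here for illustration), computes the added and removed edges and the nodes they touch, which is where any recomputation would start.

```python
def snapshot_delta(old_edges, new_edges):
    """Return (added, removed, touched) between two snapshots of directed edges."""
    old_edges, new_edges = set(old_edges), set(new_edges)
    added = new_edges - old_edges    # edges that appear only in the newer snapshot
    removed = old_edges - new_edges  # edges that disappeared since the older snapshot
    # Every endpoint of a changed edge is a node whose neighborhood changed.
    touched = {node for edge in added | removed for node in edge}
    return added, removed, touched


g1 = {("a", "b"), ("b", "c")}
g2 = {("a", "b"), ("c", "d"), ("d", "a")}
added, removed, touched = snapshot_delta(g1, g2)
```

Here `added` holds the two new edges `("c", "d")` and `("d", "a")`, `removed` holds `("b", "c")`, and the unchanged edge `("a", "b")` appears in neither.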

#### 3. Preliminaries and Problem Statement

In this section, we introduce the definition of a social network and the influence diffusion model used throughout the paper, and then define the influence maximization problem in evolving networks.

##### 3.1. Preliminaries on Influence Maximization

*Social Network*. A social network is formally defined as a directed graph $G = (V, E, P)$, where the node set $V$ denotes the entities in the social network. Each node can be either active or inactive and switches from inactive to active if it is influenced by other nodes. The edge set $E$ is a set of directed edges representing the relationships between users. Take Twitter as an example: a directed edge $(u, v)$ is established from node $u$ to node $v$ if $u$ is followed by $v$, which indicates that $v$ may be influenced by $u$. $P$ denotes the influence probabilities of edges; each edge $(u, v) \in E$ is associated with an influence probability given by the function $p: V \times V \to [0, 1]$. If $(u, v) \notin E$, then $p(u, v) = 0$.
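The definition maps directly onto a small data structure. The sketch below assumes edge probabilities are stored in a nested dictionary `out[u][v]` (our choice for illustration, not the paper's); the class name `SocialNetwork` is hypothetical. Its `p` method returns 0 for any pair that is not an edge, and the Twitter example is encoded by adding an edge from the followed user to the follower.

```python
from dataclasses import dataclass, field


@dataclass
class SocialNetwork:
    """Directed graph G = (V, E, P) with one influence probability per edge."""

    nodes: set = field(default_factory=set)
    out: dict = field(default_factory=dict)  # u -> {v: p(u, v)}

    def add_edge(self, u, v, prob):
        """Add edge (u, v): v may be influenced by u with probability prob."""
        if not 0.0 <= prob <= 1.0:
            raise ValueError("influence probability must lie in [0, 1]")
        self.nodes.update((u, v))
        self.out.setdefault(u, {})[v] = prob

    def p(self, u, v):
        """Influence probability of (u, v); 0 for pairs that are not edges."""
        return self.out.get(u, {}).get(v, 0.0)

    def edges(self):
        return [(u, v) for u, nbrs in self.out.items() for v in nbrs]


# Twitter-style example: alice is followed by bob, so alice may influence bob.
g = SocialNetwork()
g.add_edge("alice", "bob", 0.1)
```

Storing out-neighbors per node keeps `p(u, v)` a constant-time lookup and makes it cheap to iterate over the neighbors a newly activated node may try to influence.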

*Independent Cascade (IC) Model*. The IC model is a popular diffusion model that has been widely studied [3, 6, 10, 29]. Given an initial seed set $S \subseteq V$, the diffusion process of the IC model works as follows. At step 0, only the nodes in $S$ are active, while all other nodes are inactive. At step $t$, each node $u$ that has just switched from inactive to active has a single chance to activate each currently inactive neighbor $v$, and it succeeds with the edge's influence probability $p(u, v)$. If $u$ succeeds, $v$ becomes active at step $t + 1$. If $v$ has multiple newly activated neighbors, their attempts to activate $v$ are sequenced in an arbitrary order. The process runs until no more activations are possible [29]. We use $\sigma(S)$ to denote the influence spread of the initial set $S$, defined as the expected number of active nodes at the end of the influence propagation.
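Since the expected spread has no simple closed form under the IC model, it is commonly estimated by Monte Carlo simulation. The sketch below assumes the graph is an adjacency dictionary `graph[u] = {v: p(u, v)}` (a representation chosen for illustration) and simulates the cascade step by step as described above: each newly activated node gets exactly one attempt per still-inactive out-neighbor. Averaging the final number of active nodes over `runs` simulations estimates the influence spread of the seed set. The function names are hypothetical.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one IC cascade and return the set of nodes active at the end."""
    active = set(seeds)
    newly_active = list(active)  # nodes activated in the previous step
    while newly_active:
        next_active = []
        for u in newly_active:
            for v, p in graph.get(u, {}).items():
                # u gets a single chance to activate each still-inactive neighbor;
                # attempts on the same v are processed in iteration order.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_active.append(v)
        newly_active = next_active
    return active


def estimate_spread(graph, seeds, runs=1000, rng_seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(rng_seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs
```

On a deterministic chain `{"a": {"b": 1.0}, "b": {"c": 1.0}}`, seeding `{"a"}` always activates all three nodes, so the estimate is exactly 3.0; with probabilities below 1, the estimate converges to the expected spread as `runs` grows.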

*Basic Greedy Algorithm*. Domingos and Richardson [1, 30] first introduced the influence maximization problem on static networks in 2001. In [29], Kempe et al. propose a basic hill-climbing greedy algorithm, shown in Algorithm 1. The greedy algorithm works in iterations, starting with an empty seed set $S = \emptyset$. In each iteration, the node $v$ that brings the maximum marginal influence spread, that is, the largest increase in the expected spread when $v$ is added to $S$, is included in $S$. The process ends when the size of $S$ reaches the target size $k$. However, this algorithm has a serious efficiency drawback due to its compute-intensive influence spread calculation. Several recent studies [3–6, 10, 12, 31–35] have aimed to address this efficiency issue.
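The hill-climbing loop described above can be written in a few lines. This sketch assumes a caller-supplied function `spread(seeds)` that estimates the influence spread of a seed set (for example, by Monte Carlo simulation of the IC model); `greedy_im` is a hypothetical name. Each iteration re-evaluates the marginal gain of every remaining node, which is exactly the cost that the CELF-style and heuristic methods discussed in Section 2 try to avoid.

```python
def greedy_im(nodes, spread, k):
    """Basic hill-climbing greedy: add the node with maximum marginal gain, k times."""
    seeds = set()
    for _ in range(min(k, len(nodes))):
        base = spread(seeds)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = spread(seeds | {v}) - base  # marginal influence spread of v
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.add(best)
    return seeds
```

With `n` candidate nodes, selecting `k` seeds costs on the order of `k * n` spread evaluations, each of which may itself require thousands of simulations.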