Influence Maximization Algorithm Based on Reverse Reachable Set

Sun, Gengxin; Chen, Chih-Cheng

doi:https://doi.org/10.1155/2021/5535843

Mathematical Problems in Engineering


Special Issue

Mathematical Problems of Applied System Innovations for IoT Applications


Research Article | Open Access

Volume 2021 | Article ID 5535843 | https://doi.org/10.1155/2021/5535843


Gengxin Sun¹ and Chih-Cheng Chen²,³

Academic Editor: Isabella Torcicollo

Received 11 Feb 2021

Revised 05 Apr 2021

Accepted 19 Jul 2021

Published 28 Jul 2021

Abstract

Most existing influence maximization algorithms are unsuitable for large-scale social networks because of their high time complexity or limited influence spread. Therefore, a D-RIS (dynamic-reverse reachable set) influence maximization algorithm is proposed, based on the independent cascade model and combined with reverse reachable set sampling. Under the premise that the influence propagation function is monotone and submodular, the D-RIS algorithm uses an automatic tuning method to determine the critical number of reverse reachable sets, which both yields a better influence spread and greatly reduces the time complexity. Experimental results on two real datasets, Slashdot and Epinions, show that the influence spread of D-RIS is close to that of the CELF (cost-effective lazy-forward) algorithm and higher than those of the RIS, HighDegree, LIR, and pBmH (population-based metaheuristics) algorithms. At the same time, D-RIS runs significantly faster than CELF and RIS, which indicates that it is more suitable for large-scale social networks.

1. Introduction

With the rapid development of social networks and the continued growth in the number of users and the scale of information dissemination, the influence maximization problem has received increasing attention. It is widely used in “viral marketing” [1, 2], which maximizes brand awareness through word-of-mouth effects among users. With limited resources, the key to maximizing influence is therefore to select appropriate initial users so as to maximize the final spread.

Richardson et al. [1] formulated influence maximization as an algorithmic problem: under a given information diffusion model, select a set of k initial seed nodes from a social network to maximize the final influence spread. Kempe et al. [3] proved for the first time that influence maximization is NP-hard under the Independent Cascade (IC) model [4] and the Linear Threshold (LT) model [5], and proposed a greedy algorithm (GA) that iteratively selects the node with the largest marginal gain, which guarantees a solution within a factor of $(1 - 1/e)$ of the optimum. Because of its high time complexity, however, this algorithm is not suitable for large-scale social networks, and many researchers have proposed optimizations to address the low efficiency of the greedy algorithm. In 2007, Leskovec et al. [6] proposed the Cost-Effective Lazy Forward (CELF) algorithm, which exploits the submodularity of the influence propagation function to speed up the greedy algorithm by a factor of 700. In 2011, Goyal et al. [7] proposed the CELF++ algorithm, which further reduced the running time of CELF. These algorithms achieve some speedup, but each time a node is added to the seed set, the marginal increase in influence must still be computed, so their efficiency remains low and they are difficult to apply to large-scale social networks. At present, most researchers use heuristic algorithms to improve running speed. References [8, 9] proposed influence maximization algorithms based on degree centrality. In 2010, Chen et al. [10] proposed the PMIA algorithm based on the maximum influence paths between nodes. In 2012, Jung et al. [11] proposed the IRIE heuristic for the IC model. 
In addition, heuristic influence maximization algorithms based on network topology have been proposed [12–14]. In 2016, Xie et al. [15] proposed a new heuristic algorithm to improve efficiency, and Cao et al. [16] proposed the CCA algorithm based on k-cores. However, these algorithms focus only on the network topology and lack theoretical guarantees, so they may fail to obtain the optimal solution. To address these problems, Sun et al. [13] proposed an RIS algorithm that combines theoretical guarantees with practical efficiency. The algorithm generates a certain number of reverse reachable sets and estimates node influence from these samples to select nodes, so its time complexity is close to linear and it carries a theoretical guarantee. Although the RIS algorithm has many advantages, it is not sufficiently accurate or stable in choosing the number of reverse reachable sets, and it therefore incurs a large computational cost in practice.

Among social animals, groups and individuals influence one another. Humans are highly social animals with complex means of communication, and interpersonal and social influence is pervasive in everyday life. An in-depth understanding of how influence arises and spreads helps us understand the behavior of groups and individuals, predict people's behavior, and provide a reliable basis and recommendations for decision-making by governments, institutions, enterprises, and other organizations.

In this paper, we propose a dynamic-reverse reachable set (D-RIS) algorithm based on reverse reachable sets. Instead of presetting a theoretical threshold for the number of reverse reachable sets, the algorithm uses the monotonicity and submodularity of the influence propagation function to set a judgment condition for the critical number of random reverse reachable sets, and it automatically tunes how many reverse reachable sets are generated. This avoids wasted time while obtaining a better influence spread.

2. Influence Maximization Algorithm Based on Reverse Reachable Set

The social network is abstracted as a directed graph $G = (V, E)$ with node set $V$ (users) and directed edge set $E$ (relationships between users), where $n = |V|$, $m = |E|$, and each edge $e = (u, v) \in E$ joins nodes $u, v \in V$. Assume that each edge $e \in E$ has a propagation probability $p(e) \in [0, 1]$; then $p(u, v)$ is the probability that node $u$ activates node $v$. For convenience, Table 1 lists the symbols commonly used in this article.

2.1. Propagation Model and Problem Description

To find the set of seed nodes with the greatest influence in a social network, a propagation model is needed to simulate how information spreads over the network. The classic information diffusion models include the IC model and the LT model.

The experiments in this article use the IC model to simulate the spread of user influence. In this model, the underlying network is a directed weighted graph $G = (V, E)$ with $n$ nodes and $m$ edges. The weight $p(e)$ of an edge $e = (v, u)$ is the probability that node $v$ propagates to node $u$ along $e$. Nodes in the IC model are in one of three states: activated, newly activated, or inactive. Each newly activated node $v$ has exactly one chance to activate each of its inactive out-neighbors $u$, succeeding with probability $p(v, u)$; the higher $p(v, u)$, the more likely the activation. The propagation process ends when no newly activated nodes remain in $G$. A propagation simulation under the IC model starts from a seed set $S$ and proceeds randomly. Let $I(S)$ be the (random) number of nodes eventually activated by the simulation, and let $\sigma(S) = \mathbb{E}[I(S)]$ be the expected influence spread of $S$. This model resembles the propagation process of infectious disease models [15, 16]: the seed set $S$ plays the role of the initially infected individuals, and activating adjacent nodes corresponds to the spread of disease from one individual to another.
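To make the IC model concrete, the following Python sketch simulates one cascade and estimates $\sigma(S)$ by Monte Carlo averaging, as described above. It is a minimal illustration, not the authors' implementation; the adjacency-list format `{node: [(neighbor, p), ...]}` and the function names `simulate_ic` and `estimate_spread` are assumptions made here.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, rng):
    """Run one independent cascade from `seeds` and return I(S), the number of activated nodes.

    `graph` maps each node to a list of (out_neighbor, probability) pairs (assumed format).
    """
    active = set(seeds)
    frontier = deque(active)               # newly activated nodes still to be processed
    while frontier:
        v = frontier.popleft()
        for u, p in graph.get(v, []):
            # each newly activated node gets exactly one attempt per inactive neighbor
            if u not in active and rng.random() < p:
                active.add(u)
                frontier.append(u)
    return len(active)

def estimate_spread(graph, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of sigma(S): the average of I(S) over `runs` simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

# Hypothetical four-node graph; the exact edges of Figure 1 are not reproduced here.
toy = {"a": [("b", 0.2), ("c", 0.8)], "b": [], "c": [("d", 1.0)]}
print(estimate_spread(toy, ["a"]))  # close to 2.8 = 1 + 0.2 + 0.8 + 0.8
```

Averaging many runs is exactly the Monte Carlo step that makes greedy-style algorithms expensive, which is the cost that RR set sampling is designed to avoid.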

The following example describes how influence spreads in the IC model.

Figure 1 shows the initial graph of a social network with four nodes; the weight on each edge is the propagation probability from its source node to its target node. The activation probability of all nodes in the network is set to 0.5. The information propagation process on this network is simulated as shown in Figure 1. The process starts from the initial seed set, whose node is activated at time $t_0$. At time $t_1$, this node activates one neighbor with probability 0.2 and another with probability 0.8. At the following steps, further nodes, including node b, can be activated with probability 1. In the run shown in Figure 1, the process ends when no new node in the network can be activated, and three nodes are activated in total, namely {a, b, c}. A different random outcome at an intermediate step leads to a different activated set. Because the IC model is probabilistic [17], neither the propagation process nor its final result is deterministic, so the Monte Carlo method [14] is commonly used to average multiple runs and ensure accurate results.

Given a social network $G = (V, E)$ and a constant $k$, the influence maximization problem is to find a set of seed nodes in $G$ with the widest influence spread under the IC model, that is, to find a node set $S \subseteq V$ with $|S| = k$ such that $\sigma(S)$ is maximized: $S^{*} = \arg\max_{S \subseteq V,\, |S| = k} \sigma(S)$.

2.2. RIS Algorithm

Borgs et al. [18] proposed the Reverse Influence Sampling (RIS) algorithm for the IC model, which differs fundamentally from other classic influence maximization algorithms. Instead of the Monte Carlo method, it uses a novel reverse reachable set (RR set) sampling method to estimate the expected influence spread of nodes. The main idea is to generate as few RR set samples as possible while still obtaining a near-optimal solution within a factor of $(1 - 1/e - \varepsilon)$. Borgs et al. prove that, for any $\varepsilon > 0$, the algorithm runs in time nearly linear in the size of the network.

The RIS algorithm avoids the high time complexity of the greedy algorithm and also addresses the lack of theoretical guarantees in heuristic algorithms, which cannot ensure an optimal solution. However, it cannot effectively control the number of random RR sets. Borgs et al. use a threshold-based method to generate random RR sets: generation stops once the total number of nodes and edges examined reaches a predetermined theoretical threshold. Although this method has nearly linear time complexity, tying RR set generation to a fixed theoretical threshold involves large hidden constants in practice, which leads to two shortcomings: (1) the number of RR sets actually generated is larger than the theoretical threshold, and (2) there is no guarantee that the theoretical threshold is the minimum number of RR set samples needed. Therefore, the RR set sample size chosen by this algorithm is imprecise, and the algorithm is not well suited to large-scale social networks.

2.3. D-RIS Algorithm Based on Reverse Reachable Sets

Most classic influence maximization algorithms either have excessive time complexity or cannot obtain the optimal solution. Based on the IC model and combined with reverse reachable set sampling, we propose the D-RIS (Dynamic-Reverse Influence Sampling) algorithm for influence maximization.

The D-RIS algorithm consists of two steps: (1) Generate reverse reachable sets (RR sets): randomly select $\theta = \rho n$ source nodes with replacement (a ratio $\rho$ of the $n$ nodes) and generate a collection $R$ of RR sets by performing propagation simulation on random graphs $g$ derived from $G$. The value of $\theta$ is determined by the method in Section 2.3.1. (2) Node selection: use the maximum coverage method to find the $k$ nodes that cover the most RR sets, and return the node set $S$.

Analysis of the RIS algorithm shows that if too few random RR sets are sampled, the algorithm selects nodes from insufficient samples and does not obtain the optimal solution. If too many random RR sets are sampled, the error is reduced but the time cost becomes too high. The number of RR sets therefore determines both the quality of the selected seed set, and hence the final influence spread, and the running time. Accordingly, this paper focuses on selecting the smallest possible RR set sample size so that the algorithm achieves a better balance between influence spread and efficiency.

This paper first follows the sampling method in [17] to define a unified reverse reachable set sampling framework. On this basis, Section 2.3.1 proposes a new critical value judgment method that dynamically selects as few RR set samples as possible. Finally, Section 2.3.2 uses the maximum coverage method to select the seed node set.

Given a network $G$, the algorithm captures the influence propagation process of nodes in $G$ by generating a collection $R$ of random RR sets. Let $R_v \subseteq V$ denote the random RR set of node $v$. The graph $g$ is a random graph obtained from $G$ by removing each edge $e$ independently with probability $1 - p(e)$. The specific definition and sampling process are as follows.

Definition 1. Reverse reachable set (RR set)
The reverse reachable set of a node $v$ is the set of nodes that can reach $v$ in the random graph $g$ (for each node $u$ in the RR set, there is a directed path from $u$ to $v$ in $g$).
The sampling process is as follows. (1) Randomly select a node $v \in V$. (2) Generate a random graph $g$ from the network $G$. (3) Return the reverse reachable set $R_v$ of node $v$ in the random graph $g$.
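Steps (1) to (3) can be sketched in Python as a reverse traversal from the source that flips each edge's coin lazily, only when the edge is examined; this gives the same distribution as first drawing the whole random graph $g$ and then searching it. The graph format and the names `reverse_graph` and `sample_rr_set` are illustrative assumptions, not the authors' code.

```python
import random

def reverse_graph(graph):
    """Convert out-edge lists {v: [(u, p), ...]} into in-edge lists {u: [(v, p), ...]}."""
    rev = {}
    for v, edges in graph.items():
        for u, p in edges:
            rev.setdefault(u, []).append((v, p))
    return rev

def sample_rr_set(rev, nodes, rng):
    """Return (source, RR set) for one random RR set under the IC model."""
    source = rng.choice(nodes)            # step (1): uniformly random source node v
    rr, stack = {source}, [source]
    while stack:                          # steps (2)-(3): reverse search in the random graph g
        v = stack.pop()
        for u, p in rev.get(v, []):       # in-edge u -> v, kept in g with probability p
            # each in-edge is examined at most once, when v is expanded
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return source, rr

graph = {"a": [("b", 1.0)], "b": [("c", 1.0)], "c": []}
rev = reverse_graph(graph)
rng = random.Random(7)
print(sample_rr_set(rev, ["a", "b", "c"], rng))
```

With all probabilities equal to 1, the RR set of a source is simply every node with a path to it; with probabilities equal to 0, it is the source alone.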
The node $v$ in the above sampling process is called the source of $R_v$, and every node in $R_v$ has some probability of activating $v$. Therefore, a node that appears in more RR sets can activate more nodes and thus produces a larger influence spread. By the same reasoning, if a node set $S$ with $k$ nodes covers a large number of RR sets, it has a strong ability to spread influence across the network; that is, $\sigma(S) = n \cdot \Pr[S \cap R_v \neq \emptyset]$, where $v$ is a uniformly random source. Hence the influence of $S$ is proportional to the probability that $S$ intersects a random RR set. To solve the influence maximization problem, we therefore need a lower bound on the number of RR sets in $R$. Based on this RR set sampling framework, Section 2.3.1 introduces a dynamic tuning method to determine the minimum number of RR sets.
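The identity $\sigma(S) = n \cdot \Pr[S \cap R_v \neq \emptyset]$ suggests a simple estimator: $n$ times the fraction of sampled RR sets that $S$ intersects. A minimal Python sketch follows; the function name `spread_from_rr_sets` and the example RR sets are hypothetical.

```python
def spread_from_rr_sets(rr_sets, seeds, n):
    """Estimate sigma(S) as n * (fraction of RR sets that intersect S)."""
    seeds = set(seeds)
    covered = sum(1 for rr in rr_sets if not seeds.isdisjoint(rr))
    return n * covered / len(rr_sets)

# Hypothetical RR sets on a 4-node network: the seed set {a} hits 3 of the 4 sets.
rr_sets = [{"a", "b"}, {"a"}, {"c", "d"}, {"a", "c"}]
print(spread_from_rr_sets(rr_sets, {"a"}, n=4))  # 3.0
```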
The following example illustrates how RR sets are generated for the social network in Figure 1 under the IC model. Figure 2 shows three random RR sets generated on $G$ for three randomly selected source nodes. Because node a appears in all three random RR sets, node a is the most influential node, and the final result is $S = \{a\}$.


2.3.1. Determination of the Number of Reverse Reachable Sets

Analysis of how the RIS algorithm chooses the number of random RR sets shows that the more RR sets are generated (the larger $R$), the more accurate the selected seed set, but at the cost of wasted time. This section therefore proposes a method that keeps the number of generated RR sets as small as possible without affecting the final influence spread.

In the experiment of Section 3.2.1, we found that as the number of random RR sets increases, the influence spread grows not linearly but with diminishing returns. We therefore treat the influence propagation range, as a function of the number of RR sets, as satisfying both monotonicity and submodularity (diminishing marginal utility), defined as follows. (1) Monotonicity: let $f(\theta)$ be the influence propagation range obtained with $\theta$ RR sets; for any numbers of RR sets $\theta_1 \le \theta_2$, $f(\theta_1) \le f(\theta_2)$. (2) Submodularity: for a graph $G$ with $n$ nodes and the influence propagation range function $f$, for any numbers of RR sets $\theta_1 \le \theta_2$ and all $\Delta > 0$, $f(\theta_1 + \Delta) - f(\theta_1) \ge f(\theta_2 + \Delta) - f(\theta_2)$.
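The two properties can be checked on measured influence spreads. The sketch below assumes spreads measured at evenly spaced RR set counts $\theta, \theta + \Delta, \theta + 2\Delta, \ldots$; on such a grid, submodularity reduces to non-increasing consecutive gains. The tolerance `tol` is an assumption added here to absorb the Monte Carlo noise noted in Section 3.2.1, and the example values are hypothetical.

```python
def check_monotone_submodular(spreads, tol=0.0):
    """Check f on an evenly spaced grid of RR set counts.

    Returns (monotone, diminishing): (1) gains are non-negative,
    (2) each gain is no larger than the previous one (up to `tol`).
    """
    gains = [b - a for a, b in zip(spreads, spreads[1:])]
    monotone = all(g >= -tol for g in gains)
    diminishing = all(g2 <= g1 + tol for g1, g2 in zip(gains, gains[1:]))
    return monotone, diminishing

print(check_monotone_submodular([100, 160, 190, 200, 203]))  # (True, True)
print(check_monotone_submodular([100, 120, 150]))            # (True, False)
```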

Based on the above properties, for a given network $G$ with $n$ nodes, the algorithm sets a critical value $\theta^{*} = \rho^{*} n$ for the number of random RR sets, where $\rho^{*}$ is the random RR set selection ratio. When the number of random RR sets is less than $\theta^{*}$, too few RR sets are selected to reach the maximum influence spread. When it is greater than $\theta^{*}$, diminishing marginal returns mean that the influence spread grows very slowly or not at all, which wastes time. Therefore, based on the current propagation situation of the nodes in the network, the algorithm automatically doubles the number of generated RR sets in each round until the critical value judgment condition in Algorithm 1 (line 7) is met three consecutive times; the number of RR sets generated is then considered sufficiently close to the critical value. The specific procedure is as follows.

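	Algorithm 1: Generation of reverse reachable sets in D-RIS
	Input: G = (V, E), k
	Output: S, ρ
	ρ ← 0.001; c ← 0; σ_prev ← 0; Δ_prev ← 0
	R ← ∅
	while true do
		generate random RR sets until |R| = ρn, and add all of them to R
		S ← seed set of k nodes selected from R by Algorithm 2
		σ ← influence spread of S; Δ ← σ − σ_prev
		if Δ ≤ 0 or (Δ_prev > 0 and Δ < √Δ_prev) then
			c ← c + 1
		else c ← 0
		σ_prev ← σ; Δ_prev ← Δ
		if c = 3 or ρ ≥ 1 then
			break
		ρ ← 2ρ
	return S, ρ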

Let $\sigma$ denote the influence spread of the current round and $\sigma_{\text{prev}}$ that of the previous round. Algorithm 1 gives the pseudocode for generating reverse reachable sets in the D-RIS algorithm. The process is as follows: (1) Set the initial RR set ratio $\rho$ to a very small value (in Algorithm 1, $\rho = 0.001$, so the initial number of RR sets is $0.001n$), randomly select $\rho n$ source nodes from the node set $V$ of $G$ to generate RR sets, and calculate the influence spread $\sigma$ (Algorithm 1: Lines 4–6). (2) Each round doubles $\rho$ and calculates the increase in influence spread in this round, $\Delta = \sigma - \sigma_{\text{prev}}$. Line 7 of Algorithm 1 then judges whether this increase is effective. If $\Delta \le 0$ or $\Delta < \sqrt{\Delta_{\text{prev}}}$, that is, if the increase in this round is non-positive or smaller than the square root of the previous round's increase $\Delta_{\text{prev}}$, this round of doubling is judged to have had no effect on the growth of influence, and the number of RR sets may already be close to the critical value. (3) Repeat the above steps until three consecutive doublings are invalid or $\rho \ge 1$, and then stop generating RR sets. At this point, the number of random RR sets generated by the algorithm approaches the critical value.

Suppose the final RR set ratio is $\rho^{*}$. At this point, a relatively stable and effective critical value for the number of RR sets is obtained, namely $\theta^{*} = \rho^{*} n$.
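The doubling procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: RR set sampling, seed selection (Algorithm 2), and influence estimation are passed in as callables, and the reading of the stopping rule (three consecutive invalid doublings, or $\rho \ge 1$), the reset of the counter after a valid round, and the capping of $\rho$ at 1 are interpretations made here.

```python
import math

def dris_generate(n, sample_rr_set, select_seeds, spread, rho0=0.001, max_invalid=3):
    """Sketch of D-RIS RR set generation (Algorithm 1).

    n             -- number of nodes in G
    sample_rr_set -- callable returning one random RR set
    select_seeds  -- callable mapping the list of RR sets to a seed set (Algorithm 2)
    spread        -- callable estimating the influence spread of a seed set
    Returns (seeds, final ratio rho, number of RR sets generated).
    """
    rho, rr_sets = rho0, []
    sigma_prev, delta_prev, invalid = 0.0, 0.0, 0
    while True:
        target = max(1, math.ceil(rho * n))
        # Reuse the RR sets from earlier rounds; only the missing ones are sampled.
        while len(rr_sets) < target:
            rr_sets.append(sample_rr_set())
        seeds = select_seeds(rr_sets)
        sigma = spread(seeds)
        delta = sigma - sigma_prev
        # Judgment condition: gain <= 0, or gain below sqrt of the previous gain.
        if delta <= 0 or (delta_prev > 0 and delta < math.sqrt(delta_prev)):
            invalid += 1
        else:
            invalid = 0            # only consecutive invalid doublings count
        sigma_prev, delta_prev = sigma, delta
        if invalid >= max_invalid or rho >= 1:
            return seeds, rho, len(rr_sets)
        rho = min(1.0, 2 * rho)    # double the ratio, capped at 1
```

In a full pipeline, `sample_rr_set` would draw one RR set as in the sampling process above, `select_seeds` would run greedy maximum coverage, and `spread` could be an RR set or Monte Carlo estimate; all three are placeholders here.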

During the dynamic tuning that determines $\rho$, the value of $\rho$ rises gradually until it approaches the critical value. Except in the first round, each round does not generate $\rho n$ new RR sets from scratch; instead, it generates $\rho n / 2$ new RR sets (as many as the previous round held) and combines them with the stored RR sets of the previous round. That is, the same number of RR sets is generated on top of the existing ones so that the total doubles. This reuse greatly improves the time efficiency of the algorithm.
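A quick count illustrates the saving from reuse. Assuming the ratio starts at $\rho_0$ and doubles for a given number of rounds, the sketch below compares the total number of RR sets sampled when each round only tops up the stored sets with the total when every round regenerates its full sample. The function name and example sizes are hypothetical.

```python
def rr_sets_sampled(n, rho0, rounds, reuse=True):
    """Total RR sets sampled over `rounds` rounds whose ratio is rho0 * 2**i."""
    targets = [round(rho0 * 2**i * n) for i in range(rounds)]
    if reuse:
        return targets[-1]   # each round only adds the sets it is missing
    return sum(targets)      # each round regenerates its full sample

print(rr_sets_sampled(10_000, 0.001, 8))               # 1280
print(rr_sets_sampled(10_000, 0.001, 8, reuse=False))  # 2550
```

Because the per-round targets double, regenerating from scratch costs almost twice the final sample, so in this sketch reuse removes close to half of the sampling work.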

In short, this section proposes a method that determines the critical number of random RR sets from the monotonicity and submodularity of the influence propagation function and from the real-time propagation of nodes in the network, and generates RR sets following the RR set sampling framework. Next, the D-RIS algorithm calls Algorithm 2 in Section 2.3.2 to find the seed set $S$.

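	Algorithm 2: Seed node selection by maximum coverage
	Input: R, k
	Output: S
	S ← ∅
	for i = 1 to k do
		v ← the node that appears in the largest number of RR sets in R
		S ← S ∪ {v}
		for all RR sets in R that contain v: remove them from R
	return S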

2.3.2. Seed Node Selection

The D-RIS algorithm uses the maximum coverage method for seed node selection, and Algorithm 2 gives the pseudocode of this stage. Given $G$ and the number of RR sets $\theta$, the random RR sets generated by Algorithm 1 are first inserted into the collection $R$. If $S \cap R_v \neq \emptyset$, the seed set $S$ is said to cover the random RR set $R_v$; let $F_R(S)$ denote the fraction of RR sets in $R$ covered by $S$. The influence spread $\sigma(S)$ is then approximated by $n \cdot F_R(S)$. The iteration proceeds as follows: (1) In each iteration, the algorithm greedily selects the node $v$ that covers the largest number of RR sets in $R$. (2) All RR sets in $R$ that contain $v$ are deleted (i.e., the RR sets whose source nodes can be reached from $v$). (3) Node $v$ is added to $S$, $R$ is updated, and the next iteration begins. (4) The iteration ends once $k$ nodes have been selected.
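Steps (1) to (4) amount to the standard greedy algorithm for maximum coverage. The Python sketch below is a simple, unoptimized version: it rescans every candidate node in each iteration instead of using the incremental bookkeeping needed for nearly linear time. The name `max_coverage` and the set-based RR representation are assumptions.

```python
from collections import defaultdict

def max_coverage(rr_sets, k):
    """Greedy maximum coverage: pick up to k nodes that cover the most RR sets."""
    covers = defaultdict(set)              # node -> indices of RR sets containing it
    for i, rr in enumerate(rr_sets):
        for v in rr:
            covers[v].add(i)
    alive = set(range(len(rr_sets)))       # RR sets not yet covered by S
    seeds = []
    for _ in range(k):
        if not alive:                      # every RR set is already covered
            break
        # step (1): node covering the largest number of still-uncovered RR sets
        best = max(covers, key=lambda v: len(covers[v] & alive))
        seeds.append(best)                 # step (3): add it to S
        alive -= covers[best]              # step (2): delete the RR sets it covers
    return seeds

# The example of Figure 2: node a appears in all three RR sets.
print(max_coverage([{"a", "b"}, {"a", "c"}, {"a", "d"}], 1))  # ['a']
```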

When the maximum coverage method selects the $k$ nodes, the greedy algorithm repeatedly adds to $S$ the node with the largest marginal coverage, so it returns a $(1 - 1/e)$-approximate solution to the coverage problem in nearly linear time.

The D-RIS algorithm has two stages. In the first stage, $\theta$ nodes are randomly selected to generate RR sets, where $\theta = \rho^{*} n$, which takes $O(\theta)$ time. For any randomly selected node $v$, suppose that generating its RR set by propagation simulation under a given propagation model takes $O(\mathrm{EVP})$ time, where EVP is the width of the random RR set (i.e., the number of directed edges in the random graph $g$ that point to nodes of the RR set). The time complexity of the first stage is then $O(\theta \cdot \mathrm{EVP})$. The maximum coverage method in the second stage selects $k$ nodes greedily in time linear in the total size of $R$. Thus, the time complexity of the D-RIS algorithm is $O(\theta \cdot \mathrm{EVP})$. By comparison, the greedy algorithm has time complexity $O(k n r m)$, where $r$ is the number of Monte Carlo simulations and $n$ and $m$ are the numbers of nodes and edges in $G$; the values of $n$, $m$, and $r$ are usually very large. D-RIS therefore has a better time complexity. Moreover, compared with the RIS algorithm, which also achieves nearly linear time complexity, D-RIS chooses the number of RR sets more accurately and reasonably, and the experiments show that it runs more efficiently. From this analysis, we conclude that D-RIS is better suited to large-scale social networks.

3. Experiments and Results

3.1. Datasets

To verify the efficiency of the D-RIS influence maximization algorithm, we conduct experiments on two real datasets, shown in Table 2. The first dataset, Slashdot [19], comes from a technology news website on which users share information. The site allows users to mark each other as “friends” or “enemies,” and 76.7% of the nodes are in “friend” relationships. Nodes with few or no social relationships contribute little to the study of influence maximization. Therefore, we preprocess the original dataset and keep only nodes with many social relationships.

To facilitate comparison between algorithms, we preprocessed the dataset and kept the friendships among 10,000 nodes; after preprocessing, there are 36,338 friendship relationships. The second dataset, Epinions [19], is a trust-based online social network containing multiple types of relationships. A directed edge from node $u$ to node $v$ indicates that $u$ trusts $v$. We kept the trust relationships among 10,000 nodes after preprocessing this dataset, which can be downloaded from the Stanford Large Network Dataset website.

3.2. Experimental Results and Analysis

The information diffusion model used in the experiments is the independent cascade (IC) model with propagation probability 0.08. The influence spread of each simulated propagation process is the average over 10,000 Monte Carlo runs. To verify the soundness and efficiency of the D-RIS algorithm, we selected five representative algorithms for comparison. The CELF algorithm improves on the greedy algorithm with essentially the same core idea but is hundreds of times more efficient, so we choose it as the representative greedy-based baseline. The HighDegree algorithm [20] is the most classic heuristic based on node centrality; it selects the k nodes with the largest degree as the seed set. The LIR algorithm [13] is a heuristic based on topological structure; it ranks nodes by local degree and then selects the seed set. The pBmH algorithm [14] is also a topology-based heuristic; it accounts for the influence a node receives from multiple neighbors and avoids the rich-club phenomenon. The RIS algorithm [17] is based on reverse reachable set sampling; it generates a theoretically determined threshold number of RR sets and then selects the seed set.
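For reference, the HighDegree baseline fits in a few lines. This sketch assumes that degree means out-degree in the directed graph and uses the adjacency-list format `{node: [(neighbor, p), ...]}` with the experimental probability p = 0.08; ties are broken by dictionary order. These choices, the name `high_degree`, and the example graph are assumptions, not details given in the source.

```python
def high_degree(graph, k):
    """HighDegree baseline: return the k nodes with the largest out-degree."""
    # sorted() is stable (also with reverse=True), so equal-degree nodes keep dict order
    return sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:k]

p = 0.08  # uniform propagation probability used in the experiments
g = {"a": [("b", p), ("c", p)], "b": [("c", p)], "c": [], "d": [("a", p), ("b", p), ("c", p)]}
print(high_degree(g, 2))  # ['d', 'a']
```

The baseline never simulates propagation, which explains why heuristics of this kind are fast but ignore how influence actually spreads beyond a node's immediate neighbors.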

The simulation experiments are organized as follows. (1) D-RIS rule verification: on the Slashdot dataset, we verify the monotonicity and submodularity of the influence propagation function in the RIS algorithm and test this rule on the D-RIS algorithm. (2) Comparison of D-RIS and RIS: on the Slashdot and Epinions datasets, the RIS algorithm is run with different RR set ratios, and its influence spread and running time are compared with those of D-RIS. (3) Comparison of D-RIS with four other classic algorithms: Section 3.2.3 compares D-RIS with the CELF, HighDegree, LIR, and pBmH algorithms on two real datasets in terms of influence spread and running time, verifying that D-RIS is more efficient than existing algorithms.

3.2.1. D-RIS Algorithm Rule Verification

In this experiment, the RIS algorithm starts from a small initial RR set ratio and doubles the ratio in each round until three consecutive doublings are invalid or the iteration stops.

Figure 3 shows that as the RR set ratio increases, the first part of the curve rises and the influence spread keeps increasing, which shows that the influence spread of both RIS and D-RIS is monotonic. For the RIS algorithm, once the RR set ratio exceeds 0.01, the curve flattens as the number of RR sets grows. This shows that the influence spread function exhibits diminishing marginal returns owing to submodularity: the expansion of the influence range gradually weakens. At a ratio of 0.03, the curve declines slightly, which is consistent with practice: although the influence spread is monotonic in theory, the probabilistic propagation model inevitably introduces fluctuations into the experiment.

Figure 3 verifies that the influence propagation function of RR set based algorithms follows the rules of monotonicity and submodularity. This rule can be used to improve the RIS algorithm and is the theoretical basis of the D-RIS algorithm proposed in this paper. The figure also verifies D-RIS on the real dataset: its curve rises as the number of RR sets increases and then flattens. D-RIS only needs a small preset RR set ratio and then automatically doubles it until the stopping condition is met. This avoids the problem in RIS whereby an unreasonable choice of RR set ratio either fails to reach the optimal influence spread or wastes time. This experiment shows that the D-RIS algorithm is sound and of practical significance.

3.2.2. D-RIS Algorithm and RIS Algorithm Comparison Experiment Verification

Set the RR set ratio of the RIS algorithm to 0.001, 0.2, and 0.5, and compare its influence spread and running time with those of D-RIS on the two datasets. Figures 4–9 show the comparative results. (1) RR set ratio of 0.001: in this case (Figures 4 and 5), RIS runs fast, but its influence spread is smaller than that of D-RIS. Especially when $k$ is small, the gap in influence spread between the two is twofold. This is because the RIS threshold on the number of RR sets is too small, leaving too few RR sets to select good seed nodes, which limits the final influence spread. (2) RR set ratio of 0.2: as shown in Figures 6 and 7, on the Slashdot dataset the influence spreads of the two algorithms are close, but D-RIS is more time-efficient than RIS. On the Epinions dataset, D-RIS obtains a larger influence spread while greatly reducing the running time, and the larger the seed set, the more obvious the advantage. (3) RR set ratio of 0.5: as shown in Figures 8 and 9, D-RIS achieves a better influence spread on both datasets and runs much faster than RIS, which shows that an overly large RR set ratio wastes computation time. On the Slashdot dataset, RIS takes more than twice as long as D-RIS; on the Epinions dataset, it takes more than 7 times as long. The running-time advantage of D-RIS is therefore even more pronounced.


In summary, experiments on two real datasets show that when the RR set threshold of the RIS algorithm is set too small, its influence spread is small, and when it is set too large, its time efficiency is poor. The D-RIS algorithm achieves a better influence spread while running more efficiently.

In addition, unlike the RIS algorithm, the D-RIS algorithm does not depend on a manually set theoretical threshold for the number of reverse reachable sets; an inaccurate threshold either prevents RIS from reaching the best achievable influence propagation range or wastes a large amount of running time. For today's complex social networks, the D-RIS algorithm does not require repeated calculations: it automatically tunes the ratio of reverse reachable sets to generate, which also makes it better suited to subsequent changes in the network structure. The D-RIS algorithm therefore has practical value.
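This section does not spell out how D-RIS tunes the number of reverse reachable sets. As a purely hypothetical illustration of replacing a fixed threshold with a self-adjusting one, the sketch below keeps doubling the number of RR sets until the estimated spread of the greedy seed set changes by less than a relative tolerance `eps`. The doubling rule and the parameters `theta0`, `eps`, and `theta_max` are assumptions made for this example, not the D-RIS criterion; the sampler and greedy step of standard RIS are repeated so the block is self-contained.

```python
import random
from collections import defaultdict


def build_in_edges(edges):
    """Map each node v to its incoming edges as (u, p) pairs, from (u, v, p) triples."""
    in_edges = defaultdict(list)
    for u, v, p in edges:
        in_edges[v].append((u, p))
    return in_edges


def sample_rr_set(in_edges, nodes, rng):
    """One reverse reachable set under the independent cascade model."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def greedy_cover(rr_sets, nodes, k):
    """Greedy max coverage; returns seeds and the estimate n * covered / len(rr_sets)."""
    covers = defaultdict(set)  # node -> indices of RR sets that contain it
    for i, rr in enumerate(rr_sets):
        for u in rr:
            covers[u].add(i)
    covered, seeds = set(), []
    for _ in range(k):
        best = max((u for u in nodes if u not in seeds),
                   key=lambda u: len(covers[u] - covered))
        seeds.append(best)
        covered |= covers[best]
    return seeds, len(nodes) * len(covered) / len(rr_sets)


def adaptive_ris(in_edges, nodes, k, theta0=100, eps=0.01, theta_max=200_000, seed=0):
    """HYPOTHETICAL self-tuning rule (not the D-RIS criterion): double the number
    of RR sets until the estimated spread changes by less than eps (relative),
    or until at least theta_max sets have been drawn."""
    rng = random.Random(seed)
    rr_sets = [sample_rr_set(in_edges, nodes, rng) for _ in range(theta0)]
    seeds, spread = greedy_cover(rr_sets, nodes, k)
    while len(rr_sets) < theta_max:
        # Keep the existing samples and draw as many new ones (doubling).
        rr_sets += [sample_rr_set(in_edges, nodes, rng) for _ in range(len(rr_sets))]
        new_seeds, new_spread = greedy_cover(rr_sets, nodes, k)
        converged = abs(new_spread - spread) <= eps * spread
        seeds, spread = new_seeds, new_spread
        if converged:
            break
    return seeds, spread, len(rr_sets)
```

Doubling keeps the number of re-selection rounds logarithmic in the final sample size; a production version would update the coverage index incrementally instead of rebuilding it each round.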

3.2.3. Comparison of D-RIS Algorithm with Four Other Classic Algorithms

On the two datasets, the influence propagation range and running time of the D-RIS algorithm are compared with those of three heuristic algorithms (HighDegree, LIR, and pBmH) and the greedy-based CELF algorithm.

According to the experimental results in Figure 10 (Slashdot dataset) and Figure 11 (Epinions dataset), we observe the following:
(a) The influence propagation range of the D-RIS algorithm is almost the same as that of the CELF algorithm, whose result is close to the optimal solution. However, D-RIS runs much faster, and the larger the seed node set, the more pronounced this advantage, with the running-time gap approaching hundreds of times. This is because CELF estimates influence spread by Monte Carlo simulation, which gives it an extremely high time complexity; the D-RIS algorithm is therefore more suitable for large-scale social networks.
(b) Compared with the heuristic algorithms (HighDegree, LIR, and pBmH), D-RIS runs more slowly, but its influence spread is much larger. On the Epinions dataset, the influence spread of the heuristic algorithms is only about 50% of that of D-RIS, and on the Slashdot dataset the advantage of D-RIS in influence spread is even more pronounced. Although the heuristic algorithms are extremely efficient, they do not fully account for the complex structure of the network, so the seed nodes they select are not accurate enough, their influence spread is small, and they do not reach the optimal solution. In addition, the heuristic algorithms are not stable across different datasets.
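To make the cost argument in (a) and the accuracy argument in (b) concrete, the sketch below contrasts Monte Carlo-based CELF (lazy-forward greedy) with the HighDegree heuristic. It is a minimal, hypothetical Python implementation (function names, the `runs` parameter, and the toy graph are assumptions, not the implementations used in the experiments): CELF's first pass alone needs |V| × `runs` cascade simulations, whereas HighDegree only sorts nodes by out-degree and ignores how influence propagates further.

```python
import heapq
import random
from collections import defaultdict


def build_out_edges(edges):
    """Map each node u to its outgoing edges as (v, p) pairs, from (u, v, p) triples."""
    out_edges = defaultdict(list)
    for u, v, p in edges:
        out_edges[u].append((v, p))
    return out_edges


def simulate_ic(out_edges, seeds, rng):
    """One independent cascade run: each newly activated node gets a single
    chance to activate each inactive out-neighbor with the edge probability."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def mc_spread(out_edges, seeds, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed set."""
    return sum(simulate_ic(out_edges, seeds, rng) for _ in range(runs)) / runs


def celf(out_edges, nodes, k, runs=1000, seed=0):
    """Lazy-forward greedy (CELF) with Monte Carlo spread estimates."""
    rng = random.Random(seed)
    # Heap entries: (-marginal gain, node, seed-set size when the gain was computed).
    heap = [(-mc_spread(out_edges, [u], runs, rng), u, 0) for u in nodes]
    heapq.heapify(heap)
    seeds, spread = [], 0.0
    while len(seeds) < k and heap:
        neg_gain, u, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Fresh gain: true marginal gains only shrink as seeds are added
            # (submodularity), so stale entries are upper bounds and cannot beat it.
            seeds.append(u)
            spread += -neg_gain
        else:
            # Stale gain: recompute against the current seed set and push back.
            gain = mc_spread(out_edges, seeds + [u], runs, rng) - spread
            heapq.heappush(heap, (-gain, u, len(seeds)))
    return seeds, spread


def high_degree(out_edges, nodes, k):
    """HighDegree heuristic: the k nodes with the largest out-degree."""
    return sorted(nodes, key=lambda u: len(out_edges.get(u, [])), reverse=True)[:k]
```

On the toy graph `(0,1,0.5), (0,2,0.5), (1,3,0.5), (2,3,0.5), (3,4,0.5), (5,4,0.1)`, HighDegree returns [0, 1] (expected spread 3.4375), while the best pair is {0, 5} (expected spread 3.734375), which CELF finds given enough runs; this mirrors, in miniature, the spread gap described in (b).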

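Figures 10 and 11: comparison of D-RIS with HighDegree, LIR, pBmH, and CELF on the Slashdot (Figure 10) and Epinions (Figure 11) datasets; each figure has panels (a) and (b).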

The comparative experiments above show that the D-RIS algorithm proposed in this paper achieves a good balance between influence spread range and time efficiency, shows good versatility and stability, and is well suited to large-scale social networks.

4. Conclusions

In this paper, we propose D-RIS, an influence maximization algorithm based on the independent cascade model combined with reverse reachable set sampling. Instead of the fixed threshold used by the traditional RIS algorithm, D-RIS determines the number of reverse reachable sets through an automatically tuned threshold. The experimental results show that the influence spread of D-RIS is close to that of the CELF algorithm and higher than those of the RIS, HighDegree, LIR, and pBmH algorithms, and that its running time is significantly lower than those of the CELF and RIS algorithms. The D-RIS algorithm proposed in this paper therefore offers advantages in both time efficiency and influence spread and can adapt to structural changes in large-scale social networks. In future work, we will extend D-RIS to a more realistic multirelationship influence propagation model and further improve its efficiency.

Data Availability

The data used to support the findings of this study are included within the article. The data are provided as an Excel file and can be accessed at https://github.com/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the Shandong Provincial Natural Science Foundation, China, under Grant no. ZR2017MG011.

References

M. Richardson and P. Domingos, “Mining knowledge-sharing sites for viral marketing,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 61–70, Edmonton, Alberta, Canada, July 2002.
P. Domingos and M. Richardson, “Mining the network value of customers,” in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 57–66, San Francisco, CA, USA, August 2001.
D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146, ACM, Washington, DC, USA, August 2003.
J. Goldenberg, B. Libai, and E. Muller, “Talk of the network: a complex systems look at the underlying process of word-of-mouth,” Marketing Letters, vol. 12, no. 3, pp. 211–223, 2001.
J. Goldenberg, B. Libai, and E. Muller, “Using complex systems analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata,” Academy of Marketing Science Review, vol. 9, no. 3, pp. 1–18, 2011.
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance, “Cost-effective outbreak detection in networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 420–429, ACM, San Jose, CA, USA, August 2007.
A. Goyal, W. Lu, and L. V. S. Lakshmanan, “CELF++: optimizing the greedy algorithm for influence maximization in social networks,” in Proceedings of the 20th International Conference Companion on World Wide Web, pp. 47–48, ACM, Hyderabad, India, March 2011.
S. Bin and G. Sun, “Matrix factorization recommendation algorithm based on multiple social relationships,” Mathematical Problems in Engineering, vol. 2021, 8 pages, 2021.
W. Chen, Y. Wang, and S. Yang, “Efficient influence maximization in social networks,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 199–208, ACM, Paris, France, June–July 2009.
W. Chen, C. Wang, and Y. Wang, “Scalable influence maximization for prevalent viral marketing in large-scale social networks,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1029–1038, ACM, Washington, DC, USA, July 2010.
K. Jung, W. Heo, and W. Chen, “IRIE: scalable and robust influence maximization in social networks,” in Proceedings of the 12th IEEE International Conference on Data Mining (ICDM), pp. 918–923, IEEE, Brussels, Belgium, December 2012.
Z. Wang, H. Wang, Q. Liu, and E. Chen, “Influence nodes selection: a data reconstruction perspective,” in Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 879–882, ACM, Gold Coast, Queensland, Australia, July 2014.
G. Sun, C.-C. Chen, and S. Bin, “Study of cascading failure in multisubnet composite complex networks,” Symmetry, vol. 13, no. 3, p. 523, 2021.
D.-L. Nguyen, T.-H. Nguyen, T.-H. Do, and M. Yoo, “Probability-based multi-hop diffusion method for influence maximization in social networks,” Wireless Personal Communications, vol. 93, no. 4, pp. 903–916, 2017.
S. Xie, Y. Liu, J. Zhu et al., “Research on topic-based local influence maximizing algorithm in social network,” Journal of Frontiers of Computer Science & Technology, vol. 10, no. 5, pp. 646–656, 2016.
J. Cao, D. Dong, S. Xu et al., “Self-Interest influence maximization algorithm based on subject preference in competitive environment,” Chinese Journal of Computers, vol. 2, pp. 238–248, 2015.
G. L. Tian, S. Zhou, G. X. Sun, and C. C. Chen, “A novel intelligent recommendation algorithm based on mass diffusion,” Discrete Dynamics in Nature and Society, vol. 2021, Article ID 4568171, 9 pages, 2021.
C. Borgs, M. Brautbar, J. Chayes, and B. Lucier, “Maximizing social influence in nearly optimal time,” 2012, https://arxiv.org/abs/1212.0884.
R. M. May and A. L. Lloyd, “Infection dynamics on scale-free networks,” Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics, vol. 64, no. 6, Article ID 066112, 2001.
G. Sun and S. Bin, “A new opinion leaders detecting algorithm in multi-relationship online social networks,” Multimedia Tools and Applications, vol. 77, no. 4, pp. 4295–4307, 2018.

Copyright

Copyright © 2021 Gengxin Sun and Chih-Cheng Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
