Abstract

Influence maximization (IM) is fundamental to social network applications. It aims to find a set of seed nodes whose influence cascade maximizes the spread of influence in a social network. Traditional methods for influence maximization, such as the distance, greedy, and PageRank methods, may suffer from low accuracy and high computational cost. In this paper, we propose a new bacterial foraging optimization algorithm to solve the IM problem based on the complete-three-layer-influence (CTLI) evaluation model. In this algorithm, a novel grid-based reproduction strategy and a direction-adjustment-based chemotaxis strategy are devised to enhance the algorithm’s search ability. Finally, we conduct comprehensive experiments on four social network instances to verify the effectiveness of the proposed algorithm. The experimental results show that the proposed algorithm solves the influence maximization problem effectively.

1. Introduction

With the rapid development of new Internet technologies in recent years, many social network applications [1], such as Facebook, Twitter, WhatsApp, and Instagram [1, 2], have emerged and become increasingly popular. With a growing number of users and interconnections, these applications serve as information exchange platforms where users’ information and opinions propagate by word of mouth [1] rather than through traditional communication channels such as landlines and newspapers [2]. Furthermore, these social networks can serve as virtual marketing platforms for commercial advertisers [3], with the potential to save costs and increase profit. In such an environment, the spread of user information depends on word-of-mouth communication within social circles, because the word-of-mouth effect from a small group of seed nodes can trigger a much broader cascade of influence. Therefore, selecting a set of seed nodes that maximizes the spread of information (e.g., opinions and ideas) in the network is a significant problem faced by social network decision-makers [4–9]. This problem is called the influence maximization (IM) problem. Previous research has shown that the IM problem is NP-hard [10, 11], so solving it exactly becomes impractical as the scale of the social network increases. This paper focuses on finding an effective algorithm to determine a seed node set that maximizes influence in the target social network.

Domingos and Richardson [3] took the IM model as an optimization problem according to market promotion principles, used the evolutionary algorithm (EA) for processing, and obtained satisfactory results. Kempe et al. [12] demonstrated that the IM problem is NP-hard and that the KK-greedy algorithm is an excellent optimization method, which guarantees a solution within a factor of (1 − 1/e) of the optimum. Specifically, this greedy algorithm was used to determine the initial seed node set under the independent cascade (IC) model [13] and the weighted connection (WC) model [12, 14]. In a set of experiments, the greedy algorithm showed a significant improvement in solution accuracy over the degree-based heuristic algorithm. However, the KK-greedy algorithm is computationally expensive, requiring many simulations in each round. Accordingly, it is inefficient for large-scale social networks, where millions or even billions of online nodes and interconnections are involved. Recently, many studies have tried to reduce this computational cost. Leskovec et al. [15] exploited the submodularity of the influence function to propose the cost-effective lazy forward (CELF) strategy. Using this strategy, the CELF algorithm can be a hundred times faster than the previous greedy algorithm. Goyal et al. [5] proposed a new CELF++ algorithm based on CELF, which is 35%∼55% faster than CELF. After that, some heuristic algorithms were improved and used to find a seed set of size k that can influence other nodes under a specific social network propagation model. However, these algorithms still face several challenges. In particular, they remain inefficient in approximating the influence spread.

As a promising bioinspired computation paradigm, evolutionary algorithms (EAs) [16–21] have shown significant potential to solve various complex IM problems. EAs are widely used in many practical application problems [22–27]. Among those, the bacterial foraging optimizer (BFO) [19, 28] is a relatively new population-based optimization paradigm that is attractive for its fast convergence and strong robustness. Recent computational results have verified the superior performance of BFO paradigms in solving various complex real-world applications [29–32], because the BFO algorithm can carry out parallel search and can easily jump out of local optima. The BFO algorithm and its variants [32, 33] have shown a stronger ability than some previous intelligence algorithms in solving complex (binary multimodal) optimization problems with high dimensionality and large data sets. The IM problem in the social network is NP-hard, and most social network instances are large graph structures with a large number of interrelated discrete data. Therefore, the NDBFO algorithm, which adds a new discrete strategy and a grid strategy to BFO, is well suited to IM problems.

In this paper, based on the complete-three-layer-influence (CTLI) evaluation model proposed in our previous work [32], a new discrete grid-based BFO is proposed to deal with the complex IM problem of the social network. This algorithm’s basic idea is to integrate the grid-based search strategy into the reproduction operation and incorporate the direction-adjustment strategy into the chemotaxis operation. The new strategies can enhance the algorithm’s exploitation and exploration capabilities on IM problems. The contributions of this paper are as follows:
(1)Aiming at the complex CTLI problem, a new discrete grid-based bacterial foraging optimization algorithm (NDBFO) is proposed to approximate the propagation range of nodes in the IC model. In this algorithm, the reproduction operation is enhanced by an improved grid-based strategy, and the direction-adjustment strategy improves the chemotaxis operation so that it can deal with discrete variables. Compared with the classic BFO algorithm, the NDBFO algorithm can deal with complex discrete problems and has better search performance.
(2)The effectiveness and efficiency of the NDBFO algorithm in terms of search accuracy and computational efficiency are verified through experiments.

The remainder of this paper is organized as follows: Section 2 presents the background of the social network’s IM problem. Section 3 describes the objective model. Section 4 proposes our algorithm NDBFO. Section 5 reports the experimental results of our algorithm and the compared algorithms. Conclusions are outlined in Section 6.

2. Background

An online social network can be represented by a graph G = (V, E, A). V is a node set that represents all users. E is an edge set that represents the social relationships between connected users. A is an adjacency matrix that represents the connections between nodes. The graph may be directed or undirected. In an undirected graph, the edges (v, u) (from v to u) and (u, v) are equivalent, so two connected nodes can reach each other. Therefore, the undirected social network is a special form of directed network.

2.1. Propagation Model

Based on the IC model [22], the calculations on the activation probability between nodes is calculated as follows:where and are weight and probability of edge (, u), respectively. Also, node will not have a chance to activate node u again if node does not activate node u.
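The activation rule described above can be illustrated with a small Monte Carlo simulation. This is a minimal sketch of the standard IC process, not the exact formulation used in this paper: the graph representation (a dictionary from each node to a list of (neighbor, probability) pairs) and the function names `simulate_ic` and `estimate_spread` are assumptions made for illustration.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade diffusion and return the activated nodes.

    graph: dict mapping node -> list of (neighbor, activation_probability).
    This adjacency format is an assumption made for this sketch.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        v = frontier.popleft()
        for u, p in graph.get(v, []):
            # A newly activated v gets exactly one attempt on each inactive
            # out-neighbor u; a failed attempt is never retried.
            if u not in active and rng.random() < p:
                active.add(u)
                frontier.append(u)
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average number of activated nodes over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Toy graph: a always activates b, never activates c; b always activates d.
toy = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)], "c": [], "d": []}
spread = estimate_spread(toy, ["a"], runs=10)
```

Estimating the spread this way requires many simulation runs per candidate seed set, which is the cost that approximation functions such as the ones discussed next try to avoid.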

2.2. Conventional Algorithms for IM

Qin et al. proposed a three-layer approximation approach (TLAA) [34]. It is defined as follows:where is the propagation attenuation coefficient of the layer-i (when ), represents that is owned by layer-i, and denotes the node set composed of nodes whose shortest distance to the node seed set S is equal to i.
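The layer sets used above (the nodes whose shortest distance to the seed set S equals i) can be obtained with a multi-source breadth-first search. The following is a minimal sketch; the adjacency-list format and the name `layers_from_seeds` are assumptions for illustration and do not come from [34].

```python
from collections import deque


def layers_from_seeds(adj, seeds, max_layer=3):
    """Group nodes by shortest hop distance (1..max_layer) to the seed set.

    adj: dict mapping node -> iterable of out-neighbors (assumed format).
    Returns a dict {i: set of nodes at distance exactly i}.
    """
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    layers = {i: set() for i in range(1, max_layer + 1)}
    while queue:
        v = queue.popleft()
        if dist[v] == max_layer:
            continue  # do not expand beyond the last layer of interest
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                layers[dist[u]].add(u)
                queue.append(u)
    return layers


adj = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [6], 6: []}
layers = layers_from_seeds(adj, [1])
```

Because the search starts from all seeds at once, a node reachable from several seeds is assigned to the layer of its nearest seed.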

Algorithm 1: The classic BFO algorithm.
(1)Initialize the food sources.
(2)Elimination-dispersal loop: l = 1 to Ned do
(3)Reproduction loop: k = 1 to Nre do
(4)Chemotaxis loop: j = 1 to Nc do
(5)For each i = 1 to S do
(6)Compute fitness function for ith bacteria, J(i, j, k, l).
(7)Let J_last = J(i, j, k, l).
(8)Tumble: generate a random vector Δ(i).
(9)Swim: continue to move in the direction of Δ(i) with the chemotactic step-size C(i), and update the location of ith bacteria until the fitness value does not become better.
(10)End for
(11)End Chemotaxis loop
(12)End Reproduction loop
(13)End Elimination-dispersal loop
2.3. Bacterial Foraging Optimization Algorithm

The BFO algorithm simulates the foraging behavior of bacteria, as shown in Algorithm 1. Ned is the maximum number of cycles in the elimination-dispersal loop, Nre is the maximum number of cycles in the reproduction loop, Nc is the maximum number of cycles in the chemotaxis loop, and S is the number of bacteria in the population. In BFO, bacteria forage so as to maximize the energy obtained per unit time, and they communicate with each other by sending signals. Based on these two factors, bacteria make a reasonable foraging decision. During this foraging process (lines 4–8 in Algorithm 1), bacteria move by a unit step (the chemotactic step-size) to find food.

To be specific, there are four main processes in the classic BFO algorithm: chemotaxis, swarming, reproduction, and elimination-dispersal. In the chemotaxis process (lines 8-9 in Algorithm 1), each bacterium generates a random direction vector and moves along it by the chemotactic step-size until the stopping criterion is met. In the swarming process (lines 6-7 in Algorithm 1), bacteria communicate with each other by releasing cell-to-cell signals. The reproduction process accelerates convergence by retaining elite individuals. In the elimination-dispersal process, some bacteria are eliminated and replaced by randomly initialized individuals, which maintains the diversity of the population.
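To make the nested loop structure concrete, the following is a minimal sketch of a classic BFO for continuous minimization. It is a simplified illustration rather than the implementation used in this paper: swarming is omitted, a bacterium only moves when the move improves its cost, S is assumed to be even, and the parameter names `Ns` (maximum swim length), `Ped` (dispersal probability), `step`, and `bounds`, as well as their default values, are assumptions.

```python
import math
import random


def bfo_minimize(cost, dim, S=10, Nc=20, Ns=4, Nre=4, Ned=2, Ped=0.25,
                 step=0.1, bounds=(-5.0, 5.0), seed=0):
    """Simplified classic BFO (no swarming term); S is assumed even."""
    rng = random.Random(seed)
    lo, hi = bounds

    def new_pos():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    pop = [new_pos() for _ in range(S)]
    best_x, best_j = None, math.inf
    for _ in range(Ned):                      # elimination-dispersal loop
        for _ in range(Nre):                  # reproduction loop
            health = [0.0] * S
            for _ in range(Nc):               # chemotaxis loop
                for i in range(S):
                    j_last = cost(pop[i])
                    # Tumble: pick a random unit direction.
                    delta = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
                    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
                    direction = [d / norm for d in delta]
                    # Swim: keep stepping while the cost keeps improving.
                    for _ in range(Ns):
                        cand = [x + step * d
                                for x, d in zip(pop[i], direction)]
                        j_new = cost(cand)
                        if j_new >= j_last:
                            break
                        pop[i], j_last = cand, j_new
                    health[i] += j_last
                    if j_last < best_j:
                        best_x, best_j = list(pop[i]), j_last
            # Reproduction: the healthier half (lowest accumulated cost)
            # splits in place; the other half dies.
            order = sorted(range(S), key=lambda i: health[i])
            survivors = [pop[i] for i in order[:S // 2]]
            pop = [list(x) for x in survivors] + [list(x) for x in survivors]
        # Elimination-dispersal: randomly re-initialize some bacteria.
        for i in range(S):
            if rng.random() < Ped:
                pop[i] = new_pos()
    return best_x, best_j


def sphere(x):
    return sum(v * v for v in x)


best_x, best_j = bfo_minimize(sphere, dim=2)
```

The continuous position update in this sketch is exactly what does not carry over to IM, where a position is a set of node identifiers; this motivates the discrete strategies introduced in Section 4.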

3. Objective Model

3.1. Primary Definitions

In conventional models [35], a node’s global influence is limited within the two-hop range. Accordingly, the local influence estimation (LIE) is used to approximate the influence spread within the two-hop area for a set of nodes. Moreover, the LIE function can be incorporated to compute the influence spread in traditional IC and WC models. The three-degree theory [35] has recently shown that the influence spread of a node in social networks can be within the circle of neighbors’ neighbors’ neighbors (i.e., three-hop area). As presented in [32], although the information spread in a social network gradually deteriorates with the layers’ increase, we can consider the influence of the nodes within three layers, which can be more accurate than the conventional two-hop model [34].

For the sake of clarity, we first give the following basic definitions:

Definition 1. S is the initial node set, and u, v, and m are nodes:
(1)If , , such that , then v is a neighbor of u in layer-1
(2)If , such that and , then m is a neighbor of u in layer-2
(3)If , such that and , then t is a neighbor of u in layer-3

In practice, information diffuses both across layers and within the same layer. Furthermore, information propagates not only from high layers to low layers but also from low to high. Therefore, the three-degree model needs to consider intralayer and interlayer propagation simultaneously.
Next, based on the above definitions, we have introduced a complete-three-layer-influence (CTLI) evaluation model [32], aiming to approximate the influence spread within the three-hop area of node set. Consequently, the IM problem can be converted into an optimization model represented by the CTLI functions.

3.2. CTLI Model

The main idea of the CTLI is to incorporate the propagation of interlayer and intralayer into the three-degree model. In particular, as presented in our previous work [32], the CTLI model is formulated in the following procedures.

The number of expected propagation nodes is calculated as follows:Layer-1:Layer-2:Layer-3:

Given the above equations, the proposed CTLI function is defined aswhere u is a node in S, and are nodes in layer-1, m and n are nodes in layer-2, t and r are nodes in layer-3, k is the number of initial nodes, is a propagation attenuation coefficient in layer-i, and is the final activation probability of neighbor in layer-i.

4. Proposed Approach

Firstly, in the chemotaxis process, the novel principle of chemotaxis is defined to accelerate the convergence, and a novel strategy for solving a discrete problem is applied to this step, as presented in Algorithm 2. An improved grid-based strategy is then applied to the reproduction process to maintain the diversity of the population, as displayed in Algorithm 3.

Algorithm 2: The chemotaxis process of NDBFO.
(1)For each bacterium in the population
(2)Calculate the fitness of each bacterium.
(3)Tumble: generate a vector by equation (7). Transform the chemotactic direction vector into probability by equations (8) and (9).
(4)Swim: continue to change the node in the probability by equation (10) until the new probability does not dissatisfy the criteria of equation (10).
(5)End
4.1. Chemotaxis Process

Because influence maximization is a discrete problem, the position and direction update rules of the BFO algorithm are redesigned accordingly.

4.1.1. Update Way for the Direction of Chemotaxis

Each bacterium in the population uses chemotaxis to find promising areas, which is the most crucial step in the optimization process. Based on [22, 36], we let each bacterium learn from the experience of better individuals to improve the poor convergence of BFO. In the proposed NDBFO algorithm, the new update rule for the direction of chemotaxis is defined as follows:where and are learning factors and is the current global optimal position.

First, we transform the chemotactic direction vector into a probability vector using the sigmoid function. This probability vector is defined as follows:where is the updated probability of jth dimensional decision vector of ith individual in th iteration. It represents that the further away from the outstanding individual, the greater the updated probability.

4.1.2. Update Way for the Position

The position update way can be calculated as follows:where N is the node set in the social network. is a running function by randomly selecting a node from node set N to replace element .
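The two update steps of this section can be sketched together as follows. This is an illustration under stated assumptions, not the paper's equations (8) to (10): the probability mapping |2·sigmoid(d) − 1|, which equals |tanh(d/2)| and makes a zero direction component leave the element unchanged, is a hypothetical choice, as is the rule that a replacement node must not already appear in the position. The function name `discrete_move` is also hypothetical.

```python
import math
import random


def discrete_move(position, direction, nodes, rng):
    """Probabilistically replace elements of a discrete seed vector.

    position:  list of k node ids (the bacterium).
    direction: list of k real values (the chemotactic direction).
    nodes:     iterable of all node ids in the network (the set N).
    """
    new_pos = list(position)
    for j, d in enumerate(direction):
        # Assumed mapping: |2 * sigmoid(d) - 1| == |tanh(d / 2)|, so a larger
        # |d| (farther from the guiding individual) gives a larger probability.
        p = abs(math.tanh(d / 2.0))
        if rng.random() < p:
            candidates = [n for n in nodes if n not in new_pos]
            if candidates:
                # Replace element j by a randomly selected unused node.
                new_pos[j] = rng.choice(candidates)
    return new_pos


rng = random.Random(0)
unchanged = discrete_move([1, 2, 3], [0.0, 0.0, 0.0], range(10), rng)
moved = discrete_move([1, 2, 3], [100.0, 100.0, 100.0], range(10), rng)
```

With a zero direction the position is returned unchanged, while very large direction components force every element to be resampled, which matches the qualitative behavior described above.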

4.2. Reproduction Process

In the classic BFO algorithm, reproduction mimics the evolutionary principle of survival of the fittest. The unhealthiest bacteria die, whereas the healthier bacteria (those that produce lower objective function values) each split asexually into two individuals at the same location, which keeps the population size constant. Nevertheless, this operation does not take into account that the larger the population within a specific range, the fiercer the competition within that population. Based on this consideration, we propose a novel reproduction process for BFO using an improved grid strategy.

We use a grid strategy to model the fact that the larger the population within a specific range, the fiercer the competition within that population. The grid strategy is a valuable technique for keeping the diversity of individuals in the population.

Although balancing convergence and diversity is difficult in the reproduction phase, the grid strategy is a natural way to combine them in BFO. We call the number of individuals in a grid cell its crowding degree, which reflects the diversity of the population within a range. Since the diversity of the population is clearly visible in the grid, we can select the individuals with strong adaptability by comparing the fitness of the individuals in the same grid cell.

Although the grid strategy can significantly improve population diversity, the traditional grid strategy, shown in Figure 1(a), is not very useful in real-world social networks with enormous initial node numbers. Accordingly, we proposed a modified grid strategy, which is described in Figure 1(b), using Euclidean distance as the scale to divide the grid. In this way, we can transform the multidimensional grid problem into a single-dimensional grid problem.

The setting of the grid in the value of the kth dimensional decision vector is shown in Figure 2. The boundary values of all dimensions in the population are recorded. Maximum and minimum values are selected among these values, denoted by and .

In the improved grid strategy, and represent the maximum and minimum Euclidean distance in a population, respectively. The lower boundary lb and the upper boundary ub are calculated by equations (11) and (12), respectively. Two quarter circles centered at the origin are drawn with radii lb and ub, as shown in Figure 1(b). Then, the region between the two arcs is divided into rings. This construction yields a new grid system, where the width of each ring is a constant parameter d.

The specific process of using grid strategy to improve the reproduction of BFO is shown in Algorithm 3.

Algorithm 3: The grid-based reproduction process of NDBFO.
(1)Each dimension of decision space is divided into M hyperboxes.
(2)While size (P) > pop/2
(3)If only one hyperbox has the highest crowding degree:
Compute the fitness of each individual in this hyperbox.
Remove the individual with the largest value of fitness.
(4)If more than one hyperbox has the same highest crowding degree:
Pick one of these hyperboxes at random.
Compute the fitness of each individual in this hyperbox.
Remove the individual with the largest value of fitness.
(5)End While
(6)Replicate the remaining pop/2 individuals so that the population size returns to pop.
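The steps of Algorithm 3 can be sketched as follows. This is a minimal illustration, assuming that fitness is a cost to be minimized (so the individual with the largest value is the worst) and that a hypothetical callable `cell_of` maps an individual to the identifier of its hyperbox; neither name comes from the paper.

```python
import random
from collections import defaultdict


def grid_reproduction(population, fitness, cell_of, rng):
    """Thin out the most crowded cells, then duplicate the survivors.

    population: list of individuals (lists).
    fitness:    callable returning a cost; larger is worse (assumption).
    cell_of:    callable mapping an individual to a hashable cell id.
    """
    pop_size = len(population)
    survivors = list(population)
    while len(survivors) > pop_size // 2:
        cells = defaultdict(list)
        for ind in survivors:
            cells[cell_of(ind)].append(ind)
        max_crowd = max(len(members) for members in cells.values())
        crowded = [c for c, m in cells.items() if len(m) == max_crowd]
        chosen = rng.choice(crowded)          # ties are broken at random
        worst = max(cells[chosen], key=fitness)
        survivors.remove(worst)
    # Each survivor splits into two copies, restoring the population size.
    return survivors + [list(ind) for ind in survivors]


population = [[0.0], [0.1], [0.2], [5.0]]
result = grid_reproduction(
    population,
    fitness=lambda ind: ind[0],
    cell_of=lambda ind: int(ind[0]),
    rng=random.Random(0),
)
```

In the toy run, the isolated individual at 5.0 survives even though its cost is the highest, because removals always come from the most crowded cell; this is how the strategy trades some convergence pressure for diversity.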

Some of the formulas are as follows:

Thus, the width of each ring can be formed as

The coordinates of any individual in the decision space-based grid are determined aswhere is the coordinate of individual A in the decision space-based grid, and is the actual value of individual A in decision space.
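A one-dimensional ring coordinate of this kind can be sketched as follows. This is an illustration under assumptions rather than a reconstruction of equations (11) and (12): the ring width is taken to be (ub − lb) divided by the number of rings, the coordinate is the index of the ring containing the individual's Euclidean distance from the origin, and indices are clamped to the valid range. The name `ring_index` and its parameters are hypothetical.

```python
import math
from collections import Counter


def ring_index(individual, lb, ub, num_rings):
    """Return the index of the ring that contains the individual.

    The Euclidean distance from the origin turns the multidimensional
    grid into a single-dimensional one.
    """
    width = (ub - lb) / num_rings            # assumed constant ring width d
    dist = math.sqrt(sum(x * x for x in individual))
    if width == 0:
        return 0                             # degenerate case: one ring only
    idx = int((dist - lb) // width)
    return min(max(idx, 0), num_rings - 1)   # clamp to [0, num_rings - 1]


pop = [[3, 4], [6, 8], [0, 0], [3, 4]]
crowding = Counter(ring_index(ind, lb=0.0, ub=10.0, num_rings=5) for ind in pop)
```

Counting individuals per ring index, as in the last line, gives the crowding degree used by the reproduction process.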

4.3. Encoding

To solve the IM problem, the data structure of each bacterium is redefined to establish a mapping between the bacterium and the initial seed node set. In NDBFO, the solution vector is redefined in discrete form: an initial node set of k selected nodes is expressed as a solution defined by a k-dimensional discrete vector. Figure 3 shows an example of encoding. The example shows a bacterium encoding the vector (3, 6, 9, 17, 25, 39, 55), which means that the seven nodes 3, 6, 9, 17, 25, 39, and 55 in the graph form a set of candidate initial seed nodes that can spread their influence in the social network. Each initial node is an element of V, the node set that represents all users.
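The encoding can be illustrated with a short sketch. The helper names `random_bacterium` and `is_valid` are hypothetical, and the requirement that the k node identifiers be distinct is an assumption that follows from interpreting the vector as a seed set.

```python
import random


def random_bacterium(nodes, k, rng):
    """Create a k-dimensional discrete vector of distinct node ids."""
    return rng.sample(list(nodes), k)


def is_valid(bacterium, nodes, k):
    """Check that a bacterium encodes k distinct nodes of the network."""
    node_set = set(nodes)
    return (len(bacterium) == k
            and len(set(bacterium)) == k
            and all(x in node_set for x in bacterium))


example = [3, 6, 9, 17, 25, 39, 55]           # the bacterium from Figure 3
sampled = random_bacterium(range(1, 101), 7, random.Random(1))
```

Here `example` is the seven-node bacterium from the text, and `sampled` is a randomly initialized bacterium on a hypothetical network with nodes 1 to 100.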

4.4. Problem Solving Process

The process of the NDBFO algorithm proposed in this paper to solve the CTLI model is shown in Figure 4. First, initialize the population, and calculate the CTLI values of the individuals in the population. Then, enter the three-layer cycle structure of the NDBFO algorithm, including the novel chemotaxis process and novel reproduction process proposed above, where the comparison for fitness value of the individual is converted to the comparison of CTLI value.

5. Experimental Study

First, we conduct a parameter sensitivity experiment to examine the influence of parameter variation on algorithm performance. Then, to evaluate the proposed chemotaxis and reproduction strategies, DBFO equipped with each novel strategy is compared with the original DBFO on four real-world social networks. Afterward, Degree, Distance, DegreeDiscount, and PageRank are compared with NDBFO on influence spread. By optimizing the CTLI function for the Football, NetScience, Power, and NetGRQC networks, we can identify the advantages and disadvantages of NDBFO on these real-world instances. The main parameter settings of the compared algorithms follow [12, 29, 37, 38]. The attributes for all of the above instances are shown in Table 1.
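For orientation, two of the baseline heuristics can be sketched as follows. This is an illustrative sketch on an undirected adjacency list, not the implementation used in the experiments: `top_degree` simply picks the k highest-degree nodes, and `degree_discount` follows the commonly used discount rule dd_v = d_v − 2t_v − (d_v − t_v)·t_v·p, where t_v is the number of already selected neighbors of v and p is a uniform propagation probability. The toy graph and the value p = 0.1 are assumptions.

```python
def top_degree(adj, k):
    """Degree heuristic: the k nodes with the highest degree."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]


def degree_discount(adj, k, p):
    """DegreeDiscount heuristic on an undirected graph.

    adj: dict mapping node -> list of neighbors.
    p:   uniform propagation probability (assumed parameter).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    dd = dict(degree)                 # discounted degree of each node
    t = {v: 0 for v in adj}           # number of selected neighbors
    seeds = []
    for _ in range(k):
        u = max((v for v in adj if v not in seeds), key=lambda v: dd[v])
        seeds.append(u)
        for v in adj[u]:
            if v not in seeds:
                t[v] += 1
                dd[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return seeds


# Toy undirected graph: hub 0, with node 1 adjacent to the hub and
# node 5 forming a second, separate neighborhood.
adj = {0: [1, 2, 3, 4], 1: [0, 5, 6], 2: [0], 3: [0], 4: [0],
       5: [1, 7, 8], 6: [1], 7: [5], 8: [5]}
by_degree = top_degree(adj, 2)
by_discount = degree_discount(adj, 2, p=0.1)
```

On the toy graph, the Degree heuristic selects the hub and its neighbor, whose neighborhoods overlap, while the discount rule prefers a second seed that is not adjacent to the first.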

This experiment uses four real-world social networks: the Football, NetScience, Power, and NetGRQC networks. Football represents the games played among 115 college football teams in the United States in 2000, where a node denotes a football team and an edge means that the two connected teams played a game. In NetScience and NetGRQC, a node represents an author, and edges denote coauthorships between scientists. In Power, an edge denotes a transmission line or transformer branch between nodes. The attributes for all of the above instances are shown in Table 2.

5.1. Experiment 1: Parameter Selection

In NDBFO, n (population size) and CR (selection rate) strongly affect the experimental results, so we use comparative experiments to choose n and CR on four instances under the IC model: the Football, NetScience, Power, and NetGRQC networks. The remaining parameter settings follow [32].

5.1.1. Analysis of n

In this experiment, all other parameters are held fixed and only n is varied. Here, CR is set to 0.5, and n takes the values 10, 50, 80, 100, 120, 150, and 200. As shown in Figure 5, when n is 50, the CTLI curve is the smoothest and reaches the best value on all four instances. Therefore, n is set to 50.

5.1.2. Analysis of CR

Selecting a suitable CR value is essential for maintaining population diversity in the bacterium update process. This experiment follows the same setting as the analysis of n and examines the effect of different CR values for NDBFO on the four social network instances. Table 3 shows the mean value of CTLI, where CR varies from 0.1 to 1. As can be seen from Table 3, the optimal solution changes as CR increases.

Clearly, changing CR has little effect on the Football social network, where the CTLI value stays between 35 and 36. The same conclusion is reached on the NetScience social network. On the Power social network, the CTLI value at CR = 0.1 is significantly higher than at other values, but for CR values other than 0.1 the differences in CTLI are not apparent. On the NetGRQC social network, changing CR makes a noticeable difference in CTLI values, and NDBFO performs best when CR = 0.5. In conclusion, the CR value is set to 0.5.

5.2. Experiment 2: Effect of the Proposed Reproduction Strategy

DBFO equipped with the proposed novel reproduction strategy (DBFO-nrep) is compared with the original DBFO on the Football, NetScience, Power, and NetGRQC networks. In Figure 6, the y-axis is the CTLI value on each of the four real-world networks, and the x-axis is the number of iterations. Comparing the trends of the two curves shows whether the proposed novel reproduction strategy improves performance.

As shown in Figures 6(b) and 6(c), in each iteration, the curve of the best CTLI value obtained by DBFO converges faster than that obtained by DBFO-nrep. As shown in Figures 6(a) and 6(d), the same holds in most iterations. Besides, the final CTLI value of DBFO is significantly better than that of DBFO-nrep. Specifically, the superiority of DBFO over DBFO-nrep is gradually enhanced on the Football, NetScience, and NetGRQC networks as evolution progresses. On the Football network, DBFO-nrep kept a great advantage over DBFO until the 20th iteration, after which DBFO began to surpass DBFO-nrep. On the NetScience and Power networks, DBFO maintained its advantage over DBFO-nrep throughout the optimization process. On the NetGRQC network, DBFO-nrep performs better than DBFO from the 7th to the 12th iteration, and DBFO performs better in the remaining iterations. From the above analysis, it can be shown that the proposed novel reproduction strategy effectively improves the performance of the original DBFO.

5.3. Experiment 3: Effect of the Proposed Chemotaxis Strategy

DBFO equipped with the proposed novel chemotaxis strategy (DBFO-nche) is compared with the original DBFO on the Football, NetScience, Power, and NetGRQC network instances. Comparing the rising trends of the curves in Figure 7 shows whether the proposed novel chemotaxis strategy improves the performance of the original DBFO algorithm.

As shown in Figures 7(a), 7(b), and 7(d), in each iteration, the curve of the best CTLI value obtained by DBFO converges faster than that obtained by DBFO-nche. As shown in Figure 7(c), the same holds in most iterations. Besides, the final CTLI value of DBFO is significantly better than that of DBFO-nche. Specifically, the superiority of DBFO over DBFO-nche is gradually enhanced on the Football, NetScience, and NetGRQC networks as evolution progresses. On these three networks, DBFO maintains better performance than DBFO-nche throughout the iterations. On the Power network, DBFO kept a significant advantage over DBFO-nche until the 50th iteration, after which DBFO-nche began to surpass DBFO. By the 60th iteration, DBFO had overtaken DBFO-nche again. From the above analysis, the proposed novel chemotaxis strategy effectively improves the performance of the original DBFO.

5.4. Experiment 4: Comparison of Influence Spread

Tables 4–7 list the influence spread results of the algorithms NDBFO, Degree, Distance, DegreeDiscount, PageRank, and DBFO on the Football, NetScience, Power, and NetGRQC networks, respectively. The best results in Tables 4–7 are shown in bold. The seed size has a significant influence on the algorithms’ performance. From these tables, the performance of DegreeDiscount, PageRank, and DBFO is at a medium level on most of the involved real-world network instances. Besides, Degree and Distance perform worst on most of the involved real-world network instances.

In Figures 8(a), 8(b), and 8(d), NDBFO always does better than the other algorithms. Specifically, NDBFO obtains a considerable performance advantage on the NetScience and NetGRQC networks when the seed size is 20 and 15, respectively. The curves of DegreeDiscount and DBFO follow very similar trends for seed sizes from 3 to 35, and DBFO performs better than DegreeDiscount when the seed size is 40. The performance of Degree gradually stagnates for seed sizes from 15 to 35 on the NetScience and NetGRQC networks. Furthermore, the performance of Degree varies greatly across networks, and it declines at seed size 40 on the NetGRQC network. Among these algorithms, Distance performs worst on the NetScience and NetGRQC networks.

Figure 8(c) shows no significant difference between NDBFO and Degree or DegreeDiscount when the seed size of the initial activation node set is from 15 to 35. In other cases, NDBFO has a clear advantage over the other algorithms. The CTLI value obtained by Distance varies very little when the seed size of the initial activation node set is between 20 and 35.

The performance of PageRank and DBFO is at the middle level in most cases. Similarly, Distance performs worst on the NetScience and NetGRQC networks.

6. Conclusions

We propose a new bacterial foraging optimization algorithm (NDBFO) to solve the IM problem, formulated as a discrete optimization problem. We tested the effectiveness of the newly incorporated strategies for NDBFO separately. We have conducted a set of experiments to investigate the performance of NDBFO for the influence maximization problem, in comparison with Degree, Distance, DegreeDiscount, PageRank, and DBFO on four real-world networks: Football network, NetScience network, Power network, and NetGRQC network. The results show that NDBFO is a powerful optimizer for the IM problem, performing better than other comparison algorithms.

At present, we have shown the strong performance of the NDBFO algorithm only on these four test cases. Therefore, we cannot assert that NDBFO outperforms the other comparison algorithms in all cases.

Accordingly, in the future, we will concentrate on proposing a new set of objective functions and developing IM problems in different cascade models. Meanwhile, we will research the advantages and disadvantages of other swarm intelligent optimization algorithms for solving IM problems to find the optimal scheme for the IM problem.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Intelligent Manufacturing Standardization and Test Verification Project “Time Sensitive Network (TSN) and Object Linking and Embedding Unified Architecture for Industrial Control OPC UA Fusion Key Technology Standard Research and Test Verification” project of the Ministry of Industry and Information Technology of the People’s Republic of China.