#### Abstract

Users can obtain intelligent services by sharing information in social networks, and big data technologies can extract underlying value from this information. However, serious security concerns arise at the same time: public data can be exploited by adversaries, with dire consequences. In this paper, the influence maximization problem is investigated in a privacy protection environment. The aim is to find a subset of secure users that maximizes the spread of influence while minimizing privacy disclosure. First, to estimate the risk level of each user, a Bayesian-based individual privacy risk evaluation model is proposed to rank individual risk levels. Second, to measure the influence capability of each user, a cascade influence capability evaluation model is designed to rank friend influence capability levels. Finally, based on these two factors, a privacy protection method is designed for solving the influence maximization with attack constraint problem. Comparison experiments show that our method can achieve both influence maximization and privacy disclosure minimization efficiently.

#### 1. Introduction

Big data sharing through social media has grown significantly in the current era of social networks [1, 2]. Social media has assumed great importance through platforms such as WeChat, Facebook, Twitter, and other social network sites. The issue of how information spreads through a social network has drawn increasing attention [3, 4]. Publicly available data can be used for market analysis, social research, and personalized service formulation [5]. Based on these analyses, more effective marketing strategies such as “viral marketing” can be found [6, 7]. The shared data may include a great deal of individual information, such as a user’s occupation, family members, and religious affiliation [8, 9]. These data are gathered and shared by many organizations, companies, institutions, and public websites. Unquestionably, they bring valuable benefits for intelligent services [10]. However, they also pose a series of serious privacy risks [11–13]. Therefore, appropriate privacy protection should be undertaken for secure information spread [14–16].

The influence maximization problem in a privacy protection environment is to find a subset of secure and reliable users that maximizes the spread of influence while minimizing privacy disclosure. A critical part of this problem is finding the main factors of privacy risk and estimating the risk level of these factors [17]. The shared big data have many kinds of attributes, which can be classified into three categories: quasi attributes, direct attributes, and sensitive attributes [18]. Quasi attributes are those that can be shared and do not belong to just one user, such as gender. Direct attributes are unique to a user, such as e-mail and WeChat ID. Sensitive attributes contain individual private information such as personal health status. Sometimes other sensitive attributes are externally visible [19]. It is necessary to design an estimation to measure the risk level for each user. A user with a low individual privacy risk level is assumed to be safe, because attackers will pay little attention to such users. In contrast, a user with a high privacy risk level will be of great interest to attackers, and all attributes of this user may be leaked. Furthermore, with the rapid development of communication technology, the world is getting smaller [4]. The influence of friends can pose privacy risks. It has been found that a great deal of privacy leakage comes from friends’ indirect disclosure [7]. In short, information attributes and friends’ influence are two important factors for privacy risk evaluation.

This paper focuses on the design of an influence maximization method in a privacy protection environment. Individual privacy risk and friends’ influence are two important factors for privacy risk evaluation. It is necessary to design an estimation to measure the risk level of each user and the friends’ influence capability. Based on this, a privacy protection method can be designed for the social network. For these purposes, first, an attribute risk level grading method is designed based on the Bayesian Network. Second, the cascade influence model is employed for designing the friend influence capability model. Third, a privacy protection method is designed for solving the influence maximization with attack constraint problem. Specifically, the contributions of this paper are summarized as follows:

(1) A Bayesian-based individual privacy risk evaluation model (IPREM) is proposed to evaluate the individual privacy risk levels. Since the actual multidimensional attribute data may be incomplete, it is difficult to handle the complex nonlinear relationship between the individual privacy risk and the multidimensional attribute evaluation index by using regression analysis. A Bayesian Network, however, supports reverse reasoning. Given that some serious privacy risk has occurred, the trained Bayesian Network can be used to reason backward and analyze the objective factors causing the risk.

(2) A cascade influence capability evaluation model (CICEM) is designed to evaluate the friend influence capability levels based on the cascade influence model. According to the users’ cascade influence capability, the benefits and threats of the friends’ influence capability can be measured.

(3) Based on the evaluation models for these two factors, an IPREM and CICEM based Privacy Protection Method (ICPM) is designed for solving the influence maximization with attack constraint problem.

To our knowledge, this is the first attempt to consider both individual privacy risk and influence maximization in privacy protection design.

The rest of the article is organized as follows: the related work is given in Section 2; the preliminaries are given in Section 3; the privacy protection method is designed in Section 4; the simulation analysis is discussed in Section 5; and finally, the conclusion is given in Section 6.

#### 2. Related Work

Many studies focus on the influence maximization issue, such as the degree-based heuristic algorithm [20] and the greedy algorithm [4]. However, most of them do not consider the privacy threat problem. Privacy threats toward social networks have been extensively documented. To deal with these concerns, many privacy-preserving techniques have been proposed in the literature. They fall into three main categories: data protection, data collection or publishing behavior, and privacy characterization and measurement.

Among data protection methods, Li et al. considered data security by putting all data into a cloud [21]. A mobile-cloud framework is presented that eradicates data over-collection. However, this kind of approach mainly involves restricting data sharing, which is not suitable for the social network. Some approaches are based on cryptography, such as the match-then-decrypt technique proposed by Zhang et al. [22], in which the data can be decrypted only when the attribute private key matches the hidden access policy. Other approaches set access control permissions; for example, Li et al. proposed a lightweight approach to protecting privacy that applies information flow control in routers [23]. However, these approaches are controlled by the servers without considering the user’s personalization, so they are not suitable for the social network.

Some studies consider data security from the perspective of data collection or publishing behavior [19, 24]. For example, based on the theory of planned behavior and the privacy calculus model for social networks, Li et al. proposed an integrated model to explain privacy disclosure behaviors [25]. In order to reduce disclosure risk and enhance data utility, Marmar et al. proposed an improved suppression method that targets the highest risk records and keeps other records intact [18]. Chen et al. employed three theories, the extended parallel process model, self-control theory, and routine activity theory, to explore online privacy concerns [26]. Based on anonymity technology, Javier et al. presented a generalization of the aggregation method, in which the individual data are replaced by the cluster mean for data publishing [19]. However, these methods use the same strategy for different data sets without considering the differences among individual attributes. In fact, different people can use different strategies. Therefore, it is very desirable to have a lightweight and scalable mechanism to protect privacy.

The above studies focus on data protection and lack a proper privacy characterization and measurement. A quantification model with multivariable privacy characterization is presented in [14], which can analyze the sensitivity of individual privacy characterization. Investigating how to optimize the tradeoff between latent-data privacy and customized data utility, He et al. proposed a data-sanitization strategy that does not greatly reduce the benefits brought about by social network data, while sensitive latent information can still be protected [8]. Based on a definition of user vulnerability, Gundecha et al. presented a privacy setting model that keeps users away from highly threatening users [7]. In order to quantify location privacy leakage, Li et al. designed a model that matches the users’ shared locations with their real mobility traces [6]. Several link-prediction and attribute-prediction algorithms have been proposed for social attribute networks [27]. In order to predict sensitive information, a data-sanitization strategy is proposed that harnesses link and attribute information simultaneously [28]. In order to resist inference attacks, Cai et al. proposed a collective inference model with a mixture of nonsensitive attributes and social relationships [9]. However, these studies only consider individual vulnerability, without considering friendship influence. In fact, both individual vulnerability and friendship influence affect the privacy risk. Therefore, it is very desirable to design models that estimate the risk level of each attribute and the friends’ influence capability. Based on these analyses, we focus on the design of the individual privacy risk evaluation model and the friend influence capability evaluation model. Furthermore, based on these two models, we design a privacy protection method for solving the influence maximization with attack constraint problem.

#### 3. Preliminaries

This section describes some necessary background of the privacy protection, such as the social network, the attribute set, the cascade model, and the Bayesian model.

*Definition 1 (social network) (see [8]). *Social network can be described as a graph , with node set , edge set , and attribute sets . An edge exists if and only if nodes and can communicate with each other. and represent the total number of nodes and edges, respectively. For the uniformity, all users are referred to as nodes in this article.

*Definition 2 (attribute set) (see [8]). *The attribute set of node can be represented by an attribute vector . represents the total number of attributes. Each attribute takes value from the -th dimension attribute.

*Definition 3 (risk of individual index) (see [7]). *Individual index is defined to estimate the risk of privacy, the risk may be incurred by allowing individual attributes to be visible. The risk of individual index for node can be defined as a function of individual attribute, which is shown as follows:where is the sensitivity weight of an -th attribute , which will be defined in the IPREM model. if -th attribute is visible, otherwise the attribute is not visible. , where indicates that all attributes of node can be visible. On the other hand, indicates that the attribute of node is nonvisible.

*Definition 4 (cascade model) (see [4]). *The cascade model is an influence spreading model with probability. The activated node will attempt to activate its inactive neighbor under the probability . Furthermore, the active node has only one chance to activate each of its inactive neighbors. Such attempts are mutually independent for different neighbors, namely, the activation of to will not be affected by the influences from other neighbors of .

Two kinds of cascade models are used in this article: the Independent Cascade Model (ICM), with random probability, and the Weight Cascade Model (WCM), with weighted probability.

*Definition 5 (independent cascade model (ICM)) (see [4]). *The ICM is an influence spreading model with probability. The activated node will attempt to activate its inactive neighbor under the random probability .

*Definition 6 (weight cascade model (WCM)) (see [4]). *The WCM is an influence spreading model with probability. The activated node will attempt to activate its inactive neighbor under the weighted probability .
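The cascade process of Definitions 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (the experiments in Section 5 were run in MATLAB). The function names `simulate_cascade`, `icm_probability`, and `wcm_probability` are hypothetical. The uniform ICM value of 0.05 and the inverse-degree rule for WCM follow Section 5; using the in-degree of the target node for WCM is an assumption borrowed from the common weighted cascade convention.

```python
import random
from collections import defaultdict


def simulate_cascade(edges, seeds, prob, rng):
    """Run one cascade from `seeds` and return the set of activated nodes.

    edges: list of directed (u, v) pairs (list both directions for an
           undirected network).
    prob:  function (u, v) -> activation probability of edge u -> v.
    rng:   a random.Random instance, so runs can be reproduced.
    """
    out_neighbors = defaultdict(list)
    for u, v in edges:
        out_neighbors[u].append(v)

    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in out_neighbors[u]:
                # A newly activated node gets exactly one attempt on each
                # inactive neighbor, independent of other attempts.
                if v not in active and rng.random() < prob(u, v):
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def icm_probability(p=0.05):
    """ICM: every edge uses the same activation probability p."""
    return lambda u, v: p


def wcm_probability(edges):
    """WCM (assumed convention): edge u -> v uses 1 / in-degree(v)."""
    in_degree = defaultdict(int)
    for _, v in edges:
        in_degree[v] += 1
    return lambda u, v: 1.0 / in_degree[v]
```

Averaging `len(simulate_cascade(...))` over many runs gives a Monte Carlo estimate of the influence set size for a given seed set.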

*Definition 7 (cascade index). *Cascade index is defined to estimate the status of the cascade influence capability. It is defined as a function of transmission capability for cascade influence as follows: where is the probability of transmission subgraph and if -th level needs to be calculated, otherwise the level is not considered. , where indicates the highest level, where all nodes in the network can be influenced by node . indicates the lowest level, where none of the nodes can be influenced in the network.

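*Definition 8 (Bayesian model) (see [29]). *If $M_1, \ldots, M_K$ are the models considered, and $\Delta$ is the quantity of interest, then its posterior distribution under given data $D$ is shown as

$$\Pr(\Delta \mid D) = \sum_{k=1}^{K} \Pr(\Delta \mid M_k, D)\,\Pr(M_k \mid D).$$

The posterior probability for model $M_k$ can be given by

$$\Pr(M_k \mid D) = \frac{\Pr(D \mid M_k)\,\Pr(M_k)}{\sum_{l=1}^{K} \Pr(D \mid M_l)\,\Pr(M_l)}.$$

Some important symbols and their definitions are presented in Table 1.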

#### 4. IPREM and CICEM Based Privacy Protection Method (ICPM)

In order to maximize the influence and minimize the privacy risk, a seed set of a given size needs to be selected. A node with high cascade influence capability but low individual privacy risk satisfies this criterion. Nodes with high cascade influence capability can improve the network influence. However, if such a node also has a high privacy risk level, it will be of great interest to attackers. It can then pose a threat to its friends, and the threat increases with the number of vulnerable nodes that are influenced. Thus, a user with a low individual privacy risk level will attract less attention from attackers, while a user with high cascade influence capability can improve the network influence. In other words, it is necessary to find the nodes with high cascade influence capability and stay away from the vulnerable ones. This kind of problem can be defined as the Influence Maximization with Attack Constraint (IMAC) problem.

**IMAC**: is a selected seed set with high vulnerable weight , where and represent the weight factors. The seed set size needs to satisfy . is the set whose member is vulnerable to be attacked. The aim of the IMAC is to maximize the influence under the constraint environment of minimizing the privacy risk. So is positively correlated with the cascade influence capability while negatively correlated with the individual privacy risk . and can be calculated by (9) and (10), respectively.

For a graph with nodes and edges: The probability of the subgraph generated can be calculated as (5).where can be defined by the concrete cascade influence model.

According to (5), the probability of the example subgraph as shown in Figure 1 is . Assume graph has probability graphs , then the number of nodes that can be influenced by the nodes set can be calculated by the arbitrary . The node conditional expected influence privacy is shown as (6). represents the nodes set influenced by seed set after steps through cascade influence model, when the attacked set exists. is the total number of the steps in the cascade influence model.
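As an illustration of how the probability of a generated subgraph can be computed, the sketch below uses the standard live-edge view of cascade models: each edge is kept independently with its activation probability and dropped otherwise. This form is an assumption, since the exact expression of (5) is not reproduced in the text above, and the function name `subgraph_probability` is hypothetical.

```python
def subgraph_probability(edges, live_edges, prob):
    """Probability of obtaining exactly `live_edges` from `edges`.

    Assumed live-edge form: multiply p(u, v) for every kept edge and
    1 - p(u, v) for every dropped edge.

    edges:      list of (u, v) pairs of the original graph.
    live_edges: the subset of `edges` present in the subgraph.
    prob:       function (u, v) -> activation probability of that edge.
    """
    live = set(live_edges)
    result = 1.0
    for edge in edges:
        p = prob(*edge)
        result *= p if edge in live else (1.0 - p)
    return result


# Toy graph with three edges and a uniform ICM probability of 0.05.
toy_edges = [(1, 2), (1, 3), (2, 3)]
toy_probability = subgraph_probability(
    toy_edges, [(1, 2), (1, 3)], lambda u, v: 0.05
)
```

The toy graph is illustrative only and is not the graph of Figure 1.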

Figure 1: (a) the original graph; (b) a subgraph generated through the cascade influence model.

Users in the social network can decide whether or not to reveal their individual attributes based on the risk levels. So, estimating the privacy risk is the basic precondition for the privacy protection. In this section, the evaluation models for quantifying privacy disclosure risks are discussed. Two factors are estimated, individual privacy risk and cascade influence capability. The Bayesian-based Individual Privacy Risk Evaluation model (IPREM) is designed for the hierarchy of individual attribute at first. Individual attributes include personal information such as name, age, gender, family members, e-mail, QQ ID, WeChat ID, occupation, and even religious affiliations. Furthermore, one node’s vulnerability depends not only on the visibility of individual attributes but also on the exposure of the profile through his friends. Then, the Cascade Influence Capability Evaluation model (CICEM) is designed, which aims to rank the friend influence risk levels based on the cascade influence model. At last, an IPREM and CICEM based Privacy Protection Method (ICPM) is designed for solving the IMAC problem.

##### 4.1. IPREM: Hierarchy of Individual Privacy Risk

The individual privacy risk is one of the most important factors for privacy protection. For example, if a friend who holds most of your information has a high individual privacy risk, there is a high probability that your information will be leaked indirectly. So, how to evaluate each user’s individual privacy risk level is the first important issue. Given that some risk is known, the probability of individual privacy risk can be predicted based on the Bayesian Network. Since the actual multidimensional attribute data may be incomplete, it is difficult to handle the complex nonlinear relationship between the individual privacy risk and the multidimensional attribute evaluation index by using regression analysis. A Bayesian Network, however, supports reverse reasoning. Given that some serious privacy risk has occurred, the trained Bayesian Network can be used to reason backward and analyze the objective factors causing the risk. Bayesian Networks can be obtained by means of data analysis and expert experience.

The individual attribute exposure statuses are denoted as . Assume the prior probability is . The risk level is denoted as and new additional information obtained from the investigation is. According to Bayesian Network, the posterior probability can be calculated:

The probability of each factor leading to its occurrence can be calculated by (7), when risk happens. In this paper, the levels of privacy risk is defined as the number of the attributions , from level 1 to level , with level ’s risk being the highest. The risk levels can be denoted as .

The privacy risk level may be affected by the status of the factors. The level of exposed information determines the status of the node and thus the risk level of the node can be calculated. So, it can be considered that there is a causal relationship between various statuses and the levels of privacy risk. According to (7), there is

Based on these theoretical foundations, the IPREM can be designed by 4 steps as follows:

Bayesian-Based Individual Privacy Risk Evaluation Model (IPREM)

**Input**: the network and the probability of each attribute being exposed.

**Output**: individual privacy risk for each node.

*Step 1. *The first step is to design a disclosure risk measure. According to the analysis of network, the probability of exposure for each attribute can be obtained, and sensitivity weight for each value can be calculated by .

An example of 4 sample attributes is shown in Table 2. According to the analysis of the Facebook network, about nodes reveal their gender, and about nodes reveal their individual websites. Then the sensitivity weight of gender is 0.1823. Assume that the four statuses of the individual privacy risk depend on how public the attributes are:

(1) The probability of exposure between 0.3 and 1 is defined as status , such as the gender, whose probability . These attributes only trigger the lowest level of individual privacy risk.

(2) The probability of exposure between and is defined as status .

(3) The probability of exposure between and is defined as status .

(4) The probability of exposure between and is defined as status .

For example, the phone number attribute is set to be visible by only 0.36% of users, so it has a sensitivity weight of 0.9964, which will trigger the highest level of leakage risk.
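The two numeric examples above are consistent with a sensitivity weight equal to one minus the exposure probability (0.36% visibility gives 0.9964). The following sketch computes weights under that assumption. The function name `sensitivity_weight` is hypothetical, and the gender exposure of 0.8177 is not stated in the text but is back-computed from the quoted weight of 0.1823.

```python
def sensitivity_weight(exposure_probability):
    """Sensitivity weight of an attribute, assumed to be 1 - P(exposed).

    Rarely revealed attributes therefore receive weights close to 1.
    """
    if not 0.0 <= exposure_probability <= 1.0:
        raise ValueError("exposure probability must lie in [0, 1]")
    return 1.0 - exposure_probability


# Exposure probabilities: the phone number value is taken from the text,
# the gender value is implied by the quoted weight of 0.1823.
exposure = {"phone number": 0.0036, "gender": 0.8177}
weights = {name: sensitivity_weight(p) for name, p in exposure.items()}
```

Under this reading, an attribute that almost nobody reveals carries a weight near 1 and dominates the risk of a node that exposes it.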

*Step 2. *The second step is to calculate the prior probability of each attribute based on (1).

The prior probability can be calculated as Table 3 based on (1). Since , the risk levels can be denoted as . Take status as an example; the probability for the privacy risk at level 1, can be calculated. Then, the coefficient of individual privacy risk evaluation can be calculated as the last column of Table 3.

*Step 3. *The modeling of Bayesian Network can be completed based on (8).

Assume , and based on (8), Table 4 shows the . When the privacy risk is at , the posterior probability , , and can be calculated. It is easy to find that the risk is higher when the individual privacy risk is in the higher status. For example, when risk happens, the probabilities in status is . However, if risk happens, the probabilities in statuses , , , and are , , , and , respectively.
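The reverse reasoning used in Step 3 is an application of Bayes' rule: given that a risk level has occurred, the probability of each status is its prior multiplied by the likelihood of the risk under that status, normalized over all statuses. A minimal sketch follows. The function name `posterior` and all numbers are hypothetical and are not the values of Tables 3 and 4.

```python
def posterior(prior, likelihood):
    """Posterior P(status_i | risk) from Bayes' rule.

    prior[i]      = P(status_i)
    likelihood[i] = P(risk | status_i)
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)  # total probability of the observed risk
    if evidence == 0:
        raise ValueError("the observed risk has zero probability")
    return [j / evidence for j in joint]


# Hypothetical four-status example (illustrative numbers only).
prior = [0.4, 0.3, 0.2, 0.1]
likelihood_given_status = [0.1, 0.2, 0.3, 0.4]
post = posterior(prior, likelihood_given_status)
```

With these illustrative inputs the joint terms are 0.04, 0.06, 0.06, and 0.04, so the observed risk shifts probability mass away from the first status and toward the higher ones.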

*Step 4. *The individual privacy risk can be calculated by where represents the value of the ’s attribute value.

For example, based on the data in Tables 3 and 5, since , for node , , and , the individual privacy risk can be calculated. In the same way, the individual privacy risk for , and can be calculated as , , and , respectively. Node has the highest individual privacy risk.

##### 4.2. CICEM: Hierarchy of Influence Capability Based on Cascade Influence Model

Cascade influence capability is another important factor in the social network. On the one hand, users’ cascade influence capability is a key factor for influence maximization. Users with high cascade influence capability are selected into the seed set and can maximize the spread of influence. On the other hand, the friend influence risk is another factor for privacy protection. A friend with high cascade influence capability may carry a higher influence risk. For example, if a friend who has high cascade influence capability knows most of your information, there is a high probability that your information may be leaked indirectly. So, how to evaluate each user’s cascade influence capability is the second important issue.

The idea is similar to the IPREM, and based on the cascade influence model, the CICEM is designed as follows:

###### 4.2.1. Cascade Influence Capability Evaluation Model (CICEM)

**Input**: one social network

**Output**: cascade influence capability

*Step 1. *Similar to IPREM, the statuses can be defined according to the cascade index , which can also be dynamically adjusted according to the user or environment requirements. Here, assume the cascade influence has four kinds of statuses , and the prior probability is . According to (2), the probability of cascade influence can be calculated. represents the level of the cascade influence capability, which can be set based on some established principles. Take Figure 1 as an example. Four levels can be set based on the node degrees. The nodes are arranged in descending order of the degree. The top nodes with the lowest degree are set as , those arranged between and are set as , those arranged between and are set as , and the top nodes with the highest degree are set as . Figure 1(a) is the original graph. Figure 1(b) is the subgraph; nodes and are affected by node through the cascade influence model; that is, in this subgraph, two nodes have been influenced. In this example, set node ’s influence status to . According to (2), the total probability of influence is shown in Table 6. Then, the coefficients of cascade influence evaluation can be calculated as shown in the last column of Table 6.

*Step 2. *The posterior probability can be calculated based on (9). Taking the same example, are shown in Table 7. It is easy to find that the cascade influence capability is higher when more nodes are influenced. For example, when cascade influence capability happens, an influence set size of more than 4 has a probability of . If cascade influence capability happens, only one node is influenced with a probability of , and nodes 2, 3, and 4 have been influenced with probabilities of , , and , respectively.

*Step 3. *Based on the cascade influence evaluation , the cascade influence capability can be calculated as

is the cascade influence evaluation, which can be calculated, as shown in Table 6. The selection of coefficient is determined by the node. For example, when node is in level , then will be selected as the coefficient. is the probability of the edges in the route between the node and . is the total number of the nodes in the network. The cascade influence capability needs to be normalized as .

##### 4.3. ICPM: IPREM and CICEM Based Privacy Protection Method

The influence maximization problem in a privacy protection environment is to find a subset of secure and reliable users that maximizes the spread of influence while minimizing privacy disclosure. For this purpose, an IPREM and CICEM based privacy protection method (ICPM) is designed. The process of the ICPM is as follows:

*Step 1. *Calculate the individual privacy risk based on IPREM.

*Step 2. *Calculate the cascade influence capability based on CICEM.

*Step 3. *Calculate the for each node .

*Step 4. *Select some nodes into seed set .

*Step 5. *Calculate .

*Step 6. *For node , calculate , if , add into .

*Step 7. *Repeat Step 6 until .
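The core idea of the ICPM, ranking nodes by a weight that grows with cascade influence capability and shrinks with individual privacy risk, can be sketched as follows. This is a simplified illustration and not the authors' procedure: the linear score `alpha * influence - beta * risk`, the default weights of 0.5, and the one-shot top-k selection are all assumptions, and the function name `select_seeds` is hypothetical. The iterative acceptance test of Steps 5 to 7 is not reproduced here.

```python
import heapq


def select_seeds(influence, risk, k, alpha=0.5, beta=0.5):
    """Pick k seed nodes with high influence capability and low risk.

    influence: dict node -> normalized cascade influence capability.
    risk:      dict node -> normalized individual privacy risk.
    alpha, beta: hypothetical weight factors for the two terms.
    """
    # Assumed scoring rule: reward influence, penalize privacy risk.
    scores = {
        node: alpha * influence[node] - beta * risk[node]
        for node in influence
    }
    # Nodes are returned in descending order of score.
    return heapq.nlargest(k, scores, key=scores.get)


# Toy example: node "a" is influential but risky, so it is passed over.
toy_influence = {"a": 0.9, "b": 0.8, "c": 0.3}
toy_risk = {"a": 0.9, "b": 0.1, "c": 0.1}
toy_seeds = select_seeds(toy_influence, toy_risk, k=2)
```

A purely degree-based selection would pick node "a" first in this toy case, which is the kind of seed that a high individual privacy risk attack would remove.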

#### 5. Performance Evaluations

In order to analyze the performance of the ICPM method, the Facebook network is selected for experimental analysis. Following the data source used in [7], we captured some Facebook data containing user information. This network contains about 130,000 users and 1,000,000 edges. The profile information includes 26 attributes for users such as age, gender, mobile phone number, and address. To avoid invading privacy, each attribute is recorded only as true or false. True means this attribute is visible, while false means it is not visible. Since there are 26 attributes in the simulation, the risk levels can be denoted as . Figure 2 shows the percentages of people who enable the particular attribute to be visible. For example, it can be found that users enable their mobile phone numbers to be visible. users enable their gender to be visible.

In this section, three influence maximization methods, two cascade influence models, and two attack models are discussed for comparison. The three influence maximization methods are the degree-based method [20], the random-based method [30], and our ICPM method. For the degree-based method, nodes with higher degree are selected. For the random-based method, nodes are selected randomly. For our ICPM method, nodes are selected according to the method proposed in Section 4. The weight coefficients can be set according to the environments and requirements. In this simulation experiment, they are set as and . The nodes selected by these methods are taken as the initial active nodes. Furthermore, ICM and WCM are the two cascade influence models used. Each influence maximization method is evaluated both with and without edge weight modification. In addition, in order to test the security, two attack models are considered: (1) the attack model based on high individual privacy risk and (2) the attack model based on high degree. Two measurements, influence size and protection degree, are discussed. The simulation experiments are carried out in the MATLAB environment. The final influence effects and protection degrees are averaged over 50 simulation runs.

##### 5.1. Comparison Experiment Based on ICM

First, the comparison experiment based on ICM is discussed. Figure 3 shows the influence of the three methods in the Facebook network. For simplicity, and because more detailed information is lacking, the activation probability between nodes is set to the same value of 0.05, which is also the probability value commonly used under this model [4]. Figure 3(a) compares the number of influenced nodes for different seed sizes under the three methods in the Facebook network, where the x-coordinate is the seed set size and the y-coordinate is the size of the set that is eventually influenced. It can be found that the seed set selected by our ICPM method spreads much wider than those of the other methods when no attack happens.

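Figure 3: Influence set size versus seed set size for the three methods under ICM in the Facebook network: (a) without attack; (b)–(d) under high degree attack with different attacked set sizes.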

An important conclusion that can be drawn from Figures 3(b)–3(d) is that, when the high degree attack happens, the ICPM method is only slightly affected, and its resistance to attack is the best. The degree-based method has the lowest influence set size. Assume that 200 nodes are selected as the seed set; for the degree method, the influence set sizes decrease from 933 to 261, 163, and 51 when the attacked set sizes are , , and of the seed set size, respectively. However, for the ICPM method, the influence set sizes decrease from 1249 to 372, 355, and 331 when the attacked set sizes are , , and of the seed set size, respectively. Taking the attacked set size as as an example, the influence set sizes fall to and by the degree method and the ICPM method, respectively.

However, the privacy protection level is not clear at a glance from these results. Therefore, the protection degree is defined as the ratio of the influence set size under attack to that without attack. Figure 4 shows the protection degree under high individual privacy risk attack. Figures 4(a) and 4(b) show the cases where and nodes of the whole network are attacked, respectively. For example, when nodes are attacked, the protection degree is about 0.4 with our ICPM method, while it is nearly 0 with the degree method. It can also be found that the ICPM method is only slightly affected under the individual privacy risk attack, so it protects privacy better. The reason for this behavior is that the degree is not the only factor considered in our ICPM method. The nodes with high individual privacy risk have a lower probability of being selected as seed nodes. That is to say, the ICPM method is able to find more secure influential nodes than the other methods.
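The protection degree defined above is a simple ratio, and the sketch below applies it to the influence set sizes quoted in Section 5.1 for 200 seeds under high degree attack. Note that Figure 4 reports a different attack model (high individual privacy risk), so these ratios illustrate the definition and do not reproduce the figure. The function name `protection_degree` is hypothetical.

```python
def protection_degree(influence_under_attack, influence_without_attack):
    """Ratio of the influence set size under attack to that without attack."""
    if influence_without_attack <= 0:
        raise ValueError("influence without attack must be positive")
    return influence_under_attack / influence_without_attack


# Influence set sizes quoted in the text (ICM, 200 seeds, high degree
# attack, three increasing attacked set sizes).
degree_method = [protection_degree(x, 933) for x in (261, 163, 51)]
icpm_method = [protection_degree(x, 1249) for x in (372, 355, 331)]
```

For the degree-based method the ratio keeps shrinking as more seeds are attacked, whereas for the ICPM method it stays in a narrow band, which is the behavior the text describes as being only slightly affected.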

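Figure 4: Protection degree under high individual privacy risk attack, for two attacked fractions of the whole network: (a) and (b).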

##### 5.2. Comparison Experiment with Edge Weight Modification

Second, based on the ICM, the influence with or without edge weight modification is discussed. Edge weight modification means that the weights can be modified according to the actual environment and requirements. For example, three kinds of activation probabilities between nodes are set. For the nodes with top highest individual privacy risk, the probabilities of the incident edges are modified to , and for the nodes with top lowest individual privacy risk, the probabilities of the incident edges are modified to , where and .

Figure 5 shows the influence under high degree attack model. Six kinds of influence maximization methods, random-based, degree-based, and ICPM method with or without edge weight modification, are discussed based on ICM. For example, assume that 200 nodes are selected as the seed set. Under high degree attack and without edge weight modification, the attacked set sizes are and , respectively, of the seed set size. For the degree method, the influence set sizes are 165 and 51, respectively. For the random method, the influence set sizes are 302 and 259, respectively. However, for the ICPM method, the influenced set sizes are 355 and 331, respectively. It can be found that the ICPM method is affected slightly under the degree attack.

**Figure 5:** Influence based on ICM under high degree attack; panels (a) and (b).

Furthermore, the protection degrees of the six methods are discussed. Figure 6 shows the protection degree based on ICM under high individual privacy risk attack in the Facebook network. Figures 6(a) and 6(b) show and nodes of the whole network are attacked, respectively. For example, when nodes are attacked, the protection degree is about 0.35 by our ICPM method, while that is 0 by the degree-based method. The reason is that, according to equations (9)–(10) and the high individual privacy risk attack principle, nodes with higher degree have higher individual privacy risk. Because the degree-based method selects exactly these nodes as seeds, almost all of its seed nodes are attacked under the high individual privacy risk attack, and no information can be spread.

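**Figure 6:** Protection degree based on ICM under high individual privacy risk attack in the Facebook network; panels (a) and (b).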

From the results above, two conclusions can be drawn. First, attacks affect the ICPM method less than the other methods. Second, the influence set sizes are almost the same with and without edge weight modification. The reason is that the ICPM method considers both the individual privacy risk and the cascade influence capability. Edge weight modification, in turn, adjusts the weights according to the actual environment and requirements.

##### 5.3. Comparison Experiment Based on WCM

Finally, the comparison experiment based on the WCM is discussed. Unlike the ICM, the WCM sets the activation probability between nodes to the inverse of the degree. The Facebook network is again used for the experimental analysis.
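The WCM probability assignment can be sketched in a few lines of Python. The text only states that the probability is the inverse of the degree; the sketch below assumes the common weighted cascade convention in which the probability on edge (u, v) is the inverse of the degree of the target node v. It also assumes an undirected graph stored as a symmetric adjacency dictionary. The function name `wcm_probabilities` and the star graph are hypothetical.

```python
def wcm_probabilities(graph):
    """Return {(u, v): 1 / deg(v)} for every adjacent pair (u, v).

    graph: dict mapping node -> list of neighbours; assumed symmetric,
           i.e. v in graph[u] implies u in graph[v].
    """
    degree = {node: len(neighbours) for node, neighbours in graph.items()}
    return {
        (u, v): 1.0 / degree[v]
        for u, neighbours in graph.items()
        for v in neighbours
    }


# Hypothetical star graph: hub "c" connected to three leaves.
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
star_probs = wcm_probabilities(star)
# The hub has degree 3, so each leaf activates it with probability 1/3;
# each leaf has degree 1, so the hub activates a leaf with probability 1.
```

Under this convention the incoming probabilities of every node sum to 1, so high-degree nodes are harder to activate through any single neighbour than under a uniform ICM probability.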

Figures 7(a)–7(d) show the influence set sizes when different numbers of nodes are attacked by high degree attacks in the Facebook network. The ICPM method clearly achieves larger influence set sizes than the other two kinds of methods. Assume 200 nodes are selected as the seeds. For the degree-based method, the influenced set size decreases from 935 to 547, 365, and 136 when the attacked set sizes are , , and of the seed set size, respectively. For the ICPM method, however, the influenced set size decreases only from 1250 to 1153, 1129, and 1096 when the attacked set sizes are , , and of the seed set size, respectively. Furthermore, compared with Figure 3, the influence set sizes based on the WCM are larger than those based on the ICM.

**Figure 7:** Influence set sizes based on WCM under high degree attack in the Facebook network; panels (a)–(d).
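The decline reported above can be made concrete by applying the ratio used for the protection degree (influence set size under attack divided by that without attack) to the quoted numbers. The short Python sketch below assumes that 935 and 1250 are the influenced set sizes without attack for the degree-based and ICPM methods, respectively, which is how the text reads; the variable and function names are hypothetical.

```python
# Influenced set sizes quoted in the text for 200 seeds under WCM.
without_attack = {"degree-based": 935, "ICPM": 1250}
under_attack = {
    "degree-based": [547, 365, 136],
    "ICPM": [1153, 1129, 1096],
}


def retained_fraction(attacked_size, baseline_size):
    """Share of the no-attack influence that survives the attack."""
    return attacked_size / baseline_size


retained = {
    method: [round(retained_fraction(size, without_attack[method]), 3)
             for size in sizes]
    for method, sizes in under_attack.items()
}
# retained["degree-based"] is [0.585, 0.39, 0.145]
# retained["ICPM"]         is [0.922, 0.903, 0.877]
```

Across the three attack levels the degree-based method keeps roughly 59%, 39%, and 15% of its original influence, whereas the ICPM method keeps roughly 92%, 90%, and 88%.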

In addition, the protection degrees of the six methods are discussed. Figure 8 shows the protection degrees based on the WCM under individual privacy risk attack in the Facebook network. Figures 8(a) and 8(b) show that and nodes of the whole network are attacked, respectively. The ICPM method is only slightly affected by the individual privacy risk attack and therefore protects privacy better. In both figures, the ICPM method has the highest protection degree, while the degree-based method has the lowest. From this analysis, it can be concluded that under different kinds of attacks the ICPM method is only slightly affected and has the best anti-attack property.

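**Figure 8:** Protection degree based on WCM under individual privacy risk attack in the Facebook network; panels (a) and (b).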

#### 6. Conclusion

This paper focuses on privacy protection models in social networks. Our key contribution beyond the existing literature is to consider both the individual risk and the cascade influence capability: (1) a Bayesian-based Individual Privacy Risk Evaluation Model (IPREM) is proposed to rank the individual risk levels; (2) a Cascade Influence Capability Evaluation Model (CICEM) is designed to take the influence capability into account; and (3) an IPREM- and CICEM-based Privacy Protection Method (ICPM) is designed. To our knowledge, this is the first attempt to jointly consider individual privacy risk and influence maximization in privacy protection design. Finally, the performance and security of different methods are compared; in these experiments, our method obtains the largest influence set sizes and exhibits the best anti-attack property when attacks occur.

Our IPREM and CICEM models and the ICPM method provide good starting points for future research on privacy protection in influence maximization for social networks. Further studies may concentrate on environments with temporal and spatial variation and on the case in which the attacker has strong reasoning attack ability. Furthermore, the attributes are not analyzed independently in this article. Next, the amount of private attribute leakage and privacy breach when attacks happen will be studied.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank the National Natural Science Foundation of China (nos. 61902069 and U1905211) and the Natural Science Foundation of Fujian Province of China (no. 2021J011068).