Mining Community-Level Influence in Microblogging Network: A Case Study on Sina Weibo

Liu, Yufei; Pi, Dechang; Cui, Lin

doi:https://doi.org/10.1155/2017/4783159

Complexity


Special Issue

Advances in Processing, Mining, and Learning Complex Data: From Foundations to Real-World Applications


Research Article | Open Access

Volume 2017 | Article ID 4783159 | https://doi.org/10.1155/2017/4783159


Yufei Liu,¹ Dechang Pi,¹,² and Lin Cui¹

Academic Editor: Jia Wu

Received 07 Jun 2017

Accepted 12 Nov 2017

Published 04 Dec 2017

Abstract

Social influence analysis is important for many social network applications, including recommendation and cybersecurity analysis. We observe that the influence of a community of multiple users outweighs individual influence. Existing models focus on individual influence analysis, and few studies estimate community influence, which is ubiquitous in online social networks. A major challenge is that researchers need to take into account many factors, such as user influence, social trust, and user relationships, to model community-level influence. In this paper, aiming to assess community-level influence effectively and accurately, we formulate the problem of modeling community influence and construct a community-level influence analysis model. It first eliminates zombie fans and then calculates the user influence. Next, it calculates the user final influence by combining the user influence with the user's willingness to diffuse theme information. Finally, it evaluates the community influence by jointly considering the user final influence, social trust, and the relationship tightness among users within a community. To handle real-world applications, we propose a community-level influence analysis algorithm called CIAA. Empirical studies on a real-world dataset from Sina Weibo demonstrate the superiority of the proposed model.

1. Introduction

Community-level influence analysis is an emerging problem with applications in many fields, for example, recommendation systems [1, 2], public opinion prediction [3], and cybersecurity analysis [4]. Many researchers are interested in analyzing social influence in social networks [5], but few assess influence at the community level. With the rapid spread of online social networks such as Twitter, Facebook, and Sina Weibo, large amounts of real-world data are produced, which support social influence analysis.

How to establish an effective model for analyzing community-level influence has become an important research question for online social networks. Community-level influence is greater than individual-level influence, but few researchers have studied it. Existing studies establish various social influence analysis models [6, 7], but they study influence only at the individual level and mostly ignore the common influence pattern of a community that includes multiple nodes. Many results have been obtained on individual-level influence, but most of these studies are based on static statistical methods [8–11], link analysis algorithms [12–14], or probabilistic models [15–17]. They do not consider whether a user is willing to receive or diffuse information, do not account for the role of social trust between users, and do not remove zombie fans. However, these factors are very important for analyzing social influence. Meanwhile, the existing works on community-level influence focus on the influence strength between communities and ignore the problem of measuring the influence of a community itself. For example, Belák et al. [18] calculated the community-level influence simply by averaging the influence of all users in a community.

An important observation is that zombie fans make no contribution to social influence, that the willingness of users to diffuse information affects the accuracy of the calculated social influence, and that social trust plays an important role in social influence. The trust of user A in user B determines the influence of user B on user A: the more user A trusts user B, the more influence user B has on user A. Because user influence is the basis of community influence, small errors in the former lead to errors in the latter.

Aiming to assess community-level influence effectively and accurately, we construct a community-level influence analysis model. Based on our model, we propose a community-level influence analysis algorithm (CIAA for short). The main idea of our model is as follows. First, we eliminate the interference of zombie fans with the social influence to make the results more accurate. Then, in calculating user influence, we take social trust into account and use the random walk method. In evaluating the user’s theme information, the user mean willingness is calculated by exploring the content related to the user’s theme information. We combine these two factors (the user influence and the user willingness to diffuse theme information) to calculate the user final influence. Finally, the community-level influence is calculated by jointly considering the user final influence, the social trust, and the relationship tightness among users within a community. Experiments are conducted on a real-world dataset crawled from Sina Weibo. Compared with the state-of-the-art algorithm (the averaging user influence algorithm [18]), the results show that our model evaluates community-level influence more effectively and accurately.

The contributions of this paper can be summarized as follows. We formulate the problem of analyzing the community-level influence and design a community-level influence analysis model. CIAA, a community-level influence analysis algorithm based on our model, is proposed, which is effective and reliable to evaluate the community influence of microbloggers from Sina Weibo. We conduct extensive experiments to assess the performance of the proposed model. Experimental results on the real-world dataset demonstrate the superiority of the proposed CIAA.

The rest of the paper is organized as follows. In Section 2, we summarize the related works. In Section 3, we propose the community-level influence analysis model and give an example to illustrate its working principle, and the CIAA is proposed. In Section 4, we conduct experiments on the real-world dataset crawled from Sina Weibo and then analyze the performance of the proposed approach. Finally, we state the conclusion and future work in Section 5.

2. Related Works

Since Katz and Lazarsfeld [19] found in the 1950s that social influence plays an important role in social life and decision-making, researchers in the computer field have spared no effort to study the relevant problems. It has been found that popular users play an important role in the adoption of innovation, the propagation and guidance of public opinion, the formation and development of group behavior [5], and so on.

There is a great deal of research on measuring individual-level influence [20, 21], typically that of “opinion leaders.” Existing methods can be categorized into three types: network structure based methods, user behavior based methods, and mutual information based methods. The network structure based methods include degree centrality [22], closeness centrality [23], betweenness centrality [24], eigenvector centrality [25], Katz centrality [26], PageRank [27], and clustering coefficient [28]. Node degree essentially captures the connection between a node and its neighbors. Methods based on node degree express this meaning intuitively, and their computational cost is lower than that of other methods [29]. They are widely used to measure users’ influence in social networks. However, degree-based methods reflect only the connections between users and their neighbors, that is, local influence, and cannot measure users’ influence in the entire social network. For example, based on the community scale-sensitive maxdegree, Hao et al. [30] proposed an approach called CSSM for discovering influential users when placing advertisements. CSSM uses degree centrality and neighbors’ degrees to evaluate the influence of nodes (microbloggers). However, the algorithm does not consider the contribution of microblogs to user influence. Compared with degree-based methods, methods based on the shortest path (closeness centrality and betweenness centrality) can measure individual-level influence in the entire social network. Nevertheless, their computational complexity is higher than that of degree centrality. For example, based on text mining and social network analysis, Bodendorf and Kaiser [31] proposed an approach to detect opinion leaders in a directed graph of user communication relationships. It can predict the tendency of network opinion leaders via closeness centrality and betweenness centrality.
Moreover, measuring individual-level influence by the shortest path assumes an ideal situation that is difficult to achieve in real-world application scenarios. Besides, methods based on random walks consider only the structural characteristics of a node and ignore its behavioral characteristics. For example, Xiang et al. [32] provided an understanding of PageRank and authority from an influence propagation perspective by performing random walks. However, they did not consider personal attributes in their understanding of PageRank or in the relationship between PageRank and social influence analysis. Zhu et al. [33] proposed a novel information diffusion model called CTMC-ICM, which introduces continuous-time Markov chain theory into the Independent Cascade Model. Based on the model, they proposed a new ranking metric called SpreadRank. Based on a continuous-time Markov process, Li et al. [34] proposed a dynamic information propagation model called IDM-CTMP to predict the influence dynamics of social network users. IDM-CTMP defined two other dynamic influence metrics and could predict the spreading coverage of a user within a given time period. Zhou et al. [35] established new upper bounds to significantly reduce the number of Monte-Carlo simulations in greedy-based algorithms, especially at the initial step. Based on the bound, they proposed a new upper bound based lazy forward algorithm for discovering the top-k influential nodes in social networks.

The aforementioned models focus only on assessing the social influence of single individuals. However, a small number of works attempt to model community influence. Qi et al. [36] applied degree centrality, closeness centrality, and betweenness centrality to groups and classes as well as individuals. Latora and Marchiori [37] put forward a group information centrality to measure the importance of node sets. Mehmood et al. [38] exploited information diffusion records to calculate the influence strength between different communities. Although these works preliminarily study community-level influence, none of them focuses on how to measure a community’s influence. Belák et al. [18] assessed community-level influence as the average influence of all users in the same community. Because the distribution of users’ influence is uneven across communities, an average-based method is unfair to bigger communities, while a summation-based method is unfair to smaller ones. At present, community-level influence analysis is still a challenging problem.
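The trade-off between averaging and summation can be seen on a toy example. The sketch below uses made-up influence scores (they are illustrative assumptions, not data from this paper) for a small community and a larger one that contains equally strong users plus several weak ones.

```python
from statistics import mean

# Hypothetical per-user influence scores; the numbers are illustrative only.
small_community = [0.9, 0.8]
large_community = [0.9, 0.8, 0.2, 0.2, 0.1, 0.1]


def average_influence(scores):
    """Community score as the mean of user scores (the averaging approach)."""
    return mean(scores)


def summed_influence(scores):
    """Community score as the sum of user scores (the summation approach)."""
    return sum(scores)


# Averaging ranks the small community first: the weak members of the
# large community pull its mean down.
print(average_influence(small_community) > average_influence(large_community))

# Summation ranks the large community first: every extra member adds
# to the total, however weak.
print(summed_influence(large_community) > summed_influence(small_community))
```

Neither aggregate alone treats communities of different sizes evenly, which is why a model may want to account for community size and structure as separate factors.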

3. Proposed Methodology

We construct our model and implement the corresponding algorithm in this section. First, we give the related definitions in Section 3.1. Then, we propose the community-level influence analysis model for microbloggers. Next, we describe the working principle of our model via an example in Section 3.2. Finally, the community-level influence analysis algorithm is proposed in Section 3.3.

3.1. Related Definitions and Community-Level Influence Analysis Model

3.1.1. Related Definitions

Social networks and communities are described as follows: a typical social network can be represented as a bipartite graph G = (V, E), where V is a set of nodes (users) in a social network and E is a set of edges used to describe the relationships between nodes. A community can be represented as a subgraph of a social network: that is, C = (V_C, E_C), where V_C is a set of users in a community and E_C is a set of relationships between users within a community. A node is defined as a user within the community if he/she belongs to the community; otherwise, he/she is defined as a user outside the community. The set of users outside the community is written as UOC. Modeling and calculating the community influence of C are the basis of our work, and the objective function of our model is as follows:

CI(C) = f(V_C, E_C)

CI(C) denotes the community influence of the community C, and the function f indicates that the assessment method is based on V_C and E_C. There are two entities (i.e., users and communities) which can produce influence. To study the community-level influence, we give the related definitions as follows.

Definition 1.
Trust. A node in a social network has a certain trust degree in other nodes according to its past contact with other nodes or the reputation of other nodes [39, 40]. According to the different sources of trust, we divide the trust into direct trust and indirect trust.
(1) Direct Trust (DT). Assume that the node is the entry node of the node , indicating that there is contact between and . According to the previous contacts and the reputation of , will have direct trust on .
(2) Indirect Trust (IT). Assume that the node is the reachable node of the node ; will have indirect trust on because the reputation of can be transmitted to .
Users not only have mutual trust, but also mutually influence each other. According to the different sources of influence, this paper divides the influence into direct influence and indirect influence.

Definition 2.
(1) Direct Influence (DI). Assume that the node is the entry node of the node ; will have an influence on : that is, produces direct influence on .
(2) Indirect Influence (II). Assume that the node is a reachable node of the node ; will have an influence on through transmission layer by layer: that is, produces indirect influence on .
In order to assess the overall influence of on , we define the user combined influence.

Definition 3.
User Combined Influence (UCI). Because has direct trust or indirect trust to , and has direct influence or indirect influence on , we comprehensively combine the four factors to calculate the combined influence of on .

Definition 4.
(1) User Influence (UI). User influence refers to the influence of individual on other users.
(2) Community Influence (CI). Community influence is the overall influence of the community, which is formed by the of all the users in the community and the community’s self-factors.

Definition 5.
Mean Willingness to Diffuse Theme Information (). In communities, some users who receive the theme information may not diffuse it, some users prefer to post their own blogs, and some users prefer to forward others’ blogs. We assess the community influence by taking into account the diffusion of information between users. represents a user’s willingness to diffuse the information of a blog. The theme information of the user is stored in the set , where represents the user’s th theme information. If is diffused in a social network, a path graph is formed to describe the propagation path. We store the path graphs formed by in the set .

3.1.2. Model Framework

Our model consists of four modules: data preprocessing module, data source module, the user final influence module, and the community influence module. Figure 1 shows our model framework.

Data preprocessing module is used to eliminate zombie fans. We judge the zombie fans from the behavior dimension and time dimension. Behavior dimension is based on the amount of theme information posted by the user and the fans’ influence of the user. Time dimension is based on the user login frequency and the frequency of diffusing theme information. Finally, the data preprocessing results are stored to the data source.

Data source module is responsible for providing the relevant data needed for influence analysis. We establish the user information table, the microblog table, the user fans information table, and the user attention table to access the user’s relevant information efficiently.

The user final influence module first calculates the mean willingness to diffuse theme information for each user in a community and then calculates the user’s influence. Next, it combines these two results to get the user final influence.

The community influence module first calculates the community size, the tightness of user relationship, and the user-integrated influence in the community and then evaluates the community influence by integrating the three factors.

3.2. Working Principle

In this subsection, we introduce the working principle of each module in the model framework in detail. We assume that and are two users in community . After performing data preprocessing, Figure 2 shows the working principle, where the mathematical notations will be described in the following subsections in detail.

The working principle can be described as the following steps.

Step 1. Calculate the and of . Then calculate the of . Finally, calculate of .

Step 2. According to Step 1, calculate the and of .

Step 3. Integrate and to calculate the . Then calculate and . Finally, combine the three factors to calculate the community influence.

3.2.1. Data Preprocessing

In microblogging networks, some users create zombie fans for ulterior motives or business purposes. According to the definition in [41], zombie fans are fake fans generated and maintained mostly for economic purposes. Zombie fans interfere with the analysis of social influence. A small number of empirical studies have been conducted on recognizing zombie fans [41–43], mostly on the Twitter platform.

Presently, researchers generally detect zombie fans based on the amount of attention, the number of fans, the frequencies of original and forwarded information, and other basic attributes. As zombie fans keep evolving, they will exhibit more features [44], and the existing feature-based methods for eliminating them may gradually fail. We observe that, because zombie fans are only occasionally operated by software programs or by a few people behind them, they rarely speak, seldom log in, or are no longer used, and their behavior can differ vastly from that of ordinary users in profile information and content. Moreover, no matter how the features of zombie fans change, they can be split into a time dimension and a behavior dimension. Thus, it is reasonable to recognize zombie fans from the time dimension and the behavior dimension, and doing so adapts better to the needs of detecting zombie fans in microblogging networks.

According to expert knowledge criteria [45], in the time dimension, we assess zombie fans by the user login frequency and the diffusing advertisement frequency. Thus, the time dimension includes login frequency () and diffusing advertisement frequency (). Login frequency refers to the number of logins in a period. The lower the login frequency, the higher the probability that the user is a zombie fan. The login frequency is calculated as follows: where indicates the number of logins. The higher the diffusing advertisement frequency, the higher the probability that the user is a zombie fan. The diffusing advertisement frequency is calculated as follows: where indicates the number of advertisement diffusions.

For the same reason, in the behavior dimension, we assess zombie fans from the amount of user theme information and the individual influence of the user’s fans. Thus, we take into account the number of user theme information (), the number of attention users (), and the number of user’s fans ().

To ensure that the criteria of the parameters are reliable, the corresponding criteria are obtained by prior knowledge, expert knowledge, or experimental trial. For example, we select the users who are the last 10% of the login frequency and whose login time interval is greater than 7 days into the set . To reduce the amount of calculation, we filter all users in a microblogging network. If a user has a certified user in his/her fans, the user is not considered a zombie fan. If a user does not have a certified user in his/her fans, the details to eliminate zombie fans can be described in Algorithm 1.

(1) Input: , , , , , ,
(2) Output:
(3) Select the users who are in the bottom 10% by login frequency and whose login time interval is greater than 7 days into the set
(4) Put the users in the top 10% by diffusing advertisement frequency into the set
(5) Select the users who are in the bottom 10% by number of theme information posts into the set
(6) Put the users in the top 10% by number of attention users into the set
(7) Put the users with between 10 and 200 fans into the set
(8)
(9) Update and
(10) return ,
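The filtering steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `User` fields and function names are hypothetical, and because the combining step (8) is not legible in the listing, the sketch assumes that a user is flagged only when he/she falls into all five candidate sets (an intersection). Users with a certified fan are excluded up front, as described in the text.

```python
from dataclasses import dataclass


@dataclass
class User:
    # All field names are hypothetical stand-ins for the paper's symbols.
    uid: str
    login_freq: float        # logins per period
    days_since_login: int    # login time interval in days
    ad_freq: float           # advertisement diffusions per period
    n_posts: int             # number of theme information posts
    n_following: int         # number of attention users
    n_fans: int              # number of fans
    has_certified_fan: bool  # True if any fan is a certified user


def bottom_fraction(users, key, frac=0.10):
    """Return the uids of the lowest `frac` of users ranked by `key`."""
    k = max(1, int(len(users) * frac))
    return {u.uid for u in sorted(users, key=key)[:k]}


def top_fraction(users, key, frac=0.10):
    """Return the uids of the highest `frac` of users ranked by `key`."""
    k = max(1, int(len(users) * frac))
    return {u.uid for u in sorted(users, key=key, reverse=True)[:k]}


def find_zombies(users):
    """Return the uids flagged as zombie fans under the assumptions above."""
    # Pre-filter: a user with a certified fan is never treated as a zombie.
    pool = [u for u in users if not u.has_certified_fan]
    if not pool:
        return set()
    # Time dimension.
    long_absent = {u.uid for u in pool if u.days_since_login > 7}
    low_login = bottom_fraction(pool, lambda u: u.login_freq) & long_absent
    high_ads = top_fraction(pool, lambda u: u.ad_freq)
    # Behavior dimension.
    low_posts = bottom_fraction(pool, lambda u: u.n_posts)
    high_following = top_fraction(pool, lambda u: u.n_following)
    mid_fans = {u.uid for u in pool if 10 <= u.n_fans <= 200}
    # Assumption: step (8) intersects the candidate sets.
    return low_login & high_ads & low_posts & high_following & mid_fans
```

The 10% cut-offs, the 7-day interval, and the 10–200 fan range are taken from the listing; in practice they would be tuned by prior knowledge, expert knowledge, or experimental trial, as the text notes.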

Unlike classification and pattern recognition approaches, the proposed method for eliminating zombie fans does not require labeled data or a trained model. It is effective and easy to use in practice.

3.2.2. The User Final Influence

The traditional models are simple, not taking into account the degree of social trust between users and the user’s willingness to diffuse theme information. However, the two factors are important to the user final influence. In this paper, the user final influence is calculated by integrating the and . Because the influence of a user on other users is related to the user’s willingness to exert his/her influence, the bigger the value of , the greater the probability of the user diffusing a theme information. is calculated as follows:

Mean Willingness to Diffuse Theme Information. The higher frequency of diffusing theme information means a higher user influence, because more users will know the user. Therefore, reflects the probability that a user has high-impact in a microblogging network. The parameter indicates the state of receiving theme information for the user as follows:

The initial value of is set to 0. Meanwhile, to know the result of diffusing the theme information , we observe . The parameter indicates whether diffuses the theme information that he/she received.

When the outdegree of is greater than 0, it indicates that has already diffused the theme information; otherwise, has never diffused the theme information. The number of users receiving theme information is written as NRTI and the number of users diffusing theme information is written as NDTI.
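The two counts NRTI and NDTI can be read directly off a propagation path graph. The sketch below is a minimal illustration that assumes the path graph of one piece of theme information is given as a list of directed `(sender, receiver)` edges; this representation and the function name are assumptions made here for illustration. It only produces the two counts and does not reproduce the paper's weighted willingness formula.

```python
def receive_and_diffuse_counts(edges):
    """Count receivers and diffusers in one propagation path graph.

    edges: iterable of (sender, receiver) pairs (hypothetical representation).
    Returns (NRTI, NDTI): the number of users who received the theme
    information and the number of those who passed it on.
    """
    edges = list(edges)
    receivers = {dst for _, dst in edges}  # users with an incoming edge
    senders = {src for src, _ in edges}    # users with outdegree > 0
    diffusers = receivers & senders        # received it and forwarded it
    return len(receivers), len(diffusers)


# Example: a posts; b and c receive it; b forwards it to d.
nrti, ndti = receive_and_diffuse_counts([("a", "b"), ("a", "c"), ("b", "d")])
print(nrti, ndti)  # 3 1
```

The original poster has no incoming edge and is therefore not counted as a receiver, which matches the rule that a receiver has diffused the information only when his/her outdegree is greater than 0.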

is calculated as follows, where . is the of . is the weight. represents the total number of theme information posts by . is the set of indegree nodes of . represents the weight of the user , which is determined by his/her outdegree. is the total number of . The initial value of is set as 1. We give an example for calculating in Figure 3.

[Figure 3, panels (a)–(d)]

Assume that the of all users initially are 1, , and then calculate the as follows.

(1) . From Figures 3(b)–3(d), we have . For , he/she posts two pieces of theme information, which form two theme information graphs in Figures 3(b) and 3(c). Thus, we get the set . From Figure 3(d), , NDTI = 0, because the outdegree of node is 0, and forms its one theme information graph. The is calculated as follows:

(2) . Similar to the calculation of , we have the set , . From Figures 3(b) and 3(c), we have , . is calculated as follows:

Similarly, for , , and , we have

The User Influence. Users influence and trust one another. Social trust plays an important role in calculating the user influence. Each user is influenced by others, including users inside and outside the community.

(1) Calculating Direct Trust and Direct Influence. If is an entry node of , then will have direct trust on , where is the direct trust of on . is the reputation of user . is the set of entry nodes of , and is the reputation of the entry neighbor of . The value of depends on the average reputation of all ’s entry neighbors. For each node, we set the initial direct trust value to 0.1. In Figure 3(a), we calculate the direct trust on from other nodes as follows:

has a direct influence on as follows, where is the direct influence of on . is the degree of interest of to . is the amount of the theme information from in the receiving theme information of .

In Figure 3, we calculate the direct influence on produced by other users as follows:

In Figure 3(a), we have

(2) Indirect Trust and Indirect Influence. If is the reachable node of , then will have indirect trust on as follows:

is ’s indirect trust on . is the length of the shortest path from to .

In Figure 3(a), we calculate the indirect trust on gained from other nodes as follows:

has an indirect influence on as follows:

In Figure 3(a), we calculate the indirect influence of other nodes on as follows. The calculation of is the same as the above formula.

(3) User Combined Influence. Assuming that can reach through a path, we introduce the factor .

If is the entry node of , the combined influence of on is

If is not an entry node of node , but is a reachable node of , the combined influence is

Assume . In Figure 3, we calculate the combined influence of other nodes on as follows. is the entry node of ; then we have . is the entry node ; then we have . is the entry node of ; then we have . is the entry node of ; then we have . is the reachable node of ; then we have .

(4) User Influence. User influence is obtained by combining all users’ influence, where SUCP represents a set of users that can reach through a certain path. For example, in Figure 3, the user influence of is calculated as follows:

When we get and , the user final influence can be calculated according to (4).
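Both the indirect trust (which depends on the shortest path length) and the set SUCP (the users that can reach a given user through some path) can be obtained with one breadth-first search over reversed edges. The sketch below illustrates only this graph step; the edge-list representation and the function name are assumptions, and the paper's trust and influence formulas are not reproduced.

```python
from collections import deque


def users_reaching(target, edges):
    """Map every user that can reach `target` to its shortest path length.

    edges: iterable of (src, dst) pairs, where src is an entry node of dst
    (hypothetical representation). Users at distance 1 are entry nodes of
    `target` (direct case); users at distance > 1 are merely reachable
    (indirect case).
    """
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(dst, []).append(src)

    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in dist:  # first visit gives the shortest distance
                dist[pred] = dist[node] + 1
                queue.append(pred)

    del dist[target]  # the target itself is not part of the set
    return dist


edges = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "d"), ("a", "f")]
print(users_reaching("a", edges))  # {'b': 1, 'c': 1, 'd': 2, 'e': 3}
```

The keys of the returned dictionary play the role of SUCP for the target user, and the values are the shortest path lengths that an indirect trust or indirect influence term would attenuate over.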

3.2.3. Community Influence

The community influence is composed of the users’ interaction inside and outside the community. In this paper, we consider it from three factors, that is, the user-integrated influence, the community size, and the degree of relationship tightness among users inside the community.

User-integrated influence () is integrated from the final influence of all users within the community, where is of the community . is the set of users inside community .

The community size () is important to the calculation of the community-level influence. The larger the number of users in a community is, the greater the influence of the community becomes. The formula is as follows, where represents the number of users in a community and represents the total number of users in the social network.

The degree of relationship tightness () represents the degree of closeness between users inside a community. We describe it from the user’s outdegree and indegree as follows:

Therefore, we calculate the as follows, where and () are used to distinguish the importance of different factors.
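One way to picture how the three factors could be combined is sketched below. Every formula in this block is an assumption made for illustration, since the original equations are not legible here: the community size is taken as the ratio of community users to network users, the relationship tightness as the share of the members' edges that stay inside the community, and the final score as a weighted sum whose weights add up to 1. All function names and default weights are hypothetical.

```python
def community_size(n_community, n_network):
    """Assumed form: share of the network's users that belong to the community."""
    return n_community / n_network


def relationship_tightness(members, edges):
    """Assumed form: fraction of edges touching the community that are internal.

    edges: iterable of directed (src, dst) pairs.
    """
    members = set(members)
    touching = [(s, d) for s, d in edges if s in members or d in members]
    if not touching:
        return 0.0
    internal = sum(1 for s, d in touching if s in members and d in members)
    return internal / len(touching)


def community_influence(uii, cs, rt, weights=(0.5, 0.25, 0.25)):
    """Assumed form: weighted sum of the three factors, weights summing to 1.

    uii: user-integrated influence, cs: community size, rt: relationship
    tightness. The default weights are placeholders, not the paper's values.
    """
    a, b, c = weights
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return a * uii + b * cs + c * rt


cs = community_size(4, 16)
rt = relationship_tightness({"a", "b", "c"},
                            [("a", "b"), ("b", "c"), ("c", "x"), ("y", "z")])
print(community_influence(0.8, cs, rt))
```

Keeping the factors separate makes it easy to see how changing the weights shifts the ranking between large, loosely connected communities and small, tightly connected ones.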

3.3. The Proposed Algorithm

According to the above description, we propose a community-level influence analysis algorithm, called CIAA, in pseudocode in Algorithm 2. It can be seen from the algorithm that the total time complexity is . This means that our algorithm can be applied to large-scale social datasets.

Input: ; ; ; ; ; ; ;
Output: community influence
(1) for to do
(2)
(3)
(4) end for
(5) for to do
(6)
(7) end for
(8)
(9) for to do
(10)
(11) end for
(12)
(13) return

4. Experiments

We conduct experiments to validate the effectiveness of the proposed approach on a real-world microblogging network. In this section, we describe the experimental setup followed by the discussion of experiment results.

4.1. Dataset

The real-world dataset in this paper was crawled from Sina Weibo with a Weibo crawler. Similar to a hybrid of Twitter and Facebook, Sina Weibo is one of the most popular sites in China. It is used by more than 33% of the Internet users in China, and its market penetration is equivalent to that of Twitter in the United States. According to figures released by Sina Weibo, as of June 2016 it had 282 million monthly active users and 86.8 million daily active users from different social and cultural backgrounds. Moreover, nearly 100 million new microblogs are posted every day. They promote and disseminate views and attitudes on business, culture, education, and so forth. The crawled data include 20,151,129 microblogs, 932,578,467 comments, and 9,218 users. In this paper, we selected more than 1000 users from the crawled dataset and divided the related information into Tables 1, 2, 3, and 4 as data sources according to our model framework. They are stored in txt-formatted files.

4.2. Experimental Setting

All experiments are conducted on a PC with an Intel Core i5 processor and 8 GB RAM. According to prior knowledge, we set the parameters of the experiments as listed in Table 5.

4.3. Results

4.3.1. Community Structure Analysis

To study the characteristics of the community, we plot the outdegree distribution and degree distribution of its users. In a directed social network, the indegree of a node is the number of fans of the user, and the outdegree of a node is the number of users to whom the user pays attention. Figure 4 shows the outdegree and degree distributions of the data sources.

[Figure 4, panels (a) and (b): outdegree and degree distributions of the data sources]

As shown in Figure 4, the outdegree distribution and the degree distribution of Sina Weibo dataset follow the power-law distribution, which indicates that the social network composed of the dataset is a scale-free network.
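A quick numerical check of the power-law claim is to estimate the exponent from the degree sequence. The sketch below is not part of the paper's method: it uses the standard maximum-likelihood approximation for discrete data, alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)), applied to degrees at or above a chosen cut-off. The function name, the cut-off, and the sample degrees are assumptions for illustration.

```python
import math


def powerlaw_exponent(degrees, x_min=1):
    """Approximate MLE of the power-law exponent for integer degrees >= x_min."""
    tail = [d for d in degrees if d >= x_min]
    if not tail:
        raise ValueError("no degrees at or above x_min")
    # Each ratio exceeds 1 because d >= x_min > x_min - 0.5, so log_sum > 0.
    log_sum = sum(math.log(d / (x_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


# Made-up degree sequence with many low-degree and few high-degree users.
sample_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]
print(powerlaw_exponent(sample_degrees))
```

An exponent estimate alone does not prove that the data follow a power law; a straight line on a log-log plot of the distribution, as in Figure 4, or a goodness-of-fit test is still needed to support the scale-free interpretation.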

4.3.2. Eliminating Zombie Fans

To improve the accuracy of our model, we remove zombie fans. Using the zombie fan elimination method in Algorithm 1, we finally remove 12 zombie fans, as shown in Table 6.

As shown in Table 6, the three sets are , , and . The little black boxes in Table 6 represent the shared users of three sets, and they are the same as the shared users from time dimension and behavior dimension. Therefore, the shared users will be removed. We compare the user final influence without the zombie fans with the user final influence with the zombie fans, as shown in Table 7.

From Table 7, the result of the comparison shows that the accuracy of the with zombie fans for the actual user ranking is only 60%. It is concluded that the elimination of zombie fans is very important for the accuracy of the user final influence.

4.3.3. Accuracy Analysis of the User Final Influence

We calculate the user final influence of users in community, but we compare the top ten users for simplicity. The top 10 user final influences and their related information are shown in Table 8.

According to the ranking in Table 8, we find that these users are all authenticated users. We conclude that authenticated users are more influential in microblogging networks. There are two reasons for this phenomenon. First, the majority of well-known users are authenticated users, and the influence of well-known users is larger than the average user influence. Second, an authenticated user’s identity is transparent, which gives the user higher social trust. Table 8 also shows that the user final influence needs to be considered in terms of the quality of the user’s fans, the number of user microblogs, and user authentication.

Table 9 and Figure 5 show the comparison between the UFI method and the microblog-fans ranking algorithm. Table 9 shows the UFI method ranking and the corresponding ranking via microblog-fans ranking algorithm. Figure 5 shows the overall ranking order via the microblog-fans ranking algorithm.

Table 9 and Figure 5 show that the UFI ranking is almost completely different from the microblog-fans ranking. Under the UFI method, the numbers of microblogs and fans of the top users must still reach a certain quantity to support individual influence, so these numbers remain one factor in measuring influence. However, social trust between users can also help improve individual influence in the UFI method.

The user final influence is an experimentally derived evaluation of a user, and no existing dataset provides a reference for comparison. We can only refer to user influence rankings published by other organizations. We therefore verify the proposed calculation method against the official user influence ranking provided by Sina Weibo. Because each microblogging platform has its own influence calculation method, the results cannot be compared numerically; instead, we compare relative positions, that is, rankings. If the two methods rank users in a similar order, we consider their influence analysis results to be similar. The comparison between the official Sina Weibo ranking and the UFI ranking is shown in Table 10.
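Comparing two methods by relative position can be sketched as follows. This is an illustrative assumption rather than the paper's stated metric: accuracy is taken here to be the fraction of rank positions at which the two rankings place the same user, and the function names, user IDs, and scores are hypothetical.

```python
def rank_by_score(scores):
    """Order user IDs by descending score; ties are broken by user ID."""
    return sorted(scores, key=lambda u: (-scores[u], u))


def positional_accuracy(predicted, reference):
    """Fraction of rank positions at which both rankings place the same user."""
    if len(predicted) != len(reference):
        raise ValueError("rankings must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical reference ranking and method scores (not the paper's data).
reference_ranking = ["a", "b", "c", "d", "e"]
method_scores = {"a": 0.9, "b": 0.8, "d": 0.7, "c": 0.6, "e": 0.5}

predicted_ranking = rank_by_score(method_scores)
accuracy = positional_accuracy(predicted_ranking, reference_ranking)
```

Here the two rankings agree at three of five positions. A rank correlation coefficient would be a softer alternative, since it gives partial credit to a pair of users that is merely swapped.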

In Table 10, the ranking produced by the user final influence calculation method largely matches the actual user ranking, except for the user pair of and . This is because the user influence ranking by Sina Weibo emphasizes the number of microblogs and fans, and the numbers of microblogs and fans of user and user differ greatly. The UFI method, in contrast, considers the factors of influence more reasonably.

Taking the official Sina Weibo ranking as the standard, the accuracy of the UFI method changes with different and , as shown in Figure 6.

Figure 6 shows that the accuracy of the UFI method changes with different and . When , , the UFI method has the highest accuracy. Therefore, this parameter pair is used in the other experiments. We also find that the UFI method is more accurate than the microblog-fans ranking algorithm. Moreover, this experiment indicates that the user’s willingness to diffuse theme information is important to the accuracy of the user influence.
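Selecting the parameter pair with the highest accuracy can be sketched as a small grid search. Everything specific below is an assumption made for illustration: the final score is modeled as a weighted sum of a user influence score and a willingness score, the weight names `w_influence` and `w_willingness` are hypothetical, and accuracy is the positional agreement with a reference ranking. The paper's own combination formula may differ.

```python
from itertools import product


def positional_accuracy(predicted, reference):
    """Fraction of rank positions at which both rankings place the same user."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def best_weight_pair(influence, willingness, reference, grid):
    """Try every weight pair on the grid and keep the first most accurate one."""
    best = None
    for w_influence, w_willingness in product(grid, repeat=2):
        # Assumed combination: a plain weighted sum of the two scores.
        scores = {
            u: w_influence * influence[u] + w_willingness * willingness[u]
            for u in influence
        }
        ranking = sorted(scores, key=lambda u: (-scores[u], u))
        acc = positional_accuracy(ranking, reference)
        if best is None or acc > best[0]:
            best = (acc, w_influence, w_willingness)
    return best


# Hypothetical scores and reference ranking (not the paper's data).
influence = {"a": 0.9, "b": 0.5, "c": 0.4}
willingness = {"a": 0.1, "b": 0.9, "c": 0.2}
reference = ["b", "a", "c"]

best = best_weight_pair(influence, willingness, reference, [0.0, 0.5, 1.0])
```

Because ties in accuracy keep the earlier pair, the result depends on grid order; a real sweep would likely use a finer grid and report all pairs that reach the maximum.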

4.3.4. Accuracy Analysis of CIAA

Because few existing studies address community influence, we compare the proposed algorithm CIAA with the averaging user influence algorithm (AI). We set different parameter pairs and to compare the two algorithms and then calculate the corresponding community influence, as shown in Figure 7.

Figure 7 shows that the results of CIAA change with the parameter values. When and , the results of the two algorithms are closest. This is because the AI algorithm is essentially a weighted average of the user influence, whereas CIAA integrates the user final influence, the community size, and the degree of relationship tightness among users inside the community. The greater the proportion of the user final influence, the closer the results of the two algorithms. Therefore, the proposed algorithm outperforms the state-of-the-art baseline algorithm.
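The convergence of the two results as the user influence term gains weight can be illustrated with a toy model. This sketch is not the CIAA formula: it assumes a hypothetical convex combination of the average user influence and a single structural term (here called `tightness`), with a hypothetical weight `w_influence`, purely to show why the gap to the averaging baseline shrinks.

```python
def average_influence(influences):
    """Averaging baseline: the plain mean of the users' influence values."""
    return sum(influences) / len(influences)


def combined_community_score(influences, tightness, w_influence):
    """Hypothetical convex combination; a stand-in, not the paper's CIAA formula."""
    if not 0.0 <= w_influence <= 1.0:
        raise ValueError("w_influence must lie in [0, 1]")
    return (w_influence * average_influence(influences)
            + (1.0 - w_influence) * tightness)


# Hypothetical community: three users' influence values and one tightness value.
influences = [0.8, 0.6, 0.4]
tightness = 0.2

baseline = average_influence(influences)
gaps = [
    abs(combined_community_score(influences, tightness, w) - baseline)
    for w in (0.0, 0.5, 1.0)
]
```

Under this assumption the gap equals (1 - w_influence) times the difference between the structural term and the average, so it falls to zero as the weight on user influence approaches one.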

5. Conclusion

In this paper, we studied the emerging problem of how to model community-level influence. Online social networks, especially microblogging networks, are increasingly important in our daily life. Previous works cope effectively with individual influence in microblogging networks, but they rarely evaluate social influence at the community level, which outweighs individual influence. We defined the concepts related to community-level influence and constructed a model that combines the user influence, social trust, and relationship tightness of intrausers in a community to reveal the community-level influence appropriately. We proposed the algorithm CIAA to cope with real-world applications. We conducted empirical studies on a real-world microblogging dataset crawled from Sina Weibo, where CIAA outperformed the state-of-the-art baseline algorithm. To the best of our knowledge, the proposed approach is effective for assessing community influence in microblogging networks. The highlights of this paper can be summarized as follows: (1) formulating the problem of analyzing community-level influence and designing a community-level influence analysis model; (2) proposing a community-level influence analysis algorithm, called CIAA, to cope with real-world microblogging applications; and (3) extensively demonstrating the superiority of the proposed method. In future work, we plan to extend the proposed method to assess community influence in dynamic online social networks.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants U1433116 and 61702355, in part by the Fundamental Research Funds for the Central Universities under Grant NP2017208, and in part by the Funding of Jiangsu Innovation Program for Graduate Education under Grants KYLX15_0324 and KYLX15_0321.

Copyright

Copyright © 2017 Yufei Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
