Collective Behavior Analysis and Graph Mining in Social Networks
Statistical Analysis of Dispelling Rumors on Sina Weibo
Analyzing the process and results of dispelling rumors is a prerequisite for designing an effective anti-rumor strategy. Current research on this subject focuses on simulation experiments and lacks empirical study. Using the False Information Publicity Results of Sina Weibo as the data source, this article compares the typical features of rumor and anti-rumor accounts. Furthermore, taking COVID-19 as the target topic, the distributions of reported time, report frequency, platform penalty levels, and diffusion parameters of rumors related to COVID-19 are given, and some interesting results are obtained.
1. Introduction
With the rapid development of online social media such as Twitter, Facebook, YouTube, and Sina Weibo, a large amount of unreliable information is generated and propagated at an unprecedented rate. Compared with the truth, rumors have been shown to be more novel and to inspire fear, disgust, and surprise in replies. Statistical results on big data showed that false rumors diffused significantly farther, faster, deeper, and more broadly than the truth. Rumors not only spread powerfully but can also threaten network security and social stability. For instance, on April 23, 2013, fake news claiming that two explosions had happened in the White House and that Barack Obama was injured was widely diffused on Twitter, causing severe social panic and a loss of $136.5 billion in the stock market. The negative impact is even more dangerous and destructive during sudden public events (public health incidents, natural disasters, accident disasters, social security incidents, and economic crises). For example, as the coronavirus disease 2019 (COVID-19) spread across the globe, numerous rumors appeared on social media. Some of them mask healthy behaviors or provide incorrect suggestions, resulting in poor physical and mental health outcomes. In Iran, more than 700 people died after ingesting toxic methanol because of false rumors that it cures the coronavirus. In Nigeria, there were several cases of chloroquine overdose after rumors about the effectiveness of chloroquine for treating COVID-19 spread. Many other rumors caused psychological panic and panic buying of groceries.
Recognizing the dangers of rumors, Tedros Adhanom Ghebreyesus, Director-General of the WHO, said, “We’re not just fighting an epidemic; we’re fighting an infodemic,” at a gathering of foreign policy and security experts in Munich, Germany, in mid-February, referring to fake news that “spread faster and more easily than this virus.” In response, a team of WHO “mythbusters” and platforms including Facebook, Google, Pinterest, Tencent, Twitter, TikTok, and YouTube united to fight the spread of rumors. In early February, Facebook announced that it would remove false claims and conspiracy theories about the disease posted on its platforms. On March 18, Twitter updated its safety policy to prohibit tweets that “could place people at a higher risk of transmitting COVID-19.” Sina Weibo’s official account created a special home page for fighting rumors and encouraged users to report fake news.
Social media platforms adopt different strategies for dispelling rumors, including deleting posts and accounts, deducting scores, lowering credit ratings, tagging rumor labels, training professional whistleblowers, and so on. In the field of extreme information confrontation, it has been verified that four strategies should be balanced to achieve the best results: deleting accounts, disintegrating clusters, supporting anti-hate clusters, and setting competing hate narratives [11, 12]. However, in actual rumor confrontation, neither the effect of these strategies nor the features of rumor and anti-rumor accounts and the dispelling process have been analyzed. To discover more patterns from real data, we conduct an in-depth empirical analysis based on the False Information Publicity Results of Sina Weibo.
In this paper, Section 2 introduces related work. Section 3 reports the results of the empirical analysis, including the differing features of rumor spreaders and whistleblowers and the confrontation process of rumors related to COVID-19.
2. Related Work
It is always difficult to distinguish true information from false rumors, even when the data are well formatted and structured. Thus, source credibility is vital as external information in the task of rumor detection. As the COVID-19 pandemic continues to spread around the world, people are advised to get accurate and timely news from trusted healthcare professionals and government officials, such as the WHO, CDC, and CIDRAP. Furthermore, some institutions, such as the Coronavirus Misinformation Tracking Center of NewsGuard, provide over 157 reports on low-quality and questionable news outlets that spread rumors about COVID-19, reminding users to stay away from unreliable sources. Other organizations, such as Media Bias/Fact Check, provide users with a reputation search for news sites. These professional reputation scores are valuable when facing a large volume of news. However, not all websites have reputation scores, and most messages on social media are provided by anonymous users. How to rank the credibility of information sources automatically and effectively is a big challenge. In unsupervised and semisupervised rumor detection methods, different features of information sources are analyzed. Yang et al. analyzed account characteristics, including the user’s identity, personal description, gender, avatar type, name type, registration time, place, and numbers of followers, friends, and posts. The classification accuracy using account-based features alone is 72.6%. These features have been used widely in subsequent methods, such as hierarchical neural networks. Besides the classical features, Liu et al. considered credibility identification, diversity, and the relationship between profile location and event location. Han and Guo classified sources into three levels from low to high and analyzed the corresponding source descriptions, characteristics, and examples.
These features have been proved valid on some datasets. However, whether they remain effective across topics and platforms requires further statistical analysis. After distinguishing rumors from the truth, the next task is to dispel them. To solve this problem, many researchers have designed simulation models. Zhang et al. improved the ICSAR model (I: ignorance; C: information carrier; S: information spreader; A: information advocate; R: removal) to accurately analyze rumor propagation and refutation in metropolises. Zhang et al. designed a novel rumor and authoritative information propagation model in complex social networks that considers the superspreading phenomenon and analyzed the dynamic interaction between rumor and authoritative information. Bodaghi et al. proposed a novel model based on two hypotheses: first, that the number of followers influences the impact level of the rumor/anti-rumor messages received by those users, and second, that time affects the impact of rumor/anti-rumor posts. The evaluation of the model supported their hypotheses. Zhang et al. designed a novel two-stage rumor propagation and refutation model with a time effect for online social networks and obtained the corresponding mean-field equations in both homogeneous and heterogeneous networks. Askarizadeh et al. proposed an evolutionary game model to analyze the rumor process in social networks, considering the impact of people’s decisions on rumor propagation and control. Their analysis shows that propagating convincing anti-rumor messages and locating rumor control centers affect debunking. In addition to modeling, Takahashi and Igata investigated the difference in word distribution between rumor and anti-rumor tweets in two instances. Zhiyuan et al. found that a large number of anti-rumor accounts report messages about themselves and that many rumor spreaders are spammers.
Some of our former works have been applied to extracting reliable information from social networks. Xiong et al. [28, 29] made effective location suggestions despite a large amount of rumor and noise. A study by Xiong et al. helps to recognize real information in web data.
Different from previous works, this article focuses on empirical analysis and tries to discover useful patterns of dispelling rumors.
3. Empirical Analysis
3.1. Data Preprocessing
Sina Weibo is a popular social networking site in China with more than 500 million registered users. In this paper, the False Information Publicity Results of Sina Weibo is used as the data source. It contains 39,297 anti-rumor records dated from May 29, 2012, to May 8, 2020. The monthly data fluctuate frequently, while the yearly data peaked in 2014 and then remained stable in the following years, as shown in Figure 1.
For analyzing the text, three kinds of features are collected: users, Weibo content, and dispelling results, as illustrated in Table 1. ICTCLAS is then used for word segmentation, and each word is given a weighted score computed by TF-IDF. In the case study, rumors are filtered according to topic keywords such as COVID-19, coronavirus, and pandemic.
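Segmentation aside, the TF-IDF weighting itself is straightforward. The following is a minimal sketch (not the authors' code), using a hypothetical pre-tokenized toy corpus in place of ICTCLAS output:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the within-document term frequency; IDF is log(N / df),
    where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Hypothetical tokens standing in for segmented Weibo posts.
docs = [["coronavirus", "cure", "garlic"],
        ["coronavirus", "vaccine", "news"],
        ["weather", "news", "today"]]
w = tf_idf(docs)
# "garlic" occurs in only one document, so it is weighted higher
# than "coronavirus", which occurs in two.
```

Terms that appear in many posts (stopword-like words) thus receive low weights, which is why TF-IDF is a common default for keyword filtering of this kind.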
3.2. Characteristics of Rumor and Anti-Rumor Spreaders
In this dataset, there are 34,754 rumor spreaders and 15,866 anti-rumor accounts; most accounts appeared only once. After filtering, the 2,553 rumor spreaders and 2,653 anti-rumor accounts that appeared more than once are retained. The characteristics of the two types of accounts are compared, including basic features, high-frequency words in descriptions, geographic location, numbers of followees, fans, and posts on the Sina platform, and numbers of posts in the False Information Publicity Results, as shown in Tables 2 and 3 and Figures 2–4.
Table 2 shows that, compared with anti-rumor accounts, more rumor spreaders are deleted. This is because, on Sina Weibo, an account that spreads a rumor with serious consequences is deleted immediately. In Table 2, the proportion of verified anti-rumor accounts is slightly higher than that of verified rumor spreaders, which is reasonable. However, compared with anti-rumor accounts, rumor spreaders have more complete profiles, including brief introductions, tags, and educational information.
Table 3 lists the high-frequency words in the descriptions of the two kinds of accounts. The names of rumor spreaders are mainly cities, while those of anti-rumor accounts are related to the police. Besides that, the labels of both types of users are mainly about entertainment, which depends on the nature of the platform.
From the perspective of geographical distribution in Figure 2, rumor spreaders are more dispersed than anti-rumor accounts. Both types of accounts are concentrated in Beijing, Shanghai, and Guangdong, the most developed cities and provinces in China. In addition, the anti-rumor reporters are also distributed in three coastal provinces: Fujian, Zhejiang, and Jiangsu.
Figure 3 gives the distributions of followers, fans, and posts for rumor and anti-rumor accounts. The three subfigures share a common pattern: when the numbers of followers, fans, and posts are small, anti-rumor accounts outnumber rumor spreaders, but when the numbers rise, the situation in all three cases is reversed.
Figure 4 shows the distribution of posts by anti-rumor accounts and rumor spreaders. Overall, more spreaders than anti-rumor accounts post just once. In addition, a small number of anti-rumor accounts reported many times; some of them are professionals, for example, the account that reported about 10^4 times was found to be a professional whistleblower.
3.3. Rumors Related to COVID-19
3.3.1. Number of Reported Rumors
Rumors related to COVID-19 from January 2 to May 8, 2020, on Sina Weibo are obtained as the topic data source. In total, there are 526 rumors on the topic of COVID-19 posted by 505 accounts. The number of reported rumors fluctuates every day, as shown in Figure 5.
3.3.2. Report and Punishment
The reported-time and report-frequency distributions of rumors related to COVID-19 are shown in Table 4 and Figure 6, respectively. In Table 4, 81% of the rumors are reported within one day, and 10% are even marked within 10 minutes, which reflects that the rumor supervision of the Sina platform is relatively strict and that anti-rumor reporters respond quickly.
Besides, about 60% of the rumors are reported only once by whistleblowers, but some rumors are reported more than 100 times by different anti-rumor accounts, as shown in Figure 6. Further analysis found that the accounts reported many times have comparatively many fans. For example, the account whose rumor was reported more than 600 times has 7,119 followers.
The distribution of punishment levels in the False Information Publicity Results of Sina Weibo is shown in Figure 7. It illustrates that rumor spreaders with punishment level 1 (the lightest punishment) account for nearly 50%. From level 1 to level 5, the number of people decreases rapidly as the level increases. Moreover, there are few rumor spreaders at level 6 and level 7 (the heaviest punishment); the two together add up to only about 1.5%, suggesting that most rumors related to COVID-19 on Sina Weibo are not very harmful. As to the harmful rumors, we found that 11.95% of the rumors were deleted and 3.31% of the accounts could not be displayed on the Sina platform. Furthermore, the correlation coefficients between punishment level and the numbers of reports, reposts, comments, and likes are tested. All the correlation coefficients are around 0.2, which indicates no obvious positive relationship between them. We speculate that the punishment level is related to the potential harmfulness of the message.
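The correlation test above can be reproduced with a standard coefficient. A minimal sketch follows, assuming the Pearson coefficient (the article does not specify which coefficient was used) and hypothetical toy records in place of the real data:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical records: (punishment level, number of reposts).
levels  = [1, 1, 2, 3, 1, 5, 2, 4]
reposts = [3, 120, 15, 7, 40, 60, 2, 11]
r = pearson(levels, reposts)  # weak correlation on this toy data
```

A coefficient near zero, as the article reports (around 0.2), is consistent with punishment levels being set by judged harmfulness rather than by how widely a rumor actually spread.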
3.3.3. Rumor Propagation
In social networks, there are three common behaviors: reposting, commenting, and liking. The distributions of the three behaviors are shown in Figure 8. Besides, the rumors are ranked by number of comments; the features of the top-10 and last-10 rumors are compared in Tables 5 and 6, respectively, and an in-depth analysis of the comments on the top four rumors is given in Figures 9 and 10 and Table 7.
Among the three behaviors, giving a “like” is much easier than the other two: it only requires pressing a button and does not lead to further broadcasting. Reposting is easier than commenting but may carry responsibility for the results of information dissemination. Commenting is the most time-consuming action. Accordingly, Figure 8 shows that the most popular action is liking, followed by reposting. A few rumors related to COVID-19 even get more than 10^5 likes, and some rumors are reposted more than 10^4 times.
There are 10 features in Tables 5 and 6: the numbers of followers, fans, and posts and the credit level of the rumor spreaders; the average reposts, comments, and likes of the ten posts written before the rumor; the earliest time and frequency of being reported; and whether the rumor spreader had been reported previously. (1) As a whole, the accounts in the top-10 set have more fans than those in the last-10 set, while the numbers of followers and posts vary across accounts. (2) The credit level of accounts in both sets is at least medium, so it is unreliable to predict the authenticity of an account’s information using the credit level provided by the platform. (3) The accounts in the last-10 set cannot attract other users’ attention, and the average numbers of their reposts, comments, and likes are zero. In contrast, the messages posted by the top-10 accounts often receive much more feedback. (4) All the accounts in the top-10 set are reported three or more times, while all the accounts in the last-10 set are reported three or fewer times. (5) Among the last-10 rumors, some are reported immediately after appearing on the platform. For example, a rumor whose writer has more than 170 thousand fans was reported one hour after it was written. This illustrates that as long as supervisors pay close attention to accounts with many fans, the scope of the rumors they post can be controlled. (6) Moreover, in the top-10 set, three accounts had posted rumors before, which reminds us to pay more attention to accounts that have already posted rumors previously.
Figure 9 illustrates the numbers of comments on the four most popular rumors related to COVID-19. The comments on all four rumors reach their maximum the day after publishing. We checked the dates and found that, except for Rumor 2, which was written at 13:44, the other three rumors were written after 21:00. We hypothesize that the release time affects the number of comments on the first day, which suggests that the evening is a good time to refute rumors. Meanwhile, the numbers of comments on the four rumors decrease sharply on the third day; this is because the Sina platform took measures and marked the rumors, and once a message is confirmed as fake, people’s attention drops greatly. Figure 10 visualizes the comment networks of the four popular rumors. There is an obvious core network in each of Figures 10(a)–10(c), in which the center node is the source of the rumor. In Figure 10(d), there are two groups: the center node of one group is the rumor source, and the other center node is an account that argued frequently with others. In addition to the core networks, there are some nodes that drew a small number of replies; through text analysis, we found that most of these are accounts that expressed controversial views. Besides, the anti-rumor accounts appearing in the comment networks are represented as large red nodes. From Figures 10(a), 10(b), and 10(d), it can be seen that some anti-rumor users do not reply to the rumor source directly, nor do they cause extensive discussion. This may be an important reason why the spread of rumors was not controlled in time.
Several network parameters are shown in Table 7. From Table 7, the four popular rumors about COVID-19 did not spread deeply, because the diameters of the networks are at most five. Judging from the modularity index, the comment networks of Rumor 3 and Rumor 4 have reached a certain degree of modularity.
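The diameter values in Table 7 can be computed directly from the comment graphs. Below is a minimal pure-Python sketch (not the authors' code; a toolkit such as NetworkX or Gephi would typically be used, and modularity would additionally require community detection), applied to a hypothetical toy comment network:

```python
from collections import deque

def diameter(adj):
    """Diameter of a connected undirected graph: the maximum, over all
    nodes, of the BFS eccentricity (longest shortest path from that node)."""
    def ecc(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(v) for v in adj)

# Toy star-like comment network: node 0 is the rumor source,
# nodes 1-4 reply to it, and node 5 replies to node 1.
adj = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1]}
d = diameter(adj)  # longest shortest path, e.g., 5 -> 1 -> 0 -> 2, gives 3
```

A small diameter like this reflects the star-shaped core structure seen in Figure 10: most commenters reply to the source directly, so conversations rarely chain deeply.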
4. Conclusion
In this work, we performed a case study of dispelling rumors. We compared classical features between the two kinds of accounts and found that the percentage of deleted accounts, the high-frequency words of usernames, the geographical distributions, the numbers of followers, fans, and posts on Sina Weibo, and the numbers of posts in the False Information Publicity Results are quite different. The geographical distribution of rumor spreaders is relatively dispersed, their personal information is more complete, and many of them have large numbers of followers, fans, and posts. The geographic locations of anti-rumor accounts are concentrated, their usernames are usually related to the police, and some of them are highly professional, reporting rumors over 10^4 times. Furthermore, we analyzed the reported rumors about COVID-19. The results suggest that most rumors related to COVID-19 are not seriously harmful, 60% of the rumors are reported only once, and about 81% are reported within one day. As for dispelling rumors, the credit rating provided by Sina is unreliable for detecting rumor spreaders. In addition, the time after 9 PM is a prime time to debunk rumors: if a rumor is controlled well in the evening, the next day’s outbreak may be avoided. Besides, accounts that have written rumors previously and whose daily messages are popular should be closely observed. Meanwhile, anti-rumor accounts should reply to the rumor source directly.
Data Availability
Readers can access the data of the manuscript titled “Confrontation analysis between rumor spreaders and anti-rumor reporters” from the False Information Publicity Results of Sina Weibo. The URL of the data source is given in Reference , that is, https://service.account.weibo.com/?type=5&status=4&display=0&retcode=6102.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The work was partially supported by the National Natural Science Foundation of China under Grant no. 81901389 and the China Postdoctoral Science Foundation under Grant no. 2019M653400.
References
T. Chen, X. Li, H. Yin et al., “Call attention to rumors: deep attention based recurrent neural networks for early rumor detection,” in Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 40–52, Springer, Melbourne, Australia, June 2018.
N. F. Johnson, N. Velásquez, N. J. Restrepo et al., “The online competition between pro- and anti-vaccination views,” Nature, vol. 582, pp. 230–233, 2020.
N. F. Johnson, R. Leahy, N. J. Restrepo et al., “Hidden resilience and adaptive dynamics of the global online hate ecology,” Nature, vol. 573, pp. 261–265, 2019.
H. Han and X. Guo, “Construction on framework of rumor detection and warning system based on web mining technology,” in Proceedings of the 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), pp. 767–771, IEEE, Singapore, June 2018.
N. Zhang, H. Huang, M. Duarte, and J. Zhang, “Risk analysis for rumor propagation in metropolises based on improved 8-state ICSAR model and dynamic personal activity trajectories,” Physica A: Statistical Mechanics and Its Applications, vol. 451, pp. 403–419, 2016.
T. Takahashi and N. Igata, “Rumor detection on twitter,” in Proceedings of the 13th International Symposium on Advanced Intelligence Systems, pp. 452–457, IEEE, Kobe, Japan, November 2012.
L. Zhiyuan, Z. Yue, T. Cuncao, and S. Maosong, “Statistical semantic analysis of Chinese social media rumors,” Science of China: Information Science, vol. 45, no. 12, p. 1536, 2015.
X. Xiong, F. Xiong, J. Zhao et al., “Assessment of public attention, risk perception, dynamic discovery of favorite locations in spatio-temporal social networks,” Information Processing and Management, vol. 57, p. 102337, 2020.
H. P. Zhang, H. K. Yu, D. Y. Xiong, and Q. Liu, “HHMM-based Chinese lexical analyzer ICTCLAS,” in Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, vol. 17, pp. 184–187, Association for Computational Linguistics, Sapporo, Japan, June 2003.
T. Joachims, A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, Carnegie Mellon University, Department of Computer Science, Pittsburgh, PA, USA, 1996.