Abstract

Information manipulation mediated by social bots is reshaping the public opinion environment, and the role and behavior patterns of bots in news diffusion are worth exploring. Taking the diffusion on the overseas social platform Twitter of The New York Times' coverage of the Xinjiang issue as a case, we apply the two-step flow model and analyze bots' posting frequency, influence, and retweeting relationships. We find that, in terms of role, social bots behave differently across the two steps: whereas they post news indiscriminately in first-step diffusion, they are more inclined to post controversial information in second-step diffusion. In terms of diffusion patterns, although social bots are more engaged in first-step diffusion than in second-step diffusion and can trigger human users to retweet, they remain inferior to humans in terms of influence.

1. Introduction

Social bots are playing an increasingly important role in global political communication, especially as the development of social media platforms and intelligent technologies provides a hotbed for their political communication activities. On social media platforms such as Twitter and Facebook, social bots produce and disseminate information flows in various ways: interacting with users, spreading disinformation through computational propaganda, influencing the information environment perceived by users, and creating a false opinion climate and public opinion ecology around major international events as well as national democratic politics. In December 2019, Facebook deleted hundreds of accounts that had been generated by artificial intelligence and registered with falsified profiles by a single media corporation [1]. Notably, as social networks come to play a pivotal role in the circulation of professional journalism, users become a key variable in whether news reaches a wider public; accordingly, traditional news corporations have shifted from transparent editorial distribution to less transparent social distribution. This paper adopts a computational communication approach and focuses on The New York Times, a representative American media outlet, to investigate the diffusion on social media of its series of reports on Xinjiang issues. We use the two-step flow of communication model to explore the role played by social bots in the diffusion of such reports and to assess the extent of their influence on human users. Specifically, we explore the following questions: (1) To what extent do social bots participate in the distribution of professional media news? (2) What role do social bots play in the diffusion of such stories? (3) Can social bots become opinion leaders along the diffusion path of professional news stories?

2. Literature Review

2.1. Social Bots

Social bots are a kind of social media account generated by algorithms to undertake information distribution tasks; in essence, they are automated programs [2]. Social bots take on human identities with various personality attributes in social networks and interact virtually with people, creating a virtual AI persona. They use the functionality of social accounts to deliver news and information like a human, but can also perform malicious activities such as sending spam, posting harassment, and delivering hate speech. Such social bots can post messages quickly, mass-produce replicated messages, and ultimately distribute messages in the guise of human-like users [3]. Social bots are more active than regular human users [4], and their purpose is to learn from and imitate humans in order to manipulate public opinion on social media platforms. Although social bots are built on algorithms and operate automatically, they still differ from purely algorithmic technologies: their behavior is inseparable from the intentions of their manipulators, and bots performing different tasks often exhibit very heterogeneous behaviors. Identifying the behavioral patterns of social bots participating in discussions of social issues has therefore become a research focus. Rossi et al. [5] classified political bots into three types according to their complexity. The simplest bots are used only to increase an account's follower count or the number of tags/likes on a particular post, but these are easily detected and deleted by the platforms. Bots of intermediate complexity can dynamically adjust their content by tracking the behavior of target accounts. The most sophisticated bots are those in which humans control some of the bot's activities, such as content creation. This most sophisticated type is also known as a human-computer hybrid political bot, and its content and behavior make it difficult for humans to tell whether the account they are interacting with is a bot.

Social bots can communicate with users through predetermined content, algorithm-driven Big Data, appropriate expressions, and the constant imitation and adaptation of human discourse via machine learning. Against this background, Douglas Guilbeault [6] argued that social bots have both social and functional characteristics, and that their "interaction" with humans is their most telling trait, suggesting that humans and social bots may be able to collaborate ethically in solving Internet problems. In addition, many social bots are used for legitimate and productive information activities, such as automated news writing and other content creation and information services. Meanwhile, social bots use the data and algorithms of social platforms to generate personalized content, including news, advertisements, and political propaganda, and deliver it to users, so that users' recognition of and trust in social bots steadily increase [7]. For example, one of the main functions of political bots on Twitter is to create discussion topics and propaganda. A botnet can deploy a set of algorithms to manipulate major social media platforms, such as Facebook and Twitter, leading users to post content that supports or attacks political candidates and issues. Indeed, social bots account for about 9% to 15% of active Twitter accounts [8], and Stefan provided evidence that up to 66% of tweets containing hyperlinks are suspected of being posted by bot accounts [9].

At present, artificial intelligence and computer science distinguish between intelligent agents and automated software programs. Zannettou et al. cautioned that social bots tend to congregate around major events on social media; moreover, as news events change, bot accounts delete their previous tweets, change their display usernames, and then re-emerge with new faces [10].

The fact is that computational propaganda, in the form of botnets, fake news, and algorithmic manipulation, plays a key role in the globalized political system. Whether the political actors are governments, parties, civil society groups, or grassroots groups, they often use computational propaganda, consciously or unconsciously, to spread disinformation and misinformation when deploying political bots to achieve their goals. As Freelon and Wells noted, the circumstances in which such digital disinformation is created and circulated constitute a larger "public communication crisis" [11]. On their account, disinformation is intentional deception, as opposed to misinformation, which is the unintentional spread of false information. To generate flows of influence on political issues, highly engaged publics construct an order of digital disinformation that undermines democratic political decision-making.

2.2. Diffusion and Prediction of Social Media Information

Traditionally, information does not reach the public directly but flows first to the active members of groups and then to the general public, forming a process of diffusion from opinion leaders to the public [12]. Social media users may not receive information directly from its original source but instead rely on key nodes in the social network to access and diffuse it. Dodds et al. developed a tool called Hedonometer to capture and calculate the sentiment of tweets in real time [13]; Bollen et al. found that sentiment in tweets could help predict stock market trends [14]. Jeremy et al. [15] (researchers from Google) developed the use of search engine retrieval data to predict influenza outbreaks, a typical application of Big Data in the social sciences. Vosoughi et al. [16] analyzed the diffusion network of 126,000 stories retweeted by 3 million people on Twitter from 2006 to 2017 and found that fake news spreads farther, faster, deeper, and wider than real news; they further showed that fake news is more novel and arouses more fear, disgust, and surprise, and that humans, not social bots, are the main cause of its spread.

However, real-world network structures are constantly changing and growing richer. Shang et al. [17] provided evidence that existing link-prediction algorithms for information diffusion focus only on regular complex networks, relying too heavily on closed triangular structures or the so-called preferential attachment phenomenon. Better prediction models can improve the user experience of social networks by enhancing interaction with the network, and such link-prediction algorithms are particularly well suited as models of online information diffusion networks [18]. Machine and robotic prediction thus have broad application prospects in analyzing the future direction of social events: through Big Data and deep mining techniques, they can be used to monitor online public opinion in real time and perceive it quickly. For instance, the United Nations and other international organizations collect and analyze dynamic information about socio-political tensions and armed conflicts and build machine-learning early warning systems to prevent social conflict [19]. Kallus [20] used intelligent bots to successfully predict the time and place of the Arab Spring in Egypt, among other information.

Moreover, machine learning and robotic prediction techniques were used to analyze Twitter hashtags and topic trends, such as predicting the anti-Trump protests after the U.S. presidential election was settled in November 2016 and the protests launched in major U.S. airports in January 2017 against the refugee entry ordinance [21]. Social bots have also been used to predict the formation and development of hate speech online, such as racist remarks in the United States and the Philippines during the COVID-19 pandemic [22].

Compared with the aforementioned modes of information diffusion, the two-step flow of communication model is better suited to the topics of this study. Lazarsfeld proposed the two-step flow theory, which argues that messages emanating from the mass media reach the masses through intermediary forces, namely opinion leaders. Since its introduction, the theory has been further developed in studies of interpersonal influence and innovation diffusion [23–25]. In social media communication, the key nodes of social networks act as opinion leaders in the new media environment, influencing the flow of information from the source to the audience; in this flow, one-step, two-step, or multi-step patterns may all appear. The two-step flow model is especially relevant to major events, and, as our findings will show, social bots are more engaged but less influential in first-step communication, while humans retweet more; overall, social bots stimulate people to retweet and spread information. Focusing on Twitter, one of the most popular social platforms overseas, we examine how social bots bring news stories produced by professional media corporations such as The New York Times to social media and enable their further diffusion there.

In summary, in this study, the first and second steps of communication are defined as follows:

(1) First-step dissemination is the process of introducing professional media coverage into social media. On the Web pages of news organizations, whether by pasting hyperlinks into tweet text or by using one-click buttons to share off-site content to the Twitter platform, users can bring news from professional media organizations to social media and initiate its dissemination there.

(2) Second-step dissemination is the complete process of spreading the news on Twitter in the form of "retweets." On Twitter, users can post original tweets and retweet other users' tweets with one click. If Twitter user B retweets user A's original tweet with one click, and user C then sees the tweet from B and retweets it with one click, Twitter considers that A has been retweeted twice and B zero times; that is, there is a retweeting relationship between B and A and between C and A, but not between C and B. Thus, in Twitter's tweet management logic, the original tweet constitutes the first step of dissemination, and all subsequent retweets constitute the second step of dissemination of that original tweet.

This study examines how social bots introduce news reports produced by professional media organizations into social media and explores the role and characteristics of social bots in the diffusion of China-related issues. In terms of diffusion, the first step runs from the news organization's report to the original tweets, and the complete process of further spreading the news on Twitter in the form of "retweets" constitutes the second step, as defined above.

3. Data

(1) Obtaining the reports: From the official website of The New York Times (https://www.nytimes.com), "Xinjiang" was used as the keyword to scrape reports published from January 1, 2021, to June 1, 2021. A total of 169 reports were obtained, and their corresponding 121 URLs were saved.

(2) Obtaining first-step dissemination information: The original tweets containing the aforementioned 121 URLs, posted from January 1, 2021, to June 1, 2021, were scraped on Twitter by an independently designed Web-scraping program. According to Twitter's retrieval rules, tweets pointing to the URLs can be scraped regardless of whether users post the news as media cards or as hyperlinks. The 121 reports generated a total of 5353 original tweets in first-step dissemination. We also obtained these tweets' ID numbers, numbers of retweets, likes, and replies, and user information.

(3) Obtaining second-step dissemination information: We consider the process of spreading original tweets in the form of retweets to be second-step dissemination. We used the Twitter API to capture the retweets generated from the IDs of the original tweets obtained in step (2) and recorded the usernames and personal profiles of both parties in each retweeting relationship. The 5353 original tweets further generated 9247 retweets in second-step dissemination.

(4) Extraction: We extracted a total of 10,429 Twitter users involved in first-step and second-step dissemination and assessed the degree of automated manipulation of these users. Botometer, a social bot identification tool, estimates the likelihood that a particular account is a bot by generating a score between 0 and 5 for each user: the closer the score is to 5, the more likely the account is a bot, and the closer the score is to 0, the more likely the account is operated by a human.
In this study, accounts with Botometer scores of 3.5 and above were classified as social bots. If a user has set privacy permissions, has been deactivated or suspended, or has too few tweets to support the analysis, Botometer cannot evaluate a bot score; we mark such users as unknown and exclude them from the analysis. After screening, 10,218 valid users remained, of which 1405 were bots and 8813 were humans; these valid users produced 14,268 tweets, of which 2407 were posted by bots and 11,861 by humans.
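The thresholding step above can be sketched as follows. This is a minimal illustration of the classification rule, not the study's actual pipeline; the function name and input format are assumptions, and `None` stands in for accounts Botometer could not score.

```python
def classify_users(bot_scores, threshold=3.5):
    """Partition users by Botometer score (0-5 scale).

    bot_scores maps username -> score, where None means Botometer
    could not evaluate the account (private, suspended, or too few
    tweets); such users are marked unknown and excluded from analysis.
    """
    bots, humans, unknown = [], [], []
    for user, score in bot_scores.items():
        if score is None:
            unknown.append(user)
        elif score >= threshold:
            bots.append(user)
        else:
            humans.append(user)
    return bots, humans, unknown

# Illustrative scores for four hypothetical accounts.
bots, humans, unknown = classify_users(
    {"u1": 4.2, "u2": 0.3, "u3": None, "u4": 3.5}
)
```

With the 3.5 cutoff used in this study, u1 and u4 are classified as bots, u2 as human, and u3 is excluded as unknown.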

4. Results and Analysis

4.1. Overview of Primary- and Secondary-Level Communication

This study involved 121 articles, 5160 tweets generated in first-step communication, and 9108 retweets generated in second-step communication (counting valid users only). Table 1 shows the distribution of the numbers of tweets and retweets received by the 121 articles. On average, each article generated 42.6 original tweets and 75.2 retweets on Twitter. The number of tweets received is more evenly distributed across articles than the number of retweets, meaning that the disparity in popularity between articles widens further in second-step communication: among the 121 articles, some received only one retweet, while others received as many as 982.

We further examined the contribution of bots to first-step and second-step communication. As shown in Figure 1, after excluding users who could not be identified as bots or humans, bots accounted for 14.6% of the users in first-step communication, producing 1161 tweets (22.5%), and 13.0% of the users in second-step communication, producing 1246 retweets (13.6%). Bots were involved in both steps in roughly equal numbers but contributed a higher share of posts in the first step.

The cumulative distribution function is used to visualize the contribution of users with bot scores below a given value to the scale of first- and second-step communication. The curve for second-step communication lies mostly above that for first-step communication. Using bot score = 3.5 as the dividing line, it can also be seen that the proportion of content produced by human users is higher in second-step communication than in first-step communication.
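The CDF comparison above can be sketched as follows. This is an illustrative helper on synthetic scores, not the study's plotting code; the function names are our own.

```python
def ecdf(scores):
    """Return (sorted_scores, cumulative_fractions) for plotting an
    empirical CDF of user bot scores: the y-value at each x is the
    fraction of users whose score is at or below x."""
    xs = sorted(scores)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys

def fraction_below(scores, cutoff):
    """Share of users with bot score below a cutoff, e.g., the 3.5
    dividing line used in this study (human-side share)."""
    return sum(1 for s in scores if s < cutoff) / len(scores)

# Illustrative bot scores on the 0-5 Botometer scale.
xs, ys = ecdf([2.0, 0.5, 4.0])
human_share = fraction_below([1.0, 2.0, 4.0], 3.5)
```

Comparing `fraction_below(first_step_scores, 3.5)` with `fraction_below(second_step_scores, 3.5)` corresponds to reading off the two CDF curves at the 3.5 dividing line.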

Figure 2 presents the contribution of bots to the dissemination of the 121 stories at the first and second steps. The horizontal axis indicates the share of a story's first-step or second-step posts produced by bots, from 0 (no posts by bots) to 100% (all posts by bots), divided into equal intervals; the vertical axis indicates the proportion of the 121 stories whose bot share falls within each interval. Although the two histograms intersect to some extent, the retweet histogram as a whole lies to the left of the tweet histogram. Specifically, the proportion of bots is higher among users participating in first-step dissemination: for some articles, bots contribute 60% to 70% of the posts, and for most stories, about 10% to 50%. In contrast, the contribution of bots in second-step dissemination is relatively low; although up to 90% of the retweets of individual stories are produced by bots, for most stories the bot contribution lies below 35%. This suggests that bots are more active in first-step than in second-step dissemination.

The 121 news reports about Xinjiang were content coded by topic (Figure 3). About 74% of the articles deal with political topics, indicating that The New York Times treats the Xinjiang cotton issue not as a purely economic issue but as a political one. Economic and trade issues in Xinjiang follow closely at 40%, and public health and public policy coverage accounts for about 21%. Beyond these serious perspectives, The New York Times also extends its coverage to topics such as local culture, technology, sports, entertainment, and the environment. Although each of these topics occupies little space, they show that the discussion of the Xinjiang cotton issue is diffuse in character.

After recalculation, we divided the 121 original articles into 11 topic categories. Based on the aforementioned calculations, we further evaluated the relationship between bot posting behavior and topic in first- and second-step diffusion from the perspective of conditional probability and found that bot retweet volumes in second-step diffusion were larger than in first-step diffusion: post volumes in first-step diffusion reached about 3500 (Figure 4), with the top six topics being politics, business, economy, public policy, gender, and technology, while retweet volumes in second-step diffusion reached about 7000 (Figure 5), with the top six topics being politics, business, economy, gender, technology, and sports. The comparison shows that, in second-step communication, bots were more inclined to spread topics such as gender, sports, and culture, while more serious topics such as public policy and health were downplayed, indicating that bots prefer to post controversial and inflammatory content in second-step communication.

4.2. Identity and Intention of Bots in First- and Second-Step Communication

Among the users who participated in first-step communication, 1161 were bots, accounting for 22.5%; among the users who participated in second-step communication, 1246 were bots, accounting for 13.6%. We ranked the bots in descending order of tweeting frequency to obtain a ranking of bot activity. The most frequent account, @purelyfast, had posted 46 tweets containing the aforementioned stories; the second, @dubvNOW, had posted 35; and the third- to tenth-ranked bot accounts had each posted between 15 and 30. In terms of content, these active bots posted almost all of the 121 stories. An examination of the personal pages of the top-10 bots (in order, @purelyfast, @dubvNOW, @TheHelenWang, @sapiopath, @akbarth3great, @SinoSpectator, @toffanin, @jlitwinetz, @YaBoi37137010, and @FloLake) shows that these bots have no special interest in China or Hong Kong issues but instead carry a large volume of undifferentiated New York Times news reports on various topics. Among the users who participated in second-step communication, the top-5 bots by posting frequency were @ZTSecure, @agent_guzman, @getMazak, @Aahmattt, and @MihrigulAbdulw1, contributing 27, 27, 27, 13, and 13 retweets, respectively, for a total share of 8.68%.

4.3. Communication Effects of Bots in First- and Second-Step Communication

There is a significant difference between the interactive feedback triggered by bots and by human users in first-step communication. Since The New York Times has official accounts on Twitter (Asian news is often posted by @nytimes and @nytimesworld), to exclude the interference of these official accounts' role and influence, we counted human users both including and excluding the official accounts. Table 2 shows that the average number of posts per bot (2.38) is about 1.6 times that of human users (1.40), but bots lag far behind human users in the average numbers of likes, retweets, and comments received (by as much as a factor of 10). Under both ways of counting human users, removing the official accounts does lower human users' average likes, retweets, and comments, but not by enough to affect the comparison between human users and bots.

We screened the 100 most active users and the 100 most influential users in first-step communication, evaluated their bot score distributions, and compared both with the bot score distribution of all users. Active users are defined as those posting the most tweets, and influential users as those receiving the most retweets in total. Figure 6 shows that the bot score distribution of all users involved in first-step communication is clearly right-skewed: some users have bot scores above 0.5, but more have scores below 0.5, with the densest concentration below 0.2. The bot scores of the most influential users are also clearly right-skewed, concentrated in [0, 0.15), a much higher share than the full sample in this range, with [0.05, 0.1) the densest interval, followed by [0.1, 0.15) and [0, 0.05). In contrast, the most active users show no pronounced skew, and their share in the high bot score range is much higher than that of the full sample and of the high-influence users. It can be seen that a high proportion of the most active, high-frequency posters are bots, but bots are much less likely to be high-influence users who trigger large numbers of retweets.

Since bots exist to influence humans, the interactions between users of different identities are of interest, and we therefore introduce the identity dimension into the observation of retweeting relationships in our study of second-step communication. To represent the frequency distribution of retweets among users with different bot scores, we slice the bot score into intervals of size 0.05 and construct a two-dimensional matrix over the bot scores of tweeters and retweeters. Each time a retweet occurs, the cell corresponding to the bot scores of the two parties is incremented by one. Figure 7 shows the heat map of this matrix: the brighter a cell, the more frequent the interaction between users whose bot scores fall in that interval. As the figure shows, the top left is the brightest area, followed by the bottom left. This indicates that the most frequent retweets occur between human users; moreover, bots also frequently retweet human users' tweets. The right half of Figure 7 is darker overall, suggesting that both human and bot users are less likely to retweet from bots.
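The matrix construction described above can be sketched as follows. This is a toy illustration, not the study's code; the function name is our own, and we assume a bot score normalized to [0, 1], matching the 0.05 interval width used for the heat map.

```python
def retweet_matrix(pairs, bin_size=0.05, max_score=1.0):
    """Build the tweeter-by-retweeter bot-score frequency matrix.

    pairs: iterable of (tweeter_score, retweeter_score), each in
    [0, max_score]. Each retweet increments the cell indexed by the
    bot-score bins of the two parties.
    """
    n = int(round(max_score / bin_size))
    matrix = [[0] * n for _ in range(n)]
    for t, r in pairs:
        i = min(int(t / bin_size), n - 1)  # tweeter's bin (row)
        j = min(int(r / bin_size), n - 1)  # retweeter's bin (column)
        matrix[i][j] += 1
    return matrix

# Three illustrative retweets: two human-to-human, one bot tweeter.
m = retweet_matrix([(0.02, 0.03), (0.02, 0.04), (0.9, 0.12)])
```

Plotting `m` as a heat map (e.g., with `matplotlib.pyplot.imshow`) reproduces the kind of visualization shown in Figure 7, with bright low-score cells when human-to-human retweets dominate.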

As presented in Figure 8, based on the aforementioned calculations, we further evaluate from a conditional probability perspective what percentage of bot-initiated tweets are retweeted by human users. As the lighter line shows, when the tweeter is a bot, the bot scores of retweeters are fairly evenly distributed, apart from a peak at the end; by contrast, as the darker line shows, when the tweeter is human, the majority of retweets come from human users. In other words, although bots as first-step distributors can trigger second-step communication by human users, most humans still choose to retweet news posted by human users in the first step.
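The conditional probability read off Figure 8 can be sketched as follows. This is an illustrative computation on synthetic data, not the study's code; the function name, the 0-1 score scale, and the 0.5 bot cutoff are all our own assumptions for the sketch.

```python
from collections import Counter

def p_human_retweeter(pairs, bot_threshold=0.5):
    """Estimate P(retweeter is human | tweeter identity).

    pairs: iterable of (tweeter_score, retweeter_score) on an assumed
    0-1 bot-score scale; scores at or above bot_threshold are treated
    as bots. Returns a dict keyed by tweeter identity.
    """
    counts = Counter()
    for t, r in pairs:
        tweeter = "bot" if t >= bot_threshold else "human"
        retweeter = "bot" if r >= bot_threshold else "human"
        counts[(tweeter, retweeter)] += 1
    result = {}
    for tweeter in ("bot", "human"):
        total = counts[(tweeter, "bot")] + counts[(tweeter, "human")]
        if total:
            result[tweeter] = counts[(tweeter, "human")] / total
    return result

# Three illustrative retweets: a bot tweeter retweeted once by a human
# and once by a bot, and a human tweeter retweeted by a human.
probs = p_human_retweeter([(0.8, 0.1), (0.8, 0.9), (0.1, 0.2)])
```

A higher value of `probs["human"]` than `probs["bot"]` corresponds to the pattern in Figure 8: human users mainly retweet other humans, even though bots can also trigger some human retweets.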

5. Conclusion and Discussion

In this paper, we focus on the first- and second-step dissemination on Twitter of The New York Times' series of reports on China's Xinjiang-related issues and explore the roles and behavioral patterns of social bots in spreading these reports, based on an analysis of bots' posting frequency, influence, and retweeting relationships. We found that a large number of social bots diffused The New York Times' stories in both steps of communication, but that they played different roles in each. In the first step, bots produced as much as 22.5% of the content, acting mostly as undifferentiated carriers and diffusers of the original media coverage, with no clear interest preferences. In the second step, bots contributed 13.6% of the content but were more inclined to retweet controversial stories, acting as manipulators of public opinion on particular issues.

Overall, our observations reveal that social bots remain weak in spreading the influence of professional information and do not trigger large-scale retweeting. In the first step of communication, although social bots post heavily, they fall short in attracting retweets and likes and thus in influence. In the second step, we find that the most intensive retweeting still occurs among human users, followed by bot-to-human retweets. Social bots can indeed trigger retweets from human users, but the frequency of such retweets is low compared with those between human users.

The study of social bots in the second-step communication of news stories prompts further reflection in two directions. In terms of journalistic professionalism, recent years have seen numerous concerns about the manipulation of public opinion by social bots, yet the bot-driven diffusion of professional media coverage in the social media era has not yet attracted attention from academia and industry. The objectivity and impartiality of news from professional media organizations are being distorted by the second-step dissemination of social bots, and to a certain extent, the journalistic professionalism of news organizations combined with the algorithmic bias of social bots can produce an expression of media views that appears objective but is in fact biased. From the perspective of the political dissemination of misleading information, the power behind social bot manipulation should not be underestimated. In recent years, computational political manipulation has emerged as an act of prediction and association by states, state-like actors, politicians, or politically inclined Internet users employing intelligent algorithms. It is achieved through the mass dissemination of baiting messages on social media by social bots and virtual accounts that guide and manipulate the ideology of target users in the political sphere. Computational political manipulation is an act of technopoliticization, and its most visible form is manipulation carried out through social bots as intermediaries. In the digital age, its implementers can use big data and machine learning to operate with high-precision microtargeting, influencing public sentiment and shaping politically paranoid audience personalities that threaten the ideology of the organization in power.
This study approaches computational communication from the perspective of information diffusion by social bots, reveals the role of social bots in two-step communication, and prompts deeper reflection on journalistic ethics and professionalism in the digital age.

Data Availability

All data sets can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest regarding this paper.

Authors’ Contributions

All authors supervised, validated, visualized, wrote the original draft, and reviewed and edited the manuscript.

Acknowledgments

The present study was endorsed by the National Social Science Foundation of China (grant no. 21BXW066), National Social Science Foundation of China (grant no. 21&ZD193), Foundation of People's Public Security University of China (grant No. 2021JKF207), Fundamental Research Funds for the Central Universities (project no. 2020CDJSK07XK01), Chongqing Natural Science Foundation (project no. cstc2020jcyc-bshX0017), and Chinese Postdoctoral Science Foundation (project no. 2020M673115).