Abstract

This paper addresses the problem of detecting rumors on social media. Rumors do great harm to the information society, which makes rumor detection necessary. However, existing rumor detection methods generally learn only propagation-pattern features or text-content features from the whole propagation process, and therefore fall short in capturing multilevel, topic-relevant features of the text content in social media data. In this paper, we propose a novel graph convolution network model, named multilevel feature fusion-based graph convolution network (MFF-GCN), which employs multiple streams of GCNs to learn features of rumor data at different levels. We build a heterogeneous tweet graph for each single-level feature GCN to encode the topic relations among tweets based on their text content. Experiments on real-world Twitter data demonstrate that our proposed approach outperforms state-of-the-art methods, with higher precision and recall as well as a higher corresponding F1 score. In addition, the diversity of our experimental results shows the generalization ability of our model.

1. Introduction

With the widespread use of platforms such as Facebook and Twitter, people are increasingly accustomed to following the latest news developments on social media. An unavoidable drawback, however, is that fake and real information are mixed together and quickly disseminated through the publishing and sharing behavior of users. Manually checking every piece of information generated during an emergency is too time-consuming to be practical, so a rumor detection model that automatically evaluates the reliability of text is needed.

In this study, rumor detection aims to detect unverified and instrumentally relevant information statements in circulation [1]. To analyze the credibility of posts on platforms such as Twitter, early methods applied handcrafted features extracted from posts to train rumor detection models. In recent years, deep learning models have achieved impressive success in the field of natural language processing (NLP), and many scholars now use them to automatically extract features in rumor detection tasks. Ma et al. [2] constructed a tree-structured recursive neural network to capture information about the propagation structure. Asghar et al. [3] combined convolutional neural networks (CNN) with long short-term memory (LSTM) to extract features while preserving the context information. Since the graph convolution network (GCN) was introduced by Kipf and Welling [4], graph-structured data can be encoded directly by neural networks, and GCN-based methods have become a powerful tool for detecting rumors. Bian et al. [5] applied tree-structured data to represent rumor propagation threads and used it as input to graph convolution networks for rumor detection. Huang et al. [6] constructed a graph attention network model based on the text content and propagation threads of the source tweets for rumor detection.

In this paper, we propose a multilevel feature fusion-based graph convolution network (MFF-GCN) model for the rumor detection task. Different from other GCN-based rumor detection methods, our proposed model can extract topic relevance information among source tweets and responses in each stream of our MFF-GCN model. Specifically, we first mine topical documents (TDs) which contain topic information of source tweets, and hierarchical response documents (HRDs) to build tweet propagation structures. Then we utilize tweets, TDs, and HRDs to construct a heterogeneous tweet graph (HTG) according to the different levels of HRDs. Finally, we construct and train our MFF-GCN model based on the HTGs to make the final prediction decision for rumor detection. The flowchart of our method is shown in Figure 1. The main contributions are summarized as follows:

(i) We propose the MFF-GCN for rumor detection with topic relevance mining. To the best of our knowledge, this is the first attempt to apply GCN-based methods to rumor detection based on a multilevel feature learning framework.
(ii) We propose the HTG for each stream of our MFF-GCN model to capture the topic relation of text contents among tweets. In addition, we mine TDs for the HTG using global statistical information of the corpus to make better use of the topic similarity between documents.
(iii) The experimental results on real-world Twitter data show that our proposed approach outperforms the state-of-the-art methods.

2.1. Traditional Rumor Detection Methods

Traditional rumor detection methods usually use feature engineering to extract features from users' profiles [7, 8], text content [9, 10], and propagation patterns [11–13], and train classifiers based on these features to detect rumors. Tripathy et al. [14] mined a small amount of provenance information to train a logistic regression classifier for rumor detection. Kwon et al. [15] proposed the periodic external shocks (PES) model to capture the pattern of rumor propagation. Yang et al. [8] extracted 19 features from Sina Weibo data. Among these features, they found that the client program used for microblogging and the event location are particularly effective for detecting rumors. Wu et al. [16] built a graph-kernel-based hybrid SVM classifier that captured semantic information, sentiments, and high-order propagation patterns from Sina Weibo. Ruchansky et al. [17] constructed a model called CSI, which is composed of three modules: capture, score, and integrate. They trained the CSI model based on the text content, source users, and user responses. Xing et al. [18] proposed an algorithm based on information entropy theory, which can quantitatively analyze the influence of Weibo users. Other scholars use NLP and machine learning methods to extract abstracts [19] and emotions [20] from text content. However, these models require a lot of time for manual feature extraction.

2.2. Deep Learning Methods for Rumor Detection

Deep learning methods for rumor detection can automatically learn features from raw data. Ma et al. [21] treated the posts as variable-length time series and fed them into a recurrent neural network as input. Liu et al. [22] trained a time series classifier for the early detection of fake news. The classifier incorporated both recurrent and convolutional networks to capture the global and local variations of user characteristics along the propagation path. Przybyla [23] proposed two classifiers, a neural network and a model based on stylometric analysis. The experimental results show that these methods can capture the affective features of language elements. Yuan et al. [24] proposed a global–local attention network (GLAN) to learn local semantic and global structural information from Twitter data. Ajao et al. [25] proposed a framework called LSTM-CNN, based on a hybrid of CNNs and LSTM, to detect false news. Asghar et al. [3] proposed a BiLSTM-CNN model, where the BiLSTM layer is used to learn the long-term dependency in tweets. Song et al. [26] proposed an adversary-aware rumor detection model, which includes a weighted-edge transformer-graph network and a position-aware adversarial response generator. Recently, GCN-based methods for rumor detection have received considerable attention from researchers. Many scholars have improved the GCN model in various directions, such as adding an attention mechanism [27], improving the training speed of the model [28], and resolving the data incompleteness issue [29, 30]. Some scholars have also applied the GCN model to rumor detection. Huang et al. [6] proposed a three-module model based on a GCN to obtain user behavior information. Shakshi and Rajesh [31] used a GCN model to exploit the inherent network property for identifying possible rumor spreaders in a dataset. Chen et al. [32] introduced a GCN model to solve the role-aware rumor problem. In addition, Huang et al. [33] proposed a meta-path-based graph attention network framework that can capture the global semantic relations of text contents. Bian et al. [5] proposed a Bi-GCN model that can mine the characteristics of patterns of deep propagation and the structures of wide dispersion. Sun et al. [34] proposed a novel graph adversarial contrastive learning (GACL) method, which generalizes better in the face of noise and adversarial rumors. Different from the above GCN-based models, our method can learn multilevel features that encode the topic relationship among tweets.

3. Materials and Methods

In this section, we first introduce the TD and HRD. Then we build the HTG to construct each single-level feature GCN. Finally, the MFF-GCN is obtained for rumor detection.

3.1. Topical Document and Hierarchical Response Document Construction
3.1.1. Topical Document

In this section, we use the latent Dirichlet allocation (LDA) model to extract topic words from rumor and nonrumor posts, respectively, and construct TDs from these topic words. A TD is a document that consists of a single topic word and has the same label as the documents it is extracted from. Specifically, let $D_r$ and $D_n$ be the rumors and nonrumors in the source tweet set, respectively. The topic-word sets of rumor and nonrumor documents, $T_r$ and $T_n$, are obtained from $D_r$ and $D_n$ using LDA and are computed as below:

$$T_r = \mathrm{LDA}(D_r), \tag{1}$$

$$T_n = \mathrm{LDA}(D_n). \tag{2}$$

Then, each TD is defined as a single word in the topic-word set $T_r$ or $T_n$, with the same label as the document set from which that topic-word set is extracted.
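The TD construction can be illustrated with a minimal sketch. It assumes the topic words have already been produced by some LDA implementation and are passed in as plain lists; the function name `build_topical_documents`, the label strings, and the example words are hypothetical and not taken from the paper.

```python
def build_topical_documents(rumor_topic_words, nonrumor_topic_words):
    """Turn every topic word into a one-word document that carries its class label.

    Duplicate words within one class are dropped while keeping their order.
    """
    tds = [(word, "rumor") for word in dict.fromkeys(rumor_topic_words)]
    tds += [(word, "nonrumor") for word in dict.fromkeys(nonrumor_topic_words)]
    return tds


# Hypothetical topic words, standing in for the output of LDA on each class.
rumor_words = ["shooting", "hostage", "unconfirmed", "hostage"]
nonrumor_words = ["police", "statement", "confirmed"]

tds = build_topical_documents(rumor_words, nonrumor_words)
for text, label in tds:
    print(label, text)
```

Each resulting pair can then be treated like a labeled training document when the graph is built.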

3.1.2. Hierarchical Response Document

We hierarchize all responses to each original post according to the rumor propagation. Specifically, let $S = \{s_1, \ldots, s_n\}$ and $R = \{R_1, \ldots, R_n\}$ denote all source tweets and all responses, respectively, where $s_i$ represents the $i$-th original post and $R_i$ represents the set of all responses to $s_i$. Let $R_i^{j}$ be the set of all the $j$-th level responses to $s_i$, and let $H_i^{k}$ be the $k$-layer HRD for $s_i$. Then, the $i$-th HRD with a $k$-layer structure is computed as follows:

$$H_i^{k} = \bigcup_{j=1}^{k} R_i^{j}. \tag{3}$$

The structure of responses to a tweet is shown in Figure 2. The first-level responses are all direct responses to the original post, the second-level responses are all responses to the first-level responses, and so on.
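The following sketch shows one way to derive response levels from a reply tree and assemble an HRD. It assumes that each reply is given as a `(reply_id, parent_id, text)` triple and that an HRD with a given number of layers concatenates the text of all responses up to that level; both the data layout and this reading of the HRD are assumptions made for illustration.

```python
from collections import defaultdict, deque


def response_levels(source_id, replies):
    """Map each reply id to its depth below the source tweet (direct replies are level 1)."""
    children = defaultdict(list)
    for reply_id, parent_id, _ in replies:
        children[parent_id].append(reply_id)

    levels = {}
    queue = deque([(source_id, 0)])
    while queue:  # breadth-first walk down the reply tree
        node, depth = queue.popleft()
        for child in children[node]:
            levels[child] = depth + 1
            queue.append((child, depth + 1))
    return levels


def build_hrd(source_id, replies, k):
    """Concatenate the text of all responses whose level is at most k."""
    levels = response_levels(source_id, replies)
    texts = [text for rid, _, text in replies if rid in levels and levels[rid] <= k]
    return " ".join(texts)


# Hypothetical thread: r1 and r2 answer the source s1, r3 answers r1, r4 answers r3.
replies = [
    ("r1", "s1", "is this true"),
    ("r2", "s1", "source please"),
    ("r3", "r1", "police confirmed it"),
    ("r4", "r3", "thanks"),
]

levels = response_levels("s1", replies)
hrd_1 = build_hrd("s1", replies, 1)
hrd_2 = build_hrd("s1", replies, 2)
print(hrd_1)
print(hrd_2)
```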

3.2. Heterogeneous Tweet Graph

In this section, we build a large HTG based on the whole rumor dataset. The HTG contains the contents of source tweets and responses, together with the relations among them that arise from the propagation of source tweets. The structure of the HTG is shown in Figure 3. Specifically, let $G = (V, E)$ be the HTG, where $V$ represents nodes and $E$ represents edges. $V$ consists of source tweets, TDs, words of the whole corpus, and HRDs. $E$ contains four types of edges: (1) the word–word edges, which represent the relationship between words; (2) the word–document edges, which represent the relationship between words and documents (source tweet nodes, TD nodes, and HRD nodes); (3) the tweet–HRD edges, which represent the relationship between source tweets and HRDs; and (4) the self-connected edge for each node. We calculate the word–word edge weights based on PMI [35]. The word–document edge weights are obtained by TF-IDF. Both the tweet–HRD edge weights and the self-connected edge weights are equal to 1.
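As an illustration of the two computed edge types, the sketch below derives PMI weights for word–word edges from sliding windows and TF-IDF weights for word–document edges. It follows the sliding-window construction commonly used in text GCNs; keeping only positive PMI values and using `log(N / df)` as the IDF variant are assumptions, and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def sliding_windows(tokens, size):
    """Return all windows of the given size, or the whole document if it is shorter."""
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


def pmi_edges(docs, window_size=20):
    """Word-word edge weights: PMI over co-occurrence in sliding windows (positive values only)."""
    windows = [w for doc in docs if doc for w in sliding_windows(doc, window_size)]
    word_count, pair_count = Counter(), Counter()
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    n = len(windows)
    edges = {}
    for (a, b), c in pair_count.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ) with window-based probabilities.
        pmi = math.log(c * n / (word_count[a] * word_count[b]))
        if pmi > 0:
            edges[(a, b)] = pmi
    return edges


def tfidf_edges(docs):
    """Word-document edge weights: term frequency times inverse document frequency."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    edges = {}
    for i, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            edges[(word, i)] = (count / len(doc)) * math.log(n / df[word])
    return edges


# Hypothetical tokenized documents; a window of 2 keeps the example small.
docs = [
    ["police", "confirm", "attack"],
    ["police", "deny", "rumor"],
    ["rumor", "attack", "fake"],
]
word_word = pmi_edges(docs, window_size=2)
word_doc = tfidf_edges(docs)
print(word_word)
print(word_doc[("police", 0)])
```

Word pairs that co-occur no more often than chance receive no edge here, which keeps the graph sparse.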

3.3. Single-Level Feature Graph Convolutional Network

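Using the constructed HTG, we employ a GCN model [4] for rumor detection. The architecture of the GCN has two layers. The first layer output of the GCN model, denoted by $H^{(1)}$, is expressed as follows:

$$H^{(1)} = \mathrm{ReLU}\left(\hat{A} X W_0\right). \tag{4}$$

In Equation (4), $X$ represents the input of the GCN model and $W_0$ represents the first layer parameter of the GCN model. $\hat{A}$ is obtained by symmetrically normalizing the adjacency matrix $A$ of the HTG, that is, $\hat{A} = D^{-1/2} A D^{-1/2}$, where $D$ is the degree matrix of $A$. Note that we initialize the words, source tweets, HRDs, and TDs with one-hot encoding; hence, $X$ is an identity matrix. Then, the second layer output of the GCN model, denoted by $Z$, is expressed as follows:

$$Z = \mathrm{softmax}\left(\hat{A} H^{(1)} W_1\right). \tag{5}$$

In Equation (5), $W_1$ represents the second layer parameter of the GCN model. The ReLU and softmax functions follow the standard GCN formulation [4]. The single-level feature GCN is demonstrated in the green box of Figure 4.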
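A forward pass through a two-layer GCN can be sketched in plain Python as follows. The sketch assumes the standard formulation of [4], with ReLU after the first layer and a row-wise softmax after the second, and an adjacency matrix that already contains the self-connected edges; the tiny graph and the weight values are hypothetical and untrained.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; adj is assumed to include self-loops."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(adj, x, w0, w1):
    """Two-layer GCN: softmax(A_hat * ReLU(A_hat * X * W0) * W1)."""
    a_hat = normalize_adjacency(adj)
    h1 = relu(matmul(a_hat, matmul(x, w0)))
    return softmax_rows(matmul(a_hat, matmul(h1, w1)))


# Hypothetical 3-node graph with self-loops, one-hot inputs, and untrained weights.
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w0 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
w1 = [[1.0, -1.0], [0.5, 0.5]]

probs = gcn_forward(adj, x, w0, w1)
for row in probs:
    print(row)
```

Because the input is an identity matrix, the first-layer weights act as a learned embedding table with one row per node.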

3.4. Multilevel Feature Fusion-Based GCN

The MFF-GCN fuses multiple single-level feature GCNs to obtain the predicted results. Since the TDs are a part of the training set, we fuse only the predicted values of source tweets and TDs. Specifically, $\mathcal{G} = \{G_1, \ldots, G_K\}$ is a series of HTGs, in which $G_k$ is the $k$-th HTG in $\mathcal{G}$ and is obtained based on the $k$-layer HRDs. Then, $f_k$ ($k = 1, \ldots, K$) is the $k$-th GCN in the MFF-GCN according to $G_k$, and $Z_k$ is the predicted values of source tweets and TDs based on $f_k$. Our MFF-GCN model can be obtained by

$$Z = \sum_{k=1}^{K} \lambda_k Z_k. \tag{6}$$

In Equation (6), $\lambda_k$ represents the weight coefficient of $Z_k$, and $Z$ represents the fused predicted values of source tweets and TDs. Our MFF-GCN model is demonstrated in Figure 4.
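The fusion step can be sketched as a weighted sum of the class probabilities predicted by each stream. The weights below are the three-stream values reported in the Settings section; the per-stream probabilities and the function names are hypothetical.

```python
def fuse_predictions(stream_probs, weights):
    """Weighted sum of per-stream predictions.

    stream_probs[k][i][c] is the probability that stream k assigns class c to document i.
    """
    if len(stream_probs) != len(weights):
        raise ValueError("one weight per stream is required")
    n_docs, n_classes = len(stream_probs[0]), len(stream_probs[0][0])
    fused = [[0.0] * n_classes for _ in range(n_docs)]
    for probs, weight in zip(stream_probs, weights):
        for i, row in enumerate(probs):
            for c, p in enumerate(row):
                fused[i][c] += weight * p
    return fused


def predict_labels(fused):
    """Pick the class index with the largest fused score for each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Hypothetical outputs of three single-level GCNs for two documents and two classes.
stream_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.3, 0.7], [0.2, 0.8]],
    [[0.4, 0.6], [0.9, 0.1]],
]
weights = [0.126, 0.406, 0.468]

fused = fuse_predictions(stream_probs, weights)
labels = predict_labels(fused)
print(fused, labels)
```

Since the weights sum to 1 and each stream outputs a probability distribution, every fused row is again a distribution over the classes.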

4. Experiment

In this section, we evaluate MFF-GCN with TD mining on a challenging rumor detection dataset and present the experimental results from several perspectives.

4.1. Datasets

The Pheme dataset was collected by Zubiaga et al. [36]. They identified five newsworthy events that aroused great interest in the media and were full of rumors. They sampled the tweets that were widely forwarded and collected all the tweets that responded to them. Subsequently, journalists read through the timeline to mark whether each tweet was a rumor, ensuring that the identification of rumors met the established criteria [37]. Finally, 5,802 tweets were sampled, of which 1,972 were considered rumors and 3,830 were annotated as nonrumors. These annotations are distributed differently across the five events, as shown in Table 1. In this work, we clean the text content of the source tweets and their responses. The statistical information of the cleaned dataset is shown in Table 2.

4.2. Baselines

The baseline methods are as follows:
(i) SVM-TS: a time series model that uses SVM classifiers to capture social context information over time, proposed by Ma et al. [38].
(ii) BURvNN: a bottom-up tree-structured neural network model proposed by Ma et al. [2] for rumor representation learning and classification.
(iii) TDRvNN: a top–down tree-structured neural network model proposed by Ma et al. [2] for rumor representation learning and classification.
(iv) GAN-GRU: a text-based GAN-style framework proposed by Ma et al. [39].
(v) Bi-GCN: a bidirectional GCN model proposed by Bian et al. [5], which operates on both top–down and bottom-up propagation of rumors.
(vi) AARD: an adversary-aware rumor detection framework proposed by Song et al. [26], which includes a weighted-edge transformer-graph network and a position-aware adversarial response generator.
(vii) AARD-PARG: the detector of the AARD model without adversarial learning.
(viii) GACL: a graph adversarial contrastive learning method with an adversarial feature transformation module, proposed by Sun et al. [34].

4.3. Settings

In the experimental settings, we build a three-level feature fusion-based GCN for rumor detection according to three levels of HRDs. The weights of the first, second, and third HTGs are set to 0.126, 0.406, and 0.468 for MFF-GCN and to 0.233, 0.386, and 0.381 for the pretrained MFF-GCN with bidirectional encoder representations from transformers (BERT), respectively. These values are obtained by the grid-search method. In each HTG, we use the experimental settings of Yao et al. [40]. We set the window size to 20 and the embedding size to 200. In the training process, we set the learning rate to 0.02, the dropout rate to 0.5, and the loss weight to 0. During data preprocessing, we split the Pheme dataset into five folds (80% for training and 20% for testing), as Song et al. [26] did, and report the average results. In particular, we combine the remaining 80% of the original training set and the TDs to train the model.
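A grid search over fusion weights can be sketched as follows. The sketch assumes that candidate weights are non-negative, sum to 1, and are scored by accuracy on held-out predictions; the step size, the selection criterion, and the toy data are assumptions, since the exact grid used in the experiments is not stated.

```python
def weight_grid(n_streams, step):
    """Yield every weight tuple on a regular grid whose entries sum to 1."""
    units = round(1 / step)

    def compositions(remaining, slots):
        if slots == 1:
            yield (remaining,)
            return
        for u in range(remaining + 1):
            for rest in compositions(remaining - u, slots - 1):
                yield (u,) + rest

    for combo in compositions(units, n_streams):
        yield tuple(u / units for u in combo)


def fused_accuracy(stream_probs, weights, labels):
    """Accuracy of the weighted-sum prediction against the true class indices."""
    correct = 0
    for i, label in enumerate(labels):
        n_classes = len(stream_probs[0][i])
        scores = [sum(w * probs[i][c] for probs, w in zip(stream_probs, weights))
                  for c in range(n_classes)]
        correct += int(max(range(n_classes), key=scores.__getitem__) == label)
    return correct / len(labels)


def grid_search(stream_probs, labels, step=0.1):
    """Return the first weight tuple that reaches the best accuracy, and that accuracy."""
    best_weights, best_acc = None, -1.0
    for weights in weight_grid(len(stream_probs), step):
        acc = fused_accuracy(stream_probs, weights, labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Hypothetical held-out predictions from three streams for three documents.
labels = [0, 1, 1]
stream_probs = [
    [[0.2, 0.8], [0.9, 0.1], [0.6, 0.4]],
    [[0.6, 0.4], [0.4, 0.6], [0.55, 0.45]],
    [[0.7, 0.3], [0.3, 0.7], [0.2, 0.8]],
]
best_weights, best_acc = grid_search(stream_probs, labels, step=0.1)
print(best_weights, best_acc)
```

A finer step gives weights with more decimal places at the cost of more candidate combinations.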

4.4. Evaluation Metrics

For the sake of fairness, we utilized the experimental results by Song et al. [26] and Sun et al. [34], and used the same metrics to evaluate our model, including accuracy, precision, recall, and F1 score of the two classes.
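For reference, the per-class metrics can be computed as in the following sketch, which treats each class in turn as the positive class. The label strings and the example predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 score when `positive` is treated as the target class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions: "R" is rumor, "N" is nonrumor.
y_true = ["R", "R", "R", "N", "N", "N", "N", "N"]
y_pred = ["R", "R", "N", "N", "N", "N", "R", "N"]

acc = accuracy(y_true, y_pred)
rumor_scores = class_metrics(y_true, y_pred, "R")
nonrumor_scores = class_metrics(y_true, y_pred, "N")
print(acc, rumor_scores, nonrumor_scores)
```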

4.5. Performance

Table 3 shows the performance of our proposed method and other rumor detection methods on the Pheme dataset. Compared with the other methods in Table 3, our MFF-GCN model generally performs better, and the pretrained MFF-GCN with BERT is even more effective. Figure 5 shows the experimental results for the five events. There is a significant difference in accuracy between rumor and nonrumor results for Charlie Hebdo and Ferguson, which, in combination with Table 1, can be attributed to the class imbalance problem. Moreover, the high proportion of these two events in the Pheme dataset makes the average accuracy of nonrumor detection better than that of rumor detection. The experimental results also show that deep learning models perform much better than traditional models, which demonstrates the advantages of deep learning for detecting rumors. Specifically, our model is significantly more accurate than sequence deep learning models (TDRvNN and BURvNN), which indicates that the GCN-based model can effectively capture the semantic relationships among documents based on the graph models. In addition, our MFF-GCN model generally outperforms GAN-GRU and the other GCN-based models (Bi-GCN and GACL). This is largely due to two reasons: (1) each stream of our MFF-GCN model can encode topic information among source tweets and HRDs to capture the topic relevance of social media data; (2) based on the multistream learning framework, our MFF-GCN model can fuse multilevel features to learn tweet propagation structures for rumor detection. From Table 3, we can also see that the performance of our MFF-GCN model improves as the number of HTGs increases. Since there are few responses in the fourth level of HRDs in the Pheme dataset, the maximum number of levels of HRDs is set to 3 in this experiment.

5. Conclusion and Future Work

In this work, we propose a novel MFF-GCN model for rumor detection with topic relevance mining. The proposed method can capture the multilevel semantic relations of text contents among tweets. In addition, the TDs, which are based on global statistical information of the corpus, enable our method to make fuller use of the topic similarity between documents. The experimental results show that our model achieves state-of-the-art results. Future work includes applying our proposed method to multimodal rumor detection tasks. It would be interesting to investigate how to construct a multilevel feature fusion model for this task.

Data Availability

The Pheme data used to support the findings of this study have been deposited in https://figshare.com/articles/dataset/PHEME_dataset_of_rumours_and_non-rumours/4010619.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Hebei Province (no. F2019207118), the Foundation of Hebei Educational Department (nos. ZD2021319 and ZD2021043), and the Hebei University of Economics and Business Foundation (nos. 2019PY01 and 2020YB13).