Abstract

This paper explores the coevolution of emotional contagion and behavior for microblog sentiment analysis. Accordingly, a deep learning architecture (denoted as MSA-UITC) is proposed for the target microblog. Firstly, the coevolution of emotional contagion and behavior is described by the tie strength between microblogs; that is, as emotional contagion spreads, user behavior such as emotional expression is affected. Then, based on user interaction and the correlation with the target microblog, the Hawkes process is adopted to quantify the tie strength between microblogs so as to build the corresponding weighted network. Secondly, in the weighted network, the Deepwalk algorithm is used to build the sequence representation of microblogs that are similar to the target microblog. Next, a CNN-BiLSTM-Attention network (a convolutional neural network and a bidirectional long short-term memory network with a multihead attention mechanism) is designed to analyze the sentiment of the target and similar microblogs. Finally, the experimental results on two real Twitter datasets demonstrate that the proposed MSA-UITC outperforms the existing state-of-the-art methods.

1. Introduction

Information diffusion is a widely studied research topic, which involves the study of epidemic transmission in biology [1], computer virus propagation over complex networks [2, 3], and other topics [4, 5]. Social networks are one of the important carriers of information spreading [6, 7]; users often express their opinions and emotions on social media and even imitate the expressions, sounds, and gestures of others, which is called emotional contagion [8]. Under this contagion, emotions can spread from one person to another [9]. Therefore, emotions have become an important driving factor for information spreading on social networks. Furthermore, emotional contagion on social networks is rarely independent of the propagation of behavior or information; they coevolve with strong interactions [10]. With the explosive growth of information on social media, microblogging has become a source of public opinions and emotions on various public topics. Consequently, it is necessary to analyze the sentiment of microblogging data. For example, microblog sentiment analysis can help enterprises accurately obtain customer feedback on products [11, 12], thereby improving product quality according to that feedback and developing more efficient product promotion plans. Moreover, the government can respond quickly to public events through public opinion supervision and provide emotional guidance for netizens. In addition, microblog sentiment analysis plays an important role in many other fields [13, 14].

The most commonly studied methods of microblog sentiment analysis are lexicon-based methods, machine learning-based methods, and deep learning-based methods. The lexicon-based methods use the weight algorithm to analyze sentiment [15, 16] and are relatively simple in classifying sentiment polarity, but their performance is limited by the construction of emotional lexicon and the quality of judgment rules. For this issue, many researchers have applied machine learning for microblog sentiment analysis [17, 18]. Unfortunately, the performance of machine learning-based methods depends on the quality of the annotated datasets. For automatic feature extraction, deep learning-based methods have been widely developed [19, 20]. However, microblog texts are relatively short with irregular grammar and rich data noise, which aggravates the validity problem of useful data. Therefore, these methods that only rely on independent microblogging data have their own disadvantages.

In fact, the work [21] claimed that there is a coevolutionary relationship between emotional contagion and user behavior. In other words, as emotions spread through contagion, user behavior such as emotional expression is affected. When a user browses information, the user's emotions are easily affected, so the content posted by the user will reflect that influence. Leskovec et al. [22] pointed out that users tend to make friends with similar people and will constantly adjust their behaviors to keep pace with their friends [23, 24]. Based on these theories, Miller et al. [25] found that the sentiment of microblog messages is influenced by the connected messages and spreads in the network formed by the following behavior between users. Besides, the results in [26] indicated that, under emotional contagion, happy behaviors spread dynamically on social media. The above research results show that the propagation of behavior and information on social media is interdependent [27]: the closer the tie between users, the stronger the emotional contagion and the more similar the sentiments of their posts. Therefore, the coevolutionary phenomenon on social media can be used to remedy the shortcomings of the abovementioned methods that only rely on independent microblogging data.

Some researchers have considered the coevolutionary phenomenon to analyze the sentiment of microblogging data. Hu et al. [28] used sentiment consistency (i.e., the messages posted by the same user tend to have the same sentiment polarity) and emotional contagion to construct the tie for analyzing the sentiment of noisy and short tweets. On this basis, other studies have considered more influential factors to describe the tie strength between microblogs. For instance, the work [29] considered the similarity of users’ personal information and the frequency of interaction, and the work [30] added the similarity of texts. These methods extract influencing factors that involved all topics when calculating the tie strength, but fail to consider that the tie strength between users will have significant differences under different topics.

For this problem, Zou et al. [31] considered the topic context as an influential factor of tie strength and Liu et al. [32] divided the datasets by topic category. Furthermore, the authors [33] proposed the subjective and objective factors that affect users’ emotional changes to simulate the emotional contagion between users in the rumor propagation process on microblogging. The above research studies comprehensively consider the factors that influence the tie strength on social media, thereby obtaining the sentiment correlation between microblogs. However, these studies all empirically set the weight of each influencing factor when calculating the tie strength, which has a certain impact on the accuracy of the tie strength. In addition, most of the above methods adopt the least square method to classify sentiment polarity, which is not suitable for the high-dimensional and unlabeled data. With the successful application of deep learning in the field of natural language processing, in 2019, Zhao et al. [34] proposed to combine deep learning and the tie strength between microblogs to analyze microblog sentiment, and the results show that its performance is better than that of the least square method, but this method simply applies the traditional neural network architecture. Up to now, to our best knowledge, there is no other related follow-up work.

Inspired by the pioneering work [34], this paper proposes a deep learning architecture (denoted as MSA-UITC) to explore coevolution of emotional contagion and behavior for microblog sentiment analysis. Specifically, the coevolution of emotional contagion and behavior is described by the tie strength between microblogs, which is calculated by the Hawkes process. Besides, the Deepwalk algorithm is used to find the similar microblogs of the target microblog. Moreover, a CNN-BiLSTM-Attention network (the convolutional neural network and bidirectional long short-term memory network with a multihead attention mechanism) is designed to extract semantic features of microblog texts. Finally, the experimental results on two real datasets show that the proposed MSA-UITC can improve the accuracy of sentiment analysis.

The rest of this paper is organized as follows. Section 2 describes the proposed MSA-UITC in detail. The experimental results are described in Section 3. Finally, the conclusions are provided in Section 4.

2. The Proposed MSA-UITC

2.1. A Framework Overview

Figure 1 illustrates the overview of the proposed MSA-UITC for microblog sentiment analysis. It mainly consists of three parts: constructing the tie strength-based weighted network, building the sequence representation of similar microblogs, and CNN-BiLSTM-Attention network. Firstly, apply the Hawkes process to calculate the strength of tie based on user interaction and the correlation to the target microblog and construct the weighted network based on the tie strength. Secondly, utilize the Deepwalk algorithm to get the sequence representation of similar microblogs. Finally, use CNN-BiLSTM-Attention network to extract the joint features of target and similar microblogs for sentiment prediction. Now, let us elaborate on these processes.

2.2. Constructing the Tie Strength-Based Weighted Network

On social media, there is an interactive behavior between users, namely, the following relationship, which is similar to the friend relationship in real life. According to the emotional contagion theory, users are more susceptible to the sentiment of their friends (i.e., the two microblog texts posted by two users with the following relationship have similar sentiment [28]). Additionally, on the same topic, the closer the correlation between microblogs, the stronger the tie strength. Therefore, two factors that influence the tie strength between microblogs are considered. One is the following relationship between users, and the other is the correlation between microblogs.

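To describe the correlation between microblogs, the Term Frequency-Inverse Document Frequency (TF-IDF) method [35] is adopted. Let the vector $d_i = \{(t_{i1}, w_{i1}), (t_{i2}, w_{i2}), \ldots, (t_{in}, w_{in})\}$ denote the microblog text posted by user $u_i$, where $t_{ik}$ represents the $k$th word in this microblog text and $w_{ik}$ denotes the weight of $t_{ik}$. Then, $w_{ik}$ can be calculated as follows:

$$\mathrm{tf}_{ik} = \frac{n_{ik}}{\sum_{l} n_{il}}, \tag{1}$$

$$\mathrm{idf}_{ik} = \log \frac{N}{N_{ik} + \varepsilon}, \tag{2}$$

$$w_{ik} = \mathrm{tf}_{ik} \cdot \mathrm{idf}_{ik}, \tag{3}$$

where $n_{ik}$ represents the frequency of word $t_{ik}$ in the microblog text posted by user $u_i$, $\sum_{l} n_{il}$ represents the total number of words in the microblog text posted by user $u_i$, $N$ is the total number of microblog texts in the datasets, and $N_{ik}$ denotes the number of microblog texts containing word $t_{ik}$ in the datasets.

To keep the denominator from being 0, $\varepsilon$ is set equal to 1. According to equations (1)–(3), the vectors $d_i$ and $d_j$ of the microblog texts that are posted by users $u_i$ and $u_j$ can be obtained. The correlation between $d_i$ and $d_j$ can be calculated by cosine similarity:

$$c_{ij} = \cos(d_i, d_j) = \frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}, \tag{4}$$

where $\cdot$ denotes the vector multiplication and $\lVert \cdot \rVert$ means the modulus.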
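The TF-IDF weighting and cosine similarity described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the whitespace tokenization, the toy tweets, and the exact placement of the smoothing constant in the IDF denominator are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs, smoothing=1.0):
    """Return one sparse {word: weight} vector per tokenized document.

    tf  = count of the word in the document / number of words in the document
    idf = log(N / (smoothing + number of documents containing the word))
    Where exactly the smoothing term enters is an assumption of this sketch.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            word: (count / total) * math.log(n_docs / (smoothing + doc_freq[word]))
            for word, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example tweets, tokenized by whitespace.
docs = [
    "obama won the debate tonight".split(),
    "obama looked strong in the debate".split(),
    "health care reform passed".split(),
    "mccain spoke about taxes".split(),
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # tweets that share words
print(cosine_similarity(vecs[0], vecs[2]))  # tweets with no words in common
```

Sparse dictionaries are used because microblog texts are short, so most vocabulary entries would be zero in a dense vector.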

Next, the tie strength is calculated by the Hawkes process, which is usually used to predict the impact on the current event based on the correlation between events. Accordingly, the Hawkes process has been widely used in video popularity prediction [36] and disease prediction [37]. According to the definition of the Hawkes process in [38], the tie strength can be expressed as follows:

$$\lambda_{ij}(t) = \mu + \sum_{s < t} \left[ \alpha_1 f_{ij}(s)\, e^{-\beta_1 (t - s)} + \alpha_2 c_{ij}(s)\, e^{-\beta_2 (t - s)} \right], \tag{5}$$

where $f_{ij}(s)$ and $c_{ij}(s)$, respectively, represent the following relationship between users $u_i$ and $u_j$ and the correlation strength between microblog texts in time period $s$, $\mu$ is a constant base strength, $\alpha_k$ denotes the weight of the $k$th influencing factor, and $\beta_k$ is the time adjustment factor of the $k$th influencing factor. Maximum likelihood estimation is used to calculate the parameters of the Hawkes process. Finally, the tie strength between users $u_i$ and $u_j$ can be calculated by equation (5).
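To make the role of the base strength, the factor weights, and the time adjustment factors concrete, the following sketch evaluates a Hawkes-style intensity with an exponential decay kernel. The exponential kernel, the function name `tie_strength`, and all numeric values are assumptions for illustration; in the paper the parameters are estimated by maximum likelihood rather than fixed by hand.

```python
import math


def tie_strength(t, base, factors):
    """Hawkes-style tie strength at time t (exponential kernel assumed).

    `factors` holds one (weight, decay, observations) tuple per influencing
    factor, and `observations` is a list of (time, value) pairs. Each past
    observation adds weight * value * exp(-decay * (t - time)) to the base.
    """
    total = base
    for weight, decay, observations in factors:
        for obs_time, value in observations:
            if obs_time < t:  # only earlier observations influence time t
                total += weight * value * math.exp(-decay * (t - obs_time))
    return total


# Hypothetical inputs: a following relationship observed at time 0 and a
# text correlation of 0.6 observed at time 2.
factors = [
    (0.5, 0.3, [(0.0, 1.0)]),  # factor 1: following relationship
    (0.8, 0.2, [(2.0, 0.6)]),  # factor 2: correlation between microblogs
]
print(tie_strength(3.0, 0.1, factors))
print(tie_strength(10.0, 0.1, factors))  # older evidence has decayed further
```

Because every term decays with elapsed time, the strength falls back toward the base value when no new interaction or correlated post is observed.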

To construct the tie strength-based weighted network, let $M \in \mathbb{R}^{m \times n}$ represent the microblog-user correlation matrix, where the matrix element $M_{ij} = 1$ means that the $i$th microblog is posted by the $j$th user. Let $S \in \mathbb{R}^{n \times n}$ represent the tie strength matrix between users, where the element $S_{ij}$ is the tie strength between users $u_i$ and $u_j$ given by equation (5). Let matrix $T \in \mathbb{R}^{m \times m}$ denote the tie strength between microblogs. According to the above definitions, the tie strength matrix between microblogs can be expressed as $T = M S M^{\top}$. An example of calculating the tie strength matrix between microblogs is shown in Figure 2.
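The step from user-level to microblog-level tie strength can be illustrated with a toy example. The sketch below assumes the product form "microblog-user matrix times user tie matrix times transposed microblog-user matrix", which is the only arrangement whose dimensions match the definitions above; the helper names and all values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


# Hypothetical toy data: 3 microblogs, 2 users.
# M[i][j] = 1 means microblog i was posted by user j.
M = [
    [1, 0],
    [1, 0],
    [0, 1],
]
# S[i][j] is the tie strength between users i and j (made-up values).
S = [
    [1.0, 0.4],
    [0.4, 1.0],
]
# Microblog-level tie strength: two microblogs inherit the tie strength
# of the users who posted them.
T = matmul(matmul(M, S), transpose(M))
for row in T:
    print(row)
```

In this toy case the first two microblogs come from the same user, so their mutual entry equals that user's self-tie, while each of them is tied to the third microblog with the cross-user strength 0.4.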

As a graph can easily capture the tie strength between nodes, this paper transforms the tie strength matrix $T$ into an undirected weighted network $G = (V, E, W)$, where $V$ represents the set of nodes and each node in $V$ is associated with a microblog text, $E$ represents the set of edges, and the element $w_{ij} \in W$ represents the weight of the edge between nodes $v_i$ and $v_j$, with $w_{ij} = T_{ij}$. Based on the above description, the weighted network is constructed.

2.3. Building the Sequence Representation of Similar Microblogs

To find the microblog texts that are most similar to the target microblog according to the edge weights of the weighted network, this paper adopts the Deepwalk algorithm [39], which combines the Random Walk algorithm and the Skip-Gram algorithm and is the first network embedding method for learning low-dimensional latent representations of nodes in a network.

The main implementation process of the Deepwalk algorithm is as follows. Firstly, a node is randomly selected from the weighted network $G$ as the starting node of the random walk. Next, walk sampling is performed over the neighbor nodes until the maximum step size is reached. Since the sampling probability of a node is related to the weight of the connected edges, the Random Walk algorithm obtains a set of nodes based on the edge weights. Specifically, $C(v_i) = \{v_{i-k}, \ldots, v_{i-1}, v_{i+1}, \ldots, v_{i+k}\}$ is defined as the set of contextual nodes related to each target node $v_i$, and $k$ represents the window size. Finally, the objective function of the Skip-Gram algorithm is used to predict the contextual nodes $C(v_i)$ for each target node $v_i$. The expression of the objective function is

$$\max_{\Phi} \sum_{v_i \in V} \sum_{v_j \in C(v_i)} \log \Pr\left(v_j \mid \Phi(v_i)\right),$$

where

$$\Pr\left(v_j \mid \Phi(v_i)\right) = \frac{\exp\left(\Phi(v_j)^{\top} \Phi(v_i)\right)}{\sum_{v \in V} \exp\left(\Phi(v)^{\top} \Phi(v_i)\right)},$$

where $\Phi(v_i)$ represents node $v_i$ in the $d$-dimensional vector space and $\Phi$ can be expressed as follows:

$$\Phi : V \rightarrow \mathbb{R}^{|V| \times d},$$

where $\Phi$ denotes a $|V| \times d$ matrix, $\Phi(v_i)$ is the vector representation of node $v_i$, and $d$ represents the embedded dimension.

The vector representation of nodes can be obtained by maximizing the objective function, which yields the vector representation matrix $\Phi$ of all nodes in network $G$. Thus, a representation matrix of all the similar microblog texts is obtained according to the Deepwalk algorithm.
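The sampling stage of this procedure can be sketched as follows: a random walk in which the next node is drawn with probability proportional to the edge weight, followed by extraction of (target, context) pairs inside a window, which are what a Skip-Gram model would be trained on. The graph, the function names, and the parameter values are hypothetical, and the Skip-Gram training itself is not shown.

```python
import random


def weighted_random_walk(graph, start, max_steps, rng):
    """Walk from `start`, choosing each next node with probability
    proportional to the weight of the connecting edge."""
    walk = [start]
    while len(walk) < max_steps:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break  # dead end: the node has no outgoing edges
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def context_pairs(walk, window):
    """(target, context) pairs at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical weighted network: node -> {neighbor: tie strength}.
graph = {
    "m1": {"m2": 0.9, "m3": 0.1},
    "m2": {"m1": 0.9, "m3": 0.5},
    "m3": {"m1": 0.1, "m2": 0.5},
}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
walk = weighted_random_walk(graph, "m1", max_steps=6, rng=rng)
print(walk)
print(context_pairs(walk, window=2))
```

Strongly tied microblogs co-occur in walks more often, so they appear together in more training pairs and end up with closer embeddings.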

2.4. CNN-BiLSTM-Attention Network

After obtaining the representation of similar microblogs, this paper designs the CNN-BiLSTM-Attention network to predict the sentiment polarity of target and similar microblogs. As shown in Figure 3, the sentiment analysis process can be divided into two branches. The left branch is used to extract semantic features of the target microblog, and the right branch is used to extract semantic features of the microblogs that are similar to the target microblog. The feature extraction process of the two branches will be introduced separately below.

Firstly, let us introduce the left branch of CNN-BiLSTM-Attention network. This paper uses CNN-BiLSTM with multihead attention mechanism to capture the high-level context information of the target microblog. For extracting the semantic features, the word embedding method is used to generate the vector representation of words. As microblog texts are short with irregular grammar, dimensional pretraining word embedding [40] is used in this paper.

Assume that the sentence contains $n$ words. Then, each word is mapped into a vector representation by word embedding, which is denoted as $x_i \in \mathbb{R}^{d}$, $i = 1, 2, \ldots, n$. Thus, all the sentences are converted into vector representations as the input of the neural network. Figure 4 shows the process by which the CNN extracts sentence features. Specifically, a vector representation sequence of the sentence is generated by word embedding. Then, local features are generated through the convolutional layer. To extract multiple features, three filters with different window sizes $h$ are used for feature extraction. The output of the convolutional layer is

$$c_i = f\left(W \cdot x_{i:i+h-1} + b\right),$$

where $W$ represents the weight matrix, $x_{i:i+h-1}$ represents the concatenation of the word vectors $x_i, \ldots, x_{i+h-1}$, $b$ is the bias vector, and $f$ is a nonlinear function. So, the set of local features $c = [c_1, c_2, \ldots, c_{n-h+1}]$ is obtained by the convolutional layer. To further obtain features that contain important information, the feature maps generated by the three filters are sent to the max pooling layer:

$$\hat{c} = \max\{c\},$$

and then the pooled outputs are concatenated.
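The convolution and max pooling steps can be traced by hand on a tiny input. The sketch below slides a single filter over a sentence and keeps the largest activation; the choice of tanh as the nonlinear function, the function names, and the toy numbers are assumptions, since the text does not specify them. It also assumes the sentence has at least as many words as the filter window.

```python
import math


def conv1d_feature_map(sentence, kernel, bias):
    """Apply one convolution filter to a sentence.

    `sentence` is a list of n word vectors of length d; `kernel` is a list of
    h weight vectors of length d. Returns the n - h + 1 activations
    tanh(W . x_{i:i+h-1} + b), one per window position.
    """
    h = len(kernel)
    feature_map = []
    for i in range(len(sentence) - h + 1):
        window = sentence[i:i + h]
        z = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        feature_map.append(math.tanh(z))
    return feature_map


def max_over_time(feature_map):
    """Max pooling: keep the strongest response of the filter."""
    return max(feature_map)


# Hypothetical 4-word sentence with 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[0.5, -0.5], [0.5, 0.5]]  # window size h = 2
feature_map = conv1d_feature_map(sentence, kernel, bias=0.0)
print(feature_map)
print(max_over_time(feature_map))
```

With several filters of different window sizes, each filter contributes one pooled value, and the pooled values are concatenated into the sentence feature vector.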

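Since the BiLSTM network fuses a forward and a backward LSTM, it can capture features from both the preceding and the following context. To obtain the context-dependent information between sentences, BiLSTM is employed to capture contextual semantic information. The input of BiLSTM is the concatenation of the pooling layer output vectors, denoted as $x_t$. Figure 5 demonstrates the internal structure of BiLSTM for learning contextual semantic information. The internal information of BiLSTM is updated as follows:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right),$$

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right),$$

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right),$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right),$$

$$h_t = o_t \odot \tanh(c_t),$$

$$H_t = \left[\overrightarrow{h_t}, \overleftarrow{h_t}\right],$$

where $\odot$ means the elementwise multiplication, $x_t$ denotes the input, $\sigma$ is the activation function, $i_t$, $f_t$, and $o_t$ represent the input gate, forget gate, and output gate of LSTM, respectively, $c_t$ stands for the memory unit of LSTM, $h_t$, $\overrightarrow{h_t}$, and $\overleftarrow{h_t}$ represent the hidden state of LSTM, the hidden state of the forward LSTM, and the hidden state of the backward LSTM, respectively, $W$ and $U$ are weight matrices, $b_i$, $b_f$, $b_o$, and $b_c$ are bias vectors, and $H_t$ means the output of the BiLSTM hidden state.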

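Next, the attention mechanism is used to assign different weights to the output features of BiLSTM. This paper uses the multihead attention mechanism proposed in [41], which is composed of a series of self-attention models. The expression for the self-attention mechanism is shown as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,$$

where $Q$, $K$, and $V$ represent the query vector, key vector, and value vector, respectively, and they are the mapping vectors of the self-attention function, and $d_k$ denotes the dimension of $K$. Then, the multihead attention mechanism can be expressed as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O},$$

where

$$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q}, K W_i^{K}, V W_i^{V}\right),$$

where $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ represent the weight matrices for the linear layers. The structure of the multihead attention mechanism is shown in Figure 6.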
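A single attention head can be written out directly from the scaled dot-product formula of [41]. The sketch below omits the learned projection matrices and the concatenation across heads, so it shows one head only; the function names and toy inputs are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy input: both keys match the query equally, so the output is the
# plain average of the two value vectors.
queries = [[1.0, 0.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[0.0, 2.0], [4.0, 6.0]]
print(scaled_dot_product_attention(queries, keys, values))  # [[2.0, 4.0]]
```

A multihead layer would run several such heads on differently projected copies of the input and concatenate their outputs before a final linear layer.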

Now, let us introduce the right branch to learn the features of similar microblogs. According to the above introduction, the sequence representation of similar microblogs is generated by the Deepwalk algorithm. All similar microblogs are regarded as sentences, and their processing is similar to the target microblog. Specifically, the vector representation of words is generated by using the same word embedding method as the left branch. As the sentence order does not provide contextual dependency, here only CNN and multihead attention mechanism are used to extract the semantic features of each sentence, which are the same as the left branch.

After extracting the features from the two branches, the joint features are generated by concatenating the features of both branches. To further obtain the important features associated with sentiment expression, this paper applies the multihead attention mechanism after these two branches. Then, the joint features are fed into the fully connected layer to predict the sentiment polarity, and the softmax function is used to learn the probability of each sentiment polarity. This paper adopts the cross-entropy loss as the loss function in the training process, and the expression of the objective function is

$$L = -\sum_{x \in D} \sum_{c=1}^{C} y_c(x) \log p_c(x),$$

where $D$ is the training set, $x$ is a microblog text in the training set, $C$ is the number of sentiment polarity categories, $y_c(x)$ indicates whether the sentiment polarity of the text $x$ belongs to the category $c$, and $p_c(x)$ represents the predicted probability that the sentiment polarity of the microblog is $c$. This paper optimizes the network by minimizing the loss function and uses the back propagation algorithm to train the network.
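The prediction and loss computation can be sketched as a softmax over class scores followed by a summed cross-entropy. With one-hot labels, the double sum in the loss reduces to the negative log probability of the true class for each sample. Function names and scores below are illustrative assumptions, and whether the paper sums or averages over the training set is not stated.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Sum over samples of -log p(true class)."""
    loss = 0.0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        loss -= math.log(probs[label])
    return loss


# Hypothetical scores for two tweets over (negative, positive).
batch_logits = [[0.2, 2.1], [1.5, -0.3]]
labels = [1, 0]  # index of the true polarity for each tweet
print(cross_entropy(batch_logits, labels))
```

The loss is small when the true class receives most of the probability mass and grows without bound as that probability approaches zero.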

3. Experiments

In the previous section, the proposed MSA-UITC has been introduced in detail. To further verify its validity, this section will conduct some experiments. Firstly, this section will introduce experimental settings. Then, the datasets and evaluation metrics will be described. Finally, the performance analysis including the comparison with other methods will be implemented.

3.1. Experimental Settings

In the experiments, the parameters of Deepwalk algorithm are set as follows: the maximum step size and the window size . For the CNN-BiLSTM-Attention network, the pretrained word embedding dimension is set to , the training epoch is set to 50, the batch size is set to 64, the number of CNN filters and BiLSTM hidden units are both set to 128, and the number of self-attention mechanism is set to . Furthermore, this paper uses dropout operation before the fully connected layer, and the dropout rate is set to 0.5.

3.2. Datasets

All experiments are conducted on two real-world Twitter datasets: Obama-McCain Debate (OMD) [42] and Health Care Reform (HCR) [43]. The OMD and HCR datasets include tweets and manual sentiment polarity labels:
(1) OMD: this dataset includes 3269 tweets about the presidential debate between Barack Obama and John McCain in 2008. Tweets are annotated by at least three Amazon Mechanical Turkers, and the sentiment is labeled with four polarities: positive, negative, mixed, and irrelevant. In this paper, the majority voting score is used to represent the sentiment polarity of tweets, and only tweets with positive and negative polarities are kept. To obtain the following relationship between users, this paper uses the complete follower graph that was crawled by [44] in 2009. In the experiments, the OMD dataset is divided into three topics by the keywords the tweets contain, i.e., Obama (including keyword “Obama” without “McCain”), McCain (including keyword “McCain” without “Obama”), and debate (including both “Obama” and “McCain” or neither of them).
(2) HCR: this dataset contains 2516 tweets about the healthcare reform event in the United States in March 2010. The sentiment labels are manually annotated with five polarities: positive, negative, irrelevant, neutral, and unsure. Besides, this dataset divides tweets into 9 manually annotated topics, i.e., health care reform, Obama, Democrats, Tea Party, Stupak, Republicans, conservatives, liberals, and others [43]. As with OMD, the experiments only retain tweets with positive and negative polarities and use the following relationship crawled by [44] in 2009.
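The keyword rule used to split the OMD dataset into three topics is simple enough to state as code. The sketch below follows the rule as described; case-insensitive substring matching and the function name `omd_topic` are assumptions, since the text does not say how keywords are matched.

```python
def omd_topic(tweet):
    """Assign an OMD tweet to 'Obama', 'McCain', or 'debate' by keyword."""
    text = tweet.lower()  # case-insensitive matching is an assumption
    has_obama = "obama" in text
    has_mccain = "mccain" in text
    if has_obama and not has_mccain:
        return "Obama"
    if has_mccain and not has_obama:
        return "McCain"
    # Both keywords present, or neither of them.
    return "debate"


# Made-up example tweets.
for tweet in [
    "Obama sounded confident tonight",
    "McCain keeps dodging the question",
    "Obama and McCain both avoided specifics",
    "What a boring evening",
]:
    print(omd_topic(tweet), "<-", tweet)
```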

In the experiments, randomly divide of the dataset into a training set and of the dataset into a test set. The detailed information of OMD and HCR datasets is shown in Table 1.

3.3. Evaluation Metrics

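To compare the performance of microblog sentiment analysis methods, this paper uses four metrics: $Accuracy$, $Precision$, $Recall$, and $F1$. Specifically, $Accuracy$ represents the proportion of correctly predicted samples among all predicted samples, $Precision$ is the proportion of samples correctly predicted as positive among all samples predicted as positive, $Recall$ denotes the proportion of samples correctly predicted as positive among all positive samples, and $F1$ is the harmonic mean of $Precision$ and $Recall$. The calculation formulas are shown as follows:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN},$$

$$Precision = \frac{TP}{TP + FP},$$

$$Recall = \frac{TP}{TP + FN},$$

$$F1 = \frac{2 \times P \times R}{P + R},$$

where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives, $P$ denotes the $Precision$, and $R$ represents the $Recall$.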
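The four metrics follow directly from the confusion counts, as the short sketch below shows. The function name, the label encoding (1 for positive, 0 for negative), and the example predictions are assumptions for illustration; returning 0.0 when a denominator is zero is a convention chosen here, not one stated in the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive,
# and 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```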

3.4. Methodological Comparison and Analysis
3.4.1. Usefulness of the Tie and Attention Mechanism

Some comparative experiments are conducted to verify whether exploring the tie between microblogs and multihead attention mechanism can improve the accuracy of sentiment analysis.

To verify the effectiveness of the tie, Figures 7 and 8 show the comparison results of MSA-UITC and MSA architecture (the proposed microblog sentiment analysis architecture does not use the tie between microblogs) on OMD and HCR datasets, respectively. From them, one can conclude that MSA-UITC with the tie performs best both in , , , and for microblog sentiment analysis, implying the usefulness of the tie. On the OMD dataset, the result shows that MSA-UITC outperforms MSA and obtains improvement of 3.72% in and 3.33% in . Besides, on the HCR dataset, MSA-UITC has also achieved improvement with 3.21% in and 2.22% in . This is because MSA-UITC with the tie alleviates the sparse problem of microblog texts by constructing the sentiment connection between microblogs.

To verify the effectiveness of attention mechanism, Figures 9 and 10 display the comparative results on two datasets. Specifically, “text + concat attention” denotes that the architecture uses three attention mechanisms, “text attention” means that the architecture only uses attention mechanism in the feature extraction process of target microblog and similar microblogs, “concat attention” represents that the attention mechanism is only used in the concatenation layer, and “none attention” implies that the architecture does not use any attention mechanism. From the experimental results, one can conclude that the attention mechanism plays a significant role in improving the performance of the architecture on two datasets. This is because the text layer attention and the concatenation layer attention assign effective weights to the sentiment words that determine the sentiment polarity in the microblog text. On the OMD dataset, MSA-UITC is 2.27% and 1.32% higher in and than the architecture without attention mechanism. On the HCR dataset, MSA-UITC has also achieved improvement with 2.41% in and 1.18% in compared with the architecture without attention mechanism.

3.4.2. Comparison with State-of-the-Art Methods

The above comparative experiments validate the usefulness of the tie between microblogs and of the attention mechanism in the proposed architecture. To further confirm the superiority of MSA-UITC, some comparative experiments are performed with state-of-the-art methods. The details of these methods are as follows:
(1) SANT is a supervised method proposed in [28]. It uses sentiment consistency and emotional contagion to classify microblog sentiment.
(2) SMSC is proposed in [45]. It is a structured framework that combines content and social context for microblog sentiment analysis.
(3) SASS is proposed in [31]. It uses structure similarity and topic context for sentiment analysis.
(4) SRPNN is proposed in [34]. It is the first work to combine a user trust network and a deep learning network for sentiment classification.

Table 2 shows the comparison of accuracy on the OMD and HCR datasets. Compared with the state-of-the-art methods, MSA-UITC performs best on both the OMD and HCR datasets, which means that the proposed tie strength calculation method and CNN-BiLSTM-Attention network can improve the performance of microblog sentiment analysis. In detail, compared with the SANT method, the accuracy of our proposed architecture on the OMD dataset is improved by 3.45%. Besides, the accuracy of our proposed architecture exceeds that of the SMSC method by 1.77% and 0.99% on the OMD and HCR datasets, respectively. Compared with the SASS method, MSA-UITC also gets an improvement of 0.76% and 2.12% on the OMD and HCR datasets, respectively. The main reason is that our proposed architecture can effectively connect similar microblogs and improve the accuracy of sentiment analysis through the CNN-BiLSTM-Attention network.

To further verify the validity of the proposed deep neural network, our architecture is compared with the SRPNN method, which adopts a CNN and a simplified LSTM network to capture the semantic features of texts. The comparison results show that our architecture obtains an improvement of 2.99% on the OMD dataset and 2.92% on the HCR dataset. This indicates that the BiLSTM network and the multihead attention mechanism improve performance by extracting contextual semantic information and assigning different weights to features.

4. Conclusions

In this paper, a deep learning architecture (denoted as MSA-UITC) has been developed to explore the coevolution of emotional contagion and behavior for microblog sentiment analysis. Specifically, the proposed MSA-UITC considers user interaction and the correlation between microblog texts as the influencing factors and uses the Hawkes process to calculate the tie strength. Besides, the Deepwalk algorithm is used to find the microblogs that are similar to the target microblog. Afterwards, a CNN-BiLSTM-Attention network is designed to improve the performance of sentiment analysis. Finally, comparative experiments on two real Twitter datasets demonstrate the superiority of the proposed architecture.

Although the proposed microblog sentiment analysis architecture has achieved competitive performance, several directions remain for future work. On the one hand, this architecture only classifies the tweets in the datasets as positive or negative, and it is necessary to expand the categories of sentiment labels. On the other hand, one can also continue to optimize the deep learning model to improve the performance of sentiment analysis, for example, by using an improved word embedding model [46]. Additionally, this paper considers user interaction and the correlation with the target microblog as the two factors influencing the tie strength between microblogs. However, the earlier microblogs in the history of the target microblog also tend to have a sentiment similar to that of the target microblog. Therefore, it is necessary to consider the history of the target microblog as a factor influencing the tie strength.

Data Availability

No data were used to support the findings of the study.

Conflicts of Interest

All authors declare no conflicts of interest.

Authors’ Contributions

The authors claim that the research was realized in collaboration with the same responsibility. All authors read and approved the last version of the manuscript.

Acknowledgments

This work is supported by Natural Science Foundation of China (Grant Nos. 61702066 and 11747125), Major Project of Science and Technology Research Program of Chongqing Education Commission of China (Grant No. KJZDM201900601), Chongqing Research Program of Basic Research and Frontier Technology (Grant Nos. cstc2017jcyjAX0256 and cstc2018jcyjAX0154), Project Supported by Chongqing Municipal Key Laboratory of Institutions of Higher Education (Grant No. cqupt-mct-201901), Project Supported by Chongqing Key Laboratory of Mobile Communications Technology (Grant No. cqupt-mct-202002), Project Supported by Engineering Research Center of Mobile Communications, Ministry of Education (Grant No. cqupt-mct-202006), and Research Innovation Program for Postgraduate of Chongqing (Grant Nos. CYS17217 and CYS18238).