Abstract

To address the heavy colloquialism and sparse semantic features of social network texts, this article proposes a CNN-BiGRU-based sentiment analysis method for social network texts in the big data environment. First, a dependency syntax tree is introduced to represent the dependency relationships between words, and these relationships are used to construct the word vectors that represent the text. Then, sentiment features of different granularity are extracted by multiple convolution kernels of different sizes in a convolutional neural network (CNN). These sentiment features are fed into a bidirectional gated recurrent unit (BiGRU) network to obtain deeper sentiment features. Finally, a certain proportion of neurons is discarded by the Dropout method, and sentiment types are classified with the sigmoid activation function. The proposed method is evaluated on the Weibo_senti_100k data set. The results show that performance is best when the Dropout value is set to 0.25 and the Adam optimizer is selected. The accuracy, precision, recall, and AUC are about 94.09%, 95.13%, 92.87%, and 0.953, respectively, which indicates that the method has practical application value.

1. Introduction

With the rapid development of information technology and the Internet, people’s daily life is also changing. Internet users can more freely publish information and express opinions on social networking platforms, which also provides convenience for the dissemination of public opinion [1]. Weibo and other mainstream social network media platforms are playing an increasingly important role in public opinion communication. In recent years, the related research on text information from social networks has attracted more and more attention, especially the sentiment analysis of the text. It is of great value to grasp the attitudes and views of netizens on the topic [2].

Hot topics on the Internet are closely related to public sentiment in real society. Because topics are pushed in real time and Internet users can comment on and forward them without limit, public opinion spreads very quickly. As a result, Internet users are prone to group polarization on a topic, which can even lead to group events on the Internet or in real life [3, 4]. The sentiment expressed in users' comments not only affects how public opinion on the whole topic spreads but is also affected by the sentiment of other users. Negative public opinion drives negative sentiment among users and pushes public opinion in a negative direction. Therefore, it is necessary to analyze the sentiment of social network text, that is, to extract effective information from massive social network data and analyze the sentiment tendency of the public. The results of sentiment analysis can be used to guide and manage a topic in a targeted way and to curb the continued fermentation of negative events [5]. How to mine meaningful information from massive social network data, analyze comment content, and identify the sentiment tendency of comment text has become a current research hotspot.

The analysis of sentiment tendency in text is also called opinion mining. A model is trained on a large amount of real comment data with machine learning or deep learning methods, and the final model can automatically and accurately judge the sentiment polarity expressed by a text [6]. At present, deep learning methods perform very well in sentiment classification. In essence, a deep neural network can effectively capture high-level representations of data, and its expressive ability is exponentially stronger than that of a shallow model. Applied to sentiment analysis, it can reflect the real semantics contained in the text more faithfully, reflect each evaluation index of the topic in more detail, and help users and platforms make accurate and efficient judgments [7, 8].

However, social network text is highly colloquial, has sparse semantic features, and is often very short and terse, so analysis methods based on deep learning still have some shortcomings. Therefore, a CNN-BiGRU-based sentiment analysis method for social network text in the big data environment is proposed. The innovations of the proposed method are summarized as follows:
(1) Recurrent neural networks are prone to vanishing gradients and cannot capture nonlinear relationships over a long time span, which causes the long-term dependency problem. To address this, the proposed method uses a bidirectional gated recurrent unit (BiGRU) network to obtain the hidden vector representation of the text and the dependencies between words separated by many time steps.
(2) To improve the accuracy of sentiment analysis, the proposed method combines the advantages of the convolutional neural network (CNN) and BiGRU. CNN extracts local features and semantic information from the text, and BiGRU handles dependency information between distant words. The resulting CNN-BiGRU model therefore has strong applicability.

The remaining sections of this article are arranged as follows: Section 2 introduces related research on sentiment analysis; Section 3 introduces the sentiment analysis model based on CNN-BiGRU; Section 4 is the experimental part, which designs comparative experiments to verify the performance of the proposed model; and Section 5 is the conclusion.

2. Related Work

Sentiment analysis mainly identifies the positive or negative sentiment toward a target object, and the goal of text sentiment analysis is to clarify the attitude of the reviewer toward the commented object [9]. Text sentiment analysis classifies unstructured subjective sentences according to their sentiment tendencies. At present, there are three kinds of methods for text sentiment analysis: methods based on a sentiment dictionary, methods based on machine learning, and methods based on neural networks [10, 11]. The dictionary-based method constructs a sentiment dictionary according to the different contexts of the text and defines rules to judge the sentiment tendency. For example, Reference [12] proposed a hybrid method based on dictionary technology and fuzzy classification technology to analyze the sentiment of Twitter text. Using the UCINET tool for social network analysis and combining an artificial neural network to rank users, it divided the sentiment of tweet content into seven categories, which effectively realized text sentiment analysis. To overcome the challenges brought by nonstandard language, Reference [13] proposed two unsupervised dictionary-based sentiment classification methods. By using the large amount of information encoded in expressions, combined with a classifier based on weakly supervised neural networks, unknown words are recognized through domain-specific word embeddings, and sentiment analysis is realized on the basis of sentiment classification. However, the non-high-frequency unit classifies the sentiment of the pseudo context through network expansion, which leads to the poor generalization ability of the dictionary-based analysis method and the difficulty in constructing the sentiment dictionary.

The method based on machine learning trains a model on a large amount of labeled text data and then uses the trained model to classify the sentiment of unseen text. For example, Reference [14] proposed a hierarchical fusion cross-modal complementary network for multimodal sentiment analysis. Feature extraction modules for text and image were used to learn the attention features of the text and image generated by an image-text generator, forming a hierarchical fusion framework that could fully integrate the features of different modalities and accurately analyze the sentiment of text and image. Reference [15] proposed a method to extract relevant sentiment information from location-based social networks and used a new scale to classify the information to achieve sentiment analysis of Twitter text data. However, traditional machine learning methods suffer from problems such as high-dimensional and sparse features.

Compared with traditional machine learning methods, neural networks perform better in natural language processing. They map features to low-dimensional word vectors that carry context information, which addresses the high dimensionality, sparsity, and disregard for correlations between features of traditional text representation models. Reference [16] proposed a hybrid deep learning model for fine-grained sentiment prediction in multimodal data. By combining the advantages of a deep learning network and machine learning, it used two specific symbol systems, text and visual images, in multimodal fusion to realize context sentiment analysis. Reference [17] proposed a new text semantic recognition method based on a hybrid neural network structure to handle polysemy and topic confusion in Weibo text. It used the latent semantic relations in different language contexts and the co-occurrence statistics between words in Weibo, and fed the output of the CNN to the LSTM to achieve accurate Weibo sentiment analysis. Reference [18] proposed a sentiment analysis method that uses CNN and a bidirectional encoder with a three-layer convolution framework to analyze a corpus collected from Twitter. Experiments on a corpus of 17000 tweets showed that the method achieves high accuracy.

However, as the traffic of social networks keeps increasing, the text information is highly colloquial, has sparse semantic features, and is often very short and terse. The ability of sentiment analysis methods based on deep learning alone to deal with such problems needs to be improved. Therefore, a CNN-BiGRU-based social network text sentiment analysis method in a big data environment is proposed.

3. Aspect Based Sentiment Analysis Model Using CNN-BiGRU

3.1. Model Framework

The gated recurrent unit (GRU) alleviates the vanishing gradient problem of recurrent neural networks and is able to learn long-term dependencies. BiGRU is an improvement on GRU that is mainly composed of two GRUs running in opposite directions. In this way, not only can the preceding information be obtained, but the influence of the following information on the current word can also be considered [19]. CNN can extract text features and semantic information by a convolution operation using convolution kernels. A single CNN or BiGRU model alone is not ideal for text sentiment analysis. Considering the respective advantages of the CNN and BiGRU network models in sentiment analysis tasks, the CNN-BiGRU model is constructed by combining the two neural networks to solve the problems in text sentiment analysis of social networks. The model structure is shown in Figure 1.

3.2. Word Vector Representation of Text

The word vector embedding layer includes sentence embedding and aspect word embedding. First, a sentence $S = \{w_1, w_2, \ldots, w_n\}$ is given, where $n$ is the number of words in the text, together with an aspect word $A = \{a_1, a_2, \ldots, a_m\}$, where $m$ is the length of the aspect word. The unstructured text is mapped into a continuous word vector representation by the GloVe model, and the sentence embedding and aspect word embedding are then obtained by looking up the embedding matrix $E \in \mathbb{R}^{|V| \times d_w}$, where $|V|$ is the size of the vocabulary and $d_w$ is the embedding dimension.
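The lookup step can be illustrated with a small sketch. It is only a toy under stated assumptions: the embedding matrix is filled with random placeholder values rather than pretrained GloVe vectors, the embedding dimension of 4 is arbitrary, and the helper names `build_vocab`, `init_embedding_matrix`, and `embed` are hypothetical.

```python
import random


def build_vocab(tokenized_sentences):
    """Assign an integer index to every distinct token; index 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for tokens in tokenized_sentences:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def init_embedding_matrix(vocab_size, dim, seed=0):
    """Placeholder matrix of shape (vocab_size, dim); a real model would load pretrained vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(tokens, vocab, matrix):
    """Look up one row of the embedding matrix per token."""
    return [matrix[vocab.get(token, 0)] for token in tokens]


sentence = "this food tastes good but the price is a bit expensive".split()
aspect = ["price"]

vocab = build_vocab([sentence])
E = init_embedding_matrix(len(vocab), dim=4)

sentence_emb = embed(sentence, vocab, E)  # one vector per word in the sentence
aspect_emb = embed(aspect, vocab, E)      # one vector per word in the aspect term
```

The sentence and the aspect word share the same matrix here, so the aspect vector is identical to the vector of the same word inside the sentence.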

When an aspect word appears in a sentence, words closer to the aspect word have a greater impact on the sentiment polarity of the aspect word [20]. The proposed method introduces a dependency syntax tree to represent the dependency between words. For example, in the sentence “This food tastes good but the price is a bit expensive,” the target word is “price,” and the open-source library spaCy is used to build the dependency syntax tree. Each word is taken as a node, and the Floyd algorithm is used to calculate the shortest distance from each node to the first target node, which is taken as the position index. The shortest distance index sequence is expressed as $D = \{d_1, d_2, \ldots, d_n\}$, and its corresponding position embedding is obtained by looking up the position embedding matrix $P$, which is randomly initialized and updated during the training process.

3.3. CNN Layer

The CNN layer mainly includes a convolution layer and a pooling layer.

3.3.1. Convolution Layer

The convolution layer is mainly used to extract the features of the text and obtain the local semantic information of the sentence. The filter used in the convolution layer performs convolution on the sentence matrix to extract local features $c_i$, and the calculation is as follows:
$$c_i = f\left(W \cdot x_{i:i+h-1} + b\right),$$
where $x_{i:i+h-1}$ is the matrix formed by the $h$ rows from $i$ to $i+h-1$ of the sentence matrix, $W$ is the convolution kernel, $b$ is the offset, and $f$ is the rectified linear unit (ReLU).

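Each convolution kernel extracts a part of the features, and the filter slides from top to bottom over the whole sentence matrix according to the set step size to obtain the local feature set $c$. For a sentence of $n$ words, a kernel covering $h$ rows, and a step size of 1, $c$ is calculated as follows:
$$c = \left[c_1, c_2, \ldots, c_{n-h+1}\right].$$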

3.3.2. Pooling Layer

The pooling layer performs feature dimension reduction [21]. The proposed method uses the max-pooling method to extract the largest feature $\hat{c}$ from the local feature set $c$ obtained by the convolution layer to replace the entire local feature set, and the calculation is as follows:
$$\hat{c} = \max(c).$$
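The convolution and max-pooling steps can be traced on a toy input. The sketch below is not the configuration used in the experiments: the 4 x 2 sentence matrix, the two kernels (covering 2 and 3 rows), and the offsets are made-up values, and a step size of 1 is assumed.

```python
def relu(x):
    return max(0.0, x)


def conv_feature_map(sentence, kernel, bias):
    """Slide one kernel over the rows of the sentence matrix (step size 1) and apply ReLU."""
    h = len(kernel)  # number of rows (words) covered by the kernel
    n = len(sentence)
    features = []
    for i in range(n - h + 1):
        window = sentence[i:i + h]
        total = sum(w * x
                    for kernel_row, window_row in zip(kernel, window)
                    for w, x in zip(kernel_row, window_row))
        features.append(relu(total + bias))
    return features


def max_pool(features):
    """Keep only the largest value of a feature map."""
    return max(features)


# Toy sentence matrix: 4 words, embedding dimension 2 (illustrative values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, -1.0]]

# Two kernels of different sizes capture features at different granularity.
kernels = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # covers 2 words
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -1.0),  # covers 3 words
]

feature_maps = [conv_feature_map(sentence, k, b) for k, b in kernels]
pooled = [max_pool(fm) for fm in feature_maps]
```

Each kernel yields a feature map whose length depends on the kernel size, and max pooling reduces every map to a single value, so the pooled vector has one entry per kernel regardless of sentence length.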

3.4. BiGRU Layer

The text vector is composed of a word vector and a position embedding vector. BiGRU is used to obtain aspect words and context information, so as to obtain a hidden layer vector representation [22]. The performance of GRU is similar to that of the long short-term memory (LSTM) network, but it has fewer parameters and lower computational complexity. The GRU network has two gate structures, an update gate $z_t$ and a reset gate $r_t$. $z_t$ controls how much information from the previous time step the cell unit retains: the larger the value, the more information from the previous time step is memorized. $r_t$ controls the degree to which the information from the previous time step is ignored when the candidate state is computed: the smaller the value, the more of that information is forgotten. At time $t$, the hidden state of GRU is calculated as follows:
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right),$$
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right),$$
$$\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t]\right),$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t,$$
where $h_{t-1}$ represents the hidden state at the previous time step, $x_t$ represents the input sequence information, $W_z$, $W_r$, and $W_h$ are weight matrices, $\odot$ denotes element-wise multiplication, and $\sigma$ is the sigmoid function.

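The output of the BiGRU encoder combines the forward hidden layer $\overrightarrow{h_t}$ and the backward hidden layer $\overleftarrow{h_t}$, that is, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$ for $t = 1, 2, \ldots, n$, where $n$ is the length of the sentence.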
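The gating and the bidirectional combination can be made concrete with a small pure-Python sketch. It assumes the common GRU formulation in which the update gate weights the previous hidden state, omits bias terms, and uses random untrained weights; the sizes (input dimension 3, hidden size 2) and all function names are illustrative assumptions, not the settings of the proposed model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(params, h_prev, x):
    """One GRU step; each weight matrix acts on the concatenation [h_prev, x]."""
    W_z, W_r, W_h = params
    z = [sigmoid(a) for a in matvec(W_z, h_prev + x)]  # update gate
    r = [sigmoid(a) for a in matvec(W_r, h_prev + x)]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in matvec(W_h, gated + x)]  # candidate state
    # A larger update gate keeps more of the previous hidden state.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_gru(params, inputs, hidden_size):
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = gru_step(params, h, x)
        states.append(h)
    return states


def bigru(fwd_params, bwd_params, inputs, hidden_size):
    """Concatenate forward states with backward states aligned to the same position."""
    forward = run_gru(fwd_params, inputs, hidden_size)
    backward = run_gru(bwd_params, inputs[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def random_params(rng, hidden_size, input_size):
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]
    return mat(), mat(), mat()


rng = random.Random(0)
hidden_size, input_size = 2, 3
fwd = random_params(rng, hidden_size, input_size)
bwd = random_params(rng, hidden_size, input_size)
inputs = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.4, 0.2], [-0.3, 0.1, 0.0]]

outputs = bigru(fwd, bwd, inputs, hidden_size)  # one vector of size 2 * hidden_size per step
```

The backward pass reads the sequence from the last position to the first, so the backward half of the final output vector has seen only the last input, while its forward half has seen the whole sentence.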

3.5. Dropout Layer

The CNN-BiGRU model has many parameters and relatively few training samples, which makes it prone to overfitting. Therefore, the Dropout method is adopted to randomly discard neurons with a certain probability during training. Only part of the neurons participate in the training and parameter learning of the model, which ensures that the model cannot rely excessively on some local features during training. This can improve the training efficiency and generalization ability of the model [23, 24]. The neural network model using the Dropout method is shown in Figure 2.

The specific workflow of the Dropout method is as follows:
(1) Temporarily delete hidden layer neurons from the network with a certain probability.
(2) Propagate the input forward through the network and then propagate the error back through the same network. After each batch is processed, update the corresponding weights and offsets according to the stochastic gradient descent algorithm.
(3) Restore the discarded neurons. At this point, the parameters of the discarded neurons remain unchanged, while the parameters of the neurons that were not discarded have been updated.
(4) Repeat steps (1), (2), and (3).

The output $H$ obtained from the BiGRU layer is processed by Dropout to obtain $H^{d}$, and the mathematical expression is as follows:
$$H^{d} = \mathrm{Dropout}(H).$$
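A single Dropout pass over a hidden vector can be sketched as follows. This is an assumption-laden illustration: it uses the "inverted" variant, which rescales the kept activations by 1 / (1 - rate) during training so that nothing has to change at inference time, and this rescaling is not described in the workflow above. The function name `dropout` and the toy vector are hypothetical.

```python
import random


def dropout(values, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the rest (inverted dropout)."""
    if not training or rate == 0.0:
        return list(values)  # at inference time every neuron is kept
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(42)
hidden = [1.0] * 1000

dropped = dropout(hidden, rate=0.25, rng=rng)                     # training pass
unchanged = dropout(hidden, rate=0.25, rng=rng, training=False)   # inference pass

zero_fraction = sum(1 for v in dropped if v == 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
```

With a rate of 0.25, roughly a quarter of the activations are zeroed on each pass, and the rescaling keeps the mean activation close to its original value.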

3.6. Sentiment Classification Layer

After the Dropout layer, the output representation is forwarded to the fully connected layer through the activation function. This layer maps the output of the previous layer to the required output size. It also learns to retain the information relevant to the sentiment prediction of the target and to discard irrelevant information. If the polarity probability of an aspect exceeds a threshold, the aspect is assigned to the corresponding sentiment class [25, 26]. The last task of fine-grained sentiment analysis is to classify the sentiment polarity. The model accepts the output of the Dropout layer as input features. The final predicted sentiment polarity of the aspect target is the label with the highest probability.

For sentiment classification, the proposed method uses a sigmoid activation function, which outputs values between 0 and 1. The calculation is as follows:
$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$

At the same time, the proposed model is trained with L2 regularization to minimize the cross-entropy loss.
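The training objective can be written out as a short sketch: a sigmoid turns each raw score into a probability, the binary cross-entropy is averaged over the samples, and an L2 penalty on the weights is added. The regularization coefficient `lam` and the toy inputs are assumptions chosen for illustration, since the actual value is not stated in the text, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def l2_penalty(weights, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(y_true, logits, weights, lam=0.01):
    probs = [sigmoid(z) for z in logits]
    return binary_cross_entropy(y_true, probs) + l2_penalty(weights, lam)


# Toy check: logits of 0 give probability 0.5 for both samples.
loss = regularized_loss(y_true=[1, 0], logits=[0.0, 0.0], weights=[1.0, 2.0], lam=0.1)
```

In the toy check each sample contributes ln 2 to the cross-entropy, and the penalty adds 0.1 * (1 + 4) = 0.5.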

4. Experiments and Analysis

The experiments are conducted based on the deep learning framework TensorFlow. The specific experimental environment is shown in Table 1.

In addition, the model parameter settings in the experiment are shown in Table 2.

4.1. Data Set

To verify the effectiveness of the proposed model, the Weibo_senti_100k data set, an open-source Chinese natural language processing data set from the ChineseNLPCorpus project on GitHub, was selected for the experiment. Weibo_senti_100k consists of Sina Weibo comment texts with sentiment annotations, with about 50000 positive comments and about 50000 negative comments.

The Weibo_senti_100k data set is preprocessed to remove punctuation marks, HTML tags, “#” tags, the “@” in user names, and so on. Then, the Jieba word segmentation tool is used to segment words, and stop words are removed according to the stop word list of the Harbin Institute of Technology. Finally, to train and test the text sentiment analysis model, the preprocessed data set is divided into a training set of 80000 items and a test set of 20000 items, that is, a test-to-training ratio of 1 : 4. The sentiment categories are positive and negative.
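The cleaning and splitting steps can be sketched with the standard library alone. The regular expressions below are assumptions about what the removal rules look like (for instance, the whole "@user" mention and the whole "#topic#" tag are dropped), word segmentation and stop word removal are left out, and the function names and the random seed are hypothetical.

```python
import random
import re


def clean_text(text):
    """Strip HTML tags, #topic# tags, @user mentions, and punctuation."""
    text = re.sub(r"<[^>]+>", "", text)   # HTML tags
    text = re.sub(r"#[^#]*#", "", text)   # Weibo-style #topic# tags
    text = re.sub(r"@\S+", "", text)      # @user mentions, up to the next whitespace
    text = re.sub(r"[^\w\s]", "", text)   # remaining punctuation
    return re.sub(r"\s+", " ", text).strip()


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle and split so that test : training is 1 : 4 for test_ratio=0.2."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


cleaned = clean_text("<b>Nice</b> food! #lunch# @user123 really good...")

# Stand-in for 100000 preprocessed comments.
train_set, test_set = train_test_split(range(100000))
```

A mention that is not followed by whitespace would swallow the text after it under this pattern, so a production rule would need to be stricter.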

4.2. Evaluation Index

In the experiment, accuracy, the area under the receiver operating characteristic (ROC) curve (AUC), precision, recall, and F1 score are used as the evaluation indexes of the proposed model. Let TP be the number of true positive cases, TN the number of true negative cases, FP the number of false positive cases, and FN the number of false negative cases.
(1) Accuracy: accuracy is the proportion of the model's predictions that are correct, which is calculated as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
(2) AUC (area under curve): the area enclosed between the ROC curve and the coordinate axis. The ROC curve must be obtained before AUC is calculated. It is drawn in a two-dimensional coordinate system by plotting the true positive rate (TPR) on the vertical axis against the false positive rate (FPR) on the horizontal axis at different cut-off points. TPR and FPR are calculated as follows:
$$\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN}.$$
(3) Precision: precision is the proportion of samples identified as positive that are indeed positive. Precision is defined as follows:
$$\text{Precision} = \frac{TP}{TP + FP}.$$
(4) Recall: recall is the proportion of all positive samples that are correctly identified as positive. Recall is defined as follows:
$$\text{Recall} = \frac{TP}{TP + FN}.$$
(5) F1 score: to take both precision and recall into account without favoring either, the F1 score, which is the harmonic mean of the two, is used. An important characteristic of the harmonic mean is that it is large only when both values are large; as long as one of them is small, the result is pulled down sharply. The F1 score is defined as follows:
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
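These indexes can be computed directly from labels and model scores. The sketch below uses made-up labels and scores, a cut-off of 0.5, and hypothetical function names. AUC is computed through its pairwise interpretation, the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to the true positive rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels and sigmoid outputs (not experimental data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

On this toy input one positive sample falls below the cut-off and one negative sample falls above it, so accuracy, precision, recall, and F1 all equal 0.75, while AUC is 15/16 because only one positive-negative pair is ranked in the wrong order.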

4.3. Selection of Dropout Value

The CNN-BiGRU model has many parameters, so overfitting easily occurs during training. To address this problem, the Dropout method is used to randomly discard neurons with a specified probability during training, so that only the remaining neurons participate in the training and parameter learning of the model. As a result, the model cannot rely excessively on some local feature information, which improves its training efficiency and generalization ability. In the experiment, the Dropout value is set to 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, and 0.5 in turn to explore the impact of different Dropout values on the sentiment analysis results. The specific experimental results are shown in Table 3.

According to the experimental results in Table 3, when the Dropout value is 0.1, the overall performance of the model is relatively poor, with an accuracy of 89.24%. This may be because the Dropout value is too small, so few neurons are randomly discarded and too many parameters remain in the model. When the Dropout value is increased to 0.25, the model performs best on the three evaluation indexes of accuracy, precision, and F1 score, which reach 89.95%, 90.07%, and 90.54%, respectively. When the Dropout value is increased further, almost all the evaluation indexes show a downward trend, and the performance of the model gradually declines. This may be because too many neurons are discarded, leaving too few parameters, so the model has difficulty fitting the data well and its overall performance degrades. Therefore, the Dropout value is set to 0.25.

4.4. Selection of Optimizer

The essence of the model training process is to minimize the loss function. After the loss function is determined, an optimizer is needed for gradient-based optimization. The optimization target is the set of model parameters, which for the CNN-BiGRU model includes all the parameters of CNN and BiGRU. In the experiment, five commonly used optimizers, namely SGD (stochastic gradient descent), AdaGrad, Adadelta, RMSProp (root mean square propagation), and Adam, are selected to optimize the model and to explore the impact of different optimizers on model performance. The specific experimental results are shown in Table 4.

According to the experimental results in Table 4, the Adam optimizer has the best effect, with accuracy, precision, and F1 values of 88.98%, 88.86%, and 90.27%, respectively. This is mainly because Adam adaptively adjusts the learning rate, combines the advantages of Adadelta, RMSProp, and Momentum, and is quite robust to the selection of hyperparameters. Therefore, Adam is selected as the optimizer.
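The adaptive behavior of Adam can be seen in a one-parameter sketch. The example minimizes a toy quadratic rather than the model loss; the learning rate of 0.1, the step count, and the function name `adam_minimize` are illustrative assumptions, while the decay rates 0.9 and 0.999 and the small constant 1e-8 are the commonly cited defaults.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-dimensional function with the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g      # running mean of gradients (momentum)
        v = beta2 * v + (1.0 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)         # bias correction for the early steps
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because the step is the ratio of the two running averages, its size stays close to the learning rate while gradients are consistent and shrinks as they start to cancel near the minimum.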

4.5. Comparison with Other Methods

The model trained on the training set is used to analyze the test set. To demonstrate the analytical performance of the proposed method on the test set, it is compared with the methods in References [12, 17]. The results are shown in Figure 3.

It can be seen from Figure 3 that the proposed method outperforms the other two methods on every evaluation index, and its accuracy, precision, recall, and AUC are about 94.09%, 95.13%, 92.87%, and 0.953, respectively. Because the BiGRU model adopted in the proposed method can better learn the context information in social network text and is more conducive to learning text sentiment, its accuracy, precision, and AUC are higher than those of the LSTM-based model in Reference [17] by about 2.16%, 2.65%, and 0.081, respectively. In addition, the combination of CNN and LSTM models in Reference [17] can better obtain the text sentiment types. Compared with the use of artificial neural networks only in Reference [12], its performance is significantly improved, and the AUC is higher by about 0.157. Overall, the proposed method has the best performance in text sentiment analysis, which also verifies the effectiveness of the CNN-BiGRU model.

5. Conclusion

In the face of the explosive growth of text resources in social networks, how to effectively use text data and mine its potential value is of great significance. Therefore, a CNN-BiGRU-based social network text sentiment analysis method in a big data environment is proposed to accurately obtain the sentiment polarity. The dependency syntax tree is used to represent the dependency relationship between words, and the resulting representation is input into the CNN-BiGRU model to learn various sentiment features. At the same time, a certain proportion of neurons is discarded by the Dropout method to improve the analysis efficiency. Finally, the sigmoid activation function is used to complete the classification of sentiment types. Experiments based on the Weibo_senti_100k data set show the following:
(1) The proposed method adopts the Dropout method and the Adam optimizer to improve its analysis performance. When the Dropout value is set to 0.25, its F1 score reaches 90.54%, so the optimization effect is remarkable.
(2) The proposed CNN-BiGRU model has good sentiment analysis performance. Its accuracy, precision, recall, and AUC are about 94.09%, 95.13%, 92.87%, and 0.953, respectively, which is superior to the other comparative methods.

Text information is diverse, but the proposed method only classifies the polarity of text sentiment as positive or negative, without more fine-grained identification and deeper mining of text sentiment, such as dividing it into three categories (positive, neutral, and negative) or more categories such as “pleasure, anger, sorrow, and joy.” Therefore, more detailed research will be conducted in future work.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the High Level Vocational Professional Group Construction Project of China (2019), the National Level Teaching Innovation Team of Vocational Education Teachers (2019), and the Industry University Cooperation and Collaborative Education Project of Zhejiang Province (2020).