Abstract

In view of the problems that most existing emotion analysis models ignore the relationship between emotions and are not suitable for students, an emotion analysis model of teaching evaluation text based on deep learning is proposed. Firstly, combining the advantages of CNN extracting phrase features and BLSTM extracting sequence features, the CNN-BLSTM model is constructed to effectively enhance the extraction ability of text information. Then, the attention mechanism is used to adaptively perceive the context information, extract the text features that affect students’ emotion, and construct the CNN-BLSTM-AT model. Finally, the CNN-BLSTM-AT model is used to analyze the students’ emotion types in the dataset, and the mini-batch gradient descent method is used for model training. The experiment uses the weibo_senti_100k dataset to demonstrate the performance of the proposed model. The results show that adding the attention mechanism can improve the accuracy of the model by about 0.23. Also, its recall rate is not less than 0.57 and the minimum value of F1 is 0.748, which is better than other comparison models, thus demonstrating the effectiveness of the proposed model.

1. Introduction

At present, most work on sentiment analysis focuses on the sentiment polarity directly presented by the text. It mainly uses statistical machine learning methods to divide texts into positive and negative emotions. There are few studies on the emotions that the text may trigger in students [1–4].

The purpose of student sentiment analysis is to study the mechanism by which language and text lead students to produce emotions such as joy, anger, sorrow, and happiness, and to predict the emotions that students may produce after reading the text. Students’ online evaluation of teaching is a key link in the teacher development and support system of universities. Over many years of online teaching evaluation, universities have accumulated massive amounts of textual evaluation data. However, because teaching evaluation texts are unstructured, routine statistical analysis cannot be carried out on them, which results in low data utilization. If this continues, the teaching evaluation data will not only fail to play its full role in improving teachers' teaching but will also reduce students' enthusiasm for participating in the evaluation of teaching. Analyzing student emotions through evaluation texts has important practical significance and research value for understanding students’ mental state and for information retrieval [5]. However, because human emotions are very complex and student emotions are even more diverse, this research is relatively difficult and still in its infancy [6]. Existing studies often use hand-designed feature extraction methods to extract the text features that affect students’ emotions; they ignore the word order and context information of the text and fail to capture its complex language phenomena. Also, most current mainstream student sentiment analysis methods express emotions in a multi-label way, which makes it difficult to reflect the complexity of a variety of interrelated human emotions [7, 8].

To address the problems that most existing models are not suitable for analyzing complex student emotions and ignore the relationship between emotions, a teaching evaluation text emotion analysis model using the attention mechanism and the CNN-BLSTM model is proposed. Compared with other emotion analysis models, the innovations of the proposed model are as follows:

(1) To improve its text feature extraction ability, the proposed model uses a CNN with convolution windows of different sizes to extract the binary and ternary features of the text. In addition, BLSTM is used for sequence feature extraction, thereby providing highly reliable text features for subsequent classification.

(2) Because traditional analysis models do not consider the relationship between emotions, the proposed model introduces an attention mechanism. It adaptively combines context information and student emotion information to extract the key text features that affect student emotions, effectively improving the accuracy of emotion classification.

2. Related Work

Thanks to the rapid development of computer technology, the research objects of sentiment analysis have developed from the sentence and document level to attribute-level data, with better analysis accuracy and reliability [9]. Sentiment analysis methods have also made considerable progress, from the initial sentiment dictionary-based analysis to machine learning-based analysis and then to deep learning-based analysis [10]. There have been many studies on sentiment analysis based on sentiment dictionaries. For example, reference [11] extracts binary relations through word segmentation and dependency analysis. The existing sentiment dictionary is combined with web page information and manual annotations to form a binary relational knowledge base for sentiment analysis. However, it does not take into account the influence of different contexts on emotional orientation. Reference [12] describes a rule-based sentiment analysis method. A sentiment dictionary based on morphological rules and ontological models is used for text sentiment analysis, and the part of the language that defines the sentiment of the text is determined. A good classification result is achieved, but the classification accuracy still needs to be improved.

Attribute-level sentiment classification based on machine learning mostly uses traditional machine learning classifiers such as support vector machines and Naive Bayes. For example, Samara et al. [13] studied the use of active stimuli to recognize emotional states. A hierarchical machine learning method for emotional state recognition based on facial expressions was proposed. The feature representation based on Euclidean distance is combined with the custom coding of the user's self-reported emotional state. A more accurate emotion recognition is achieved, but the recognition efficiency needs to be improved. Basha et al. [14] proposed a machine learning text emotion recognition method. Using natural language processing technology, emotional information is extracted and recognized when reading or writing. However, the accuracy of emotion recognition for daily non-standard language or behavior expression is poor.

Deep learning produces ideal prediction results in image recognition and other fields by learning multiple representations or features of data. Therefore, it has also been successfully applied to other fields such as sentiment analysis [15]. Jagirdar et al. [16] proposed a deep learning method for emotion analysis, focusing on the understanding of facial expressions of students in the classroom to better implement teaching. However, a large number of high-quality sentiment analysis training samples need to be labeled to improve the accuracy of the model. Lu and Zhang [17] proposed a sentiment analysis model using a multi-channel convolutional neural network. It can extract more semantic information from emotional text and learn the emotional information hidden in it. Kottursamy [18] used a new eXnet convolutional neural network for feature extraction to realize facial emotion recognition. Since the eXnet model has fewer parameters, the improved CNN model has higher accuracy. However, it cannot yet be applied at large scale in a rich practical environment. At the same time, most of the above-mentioned sentiment analysis studies are aimed at the recognition of text or facial expressions, and there are few studies on students' emotion recognition in class. Therefore, a student sentiment analysis model using the attention mechanism and the CNN-BLSTM model is proposed.

3. Sentiment Classification Model Based on CNN-BLSTM-AT Model

3.1. CNN-BLSTM Model

Both CNN and BLSTM have their own advantages in sentiment classification tasks. CNN uses multiple convolution kernels to perform convolution operations on the word vectors of the text to more effectively mine the potential semantic information of the text [19]. BLSTM can better predict the semantics of text sequences. Combining the two, the CNN-BLSTM model is formed for students' emotion analysis. The model structure is shown in Figure 1.

In the CNN-BLSTM model, text is input to CNN for emotional feature extraction, and deep-level information of the text is mined. Then, the feature value of CNN is input into BLSTM for further feature extraction, and a vector representation containing the entire sentence information is obtained, so as to realize emotion classification.

3.2. CNN Text Sentiment Analysis Feature Extraction

Based on the relevant characteristics of CNN, a double-layer parallel convolutional neural network is designed to extract and express text features. Its structure mainly includes three layers, embedding layer, convolution layer, and pooling layer.

3.2.1. Embedding Layer: Sentence Representation

The word frequency feature is used to form the word vector: the Jieba word segmentation tool is called to segment the comment text data, the word set is obtained, and the word frequency is counted. Since the review text is short and concise, the sentence length (the number of words included) is limited to 80. The sentence is input into the embedding layer, each word is converted into a 512-dimensional word vector, and the embedding layer finally outputs an 80 × 512 two-dimensional matrix. In general, each sentence forms an $n \times m$ two-dimensional matrix $S = [x_1, x_2, \ldots, x_n]^{\top}$, where $x_i \in \mathbb{R}^{m}$ is the word vector of the $i$-th word $w_i$.
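The embedding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sentences are already segmented into words, orders the vocabulary by word frequency, and uses a randomly initialized lookup table. The names `build_vocab`, `encode`, `embed`, and `PAD_ID` are hypothetical.

```python
import numpy as np

MAX_LEN = 80      # sentence length limit stated in the text
EMBED_DIM = 512   # word-vector dimension stated in the text
PAD_ID = 0        # hypothetical id reserved for padding and unknown words

def build_vocab(tokenized_sentences):
    """Count word frequencies and assign ids, most frequent word first."""
    counts = {}
    for sent in tokenized_sentences:
        for w in sent:
            counts[w] = counts.get(w, 0) + 1
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ordered)}  # id 0 stays free for padding

def encode(sentence, vocab):
    """Truncate or pad a tokenized sentence to exactly MAX_LEN word ids."""
    ids = [vocab.get(w, PAD_ID) for w in sentence[:MAX_LEN]]
    return np.array(ids + [PAD_ID] * (MAX_LEN - len(ids)))

def embed(ids, table):
    """Look up one embedding row per word id, giving a MAX_LEN x EMBED_DIM matrix."""
    return table[ids]

# Toy, already-segmented comments (an assumption; the paper segments with Jieba).
sentences = [["老师", "讲课", "很", "清楚"], ["作业", "太", "多"]]
vocab = build_vocab(sentences)
rng = np.random.default_rng(0)
table = rng.normal(scale=0.1, size=(len(vocab) + 1, EMBED_DIM))
table[PAD_ID] = 0.0  # padding positions contribute a zero vector
S = embed(encode(sentences[0], vocab), table)
print(S.shape)  # (80, 512)
```

In a trained model the lookup table would be learned together with the other parameters rather than left at its random initialization.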

3.2.2. Convolutional Layer: Feature Extraction

The purpose of the convolutional layer is to extract the semantic features of the sentence, and each convolution kernel extracts a certain part of the features. The number of convolution kernels is set to 256. A convolution operation is performed on each sentence matrix $S$ output by the embedding layer:

$$C = W \otimes S + b$$

where $C$ represents the feature matrix extracted by the convolution operation, $\otimes$ denotes the convolution operation, $W$ is the weight matrix, and $b$ is the bias vector. $W$ and $b$ are the parameters learned by the CNN.

To facilitate the calculation, a non-linear mapping is applied to the convolution result of each convolution kernel:

$$\hat{C} = f(C)$$

where the function $f$ is one of the commonly used activation functions for neural network models.

The convolution windows of sizes 2 and 3 are used at the same time to obtain the binary and ternary features of sentences.
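A minimal NumPy sketch of the two parallel convolutions is given below. It assumes ReLU as the activation function (the text only says a commonly used one is applied), no padding, and a stride of one word; `conv_text` and `relu` are hypothetical helper names.

```python
import numpy as np

def conv_text(S, W, b):
    """Slide a window of h words over the sentence matrix S.

    S: (n, m) sentence matrix, W: (F, h, m) filter weights, b: (F,) biases.
    Returns the pre-activation feature matrix of shape (n - h + 1, F).
    """
    n = S.shape[0]
    F, h, _ = W.shape
    C = np.empty((n - h + 1, F))
    for i in range(n - h + 1):
        # Inner product of every filter with the current h-word window.
        C[i] = np.tensordot(W, S[i:i + h], axes=([1, 2], [0, 1])) + b
    return C

def relu(C):
    """Element-wise non-linear mapping (assumed activation)."""
    return np.maximum(C, 0.0)

rng = np.random.default_rng(0)
S = rng.normal(size=(80, 512))           # stand-in for an embedded sentence
features = {}
for h in (2, 3):                         # binary and ternary windows
    W = rng.normal(scale=0.01, size=(256, h, 512))
    b = np.zeros(256)
    features[h] = relu(conv_text(S, W, b))
print(features[2].shape, features[3].shape)  # (79, 256) (78, 256)
```

The window of size 2 produces one feature per adjacent word pair and the window of size 3 one per word triple, which is why the two outputs differ by one row.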

3.2.3. K-Max Pooling Layer: Feature Dimensionality Reduction

After the sentence is convolved, the extracted features are passed to the pooling layer. The pooling layer further aggregates these features and simplifies their expression. The K-max pooling operation is used in the proposed model: the Top-K maximum values of each filter are selected to represent the semantic information captured by that filter [20]. The value of K is determined by the length of the sentence vector and the size of the convolution window.

Since the number of convolution kernels is set to 256, the sentence representation matrix generated after pooling has size $K \times 256$.

The binary and ternary feature vectors obtained from the convolution and pooling layers are spliced together by the fusion layer and used as the input matrix of the BLSTM model.
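The pooling and fusion steps can be illustrated with the sketch below. Several points are assumptions rather than details from the paper: the K values kept for each filter stay in their original sequence order, the value `k = 5` is arbitrary, and the fusion layer is modeled as concatenation along the filter axis. `k_max_pool` is a hypothetical helper name.

```python
import numpy as np

def k_max_pool(features, k):
    """Keep the k largest activations of each filter, in sequence order.

    features: (positions, filters) -> returns (k, filters).
    """
    # Row indices of the k largest entries in every column (unordered).
    idx = np.argpartition(features, -k, axis=0)[-k:]
    # Sort the indices so the kept values follow the original word order.
    idx = np.sort(idx, axis=0)
    return np.take_along_axis(features, idx, axis=0)

rng = np.random.default_rng(0)
bigram = rng.random((79, 256))    # stand-in for the window-2 feature map
trigram = rng.random((78, 256))   # stand-in for the window-3 feature map
k = 5                             # hypothetical K, chosen only for the demo

pooled2 = k_max_pool(bigram, k)
pooled3 = k_max_pool(trigram, k)
fused = np.concatenate([pooled2, pooled3], axis=1)  # assumed fusion: concatenation
print(fused.shape)  # (5, 512)
```

Each row of `fused` can then be treated as one time step of the sequence that is fed to the BLSTM.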

3.3. BLSTM Text Sentiment Analysis Feature Extraction

LSTM memorizes the information of the previous nodes through the gate mechanism, so it is very suitable for processing sequence data [21, 22]. The model consists of multiple repeated modules to form a sequence. The input of each module contains not only the information of the current word but also the output of the hidden state of the previous word. The final model will obtain a vector representation containing the entire sentence information, so as to achieve sentiment classification. In LSTM, each node is more complicated than the traditional RNN, and its model structure is shown in Figure 2.

Each control unit of LSTM contains five parts: three gates (input gate $i_t$, output gate $o_t$, and forget gate $f_t$), the cell state $c_t$, and the hidden layer state $h_t$. The first step of LSTM is to discard part of the information from the cell state. This operation is determined by the forget gate $f_t$.

The forget gate reads the hidden state $h_{t-1}$ of the previous node and the input word vector $x_t$ of the current sequence and then outputs a value between 0 and 1 for the cell state $c_{t-1}$ of the previous node. This value indicates how much of the cell state information of the previous time node the current unit retains: 1 means completely retained, and 0 means completely discarded. The specific implementation is as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

where $\sigma$ represents the sigmoid function, and $W_f$ and $b_f$ are the weight matrix and bias of the forget gate.

The second step of LSTM determines the amount of information added to the current cell state. This step requires two operations. The first is the sigmoid layer of the input gate, which determines the information that needs to be updated, namely:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

The second is the tanh layer, which generates a candidate vector $\tilde{c}_t$, namely:

$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)$$

After obtaining $i_t$ and $\tilde{c}_t$, the state of the old cell can be updated. Multiplying the old state $c_{t-1}$ by the forget gate $f_t$ discards the information that needs to be discarded. Then, $i_t \odot \tilde{c}_t$ is added to get the new cell state $c_t$:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

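The third step of LSTM is to determine the output of the model. This output is calculated from the current cell state $c_t$, the output $h_{t-1}$ at the previous moment, and the current input $x_t$. The output gate $o_t$ and the hidden state $h_t$ are calculated as follows:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t \odot \tanh(c_t)$$

Through the output gate, the model finally gets the hidden layer output $h_t$.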
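The three steps above (forget, update, output) can be written as a single LSTM time step. The sketch below uses the standard LSTM formulation; the parameter names `W_f`, `W_i`, `W_c`, `W_o` and their biases are notation assumed for this illustration, and each weight matrix acts on the concatenation of the previous hidden state and the current input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p is a dict of weight matrices and bias vectors."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])       # forget gate: what to keep of c_{t-1}
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])       # input gate: what to update
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde           # new cell state
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                     # new hidden state
    return h_t, c_t

def init_params(input_dim, hidden_dim, rng):
    """Small random weights and zero biases for the four transforms."""
    p = {}
    for name in ("f", "i", "c", "o"):
        p["W_" + name] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
        p["b_" + name] = np.zeros(hidden_dim)
    return p

rng = np.random.default_rng(0)
params = init_params(input_dim=4, hidden_dim=3, rng=rng)
h, c = np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(5, 4)):   # a toy sequence of five input vectors
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)  # (3,) (3,)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every component of the hidden state stays strictly between -1 and 1.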

For the text sequence $x = (x_1, x_2, \ldots, x_n)$, after this series of operations, LSTM obtains the output vectors $(h_1, h_2, \ldots, h_n)$ of the model. Finally, the model averages the output vectors over all time steps to obtain the vector representation of the entire text, $\bar{h} = \frac{1}{n}\sum_{t=1}^{n} h_t$.

Since the one-way LSTM can only capture the forward semantic information of the text, it cannot acquire the reverse semantic information. Therefore, the proposed model uses BLSTM to obtain both the forward and reverse semantic information of the text. Finally, the text vector representation of BLSTM is $h = [\overrightarrow{h}; \overleftarrow{h}]$, the concatenation of the forward and backward text vectors [23, 24].
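A bidirectional pass can be sketched by running one LSTM over the sequence and a second LSTM over the reversed sequence, then concatenating the two results. The sketch below is an illustration under assumptions: the four LSTM transforms are stacked into one matrix for brevity, the text vector is the mean of the hidden states as described above, and `run_lstm` and `blstm_text_vector` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_lstm(X, W, b, H):
    """Run an LSTM over X (T x D) and return all hidden states (T x H).

    W (4H x (H + D)) and b (4H,) stack the forget, input, candidate,
    and output transforms in that order.
    """
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in X:
        z = W @ np.concatenate([h, x_t]) + b
        f, i, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[3 * H:])
        c = f * c + i * np.tanh(z[2 * H:3 * H])
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)

def blstm_text_vector(X, fwd, bwd, H):
    """Return per-step BLSTM states (T x 2H) and their mean (2H,)."""
    h_fwd = run_lstm(X, fwd[0], fwd[1], H)
    # The backward LSTM reads the reversed sequence; flip back to align steps.
    h_bwd = run_lstm(X[::-1], bwd[0], bwd[1], H)[::-1]
    states = np.concatenate([h_fwd, h_bwd], axis=1)
    return states, states.mean(axis=0)

T, D, H = 6, 4, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(T, D))
fwd = (rng.normal(scale=0.1, size=(4 * H, H + D)), np.zeros(4 * H))
bwd = (rng.normal(scale=0.1, size=(4 * H, H + D)), np.zeros(4 * H))
states, text_vec = blstm_text_vector(X, fwd, bwd, H)
print(states.shape, text_vec.shape)  # (6, 6) (6,)
```

The per-step `states` matrix is what an attention layer would consume, while `text_vec` plays the role of the overall text vector.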

3.4. CNN-BLSTM Model Incorporating Attention Mechanism

The CNN-BLSTM model not only considers the semantic information within the sentence but also considers the long-distance dependence between sentences. It realizes the effective mining and expression of text semantics [25]. In order to further capture the key text information that affects students’ emotions, the attention mechanism (AT) is introduced on the basis of the aforementioned methods to improve it. We propose a CNN-BLSTM model that integrates attention mechanism, namely, CNN-BLSTM-AT model. Its structure is shown in Figure 3.

The model is based on the following assumption: the semantic information of different parts of the text contributes differently to student sentiment classification. For important semantic information, more attention is allocated, and less attention is allocated to other parts. The key issue is how to allocate attention independently without receiving other information.

Observing the corpus, it is found that the importance of words or sentences in the text is highly dependent on the context. That is, the same word or sentence has different importance in different contexts. Inspired by this observation, the proposed model introduces a context vector $u_c$, which can be regarded as a high-level representation containing textual context information and is used to perceive important semantic features. Importance is distinguished by assigning an importance score (that is, an attention weight) to each hidden node in the sequence layer. The larger the weight is, the more important the node is for students' sentiment classification. The attention weight is calculated as follows:

$$u_i = \tanh\left(W_a h_i + b_a\right)$$

$$\alpha_i = \frac{\exp\left(u_i^{\top} u_c\right)}{\sum_{j=1}^{Q} \exp\left(u_j^{\top} u_c\right)}$$

where $h_i \in \mathbb{R}^{d}$ is the hidden vector corresponding to node $i$ in the sequence layer; $d$ is the length of the hidden vector; $u_i$ is the hidden vector in the attention layer; $\alpha_i$ represents the attention weight of node $i$ in the sequence layer and satisfies $\sum_{i=1}^{Q} \alpha_i = 1$; and $Q$ is the number of sentences in the text. The context vector $u_c$ can be randomly initialized and then adaptively learns context information during the training process.

According to the attention weight vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_Q)$, the hidden vectors of all nodes in the sequence layer are weighted to obtain the weighted semantic vector $v$. The calculation is as follows:

$$v = \sum_{i=1}^{Q} \alpha_i h_i$$
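The attention weighting can be sketched as below. This is a hedged illustration that assumes the common additive form: each hidden vector is passed through a tanh layer, scored against a context vector, and normalized with softmax. The names `W_a`, `b_a`, `u_c`, and `attention_pool` are notation chosen for this sketch, and the context vector is randomly initialized here as the text suggests.

```python
import numpy as np

def attention_pool(Hs, W_a, b_a, u_c):
    """Weight the hidden vectors Hs (Q x d) by context-aware attention.

    Returns the attention weights (Q,) and the weighted semantic vector (d,).
    """
    U = np.tanh(Hs @ W_a.T + b_a)     # attention-layer hidden vectors, (Q x a)
    scores = U @ u_c                  # similarity of each node to the context vector
    scores = scores - scores.max()    # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    v = alpha @ Hs                    # weighted sum of the hidden vectors
    return alpha, v

Q, d, a = 5, 6, 4                     # nodes, hidden size, attention size (toy values)
rng = np.random.default_rng(0)
Hs = rng.normal(size=(Q, d))          # stand-in for sequence-layer hidden vectors
W_a = rng.normal(scale=0.5, size=(a, d))
b_a = np.zeros(a)
u_c = rng.normal(size=a)              # randomly initialized context vector
alpha, v = attention_pool(Hs, W_a, b_a, u_c)
print(alpha.round(3), v.shape)
```

During training, `W_a`, `b_a`, and `u_c` would be updated by gradient descent together with the rest of the network, so the weights shift toward the nodes that matter for classification.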

The weighted semantic vector $v$ and the overall text semantic vector $h$ are then combined to obtain the final text feature vector $s$. The parameters of this combination are learned adaptively during the model training process. The feature vector calculated in this way highlights the role of semantic information related to student emotions and reduces the interference of unrelated information.

Finally, the feature vector $s$ representing the text is used as the input of softmax regression to obtain the probability distribution of student sentiment, which is expressed as follows:

$$\hat{y} = \mathrm{softmax}\left(W_s s + b_s\right)$$

where the matrix $W_s$ and the vector $b_s$ are the parameters that the model needs to learn, and $\hat{y}$ is a $K$-dimensional multinomial distribution that represents the proportion of each sentiment label. The number of sentiment labels is $K$, and $\sum_{k=1}^{K} \hat{y}_k = 1$.
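The softmax output layer can be illustrated with a short sketch. The dimensions and the names `W_s`, `b_s`, and `predict_proba` are assumptions chosen for the example; the number of labels is set to 6 only for the demo.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_proba(s, W_s, b_s):
    """Map a text feature vector s to a distribution over K sentiment labels."""
    return softmax(W_s @ s + b_s)

K, d = 6, 8                          # toy label count and feature size
rng = np.random.default_rng(0)
s = rng.normal(size=d)               # stand-in for the final text feature vector
W_s = rng.normal(scale=0.5, size=(K, d))
b_s = np.zeros(K)
y_hat = predict_proba(s, W_s, b_s)
print(y_hat.round(3), int(y_hat.argmax()))
```

The predicted label is the index of the largest probability, and the probabilities always sum to one.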

3.5. Model Training

The proposed model adopts supervised learning in the training process, taking the cross-entropy error between the true and the predicted emotion probability distributions of each text in the training set $D$ as the loss function. The calculation is as follows:

$$L(\theta) = -\sum_{d \in D} \sum_{k=1}^{K} y_k^{(d)} \log \hat{y}_k^{(d)}$$

where $\theta$ is the parameter set and $E = \{e_1, e_2, \ldots, e_K\}$ is the emotion label collection. Each component $\hat{y}_k^{(d)}$ of $\hat{y}^{(d)}$ corresponds to the predicted probability of the student’s emotion label $e_k$ for text $d$, and $y_k^{(d)}$ represents the true probability of the student’s emotion label $e_k$. To strengthen the generalization ability of the model, an L2 regularization term is added, and the loss function is set as follows:

$$J(\theta) = L(\theta) + \lambda \lVert \theta \rVert_2^2$$

where $\lambda$ is the regularization coefficient.

The goal of model training is to obtain the parameter vector $\theta$ that minimizes the loss function $J(\theta)$. The most commonly used optimization algorithms are based on gradient descent, which only requires the first derivative of the loss function. Its computational cost is relatively small, so it can be applied to large-scale datasets. The proposed model adopts the mini-batch gradient descent method for training. In each iteration, a small part of the samples is used in the calculation instead of all the samples. This speeds up model training while still searching for the optimal solution.
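The training procedure can be illustrated on a much smaller stand-in. The sketch below trains a plain softmax regression classifier, not the CNN-BLSTM-AT network, with mini-batch gradient descent on cross-entropy plus an L2 penalty. The synthetic data, the learning rate, the batch size, and the choice to average the loss over each batch are all assumptions made for the demo.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss_and_grads(W, b, X, Y, lam):
    """Mean cross-entropy plus L2 penalty on W, with gradients for W and b."""
    n = X.shape[0]
    P = softmax(X @ W + b)
    loss = -np.sum(Y * np.log(P + 1e-12)) / n + lam * np.sum(W ** 2)
    dZ = (P - Y) / n                   # gradient of the mean cross-entropy w.r.t. logits
    return loss, X.T @ dZ + 2 * lam * W, dZ.sum(axis=0)

def train(X, Y, lam=1e-3, lr=0.1, batch_size=16, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    history = []
    for _ in range(epochs):
        order = rng.permutation(len(X))            # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]  # one mini-batch
            _, dW, db = loss_and_grads(W, b, X[idx], Y[idx], lam)
            W -= lr * dW
            b -= lr * db
        history.append(loss_and_grads(W, b, X, Y, lam)[0])
    return W, b, history

# Synthetic two-class data: two Gaussian blobs (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, size=(100, 2)), rng.normal(-2.0, 1.0, size=(100, 2))])
labels = np.array([0] * 100 + [1] * 100)
Y = np.eye(2)[labels]
W, b, history = train(X, Y)
accuracy = np.mean(softmax(X @ W + b).argmax(axis=1) == labels)
print(round(history[-1], 3), accuracy)
```

Each update uses only `batch_size` samples, so one pass over the data performs many cheap parameter updates instead of a single expensive one.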

4. Experiment and Analysis

The experimental environment configuration is shown in Table 1.

At the same time, the parameters of the proposed model are set for the Chinese dataset. The parameter settings in the experiment are shown in Table 2.

Colleges and universities in Wuhan have implemented online teaching evaluation for more than ten years. Post-90s and post-00s college students are no longer unfamiliar with this feedback mode of teaching evaluation. After the questionnaire survey and document sorting, this article selected nearly 10,000 textual comments for training and testing the sentiment classifier. After manual screening, we obtained about 6010 positive comments, 4770 of which were used for training and 1240 were used for testing. There were about 3950 negative comments, of which 3150 were used for training and 800 were used for testing. At the same time, all comments made by 320 teachers in the same year are selected for the processing output of the whole process.

4.1. Classification Result Statistics

The dataset is divided into a training set and a validation set at a ratio of 10 : 1, and the statistical results are shown in Figure 4.

As can be seen from Figure 4, anger, surprise, and happiness are rarely misclassified: the number of misclassified texts in each of these categories does not exceed 20. In contrast, sadness and no emotion are misclassified frequently, with misclassifications accounting for about 65% of the test texts in these categories. Students use few words that express these emotions and do not readily vent them in teaching evaluation texts, so few such texts are available and many of them are misclassified. Figure 4 also shows that disgust (for example, hate and dislike) is the emotion students express most often in the teaching evaluation text. The teaching evaluation text has become a platform for students to vent their dissatisfaction, and this statistical result can also support the follow-up psychological counseling of students by the school and teachers.

4.2. Accuracy and Loss Comparison

In order to demonstrate the role of attention mechanism in student sentiment analysis, experiments were conducted using the experimental dataset. Among them, comparing the CNN-BLSTM model with or without attention mechanism, the results of accuracy and loss are shown in Figure 5.

It can be seen from Figure 5 that the model with the attention mechanism is significantly more accurate than the model without it. On the validation set in particular, the accuracy of the model with the attention mechanism is 0.93, while the accuracy of the model without it is only 0.70, a difference of 0.23. Similarly, comparing the loss curves, the difference between the two loss values on the training set is about 0.15. However, the loss value of the model without the attention mechanism on the validation set is as high as 1.1. The training accuracy and loss values show that the sentiment analysis performance of the proposed model is greatly improved after the attention mechanism is introduced, and that the tendency of the model to overfit is reduced.

4.3. Recall Rate Comparison

In order to verify the performance of the proposed model, a significance test experiment was designed. Using word frequency as a feature, the models in [11, 14, 17] and the proposed CNN-BLSTM-AT model are used for 10-fold cross-validation. The recall rate of the sentiment analysis results is shown in Figure 6.
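The 10-fold cross-validation protocol can be sketched as an index generator. The shuffling seed and the helper name `k_fold_indices` are assumptions for this illustration; the sample count of 9,960 simply adds the approximate positive and negative comment counts reported earlier.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)     # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

n = 9960  # about 6010 positive plus 3950 negative comments
splits = list(k_fold_indices(n, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 8964 996
```

A model would be trained on `train_idx` and evaluated on `test_idx` for each of the ten pairs, giving ten recall and F1 values per model.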

It can be seen from Figure 6 that, compared with the other models, the proposed model achieves the highest recall rate in most folds, and its recall rate is always greater than 0.57. The proposed model combines CNN and BLSTM and uses the attention mechanism to adaptively perceive the context information and extract the text features that affect students' emotions, which helps ensure the accuracy of the analysis. Reference [11] combines the existing sentiment dictionary with web page information and manual annotations to form a binary relationship knowledge base for sentiment analysis. It only considers binary relationships and does not take into account the influence of different contexts on emotional orientation. Therefore, its analysis of the massive teaching evaluation text dataset is not ideal, and its recall rate is lower than 0.56. Similarly, reference [14] uses machine learning for text emotion recognition. The natural language processing technology it uses cannot cope with multiple types of sentiment, such as disgust, so its recall rate is not ideal. Reference [17] combines an emotional dictionary with a multi-channel CNN for emotional analysis, which can extract more semantic information from emotional text and learn the emotional information hidden in it. Compared with the former two, its analysis ability is improved to a certain extent, and its recall rate reaches 0.65 at the highest. However, a single deep learning model has certain limitations when dealing with multiple types of emotions, so it still needs to be improved.

4.4. F1 Value Comparison

Similarly, the F1 value of the sentiment analysis results obtained by the proposed model and the models in [11, 14, 17] is shown in Figure 7.

It can be seen from Figure 7 that, over the 10 folds of the proposed model, the maximum F1 value is 0.845, the minimum is 0.748, and the difference is 0.097. The overall performance of student sentiment analysis is better. The main reason is that the attention mechanism introduced into the CNN-BLSTM model can better analyze emotions that are not easy to recognize, such as disgust and insensitivity. Reference [11] combines existing sentiment dictionaries with web page information and manual annotations to realize sentiment analysis, and reference [14] performs sentiment classification based on machine learning. Neither of them considers the multiple types of emotions in the teaching evaluation text dataset. Therefore, in most cases, their F1 value is lower than 0.75, and the difference between the F1 values of different folds reaches 0.16. Reference [17] combines a sentiment dictionary with a multi-channel CNN to complete the sentiment analysis, and its F1 value in some folds is better than that of the proposed model. This may be because the emotion types in those folds are relatively simple and easy to classify. For complex emotion types, the F1 value of the proposed model is higher and more stable.
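For reference, recall and F1 can be computed per emotion class from the true and predicted labels as sketched below. The toy label arrays are invented for illustration and are not results from the paper; `per_class_prf` is a hypothetical helper name.

```python
import numpy as np

def per_class_prf(y_true, y_pred, n_classes):
    """Return a list of (precision, recall, F1) tuples, one per class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    results = []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))   # correctly predicted as k
        fp = np.sum((y_pred == k) & (y_true != k))   # wrongly predicted as k
        fn = np.sum((y_pred != k) & (y_true == k))   # class-k texts that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results

y_true = [0, 0, 1, 1, 1, 2]   # toy labels, for illustration only
y_pred = [0, 1, 1, 1, 0, 2]
for k, (p, r, f1) in enumerate(per_class_prf(y_true, y_pred, 3)):
    print(k, round(p, 3), round(r, 3), round(f1, 3))
```

Because F1 is the harmonic mean of precision and recall, a class that is rarely predicted correctly pulls its F1 down even when the overall accuracy looks high.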

5. Conclusion

To obtain the emotional state of students with high precision, a sentiment analysis model using the attention mechanism and the CNN-BLSTM model is proposed. The phrase features and sequence features of the text are extracted through CNN and BLSTM, respectively, and the attention mechanism is used to adaptively perceive context information and extract the text features that affect students' emotions. The fused text features are input into the softmax classifier to complete sentiment classification. The performance of the proposed model is demonstrated experimentally based on the weibo_senti_100k dataset. The results show that sadness and no emotion are misclassified more often, with misclassifications accounting for about 65% of the test texts in these categories. Adding the attention mechanism significantly improves the accuracy of the analysis. In addition, the recall rate of the proposed model is greater than 0.57 in every fold, the maximum value of F1 is 0.845, the minimum is 0.748, and the difference is 0.097. For complex emotions, the model improves the accuracy of the analysis while remaining stable.

However, deep learning methods require large-scale training corpus to improve the learning ability and generalization ability of the model. In practice, the cost of manual labeling is often high, and it is difficult to obtain a large amount of labeling data for specific fields. Therefore, in future work, we will consider combining deep learning methods and transfer learning methods for student teaching evaluation text analysis to alleviate the impact of domain dependence.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Huanggang Normal University High-Level Cultivation Projects (no. 37), First-Class Undergraduate Courses Project of Huanggang Normal University (no. 2021CK41), Industry and Science Cooperation and Collaborative Education Project of Ministry of Education of China (no. 202101091023), Hubei Provincial Social Science Fund Prophase Funding Projects (no. 20ZD096), and Research Planning Foundation on Social Sciences of the Ministry of Education (no. 20YJA870017).