Abstract

In cross-cultural exchange, film is not only an important embodiment of a country’s cultural soft power but also one of the most direct and effective means of communication. The advent of the all-around well-off era has raised people’s demand for spiritual, cultural, and entertainment products, which promotes the vigorous development of the film culture industry. Domestic films, together with the expansion and development of China’s film market, are a reflection and extension of China’s culture and ideas, and they play an extremely important role in enhancing cultural self-confidence and cultural output. In order to better grasp the emotional tendency of the audience, understand viewing needs, and put forward suggestions for domestic film production, it is necessary to analyze the emotion of film reviews and mine their semantics in depth. Since the evaluation of film works involves many complex and changeable factors, the choice of model plays a significant role in emotion analysis. Deep learning models, represented by deep neural networks, are highly tolerant of sentence noise, discriminate information well, and can learn by themselves, which gives them great advantages in emotion classification tasks. This study examines the traditional emotion analysis methods in depth and puts forward an emotion analysis framework that combines the traditional emotion analysis method with a deep learning network. The framework improves the text vectorization representation and the emotion classification model used in emotion analysis. Its effectiveness is verified by corresponding experiments, which support the superiority of the approach.

1. Introduction

With the rapid development of China’s economy, people’s demand for spiritual culture is rising, and film culture has gradually become an important part of China’s cultural industry. Film is one of the most familiar and common forms of entertainment; it has grown at a rapid pace and is extremely influential [1]. Films have also become an important carrier of mass cultural communication and enrich the cultural life of the public. On the Internet, users are no longer limited to browsing web pages; they can freely edit, submit, publish, and share content [2]. Virtual cyberspace has become a place for emotional communication and expression between individuals, often more so than real society [3]. More and more people actively comment on events or products on social networks, expressing their views, emotions, attitudes, positions, and other important information, and these comments accumulate into valuable text data. Online evaluation data is contributed independently by users and fully reflects personal attitudes and views [4]. It is characterized by large volume, strong openness, multidimensional perspectives, and high value density. It not only reflects the main ideas of film viewers and how much they like or dislike a film but also influences people who have not yet seen the film, thereby changing the trend of the film’s box office and playing a decisive role in the further dissemination of culture [5]. The significance of the film review is shown in Figure 1.

In addition to reflecting the thoughts and feelings of individual viewers, film reviews also reflect viewers’ overall opinions of a film and promote the development of the film industry. China’s film industry is undergoing structural adjustment [6]. Against the background of high-quality development, the industry should, while pursuing high box office, pay more attention to quality, tap the audience’s emotional attitude toward films, and fully understand viewing needs; doing so has practical significance. Emotion analysis is the process of analyzing emotionally colored text comments by using certain rules [7]. It has important applications in network public opinion monitoring, product word-of-mouth analysis, market emotion analysis, and other fields. By making effective use of massive comment data and automatically mining its underlying emotional tendency, emotion analysis technology can help consumers quickly and intuitively understand the public’s attitudes and opinions on a film, so as to make more rational consumption decisions [8]. In addition, film investors and creators can obtain real and comprehensive market feedback and audience opinions more intuitively, use them as references for subsequent film creation, publicity, and marketing, and thereby obtain more commercial benefits.

Numerous studies have been conducted wherein deep learning techniques have been used for emotion analysis. As an example, in the study in [9], a fusion of multilayered convolutional neural network and long short-term memory neural network (MCNN_LSTM) was implemented for emotional analysis of Chinese reviews. The output of this model yielded almost eight times higher speed without reducing the prediction accuracy. The study in [10] implemented two Markovian models to develop a violence prediction pruning system for movies and videos. The first model worked on emotions extracted from text while the second model functioned on emotions extracted from video frames and finally predicted if the video was violent or a nonviolent one. The extracted facial emotions were fed into a machine learning-based face cascade classifier which classified the emotions as fear, anger, sad, neutral, or happy.

Therefore, starting from the practical problems faced by current emotion analysis tasks, we need to combine traditional emotion analysis methods with deep learning models to find a more efficient and accurate automatic emotion analysis method for film reviews [11]. Building on the basic knowledge of emotion analysis, this paper studies the traditional emotion analysis methods in depth and puts forward an effective emotion analysis method that combines the traditional method with a deep learning network [12]. This method improves the text vectorization representation and the emotion classification model used in emotion analysis, and its effectiveness is verified by corresponding experiments.

2.1. Research Status of Emotion Analysis

Research on text emotion classification (propensity analysis) originated in the early 21st century. At present, it has become a research hotspot in many fields, such as natural language processing and machine learning [13]. Emotion analysis techniques can be roughly divided into three categories: methods based on dictionaries and rules, methods based on traditional machine learning, and methods based on deep learning [14]. The approaches to text sentiment classification are shown in Figure 2.

Early emotion classification was mainly based on emotion dictionaries. A dictionary of positive and negative emotional words is constructed with the help of modern linguistics; the emotional tendency of a text is then determined from the tendency information of the words it contains, with the calculated emotional value serving as the evaluation standard at different levels of granularity [15]. A representative English emotion dictionary resource is WordNet-Affect, which is built on WordNet: words for six basic emotions are obtained by selecting and labeling WordNet’s sets of emotional synonyms, and WordNet is then used to expand them into an emotion dictionary [16]. Constructing an emotion dictionary manually often requires a lot of human and material resources, whereas automatic construction reduces the cost and enhances applicability in different fields.

The methods based on traditional machine learning are mainly divided into supervised learning, unsupervised learning, and semisupervised learning. In the emotion classification task, supervised learning is the mainstream. The general process is to first preprocess the text, then construct a model and train its classification function on labeled data, and finally use the trained model to predict the emotional polarity of new samples [17]. The early research framework for machine learning-based text sentiment classification represents text with the bag-of-words model, which ignores word order, syntax, and semantics, and then designs a classifier on top of this representation [18]. Commonly used machine learning classifiers are naive Bayes, maximum entropy, and support vector machines. The performance of classifiers is domain-dependent, and no single classifier is always the best across different domains.
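
As a minimal illustration of the supervised pipeline described above (a bag-of-words representation followed by a trained classifier), the following sketch implements a multinomial naive Bayes classifier with add-one smoothing. The toy reviews, the labels, and the function names `train_nb` and `predict_nb` are assumptions made for illustration and are not taken from the studies cited here.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count class and word frequencies from tokenized, labeled reviews."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for a token list."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training are skipped
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already tokenized training reviews.
train_docs = [
    ["great", "moving", "story"],
    ["boring", "plot", "weak", "acting"],
    ["great", "acting"],
    ["weak", "boring", "story"],
]
train_labels = ["pos", "neg", "pos", "neg"]
model = train_nb(train_docs, train_labels)
```

Because word order is discarded, the classifier sees only word counts, which is exactly the limitation of the bag-of-words representation noted above.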

In recent years, deep learning has been widely used in the fields of speech, image, and natural language processing, and text sentiment analysis based on deep learning has become a hot topic in academia [19]. For the embedding representation of words, researchers supplemented and improved the neural language model and proposed the Word2Vec word vector training model, whose distributed representation largely alleviates the long-standing problems of the curse of dimensionality and the loss of semantic information [20]. The GloVe model was then proposed to merge the local feature information captured by a sliding window with global statistical information to obtain the vector representation of words. Deep learning sentiment analysis models have continued to evolve over time [21]. Convolutional neural networks were first used in image classification tasks and were later applied to text classification, where they achieved better results on many benchmark datasets. Vectorized word representations can also be obtained by pretraining on large corpora with tasks such as filling in missing words in sentences and predicting text [22]. Domestic scholars, on the other hand, have proposed the hierarchical document coding models Conv-GRNN and LSTM-GRNN on the basis of the research results of foreign scholars and explored practical cases in depth. Experimental results on the restaurant review dataset and the IMDB movie review dataset show that this method improves significantly on existing methods.

2.2. Research Status of the Emotional Analysis of Film Reviews

Domestic scholars first introduced machine learning models into the sentiment classification task on a movie review dataset and compared three classical classification algorithms, namely, the naive Bayes model, the maximum entropy model, and the support vector machine, and two kinds of feature weights, namely, word frequency and Boolean value [23]. Features such as n-grams, POS tags, and location were also discussed, and the classification accuracy of machine learning methods was found to be higher than that of manual judgment, with SVM achieving the highest accuracy and NB the lowest, laying the foundation for research on sentiment classification based on machine learning [24]. Scholars then used a feature framework based on unigrams and bigrams to first extract features from the text and then train the model on movie reviews labeled as negative and positive [25]. Combining a movie review domain lexicon with traditional machine learning to complete sentiment analysis yields higher accuracy than sentiment classification with machine learning alone.

Scholars also combined the CNN and LSTM models and conducted experimental comparisons of sentiment analysis on the IMDB movie dataset using CNN, LSTM, and CNN-LSTM models, respectively [26]. In the CNN-LSTM model, a convolutional neural network is used for feature extraction, and a long short-term memory (LSTM) network performs sequence prediction. The architecture was initially developed for sequence prediction problems and for applications that generate textual descriptions from image sequences. The experimental results show that the CNN-LSTM model significantly improves efficiency and accuracy compared with the traditional models and is superior in short-text sentiment analysis [27]. China has the largest population market in the world and therefore the highest number of published online movie reviews. After an in-depth study of Chinese movie classification and movie review sentiment classification, scholars proposed using word co-occurrence features to construct a low-dimensional variant of the vector space model [28]. In addition, because Chinese is richer in semantics and emotion than English, term frequency weighting is used instead of Boolean weighting, with n-gram and part-of-speech features as references, to better capture the semantic information behind sentences [29]. On the basis of this research, scholars analyzed and categorized strongly emotional texts and established a CNN-based emotion classification model for movie reviews that combines the CNN with word vectors [30]. The Word2Vec algorithm is used to transform the text data into vectors that are fed into the model for sentiment classification; after training on tens of thousands of Chinese movie reviews, the model achieves higher accuracy than the traditional SVM method [31].
The movie review sentiment analysis model is shown in Figure 3.

In summary, sentiment analysis technology has become increasingly mature, but the IMDB dataset is usually chosen as the movie review dataset, which has a small data volume and lacks timeliness in its evaluation content [32]. At the same time, the existing models applied to domestic movie reviews are weak in real-time performance and cannot extract local semantic features and temporal information at the same time. In specific experiments, reviews with intermediate scores are also removed, and further research on the overall sentiment classification results is lacking.

3. Design of Application Model

3.1. Basic Knowledge of Text Emotion Analysis

Text emotion analysis is also called opinion mining or emotion tendency analysis. It refers to the process of extracting, analyzing, and processing subjective, emotionally colored text by using techniques from natural language processing and text mining. At this stage, research on text emotion analysis covers many fields, including natural language processing, text information mining, information retrieval, information extraction, and machine learning. It has won the continuous attention of many scholars and research institutions and has become one of the hot research topics in natural language processing and text mining. According to the granularity of analysis, emotion analysis tasks can be divided into text level, sentence level, and word or phrase level; according to the task type, they can be divided into subproblems such as emotion classification, emotion retrieval, and emotion extraction. The basic process of text emotion analysis is shown in Figure 4; it mainly includes obtaining the initial data set, text preprocessing, constructing the corpus and emotion thesaurus, and producing the emotion analysis results.

In most cases, the emotional tendency of a text is divided into two classes, positive and negative. A large number of comments can be found on the Internet, and to some extent they reflect people’s emotional tendency toward certain things or public opinion events. By organizing and analyzing these comments, we can provide more people with references for decision-making or help related organizations understand public opinion. Manual analysis is unrealistic because the data volume and the labor cost are too high, so text sentiment analysis has attracted the attention of more and more researchers. At this stage, sentiment analysis methods can be divided into two main categories: methods based on sentiment dictionaries and methods based on machine learning. An emotional dictionary is essentially a collection of emotional words, that is, words that express different emotional colors. After the emotional words in a text have been identified and extracted, they are matched one by one with the words in the emotional dictionary. If the emotional tendency of the text is judged only by the matched emotional words, the result may be ambiguous. Therefore, after the matching is completed, degree adverbs, negation words, and adversative (turning) words also need to be taken into account: they are analyzed in combination with the matched words, and the emotional tendency value is calculated according to the set rules, so as to grasp the real emotion of the text accurately and obtain its final emotional tendency category. It follows that the key to the dictionary-based emotion analysis method is to construct an emotion dictionary and formulate the calculation rules for the emotional tendency value. Compared with constructing an emotional dictionary manually, researchers now devote more attention to techniques for constructing emotional dictionaries automatically.
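
The dictionary-matching procedure described above can be sketched as follows. The word lists, the numeric weights, and the specific rules (a negation word flips the sign of the next emotional word, a degree adverb scales it, and a turning word halves the score accumulated before it) are assumptions chosen for illustration; they are not the calculation rules of any particular study.

```python
# Hypothetical miniature lexicons; real dictionaries are far larger.
SENTIMENT = {"good": 1.0, "wonderful": 2.0, "bad": -1.0, "boring": -1.5}
DEGREE = {"very": 1.5, "slightly": 0.5}
NEGATION = {"not", "never"}
TURNING = {"but", "however"}


def lexicon_score(tokens):
    """Return the emotional tendency value of a tokenized sentence."""
    score = 0.0
    modifier = 1.0  # pending effect of negations and degree adverbs
    for tok in tokens:
        if tok in NEGATION:
            modifier *= -1.0
        elif tok in DEGREE:
            modifier *= DEGREE[tok]
        elif tok in TURNING:
            # Assumed rule: the clause after a turning word matters more,
            # so the score accumulated so far is down-weighted.
            score *= 0.5
            modifier = 1.0
        elif tok in SENTIMENT:
            score += modifier * SENTIMENT[tok]
            modifier = 1.0  # modifiers apply to one emotional word only
    return score


def polarity(tokens):
    """Map the tendency value to a positive/negative/neutral category."""
    value = lexicon_score(tokens)
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"
```

For example, in "the plot is not very good" the matched word "good" alone would suggest a positive review, whereas combining it with the negation and the degree adverb yields a negative tendency value.
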
The pointwise mutual information (PMI) between words is shown in the following equation:
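
For reference, the standard definition of pointwise mutual information is assumed here. The notation is an assumption: $P(w_1, w_2)$ is the probability that the words $w_1$ and $w_2$ occur together, and $P(w_1)$ and $P(w_2)$ are the probabilities that each word occurs on its own.

```latex
\[
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1)\, P(w_2)}
\]
```

A positive value indicates that the two words co-occur more often than would be expected if they were independent, a value of zero indicates independence, and a negative value indicates that they tend not to appear together.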

Therefore, PMI can be used to measure the similarity between words and help people to automate the task of building sentiment lexicons. When automating the construction of the sentiment lexicon, the first step is to select some sentiment words whose sentiment polarity is already known as seed words and use these seed words as a benchmark to calculate the sentiment tendency value of the target words using PMI and add them to the sentiment lexicon.
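
The seed-word procedure described above can be sketched as follows. Probabilities are estimated from document-level co-occurrence counts, and the tendency value of a target word is taken as its total PMI with the positive seeds minus its total PMI with the negative seeds. The toy corpus, the seed words, the function names, and the choice to treat unseen word pairs as neutral (PMI of 0) are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def make_pmi(docs):
    """Build a PMI function from document-level co-occurrence counts."""
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for tokens in docs:
        unique = sorted(set(tokens))
        word_df.update(unique)
        pair_df.update(combinations(unique, 2))

    def pmi(a, b):
        pair = (a, b) if a < b else (b, a)
        joint = pair_df[pair]
        if joint == 0:
            return 0.0  # assumption: unseen pairs are treated as neutral
        # P(a, b) / (P(a) P(b)) with probabilities estimated as df / n_docs
        return math.log2(joint * n_docs / (word_df[a] * word_df[b]))

    return pmi


def seed_tendency(word, pos_seeds, neg_seeds, pmi):
    """Tendency value of a word relative to positive and negative seeds."""
    positive = sum(pmi(word, seed) for seed in pos_seeds)
    negative = sum(pmi(word, seed) for seed in neg_seeds)
    return positive - negative


# Hypothetical tokenized reviews and seed words.
corpus = [
    ["touching", "good"],
    ["touching", "good", "story"],
    ["dull", "bad"],
    ["dull", "bad", "story"],
]
pmi = make_pmi(corpus)
```

Words whose tendency value is clearly positive or negative can then be added to the lexicon with the corresponding polarity, while words near zero, such as "story" in this toy corpus, are left out.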

3.2. Text Vocabulary Vectorization

The number of comments freely edited by different users is huge, and the comments are highly unstructured. Without structured or standardized syntax and patterns, ready-made mathematical or statistical models cannot be used directly for processing and analysis. Therefore, we first need to transform the raw text into a representation that a machine learning algorithm can handle easily. Words are the most basic unit of text. The words in comments are transformed into real-valued vectors or other forms, and the learning model is trained and makes decisions based on this formal representation. Text representation is the foundation of natural language processing, and its quality determines the overall system performance. The earliest model that uses words as the basic processing unit is the bag-of-words model, whose limitations are described below. Google’s efficient open-source tool Word2Vec, released in 2013, addressed these limitations. Its neural network structure is shown in Figure 5.

The bag-of-words model puts all words into a bag, ignoring their order, syntax, and other elements; it treats a text as a mere combination of words and regards the occurrence of each word as independent of the others. The model is easy to construct, but it only symbolizes words and cannot retain any information about their meaning, so there is a semantic gap. Moreover, the vector dimension depends on the number of words in the corpus, and the constructed word vector matrix is too sparse, resulting in the curse of dimensionality.
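
A small sketch makes these limitations concrete. The helper names `build_index` and `bow_vector` and the example sentences are assumptions made for illustration.

```python
def build_index(docs):
    """Assign one vector dimension to every distinct word in the corpus."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def bow_vector(tokens, index):
    """Count how often each vocabulary word occurs; order is discarded."""
    vec = [0] * len(index)
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1
    return vec


# Hypothetical tokenized sentences.
docs = [
    ["dog", "bites", "man"],
    ["man", "bites", "dog"],
    ["a", "slow", "and", "tedious", "film"],
]
index = build_index(docs)
vectors = [bow_vector(doc, index) for doc in docs]
```

The first two sentences have opposite meanings but identical vectors, and every vector has as many dimensions as the corpus has distinct words, most of them zero.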

3.3. Analysis Model

The Continuous Bag-of-Words (CBOW) model predicts the current word from its context. Since the CBOW model has multiple context words, their word vectors are averaged. The average of the input word vectors can be expressed as the following equation:

Taking this average as the semantic representation of the context, the probability of the central target word is expressed as follows:

The word vector is the only neural network parameter in the CBOW model. The mathematical expression for optimizing the word vector matrix to maximize the log-likelihood of all words is as follows:
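
Under the standard Word2Vec formulation, which is assumed here, the three expressions referred to above can be written as follows. The notation is an assumption: $c$ is the radius of the context window, $v_w$ and $u_w$ are the input and output vectors of word $w$, $V$ is the vocabulary, and $T$ is the number of words in the corpus.

```latex
% Average of the context word vectors around position t
\[
h_t = \frac{1}{2c} \sum_{\substack{-c \le j \le c \\ j \ne 0}} v_{w_{t+j}}
\]

% Probability of the central word given its context (softmax over the vocabulary)
\[
P(w_t \mid w_{t-c}, \ldots, w_{t+c}) =
  \frac{\exp\left(u_{w_t}^{\top} h_t\right)}{\sum_{w \in V} \exp\left(u_w^{\top} h_t\right)}
\]

% Log-likelihood maximized over the word vectors
\[
\mathcal{L}_{\mathrm{CBOW}} = \sum_{t=1}^{T} \log P(w_t \mid w_{t-c}, \ldots, w_{t+c})
\]
```

The denominator of the softmax sums over the whole vocabulary, which is the source of the computational cost that hierarchical softmax and negative sampling are designed to reduce.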

The probability of occurrence of each word is shown below.

The mathematical expression for optimizing the word vector matrix and maximizing all context log-likelihoods is as follows:
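
These two expressions appear to describe the Skip-Gram counterpart of CBOW, in which the central word is used to predict each of its context words. Under that assumption, and with assumed notation ($c$ the radius of the context window, $v_w$ and $u_w$ the input and output vectors of word $w$, $V$ the vocabulary, and $T$ the number of words in the corpus), the standard Word2Vec forms are:

```latex
% Probability of a context word given the central word
\[
P(w_{t+j} \mid w_t) =
  \frac{\exp\left(u_{w_{t+j}}^{\top} v_{w_t}\right)}{\sum_{w \in V} \exp\left(u_w^{\top} v_{w_t}\right)}
\]

% Log-likelihood of all context words, maximized over the word vectors
\[
\mathcal{L}_{\mathrm{SG}} = \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log P(w_{t+j} \mid w_t)
\]
```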

CBOW is more applicable to small datasets, while Skip-Gram performs better on large corpora. The CBOW model is widely used in various ML applications. In [33–35], a bidirectional gated recurrent unit (GRU) was implemented together with an attention mechanism. In that approach, the CBOW model was modified by redefining the context window and using a weighting module to extract the word vectors from the text. To alleviate the huge computational cost, two optimization methods have been proposed: hierarchical softmax and negative sampling. Hierarchical softmax is an approximation technique inspired by binary trees. It replaces the softmax layer with a hierarchical layer in which words are the leaves. This decomposes the probability of one word into a sequence of probability calculations along a path in the tree, which eliminates the need for expensive normalization over all words [36, 37] and thus increases the speed of word prediction. In this way, each output node can obtain the preceding and the following context at the same time, deal with more complex text expressions, and improve the sequential learning ability of the model.

In this paper, we mainly use a TextRank algorithm that combines TF-IDF and average information entropy for keyword extraction, which is referred to as TIHTextRank. The algorithm extracts keywords by partitioning the text into constituent units (i.e., words) and building a graph model that ranks the important components of the text through a voting mechanism, so that only the information of the single text itself is needed. The formula for calculating the TextRank score is as follows:
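
The standard weighted TextRank score is assumed here. The notation is an assumption: $V_i$ is a node (word) of the graph, $\mathrm{In}(V_i)$ is the set of nodes pointing to $V_i$, $\mathrm{Out}(V_j)$ is the set of nodes that $V_j$ points to, $w_{ji}$ is the weight of the edge from $V_j$ to $V_i$, and $d$ is the damping factor, commonly set to 0.85.

```latex
\[
WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)}
  \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \, WS(V_j)
\]
```

Each node distributes its score among its neighbors in proportion to the edge weights, which is the voting mechanism mentioned above.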

When using the TextRank algorithm to calculate the score of each point in the graph, it is often necessary to assign an arbitrary initial value to the points in the graph. Usually, the initial weight of the node is set to 1, and then, iterative operations are performed until convergence. For any word, its combined weight is calculated as follows:

From the TextRank algorithm, the transfer probability between any two points is calculated by the following formula:

Therefore, the TIHTextRank score of the final point is calculated as follows:
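
Since the exact TIHTextRank formulas are not reproduced here, the following is only a sketch of the general idea under stated assumptions: words are linked when they co-occur within a window, each word carries a positive combined weight (standing in for the TF-IDF and average-information-entropy weight), and the transfer probability from a word to one of its neighbors is that neighbor's weight divided by the total weight of all neighbors. The function names, the window size, and the toy weights are hypothetical.

```python
def cooccurrence_graph(tokens, window=2):
    """Link each word to the words that follow it within the window."""
    graph = {tok: set() for tok in tokens}
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != tok:
                graph[tok].add(other)
                graph[other].add(tok)
    return graph


def weighted_textrank(graph, weight, d=0.85, tol=1e-6, max_iter=200):
    """Iterate TextRank with weight-biased transfer probabilities."""
    score = {node: 1.0 for node in graph}  # initial score of every node
    for _ in range(max_iter):
        new_score = {}
        for node in graph:
            rank = 0.0
            for neighbor in graph[node]:
                # Assumed transfer probability neighbor -> node: the share
                # of node's weight among all words linked to the neighbor.
                total = sum(weight[k] for k in graph[neighbor])
                rank += weight[node] / total * score[neighbor]
            new_score[node] = (1 - d) + d * rank
        delta = max(abs(new_score[n] - score[n]) for n in graph)
        score = new_score
        if delta < tol:  # stop once the scores have converged
            break
    return score


# Hypothetical tokenized text and word weights.
tokens = "film plot film actor film music".split()
graph = cooccurrence_graph(tokens, window=1)
uniform = weighted_textrank(graph, {w: 1.0 for w in graph})
boosted = weighted_textrank(
    graph, {"film": 1.0, "plot": 1.0, "actor": 1.0, "music": 3.0}
)
```

With uniform weights the sketch reduces to ordinary TextRank on an unweighted graph; raising the weight of a word raises its final score relative to words in the same structural position.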

Finally, the TIHTextRank scores of the words are calculated, the top-ranked words in descending order of score are selected as keywords, and these keywords are fed, together with the context words, into the CBOW model to predict the central word. Since the model is ultimately used to solve a classification problem, the important features obtained from the self-attention weight adjustment layer are input to the fully connected layer, and the final classification results are then derived by the softmax classifier. Generally speaking, text data cannot be input directly into a convolutional neural network. Through the text preprocessing method mentioned above, the original text is transformed into a two-dimensional array of word vectors and then sent to the input of the model for training. Through the above tasks, the word vectors output by the pretraining model reflect the overall semantic information of the text as accurately and comprehensively as possible. To prevent overfitting, a dropout mechanism is introduced between the fully connected layer and the softmax layer to randomly deactivate some neurons during model training.
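
The form of the self-attention layer is not specified above. As a rough sketch, assuming scaled dot-product self-attention without learned projection matrices (queries, keys, and values are all the input feature vectors themselves), the weight adjustment can be written in plain Python as follows. The function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Re-weight each feature vector by its similarity to all the others.

    features: list of equal-length vectors, one per position in the text.
    """
    dim = len(features[0])
    output = []
    for query in features:
        # Scaled dot product between this position and every position.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        # Weighted sum of all feature vectors.
        output.append(
            [sum(w * vec[j] for w, vec in zip(weights, features)) for j in range(dim)]
        )
    return output
```

Because every position attends to every other position directly, two related words influence each other regardless of how far apart they are in the text. The same `softmax` function, applied to the outputs of the fully connected layer, gives the class probabilities of the final classifier.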

4. Experiments and Results

To further validate the performance of the SA-BLCNN model proposed in this paper, the following experiments are designed based on the word vectors obtained from training the KWCBOW model, with accuracy, recall, and F1 value used as the performance evaluation indexes of the model on the movie review dataset. When training a model, the parameter settings often have a large impact on performance. Therefore, Experiment 1 varies one parameter at a time while fixing the others, covering the dimensionality of the word vectors, the dropout value, the hidden layer size of the Bi-LSTM network, the number of epochs, and the batch size, in order to evaluate model performance against the evaluation indexes and obtain the optimal parameter values. Since the SA-BLCNN model is a hybrid sentiment analysis network built on a single CNN network and a Bi-LSTM network, in Experiment 2 it is trained and compared with the traditional CNN (convolutional neural network) model, the LSTM (long short-term memory) network model, and the Bi-LSTM network model on the same dataset, so as to evaluate its performance. During parameter tuning, only a single parameter was varied in each comparison experiment in order to ensure the validity of the experiments. Initially, the main parameters of the SA-BLCNN model were set as shown in Table 1.
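
The evaluation indexes named above can be computed from true and predicted labels as in the following sketch. Precision is included because the F1 value is the harmonic mean of precision and recall. Binary labels with 1 as the positive class are assumed, and the function name and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/seen.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true and predicted sentiment labels (1 = positive review).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```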

In order to investigate the effects of four parameters, namely, word vector dimension, dropout, epoch, and batch size, on the model performance, four sets of comparison experiments were designed to determine the optimal parameter values. In order to investigate the effect of word vector dimensionality on the model performance, a comparison experiment was designed by adjusting the relevant parameters of the KWCBOW word vector training model to generate word vectors with different dimensions of 50, 100, 150, 200, 250, and 300 dimensions, and the experimental results are shown in Table 2 and Figure 6.

It can be seen that when the word vector dimension is 100, the SA-BLCNN model has the highest accuracy in the classification results. Therefore, the optimal word vector dimension for the SA-BLCNN model is 100. In training deep neural network models, the dropout mechanism is often introduced to prevent overfitting; dropout refers to randomly deactivating each neuron with a certain probability during training, which effectively simplifies the neural network structure. In this experiment, the dropout value is set to 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9 to investigate its effect on the classification performance of the SA-BLCNN model. The experimental results are shown in Table 3 and Figure 7.
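
The dropout mechanism can be sketched as follows. The common "inverted dropout" variant is assumed, in which the surviving activations are scaled by 1 / (1 - p) during training so that nothing needs to change at inference time; whether the SA-BLCNN implementation uses this variant is not stated. The function name and the fixed random seed are hypothetical.

```python
import random


def dropout(activations, p, rng, training=True):
    """Zero each activation with probability p (requires 0 <= p < 1)."""
    if not training or p == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - p
    # Survivors are scaled so the expected activation stays unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden = [1.0] * 1000
dropped = dropout(hidden, p=0.5, rng=rng)
```

With p = 0.5, roughly half of the activations are set to zero in each training step and the rest are doubled, so a different thinned network is trained at every step.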

The results above show that the SA-BLCNN model has the highest accuracy and F1 values when the dropout value is 0.5. Therefore, the optimal dropout value for the SA-BLCNN model is 0.5. During neural network training, the batch size often has a certain influence on the final training effect. When the batch size is too small, training takes longer, and the resulting oscillation is not conducive to convergence. Generally speaking, the larger the batch size, the more accurate the determined descent direction and the smaller the training oscillation; however, once the batch size increases beyond a certain point, the descent direction basically no longer changes, and the model can easily fall into a local optimum. The SA-BLCNN model proposed in this paper not only fuses the CNN and Bi-LSTM network structures but also introduces a self-attention mechanism, through which the relationships between words are enriched, the distance between long-range dependent features is shortened, and more representative features are extracted through the attention calculation. In summary, the proposed neural network-based hybrid sentiment analysis model SA-BLCNN performs best in the sentiment classification task on this dataset, which further verifies that it is feasible and effective to extract and fuse text features from different aspects using a hybrid network model.

5. Conclusion

With the continuous development of Internet technology, the channels through which the public obtains information have become more and more diversified. Douban Film, as a leading domestic film platform, not only contains information on thousands of films but also has hundreds of millions of active users. On this platform, the public can not only see information on the films they are interested in but also express their feelings and comments on those films. The value hidden in the massive comment information can provide users with real reference opinions and can also help businessmen, producers, and other film industry practitioners understand the public’s preferences, so as to cater to and lead the market and create excellent film works with more Chinese characteristics. In the text vectorization module, this paper first proposes an improved keyword extraction algorithm, TIHTextRank, by combining the TF-IDF algorithm, the average information entropy algorithm, and the traditional TextRank algorithm. On the basis of this algorithm, a text vectorization model that incorporates keyword extraction, KWCBOW, is further proposed by building on the CBOW model of Word2Vec. Finally, multiple sets of comparison experiments were designed and implemented to validate the KWCBOW model from both the linguistic perspective and the sentiment analysis task.

Although the system designed and implemented in this paper has achieved good results on the test set, some problems remain to be solved. The text vectorization module uses the TIHTextRank algorithm, which integrates the TF-IDF algorithm and the average information entropy algorithm. The algorithm uses the initial weights of words to construct the interword transfer matrix and thereby improves the traditional TextRank keyword extraction algorithm. In future work, we will consider further improving the keyword extraction technique by combining it with the LDA topic model. As for operating efficiency, although the average response time of the system was within 3 s in later tests, this still affects the user experience. The algorithm and hardware of the system need to be further optimized.

Data Availability

The datasets used during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflict of interest.