Abstract

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis technology. In recent years, neural networks have been widely used to extract features of aspects and contexts and have achieved dramatic improvements in retrieving sentiment features from comments. However, as comment information grows more complex, considering only sentence-level or word-level features may lose key textual information. Moreover, characters carry more microscopic features, so the fusion of features across three levels (sentences, words, and characters) should be considered to explore the internal relationships among different granularities. Based on this analysis, we propose a multifeature interactive fusion model for aspect-based sentiment analysis. Firstly, the text is divided into two parts, contexts and aspects; word embedding and character embedding are then associated to further explore the potential features. Secondly, to establish a close relationship between contexts and aspects, feature fusion of both is exploited in our model. Moreover, we apply the attention mechanism to calculate the fusion weights of features, so that key feature information plays a more significant role in sentiment analysis. Finally, we experimented on three benchmark datasets: the SemEval 2014 Restaurant and Laptop datasets and the Twitter dataset. The experimental results show that our model outperforms the baseline models.

1. Introduction

Sentiment analysis is an important and fundamental task in natural language processing, which aims to analyse the sentiment polarity of text. Aspect-based sentiment analysis is a fine-grained subtask of sentiment analysis whose aim is to analyse the sentiment polarity of the corresponding aspects in a sentence. For example, in "I really like the appearance of this laptop, but its after-sales service is not very good," two aspects are presented: "laptop appearance" and "after-sales service." The sentiment polarity for the laptop appearance is positive, while the sentiment polarity for the after-sales service is negative.

Traditionally, many researchers focused on statistics-based methods for sentiment analysis. Kiritchenko et al. [1] applied manual labeling to the sentiment classification of text. Wagner et al. [2] used an SVM for analysing the sentiment polarity of a sentence. In recent years, with the development of deep learning, more intelligent learning methods have been proposed [3–6], and researchers have applied neural networks to construct feature representation systems for sentiment analysis [7–9]. To extract feature representations, Dong et al. [10] applied a recursive neural network to analyse the relationship between an aspect and its contexts on a Twitter dataset. Tang et al. [11] divided text into aspects and contexts and fed them to an aspect-based sentiment analysis model built on long short-term memory networks. The relationship between aspects and contexts plays a significant role in this process. Ma et al. [12] combined a long short-term memory network with an attention mechanism to construct the IAN model relating aspects and their contexts. Although these methods are much better than the traditional ones, the performance of ABSA is still not good enough, because the extracted features are insufficient. In addition, most researchers who use neural networks for sentiment analysis adopt word embeddings along with word-level feature analysis. However, text has a strong hierarchical structure: characters form words and words form sentences. To fully extract the different levels of feature information in text, we believe that besides word information, character information should also be modeled for further text information extraction. Moreover, character-level features help alleviate noisy information caused by misspellings, allowing the model to focus on valuable information.

With the development of attention mechanisms, they have been widely applied in sentiment analysis [13–16]: weights are assigned to different tokens in the output texts. Most researchers incorporate the attention mechanism into neural network models for sentiment analysis, which improves classification performance. Ma et al. [12] proposed the IAN model, which interactively learns attention over aspects and contexts, and IAN proved helpful to the performance of ABSA. Chao et al. [14] applied alternating co-attention to learn contexts and targets, allowing the model to extract more important features in ABSA. Within neural network models, RNNs are difficult to parallelize, and the complexity of the networks causes vanishing or exploding gradients. Song et al. [16] applied an attention encoder network that eschews recurrence, and their model improves performance in aspect-based sentiment analysis. Overall, the attention mechanism has proven to play a crucial role in ABSA tasks.

Following the above analysis, we put forward an aspect-based sentiment analysis model named the Multifeature Interactive Fusion Model for Aspect-Based Sentiment Analysis (MFIF-ABSA). First, to extract more sufficient features, we feed aspects and their contexts into the model for embedded representations, including word-level and character-level embeddings; an LSTM extracts character-level feature information for both contexts and aspects. Second, for adequate interaction between context and aspect features, the extracted feature information of contexts and aspects is interactively fused. Moreover, the output features are weighted by the attention mechanism, which makes key features more evident in the sentiment analysis and improves classification performance. Finally, the extracted feature information is fed to the classification layer for sentiment classification. To effectively construct the relation between aspects and their contexts, our model performs an interactive operation during feature extraction. To evaluate the proposed approach, we examine our model on three datasets: the SemEval 2014 Restaurant and Laptop datasets and the Twitter dataset. The experimental results show that our method achieves state-of-the-art performance on all three datasets.

In summary, the major contributions of our work are as follows: (1) we extract hierarchical feature representations, including character embeddings, word embeddings, and sentence representations; these features of different granularities are fused interactively, which helps our model obtain more effective feature representations in ABSA. (2) We use attention mechanisms to bridge the hierarchical information of context and aspect, which allows our model to learn key information across different levels. (3) Increased feature interactivity enables our model to exchange feature information between contexts and aspects, thus allowing it to learn the interrelationships between them.

2. Related Work

Sentiment analysis is an important and fundamental task in natural language processing. Traditional methods mainly apply sentiment dictionaries to sentiment analysis [17, 18]. Kiritchenko et al. [1] proposed an SVM method for sentiment classification that is based on an emotional dictionary and requires much time for manual labeling. With the development of machine learning and deep learning, many researchers have applied neural network models to sentiment analysis, in which the original features are mapped into continuous real-valued vectors [19, 20]. The Tree-LSTM model has achieved good experimental results in sentiment analysis [21]. Jiang et al. [22] revealed the importance of the aspect in sentiment analysis: different aspects in the same sentence may have different sentiment polarities, so the features of the aspect should be taken into account. Many researchers emphasize the importance of the aspect in text and split the text into aspects and contexts for sentiment polarity prediction, which improves performance [23, 24]. TD-LSTM, proposed by Tang et al. [11], divides the context into left and right parts according to the position of the aspect, applies an LSTM to learn the features of each part, and then concatenates them, improving the performance of ABSA. TD-LSTM fully considers the semantic information of aspects in the text for better sentiment orientation analysis. With the development of attention mechanisms across various research fields [25–28], Wang et al. [29] proposed an attention-based LSTM for aspect-level sentiment classification; they concatenated the extracted aspect and context features and used the attention mechanism to calculate feature weights, highlighting the important information.

Attention mechanisms play an important role in bridging the relationship between aspects and contexts, which significantly improves the performance of ABSA. Ma et al. [12] proposed an interactive attention network model that uses attention mechanisms to capture important features of contexts and aspects and improves ABSA performance. To extract more important features in ABSA, Chao et al. [14] used alternating co-attention to learn contexts and targets. To address the parallelization problem of RNNs, Song et al. [16] applied an attention encoder network that eschews recurrence; their model improves performance in aspect-based sentiment analysis.

Based on the above analysis, we propose a multifeature interactive fusion model for aspect-based sentiment analysis, in which multilevel feature extraction enhances the ability to model feature representations and interactive feature fusion bridges their internal relationships, further improving classification accuracy. Features are extracted at both the character and word levels, yielding better feature representations.

3. MFIF-ABSA Model

In our paper, the input representation of a sentence is formalized as $s = \{w_1, w_2, \ldots, w_n\}$. We suppose that a context consists of $n$ words and a target consists of $m$ words, where the aspect is a subsequence of the context, $t = \{w_i, w_{i+1}, \ldots, w_{i+m-1}\}$.

In this section, we describe the details of the multifeature interactive fusion model for aspect-based sentiment analysis (MFIF-ABSA). Specifically, we first define the MFIF-ABSA task and then introduce the method in detail. The overall architecture of the MFIF-ABSA model is shown in Figure 1. The model processes two major input streams, contexts and aspects, each of which passes through the input embedding layer, the feature interaction layer, the attention neural network layer, and the classification layer.

3.1. Input Embedding Layer

First, words are embedded into word representations, i.e., vectors preprocessed with GloVe [30]. The whole word vocabulary is mapped into a continuous low-dimensional real-valued space. The vocabulary size is $|V|$, the embedding dimension is $d_w$, and the word embedding matrix is $E_w \in \mathbb{R}^{|V| \times d_w}$. Then, characters are embedded: each word is divided into its characters. For example, the word "beautiful" consists of the letters "b-e-a-u-t-i-f-u-l"; when a prefix such as "b-e-a-u-t" is known, the word "beautiful" can be easily predicted, which enhances the effect of semantic analysis. We represent the characters of a word as $c = \{c_1, c_2, \ldots, c_k\}$. The character dictionary size is $|V_c|$ and the mapped dimension is $d_c$, so we get the character embedding matrix $E_c \in \mathbb{R}^{|V_c| \times d_c}$. The embedding layer is divided into two parts: the left side is the word-level embedding and the right side is the character-level embedding.
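As a concrete illustration, the two lookups can be realized in Keras (the framework used in Section 4.1). This is a minimal sketch, not the authors' implementation; the vocabulary sizes and sequence lengths are illustrative assumptions, and only the embedding dimensions (300 and 100) come from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, CHAR_VOCAB_SIZE = 10000, 70  # |V| and |V_c|: assumed, not from the paper
MAX_WORDS, MAX_CHARS = 80, 16            # words per sentence, chars per word: assumed
D_WORD, D_CHAR = 300, 100                # embedding dimensions from Section 4.1

# Word-level stream: one integer index per word.
word_ids = layers.Input(shape=(MAX_WORDS,), dtype="int32", name="word_ids")
word_emb = layers.Embedding(VOCAB_SIZE, D_WORD)(word_ids)       # (batch, n, d_w)

# Character-level stream: one index sequence per word.
char_ids = layers.Input(shape=(MAX_WORDS, MAX_CHARS), dtype="int32", name="char_ids")
char_emb = layers.Embedding(CHAR_VOCAB_SIZE, D_CHAR)(char_ids)  # (batch, n, k, d_c)
```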

The fusion of character embedding and word embedding helps our model learn the morphological characteristics of text and has a positive effect on its performance. As shown in Figure 2, the right side of the figure is the character embedding. We apply a bidirectional long short-term memory network (Bi-LSTM) to capture the sequence information of each word from the character embedding. The LSTM can learn long-term dependencies between words in sentences and avoids the vanishing and exploding gradient problems inherent in traditional recurrent neural networks during training. Specifically, given the time step $t$, the character embedding is denoted as $x_t$, and the update procedure of the Bi-LSTM is as follows:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f),$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,$$
$$h_t = o_t \odot \tanh(c_t),$$
$$h_t^{\mathrm{bi}} = [\overrightarrow{h_t}; \overleftarrow{h_t}],$$

where $W_i, W_f, W_o, W_c$ and $U_i, U_f, U_o, U_c$ are weight matrices, $b_i, b_f, b_o, b_c$ are biases, $\sigma$ is the sigmoid activation function, $i_t$ is the input gate, $f_t$ is the forget gate, $o_t$ is the output gate, $\odot$ is the element-wise product, $c_t$ is the cell state, $\overrightarrow{h_t}$ is the hidden state of the forward LSTM, $\overleftarrow{h_t}$ is the hidden state of the backward LSTM, and $h_t^{\mathrm{bi}}$ is generated by concatenating $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$.

Character features are extracted by the LSTM and then concatenated with the word embedding features to help the model learn the microscopic information of the input text throughout training. The fused representations $m_i = [e_i^w; h_i^{\mathrm{bi}}]$, obtained by integrating word-level and character-level embedding information, are called the character-word feature fusion, where $e_i^w$ is the word embedding of the $i$-th word.

To facilitate understanding, the character-word fusion features of the context are represented as $M_c$, and those of the aspect are represented as $M_a$.
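Continuing the embedding sketch above, one plausible Keras realization of the character-word fusion runs a Bi-LSTM over each word's character sequence and concatenates its summary with the word embedding. The per-direction unit count of 50, the aspect length, and the layer sharing across streams are our assumptions:

```python
# Character encoder: a Bi-LSTM whose final forward/backward states
# summarize each word's morphology (50 units per direction: assumed).
char_encoder = layers.TimeDistributed(layers.Bidirectional(layers.LSTM(50)))

# Context stream M_c: concatenate word embeddings with character features.
m_context = layers.Concatenate(axis=-1)([word_emb, char_encoder(char_emb)])

# Aspect stream M_a, built the same way from its own (assumed) inputs.
MAX_ASPECT = 10  # aspect length: assumed
a_word_ids = layers.Input(shape=(MAX_ASPECT,), dtype="int32", name="a_word_ids")
a_char_ids = layers.Input(shape=(MAX_ASPECT, MAX_CHARS), dtype="int32", name="a_char_ids")
a_word_emb = layers.Embedding(VOCAB_SIZE, D_WORD)(a_word_ids)
a_char_emb = layers.Embedding(CHAR_VOCAB_SIZE, D_CHAR)(a_char_ids)
m_aspect = layers.Concatenate(axis=-1)([a_word_emb, char_encoder(a_char_emb)])
```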

3.2. Feature Interaction Layer

To enable our model to interactively learn the feature information between contexts and aspects, we use a pooling mechanism to extract context and aspect features. Maximum pooling selects the maximum value within a window (several consecutive vectors) as the output feature. The result of maximum pooling over the aspect features is concatenated with $M_c$ to achieve feature interaction, and the same operation is performed on the context information:

$$p_a = \mathrm{MaxPool}(M_a), \quad p_c = \mathrm{MaxPool}(M_c),$$
$$F_c = [M_c; p_a], \quad F_a = [M_a; p_c],$$
$$F = [F_c; F_a],$$

where $\mathrm{MaxPool}(\cdot)$ is the maximum pooling operation, $p_a$ and $p_c$ are the features obtained after the operation, and $F_c$ and $F_a$ represent the interaction information; they are named the context interaction feature and the aspect interaction feature. $F$ is the final feature fusion obtained through a connection of the interactive information from aspect and context.
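A sketch of the interaction step under our reading of it: each stream is max-pooled over time, and the pooled summary of the other stream is repeated across positions before concatenation. Broadcasting the pooled vector over time steps is our assumption, not a detail stated in the paper:

```python
def interact(m_self, m_other):
    """Attach the max-pooled summary of one stream to every position of the other."""
    pooled = layers.GlobalMaxPooling1D()(m_other)         # MaxPool over time: (batch, d)
    tiled = layers.RepeatVector(m_self.shape[1])(pooled)  # repeat to (batch, n, d)
    return layers.Concatenate(axis=-1)([m_self, tiled])

f_context = interact(m_context, m_aspect)  # context interaction feature F_c
f_aspect = interact(m_aspect, m_context)   # aspect interaction feature F_a
```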

3.3. Attention Neural Network Layer

A gated recurrent unit (GRU) network is used to further encode the hierarchical information from the preceding feature interaction layer ($F$). The GRU has only two gates, an update gate and a reset gate, and is thus simpler than the LSTM. With $x_t$ denoting the $t$-th element of the input sequence:

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z),$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r),$$
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h),$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$
$$h_t^{\mathrm{bi}} = [\overrightarrow{h_t}; \overleftarrow{h_t}],$$

where $z_t$ and $r_t$ are the update and reset gates, respectively; $W_z, W_r, W_h$ and $U_z, U_r, U_h$ are weight matrices and $b_z, b_r, b_h$ are biases; and $h_t^{\mathrm{bi}}$ is obtained by connecting the forward-propagating and backward-propagating GRU information.
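In Keras terms this encoding step might look like the following; 300 units per direction matches Section 4.1, and carrying only the context branch through the rest of the sketch is our simplification:

```python
# Bi-GRU over the interaction features; return_sequences keeps one hidden
# state per time step so the attention layer can weight them.
gru_out = layers.Bidirectional(layers.GRU(300, return_sequences=True))(f_context)
```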

The GRU output is processed with the attention mechanism. According to the feature weight scores, the attention mechanism highlights the contribution of effective features and suppresses the interference of redundant information during sentiment analysis; important feature information receives a higher weight. We use the softmax function to obtain normalized attention weights:

$$s_t = \tanh(W_a h_t^{\mathrm{bi}} + b_a),$$
$$\alpha_t = \frac{\exp(s_t)}{\sum_{j=1}^{n} \exp(s_j)},$$
$$r = \sum_{t=1}^{n} \alpha_t h_t^{\mathrm{bi}},$$

where $s_t$ is the score function and $W_a$ and $b_a$ are the weight matrix and bias.
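A minimal additive-attention sketch consistent with the formulas above; the Dense layer holds $W_a$ and $b_a$, and the exact form of the scorer is our assumption:

```python
scores = layers.Dense(1, activation="tanh")(gru_out)  # s_t = tanh(W_a h_t + b_a)
alpha = layers.Softmax(axis=1)(scores)                # normalize over time steps
# Weighted sum of the Bi-GRU states: r = sum_t alpha_t * h_t
r = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([gru_out, alpha])
```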

3.4. Classification Layer

We feed $r$ to the fully connected layer for sentiment prediction and use the softmax function to calculate the probability of each sentiment class. With $C$ categories to classify, the predicted distribution is

$$\hat{y} = \mathrm{softmax}(W_p r + b_p),$$

where $W_p$ and $b_p$ are the weight matrix and bias. The number of categories $C$ is 3. Cross-entropy is used as the loss function.
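Wiring the pieces above into a trainable classifier could look like this sketch. For brevity only the context branch built above is connected to the output, whereas the paper fuses both interaction streams; the learning rate is not recoverable from the source, so Adam's Keras default is used as a placeholder:

```python
probs = layers.Dense(3, activation="softmax")(r)      # C = 3 sentiment polarities

model = tf.keras.Model(
    inputs=[word_ids, char_ids, a_word_ids, a_char_ids], outputs=probs)
model.compile(optimizer=tf.keras.optimizers.Adam(),   # learning rate: placeholder
              loss="categorical_crossentropy",        # cross-entropy, as stated
              metrics=["accuracy"])
```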

4. Experiments and Analysis

In this section, we conduct an experimental study of the MFIF-ABSA model for aspect-based sentiment analysis. The first subsection introduces the experimental setup and explains the parameters used; the second introduces the datasets; the third introduces the comparison methods. The following subsections analyse the results of the baselines and of our model, as well as the effects of character embedding and attention; finally, for ease of understanding, a case analysis directly illustrates the advantages of each component over the other models.

4.1. Experimental Setting

We use 300-dimensional GloVe vectors for preprocessing. The dimensions of the word embedding and character embedding are 300 and 100, respectively. The dimension of the hidden vectors of the LSTM and GRU is set to 300. The learning rate is set to . To prevent overfitting, we apply dropout on the word embedding layer and the attention neural network layer separately, with a dropout rate of 0.5. The batch size is set to 32. The optimizer is Adam, and the other parameters are randomly initialized with a uniform distribution U(−0.01, 0.01). The evaluation metric is classification accuracy, and all models are implemented in Keras. The classification accuracy is defined as

$$\mathrm{Acc} = \frac{T}{N},$$

where $T$ is the number of correctly predicted samples and $N$ is the total number of samples.
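For reference, the accuracy metric reduces to a one-liner; the function name and label format here are ours:

```python
import numpy as np

def classification_accuracy(y_true, y_pred):
    """Acc = T / N: fraction of samples whose predicted label is correct."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Example: 3 of 4 predictions correct -> 0.75
print(classification_accuracy([0, 1, 2, 1], [0, 1, 2, 0]))
```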

4.2. Dataset

We experimented on three datasets: two from SemEval 2014 [31], containing reviews of restaurants and laptops, and the Twitter dataset [10]. The statistics of the datasets are shown in Table 1.

4.3. Comparison to Other Methods

In this section, we introduce the evaluation of our method, as shown in Table 2. The benchmark for the assessment is classification accuracy on the same experimental datasets. The baselines are as follows:

Majority: the basic baseline, which assigns the most frequent sentiment polarity in the training set to each instance in the test set.

LSTM: learns the hidden states of the context and averages them for sentiment prediction [29].

TD-LSTM: divides the context into left and right parts according to the position of the aspect, and then concatenates the features extracted from the left and right contexts, respectively, to predict the sentiment polarity of the aspect [23].

AE-LSTM: divides text into aspects and contexts, respectively, and then concatenates the context and aspect features to predict the sentiment polarity of the aspect [29].

ATAE-LSTM: based on AE-LSTM, uses the attention mechanism to generate the final features and predict the sentiment polarity of the aspect [29].

PRET_MULT: transfers knowledge from document-level data to predict the sentiment polarity of the aspect [32].

IAN: an interactive attention network model that feeds contexts and aspects into separate LSTM networks, interacts the aspect and context features, and uses the attention mechanism to extract the context features relevant to the target; finally, all features are concatenated and fed into the softmax function for sentiment prediction [12].

IAD-ABSA: Hazarika et al. [33] proposed a sentiment analysis method for the internal dependencies between targets and contexts. For convenience, we name this model IAD-ABSA; it uses stacked LSTMs to build the network, applies the attention mechanism to generate the final features, and feeds the outputs into the softmax layer to predict the sentiment polarity.

GCAE: Wei et al. [34] applied a gated convolutional network to predict the sentiment polarity; the model uses gated Tanh-ReLU units to control the sentiment features according to the aspects.

5. Main Results and Analysis

5.1. Results and Analysis of Baseline Method

Table 2 shows the accuracy of the state-of-the-art baselines on the same datasets. We compare our model with these baselines and analyse its advantages and disadvantages.

The majority method performs worst among all baselines. Among the neural network baselines, plain LSTM performs worst, and TD-LSTM outperforms it because TD-LSTM inputs the contexts and aspects separately. GCAE also divides the text into aspects and contexts and then concatenates the feature representations, achieving better performance. Consequently, modeling the contexts and aspects separately is helpful for aspect-based sentiment analysis. AE-LSTM performs better than the LSTM model because it not only separates the aspects and contexts but also combines the attention mechanism to obtain the weights of the feature representations, which confirms that the attention mechanism plays a positive role in ABSA performance. ATAE-LSTM improves on AE-LSTM, emphasizing that the aspect occupies an important position in aspect-based sentiment analysis. The IAN model obtains feature representations through interactive attention between contexts and aspects and concatenates the context and aspect features; experimental results show that IAN outperforms AE-LSTM and ATAE-LSTM. IAD-ABSA utilizes the internal dependencies of the text and shows the importance of the relationship between aspects and contexts for sentiment analysis.

5.2. Results and Analysis of MFIF-ABSA Model

The performance of our model is better than that of IAN, which shows that the feature interaction strategy proposed in this paper is superior to the interaction method used in IAN. Our models include MF-ABSA and MFIF-ABSA, where MF-ABSA is the variant without feature interaction in the feature interaction layer; both MF-ABSA and MFIF-ABSA outperform all baseline models. First, we feed contexts and aspects into the model, which interactively learns the aspect feature representations together with the context feature representations. Second, both character-level and word-level embeddings are used, allowing our model to extract richer feature representations and to sufficiently establish the internal relationships within the sentence. Finally, the attention mechanism generates the final feature representation, so that important feature information plays a more significant role among all feature representations. Between our two models, MFIF-ABSA performs better than MF-ABSA on all datasets except Restaurant, where MF-ABSA performs slightly better. In general, the feature interaction strategy proposed in this paper improves the performance of ABSA.

5.3. Effects of Character Embedding

In this section, the effect of character-level embedding on performance is discussed. The experiment is based on the MFIF-ABSA model on the three datasets: the character embedding layer is removed from the model and the result is compared with MFIF-ABSA. The experimental results are shown in Figure 3, where No_char denotes the model without character embedding. The model with the character embedding layer improves accuracy by 1%, 1.84%, and 1.19% over the model without it. These results prove that the character embedding layer has a positive effect on our model.

5.4. Effect of Attention

Next, the effect of the attention mechanism on performance is discussed. The experiment is based on the MFIF-ABSA model on the three datasets: the attention layer is removed from the model and the result is compared with MFIF-ABSA. The experimental results are shown in Figure 4, where No_att denotes the model without the attention mechanism. The model with the attention mechanism improves performance by 1.02%, 1.44%, and 1.16%, respectively. These results prove that the attention layer has a positive effect on performance.

5.5. Case Analysis

In this last section, we present an error analysis of our model. Three sentences were extracted from the three datasets, and the predictions of No_char, No_att, and MFIF-ABSA were evaluated, respectively; the results are shown in Table 3. For example, for the context "Great food, but the service is dreadful," the polarities predicted by No_char and No_att for the aspects (food, service) do not match the true sentiment polarities, while MFIF-ABSA manages to analyse the sentiment tendency accurately. Moreover, for the context "The staff should be a bit more friendly," No_att fails to produce the right result, while No_char and MFIF-ABSA succeed. Finally, for the context "If you want good tasting, well seasoned aspect term eat at Cabana and you can't go wrong," No_char is wrong, but No_att and MFIF-ABSA are right. These three cases show that MFIF-ABSA has a significant advantage in sentiment analysis over the No_char and No_att models.

6. Conclusions

In this paper, we proposed a multifeature interactive fusion model for aspect-based sentiment analysis, which not only feeds the aspect information and context information into the model separately, but also uses character-level embedding to extract subtler feature information; a feature interaction strategy is proposed to model hierarchical information in sentences. Finally, on the basis of the feature interaction strategy, the attention mechanism is integrated to enhance the role of important features in classification. Our experiments show that the performance of our model is better than that of the other baselines. In the future, we will evaluate the performance of our model on other datasets and explore whether character embedding has a positive effect on other natural language processing tasks.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by National Natural Science Foundation of China (grant nos. 61772211 and 61503143).