Artificial Intelligence and Its Applications 2014 (Special Issue)
A Self-Adaptive Hidden Markov Model for Emotion Classification in Chinese Microblogs
Microblogging is becoming one of the most popular online social media for expressing ideas and emotions, and the amount of socially generated content from this medium is enormous. Text mining techniques have been applied intensively to discover the hidden knowledge and emotions in this huge body of data. In this paper, we propose a modified hidden Markov model (HMM) classifier, called the self-adaptive HMM, whose parameters are optimized by the Particle Swarm Optimization (PSO) algorithm. Since manually labeling a large-scale dataset is difficult, we also use entropy to decide whether a new unlabeled tweet should be added to the training dataset after it has been assigned an emotion by our HMM-based approach. In the experiment, we collected about 200,000 Chinese tweets from Sina Weibo. The results show that the F-score of our approach reaches 76% on happiness and fear and 65% on anger, surprise, and sadness. In addition, the self-adaptive HMM classifier outperforms Naive Bayes and Support Vector Machine on the recognition of happiness, anger, and sadness.
In recent years, online social media such as microblogging have generated enormous amounts of content on the World Wide Web. Microblog posts are limited to 140 characters, which allows users to post or receive updates easily from mobile devices such as cell phones. Twitter has grown into one of the most popular microblogging websites and generates hundreds of millions of updates per day. Weibo is a similar microblogging website in China. In the past few years, it has become an increasingly important source of online social media information, with its population of Chinese users growing rapidly to 309 million in 2012 (http://news.xinhuanet.com/english/sci/2013-01/15/c_132104473.htm) and with more than 1,000 tweets generated every second.
The emotional states of users can be inferred from these large numbers of short tweets, and emotions in tweets play a significant role in many fields. The stock market and other socioeconomic phenomena could be predicted by analyzing the emotions of Twitter users. Even the gross happiness of a community or a country could be estimated from Twitter.
Text mining on Chinese tweets is a challenging task. First, word segmentation is more difficult in Chinese than in English, since there are no spaces between Chinese characters and ambiguous segmentation strings must be resolved. Second, as on English Twitter, new words appear every day, and it is difficult to develop a system that recognizes unknown emotional words. Third, words are ambiguous across contexts, especially emotional words. Most recent sentiment analysis approaches [3, 4] for Chinese tweets use emotional word counts as the main feature and build an emotional word dictionary for inference.
Considering these issues, we propose a self-adaptive hidden Markov model (HMM) based method to perform emotion analysis on Chinese tweets. Our method implements a self-adaptive mechanism, which learns the parameters of the HMM models and appends newly recognized emotional words or sentences to the emotional word dictionary, in order to deal with the sentiment ambiguity of words and the emergence of new words. The main contributions of this paper are as follows. (a) More fine-grained emotional categories are recognized. We employ the six well-known basic emotional categories defined by Ekman (happiness, sadness, fear, anger, disgust, and surprise), which are more intuitive and useful than the traditional sentiment analysis categories of positive, negative, and neutral. (b) More useful features than word count are defined. Our method uses category-based features extracted from the short sentences to train the proposed HMM models. (c) A self-adaptive mechanism is used and tested on a real dataset collected from Sina Weibo.
The rest of the paper is organized as follows. In Section 2, we survey the related works of emotion analysis either in Twitter or in Weibo and provide background on the concepts and methods related to emotion analysis for tweets, especially for Chinese tweets. In Section 3, we describe our proposed self-adaptive HMM-based method. In Section 4, we illustrate the dataset and the experiment setup and present results from our study. We discuss limitations of the method and future work and conclude in Section 5.
2.1. Related Work
Weibo plays a significant role in people’s lives, so opinion mining and emotion analysis on it have become active research topics. There is a large body of research on Twitter emotions outside China. Spencer and Uchyigit identified subjective tweets and detected their emotion polarity using Naive Bayes; in their test, bigrams without part-of-speech (POS) tags achieved high accuracy. Pak and Paroubek used Naive Bayes to classify tweets as positive, negative, or neutral and showed that using the presence of an n-gram as a binary feature yielded the best results. Another study has shown that machine learning algorithms (Naive Bayes, Maximum Entropy, and SVM) achieve high accuracy when the sentiment categories consist only of positive and negative. Working on a Twitter corpus, the authors explored various feature selection strategies and found that bigram features outperform unigram and POS features.
Studies of emotions in Chinese Weibo, however, have emerged only in the last few years. Zhao et al. trained a Naive Bayes model on emoticon features and classified Chinese tweets into four categories of emotions (angry, disgusting, joyful, and sad). Yuan and Purver also classified Chinese tweets into the six emotions defined by Ekman. In their experiments, character-based features with an SVM were accurate for “happiness” and “fear,” but the other four emotions were not analyzed sufficiently.
Our self-adaptive HMM aims to recognize finer-grained emotional categories using category-based features. In addition, the self-adaptive mechanism can continually improve the recognition accuracy.
2.2. Emotion Models
Two models are normally used to represent emotions: the categorical model and the dimensional model. Categorical models are based on the assumption that the emotional categories are distinct. Ekman defined six basic emotions (anger, disgust, fear, joy, sadness, and surprise) and found high agreement on their expression across multiple cultural groups in his study. D’Mello et al. proposed five categories (boredom, confusion, delight, flow, and frustration) for describing affective states in ITS interactions.
In the dimensional model, core affects represent emotions in a two- or three-dimensional space. The valence dimension places positive and negative emotions at opposite ends of a scale. The arousal dimension distinguishes excited states from calm states. Sometimes a third dimension, dominance, is used to differentiate whether or not the subject feels in control of the situation. For example, the Positive Affect and Negative Affect Schedule (PANAS) provides two opposite mood factors (positive and negative), which have been widely used for opinion mining.
In this study, we employ Ekman’s emotion model because the six basic emotions are distinct expressions and cover people’s daily moods.
2.3. Chinese Text Preprocessing
Unlike languages such as English, Chinese is written without word delimiters, so word segmentation is a significant task. Chinese text preprocessing is divided into two steps: word segmentation and stop word removal.
Several Chinese lexical analysis systems have been developed to perform word segmentation, including the Institute of Computing Technology, Chinese Lexical Analysis System (ICTCLAS), IKAnalyzer (http://code.google.com/p/ik-analyzer/), HTTPCWS (http://code.google.com/p/httpcws/), and Simple Chinese Word Segmentation (SCWS) (http://gc.codehum.com/p/libscws). The accuracy of each of these systems has been evaluated at above 90 percent.
We used NLPIR (namely, ICTCLAS 2013) as the word splitter in this study. Its latest lexical database contains new words that often appear in Weibo, and it can also recognize bloggers’ nicknames. Furthermore, as an adaptive word splitter, it can add new words to its lexical database, which our self-adaptive mechanism can use to update the word-emotion vocabulary.
To reduce the dimension of the feature space, the next important step is to remove stop words (quantifiers, pronouns, digits, symbols, etc., e.g., “hundred,” “we,” “3,” and “%”). Similarly, Weibo has many platform-specific elements, such as usernames and links, that must be removed.
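The removal step described above can be sketched in Python as follows. This is a minimal illustration rather than the preprocessing actually used in the study: the regular expressions, the tiny stop word set, and the function names `clean_tweet` and `remove_stop_words` are all assumptions, and word segmentation (done with NLPIR in the study) is assumed to have happened between the two calls.

```python
import re

# Hypothetical patterns and stop word list, for illustration only.
URL_RE = re.compile(r"https?://\S+")
# Note: \w also matches Chinese characters, so a mention that is not
# followed by whitespace or punctuation would swallow the following text.
MENTION_RE = re.compile(r"@[\w\-]+")
STOP_WORDS = {"hundred", "we", "3", "%"}


def clean_tweet(text):
    """Remove links and @usernames from a raw tweet, then squeeze whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


def remove_stop_words(terms, stop_words=STOP_WORDS):
    """Drop stop words and pure-digit tokens from an already segmented tweet."""
    return [t for t in terms if t not in stop_words and not t.isdigit()]


if __name__ == "__main__":
    print(clean_tweet("@user_1 so happy http://t.cn/abc today"))
    print(remove_stop_words(["we", "feel", "3", "happy", "%"]))
```

Cleaning the raw text before segmentation keeps link fragments and usernames from being split into spurious terms.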
In the field of document classification, each document is represented as a vector of terms. Effective feature extraction is essential for an effective learning task. We adopt the following four features for our emotion classification model.
Mutual Information (MI). MI is a useful information measure that refers to the correlation between two sets of events. Here, MI reflects the relevance between terms and emotion categories. The mutual information of the two events, term $t$ and emotion $e_j$, is defined as
$$\mathrm{MI}(t, e_j) = \log \frac{P(t, e_j)}{P(t)\,P(e_j)}.$$
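As an illustration, the pointwise form MI(t, e) = log(P(t, e) / (P(t) P(e))) can be estimated from tweet counts. The sketch below assumes simple maximum-likelihood estimates of the probabilities; the function name and the count-based arguments are hypothetical and are not taken from the original study.

```python
import math


def mutual_information(n_te, n_t, n_e, n_total):
    """Estimate MI(t, e) = log(P(t, e) / (P(t) * P(e))) from tweet counts.

    n_te:    tweets that contain term t and are labelled with emotion e
    n_t:     tweets that contain term t
    n_e:     tweets labelled with emotion e
    n_total: all tweets in the corpus
    """
    if n_te == 0:
        # The term never co-occurs with the emotion.
        return float("-inf")
    p_te = n_te / n_total
    p_t = n_t / n_total
    p_e = n_e / n_total
    return math.log(p_te / (p_t * p_e))


if __name__ == "__main__":
    # A term that always appears with the emotion scores above zero.
    print(mutual_information(20, 20, 50, 100))
```

A value of zero means the term and the emotion are statistically independent under these estimates; positive values indicate positive association.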
Chi-Square (CHI). CHI is a statistical method that measures the lack of independence between a term $t$ and an emotion $e_j$. The higher the CHI value, the stronger the dependence between them:
$$\chi^2(t, e_j) = \frac{N\,(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)},$$
where $A$ represents the presence of $t$ and membership in $e_j$; $B$ represents the presence of $t$ and nonmembership in $e_j$; $C$ represents the absence of $t$ and membership in $e_j$; $D$ represents the absence of $t$ and nonmembership in $e_j$; and $N$ is the total number of tweets.
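A minimal sketch of this statistic, assuming the standard 2×2 contingency form N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)) with the four counts described above. The function name and the zero-denominator convention are assumptions made for the example.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic between a term and an emotion.

    a: tweets containing the term, labelled with the emotion
    b: tweets containing the term, not labelled with the emotion
    c: tweets without the term, labelled with the emotion
    d: tweets without the term, not labelled with the emotion
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treated as no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


if __name__ == "__main__":
    print(chi_square(10, 10, 40, 40))  # term independent of the emotion
    print(chi_square(20, 0, 0, 20))    # term perfectly tied to the emotion
```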
Term Frequency-Inverse Document Frequency (TF-IDF). The main idea of TF-IDF is that if a word or phrase appears in a tweet with high frequency and rarely appears in other tweets, then it has a good ability to distinguish among emotions:
$$\mathrm{TFIDF}(t, e_j) = \mathrm{TF}(t, e_j) \times \mathrm{IDF}(t),$$
where $\mathrm{TF}(t, e_j)$ represents the frequency of the term $t$ in the emotion $e_j$; the main idea of $\mathrm{IDF}(t)$ is that if the term $t$ rarely appears in other emotions, $t$ can be an indicator feature for $e_j$. A standard form consistent with these definitions is
$$\mathrm{TF}(t, e_j) = \frac{n_{t,j}}{\sum_{k} n_{k,j}}, \qquad \mathrm{IDF}(t) = \log \frac{N}{n_t},$$
where $n_{t,j}$ is the number of times $t$ occurs in emotion $e_j$, $N$ denotes the total number of tweets in the dataset, and $n_t$ is the number of tweets in which $t$ appears.
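The following sketch computes a category-based TF-IDF score. It assumes the common form TF × IDF, with TF normalised by the total number of term occurrences in the emotion and IDF = log(N / n_t); the normalisation and the data layout (lists of segmented terms) are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(term, emotion_terms, all_tweets):
    """Category-based TF-IDF of a term for one emotion.

    emotion_terms: flat list of all terms from tweets labelled with the emotion
    all_tweets:    list of tweets, each a list of terms (the whole dataset)
    """
    counts = Counter(emotion_terms)
    total = sum(counts.values())
    tf = counts[term] / total if total else 0.0

    # Number of tweets in the whole dataset that contain the term.
    n_t = sum(1 for tweet in all_tweets if term in tweet)
    if n_t == 0:
        return 0.0
    idf = math.log(len(all_tweets) / n_t)
    return tf * idf


if __name__ == "__main__":
    tweets = [["happy", "day"], ["sad", "day"], ["happy", "happy"], ["angry"]]
    happiness_terms = ["happy", "day", "happy", "happy"]
    print(tf_idf("happy", happiness_terms, tweets))
```

A term that appears in every tweet receives an IDF of zero and therefore carries no weight for any emotion.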
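Expected Cross Entropy (ECE). ECE reflects the distance between the probability distribution of the emotion categories and the probability distribution of the emotion categories given the term $t$. In its standard form, it is defined as
$$\mathrm{ECE}(t) = P(t) \sum_{j} P(e_j \mid t) \log \frac{P(e_j \mid t)}{P(e_j)},$$
where $P(e_j \mid t)$ represents the probability of emotion $e_j$ given term $t$ and $P(e_j)$ represents the probability of tweets associated with emotion $e_j$.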
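A small sketch of ECE, assuming the standard text-categorisation form P(t) · Σ_j P(e_j | t) · log(P(e_j | t) / P(e_j)). The leading weight P(t) and the dictionary-based inputs are assumptions of this example; the text above only names P(e_j | t) and P(e_j).

```python
import math


def expected_cross_entropy(p_t, p_e_given_t, p_e):
    """ECE of a term.

    p_t:         probability that a tweet contains the term
    p_e_given_t: dict mapping emotion -> P(emotion | term)
    p_e:         dict mapping emotion -> prior P(emotion), all values > 0
    """
    total = 0.0
    for emotion, p_cond in p_e_given_t.items():
        if p_cond > 0:  # terms with zero probability contribute nothing
            total += p_cond * math.log(p_cond / p_e[emotion])
    return p_t * total


if __name__ == "__main__":
    prior = {"happiness": 0.5, "anger": 0.5}
    print(expected_cross_entropy(0.5, {"happiness": 1.0, "anger": 0.0}, prior))
```

When the distribution given the term equals the prior, the score is zero, so the term is uninformative about emotion.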
2.5. Classification Methods
Hidden Markov Model (HMM). An HMM is used to describe a Markov process with unknown parameters. The difficulty lies in determining the implicit parameters of the process from the observable parameters; these parameters are then used for further analysis.
An HMM describes two related discrete-time stochastic processes. The first process pertains to the hidden state variables, denoted $S = (s_1, s_2, \ldots, s_T)$, which emit observed variables with different probabilities, and the second process pertains to the observed variables $O = (o_1, o_2, \ldots, o_T)$.
The main parameters of an HMM are the transition and emission probabilities: $a_{ij} = P(s_t = j \mid s_{t-1} = i)$ (transition probabilities), meaning that the current state $s_t$ depends on the previous state $s_{t-1}$; and $b_j(o_t) = P(o_t \mid s_t = j)$ (emission probabilities), meaning that the observation symbol $o_t$ is emitted by the current state $s_t$.
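To make the role of the transition and emission probabilities concrete, here is a sketch of the textbook forward algorithm for a generic discrete HMM. It is not the constrained model built later in the paper; the initial distribution `start` and the list-based parameter layout are assumptions of this example.

```python
def forward_probability(obs, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM.

    obs:   list of observation symbol indices
    start: start[i]    = P(first state is i)
    trans: trans[i][j] = P(next state is j | current state is i)
    emit:  emit[i][o]  = P(symbol o | state i)
    """
    n = len(start)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)


if __name__ == "__main__":
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.5, 0.5], [0.1, 0.9]]
    print(forward_probability([0, 1], start, trans, emit))
```

Summing over the hidden states at each step avoids enumerating every possible state path.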
In our model, an HMM model $\lambda_j$ is built for each emotion $e_j$. The features extracted from a tweet $w$ are the observed variables, while each hidden state $s_k$ of $\lambda_j$ is considered as a state associated with the feature $o_k$. If the tweet $w$ gets the highest probability on the model $\lambda_j$, then $w$ is associated with emotion $e_j$.
3.1. Features as Observed Variables
Feature extraction methods fall into two main categories: category-based extraction and global-based extraction. The features extracted by the latter reflect the importance of words in the global corpus but cannot be used to distinguish between emotions. Therefore, we adopt category-based feature extraction methods in our models. All the features introduced in Section 2.4 are category-based.
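Suppose there is a set of feature extraction methods $F = \{f_1, f_2, \ldots, f_m\}$. A tweet $w$ is first divided into terms $t_1, t_2, \ldots, t_n$. Let $x_{ik}$ be the $k$th feature of the word $t_i$, extracted by the method $f_k$. An intermediate $n \times m$ matrix of term-level features is then obtained for calculating the tweet-level feature vector. For each emotion $e_j$, the tweet-level feature vector $O = (o_1, o_2, \ldots, o_m)$ of the tweet $w$ is calculated as
$$o_k = \frac{1}{n} \sum_{i=1}^{n} x_{ik},$$
where $1 \le i \le n$ and $1 \le k \le m$. In other words, $o_k$ is the mean value of the $k$th feature over all the words.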
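The averaging step can be sketched as follows: each row of the term-level matrix holds the features of one word with respect to a single emotion, and the tweet-level vector is the column-wise mean. The function name and the sample numbers are hypothetical.

```python
def tweet_feature_vector(term_features):
    """Column-wise mean of an n x m matrix of term-level features.

    term_features: list of n rows (one per term), each with m feature values
    computed with respect to one emotion.
    """
    if not term_features:
        raise ValueError("a tweet needs at least one term")
    n = len(term_features)
    m = len(term_features[0])
    return [sum(row[k] for row in term_features) / n for k in range(m)]


if __name__ == "__main__":
    # Two terms, four features each (e.g. MI, CHI, TF-IDF, ECE); made-up values.
    matrix = [[0.2, 1.0, 0.5, 0.1],
              [0.4, 3.0, 0.1, 0.3]]
    print(tweet_feature_vector(matrix))
```

The same tweet yields a different vector for each emotion, because the term-level features are category-based.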
In our experiment, a tweet is mapped into a four-dimensional vector using the four features: MI, CHI, TF-IDF and ECE. For example, the tweet “How lovely!” is divided into two terms: “how” and “lovely” in Chinese. The features of the terms on happiness and anger are calculated, respectively, as listed in Table 1.
The tweet’s feature vector for happiness and its feature vector for anger are then obtained by averaging the corresponding term-level features in Table 1.
3.2. HMM-Based Emotional Classification Model
In our case, each hidden state is supposed to emit a value, and the whole model generates the sequence of values that constitutes the tweet’s feature vector. The states are considered to be a set of values that best represent the emotional category. There is a one-to-one mapping between HMM states and tweet features, which requires the hidden state transitions to be stationary and the state sequence to begin at the first state. When the classifier runs, the transition probabilities indicate how closely the features of a test tweet approach the emotion. The emission probability can also be calculated from the known state, the feature, and the training data. Our calculation method is given at the end of this section.
Algorithm 1 depicts the procedure for building an HMM model $\lambda_j$ for emotion $e_j$. The tweet-level feature vector is denoted as $O$.
We construct an HMM model for each of the six emotions. When a new tweet arrives, the probabilities of the tweet are calculated on the six models, respectively. The tweet is labeled with the emotion whose model is associated with the maximum probability.
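The decision rule just described, scoring a tweet on every per-emotion model and keeping the maximum, can be sketched as below. The models are represented by hypothetical scoring callables; how each HMM computes its probability is outside the scope of this example.

```python
def classify(feature_vectors, models):
    """Label a tweet with the emotion whose model gives the highest score.

    feature_vectors: dict mapping emotion -> the tweet's feature vector
                     computed with respect to that emotion
    models:          dict mapping emotion -> callable(feature_vector) -> score
                     (hypothetical stand-ins for the per-emotion HMMs)
    """
    scores = {e: model(feature_vectors[e]) for e, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy stand-in models, for illustration only.
    toy_models = {
        "happiness": lambda v: sum(v),
        "anger": lambda v: -sum(v),
    }
    toy_features = {"happiness": [0.3, 0.2], "anger": [0.1, 0.1]}
    print(classify(toy_features, toy_models))
```

Returning the full score table alongside the label makes it possible to inspect how close the competing emotions were.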
A Strategy for Calculating the Emission Probability. To calculate the emission probability for emotion $e_j$, we use the Jaccard similarity measure to test the correlation between the value $o$ and the state $s$:
$$J(o, s) = \frac{n_{os}}{n_{os} + n_{o} + n_{s}},$$
where $n_{os}$ represents the number of tweets in $e_j$ that contain both $o$ and $s$; $n_{o}$ represents the number of tweets in $e_j$ that contain only $o$; and $n_{s}$ represents the number of tweets in $e_j$ that contain only $s$. In order to check whether emotion $e_j$ is associated with observation $o$ or state $s$, we introduce an associated factor, denoted by $\alpha$, which is compared against the feature extracted from the training data. We assume that observation $o$ or state $s$ is associated with $e_j$ if the resulting inequality is met.
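The Jaccard measure used here is the ratio of tweets containing both items to tweets containing at least one of them. A minimal sketch, with hypothetical argument names and a zero-denominator convention chosen for this example:

```python
def jaccard(n_both, n_only_o, n_only_s):
    """Jaccard similarity from three tweet counts within one emotion.

    n_both:   tweets containing both the value o and the state s
    n_only_o: tweets containing only o
    n_only_s: tweets containing only s
    """
    denom = n_both + n_only_o + n_only_s
    if denom == 0:
        # Neither item occurs at all; treated as no similarity.
        return 0.0
    return n_both / denom


if __name__ == "__main__":
    print(jaccard(3, 1, 1))
```

The result always lies between 0 (never co-occur) and 1 (always co-occur), which makes it convenient to compare against a threshold-like factor.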
3.3. Self-Adaptive Mechanism
3.3.1. Parameters Computed by PSO
To build a good HMM classifier, an important problem to solve is finding optimized sequences of HMM states. A variety of strategies can be used to optimize the state parameters.
The Particle Swarm Optimization (PSO) algorithm is a population-based stochastic optimization method. In our algorithm, each particle represents a candidate solution for an HMM parameter. The particles move around in the search space; each particle is guided by its own best known position and by the best known position of the swarm. This iterative procedure eventually finds the best parameters. Compared with the Genetic Algorithm, PSO has simple rules and a stronger ability of global optimization, and it has worked well in our study:
$$\vec{v}_i(t+1) = \omega\,\vec{v}_i(t) + c_1 r_1 \left(\vec{p}_{\mathrm{best},i} - \vec{x}_i(t)\right) + c_2 r_2 \left(\vec{g}_{\mathrm{best}} - \vec{x}_i(t)\right),$$
$$\vec{x}_i(t+1) = \vec{x}_i(t) + \vec{v}_i(t+1),$$
where the arrow indicates that the corresponding variable is a vector. The variables $r_1$ and $r_2$ are random numbers in $[0, 1]$. $\vec{p}_{\mathrm{best},i}$ records the individual extremes and $\vec{g}_{\mathrm{best}}$ records the global extremes. The constant $\omega$ is the inertia weight, and $c_1$ and $c_2$ are acceleration constants. The first equation updates the velocity of each particle according to its previous velocity and its distance to the best particle. The second equation updates each particle’s position according to its previous position and current velocity.
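For illustration, a compact version of the standard PSO loop is sketched below: each velocity is a weighted sum of the previous velocity, the pull toward the particle's own best position, and the pull toward the swarm's best position, and each position is then moved by that velocity. The parameter values (w = 0.7, c1 = c2 = 1.5, v_max = 1.0), the bounds, and the toy fitness function are placeholders chosen for this example and are not the settings used in the study.

```python
import random


def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    """Maximise `fitness` with a basic particle swarm (illustrative defaults)."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                 # individual best positions
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]   # global best position

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, v))       # clamp velocity
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            f = fitness(xs[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = xs[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy fitness with its maximum at the origin.
    best, best_fit = pso(lambda x: -sum(v * v for v in x), dim=2)
    print(best, best_fit)
```

In the setting of this paper, the fitness callable would wrap a classifier evaluation, which is far more expensive than the toy function used here.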
PSO Parameter Settings. Different PSO parameters may have a large impact on optimization performance. The following guidelines help us select the PSO parameters [19, 20].
$V_{\max}$: decides the granularity of the search space. We set it according to the values of the training data.
$\omega$: maintains the motion inertia of the particles. We set the inertia weight to a constant value.
$c_1$, $c_2$: represent the acceleration weights that push each particle toward $\vec{p}_{\mathrm{best}}$ and $\vec{g}_{\mathrm{best}}$. We set both weights to the same value.
Fitness Function. We use the fitness function to find two kinds of parameters in the HMM models: the optimized associated factor $\alpha$ and the hidden states of each HMM model $\lambda_j$. The fitness function is defined in terms of the classifier accuracy metrics given in Section 4.3. Since searching the whole set of parameters with PSO is time intensive, we divide the set into independent parts. Seven parameters are learned for each part according to the fitness function.
Since it is time-consuming and expensive to label all the tweets manually, we introduce a feedback method that automatically decides whether an unlabeled tweet should be added to a training data pool after it has been assigned an emotion by our HMM model. Unlike the strategy of Lewis and Catlett and of Scheffer et al., which considers only the assigned emotions, a suitable strategy is to compute the entropy of a tweet to identify how discriminating the emotions are, that is, whether one emotion is clearly more discriminating than all other emotions on the tweet $w$. Entropy is an information-theoretic measure, and its formula is given as follows:
$$H(w) = -\sum_{j} P(e_j \mid w) \log P(e_j \mid w),$$
where $P(e_j \mid w)$ is the probability that the tweet $w$ is recognized as emotion $e_j$. The smaller $H(w)$ is, the greater the certainty about the emotion of the tweet $w$. The pool-based feedback algorithm, which decides whether a tweet should be added to the training dataset, is shown in Algorithm 2.
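The entropy test can be sketched as follows, using H = −Σ_j p_j log p_j over the per-emotion probabilities of a tweet. This is only an illustration of the idea: the normalisation helper, the function names, and in particular the threshold value are assumptions, since the exact acceptance rule of the pool-based algorithm is not reproduced here.

```python
import math


def normalise(scores):
    """Turn non-negative model scores into probabilities that sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def entropy(probs):
    """Shannon entropy (natural log); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def should_add_to_pool(probs, threshold):
    """Accept a tweet only when the model is confident (low entropy).

    `threshold` is a hypothetical tuning parameter.
    """
    return entropy(probs) < threshold


if __name__ == "__main__":
    confident = [0.9, 0.02, 0.02, 0.02, 0.02, 0.02]
    uncertain = normalise([1, 1, 1, 1, 1, 1])
    print(should_add_to_pool(confident, threshold=1.0))
    print(should_add_to_pool(uncertain, threshold=1.0))
```

A uniform distribution over the six emotions gives the maximum entropy, log 6, so such tweets are never added under any threshold below that value.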
Since there are no public datasets of Chinese tweets associated with emotions, many studies use emoticons to label tweets [7, 8, 10]. However, only a small percentage of Chinese users post tweets with emoticons. In addition, several emoticons have no corresponding emotion. Hence, we built our own dataset as follows.
(i) About 200,000 tweets were collected through the Sina API.
(ii) For each emotion, more than twelve seed terms were chosen for term-level feature extraction. We refer interested readers to Aman’s annotation scheme for seed term selection. For instance, a set of seed terms for happiness may contain “enjoy” and “pleased.”
(iii) Manual screening. Not all tweets are associated with a corresponding emotion. We asked ten computer science students to choose tweets that are good indicators of the six emotions.
Note that duplicate tweets were removed. Eventually, we selected a set of tweets for each of happiness, anger, surprise, fear, sadness, and disgust.
The procedure of emotion recognition for a tweet $w$ is as follows:
(a) preprocess the dataset;
(b) build the emotion lexicons; each lexicon contains a feature vector of terms extracted by the feature extraction methods;
(c) calculate the feature vectors of the whole dataset;
(d) optimize the hidden states of each $\lambda_j$ by PSO;
(e) for each $e_j$, calculate the feature vector of $w$ in emotion $e_j$ and obtain the output value through model $\lambda_j$;
(f) return the emotion with the maximum output value.
To evaluate the performance of each of the HMM models, we randomly selected several hand-annotated tweets. The test dataset contains the same number of tweets for each emotion. Each test run is executed five times, and we use the average results for our evaluation.
4.3. Evaluation Metrics
Precision, recall, and $F$-measure are the most commonly used evaluation metrics for text classification tasks. We employ these three metrics in our experiment. For each emotion $e_j$, precision and recall are defined as
$$P_j = \frac{TP_j}{TP_j + FP_j}, \qquad R_j = \frac{TP_j}{TP_j + FN_j},$$
where $TP_j$ represents the number of tweets that are correctly recognized as $e_j$; $FP_j$ represents the number of tweets that are falsely recognized as $e_j$; and $FN_j$ represents the number of tweets that are actually associated with $e_j$ but are recognized as another emotion.
To balance precision and recall, the $F$-measure is defined as
$$F_j = \frac{2\,P_j R_j}{P_j + R_j}.$$
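These three metrics can be computed per emotion from the counts of correct, false, and missed recognitions, using precision = TP / (TP + FP), recall = TP / (TP + FN), and the balanced F-measure 2PR / (P + R). The helper below is a small sketch; returning zero for empty denominators is a convention assumed for this example.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and balanced F-measure for one emotion.

    tp: tweets correctly recognised as the emotion
    fp: tweets falsely recognised as the emotion
    fn: tweets of the emotion that were recognised as another emotion
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    print(precision_recall_f1(tp=8, fp=2, fn=2))
```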
4.4.1. Comparison with Other Classifiers
We compared our approach with two well-known classifiers, Naive Bayes and Support Vector Machine (SVM). They are often used for sentiment classification in the literature because they are easy to implement.
As shown in Figure 1, our HMM-based approach outperforms Naive Bayes and SVM on happiness, anger, and sadness. The three classifiers differ less in performance on the other three emotions.
The results also show that all three classifiers perform well in terms of $F$-measure on happiness, surprise, and fear, with the highest $F$-measure on fear. None of the three classifiers recognizes disgust accurately.
4.4.2. The Comparison of Six Emotions
Figure 2 shows the classification results for the six emotions using our HMM-based approach. Our approach performs best on happiness and fear, at about 76%. It also performs well on anger, surprise, and sadness, with an average of about 65%.
4.4.3. Analysis of HMM Results
We also attempt to analyze the reasons for false recognition of emotions. Tweets with specific characteristics may lead to false recognition by our HMM-based approach.
(i) Some tweets may contain multiple emotions. For example, “Wow, I’m so smart!” (the original sentence is in Chinese) contains both happiness and surprise. Tweets with multiple emotions may cause false recognition. This also explains why our approach gets a low $F$-score on disgust, which is often falsely recognized as anger.
(ii) A number of tweets contain new words that cannot be recognized.
In addition, puns and polysemous words are significant factors but are rather difficult to recognize. Our HMM-based approach can be improved with respect to these characteristics in future experiments.
In this paper, we present an approach that extracts features using four methods, that is, MI, TF-IDF, CHI, and ECE. Our classifier is based on HMM, in which the hidden states are found by the PSO algorithm. Since manually labeling a large-scale dataset is difficult, we use entropy to decide whether a new unlabeled tweet should be added to the training dataset after it has been assigned an emotion by our HMM-based approach. The experimental results show that HMM outperforms SVM and NB, especially on happiness, anger, and sadness. In terms of recognition precision on the six emotions, HMM performs better on happiness and fear than on anger, surprise, and sadness.
In the future, we will optimize the HMM to accurately recognize tweets associated with other emotions and to automatically add new words for emotional seed term selection. Moreover, the self-adaptive mechanism will be implemented in our HMM model.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by Cuiying Grant of China Telecom, Gansu Branch (Grant no. lzudxcy-2013-3), Science and Technology Planning Project of Chengguan District, Lanzhou (Grant no. 2013-3-1), and Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry (Grant no. 44th).
X. Lou, Y. Chai, H. Zan, R. Xu, Y. Han, and K. Zhang, “Research on micro-blog sentiment analysis,” in Chinese Lexical Semantics, pp. 466–479, Springer, Berlin, Germany, 2013.
P. Ekman, “Universals and cultural differences in facial expressions of emotion,” in Proceedings of the Nebraska Symposium on Motivation, University of Nebraska Press, 1971.
J. Spencer and G. Uchyigit, “Sentimentor: sentiment analysis of twitter data,” in Proceedings of the 1st International Workshop on Sentiment Discovery from Affective Data (SDAD '12), M. Gaber, M. Cocea, S. Weibelzahl, E. Menasalvas, and C. Labbé, Eds., p. 56, CEUR, 2012.
A. Pak and P. Paroubek, “Twitter as a corpus for sentiment analysis and opinion mining,” in Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC '10), Valletta, Malta, May 2010.
A. Go, R. Bhayani, and L. Huang, “Twitter sentiment classification using distant supervision,” Project Report CS224N, Stanford University, Stanford, Calif, USA, 2009.
Z. Yuan and M. Purver, “Predicting emotion labels for chinese microblog texts,” in Proceedings of the 1st International Workshop on Sentiment Discovery from Affective Data (SDAD '12), M. Gaber, M. Cocea, S. Weibelzahl, E. Menasalvas, and C. Labbé, Eds., pp. 40–47, CEUR, 2012.
P. Jaccard, Nouvelles Recherches sur la Distribution Florale, 1908.
J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, December 1995.
D. D. Lewis and J. Catlett, “Heterogenous uncertainty sampling for supervised learning,” in Proceedings of the International Conference on Machine Learning, vol. 94, pp. 148–156, 1994.
S. Aman and S. Szpakowicz, “Identifying expressions of emotion in text,” in Text, Speech and Dialogue, pp. 196–205, Springer, Amsterdam, The Netherlands, 2007.
V. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.