Abstract

Sentiment analysis has been widely used in text mining of social media to discover valuable information from user reviews. A sentiment lexicon is an essential tool for sentiment analysis. Recent research indicates that constructing sentiment lexicons for special domains can achieve better results in sentiment analysis. However, it is not easy to construct a sentiment lexicon for a specific domain because most current methods depend heavily on general sentiment lexicons and complex linguistic rules. In this paper, the construction of a sentiment lexicon is transformed into a training-optimization process. In our scheme, the accuracy of sentiment classification is used as the optimization objective, and candidate sentiment lexicons are regarded as the individuals to be optimized. Two genetic algorithms are specially designed to adjust the values of the sentiment words in the lexicon. Finally, the best individual evolved by the presented genetic algorithms is selected as the sentiment lexicon. Our method depends only on some labelled texts and does not need any linguistic or prior knowledge. It provides a simple and easy way to construct a sentiment lexicon for a specific domain. Experimental results show that the proposed method has good flexibility and can generate high-quality sentiment lexicons for specific domains.

1. Introduction

Currently, it has become very convenient for people to express opinions and share knowledge through the Internet. With the continual growth of online reviews on the Internet, sentiment analysis has become a hot research topic in natural language processing. To provide better personalized service, sentiment analysis has been applied to extract user opinions from comments on the Internet [1]. In most methods of sentiment analysis, a sentiment lexicon is an essential tool. A sentiment lexicon contains sentiment words or phrases together with their intensity values and polarities [2]. Recent studies find that different domains have their own special sentiment words, and these words usually play an important role in sentiment analysis [3]. Accordingly, how to construct sentiment lexicons for specific domains is very important to sentiment analysis and has become a hot issue in recent years.

Traditional unigram sentiment lexicons are primarily collected and annotated manually by experts. Several widely used lexicons have been proposed, such as General Inquirer (GI) [4] and SentiWordNet [5]. Although these lexicons provide a good foundation for constructing sentiment lexicons, the number of sentiment words they contain is small and their coverage is limited. To improve the coverage of sentiment words, researchers have proposed methods to extend the traditional sentiment lexicons [6, 7]. Unigram sentiment lexicons, degree adverbs, and negation words are combined to construct n-gram sentiment lexicons [2]. However, the n-gram lexicons constructed by these methods only provide the sentiment polarities of n-grams and do not quantitatively describe their sentiment intensities. Moreover, due to the limitation of the basic lexicon, these methods cannot generate new sentiment words that never appeared in the basic lexicon.

In some cases, the value and polarity of the same sentiment word may change across domains. In order to better satisfy the requirements of sentiment analysis, it is necessary to construct sentiment lexicons that take domain traits into account. There have been some efforts to construct domain-specific sentiment lexicons. A common idea is to use the value of a sentiment word in a general lexicon as the basic value and then automatically adjust it according to the corpus of the target domain [8, 9]. However, most methods based on this idea rely on the background knowledge of the domain and require a large number of extra annotations by linguists. Thus, these methods are not universal and can hardly be used to construct lexicons for other domains.

In recent years, some methods based on machine learning have been proposed to construct sentiment lexicons in specific domains [10–12]. These methods can directly extract sentiment words from the corpus of a specific domain by learning the features of the text. The main advantage of machine learning methods is that they do not depend on linguistic knowledge or domain knowledge and have good universality. Moreover, these methods can generate new sentiment words that are not related to the basic lexicon, breaking the limitation of the basic lexicon. Inspired by the idea of machine learning, we propose a training-optimization-based method to automatically construct domain-specific sentiment lexicons. Unlike typical machine learning methods, we regard the process of learning as an optimization process and design genetic algorithms to optimize the intensity values of sentiment words. Compared with machine learning methods, our method has a simple computational structure and high running speed. The main innovations and contributions of this research are as follows:
(i) A framework for constructing sentiment lexicons for specific domains is proposed, in which the construction of a sentiment lexicon is converted into a training and optimization process.
(ii) Our method extracts sentiment words from short texts collected in the target domain, which breaks the limitation of seed lexicons and effectively improves the coverage of sentiment words in specific domains.
(iii) We specially design two genetic algorithms to optimize the sentiment lexicon, which makes it possible to automatically adjust the intensity values of sentiment words according to the domain. Our method has good universality and can be used in any domain.

The remaining parts of this paper are organized as follows. In Section 2, a brief review of related works in sentiment analysis is given. Section 3 presents the training-optimization framework and the proposed algorithms for constructing sentiment lexicons. The advantages of our scheme are described in Section 4. In Section 5, we present the experimental results. Finally, conclusions are drawn in Section 6.

2. Related Works

The sentiment lexicon is an important basic component for sentiment analysis and has an important influence on the results of sentiment classification. Some famous sentiment lexicons have been built by experts, such as General Inquirer [4] and SentiWordNet [5]. Although these general sentiment lexicons have achieved certain success in text sentiment analysis, their coverage is not comprehensive for some specific domains. To extend the coverage of sentiment lexicons, Wang et al. selected some sentiment words from an existing sentiment lexicon as seeds and used the PU learning method to generate new sentiment words [6]. Zhang et al. constructed a comprehensive sentiment lexicon by collecting network words and emoticons widely used in Chinese microblogs [13]; the polarity of each sentiment word in this lexicon is determined according to the sentiment polarities of the texts. Milagros et al. created an emoji lexicon in an unsupervised way [7]: the initial sentiment value of each emoji is set to the value assigned by the emoji creators, and the value is then adjusted based on the texts containing the emojis. Although these methods effectively increase the number of sentiment words and provide good ways to set their intensity values, they do not address the problem that the values of sentiment words vary across domains.

In recent years, some methods have been proposed to construct domain-specific sentiment lexicons. In order to adapt sentiment classification to specific domains, Deng et al. extracted candidate words from an unannotated corpus [9]; the sentiment orientation of each candidate word is then determined by measuring its relations to the sentiment words in a seed lexicon. Nuno et al. presented an approach based on statistical measures to construct a stock market lexicon [14], in which the value of a sentiment word is calculated from a large number of labelled stock market microblogs. Wei Li et al. proposed a method to detect new words in specific domains, which incorporates manually calibrated sentiment scores, semantic information, and statistical similarity information derived from word2vec [15]. Frank et al. adapted word polarities to the target domain by training a sentiment classifier [8]; in the training process, wrongly predicted sentences are used as feedback to correct sentiment words, which effectively improves the accuracy of sentiment classification. However, most of these methods require external human annotations and complex linguistic knowledge. To construct domain-specific sentiment lexicons automatically, researchers have proposed other methods. William et al. constructed a lexical graph, defined the graph edges, and propagated sentiment labels over the graph by a random walk method [16]; a bootstrap-sampling approach is then utilized to obtain confidence regions over the sentiment scores. Sixing Wu et al. utilized syntactic relations and semantic similarities to extract opinion pairs, which are fed to a one-class SVM classifier to automatically construct a sentiment lexicon for the specific domain [17]. However, the methods in [16, 17] do not consider the influence of n-grams, although researchers have shown that n-grams are very important to sentiment classification [18]. Anthony et al. optimized n-gram-based text feature selection to improve the accuracy of sentiment analysis [19]. In [20], intensifiers and negations are extracted to construct n-gram features for cross-domain sentiment classification. The limitation of these two methods [19, 20] is that they depend on existing unigram sentiment lexicons and do not provide sentiment values for the n-gram features.

Based on the above analysis, the main problems in constructing sentiment lexicons can be summarized as follows. (i) Traditional methods depend on linguistic knowledge and update sentiment words slowly; they are unable to add new sentiment words generated on the Internet to the lexicon in time. (ii) Some unigram and n-gram lexicons only provide the polarities of sentiment words but not their intensity values. (iii) The existing methods for constructing domain-specific lexicons are specially designed around the background knowledge of a domain and are therefore not universal.

Today, machine learning-based methods are also widely used in sentiment analysis. Veny et al. utilized three methods, decision tree, Naïve Bayes, and random forest, to classify social media texts from Twitter [21]. Saerom Park et al. proposed a semisupervised distributed representation to describe the differences between documents for sentiment analysis [22]. Some deep learning models such as CNN, LSTM, and GRU have been applied to sentiment analysis [23, 24], and the attention mechanism has been applied in the deep learning framework to improve performance [25]. Inspired by this idea of self-learning, we propose a novel scheme to automatically construct a sentiment lexicon based on corpora. The proposed scheme overcomes the above problems found in previous studies, providing a new way to construct sentiment lexicons with the help of network text resources.

3. The Proposed Scheme

3.1. Framework of Training-Optimization Scheme

Web forums contain plenty of user reviews, which usually convey the sentiment orientations of users and are therefore valuable sources for extracting sentiment words in specific domains. However, although the sentiment orientation of a user comment is mainly determined by the intensities and polarities of its sentiment words, it is still difficult to describe the relationship between them by an explicit formula. Inspired by the idea of machine learning, we propose a training-optimization framework to solve this problem, in which the construction of a sentiment lexicon for a specific domain is transformed into a process of supervised learning. The presented framework thus provides a novel and effective way to construct a sentiment lexicon and determine the values of sentiment words according to the sentiment orientations of texts.

The proposed framework is shown in Figure 1 and mainly includes four parts. The first part extracts the sentiment words from the training dataset. The second part randomly initializes the values of the sentiment words to construct the initial sentiment lexicon. The third part classifies the texts according to the sentiment lexicon and judges whether the classification results meet the requirement. If they do not, the training-optimization-based algorithm adjusts the sentiment lexicon in the fourth part. The detailed descriptions of these four parts are as follows:
(i) Part (i). Sentiment word extraction: The review texts in a specific domain are collected from Web forums and used as the corpora to construct the sentiment lexicon. Since comment texts such as tweets are very short, each word and n-gram feature in a text has a certain impact on the result of sentiment classification. Thus, the words and n-gram features appearing in the corpora more than twice are selected as the candidate sentiment words.
(ii) Part (ii). Sentiment value initialization: The value of a sentiment word is set according to a ten-point system. We initialize each sentiment word with a random value from {−10, −9, …, −1, 0, 1, …, 9, 10}. The sign indicates the sentiment polarity, and the magnitude represents the sentiment intensity.
(iii) Part (iii). Sentiment classification and evaluation: A selected text is classified according to its sentiment polarity, which is determined as follows. Get the value of each sentiment word from the sentiment lexicon and accumulate the values of all sentiment words. If the sum of the sentiment values is larger than 0, the text is classified as positive; otherwise, it is classified as negative (a minimal code sketch of this rule is given after this list). Since all the testing texts have been annotated manually, the accuracy of text classification can be evaluated against the text labels. Finally, if the accuracy of text classification meets the requirement or the optimization algorithm has converged, output the current lexicon as the final sentiment lexicon. Otherwise, go to part (iv) and adjust the values of the sentiment words in the lexicon.
(iv) Part (iv). Adjusting the sentiment lexicon: Since there is no explicit rule to guide the adjustment of sentiment values, we randomly adjust the values of sentiment words. In order to ensure the effectiveness of the adjustment, we further transform the process of adjusting the values of sentiment words into an optimization process, with the accuracy of text classification as the optimization objective. The optimization of the sentiment lexicon is implemented by a genetic algorithm, which is described in Section 3.2.
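To make parts (i)–(iii) concrete, here is a minimal Python sketch, assuming a lexicon is stored as a dictionary mapping unigrams/n-grams to integer values in [−10, 10] and labels are encoded as +1 (positive) and −1 (negative); all function names are illustrative rather than taken from the paper.

```python
def extract_candidates(tokens, max_n=3):
    """Enumerate the unigrams, bigrams, and trigrams of a tokenized text."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def classify(text, lexicon):
    """Sum the lexicon values of all matched words/n-grams; the text is
    positive (+1) if the sum exceeds 0, else negative (-1)."""
    score = sum(lexicon.get(g, 0) for g in extract_candidates(text.lower().split()))
    return 1 if score > 0 else -1

def accuracy(texts, labels, lexicon):
    """Fraction of texts whose predicted polarity matches the +1/-1 label."""
    hits = sum(classify(t, lexicon) == y for t, y in zip(texts, labels))
    return hits / len(texts)
```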

3.2. The Proposed Genetic Algorithms

How to adjust the values of the sentiment words in the lexicon is the core of the framework. We adjust the intensity values of sentiment words based on the idea of the genetic algorithm. In this section, we first design the basic operations of the genetic algorithms, and then the framework is implemented by two algorithms, called Algorithm 1 and Algorithm 2, respectively.

3.2.1. Basic Operations
(i) Population initialization: In order to optimize the lexicon by a genetic algorithm, a sentiment lexicon is regarded as an individual. Correspondingly, each sentiment word in a lexicon is regarded as a gene of the individual and is initialized as a random integer between −10 and 10. In the same way, we initialize the whole population. Table 1 gives an example of an initialized population.
(ii) Fitness value calculation: The fitness value is designed based on whether a sentiment lexicon can correctly classify the texts. If a text is correctly classified by the lexicon, the fitness value of the lexicon is increased by a reward value. Otherwise, the fitness value is decreased by a penalty value. The reward/penalty function is described as follows:

$$R(L, t) = \begin{cases} r, & \text{if text } t \text{ is correctly classified by } L, \\ -\alpha, & \text{otherwise,} \end{cases} \tag{1}$$

where $L$ is an individual, i.e., a lexicon, $t$ is a text in the training dataset $T$, $r$ is the reward value, $\alpha$ is a penalty factor, and the classification of $t$ is based on its sentiment value $S(t)$, which is computed by the sentiment lexicon according to the following equation:

$$S(t) = \sum_{i=1}^{n} v_i, \tag{2}$$

where $v_i$ is the value of the $i$-th sentiment word and $n$ is the total number of sentiment words in the text. According to the reward and penalty function, the fitness value of a lexicon is calculated as follows:

$$F(L) = \sum_{t \in T} R(L, t). \tag{3}$$

(iii) Crossover: Let $L_1$ and $L_2$ be the two individuals that perform the crossover operation. In the process of crossover, a position is randomly selected; then, the sentiment value of the gene at this position in $L_1$ is exchanged with that of the corresponding gene in $L_2$. We repeat randomly selecting a new position and exchanging the corresponding sentiment values. The total number of exchanged genes is determined by the crossover ratio $p_c$. Finally, the crossover operation generates two new individuals, denoted as $L_1'$ and $L_2'$, respectively. For a better understanding of our crossover scheme, an example is given in Figure 2, where the second and fifth genes of individuals $L_1$ and $L_2$ are exchanged.
(iv) Mutation: In general, the sentiment polarity of a text is closely related to the polarities of the sentiment words in it. If the probability that a sentiment word appears in positive texts is greater than its probability in negative texts, the polarity of the sentiment word is also positive with high probability, and vice versa. According to this idea, we design a new mutation strategy, which differs from the traditional strategy of randomly changing the value of a gene. Define $p_w$ as the probability that sentiment word $w$ appears in positive texts. We present a function $g(p_w)$ (equation (4)) to guide the mutation of word $w$. Function $g(p_w)$ is plotted in Figure 3. We can see from Figure 3 that the value of $g(p_w)$ is around 0 and changes slowly when $0.4 < p_w < 0.6$; in this case, the probability of word $w$ appearing in a positive or negative text is almost the same, so $g(p_w)$ has little influence on the sentiment value of $w$. When $p_w < 0.2$, word $w$ appears in negative texts with a probability of more than 80%; the output of $g(p_w)$ is then a negative value that greatly influences the final sentiment value. Similarly, when $p_w > 0.8$, the output of $g(p_w)$ is a positive value with an important impact on the final sentiment value. The intervals $0.2 \le p_w \le 0.4$ and $0.6 \le p_w \le 0.8$ are transition intervals, where $g(p_w)$ has a moderate effect on the final sentiment value. According to $g(p_w)$, we propose the mutation formula as follows:

$$v_w' = \lambda + g(p_w), \tag{5}$$

where $v_w'$ is the value of sentiment word $w$ after mutation and $\lambda$ is a random number in the interval [−10, 10]. In equation (5), $\lambda$ guarantees the diversity of the population, and $g(p_w)$ shifts the value of the sentiment word along its polarity direction. It should be noted that if $v_w'$ falls outside the interval [−10, 10], it is set to the corresponding boundary value.

Based on the above analysis, our mutation scheme is as follows. Randomly select one gene in the individual and change its sentiment value according to equation (5). Then, repeat randomly selecting a new gene and changing its sentiment value. The total number of mutated genes is determined by the mutation ratio $p_m$.
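A compact sketch of the three operations follows, reusing `classify` from the sketch in Section 3.1. The fitness follows equations (1)–(3) with reward $r = 1$; since the exact form of the guide function of equation (4) is not reproduced here, `guide` below is an illustrative stand-in (a scaled cubic) that only mimics the qualitative shape described above: near zero in the middle and strongly signed near the endpoints.

```python
import random

def fitness(lexicon, texts, labels, reward=1.0, alpha=60.0):
    """Equations (1)-(3): add `reward` for each correctly classified text,
    subtract the penalty factor `alpha` for each misclassified one."""
    return sum(reward if classify(t, lexicon) == y else -alpha
               for t, y in zip(texts, labels))

def crossover(l1, l2, p_c, words):
    """Exchange the values of a fraction p_c of randomly chosen genes
    (words) between two lexicons, producing two new individuals."""
    c1, c2 = dict(l1), dict(l2)
    for w in random.sample(words, int(p_c * len(words))):
        c1[w], c2[w] = c2[w], c1[w]
    return c1, c2

def guide(p_w):
    """Illustrative stand-in for g(p_w) of equation (4): near zero when a
    word is equally frequent in positive and negative texts, strongly
    signed when p_w approaches 0 or 1."""
    return 10.0 * (2.0 * p_w - 1.0) ** 3

def mutate(lexicon, p_m, pos_prob):
    """Equation (5): v' = lambda + g(p_w), clipped to [-10, 10], applied
    to a fraction p_m of randomly chosen genes."""
    mutant = dict(lexicon)
    words = list(mutant)
    for w in random.sample(words, int(p_m * len(words))):
        lam = random.uniform(-10, 10)            # diversity term
        v = lam + guide(pos_prob.get(w, 0.5))    # polarity-directed shift
        mutant[w] = max(-10.0, min(10.0, v))     # clip to the boundaries
    return mutant
```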
3.2.2. Algorithm 1

According to the idea of evolution, Algorithm 1 is proposed to construct a sentiment lexicon. The details of Algorithm 1 are as follows:
Step 1. According to the target domain, short texts such as microblogs and tweets are collected from the Internet. Pick out the texts with emotions and manually annotate them to obtain a training dataset.
Step 2. For each text in the training dataset, the words and n-gram features appearing more than twice are selected as the words of the sentiment lexicon.
Step 3. Initialize the population according to the scheme in Section 3.2.1.
Step 4. Calculate the fitness value of each individual according to the scheme in Section 3.2.1. Then, the roulette wheel policy is used to select two individuals from the population, denoted as $L_1$ and $L_2$, respectively.
Step 5. Set the value of the crossover ratio $p_c$ and execute the crossover operation between $L_1$ and $L_2$ according to the crossover strategy in Section 3.2.1, yielding two individuals $L_1'$ and $L_2'$.
Step 6. Set the value of the mutation ratio $p_m$ and perform the mutation operation on $L_1'$ and $L_2'$ according to the mutation strategy in Section 3.2.1, yielding two individuals $L_1''$ and $L_2''$.
Step 7. According to the roulette wheel selection policy, select the two worst individuals from the population, denoted as $L_{w1}$ and $L_{w2}$, respectively. Compare the fitness values of $L_1''$, $L_2''$, $L_{w1}$, and $L_{w2}$, return the two individuals with higher fitness values to the population, and remove the other two.

Repeat Steps 4 to 7 until the population converges. Finally, the best individual in the population is selected as the sentiment lexicon.
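A minimal sketch of this steady-state loop, reusing the operations sketched in Section 3.2.1, could look as follows; the fixed iteration budget stands in for the convergence test, and selecting the two worst individuals by an inverted roulette wheel is one plausible reading of Step 7.

```python
def roulette(pop, fits, k=2, invert=False):
    """Roulette-wheel selection over shifted (non-negative) fitness values;
    with invert=True the wheel favours the worst individuals instead."""
    lo, hi = min(fits), max(fits)
    weights = [(hi - f) + 1e-9 for f in fits] if invert \
        else [(f - lo) + 1e-9 for f in fits]
    chosen = []
    while len(chosen) < k:                      # draw k distinct indices
        i = random.choices(range(len(pop)), weights=weights, k=1)[0]
        if i not in chosen:
            chosen.append(i)
    return chosen

def algorithm1(pop, texts, labels, pos_prob, words, p_c, p_m, iters=1000):
    """Steady-state GA: each iteration breeds two children (Steps 4-6) that
    compete with two weak individuals for a place in the population (Step 7)."""
    fits = [fitness(l, texts, labels) for l in pop]
    for _ in range(iters):
        i, j = roulette(pop, fits)                        # Step 4: parents
        c1, c2 = crossover(pop[i], pop[j], p_c, words)    # Step 5
        c1, c2 = mutate(c1, p_m, pos_prob), mutate(c2, p_m, pos_prob)  # Step 6
        w1, w2 = roulette(pop, fits, invert=True)         # Step 7: weak slots
        cand = [(c, fitness(c, texts, labels)) for c in (c1, c2)]
        cand += [(pop[w1], fits[w1]), (pop[w2], fits[w2])]
        cand.sort(key=lambda x: -x[1])                    # keep the best two
        (pop[w1], fits[w1]), (pop[w2], fits[w2]) = cand[0], cand[1]
    return pop[max(range(len(pop)), key=lambda i: fits[i])]
```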

3.2.3. Algorithm 2

In Algorithm 2, we optimize the sentiment lexicon through the evolution of the population from one generation to the next. Moreover, we introduce an elite strategy into Algorithm 2 to improve the speed of convergence. Algorithm 2 is described as follows:
Steps 1–3. These steps are the same as in Algorithm 1.
Step 4. Set the proportion of elites in the population as $p_e$. The individuals with the largest fitness values, amounting to a fraction $p_e$ of the population, are regarded as the elites of the current generation and are directly carried over to the next generation.
Steps 5–7. These steps are the same as Steps 4–6 of Algorithm 1.
Step 8. The two newly generated individuals $L_1''$ and $L_2''$ are added to the next generation. Repeat Steps 5–8 until all individuals of the next generation are produced.
Step 9. Repeat Steps 4–8 until the population converges. Output the best individual as the sentiment lexicon.
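Under the same assumptions, the generational variant with elitism can be sketched as follows, reusing the earlier helpers; `p_e` is the elite proportion of Step 4, and the fixed generation budget again stands in for the convergence test.

```python
def algorithm2(pop, texts, labels, pos_prob, words, p_c, p_m,
               p_e=0.1, generations=200):
    """Generational GA with elitism: the top p_e fraction is copied into the
    next generation (Step 4); the rest is bred by crossover and mutation."""
    for _ in range(generations):
        fits = [fitness(l, texts, labels) for l in pop]
        order = sorted(range(len(pop)), key=lambda i: -fits[i])
        nxt = [pop[i] for i in order[:int(p_e * len(pop))]]  # Step 4: elites
        while len(nxt) < len(pop):                           # Steps 5-8
            i, j = roulette(pop, fits)
            for child in crossover(pop[i], pop[j], p_c, words):
                if len(nxt) < len(pop):
                    nxt.append(mutate(child, p_m, pos_prob))
        pop = nxt
    fits = [fitness(l, texts, labels) for l in pop]
    return pop[max(range(len(pop)), key=lambda i: fits[i])]
```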

4. The Advantage of Our Scheme

Our scheme provides a simple way to construct a sentiment lexicon for a specific domain. In our scheme, the sentiment words are extracted from corpora collected from the specific domain, so the scheme can be regarded as a corpus-driven method, and obtaining enough high-quality corpora is very important to it. Nowadays, microblogs, Twitter, and Web forums provide plenty of resources for creating corpora. The sentiment lexicon constructed by our scheme can achieve high coverage if enough corpora are obtained, and newly generated sentiment words are also easy to collect if the corpora are updated from time to time. Under current technical conditions, it is not difficult to automatically obtain and update corpora from the Internet, which lays a good foundation for our scheme. Moreover, in the process of constructing the sentiment lexicon, our scheme does not rely on domain background knowledge or linguistic knowledge. Thus, our scheme has good universality and can be used to construct a sentiment lexicon for any domain.

In our scheme, we have adopted the working method of machine learning: the intensity and polarity of sentiment words are determined by a training-optimization process. Our scheme provides an indirect and effective way to solve the problem of inferring the values of sentiment words from the polarities of short texts. Thus, the sentiment lexicon constructed by our scheme includes not only sentiment words and their polarities but also their intensity values, which better supports the sentiment analysis of texts.

In our scheme, the quality of the sentiment lexicon depends heavily on the corpora collected from the target domain. Using Web search or Web crawler techniques, our scheme can continually collect corpora from the Internet, automatically obtain new sentiment words, including unigrams and n-grams, and update their intensity values. Thus, the sentiment lexicon constructed by our method can continually incorporate new sentiment words and improve its quality.

5. Experiments

5.1. Datasets

Our experiments are performed on five public datasets from different domains: OMD, SOMD, HCR, SemEval2013, and STS-Test. Since the polarities of the corpora in these datasets have already been annotated by their publishers, they can be used to test our scheme directly. The details of these datasets are as follows:
(i) OMD is a short-text dataset about the Obama-McCain debate, captured from the TV debate between Obama and McCain during the 2008 US elections. It has 710 positive texts and 1,196 negative texts and can be regarded as a dataset in the political domain.
(ii) SOMD is the Strict Obama-McCain Dataset, another version of OMD. It contains 569 positive and 347 negative tweets and is also used as a dataset in the political domain.
(iii) HCR is a comment dataset on Healthcare Reform, constructed in 2010. It contains 1,286 tweets, of which 369 are positive and 917 are negative. It is used as a dataset in the healthcare reform domain.
(iv) SemEval2013 is a short-text dataset on hot issues and products, dedicated to Twitter sentiment analysis. It contains 3,640 positive tweets and 1,458 negative tweets and can be regarded as a general dataset related to people's lives.
(v) STS-Test is the Stanford Twitter Dataset, which includes comments on hot issues. Its texts are manually annotated; among them, 177 are negative and 182 are positive.

5.2. Evaluation Metrics

Two performance metrics are used in the experiments of this paper: Accuracy and F1-measure. Accuracy is a common and intuitive evaluation index, shown in equation (6), which indicates the proportion of correctly classified texts to the total number of texts:

$$\text{Accuracy} = \frac{a + d}{a + b + c + d}, \tag{6}$$

where $a$ is the number of positive texts classified correctly, $d$ is the number of negative texts classified correctly, $b$ is the number of positive texts classified incorrectly, and $c$ is the number of negative texts classified incorrectly.

However, when the dataset has an unbalanced distribution, Accuracy cannot fully reflect the classifier's performance. In this work, F1-measure is adopted as another indicator; it is the harmonic mean of Precision and Recall and is defined as follows:

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{7}$$

where $\text{Precision} = a/(a + c)$ and $\text{Recall} = a/(a + b)$.
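Both metrics follow directly from the four counts; a small helper makes the computation explicit (the example call uses hypothetical counts for an HCR-sized test set, not results from the paper).

```python
def evaluation_metrics(a, b, c, d):
    """a: positives classified correctly, b: positives classified incorrectly,
    c: negatives classified incorrectly, d: negatives classified correctly."""
    acc = (a + d) / (a + b + c + d)                     # equation (6)
    precision = a / (a + c)
    recall = a / (a + b)
    f1 = 2 * precision * recall / (precision + recall)  # equation (7)
    return acc, f1

# Hypothetical counts for an HCR-sized split (369 positives, 917 negatives):
print(evaluation_metrics(300, 69, 117, 800))  # -> (0.855..., 0.763...)
```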

5.3. The Parameters of Algorithms

According to general experience, we set the population size to 2,000 and the penalty factor $\alpha$ to 60. Since the crossover ratio $p_c$ and the mutation ratio $p_m$ have an important impact on the performance of the genetic algorithm, we determine them experimentally. The HCR dataset is selected as the test dataset because it contains a large number of texts, and F1-measure is used as the indicator for choosing $p_c$ and $p_m$. The test results are shown in Table 2, according to which we set $p_c$ and $p_m$. For Algorithm 2, we also need to set the elite proportion $p_e$. Based on our experience, we set $p_e = 0.1$ for all datasets except the STS-Test. Because the number of labelled texts in the STS-Test is very small, we set $p_e$ to 0.025 for this dataset.
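To make the setup concrete, the following hypothetical sketch wires the stated parameters into the earlier helpers: a population of 2,000 lexicons, the penalty factor $\alpha = 60$ used inside `fitness`, and $p_e = 0.1$; the values of `p_c` and `p_m` below are placeholders, since their tuned values come from Table 2.

```python
def build_lexicon(texts, labels, pop_size=2000, p_c=0.2, p_m=0.05, p_e=0.1):
    """Assemble and run the whole pipeline (p_c and p_m are placeholders)."""
    # Candidate sentiment words: unigrams/n-grams appearing more than twice.
    counts = {}
    for t in texts:
        for g in extract_candidates(t.lower().split()):
            counts[g] = counts.get(g, 0) + 1
    words = [w for w, c in counts.items() if c > 2]

    # p_w: share of a word's occurrences that fall in positive texts
    # (crude substring matching is enough for a sketch).
    pos_prob = {}
    for w in words:
        occ = [y for t, y in zip(texts, labels) if w in t.lower()]
        pos_prob[w] = occ.count(1) / len(occ) if occ else 0.5

    # Initial population: each lexicon assigns random integers in [-10, 10].
    pop = [{w: random.randint(-10, 10) for w in words} for _ in range(pop_size)]
    return algorithm2(pop, texts, labels, pos_prob, words, p_c, p_m, p_e)
```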

5.4. Results and Analysis
5.4.1. The Result of Sentiment Lexicon

According to our scheme, the sentiment lexicon is generated from the training set, which is constructed by randomly selecting 80% of the corpora in the dataset. Here, we take the HCR dataset as an example and construct a sentiment lexicon for Healthcare Reform. This lexicon contains 1,787 unigrams, 1,772 bigrams, and 843 trigrams; some of them are listed in Table 3. It can be seen from Table 3 that the word "obstruction" is used as a negative word in our sentiment lexicon, although it is not generally regarded as a sentiment word. Quite a few words in our sentiment lexicon are similar to "obstruction" in this respect, which confirms that our scheme provides an effective way to obtain sentiment words in a specific domain.

In the sentiment lexicon constructed from the HCR dataset, there are 1,687 positive words and 2,500 negative words, which provides good coverage of sentiment words. However, the lexicon also contains 214 words whose sentiment values are 0. These words are regarded as neutral in our scheme and can be removed from the lexicon. In our scheme, to improve the coverage of the lexicon, we add candidate sentiment words according to the frequency of words appearing in the corpora, so it is possible that some words that are not sentiment words are also selected into the lexicon. The optimization process then sets the intensity values of such words close to 0. Thus, although some neutral words are selected into the sentiment lexicon at the beginning, they can still be excluded by their intensity values, which guarantees the high quality of our sentiment lexicon. Nevertheless, how to better select sentiment words from corpora is worth further study.

The sentiment lexicons constructed from the other datasets show a similar situation to the one based on the HCR dataset. Therefore, the experimental results confirm that our scheme can automatically construct sentiment words (unigrams and n-grams) for specific domains and reasonably set their intensity values.

5.4.2. Performance Analysis and Comparison

We first test the performance of Algorithm 1 and Algorithm 2 on the five datasets. Then, several variants of Algorithm 1 and Algorithm 2 are also considered. Algorithm-SW denotes running the algorithms with stop words removed. Algorithm+1, 2, 3-grams gives the results of combining the algorithms with n-gram features. Algorithm+Lex gives the results of combining the algorithms with Bing Liu's lexicon [26]. All the test results for each dataset are listed in Table 4.

In [16], the authors present a method to construct domain-specific sentiment lexicons based on unlabelled corpora. We implement this method and test its Accuracy and F1-measure; the results are also listed in Table 4 for comparison. Moreover, some similar schemes [27–29] are selected and tested on the same datasets. To save space, we only list the best result from the literature [27–29], which is denoted as BRFL in Table 4.

According to Table 4, the best Accuracy and F1-measure of Algorithm 1 and Algorithm 2 are always better than the results of [16]. This suggests that using labelled corpora is beneficial to improving the quality of the sentiment lexicon. On four datasets, i.e., STS-Test, HCR, OMD, and SOMD, the best results of Algorithm 1 and Algorithm 2 are better than the best results of BRFL. Only on the SemEval2013 dataset does BRFL perform slightly better than Algorithm 1 and Algorithm 2.

Meanwhile, Algorithm 1 performs slightly better than Algorithm 2 in most cases. In our experiments, both unigram and n-gram features are considered, where the n-gram features include bigrams and trigrams. According to Table 4, the sentiment lexicon performs better when the n-gram features are included. The test results thus confirm that n-gram features play an important role in sentiment analysis.

According to Table 4, combining Bing Liu's lexicon with our scheme achieves better performance in most cases. The main reason is that adding Bing Liu's lexicon increases the coverage of sentiment words. However, in some cases, combining Bing Liu's lexicon with our scheme does not improve the performance and even produces worse results. The main reason is that, for some sentiment words, the sentiment values in our lexicon and in Bing Liu's lexicon conflict. Bing Liu's lexicon is a general sentiment lexicon, so its sentiment words cannot reflect the traits of a specific domain. Therefore, using the sentiment words in Bing Liu's lexicon to replace the corresponding sentiment words in our lexicon lowers the performance of sentiment analysis. How to combine different sentiment lexicons is thus a problem worth studying in the future.

In addition, Algorithm-SW shows the results of running the algorithms without stop words. We can see that the results get worse in this case, which also confirms the conclusion in [30] that omitting stop words has a negative influence on sentiment analysis.

5.4.3. Adaptability Analysis

To test the adaptability of our scheme, we construct a Chinese dataset on the stock market by collecting more than 4,000 short comments from financial Web forums and microblog platforms such as https://www.weibo.com/. Then, we manually annotate the texts in the dataset. This Chinese stock market dataset is used to test whether our scheme is suitable for constructing Chinese sentiment lexicons. We generate a Chinese sentiment lexicon from this dataset and report the Accuracy and F1-measure in Table 5.

Moreover, some well-known Chinese sentiment lexicons, such as DUTIR and TSING, are tested on the same dataset and compared with our scheme. In [6], the sentiment lexicon is expanded by a neural learning method combining dictionary lookup and polarity association; this method is also compared with our scheme. All the test results are listed in Table 5 for better comparison.

According to Table 5, our scheme achieves better Accuracy and F1-measure than the other schemes. The test results show that our scheme is suitable for constructing Chinese sentiment lexicons, which also confirms that our scheme has good adaptability.

5.4.4. Efficiency Analysis

In Algorithm 1, only two individuals are updated in each iteration, which means that the speed of convergence is not very fast. The largest fitness value in the population is used as the criterion to evaluate convergence. We use the HCR dataset as an example to evaluate the efficiency of the proposed algorithms. Figure 4 shows the convergence process of the fitness value in Algorithm 1. It can be seen that Algorithm 1 converges after 128,000 iterations. Since each iteration updates two individuals, the convergence time can be regarded as the time for Algorithm 1 to update 256,000 individuals.

In Algorithm 2, the evolution of the population is carried out from generation to generation. The convergence process of Algorithm 2 is illustrated in Figure 5; Algorithm 2 converges at the 140th generation. To compare its efficiency with Algorithm 1, we use the number of updated individuals as the criterion. For Algorithm 1, the total number of updated individuals is 256,000. In Algorithm 2, the population size is 2,000 and the elite rate is 0.1, so the number of updated individuals in each generation is 1,800, and over 140 generations the total number of updated individuals is 252,000. Comparing these numbers, Algorithm 2 has slightly higher efficiency than Algorithm 1.

6. Conclusion

A sentiment lexicon is an important component for text sentiment analysis. In this paper, we proposed a framework based on training and optimization to construct sentiment lexicons for specific domains. Under this framework, our method provides a way to automatically generate sentiment words and their intensity values for a specific domain using labelled corpora. In particular, we design two genetic algorithms, which find suitable values for sentiment words by optimizing the accuracy of sentiment classification. Thanks to the fast development of the Internet, it is not difficult to find plenty of texts in a specific domain, which provides enough corpora for our method. Thus, our method can be easily implemented in practice and can effectively construct a sentiment lexicon for a given domain. Our method does not depend on domain knowledge and has good adaptability and universality. The test results on five datasets from different domains also confirm that the sentiment lexicons generated by our method have good performance and can effectively support the sentiment analysis of short texts.

The main limitation of our method is that we simply choose the words that appear more than twice in the corpus as candidate sentiment words. Quite a few words among them are not really sentiment words. Although most of the sentiment values of these words are close to 0 after optimization and their impacts are very weak, these words still have a negative effect on the results and efficiency of sentiment classification. Thus, it is necessary to design a method to filter out such unnecessary words.

In the future, it is still worth studying how to merge multiple sentiment lexicons to improve the coverage of sentiment words while avoiding conflicts between sentiment words. In addition, how to collect high-quality corpora for a specific domain from the Internet is also worth studying.

Data Availability

The data used to support the findings of this study are available within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61876200), the MOE Layout Foundation of Humanities and Social Sciences (no. 20YJAZH102), and the Chongqing Social Science Planning Project (no. K2015-59).