Semisupervised Learning Based Opinion Summarization and Classification for Online Product Reviews

Dalal, Mita K.; Zaveri, Mukesh A.

doi:https://doi.org/10.1155/2013/910706

Applied Computational Intelligence and Soft Computing


Research Article | Open Access

Volume 2013 | Article ID 910706 | https://doi.org/10.1155/2013/910706


Mita K. Dalal¹ and Mukesh A. Zaveri²

Academic Editor: Sebastian Ventura

Received 23 Mar 2013

Revised 25 Jun 2013

Accepted 27 Jun 2013

Published 16 Jul 2013

Abstract

The growth of E-commerce has led to the invention of several websites that market and sell products as well as allow users to post reviews. It is typical for an online buyer to refer to these reviews before making a buying decision. Hence, automatic summarization of users’ reviews has a great commercial significance. However, since the product reviews are written by nonexperts in an unstructured, natural language text, the task of summarizing them is challenging. This paper presents a semisupervised approach for mining online user reviews to generate comparative feature-based statistical summaries that can guide a user in making an online purchase. It includes various phases like preprocessing and feature extraction and pruning followed by feature-based opinion summarization and overall opinion sentiment classification. Empirical studies indicate that the approach used in the paper can identify opinionated sentences from blog reviews with a high average precision of 91% and can classify the polarity of the reviews with a good average accuracy of 86%.

1. Introduction

The Internet offers an effective, global platform for E-commerce, communication, and opinion sharing. It hosts many blogs devoted to diverse topics such as finance, politics, travel, education, sports, entertainment, news, history, and the environment, on which people frequently express their opinions in natural language. Mining through these terabytes of user review data is a challenging knowledge-engineering task. However, automatic opinion mining has several useful applications. Hence, in recent years researchers have proposed approaches for mining user-expressed opinions from several domains such as movie reviews [1], political debates [2], restaurant food reviews [3], and product reviews [4–11]. Generating user-query specific summaries is also an interesting application of opinion mining [12, 13]. Our focus in this paper is efficient feature extraction, sentiment polarity classification, and comparative feature summary generation of online product reviews.

Nowadays, several websites are available on which a variety of products are advertised and sold. Prior to making a purchase, an online shopper typically browses through several similar products of different brands before reaching a final decision. This seemingly simple information retrieval task actually involves a lot of feature-wise comparison and decision making, especially since all manufacturers advertise similar features and competitive prices for most products. However, most online shopping sites also allow users to post reviews of products purchased. There are also dedicated sites that post product reviews by experts as well as end users. These user reviews, if appropriately classified and summarized, can play an instrumental role in influencing a buyer’s decision.

The main difficulty in analyzing these online users’ reviews is that they are in the form of natural language. While natural language processing is inherently difficult, analyzing online unstructured textual reviews is even more difficult. Some of the major problems with processing unstructured text are dealing with spelling mistakes, incorrect punctuation, use of nondictionary words or slang terms, and undefined abbreviations. Often opinion is expressed in terms of partial phrases rather than complete, grammatically correct sentences. So, the task of summarizing noisy, unstructured online reviews demands extensive preprocessing [14].

In this paper, we apply a multistep approach to the problem of automatic opinion mining that consists of phases such as preprocessing and semantic feature-set extraction, followed by opinion summarization and classification. The multiword [15–18] based approach for feature extraction used in the paper offers significant advantages over other contemporary approaches like the Apriori-based approach [4, 6, 8, 10, 11, 19] and the seed-set expansion approach [1, 14]. Our approach significantly reduces the overhead of pruning compared to the Apriori-based approach and, unlike the seed-set expansion approach, does not require prior domain knowledge for selecting an initial seed set of features. We have demonstrated empirically that the approach proposed in the paper can identify opinionated sentences from blog reviews with a high average precision of 91%, outperforming the other two feature extraction strategies. The multistep approach can also classify the polarity of the reviews with a good average accuracy of 86%.

The rest of the paper is organized as follows. Section 2 explores related work in the area of opinion mining, Section 3 describes the strategy used for opinion mining, and Section 4 evaluates the efficiency of the strategy based on our experiments and obtained results. Finally, we conclude and discuss the scope for future work in this field.

2. Related Work

Classification and summarization of online blog reviews are very important to the growth of E-commerce and social-networking applications. Earlier work on automatic text summarization has mainly focused on extraction of sentences that are more significant in comparison to others in a document corpus [20–23]. The main approaches used to generate extractive summaries are (1) combinations of heuristics such as cue words, key words, title words, or position [20–23], (2) lexical chains [24], and (3) rhetorical parsing theory [25].

However, it is important to note that the task of summarizing online product reviews is very different from traditional text summarization, as it does not involve extracting significant sentences from the source text. Instead, while summarizing user reviews, the aim is first to identify semantic features of products and then to generate a comparative summary of products based on feature-wise sentiment classification of the reviews that will guide the user in making a buying decision. In [26], the authors have demonstrated that traditional supervised text classification techniques like naïve Bayes, maximum entropy, and support vector machine do not perform well on sentiment or opinion classification and pointed out the necessity for feature-oriented classification. Thus, recent research work in opinion mining has focused on feature-based extraction and summarization [5–8, 10, 11, 14, 19].

Opinion mining from users’ reviews involves two main tasks: (1) identification of the opinion feature set and (2) sentiment analysis of users’ opinions based on the identified features.

It has been observed that nouns and noun phrases (N and NP) frequently occurring in reviews are useful opinion features, while the adjectives and adverbs describing them are useful in classifying sentiment [4, 5, 9, 14, 27, 28].

In order to extract nouns, noun phrases, and adjectives from review text, parts-of-speech (POS) tagging [1, 4, 8, 14, 19, 28] is performed. However, not all nouns and noun phrases are useful in mining, so they cannot be included in the feature set directly. Instead, the feature set is subsequently extracted using approaches that involve frequency analysis and/or use of domain knowledge, as discussed next.

A popular approach for mining product features from reviews which has been used by several researchers is applying the Apriori algorithm [4, 6, 8, 10, 11, 19]. Now, the primary application of the Apriori algorithm proposed by Agrawal and Srikant [29] is market basket analysis, that is, to find out which products are frequently purchased together and to generate association rules on that basis. The advantage of applying this method for mining product features (frequently occurring N and NP) is that the initial frequent features set can be mined automatically. However, the disadvantage is that it treats individual words as transactional items and does not take into account the sequence in which they occur; thus, the semantic sense is lost, and it requires extensive pruning [4, 6, 8, 10, 11, 19] to remove redundant or incorrect features. Another approach is to specify a seed list of features which is subsequently expanded to generate a more extensive feature set. For example, in [1], the authors have generated an ontological features set for movies by choosing a seed set of features and expanding it using the conjunction rule [1, 30]. The seed-set expansion approach has also been used for feature-extraction from product reviews [14]. However, this approach requires some prior domain knowledge in order to specify the initial seed-set of features.

Various methods exist in the literature to associate features with their corresponding descriptors. Hu and Liu proposed the nearby-adjective heuristic [4, 19]. Although this method is simple and fast, it may result in inaccuracies. So, supervised approaches to determine association have been proposed in recent years such as syntactic dependency parsing [7] and syntactic tree templates [31].

Once the feature set is finalized, sentiment analysis can be performed on the users’ reviews. The orientation of the adjectives describing the elements of the extracted feature set is useful in performing the sentiment analysis. Earlier attempts at determining the semantic orientation of adjectives relied upon the use of supervised learning which involved frequency analysis and clustering on a large manually tagged corpus [27]. In [28], authors used the PMI statistic (pointwise mutual information) that predicted the orientation of an adjective based on its co-occurrence with words “excellent” or “poor.” However, since the adjective descriptors used for different products differ widely, it is not possible to achieve uniform accuracy using this technique. An alternative approach to determine the polarity of opinion words involves using an initial list of adjectives with known orientations [4, 19], which is subsequently expanded by looking up its synonyms and antonyms using the lexical resource WordNet [7, 32]. More recently, researchers have used the opinion mining tool SentiWordNet [1, 5, 33] to assist in the task of determining orientation of opinions in sentiment mining. In this paper, we have also used the SentiWordNet tool [1, 5, 33] for classifying the overall orientation of user reviews, with satisfactory results.

3. Proposed Opinion Summarizer and Classifier

In this section we explain the system design of the opinion summarizer and classifier that we implemented.

We generated an opinion review database by crawling some popular websites that categorically post product reviews by actual users. As shown in Figure 1, our product opinion summarizer has three main phases: (1) the preprocessing phase, (2) the feature extraction phase, and (3) the opinion summarization and classification phase. These phases are briefly described next.

3.1. Preprocessing Phase

Online blog reviews posted by users frequently contain spelling errors and incorrect punctuation. Our next phase, the feature-extraction phase, requires parts-of-speech tagging, which works at the sentence level. Thus, it becomes important to detect the end of each sentence. So, in this phase we performed basic cleaning tasks like sentence boundary detection and spell-error correction. Sentences normally end with a punctuation mark such as a period (.), question mark (?), or exclamation mark (!). Sometimes bloggers overuse the “?” and “!” symbols for emphasis. For example, a blogger may post a review that says

“It’s surprising that the ebook reader does not have a touch screen !!!!”

In such cases we conflate the repetitive punctuation symbols to a single occurrence (i.e., “!!!!” is replaced by a single “!”).
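The conflation rule above can be sketched with a regular expression. This is a minimal illustration, not the implementation used in the paper; the function name `conflate_punctuation` is hypothetical, and the choice to also collapse runs separated by whitespace is an assumption.

```python
import re

def conflate_punctuation(text: str) -> str:
    """Collapse a run of the same '!' or '?' symbol into a single occurrence."""
    # ([!?]) captures one symbol; (?:\s*\1)+ matches further copies of that
    # same symbol, optionally separated by whitespace (an assumption).
    return re.sub(r"([!?])(?:\s*\1)+", r"\1", text)

review = "It's surprising that the ebook reader does not have a touch screen !!!!"
print(conflate_punctuation(review))
```

Mixed runs such as "?!" are left untouched in this sketch, since the text only describes repetition of a single symbol.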

Several other considerations arise during the preprocessing phase. The period (.) must be disambiguated, as it may denote a full stop, a decimal point, or an abbreviation (e.g., “Dr.,” “Ltd.”). Sometimes a single sentence straddles multiple lines because the user pressed unnecessary return keys. In such cases we apply the sentence merge rules proposed by Dey and Haque [14]. After sentence boundary detection, we perform spell-error correction using a word processor.
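A rough sketch of period disambiguation and line merging is given below. It is a simplified heuristic under stated assumptions: the abbreviation list is illustrative, the line-merging step simply joins all lines and does not reproduce the merge rules of [14], and the function name `split_sentences` is hypothetical.

```python
import re

# Illustrative abbreviation list (an assumption, not taken from the paper).
ABBREVIATIONS = {"dr.", "ltd.", "mr.", "mrs.", "e.g.", "i.e."}

def split_sentences(text: str) -> list[str]:
    # Join lines broken by stray return keys (a simplification of [14]).
    text = re.sub(r"\s*\n\s*", " ", text.strip())
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+", text):
        end = match.end()
        # A terminator must be followed by whitespace or the end of the text;
        # this skips decimal points such as the one in "7.9".
        if end < len(text) and not text[end].isspace():
            continue
        # Skip periods that close a known abbreviation.
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue
        sentences.append(text[start:end].strip())
        start = end
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences

raw = "Dr. Smith likes the 7.9 inch\nscreen. Battery life is poor!"
for sentence in split_sentences(raw):
    print(sentence)
```

A production system would need a fuller abbreviation lexicon and handling for sentences that legitimately end with an abbreviation.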

3.2. Feature Extraction Phase

In this phase we extract opinion features from the pre-processed review text obtained from the previous phase. We treat frequently occurring nouns (N) and noun phrases (NP) as possible opinion features and associated adjectives describing them as indicators of their opinion orientation.

We perform parts-of-speech (POS) tagging on the review sentences using the Link Grammar Parser [34]. The Link Grammar Parser is a well-known and efficient syntactic parser for the English language (http://www.abisource.com/projects/link-grammar/). First, we extract all nouns (N) and noun phrases (NP) tagged by the Link Grammar Parser and identify the frequently occurring N and NP as possible opinion features. By frequently occurring N and NP we mean those that occur at least five times in the users’ reviews. We do not extract frequent itemsets from the review sentence database using the Apriori-based approach [4, 6, 8, 10, 11, 19], since this method mines frequent features using a BOW (bag-of-words) approach and does not take into account the order in which the words of a phrase occur. Moreover, mining in this way would require ordering in addition to compactness and redundancy pruning [4, 8, 19]. We also do not use the seed-set expansion approach, as it would require prior domain knowledge to specify a seed set [1, 14]. Instead, we generate a frequent feature set using the multiword approach [15–18].
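The frequency threshold described above can be sketched as follows. The sketch assumes the nouns and noun phrases have already been extracted by a parser and are passed in as plain strings, so each phrase is counted as an ordered unit rather than as a bag of words; the function name `frequent_candidates` and the sample data are hypothetical.

```python
from collections import Counter

MIN_SUPPORT = 5  # "at least five times", as stated in the text

def frequent_candidates(phrases, min_support=MIN_SUPPORT):
    """Return candidate features whose occurrence count reaches min_support."""
    # Each phrase is counted whole, so word order inside a phrase is preserved.
    counts = Counter(phrase.lower().strip() for phrase in phrases)
    return {phrase: n for phrase, n in counts.items() if n >= min_support}

# Toy input standing in for parser output (made-up data for illustration).
extracted = ["retina display"] * 6 + ["display retina"] * 2 + ["processor"] * 5
print(frequent_candidates(extracted))
```

Because "retina display" and "display retina" are counted separately, the ordering information that an itemset-based count would discard is retained.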

A multiword is an ordered sequence of words that has a higher semantic significance than the individual words comprising it. For example, “face time camera,” “retina display,” “wireless connectivity,” and “quad core graphics” are some of the multiwords extracted from user reviews of Tablet computers. Frequently occurring single words are also added to the feature set when they are not already subsets of existing multiwords. Stemming is performed on noun features expressed in plural to convert them to singular expression in order to improve their chances of matching [8]. As an example of stemming the feature word “processors” is stemmed to “processor.”

Along with each feature we also store the list of adjectives describing them and any opinion modifiers if present (such as “not”) preceding the identified descriptor in the review sentence. For example, consider the following parsed review sentence about a product (Tablet):

“The processor[.n] is[.v] significantly faster[.a], and the text[.n] is[.v] clear[.a].”

In the previous sentence, “.n” indicates noun, “.v” indicates verb, and “.a” indicates adjective. In this sentence, the nouns “processor” and “text” are opinion features while “faster” and “clear” are, respectively, the adjectives describing them.
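As a hedged illustration, the tagged sentence above can be turned into feature and descriptor pairs with a few lines of code. The pairing rule used here (each adjective is attached to the most recent preceding noun, and a directly preceding “not” is recorded as a modifier) is an assumption made for this sketch; the text does not spell out how the parser output is traversed, and `feature_descriptor_pairs` is a hypothetical name.

```python
import re

# Matches tokens such as "processor[.n]" or "clear[.a]." and captures word and tag.
TAG = re.compile(r"^(\w+)\[\.([nva])\]")

def feature_descriptor_pairs(tagged_sentence):
    """Return (feature, adjective, negated) triples from a tagged sentence."""
    pairs, current_noun, previous_word = [], None, None
    for token in tagged_sentence.split():
        match = TAG.match(token)
        word = match.group(1) if match else token.strip('.,;:!?"')
        tag = match.group(2) if match else None
        if tag == "n":
            current_noun = word
        elif tag == "a" and current_noun is not None:
            pairs.append((current_noun, word, previous_word == "not"))
            current_noun = None
        previous_word = word.lower()
    return pairs

sentence = ("The processor[.n] is[.v] significantly faster[.a], "
            "and the text[.n] is[.v] clear[.a].")
print(feature_descriptor_pairs(sentence))
# [('processor', 'faster', False), ('text', 'clear', False)]
```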

While extracting multiword opinion features, it is possible that some multiword is a substring of another. For example, suppose that both the multiwords “Nexus 7 front camera” and “front camera” have been extracted as frequent features from our review database of Tablets. In such cases, we adopt the decomposition strategy [16], which favors a shorter feature compared to a longer one. Thus, the decomposition strategy pruning approach favors the more generic “front camera” as an opinion feature and discards the longer multiword. This is done due to two reasons. The first reason is that we want our opinion features to be as generic as possible over a product range. The second reason is based on our observation that bloggers who post reviews online are not experts and they prefer to describe products using shorter multiword features over longer ones.
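The pruning rule described above can be sketched as a containment check over word sequences. This is one possible reading of the decomposition strategy, applied only among multiwords (phrases of two or more words); the function names are hypothetical and the sample list is illustrative.

```python
def contains_phrase(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))

def decomposition_prune(multiwords):
    """Drop any multiword that contains a shorter multiword from the same list."""
    tokenized = [tuple(mw.lower().split()) for mw in multiwords]
    kept = []
    for original, candidate in zip(multiwords, tokenized):
        has_shorter = any(
            2 <= len(other) < len(candidate) and contains_phrase(candidate, other)
            for other in tokenized
        )
        if not has_shorter:
            kept.append(original)
    return kept

features = ["Nexus 7 front camera", "front camera", "retina display"]
print(decomposition_prune(features))
# ['front camera', 'retina display']
```

Single words are deliberately excluded from the comparison in this sketch, since the text adds them to the feature set only when they are not already part of a multiword.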

Since not all of the frequent N and NP extracted using the strategy described above are actual semantic features, the frequent feature set mined in this way is scrutinized by a human expert, and non-features are pruned. (Note, however, that the pruning overhead is much lower than with the Apriori-based approach, as shown empirically in Section 4.) Moreover, semantically similar features need to be grouped together manually in order to improve the accuracy of feature-based summarization. For example, as shown in Table 1, the features “screen/display/touchscreen,” although extracted separately, actually allude to the same feature. Thus, our final feature set is generated using a semi-supervised approach.

In order to allow for the large language vocabulary of bloggers, we also enhance our final feature set by looking up and adding synonyms of the extracted features using the online lexical resource WordNet 3.1 (http://wordnet.princeton.edu/).

3.3. Opinion Summarization and Classification Phase

In the previous phase we extracted opinion features, adjectives describing them, and any modifiers if present. We also generate a statistical feature-wise summary for each product which enables comparison of different brands selling similar products. In order to determine the sentiment polarity of an adjective describing an opinion feature we make use of SentiWordNet [1, 5, 33] which is a lexical resource for opinion mining. SentiWordNet assigns three normalized sentiment scores: positivity, objectivity, and negativity to each synset of WordNet [7, 32]. Let us revisit the review sentence:

“The processor[.n] is[.v] significantly faster[.a], and the text[.n] is[.v] clear[.a].”

In this example, the SentiWordNet scores assigned to the appropriate sense of the adjective “clear” are (P: 0.625; O: 0.375; N: 0). Since the positive score is the highest, the adjective “clear” can be assigned a positive polarity. In this way, we generate a feature-orientation table (FO table) that records the opinion features and their corresponding descriptors of positive and negative polarities. Table 1 shows the FO table entries for some of the features of the product “Tablet.” The FO table, thus generated, enables us to generate a feature-wise summary of a product or comparative summaries of different brands of similar products. For example, Figure 2 compares two different models of Tablet computers based on the feature-wise summary generated from several online user reviews.
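A minimal sketch of the polarity decision and the FO table follows. The score triple for “clear” is the one quoted in the text; the table layout, the function names, the tie-breaking order, and the decision to skip descriptors whose highest score is objectivity are assumptions of this sketch rather than details given in the paper.

```python
from collections import defaultdict

def polarity(scores):
    """Return the label with the highest score among P, O, and N.

    Ties are broken in the order P, O, N (an arbitrary choice for this sketch).
    """
    labels = {"P": "positive", "O": "objective", "N": "negative"}
    best = max(("P", "O", "N"), key=lambda key: scores[key])
    return labels[best]

def build_fo_table(pairs, score_lookup):
    """Group descriptors by feature and polarity; objective ones are skipped."""
    table = defaultdict(lambda: {"positive": set(), "negative": set()})
    for feature, adjective in pairs:
        label = polarity(score_lookup[adjective])
        if label != "objective":
            table[feature][label].add(adjective)
    return dict(table)

# Scores for "clear" as quoted in the text.
scores = {"clear": {"P": 0.625, "O": 0.375, "N": 0.0}}
fo_table = build_fo_table([("text", "clear")], scores)
print(fo_table)  # {'text': {'positive': {'clear'}, 'negative': set()}}
```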

Intuitively, if a user’s opinion contains more features with positive polarity than features with negative polarity, it should be classified as positive. Likewise, a larger number of negative polarity features should cause a user opinion to be classified as negative. To achieve this, we calculate the normalized positive bias (Pos_Bias) as explained next. Note that a user opinion contains one or more sentences opining on one or more features. Let $N$ be the total number of features explicitly mentioned in a user opinion. Let $P$ be the number of features tagged positive using the FO table, and let $Q$ be the number of features tagged as negative. Now, we calculate the term Pos_Bias, which indicates the normalized positive bias of the opinion, as follows:

$$\text{Pos\_Bias} = \frac{P - Q}{N}$$

The value of Pos_Bias falls in the range [−1, 1]. If the value of Pos_Bias is positive, the opinion is classified as positive. If its value is negative, the opinion is classified as negative, and if its value is zero the opinion is classified as neutral. Although we have performed a simplistic classification at this stage, the range of the Pos_Bias variable naturally lends itself to classification using different granularities and can be used for fuzzy classification.
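The classification rule can be sketched in a few lines. The sketch assumes that the normalized positive bias is the difference between the positive and negative feature counts divided by the total number of features mentioned, which is consistent with the stated range of [−1, 1] and with the sign-based decision; the function and parameter names are hypothetical.

```python
def pos_bias(n_pos, n_neg, n_total):
    """Normalized positive bias: (positive - negative) / total features.

    The formula is an assumption consistent with the range [-1, 1] given in the text.
    """
    if n_total <= 0:
        raise ValueError("an opinion must mention at least one feature")
    return (n_pos - n_neg) / n_total

def classify(bias):
    """Map the sign of the bias to a polarity label."""
    if bias > 0:
        return "positive"
    if bias < 0:
        return "negative"
    return "neutral"

# An opinion mentioning four features: three tagged positive, one negative.
bias = pos_bias(3, 1, 4)
print(bias, classify(bias))  # 0.5 positive
```

Because the bias is a continuous value, thresholds other than zero could be used to obtain finer-grained classes.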

4. Empirical Evaluation and Results

We collected over 1400 online reviews for four Tablets and three E-book Readers (approximately 200 reviews for each product) of leading brands from several popular review websites using a web crawler. The list of products is given in Table 2. We applied preprocessing steps like sentence boundary detection, spell-error correction, and repetitive punctuation conflation to the review dataset, as explained earlier in the preprocessing phase.

For each product category, we used 70% of the review sentences for training and 30% for testing. We obtained the feature set as explained in the feature extraction phase, determined the polarity (+/−) of the opinion words using SentiWordNet 3.0 (http://sentiwordnet.isti.cnr.it/), and generated the feature orientation table. The FO table entries were then used to identify opinionated sentences from user reviews and finally to classify their polarity as positive or negative as explained earlier.

We evaluated the effectiveness of the opinion mining strategy proposed in Section 3 at two tasks: (1) automatically identifying opinionated sentences based on extracted features and (2) classifying the polarity of the users’ opinions.

In order to demonstrate the effectiveness of our multiword-based feature extraction approach, we have compared it with two other popular approaches: (1) the Apriori-based approach [4, 6, 8, 10, 11, 19] for initial feature set extraction and (2) the seed-set expansion approach [1, 14]. Since the seed-set expansion approach requires an initial list of features, we used an input of 10 manually selected and verified features for both product categories and then expanded it using the Word Expansion Algorithm as in [14].

Figure 3 compares the three approaches on the basis of the initial number of features extracted as compared to the actual number of usable semantic features as verified by a human expert. In Figure 3, Strategy 1 indicates the initial feature set retrieved using the Apriori based approach (before pruning), while Strategy 2 indicates the seed-set expansion approach. Strategy 3 indicates the feature-set extraction approach used by us as explained in the feature extraction phase of Section 3 (i.e., frequent single words + multiwords with decomposition pruning).

It becomes clear from Figure 3 that Strategy 3 performs better than both Strategy 1 and Strategy 2 since its initial feature-set is closer to the ideal feature set.

The main drawback of Strategy 1 (Apriori-based approach) is that it treats individual words as tokens following a bag-of-words approach and does not consider their order of appearance while mining frequent itemsets. So, its output contains several redundant features and requires extensive pruning. The problem with Strategy 2 (seed-set expansion approach) is that it failed to extract sufficient features from the review dataset in spite of being given an initial seed set of relevant features. In contrast, Strategy 3 (multiword-based approach) takes into account the order of words from the beginning, so its features carry more semantic sense and less redundancy. This method also does not require any prior domain knowledge. Thus, it is superior to both of the other strategies. This is also supported by the empirical results shown in Table 3.

Table 3 compares the accuracy of the three feature extraction techniques when used for the task of automatically identifying opinionated sentences from the test set of user reviews. In order to perform the testing, opinionated sentences in the test set were first manually extracted and the polarity of the corresponding opinion features was tagged by human expert. Then, the same test set was subjected to automatic opinion mining using feature sets derived using the three strategies discussed previously. The comparative accuracy of opinionated sentence identification was recorded in terms of Precision, Recall and F-Measure as shown in Table 3.

The results in Table 3 show that our strategy outperforms the other two strategies. While the seed-set based approach achieved high precision (89%), its recall is quite poor (65%). The primary reason for this is that the seed-set approach failed to identify several correct features from the review data set. The Apriori approach (with the feature set pruned using compactness and redundancy pruning [8, 19]) achieved a good precision of 87% and a recall of 75%. However, the multiword-based technique performed best, with 91% precision and 78% recall. The low recall rates are mainly due to the fact that implicitly expressed opinions could not be automatically identified by looking up the FO table.

For example, consider the following sentence referring to a Tablet:

“Since I bought it, I have not had to use my computer in weeks.”

A human expert would tag the aforementioned review sentence as a positive opinion, but there is no mention of any specific feature or adjective descriptor in the sentence, so it could not be automatically identified as a positive review using the FO table. Thus, lower recall was mainly due to false negatives in the opinion feature identification phase.
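For reference, the evaluation measures discussed above can be computed as in the sketch below. It assumes the balanced F-measure (the harmonic mean of precision and recall), which the text does not state explicitly; the function names are hypothetical, and the printed value is derived arithmetically from the 91% precision and 78% recall reported above rather than read from Table 3.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean); assumed here, not stated in the text."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the multiword-based strategy.
print(round(f_measure(0.91, 0.78), 2))  # 0.84
```

An implicit opinion that is missed, such as the example sentence above, counts as a false negative and therefore lowers recall without affecting precision.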

The accuracy of the opinion polarity classification task is directly affected by the accuracy of opinion-bearing sentence identification, because only the sentences identified as opinionated could be used for classification purposes. Nevertheless, we achieved a high average accuracy of 86% when classifying the polarity of opinions over the test set, as shown in Table 4. Thus, the empirical results of the opinion mining technique are encouraging.

At present we have only classified opinions as positive or negative based on the Pos_Bias term as explained in the previous section. However, in the future we would like to take into account accelerators and decelerators [14] that enhance the effect of the adjectives determining opinion orientation and use them for fuzzy polarity classification. For example, consider the following reviews.

(1) “The wireless connectivity is extremely good.”
(2) “The wireless connectivity is good.”

In both the previous examples, the multiword “wireless connectivity” is a feature while “good” is the adjective describing it. Although both sentences have a positive orientation, the word “extremely” in the first sentence acts as an accelerator which communicates the positive sense more strongly. Incorporating the effect of such linguistic hedges [35, 36] can improve the result of opinion mining.

5. Conclusion and Future Work

Classifying and summarizing opinions of bloggers has several interesting and commercially significant applications. However, this task is much more difficult than classifying regular text and requires intensive preprocessing. The success of the opinion mining task depends mainly on the efficiency and sophistication of the preprocessing and feature extraction steps. We showed empirically that the proposed approach for product feature set extraction, that is, using frequent multiwords with the decomposition strategy, outperforms other contemporary approaches like the Apriori-based approach and the seed-set expansion approach.

Empirical results indicate that the multistep feature-based semisupervised opinion mining approach used in this paper can successfully identify opinionated sentences from unstructured user reviews and classify their orientation with acceptable accuracy. This enables reliable review opinion summarization which has several commercially important applications.

In the future, we want to perform opinion mining on larger and more varied blog data sets. We would also like to extend our work to fuzzy opinion classification with support for fuzzy user querying. We intend to do this by learning the strength of various adjective descriptors along with corresponding linguistic hedges and include them in the feature-orientation table generated during the mining process. The classification technique proposed in the paper can be naturally extended to support fuzzy classification.

References

L. Zhao and C. Li, “Ontology based opinion mining for movie reviews,” in Proceedings of the 3rd International Conference on Knowledge Science, Engineering and Management, pp. 204–214, 2009.
View at: Google Scholar
A. Balahur, Z. Kozareva, and A. Montoyo, “Determining the polarity and source of opinions expressed in political debates,” in Proceedings of the 10th International Conference on Intelligent Text Processing and Computational Linguistics, vol. 5449 of Lecture Notes in Computer Science, pp. 468–480, Springer, 2009.
View at: Google Scholar
Y. H. Gu and S. J. Yoo, “Mining popular menu items of a restaurant from web reviews,” in Proceedings of the International Conference on Web Information Systems and Mining (WISM '11), vol. 6988 of Lecture Notes in Computer Science, pp. 242–250, Springer, 2011.
View at: Google Scholar
M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘04), pp. 168–177, August 2004.
View at: Google Scholar
M. A. Jahiruddin, M. N. Doja, and T. Ahmad, “Feature and opinion mining for customer review summarization,” in Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence (PReMI ‘09), vol. 5909 of Lecture Notes in Computer Science, pp. 219–224, 2009.
View at: Google Scholar
S. Shi and Y. Wang, “A product features mining method based on association rules and the degree of property co-occurrence,” in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), pp. 1190–1194, December 2011.
View at: Publisher Site | Google Scholar
S. Huang, X. Liu, X. Peng, and Z. Niu, “Fine-grained product features extraction and categorization in reviews opinion mining,” in Proceedings of the 12th IEEE International Conference on Data Mining Workshops (ICDMW '12), pp. 680–686, 2012.
View at: Google Scholar
C.-P. Wei, Y.-M. Chen, C.-S. Yang, and C. C. Yang, “Understanding what concerns consumers: a semantic approach to product feature extraction from consumer reviews,” Information Systems and e-Business Management, vol. 8, no. 2, pp. 149–167, 2010.
View at: Publisher Site | Google Scholar
A.-M. Popescu and O. Etzioni, “Extracting product features and opinions from reviews,” in Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP '05), pp. 339–346, October 2005.
View at: Google Scholar
H. Zhang, Z. Yu, M. Xu, and Y. Shi, “Feature-level sentiment analysis for Chinese product reviews,” in Proceedings of the IEEE 3rd International Conference on Computer Research and Development (ICCRD '11), vol. 2, pp. 135–140, March 2011.
View at: Publisher Site | Google Scholar
W. Y. Kim, K. I. Kim, J. S. Ryu, and U. M. Kim, “A method for opinion mining of product reviews using association rules,” in Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human (ICIS '09), pp. 270–274, November 2009.
View at: Publisher Site | Google Scholar
O. Feiguina and G. Lapalme, “Query-based summarization of customer reviews,” in Proceedings of the 20th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence, vol. 4509 of Lecture Notes in Artificial Intelligence, pp. 452–463, Springer, 2007.
View at: Google Scholar
F. Jin, M. Huang, and X. Zhu, “A query-specific opinion summarization system,” in Proceedings of the 8th IEEE International Conference on Cognitive Informatics (ICCI '09), pp. 428–433, June 2009.
View at: Publisher Site | Google Scholar
L. Dey and S. M. Haque, “Opinion mining from noisy text data,” International Journal on Document Analysis and Recognition, vol. 12, no. 3, pp. 205–226, 2009.
View at: Publisher Site | Google Scholar
K. W. Church and P. Hanks, “Word association norms, mutual information and lexicography,” Computational Linguistics, vol. 16, no. 1, pp. 22–29, 1990.
W. Zhang, T. Yoshida, and X. Tang, “Text classification using multi-word features,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC '07), pp. 3519–3524, October 2007.
M. K. Dalal and M. A. Zaveri, “Automatic text classification of sports blog data,” in Proceedings of the Computing, Communications and Applications Conference (ComComAp '12), pp. 219–222, January 2012.
W. Zhang, T. Yoshida, and X. Tang, “TFIDF, LSI and multi-word in information retrieval and text categorization,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '08), pp. 108–113, October 2008.
M. Hu and B. Liu, “Mining opinion features in customer reviews,” in Proceedings of the 19th National Conference on Artificial Intelligence (AAAI '04), pp. 755–760, San Jose, Calif, USA, July 2004.
H. P. Luhn, “The automatic creation of literature abstracts,” IBM Journal of Research and Development, vol. 2, pp. 159–165, 1958.
H. P. Edmundson, “New methods in automatic extracting,” Journal of the ACM, vol. 16, pp. 264–285, 1969.
C. Y. Lin and E. H. Hovy, “Manual and automatic evaluation of summaries,” in Proceedings of the ACL-02 Workshop on Automatic Summarization, vol. 4, pp. 45–51, 2002.
C. Y. Lin and E. H. Hovy, “Identifying topics by position,” in Proceedings of the 5th Conference on Applied Natural Language Processing, pp. 283–290, 1997.
R. Barzilay and M. Elhadad, “Using lexical chains for text summarization,” in Proceedings of the ACL Workshop on Intelligent Scalable Text Summarization, pp. 10–17, 1997.
D. Marcu, “Improving summarization through rhetorical parsing tuning,” in Proceedings of the 6th Workshop on Very Large Corpora, pp. 206–215, 1998.
B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proceedings of the International Conference on Empirical Methods in Natural Language Processing (EMNLP '02), pp. 79–86, 2002.
V. Hatzivassiloglou and K. McKeown, “Predicting the semantic orientation of adjectives,” in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics (ACL '98), pp. 174–181, 1998.
P. D. Turney, “Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417–424, 2002.
R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499, 1994.
H. Kanayama and T. Nasukawa, “Fully automatic lexicon expansion for domain-oriented sentiment analysis,” in Proceedings of the 11th Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 355–363, July 2006.
L. Wu, Y. Zhou, F. Tan, F. Yang, and J. Li, “Generating syntactic tree templates for feature-based opinion mining,” in Proceedings of the 7th International Conference on Advanced Data Mining and Applications (ADMA '11), vol. 7121 of Lecture Notes in Artificial Intelligence, pp. 1–12, Springer, 2011.
G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.
S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining,” in Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC '10), pp. 2200–2204, 2010.
D. Sleator and D. Temperley, “Parsing English with a link grammar,” in Proceedings of the 3rd International Workshop on Parsing Technologies, pp. 1–14, 1993.
V. N. Huynh, T. B. Ho, and Y. Nakamori, “A parametric representation of linguistic hedges in Zadeh's fuzzy logic,” International Journal of Approximate Reasoning, vol. 30, no. 3, pp. 203–223, 2002.
T. Zamali, M. A. Lazim, and M. T. A. Osman, “Sensitivity analysis using fuzzy linguistic hedges,” in Proceedings of the IEEE Symposium on Humanities, Science and Engineering Research, pp. 669–672, 2012.

Copyright

Copyright © 2013 Mita K. Dalal and Mukesh A. Zaveri. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
