Abstract

The growth of E-commerce has led to the creation of many websites that market and sell products and also allow users to post reviews. An online buyer typically refers to these reviews before making a buying decision. Hence, automatic summarization of users’ reviews has great commercial significance. However, since product reviews are written by nonexperts in unstructured natural-language text, the task of summarizing them is challenging. This paper presents a semisupervised approach for mining online user reviews to generate comparative feature-based statistical summaries that can guide a user in making an online purchase. It includes phases for preprocessing, feature extraction, and pruning, followed by feature-based opinion summarization and overall opinion sentiment classification. Empirical studies indicate that the approach used in the paper can identify opinionated sentences from blog reviews with an average precision of 91% and can classify the polarity of the reviews with an average accuracy of 86%.

1. Introduction

The Internet offers an effective, global platform for E-commerce, communication, and opinion sharing. It hosts many blogs devoted to diverse topics such as finance, politics, travel, education, sports, entertainment, news, history, and the environment, on which people frequently express their opinions in natural language. Mining through these terabytes of user review data is a challenging knowledge-engineering task. However, automatic opinion mining has several useful applications. Hence, in recent years researchers have proposed approaches for mining user-expressed opinions from several domains such as movie reviews [1], political debates [2], restaurant food reviews [3], and product reviews [4–11]. Generating user-query specific summaries is also an interesting application of opinion mining [12, 13]. Our focus in this paper is efficient feature extraction, sentiment polarity classification, and comparative feature summary generation of online product reviews.

Nowadays, several websites are available on which a variety of products are advertised and sold. Prior to making a purchase, an online shopper typically browses through several similar products of different brands before reaching a final decision. This seemingly simple information retrieval task actually involves a lot of feature-wise comparison and decision making, especially since all manufacturers advertise similar features and competitive prices for most products. However, most online shopping sites also allow users to post reviews of products purchased. There are also dedicated sites that post product reviews by experts as well as end users. These user reviews, if appropriately classified and summarized, can play an instrumental role in influencing a buyer’s decision.

The main difficulty in analyzing these online users’ reviews is that they are in the form of natural language. While natural language processing is inherently difficult, analyzing unstructured online textual reviews is even more difficult. Some of the major problems with processing unstructured text are spelling mistakes, incorrect punctuation, use of nondictionary words or slang terms, and undefined abbreviations. Often an opinion is expressed in partial phrases rather than complete, grammatically correct sentences. So, the task of summarizing noisy, unstructured online reviews demands extensive preprocessing [14].

In this paper, we apply a multistep approach to the problem of automatic opinion mining that consists of phases for preprocessing and semantic feature-set extraction, followed by opinion summarization and classification. The multiword [15–18] based approach for feature extraction used in the paper offers significant advantages over other contemporary approaches like the Apriori-based approach [4, 6, 8, 10, 11, 19] and the seed-set expansion approach [1, 14]. Our approach significantly reduces the overhead of pruning compared to the Apriori-based approach and, unlike the seed-set expansion approach, does not require prior domain knowledge for selecting an initial seed set of features. We have demonstrated empirically that the approach proposed in the paper can identify opinionated sentences from blog reviews with an average precision of 91%, outperforming the other two feature extraction strategies. The multistep approach can also classify the polarity of the reviews with an average accuracy of 86%.

The rest of the paper is organized as follows. Section 2 explores related work in the area of opinion mining, Section 3 describes the strategy used for opinion mining, and Section 4 evaluates the efficiency of the strategy based on our experiments and obtained results. Finally, we conclude and discuss the scope for future work in this field.

2. Related Work

Classification and summarization of online blog reviews are very important to the growth of E-commerce and social-networking applications. Earlier work on automatic text summarization has mainly focused on extraction of sentences that are more significant in comparison to others in a document corpus [20–23]. The main approaches used to generate extractive summaries are (1) combinations of heuristics such as cue words, key words, title words, or position [20–23], (2) lexical chains [24], and (3) rhetorical parsing theory [25].

However, it is important to note that the task of summarizing online product reviews is very different from traditional text summarization, as it does not involve extracting significant sentences from the source text. Instead, while summarizing user reviews, the aim is first to identify semantic features of products and then to generate a comparative summary of products based on feature-wise sentiment classification of the reviews that will guide the user in making a buying decision. In [26], the authors have demonstrated that traditional supervised text classification techniques like naïve Bayes, maximum entropy, and support vector machine do not perform well on sentiment or opinion classification and pointed out the necessity for feature-oriented classification. Thus, recent research work in opinion mining has focused on feature-based extraction and summarization [5–8, 10, 11, 14, 19].

Opinion mining from users’ reviews involves two main tasks: (1) identification of the opinion feature set and (2) sentiment analysis of users’ opinions based on the identified features.

It has been observed that nouns and noun phrases (N and NP) frequently occurring in reviews are useful opinion features, while the adjectives and adverbs describing them are useful in classifying sentiment [4, 5, 9, 14, 27, 28].

In order to extract nouns, noun phrases, and adjectives from review text, parts-of-speech (POS) tagging [1, 4, 8, 14, 19, 28] is performed. However, not all nouns and noun phrases are useful in mining, so they cannot be included directly in the feature set. The feature set is therefore extracted subsequently using approaches that involve frequency analysis and/or use of domain knowledge, as discussed next.

A popular approach for mining product features from reviews which has been used by several researchers is applying the Apriori algorithm [4, 6, 8, 10, 11, 19]. Now, the primary application of the Apriori algorithm proposed by Agrawal and Srikant [29] is market basket analysis, that is, to find out which products are frequently purchased together and to generate association rules on that basis. The advantage of applying this method for mining product features (frequently occurring N and NP) is that the initial frequent features set can be mined automatically. However, the disadvantage is that it treats individual words as transactional items and does not take into account the sequence in which they occur; thus, the semantic sense is lost, and it requires extensive pruning [4, 6, 8, 10, 11, 19] to remove redundant or incorrect features. Another approach is to specify a seed list of features which is subsequently expanded to generate a more extensive feature set. For example, in [1], the authors have generated an ontological features set for movies by choosing a seed set of features and expanding it using the conjunction rule [1, 30]. The seed-set expansion approach has also been used for feature-extraction from product reviews [14]. However, this approach requires some prior domain knowledge in order to specify the initial seed-set of features.
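The loss of word order under the itemset view described above can be illustrated with a small sketch. This is not the Apriori algorithm itself (there is no level-wise candidate generation or support pruning); it only contrasts counting unordered word sets with counting ordered adjacent word sequences. The token lists and function names are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Illustrative token lists (not taken from the review dataset).
SENTENCES = [
    ["front", "camera", "quality"],
    ["camera", "front", "cover"],
]


def itemset_counts(sentences, size=2):
    """Count unordered word sets per sentence (transaction-style view)."""
    counts = Counter()
    for tokens in sentences:
        for items in combinations(sorted(set(tokens)), size):
            counts[frozenset(items)] += 1
    return counts


def ordered_ngram_counts(sentences, size=2):
    """Count adjacent word sequences, preserving their order."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - size + 1):
            counts[tuple(tokens[i:i + size])] += 1
    return counts


itemsets = itemset_counts(SENTENCES)
ngrams = ordered_ngram_counts(SENTENCES)
```

In the itemset view, "front camera" and "camera front" fall into the same set and are counted together, whereas the ordered view keeps them apart.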

Various methods exist in the literature to associate features with their corresponding descriptors. Hu and Liu proposed the nearby-adjective heuristic [4, 19]. Although this method is simple and fast, it may result in inaccuracies. So, supervised approaches to determine association have been proposed in recent years such as syntactic dependency parsing [7] and syntactic tree templates [31].

Once the feature set is finalized, sentiment analysis can be performed on the users’ reviews. The orientation of the adjectives describing the elements of the extracted feature set is useful in performing the sentiment analysis. Earlier attempts at determining the semantic orientation of adjectives relied upon the use of supervised learning, which involved frequency analysis and clustering on a large manually tagged corpus [27]. In [28], the authors used the PMI statistic (pointwise mutual information), which predicted the orientation of an adjective based on its co-occurrence with the words “excellent” or “poor.” However, since the adjective descriptors used for different products differ widely, it is not possible to achieve uniform accuracy using this technique. An alternative approach to determine the polarity of opinion words involves using an initial list of adjectives with known orientations [4, 19], which is subsequently expanded by looking up its synonyms and antonyms using the lexical resource WordNet [7, 32]. More recently, researchers have used the opinion mining tool SentiWordNet [1, 5, 33] to assist in the task of determining orientation of opinions in sentiment mining. In this paper, we have also used the SentiWordNet tool [1, 5, 33] for classifying the overall orientation of user reviews, with satisfactory results.
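For reference, the PMI-based orientation score mentioned above is commonly written as follows. This is the standard textbook formulation, stated here under the assumption that it matches the variant used in [28]; the symbols $w$, $p(\cdot)$, and $\mathrm{SO}$ are notation introduced for this illustration.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}
\qquad
\mathrm{SO}(w) = \mathrm{PMI}(w, \text{``excellent''}) - \mathrm{PMI}(w, \text{``poor''})
```

A positive value of $\mathrm{SO}(w)$ suggests that $w$ co-occurs more strongly with “excellent” than with “poor,” and a negative value suggests the opposite.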

3. Proposed Opinion Summarizer and Classifier

In this section we explain the system design of the opinion summarizer and classifier that we implemented.

We generated an opinion review database by crawling some popular websites that post categorized product reviews by actual users. As shown in Figure 1, our product opinion summarizer has three main phases: (1) the preprocessing phase, (2) the feature extraction phase, and (3) the opinion summarization and classification phase. These phases are briefly described next.

3.1. Preprocessing Phase

Online blog reviews posted by users frequently contain spelling errors and incorrect punctuation. Our next phase, the feature extraction phase, requires parts-of-speech tagging, which works at the sentence level. Thus, it becomes important to detect the ends of sentences. So, in this phase we performed basic cleaning tasks like sentence boundary detection and spell-error correction. Sentences normally end with punctuation marks such as a period (.), question mark (?), or exclamation mark (!). Sometimes bloggers overuse the “?” and “!” symbols for emphasis. For example, a blogger may post a review that says

“It’s surprising that the ebook reader does not have a touch screen !!!!”

In such cases we conflate the repetitive punctuation symbols to a single occurrence (i.e., “!!!!” is replaced by a single “!”).
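The conflation step can be sketched with a regular expression. This is a minimal illustration, not the implementation used in the paper; the function name `conflate_punctuation` is hypothetical, and the sketch only collapses runs of the same symbol (a mixed run such as "?!?!" is left unchanged).

```python
import re


def conflate_punctuation(text: str) -> str:
    """Collapse a run of repeated '!' or '?' into a single occurrence."""
    # ([!?]) captures one mark; \1+ matches one or more repeats of that same mark.
    return re.sub(r"([!?])\1+", r"\1", text)


cleaned = conflate_punctuation(
    "It's surprising that the ebook reader does not have a touch screen !!!!"
)
```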

Several other considerations arise during the preprocessing phase. The period (.) must be disambiguated, as it may denote a full stop, a decimal point, or an abbreviation (e.g., “Dr.,” “Ltd.”). Sometimes a single sentence straddles multiple lines because the user pressed unnecessary return keys. In such cases we apply the sentence merge rules proposed by Dey and Haque [14]. After sentence boundary detection, we perform spell-error correction using a word processor.
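The period disambiguation described above might look roughly like the following sketch. It rests on several assumptions: the abbreviation list is a small illustrative sample, a period between two digits is treated as a decimal point, and joining lines with a space is only a simplified stand-in for the merge rules of [14], which are not reproduced here. The name `split_sentences` is hypothetical.

```python
import re

# Illustrative abbreviation list; a real system would need a larger one.
ABBREVIATIONS = {"dr.", "ltd.", "mr.", "mrs.", "e.g.", "i.e."}


def split_sentences(text):
    """Split text into sentences, skipping decimal points and abbreviations."""
    # Simplified stand-in for sentence merge rules: join broken lines.
    text = re.sub(r"\s*\n\s*", " ", text)
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]", text):
        pos, end = match.start(), match.end()
        if match.group() == ".":
            # A period with a digit on both sides is a decimal point.
            if pos > 0 and end < len(text) and text[pos - 1].isdigit() and text[end].isdigit():
                continue
            # A period that closes a known abbreviation is not a boundary.
            if text[start:end].split()[-1].lower() in ABBREVIATIONS:
                continue
        # Require whitespace or end of text after the mark.
        if end < len(text) and not text[end].isspace():
            continue
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


result = split_sentences("Dr. Smith paid 9.99 for it. The screen\nis great!")
```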

3.2. Feature Extraction Phase

In this phase we extract opinion features from the pre-processed review text obtained from the previous phase. We treat frequently occurring nouns (N) and noun phrases (NP) as possible opinion features and associated adjectives describing them as indicators of their opinion orientation.

We perform parts-of-speech (POS) tagging on the review sentences using the Link Grammar Parser [34]. The Link Grammar Parser is a well-known and efficient syntactic parser for the English language (http://www.abisource.com/projects/link-grammar/). First, we extract all nouns (N) and noun phrases (NP) tagged by the Link Grammar Parser and identify the frequently occurring N and NP as possible opinion features. By frequently occurring N and NP we mean those that occur at least five times in the users’ reviews. We do not extract frequent itemsets from the review sentence database using the Apriori-based approach [4, 6, 8, 10, 11, 19], since this method mines frequent features using a bag-of-words (BOW) approach and does not take into account the order in which the words of a phrase occur. Moreover, mining in this way would require ordering in addition to compactness and redundancy pruning [4, 8, 19]. We also do not use the seed-set expansion approach, as it would require prior domain knowledge to specify a seed set [1, 14]. Instead we generate a frequent feature set using the multiword approach [15–18].

A multiword is an ordered sequence of words that has a higher semantic significance than the individual words comprising it. For example, “face time camera,” “retina display,” “wireless connectivity,” and “quad core graphics” are some of the multiwords extracted from user reviews of Tablet computers. Frequently occurring single words are also added to the feature set when they are not already subsets of existing multiwords. Stemming is performed on noun features expressed in the plural to convert them to the singular form in order to improve their chances of matching [8]. As an example of stemming, the feature word “processors” is stemmed to “processor.”
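The frequency filter and plural stemming described above can be sketched as follows. This is a simplified illustration under stated assumptions: candidate nouns and noun phrases are assumed to have already been extracted by the parser and are passed in as plain strings, the singularization rule is deliberately naive (it would mishandle words such as "graphics"), and the names `frequent_features` and `singularize` are hypothetical. Only the threshold of five occurrences comes from the text.

```python
from collections import Counter

MIN_SUPPORT = 5  # "at least five times", as stated in the text


def singularize(word):
    """Naive plural stripping; an assumption, not the stemmer used in the paper."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def frequent_features(candidate_phrases, min_support=MIN_SUPPORT):
    """Count normalized candidate phrases and keep the frequent ones."""
    counts = Counter()
    for phrase in candidate_phrases:
        words = phrase.lower().split()
        if not words:
            continue
        # Singularize only the last word, assumed to be the head noun.
        words[-1] = singularize(words[-1])
        counts[" ".join(words)] += 1
    return {phrase: n for phrase, n in counts.items() if n >= min_support}


# Illustrative candidates (not taken from the review dataset).
candidates = (
    ["processors"] * 3 + ["processor"] * 2
    + ["retina display"] * 5 + ["battery"] * 4
)
features = frequent_features(candidates)
```

Because word order inside each phrase is preserved, "retina display" is counted as a single ordered unit rather than as two independent items.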

Along with each feature we also store the list of adjectives describing them and any opinion modifiers if present (such as “not”) preceding the identified descriptor in the review sentence. For example, consider the following parsed review sentence about a product (Tablet):

“The processor[.n] is[.v] significantly faster[.a], and the text[.n] is[.v] clear[.a].”

In the previous sentence, “.n” indicates noun, “.v” indicates verb, and “.a” indicates adjective. In this sentence, the nouns “processor” and “text” are opinion features while “faster” and “clear” are, respectively, the adjectives describing them.
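Pairing each feature with its descriptor can be sketched as below. The tag format (`[.n]`, `[.v]`, `[.a]`) follows the parsed sentence shown above; everything else is an assumption made for illustration: each noun is paired with the next adjective that follows it, in the spirit of a nearby-adjective heuristic, and the word "not" appearing in between is recorded as a modifier. The function name and the tuple layout `(feature, adjective, negated)` are hypothetical.

```python
import re

# A word optionally followed by a tag such as [.n], [.v], or [.a].
TOKEN = re.compile(r"(\w+)(?:\[\.(\w)\])?")


def feature_descriptor_pairs(tagged_sentence):
    """Return (feature, adjective, negated) triples from a tagged sentence."""
    pairs = []
    current_noun, negated = None, False
    for word, tag in TOKEN.findall(tagged_sentence):
        word = word.lower()
        if tag == "n":
            current_noun, negated = word, False
        elif word == "not":
            negated = True
        elif tag == "a" and current_noun is not None:
            pairs.append((current_noun, word, negated))
            current_noun, negated = None, False
    return pairs


pairs = feature_descriptor_pairs(
    "The processor[.n] is[.v] significantly faster[.a], "
    "and the text[.n] is[.v] clear[.a]."
)
```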

While extracting multiword opinion features, it is possible that one multiword is a substring of another. For example, suppose that both the multiwords “Nexus 7 front camera” and “front camera” have been extracted as frequent features from our review database of Tablets. In such cases, we adopt the decomposition strategy [16], which favors a shorter feature over a longer one. Thus, decomposition pruning keeps the more generic “front camera” as an opinion feature and discards the longer multiword. This is done for two reasons. The first is that we want our opinion features to be as generic as possible over a product range. The second is based on our observation that bloggers who post reviews online are not experts and prefer to describe products using shorter multiword features rather than longer ones.
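A minimal sketch of this pruning step follows, assuming that "substring" is interpreted at the word level (a shorter multiword must appear as a contiguous word sequence inside the longer one) and that the function is applied to multiwords only. The name `decomposition_prune` is hypothetical, and the third example phrase is included only to show an unaffected entry.

```python
def decomposition_prune(multiwords):
    """Drop any multiword that contains a shorter multiword as a word sequence."""
    tokens = {mw: mw.lower().split() for mw in multiwords}

    def contains(longer, shorter):
        n, m = len(longer), len(shorter)
        return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

    kept = []
    for mw, words in tokens.items():
        has_shorter = any(
            contains(words, other_words)
            for other, other_words in tokens.items()
            if other != mw
        )
        if not has_shorter:
            kept.append(mw)
    return kept


kept = decomposition_prune(["Nexus 7 front camera", "front camera", "retina display"])
```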

Since not all the frequent N and NP extracted using the strategy described above are actual semantic features, the frequent feature set mined in this way is scrutinized by a human expert, and non-features are pruned. (Note, however, that the pruning overhead is much lower than for the Apriori-based approach, as shown empirically in Section 4.) Moreover, semantically similar features need to be grouped together manually in order to improve the accuracy of feature-based summarization. For example, as shown in Table 1, the features “screen/display/touchscreen,” although extracted separately, actually allude to the same feature. Thus, our final feature set is generated using a semisupervised approach.

In order to allow for the large language vocabulary of bloggers, we also enhance our final feature set by looking up and adding synonyms of the extracted features using the online lexical resource WordNet 3.1 (http://wordnet.princeton.edu/).

3.3. Opinion Summarization and Classification Phase

In the previous phase we extracted opinion features, adjectives describing them, and any modifiers if present. We also generate a statistical feature-wise summary for each product which enables comparison of different brands selling similar products. In order to determine the sentiment polarity of an adjective describing an opinion feature we make use of SentiWordNet [1, 5, 33] which is a lexical resource for opinion mining. SentiWordNet assigns three normalized sentiment scores: positivity, objectivity, and negativity to each synset of WordNet [7, 32]. Let us revisit the review sentence:

“The processor[.n] is[.v] significantly faster[.a], and the text[.n] is[.v] clear[.a].”

In this example, the SentiWordNet scores assigned to the appropriate sense of the adjective “clear” are (P: 0.625; O: 0.375; N: 0). Since the positive score is the highest, the adjective “clear” can be assigned a positive polarity. In this way, we generate a feature-orientation table (FO table) that records the opinion features and their corresponding descriptors of positive and negative polarities. Table 1 shows the FO table entries for some of the features of the product “Tablet.” The FO table thus generated enables us to produce a feature-wise summary of a product or comparative summaries of different brands of similar products. For example, Figure 2 compares two different models of Tablet computers based on the feature-wise summary generated from several online user reviews.
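The construction of the FO table can be sketched as follows. This is an illustration under several assumptions: the scores are supplied through a plain dictionary rather than through any real SentiWordNet API, only the entry for "clear" comes from the text (the entry for "dull" is a made-up placeholder), an adjective whose objective score dominates is skipped, and a preceding "not" is assumed to flip the polarity. All names are hypothetical.

```python
from collections import defaultdict

# (positive, objective, negative) scores keyed by adjective.
SCORES = {
    "clear": (0.625, 0.375, 0.0),  # values quoted in the text
    "dull": (0.0, 0.25, 0.75),     # made-up placeholder for illustration
}


def adjective_polarity(adjective, scores=SCORES):
    """Return 'positive' or 'negative' when that score is the highest, else None."""
    if adjective not in scores:
        return None
    pos, obj, neg = scores[adjective]
    if pos > neg and pos >= obj:
        return "positive"
    if neg > pos and neg >= obj:
        return "negative"
    return None


def build_fo_table(pairs, scores=SCORES):
    """Build {feature: {'positive': set, 'negative': set}} from descriptor triples."""
    table = defaultdict(lambda: {"positive": set(), "negative": set()})
    for feature, adjective, negated in pairs:
        label = adjective_polarity(adjective, scores)
        if label is None:
            continue
        if negated:
            # Assumption: a preceding "not" reverses the orientation.
            label = "negative" if label == "positive" else "positive"
            adjective = f"not {adjective}"
        table[feature][label].add(adjective)
    return dict(table)


fo_table = build_fo_table([
    ("text", "clear", False),
    ("screen", "dull", False),
    ("text", "clear", True),
])
```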

Intuitively, a user’s opinion that contains more features with positive polarity than features with negative polarity should be classified as positive. Likewise, a larger number of negative polarity features should cause a user opinion to be classified as negative. In order to achieve this, we simply calculate the normalized positive bias (Pos_Bias) as explained next. Note that a user opinion contains one or more sentences opining on one or more features. Let $N$ be the total number of features explicitly mentioned in a user opinion. Let $N_{pos}$ be the number of features tagged positive using the FO table, and let $N_{neg}$ be the number of features tagged as negative. Now, we calculate the term Pos_Bias, which indicates the normalized positive bias of the opinion, as follows:

$$\text{Pos\_Bias} = \frac{N_{pos} - N_{neg}}{N}$$

The value of Pos_Bias falls in the range [−1, 1]. If the value of Pos_Bias is positive, the opinion is classified as positive. If its value is negative, the opinion is classified as negative, and if its value is zero the opinion is classified as neutral. Although we have performed a simplistic classification at this stage, the range of the Pos_Bias variable naturally lends itself to classification using different granularities and can be used for fuzzy classification.
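The classification rule above can be sketched as follows, assuming that Pos_Bias is the difference between the positive and negative feature counts divided by the total number of features mentioned, which is consistent with the stated range [−1, 1]. The function names are hypothetical.

```python
def pos_bias(num_positive, num_negative, num_features):
    """Normalized positive bias, assumed to be (positive - negative) / total."""
    if num_features <= 0:
        raise ValueError("an opinion must mention at least one feature")
    return (num_positive - num_negative) / num_features


def classify_opinion(num_positive, num_negative, num_features):
    """Map the sign of Pos_Bias to a polarity label."""
    bias = pos_bias(num_positive, num_negative, num_features)
    if bias > 0:
        return "positive"
    if bias < 0:
        return "negative"
    return "neutral"


label = classify_opinion(num_positive=3, num_negative=1, num_features=4)
```

Because the score is continuous, finer-grained labels could be obtained by replacing the sign test with thresholds on the value of Pos_Bias.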

4. Empirical Evaluation and Results

We collected over 1400 online reviews for four Tablets and three E-book Readers (approximately 200 reviews for each product) of leading brands from several popular review websites using a web crawler. The list of products is given in Table 2. We applied preprocessing steps such as sentence boundary detection, spell-error correction, and repetitive punctuation conflation to the review dataset, as explained earlier in the preprocessing phase.

For each product category, we used 70% of the review sentences for training and 30% for testing. We obtained the feature set as explained in the feature extraction phase, determined the polarity (+/−) of the opinion words using SentiWordNet 3.0 (http://sentiwordnet.isti.cnr.it/), and generated the feature orientation table. The FO table entries were then used to identify opinionated sentences from user reviews and finally to classify their polarity as positive or negative as explained earlier.

We evaluated the effectiveness of the opinion mining strategy proposed in Section 3 at two tasks: (1) automatically identifying opinionated sentences based on extracted features and (2) classifying the polarity of the users’ opinions.

In order to demonstrate the effectiveness of our multiword-based feature extraction approach, we have compared it with two other popular approaches: (1) the Apriori-based approach [4, 6, 8, 10, 11, 19] for initial feature set extraction and (2) the seed-set expansion approach [1, 14]. Since the seed-set expansion approach requires an initial list of features, we used an input of 10 manually selected and verified features for both product categories and then expanded it using the Word Expansion Algorithm as in [14].

Figure 3 compares the three approaches on the basis of the initial number of features extracted as compared to the actual number of usable semantic features as verified by a human expert. In Figure 3, Strategy 1 indicates the initial feature set retrieved using the Apriori based approach (before pruning), while Strategy 2 indicates the seed-set expansion approach. Strategy 3 indicates the feature-set extraction approach used by us as explained in the feature extraction phase of Section 3 (i.e., frequent single words + multiwords with decomposition pruning).

It becomes clear from Figure 3 that Strategy 3 performs better than both Strategy 1 and Strategy 2 since its initial feature-set is closer to the ideal feature set.

The main drawback of Strategy 1 (Apriori-based approach) is that it treats individual words as tokens following a bag-of-words approach and does not consider their order of appearance while mining frequent itemsets. So, its output contains several redundant features and requires extensive pruning. The problem with Strategy 2 (seed-set expansion approach) is that it failed to extract sufficient features from the review dataset in spite of being given an initial seed set of relevant features. In contrast, Strategy 3 (multiword-based approach) takes the order of words into account from the beginning, so its features carry more semantic sense and less redundancy. This method also does not require any prior domain knowledge. In these respects it compares favorably with both of the other strategies, which is also supported by the empirical results shown in Table 3.

Table 3 compares the accuracy of the three feature extraction techniques when used for the task of automatically identifying opinionated sentences from the test set of user reviews. In order to perform the testing, opinionated sentences in the test set were first manually extracted, and the polarity of the corresponding opinion features was tagged by a human expert. Then, the same test set was subjected to automatic opinion mining using feature sets derived using the three strategies discussed previously. The comparative accuracy of opinionated sentence identification was recorded in terms of Precision, Recall, and F-Measure, as shown in Table 3.

The results in Table 3 show that our strategy outperforms the other two strategies. While the seed-set based approach achieved high precision (89%), its recall is quite poor (65%). The primary reason for this is that the seed-set approach failed to identify several correct features from the review dataset. The Apriori approach (with the feature set pruned using compactness and redundancy pruning [8, 19]) achieved a precision of 87% and a recall of 75%. However, the multiword-based technique performed best, with 91% precision and 78% recall. The recall rates are low mainly because implicitly expressed opinions could not be automatically identified by looking up the FO table.

For example, consider the following sentence referring to a Tablet:

“Since I bought it, I have not had to use my computer in weeks.”

A human expert would tag the aforementioned review sentence as a positive opinion, but there is no mention of any specific feature or adjective descriptor in the sentence, so it could not be automatically identified as a positive review using the FO table. Thus, lower recall was mainly due to false negatives in the opinion feature identification phase.
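The precision and recall figures quoted above for the three strategies can be combined into an F-measure using the standard harmonic mean. The sketch below assumes the balanced F1 form; the values it produces are derived only from the percentages given in the text and may differ slightly from those reported in Table 3 because of rounding. The dictionary keys and function name are labels chosen for illustration.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the text.
REPORTED = {
    "apriori": (0.87, 0.75),
    "seed_set_expansion": (0.89, 0.65),
    "multiword": (0.91, 0.78),
}

f_scores = {name: round(f_measure(p, r), 2) for name, (p, r) in REPORTED.items()}
```

Under this formula the low recall of the seed-set expansion approach pulls its F-measure well below its precision, which is why precision alone would overstate its performance.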

The accuracy of the opinion polarity classification task is directly affected by the accuracy of opinion-bearing sentence identification, because only the sentences identified as opinionated could be used for classification purposes. Nevertheless, we achieved a high average accuracy of 86% when classifying the polarity of opinions over the test set, as shown in Table 4. Thus, the empirical results of the opinion mining technique are encouraging.

At present we have only classified opinions as positive or negative based on the Pos_Bias term as explained in the previous section. However, in the future we would like to take into account accelerators and decelerators [14] that strengthen or weaken the effect of the adjectives determining opinion orientation and use them for fuzzy polarity classification. For example, consider the following reviews.

(1) “The wireless connectivity is extremely good.”
(2) “The wireless connectivity is good.”

In both the previous examples, the multiword “wireless connectivity” is a feature while “good” is the adjective describing it. Although both sentences have a positive orientation, the word “extremely” in the first sentence acts as an accelerator which communicates the positive sense more strongly. Incorporating the effect of such linguistic hedges [35, 36] can improve the result of opinion mining.

5. Conclusion and Future Work

Classifying and summarizing opinions of bloggers has several interesting and commercially significant applications. However, this task is much more difficult than classifying regular text and requires intensive preprocessing. The success of the opinion mining task depends mainly on the efficiency and sophistication of the preprocessing and feature extraction steps. We demonstrated empirically that the proposed approach for product feature set extraction, that is, using frequent multiwords with the decomposition strategy, outperforms other contemporary approaches like the Apriori-based approach and the seed-set expansion approach.

Empirical results indicate that the multistep feature-based semisupervised opinion mining approach used in this paper can successfully identify opinionated sentences from unstructured user reviews and classify their orientation with acceptable accuracy. This enables reliable review opinion summarization which has several commercially important applications.

In the future, we want to perform opinion mining on larger and more varied blog datasets. We would also like to extend our work to fuzzy opinion classification with support for fuzzy user querying. We intend to do this by learning the strength of various adjective descriptors along with corresponding linguistic hedges and including them in the feature-orientation table generated during the mining process. The classification technique proposed in the paper can be naturally extended to support fuzzy classification.