Abstract

Reducing the cost of competition in the industry, identifying effective customers, and understanding the emotional needs and preferences of customers, so that commercial marketing can be carried out quickly and accurately, is an important research topic. In this paper, we analyze three product datasets containing the customer-supplied ratings and reviews for microwave ovens, baby pacifiers, and hair dryers sold in the Amazon marketplace. Sentiment analysis, linear regression analysis, and descriptive statistics are applied to the three datasets. Based on the sentiment analysis given by the naive Bayesian classification algorithm, we found that the star rating is positively correlated with the reviews, while the helpfulness ratings have no specific relationship with the star rating and reviews. We use multiple regression analysis and clustering analysis to obtain the relationship between 4 indexes: time, star rating, reviews, and helpfulness rating. We find that there is a positive correlation between the 4 indexes and that the reputation of the products in the online market improves over time. Based on the analysis of the positive reviews and star ratings, we suggest using the positive reviews as an indicator of a potentially successful or failing product. We also discuss the relation between the star ratings and the number of reviews. Finally, we selected words from the Amazon sentiment dictionary as candidate words. By counting the candidate words’ appearances in the reviews, we found the keywords that reflect the star rating.

1. Introduction

1.1. Background

With the vigorous development of e-commerce in recent years, the Internet traffic dividend [1] has ceased to exist, and the cost that merchants pay to earn a profit has become higher and higher. Reducing the cost of competition in the industry, identifying effective customers, and understanding the emotional needs and preferences of customers, so that commercial marketing can be carried out quickly and accurately, is an important research topic.

The data such as customers’ online comments, ratings, and sentiment are important sources of information on this issue. For example, Amazon provides customers with an opportunity to rate and review purchases. Individual ratings—called “star ratings”—allow purchasers to express their level of satisfaction with a product using a scale of 1 (low rated, low satisfaction) to 5 (highly rated, high satisfaction). Additionally, customers can submit text-based messages—called “reviews”—that express further opinions and information about the product. Other customers can submit ratings on these reviews as being helpful or not—called a “helpfulness rating”—towards assisting their own product purchasing decision. Companies use these data to gain insights into the markets in which they participate, the timing of that participation, and the potential success of product design feature choices.

The e-commerce marketing department can use data analysis to work out marketing strategies that maximize profits, thereby saving marketing costs for enterprises. In this paper, we analyze three product datasets containing the customer-supplied ratings and reviews for microwave ovens, baby pacifiers, and hair dryers sold in the Amazon marketplace from March 02, 2003, to August 31, 2015. The three products are commonly used household products whose prices fall into three different price ranges, which makes the analysis more comprehensive and convincing. Our datasets come from the open-source Amazon Review Dataset [2] and contain 32022 records in total. Each record includes the type and name of the product as well as the review text, the star_rating, and the votes.

1.2. Our Works

Our task is to identify key patterns, relationships, measures, and parameters in past customer-supplied ratings and reviews of competing products, so as to inform an e-commerce company’s online sales strategy and identify potentially important design features that would enhance product desirability. We mainly focus on the following tasks.

First, sentiment analysis is performed. This is a crucial first step of our analysis. Online reviews and comments are important information resources and contain a lot of intuitive data. Both are text data, which account for over 50% of business data. Text-based sentiment analysis relies mainly on two technologies: machine learning (naive Bayes [3–6], SVM [7–10], and ME [11, 12]) and emotion analysis based on an emotional lexicon (dictionary-based methods [13–15] and corpus-based methods [16, 17]). The machine learning-based method is more accurate than the emotional lexicon-based method and has been widely used in sentiment analysis. Deep learning is now also applied in sentiment analysis; however, it needs a huge amount of data for model training and is not suitable for the data analysis of a small e-commerce company. In this paper, the naive Bayes classifier is applied to extract the sentiment orientation (positive or negative) from the reviews. We quantified the sentiment orientation into 11 levels ranging from −5 to 5. A smaller level means more negative sentiment, and a larger level means more positive sentiment, so moving from the smallest to the largest level traces the change of emotion from negative to positive.

Based on the linear regression analysis of the three datasets, we found that there is a significant linear relationship between the reviews and the star rating. As the star rating rises, the reviews of the products become more positive. Then, we built a multiple linear regression [18] model on the three datasets and found that there is a significant linear relationship between the three datasets.

We implement descriptive statistics [19] to analyze the relationship between the three datasets and their variation trend and then normalize these indexes. Based on the sentiment analysis given by the naive Bayesian classification algorithm, we found that the star rating is positively correlated with the reviews, while the helpfulness ratings have no specific relationship with the star rating and reviews. Moreover, there is no obvious boundary between the reviews, so we regard star rating as the most valuable index.

We use multiple regression analysis and clustering analysis to obtain the relationship between 4 indexes: time, star rating, reviews, and helpfulness rating. Based on the given data, we use the random sample consensus (RANSAC) algorithm to randomly select the 4 indexes of each year and then calculate the coefficients of the multiple regression function with SPSS to obtain the relationships between time and star rating, reviews, and helpfulness rating. We find that there is a positive correlation between time and star rating, reviews, and helpfulness rating, and that the reputation of the products in the online market improves over time.

Based on the analysis of the positive reviews and star ratings, we suggest using the positive reviews as an indicator of a potentially successful or failing product. We also discuss the relation between the star ratings and the number of reviews.

Finally, we selected the words from the Amazon sentiment dictionary [20] as candidate words. By counting the candidate words’ appearance in the review, the keywords that can reflect the star rating were found.

2. Sentiment Analysis

2.1. Data Preprocessing

The sentiment analysis of the reviews is a crucial step of our analysis. To this end, the naive Bayes classifier is used in this section. However, before we start, the reviews need to be preprocessed because they often contain a large number of nonalphabetic characters (e.g., %, $, #, and @) which cause errors in word segmentation. Therefore, we remove the nonalphabetic characters from the reviews in the datasets and then perform word segmentation.
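The preprocessing described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `preprocess` is hypothetical, lowercasing is added for convenience, and plain whitespace splitting stands in for the word segmentation step, whose exact tool is not specified here.

```python
import re

def preprocess(review):
    """Remove nonalphabetic characters, lowercase, and split a review into words."""
    # Replace every character that is not a letter or whitespace with a space.
    letters_only = re.sub(r"[^A-Za-z\s]", " ", review)
    # Whitespace splitting is an assumed stand-in for word segmentation.
    return letters_only.lower().split()

tokens = preprocess("Works great!!! 100% worth the $$$ #happy @home")
print(tokens)
```

Note that this simple rule also splits contractions such as "don't" into two fragments, so a real pipeline may need a more careful tokenizer.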

2.2. Naive Bayes Classifier-Based Sentiment Analysis

The naive Bayes classifier is based on the bag-of-words model [21]. With the bag-of-words model, we check whether each word of the review appears in a positive word set or a negative word set. If the word appears in the positive word set, the total score of the review is increased by 1; if it appears in the negative word set, the score is decreased by 1. If the final score is positive, the review is classified as positive, and if it is negative, the review is classified as negative. The probability that a review belongs to a class is given, up to a normalizing constant, by the class probability multiplied by the product of the conditional probabilities of each word for that class:

$$P(c \mid d) \propto P(c) \prod_{i=1}^{n} \frac{f_{w_i, c}}{N_c}, \tag{1}$$

where $c$ is the class label, $f_{w_i, c}$ is the number of occurrences of word $w_i$ in class $c$, $N_c$ is the total number of words in class $c$, and $n$ is the number of words in the review $d$ we are currently classifying.

In our tasks, we classified the sentiments into 11 levels labeled by −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, and 5. A smaller label denotes more negative sentiment, while bigger one denotes more positive. Thus, the model can describe the sentiments more accurately.
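To make the classification rule concrete, the following is a minimal sketch of a multinomial naive Bayes classifier over the 11 sentiment labels. It is not the implementation used in this paper: the function names and the four toy training reviews are invented for illustration, log probabilities are used to avoid numerical underflow, and add-one smoothing is an added assumption so that a word unseen in a class does not force the probability to zero.

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_reviews):
    """labelled_reviews: iterable of (tokens, label) pairs."""
    class_docs = Counter()               # number of training reviews per class
    word_counts = defaultdict(Counter)   # word_counts[c][w]: occurrences of w in class c
    vocab = set()
    for tokens, label in labelled_reviews:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def log_posteriors(tokens, model):
    """Unnormalized log P(c | review) for every class c."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for c in class_docs:
        n_c = sum(word_counts[c].values())             # total words in class c
        score = math.log(class_docs[c] / total_docs)   # log of the class probability
        for w in tokens:
            # Add-one smoothing is an assumption of this sketch.
            score += math.log((word_counts[c][w] + 1) / (n_c + len(vocab)))
        scores[c] = score
    return scores

def classify(tokens, model):
    scores = log_posteriors(tokens, model)
    return max(scores, key=scores.get)

# Invented toy training set with labels drawn from the -5..5 scale.
toy = [
    (["love", "great", "easy"], 5),
    (["great", "works"], 3),
    (["broke", "bad", "return"], -4),
    (["bad", "waste"], -5),
]
model = train_nb(toy)
print(classify(["great", "easy"], model))
```

Exponentiating and normalizing the scores returned by `log_posteriors` would give one probability per class, in the spirit of the per-class probabilities reported for each review.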

2.3. Model Establishment

Step 1: preprocess the data as described in Section 2.1
Step 2: segment the words and build the training set based on the dictionary called Amazon Product Review Data, which is downloaded from the website “https://github.com/uiuc-cs498/amazon-product-sentiment-analysis”
Step 3: train the NB classifier
Step 4: analyze the reviews with the NB classifier

2.4. Experiments and Discussion

Table 1 lists some sentiment analysis of the reviews (row number ranges from 4748 to 4767) in data file hair_dryer.tsv.

The first column in Table 1 is the row number of the test reviews; the second column shows the star rating of the reviews. The 11 columns on the right of Table 1 show the probabilities of each class. Generally, the analysis agrees acceptably with the star ratings. However, the result for some reviews is quite different from the star rating, e.g., row numbers 4760 and 4765. The main reason is that the positive words in these reviews are not contained in the dictionary. The dictionary is limited in size and cannot cover all English words, so we must filter out the words in a review that do not belong to the dictionary, and the filtered words may have an important effect on the sentiment analysis. The second reason is that some customers’ reviews have little correlation with their star ratings: they may give a positive review together with a low star rating, or vice versa. Overall, however, the star rating increases with the class label.

3. Task I

3.1. The Foundation of Model

To investigate the relationship between two or more variables, the regression model [22–24] is a powerful tool. Many forms of regression models can be used, for example, linear regression, multiple linear regression, ridge regression, elastic net regression, and multinomial logit regression.

In this paper, we filter out some unrepresentative and invalid data, for example, the records without reviews. Then, the unitary and multiple linear regression models are established.

The general form of the multiple linear regression model is written as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon. \tag{2}$$

In formula (2), $\beta_0, \beta_1, \ldots, \beta_k$ are coefficients to be determined by regression, $y$ is the response variable, $x_1, x_2, \ldots, x_k$ are explanatory variables that can be measured or controlled, called independent variables, and $\varepsilon$ is the model’s error term (also known as the residual). The multiple linear regression model is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. The goal of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable.
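As an illustration of how the coefficients in formula (2) can be determined, the sketch below solves the least-squares normal equations in plain Python. It is a sketch under stated assumptions rather than the procedure used in this paper: the function name `fit_ols` and the toy data are invented, and the data are generated exactly from y = 1 + 2·x1 + 3·x2 so that the recovered coefficients can be checked by eye.

```python
def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns [b0, ..., bk]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Invented toy data: two explanatory variables, response built as 1 + 2*x1 + 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coefs = fit_ols(X, y)
print([round(b, 6) for b in coefs])
```

The sketch assumes the explanatory variables are not perfectly collinear; otherwise the normal equations are singular and the division by the pivot fails.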

3.2. Model Establishment

Step 1: implement unitary linear regression analysis on each pair of the three datasets. As a result, only the pair of star ratings and reviews exhibits a significant linear relationship.
Step 2: implement multiple linear regression analysis of star ratings, helpfulness ratings, and reviews.

3.3. Experiments and Discussion

Tables 2 and 3 show the results of unitary and multiple linear regression. In the multiple linear regression analysis, microwave, hair dryer, and pacifier all pass the significance test because the p-values are all less than 0.05. This shows that when star ratings and reviews increase, the helpfulness ratings also increase. The multiple R-squared values are 0.1458 for microwave, 0.1133 for hair_dryer, and 0.03709 for pacifier, and the corresponding residual standard errors are 1.521, 1.225, and 1.559. The multiple R-squared values of the three datasets are all less than 0.8, which means the multicollinearity between the variables is weak.

R-squared is the coefficient of determination, i.e., the percentage of the variation in the dependent variable that can be explained by the fitted model. In Tables 2 and 3, the value of R-squared is very low. This means that the predicted points are quite different from the actual points and that the fit is poor. Because R-squared is low, the residuals are large. There are three reasons. The first is the influence of other independent variables: if some variables have been shown to be related to the independent variables in this paper, they should be introduced as control variables, even if they have nothing to do with the research hypothesis. The second is the influence of systematic error, which in fact also acts as an independent variable with a specific meaning. The third is the influence of random error. Therefore, the size of R-squared mainly affects the accuracy of the model rather than its correctness.
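For reference, the two fit measures discussed here can be computed as in the sketch below. The function names and the toy values are invented for illustration; the residual standard error uses the usual degrees of freedom n − k − 1 for a model with k explanatory variables and an intercept.

```python
import math

def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))   # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y)             # total sum of squares
    return 1.0 - ss_res / ss_tot

def residual_standard_error(y, y_hat, n_predictors):
    """Square root of the residual sum of squares over n - k - 1."""
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    dof = len(y) - n_predictors - 1
    return math.sqrt(ss_res / dof)

# Invented toy values: observed responses and the values fitted by some model.
y_obs = [1, 2, 3, 4, 5]
y_fit = [1.5, 2, 3, 4, 4.5]
print(r_squared(y_obs, y_fit))
print(residual_standard_error(y_obs, y_fit, n_predictors=1))
```

A low R-squared together with a small p-value, as reported above, indicates a relationship that is statistically detectable but explains only a small part of the variation.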

In multiple regression analysis, a regression coefficient gives the change in the dependent variable when the corresponding forecast variable increases by one unit while the other forecast variables remain unchanged. Because the data contain some unrepresentative and invalid records, we deleted them during preprocessing. Based on the unitary linear regression analysis on each pair of the three datasets, we found that “star ratings” and “reviews” have a significant linear relationship. This means that the higher the “star_ratings” value is, the more positive the sentiment is. E-commerce companies should pay more attention to products with high star ratings.

4. Task II

4.1. Most Informative Data Measures

We compute descriptive statistics on the three datasets and analyze the relationship between them. We calculated the distribution of star ratings, then analyzed the basic characteristics of these data, such as range, standard deviation, and variance, with Excel and SPSS, and finally obtained the ratio of “helpful votes” to “total votes” (RHT). The descriptive statistics of the data are listed in Table 4.
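The same quantities can also be obtained outside Excel and SPSS. The sketch below uses only the Python standard library; the function names `rht` and `describe` and the sample star ratings are invented for illustration, and returning `None` for a review without votes is our own convention.

```python
import statistics

def rht(helpful_votes, total_votes):
    """Ratio of helpful votes to total votes; None when the review has no votes."""
    return helpful_votes / total_votes if total_votes else None

def describe(values):
    """Range, sample variance, and sample standard deviation of a list of numbers."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }

stars = [5, 4, 5, 1, 3]   # invented star ratings
print(describe(stars))
print(rht(3, 4), rht(0, 0))
```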

In Table 5, the left column is the total votes of each product_ID given by the data file, and the right column is the total number of reviews of each product_ID. Figure 1 gives the useful evaluation number (UER), which is defined as the number of helpful votes in a certain RHT region. For example, in the RHT region (0.8, 1], we obtained 2500 helpful votes. Table 5 and Figure 1 show that most of the products receive no votes. Nearly half of the product reviews have total votes between 0 and 100, and few products have more than 100 votes. This means most customers do not vote. Therefore, the helpfulness data are incomplete and one-sided.
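The UER of each RHT region can be computed by summing helpful votes per interval, as sketched below. The split into five equal half-open regions (0, 0.2], ..., (0.8, 1] is an assumption inferred from the single example region given above, and the function name and vote counts are invented. Integer arithmetic is used for the interval index to avoid floating-point rounding at the interval boundaries.

```python
def uer_by_region(records, n_bins=5):
    """Sum helpful votes per RHT region; records are (helpful_votes, total_votes) pairs."""
    totals = [0] * n_bins
    for helpful, total in records:
        if total == 0 or helpful == 0:
            continue   # RHT is undefined or zero, so it lies outside every region
        # Integer ceiling of helpful * n_bins / total gives the 1-based region number.
        idx = (helpful * n_bins + total - 1) // total - 1
        totals[min(idx, n_bins - 1)] += helpful   # clamp records with helpful > total
    return totals

votes = [(9, 10), (1, 2), (3, 4), (10, 10), (0, 5), (0, 0)]   # invented counts
print(uer_by_region(votes))
```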

Figure 2 shows the average customer sentiment toward each product by year. The sentiments reflect the overall quality of the product, but the boundaries between sentiment levels in the text reviews are not very clear, and the differences between them are very small, which is not conducive to quantitative evaluation.

Figure 3 gives the percentage pie chart of star ratings. The proportion of five-star rating products is 58%, followed by four-star products which account for 18%. The proportion of three-star, two-star, and one-star products is less than 10%. The differences between different star ratings are very clear. It can be seen that consumers have a high evaluation of commodities, and star rating can clearly and quantitatively show the value of commodities.

Every product will have star ratings and commodity evaluation, but only some products have useful commodity evaluation information. Star rating is positively related to the trend of commodity evaluation. However, commodity evaluation is subjective. Sometimes customers make fake comments for some purpose. Amazon uses machine learning models instead of raw data averages to calculate star ratings for products. This means commodity evaluation is not suitable for qualitative analysis, so star rating is the most valuable index. This conclusion is consistent with what we conclude in Section 3.

4.2. Time-Based Measures and Patterns within Each Dataset

We cluster the total sales of commodities (as shown in Figure 4) in each year and get the icicle chart (as shown in Figure 5). In Figure 5, index Num denotes the total sales of each year. According to the hierarchical clustering analysis, we can divide the online sales of products from 2002 to 2015 into four categories: 2015, 2002–2005, 2006–2009, and 2010–2014. We found that the total number of goods purchased in 2002–2005 was very small, which increased year by year in 2006–2014 and decreased in 2015 compared with 2014. Based on the investigation, we found that the rapid popularization of the Internet and the rapid development of online shopping malls in 2006–2014 and a series of macroeconomic and financial policies and reform measures launched by the international community may be the cause for the increase of the purchase volume of commodities in 2006–2014.
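The grouping of years can be illustrated with a small bottom-up (agglomerative) clustering routine. This is a sketch only: the yearly counts are invented numbers chosen to mimic the pattern described above, not the real sales figures, and merging the two clusters with the closest mean values (centroid linkage) is an assumption, since the linkage rule used in SPSS is not stated here.

```python
def cluster_years(yearly_counts, n_clusters):
    """Merge years bottom-up until n_clusters groups remain."""
    clusters = [[year] for year in sorted(yearly_counts)]   # one cluster per year

    def centroid(cluster):
        return sum(yearly_counts[y] for y in cluster) / len(cluster)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Pick the pair of clusters whose mean counts are closest.
        i, j = min(pairs, key=lambda p: abs(centroid(clusters[p[0]])
                                            - centroid(clusters[p[1]])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]   # j is always larger than i, so index i stays valid
    return [sorted(c) for c in clusters]

# Invented yearly counts, for illustration only.
counts = {2002: 10, 2003: 12, 2004: 11, 2005: 13,
          2006: 100, 2007: 105, 2008: 110, 2009: 108,
          2010: 500, 2011: 510, 2012: 505, 2013: 515, 2014: 520,
          2015: 300}
groups = cluster_years(counts, n_clusters=4)
print(sorted(groups))
```

Stopping the merging at different numbers of clusters corresponds to reading the icicle chart at different levels.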

Figure 6 shows the sums of helpful votes and total_votes by year. In Figure 6, the horizontal and vertical coordinates represent the year and the sum of helpful votes, respectively. Figure 6 shows that the proportion of five-star products is 58%, followed by four-star products which account for 18%, and the proportion of three-star, two-star, and one-star products is less than 10%.

Figure 7 presents the variation of star ratings by year. From 2002 to 2007, the proportion of each star rating (the vertical coordinate) changed considerably. From 2007 to 2015, the proportions changed steadily, and the proportion of five stars increased gradually.

The general trend of the total number of votes and helpful votes is positively related to the general sales volume. With the increase in the number of purchased commodities over time, consumers’ commodity evaluation attitude is becoming more and more positive. Among them, the five-star ratio has been higher than other stars over the years. After 2007, the proportion of four-star products is higher than that of one-star products, higher than that of three-star and higher than that of two-star products. It can be seen that the reputation of commodities is improving.

4.3. Relationship between Star Ratings and Reviews

In the three given data tables, the vine field (a string) marks reviews written by Amazon Vine Voices: customers are invited to become Amazon Vine Voices based on the trust they have earned in the Amazon community by writing accurate and insightful comments. It can be inferred that, under the condition vine = Y, the selected comments are those written by the “specific star” mentioned in the title. The analysis of the three tables shows that the star rating corresponds to the polarity of the review: the lower the star rating is, the more negative the review is, and the higher the star rating is, the more positive the review is. Therefore, in our research, we can use the star rating in place of the polarity of the review. We divide helpful_votes by total_votes and compare the resulting ratio with 50%: if it is higher than 50%, the review has a positive impact, and if it is lower than 50%, it has a negative impact. When the positive impact outweighs the negative impact, the comments written by these specific stars have a positive impact.

The datasets are analyzed in the following steps:
Step 1: vine (string): customers are invited to become Amazon Vine Voices based on the trust they have earned in the Amazon community by writing accurate and insightful comments. It can be inferred that under the condition of vine = Y, the selected comments are those written by the “specific star” mentioned in the title.
Step 2: calculate the proportion of reviews at each star rating for the different types of products when vine is N and Y, respectively.
Step 3: divide helpful_votes by total_votes and compare the resulting ratio with 50%.
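Steps 2 and 3 can be sketched as follows. The record layout (dictionaries with `vine` and `star_rating` keys) and the function names are assumptions made for this illustration, and the sample records are invented. A ratio of exactly 50% is returned as `None` because the rule above does not cover that case.

```python
from collections import Counter

def star_share_by_vine(records):
    """Proportion of reviews at each star rating, separately for vine = Y and vine = N."""
    counts = {}
    for r in records:
        counts.setdefault(r["vine"].upper(), Counter())[r["star_rating"]] += 1
    return {vine: {star: n / sum(c.values()) for star, n in c.items()}
            for vine, c in counts.items()}

def helpfulness_impact(helpful_votes, total_votes):
    """Compare helpful_votes / total_votes with 50% using integer arithmetic."""
    if total_votes == 0:
        return None                   # no votes, so the ratio is undefined
    if 2 * helpful_votes > total_votes:
        return "positive"
    if total_votes > 2 * helpful_votes:
        return "negative"
    return None                       # exactly 50% is not covered by the rule

sample = [   # invented records
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 4},
    {"vine": "N", "star_rating": 1},
    {"vine": "n", "star_rating": 5},
]
shares = star_share_by_vine(sample)
print(shares)
print(helpfulness_impact(8, 10), helpfulness_impact(1, 10))
```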

Tables 6 and 7 present the ratio of star rating reviews of hair dryer, oven, and pacifier, respectively, with different vine values.

From the above data, it can be seen that online customer reviews convey information to consumers beyond the commodity attributes, so that consumers can better understand the quality and performance of products and avoid decision-making errors caused by information asymmetry. Such reviews have become the most important information for online shoppers before they make purchase decisions. Online reviews not only have an important reference value for consumers but also influence their subsequent comments. Based on the elaboration likelihood model, it is concluded that high-quality reviews have a positive impact on website trust, and website trust has a positive impact on consumer reviews. Therefore, e-commerce websites can improve the comment system, rank high-quality comments first, and invite specific stars to write reviews with a positive impact, so as to guide consumers and encourage them to publish high-quality comments. (In the pacifier table, some star ratings fall outside the normal range of 1–5, and some are blank. These records are invalid and would distort the results.)

4.4. Relationship between Quality Descriptors of Text-Based Reviews and Rating Levels

We selected the words from the dictionary as candidate words and counted the candidate words’ appearances in the reviews. We define set A as the top 50 words with the highest word frequency in all reviews, while set B is defined as the top 50 words with the highest word frequency in the reviews which have star ratings equal to 1. Similarly, sets C, D, E, and F are composed of the top 50 words with the highest word frequency in the reviews which have star ratings of 2, 3, 4, and 5, respectively. Finally, the intersection of A, B, C, D, E, and F reflects the relations between quality descriptors and rating levels (25 words are contained in the intersection). Table 8 shows the 24 words of the intersection. Figure 8 gives the curve of the word frequency versus star ratings.
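The construction of the sets A to F and their intersection can be sketched with `collections.Counter`. The function names, the candidate word list, and the two tokenized toy reviews are invented for illustration; with real data, the candidate words would come from the sentiment dictionary and the reviews from the three datasets.

```python
from collections import Counter

def top_words(reviews, candidates, k=50):
    """The k most frequent candidate words in a list of tokenized reviews."""
    counts = Counter(w for tokens in reviews for w in tokens if w in candidates)
    return {w for w, _ in counts.most_common(k)}

def descriptor_intersection(reviews_by_star, candidates, k=50):
    """Intersect the top-k set over all reviews with the top-k set of each star rating."""
    all_reviews = [tokens for revs in reviews_by_star.values() for tokens in revs]
    sets = [top_words(all_reviews, candidates, k)]          # set A
    sets += [top_words(revs, candidates, k)                 # one set per star rating
             for revs in reviews_by_star.values()]
    return set.intersection(*sets)

candidates = {"great", "love", "easy", "bad", "cute"}       # invented candidate words
reviews_by_star = {                                          # invented tokenized reviews
    1: [["bad", "great"]],
    5: [["great", "love", "easy"]],
}
common = descriptor_intersection(reviews_by_star, candidates)
print(common)
```

Counting how often each word in the intersection occurs at each star rating then gives the word frequency curves of the kind plotted against star ratings.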

Table 8 indicates an obvious positive correlation between rating levels and quality descriptors such as “great,” “like,” and “love”. As shown in Figure 8, we present the word frequency (vertical coordinates) of the 24 words (horizontal coordinates) in the intersection of sets A, B, C, D, E, and F. The word frequency in the 6 sets is displayed in different colors. Figure 8 indicates that a more positive quality descriptor leads to a higher rating level. As the positivity of the quality descriptors decreases, customers tend to choose among the 5 star ratings with similar probabilities.

5. Strength and Weakness

5.1. Strength

Perhaps the biggest strength of our method is the NB classifier, which is applied in this paper to extract the sentiment orientation (positive or negative) from the reviews. This method enables our model to use the online text data and makes the follow-up model simpler.

5.2. Weakness

The accuracy of our model is influenced by the vocabulary and classification accuracy of the sentiment dictionary.

6. Further Work

Our future work will mainly focus on big data mining based on deep learning technology. In addition to sentiment analysis, it will also include mining customer needs from text data, trial experience perception, and consumption tendency.

7. Conclusion

In this paper, the naive Bayes (NB) classifier is applied to extract the sentiment orientation (positive or negative) from the Amazon product reviews. The sentiment orientation is quantified into 11 levels. Based on the linear regression and multiple linear regression models, we analyzed the three product datasets (microwave oven, baby pacifier, and hair dryer) to provide meaningful quantitative relationships between star ratings, reviews, and helpfulness ratings that will help an e-commerce company succeed in its online marketplace product offerings. Descriptive statistics were used to analyze the relationship between the three datasets and their variation trend. Our analysis showed that the star rating is positively correlated with the reviews, while the helpfulness ratings have no specific relationship with the star rating and reviews. This means the star rating is actually the most valuable index. The multiple regression and clustering analysis give the relationships between the 4 indexes: time, star rating, reviews, and helpfulness rating. We find that there is a positive correlation between time and star rating, reviews, and helpfulness rating, and that the reputation of the products in the online market improves over time. This means people will tend to get used to using new products.

According to our data analysis, the “star rating” and “reviews” of the online information are consistent on the whole, and “star rating” can qualitatively and accurately describe product information. The sales volume of hair dryers, microwave ovens, and baby pacifiers is increasing year by year. Meanwhile, buyers’ satisfaction with the products is also increasing, as are the numbers of comments and helpful comments. We suggest that if an e-commerce company wants to understand the product information more accurately, it should take the star rating of the product as the measurement standard. At the same time, the improvement of the three indexes “star rating,” “comment,” and “helpfulness rating” indicates the improvement of the product’s online market reputation, and in this case, the company can appropriately increase the type and scale of its online market products. We also suggest focusing on 2 important product features: the convenience of operation and storage, and the aesthetics of the appearance design. That is mainly because the two most frequent keywords on product experience are “easy” and “cute,” and these two features gain the highest star ratings.

Our method can be used to analyze any online product reviews, not only those on Amazon. Online product review analysis can help reduce the cost of competition in the industry and clarify the emotional needs and preferences of customers, so that commercial marketing can be carried out quickly and accurately.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was funded by the National Natural Science Foundation of China under grant no. 61602400. This work was also supported by the National Spark Program of China (no. 2015GA690259).