Research Article

Summarizing Online Movie Reviews: A Machine Learning Approach to Big Data Analytics

Table 5

Movie review classification accuracy (%) of Naive Bayes (NB) with different feature sets on three tasks, compared with a benchmark model [54]. The PL04 dataset contains 2,000 movie reviews and is a standard collection for sentiment classification [58]; the full IMDB collection includes 50,000 reviews; and the subjectivity collection comprises 1,000 review sentences [58].

#   NB feature set                                            PL04 (%)   Full IMDB (%)   Subjectivity (%)
1   Unigram features                                          81.5       86.66           90.75
2   Bigram features                                           77.7       88.29           76.03
3   Unigrams + bigrams feature set [53]                       82.4       88.91           91.22
4   Unigrams + bigrams + trigrams feature set                 80.15      89.22           91.18
5   Unigrams occurrence + smooth IDF + cosine norm            82.1       87.36           90.7
6   Bigrams occurrence + smooth IDF + cosine norm             81.15      88.31           76.72
7   Unigrams + bigrams + smooth IDF + cosine norm [53]        83.7       89.28           90.91
8   Unigrams + bigrams + trigrams + smooth IDF + cosine norm  83.15      89.33           90.87
9   Benchmark model [54]                                      88.90      88.89           88.13
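Rows 5 to 8 combine n-gram occurrence features, smoothed IDF weighting, and cosine normalization before Naive Bayes classification. The following is a minimal, standard-library Python sketch of such a pipeline under stated assumptions, not the authors' implementation. It assumes whitespace tokenization, binary n-gram occurrence, the smoothed IDF formula used by scikit-learn, ln((1 + N) / (1 + df)) + 1, and a multinomial NB with Laplace smoothing applied to the real-valued weights. The paper does not specify these details, and the function names (`ngrams`, `fit_idf`, `transform`, `train_nb`, `predict`) and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n_max):
    """All contiguous word n-grams of length 1..n_max (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def fit_idf(docs, n_max):
    """Smoothed IDF (assumed scikit-learn style): ln((1 + N) / (1 + df)) + 1."""
    df = Counter(g for d in docs for g in ngrams(d, n_max))
    n = len(docs)
    return {g: math.log((1 + n) / (1 + c)) + 1 for g, c in df.items()}


def transform(doc, idf, n_max):
    """Binary occurrence times IDF, then cosine (L2) normalization; unseen n-grams are dropped."""
    v = {g: idf[g] for g in ngrams(doc, n_max) if g in idf}
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {g: w / norm for g, w in v.items()} if norm else v


def train_nb(vecs, labels, alpha=1.0):
    """Multinomial NB over real-valued weights with add-alpha (Laplace) smoothing."""
    vocab = {g for v in vecs for g in v}
    model = {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vecs, labels) if y == c]
        totals = Counter()
        for v in rows:
            totals.update(v)  # sums the fractional feature weights per class
        denom = sum(totals.values()) + alpha * len(vocab)
        log_prior = math.log(len(rows) / len(vecs))
        log_lik = {g: math.log((totals[g] + alpha) / denom) for g in vocab}
        model[c] = (log_prior, log_lik)
    return model


def predict(model, vec):
    """Return the class with the highest log-posterior score."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(w * log_lik[g] for g, w in vec.items() if g in log_lik)
    return max(model, key=score)


# Hypothetical toy corpus, for illustration only.
train = ["a great film", "a great story", "a bad film", "a bad story"]
labels = ["pos", "pos", "neg", "neg"]
N_MAX = 2  # unigrams + bigrams, roughly the configuration of row 7
idf = fit_idf(train, N_MAX)
model = train_nb([transform(d, idf, N_MAX) for d in train], labels)
print(predict(model, transform("great film", idf, N_MAX)))  # expected: pos
```

Setting `N_MAX` to 1 or 3 switches to the unigram-only or unigram + bigram + trigram configurations. Because every vector is L2-normalized, long and short reviews contribute comparable total weight to the class statistics, which is one plausible reason the normalized rows tend to score higher on PL04.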