Abstract

With the growing attention that opinion mining has recently received from different research communities, there is an evolving body of work on Arabic Sentiment Analysis (ASA). This paper introduces a systematic review of the existing literature relevant to ASA. The main goals of the review are to support research, to identify areas for future study in ASA, and to ease other researchers' search for related work. The findings of the review propose a taxonomy for sentiment classification methods. Furthermore, the limitations of existing approaches are highlighted in the preprocessing step, feature generation, and sentiment classification methods. Likely trends for future ASA research are suggested from both practical and theoretical perspectives.

1. Introduction

Nowadays, Sentiment Analysis (SA), also known as opinion mining, is a broadly investigated research area [1, 2]. It is an application of natural language processing (NLP), computational linguistics, and text mining that extracts people's opinions or emotions towards an event, product, or other target [3, 4]. In general, SA involves identifying four elements: the entity, its aspect, the opinion holder, and the holder's sentiment [5]. Extracted text can first be classified as either objective or subjective, and subjective text can then be classified as carrying positive or negative sentiment [6].

Most studies undertaken in SA have been carried out on natural languages such as English, Chinese, and Arabic. NLP for Arabic is still at an early stage [7] and lacks resources and tools. Arabic therefore still poses challenges for NLP tasks owing to its structural complexity, history, and diverse cultures [8, 9].

A large number of tools and approaches in the literature are used to perform SA. Most of them are designed for English, the dominant language of science [9]. These approaches follow either a semantic (lexicon-based) approach or a machine learning (ML) approach. The semantic approach extracts sentiment words and computes their polarities based on a sentiment lexicon [10]. ML classifiers, by contrast, are trained on annotated data, converted into feature vectors, to learn which features characterize each class; the resulting model is then used to predict the class of new data [4]. It is worth noting that these approaches can be adapted to other languages, such as Arabic [9].
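To make the distinction concrete, the following minimal sketch contrasts the two families of approaches; the tiny lexicon, the two annotated example sentences, and the scikit-learn components are illustrative assumptions rather than resources used in the reviewed studies.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Semantic (lexicon-based) approach: sum the polarities of matched sentiment words.
lexicon = {"جميل": +1, "ممتاز": +1, "سيء": -1, "رديء": -1}   # toy sentiment lexicon (assumption)

def lexicon_score(text):
    """Return +1 (positive), -1 (negative), or 0 when no sentiment word is matched."""
    score = sum(lexicon.get(token, 0) for token in text.split())
    return (score > 0) - (score < 0)

# Machine learning approach: train on annotated data converted to feature vectors.
train_texts = ["الخدمة ممتازة والفندق جميل", "المنتج سيء جدا"]   # toy annotated data (assumption)
train_labels = ["positive", "negative"]
vectorizer = CountVectorizer()
classifier = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

new_review = "فيلم جميل"
print(lexicon_score(new_review))                                  # lexicon-based prediction
print(classifier.predict(vectorizer.transform([new_review])))     # ML prediction for new data
```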

The Arabic language has received less attention than other languages [11]; nevertheless, hundreds of studies have addressed ASA. Since its introduction about a decade ago, ASA has become one of the most popular forms of information extraction from reviews. These reviews offer many benefits, such as providing valuable insights into a product brand or service [12–15], identifying potential product advocates or social media influencers [16–19], and detecting e-mail spam [20]. Consequently, ASA has been studied in various contexts, and a large number of studies have been published on the topic. To the best of our knowledge, there has been no systematic review synthesizing the results obtained on ASA.

This review aims to provide a systematic literature review (SLR) of the research efforts on ASA. The SLR starts by determining the basic requirements; subsequently, a total of 191 papers were initially considered relevant. This number was reduced to 140 papers after reviewing the abstracts of the studies. Finally, a total of 108 papers were reviewed in full. Through careful study and analysis of these papers, the desired information was extracted.

This SLR seeks to outline the major research themes and techniques and to offer suggestions for future research. The first goal of this study is to review the articles to establish the current state of research. The second goal is to discuss the major issues influencing SA in Arabic based on the reviewed studies. In addition to theory development in the field, this article makes four contributions. First, a systematic literature review of the research efforts on ASA is provided. Second, an enhanced taxonomy of ASA methods is introduced. Third, we attempt to create such a synthesis while placing particular emphasis on the preprocessing step, feature generation, and sentiment classification methods. Finally, new trends in ASA are suggested, and the implications for future research and practice are highlighted.

The remainder of this paper is organized as follows. Section 2 describes the methodology. The findings are presented in Section 3. Section 4 analyzes and discusses the existing research to identify the research gaps and draws implications for future research. Section 5 concludes the paper.

2. Review Methodology

An SLR was adopted as the approach to identify and review work on SA in Arabic. A systematic review is carried out using systematic, explicit, and rigorous standards, aiming not only to summarize current research on the topic but also to provide an element of analytical criticism. It comprises eight major steps that are essential for any review to be scientifically rigorous [21]. This study followed the guidelines of [21–23] in the collection and analysis process. The details are presented in the following subsections.

2.1. Research Questions

Identifying the research questions is the first step of a systematic review. The questions have to be concise and clear. In the context of this study, the research questions are stated as follows:
(i) RQ1. What is the current state of research? Who has published, and when?
(ii) RQ2. What are the most effective techniques used in ASA?
(iii) RQ3. What are the most significant gaps and limitations in the reviewed studies?
(iv) RQ4. What are the directions of future research on ASA?

2.2. Searching the Literature

The search strategy of this review included determining the population, selecting resources, deriving search strings, and defining the inclusion and exclusion criteria. The literature search involved querying reputed journals and conferences dealing with ASA indexed in Scopus, covering several databases such as Springer, Elsevier, and IGI Global. The temporal range of the review was set to articles published from January 2013 until the end of November 2018.

In line with the research questions, the search was conducted with permutations of keywords. The search query was (Arabic AND ("Sentiment Analysis" OR "Opinion Mining" OR SA OR OM)) AND (Classification OR Classifier OR Prediction OR Polarity). The investigated studies yielded a total of 191 publications; in total, 32 journals and 81 proceedings from 23 databases were reviewed. The selected conferences, journals, and databases are listed in Table 1.

These publications varied in scope as well as type. In detail, the SLR included journal articles, conference papers, and book chapters. As Figure 1 shows, conference papers are the most common source type, making up almost 70% of the total sources, while reviews and journal articles account for 27%. The least used sources are book chapters, which contribute only 3%.

2.3. Inclusion and Exclusion Criteria

The reviewed articles were filtered using multilevel inclusion and exclusion criteria, as depicted in Table 2. The exclusion process is called quality appraisal, while the inclusion process is called practical screening.

2.4. Data Collection

The data collected from each article to conduct the review of ASA were identified to include the following:
(i) The source, whether conference or journal, and the full reference
(ii) The article authors and their institutions
(iii) The article title, publication year, and publisher
(iv) The type of SA tasks conducted
(v) The dataset or lexicon and its size, domain, and source
(vi) The SA classification level
(vii) The SA approach
(viii) The SA algorithms and their accuracy
(ix) The Arabic language type
(x) The preprocessing process
(xi) The feature selection and generation process
(xii) Associated tools and applications used to perform the SA process

This SLR covered the period from January 2013 to the end of November 2018, resulting in 191 articles. Figure 2 depicts the prescreening steps that produced the conducted review. First, a pilot study searching the journals and conferences was conducted to test the search parameters; this step resulted in 142 articles. Second, the initial pool was reduced to 91 articles by selecting the relevant ones, which involved removing duplicate papers and filtering out unrelated articles by reviewing titles and abstracts. Third, the pilot study was repeated for new articles at the end of November 2018, yielding 49 additional articles. Finally, relevant articles were again filtered and selected based on full-text reading, and only 108 relevant articles were identified across the journals and conferences.

2.5. Data Extraction

This stage concerned extracting data from the reviewed papers to answer the research questions. The data required to conduct the review of ASA were identified, including the type of SA tasks; the dataset, its domain, and its source; the SA classification level; the SA approach and algorithms; the Arabic language type; the preprocessing process; and the feature selection and generation process.

Table 3 presents the methods used in the preprocessing step in the literature. These preprocessing methods include text cleaning, normalization, stemming, tokenization, part-of-speech (POS) tagging, negation detection, segmentation, stop word removal, lemmatization, irony detection, and named entity recognition.

In a similar context, for the datasets used in ASA, Table 4 presents the most common datasets, with their sizes and sources, extracted from the reviewed papers. These datasets are labelled with identifier codes ranging from D01 to D25, as shown in the first column. The second column shows the names of the datasets used in the literature, such as LABR, OCA, ASTD, and BRAD. The dataset size and its class distribution are given, respectively, in the third to the sixth column; the dataset sizes range from 147 to 5,615,943 sentences, and the class distributions vary across positive, negative, and neutral. Finally, the last column presents the dataset sources, which cover different platforms, including Twitter, Facebook, and others.

Table 5 presents the information extracted from the 108 studies to help in exploring ASA. As shown in Table 5, SA tasks were categorized into five tasks: aspect detection (AD), building resource (BR), sentiment classification (SC), subjectivity classification (Subj C), and aspect-based sentiment analysis (Aspect SA). Arabic has three types: Classical Arabic (CA), Modern Standard Arabic (MSA), and Dialectal Arabic (DA).

Moreover, to aid interpretation in cases where different accuracies are reported with multiple algorithms and multiple datasets, three typographical emphases inspired by [66, 108] are used: underline, bold, and italic. For example, in row no. 6 there are three algorithms, three datasets, and three different results. To distinguish them, bold indicates that the K-nearest neighbors (KNN) algorithm with dataset DS2 produced an accuracy of 74.9%; underlining indicates that the Decision Tree (DT) algorithm with dataset DS1 produced an accuracy of 74.37%; and italic denotes that KNN with dataset DS3 and the SMOTE feature technique produced an accuracy of 84.02%.

In addition, the dataset identifier codes introduced above are used in Table 5; otherwise, the dataset size is given when the dataset was built in-house. SA approaches are abbreviated in the third column of Table 5 to give more details of the reviewed articles: supervised (Su), unsupervised (Un), hybrid (Hb), lexicon-based (LB), and semisupervised (Ss).

3. Findings

This section introduces the findings of the SLR and answers the first two research questions. The subsections below present the results obtained from the SLR. ASA has become necessary because the Arabic-speaking audience using the Internet and its applications has grown considerably in recent years [9]. A total of 191 papers were found for the period from January 2013 until the end of November 2018. Figure 3 clearly shows that the number of studies on ASA has increased gradually over the last five years, reaching 53 articles in 2018. ASA is evidently a very active research topic at present, witnessing rapid growth and increasing interest from researchers.

In addition, Figure 4 provides information about the tasks most frequently addressed in ASA. According to the reviewed studies, most tasks fall into sentiment classification, building resources, or a combination of building resources and sentiment classification. Moreover, it is apparent from Figure 4 that the combined task of building resources and sentiment classification dominates all other tasks. Indeed, the Arabic language still lacks resources and tools that can be employed to support sentiment classification.

As is well known, Arabic usually varies between MSA and DA. Figure 5 illustrates the number of articles targeting each Arabic language type. Overall, it is worth noting that studying both MSA and DA together is more common than studying MSA alone, owing to the abundant presence of DA in social media and microblogging channels.

Furthermore, the sources of opinion datasets on any topic vary from social media platforms to websites that present products or services. Figure 6 depicts the dataset sources used in ASA.

It is clearly noticeable that Twitter, accounting for 50% of the data sources used, is the most frequently used social media application in the reviewed articles. It offers great potential for exploring people's lives, opinions, and interests. It is restricted to very short messages, called tweets, which are often written with a great deal of Arabic slang.

In general, machine learning- (ML-) based approaches for ASA involve the following stages: data preprocessing, feature generation and selection, and ML classification. In the literature, several techniques have been proposed at every stage of SA to improve performance. Arabic research performed in each of these stages is addressed in the following sections.

3.1. Preprocessing Arabic Text

Arabic is a rich language and a challenging linguistic domain for NLP. It has morphological complexities and dialectal varieties that require advanced preprocessing [126, 127]. Another complexity is that sentiment and feeling are often expressed in dialect rather than in MSA [126]. Preprocessing and analysis of raw Arabic text greatly reduce noise and improve efficiency. Unfortunately, most studies have concentrated on preprocessing English text, whereas few have focused on Arabic text [69].

Figure 7 shows the preprocessing strategies most used in ASA.

In general, it is clear from Figure 7 that most studies went through a preprocessing phase including text cleaning, normalization, tokenization, stop word removal, and stemming. It was found that 52% of the articles considered stemming an important step, while text cleaning, normalization, stop word removal, and tokenization were used in 30%, 47%, 44%, and 45% of the articles, respectively.
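As an illustration of such a pipeline, the sketch below chains normalization, tokenization, stop word removal, and light stemming for a single Arabic sentence; the normalization rules, the small stop word list, and the prefix/suffix stripping are simplified assumptions standing in for the tools used in the reviewed studies (e.g., the Khoja and light stemmers discussed in Section 4).

```python
import re

ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")   # tashkeel marks + tatweel
STOP_WORDS = {"في", "من", "على", "و", "عن", "إلى"}               # toy stop word list (assumption)

def normalize(text):
    """Unify common Arabic letter variants and strip diacritics and letter elongation."""
    text = ARABIC_DIACRITICS.sub("", text)
    text = re.sub("[إأآا]", "ا", text)        # unify alef forms
    text = re.sub("ى", "ي", text)             # alef maqsura -> ya
    text = re.sub("ة", "ه", text)             # ta marbuta -> ha
    text = re.sub(r"(.)\1{2,}", r"\1", text)  # reduce repeated letters (e.g., جمييييل)
    return text

def light_stem(token):
    """Very rough light stemming: strip a few frequent affixes (simplifying assumption)."""
    for prefix in ("ال", "و", "ب", "ل"):
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
    for suffix in ("ها", "ون", "ات", "ين", "ه"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[: -len(suffix)]
    return token

def preprocess(text):
    tokens = normalize(text).split()                      # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop word removal
    return [light_stem(t) for t in tokens]                # light stemming

print(preprocess("الخدمةُ في الفندقِ جمييييلة جداً"))
```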

3.2. Common Features in Arabic Sentiment Analysis

Machine learning offers several algorithms for sentiment classification. Nevertheless, the challenge in capturing sentiment in written text lies in selecting the best features to use [9]. Features provide a comprehensive summarization of the content and allow more precise analysis of the sentiments [128]. Figure 8 reveals the features most frequently used in ASA: among the features used in the reviewed articles, n-gram models are the most common, while semantic and lexical features are the least common.
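The dominant n-gram features are typically produced with a vectorizer of the following kind; the two example sentences are illustrative only, and the reviewed studies built such matrices from the datasets in Table 4.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["الفندق نظيف والخدمة ممتازة", "الخدمة سيئة جدا"]   # toy preprocessed documents (assumption)

# Unigrams + bigrams weighted by TF-IDF, one of the most frequent feature sets in the literature
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())   # the generated unigram/bigram features
print(X.shape)                              # documents x features matrix fed to the classifier
```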

3.3. Methods Used in Arabic Sentiment Analysis

The reviewed studies have introduced a wide range of methods and techniques to address the ASA problem. Figure 9 illustrates the methods most used in ASA.

It is clearly noticeable that support vector machines (SVM) and naive Bayes (NB) are the most used methods in the reviewed articles, while voting, boosting, and semantic orientation (SO) are the least used. SVM has been adopted in 74 of the 108 papers, while NB was used in 71 papers. It is worth noting that in the previous studies the SVM classifier has been superior or comparable to other classifiers, such as NB.
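A minimal sketch of the common SVM/NB comparison is shown below; the toy corpus, the unigram + bigram TF-IDF features, and the two-fold cross-validation are assumptions for illustration, not the experimental setups of the reviewed papers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["خدمة ممتازة", "منتج رائع", "تجربة سيئة", "خدمة رديئة",
         "فيلم جميل", "فيلم ممل", "مكان نظيف", "مكان قذر"]      # toy labelled corpus (assumption)
labels = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]

for name, clf in [("SVM", LinearSVC()), ("NB", MultinomialNB())]:
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    scores = cross_val_score(pipe, texts, labels, cv=2)   # tiny CV split, for illustration only
    print(name, scores.mean())
```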

3.4. Arabic Sentiment Analysis Applications

In recent years, ASA has gained considerable attention, and its applications have spread to almost every possible domain. Figure 10 shows the domains most targeted for sentiment classification. It is clearly observed from Figure 10 that many researchers were inclined to apply SA in the business and economy and the social and politics domains, numbering 40 and 39 articles, respectively. By contrast, the least addressed domains in ASA were education, health, and travel and tourism, with 3 studies each.

Most studies focused on ASA applications in a limited set of domains, such as politics [15, 48, 62, 89], hotels [79, 113], business and economy [12, 20, 129], arts and books [29, 32, 92], entertainment and movies [71, 73, 99], and sports [81, 96].

Several papers [12, 74, 83] have studied ASA for various purposes, such as building an Arabic sentiment lexicon, designing a framework for ASA, and comparing two free online SA tools that support Arabic. These studies involved collecting small datasets, of fewer than 3,000 tweets, relevant to several domains such as education, sports, and politics. In [83], only a few tweets were dedicated separately to each domain, including education; a drawback of that study is that the results for each domain were neglected and not reported [83]. In addition, Al-Horaib et al. [102] studied ASA in e-learning using traditional ML algorithms, such as SVM and NB; however, the dataset comprised only 2,000 tweets and related solely to King Abdul-Aziz University.

In conclusion, little work has used ASA to classify sentiments in the education domain, and researchers have not deliberately targeted that domain in particular. Moreover, the collected data were small, and the classification results related to the education domain were not specifically highlighted or clarified in the discussion.

3.5. Taxonomy of Arabic Sentiment Analysis Methods

To accomplish ASA, several sentiment classification methods have been proposed. An enhanced taxonomy of Arabic sentiment classification methods, adapted from [130, 131], is proposed in Figure 11.

This enhanced taxonomy of the SA methods reviewed and discussed above is presented in Figure 11. As shown in the figure, the applied methods can first be classified by whether they use machine learning. ML approaches can be supervised, unsupervised, or semisupervised. Supervised ML approaches can be divided into probabilistic and nonprobabilistic methods. On the one hand, NB, maximum entropy (ME), conditional random fields (CRF), Bayesian networks (BN), and logistic regression (LR) are examples of probabilistic classification methods. On the other hand, SVM, KNN, DT, rule-based methods, and neural networks (NN), including deep learning (DL) and traditional NN, can be classified as nonprobabilistic methods. Unsupervised ML approaches, such as genetic and clustering algorithms, are exploited when it is hard to find labeled documents. Ensemble approaches, such as random forest, voting, bagging, boosting, and stacking, are grouped in this taxonomy under the semisupervised approaches. Moreover, ASA methods differ in how they use a lexicon; such methods can be based on a dictionary, a corpus, or an ontology. In addition, the applied methods can be a hybrid of ML and semantic orientation approaches. The modifications made in Figure 11 involve adding some methods, such as DL, traditional neural networks, genetic algorithms, clustering, LR, CRF, all ensemble methods, and ontology-based approaches.

4. Discussion and Future Research Avenues

This section discusses the results obtained from the SLR and answers the third and fourth research questions. A total of 108 articles on ASA have been reviewed to capture the current state of the field and achieve the research aims: summarizing the most effective techniques used in ASA, revealing the gaps and limitations in the reviewed studies, and highlighting directions for future research on ASA.

It is obvious that ASA has been studied from three important perspectives. The first is the preprocessing strategies, which strongly affect the results of sentiment classification. The second is the process of feature generation and selection, which plays a significant role in building the feature vectors and, accordingly, improving the results. The last is the classification method, which in turn receives the vectors produced by feature generation and classifies the sentiments.

As shown in the literature, there are still challenges in ASA that have to be addressed. These challenges concern preprocessing strategies, feature selection, classification methods, and the targeted domain.

Arabic sentences frequently contain noisy, missing, and inconsistent data that need to be preprocessed to improve Arabic sentiment classification. Omitting preprocessing steps, such as eliminating insignificant comments and repeated letters, may lead to ignoring important words. Applying a wide set of preprocessing strategies, such as normalization, tokenization, stop word removal, and stemming, will enhance sentiment classification.

Al-Rubaiee et al. [13] explored the preprocessing steps within RapidMiner: normalization, tokenization, stop word removal, and stemming. They demonstrated that text preprocessing is a key factor in sentiment classification and that creating N-gram terms from tokens yields different levels of accuracy.

The effect of preprocessing on ASA has also been examined, in particular for the 2012 Egyptian presidential elections, where employing information gain for feature selection together with N-grams, stemming, and normalization improved the accuracy of Arabic text classification [47].

In addition, Alomari et al. [68] investigated several preprocessing strategies, including stemming, stop word removal, N-grams, and different weighting schemes, using several scenarios for each. In detail, N-gram models were employed with and without each stemming type as well as with TF-IDF or TF weighting schemes. Arabic stop word removal was excluded on the grounds that it reduced performance in all scenarios. Moreover, the experimental results indicated that the SVM classifier using TF-IDF with stemming and bigram features outperformed the best scenario obtained with the Naive Bayes classifier.
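The scenario-style comparison reported in such studies can be sketched as a small grid over stemming, n-gram range, and weighting scheme; the toy corpus, the placeholder stemmer, and the two-fold evaluation are assumptions, not the authors' actual configurations.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["خدمة ممتازة", "منتج رائع", "تجربة سيئة", "خدمة رديئة"]   # toy corpus (assumption)
labels = ["pos", "pos", "neg", "neg"]

def light_stem(token):                        # placeholder for a real Arabic stemmer (assumption)
    return token[:4] if len(token) > 4 else token

scenarios = product([None, light_stem],                  # no stemming vs. light stemming
                    [(1, 1), (1, 2), (2, 2)],            # unigrams, uni+bigrams, bigrams only
                    [CountVectorizer, TfidfVectorizer])  # TF vs. TF-IDF weighting

for stem, ngrams, Vec in scenarios:
    prep = (lambda doc: " ".join(stem(w) for w in doc.split())) if stem else None
    pipe = make_pipeline(Vec(ngram_range=ngrams, preprocessor=prep), LinearSVC())
    acc = cross_val_score(pipe, texts, labels, cv=2).mean()
    print(stem.__name__ if stem else "raw", ngrams, Vec.__name__, round(acc, 2))
```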

The authors of [92] studied the impact of stemming on the Arabic sentiment classification problem. They performed experiments on two datasets: the "2000 tweets" collected by Abdulla et al. [132] and the "BBN tweets" of Salameh et al. [133]. They used the Khoja Arabic root stemmer and the light stemmer integrated in RapidMiner. The results showed that light stemming is preferable to root stemming and that character trigrams combined with the tokens of the text yielded the best sentiment classification results.

Duwairi and El-Orfali [26] performed a study on SA for Arabic text using two datasets: one prepared in-house related to the politics domain and the other prepared by Rushdi-Saleh et al. [8] related to the movie domain. Their objective was to investigate data representations and preprocessing strategies for ASA. The results showed that stemming and light stemming combined with stop word removal adversely affected the classification performance on the movie dataset, while they slightly improved classification on the other dataset.

Even though the preprocessing phase is a significant step in SA and text mining, it is still underestimated and not extensively covered in the literature. In addition, identifying the preprocessing techniques that play a decisive and effective role in improving ASA is still an open field for study and experimentation.

Good preprocessing leads to the selection of suitable features. Feature representation includes semantic representation, which is still a challenging task in NLP. Capturing word semantics is possible with distributional semantic models, and merging word embeddings with combinations of N-gram models will improve SA results.

In ASA, N-gram models have been widely used as features. Some studies showed that unigrams performed better than bigrams and trigrams [35, 54], because the bag-of-words (BOW) representation gives good data coverage, whereas bigrams and trigrams tend to be very sparse.

Features such as count vectors of unigrams, bigrams, and trigrams have also been tested separately. Diverse combinations of several N-gram models were attempted, and the results showed that these combinations improved the classification process [37, 69, 94].

Alomari et al. [68] examined the use of several N-grams (unigrams, bigrams, and trigrams) with various weighting schemes, including TF-IDF and TF, and found that the bigram model with the TF-IDF weighting scheme outperformed the others.

POS tagging has also been used as a feature in analyzing Arabic textual content. For example, Al-Moslmi et al. and Mohammad et al. [18, 83] exploited POS tagging features, including nouns, adverbs, and adjectives, to investigate sentiments in Arabic text. Alhazmi et al. [106] carried out two sets of experiments, with and without POS, to assess the effectiveness of POS patterns as features in sentiment classification. The experiments showed that using POS patterns did not yield large improvements, possibly because Arabic dialect is commonly used on Twitter.
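A POS-based feature set of this kind is usually obtained by keeping only the tokens tagged as nouns, adjectives, or adverbs before vectorization. The sketch below illustrates the filtering step only; the dictionary "tagger" is a hypothetical stand-in for a real Arabic POS tagger, whose output is assumed to be (token, tag) pairs.

```python
KEEP_TAGS = {"NOUN", "ADJ", "ADV"}

def pos_tag(tokens):
    """Stand-in tagger (assumption): a real study would call an Arabic POS tagger here."""
    toy_tags = {"الفندق": "NOUN", "نظيف": "ADJ", "جدا": "ADV", "في": "PREP"}
    return [(tok, toy_tags.get(tok, "NOUN")) for tok in tokens]

def pos_filter(text):
    """Keep only nouns, adjectives, and adverbs as sentiment-bearing tokens for vectorization."""
    return " ".join(tok for tok, tag in pos_tag(text.split()) if tag in KEEP_TAGS)

print(pos_filter("الفندق نظيف جدا في الوسط"))   # prepositions dropped, content words kept
```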

TF-IDF and binary term occurrence (BTO) were widely used as weighting schemes to create the word vectors [13, 60, 71, 73]. Their performance was comparable and depended on the word vector model generated and on the supervised ML algorithm used.
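The two weighting schemes differ only in how a term's presence is scored: BTO records a 1 when the term occurs at least once, whereas TF-IDF scales the term frequency by the inverse document frequency. A minimal sketch, with illustrative documents, is as follows.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["خدمة ممتازة ممتازة", "خدمة سيئة"]   # toy documents (assumption)

bto = CountVectorizer(binary=True)            # binary term occurrence: 1 if the term appears, else 0
tfidf = TfidfVectorizer()                     # term frequency scaled by inverse document frequency

print(bto.fit_transform(docs).toarray())
print(tfidf.fit_transform(docs).toarray().round(2))
```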

Moreover, feature representation that includes semantic representation is still a challenging task in NLP. Capturing word semantics is possible with distributional semantic models, which mainly involve word embeddings. Word embedding is an alternative to such hand-crafted features in ASA, and several recent studies have exploited this technique [14, 33, 91, 134].

It has been observed that using word embeddings with DL models helped improve the results over linear models, such as SVM, as this combination is suitable for large datasets and can be computationally efficient [33, 90].
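A minimal word-embedding sketch with gensim is given below; the toy tokenized corpus and the averaging of word vectors into a document vector are illustrative assumptions, while the sg flag switching between skip-gram and CBOW is the standard gensim parameter.

```python
import numpy as np
from gensim.models import Word2Vec

sentences = [["الخدمة", "ممتازة"], ["الخدمة", "سيئة"], ["فيلم", "جميل"]]   # toy tokenized corpus (assumption)

# sg=1 -> skip-gram, sg=0 -> CBOW; real studies train on millions of tokens
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

def doc_vector(tokens):
    """Average the vectors of known tokens into one dense document feature vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

features = doc_vector(["الخدمة", "جميلة"])   # dense vector fed to an SVM or a neural network
print(features.shape)                        # (50,)
```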

Many methods have been proposed to deal with the Arabic sentiment classification problem. However, the accuracy of these methods varies with dimensionality, dataset size, and features. Employing DL models in ASA can contribute greatly to solving problems encountered by other common methods, such as SVM and NB.

Several studies have employed SVM and NB together to investigate the Arabic sentiment classification problem, and in many of them there is strong competition between the two for the higher accuracy. In 22 studies, NB outperformed SVM [52, 62, 83, 106, 118]. NB can classify sentiments using a small training set; it uses statistics to compute classification probabilities and is very effective in classifying documents. Basically, it analyzes the absence and presence of specific features, treating features independently, and is highly effective when dealing with words that are likely to carry sentiment, such as adverbs or adjectives [126].

In contrast, SVM has been successfully used for general classification as well as regression and has proven its effectiveness in Arabic sentiment classification. It can model several sources of data, often attains the highest accuracy, and handles high-dimensional data flexibly. Moreover, to avoid incorrect classification, it maximizes the margin between classes. Accordingly, SVM outperformed NB in 29 studies [13, 27, 36, 59, 71, 102].

In [123], several experiments were conducted using SVM and NB on different feature set sizes to examine the performance of frequently used feature selection techniques. The accuracy of SVM and NB decreased as the feature set size increased; consequently, SVM was superior for relatively small datasets and feature sets with fewer outliers.
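Experiments of this kind can be sketched as a loop over the number of retained features; the toy corpus and the chi-square selector are assumptions (information gain could be approximated with mutual_info_classif instead), and real studies varied the feature size on the datasets of Table 4.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["خدمة ممتازة", "منتج رائع", "تجربة سيئة", "خدمة رديئة",
         "فيلم جميل", "فيلم ممل"]                          # toy corpus (assumption)
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

for k in (2, 4, "all"):                                     # vary the retained feature set size
    for name, clf in [("SVM", LinearSVC()), ("NB", MultinomialNB())]:
        pipe = make_pipeline(TfidfVectorizer(), SelectKBest(chi2, k=k), clf)
        acc = cross_val_score(pipe, texts, labels, cv=2).mean()
        print(k, name, round(acc, 2))
```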

Deep neural networks have been successfully adopted to extract features, which gives them a major advantage over other ML methods: deep contextual features about words are extracted in a lower-dimensional space, learning continuous text representations from data without requiring any feature engineering. Furthermore, DL models are the most suitable for very large datasets, large numbers of features, and complex classification tasks. Consequently, DL is a promising way to address Arabic sentiment classification.

Recently, many research studies have exploited DL in SA, as depicted in Figure 12, according to a survey that we conducted. The line graph shows a steady increase over a six-year period: the number of articles implementing DL in SA rose slightly at first and climbed steeply in the last two years. Nevertheless, implementing DL in ASA has received little effort.

It is noticeable that, out of these 209 articles, only six Arabic articles utilized DL.

Alayba et al. [112] integrated convolutional neural network (CNN) and long short-term memory (LSTM) methods to investigate their benefits for ASA; as a result, the obtained accuracy for ASA was improved on several datasets. In addition, a DL method for ASA was presented in [90]: the authors investigated several combinations of skip-gram and CBOW word embeddings with CNN and LSTM models evaluated on two publicly available datasets, and the combined LSTMs produced the highest results in terms of accuracy and other performance measures. Al-Azani et al. [135] used LSTM and its simplified variant, the gated recurrent unit (GRU), to detect the sentiment polarity of Arabic microblogs; comparing DL with baseline traditional machine learning methods, they showed that models based on LSTM and GRU outperformed the other classifiers.
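A minimal Keras sketch of this kind of CNN + LSTM combination is given below; the vocabulary size, sequence length, and layer sizes are illustrative assumptions and do not reproduce the architectures of the cited studies.

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, EMB_DIM = 20000, 100, 128   # illustrative hyperparameters (assumptions)

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),                         # padded sequences of word indices
    layers.Embedding(VOCAB_SIZE, EMB_DIM),                  # learned word embeddings
    layers.Conv1D(64, kernel_size=3, activation="relu"),    # CNN captures local n-gram-like patterns
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64),                                        # LSTM models longer-range word order
    layers.Dense(1, activation="sigmoid"),                  # positive vs. negative polarity
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, ...) would then be called on integer-encoded, padded reviews
```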

In general, using a combination of DL models for ASA is a promising alternative to traditional machine learning techniques and helps to increase accuracy. The main idea of DL techniques is to use deep neural network algorithms to learn complex features from large raw data without relying on prior knowledge of predictors: these algorithms automatically learn new complex features instead of being fed manually created ones. To perform well, DL approaches need large amounts of data. Thus, the two main factors affecting the performance of DL techniques are automatic feature extraction and the availability of resources, and both are crucial when comparing DL techniques with traditional machine learning techniques.

4.1. Implications

Based on the reviewed work, several trends can be noticed in the ASA area. It is clear that the reviewed literature covered ASA from the viewpoint of classification methods and of building resources related to specific domains. It would be worthwhile to apply SA, with the latest methods, in the many domains not yet targeted.

However, many issues are still not sufficiently discussed or solved in ASA. These issues include shortcomings and gaps in the reviewed work, from which implications can be drawn. These implications involve two perspectives: implications for future research and implications for practice.

4.1.1. Implications for Future Research

As this SLR focused on the contributions of the existing literature relevant to ASA, the main implications for future research are discussed below:
(i) Applying deep learning techniques to classify Arabic sentiments has been attempted by some studies; however, it has not been applied in many domains, such as education.
(ii) A comprehensive paradigm that expresses all the details of the preprocessing process in various situations has to be developed, in order to determine the processes that best fit the characteristics of the Arabic language.
(iii) Building an Arabic lexicon is an open field. Researchers have built many lexicons that are either of limited size or not publicly available. Thus, domain-oriented lexicons should be built, since there are very few freely accessible Arabic corpora and lexicons for SA.
(iv) It is clear from Table 5 that many researchers claimed to have achieved high sentiment classification accuracy while applying their classifiers to nonstandardized datasets. Such results can be unreliable and cannot be generalized unless standardized benchmark datasets are available in different domains.
(v) Most current feature representations for Arabic are borrowed from other languages, such as English. Developing new feature representations suited to the characteristics of the Arabic language would therefore help improve classification results.

4.1.2. Implications for Practice

ASA still needs applicable systems. These systems should consider the following:
(i) Incomplete solutions have been introduced to classify sentiments or opinions and to predict event outcomes. Thus, there is a need to develop recommendation systems in many fields, for instance, economy, business intelligence, politics, sports, and education.
(ii) An enhanced framework for ASA in different domains would contribute broadly to improving the performance of several industries. This would enhance the public image of an organization by improving its services and products and, therefore, customer satisfaction and revenue.

5. Conclusion

In this SLR, the research articles on ASA were systematically reviewed, and their contributions were analyzed with respect to specific research questions, providing a systematic overview of existing research in ASA. After filtering, 108 published studies, in 11 journals and 22 conference proceedings, were analyzed.

ASA has become an important topic in terms of the preprocessing process, feature selection, and classification methods. The state of the art of ASA shows varied and widespread work from different viewpoints. This SLR highlights the most frequent preprocessing strategies and the methods most used in feature selection. Furthermore, it presents a taxonomy of sentiment classification methods, constructed to answer the research question: What are the most effective techniques used in ASA?

Through this SLR, it is obvious that ASA still needs more research. The review contributes implications for both future research and practice. It shows that there is limited research on building standardized datasets and applying promising classification methods, and it also reveals a lack of research on developing new feature representations that suit the characteristics of the Arabic language. Furthermore, avenues for future research exist in developing recommendation systems in many fields and an enhanced framework for ASA in different domains. Researchers are encouraged to join this active research area.

Conflicts of Interest

The authors declare that they have no conflicts of interest.