Abstract

Context. Social media platforms such as Facebook and Twitter carry a large volume of people’s opinions about politics and political leaders, which makes them a rich source of information for researchers to exploit for different tasks, including election prediction. Objective. Identify, categorize, and present a comprehensive overview of the approaches, techniques, and tools used for election prediction on Twitter. Method. We conducted a systematic mapping study (SMS) on election prediction on Twitter and gathered empirical evidence from the work published between January 2010 and January 2021. Results. This research identified 787 studies related to election prediction on Twitter, of which 98 primary studies were selected after defining and applying several inclusion/exclusion criteria. The results show that most of the studies implemented sentiment analysis (SA), followed by volume-based and social network analysis (SNA) approaches. The majority of the studies employed supervised learning techniques, followed by lexicon-based SA, volume-based techniques, and unsupervised learning. In addition, 18 types of dictionaries were identified. Elections of 28 countries were analyzed, mainly USA (28%) and Indian (25%) elections. Furthermore, the results revealed that 50% of the primary studies used English tweets. The demographic data showed that academic organizations and conference venues are the most active. Conclusion. The evolution of the work published over the past 11 years shows that most studies employed SA, whereas SNA techniques were implemented far less often. Appropriate labelled political datasets are scarce, especially in languages other than English, and deep learning needs to be employed in this domain to obtain better predictions.

1. Introduction

The relation between social media platforms, the new way of linking the parts of the world, and politics is no secret. This relation has attracted researchers seeking to exploit the abundance of useful information of this era to perform different tasks such as information extraction and sentiment analysis, among others. One of the platforms most widely used by researchers is Twitter. Apart from dictionary-based and statistical approaches, machine learning has been effectively applied in several other domains for different purposes, for instance, [1–3]. Machine learning has improved prediction tasks in terms of accuracy and precision.

As of October 2020, Twitter had over 300 million users worldwide, 91% of whom are over the age of 18. The platform attracts many politicians and enables them to interact with voters and use it as a tool in their campaigns [4]. Because it offers an API that allows extracting public tweets as well as users’ public information and interconnections, Twitter is considered a treasure trove for researchers aiming at election prediction.

Many researchers have analyzed and predicted different countries’ elections on different social media platforms such as Facebook and Twitter [4–8]. Few studies have surveyed this topic [9–11]. To the best of our knowledge, no study has ever reported a systematic mapping study (SMS) or systematic literature review (SLR) about election prediction on Twitter. This research systematically identifies, gathers, and provides the available empirical evidence in this area.

This research study assists in providing a comprehensive overview of, and more in-depth knowledge about, election prediction on Twitter, thus helping to

(i) identify research gaps (research opportunities)
(ii) aid researchers (decision-making) when selecting approaches or tools.

The main contributions of this research work are as follows:

(1) Identify and classify the main approaches (RQ1) used to predict elections, along with their techniques (RQ1(a)) and tools (RQ1(b), (c))
(2) Identify the research works that have reported manual/automatic labelling of political data (RQ2)
(3) Identify and list the countries whose elections are analyzed (RQ3)
(4) Identify and list the tweet languages used for predicting elections on Twitter (RQ4)
(5) Identify the main topics discussed in the studies using machine learning techniques (RQ5)
(6) Identify demographic data in the field of election prediction on Twitter, such as the most frequent publication venues and the most active countries, organizations, and researchers (DQs)
(7) Provide a centralized source for researchers and practitioners by gathering dispersed pieces of evidence (studies)

The remainder of this paper is organized as follows: Section 2 provides an overview of the most related work, and Section 3 presents the detailed methodology, followed by the results and discussion in Section 4. Section 5 deals with validity threats, followed by the conclusion and future work discussed in Section 6.

2. Related Work

This section presents the work most related to this SMS on election prediction on Twitter.

Chauhan et al. [9] in 2020 surveyed election prediction on online platforms such as Twitter and Facebook. Their study presents an in-depth analysis and evaluation of the SA techniques used in election prediction. They overviewed nearly 48 studies, including 10 studies that tried to infer users’ political stance.

In May 2019, Bilal et al. [10] presented a short overview of election prediction on Facebook and Twitter covering 13 studies. Their study mainly categorized the studies into two approaches, sentiment analysis and others, and additionally divided them into two categories: “can predict elections” and “cannot predict elections.”

Singh and Sawhney [11] conducted a review of 16 papers in December 2017 related to forecasting elections on Twitter. They listed the countries whose elections were analyzed and provided statistics on the tweets used in the selected studies. Furthermore, they presented the methods used for prediction and classified the studies into those that successfully and those that unsuccessfully predicted elections.

All these studies presented short reviews except for [9]. Moreover, all the aforementioned studies performed ad hoc literature surveys, and none of them followed a detailed systematic protocol. This study is the first systematic mapping study that focuses mainly on election prediction on Twitter and thoroughly overviews and analyzes the 98 selected primary studies.

3. Methodology

A systematic mapping study (SMS) is an effective way of getting knowledge about the state-of-the-art of a research field. This study conducts an SMS of election prediction on Twitter. Figure 1 shows the detailed flow of this SMS.

3.1. Approaches for Predicting Election on Twitter

Various approaches can be employed to predict elections on Twitter. Researchers and practitioners mainly use three: sentiment analysis (SA), volume-based (Vol.), and social network analysis (SNA). Figure 2 shows a generalized framework of election prediction on Twitter. A Twitter API is used to collect tweets about the election (candidates, the election itself, political parties, and trends). The tweets are then preprocessed (cleaned and filtered) according to the task’s needs, for example, removing unnecessary characters and whitespace, stemming, and so on for sentiment analysis. Afterwards, an approach or technique is employed to perform the election prediction task.
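To illustrate the preprocessing step, the following is a minimal sketch in Python (the regular expressions, the use of NLTK, and the example tweet are our own illustrative assumptions, not taken from any primary study):

import re
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords  # requires a one-time nltk.download("stopwords")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(tweet):
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", " ", tweet)  # remove URLs
    tweet = re.sub(r"[@#]\w+", " ", tweet)       # remove mentions and hashtags
    tweet = re.sub(r"[^a-z\s]", " ", tweet)      # keep letters only
    tokens = [t for t in tweet.split() if t not in stop_words]
    return [stemmer.stem(t) for t in tokens]     # stem the remaining tokens

print(preprocess("RT @candidateA: We WILL win the #election2020! https://t.co/xyz"))
# -> ['rt', 'win']

Such a cleaned token list can then be fed to any of the approaches described in this section.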

3.2. Aim and Research Questions

This study aims to identify and categorize the methods used for predicting elections on the Twitter platform. Given its broadness, this aim is divided into a set of research questions (RQs), as follows:

RQ1: what are the approaches used in predicting elections on Twitter?
RQ1(a): what are the techniques used for election prediction on Twitter?
RQ1(b): which tools are utilized for election predictions?
RQ1(c): which techniques/tools are employed for tweet collection?
RQ2: which studies reported manually/automatically annotated data?
RQ3: which countries are reported for election prediction on Twitter?
RQ4: what are the languages of tweets used for predicting elections on Twitter?
RQ5: what are the most frequent topics discussed?

We also gathered and investigated some interesting information by defining and answering demographic questions (DQs) about the most active countries, organizations, and authors. This information helps practitioners, researchers, and organizations in certain ways [12–15]. The set of DQs is as follows:

DQ1: who are the most active researchers in the field of election prediction on Twitter?
DQ2: which are the most active organizations?
DQ3: which are the most active publication venues?

Table 1 gives a short description of research questions (RQs) and demographic questions (DQs).

3.3. Search Strategy

Two essential operations must be completed before executing the search in the digital libraries: (a) specifying the search keywords and (b) specifying the digital libraries. The search keywords compose the search strings executed in the digital libraries. In the former operation, the keywords were identified after analyzing the research field to which this study applies, “Election Prediction on Twitter.” Table 2 shows the whole set of keywords selected for this study. In the latter operation, we selected a list of digital libraries on which to execute the search strings. Five digital libraries were selected to carry out this research: IEEE Xplore, Web of Science (WoS), Scopus, ACM, and ScienceDirect. The keywords were combined with Boolean operators to create the final search queries.
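For illustration, a query built from such keywords could take the following shape (a hypothetical example in Scopus TITLE-ABS-KEY syntax; the actual queries executed are listed in Table 3):

TITLE-ABS-KEY(("election" OR "electoral")
              AND ("predict*" OR "forecast*")
              AND ("Twitter" OR "tweet*"))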

We executed the search queries at the level of the title, abstract, and keywords of the articles. Some digital libraries do not provide search at this level; in such cases, the search was performed on the entire text. Table 3 shows the list of digital libraries and the search queries that were executed to obtain potential primary papers. We performed the search in three different periods (phases), as follows:

I. E1: searching and selection of papers from January 1, 2010, to January 14, 2020
II. E2: searching and selection of papers from January 15, 2020, to January 7, 2021
III. E3: searching and selection of papers from January 1, 2010, to January 7, 2021

The logic behind the three extraction phases is that we started this research before the second phase, and the work was delayed due to Covid-19. It can be noticed that E2 was not performed on Scopus because, in mid-2020, our ability to search Scopus was discontinued. We used the ScienceDirect library as an alternative to Scopus.

Almost every digital library allows users to export the search results in some format that includes the title of the paper, metadata (venue, year of publication, authors’ names, authors’ affiliations, and much more), the abstract (which some digital libraries do not provide), and keywords. After executing the first search, we obtained 787 potential papers.

3.4. Selection of Study and Quality Assessment

The process of selecting relevant papers mainly includes two tasks: (1) defining the criteria for including/excluding papers and (2) applying the defined criteria to choose the relevant papers [16–18].

The following inclusion criteria were applied to the abstract of each paper:

IC1: the study is related to election prediction (or forecasting) on Twitter
IC2: research published in the field of “Computer Science”
IC3: research published online between January 2010 and January 2021
IC4: the abstract of the study must fit the topic

The following criteria were applied to exclude papers:

EC1: research papers written in languages other than English
EC2: papers that are not accessible in full-text
EC3: research published in non-peer-reviewed venues
EC4: grey literature and books
EC5: short papers (less than four pages)
EC6: duplicate papers (only the most recent and detailed one was selected)
EC7: studies that present summaries of editorials/conferences

A top-down approach was followed to fulfill the quality criteria for the selection of relevant papers. Initially, papers were excluded based on their metadata, such as title, abstract, and keywords. Furthermore, studies were excluded after reading the entire paper if they were not in the scope of the topic “Election Prediction on Twitter” or were of low quality, for example, if the paper’s methodology did not satisfy the reviewing author.

All the papers were distributed equally among the authors, who selected the relevant papers by applying the inclusion and exclusion criteria. The authors held a meeting to ensure that no relevant paper was excluded and no irrelevant paper was included. To deal with disagreements, the authors applied the criteria defined in [16, 17]; the details are given in Table 4. A paper is excluded if it falls into category “F” (exclude) or category “E” (considered doubtful).

Figure 3 shows the full flow of the search in the five digital libraries and of the selection process using the inclusion/exclusion criteria. Table 5 lists the 98 primary studies selected for this SMS with their bibliographic references.

3.5. Data Extraction

Data extraction is the process of extracting relevant information from the selected primary papers according to the defined research and demographic questions. We first agreed upon a Data Extraction Form (DEF) after a thorough review and then started the actual extraction from the papers. A DEF provides a reliable and precise approach to extracting data in systematic mapping studies [16, 19]. We inspected and thoroughly read the full text of nearly all the papers.

4. Results and Discussion

In this section, we briefly discuss the results of this SMS. A summary of the most notable results for each research and demographic question is discussed separately. Figure 4(a) shows the number of studies published in different venues (conference or journal). Figure 4(b) shows the distribution of studies across the years. It is noteworthy that the topic of “Election Prediction on Twitter” has been attracting researchers’ attention over the last decade.

4.1. RQ1: What Are the Approaches Used in Predicting Elections on Twitter?

Figure 5 shows the number of studies that use different approaches for election prediction on Twitter: sentiment analysis (SA), sentiment analysis (orientation), volumetric (Vol.), social network analysis (SNA); topic modelling using LDA (in this study, the algorithm name LDA is used instead of topic modelling in the approaches); and a combination of these approaches such as SA & Vol.; SA, Vol., & SNA; and SA (orientation), SNA, & LDA.

In this SMS, we treat SA and SA (orientation) separately to give researchers rapid access to the specific studies. The SA approach includes studies that used polarity detection (positive, negative, and neutral), emotion detection (tense, angry, sad, happy, relaxed, exhausted, calm, excited, and nervous), or both. SA (orientation) studies the political orientation of voters by analyzing tweets that show voting behaviour explicitly, such as “I will vote for candidate A” and “I will not vote for candidate A.”

We define the following notation, used in the rest of the paper:

i: an approach used alone in a paper
j: an approach used along with other approaches in a paper

Figure 6 presents the approaches along with the corresponding primary studies. Figure 5 shows that 64 studies used the sentiment analysis approach alone (SAi), nearly 65% of all the primary papers in this study. Only 3 papers used SA (orientation).

It is interesting to note that only 9 papers (almost 9%) employed the volume-based approach alone. A hybrid “SA and Vol.” approach was used by 16% of the selected studies. One study used SNA alone, and 3 papers combined the SA and SNA approaches, which together make almost 5% of the studies. Only two studies used LDA along with other approaches; S-17 used LDA for topic modelling and categorized the resulting topics into positive and negative.

It is worth noting that most of the studies (89%) applied an SA approach (SAi + SAj), followed by the volume-based approach with 26% of the studies (Voli + Volj). Very few studies employed a social network analysis approach. Opinion mining gives a better understanding of a political user’s behaviour: what a user expresses in words is more informative than their communication connections. For example, 100 citizens commenting negatively on a political leader’s post would count in the leader’s favour under a volumetric or SNA approach, even though the content is certainly against the leader. This is why many researchers tend to use the SA approach.

4.1.1. RQ1(a): What Are the Techniques Used for Election Prediction on Twitter?

The approaches (RQ1) are analyzed further in-depth by answering RQ1(a), (b), and (c); for example, a supervised technique (SVM or NB) may be applied in the SA approach to classify tweets into positive, negative, or neutral. In this SMS, the techniques are classified into supervised (S); unsupervised (US); deep learning (DL); lexicon-based approaches (LAs); count (C); library (a tool such as TextBlob); and combinations of these techniques such as S & US; S, US, & LA; US & LA; S & LA; S & DL; LA, C, & SNA; S & C; and S, US, & C.
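To illustrate the supervised (S) setting, the following is a minimal sketch in Python with scikit-learn (the toy labelled tweets and the choice of a linear SVM are our own illustrative assumptions, not drawn from any primary study):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy annotated corpus; in practice, thousands of labelled political tweets are used.
tweets = ["great rally for candidate A", "candidate A ruined the economy",
          "polling stations open at 8 am", "proud to support candidate A"]
labels = ["positive", "negative", "neutral", "positive"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())  # TF-IDF features + SVM classifier
clf.fit(tweets, labels)
print(clf.predict(["I love what candidate A stands for"]))  # e.g., ['positive']

The same pipeline works with MultinomialNB in place of LinearSVC for an NB classifier.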

Figure 7 shows the number of studies reporting these techniques. Many studies employed supervised (S) learning techniques alone: 34 studies (Si), almost 35% of the selected studies. Looking more closely, some studies used other techniques along with supervised learning, such as S-41, S-51, and S-92. In total, 51 studies used supervised learning (Si + Sj), which makes it the most used technique (52%) in this SMS.

Several studies used LAs for sentiment analysis, especially for tweets in languages other than English. 25 studies employed LAi, and a few papers reported LAj, making it (LAi + LAj) 39% of the selected studies in this SMS. 18% of the selected studies used count techniques (Ci + Cj). Few papers employed US techniques, 9% in total (USi + USj), and only 5% of the selected studies used deep learning (DLi + DLj) techniques. Some studies used a tool/library for sentiment analysis, such as S-77, which used TextBlob without mentioning any algorithm. Figure 8 shows the techniques along with the corresponding studies.
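To illustrate the library-based option, a tool such as TextBlob scores a tweet’s polarity in a single call (a minimal sketch; the example tweet and thresholds are our own assumptions):

from textblob import TextBlob

tweet = "I am proud to vote for candidate A"
polarity = TextBlob(tweet).sentiment.polarity  # value in [-1.0, 1.0]
label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
print(polarity, label)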

4.1.2. RQ1(b): Which Tools Are Utilized?

This section gives an overview of the tools, libraries, and dictionaries (TLD) used to assist election prediction on Twitter. The list of TLD, together with the corresponding primary studies, is given in Table 6. NLTK is used the most. Some tools provide a graphical user interface (GUI), such as WEKA, RapidMiner, and Gephi; nearly 13% of the studies used such GUI tools. Almost 18 types of dictionaries are employed in the primary studies. Only one study reported Hadoop. The rest of the details can be seen in Table 6.

4.1.3. RQ1(c): Which Techniques/Tools Are Employed for Tweet Collection?

Data can be collected from Twitter either through an API or by crawling. Twitter provides two types of APIs: REST and Streaming. A few of the selected studies did not explicitly report any technique for collecting Twitter data, such as S-22, S-28, S-31, S-35, and S-95. Some of the studies reported “Twitter API” only. S-57 used a dataset from Data World [66]. Figure 9 shows the number of studies that used the different techniques and tools for collecting tweets. In this SMS, we use the technique and tool names as reported in the primary studies; for example, Tweepy and twitter4j access the Streaming API but are counted separately from “Twitter Streaming API.”
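To illustrate API-based collection, tweets can be gathered with Tweepy roughly as follows (a minimal sketch using Tweepy 4.x naming, placeholder credentials, and a hypothetical query):

import tweepy

# Placeholder credentials; real ones come from the Twitter developer portal.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Search recent tweets about a (hypothetical) candidate or election hashtag.
query = '"candidate A" OR #election2020 -filter:retweets'
for tweet in tweepy.Cursor(api.search_tweets, q=query, lang="en",
                           tweet_mode="extended").items(100):
    print(tweet.full_text)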

4.2. RQ2: Which Studies Reported Manually/Automatically Annotated Data?

An annotated (or labelled) corpus assists in training supervised and semisupervised techniques [67]. Large and unambiguous annotated data can lead to better predictions by improving an algorithm’s results. Data can be annotated manually, automatically, or both [68]. Few annotated political datasets are available, and languages other than English particularly lack such datasets.

This RQ aims to identify and list the studies that used manual or automatic data labelling. Some studies worked in languages other than English; for example, S-48 annotated tweets in Bulgarian. A few studies employed automatic data labelling techniques; for example, S-79 used deep neural networks to label the data. Figure 10 lists the studies that used manual or automatic labelling of political data.
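To illustrate automatic labelling, a lexicon scorer can bootstrap labels that later train a supervised model (a minimal sketch using NLTK’s VADER; the labelling scheme is our own illustrative assumption, using the conventional ±0.05 compound-score thresholds):

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

def auto_label(tweet):
    # Compound score lies in [-1, 1]; thresholds follow the usual VADER convention.
    score = sia.polarity_scores(tweet)["compound"]
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"

print(auto_label("What a disastrous speech by candidate A"))  # e.g., 'negative'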

4.3. RQ3: Which Countries Are Reported for Election Prediction on Twitter?

This RQ aims to identify and list the countries whose elections were analyzed in the primary studies. Figure 11 shows the list of 28 countries and the number of studies that analyzed each country’s elections. It can be seen that 27 studies analyzed USA elections and 24 studies analyzed Indian elections (both country-level and regional). Elections of Indonesia, the Netherlands, and Spain are each reported in 7 studies, followed by Pakistan in 5 and the UK in 4; the rest can be observed in Figure 11.

4.4. RQ4: What Are the Languages of Tweets Used for Predicting Elections on Twitter?

The objective of this RQ is to classify and list the tweet languages used in the primary studies. Tweet languages used are Bulgarian, Chinese (candidates’ names) (CNN), Dutch, English, English translated from Spanish (S2E), English translated from Urdu (U2E), English translated from German (G2E), English translated from others (O2E), Greek, Hindi, Indonesian, Italian, Persian, Portuguese, Spanish, Swedish, Turkish, Multilanguage (English and Spanish) (MLES), Assume Multilanguage (English and Roman Urdu) (AMLEU), Assumption (English) (AE), Assumption (Spanish) (AS), and Not Mentioned (NM).

Roughly 45% of the primary studies used English tweets, followed by 7% that analyzed tweets in Indonesian and 7% in Spanish. Figure 12 presents the list of languages and the number of studies that investigated them. Some studies translated tweets from other languages into English for further investigation because those languages lack resources (annotated data and dictionaries); S-20, S-41, S-61, and S-76 are examples. S-17 used Chinese candidates’ names for tweet collection and used the volumetric approach for predicting the election. Almost 16% of the studies did not report any language; most of these used a volumetric approach.

4.5. RQ5: What Are the Most Frequent Topics Discussed?

The goal of this question is to extract information from the selected studies automatically. Such an approach can help researchers gain insight into the topics discussed. We divided the implementation and representation into two parts: (1) topic modelling (correlation) and (2) word clouds. LDA [69] is an example of topic modelling. We applied the topic modelling technique at two levels of the primary studies:

1. Abstract level
2. Full-text level

We further generated word clouds from the selected papers at the following levels:

1. Titles
2. Author keywords
3. Abstracts
4. Full-text

We converted all the papers from PDF to text. For topic modelling, the extracted data were preprocessed: all text was converted to lower case, stemming and lemmatization were applied, and English stop words were removed. Furthermore, sections such as “Acknowledgement” and “References” were excluded when performing topic modelling at the full-text level. For the word clouds, all the text at the different levels (title, keywords, abstract, and full-text) was tokenized into single words, and unnecessary words were removed using English stop words. Next, the word frequencies were computed and a word cloud was generated for each level.
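To illustrate this pipeline, the following is a minimal sketch in Python with scikit-learn and the wordcloud package (the toy corpus and variable names are our own assumptions; only the number of topics, 25, follows the study):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from wordcloud import WordCloud, STOPWORDS

docs = ["twitter election sentiment analysis prediction polarity",
        "social media volume tweets election forecast outcome",
        "network analysis political candidates twitter graph"] * 10  # toy stand-in corpus

# Topic modelling: bag-of-words counts, then LDA (the study used 25 topics;
# 5 suffices for this toy corpus).
vectorizer = CountVectorizer(stop_words="english", lowercase=True)
counts = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(counts)
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:3]]
    print(f"Topic {k}: {' '.join(top)}")

# Word cloud over all tokens at one level (e.g., abstracts).
WordCloud(stopwords=STOPWORDS, background_color="white") \
    .generate(" ".join(docs)).to_file("abstract_wordcloud.png")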

Figure 13 shows the 25 topics generated at the abstract level and illustrates the correlations between them. Blue circles represent correlated topics, while red indicates anticorrelation (inverse correlation). It reveals interesting findings; for example, “sentiment analysis polarity” is highly correlated with “presidential predict win.” Another topic, “social media popularity,” is highly correlated with “presidential predict win,” “outcome account expects,” and “election poll outcome.” The remaining correlations and inverse correlations can be explored in Figure 13.

Figure 14 represents the correlation between the 25 topics generated from the full text of the selected primary papers. It is interesting to note that nearly all the topics are anticorrelated.

4.5.1. Word Cloud

A word cloud represents words visually: the most frequent words appear most prominently. Figure 15(a) shows the word cloud for the titles of the selected papers. The words “Election, Elections, Analysis, Twitter, Sentiment, and Presidential” are prominent. This indicates that most studies employed sentiment analysis for predicting elections and that most of them analyzed presidential elections.

Figure 15(b) shows the word cloud generated from the author keywords of the selected studies. The words “Election, Sentiment, Twitter, Prediction, Social, Mining, Media, Machine, and Learning” are prominent. These findings show that the majority of studies implemented sentiment analysis for predicting elections on Twitter. Furthermore, it is worth noticing that many studies employed machine learning techniques.

Figure 16(a) depicts similar words in the word cloud of abstracts as in Figures 15(a) and 15(b). Some high-frequency words are “Twitter, Election, Social, Media, Sentiment, Analysis, Political, Predict, and Opinion.” Figure 16(b) illustrates almost the same themes from the full text as discussed for the other word clouds. Some of the frequent words are “Twitter, Election, Prediction, Social, Media, Users, Presidential, Opinion, and India.” This shows that most of the studies applied sentiment analysis to predict elections on Twitter and that several studies analyzed presidential and Indian elections.

By comparing the findings from the word clouds with the outcomes of RQ1, it is noteworthy that the results are nearly the same. As discussed in Section 4.1, approximately 89% of the studies applied sentiment analysis (SAi + SAj). RQ1(a) shows that machine learning techniques are employed the most. Furthermore, RQ3 shows that the majority of the studies analyzed USA and Indian elections. The outcomes of the word clouds reflect almost the same information.

4.6. DQ1: Who Are the Most Active Researchers in the Field of Election Prediction on Twitter?

A total of 284 researchers contributed and appeared as authors in the 98 selected primary studies. We selected the researchers who appeared in two or more of the selected studies. Figure 17 shows the most active researchers along with the studies they contributed to.

Almost 100% of the active researchers are affiliated with academic organizations. These data identify some research groups in which researchers collaborated, such as Brian Heredia, Joseph D. Prusa, and Taghi M. Khoshgoftaar. They also show that the researcher Malhar Anjaria has not been active since 2014 and that the research group of Rincy Jose and Varghese S Chooralil has not been active since 2016. This finding also tells us that more academic and industrial collaboration is needed.

4.7. DQ2: Which Are the Most Active Organizations?

This DQ aims to identify and list the most active organizations that appeared in the selected studies. A total of 158 organization names were listed, of which 13 organizations contributed to more than one study. The list of organizations and their support level (contribution) is given in Table 7.

In this SMS, we divided the organizations into two categories: industry and academia (universities, research institutes, and government research organizations). It is striking that academia is much more active than industry: only 7 industrial organizations appeared in the selected studies. In S-82, one researcher, Nathaniel Poor, was not affiliated with any organization. More industrial and academic collaboration is needed to improve this domain. Figure 18 shows the distribution of organizations.

4.8. DQ3: Which Are the Most Active Publication Venues?

This DQ aims to identify and list the most active publication venues in the selected studies. Table 8 shows the venue names along with their support level (>1). The most active conference venue is “Lecture Notes in Computer Science,” with a support level of 5, followed by the “Communications in Computer and Information Science” series. Only two journals, “PLOS ONE” and “Social Network Analysis and Mining,” have a support level of 2. Research is mostly published at conference venues; the trend should move toward publishing in more prestigious peer-reviewed journals.

5. Validity Threats

We followed several protocols to avoid or mitigate the validity threats (VTs) in this study. These VTs are as follows:

1. Descriptive validity
2. Interpretive validity
3. Theoretical validity
4. Generalizability
5. Reliability

Each of these VTs is discussed separately in the subsequent sections.

5.1. Descriptive Validity

Descriptive validity (DV) deals with the accuracy and objectivity of the extracted information; it ensures that no important information is skipped or ignored during the extraction process. To deal with DV, we arranged regular sessions to discuss and build agreement on the extraction process, such as what information needed to be collected and stored. We agreed upon and designed the Data Extraction Form (DEF) collectively. To avoid bias and ensure traceability, every entry in the DEF has a comment linking each extracted value to the researcher who assigned it.

5.2. Interpretive Validity

Interpretive validity (IV) deals with the validity of the conclusions drawn from the extracted information and ensures that the information extracted by a researcher is unbiased. To minimize IV, we applied the following mechanisms. Initially, we arranged regular meetings to ensure that all the researchers agreed on the same interpretation of the results (extracted information), on a set of protocols, and on their execution. Next, the researchers, excluding the first author, were divided into two distinct groups that drew the interpretations of the results. The first author compared the drawn conclusions, matched them, and standardized the writing style. Finally, all the authors verified the interpretations and their traceability to the underlying results in the DEF.

5.3. Theoretical Validity

Theoretical validity (TV) is a vital type of threat because various inaccuracies can arise while selecting relevant papers, such as researcher bias during paper extraction, inadequacy of the search and selection process (selecting irrelevant papers, excluding relevant papers, or both), and poor quality of the selected papers, all of which can lead to flawed conclusions.

To minimize this threat, we followed the protocols discussed in Sections 3.3 and 3.4 to search the papers in the five databases and select the relevant ones.

5.4. Generalizability

To reduce this threat, we relied upon the impartiality of the data extraction process, the DEF, and the set of rules guiding the investigation and leading to the interpretations. Nevertheless, we assume that the 98 selected primary studies achieve generalization with low risk [70].

5.5. Reliability

To increase this SMS’s reliability, we comprehensively reported the complete process, from the start of the protocol through to the conclusion. Finally, we described the rubrics used for self-appraisal, implementing the guidelines of Kitchenham and Charters [70] to minimize the threats.

6. Conclusion and Future Work

This study reports the planning, conducting, and implementation steps of an SMS on “predicting elections on Twitter.” We selected 98 studies published between January 2010 and January 2021. This study aims to identify and classify the approaches, techniques, tools, countries, and languages used in election prediction on Twitter.

We defined and implemented a search strategy to achieve our goal. Initially, we found 787 potential studies; after applying the selection (inclusion/exclusion) criteria, we chose 98 primary studies as relevant.

The extracted data lead us to the following conclusions:

RQ1: approximately 65% of the selected studies reported the sentiment analysis approach alone (SAi) and 24% reported SAj, so 89% of the selected studies implemented sentiment analysis in total (SAi + SAj), followed by the volume-based approach with 26% of the selected studies in total (Voli + Volj). 6% of the selected studies employed social network analysis techniques (SNAi + SNAj).
RQ1(a): 51 studies (52%) used supervised learning in total (Si + Sj), making it the most used technique in this SMS. The lexicon-based approach makes up 39% (LAi + LAj), 18% employed count techniques (Ci + Cj), only 9% employed unsupervised learning techniques (USi + USj), and 5% of the selected studies implemented deep learning (DLi + DLj) techniques.
RQ1(b): this SMS listed nearly all the tools used in the selected primary studies. NLTK is used most commonly. 13% of the selected studies reported GUI tools such as WEKA and RapidMiner. Almost 18 types of dictionaries are used in the primary studies.
RQ1(c): almost 12% used Tweepy, 7% employed TwitterR, 5% the Twitter REST API, 12% the Search API, and 9% the Streaming API, while 20% of the selected studies mentioned only “Twitter API.”
RQ2: 44% of the selected studies manually or automatically annotated the data.
RQ3: the elections of 28 countries were analyzed in the selected studies. 28% of the selected studies analyzed USA elections, and 25% analyzed Indian elections. Elections of Indonesia, the Netherlands, and Spain are each reported in 7% of the studies, followed by Pakistan with 5% and the UK with 4%.
RQ4: nearly 45% of the primary studies used English tweets. 7% of the selected studies analyzed tweets in Indonesian and 7% in Spanish. Approximately 5% of the selected studies translated tweets from other languages into English, bringing English to 50%.
RQ5: some popular topics are “Election, Prediction, Twitter, Sentiment, Analysis, Opinion, Mining, Presidential, USA, India, Machine, and Learning.”

The demographic data show that 76% of the selected studies are conference papers and 24% are journal papers. Predicting elections on Twitter has become more popular and has attracted more researchers over the last decade. 284 researchers contributed to the 98 selected primary papers, out of which 21 authors have a support level of more than 2. The authors who appeared in the selected studies were affiliated with 158 organizations; 13 organizations contributed to more than one study, out of which two organizations have a support level of 3. The results highlight that 149 are academic organizations and only 7 industrial affiliations appeared. Furthermore, 9 venues are the most active, of which 7 are conferences.

As future work, we recommend the following:

(i) There is a need for in-depth analysis in the field of election prediction on Twitter, covering
  (a) evaluation metrics of the techniques
  (b) details about the countries
  (c) types of elections
  (d) details about the data
  (e) election results
(ii) Empirical studies on election prediction need to be conducted
(iii) Analyze election predictions on platforms other than Twitter
(iv) Analyze and compare election predictions across fields, such as computer science and the social sciences

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research work was funded by the Beijing Municipal Natural Science Foundation (grant no. 4212026), the National Science Foundation of China (grant no. 61772075), and the National Key Research and Development Project of China (grant no. 2018YFC0832304). The authors are thankful for their financial support.