Special Issue: Complexity in Deep Neural Networks

Review Article | Open Access

Asif Khan, Huaping Zhang, Nada Boudjellal, Arshad Ahmad, Jianyun Shang, Lin Dai, Bashir Hayat, "Election Prediction on Twitter: A Systematic Mapping Study", Complexity, vol. 2021, Article ID 5565434, 27 pages, 2021. https://doi.org/10.1155/2021/5565434

Election Prediction on Twitter: A Systematic Mapping Study

Academic Editor: M. Irfan Uddin
Received 10 Feb 2021
Revised 18 Mar 2021
Accepted 25 Mar 2021
Published 08 Apr 2021

Abstract

Context. Social media platforms such as Facebook and Twitter carry a large volume of people’s opinions about politics and leaders, which makes them a good source of information for researchers working on tasks such as election prediction. Objective. To identify, categorize, and present a comprehensive overview of the approaches, techniques, and tools used in election prediction on Twitter. Method. We conducted a systematic mapping study (SMS) on election prediction on Twitter and provide empirical evidence for the work published between January 2010 and January 2021. Results. This research identified 787 studies related to election prediction on Twitter, of which 98 primary studies were selected after defining and applying several inclusion/exclusion criteria. The results show that most of the studies implemented sentiment analysis (SA), followed by volume-based and social network analysis (SNA) approaches. The majority of the studies employed supervised learning techniques, followed by lexicon-based SA, volume-based techniques, and unsupervised learning. In addition, 18 types of dictionaries were identified. Elections of 28 countries were analyzed, mainly those of the USA (28%) and India (25%). Furthermore, 50% of the primary studies used English tweets. The demographic data showed that academic organizations and conference venues are the most active. Conclusion. The evolution of the work published in the past 11 years shows that most of the studies employed SA. SNA techniques are used less often than SA. Appropriate labelled political datasets are not available, especially in languages other than English. Deep learning needs to be employed in this domain to get better predictions.

1. Introduction

The relation between politics and social media platforms, the new way of linking the parts of the world, is no secret. This relation has attracted researchers seeking to exploit the abundant information of this era for tasks such as information extraction and sentiment analysis, among others. One of the platforms most widely used by researchers is Twitter. Apart from dictionary-based and statistical approaches, machine learning has been applied effectively in several other domains for different purposes, for instance, [1–3]. Machine learning has improved prediction in terms of accuracy and precision.

As of October 2020, Twitter had over 300 million users worldwide, 91% of whom were over the age of 18. The platform attracts many politicians, who interact through it and use it as a tool in their campaigns [4]. Because it offers an API for extracting public tweets along with users’ public information and interconnections, Twitter is considered a treasure for researchers aiming at election prediction.

Many researchers have analyzed and predicted different countries’ elections on social media platforms such as Facebook and Twitter [4–8]. A few studies have surveyed this topic [9–11]. To the best of our knowledge, no study has reported a systematic mapping study (SMS) or systematic literature review (SLR) on election prediction on Twitter. This research systematically identifies, gathers, and provides the available empirical evidence in this area.

This research study provides a comprehensive overview of, and more in-depth knowledge about, election prediction on Twitter, thus helping to
(i) identify research gaps (research opportunities)
(ii) aid researchers in decision-making when selecting approaches or tools

The main contributions of this research work are as follows:
(1) Identify and classify the main approaches (RQ1) used to predict elections, their techniques (RQ1(a)), and the tools (RQ1(b), RQ1(c))
(2) Identify the research works that have reported manual/automatic data labelling (political data) (RQ2)
(3) Identify and list the countries whose elections are analyzed (RQ3)
(4) Identify and list the tweet languages used for predicting elections on Twitter (RQ4)
(5) Identify the main topics in the studies using machine learning techniques (RQ5)
(6) Identify demographic data in the field of election prediction on Twitter, such as the most frequent publication venues, active countries, organizations, and researchers (DQs)
(7) Provide a centralized source for researchers and practitioners by gathering dispersed pieces of evidence (studies)

The remainder of this paper is organized as follows: Section 2 provides an overview of the most closely related work, and Section 3 presents the detailed methodology, followed by Results and Discussion in Section 4. Section 5 deals with Validity and Threats, and Section 6 presents the Conclusion and Future Work.

2. Related Work

This section presents the work most closely related to this SMS on election prediction on Twitter.

Chauhan et al. [9] in 2020 surveyed election prediction on online platforms such as Twitter and Facebook. Their study presents an in-depth analysis of the evaluation of SA techniques used in election prediction. They overviewed nearly 48 studies, including 10 studies that tried to infer users’ political stance.

In May 2019, Bilal et al. [10] presented a short overview of election prediction on Facebook and Twitter. They gave an overview of 13 studies. Their study mainly categorized the studies into two approaches: sentiment analysis and others. Additionally, they categorized those studies into two categories: “can predict elections” and “cannot predict elections.”

Singh and Sawhney [11] conducted a review of 16 papers in December 2017 related to forecasting elections on Twitter. They listed the countries whose elections were analyzed and provided statistics on the tweets used in the selected studies. Furthermore, they presented the methods used for prediction and classified the studies into those that predicted elections successfully and those that did not.

All these studies presented short reviews except for [9]. Besides, all of them performed ad hoc literature surveys, and none followed a detailed systematic protocol. This study is the first systematic mapping study focused on election prediction on Twitter; it thoroughly reviews and analyzes the 98 selected primary studies.

3. Methodology

A systematic mapping study (SMS) is an effective way of getting knowledge about the state-of-the-art of a research field. This study conducts an SMS of election prediction on Twitter. Figure 1 shows the detailed flow of this SMS.

3.1. Approaches for Predicting Election on Twitter

Various approaches can be employed to predict elections on Twitter. Researchers and practitioners mainly use three: sentiment analysis (SA), volume-based (Vol.), and social network analysis (SNA). Figure 2 shows a generalized framework of election prediction on Twitter. A Twitter API is used to collect tweets about the election (candidates, election, political party, and trends). The collected tweets are then preprocessed (cleaned and filtered) as needed, for example, by removing unnecessary characters and whitespace and by stemming for sentiment analysis. Afterwards, an approach or technique is applied to perform the election prediction task.
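
The framework described above can be illustrated with a minimal sketch. It assumes a handful of hypothetical tweets in place of data collected through a Twitter API; the cleaning rules, the candidate names, and the helper names `clean_tweet` and `volume_share` are illustrative assumptions, not the procedure of any primary study. The sketch shows the preprocessing step followed by the simplest of the three approaches, a volume-based estimate that counts the tweets mentioning each candidate.

```python
import re
from collections import Counter

# Hypothetical sample; real studies collect tweets through a Twitter API.
TWEETS = [
    "Vote for @alice!   She has a plan https://example.com/a",
    "alice or bob? Undecided... #election",
    "Bob will win this #election, no doubt",
    "RT @news: Alice leads in the polls",
]
CANDIDATES = ["alice", "bob"]  # assumed candidate keywords


def clean_tweet(text: str) -> str:
    """Lowercase, drop URLs and non-alphanumeric characters, squeeze whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove punctuation, @ and #
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


def volume_share(tweets, candidates):
    """Volume-based estimate: each candidate's share of all candidate mentions."""
    counts = Counter()
    for tweet in tweets:
        tokens = set(clean_tweet(tweet).split())
        for name in candidates:
            if name in tokens:
                counts[name] += 1       # one mention per tweet at most
    total = sum(counts.values())
    return {name: counts[name] / total for name in candidates} if total else {}


shares = volume_share(TWEETS, CANDIDATES)
print(shares)
```

A sentiment-based variant would replace the simple mention count with a count of positively classified tweets per candidate; the classifier itself is outside the scope of this sketch.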

3.2. Aim and Research Questions

This study aims to identify and categorize the methods used for predicting elections on the Twitter platform. Because this aim is broad, it is divided into the following set of research questions (RQs):
RQ1: what are the approaches used in predicting elections on Twitter?
RQ1(a): what are the techniques used for election prediction on Twitter?
RQ1(b): which tools are utilized for election predictions?
RQ1(c): which techniques/tools are employed for tweet collection?
RQ2: which studies reported manually/automatically annotated data?
RQ3: which countries are reported for election prediction on Twitter?
RQ4: what are the languages of tweets used for predicting elections on Twitter?
RQ5: what are the most frequent topics discussed?

We also gathered and investigated further information of interest by defining and answering some demographic questions (DQs): most active countries, organizations, and authors. This information helps practitioners, researchers, and organizations [12–15]. The set of DQs is as follows:
DQ1: who are the most active researchers in the field of analyzing election prediction on Twitter?
DQ2: which are the most active organizations?
DQ3: which are the most active publication venues?

Table 1 gives a short description of research questions (RQs) and demographic questions (DQs).


Questions | Description
--- | ---
RQ1 | Approaches (sentiment analysis, volumetric, social network analysis, or hybrid) used in the selected papers.
RQ1(a) | Identify the learning techniques of the approaches used in the selected papers: Machine Learning (supervised, unsupervised, hybrid, deep learning), lexicon-based approach, and no machine learning (volumetric, online tool).
RQ1(b) | Identify tools, libraries, and dictionaries along with the primary studies.
RQ1(c) | List the techniques used for collecting tweets.
RQ2 | Identify and list the studies that manually/automatically labelled the data to assist their experiments (training, testing data).
RQ3 | List of the countries whose elections are analyzed on Twitter in the selected papers.
RQ4 | List of tweet languages analyzed in the selected papers.
RQ5 | Identify the most frequent topics automatically using LDA in the selected papers.
DQ1, DQ2, DQ3 | Based on the number of publications, the minimum contribution level is two papers.

3.3. Search Strategy

Two essential operations must be completed before executing the search in digital libraries: (a) specifying the search keywords and (b) specifying the digital libraries. The search keywords compose the search strings used in the digital libraries. They were identified in the former operation by analyzing the research field to which this study applies, “Election Prediction on Twitter.” Table 2 shows the whole set of selected keywords for this study. In the latter operation, we selected the digital libraries in which to execute the search strings. Five digital libraries were selected: IEEE Xplore, Web of Science (WoS), Scopus, ACM, and ScienceDirect. The keywords were combined into the final queries executed in each library.


A | B | C
--- | --- | ---
A1: Election | B1: Predict (Predict, Prediction, Predicted, Predicting) | C1: Twitter
 | B2: Forecast (Forecast, Forecasting, Forecasted) | 
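
One plausible way to turn the keyword groups of Table 2 into a search string is sketched below. It assumes that alternatives within a group are joined with OR and that the groups themselves are joined with AND; the function name `build_query` and the generic output syntax are illustrative assumptions, since each digital library uses its own field syntax.

```python
# Keyword groups taken from Table 2; terms within a group are alternatives.
KEYWORDS = {
    "A": ["election"],
    "B": ["predict", "forecast"],
    "C": ["twitter"],
}


def build_query(groups):
    """Join terms within a group with OR and the groups themselves with AND."""
    clauses = []
    for terms in groups.values():
        clause = " OR ".join(terms)
        clauses.append(f"({clause})")
    return " AND ".join(clauses)


query = build_query(KEYWORDS)
print(query)  # (election) AND (predict OR forecast) AND (twitter)
```

The generic string would still have to be adapted by hand to the field prefixes of each library, such as a title, abstract, and keyword restriction.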

We executed the search queries on the title, abstract, and keywords of the articles. Some digital libraries do not support searching on these fields; in such cases, the search was performed on the entire text. Table 3 shows the list of digital libraries and the search queries that were executed to obtain potential primary papers. We performed the search in three different periods (phases), which are as follows:
I. E1: searching and selection of papers from January 1, 2010, to January 14, 2020
II. E2: searching and selection of papers from January 15, 2020, to January 7, 2021
III. E3: searching and selection of papers from January 1, 2010, to January 7, 2021


Digital library | Query
--- | ---
Web of Science | TS = (ELECTION) AND TS = (PREDICT OR FORECAST) AND TS = (TWITTER)
IEEE | (((“All Metadata”:election) AND “All Metadata”:”predict” OR “forecast”) AND “All Metadata”:twitter
ACM | (Title:(election) OR Abstract:(election)) AND (Title:(“prediction” “forecast”) OR Abstract:(“prediction” “forecast”)) AND (Title:(twitter) OR Abstract:(twitter))
Scopus | (TITLE-ABS-KEY ( election) AND (TITLE-ABS-KEY ( predict) OR TITLE-ABS-KEY (forecast) ) AND TITLE-ABS-KEY (twitter) )
ScienceDirect | Title, abstract, keywords: ((“election”) AND (“predict” OR “forecast”) AND “twitter”))

The reason for the three extraction phases is that we started this research before the second phase, and the work was then delayed due to Covid-19. Note that E2 was not performed on Scopus, because in mid-2020 Scopus discontinued the search in its library. We used the ScienceDirect library as an alternative to Scopus.

Almost every digital library allows users to export the search results in some format that includes the title of the paper, metadata (venue, year of publication, author names, author affiliations, and more), abstract (not provided by some digital libraries), and keywords. After executing the first search, we obtained 787 potential papers.

3.4. Selection of Study and Quality Assessment

The process of selecting relevant papers mainly comprises two tasks: (1) defining the criteria for including/excluding papers and (2) applying the defined criteria to choose the relevant papers [16–18].

The following inclusion criteria were applied to the abstract of each paper:
IC1: the study is related to election prediction (or forecasting) on Twitter
IC2: research published in the field of “Computer Science”
IC3: research published online between January 2010 and January 2021
IC4: the abstract of the study must fit the topic

The following criteria were applied to exclude the papers:
EC1: research papers written in languages other than English
EC2: papers that are not accessible in full text
EC3: research published in non-peer-reviewed venues
EC4: grey literature and books
EC5: short papers (fewer than four pages)
EC6: duplicate papers (only the most recent and detailed one was selected)
EC7: studies that present summaries of editorials/conferences
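
The criteria that depend only on metadata can be expressed as a simple filter, sketched below. The `Paper` record, its field names, and the sample entries are hypothetical; the year check approximates IC3 at year granularity, and duplicates are matched by title only, which simplifies EC6. Criteria that require reading the abstract or the full paper (for example IC1 and IC4) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical metadata record exported from a digital library."""
    title: str
    year: int
    language: str
    pages: int
    peer_reviewed: bool
    full_text: bool


def passes_screening(p: Paper) -> bool:
    """Apply the criteria that can be checked from metadata alone."""
    return (
        2010 <= p.year <= 2021          # IC3, approximated by year
        and p.language == "English"     # EC1
        and p.full_text                 # EC2
        and p.peer_reviewed             # EC3
        and p.pages >= 4                # EC5
    )


def screen(papers):
    """Drop failing papers, then keep the most recent paper per title (EC6)."""
    latest = {}
    for p in filter(passes_screening, papers):
        key = p.title.casefold()
        if key not in latest or p.year > latest[key].year:
            latest[key] = p
    return list(latest.values())


papers = [
    Paper("Predicting elections with Twitter", 2015, "English", 8, True, True),
    Paper("predicting elections with twitter", 2016, "English", 10, True, True),
    Paper("Short note", 2018, "English", 2, True, True),
    Paper("Old study", 2008, "English", 12, True, True),
]
selected = screen(papers)
print([(p.title, p.year) for p in selected])
```

In this sample only the 2016 version of the duplicated title survives: the short note fails the page limit and the 2008 study falls outside the publication window.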

A top-down approach was followed to ensure the quality of the selection of relevant papers. Initially, papers were excluded based on their metadata, such as title, abstract, and keywords. Further studies were excluded after reading the entire paper if it was outside the scope of the current topic, “Election Prediction on Twitter,” or of low quality, for example, if its methodology did not satisfy the reader (author).

All the papers were distributed equally among the authors, who selected the relevant papers by applying the inclusion and exclusion criteria. The authors held a meeting to ensure that no relevant paper was excluded and no irrelevant paper was included. To deal with disagreements, the authors applied the criteria defined in [16, 17]. The details are given in Table 4. A paper is excluded if it falls in category “F” (exclude) or category “E” (considered doubtful).


Reviewer 2 \ Reviewer 1 | Include | Uncertain | Exclude
--- | --- | --- | ---
Include | A | B | D
Uncertain | B | C | E
Exclude | D | E | F
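
The disagreement matrix of Table 4 can be read as a symmetric lookup, sketched below. The names `categorize` and `is_excluded` are illustrative assumptions. The text states only that categories E and F lead to exclusion, so the handling of categories A to D is not modeled.

```python
# Category for each unordered pair of reviewer decisions, as laid out in Table 4.
CATEGORY = {
    ("include", "include"): "A",
    ("include", "uncertain"): "B",
    ("uncertain", "uncertain"): "C",
    ("include", "exclude"): "D",
    ("uncertain", "exclude"): "E",
    ("exclude", "exclude"): "F",
}
ORDER = {"include": 0, "uncertain": 1, "exclude": 2}


def categorize(reviewer1: str, reviewer2: str) -> str:
    """Return the Table 4 category; the matrix is symmetric in both reviewers."""
    pair = tuple(sorted((reviewer1, reviewer2), key=ORDER.__getitem__))
    return CATEGORY[pair]


def is_excluded(reviewer1: str, reviewer2: str) -> bool:
    """A paper is excluded when it falls in category E or F."""
    return categorize(reviewer1, reviewer2) in {"E", "F"}


print(categorize("exclude", "include"))     # D
print(is_excluded("uncertain", "exclude"))  # True
```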

Figure 3 shows a full flow of the search in the five digital libraries and the selection process using inclusion/exclusion criteria. Table 5 shows the list of 98 primary selected papers for this SMS study with their bibliographic references.


Primary study | Citations (until Jan 2021) | Reference
--- | --- | ---
S-01 | 3 | R. C. Prati and E. Said-Hung, “Predicting the ideological orientation during the Spanish 24M elections in Twitter using machine learning,” AI Soc., vol. 34, no. 3, pp. 589–598, 2019, doi: 10.1007/s00146-017-0761-0.
S-02 | 0 | P. Mazumder, N. A. Chowdhury, M. Anwar-Ul-Azim Bhuiya, S. H. Akash, and R. M. Rahman, “A Fuzzy Logic Approach to Predict the Popularity of a Presidential Candidate,” in Studies in Computational Intelligence, 2018, vol. 769, pp. 63–74, doi: 10.1007/978-3-319-76081-0_6.
S-03 | 2 | D. Beleveslis, C. Tjortjis, D. Psaradelis, and D. Nikoglou, “A hybrid method for sentiment analysis of election related tweets,” 2019, doi: 10.1109/SEEDA-CECNSM.2019.8908289.
S-04 | 3 | E. Sanders and A. van den Bosch, “A Longitudinal Study on Twitter-Based Forecasting of Five Dutch National Elections,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019, vol. 11864 LNCS, pp. 128–142, doi: 10.1007/978-3-030-34971-4_9.
S-05 | 15 | L. Oikonomou and C. Tjortjis, “A Method for Predicting the Winner of the USA Presidential Elections using Data extracted from Twitter,” 2018, doi: 10.23919/SEEDA-CECNSM.2018.8544919.
S-06 | 46 | M. Anjaria and R. M. R. Guddeti, “A novel sentiment analysis of social networks using supervised learning,” Soc. Netw. Anal. Min., vol. 4, no. 1, pp. 1–15, 2014, doi: 10.1007/s13278-014-0181-9.
S-07 | 12 | A. J. Wicaksono, Suyoto, and Pranowo, “A proposed method for predicting US presidential election by analyzing sentiment in social media,” in Proceeding - 2016 2nd International Conference on Science in Information Technology, ICSITech 2016: Information Science for Green Society and Environment, 2017, pp. 276–280, doi: 10.1109/ICSITech.2016.7852647.
S-08 | 16 | J. A. Cerón-Guzmán and E. León-Guzmán, “A sentiment analysis system of Spanish tweets and its application in Colombia 2014 presidential election,” in Proceedings - 2016 IEEE International Conferences on Big Data and Cloud Computing, BDCloud 2016, Social Computing and Networking, SocialCom 2016 and Sustainable Computing and Communications, SustainCom 2016, 2016, pp. 250–257, doi: 10.1109/BDCloud-SocialCom-SustainCom.2016.47.
S-09 | 6 | S. Bhatia, B. Mellers, and L. Walasek, “Affective responses to uncertain real-world outcomes: Sentiment change on Twitter,” PLoS One, vol. 14, no. 2, 2019, doi: 10.1371/journal.pone.0212489.
S-10 | 2 | M. Plummer, M. A. Palomino, and G. L. Masala, “Analysing the Sentiment Expressed by Political Audiences on Twitter: The Case of the 2017 UK General Election,” in Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017, 2018, pp. 1449–1454, doi: 10.1109/CSCI.2017.253.
S-11 | 20 | R. Srivastava, M. P. S. Bhatia, H. Kumar, and S. Jain, “Analysing Delhi Assembly Election 2015 using textual content of social network,” in ACM International Conference Proceeding Series, 2015, vol. 25-27-Sept, pp. 78–85, doi: 10.1145/2818567.2818582.
S-12 | 18 | R. Bose, R. K. Dey, S. Roy, and D. Sarddar, “Analysing political sentiment using Twitter data,” Smart Innov. Syst. Technol., vol. 107, pp. 427–436, 2019, doi: 10.1007/978-981-13-1747-7_41.
S-13 | 21 | E. Tunggawan and Y. E. Soelistio, “And the winner is...: Bayesian Twitter-based prediction on 2016 US presidential election,” in Proceeding - 2016 International Conference on Computer, Control, Informatics and its Applications: Recent Progress in Computer, Control, and Informatics for Data Science, IC3INA 2016, 2017, pp. 33–37, doi: 10.1109/IC3INA.2016.7863019.
S-14 | 6 | M. Ramzan, S. Mehta, and E. Annapoorna, “Are tweets the real estimators of election results?,” in 2017 10th International Conference on Contemporary Computing, IC3 2017, 2018, vol. 2018-Janua, no. August, pp. 1–4, doi: 10.1109/IC3.2017.8284309.
S-15 | 35 | L. Chen, W. Wang, and A. P. Sheth, “Are twitter users equal in predicting elections? A study of user groups in predicting 2012 US republican presidential primaries,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, vol. 7710, no. May 2014, pp. 379–392, doi: 10.1007/978-3-642-35386-4_28.
S-16 | 16 | R. Castro, L. Kuffó, and C. Vaca, “Back to #6D: Predicting Venezuelan states political election results through Twitter,” in 2017 4th International Conference on e-democracy and eGovernment, ICEDEG 2017, 2017, pp. 148–153, doi: 10.1109/ICEDEG.2017.7962525.
S-17 | 8 | Z. Xie, G. Liu, J. Wu, and Y. Tan, “Big data would not lie: prediction of the 2016 Taiwan election via online heterogeneous information,” EPJ Data Sci., vol. 7, no. 1, 2018, doi: 10.1140/epjds/s13688-018-0163-7.
S-18 | 0 | M. Ibrahim, O. Abdillah, A. F. Wicaksono, and M. Adriani, “Buzzer Detection and Sentiment Analysis for Predicting Presidential Election Results in a Twitter Nation,” in Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015, 2016, pp. 1348–1353, doi: 10.1109/ICDMW.2015.113.
S-19 | 44 | A. A. Khatua, A. A. Khatua, K. Ghosh, and N. Chaki, “Can #Twitter-Trends predict election results? Evidence from 2014 Indian general election,” in Proceedings of the Annual Hawaii International Conference on System Sciences, 2015, vol. 2015-March, pp. 1676–1685, doi: 10.1109/HICSS.2015.202.
S-20 | 3 | P. Singh, Y. K. Dwivedi, K. S. Kahlon, A. Pathania, and R. S. Sawhney, “Can twitter analytics predict election outcome? An insight from 2017 Punjab assembly elections,” Gov. Inf. Q., vol. 37, no. 2, p. 101444, 2020, doi: 10.1016/j.giq.2019.101444.
S-21 | 10 | P. Juneja and U. Ojha, “Casting online votes: To predict offline results using sentiment analysis by machine learning classifiers,” 8th Int. Conf. Comput. Commun. Netw. Technol. ICCCNT 2017, 2017, doi: 10.1109/ICCCNT.2017.8203996.
S-22 | 18 | D. A. Kristiyanti, A. H. Umam, M. Wahyudi, R. Amin, and L. Marlinda, “Comparison of SVM Naïve Bayes Algorithm for Sentiment Analysis Toward West Java Governor Candidate Period 2018-2023 Based on Public Opinion on Twitter,” 2018 6th Int. Conf. Cyber IT Serv. Manag. CITSM 2018, no. Citsm, pp. 1–6, 2019, doi: 10.1109/CITSM.2018.8674352.
S-23 | 34 | R. Jose and V. S. Chooralil, “Prediction of election result by enhanced sentiment analysis on twitter data using classifier ensemble Approach,” Proc. 2016 Int. Conf. Data Min. Adv. Comput. SAPIENCE 2016, no. November, pp. 64–67, 2016, doi: 10.1109/SAPIENCE.2016.7684133.
S-24 | 5 | S. Sharma and N. P. Shetty, Determining the popularity of political parties using twitter sentiment analysis, vol. 701. Springer Singapore, 2018.
S-25 | 8 | F. Pimenta, D. Obradović, and A. Dengel, “A comparative study of social media prediction potential in the 2012 US Republican presidential preelections,” Proc. - 2013 IEEE 3rd Int. Conf. Cloud Green Comput. CGC 2013 2013 IEEE 3rd Int. Conf. Soc. Comput. Its Appl. SCA 2013, pp. 226–232, 2013, doi: 10.1109/CGC.2013.43.
S-26 | 40 | B. Charalampakis, D. Spathis, E. Kouslis, and K. Kermanidis, “A comparison between semi-supervised and supervised text mining techniques on detecting irony in Greek political tweets,” Eng. Appl. Artif. Intell., vol. 51, no. C, pp. 50–57, May 2016, doi: 10.1016/j.engappai.2016.01.007.
S-27 | 76 | J. Ramteke, S. Shah, D. Godhia, and A. Shaikh, “Election result prediction using Twitter sentiment analysis,” in Proceedings of the International Conference on Inventive Computation Technologies, ICICT 2016, 2016, vol. 1, doi: 10.1109/INVENTIVE.2016.7823280.
S-28 | 7 | P. Kassraie, A. Modirshanechi, and H. K. Aghajan, “Election vote share prediction using a sentiment-based fusion of Twitter data with Google trends and online polls,” in DATA 2017 - Proceedings of the 6th International Conference on Data Science, Technology and Applications, 2017, no. March, pp. 363–370, doi: 10.5220/0006484303630370.
S-29 | 8 | P. Mehndiratta, S. Sachdeva, P. Sachdeva, and Y. Sehgal, “Elections Again, Twitter May Help!!! A Large Scale Study for Predicting Election Results Using Twitter,” 2014, pp. 133–144, doi: 10.1007/978-3-319-13820-6_11.
S-30 | 20 | M. Coletto, C. Lucchese, S. Orlando, and R. Perego, “Electoral predictions with Twitter: A machine-learning approach,” CEUR Workshop Proc., vol. 1404, 2015.
S-31 | 2 | S. Salari, N. Sedighpour, V. Vaezinia, and S. Momtazi, “Estimation of 2017 Iran’s Presidential Election Using Sentiment Analysis on Social Media,” Proc. - 2018 4th Iran. Conf. Signal Process. Intell. Syst. ICSPIS 2018, pp. 77–82, 2018, doi: 10.1109/ICSPIS.2018.8700529.
S-32 | 2 | X. Hu, L. Li, T. Wu, X. Ai, J. Gu, and S. Wen, “Every word is valuable: Studied influence of negative words that spread during election period in social media,” Concurr. Comput., vol. 31, no. 21, pp. 1–11, 2019, doi: 10.1002/cpe.4525.
S-33 | 14 | B. Heredia, J. D. Prusa, T. M. Khoshgoftaar, and B. Raton, “Exploring the Effectiveness of Twitter at Polling the United States 2016 Presidential Election,” in Proceedings - 2017 IEEE 3rd International Conference on Collaboration and Internet Computing, CIC 2017, 2017, vol. 2017-Janua, pp. 283–290, doi: 10.1109/CIC.2017.00045.
S-34 | 14 | P. Singh, R. S. Sawhney, and K. S. Kahlon, “Forecasting the 2016 US presidential elections using sentiment analysis,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, vol. 10595 LNCS, pp. 412–423, doi: 10.1007/978-3-319-68557-1_36.
S-35 | 5 | S. Rodríguez et al., “Forecasting the Chilean electoral year: Using twitter to predict the presidential elections of 2017,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, vol. 10914 LNCS, pp. 298–314, doi: 10.1007/978-3-319-91485-5_23.
S-36 | 10 | A. Attarwala, S. Dimitrov, and A. Obeidi, “How efficient is Twitter: Predicting 2012 US presidential elections using Support Vector Machine via Twitter and comparing against Iowa Electronic Markets,” 2017 Intell. Syst. Conf. IntelliSys 2017, vol. 2018-Janua, no. September, pp. 646–652, 2018, doi: 10.1109/IntelliSys.2017.8324363.
S-37 | 23 | R. Rezapour, L. Wang, O. Abdar, and J. Diesner, “Identifying the Overlap between Election Result and Candidates’ Ranking Based on Hashtag-Enhanced, Lexicon-Based Sentiment Analysis,” in Proceedings - IEEE 11th International Conference on Semantic Computing, ICSC 2017, 2017, pp. 93–96, doi: 10.1109/ICSC.2017.92.
S-38 | 2 | J. N. Franco-Riquelme, A. Bello-Garcia, and J. B. Ordieres-Meré, “Indicator Proposal for Measuring Regional Political Support for the Electoral Process on Twitter: The Case of Spain’s 2015 and 2016 General Elections,” IEEE Access, vol. 7, pp. 62545–62560, May 2019, doi: 10.1109/ACCESS.2019.2917398.
S-39 | 0 | M. Anjaria and R. M. R. Guddeti, “Influence factor based opinion mining of Twitter data using supervised learning,” 2014, doi: 10.1109/COMSNETS.2014.6734907.
S-40 | 4 | F. Tavazoee, C. Conversano, and F. Mola, “Investigating the relationship between tweeting style and popularity: The case of US presidential election 2016,” Commun. Comput. Inf. Sci., vol. 786, no. December, pp. 112–123, 2017, doi: 10.1007/978-3-319-69548-8_9.
S-41 | 5 | M. Awais, S. Ul, H. Ali, S. U. Hassan, and A. Ahmed, “Leveraging big data for politics: predicting general election of Pakistan using a novel rigged model,” J. Ambient Intell. Humaniz. Comput., no. 0123456789, 2019, doi: 10.1007/s12652-019-01378-z.
S-42 | 55 | M. Gaurav, A. Kumar, A. Srivastava, and S. Miller, “Leveraging candidate popularity on Twitter to predict election outcome,” Proc. 7th work. Soc. Netw. Min. Anal. SNA-KDD 2013, 2013, doi: 10.1145/2501025.2501038.
S-43 | 0 | T. M. Fagbola and S. C. Thakur, “Lexicon-based bot-aware public emotion mining and sentiment analysis of the Nigerian 2019 presidential election on Twitter,” Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 10, pp. 329–336, 2019, doi: 10.14569/ijacsa.2019.0101047.
S-44 | 3 | B. Bansal and S. Srivastava, “Lexicon-based Twitter sentiment analysis for vote share prediction using emoji and N-gram features,” Int. J. Web Based Communities, vol. 15, no. 1, pp. 85–99, 2019, doi: 10.1504/IJWBC.2019.098693.
S-45 | 5 | B. Heredia, J. D. Prusa, and T. M. Khoshgoftaar, “Location-based twitter sentiment analysis for predicting the US 2016 presidential election,” in Proceedings of the 31st International Florida Artificial Intelligence Research Society Conference, FLAIRS 2018, 2018, vol. 2009, pp. 265–270.
S-46 | 42 | T. Mahmood, T. Iqbal, F. Amin, W. Lohanna, and A. Mustafa, “Mining Twitter big data to predict 2013 Pakistan election winner,” in 2013 16th International Multi Topic Conference, INMIC 2013, 2013, pp. 49–54, doi: 10.1109/INMIC.2013.6731323.
S-47 | 28 | K. Singhal, B. Agrawal, and N. Mittal, “Modeling indian general elections: Sentiment analysis of political twitter data,” in Advances in Intelligent Systems and Computing, 2015, vol. 339, pp. 469–477, doi: 10.1007/978-81-322-2250-7_46.
S-48 | 53 | J. Smailović, J. Kranjc, M. Grčar, M. Žnidaršič, and I. Mozetič, “Monitoring the Twitter sentiment during the Bulgarian elections,” Proc. 2015 IEEE Int. Conf. Data Sci. Adv. Anal. DSAA 2015, 2015, doi: 10.1109/DSAA.2015.7344886.
S-49 | 30 | M. E. Huberty, “Multi-cycle forecasting of congressional elections with social media,” PLEAD 2013 - Proc. Work. Polit. Elections Data, Co-located with CIKM 2013, pp. 23–29, 2013, doi: 10.1145/2508436.2508439.
S-50 | 3 | R. Castro and C. Vaca, “National leaders’ twitter speech to infer political leaning and election results in 2015 Venezuelan parliamentary elections,” in IEEE International Conference on Data Mining Workshops, ICDMW, 2017, vol. 2017-Novem, pp. 866–871, doi: 10.1109/ICDMW.2017.118.
S-51 | 12 | E. Kalampokis, A. Karamanou, E. Tambouris, and K. Tarabanis, “On predicting election results using twitter and linked open data: The case of the UK 2010 election,” J. Univers. Comput. Sci., vol. 23, no. 3, pp. 280–303, 2017, doi: 10.3217/jucs-023-03-0280.
S-52 | 14 | B. Bansal and S. Srivastava, “On predicting elections with hybrid topic based sentiment analysis of tweets,” Procedia Comput. Sci., vol. 135, pp. 346–353, 2018, doi: 10.1016/j.procs.2018.08.183.
S-53 | 91 | D. Murthy, “Twitter and elections: are tweets, predictive, reactive, or a form of buzz?,” Inf. Commun. Soc., vol. 18, no. 7, pp. 816–831, 2015, doi: 10.1080/1369118X.2015.1006659.
S-54 | 0 | S. Khan, S. A. Moqurrab, R. Sehar, and U. Ayub, Opinion and Emotion Mining for Pakistan General Election 2018 on Twitter Data, vol. 932, no. September. Springer Singapore, 2019.
S-55 | 58 | A. Jungherr, “Tweets and votes, a special relationship: The 2009 federal election in Germany,” in PLEAD 2013 - Proceedings of the Workshop on Politics, Elections and Data, Co-located with CIKM 2013, 2013, pp. 5–13, doi: 10.1145/2508436.2508437.
S-56 | 4 | A. Kulshrestha, D. Lu, A. Shah, and D. Lu, “Politically predictive potential of social networks: Twitter and the Indian general election 2014,” in ACM International Conference Proceeding Series, 2017, vol. Part F1296, doi: 10.1145/3092090.3092137.
S-57 | 0 | K. K. Kameswari, J. Raghaveni, R. S. Shankar, and C. S. Rao, “Predicting election results using NLTK,” Int. J. Innov. Technol. Explor. Eng., vol. 9, no. 1, pp. 4519–4529, 2019, doi: 10.35940/ijitee.A4399.119119.
S-58 | 38 | S. Unankard, X. Li, M. Sharaf, J. Zhong, and X. Li, “Predicting elections from social networks based on sub-event detection and sentiment analysis,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, vol. 8787, pp. 1–16, doi: 10.1007/978-3-319-11746-1_1.
S-59 | 29 | N. Dokoohaki, F. Zikou, D. Gillblad, and M. Matskin, “Predicting Swedish elections with Twitter: A case for stochastic link structure analysis,” Proc. 2015 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Mining, ASONAM 2015, pp. 1269–1276, 2015, doi: 10.1145/2808797.2808915.
S-60 | 336 | E. Sang and J. Bos, “Predicting the 2011 dutch senate election results with twitter,” in Proceedings of the Workshop on Semantic Analysis in …, 2012, no. 53, pp. 53–60, [Online]. Available: http://dl.acm.org/citation.cfm?id=2389969.2389976%5Cnhttp://dl.acm.org/citation.cfm?id=2389976.
S-61 | 3169 | A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Predicting elections with Twitter: What 140 characters reveal about political sentiment,” ICWSM 2010 - Proc. 4th Int. AAAI Conf. Weblogs Soc. Media, pp. 178–185, 2010.
S-62 | 8 | P. Singh, R. S. Sawhney, and K. S. Kahlon, “Predicting the outcome of spanish general elections 2016 using twitter as a tool,” Commun. Comput. Inf. Sci., vol. 712, pp. 73–83, 2017, doi: 10.1007/978-981-10-5780-9_7.
S-63 | 0 | L. Mohan and S. Elayidom, “Predicting the winner of Delhi assembly election, 2015 from sentiment analysis on twitter data-A bigdata perspective,” Int. Arab J. Inf. Technol., vol. 16, no. 5, pp. 833–842, 2019.
S-64 | 51 | W. Budiharto and M. Meiliana, “Prediction and analysis of Indonesia Presidential election from Twitter using sentiment analysis,” J. Big Data, vol. 5, no. 1, pp. 1–10, 2018, doi: 10.1186/s40537-018-0164-1.
S-65 | 37 | M. A. Razzaq, A. M. Qamar, and H. S. M. Bilal, “Prediction and analysis of Pakistan election 2013 based on sentiment analysis,” in ASONAM 2014 - Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2014, no. Asonam, pp. 700–703, doi: 10.1109/ASONAM.2014.6921662.
S-66 | 3 | B. R. Naiknaware and S. S. Kawathekar, “Prediction of 2019 Indian Election using sentiment analysis,” Proc. Int. Conf. I-SMAC (IoT Soc. Mobile, Anal. Cloud), I-SMAC 2018, pp. 660–665, 2019, doi: 10.1109/I-SMAC.2018.8653602.
S-67 | 37 | R. Jose and V. S. Chooralil, “Prediction of election result by enhanced sentiment analysis on Twitter data using Word Sense Disambiguation,” in 2015 International Conference on Control, Communication and Computing India, ICCC 2015, 2016, pp. 638–641, doi: 10.1109/ICCC.2015.7432974.
S-68 | 55 | P. Sharma and T. S. Moh, “Prediction of Indian election using sentiment analysis on Hindi Twitter,” in Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 2016, pp. 1966–1971, doi: 10.1109/BigData.2016.7840818.
S-69 | 26 | L. Wang and J. Q. Gan, “Prediction of the 2017 French election based on Twitter data analysis,” 2017 9th Comput. Sci. Electron. Eng. Conf. CEEC 2017 - Proc., pp. 89–93, 2017, doi: 10.1109/CEEC.2017.8101605.
S-70 | 4 | L. Wang and J. Q. Gan, “Prediction of the 2017 French Election Based on Twitter Data Analysis Using Term Weighting,” 2018 10th Comput. Sci. Electron. Eng. Conf. CEEC 2018 - Proc., pp. 231–235, 2019, doi: 10.1109/CEEC.2018.8674188.
S-71 | 0 | Y. Gupta and P. Kumar, “Real-Time Sentiment Analysis of Tweets: A Case Study of Punjab Elections,” Proc. 2019 3rd IEEE Int. Conf. Electr. Comput. Commun. Technol. ICECCT 2019, pp. 1–12, 2019, doi: 10.1109/ICECCT.2019.8869203.
S-72 | 3 | F. Tavazoee, C. Conversano, and F. Mola, “Recurrent random forest for the assessment of popularity in social media: 2016 US election as a case study,” Knowl. Inf. Syst., vol. 62, no. 5, pp. 1847–1879, 2020, doi: 10.1007/s10115-019-01410-w.
S-7325E. Sanders and A. Van Den Bosch, “Relating political party mentions on twitter with polls and election results,” in CEUR Workshop Proceedings, 2013, vol. 986, pp. 68–71.
S-740J. Arroba Rimassa, F. Llopis, and R. Munoz Guillena, “Relevance as an enhancer of votes on Twitter,” pp. 1–8, 2018, doi: 10.4995/carma2018.2018.8311.
S-751G. Hu, S. Kodali, and A. Padamati, “Sentiment analysis of tweets on 2016 US presidential election candidates,” 29th Int. Conf. Comput. Appl. Ind. Eng. CAINE 2016, pp. 219–226, 2016.
S-760B. J. Chellia, K. Srivastava, J. Panja, and R. Paul, “Sentiment analysis of twitter for election prediction,” Int. J. Eng. Adv. Technol., vol. 9, no. 1, pp. 6187–6192, 2019, doi: 10.35940/ijeat.A1767.109119.
S-7714F. Nausheen and S. H. Begum, “Sentiment Analysis to Predict Election Results Using Python,” 2018 2nd Int. Conf. Inven. Syst. Control, no. Icisc, pp. 1259–1262, 2018.
S-782B. S. Bello, I. Inuwa-Dutse, and R. Heckel, “Social Media Campaign Strategies: Analysis of the 2019 Nigerian Elections,” 2019 6th Int. Conf. Soc. Networks Anal. Manag. Secur. SNAMS 2019, no. October, pp. 142–149, 2019, doi: 10.1109/SNAMS.2019.8931869.
S-7916B. Heredia, J. D. Prusa, and T. M. Khoshgoftaar, “Social media for polling and predicting United States election outcome,” Soc. Netw. Anal. Min., vol. 8, no. 1, p. 0, 2018, doi: 10.1007/s13278-018-0525-y.
S-805B. Justino Garcia Praciano, J. P. Carvalho Lustosa Da Costa, J. P. Abreu Maranhao, F. L. Lopes De Mendonca, R. T. De Sousa Junior, and J. Barbosa Prettz, “Spatio-Temporal Trend Analysis of the Brazilian Elections based on Twitter Data,” in IEEE International Conference on Data Mining Workshops, ICDMW, 2019, vol. 2018-Novem, pp. 1355–1360, doi: 10.1109/ICDMW.2018.00192.
S-8110V. K. Jain and S. Kumar, “Towards prediction of election outcomes using social media,” Int. J. Intell. Syst. Appl., vol. 9, no. 12, pp. 20–28, 2017, doi: 10.5815/ijisa.2017.12.03.
S-82177M. Skoric, N. Poor, P. Achananuparp, E. P. Lim, and J. Jiang, “Tweets and votes: A study of the 2011 Singapore General Election,” Proc. Annu. Hawaii Int. Conf. Syst. Sci., pp. 2583–2591, 2012, doi: 10.1109/HICSS.2012.607.
S-8356J. M. Soler, F. Cuartero, and M. Roblizo, “Twitter as a tool for predicting elections results,” Proc. 2012 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Mining, ASONAM 2012, pp. 1194–1200, 2012, doi: 10.1109/ASONAM.2012.206.
S-843F. J. J. Joseph, “Twitter Based Outcome Predictions of 2019 Indian General Elections Using Decision Tree,” Proc. 2019 4th Int. Conf. Inf. Technol. Encompassing Intell. Technol. Innov. Towar. New Era Hum. Life, InCIT 2019, pp. 50–53, 2019, doi: 10.1109/INCIT.2019.8911975.
S-8527R. M. Filho, J. M. Almeida, and G. L. Pappa, “Twitter population sample bias and its impact on predictive outcomes: A case study on elections,” Proc. 2015 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Mining, ASONAM 2015, pp. 1254–1261, 2015, doi: 10.1145/2808797.2809328.
S-867D. F. Budiono, A. S. Nugroho, and A. Doewes, “Twitter sentiment analysis of DKI Jakarta’s gubernatorial election 2017 with predictive and descriptive approaches,” Proc. - 2017 Int. Conf. Comput. Control. Informatics its Appl. Emerg. Trends Comput. Sci. Eng. IC3INA 2017, vol. 2018-Janua, pp. 89–94, 2017, doi: 10.1109/IC3INA.2017.8251746.
S-8745N. D. Prasetyo and C. Hauff, “Twitter-based election prediction in the developing world,” in HT 2015 - Proceedings of the 26th ACM Conference on Hypertext and Social Media, 2015, pp. 149–158, doi: 10.1145/2700171.2791033.
S-8816S. O’Banion and L. Birnbaum, “Using explicit linguistic expressions of preference in social media to predict voting behaviour,” in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013, 2013, pp. 207–214, doi: 10.1145/2492517.2492538.
S-890A. Jhawar, V. Munjal, S. Ranjan, and P. Karmakar, “Social Network based Sentiment and Network Analysis to Predict Elections,” Proc. CONECCT 2020 - 6th IEEE Int. Conf. Electron. Comput. Commun. Technol., pp. 0–5, 2020, doi: 10.1109/CONECCT50063.2020.9198574.
S-902R. Casarin, J. C. Correa, J. E. Camargo, S. Dakduk, E. ter Horst, and G. Molina, “What makes a tweet be retweeted? A Bayesian trigram analysis of tweet propagation during the 2015 Colombian political campaign,” J. Inf. Sci., pp. 1–9, 2019, doi: 10.1177/0165551519886056.
S-9116Z. Xie, G. Liu, J. Wu, L. Wang, and C. Liu, “Wisdom of fusion: Prediction of 2016 Taiwan election with heterogeneous big data,” 2016, doi: 10.1109/ICSSSM.2016.7538625.
S-920A. Khan et al., “Predicting Politician’s Supporters’ Network on Twitter Using Social Network Analysis and Semantic Analysis,” Sci. Program., vol. 2020, 2020, doi: 10.1155/2020/9353120.
S-930Z. Gong, T. Cai, J. C. Thill, S. Hale, and M. Graham, “Measuring relative opinion from location-based social media: A case study of the 2016 US Presidential election,” PLoS One, vol. 15, no. 5, pp. 1–27, 2020, doi: 10.1371/journal.pone.0233660.
S-943P. C. López-López, P. Oñate, and Á. Rocha, “Social media mining, debate and feelings: digital public opinion’s reaction in five presidential elections in Latin America,” Cluster Comput., vol. 23, no. 3, pp. 1875–1886, 2020, doi: 10.1007/s10586-020-03072-8.
S-956M. Z. Ansari, M. B. Aziz, M. O. Siddiqui, H. Mehra, and K. P. Singh, “Analysis of Political Sentiment Orientations on Twitter,” Procedia Comput. Sci., vol. 167, pp. 1821–1828, 2020, doi: 10.1016/j.procs.2020.03.201.
S-960G. R. Gustisa Wisnu, Ahmadi, A. R. Muttaqi, A. B. Santoso, P. K. Putra, and I. Budi, “Sentiment Analysis and Topic Modelling of 2018 Central Java Gubernatorial Election using Twitter Data,” pp. 35–40, 2020, doi: 10.1109/iwbis50925.2020.9255583.
S-970L. P. Manik et al., “Aspect-Based Sentiment Analysis on Candidate Character Traits in Indonesian Presidential Election,” pp. 224–228, 2020, doi: 10.1109/icramet51080.2020.9298595.
S-980E. Soylu and S. Baday, “Predicting the June 2019 Istanbul Mayoral Election with Twitter,” no. June 2019, pp. 1–6, 2020, doi: 10.1109/asyu50717.2020.9259800.

3.5. Data Extraction

Data extraction is the process of extracting relevant information from the selected primary papers according to the defined research and demographic questions. Initially, we agreed upon a Data Extraction Form (DEF) after a thorough review. Next, we started the actual extraction from the papers. A DEF provides a reliable and precise approach to extracting data in systematic mapping studies [16, 19]. We inspected and thoroughly read the full text of nearly all papers.

4. Results and Discussion

In this section, we briefly discuss the results of this SMS. The most notable results for each research question and demographic question are discussed separately. Figure 4(a) shows the number of studies published in different venues (conference or journal). Figure 4(b) shows the distribution of studies across the years. It is noteworthy that the topic of “Election Prediction on Twitter” has been attracting researchers’ attention over the last decade.

4.1. RQ1: What Are the Approaches Used in Predicting Elections on Twitter?

Figure 5 shows the number of studies that use different approaches for election prediction on Twitter: sentiment analysis (SA), sentiment analysis (orientation), volumetric (Vol.), social network analysis (SNA); topic modelling using LDA (in this study, the algorithm name LDA is used instead of topic modelling in the approaches); and a combination of these approaches such as SA & Vol.; SA, Vol., & SNA; and SA (orientation), SNA, & LDA.

In this SMS, we treat SA and SA (orientation) separately so that researchers can quickly locate the relevant studies. The SA approach covers studies that used polarity detection (positive, negative, and neutral), emotion detection (tense, angry, sad, happy, relaxed, exhausted, calm, excited, and nervous), or both. SA (orientation) studies the political orientation of voters by analyzing tweets that state voting behaviour explicitly, such as “I will vote for candidate A” and “I will not vote for candidate A.”
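The SA (orientation) idea can be illustrated with a minimal sketch that looks for explicit voting statements. The function name `voting_orientation` and the patterns below are hypothetical and cover only a few English phrasings; the primary studies may use much richer, language-specific rules.

```python
import re

def voting_orientation(tweet, candidates):
    """Return {candidate: 'support' | 'oppose'} for explicit voting statements.

    Illustrative only: the patterns are assumptions, not taken from any study.
    """
    result = {}
    text = tweet.lower()
    for name in candidates:
        target = re.escape(name.lower())
        # Negated statements such as "I will not vote for candidate A".
        if re.search(r"\b(?:will not|won't|never)\s+vote\s+for\s+" + target + r"\b", text):
            result[name] = "oppose"
        # Affirmative statements such as "I will vote for candidate A".
        elif re.search(r"\b(?:will|going to)\s+vote\s+for\s+" + target + r"\b", text):
            result[name] = "support"
    return result

support = voting_orientation("I will vote for candidate A", ["candidate A", "candidate B"])
oppose = voting_orientation("I will not vote for candidate A", ["candidate A", "candidate B"])
```

Tweets that mention a candidate without an explicit voting statement produce no entry, which is why orientation detection typically covers far fewer tweets than polarity detection.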

We defined the following terminology for use in the rest of the paper:
i: an approach used alone in a paper
j: an approach used along with other approaches in a paper

Figure 6 presents the approaches along with the corresponding primary studies. Figure 5 shows that 64 studies used the sentiment analysis approach only (SAi), nearly 65% of all the primary papers in this study. Only 3 papers used SA (orientation).

It is interesting to note that only 9 papers (almost 9%) employed the volume-based approach alone. A hybrid of the SA and volume-based approaches was used by 16% of the selected studies. One study used SNA only, and 3 papers combined the SA and SNA approaches, which together makes almost 5% of the total studies. Only two studies used LDA along with the other approaches. S-17 used LDA for topic modelling and categorized the resulting topics into positive and negative.

It is worth noting that most of the studies (89%) applied an SA approach (SAi + SAj), followed by a volume-based approach in 26% of the studies (Voli + Volj). Very few studies employed a social network analysis approach. Opinion mining gives a better understanding of a political user’s behaviour, because a user’s words are easier to interpret than communication connections. For example, suppose 100 citizens comment negatively on a political leader’s post. A volumetric or SNA approach would count these comments in the leader’s favour, although their content is clearly against the leader. Thus, many researchers tend to use the SA approach.
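The 100-negative-comments example can be made concrete with a small sketch. The toy data and the function names `volume_share` and `positive_share` are assumptions for illustration, and the polarity labels are assumed to come from some upstream sentiment classifier.

```python
from collections import Counter

# Toy data: (candidate mentioned, polarity label). Candidate A receives
# 100 negative comments; candidate B receives fewer but mostly positive ones.
tweets = [("A", "negative")] * 100 + [("B", "positive")] * 60 + [("B", "negative")] * 10

def volume_share(tweets):
    """Share of all mentions per candidate, ignoring what the tweets say."""
    counts = Counter(candidate for candidate, _ in tweets)
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

def positive_share(tweets):
    """Share of positive mentions per candidate."""
    counts = Counter(c for c, polarity in tweets if polarity == "positive")
    candidates = {c for c, _ in tweets}
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in candidates}

by_volume = volume_share(tweets)
by_sentiment = positive_share(tweets)
```

With this data, the volume-based share ranks candidate A first, while the sentiment-based share ranks candidate B first, which mirrors the argument above.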

4.1.1. RQ1(a): What Are the Techniques Used for Election Prediction on Twitter?

The approaches (RQ1) are analyzed in more depth by answering RQ1(a), (b), and (c). For example, a supervised technique (such as SVM or NB) is applied within the SA approach to classify tweets as positive, negative, or neutral. In this SMS, the techniques are classified into supervised (S); unsupervised (US); deep learning (DL); lexicon-based approaches (LAs); count (C); library (a tool such as TextBlob); and combinations of these techniques, such as S & US; S, US, & LA; US & LA; S & LA; S & DL; LA, C, & SNA; S & C; and S, US, & C.
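As an illustration of the supervised category, the following is a minimal multinomial Naive Bayes sketch with add-one smoothing, written with the standard library only. The training sentences, labels, and function names are made up for the example; the primary studies generally rely on libraries such as Scikit-learn or WEKA and on much larger labelled corpora.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    label_counts = Counter()            # label -> number of training texts
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}

def predict_nb(model, text):
    """Return the label with the highest log-posterior for the given text."""
    tokens = text.lower().split()
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        n_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[tok] + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled tweets.
train = [
    ("great speech by the candidate", "positive"),
    ("i love this candidate", "positive"),
    ("terrible policy and bad debate", "negative"),
    ("i hate this bad campaign", "negative"),
    ("the election is on tuesday", "neutral"),
    ("polls open at nine", "neutral"),
]
model = train_nb(train)
```

Calling `predict_nb(model, "love the great speech")` on this toy model yields the positive class, and `predict_nb(model, "bad debate")` yields the negative class.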

Figure 7 shows the number of studies reporting these techniques. Supervised (S) learning is the most common: 34 studies used it alone (Si), almost 35% of the selected studies. Some studies used other techniques along with it, such as S-41, S-51, and S-92. In total, 51 studies used supervised learning (Si + Sj), which makes it the most used technique (52%) in this SMS.

Several studies used the LA for sentiment analysis, especially for non-English tweets. 25 studies employed LAi. A few papers reported LAj, bringing the total (LAi + LAj) to 39% of the selected studies in this SMS. 18% of the selected studies used count techniques (Ci + Cj). Few papers employed US techniques: 9% in total (USi + USj). Only 5% of the selected studies used deep learning techniques (DLi + DLj). Some studies used a tool or library for sentiment analysis without naming any algorithm; for example, S-77 used TextBlob. Figure 8 shows the techniques along with the corresponding studies.
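A lexicon-based approach can be sketched as follows. The mini-lexicon, the negation list, and the function name `lexicon_polarity` are hypothetical; real studies draw on resources such as SentiWordNet or AFINN, which are far larger and carry graded scores.

```python
# Hypothetical mini-lexicon: word -> sentiment weight.
LEXICON = {"good": 1, "great": 2, "win": 1, "bad": -1, "corrupt": -2, "lose": -1}
NEGATIONS = {"not", "no", "never"}

def lexicon_polarity(text):
    """Classify text as positive, negative, or neutral by summing word scores."""
    score = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?\"'")
        if word in NEGATIONS:
            # Simplification: a negation flips the next sentiment word found.
            negate = True
            continue
        value = LEXICON.get(word, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [lexicon_polarity(t) for t in
          ["A great candidate", "not good, corrupt!", "polls open today"]]
```

No labelled training data is needed here, which is one reason this approach is attractive for languages where annotated corpora are scarce, provided a suitable dictionary exists.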

4.1.2. RQ1(b): Which Tools Are Utilized?

This section gives an overview of the tools, libraries, and dictionaries (TLD) used to assist election prediction on Twitter. Table 6 lists each TLD together with the primary studies that used it. NLTK is used the most. Some tools provide a graphical user interface (GUI), such as WEKA, RapidMiner, and Gephi; nearly 13% of the studies used such GUI tools. Almost 18 types of dictionaries are employed in the primary studies. Only one study reported Hadoop. The rest of the details can be seen in Table 6.


Tool/library/dictionary | Primary study

NLTK | S-01, S-06, S-13, S-21, S-27, S-37, S-39, S-57, S-77, S-84, S-89
Stanford POS Tagger | S-06, S-32, S-36, S-39
Ark Twitter POS Tagger (abbreviated from now on as ATP) [20] | S-47
Porter stemmer algorithm | S-07, S-65, S-89
TweetNLP toolkit | S-01
Scikit-learn | S-01, S-03, S-13, S-08, S-21, S-27, S-35, S-57
SPSS statistical package | S-01
MATLAB neurofuzzy toolbox | S-02
LIBLINEAR library [21] | S-88
LIBSVM [22] | S-97
TextBlob [23] | S-05, S-64, S-77, S-80, S-84, S-89
Freeling [24] | S-08
Twitter4j | S-10
MongoDB | S-10, S-14, S-51, S-84
MySQL database | S-82, S-92
Microsoft Excel | S-94
Knowi [25] | S-10
IntelliJ IDE [26] | S-10
Opinion Words [27] | S-10, S-58
WEKA | S-11, S-20, S-26, S-34, S-36, S-65
ParallelDots AI APIs | S-12
LinkedGeoData [28] | S-15
Geocoding API from Google Maps | S-16
Bing Map [29] | S-85
Ggplot2 [30] package | S-19, S-66
Gephi [31] | S-20, S-89, S-92
RapidMiner [32] | S-22, S-46, S-92
Lexicoder Sentiment Dictionary (LSD) [33] | S-53
SentiWordNet [34] | S-23, S-25, S-44, S-47, S-51, S-67, S-68
WordNet [35] | S-67
BalkaNet: WordNet [36] | S-26
LexiPers [37] | S-31
Lexicon Dictionary [27] | S-32
NRC Word Emotion Association Lexicon (EmoLex) dictionary [38] | S-43, S-44
VADER (Valence Aware Dictionary and sEntiment Reasoner) [39] | S-27, S-44, S-78
OpinionFinder sentiment corpus [40] | S-49
Emojie Lexicon Package [41] | S-44
SenticNet [42] | S-44
Syuzhet [43] | S-44, S-72, S-12, S-20, S-43
Lexicon dictionary by Hu and Liu [27] | S-44
Subjectivity lexicon [44] | S-37
AFINN [45] | S-44
Sentiment140 corpus [46, 47] | S-45, S-54, S-79
OpLexicon [48] (Portuguese sentiment lexicon) | S-80
SentiLex [49] (Portuguese sentiment lexicon) | S-80
CRAN: Package RSentiment [50] | S-28, S-40, S-75
tm Package [51] Text Mining in R | S-43, S-75
R: Plyr [52] | S-75
GtrendsR package (collect Google trends) | S-28
LingPipe library [53] | S-29, S-51
LinguaKit [54] (for feature selection and sentiment analysis) | S-38
twitteR (to determine user geolocation) | S-38
SDP (shortest dependency part) | S-47
LATINO library (link analysis and text mining toolbox) [55] | S-48
Louvain modularity optimization method [56] | S-50
Rgraphviz package [57] | S-54
GraphChi [44] | S-59
DBpedia | S-51
Stanford NER | S-51, S-58
Virtuoso store (extracted entities along with metadata were transformed into RDF and stored) | S-51
OWL | S-51
Plotly [58] | S-57, S-77
Multiprocessing on a 16-core Amazon AWS EC2 | S-56
Lucene 3.1.0 Java API2 (preprocessing) | S-58
Australia Gazetteer database | S-58
Language Guesser developed by Thomas Martin [59] | S-60
LIWC (linguistic inquiry and word count) [60, 61] | S-61, S-94
Syntactic normalization of tweets [62] (for preprocessing) | S-63
Hadoop | S-63
Rainbow tool [63] | S-65
GATE Twitter NLP [64] (tweet normaliser) | S-76
TwitterNLP [65] for tokenization | S-93
Tensorflow 1.1.0 | S-33, S-79
Google Chart API | S-83

4.1.3. RQ1(c): Which Techniques/Tools Are Employed for Tweet Collection?

Data can be collected from Twitter either through an API or by crawling. Twitter provides two types of APIs: REST and Streaming. A few of the selected studies did not explicitly report any technique for collecting Twitter data, such as S-22, S-28, S-31, S-35, and S-95. Some of the studies reported “Twitter API” only. S-57 used a dataset from Data World [66]. Figure 9 shows the number of studies that use different techniques and tools for collecting tweets. In this SMS, we used the names of techniques and tools as reported in the primary studies. For example, Tweepy and twitter4j are reported as Streaming APIs in the primary studies but are counted separately from the Twitter Streaming API.

4.2. RQ2: Which Studies Reported Manually/Automatically Annotated Data?

An annotated (or labelled) corpus assists in training supervised and semisupervised techniques [67]. Large and unambiguous annotated data can lead to better predictions by improving an algorithm’s results. Data can be annotated manually, automatically, or both [68]. Few annotated political datasets are available, and languages other than English particularly lack such datasets.

This RQ aims to identify and list the studies that used manual or automatic data labelling. Some studies worked in languages other than English; for example, S-48 annotated tweets in Bulgarian. A few studies employed automatic data labelling techniques; for example, S-79 used deep neural networks to label the data. Figure 10 shows the list of studies that use manual or automatic political data labelling.

4.3. RQ3: Which Countries Are Reported for Election Prediction on Twitter?

This RQ aims to identify and list the countries whose elections are analyzed in the primary studies. Figure 11 shows the list of 28 countries and the total number of studies that analyzed each country’s elections. It can be seen that 27 studies analyzed USA elections and 24 studies addressed the prediction of Indian elections (both country level and regional). Elections of Indonesia, the Netherlands, and Spain are reported in 7 studies each, followed by Pakistan in 5 and the UK in 4; the rest can be observed in Figure 11.

4.4. RQ4: What Are the Languages of Tweets Used for Predicting Elections on Twitter?

The objective of this RQ is to classify and list the tweet languages used in the primary studies. Tweet languages used are Bulgarian, Chinese (candidates’ names) (CNN), Dutch, English, English translated from Spanish (S2E), English translated from Urdu (U2E), English translated from German (G2E), English translated from others (O2E), Greek, Hindi, Indonesian, Italian, Persian, Portuguese, Spanish, Swedish, Turkish, Multilanguage (English and Spanish) (MLES), Assume Multilanguage (English and Roman Urdu) (AMLEU), Assumption (English) (AE), Assumption (Spanish) (AS), and Not Mentioned (NM).

Roughly 45% of the primary studies used English tweets. Subsequently, 7% of the studies analyzed tweets in Indonesian and 7% in Spanish. Figure 12 presents the list of languages and the number of studies that investigated them. Some studies translated tweets from other languages to English for further investigation, because other languages lack resources (annotated data and dictionaries); S-20, S-41, S-61, and S-76 are examples. S-17 used Chinese candidates’ names for tweet collection and used the volumetric approach for predicting the election. Almost 16% of the studies did not report any language; most of these used the volumetric approach.

4.5. RQ5: What Are the Most Frequent Topics Discussed?

The goal of this question is to extract information from the selected studies automatically. Such an approach can give researchers an insight into the topics discussed. We classified the implementation and representation into two parts: (1) topic modelling (correlation) and (2) word cloud. LDA [69] is an example of topic modelling. We applied the topic modelling technique on two levels of the primary studies:
1. Abstract level
2. Full-text level

We further generated word clouds from the selected papers on the following levels:
1. Titles
2. Author keywords
3. Abstracts
4. Full-text

We converted all the papers from PDF to text. For topic modelling, the extracted data were preprocessed: all text was converted to lower case, stemming and lemmatization were applied, and English stop words were removed. Furthermore, sections such as “Acknowledgement” and “References” were excluded when performing topic modelling at the full-text level. For the word clouds, all the text at each level (title, keywords, abstract, and full-text) was tokenized into single words, and unnecessary words were removed using English stop words. Next, we computed the word frequencies and generated a word cloud for each level.
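The word-frequency step behind a word cloud can be sketched as follows. The stop-word list and the function name `word_frequencies` are illustrative assumptions (the source does not say which stop-word list or tooling was used), the input is three shortened titles from the primary studies list, and stemming is omitted for brevity.

```python
import re
from collections import Counter

# Small illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "for", "to", "with", "using", "from"}

def word_frequencies(texts, stop_words=STOP_WORDS):
    """Lower-case, tokenize into single words, drop stop words, and count."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

titles = [
    "Predicting elections with Twitter",
    "Prediction of Indian election using sentiment analysis on Hindi Twitter",
    "Sentiment analysis of twitter for election prediction",
]
freq = word_frequencies(titles)
top_word = freq.most_common(1)[0]
```

A word cloud renderer then scales each word by its count. Note that without stemming, "election" and "elections" are counted as different words, which matches how both forms appear separately in the title word cloud.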

Figure 13 shows the 25 topics generated at the abstract level and illustrates the correlations between them. Blue circles represent correlated topics, while red circles represent anticorrelation (inverse correlation). The figure reveals interesting findings; for example, “sentiment analysis polarity” has a high correlation with “presidential predict win.” Another topic, “social media popularity,” is highly correlated with “presidential predict win,” “outcome account expects,” and “election poll outcome.” The remaining correlations and inverse correlations between the topics can be explored in Figure 13.

Figure 14 represents the correlation between 25 topics generated from the primarily selected papers’ full-text. It is interesting to note that nearly all the topics are anticorrelated.
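The source does not state how the topic correlations were computed. One plausible way, sketched below, is the Pearson correlation between per-document topic proportions produced by a model such as LDA. The matrix `doc_topics` is hypothetical data, not output from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

# Hypothetical topic proportions (rows: documents, columns: topics).
doc_topics = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
]
columns = list(zip(*doc_topics))  # one tuple of proportions per topic
corr = [[pearson(a, b) for b in columns] for a in columns]
```

Here topics 0 and 2 come out strongly anticorrelated, while topics 0 and 1 are positively correlated. Because each document's proportions sum to 1, a dominant topic pushes the others down, which is one possible reason for widespread anticorrelation in a topic-correlation plot.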

4.5.1. Word Cloud

A word cloud represents words visually, giving the most frequent words the greatest prominence. Figure 15(a) shows the word cloud for the titles of the selected papers. The words “Election, Elections, Analysis, Twitter, Sentiment, and Presidential” are prominent. This suggests that most studies employed sentiment analysis for predicting elections and that most of the studies analyzed presidential elections.

Figure 15(b) shows the word cloud generated from the author keywords of the selected studies. The words “Election, Sentiment, Twitter, Prediction, Social, Mining, Media, Machine, and Learning” are prominent. These findings show that the majority of studies implemented sentiment analysis for predicting elections on Twitter. Furthermore, it is worth noticing that many studies employed machine learning techniques.

Figure 16(a) depicts similar words in the word cloud of abstracts as in Figures 15(a) and 15(b). Some high-frequency words are “Twitter, Election, Social, Media, Sentiment, Analysis, Political, Predict, and Opinion.” Figure 16(b) illustrates almost the same themes from the full-text as discussed for the other word clouds. Some of the prominent words are “Twitter, Election, Social, Prediction, Social, Media, Users, Presidential, Opinion, and India.” This suggests that most of the studies applied sentiment analysis to predict elections on Twitter and that several studies analyzed presidential and Indian elections.

Comparing the findings from the word clouds with the outcomes of RQ1, it is noteworthy that the two sets of results are nearly the same. As discussed in Section 4.1, approximately 89% of the studies applied sentiment analysis (SAi + SAj). RQ1(a) shows that machine learning techniques are employed the most. Furthermore, RQ3 shows that the majority of the studies analyzed USA and Indian elections. The outcomes from the word clouds reflect almost the same information.

4.6. DQ1: Which Are the Most Active Researchers in the Field of Analyzing Election Prediction on Twitter?

A total of 284 researchers contributed and appeared as authors in the 98 selected primary studies. We selected researchers who have appeared in two or more papers in the selected studies. Figure 17 shows the most active researchers along with the studies they contributed to.

Almost 100% of the active researchers are affiliated with academic organizations. These data identified some research groups in which researchers collaborated, such as Brian Heredia, Joseph D. Prusa, and Taghi M. Khoshgoftaar. The data also show that the researcher Malhar Anjaria has not been active since 2014. Furthermore, we noticed that the research group of Rincy Jose and Varghese S Chooralil has not been active since 2016. This finding also tells us that more academic and industrial collaboration is needed.

4.7. DQ2: Which Are the Most Active Organizations?

This DQ aims to identify and list the most active organizations that appeared in the selected studies. A total of 158 organization names were listed, out of which 13 organizations contributed to more than one study. The list of the organizations and their support level (contribution) is given in Table 7.


Organization’s name and country | Contribution

Department of Electronics Technology/Computer Science, Guru Nanak Dev University, Amritsar | 6
Department of Computer and Electrical Engineering and Computer Science Florida Atlantic University, Boca Raton, Florida | 3
The Data Mining and Analytics Research Group School of Science & Technology, International Hellenic University Thessaloniki | 2
Department of Business and Economics, University of Cagliari | 2
CLS/CLST, Radboud University Nijmegen, Erasmusplein 1, 6525 HT Nijmegen | 2
School of Economics and Management, Beihang University, Beijing | 2
School of Computer Science and Electronic Engineering, University of Essex, Colchester | 2
Department of Applied Sciences, The NorthCap University, Gurugram | 2
Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangalore | 2
Department of Computer Science and Engineering, Rajagiri School of Engineering and technology Ernakulam | 2
Escuela Superior Politecnica del Litoral, ESPOL | 2
Universita Ca’ Foscari, Venice | 2

In this SMS, we have divided organizations into two categories: industry and academic (university, research institute, and government research organization). It is notable that academia is far more active than industry: only 7 industrial organizations appeared in the selected studies. In S-82, one researcher, Nathaniel Poor, is not affiliated with any organization. More industrial and academic collaboration is needed to improve this domain. Figure 18 shows the distribution of organizations.

4.8. DQ3: Which Are the Most Active Publication Venues?

This DQ aims to identify and list the most active publication venues in the selected studies. Table 8 shows the venue name along with the support level (>1). The most active conference is “Lecture Notes in Computer Science,” whose support level is 5, followed by the “Communications in Computer and Information Science” conference. Only two journals, “PLOS ONE” and “Social Network Analysis and Mining,” have support level 2. Most of the research is published in conference venues; the trend should shift towards publishing in prestigious peer-reviewed journals.


Venue | Conference/journal | Support level

Lecture Notes in Computer Science | Conference | 5
Communications in Computer and Information Science | Conference | 4
ACM International Conference Proceeding Series | Conference | 2
Advances in Intelligent Systems and Computing | Conference | 2
CEUR Workshop Proceedings | Conference | 2
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM | Conference | 2
PLEAD-Proceedings of the Workshop on Politics, Elections, and Data | Conference | 2
PLOS ONE | Journal | 2
Social Network Analysis and Mining | Journal | 2

5. Validity Threats

We have followed some protocols to avoid or mitigate the validity threats (VTs) in this study. These VTs are as follows:
1. Descriptive validity
2. Interpretive validity
3. Theoretical validity
4. Generalizability
5. Reliability

Each of these VTs is discussed separately in the subsequent sections.

5.1. Descriptive Validity

Descriptive validity (DV) deals with the accuracy and objectivity of the extracted information. DV requires that no important information is skipped or ignored during the extraction process. To deal with DV, we arranged regular sessions to discuss and build agreement upon the extraction process, such as what information needs to be collected and stored. We designed and agreed upon the Data Extraction Form (DEF) collectively. To avoid bias and ensure traceability, every entry in the DEF has a comment that links each extracted value assigned by the researcher.

5.2. Interpretive Validity

Interpretive validity (IV) deals with the validity of the conclusions drawn from the extracted information and ensures that the information extracted by a researcher is unbiased. To minimize threats to IV, we applied the following mechanisms. Initially, we arranged regular meetings to ensure that all the researchers agreed upon the same interpretation and conclusion of the results (extracted information), a set of protocols, and their execution. Next, excluding the first author, the researchers were divided into two distinct groups, each of which interpreted the results. The first author compared the drawn conclusions, matched them, and standardized the writing style. Finally, all the authors substantiated the interpretation and its traceability to the previous results in the DEF.

5.3. Theoretical Validity

Theoretical validity (TV) is a vital type of threat, as various inaccuracies are possible while selecting relevant papers: bias of a researcher while extracting the papers, shortcomings of the search and selection process (selecting irrelevant papers, excluding relevant papers, or both), and poor quality of the selected papers, any of which can lead to flawed conclusions.

We followed protocols discussed in Sections 3.3 and 3.4 to search the papers in the five databases and select relevant papers to minimize this threat.

5.4. Generalizability

To reduce this threat, we relied upon the impartiality of the data extraction process, the DEF, and the set of rules for investigation that lead to the interpretations. Nevertheless, we assume that the selected primary studies (98 papers) achieve generalization with low risk [70].

5.5. Reliability

To increase this SMS’s reliability, we reported the complete process comprehensively, from the start of the protocol to the conclusion. Finally, we described the rubrics used for self-appraisal by implementing the guidelines from Kitchenham and Charters [70] to minimize the threats.

6. Conclusion and Future Work

This study reports the planning, conducting, and implementation steps of an SMS on “predicting elections on Twitter.” We selected 98 studies published from January 2010 to January 2021. This study aims to identify and classify the approaches, techniques, tools, countries, and languages used in election prediction on Twitter.

We defined and implemented a search strategy to achieve our goal. Initially, we found 787 potential studies. After implementing the selection criteria (inclusion/exclusion), we chose 98 primary studies as relevant.

The extracted data lead us to the following conclusions:

RQ1: Approximately 65% of the selected studies reported a sentiment analysis approach (SAi) and 24% reported SAj, so 89% of the selected studies implemented sentiment analysis in total (SAi + SAj). The volume-based approach follows with 26% of the selected studies in total (Voli + Volj), and 6% of the selected studies employed social network analysis techniques (SNAi + SNAj).
RQ1(a): 51% of the selected studies used supervised learning in total (Si + Sj), which makes it the most used technique (52%) in this SMS. The lexicon-based approach accounts for 39% (LAi + LAj), 18% employed volumetric techniques (Ci + Cj), and only 9% employed unsupervised learning techniques (USi + USj). Furthermore, 5% of the selected studies implemented deep learning techniques (DLi + DLj).
RQ1(b): This SMS listed nearly all the tools used in the selected primary studies. NLTK is the most commonly used. 13% of the selected studies reported GUI tools such as WEKA and RapidMiner. 18 types of dictionaries are used in the primary studies.
RQ1(c): Almost 12% used Tweepy, 7% employed TwitterR, 5% used the Twitter REST API, 12% the Search API, and 9% the Streaming API, while 20% of the selected studies mentioned only the Twitter API.
RQ2: 44% of the selected studies manually or automatically annotated the data.
RQ3: The elections of 28 countries are analyzed in the selected studies. 28% of the selected studies examined USA elections, and 25% analyzed Indian elections. Elections in Indonesia, the Netherlands, and Spain are each reported in 7% of the studies, followed by Pakistan (5%) and the UK (4%).
RQ4: Nearly 45% of the primary studies used English tweets. 7% of the selected studies analyzed tweets in Indonesian and 7% in Spanish. Approximately 5% of the selected studies translated tweets from other languages into English, bringing the share for English to 50%.
RQ5: Some popular topics are “Election, Prediction, Twitter, Sentiment, Analysis, Opinion, Mining, Presidential, USA, India, Machine, and Learning.”
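
The approach percentages reported for RQ1 add up to more than 100%, which is what one would expect if a single study can be tagged with more than one approach. The following is a minimal sketch of such a tally, under that assumption. The five-study table, the study identifiers, and the function name are invented for illustration and are not data or code from this SMS.

```python
from collections import Counter

# Hypothetical extraction table (NOT data from the SMS):
# study id -> set of approach labels assigned during data extraction.
studies = {
    "S01": {"SA"},
    "S02": {"SA", "Vol"},
    "S03": {"Vol"},
    "S04": {"SA", "SNA"},
    "S05": {"SA"},
}


def approach_shares(table):
    """Return the percentage of studies that report each approach."""
    # Count each label once per study that carries it.
    counts = Counter(label for labels in table.values() for label in labels)
    total = len(table)
    return {label: round(100 * n / total) for label, n in counts.items()}


shares = approach_shares(studies)
for label in sorted(shares):
    print(f"{label}: {shares[label]}%")
```

Because each study contributes to every approach it reports, the shares are computed against the number of studies, not the number of labels, so their sum can exceed 100%.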

Demographic data show that 76% of the selected studies are conference papers and 24% are journal papers. Predicting elections on Twitter has become more popular and has attracted more researchers over the last decade. A total of 284 researchers contributed to the 98 selected primary studies, of whom 21 have a support level of more than 2. The authors who appeared in the selected studies were affiliated with 158 organizations. 13 organizations have contributed to more than 2 studies, out of which two organizations have support level 3. The results highlighted that 149 are academic organizations, and only 7 industrial affiliations appeared. Furthermore, 9 venues are the most active, of which 7 are conferences.

As future work, we recommend the following:
  (i) There is a need for in-depth analysis in the field of election prediction on Twitter, covering
    (a) Metrics evaluation of the techniques
    (b) Details about the countries
    (c) Types of elections
    (d) Details about the data
    (e) Election results
  (ii) Empirical studies on election prediction need to be conducted
  (iii) Election predictions should be analyzed on platforms other than Twitter
  (iv) Election predictions should be analyzed and compared across fields, such as computer science and social sciences

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The research work was funded by the Beijing Municipal Natural Science Foundation (grant no. 4212026), National Science Foundation of China (grant no. 61772075), and National Key Research and Development Project of China (grant no. 2018YFC0832304). The authors are thankful to them for their financial support.

References

  1. R. Naseem, Z. Shaukat, M. Irfan et al., “Empirical assessment of machine learning techniques for software requirements risk prediction,” Electronics, vol. 10, no. 2, p. 168, 2021. View at: Publisher Site | Google Scholar
  2. A. Ullah, J. Wang, M. S. Anwar et al., “Fusion of machine learning and privacy preserving for secure facial expression recognition,” Security and Communication Networks, vol. 2021, Article ID 6673992, 12 pages, 2021. View at: Publisher Site | Google Scholar
  3. N. Boudjellal, H. Zhang, A. Khan et al., “ABioNER: a BERT-based model for Arabic biomedical named-entity recognition,” Complexity, vol. 2021, Article ID 6633213, 6 pages, 2021. View at: Publisher Site | Google Scholar
  4. A. Khan, H. Zhang, J. Shang et al., “Predicting politician’s supporters’ network on twitter using social network analysis and semantic analysis,” Scientific Programming, vol. 2020, Article ID 9353120, 17 pages, 2020. View at: Publisher Site | Google Scholar
  5. S. Unankard, X. Li, M. Sharaf, J. Zhong, and X. Li, “Predicting elections from social networks based on sub-event detection and sentiment analysis,” Web Information Systems Engineering—WISE 2014, vol. 8787, pp. 1–16, 2014. View at: Publisher Site | Google Scholar
  6. J. M. Soler, F. Cuartero, and M. Roblizo, “Twitter as a tool for predicting elections results,” in Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2012, pp. 1194–1200, Istanbul, Turkey, August 2012. View at: Publisher Site | Google Scholar
  7. T. M. Fagbola and S. Colin, “Lexicon-based bot-aware public emotion mining and sentiment analysis of the Nigerian 2019 presidential election on Twitter,” International Journal of Advanced Computer Science and Applications, vol. 10, no. 10, pp. 329–336, 2019. View at: Publisher Site | Google Scholar
  8. B. Heredia, J. D. Prusa, and T. M. Khoshgoftaar, “Location-based twitter sentiment analysis for predicting the U.S. 2016 presidential election,” in Proceedings of the FLAIRS Conference 2017, vol. 2009, pp. 265–270, Marco Island, FL, USA, May 2017. View at: Google Scholar
  9. P. Chauhan, N. Sharma, and G. Sikka, “The emergence of social media data and sentiment analysis in election prediction,” Journal of Ambient Intelligence and Humanized Computing, vol. 12, no. 2, p. 2601, 2020. View at: Publisher Site | Google Scholar
  10. M. Bilal, A. Gani, M. Marjani, and N. Malik, “Predicting elections: social media data and techniques,” in Proceedings of the 2019 International Conference on Engineering and Emerging Technologies (ICEET), Lahore, Pakistan, February 2019. View at: Publisher Site | Google Scholar
  11. P. Singh and R. S. Sawhney, “Influence of Twitter on prediction of election results,” Advances in Intelligent Systems and Computing, vol. 564, pp. 665–673, 2018. View at: Publisher Site | Google Scholar
  12. A. Ahmad, C. Feng, M. Khan et al., “A systematic literature review on using machine learning algorithms for software requirements identification on stack overflow,” Security and Communication Networks, vol. 2020, Article ID 8830683, 19 pages, 2020. View at: Publisher Site | Google Scholar
  13. A. Vazquez-Ingelmo, F. J. Garcia-Penalvo, and R. Theron, “Information dashboards and tailoring capabilities—a systematic literature review,” IEEE Access, vol. 7, pp. 109673–109688, 2019. View at: Publisher Site | Google Scholar
  14. F. B. Shaikh, M. Rehman, and A. Amin, “Cyberbullying: a systematic literature review to identify the factors impelling university students towards cyberbullying,” IEEE Access, vol. 8, pp. 148031–148051, 2020. View at: Publisher Site | Google Scholar
  15. M. Hammad, H. A. Basit, S. Jarzabek, and R. Koschke, “A systematic mapping study of clone visualization,” Computer Science Review, vol. 37, Article ID 100266, 2020. View at: Publisher Site | Google Scholar
  16. A. Ahmad, J. L. B. Justo, C. Feng, and A. A. Khan, “The impact of controlled vocabularies on requirements engineering activities: a systematic mapping study,” Applied Sciences, vol. 10, no. 21, pp. 7749–7829, 2020. View at: Publisher Site | Google Scholar
  17. K. Petersen, S. Vakkalanka, and L. Kuzniarz, “Guidelines for conducting systematic mapping studies in software engineering: an update,” Information and Software Technology, vol. 64, pp. 1–18, 2015. View at: Publisher Site | Google Scholar
  18. J. G. Enríquez, L. Morales-Trujillo, F. Calle-Alonso, F. J. Domínguez-Mayo, and J. M. Lucas-Rodríguez, “Recommendation and classification systems: a systematic mapping study,” Scientific Programming, vol. 2019, Article ID 8043905, 18 pages, 2019. View at: Publisher Site | Google Scholar
  19. D. Budgen, P. Brereton, S. Drummond, and N. Williams, “Reporting systematic reviews: some lessons from a tertiary study,” Information and Software Technology, vol. 95, pp. 62–74, 2018. View at: Publisher Site | Google Scholar
  20. K. Gimpel, N. Schneider, B. O’Connor et al., “Part-of-speech tagging for twitter: annotation, features, and experiments,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), vol. 2, pp. 42–47, Portland, OR, USA, June 2011. View at: Google Scholar
  21. R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin, “LIBLINEAR: a library for large linear classification,” Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008. View at: Google Scholar
  22. C.-C. Chang and C.-J. Lin, “LIBSVM: a Library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011. View at: Publisher Site | Google Scholar
  23. S. Loria, “TextBlob: simplified text processing—TextBlob 0.15.2 documentation,” 2018. View at: Google Scholar
  24. L. Padró and E. Stanilovsky, “FreeLing 3.0: towards wider multilinguality,” in Proceedings of the Language Resources and Evaluation Conference (LREC 2012) ELRA, pp. 2473–2479, Istanbul, Turkey, May 2012. View at: Google Scholar
  25. Unified analytics—business intelligence (KNOWI), 2021,.
  26. JetBrains, “Download IntelliJ IDEA: the Java IDE for professional developers by JetBrains,” 2017, https://www.jetbrains.com/idea/download. View at: Google Scholar
  27. M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the 10th ACM International Conference on Knowledge Discovery and Data Mining, vol. 4425, no. 1, pp. 79–86, Washington, DC, USA, July 2010. View at: Google Scholar
  28. LinkedGeoData.org, Linked GeoData, 2013, http://linkedgeodata.org/About.
  29. Microsoft, “Bing maps documentation,” 2020, https://docs.microsoft.com/en-us/bingmaps/. View at: Google Scholar
  30. H. Wickham, “ggplot2: create elegant data visualisations using the grammar of graphics,” R Package Version 3.6.1, 2018. View at: Google Scholar
  31. M. Bastian, S. Heymann, and M. Jacomy, “Gephi: an open source software for exploring and manipulating networks,” in Proceedings of the Third International Conference on Weblogs and Social Media, pp. 361-362, 2009. View at: Publisher Site | Google Scholar
  32. RapidMiner Inc., “RapidMiner—best data science & machine learning platform,” 2020, https://rapidminer.com/. View at: Google Scholar
  33. L. Young and S. Soroka, “Affective news: the automated coding of sentiment in political texts,” Political Communication, vol. 29, no. 2, pp. 205–231, 2012. View at: Publisher Site | Google Scholar
  34. S. Baccianella, A. Esuli, and F. Sebastiani, “SENTIWORDNET 3.0: an enhanced lexical resource for sentiment analysis and opinion mining,” in Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, pp. 2200–2204, Valletta, Malta, May 2010. View at: Google Scholar
  35. G. A. Miller, “WordNet: a lexical database for English: communications of the ACM,” 1995, https://wordnet.princeton.edu/. View at: Google Scholar
  36. BalkaNet Project Home Page, 2021, http://www.dblab.upatras.gr/balkanet/resources.htm.
  37. B. Sabeti, P. Hosseini, G. Ghassem-Sani, and S. A. Mirroshandel, “LexiPers: an ontology based sentiment lexicon for Persian,” 2019. View at: Google Scholar
  38. P. D. Turney and S. M. Mohammad, “Crowdsourcing a word-emotion association lexicon,” Computational Intelligence, vol. 59, no. 3, pp. 436–465, 2013. View at: Google Scholar
  39. C. J. Hutto and E. Gilbert, “VADER: a parsimonious rule-based model for sentiment analysis of social media text,” in Proceedings of the 8th International Conference on Weblogs and Social Media, ICWSM 2014, pp. 216–225, Ann Arbor, MI, USA, June 2014. View at: Google Scholar
  40. T. Wilson, OpinionFinder: A System for Subjectivity Analysis, University of Pittsburgh, Pittsburgh, PA, USA, 2005, http://nrrc.mitre.org/NRRC/publications.htm.
  41. GitHub—Trinker/Lexicon: A Data Package Containing Lexicons and Dictionaries for Text Analysis, 2021, https://github.com/trinker/lexicon.
  42. E. Cambria, S. Poria, R. Bajpai, and B. Schuller, “SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives,” in Proceedings of the 26th International Conference on Computational Linguistics, COLING 2016: Technical Papers, pp. 2666–2677, Osaka, Japan, December 2016. View at: Google Scholar
  43. M. Jockers, “Introduction to the syuzhet package,” 2016, https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html. View at: Google Scholar
  44. T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT ’05), vol. 7, no. 5, pp. 347–354, Vancouver, Canada, October 2005. View at: Publisher Site | Google Scholar
  45. Finn Arup Nielsen, “A new: evaluation of a word list for sentiment analysis in microblogs,” CEUR Workshop Proceedings, vol. 718, pp. 93–98, 2011. View at: Google Scholar
  46. A. Go, R. Bhayani, and L. Huang, Twitter Sentiment Classification Using Distant Supervision, Stanford University, Stanford, CA, USA, 2009.
  47. M. Michailidis, “Sentiment140 dataset with 1.6 million tweets (Kaggle),” Kaggle.Com, 2017, https://www.kaggle.com/kazanova/sentiment140%0Ahttps://www.kaggle.com/kazanova/sentiment140/data#training.1600000.processed.noemoticon.csv. View at: Google Scholar
  48. OpLexicon–PLN–PUCRS, 2021, https://www.inf.pucrs.br/linatural/wordpress/recursos-e-ferramentas/oplexicon/.
  49. SentiLex-PT 02–Datasets–B2FIND, 2021, http://b2find.eudat.eu/dataset/b6bd16c2-a8ab-598f-be41-1e7aeecd60d3.
  50. S. Goswami, “RSentiment: analyse sentiment of english sentences,” 2016, https://cran.r-project.org/web/packages/RSentiment/index.html. View at: Google Scholar
  51. I. Feinerer, “Introduction to the Tm package: text mining in R,” 2015. View at: Google Scholar
  52. H. Wickham, “CRAN—Package Plyr,” 2014, http://cran.r-project.org/web/packages/plyr/index.html. View at: Google Scholar
  53. Alias-I, “LingPipe Home,” 2016, http://alias-i.com/lingpipe/. View at: Google Scholar
  54. https://linguakit.com/es/analisis-completoAnálisis Completo, 2021,.
  55. LATINO download | SourceForge.net, 2021, http://www.latinolib.org/2011/10/about-latino.html.
  56. V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008. View at: Publisher Site | Google Scholar
  57. Bioconductor—Rgraphviz, 2012, http://bioconductor.org/packages/release/bioc/html/Rgraphviz.html%0Ahttp://files/716/Rgraphviz.html.
  58. Plotly, “Plotly open source graphing libraries,” 2019, https://plot.ly/graphing-libraries/. View at: Google Scholar
  59. M. Thomas, “Ngram: Textcat implementation in python,” 2007, http://thomas.mangin.me.uk/data/source/ngram.py. View at: Google Scholar
  60. J. W. Pennebaker, C. K. Chung, M. Ireland, A. Gonzales, and R. J. Booth, “The development and psychometric properties of LIWC2007,” 2007. View at: Google Scholar
  61. M. Francis and J. W. Pennebaker, “LIWC: linguistic inquiry and word count,” 1993, http://liwc.wpengine.com/Technical report. View at: Google Scholar
  62. P. D. Turney, “Thumbs up or thumbs down?” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL ’02), Philadelphia, PA, USA, July 2002. View at: Publisher Site | Google Scholar
  63. Rainbow, 2021,.
  64. L. Derczynski and K. Bontcheva, “GATE and social media: normalisation and PoS-tagging,” 2016. View at: Google Scholar
  65. GitHub—Aritter/twitter_nlp: Twitter NLP tools, 2021,.
  66. Data.World | The Cloud-Native Data Catalog, 2021,.
  67. N. Boudjellal, H. Zhang, A. Khan, A. Ahmad, R. Naseem, and L. Dai, “A silver standard biomedical corpus for Arabic language,” Complexity, vol. 2020, Article ID 8896659, 7 pages, 2020. View at: Publisher Site | Google Scholar
  68. N. Boudjellal, H. Zhang, A. Khan, and A. Ahmad, “Biomedical relation extraction using distant supervision,” Scientific Programming, vol. 2020, Article ID 8893749, 9 pages, 2020. View at: Publisher Site | Google Scholar
  69. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” 2003. View at: Google Scholar
  70. B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” 2007, http://www.dur.ac.uk/ebse/resources/Systematic-reviews-5-8.pdf. View at: Google Scholar

Copyright © 2021 Asif Khan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

