Research Article

Uncovering Cybercrimes in Social Media through Natural Language Processing

Table 1

Application of NLP in cybersecurity and cyber defense.

| Work | Field | Scenario | Goal | Application of NLP | Complement techniques |
| --- | --- | --- | --- | --- | --- |
| Tamura et al. [13] | Cybersecurity | Industrial control networks | Detect anomalies in packet flow | Similarity model | Markov chain model |
| Chambers et al. [14] | Cybersecurity | Twitter | Detect cyberattacks and analyze user behavior | Continuous bag-of-words (CBOW) model and topic-based model (PLDA) | |
| Khandpur et al. [15] | Cybersecurity | Twitter | Detect cyberattacks in social media | Similarity model, domain generation algorithm, and dynamic query expansion | |
| Ritter et al. [16] | Cybersecurity | Twitter | Detect cyberattacks in social media | Named-entity recognition | Expectation regularization |
| Kong et al. [17] | Cybersecurity | Google Play | Evaluate the security of Android apps through user reviews | Bag-of-words (BOW) + classifier (sparse SVM) | Crowdsourcing techniques |
| Liao et al. [18] | Cybersecurity | Technical articles | Discover indicators of compromise | Dependency parsing and topic extraction (POST) | Classifier (SVM), classifier (logistic regression), and graph mining |
| Pereira-Kohatsu et al. [9] | Hate crime | Twitter | Identify and monitor hate speech | Long short-term memory NN + multilayer perceptron | Classifiers (LDA, QDA, RF, RLR, and SVM) |
| Muhammad et al. [19] | Hate crime | Twitter | Classify messages as hate speech, offensive, or nonoffensive | Sequential CNN (SCNN) | |
| Gambäck and Sikdar [20] | Hate crime | Twitter | Classify tweets as “racist,” “sexist,” “both,” or “non-hate-speech” | Convolutional NN (CNN) | |
| Malmasi and Zampieri [21] | Hate crime | Twitter | Annotate tweets with labels “hate,” “offensive,” or “ok” | Linear SVM | |
| Qian et al. [22] | Hate crime | Twitter | Analyze real-life extremists and hate groups | Bidirectional LSTM (bi-LSTM) + deep reinforcement learning | |
| Araque and Iglesias [7] | Radicalization | Twitter and online newspapers | Categorize radical users | Sentiment analysis and similarity model | Logistic regression and linear SVM |
| Nouh et al. [23] | Radicalization | Twitter | Categorize radical tweets | Language model and sentiment analysis | Classifiers (RF, NN, SVM, and KNN) |
| Chen [24] | Radicalization | Dark web | Categorize forum postings | Ensemble SVR | Clustering |
| RED-Alert [25] | Radicalization | Social media | Monitor social networks in real time | Semantic analysis, lexical analysis, and domain-specific ontologies | Social network analysis and complex event processing |
| Iqbal et al. [26] | Cybercrime | Chat logs | Summarize conversations into crime-related topics | Named-entity recognition, semantic analysis, and similarity model | Information visualizer |
| Pastrana et al. [27] | Cybercrime | Underground forums | Detect cybercrime topics and identify potential victims | Logistic regression and topic extraction | Social network analysis and clustering (K-means) |
| Bhalerao et al. [28] | Cybercrime | Underground forums | Analyze posts and replies for the identification of supply chains | Classifiers (FT, LR, SVM, and XGBoost) | |
| Our work | Cybercrime | Twitter | Identify suspect groups | Similarity model and sentiment analysis | Clustering (K-means) and graph mining |