Complexity

Research Article

Uncovering Cybercrimes in Social Media through Natural Language Processing

Table 1

Application of NLP in cybersecurity and cyber defense.


Work	Field	Scenario	Goal	Application of NLP	Complementary techniques

Tamura et al. [13]	Cybersecurity	Industrial control networks	Detect anomalies in packet flow	Similarity model	Markov chain model

Chambers et al. [14]	Cybersecurity	Twitter	Detect cyberattacks and analyze user behavior	Continuous bag-of-words (CBOW) model and topic-based model (PLDA)

Khandpur et al. [15]	Cybersecurity	Twitter	Detect cyberattacks in social media	Similarity model, domain generation algorithm, and dynamic query expansion

Ritter et al. [16]	Cybersecurity	Twitter	Detect cyberattacks in social media	Named-entity recognition	Expectation regularization

Kong et al. [17]	Cybersecurity	Google Play	Evaluate the security of Android apps through user reviews	Bag-of-words (BOW) + classifier (sparse SVM)	Crowdsourcing techniques

Liao et al. [18]	Cybersecurity	Technical articles	Discover indicators of compromise	Dependency parsing and topic extraction (POST)	Classifier (SVM), classifier (logistic regression), and graph mining

Pereira-Kohatsu et al. [9]	Hate crime	Twitter	Identify and monitor hate speech	Long short-term memory NN + multilayer perceptron	Classifiers (LDA, QDA, RF, RLR, and SVM)

Muhammad et al. [19]	Hate crime	Twitter	Classify messages as hate speech, offensive, or nonoffensive	Sequential CNN (SCNN)

Gambäck and Sikdar [20]	Hate crime	Twitter	Classify tweets as “racist,” “sexist,” “both,” or “non-hate-speech”	Convolutional NN (CNN)

Malmasi and Zampieri [21]	Hate crime	Twitter	Annotate tweets with labels “hate,” “offensive,” or “ok”	Linear SVM

Qian et al. [22]	Hate crime	Twitter	Analyze real-life extremists and hate groups	Bidirectional LSTM (bi-LSTM) + deep reinforcement learning

Araque and Iglesias [7]	Radicalization	Twitter and online newspapers	Categorize radical users	Sentiment analysis and similarity model	Logistic regression and linear SVM

Nouh et al. [23]	Radicalization	Twitter	Categorize radical tweets	Language model and sentiment analysis	Classifiers (RF, NN, SVM, and KNN)

Chen [24]	Radicalization	Dark web	Categorize forum postings	Ensemble SVR	Clustering

RED-Alert [25]	Radicalization	Social media	Monitor social networks in real time	Semantic analysis, lexical analysis, and domain-specific ontologies	Social network analysis and complex event processing

Iqbal et al. [26]	Cybercrime	Chat logs	Summarize conversations into crime-related topics	Named-entity recognition, semantic analysis, similarity model	Information visualizer

Pastrana et al. [27]	Cybercrime	Underground forums	Detect cybercrime topics and identify potential victims	Logistic regression and topic extraction	Social network analysis and clustering (K-means)

Bhalerao et al. [28]	Cybercrime	Underground forums	Analyze posts and replies for the identification of supply chains	Classifiers (FT, LR, SVM, and XGBoost)

Our work	Cybercrime	Twitter	Identify suspect groups	Similarity model and sentiment analysis	Clustering (K-means) and graph mining