| Model | Feature types | Features used | Technique | Replicability of features | Remarks |
| --- | --- | --- | --- | --- | --- |
| Spot 1.0 | Attribute features | Number of followers and followees, reputation, tweet frequency, average number of URLs, hashtags, and trends | Machine-learning classification and statistical analysis | Easy | (1) Presented a tool for scoring suspicious Twitter profiles via a three-dimensional indicator (2) Examined only a limited set of features per category (3) Completely ignored the text and semantics of tweets |
| OddBall | Network features | Number of nodes, number of edges, edge weights, eigenvalues, and number of friends | Unsupervised detection of anomalous nodes in weighted graphs | Easy | (1) Discovered new patterns that egonets follow (2) The huge size of the social network made it difficult to scale and to gather network features |
| DARPA | Attribute, network, and content features | Username/avatar, geographical location, and number of followers/followings; tweet syntax and semantics, such as frequent topics and sentiment inconsistency; average number of tweets per day, average clustering coefficient of retweets, and number/percentage of bots per cluster | Step 1: initial bot detection by manual inspection. Step 2: clustering-based outlier detection (non-negative matrix factorization and KNN search) and network analysis. Step 3: classification/outlier analysis (SVMs) | Easy | (1) The algorithm detected all bots in the test scenario (2) The system needed to be semisupervised, with human judgment augmenting the automated bot-identification process (3) Powerful visualization tools were needed to help analysts spot suspicious bots |
| COMPA | Activity and content features | Time (hour of day), message source, message text (language), message topic, links in messages, direct user interaction, and proximity | Anomaly detection against a user behavioral profile, using content- and URL-similarity measures | Easy | (1) Built behavioral profiles of users to detect deviations from the normal model (2) Unlike the previous version, COMPA examined isolated compromises affecting high-profile accounts (3) Collecting profile information from users took significant time and computational resources (4) Detection accuracy depended on the established behavioral profile and the selected threshold |
| SAHP | Activity and content features | Active time, message source (terminals), message topic, links, stop words, keywords, and mentions (@) | Combines the information gain ratio with the analytic hierarchy process (AHP) | Easy | (1) Modeled user profile features more comprehensively (2) Improved on the earlier COMPA method for detecting compromised accounts (3) Detection behavior was highly dependent on the threshold value, whose selection may introduce bias |
| TB-CoAuth | Content features | Content-free, content-specific, stylometric, and folksonomy features | Continuous authentication of textual content, incremental learning, and supervised machine-learning classifiers | Hard | (1) Selected diverse features, both content-free and content-specific (2) Best classifier: SVM with RBF (radial basis function) kernel (3) F1-score: 94.57% (4) In the era of big data, relying on statistical and manual feature selection was deemed inappropriate |
| HoID | Content features | Hurst exponent of the interest distribution | Machine-learning classification (LDA) and statistical analysis (Hurst exponent) | Hard | (1) Feature selection is novel and precise, capturing personal uniqueness (2) The detection process requires no human effort, greatly improving efficiency and accuracy (3) Best classifier: KNN (4) F1-score: 95.90% |
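
To make the HoID row concrete: the Hurst exponent measures long-range dependence in a time series (here, a user's interest distribution over time), with H ≈ 0.5 for uncorrelated noise and H → 1 for strongly persistent behavior. The sketch below estimates it with classical rescaled-range (R/S) analysis; it is a generic illustration under our own assumptions, not HoID's exact estimator.

```python
import numpy as np

def hurst_rs(series):
    """Estimate the Hurst exponent of a 1-D series via rescaled-range (R/S) analysis."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Window sizes, roughly logarithmically spaced between 10 and n/2.
    sizes = np.unique(np.floor(np.logspace(1, np.log10(n // 2), 10)).astype(int))
    log_sizes, log_rs = [], []
    for w in sizes:
        rs_vals = []
        for start in range(0, n - w + 1, w):      # non-overlapping windows
            chunk = series[start:start + w]
            dev = chunk - chunk.mean()
            z = np.cumsum(dev)                    # cumulative deviation from the mean
            r = z.max() - z.min()                 # range of the cumulative deviation
            s = chunk.std(ddof=0)                 # standard deviation of the window
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_sizes.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_vals)))
    # H is the slope of log(R/S) against log(window size).
    slope, _ = np.polyfit(log_sizes, log_rs, 1)
    return float(slope)
```

For white noise the estimate falls near 0.5, while a compromised account whose interest stream suddenly loses its owner's characteristic persistence would show a shifted H, which is the kind of deviation a downstream classifier such as KNN can pick up.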
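
The TB-CoAuth row reports an SVM with RBF kernel as the best classifier over content-free/content-specific stylometric features. The sketch below shows that pipeline shape with scikit-learn on a toy, hypothetical feature set and made-up owner/intruder messages; the feature definitions and data are our own illustrative assumptions, not the paper's.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def content_free_features(text):
    """Toy content-free stylometric features (illustrative, not TB-CoAuth's exact set)."""
    words = text.split()
    return [
        len(text),                                           # message length
        len(words),                                          # word count
        sum(len(w) for w in words) / max(len(words), 1),     # mean word length
        sum(c.isupper() for c in text) / max(len(text), 1),  # uppercase ratio
        text.count("!") + text.count("?"),                   # emphatic punctuation
    ]

# Hypothetical training data: posts by the legitimate owner vs. an intruder.
owner = ["good morning all, coffee first", "heading to the gym now",
         "long day at work, time to rest", "weekend plans with the family"]
intruder = ["FREE GIFT!! CLICK NOW!!", "WIN BIG $$$ TODAY!!",
            "URGENT!! VERIFY YOUR ACCOUNT!!", "HOT DEAL!! LIMITED TIME!!"]

X = np.array([content_free_features(t) for t in owner + intruder])
y = np.array([0] * len(owner) + [1] * len(intruder))

# RBF-kernel SVM; feature scaling matters for RBF distance computations.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1.0))
clf.fit(X, y)
```

A new message is then authenticated continuously by scoring it against this model, e.g. `clf.predict([content_free_features("CLAIM YOUR PRIZE NOW!!")])`, which should land on the intruder side for this toy data.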
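
The SAHP row names the information gain ratio as one half of its feature-weighting scheme (the AHP half is omitted here). A minimal C4.5-style computation of the gain ratio for a discrete feature against class labels looks like this; it is a standard-textbook sketch, not SAHP's own code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain ratio of a discrete feature w.r.t. class labels (C4.5-style)."""
    n = len(labels)
    base = entropy(labels)
    cond = 0.0        # conditional entropy H(class | feature)
    split_info = 0.0  # intrinsic value of the feature's partition
    for v, cnt in Counter(feature_values).items():
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        p = cnt / n
        cond += p * entropy(subset)
        split_info -= p * math.log2(p)
    gain = base - cond
    return gain / split_info if split_info > 0 else 0.0
```

A feature that perfectly separates compromised from normal accounts scores 1.0, while a constant feature scores 0, so ranking features by this ratio gives the relative weights that SAHP then refines with AHP.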