| Model | Feature types | Features used | Technique | Replicability of features | Remarks |
| --- | --- | --- | --- | --- | --- |
| Spot 1.0 | Attribute features | Number of followers and followees, reputation, tweet frequency, average number of URLs, hashtags, and trends | Machine-learning classification and statistical analysis | Easy | (1) Presented a tool for scoring suspicious Twitter profiles via a three-dimensional indicator (2) Examined only a limited set of features per category (3) Completely ignored the text and semantics of tweets |
| OddBall | Network features | Number of nodes, number of edges, edge weights, eigenvalues, and number of friends | Unsupervised detection of anomalous nodes in weighted graphs | Easy | (1) Discovered new patterns that egonets follow (2) The huge size of the social network made it difficult to scale and to gather network features |
| DARPA | Attribute, network, and content features | Username/avatar, geographical location, and number of followers/followings; tweet syntax and semantics, such as frequent topics and sentiment inconsistency; average number of tweets per day, average clustering coefficient of retweets, and number/percentage of bots per cluster | Step 1: initial bot detection by manual inspection. Step 2: clustering-based outlier detection (non-negative matrix factorization and KNN search) and network analysis. Step 3: classification/outlier analysis (SVMs) | Easy | (1) The algorithm detected all bots in the test scenario (2) The system needed to be semisupervised, with human judgment augmenting the automated bot-identification process (3) Powerful visualization tools were needed to help analysts spot suspicious bots |
| COMPA | Activity and content features | Time (hour of day), message source, message text (language), message topic, links in messages, direct user interaction, and proximity | Anomaly detection against a user behavioral profile, using content- and URL-similarity measures | Easy | (1) Built behavioral profiles of users to detect deviations from the normal model (2) Unlike the previous version, COMPA examined isolated compromises affecting high-profile accounts (3) Collecting profile information from users took significant time and computational resources (4) Detection accuracy depended on the established behavioral profile and the selected threshold |
| SAHP | Activity and content features | Active time, message source (terminals), message topic, links, stop words, keywords, and mentions (@) | Combines the information gain ratio with the analytic hierarchy process (AHP) | Easy | (1) Modeled user profile features more comprehensively (2) Improved on the earlier COMPA method for detecting compromised accounts (3) Detection behavior was highly dependent on the threshold value, whose selection may introduce bias |
| TB-CoAuth | Content features | Content-free, content-specific, stylometric, and folksonomy features | Continuous authentication of textual content, incremental learning, and supervised machine-learning classifiers | Hard | (1) Selected diverse features, both content-free and content-specific (2) Best classifier: SVM with RBF (radial basis function) kernel (3) F1-score: 94.57% (4) In the era of big data, relying on statistical and manual feature selection was deemed inappropriate |
| HoID | Content features | Hurst exponent of the interest distribution | Machine-learning classification (LDA) and statistical analysis (Hurst exponent) | Hard | (1) Feature selection is novel and precise, capturing personal uniqueness (2) The detection process requires no human effort, greatly improving efficiency and accuracy (3) Best classifier: KNN (4) F1-score: 95.90% |
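
To make the HoID row concrete: the Hurst exponent measures long-range dependence in a time series (here, a user's interest distribution over time), with H ≈ 0.5 for uncorrelated noise and H → 1 for strongly persistent behavior. The sketch below estimates it with classical rescaled-range (R/S) analysis; it is a generic illustration under our own assumptions, not HoID's exact estimator.

```python
import numpy as np

def hurst_rs(series):
    """Estimate the Hurst exponent of a 1-D series via rescaled-range (R/S) analysis."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Window sizes, roughly logarithmically spaced between 10 and n/2.
    sizes = np.unique(np.floor(np.logspace(1, np.log10(n // 2), 10)).astype(int))
    log_sizes, log_rs = [], []
    for w in sizes:
        rs_vals = []
        for start in range(0, n - w + 1, w):      # non-overlapping windows
            chunk = series[start:start + w]
            dev = chunk - chunk.mean()
            z = np.cumsum(dev)                    # cumulative deviation from the mean
            r = z.max() - z.min()                 # range of the cumulative deviation
            s = chunk.std(ddof=0)                 # standard deviation of the window
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_sizes.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_vals)))
    # H is the slope of log(R/S) against log(window size).
    slope, _ = np.polyfit(log_sizes, log_rs, 1)
    return float(slope)
```

For white noise the estimate falls near 0.5, while a compromised account whose interest stream suddenly loses its owner's characteristic persistence would show a shifted H, which is the kind of deviation a downstream classifier such as KNN can pick up.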
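
The TB-CoAuth row reports an SVM with RBF kernel as the best classifier over content-free/content-specific stylometric features. The sketch below shows that pipeline shape with scikit-learn on a toy, hypothetical feature set and made-up owner/intruder messages; the feature definitions and data are our own illustrative assumptions, not the paper's.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def content_free_features(text):
    """Toy content-free stylometric features (illustrative, not TB-CoAuth's exact set)."""
    words = text.split()
    return [
        len(text),                                           # message length
        len(words),                                          # word count
        sum(len(w) for w in words) / max(len(words), 1),     # mean word length
        sum(c.isupper() for c in text) / max(len(text), 1),  # uppercase ratio
        text.count("!") + text.count("?"),                   # emphatic punctuation
    ]

# Hypothetical training data: posts by the legitimate owner vs. an intruder.
owner = ["good morning all, coffee first", "heading to the gym now",
         "long day at work, time to rest", "weekend plans with the family"]
intruder = ["FREE GIFT!! CLICK NOW!!", "WIN BIG $$$ TODAY!!",
            "URGENT!! VERIFY YOUR ACCOUNT!!", "HOT DEAL!! LIMITED TIME!!"]

X = np.array([content_free_features(t) for t in owner + intruder])
y = np.array([0] * len(owner) + [1] * len(intruder))

# RBF-kernel SVM; feature scaling matters for RBF distance computations.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1.0))
clf.fit(X, y)
```

A new message is then authenticated continuously by scoring it against this model, e.g. `clf.predict([content_free_features("CLAIM YOUR PRIZE NOW!!")])`, which should land on the intruder side for this toy data.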
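
The SAHP row names the information gain ratio as one half of its feature-weighting scheme (the AHP half is omitted here). A minimal C4.5-style computation of the gain ratio for a discrete feature against class labels looks like this; it is a standard-textbook sketch, not SAHP's own code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain ratio of a discrete feature w.r.t. class labels (C4.5-style)."""
    n = len(labels)
    base = entropy(labels)
    cond = 0.0        # conditional entropy H(class | feature)
    split_info = 0.0  # intrinsic value of the feature's partition
    for v, cnt in Counter(feature_values).items():
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        p = cnt / n
        cond += p * entropy(subset)
        split_info -= p * math.log2(p)
    gain = base - cond
    return gain / split_info if split_info > 0 else 0.0
```

A feature that perfectly separates compromised from normal accounts scores 1.0, while a constant feature scores 0, so ranking features by this ratio gives the relative weights that SAHP then refines with AHP.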