Research Article

Detecting Abnormal Social Network Accounts with Hurst of Interest Distribution

Table 3

Comparison of feature selection with competitive methods in the field of abnormal-account detection on social networks.

Model: Spot 1.0
Feature types: Attribute features
Features used: Number of followers and followees, reputation, tweet frequency, average number of URLs, hashtags, and trends
Technique: Machine-learning classification and statistical analysis
Replicability of features: Easy
Remarks:
(1) Presented a tool for scoring suspicious Twitter profiles through a three-dimensional indicator
(2) Examined only a limited set of features for each category
(3) Completely ignored the text and semantics of tweets

Model: OddBall
Feature types: Network features
Features used: Number of nodes, number of edges, weights, eigenvalues, and number of friends
Technique: Unsupervised detection of abnormal nodes in weighted graphs
Replicability of features: Easy
Remarks:
(1) Discovered new patterns that egonets follow
(2) The huge size of the social network made it difficult to expand and gather network features

Model: DARPA
Feature types: Attribute, network, and content features
Features used: User name/avatar, geographical location, and number of followers/followings; tweet syntax and tweet semantics, such as frequent topics and sentiment inconsistency; average number of tweets per day, average clustering coefficient of retweets, and number/percentage of bots in a cluster
Technique:
Step 1: Initial bot detection by manual inspection
Step 2: Clustering-based outlier detection (non-negative matrix factorization and KNN search) and network analysis
Step 3: Classification/outlier analysis (SVMs)
Replicability of features: Easy
Remarks:
(1) The algorithm detected all bots in the set scenario
(2) The system needed to be semisupervised, with human judgement augmenting the automated bot-identification process
(3) Powerful visualization tools were needed to help analysts spot suspicious bots

Model: COMPA
Feature types: Activity and content features
Features used: Time (hour of day), message source, message text (language), message topic, links in messages, direct user interaction, and proximity
Technique: Anomaly detection against a per-user behavioral profile, using content and URL similarity measures
Replicability of features: Easy
Remarks:
(1) Built behavioral profiles of users to detect deviations from the normal model
(2) Unlike its earlier version, COMPA examined isolated compromises that affect high-profile accounts
(3) Collecting profile information from users took a significant amount of time and computational resources
(4) Detection accuracy depended on the established behavioral profile and the selected threshold

Model: SAHP
Feature types: Activity and content features
Features used: Active time, message source (terminals), message topic, links, stop words, keywords, and mentions (@)
Technique: Combines the information-gain ratio with the analytic hierarchy process (AHP) algorithm
Replicability of features: Easy
Remarks:
(1) Represented users' profile features more comprehensively
(2) Improved on the earlier COMPA method for detecting compromised accounts
(3) Detection behavior was highly dependent on the threshold value, whose selection may introduce bias

Model: TB-CoAuth
Feature types: Content features
Features used: Content-free, content-specific, stylometric, and folksonomy features
Technique: Continuous authentication of textual content, incremental learning, and supervised machine-learning classifiers
Replicability of features: Hard
Remarks:
(1) Selected a variety of features, both content-free and content-specific
(2) Best classifier: SVM with an RBF (radial basis function) kernel
(3) F1-score: 94.57%
(4) In the era of big data, relying on statistical and manual feature selection is inappropriate

Model: HoID
Feature types: Content features
Features used: Hurst of Interest Distribution
Technique: Machine-learning classification (LDA) and statistical analysis (Hurst)
Replicability of features: Hard
Remarks:
(1) Feature selection is novel and precise, capturing personal uniqueness
(2) The detection process requires no human effort, which greatly improves efficiency and accuracy
(3) Best classifier: KNN
(4) F1-score: 95.90%
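The HoID row above hinges on the Hurst exponent of a user's interest-distribution series. As a point of reference, the classic way to estimate that exponent is rescaled-range (R/S) analysis: compute the range of cumulative deviations over the standard deviation for windows of growing size, then fit the slope of log(R/S) against log(window size). The sketch below is a minimal, illustrative R/S estimator — the function name, window scheme, and inputs are assumptions for demonstration, not the paper's implementation:

```python
import math
from statistics import mean, pstdev

def hurst_rs(series, min_chunk=8):
    """Estimate the Hurst exponent of a numeric series via R/S analysis.

    For each window size n (doubling from min_chunk up to half the series
    length), split the series into non-overlapping chunks, compute the
    rescaled range R/S per chunk, and fit log(mean R/S) ~ H * log(n).
    H near 0.5 suggests a random walk; H near 1 suggests strong persistence.
    """
    N = len(series)
    sizes, n = [], min_chunk
    while n <= N // 2:
        sizes.append(n)
        n *= 2

    log_n, log_rs = [], []
    for n in sizes:
        rs_vals = []
        for start in range(0, N - n + 1, n):
            chunk = series[start:start + n]
            m = mean(chunk)
            # Cumulative deviation from the chunk mean.
            cum, c = [], 0.0
            for x in chunk:
                c += x - m
                cum.append(c)
            r = max(cum) - min(cum)      # range of cumulative deviations
            s = pstdev(chunk)            # population standard deviation
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(math.log(n))
            log_rs.append(math.log(mean(rs_vals)))

    # Least-squares slope of log(R/S) vs. log(n) is the Hurst estimate.
    mx, my = mean(log_n), mean(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    den = sum((x - mx) ** 2 for x in log_n)
    return num / den
```

In the HoID setting, the input series would be derived from a user's topic (interest) distribution over time; abnormal accounts are expected to show a Hurst exponent that deviates from the persistent pattern of genuine users.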