Research Article

WSF2: A Novel Framework for Filtering Web Spam

Algorithm 2

Dummy filter definition for the WSF2 platform.
(00) web_features SVM check_svm();
() describe SVM Classifies a web page as spam using Support Vector Machine classifier
() score SVM 3
()
() web_features TREE_95 check_tree(0.95, 0.99);
() describe TREE_99 C5.0 between 0.99 and 1.00
() score TREE_99 1.5
()
() web_features TREE_99 check_tree(0.99, 1.00);
() describe TREE_99 C5.0 between 0.99 and 1.00
() score TREE_99 3
()
() web_body HAS_VIAGRA_ON_WEB_BODY eval( "[vV][iI?1!][aA][gG][rR][aA]")
() describe HAS_VIAGRA_ON_WEB_BODY Check if the web page contains references to viagra on body
() score HAS_VIAGRA_ON_WEB_BODY 2
()
() meta HAS_HIGH_SPAM_RATE (SVM & (TREE_95 || TREE_99))
() describe HAS_HIGH_SPAM_RATE Has high probability of being spam
() score HAS_HIGH_SPAM_RATE +
()
() required_score 5