WSF2: A Novel Framework for Filtering Web Spam

Table 2

Available datasets for web spam research.

| Corpus name | Hosts | Pages | Domain crawled | % Spam | Available at |
|---|---|---|---|---|---|
| WebSpam UK2006 | 11,400 | 77.9M | United Kingdom (.uk) | 26% | |
| WebSpam UK2007 | 114,529 | 105M | United Kingdom (.uk) | 5.30% | |
| WebSpam UK2011 | n/a | 3,766 | United Kingdom (.uk) | 53% | |
| DC2010 | 99,000 | 23M | Europe (.eu) | 3.2% | |
| Webb Spam Corpus 2006 | n/a | 350,000 | Links found in millions of spam e-mails | 100% | |
| Webb Spam Corpus 2011 | n/a | 330,000 | Links found in millions of spam e-mails | 100% | |

