Scientific Programming, 2016

Research Article

WSF2: A Novel Framework for Filtering Web Spam

Table 2

Available datasets for web spam research.

| Corpus name | Hosts | Pages | Domain crawled | % Spam | Available at |
|---|---|---|---|---|---|
| WebSpam UK2006 | 11,400 | 77.9M | United Kingdom (.uk) | 26% | |
| WebSpam UK2007 | 114,529 | 105M | United Kingdom (.uk) | 5.30% | |
| WebSpam UK2011 | n/a | 3,766 | United Kingdom (.uk) | 53% | |
| DC2010 | 99,000 | 23M | Europe (.eu) | 3.2% | |
| Webb Spam Corpus 2006 | n/a | 350,000 | Links found in millions of spam e-mails | 100% | |
| Webb Spam Corpus 2011 | n/a | 330,000 | Links found in millions of spam e-mails | 100% | |
