Scientific Programming, 2016
Research Article
WSF2: A Novel Framework for Filtering Web Spam
Table 2
Available datasets for web spam research.
| Corpus name | Hosts | Pages | Domain crawled | % Spam | Available at |
|---|---|---|---|---|---|
| WebSpam UK2006 | 11,400 | 77.9M | United Kingdom (.uk) | 26% | http://chato.cl/webspam/datasets/uk2006/ |
| WebSpam UK2007 | 114,529 | 105M | United Kingdom (.uk) | 5.30% | http://chato.cl/webspam/datasets/uk2007/ |
| WebSpam UK2011 | n/a | 3,766 | United Kingdom (.uk) | 53% | https://sites.google.com/site/heiderawahsheh/home/web-spam-2011-datasets/uk-2011-web-spam-dataset |
| DC2010 | 99,000 | 23M | Europe (.eu) | 3.2% | https://dms.sztaki.hu/en/letoltes/ecmlpkdd-2010-discovery-challenge-data-set |
| Webb Spam Corpus 2006 | n/a | 350,000 | Links found in millions of spam e-mails | 100% | http://www.cc.gatech.edu/projects/doi/WebbSpamCorpus.html |
| Webb Spam Corpus 2011 | n/a | 330,000 | Links found in millions of spam e-mails | 100% | http://www.cc.gatech.edu/projects/doi/WebbSpamCorpus.html |