Research Article

Boosting Accuracy of Classical Machine Learning Antispam Classifiers in Real Scenarios by Applying Rough Set Theory

Table 2

Commonly used publicly available spam corpora.

Corpus            % legitimate   % spam   Number of messages

LingSpam¹         83.3           16.6     2893
PU1¹              56.2           43.8     1099
PU2¹              80.0           20.0     721
PU3¹              51.0           49.0     4139
PUA¹              50.0           50.0     1142
Spambase²         39.4           60.6     4601
2005 TRECSpam³    43.0           57.0     92189
2006 TRECSpam³    35.0           65.0     37822
2007 TRECSpam³    33.5           66.5     75419
SpamAssassin⁴     74.5           25.5     9332

¹ Available at https://labs-repos.iit.demokritos.gr/skel/i-config/downloads/.
² Available at http://ftp.ics.uci.edu/pub/machine-learning-databases/spambase/.
³ Available at http://trec.nist.gov/data/spam.html.
⁴ Available at https://spamassassin.apache.org/publiccorpus/.
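
The class proportions in Table 2 can be turned into approximate per-class message counts and a simple imbalance ratio, which makes it easier to compare how skewed each corpus is. The following is a minimal sketch, not part of the original study: the names `CORPORA`, `approx_class_counts`, and `imbalance_ratio` are hypothetical, and the counts are only estimates because the published percentages are rounded.

```python
# Rows copied from Table 2: (corpus, % legitimate, % spam, number of messages).
CORPORA = [
    ("LingSpam", 83.3, 16.6, 2893),
    ("PU1", 56.2, 43.8, 1099),
    ("PU2", 80.0, 20.0, 721),
    ("PU3", 51.0, 49.0, 4139),
    ("PUA", 50.0, 50.0, 1142),
    ("Spambase", 39.4, 60.6, 4601),
    ("2005 TRECSpam", 43.0, 57.0, 92189),
    ("2006 TRECSpam", 35.0, 65.0, 37822),
    ("2007 TRECSpam", 33.5, 66.5, 75419),
    ("SpamAssassin", 74.5, 25.5, 9332),
]


def approx_class_counts(pct_legit, pct_spam, n_messages):
    """Estimate (legitimate, spam) counts from rounded percentages.

    The results are approximations: the table reports percentages to one
    decimal place, so the two counts need not sum exactly to n_messages.
    """
    legit = round(n_messages * pct_legit / 100)
    spam = round(n_messages * pct_spam / 100)
    return legit, spam


def imbalance_ratio(pct_legit, pct_spam):
    """Majority-to-minority class ratio; 1.0 means perfectly balanced."""
    return max(pct_legit, pct_spam) / min(pct_legit, pct_spam)


if __name__ == "__main__":
    # Print one summary line per corpus.
    for name, pct_legit, pct_spam, n in CORPORA:
        legit, spam = approx_class_counts(pct_legit, pct_spam, n)
        ratio = imbalance_ratio(pct_legit, pct_spam)
        print(f"{name:15s} legit~{legit:6d} spam~{spam:6d} ratio={ratio:.2f}")
```

Under these assumptions, PUA is the only perfectly balanced corpus (ratio 1.0), while LingSpam and PU2 are dominated by legitimate messages and the TRECSpam corpora and Spambase by spam.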