| Authors | Algorithm | Dataset | Accuracy | Advantages | Limitations |
| --- | --- | --- | --- | --- | --- |
| DeBarr and Wechsler [42] | Random forest | Custom collection | 95.2% | Achieved good accuracy by combining multiple trees | The dataset used was not a standard benchmark |
| Rusland et al. [63] | Modified Naïve Bayes with selective features | Spambase and spam data | 88% on Spambase; 83% on spam data | Selecting only a subset of features reduces processing time | Accuracy was comparatively low, and the model was not very sophisticated |
| Haluzu et al. [67] | Bayes Net, SVM, and Naïve Bayes | Twitter and Facebook datasets | 90% using SVM | A combined dataset was used for training and testing the classifiers | Running multiple algorithms on a combined dataset increases training time |
| Hijawi et al. [41] | Multilayer perceptron (MLP), Naïve Bayes, random forest, and decision tree | SpamAssassin | 99.3% using random forest | A list of the most common spam features improves the detection rate | Despite a sizable corpus of 6,050 emails, only a small number of features were extracted from it |
| Banday and Jan [55] | Naïve Bayes, k-nearest neighbor, SVM, and additive regression tree | Real-life dataset | 96.69% using SVM | The spam filter was built from 8,000 real-life spam emails | The model loses effectiveness as spammers continuously change the characteristics the filter was built on |
| Verma and Sofat [48] | ID3 algorithm and hidden Markov model | Enron dataset | 89% | A preclassified dataset reduces processing time | An 11% error rate is too high for a spam filter |
| Subasi et al. [40] | CART, C4.5, REP tree, LAD tree, and NBT | UCI dataset | 95.1% | 10-fold cross-validation allows more reliable evaluation | Only a small number of features was used |
| Zheng et al. [12] | SVM | Weibo social network data | 99.5% | Both user-content and behavior features are used to detect spammers | Feature extraction relies on statistical analysis and manual selection |
| Garavand et al. [72] | SVM, deep learning, and particle swarm optimization | Standard UCI datasets (70% used for training) | 93% using SVM | Deep learning models are used for feature extraction | The neural networks require considerable training time for feature extraction |
| Olatunji et al. [5] | Extreme learning machine (ELM) and SVM classifiers | Enron dataset | 94.06% using SVM | Achieved higher accuracy than previous studies on the same dataset | SVM takes longer than ELM to reach the reported accuracy |
| Jamil et al. [10] | SVM, KNN, DT, and LR | Health fitness data | 92.1% using SVM | A smart-contract-enabled blockchain technique makes the system more secure | Interoperability of the proposed model with IoT frameworks is not evaluated |
| Arif et al. [11] | XGBoost, bagged model, and generalized linear model with stepwise feature selection | Smart home dataset | 91.8% using the generalized linear model with stepwise feature selection | PCA was applied, which enhances the system's accuracy | Climatic and surrounding-environment features of the IoT devices are not considered |
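Several rows above evaluate the same classifier families (Naïve Bayes, SVM, random forest) and rely on k-fold cross-validation for assessment, as in Subasi et al. [40]. The sketch below is a minimal illustration of that evaluation protocol in scikit-learn, not a reconstruction of any cited author's pipeline: the toy corpus, the TF-IDF features, and all hyperparameters are placeholder assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus (1 = spam, 0 = ham), repeated only so that 10-fold
# stratified cross-validation has enough samples per class. A real run
# would load a benchmark corpus such as SpamAssassin or UCI Spambase.
spam = ["win a free prize now", "cheap meds without prescription",
        "claim your lottery reward today", "urgent account verification needed"] * 5
ham = ["meeting moved to 3 pm tomorrow", "please review the attached report",
       "lunch on friday sounds good", "notes from today's lecture attached"] * 5
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

# Three of the classifier families compared in the table above.
classifiers = {
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, clf in classifiers.items():
    # TF-IDF converts raw email text into the numeric features the models expect.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, texts, labels, cv=10)
    print(f"{name}: mean 10-fold accuracy = {scores.mean():.3f}")
```

Because the toy corpus repeats a handful of templates, the resulting scores are inflated by duplicate leakage across folds; on the benchmark corpora cited above (SpamAssassin, Spambase, Enron), the same loop yields a fair comparison.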