Review Article
Machine Learning Techniques for Spam Detection in Email and IoT Platforms: Analysis and Research Challenges
Table 3
Comparison of unsupervised learning techniques used for spam filtering.
| Authors | Algorithm used | Dataset | Accuracy (%) | Advantages | Disadvantages |
|---|---|---|---|---|---|
| Ahmed [80] | Improved digest and DBSCAN | SpamAssassin | 96.7 | The proposed model divides emails into fixed-length strings before clustering, which improves accuracy | The speed of the proposed model depends on the length of the strings |
| Sharma and Rastogi [78] | K-means clustering | UCI dataset | 92.76 | The data are discretized using supervised attribute filters, and 10-fold cross-validation is used | Comparing multiple algorithms takes a considerable amount of time |
| Cabrera-León et al. [81] | Unsupervised artificial neural networks | Enron email | 95 | The system is robust to the word obfuscation used in spam, independently of whether stemming or lemmatization is applied | Relatively high false negative and false positive rates, around 11% and 4%, respectively |
| Sasaki and Shinnou [82] | Spherical k-means algorithm | Ling-Spam | 96.04 | The model exploits the varied content of spam emails | The proposed model does not support updating spam content or relevance feedback |
| Narisawa et al. [83] | Equivalence relations of strings | Japanese web forums | 95 | The model is scalable and language-independent | Because the model uses document n-grams, results depend on the value of n |
| Tan et al. [79] | UNIK and SD2 | Social network sites data | 93 | It is highly robust to increased levels of spam attacks | The proposed system cannot handle short URLs |
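Several entries in Table 3 rely on k-means-style clustering, including the spherical k-means approach of Sasaki and Shinnou [82]. As a rough illustration of how such a method groups messages without labels, the following is a minimal pure-Python sketch of spherical k-means over bag-of-words vectors. It is not the implementation used in [82]: the helper names (`term_frequency_vector`, `spherical_kmeans`), the toy emails, and the deterministic farthest-first initialization are all assumptions made for this example.

```python
import math
from collections import Counter


def term_frequency_vector(text, vocabulary):
    """Bag-of-words term counts for `text` over a fixed vocabulary (hypothetical helper)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def normalize(vector):
    """Scale a vector to unit L2 length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, max_iters=50):
    """Cluster vectors by cosine similarity; returns (labels, centroids)."""
    data = [normalize(v) for v in vectors]

    # Assumed initialization for this sketch (farthest-first traversal):
    # start from the first document, then repeatedly add the document whose
    # best similarity to the already-chosen centroids is lowest.
    centroids = [data[0]]
    while len(centroids) < k:
        candidate = min(data, key=lambda x: max(dot(x, c) for c in centroids))
        centroids.append(candidate)

    labels = None
    for _ in range(max_iters):
        # Assignment step: for unit vectors, cosine similarity is the dot product.
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in data]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: mean of each cluster, projected back onto the unit sphere.
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            if members:
                mean = [sum(column) / len(members) for column in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids


# Toy, made-up emails: two spam-like and two legitimate-looking messages.
emails = [
    "win free money now",
    "claim your free prize now",
    "meeting agenda for monday",
    "project meeting notes for monday",
]
vocabulary = sorted({word for email in emails for word in email.lower().split()})
vectors = [term_frequency_vector(email, vocabulary) for email in emails]
labels, _ = spherical_kmeans(vectors, k=2)
print(labels)  # → [0, 0, 1, 1]
```

Normalizing every vector to unit length is what makes this variant "spherical": document length no longer dominates the distance, so clustering depends on word proportions rather than raw counts. Because the method is unsupervised, the cluster indices carry no spam or ham meaning by themselves; a small labeled sample or manual inspection is still needed to decide which cluster corresponds to spam.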