Review Article

Machine Learning Techniques for Spam Detection in Email and IoT Platforms: Analysis and Research Challenges

Table 3

Comparison of unsupervised learning techniques used for spam filtering.

| Authors | Algorithm used | Dataset | Accuracy (%) | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Ahmed [80] | Improved digest and DBSCAN | SpamAssassin | 96.7 | The model divides each email into fixed-length strings before clustering, which improves accuracy | The speed of the model depends on the length of the strings |
| Sharma and Rastogi [78] | K-means clustering | UCI dataset | 92.76 | The data are discretized using supervised attribute filters, and 10-fold cross-validation is used | Comparing multiple algorithms takes a considerable amount of time |
| Cabrera-León et al. [81] | Unsupervised artificial neural networks | Enron email | 95 | The system is robust to the word obfuscation used in spam, regardless of whether stemming or lemmatization is applied | Poor false negative and false positive rates, around 11% and 4%, respectively |
| Sasaki and Shinnou [82] | Spherical k-means algorithm | Ling-spam | 96.04 | The model uses various contents of spam emails | The model does not cover updating of spam contents or relevance feedback |
| Narisawa et al. [83] | Equivalence relations of strings | Japanese web forums | 95 | The model is scalable and language-independent | Because the model uses n-grams of documents, results depend on the value of "n" |
| Tan et al. [79] | UNIK and SD2 | Social network sites data | 93 | The system is highly robust to an increased level of spam attacks | The system cannot handle short URLs |
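
The dependence on "n" noted for n-gram-based models can be illustrated with a minimal sketch. It is not the method of any work in Table 3. It assumes character n-grams compared by Jaccard similarity, and the two messages and the helper names `char_ngrams` and `jaccard` are hypothetical.

```python
from __future__ import annotations


def char_ngrams(text: str, n: int) -> set[str]:
    """Return the set of character n-grams of text (lowercased, whitespace collapsed)."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical messages for illustration: the second obfuscates two characters.
spam_a = "Cheap meds online, buy now"
spam_b = "Cheap m3ds online, buy n0w"

# Similarity between the same pair of messages for several values of n.
similarity_by_n = {
    n: jaccard(char_ngrams(spam_a, n), char_ngrams(spam_b, n))
    for n in (2, 3, 5)
}

for n, score in similarity_by_n.items():
    print(f"n={n}: similarity={score:.3f}")
```

In this toy case each obfuscated character corrupts more n-grams as n grows, so the measured similarity of the same two messages falls with larger n. Any clustering threshold would therefore have to be tuned together with n.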