Security and Communication Networks

Review Article

Machine Learning Techniques for Spam Detection in Email and IoT Platforms: Analysis and Research Challenges

Comparison of unsupervised learning techniques used for spam filtering.


Authors	Algorithm used	Dataset	Accuracy (%)	Advantages	Disadvantages

Ahmed [80]	Improved digest and DBSCAN	SpamAssassin	96.7	The proposed model divides each email into fixed-length strings before clustering, which improves accuracy	The speed of the proposed model depends on the string length
Sharma and Rastogi [78]	K-means clustering	UCI dataset	92.76	The data are discretized using supervised attribute filters, and 10-fold cross-validation is used	Comparing multiple algorithms takes a considerable amount of time
Cabrera-León et al. [81]	Unsupervised artificial neural networks	Enron email	95	The system is robust to the word obfuscation used in spam, regardless of whether stemming or lemmatization is applied	High false-negative and false-positive rates, around 11% and 4%, respectively
Sasaki and Shinnou [82]	Spherical k-means algorithm	Ling-Spam	96.04	The model uses the varied content of spam emails	The model does not support updating spam content or relevance feedback
Narisawa et al. [83]	Equivalence relations of strings	Japanese web forums	95	The model is scalable and language-independent	Because the model uses document n-grams, results depend on the value of n
Tan et al. [79]	UNIK and SD2	Social network site data	93	It is highly robust to increasing levels of spam attacks	The proposed system cannot handle shortened URLs
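The table summarizes Ahmed [80] as splitting each email into fixed-length strings and clustering with DBSCAN. The following is only an illustrative sketch of that general idea, not the paper's improved digest. It assumes a hypothetical `digest` that cuts text into `chunk_len`-character pieces and hashes each with SHA-1, uses Jaccard distance between digest sets, and implements a minimal DBSCAN; the chunk length, hash, distance, `eps`, and `min_pts` are all assumed values.

```python
import hashlib

def digest(text: str, chunk_len: int = 8) -> set[str]:
    """Hypothetical digest: split text into fixed-length chunks and hash each one."""
    chunks = [text[i:i + chunk_len] for i in range(0, len(text), chunk_len)]
    return {hashlib.sha1(c.encode("utf-8")).hexdigest() for c in chunks}

def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A intersect B| / |A union B|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def dbscan(points, dist, eps: float, min_pts: int) -> list[int]:
    """Minimal DBSCAN: returns a cluster id per point, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0

    def neighbors(i):
        # The eps-neighbourhood includes the point itself (distance 0).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels

# Three near-duplicate spam messages (only the trailing reference differs) and one ham message.
emails = [
    "Cheap meds online!! Order today and save big. Ref 001",
    "Cheap meds online!! Order today and save big. Ref 002",
    "Cheap meds online!! Order today and save big. Ref 003",
    "Hi team, the quarterly report is attached for review.",
]
labels = dbscan([digest(e) for e in emails], jaccard_distance, eps=0.4, min_pts=2)
print(labels)
```

Because the chunks are position-aligned, an insertion near the start of an email shifts every later chunk, so this naive digest is fragile in ways a real improved digest would presumably need to address; it also makes the running time grow with the number of chunks, which loosely mirrors the string-length dependence noted in the table.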
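Spherical k-means, listed for Sasaki and Shinnou [82], differs from ordinary k-means (used by Sharma and Rastogi [78]) in that documents and centroids are kept at unit length and assignment uses cosine similarity rather than Euclidean distance. The sketch below is a minimal pure-Python illustration under assumed choices: raw whitespace-token counts as features, the first `k` documents as initial centroids, and a fixed iteration cap. It is not the authors' implementation.

```python
import math
from collections import Counter

def unit(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)

def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Sparse dot product; equals cosine similarity when both inputs are unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def spherical_kmeans(docs: list[str], k: int, iters: int = 20) -> list[int]:
    # Assumed features: lower-cased whitespace tokens, normalised to unit length.
    vecs = [unit(dict(Counter(d.lower().split()))) for d in docs]
    # Assumed initialisation: the first k documents serve as the initial centroids.
    centroids = [dict(v) for v in vecs[:k]]
    labels = [-1] * len(vecs)
    for _ in range(iters):
        # Assignment step: pick the centroid with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vecs]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: centroid = normalised sum of its member vectors.
        for c in range(k):
            total: dict[str, float] = {}
            for v, lab in zip(vecs, labels):
                if lab == c:
                    for term, w in v.items():
                        total[term] = total.get(term, 0.0) + w
            if total:  # keep the previous centroid if the cluster is empty
                centroids[c] = unit(total)
    return labels

docs = [
    "win free money now",
    "meeting agenda for monday",
    "free money win prize",
    "monday meeting notes attached",
]
print(spherical_kmeans(docs, k=2))
```

Normalising every vector removes the effect of message length, so a short spam message and a long one with the same word proportions land close together, which is one common motivation for the spherical variant on text.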
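The disadvantage listed for Narisawa et al. [83], that results depend on the value of n, can be seen by comparing two messages through their n-gram sets at different n. This is a minimal sketch assuming character-level n-grams and Jaccard similarity as the comparison; both are illustrative choices and not the equivalence-relation method of the paper.

```python
def char_ngrams(text: str, n: int) -> set[str]:
    """Set of character n-grams of text (empty if the text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A intersect B| / |A union B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Illustrative pair: the second message obfuscates "free" as "fr3e".
s1 = "get free money now"
s2 = "get fr3e money now"
for n in (2, 3, 5):
    print(n, round(jaccard(char_ngrams(s1, n), char_ngrams(s2, n)), 3))
```

A single substituted character alters every n-gram that overlaps it, so larger n makes the same obfuscation look like a bigger difference, while very small n makes unrelated texts share many n-grams; either way, the similarity scores and any clustering built on them shift with n.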