Applied Computational Intelligence and Soft Computing
Volume 2011 (2011), Article ID 416308, 8 pages
http://dx.doi.org/10.1155/2011/416308
Research Article

Classification of Textual E-Mail Spam Using Data Mining Techniques

Institute of Information Technology of Azerbaijan National Academy of Sciences, 9 F. Agayev Street, Baku 1141, Azerbaijan

Received 19 May 2011; Revised 23 August 2011; Accepted 5 September 2011

Academic Editor: Sebastian Ventura

Copyright © 2011 Rasim M. Alguliev et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. Symantec, “State of spam and phishing. A monthly report 2010,” http://symantec.com/content/en/us/enterprise/other_resources/b-state_of_spam_and_phishing_report_09-2010.en-us.pdf.
  2. Ferris Research, “Cost of spam is flattening–our 2009 predictions,” http://www.ferris.com/2009/01/28/cost-of-spam-is-flattening-our-2009-predictions/.
  3. C. Ray and H. Hunt, “Tightening the net: a review of current and next generation spam filtering tools,” Computers and Security, vol. 25, no. 8, pp. 566–578, 2006.
  4. H. Wen-Feng and C. Te-Min, “An incremental cluster-based approach to spam filtering,” Expert Systems with Applications, vol. 34, no. 3, pp. 1599–1608, 2008.
  5. M. L. Sang, S. K. Dong, and S. P. Jong, “Spam detection using feature selection and parameters optimization,” in Proceedings of the 4th International Conference on Complex, Intelligent and Software Intensive Systems, (CISIS '10), pp. 883–888, Krakow, Poland, February 2010.
  6. F. S. Mehrnoush and B. Hamid, “Spam detection using dynamic weighted voting based on clustering,” in Proceedings of the 2nd International Symposium on Intelligent Information Technology Application, (IITA '08), pp. 122–126, Shanghai, China, December 2008.
  7. S. Minoru and Sh. Hiroyuki, “Spam detection using text clustering,” in Proceedings of the International Conference on Cyberworlds, (CW '05), pp. 316–319, Singapore, November 2005.
  8. C. Paulo, L. Clotilde, S. Pedro, et al., “Symbiotic data mining for personalized spam filtering,” in Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology, (IEEE/WIC/ACM), pp. 149–156, 2009.
  9. Kh. Ahmed, “An overview of content-based spam filtering techniques,” Informatica, vol. 31, no. 3, pp. 269–277, 2007.
  10. S. Nazirova, “Mechanism of classification of text spam messages collected in spam pattern bases,” in Proceedings of the 3rd International Conference on Problems of Cybernetics and Informatics, (PCI '10), vol. 2, pp. 206–209, 2010.
  11. W. Lauren, “Spam wars,” Communications of the ACM, vol. 46, no. 8, p. 136, 2003.
  12. G. Pawel and M. Jacek, “Fighting the spam wars: a re-mailer approach with restrictive aliasing,” ACM Transactions on Internet Technology, vol. 4, no. 1, pp. 1–30, 2004.
  13. L. Fulu, H. Mo-Han, and G. Pawel, “The community behavior of spammers,” http://web.media.mit.edu/~fulu/ClusteringSpammers.pdf.
  14. K. S. Xu, M. Kliger, Y. Chen, P. J. Woolf, and A. O. Hero, “Revealing social networks of spammers through spectral clustering,” in Proceedings of the IEEE International Conference on Communications, (ICC '09), Dresden, Germany, June 2009.
  15. K. S. Xu, M. Kliger, Y. Chen, et al., “Tracking communities of spammers by evolutionary clustering,” http://www.eecs.umich.edu/~xukevin/xu_spam_icml_2010_sna.pdf.
  16. G. Salton, Dynamic Library–Information System, Mir, Moscow, Russia, 1979.
  17. G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.
  18. S. V. Mochenov, A. M. Blednov, and U. A. Lugovskikh, “Vector representation of the textual information,” in Proceedings of the International Scientific Conference Materials, pp. 131–139, 2006.
  19. R. M. Alguliev and R. M. Alyguliev, “Automatic text documents summarization through sentences clustering,” Journal of Automation and Information Sciences, vol. 40, no. 9, pp. 53–63, 2008.
  20. G. Vishal and G. Si Lehal, “A survey of text mining techniques and applications,” Journal of Emerging Technologies in Web Intelligence, vol. 1, no. 1, pp. 60–76, 2009.
  21. X. Li and N. Ye, “A supervised clustering and classification algorithm for mining data with mixed variables,” IEEE Transactions on Systems, Man, and Cybernetics Part A, vol. 36, no. 2, pp. 396–406, 2006.
  22. T. Li, “A unified view on clustering binary data,” Machine Learning, vol. 62, no. 3, pp. 199–215, 2006.
  23. J. Grabmeier and A. Rudolph, “Techniques of cluster algorithms in data mining,” Data Mining and Knowledge Discovery, vol. 6, no. 4, pp. 303–360, 2002.
  24. M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “On clustering validation techniques,” Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.
  25. D. R. Tauritz, J. N. Kok, and I. G. Sprinkhuizen-Kuyper, “Adaptive Information Filtering using evolutionary computation,” Information Sciences, vol. 122, no. 2, pp. 121–140, 2000.
  26. J. Lijuan and F. Liping, “Text classification based on Ant Colony Optimization,” in Proceedings of the 3rd International Conference on Information and Computing, (ICIC '10), pp. 229–232, Jiang Su, China, June 2010.
  27. E. R. Hruschka, R. J. G. B. Campello, A. A. Freitas, and A. C. P. L. F. de Carvalho, “A survey of evolutionary algorithms for clustering,” IEEE Transactions on Systems, Man and Cybernetics Part C, vol. 39, no. 2, pp. 133–155, 2009.
  28. R. M. Alguliev and R. M. Aliguliyev, “Fast genetic algorithm for solving of the clustering problem of text documents,” Artificial Intelligence, vol. 3, pp. 698–707, 2005 (Russian).
  29. R. Ghaemi, N. Sulaiman, H. Ibrahim, et al., “A review: accuracy optimization in clustering ensembles using genetic algorithms,” Artificial Intelligence Review, vol. 35, no. 4, pp. 287–318, 2011.
  30. R. M. Alguliev and R. M. Aliguliyev, “A new summarization method of text documents and evaluation of classification result in three aspects,” Telecommunications, vol. 3, pp. 7–16, 2006 (Russian).
  31. R. M. Alguliev and R. M. Aliguliyev, “Effective summarization method of text documents,” in Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, (WI '05), pp. 264–271, September 2005.
  32. R. M. Alyguliyev, “The two-stage unsupervised approach to multidocument summarization,” Automatic Control and Computer Sciences, vol. 43, no. 5, pp. 276–284, 2009.
  33. R. M. Alguliev and S. A. Nazirova, “Mechanism of forming and realization of anti-spam policy,” Telecommunications, vol. 12, pp. 38–43, 2009 (Russian).
  34. L. Kyung-Chan, K. Seung-Shik, and H. Kwang-Soo, “A term weighting approach for text categorization,” Lecture Notes in Computer Science, vol. 3689, pp. 673–678, 2005.
  35. G. Patanè and M. Russo, “Comparisons between fuzzy and hard clustering techniques,” in Proceedings of the Advances in Fuzzy Systems and Intelligent Technologies, (WILF '99), pp. 176–184, 1999.
  36. N. N. Glibovec and S. A. Medvid', “Genetic algorithms used to solve scheduling problems,” Cybernetics and System Analysis, vol. 39, no. 1, pp. 81–90, 2003.
  37. T. Witkovski, S. Elzway, and A. Antchak, “Designing of the main operations of genetic algorithms for production scheduling,” Journal of Automation and Information Sciences, vol. 35, no. 12, pp. 50–58, 2003.
  38. R. M. Alguliev and R. M. Alyguliev, “A genetic approach to quasi-optimal assignment of tasks in the distributed system,” Telecommunications and Radio Engineering, vol. 64, no. 2, pp. 97–108, 2005.
  39. A. L. Olsen, “Penalty functions and the knapsack problem,” in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 554–558, June 1994.
  40. Z.-J. Lee, S.-F. Su, C.-Y. Lee, et al., “A heuristic genetic algorithm for solving resource allocation problems,” Knowledge and Information Systems, vol. 5, no. 4, pp. 503–511, 2003.