International Scholarly Research Notices
Volume 2014, Article ID 717092, 10 pages
Research Article

A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

Department of Computer Science and Engineering, Institute of Engineering and Management, West Bengal 700091, India

Received 16 April 2014; Revised 23 July 2014; Accepted 18 August 2014; Published 29 October 2014

Academic Editor: Sebastian Ventura

Copyright © 2014 Subhajit Dey Sarkar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
