The Scientific World Journal
Volume 2014, Article ID 649260, 16 pages
http://dx.doi.org/10.1155/2014/649260
Research Article

An Ant Colony Optimization Based Feature Selection for Web Page Classification

Department of Computer Engineering, Çukurova University, Balcali, Sarıçam, 01330 Adana, Turkey

Received 25 April 2014; Revised 20 June 2014; Accepted 22 June 2014; Published 17 July 2014

Academic Editor: T. O. Ting

Copyright © 2014 Esra Saraç and Selma Ayşe Özel. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. X. Qi and B. D. Davison, “Web page classification: features and algorithms,” ACM Computing Surveys, vol. 41, no. 2, article 12, 2009.
  2. Yahoo!, https://maktoob.yahoo.com/?p=us.
  3. Open Directory Project, http://www.dmoz.org/.
  4. C. Chen, H. Lee, and C. Tan, “An intelligent web-page classifier with fair feature-subset selection,” Engineering Applications of Artificial Intelligence, vol. 19, no. 8, pp. 967–978, 2006.
  5. W. Shang, H. Huang, H. Zhu, Y. Lin, Y. Qu, and Z. Wang, “A novel feature selection algorithm for text categorization,” Expert Systems with Applications, vol. 33, no. 1, pp. 1–5, 2007.
  6. Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 412–420, Nashville, Tenn, USA, July 1997.
  7. M. H. Aghdam, N. Ghasem-Aghaee, and M. E. Basiri, “Text feature selection using ant colony optimization,” Expert Systems with Applications, vol. 36, no. 3, pp. 6843–6853, 2009.
  8. M. Dorigo, Optimization, learning and natural algorithms [Ph.D. thesis], Politecnico di Milano, Milan, Italy, 1992.
  9. M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: optimization by a colony of cooperating agents,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 26, no. 1, pp. 29–41, 1996.
  10. L. Liu, Y. Dai, and J. Gao, “Ant colony optimization algorithm for continuous domains based on position distribution model of ant colony foraging,” The Scientific World Journal, vol. 2014, Article ID 428539, 9 pages, 2014.
  11. T. M. Mitchell, Machine Learning, McGraw-Hill, New York, NY, USA, 1st edition, 1997.
  12. S. Chakrabarti, M. van den Berg, and B. Dom, “Focused crawling: a new approach to topic-specific Web resource discovery,” Computer Networks, vol. 31, no. 11–16, pp. 1623–1640, 1999.
  13. S. A. Özel and E. Saraç, “Focused crawler for finding professional events based on user interests,” in Proceedings of the 23rd International Symposium on Computer and Information Sciences (ISCIS '08), pp. 441–444, Istanbul, Turkey, October 2008.
  14. F. Menczer and R. K. Belew, “Adaptive information agents in distributed textual environments,” in Proceedings of the 2nd International Conference on Autonomous Agents (AGENTS '98), pp. 157–164, Minneapolis, Minn, USA, May 1998.
  15. C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
  16. W. J. Wilbur and K. Sirotkin, “The automatic identification of stop words,” Journal of Information Science, vol. 18, no. 1, pp. 45–55, 1992.
  17. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience Press, New York, NY, USA, 1991.
  18. S. Guiasu, Information Theory with Applications, McGraw-Hill Press, New York, NY, USA, 1st edition, 1977.
  19. Y. Yang, “Noise reduction in a statistical approach to text categorization,” in Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 256–263, July 1995.
  20. Y. Yang and W. J. Wilbur, “Using corpus statistics to remove redundant words in text categorization,” Journal of the American Society for Information Science, vol. 47, no. 5, pp. 357–369, 1996.
  21. Reuters Dataset, http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html.
  22. R. Jensen and Q. Shen, “Fuzzy-rough data reduction with ant colony optimization,” Fuzzy Sets and Systems, vol. 149, no. 1, pp. 5–20, 2005.
  23. Y. Chen, D. Miao, and R. Wang, “A rough set approach to feature selection based on ant colony optimization,” Pattern Recognition Letters, vol. 31, no. 3, pp. 226–233, 2010.
  24. C. L. Huang, “ACO-based hybrid classification system with feature subset selection and model parameters optimization,” Neurocomputing, vol. 73, no. 1–3, pp. 438–448, 2009.
  25. R. K. Sivagaminathan and S. Ramakrishnan, “A hybrid approach for feature subset selection using neural networks and ant colony optimization,” Expert Systems with Applications, vol. 33, no. 1, pp. 49–60, 2007.
  26. S. M. Vieira, J. M. C. Sousa, and T. A. Runkler, “Two cooperative ant colonies for feature selection using fuzzy models,” Expert Systems with Applications, vol. 37, no. 4, pp. 2714–2723, 2010.
  27. S. Nemati and M. E. Basiri, “Text-independent speaker verification using ant colony optimization-based selected features,” Expert Systems with Applications, vol. 38, no. 1, pp. 620–630, 2011.
  28. M. M. Kabir, M. Shahjahan, and K. Murase, “A new hybrid ant colony optimization algorithm for feature selection,” Expert Systems with Applications, vol. 39, no. 3, pp. 3747–3763, 2012.
  29. E. Akarsu and A. Karahoca, “Simultaneous feature selection and ant colony clustering,” Procedia Computer Science, vol. 3, pp. 1432–1438, 2011.
  30. A. Al-Ani, “Ant colony optimization for feature subset selection,” World Academy of Science, Engineering and Technology, vol. 1, no. 4, 2007.
  31. W. Wang, Y. Jiang, and S. W. Chen, “Feature subset selection based on ant colony optimization and support vector machine,” in Proceedings of the 7th WSEAS International Conference of Signal Processing, Computational Geometry & Artificial Vision, p. 182, Athens, Greece, August 2007.
  32. N. Jain and J. P. Singh, “Modification of ant algorithm for feature selection,” in Proceedings of the International Conference on Control Automation, Communication and Energy Conservation (INCACEC '09), June 2009.
  33. N. Abd-Alsabour and M. Randall, “Feature selection for classification using an ant colony system,” in Proceedings of the 6th IEEE International Conference on e-Science Workshops (e-ScienceW '10), pp. 86–91, Brisbane, Australia, December 2010.
  34. M. H. Rasmy, M. El-Beltagy, M. Saleh, and B. Mostafa, “A hybridized approach for feature selection using ant colony optimization and ant-miner for classification,” in Proceedings of the 8th International Conference on Informatics and Systems (INFOS '12), pp. BIO-211–BIO-219, Cairo, Egypt, May 2012.
  35. R. S. Parpinelli, H. S. Lopes, and A. Freitas, “An ant colony algorithm for classification rule discovery,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 4, pp. 321–332, 2002.
  36. N. Holden and A. A. Freitas, “Web page classification with an ant colony algorithm,” in Parallel Problem Solving from Nature—PPSN VIII, vol. 3242 of Lecture Notes in Computer Science, pp. 1092–1102, Springer, Berlin, Germany, 2004.
  37. R. Jensen and Q. Shen, “Web page classification with ACO-enhanced fuzzy-rough feature selection,” vol. 4259 of Lecture Notes in Artificial Intelligence, pp. 147–156, 2006.
  38. M. Janaki Meena, K. R. Chandran, A. Karthik, and A. Vijay Samuel, “An enhanced ACO algorithm to select features for text categorization and its parallelization,” Expert Systems with Applications, vol. 39, no. 5, pp. 5861–5871, 2012.
  39. J. A. Mangai, V. S. Kumar, and S. A. Balamurugan, “A novel feature selection framework for automatic web page classification,” International Journal of Automation and Computing, vol. 9, no. 4, pp. 442–448, 2012.
  40. WEKA, http://www.cs.waikato.ac.nz/~ml/weka.
  41. S. Kim and B. Zhang, “Genetic mining of HTML structures for effective web-document retrieval,” Applied Intelligence, vol. 18, no. 3, pp. 243–256, 2003.
  42. A. Ribeiro, V. Fresno, M. C. Garcia-Alegre, and D. Guinea, “Web page classification: a soft computing approach,” Lecture Notes in Artificial Intelligence, vol. 2663, pp. 103–112, 2003.
  43. A. Trotman, “Choosing document structure weights,” Information Processing & Management, vol. 41, no. 2, pp. 243–264, 2005.
  44. M. F. Porter, “An algorithm for suffix stripping,” Program, vol. 14, no. 3, pp. 130–137, 1980.
  45. M. Dorigo and T. Stützle, Ant Colony Optimization, The MIT Press, 2004.
  46. T. Bäck, Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York, NY, USA, 1996.
  47. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, Calif, USA, 1st edition, 1993.
  48. C. J. van Rijsbergen, Information Retrieval, Butterworth-Heinemann, London, UK, 2nd edition, 1979.
  49. M. Craven, D. DiPasquo, D. Freitag et al., “Learning to extract symbolic knowledge from the World Wide Web,” in Proceedings of the 15th National Conference on Artificial Intelligence (AAAI '98), pp. 509–516, AAAI Press, July 1998.
  50. CMU, http://www.cs.cmu.edu/.
  51. WebKB, http://www.cs.cmu.edu/~webkb/.
  52. DBLP, http://www.informatik.uni-trier.de/~ley/db/.
  53. Google, http://www.google.com.
  54. E. Saraç and S. A. Özel, “URL tabanlı web sayfası sınıflandırma,” in Akıllı Sistemlerde Yenilikler ve Uygulamaları Sempozyumu (ASYU '10), pp. 13–18, 2010.
  55. M.-Y. Kan and H. O. N. Thi, “Fast webpage classification using URL features,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM '05), pp. 325–326, New York, NY, USA, November 2005.
  56. S. A. Özel, “A Web page classification system based on a genetic algorithm using tagged-terms as features,” Expert Systems with Applications, vol. 38, no. 4, pp. 3407–3415, 2011.
  57. E. P. Jiang, “Learning to integrate unlabeled data in text classification,” in Proceedings of the 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT '10), pp. 82–86, Chengdu, China, July 2010.
  58. T. Joachims, “Transductive inference for text classification using support vector machines,” in Proceedings of the 16th International Conference on Machine Learning, pp. 200–209, 1999.