BioMed Research International
Volume 2015, Article ID 878291, 14 pages
http://dx.doi.org/10.1155/2015/878291
Research Article

A Linear-RBF Multikernel SVM to Classify Big Text Corpora

Department of Computer Science, Higher Technical School of Computer Engineering, University of Vigo, 32004 Ourense, Spain

Received 22 August 2014; Revised 10 November 2014; Accepted 13 November 2014

Academic Editor: Juan M. Corchado

Copyright © 2015 R. Romero et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.
  2. R. Barandela, J. S. Sanchez, V. Garcia, and E. Rangel, “Strategies for learning in class imbalance problems,” Pattern Recognition, vol. 36, no. 3, pp. 849–851, 2003.
  3. G. M. Weiss, “Mining with rarity: a unifying framework,” ACM SIGKDD Explorations Newsletter, vol. 6, pp. 7–19, 2004.
  4. S. Tan, “Neighbor-weighted K-nearest neighbor for unbalanced text corpus,” Expert Systems with Applications, vol. 28, no. 4, pp. 667–671, 2005.
  5. L. Borrajo, R. Romero, E. L. Iglesias, and C. M. Redondo Marey, “Improving imbalanced scientific text classification using sampling strategies and dictionaries,” Journal of Integrative Bioinformatics, vol. 8, no. 3, p. 176, 2011.
  6. P. Kang and S. Cho, “EUS SVMs: ensemble of under-sampled SVMs for data imbalance problems,” in Neural Information Processing, vol. 4232 of Lecture Notes in Computer Science, chapter 93, pp. 837–846, Springer, Berlin, Germany, 2006.
  7. R. Romero, E. L. Iglesias, and L. Borrajo, “Building biomedical text classifiers under sample selection bias,” in International Symposium on Distributed Computing and Artificial Intelligence, vol. 91 of Advances in Intelligent and Soft Computing, pp. 11–18, Springer, Berlin, Germany, 2011.
  8. R. Romero, E. L. Iglesias, L. Borrajo, and C. M. R. Marey, “Using dictionaries for biomedical text classification,” Advances in Intelligent and Soft Computing, vol. 93, pp. 365–372, 2011.
  9. C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
  10. K.-K. Tseng, Y. Li, C.-Y. Hsu, H.-N. Huang, M. Zhao, and M. Ding, “Computer-assisted system with multiple feature fused support vector machine for sperm morphology diagnosis,” BioMed Research International, vol. 2013, Article ID 687607, 13 pages, 2013.
  11. T. W. Pai, H. W. Wang, Y. C. Lin, and H. T. Chang, “Prediction of B-cell linear epitopes with a combination of support vector machine classification and amino acid propensity identification,” Journal of Biomedicine and Biotechnology, vol. 2011, Article ID 432830, 12 pages, 2011.
  12. W. Zhang, T. Yoshida, and X. Tang, “Text classification based on multi-word with support vector machine,” Knowledge-Based Systems, vol. 21, no. 8, pp. 879–886, 2008.
  13. W. Hersh, A. Cohen, J. Yang, R. T. Bhupatiraju, P. Roberts, and M. Hearst, “TREC 2005 genomics track overview,” in Proceedings of the 14th Text Retrieval Conference (TREC '05), pp. 14–25, November 2005.
  14. M. Gönen and E. Alpaydın, “Multiple kernel learning algorithms,” Journal of Machine Learning Research, vol. 12, pp. 2211–2268, 2011.
  15. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines: And Other Kernel-based Learning Methods, Cambridge University Press, New York, NY, USA, 2000.
  16. A. Ben-Hur and W. S. Noble, “Kernel methods for predicting protein-protein interactions,” Bioinformatics, vol. 21, supplement 1, pp. i38–i46, 2005.
  17. I. M. de Diego, J. M. Moguerza, and A. Muñoz, “Combining kernel information for support vector classification,” in Multiple Classifier Systems, F. Roli, J. Kittler, and T. Windeatt, Eds., vol. 3077, pp. 102–111, Springer, Berlin, Germany, 2004.
  18. I. M. de Diego, A. Muñoz, and J. M. Moguerza, “Methods for the combination of kernel matrices within a support vector framework,” Machine Learning, vol. 78, no. 1-2, pp. 137–174, 2010.
  19. F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan, “Multiple kernel learning, conic duality, and the SMO algorithm,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 41–48, ACM, New York, NY, USA, July 2004.
  20. C. Igel, T. Glasmachers, B. Mersch, N. Pfeifer, and P. Meinicke, “Gradient-based optimization of kernel-target alignment for sequence kernels applied to bacterial gene start detection,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 2, pp. 216–226, 2007.
  21. T. Damoulas and M. A. Girolami, “Pattern recognition with a Bayesian kernel combination machine,” Pattern Recognition Letters, vol. 30, no. 1, pp. 46–54, 2009.
  22. M. Girolami and S. Rogers, “Hierarchic Bayesian models for kernel learning,” in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 241–248, August 2005.
  23. K. P. Bennett, M. Momma, and M. J. Embrechts, “MARK: a boosting algorithm for heterogeneous kernel models,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 24–31, ACM, New York, NY, USA, July 2002.
  24. J. Bi, T. Zhang, and K. P. Bennett, “Column-generation boosting methods for mixture of kernels,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), pp. 521–526, New York, NY, USA, 2004.
  25. J. B. Lovins, “Development of a stemming algorithm,” Mechanical Translation and Computational Linguistics, vol. 11, pp. 22–31, 1968.
  26. M. F. Porter, “An algorithm for suffix stripping,” Program, vol. 14, no. 3, pp. 130–137, 1980.
  27. J. Zhang and I. Mani, “kNN approach to unbalanced data distributions: a case study involving information extraction,” in Proceedings of the Workshop on Learning from Imbalanced Datasets (ICML '03), 2003.
  28. Q. Zou, Z. Wang, X. Guan, B. Liu, Y. Wu, and Z. Lin, “An approach for identifying cytokines based on a novel ensemble classifier,” BioMed Research International, vol. 2013, Article ID 686090, 11 pages, 2013.
  29. A. Estabrooks, T. Jo, and N. Japkowicz, “A multiple resampling method for learning from imbalanced data sets,” Computational Intelligence, vol. 20, no. 1, pp. 18–36, 2004.
  30. S. R. Garner, “WEKA: the Waikato environment for knowledge analysis,” in Proceedings of the New Zealand Computer Science Research Students Conference, pp. 57–64, 1995.
  31. R. Romero, E. L. Iglesias, and L. Borrajo, “A comparative analysis of balancing techniques and attribute reduction algorithms,” in 6th International Conference on Practical Applications of Computational Biology & Bioinformatics, vol. 154 of Advances in Intelligent and Soft Computing, pp. 87–94, Springer, Berlin, Germany, 2012.
  32. I. T. Jolliffe, Principal Component Analysis, Springer, New York, NY, USA, 2nd edition, 2002.
  33. S. Kim, H. Rim, D. Yook, and H. Lim, “Effective methods for improving Naïve Bayes text classifiers,” in Proceedings of the 7th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence (PRICAI '02), pp. 414–423, Springer, 2002.
  34. Y. Tang, Y.-Q. Zhang, and N. V. Chawla, “SVMs modeling for highly imbalanced classification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 1, pp. 281–288, 2009.
  35. S. Ali and K. A. Smith, “Automatic parameter selection for polynomial kernel,” in Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI '03), pp. 243–249, 2003.
  36. C.-H. Li, H.-H. Ho, Y.-L. Liu, C.-T. Lin, B.-C. Kuo, and J.-S. Taur, “An automatic method for selecting the parameter of the normalized kernel function to support vector machines,” Journal of Information Science and Engineering, vol. 28, no. 1, pp. 1–15, 2012.
  37. B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, Mass, USA, 2001.
  38. T. Hill and P. Lewicki, Statistics, Methods and Applications, StatSoft, Tulsa, Okla, USA, 2007.
  39. J. T. Chang, S. Raychaudhuri, and R. B. Altman, “Including biological literature improves homology search,” in Proceedings of the Pacific Symposium on Biocomputing, pp. 374–383, 2001.
  40. C. Hsu, C. Chang, and C. Lin, A practical guide to support vector classification, 2010.
  41. H. Cunningham, Y. Wilks, and R. J. Gaizauskas, “GATE—a general architecture for text engineering,” in Proceedings of the 16th Conference on Computational Linguistics (COLING '96), pp. 1057–1060, Copenhagen, Denmark, August 1996.
  42. D. H. Fisher, “Knowledge acquisition via incremental conceptual clustering,” Machine Learning, vol. 2, no. 2, pp. 139–172, 1987.
  43. A. Chernobai, S. Rachev, and F. Fabozzi, Composite Goodness-of-Fit Tests for Left-Truncated Loss Samples, Department of Statistics and Applied Probability, University of California, Santa Barbara, Calif, USA, 2005.
  44. B. F. J. Manly, Multivariate Statistical Methods: A Primer, Chapman & Hall, New York, NY, USA, 2nd edition, 1994.
  45. D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, 2007.
  46. J. S. Farris, “On the cophenetic correlation coefficient,” Systematic Zoology, vol. 18, no. 3, pp. 279–285, 1969.
  47. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.