ISRN Bioinformatics
Volume 2014 (2014), Article ID 901419, 34 pages
http://dx.doi.org/10.1155/2014/901419
Review Article

Hierarchical Ensemble Methods for Protein Function Prediction

AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italy

Received 2 February 2014; Accepted 25 February 2014; Published 5 May 2014

Academic Editors: T. Can and T. R. Hvidsten

Copyright © 2014 Giorgio Valentini. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. I. Friedberg, “Automated protein function prediction—the genomic challenge,” Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
  2. L. Pena-Castillo, M. Tasan, C. L. Myers et al., “A critical assessment of Mus musculus gene function prediction using integrated genomic evidence,” Genome Biology, vol. 9, supplement 1, article S2, 2008.
  3. G. Valentini, “Mosclust: a software library for discovering significant structures in bio-molecular data,” Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
  4. A. Bertoni and G. Valentini, “Discovering multi-level structures in bio-molecular data through the Bernstein inequality,” BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
  5. A. Ruepp, A. Zollner, D. Maier et al., “The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes,” Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
  6. M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
  7. A. Valencia, “Automatic annotation of protein function,” Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
  8. N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, “Parametric Bayesian priors and better choice of negative examples improve protein function prediction,” Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
  9. A. Clare and R. D. King, “Predicting gene function in Saccharomyces cerevisiae,” Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
  10. Y. Bilu and M. Linial, “The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications,” Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
  11. G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, “A statistical framework for genomic data fusion,” Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
  12. W. Tian, L. V. Zhang, M. Taşan et al., “Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function,” Genome Biology, vol. 9, no. 1, article S7, 2008.
  13. W. K. Kim, C. Krumpelman, and E. M. Marcotte, “Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy,” Genome Biology, vol. 9, no. 1, article S5, 2008.
  14. S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, “GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function,” Genome Biology, vol. 9, no. 1, article S4, 2008.
  15. I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, “Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana,” Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.
  16. P. Radivojac, W. T. Clark, T. R. Oron et al., “A large-scale evaluation of computational protein function prediction,” Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
  17. A. S. Juncker, L. J. Jensen, A. Pierleoni et al., “Sequence-based feature prediction and annotation of proteins,” Genome Biology, vol. 10, no. 2, p. 206, 2009.
  18. R. Sharan, I. Ulitsky, and R. Shamir, “Network-based prediction of protein function,” Molecular Systems Biology, vol. 3, p. 88, 2007.
  19. A. Sokolov and A. Ben-Hur, “Hierarchical classification of gene ontology terms using the GOstruct method,” Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
  20. Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, “Hierarchical multi-label prediction of gene function,” Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
  21. H. N. Chua, W.-K. Sung, and L. Wong, “An efficient strategy for extensive integration of diverse biological data for protein function prediction,” Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
  22. P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, “Learning gene functional classifications from multiple data types,” Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
  23. M. Re and G. Valentini, “Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction,” Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
  24. G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, “Consistent probabilistic outputs for protein function prediction,” Genome Biology, vol. 9, no. 1, article S6, 2008.
  25. D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, “Protein function prediction by massive integration of evolutionary analyses and multiple data sources,” BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
  26. X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, “Integration of relational and hierarchical network information for protein function prediction,” BMC Bioinformatics, vol. 9, article 350, 2008.
  27. N. Cesa-Bianchi and G. Valentini, “Hierarchical cost-sensitive algorithms for genome-wide gene function prediction,” Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
  28. R. Cerri and A. C. P. L. F. De Carvalho, “New top-down methods using SVMs for hierarchical multilabel classification problems,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
  29. L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Džeroski, “Predicting gene function using hierarchical multi-label decision tree ensembles,” BMC Bioinformatics, vol. 11, article 2, 2010.
  30. Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, “Predicting gene function in a hierarchical context with an ensemble of classifiers,” Genome Biology, vol. 9, no. 1, article S3, 2008.
  31. G. Valentini, “True path rule hierarchical ensembles for genome-wide gene function prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
  32. N. Alaydie, C. K. Reddy, and F. Fotouhi, “Exploiting label dependency for hierarchical multi-label classification,” in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
  33. D. Koller and M. Sahami, “Hierarchically classifying documents using very few words,” in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
  34. S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, “Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies,” VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
  35. M.-L. Zhang and Z.-H. Zhou, “Multilabel neural networks with applications to functional genomics and text categorization,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
  36. A. Burred and J. Lerch, “A hierarchical approach to automatic musical genre classification,” in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
  37. C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, “Bayesian aggregation for hierarchical genre classification,” in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
  38. K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, “Multilabel classification of music into emotions,” in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
  39. A. Binder, M. Kawanabe, and U. Brefeld, “Efficient classification of images with taxonomies,” in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
  40. Z. Barutcuoglu and C. DeCoro, “Hierarchical shape classification using Bayesian aggregation,” in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
  41. A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, “An empirical study of multi-label learning methods for video annotation,” in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
  42. K. Punera and J. Ghosh, “Enhanced hierarchical classification via isotonic smoothing,” in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
  43. M. Ceci and D. Malerba, “Classifying web documents in a hierarchy of categories: a comprehensive study,” Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
  44. C. N. Silla Jr. and A. A. Freitas, “A survey of hierarchical classification across different application domains,” Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
  45. O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, “A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
  46. K. Tsuda, H. Shin, and B. Schölkopf, “Fast protein classification with multiple networks,” Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
  47. J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, “Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration,” BMC Bioinformatics, vol. 7, article 268, 2006.
  48. U. Karaoz, T. M. Murali, S. Letovsky et al., “Whole-genome annotation by using evidence integration in functional-linkage networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
  49. A. Sokolov and A. Ben-Hur, “A structured-outputs method for prediction of protein function,” in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
  50. K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, “Towards structured output prediction of enzyme function,” BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
  51. B. Done, P. Khatri, A. Done, and S. Drǎghici, “Predicting novel human gene ontology annotations using semantic analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
  52. G. Valentini and F. Masulli, “Ensembles of learning machines,” in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
  53. B. Shahbaba and R. M. Neal, “Gene function classification using Bayesian models with hierarchy-based priors,” BMC Bioinformatics, vol. 7, article 448, 2006.
  54. C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, “Decision trees for hierarchical multi-label classification,” Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
  55. G. Valentini, “True path rule hierarchical ensembles,” in Multiple Classifier Systems. Eighth International Workshop, MCS, 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
  56. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
  57. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
  58. A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talón, and M. Robles, “Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research,” Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
  59. Y. Loewenstein, D. Raimondo, O. C. Redfern et al., “Protein function annotation by homology-based inference,” Genome Biology, vol. 10, no. 2, p. 207, 2009.
  60. A. Prlić, T. A. Down, E. Kulesha, R. D. Finn, A. Kähäri, and T. J. P. Hubbard, “Integrating sequence and structural biology with DAS,” BMC Bioinformatics, vol. 8, article 333, 2007.
  61. M. Falda, S. Toppo, A. Pescarolo et al., “Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms,” BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
  62. T. Hamp, R. Kassner, S. Seemayer et al., “Homology-based inference sets the bar high for protein function prediction,” BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
  63. E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, “A combined algorithm for genome-wide prediction of protein function,” Nature, vol. 402, no. 6757, pp. 83–86, 1999.
  64. A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, “Global protein function prediction from protein-protein interaction networks,” Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
  65. J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, “A new method to measure the semantic similarity of GO terms,” Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
  66. H. Yang, T. Nepusz, and A. Paccanaro, “Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty,” Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
  67. H. Yu, L. Gao, K. Tu, and Z. Guo, “Broadly predicting specific gene functions with expression similarity and taxonomy similarity,” Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
  68. Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, “Information theory applied to the sparse gene ontology annotation network to predict novel gene function,” Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
  69. G. Pandey, C. L. Myers, and V. Kumar, “Incorporating functional inter-relationships into protein function prediction algorithms,” BMC Bioinformatics, vol. 10, article 142, 2009.
  70. X.-F. Zhang and D.-Q. Dai, “A framework for incorporating functional interrelationships into protein function prediction algorithms,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
  71. E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, “Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps,” Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
  72. A. Bertoni, M. Frasca, and G. Valentini, “COSNet: a cost sensitive neural network for semi-supervised learning in graphs,” in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
  73. M. Frasca, A. Bertoni, M. Re, and G. Valentini, “A neural network algorithm for semi-supervised node label learning from unbalanced data,” Neural Networks, vol. 43, pp. 84–98, 2013.
  74. M. Deng, T. Chen, and F. Sun, “An integrated probabilistic model for functional prediction of proteins,” Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
  75. Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, “Bayesian Markov random field analysis for protein function prediction based on network data,” PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
  76. S. Oliver, “Guilt-by-association goes global,” Nature, vol. 403, no. 6770, pp. 601–603, 2000.
  77. J. McDermott, R. Bumgarner, and R. Samudrala, “Functional annotation from predicted protein interaction networks,” Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
  78. M. Re, M. Mesiti, and G. Valentini, “A fast ranking algorithm for predicting gene functions in biomolecular networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
  79. M. Re and G. Valentini, “Cancer module genes ranking using kernelized score functions,” BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
  80. M. Re and G. Valentini, “Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories,” in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
  81. M. Re and G. Valentini, “Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.
  82. Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
  83. Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
  84. N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, “Random spanning trees and the prediction of weighted graphs,” in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
  85. I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
  86. J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, “Kernel-based learning of hierarchical multilabel classification models,” Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
  87. C. H. Lampert and M. B. Blaschko, “Structured prediction by joint kernel support estimation,” Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
  88. G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
  89. R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, “Improving protein function prediction using the hierarchical structure of the gene ontology,” in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
  90. H. Blockeel, L. Schietgat, and A. Clare, “Hierarchical multilabel classification trees for gene function prediction,” in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
  91. O. Dekel, J. Keshet, and Y. Singer, “Large margin hierarchical classification,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
  92. N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, “Hierarchical classifications combining Bayes with SVM,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
  93. T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems. First International Workshop, MCS, 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
  94. O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
  95. M. Re and G. Valentini, “Ensemble methods: a review,” in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
  96. E. Bauer and R. Kohavi, “Empirical comparison of voting classification algorithms: bagging, boosting, and variants,” Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
  97. T. G. Dietterich, “Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization,” Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
  98. R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, “A comparison of decision tree ensemble creation techniques,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
  99. S. Y. Sohn and H. W. Shin, “Experimental study for the comparison of classifier combination methods,” Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
  100. M. Dettling and P. Bühlmann, “Boosting for tumor classification with gene expression data,” Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
  101. V. Robles, P. Larrañaga, J. M. Peña et al., “Bayesian network multi-classifiers for protein secondary structure prediction,” Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
  102. G. Valentini, M. Muselli, and F. Ruffino, “Cancer recognition with bagged ensembles of support vector machines,” Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
  103. T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, “Robust biomarker identification for cancer diagnosis with ensemble feature selection methods,” Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
  104. A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, “A novel ensemble technique for protein subcellular location prediction,” in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
  105. A. Topchy, A. K. Jain, and W. Punch, “Clustering ensembles: models of consensus and weak partitions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
  106. A. Bertoni and G. Valentini, “Ensembles based on random projections to improve the accuracy of clustering algorithms,” in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
  107. R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, “Boosting the margin: a new explanation for the effectiveness of voting methods,” Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
  108. E. L. Allwein, R. E. Schapire, and Y. Singer, “Reducing multiclass to binary: a unifying approach for margin classifiers,” Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
  109. E. M. Kleinberg, “On the algorithmic implementation of stochastic discrimination,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
  110. L. Breiman, “Bias, variance and arcing classifiers,” Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
  111. J. Friedman and P. Hall, “On bagging and nonlinear estimation,” Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
  112. K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, “On label dependence in multi-label classification,” in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
  113. S. Mostafavi and Q. Morris, “Using the Gene Ontology hierarchy when predicting gene function,” in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
  114. L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, “Prediction of human protein function according to Gene Ontology categories,” Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
  115. A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, “Predicting gene ontology biological process from temporal gene expression patterns,” Genome Research, vol. 13, no. 5, pp. 965–979, 2003.
  116. R. Bi, Y. Zhou, F. Lu, and W. Wang, “Predicting Gene Ontology functions based on support vector machines and statistical significance estimation,” Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
  117. P. Bogdanov and A. K. Singh, “Molecular function prediction using neighborhood features,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
  118. M. Riley, “Functions of the gene products of Escherichia coli,” Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
  119. R. E. Schapire and Y. Singer, “BoosTexter: a boosting-based system for text categorization,” Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
  120. N. Alaydie, C. K. Reddy, and F. Fotouhi, “Hierarchical multi-label boosting for gene function prediction,” in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
  121. T. Kohonen, “The self-organizing map,” Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
  122. H. Borges and J. Nievola, “Hierarchical classification using a competitive neural network,” in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
  123. N. Cesa-Bianchi, M. Re, and G. Valentini, “Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference,” Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
  124. R. Cerri and A. C. P. L. F. De Carvalho, “Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks,” in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
  125. R. Cerri and A. de Carvalho, “Hierarchical multilabel protein function prediction using local neural networks,” in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
  126. D. E. Rumelhart, R. Durbin, and Y. Chauvin, “Backpropagation: the basic theory,” in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
  127. J. Hernandez, L. E. Sucar, and E. Morales, “A hybrid global-local approach for hierarchical classification,” in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
  128. B. Paes, A. Plastino, and A. Freitas, “Improving local per level hierarchical classification,” Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
  129. G. Tsoumakas, I. Katakis, and I. Vlahavas, “Random k-labelsets for multilabel classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
  130. H. Wang, X. Shen, and W. Pan, “Large margin hierarchical classification with mutually exclusive class membership,” Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
  131. L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
  132. M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, “Prediction of enzyme classification from protein sequence without the use of sequence similarity,” in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
  133. A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, “The test and select approach to ensemble combination,” in Multiple Classifier Systems. First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
  134. Y. Kourmpetis, A. van Dijk, and C. ter Braak, “Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes,” Algorithms for Molecular Biology, vol. 8, article 10, 2013.
  135. N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, “Incremental algorithms for hierarchical classification,” in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
  136. M. Re and G. Valentini, “An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction,” in Multiple Classifier Systems. Ninth International Workshop, MCS, 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
  137. A. J. Smola and R. Kondor, “Kernels and regularization on graphs,” in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
  138. C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, “Mismatch string kernels for SVM protein classification,” in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
  139. M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
  140. R. E. Barlow and H. D. Brunk, “The isotonic regression problem and its dual,” Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
  141. O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, “An O(n²) algorithm for isotonic regression,” in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
  142. G. Valentini and M. Re, “Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction,” in Proceedings of the 1st International Workshop on learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
  143. Gene Ontology Consortium. True path rule, 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
  144. B. Chen, L. Duan, and J. Hu, “Composite kernel based SVM for hierarchical multilabel gene function classification,” in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
  145. B. Chen and J. Hu, “Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction,” IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
  146. J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
  147. R. D. King, A. Karwath, A. Clare, and L. Dehaspe, “The utility of different representations of protein sequence for predicting functional class,” Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
  148. H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, “Top-down induction of clustering trees,” in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.
  149. H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, “Decision trees for hierarchical multilabel classification: a case study in functional genomics,” in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.
  150. D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, “Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction,” BMC Bioinformatics, vol. 14, article 285, 2013.
  151. F. Otero, A. Freitas, and C. Johnson, “A hierarchical classification ant colony algorithm for predicting gene ontology terms,” in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
  152. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  153. D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, “Tree ensembles for predicting structured outputs,” Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
  154. W. S. Noble and A. Ben-Hur, “Integrating information for protein function prediction,” in Bioinformatics - From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
  155. L. Lan, N. Djuric, Y. Guo, and S. Vucetic, “MS-kNN: protein function prediction by integrating multiple data sources,” BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
  156. A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, “Combining heterogeneous data sources for accurate functional annotation of proteins,” BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
  157. C. L. Myers and O. G. Troyanskaya, “Context-sensitive data integration and prediction of biological networks,” Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
  158. S. Mostafavi and Q. Morris, “Fast integration of heterogeneous data sources for predicting gene function with limited annotation,” Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
  159. M. Mesiti, E. Jiménez-Ruiz, I. Sanz et al., “XML-based approaches for the integration of heterogeneous bio-molecular data,” BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
  160. S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, “Large scale multiple kernel learning,” Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
  161. A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “More efficiency in multiple kernel learning,” in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
  162. G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, “Kernel-based data fusion and its application to protein function prediction in yeast,” Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
  163. D. P. Lewis, T. Jebara, and W. S. Noble, “Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure,” Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
  164. N. Cesa-Bianchi, M. Re, and G. Valentini, “Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods,” in Proceedings of the 2nd International Workshop on learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
  165. G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, “Protein function prediction by integrating multiple kernels,” in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
  166. D. M. Titterington, G. D. Murray, L. S. Murray et al., “Comparison of discrimination techniques applied to a complex data set of head injured patients,” Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
  167. N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
  168. J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
  169. L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, “Decision templates for multiple classifier fusion: an experimental comparison,” Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
  170. M. Re and G. Valentini, “Noise tolerance of multiple classifier systems in data integration-based gene function prediction,” Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
  171. M. Re and G. Valentini, “Ensemble based data fusion for gene function prediction,” in Multiple Classifier Systems. Eighth International Workshop, MCS, 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
  172. M. Re and G. Valentini, “Prediction of gene function using ensembles of SVMs and heterogeneous data sources,” in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
  173. M. Re and G. Valentini, “Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines,” Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
  174. R. D. Finn, J. Tate, J. Mistry et al., “The Pfam protein families database,” Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
  175. A. P. Gasch, P. T. Spellman, C. M. Kao et al., “Genomic expression programs in the response of yeast cells to environmental changes,” Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
  176. C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, “BioGRID: a general repository for interaction datasets,” Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
  177. C. Von Mering, R. Krause, B. Snel et al., “Comparative assessment of large-scale data sets of protein-protein interactions,” Nature, vol. 417, no. 6887, pp. 399–403, 2002.
  178. G. Valentini and N. Cesa-Bianchi, “HCGene: a software tool to support the hierarchical classification of genes,” Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
  179. A. Ben-Hur and W. S. Noble, “Choosing negative examples for the prediction of protein-protein interactions,” BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
  180. N. Skunca, A. Altenhoff, and C. Dessimoz, “Quality of computationally inferred gene ontology annotations,” PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
  181. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
  182. G. Batista, R. Prati, and M. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
  183. M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: one-sided selection,” in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
  184. H. Guo and H. Viktor, “Learning from imbalanced data sets with boosting and data generation: the Databoost-IM approach,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
  185. Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
  186. Y. Zhang and D. Wang, “Cost-sensitive ensemble method for class-imbalanced datasets,” Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
  187. M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, “A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
  188. C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, “The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction,” Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
  189. G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, “Transductive multilabel ensemble classification for protein function prediction,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
  190. G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, “Protein function prediction with incomplete annotations,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
  191. Gene Ontology Consortium, “Gene Ontology annotations and resources,” Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
  192. A. A. Freitas, D. C. Wieser, and R. Apweiler, “On the importance of comprehensible classification models for protein function prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
  193. R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, “A grammatical evolution algorithm for generation of hierarchical multi-label classification rules,” in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
  194. C. Widmer and G. Rätsch, “Multitask learning in computational biology,” Journal of Machine Learning Research, W&CP, vol. 27, pp. 207–216, 2012.
  195. J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, “PowerGraph: distributed graph-parallel computation on natural graphs,” in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
  196. M. Mesiti, M. Re, and G. Valentini, “Scalable network-based learning methods for automated function prediction based on the neo4j graph-database,” in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
  197. H. W. Mewes, K. Albermann, M. Bähr et al., “Overview of the yeast genome,” Nature, vol. 387, no. 6632, pp. 7–8, 1997.
  198. H. W. Mewes, D. Frishman, C. Gruber et al., “MIPS: a database for genomes and protein sequences,” Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
  199. K. R. Christie, S. Weng, R. Balakrishnan et al., “Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms,” Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
  200. K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, “Ontology quality assurance through analysis of term transformations,” Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
  201. K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, “A categorization approach to automated ontological function annotation,” Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
  202. S. Kiritchenko, S. Matwin, and A. Famili, “Hierarchical text categorization as a tool of associating genes with Gene Ontology codes,” in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.