International Journal of Proteomics
Volume 2014, Article ID 845479, 22 pages
http://dx.doi.org/10.1155/2014/845479
Review Article

A Survey of Computational Intelligence Techniques in Protein Function Prediction

Arvind Kumar Tiwari and Rajeev Srivastava

Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India

Received 10 September 2014; Revised 31 October 2014; Accepted 7 November 2014; Published 11 December 2014

Academic Editor: Yaoqi Zhou

Copyright © 2014 Arvind Kumar Tiwari and Rajeev Srivastava. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. B. Boeckmann, A. Bairoch, R. Apweiler et al., “The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 365–370, 2003. View at Publisher · View at Google Scholar · View at Scopus
  2. I. Xenarios, L. Salwínski, X. J. Duan, P. Higney, S.-M. Kim, and D. Eisenberg, “DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions,” Nucleic Acids Research, vol. 30, no. 1, pp. 303–305, 2002. View at Publisher · View at Google Scholar · View at Scopus
  3. R. Edgar, M. Domrachev, and A. E. Lash, “Gene expression omnibus: NCBI gene expression and hybridization array data repository,” Nucleic Acids Research, vol. 30, no. 1, pp. 207–210, 2002. View at Publisher · View at Google Scholar · View at Scopus
  4. D. Szklarczyk, A. Franceschini, M. Kuhn et al., “The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored,” Nucleic Acids Research, vol. 39, supplement 1, pp. D561–D568, 2011. View at Publisher · View at Google Scholar · View at Scopus
  5. H. M. Berman, J. Westbrook, Z. Feng et al., “The protein data bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000. View at Publisher · View at Google Scholar · View at Scopus
  6. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990. View at Publisher · View at Google Scholar · View at Scopus
  7. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997. View at Publisher · View at Google Scholar · View at Scopus
  8. W. R. Pearson, “Effective protein sequence comparison,” Methods in Enzymology, vol. 266, pp. 227–258, 1996. View at Publisher · View at Google Scholar · View at Scopus
  9. A. Bairoch, P. Bucher, and K. Hofmann, “The PROSITE database, its status in 1995,” Nucleic Acids Research, vol. 24, no. 1, pp. 189–196, 1996. View at Publisher · View at Google Scholar · View at Scopus
  10. T. K. Attwood, M. E. Beck, A. J. Bleasby, and D. J. Parry-Smith, “PRINTS—a database of protein motif fingerprints,” Nucleic Acids Research, vol. 22, no. 17, pp. 3590–3596, 1994. View at Google Scholar · View at Scopus
  11. W. R. Pearson and D. J. Lipman, “Improved tools for biological sequence comparison,” Proceedings of the National Academy of Sciences of the United States of America, vol. 85, no. 8, pp. 2444–2448, 1988. View at Publisher · View at Google Scholar · View at Scopus
  12. S. A. Benner, S. G. Chamberlin, D. A. Liberles, S. Govindarajan, and L. Knecht, “Functional inferences from reconstructed evolutionary biology involving rectified databases—an evolutionarily grounded approach to functional genomics,” Research in Microbiology, vol. 151, no. 2, pp. 97–106, 2000. View at Publisher · View at Google Scholar · View at Scopus
  13. A. Harrison, F. Pearl, I. Sillitoe et al., “Recognizing the fold of a protein structure,” Bioinformatics, vol. 19, no. 14, pp. 1748–1759, 2003. View at Publisher · View at Google Scholar · View at Scopus
  14. J. A. Capra, R. A. Laskowski, J. M. Thornton, M. Singh, and T. A. Funkhouser, “Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure,” PLoS Computational Biology, vol. 5, no. 12, Article ID e1000585, 2009. View at Publisher · View at Google Scholar · View at Scopus
  15. D. S. Glazer, R. J. Radmer, and R. B. Altman, “Improving structure-based function prediction using molecular dynamics,” Structure, vol. 17, no. 7, pp. 919–929, 2009. View at Publisher · View at Google Scholar · View at Scopus
  16. J. C. Whisstock and A. M. Lesk, “Prediction of protein function from protein sequence and structure,” Quarterly Reviews of Biophysics, vol. 36, no. 3, pp. 307–340, 2003. View at Publisher · View at Google Scholar · View at Scopus
  17. L. Han, J. Cui, H. Lin et al., “Recent progresses in the application of machine learning approach for predicting protein functional class independent of sequence similarity,” Proteomics, vol. 6, no. 14, pp. 4023–4037, 2006. View at Publisher · View at Google Scholar · View at Scopus
  18. S. Hwang, Z. Guo, and I. B. Kuznetsov, “DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins,” Bioinformatics, vol. 23, no. 5, pp. 634–636, 2007. View at Publisher · View at Google Scholar · View at Scopus
  19. L. Wang and S. J. Brown, “BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences,” Nucleic Acids Research, vol. 34, pp. W243–W248, 2006. View at Publisher · View at Google Scholar · View at Scopus
  20. C. Z. Cai, L. Y. Han, Z. L. Ji, X. Chen, and Y. Z. Chen, “SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence,” Nucleic Acids Research, vol. 31, no. 13, pp. 3692–3697, 2003. View at Publisher · View at Google Scholar · View at Scopus
  21. M. Bhasin and G. P. S. Raghava, “GPCRsclass: a web tool for the classification of amine type of G-protein-coupled receptors,” Nucleic Acids Research, vol. 33, supplement 2, pp. W143–W147, 2005. View at Publisher · View at Google Scholar · View at Scopus
  22. K.-C. Chou and H.-B. Shen, “MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM,” Biochemical and Biophysical Research Communications, vol. 360, no. 2, pp. 339–345, 2007. View at Publisher · View at Google Scholar · View at Scopus
  23. S. Ahmad, M. M. Gromiha, and A. Sarai, “Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information,” Bioinformatics, vol. 20, no. 4, pp. 477–486, 2004. View at Publisher · View at Google Scholar · View at Scopus
  24. S. Ahmad and A. Sarai, “PSSM-based prediction of DNA binding sites in proteins,” BMC Bioinformatics, vol. 6, article 33, 2005. View at Publisher · View at Google Scholar · View at Scopus
  25. N. Bhardwaj, R. E. Langlois, G. Zhao, and H. Lu, “Kernel-based machine learning protocol for predicting DNA-binding proteins,” Nucleic Acids Research, vol. 33, no. 20, pp. 6486–6493, 2005. View at Publisher · View at Google Scholar · View at Scopus
  26. I. B. Kuznetsov, Z. Gou, R. Li, and S. Hwang, “Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins,” Proteins: Structure, Function, and Bioinformatics, vol. 64, no. 1, pp. 19–27, 2006. View at Publisher · View at Google Scholar · View at Scopus
  27. Y. Fang, Y. Guo, Y. Feng, and M. Li, “Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features,” Amino Acids, vol. 34, no. 1, pp. 103–109, 2008. View at Publisher · View at Google Scholar · View at Scopus
  28. T. Li, Q.-Z. Li, S. Liu, G.-L. Fan, Y.-C. Zuo, and Y. Peng, “PreDNA: accurate prediction of DNA-binding sites in proteins by integrating sequence and geometric structure information,” Bioinformatics, vol. 29, no. 6, pp. 678–685, 2013. View at Publisher · View at Google Scholar · View at Scopus
  29. X. Ma, J. Wu, and X. Xue, “Identification of DNA-binding proteins using support vector machine with sequence information,” Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 524502, 8 pages, 2013. View at Publisher · View at Google Scholar · View at Scopus
  30. Y. Ofran, V. Mysore, and B. Rost, “Prediction of DNA-binding residues from sequence,” Bioinformatics, vol. 23, no. 13, pp. i347–i353, 2007. View at Publisher · View at Google Scholar · View at Scopus
  31. L. Wang, M. Q. Yang, and J. Y. Yang, “Prediction of DNA-binding residues from protein sequence information using random forests,” BMC Genomics, vol. 10, supplement 1, article S1, 2009. View at Publisher · View at Google Scholar · View at Scopus
  32. J. Wu, H. Liu, X. Duan et al., “Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature,” Bioinformatics, vol. 25, no. 1, pp. 30–35, 2009. View at Publisher · View at Google Scholar · View at Scopus
  33. W.-Z. Lin, J.-A. Fang, X. Xiao, and K.-C. Chou, “iDNA-prot: identification of DNA binding proteins using random forest with grey model,” PLoS ONE, vol. 6, no. 9, Article ID e24756, 2011. View at Publisher · View at Google Scholar · View at Scopus
  34. W. Lou, X. Wang, F. Chen, Y. Chen, B. Jiang, and H. Zhang, “Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes,” PLoS ONE, vol. 9, no. 1, Article ID e86703, 2014. View at Publisher · View at Google Scholar · View at Scopus
  35. M. Terribilini, J. H. Lee, C. Yan, R. L. Jernigan, V. Honavar, and D. Dobbs, “Prediction of RNA binding sites in proteins from amino acid sequence,” RNA, vol. 12, no. 8, pp. 1450–1462, 2006. View at Publisher · View at Google Scholar · View at Scopus
  36. M. Kumar, M. M. Gromiha, and G. P. S. Raghava, “Prediction of RNA binding sites in a protein using SVM and PSSM profile,” Proteins, vol. 71, no. 1, pp. 189–194, 2008. View at Publisher · View at Google Scholar · View at Scopus
  37. C.-W. Cheng, E. C.-Y. Su, J.-K. Hwang, T.-Y. Sung, and W.-L. Hsu, “Predicting RNA-binding sites of proteins using support vector machines and evolutionary information,” BMC Bioinformatics, vol. 9, supplement 12, article S6, 2008. View at Publisher · View at Google Scholar · View at Scopus
  38. S. R. Maetschke and Z. Yuan, “Exploiting structural and topological information to improve prediction of RNA-protein binding sites,” BMC Bioinformatics, vol. 10, article 341, 2009. View at Publisher · View at Google Scholar · View at Scopus
  39. X. Ma, J. Guo, J. Wu et al., “Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature,” Proteins: Structure, Function and Bioinformatics, vol. 79, no. 4, pp. 1230–1239, 2011. View at Publisher · View at Google Scholar · View at Scopus
  40. C. R. Peng, L. Liu, B. Niu et al., “Prediction of RNA-binding proteins by voting systems,” Journal of Biomedicine and Biotechnology, vol. 2011, Article ID 506205, 8 pages, 2011. View at Publisher · View at Google Scholar · View at Scopus
  41. X. Yu, J. Cao, Y. Cai, T. Shi, and Y. Li, “Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines,” Journal of Theoretical Biology, vol. 240, no. 2, pp. 175–184, 2006. View at Publisher · View at Google Scholar · View at Scopus
  42. L. Wang, C. Huang, M. Q. Yang, and J. Y. Yang, “BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features,” BMC Systems Biology, vol. 4, supplement 1, article S3, 2010. View at Publisher · View at Google Scholar · View at Scopus
  43. H. H. Lin, L. Y. Han, H. L. Zhang et al., “Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach,” BMC Bioinformatics, vol. 7, supplement 5, article S13, 2006. View at Publisher · View at Google Scholar · View at Scopus
  44. J. C. Ebert and R. B. Altman, “Robust recognition of zinc binding sites in proteins,” Protein Science, vol. 17, no. 1, pp. 54–65, 2008. View at Publisher · View at Google Scholar · View at Scopus
  45. M. Gao and J. Skolnick, “DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions,” Nucleic Acids Research, vol. 36, no. 12, pp. 3978–3992, 2008. View at Publisher · View at Google Scholar · View at Scopus
  46. H. Zhao, Y. Yang, and Y. Zhou, “Structure-based prediction of RNA-binding domains and RNA-binding sites and application to structural genomics targets,” Nucleic Acids Research, vol. 39, no. 8, pp. 3017–3025, 2011. View at Publisher · View at Google Scholar · View at Scopus
  47. S. Hua and Z. Sun, “Support vector machine approach for protein subcellular localization prediction,” Bioinformatics, vol. 17, no. 8, pp. 721–728, 2001. View at Publisher · View at Google Scholar · View at Scopus
  48. K.-C. Chou and Y.-D. Cai, “Using functional domain composition and support vector machines for prediction of protein subcellular location,” The Journal of Biological Chemistry, vol. 277, no. 48, pp. 45765–45769, 2002. View at Publisher · View at Google Scholar · View at Scopus
  49. J. Wang, W.-K. Sung, A. Krishnan, and K.-B. Li, “Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines,” BMC Bioinformatics, vol. 6, article 174, 2005. View at Publisher · View at Google Scholar · View at Scopus
  50. D. Sarda, G. H. Chua, K.-B. Li, and A. Krishnan, “pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties,” BMC Bioinformatics, vol. 6, article 152, 2005. View at Publisher · View at Google Scholar · View at Scopus
  51. A. Garg, M. Bhasin, and G. P. S. Raghava, “Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search,” The Journal of Biological Chemistry, vol. 280, no. 15, pp. 14427–14432, 2005. View at Publisher · View at Google Scholar · View at Scopus
  52. M. Bhasin, A. Garg, and G. P. S. Raghava, “PSLpred: prediction of subcellular localization of bacterial proteins,” Bioinformatics, vol. 21, no. 10, pp. 2522–2524, 2005. View at Publisher · View at Google Scholar · View at Scopus
  53. Y. Huang and Y. Li, “Prediction of protein subcellular locations using fuzzy k-NN method,” Bioinformatics, vol. 20, no. 1, pp. 21–28, 2004. View at Publisher · View at Google Scholar · View at Scopus
  54. Q.-B. Gao, Z.-Z. Wang, C. Yan, and Y.-H. Du, “Prediction of protein subcellular location using a combined feature of sequence,” FEBS Letters, vol. 579, no. 16, pp. 3444–3448, 2005. View at Publisher · View at Google Scholar · View at Scopus
  55. P. Jia, Z. Qian, Z. Zeng, Y. Cai, and Y. Li, “Prediction of subcellular protein localization based on functional domain composition,” Biochemical and Biophysical Research Communications, vol. 357, no. 2, pp. 366–370, 2007. View at Publisher · View at Google Scholar · View at Scopus
  56. T. Wang and J. Yang, “Predicting subcellular localization of gram-negative bacterial proteins by linear dimensionality reduction method,” Protein and Peptide Letters, vol. 17, no. 1, pp. 32–37, 2010. View at Publisher · View at Google Scholar · View at Scopus
  57. A. Höglund, P. Dönnes, T. Blum, H.-W. Adolph, and O. Kohlbacher, “MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition,” Bioinformatics, vol. 22, no. 10, pp. 1158–1165, 2006. View at Publisher · View at Google Scholar · View at Scopus
  58. J.-Y. Shi, S.-W. Zhang, Q. Pan, Y.-M. Cheng, and J. Xie, “Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition,” Amino Acids, vol. 33, no. 1, pp. 69–74, 2007. View at Publisher · View at Google Scholar · View at Scopus
  59. W.-L. Huang, C.-W. Tung, H.-L. Huang, S.-F. Hwang, and S.-Y. Ho, “ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features,” BioSystems, vol. 90, no. 2, pp. 573–581, 2007. View at Publisher · View at Google Scholar · View at Scopus
  60. T. Tamura and T. Akutsu, “Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition,” BMC Bioinformatics, vol. 8, article 466, 2007. View at Publisher · View at Google Scholar · View at Scopus
  61. M. Rashid, S. Saha, and G. P. S. Raghava, “Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs,” BMC Bioinformatics, vol. 8, no. 1, article 337, 2007. View at Publisher · View at Google Scholar · View at Scopus
  62. F.-M. Li and Q.-Z. Li, “Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach,” Protein and Peptide Letters, vol. 15, no. 6, pp. 612–616, 2008. View at Publisher · View at Google Scholar · View at Scopus
  63. C. S. Ong and A. Zien, “An automated combination of kernels for predicting protein subcellular localization,” in Algorithms in Bioinformatics, pp. 186–197, 2008. View at Google Scholar
  64. B. Jin, Y. Tang, and Y. Q. Zhang, “Hybrid SVM-ANFIS for protein subcellular location prediction,” International Journal of Computational Intelligence in Bioinformatics and Systems Biology, vol. 1, no. 1, pp. 59–73, 2009. View at Publisher · View at Google Scholar
  65. S. Briesemeister, T. Blum, S. Brady, Y. Lam, O. Kohlbacher, and H. Shatkay, “SherLoc2: a high-accuracy hybrid method for predicting subcellular localization of proteins,” Journal of Proteome Research, vol. 8, no. 11, pp. 5363–5366, 2009. View at Publisher · View at Google Scholar · View at Scopus
  66. J. Ma and H. Gu, “A novel method for predicting protein subcellular localization based on pseudo amino acid composition,” BMB Reports, vol. 43, no. 10, pp. 670–676, 2010. View at Publisher · View at Google Scholar · View at Scopus
  67. C. Mooney, Y. H. Wang, and G. Pollastri, “De novo protein subcellular localization prediction by N-to-1 neural networks,” in Computational Intelligence Methods for Bioinformatics and Biostatistics, pp. 31–43, 2011. View at Google Scholar
  68. G.-L. Fan and Q.-Z. Li, “Predicting protein submitochondria locations by combining different descriptors into the general form of Chou’s pseudo amino acid composition,” Amino Acids, vol. 43, no. 2, pp. 545–555, 2012. View at Publisher · View at Google Scholar · View at Scopus
  69. A. S. Mer and M. A. Andrade-Navarro, “A novel approach for protein subcellular location prediction using amino acid exposure,” BMC Bioinformatics, vol. 14, no. 1, article 342, 2013. View at Publisher · View at Google Scholar · View at Scopus
  70. Y.-D. Cai and K.-C. Chou, “Using functional domain composition to predict enzyme family classes,” Journal of Proteome Research, vol. 4, no. 1, pp. 109–111, 2005. View at Publisher · View at Google Scholar · View at Scopus
  71. W.-L. Huang, H.-M. Chen, S.-F. Hwang, and S.-Y. Ho, “Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method,” BioSystems, vol. 90, no. 2, pp. 405–413, 2007. View at Publisher · View at Google Scholar · View at Scopus
  72. H.-B. Shen and K.-C. Chou, “EzyPred: a top-down approach for predicting enzyme functional classes and subclasses,” Biochemical and Biophysical Research Communications, vol. 364, no. 1, pp. 53–59, 2007. View at Publisher · View at Google Scholar · View at Scopus
  73. E. Nasibov and C. Kandemir-Cavas, “Efficiency analysis of KNN and minimum distance-based classifiers in enzyme family prediction,” Computational Biology and Chemistry, vol. 33, no. 6, pp. 461–464, 2009. View at Publisher · View at Google Scholar · View at Scopus
  74. T.-L. Zhang, Y.-S. Ding, and K.-C. Chou, “Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern,” Journal of Theoretical Biology, vol. 250, no. 1, pp. 186–193, 2008. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  75. X. B. Zhou, C. Chen, Z. C. Li, and X. Y. Zou, “Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes,” Journal of Theoretical Biology, vol. 248, no. 3, pp. 546–551, 2007. View at Publisher · View at Google Scholar · View at Scopus
  76. J.-D. Qiu, J.-H. Huang, S.-P. Shi, and R.-P. Liang, “Using the concept of Chou’s pseudo amino acid composition to predict enzyme family classes: an approach with support vector machine based on discrete wavelet transform,” Protein and Peptide Letters, vol. 17, no. 6, pp. 715–722, 2010. View at Publisher · View at Google Scholar · View at Scopus
  77. Y.-C. Wang, X.-B. Wang, Z.-X. Yang, and N.-Y. Deng, “Prediction of enzyme subfamily class via pseudo amino acid composition by incorporating the conjoint triad feature,” Protein and Peptide Letters, vol. 17, no. 11, pp. 1441–1449, 2010. View at Publisher · View at Google Scholar · View at Scopus
  78. L. Lu, Z. Qian, Y.-D. Cai, and Y. Li, “ECS: an automatic enzyme classifier based on functional domain composition,” Computational Biology and Chemistry, vol. 31, no. 3, pp. 226–232, 2007. View at Publisher · View at Google Scholar · View at Scopus
  79. Y. C. Wang, Y. Wang, Z. X. Yang, and N. Y. Deng, “Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context,” BMC Systems Biology, vol. 5, supplement 1, article S6, 2011. View at Publisher · View at Google Scholar · View at Scopus
  80. A. Yadav and V. K. Jayaraman, “Structure based function prediction of proteins using fragment library frequency vectors,” Bioinformation, vol. 8, no. 19, pp. 953–956, 2012. View at Publisher · View at Google Scholar
  81. C. Chen, Y.-X. Tian, X.-Y. Zou, P.-X. Cai, and J.-Y. Mo, “Using pseudo-amino acid composition and support vector machine to predict protein structural class,” Journal of Theoretical Biology, vol. 243, no. 3, pp. 444–448, 2006. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  82. A. Garg and G. P. S. Raghava, “A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search,” In Silico Biology, vol. 8, no. 2, pp. 129–140, 2008. View at Google Scholar · View at Scopus
  83. L. C. Borro, S. R. M. Oliveira, M. E. B. Yamagishi et al., “Predicting enzyme class from protein structure using Bayesian classification,” Genetics and Molecular Research, vol. 5, no. 1, pp. 193–202, 2006. View at Google Scholar · View at Scopus
  84. C. Kumar and A. Choudhary, “A top-down approach to classify enzyme functional classes and sub-classes using random forest,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2012, article 1, 14 pages, 2012. View at Publisher · View at Google Scholar · View at Scopus
  85. C. Nagao, N. Nagano, and K. Mizuguchi, “Prediction of detailed enzyme functions and identification of specificity determining residues by random forests,” PLoS ONE, vol. 9, no. 1, Article ID e84623, 2014. View at Publisher · View at Google Scholar · View at Scopus
  86. B. J. Lee, M. S. Shin, Y. J. Oh, H. S. Oh, and K. H. Ryu, “Identification of protein functions using a machine-learning approach based on sequence-derived properties,” Proteome Science, vol. 7, no. 1, article 27, 2009. View at Publisher · View at Google Scholar · View at Scopus
  87. V. Volpato, A. Adelfio, and G. Pollastri, “Accurate prediction of protein enzymatic class by N-to-1 Neural Networks,” BMC Bioinformatics, vol. 14, supplement 1, article S11, 2013. View at Publisher · View at Google Scholar · View at Scopus
  88. H. Nielsen, J. Engelbrecht, S. Brunak, and G. von Heijne, “Identification of prokaryotic and enkaryotic signal peptides and prediction of their cleavage sites,” Protein Engineering, vol. 10, no. 1, pp. 1–6, 1997. View at Publisher · View at Google Scholar · View at Scopus
  89. H. Nielsen, S. Brunak, and G. von Heijne, “Machine learning approaches for the prediction of signal peptides and other protein sorting signals,” Protein Engineering, vol. 12, no. 1, pp. 3–9, 1999. View at Publisher · View at Google Scholar · View at Scopus
  90. M. Reczko, P. Fiziev, E. Staub, and A. Hatzigeorgiou, “Finding signal peptides in human protein sequences using recurrent neural networks,” in Algorithms in Bioinformatics, pp. 60–67, 2002. View at Google Scholar
  91. H.-B. Shen and K.-C. Chou, “Signal-3L: a 3-layer approach for predicting signal peptides,” Biochemical and Biophysical Research Communications, vol. 363, no. 2, pp. 297–303, 2007. View at Publisher · View at Google Scholar · View at Scopus
  92. D. Plewczynski, L. Slabinski, K. Ginalski, and L. Rychlewski, “Prediction of signal peptides in protein sequences by neural networks,” Acta Biochimica Polonica, vol. 55, no. 2, pp. 261–267, 2008. View at Google Scholar · View at Scopus
  93. J. Sun and L. Wang, “Predicting signal peptides and their cleavage sites using support vector machines and improved position weight matrixes,” in Proceedings of the 4th International Conference on Natural Computation (ICNC ’08), vol. 5, pp. 95–99, Jinan, China, October 2008. View at Publisher · View at Google Scholar · View at Scopus
  94. Y. Wang, Q. Zhang, M.-A. Sun, and D. Guo, “High-accuracy prediction of bacterial type III secreted effectors based on position-specific amino acid composition profiles,” Bioinformatics, vol. 27, no. 6, pp. 777–784, 2011. View at Publisher · View at Google Scholar · View at Scopus
  95. Z. Zheng, Y. Chen, L. Chen, G. Guo, Y. Fan, and X. Kong, “Signal-BNF: a Bayesian network fusing approach to predict signal peptides,” Journal of Biomedicine and Biotechnology, vol. 2012, Article ID 492174, 8 pages, 2012. View at Publisher · View at Google Scholar · View at Scopus
  96. A. Gutteridge, G. J. Bartlett, and J. M. Thornton, “Using a neural network and spatial clustering to predict the location of active sites in enzymes,” Journal of Molecular Biology, vol. 330, no. 4, pp. 719–734, 2003. View at Publisher · View at Google Scholar · View at Scopus
  97. Y.-R. Tang, Z.-Y. Sheng, Y.-Z. Chen, and Z. Zhang, “An improved prediction of catalytic residues in enzyme structures,” Protein Engineering, Design and Selection, vol. 21, no. 5, pp. 295–302, 2008. View at Publisher · View at Google Scholar · View at Scopus
  98. N. V. Petrova and C. H. Wu, “Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties,” BMC Bioinformatics, vol. 7, article 312, 2006. View at Publisher · View at Google Scholar · View at Scopus
  99. W. Tong, R. J. Williams, Y. Wei, L. F. Murga, J. Ko, and M. J. Ondrechen, “Enhanced performance in prediction of protein active sites with THEMATICS and support vector machines,” Protein Science, vol. 17, no. 2, pp. 333–341, 2008. View at Publisher · View at Google Scholar · View at Scopus
  100. G. Pugalenthi, K. K. Kumar, P. N. Suganthan, and R. Gangal, “Identification of catalytic residues from protein structure using support vector machine with sequence and structural features,” Biochemical and Biophysical Research Communications, vol. 367, no. 3, pp. 630–634, 2008. View at Publisher · View at Google Scholar · View at Scopus
  101. Y.-T. Chien and S.-W. Huang, “On the structural context and identification of enzyme catalytic residues,” BioMed Research International, vol. 2013, Article ID 802945, 9 pages, 2013. View at Publisher · View at Google Scholar · View at Scopus
  102. M. Bhasin and G. P. S. Raghava, “Classification of nuclear receptors based on amino acid composition and dipeptide composition,” The Journal of Biological Chemistry, vol. 279, no. 22, pp. 23262–23266, 2004. View at Publisher · View at Google Scholar · View at Scopus
  103. J. Cai and Y. Li, “Classification of nuclear receptor subfamilies with RBF Kernel in support vector machine,” in Advances in Neural Networks—ISNN 2005, vol. 3498 of Lecture Notes in Computer Science, pp. 680–685, 2005. View at Publisher · View at Google Scholar
  104. Q.-B. Gao, Z.-C. Jin, X.-F. Ye, C. Wu, and J. He, “Prediction of nuclear receptors with optimal pseudo amino acid composition,” Analytical Biochemistry, vol. 387, no. 1, pp. 54–59, 2009. View at Publisher · View at Google Scholar · View at Scopus
  105. X. Xiao, P. Wang, and K.-C. Chou, “iNR-physchem: a sequence-based predictor for identifying nuclear receptors and their subfamilies via physical-chemical property matrix,” PLoS ONE, vol. 7, no. 2, Article ID e30869, 2012. View at Publisher · View at Google Scholar · View at Scopus
  106. P. Wang and X. Xiao, “NRPred-FS: a feature selection based twolevel predictor for nuclear receptors,” Journal of Proteomics & Bioinformatics, supplement 9, article 2, 2014. View at Google Scholar
  107. P. Wang, X. Xiao, and K.-C. Chou, “NR-2l: a two-level predictor for identifying nuclear receptor subfamilies based on sequence-derived features,” PLoS ONE, vol. 6, no. 8, Article ID e23505, 2011. View at Publisher · View at Google Scholar · View at Scopus
  108. M. Bhasin and G. P. S. Raghava, “GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors,” Nucleic Acids Research, vol. 32, supplement 2, pp. W383–W389, 2004. View at Publisher · View at Google Scholar · View at Scopus
  109. Q.-B. Gao and Z.-Z. Wang, “Classification of G-protein coupled receptors at four levels,” Protein Engineering, Design and Selection, vol. 19, no. 11, pp. 511–516, 2006. View at Publisher · View at Google Scholar · View at Scopus
  110. Q. Gu, Y.-S. Ding, and T.-L. Zhang, “Prediction of G-protein-coupled receptor classes in low homology using chous pseudo amino acid composition with approximate entropy and hydrophobicity patterns,” Protein & Peptide Letters, vol. 17, no. 5, pp. 559–567, 2010. View at Publisher · View at Google Scholar · View at Scopus
  111. Z.-L. Peng, J.-Y. Yang, and X. Chen, “An improved classification of G-protein-coupled receptors using sequence-derived features,” BMC Bioinformatics, vol. 11, article 420, 2010. View at Publisher · View at Google Scholar · View at Scopus
  112. T. Wang and J. Yang, Dimensionality Reduction Method for Predicting Membrane Proteins and Their Types, 2010.
  113. G. Huang, Y. Zhang, L. Chen, N. Zhang, T. Huang, and Y. D. Cai, “Prediction of multi-type membrane proteins in human by an integrated approach,” PLoS ONE, vol. 9, no. 3, Article ID e93553, 2014.
  114. H.-B. Shen, J. Yang, and K.-C. Chou, “Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition,” Journal of Theoretical Biology, vol. 240, no. 1, pp. 9–13, 2006.
  115. C. Z. Cai, Q. F. Yuan, H. G. Xiao, X. H. Liu, L. Y. Han, and Y. Z. Chen, “Prediction of transmembrane proteins from their primary sequence by support vector machine approach,” in Computational Intelligence and Bioinformatics, pp. 525–533, Springer, Berlin, Germany, 2006.
  116. X.-G. Yang, R.-Y. Luo, and Z.-P. Feng, “Using amino acid and peptide composition to predict membrane protein types,” Biochemical and Biophysical Research Communications, vol. 353, no. 1, pp. 164–169, 2007.
  117. P.-Y. Zhao and Y.-S. Ding, “Prediction of membrane protein types by an ensemble classifier based on pseudo amino acid composition and approximate entropy,” in Proceedings of the International Conference on BioMedical Engineering and Informatics (BMEI ’08), vol. 1, pp. 164–168, May 2008.
  118. M. Deng, K. Zhang, S. Mehta, T. Chen, and F. Sun, “Prediction of protein function using protein-protein interaction data,” Journal of Computational Biology, vol. 10, no. 6, pp. 947–960, 2003.
  119. E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, “Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps,” Bioinformatics, vol. 21, supplement 1, pp. i302–i310, 2005.
  120. H. N. Chua, W.-K. Sung, and L. Wong, “Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions,” Bioinformatics, vol. 22, no. 13, pp. 1623–1630, 2006.
  121. G. Pandey, M. Steinbach, R. Gupta, T. Garg, and V. Kumar, “Association analysis-based transformations for protein interaction networks: a function prediction case study,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 540–549, August 2007.
  122. C. D. Nguyen, K. J. Gardiner, D. Nguyen, and K. J. Cios, “Prediction of protein functions from protein interaction networks: a Naïve Bayes approach,” in Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence (PRICAI ’08), pp. 788–798, 2008.
  123. P. Bogdanov and A. K. Singh, “Molecular function prediction using neighborhood features,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
  124. M. Li, X. Wu, J. Wang, and Y. Pan, “Towards the identification of protein complexes and functional modules by integrating PPI network and gene expression data,” BMC Bioinformatics, vol. 13, no. 1, article 109, 2012.
  125. W. Xiong, H. Liu, J. Guan, and S. Zhou, “Protein function prediction by collective classification with explicit and implicit edges in protein-protein interaction networks,” BMC Bioinformatics, vol. 14, supplement 12, article S4, 2013.
  126. H. Wang, H. Huang, and C. Ding, “Function-function correlated multi-label protein function prediction over interaction networks,” Journal of Computational Biology, vol. 20, no. 4, pp. 322–343, 2013.
  127. M. Cao, H. Zhang, J. Park et al., “Going the distance for protein function prediction: a new distance metric for protein interaction networks,” PLoS ONE, vol. 8, no. 10, Article ID e76339, 2013.
  128. A. Mateos, J. Dopazo, R. Jansen, Y. Tu, M. Gerstein, and G. Stolovitzky, “Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons,” Genome Research, vol. 12, no. 11, pp. 1703–1715, 2002.
  129. M. Deng, T. Chen, and F. Sun, “An integrated probabilistic model for functional prediction of proteins,” Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
  130. A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, “A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis,” Bioinformatics, vol. 21, no. 5, pp. 631–643, 2005.
  131. J. H. Hong and S. B. Cho, “Ensemble genetic programming for classifying gene expression data,” in Proceedings of the 2nd Asian-Pacific Workshop on Genetic Programming, 2004.
  132. T. K. Paul, Y. Hasegawa, and H. Iba, “Classification of gene expression data by majority voting genetic programming classifier,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC ’06), pp. 2521–2528, Vancouver, Canada, July 2006.
  133. S. M. Winkler, M. Affenzeller, and S. Wagner, “Using enhanced genetic programming techniques for evolving classifiers in the context of medical diagnosis,” Genetic Programming and Evolvable Machines, vol. 10, no. 2, pp. 111–140, 2009.
  134. K.-H. Liu and C.-G. Xu, “A genetic programming-based approach to the classification of multiclass microarray datasets,” Bioinformatics, vol. 25, no. 3, pp. 331–337, 2009.
  135. X.-L. Li, Y.-C. Tan, and S.-K. Ng, “Systematic gene function prediction from gene expression data by using a fuzzy nearest-cluster method,” BMC Bioinformatics, vol. 7, supplement 4, article S23, 2006.
  136. G.-G. Li and Z.-Z. Wang, “Incorporating heterogeneous biological data sources in clustering gene expression data,” Health, vol. 1, no. 1, pp. 17–23, 2009.
  137. L. Tran, “Hypergraph and protein function prediction with gene expression data,” http://arxiv.org/abs/1212.0388.
  138. T. Puelma, R. A. Gutiérrez, and A. Soto, “Discriminative local subspaces in gene expression data for effective gene function prediction,” Bioinformatics, vol. 28, no. 17, pp. 2256–2264, 2012.
  139. I. Dinu, J. D. Potter, T. Mueller et al., “Improving gene set analysis of microarray data by SAM-GS,” BMC Bioinformatics, vol. 8, article 242, 2007.
  140. F. Tai and W. Pan, “Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data,” Bioinformatics, vol. 23, no. 23, pp. 3170–3177, 2007.
  141. H. Pang and H. Zhao, “Building pathway clusters from Random Forests classification using class votes,” BMC Bioinformatics, vol. 9, article 87, 2008.
  142. J. M. Dale, L. Popescu, and P. D. Karp, “Machine learning methods for metabolic pathway prediction,” BMC Bioinformatics, vol. 11, article 15, 2010.
  143. W. Zhang, S. Emrich, and E. Zeng, “A two-stage machine learning approach for pathway analysis,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM ’10), pp. 274–279, December 2010.
  144. B. Shahbaba, C. M. Shachaf, and Z. Yu, “A pathway analysis method for genome-wide association studies,” Statistics in Medicine, vol. 31, no. 10, pp. 988–1000, 2012.
  145. L. J. Jensen, M. Skovgaard, and S. Brunak, “Prediction of novel archaeal enzymes from sequence-derived features,” Protein Science, vol. 11, no. 12, pp. 2894–2898, 2002.
  146. B. Neyshabur, A. Khadem, S. Hashemifar, and S. S. Arab, “NETAL: a new graph-based method for global alignment of protein-protein interaction networks,” Bioinformatics, vol. 29, no. 13, pp. 1654–1662, 2013.
  147. S.-H. Chen, J. Sun, L. Dimitrov et al., “A support vector machine approach for detecting gene-gene interaction,” Genetic Epidemiology, vol. 32, no. 2, pp. 152–167, 2008.
  148. S. Asur, D. Ucar, and S. Parthasarathy, “An ensemble framework for clustering protein-protein interaction networks,” Bioinformatics, vol. 23, no. 13, pp. i29–i40, 2007.
  149. H. Xiong, X. He, C. Ding, Y. Zhang, V. Kumar, and S. R. Holbrook, “Identification of functional modules in protein complexes via hyperclique pattern discovery,” in Proceedings of the Pacific Symposium on Biocomputing, pp. 221–232, 2005.
  150. J. Nikkilä, P. Törönen, S. Kaski, J. Venna, E. Castrén, and G. Wong, “Analysis and visualization of gene expression data using self-organizing maps,” Neural Networks, vol. 15, no. 8-9, pp. 953–966, 2002.
  151. P. Törönen, M. Kolehmainen, G. Wong, and E. Castrén, “Analysis of gene expression data using self-organizing maps,” FEBS Letters, vol. 451, no. 2, pp. 142–146, 1999.
  152. P. Tamayo, D. Slonim, J. Mesirov et al., “Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation,” Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 6, pp. 2907–2912, 1999.
  153. M. P. S. Brown, W. N. Grundy, D. Lin et al., “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 1, pp. 262–267, 2000.
  154. K. Torkkola, R. M. Gardner, T. Kaysser-Kranich, and C. Ma, “Self-organizing maps in mining gene expression data,” Information Sciences, vol. 139, no. 1-2, pp. 79–96, 2001.
  155. A. Zien, R. Küffner, R. Zimmer, and T. Lengauer, “Analysis of gene expression data with pathway scores,” Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB ’00), vol. 8, pp. 407–417, 2000.