The Scientific World Journal
Volume 2014, Article ID 236717, 17 pages
http://dx.doi.org/10.1155/2014/236717
Research Article

An Empirical Study of Different Approaches for Protein Classification

1Dipartimento di Ingegneria dell’Informazione, Via Gradenigo 6/A, 35131 Padova, Italy
2DISI, Università di Bologna, Via Venezia 52, 47521 Cesena, Italy
3Computer Information Systems, Missouri State University, 901 South National, Springfield, MO 65804, USA

Received 24 March 2014; Revised 5 May 2014; Accepted 7 May 2014; Published 15 June 2014

Academic Editor: Wei Chen

Copyright © 2014 Loris Nanni et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. J. Wang, Y. Li, Q. Wang et al., “ProClusEnsem: predicting membrane protein types by fusing different models of pseudo amino acid composition,” Computers in Biology and Medicine, vol. 42, no. 5, pp. 564–574, 2012.
  2. K.-C. Chou, “Some remarks on protein attribute prediction and pseudo amino acid composition,” Journal of Theoretical Biology, vol. 273, no. 1, pp. 236–247, 2011.
  3. K.-C. Chou and H.-B. Shen, “MemType-2L: a Web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM,” Biochemical and Biophysical Research Communications, vol. 360, no. 2, pp. 339–345, 2007.
  4. K.-C. Chou and H.-B. Shen, “Recent progress in protein subcellular location prediction,” Analytical Biochemistry, vol. 370, no. 1, pp. 1–16, 2007.
  5. K.-C. Chou and H.-B. Shen, “Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides,” Biochemical and Biophysical Research Communications, vol. 357, no. 3, pp. 633–640, 2007.
  6. K.-C. Chou and H.-B. Shen, “A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0,” PLoS ONE, vol. 5, no. 4, Article ID e9931, 2010.
  7. M. Maddouri and M. Elloumi, “Encoding of primary structures of biological macromolecules within a data mining perspective,” Journal of Computer Science and Technology, vol. 19, no. 1, pp. 78–88, 2004.
  8. R. Saidi, M. Maddouri, and E. Mephu Nguifo, “Protein sequences classification by means of feature extraction with substitution matrices,” BMC Bioinformatics, vol. 11, article 175, 2010.
  9. H. Nakashima, K. Nishikawa, and T. Ooi, “The folding type of a protein is relevant to the amino acid composition,” Journal of Biochemistry, vol. 99, no. 1, pp. 153–162, 1986.
  10. K.-C. Chou, “Prediction of protein cellular attributes using pseudo-amino acid composition,” Proteins: Structure, Function and Genetics, vol. 43, no. 3, pp. 246–255, 2001.
  11. K.-C. Chou, “Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology,” Current Proteomics, vol. 6, no. 4, pp. 262–274, 2009.
  12. L. Nanni, S. Brahnam, and A. Lumini, “High performance set of PseAAC and sequence based descriptors for protein classification,” Journal of Theoretical Biology, vol. 266, no. 1, pp. 1–10, 2010.
  13. L. Nanni and A. Lumini, “An ensemble of K-local hyperplanes for predicting protein-protein interactions,” Bioinformatics, vol. 22, no. 10, pp. 1207–1210, 2006.
  14. H. Lin and H. Ding, “Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition,” Journal of Theoretical Biology, vol. 269, no. 1, pp. 64–69, 2011.
  15. C. H. Q. Ding and I. Dubchak, “Multi-class protein fold recognition using support vector machines and neural networks,” Bioinformatics, vol. 17, no. 4, pp. 349–358, 2001.
  16. H. Lin, W. Chen, L. Yuan, Z. Li, and H. Ding, “Using over-represented tetrapeptides to predict protein submitochondrial locations,” Acta Biotheoretica, vol. 61, no. 2, pp. 259–268, 2013.
  17. W.-Z. Lin, X. Xiao, and K.-C. Chou, “GPCR-GIA: a web-server for identifying G-protein coupled receptors and their families with grey incidence analysis,” Protein Engineering, Design and Selection, vol. 22, no. 11, pp. 699–705, 2009.
  18. X. Xiao, W.-Z. Lin, and K.-C. Chou, “Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes,” Journal of Computational Chemistry, vol. 29, no. 12, pp. 2018–2024, 2008.
  19. X. Xiao, S. Shao, Y. Ding, Z. Huang, and K.-C. Chou, “Using cellular automata images and pseudo amino acid composition to predict protein subcellular location,” Amino Acids, vol. 30, no. 1, pp. 49–54, 2006.
  20. X. Xiao, P. Wang, and K.-C. Chou, “GPCR-CA: a cellular automaton image approach for predicting G-protein-coupled receptor functional classes,” Journal of Computational Chemistry, vol. 30, no. 9, pp. 1414–1423, 2009.
  21. X. Xiao, S. Shao, Y. Ding, Z. Huang, Y. Huang, and K.-C. Chou, “Using complexity measure factor to predict protein subcellular location,” Amino Acids, vol. 28, no. 1, pp. 57–61, 2005.
  22. T. Jaakkola, M. Diekhans, and D. Haussler, “Using the Fisher kernel method to detect remote protein homologies,” Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB '99), AAAI Press, pp. 149–158, 1999.
  23. C. Leslie, E. Eskin, and W. S. Noble, “The spectrum kernel: a string kernel for SVM protein classification,” Pacific Symposium on Biocomputing, pp. 564–575, 2002.
  24. C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, “Mismatch string kernels for discriminative protein classification,” Bioinformatics, vol. 20, no. 4, pp. 467–476, 2004.
  25. Z. Lei and Y. Dai, “An SVM-based system for predicting protein subnuclear localizations,” BMC Bioinformatics, vol. 6, article 291, 2005.
  26. Z. R. Yang and R. Thomson, “Bio-basis function neural network for prediction of protease cleavage sites in proteins,” IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 263–274, 2005.
  27. M. Gribskov, A. D. McLachlan, and D. Eisenberg, “Profile analysis: detection of distantly related proteins,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 13, pp. 4355–4358, 1987.
  28. L. Nanni, A. Lumini, and S. Brahnam, “An empirical study on the matrix-based protein representations and their combination with sequence-based approaches,” Amino Acids, vol. 44, no. 3, pp. 887–901, 2013.
  29. S. R. Hegde, K. Pal, and S. C. Mande, “Differential enrichment of regulatory motifs in the composite network of protein-protein and gene regulatory interactions,” BMC Systems Biology, vol. 8, p. 26, 2014.
  30. T. Huang, P. Wang, Z. Ye et al., “Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties,” PLoS ONE, vol. 5, no. 7, Article ID e11900, 2010.
  31. L. J. Jensen, M. Kuhn, M. Stark et al., “STRING 8—a global view on proteins and their functional interactions in 630 organisms,” Nucleic Acids Research, vol. 37, no. 1, pp. D412–D416, 2009.
  32. A. Mora and I. M. Donaldson, “Effects of protein interaction data integration, representation and reliability on the use of network properties for drug target prediction,” BMC Bioinformatics, vol. 13, p. 294, 2012.
  33. R. Schweiger, M. Linial, and N. Linial, “Generative probabilistic models for protein-protein interaction networks-the biclique perspective,” Bioinformatics, vol. 27, no. 13, pp. i142–i148, 2011.
  34. L. Nanni, S. Brahnam, and A. Lumini, “Wavelet images and Chou's pseudo amino acid composition for protein classification,” Amino Acids, vol. 43, no. 2, pp. 657–665, 2012.
  35. S. Kawashima and M. Kanehisa, “AAindex: amino acid index database,” Nucleic Acids Research, vol. 28, no. 1, p. 374, 2000.
  36. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
  37. X. Yu, X. Zheng, T. Liu, Y. Dou, and J. Wang, “Predicting subcellular location of apoptosis proteins with pseudo amino acid composition: approach from amino acid substitution matrix and auto covariance transformation,” Amino Acids, vol. 42, no. 5, pp. 1619–1625, 2012.
  38. F.-M. Li and Q.-Z. Li, “Predicting protein subcellular location using Chou's pseudo amino acid composition and improved hybrid approach,” Protein and Peptide Letters, vol. 15, no. 6, pp. 612–616, 2008.
  39. S. Brahnam, L. Nanni, J.-Y. Shi, and A. Lumini, “Local phase quantization texture descriptor for protein classification,” in International Conference on Bioinformatics and Computational Biology (BIOCOMP '10), pp. 159–165, Las Vegas, Nev, USA, 2010.
  40. J.-Y. Shi and Y.-N. Zhang, “Using texture descriptor and radon transform to characterize protein structure and build fast fold recognition,” in International Association of Computer Science and Information Technology (IACSIT-SC '09), pp. 466–470, April 2009.
  41. J. Guo, Y. Lin, and Z. Sun, “A novel method for protein subcellular localization: combining residue-couple model and SVM,” in Proceedings of 3rd Asia-Pacific Bioinformatics Conference, pp. 117–129, 2005.
  42. Y.-H. Zeng, Y.-Z. Guo, R.-Q. Xiao, L. Yang, L.-Z. Yu, and M.-L. Li, “Using the augmented Chou's pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach,” Journal of Theoretical Biology, vol. 259, no. 2, pp. 366–372, 2009.
  43. E. Tantoso and K.-B. Li, “AAIndexLoc: predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices,” Amino Acids, vol. 35, no. 2, pp. 345–353, 2008.
  44. X. Li, B. Liao, Y. Shu, Q. Zeng, and J. Luo, “Protein functional class prediction using global encoding of amino acid sequence,” Journal of Theoretical Biology, vol. 261, no. 2, pp. 290–293, 2009.
  45. L. R. Murphy, A. Wallqvist, and R. M. Levy, “Simplified amino acid alphabets for protein fold recognition and implications for folding,” Protein Engineering, vol. 13, no. 3, pp. 149–152, 2000.
  46. M. Kumar, R. Verma, and G. P. S. Raghava, “Prediction of mitochondrial proteins using support vector machine and hidden Markov model,” Journal of Biological Chemistry, vol. 281, no. 9, pp. 5357–5363, 2006.
  47. J. C. Jeong, X. Lin, and X.-W. Chen, “On position-specific scoring matrix for protein function prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 308–315, 2011.
  48. A. Garg and D. Gupta, “VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens,” BMC Bioinformatics, vol. 9, article 62, 2008.
  49. L. Yang, Y. Li, R. Xiao et al., “Using auto covariance method for functional discrimination of membrane proteins based on evolution information,” Amino Acids, vol. 38, no. 5, pp. 1497–1503, 2010.
  50. G.-L. Fan and Q.-Z. Li, “Predicting protein submitochondrion locations by combining different descriptors into the general form of Chou's pseudo amino acid composition,” Amino Acids, vol. 20, pp. 1–11, 2011.
  51. G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” Numerische Mathematik, vol. 14, no. 5, pp. 403–420, 1970.
  52. N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Transactions on Computers, vol. 23, no. 1, pp. 90–93, 1974.
  53. A. Sharma, J. Lyons, A. Dehzangi, and K. K. Paliwal, “A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition,” Journal of Theoretical Biology, vol. 320, pp. 41–46, 2013.
  54. T. Ahonen, J. Matas, C. He, and M. Pietikäinen, “Rotation invariant image description with local binary pattern histogram Fourier features,” in Image Analysis, vol. 5575 of Lecture Notes in Computer Science, pp. 61–70, Springer, 2009.
  55. V. Ojansivu and J. Heikkilä, “Blur insensitive texture classification using local phase quantization,” Image and Signal Processing, vol. 5099, pp. 236–243, 2008.
  56. K.-C. Chou and H.-B. Shen, “Recent progress in protein subcellular location prediction,” Analytical Biochemistry, vol. 370, no. 1, pp. 1–16, 2007.
  57. Y. Fang, Y. Guo, Y. Feng, and M. Li, “Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features,” Amino Acids, vol. 34, no. 1, pp. 103–109, 2008.
  58. L. Nanni, S. Mazzara, L. Pattini, and A. Lumini, “Protein classification combining surface analysis and primary structure,” Protein Engineering, Design and Selection, vol. 22, no. 4, pp. 267–272, 2009.
  59. X.-Y. Pan, Y.-N. Zhang, and H.-B. Shen, “Large-scale prediction of human protein-protein interactions from amino acid sequence based on latent topic features,” Journal of Proteome Research, vol. 9, no. 10, pp. 4992–5001, 2010.
  60. P. Du and Y. Li, “Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence,” BMC Bioinformatics, vol. 7, article 518, 2006.
  61. H.-B. Shen and K.-C. Chou, “Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins,” Protein Engineering, Design and Selection, vol. 20, no. 1, pp. 39–46, 2007.
  62. J. R. Bock and D. A. Gough, “Whole-proteome interaction mining,” Bioinformatics, vol. 19, no. 1, pp. 125–135, 2003.
  63. H.-B. Shen and K.-C. Chou, “Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells,” Biopolymers, vol. 85, no. 3, pp. 233–240, 2007.
  64. X. Guang, Y. Guo, J. Xiao et al., “Predicting the state of cysteines based on sequence information,” Journal of Theoretical Biology, vol. 267, no. 3, pp. 312–318, 2010.
  65. T. Fawcett, ROC Graphs: Notes and Practical Considerations for Researchers, HP Laboratories, Palo Alto, Calif, USA, 2004.
  66. T. C. W. Landgrebe and R. P. W. Duin, “Approximating the multiclass ROC by pairwise analysis,” Pattern Recognition Letters, vol. 28, no. 13, pp. 1747–1758, 2007.
  67. Z.-C. Qin, “ROC analysis for predictions made by probabilistic classifiers,” in International Conference on Machine Learning and Cybernetics (ICMLC '05), pp. 3119–3124, August 2005.
  68. A. Ulaş, O. T. Yıldız, and E. Alpaydın, “Cost-conscious comparison of supervised learning algorithms over multiple data sets,” Pattern Recognition, vol. 45, no. 4, pp. 1772–1781, 2012.
  69. C.-D. Huang, C.-T. Lin, and N. R. Pal, “Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification,” IEEE Transactions on Nanobioscience, vol. 2, no. 4, pp. 221–232, 2003.
  70. A. Chinnasamy, W.-K. Sung, and A. Mittal, “Protein structure and fold prediction using Tree-Augmented naïve Bayesian classifier,” Journal of Bioinformatics and Computational Biology, vol. 3, no. 4, pp. 803–819, 2005.
  71. S. Martin, D. Roe, and J.-L. Faulon, “Predicting protein-protein interactions using signature products,” Bioinformatics, vol. 21, no. 2, pp. 218–226, 2005.
  72. K.-L. Lin, C.-Y. Lin, C.-D. Huang et al., “Feature selection and combination criteria for improving accuracy in protein structure prediction,” IEEE Transactions on Nanobioscience, vol. 6, no. 2, pp. 186–196, 2007.
  73. Y.-K. Chen and K.-B. Li, “Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition,” Journal of Theoretical Biology, vol. 318, pp. 1–12, 2013.
  74. J. J. Rodríguez, L. I. Kuncheva, and C. J. Alonso, “Rotation forest: a new classifier ensemble method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619–1630, 2006.