Computational and Mathematical Methods in Medicine
Volume 2013, Article ID 530696, 6 pages
http://dx.doi.org/10.1155/2013/530696
Research Article

Naïve Bayes Classifier with Feature Selection to Identify Phage Virion Proteins

1 School of Public Health, Hebei United University, Tangshan 063000, China
2 Key Laboratory for Neuroinformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
3 Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China

Received 10 March 2013; Revised 16 April 2013; Accepted 28 April 2013

Academic Editor: Yanxin Huang

Copyright © 2013 Peng-Mian Feng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. R. Lavigne, P. J. Ceyssens, and J. Robben, “Phage proteomics: applications of mass spectrometry,” Methods in Molecular Biology, vol. 502, pp. 239–251, 2009.
  2. M. A. Hall, Correlation-Based Feature Selection for Machine Learning: Data Mining, Inference and Prediction, Springer, Berlin, Germany, 2008.
  3. K. C. Chou, “Some remarks on protein attribute prediction and pseudo amino acid composition,” Journal of Theoretical Biology, vol. 273, no. 1, pp. 236–247, 2011.
  4. K. C. Chou, “Some remarks on predicting multi-label attributes in molecular biosystems,” Molecular BioSystems, 2013.
  5. W. Z. Lin, J. A. Fang, X. Xiao, and K. C. Chou, “iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins,” Molecular BioSystems, vol. 9, no. 9, pp. 634–644, 2013.
  6. X. Xiao, P. Wang, W. Z. Lin, J. H. Jia, and K. C. Chou, “iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types,” Analytical Biochemistry, vol. 436, no. 2, pp. 168–177, 2013.
  7. W. Chen, P. M. Feng, H. Lin, and K. C. Chou, “iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition,” Nucleic Acids Research, vol. 41, no. 6, article e68, 2013.
  8. K. C. Chou, Z. C. Wu, and X. Xiao, “iLoc-Hum: using accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites,” Molecular BioSystems, vol. 8, no. 2, pp. 629–641, 2012.
  9. W. Chen, H. Lin, P. M. Feng, C. Ding, Y.-C. Zuo, and K.-C. Chou, “iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties,” PLoS ONE, vol. 7, no. 10, Article ID e47843, 2012.
  10. Y. Xu, J. Ding, L. Y. Wu, and K.-C. Chou, “iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition,” PLoS ONE, vol. 8, no. 2, Article ID e55844, 2013.
  11. UniProt Consortium, “Reorganizing the protein space at the Universal Protein Resource (UniProt),” Nucleic Acids Research, vol. 40, pp. D71–D75, 2012.
  12. W. Li and A. Godzik, “Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences,” Bioinformatics, vol. 22, no. 13, pp. 1658–1659, 2006.
  13. K. C. Chou, “Prediction of protein cellular attributes using pseudo-amino acid composition,” Proteins, vol. 43, no. 3, pp. 246–255, 2001.
  14. K. C. Chou, “Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes,” Bioinformatics, vol. 21, no. 1, pp. 10–19, 2005.
  15. L. Nanni, A. Lumini, D. Gupta, and A. Garg, “Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou's pseudo amino acid composition and on evolutionary information,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 467–475, 2012.
  16. D. Zou, Z. He, J. He, and Y. Xia, “Supersecondary structure prediction using Chou's pseudo amino acid composition,” Journal of Computational Chemistry, vol. 32, no. 2, pp. 271–278, 2011.
  17. S. W. Zhang, Y. L. Zhang, H. F. Yang, C. H. Zhao, and Q. Pan, “Using the concept of Chou's pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies,” Amino Acids, vol. 34, no. 4, pp. 565–572, 2008.
  18. K. K. Kandaswamy, G. Pugalenthi, S. Möller et al., “Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition,” Protein and Peptide Letters, vol. 17, no. 12, pp. 1473–1479, 2010.
  19. S. Mei, “Predicting plant protein subcellular multi-localization by Chou's PseAAC formulation based multi-label homolog knowledge transfer learning,” Journal of Theoretical Biology, vol. 310, pp. 80–87, 2012.
  20. Y. K. Chen and K. B. Li, “Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition,” Journal of Theoretical Biology, vol. 318, pp. 1–12, 2013.
  21. M. Hayat and A. Khan, “Discriminating outer membrane proteins with fuzzy K-nearest neighbor algorithms based on the general form of Chou's PseAAC,” Protein and Peptide Letters, vol. 19, no. 4, pp. 411–421, 2012.
  22. M. Khosravian, F. K. Faramarzi, M. M. Beigi, M. Behbahani, and H. Mohabatkar, “Predicting antibacterial peptides by the concept of Chou's pseudo-amino acid composition and machine learning methods,” Protein and Peptide Letters, vol. 20, no. 2, pp. 180–186, 2012.
  23. H. Mohabatkar, M. M. Beigi, K. Abdolahi, and S. Mohsenzadeh, “Prediction of allergenic proteins by means of the concept of Chou's pseudo amino acid composition and a machine learning approach,” Medicinal Chemistry, vol. 9, no. 1, pp. 133–137, 2013.
  24. M. M. Beigi, M. Behjati, and H. Mohabatkar, “Prediction of metalloproteinase family based on the concept of Chou's pseudo amino acid composition using a machine learning approach,” Journal of Structural and Functional Genomics, vol. 12, no. 4, pp. 191–197, 2011.
  25. S. S. Sahu and G. Panda, “A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction,” Computational Biology and Chemistry, vol. 34, no. 5-6, pp. 320–327, 2010.
  26. R. Zia-Ur and A. Khan, “Identifying GPCRs and their types with Chou's pseudo amino acid composition: an approach from multi-scale energy representation and position specific scoring matrix,” Protein and Peptide Letters, vol. 19, no. 8, pp. 890–903, 2012.
  27. X. Y. Sun, S. P. Shi, J. D. Qiu, S. B. Suo, S. Y. Huang, and R. P. Liang, “Identifying protein quaternary structural attributes by incorporating physicochemical properties into the general form of Chou's PseAAC via discrete wavelet transform,” Molecular BioSystems, vol. 8, no. 12, pp. 3178–3184, 2012.
  28. L. Nanni and A. Lumini, “Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization,” Amino Acids, vol. 34, no. 4, pp. 653–660, 2008.
  29. M. Esmaeili, H. Mohabatkar, and S. Mohsenzadeh, “Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses,” Journal of Theoretical Biology, vol. 263, no. 2, pp. 203–209, 2010.
  30. H. Mohabatkar, “Prediction of cyclin proteins using Chou's pseudo amino acid composition,” Protein and Peptide Letters, vol. 17, no. 10, pp. 1207–1214, 2010.
  31. H. Mohabatkar, M. Mohammad Beigi, and A. Esmaeili, “Prediction of GABAA receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine,” Journal of Theoretical Biology, vol. 281, no. 1, pp. 18–23, 2011.
  32. D. N. Georgiou, T. E. Karakasidis, J. J. Nieto, and A. Torres, “Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou's pseudo amino acid composition,” Journal of Theoretical Biology, vol. 257, no. 1, pp. 17–26, 2009.
  33. B. Q. Li, T. Huang, L. Liu, Y. D. Cai, and K. C. Chou, “Identification of colorectal cancer related genes with mRMR and shortest path in protein-protein interaction network,” PLoS ONE, vol. 7, no. 4, Article ID e33393, 2012.
  34. T. Huang, J. Wang, Y. D. Cai, H. Yu, and K. C. Chou, “Hepatitis C virus network based classification of hepatocellular cirrhosis and carcinoma,” PLoS ONE, vol. 7, no. 4, Article ID e34460, 2012.
  35. P. Du, X. Wang, C. Xu, and Y. Gao, “PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou's pseudo-amino acid compositions,” Analytical Biochemistry, vol. 425, no. 2, pp. 117–119, 2012.
  36. D. S. Cao, Q. S. Xu, and Y. Z. Liang, “Propy: a tool to generate various modes of Chou's PseAAC,” Bioinformatics, vol. 29, no. 7, pp. 960–962, 2013.
  37. L. Montanucci, P. Fariselli, P. L. Martelli, and R. Casadio, “Predicting protein thermostability changes from sequence upon multiple mutations,” Bioinformatics, vol. 24, no. 13, pp. i190–i195, 2008.
  38. M. M. Gromiha and M. X. Suresh, “Discrimination of mesophilic and thermophilic proteins using machine learning algorithms,” Proteins, vol. 70, no. 4, pp. 1274–1279, 2008.
  39. L. C. Wu, J. X. Lee, H. D. Huang, B. J. Liu, and J. T. Horng, “An expert system to predict protein thermostability using decision tree,” Expert Systems with Applications, vol. 36, no. 5, pp. 9007–9014, 2009.
  40. H. Lin and W. Chen, “Prediction of thermophilic proteins using feature selection technique,” Journal of Microbiological Methods, vol. 84, no. 1, pp. 67–70, 2011.
  41. W. Chen and H. Lin, “Identification of voltage-gated potassium channel subfamilies from sequence information using support vector machine,” Computers in Biology and Medicine, vol. 42, no. 4, pp. 504–507, 2012.
  42. T. M. Mitchell, Machine Learning, McGraw-Hill, New York, NY, USA, 1997.
  43. J. Cao, R. Panetta, S. Yue, A. Steyaert, M. Young-Bellido, and S. Ahmad, “A naive Bayes model to predict coupling between seven transmembrane domain receptors and G-proteins,” Bioinformatics, vol. 19, no. 2, pp. 234–240, 2003.
  44. M. Yousef, S. Jung, A. V. Kossenkov, L. C. Showe, and M. K. Showe, “Naïve Bayes for microRNA target predictions—machine learning for microRNA targets,” Bioinformatics, vol. 23, no. 22, pp. 2987–2992, 2007.
  45. Y. Murakami and K. Mizuguchi, “Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites,” Bioinformatics, vol. 26, no. 15, pp. 1841–1848, 2010.
  46. F. Sambo, E. Trifoglio, B. Di Camillo, G. M. Toffolo, and C. Cobelli, “Bag of Naïve Bayes: biomarker selection and classification from genome-wide SNP data,” BMC Bioinformatics, vol. 13, supplement 14, article S2, 2012.
  47. K. C. Chou, “A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space,” Proteins, vol. 21, no. 4, pp. 319–344, 1995.
  48. G. P. Zhou, “An intriguing controversy over protein structural class prediction,” Journal of Protein Chemistry, vol. 17, no. 8, pp. 729–738, 1998.
  49. K. C. Chou and D. W. Elrod, “Using discriminant function for prediction of subcellular location of prokaryotic proteins,” Biochemical and Biophysical Research Communications, vol. 252, no. 1, pp. 63–68, 1998.
  50. G. P. Zhou and N. Assa-Munt, “Some insights into protein structural class prediction,” Proteins, vol. 44, no. 1, pp. 57–59, 2001.
  51. Y. X. Pan, Z. Z. Zhang, Z. M. Guo, G. Y. Feng, Z. D. Huang, and L. He, “Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach,” Journal of Protein Chemistry, vol. 22, no. 4, pp. 395–402, 2003.
  52. G. P. Zhou and K. Doctor, “Subcellular location prediction of apoptosis proteins,” Proteins, vol. 50, no. 1, pp. 44–48, 2003.
  53. K. C. Chou and H. B. Shen, “Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms,” Natural Science, vol. 2, no. 10, pp. 1090–1103, 2010.
  54. M. Hayat and A. Khan, “MemHyb: predicting membrane protein types by hybridizing SAAC and PSSM,” Journal of Theoretical Biology, vol. 292, pp. 93–102, 2012.
  55. X. Xiao, Z. C. Wu, and K. C. Chou, “iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites,” Journal of Theoretical Biology, vol. 284, no. 1, pp. 42–51, 2011.
  56. K. C. Chou, Z. C. Wu, and X. Xiao, “iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins,” PLoS ONE, vol. 6, no. 3, Article ID e18258, 2011.
  57. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, Calif, USA, 2005.
  58. K. C. Chou and H. B. Shen, “Review: recent advances in developing web-servers for predicting protein attributes,” Natural Science, vol. 2, pp. 63–92, 2009.