BioMed Research International
Volume 2014, Article ID 294279, 10 pages
http://dx.doi.org/10.1155/2014/294279
Research Article

enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning

1School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
2Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
3Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 518055, China
4Gordon Life Science Institute, Belmont, Massachusetts, USA
5PKU-HKUST ShenZhen-Hong Kong Institution, Shenzhen, Guangdong 518055, China
6Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
7School of Engineering & Applied Science, Aston University, Birmingham B4 7ET, UK
8School of Information Science and Technology, Xiamen University, Xiamen, Fujian 316005, China

Received 28 February 2014; Revised 5 May 2014; Accepted 5 May 2014; Published 26 May 2014

Academic Editor: Dongchun Liang

Copyright © 2014 Ruifeng Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. Y.-D. Cai and S. L. Lin, “Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence,” Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics, vol. 1648, no. 1-2, pp. 127–133, 2003.
  2. X. Yu, J. Cao, Y. Cai et al., “Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines,” Journal of Theoretical Biology, vol. 240, no. 2, pp. 175–184, 2006.
  3. L. Nanni and A. Lumini, “An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins,” Amino Acids, vol. 36, no. 2, pp. 167–175, 2009.
  4. M. Kumar, M. M. Gromiha, and G. P. S. Raghava, “Identification of DNA-binding proteins using support vector machines and evolutionary profiles,” BMC Bioinformatics, vol. 8, article 463, 2007.
  5. N. Bhardwaj, R. E. Langlois, G. Zhao, and H. Lu, “Kernel-based machine learning protocol for predicting DNA-binding proteins,” Nucleic Acids Research, vol. 33, no. 20, pp. 6486–6493, 2005.
  6. Y. Fang, Y. Guo, Y. Feng, and M. Li, “Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features,” Amino Acids, vol. 34, no. 1, pp. 103–109, 2008.
  7. N. Bhardwaj and H. Lu, “Residue-level prediction of DNA-binding sites and its application on DNA-binding protein predictions,” FEBS Letters, vol. 581, no. 5, pp. 1058–1066, 2007.
  8. K. K. Kumar, G. Pugalenthi, and P. N. Suganthan, “DNA-prot: identification of DNA binding proteins from protein sequence information using random forest,” Journal of Biomolecular Structure and Dynamics, vol. 26, no. 6, pp. 679–686, 2009.
  9. G. Nimrod, A. Szilágyi, C. Leslie, and N. Ben-Tal, “Identification of DNA-binding proteins using structural, electrostatic and evolutionary features,” Journal of Molecular Biology, vol. 387, no. 4, pp. 1040–1053, 2009.
  10. G. Nimrod, M. Schushan, A. Szilágyi, C. Leslie, and N. Ben-Tal, “iDBPs: a web server for the identification of DNA binding proteins,” Bioinformatics, vol. 26, no. 5, pp. 692–693, 2010.
  11. E. W. Stawiski, L. M. Gregoret, and Y. Mandel-Gutfreund, “Annotating nucleic acid-binding function based on protein structure,” Journal of Molecular Biology, vol. 326, no. 4, pp. 1065–1079, 2003.
  12. M. Keil, T. E. Exner, and J. Brickmann, “Pattern recognition strategies for molecular surfaces: III. Binding site prediction with a neural network,” Journal of Computational Chemistry, vol. 25, no. 6, pp. 779–789, 2004.
  13. S. Ahmad and A. Sarai, “Moment-based prediction of DNA-binding proteins,” Journal of Molecular Biology, vol. 341, no. 1, pp. 65–71, 2004.
  14. A. K. Patel, S. Patel, and P. K. Naik, “Binary classification of uncharacterized proteins into DNA binding/non-DNA binding proteins from sequence derived features using ANN,” Digest Journal of Nanomaterials and Biostructures, vol. 4, no. 4, pp. 775–782, 2009.
  15. A. K. Patel, S. Patel, and P. K. Naik, “Prediction and classification of DNA binding proteins into four major classes based on simple sequence derived features using ANN,” Digest Journal of Nanomaterials and Biostructures, vol. 5, no. 1, pp. 191–200, 2010.
  16. B. Molparia, K. Goyal, A. Sarkar, S. Kumar, and D. Sundar, “ZiF-predict: a web tool for predicting DNA-binding specificity in C2H2 zinc finger proteins,” Genomics, Proteomics and Bioinformatics, vol. 8, no. 2, pp. 122–126, 2010.
  17. A. Neumann, J. Holstein, J.-R. le Gall, and E. Lepage, “Measuring performance in health care: case-mix adjustment by boosted decision trees,” Artificial Intelligence in Medicine, vol. 32, no. 2, pp. 97–113, 2004.
  18. Y. Cai, J. He, X. Li et al., “A novel computational approach to predict transcription factor DNA binding preference,” Journal of Proteome Research, vol. 8, no. 2, pp. 999–1003, 2009.
  19. H. P. Shanahan, M. A. Garcia, S. Jones, and J. M. Thornton, “Identifying DNA-binding proteins using structural motifs and the electrostatic potential,” Nucleic Acids Research, vol. 32, no. 16, pp. 4732–4741, 2004.
  20. S. Ahmad, M. M. Gromiha, and A. Sarai, “Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information,” Bioinformatics, vol. 20, no. 4, pp. 477–486, 2004.
  21. E. Nordhoff, A.-M. Krogsdam, H. F. Jørgensen et al., “Rapid identification of DNA-binding proteins by mass spectrometry,” Nature Biotechnology, vol. 17, no. 9, pp. 884–888, 1999.
  22. A. Bairoch and R. Apweiler, “The SWISS-PROT protein sequence data bank and its supplement TrEMBL,” Nucleic Acids Research, vol. 25, pp. 31–36, 1997.
  23. W. Chen, H. Lin, P. M. Feng, C. Ding, Y. C. Zuo, and K. C. Chou, “iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties,” PLoS ONE, vol. 7, Article ID e47843, 2012.
  24. W. Chen, P. M. Feng, H. Lin, and K. C. Chou, “iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition,” Nucleic Acids Research, vol. 41, article e69, 2013.
  25. X. Xiao, P. Wang, W. Z. Lin, J. H. Jia, and K. C. Chou, “iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types,” Analytical Biochemistry, vol. 436, pp. 168–177, 2013.
  26. Y. Xu, X. J. Shao, L. Y. Wu, N. Y. Deng, and K. C. Chou, “iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins,” PeerJ, vol. 1, article e171, 2013.
  27. B. Liu, J. Xu, Q. Zou, R. Xu, X. Wang, and Q. Chen, “Using distances between Top-n-gram and residue pairs for protein remote homology detection,” BMC Bioinformatics, vol. 15, supplement 2, p. S3, 2014.
  28. B. Liu, J. Yi, A. Sv. et al., “QChIPat: a quantitative method to identify distinct binding patterns for two biological ChIP-seq samples in different experimental conditions,” BMC Genomics, vol. 14, supplement 8, p. S3, 2013.
  29. B. Liu, X. Wang, Q. Zou, Q. Dong, and Q. Chen, “Protein remote homology detection by combining Chou's pseudo amino acid composition and profile-based protein representation,” Molecular Informatics, vol. 32, pp. 775–782, 2013.
  30. B. Liu, X. Wang, Q. Chen, Q. Dong, and X. Lan, “Using amino acid physicochemical distance transformation for fast protein remote homology detection,” PLoS ONE, vol. 7, no. 9, Article ID e46633, 2012.
  31. Y. Zhang, B. Liu, Q. Dong, and V. X. Jin, “An improved profile-level domain linker propensity index for protein domain boundary prediction,” Protein and Peptide Letters, vol. 18, no. 1, pp. 7–16, 2011.
  32. B. Liu, X. Wang, L. Lin, B. Tang, Q. Dong, and X. Wang, “Prediction of protein binding sites in protein structures using hidden Markov support vector machine,” BMC Bioinformatics, vol. 10, article 381, 2009.
  33. B. Liu, X. Wang, L. Lin, Q. Dong, and X. Wang, “Exploiting three kinds of interface propensities to identify protein binding sites,” Computational Biology and Chemistry, vol. 33, no. 4, pp. 303–311, 2009.
  34. B. Liu, X. Wang, L. Lin, Q. Dong, and X. Wang, “A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis,” BMC Bioinformatics, vol. 9, article 510, 2008.
  35. B. Liu, D. Zhang, R. Xu et al., “Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection,” Bioinformatics, vol. 30, no. 4, pp. 472–479, 2014.
  36. K.-C. Chou, “Some remarks on protein attribute prediction and pseudo amino acid composition,” Journal of Theoretical Biology, vol. 273, no. 1, pp. 236–247, 2011.
  37. H. M. Berman, J. Westbrook, Z. Feng et al., “The protein data bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000.
  38. W. Li and A. Godzik, “Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences,” Bioinformatics, vol. 22, no. 13, pp. 1658–1659, 2006.
  39. L. Wang and S. J. Brown, “BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences,” Nucleic Acids Research, vol. 34, pp. W243–W248, 2006.
  40. E. Gasteiger, A. Gattiker, C. Hoogland, I. Ivanyi, R. D. Appel, and A. Bairoch, “ExPASy: the proteomics server for in-depth protein knowledge and analysis,” Nucleic Acids Research, vol. 31, no. 13, pp. 3784–3788, 2003.
  41. N. M. Luscombe, S. E. Austin, H. M. Berman, and J. M. Thornton, “An overview of the structures of protein-DNA complexes,” Genome Biology, vol. 1, no. 1, 2000.
  42. C. Z. Cai, L. Y. Han, Z. L. Ji, X. Chen, and Y. Z. Chen, “SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence,” Nucleic Acids Research, vol. 31, no. 13, pp. 3692–3697, 2003.
  43. C. Lin, Y. Zou, J. Qin et al., “Hierarchical classification of protein folds using a novel ensemble classifier,” PLoS ONE, vol. 8, no. 2, Article ID e56499, 2013.
  44. R. E. Schapire, “The strength of weak learnability,” Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.
  45. Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to Boosting,” Journal of Computer and System Sciences, vol. 55, pp. 119–139, 1997.
  46. L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, pp. 123–140, 1996.
  47. D. H. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, no. 2, pp. 241–260, 1992.
  48. C. Lin, Y. Zou, J. Qin, X. Liu, and Y. Jiang, “Hierarchical classification of protein folds using a novel ensemble classifier,” PLoS ONE, vol. 8, no. 2, Article ID e56499, 2013.
  49. E. Frank, M. Hall, L. Trigg, G. Holmes, and I. H. Witten, “Data mining in bioinformatics using Weka,” Bioinformatics, vol. 20, no. 15, pp. 2479–2481, 2004.
  50. W.-Z. Lin, J.-A. Fang, X. Xiao, and K.-C. Chou, “iDNA-prot: identification of DNA binding proteins using random forest with grey model,” PLoS ONE, vol. 6, no. 9, Article ID e24756, 2011.
  51. J. Deng, “Grey entropy and grey target decision making,” The Journal of Grey System, vol. 22, no. 1, pp. 1–24, 2010.