BioMed Research International
Volume 2017 (2017), Article ID 4590609, 10 pages
https://doi.org/10.1155/2017/4590609
Research Article

HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features

¹Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
²School of Computing, Information and Mathematical Sciences, The University of the South Pacific, Suva, Fiji
³Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia
⁴School of Engineering and Physics, The University of the South Pacific, Suva, Fiji
⁵RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
⁶Department of Computer Science, Morgan State University, Baltimore, MD, USA

Correspondence should be addressed to Swakkhar Shatabda

Received 29 August 2017; Accepted 22 October 2017; Published 14 November 2017

Academic Editor: Paul Harrison

Copyright © 2017 Rianon Zaman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
