BioMed Research International
Volume 2015, Article ID 425810, 10 pages
http://dx.doi.org/10.1155/2015/425810
Research Article

Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

1 Golden Audit College, Nanjing Audit University, Nanjing 210029, China
2 State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China

Received 24 June 2015; Accepted 21 September 2015

Academic Editor: Liam McGuffin

Copyright © 2015 Xin Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. M. Ibba and D. Soll, “Protein-RNA molecular recognition,” Nature, vol. 381, no. 6584, p. 656, 1996.
  2. R. N. De Guzman, R. B. Turner, and M. F. Summers, “Protein-RNA recognition,” Biopolymers, vol. 48, no. 2-3, pp. 181–195, 1998.
  3. S. Cusack, “Aminoacyl-tRNA synthetases,” Current Opinion in Structural Biology, vol. 7, no. 6, pp. 881–889, 1997.
  4. S. M. Fernández-Moya and A. M. Estévez, “Posttranscriptional control and the role of RNA-binding proteins in gene regulation in trypanosomatid protozoan parasites,” Wiley Interdisciplinary Reviews: RNA, vol. 1, no. 1, pp. 34–46, 2010.
  5. Y.-D. Cai and A. J. Doig, “Prediction of Saccharomyces cerevisiae protein functional class from functional domain composition,” Bioinformatics, vol. 20, no. 8, pp. 1292–1300, 2004.
  6. V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
  7. Y.-D. Cai and S. L. Lin, “Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence,” Biochimica et Biophysica Acta, vol. 1648, no. 1-2, pp. 127–133, 2003.
  8. L. Y. Han, C. Z. Cai, S. L. Lo, M. C. M. Chung, and Y. Z. Chen, “Prediction of RNA-binding proteins from primary sequence by a support vector machine approach,” RNA, vol. 10, no. 3, pp. 355–368, 2004.
  9. X. Yu, J. Cao, Y. Cai, T. Shi, and Y. Li, “Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines,” Journal of Theoretical Biology, vol. 240, no. 2, pp. 175–184, 2006.
  10. X. Shao, Y. Tian, L. Wu, Y. Wang, L. Jing, and N. Deng, “Predicting DNA- and RNA-binding proteins from sequences with kernel methods,” Journal of Theoretical Biology, vol. 258, no. 2, pp. 289–293, 2009.
  11. M. Kumar, M. M. Gromiha, and G. P. S. Raghava, “SVM based prediction of RNA-binding proteins using binding residues and evolutionary information,” Journal of Molecular Recognition, vol. 24, no. 2, pp. 303–313, 2011.
  12. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  13. The UniProt Consortium, “Reorganizing the protein space at the Universal Protein Resource (UniProt),” Nucleic Acids Research, vol. 40, no. 1, pp. D71–D75, 2012.
  14. C. R. Peng, L. Liu, B. Niu et al., “Prediction of RNA-binding proteins by voting systems,” Journal of Biomedicine and Biotechnology, vol. 2011, Article ID 506205, 8 pages, 2011.
  15. X. Ma, J. Guo, J. Wu et al., “Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature,” Proteins, vol. 79, no. 4, pp. 1230–1239, 2011.
  16. S. Ahmad and A. Sarai, “PSSM-based prediction of DNA binding sites in proteins,” BMC Bioinformatics, vol. 6, article 33, 2005.
  17. S.-Y. Ho, F.-C. Yu, C.-Y. Chang, and H.-L. Huang, “Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method,” BioSystems, vol. 90, no. 1, pp. 234–241, 2007.
  18. L. Wang, M. Q. Yang, and J. Y. Yang, “Prediction of DNA-binding residues from protein sequence information using random forests,” BMC Genomics, vol. 10, supplement 1, article S1, 2009.
  19. J. Wu, H. Liu, X. Duan et al., “Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature,” Bioinformatics, vol. 25, no. 1, pp. 30–35, 2009.
  20. X. Ma, J.-S. Wu, H.-D. Liu, X.-N. Yang, J.-M. Xie, and X. Sun, “SVM-based approach for predicting DNA-binding residues in proteins from amino acid sequences,” in Proceedings of the International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, pp. 225–229, Shanghai, China, August 2009.
  21. L. Wang, C. Huang, M. Q. Yang, and J. Y. Yang, “BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features,” BMC Systems Biology, vol. 4, no. 1, article S3, 2010.
  22. Y.-F. Huang, L.-Y. Chiu, C.-C. Huang, and C.-K. Huang, “Predicting RNA-binding residues from evolutionary information and sequence conservation,” BMC Genomics, vol. 11, supplement 4, article S2, 2010.
  23. Y. Murakami, R. V. Spriggs, H. Nakamura, and S. Jones, “PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences,” Nucleic Acids Research, vol. 38, supplement 2, pp. W412–W416, 2010.
  24. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
  25. S. Ahmad, M. M. Gromiha, and A. Sarai, “Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information,” Bioinformatics, vol. 20, no. 4, pp. 477–486, 2004.
  26. S. Ahmad and A. Sarai, “Moment-based prediction of DNA-binding proteins,” Journal of Molecular Biology, vol. 341, no. 1, pp. 65–71, 2004.
  27. J. R. Bock and D. A. Gough, “Predicting protein-protein interactions from primary structure,” Bioinformatics, vol. 17, no. 5, pp. 455–460, 2001.
  28. J. Wang, Biochemistry, Higher Education, 2002 (Chinese).
  29. R. E. Buntrock, “ChemOffice ultra 7.0,” Journal of Chemical Information and Computer Sciences, vol. 42, no. 6, pp. 1505–1506, 2002.
  30. M. Rose, “Re: Balaban et al.—low volume bowel preparation for colonoscopy: randomized endoscopist-blinded trial of liquid sodium phosphate versus tablet sodium phosphate,” The American Journal of Gastroenterology, vol. 98, no. 10, pp. 2328–2329, 2003.
  31. D. Bonchev, “The overall Wiener index—a new tool for characterization of molecular topology,” Journal of Chemical Information and Computer Sciences, vol. 41, no. 3, pp. 582–592, 2001.
  32. J. Shen, J. Zhang, X. Luo et al., “Predicting protein-protein interactions based only on sequences information,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 11, pp. 4337–4341, 2007.
  33. A. Liaw and M. Wiener, “Classification and regression by random forest,” R News, vol. 2, no. 3, pp. 18–22, 2002.
  34. C. Zou, J. Gong, and H. Li, “An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis,” BMC Bioinformatics, vol. 14, article 90, 2013.
  35. Y.-F. Gao, B.-Q. Li, Y.-D. Cai, K.-Y. Feng, Z.-D. Li, and Y. Jiang, “Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection,” Molecular Biosystems, vol. 9, no. 1, pp. 61–69, 2013.
  36. T. Gui, X. Dong, R. Li, Y. Li, and Z. Wang, “Identification of hepatocellular carcinoma-related genes with a machine learning and network analysis,” Journal of Computational Biology, vol. 22, no. 1, pp. 63–71, 2015.
  37. B.-Q. Li, Y.-D. Cai, K.-Y. Feng, and G.-J. Zhao, “Prediction of protein cleavage site with feature selection by random forest,” PLoS ONE, vol. 7, no. 9, Article ID e45854, 2012.
  38. B.-Q. Li, K.-Y. Feng, L. Chen, T. Huang, and Y.-D. Cai, “Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS,” PLoS ONE, vol. 7, no. 8, Article ID e43927, 2012.
  39. B.-Q. Li, L.-L. Hu, L. Chen, K.-Y. Feng, Y.-D. Cai, and K.-C. Chou, “Prediction of protein domain with mRMR feature selection and analysis,” PLoS ONE, vol. 7, no. 6, Article ID e39308, 2012.
  40. X. Ma and X. Sun, “Sequence-based predictor of ATP-binding residues using random forest and mRMR-IFS feature selection,” Journal of Theoretical Biology, vol. 360, pp. 59–66, 2014.
  41. J. Wang, D. Zhang, and J. Li, “PREAL: prediction of allergenic protein by maximum Relevance Minimum Redundancy (mRMR) feature selection,” BMC Systems Biology, vol. 7, supplement 5, article S9, 2013.
  42. N. Zhang, Y. Zhou, T. Huang et al., “Discriminating between lysine sumoylation and lysine acetylation using mRMR feature selection and analysis,” PLoS ONE, vol. 9, no. 9, Article ID e107464, 2014.
  43. H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
  44. M. Treger and E. Westhof, “Statistical analysis of atomic contacts at RNA-protein interfaces,” Journal of Molecular Recognition, vol. 14, no. 4, pp. 199–214, 2001.
  45. M. Terribilini, J.-H. Lee, C. Yan, R. L. Jernigan, V. Honavar, and D. Dobbs, “Prediction of RNA binding sites in proteins from amino acid sequence,” RNA, vol. 12, no. 8, pp. 1450–1462, 2006.
  46. M. Kumar, M. M. Gromiha, and G. P. S. Raghava, “Prediction of RNA binding sites in a protein using SVM and PSSM profile,” Proteins: Structure, Function and Genetics, vol. 71, no. 1, pp. 189–194, 2008.