BioMed Research International
Volume 2016 (2016), Article ID 3281590, 12 pages
http://dx.doi.org/10.1155/2016/3281590
Research Article

RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest

¹Department of Computational Science and Engineering, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA
²Department of Electrical and Computer Engineering, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA
³Department of Biology, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA

Received 2 October 2015; Revised 13 January 2016; Accepted 31 January 2016

Academic Editor: Zhirong Sun

Copyright © 2016 Hamid D. Ismail et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. T. Hunter, “Signaling—2000 and beyond,” Cell, vol. 100, no. 1, pp. 113–127, 2000.
  2. R. H. Newman, J. Zhang, and H. Zhu, “Toward a systems-level view of dynamic phosphorylation networks,” Frontiers in Genetics, vol. 5, article 263, 2014.
  3. E. L. Huttlin, M. P. Jedrychowski, J. E. Elias et al., “A tissue-specific atlas of mouse protein phosphorylation and expression,” Cell, vol. 143, no. 7, pp. 1174–1189, 2010.
  4. P. J. Boersema, S. Mohammed, and A. J. R. Heck, “Phosphopeptide fragmentation and analysis by mass spectrometry,” Journal of Mass Spectrometry, vol. 44, no. 6, pp. 861–878, 2009.
  5. B. Trost and A. Kusalik, “Computational prediction of eukaryotic phosphorylation sites,” Bioinformatics, vol. 27, no. 21, pp. 2927–2935, 2011.
  6. M. Hjerrild and S. Gammeltoft, “Phosphoproteomics toolbox: computational biology, protein chemistry and mass spectrometry,” FEBS Letters, vol. 580, no. 20, pp. 4764–4770, 2006.
  7. Y. Xue, J. Ren, X. Gao, C. Jin, L. Wen, and X. Yao, “GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy,” Molecular & Cellular Proteomics, vol. 7, no. 9, pp. 1598–1608, 2008.
  8. N. Blom, S. Gammeltoft, and S. Brunak, “Sequence and structure-based prediction of eukaryotic protein phosphorylation sites,” Journal of Molecular Biology, vol. 294, no. 5, pp. 1351–1362, 1999.
  9. J. H. Kim, J. Lee, B. Oh, K. Kimm, and I. Koh, “Prediction of phosphorylation sites using SVMs,” Bioinformatics, vol. 20, no. 17, pp. 3179–3184, 2004.
  10. T. Li, F. Li, and X. Zhang, “Prediction of kinase-specific phosphorylation sites with sequence features by a log-odds ratio approach,” Proteins: Structure, Function and Genetics, vol. 70, no. 2, pp. 404–414, 2008.
  11. Y. Xue, A. Li, L. Wang, H. Feng, and X. Yao, “PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory,” BMC Bioinformatics, vol. 7, article 163, 2006.
  12. J. C. Obenauer, L. C. Cantley, and M. B. Yaffe, “Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs,” Nucleic Acids Research, vol. 31, no. 13, pp. 3635–3641, 2003.
  13. J. Hu, H.-S. Rho, R. H. Newman, J. Zhang, H. Zhu, and J. Qian, “PhosphoNetworks: a database for human phosphorylation networks,” Bioinformatics, vol. 30, no. 1, pp. 141–142, 2014.
  14. M. Wang, Y. Jiang, and X. Xu, “A novel method for predicting post-translational modifications on serine and threonine sites by using site-modification network profiles,” Molecular BioSystems, vol. 11, no. 11, pp. 3092–3100, 2015.
  15. S. Datta and S. Mukhopadhyay, “An ensemble method approach to investigate kinase-specific phosphorylation sites,” International Journal of Nanomedicine, vol. 9, no. 1, pp. 2225–2239, 2014.
  16. R. Patrick, K.-A. Le Cao, B. Kobe, and M. Boden, “PhosphoPICK: modelling cellular context to map kinase-substrate phosphorylation events,” Bioinformatics, vol. 31, no. 3, pp. 382–389, 2015.
  17. S. Datta and S. Mukhopadhyay, “A grammar inference approach for predicting kinase specific phosphorylation sites,” PLoS ONE, vol. 10, no. 4, Article ID e0122294, 2015.
  18. W. Fan, X. Xu, Y. Shen, H. Feng, A. Li, and M. Wang, “Prediction of protein kinase-specific phosphorylation sites in hierarchical structure using functional information and random forest,” Amino Acids, vol. 46, no. 4, pp. 1069–1078, 2014.
  19. R. H. Newman, J. Hu, H.-S. Rho et al., “Construction of human activity-based phosphorylation networks,” Molecular Systems Biology, vol. 9, article 655, 2013.
  20. A. K. Biswas, N. Noman, and A. R. Sikder, “Machine learning approach to predict protein phosphorylation sites by incorporating evolutionary information,” BMC Bioinformatics, vol. 11, article 273, 2010.
  21. K. Swaminathan, R. Adamczak, A. Porollo, and J. Meller, “Enhanced prediction of conformational flexibility and phosphorylation in proteins,” in Advances in Computational Biology, vol. 680 of Advances in Experimental Medicine and Biology, pp. 307–319, Springer, New York, NY, USA, 2010.
  22. Y. Dou, B. Yao, and C. Zhang, “PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine,” Amino Acids, vol. 46, no. 6, pp. 1459–1469, 2014.
  23. B. Trost and A. Kusalik, “Computational phosphorylation site prediction in plants using random forests and organism-specific instance weights,” Bioinformatics, vol. 29, no. 6, pp. 686–694, 2013.
  24. H. D. Ismail, A. Jones, J. H. Kim, R. H. Newman, and B. K. C. Dukka, “Phosphorylation sites prediction using Random Forest,” in Proceedings of the 5th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS '15), pp. 1–6, IEEE, Miami, Fla, USA, October 2015.
  25. A. Jones, H. Ismail, J. H. Kim, R. Newman, and B. K. Dukka, “RF-Phos: random forest-based prediction of phosphorylation sites,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM '15), pp. 135–140, Washington, DC, USA, November 2015.
  26. J. Gao, J. J. Thelen, A. K. Dunker, and D. Xu, “Musite, a tool for global prediction of general and kinase-specific phosphorylation sites,” Molecular and Cellular Proteomics, vol. 9, no. 12, pp. 2586–2600, 2010.
  27. L. M. Iakoucheva, P. Radivojac, C. J. Brown et al., “The importance of intrinsic disorder for protein phosphorylation,” Nucleic Acids Research, vol. 32, no. 3, pp. 1037–1049, 2004.
  28. L. M. Iakoucheva, C. J. Brown, J. D. Lawson, Z. Obradović, and A. K. Dunker, “Intrinsic disorder in cell-signaling and cancer-associated proteins,” Journal of Molecular Biology, vol. 323, no. 3, pp. 573–584, 2002.
  29. S. H. Diks, K. Parikh, M. van der Sijde, J. Joore, T. Ritsema, and M. P. Peppelenbosch, “Evidence for a minimal eukaryotic phosphoproteome?” PLoS ONE, vol. 2, no. 8, article e777, 2007.
  30. P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach, Adaptive Computation and Machine Learning, MIT Press, Cambridge, Mass, USA, 2nd edition, 2001.
  31. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  32. H. Dinkel, C. Chica, A. Via et al., “Phospho.ELM: a database of phosphorylation sites-update 2011,” Nucleic Acids Research, vol. 39, no. 1, pp. D261–D267, 2011.
  33. K. Sikic and O. Carugo, “Protein sequence redundancy reduction: comparison of various methods,” Bioinformation, vol. 5, no. 6, pp. 234–239, 2010.
  34. C. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423 and 623–656, 1948.
  35. J. A. Capra and M. Singh, “Predicting functionally important residues from sequence conservation,” Bioinformatics, vol. 23, no. 15, pp. 1875–1882, 2007.
  36. G. D. Stormo, T. D. Schneider, L. Gold, and A. Ehrenfeucht, “Use of the ‘perceptron’ algorithm to distinguish translational initiation sites in E. coli,” Nucleic Acids Research, vol. 10, no. 9, pp. 2997–3011, 1982.
  37. C. Li, J. Wang, and Y. Zhang, “Similarity analysis of protein sequences based on the normalized relative-entropy,” Combinatorial Chemistry & High Throughput Screening, vol. 11, no. 6, pp. 477–481, 2008.
  38. I. Erill and M. C. O'Neill, “A reexamination of information theory-based methods for DNA-binding site identification,” BMC Bioinformatics, vol. 10, no. 1, article 57, 2009.
  39. S. Ahmad, M. M. Gromiha, and A. Sarai, “RVP-net: online prediction of real valued accessible surface area of proteins from single sequences,” Bioinformatics, vol. 19, no. 14, pp. 1849–1851, 2003.
  40. Y. Dou, X. Zheng, J. Yang, and J. Wang, “Prediction of catalytic residues based on an overlapping amino acid classification,” Amino Acids, vol. 39, no. 5, pp. 1353–1361, 2010.
  41. Y. Dou, J. Wang, J. Yang, and C. Zhang, “L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier,” PLoS ONE, vol. 7, no. 4, Article ID e35666, 2012.
  42. D. Eisenberg, R. M. Weiss, T. C. Terwilliger, and W. Wilcox, “Hydrophobic moments and protein structure,” Faraday Symposia of the Chemical Society, vol. 17, pp. 109–120, 1982.
  43. G. Govindan and A. S. Nair, “Composition, Transition and Distribution (CTD)—a dynamic feature for predictions based on hierarchical structure of cellular sorting,” in Proceedings of the Annual IEEE India Conference (INDICON '11), pp. 1–6, IEEE, Hyderabad, India, December 2011.
  44. I. Dubchak, I. Muchnik, S. R. Holbrook, and S.-H. Kim, “Prediction of protein folding class using global description of amino acid sequence,” Proceedings of the National Academy of Sciences of the United States of America, vol. 92, no. 19, pp. 8700–8704, 1995.
  45. G. Schneider and P. Wrede, “The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site,” Biophysical Journal, vol. 66, no. 2, part 1, pp. 335–344, 1994.
  46. D.-S. Cao, Q.-S. Xu, and Y.-Z. Liang, “Propy: a tool to generate various modes of Chou's PseAAC,” Bioinformatics, vol. 29, no. 7, pp. 960–962, 2013.
  47. K.-C. Chou, “Some remarks on protein attribute prediction and pseudo amino acid composition,” Journal of Theoretical Biology, vol. 273, no. 1, pp. 236–247, 2011.
  48. G. Schneider and P. Wrede, “The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site,” Biophysical Journal, vol. 66, no. 2, pp. 335–344, 1994.
  49. T. Jo and J. Cheng, “Improving protein fold recognition by random forest,” BMC Bioinformatics, vol. 15, supplement 11, p. S14, 2014.
  50. J. Jia, X. Xiao, and B. Liu, “Prediction of protein–protein interactions with physicochemical descriptors and wavelet transform via random forests,” Journal of Laboratory Automation, 2015.
  51. Z.-H. You, K. C. Chan, and P. Hu, “Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest,” PLoS ONE, vol. 10, no. 5, Article ID e0125811, 2015.
  52. Z.-P. Liu, L.-Y. Wu, Y. Wang, X.-S. Zhang, and L. Chen, “Prediction of protein-RNA binding sites by a random forest method with combined features,” Bioinformatics, vol. 26, no. 13, pp. 1616–1622, 2010.
  53. F. Pedregosa, G. Varoquaux, A. Gramfort et al., “Scikit-learn: machine learning in Python,” The Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  54. N. Blom, T. Sicheritz-Pontén, R. Gupta, S. Gammeltoft, and S. Brunak, “Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence,” Proteomics, vol. 4, no. 6, pp. 1633–1649, 2004.