BioMed Research International
Volume 2015, Article ID 678764, 10 pages
http://dx.doi.org/10.1155/2015/678764
Research Article

AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model

¹Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL 60637, USA
²Department of Human Genetics, University of Chicago, E. 58th Street, Chicago, IL 60637, USA

Received 27 December 2014; Accepted 11 March 2015

Academic Editor: Min Li

Copyright © 2015 Jianzhu Ma and Sheng Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. B. Lee and F. M. Richards, “The interpretation of protein structures: estimation of static accessibility,” Journal of Molecular Biology, vol. 55, no. 3, pp. 379–400, 1971.
  2. W. Kauzmann, “Some factors in the interpretation of protein denaturation,” in Advances in Protein Chemistry, vol. 14, pp. 1–63, 1959.
  3. K. A. Dill, “Dominant forces in protein folding,” Biochemistry, vol. 29, no. 31, pp. 7133–7155, 1990.
  4. C. Chothia, “Structural invariants in protein folding,” Nature, vol. 254, no. 5498, pp. 304–308, 1975.
  5. G. D. Rose, A. R. Geselowitz, G. J. Lesser, R. H. Lee, and M. H. Zehfus, “Hydrophobicity of amino acid residues in globular proteins,” Science, vol. 229, no. 4716, pp. 834–838, 1985.
  6. K. A. Sharp, “Extracting hydrophobic free energies from experimental data: relationship to protein folding and theoretical models,” Biochemistry, vol. 30, no. 40, pp. 9686–9697, 1991.
  7. W. Kabsch and C. Sander, “Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features,” Biopolymers, vol. 22, no. 12, pp. 2577–2637, 1983.
  8. L. J. McGuffin, K. Bryson, and D. T. Jones, “The PSIPRED protein structure prediction server,” Bioinformatics, vol. 16, no. 4, pp. 404–405, 2000.
  9. D. Frishman and P. Argos, “Knowledge-based protein secondary structure assignment,” Proteins: Structure, Function, and Bioinformatics, vol. 23, no. 4, pp. 566–579, 1995.
  10. A. G. de Brevern, C. Etchebest, and S. Hazout, “Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks,” Proteins: Structure, Function, and Bioinformatics, vol. 41, no. 3, pp. 271–287, 2000.
  11. W.-M. Zheng and X. Liu, “A protein structural alphabet and its substitution matrix CLESUM,” in Transactions on Computational Systems Biology II, vol. 3680 of Lecture Notes in Computer Science, pp. 59–67, Springer, Berlin, Germany, 2005.
  12. I. Budowski-Tal, Y. Nov, and R. Kolodny, “FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 8, pp. 3481–3486, 2010.
  13. G. J. Kleywegt and T. A. Jones, “Phi/Psi-chology: Ramachandran revisited,” Structure, vol. 4, no. 12, pp. 1395–1400, 1996.
  14. E. Faraggi, B. Xue, and Y. Zhou, “Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network,” Proteins: Structure, Function and Bioinformatics, vol. 74, no. 4, pp. 847–856, 2009.
  15. Y. Shen, F. Delaglio, G. Cornilescu, and A. Bax, “TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts,” Journal of Biomolecular NMR, vol. 44, no. 4, pp. 213–223, 2009.
  16. D. T. Jones, D. W. A. Buchan, D. Cozzetto, and M. Pontil, “PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments,” Bioinformatics, vol. 28, no. 2, Article ID btr638, pp. 184–190, 2012.
  17. M. Vendruscolo, R. Najmanovich, and E. Domany, “Protein folding in contact map space,” Physical Review Letters, vol. 82, no. 3, pp. 656–659, 1999.
  18. L. Holm and C. Sander, “Protein structure comparison by alignment of distance matrices,” Journal of Molecular Biology, vol. 233, no. 1, pp. 123–138, 1993.
  19. F. Zhao and J. Xu, “A position-specific distance-dependent statistical potential for protein structure and functional study,” Structure, vol. 20, no. 6, pp. 1118–1126, 2012.
  20. K. Joo, S. J. Lee, and J. Lee, “SANN: solvent accessibility prediction of proteins by nearest neighbor method,” Proteins: Structure, Function and Bioinformatics, vol. 80, no. 7, pp. 1791–1797, 2012.
  21. J. Ma, S. Wang, F. Zhao, and J. Xu, “Protein threading using context-specific alignment potential,” Bioinformatics, vol. 29, no. 13, pp. i257–i265, 2013.
  22. J. Ma, J. Peng, S. Wang, and J. Xu, “A conditional neural fields model for protein threading,” Bioinformatics, vol. 28, no. 12, pp. i59–i66, 2012.
  23. J. Ma, S. Wang, Z. Wang, and J. Xu, “MRFalign: protein homology detection through alignment of Markov random fields,” PLoS Computational Biology, vol. 10, no. 3, Article ID e1003500, 2014.
  24. P. Benkert, M. Künzli, and T. Schwede, “QMEAN server for protein model quality estimation,” Nucleic Acids Research, vol. 37, no. 2, pp. W510–W514, 2009.
  25. J. Cheng, Z. Wang, A. N. Tegge, and J. Eickholt, “Prediction of global and local quality of CASP8 models by MULTICOM series,” Proteins: Structure, Function and Bioinformatics, vol. 77, no. 9, pp. 181–184, 2009.
  26. A. R. Kinjo, K. Horimoto, and K. Nishikawa, “Predicting absolute contact numbers of native protein structure from amino acid sequence,” Proteins: Structure, Function and Genetics, vol. 58, no. 1, pp. 158–165, 2005.
  27. A. Kabakçıoğlu, I. Kanter, M. Vendruscolo, and E. Domany, “Statistical properties of contact vectors,” Physical Review E, vol. 65, no. 4, Article ID 041904, 2002.
  28. A. N. Tegge, Z. Wang, J. Eickholt, and J. Cheng, “NNcon: improved protein contact map prediction using 2D-recursive neural networks,” Nucleic Acids Research, vol. 37, supplement 2, pp. W515–W518, 2009.
  29. S. R. Holbrook, S. M. Muskal, and S.-H. Kim, “Predicting surface exposure of amino acids from protein sequence,” Protein Engineering, vol. 3, no. 8, pp. 659–665, 1990.
  30. B. Rost and C. Sander, “Conservation and prediction of solvent accessibility in protein families,” Proteins: Structure, Function and Genetics, vol. 20, no. 3, pp. 216–226, 1994.
  31. L. Ehrlich, M. Reczko, H. Bohr, and R. C. Wade, “Prediction of protein hydration sites from sequence by modular neural networks,” Protein Engineering, vol. 11, no. 1, pp. 11–19, 1998.
  32. G. Pollastri, P. Baldi, P. Fariselli, and R. Casadio, “Prediction of coordination number and relative solvent accessibility in proteins,” Proteins: Structure, Function, and Bioinformatics, vol. 47, no. 2, pp. 142–153, 2002.
  33. S. Ahmad and M. M. Gromiha, “NETASA: neural network based prediction of solvent accessibility,” Bioinformatics, vol. 18, no. 6, pp. 819–824, 2002.
  34. R. Adamczak, A. Porollo, and J. Meller, “Accurate prediction of solvent accessibility using neural networks-based regression,” Proteins: Structure, Function and Genetics, vol. 56, no. 4, pp. 753–767, 2004.
  35. Z. Yuan, K. Burrage, and J. S. Mattick, “Prediction of protein solvent accessibility using support vector machines,” Proteins: Structure, Function and Genetics, vol. 48, no. 3, pp. 566–570, 2002.
  36. H. Kim and H. Park, “Prediction of protein relative solvent accessibility with support vector machines and long-range interaction 3D local descriptor,” Proteins: Structure, Function and Genetics, vol. 54, no. 3, pp. 557–562, 2004.
  37. M. N. Nguyen and J. C. Rajapakse, “Prediction of protein relative solvent accessibility with a two-stage SVM approach,” Proteins: Structure, Function and Genetics, vol. 59, no. 1, pp. 30–37, 2005.
  38. M. J. Thompson and R. A. Goldstein, “Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes,” Proteins: Structure, Function, and Genetics, vol. 25, no. 1, pp. 38–47, 1996.
  39. J. Sim, S.-Y. Kim, and J. Lee, “Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method,” Bioinformatics, vol. 21, no. 12, pp. 2844–2849, 2005.
  40. S. Ahmad, M. M. Gromiha, and A. Sarai, “Real value prediction of solvent accessibility from amino acid sequence,” Proteins: Structure, Function and Genetics, vol. 50, no. 4, pp. 629–635, 2003.
  41. A. Garg, H. Kaur, and G. P. S. Raghava, “Real value prediction of solvent accessibility in proteins using multiple sequence alignment and secondary structure,” Proteins: Structure, Function and Genetics, vol. 61, no. 2, pp. 318–324, 2005.
  42. Z. Yuan and B. Huang, “Prediction of protein accessible surface areas by support vector regression,” Proteins: Structure, Function, and Bioinformatics, vol. 57, no. 3, pp. 558–564, 2004.
  43. Z. Yuan, “Better prediction of protein contact number using a support vector regression analysis of amino acid sequence,” BMC Bioinformatics, vol. 6, article 248, 2005.
  44. N. Goldman, J. L. Thorne, and D. T. Jones, “Assessing the impact of secondary structure and solvent accessibility on protein evolution,” Genetics, vol. 149, no. 1, pp. 445–458, 1998.
  45. Z. Wang, F. Zhao, J. Peng, and J. Xu, “Protein 8-class secondary structure prediction using conditional neural fields,” Proteomics, vol. 11, no. 19, pp. 3786–3792, 2011.
  46. C. N. Magnan and P. Baldi, “SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity,” Bioinformatics, vol. 30, no. 18, pp. 2592–2597, 2014.
  47. J. Peng, L. Bo, and J. Xu, “Conditional neural fields,” in Advances in Neural Information Processing Systems, 2009.
  48. S. Wang, J. Peng, and J. Xu, “Alignment of distantly related protein structures: algorithm, bound and implications to homology modeling,” Bioinformatics, vol. 27, no. 18, pp. 2537–2545, 2011.
  49. F. Zhao, J. Peng, and J. Xu, “Fragment-free approach to protein folding using conditional neural fields,” Bioinformatics, vol. 26, no. 12, Article ID btq193, pp. i310–i317, 2010.
  50. M. Källberg, H. Wang, S. Wang et al., “Template-based protein structure modeling using the RaptorX web server,” Nature Protocols, vol. 7, no. 8, pp. 1511–1522, 2012.
  51. M. Källberg, G. Margaryan, S. Wang, J. Ma, and J. Xu, “RaptorX server: a resource for template-based protein structure modeling,” in Protein Structure Prediction, vol. 1137 of Methods in Molecular Biology, pp. 17–27, Springer, 2014.
  52. I. Dubchak, S. Balasubramanian, S. Wang et al., “An integrative computational approach for prioritization of genomic variants,” PLoS ONE, vol. 9, no. 12, Article ID e114903, 2014.
  53. J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: probabilistic models for segmenting and labeling sequence data,” in Proceedings of the 18th International Conference on Machine Learning (ICML'01), pp. 282–289, 2001.
  54. R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.
  55. R. Caruana, Multitask Learning, Springer, Berlin, Germany, 1998.
  56. C. Chothia, “The nature of the accessible and buried surfaces in proteins,” Journal of Molecular Biology, vol. 105, no. 1, pp. 1–12, 1976.
  57. H. M. Berman, J. Westbrook, Z. Feng et al., “The protein data bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000.
  58. D. Kozma, I. Simon, and G. E. Tusnády, “PDBTM: protein data bank of transmembrane proteins after 8 years,” Nucleic Acids Research, vol. 41, no. 1, pp. D524–D529, 2013.
  59. R. A. Jordan, Y. El-Manzalawy, D. Dobbs, and V. Honavar, “Predicting protein-protein interface residues using local surface structural similarity,” BMC Bioinformatics, vol. 13, article 41, 2012.
  60. R. Sowdhamini, S. D. Rufino, and T. L. Blundell, “A database of globular protein structural domains: clustering of representative family members into similar folds,” Folding and Design, vol. 1, no. 3, pp. 209–220, 1996.
  61. J. Moult, “A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction,” Current Opinion in Structural Biology, vol. 15, no. 3, pp. 285–289, 2005.
  62. R. Adamczak, A. Porollo, and J. Meller, “Combining prediction of secondary structure and solvent accessibility in proteins,” Proteins: Structure, Function and Genetics, vol. 59, no. 3, pp. 467–475, 2005.
  63. J. Cheng, A. Z. Randall, M. J. Sweredoski, and P. Baldi, “SCRATCH: a protein structure and structural feature prediction server,” Nucleic Acids Research, vol. 33, supplement 2, pp. W72–W76, 2005.
  64. Y. Y. Tseng and J. Liang, “Estimation of amino acid residue substitution rates at local spatial regions and application in protein function inference: a Bayesian Monte Carlo approach,” Molecular Biology and Evolution, vol. 23, no. 2, pp. 421–436, 2006.
  65. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
  66. J. Söding, “Protein homology detection by HMM-HMM comparison,” Bioinformatics, vol. 21, no. 7, pp. 951–960, 2005.
  67. A. Biegert and J. Söding, “Sequence context-specific profiles for homology searching,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 10, pp. 3770–3775, 2009.
  68. J. Meiler, M. Müller, A. Zeidler, and F. Schmäschke, “Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks,” Journal of Molecular Modeling, vol. 7, no. 9, pp. 360–369, 2001.
  69. M. Duan, M. Huang, C. Ma, L. Li, and Y. Zhou, “Position-specific residue preference features around the ends of helices and strands and a novel strategy for the prediction of secondary structures,” Protein Science, vol. 17, no. 9, pp. 1505–1512, 2008.
  70. Y. H. Tan, H. Huang, and D. Kihara, “Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences,” Proteins: Structure, Function and Genetics, vol. 64, no. 3, pp. 587–600, 2006.
  71. H. Fei and J. Huan, “Structured feature selection and task relationship inference for multi-task learning,” Knowledge and Information Systems, vol. 35, no. 2, pp. 345–364, 2013.
  72. O. Chapelle, P. Shivaswamy, S. Vadrevu, K. Weinberger, Y. Zhang, and B. Tseng, “Multi-task learning for boosting with application to web search ranking,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), pp. 1189–1197, ACM, July 2010.
  73. J. Chen, J. Liu, and J. Ye, “Learning incoherent sparse and low-rank patterns from multiple tasks,” ACM Transactions on Knowledge Discovery from Data, vol. 5, no. 4, article 22, 2012.
  74. J. Liu, S. Ji, and J. Ye, “Multi-task feature learning via efficient ℓ2,1-norm minimization,” in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pp. 339–348, AUAI Press, 2009.
  75. Y. Qi, M. Oja, J. Weston, and W. S. Noble, “A unified multitask architecture for predicting local protein properties,” PLoS ONE, vol. 7, no. 3, Article ID e32235, 2012.
  76. S. Wang, J. Ma, J. Peng, and J. Xu, “Protein structure alignment beyond spatial proximity,” Scientific Reports, vol. 3, article 1448, 2013.
  77. S. Wang and W.-M. Zheng, “CLePAPS: fast pair alignment of protein structures based on conformational letters,” Journal of Bioinformatics and Computational Biology, vol. 6, no. 2, pp. 347–366, 2008.
  78. S. Wang and W.-M. Zheng, “Fast multiple alignment of protein structures using conformational letter blocks,” The Open Bioinformatics Journal, vol. 3, pp. 69–83, 2009.
  79. J. Ma and S. Wang, “Algorithms, applications, and challenges of protein structure alignment,” Advances in Protein Chemistry and Structural Biology, vol. 94, pp. 121–175, 2014.
  80. Y. Zhang and J. Skolnick, “Scoring function for automated assessment of protein structure template quality,” Proteins: Structure, Function and Genetics, vol. 57, no. 4, pp. 702–710, 2004.
  81. J. Xu and Y. Zhang, “How significant is a protein structure similarity with TM-score = 0.5?” Bioinformatics, vol. 26, no. 7, pp. 889–895, 2010.
  82. Q. Luo, R. Hamer, G. Reinert, and C. M. Deane, “Local network patterns in protein-protein interfaces,” PLoS ONE, vol. 8, no. 3, Article ID e57031, 2013.