BioMed Research International
Volume 2015 (2015), Article ID 705156, 12 pages
http://dx.doi.org/10.1155/2015/705156
Research Article

JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method

¹School of Control Science and Engineering, Shandong University, Jinan 250061, China
²School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China

Received 3 August 2015; Revised 5 October 2015; Accepted 11 October 2015

Academic Editor: Tatsuya Akutsu

Copyright © 2015 Lina Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. C. P. Georgopoulos, A. Lundquist-Heil, J. Yochem, and M. Feiss, “Identification of the E. coli dnaJ gene product,” Molecular & General Genetics, vol. 178, no. 3, pp. 583–588, 1980.
  2. P. Walsh, D. Bursać, Y. C. Law, D. Cyr, and T. Lithgow, “The J-protein family: modulating protein assembly, disassembly and translocation,” EMBO Reports, vol. 5, no. 6, pp. 567–571, 2004.
  3. X.-B. Qiu, Y.-M. Shao, S. Miao, and L. Wang, “The diversity of the DnaJ/Hsp40 family, the crucial partners for Hsp70 chaperones,” Cellular and Molecular Life Sciences, vol. 63, no. 22, pp. 2560–2570, 2006.
  4. Y. Kanazawa, H. Isomoto, M. Oka et al., “Expression of heat shock protein (Hsp) 70 and Hsp 40 in colorectal cancer,” Medical Oncology, vol. 20, no. 2, pp. 157–164, 2003.
  5. S. S. Novoselov, M. V. Verbova, E. V. Vasil’eva, N. K. Vorob’ëva, B. A. Margulis, and I. V. Guzhova, “Expression of Hsp70 and Hdj1 chaperone proteins in human tumor cells,” Voprosy Onkologii, vol. 50, no. 2, pp. 174–178, 2004.
  6. Y. Cui, X. Zhang, Y. Gong et al., “Immunization with DnaJ (hsp40) could elicit protection against nasopharyngeal colonization and invasive infection caused by different strains of Streptococcus pneumoniae,” Vaccine, vol. 29, no. 9, pp. 1736–1744, 2011.
  7. Y. Shen and L. M. Hendershot, “ERdj3, a stress-inducible endoplasmic reticulum DnaJ homologue, serves as a cofactor for BiP’s interactions with unfolded substrates,” Molecular Biology of the Cell, vol. 16, no. 1, pp. 40–50, 2005.
  8. J. Hageman, M. A. Rujano, M. A. W. H. van Waarde et al., “A DNAJB chaperone subfamily with HDAC-dependent activities suppresses toxic protein aggregation,” Molecular Cell, vol. 37, no. 3, pp. 355–369, 2010.
  9. B. Sha, S. Lee, and D. M. Cyr, “The crystal structure of the peptide-binding fragment from the yeast Hsp40 protein Sis1,” Structure, vol. 8, no. 8, pp. 799–807, 2000.
  10. P. Kota, D. W. Summers, H.-Y. Ren, D. M. Cyr, and N. V. Dokholyan, “Identification of a consensus motif in substrates bound by a type I Hsp40,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 27, pp. 11073–11078, 2009.
  11. S. Lee, C. Y. Fan, J. M. Younger, and H. Ren, “Identification of essential residues in the type II Hsp40 Sis1 that function in polypeptide binding,” Journal of Biological Chemistry, vol. 277, no. 24, pp. 21675–21682, 2002.
  12. J. C. Young, V. R. Agashe, K. Siegers, and F. U. Hartl, “Pathways of chaperone-mediated protein folding in the cytosol,” Nature Reviews Molecular Cell Biology, vol. 5, no. 10, pp. 781–791, 2004.
  13. M. Rug and A. G. Maier, “The heat shock protein 40 family of the malaria parasite Plasmodium falciparum,” IUBMB Life, vol. 63, no. 12, pp. 1081–1086, 2011.
  14. P. Feng, H. Lin, W. Chen, and Y. Zuo, “Predicting the types of J-proteins using clustered amino acids,” BioMed Research International, vol. 2014, Article ID 935719, 8 pages, 2014.
  15. J. N. Sterrenberg, G. L. Blatch, and A. L. Edkins, “Human DNAJ in cancer and stem cells,” Cancer Letters, vol. 312, no. 2, pp. 129–142, 2011.
  16. A. Mitra, R. A. Fillmore, B. J. Metge et al., “Large isoform of MRJ (DNAJB6) reduces malignant activity of breast cancer,” Breast Cancer Research, vol. 10, article R22, 2008.
  17. J. B. Hu, Y. K. Wu, J. Z. Li, X. Qian, Z. Fu, and B. Sha, “The crystal structure of the putative peptide-binding fragment from the human Hsp40 protein Hdj1,” BMC Structural Biology, vol. 8, article 3, 2008.
  18. S. L. Zhang, Y. Y. Liang, and X. G. Yuan, “Improving the prediction accuracy of protein structural class: approached with alternating word frequency and normalized Lempel-Ziv complexity,” Journal of Theoretical Biology, vol. 341, pp. 71–77, 2014.
  19. Y.-N. Zhang, D.-J. Yu, S.-S. Li, Y.-X. Fan, Y. Huang, and H.-B. Shen, “Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features,” BMC Bioinformatics, vol. 13, no. 1, article 118, 2012.
  20. C. X. Zou, J. Y. Gong, and H. L. Li, “An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis,” BMC Bioinformatics, vol. 14, article 90, 2013.
  21. M. D. Hayat, M. Tahir, and S. A. Khan, “Prediction of protein structure classes using hybrid space of multi-profile Bayes and bi-gram probability feature spaces,” Journal of Theoretical Biology, vol. 346, pp. 8–15, 2014.
  22. J. Qian, D. Q. Miao, Z. H. Zhang, and W. Li, “Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation,” International Journal of Approximate Reasoning, vol. 52, no. 2, pp. 212–230, 2011.
  23. P. Wang and X. Xiao, “NRPred-FS: a feature selection based two level predictor for nuclear receptors,” Journal of Proteomics & Bioinformatics, supplement 9, article 002, 2014.
  24. L. Li, Y. Zhang, L. Zou et al., “An ensemble classifier for eukaryotic protein subcellular location prediction using gene ontology categories and amino acid hydrophobicity,” PLoS ONE, vol. 7, no. 1, Article ID e31057, 2012.
  25. H.-L. Xie, L. Fu, and X.-D. Nie, “Using ensemble SVM to identify human GPCRs N-linked glycosylation sites based on the general form of Chou’s PseAAC,” Protein Engineering, Design and Selection, vol. 26, no. 11, pp. 735–742, 2013.
  26. M.-C. Chen, L.-S. Chen, C.-C. Hsu, and W.-R. Zeng, “An information granulation based data mining approach for classifying imbalanced data,” Information Sciences, vol. 178, no. 16, pp. 3214–3227, 2008.
  27. R. Ratheesh Kumar, N. S. Nagarajan, S. P. Arunraj et al., “HSPIR: a manually annotated heat shock protein information resource,” Bioinformatics, vol. 28, no. 21, pp. 2853–2855, 2012.
  28. W. Li, L. Jaroszewski, and A. Godzik, “CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences,” Bioinformatics, vol. 17, pp. 282–283, 2001.
  29. K.-C. Chou, “Some remarks on protein attribute prediction and pseudo amino acid composition,” Journal of Theoretical Biology, vol. 273, no. 1, pp. 236–247, 2011.
  30. M. Hayat and A. Khan, “Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition,” Journal of Theoretical Biology, vol. 271, no. 1, pp. 10–17, 2011.
  31. M. Kumar, R. Verma, and G. P. S. Raghava, “Prediction of mitochondrial proteins using support vector machine and hidden Markov model,” The Journal of Biological Chemistry, vol. 281, no. 9, pp. 5357–5363, 2006.
  32. R. Kaundal, R. Saini, and P. X. Zhao, “Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis,” Plant Physiology, vol. 154, no. 1, pp. 36–54, 2010.
  33. R. Verma, G. C. Varshney, and G. P. S. Raghava, “Prediction of mitochondrial proteins of malaria parasite using split amino acid composition and PSSM profile,” Amino Acids, vol. 39, no. 1, pp. 101–110, 2010.
  34. T. H. Afridi, A. Khan, and Y. S. Lee, “Mito-GSAAC: mitochondria prediction using genetic ensemble classifier and split amino acid composition,” Amino Acids, vol. 42, no. 4, pp. 1443–1454, 2012.
  35. K.-C. Chou, “Prediction of protein cellular attributes using pseudo-amino acid composition,” Proteins: Structure, Function, and Bioinformatics, vol. 43, no. 3, pp. 246–255, 2001.
  36. Z. Hajisharifi, M. Piryaiee, M. M. Beigi, M. Behbahani, and H. Mohabatkar, “Predicting anticancer peptides with Chou’s pseudo amino acid composition and investigating their mutagenicity via Ames test,” Journal of Theoretical Biology, vol. 341, pp. 34–40, 2014.
  37. H. Ding, L. Luo, and H. Lin, “Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition,” Protein and Peptide Letters, vol. 16, no. 4, pp. 351–355, 2009.
  38. H. Lin and H. Ding, “Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition,” Journal of Theoretical Biology, vol. 269, no. 1, pp. 64–69, 2011.
  39. H. Ding, L. Liu, F.-B. Guo, J. Huang, and H. Lin, “Identify Golgi protein types with modified Mahalanobis discriminant algorithm and pseudo amino acid composition,” Protein and Peptide Letters, vol. 18, no. 1, pp. 58–63, 2011.
  40. C.-W. Tung and S.-Y. Ho, “Computational identification of ubiquitylation sites from protein sequences,” BMC Bioinformatics, vol. 9, article 310, 2008.
  41. K.-C. Chou, “Structural bioinformatics and its impact to biomedical science,” Current Medicinal Chemistry, vol. 11, no. 16, pp. 2105–2134, 2004.
  42. B.-Q. Li, L.-L. Hu, L. Chen, K.-Y. Feng, Y.-D. Cai, and K.-C. Chou, “Prediction of protein domain with mRMR feature selection and analysis,” PLoS ONE, vol. 7, no. 6, Article ID e39308, 2012.
  43. X. Zhao, J. Dai, Q. Ning, Z. Ma, M. Yin, and P. Sun, “Position-specific analysis and prediction of protein pupylation sites based on multiple features,” BioMed Research International, vol. 2013, Article ID 109549, 9 pages, 2013.
  44. Y. Du, J. Zhao, T. Chen et al., “Type I J-domain NbMIP1 proteins are required for both Tobacco mosaic virus infection and plant innate immunity,” PLoS Pathogens, vol. 9, no. 10, Article ID e1003659, 2013.
  45. S.-A. Chen, Y.-Y. Ou, T.-Y. Lee, and M. M. Gromiha, “Prediction of transporter targets using efficient RBF networks with PSSM profiles and biochemical properties,” Bioinformatics, vol. 27, no. 15, Article ID btr340, pp. 2062–2067, 2011.
  46. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
  47. Y.-N. Zhang, D.-J. Yu, S.-S. Li, Y.-X. Fan, Y. Huang, and H.-B. Shen, “Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features,” BMC Bioinformatics, vol. 13, no. 1, article 118, 2012.
  48. L. Yu, Y. Guo, Z. Zhang et al., “SecretP: a new method for predicting mammalian secreted proteins,” Peptides, vol. 31, no. 4, pp. 574–578, 2010.
  49. H. Lin, W. Chen, and H. Ding, “AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes,” PLoS ONE, vol. 8, no. 10, Article ID e75726, 2013.
  50. W. H. Press, B. P. Flannery, S. A. Teukolsky et al., Numerical Recipes in C, Cambridge University Press, Cambridge, UK, 1998.
  51. L. Yu and H. Liu, “Feature selection for high-dimensional data: a fast correlation-based filter solution,” in Proceedings of the 20th International Conference on Machine Learning, pp. 856–863, August 2003.
  52. C. Zou, J. Gong, and H. Li, “An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis,” BMC Bioinformatics, vol. 14, article 90, 2013.
  53. R. T. Yang, C. J. Zhang, R. Gao, and L. Zhang, “An ensemble method with hybrid features to identify extracellular matrix proteins,” PLoS ONE, vol. 10, no. 2, Article ID e0117804, 2015.
  54. R. T. Yang, C. J. Zhang, R. Gao, and L. Zhang, “An effective antifreeze protein predictor with ensemble classifiers and comprehensive sequence descriptors,” International Journal of Molecular Sciences, vol. 16, no. 9, pp. 21191–21214, 2015.
  55. L. Nanni, S. Brahnam, and A. Lumini, “Prediction of protein structure classes by incorporating different protein descriptors into general Chou’s pseudo amino acid composition,” Journal of Theoretical Biology, vol. 360, pp. 109–116, 2014.
  56. T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems, vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, Berlin, Germany, 2000.
  57. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, no. 1, pp. 321–357, 2002.
  58. K.-C. Chou and C.-T. Zhang, “Prediction of protein structural classes,” Critical Reviews in Biochemistry and Molecular Biology, vol. 30, no. 4, pp. 275–349, 1995.
  59. K.-C. Chou and H.-B. Shen, “Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms,” Nature Protocols, vol. 3, no. 2, pp. 153–162, 2008.
  60. K.-C. Chou and H.-B. Shen, “Recent progress in protein subcellular location prediction,” Analytical Biochemistry, vol. 370, no. 1, pp. 1–16, 2007.
  61. H. Lin, E. Deng, H. Ding, W. Chen, and K. Chou, “iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition,” Nucleic Acids Research, vol. 42, no. 21, pp. 12961–12972, 2014.
  62. H. Ding and D. Li, “Identification of mitochondrial proteins of malaria parasite using analysis of variance,” Amino Acids, vol. 47, no. 2, pp. 329–333, 2015.
  63. H. Ding, E.-Z. Deng, L.-F. Yuan et al., “ICTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels,” BioMed Research International, vol. 2014, Article ID 286419, 10 pages, 2014.
  64. H. Ding, P.-M. Feng, W. Chen, and H. Lin, “Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis,” Molecular BioSystems, vol. 10, no. 8, pp. 2229–2235, 2014.
  65. G. M. Weiss, “Mining with rarity: a unifying framework,” SIGKDD Explorations, vol. 6, no. 1, pp. 7–19, 2004.