BioMed Research International
Volume 2018 (2018), Article ID 9364182, 10 pages
https://doi.org/10.1155/2018/9364182
Research Article

A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique

1School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, Weihai 264209, China
2School of Control Science and Engineering, Shandong University, Jinan 250061, China

Correspondence should be addressed to Chengjin Zhang; cjzhang@sdu.edu.cn

Received 11 October 2017; Revised 25 December 2017; Accepted 26 December 2017; Published 7 February 2018

Academic Editor: Rosaria Scudiero

Copyright © 2018 Runtao Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. S.-Y. Jiang, Z. Ma, and S. Ramachandran, “Evolutionary history and stress regulation of the lectin superfamily in higher plants,” BMC Evolutionary Biology, vol. 10, no. 1683, pp. 1–24, 2010.
  2. N. Sharon, “Lectins: carbohydrate-specific reagents and biological recognition molecules,” The Journal of Biological Chemistry, vol. 282, no. 5, pp. 2753–2764, 2007.
  3. G. Vasta and H. Ahmed, Animal Lectins: A Functional View, CRC Press, Boca Raton, Florida, 1st edition, 2008.
  4. G. R. Vasta, H. Ahmed, and E. W. Odom, “Structural and functional diversity of lectin repertoires in invertebrates, protochordates and ectothermic vertebrates,” Current Opinion in Structural Biology, vol. 14, no. 5, pp. 617–630, 2004.
  5. S. Hu and D. T. Wong, “Lectin microarray,” Proteomics - Clinical Applications, vol. 3, no. 2, pp. 148–154, 2009.
  6. N. Sharon and H. Lis, “Lectins as cell recognition molecules,” Science, vol. 246, no. 4927, pp. 227–234, 1989.
  7. D. Hu, H. Tateno, and J. Hirabayashi, “Lectin engineering, a molecular evolutionary approach to expanding the lectin utilities,” Molecules, vol. 20, no. 5, pp. 7637–7656, 2015.
  8. K. L. Abbott and J. M. Pierce, “Lectin-based glycoproteomic techniques for the enrichment and identification of potential biomarkers,” Methods in Enzymology, vol. 480, no. C, pp. 461–476, 2010.
  9. V. Lavanya, A. Mohamed Adil, N. Ahmed, and S. Jamal, “Lectins-the promising cancer therapeutics,” Oncobiology and Targets, vol. 1, no. 1, pp. 12–15, 2014.
  10. E. G. De Mejía and V. I. Prisecaru, “Lectins as bioactive plant proteins: a potential in cancer treatment,” Critical Reviews in Food Science and Nutrition, vol. 45, no. 6, pp. 425–445, 2005.
  11. Y. S. Chan and T. B. Ng, “A lectin with highly potent inhibitory activity toward breast cancer cells from edible tubers of Dioscorea opposita cv. nagaimo,” PLoS ONE, vol. 8, no. 1, Article ID e54212, 2013.
  12. M. D. Swanson, H. C. Winter, I. J. Goldstein, and D. M. Markovitz, “A lectin isolated from bananas is a potent inhibitor of HIV replication,” The Journal of Biological Chemistry, vol. 285, no. 12, pp. 8646–8655, 2010.
  13. C. Miller, S. Wilgenbusch, M. Michael, D. S. Chi, G. Youngberg, and G. Krishnaswamy, “Molecular defects in the mannose binding lectin pathway in dermatological disease: case report and literature review,” Clinical and Molecular Allergy, vol. 8, no. 1, pp. 1–9, 2010.
  14. Z. Shi, N. An, S. Zhao, X. Li, J. K. Bao, and B. S. Yue, “In silico analysis of molecular mechanisms of legume lectin-induced apoptosis in cancer cells,” Cell Proliferation, vol. 46, no. 1, pp. 86–96, 2013.
  15. S. H. Choi, Y. L. Su, and B. P. Won, “Mistletoe lectin induces apoptosis and telomerase inhibition in human A253 cancer cells through dephosphorylation of Akt,” Archives of Pharmacal Research, vol. 27, no. 1, pp. 68–76, 2004.
  16. F.-T. Liu and G. A. Rabinovich, “Galectins as modulators of tumour progression,” Nature Reviews Cancer, vol. 5, no. 1, Article ID 036206, pp. 29–41, 2005.
  17. A. Gomez-Brouchet, F. Mourcin, P.-A. Gourraud et al., “Galectin-1 is a powerful marker to distinguish chondroblastic osteosarcoma and conventional chondrosarcoma,” Human Pathology, vol. 41, no. 9, pp. 1220–1230, 2010.
  18. G. Canesin, P. Gonzalez-Peramato, J. Palou, M. Urrutia, C. Cordón-Cardo, and M. Sánchez-Carbayo, “Galectin-3 expression is associated with bladder cancer progression and clinical outcome,” Tumor Biology, vol. 31, no. 4, pp. 277–285, 2010.
  19. S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
  20. R. Kumar, B. Panwar, J. S. Chauhan, and G. P. Raghava, “Analysis and prediction of cancerlectins using evolutionary and domain information,” BMC Research Notes, vol. 4, no. 1, pp. 1–9, 2011.
  21. R. Lotan and A. Raz, “Lectins in cancer cells,” Annals of the New York Academy of Sciences, vol. 551, pp. 385–398, 1988.
  22. H. Lin, W. X. Liu, J. He, X. H. Liu, H. Ding, and W. Chen, “Predicting cancerlectins by the optimal g-gap dipeptides,” Scientific Reports, vol. 5, Article ID 16964, 2015.
  23. G. S. Han, Z. G. Yu, V. Anh, A. P. D. Krishnajith, and Y.-C. Tian, “An ensemble method for predicting subnuclear localizations from primary protein structures,” PLoS ONE, vol. 8, no. 2, Article ID e57225, 2013.
  24. Y. Huang, B. Niu, Y. Gao, L. Fu, and W. Li, “CD-HIT Suite: a web server for clustering and comparing biological sequences,” Bioinformatics, vol. 26, no. 5, Article ID btq003, pp. 680–682, 2010.
  25. D. Damodaran, J. Jeyakani, A. Chauhan, N. Kumar, N. R. Chandra, and A. Surolia, “CancerLectinDB: a database of lectins relevant to cancer,” Glycoconjugate Journal, vol. 25, no. 3, pp. 191–198, 2008.
  26. S. Pérez, A. Sarkar, A. Rivet, C. Breton, and A. Imberty, “Glyco3d: a portal for structural glycosciences,” Methods in Molecular Biology, vol. 1273, pp. 241–258, 2015.
  27. Y.-N. Zhang, D.-J. Yu, S.-S. Li, Y.-X. Fan, Y. Huang, and H.-B. Shen, “Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features,” BMC Bioinformatics, vol. 13, no. 118, 2012.
  28. I. Dubchak, I. Muchnik, S. R. Holbrook, and S. Kim, “Prediction of protein folding class using global description of amino acid sequence,” Proceedings of the National Academy of Sciences of the United States of America, vol. 92, no. 19, pp. 8700–8704, 1995.
  29. K.-C. Chou, “Prediction of protein cellular attributes using pseudo-amino acid composition,” Proteins: Structure, Function, and Genetics, vol. 43, no. 3, pp. 246–255, 2001.
  30. N. Xiaohui, L. Nana, X. Jingbo et al., “Using the concept of Chou's pseudo amino acid composition to predict protein solubility: an approach with entropies in information theory,” Journal of Theoretical Biology, vol. 332, pp. 211–217, 2013.
  31. J. Hu and X. Yan, “BS-KNN: An effective algorithm for predicting protein subchloroplast localization,” Evolutionary Bioinformatics, vol. 2012, no. 7, pp. 79–87, 2012.
  32. P. Wang, L. Hu, G. Liu et al., “Prediction of antimicrobial peptides based on sequence alignment and feature selection methods,” PLoS ONE, vol. 6, no. 4, Article ID e18476, 2011.
  33. X. Zhao, X. Li, Z. Ma, and M. Yin, “Prediction of lysine ubiquitylation with ensemble classifier and feature selection,” International Journal of Molecular Sciences, vol. 12, no. 12, pp. 8347–8361, 2011.
  34. C. N. Magnan, A. Randall, and P. Baldi, “SOLpro: Accurate sequence-based prediction of protein solubility,” Bioinformatics, vol. 25, no. 17, pp. 2200–2207, 2009.
  35. S. Mondal and P. P. Pai, “Chou's pseudo amino acid composition improves sequence-based antifreeze protein prediction,” Journal of Theoretical Biology, vol. 356, pp. 30–35, 2014.
  36. J. A. Capra and M. Singh, “Predicting functionally important residues from sequence conservation,” Bioinformatics, vol. 23, no. 15, pp. 1875–1882, 2007.
  37. A. A. Schäffer, L. Aravind, T. L. Madden et al., “Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements,” Nucleic Acids Research, vol. 29, no. 14, pp. 2994–3005, 2001.
  38. S. Wold, J. Jonsson, M. Sjöström, M. Sandberg, and S. Rännar, “DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures,” Analytica Chimica Acta, vol. 277, no. 2, pp. 239–253, 1993.
  39. B. He, K. Wang, Y. Liu, B. Xue, V. N. Uversky, and A. K. Dunker, “Predicting intrinsic disorder in proteins: an overview,” Cell Research, vol. 19, no. 8, pp. 929–949, 2009.
  40. H. J. Dyson and P. E. Wright, “Intrinsically unstructured proteins and their functions,” Nature Reviews Molecular Cell Biology, vol. 6, no. 3, pp. 197–208, 2005.
  41. S. Niu, L. Hu, L. Zheng et al., “Predicting protein oxidation sites with feature selection and analysis approach,” Journal of Biomolecular Structure and Dynamics, vol. 29, no. 6, pp. 1154–1162, 2012.
  42. Y. Dou, B. Yao, and C. Zhang, “PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine,” Amino Acids, vol. 46, no. 6, pp. 1459–1469, 2014.
  43. K. Peng, P. Radivojac, S. Vucetic, A. K. Dunker, and Z. Obradovic, “Length dependent prediction of protein intrinsic disorder,” BMC Bioinformatics, vol. 7, no. 1, pp. 1–17, 2006.
  44. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
  45. Y. Saeys, I. Inza, and P. Larrañaga, “A review of feature selection techniques in bioinformatics,” Bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007.
  46. K. Kira and L. A. Rendell, “The feature selection problem: traditional methods and a new algorithm,” in Proceedings of the 10th National Conference on Artificial Intelligence, pp. 129–134, San Jose, Calif, USA, 1992.
  47. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  48. K. K. Kandaswamy, G. Pugalenthi, E. Hartmann et al., “SPRED: a machine learning approach for the identification of classical and non-classical secretory proteins in mammalian genomes,” Biochemical and Biophysical Research Communications, vol. 391, no. 3, pp. 1306–1311, 2010.
  49. T. P. Mohamed, J. G. Carbonell, and M. K. Ganapathiraju, “Active learning for human protein-protein interaction prediction,” BMC Bioinformatics, vol. 11, no. 1, article no. S57, 2010.
  50. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, Elsevier, San Francisco, 2005.
  51. K.-C. Chou and C.-T. Zhang, “Prediction of protein structural classes,” Critical Reviews in Biochemistry and Molecular Biology, vol. 30, no. 4, pp. 275–349, 1995.
  52. K.-C. Chou, “Some remarks on protein attribute prediction and pseudo amino acid composition,” Journal of Theoretical Biology, vol. 273, pp. 236–247, 2011.
  53. P. M. Feng, W. Chen, H. Lin, and K. Chou, “iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition,” Analytical Biochemistry, vol. 442, no. 1, pp. 118–125, 2013.
  54. W. Chen, H. Yang, P. Feng, H. Ding, and H. Lin, “iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties,” Bioinformatics, vol. 33, no. 22, pp. 3518–3523, 2017.
  55. W. Chen, H. Ding, P. Feng, H. Lin, and K. C. Chou, “iACP: a sequence-based tool for identifying anticancer peptides,” Oncotarget, vol. 7, no. 13, pp. 16895–16909, 2016.
  56. W. Chen, P. Feng, H. Ding, H. Lin, and K.-C. Chou, “iRNA-Methyl: Identifying N6-methyladenosine sites using pseudo nucleotide composition,” Analytical Biochemistry, vol. 490, pp. 26–33, 2015.
  57. T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.
  58. W. Chen, P.-M. Feng, E.-Z. Deng, H. Lin, and K.-C. Chou, “iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition,” Analytical Biochemistry, vol. 462, pp. 76–83, 2014.
  59. W. Chen, P. M. Feng, H. Lin, and K. C. Chou, “iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition,” BioMed Research International, vol. 2014, Article ID 623149, 12 pages, 2014.