BioMed Research International
Volume 2014 (2014), Article ID 753428, 10 pages
http://dx.doi.org/10.1155/2014/753428
Research Article

Integration of Residue Attributes for Sequence Diversity Characterization of Terpenoid Enzymes

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan

Received 1 November 2013; Accepted 21 February 2014; Published 11 May 2014

Academic Editor: Samuel Kuria Kiboi

Copyright © 2014 Nelson Kibinge et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. L. D. Stein, “Integrating biological databases,” Nature Reviews Genetics, vol. 4, no. 5, pp. 337–345, 2003.
  2. M. He and S. Petoukhov, Mathematics of Bioinformatics: Theory, Methods and Applications, vol. 19, John Wiley & Sons, 2011.
  3. L. Kong, Y. Zhang, Z.-Q. Ye et al., “CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine,” Nucleic Acids Research, vol. 35, supplement 2, pp. W345–W349, 2007.
  4. M. Vendruscolo and G. G. Tartaglia, “Towards quantitative predictions in cell biology using chemical properties of proteins,” Molecular BioSystems, vol. 4, no. 12, pp. 1170–1175, 2008.
  5. A. Coghlan, D. A. Mac Dónaill, and N. H. Buttimore, “Representation of amino acids as five-bit or three-bit patterns for filtering protein databases,” Bioinformatics, vol. 17, no. 8, pp. 676–685, 2001.
  6. G. White and W. Seffens, “Using a neural network to backtranslate amino acid sequences,” Electronic Journal of Biotechnology, vol. 1, no. 3, pp. 17–18, 1998.
  7. S. Henikoff and J. G. Henikoff, “Amino acid substitution matrices from protein blocks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 22, pp. 10915–10919, 1992.
  8. O. Weiss, M. A. Jiménez-Montaño, and H. Herzel, “Information content of protein sequences,” Journal of Theoretical Biology, vol. 206, no. 3, pp. 379–386, 2000.
  9. S. Kawashima and M. Kanehisa, “AAindex: amino acid index database,” Nucleic Acids Research, vol. 28, no. 1, p. 374, 2000.
  10. W. R. Atchley, J. Zhao, A. D. Fernandes, and T. Drüke, “Solving the protein sequence metric problem,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 18, pp. 6395–6400, 2005.
  11. W. J. Krzanowski, Principles of Multivariate Analysis, Oxford University Press, 2000.
  12. L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  13. R. Grantham, “Amino acid difference formula to help explain protein evolution,” Science, vol. 185, no. 4154, pp. 862–864, 1974.
  14. A.-L. Boulesteix, S. Janitza, J. Kruppa, and I. R. König, “Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 6, pp. 493–507, 2012.
  15. V. Svetnik, A. Liaw, C. Tong, J. Christopher Culberson, R. P. Sheridan, and B. P. Feuston, “Random forest: a classification and regression tool for compound classification and QSAR modeling,” Journal of Chemical Information and Computer Sciences, vol. 43, no. 6, pp. 1947–1958, 2003.
  16. R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, “Variable selection using random forests,” Pattern Recognition Letters, vol. 31, no. 14, pp. 2225–2236, 2010.
  17. M. Kanehisa and S. Goto, “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research, vol. 28, no. 1, pp. 27–30, 2000.
  18. R. Caspi, H. Foerster, C. A. Fulcher et al., “The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases,” Nucleic Acids Research, vol. 36, no. 1, pp. D623–D631, 2008.
  19. Y. Shinbo, Y. Nakamura, M. Altaf-Ul-Amin et al., “KNApSAcK: a comprehensive species-metabolite relationship database,” in Plant Metabolomics, pp. 165–181, Springer, 2006.
  20. J. Bohlmann, G. Meyer-Gauen, and R. Croteau, “Plant terpenoid synthases: molecular biology and phylogenetic analysis,” Proceedings of the National Academy of Sciences of the United States of America, vol. 95, no. 8, pp. 4126–4133, 1998.
  21. I. Jolliffe, Principal Component Analysis, Wiley Online Library, 2005.
  22. R. L. Tatusov, D. A. Natale, I. V. Garkavtsev et al., “The COG database: new developments in phylogenetic classification of proteins from complete genomes,” Nucleic Acids Research, vol. 29, no. 1, pp. 22–28, 2001.
  23. H.-L. Huang, I.-C. Lin, Y.-F. Liou et al., “Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties,” BMC Bioinformatics, vol. 12, no. 1, article S47, 2011.
  24. R. Díaz-Uriarte and S. Alvarez de Andrés, “Gene selection and classification of microarray data using random forest,” BMC Bioinformatics, vol. 7, article 3, 2006.
  25. M. Sandri and P. Zuccolotto, “Variable selection using random forests,” in Data Analysis, Classification and the Forward Search, pp. 263–270, Springer, 2006.
  26. C. Strobl, A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis, “Conditional variable importance for random forests,” BMC Bioinformatics, vol. 9, article 307, 2008.
  27. P. Kline, An Easy Guide to Factor Analysis, Routledge, 1994.
  28. J. D. Connolly and R. A. Hill, Dictionary of Terpenoids. 1. Mono- and Sesquiterpenoids, vol. 1, CRC Press, 1991.
  29. J. Degenhardt, T. G. Köllner, and J. Gershenzon, “Monoterpene and sesquiterpene synthases and the origin of terpene skeletal diversity in plants,” Phytochemistry, vol. 70, no. 15-16, pp. 1621–1637, 2009.
  30. S. Ikeda, T. Abe, Y. Nakamura et al., “Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK Motorcycle database,” Plant and Cell Physiology, vol. 54, no. 5, pp. 711–727, 2013.
  31. R. Staden, “Sequence data handling by computer,” Nucleic Acids Research, vol. 4, no. 11, pp. 4037–4051, 1977.
  32. D. C. Hyatt, B. Youn, Y. Zhao et al., “Structure of limonene synthase, a simple model for terpenoid cyclase catalysis,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 13, pp. 5360–5365, 2007.
  33. D. A. Nagegowda, M. Gutensohn, C. G. Wilkerson, and N. Dudareva, “Two nearly identical terpene synthases catalyze the formation of nerolidol and linalool in snapdragon flowers,” Plant Journal, vol. 55, no. 2, pp. 224–239, 2008.
  34. N. J. Nieuwenhuizen, M. Y. Wang, A. J. Matich et al., “Two terpene synthases are responsible for the major sesquiterpenes emitted from the flowers of kiwifruit (Actinidia deliciosa),” Journal of Experimental Botany, vol. 60, no. 11, pp. 3203–3219, 2009.