The Scientific World Journal
Volume 2014 (2014), Article ID 327306, 14 pages
http://dx.doi.org/10.1155/2014/327306
Research Article

Rule-Based Knowledge Acquisition Method for Promoter Prediction in Human and Drosophila Species

1Department of Management Information System, Asia Pacific Institute of Creativity, Miaoli 351, Taiwan
2School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan
3Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
4Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan

Received 31 August 2013; Accepted 10 October 2013; Published 29 January 2014

Academic Editors: L. Bao and J. Wang

Copyright © 2014 Wen-Lin Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. J. Bradley, D. Johnson, and D. Rubenstein, Lecture Notes on Molecular Medicine, Blackwell Science, London, UK, 2005.
  2. M. M. Yin and J. T. L. Wang, “GeneScout: a data mining system for predicting vertebrate genes in genomic DNA sequences,” Information Sciences, vol. 163, no. 1–3, pp. 201–218, 2004.
  3. J. Zeng, S. Zhu, and H. Yan, “Towards accurate human promoter recognition: a review of currently used sequence features and classification methods,” Briefings in Bioinformatics, vol. 10, pp. 498–508, 2009.
  4. V. Rangannan and M. Bansal, “High-quality annotation of promoter regions for 913 bacterial genomes,” Bioinformatics, vol. 26, no. 24, pp. 3043–3050, 2010.
  5. M. Scherf, A. Klingenhoff, and T. Werner, “Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach,” Journal of Molecular Biology, vol. 297, no. 3, pp. 599–606, 2000.
  6. J.-Y. Yang, Y. Zhou, Z.-G. Yu, V. Anh, and L.-Q. Zhou, “Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides,” BMC Bioinformatics, vol. 9, article 113, 2008.
  7. K. Song, “Recognition of prokaryotic promoters based on a novel variable-window Z-curve method,” Nucleic Acids Research, vol. 40, no. 3, pp. 963–971, 2012.
  8. X. Xie, S. Wu, K.-M. Lam, and H. Yan, “PromoterExplorer: an effective promoter identification method based on the AdaBoost algorithm,” Bioinformatics, vol. 22, no. 22, pp. 2722–2728, 2006.
  9. J. Wang and S. Hannenhalli, “A mammalian promoter model links cis elements to genetic networks,” Biochemical and Biophysical Research Communications, vol. 347, no. 1, pp. 166–177, 2006.
  10. S. Wu, X. Xie, A. W.-C. Liew, and H. Yan, “Eukaryotic promoter prediction based on relative entropy and positional information,” Physical Review E, vol. 75, no. 4, Article ID 041908, 2007.
  11. N. I. Gershenzon and I. P. Ioshikhes, “Synergy of human Pol II core promoter elements revealed by statistical sequence analysis,” Bioinformatics, vol. 21, no. 8, pp. 1295–1300, 2005.
  12. V. B. Bajic and V. Brusic, “Computational detection of vertebrate RNA polymerase II promoters,” in RNA Polymerases and Associated Factors, pp. 237–250, Academic Press, San Diego, Calif, USA, 2003.
  13. M. Hackenberg, C. Previti, P. L. Luque-Escamilla, P. Carpena, J. Martínez-Aroza, and J. L. Oliver, “CpGcluster: a distance-based algorithm for CpG-island detection,” BMC Bioinformatics, vol. 7, article 446, 2006.
  14. L. Ponger and D. Mouchiroud, “CpGProD: identifying CpG islands associated with transcription start sites in large genomic mammalian sequences,” Bioinformatics, vol. 18, no. 4, pp. 631–633, 2002.
  15. V. B. Bajic, S. H. Seah, A. Chong, G. Zhang, J. L. Y. Koh, and V. Brusic, “Dragon promoter finder: recognition of vertebrate RNA polymerase II promoters,” Bioinformatics, vol. 18, no. 1, pp. 198–199, 2002.
  16. S. Sonnenburg, A. Zien, and G. Rätsch, “ARTS: accurate recognition of transcription starts in human,” Bioinformatics, vol. 22, no. 14, pp. e472–e480, 2006.
  17. NNP 2.2, http://www.fruitfly.org/seq_tools/promoter.html.
  18. R. V. Davuluri, I. Grosse, and M. Q. Zhang, “Computational identification of promoters and first exons in the human genome,” Nature Genetics, vol. 29, no. 3, pp. 412–417, 2002.
  19. S. Knudsen, “Promoter2.0: for the recognition of PolII promoter sequences,” Bioinformatics, vol. 15, no. 5, pp. 356–361, 1999.
  20. M. Q. Zhang, “Identification of human gene core promoters in silico,” Genome Research, vol. 8, no. 3, pp. 319–326, 1998.
  21. C. Y. Lim, B. Santoso, T. Boulay, E. Dong, U. Ohler, and J. T. Kadonaga, “The MTE, a new core promoter element for transcription by RNA polymerase II,” Genes and Development, vol. 18, no. 13, pp. 1606–1617, 2004.
  22. X. Wang, Z. Xuan, X. Zhao, Y. Li, and M. Q. Zhang, “High-resolution human core-promoter prediction with CoreBoost-HM,” Genome Research, vol. 19, no. 2, pp. 266–275, 2009.
  23. X. Zhao, Z. Xuan, and M. Q. Zhang, “Boosting with stumps for predicting transcription start sites,” Genome Biology, vol. 8, no. 2, article R17, 2007.
  24. J. W. Fickett and A. G. Hatzigeorgiou, “Eukaryotic promoter recognition,” Genome Research, vol. 7, no. 9, pp. 861–878, 1997.
  25. W. Deng and S. G. E. Roberts, “A core promoter element downstream of the TATA box that is recognized by TFIIB,” Genes and Development, vol. 19, no. 20, pp. 2418–2423, 2005.
  26. K. Florquin, Y. Saeys, S. Degroeve, P. Rouzé, and Y. van de Peer, “Large-scale structural analysis of the core promoter in mammalian and plant genomes,” Nucleic Acids Research, vol. 33, no. 13, pp. 4255–4264, 2005.
  27. S. P. Pandey and A. Krishnamachari, “Computational analysis of plant RNA Pol-II promoters,” BioSystems, vol. 83, no. 1, pp. 38–50, 2006.
  28. T. Abeel, Y. Saeys, E. Bonnet, P. Rouzé, and Y. van de Peer, “Generic eukaryotic core promoter prediction using structural features of DNA,” Genome Research, vol. 18, no. 2, pp. 310–323, 2008.
  29. U. Ohler, H. Niemann, G.-C. Liao, and G. M. Rubin, “Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition,” Bioinformatics, vol. 17, no. 1, pp. S199–S206, 2001.
  30. T. Abeel, Y. Saeys, P. Rouzé, and Y. van de Peer, “ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles,” Bioinformatics, vol. 24, no. 13, pp. i24–i31, 2008.
  31. P. S. Ho, G. Zhou, and L. B. Clark, “Polarized electronic spectra of Z-DNA single crystals,” Biopolymers, vol. 30, no. 1-2, pp. 151–163, 1990.
  32. J. A. Greenbaum, B. Pang, and T. D. Tullius, “Construction of a genome-scale structural map at single-nucleotide resolution,” Genome Research, vol. 17, no. 6, pp. 947–953, 2007.
  33. K.-J. Won, I. Chepelev, B. Ren, and W. Wang, “Prediction of regulatory elements in mammalian genomes using chromatin signatures,” BMC Bioinformatics, vol. 9, article 547, 2008.
  34. T. A. Down and T. J. P. Hubbard, “Computational detection and location of transcription start sites in mammalian genomic DNA,” Genome Research, vol. 12, no. 3, pp. 458–461, 2002.
  35. I. A. Shahmuradov, V. V. Solovyev, and A. J. Gammerman, “Plant promoter prediction with confidence estimation,” Nucleic Acids Research, vol. 33, no. 3, pp. 1069–1076, 2005.
  36. L. R. Cardon and G. D. Stormo, “Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments,” Journal of Molecular Biology, vol. 223, no. 1, pp. 159–170, 1992.
  37. M. G. Reese, “Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome,” Computers and Chemistry, vol. 26, no. 1, pp. 51–56, 2001.
  38. R. Gangal and P. Sharma, “Human pol II promoter prediction: time series descriptors and machine learning,” Nucleic Acids Research, vol. 33, no. 4, pp. 1332–1336, 2005.
  39. F. Anwar, S. M. Baker, T. Jabid et al., “Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach,” BMC Bioinformatics, vol. 9, article 414, 2008.
  40. K. Polat and S. Güneş, “A new method to forecast of Escherichia coli promoter gene sequences: Integrating feature selection and Fuzzy-AIRS classifier system,” Expert Systems with Applications, vol. 36, no. 1, pp. 57–64, 2009.
  41. Y. Gan, J. Guan, and S. Zhou, “A comparison study on feature selection of DNA structural properties for promoter prediction,” BMC Bioinformatics, vol. 13, no. 1, article 4, 2012.
  42. W.-L. Huang, C.-W. Tung, and S.-Y. Ho, “Human Pol II promoter prediction by using nucleotide property composition features,” in Proceedings of the International Symposium on Biocomputing (ISB '10), ACM, New York, NY, USA, Kerala, India, February 2010.
  43. Nucleotide properties, http://www.geneinfinity.org/sp/sp_dnaprop.html.
  44. D. Onidas, D. Markovitsi, S. Marguet, A. Sharonov, and T. Gustavsson, “Fluorescence properties of DNA nucleosides and nucleotides: a refined steady-state and femtosecond investigation,” The Journal of Physical Chemistry B, vol. 106, no. 43, pp. 11367–11374, 2002.
  45. H.-L. Huang, I.-C. Lin, Y.-F. Liou et al., “Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties,” BMC Bioinformatics, vol. 12, no. 1, article S47, 2011.
  46. W. L. Huang, “Ranking gene ontology terms for predicting non-classical secretory proteins in eukaryotes and prokaryotes,” Journal of Theoretical Biology, vol. 312, pp. 105–113, 2012.
  47. S.-Y. Ho, J.-H. Chen, and M.-H. Huang, “Inheritable genetic algorithm for biobjective 0/1 combinatorial optimization problems and its applications,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 34, no. 1, pp. 609–620, 2004.
  48. W.-L. Huang, C.-W. Tung, S.-W. Ho, S.-F. Hwang, and S.-Y. Ho, “ProLoc-GO: utilizing informative gene ontology terms for sequence-based prediction of protein subcellular localization,” BMC Bioinformatics, vol. 9, article 80, 2008.
  49. R. Dreos, G. Ambrosini, R. Cavin Périer, and P. Bucher, “EPD and EPD new, high-quality promoter resources in the next-generation sequencing era,” Nucleic Acids Research, vol. 41, pp. D157–D164, 2013.
  50. J. R. Quinlan, “C5.0 online tutorial,” 2003, http://www.rulequest.com.
  51. U. Ohler, H. Niemann, G.-C. Liao, and G. M. Rubin, “Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition,” Bioinformatics, vol. 17, no. 1, pp. S199–S206, 2001.
  52. S. S. Gross and M. R. Brent, “Using multiple alignments to improve gene prediction,” in Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology (RECOMB '05), pp. 379–393, Mary Ann Liebert, Cambridge, Mass, USA, 2005.
  53. V. Rangannan and M. Bansal, “High-quality annotation of promoter regions for 913 bacterial genomes,” Bioinformatics, vol. 26, no. 24, pp. 3043–3050, 2010.
  54. M. Scherf, A. Klingenhoff, K. Frech et al., “First pass annotation of promoters on human chromosome 22,” Genome Research, vol. 11, no. 3, pp. 333–340, 2001.
  55. D. S. Prestridge, “Predicting Pol II promoter sequences using transcription factor binding sites,” Journal of Molecular Biology, vol. 249, no. 5, pp. 923–932, 1995.
  56. TSSW, http://linux1.softberry.com/berry.phtml.
  57. S. Varma and R. Simon, “Bias in error estimation when using cross-validation for model selection,” BMC Bioinformatics, vol. 7, article 91, 2006.
  58. D. Restrepo-Montoya, C. Pino, L. F. Nino, M. E. Patarroyo, and M. A. Patarroyo, “NClassG+: a classifier for non-classically secreted Gram-positive bacterial proteins,” BMC Bioinformatics, vol. 12, article 21, 2011.
  59. W.-L. Huang, C.-W. Tung, H.-L. Huang, S.-F. Hwang, and S.-Y. Ho, “ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features,” BioSystems, vol. 90, no. 2, pp. 573–581, 2007.
  60. Z. Zhang, S. Kochhar, and M. G. Grigorov, “Descriptor-based protein remote homology identification,” Protein Science, vol. 14, no. 2, pp. 431–444, 2005.
  61. S.-Y. Ho, L.-S. Shu, and J.-H. Chen, “Intelligent evolutionary algorithms for large parameter optimization problems,” IEEE Transactions on Evolutionary Computation, vol. 8, no. 6, pp. 522–541, 2004.
  62. C. C. Chang and C. J. Lin, “LIBSVM: a library for support vector machines,” 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  63. K.-C. Chou and C.-T. Zhang, “Prediction of protein structural classes,” Critical Reviews in Biochemistry and Molecular Biology, vol. 30, no. 4, pp. 275–349, 1995.
  64. K.-C. Chou, “Some remarks on protein attribute prediction and pseudo amino acid composition,” Journal of Theoretical Biology, vol. 273, no. 1, pp. 236–247, 2011.
  65. C.-H. Su, N. R. Pal, K.-L. Lin, and I.-F. Chung, “Identification of amino acid propensities that are strong determinants of linear B-cell epitope using neural networks,” PLoS ONE, vol. 7, no. 2, Article ID e30617, 2012.
  66. V. B. Bajic, L. T. Sin, Y. Suzuki, and S. Sugano, “Promoter prediction analysis on the whole human genome,” Nature Biotechnology, vol. 22, no. 11, pp. 1467–1473, 2004.
  67. B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta, vol. 405, no. 2, pp. 442–451, 1975.
  68. T. Li, C. Zhang, and M. Ogihara, “A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression,” Bioinformatics, vol. 20, no. 15, pp. 2429–2437, 2004.