Advances in Artificial Intelligence
Volume 2009 (2009), Article ID 219743, 11 pages
http://dx.doi.org/10.1155/2009/219743
Research Article

Bayesian Unsupervised Learning of DNA Regulatory Binding Regions

1Department of Mathematics, Åbo Akademi University, 20500 Turku, Finland
2Department of Mathematics, University of Linköping, 58183 Linköping, Sweden
3Department of Mathematics, The Royal Institute of Technology, 100 44 Stockholm, Sweden

Received 13 February 2009; Revised 6 June 2009; Accepted 2 July 2009

Academic Editor: Djamel Bouchaffra

Copyright © 2009 Jukka Corander et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. T. Werner, “Models for prediction and recognition of eukaryotic promoters,” Mammalian Genome, vol. 10, no. 2, pp. 168–175, 1999.
  2. E. Eskin and P. A. Pevzner, “Finding composite regulatory patterns in DNA sequences,” Bioinformatics, vol. 18, supplement 1, pp. S354–S363, 2002.
  3. L. Marsan and M.-F. Sagot, “Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification,” Journal of Computational Biology, vol. 7, no. 3-4, pp. 345–362, 2000.
  4. U. Ohler and H. Niemann, “Identification and analysis of eukaryotic promoters: recent computational approaches,” Trends in Genetics, vol. 17, no. 2, pp. 56–60, 2001.
  5. J. S. Liu, A. F. Neuwald, and C. E. Lawrence, “Bayesian models for multiple local sequence alignment and Gibbs sampling strategies,” Journal of the American Statistical Association, vol. 90, pp. 1156–1170, 1995.
  6. W. Thompson, E. C. Rouchka, and C. E. Lawrence, “Gibbs recursive sampler: finding transcription factor binding sites,” Nucleic Acids Research, vol. 31, no. 13, pp. 3580–3585, 2003.
  7. M. Gupta and J. S. Liu, “Discovery of conserved sequence patterns using a stochastic dictionary model,” Journal of the American Statistical Association, vol. 98, no. 461, pp. 55–66, 2003.
  8. S. T. Jensen, X. S. Liu, Q. Zhou, and J. S. Liu, “Computational discovery of gene regulatory binding motifs: a Bayesian perspective,” Statistical Science, vol. 19, no. 1, pp. 188–204, 2004.
  9. S. T. Jensen and J. S. Liu, “BioOptimizer: a Bayesian scoring function approach to motif discovery,” Bioinformatics, vol. 20, no. 10, pp. 1557–1564, 2004.
  10. E. P. Xing, W. Wu, M. I. Jordan, and R. M. Karp, “Logos: a modular Bayesian model for de novo motif detection,” Journal of Bioinformatics and Computational Biology, vol. 2, no. 1, pp. 127–154, 2004.
  11. I. Ben-Gal, A. Shani, A. Gohr, et al., “Identification of transcription factor binding sites with variable-order Bayesian networks,” Bioinformatics, vol. 21, no. 11, pp. 2657–2666, 2005.
  12. L. Hertzberg, O. Zuk, G. Getz, and E. Domany, “Finding motifs in promoter regions,” Journal of Computational Biology, vol. 12, no. 3, pp. 314–330, 2005.
  13. Y. Barash, G. Elidan, N. Friedman, and T. Kaplan, “Modeling dependencies in protein-DNA binding sites,” in Proceedings of the 7th Annual International Conference on Computational Molecular Biology (RECOMB '03), pp. 28–37, ACM Press, Berlin, Germany, April 2003.
  14. X. Zhang, H. Huang, M. Li, and T. Speed, “Finding short DNA motifs using permuted Markov models,” Bioinformatics, vol. 21, pp. 894–906, 2005.
  15. S. M. Li, J. Wakefield, and S. Self, “A transdimensional Bayesian model for pattern recognition in DNA sequences,” Biostatistics, vol. 9, no. 4, pp. 668–685, 2008.
  16. J. Hawkins, C. Grant, W. S. Noble, and T. L. Bailey, “Assessing phylogenetic motif models for predicting transcription factor binding sites,” Bioinformatics, vol. 25, no. 12, pp. i339–i347, 2009.
  17. T. Marschall and S. Rahmann, “Efficient exact motif discovery,” Bioinformatics, vol. 25, no. 12, pp. i356–i364, 2009.
  18. P. Bühlmann and A. J. Wyner, “Variable length Markov chains,” Annals of Statistics, vol. 27, no. 2, pp. 480–513, 1999.
  19. M. Mächler and P. Bühlmann, “Variable length Markov chains: methodology, computing, and software,” Journal of Computational and Graphical Statistics, vol. 13, no. 2, pp. 435–455, 2004.
  20. J. Rissanen, “A universal data compression system,” IEEE Transactions on Information Theory, vol. 29, no. 5, pp. 656–664, 1983.
  21. I. Abnizova and W. R. Gilks, “Studying statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the eukaryotic genomes,” Briefings in Bioinformatics, vol. 7, no. 1, pp. 48–54, 2006.
  22. M. C. Frith, Y. Fu, L. Yu, J.-F. Chen, U. Hansen, and Z. Weng, “Detection of functional DNA motifs via statistical over-representation,” Nucleic Acids Research, vol. 32, no. 4, pp. 1372–1381, 2004.
  23. J. Zhang, B. Jiang, M. Li, J. Tromp, X. Zhang, and M. Q. Zhang, “Computing exact P-values for DNA motifs,” Bioinformatics, vol. 23, no. 5, pp. 531–537, 2007.
  24. C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, New York, NY, USA, 1999.
  25. P. Green, “Reversible jump Markov chain Monte Carlo computation and Bayesian model determination,” Biometrika, vol. 82, pp. 711–732, 1995.
  26. J. Corander, M. Gyllenberg, and T. Koski, “Bayesian model learning based on a parallel MCMC strategy,” Statistics and Computing, vol. 16, no. 4, pp. 355–362, 2006.
  27. E. E. Stückle, C. Emmrich, U. Grob, and P. J. Nielsen, “Statistical analysis of nucleotide sequences,” Nucleic Acids Research, vol. 18, no. 22, pp. 6641–6647, 1990.
  28. D. Ron, Y. Singer, and N. Tishby, “The power of amnesia: learning of probabilistic automata with variable memory lengths,” Machine Learning, vol. 25, pp. 117–149, 1996.
  29. M. Régnier, “A unified approach to word occurrence probabilities,” Discrete Applied Mathematics, vol. 104, pp. 259–280, 2000.
  30. T. Erhardsson, “Compound Poisson approximation for Markov chains using Stein's method,” Annals of Probability, vol. 27, no. 1, pp. 565–596, 1999.
  31. T. Erhardsson, “Compound Poisson approximation for counts of rare patterns in Markov chains and extreme sojourns in birth-death chains,” Annals of Applied Probability, vol. 10, no. 2, pp. 573–591, 2000.
  32. J. L. Thorne, H. Kishino, and J. Felsenstein, “Inching towards reality: an improved likelihood model for sequence evolution,” Journal of Molecular Evolution, vol. 34, pp. 3–16, 1992.
  33. J. Corander, M. Gyllenberg, and T. Koski, “Random partition models and exchangeability for Bayesian identification of population structure,” Bulletin of Mathematical Biology, vol. 69, no. 3, pp. 797–815, 2007.
  34. D. Geiger and D. Heckerman, “A characterization of the Dirichlet distribution through global and local parameter independence,” Annals of Statistics, vol. 25, no. 3, pp. 1344–1369, 1997.
  35. P. Marttinen, J. Corander, P. Törönen, and L. Holm, “Bayesian search of functionally divergent protein subgroups and their function specific residues,” Bioinformatics, vol. 22, no. 20, pp. 2466–2474, 2006.
  36. J. Zhu and M. Q. Zhang, “SCPD: a promoter database of the yeast Saccharomyces cerevisiae,” Bioinformatics, vol. 15, no. 7-8, pp. 607–611, 1999.
  37. B. G. Mirkin and L. B. Chernyi, “Measurement of the distance between distinct partitions of a finite set of objects,” Automation and Remote Control, vol. 31, pp. 786–792, 1970.
  38. L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.
  39. S. Sinha and M. Tompa, “Discovery of novel transcription factor binding sites by statistical overrepresentation,” Nucleic Acids Research, vol. 30, no. 24, pp. 5549–5560, 2002.
  40. G. Pavesi, P. Mereghetti, G. Mauri, and G. Pesole, “Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes,” Nucleic Acids Research, vol. 32, web server issue, pp. W199–W203, 2004.
  41. M. Tompa, N. Li, T. L. Bailey, et al., “Assessing computational tools for the discovery of transcription factor binding sites,” Nature Biotechnology, vol. 23, no. 1, pp. 137–144, 2005.