Advances in Artificial Intelligence
Volume 2009, Article ID 219743, 11 pages
Research Article

Bayesian Unsupervised Learning of DNA Regulatory Binding Regions

¹Department of Mathematics, Åbo Akademi University, 20500 Turku, Finland
²Department of Mathematics, University of Linköping, 58183 Linköping, Sweden
³Department of Mathematics, The Royal Institute of Technology, 100 44 Stockholm, Sweden

Received 13 February 2009; Revised 6 June 2009; Accepted 2 July 2009

Academic Editor: Djamel Bouchaffra

Copyright © 2009 Jukka Corander et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Identification of regulatory binding motifs, that is, short specific words within DNA sequences, is a common problem in computational bioinformatics. A wide variety of probabilistic approaches have been proposed in the literature, either to scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Most approaches assume the existence of reliable biodatabase information from which to build a probabilistic a priori description of the motif classes. Attempts at probabilistic unsupervised learning of the number of putative de novo motif types and their positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that aims to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which is a variable-length Markov chain. We demonstrate how the adopted Bayesian modelling strategy, combined with recently introduced nonstandard stochastic computation tools, yields a more tractable learning procedure than is possible with standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.
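
To make the notion of a motif marginal likelihood concrete, the following minimal sketch scores a set of aligned candidate words. It assumes that each motif position is modelled independently with a symmetric Dirichlet prior over the four nucleotides; this product Dirichlet-multinomial form is a common choice and is not stated in the abstract. The function names and the `alpha` hyperparameter are hypothetical, and the variable-length Markov chain background model is not covered.

```python
from collections import Counter
from math import lgamma

ALPHABET = "ACGT"


def log_marginal_column(counts, alpha=1.0):
    """Log marginal likelihood of one motif column.

    Assumes a symmetric Dirichlet(alpha) prior over the nucleotides
    (an illustrative assumption, not taken from the article).
    """
    n = sum(counts)
    k = len(counts)
    # Ratio of Dirichlet normalising constants, prior versus posterior.
    value = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        value += lgamma(alpha + c) - lgamma(alpha)
    return value


def log_marginal_motif(words, alpha=1.0):
    """Sum the column-wise log marginal likelihoods of aligned words."""
    width = len(words[0])
    if any(len(w) != width for w in words):
        raise ValueError("all words must have the same length")
    total = 0.0
    for j in range(width):
        column = Counter(w[j] for w in words)
        total += log_marginal_column([column.get(b, 0) for b in ALPHABET], alpha)
    return total
```

Under these assumptions, a set of identical words such as `["ACGT"] * 4` receives a higher score than four words that differ at every position, which is the behaviour a motif model should reward. A full learning procedure would compare such motif scores against the background model's score for the same positions.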