Advances in Bioinformatics
Volume 2010 (2010), Article ID 287070, 8 pages
Research Article

Testing the Coding Potential of Conserved Short Genomic Sequences

Department of Statistics, Carnegie Mellon University, PA 15213, USA

Received 21 September 2009; Accepted 2 January 2010

Academic Editor: Igor B. Rogozin

Copyright © 2010 Jing Wu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


This paper proposes a procedure to test whether a genomic sequence contains coding DNA; such a sequence is called a coding potential region. The procedure tests the coding potential of conserved short genomic sequences while relaxing the assumptions on the probability models of gene structures. It is therefore expected to add candidate regions that contain coding DNA to current genomic databases. The procedure was applied to the set of highly conserved human-mouse sequences in the genome database at the University of California at Santa Cruz. For sequences containing RefSeq coding exons, the procedure detected 91.3% of the regions as having coding potential in this set, which covers 83% of the human RefSeq coding exons, at a 2.6% false positive rate. The procedure also detected 12,688 novel short regions with coding potential at a false discovery rate below 0.05; 65.7% of these novel regions lie between annotated genes.
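The abstract reports novel regions at a false discovery rate below 0.05 but does not say how that rate is controlled. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure applied to per-region p-values. The function name `benjamini_hochberg` and the p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up control of the false discovery
    rate at level alpha. The paper's own procedure may differ.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten candidate regions (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(example_p, alpha=0.05)
print(sum(calls), "of", len(calls), "regions called at FDR 0.05")
```

Because the procedure is step-up, a region can be called even when its own p-value exceeds its rank threshold, provided a larger p-value further down the sorted list meets its threshold.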