Comparative and Functional Genomics
Volume 4 (2003), Issue 3, Pages 318-328
Primary research paper

ORF Organization and Gene Recognition in the Yeast Genome

Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot 010021, China

Received 31 October 2002; Revised 3 March 2003; Accepted 10 March 2003

Copyright © 2003 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Statistical analyses of sequence data demonstrate several rules of gene recognition and ORF organization in the Saccharomyces cerevisiae genome. The study covers three points. (a) The random frame rule: the six reading frames W1, W2, W3, C1, C2 and C3 of the double-stranded genome are randomly occupied by ORFs (related phenomena of ORF overlapping are also discussed). (b) The inhomogeneity rule: coding and non-coding ORFs differ in the inhomogeneity of base composition across the three codon positions. Using the inhomogeneity index (IHI), one can distinguish coding (IHI > 14) from non-coding (IHI ≤ 14) ORFs with 95% accuracy. We find that ‘spurious’ ORFs (those with IHI ≤ 14) fall mainly into three classes: ORFs with ‘similarity to unknown proteins’, ORFs with ‘no similarity’, and ‘questionable ORFs’. The total number of spurious ORFs (which are unlikely to be regarded as coding ORFs) is estimated to be 470. (c) The ORF length distribution: below 200 amino acids, the occurrence of ATG-initiated ORFs is close to random.
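The quantities named in the abstract can be illustrated with a minimal Python sketch. It enumerates ATG-initiated ORFs in the six reading frames, tabulates base composition at each of the three codon positions, and applies the stated IHI cutoff of 14. Several parts are assumptions made for illustration only: the function names are hypothetical, the phase numbering within W1-W3 and C1-C3 may not match the convention used in the paper, the stop codons are those of the standard genetic code, and the IHI formula itself is not given here, so the IHI value is treated as an input computed elsewhere.

```python
from collections import Counter

# Assumption: standard genetic code stop codons; DNA alphabet limited to A, C, G, T.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """Yield (frame, start, end, orf) for ATG-initiated ORFs in six frames.

    Frames W1-W3 are read on the given strand and C1-C3 on its reverse
    complement; the phase numbering is an illustrative assumption.
    start and end are 0-based offsets on the strand being read, and
    min_codons counts the codons before the stop (ATG included).
    """
    seq = seq.upper()
    for label, strand in (("W", seq), ("C", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield (f"{label}{offset + 1}", start, i + 3,
                               strand[start:i + 3])
                    start = None  # look for the next ORF in this frame


def codon_position_composition(orf):
    """Return base frequencies at codon positions 1, 2 and 3 of an ORF."""
    tables = []
    for pos in range(3):
        bases = orf[pos::3]
        counts = Counter(bases)
        tables.append({b: counts[b] / len(bases) for b in "ACGT"})
    return tables


def classify_by_ihi(ihi, threshold=14.0):
    """Apply the cutoff from the abstract: IHI > 14 coding, IHI <= 14 non-coding.

    The IHI value must be computed elsewhere; its formula is not reproduced here.
    """
    return "coding" if ihi > threshold else "non-coding"
```

For example, `list(find_orfs(genome_fragment))` gives the frame label of each ORF, from which frame occupancy and the ORF length distribution (in codons, `len(orf) // 3 - 1` excluding the stop) can be tallied. `codon_position_composition` supplies only the per-position base frequencies on which an inhomogeneity measure would be built.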