Statistical Algorithms for Long DNA Sequences: Oligonucleotide Distributions and Homogeneity Maps

Katsaloulis, P.; Theoharis, T.; Provata, A.

doi:https://doi.org/10.1155/2005/807304

Scientific Programming


Open Access

Volume 13 | Article ID 807304 | https://doi.org/10.1155/2005/807304


P. Katsaloulis,¹,² T. Theoharis,¹ and A. Provata²

Received 26 Dec 2005

Accepted 26 Dec 2005

Abstract

The statistical properties of oligonucleotide appearances within long DNA sequences often reveal useful characteristics of the corresponding DNA areas. Two algorithms to statistically analyze oligonucleotide appearances within long DNA sequences in genome banks are presented. The first algorithm determines statistical indices for arbitrary length oligonucleotides within arbitrary length DNA sequences. The critical exponent μ of the distance distribution between consecutive occurrences of the same oligonucleotide is calculated and its value is shown to characterize the functionality of the oligonucleotide. The second algorithm searches for areas with variable homogeneity, based on the density of oligonucleotides. The two algorithms have been applied to representative eucaryotes (the animal Mus musculus and the plant Arabidopsis thaliana) and interesting results were obtained, confirmed by biological observations. All programs are open source and publicly available on our web site.
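The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published program. The function names (`occurrence_positions`, `distance_histogram`, `density_profile`) are hypothetical, and so are three choices: the distance is measured between the start positions of consecutive occurrences, overlapping occurrences are counted, and the density uses non-overlapping windows. The sketch collects only the raw distance histogram from which an exponent such as μ could later be estimated; it does not perform the fit.

```python
from collections import Counter


def occurrence_positions(sequence, oligo):
    """Return the start index of every occurrence of oligo (overlaps allowed)."""
    positions = []
    start = sequence.find(oligo)
    while start != -1:
        positions.append(start)
        # Advance by one so that overlapping occurrences are also found.
        start = sequence.find(oligo, start + 1)
    return positions


def distance_histogram(sequence, oligo):
    """Histogram of distances between consecutive occurrences of oligo.

    Assumption: distance = difference between consecutive start positions.
    """
    positions = occurrence_positions(sequence, oligo)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return Counter(gaps)


def density_profile(sequence, oligo, window):
    """Number of occurrences of oligo starting in each non-overlapping window."""
    n_windows = (len(sequence) + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for p in occurrence_positions(sequence, oligo):
        counts[p // window] += 1
    return counts


if __name__ == "__main__":
    toy = "ACGTACGGACGT"  # toy sequence, not real genomic data
    print(occurrence_positions(toy, "ACG"))
    print(distance_histogram(toy, "ACG"))
    print(density_profile(toy, "ACG", window=6))
```

On a real chromosome, the window counts would be compared along the sequence to flag regions where the oligonucleotide density departs from its typical level. The window size is a free parameter in this sketch.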

Copyright

Copyright © 2005 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
