Advances in Bioinformatics
Volume 2016, Article ID 3528406, 7 pages
Research Article

An Optimal Seed Based Compression Algorithm for DNA Sequences

1Department of Information Science and Engineering, Rajiv Gandhi Institute of Technology, Bangalore 560032, India
2Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala 673601, India

Received 28 November 2015; Revised 9 May 2016; Accepted 19 June 2016

Academic Editor: Frank M. You

Copyright © 2016 Pamela Vinitha Eric et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures inherent in DNA sequences by creating an offline dictionary that contains all such repeats along with the details of their mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than those of existing lossless DNA sequence compression algorithms.
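
To make the idea of a seed-based repeat dictionary concrete, the following Python sketch shows one simple way such a dictionary could be built. It is not the algorithm of this paper: the function names, the fixed seed length `seed_len`, and the hard cap `max_mismatches` are all assumptions made for illustration. In particular, the abstract does not state the criterion for a "promising" mismatch, so the sketch substitutes a plain limit on the number of mismatches per repeat.

```python
def extend_seed(seq, src, dst, seed_len, max_mismatches):
    """Extend an exact seed match to the right, tolerating a few mismatches.

    Returns (length, mismatches). Each mismatch is an (offset, base) pair
    giving the base in the copy at dst that differs from the source at src.
    """
    length = seed_len
    mismatches = []
    # Stop at the end of the sequence or before the source overlaps the copy.
    while dst + length < len(seq) and src + length < dst:
        if seq[src + length] != seq[dst + length]:
            if len(mismatches) == max_mismatches:
                break
            mismatches.append((length, seq[dst + length]))
        length += 1
    # A repeat that ends on a mismatch gains nothing from it, so trim it.
    while mismatches and mismatches[-1][0] == length - 1:
        mismatches.pop()
        length -= 1
    return length, mismatches


def build_repeat_dictionary(seq, seed_len=4, max_mismatches=1):
    """Scan seq left to right and record approximate repeats.

    Each entry is (copy_start, source_start, length, mismatches).
    Hypothetical simplification: a flat cap on mismatches stands in for
    the paper's "promising mismatch" test, which is not specified here.
    """
    seeds = {}     # seed string -> list of earlier start positions
    entries = []
    i = 0
    while i + seed_len <= len(seq):
        seed = seq[i:i + seed_len]
        best = None
        for src in seeds.get(seed, ()):
            if src + seed_len > i:
                continue  # source seed would overlap the copy
            length, mm = extend_seed(seq, src, i, seed_len, max_mismatches)
            if best is None or length > best[2]:
                best = (i, src, length, mm)
        if best is not None:
            entries.append(best)
        # Index every seed in the region just consumed, then move past it.
        step = best[2] if best is not None else 1
        for p in range(i, min(i + step, len(seq) - seed_len + 1)):
            seeds.setdefault(seq[p:p + seed_len], []).append(p)
        i += step
    return entries
```

For example, in `"ACGTTGCAACGTAGCA"` the second half repeats the first half with a single substitution. With `max_mismatches=1` the sketch records one repeat of length 8 with one mismatch, whereas with `max_mismatches=0` it records only the 4-base exact match. Because each entry stores the differing bases, the copy can be rebuilt exactly from its source, which is what keeps a scheme of this kind lossless.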