Advances in Bioinformatics
Volume 2016, Article ID 3528406, 7 pages
http://dx.doi.org/10.1155/2016/3528406
Research Article

An Optimal Seed Based Compression Algorithm for DNA Sequences

¹Department of Information Science and Engineering, Rajiv Gandhi Institute of Technology, Bangalore 560032, India
²Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala 673601, India

Received 28 November 2015; Revised 9 May 2016; Accepted 19 June 2016

Academic Editor: Frank M. You

Copyright © 2016 Pamela Vinitha Eric et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. E. S. Lander, L. M. Linton, B. Birren et al., “Initial sequencing and analysis of the human genome,” Nature, vol. 409, no. 6822, pp. 860–921, 2001.
  2. X. Chen, S. Kwong, and M. Li, “Compression algorithm for DNA sequences and its applications in genome comparison,” in Proceedings of the 4th Annual International Conference on Computational Molecular Biology (RECOMB '00), p. 107, ACM, Tokyo, Japan, April 2000.
  3. L. Allison, L. Stern, T. Edgoose, and T. I. Dix, “Sequence complexity for biological sequence analysis,” Computers and Chemistry, vol. 24, no. 1, pp. 43–55, 2000.
  4. E. Keogh, S. Lonardi, and C. A. Ratanamahatana, “Towards parameter-free data mining,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215, August 2004.
  5. J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337–343, 1977.
  6. J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Transactions on Information Theory, vol. 24, no. 5, pp. 530–536, 1978.
  7. S. Grumbach and F. Tahi, “Compression of DNA sequences,” in Proceedings of the IEEE Symposium on Data Compression, pp. 340–350, Snowbird, Utah, USA, 1993.
  8. S. Grumbach and F. Tahi, “A new challenge for compression algorithms: genetic sequences,” Information Processing and Management, vol. 30, no. 6, pp. 875–886, 1994.
  9. R. Giancarlo, D. Scaturro, and F. Utro, “Textual data compression in computational biology: a synopsis,” Bioinformatics, vol. 25, no. 13, pp. 1575–1586, 2009.
  10. T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 2012.
  11. D. Salomon, Data Compression: The Complete Reference, Springer Science and Business Media, 2004.
  12. X. Chen, M. Li, B. Ma, and J. Tromp, “DNACompress: fast and effective DNA sequence compression,” Bioinformatics, vol. 18, no. 12, pp. 1696–1698, 2002.
  13. B. Ma, J. Tromp, and M. Li, “PatternHunter: faster and more sensitive homology search,” Bioinformatics, vol. 18, no. 3, pp. 440–445, 2002.
  14. A. Apostolico and S. Lonardi, “Compression of biological sequences by greedy off-line textual substitution,” in Proceedings of the Data Compression Conference (DCC '00), pp. 143–152, March 2000.
  15. D. Adjeroh and F. Nan, “On compressibility of protein sequences,” in Proceedings of the Data Compression Conference (DCC '06), p. 10, Snowbird, Utah, USA, March 2006.
  16. D. Adjeroh, Y. Zhang, A. Mukherjee, M. Powell, and T. Bell, “DNA sequence compression using the Burrows-Wheeler Transform,” in Proceedings of the IEEE Computer Society Bioinformatics Conference, Computer Society, vol. 1, pp. 303–313, 2002.
  17. É. Rivals, M. Dauchet, J. P. Delahaye, and O. Delgrange, “Compression and genetic sequence analysis,” Biochimie, vol. 78, no. 5, pp. 315–322, 1996.
  18. M. D. Cao, T. I. Dix, L. Allison, and C. Mears, “A simple statistical algorithm for biological sequence compression,” in Proceedings of the Data Compression Conference (DCC '07), pp. 43–52, IEEE, Snowbird, Utah, USA, March 2007.
  19. D. Loewenstern and P. N. Yianilos, “Significantly lower entropy estimates for natural DNA sequences,” Journal of Computational Biology, vol. 6, no. 1, pp. 125–142, 1999.
  20. G. Korodi and I. Tabus, “An efficient normalized maximum likelihood algorithm for DNA sequence compression,” ACM Transactions on Information Systems, vol. 23, no. 1, pp. 3–34, 2005.
  21. J. I. Myung, D. J. Navarro, and M. A. Pitt, “Model selection by normalized maximum likelihood,” Journal of Mathematical Psychology, vol. 50, no. 2, pp. 167–179, 2006.
  22. T. Matsumoto, K. Sadakane, and H. Imai, “Biological sequence compression algorithms,” Genome Informatics, vol. 11, pp. 43–52, 2000.
  23. F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, “The context-tree weighting method: basic properties,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 653–664, 1995.
  24. B. Behzadi and F. Le Fessant, “DNA compression challenge revisited: a dynamic programming approach,” in Proceedings of the Annual Symposium on Combinatorial Pattern Matching, pp. 190–200, Springer, Berlin, Germany, 2005.
  25. D. Adjeroh and J. Feng, “The SCP and compressed domain analysis of biological sequences,” in Proceedings of the IEEE Bioinformatics Conference (CSB '03), pp. 587–592, Stanford, Calif, USA, August 2003.
  26. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
  27. P. Agarwal, “Compact encoding strategies for DNA sequence similarity search,” in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '95), vol. 4, pp. 211–217, 1995.