International Journal of Genomics
Volume 2015, Article ID 196591, 8 pages
http://dx.doi.org/10.1155/2015/196591
Research Article

Spaced Seed Data Structures for De Novo Assembly

Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada V5Z 4S6

Received 6 February 2015; Accepted 30 March 2015

Academic Editor: Che-Lun Hung

Copyright © 2015 Inanç Birol et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. L. Decourt, A. J. Gebara, M. C. Lima, D. Delascio, E. Chiorboli, and J. M. Fernandes, “Turner syndrome: presentation of four cases,” Revista Paulista de Medicina, vol. 45, no. 2, pp. 251–264, 1954.
  2. S. Levi, “Nature and origin of mongolism; critical review of etiopathogenetic problem,” Rivista di Clinica Pediatrica, vol. 49, no. 3, pp. 171–186, 1951.
  3. L. G. Biesecker, J. C. Mullikin, F. M. Facio et al., “The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine,” Genome Research, vol. 19, no. 9, pp. 1665–1674, 2009.
  4. K. V. Voelkerding and E. Lyon, “Digital fetal aneuploidy diagnosis by next-generation sequencing,” Clinical Chemistry, vol. 56, no. 3, pp. 336–338, 2010.
  5. R. D. Morin, M. Mendez-Lago, A. J. Mungall et al., “Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma,” Nature, vol. 476, no. 7360, pp. 298–303, 2011.
  6. S. P. Shah, A. Roth, R. Goya et al., “The clonal and mutational evolution spectrum of primary triple-negative breast cancers,” Nature, vol. 486, no. 7403, pp. 395–399, 2012.
  7. M. Burrows and D. Wheeler, “A block sorting lossless data compression algorithm,” Tech. Rep. 124, Digital Equipment Corporation, 1994.
  8. P. Ferragina and G. Manzini, “Opportunistic data structures with applications,” in Proceedings of the 41st Annual Symposium on Foundations of Computer Science (Redondo Beach, CA, 2000), pp. 390–398, IEEE Comput. Soc. Press, Los Alamitos, Calif, USA, 2000.
  9. F. Hach, F. Hormozdiari, C. Alkan, I. Birol, E. E. Eichler, and S. C. Sahinalp, “MrsFAST: a cache-oblivious algorithm for short-read mapping,” Nature Methods, vol. 7, no. 8, pp. 576–577, 2010.
  10. B. Langmead and S. L. Salzberg, “Fast gapped-read alignment with Bowtie 2,” Nature Methods, vol. 9, no. 4, pp. 357–359, 2012.
  11. H. Li and R. Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 25, no. 14, pp. 1754–1760, 2009.
  12. L. W. Hillier, G. T. Marth, A. R. Quinlan et al., “Whole-genome sequencing and variant discovery in C. elegans,” Nature Methods, vol. 5, no. 2, pp. 183–188, 2008.
  13. H. Li, J. Ruan, and R. Durbin, “Mapping short DNA sequencing reads and calling variants using mapping quality scores,” Genome Research, vol. 18, no. 11, pp. 1851–1858, 2008.
  14. C. A. Albers, G. Lunter, D. G. MacArthur, G. McVean, W. H. Ouwehand, and R. Durbin, “Dindel: accurate indel calls from short-read data,” Genome Research, vol. 21, no. 6, pp. 961–973, 2011.
  15. A. McKenna, M. Hanna, E. Banks et al., “The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, no. 9, pp. 1297–1303, 2010.
  16. P. Carnevali, J. Baccash, A. L. Halpern et al., “Computational techniques for human genome resequencing using mated gapped reads,” Journal of Computational Biology, vol. 19, no. 3, pp. 279–292, 2012.
  17. K. Chen, J. W. Wallis, C. Kandoth et al., “Breakfusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data,” Bioinformatics, vol. 28, no. 14, pp. 1923–1924, 2012.
  18. K. Wong, T. M. Keane, J. Stalker, and D. J. Adams, “Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly,” Genome Biology, vol. 11, no. 12, p. R128, 2010.
  19. K. Chen, J. W. Wallis, M. D. McLellan et al., “BreakDancer: an algorithm for high-resolution mapping of genomic structural variation,” Nature Methods, vol. 6, no. 9, pp. 677–681, 2009.
  20. A. McPherson, F. Hormozdiari, A. Zayed et al., “deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data,” PLOS Computational Biology, vol. 7, no. 5, Article ID e1001138, 2011.
  21. M. G. Grabherr, B. J. Haas, M. Yassour et al., “Full-length transcriptome assembly from RNA-Seq data without a reference genome,” Nature Biotechnology, vol. 29, no. 7, pp. 644–652, 2011.
  22. H. Li, “Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly,” Bioinformatics, vol. 28, no. 14, Article ID bts280, pp. 1838–1844, 2012.
  23. M. H. Schulz, D. R. Zerbino, M. Vingron, and E. Birney, “Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels,” Bioinformatics, vol. 28, no. 8, Article ID bts094, pp. 1086–1092, 2012.
  24. J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, and I. Birol, “ABySS: a parallel assembler for short read sequence data,” Genome Research, vol. 19, no. 6, pp. 1117–1123, 2009.
  25. I. Birol, S. D. Jackman, C. B. Nielsen et al., “De novo transcriptome assembly with ABySS,” Bioinformatics, vol. 25, no. 21, pp. 2872–2877, 2009.
  26. G. Robertson, J. Schein, R. Chiu et al., “De novo assembly and analysis of RNA-seq data,” Nature Methods, vol. 7, no. 11, pp. 909–912, 2010.
  27. T. J. Pugh, O. Morozova, E. F. Attiyeh et al., “The genetic landscape of high-risk neuroblastoma,” Nature Genetics, vol. 45, no. 3, pp. 279–284, 2013.
  28. K. G. Roberts, R. D. Morin, J. Zhang et al., “Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia,” Cancer Cell, vol. 22, no. 2, pp. 153–166, 2012.
  29. S. Yip, Y. S. Butterfield, O. Morozova et al., “Concurrent CIC mutations, IDH mutations, and 1p/19q loss distinguish oligodendrogliomas from other cancers,” Journal of Pathology, vol. 226, no. 1, pp. 7–16, 2012.
  30. P. A. Pevzner and H. Tang, “Fragment assembly with double-barreled data,” Bioinformatics, vol. 17, no. 1, pp. S225–S233, 2001.
  31. D. R. Zerbino and E. Birney, “Velvet: algorithms for de novo short read assembly using de Bruijn graphs,” Genome Research, vol. 18, no. 5, pp. 821–829, 2008.
  32. R. Chikhi and G. Rizk, “Space-efficient and exact de Bruijn graph representation based on a Bloom filter,” Algorithms for Molecular Biology, vol. 8, article 22, 2013.
  33. B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.
  34. X. Huang and A. Madan, “CAP3: a DNA sequence assembly program,” Genome Research, vol. 9, no. 9, pp. 868–877, 1999.
  35. J. T. Simpson and R. Durbin, “Efficient de novo assembly of large genomes using compressed data structures,” Genome Research, vol. 22, no. 3, pp. 549–556, 2012.
  36. S. K. Pham, D. Antipov, A. Sirotkin, G. Tesler, P. A. Pevzner, and M. A. Alekseyev, “Pathset graphs: a novel approach for comprehensive utilization of paired reads in genome assembly,” Journal of Computational Biology, vol. 20, no. 4, pp. 359–371, 2013.
  37. L. Ilie, S. Ilie, and A. M. Bigvand, “SpEED: fast computation of sensitive spaced seeds,” Bioinformatics, vol. 27, no. 17, pp. 2433–2434, 2011.
  38. A. Broder and M. Mitzenmacher, “Network applications of bloom filters: a survey,” Internet Mathematics, vol. 1, no. 4, pp. 485–509, 2004.
  39. R. L. Warren, B. P. Vandervalk, S. J. M. Jones, and I. Birol, “LINKS: scaffolding genome assemblies with kilobase-long nanopore reads,” bioRxiv, 2015.