International Journal of Genomics
Volume 2017 (2017), Article ID 6120980, 12 pages
https://doi.org/10.1155/2017/6120980
Research Article

HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly

Department of CSE, BUET, ECE Building West Palasi, Dhaka 1205, Bangladesh

Correspondence should be addressed to M. Sohel Rahman

Received 9 April 2017; Revised 19 July 2017; Accepted 26 July 2017; Published 27 August 2017

Academic Editor: Brian Wigdahl

Copyright © 2017 Md Mahfuzer Rahman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background. The rapid advancement of sequencing technologies has made it possible to routinely produce millions of high-quality reads from DNA samples in sequencing laboratories. The de Bruijn graph is a popular data structure in the genome assembly literature for representing and processing these reads efficiently. Because a de Bruijn graph contains a very large number of nodes, the main barriers are memory consumption and running time, and the problem has therefore received significant attention in the contemporary literature. Results. In this paper, we present an approach called HaVec that attempts to balance memory consumption against running time. HaVec stores the de Bruijn graph in a hash table together with an auxiliary vector data structure, thereby improving both total memory usage and running time. A critical and noteworthy feature of HaVec is that it produces no false positive errors. Conclusions. Graph construction generally takes the major share of the time in an assembly process, and HaVec can be seen as a significant advancement in this respect. We anticipate that HaVec will be extremely useful in de Bruijn graph-based genome assembly.
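To make the idea of pairing a hash table with an auxiliary vector concrete, the following is a minimal illustrative sketch in Python. It is not the HaVec implementation: the class name `HashVectorGraph`, the bucket layout, the 4-bit successor mask, and the `table_size` parameter are all assumptions introduced here for illustration. The sketch only shows why keeping the exact k-mer in a per-slot vector lets membership queries avoid false positives, which probabilistic structures such as Bloom filters cannot guarantee.

```python
BASES = "ACGT"  # assumption: reads contain only these four symbols


class HashVectorGraph:
    """Hypothetical de Bruijn graph store: hash table slots, each holding a vector."""

    def __init__(self, k, table_size=1024):
        self.k = k
        # Each slot holds a small vector (list) of [kmer, successor_mask] entries.
        self.table = [[] for _ in range(table_size)]

    def _bucket(self, kmer):
        return self.table[hash(kmer) % len(self.table)]

    def add_kmer(self, kmer, next_base=None):
        bucket = self._bucket(kmer)
        for entry in bucket:
            if entry[0] == kmer:
                break
        else:
            # k-mer not seen before: append a new entry to this slot's vector.
            entry = [kmer, 0]
            bucket.append(entry)
        if next_base is not None:
            # Record the outgoing edge as one bit per possible next base.
            entry[1] |= 1 << BASES.index(next_base)

    def contains(self, kmer):
        # The stored k-mer is compared exactly, so a hash collision alone
        # can never produce a false positive.
        return any(stored == kmer for stored, _ in self._bucket(kmer))

    def successors(self, kmer):
        for stored, mask in self._bucket(kmer):
            if stored == kmer:
                return [kmer[1:] + b for i, b in enumerate(BASES) if mask >> i & 1]
        return []


def build_graph(reads, k, table_size=1024):
    """Insert every k-mer of every read, with the base that follows it (if any)."""
    graph = HashVectorGraph(k, table_size)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            next_base = read[i + k] if i + k < len(read) else None
            graph.add_kmer(kmer, next_base)
    return graph
```

For example, `build_graph(["ACGTAC"], 3)` stores the k-mers ACG, CGT, GTA, and TAC, and `successors("ACG")` follows the single recorded edge to CGT. Reverse complements, sequencing errors, and compact bit-level encodings are deliberately left out of this sketch.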