International Journal of Genomics

Research Article

HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly

Formal steps of the algorithm.

for whichHashFunc ← 0 to LastHashFunc do
    hashedKhmer ← Hasher(whichHashFunc, rawKmerString)
    index ← hashedKhmer % blockSize
    quotient ← hashedKhmer ÷ blockSize
    if no neighbor is found in memBlock[index] then
        put the quotient into memBlock[index]
        put whichHashFunc into memBlock[index]
        put nextNeucleotide into memBlock[index]
    else if neighbor(s) found in memBlock[index] and hashvalue matched then
        put nextNeucleotide into memBlock[index]
    else if whichHashFunc = LastHashFunc and neighbor(s) found in memBlock[index] and hashvalue does not match then
        firstLevelVectorIndex ← index % mapPointer5Byte.size
        create tempVect with tempVect.indexVal ← index
        add tempVect to mapPointer5Byte[firstLevelVectorIndex]
        create a tempkmerInfo and put nextNeucleotide in it
        isFound ← false
        if mapPointer5Byte[firstLevelVectorIndex][secondLevelVectorIndex].vect already has this kmer then
            update nextNeucleotide
            isFound ← true
        end if
        if isFound = false then
            put quotient, whichHashFunc in tempkmerInfo
            add the newly updated tempkmerInfo to mapPointer5Byte[firstLevelVectorIndex][secondLevelVectorIndex].vect
        end if
    else
        continue
    end if
end for
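
The steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the listing does not state: `hasher` is a hypothetical stand-in for Hasher; "hashvalue matched" is taken to mean that both the stored quotient and the stored hash-function number agree; the loop stops as soon as the k-mer has been placed or updated; next nucleotides are kept as a 4-bit mask; and the two-level structure mapPointer5Byte is simplified to a list of dictionaries keyed by the table index. The names `Slot`, `HashVectorSketch`, and `insert` are illustrative only.

```python
from dataclasses import dataclass

# Assumed encoding: one bit per possible next nucleotide.
NUCLEOTIDE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}


@dataclass
class Slot:
    quotient: int             # hashedKhmer // blockSize
    hash_func: int            # which hash function produced the entry
    next_nucleotides: int = 0 # bitmask of observed next nucleotides


def hasher(which: int, kmer: str) -> int:
    """Hypothetical stand-in for Hasher: a seeded polynomial hash (40 bits)."""
    h = which + 1
    for ch in kmer:
        h = (h * 131 + ord(ch)) & 0xFFFFFFFFFF
    return h


class HashVectorSketch:
    def __init__(self, block_size: int, num_hash_funcs: int, overflow_size: int):
        self.block_size = block_size
        self.num_hash_funcs = num_hash_funcs          # LastHashFunc + 1
        self.mem_block = [None] * block_size          # memBlock
        # Simplified mapPointer5Byte: first level is a list, second level
        # maps a table index to the vector of entries that collided there.
        self.overflow = [dict() for _ in range(overflow_size)]

    def insert(self, kmer: str, next_nucleotide: str) -> str:
        """Store (kmer, next_nucleotide); return where it ended up."""
        bit = NUCLEOTIDE_BITS[next_nucleotide]
        for which in range(self.num_hash_funcs):
            hashed = hasher(which, kmer)
            index = hashed % self.block_size
            quotient = hashed // self.block_size
            slot = self.mem_block[index]
            if slot is None:
                # Empty slot: store quotient, hash function, next nucleotide.
                self.mem_block[index] = Slot(quotient, which, bit)
                return "table"
            if slot.quotient == quotient and slot.hash_func == which:
                # Same k-mer seen before: only record the new nucleotide.
                slot.next_nucleotides |= bit
                return "table"
            if which == self.num_hash_funcs - 1:
                # Last hash function also collided: fall back to the vectors.
                bucket = self.overflow[index % len(self.overflow)]
                entries = bucket.setdefault(index, [])
                for entry in entries:
                    if entry.quotient == quotient and entry.hash_func == which:
                        entry.next_nucleotides |= bit
                        return "vector"
                entries.append(Slot(quotient, which, bit))
                return "vector"
            # Otherwise: collision, try the next hash function.
        raise AssertionError("unreachable when num_hash_funcs >= 1")
```

With `block_size = 1` every k-mer maps to the same slot, which makes it easy to exercise the collision path: the first distinct k-mer occupies the table, and any other k-mer is pushed to the vector structure once the last hash function has also collided.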