Research Article

HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly

Algorithm 1

Formal steps of the algorithm.
for whichHashFunc ← 0 to LastHashFunc do
  hashedKhmer ← Hasher(whichHashFunc, rawKmerString)
  index ← hashedKhmer % blockSize
  quotient ← hashedKhmer ÷ blockSize
  if no neighbor is found in memBlock[index] then
   put the quotient into memBlock[index]
   put whichHashFunc into memBlock[index]
   put nextNeucleotide into memBlock[index]
  else if neighbor(s) found in memBlock[index] and hashvalue matched then
   put nextNeucleotide into memBlock[index]
  else if whichHashFunc = LastHashFunc and neighbor(s) found in memBlock[index]
  and hashvalue does not match then
   firstLevelVectorIndex ← index % mapPointer5Byte.size
   create tempVect with tempVect.indexVal ← index
   add tempVect to mapPointer5Byte[firstLevelVectorIndex]
   create a tempkmerInfo and put nextNeucleotide in it
   isFound ← false
   if mapPointer5Byte[firstLevelVectorIndex][secondLevelVectorIndex].vect
   already has this kmer then
    update nextNeucleotide
    isFound ← true
   end if
   if isFound = false then
    put quotient, whichHashFunc in tempkmerInfo
    add the newly updated tempkmerInfo to
    mapPointer5Byte[firstLevelVectorIndex][secondLevelVectorIndex].vect
   end if
  else
   continue
  end if
end for
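
The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the class name HaVecSketch, the default sizes, the salted BLAKE2b stand-in for Hasher, the dictionary layout of each slot, the choice to stop at the first hash function that stores or updates the k-mer, the reading of "hashvalue matched" as equality of both the quotient and the hash function number, and the find-or-create handling of tempVect. The names mem_block and overflow correspond to memBlock and mapPointer5Byte in the listing.

```python
import hashlib


def default_hasher(which, kmer):
    """Hypothetical stand-in for Hasher: a salted BLAKE2b digest as an integer."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=bytes([which])
    ).digest()
    return int.from_bytes(digest, "big")


class HaVecSketch:
    """Illustrative hash table plus two-level overflow vector (assumed layout)."""

    def __init__(self, block_size=8, num_hash_funcs=3, first_level_size=4,
                 hasher=default_hasher):
        self.block_size = block_size
        self.num_hash_funcs = num_hash_funcs   # LastHashFunc = num_hash_funcs - 1
        self.hasher = hasher
        self.mem_block = [None] * block_size   # memBlock in the listing
        self.overflow = [[] for _ in range(first_level_size)]  # mapPointer5Byte

    def insert(self, kmer, next_nucleotide):
        last = self.num_hash_funcs - 1
        for which in range(self.num_hash_funcs):
            hashed = self.hasher(which, kmer)
            index = hashed % self.block_size      # slot in mem_block
            quotient = hashed // self.block_size  # stored instead of the k-mer
            slot = self.mem_block[index]
            if slot is None:
                # No neighbor at this index: store quotient, hash id, nucleotide.
                self.mem_block[index] = {
                    "quotient": quotient, "which": which,
                    "next": {next_nucleotide},
                }
                return "stored"
            if slot["quotient"] == quotient and slot["which"] == which:
                # Same k-mer seen before: only record the new next nucleotide.
                slot["next"].add(next_nucleotide)
                return "updated"
            if which == last:
                # Every hash function collided: fall back to the vector structure.
                self._insert_overflow(index, quotient, which, next_nucleotide)
                return "overflow"
            # Otherwise a different k-mer owns the slot: try the next hash function.

    def _insert_overflow(self, index, quotient, which, next_nucleotide):
        bucket = self.overflow[index % len(self.overflow)]  # first-level vector
        for entry in bucket:
            if entry["index_val"] == index:
                break
        else:
            # Assumption: a second-level entry is created only if none exists.
            entry = {"index_val": index, "vect": []}
            bucket.append(entry)
        for info in entry["vect"]:
            if info["quotient"] == quotient and info["which"] == which:
                info["next"].add(next_nucleotide)  # k-mer already present
                return
        entry["vect"].append(
            {"quotient": quotient, "which": which, "next": {next_nucleotide}}
        )
```

Because index and quotient together determine hashedKhmer (hashedKhmer = quotient × blockSize + index), storing only the quotient and the hash function number in a slot is enough to recognize a repeated hash value without keeping the k-mer string itself.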