Research Article

A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats

Figure 3

Schematic of the hash-index strategy for short sequences. Sequence reads are shown with their read identifiers. The regions used for seed selection are in capital letters, and the matched seeds are two bases long, ranging from AA to TT. Read identifiers are associated with the seeds through a hash function (e.g., a unique integer representation of each seed). Once such a hash table has been built for the unique reads, the corresponding data can be scanned with the same hash function. This yields a much smaller subset of reads in which to search more precisely for extendible reads.
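The indexing idea in Figure 3 can be illustrated with a small sketch. It makes several assumptions of its own: the seed region of a read is the run of uppercase bases in its string, each two-base seed is hashed to a unique integer using 2 bits per base (so AA maps to 0 and TT to 15), and duplicate reads are collapsed before indexing. The function names (`seed_hash`, `build_seed_index`, `candidate_reads`) are hypothetical and do not come from the article. The candidate set returned here would still need the exact extension check described in the caption.

```python
from collections import defaultdict

# Hypothetical 2-bit code per base; gives every k-base seed a unique integer.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def seed_hash(seed):
    """Map a seed such as 'AC' to a unique integer (AA -> 0, ..., TT -> 15 for k=2)."""
    value = 0
    for base in seed:
        value = (value << 2) | BASE_CODE[base]
    return value


def seed_windows(seq, k):
    """Yield k-base windows that lie entirely in the uppercase (seed-selection) region."""
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window.isupper():
            yield window


def build_seed_index(reads, k=2):
    """Build a hash table from seed integer to the set of read identifiers.

    `reads` is an iterable of (read_id, sequence) pairs. Only the first copy of
    each distinct sequence is indexed, so the table covers unique reads.
    """
    index = defaultdict(set)
    seen = set()
    for read_id, seq in reads:
        key = seq.upper()
        if key in seen:
            continue  # duplicate read: skip so the index holds unique reads only
        seen.add(key)
        for seed in seed_windows(seq, k):
            index[seed_hash(seed)].add(read_id)
    return index


def candidate_reads(index, query, k=2):
    """Scan a query read with the same hash function and collect the read IDs
    sharing at least one seed. The result is the reduced subset in which
    extendible reads would then be searched exactly."""
    hits = set()
    for seed in seed_windows(query, k):
        hits |= index.get(seed_hash(seed), set())
    return hits


# Usage on toy reads (illustrative data, not from the article).
reads = [
    ("r1", "acgtAC"),
    ("r2", "ACgtta"),
    ("r3", "ggccTT"),
    ("r4", "acgtAC"),  # same sequence as r1, so it is not indexed
]
index = build_seed_index(reads)
print(sorted(candidate_reads(index, "ttACgg")))  # ['r1', 'r2']
```

Restricting seeds to the marked region and to unique reads keeps the table small. The lookup only narrows the search space: reads sharing a seed with the query are candidates, not confirmed overlaps.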