Research Article

A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats

Figure 2

Some key steps of SWA. (a) Raw read processing. Input reads containing any “ ” or a low-quality region are discarded, and the remaining reads are sorted in alphabetical order. (b) Graphical illustration of the unique process. The five differently colored lines represent the five unique reads in the preprocessed raw reads, each of which appears more than twice. Through unique processing, identical reads are collapsed into one unique read with its corresponding frequency. (c) Seed selection. The unique reads are ranked by read count (from high to low). Unique reads with a read count larger than a threshold value are selected as seeds for repeat extension (the red dotted frame), while unique reads with a read count smaller than a threshold value are selected as seeds for nonrepeat extension (the blue dotted frame). (d) Graphical example of extending repeats using the sliding window function in the dynamic overlapping interval. The dotted box represents the dynamic overlapping interval. After reads are overlapped with the seed in this interval, the overlapped read counts are recorded, and the sliding window function is then applied continuously to filter out the read bias in this interval, as shown in C1. C2 shows the corresponding results after filtering by the sliding window function; the mean value of this interval is then recorded in a variable to detect the boundary of repeats (Figure 4) and nonrepeats (Figure 5). In this extension, SWA regards one read as the optimal extendable read. The extension of nonrepeats is performed in a similar way. The detailed extension and boundary detection are shown graphically in Figures 4 and 5.
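The steps in panels (a) to (d) can be illustrated with a minimal Python sketch. It is not the SWA implementation: the function names, the `ambiguous` symbol, the two cutoffs `repeat_min` and `nonrepeat_max`, and the use of a plain moving average as the sliding window filter are all assumptions made for illustration. The low-quality filter is omitted because the caption does not state its criterion.

```python
from collections import Counter


def preprocess(reads, ambiguous="N"):
    """Panel (a): drop reads containing the ambiguous symbol, then sort.

    The symbol "N" is an assumption; the caption does not name it.
    """
    return sorted(r for r in reads if ambiguous not in r)


def collapse(sorted_reads):
    """Panel (b): collapse identical reads into read -> frequency."""
    return Counter(sorted_reads)


def select_seeds(counts, repeat_min, nonrepeat_max):
    """Panel (c): rank unique reads by count and split them into seed sets.

    repeat_min and nonrepeat_max are hypothetical cutoffs; the caption
    does not say whether one threshold or two are used.
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    repeat_seeds = [r for r, c in ranked if c > repeat_min]
    nonrepeat_seeds = [r for r, c in ranked if c < nonrepeat_max]
    return repeat_seeds, nonrepeat_seeds


def sliding_window_mean(values, width):
    """Panel (d): smooth overlapped read counts with a moving average.

    A moving average is assumed here as a stand-in for the paper's
    sliding window function.
    """
    if width <= 0 or width > len(values):
        raise ValueError("width must be between 1 and len(values)")
    window_sum = sum(values[:width])
    smoothed = [window_sum / width]
    for i in range(width, len(values)):
        # Slide the window one position: add the new value, drop the oldest.
        window_sum += values[i] - values[i - width]
        smoothed.append(window_sum / width)
    return smoothed
```

With these definitions, the mean of the smoothed counts for one interval (`sum(smoothed) / len(smoothed)`) would play the role of the recorded value that the caption says is used for boundary detection.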