Erratum

An erratum for this article has been published.

BioMed Research International
Volume 2014, Article ID 736473, 16 pages
http://dx.doi.org/10.1155/2014/736473
Research Article

A De Novo Genome Assembly Algorithm for Repeats and Nonrepeats

¹School of Information Science and Technology, Sun Yat-Sen University, Guangzhou High Education Mega City, No. 132 Waihuan East Road, Panyu District, Guangzhou 510006, China
²SYSU-CMU Shunde International Joint Research Institute, Shunde 528300, China

Received 19 December 2013; Revised 1 April 2014; Accepted 8 April 2014; Published 25 May 2014

Academic Editor: Li-Ching Wu

Copyright © 2014 Shuaibin Lian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background. Next-generation sequencing (NGS) platforms produce shorter reads, with deeper coverage and higher throughput, than Sanger sequencing. These short reads may need to be assembled de novo before specific genome analyses can be carried out. To date, however, current assemblers perform poorly on repeats.

Results. To address this problem, we propose a new genome assembly algorithm, SWA, with four properties: (1) it assembles both repeats and nonrepeats; (2) it adopts a new overlap-extension strategy to extend each seed; (3) it uses a sliding window to filter out sequencing bias; and (4) it provides a compensation mechanism for low-coverage datasets. SWA was evaluated and validated on both simulated and real sequencing datasets. Its accuracy reaches up to 99% for assembling repeats and up to 100% for estimating their copy numbers. Finally, extensive comparisons with eight other leading assemblers show that SWA outperforms them in the completeness and correctness of assembling both repeats and nonrepeats.

Conclusions. This paper proposes a new de novo genome assembly method for resolving complex repeats. SWA can not only locate repeats and nonrepeats but also assemble them completely from NGS data. Its strength in assembling repeats is its main advantage over other assemblers.
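The abstract names two ideas that are easy to illustrate in isolation: a sliding window that damps sequencing bias, and estimating how many copies of a repeat a genome contains. The abstract does not describe SWA's actual procedure, so the Python below is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It counts k-mers in the reads, traces those counts along an assembled sequence, smooths the resulting profile with a sliding-window median, and divides by an assumed single-copy depth. The function names, forward-strand-only counting, the choice of a median filter, and the use of the overall median as the single-copy baseline are all hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_profile(sequence, counts, k):
    """Read-derived count of each k-mer along `sequence`, one value per start position.
    k-mers never seen in the reads get 0 (Counter returns 0 for missing keys)."""
    return [counts[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


def smooth_profile(profile, window):
    """Sliding-window median (hypothetical bias filter): an isolated spike or dip,
    e.g. from biased or erroneous reads, is suppressed, while a sustained change
    in depth, such as a repeat boundary, is kept."""
    half = window // 2
    return [median(profile[max(0, i - half):i + half + 1]) for i in range(len(profile))]


def estimate_copy_numbers(profile, window, single_copy_depth=None):
    """Estimate per-position copy number as smoothed depth / single-copy depth."""
    if not profile:
        return []
    smoothed = smooth_profile(profile, window)
    if single_copy_depth is None:
        # Assumption: most positions are single-copy, so the median depth
        # is taken as the depth expected for one copy.
        single_copy_depth = median(smoothed)
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return [round(d / single_copy_depth) for d in smoothed]


# Usage: a 50-position profile with a twofold-covered stretch and one biased spike.
profile = [10] * 20 + [20] * 10 + [10] * 20
profile[5] = 50  # isolated spike, e.g. from a PCR-biased read
print(estimate_copy_numbers(profile, window=5))
```

A region present twice in the genome draws roughly twice as many reads, so its smoothed depth is about double the baseline and the sketch reports a copy number of 2 there. Using a median rather than a mean inside the window is what keeps the single spike at position 5 from being mistaken for a short repeat.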