Identification and Quantification of Genomic Repeats and Sample Contamination in Assemblies of 454 Pyrosequencing Reads
Identifying bacterial host genomic DNA in BAC assemblies. (a) and (b): Scatter plots showing, for each contig (minimum length 100 bp), the GC percentage and read depth (log scale) for the Salmon BAC 184H23 (a) and Salmon BAC 114L13 (b) assemblies. Contigs with low read depth fall to the left of the dotted line at 20x read depth. (c) and (d): MEGAN comparisons of the low and high read depth contigs from the Salmon BAC 184H23 (c) and Salmon BAC 114L13 (d) assemblies. Trees are collapsed at the family level. Contigs with high read depth, shown in red, cluster with the bony fishes (Clupeocephala), with a few hits classified as Schistosomatidae (flatworms). The number next to each taxon name is the number of hits assigned to that node and all nodes below it in the NCBI taxonomy. Circle sizes are scaled logarithmically to the number of hits. “Not assigned”: contigs that were not assigned to any branch of the tree, either because their bit score was too low (cutoff 30) or because they were the only contig assigned to a particular taxon.
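The per-contig values plotted in panels (a) and (b), and the split into low and high read depth groups used for panels (c) and (d), can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. It assumes that each contig's mean read depth is already available (for example, from the assembler's output), that GC percentage is computed over unambiguous bases only (A, C, G, T), and that a contig at exactly 20x counts as high depth. The caption does not state how ties at the cutoff are handled. The names `Contig`, `gc_percent`, and `split_by_depth` are hypothetical.

```python
from dataclasses import dataclass

MIN_CONTIG_LENGTH = 100  # bp; minimum contig length plotted in panels (a) and (b)
DEPTH_THRESHOLD = 20.0   # read depth (x) at the dotted line


@dataclass
class Contig:
    name: str
    sequence: str
    read_depth: float  # mean read depth; assumed to be supplied by the assembler


def gc_percent(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def split_by_depth(contigs, threshold=DEPTH_THRESHOLD, min_length=MIN_CONTIG_LENGTH):
    """Return (low, high) lists of (name, gc_percent, read_depth) for contigs >= min_length bp.

    Assumption: depth < threshold is 'low'; depth >= threshold is 'high'.
    """
    low, high = [], []
    for c in contigs:
        if len(c.sequence) < min_length:
            continue  # too short to be plotted
        point = (c.name, gc_percent(c.sequence), c.read_depth)
        (low if c.read_depth < threshold else high).append(point)
    return low, high


if __name__ == "__main__":
    toy = [
        Contig("c1", "GC" * 60, 150.0),    # 120 bp, high depth
        Contig("c2", "AT" * 60, 5.0),      # 120 bp, low depth
        Contig("c3", "ACGT" * 10, 300.0),  # 40 bp, below the length cutoff
    ]
    low, high = split_by_depth(toy)
    print("low:", low)
    print("high:", high)
```

On the toy input, `c2` falls in the low-depth group, `c1` in the high-depth group, and `c3` is dropped for being shorter than 100 bp. In a BAC assembly, the low-depth group would be the set of candidate host (bacterial) contigs passed to the taxonomic comparison. Plotting read depth on a log axis, as in panels (a) and (b), is a display choice and does not affect the split.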