Research Article

Identification and Quantification of Genomic Repeats and Sample Contamination in Assemblies of 454 Pyrosequencing Reads

Figure 1

Correlation between per-contig read depth and genomic copy number. (a) and (b): per-contig read depth frequency distributions for the 454 assemblies of E. coli K12 and P. gingivalis, respectively. (c) and (d): scatter plots showing the number of BLASTX hits against the reference genome versus the per-contig read depth for each contig in the two assemblies. Linear regression lines are shown. Inset: the result of the linear regression analysis. (e) and (f): comparison of the number of BLASTX hits against the reference genome with the predicted per-contig copy number based on the statistical approach described in the paper. For each contig, the predicted copy number is shown with its upper and lower confidence limits. The 1:1 line is shown in grey. Copy number estimates whose confidence intervals do not include the known copy number (the number of BLASTX hits) are shown in red. Inset: the result of the linear regression analysis.
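The kind of fit shown in panels (c) and (d) can be illustrated with a minimal sketch of an ordinary least squares regression of BLASTX hit count on per-contig read depth. This is not the authors' code, and it does not reproduce the statistical copy number prediction of panels (e) and (f). The function name `fit_line` and the data values are hypothetical, invented purely for illustration.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination for a simple linear fit.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical per-contig values (NOT data from the paper).
read_depth = [20.0, 40.0, 60.0, 80.0, 140.0]   # mean reads covering each contig
blastx_hits = [1.0, 2.0, 3.0, 4.0, 7.0]        # hits against the reference genome

slope, intercept, r_squared = fit_line(read_depth, blastx_hits)
```

With real assembly data the points would scatter around the fitted line, and the slope would depend on the sequencing coverage of the particular run.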