Identification and Quantification of Genomic Repeats and Sample Contamination in Assemblies of 454 Pyrosequencing Reads
Correlation between per-contig read depth and genomic copy number. (a) and (b): per-contig read depth frequency distributions for the 454 assemblies of E. coli K12 and P. gingivalis, respectively. (c) and (d): scatter plots of the number of BLASTX hits against the reference genome versus the per-contig read depth, for each contig in the two assemblies. Linear regression lines are shown. Insets: results of the linear regression analysis. (e) and (f): comparison of the number of BLASTX hits against the reference genome with the per-contig copy number predicted by the statistical approach described in the paper. For each contig, the predicted copy number is shown with its upper and lower confidence limits. The 1:1 line is shown in grey. Copy number estimates whose confidence intervals do not include the known copy number (the number of BLASTX hits) are shown in red. Insets: results of the linear regression analysis.
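The two comparisons summarised in this caption can be illustrated with a minimal sketch. It is not the statistical approach described in the paper, which is not reproduced here. It assumes an ordinary least-squares fit for panels (c) and (d) and, for panels (e) and (f), a hypothetical estimator that divides a contig's read count by the count expected for a single-copy contig and attaches a normal-approximation Poisson interval. The names `fit_line`, `copy_number_interval`, `interval_misses`, and the parameter `single_copy_density` (reads per base on single-copy contigs) are illustrative assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 is undefined when y is constant; report 1.0 in that degenerate case.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


def copy_number_interval(n_reads, contig_len, single_copy_density, z=1.96):
    """Hypothetical copy-number estimate with an approximate interval.

    Assumption (not from the paper): the read count on a contig is Poisson
    distributed, so its standard deviation is about sqrt(n_reads).
    Returns (lower, estimate, upper).
    """
    expected_single_copy = contig_len * single_copy_density
    estimate = n_reads / expected_single_copy
    half_width = z * math.sqrt(n_reads) / expected_single_copy
    return max(0.0, estimate - half_width), estimate, estimate + half_width


def interval_misses(known_copy_number, lower, upper):
    """True when the interval excludes the known copy number
    (the contigs drawn in red in panels (e) and (f))."""
    return not (lower <= known_copy_number <= upper)
```

For example, `copy_number_interval(400, 1000, 0.2)` gives an estimate of 2.0 copies; a contig with two BLASTX hits would then fall inside the interval, while a contig with one hit would be flagged by `interval_misses`.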