Research Article

Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

Figure 6

(a) Operator and multiset description of the error-filtering procedure. Applying a Phred > 30 cutoff to a library filtered by a Phred > 1 cutoff yields a subpopulation of the library. If errors are sequence-independent, this process should be identical to random sampling. Any sequence-specific bias should be detected as a deviation from random sampling. (b) Progressive sampling with more stringent cutoffs. (c) Theoretical result of random sampling and its theoretical 99.9% confidence interval (blue). (d) Observation of statistically significant deviation from the random sampling operator: dots beyond the blue line represent sequences prone to bias. Red dots represent sequences that disappeared in the filtering process or during sampling. (e) The magnitude of the bias ranges from 5- to 100-fold. (f, g) Bias in sampling of Phred > 30 data from Phred > 1 data: (f) is theory, (g) is observed. (h) Bias upon sampling of Phred > 30 data from Phred > 13 data. Many sequences were lost in this sampling, and this loss was statistically significant beyond the 99.9% confidence interval. This result shows that some sequences have a propensity to harbor low- and medium-quality reads; the distribution of errors is sequence-specific.
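The comparison described in the caption can be illustrated with a minimal sketch. It assumes that, under sequence-independent errors, the copy number of each sequence after the stricter Phred cutoff behaves approximately like a binomial draw from its copy number before the cutoff, with a retention probability equal to the overall fraction of reads that survive. This binomial approximation, the function names (`binom_pmf`, `binom_interval`, `flag_biased`), and the toy counts are assumptions made for illustration; they are not the article's operator formalism or its data.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def binom_interval(n, p, level=0.999):
    """Central interval [lo, hi] that covers at least `level` of Binomial(n, p)."""
    tail = (1.0 - level) / 2.0
    cdf = 0.0
    lo, hi = 0, n
    lo_set = False
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if not lo_set and cdf > tail:
            lo, lo_set = k, True      # smallest k with P(X <= k) > tail
        if cdf >= 1.0 - tail:
            hi = k                    # smallest k with P(X <= k) >= 1 - tail
            break
    return lo, hi

def flag_biased(before, after, level=0.999):
    """Return sequences whose filtered copy number falls outside the interval
    expected from random (sequence-independent) sampling.

    before, after: dicts mapping sequence -> copy number under the lenient
    and the strict quality cutoff, respectively (hypothetical input format).
    """
    total_before = sum(before.values())
    total_after = sum(after.get(seq, 0) for seq in before)
    p = total_after / total_before    # overall retention fraction
    flagged = {}
    for seq, n in before.items():
        observed = after.get(seq, 0)  # 0 means the sequence disappeared
        lo, hi = binom_interval(n, p, level)
        if observed < lo or observed > hi:
            flagged[seq] = (observed, lo, hi)
    return flagged

# Toy counts (invented for illustration, not taken from the article).
before = {f"PEPTIDE_{i}": 1000 for i in range(9)}
after = {f"PEPTIDE_{i}": 595 + 2 * i for i in range(9)}
before["BIASED"] = 200
after["BIASED"] = 20                  # retained far less often than the rest

flagged = flag_biased(before, after)
for seq, (observed, lo, hi) in flagged.items():
    print(seq, "observed:", observed, "expected range:", (lo, hi))
```

Sequences returned by `flag_biased` play the role of the dots beyond the blue line in panels (d), (g), and (h): their loss or enrichment after filtering is larger than random sampling alone would explain at the chosen confidence level. A strongly biased, abundant sequence also shifts the pooled retention fraction, so a more careful analysis might estimate that fraction robustly or use the exact hypergeometric distribution.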