Research Article

Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

Figure 4

(a) Testing the sampling operator on a large multiset made of 1000 unique elements with 1000 different copy numbers. The panels show the confidence interval of the sampling operator on linear and logarithmic scales; no solutions beyond this interval were observed in 5000 trials. The dotted line represents an overestimate of the 99.9% confidence interval (for details, see Figure S4); the most probable outcomes of the operator have either zero or one unique sequence beyond it. This line is used in subsequent sections (Figures 5 and 6). We note that the distributions of the copy numbers have a well-defined shape which, by the central limit theorem, is approximately normal. With enough replicates, it should therefore be possible to extrapolate the center of this distribution, define the solutions explicitly, and bypass the stochastic nature of the operator.
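The replicate-based test described in the caption can be sketched in Python. This is a minimal sketch, assuming the sampling operator draws a fixed number of sequences uniformly without replacement from the multiset; the helper `empirical_intervals` and its `sample_size` parameter are hypothetical, because the caption states neither the operator's exact definition nor the sample size. For each element the sketch records the mean copy number over replicate trials and the min/max envelope of the copy numbers actually observed; it does not reproduce the 99.9% bound derived in Figure S4.

```python
import random
from collections import Counter
from statistics import mean


def empirical_intervals(copy_numbers, sample_size, trials, seed=0):
    """Repeatedly sample a multiset and summarize each element's copy number.

    Hypothetical helper: assumes the sampling operator draws `sample_size`
    items uniformly without replacement. Element i occurs copy_numbers[i]
    times in the multiset. Returns, per element, a tuple
    (mean copy number, minimum observed, maximum observed) over all trials.
    """
    rng = random.Random(seed)
    # Expand the multiset into a flat population of element indices.
    population = [i for i, c in enumerate(copy_numbers) for _ in range(c)]
    if sample_size > len(population):
        raise ValueError("sample_size exceeds the size of the multiset")

    observed = [[] for _ in copy_numbers]
    for _ in range(trials):
        # One application of the (assumed) sampling operator.
        counts = Counter(rng.sample(population, sample_size))
        for i, obs in enumerate(observed):
            obs.append(counts.get(i, 0))

    # The mean estimates the center of each copy-number distribution;
    # min/max give the envelope of outcomes seen in these trials.
    return [(mean(obs), min(obs), max(obs)) for obs in observed]


if __name__ == "__main__":
    copies = list(range(1, 101))
    total = sum(copies)
    sample_size = total // 10
    summary = empirical_intervals(copies, sample_size, trials=200)
    for c, (m, lo, hi) in list(zip(copies, summary))[::25]:
        expected = c * sample_size / total
        print(f"copies={c:4d} mean={m:6.2f} expected={expected:6.2f} range=[{lo}, {hi}]")
```

Under this assumption, each element's copy number in a sample follows a hypergeometric distribution with mean `c * sample_size / N`, where `N` is the total size of the multiset, so the replicate means should converge toward that value; this is the "center" the caption proposes to extrapolate. The caption's own setting (copy numbers 1 to 1000, 5000 trials) runs slowly in pure Python, so the demo uses a smaller multiset.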