Computational and Mathematical Methods in Medicine

Research Article

Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

Figure 3

(a) Testing the sampling operator, implemented as a random indexing function, on a model multiset. (b) In 100,000 trials, we observed 22 unique solutions, of which 14 resided in the 95% confidence interval. Solutions with 0 and 1 copies of element A were found at equal abundances (“redundant solutions”). (c) Representation of the most probable solution as a line with 4 nodes; “” is the probability of finding the solution; the dotted line is the expected “average solution” for 50% sampling. (d) The 5th most probable solution; (e) the least probable solution, which deviates the most from the average; (f) combination of all solutions. Thick red lines describe the most probable solutions; thin blue lines describe the least probable solutions. (g) Sampling of larger multisets yields more possible solutions (here, 2957 in 5000 trials). (h) All solutions of the sampling represented as lines. (i) Probability of observing a particular copy number after sampling. Although (h) is the most accurate representation of the confidence interval, with thin blue lines describing solutions outside it, this representation is impractical because of the large number of redundant solutions in larger multisets. In (i), the confidence interval could be extrapolated from the distributions of individual copy numbers (e): red dots are on or outside the confidence interval.
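
The test described in panels (a) and (b) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four-element multiset `model`, the function names, the trial count, and the definition of the 95% confidence set as the smallest group of most frequent solutions covering 95% of trials are all assumptions made for illustration. Sampling is done by drawing random indices without replacement from the expanded multiset.

```python
import random
from collections import Counter

def sample_multiset(multiset, fraction, rng):
    """Sample a fraction of a multiset by picking random indices without replacement."""
    items = list(multiset.elements())          # expand copies: A, B, B, C, ...
    k = round(fraction * len(items))           # number of elements to keep
    picked = rng.sample(range(len(items)), k)  # random indexing
    return Counter(items[i] for i in picked)

def solution_frequencies(multiset, fraction, trials, seed=0):
    """Tally how often each sampled copy-number vector ("solution") occurs."""
    rng = random.Random(seed)
    keys = sorted(multiset)
    tally = Counter()
    for _ in range(trials):
        sampled = sample_multiset(multiset, fraction, rng)
        tally[tuple(sampled[k] for k in keys)] += 1
    return keys, tally

def confidence_set(tally, level=0.95):
    """Smallest set of most frequent solutions covering `level` of all trials (assumed definition)."""
    total = sum(tally.values())
    covered, chosen = 0, []
    for solution, count in tally.most_common():
        if covered >= level * total:
            break
        chosen.append(solution)
        covered += count
    return chosen

# Hypothetical model multiset; the one used in the figure is not given in the caption.
model = Counter({"A": 1, "B": 2, "C": 4, "D": 9})
keys, tally = solution_frequencies(model, fraction=0.5, trials=10_000)
inside = confidence_set(tally)
print(len(tally), "unique solutions;", len(inside), "inside the 95% set")
```

Each key of `tally` is one solution, that is, the copy numbers of A, B, C, and D after sampling, which corresponds to one line with 4 nodes in panels (c) through (f). Under 50% sampling, a single-copy element such as A is retained with probability 0.5, which is consistent with the equal abundance of solutions with 0 and 1 copies of A noted in (b).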