Figure 5: (a) Representative lines from the intermediate file produced by the Illumina deep sequencing analysis (for more information see ERROR_TAG_data0001.txt in Section 4 and our previous publication [6]). The reads have been parsed to identify adapters and barcodes. Each read has been tagged according to the library type, the direction of the read, and the quality of the adapter regions. We use this intermediate library to identify reads that harbor erroneous nucleotides. (b) Multiset view of the intermediate library. The library contains subsets of low-, medium-, and high-quality reads. Error filtering of this intermediate library to eliminate any read with a Phred score below 30 yields a high-quality library of reads. (c) Mean accuracy of the reads in the library after error filtering ranges from 95% to 99.6%. Even for a very low quality cutoff, Phred > 1, the mean read accuracy is 95%. (d) Distribution of cumulative read accuracy in libraries processed using different cutoffs. (e) Linear plot of the data presented in (d), zoomed in on the region with >90% cumulative accuracy.
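The relation between a Phred cutoff and read accuracy in panels (b) and (c) can be illustrated with a minimal Python sketch. It relies on the standard definition Q = -10 log10(p_error) and makes several assumptions that the caption does not confirm: quality strings use Phred+33 ASCII encoding, a read is discarded if any of its bases falls below the cutoff, and read accuracy is the mean of the per-base accuracies. All function names are hypothetical and are not taken from the authors' pipeline.

```python
def phred_to_accuracy(q):
    """Probability that a base call is correct, from Q = -10 * log10(p_error)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def decode_phred33(quality_string):
    """Convert a quality string to integer scores (assumes Phred+33 encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_cutoff(qualities, cutoff):
    """Assumed rule: keep a read only if every base scores at or above the cutoff."""
    return bool(qualities) and min(qualities) >= cutoff


def mean_read_accuracy(qualities):
    """Assumed metric: average of the per-base accuracies of one read."""
    return sum(phred_to_accuracy(q) for q in qualities) / len(qualities)


def filter_reads(reads, cutoff=30):
    """Return the (sequence, quality_string) pairs that pass the cutoff."""
    return [
        (seq, qual)
        for seq, qual in reads
        if passes_cutoff(decode_phred33(qual), cutoff)
    ]


# Made-up example reads: 'I' encodes Q40, '5' encodes Q20.
example_reads = [("ACGT", "IIII"), ("ACGT", "II5I")]
high_quality = filter_reads(example_reads, cutoff=30)
```

Here `high_quality` retains only the first read, because the second contains a Q20 base. Under these assumptions, a single base at the Q30 threshold has an accuracy of 99.9%, and a base at Q10 has an accuracy of 90%.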