Research Article

Challenges of the Unknown: Clinical Application of Microbial Metagenomics

Table 3

Sequence output and data storage for the two datasets. The number of sequences surviving each of the common preprocessing stages is shown. Classified sequences are based on the targeted-then-assembly approach for the viral dataset and on the k-mer-based approach for the nonhuman model dataset. Percentages are based on the expected number of paired-end (PE) sequences generated by the sequencing chemistry kit used for each dataset. Storage (in GB) covers all FASTQ and intermediate files, including BAM and BED format files, generated throughout the analysis.

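| Sample | Dataset 1 (viral panel): reads within set | Dataset 1: % | Dataset 1: data (GB) | Dataset 2 (nonhuman model): reads within set | Dataset 2: % | Dataset 2: data (GB) |
|---|---|---|---|---|---|---|
| Predicted reads | 15,000,000 | | | 25,000,000 | | |
| Sequenced reads | 13,537,917 | 90.3 | 9.1 | 12,734,165 | 50.9 | 13.6 |
| Preprocessing: trimming | 12,223,513 | 81.5 | 15.8 | 11,520,499 | 46.1 | 24.5 |
| Preprocessing: host screen | 11,265,758 | 75.1 | | 11,517,217 | 46.1 | |
| Classified sequences | 8,006,562 | 53.4 | 7.3 | 2,788,450 | 11.2 | 5.5 |
| Total storage | | | 32.2 | | | 43.6 |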
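
The percentages and storage totals in Table 3 follow from simple arithmetic: each percentage is the read count at a stage divided by the predicted read count for that dataset, and total storage is the sum of the per-stage storage figures. The sketch below reproduces that arithmetic from the values in the table. It is illustrative only: the function and variable names (`percent_of_predicted`, `DATASETS`, and so on) are hypothetical and not taken from the authors' pipeline, and rounding to one decimal place is an assumption based on how the table is reported.

```python
# Illustrative recalculation of the Table 3 figures.
# All names here are hypothetical; only the numbers come from the table.

DATASETS = {
    "viral_panel": {
        "predicted": 15_000_000,
        "reads": {
            "sequenced": 13_537_917,
            "trimming": 12_223_513,
            "host_screen": 11_265_758,
            "classified": 8_006_562,
        },
        # Storage in GB; the host screen stage has no storage figure in the table.
        "storage_gb": {"sequenced": 9.1, "trimming": 15.8, "classified": 7.3},
    },
    "nonhuman_model": {
        "predicted": 25_000_000,
        "reads": {
            "sequenced": 12_734_165,
            "trimming": 11_520_499,
            "host_screen": 11_517_217,
            "classified": 2_788_450,
        },
        "storage_gb": {"sequenced": 13.6, "trimming": 24.5, "classified": 5.5},
    },
}


def percent_of_predicted(reads: int, predicted: int) -> float:
    """Return reads as a percentage of the predicted read count (1 d.p.)."""
    return round(100.0 * reads / predicted, 1)


def stage_percentages(dataset: dict) -> dict:
    """Map each stage name to its percentage of the predicted reads."""
    predicted = dataset["predicted"]
    return {
        stage: percent_of_predicted(count, predicted)
        for stage, count in dataset["reads"].items()
    }


def total_storage_gb(dataset: dict) -> float:
    """Sum the per-stage storage figures (1 d.p. to avoid float noise)."""
    return round(sum(dataset["storage_gb"].values()), 1)


if __name__ == "__main__":
    for name, dataset in DATASETS.items():
        print(name, stage_percentages(dataset), total_storage_gb(dataset))
```

Running the script prints the per-stage percentages and the storage total for each dataset, which can be compared directly against the corresponding rows of the table.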