Journal of Nucleic Acids

Research Article

Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

Table 1

Two genome positions analyzed by pipeline A (GNUPMap/Soap), shown as examples of the pipeline output. The SNP file has 15 columns: (1) reference genotype; (2) consensus genotype; (3) quality score of consensus genotype; (4) best base; (5) average quality score of best base; (6) count of uniquely mapped best base; (7) count of all mapped best base; (8) second best base; (9) average quality score of second best base; (10) count of uniquely mapped second best base; (11) count of all mapped second best base; (12) sequencing depth of the site; (13) rank sum test value; (14) average copy number of nearby region; and (15) whether the site is a dbSNP (1, yes; 0, new SNP). The quality score in field 3 is the posterior probability (PP) of the consensus genotype, which ranges from 0 to 1; the reported score is PP × 100, so it ranges from 0 to 100. The quality scores in fields 5 and 9 are the average quality scores of the best and second best base, respectively. They correspond to the Illumina quality scores from the original QSEQ or FASTQ files and range from 0 to 40.


1	2	3	4	5	6	7	8	9	10	11	12	13	14	15

A	T	36*	T	37	4	4	A	36	3	3	7	1	1	0
A	G	99	G	38	6	6	A	0	0	0	6	1	1	0

*This example SNP is called with a posterior probability of 36%. The reference genotype is A and the consensus is T. The best base identified is T with a Q-score of 37, while the second best base is the same as the reference, with a Q-score of 36. The total depth at the site is seven, with four reads supporting T and three reads supporting A. This SNP should be rejected by setting an appropriate posterior probability cutoff and should not be used to determine diversity or phylogenetic distance relationships.
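
The filtering step described in the footnote can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline's own code: the field names, the `parse_snp_row` and `passes_cutoff` helpers, and the cutoff of 90 are all assumptions made for the example, since the table does not state which cutoff was used. The two rows are copied from the table above.

```python
from typing import NamedTuple


class SnpSite(NamedTuple):
    """One row of the 15-column SNP file (field names are illustrative)."""
    ref_genotype: str          # (1)
    consensus_genotype: str    # (2)
    consensus_quality: float   # (3) posterior probability x 100, range 0-100
    best_base: str             # (4)
    best_avg_quality: float    # (5) Illumina quality score, range 0-40
    best_unique_count: int     # (6)
    best_all_count: int        # (7)
    second_base: str           # (8)
    second_avg_quality: float  # (9) Illumina quality score, range 0-40
    second_unique_count: int   # (10)
    second_all_count: int      # (11)
    depth: int                 # (12)
    rank_sum: float            # (13)
    copy_number: float         # (14)
    is_dbsnp: int              # (15) 1 = dbSNP, 0 = new SNP


# Type converters for columns 1-15, in order.
_CONVERTERS = (str, str, float, str, float, int, int,
               str, float, int, int, int, float, float, int)

# Hypothetical cutoff on field 3; the source does not give a value.
MIN_CONSENSUS_QUALITY = 90.0


def parse_snp_row(line: str) -> SnpSite:
    """Parse one tab-separated row into a SnpSite."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 15:
        raise ValueError(f"expected 15 columns, got {len(fields)}")
    # The printed table attaches a footnote marker to field 3; drop it.
    fields[2] = fields[2].rstrip("*")
    return SnpSite(*(conv(value) for conv, value in zip(_CONVERTERS, fields)))


def passes_cutoff(site: SnpSite,
                  min_quality: float = MIN_CONSENSUS_QUALITY) -> bool:
    """Keep a site only if its consensus quality reaches the cutoff."""
    return site.consensus_quality >= min_quality


# The two example rows from Table 1.
ROWS = [
    "A\tT\t36*\tT\t37\t4\t4\tA\t36\t3\t3\t7\t1\t1\t0",
    "A\tG\t99\tG\t38\t6\t6\tA\t0\t0\t0\t6\t1\t1\t0",
]

sites = [parse_snp_row(row) for row in ROWS]
kept = [site for site in sites if passes_cutoff(site)]

for site in sites:
    status = "keep" if passes_cutoff(site) else "reject"
    print(site.ref_genotype, site.consensus_genotype,
          site.consensus_quality, site.depth, status)
```

With this assumed cutoff, the first row (score 36, reads split four to three) is rejected and the second row (score 99, all six reads supporting G) is retained, which matches the reasoning in the footnote.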