Research Article

Genomic Analysis of a Serotype 5 Streptococcus pneumoniae Outbreak in British Columbia, Canada, 2005–2009

Table 1

Glossary of genomics and bioinformatics terms.

| Term | Definition |
| --- | --- |
| Read | Short fragment of DNA sequence output by a genome sequencer, commonly 50–250 bp in length. A raw read is one taken directly from the sequencer without any filtering. |
| Paired-end | A pair of reads sequenced from the two ends of the same DNA fragment, separated by an approximately known distance. |
| Depth | Number of reads mapped to a given position in the reference. |
| Reference-based assembly | Construction of a genome sequence by mapping (aligning) reads to a known reference sequence. |
| De novo assembly | Assembly of reads without a reference. |
| Adapter | Short synthetic nucleotide sequence that is part of the sequencing reaction and can be found at the ends of reads. |
| Insertion | Short region of DNA present in our samples but not in the reference sequence. |
| Contig | Contiguous region of DNA sequence generated by joining together overlapping sequence reads. |
| Genomic island | Region of the genome with a suspected horizontal origin, i.e., likely acquired from other bacteria of the same or a similar species. |
| Sequence composition | Proportion of each nucleotide base present in a sequence. |
| Codon usage bias | Unequal use of synonymous codons. In this context, regions whose codon usage or amino acid composition differs from that of the rest of the genome. |
| N50 | The largest contig length L such that contigs of length L or longer together contain at least half of the total length of all contigs. |
| Nonsynonymous | Nucleotide substitution that alters the amino acid sequence of the encoded protein. |
| Open reading frame (ORF) | Stretch of DNA, running from a start codon to an in-frame stop codon, that has the potential to code for a protein. |

bp: base pairs.
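The N50 entry in Table 1 is easiest to see as a procedure: sort contigs from longest to shortest, keep a running total of their lengths, and stop at the first contig where that total reaches half of the whole. The sketch below assumes contig lengths are already available as a list of integers (in bp); the function name `n50` is illustrative, not taken from any particular assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (bp).

    Contigs are visited from longest to shortest; the first length at which
    the running total reaches at least half of the total assembly length is
    the largest L such that contigs of length >= L hold half of all bases.
    """
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Example: total = 1500 bp; 500 + 400 = 900 bp is the first total >= 750 bp.
print(n50([100, 200, 300, 400, 500]))  # prints 400
```

Because the loop visits the longest contigs first, the value returned is the largest length satisfying the definition, which is why a few very long contigs raise N50 even if many short ones remain.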
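Depth, as defined in Table 1, is a per-position count of mapped reads. In practice it is computed from alignment files by dedicated tools; the sketch below only illustrates the counting. It assumes each mapped read has been reduced to a hypothetical `(start, end)` pair in 0-based, half-open reference coordinates, and uses a difference array so every read costs constant work regardless of its length.

```python
from typing import List, Tuple


def per_base_depth(alignments: List[Tuple[int, int]], reference_length: int) -> List[int]:
    """Count how many mapped reads cover each reference position.

    Each alignment is a (start, end) pair in 0-based, half-open coordinates,
    i.e. the read covers positions start .. end - 1.
    """
    # diff[p] records +1 where a read begins and -1 just past where it ends.
    diff = [0] * (reference_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= reference_length:
            raise ValueError(f"alignment ({start}, {end}) lies outside the reference")
        diff[start] += 1
        diff[end] -= 1

    # A running sum over the difference array gives the depth at each position.
    depth = []
    running = 0
    for pos in range(reference_length):
        running += diff[pos]
        depth.append(running)
    return depth


print(per_base_depth([(0, 5), (2, 8), (4, 6)], 10))
# prints [1, 1, 2, 2, 3, 2, 1, 1, 0, 0]
```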
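Sequence composition (Table 1) reduces to simple arithmetic: count each base and divide by the number of bases counted. The sketch below assumes a DNA string and, as an illustrative choice, ignores ambiguous symbols such as `N`; GC content, a common summary of composition, is then the sum of the G and C proportions. The function name `base_composition` is hypothetical.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Return the proportion of A, C, G and T in a DNA sequence.

    Characters other than A, C, G and T (e.g. N) are excluded from both the
    counts and the denominator.
    """
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


comp = base_composition("AACGTN")
print(comp)                     # prints {'A': 0.4, 'C': 0.2, 'G': 0.2, 'T': 0.2}
print(comp["G"] + comp["C"])    # GC content of the five counted bases (about 0.4)
```

Comparing such proportions in a sliding window against the genome-wide values is one way regions of atypical composition can be flagged.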
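Whether a substitution is nonsynonymous (Table 1) depends only on whether the reference and variant codons translate to the same amino acid. The sketch below assumes the standard genetic code; NCBI translation table 11, used for bacteria, has the same codon-to-amino-acid assignments and differs only in alternative start codons. The names `CODON_TABLE` and `classify_substitution` are illustrative.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each
# position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or nonsynonymous.

    A change to or from a stop codon counts as nonsynonymous here, since the
    encoded product changes (such changes are also called nonsense mutations).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu: prints synonymous
print(classify_substitution("GAA", "GTA"))  # Glu -> Val: prints nonsynonymous
```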
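An open reading frame (Table 1) can be located by scanning each of the three reading frames for a start codon followed by an in-frame stop codon. The sketch below is deliberately simplified: it assumes ATG as the only start codon, scans the forward strand only, and uses a hypothetical `min_codons` cut-off (default chosen arbitrarily) to discard short frames. Dedicated gene-prediction software handles the reverse strand, alternative start codons and much more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 100):
    """Find forward-strand ORFs (ATG ... stop) in all three reading frames.

    Returns sorted (start, end) pairs in 0-based, half-open coordinates; the
    interval includes the stop codon, and min_codons counts the stop codon too.
    Within a frame, the first ATG after the previous stop opens the ORF, so
    each ORF reported is the longest one ending at its stop codon.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        # pos runs over every complete codon in this frame.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # prints [(2, 14)]
```

In the example, the ORF spans ATG AAA TTT TAG in the third reading frame; open frames with no in-frame stop before the end of the sequence are not reported.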