Genomic Analysis of a Serotype 5 Streptococcus pneumoniae Outbreak in British Columbia, Canada, 2005–2009
Table 1
Glossary of genomics and bioinformatics terms.
Term
Definition
Read
Short fragment of DNA sequence output by a genome sequencer, commonly 50–250 bp in length. A raw read is a read taken directly from the sequencer without any filtering.
Paired-end
Pairs of reads from the two ends of the same region of DNA, a standard distance apart.
Depth
Number of reads mapped to a given position in the reference.
Reference-based assembly
Construction of a genome sequence by mapping/aligning reads to a known reference sequence.
De novo assembly
Assembly of reads without a reference.
Adapter
Short nucleotide sequences found at the ends of reads; they are part of the sequencing reaction rather than the sample DNA.
Insertion
Short regions of DNA present in our samples but not in the reference sequence.
Contig
Contiguous regions of DNA generated by joining together raw sequence reads.
Genomic island
Regions of the genome with suspected horizontal origins, in that they were likely acquired from other bacteria of the same or similar species.
Sequence composition
Proportion of each nucleotide base present.
Codon usage bias
Unequal use of synonymous codons; regions with atypical codon usage have a different sequence composition or amino acid composition compared to the rest of the genome.
N50
Contig length such that the contigs of that length or longer together contain at least half of the total length of all contigs.
Nonsynonymous
Nucleotide substitution that alters the amino acid sequence.
Open reading frame (ORF)
The part of a gene that has the potential to code for a protein.
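Two of the glossary entries are easier to follow as a short calculation. The sketch below is illustrative only and not taken from the study's pipeline: the function names `n50` and `base_composition` and the example values are hypothetical. It computes an N50 from a list of contig lengths and the sequence composition of a DNA string, following the definitions in the table.

```python
from collections import Counter


def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper).

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def base_composition(sequence):
    """Return the proportion of A, C, G and T in a DNA sequence.

    Proportions are taken over the full sequence length, so any other
    symbols (for example N) lower the sum below 1.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


# Contig lengths (bp) and sequence invented purely for illustration.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))        # total is 290; 80 + 70 = 150 >= 145, so 70
print(base_composition("ATGCGC"))
```

With these made-up lengths, the two longest contigs already cover more than half of the 290 bp total, so the N50 is 70 even though most contigs are shorter.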