Review Article

Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process

Table 2

Tools for identifying variation from a reference genome using NGS reads.

Each entry gives the tool name, reference, URL, and comments.

GATK [2]: http://www.broadinstitute.org/gatk/
(i) Arguably the most established genome analysis toolkit
(ii) Includes tools such as UnifiedGenotyper (SNP/genotype caller), VariantFiltration (for filtering SNPs), and VariantRecalibrator (for recalibrating SNP quality scores)
(iii) Well documented, with support forums
(iv) Input: SAM format
(v) Output: VCF format

QCALL [79]: ftp://ftp.sanger.ac.uk/pub/rd/QCALL
(i) Theoretically calls “high quality” SNPs even from low-coverage sequencing data
(ii) Makes use of linkage disequilibrium information

PyroBayes [80]: http://bioinformatics.bc.edu/marthlab/wiki/index.php/PyroBayes
(i) Theoretically makes “confident” base calls even at shallow read coverage for reads produced by pyrosequencing machines

SAMTools [27]: http://samtools.sourceforge.net/
(i) Computes genotype likelihoods
(ii) BCFtools calls SNPs and genotypes
(iii) Successfully used in many WGS and WES projects, such as the 1000 Genomes Project [17]
(iv) Offers additional features, such as viewing alignments and converting SAM to BAM format

SOAPsnp [81]: http://soap.genomics.org.cn/soapsnp.html
(i) Part of the reliable SOAP family of bioinformatics tools
(ii) Well-documented website; widely cited and used [82, 83]

Control-FREEC [84]: http://bioinfo-out.curie.fr/projects/freec/
(i) Identifies copy number variations (CNVs) between case and control samples from sequencing data
(ii) R script available for visualising CNVs by chromosome
(iii) Input format: BAM

Atlas2 [85]: https://www.hgsc.bcm.edu/software/atlas-2
(i) Calls SNPs and indels for WES data
(ii) Requires a BAM file as input
(iii) Output: VCF format
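Several of these tools (for example, SAMTools with BCFtools) first compute genotype likelihoods at each site and then call SNPs and genotypes from them. To make that step concrete, here is a minimal sketch using a simplified textbook model, which is an assumption for illustration and not the actual SAMTools/BCFtools algorithm (real callers also model correlated errors, mapping quality, and priors). The model assumes that each read independently samples one of the two alleles with probability 1/2, and that a base with Phred quality Q is wrong with probability 10^(-Q/10), with errors spread evenly over the three other bases. The function names and the `pileup` input format are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q: int) -> float:
    """Convert a Phred-scaled base quality into an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(observed: str, allele: str, error: float) -> float:
    """P(observed base | true allele); errors are split evenly over the 3 other bases."""
    return 1.0 - error if observed == allele else error / 3.0


def genotype_log_likelihoods(pileup):
    """Return log10 P(reads | G) for all 10 unordered diploid genotypes.

    pileup: hypothetical list of (base, phred_quality) tuples at one site.
    Reads are assumed independent; each read samples either allele with prob 1/2.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        log_l = 0.0
        for base, qual in pileup:
            e = phred_to_error(qual)
            p = 0.5 * base_likelihood(base, a1, e) + 0.5 * base_likelihood(base, a2, e)
            log_l += math.log10(p)
        result[a1 + a2] = log_l
    return result


def call_genotype(pileup):
    """Pick the maximum-likelihood genotype (no prior is applied in this sketch)."""
    lls = genotype_log_likelihoods(pileup)
    best = max(lls, key=lls.get)
    return best, lls


if __name__ == "__main__":
    # Ten reads at base quality 30: five show A and five show G.
    best, lls = call_genotype([("A", 30)] * 5 + [("G", 30)] * 5)
    print(best, round(lls[best], 3))
```

With five A and five G reads at quality 30, the heterozygous genotype AG has by far the highest likelihood, because each homozygous genotype must explain half of the reads as sequencing errors. A production caller would combine such likelihoods with a prior (for example, one based on population allele frequencies) before writing genotypes to a VCF file.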

GATK, SOAPsnp, and SAMTools have been cited consistently in large genetic association projects, which reflects their ease of use, reliability, and functionality; their popularity is also helped by the additional features they offer. Other tools, such as Beagle [68], IMPUTE2 [86], and MaCH [87], have modules for SNP and genotype calling but are mostly used for their primary purposes of imputation and haplotype phasing.