Review Article

Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process

Table 3

Tools for predicting variant effects, identifying neutral and pathogenic mutations.

| Name | Reference | MCC | Comments |
| --- | --- | --- | --- |
| *SIFT | [88, 89] | 0.30 (unweighted) | It is highly cited, with many projects using it since 2001. It uses available evolutionary information, is continually updated, is easy to use through VEP, and provides two classifications: “deleterious” and “tolerated.” |
| *PolyPhen-2 | [42] | 0.43 | It provides a high-quality multiple sequence alignment pipeline and is optimized for high-throughput analysis of NGS data. It is cited and used by many projects of different types, is easy to use through VEP, and provides three classifications: “probably damaging,” “possibly damaging,” and “benign.” |
| *FATHMM | [43] | 0.72 | It is a high-performing prediction tool, with clear examples available on its website. It offers the user a choice between weighted (trained using inherited disease-causing mutations) and unweighted (conservation-based) predictions, provides protein domain-phenotype association information, and has options for cancer-specific predictions (FATHMM-Cancer) and for noncoding variants (FATHMM-MKL). |
| GERP++ (and GERP) | [90–92] | N/A | It identifies constrained elements within the human genome; variants within these elements are therefore likely to induce functional changes. It can provide unique details about the candidate variant(s). |
| PhyloP | [93] | N/A | It helps detect nonneutral substitutions; its aim is similar to that of GERP. |
| CADD | [11] | | It provides annotation and scores for all variants in the genome, considering a wide range of biological features. |
| GWAVA | [13] | | It provides predictions for variants in the noncoding part of the genome. |
| *SNAP | [94] | 0.47 | It predicts the effects of nonsynonymous polymorphisms and is widely cited and used. It should be used to check whether the predicted effect is matched by the putative causal variant. However, it was labelled “too slow” for high-throughput analyses by [46]. |
| PupaSuite | [95] | | It identifies functional SNPs using the SNPeffect [96] database and evolutionary information. |
| Mutation Assessor-2 | [97] | | It predicts the impact of protein mutations and has a user-friendly website that accepts many input formats. |
| *PANTHER | [98, 99] | 0.53 (unweighted) | It predicts the effect of an amino acid change based on protein evolutionary relationships. It provides a score ranging from 0 (neutral) to −10 (most likely deleterious) and lets the user decide on the “deleteriousness” threshold. It is constantly updated, making it a very reliable tool. |
| CONDEL-2 | [45] | | It combines FATHMM and Mutation Assessor (as of version 2) in order to improve prediction. In theory, it outperforms its component tools used individually. |
| *MutPred | [44] | 0.63 | It predicts whether a missense mutation will be harmful based on a variety of features, such as sequence conservation, protein structure, and functional annotations. It was praised in a recent comparative study [46]. |
| *SNPs&GO | [100] | 0.65 | It is reported in [46] to have performed best amongst many prediction tools and provides two classifications: “disease related” and “neutral.” |
| Human Splicing Finder | [47] | N/A | It predicts whether noncoding variants alter splicing. It is useful for compound heterozygotes when one allele is intronic. |
| Others | [101–104] | 0.19, 0.43, 0.40 | *nsSNPAnalyzer (requires 3D structure coordinates), *PhD-SNP, *PolyPhen (no longer supported), and PMUT |

Many methods have been developed to predict the functional effect of variants in the genome. Many of the tools listed above use different features and datasets to predict these effects. This list is not exhaustive; it collects the most widely used and cited tools.
*Comprehensive information about these prediction tools, including accuracy, specificity, and sensitivity, is available in [43, 46]. N/A: not applicable. MCC: Matthews correlation coefficient. MCCs were obtained from [43].
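
For readers unfamiliar with the MCC column, the following is a minimal sketch of how a Matthews correlation coefficient can be computed when a tool's pathogenic/neutral calls are compared against a benchmark of known variants. It uses the standard definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the six example labels are hypothetical and for illustration only; they are not data from [43] and do not reproduce any value in the table.

```python
import math


def confusion_counts(truth, predicted):
    """Count (tp, tn, fp, fn) from paired boolean labels (True = pathogenic)."""
    tp = tn = fp = fn = 0
    for actual, called in zip(truth, predicted):
        if actual and called:
            tp += 1
        elif not actual and not called:
            tn += 1
        elif not actual and called:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). By convention, returns 0.0 when a marginal
    total is zero and the ratio is undefined.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical benchmark: known status of six variants vs. one tool's calls.
truth = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]

print(round(mcc(*confusion_counts(truth, predicted)), 2))  # prints 0.33
```

Because MCC uses all four cells of the confusion matrix, it remains informative when a benchmark contains far more neutral than pathogenic variants (or the reverse), which is one reason it is a convenient single-number summary for comparing the tools above.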