Review Article

Importance of Genetic Diversity Assessment in Crop Plants and Its Recent Advances: An Overview of Its Analytical Perspectives

Table 2

List of analytical programs for measuring molecular (genetic) diversity.

Analytical tools | Data type | Main features | Source links | Reference

Arlequin | RFLPs, DNA sequences, SSR data, allele frequencies, or standard multilocus genotypes. | (i) Estimation of allele and haplotype frequencies.
(ii) Tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium.
(iii) Estimation of parameters from past population expansions.
(iv) Thorough analyses of population subdivision under the AMOVA framework and so forth.
(v) Current version: Arlequin ver 3.5.1.3.
http://cmpg.unibe.ch/software/arlequin3 | Schneider et al. [87]
Excoffier et al. [88]

DnaSP | DNA sequence data | (i) Estimates several measures of DNA sequence variation within and between populations (in noncoding, synonymous, or nonsynonymous sites or in various sorts of codon positions), as well as linkage disequilibrium, recombination, gene flow, and gene conversion parameters.
(ii) DnaSP can also carry out several tests of neutrality: Hudson et al. [89], Tajima [90], McDonald and Kreitman [46], Fu and Li [91], and Fu [92] tests. Additionally, DnaSP can estimate the confidence intervals of some test statistics by the coalescent and so forth.
(iii) Current version: DnaSP v5.10.01.
http://www.ub.edu/dnasp | J. Rozas and R. Rozas [93–95]
Librado and Rozas [96]

PowerMarker | SSR, SNP, and RFLP data | (i) Computes several summary statistics for each marker locus, including allele number, missing proportion, heterozygosity, gene diversity, polymorphism information content (PIC), and stepwise patterns for microsatellite data.
(ii) PowerMarker is also used to compute allele frequency, genotype frequency, haplotype frequency for unrelated individuals, Hardy-Weinberg equilibrium, pairwise linkage disequilibrium, multilocus linkage disequilibrium, consensus trees, population structure, Mantel’s test, triangle plotting, and visualization of linkage disequilibrium results.
(iii) Current version: PowerMarker V3.25.
http://statgen.ncsu.edu/powermarker/ | Liu and Muse [97]

DARwin | Single data (for haploids, homozygote diploids, and dominant markers), allelic data, and sequence data | (i) Most widely used for dissimilarity and distance estimation on different data types and for tree construction, including hierarchical trees with various aggregation criteria (weighted or unweighted), Neighbor-Joining trees (weighted or unweighted), the Scores method, principal coordinate analysis, and so forth.
(ii) Current version: DARwin v5.0.156.
http://darwin.cirad.fr/darwin | Perrier and Jacquemoud-Collet [98]

NTSYSpc | Single data (for haploids, homozygote diploids, and dominant markers), allelic data, and sequence data | (i) Used for clustering analysis, ordination analysis, principal component analysis, principal coordinate analysis, scaling analysis, comparison of two matrices (Mantel test; Mantel [99]), and so forth.
(ii) Current version: NTSYSpc version 2.2.
http://www.exetersoftware.com/cat/ntsyspc/ntsyspc.html | Rohlf [100]

MEGA | DNA sequence, protein sequence, evolutionary distance, or phylogenetic tree data | (i) Molecular evolutionary genetics analysis (MEGA) is most widely used for aligning sequences, estimating evolutionary distances, building trees from sequence data, testing tree reliability, and so forth.
(ii) Current version: MEGA6.
http://www.megasoftware.net | Kumar et al. [101–103]
Tamura et al. [104]

PAUP | Molecular sequences, morphological data, and other data types | (i) Used for inferring and interpreting phylogenetic trees using parsimony, distance matrix, invariants, maximum likelihood methods, and many indices and statistical analyses.
(ii) Current version: PAUP version 4.0.
http://paup.csit.fsu.edu/ | Swofford [105]

STRUCTURE | All types of markers, including commonly used markers such as SSRs, SNPs, RFLPs, DArT, and so forth. | (i) A free program to investigate population structure; it includes inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.
(ii) Current version: STRUCTURE 2.3.4.
http://pritch.bsd.uchicago.edu/software/structure2_2.html | Pritchard et al. [106]
Falush et al. [107]
Hubisz et al. [108]

fastSTRUCTURE | SNP | (i) An algorithm for inferring population structure from large SNP genotype data.
(ii) It is based on a variational Bayesian framework for posterior inference and is written in Python 2.x.
http://rajanil.github.io/fastStructure/ | Raj et al. [109]

ADMIXTURE | SNP | (i) ADMIXTURE is a program for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets.
(ii) It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
(iii) Current version: ADMIXTURE 1.23.
https://www.genetics.ucla.edu/software/admixture/ | Alexander et al. [110]

fineSTRUCTURE | Sequencing data | (i) A fast and powerful algorithm for identifying population structure using dense sequencing data.
(ii) Current version: FineStructure 0.0.2.
http://paintmychromosomes.com/ | Lawson et al. [111]

POPGENE | Dominant, codominant, and quantitative data for population genetic analysis | (i) Used to calculate gene and genotype frequency, allele number, effective allele number, polymorphic loci, gene diversity, observed and expected heterozygosity, Shannon index, homogeneity test, F-statistics, gene flow, genetic distance, dendrogram, neutrality test, and so forth.
(ii) Current version: POPGENE version 1.32.
https://www.ualberta.ca/~fyeh/popgene.html | Francis et al. [112]

GENEPOP | Haploid or diploid data | (i) Used to compute exact tests or their unbiased estimation for Hardy-Weinberg equilibrium, population differentiation, and two-locus genotypic disequilibrium.
(ii) It converts the input GENEPOP file to formats used by other popular programs, like BIOSYS [113], DIPLOIDL [114], and LINKDOS [115], thereby allowing communication between them.
(iii) Current version: GENEPOP 4.2.
http://genepop.curtin.edu.au/ | Raymond and Rousset [116]

GenAlEx | Codominant, haploid, and binary genetic data. It accommodates the full range of genetic markers available, including allozymes, SSRs, SNPs, AFLP, and other multilocus markers, as well as DNA sequences | (i) GenAlEx runs within Microsoft Excel, enabling population genetic analysis of codominant, haploid, and binary data. Used to compute allele frequency-based analyses including heterozygosity, F-statistics, Nei’s genetic distance, population assignment, probabilities of identity, and pairwise relatedness.
(ii) Used for calculating genetic distance matrices and distance-based calculations including analysis of molecular variance (AMOVA) [117, 118]; principal coordinates analysis (PCoA); Mantel tests [119]; 2D spatial autocorrelation analyses following Smouse and Peakall [120], Peakall et al. [121], and Double et al. [122].
(iii) Current version: GenAlEx 6.5.
http://biology-assets.anu.edu.au/GenAlEx/Welcome.html | Peakall and Smouse [123]
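
Several of the per-locus summary statistics named in the table (gene diversity, PIC, effective allele number, Shannon index, and observed heterozygosity) are simple functions of allele frequencies. The following is a minimal sketch in Python, assuming diploid codominant genotypes supplied as allele pairs. The function names are illustrative only; the code does not reproduce the implementation of any program listed above, and it uses the plain (uncorrected) estimators without small-sample adjustments.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes, e.g. ("A", "B")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def locus_summary(freqs):
    """Summary statistics for one locus from a list of allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    sum_sq = sum(p * p for p in freqs)

    # Gene diversity (expected heterozygosity): 1 - sum(p_i^2)
    gene_diversity = 1.0 - sum_sq
    # Effective number of alleles: 1 / sum(p_i^2)
    effective_alleles = 1.0 / sum_sq
    # Shannon index: -sum(p_i * ln p_i), skipping zero frequencies
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)
    # PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    pic = gene_diversity - cross

    return {
        "gene_diversity": gene_diversity,
        "effective_alleles": effective_alleles,
        "shannon": shannon,
        "pic": pic,
    }


# Hypothetical example data: four individuals scored at one SSR locus.
genotypes = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
freqs = allele_frequencies(genotypes)
summary = locus_summary(list(freqs.values()))
```

For two equally frequent alleles, as in the hypothetical data above, gene diversity is 0.5 and PIC is 0.375, which shows how PIC discounts gene diversity for uninformative matings.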