BioMed Research International
Volume 2014, Article ID 319534, 7 pages
Research Article

Improved Variant Calling Accuracy by Merging Replicates in Whole-Exome Sequencing Studies

1 Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
2 Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
3 Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA

Received 28 May 2014; Revised 15 July 2014; Accepted 15 July 2014; Published 4 August 2014

Academic Editor: Ernesto Picardi

Copyright © 2014 Yanfeng Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


In large-scale population-based whole-exome sequencing (WES) studies, some samples are occasionally sequenced two or more times for a variety of reasons. To investigate how to use these duplicated sequencing data efficiently, we conducted a comprehensive evaluation of variant calling strategies. We selected 92 samples that had been subjected to WES twice from a large population study. The two replicates of each subject were divided into two groups: group H, consisting of the replicate with the higher sequencing depth for each subject, and group L, consisting of the replicate with the lower depth. The merged replicates for each subject formed a third group, M. Using the GATK multisample toolkit, we compared variant calling accuracy among the three strategies. Hierarchical clustering analysis indicated that the two replicates for each subject showed high homogeneity. Comparative analyses based on the heterozygous-homozygous ratio (Hete/Homo), the transition-transversion ratio (Ti/Tv), and the overlapping rate with the 1000 Genomes Project consistently showed that SNPs detected from the M group were of higher quality than those detected from the H and L groups. These results suggest that merging homogeneous duplicated exomes, instead of using only one of them, could improve variant calling accuracy.
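The three quality metrics named above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on GATK. The function names, the genotype encoding as pairs of allele indices with 0 as the reference allele, and the toy data are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ti/Tv over (ref, alt) single-base pairs; pairs with ref == alt are skipped."""
    ti = tv = 0
    for ref, alt in snps:
        if ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


def het_hom_ratio(genotypes: Iterable[Tuple[int, int]]) -> float:
    """Heterozygous / homozygous-variant ratio.

    Assumed encoding: each genotype is a pair of allele indices, 0 = reference.
    Homozygous-reference calls (0, 0) are not counted.
    """
    het = hom = 0
    for a, b in genotypes:
        if a != b:
            het += 1
        elif a != 0:
            hom += 1
    return het / hom if hom else float("nan")


def overlap_rate(called: Set[Tuple[str, int]], known: Set[Tuple[str, int]]) -> float:
    """Fraction of called (chromosome, position) sites also present in a known set."""
    return len(called & known) / len(called) if called else float("nan")


# Toy data, invented purely for illustration.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
genotypes = [(0, 1), (0, 1), (1, 1), (0, 0)]
called_sites = {("1", 100), ("1", 200), ("2", 50), ("2", 75)}
known_sites = {("1", 100), ("2", 50), ("3", 10)}

print(ti_tv_ratio(snps))
print(het_hom_ratio(genotypes))
print(overlap_rate(called_sites, known_sites))
```

In a comparison like the one described here, each metric would be computed separately for the H, L, and M call sets, with the reference site set taken from a resource such as the 1000 Genomes Project.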