International Journal of Genomics
Volume 2016 (2016), Article ID 7983236, 16 pages
http://dx.doi.org/10.1155/2016/7983236
Review Article

A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data

1Division of Medical Oncology, Department of Medicine, School of Medicine, Aurora, CO 80045, USA
2University of Colorado Cancer Center, Aurora, CO 80045, USA
3Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, Anschutz Medical Campus, Aurora, CO 80045, USA

Received 25 May 2016; Accepted 26 October 2016

Academic Editor: Lam C. Tsoi

Copyright © 2016 Jennifer D. Hintzsche et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. M. L. Metzker, “Sequencing technologies—the next generation,” Nature Reviews Genetics, vol. 11, no. 1, pp. 31–46, 2010.
  2. S. Goodwin, J. D. McPherson, and W. R. McCombie, “Coming of age: ten years of next-generation sequencing technologies,” Nature Reviews Genetics, vol. 17, no. 6, pp. 333–351, 2016.
  3. M. Lek, K. J. Karczewski, E. V. Minikel et al., “Analysis of protein-coding genetic variation in 60,706 humans,” Nature, vol. 536, no. 7616, pp. 285–291, 2016.
  4. A. McKenna, M. Hanna, E. Banks et al., “The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, no. 9, pp. 1297–1303, 2010.
  5. M. A. DePristo, E. Banks, R. Poplin et al., “A framework for variation discovery and genotyping using next-generation DNA sequencing data,” Nature Genetics, vol. 43, no. 5, pp. 491–498, 2011.
  6. G. A. Van der Auwera, M. O. Carneiro, C. Hartl et al., “From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline,” Current Protocols in Bioinformatics, vol. 43, no. 1110, pp. 11.10.1–11.10.33, 2013.
  7. H. Li, B. Handsaker, A. Wysoker et al., “The Sequence Alignment/Map format and SAMtools,” Bioinformatics, vol. 25, no. 16, pp. 2078–2079, 2009.
  8. H. Li and R. Durbin, “Fast and accurate long-read alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 26, no. 5, Article ID btp698, pp. 589–595, 2010.
  9. B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.
  10. B. Langmead and S. L. Salzberg, “Fast gapped-read alignment with Bowtie 2,” Nature Methods, vol. 9, no. 4, pp. 357–359, 2012.
  11. S. Marco-Sola, M. Sammeth, R. Guigó, and P. Ribeca, “The GEM mapper: fast, accurate and versatile alignment by filtration,” Nature Methods, vol. 9, no. 12, pp. 1185–1188, 2012.
  12. T. D. Wu and S. Nacu, “Fast and SNP-tolerant detection of complex variants and splicing in short reads,” Bioinformatics, vol. 26, no. 7, pp. 873–881, 2010.
  13. H. Li, J. Ruan, and R. Durbin, “Mapping short DNA sequencing reads and calling variants using mapping quality scores,” Genome Research, vol. 18, no. 11, pp. 1851–1858, 2008.
  14. C. Alkan, J. M. Kidd, T. Marques-Bonet et al., “Personalized copy number and segmental duplication maps using next-generation sequencing,” Nature Genetics, vol. 41, no. 10, pp. 1061–1067, 2009.
  15. R. Li, Y. Li, K. Kristiansen, and J. Wang, “SOAP: short oligonucleotide alignment program,” Bioinformatics, vol. 24, no. 5, pp. 713–714, 2008.
  16. R. Li, C. Yu, Y. Li et al., “SOAP2: an improved ultrafast tool for short read alignment,” Bioinformatics, vol. 25, no. 15, pp. 1966–1967, 2009.
  17. Z. Ning, A. J. Cox, and J. C. Mullikin, “SSAHA: a fast search method for large DNA databases,” Genome Research, vol. 11, no. 10, pp. 1725–1729, 2001.
  18. G. Lunter and M. Goodson, “Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads,” Genome Research, vol. 21, no. 6, pp. 936–939, 2011.
  19. V. L. Galinsky, “YOABS: yet other aligner of biological sequences—an efficient linearly scaling nucleotide aligner,” Bioinformatics, vol. 28, no. 8, Article ID bts102, pp. 1070–1077, 2012.
  20. H. Li and N. Homer, “A survey of sequence alignment algorithms for next-generation sequencing,” Briefings in Bioinformatics, vol. 11, no. 5, Article ID bbq015, pp. 473–483, 2010.
  21. J. Shendure and H. Ji, “Next-generation DNA sequencing,” Nature Biotechnology, vol. 26, no. 10, pp. 1135–1145, 2008.
  22. T. J. Treangen and S. L. Salzberg, “Repetitive DNA and next-generation sequencing: computational challenges and solutions,” Nature Reviews Genetics, vol. 13, no. 1, pp. 36–46, 2012.
  23. H. Xu, X. Luo, J. Qian et al., “FastUniq: a fast de novo duplicates removal tool for paired short reads,” PLoS ONE, vol. 7, no. 12, Article ID e52249, 2012.
  24. S. Pabinger, A. Dander, M. Fischer et al., “A survey of tools for variant analysis of next-generation genome sequencing data,” Briefings in Bioinformatics, vol. 15, no. 2, pp. 256–278, 2014.
  25. D. Shigemizu, A. Fujimoto, S. Akiyama et al., “A practical method to detect SNVs and indels from whole genome and exome sequencing data,” Scientific Reports, vol. 3, article 2161, 2013.
  26. A. Rimmer, H. Phan, I. Mathieson et al., “Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications,” Nature Genetics, vol. 46, no. 8, pp. 912–918, 2014.
  27. E. Garrison and G. Marth, “Haplotype-based variant detection from short-read sequencing,” https://arxiv.org/abs/1207.3907.
  28. G. Ellison, S. Huang, H. Carr et al., “A reliable method for the detection of BRCA1 and BRCA2 mutations in fixed tumour tissue utilising multiplex PCR-based targeted next generation sequencing,” BMC Clinical Pathology, vol. 15, no. 1, article 5, 2015.
  29. A. K. Talukder, S. Ravishankar, K. Sasmal et al., “XomAnnotate: analysis of heterogeneous and complex exome—a step towards translational medicine,” PLoS ONE, vol. 10, no. 4, Article ID e0123569, 2015.
  30. K. Ye, M. H. Schulz, Q. Long, R. Apweiler, and Z. Ning, “Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads,” Bioinformatics, vol. 25, no. 21, pp. 2865–2871, 2009.
  31. E. Karakoc, C. Alkan, B. J. O'Roak et al., “Detection of structural variants and indels within exome data,” Nature Methods, vol. 9, no. 2, pp. 176–178, 2012.
  32. A. Ratan, T. L. Olson, T. P. Loughran, and W. Miller, “Identification of indels in next-generation sequencing data,” BMC Bioinformatics, vol. 16, no. 1, article 42, 2015.
  33. Z. Zhang, J. Wang, J. Luo et al., “Sprites: detection of deletions from sequencing data by re-aligning split reads,” Bioinformatics, vol. 32, no. 12, pp. 1788–1796, 2016.
  34. K. Wang, M. Li, and H. Hakonarson, “ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data,” Nucleic Acids Research, vol. 38, no. 16, article e164, 2010.
  35. K. Cibulskis, M. S. Lawrence, S. L. Carter et al., “Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples,” Nature Biotechnology, vol. 31, no. 3, pp. 213–219, 2013.
  36. P. Cingolani, A. Platts, L. L. Wang et al., “A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3,” Fly, vol. 6, no. 2, pp. 80–92, 2012.
  37. P. Cingolani, V. M. Patel, M. Coon et al., “Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift,” Frontiers in Genetics, vol. 3, article 35, 2012.
  38. L. Habegger, S. Balasubramanian, D. Z. Chen et al., “VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment,” Bioinformatics, vol. 28, no. 17, pp. 2267–2269, 2012.
  39. S. T. Sherry, M.-H. Ward, M. Kholodov et al., “dbSNP: the NCBI database of genetic variation,” Nucleic Acids Research, vol. 29, no. 1, pp. 308–311, 2001.
  40. I. F. A. C. Fokkema, J. T. den Dunnen, and P. E. M. Taschner, “LOVD: easy creation of a locus-specific sequence variation database using an ‘LSDB-in-a-box’ approach,” Human Mutation, vol. 26, no. 2, pp. 63–68, 2005.
  41. 1000 Genomes Project Consortium, G. R. Abecasis, D. Altshuler et al., “A map of human genome variation from population-scale sequencing,” Nature, vol. 467, no. 7319, pp. 1061–1073, 2010.
  42. S. A. Forbes, D. Beare, P. Gunasekaran et al., “COSMIC: exploring the world's knowledge of somatic mutations in human cancer,” Nucleic Acids Research, vol. 43, no. 1, pp. D805–D811, 2015.
  43. P. Kumar, S. Henikoff, and P. C. Ng, “Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm,” Nature Protocols, vol. 4, no. 7, pp. 1073–1081, 2009.
  44. I. A. Adzhubei, S. Schmidt, L. Peshkin et al., “A method and server for predicting damaging missense mutations,” Nature Methods, vol. 7, no. 4, pp. 248–249, 2010.
  45. S. Chun and J. C. Fay, “Identification of deleterious mutations within three human genomes,” Genome Research, vol. 19, no. 9, pp. 1553–1561, 2009.
  46. H. A. Shihab, J. Gough, D. N. Cooper et al., “Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models,” Human Mutation, vol. 34, no. 1, pp. 57–65, 2013.
  47. C. Dong, P. Wei, X. Jian et al., “Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies,” Human Molecular Genetics, vol. 24, no. 8, pp. 2125–2137, 2015.
  48. H. Carter, C. Douville, P. D. Stenson, D. N. Cooper, and R. Karchin, “Identifying Mendelian disease genes with the variant effect scoring tool,” BMC Genomics, vol. 14, p. S3, 2013.
  49. M. Kircher, D. M. Witten, P. Jain, B. J. O'Roak, G. M. Cooper, and J. Shendure, “A general framework for estimating the relative pathogenicity of human genetic variants,” Nature Genetics, vol. 46, no. 3, pp. 310–315, 2014.
  50. D. E. Larson, C. C. Harris, K. Chen et al., “SomaticSniper: identification of somatic point mutations in whole genome sequencing data,” Bioinformatics, vol. 28, no. 3, Article ID btr665, pp. 311–317, 2012.
  51. J. C. Mu, M. Mohiyuddin, J. Li et al., “VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications,” Bioinformatics, vol. 31, no. 9, pp. 1469–1471, 2014.
  52. K. S. Smith, V. K. Yadav, S. Pei, D. A. Pollyea, C. T. Jordan, and S. De, “SomVarIUS: somatic variant identification from unpaired tissue samples,” Bioinformatics, vol. 32, no. 6, pp. 808–813, 2015.
  53. C. Xie and M. T. Tammi, “CNV-seq, a new method to detect copy number variation using high-throughput sequencing,” BMC Bioinformatics, vol. 10, no. 1, article 80, 2009.
  54. D. Y. Chiang, G. Getz, D. B. Jaffe et al., “High-resolution mapping of copy-number alterations with massively parallel sequencing,” Nature Methods, vol. 6, no. 1, pp. 99–103, 2009.
  55. K. C. Amarasinghe, J. Li, S. M. Hunter et al., “Inferring copy number and genotype in tumour exome data,” BMC Genomics, vol. 15, no. 1, article 732, 2014.
  56. J. Li, R. Lupat, K. C. Amarasinghe et al., “CONTRA: copy number analysis for targeted resequencing,” Bioinformatics, vol. 28, no. 10, Article ID bts146, pp. 1307–1313, 2012.
  57. A. Magi, L. Tattini, I. Cifola et al., “EXCAVATOR: detecting copy number variants from whole-exome sequencing data,” Genome Biology, vol. 14, no. 10, article R120, 2013.
  58. J. F. Sathirapongsasuti, H. Lee, B. A. J. Horst et al., “Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV,” Bioinformatics, vol. 27, no. 19, Article ID btr462, pp. 2648–2654, 2011.
  59. V. Boeva, A. Zinovyev, K. Bleakley et al., “Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization,” Bioinformatics, vol. 27, no. 2, pp. 268–269, 2011.
  60. Y. Jiang, D. Redmond, K. Nie et al., “Deep sequencing reveals clonal evolution patterns and mutation events associated with relapse in B-cell lymphomas,” Genome Biology, vol. 15, no. 8, article 432, 2014.
  61. D. C. Koboldt, Q. Zhang, D. E. Larson et al., “VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing,” Genome Research, vol. 22, no. 3, pp. 568–576, 2012.
  62. A. B. Olshen, H. Bengtsson, P. Neuvial, P. T. Spellman, R. A. Olshen, and V. E. Seshan, “Parent-specific copy number in paired tumor-normal studies using circular binary segmentation,” Bioinformatics, vol. 27, no. 15, Article ID btr329, pp. 2038–2046, 2011.
  63. J.-Y. Nam, N. K. Kim, S. C. Kim et al., “Evaluation of somatic copy number estimation tools for whole-exome sequencing data,” Briefings in Bioinformatics, vol. 17, no. 2, pp. 185–192, 2015.
  64. J. Nadaf, J. Majewski, and S. Fahiminiya, “ExomeAI: detection of recurrent allelic imbalance in tumors using whole-exome sequencing data,” Bioinformatics, vol. 31, no. 3, pp. 429–431, 2014.
  65. H. Carter, J. Samayoa, R. H. Hruban, and R. Karchin, “Prioritization of driver mutations in pancreatic cancer using cancer-specific high-throughput annotation of somatic mutations (CHASM),” Cancer Biology & Therapy, vol. 10, no. 6, pp. 582–587, 2010.
  66. F. Vandin, E. Upfal, and B. J. Raphael, “De novo discovery of mutated driver pathways in cancer,” Genome Research, vol. 22, no. 2, pp. 375–385, 2012.
  67. M. S. Lawrence, P. Stojanov, P. Polak et al., “Mutational heterogeneity in cancer and the search for new cancer-associated genes,” Nature, vol. 499, no. 7457, pp. 214–218, 2013.
  68. M. Kanehisa and S. Goto, “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research, vol. 28, no. 1, pp. 27–30, 2000.
  69. D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists,” Nucleic Acids Research, vol. 37, no. 1, pp. 1–13, 2009.
  70. D. Szklarczyk, A. Franceschini, S. Wyder et al., “STRING v10: protein-protein interaction networks, integrated over the tree of life,” Nucleic Acids Research, vol. 43, no. 1, pp. D447–D452, 2015.
  71. M. Jeon, S. Lee, K. Lee, A.-C. Tan, and J. Kang, “BEReX: biomedical entity-relationship eXplorer,” Bioinformatics, vol. 30, no. 1, pp. 135–136, 2014.
  72. E. J. Rossin, K. Lage, S. Raychaudhuri et al., “Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology,” PLoS Genetics, vol. 7, no. 1, Article ID e1001273, 2011.
  73. K. Slowikowski, X. Hu, and S. Raychaudhuri, “SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci,” Bioinformatics, vol. 30, no. 17, pp. 2496–2497, 2014.
  74. M. J. Landrum, J. M. Lee, G. R. Riley et al., “ClinVar: public archive of relationships among sequence variation and human phenotype,” Nucleic Acids Research, vol. 42, no. 1, pp. D980–D985, 2014.
  75. M. Whirl-Carrillo, E. M. McDonagh, J. M. Hebert et al., “Pharmacogenomics knowledge for personalized medicine,” Clinical Pharmacology and Therapeutics, vol. 92, no. 4, pp. 414–417, 2012.
  76. V. Law, C. Knox, Y. Djoumbou et al., “DrugBank 4.0: shedding new light on drug metabolism,” Nucleic Acids Research, vol. 42, no. 1, pp. D1091–D1097, 2014.
  77. M. Yoo, J. Shin, J. Kim et al., “DSigDB: drug signatures database for gene set analysis,” Bioinformatics, vol. 31, no. 18, pp. 3069–3071, 2014.
  78. E. Cerami, J. Gao, U. Dogrusoz et al., “The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data,” Cancer Discovery, vol. 2, no. 5, pp. 401–404, 2012.
  79. Y. Guo, X. Ding, Y. Shen, G. J. Lyon, and K. Wang, “SeqMule: automated pipeline for analysis of human exome/genome sequencing data,” Scientific Reports, vol. 5, article 14283, 2015.
  80. X. Gao, J. Xu, and J. Starmer, “Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses,” BMC Research Notes, vol. 8, no. 1, p. 72, 2015.
  81. J. Hintzsche, J. Kim, V. Yadav et al., “IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples,” Journal of the American Medical Informatics Association, vol. 23, no. 4, pp. 721–730, 2016.
  82. R. B. Altman, S. Prabhu, A. Sidow et al., “A research roadmap for next-generation sequencing informatics,” Science Translational Medicine, vol. 8, no. 335, Article ID 335ps10, 2016.