Research Article

DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework

Table 2

Comparison of the wall-clock time of DeepVariant and DeepVariant-on-Spark with different combinations of CPUs/GPUs.

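| Variant caller | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Machine model | CPU only | CPU only | CPU only | CPU only | CPU+GPU | CPU+GPU | CPU+GPU | CPU only | CPU only | CPU only | CPU+GPU | CPU+GPU | CPU+GPU |
| CPU (a) | 16 | 32 | 64 | 96 | 16 | 32 | 64 | 32 | 64 | 128 | 32 | 64 | 128 |
| GPU (b) | 0 | 0 | 0 | 0 | 1 | 2 | 4 | 0 | 0 | 0 | 2 | 4 | 8 |
| Spark (c) | No | No | No | No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| AdamTransform (hr) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.56 | 0.32 | 0.2 | 0.58 | 0.31 | 0.2 |
| SelectBAM (hr) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.33 | 0.23 | 0.48 | 0.29 | 0.2 |
| Make_examples (hr) | 6.13 | 3.15 | 1.73 | 1.2 | 5.93 | 3.1 | 1.6 | 2.72 | 1.6 | 1 | 2.82 | 1.48 | 0.83 |
| Call_variants (hr) | 10.8 | 6.53 | 5.35 | 3.83 | 1.51 | 1.52 | 1.5 | 3.66 | 2.02 | 0.98 | 0.7 | 0.38 | 0.21 |
| Postprocess_variants (hr) | 0.56 | 0.54 | 0.53 | 0.48 | 0.46 | 0.46 | 0.45 | 0.2 | 0.13 | 0.07 | 0.2 | 0.1 | 0.06 |
| Merge VCF (hr) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
| Total time (hr) | 17.49 | 10.22 | 7.61 | 5.51 | 7.9 | 5.08 | 3.55 | 7.66 | 4.42 | 2.5 | 4.8 | 2.58 | 1.52 |
| USD per genome | 14.02 | 15.94 | 20.77 | 25.31 | 17.86 | 22.72 | 31.76 | 23.25 | 23.98 | 25.54 | 28.57 | 29.23 | 33.17 |
| #genomes/300USD (d) | 21 | 18 | 14 | 11 | 16 | 13 | 9 | 12 | 12 | 11 | 10 | 10 | 9 |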

(a) CPU is the number of CPU cores. (b) GPU is the number of NVIDIA Tesla P100 GPUs. (c) Spark indicates whether Apache Spark was used. (d) #genomes/300USD is the number of whole-genome sequencing jobs that can be completed within the 300 USD trial credit.
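
The last two rows of Table 2 are related by simple arithmetic, and a short check makes that relationship explicit. The sketch below assumes that #genomes/300USD is the 300 USD credit divided by the cost per genome, rounded down to a whole genome. That formula is inferred from the table values and is not stated in the source. The names `COST_USD`, `GENOMES_REPORTED`, and `genomes_within_credit` are illustrative only.

```python
import math

# Cost per genome in USD, copied from the "USD per genome" row of Table 2,
# left to right (7 DeepVariant columns, then 6 DeepVariant-on-Spark columns).
COST_USD = [14.02, 15.94, 20.77, 25.31, 17.86, 22.72, 31.76,
            23.25, 23.98, 25.54, 28.57, 29.23, 33.17]

# Values copied from the "#genomes/300USD" row of Table 2, same column order.
GENOMES_REPORTED = [21, 18, 14, 11, 16, 13, 9, 12, 12, 11, 10, 10, 9]

TRIAL_CREDIT_USD = 300


def genomes_within_credit(cost_per_genome, credit=TRIAL_CREDIT_USD):
    """Whole genomes that fit in the credit (assumption: round down)."""
    return math.floor(credit / cost_per_genome)


computed = [genomes_within_credit(cost) for cost in COST_USD]

# Report any column where the assumed formula disagrees with the table.
mismatches = [
    (cost, reported, got)
    for cost, reported, got in zip(COST_USD, GENOMES_REPORTED, computed)
    if reported != got
]
print("columns checked:", len(COST_USD), "mismatches:", len(mismatches))
```

Under this assumption the computed counts agree with the reported row in every column, which suggests the authors rounded down rather than to the nearest integer.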