DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework
Table 2
Comparison of the wall-clock time of DeepVariant and DeepVariant-on-Spark with different combinations of CPUs/GPUs.
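| Variant caller | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark | DeepVariant-on-Spark |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Machine model | CPU only | CPU only | CPU only | CPU only | CPU+GPU | CPU+GPU | CPU+GPU | CPU only | CPU only | CPU only | CPU+GPU | CPU+GPU | CPU+GPU |
| CPU (a) | 16 | 32 | 64 | 96 | 16 | 32 | 64 | 32 | 64 | 128 | 32 | 64 | 128 |
| GPU (b) | 0 | 0 | 0 | 0 | 1 | 2 | 4 | 0 | 0 | 0 | 2 | 4 | 8 |
| Spark (c) | No | No | No | No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| AdamTransform (hr) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.56 | 0.32 | 0.2 | 0.58 | 0.31 | 0.2 |
| SelectBAM (hr) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.33 | 0.23 | 0.48 | 0.29 | 0.2 |
| Make_examples (hr) | 6.13 | 3.15 | 1.73 | 1.2 | 5.93 | 3.1 | 1.6 | 2.72 | 1.6 | 1 | 2.82 | 1.48 | 0.83 |
| Call_variants (hr) | 10.8 | 6.53 | 5.35 | 3.83 | 1.51 | 1.52 | 1.5 | 3.66 | 2.02 | 0.98 | 0.7 | 0.38 | 0.21 |
| Postprocess_variants (hr) | 0.56 | 0.54 | 0.53 | 0.48 | 0.46 | 0.46 | 0.45 | 0.2 | 0.13 | 0.07 | 0.2 | 0.1 | 0.06 |
| Merge VCF (hr) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
| Total time (hr) | 17.49 | 10.22 | 7.61 | 5.51 | 7.9 | 5.08 | 3.55 | 7.66 | 4.42 | 2.5 | 4.8 | 2.58 | 1.52 |
| Cost (USD per genome) | 14.02 | 15.94 | 20.77 | 25.31 | 17.86 | 22.72 | 31.76 | 23.25 | 23.98 | 25.54 | 28.57 | 29.23 | 33.17 |
| #genomes/300USD (d) | 21 | 18 | 14 | 11 | 16 | 13 | 9 | 12 | 12 | 11 | 10 | 10 | 9 |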
(a) CPU is the number of CPU cores. (b) GPU is the number of NVIDIA Tesla P100 GPUs. (c) Spark indicates whether Apache Spark was used. (d) #genomes/300USD is the number of whole-genome sequencing jobs that can be completed within the 300 USD trial credit.
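The derived rows of Table 2 can be reproduced from the other rows with simple arithmetic. The sketch below is a consistency check, not the authors' method. It assumes that the total time is the sum of the per-stage times and that #genomes/300USD is the 300 USD credit divided by the cost per genome, rounded down. Neither rule is stated in the table, but both are consistent with the reported values. All variable and function names are hypothetical.

```python
import math

# Cost per genome (USD), copied from Table 2 in column order:
# DeepVariant CPU only (4), DeepVariant CPU+GPU (3),
# DeepVariant-on-Spark CPU only (3), DeepVariant-on-Spark CPU+GPU (3).
COST_USD = [14.02, 15.94, 20.77, 25.31, 17.86, 22.72, 31.76,
            23.25, 23.98, 25.54, 28.57, 29.23, 33.17]

# Reported "#genomes/300USD" row, same column order.
REPORTED_GENOMES = [21, 18, 14, 11, 16, 13, 9, 12, 12, 11, 10, 10, 9]


def genomes_within_budget(cost_per_genome, budget=300.0):
    """Whole genomes affordable within the budget (assumed rule: round down)."""
    return math.floor(budget / cost_per_genome)


derived_genomes = [genomes_within_budget(c) for c in COST_USD]

# Per-stage times (hr) for the fastest column: DeepVariant-on-Spark,
# 128 CPU cores + 8 GPUs. Order: AdamTransform, SelectBAM, Make_examples,
# Call_variants, Postprocess_variants, Merge VCF.
FASTEST_STAGES_HR = [0.2, 0.2, 0.83, 0.21, 0.06, 0.02]
fastest_total_hr = round(sum(FASTEST_STAGES_HR), 2)

# Speedup of that column over the 16-core, CPU-only DeepVariant run (17.49 hr).
speedup_vs_baseline = 17.49 / fastest_total_hr

if __name__ == "__main__":
    print("derived genomes per 300 USD:", derived_genomes)
    print("matches reported row:", derived_genomes == REPORTED_GENOMES)
    print("fastest total (hr):", fastest_total_hr)
    print("speedup vs. 16-core CPU-only run: %.1fx" % speedup_vs_baseline)
```

Under these assumptions, the derived genome counts match the reported row for all 13 configurations, and the summed stage times match the reported total of 1.52 hr for the fastest configuration.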