Research Article

High Performance Implementation of 3D Convolutional Neural Networks on a GPU

Table 3

Performance of cuDNN SGEMM versus that of the 3D WMFA on 3D convolution layers. Performance is measured in effective TFLOPS.

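| Layer | Dimensions | | cuDNN SGEMM (TFLOPS) | 3D WMFA (TFLOPS) | Speedup |
|---|---|---|---|---|---|
| conv2 | 32 × 16 × 56 × 56 × 32 | 64 | 1.21 | 1.28 | 1.05 |
| conv3 | 64 × 8 × 28 × 28 × 32 | 256 | 2.38 | 3.31 | 1.39 |
| conv4 | 256 × 4 × 14 × 14 × 32 | 256 | 2.4 | 4.72 | 1.96 |
| conv5 | 256 × 2 × 7 × 7 × 32 | 256 | 1.46 | 2.1 | 1.44 |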
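
The Speedup column appears to be the ratio of the 3D WMFA throughput to the cuDNN SGEMM throughput. The following is a minimal Python sketch under that assumption; the names `TABLE3` and `speedup` are illustrative and do not come from the paper. It recomputes the ratio from the two TFLOPS columns so the reported figures can be compared against it.

```python
# Values copied from Table 3: (cuDNN SGEMM TFLOPS, 3D WMFA TFLOPS, reported speedup).
# Assumption: speedup = 3D WMFA throughput / cuDNN SGEMM throughput.
TABLE3 = {
    "conv2": (1.21, 1.28, 1.05),
    "conv3": (2.38, 3.31, 1.39),
    "conv4": (2.40, 4.72, 1.96),
    "conv5": (1.46, 2.10, 1.44),
}


def speedup(baseline_tflops: float, candidate_tflops: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return candidate_tflops / baseline_tflops


for layer, (cudnn, wmfa, reported) in TABLE3.items():
    ratio = speedup(cudnn, wmfa)
    print(f"{layer}: computed {ratio:.3f}, reported {reported:.2f}")
```

Under this assumption the recomputed ratios agree with the reported speedups to within 0.01, which is consistent with the tabulated throughputs having been rounded to two decimal places.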