Research Article
High Performance Implementation of 3D Convolutional Neural Networks on a GPU
Table 3
Performance of cuDNN SGEMM versus that of the 3D WMFA on 3D convolution layers. Performance is measured in effective TFLOPS.
| Layer | × × × × | | cuDNN SGEMM (TFLOPS) | 3D WMFA (TFLOPS) | Speedup |
|---|---|---|---|---|---|
| conv2 | 32 × 16 × 56 × 56 × 32 | 64 | 1.21 | 1.28 | 1.05 |
| conv3 | 64 × 8 × 28 × 28 × 32 | 256 | 2.38 | 3.31 | 1.39 |
| conv4 | 256 × 4 × 14 × 14 × 32 | 256 | 2.4 | 4.72 | 1.96 |
| conv5 | 256 × 2 × 7 × 7 × 32 | 256 | 1.46 | 2.1 | 1.44 |
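
The Speedup figures in Table 3 appear to be the ratio of the 3D WMFA throughput to the cuDNN SGEMM throughput. The following is a minimal sketch of that check, not the authors' measurement code. The values are copied from the table; the helper names `speedup` and `check_rows` and the 0.011 tolerance are assumptions introduced here, since the published figures are rounded to two decimals.

```python
# Rows copied from Table 3:
# (layer, cuDNN SGEMM TFLOPS, 3D WMFA TFLOPS, reported speedup)
ROWS = [
    ("conv2", 1.21, 1.28, 1.05),
    ("conv3", 2.38, 3.31, 1.39),
    ("conv4", 2.40, 4.72, 1.96),
    ("conv5", 1.46, 2.10, 1.44),
]


def speedup(baseline_tflops, wmfa_tflops):
    """Assumed definition: 3D WMFA throughput divided by baseline throughput."""
    return wmfa_tflops / baseline_tflops


def check_rows(rows, tol=0.011):
    """Recompute each ratio and flag whether it matches the reported value.

    tol is a hypothetical tolerance that absorbs rounding in the table.
    """
    results = {}
    for layer, sgemm, wmfa, reported in rows:
        ratio = speedup(sgemm, wmfa)
        results[layer] = (ratio, abs(ratio - reported) <= tol)
    return results


if __name__ == "__main__":
    for layer, (ratio, ok) in check_rows(ROWS).items():
        print(f"{layer}: recomputed speedup {ratio:.3f}, matches table: {ok}")
```

Under this assumed definition, all four recomputed ratios fall within about 0.01 of the reported values, which is consistent with the reported speedups having been derived from unrounded measurements.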