Research Article

High Performance Implementation of 3D Convolutional Neural Networks on a GPU

Figure 2

For an input matrix and a tile size of 4, applying the reshape kernel generates many small submatrices, each with a new layout.
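The tiling step described in this caption can be illustrated with a minimal CPU-side sketch. It is not the paper's GPU reshape kernel: the caption does not specify the exact output layout, so the sketch assumes non-overlapping tiles, row-major tile ordering, and matrix dimensions that are multiples of the tile size. The function name `reshape_into_tiles` and the 8 x 8 example input are hypothetical and chosen only for illustration.

```python
def reshape_into_tiles(matrix, tile=4):
    """Split a 2D list into non-overlapping tile x tile submatrices.

    Assumption (not from the source): tiles do not overlap and are
    returned in row-major order of their top-left corners.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if rows % tile or cols % tile:
        raise ValueError("matrix dimensions must be multiples of the tile size")

    tiles = []
    for r0 in range(0, rows, tile):          # top row of each tile
        for c0 in range(0, cols, tile):      # left column of each tile
            # Copy the tile x tile block starting at (r0, c0).
            tiles.append([row[c0:c0 + tile] for row in matrix[r0:r0 + tile]])
    return tiles


# Hypothetical 8 x 8 input whose entry at (r, c) is r * 8 + c.
matrix = [[r * 8 + c for c in range(8)] for r in range(8)]
tiles = reshape_into_tiles(matrix, tile=4)
```

With a tile size of 4, the 8 x 8 example yields four 4 x 4 submatrices. On a GPU, each submatrix would typically be handled by its own thread or thread block, but that mapping is outside the scope of this sketch.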