Research Article
High Performance Implementation of 3D Convolutional Neural Networks on a GPU
Algorithm 2
Convolutional layer implemented with WMFA $F(m \times m \times m, r \times r \times r)$.
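$P$ is the number of image tiles.
$\alpha = m + r - 1$ is the input tile size.
Neighbouring tiles overlap by $r - 1$.
$d_{c,b}$ is input tile $b$ in channel $c$.
$g_{k,c}$ is filter $k$ in channel $c$.
$Y_{k,b}$ is output tile $b$ in filter $k$.

for $k = 0$ to $K$ do
    for $c = 0$ to $C$ do
        Transform filter $g_{k,c}$ into $u$
        Scatter $u$ to matrices $U$: $U^{(\xi,\nu,\mu)}_{k,c} = u_{\xi,\nu,\mu}$
    end for
end for
for $b = 0$ to $P$ do
    for $c = 0$ to $C$ do
        Transform input tile $d_{c,b}$ into $v$
        Scatter $v$ to matrices $V$: $V^{(\xi,\nu,\mu)}_{c,b} = v_{\xi,\nu,\mu}$
    end for
end for
for $\xi = 0$ to $\alpha$ do
    for $\nu = 0$ to $\alpha$ do
        for $\mu = 0$ to $\alpha$ do
            $M^{(\xi,\nu,\mu)} = U^{(\xi,\nu,\mu)} V^{(\xi,\nu,\mu)}$
        end for
    end for
end for
for $k = 0$ to $K$ do
    for $b = 0$ to $P$ do
        Gather $m$ from matrices $M$: $m_{\xi,\nu,\mu} = M^{(\xi,\nu,\mu)}_{k,b}$
        Inverse-transform $m$ into output tile $Y_{k,b}$
    end for
end for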
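The four stages of the algorithm (filter transform and scatter, input transform and scatter, one matrix product per transformed position, gather and inverse transform) can be illustrated with a minimal sketch. It is deliberately reduced to one dimension, $F(2, 3)$, so it fits in a few lines; the 3D layer nests the same transforms along three axes. The transform matrices are the standard $F(2, 3)$ values, while the function names `winograd_layer` and `direct_layer`, the list-based data layout, and the use of exact `Fraction` arithmetic are assumptions made for this illustration and are not taken from the article's GPU implementation.

```python
from fractions import Fraction

# Standard 1D F(2, 3) Winograd transforms (illustrative; the article uses 3D tiles).
M_OUT, R = 2, 3                 # outputs per tile, filter length
ALPHA = M_OUT + R - 1           # input tile size (4)
H = Fraction(1, 2)
G = [[1, 0, 0], [H, H, H], [H, -H, H], [0, 0, 1]]               # filter transform
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]  # data transform
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]                              # inverse transform


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in mat]


def winograd_layer(signal, filters):
    """signal[c][x] has C channels; filters[k][c][i] has K filters. Returns out[k][x]."""
    C, K = len(signal), len(filters)
    out_w = len(signal[0]) - R + 1
    assert out_w % M_OUT == 0, "sketch assumes the output splits evenly into tiles"
    P = out_w // M_OUT          # number of tiles

    # Stage 1: transform each filter and scatter it to U[xi][k][c].
    U = [[[0] * C for _ in range(K)] for _ in range(ALPHA)]
    for k in range(K):
        for c in range(C):
            u = matvec(G, filters[k][c])
            for xi in range(ALPHA):
                U[xi][k][c] = u[xi]

    # Stage 2: transform each input tile and scatter it to V[xi][c][b].
    # Tiles start every M_OUT samples, so neighbours overlap by R - 1.
    V = [[[0] * P for _ in range(C)] for _ in range(ALPHA)]
    for b in range(P):
        for c in range(C):
            v = matvec(BT, signal[c][b * M_OUT: b * M_OUT + ALPHA])
            for xi in range(ALPHA):
                V[xi][c][b] = v[xi]

    # Stage 3: one (K x C) by (C x P) matrix product per transformed position xi.
    M = [[[sum(U[xi][k][c] * V[xi][c][b] for c in range(C))
           for b in range(P)] for k in range(K)] for xi in range(ALPHA)]

    # Stage 4: gather each tile from M and apply the inverse transform.
    out = [[0] * out_w for _ in range(K)]
    for k in range(K):
        for b in range(P):
            m = [M[xi][k][b] for xi in range(ALPHA)]
            out[k][b * M_OUT:(b + 1) * M_OUT] = matvec(AT, m)
    return out


def direct_layer(signal, filters):
    """Reference: direct sliding-window correlation summed over channels."""
    C, K = len(signal), len(filters)
    out_w = len(signal[0]) - R + 1
    return [[sum(signal[c][x + i] * filters[k][c][i]
                 for c in range(C) for i in range(R))
             for x in range(out_w)] for k in range(K)]
```

Because the channel sum is folded into Stage 3, the per-position products are ordinary matrix multiplications, which is what makes this formulation a natural fit for batched matrix-multiply routines on a GPU. Comparing `winograd_layer` against `direct_layer` on small integer inputs is a quick way to check the transforms.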