Research Article

High Performance Implementation of 3D Convolutional Neural Networks on a GPU

Algorithm 2

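Convolutional layer implemented with WMFA $F(m \times m \times m, r \times r \times r)$.
$P$ is the number of image tiles.
$\alpha = m + r - 1$ is the input tile size.
Neighbouring tiles overlap by $r - 1$.
$d_{c,b} \in \mathbb{R}^{\alpha \times \alpha \times \alpha}$ is input tile $b$ in channel $c$.
$g_{k,c} \in \mathbb{R}^{r \times r \times r}$ is filter $k$ in channel $c$.
$Y_{k,b} \in \mathbb{R}^{m \times m \times m}$ is output tile $b$ in filter $k$.
for $k = 0$ to $K$ do
    for $c = 0$ to $C$ do
        Transform filter $g_{k,c}$ into $u \in \mathbb{R}^{\alpha \times \alpha \times \alpha}$
        Scatter $u$ to matrices $U$: $U^{(\xi,\nu,\zeta)}_{k,c} = u_{\xi,\nu,\zeta}$
    end for
end for
for $b = 0$ to $P$ do
    for $c = 0$ to $C$ do
        Transform input tile $d_{c,b}$ into $v \in \mathbb{R}^{\alpha \times \alpha \times \alpha}$
        Scatter $v$ to matrices $V$: $V^{(\xi,\nu,\zeta)}_{c,b} = v_{\xi,\nu,\zeta}$
    end for
end for
for $\xi = 0$ to $\alpha$ do
    for $\nu = 0$ to $\alpha$ do
        for $\zeta = 0$ to $\alpha$ do
            $M^{(\xi,\nu,\zeta)} = U^{(\xi,\nu,\zeta)} V^{(\xi,\nu,\zeta)}$
        end for
    end for
end for
for $k = 0$ to $K$ do
    for $b = 0$ to $P$ do
        Gather $m$ from matrices $M$: $m_{\xi,\nu,\zeta} = M^{(\xi,\nu,\zeta)}_{k,b}$
        Apply the inverse transform to $m$ to obtain $Y_{k,b}$
    end for
end for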
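
The four stages of the algorithm above (filter transform, input-tile transform, one matrix product per transform-domain coordinate, and inverse transform) can be sketched on the CPU with NumPy. This is only an illustrative sketch under several assumptions that are not taken from the article: it fixes the tile sizes to $F(2 \times 2 \times 2, 3 \times 3 \times 3)$, uses the widely published 1D $F(2, 3)$ Winograd matrices applied along each of the three axes, handles a single image with no padding, and requires the output dimensions to be multiples of 2. The names `transform3d`, `winograd_conv3d`, and `direct_conv3d` are hypothetical, and `np.einsum` stands in for the batched GPU matrix multiplications.

```python
import numpy as np

# 1D Winograd F(2, 3) transform matrices; the 3D transforms apply them per axis.
G = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.0, 0.0, 1.0]])
BT = np.array([[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]])
AT = np.array([[1.0, 1.0, 1.0, 0.0], [0.0, 1.0, -1.0, -1.0]])
M_OUT, R = 2, 3            # output tile size m and filter size r (assumed values)
ALPHA = M_OUT + R - 1      # input tile size


def transform3d(T, x):
    """Apply the matrix T along each of the last three axes of x."""
    return np.einsum('ia,jb,kc,...abc->...ijk', T, T, T, x)


def winograd_conv3d(x, w):
    """x: (C, D, H, W) input, w: (K, C, 3, 3, 3) filters; no padding, stride 1."""
    C, D, H, W = x.shape
    K = w.shape[0]
    out_dims = (D - R + 1, H - R + 1, W - R + 1)
    assert all(n % M_OUT == 0 for n in out_dims), "output dims must be multiples of m"
    tiles = tuple(n // M_OUT for n in out_dims)
    P = tiles[0] * tiles[1] * tiles[2]          # number of tiles for one image

    # Filter transform: U[k, c] holds the ALPHA^3 transformed filter values.
    U = transform3d(G, w)
    # Cut overlapping input tiles (stride m, overlap r - 1), then transform them.
    d = np.empty((C, P, ALPHA, ALPHA, ALPHA))
    for b, (i, j, l) in enumerate(np.ndindex(*tiles)):
        z, y, s = i * M_OUT, j * M_OUT, l * M_OUT
        d[:, b] = x[:, z:z + ALPHA, y:y + ALPHA, s:s + ALPHA]
    V = transform3d(BT, d)
    # One (K x C) @ (C x P) product per transform-domain coordinate.
    M = np.einsum('kcxyz,cpxyz->kpxyz', U, V)
    # Inverse transform each gathered tile and place it in the output volume.
    Y = transform3d(AT, M)
    out = np.empty((K,) + out_dims)
    for b, (i, j, l) in enumerate(np.ndindex(*tiles)):
        z, y, s = i * M_OUT, j * M_OUT, l * M_OUT
        out[:, z:z + M_OUT, y:y + M_OUT, s:s + M_OUT] = Y[:, b]
    return out


def direct_conv3d(x, w):
    """Reference: direct 3D correlation, used only to check the sketch."""
    C, D, H, W = x.shape
    out = np.zeros((w.shape[0], D - R + 1, H - R + 1, W - R + 1))
    for dz in range(R):
        for dy in range(R):
            for dx in range(R):
                patch = x[:, dz:dz + D - R + 1, dy:dy + H - R + 1, dx:dx + W - R + 1]
                out += np.einsum('kc,czyx->kzyx', w[:, :, dz, dy, dx], patch)
    return out
```

Comparing `winograd_conv3d(x, w)` with `direct_conv3d(x, w)` on random inputs (for example with `np.allclose`) is a quick way to confirm that the tiled formulation reproduces the direct convolution up to floating-point rounding.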