input:
output:
(1) for all threads in each thread block do in parallel
(2) local variable
(3)
(4)
(5)
Algorithm 2: CUDA kernel for constant-vector multiplication and vector-vector addition.
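The step bodies of Algorithm 2 are not legible here, so the following is only a minimal sketch of what a kernel of this kind might look like. It assumes the operation is the elementwise update z = a*x + y, with one thread per element and a per-thread local index. The names `scale_add_kernel`, `scale_add`, `a`, `x`, `y`, `z`, and `n`, the `float` element type, and the block size of 256 are all hypothetical choices made for illustration, not taken from the algorithm itself.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical kernel (assumed form): z[i] = a * x[i] + y[i].
// Every thread of every block handles at most one element.
__global__ void scale_add_kernel(float a, const float* x, const float* y,
                                 float* z, int n)
{
    // Local variable: the global index of this thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Guard against the threads of the last block that fall past the end.
    if (i < n) {
        z[i] = a * x[i] + y[i];
    }
}

// Turn a CUDA runtime error code into a C++ exception.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(err));
    }
}

// Hypothetical host wrapper: copies the inputs to the device, launches the
// kernel, and copies the result back. Device buffers are not released if a
// CUDA call fails, which is acceptable for a sketch only.
std::vector<float> scale_add(float a, const std::vector<float>& x,
                             const std::vector<float>& y)
{
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    std::vector<float> z(n);
    if (n == 0) {
        return z;
    }

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dx = nullptr;
    float* dy = nullptr;
    float* dz = nullptr;
    check(cudaMalloc(&dx, bytes));
    check(cudaMalloc(&dy, bytes));
    check(cudaMalloc(&dz, bytes));
    check(cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice));

    // Assumed launch configuration: enough blocks of 256 threads to cover n.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_add_kernel<<<blocks, threads>>>(a, dx, dy, dz, n);
    check(cudaGetLastError());

    // cudaMemcpy waits for the kernel to finish before copying.
    check(cudaMemcpy(z.data(), dz, bytes, cudaMemcpyDeviceToHost));

    check(cudaFree(dx));
    check(cudaFree(dy));
    check(cudaFree(dz));
    return z;
}
```

Under these assumptions, calling `scale_add(2.0f, x, y)` on two equally sized host vectors returns a new vector holding 2*x + y elementwise. The bounds check in the kernel is what lets the grid be rounded up to a whole number of blocks when `n` is not a multiple of the block size.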