input:
output:
(1) for all threads in each thread block do in parallel
(2)         local variable
(3)         
(4)         
(5)         
Algorithm 2: CUDA kernel for constant vector multiplication and vector vector addition.
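
As a rough illustration of the kind of kernel this caption describes, the sketch below assumes the operation has the form z = alpha * x + y, with one thread handling one or more elements. The names scale_add_kernel, scale_add, alpha, x, y and z, the use of single precision, and the block size of 256 are all assumptions made for illustration; they are not taken from the listing above.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (assumed form): z[i] = alpha * x[i] + y[i] for 0 <= i < n.
__global__ void scale_add_kernel(int n, float alpha,
                                 const float* x, const float* y, float* z)
{
    // Local variable: the global index of this thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Grid-stride loop, so every element is covered for any grid size.
    for (; i < n; i += blockDim.x * gridDim.x) {
        z[i] = alpha * x[i] + y[i];
    }
}

// Hypothetical host wrapper: copies the inputs to the device, launches the
// kernel, and copies the result back.
std::vector<float> scale_add(float alpha,
                             const std::vector<float>& x,
                             const std::vector<float>& y)
{
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    std::vector<float> z(n);
    if (n == 0) return z;

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dx = nullptr, *dy = nullptr, *dz = nullptr;
    cudaError_t err = cudaMalloc(&dx, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dy, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dz, bytes);
    assert(err == cudaSuccess);

    cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                        // assumed block size
    const int blocks = (n + threads - 1) / threads; // enough blocks to cover n
    scale_add_kernel<<<blocks, threads>>>(n, alpha, dx, dy, dz);
    err = cudaGetLastError();                       // catches launch errors
    assert(err == cudaSuccess);

    // This blocking copy also waits for the kernel to finish.
    err = cudaMemcpy(z.data(), dz, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(dx);
    cudaFree(dy);
    cudaFree(dz);
    return z;
}
```

Calling scale_add(2.0f, x, y) on two equally sized host vectors returns a new vector holding 2 * x[i] + y[i] in each position. The grid-stride loop is a design choice of this sketch: it keeps the kernel correct even when the launch uses fewer threads than there are elements.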