Mathematical Problems in Engineering

Research Article

A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

Kernel 2.

Input: ptr, v
CUDA-specific variables:
(i) threadIdx.x: index of the thread within its block
(ii) blockIdx.x: index of the block within the grid
(iii) blockDim.x: number of threads per block
(iv) gridDim.x: number of blocks per grid
Output: y
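(01) define shared memory ptr_s with size blockDim.x + 1
(02) define shared memory v_s with size SHARED_SIZE
(03) tid ← threadIdx.x + blockIdx.x × blockDim.x;
(04) tid_local ← threadIdx.x;
/* Load ptr into the shared memory ptr_s */
(05) ptr_s[tid_local] ← ptr[tid];
(06) if tid_local == 0 then ptr_s[blockDim.x] ← ptr[tid + blockDim.x];
(07) __syncthreads();
(08) temp ← (ptr_s[blockDim.x] − ptr_s[0]) / blockDim.x + 1;
(09) nlen ← min(temp, SHARED_SIZE / blockDim.x);
(10) sum ← 0.0; maxNz ← ptr_s[blockDim.x];
(11) for i ← ptr_s[0] to maxNz − 1 with i += nlen × blockDim.x do
(12) index ← i + tid_local;
(13) __syncthreads();
/* Load v into the shared memory v_s */
(14) for j ← 0 to nlen − 1 do
(15) if index < maxNz then
(16) v_s[tid_local + j × blockDim.x] ← v[index];
(17) index += blockDim.x;
(18) end
(19) done
(20) __syncthreads();
/* Perform a scalar-style reduction */
(21) if (ptr_s[tid_local + 1] ≤ i or ptr_s[tid_local] > i + nlen × blockDim.x − 1) is false then
(22) row_s ← max(ptr_s[tid_local] − i, 0);
(23) row_e ← min(ptr_s[tid_local + 1] − i, nlen × blockDim.x);
(24) for j ← row_s to row_e − 1 do
(25) sum += v_s[j];
(26) done
(27) end
(28) done
(29) y[tid] ← sum;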
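The subtle step in the reduction (lines (21) to (23)) is that a row may begin in one shared-memory chunk and end in a later one. The C++ helper below isolates that test and the clipping step. The name chunk_window is hypothetical, and the helper assumes that a chunk covers the global positions [i, i + chunkLen) of the product array, where chunkLen is the number of elements staged per iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Sketch of the overlap test and clipping (hypothetical helper, not the authors' code).
// A row occupies [rowStart, rowEnd) in the global product array; the current chunk
// occupies [i, i + chunkLen). Returns the chunk-local range [row_s, row_e) that the
// row's thread must add up, or {0, 0} when the row and the chunk do not overlap.
std::pair<int, int> chunk_window(int rowStart, int rowEnd, int i, int chunkLen) {
    const bool disjoint = rowEnd <= i || rowStart > i + chunkLen - 1;
    if (disjoint) return {0, 0};
    const int row_s = std::max(rowStart - i, 0);         // clip at the chunk start
    const int row_e = std::min(rowEnd - i, chunkLen);    // clip at the chunk end
    return {row_s, row_e};
}
```

Take a row occupying positions [5, 19) with chunkLen = 8. The chunks starting at 0, 8 and 16 give the windows [5, 8), [0, 8) and [0, 3). Together these cover all 14 nonzeros of the row. The chunk starting at 24 is rejected by the disjointness test.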
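The kernel's data flow can be checked without a GPU. The C++ sketch below emulates it sequentially and is not the authors' implementation. Each loop over t stands for one phase of a thread block, so the boundaries between those loops take the place of the __syncthreads() barriers.

The sketch rests on these assumptions:
- v[k] already holds the product val[k] × x[col[k]] for the k-th nonzero.
- ptr is the CSR row pointer with n + 1 entries.
- The hypothetical parameter kSharedSize, which stands for the capacity of the shared array v_s, is at least blockDim.
- Row indices past the last row are clamped to n, so the final, partially filled block reads ptr[n] instead of running off the end of the array. The listing does not state how that case is handled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sequential emulation of Kernel 2 (sketch). Assumes kSharedSize >= blockDim and
// that v[k] = val[k] * x[col[k]] was computed beforehand.
std::vector<double> kernel2_emulated(const std::vector<int>& ptr,
                                     const std::vector<double>& v,
                                     int n, int blockDim, int kSharedSize) {
    const int gridDim = (n + blockDim - 1) / blockDim;
    std::vector<double> y(n, 0.0);
    std::vector<int> ptr_s(blockDim + 1);   // stands in for shared memory ptr_s
    std::vector<double> v_s(kSharedSize);   // stands in for the shared product buffer
    std::vector<double> sum(blockDim);      // one accumulator per thread

    for (int b = 0; b < gridDim; ++b) {
        // Load this block's slice of ptr; indices past the last row are clamped to n.
        for (int t = 0; t < blockDim; ++t) {
            const int tid = t + b * blockDim;
            ptr_s[t] = ptr[std::min(tid, n)];
            if (t == 0) ptr_s[blockDim] = ptr[std::min(tid + blockDim, n)];
        }
        // Chunk size: nlen elements per thread, capped by the shared capacity.
        const int temp = (ptr_s[blockDim] - ptr_s[0]) / blockDim + 1;
        const int nlen = std::min(temp, kSharedSize / blockDim);
        const int chunk = nlen * blockDim;
        const int maxNz = ptr_s[blockDim];
        std::fill(sum.begin(), sum.end(), 0.0);

        for (int i = ptr_s[0]; i < maxNz; i += chunk) {
            // Cooperative, strided load; afterwards v_s[k] == v[i + k] for valid k.
            for (int t = 0; t < blockDim; ++t) {
                int index = i + t;
                for (int j = 0; j < nlen; ++j) {
                    if (index < maxNz) {
                        v_s[t + j * blockDim] = v[index];
                        index += blockDim;
                    }
                }
            }
            // Each thread adds the part of its own row that lies inside this chunk.
            for (int t = 0; t < blockDim; ++t) {
                const int rowStart = ptr_s[t];
                const int rowEnd = ptr_s[t + 1];
                if (rowEnd <= i || rowStart > i + chunk - 1) continue;
                const int row_s = std::max(rowStart - i, 0);
                const int row_e = std::min(rowEnd - i, chunk);
                for (int j = row_s; j < row_e; ++j) sum[t] += v_s[j];
            }
        }
        // Write back, skipping threads that have no row.
        for (int t = 0; t < blockDim; ++t) {
            const int tid = t + b * blockDim;
            if (tid < n) y[tid] = sum[t];
        }
    }
    return y;
}

// Plain per-row sums of v, used as a reference.
std::vector<double> reference_row_sums(const std::vector<int>& ptr,
                                       const std::vector<double>& v, int n) {
    std::vector<double> y(n, 0.0);
    for (int r = 0; r < n; ++r)
        for (int k = ptr[r]; k < ptr[r + 1]; ++k) y[r] += v[k];
    return y;
}
```

Take ptr = {0, 2, 2, 5, 6, 10} and v[k] = k + 1 for k = 0, ..., 9. With these inputs, kernel2_emulated(ptr, v, 5, 2, 2) and kernel2_emulated(ptr, v, 5, 4, 64) both return the row sums {3, 0, 12, 6, 34}. The first call forces one element per thread per chunk, so rows 2 and 4 are each split across two chunks.

The load step has two useful properties. Because thread t copies v[i + t + j × blockDim] into position t + j × blockDim, neighbouring threads read neighbouring addresses. The buffer also ends up holding v[i + k] at position k, which is why the reduction can index it with offsets relative to i.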