Research Article
A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs
Input:
CUDA-specific variables:
  (i) threadIdx.x: a thread
  (ii) blockIdx.x: a block
  (iii) blockDim.x: number of threads per block
  (iv) gridDim.x: number of blocks per grid
Output:
(01) define shared memory with size
(02) define shared memory with size
(03) threadIdx.x + blockIdx.x × blockDim.x;
(04) threadIdx.x;
     /* Load ptr into the shared memory ptr_s */
(05) [] [];
(06) if == 0 then _s[] [ + ];
(07) __syncthreads();
(08) ([] − )/ + 1;
(09) ( , );
(10) 0.0; [];
(11) for to with += do
(12)   index ;
(13)   __syncthreads();
       /* Load into the shared memory */
(14)   for to do
(15)     if then
(16)       [ ] ;
(17)       += ;
(18)     end
(19)   done
(20)   __syncthreads();
       /* Perform a scalar-style reduction */
(21)   if ( or ) is false then
(22)     ([] − );
(23)     ;
(24)     for to do
(25)       += ;
(26)     done
(27)   end
(28) done
(29) ;
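The two commented stages of the listing above (staging values in a buffer, then a scalar-style reduction per row) can be illustrated with a sequential host-side sketch. This is not the kernel of the listing: it assumes that the staged values are the per-nonzero products of the matrix values with the matching entries of the input vector, and it replaces the shared-memory tiles and thread indexing with a single loop over all nonzeros. The names spmv_csr_two_phase, val, col, ptr, x, products, and y are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential sketch of a two-phase CSR SpMV, y = A * x.
// Assumption: the matrix is stored in CSR form as
//   val : nonzero values, row by row
//   col : column index of each nonzero
//   ptr : ptr[i]..ptr[i+1]-1 are the positions of row i in val/col
// All names are illustrative and are not taken from the listing.
std::vector<double> spmv_csr_two_phase(const std::vector<double>& val,
                                       const std::vector<int>& col,
                                       const std::vector<int>& ptr,
                                       const std::vector<double>& x) {
    assert(!ptr.empty());
    assert(val.size() == col.size());
    assert(static_cast<std::size_t>(ptr.back()) == val.size());

    const std::size_t num_rows = ptr.size() - 1;

    // Phase 1: one product per stored nonzero. On a GPU this buffer would
    // be a shared-memory tile filled cooperatively by the threads of a block.
    std::vector<double> products(val.size());
    for (std::size_t k = 0; k < val.size(); ++k) {
        products[k] = val[k] * x[static_cast<std::size_t>(col[k])];
    }

    // Phase 2: scalar-style reduction, i.e. one sequential sum per row
    // over the half-open range [ptr[row], ptr[row + 1]).
    std::vector<double> y(num_rows, 0.0);
    for (std::size_t row = 0; row < num_rows; ++row) {
        double sum = 0.0;
        for (int k = ptr[row]; k < ptr[row + 1]; ++k) {
            sum += products[static_cast<std::size_t>(k)];
        }
        y[row] = sum;
    }
    return y;
}
```

For the 3 × 3 matrix with rows (1, 0, 2), (0, 0, 0), and (0, 3, 4), the CSR arrays are val = {1, 2, 3, 4}, col = {0, 2, 1, 2}, and ptr = {0, 2, 2, 4}. With x = (1, 2, 3) the function returns y = (7, 0, 18). The empty second row is handled by the empty range ptr[1] = ptr[2].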