Research Article

Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU

Table 3

Speedups of IPCSR over other implementation techniques in single precision.

Matrix            Speedups on C2050                                   Speedups on K20c
                  CUSPARSE  CUSP  clSpMV  PCSR  CSR5  CSR-Adaptive    CUSPARSE  CUSP  clSpMV  PCSR  CSR5  CSR-Adaptive

Dense                 1.04  1.00    3.16  3.30  1.05          1.03        1.24  1.03    3.09  3.25  1.07          1.04
Protein               1.24  1.01    1.22  1.47  1.08          0.97        1.55  1.21    1.22  1.48  1.15          1.00
FEM/Spheres           1.79  1.34    1.04  1.43  1.27          1.00        2.24  1.61    1.04  1.44  1.35          1.07
FEM/Cantilever        1.71  1.31    1.09  1.48  1.09          1.00        2.14  1.65    1.09  1.50  1.16          1.08
Wind Tunnel           1.98  1.23    0.95  1.06  1.04          1.00        2.49  1.59    0.95  1.07  1.11          1.07
FEM/Harbor            1.84  1.44    1.78  1.69  1.09          1.02        2.31  1.59    1.78  1.70  1.16          1.08
QCD                   2.42  0.96    1.07  1.34  1.05          1.03        3.04  1.03    1.07  1.35  1.12          1.05
FEM/Ship              1.89  1.62    1.12  1.57  1.11          1.03        2.38  1.92    1.12  1.59  1.18          1.10
Economics             2.91  1.86    1.80  1.66  1.34          1.02        3.83  2.38    1.89  1.75  1.50          1.04
Epidemiology          1.71  0.65    0.76  1.03  1.03          1.00        2.15  0.79    0.76  1.04  1.10          1.02
FEM/Accelerator       2.32  2.14    1.29  1.52  1.16          1.01        2.91  1.77    1.29  1.53  1.24          1.07
Circuit               2.66  2.16    1.63  1.58  1.58          1.04        3.51  2.76    1.71  1.67  1.76          1.07
Webbase               7.43  1.29    1.27  1.11  1.09          1.01        9.64  1.62    1.31  1.15  1.20          1.03
LP                    1.58  1.20    1.45  1.60  1.21          1.05        1.99  1.54    1.45  1.61  1.29          1.10
circuit5M            12.55  1.63    1.78  1.78  1.53          1.06       16.16  1.56    1.61  1.83  1.67          1.14
eu-2005               2.89  1.95    2.20  2.86  1.31          1.02        3.71  2.44    2.25  2.96  1.43          1.11
Ga41As41H72           1.60  1.41    1.71  1.98  1.34          1.02        2.01  2.02    1.71  2.00  1.43          1.09
in-2004               2.51  1.76    2.14  2.75  1.35          1.01        3.23  2.59    2.20  2.84  1.47          1.08
mip1                  1.64  1.37    1.91  2.04  1.30          0.97        2.06  1.95    1.91  2.06  1.39          1.00
Si41Ge41H72           1.45  1.29    1.80  1.97  1.17          1.02        1.87  1.88    1.84  2.03  1.27          1.09