Research Article

Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU

Table 4

Speedups of IPCSR over other implementation techniques in double precision.

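| Matrix | CUSPARSE (C2050) | CUSP (C2050) | clSpMV (C2050) | PCSR (C2050) | CSR5 (C2050) | CSR-Adaptive (C2050) | CUSPARSE (K20c) | CUSP (K20c) | clSpMV (K20c) | PCSR (K20c) | CSR5 (K20c) | CSR-Adaptive (K20c) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dense | 0.91 | 0.95 | 2.06 | 2.72 | 1.01 | 1.00 | 1.08 | 0.97 | 2.01 | 2.68 | 1.05 | 1.01 |
| Protein | 0.87 | 0.92 | 0.94 | 1.08 | 1.01 | 0.95 | 1.09 | 1.10 | 0.94 | 1.09 | 1.08 | 0.99 |
| FEM/Spheres | 1.37 | 1.39 | 1.29 | 1.36 | 1.22 | 1.00 | 1.72 | 1.66 | 1.29 | 1.37 | 1.30 | 1.07 |
| FEM/Cantilever | 1.35 | 1.41 | 1.26 | 1.55 | 1.17 | 1.04 | 1.70 | 1.77 | 1.26 | 1.57 | 1.25 | 1.11 |
| Wind Tunnel | 1.34 | 1.21 | 0.82 | 1.27 | 0.86 | 0.83 | 1.82 | 1.69 | 0.88 | 1.38 | 1.00 | 0.97 |
| FEM/Harbor | 1.13 | 1.22 | 1.05 | 1.28 | 1.06 | 1.02 | 1.42 | 1.35 | 1.05 | 1.29 | 1.14 | 1.09 |
| QCD | 1.49 | 0.71 | 0.96 | 1.16 | 1.05 | 1.01 | 1.87 | 0.77 | 0.70 | 1.17 | 1.13 | 1.09 |
| FEM/Ship | 1.43 | 1.60 | 1.19 | 1.23 | 1.08 | 1.01 | 1.79 | 1.89 | 1.19 | 1.24 | 1.15 | 1.08 |
| Economics | 2.28 | 1.68 | 1.67 | 1.59 | 1.15 | 1.056 | 2.87 | 2.04 | 1.67 | 1.60 | 1.23 | 1.08 |
| Epidemiology | 1.42 | 0.90 | 0.74 | 1.07 | 1.01 | 1.00 | 1.79 | 1.10 | 0.74 | 1.08 | 1.08 | 1.03 |
| FEM/Accelerator | 1.70 | 1.65 | 0.92 | 1.48 | 1.07 | 1.00 | 2.13 | 1.37 | 0.92 | 1.49 | 1.15 | 1.07 |
| Circuit | 1.94 | 1.85 | 1.25 | 1.11 | 1.06 | 1.02 | 2.56 | 2.36 | 1.31 | 1.17 | 1.19 | 1.06 |
| Webbase | 6.11 | 1.29 | 0.93 | 1.19 | 1.17 | 1.05 | 7.93 | 1.63 | 0.96 | 1.24 | 1.29 | 1.08 |
| LP | 1.39 | 1.28 | 1.29 | 1.79 | 1.28 | 1.01 | 1.75 | 1.64 | 1.18 | 1.81 | 1.37 | 1.08 |
| circuit5M | 7.61 | 1.71 | 1.15 | 1.74 | 1.10 | 1.06 | 9.80 | 1.64 | 1.18 | 1.80 | 1.21 | 1.16 |
| eu-2005 | 1.90 | 1.78 | 1.52 | 2.06 | 1.17 | 1.03 | 2.44 | 2.22 | 1.56 | 2.13 | 1.29 | 1.13 |
| Ga41As41H72 | 1.06 | 1.19 | 1.09 | 1.23 | 1.08 | 1.03 | 1.33 | 1.70 | 1.09 | 1.24 | 1.16 | 1.11 |
| in-2004 | 1.69 | 1.41 | 1.31 | 1.92 | 1.23 | 1.14 | 2.18 | 2.08 | 1.35 | 1.98 | 1.35 | 1.26 |
| mip1 | 1.10 | 1.11 | 1.16 | 1.87 | 1.08 | 0.98 | 1.39 | 1.59 | 1.16 | 1.88 | 1.15 | 1.03 |
| Si41Ge41H72 | 1.45 | 1.29 | 1.97 | 2.50 | 1.26 | 1.01 | 1.82 | 1.84 | 1.80 | 1.99 | 1.35 | 1.08 |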