Research Article

A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)

Table 3

FPGA accelerator performance versus a high-end quad-core PC.

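| Matrix size | Ops (MMACs) | PC time (s) | FPGA compute time (s) | FPGA system cycles | FPGA core cycles | System-to-core ratio | PC perf (GMACS) | FPGA perf (GMACS) | PC vs. FPGA | FPGA perf at 1:2 ratio (GMACS) | FPGA vs. PC at 1:2 ratio |
|-------------|-------------|-------------|-----------------------|--------------------|------------------|----------------------|-----------------|-------------------|-------------|--------------------------------|--------------------------|
|             | 68,700      | 5.25        | 12.7                  | 2.53 G             | 491 M            | 5.15                 | 13.1            | 5.43              |             | 56                             |                          |
|             | 8,500       | 0.704       | 1.65                  | 328 M              | 63.9 M           | 5.13                 | 12.1            | 5.22              |             | 54                             |                          |
|             | 1,070       | 0.120       | 0.221                 | 44 M               | 8.62 M           | 5.10                 | 8.91            | 4.86              |             | 50                             |                          |
|             | 134         | 0.0235      | 0.0383                | 7.6 M              | 1.37 M           | 5.54                 | 5.70            | 3.50              |             | 39                             |                          |
|             | 2.1         | 0.00081     | 0.00207               | 415 k              | 59 k             | 7.03                 | 2.59            | 1.01              |             | 14                             |                          |

|             | 17,045      | 1.403       | 2.629                 | 526 M              | 105 M            | 5.00                 | 12.1            | 6.48              |             | 65                             |                          |
|             | 42.6        | 0.0076      | 0.0121                | 2.4 M              | 460 k            | 5.21                 | 5.58            | 3.5               |             | 36                             |                          |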

Entries in bold are emphasized for comparison.
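The derived columns of Table 3 follow from the raw measurements by simple arithmetic. Throughput is Ops divided by time, and the system-to-core ratio is system cycles divided by core cycles. The Python sketch below recomputes them for one row. It assumes a 200 MHz FPGA system clock, which the table does not state. That value is inferred because system cycles divided by FPGA compute time comes to roughly 2.0 × 10^8 in every row. The sketch also assumes that the "1:2 ratio" projection divides the core cycle count by twice that clock. This reproduces the listed values (for example, 56 GMACS for the first row), but it is an interpretation of the column, not a definition given in the table. The names `Row`, `gmacs`, `derived`, and `CLOCK_HZ` are hypothetical.

```python
"""Recompute the derived columns of Table 3 from its raw measurements.

Assumption (not stated in the table): the FPGA system clock is 200 MHz,
inferred because system cycles / FPGA compute time is close to 2.0e8
in every row.
"""

from dataclasses import dataclass

CLOCK_HZ = 200e6  # hypothetical: assumed FPGA system clock, inferred from the table


@dataclass
class Row:
    mmacs: float        # Ops, in millions of multiply-accumulates
    pc_s: float         # PC runtime in seconds
    fpga_s: float       # FPGA compute time in seconds
    sys_cycles: float   # FPGA system cycles
    core_cycles: float  # FPGA core cycles


def gmacs(mmacs: float, seconds: float) -> float:
    """Throughput in GMACS: (MMACs / s) / 1000."""
    return mmacs / seconds / 1e3


def derived(row: Row) -> dict:
    """Return the derived performance columns for one table row."""
    # Assumed reading of the "1:2 ratio" column: the core cycle count
    # divided by twice the assumed system clock.
    projected_s = row.core_cycles / (2 * CLOCK_HZ)
    return {
        "sys_to_core": row.sys_cycles / row.core_cycles,
        "pc_gmacs": gmacs(row.mmacs, row.pc_s),
        "fpga_gmacs": gmacs(row.mmacs, row.fpga_s),
        "fpga_gmacs_1to2": gmacs(row.mmacs, projected_s),
        # Sanity check on the clock assumption: cycles per second of FPGA time.
        "implied_clock_hz": row.sys_cycles / row.fpga_s,
    }


if __name__ == "__main__":
    first = Row(68_700, 5.25, 12.7, 2.53e9, 491e6)  # first row of Table 3
    for name, value in derived(first).items():
        print(f"{name}: {value:.3g}")
```

For the first row this gives a system-to-core ratio of about 5.15, about 13.1 GMACS for the PC, and about 56 GMACS at the 1:2 ratio. For the FPGA it gives about 5.41 GMACS, against 5.43 in the table, a gap consistent with rounding of the listed compute time.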