Research Article
A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)
Table 3
FPGA accelerator performance versus a high-end quad-core PC.
| Matrix size | Ops (MMACs) | PC time (s) | FPGA compute time (s) | FPGA system cycles | FPGA core cycles | System to core ratio | PC perf (GMACS) | FPGA perf (GMACS) | PC versus FPGA | FPGA perf at 1:2 ratio | FPGA versus PC at 1:2 ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 68,700 | 5.25 | 12.7 | 2.53 G | 491 M | 5.15 | 13.1 | 5.43 | | 56 | |
| | 8,500 | 0.704 | 1.65 | 328 M | 63.9 M | 5.13 | 12.1 | 5.22 | | 54 | |
| | 1,070 | 0.120 | 0.221 | 44 M | 8.62 M | 5.10 | 8.91 | 4.86 | | 50 | |
| | 134 | 0.0235 | 0.0383 | 7.6 M | 1.37 M | 5.54 | 5.70 | 3.50 | | 39 | |
| | 2.1 | 0.00081 | 0.00207 | 415 k | 59 k | 7.03 | 2.59 | 1.01 | | 14 | |
| | 17,045 | 1.403 | 2.629 | 526 M | 105 M | 5.00 | 12.1 | 6.48 | | 65 | |
| | 42.6 | 0.0076 | 0.0121 | 2.4 M | 460 k | 5.21 | 5.58 | 3.5 | | 36 | |
The bold font emphasizes important entries for comparison reasons.
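
The derived columns of Table 3 appear to follow from the measured ones by simple division. The sketch below checks this against the first row. It assumes, from the numbers rather than from any statement in the table, that throughput is the operation count divided by run time and that the system to core ratio is system cycles divided by core cycles. The function names `gmacs` and `cycle_ratio` are hypothetical and chosen only for illustration.

```python
def gmacs(ops_mmacs: float, seconds: float) -> float:
    """Throughput in GMACS from an operation count in MMACs and a run time in seconds.

    Assumed relation: GMACS = (MMACs / 1000) / seconds.
    """
    return ops_mmacs / 1e3 / seconds


def cycle_ratio(system_cycles: float, core_cycles: float) -> float:
    """Assumed definition of the 'System to core ratio' column."""
    return system_cycles / core_cycles


# First row of Table 3: 68,700 MMACs, 5.25 s on the PC, 12.7 s on the FPGA,
# 2.53 G system cycles and 491 M core cycles.
pc_perf = gmacs(68_700, 5.25)
fpga_perf = gmacs(68_700, 12.7)
ratio = cycle_ratio(2.53e9, 491e6)

print(f"PC perf:   {pc_perf:.2f} GMACS")
print(f"FPGA perf: {fpga_perf:.2f} GMACS")
print(f"System to core ratio: {ratio:.2f}")
```

Because the table entries are rounded, the recomputed values agree with the tabulated ones only to within rounding of the inputs.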