Research Article
IP-Enabled C/C++ Based High Level Synthesis: A Step towards Better Designer Productivity and Design Performance
Table 8
Mapping of Matrix Multiplication Operation.
| Design (2 × 2 matrix multiplication; 16-bit integer elements; device XC4VFX12FF668-10) | LUT | FF | DSP | BRAM | CLK (ns)⁷ | Latency/throughput⁸ | Wall clock time (ns) | IP core | Comments |
|---|---|---|---|---|---|---|---|---|---|
| D1 (v-HLS) | 31 | 62 | 1 | 0 | 2.856 | 70/70 | 199.9 | No | External memory (EM) |
| D2 (v-HLS) | 39 | 138 | 8 | 0 | 2.856 | 9/9 | 25.7 | No | EM |
| D3 (v-HLS) | 135 | 86 | 1 | 3 | 3.247 | N.A. | N.A. | No | - |
| D4 (v-HLS) | 129 | 198 | 8 | 3 | 3.112 | N.A. | N.A. | No | - |
| D5 (v-HLS) | 124 | 103 | 1 | 3 | 3.215 | 117/117 | 376.1 | No | - |
| D6 (v-HLS) | 155 | 214 | 8 | 3 | 2.988 | 59/59 | 176.2 | No | - |
| D7 (proposed) | 269 | 622 | 2 | 4 | 2.956 | 22/4 | 11.8 | Yes | - |
| D8 (proposed) | 514 | 1688 | 8 | 0 | 3.126 | 17/1 | 3.1 | Yes | - |
| D9 (proposed) | 183 | 576 | 4 | 7 | 3.005 | 17/2 | 3.0 | Yes | - |

⁷ Clock period constraint of 4 ns with 0.5 ns clock jitter.
⁸ The best-case and worst-case latencies were the same for v-HLS flows.
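For orientation, the operation mapped in Table 8 can be written as a plain C++ loop nest of the kind a C/C++-based HLS flow takes as input. The sketch below is illustrative only and is not the source code used for any of the designs D1 to D9: the names `N`, `elem_t`, `acc_t`, and `matmul2x2` are hypothetical, and the 32-bit accumulator is an assumption, since the table states only that the elements are 16-bit integers.

```cpp
#include <cstdint>

// Hypothetical names chosen for illustration; none come from the article.
constexpr int N = 2;              // 2 x 2 matrices, as in Table 8
using elem_t = std::int16_t;      // 16-bit integer elements, as in Table 8
using acc_t  = std::int32_t;      // assumed wider accumulator to avoid overflow

// Computes c = a * b for N x N matrices.
// The triple loop performs N * N * N = 8 multiplications for N = 2.
void matmul2x2(const elem_t a[N][N], const elem_t b[N][N], acc_t c[N][N]) {
    for (int i = 0; i < N; ++i) {          // row of a
        for (int j = 0; j < N; ++j) {      // column of b
            acc_t sum = 0;
            for (int k = 0; k < N; ++k) {  // dot product of row i and column j
                sum += static_cast<acc_t>(a[i][k]) * static_cast<acc_t>(b[k][j]);
            }
            c[i][j] = sum;
        }
    }
}
```

A 2 × 2 product needs eight multiplications in total, so a fully parallel implementation could plausibly occupy eight multipliers while a sequential one reuses a single multiplier over more cycles. This may be one way to read the DSP and latency columns of the table, although the source does not state how each design point was obtained.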