Research Article

Performance Modeling for FPGAs: Extending the Roofline Model with High-Level Synthesis Tools

Table 1

Generation of the extended model for one FPGA based on the resource consumption, the computational performance of one PE, and the I/O-limited performance obtained using ROCCC. The resultant roofline is obtained by applying (2) and (4).

Resource consumption                 No unrolling   Unrolling ×2   Unrolling ×4   Unrolling ×8   Unrolling ×16  Unrolling ×32
Slice registers (301440)             3652           6145           11132          21109          40573          79979
Slice LUTs (150720)                  3157           4281           6335           10814          20189          37634
LUT-FF pairs (37680)                 1069           1435           2245           3805           7193           13068
BRAM/FIFO (416)                      1              2              3              4              8              15
DSP48 (768)                          18             24             36             60             108            204
Max. number of PEs                   35             26             16             9              5              2
Computational intensity (CI)         1.9768         2.624          3.144          3.496          3.704          3.816
Performance per PE [GBops/s]         0.636          1.191          2.132          2.711          3.192          3.482
Computational performance [GBops/s]  22.24          30.96          34.08          24.4           15.92          6.96
CI × PCIe x8 BW [GBops/s]            8.302          11.027         13.190         14.678         15.554         16.0272
Resultant roofline                   8.302          11.027         13.190         14.678         15.554         6.96
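
The arithmetic behind Table 1 can be retraced with a short Python sketch. Equations (2) and (4) are not reproduced in this excerpt, so the two rules below are assumptions inferred from the tabulated values rather than the authors' exact formulation: the maximum number of PEs is taken as the floor of the tightest available-to-used resource ratio, and the resultant roofline as the minimum of the computational performance and the I/O-limited performance. The function names `max_pes` and `roofline` are illustrative only.

```python
# Values copied from Table 1.
# Column order: no unrolling, x2, x4, x8, x16, x32.
AVAILABLE = {
    "slice_registers": 301440,
    "slice_luts": 150720,
    "lut_ff_pairs": 37680,
    "bram_fifo": 416,
    "dsp48": 768,
}
USED = {
    "slice_registers": [3652, 6145, 11132, 21109, 40573, 79979],
    "slice_luts": [3157, 4281, 6335, 10814, 20189, 37634],
    "lut_ff_pairs": [1069, 1435, 2245, 3805, 7193, 13068],
    "bram_fifo": [1, 2, 3, 4, 8, 15],
    "dsp48": [18, 24, 36, 60, 108, 204],
}
PERF_PER_PE = [0.636, 1.191, 2.132, 2.711, 3.192, 3.482]  # GBops/s
COMP_PERF = [22.24, 30.96, 34.08, 24.4, 15.92, 6.96]      # GBops/s
IO_BOUND = [8.302, 11.027, 13.190, 14.678, 15.554, 16.0272]  # GBops/s


def max_pes(column):
    """Assumed rule: the scarcest resource limits how many PEs fit."""
    return min(AVAILABLE[r] // USED[r][column] for r in AVAILABLE)


def roofline(column):
    """Assumed rule: attainable performance is the lower of the two bounds."""
    return min(COMP_PERF[column], IO_BOUND[column])


if __name__ == "__main__":
    for col in range(len(COMP_PERF)):
        # PEs times per-PE performance approximates the tabulated
        # computational performance (small differences are rounding).
        estimate = max_pes(col) * PERF_PER_PE[col]
        print(col, max_pes(col), round(estimate, 2), roofline(col))
```

Under these assumptions, the LUT-FF pairs are the limiting resource in every column, and the design is I/O bound for all unrolling factors except ×32, where the computational bound becomes the lower one.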