Research Article

The Potential for a GPU-Like Overlay Architecture for FPGAs

Figure 5

The floating-point units in a datapath that supports the MADD, DP3, and DP4 ALU instructions. The pipeline latency of each unit is shown on the left (for Altera floating-point IP cores). The total latency of the datapath is 53 cycles, not counting the extra pipeline stages needed for multiplexing between units.
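As a rough illustration of what the three instructions named in the caption compute, the sketch below gives software reference versions, assuming the conventional GPU meanings: MADD as a multiply-add (a * b + c), and DP3 and DP4 as three- and four-component dot products. The function names, the summation order, and the `datapath_latency` helper are hypothetical and are not taken from the article; the per-unit latencies shown in the figure are not reproduced here, so the helper takes them as input.

```python
from typing import Sequence


def madd(a: float, b: float, c: float) -> float:
    """Multiply-add, assuming the conventional meaning a * b + c."""
    return a * b + c


def dp3(x: Sequence[float], y: Sequence[float]) -> float:
    """Three-component dot product: one multiplier stage, then additions."""
    products = [x[i] * y[i] for i in range(3)]
    # Summation order is an assumption; the hardware may add in another order.
    return (products[0] + products[1]) + products[2]


def dp4(x: Sequence[float], y: Sequence[float]) -> float:
    """Four-component dot product: one multiplier stage, then additions."""
    products = [x[i] * y[i] for i in range(4)]
    # Summation order is an assumption; the hardware may add in another order.
    return ((products[0] + products[1]) + products[2]) + products[3]


def datapath_latency(stage_latencies: Sequence[int]) -> int:
    """Total latency, in cycles, of units chained back to back.

    Hypothetical helper: the caller supplies the per-unit latencies,
    and no multiplexing stages are added.
    """
    return sum(stage_latencies)


if __name__ == "__main__":
    print(madd(2.0, 3.0, 1.0))
    print(dp3((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)))
    print(dp4((1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)))
```

Because the longest instruction passes through every unit on its path in sequence, the total datapath latency is the sum of the unit latencies along that path, which is how a figure such as the 53 cycles in the caption would be obtained from the per-unit values.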