Research Article

The Potential for a GPU-Like Overlay Architecture for FPGAs

Figure 7

Mapping our register file architecture to four of the Stratix II's 64 KB M-RAM blocks. The read circuitry shows an example in which operands are read across the threads of a batch for a vector/scalar ALU instruction pair (VLIW): r3 is an operand for the vector instruction and r5 is an operand for the scalar instruction. Register writes, although not shown, are implemented similarly.
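The read pattern in the caption can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit in the figure: it assumes each of the four M-RAM blocks holds the registers of one thread in a four-thread batch, and that each block can serve the vector and the scalar operand in the same read. The names `make_register_file`, `read_operand_pair`, and `REGS_PER_THREAD` are hypothetical and chosen for illustration.

```python
NUM_BANKS = 4         # four M-RAM blocks, as stated in the caption
REGS_PER_THREAD = 8   # hypothetical register count, for illustration only


def make_register_file():
    """Build an empty register file.

    Assumed layout: banks[t][r] holds register r of thread t in the batch.
    """
    return [[0] * REGS_PER_THREAD for _ in range(NUM_BANKS)]


def read_operand_pair(banks, vector_reg, scalar_reg):
    """Read one vector and one scalar operand for every thread in the batch.

    Each bank is indexed twice, modeling two simultaneous reads per block
    (an assumption of this sketch).
    """
    vector_ops = [bank[vector_reg] for bank in banks]
    scalar_ops = [bank[scalar_reg] for bank in banks]
    return vector_ops, scalar_ops


# Fill each register with a recognizable value: thread * 100 + register index.
banks = make_register_file()
for thread in range(NUM_BANKS):
    for reg in range(REGS_PER_THREAD):
        banks[thread][reg] = thread * 100 + reg

# The caption's example: r3 for the vector instruction, r5 for the scalar one.
vector_ops, scalar_ops = read_operand_pair(banks, vector_reg=3, scalar_reg=5)
```

In this model a single call returns r3 and r5 for all four threads at once, which is the property the figure's read circuitry is meant to convey.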