Research Article
The Potential for a GPU-Like Overlay Architecture for FPGAs
Table 2
The schedule of operand reads from the central register file for batches of four threads (T0–T3, T4–T7, etc.) decoding both ALU and TEX instructions. TEX instructions require only one source operand, so the source operands for all four threads in a batch can be read in a single cycle.
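The arithmetic behind this schedule can be illustrated with a minimal sketch. It assumes that the register file delivers four operand words per cycle, which is inferred from the single-cycle TEX case described in the caption. The ALU operand count of three used in the usage note is a hypothetical value chosen for illustration, and the names `read_cycles` and `schedule` are likewise not from the article.

```python
# Sketch of operand-read scheduling for batches of four threads.
# Assumption: the register file supplies 4 operand words per cycle
# (inferred from one-operand TEX reads for 4 threads taking 1 cycle).
OPERANDS_PER_CYCLE = 4
BATCH_SIZE = 4


def read_cycles(num_source_operands, batch_size=BATCH_SIZE,
                width=OPERANDS_PER_CYCLE):
    """Cycles needed to read all source operands for one batch."""
    total_words = num_source_operands * batch_size
    return -(-total_words // width)  # integer ceiling division


def schedule(instructions):
    """Assign read cycles to successive batches.

    instructions: list of (kind, num_source_operands), one per batch.
    Returns a list of (thread_label, kind, first_cycle, last_cycle).
    """
    rows = []
    cycle = 0
    for batch, (kind, num_operands) in enumerate(instructions):
        first_thread = batch * BATCH_SIZE
        label = f"T{first_thread}-T{first_thread + BATCH_SIZE - 1}"
        n = read_cycles(num_operands)
        rows.append((label, kind, cycle, cycle + n - 1))
        cycle += n
    return rows
```

Under these assumptions, calling `schedule([("ALU", 3), ("TEX", 1)])` would place the hypothetical three-operand ALU batch T0-T3 in cycles 0 to 2 and the TEX batch T4-T7 in cycle 3 alone, matching the single-cycle TEX read stated in the caption.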