| Characteristics | Warp [4, 10] | CCA [5, 11] | Amber [12, 13] | DIM [6, 14] | Megablock [7, 15] |
|---|---|---|---|---|---|
| Partitioning approach | Detect and decompile inner loops; dynamically translate them into configurations for a custom FPGA | Detect instruction segments, which are transformed into subgraphs and executed as macroinstructions on the CCA; migration by modifying the instruction stream | Detect hot basic blocks by trace analysis; translate them into DFGs and map them to mesh-type RPU configurations | Identify as many instructions as possible, within one or more basic blocks, to be mapped to DIM | Detect repeating patterns of instructions in the execution trace and migrate those loops to an RPU |
| Coupling | Loose RPU/GPP coupling; shared instruction and data memory | Tight RPU coupling to the GPP pipeline | Tight RPU coupling to the GPP pipeline | Tight RPU coupling to the GPP pipeline | Loose RPU/GPP coupling through a bus or dedicated connections |
| Granularity | Fine-grained RPU (LUTs, MACs) | Coarse-grained RPU (ALUs) | Coarse-grained RPU (ALUs) | Coarse-grained RPU (ALUs) | Coarse-grained RPU (ALUs) |
| Size of the code segment mapped to one configuration | Inner loops with up to tens of lines of code | A couple to a dozen instructions across basic blocks | Up to one basic block | (1) A couple to a dozen instructions within a basic block; (2) across up to three basic blocks with speculation | Inner and outer loops with up to hundreds of lines of code |
| Benchmarks | NetBench, MediaBench, EEMBC, Powerstone, and the in-house tool ROCM | MediaBench, SPECint, and encryption algorithms | MiBench suite | MiBench suite | Texas Instruments DSPLIB and IMGLIB |
| Target domain | General embedded systems | General embedded and general-purpose systems | General embedded and general-purpose systems | General embedded and general-purpose systems | General embedded systems |
| GPP | (1) ARM7 at 100 MHz; (2) MicroBlaze at 85 MHz | (1) 4-issue superscalar ARM; (2) in-order 5-stage pipelined ARM (ARM-926EJ) | 4-issue in-order MIPS-based RISC | Minimips soft core based on the MIPS R3000 | MicroBlaze |
| Size of the RPU | 14.22 mm² with a 180 nm library (~852,000 gates) | 0.61 mm² with a 130 nm library | n.a. | >1 million gates | n.a. |
| Average speedup | (1) 6.3x; (2) 5.9x | (1) 1.2x; (2) 2.3x | 1.25x | (1) 2.0x; (2) 2.5x | 2.0x |
| Average energy reduction | (1) 66%; (2) 24%–55% | n.a. | n.a. | .7x | n.a. |
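The Megablock column summarizes partitioning as detecting repeating patterns of instructions in the execution trace. The following is a minimal, simplified sketch of that idea, not the detection scheme used in the cited Megablock work. It assumes the trace is a list of executed instruction addresses (PC values) and that a "pattern" is a contiguous address sequence repeated back-to-back. The function `find_repeating_pattern` and its parameters `max_pattern_size` and `min_repeats` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple


def find_repeating_pattern(
    trace: List[int],
    max_pattern_size: int = 32,  # hypothetical limit on pattern length
    min_repeats: int = 2,        # hypothetical minimum number of back-to-back repetitions
) -> Optional[Tuple[int, List[int], int]]:
    """Return (start_index, pattern, repeats) for the repeating address
    sequence that covers the most trace entries, or None if no sequence
    repeats at least `min_repeats` times in a row.

    Simplified offline scan for illustration only.
    """
    best: Optional[Tuple[int, List[int], int]] = None
    best_coverage = 0
    n = len(trace)
    for start in range(n):
        # Try smaller sizes first, so on equal coverage the shortest pattern wins.
        for size in range(1, max_pattern_size + 1):
            if start + size > n:
                break
            pattern = trace[start:start + size]
            repeats = 1
            pos = start + size
            # Count consecutive repetitions of the candidate pattern.
            while pos + size <= n and trace[pos:pos + size] == pattern:
                repeats += 1
                pos += size
            coverage = repeats * size
            if repeats >= min_repeats and coverage > best_coverage:
                best = (start, pattern, repeats)
                best_coverage = coverage
    return best


if __name__ == "__main__":
    # Hypothetical trace: a three-instruction loop body executed three times.
    trace = [0x100, 0x104,
             0x200, 0x204, 0x208,
             0x200, 0x204, 0x208,
             0x200, 0x204, 0x208,
             0x300]
    start, pattern, repeats = find_repeating_pattern(trace)
    print(start, [hex(a) for a in pattern], repeats)
```

On this trace, the scan reports the loop body starting at index 2 and repeated three times. A run-time detector would instead process the trace incrementally and bound its memory. The brute-force scan here only shows what "a repeating pattern in the trace" means in terms of addresses.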