Research Article

Transparent Runtime Migration of Loop-Based Traces of Processor Instructions to Reconfigurable Processing Units

Table 6

Summary of the characteristics of the most relevant approaches.

| Characteristics | Warp [4, 10] | CCA [5, 11] | Amber [12, 13] | DIM [6, 14] | Megablock [7, 15] |
|---|---|---|---|---|---|
| Partitioning approach | Detect and decompile inner loops, and dynamically translate those loops into configurations for a custom FPGA | Detect segments of instructions, which are transformed into subgraphs and executed as macroinstructions on the CCA; migration by modifying the instruction stream | Detect hot basic blocks by trace analysis; these are translated to DFGs and mapped to mesh-type RPU configurations | Identify as many instructions as possible, inside one or more basic blocks, to be mapped to DIM | Detect repeating patterns of instructions in the execution trace and migrate those loops to an RPU |
| Coupling | Loose RPU/GPP coupling; shared instruction and data memory | Tight RPU coupling to the GPP pipeline | Tight RPU coupling to the GPP pipeline | Tight RPU coupling to the GPP pipeline | Loose RPU/GPP coupling through a bus or dedicated connections |
| Granularity | Fine-grained RPU (LUTs, MAC) | Coarse-grained RPU (ALUs) | Coarse-grained RPU (ALUs) | Coarse-grained RPU (ALUs) | Coarse-grained RPU (ALUs) |
| Size of the code segment mapped in a configuration | Inner loops with up to tens of lines of code | From a couple to a dozen instructions across basic blocks | Up to 1 basic block | (1) A couple to a dozen instructions inside a basic block, or (2) across up to three basic blocks with speculation | Inner and outer loops with up to hundreds of lines of code |
| Benchmarks | NetBench, MediaBench, EEMBC, Powerstone, and in-house tool ROCM | MediaBench, SPECint, and encryption algorithms | MiBench suite | MiBench suite | Texas Instruments DSPLIB and IMGLIB |
| Target domain | General embedded systems | General embedded and general-purpose systems | General embedded and general-purpose systems | General embedded and general-purpose systems | General embedded systems |
| GPP | (1) ARM7 at 100 MHz; (2) MicroBlaze at 85 MHz | (1) 4-issue superscalar ARM; (2) in-order 5-stage pipelined ARM (ARM926EJ-S) | 4-issue in-order MIPS-based RISC | Minimips softcore based on the MIPS R3000 | MicroBlaze |
| Size of the RPU | 14.22 mm² with 180 nm library (~852,000 gates) | 0.61 mm² with 130 nm library | n.a. | >1 million gates | n.a. |
| Average speedup | (1) 6.3x; (2) 5.9x | (1) 1.2x; (2) 2.3x | 1.25x | (1) 2.0x; (2) 2.5x | 2.0x |
| Average energy reduction | (1) 66%; (2) 24%–55% | n.a. | n.a. | .7x | n.a. |

n.a.: not available.
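The Megablock column describes partitioning by detecting repeating patterns of instructions in the execution trace. As a rough illustration only, the following sketch assumes the trace is a list of instruction addresses and searches for the shortest address pattern that repeats back-to-back at the end of the trace. The function name `find_repeating_pattern` and its parameters `max_pattern_size` and `min_repeats` are hypothetical; this is not the detector used by Megablock or by any other approach in the table.

```python
from typing import Optional, Sequence, Tuple


def find_repeating_pattern(
    trace: Sequence[int],
    max_pattern_size: int = 32,
    min_repeats: int = 2,
) -> Optional[Tuple[int, ...]]:
    """Return the shortest address pattern that occurs back-to-back at least
    `min_repeats` times at the tail of `trace`, or None if there is none.

    Hypothetical helper: a simplified, batch stand-in for a trace-based loop
    detector (assumption), not the mechanism of any approach in Table 6.
    """
    n = len(trace)
    for size in range(1, max_pattern_size + 1):
        if size * min_repeats > n:
            break  # not enough trace left to hold min_repeats copies
        # Candidate pattern: the last `size` addresses of the trace.
        pattern = tuple(trace[n - size:])
        # Compare the candidate with each preceding window of the same length.
        if all(
            tuple(trace[n - (k + 1) * size : n - k * size]) == pattern
            for k in range(min_repeats)
        ):
            return pattern
    return None


# Example: a straight-line prologue followed by two iterations of a
# three-instruction loop body (addresses are illustrative).
trace = [0x000, 0x004, 0x100, 0x104, 0x108, 0x100, 0x104, 0x108]
loop = find_repeating_pattern(trace)
if loop is not None:
    print([hex(a) for a in loop])
```

Because the search starts at the tail of the trace, a trace that ends mid-iteration yields a rotation of the loop body; a runtime detector would instead operate incrementally on the instruction stream as it is produced.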