Research Article

A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance

Table 3

Bypassing register usage profile and effects of using non-transient copy on a single tile (16 PE) configuration.

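Columns "Average" through "% Nontransient" give the bypassing register usage profile per PE per cycle; the "Inter-PE links" and "IPC" columns give the performance impact.

| Kernel | Average (Disable) | Average (Enable) | Peak (Disable) | Peak (Enable) | % Nontransient (Enable) | Inter-PE links (Disable) | Inter-PE links (Enable) | Inter-PE links (Delta) | IPC (Disable) | IPC (Enable) | IPC (Delta) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| idct(row+col) | 2.8 | 2.3 | 19 | 20 | 45% | 2357 | 2245 | −5% | 10.9 | 11.1 | 2% |
| interpolate _avg4_c | 4.8 | 3.1 | 21 | 14 | 58% | 1707 | 1333 | −22% | 6.4 | 8.8 | 38% |
| interpolate _halfpel_ _c | 3.7 | 3.9 | 19 | 16 | 45% | 1835 | 1621 | −12% | 8.3 | 9.6 | 16% |
| sad16_c( ) | 1.0 | 0.8 | 6 | 5 | 54% | 4752 | 4000 | −16% | 9.8 | 10.2 | 4% |
| get_block(horizontal) | 1.0 | 0.9 | 6 | 5 | 22% | 455 | 438 | −4% | 8.1 | 9.0 | 10% |
| get_block(vertical) | 1.6 | 1.0 | 8 | 5 | 52% | 513 | 400 | −22% | 5.7 | 8.0 | 41% |
| get_block(V+H) | 4.2 | 2.2 | 18 | 10 | 45% | 1469 | 1150 | −22% | 7.9 | 9.7 | 23% |
| get_block(H+V) | 3.1 | 2.3 | 13 | 8 | 43% | 1455 | 1148 | −21% | 8.4 | 9.5 | 13% |
| Average | | | | | | | | −15% | | | 18% |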
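The Delta figures in Table 3 appear to be the relative change from the Disable value to the Enable value, and the Average row appears to be the unweighted mean of the per-kernel deltas. Neither definition is stated in the table, so the following is a minimal sketch under those two assumptions, using the rounded link counts and IPC values as printed. The names `ROWS`, `relative_change`, and `summarize` are illustrative and do not come from the article.

```python
# Values transcribed from Table 3:
# (kernel, links_disable, links_enable, ipc_disable, ipc_enable)
ROWS = [
    ("idct(row+col)", 2357, 2245, 10.9, 11.1),
    ("interpolate _avg4_c", 1707, 1333, 6.4, 8.8),
    ("interpolate _halfpel_ _c", 1835, 1621, 8.3, 9.6),
    ("sad16_c( )", 4752, 4000, 9.8, 10.2),
    ("get_block(horizontal)", 455, 438, 8.1, 9.0),
    ("get_block(vertical)", 513, 400, 5.7, 8.0),
    ("get_block(V+H)", 1469, 1150, 7.9, 9.7),
    ("get_block(H+V)", 1455, 1148, 8.4, 9.5),
]


def relative_change(disable, enable):
    """Assumed Delta definition: percent change relative to the Disable value."""
    return 100.0 * (enable - disable) / disable


def summarize(rows):
    """Return per-kernel (links_delta, ipc_delta) and the unweighted mean of each."""
    deltas = {
        name: (relative_change(l_dis, l_en), relative_change(i_dis, i_en))
        for name, l_dis, l_en, i_dis, i_en in rows
    }
    n = len(deltas)
    mean_links = sum(d[0] for d in deltas.values()) / n
    mean_ipc = sum(d[1] for d in deltas.values()) / n
    return deltas, mean_links, mean_ipc


if __name__ == "__main__":
    deltas, mean_links, mean_ipc = summarize(ROWS)
    for name, (links, ipc) in deltas.items():
        print(f"{name:26s} links {links:+6.1f}%  IPC {ipc:+6.1f}%")
    print(f"{'Average':26s} links {mean_links:+6.1f}%  IPC {mean_ipc:+6.1f}%")
```

Because the inputs are the rounded values from the table, a recomputed delta can differ from the printed one by about a percentage point. For get_block(horizontal), for instance, the IPC change comes out near 11% rather than the printed 10%. The two averages still round to the printed −15% and 18%.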