Research Article
A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance
Table 3
Bypassing register usage profile and effects of using non-transient copy on a single tile (16 PE) configuration.
Columns 2 to 6 give the bypassing register usage profile (per PE per cycle); columns 7 to 12 give the performance impact. Disable and Enable refer to the non-transient copy.

| | Average (Disable) | Average (Enable) | Peak (Disable) | Peak (Enable) | % Nontransient (Enable) | Inter-PE links (Disable) | Inter-PE links (Enable) | Inter-PE links (Delta) | IPC (Disable) | IPC (Enable) | IPC (Delta) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| idct(row+col) | 2.8 | 2.3 | 19 | 20 | 45% | 2357 | 2245 | −5% | 10.9 | 11.1 | 2% |
| interpolate_avg4_c | 4.8 | 3.1 | 21 | 14 | 58% | 1707 | 1333 | −22% | 6.4 | 8.8 | 38% |
| interpolate_halfpel__c | 3.7 | 3.9 | 19 | 16 | 45% | 1835 | 1621 | −12% | 8.3 | 9.6 | 16% |
| sad16_c() | 1.0 | 0.8 | 6 | 5 | 54% | 4752 | 4000 | −16% | 9.8 | 10.2 | 4% |
| get_block(horizontal) | 1.0 | 0.9 | 6 | 5 | 22% | 455 | 438 | −4% | 8.1 | 9.0 | 10% |
| get_block(vertical) | 1.6 | 1.0 | 8 | 5 | 52% | 513 | 400 | −22% | 5.7 | 8.0 | 41% |
| get_block(V+H) | 4.2 | 2.2 | 18 | 10 | 45% | 1469 | 1150 | −22% | 7.9 | 9.7 | 23% |
| get_block(H+V) | 3.1 | 2.3 | 13 | 8 | 43% | 1455 | 1148 | −21% | 8.4 | 9.5 | 13% |
| Average | | | | | | | | −15% | | | 18% |
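The Delta figures for the inter-PE links in Table 3 appear to be the relative change from the Disable value to the Enable value, and the Average row the arithmetic mean of the per-kernel deltas. The following is a minimal sketch under that assumption; the names `LINKS`, `relative_change`, `link_deltas`, and `mean_link_delta` are illustrative and do not come from the article. The link counts are copied from the table.

```python
# Inter-PE link counts (Disable, Enable) copied from Table 3.
LINKS = {
    "idct(row+col)": (2357, 2245),
    "interpolate_avg4_c": (1707, 1333),
    "interpolate_halfpel__c": (1835, 1621),
    "sad16_c()": (4752, 4000),
    "get_block(horizontal)": (455, 438),
    "get_block(vertical)": (513, 400),
    "get_block(V+H)": (1469, 1150),
    "get_block(H+V)": (1455, 1148),
}


def relative_change(disable, enable):
    """Percentage change from the Disable value to the Enable value.

    Assumption: this is how the Delta column is defined; the article
    excerpt does not state the formula explicitly.
    """
    return 100.0 * (enable - disable) / disable


# Per-kernel relative change in the number of inter-PE links.
link_deltas = {name: relative_change(d, e) for name, (d, e) in LINKS.items()}

# Assumed definition of the Average row: mean of the per-kernel deltas.
mean_link_delta = sum(link_deltas.values()) / len(link_deltas)

for name, delta in link_deltas.items():
    print(f"{name:24s} {delta:6.1f}%")
print(f"{'Average':24s} {mean_link_delta:6.1f}%")
```

Rounded to whole percentages, these values agree with the inter-PE link Delta column and its average. Applying the same formula to the IPC columns can differ from the printed Delta by about one percentage point for some kernels, presumably because the IPC values in the table are themselves rounded to one decimal place.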