Research Article
Extensible Embedded Processor for Convolutional Neural Networks
Table 5
Speedup and gate summary for custom SIMD instructions.
| 4 x 4 tile, 3 x 3 conv | Cycles | Speedup | Gates |
| --- | --- | --- | --- |
| Baseline | 408 | — | 156466 |
| Shared | 29 | 14.1 | 2389 |
| Full | 22 | 18.5 | 4456 |

| 8 x 8 tile, 3 x 3 conv | Cycles | Speedup | Gates |
| --- | --- | --- | --- |
| Baseline | 3404 | — | 156466 |
| Shared | 130 | 26.2 | 2389 |
| Full | 130 | 26.2 | 4456 |
| Shared + splits | 91 | 37.4 | 2405 |
| Full + splits | 88 | 38.7 | 4472 |

| Max pooling | Cycles | Speedup | Gates |
| --- | --- | --- | --- |
| Baseline | 44 | — | 156466 |
| Tie | 13 | 3.4 | 262 |

| FC16 | Cycles | Speedup | Gates |
| --- | --- | --- | --- |
| Baseline | 2114 | — | 156466 |
| Shared | 77 | 27.5 | 4124 |
| Full | 71 | 29.8 | 2970 |
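The Speedup column appears to be the baseline cycle count divided by the cycle count of each custom-instruction variant, rounded to one decimal place. The table does not state this definition, so it is an assumption here. A minimal Python sketch that recomputes the column from the cycle counts in Table 5 follows; the names `TABLE5_CYCLES` and `speedups` are hypothetical and used only for illustration.

```python
# Cycle counts copied from Table 5: kernel -> (baseline cycles, {variant: cycles}).
TABLE5_CYCLES = {
    "4 x 4 tile, 3 x 3 conv": (408, {"Shared": 29, "Full": 22}),
    "8 x 8 tile, 3 x 3 conv": (
        3404,
        {"Shared": 130, "Full": 130, "Shared + splits": 91, "Full + splits": 88},
    ),
    "Max pooling": (44, {"Tie": 13}),
    "FC16": (2114, {"Shared": 77, "Full": 71}),
}


def speedups(table):
    """Return kernel -> {variant: speedup}.

    Assumption: speedup = baseline cycles / variant cycles, rounded to 0.1.
    """
    return {
        kernel: {
            variant: round(baseline / cycles, 1)
            for variant, cycles in variants.items()
        }
        for kernel, (baseline, variants) in table.items()
    }


if __name__ == "__main__":
    for kernel, row in speedups(TABLE5_CYCLES).items():
        for variant, value in row.items():
            print(f"{kernel:24s} {variant:16s} {value}")
```

Under that assumption, the recomputed values agree with the Speedup entries reported in the table for all four kernels.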