Research Article

Extensible Embedded Processor for Convolutional Neural Networks

Table 5

Speedup and gate summary for custom SIMD instructions.

4 x 4 tile, 3 x 3 conv  Cycles  Speedup  Gates
Baseline                   408        1  56466
Shared                      29     14.1   2389
Full                        22     18.5   4456

8 x 8 tile, 3 x 3 conv  Cycles  Speedup  Gates
Baseline                  3404        1  56466
Shared                     130     26.2   2389
Full                       130     26.2   4456
Shared+splits               91     37.4   2405
Full+splits                 88     38.7   4472

Max pooling             Cycles  Speedup  Gates
Baseline                    44        1  56466
Tie                         13      3.4    262

FC16                    Cycles  Speedup  Gates
Baseline                  2114        1  56466
Shared                      77     27.5   4124
Full                        71     29.8   2970
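
The speedup column in Table 5 appears to be the baseline cycle count divided by the cycle count of each custom-instruction variant, rounded to one decimal place. The following minimal sketch checks that reading against the table. The names TABLE5, speedup, and check_table are illustrative and do not come from the article; only the cycle counts and reported speedups are copied from the table.

```python
# Assumption: speedup = baseline cycles / variant cycles, rounded to one decimal.
# Each entry maps a variant name to (cycles, reported speedup) from Table 5.
TABLE5 = {
    "4 x 4 tile, 3 x 3 conv": {
        "Baseline": (408, 1.0), "Shared": (29, 14.1), "Full": (22, 18.5),
    },
    "8 x 8 tile, 3 x 3 conv": {
        "Baseline": (3404, 1.0), "Shared": (130, 26.2), "Full": (130, 26.2),
        "Shared+splits": (91, 37.4), "Full+splits": (88, 38.7),
    },
    "Max pooling": {
        "Baseline": (44, 1.0), "Tie": (13, 3.4),
    },
    "FC16": {
        "Baseline": (2114, 1.0), "Shared": (77, 27.5), "Full": (71, 29.8),
    },
}


def speedup(baseline_cycles, cycles):
    """Return the baseline-to-variant cycle ratio, rounded to one decimal."""
    return round(baseline_cycles / cycles, 1)


def check_table(table):
    """Return the (kernel, variant) pairs whose reported speedup differs
    from the recomputed ratio; an empty list means every row agrees."""
    mismatches = []
    for kernel, rows in table.items():
        baseline_cycles = rows["Baseline"][0]
        for variant, (cycles, reported) in rows.items():
            if abs(speedup(baseline_cycles, cycles) - reported) > 1e-9:
                mismatches.append((kernel, variant))
    return mismatches


if __name__ == "__main__":
    print(check_table(TABLE5))
```

For example, speedup(408, 29) reproduces the 14.1 reported for the shared variant on the 4 x 4 tile, and check_table(TABLE5) returns an empty list when every reported value matches the recomputed ratio.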