Research Article
A Domain-Specific Architecture for Elementary Function Evaluation
Table 1
Accuracy, throughput, and table size (for SPU/double precision).
| Function | Cycles/double new | Cycles/double SPU | Speedup (%) | Max error (ulps) | Table size () | Poly order | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| recip | 3 | 11.3 | 376 | 0.500 | 2048 | 3 | |
| div | 3.5 | 14.9 | 425 | 1.333 | recip | 3 | |
| sqrt | 3 | 15.4 | 513 | 0.500 | 4096 | 3 | 18 |
| rsqrt | 3 | 14.6 | 486 | 0.503 | 4096 | 3 | |
| cbrt | 8.3 | 13.3 | 160 | 0.500 | 8192 | 3 | 18 |
| rcbrt | 10 | 16.1 | 161 | 0.501 | rcbrt | 3 | |
| qdrt | 7.5 | 27.6 | 368 | 0.500 | 8192 | 3 | 18 |
| rqdrt | 8.3 | 19.6 | 229 | 0.501 | rqdrt | 3 | 18 |
| log2 | 2.5 | 14.6 | 584 | 0.500 | 4096 | 3 | 18 |
| log21p | 3.5 | n/a | n/a | 1.106 | log2 | 3 | |
| log | 3.5 | 13.8 | 394 | 1.184 | log2 | 3 | |
| log1p | 4.5 | 22.5 | 500 | 1.726 | log2 | 3 | |
| exp2 | 4.5 | 13.0 | 288 | 1.791 | 256 | 4 | 18 |
| exp2m1 | 5.5 | n/a | n/a | 1.29 | exp2 | 4 | |
| exp | 5.0 | 14.4 | 288 | 1.55 | exp2 | 4 | |
| expm1 | 5.5 | 19.5 | 354 | 1.80 | exp2 | 4 | |
| atan2 | 7.5 | 23.4 | 311 | 0.955 | 4096 | 2 | 18 |
| atan | 7.5 | 18.5 | 246 | 0.955 | atan2 | 2 + 3 | |
| asin | 11 | 27.2 | 247 | 1.706 | atan2 | 2 + 3 + 3 | |
| acos | 11 | 27.1 | 246 | 0.790 | atan2 | 2 + 3 + 3 | |
| sin | 11 | 16.6 | 150 | 1.474 | 128 | 3 + 3 | 52 |
| cos | 10 | 15.3 | 153 | 1.025 | sin | 3 + 3 | |
| tan | 24.5 | 27.6 | 113 | 2.051 | sin | 3 + 3 + 3 | |
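The speedup column in Table 1 appears to be the ratio of the SPU cycle count to the new cycle count, expressed as a percentage. The source does not state this formula, so it is an assumption inferred from the printed values. The short Python sketch below recomputes the ratio for a few rows copied from the table. The names `ROWS`, `speedup_percent`, and `check_rows` are illustrative and do not come from the article.

```python
# Sketch only: assumes speedup (%) = 100 * (SPU cycles) / (new cycles).
# This relationship is inferred from the printed table, not stated by the source.

# (function, cycles/double new, cycles/double SPU, speedup % as printed)
ROWS = [
    ("recip", 3.0, 11.3, 376),
    ("sqrt", 3.0, 15.4, 513),
    ("log2", 2.5, 14.6, 584),
    ("exp", 5.0, 14.4, 288),
]


def speedup_percent(cycles_new, cycles_spu):
    """Return the SPU-to-new cycle ratio as a percentage."""
    return 100.0 * cycles_spu / cycles_new


def check_rows(rows, tolerance=1.0):
    """Recompute each row and flag whether it lies within `tolerance`
    percentage points of the printed speedup."""
    results = []
    for name, new, spu, printed in rows:
        computed = speedup_percent(new, spu)
        results.append((name, computed, printed, abs(computed - printed) <= tolerance))
    return results


if __name__ == "__main__":
    for name, computed, printed, ok in check_rows(ROWS):
        print(f"{name:6s} computed={computed:7.1f} printed={printed:4d} within_tolerance={ok}")
```

The printed cycle counts are presumably rounded, so a recomputed ratio need not match the printed speedup exactly. The tolerance of one percentage point is an arbitrary choice for this sketch.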