Research Article
Efficient Parallel Implementation of Matrix Multiplication for Lattice-Based Cryptography on Modern ARM Processor
Table 2
Matrix transpose performance (Unit: ms).
| N | M | L | C version (Auto-Vectorization) | Proposed (NEON) |
| 536 | 1024 | 256 | 364.2304 | 0.446443 |
| 663 | 1024 | 256 | 630.0066 | 0.707373 |
| 816 | 1024 | 384 | 970.4782 | 1.78282 |
| 952 | 1024 | 384 | 1172.607 | 2.078113 |