Research Article
Efficient Parallel Implementation of Matrix Multiplication for Lattice-Based Cryptography on Modern ARM Processor
Table 3
Matrix multiplication performance (unit: ms).
| N | M | L | C version [3] (Auto-Vectorization) | Proposed (NEON) |
| --- | --- | --- | --- | --- |
| 536 | 1024 | 256 | 148.8991 | 93.91285 |
| 663 | 1024 | 256 | 171.0976 | 159.2069 |
| 816 | 1024 | 384 | 334.7499 | 224.5633 |
| 952 | 1024 | 384 | 391.7564 | 361.7326 |