Research Article
Efficient Parallel Implementation of Matrix Multiplication for Lattice-Based Cryptography on Modern ARM Processor
Table 1
ARM NEON intrinsic functions for the proposed method.
| Operation | ARM NEON intrinsic function |
| --- | --- |
| Load | uint16x8_t vld1q_u16(__transfersize(8) uint16_t const * ptr); |
| Store | void vst1q_u16(__transfersize(8) uint16_t * ptr, uint16x8_t val); |
| Extract a lane from a vector into a register | uint16_t vgetq_lane_u16(uint16x8_t vec, __constrange(0, 7) int lane); |
| Broadcast a scalar to all lanes | uint16x8_t vdupq_n_u16(uint16_t value); |
| Vector Interleave | uint16x8x2_t vzipq_u16(uint16x8_t a, uint16x8_t b); |
| Vector Multiply Accumulate | uint16x8_t vmlaq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); |
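
To illustrate how the load, broadcast, multiply-accumulate, and store intrinsics in Table 1 can be combined, the following is a minimal sketch of a vector-matrix product over 16-bit words (arithmetic modulo 2^16). It is not the paper's optimized routine: the function name `vec_mat_mul_acc_u16`, the row-major layout of `A`, and the requirement that the column count be a multiple of 8 are assumptions made for this example. A scalar fallback is included so that the sketch also compiles on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/*
 * Hypothetical helper (not from the paper):
 *   out[j] += sum_i s[i] * A[i * n + j]   (mod 2^16)
 * A is a k x n matrix stored row-major; n is assumed to be a multiple of 8
 * so that each step processes one full 128-bit vector of eight 16-bit lanes.
 */
void vec_mat_mul_acc_u16(uint16_t *out, const uint16_t *s,
                         const uint16_t *A, size_t k, size_t n)
{
    assert(n % 8 == 0);

    for (size_t j = 0; j < n; j += 8) {
#if HAVE_NEON
        /* Load eight running sums from the output. */
        uint16x8_t acc = vld1q_u16(out + j);
        for (size_t i = 0; i < k; i++) {
            /* Load eight consecutive entries of row i. */
            uint16x8_t row = vld1q_u16(A + i * n + j);
            /* Broadcast the scalar s[i] to all eight lanes. */
            uint16x8_t si = vdupq_n_u16(s[i]);
            /* acc = acc + row * si, lane-wise, wrapping modulo 2^16. */
            acc = vmlaq_u16(acc, row, si);
        }
        /* Store the eight accumulated results. */
        vst1q_u16(out + j, acc);
#else
        /* Portable fallback with the same wrap-around semantics. */
        for (size_t i = 0; i < k; i++) {
            for (size_t l = 0; l < 8; l++) {
                uint32_t prod = (uint32_t)s[i] * A[i * n + j + l];
                out[j + l] = (uint16_t)(out[j + l] + prod);
            }
        }
#endif
    }
}
```

Keeping the accumulator in a vector register across the inner loop means the output is loaded and stored only once per block of eight columns. The lane-extraction and interleave intrinsics from Table 1 are not needed for this simple layout and are left out of the sketch.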