Security and Communication Networks, 2018

Research Article

Efficient Parallel Implementation of Matrix Multiplication for Lattice-Based Cryptography on Modern ARM Processor

Table 1

ARM NEON intrinsic functions for the proposed method.

| Operation | ARM NEON intrinsic function |
| --- | --- |
| Load | `uint16x8_t vld1q_u16(__transfersize(8) uint16_t const * ptr);` |
| Store | `void vst1q_u16(__transfersize(8) uint16_t * ptr, uint16x8_t val);` |
| Extract a lane from a vector into a register | `uint16_t vgetq_lane_u16(uint16x8_t vec, __constrange(0, 7) int lane);` |
| Lane broadcast | `uint16x8_t vdupq_n_u16(uint16_t value);` |
| Vector interleave | `uint16x8x2_t vzipq_u16(uint16x8_t a, uint16x8_t b);` |
| Vector multiply accumulate | `uint16x8_t vmlaq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c);` |
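To illustrate how the load, lane broadcast, multiply-accumulate, and store intrinsics in Table 1 fit together, the following is a minimal sketch of a vector-by-matrix product over 16-bit words (arithmetic modulo 2^16). It is not the article's proposed method. The function name `row_vector_times_matrix_u16`, the row-major layout, and the requirement that the column count be a multiple of 8 are assumptions made for this example. A scalar fallback is included so the sketch also compiles on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (not from the article):
 *   out[j] += sum over i of vec[i] * mat[i * cols + j]   (mod 2^16)
 * mat is row-major with `rows` rows and `cols` columns.
 * Assumption: cols is a multiple of 8, so each step fills one 128-bit vector. */
static void row_vector_times_matrix_u16(uint16_t *out, const uint16_t *vec,
                                        const uint16_t *mat,
                                        size_t rows, size_t cols)
{
    assert(cols % 8 == 0);

    for (size_t j = 0; j < cols; j += 8) {
#if HAVE_NEON
        uint16x8_t acc = vld1q_u16(out + j);               /* Load 8 accumulators */
        for (size_t i = 0; i < rows; i++) {
            uint16x8_t s = vdupq_n_u16(vec[i]);            /* Broadcast vec[i] to all lanes */
            uint16x8_t m = vld1q_u16(mat + i * cols + j);  /* Load 8 matrix entries */
            acc = vmlaq_u16(acc, m, s);                    /* acc += m * s, lane-wise */
        }
        vst1q_u16(out + j, acc);                           /* Store 8 results */
#else
        /* Scalar fallback with the same wrap-around semantics. */
        for (size_t i = 0; i < rows; i++) {
            for (size_t k = 0; k < 8; k++) {
                uint32_t prod = (uint32_t)mat[i * cols + j + k] * vec[i];
                out[j + k] = (uint16_t)(out[j + k] + prod);
            }
        }
#endif
    }
}
```

The accumulator stays in a register across the inner loop, so each block of eight outputs is loaded and stored only once. Broadcasting the scalar `vec[i]` with `vdupq_n_u16` lets one `vmlaq_u16` update eight output columns at a time. `vzipq_u16` and `vgetq_lane_u16` are not used in this sketch.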
