Security and Communication Networks, 2018

Research Article

Efficient Parallel Implementation of Matrix Multiplication for Lattice-Based Cryptography on Modern ARM Processor

Algorithm 2

Efficient matrix multiplication and accumulation.
Require: Matrix A (M × N matrix), Matrix S (N × L matrix, stored transposed as L rows of length N), Matrix E (M × L matrix)
Ensure: Matrix E (M × L matrix)
for i from 0 to M do
  for j from 0 to L do
    sum_vect = NEON_Lane_Broadcast(0);
    for k from 0 to iter_k do
      a_vec = NEON_Vector_Load(A + i*N + k*LANES_SHORT_NUM);
      s_vec = NEON_Vector_Load(S + j*N + k*LANES_SHORT_NUM);
      sum_vect = NEON_Multiply_Accumulate(sum_vect, a_vec, s_vec);
    NEON_Vector_Store(sum, sum_vect);
    E[i*L + j] += sum[0] + sum[1] + sum[2] + sum[3] + sum[4] + sum[5] + sum[6] + sum[7];
    if (k == N/LANES_SHORT_NUM) && (N % LANES_SHORT_NUM != 0)
      for k from N - (N % LANES_SHORT_NUM) to N do
        E[i*L + j] += A[i*N + k] * S[j*N + k];
Return E;
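
The steps of Algorithm 2 can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `matmul_acc`, the `uint16_t` element type (arithmetic wraps modulo 2^16), the value `LANES_SHORT_NUM = 8`, and the transposed row-wise layout of S (implied by the `S + j*N + k*LANES_SHORT_NUM` addressing) are all assumptions. The mapping of the pseudocode's NEON operations to `vdupq_n_u16`, `vld1q_u16`, `vmlaq_u16`, and `vst1q_u16` is likewise a guess at what the listing intends. A portable scalar loop stands in for the intrinsics when NEON is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Assumption: eight 16-bit lanes per 128-bit NEON register. */
#define LANES_SHORT_NUM 8

/* E (M x L) += A (M x N) * S, with S assumed stored transposed
 * (L rows of N elements). All arithmetic wraps modulo 2^16. */
void matmul_acc(const uint16_t *A, const uint16_t *S, uint16_t *E,
                size_t M, size_t N, size_t L)
{
    assert(A != NULL && S != NULL && E != NULL);
    size_t iter_k = N / LANES_SHORT_NUM; /* number of full vector chunks */

    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < L; j++) {
            uint16_t sum[LANES_SHORT_NUM] = {0};
#if defined(__ARM_NEON)
            uint16x8_t sum_vect = vdupq_n_u16(0);
            for (size_t k = 0; k < iter_k; k++) {
                uint16x8_t a_vec = vld1q_u16(A + i * N + k * LANES_SHORT_NUM);
                uint16x8_t s_vec = vld1q_u16(S + j * N + k * LANES_SHORT_NUM);
                sum_vect = vmlaq_u16(sum_vect, a_vec, s_vec); /* sum += a * s per lane */
            }
            vst1q_u16(sum, sum_vect);
#else
            /* Portable stand-in for the lane-wise multiply-accumulate. */
            for (size_t k = 0; k < iter_k; k++) {
                for (size_t l = 0; l < LANES_SHORT_NUM; l++) {
                    size_t off = k * LANES_SHORT_NUM + l;
                    sum[l] = (uint16_t)(sum[l] +
                             (uint32_t)A[i * N + off] * S[j * N + off]);
                }
            }
#endif
            /* Horizontal add of the eight partial sums. */
            uint16_t acc = 0;
            for (size_t l = 0; l < LANES_SHORT_NUM; l++)
                acc = (uint16_t)(acc + sum[l]);

            /* Scalar tail for the N % LANES_SHORT_NUM leftover elements. */
            for (size_t k = iter_k * LANES_SHORT_NUM; k < N; k++)
                acc = (uint16_t)(acc + (uint32_t)A[i * N + k] * S[j * N + k]);

            E[i * L + j] = (uint16_t)(E[i * L + j] + acc);
        }
    }
}
```

Starting the tail loop at `iter_k * LANES_SHORT_NUM` plays the role of the pseudocode's explicit remainder check: when N is a multiple of the lane count, the loop simply does not execute.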
