Research Article

Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

Algorithm 3

DPF-based parallel Montgomery multiplication algorithm: Converting Phase.
Input:
: Thread ID;
: Number of processed limbs per thread;
: Number of threads per Montgomery multiplication, where ;
: Redundant-format sub-result, where
;
Output:
: Simplified-format sub-result, where
;
(1)
(2) for   to   do
(3)
(4)
(5) end for
(6) while carry of any thread is non-zero do
(7)
(8) for   to   do
(9)
(10)
(11) end for
(12) end while
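The control flow of the converting phase (a local pass over each thread's limbs, followed by a loop that repeats until no thread holds a carry) can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the paper's exact statements, which are not legible in the listing above: the names `convert_phase`, `limbs_to_int`, `k`, `t`, `w` and `overflow` are hypothetical, limbs are modelled as Python integers rather than double-precision values, each simplified limb is assumed to hold `w` bits, and threads are modelled as slices of one list. On a GPU the neighbour exchange would typically use an inter-thread primitive such as a warp shuffle.

```python
def limbs_to_int(limbs, w):
    """Interpret a little-endian limb list (limb i has weight 2**(w*i)) as an integer."""
    return sum(v << (w * i) for i, v in enumerate(limbs))


def convert_phase(limbs, k, t, w):
    """Simulate carry resolution for k threads that each own t limbs.

    limbs: k*t non-negative integers in a redundant format, i.e. a limb may
           exceed w bits (assumed layout: thread tid owns limbs[tid*t:(tid+1)*t]).
    Returns (simplified_limbs, overflow), where every simplified limb is < 2**w
    and overflow is whatever was carried out of the most significant thread.
    """
    mask = (1 << w) - 1
    out = list(limbs)
    carry = [0] * k  # one carry register per simulated thread

    def local_pass(tid, carry_in):
        # Propagate a carry through the t limbs owned by one thread.
        c = carry_in
        for i in range(t):
            v = out[tid * t + i] + c
            out[tid * t + i] = v & mask  # keep the low w bits
            c = v >> w                   # the rest moves to the next limb
        return c

    # First pass: every thread normalises its own limbs with no incoming carry.
    for tid in range(k):
        carry[tid] = local_pass(tid, 0)

    overflow = 0
    # Repeat while the carry of any thread is non-zero.
    while any(carry):
        overflow += carry[-1]          # carry leaving the top thread
        incoming = [0] + carry[:-1]    # each thread reads its lower neighbour
        for tid in range(k):
            carry[tid] = local_pass(tid, incoming[tid])

    return out, overflow


if __name__ == "__main__":
    redundant = [555, 255, 255, 255, 1000, 0, 0, 0]
    simplified, overflow = convert_phase(redundant, k=4, t=2, w=8)
    print(simplified, overflow)
```

The loop terminates because each round moves pending carries one thread upward and shrinks them. In the common case only a few rounds are needed, which is why a data-dependent `while` loop is cheaper than always running `k` rounds.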