Research Article

Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

Table 8

Throughput of operations per second.

Neves and Araujo [22]; Henry and Goldberg [23]; Jang et al. [19]; Emmart and Weems [7]; Yang [24]; Jeffrey and Robinson [21]; Ours

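| CUDA platform | GTX 260 | M2050 | GTX 580 | GTX 780Ti | GT 750m | GTX TITAN |
|---|---|---|---|---|---|---|
| SM number | 24 | 14 | 16 | 15 | 2 | 14 |
| Shader clock (GHz) | 1.242 | 1.150 | 1.544 | 0.876 | 0.967 | 0.837 |
| Int Mul/SM (/clock) | 8 (24-bit) | 16 | 16 | 32 | 32 | 32 |
| Int Mul (G/s) | 134 (238) | 258 | 395 | 420 | 62 | 375 |
| Throughput scaling factor | 0.357 | 0.688 | 1.053 | 1.12 | 0.165 | 1 |
| Latency scaling factor | 1.484 | 1.374 | 1.845 | 1.047 | 1.155 | 1 |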

1024-bit MulMod (ops/s)
MulMod (scaled) (ops/s)
RSA-1024 (ops/s): 41,426; 34,981

RSA-2048 (ops/s): 12,044; 62,365; 5,244; 42,211
RSA-2048 (ms): 13.83; 60.07; 195.27; 21.22

RSA-2048 (scaled) (ops/s): 14,504; 11,438; 55,683; 31,782; 42,211
RSA-2048 (scaled) (ms): 25.51; 62.87; 225.60; 21.22

RSA-4096 (ops/s): 5,257; 5,790
RSA-4096 (scaled) (ops/s): 4,693; 5,790

et al. also report that the latency of RSA-2048 decryption is 6.5 ms (6.8 ms after scaling) when the Batch Size is 1, at which point the throughput is 154 ops/s.
Peak 2048-bit RSA throughput, when Threads/RSA is , window size is 6, Max Reg. is 127, and Batch Size is .
Peak 4096-bit RSA throughput, when Threads/RSA is , window size is 6, Max Reg. is 127, and Batch Size is .
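
The scaling factors in Table 8 can be reproduced from its hardware rows. The following Python sketch is not taken from the paper. It assumes that Int Mul (G/s) is SM number × shader clock × Int Mul/SM, that a 24-bit multiply is weighted by (24/32)^2 relative to a 32-bit one (which matches the 238 and 134 listed for the GTX 260), that the throughput scaling factor is the ratio of the rounded Int Mul figures to that of the GTX TITAN, and that the latency scaling factor is the ratio of shader clocks. The function names are hypothetical, and the pairing of reported results with platforms is inferred from which factor reproduces the scaled values.

```python
REFERENCE = "GTX TITAN"

# Per platform: (SM number, shader clock in GHz, Int Mul/SM per clock,
# multiplier width in bits). Values are copied from Table 8; the 32-bit
# width for every platform other than the GTX 260 is an assumption.
HARDWARE = {
    "GTX 260":   (24, 1.242, 8, 24),
    "M2050":     (14, 1.150, 16, 32),
    "GTX 580":   (16, 1.544, 16, 32),
    "GTX 780Ti": (15, 0.876, 32, 32),
    "GT 750m":   (2, 0.967, 32, 32),
    "GTX TITAN": (14, 0.837, 32, 32),
}


def int_mul_gps(platform):
    """32-bit-equivalent integer multiplies per second (G/s), rounded."""
    sm, clock_ghz, per_sm, bits = HARDWARE[platform]
    # Assumption: a b-bit multiply counts as (b/32)^2 of a 32-bit multiply.
    return round(sm * clock_ghz * per_sm * (bits / 32) ** 2)


def throughput_factor(platform):
    """Ratio of integer multiply rates relative to the reference GPU."""
    return round(int_mul_gps(platform) / int_mul_gps(REFERENCE), 3)


def latency_factor(platform):
    """Ratio of shader clocks relative to the reference GPU (unrounded)."""
    return HARDWARE[platform][1] / HARDWARE[REFERENCE][1]


def scale_throughput(ops_per_s, platform):
    """Throughput the reported result would correspond to on the reference GPU."""
    return ops_per_s / throughput_factor(platform)


def scale_latency(ms, platform):
    """Latency the reported result would correspond to on the reference GPU."""
    return ms * latency_factor(platform)


def throughput_from_latency(ms, batch_size=1):
    """Operations per second when batch_size operations finish every ms milliseconds."""
    return batch_size * 1000.0 / ms


if __name__ == "__main__":
    for name in HARDWARE:
        print(name, int_mul_gps(name), throughput_factor(name),
              round(latency_factor(name), 3))
    # Assumed pairing: 12,044 ops/s and 13.83 ms were measured on the GTX 580.
    print(scale_throughput(12044, "GTX 580"), scale_latency(13.83, "GTX 580"))
    # Assumed pairing: the 6.5 ms latency at Batch Size 1 is for the GTX 780Ti.
    print(scale_latency(6.5, "GTX 780Ti"), throughput_from_latency(6.5))
```

Under these assumptions, 12,044 ops/s scales to about 11,438 ops/s and 13.83 ms to about 25.51 ms, and a latency of 6.5 ms at Batch Size 1 corresponds to about 154 ops/s, in line with the values in the table and its notes.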