International Journal of Reconfigurable Computing

Volume 2016 (2016), Article ID 6371403, 10 pages

http://dx.doi.org/10.1155/2016/6371403

## FPGA Based High Speed SPA Resistant Elliptic Curve Scalar Multiplier Architecture

^{1}Electrical Engineering Department, COMSATS Institute of Information Technology, Abbottabad, Pakistan^{2}School of Electronic Engineering, Dublin City University, Dublin, Ireland^{3}School of Computer & Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China

Received 16 December 2015; Revised 30 March 2016; Accepted 3 May 2016

Academic Editor: Michael Hübner

Copyright © 2016 Khalid Javeed and Xiaojun Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The higher computational complexity of an elliptic curve scalar point multiplication operation limits its implementation on general purpose processors. Dedicated hardware architectures are essential to reduce the computational time, which results in a substantial increase in the performance of associated cryptographic protocols. This paper presents a unified architecture to compute modular addition, subtraction, and multiplication operations over a finite field of large prime characteristic . Subsequently, dual instances of the unified architecture are utilized in the design of high speed elliptic curve scalar multiplier architecture. The proposed architecture is synthesized and implemented on several different Xilinx FPGA platforms for different field sizes. The proposed design computes a 192-bit elliptic curve scalar multiplication in 2.3 ms on Virtex-4 FPGA platform. It is 34 faster and requires 40 fewer clock cycles for elliptic curve scalar multiplication and consumes considerable fewer FPGA slices as compared to the other existing designs. The proposed design is also resistant to the timing and simple power analysis (SPA) attacks; therefore it is a good choice in the construction of fast and secure elliptic curve based cryptographic protocols.

#### 1. Introduction

Elliptic curve based cryptography (ECC) proposed independently by Miller [1] and Koblitz [2] has established itself as a proper alternative to the traditional systems such as Ron Rivest, Adi Shamir, and Leonard Adleman (RSA) [3]. The National Institute of Standards and Technology (NIST) recommended 256 bits of key lengths for ECC to achieve the same level of security as 3072 bits of RSA.

Due to the fact that ECC offers similar security with considerable smaller key sizes than RSA, it has been standardized by IEEE and NIST [4]. Thus, as the result of smaller key sizes, its implementation led to substantial reduction in power consumption and storage requirements and offers potentially higher data rates. These inherent properties rank it as a strong candidate for providing security in resource-constrained devices. Unfortunately, due to the underlying complex mathematical structure, its implementation on general-purpose processors (GPP) struggles to meet the speed requirements of many real-time applications.

Thus, several new implementation platforms have been explored during the last years. Field programmable gate array (FPGA) has been established as a proper platform for implementation of security algorithms such as ECC and RSA. Its shorter design cycle time, lower design cost, and its reconfigurability make it more attractive than other platforms, such as Application Specific Integrated Circuits (ASICs).

Elliptic curve scalar point multiplication is the central and most time consuming operation in all ECC based schemes. Its efficient implementation on various platforms is very critical. It is achieved by manipulating points on a properly chosen elliptic curve over a finite field. Mathematically, it is expressed as , where is a base point, is an integer value, and is the resultant point of multiplication of and . For example, it can be achieved by adding to itself () times. The strength of any ECC schemes is based on the computational hardness of finding given and known as Elliptic Curve Discrete Logarithm Problem (ECDLP).

There are several elliptic curve representations satisfying different performance and security requirements. A flexible design capable of supporting different values for elliptic curve parameters and a prime is more demanding. The ECDLP is not the only way of finding scalar ; it can also be revealed by monitoring the timing [5] and power consumption of cryptographic devices known as side channel attacks (SCAs) [6]. The simplest SCAs are based on the timing and simple power consumption analysis (SPA). Detailed surveys on known SCAs, countermeasures, and secure ECC implementations are reported previously in [7, 8].

Elliptic curve scalar point multiplication involves many basic modular arithmetic operations such as addition, subtraction, multiplication, inversion, and division. Hence, optimization of these operations can significantly improve the performance of ECC schemes.

Elliptic curve cryptosystems can be designed on a finite field either with prime characteristics or with binary characteristics . The arithmetic is easier to implement in hardware than because of carry-free arithmetic. However, field parameters in are mostly fixed and are not very flexible. Some efficient ECC implementations over are presented in [9–14]. A very good survey of high speed hardware implementations of ECC has been reported in [15].

Several hardware based elliptic curve processors over have also been proposed in the literature [5, 16–26]. The design reported in [21] proposed two architectures to speed up the EC point multiplication operation. Both these architectures are based on incorporating parallel dedicated hardware units to compute arithmetic operations such as addition, subtraction, multiplication, and division over . The multiplication unit [21] is based on a bit-serial interleaved multiplication while, for a division over , a dedicated hardware unit based on a binary version of the extended Euclidean algorithm is used. Ghosh et al. proposed a speed and area optimized architecture for EC point multiplication by exploiting a concept of shared hardware arithmetic over [20]. The saving in area is achieved by sharing hardware resources among different arithmetic operations, while multiple copies of the arithmetic units are used to speed up EC point multiplication.

##### 1.1. Contribution

Modern FPGAs have dedicated built-in arithmetic components (dedicated multipliers, block RAMs, etc.) to perform different signal processing tasks efficiently. However, in this work these components are not used due to the limitations of the adopted technique to perform a modular multiplication, that is, Interleaved Multiplication (IM) algorithm [27], which interleaved the reduction step by reducing each partial product. To the best of authors knowledge, no work has been reported targeting a digit-wise implementation of the IM technique. However, available small-sized dedicated multipliers inside an FPGA can be very effective in case of the Montgomery multiplication [28] and the NIST recommended primes [29]. A modular multiplication using these methods can be performed by integers multiplication followed by a modular reduction.

This paper presents a novel architecture to speed up the EC point multiplication in affine coordinates. The proposed design is based on a unified adder, subtractor, and multiplier (Add/Sub/Mul) unit. The unified Add/Sub/Mul unit is an extension of our previous multiplier design reported in [30]. The proposed unified unit in this work performs modular addition and subtraction in a single clock cycle, while modular multiplication is performed in clock cycles, where . The careful FPGA implementation of the proposed EC point multiplication architecture outperforms the other existing designs. The main advantages of the proposed design are as follows.(i)It reduces the number of required clock cycles and computation time of EC point multiplication to almost and , respectively, with considerably smaller FPGA area consumption. The reduction in clock cycles and computation time is mainly due to the proposed multiplier [30].(ii)Furthermore, the adopted algorithm for EC point multiplication with careful implementation of arithmetic primitives is capable of resisting the timing and SPA attacks [5].(iii)It is flexible; all parameters (curve parameter , EC point , scalar value , and the prime value ) can be easily changed without FPGA reconfiguration.

This paper is organized as follows. Section 2 briefly explains EC group operations such as EC point addition and EC point doubling. In addition, this section also describes the Montgomery ladder structure for the EC point multiplication algorithm. The unified Add/Sub/Mul unit over is presented in Section 3. Section 4 proposes a novel architecture for EC point multiplication based on the unified Add/Sub/Mul unit. Implementation results and performance evaluation are presented in Section 5, and finally the paper is concluded in Section 6.

#### 2. Elliptic Curve Group Operations

In this paper, we consider an elliptic curve , defined over a prime field , where is a large prime characteristic number. Field elements are represented as integers in the range []. An elliptic curve over in short Weierstrass form is represented aswhere, , , , and and (modulo ). The set of all points that satisfies (1), plus the point at infinity, makes an abelian group. EC point addition and EC point doubling operations over such groups are used to construct many elliptic curve cryptosystems. The EC point addition and EC point doubling operations in affine coordinates can be represented as follows: let and be two points on the elliptic curve. The group operation is the point addition, , which is defined by the group law and is given aswhere

If , then a special case of adding a point to itself is called EC point doubling operation. In affine coordinates the EC point addition requires one division, two multiplications, and six addition or subtraction operations, whereas the EC point doubling can be performed by using one division, three multiplications, and seven addition or subtraction operations. Therefore, optimization of these operations impacts significantly on the overall performance of the EC point multiplication operation.

##### 2.1. Elliptic Curve Scalar Multiplication

EC cryptosystems are mostly based on the EC point multiplication operation. This operation can be performed as a sequence of EC point addition and EC point doubling operations given in Algorithm 1, which is known as the Montgomery ladder for EC point multiplication. Algorithm 1 works on the binary representation of and it is assumed that the most significant bit is equal to 1. The EC point addition and EC point doubling operations are not dependant on the bit pattern of , so these operations can be performed in parallel. As these can be executed concurrently, therefore Algorithm 1 gives an extra feature of protection against the timing and simple power analysis (SPA) attacks.