Security and Communication Networks

Volume 2018, Article ID 4983404, 18 pages

https://doi.org/10.1155/2018/4983404

## A Vendor-Neutral Unified Core for Cryptographic Operations in GF(p) and GF(2^m) Based on Montgomery Arithmetic

^{1}Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland

^{2}Institute ProtectIT, Deggendorf Institute of Technology, 94469 Deggendorf, Germany

Correspondence should be addressed to Martin Schramm; martin.schramm@th-deg.de

Received 6 October 2017; Revised 14 March 2018; Accepted 17 May 2018; Published 21 June 2018

Academic Editor: Fawad Ahmed

Copyright © 2018 Martin Schramm et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In the emerging IoT ecosystem, in which internetworking will reach a new dimension, efficient security solutions for embedded devices will play a crucial role. IoT-enabled devices are typically equipped with integrated circuits, such as ASICs or FPGAs, to perform highly specific tasks. Such devices must implement cryptographic layers and must be able to access cryptographic functions for encrypting/decrypting and signing/verifying data with various algorithms, and for generating true random numbers, random primes, and cryptographic keys. Because typical IoT devices offer only limited resources, in part due to energy efficiency requirements, hardware structures that are efficient in terms of time, area, and power consumption must be deployed. In this paper, we describe a scalable, word-based, multivendor-capable cryptographic core that performs arithmetic operations in prime and binary extension finite fields based on Montgomery Arithmetic. The functional range comprises modular additions and subtractions, the determination of the Montgomery Parameters, and the execution of Montgomery Multiplications and Montgomery Exponentiations. A prototype implementation of the adaptable arithmetic core is detailed. Furthermore, the decomposition of cryptographic algorithms into operations for the proposed core is described and a performance analysis is given.

#### 1. Introduction

The next generation of embedded systems and IoT devices will exhibit a much higher degree of internetworking, which gives rise to security considerations [1]. As a logical consequence, such devices must become cryptographic nodes that are capable, among other things, of encrypting/decrypting and signing/verifying data as well as establishing spontaneous secured communications by exchanging common secrets used for secret key calculation. While many embedded chips already support hardware-accelerated symmetric algorithms (mainly AES) [2] and hash functions, for reasons such as complexity, space, and cost they lack hardware support for a wide range of public-key and key exchange algorithms with different precision widths. In addition, many modern cryptographic primitives require the capability to produce true random numbers and random prime numbers. Typical IoT devices very often have only a limited amount of resources, which requires cryptographic hardware structures that are efficient in terms of area, power consumption, and calculation performance [3]. In general, enterprises developing IoT products have three options for including application functionality in highly integrated devices: Application Specific Standard Products (ASSP), Application Specific Integrated Circuits (ASIC), or Field Programmable Gate Arrays (FPGA). Today FPGAs have become promising components for IoT applications [4]: they can provide functionality that ASSP solutions often cannot, and they can offer a better Total Cost of Ownership (TCO) than ASIC solutions. Thus, for devices that are equipped with an FPGA, it is valuable to examine how efficient hardware structures for performing cryptographic operations can be included.

With regard to algorithm agility, an arithmetic engine with a minimal hardware footprint that can handle the arithmetic operations of a great variety of cryptographic algorithms is of great importance for IoT-based devices. In particular, the predictability of the individual operations, which yields lower and upper bounds on calculation time, is important.

This paper proposes a compact, vendor-neutral cryptographic arithmetic core, implemented in FPGA logic as an example. For efficiency, Montgomery Arithmetic is used for time-intensive modular operations such as multiplication and exponentiation. Without requiring any expensive software precalculations, the core can perform a large number of cryptographic algorithms and handle various key sizes simply by processing operation lists. Furthermore, the core architecture is unified and can perform calculations in both prime finite fields (GF(p)) and binary extension fields (GF(2^m)). To illustrate the versatility of the developed core, well-established cryptographic algorithms have been rewritten and fragmented into operation lists to be processed by the arithmetic engine.

The paper is organized as follows. Section 2 reviews the related work. Section 3 presents the design of the proposed Enhanced Montgomery Multiplication Core, and Section 4 gives the specified functional range of the core. Section 5 describes some exemplary applications of the core, and Section 6 reports the results of the performance analysis. Finally, Section 7 concludes the paper.

#### 2. Related Works

The efficiency of cryptographic algorithms implemented on reconfigurable hardware is mainly determined by how the underlying finite field arithmetic operations are realized [5]. Several applications in cryptography, such as encryption and decryption with asymmetric algorithms, the creation and verification of digital signatures, and secure key exchange mechanisms, require extensive use of the basic finite field modular arithmetic operations: addition, multiplication, and the calculation of the multiplicative inverse. The field multiplication operation in particular is crucial to the efficiency of a design, since it is the core operation of many cryptographic algorithms [6].

In [7] P. L. Montgomery introduced a representation of residue classes that speeds up modular multiplications without affecting modular additions and subtractions. Over the years numerous designs have been proposed that implement modular multiplications based on Montgomery’s multiplication algorithm [8]. The foundation for these architectures was presented by A. Tenca and Ç. Koç in [9]. The architecture is based on a word-based Montgomery Multiplication algorithm for prime finite fields in which multiplications are performed in a bit-serial fashion. E. Savaş et al. in [10] proposed an extension which, in addition to the standard integer modulo arithmetic, also allows polynomial computations over binary finite fields. An overview of algorithms and hardware architectures for Montgomery Multiplication can be found in [11]. Optimizations of the original design have been proposed, both for the hardware implementation of the Montgomery Multiplication algorithm itself [12] and by utilizing the special arithmetic hard-core extensions that FPGAs provide to accelerate digital signal processing applications [13]. Some designs focus only on utilizing the Montgomery Multiplication method to accelerate modular exponentiation operations as required by the RSA algorithm [14, 15].
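
To make the bit-serial idea concrete, the following is a minimal software sketch of a radix-2 Montgomery Multiplication over the integers. It is not the word-based hardware algorithm of [9]; it processes one bit of the first operand per iteration and ignores the word-level partitioning. The function name `mont_mul` and the parameter `k` (the number of bits, so that R = 2^k) are illustrative assumptions.

```python
def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n.

    Assumes n is odd and 0 <= a, b < n < 2^k.
    """
    s = 0
    for i in range(k):
        # Add b if bit i of a is set (bit-serial scan of operand a).
        if (a >> i) & 1:
            s += b
        # Make s even by adding the odd modulus, then halve exactly.
        if s & 1:
            s += n
        s >>= 1
    # The loop keeps s < 2n, so one conditional subtraction suffices.
    if s >= n:
        s -= n
    return s


if __name__ == "__main__":
    n, k = 97, 7                      # toy modulus, R = 2^7
    R = 1 << k
    a_m = (5 * R) % n                 # 5 in Montgomery form
    b_m = (7 * R) % n                 # 7 in Montgomery form
    prod_m = mont_mul(a_m, b_m, n, k) # product, still in Montgomery form
    print(mont_mul(prod_m, 1, n, k))  # convert back to an ordinary residue
```

The division by two is exact in every iteration because the modulus is odd, which is why no trial division by `n` is ever needed.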

However, no publication focuses on how the Montgomery Multiplication architecture can be embedded into a comprehensive solution. In this paper we propose an enhanced version of a bit-serial, word-based, unified Montgomery Multiplication core that is based on logic elements only, is controlled by a state machine, and offers the functional range needed to perform complete cryptographic algorithms without additional complex processing in software.
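
The notion of a unified core can be sketched in software as a single bit-serial loop in which only the addition differs between the two fields: integer addition with carries in the prime field, and carry-free XOR in the binary field. This is an illustrative sketch under that assumption, not the architecture proposed here; the name `mont_mul_unified` and the `binary_field` flag are hypothetical.

```python
def mont_mul_unified(a: int, b: int, n: int, k: int, binary_field: bool) -> int:
    """Bit-serial Montgomery Multiplication for both field types.

    Prime field:  n is an odd modulus, 0 <= a, b < n < 2^k,
                  result is a * b * 2^(-k) mod n.
    Binary field: n encodes an irreducible polynomial of degree k
                  (bit k and bit 0 set), a and b have degree < k,
                  result is a * b * x^(-k) mod n.
    """
    s = 0
    for i in range(k):
        if (a >> i) & 1:
            # Field addition: XOR has no carries, integer add does.
            s = (s ^ b) if binary_field else (s + b)
        if s & 1:
            # Clear the lowest bit/coefficient using the modulus.
            s = (s ^ n) if binary_field else (s + n)
        s >>= 1  # exact division by 2 (or by x)
    # Only the integer case can overshoot and need a final subtraction.
    if not binary_field and s >= n:
        s -= n
    return s


if __name__ == "__main__":
    # Same routine, two fields: integers mod 97 and GF(2^8) with x^8+x^4+x^3+x+1.
    print(mont_mul_unified(58, 23, 97, 7, binary_field=False))
    print(mont_mul_unified(0x57, 0x83, 0x11B, 8, binary_field=True))
```

In the binary case the intermediate value never exceeds degree `k - 1` after the shift, so the final conditional subtraction is skipped.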

#### 3. Enhanced Montgomery Multiplication Core

##### 3.1. Requirements

Today a large number of different public-key algorithms are in use. To ensure compatibility, cryptographic applications must support a large portion of those algorithms. While typical software implementations can often be upgraded easily to adopt new algorithms and larger key sizes, the same is not necessarily true for hardware implementations. Therefore, the following requirements have been identified for the Enhanced Montgomery Multiplication Core:

(i) *Use of Montgomery Arithmetic*. The design must be able to perform modulo operations in a time-efficient manner by using Montgomery Arithmetic. At a minimum, the core must support Montgomery Multiplications and Montgomery Exponentiations. Furthermore, the core must support standard modulo additions and modulo subtractions.

(ii) *Works on Both Finite Fields GF(p) and GF(2^m)*. The architecture must exhibit a unified structure supporting both standard integer modulo operations of prime finite fields and polynomial calculations of binary finite fields.

(iii) *Montgomery Parameter Calculation*. In general the Montgomery Parameters ( and ) can be precomputed for previously known moduli. However, as a requirement the core must be able to handle arbitrary moduli. Therefore it must be capable of calculating the Montgomery Parameters , and without the need for precalculations done in software.

(iv) *Scalable Design*. The architecture must be scalable in terms of timing, area, and power consumption. This includes the parametrisation of the word width, the internal storage size, and the number of processing units within the pipeline.

(v) *Multialgorithm Support*. The core must be based on a building-block design. The functional range provided by the arithmetic unit should enable algorithm agility by fragmenting cryptographic algorithms into a list of core operations. At a minimum, the core must be capable of performing RSA [16] operations, (safe) prime number generation and primality testing (MR) [17, 18], key exchange operations (DH) [19], and elliptic curve calculations (EC) [20] over both prime and binary finite fields.

(vi) *Supporting as Many Precision Widths as Possible*. The design must support a wide range of different precision widths, which determine the security level of the cryptographic algorithm. If a certain security level becomes inadequate because of increased attacker computing power, the precision width can be adjusted accordingly, which makes the hardware less prone to becoming obsolete due to higher security demands. The core must support the current recommendations for minimum key sizes [21] and should also support larger key sizes. For RSA and Diffie-Hellman key exchange support, the architecture should be able to handle precisions up to bit moduli; for elliptic curve cryptography support, precisions up to bits for prime finite fields and precisions up to bits for binary finite fields should be possible.

(vii) *Time-Invariant Operations*. The architecture must be capable of performing its operations in a time-invariant manner. If security-sensitive information, such as private keys, is processed, it must be ensured that all operations exhibit the same execution time to prevent side-channel attacks based on timing analysis.
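
Requirements (i), (iii), and (vii) can be illustrated together with a small software sketch for the prime field case. It rests on several assumptions that this section does not confirm: the parameters are taken to be R mod n and R^2 mod n with R = 2^k (a common convention), and time invariance is approximated by a square-and-multiply-always loop with a fixed iteration count. All function names are hypothetical, and Python itself gives no constant-time guarantee, so the sketch only shows the fixed operation sequence.

```python
def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n for odd n, with 0 <= a, b < n < 2^k."""
    s = 0
    for i in range(k):
        if (a >> i) & 1:
            s += b
        if s & 1:
            s += n
        s >>= 1
    return s - n if s >= n else s


def mont_params(n: int, k: int) -> tuple[int, int]:
    """Return (R mod n, R^2 mod n) for R = 2^k, assuming n > 1.

    Uses only doubling and a conditional subtraction, i.e. the kind of
    modular addition the core is required to support anyway.
    """
    r = 1
    for _ in range(k):          # r = 2^k mod n
        r <<= 1
        if r >= n:
            r -= n
    r2 = r
    for _ in range(k):          # r2 = 2^(2k) mod n
        r2 <<= 1
        if r2 >= n:
            r2 -= n
    return r, r2


def mont_exp(base: int, exp: int, n: int, k: int, exp_bits: int) -> int:
    """Return base^exp mod n using a fixed number of Montgomery products.

    Assumes 0 <= base < n and 0 <= exp < 2^exp_bits.
    """
    r, r2 = mont_params(n, k)
    x = mont_mul(base, r2, n, k)    # base in Montgomery form
    acc = r                         # 1 in Montgomery form
    for i in reversed(range(exp_bits)):
        acc = mont_mul(acc, acc, n, k)
        prod = mont_mul(acc, x, n, k)          # always computed
        acc = prod if (exp >> i) & 1 else acc  # keep or discard
    return mont_mul(acc, 1, n, k)   # leave Montgomery form


if __name__ == "__main__":
    print(mont_exp(57, 65, 101, 7, exp_bits=8))
```

Every exponent of width `exp_bits` triggers the same number of squarings and multiplications, regardless of how many exponent bits are set.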

##### 3.2. Overall Core Architecture

Figure 1 illustrates the overall architecture of the proposed Enhanced Montgomery Multiplication Core which is capable of meeting all requirements as specified above.