Abstract

The Internet of Things is changing sectors such as manufacturing, agriculture, city infrastructure, and the automotive industry. All these applications call for secure processors that can be embedded in IoT devices. Furthermore, these devices are restricted in terms of computing capabilities, memory, and power consumption. A major challenge is how to meet the need for security in such resource-constrained devices. This paper presents a customized version of the ReonV RISC-V (Reduced Instruction Set Computer-five) processor, itself a modified LEON3, dedicated to IoT applications and equipped with effective security mechanisms built in at the design stage. Firstly, efficient lightweight cipher designs are elaborated and validated. Then, the proposed cryptographic instructions (PRESENT and PRINCE) are integrated into the default instruction set architecture of the ReonV processor core. The instruction set extensions (ISE) of the lightweight cipher modules can be used in software routines exactly like the instructions of the base architecture. A single instruction is sufficient to execute a full lightweight cipher operation. The customized ReonV RISC-V processor is implemented on a Xilinx FPGA platform and is evaluated for Slice LUTs plus FF-pairs, frequency, and throughput. The obtained results show that the proposed concepts not only achieve good encryption results with high performance and reduced cost but also are secure enough to resist the most common attacks.

1. Introduction

The Internet of Things (IoT) refers to a huge number of devices connected to the Internet, all collecting and sharing vast amounts of data [1]. Given this huge number of connected devices, the IoT has become an integral part of the lives of billions of people around the world.

However, open and connected devices, such as smart city assets (smart transportation, smart traffic lights, and smart meters) and Industry 4.0 devices (programmable logic controllers (PLCs), robots, and machines), have recently been exposed to many IoT-related security threats, some leading to devastating security incidents, and the problems are expected to get worse [2, 3]. Hence, security must be the key issue in the choice and deployment of IoT-related devices.

Cryptography is an effective countermeasure, and the IoT is now required to apply encryption to autonomous devices in environments with various restrictions. Lightweight cryptography is a technology researched and developed to enable the application of secure encryption, even for devices with limited resources [4].

Securing IoT devices requires innovation at the processor level to equip the next generation of IoT edge devices with a new kind of enhanced processor. This category of processors is built on Central Processing Units (CPUs) with integrated IoT features, real-time performance, and functional safety. Security mechanisms can be added to processor-based IoT devices by means of specific software libraries or dedicated hardware accelerators. The software approach is very flexible, but it is not suited to extremely constrained devices. Indeed, a software solution can be too costly in terms of memory occupation or energy consumption, and its performance may not be enough to meet the application requirements. Moreover, certain security threats may not be defeated using software solutions only. The second approach offers better performance and often better security, but it is not flexible and often requires an excessively large area. Furthermore, depending on the frequency and the amount of data that needs to be moved to the accelerator, the delay of the transfer could overshadow the speedup of the dedicated hardware.

This paper focuses on the addition of custom lightweight cryptographic instructions into the default instruction set architecture of the ReonV processor.

This work has led to the creation of the ReonV-LWCIS processor, which is a new version of the ReonV. The ReonV is itself a modified version of the LEON3: a synthesizable VHDL model of a 32-bit processor originally compliant with the SPARC V8 architecture, changed to implement the RISC-V RV32I ISA. Two instructions implementing lightweight cryptographic algorithms, PRESENT and PRINCE, are incorporated in the customized processor with respect to computing capabilities, cost, efficiency (i.e., throughput per slice), power, energy per bit consumption, and security level.

The rest of this article is organized as follows: the next section discusses earlier works related to instruction set extensions for cryptographic algorithms. Section 3 gives a description of the LEON3 SPARC V8 processor and its modified version, the ReonV RISC-V processor. The elaborated hardware architectures of lightweight cryptographic algorithms are analyzed in Section 4. The proposed instruction set extensions for lightweight cryptographic instructions (LWCIS) are analyzed in Section 5. Synthesis results on FPGA (Field Programmable Gate Array) platforms as well as the security analysis of the elaborated cipher designs are presented in Section 6. Implementation results of the customized processor are presented in that section too. Section 7 concludes this paper.

This section reviews works related to our proposed approach. Several implementation approaches exist for a given lightweight cryptographic algorithm. The software approach consists of using general purpose processors together with embedded software implementations. The second approach is the implementation using dedicated hardware platforms such as FPGA or ASIC (Application Specific Integrated Circuit). A third, hybrid approach is based on the instruction set extension (ISE) technique and consists of customizing the processor’s instruction set and microarchitecture.

The work presented in [5] introduced a software implementation of HIGHT lightweight block cipher, on 8-bit AVR and 32-bit ARM Cortex-M3 and hardware (i.e., ASIC). The authors suggested generic methods which can be easily used for other ARX block ciphers.

In [6], Varici et al. have implemented different design methodologies of PRESENT algorithm which include hand-coded RTL, Vivado HLS, PicoBlaze, VerySimpleCPU (VSCPU) based microcontrollers, and a customized VSCPU. They propose a customized CPU design based on optimizing the instruction set architecture for PRESENT algorithm. They prove that their solution is efficient with better flexibility features compared to RTL designs.

Eisenkraemer et al. [7] have focused on the enhancement of the instruction set architecture (ISA). A hybrid design is applied to boost the performance of their design. The recommended design is validated by measuring the area overhead, memory parameters, and the speedup for optimized implementations of AES, DES, 3DES, and SHA using Cadence LX7 Processor and Xtensa platform. Their obtained results provide excellent tradeoff with the area, memory, and cycle count performances.

In another work [8], authors have investigated a separate instruction set extension for 32- and 64-bit base architectures. Their results show better performances for an AES-128 block encryption compared to software implementation. The improvements are about 4x and 10x faster with a hardware cost of 1.1 K and 8.2 K gates, respectively. Their work has supported RISC-V in becoming the first widely implemented ISA to support AES acceleration across all implementation profiles, from embedded IoT devices to application and server class processors.

In [9], Werner et al. have presented a method for software IP confidentiality and authentic execution on IoT devices. Their method, named Sponge-based Control Flow Protection (SCFP), encrypts and authenticates instructions at compile time and relies on a hardware extension between the CPU’s fetch and decode stages to decrypt and authenticate them during execution. This prevents code-reuse, code-injection, and fault attacks on the code and the control flow. To check their concept, they proposed an extended version of a RISC-V processor enhanced with SCFP. They reported an average SCFP overhead of 19.8% in code size and 9.1% in execution time. They claimed that their design meets the requirements of IoT devices.

The works of Grabher et al. [10] described a number of lightweight instruction set extensions (ISEs). They have presented a design that can improve the performance overhead that results from bit-sliced implementation, at the same time preserving the advantages of bit-sliced implementation. They have suggested a generic design with a high agility degree of cryptographic algorithms (AES, DES, Serpent, and PRESENT), SHA-1 hash function, and polynomials multiplication.

In [11], a processor instruction set based on a stack-based processor architecture is presented. The recommended design consists of connecting a serialized Klein lightweight cryptographic cipher coprocessor to the 32-bit ZPU core. In this way, different lightweight security and authentication primitives can be implemented in a code- and area-efficient way. The proposed design is synthesized using VeriSilicon GSMC 0.13 µm technology with a frequency of 100 kHz and a gate count of 4.5 K GE.

In another work, an ongoing effort in securing a RISC-V core, called S-RISC-V, is introduced [12]. The proposed methodology consists of using key generation to protect memory with an ISA extension and a hardware implementation. The suggested architecture has been verified on a Zedboard FPGA platform, driven by the host ARM core at a frequency of 25 MHz. Compared to the original RISC-V core, the authors noticed an area overhead of less than 10%.

Steinegger and Primas [13] proposed an instruction set extension for Ascon-p. Their proposed design consists of integration into a 32-bit RI5CY processor core. This permits accelerating the symmetric cryptographic computations with a low cost. The proposed instruction extension for RISC-V avoids the need for protection against implementation attacks, by the choice of the appropriate AEAD mode in software. They prove that their proposed accelerator can be achieved with hardware metrics of 4.7 kGE. Park et al. in [14] have investigated a set of implementation methods of the Simeck algorithms with different data block sizes using a 64-bit Intel processor Advanced Vector Extension 2 (AVX2). They proposed an efficient encryption method for human care service. They obtained 3.5 cycles/byte and 4.6 cycles/byte for Simeck32/64 and Simeck64/128 encryption, respectively.

In industry, a lightweight cryptographic module is used by NXP Semiconductors. The security system on LPC55S6x devices is ensured by the PRINCE encryption algorithm. The cipher is used for real-time encryption of data being written to the flash chip and decryption of secured on-chip flash data while reading [15]. They proposed a real-time on-the-fly encryption/decryption functionality executed in one cycle. This permits securing application code, as well as stored keys and flash update security.

Because innovative IoT devices often require customizable security solutions, there is room to elaborate custom cryptographic architectures with better implementation performance. Processor customization is still an active area of research which aims at reducing the gap between the application requirements and the features of a standard processor by extending the base instruction set of embedded processors with a number of instructions dedicated to security.

In this work, we suggest a solution that combines the efficiency of lightweight cryptographic algorithms, the flexibility of software-based implementations, and the high performance of custom hardware architectures. The proposed solution consists in developing an enhanced processor architecture by customizing both the microarchitecture and the instruction set of the processor. Two lightweight cryptographic instruction set extensions (PRESENT and PRINCE) for the RISC-V ReonV processor core are elaborated. These custom instructions can then be used in software routines exactly like the instructions of the base architecture.

2.1. Description of the ReonV and LEON3 Processors

To be suited for lightweight and IoT applications, a microprocessor core has to be optimized in terms of area and power consumption. Many works deal with architectures and freely available processor designs. The main difficulty consists in applying a compact instruction set architecture which is also code-efficient and has GCC support. LEON3 [16] is a configurable processor written in VHDL, provided as part of the GRLIB IP Library licensed by Cobham Gaisler AB. The availability of its source code makes it possible to make changes and explore new concepts. The synthesizable model is a 32-bit processor compliant with the SPARC V8 architecture and the full source code is available under the GNU GPL license, allowing free and unlimited use for research. LEON3 as a processor is not dedicated to the IoT; it is used in space shuttles and in domains that need heavy calculations.

In this work we used ReonV, an open-source RV32I CPU, which is a VHDL model that was developed by reusing the LEON3 processor as well as the structure of its SoC (System on Chip). Only the 7-stage integer instruction pipeline is changed, so that it implements the RV32I ISA instead of the original SPARC V8 ISA. The instruction set architecture (ISA) of the original LEON3 processor is thus changed from SPARC V8 to RISC-V [18], while retaining all other IP cores and resources such as memory, memory controllers, peripheral support, debugging support unit (DSU), synthesis scripts for several FPGAs from different manufacturers, and others.

Figure 1 shows the SoC structure of the LEON3 processor in the GRLIB models. It also illustrates that there are many gains when reusing this structure in the development of a new RISC-V processor, because the new processor inherits these resources without requiring changes.

The RISC-V architecture can be personalized for the IoT devices. It represents a flexible computing platform which enables novel development for IoT applications [19].

2.2. RISC-V VS SPARC V8

SPARC V8 [20] is suited to servers rather than embedded devices. It is under the GPL license. This architecture has limits: it lacks several important architectural features and has other properties that dramatically increase the implementation complexity, especially for high-performance implementations.

Furthermore, contrary to SPARC V8, the RISC-V architecture is modular; a good balance between cost and efficiency can be achieved for a dedicated application. It is a free and open ISA that avoids technical errors and is simple to implement in many microarchitectural styles. Another reason why RISC-V has attracted industry interest is that it was designed to be both stable for long-term viability and adaptable to accommodate a lot of applications. A comparison of open ISAs is shown in Table 1.

2.3. Instruction Pipeline

ReonV and LEON3 both have a 7-stage pipeline. For convenience, ReonV uses the same stages as LEON3 (IF, ID, RA, EX, ME, EXP, WB) [17]. In a 7-stage pipeline processor, the integer unit (IU) uses 7 cycles to complete an instruction:
(i) IF (instruction fetch): the CPU reads the instruction from the memory address whose value is present in the program counter.
(ii) ID (instruction decode): the instruction is decoded to identify its type, and the register file is accessed to obtain the values of the registers used in the instruction.
(iii) RA (register access): this stage reads the operands from the registers.
(iv) EX (execute): this stage takes the operands retrieved from the register file and executes the instruction using the ALU (Arithmetic-Logic Unit).
(v) ME (memory): the memory operands designated by the instruction are read from or written to memory. This stage denotes a transfer from a register to memory in the case of a STORE instruction (write access) and from memory to a register in the case of a LOAD instruction (read access).
(vi) EXP (exception): this stage propagates exceptions resulting from the execution of instructions.
(vii) WB (write back): the result is written back to the destination register named in the instruction, so that the following instructions benefit from a very short access time.

When designing integrated circuits for IoT applications, it is desirable to create optimized processors that efficiently run IoT algorithms and offer effective resistance against external attacks, with a tradeoff between high efficiency and hardware metrics. The RISC-V ISA enables designers to create custom instructions which are targeted at critical algorithms such as DSP (digital signal processing), artificial intelligence, or cryptography.
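The stage sequence above can be illustrated with a small timing sketch. It assumes an ideal pipeline with no stalls, hazards, or multi-cycle instructions, which is a simplification and not a model of the actual ReonV integer unit; the function names are hypothetical.

```python
# Minimal sketch of an ideal 7-stage pipeline (no stalls or hazards assumed).
STAGES = ["IF", "ID", "RA", "EX", "ME", "EXP", "WB"]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to retire n instructions when one enters per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def occupancy(n_instructions):
    """For each cycle, map stage name -> index of the instruction in it."""
    table = []
    for cycle in range(total_cycles(n_instructions)):
        row = {}
        for stage_index, name in enumerate(STAGES):
            instr = cycle - stage_index
            if 0 <= instr < n_instructions:
                row[name] = instr
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(occupancy(3)):
        print(cycle, row)
```

Under these assumptions a single instruction takes 7 cycles, while each additional instruction adds only one cycle once the pipeline is full.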

2.4. Proposed Lightweight Cryptographic Architectures

IoT is extremely insecure and presents an enormous threat to us. For this reason, in designing lightweight cipher blocks for IoT devices, it is important to recognize that we are building a block cipher that requires a good security level with high throughput speed, while keeping in mind a moderate cost of area and low power consumption.

In this work, the main objective is to propose an enhanced processor equipped with a full hardware security solution. New unrolled architectures of the PRESENT and PRINCE ciphers with a 64-bit data size and 128-bit key length are proposed. The two elaborated cipher designs are based on two different techniques: the substitution–permutation network (SPN) and the FX-construction. A 32-bit data width version of the proposed cipher designs is also elaborated to match the ReonV processor datapath. Table 2 presents the specifications of the lightweight block ciphers, including the block/key size (bit), datapath, and the number of rounds.

3. PRESENT Cipher Architecture

The PRESENT algorithm is a lightweight encryption cipher proposed in 2007 by Bogdanov et al. [21]. It is the most widely used of the lightweight ciphers, and it is part of ISO/IEC 29192 [22]. For a block length of 64 bits, two key lengths are supported: 80 and 128 bits [23]. PRESENT is a substitution–permutation network (SPN) which consists of 31 rounds for both key sizes. Unlike other SPN ciphers, PRESENT uses a bit-oriented permutation. Each round is successively composed of addRoundKey and a parallel 4-bit S-box operation, completed by a permutation of the bits, as shown in Figure 2:
(i) AddRoundKey: this operation combines the subkey of each round with the data, using a bitwise eXclusive OR (XOR).
(ii) SBoxLayer: a 4-bit to 4-bit S-box is used. The action of this box is described by the S-box table presented in [21].
(iii) P-Layer: the P-Layer function rearranges the 64 bits in the order described in [21]. Bit i of the state is moved to bit position P(i).
(iv) The key schedule: two different key sizes of 80 or 128 bits are allowed to encrypt data. The key update process varies depending on the key size.
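The round structure described above can be sketched in software as follows. This is a plain reference-style model written for illustration, not the unrolled hardware architecture proposed in this work; the S-box, P-layer, and key schedules follow the PRESENT specification in [21], while the function names are hypothetical.

```python
# Illustrative software model of PRESENT (not the proposed hardware design).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state):
    """Apply the 4-bit S-box to each of the 16 nibbles of the state."""
    out = 0
    for nibble in range(16):
        out |= SBOX[(state >> (4 * nibble)) & 0xF] << (4 * nibble)
    return out


def p_layer(state):
    """Move bit i to position P(i) = 16*i mod 63 (bit 63 stays in place)."""
    out = 0
    for i in range(64):
        target = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << target
    return out


def round_keys(key, key_bits):
    """Derive the 32 round keys for an 80-bit or 128-bit key."""
    if key_bits not in (80, 128):
        raise ValueError("key_bits must be 80 or 128")
    mask = (1 << key_bits) - 1
    top = key_bits - 4  # position of the most significant nibble
    keys = []
    for counter in range(1, 33):
        keys.append(key >> (key_bits - 64))  # leftmost 64 bits
        key = ((key << 61) | (key >> (key_bits - 61))) & mask  # rotate left 61
        key = (SBOX[key >> top] << top) | (key & ((1 << top) - 1))
        if key_bits == 128:
            nxt = top - 4  # second nibble also goes through the S-box
            key = (key & ~(0xF << nxt)) | (SBOX[(key >> nxt) & 0xF] << nxt)
            key ^= counter << 62
        else:
            key ^= counter << 15
    return keys


def present_encrypt(plaintext, key, key_bits=128):
    """31 rounds of addRoundKey, sBoxLayer, pLayer, then a final key addition."""
    keys = round_keys(key, key_bits)
    state = plaintext
    for k in keys[:31]:
        state = p_layer(sbox_layer(state ^ k))
    return state ^ keys[31]
```

As a sanity check, the 80-bit variant can be compared against the all-zero test vector published with the cipher (ciphertext 0x5579C1387B228445); the 128-bit variant used in this work differs only in the key schedule.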

Applications such as IoT require high security levels. As a consequence, PRESENT cipher with 128 bit key size security will be adequate to secure communications. Furthermore, a large amount of data will be treated and transmitted; a cipher design with a high throughput and good efficiency is required. A new design will be elaborated without excluding area cost, power, and energy consumption. Encryption and decryption using PRESENT cipher have approximately the same physical requirements.

3.1. PRINCE Cipher Architecture

The PRINCE block cipher is the outcome of a collaboration between the Technical University of Denmark, NXP Semiconductors, and the Ruhr University Bochum. The PRINCE cipher [24, 25], presented in 2012, is a lightweight encryption algorithm targeting low-latency, unrolled hardware implementations. The key size is 128 bits, with a block size of 64 bits. The number of iterations (11–13) makes it suitable for hardware implementation to reduce latency and achieve better performance. This makes it comparable to the AES cipher with 10 rounds using a 128-bit key. Furthermore, the PRINCE rounds are much less complex; this makes it an interesting lightweight block cipher with possible resource optimization and cost reduction [26].

The PRINCE architecture is described by Figure 3. Each round of the PRINCE core follows an SPN structure composed of a key addition, an S-box layer, a linear layer, and a round constant addition. The key is split into two 64-bit keys K0 and K1. The input (plaintext) is XORed with K0; then it is processed by a core function using K1. The output of the core function is XORed with K0′, a whitening key derived from K0, to produce the final output (ciphertext):
(i) RCi-add: in this step a 64-bit round constant is XORed with the 64-bit subkey. The values of RCi are presented by Borghoff and Canteaut [24].
(ii) S-Layer: one 4-bit S-box is used. The action of the S-box in hexadecimal notation is presented in [24].
(iii) The matrices (M/M′-layer): in this layer, the 64-bit state is multiplied by a 64 × 64 matrix, M or M′. The M′ layer (Mult-by-M′) is the only linear layer used in the middle round; it is an involution, which supports the α-reflection property (RCi ⊕ RC11−i = α). In the round functions, the M′ mapping is combined with a shift rows (SR) operation (M = SR ∘ M′) to guarantee full diffusion.
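The key whitening and the α-reflection relation described above can be checked with a short sketch. It models only the FX wrapper around a core function and the round-constant relation; the PRINCE core itself is not implemented here, and `toy_core` is a placeholder introduced purely for illustration. The round constants and the derivation K0′ = (K0 >>> 1) ⊕ (K0 >> 63) are taken from the PRINCE specification [24]; treating K0 as the upper half of the 128-bit integer is an assumption of this sketch.

```python
# Sketch of the FX construction and alpha-reflection check for PRINCE.
MASK64 = (1 << 64) - 1
ALPHA = 0xC0AC29B7C97C50DD

# Round constants RC0..RC11 as listed in the PRINCE specification [24].
RC = [
    0x0000000000000000, 0x13198A2E03707344, 0xA4093822299F31D0,
    0x082EFA98EC4E6C89, 0x452821E638D01377, 0xBE5466CF34E90C6C,
    0x7EF84F78FD955CB1, 0x85840851F1AC43AA, 0xC882D32F25323C54,
    0x64A51195E0E3610D, 0xD3B5A399CA0C2399, 0xC0AC29B7C97C50DD,
]


def whitening_keys(key128):
    """Split the 128-bit key into K0, K1 and derive the whitening key K0'."""
    k0 = (key128 >> 64) & MASK64  # assumption: K0 is the upper half
    k1 = key128 & MASK64
    rotated = ((k0 >> 1) | (k0 << 63)) & MASK64  # rotate right by one bit
    k0_prime = rotated ^ (k0 >> 63)
    return k0, k0_prime, k1


def fx_encrypt(plaintext, key128, core):
    """FX construction: whiten with K0, run the core with K1, whiten with K0'."""
    k0, k0_prime, k1 = whitening_keys(key128)
    return core(plaintext ^ k0, k1) ^ k0_prime


def alpha_reflection_holds():
    """Check RC_i XOR RC_(11-i) == alpha for every round index i."""
    return all(RC[i] ^ RC[11 - i] == ALPHA for i in range(12))


def toy_core(state, k1):
    """Placeholder core (NOT the PRINCE core), used only to exercise fx_encrypt."""
    return state ^ k1


if __name__ == "__main__":
    print(alpha_reflection_holds())
    print(hex(fx_encrypt(0x0123456789ABCDEF, 1 << 64, toy_core)))
```

The reflection relation is what allows the same unrolled datapath to serve for decryption with a modified key, which is one reason the cipher suits low-latency hardware.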

Compared to most of lightweight ciphers, PRINCE algorithm has a small number of rounds and the layers constituting a round have low logic depth. As a result, fully unrolled design can reach higher frequencies.

Besides that, the PRINCE cipher is based on the FX-construction, which increases the security of a core block cipher. To the best of our knowledge, no successful known attack on the full PRINCE cipher has been reported.

4. Extension of Lightweight Cryptographic Instructions from ReonV Processor

In this part, we will present the principle of integrating new instructions into the ReonV processor core, in order to support the lightweight cryptographic primitives previously developed.

4.1. Integration of Lightweight Cryptoprocessor in the Integer Unit

To add lightweight cryptographic algorithms as extensions for the ReonV processor [17], we are going to build a cryptoprocessor that will be integrated into the integer unit (IU) of the processor, specifically, in the “EXECUTE” stage of the pipeline. The integration of the proposed lightweight cryptoprocessor is shown in the diagram below (Figure 4).

4.2. Lightweight Cryptographic Instruction Configuration Register

In order to know which cryptographic instruction will be executed, a register called the “cryptographic instruction configuration register” must be added on the AMBA APB bus (Figure 5). This register contains information on the extension of the cryptographic instructions implemented in the ReonV. It selects the type of instruction that will be executed; the call of these instructions can be selected independently.

Instructions exist in different formats; in order to integrate the lightweight cryptographic instructions we have worked on format 3 (FMT3). This register contains the possible values for format 3 of the cryptographic instructions. The “FMT3-OP3-2C” format, defined by bits 15, 16, and 17, contains the cryptographic instruction of the PRINCE cipher. The “FMT3-OP3-2D” format, defined by bits 12, 13, and 14, contains the instruction of the PRESENT algorithm. Two versions of ReonV are possible: “ISEC-Reonv-Base-Version,” which is a basic version, and a second, extended version, “ISEC-Reonv-EXT-Version” (see Figure 6).

The cryptographic instructions of the lightweight algorithms PRESENT and PRINCE are grouped in the crypto-config register according to the family: the PRINCE instruction is added in bits 15, 16, and 17, and the instruction of the PRESENT cipher in bits 12, 13, and 14, as shown in Table 3.

4.3. Instruction Decoding Register

To integrate our two proposed ciphers as lightweight cryptographic instructions, we worked on the instruction decoding register. This register allows instruction decoding. The two bits 30 and 31 are used to select the format of the instruction. To execute an instruction which belongs to a specified format, it is necessary to load bits [24:19] of the decode register as described in Figure 7.

For an instruction belonging to “FMT3_OP3_2C,” the decoding register must be filled with the word 0x81600000. On the other hand, for an instruction belonging to the “FMT3_OP3_2D” format, the instruction decoding register must be loaded with the word 0x81680000.
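The two decode words can be checked by extracting the fields named above. The sketch below only reads bits [31:30] and bits [24:19] of a 32-bit word; the helper names are hypothetical, and the mapping of 0x2C to PRINCE and 0x2D to PRESENT follows the configuration register description in Section 4.2.

```python
# Sketch: extract the format and op3 fields from a 32-bit decode word.
FMT3_OP3_2C_WORD = 0x81600000
FMT3_OP3_2D_WORD = 0x81680000

# Mapping taken from the text: 0x2C -> PRINCE, 0x2D -> PRESENT.
CRYPTO_OP3 = {0x2C: "PRINCE", 0x2D: "PRESENT"}


def decode_fields(word):
    """Return the format selector (bits 31:30) and op3 (bits 24:19)."""
    return {"fmt": (word >> 30) & 0x3, "op3": (word >> 19) & 0x3F}


def classify(word):
    """Name the cipher selected by a decode word, or None if not a crypto op.

    Assumption: the format selector value 2 (binary 10), as found in both
    decode words above, identifies format 3.
    """
    fields = decode_fields(word)
    if fields["fmt"] == 2:
        return CRYPTO_OP3.get(fields["op3"])
    return None


if __name__ == "__main__":
    for w in (FMT3_OP3_2C_WORD, FMT3_OP3_2D_WORD):
        print(hex(w), decode_fields(w), classify(w))
```

Both words carry the same value in bits [31:30] and differ only in the lowest bit of the op3 field, which is consistent with the opcodes 0x2C and 0x2D.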

4.4. Proposed Lightweight Cryptographic Instructions

This section describes the overloaded lightweight cryptographic instructions (PRESENT and PRINCE) defined for the RISC-V architecture. A unique format “f” is proposed for the elaborated instructions, as described below.

These instructions have two operands, “Op1” and “Op2.” These two registers, predefined by the core of the ReonV processor, are 32 bits in size. The values of the calculation parameters are entered in 32-bit blocks. Thus, to enter operand “Op1” of 128 bits (key), 4 blocks of size 32 bits are needed and 2 blocks for 64-bit operand “Op2” (plaintext). The result of the instruction is stored in the 32-bit destination register “rd” (ciphertext). The “Crypt_en” selects to enable or disable the cipher module (see Figure 8).
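The block-wise loading described above can be sketched as a split of the key and plaintext into 32-bit words. The most-significant-word-first ordering and the helper names are assumptions made for illustration; the source does not state the order in which the blocks are entered.

```python
# Sketch: split operands into 32-bit words for a 32-bit datapath.
def to_words(value, total_bits, word_bits=32):
    """Split an integer into total_bits/word_bits words, most significant first."""
    if total_bits % word_bits != 0:
        raise ValueError("total_bits must be a multiple of word_bits")
    count = total_bits // word_bits
    mask = (1 << word_bits) - 1
    return [(value >> (word_bits * (count - 1 - i))) & mask for i in range(count)]


def from_words(words, word_bits=32):
    """Reassemble an integer from words given most significant first."""
    value = 0
    for word in words:
        value = (value << word_bits) | word
    return value


if __name__ == "__main__":
    key = 0x00112233445566778899AABBCCDDEEFF  # 128-bit key -> 4 words
    plaintext = 0x0123456789ABCDEF            # 64-bit block -> 2 words
    print([hex(w) for w in to_words(key, 128)])
    print([hex(w) for w in to_words(plaintext, 64)])
```

The 64-bit ciphertext is likewise returned through the 32-bit destination register in two words, which `from_words` reassembles.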

The proposed instructions are implemented in the FMT3 format with opcodes 0x2C and 0x2D; they need 7 clock cycles for a 64-bit block cipher.

5. Experimental Results and Security Analysis

In this section, we give an overview of the performance as well as the security analysis of our proposed lightweight cryptographic cipher designs. The hardware implementation of the enhanced ReonV processor is evaluated in this section too. All hardware features were derived directly or partly from the results obtained by the full design implementation of our VHDL code using Mentor Graphics ModelSim 6.6d and Xilinx Vivado 2021.2 [27]. After place and route, the elaborated hardware implementations have been tested. All the results have been generated for different Xilinx FPGA platforms.

5.1. Hardware Implementation of Our Proposed Lightweight Cipher Architectures on FPGA Platforms

The main objective of the proposed lightweight cryptographic cipher designs is to achieve an economical hardware implementation with optimized performance, as well as a high security level. The comparison of all the metrics of our proposed lightweight cipher architectures is shown in Table 4. Contrary to other publications, the plaintext and key loading phases are taken into account in this work.

The comparison of our proposed cipher designs with other published works uses the following reference metrics: the occupied standard reconfigurable logic (equivalent slices) of the FPGAs; the running speed, analyzed in terms of clock cycle count, running frequency, and achieved throughput; the throughput-to-area ratio (efficiency); and the power and energy per bit consumption. Some metrics are deduced from results published in the literature.

A fair comparison with the state of the art is very difficult to achieve due to the different FPGA technologies and implementation strategies being used. For this reason, three Xilinx FPGA platforms were chosen as target devices in this work due to their similarities with those used in the literature: Zynq-7000, Kintex-7, and Spartan-6 [27]. Furthermore, almost all related state-of-the-art articles suggest only a 64-bit data width; in this paper both 32-bit and 64-bit data width designs are elaborated. The total input data amounts to 64 bits for the plaintext and 128 bits for the secret key; taking into consideration the 32-bit internal datapath of the host processor, input/output buffers are used to load the total data size. In this way, the input as well as output data blocks are divided into 32-bit words. Therefore, the number of cycles increases by 4 cycles for the input block and 2 cycles for the output block. This has a direct impact on the latency, throughput, area occupation, and energy consumed by the design.
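The derived metrics named above can be computed from the raw figures with a few lines of code. The formulas used here (throughput = block size × frequency / cycles, efficiency = throughput / slices, energy per bit = power / throughput) are the usual definitions and are assumed rather than quoted from this paper; the example values are those reported later in this section for the design in [30].

```python
# Sketch of the derived metrics, using commonly assumed definitions.
def throughput_bps(block_bits, freq_hz, cycles):
    """Bits processed per second for one block every `cycles` clock cycles."""
    return block_bits * freq_hz / cycles


def efficiency_bps_per_slice(throughput, slices):
    """Throughput-to-area ratio in bits per second per slice."""
    return throughput / slices


def energy_per_bit_j(power_w, throughput):
    """Energy spent per processed bit, in joules."""
    return power_w / throughput


if __name__ == "__main__":
    # Figures reported for [30]: 64-bit block, 13.56 MHz, 396 cycles, 474 slices.
    tp = throughput_bps(64, 13.56e6, 396)
    print(f"throughput: {tp / 1e6:.2f} Mbps")
    print(f"efficiency: {efficiency_bps_per_slice(tp, 474) / 1e3:.2f} kbps/slice")
```

With these inputs the throughput comes out at about 2.19 Mbps, which agrees with the value quoted for [30].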

To be fair, our proposed 64-bit data width will be compared with identical data sizes of the state of the art.

Various hardware implementations of lightweight cryptographic block ciphers are reported in literature which could be compared to this work as presented in Table 4. Authors in [28, 29] described two hardware implementations of PRESENT cipher. Dalmasso et al. [28] describe a hardware design of PRESENT-128 on Xilinx Kintex-7 FPGA. A normalized frequency of 100 MHz is adopted. They claim that their obtained results show the expected benefits in terms of throughput and area, which allows selecting the best lightweight crypto-ciphers depending on the target device or application.

Compared to our proposed design, the PRESENT cipher implementation in [28] is iterative, and the 31 rounds are performed serially to provide the ciphertext. They implemented a PRESENT cipher architecture with an 80-bit key size. In our case, fully unrolled PRESENT cipher architectures with a 128-bit key size are elaborated; the 31 rounds are computed combinationally in a single cycle. Our proposed PRESENT cipher design achieves more than 12x the performance reported in [28] for a 64-bit data width on the Kintex-7 board. This gain cannot be obtained without a cost in consumed resources. The area increases to about 6x the total number of slices on Kintex-7; this is due to the 128-bit key length used. However, we notice a better efficiency in terms of throughput-to-area ratio (throughput per slice) on the Kintex-7 board.

Lara-Nino et al. [29] present an FPGA-based design for PRESENT lightweight block cipher and its implementation results. Although their proposed design allows optimization in area, the throughput is 330x slower than our noticed performance and the throughput on area ratio is 46x worse than our presented solution on Spartan-6 FPGA board.

Neither power consumption nor energy per bit is reported in [28, 29], so no comparison on these metrics is possible.

In another work [30], Lara-Nino et al. described a PRESENT cipher design implemented on a Spartan-6 FPGA board. We notice an area occupation of 474 slices, which is better by 384 total slices than our proposed design. However, the presented results show a frequency of 13.56 MHz, a latency of 396 cycles with a very low throughput of about 2.19 Mbps, a high power consumption of 23.45 W, and a high energy per bit dissipation. We conclude that our proposed PRESENT cipher design is by far better than [30] in terms of throughput, throughput-to-area ratio, dynamic power, and energy per bit consumption.

In [31], Abdullah and Obeid introduce a hardware implementation of the PRINCE block cipher on a Kintex-7 FPGA platform, determined by the quantum cryptography protocol BB84. The reported data indicate a throughput of 3.931 Gbps. Our proposed PRINCE cipher block is almost 2x better in terms of throughput than the result reported in [31] when implemented on the Kintex-7 platform, with a throughput of approximately 5 Gbps and a better efficiency of 9.375 Mbps per slice. The area occupation is slightly better than in [31]; however, higher power and energy per bit consumption are observed.

In another work [32], the obtained results show a throughput of 4.18 Gbps and a power consumption of 2.875 W when implemented on a Virtex-6 FPGA board. In [33], a PRINCE IP core architecture has been elaborated to encrypt and decrypt the data in only one clock cycle with low latency and high speed. The results obtained with a Virtex-403 FPGA board indicate a frequency of 31.76 MHz, a throughput of 2.032 Gbps, and a power consumption of 0.165 W. A direct comparison with the results presented in [32, 33] cannot be made because the FPGA platforms differ.

Comparing the results of our two proposed unrolled cipher designs, PRINCE cipher presents better throughput speed, less area occupation, good throughput per slice ratio, and reduced energy per bit consumption compared with PRESENT cipher on all FPGA boards for a 64-bit data width. This can be explained by the reduced number of rounds for PRINCE cipher (12 rounds) compared to PRESENT cipher (31 rounds). The layers constituting a round of PRINCE have low logic depth and the operation’s complexity is much lower. The logic depth of a path represents the number of combinatorial gates between input and output; it will be linked directly to the latency of the circuit. As a result, complex operations will be associated with a large number of combinatorial gates and a reduced throughput speed.

We can also notice that the results of our cipher designs depend on the technology of the FPGA platform used. Indeed, the Kintex-7 FPGA is built on a 28 nm process technology, whereas the Spartan-6 FPGA is built on a 45 nm process technology.

Security analysis of our proposed lightweight cipher designs is elaborated in the next section.

5.2. Security Analysis of Our Proposed Lightweight Ciphers Architecture

To test the security of our proposed lightweight cryptographic architectures against the most common attacks, statistical analysis as well as key sensitivity analysis is performed.

5.3. Statistical Analysis

This section discusses the statistical analysis of the experimental results, which covers histogram, entropy, and correlation analysis [34].

5.4. Histograms of Encrypted Images

The histogram shows the distribution of pixel intensities in an image, that is, how the gray levels of its pixels are distributed. From the histogram of a cipher-image, an attacker can try to deduce the plain pixels by carrying out a frequency analysis (statistical attack). To prevent such an attack, the histogram of the cipher-image should be statistically uniform. A relatively uniform distribution of the cipher-image is a sign of good image encryption quality [34]. Figures 9–12 give the results obtained for Lena, Peppers, Baboon, and Barbara of size 256 × 256.

As these figures show, the histograms of the ciphered images are very close to the uniform distribution and are completely different from those of the plain images, which indicates good diffusion properties.
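As an illustration of the histogram test, the following Python sketch counts the gray levels of an 8-bit image and measures how far the histogram is from a flat one. The chi-square distance is our own choice for quantifying "close to uniform" and is not taken from the experiments; both images are synthetic stand-ins (hypothetical data), not the test images used in this work.

```python
import random


def histogram(image):
    """Count how many pixels take each of the 256 gray levels."""
    counts = [0] * 256
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def chi_square_uniform(counts):
    """Chi-square distance between a histogram and a perfectly flat one."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(0)
# Hypothetical stand-in for a cipher-image: independent uniform bytes.
cipher_like = [[rng.randrange(256) for _ in range(256)] for _ in range(256)]
# Hypothetical stand-in for a plain image: a gradient using only 64 gray levels.
plain_like = [[96 + x // 4 for x in range(256)] for _ in range(256)]

chi_cipher = chi_square_uniform(histogram(cipher_like))
chi_plain = chi_square_uniform(histogram(plain_like))
print(f"cipher-like: {chi_cipher:.1f}, plain-like: {chi_plain:.1f}")
```

A small distance for the cipher-like image and a very large one for the plain-like image reflect the behaviour described above: a flat histogram gives a frequency analysis nothing to exploit.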

5.5. Entropy Analysis

Entropy is a statistical measure of the uncertainty or randomness associated with information. It can be used to characterize the image texture. Thus, entropy can be considered an assessment criterion for the quality of a cipher algorithm [35]: the higher the entropy, the better the encryption algorithm. The formula of entropy as given by Shannon is as follows:

$$H(X) = -\sum_{i=0}^{255} P(x_i)\,\log_2 P(x_i)$$

where H(X) is the entropy of the encrypted image, and P(xi) is the probability of each gray level appearance (xi = 0, 1, …, 255). In the case of equal probability levels, the entropy is maximum. The entropy should be ideally equal to 8 in the case of random images.

As presented in Table 5, the entropy of all ciphered images is close to the maximum.

The entropy of each ciphered image is greater than 7.90 and therefore very close to the ideal value of 8. We conclude that the ciphered images have good local randomness; hence, our proposed lightweight cryptographic architectures can be considered resistant to entropy attacks.
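The Shannon entropy of an 8-bit grayscale image can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce Table 5; the function name and the three test images are hypothetical.

```python
import math
import random


def shannon_entropy(image):
    """Shannon entropy, in bits per pixel, of an 8-bit grayscale image."""
    counts = [0] * 256
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    total = sum(counts)
    # Gray levels that never occur contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


# Every gray level exactly once: the equal-probability case.
flat = [[16 * r + c for c in range(16)] for r in range(16)]
# A single gray level everywhere: no uncertainty at all.
constant = [[128] * 16 for _ in range(16)]
# Hypothetical stand-in for a 256 x 256 cipher-image: uniform random bytes.
rng = random.Random(1)
cipher_like = [[rng.randrange(256) for _ in range(256)] for _ in range(256)]

for name, img in (("flat", flat), ("constant", constant), ("cipher-like", cipher_like)):
    print(name, round(shannon_entropy(img), 4))
```

The equal-probability image reaches the maximum of 8 bits, the constant image gives 0, and a finite random image falls slightly below 8 because its sampled histogram is never perfectly flat.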

6. Correlation Analysis

Correlation analysis is a widely used technique for assessing the strength of the relationship between two adjacent pixels in an image [36]. For a ciphered image, the lower the correlation between adjacent pixels, the better the encryption process hides the details of the original image.

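The correlation coefficient is computed using equations (3)–(6), where x and y are the grayscale values of two adjacent pixels of the image in the horizontal, vertical, or diagonal direction:

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)}\,\sqrt{\operatorname{var}(y)}}$$

where

$$\operatorname{cov}(x, y) = \frac{1}{N}\sum_{k=1}^{N}\bigl(x_k - E(x)\bigr)\bigl(y_k - E(y)\bigr)$$

$$E(x) = \frac{1}{N}\sum_{k=1}^{N} x_k$$

$$\operatorname{var}(x) = \frac{1}{N}\sum_{k=1}^{N}\bigl(x_k - E(x)\bigr)^2$$

and N is the number of adjacent pixel pairs considered.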

For a plain image, two adjacent pixels have a strong linear relationship and the correlation coefficient is close to 1. For a good ciphered image, the correlation should be as low as possible (close to zero) between adjacent pixels.

Table 6 shows that the correlation coefficients between adjacent pixels of the ciphered images are very close to zero. There is no correlation between the plain and ciphered images. As a result, there is no similarity between plain and encrypted images, which demonstrates the very good confusion achieved by the proposed lightweight ciphers. Both designs show a good ability to resist statistical attacks.
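The adjacent-pixel correlation test can be sketched as follows in Python. The sketch uses every adjacent pair in the image, whereas the experiments may instead sample a fixed number of pairs; that choice, the function names, and the two synthetic images are assumptions made for illustration.

```python
import math
import random


def adjacent_pairs(image, dr, dc):
    """Gray levels of each pixel and of its neighbour offset by (dr, dc)."""
    rows, cols = len(image), len(image[0])
    xs, ys = [], []
    for r in range(rows - dr):
        for c in range(cols - dc):
            xs.append(image[r][c])
            ys.append(image[r + dr][c + dc])
    return xs, ys


def correlation(xs, ys):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    var_x = sum((x - ex) ** 2 for x in xs) / n
    var_y = sum((y - ey) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


DIRECTIONS = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}

# Hypothetical stand-ins: a smooth gradient for a plain image and
# uniform random bytes for a cipher-image.
plain_like = [[(r + c) // 2 for c in range(256)] for r in range(256)]
rng = random.Random(2)
cipher_like = [[rng.randrange(256) for _ in range(256)] for _ in range(256)]

for name, (dr, dc) in DIRECTIONS.items():
    r_plain = correlation(*adjacent_pairs(plain_like, dr, dc))
    r_cipher = correlation(*adjacent_pairs(cipher_like, dr, dc))
    print(f"{name}: plain-like {r_plain:.4f}, cipher-like {r_cipher:.4f}")
```

The smooth image gives coefficients close to 1 in all three directions, while the random image gives coefficients close to zero, which is the behaviour expected of a good cipher-image.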

6.1. Key Sensitivity Analysis

A secure cipher design should be very sensitive to the secret key. To measure the key sensitivity of our proposed lightweight cipher designs, we randomly choose two secret keys that differ in only one bit (the LSB) and use each of them to encrypt the plain image. The key sensitivity evaluation is based on two parameters: the Number of Pixels Change Rate (NPCR) and the Unified Average Changing Intensity (UACI). NPCR and UACI are calculated according to the following equations [38]:

$$\mathrm{NPCR} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N} D(i,j) \times 100\%$$

$$\mathrm{UACI} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{\lvert C_1(i,j) - C_2(i,j)\rvert}{255} \times 100\%$$

$$D(i,j) = \begin{cases} 0, & C_1(i,j) = C_2(i,j) \\ 1, & C_1(i,j) \neq C_2(i,j) \end{cases}$$

In the previous equations, i and j are the row and column indexes of the image, respectively. M and N are, respectively, the numbers of rows and columns of the ciphered images C1 and C2.

The values of NPCR and UACI indicate how robust the algorithm is against differential attacks. As Table 7 shows, the average values obtained for NPCR and UACI are close to their optimal values, which are 99.609% and 33.463%, respectively. Therefore, our proposed cipher designs are highly sensitive to small changes in the secret key and resistant to differential attacks.
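A minimal Python sketch of the NPCR and UACI computation is given below. The two cipher-images are replaced by independent uniformly random 8-bit images (hypothetical data), which is the idealized case behind the optimal values: for such images the expected NPCR is 255/256 and the expected UACI is 257/768, which agree with the quoted optima to within rounding.

```python
import random


def npcr_uaci(c1, c2):
    """Return (NPCR, UACI) in percent for two equally sized 8-bit images."""
    rows, cols = len(c1), len(c1[0])
    changed = 0      # number of positions where the two pixels differ
    intensity = 0    # sum of absolute pixel differences
    for i in range(rows):
        for j in range(cols):
            diff = abs(c1[i][j] - c2[i][j])
            if diff:
                changed += 1
            intensity += diff
    total = rows * cols
    return 100 * changed / total, 100 * intensity / (255 * total)


# Expected values for two independent uniformly random 8-bit images.
IDEAL_NPCR = 100 * 255 / 256
IDEAL_UACI = 100 * 257 / 768

rng = random.Random(3)
# Hypothetical stand-ins for the images encrypted under the two keys.
c1 = [[rng.randrange(256) for _ in range(256)] for _ in range(256)]
c2 = [[rng.randrange(256) for _ in range(256)] for _ in range(256)]

npcr, uaci = npcr_uaci(c1, c2)
print(f"NPCR = {npcr:.3f}% (ideal {IDEAL_NPCR:.3f}%)")
print(f"UACI = {uaci:.3f}% (ideal {IDEAL_UACI:.3f}%)")
```

A cipher whose output under two keys differing in one bit yields values near these optima behaves, for this test, like two unrelated random images.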

6.2. Comparison with Existing Cryptographic Solutions

In this section, we compare our proposed cipher designs with several existing cryptographic solutions, focusing on statistical analysis and key sensitivity. Table 8 presents a comparison of the entropy and of the correlation coefficients in the three directions (horizontal, vertical, and diagonal) for the proposed designs and other works, using the image Lena of size 256 × 256.

Our proposed lightweight cipher designs perform competitively with other existing cryptographic schemes. The ciphered images they produce exhibit a high degree of randomness and ambiguity.

To compare the key sensitivities of our two cipher designs, the image Lena is encrypted first with the secret key and then with a key obtained by changing a single bit of the original secret key. The NPCR and UACI of the encrypted images are listed in Table 9.

The test results indicate that our approach performs at least as well as the other proposed solutions. Therefore, the cipher designs we suggest can resist differential attacks effectively.

6.3. Hardware Implementation of Enhanced ReonV Processor on FPGA

We have synthesized the customized ReonV processor, extended with the lightweight cryptographic instructions, for the Spartan-6 FPGA board (Table 10).

When the two proposed instructions are integrated in the integer unit stage of the processor, the area occupation increases only slightly, by about 1% in terms of Slice LUTs plus FF-pairs. On the other hand, the resulting improvement in throughput is significant: 480% for the PRESENT instruction and 341% for the PRINCE instruction. Furthermore, the maximum clock rate reported by the synthesis tools is 133.074 MHz for the original ReonV processor.

For the enhanced processor versions, the maximum frequency reaches 132.769 MHz with the PRESENT instruction and 132.103 MHz with the PRINCE instruction. The difference is marginal, and the overall implementation can be further optimized.
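To put the frequency difference in perspective, the relative drop in maximum clock rate can be computed directly from the three frequencies quoted above. The helper name in this short Python sketch is hypothetical.

```python
BASE_MHZ = 133.074  # original ReonV processor
ENHANCED_MHZ = {"PRESENT": 132.769, "PRINCE": 132.103}


def relative_drop_percent(base: float, modified: float) -> float:
    """Loss in maximum clock frequency relative to the base design, in percent."""
    return 100 * (base - modified) / base


drops = {name: relative_drop_percent(BASE_MHZ, f) for name, f in ENHANCED_MHZ.items()}
for name, drop in drops.items():
    print(f"{name}: {drop:.2f}% lower maximum frequency")
```

Both drops stay below 1%, which is in line with the observation that the difference is marginal.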

7. Conclusion

To secure IoT devices, protection must extend beyond software-based security into hardware-based security. In this work, an efficient and secure support for cryptography on embedded processors is presented. We have focused on the promising approach of instruction set extensions. Two instruction set extensions for a 32-bit processor, one for the PRESENT and one for the PRINCE lightweight cipher, are described. The designs and implementations of our proposed 32-bit data width ciphers are detailed. Our proposed block ciphers achieve high throughput as well as good efficiency, with moderate power consumption and energy dissipation. The security level is analyzed to demonstrate the robustness of these designs. The results show that our proposed lightweight cryptographic modules are secure enough against possible attacks.

The proposed lightweight cryptographic ciphers are integrated directly into the core of the processor, which makes them available through additional custom instructions. With a hardware cost of about 4K Slice LUTs plus FF-pairs, corresponding to an area overhead of about 1%, a throughput of 1.2 Gbps is reached. The main advantage is that a single instruction suffices to execute a full lightweight cipher operation, and it can be used in software routines exactly like the processor’s base instructions while providing a high security level.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.