#### Abstract

This work presents the design, hardware implementation, and performance analysis of a novel asynchronous AES (advanced encryption standard) Key Expander and Round Function, which offer increased side-channel attack (SCA) resistance. These designs are based on a delay-insensitive (DI) logic paradigm known as null convention logic (NCL), which supports useful properties for resisting SCAs including dual-rail encoding, clock-free operation, and monotonic transitions. Potential benefits include reduced and more uniform switching activity and a reduced signal-to-noise ratio (SNR). A novel method to further augment NCL AES hardware with a random voltage scaling technique is also presented for additional security. Thereby, the proposed components leak significantly less side-channel information than conventional clocked approaches. To quantitatively verify such improvements, functional verification and WASSO (weighted average simultaneous switching output) analysis have been carried out on both the conventional synchronous approach and the proposed NCL-based approach using Mentor Graphics ModelSim and Xilinx simulation tools. Both designs have also been implemented in hardware on SASEBO-GII, a standard evaluation FPGA board for side-channel attacks, and the corresponding power waveforms for both designs have been collected. Along with the results of software simulations, we have analyzed the collected waveforms to validate the claims related to the benefits of the proposed cryptohardware design approach.

#### 1. Introduction

Advanced encryption standard (AES) is the most widely used symmetric-key algorithm standard in different security protocols [1]. Originally, the algorithm was called Rijndael; but after its selection as the candidate for AES due to its merits, it gained popularity. It is used by hundreds of millions of users worldwide to protect security in various applications. AES was conceived as reliable in providing security for data, until researchers proved that side-channel attacks (SCA) were successful in compromising its security. Since the discovery of various efficient SCAs such as power analysis and EM (electromagnetic) analysis, researchers have started exploring different approaches to design countermeasures.

Wave dynamic differential logic (WDDL) [2] and sense amplifier based logic (SABL) [3] are some of the previously proposed countermeasures of synchronous category. But both of these approaches suffer from timing related issues that could leak side-channel information. Wu et al. [4] proposed an asynchronous S-box design that proved to be power efficient and side-channel attack resistant. Sui et al. [5] proposed a design approach that combines S-box design with random dynamic voltage scaling (RDVS) to boost SCA resistance to a greater extent.

This paper proposes scalable asynchronous AES Key Expander and Round Function designs that incorporate the merits of null convention logic (NCL) and random voltage scaling. In this work, these two modules are then utilized to design an NCL-based subset of the AES cryptosystem. The reason for calling it a subset is that, in an actual AES, the two modules are utilized iteratively. For the cryptosystem subset discussed in this work, however, we utilize the two modules for only a single iteration for verification purposes.

This work makes multiple contributions to improving the SCA resistance of cryptohardware, as follows:

(1) the proposed approach contributes to a uniform and reduced switching activity in the cryptosystem and thereby curtails the leaked power and improves resistance against power analysis SCA;

(2) the anticipated improved switching profile also translates to uniform and reduced EM radiation side-channel information emanating from the cryptosystem and boosts the resistance of the cryptosystem against EM SCA [6];

(3) the proposed Key Expander and Round Function designs allow easy scaling for implementing the entire AES algorithm in any of the following variants: 128, 192, or 256 bits;

(4) they can also be easily scaled and implemented for different modes of AES like electronic codebook (ECB), cipher feedback (CFB), and cipher block chaining (CBC) modes;

(5) both proposed designs incorporate a power efficient NCL combinational substitution box design, which provides power benefits when compared to the conventional approach;

(6) the proposed design can also be effectively coupled with the STRDVS (spatial temporal random dynamic voltage scaling) technique to intentionally inject random noise for even higher SCA resistance.

The rest of the paper is arranged as follows. Section 2 gives a background of AES, NCL, and vulnerabilities of synchronous AES which are essential in understanding the proposed design techniques. Section 3 details the influence of switching activity on SCA. Section 4 describes the proposed NCL AES Key Expander. Section 5 describes the proposed NCL AES Round Function. The proposed STRDVS noise injection technique for NCL cryptohardware is discussed in Section 6. Section 7 discusses the results, which include the functional verification, WASSO analysis, hardware implementation, and power trace analysis for both conventional and proposed designs. This is finally followed by conclusion and future work.

#### 2. Preliminaries and Review

##### 2.1. Advanced Encryption Standard

The AES algorithm is a symmetric block cipher that processes data blocks of 128 bits using cipher keys of three different lengths: 128, 192, or 256 bits. Its operations are performed on the State. The State is a two-dimensional array of bytes which contains the Plaintext, consisting of four rows and $N_b$ columns, where $N_b$ is the block length divided by 32. Similarly, the Key Schedule is a two-dimensional array of bytes which contains the Key.

At the start of the cipher operation, input Plaintext is copied to the State and input Key is copied to the Key Schedule. After an initial Round Key addition, the State is transformed by a Round Function implemented $N_r$ times. This number depends on the key length: $N_r = 10$ for 128 bits, $N_r = 12$ for 192 bits, and $N_r = 14$ for a key length of 256 bits.
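The relation between key length, State width, and round count can be captured in a few lines. The following is a minimal sketch based on the AES specification [1]; the function name `aes_parameters` and the dictionary keys are illustrative choices, not identifiers from the proposed design.

```python
# Parameters of AES as a function of the key length (32-bit words throughout).
BLOCK_BITS = 128  # AES always uses a 128-bit block


def aes_parameters(key_bits: int) -> dict:
    """Return Nb (State columns), Nk (key columns) and Nr (number of rounds)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192, or 256 bits")
    nb = BLOCK_BITS // 32   # number of columns in the State
    nk = key_bits // 32     # number of columns in the cipher key
    nr = nk + 6             # 10, 12, or 14 rounds
    return {"Nb": nb, "Nk": nk, "Nr": nr}


for bits in (128, 192, 256):
    print(bits, aes_parameters(bits))
```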

Figure 1 shows the two main components of AES. Key Expander and Round Function have four basic byte-oriented transformations each, which are applied to the Key Schedule and the State, respectively.

##### 2.2. Vulnerability of Synchronous AES Hardware Design

Cryptographic algorithms including AES have been used in many applications which require high security. To satisfy these security requirements, various public/private-key algorithms have been proposed and hardware models are designed for encryption and decryption processes. However, without proper hardware implementation, these algorithms and models are still vulnerable to side-channel attacks [7–9]. Differential power analysis (DPA) is one good example of side-channel attack where a series of power traces is intentionally collected for a set of input Plaintexts (or ciphertexts) and statistically analyzed to reveal the private key or significantly narrow down the key search space [7, 8, 10, 11]. The statistical nature of DPA makes it harder to counteract, since extremely small deviations in power can be accumulated and amplified to locate power peaks and the secret key can still be attacked. Even more powerful CPA (correlation power analysis) attack has been also recently gaining attentions [12].

Just as the power consumption of CMOS devices is data-dependent, the electromagnetic radiation emanating from a cryptosystem is also data-dependent. This data-dependent radiation is again the origin of side-channel information leakages. The leaked side-channel information is analyzed by means of electromagnetic analysis (EMA), which measures electromagnetic fields near cryptographic device [6] and uses this data to compromise the security. But if we can curtail the leakage of side-channel information, we can thereby make it difficult for the attacker to have sufficient information to identify the segments in the power waveform and EM radiation. We can secure the cryptosystem more effectively against these power analysis and EMA SCA.

##### 2.3. Null Convention Logic (NCL)

NCL is a delay-insensitive (DI) logic design paradigm. The delay insensitivity of NCL circuits is achieved by dual-rail and quad-rail logic [13]. A dual-rail signal can effectively represent four states. Out of them, the three valid states are DATA0, DATA1, and NULL. The fourth state in which both rails are asserted is considered as an illegal state. The valid data states DATA0 and DATA1 correspond to Boolean logic 0, Boolean logic 1, respectively. The control signal NULL is used for asynchronous handshaking. The clock-free operation is implemented via the two delay-insensitive registers located on either side of the combinational circuit and the local handshaking signals.
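As an illustration of the dual-rail encoding described above, the sketch below models a dual-rail signal as a `(rail0, rail1)` tuple. The tuple layout and the helper names are assumptions made for this example only. It also shows that every encoded byte asserts exactly eight rails regardless of its value, which is the property that makes the rail-level Hamming weight independent of the data.

```python
# Dual-rail signals modelled as (rail0, rail1) tuples (layout assumed here).
NULL = (0, 0)      # spacer used for handshaking
DATA0 = (1, 0)     # Boolean logic 0
DATA1 = (0, 1)     # Boolean logic 1
ILLEGAL = (1, 1)   # both rails asserted: not a valid state


def encode_bit(bit):
    """Map a Boolean bit to its dual-rail DATA state."""
    return DATA1 if bit else DATA0


def decode_signal(signal):
    """Return 0 or 1 for DATA, None for NULL; reject the illegal state."""
    if signal == NULL:
        return None
    if signal == ILLEGAL:
        raise ValueError("both rails asserted: illegal dual-rail state")
    return 1 if signal == DATA1 else 0


def encode_byte(value):
    """Encode an 8-bit value as eight dual-rail signals, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(8)]


def asserted_rails(word):
    """Count how many rails are at logic 1 in a dual-rail word."""
    return sum(r0 + r1 for r0, r1 in word)


print(encode_byte(0xA5))
# Every byte value asserts the same number of rails:
print({asserted_rails(encode_byte(v)) for v in range(256)})  # -> {8}
```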

The main benefit of NCL is that more uniform power consumption signature can be achieved since the signals are implemented by two complementary wires. Furthermore, due to delay-insensitive nature, these DI circuits adhere to monotonic transitions between DATA and NULL; so, there is no glitching, unlike clocked Boolean circuits that produce substantial glitch power and information leakage resulting from glitching. DI systems better distribute switching over time and area, reducing the switching activity, peak power demand, and system noise, unlike clocked Boolean circuits where much of the circuitry switches simultaneously at the clock edge. Another important potential of NCL is it inherently allows intentional noise injection by randomizing timing of switching activities to further reduce the side-channel information leakage. The downside is it generally incurs area and wire overhead.

#### 3. Influence of Switching Activity on SCA

##### 3.1. Role of Switching Activity on Power Analysis SCA

The dynamic power consumption of CMOS gates is particularly relevant from a side-channel point of view since it determines a simple relationship between a device’s internal data and its externally observable power consumption. It can be written as

$$P = \alpha \cdot C \cdot V_{DD}^{2} \cdot f. \tag{1}$$

In (1), $P$ is the power consumed, $\alpha$ is the switching activity factor, $C$ is the switched capacitance, $V_{DD}$ is the supply voltage, and $f$ is the clock frequency. This data-dependent power consumption is the origin of side-channel information leakages. If we are able to reduce the switching activity factor $\alpha$ in (1), that would directly translate to decreased dynamic power consumption. Messerges et al. discussed the role of SNR in determining the success probability of a DPA attack in [14]. Consider

$$\mathrm{SNR} = \frac{\mathrm{Var}(P_{\mathrm{exp}})}{\mathrm{Var}(P_{\mathrm{noise}})}. \tag{2}$$

Equation (2) can be used to estimate SNR [15]. In this equation, $\mathrm{Var}(P_{\mathrm{exp}})$ is the variance of the exploitable component of power consumption and $\mathrm{Var}(P_{\mathrm{noise}})$ is the variance of the noise component. By reducing this exploitable power information $P_{\mathrm{exp}}$, we can lower the SNR. The lower the SNR, the lower the leakage; so, performing the power analysis attack becomes harder.
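The two relations above can be evaluated numerically. The sketch below assumes the standard CMOS dynamic power form (power equals activity factor times switched capacitance times supply voltage squared times frequency) and estimates SNR as a ratio of sample variances. All numeric values are hypothetical and chosen only to show the scaling behaviour; they are not measurements from this work.

```python
from statistics import pvariance


def dynamic_power(alpha, capacitance_f, vdd_v, freq_hz):
    """Dynamic power in watts: alpha * C * Vdd^2 * f."""
    return alpha * capacitance_f * vdd_v ** 2 * freq_hz


def estimate_snr(exploitable_samples, noise_samples):
    """SNR as variance of the exploitable component over variance of the noise."""
    return pvariance(exploitable_samples) / pvariance(noise_samples)


# Halving the switching activity factor halves the dynamic power.
p_full = dynamic_power(alpha=0.5, capacitance_f=1e-12, vdd_v=1.0, freq_hz=1e8)
p_half = dynamic_power(alpha=0.25, capacitance_f=1e-12, vdd_v=1.0, freq_hz=1e8)
print(p_full, p_half)

# Scaling the exploitable component by 0.5 scales its variance by 0.25,
# so the SNR drops by a factor of four for unchanged noise.
exploitable = [1.0, 3.0, 2.0, 4.0]   # hypothetical samples
noise = [0.9, 1.1, 1.0, 1.2]         # hypothetical samples
snr_before = estimate_snr(exploitable, noise)
snr_after = estimate_snr([0.5 * x for x in exploitable], noise)
print(snr_before / snr_after)
```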

##### 3.2. Role of Switching Activity on EM SCA

The switching activity also influences the EM radiation leaked from the cryptosystem. The voltage fluctuation caused by ground bounce can be expressed as [6]

$$V_{GB} = L_{\mathrm{eff}} \cdot N \cdot \frac{di}{dt}. \tag{3}$$

In this equation, $L_{\mathrm{eff}}$ is the effective parasitic inductance, $N$ is the number of simultaneous switching outputs, and $di/dt$ is the rate of change of the current. So, it is clear that if we are able to reduce the switching activity $N$, we can reduce the information leakage due to $V_{GB}$, as $V_{GB} \propto N$.
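The linear dependence of ground bounce on the number of simultaneously switching outputs can be checked with a short calculation. This is only a sketch: the function name and every numeric value are hypothetical, and the formula assumed is the product of effective inductance, number of switching outputs, and current slew rate.

```python
def ground_bounce(l_eff_henry, n_switching, di_dt_amp_per_s):
    """Ground-bounce voltage: L_eff * N * di/dt (all inputs in SI units)."""
    return l_eff_henry * n_switching * di_dt_amp_per_s


# Hypothetical values: 1 nH effective inductance, 1e7 A/s current slew rate.
for n in (4, 8, 16):
    print(n, ground_bounce(1e-9, n, 1e7))
```

Halving the number of simultaneously switching outputs halves the resulting voltage fluctuation under this model.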

#### 4. NCL AES Key Expander Design

The AES algorithm uses a Key Expander to calculate the Round Keys used in the AddRoundKey stage of the Round Function. The AES specification refers to this process as the KeyExpansion. The motivation for this unit is that generating multiple keys from an initial key and using a unique key for each round, instead of using the same key for all the rounds, greatly increases the diffusion of bits. For this research, we chose AES with a key size of 128 bits.

The control unit for the NCL AES Key Expander and Round Function is shown in Figure 2. In this control unit, the input data, which is in ordinary binary format, is read and converted into dual-rail inputs by a single-rail to dual-rail converter. The output acknowledgement signal coming out of the NCL Round Function and Key Expander acts like a clock signal for the other units in the controller. The converter and multiplexer (MUX) are controlled by this acknowledgement signal. When it is 1, the NCL Round Function and Key Expander are ready for a NULL wavefront; then, the MUX will send all 0’s to Plaintext and Input_Key to nullify the NCL Key Expander and Round Function. Otherwise, the MUX will select the dual-rail data that is output from the converter. The dual-rail “Input_Key” is fed as input to the NCL Key Expander and it generates the Round Keys necessary for each encryption round of AES.

The block diagram of the Key Expander architecture [16] is presented in Figure 3. The four words shown in the figure are the four columns of the Key Schedule. The columns of the Key Schedule which have their index as a multiple of four undergo the “RSX step” along with the XOR operation; all the remaining columns undergo only XOR operations to generate the Round Key. As depicted in the figure, the Key Expander consists of the following modules.
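The behaviour of the converter and MUX described above can be modelled in software. The following sketch is a behavioural illustration only, not the VHDL of the proposed controller; the names `single_to_dual_rail` and `controller_output`, and the `(rail0, rail1)` tuple layout, are assumptions made for this example.

```python
def single_to_dual_rail(bits):
    """Convert a list of Boolean bits into dual-rail (rail0, rail1) pairs."""
    return [(1 - b, b) for b in bits]


def controller_output(bits, ack):
    """Model of the MUX: drive NULL when ack is 1, dual-rail DATA otherwise.

    ack is the acknowledgement from the Round Function and Key Expander;
    as in the text, ack == 1 means the blocks are ready for a NULL wavefront.
    """
    if ack == 1:
        return [(0, 0)] * len(bits)   # all rails at 0: NULL wavefront
    return single_to_dual_rail(bits)


plaintext_bits = [1, 0, 1, 1]
print(controller_output(plaintext_bits, ack=0))  # DATA wavefront
print(controller_output(plaintext_bits, ack=1))  # NULL wavefront
```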

*RotateWord*. This operation accepts an array of 4 bytes and rotates them 1 position to the left. The RotWord function used by KeyExpansion is very similar to the ShiftRows routine used by the encryption algorithm except that it works on a single column of the Key Schedule, instead of the rows of the State array.

*SubWord*. The SubWord routine performs a byte-by-byte substitution on a given row of the Key Schedule table using the NCL S-box. The substitutions in KeyExpansion operate exactly like those in the SubBytes step of the Round Function. The input byte to be substituted is fed as input to the NCL combinational S-box, and this input then undergoes multiplicative inversion in $GF(2^8)$ and affine transformation during encryption. We employed the dual-rail combinational NCL S-box proposed in [4] for this step as this design already proved to be very power efficient and resistant to SCA. The architecture of the S-box and the block diagram of its internal multiplicative inversion module are presented in Figures 4 and 5.

*Round Constant Module*. This module uses an array Rcon, called the round constant table. In the synchronous implementation, these round constants are 4 bytes each to match with a column of the Key Schedule table. The AES KeyExpansion routine [1] requires 10 round constants, one for each round of the AES algorithm. In our implementation, we implement this as an array of round constants represented in dual-rail notation.

*XOR Module*. In this module, we perform the XOR operation between the columns of the Key Schedule with or without the round constant selected in previous step depending on the column which is being calculated. In order to realize this XOR function in NCL, we have to make use of NCL XOR function designed using the NCL threshold gates.
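The four modules above can be tied together in a single-rail software reference model of the AES-128 KeyExpansion. This is a sketch of the algorithm from the AES specification [1], written with ordinary Boolean bytes rather than dual-rail NCL signals; it is not the proposed hardware. The S-box is computed from the multiplicative inverse and the affine transformation rather than taken from a table, and all function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def sbox(byte):
    """S-box: multiplicative inversion followed by the affine transformation."""
    x = gf_inv(byte)
    rotl = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return x ^ rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 0x63


SBOX = [sbox(b) for b in range(256)]


def rot_word(word):
    """RotWord: rotate a 4-byte word one position to the left."""
    return word[1:] + word[:1]


def sub_word(word):
    """SubWord: apply the S-box to each byte of the word."""
    return [SBOX[b] for b in word]


def rcon(i):
    """Round constant word for round i (1-based): [x^(i-1), 0, 0, 0]."""
    value = 1
    for _ in range(i - 1):
        value = gf_mul(value, 2)
    return [value, 0, 0, 0]


def expand_key_128(key):
    """Expand a 16-byte key into 44 four-byte words (11 Round Keys)."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:  # column index is a multiple of four: Rotate, Sub, XOR Rcon
            temp = [a ^ b for a, b in zip(sub_word(rot_word(temp)), rcon(i // 4))]
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # example key from [1]
schedule = expand_key_128(key)
print(bytes(schedule[4]).hex())  # first word of the first Round Key
```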

Unlike Boolean logic, NCL has 27 fundamental threshold gates to realize arbitrary logic [13]. In order to achieve input-completeness and observability, it is important to choose appropriate threshold gates. For the design of the NCL XOR function with dual-rail inputs $X$ and $Y$ and output $Z$, the sum-of-product (SOP) expressions are $Z^0 = X^0Y^0 + X^1Y^1$ and $Z^1 = X^1Y^0 + X^0Y^1$. They can be realized by mapping them to THxor0 gates as shown in Figure 6. However, two transistors can be eliminated for each rail of $Z$ (when using static gates) by realizing this same functionality using TH24comp gates. This is done by adding the two *do not care* terms, representing the cases when both rails of either $X$ or $Y$ are simultaneously asserted.

The new equations are $Z^0 = X^0Y^0 + X^1Y^1 + X^0X^1 + Y^0Y^1$ and $Z^1 = X^1Y^0 + X^0Y^1 + X^0X^1 + Y^0Y^1$. The NCL XOR function realized using these equations and TH24comp gates is presented in Figure 7 and is used in our proposed design. This TH24comp based XOR offers a 10% reduction in the number of transistors required compared to the approach using THxor0 gates.
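The equivalence of the two XOR realizations on valid dual-rail inputs can be checked with a small model. The sketch below assumes inputs named X and Y with rails ordered as `(rail0, rail1)`, and it models only the set conditions of the threshold gates (THxor0 as AB + CD, TH24comp as AC + BC + AD + BD); the hysteresis of real NCL gates is not modelled.

```python
from itertools import product

NULL, DATA0, DATA1 = (0, 0), (1, 0), (0, 1)


def thxor0(a, b, c, d):
    """Set condition of THxor0: AB + CD."""
    return (a & b) | (c & d)


def th24comp(a, b, c, d):
    """Set condition of TH24comp: AC + BC + AD + BD = (A + B)(C + D)."""
    return (a | b) & (c | d)


def xor_thxor0(x, y):
    """Dual-rail XOR from the original SOP expressions."""
    (x0, x1), (y0, y1) = x, y
    return (thxor0(x0, y0, x1, y1), thxor0(x1, y0, x0, y1))


def xor_th24comp(x, y):
    """Dual-rail XOR with the two do-not-care terms added."""
    (x0, x1), (y0, y1) = x, y
    return (th24comp(x0, y1, x1, y0), th24comp(x0, y0, x1, y1))


# Both realizations agree on every valid DATA combination.
for x, y in product((DATA0, DATA1), repeat=2):
    print(x, y, xor_thxor0(x, y), xor_th24comp(x, y))

# With any input still NULL, neither output rail is asserted.
print(xor_th24comp(DATA1, NULL), xor_th24comp(NULL, NULL))
```

The two versions differ only when both rails of an input are asserted, which is the illegal state and does not occur in a correctly operating circuit.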

#### 5. NCL AES Round Function

The top-level architecture of the proposed NCL AES Round Function design is presented in Figure 8. Controller for this module is presented previously in Figure 2. This control unit takes care of converting the ordinary Plaintext and Input_Key into dual-rail notations. The dual-rail “Input_Key” is fed as input to the NCL Key Expander and it generates the Round Key, which along with the dual-rail Plaintext from the controller is fed to the AES Round Function.

The NCL AES Round Function consists of the following four steps which are performed sequentially.

*(1) NCL SubBytes*. In this transformation, each dual-rail byte of the State matrix is substituted independently by another one which is computed by the NCL S-box. The S-box is a key element in the AES architecture as it significantly influences the security, power consumption, and throughput of the AES hardware. We are using the dual-rail combinational NCL S-box proposed in [4] for this step as this design already proved to be very power efficient and resistant to SCA.

*(2) NCL ShiftRows*. The NCL ShiftRows transformation performs byte transposition of all dual-rail NCL signals by using circular shifting, where each row of the dual-rail State is rotated cyclically to the left using 0-, 1-, 2-, and 3-byte offsets for encryption.

*(3) NCL MixColumns*. In this transformation, each column of the dual-rail State matrix is multiplied by a circulant maximum distance separable matrix. This MixColumns function takes four dual-rail bytes as inputs and produces four dual-rail bytes as outputs, where each input byte affects all four output bytes. The multiplication of a State array element by 2 in the dual-rail domain is realized by a 1-bit left shift of the dual-rail signals followed by a conditional NCL XOR operation. The multiplication by 3 is implemented in a similar fashion but it involves an additional NCL XOR operation.

*(4) NCL AddRoundKey*. The AddRoundKey transformation performs a byte-level dual-rail XOR operation on the dual-rail output of MixColumns and the corresponding dual-rail Round Key.
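The ShiftRows, MixColumns, and AddRoundKey steps listed above can be sketched as a single-rail reference model. The code uses ordinary Boolean bytes rather than dual-rail signals and is meant only to illustrate the arithmetic (in particular the shift and conditional XOR used for multiplication by 2 and 3); the function names and the `state[row][column]` layout are choices made for this example. SubBytes is omitted here because it reduces to an S-box lookup per byte.

```python
def xtime(byte):
    """Multiply by 2 in GF(2^8): 1-bit left shift, then conditional XOR."""
    byte <<= 1
    return (byte ^ 0x11B) if byte & 0x100 else byte


def mul3(byte):
    """Multiply by 3: multiplication by 2 plus one additional XOR."""
    return xtime(byte) ^ byte


def shift_rows(state):
    """Rotate row r of state[row][column] cyclically left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_column(col):
    """Multiply one 4-byte column by the circulant matrix (2, 3, 1, 1)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def add_round_key(state, round_key):
    """Byte-level XOR of the State with the Round Key (same layout)."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
print(shift_rows([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]))
```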

#### 6. Spatial Temporal Random Dynamic Voltage Scaling (STRDVS) Augmentation of NCL AES for Higher SCA Resistance

Recently, Yang et al. [17] applied random dynamic voltage and frequency scaling (RDVFS) to synchronous cryptoprocessors to enhance resistance against side-channel attacks. By randomly changing the supply voltage, “noise” can be injected into the power trace, making the attack more difficult. The clock frequency changes with different supply voltages to avoid timing violation. However, since the circuits are synchronous, the change in clock frequency can be easily observed in the power trace and, using certain hypothesis, the voltage corresponding to the frequency can also be obtained. As such, the attack can still be successful. To alleviate the problem, [18] proposes to use random DVS (RDVS) only, without changing the clock frequency. However, the tight timing constraint gives little room to do the voltage scaling.

It is obvious that the security enhancement highly depends on how much “noise” can be injected; this in turn depends on how much room is available for the voltage scaling. We argue that RDVS is more suitable for QDI designs for two reasons. First, there will be no timing constraint as in the synchronous or bounded-delay counterparts, leaving more room for voltage scaling. Second, since there is no clock signal, fewer gates will switch simultaneously and thus the power supply noise is reduced. Accordingly, the noise margin is increased, providing even more room for voltage scaling.

Different from [17, 18], in addition to changing the supply voltage randomly over time (temporal randomness), we propose to supply different random voltages over different regions in the chip (spatial randomness). Since NCL is self-timed and event-driven, difference in latencies among the regions caused by STRDVS is* inherently tolerated* unlike the clocked counterpart. Such spatial and temporal RDVS (STRDVS) in NCL will maximize the noise injected and thus the resistance to side-channel attacks.

Spatial and temporal random dynamic voltage scaling (STRDVS) is especially suitable for delay-insensitive designs to provide additional resistance to side-channel attacks and to further reduce the power consumption as a byproduct [19]. The reason QDI circuits still have vulnerabilities is the imbalanced load capacitance between the two rails of a signal. Although the total number of switching events is independent of the data pattern, the switching activities of the two rails are different. For example, passing consecutive DATA1s makes Rail1 switch all the time, while passing consecutive DATA0s makes Rail0 switch all the time. Since the two rails most likely drive different loads, power is still imbalanced across data patterns and is still coupled with the data being processed. A number of techniques have been proposed in the literature to mitigate this problem.

##### 6.1. Leveraging TRNG for the Proposed STRDVS NCL Cryptohardware

A TRNG (true random number generator) is widely used for designing hardware systems for secure applications such as secure wireless communications, electronic financial transactions, smart cards, mobile computing, and secure RFID. Unlike a PRNG (pseudorandom number generator), which always gives the same number sequence for a particular seed state and is therefore less secure, TRNGs are based on microscopic phenomena that generate a low-level, statistically random “noise” signal with high information entropy [20], such as thermal noise, oscillator drift, the photoelectric effect, or other quantum phenomena [21]. There exist various TRNG designs for hardware implementation purposes including ones that are reported in [22–30]. One good example is the TRNG1 IP (intellectual property) core by IP Cores, Inc [31]. TRNG1 features a high entropy source (i.e., either 128 or 256 bits) and satisfies Federal Information Processing Standard (FIPS) Publication 140-2 Annex C (i.e., “approved” random number generator) from the US National Institute of Standards and Technology (NIST) [32] and passes the requirements of the NIST SP 800-22 test suite [33].

The proposed NCL AES components leverage a TRNG for the proposed STRDVS technique for even higher resistance over SCA by intentionally injecting noise. Since TRNG already exists in most secure HW systems, it is not an overhead to the proposed design.

##### 6.2. Spatial/Temporal Randomness & Granularity of STRDVS

The entire circuit is divided into several regions, and different randomly generated voltage control signals from TRNG are supplied to dynamically scale the voltage level in each region. Since NCL is asynchronous and event-driven, difference in latencies among the regions caused by STRDVS is* inherently tolerated* unlike the clocked counterpart. For example, suppose the entire circuit is divided into 56 voltage regions with eight dynamically scaling voltage levels. Then, each region will need a 3-bit randomly-generated voltage control signal. Accordingly, the 8-bit random number generator can yield different random control signals for 56 regions. As such, the temporal randomness can be achieved.
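The example above (56 regions, eight voltage levels, a 3-bit control signal per region) can be sketched in software as follows. Python's `secrets` module is used here purely as a stand-in for the hardware TRNG, and each region simply draws an independent 3-bit value at every time step; how the actual design derives the per-region signals from the random number generator is not reproduced here. All names are hypothetical.

```python
import secrets

NUM_REGIONS = 56                                   # spatial granularity (example value)
NUM_LEVELS = 8                                     # number of voltage levels (example value)
BITS_PER_REGION = (NUM_LEVELS - 1).bit_length()    # 3 control bits per region


def draw_voltage_levels(num_regions=NUM_REGIONS, bits=BITS_PER_REGION):
    """Draw one random voltage-level index per region (spatial randomness)."""
    return [secrets.randbits(bits) for _ in range(num_regions)]


def voltage_schedule(num_steps):
    """Redraw the per-region levels at every time step (temporal randomness)."""
    return [draw_voltage_levels() for _ in range(num_steps)]


schedule = voltage_schedule(num_steps=4)
for step, levels in enumerate(schedule):
    print(step, levels[:8], "...")
```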

Figure 9 shows a gated signal from TRNG controlling the supply voltage of a STRDVS region as an example. In order for STRDVS to enhance side-channel attack resistance, the power difference due to the change in supply voltage (i.e., for the same input bit) must be comparable with the power difference due to the change in input bit (i.e., for the same supply voltage). As such, the correlation between the input data and the power consumption is substantially reduced. Thereby, the difference in power traces can hardly be used to identify input switching. However, scaling down the voltage has a direct impact on the latency of the processor. Accordingly, the lowest possible voltage that can keep the latency of our NCL processor within the tolerable bound should be determined at design time.

With that determined, we still need to determine two critical parameters: the number of voltage levels (i.e., *temporal granularity*) and the number of voltage domains (i.e., regions with different supply voltages, *spatial granularity*). Larger values of both parameters can result in increased security as more noise is injected into the power trace; on the other hand, they may also increase the area and design complexity. As future work, we will investigate the tradeoffs between area, power, latency, and security and find the optimal setting of the parameters. In addition, a natural property of our STRDVS method is that the level of security is related to the encryption/decryption data rate: a high data rate gives little room to perform voltage scaling and thus little room to improve the security. It will also be interesting to see a tradeoff curve between the encryption/decryption data rate and the level of security.

#### 7. Experimental Verification of the Proposed Design

##### 7.1. Functional Verification of the Proposed Design

The conventional synchronous implementation and the proposed NCL AES Key Expander and NCL AES Round Function have been implemented in VHDL for a comparative study. The functional verification simulations of these designs have been performed with Mentor Graphics ModelSim. The proposed designs have been functionally verified completely using a large set of test vectors from [1]. A sample set of test vectors is presented in Figure 10 and the corresponding functional verification results are presented in Figures 11, 12, and 13.

##### 7.2. Weighted Average Simultaneous Switching Output (WASSO) Analysis

The WASSO tool is a utility of the Xilinx PlanAhead suite that validates the signal integrity of the device based on the I/O pin and bank assignments made in the design.

This analysis gives a measure of the amount of simultaneous switching occurring in the design. So, we used this analysis to determine the variation in switching activity across both AES Round Function designs. The results obtained were plotted and are presented in Figure 14. The implementation platform chosen for carrying out WASSO analysis is the Xilinx Virtex-5 FPGA. As switching activity directly depends on the number of simultaneously switching outputs, switching activity can be reduced if the number of simultaneously switching outputs is reduced.

**Figure 14: WASSO analysis results. (a) Individual banks. (b) Neighbors.**

From Figures 14(a) and 14(b), it can be observed that the switching activity in the proposed design is lessened to a considerable extent and is also more uniform as compared to its synchronous counterpart. This reduction decreases the amount of unintentionally leaked information and the uniformity makes it more difficult to exploit the remaining leaked information to carry out SCAs.

##### 7.3. Effects of Switching Activity on Signal-to-Noise Ratio

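According to (2), it is clear that SNR is directly proportional to the variance of the exploitable power component, $\mathrm{Var}(P_{\mathrm{exp}})$. The $\mathrm{Var}(P_{\mathrm{exp}})$ is a combination of two quantities: $\mathrm{Var}(P_{\mathrm{op}})$, the variance of the operation-dependent component, and $\mathrm{Var}(P_{\mathrm{data}})$, the variance of the data-dependent component. But $\mathrm{Var}(P_{\mathrm{op}})$ is zero as we are considering a DPA attack, in which we perform the same operation again and again but with different input data. So, $\mathrm{Var}(P_{\mathrm{exp}})$ becomes equal to $\mathrm{Var}(P_{\mathrm{data}})$. The $P_{\mathrm{data}}$ is data-dependent and is a function of switching activity. So, the reduction of switching activity observed from WASSO simulations will translate into a reduction of $P_{\mathrm{data}}$ at all the points on the power trace. This overall reduction of $P_{\mathrm{data}}$ will translate into a reduction of $\mathrm{Var}(P_{\mathrm{data}})$ and consequently a reduction of SNR.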

Additionally, as discussed previously, the power consumption of a cryptosystem is heavily dependent on the Hamming weight of the data it processes. Due to this, the equal Hamming weights of all inputs in our proposed design will enable our NCL design to maintain a uniform power consumption and thereby a uniform SNR on the power trace. Thus, the proposed design enables the cryptosystem to have a reduced and uniform SNR, which is a key element for enhancing security.

By using the switching activity results, we performed parametric simulations and plotted the SNR of the NCL design in comparison to the synchronous approach. These approximate results are presented in Figure 15(a). Using this SNR data, Figure 15(b) shows how variation in SNR influences the number of traces that an attacker must collect to perform a successful DPA attack. As the SNR decreases, the number of traces required grows, so the advantage of the NCL-based approach increases. This is the advantage of employing NCL for cryptosystem design.
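The qualitative link between SNR and the number of traces an attacker needs can be illustrated with a rule of thumb that is commonly quoted in the power analysis literature. This sketch is not the parametric simulation behind Figure 15. It assumes that the attacker's power model correlates perfectly with the exploitable component (`rho_exp = 1.0`) and uses a confidence level of 0.9999, for which the standard normal quantile is about 3.719. The function names and SNR values are hypothetical.

```python
import math

Z_CONFIDENCE = 3.719  # standard normal quantile for confidence 0.9999 (assumed)


def correlation_from_snr(snr, rho_exp=1.0):
    """Correlation seen by the attacker: rho_exp / sqrt(1 + 1/SNR)."""
    return rho_exp / math.sqrt(1.0 + 1.0 / snr)


def traces_needed(rho, z=Z_CONFIDENCE):
    """Rule-of-thumb trace count: 3 + 8 * (z / ln((1 + rho) / (1 - rho)))^2."""
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2


for snr in (1.0, 0.1, 0.01, 0.001):   # hypothetical SNR values
    rho = correlation_from_snr(snr)
    print(f"SNR={snr:<6} rho={rho:.4f} traces~{traces_needed(rho):.0f}")
```

Under these assumptions the required number of traces grows roughly in inverse proportion to the SNR once the SNR is small, which matches the trend described above.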

**Figure 15: SNR comparison of the NCL and synchronous designs. (a) Relative decrease in SNR. (b) Relative increase in difficulty for performing DPA.**

##### 7.4. Power Benefits

In AES implementations, the SubBytes transformation which entirely depends on the S-box is the most crucial factor deciding the energy performance of the AES itself. More than 50% of entire power is dependent on this step [34–36]. Due to the use of novel NCL S-box design, we achieve a 22% reduction in power consumption [4] at this SubBytes step. So, this reduction will cause significant improvement in the energy efficiency of the proposed NCL based design approach.

##### 7.5. Hardware Implementation and Power Trace Analysis

In the previous sections, the performance of our proposed design was evaluated using software simulations. However, to get a more accurate performance analysis, measurements on a hardware implementation are necessary. In this section, we discuss in detail the procedure used for the hardware implementation experiment of the proposed design and the synchronous AES. Additionally, we present the power trace data obtained from the power measurements on the hardware implementations and discuss the variations between the data obtained for the two designs. Figure 16 shows the side-channel attack standard evaluation board (SASEBO-GII board) [37] that is used as the basic platform in this experiment.

The reason for choosing this FPGA board as a platform for hardware implementation is that this board has been specifically designed for security evaluation of cryptographic circuits and for the purpose of side-channel attack experiments. There are two FPGAs on this board that can be utilized. The first is the cryptographic FPGA, which is a Xilinx Virtex-5 series FPGA. The second is the control FPGA, which is a Spartan-3A series FPGA. These FPGAs are connected through a general-purpose input/output common bus. The AES Round Function and Key Expander circuits are implemented in the cryptographic FPGA and the configuration circuit is programmed into the control FPGA. The purpose of separating these two circuits is to prevent the power trace of the configuration circuit from interfering with the power trace of the cryptographic circuit so that the measurements of power traces, which decide the resistance of the design to power analysis attacks, can be done fairly.

For the purpose of power trace measurement, shunt resistors are present on FPGA board which utilize core and/or ground lines of cryptographic FPGA to give an accurate measurement of the cryptographic FPGA power consumption. These measurements can be captured by an oscilloscope via a voltage probe.

Figure 17 presents the experimental setup used for power trace analysis. To make a qualitative comparison, in terms of security, between the power traces of the conventional design and the proposed NCL design, we supply a set of three inputs to both designs. As the same inputs are applied to both designs, this enables us to evaluate the response of the two circuits to the same input data.

If we are able to show that the following two features of the power trace hold for the NCL-based design, then we can conclude that the proposed approach enhances security: (1) the power trace is more uniform compared to the synchronous design for the same input; and (2) the power traces of the NCL-based approach exhibit a higher degree of similarity across the three different input cases than the traces of the synchronous approach.

So, in order to perform a qualitative comparison, we applied a series of three Plaintexts, which are shown in Figure 18, to both cryptosystem designs and encrypted them with the same key. Then, we recorded the power traces for each of these cases for both designs and compared their quality in terms of security. The results are presented in Figures 19 to 24.

From Figures 20, 22, and 24, we can clearly see that the power waveforms look considerably similar for the proposed design in all the three cases even when the input Plaintext is different. But on the contrary for synchronous design, from Figures 19, 21, and 23, we can see that the power trace has clear variations between the three cases, as represented by ovals. These variations as discussed previously can be effectively exploited to compromise security. But, in case of proposed design, we do not see any clear variations between the three traces. In addition to the lack of these variations in the proposed design, we can also see that the waveforms are far more uniform as compared to their synchronous counterparts.

So, with this increased uniformity and with high degree of similarity between power traces for different Plaintexts, we can conclude that security is improved to a considerable extent due to inherent benefits of NCL.
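The comparison in this section is qualitative. As a hedged illustration of how the same two criteria (uniformity of a trace and similarity between traces for different Plaintexts) could be expressed numerically, the sketch below uses peak-to-peak amplitude and pairwise Pearson correlation. The metric choices and function names are assumptions made for this example, and the sample values are synthetic placeholders, not data measured on the SASEBO-GII board.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length traces (non-constant traces only)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))


def peak_to_peak(trace):
    """Simple uniformity indicator: a smaller value means a flatter trace."""
    return max(trace) - min(trace)


def min_pairwise_similarity(traces):
    """Lowest correlation over all pairs of traces (1.0 means identical shape)."""
    return min(pearson(traces[i], traces[j])
               for i in range(len(traces))
               for j in range(i + 1, len(traces)))


# Synthetic placeholder traces for three different inputs (NOT measured data).
traces = [
    [1.0, 2.0, 1.5, 2.5, 1.0],
    [1.1, 2.1, 1.4, 2.4, 1.0],
    [0.9, 2.0, 1.6, 2.6, 1.1],
]
print([peak_to_peak(t) for t in traces])
print(min_pairwise_similarity(traces))
```

A design whose traces show a smaller peak-to-peak spread and a pairwise similarity closer to 1.0 across different Plaintexts would satisfy both criteria in this numerical sense.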

Figure 25 shows the power trace corresponding to NULL-DATA wavefronts in the hardware implemented design. Figure 26 presents the propagation delay in the hardware implementation of the proposed design. After the input is applied, output arrives after 40 ns.

#### 8. Conclusion and Future Work

A novel asynchronous design approach for the two main components of AES, which are the Key Expander and Round function, is reported and validated in this work. This research is being used as the basis for a research project that aims to tape out a silicon chip of NCL AES design, which can be used to carry out more performance evaluation experiments. Contrary to the existing countermeasures which do not target the source of SCA problem and try to find solutions in later stages, the proposed approach combines the merits of NCL design paradigm for balanced switching profile and event-driven operation and spatial/temporal random dynamic voltage scaling (STRDVS) for injecting random noise to mitigate the source of the SCA problem, which is side-channel information leakage. In addition to providing power analysis SCA resistance, our approach also enhances resistance to EMA SCAs. Qualitative comparisons between the proposed approach and the traditional synchronous design have been conducted to verify merits of the proposed design. Both software simulation and hardware implementation results validate the effectiveness and correctness of our approach. In the future, the efficacy of the proposed design approach and its augmentation with STRDVS technique will be evaluated by performing an actual side-channel attack like the DPA or correlation power analysis (CPA).

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.