Abstract

Threshold implementation (TI) is a lightweight countermeasure against side-channel attacks in the presence of glitches. In masking schemes, the S-box is the key component to protect. In this paper, we propose a general first-order lightweight TI scheme for 4 × 4 S-boxes and name it MiniSat-lightweight-threshold implementation (MS-LW-TI). First, we use MiniSat to optimally decompose an S-box into the least number of three different logic gate operations: AND, OR, and XOR. From these operations, we define two primitives and their extensions for TI design. Furthermore, we prove that the primitives and their extensions strictly comply with the security properties. Finally, we implement MS-LW-TI on a Xilinx Spartan-6 field-programmable gate array (FPGA) to show that the S-boxes of PRESENT, GIFT, and PICCOLO consume only 17, 15, and 13 look-up tables (LUTs); 16, 9, and 16 flip-flops (FFs); and 6, 5, and 6 slices, respectively. Compared with existing lightweight TI designs, our TI for the PRESENT S-box reduces LUTs, FFs, and slices by 22%, 38%, and 25% relative to the design by Shahmirzadi and Moradi at IACR Transactions on Cryptographic Hardware and Embedded Systems (TCHES) 2021, and our TI for the GIFT S-box reduces LUTs, FFs, and slices by 6%, 25%, and 28% relative to the design by Jati et al., which is the smallest among existing designs.

1. Introduction

Constrained Internet of Things (IoT) devices with low computing power and small implementation area, such as RFID tags, smart cards, and sensors in wireless networks, have long been in demand. Many lightweight block ciphers, such as PRESENT [1], GIFT [2], PICCOLO [3], and LED [4], have been proposed to protect the sensitive data of such devices.

The physical security of cryptography has attracted considerable attention since Kocher et al. [5] proposed side-channel attacks (SCA) in 1999. Unprotected hardware implementations of lightweight block ciphers are vulnerable to SCA [6–8]. Masking uses secret sharing to eliminate the correlation between sensitive data and power consumption during operation and is considered a theoretically secure countermeasure [9]. However, masking can still leak in hardware implementations when glitches happen. In 2006, Nikova et al. [10] proposed threshold implementation (TI) to address the weakness caused by glitches. TI has since become a well-known hardware masking scheme.

1.1. Related Works

Lightweight block ciphers are used in resource-constrained environments and have strict requirements for resources, especially hardware resources [11]. Therefore, the topic of lightweight and side-channel resistant implementation is a pressing issue for these ciphers [12].

TI is based on secret sharing and multiparty computation, and it must satisfy three important properties: correctness, noncompleteness, and uniformity. For the linear components of lightweight cryptographic algorithms, it is easy to satisfy the three properties at the same time. However, for the nonlinear components (such as the S-box), the properties are hard to meet simultaneously, and constructing the TI of an S-box takes a lot of resources. For example, an S-box may need fresh randomness [13] or a larger number of shares [14]. Thus, lightweight TIs of these ciphers aim to reduce the number of shares without adding random numbers. According to the TI setting, an S-box with algebraic degree t should be split into (td + 1) shares to achieve security against dth-order attacks. An S-box with a high algebraic degree can be decomposed into multiple S-boxes with a lower algebraic degree, which can then be implemented as a pipeline. Generally, the Boolean equation of a 4 × 4 S-box is represented in algebraic normal form (ANF), and its algebraic degree is 3. Poschmann et al. [12] decomposed the S-box of algebraic degree 3 into two S-boxes of algebraic degree 2 to construct a three-share TI of the PRESENT S-box. Afterward, Kutzner et al. [15] decomposed the PRESENT S-box of degree 3 into a combination of two quadratic functions and some linear functions to construct a lightweight three-share TI of the PRESENT S-box. Since LED uses the S-box of PRESENT, Yao et al. [16] adopted the decomposed S-box of Poschmann et al. [12] directly. Jati et al. [17] decomposed the S-box of GIFT into two S-boxes, selected with the LIGHTER tool by estimating gate equivalents, and realized a three-share TI of the GIFT S-box. This three-share TI of the GIFT S-box was used by Satheesh and Shanmugam [18] and by Caforio et al. [19]. Meanwhile, Reparaz et al. [20] and Gross et al. [21] showed how to use only (d + 1) shares for dth-order security. Chen et al. [22] adopted the decomposed S-box of Kutzner et al. [15] to realize a two-share TI of the PRESENT S-box. Subsequently, Shahmirzadi and Moradi [23] proposed lightweight two-share TIs of the S-boxes of lightweight block ciphers.

Using several S-boxes of low algebraic degree instead of one S-box of high algebraic degree, in essence, reduces redundant logic units. However, an S-box can be decomposed in different ways, and the decompositions used by the above schemes still leave redundant logic units in the ANF equations of the S-boxes. These redundant logic units lead to a large number of AND and XOR gate operations in the TI, which is costly for lightweight implementation.

1.2. Our Contributions

To further solve the problem of lightweight and side-channel resistant implementation, we propose a general, efficient, and low-resource TI scheme named MS-LW-TI.

After MiniSat optimization, an S-box contains the least number of AND, OR, and XOR gates. Based on these logic gates, we construct two primitives for TI: a linear primitive built on the XOR gate and a nonlinear primitive built on the AND gate. For 4 × 4 S-boxes, the input variables of the S-box can serve as the additional one-bit numbers that the nonlinear primitive needs. For 8 × 8 S-boxes, we have to add fresh randomness or merge AND logic gates. We build the first-order MS-LW-TI from the two primitives and their extensions. MS-LW-TI guarantees first-order glitch-extended probing security.

We implement MS-LW-TI on the S-boxes of PRESENT, GIFT, and PICCOLO, which consume only 17, 15, and 13 look-up-tables (LUTs); 16, 9, and 16 flip-flops (FFs); 6, 5, and 6 slices; and 3, 3, and 2 clock cycles, respectively. Compared with the existing lightweight TI design, our two-shares scheme for PRESENT S-box has a 22%, 38%, and 25% reduction of LUTs, FFs, and slices to the design by Shahmirzadi and Moradi at TCHES 2021 [23], and our three-shares scheme for GIFT S-box has a 6%, 25%, and 28% reduction of LUTs, FFs, and slices to the design by Jati et al. [17], which is the smallest one presently.

1.3. Organization

In Section 2, we introduce the security properties of TI, and some classic two and three shares TI of S-boxes of lightweight block ciphers. In Section 3, we analyze two primitives and the extensions of the first-order MS-LW-TI. In Section 4, we present the implementation and security analysis of the first-order MS-LW-TI. Section 5 concludes the paper.

2. Preliminaries

2.1. Threshold Implementations (TIs)

As we know, glitches can occur in the combinational circuits of cryptographic algorithms. To capture this, Faust et al. [24] proposed the glitch-extended probing model. It involves placing a probe on an output port of a circuit, propagating it backward, and extending it to multiple probes at the input ports of the combinational circuit that drives the probed port. For the dth-order security of a nonlinear function of an S-box, a TI needs at least (d + 1) input shares. After dividing the binary variable into (d + 1) shares , the coordinate Boolean function is split into (d + 1) shared functions . TI has to satisfy three important properties, namely correctness, noncompleteness, and uniformity [10, 13]. Given a variable , its share is represented as follows: For Boolean functions , its share is represented as follows:

Property 1 (Correctness). The shared functions are said to be correct if the output share represents the output of the original Boolean function , i.e.:where the reduced sharing elements are defined by the following equation:

Property 2 (Noncompleteness). Every shared function is independent of at least one share of the input variable . The shared functions in Equation (3) are noncomplete.

Property 3 (Uniformity). If the function is invertible, then uniformity is satisfied by invertible realizations. And the probabilistic distributions of the shared and original inputs are denoted by and , respectively. The input share is said to be uniform if and only if, for any input , its shares occur with the same probability:

where is a constant related to the number of shares and the value of variables. And given uniform input shares, the probabilistic distributions of the shared and original outputs are denoted by and , respectively. The output share is said to be uniform if and only if, for any output , its shares occur with the same probability:

2.2. State-of-the-Art Two and Three Shares Threshold Implementations

TIs with (d + 1) or (td + 1) input shares that resist dth-order attacks have been shown in [12, 15–19, 22, 23]. Considering the boundaries of the first-order TI, where t is 2 and d is 1, there are two or three input shares. Some classic TIs of S-boxes of lightweight block ciphers are described below.

Poschmann et al. [12] proposed a three shares first-order TI of PRESENT, in which the cubic S-box is decomposed into two quadratic S-boxes and represented as . The ANF equations of and have nine AND gates and 19 XOR gates. The ANF equations of three shares TI of S-box are shown in [12], which have 81 AND gates and 105 XOR gates. And the ANF equations of and are as follows:

Kutzner et al. [15] also proposed a three shares first-order TI of PRESENT, in which the S-box can be decomposed as , and the ANF equations of , , and are as follows:

The ANF equations for one , two , and one have eight AND gates and 17 XOR gates. The ANF equations of three shares TI of S-box are shown in [15], which have 72 AND gates and 93 XOR gates. The ANF equations of two shares TI of S-box are shown in [22], which have 32 AND gates and 47 XOR gates.

Jati et al. [17] proposed a first-order TI of GIFT using a technique similar to that of Poschmann et al. [12], and the ANF equations of and are as follows: These equations have five AND gates and 15 XOR gates. The ANF equations of the three-share TI of and are shown in [17–19], which have 45 AND gates and 69 XOR gates.

Shahmirzadi and Moradi proposed a lightweight two-share first-order TI of PRESENT, in which the S-box of degree 3 is used directly. The ANF equations of the S-box are as follows: These equations have 23 AND gates and 23 XOR gates. The ANF equations of the two-share TI of the S-box are shown in [23], which have 200 AND gates and 164 XOR gates.

3. First-Order TI Design for Primitives

The two-share and three-share variants of the first-order TI have different features. A TI with three input shares can require more computational resources, because more shares have to be calculated. A TI with two input shares, in turn, may require more storage resources to register variables. For example, for a TI of an AND gate, the two-share variant requires four registers, while the three-share variant requires only three. In this paper, we discuss both the two-share and three-share variants of the first-order MS-LW-TI.

3.1. Decomposition of 4 × 4 S-Boxes

The lightweight block cipher PRESENT has become a standard algorithm of the International Organization for Standardization (ISO/IEC 29192). GIFT is an improved version of PRESENT, and PICCOLO is also a well-known algorithm. Thus, we choose these algorithms as our research objects. To ensure high security, the 4 × 4 S-boxes of these algorithms have a high algebraic degree, namely 3. However, a higher algebraic degree requires more area in hardware implementation. To optimize the implementation, we adopt MiniSat to reduce the number of AND, OR, and XOR gates of an S-box, based on the scheme by Stoffelen [25] at Fast Software Encryption (FSE) 2016.

The algebraic degree of the 4 × 4 S-box in GIFT is 3. MiniSat (cryptominisat-5.8.0) decomposes the S-box into multiple logic gate functions, which include three AND gates, one OR gate, and 10 XOR gates. The logic gate functions of the S-box are as follows: where , , , and are input variables, , , , and are output variables, and , , , , , and are intermediate variables. The symbols , , and are AND, OR, and XOR gates, respectively. We know that the OR gate can be represented by the AND gate, and the NOT gate can be represented by the XOR gate and a constant 1. So can be rewritten as .
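The OR-to-AND rewrite mentioned above follows from De Morgan's law, with each NOT expressed as an XOR with the constant 1. The following is a minimal sketch that checks this identity exhaustively; the function name `or_via_and_xor` is illustrative and does not come from the paper, and the paper's own rewritten expression may be arranged differently.

```python
from itertools import product


def or_via_and_xor(a: int, b: int) -> int:
    """OR of two bits using one AND and three XORs with the constant 1.

    De Morgan: a OR b = NOT(NOT a AND NOT b), and NOT v = v XOR 1.
    """
    return ((a ^ 1) & (b ^ 1)) ^ 1


# Exhaustive check over all pairs of one-bit inputs.
OR_REWRITE_HOLDS = all(
    or_via_and_xor(a, b) == (a | b) for a, b in product((0, 1), repeat=2)
)
```

Under this particular rewrite, each OR gate costs one AND gate and three XOR gates. That is consistent with the count of four AND gates and 16 XOR gates reported for the PRESENT S-box in this section, starting from two AND, two OR, and 10 XOR gates.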

The algebraic degree of the PRESENT S-box is also 3. Courtois et al. [26] presented the optimal logic gate functions of the S-box obtained with MiniSat; these logic functions include two AND gates, two OR gates, and 10 XOR gates. For the logic functions of the S-box, see Appendix A.

The PICCOLO S-box is a simple nonlinear function whose algebraic degree is 3. The optimal logic gate functions of the PICCOLO S-box include four OR gates and nine XOR gates. For the logic gate functions of the PICCOLO S-box, see Appendix B.

S-box optimizations are also applied by Bilgin et al. [27] and Cassiers et al. [28] to reduce AND depth and gate complexity, which yields low-latency hardware circuits. Their optimizations do not minimize the number of logic gates, whereas ours obtains the minimum number. We use the number of logic gates of the PRESENT S-box to illustrate the difference. For a like-for-like comparison, we express each OR gate with AND gates and each NOT gate with XOR gates. The scheme of Bilgin et al. [27] needs four AND gates and 24 XOR gates, and the scheme of Cassiers et al. [28] needs four AND gates and 20 XOR gates. Our scheme needs only four AND gates and 16 XOR gates, the fewest of the three. The AND depth of the schemes by Bilgin et al. [27] and Cassiers et al. [28] is 2, whereas that of our scheme is 3, the largest of the three. However, our scheme has the fewest logic gates, which suits lightweight block ciphers, where the goal is to reduce implementation area.

3.2. TI of Primitives
3.2.1. The Basic Idea of Primitives

After MiniSat optimization, an S-box is represented by two kinds of logic functions: the two-input XOR gate and the two-input AND gate , where and are input variables and is the output variable. OR gates are covered as well, because the OR gate can be represented by the AND gate. The is a noninvertible function and does not satisfy uniformity. We construct an invertible function that satisfies uniformity by XORing in a new one-bit number . The invertible function is expressed as . For the GIFT, PRESENT, and PICCOLO 4 × 4 S-boxes, there are four functions, which means they need four one-bit numbers to construct these invertible functions. These four one-bit numbers are the four input variables of the S-box. For the AES 8 × 8 S-box, there are 32 functions, which means 32 one-bit numbers are needed; the eight input variables are not enough, so 24 one-bit fresh random numbers have to be added. To avoid fresh random numbers, one can consider merging the 32 functions; a similar approach is described in detail in [23]. We do not discuss it in this paper.

3.2.2. The Establishment of Primitives

We design the first-order TI of two and three shares corresponding to (d + 1) and (2d + 1), respectively.

Linear logic gate function: .

Nonlinear logic gate function: .

Thus, the two logic gate functions and are defined as two basic primitives of MS-LW-TI.

Two basic primitives of two shares are described as follows: The and are shares of , and the and are shares of . is the third variable, whose shares are and :

(1) Linear Transformation: . The first-order TI with two shares is as follows:

(2) Nonlinear Transformation: . The first-order TI with two shares is as follows: where , , and are register variables, and are output shares. The operations and are defined as compression layers.

Two basic primitives of three shares are described as follows: The , , and are shares of , and the , , and are shares of . is the third variable, whose shares are , , and .

(3) Linear Transformation: . The first-order TI with three shares is as follows:

(4) Nonlinear Transformation: . The first-order TI with three shares is as follows:where , , and are output shares.

Based on the two basic primitives, we can extend to logical gate functions such as , , and .

3.2.3. Secure Properties of Primitives

We prove the security properties of the two primitives for (d + 1) and (2d + 1) boundaries of the first-order TI.

Proposition 1. The primitive has correctness, noncompleteness, and uniformity of TI.

Proof. (1) Two shares of the scheme: In Equation (13), the original output is equal to the XOR of the output shares, so it satisfies correctness.
In Equation (13), is independent of and , and is independent of and . It satisfies noncompleteness.
The probabilistic distributions of input and output shares are denoted as and , respectively. The probability of input shares is described as follows: where the number of different (, ) values is 4 and the number of shares is 2. According to Equation (6), it satisfies uniformity.
Based on the uniformity of inputs, the probability of shared outputs is described as follows: where the number of different values is 2 and the number of shares is 2. According to Equation (6), the output of primitive satisfies uniformity. That is, given uniform shared inputs, the shared outputs are uniform, which meets the uniformity property.
(2) Three shares of the scheme: The proof for Equation (16) is the same as for Equation (13). Similarly, satisfies correctness, noncompleteness, and uniformity of TI with three shares.
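The three properties claimed for the linear primitive can also be checked by exhaustive enumeration. The following is a minimal sketch that assumes the usual share-wise sharing of a two-input XOR with two shares; Equation (13) is not reproduced in this text, so the helper names (`sharings2`, `xor_two_shares`, `check_xor_primitive`) and the exact form of the sharing are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def sharings2(bit):
    """All two-share Boolean sharings (s1, s2) with s1 ^ s2 == bit."""
    return [(s1, s1 ^ bit) for s1 in (0, 1)]


def xor_two_shares(x, y):
    """Share-wise XOR (assumed form): output share i only uses input shares i.

    Each output share is therefore independent of the other share of every
    input variable, which is the noncompleteness condition.
    """
    return x[0] ^ y[0], x[1] ^ y[1]


def check_xor_primitive():
    """Return True if correctness and output uniformity hold for all inputs."""
    for x, y in product((0, 1), repeat=2):
        counts = Counter()
        for xs, ys in product(sharings2(x), sharings2(y)):
            zs = xor_two_shares(xs, ys)
            # Correctness: the output shares recombine to x XOR y.
            if zs[0] ^ zs[1] != x ^ y:
                return False
            counts[zs] += 1
        # Uniformity: every valid sharing of the output occurs equally often.
        if set(counts) != set(sharings2(x ^ y)):
            return False
        if len(set(counts.values())) != 1:
            return False
    return True
```

For each unshared input pair there are four input sharings, and each of the two valid output sharings is hit exactly twice, so `check_xor_primitive()` is expected to return `True` under these assumptions.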

Proposition 2. The primitive also has correctness, noncompleteness, and uniformity of TI.

Proof. (1) Two shares of the scheme: In Equation (14), the original output is equal to the XOR of the output shares, so it satisfies correctness. In Equation (14), no two shares of the same input variable appear in registers , , , and at the same time. Thus, it satisfies noncompleteness.
The primitive is an invertible function, and the probabilistic distributions of shared inputs and outputs are denoted by and , respectively. The uniformity of shared inputs is described as follows: where the number of different (, , ) values is 8 and the number of shares is 2. According to Equation (6), the input shares satisfy uniformity.
Under the uniformity of inputs, the uniformity of shared outputs is described as follows: where the number of different values is 2 and the number of shares is 2. According to Equation (6), the output of primitive satisfies uniformity.
(2) Three shares of the scheme: The proof for Equation (17) is the same as for Equation (14). Similarly, satisfies correctness, noncompleteness, and uniformity of TI with three shares.
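The role of the third variable in the nonlinear primitive can be illustrated with a small exhaustive check. The sketch below assumes the well-known three-share direct sharing of the AND gate from the TI literature and XORs one share of the third variable into each output share. Equation (17) is not reproduced in this text, so this sharing, the variable names x, y, and r, and the helper names are assumptions rather than the paper's exact construction.

```python
from collections import Counter
from itertools import product


def sharings3(bit):
    """All three-share Boolean sharings whose XOR equals bit (four of them)."""
    return [s for s in product((0, 1), repeat=3) if s[0] ^ s[1] ^ s[2] == bit]


def and_xor_three_shares(x, y, r):
    """Assumed three-share TI of z = (x AND y) XOR r.

    Output share i never touches any input share with index i,
    which is the noncompleteness condition for first-order TI.
    """
    x1, x2, x3 = x
    y1, y2, y3 = y
    r1, r2, r3 = r
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2) ^ r2  # uses indices 2 and 3 only
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1) ^ r3  # uses indices 1 and 3 only
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1) ^ r1  # uses indices 1 and 2 only
    return z1, z2, z3


def check_and_xor_primitive():
    """Return True if correctness and output uniformity hold for all inputs."""
    for x, y, r in product((0, 1), repeat=3):
        counts = Counter()
        for xs, ys, rs in product(sharings3(x), sharings3(y), sharings3(r)):
            zs = and_xor_three_shares(xs, ys, rs)
            # Correctness: the output shares recombine to (x AND y) XOR r.
            if zs[0] ^ zs[1] ^ zs[2] != (x & y) ^ r:
                return False
            counts[zs] += 1
        # Uniformity: each of the four valid output sharings occurs equally
        # often over the 64 input sharings of this unshared input.
        if sorted(counts.values()) != [16, 16, 16, 16]:
            return False
    return True
```

In this sketch the shares of r act as the mask that makes the output sharing uniform, which matches the idea in Section 3.2.1 of adding a one-bit number to the AND gate to obtain an invertible function.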

Based on primitives and , we can extend to logic gate functions that also satisfy the correctness, noncompleteness, and uniformity.

Property 1: The function has correctness, noncompleteness, and uniformity.

Proof. According to Proposition 1, the primitive satisfies three secure properties of TI. If we regard as a new variable to replace , then can be transformed into , which is equivalent to the inverse of . Thus, the function has correctness, noncompleteness, and uniformity.

Property 2: The function has correctness, noncompleteness, and uniformity.

Proof. According to Proposition 2, the primitive satisfies three secure properties of TI. Meanwhile, according to Proposition 1, the primitive satisfies three secure properties of TI. If we regard as a new variable to replace and to replace , then is transformed into , which meets Proposition 1. Thus, the function satisfies correctness, noncompleteness, and uniformity.

Property 3: The function has correctness, noncompleteness, and uniformity.

The proof process of Property 3 is the same as Property 1 and Property 2.

4. MS-LW-TI of 4 × 4 S-Boxes

4.1. Design of MS-LW-TI

We take the S-box of GIFT as an example to introduce the specific construction and implementation of the MS-LW-TI. An S-box is solved by MiniSat to obtain two-input logic gates, and . After the two logic gates are constructed, the following five logic gate functions are summarized: , , , , and . We describe the implementation of the five logic gate functions:

(1) and .

The logic gate functions and are the two basic primitives, which have been introduced in Section 3.2.

(2) .

Two shares of the scheme: , , , , , and are , , and input shares, respectively:where and are output shares.

Three shares of the scheme: , , , , , , , , and are , , and input shares, respectively:where , , and are output shares.

(3) .

Two shares of the scheme: , , , and are and input shares, respectively:

Three shares of the scheme: , , , , , and are and input shares, respectively:

(4) .

According to the design of and , we can implement .

After MiniSat optimization, the S-box of GIFT is decomposed into 14 logic gates, from which we construct the primitives and extensions. These primitives and extensions are connected to each other in cascade and in parallel. To stop glitches from propagating through cascaded logic functions that include AND logic gates, we need to insert several registers between these functions. For example, , , and . The implementation process of all the logic functions of the GIFT S-box is shown below.

The , , , and are input variables, the , , , and are output variables, and the and are intermediate variables:

For the GIFT S-box, the design architectures of two shares and three shares of MS-LW-TI are shown in Figures 1 and 2, respectively. In Figure 1, the compress functions compress logic functions to output shares by means of XORs. And the ANF equations of two shares and three shares of the MS-LW-TI are listed in Appendices C and D, respectively.

At the same time, the implementation process of the logic gate functions of PRESENT S-box and PICCOLO S-box are listed in Appendices E and F, respectively.

4.2. Security Analysis of MS-LW-TI

Theoretically, we have proved that the five logic gate functions satisfy the basic properties of TI. These functions can be considered locally secure, but the security of the overall S-box composed of them is not yet established, because the functions are connected in cascade and in parallel, which may cause the S-box to fail the basic properties of TI. The security definitions of cascaded and parallel functions are described by Bilgin et al. [29] and Dhooghe et al. [30]. Consider two functions with the same uniform input. If the two functions are parallel and each satisfies the basic properties of TI, then their TIs cannot lead to information leakage. If the outputs of the two functions are the inputs of a third function, the functions are cascaded. In that case, in addition to satisfying the properties of TI, the joint distribution of the two outputs must be uniform for the TI of the third function to be safe.

Based on the above definition, we have verified the security of MS-LW-TI. Here, we discuss the GIFT S-box; the PRESENT S-box and the PICCOLO S-box can be verified in the same way, so we do not describe them in detail.

First, for the correctness of the scheme, we can verify the output shares against the input shares of the shared S-box. Since the logic gate functions satisfy the correctness of MS-LW-TI, the output of the shared S-box is equal to that of the original S-box. Second, we verify the noncompleteness of MS-LW-TI. The , , , and are input variables of the S-box, and the shares of each input variable are processed separately and do not appear at the same time in the shared function circuits. So, placing a probe on any single shared function circuit does not reveal any information.

Finally, we verify the important uniformity of MS-LW-TI. The first-order probing secure implementation of can be easily achieved with to replace the fresh mask bit [23]. For the GIFT S-box, two shares of the scheme, the variables , , , , , , , and can act as the fresh mask bit in the logic functions , , , and , respectively. Meanwhile, three shares of the scheme, the variables , , , , , , , , , , , and can act as the fresh mask bit in the logic functions , , , and , respectively.

Two shares of the scheme: The functions , , , and contain AND logic gates, so they are implemented as sequential circuits in which registers block glitches. Thus, we only need to consider the uniformity of the input shares and output shares of each function. The number of all possible input values of the eight shares , , , , , , , and for the whole S-box is 2^8 = 256. , , , , , and are the input shares of and functions, and are the output shares of , and and are the output shares of . When the output shares of one function are used as the input shares of another function, , , , , , and are the input shares of , and and are the output shares of . , , , , , and are the input shares of , and and are the output shares of . The input and output share distributions of these functions are shown in Table 1. According to the distributions in Table 1, the input and output shares of , , , and satisfy uniformity.

We also need to consider glitches for the combinational circuit implementation of the compress and functions. Among the compress functions, the joint probability distributions of the output data are equal: the numbers of occurrences of (0,0), (0,1), (1,0), and (1,1) are 96, 32, 96, and 32, respectively, and the same holds for the joint probability distributions of the output data , , and . Meanwhile, the joint probability distributions of the output data are also equal: the numbers of occurrences of (0,0), (0,1), (1,0), and (1,1) are 96, 96, 32, and 32, respectively, and the same holds for the joint probability distributions of the output data , , and . Thus, the output of the compress function is independent of the input data and satisfies first-order glitch-extended probing security.

For function, the joint probability distributions of the output data are equal: the numbers of occurrences of (0,0,0,0), (0,0,0,1), (0,0,1,0), (0,0,1,1), (0,1,0,0), (0,1,0,1), (0,1,1,0), (0,1,1,1), (1,0,0,0), (1,0,0,1), (1,0,1,0), (1,0,1,1), (1,1,0,0), (1,1,0,1), (1,1,1,0), and (1,1,1,1) are 48, 0, 32, 16, 0, 16, 0, 16, 48, 16, 32, 0, 0, 0, 32, and 0, respectively. The joint probability distributions of the output data are also equal: the numbers of occurrences of (0,0,0,0), (0,0,0,1), (0,0,1,0), (0,0,1,1), (0,1,0,0), (0,1,0,1), (0,1,1,0), (0,1,1,1), (1,0,0,0), (1,0,0,1), (1,0,1,0), (1,0,1,1), (1,1,0,0), (1,1,0,1), (1,1,1,0), and (1,1,1,1) are 48, 32, 0, 16, 48, 32, 16, 0, 0, 0, 16, 16, 0, 32, 0, and 0, respectively. Thus, the output of the function is also independent of the input data and satisfies first-order glitch-extended probing security.

Three shares of the scheme: As in the two-share scheme, we need to consider the uniformity of the input shares and output shares of , , , and in a sequential circuit. The number of all possible input values of the 12 shares , , , , , , , , , , , and for the whole S-box is 2^12 = 4096. , , , , , , , , and are the input shares of function whose distribution contains 512 different cases, each occurring eight times. And , , and are the output shares of function whose distribution contains eight different cases, each of which occurs 512 times. Similarly, the , , and have the same input and output distributions as the . Thus, the input and output shares of , , , and also satisfy uniformity. In the combinational circuit, for the function, the joint probability distribution of the output shares contains 64 different cases, each occurring 64 times. Therefore, the joint probability distributions of the output shares are equal, and the function also satisfies first-order glitch-extended probing security.

We also verify the uniformity of the output shares of the whole S-box over all possible input shares, based on the definition in [29]. Uniform sharing of a function (circuit): The sharing is uniform if and only if:

, with , :

,

where the and are the numbers of input variables and output variables, respectively. Meanwhile, the and the are the number of input shares and output shares, respectively.

For two shares of S-box of GIFT, , , , and , . For three shares of S-box of GIFT, we also verify the uniformity of each output. The , , , and , then . The distribution of output shares of the S-box for all possible input shares is shown in Tables 2 and 3. From the data in Tables 2 and 3, the distribution of output shares of the shared S-box is uniform. Thus, MS-LW-TI satisfies first-order glitch-extended probing security.

4.3. Hardware-Resources Consideration

Logic circuits are composed of logic gates, and the number of logic gates determines the size of a logic circuit [12]. In Section 2.2, it is mentioned that the implementation of an S-box can have different logic functions, and different logic functions have different numbers of logic gates. Shahmirzadi and Moradi directly used the cubic S-box, and we found a higher number of AND gates and XOR gates for the implementation because the ANF equations contain many redundant logic units, for example, the operation and operation in the function, which also appear in the function and function, respectively. Poschmann et al. [12] decomposed the S-box into and to reduce the algebraic degree of the S-box and also reduce the number of AND logic gates, but there are still some redundant logic units in ANF equations of and , for example, operation in the function and the function. And Kutzner et al. [15] decomposed the S-box into two identical G functions. Jati et al. [17] decomposed the S-box into and . The ANF equations of and also have redundant logic units, for example, operation in the function and the function. We use MiniSat to decompose the S-box, which also reduces the algebraic degree of the S-box, while the ANF equations do not contain redundant logic units, thus minimizing the number of logic gates, as shown in Section 3.1. When a TI of S-box is implemented, the functions in the ANF equations are split into several shared functions independent of each other, and then the output of these shared functions is calculated together to get the output of the original functions, and the realization is always required to satisfy the three secure properties of TI [13]. MS-LW-TI constructs the primitives to implement each function from the three secure properties of TI. Meanwhile, MS-LW-TI can resist first-order glitch-extended probing attacks like the above schemes. 
When the ANF equations of the S-box require fewer logic gates, the number of logic gates required for its TI will also be reduced, thus reducing the overall protection resources [12]. We count the number of AND gates and XOR gates in the ANF equations of S-boxes and the ANF equations of TI of S-boxes, respectively. The comparison results between MS-LW-TI and existing TI are shown in Table 4. From the results in Table 4, we find that MS-LW-TI has the least number of AND gates and XOR gates.

4.4. Experiments of MS-LW-TI
4.4.1. Evaluation of MS-LW-TI

We have proved that MS-LW-TI satisfies first-order glitch-extended probing security. To evaluate the information leakage of the implementation, we implement our scheme on the Spartan-6 FPGA of the SAKURA-G board [31]. We use the PicoScope 5000 Series to collect power consumption traces at a sampling rate of 500 MS/s, with 5,000 sampling points per trace. The trigger signal goes high at the 10% position and drops to low after one clock cycle. Test vector leakage assessment (TVLA) is a general technique for evaluating the security of masking schemes [32]. In the evaluation experiment, we collect power consumption traces for a fixed-versus-random t-test and then test whether the mean values of the two sets of traces differ, which indicates whether there is information leakage. The TVLA threshold of 4.5 is selected based on [33] (confidence of 99.999%). We apply TVLA to the unprotected and the first-order MS-LW-TI implementations of the GIFT, PRESENT, and PICCOLO S-boxes.
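The fixed-versus-random test described above is commonly computed as Welch's t-statistic at every sampling point, with a point flagged when the absolute value exceeds 4.5. The following is a minimal sketch of that computation in plain Python, assuming each trace is a list of samples of equal length; the function names and the toy data are illustrative and are not taken from the paper's measurement setup.

```python
import math
from statistics import mean, variance

TVLA_THRESHOLD = 4.5  # threshold used in the text, based on [33]


def welch_t(fixed_samples, random_samples):
    """Welch's t-statistic for one sampling point of the two trace sets."""
    var_term = (variance(fixed_samples) / len(fixed_samples)
                + variance(random_samples) / len(random_samples))
    return (mean(fixed_samples) - mean(random_samples)) / math.sqrt(var_term)


def leaky_points(fixed_traces, random_traces, threshold=TVLA_THRESHOLD):
    """Indices of sampling points whose |t| exceeds the threshold."""
    flagged = []
    # zip(*traces) regroups the data so that each item is one sampling point.
    columns = zip(zip(*fixed_traces), zip(*random_traces))
    for index, (fixed_col, random_col) in enumerate(columns):
        if abs(welch_t(fixed_col, random_col)) > threshold:
            flagged.append(index)
    return flagged


# Toy data: point 0 differs strongly between the sets, point 1 does not.
toy_fixed = [[10.0, 5.0], [11.0, 6.0], [12.0, 7.0]]
toy_random = [[0.0, 5.0], [1.0, 6.0], [2.0, 7.0]]
toy_result = leaky_points(toy_fixed, toy_random)
```

With real measurements the two sets would hold the fixed-input and random-input traces from the oscilloscope, and a sampling point with zero variance in both sets would need separate handling, which this sketch omits.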

We sample 200,000 power consumption traces for the unprotected S-boxes. The target FPGA receives unprotected input and issues output in the same unprotected form. As Figures 3–5 show, the t-statistics are large and exceed the threshold of 4.5. The difference between fixed and random data leads to information leakage in the input operations, logical operations, and output operations when the unprotected S-boxes run on the target FPGA. This result is as expected and demonstrates a sufficiently high signal-to-noise ratio (SNR) for the experimental setup [17].

As the number of traces increases, the t-statistics may also become larger. We therefore sample 5,000,000 power consumption traces for the first-order MS-LW-TI of the S-boxes, which is 25 times the number of traces collected for the unprotected S-boxes in the same experimental setup. The target FPGA receives shared input and issues output in the same shared form. As Figures 6–11 show, the t-statistics are small and stay within the ±4.5 threshold. This result is also as expected and indicates that there is no detectable information leakage. In addition, masking schemes can reduce the SNR of the implementation [34]. Regarding protection against SCA, we compare MS-LW-TI with the existing work in terms of the properties of TI, TVLA evaluation, and security order, respectively. Based on Table 5, we conclude that MS-LW-TI can resist first-order SCA.

4.4.2. Performance of MS-LW-TI

The implementation of MS-LW-TI is based on ISE Design Suite 14.7 with Xilinx Spartan-6 (XC6SLX75) FPGAs for S-boxes, the platform used in the existing TI schemes [35, 36]. When these TI schemes map a hardware design to an FPGA, the authors count the number of occupied slices as a metric for size. Every slice contains four LUTs and eight FFs. We first implement the first-order MS-LW-TI of the GIFT S-box. In terms of resources, the two-share scheme requires 14 LUTs, 16 FFs, and six slices, and the three-share scheme requires only 15 LUTs, nine FFs, and five slices. In terms of performance, the two-share scheme needs three clock cycles with a maximum frequency of 369.959 MHz, and the three-share scheme needs three clock cycles with a maximum frequency of 351.247 MHz. We then implement the first-order MS-LW-TI of the PRESENT S-box. The two-share scheme requires only 17 LUTs, 16 FFs, and six slices at a maximum frequency of 351.494 MHz, and the three-share scheme requires 20 LUTs, nine FFs, and six slices at 343.289 MHz. The first-order MS-LW-TI of the PRESENT S-box requires three clock cycles. Finally, we implement the first-order MS-LW-TI of the PICCOLO S-box. The two-share scheme requires 13 LUTs, 16 FFs, and six slices at 527.704 MHz, and the three-share scheme requires only 14 LUTs, six FFs, and eight slices at 539.084 MHz. The first-order MS-LW-TI of the PICCOLO S-box requires only two clock cycles.

MS-LW-TI consumes fewer resources in hardware implementation. Since the S-boxes are optimized, we obtain the fewest logic gates with which to construct the protection of the S-boxes from the primitives. The S-box optimizations of Bilgin et al. [27] and Cassiers et al. [28] have not been constructed into TI, so we do not compare protection resources and performance with them. Compared with the TI of Shahmirzadi and Moradi for the PRESENT S-box, our two-share scheme saves five LUTs, 10 FFs, and two slices, which is a 22% reduction of LUTs, a 38% reduction of FFs, and a 25% reduction of slices, at the cost of only two additional clock cycles. Compared with the TI of Jati et al. [17] for the GIFT S-box, our three-share scheme saves one LUT, three FFs, and two slices, which is a 6% reduction of LUTs, a 25% reduction of FFs, and a 28% reduction of slices, at the cost of only one additional clock cycle. The detailed results are shown in Table 6. The designs are written in Verilog Hardware Description Language (HDL), and the results in Table 6 are post-place-and-route figures obtained with ISE Design Suite 14.7 for Xilinx Spartan-6 FPGAs (XC6SLX75, Area Optimization Synthesis Mode).
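The percentages above follow from the reported resource counts and savings. The short sketch below reproduces the arithmetic, assuming that each baseline figure equals our figure plus the stated saving; the baseline values are therefore derived here rather than quoted, and the name `reduction_percent` is hypothetical.

```python
def reduction_percent(ours, saved):
    """Percentage saved relative to the baseline, taken as ours + saved (assumption)."""
    baseline = ours + saved
    return 100.0 * saved / baseline

# (our count, stated saving) for LUTs, FFs, and slices.
PRESENT_TWO_SHARES = {"LUTs": (17, 5), "FFs": (16, 10), "slices": (6, 2)}
GIFT_THREE_SHARES = {"LUTs": (15, 1), "FFs": (9, 3), "slices": (5, 2)}

if __name__ == "__main__":
    for name, table in (("PRESENT, two shares", PRESENT_TWO_SHARES),
                        ("GIFT, three shares", GIFT_THREE_SHARES)):
        for resource, (ours, saved) in table.items():
            pct = reduction_percent(ours, saved)
            print(f"{name}: {resource} {pct:.2f}% (reported as {int(pct)}%)")
```

Under this assumption the reported percentages correspond to the computed values with the fractional part dropped.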

5. Conclusion

TI is an important masking scheme that can guarantee no leakage when glitches happen. The TI design of S-boxes is the key issue in protecting lightweight block ciphers such as PRESENT, GIFT, and PICCOLO. However, the TI of S-boxes of lightweight block ciphers suffers from high decomposition complexity and high resource consumption.

To solve these problems, we propose a general, efficient, and low-resource TI scheme named MS-LW-TI. After decomposition by MiniSat, the S-box contains only AND gates, OR gates, and XOR gates, from which we construct two basic primitives. The two primitives can be extended to any logic gate functions of S-boxes. We build the (d + 1) and (2d + 1) first-order MS-LW-TI for the S-boxes, both of which are first-order glitch-extended probing secure. We take the GIFT S-box as an example to demonstrate the design, implementation, and security analysis of MS-LW-TI in detail. In the TVLA evaluation on 5,000,000 traces, MS-LW-TI shows no detectable information leakage.

In this paper, we designed lightweight implementations of the (d + 1) and (2d + 1) first-order MS-LW-TI for the S-boxes of PRESENT, GIFT, and PICCOLO without any fresh randomness. The experimental results show that MS-LW-TI uses fewer resources than existing TI schemes. MS-LW-TI is currently the smallest such design, which makes it suitable for lightweight block ciphers that must resist first-order SCA in hardware implementation.

Appendix

A. The Logic Gate Functions of PRESENT S-Box by MiniSat.

The functions use four input variables, four output variables, and nine intermediate variables:

B. The Logic Gate Functions of PICCOLO S-Box by MiniSat.

The functions use four input variables, four output variables, and four intermediate variables:

C. The ANF Equations of Two Shares of the MS-LW-TI for GIFT S-Box.

The equations use eight input shares and eight output shares:

D. The ANF Equations of Three Shares of the MS-LW-TI for GIFT S-Box.

The equations use 12 input shares and 12 output shares:

E. The Implementation Process of Logic Gate Functions of PRESENT S-Box.

The functions use four input variables, four output variables, and five intermediate variables:

F. The Implementation Process of Logic Gate Functions of PICCOLO S-Box.

The functions use four input variables and four output variables:

Data Availability

Data are available upon request from the authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Key R&D Program of China (No. 2022YFB3103800).