Abstract

With the globalization of the manufacturing supply chain, malicious modification introduced by untrusted parties in the chain has become an important chip security issue. These modifications are called hardware Trojans (HTs). An HT is difficult to detect because it is highly concealed and can be implemented in many different ways. Side-channel-based HT detection is relatively effective because it needs neither to trigger the Trojan nor to destroy the chip. However, it faces two major challenges. First, side-channel detection depends heavily on a golden model. Second, the accuracy of the samples is limited: the side-channel information of a chip comes from the hardware manufacturing process and implementation, so it is affected by process variation. Many self-reference detection methods have been proposed to solve the problem of missing golden models. However, the existing methods often impose special requirements on the circuit structure (such as the need for self-similar structures in the circuit), and they can hardly resist process variation. This paper combines design and detection. We select the power consumption generated at different times and construct two self-reference ‘knapsacks’ to detect HTs. The proposed solution is a self-reference method, but it requires neither self-similar structures nor identical states at certain clock cycles in the circuit. Meanwhile, by constructing the ‘knapsacks,’ we reduce the impact of process variation on detection accuracy because the process variation in the two sets of power consumption is balanced.

1. Introduction

With the development of global outsourced manufacturing services, a new security problem has emerged in the field of integrated circuit (IC) manufacturing: potential chip modification during uncontrolled chip manufacturing [1]. These modifications, maliciously and intentionally applied to the circuit, are called hardware Trojans [2]. Hardware Trojans can be divided into Always-On Hardware Trojans (AHT) and Triggered Hardware Trojans (THT) according to their trigger mode. An Always-On Trojan can cause harm as soon as the circuit is powered on. A Triggered Hardware Trojan contains two parts: the trigger circuit and the payload circuit, as shown in Figure 1. The trigger circuit starts running after power-on, but it does not show malicious behaviour. It only monitors some signals or a series of events in the circuit, and its output is connected to the payload circuit of the Trojan. The payload circuit is usually silent. Once triggered, it shows malicious behaviour. Compared with Always-On Trojans, Triggered Trojans are more dangerous because the adversary can control when the Trojan is activated. The process of chip design and manufacture can be divided into specification, design, fabrication, testing, and assembly. Hardware Trojans can be injected during design, fabrication, and assembly [3].

For these different Trojan injection stages, a variety of hardware Trojan detection methods have been proposed. Li et al. [4] divided detection methods into two categories. Detecting HTs inserted by EDA tools or brought in by IP cores is called pre-silicon detection, and finding HTs inserted during the manufacturing and assembly stages is called post-silicon detection. The objects of pre-silicon detection are RTL code, netlists, layouts, and so on. Pre-silicon detection can be further divided into (a) detection based on formal verification and functional simulation [5–7], (b) detection based on information flow [8], and (c) detection based on the analysis of Trojan characteristics [9, 10]. The objects of post-silicon detection are actual ICs manufactured by untrusted foundries. Widely used post-silicon detection technologies include (a) logic testing [11], (b) side channel analysis [12–17], (c) detection combining logic testing and side channel analysis [18, 19], and (d) chip reverse engineering [20].

Existing detection techniques have great limitations. Pre-silicon detection is often used to detect the Trojan with a specific structure or function because most pre-silicon detections require prior knowledge of the Trojan. For example, the detection based on the analysis of Trojan features requires modelling in advance. Formal verification and functional simulation detection are only effective for the Trojan with specific behaviour. Among post-silicon detection, logic test and other technologies which need to activate the Trojan for detection have great limitations. Because of the rarity of Trojan triggering events, it is difficult to activate the Trojan in physical detection. Trojan detection based on side channel analysis is a widely used post-silicon method [21]. The SCA method can detect the Trojan even when it is not triggered because the trigger circuit of THT keeps running as soon as the power is on. However, most of the previous SCA approaches rely on a golden chip which is usually hard to obtain. Procuring a golden chip may require destructive reverse engineering through decapsulation, delayering, and imaging of the chip [9].

The related work of golden chip-free detection can be divided into two categories. In the first category, a golden template is simulated for detection through the netlist file or layout file. He et al. proposed a novel strategy for HT detection using electromagnetic side channel-based spectrum modelling and analysing [22]. They utilize the design data at the early stage of the IC lifecycle, and the generated spectrum can serve as the golden reference. Rad et al. proposed a method which does not need a golden chip, but a Trojan-free layout is required to serve as the trusted model [23]. The second category is self-reference hardware Trojan detection based on the spatial or time similarity of circuit parameters. Du et al. proposed a self-reference method to compare the characteristics of transient current between two circuit blocks [24]. However, the method requires a set of golden chips to effectively eliminate process variation. Hoque et al. proposed a time self-reference TeSR method [25], in which the current signature of a chip at two different time windows is compared to isolate the Trojan effect. Zheng et al. proposed an IC integrity analysis SeMIA based on spatial self-similarity [26]. SeMIA compares the side-channel signature of one block with another self-similar block on the same chip. The key idea is that different self-similar blocks (i.e., parts of an adder, comparator, memory, and logical datapath) experience different stresses due to widely varying levels of activities, or exhibit asymmetric side-channel signatures due to HT attacks.

This paper proposes a golden chip-free hardware Trojan detection scheme based on the power side channel. Because self-similar structures are easily bypassed by the adversary and cannot cover the whole circuit, our scheme instead modifies the original design. We take advantage of the fact that the physical power consumption is proportional to the number of logic gate toggles. Under certain inputs, we construct two circuits with the same number of toggles for self-reference detection. It is theoretically guaranteed that the complexity for the adversary of bypassing our detection scheme is exponential. Through simulation experiments, we demonstrate the ability of the scheme to reduce the effect of process variation.

The rest of this paper is organized as follows. In Section 2, we introduce the toggle count power model, discuss our detection principle, and analyse the method to reduce process variation. In Section 3, we describe the detection scheme in detail. In Section 4, we conduct both simulated and physical experiments. Section 5 concludes this paper.

2. TC-Based HT Detection

The detection technology studied in this paper targets THTs injected during fabrication. The trigger part of a THT is always active, whether or not the Trojan is triggered. Therefore, it generates additional power consumption beyond that of the original circuit, and this is reflected on the power side channel. We can determine whether a hardware Trojan has been injected into the circuit by detecting the extra power consumption. To evaluate power consumption effectively, we first introduce the toggle count power model.

2.1. Toggle Count in the Digital Circuit

Digital circuits consume power whenever they perform computations. The total power consumption of a CMOS circuit is the sum of the power consumption of the logic cells making up the circuit. The power consumption of a logic gate is given by equation (1) [27], where α indicates the number of logic gate toggles per unit time. α is related to the data and operation of the circuit. The other parameters have nothing to do with the operation or data of the circuit. They are only affected by the electrical characteristics.

Equation (1) shows that if the electrical parameters are determined, the power consumption is directly related to the toggles of the logic gate. Mangard et al. mapped the toggles to simulate power consumption and successfully recovered the key from the AES encryption with mask protection [28]. This power model is called a toggle count (TC) model. The value of TC is calculated by equation (2), which sums the TC of every logic gate in the circuit over an interval that starts at the time point at which the input is applied and lasts for the delay of the last toggle in the combinational circuit [29]. The input that determines the TC is an input vector pair.

According to equation (2), TC changes with the input vector pair. The TC model takes only the TC into account, not the function of the circuit. Therefore, different circuits may have the same TC under specific input vector pairs, and the same circuit can produce the same TC under different input vector pairs.

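Definition 1. (pair circuits). For two circuits and two input vector pairs, if the TC of the first circuit under the first input vector pair equals the TC of the second circuit under the second input vector pair, then the two circuits (which can be the same circuit) under these input vector pairs are pair circuits.
According to equation (1), when the electrical parameters are determined, the power consumption of two circuits with the same TC is equal. So, pair circuits also consume equal power in the physical circuit.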
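The TC model and Definition 1 can be illustrated with a small sketch. It assumes a zero-delay model in which each gate toggles at most once per input vector pair (glitches are ignored), and the two toy circuits, the gate functions, and the names `evaluate` and `toggle_count` are hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A gate is (output_name, function, input_names); gates are in topological order.
Gate = Tuple[str, Callable[..., int], List[str]]

def evaluate(gates: List[Gate], inputs: Dict[str, int]) -> Dict[str, int]:
    """Compute every signal value of the circuit for one input vector."""
    values = dict(inputs)
    for out, fn, ins in gates:
        values[out] = fn(*(values[name] for name in ins))
    return values

def toggle_count(gates: List[Gate], v_prev: Dict[str, int], v_next: Dict[str, int]) -> int:
    """Count gates whose output changes between the two vectors of a pair."""
    before = evaluate(gates, v_prev)
    after = evaluate(gates, v_next)
    return sum(before[out] != after[out] for out, _, _ in gates)

AND = lambda a, b: a & b
XOR = lambda a, b: a ^ b
NOT = lambda a: 1 - a

# Two functionally different toy circuits (assumed for illustration).
circuit_a = [("n1", AND, ["a", "b"]), ("n2", XOR, ["n1", "c"])]
circuit_b = [("m1", NOT, ["a"]), ("m2", AND, ["m1", "b"])]

v_prev = {"a": 0, "b": 1, "c": 0}
v_next = {"a": 1, "b": 1, "c": 0}

tc_a = toggle_count(circuit_a, v_prev, v_next)
tc_b = toggle_count(circuit_b, v_prev, v_next)
are_pair_circuits = tc_a == tc_b  # equal TC means pair circuits under these inputs
```

Under this input vector pair both toy circuits toggle twice, so they form pair circuits even though their logic functions differ.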

2.2. Process Variation in TC-Based HT Detection
2.2.1. Detecting HT with TC Directly

In Section 2.1, we explained that pair circuits share the same power consumption. Therefore, given two input vector pairs, if the two circuits can be configured as pair circuits, we can achieve self-reference detection through them.

When pair circuits are activated by input vector pairs that meet Definition 1, they generate equal TC. In this situation, if a Trojan is injected into one circuit of the pair, the TC of the two circuits will differ because of the TC of the Trojan trigger circuit, unless the adversary can guess the input vector pairs used and make the Trojan generate no additional toggles. To obtain the correct input vector pairs, the adversary needs to exhaust the input space, whose size grows exponentially with the length of the input vectors. Based on this idea, the basic detection algorithm is given in Algorithm 1. It uses three operations:
RandomSelect. Choose two input vector pairs, one for each of the two circuits.
CircuitExpand. Add a redundant circuit to the original circuit with the smaller TC so that the two circuits become pair circuits at the design stage. After CircuitExpand, the two circuits generate the same power consumption. Therefore, comparing the power consumption of the two circuits is enough to identify whether a hardware Trojan has been injected into the pair circuits.
TapeOut. Stands for chip manufacturing. Before TapeOut, we add an auxiliary detection circuit. The circuit after TapeOut is the object of physical detection.

Input: circuit designs C1 and C2, input vector space V
Output: whether a Trojan exists in C1 or C2
(1) (v1, v2) = RandomSelect (V);
(2) compute the TC of C1 under v1 and the TC of C2 under v2;
(3) if TC(C1, v1) < TC(C2, v2) then
(4)   C1 = CircuitExpand (C1);
(5) else
(6)   C2 = CircuitExpand (C2);
(7) end
(8) TapeOut (C1, C2);
(9) if P(C1, v1) ≠ P(C2, v2) then
(10)   return TRUE;
(11) else
(12)   return FALSE;
2.2.2. Problems Caused by Process Variation

In the physical environment, the side channel information will be affected by process variation and measurement noise during the measurement process. It means that the physical power consumption is not completely equal to the simulation power obtained by the power model. In Algorithm 1, the presence of noise will bring great challenges to the detection. It is generally believed that when the amount of data is large enough, measurement noise can be reduced by statistical methods. However, as the offset is introduced by the circuit in the production process, process variation is a fixed value even in multiple measurements and cannot be eliminated statistically.

Process variation refers to the deviation of the threshold voltage and gate capacitance of a transistor [30] caused by differences in gate length, oxide thickness, and channel doping during the production of transistors [31]. Process variation is eventually reflected in side channel information such as power consumption and time delay. Because a Trojan is lightweight, the Trojan circuit accounts for only a very small proportion of the original circuit. So, if Algorithm 1 is used directly, the differences caused by the Trojan will be overwhelmed by noise, and the ability to detect the hardware Trojan cannot be guaranteed. In the following, we consider the power consumption deviation that process variation causes at each individual logic gate.

2.2.3. Reduce Process Variation

To treat process variation like measurement noise and reduce its influence statistically, the concept of pair circuits is extended to multiple groups of input vector pairs.

The new idea is as follows. We construct two sets of input vector pairs. Every logic gate that toggles under the first set also toggles under the second set, and the corresponding TC is equal. We sum the physical power consumption of the circuit under each set separately and compare the two totals.

First, because the total TC is the same, the overall power consumption is theoretically the same as well. The process variation of each logic gate may follow a certain distribution across gates, but once a logic gate has been produced, its variation is a fixed value: for the same gate, the deviation is the same no matter how many times the gate runs. Because each gate toggles the same number of times under both sets, the total power consumption deviation due to process variation is also the same for the two sets. Under this detection scheme, even when process variation is considered, the accumulated power consumptions are still equal. If a hardware Trojan is injected into the circuit, the TC generated by the Trojan trigger circuit will make the two sets of power consumption unequal.
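The cancellation argument can be checked numerically with a minimal sketch. It assumes a simple additive model in which every toggle of a gate costs a unit power plus that gate's fixed deviation; the model, the gate count, the 5% spread, and all names are assumptions made for illustration rather than values from the paper.

```python
import random

def measured_power(toggles_per_gate, deviation, unit_power=1.0):
    """Total power: each toggle costs unit_power plus the gate's fixed deviation."""
    return sum(n * (unit_power + deviation[g]) for g, n in toggles_per_gate.items())

rng = random.Random(0)
gates = [f"g{i}" for i in range(50)]
# Fixed per-gate deviation, drawn once at "fabrication" (hypothetical spread).
deviation = {g: rng.gauss(0.0, 0.05) for g in gates}

# Set A and set B toggle every gate the same number of times.
set_a = {g: rng.randint(0, 5) for g in gates}
set_b = dict(set_a)
# Set C has the same total TC as set A, but distributed over different gates.
set_c = dict(zip(gates, reversed(list(set_a.values()))))

gap_balanced = abs(measured_power(set_a, deviation) - measured_power(set_b, deviation))
gap_unbalanced = abs(measured_power(set_a, deviation) - measured_power(set_c, deviation))
```

In this model `gap_balanced` is zero because the same deviations are weighted by the same toggle counts, whereas `gap_unbalanced` generally is not, even though the total TC of the two sets is identical.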

2.2.4. Feasibility Improvement

When the number of logic gates in the circuit reaches the order of millions or billions, it is hard to ensure that the TC of each gate is the same. Every logic gate added to the computation increases its dimension by one. To reduce the analysis complexity, we divide the circuit into several small regions, each of which is a square of fixed side length, as shown in Figure 2. We call each square a grid and make the logic gates in each grid toggle the same number of times. The partition proposed in this article is a physical partition rather than a logical segmentation of the circuit and has nothing to do with the specific circuit function. Once the side length is determined, the circuit layout can be partitioned, and the time complexity is . The purpose of dividing the layout is to normalize the process variation of every gate in the same grid so that the requirement of the same TC for each gate is relaxed to the same TC for each grid. In this way, we reduce the complexity of the problem. Later, we discuss the rationality of such partitioning. Note that, when a circuit is segmented logically, the combinational circuit between registers is generally treated as one part. In our method, however, a combinational circuit may be divided into two or more grids (when the side length is small enough).

Wafer is the basic material used in the manufacture of silicon semiconductor-integrated circuits. The wafer can be oxidized and etched to produce various circuit element structures. After the etching and other steps, the wafer is divided into individual die and becomes an integrated circuit product with specific electrical functions. In these circuits, intradie variations exhibit spatial correlation [32]. There are perfect correlations among the devices in the same grid, high correlations among those in close grids, and low or zero correlations in far-away grids.

Hypothesis 1. For a given grid, any two logic gates in the grid have the same power consumption deviation caused by process variation.
Under Hypothesis 1, if the TC of two input vector pairs in one grid is the same, then, for this grid, the total process variation generated under the two input vector pairs is the same. For two given sets of input vector pairs, if the TC of every grid is the same, the process variation in the two power consumptions is equal.
Trojan detection under Hypothesis 1 can effectively reduce the complexity of the problem and make the detection scheme feasible because we reduce the dimension from one per gate to one per grid. However, the feasibility improvement also reduces the detection accuracy. When the side length reaches its minimum value (as shown in Figure 2(b)), each grid contains only one logic gate, and Hypothesis 1 holds trivially. In this case, the highest detection accuracy is obtained, and the effect of process variation is completely avoided. When each grid contains multiple logic gates, the gates in a grid differ from each other, but they are highly correlated. Therefore, power consumption analysis under Hypothesis 1 can effectively reduce the impact of process variation, though some residual effect remains. The larger the side length is, the greater the influence of process variation will be. At the same time, a larger side length makes it faster and easier to reach the condition that the TC of each grid is the same. The balance between detection accuracy and time overhead can be adjusted by the designer/detector.
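The physical partition can be sketched as a single pass over placed gates. The coordinate format, the function name `partition`, and the example placement are hypothetical; a real flow would read cell locations from the place-and-route output.

```python
from collections import defaultdict

def partition(gate_positions, side):
    """Assign each placed gate to the square grid cell that contains it."""
    grids = defaultdict(list)
    for gate, (x, y) in gate_positions.items():
        grids[(int(x // side), int(y // side))].append(gate)
    return dict(grids)

# Hypothetical placement: gate name -> (x, y) in arbitrary layout units.
placement = {"g0": (0.5, 0.5), "g1": (1.5, 0.2), "g2": (0.9, 0.1), "g3": (1.1, 1.7)}

coarse = partition(placement, side=2.0)  # one large grid holds every gate
fine = partition(placement, side=1.0)    # smaller grids, fewer gates per grid
```

Shrinking `side` moves the partition toward the one-gate-per-grid limit discussed above, at the cost of more grids whose TC must be balanced.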

2.3. Build KP-like TC
2.3.1. Pair TC Sets

For each selected input vector pair, the circuit generates a sequence of TC values, one per clock cycle, up to the maximum clock cycle of the circuit. The TC of the whole circuit under all selected input vector pairs can therefore be recorded as a matrix T, indexed by input vector pair and clock cycle. The physical power leakage corresponding to T is a matrix L of the same shape.

Each element of T is itself a vector with one entry per grid: the entry for a given grid is the TC of that grid at the given clock cycle under the given input vector pair. If the circuit runs for fewer than the maximum number of clock cycles under some input vector pair, the remaining elements of T are padded with zeros.

Definition 2. (pair TC sets). For a circuit divided into grids, consider two sets of elements of T. If, for every grid, the sum of the TC over the first set equals the sum of the TC over the second set, then the two sets are pair TC sets for the circuit. The two sets are disjoint and contain the same number of elements.
Because the per-grid sums are equal for every grid, the total TC under the two sets is the same. According to equation (1), activating the circuit with the two sets separately will generate the same dynamic power consumption. The static power consumption of the circuit has nothing to do with the data at runtime. Moreover, the two sets contain the same number of elements, which guarantees the same static power. From Sections 2.2.3 and 2.2.4, the same TC of each grid ensures a balance of process variation in the overall power consumption of the two sets. Therefore, if two sets are pair TC sets, the sums of the physical power consumption corresponding to them are equal.

2.3.2. Build Pair TC Sets

Definition 3. (multidimensional 0/1 knapsack problem). Given an $m \times n$ matrix $A$ and an $m$-dimensional column vector $b$, determine whether there is an $n$-dimensional binary vector $x$ making the equation $Ax = b$ true.

Proposition 1. Building pair TC sets can be reduced to solving the multidimensional 0/1 knapsack problem (MKP).

Proof. The construction of pair TC sets is divided into two stages: (1) selecting elements from the matrix T to form the first set and (2) for the given set, finding the paired set. In the second stage, the per-grid TC sums of the given set constitute a column vector with one entry per grid, corresponding to the target column vector in the MKP. The elements of T that remain after removing the given set form a new matrix, in which each column is the per-grid TC vector of one remaining element; it corresponds to the matrix in the MKP. The goal of building pair TC sets is to find a selection of columns of the new matrix whose sum equals the target vector. This selection corresponds to the binary vector solved for in the MKP. In the MKP, there is no stipulation on the number of 1s in the binary vector: any number of columns may sum to the target vector. However, when building pair TC sets, the two sets must have the same number of elements. We therefore add a constraint to the MKP: the number of 1s in the binary vector equals the number of elements in the given set. This constraint can be expressed as an additional dimension appended to the original matrix: the new row is the all-ones row vector, and the corresponding new entry of the target vector is the number of elements in the given set. The dimension of each column vector thus increases by 1. In this way, the construction of pair TC sets can still be reduced to the MKP.
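The reduction can be made concrete with a brute-force sketch that searches for a paired set of the same size as a given set. It enumerates all candidate subsets and is therefore exponential, which is exactly the cost the scheme later avoids by circuit expansion. The dictionary layout of T (keys are hypothetical (vector index, clock) tuples, values are per-grid TC lists) and the function names are assumptions for illustration.

```python
from itertools import combinations

def grid_sums(T, elements):
    """Per-grid TC sums over a collection of elements of T."""
    n_grids = len(next(iter(T.values())))
    return [sum(T[e][g] for e in elements) for g in range(n_grids)]

def find_paired_set(T, set_a):
    """Exhaustively search the remaining elements for an equal-size paired set."""
    target = grid_sums(T, set_a)
    rest = [e for e in T if e not in set_a]
    # The cardinality constraint is enforced by fixing the subset size.
    for candidate in combinations(rest, len(set_a)):
        if grid_sums(T, candidate) == target:
            return list(candidate)
    return None

# Toy TC matrix for a circuit with two grids (values are made up).
T = {
    (0, 0): [2, 1], (0, 1): [0, 3],
    (1, 0): [5, 5], (1, 1): [1, 3], (1, 2): [1, 1],
}
set_a = [(0, 0), (0, 1)]           # per-grid sums: [2, 4]
set_b = find_paired_set(T, set_a)  # disjoint set with the same per-grid sums
```

Fixing the subset size in `combinations` plays the role of the extra all-ones row added to the MKP matrix in the proof.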

3. Scheme of KP-Based HT Detection

3.1. Design of the Scheme

The detection scheme proposed in this paper associates the detector with the circuit designer and assists the detection by inserting circuits into the original design. The detection target in this article is the hardware Trojan injected in fabrication, and the Trojan is not always on but needs to be triggered.

3.1.1. Flow of the Scheme

The entire detection process is shown in Figure 3. Our scheme receives the RTL design or netlist design of the circuit and finally determines whether there is a hardware Trojan in the physical chip. The plan can be divided into three phases.

The first phase is the preprocessing stage, which generates the data structures required for the subsequent steps. Algorithm 2 first selects a set of input vectors from the overall input space for testing. The selected input vectors need to cover the normal circuit functions according to the original circuit structure. A set of input vector pairs is then generated from the selected input vectors and used to simulate the TC, which yields the TC matrix T. TC can be simulated at many different levels [33], and we use the lowest-level simulation to match the actual running state.

Input: circuit design C, input vector space S
Output: matrix T and input vector pairs V
(1) S' = RandomSelect (S);
(2) generate V from S';
(3) T = Simulate (C, V);

The second phase is the circuit design modification stage. The target of this stage is the circuit after place and route. The layout is divided into grids of a fixed side length. From the matrix T generated in the first phase, elements are randomly selected and put into two sets. For each grid, Algorithm 3 calculates the sum of the TC of each set. If the two sums are equal in every grid, Algorithm 3 outputs the design layout, the matrix, and the two sets. Otherwise, Algorithm 3 performs circuit design modification. Finally, the modified PR-level design is output. The circuit modification strategy is described in detail in Section 3.1.2.

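Input: original circuit C and matrix T
Output: modified circuit C' and sets A, B
(1) G = Partition (C);
(2) Select n elements from T and put them into A and B, respectively;
(3) for i from 1 to |G|
(4)   if the TC sum of A in grid i ≠ the TC sum of B in grid i then
(5)     Expansion (grid i);
(6)   end
(7)   record the updated TC of grid i;
(8) end
(9) output C', A, B;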

The third phase is the Trojan detection stage. At the end of the second phase, we consider the design of the circuit to be reliable and Trojan-free. As a potential hardware attacker, the chip manufacturer can inject a hardware Trojan during the tape-out stage. The target of the detection phase is the manufactured circuit. We run the circuit with the selected input vector pairs and collect the power leakage matrix L. The elements of L correspond one-to-one to the elements of T. The elements of L corresponding to the elements of each of the two sets are accumulated, and we compare the two accumulated values. If the two are different, a Trojan was injected during the manufacturing process. Otherwise, the circuit is clean and secure.
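The comparison in the detection phase can be sketched as follows. The text compares the two accumulated values directly; the explicit `tolerance` parameter is an assumption added here to absorb residual measurement noise, and the leakage values and names are hypothetical.

```python
def detect_trojan(leakage, set_a, set_b, tolerance):
    """Flag a Trojan when the accumulated leakage of the two sets differs."""
    power_a = sum(leakage[e] for e in set_a)
    power_b = sum(leakage[e] for e in set_b)
    return abs(power_a - power_b) > tolerance

set_a = [(0, 0), (0, 1)]
set_b = [(1, 0), (1, 1)]

# Hypothetical measured leakage per element of the TC matrix.
clean = {(0, 0): 3.02, (0, 1): 2.97, (1, 0): 4.01, (1, 1): 1.99}
# Same chip with extra trigger-circuit activity on one element.
infected = dict(clean)
infected[(1, 0)] += 0.30

clean_flag = detect_trojan(clean, set_a, set_b, tolerance=0.05)
infected_flag = detect_trojan(infected, set_a, set_b, tolerance=0.05)
```

In this toy data the clean chip stays within the tolerance while the extra activity pushes the infected chip beyond it; how to choose the tolerance in practice depends on the measurement setup.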

3.1.2. Circuit Expansion

Section 2.3.2 proves that the core of building pair TC sets is solving an MKP. The knapsack problem is NP-complete. Horowitz and Sahni proposed a two-table algorithm using the divide-and-conquer method, with time complexity $O(2^{n/2})$ for $n$ items [34]. With the introduction of various artificial intelligence algorithms, many optimized knapsack solutions have been proposed, but they still have essentially exponential time complexity.
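To show why even the improved algorithms remain exponential, here is a minimal meet-in-the-middle sketch in the spirit of the two-table idea, written for the one-dimensional subset-sum case rather than the full MKP. It is an illustration under that simplification, not the implementation from [34]; the function names and the sample weights are made up.

```python
from bisect import bisect_left

def subset_sums(items):
    """All (sum, chosen_indices) pairs over the subsets of (index, weight) items."""
    sums = [(0, ())]
    for idx, weight in items:
        sums += [(s + weight, chosen + (idx,)) for s, chosen in sums]
    return sums

def meet_in_the_middle(weights, target):
    """Return indices of a subset summing to target, or None if there is none."""
    indexed = list(enumerate(weights))
    half = len(indexed) // 2
    left = subset_sums(indexed[:half])            # about 2^(n/2) entries
    right = sorted(subset_sums(indexed[half:]))   # about 2^(n/2) entries, sorted
    right_keys = [s for s, _ in right]
    for s, chosen in left:
        pos = bisect_left(right_keys, target - s)  # binary search in the other table
        if pos < len(right_keys) and right_keys[pos] == target - s:
            return sorted(chosen + right[pos][1])
    return None

weights = [3, 34, 4, 12, 5, 2]
solution = meet_in_the_middle(weights, 9)
no_solution = meet_in_the_middle(weights, 1000)
```

Each half still enumerates every subset of its items, so the table sizes double with every two items added, which is why the scheme replaces the search with circuit expansion.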

To give the detection scheme a practical time complexity, this paper combines the detection with the chip design. We make some adjustments to the original design to make it suitable for our detection scheme. As mentioned in Section 2.2.1, we insert a redundant circuit into the original circuit. Once the first set is selected, the paired set is not solved for directly from the conditions. Instead, we randomly select elements to form the second set. For each grid, we calculate the distance between the TC sums of the two sets. By inserting redundant toggles into the original design, the TC of the two sets in each grid is made equal. The complexity of solving the knapsack problem is thus transformed into the complexity of constructing redundant circuits.
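The per-grid balancing step can be sketched as a small planning function. It assumes that one expansion contributes two additional TC on average, as stated for the extension methods in Section 3.1.3; the function name, the set labels "A" and "B", and the sample sums are hypothetical.

```python
import math

def expansion_plan(sums_a, sums_b, tc_per_expansion=2):
    """List, per unbalanced grid, which set needs extra toggles and how many expansions."""
    plan = []
    for grid, (a, b) in enumerate(zip(sums_a, sums_b)):
        deficit = abs(a - b)
        if deficit:
            side = "A" if a < b else "B"  # the set with the smaller TC gets the redundancy
            plan.append((grid, side, deficit, math.ceil(deficit / tc_per_expansion)))
    return plan

# Hypothetical per-grid TC sums of two randomly chosen sets.
sums_a = [10, 7, 5]
sums_b = [10, 4, 8]
plan = expansion_plan(sums_a, sums_b)
```

The plan is linear in the number of grids, in contrast to the exponential search needed to find an exactly matching set.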

In our scheme, the main function of the circuit expansion is to reduce the time overhead of the designer when building the pair TC sets. The AND gate is the most basic logic element and exists in all CMOS circuits. So, we extend redundant circuits based on an AND gate and its two fan-in gates, as shown in Figure 4. We designed five extension methods based on the different types of gate 1 and gate 2 (AND or NOT) to generate additional TC without changing the original logic function of the circuit. The extended circuit logic is shown in Table 1.

3.1.3. Overhead

The five methods mentioned in Section 3.1.2 all add two redundant logic gates, which can generate two additional TC on average. That is, the number of redundant logic gates generated by this solution is .

In terms of the time complexity of Algorithm 3, as we mentioned in Section 2.2.4, Partition is an complexity operation. Therefore, the time complexity of Algorithm 3 is related to the number of executions of Expansion. On average, every time the TC differs by two, we need to insert a redundant logic. The execution frequency of the algorithm is . Therefore, the time complexity of Algorithm 3 is .

3.2. Security of KP-Based HT Detection

The attacker model is as follows. The adversary can obtain the layout design of the circuit and the format of the input and output. To bypass this detection scheme, the adversary can (1) determine which grids are used in the detection by solving for the two sets or (2) construct a special Trojan that presents the same TC for any test vector.

For the first method, the adversary cannot obtain the matrix T because it is constructed from the selected input vectors, which are a subset of the entire input space. Taking the AES128 encryption algorithm as an example, the size of its input space is at least $2^{128}$, so it is difficult for the adversary to guess the selected input vectors. Even in the worst case, in which the adversary can determine T, Section 2.3 explains that solving for the two sets from T has exponential complexity. Therefore, the adversary cannot bypass our detection through the first method. With the second method, the attacker does not need to solve for the two sets. However, the Trojan trigger circuit cannot achieve the same TC state for all test vectors. Because the probability of triggering the Trojan is small, the Trojan trigger circuit must have a large fan-in cone [35]. When part of the state meets the trigger condition, the fan-in cone will produce different responses. If the same TC occurred for every input, the trigger circuit would lose its function. Therefore, the adversary cannot bypass our detection through the second method.

3.3. Detection Capability of the Scheme

The existence of process variation means that the measured power consumptions of the two sets are not completely equal. Consider the difference between the physical power consumptions of the two sets without a Trojan and the power consumption of the Trojan trigger circuit. If the power consumption of the Trojan trigger circuit exceeds this Trojan-free difference (equation (5)), the scheme can detect the injected Trojan.

Because the trigger circuit generates different TC under different inputs, we use the average TC to measure the size of the Trojan relative to the original circuit, and consider the average TC ratio between the Trojan trigger circuit and the original circuit. Because TC is proportional to the nominal power consumption of the circuit, the power consumption of the trigger circuit is this ratio multiplied by the nominal power consumption of the original circuit. Substituting into equation (5) gives a lower bound on the ratio: the Trojan-free power difference divided by the nominal power consumption.

That is, for a hardware Trojan whose toggle scale is greater than this bound, a 100% successful detection rate can be achieved. The scheme proposed in this paper can effectively deal with process variation. In physical detection, the Trojan-free power difference will be less than the theoretical value, so the scheme can detect a smaller-scale Trojan.
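The detection condition can be written out as a short derivation. The symbols are illustrative assumptions rather than the paper's notation: $\Delta P$ is the Trojan-free power difference between the two sets, $P_T$ the power consumption of the Trojan trigger circuit, $P_0$ the nominal power consumption of the original circuit, and $r$ the average TC ratio between the trigger circuit and the original circuit.

```latex
\begin{align}
P_T &> \Delta P
  && \text{(detection condition)} \\
P_T &= r \, P_0
  && \text{(TC proportional to nominal power)} \\
\Rightarrow \quad r &> \frac{\Delta P}{P_0}
  && \text{(minimum detectable toggle scale)}
\end{align}
```

Under these assumptions, a Trojan-free deviation of, say, 2% of the nominal power (a hypothetical figure) would mean that trigger circuits whose average TC exceeds 2% of that of the original circuit satisfy the condition.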

4. Experiment

We conducted the experiments on Xilinx Virtex5 XC5VLX30 FPGA. The device granularity on FPGA can only reach the standard FPGA unit, that is, LUT. Therefore, in the experiment, we convert all gate-level signals into LUT output signals.

4.1. Circuit under Test

For the detection scheme proposed in this paper, we carried out experiments with both simulated data and physical data. The simulation experiment is based on an FPGA implementation of a 3-share masked AES S-box [36], which we call TIS16. The physical experiment is based on AES-ECB encryption [37]. TIS16 uses a new addition chain to accelerate and lighten the S-box, as shown in Figure 5, where S stands for the square operation and M stands for multiplication. The hardware implementation of TIS16 is shown in Figure 6.

The shamul module is used to perform the inversion defined by the addition chain and iterate the square operation according to the affine transformation in GF($2^8$). The shamac module performs shared multiplication on constants and then performs shared addition.

This paper implemented TIS16’s 3-share design on Xilinx Virtex5 XC5VLX30 FPGA. A round of encryption requires 21 clocks. The entire circuit occupies 530 registers and 883 LUTs.

For the physical measurement experiment, we implemented the AES-ECB encryption algorithm from DPA contest v2 [37] on the SASEBO-GII development board. The board is equipped with a Xilinx Spartan6 XC6SLX30 FPGA. The S-box of AES is implemented by a look-up table method. Each round of encryption is completed within one clock, and each encryption takes a total of 11 clocks. This circuit uses a total of 881 registers and 2296 LUTs.

4.2. Reducing Process Variation
4.2.1. Simulation Experiment

The TC is proportional to the physical power consumption of the circuit. Based on the TC, we simulated the effect of process variation on each signal and obtained the simulated power consumption affected by process variation. By measuring the deviation of the power consumption under different grid sizes, we verified the ability of our scheme to resist process variation. The circuit simulation is based on Xilinx’s ISim. The specific process of the experiment is as follows:
(1) Circuit Simulation. Post-place-and-route simulation of the test circuit generates a VCD file, from which we obtain the TC of the test circuit under a given input. In this experiment, we focused on the part that actually implements the encryption function: we only simulated the masking part and ignored the operation of the control circuit.
(2) Circuit Partition. This experiment is based on Xilinx’s FPGA, so the circuit can naturally be partitioned by SLICE. The logic elements belonging to the same SLICE are assigned to the same grid.
(3) VCD File Analysis. Calculate the TC of each SLICE (grid) at each clock, and generate the matrix using Algorithm 2.
(4) Build the Pair of TC Sets. We use the Euclidean distance as the indicator of the difference between the TC of the two sets in each grid. We build the two sets so that this Euclidean distance is minimized, which keeps the TC difference in each grid as small as possible and therefore reduces the number of redundant gates that must be inserted.
(5) Insert Redundancy. Unlike an ASIC, an FPGA fixes the total number of hardware resources available to the designer. In the experiment, we choose the speed-first strategy during place and route, which leaves part of the resources in each SLICE free for adding redundancy. On the FPGA, we convert the expansion scheme proposed in Section 4.1.3 into the corresponding LUT truth table. For grids with different TC, we construct a LUT that depends on the input. For grids with the same TC, we construct a redundant LUT that is independent of the input.
(6) Power Simulation. The simulated power consumption includes the nominal power caused by toggles and the noise power caused by process variation. Research by Chang and Sapatnekar [32] shows that, under the influence of process variation, the leakage power consumption of logic gates approximately follows a log-normal distribution. We simulate the power consumption of each flip as , where is a Gaussian random variable. According to the spatial correlation of process variation parameters, for TC in the same grid, we generate simulated power with the same expectation and variance . The average power consumption of each grid is denoted as a random variable . We assume . The size of affects the proportion of process variation in the total power consumption. In our experiment, we set it to 5%. affects the relevance of TC in the same grid and is related to . We set of each region to the same value and test the ability of this solution to resist process variation under different .
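The idea behind step (6) can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the paper: every grid receives a single log-normal factor (the exponential of a Gaussian draw), and every toggle in that grid is weighted by it. The names `grid_factors` and `simulated_power`, the spread `sigma=0.05`, and the toggle counts are all hypothetical.

```python
import math
import random

def grid_factors(n_grids, sigma, seed=0):
    """Draw one log-normal variation factor per grid (assumed model)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(0.0, sigma)) for _ in range(n_grids)]

def simulated_power(tc_per_grid, factors):
    """Sum the toggle counts, each weighted by the factor of its grid."""
    return sum(tc * f for tc, f in zip(tc_per_grid, factors))

# Hypothetical toggle counts for four grids on the same chip.
factors = grid_factors(n_grids=4, sigma=0.05, seed=1)
set_a = [100, 200, 300, 400]
balanced_b = [100, 200, 300, 400]    # equal TC in every grid
unbalanced_b = [400, 300, 200, 100]  # equal total TC only

dev_balanced = abs(simulated_power(set_a, factors)
                   - simulated_power(balanced_b, factors))
dev_unbalanced = abs(simulated_power(set_a, factors)
                     - simulated_power(unbalanced_b, factors))
print(dev_balanced, dev_unbalanced)
```

Under this model, two sets whose TC agree grid by grid see exactly the same variation, whereas two sets that only agree in total TC generally do not, which is the motivation for balancing the sets per grid.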

When partitioned by SLICE, the circuit is divided into 206 grids. After the 4th step, the circuit is activated with the input vectors of the two sets, respectively. The resulting TC are 88291 and 83939. After the redundancy is inserted, both TC equal 88310, and the final simulated power values are 70131.27 and 71335.53. The deviation between the two sets is about 1.72%. For comparison, we ignore the partition and directly choose inputs so that the two sets have the same total toggle count, which is equivalent to placing the entire circuit in a single grid. The resulting simulated power values are 75488.57 and 78639.83, respectively, and the deviation between the two is about 4.17%. The values of in the two experiments are set to 0.01 and 0.001. These experiments show that the proposed detection scheme has a measurable resistance to process variation, which improves the detection accuracy of Trojans under the same experimental environment.
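The two deviation figures can be checked with a short calculation. The sketch below assumes the deviation is taken relative to the smaller of the two power values, which is the convention that reproduces the reported percentages; the function name `relative_deviation_pct` is introduced here for illustration only.

```python
def relative_deviation_pct(power_a, power_b):
    """Deviation between two power values, relative to the smaller one."""
    return abs(power_a - power_b) / min(power_a, power_b) * 100.0

# Simulated power values reported in the text.
partitioned = relative_deviation_pct(70131.27, 71335.53)
unpartitioned = relative_deviation_pct(75488.57, 78639.83)

print(f"partitioned:   {partitioned:.2f}%")    # 1.72%
print(f"unpartitioned: {unpartitioned:.2f}%")  # 4.17%
```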

4.2.2. Physical Experiment

The first five steps of the physical experiment are the same as in the simulation experiment. In Step 6, we use the measured physical power consumption instead of simulated data. In the physical experiment, we tested the effect of different grid sizes. Specifically, the smallest partition unit we use is a single SLICE, and each SLICE contains 4 LUTs. We also tested the performance of larger partitions.

We describe the circuit in a Xilinx XDL file and instantiate the GII AES design twice. The two instantiations are placed and routed in the same way, differing only by an offset in placement. The quantity of interest is the distance between SLICEs with the same function, as mentioned in Section 2.2.4; here, this distance refers to the ordinate difference of the SLICEs in the XDLRC file. As shown in Figure 7, the distance between SLICE1 and SLICE2 equals 1, and the distance between SLICE1 and SLICE4 equals 3. In this way, measuring the power of the two different instantiations is equivalent to constructing the two sets under a partition of the corresponding size.

We choose one AES instance as the reference circuit and collect 100,000 power traces. The traces are statically aligned based on correlation and denoised with Gaussian filtering. The preprocessed power traces of the ten rounds of AES encryption are shown in Figure 8. For each SLICE distance, we perform the same power collection and preprocessing on the other circuits and calculate the power difference between each of them (as one set) and the reference circuit (as the other set). Figure 9 shows the difference in power consumption between the two sets under different SLICE distances.
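The preprocessing described above can be sketched as follows. This is only an illustration under stated assumptions: alignment is done by searching a bounded range of integer shifts for the highest Pearson correlation with the reference, and denoising uses a truncated Gaussian kernel. The function names, the `max_shift` bound, and the kernel radius of three standard deviations are choices made for this example and are not taken from the paper.

```python
import math

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def best_shift(reference, trace, max_shift):
    """Shift s such that trace[i + s] best matches reference[i]."""
    best, best_corr = 0, -2.0
    for s in range(-max_shift, max_shift + 1):
        lo = max(0, -s)
        hi = min(len(reference), len(trace) - s)
        corr = correlation(reference[lo:hi], trace[lo + s:hi + s])
        if corr > best_corr:
            best, best_corr = s, corr
    return best

def align(reference, trace, max_shift):
    """Shift the trace onto the reference, padding missing samples with 0.0."""
    s = best_shift(reference, trace, max_shift)
    return [trace[i + s] if 0 <= i + s < len(trace) else 0.0
            for i in range(len(reference))]

def gaussian_filter(trace, sigma):
    """Smooth a trace with a truncated, edge-renormalised Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(trace)):
        acc = weight_sum = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= i + k < len(trace):
                acc += w * trace[i + k]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed

# Toy traces: the second is the first delayed by two samples.
reference = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]
aligned = align(reference, delayed, max_shift=3)
denoised = gaussian_filter(aligned, sigma=1.0)
```

On real traces the shift search would normally be restricted to a window around a distinctive feature, since correlating full traces over many shifts is expensive.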

From Figure 9(f), it can be seen intuitively that the larger the SLICE distance, the more the difference in power consumption between the two sets fluctuates. We choose the Euclidean distance as the evaluation index and calculate the distance between the power trace of the reference circuit and the power traces of the circuits under different SLICE distances. For the five tested settings, the Euclidean distances are 5.8119, 30.7587, 44.7261, 48.8229, and 53.6777, respectively.
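The evaluation index is the ordinary Euclidean distance between two preprocessed traces of equal length. A minimal sketch, with the function name and the toy traces chosen for this example:

```python
import math

def euclidean_distance(trace_a, trace_b):
    """Euclidean distance between two power traces of equal length."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same number of samples")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(trace_a, trace_b)))

# Toy traces; real traces contain one sample per measurement point.
reference_trace = [0.0, 0.0, 1.0]
other_trace = [3.0, 4.0, 1.0]
distance = euclidean_distance(reference_trace, other_trace)
print(distance)  # 5.0
```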

Experimental results show that when the circuit is partitioned into grids and the power consumption is compared grid by grid, smaller grids lead to smaller process variation in the power consumption and thus to a more accurate comparison. The scheme proposed in this paper can therefore effectively reduce the influence of process variation on the detection accuracy of Trojans.

4.3. Detecting Hardware Trojans
4.3.1. Simulation Experiment

To demonstrate the detection capability of KP-based HT detection, we injected a hardware Trojan into TIS16 and conducted a simulation experiment. The HT sample is AES-T800 from Trust-Hub [38]. After place and route, TIS16 uses a total of 305 registers and 745 LUTs, and the Trojan circuit occupies an additional 4 registers and 8 LUTs. Our experiment considers only the Trojan trigger circuit and ignores the payload, because an untriggered payload does not produce any toggle that affects the detection method. The trigger circuit of AES-T800 is a finite-state machine: when four consecutive input vectors match four given values, the Trojan is triggered. We inject the Trojan trigger circuit into the circuit used in Section 4.2. To ensure that the original design is not modified, we inject the Trojan directly into the XDL file, which contains the placement and routing information.
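The behaviour of such a sequence trigger can be sketched as a small state machine. This is a behavioural illustration only, not the AES-T800 netlist: the class name `SequenceTrigger`, the simplified fallback on a mismatch, and the four pattern values are hypothetical.

```python
class SequenceTrigger:
    """Fires once a given sequence of input vectors is seen consecutively."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.state = 0          # number of pattern entries matched so far
        self.triggered = False  # stays True once the sequence is seen

    def step(self, vector):
        """Consume one input vector and return the trigger output."""
        if self.triggered:
            return True
        if vector == self.pattern[self.state]:
            self.state += 1
        elif vector == self.pattern[0]:
            self.state = 1      # simplified restart on the first value
        else:
            self.state = 0
        if self.state == len(self.pattern):
            self.triggered = True
        return self.triggered

# Hypothetical four-value pattern; the real trigger compares 128-bit inputs.
trigger = SequenceTrigger([0xA, 0xB, 0xC, 0xD])
outputs = [trigger.step(v) for v in [0xA, 0xB, 0x0, 0xA, 0xB, 0xC, 0xD]]
print(outputs)
```

Although the payload stays silent until the last value arrives, the state register of the trigger toggles while partial matches occur, and that toggle activity is what the detection method observes.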

For the circuit with the HT, we apply the same input as in Section 4.2. The simulated power difference between the two sets is 4,297.01, which accounts for 6.02% of the original circuit power consumption; the difference caused by the Trojan trigger circuit accounts for 4.33%. This value exceeds the 1.72% caused by process variation in Section 4.2, which indicates that our scheme can detect the hardware Trojan.

Note that the numbers of active gates in the Trojan trigger circuit and in the original circuit vary considerably across test benches, so the choice of test bench can change the detection result. In the experiment, we tried different test benches and found that the minimum power consumption difference caused by the Trojan is only 565.56 (0.79%), while the maximum reaches 3297.69 (4.62%). With the worst input vectors, the Trojan power consumption may be masked by process variation. This reflects the importance of the other contribution of this article, namely reducing process variation. Without the algorithm proposed in this paper, process variation causes a 4.17% power deviation in our experiment, which makes it impossible to detect the AES-T800 Trojan under most test benches. When the circuit is partitioned, the process variation is reduced to 1.72%, which increases the detection success rate.
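The comparison between Trojan-induced power and process variation can be made explicit with a small calculation. The sketch assumes that the percentages are taken relative to a simulated power of 71335.53, an inference that reproduces the reported 6.02%, 0.79%, and 4.62%, and it uses a deliberately simple decision rule (the Trojan share must exceed the variation level). Both the baseline and the rule are assumptions of this example.

```python
BASELINE_POWER = 71335.53  # assumed 100% reference (inferred, not stated)

def share_pct(power_difference, baseline=BASELINE_POWER):
    """Express a power difference as a percentage of the baseline."""
    return power_difference / baseline * 100.0

def exceeds_variation(power_difference, variation_pct):
    """Simplified rule: detectable if the share exceeds the variation level."""
    return share_pct(power_difference) > variation_pct

worst_bench, best_bench = 565.56, 3297.69
results = {}
for label, variation in (("partitioned", 1.72), ("unpartitioned", 4.17)):
    results[label] = (exceeds_variation(worst_bench, variation),
                      exceeds_variation(best_bench, variation))
    print(label, results[label])
```

Under this rule the best test bench clears both variation levels, while the worst one clears neither, which is why both test-bench selection and variation reduction matter.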

4.3.2. Physical Experiment

We extracted the Trojan trigger circuits from Trust-Hub [38]. The Trust-Hub Trojan library contains only three Trojan trigger modes for AES: the finite-state machine trigger used by AES-T800 in the simulation experiment, counter triggers, and specific-plaintext triggers. However, the solution in this article considers only the toggle power consumption of the trigger circuit, so the specific trigger structure of the Trojan is of little importance. We also assume that the trigger structure does not change the original function. Therefore, we use redundant circuits with different toggle counts as equivalent stand-ins for the Trojan trigger circuits. In the physical experiment, we insert the “Trojan” circuit shown in Figure 10. In the ten rounds of AES encryption, the encryption core computes a 128-bit signal at the end of each round and feeds it to the next round. We tap two bits of this signal and perform an AND operation on them; the result is sent to a redundant register. In this way, we inject an extra toggle into each round of encryption in the original circuit. The specific implementation is shown in Figure 11.
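The stand-in circuit can be modelled behaviourally to count the toggles it adds. The sketch below is an assumption-laden illustration: the round states, the tapped bit positions, and the function name are hypothetical, and the register is assumed to start at 0.

```python
def redundant_register_toggles(round_states, bit_i, bit_j):
    """Count toggles of a register that latches state[bit_i] AND state[bit_j]."""
    register = 0  # assumed reset value
    toggles = 0
    for state in round_states:
        next_value = ((state >> bit_i) & 1) & ((state >> bit_j) & 1)
        if next_value != register:
            toggles += 1
        register = next_value
    return toggles

# Hypothetical round outputs (only the two low bits matter here).
states = [0b11, 0b01, 0b11, 0b11]
extra_toggles = redundant_register_toggles(states, bit_i=0, bit_j=1)
print(extra_toggles)  # 3
```

In this model the register only toggles when the AND result changes, so the number of extra toggles depends on the data; the count is what enters the TC of the grid that hosts the stand-in.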

The injected circuit comprises 20 registers and 40 LUTs, accounting for 2.27% and 1.74% of the original design, respectively. We calculate the absolute difference between the power consumption of the circuit with the inserted hardware Trojan and that of the reference circuit, and compare the result with the circuits under different SLICE distances. As shown in Figures 12(a) and 12(b), when the Trojan trigger circuit contains registers, the extra power consumption is relatively large. Even under the partition length of , the power of the Trojan is still far greater than the power caused by process variation, so a Trojan trigger circuit of this size can be effectively detected. On this basis, we reduced the size of the Trojan trigger circuit by removing the registers in Figure 10, turning it into a purely combinational circuit. As shown in Figures 12(c) and 12(d), at a Trojan scale of 40 LUTs, the power consumption of the Trojan is mixed with process variation when . In this case, the Trojan cannot be detected because of the influence of process variation. However, when the partition size reaches , the Trojan can still be clearly distinguished. Figure 12(e) shows the difference in power consumption between the Trojan trigger circuit with registers and the one without registers under the same toggles.
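The overhead percentages follow directly from the resource counts given for the original design in Section 4.1 (881 registers and 2296 LUTs). A short check, with the helper name chosen for this example:

```python
def share_pct(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100.0

register_share = share_pct(20, 881)   # injected registers vs. original
lut_share = share_pct(40, 2296)       # injected LUTs vs. original

print(f"registers: {register_share:.2f}%")  # 2.27%
print(f"LUTs:      {lut_share:.2f}%")       # 1.74%
```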

Experiments show that the proposed scheme can effectively detect a combinational-logic Trojan trigger circuit that accounts for about 1% of the original circuit scale.

4.4. Scheme Efficiency and Overhead

For time and space overhead, we found that the processing time grows linearly with the data size and can therefore be estimated from a small data set. At present, with a simulated clock period of 100 ns, the post-route simulation using ISim takes 16 minutes for 1000 encryptions (26 clock cycles each, including key expansion), and the generated VCD file reaches 4018405 kB (about 3.8 GiB). For this simulation data (26 × 10^8 time points), analyzing the VCD file takes about 6 minutes.

In this experiment, we use dynamic programming to solve the multidimensional knapsack problem. When the knapsack dimension reaches 6 (that is, the total number of partitions is 6), the state matrix of the dynamic program exceeds 16 GB for 1000 traces (10 clock cycles per trace). Without state recording, the solution time exceeds 24 hours once the dimension reaches 10.
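To make the state growth concrete, the following sketch searches for two disjoint sets of clock moments whose per-grid toggle counts are equal. It is not the algorithm used in the experiments: it is a simple dynamic program over difference vectors, written only to show that each additional grid adds a dimension to the state key. The function name and the toy TC vectors are hypothetical.

```python
def find_balanced_sets(tc_vectors):
    """Find two disjoint, non-empty sets of moments with equal per-grid TC.

    tc_vectors[t] is a tuple holding the toggle count of every grid at
    moment t. Returns (set_a, set_b, n_states); the sets are None when no
    balanced pair exists.
    """
    zero = (0,) * len(tc_vectors[0])
    # Maps a difference vector (sum over A minus sum over B) to one
    # assignment of moments that reaches it.
    states = {zero: ((), ())}
    for t, vec in enumerate(tc_vectors):
        new_states = dict(states)
        for diff, (a, b) in states.items():
            plus = tuple(d + v for d, v in zip(diff, vec))
            minus = tuple(d - v for d, v in zip(diff, vec))
            for key, sets in ((plus, (a + (t,), b)), (minus, (a, b + (t,)))):
                if key == zero and sets[0] and sets[1]:
                    return list(sets[0]), list(sets[1]), len(new_states)
                new_states.setdefault(key, sets)
        states = new_states
    return None, None, len(states)

# Toy example with two grids and four clock moments.
vectors = [(3, 1), (1, 2), (2, 3), (4, 3)]
set_a, set_b, n_states = find_balanced_sets(vectors)
print(set_a, set_b, n_states)
```

The number of distinct difference vectors can grow with the product of the value ranges of all grids, which is consistent with the memory and time figures reported above for higher dimensions.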

Among previous work on hardware Trojan detection, TeSR [25] and SeMIA [26] are the closest to this paper. TeSR leverages the uncorrelated temporal variations in the transient current signature of sequential hardware Trojans to isolate their effect from process and measurement noise. Because TeSR compares the current signatures of a chip for the same input pattern at different time windows, it can only detect sequential hardware Trojans, whereas the solution proposed in this article can detect both combinational and sequential Trojans. The Trojan example used in TeSR contains 8 registers, while our scheme can detect Trojans without registers. SeMIA uses the inherent structural self-similarity in the design to detect hardware Trojans. The experiments in SeMIA show that it can detect hardware Trojans that account for 2.3% of the original circuit. The solution proposed in this paper can still effectively distinguish the Trojan from process variation when the Trojan accounts for 1.74% of the area.

5. Conclusions

This paper proposes a detection scheme for post-silicon hardware Trojans. Our method combines design and detection. We insert several redundant circuits during the design so that the power consumption selected at specific times can be superimposed to form two self-referenced power consumption sets. Modifying the design in this way removes the requirement for a specific circuit structure that existing self-reference detection schemes impose. For the modified circuit, we can generate two sets of circuit running moments whose total toggle counts are equal. The physical power consumption corresponding to the two sets is therefore also equal, so each set can serve as the golden template of the other. The adversary has no knowledge of the redundancy addition process. To find the equal toggle counts, the adversary has to solve a knapsack problem. Because the knapsack problem is NP-complete, an adversary who obtains the original design still cannot efficiently determine which power samples belong to the self-reference sets. This underpins the security of the proposed detection method. Based on the spatial correlation of process variation, this article partitions the circuit and extends the knapsack into a multidimensional knapsack. We balance the variation in each grid, which minimizes the deviation caused by process variation in the overall power consumption. The resistance to process variation is thus achieved by dividing the circuit into small grids, and it is verified in experiments.

5.1. Future Work

In the future, we will study better test-bench generation techniques to improve the success rate of our detection scheme. When the circuit scale grows by orders of magnitude, our method becomes more expensive to apply. We will therefore look for an appropriate algorithm that keeps the difference between the two sets as small as possible when selecting their elements. In addition, we will explore the possibility of locating hardware Trojans through grid division and quantify the relationship between the average TC ratio and process variation.

Data Availability

The simulation power data and physical traces used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by The Laboratory Open Fund of Beijing Smart-Chip Microelectronics Technology Co., Ltd., and was supported in part by the National Natural Science Foundation of China under Grant 61972295, Wuhan Science and Technology Project Application Foundation Frontier Special Project (2019010701011407), and Foundation Project of Wuhan Maritime Communication Research Institute (2018J-11).