Abstract
SM4 is a Chinese commercial block cipher standard used for wireless communication in China. In this paper, we use a partial linear approximation table of the S-box to search for 3-round iterative linear approximations of SM4, from which we construct a linear approximation of 20-round SM4; the best previously identified linear approximation covers only 19 rounds. We also obtain a linear approximation of 19-round SM4 that is better than the known results. Furthermore, we present a key recovery attack on 24-round SM4, which is the best attack on SM4 in terms of the number of rounds.
1. Introduction
SMS4 [1], issued in 2006 by the Chinese government, serves as the underlying block cipher of WAPI (WLAN Authentication and Privacy Infrastructure) for the security of wireless LANs. In 2012, SMS4 was announced as the Chinese commercial block cipher standard and renamed SM4 [2].
SM4 has received much attention from the cryptographic community, and many cryptanalytic results on SM4 have been published. In [3], rectangle and boomerang attacks on 18-round SM4 and linear and differential attacks on 22-round SM4 were presented. Using multiple linear cryptanalysis, Etrog and Robshaw gave an attack on 23-round SM4 [4]. In addition, a differential attack and a multiple linear attack on 22-round SM4 were introduced in [5, 6]. To date, the best differential attack on 23-round SM4 is given in [7]. Cho and Nyberg proposed a multidimensional linear attack on 23-round SM4 [8]. The best linear attack on 23-round SM4 was provided by Liu and Chen [9]. Bai and Wu proposed a new lookup-table-based white-box implementation of SM4 that protects the large linear encodings from being cancelled out [10]. Moreover, a related-key differential attack on SM4 has been given in [11], and the lower bound on the number of linear active S-boxes for SMS4-like ciphers has been analyzed in [12].
Linear cryptanalysis [13] is one of the most important techniques for analyzing symmetric-key cryptographic primitives. It exploits linear approximations involving plaintext, ciphertext, and key bits. If a cipher behaves differently from a random permutation with respect to such an approximation, this can be used to build a distinguisher, or even a key recovery attack by appending rounds to the distinguisher. The subkeys of the appended rounds are guessed, and the ciphertexts are partially decrypted and/or the plaintexts partially encrypted under these subkeys to compute the intermediate states at both ends of the distinguisher. If the subkeys are guessed correctly, the distinguisher should hold; otherwise, it is expected to fail. Linear cryptanalysis has been applied to many ciphers [14–17].
Our Contributions. In terms of the number of attacked rounds, the best previous key recovery attacks on SM4 are linear and differential cryptanalysis, both based on 19-round distinguishers. Whether a better distinguisher can be found is our first motivation for improving the attacks on SM4. Therefore, we focus on searching for linear approximations of SM4. The contributions of this paper are summarized as follows.
The best previous linear attacks are based on 19-round linear approximations. We design a new search algorithm for iterative linear approximations over a small number of rounds of SM4 by gradually expanding a partial linear approximation table of the S-box. First, we prove that there is no one-round or two-round iterative linear approximation for SM4, and then we derive some properties of the iterative linear approximations of 3-round SM4. Based on these properties, we use our search algorithm to obtain a 19-round linear approximation with bias and a 20-round linear approximation with bias . Our linear approximations are compared with the previous ones in Table 1, which shows that ours are the best so far.
The best previous attacks reach 23-round SM4. Using our 20-round linear approximation of SM4, we give a key recovery attack on 24-round SM4, which is the best attack on SM4 in terms of the number of rounds. Moreover, the new 19-round linear approximation is used to attack 23-round SM4, which improves the best previous linear attack on 23-round SM4. A summary of our attacks and the previous attacks on SM4 is listed in Table 2.
The paper is organized as follows. Section 2 briefly describes the notations used in this paper and introduces the SM4 block cipher. Section 3 shows how to search for better linear approximations of SM4. In Section 4, we use the 19-round and 20-round linear approximations to attack 23-round and 24-round SM4, respectively. Section 5 concludes this paper.
2. Preliminaries
2.1. Notations
In this subsection, we present the notations used in this paper:
(i) $\oplus$: bitwise XOR operation
(ii) $\|$: concatenation of two words
(iii) $\lll$: left cyclic shift operation
(iv) $\cdot$: multiplication of two vectors, a matrix and a vector, or two matrices
(v) $\bullet$: bitwise inner product
(vi) $\&$: logical AND operation
(vii) $x[i]$: the $i$th bit of $x$
(viii) $x[i \sim j]$: the bit string from the $i$th bit to the $j$th bit of $x$.
2.2. Brief Description of SM4
SM4 is a Chinese national standard block cipher used in WAPI for WLANs. It has a 128-bit block size and a 128-bit key size. SM4 is based on an unbalanced generalized Feistel structure with 32 rounds. We denote the plaintext as $(X_0, X_1, X_2, X_3) \in (\mathbb{F}_2^{32})^4$, and the encryption procedure is
$$X_{i+4} = F(X_i, X_{i+1}, X_{i+2}, X_{i+3}, rk_i) = X_i \oplus T(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i), \quad i = 0, 1, \ldots, 31,$$
where $rk_i \in \mathbb{F}_2^{32}$ is the $i$th round's subkey. The ciphertext is $(Y_0, Y_1, Y_2, Y_3) = (X_{35}, X_{34}, X_{33}, X_{32})$. The decryption procedure is the same as the encryption procedure with the subkeys used in reverse order.
One round of SM4 is shown in Figure 1. As Figure 1 shows, $T$ is composed of the nonlinear layer $\tau$ and the linear transformation $L$. Layer $\tau$ applies four S-boxes in parallel; the specification of the S-box can be found in [1]. Let $B$ and $C$ be the 32-bit input and output words of the linear transformation $L$. Then
$$C = L(B) = B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24).$$
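To make the round structure concrete, the following is a minimal Python sketch of $L$, $\tau$, $T$, and the encryption recursion described above. It is not a reference implementation: the 8-bit S-box table from [1] is not reproduced here and must be passed in as the hypothetical parameter `sbox`, and the function names are our own. Because the recursion only XORs $T$ into the oldest word, decryption with reversed subkeys works for any choice of S-box, which the usage check exploits by substituting a random permutation.

```python
from typing import List, Sequence, Tuple

MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Left cyclic shift of a 32-bit word by r positions (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def L(b: int) -> int:
    """Linear transformation L of the encryption round."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def tau(a: int, sbox: Sequence[int]) -> int:
    """Nonlinear layer: the 8-bit S-box applied to each of the four bytes."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(a >> shift) & 0xFF] << shift
    return out


def T(a: int, sbox: Sequence[int]) -> int:
    """T = L composed with tau."""
    return L(tau(a, sbox))


def encrypt(block: Sequence[int], round_keys: Sequence[int], sbox: Sequence[int]) -> Tuple[int, ...]:
    """X_{i+4} = X_i ^ T(X_{i+1} ^ X_{i+2} ^ X_{i+3} ^ rk_i); output is (X_{r+3}, ..., X_r)."""
    x: List[int] = list(block)
    for rk in round_keys:
        x.append(x[-4] ^ T(x[-3] ^ x[-2] ^ x[-1] ^ rk, sbox))
    return (x[-1], x[-2], x[-3], x[-4])


def decrypt(block: Sequence[int], round_keys: Sequence[int], sbox: Sequence[int]) -> Tuple[int, ...]:
    """Decryption is the same procedure with the subkeys in reverse order."""
    return encrypt(block, list(reversed(round_keys)), sbox)
```

Since `encrypt` accepts any number of subkeys, the same function also models reduced-round SM4.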
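The key schedule of SM4 is similar to the encryption procedure; the only difference is that the linear transformation in the key schedule is
$$L'(B) = B \oplus (B \lll 13) \oplus (B \lll 23).$$
The 128-bit master key $MK = (MK_0, MK_1, MK_2, MK_3)$ is first masked with constants and then input to the key schedule function:
$$(K_0, K_1, K_2, K_3) = (MK_0 \oplus FK_0,\ MK_1 \oplus FK_1,\ MK_2 \oplus FK_2,\ MK_3 \oplus FK_3),$$
where, in hexadecimal, $FK_0 = \mathtt{A3B1BAC6}$, $FK_1 = \mathtt{56AA3350}$, $FK_2 = \mathtt{677D9197}$, and $FK_3 = \mathtt{B27022DC}$. Then $rk_i$ is computed as
$$rk_i = K_{i+4} = K_i \oplus T'(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i),$$
where $T'$ is $T$ with $L$ replaced by $L'$, $i = 0, 1, \ldots, 31$, and $CK_i$ is a constant.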
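The key schedule above can be sketched in Python as follows. The $FK_i$ values are the standard SM4 system parameters; the rule used for $CK_i$ (byte $j$ of $CK_i$ equals $(4i + j) \cdot 7 \bmod 256$) is taken from the SM4 specification rather than from this section, so treat it as an assumption of the sketch. As in the encryption sketch, the S-box is a caller-supplied parameter (`sbox`) and is not reproduced here.

```python
from typing import List, Sequence

MASK32 = 0xFFFFFFFF
# System parameters FK_0..FK_3 (hexadecimal), as given in the SM4 specification.
FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)


def rotl32(x: int, r: int) -> int:
    """Left cyclic shift of a 32-bit word by r positions (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def L_prime(b: int) -> int:
    """Key-schedule linear transformation L'."""
    return b ^ rotl32(b, 13) ^ rotl32(b, 23)


def tau(a: int, sbox: Sequence[int]) -> int:
    """The 8-bit S-box applied to each of the four bytes."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(a >> shift) & 0xFF] << shift
    return out


def ck(i: int) -> int:
    """Constant CK_i: byte j (counted from the most significant) is (4*i + j) * 7 mod 256."""
    value = 0
    for j in range(4):
        value = (value << 8) | ((4 * i + j) * 7 % 256)
    return value


def key_schedule(mk: Sequence[int], sbox: Sequence[int], rounds: int = 32) -> List[int]:
    """Return rk_0, ..., rk_{rounds-1} from the master key words MK_0..MK_3."""
    k = [m ^ f for m, f in zip(mk, FK)]
    for i in range(rounds):
        k.append(k[i] ^ L_prime(tau(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ ck(i), sbox)))
    return k[4:]
```

The first and last constants produced by `ck` are `0x00070E15` and `0x646B7279`.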
3. Search for the Linear Approximations of SM4
In terms of the number of attacked rounds, the best previous key recovery attacks on SM4 are linear and differential cryptanalysis, both based on 19-round distinguishers. Whether a better distinguisher can be found is our first motivation for improving the attacks on SM4, so the key point is to search for linear approximations of SM4. As far as we know, methods for searching for linear approximations of SM4 have been considered in [3, 4, 9, 19].
The search method in [3] constructs linear approximations for reduced-round SM4 by identifying a one-round linear approximation with the same input and output masks for the function $T$. In this way, the number of active $T$ functions can be minimized. As a result, an 18-round linear approximation with bias for SM4 was found.
In [4], Etrog and Robshaw derived a 5-round iterative linear approximation in which only the last two rounds are active, and then concatenated three such 5-round iterative linear approximations to construct an 18-round linear approximation with bias .
In [19], Liu et al. used the branch-and-bound algorithm of [20] to obtain a series of 5-round iterative linear approximations, which they used to construct an 18-round linear approximation with bias .
To obtain a better linear approximation for SM4, Liu and Chen gave a more dedicated search algorithm in [9]. They first used an MILP-based method to search for the activity pattern of linear approximations with the minimum number of active S-boxes for reduced-round SM4; then, based on the identified pattern, they found a 19-round linear approximation with bias .
Note that even if the number of active S-boxes of a linear approximation is minimized, the absolute value of its bias need not be maximal. For this reason, we focus on searching for better linear approximations with a few more active S-boxes.
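A worked illustration of this point uses Matsui's piling-up lemma: under the usual independence assumption, combining $m$ S-box approximations with biases $\epsilon_1, \ldots, \epsilon_m$ gives a bias of $2^{m-1}\prod_i \epsilon_i$. The S-box biases below are hypothetical values chosen only to keep the arithmetic simple; they are not taken from the SM4 linear approximation table.

$$
\begin{aligned}
|\epsilon| &= 2^{m-1}\prod_{i=1}^{m} |\epsilon_i|,\\
m = 10,\ |\epsilon_i| = 2^{-5}: \quad |\epsilon| &= 2^{9} \cdot 2^{-50} = 2^{-41},\\
m = 11,\ |\epsilon_i| = 2^{-4}: \quad |\epsilon| &= 2^{10} \cdot 2^{-44} = 2^{-34}.
\end{aligned}
$$

Under these assumed values, the approximation with one more active S-box has a bias $2^{7}$ times larger.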
At CT-RSA 2014, Biryukov and Velichkov extended the branch-and-bound algorithm to search for differential characteristics of ARX ciphers, using a partial difference distribution table of modular addition to improve the search efficiency [21]. Inspired by this idea, we use a partial linear approximation table to search for linear approximations of SM4.
First, we introduce some properties of basic operations, namely the XOR operation, the three-forked branching operation, and the linear map.
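Lemma 1 (XOR operation [22]). Let $z = x \oplus y$, and let the input mask vector and the output mask be $(\Gamma_x, \Gamma_y)$ and $\Gamma_z$, respectively. Then the linear approximation of the XOR operation with these masks has a nonzero bias if and only if $\Gamma_x = \Gamma_y = \Gamma_z$.
Lemma 2 (three-forked branching operation [22]). Let $y = z = x$ (the branching $x \mapsto (x, x)$), and let the input mask and the output mask vector be $\Gamma_x$ and $(\Gamma_y, \Gamma_z)$, respectively. Then the linear approximation of the branching with these masks has a nonzero bias if and only if $\Gamma_x = \Gamma_y \oplus \Gamma_z$.
Lemma 3 (linear map [23]). Let $y = M \cdot x$ with input mask vector $\Gamma_x$ and output mask vector $\Gamma_y$. Then the linear approximation has a nonzero bias if and only if $\Gamma_x = M^{T} \cdot \Gamma_y$, where $M^{T}$ is the transposed matrix of $M$ and $M$ is an invertible binary matrix.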
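For SM4's linear transformation $L$, Lemma 3 can be applied without building the $32 \times 32$ matrix: each cyclic shift is a permutation matrix, whose transpose is the opposite rotation, so $M_L^{T}$ maps a mask $v$ to $v \oplus (v \ggg 2) \oplus (v \ggg 10) \oplus (v \ggg 18) \oplus (v \ggg 24)$. The sketch below (the names `L_transpose` and `masks_agree` are our own) checks empirically that $u \bullet x = v \bullet L(x)$ holds for random $x$ exactly when $u = M_L^{T} \cdot v$.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def rotr32(x: int, r: int) -> int:
    return ((x >> r) | (x << (32 - r))) & MASK32


def L(b: int) -> int:
    """SM4 linear transformation L."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def L_transpose(v: int) -> int:
    """Apply M_L^T: the transpose of a left rotation is the right rotation by the same amount."""
    return v ^ rotr32(v, 2) ^ rotr32(v, 10) ^ rotr32(v, 18) ^ rotr32(v, 24)


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def masks_agree(u: int, v: int, trials: int = 1000, seed: int = 0) -> bool:
    """Test u . x == v . L(x) on random x; this holds for all x iff u == M_L^T v."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(32)
        if parity(u & x) != parity(v & L(x)):
            return False
    return True
```

With a wrong input mask, the difference $(u \oplus M_L^{T} v) \bullet x$ is a nonzero linear function and therefore balanced, so a mismatch shows up almost immediately.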
Biases in the linear approximation table of the SM4 S-box take the values . If the whole linear approximation table is fed into the search program, the program is too slow to find a better linear approximation. Thus, a partial linear approximation table is used in the search algorithm. The basic idea is to use the S-box linear approximations with higher absolute bias first. If no better linear approximation is output, we expand the partial linear approximation table by successively appending S-box linear approximations with smaller absolute bias until a better linear approximation is output.
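The following sketch shows one way to build the full linear approximation table of an $n$-bit S-box and to extract partial tables for a decreasing sequence of bias thresholds, mirroring the expansion strategy described above. It computes each column with the fast Walsh-Hadamard transform, so an 8-bit S-box takes about $256 \cdot 256 \cdot 8$ butterfly operations. The SM4 S-box from [1] is not included; the function names and the use of exact `Fraction` biases are our own choices.

```python
from fractions import Fraction
from typing import Dict, Iterator, List, Sequence, Tuple

Table = Dict[Tuple[int, int], Fraction]


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def lat_biases(sbox: Sequence[int], n: int) -> Table:
    """bias[(u, v)] = Pr[u . x = v . S(x)] - 1/2, via the fast Walsh-Hadamard transform."""
    size = 1 << n
    table: Table = {}
    for v in range(size):
        w = [1 - 2 * parity(v & sbox[x]) for x in range(size)]  # (-1)^(v . S(x))
        h = 1
        while h < size:  # in-place Walsh-Hadamard butterflies
            for i in range(0, size, 2 * h):
                for j in range(i, i + h):
                    w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
            h *= 2
        for u in range(size):  # w[u] = sum_x (-1)^(u . x xor v . S(x))
            table[(u, v)] = Fraction(w[u], 2 * size)
    return table


def partial_lat(full: Table, threshold: Fraction) -> Table:
    """Nontrivial approximations whose absolute bias is at least `threshold`."""
    return {uv: e for uv, e in full.items() if uv != (0, 0) and abs(e) >= threshold}


def expanding_tables(sbox: Sequence[int], n: int, thresholds: List[Fraction]) -> Iterator[Tuple[Fraction, Table]]:
    """Yield partial tables for decreasing thresholds, i.e., high-bias entries first."""
    full = lat_biases(sbox, n)
    for t in sorted(thresholds, reverse=True):
        yield t, partial_lat(full, t)
```

For a bijective S-box, every entry with exactly one of $u, v$ equal to zero has bias 0, which is the fact used in the proofs below to equate "passive" with "zero mask".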
To obtain a better linear approximation, a common method is to first find iterative linear approximations over a few rounds, from which linear approximations over many rounds can be built directly. Thus, we focus on searching for iterative linear approximations of SM4.
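Now three properties of iterative linear approximations of SM4 are given. In the proofs, we use the following notation for one round $X_{j+4} = X_j \oplus T(X_{j+1} \oplus X_{j+2} \oplus X_{j+3} \oplus rk_j)$ of the rounds depicted in Figure 2: $(\Gamma_0, \Gamma_1, \Gamma_2, \Gamma_3)$ is the mask of the round input $(X_j, X_{j+1}, X_{j+2}, X_{j+3})$, and $a_j$ and $b_j$ are the input and output masks of $T$. The subkey $rk_j$ only adds a key-dependent constant and does not affect the masks. By Lemma 1, $X_j$, the output of $T$, and $X_{j+4}$ carry the same mask, so $b_j = \Gamma_0$, and each of $X_{j+1}, X_{j+2}, X_{j+3}$ contributes the mask $a_j$ to the input of $T$. By Lemma 2, the branch of $X_{j+k}$ that continues to the next round keeps the mask $\Gamma_k \oplus a_j$ for $k = 1, 2, 3$. Hence one round maps the masks as
$$(\Gamma_0, \Gamma_1, \Gamma_2, \Gamma_3) \mapsto (\Gamma_1 \oplus a_j,\ \Gamma_2 \oplus a_j,\ \Gamma_3 \oplus a_j,\ \Gamma_0), \qquad b_j = \Gamma_0.$$
Since $T$ is a permutation, an approximation of $T$ with nonzero bias satisfies $a_j = 0$ if and only if $b_j = 0$, and all S-boxes of round $j$ are passive exactly when $a_j = 0$.
Property 4. There is no one-round iterative linear approximation with active S-boxes for SM4.
Proof. If the linear approximation of the first round is iterative, then $\Gamma_0 = \Gamma_1 \oplus a_1$, $\Gamma_1 = \Gamma_2 \oplus a_1$, $\Gamma_2 = \Gamma_3 \oplus a_1$, and $\Gamma_3 = \Gamma_0$. XORing the first three equations gives $\Gamma_0 \oplus \Gamma_3 = a_1$, and together with the fourth we get $a_1 = 0$. This implies $b_1 = 0$, so all the S-boxes in this round are passive. Thus, there is no one-round iterative linear approximation for SM4.
Property 5. There is no two-round iterative linear approximation for SM4.
Proof. Applying the round mapping twice, the mask after the first two rounds is $(\Gamma_2 \oplus a_1 \oplus a_2,\ \Gamma_3 \oplus a_1 \oplus a_2,\ \Gamma_0 \oplus a_2,\ \Gamma_1 \oplus a_1)$. If the approximation is iterative, then
$$\Gamma_0 = \Gamma_2 \oplus a_1 \oplus a_2, \quad \Gamma_1 = \Gamma_3 \oplus a_1 \oplus a_2, \quad \Gamma_2 = \Gamma_0 \oplus a_2, \quad \Gamma_3 = \Gamma_1 \oplus a_1.$$
Substituting the third equation into the first gives $a_1 = 0$, and substituting the fourth into the second gives $a_2 = 0$. Hence all S-boxes in the first two rounds are passive. Therefore, a two-round iterative linear approximation for SM4 does not exist.
Property 6. For the iterative linear approximation of 3-round SM4, the minimum number of active S-boxes is 3. In that case, each round has one active S-box, and the active S-boxes are located in the same position in all three rounds.
Proof. Applying the round mapping three times, the mask after three rounds is
$$(\Gamma_3 \oplus a_1 \oplus a_2 \oplus a_3,\ \Gamma_0 \oplus a_2 \oplus a_3,\ \Gamma_1 \oplus a_1 \oplus a_3,\ \Gamma_2 \oplus a_1 \oplus a_2),$$
with $b_1 = \Gamma_0$, $b_2 = \Gamma_1 \oplus a_1$, and $b_3 = \Gamma_2 \oplus a_1 \oplus a_2$. If the approximation is iterative, this mask equals $(\Gamma_0, \Gamma_1, \Gamma_2, \Gamma_3)$. XORing the four resulting equations gives $a_1 \oplus a_2 \oplus a_3 = 0$, and substituting back yields $b_2 = \Gamma_0 \oplus a_1 \oplus a_2 \oplus a_3 = \Gamma_0$ and $b_3 = \Gamma_0$. So
$$a_1 \oplus a_2 \oplus a_3 = 0, \qquad b_1 = b_2 = b_3.$$
We focus on linear approximations with few active S-boxes. By $a_1 \oplus a_2 \oplus a_3 = 0$, a three-round iterative linear approximation cannot have only one active S-box: two of the $a_j$ would be zero, forcing the third to be zero as well. If there are two active S-boxes, at least one round is passive, that is, $a_1 = 0$, $a_2 = 0$, or $a_3 = 0$. Take $a_1 = 0$ as an example. Then $b_1 = 0$, so $b_2 = b_3 = 0$, which implies $a_2 = a_3 = 0$; hence no S-box is active. The cases $a_2 = 0$ and $a_3 = 0$ are handled in the same way. Therefore, the iterative linear approximation for three-round SM4 has at least three active S-boxes. Moreover, by Lemma 3, the output mask of $\tau$ in round $j$ is $M_L^{T} \cdot b_j$, where $M_L$ is the binary matrix of $L$; since $b_1 = b_2 = b_3$, this mask is the same in all three rounds, and an S-box is active exactly when its output mask is nonzero. Thus, either all three rounds are passive, or all three have active S-boxes in the same positions; with exactly three active S-boxes, each round has one active S-box, and these S-boxes are located in the same position in all three rounds.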
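The mask algebra behind Properties 4 to 6 can be sanity-checked by brute force on toy word sizes. The sketch below models one round as $(\Gamma_0, \Gamma_1, \Gamma_2, \Gamma_3) \mapsto (\Gamma_1 \oplus a, \Gamma_2 \oplus a, \Gamma_3 \oplus a, \Gamma_0)$ with $T$ output mask $b = \Gamma_0$ (the rule obtained from Lemmas 1 and 2), and imposes only the permutation constraint "$a = 0$ if and only if $b = 0$". It deliberately shrinks the words to $w$ bits (a hypothetical toy setting, not SM4's 32 bits) and ignores S-box biases, so it checks the linear relations only.

```python
from itertools import product
from typing import List, Tuple

Masks = Tuple[int, int, int, int]


def round_masks(gamma: Masks, a: int) -> Tuple[Masks, int]:
    """One SM4 round on masks: returns (output state mask, output mask b of T)."""
    g0, g1, g2, g3 = gamma
    return (g1 ^ a, g2 ^ a, g3 ^ a, g0), g0


def iterative_trails(rounds: int, w: int) -> List[Tuple[Masks, Tuple[int, ...], Tuple[int, ...]]]:
    """Brute-force all `rounds`-round iterative mask trails over toy w-bit words."""
    size = 1 << w
    found = []
    for gamma in product(range(size), repeat=4):
        for a_seq in product(range(size), repeat=rounds):
            state, b_seq = gamma, []
            for a in a_seq:
                state, b = round_masks(state, a)
                if (a == 0) != (b == 0):  # T is a permutation: a == 0 iff b == 0
                    break
                b_seq.append(b)
            else:  # no round violated the constraint
                if state == gamma:
                    found.append((gamma, a_seq, tuple(b_seq)))
    return found
```

For one and two rounds only the all-zero trail survives, while every three-round trail satisfies $a_1 \oplus a_2 \oplus a_3 = 0$ and $b_1 = b_2 = b_3$; for example, $\Gamma = (1, 0, 2, 1)$ with $(a_1, a_2, a_3) = (1, 2, 3)$ is iterative.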
Based on Property 6, we search for iterative linear approximations of 3-round SM4 in which each round has only one active S-box. The search algorithm is listed in Algorithm 1, which uses the following notation: $\alpha_i$ and $\beta_i$ are the input and output masks of layer $\tau$ in the $i$th round; $\alpha_{i,j}$ and $\beta_{i,j}$ are the input and output masks of the $j$th S-box in the $i$th round; and $\mathrm{LAT}_{\epsilon}$ is a partial linear approximation table of the S-box, consisting of the linear approximations of the S-box whose absolute bias is no less than $\epsilon$.
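A simplified enumeration in the spirit of this search, written under the structure given by Property 6, is sketched below; it is not Algorithm 1 itself. Following Lemmas 1 and 2, a 3-round iterative approximation needs the three $T$ input masks to XOR to zero and the $T$ output masks to coincide, so with one active S-box per round the candidates are S-box approximations $(u_1, v), (u_2, v), (u_3, v)$ taken from the partial table with $u_1 \oplus u_2 \oplus u_3 = 0$, placed at the same byte position in all three rounds. The bias is estimated with the piling-up lemma as $2^{2}\epsilon_1\epsilon_2\epsilon_3$. The sketch works at the S-box level only and does not compute the 32-bit state masks; the function name and the `positions` parameter are our own.

```python
from itertools import product
from typing import Dict, List, Tuple

Entry = Tuple[int, Tuple[int, int, int], int, object]


def three_round_iterative(partial_lat: Dict[Tuple[int, int], object], positions: int = 4) -> List[Entry]:
    """Enumerate (position, (u1, u2, u3), v, bias) for one active S-box per round.

    partial_lat maps S-box (input mask, output mask) pairs to their bias.
    """
    inputs_for: Dict[int, Dict[int, object]] = {}
    for (u, v), bias in partial_lat.items():
        if u and v:  # active S-box: both masks nonzero
            inputs_for.setdefault(v, {})[u] = bias
    results: List[Entry] = []
    for v, ins in inputs_for.items():
        for u1, u2 in product(ins, repeat=2):
            u3 = u1 ^ u2  # enforces u1 ^ u2 ^ u3 == 0
            if u1 != u2 and u3 in ins:
                bias = 4 * ins[u1] * ins[u2] * ins[u3]  # piling-up lemma, 3 approximations
                for pos in range(positions):  # same S-box position in all rounds
                    results.append((pos, (u1, u2, u3), v, bias))
    return results
```

The partial table would come from the real SM4 S-box (see [1]) filtered by the current threshold; we do not claim this sketch reproduces the counts reported for Algorithm 1.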

After running the search algorithm, we identify 12240 3-round iterative linear approximations with bias . With any of these 3-round iterative linear approximations, we can construct linear approximations for 19-round and 20-round SM4 with bias and , respectively. Compared with the best previous 19-round linear approximation in [9], the bias is improved from to . Tables 3 and 4 give the linear approximations for 19-round and 20-round SM4, respectively, where all masks are given as hexadecimal values and “” denotes an undecided value.
4. Key Recovery Attacks for SM4
4.1. Linear Attack on 24-Round SM4
We append two rounds at the top and two rounds at the bottom of the 20-round linear approximation in Table 4, which yields a linear attack on 24-round SM4 (see Figure 3). The partial-sum technique [24] is used in the partial encryption and decryption procedures.
According to the linear approximation in Figure 3, we denote , , , , , and . From the linear approximation, we have Considering the partial encryption and decryption, the left-hand side of the above equation can be written as follows: where is the state after the transformation and is the state after the layer in the th round.
Since then, (19) can be transformed into
Let . The attack process is as follows:
(1) Collect plaintext/ciphertext pairs.
(2) Initialize counters to zero.
(3) For every plaintext/ciphertext pair, calculate . Then increase the counter by one.
(4) Guess the bit . Allocate counters initialized to zero.
(5) For every , calculate , and . .
(6) Guess the 8-bit . Allocate counters initialized to zero.
(7) For every , calculate and . .
(8) Guess the bit . Allocate counters initialized to zero.
(9) For every , calculate , and . .
(10) Guess the bit . Initialize counters to zero.
(11) For every , calculate . If , increase the counter by ; otherwise, decrease it by .
(12) We set the advantage to 47, which means that the top absolute values in are kept. For each kept subkey value, we guess the remaining 88 bits of (the master key can be obtained from the key schedule) and test the key by trial encryptions.
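The counting pattern of Steps (3) and (11) and the filtering of Step (12) can be illustrated on a toy scale. In the sketch below, which is hypothetical and not the paper's equations (those depend on the masks of Figure 3), each sample is a 4-bit state $x$ together with a parity bit $z$; the data are first compressed into counters $V[x] = \sum (-1)^{z}$, and each guess $k$ of a 4-bit subkey is then scored by $\sum_x V[x] \cdot (-1)^{v \bullet S(x \oplus k)}$, increasing or decreasing the score by $V[x]$ as in Step (11). The 4-bit PRESENT S-box is used only as a stand-in for a keyed nonlinear function.

```python
from typing import Dict, List, Sequence, Tuple

# Stand-in 4-bit S-box (the PRESENT S-box), used only to make the toy concrete.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def compress(samples: Sequence[Tuple[int, int]]) -> List[int]:
    """Fold the data into counters: V[x] = sum of (-1)^z over samples with state x."""
    counters = [0] * 16
    for x, z in samples:
        counters[x] += -1 if z else 1
    return counters


def rank_keys(counters: Sequence[int], v: int, keep: int) -> Tuple[List[int], Dict[int, int]]:
    """Score each key guess k by sum_x V[x] * (-1)^(v . S(x ^ k)); keep the `keep` largest |scores|."""
    scores = {}
    for k in range(16):
        scores[k] = sum(c * (-1 if parity(v & TOY_SBOX[x ^ k]) else 1) for x, c in enumerate(counters))
    ranked = sorted(scores, key=lambda k: abs(scores[k]), reverse=True)
    return ranked[:keep], scores
```

With an advantage of $a$ bits over an $n$-bit guess, one would keep the top $2^{n-a}$ guesses; here $n = 4$. The compression step is what keeps the cost of key guessing independent of the number of plaintext/ciphertext pairs.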
The time complexity of Step is about operations, which is equivalent to 24-round encryptions. Both Steps and need one-round decryptions or encryptions. Steps and take one-round decryptions or encryptions. The complexity of Step is 24-round encryptions. So the total time complexity is about encryptions.
The memory complexity of Step is about bytes and the counter requires bytes, so the total memory complexity is about bytes.
If we set the data complexity , the success rate is according to [25]. The time complexity is 24-round encryptions.
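A widely used estimate of the success rate of a linear key recovery attack is Selçuk's formula $P_S = \Phi\left(2\sqrt{N}\,|\epsilon| - \Phi^{-1}(1 - 2^{-a-1})\right)$, where $N$ is the data complexity, $\epsilon$ the bias, $a$ the advantage, and $\Phi$ the standard normal distribution function. We assume, without confirming it here, that this is the method of [25]. The sketch evaluates the formula and inverts it for a target success rate; the bias values in the usage check are hypothetical, since the actual bias is not restated in this section.

```python
from math import sqrt
from statistics import NormalDist


def success_rate(n_data: float, bias: float, advantage: float) -> float:
    """Selcuk's estimate P_S = Phi(2*sqrt(N)*|eps| - Phi^{-1}(1 - 2^(-a-1)))."""
    std = NormalDist()  # standard normal distribution
    return std.cdf(2 * sqrt(n_data) * abs(bias) - std.inv_cdf(1 - 2 ** (-advantage - 1)))


def data_for_success(bias: float, advantage: float, target: float) -> float:
    """Data complexity N at which success_rate(N, bias, advantage) equals `target`."""
    std = NormalDist()
    shift = std.inv_cdf(target) + std.inv_cdf(1 - 2 ** (-advantage - 1))
    return (shift / (2 * abs(bias))) ** 2
```

For example, `data_for_success(2**-30, 47, 0.5)` gives the hypothetical data complexity at which an attack with advantage 47 on an approximation of bias $2^{-30}$ succeeds with probability one half; quadrupling the data then makes success nearly certain.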
4.2. Linear Attack on 23-Round SM4
Two rounds are added at the top and two rounds at the bottom of the 19-round linear approximation in Table 3. The key recovery attack on 23-round SM4 is similar to the attack on 24-round SM4, so we omit the details.
If we set the data complexity and the advantage to 47, the time complexity is 23-round encryptions, and the memory complexity is bytes. The success rate is computed with the method in [25].
5. Conclusions
In this paper, we first show that there is no one-round or two-round iterative linear approximation for SM4 and derive a property of 3-round iterative linear approximations. Based on this property, we search for iterative linear approximations of 3-round SM4 using a partial linear approximation table. We then construct a 20-round linear approximation from the 3-round iterative linear approximations, whereas the best previous distinguishers cover only 19 rounds. Using it, we present a key recovery attack on 24-round SM4, which is the best known attack on SM4 so far. Moreover, we obtain a better 19-round linear approximation and use it to improve the linear attack on 23-round SM4. In future work, we hope to use a similar technique to search for better differential characteristics of SM4.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported by the 973 Program (no. 2013CB834205), NSFC Projects (nos. 61133013 and 61572293), and the Program for New Century Excellent Talents in University of China (NCET-13-0350).