Abstract

SM4 is a Chinese commercial block cipher standard used for wireless communication in China. In this paper, we use a partial linear approximation table of the S-box to search for three-round iterative linear approximations of SM4, from which we construct a linear approximation for 20-round SM4. The best previously identified linear approximation covers only 19 rounds. We also obtain a linear approximation for 19-round SM4 that is better than the known results. Furthermore, we present a key recovery attack on 24-round SM4, which is the best attack in terms of the number of rounds.

1. Introduction

SMS4 [1], issued in 2006 by the Chinese government, serves WAPI (WLAN Authentication and Privacy Infrastructure) as the underlying block cipher for the security of wireless LANs. In 2012, SMS4 was announced as the Chinese commercial block cipher standard and renamed SM4 [2].

SM4 has received considerable attention from the cryptographic community, and many cryptanalytic results on it have been produced. In [3], rectangle and boomerang attacks on 18-round SM4 and linear and differential attacks on 22-round SM4 were presented. Using a multiple linear attack, Etrog and Robshaw gave an attack on 23-round SM4 in [4]. Besides these, a differential attack and a multiple linear attack on 22-round SM4 were introduced in [5, 6]. So far, the best differential attack on 23-round SM4 is given in [7]. Cho and Nyberg proposed a multidimensional linear attack on 23-round SM4 in [8]. The best linear attack on 23-round SM4 is provided by Liu and Chen in [9]. Bai and Wu proposed a new lookup-table-based white-box implementation for SM4 that protects the large linear encodings from being cancelled out [10]. Moreover, a related-key differential attack on SM4 has been given in [11], and the lower bound on the number of linear active S-boxes for SMS4-like ciphers has been analyzed in [12].

Linear cryptanalysis [13] is one of the most important techniques in the analysis of symmetric-key cryptographic primitives. It focuses on linear approximations between plaintext, ciphertext, and key. If a cipher behaves differently from a random permutation under linear cryptanalysis, this can be used to build a distinguisher, or even a key recovery attack by adding some rounds. The subkeys of the appended rounds are guessed, and the ciphertexts are decrypted and/or the plaintexts are encrypted under these subkeys to calculate the intermediate state at the ends of the distinguisher. If the subkeys are correctly guessed, the distinguisher should hold; otherwise, it will fail. Linear cryptanalysis has been used to analyze many ciphers such as [14–17].

Our Contributions. In terms of the number of rounds attacked, the best previous key recovery attacks on SM4 are linear cryptanalysis and differential cryptanalysis, and both are based on 19-round distinguishers. Whether a better distinguisher can be found is our first motivation for improving the attacks on SM4. Therefore, we focus on searching for linear approximations of SM4. The contributions of this paper are summarized as follows.

The best previous linear attacks rely on 19-round linear approximations. We design a new search algorithm for iterative linear approximations over a small number of rounds of SM4 by gradually expanding the partial linear approximation table of the S-box. First, we prove that there is no one-round or two-round iterative linear approximation for SM4, and then we obtain some properties of the iterative linear approximations of 3-round SM4. Based on these properties, we utilize our search algorithm to get a 19-round linear approximation with bias and a 20-round linear approximation with bias . Table 1 compares our linear approximations with the previous ones. It can be seen that our linear approximations are the best ones so far.

The best previous attacks reach 23 rounds of SM4. Utilizing our 20-round linear approximation of SM4, we give a key recovery attack on 24-round SM4, which is the best attack on SM4 in terms of the number of rounds. Moreover, the new 19-round linear approximation is used to attack 23-round SM4, which improves the best previous linear attack on 23-round SM4. A summary of our attacks and the previous attacks on SM4 is listed in Table 2.

The paper is organized as follows. Section 2 briefly describes the notations used in this paper and introduces the SM4 block cipher. Section 3 shows how to search for better linear approximations of SM4. In Section 4, we use the 19-round and 20-round linear approximations to attack 23-round and 24-round SM4, respectively. Section 5 concludes this paper.

2. Preliminaries

2.1. Notations

In this subsection, we present the notations used in this paper as follows:
(i) $\oplus$: a bitwise XOR operation
(ii) $\|$: concatenation of two words
(iii) $\lll$: left cyclic shift operation
(iv) $\times$: multiplication of two vectors, matrix and vector, or two matrices
(v) $\cdot$: bitwise inner product
(vi) $\&$: logical AND operation
(vii) $X[i]$: the $i$th bit of $X$
(viii) $X[i \sim j]$: a bit string starting from the $i$th bit to the $j$th bit of $X$.

2.2. Brief Description of SM4

SM4 is a Chinese national standard block cipher used in WAPI for WLAN. It has a 128-bit block size and a 128-bit key size. The design of SM4 is based on the unbalanced generalized Feistel structure, and the number of rounds is 32. We denote the plaintext as $(X_0, X_1, X_2, X_3) \in (\mathbb{F}_2^{32})^4$, and the encryption procedure is described as follows:

$$X_{i+4} = X_i \oplus T(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i), \quad i = 0, 1, \ldots, 31, \tag{1}$$

where $rk_i$ is the $i$th round's subkey. The ciphertext is $(Y_0, Y_1, Y_2, Y_3) = (X_{35}, X_{34}, X_{33}, X_{32})$. The decryption procedure is the same as the encryption procedure with the subkeys used in reverse order.

One round of SM4 is shown in Figure 1. As Figure 1 shows, $T$ is composed of the nonlinear layer $S$ and the linear transformation $L$. Layer $S$ applies four S-boxes in parallel. The specification of the S-box can be found in [1]. Let $A$ and $B$ be the 32-bit input and output words of the linear transformation $L$. Then

$$B = L(A) = A \oplus (A \lll 2) \oplus (A \lll 10) \oplus (A \lll 18) \oplus (A \lll 24). \tag{2}$$

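The key schedule of SM4 is similar to the encryption procedure; the only difference is that the linear transformation in the key schedule is

$$L'(A) = A \oplus (A \lll 13) \oplus (A \lll 23). \tag{3}$$

The 128-bit master key $MK = (MK_0, MK_1, MK_2, MK_3)$ is first masked with the constants $FK_j$ and then input to the key schedule function:

$$(K_0, K_1, K_2, K_3) = (MK_0 \oplus FK_0, MK_1 \oplus FK_1, MK_2 \oplus FK_2, MK_3 \oplus FK_3), \tag{4}$$

where $FK_0 = \mathtt{0xA3B1BAC6}$, $FK_1 = \mathtt{0x56AA3350}$, $FK_2 = \mathtt{0x677D9197}$, and $FK_3 = \mathtt{0xB27022DC}$. Then $rk_i$ is computed as follows:

$$rk_i = K_{i+4} = K_i \oplus T'(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i), \tag{5}$$

where $i = 0, 1, \ldots, 31$, $CK_i$ is the constant, and $T'$ is $T$ with $L$ replaced by $L'$.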
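The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_sbox` is a hypothetical placeholder bijection, not the SM4 S-box (whose table is specified in [1]), and the helper names `rotl32`, `L_key`, and `crypt` are ours. The sketch shows the two linear transformations and the fact that running the same routine with the subkeys reversed undoes the encryption, whatever `T` is.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Left cyclic shift of a 32-bit word by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def L(a):
    """Linear transformation of the round function."""
    return a ^ rotl32(a, 2) ^ rotl32(a, 10) ^ rotl32(a, 18) ^ rotl32(a, 24)

def L_key(a):
    """Linear transformation L' used in the key schedule."""
    return a ^ rotl32(a, 13) ^ rotl32(a, 23)

def toy_sbox(v):
    # Hypothetical placeholder 8-bit bijection, NOT the SM4 S-box.
    return (7 * v + 3) & 0xFF

def T(a, sbox=toy_sbox):
    """Four parallel S-boxes followed by L."""
    s = 0
    for shift in (24, 16, 8, 0):
        s |= sbox((a >> shift) & 0xFF) << shift
    return L(s)

def crypt(block, round_keys, t=T):
    """Run the unbalanced Feistel rounds and apply the final word reversal."""
    x0, x1, x2, x3 = block
    for rk in round_keys:
        x0, x1, x2, x3 = x1, x2, x3, x0 ^ t(x1 ^ x2 ^ x3 ^ rk)
    return (x3, x2, x1, x0)
```

Calling `crypt(ct, round_keys[::-1])` on a ciphertext `ct = crypt(pt, round_keys)` returns `pt`, because each round only XORs the output of `T` into one word.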

3. Search for the Linear Approximations of SM4

In terms of the number of rounds attacked, the best previous key recovery attacks on SM4 are linear and differential cryptanalysis, and both are based on 19-round distinguishers. Whether a better distinguisher can be found is our first motivation for improving the attacks on SM4. Therefore, the key point is to search for linear approximations of SM4. As far as we know, methods to search for linear approximations of SM4 have been considered in [3, 4, 9, 19].

The search method in [3] is to construct linear approximations for reduced-round SM4 by identifying a one-round linear approximation with the same input and output masks for the function. In this way, the number of active functions can be minimized. As a result, an 18-round linear approximation with bias for SM4 has been found.

In [4], Etrog and Robshaw derived a 5-round iterative linear approximation where only the last two rounds are active, and then they concatenated three five-round iterative linear approximations to construct an 18-round linear approximation with bias .

In [19], Liu et al. used the branch-and-bound algorithm in [20] to obtain a series of 5-round iterative linear approximations, which are utilized to construct an 18-round linear approximation with bias .

In order to get a better linear approximation for SM4, Liu and Chen gave a more dedicated search algorithm in [9]. They firstly used an MILP-based method to search the mode for the linear approximation with the minimum number of active S-boxes for reduced-round SM4; then based on the identified mode they found the 19-round linear approximation with bias .

Even if the number of active S-boxes of a linear approximation is minimized, the absolute value of its bias is not necessarily maximal. For this reason, we focus on searching for better linear approximations with a few more active S-boxes.

At CT-RSA 2014, Biryukov and Velichkov extended the branch-and-bound algorithm to search for differential characteristics of ARX ciphers, using a partial differential distribution table for modular addition to improve the search efficiency [21]. Inspired by this idea, we use a partial linear approximation table to search for linear approximations of SM4.

At first, some properties for basic operations such as the XOR operation, the three-forked branching operation, and the linear map will be introduced.

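Lemma 1 (XOR operation [22]). Let $f(x_1, x_2) = x_1 \oplus x_2$; the input mask vector and output mask are $(\Gamma_1, \Gamma_2)$ and $\Lambda$, respectively. Then the bias is nonzero if and only if $\Gamma_1 = \Gamma_2 = \Lambda$.

Lemma 2 (three-forked branching operation [22]). Let $f(x) = (x, x)$; the input mask and output linear mask vector are $\Gamma$ and $(\Lambda_1, \Lambda_2)$, respectively. Then the bias is nonzero if and only if $\Gamma = \Lambda_1 \oplus \Lambda_2$.

Lemma 3 (linear map [23]). Let $f(x) = M \times x$ with the input mask vector $\Gamma$ and output mask vector $\Lambda$; then the bias is nonzero if and only if $\Gamma = M^{T} \times \Lambda$, where $M^{T}$ is the transposed matrix of $M$ and $M$ is an invertible binary matrix.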
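The three mask propagation rules can be checked exhaustively on a toy word size. The sketch below is only an illustration: the 3-bit width, the matrix `M_ROWS`, and all function names are our assumptions, chosen so that every mask combination can be enumerated in a fraction of a second.

```python
from itertools import product

W = 3                        # toy word width (assumption, keeps the search tiny)
VALS = range(1 << W)
M_ROWS = [0b001, 0b011, 0b111]   # rows of a hypothetical invertible binary matrix

def dot(a, b):
    """Bitwise inner product over GF(2)."""
    return bin(a & b).count("1") & 1

def imbalance(pairs):
    """Number of agreements minus disagreements; zero means zero bias."""
    return sum(1 if u == v else -1 for u, v in pairs)

def apply(rows, x):
    """Compute M x, where bit i of the result is row_i . x."""
    return sum(dot(r, x) << i for i, r in enumerate(rows))

def apply_transpose(rows, lam):
    """Compute M^T lam as the XOR of the rows selected by lam."""
    out = 0
    for i, r in enumerate(rows):
        if (lam >> i) & 1:
            out ^= r
    return out

def check_xor():
    for g1, g2, lam in product(VALS, repeat=3):
        imb = imbalance((dot(g1, x1) ^ dot(g2, x2), dot(lam, x1 ^ x2))
                        for x1, x2 in product(VALS, repeat=2))
        if (imb != 0) != (g1 == g2 == lam):
            return False
    return True

def check_branch():
    for g, l1, l2 in product(VALS, repeat=3):
        imb = imbalance((dot(g, x), dot(l1, x) ^ dot(l2, x)) for x in VALS)
        if (imb != 0) != (g == l1 ^ l2):
            return False
    return True

def check_linear_map():
    for g, lam in product(VALS, repeat=2):
        imb = imbalance((dot(g, x), dot(lam, apply(M_ROWS, x))) for x in VALS)
        if (imb != 0) != (g == apply_transpose(M_ROWS, lam)):
            return False
    return True
```

Each `check_*` function returns `True` when the corresponding rule holds for every mask combination at this width.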

Biases in the linear approximation table for S-box of SM4 take the values . Feeding the entire linear approximation table into the search program would make it too slow to find a better linear approximation, so the search algorithm uses a partial linear approximation table. The basic idea is to use the S-box linear approximations with the highest bias first. If no better linear approximation is found, the partial table is expanded by successively appending S-box linear approximations with smaller bias until a better linear approximation is found.
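The idea of a gradually expanding partial table can be illustrated as follows. This is a minimal sketch, not the authors' program: it uses a hypothetical 4-bit stand-in permutation `TOY_SBOX` instead of the 8-bit SM4 S-box, stores each entry as (number of agreeing inputs) minus half the input count, and the function names are ours.

```python
# Hypothetical 4-bit stand-in permutation; the SM4 S-box is 8-bit and given in [1].
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def dot(a, b):
    """Bitwise inner product over GF(2)."""
    return bin(a & b).count("1") & 1

def lat(sbox):
    """Full linear approximation table: table[a][b] = #{x : a.x == b.S(x)} - n/2."""
    n = len(sbox)
    return [[sum(1 for x in range(n) if dot(a, x) == dot(b, sbox[x])) - n // 2
             for b in range(n)] for a in range(n)]

def partial_lat(table, min_abs):
    """Keep only nontrivial entries whose absolute value is at least min_abs."""
    return {(a, b): v
            for a, row in enumerate(table)
            for b, v in enumerate(row)
            if (a, b) != (0, 0) and v != 0 and abs(v) >= min_abs}

def expanding_partial_lats(table):
    """Yield (threshold, partial table) pairs, strongest approximations first."""
    levels = sorted({abs(v) for v in partial_lat(table, 1).values()}, reverse=True)
    for level in levels:
        yield level, partial_lat(table, level)
```

A search would consume `expanding_partial_lats` one level at a time and stop at the first level that yields a better approximation, so weak S-box approximations are only enumerated when the strong ones are not enough.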

A common way to obtain a better linear approximation is to first find iterative linear approximations over a small number of rounds, from which linear approximations over many rounds can be produced directly. Thus, we focus on searching for iterative linear approximations of SM4.

Now three properties for iterative linear approximations of SM4 are shown as follows.

Property 4. There is no one-round iterative linear approximation with active S-boxes on SM4.

Proof. From Figure 2, if there is an iterative linear approximation for the first round, we haveUsing the property of three-forked branch, we haveFrom (6) and (7), we get which implies and all the S-boxes in this round are passive. Thus, there is no one-round iterative linear approximation for SM4.

Property 5. The iterative linear approximation for two rounds of SM4 does not exist.

Proof. If there is an iterative linear approximation for the first two rounds in Figure 2, then we haveWith the property of three-forked branch, we haveAccording to (9) and (10), we derive Thus, which means that . Substitute the terms in the above formulas and we have so , which means that all S-boxes in the first two rounds are passive. Therefore, 2-round iterative linear approximation for SM4 does not exist.

Property 6. For the iterative linear approximation of 3-round SM4, the minimum number of active S-boxes is 3. Meanwhile, each round has one active S-box and the active S-boxes are located in the same positions of three rounds.

Proof. If there is an iterative linear approximation for three rounds in Figure 2, then we haveSoWe focus on the linear approximation with less active S-boxes. From (15), it is impossible for a three-round iterative linear approximation to have only one active S-box. If there are two active S-boxes, then or or . Hence, all S-boxes in the three-round linear approximation are passive. Take as an example.
If , then , which implies . Then Thus, ; we have . In the cases and , we can also obtain that there is no active S-box in the three-round linear approximation by the similar way of the case . Therefore, the iterative linear approximation for three-round SM4 has at least three active S-boxes. From (15), it is clear that each round has one active S-box and these active S-boxes are located in the same positions of three rounds.

From Property 6, we will try to search for the iterative linear approximation of 3-round SM4 where each round has only one active S-box. The search algorithm is listed in Algorithm 1. In Algorithm 1, the following notations are used. and are input and output masks of -layer in the th round. and are input and output mask of the th S-box of the th round. is a partial linear approximation table of S-box which consists of linear approximations with bias no less than .

  for   to 2  do
  for   to   do
    for all   do
      
      find all input masks indexed by from , and store in
      for   to   do
        
       find all input masks indexed by from , and store in
       for   to   do
         
         find all input masks indexed by from , and store in
         for   to   do
           
           for   to   do
             
           end for
           for   to   do
             
           end for
           
           
           
           
           
           
           if    then
             return  
             // 3-round iterative linear approximation
           else
             continue.
           end if
        end for
       end for
     end for
   end for
  end for
   
end for
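The core of such a search can be sketched at the level of a single S-box. The sketch rests on an assumption of ours that should be checked against (15): that in a 3-round iterative approximation with one active S-box per round, the three active S-boxes share the same output mask and their input masks XOR to zero. The 4-bit `TOY_SBOX` is a hypothetical stand-in for the SM4 S-box, the propagation of masks through the linear transformation is omitted, and all function names are ours.

```python
from itertools import product

# Hypothetical 4-bit stand-in permutation, NOT the SM4 S-box.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def dot(a, b):
    return bin(a & b).count("1") & 1

def lat(sbox):
    """table[a][b] = #{x : a.x == b.S(x)} - n/2."""
    n = len(sbox)
    return [[sum(1 for x in range(n) if dot(a, x) == dot(b, sbox[x])) - n // 2
             for b in range(n)] for a in range(n)]

def partial_lat(table, min_abs):
    return {(a, b): v
            for a, row in enumerate(table)
            for b, v in enumerate(row)
            if (a, b) != (0, 0) and v != 0 and abs(v) >= min_abs}

def iterative_triples(partial):
    """Find (out_mask, a0, a1, a2) with a0 ^ a1 ^ a2 == 0, all in the partial table."""
    by_out = {}
    for (a, b), v in partial.items():
        by_out.setdefault(b, {})[a] = v
    triples = []
    for b, ins in by_out.items():
        for a0, a1 in product(ins, repeat=2):
            a2 = a0 ^ a1
            if a2 in ins:               # a2 == 0 never occurs: 0 is not stored
                triples.append((b, a0, a1, a2))
    return triples

def search(sbox):
    """Expand the partial table level by level until some triple is found."""
    table = lat(sbox)
    levels = sorted({abs(v) for row in table for v in row if v}, reverse=True)
    for level in levels:
        found = iterative_triples(partial_lat(table, level))
        if found:
            return level, found
    return None, []
```

For SM4 itself the outer loops would additionally range over the four S-box positions and map the shared output mask back through the linear transformation, as in Algorithm 1.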

After running the search algorithm, we identify 12240 3-round iterative linear approximations with bias . From any of these 3-round iterative linear approximations, we can construct linear approximations for 19-round and 20-round SM4 with bias and , respectively. Compared with the best previous 19-round linear approximation in [9], the bias has been improved from to . Tables 3 and 4 give linear approximations for 19-round and 20-round SM4, respectively, where all masks are written as hexadecimal values and “” is undecided.
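The bias of a concatenation is usually estimated with the piling-up lemma, which for $n$ independent approximations with biases $\epsilon_1, \ldots, \epsilon_n$ gives $2^{n-1} \prod_i \epsilon_i$. The helper below is a small sketch of that arithmetic in the $\log_2$ domain; the function names are ours, the independence of the individual approximations is an assumption, and no specific bias value from this paper is implied.

```python
def piling_up_log2(log2_biases):
    """log2 of the combined bias of independent approximations (piling-up lemma)."""
    n = len(log2_biases)
    if n == 0:
        raise ValueError("need at least one approximation")
    return (n - 1) + sum(log2_biases)

def iterate_log2(log2_iterative_bias, repetitions):
    """log2 bias obtained by repeating one iterative approximation several times."""
    return piling_up_log2([log2_iterative_bias] * repetitions)
```

For instance, two approximations with bias $2^{-2}$ each combine to $2^{1} \cdot 2^{-4} = 2^{-3}$, which is what `piling_up_log2([-2.0, -2.0])` computes.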

4. Key Recovery Attacks for SM4

4.1. Linear Attack on 24-Round SM4

We append two rounds to the top and two rounds to the bottom of the 20-round linear approximation in Table 4, which yields a linear attack on 24-round SM4. The partial sum technique [24] is used in the partial encryption and decryption procedures. See Figure 3.

According to the linear approximation in Figure 3, we denote , , , , , and . From the linear approximation, we have Consider the partial encryption and decryption; the left side of the above equation can be written as follows:where is the state after transformation and is the state after layer in the th round.

Since then, (19) can be transformed into

Let . The attack process is given as follows:
(1) Collect plaintext/ciphertext pairs.
(2) Initialize counters to zero.
(3) For every plaintext/ciphertext pair, calculate . Then increase the counter by one.
(4) Guess the -bit . Allocate counters to zero.
(5) For every , calculate , and . .
(6) Guess the 8-bit . Allocate counters to zero.
(7) For every , calculate and . .
(8) Guess the -bit . Allocate counters to zero.
(9) For every , calculate , and . .
(10) Guess the -bit . Initialize counters to zero.
(11) For every , calculate . If , increase the counter by ; otherwise, decrease it by .
(12) We set the advantage to be 47, which implies that the top absolute values in are kept. For each kept subkey value, we guess the remaining 88 bits of (the master key can be derived from the key schedule) and test the key by trial encryptions.
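The final ranking step can be sketched as follows. This is a generic illustration of keeping the candidates with the largest absolute counters, assuming that an advantage of $a$ bits over $k$ guessed key bits means retaining the top $2^{k-a}$ candidates; the function name and the dictionary layout (subkey guess mapped to signed counter) are ours.

```python
import heapq

def keep_top_candidates(counters, key_bits, advantage_bits):
    """Return the subkey guesses with the largest absolute counter values.

    counters: dict mapping each guessed subkey value to its signed counter.
    With an advantage of advantage_bits, 2**(key_bits - advantage_bits)
    candidates are retained (at least one).
    """
    keep = 1 << max(0, key_bits - advantage_bits)
    return heapq.nlargest(keep, counters, key=lambda k: abs(counters[k]))
```

Each retained candidate would then be completed with a guess of the remaining key bits and verified by trial encryption, as in Step (12).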

The time complexity of Step is about operations which is equivalent to 24-round encryptions. Both Steps and need one-round decryptions or encryptions. Steps and take one-round decryptions or encryptions. The complexity of Step is 24-round encryptions. So the total time complexity is about encryptions.

The memory complexity of Step is about bytes and the counter requires bytes, so the total memory complexity is about bytes.

If we set the data complexity , the success rate by [25]. The time complexity is 24-round encryptions.
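A success rate of this kind is commonly estimated with Selçuk's formula $P_S = \Phi\left(2\sqrt{N}\,|\epsilon| - \Phi^{-1}(1 - 2^{-a-1})\right)$, where $N$ is the number of known plaintexts, $\epsilon$ the bias, and $a$ the advantage in bits. The sketch below assumes that [25] refers to this estimate; the function name is ours, and no data complexity or bias from this paper is hard-coded.

```python
from math import sqrt
from statistics import NormalDist

def success_probability(n_samples, bias, advantage_bits):
    """Estimated success rate of a linear attack (Selcuk-style formula, assumed)."""
    std = NormalDist()
    # Phi^{-1}(1 - q) equals -Phi^{-1}(q); the latter is numerically safer for tiny q.
    threshold = -std.inv_cdf(2.0 ** -(advantage_bits + 1))
    return std.cdf(2.0 * sqrt(n_samples) * abs(bias) - threshold)
```

The estimate grows with the data complexity and shrinks as the requested advantage increases, which is the trade-off behind the parameter choice in the attack.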

4.2. Linear Attack on 23-Round SM4

Two rounds are added to the top and two rounds to the bottom of the 19-round linear approximation in Table 3. The key recovery attack on 23-round SM4 follows the same procedure as the attack on 24-round SM4, so we omit the details.

If we set the data complexity and the advantage to be 47, the time complexity is 23-round encryptions, and the memory complexity is bytes. The success rate is computed with the method in [25].

5. Conclusions

In this paper, we first show that SM4 has no one-round or two-round iterative linear approximation and derive a property of its 3-round iterative linear approximations. Based on this property, we search for 3-round iterative linear approximations of SM4 using the partial linear approximation table. From these we construct a 20-round linear approximation, whereas the best previous distinguishers cover only 19 rounds. We then present a key recovery attack on 24-round SM4, which is the best known attack on SM4 so far in terms of the number of rounds. Moreover, we obtain a better 19-round linear approximation and use it to improve the linear attack on 23-round SM4. As future work, we hope to use a similar technique to search for better differential characteristics of SM4.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by 973 Program (no. 2013CB834205), NSFC Projects (nos. 61133013 and 61572293), and Program for New Century Excellent Talents in University of China (NCET-13-0350).