#### Abstract

Most traditional cryptanalytic techniques require a great amount of time, known plaintexts, and memory. This paper proposes a generic cryptanalysis model based on deep learning (DL), where the model tries to find the key of a block cipher from known plaintext-ciphertext pairs. We show the feasibility of DL-based cryptanalysis by attacking lightweight block ciphers such as simplified DES, Simon, and Speck. The results show that DL-based cryptanalysis can successfully recover the key bits when the keyspace is restricted to 64 ASCII characters. Traditional cryptanalysis is generally performed without a keyspace restriction, but only reduced-round variants of Simon and Speck have been successfully attacked. Although the key must be text based, the proposed DL-based cryptanalysis can successfully break the full rounds of Simon32/64 and Speck32/64. The results indicate that DL can be a useful tool for the cryptanalysis of block ciphers when the keyspace is restricted.

#### 1. Introduction

Cryptanalysis of block ciphers has persistently received great attention, and many cryptanalytic techniques have emerged recently. Cryptanalysis based on algebraic structures can be categorized as follows: differential cryptanalysis, linear cryptanalysis, differential-linear cryptanalysis, the meet-in-the-middle (MITM) attack, and the related-key attack [1, 2]. Differential cryptanalysis, which is the first general cryptanalytic technique, analyses how differences of plaintext pairs evolve into differences of the resultant ciphertext pairs during encryption [3]. Differential cryptanalysis has evolved into several variants, such as integral cryptanalysis (sometimes known as a multiset attack), the boomerang attack, impossible differential cryptanalysis, and improbable differential cryptanalysis [1, 2]. Linear cryptanalysis is also a general cryptanalytic technique; it analyses linear approximations between plaintext bits, ciphertext bits, and key bits. It is a known-plaintext attack. The work in [4] showed that the efficiency of linear cryptanalysis can be improved by the use of chosen plaintexts. The authors in [5] proposed zero-correlation linear cryptanalysis, which is a key recovery technique. The MITM attack, which employs a space-time tradeoff, is a generic attack that weakens the security benefits of using multiple encryptions [6]. The biclique attack, which is a variant of the MITM attack, utilizes a biclique structure to extend the number of rounds that the MITM attack can reach [6]. In a related-key attack, an attacker can observe the operation of a cipher under several different keys whose values are initially unknown, but where some mathematical relationship connecting the keys is known to the attacker [7].

However, conventional cryptanalysis might be impractical or hard to generalize. First, most conventional cryptanalytic techniques require a great amount of time, known plaintexts, and memory. Second, although traditional cryptanalysis is generally performed without a keyspace restriction, only reduced-round variants of recent block ciphers have been successfully attacked. For example, no successful attack on the full-round Simon or the full-round Speck, which are families of lightweight block ciphers, is known [8–10]. Third, we need an automated and generalized test tool for checking the safety of various lightweight block ciphers for the Internet of Things [11]. There are various automated techniques that can be used to build distinguishers against block ciphers [12–14]. Because resistance against differential cryptanalysis is an important design criterion for modern block ciphers, most designs rely on finding some upper bound on the probability of differential characteristics [12]. The authors in [13] proposed a truncated searching algorithm which identifies differential characteristics as well as high-probability differential paths. The authors in [14] applied mixed integer linear programming (MILP) to search for differential characteristics and linear approximations in ARX ciphers. However, most automated techniques are limited to searching for differential characteristics and linear approximations. Hence, machine learning- (ML-) based cryptanalysis can be a candidate to solve the above problems.

This paper proposes a generic deep learning- (DL-) based cryptanalysis model that finds the key from known plaintext-ciphertext pairs and shows the feasibility of DL-based cryptanalysis by applying it to lightweight block ciphers. Specifically, we utilize deep neural networks (DNNs) to find the key from known plaintexts. The contribution of this paper is twofold. First, we develop a generic and automated cryptanalysis model based on DL. The proposed DL-based cryptanalysis is a promising step towards a more efficient and automated test for checking the safety of emerging lightweight block ciphers. Second, we perform DL-based attacks on lightweight block ciphers such as S-DES, Simon, and Speck. To our knowledge, this is the first attempt to successfully break the full rounds of Simon32/64 and Speck32/64, although we restrict the block ciphers to a text-based key.

The remainder of this paper is organized as follows: Section 2 presents the related work; Section 3 describes the attack model for cryptanalysis; Section 4 introduces the DL-based approach for the cryptanalysis of lightweight block ciphers and presents the structure of the DNN model; Section 5 describes how to learn and evaluate the model; in Section 6, we apply the DL-based cryptanalysis to lightweight block ciphers and evaluate the performance of the DL-based cryptanalysis; finally, Section 7 concludes this paper.

*Notations*: we use the following notation in the rest of this paper. A plaintext and a ciphertext are denoted by *p* = (*p*_{0}, *p*_{1}, …, *p*_{n−1}) and *c* = (*c*_{0}, *c*_{1}, …, *c*_{n−1}), respectively, where *n* is the block size, *p*_{i} is the *i*th bit of the plaintext, *c*_{i} is the *i*th bit of the ciphertext, and $p_i, c_i \in \{0, 1\}$. A key is denoted by *k* = (*k*_{0}, *k*_{1}, …, *k*_{m−1}), where *m* is the key length and *k*_{i} is the *i*th bit of the key, $k_i \in \{0, 1\}$. Let $k_i^j$ denote the key bits from the *i*th bit to the *j*th bit of the key, that is, $k_i^j = (k_i, k_{i+1}, \ldots, k_j)$. A block cipher is specified by an encryption function, *E*(*p*, *k*), that is, *c* = *E*(*p*, *k*).

#### 2. Related Work

ML has been successfully applied in a wide range of areas with significant performance improvement, including computer vision, natural language processing, speech, and games [15]. The development of ML technologies provides a new direction for cryptanalysis [16]. The relationship between the fields of cryptography and ML was first discussed in [17] in 1991. Since then, many researchers have endeavoured to apply ML technologies to the cryptanalysis of block ciphers.

The studies on the ML-based cryptanalysis can be classified as follows: first, some studies focused on finding the characteristics of block ciphers by using ML technologies. The authors in [18] used a recurrent neural network to find the differential characteristics of block ciphers, where the recurrent neural network represents the substitution functions of a block cipher. The author in [19] applied an artificial neural network to automate attacks on the classical ciphers of a Caesar cipher, a Vigenère cipher, and a substitution cipher, by exploiting known statistical weakness. They trained a neural network to recover the key by providing the relative frequencies of ciphertext letters. Recent work [20] experimentally showed that a CipherGAN, which is a tool based on a generative adversarial network (GAN), can crack language data enciphered using shift and Vigenère ciphers.

Second, some studies used ML technologies to classify encrypted traffic or to identify the cryptographic algorithm from ciphertexts. In [21], an ML-based traffic classification was introduced to identify SSH and Skype encrypted traffic. The authors in [22] constructed three ML-based classification protocols to classify encrypted data. They showed the three protocols, hyperplane decision, Naïve Bayes, and decision trees, efficiently perform a classification when running on real medication data sets. The authors in [23] used a support vector machine (SVM) technique to identify five block cryptographic algorithms, AES, Blowfish, 3DES, RC5, and DES, from ciphertexts. The authors in [24] proposed an unsupervised learning cost function for a sequence classifier without labelled data, and they showed how it can be applied in order to break the Caesar cipher.

Third, other researchers have endeavoured to find the mapping relationship between plaintexts, ciphertexts, and the key, but there are few scientific publications. The work in [25] reported astonishing results for attacking DES and Triple DES, where a neural network was used to find the plaintexts from the ciphertexts. The authors in [26] used a neural network to find the mapping relationship between plaintexts, ciphertexts, and the key in simplified DES (S-DES). The author in [27] developed a feedforward neural network that discovers the plaintext from the ciphertext without the key in the AES cipher. The authors in [28] attacked round-reduced Speck32/64 by using deep residual neural networks, which they trained to distinguish the output of Speck with a given input difference in a chosen-plaintext setting. The attack in [28] is similar to classical differential cryptanalysis. However, the previous work neither attacked the full rounds of lightweight block ciphers nor developed a generic deep learning- (DL-) based cryptanalysis model.

#### 3. System Model

We consider (*n*, *m*) lightweight block ciphers such as S-DES, Simon, and Speck, where *n* is the block size and *m* is the key length. Our objective is to find the key, **k**, when the attacker has access to *M* pairs, $(\mathbf{p}^{(j)}, \mathbf{c}^{(j)})$, of known plaintexts and their resultant ciphertexts encrypted with the same key, that is, $\mathbf{c}^{(j)} = E(\mathbf{p}^{(j)}, \mathbf{k})$, *j* = 1, 2, …, *M*. Hence, the cryptanalytic model is a known-plaintext attack model. Because the algorithms of block ciphers have been publicly released, we assume that the cipher algorithm is known to the attacker.

#### 4. Deep Learning-Based Approach

##### 4.1. DNN Learning Framework

DL learns multiple levels of composition: it uses multiple layers to progressively extract higher-level features from the raw input [29]. In the DL area, a DNN is considered one of the most popular generative models. As a multilayer processor, the DNN is capable of dealing with many nonconvex and nonlinear problems. The feedforward neural network forms a chain, and thus it can be expressed as

$$f(\mathbf{x}; \theta) = f^{(L+1)}\left(f^{(L)}\left(\cdots f^{(1)}(\mathbf{x})\right)\right), \tag{1}$$

where **x** is the input, the parameter $\theta$ consists of the weights **W** and the biases **b**, $f^{(l)}$ is called the *l*th layer of the network, and *L* is the number of hidden layers. Each layer of the network consists of multiple neurons, each of which has an output that is a nonlinear function of a weighted sum of the neurons of its preceding layer. The output of the *j*th neuron at the *l*th layer can be expressed as

$$o_j^{(l)} = g\left(\sum_{i} w_{ij}^{(l)} o_i^{(l-1)} + b_j^{(l)}\right), \tag{2}$$

where $g(\cdot)$ is the nonlinear function, $w_{ij}^{(l)}$ is the weight corresponding to the output of the *i*th neuron at the preceding layer, and $b_j^{(l)}$ is the bias.

We apply a DNN to find the key of lightweight block ciphers. The multilayer perceptron mechanism and the training policy make the DNN a suitable tool for finding affine approximations to the action of a cipher algorithm. We train the DNN by using *N*_{r} plaintext-ciphertext pairs randomly generated with different keys so that the system *f* finds affine approximations to the action of a cipher, as shown in Figure 1. In Figure 1, the loss function can be the mean square error (MSE) between the encryption key, **k**, and the output of the DNN, $\hat{\mathbf{k}}$. The performance of the trained DNN is evaluated by using *N*_{s} pairs randomly generated with different keys. Finally, given *M* known plaintexts, we find the key by using the trained DNN and the majority decision.
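The layer-by-layer computation described above can be sketched in a few lines of plain Python. This is only an illustration: the names `layer_forward` and `network_forward` are hypothetical, ReLU is used as the nonlinearity, and a linear output layer is an assumption, since the text does not state the output activation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# weights[j][i] is the weight from neuron i of the preceding layer to neuron j.
Matrix = Sequence[Sequence[float]]


def relu(value: float) -> float:
    """Rectified linear unit."""
    return max(0.0, value)


def layer_forward(prev: Sequence[float], weights: Matrix,
                  biases: Sequence[float], activate: bool = True) -> Vector:
    """Output of one layer: nonlinearity applied to a weighted sum plus bias."""
    out = []
    for w_row, bias in zip(weights, biases):
        weighted_sum = sum(w * o for w, o in zip(w_row, prev)) + bias
        out.append(relu(weighted_sum) if activate else weighted_sum)
    return out


def network_forward(x: Sequence[float],
                    layers: Sequence[Tuple[Matrix, Sequence[float]]]) -> Vector:
    """Chain the layers; the last (output) layer is kept linear (assumption)."""
    out = list(x)
    for index, (weights, biases) in enumerate(layers):
        is_output_layer = index == len(layers) - 1
        out = layer_forward(out, weights, biases, activate=not is_output_layer)
    return out
```

For instance, a network with one hidden layer of two neurons and a single output is described by two `(weights, biases)` pairs and evaluated with one call to `network_forward`.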

##### 4.2. DNN Structure for the Cryptanalysis

The structure of a DNN model for the cryptanalysis is shown in Figure 2. We consider a ReLU function, $g(x) = \max(0, x)$, as the nonlinear function. The DNN has $\eta_l$ neurons at the *l*th hidden layer, where *l* = 1, …, *L*. Each neuron at the input layer is associated with one bit of the plaintext or ciphertext; that is, the *i*th neuron represents **p**_{i}, and the (*n* + *j*)th neuron represents **c**_{j}, where *i*, *j* = 0, 1, …, *n* − 1. The number of neurons at the input layer is 2*n*. Each neuron at the output layer is associated with one bit of the key; that is, the output of the *i*th neuron corresponds to **k**_{i}, where *i* = 0, 1, …, *m* − 1. Hence, the number of neurons at the output layer is *m*. The output of the DNN, $\hat{\mathbf{k}}$, is a cascade of nonlinear transformations of the input data, $\mathbf{x} = (\mathbf{p}, \mathbf{c})$, mathematically expressed as

$$\hat{\mathbf{k}} = f(\mathbf{x}; \theta) = f^{(L+1)}\left(f^{(L)}\left(\cdots f^{(1)}(\mathbf{x})\right)\right), \tag{3}$$

where *L* is the number of hidden layers and $\theta$ is the weights of the DNN.
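As a small illustration of the input layer, the following sketch packs one plaintext-ciphertext pair into a 2*n*-element input vector, with the ciphertext bits placed directly after the plaintext bits. The helper names and the most-significant-bit-first ordering are assumptions made for the example; the paper does not state a bit order.

```python
from typing import List


def int_to_bits(value: int, width: int) -> List[int]:
    """Bits of `value`, most significant first (ordering is an assumption)."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def encode_input(plaintext: int, ciphertext: int, n: int) -> List[float]:
    """Build the 2n-element input vector: n plaintext bits, then n ciphertext bits."""
    p_bits = int_to_bits(plaintext, n)
    c_bits = int_to_bits(ciphertext, n)
    return [float(bit) for bit in p_bits + c_bits]


# Example with a 4-bit toy block: 8 input neurons in total.
x = encode_input(0b1010, 0b0011, 4)
```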

#### 5. Model Training and Testing

##### 5.1. Data Generation

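The ML algorithm learns from data. Hence, we need to generate a data set for training and testing the DNN. Because the algorithms of modern block ciphers are publicly released, we can generate *N* plaintext-ciphertext pairs with different keys, where *N* = *N*_{r} + *N*_{s}, *N*_{r} is the number of samples used for training the DNN, and *N*_{s} is the number of samples used for testing the DNN. Let the *j*th sample be $(\mathbf{p}^{(j)}, \mathbf{c}^{(j)}, \mathbf{k}^{(j)})$, $j = 1, 2, \ldots, N$, as shown in Figure 3, where $\mathbf{c}^{(j)} = E(\mathbf{p}^{(j)}, \mathbf{k}^{(j)})$ for $j = 1, 2, \ldots, N$, $\mathbf{p}^{(j)} \in \{0, 1\}^n$, and $\mathbf{k}^{(j)} \in \{0, 1\}^m$.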

##### 5.2. Training Phase

The goal of our model is to minimize the difference between the output of the DNN and the key. Let **X** represent the training plaintext-ciphertext pairs $\mathbf{x}^{(j)} = (\mathbf{p}^{(j)}, \mathbf{c}^{(j)})$, and let **K** represent the training keys $\mathbf{k}^{(j)}$ corresponding to the *j*th pair, where $j = 1, 2, \ldots, N_r$.

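The DNN learns the value of the parameter $\theta$ that minimizes the loss function over the training samples, as follows:

$$\theta^{*} = \arg\min_{\theta} \mathcal{L}\left(f(\mathbf{X}; \theta), \mathbf{K}\right), \tag{4}$$

where, because the samples are i.i.d., the MSE loss function can be expressed as follows:

$$\mathcal{L}\left(f(\mathbf{X}; \theta), \mathbf{K}\right) = \frac{1}{N_r} \sum_{j=1}^{N_r} \frac{1}{m} \sum_{i=0}^{m-1} \left(k_i^{(j)} - \hat{k}_i^{(j)}\right)^2, \tag{5}$$

where $N_r$ is the number of training samples, $k_i^{(j)}$ is the *i*th bit of the key corresponding to the *j*th sample, and $\hat{k}_i^{(j)}$ is the *i*th output of the DNN corresponding to the *j*th sample.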
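A minimal sketch of the MSE loss over a batch of samples follows. Averaging over both the samples and the key bits is an assumption about the normalization; a different constant factor would not change the minimizing parameters. The function name `mse_loss` is hypothetical.

```python
from typing import Sequence


def mse_loss(keys: Sequence[Sequence[float]],
             outputs: Sequence[Sequence[float]]) -> float:
    """Mean squared error between true key bits and real-valued DNN outputs."""
    n_samples = len(keys)
    n_bits = len(keys[0])
    total = 0.0
    for key, output in zip(keys, outputs):
        for key_bit, predicted in zip(key, output):
            total += (key_bit - predicted) ** 2
    # Average over samples and over key bits (normalization is an assumption).
    return total / (n_samples * n_bits)
```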

##### 5.3. Test Phase

After training, the performance of the DNN is evaluated in terms of the bit accuracy probability (BAP) of each key bit. Here, the BAP of the *i*th key bit is the number of test samples for which the DNN finds the correct *i*th key bit, divided by the total number of test samples.

Because the output of the DNN is a real number, $\hat{k}_i \in \mathbb{R}$, we quantize the output of the DNN into {0, 1}. The quantized output of the DNN can then be expressed as

$$\tilde{k}_i = \begin{cases} 1, & \hat{k}_i \ge 0.5, \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

Then, the BAP of the *i*th key bit is given as

$$\rho_i = \frac{1}{N_s} \sum_{j=1}^{N_s} \delta\left(k_i^{(j)}, \tilde{k}_i^{(j)}\right), \tag{7}$$

where $N_s$ is the number of test samples. $\delta(a, b)$ is one if the two input values, *a* and *b*, are identical, and zero otherwise. $k_i^{(j)}$ is the *i*th key bit corresponding to the *j*th test sample, and $\tilde{k}_i^{(j)}$ is the quantized output of the DNN with the input of the *j*th test sample.
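The quantization and the per-bit accuracy count can be sketched as follows. The 0.5 threshold is an assumption that matches targets in {0, 1}, and the function names are hypothetical.

```python
from typing import List, Sequence


def quantize(outputs: Sequence[Sequence[float]],
             threshold: float = 0.5) -> List[List[int]]:
    """Map each real-valued output to 0 or 1 (threshold is an assumption)."""
    return [[1 if value >= threshold else 0 for value in row] for row in outputs]


def bit_accuracy(keys: Sequence[Sequence[int]],
                 outputs: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> List[float]:
    """BAP per key bit: fraction of test samples whose quantized bit is correct."""
    quantized = quantize(outputs, threshold)
    n_samples = len(keys)
    n_bits = len(keys[0])
    return [
        sum(1 for j in range(n_samples) if keys[j][i] == quantized[j][i]) / n_samples
        for i in range(n_bits)
    ]
```

A BAP close to 0.5 for a uniformly random key bit means the network does no better than guessing for that bit.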

##### 5.4. Majority Decision When *M* Plaintexts Are Known

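Assume that we have *M* plaintext-ciphertext pairs encrypted with the same key. If the probability of finding the *i*th key bit from one pair is $\rho_i$, then the attack success probability of finding the *i*th key bit, which is the probability of a correct majority decision, is given as

$$\alpha_i = \sum_{j=\lfloor M/2 \rfloor + 1}^{M} \binom{M}{j} \rho_i^{\,j} (1 - \rho_i)^{M-j}. \tag{8}$$

By the de Moivre–Laplace theorem, as *M* grows large, the normal distribution can be used as an approximation to the binomial distribution, as follows:

$$\alpha_i \approx Q\left(\frac{\sqrt{M}\,(1/2 - \rho_i)}{\sqrt{\rho_i (1 - \rho_i)}}\right), \tag{9}$$

where $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\,dt$. Hence, in order to find the *i*th key bit with a success probability greater than or equal to $\tau$, the number of required known plaintexts, for $\rho_i > 1/2$, is

$$M \ge \rho_i (1 - \rho_i) \left(\frac{Q^{-1}(\tau)}{\rho_i - 1/2}\right)^2. \tag{10}$$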
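The majority-decision analysis can be checked numerically with the standard library. The sketch below computes the binomial tail in log space (to avoid underflow for large *M*), the normal approximation, and the resulting bound on the number of known plaintexts. The function names are hypothetical, ties for even *M* are counted as failures, and the bound assumes a per-pair accuracy above 0.5 and a target probability between 0.5 and 1.

```python
import math
from statistics import NormalDist


def majority_success_exact(rho: float, m_pairs: int) -> float:
    """P(more than half of m_pairs independent per-pair guesses are correct)."""
    total = 0.0
    for j in range(m_pairs // 2 + 1, m_pairs + 1):
        log_pmf = (math.lgamma(m_pairs + 1) - math.lgamma(j + 1)
                   - math.lgamma(m_pairs - j + 1)
                   + j * math.log(rho) + (m_pairs - j) * math.log(1.0 - rho))
        total += math.exp(log_pmf)
    return total


def majority_success_normal(rho: float, m_pairs: int) -> float:
    """Normal (de Moivre-Laplace) approximation of the same probability."""
    z = math.sqrt(m_pairs) * (rho - 0.5) / math.sqrt(rho * (1.0 - rho))
    return NormalDist().cdf(z)


def required_plaintexts(rho: float, tau: float) -> int:
    """Smallest M for which the normal approximation reaches probability tau."""
    if not 0.5 < rho < 1.0 or not 0.5 < tau < 1.0:
        raise ValueError("expects 0.5 < rho < 1 and 0.5 < tau < 1")
    z = NormalDist().inv_cdf(tau)
    return math.ceil(rho * (1.0 - rho) * (z / (rho - 0.5)) ** 2)
```

Because the bound scales with the inverse square of the margin above 0.5, a per-pair accuracy only slightly above 0.5 already requires thousands of known plaintexts for a 0.99 success probability.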

#### 6. Performance Evaluation

##### 6.1. Data Set and Performance Metric

For the data set, we generate the plaintext as any combination of random binary digits, that is, $p_i \in \{0, 1\}$. For the encryption key, however, we consider two methods. The first method is a “*random key*,” where the key is any combination of random binary digits, that is, $k_i \in \{0, 1\}$, $i = 0, 1, \ldots, m - 1$. Hence, the probability that the *i*th key bit is one is 0.5 for all *i*. The other method is a “*text key*,” where the key is any combination of characters. For simplicity, as shown in Figure 4, each character is one out of 64 ASCII characters, which consist of the lowercase and uppercase alphabet characters, the 10 digits, and two special characters. Hence, in the text key generation, each group of eight bits belongs to this set of 64 characters, $\mathcal{S}$, that is, $(k_{8i}, k_{8i+1}, \ldots, k_{8i+7}) \in \mathcal{S}$, where $i = 0, 1, \ldots, m/8 - 1$. For example, a 64-bit key consists of 8 characters. In the text key, the probability that the *i*th key bit is one depends on the position of the bit within its character. Let $\mu_i = \max(\mu_i^{0}, \mu_i^{1})$ denote the occurrence probability, where $\mu_i^{b}$ is the probability that the *i*th key bit is $b \in \{0, 1\}$. Figure 5 shows the occurrence probability of the *i*th key bit, $\mu_i$. For example, the first bit of each key character is always 0, and the second bit is one with a probability of 0.828.
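The bias of each bit position in a text key can be computed directly from the character set. The sketch below is only illustrative: the two special characters are not recoverable from the text here, so `-` and `_` are assumed (this choice happens to give 53/64 ≈ 0.828 for the second bit, consistent with the value quoted above). Taking the larger of the two bit probabilities as the occurrence probability is likewise an assumption.

```python
import string
from typing import List

# Assumed 64-character alphabet; the two special characters are a guess.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "-_"


def probability_of_one(alphabet: str) -> List[float]:
    """Pr(bit = 1) for each of the 8 bit positions of a character, MSB first."""
    return [
        sum((ord(ch) >> (7 - pos)) & 1 for ch in alphabet) / len(alphabet)
        for pos in range(8)
    ]


def occurrence_probability(alphabet: str) -> List[float]:
    """Probability of the more likely bit value per position (assumed definition)."""
    return [max(p, 1.0 - p) for p in probability_of_one(alphabet)]


p_one = probability_of_one(ALPHABET)
mu = occurrence_probability(ALPHABET)
```

Subtracting `mu` from the measured BAP of each bit position then isolates what the network learned beyond the bias of the key alphabet.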

Taking the occurrence probability of each key bit into consideration, the performance of finding the *i*th key bit can be expressed as the deviation, as follows:

$$\varepsilon_i = \rho_i - \mu_i, \tag{11}$$

where $\rho_i$ is the BAP and $\mu_i$ is the occurrence probability of the *i*th key bit. If *M* known plaintexts are given, the performance of finding the *i*th key bit is given by $\alpha_i - \mu_i$, where $\alpha_i$, which is the probability of a correct majority decision, is obtained from equation (9).

##### 6.2. Simulation Environment

The performance of the DL-based cryptanalysis is evaluated for the lightweight block ciphers: S-DES, Simon32/64, and Speck32/64, as shown in Table 1.

In order to train the DNN with an acceptable loss rate, it is necessary to expand the network size. Hyperparameters, such as the number of hidden layers, the number of neurons per hidden layer, and the number of epochs, should be tuned in order to minimize a predefined loss function. The traditional approaches to hyperparameter optimization are grid search and random search. Other approaches include Bayesian optimization, gradient-based optimization, evolutionary optimization, and population-based training [30, 31]. Moreover, automated ML (AutoML) has been proposed to design and train neural networks automatically [30]. In our simulation, by using the data sets of the Simon32/64 and Speck32/64 ciphers, we simply perform an exhaustive search to set the number of hidden layers, *L*, and the number of neurons per hidden layer, $\eta$, over a manually specified subset of the hyperparameter space, *L* ∈ {3, 5, 7} and $\eta$ ∈ {128, 256, 512}. Additionally, to reduce the complexity, we choose a smaller number of hidden layers if the performance difference is not greater than 10^{−5}. The error becomes small once the number of epochs exceeds 3000 and is sufficiently minimized at 5000, so we fix the number of epochs at 5000. Consequently, the parameters used for training the DNN models are as follows: the number of hidden layers is 5, the number of neurons at each hidden layer is 512, and the number of epochs is 5000. We use the adaptive moment estimation (Adam) algorithm, which adapts the learning rate, to train the DNN.
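The exhaustive search with the "prefer fewer layers" rule can be sketched as below. The `evaluate` callback is hypothetical and stands in for training a model and returning its loss; breaking remaining ties by fewer neurons is an additional assumption that the text does not state.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Config = Tuple[int, int]  # (number of hidden layers, neurons per hidden layer)


def select_hyperparameters(
    evaluate: Callable[[int, int], float],
    layer_options: Sequence[int] = (3, 5, 7),
    neuron_options: Sequence[int] = (128, 256, 512),
    tolerance: float = 1e-5,
) -> Tuple[Config, Dict[Config, float]]:
    """Grid search; among near-best configurations prefer the smallest network."""
    results = {
        (layers, neurons): evaluate(layers, neurons)
        for layers, neurons in product(layer_options, neuron_options)
    }
    best_loss = min(results.values())
    # Keep every configuration whose loss is within `tolerance` of the best.
    near_best = [cfg for cfg, loss in results.items() if loss - best_loss <= tolerance]
    # Tuple ordering picks fewer layers first, then fewer neurons (assumption).
    return min(near_best), results
```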

We use *TensorFlow* to design and train the DNN. The experiments run on a GPU-based server equipped with an Nvidia GeForce RTX 2080 Ti GPU and an Intel Core i9-9900K CPU. The implemented DL-based cryptanalysis tool is shown in Figure 6. The GUI was implemented by using PyQt on Python 3.7. The implemented tool provides various combinations of ML architectures, hyperparameters, and training/test samples.

##### 6.3. Simplified DES

###### 6.3.1. Overview of S-DES

S-DES, designed for educational purposes in 1996, has properties and a structure similar to those of DES but has been simplified to make it easier to perform encryption and decryption [32]. S-DES has an 8-bit block size and a 10-bit key size. The encryption algorithm involves five functions: an initial permutation (IP); a complex function labelled *f*_{K}, which involves both permutation and substitution operations and depends on a key input; a simple permutation function that switches the two halves of the data; the function *f*_{K} again; and finally a permutation function that is the inverse of the initial permutation (IP^{−1}). S-DES may be said to have two rounds of the function *f*_{K}.
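The five-function structure can be made concrete with a short sketch. The permutation and S-box tables below are the ones commonly given in textbook descriptions of S-DES; they are not listed in this paper, so they should be checked against [32] before being relied on. Bits are handled as lists of 0/1 integers, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

P10 = (3, 5, 2, 7, 4, 10, 1, 9, 8, 6)
P8 = (6, 3, 7, 4, 8, 5, 10, 9)
IP = (2, 6, 3, 1, 4, 8, 5, 7)
IP_INV = (4, 1, 3, 5, 7, 2, 8, 6)
EP = (4, 1, 2, 3, 2, 3, 4, 1)
P4 = (2, 4, 3, 1)
S0 = ((1, 0, 3, 2), (3, 2, 1, 0), (0, 2, 1, 3), (3, 1, 3, 2))
S1 = ((0, 1, 2, 3), (2, 0, 1, 3), (3, 0, 1, 0), (2, 1, 0, 3))


def bits_from_str(text: str) -> List[int]:
    return [int(ch) for ch in text]


def permute(bits: Sequence[int], table: Sequence[int]) -> List[int]:
    """Apply a 1-indexed permutation (or expansion) table."""
    return [bits[i - 1] for i in table]


def rotate_left(bits: List[int], count: int) -> List[int]:
    return bits[count:] + bits[:count]


def subkeys(key: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Derive the two 8-bit round keys from the 10-bit key."""
    k = permute(key, P10)
    left, right = rotate_left(k[:5], 1), rotate_left(k[5:], 1)
    k1 = permute(left + right, P8)
    left, right = rotate_left(left, 2), rotate_left(right, 2)
    return k1, permute(left + right, P8)


def sbox(bits4: Sequence[int], box) -> List[int]:
    """Row from bits 1 and 4, column from bits 2 and 3; 2-bit output."""
    value = box[2 * bits4[0] + bits4[3]][2 * bits4[1] + bits4[2]]
    return [value >> 1, value & 1]


def f_k(bits: List[int], subkey: Sequence[int]) -> List[int]:
    """Round function: left half is XORed with F(right half, subkey)."""
    left, right = bits[:4], bits[4:]
    t = [a ^ b for a, b in zip(permute(right, EP), subkey)]
    s = permute(sbox(t[:4], S0) + sbox(t[4:], S1), P4)
    return [a ^ b for a, b in zip(left, s)] + right


def sdes_crypt(block: Sequence[int], key: Sequence[int],
               decrypt: bool = False) -> List[int]:
    """IP, f_K, half swap, f_K, inverse IP; decryption swaps the round keys."""
    k1, k2 = subkeys(key)
    if decrypt:
        k1, k2 = k2, k1
    x = f_k(permute(block, IP), k1)
    x = f_k(x[4:] + x[:4], k2)
    return permute(x, IP_INV)
```

With only 2^10 keys, an implementation like this also makes the exhaustive key search mentioned in this section straightforward to reproduce.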

Because the key is short, a brute-force attack, also known as an exhaustive key search, is feasible. Some previous work presented approaches for breaking the key by using a genetic algorithm and particle swarm optimization [33, 34] and concluded that the genetic algorithm is a better approach than brute force for analysing S-DES.

###### 6.3.2. Test Results

For training and testing the DNN, we generate *N* plaintext-ciphertext pairs with different keys, as follows:

$$\left(\mathbf{p}^{(j)}, \mathbf{c}^{(j)}, \mathbf{k}^{(j)}\right), \quad j = 1, 2, \ldots, N,$$

where $\mathbf{c}^{(j)} = E(\mathbf{p}^{(j)}, \mathbf{k}^{(j)})$ for $j = 1, 2, \ldots, N$ and $N = N_r + N_s$. Here, $N_r$ is the number of samples for training and $N_s$ is the number of samples for testing. In the simulation, we use *N*_{r} = 50000 and *N*_{s} = 10000. The plaintext is any combination of random binary digits, that is, $p_i \in \{0, 1\}$. We generate the encryption key by using two methods: a random key and a text key. In S-DES with a 10-bit key, the text key is a combination of one character and two random binary bits.

Figure 7 shows the BAP of the DNN when we apply a random key and a text key. The results show the DL-based cryptanalysis can break the S-DES cipher. When we apply a random key, the key bits, **k**_{1}, **k**_{5}, and **k**_{8}, are quite vulnerable to the attack and the key bit of **k**_{6} is the safest. Because the minimum value of the BAP is at the 6th key bit, from equation (10), we need known plaintexts to find all the key bits with a probability of and we need known plaintexts to find all the key bits with a probability of . When we apply a text key, the BAP becomes high, thanks to the bias of the occurrence probability of each key bit, , as shown in Figure 5. Because the minimum value of the BAP is at the 6th key bit, from equation (10), we need known plaintexts to find all the key bits with a probability of and we need known plaintexts to find all the key bits with a probability of .

Figure 8 shows the deviation between the BAP and the occurrence probability of each key bit. Because of the bias of the occurrence probability of each key bit in the text key, we need to eliminate the bias characteristics of each key bit. The DNN shows that the key bits, which are quite vulnerable to the attack, are (**k**_{2}, **k**_{5}, **k**_{8}) in the text key and (**k**_{1}, **k**_{5}, **k**_{8}) in the random key. The key bit of **k**_{6} is the safest both in the text key and in the random key.

##### 6.4. Lightweight Block Ciphers

###### 6.4.1. Overview of Simon and Speck

Lightweight cryptography is a rapidly evolving and active area, which is driven by the need to provide security or cryptographic measures to resource-constrained devices such as mobile phones, smart cards, RFID tags, and sensor networks. Simon and Speck are families of lightweight block ciphers publicly released in 2013 [35, 36]. Simon has been optimized for performance in hardware implementations, while Speck has been optimized for software implementations. The Simon block cipher is a balanced Feistel cipher with a *u*-bit word, and therefore, the block length is *n* = 2*u*. The key length, *m*, is 2*u*, 3*u*, or 4*u*. Simon supports various combinations of block sizes, key sizes, and numbers of rounds [35]. In this paper, we consider Simon32/64, which refers to the cipher operating on a 32-bit plaintext block with a 64-bit key. Speck is an add-rotate-xor (ARX) cipher. The block of Speck is always two words, but the words may be 16, 24, 32, 48, or 64 bits in size. The corresponding key is 2, 3, or 4 words. Speck also supports various combinations of block sizes, key sizes, and numbers of rounds [35].

As of 2018, no successful attack on full-round Simon or full-round Speck of any variant is known. The authors in [37] showed differential attacks of up to slightly more than half of the number of rounds for Simon and Speck families of block ciphers. The authors in [38] showed an integral attack on 24-round Simon32/64 with time complexity of 2^{63} and the data complexity of 2^{32}. The work in [39] showed an improved differential attack on 14-round Speck32/64 with time complexity of 2^{63} and the data complexity of 2^{31}.

###### 6.4.2. Data Generation

For training and testing the DNN, we generate *N* plaintext-ciphertext pairs with different keys, as follows:

$$\left(\mathbf{p}^{(j)}, \mathbf{c}^{(j)}, \mathbf{k}^{(j)}\right), \quad j = 1, 2, \ldots, N,$$

where $\mathbf{c}^{(j)} = E(\mathbf{p}^{(j)}, \mathbf{k}^{(j)})$ and $N = N_r + N_s$. Here, $N_r$ is the number of samples for training and $N_s$ is the number of samples for testing. The plaintext is any combination of random binary digits, that is, $p_i \in \{0, 1\}$. We generate the encryption key by using two methods: a random key and a text key. In the text key, the 64-bit key consists of 8 characters, where each character is one of the 64-character set, $\mathcal{S}$. Hence, although the total keyspace is 2^{64}, the actual keyspace is reduced to 2^{48}. For training, we use $N_r$ samples, and for the test, we use $N_s$ samples.
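The data generation for Speck32/64 with a text key can be sketched as follows. The cipher follows the designers' published round function and key schedule for the 32/64 variant (16-bit words, rotations by 7 and 2, 22 rounds). Everything else is an assumption made for illustration: the two special characters in the alphabet (`-` and `_`), the mapping of the eight key characters to the four key words, the bit ordering, and the helper names.

```python
import random
import string
from typing import List, Sequence, Tuple

MASK = 0xFFFF   # 16-bit words
ROUNDS = 22     # Speck32/64
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "-_"


def ror(x: int, r: int) -> int:
    return ((x >> r) | (x << (16 - r))) & MASK


def rol(x: int, r: int) -> int:
    return ((x << r) | (x >> (16 - r))) & MASK


def expand_key(key_words: Sequence[int]) -> List[int]:
    """Round keys from (l2, l1, l0, k0), the word order of the test vectors."""
    l2, l1, l0, k0 = key_words
    l, k = [l0, l1, l2], [k0]
    for i in range(ROUNDS - 1):
        l.append(((k[i] + ror(l[i], 7)) & MASK) ^ i)
        k.append(rol(k[i], 2) ^ l[i + 3])
    return k


def speck_encrypt(x: int, y: int, round_keys: Sequence[int]) -> Tuple[int, int]:
    """Add-rotate-xor round applied once per round key."""
    for rk in round_keys:
        x = ((ror(x, 7) + y) & MASK) ^ rk
        y = rol(y, 2) ^ x
    return x, y


def to_bits(value: int, width: int) -> List[int]:
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def text_key(rng: random.Random) -> Tuple[int, ...]:
    """Eight random characters packed into four 16-bit words (assumed packing)."""
    raw = "".join(rng.choice(ALPHABET) for _ in range(8)).encode("ascii")
    return tuple(int.from_bytes(raw[i:i + 2], "big") for i in range(0, 8, 2))


def make_sample(rng: random.Random) -> Tuple[List[int], List[int]]:
    """One sample: 64 input bits (plaintext, ciphertext) and 64 key-bit labels."""
    key = text_key(rng)
    x, y = rng.getrandbits(16), rng.getrandbits(16)
    cx, cy = speck_encrypt(x, y, expand_key(key))
    features = to_bits(x, 16) + to_bits(y, 16) + to_bits(cx, 16) + to_bits(cy, 16)
    labels = [bit for word in key for bit in to_bits(word, 16)]
    return features, labels
```

Calling `make_sample` repeatedly with a seeded `random.Random` yields a reproducible data set; replacing `text_key` with four calls to `rng.getrandbits(16)` gives the random-key variant.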

###### 6.4.3. Test Results

Figure 9 shows the BAP of Simon32/64 with a random key, in units of characters. The BAP of each key bit varies randomly with an average of almost 0.5. Moreover, the results vary from simulation to simulation with different hyperparameters. That is, the DNN failed to attack Simon32/64 with a random key.

Figure 10 shows the BAP and the deviation of the Simon32/64 with a text key in unit of character. The BAP of each key bit is almost identical to the occurrence probability of the text key because the DNN learns the characteristics of the training data. However, when we eliminate the bias characteristics of the text key, the DNN shows the positive deviations, which means the DNN can break a Simon32/64 with a text key. For example, from equation (10), we need just known plaintexts in order to find the key bit of **k**_{2} with a probability of 0.99. The minimum value of BAPs is 0.51603 at **k**_{3}, which is greater than by about , except the last bits of each character. Hence, we can find the encryption key with a probability of given known plaintexts, and we can find the encryption key with a probability of given known plaintexts.

Figure 11 shows the BAP of Speck32/64 with a random key, in units of characters. The BAP of each key bit varies randomly with an average of almost 0.5, similar to the results for Simon32/64. Moreover, the results vary with different hyperparameters. That is, the DL-based attacks against Speck32/64 with a random key failed.

Figure 12 shows the BAP and the deviation of the Speck32/64 with a text key in unit of character. The DNN shows the positive deviations. That is, the DNN shows the possibility of breaking a Speck32/64 with a text key. The minimum value of BAPs is 0.51607 at **k**_{3}, which is greater than by about , except the last bits of each character. Hence, we can find the encryption key with a probability of given known plaintexts, and we can find the encryption key with a probability of given known plaintexts.

#### 7. Conclusions

We developed a DL-based cryptanalysis model and evaluated the performance of the DL-based attack on the S-DES, Simon32/64, and Speck32/64 ciphers. The DL-based cryptanalysis may successfully find the text-based encryption key of the block ciphers. When a text key is applied, the DL-based attack broke the S-DES cipher with a success probability of 0.9 given 2^{8.08} known plaintexts. That is, the DL-based cryptanalysis reduces the search space nearly by a factor of 8. Moreover, when a text key is applied to the block ciphers, the DL-based cryptanalysis finds the linear approximations between the plaintext-ciphertext pairs and the key, and therefore, it successfully broke the full rounds of Simon32/64 and Speck32/64. When a text key is applied, with a success probability of 0.99, the DL-based cryptanalysis finds 56 bits of Simon32/64 with 2^{12.34} known plaintexts and 56 bits of Speck32/64 with 2^{12.33} known plaintexts, respectively. Because the developed DL-based cryptanalysis framework is generic, it can be applied to attacks on other block ciphers without change.

The drawback of the proposed DL-based cryptanalysis is that the keyspace is restricted to text-based keys. However, although uncommon, a text-based key can be used to encrypt. For example, a login password entered with the keyboard can be text based if the input data are not hashed. Modern cryptographic functions are designed to look random and to be very complex, and therefore, it can be difficult for ML to find meaningful relationships between the inputs and the outputs if the keyspace is not restricted. Hence, our approach limited the keyspace to text-based keys, and under this restriction the proposed DL-based cryptanalysis could successfully break the 32-bit variants of the Simon and Speck ciphers. When the keyspace is not limited, the DL-based cryptanalysis fails to attack the block ciphers. In the future, the accuracy of ML is expected to improve, thanks to the development of algorithms and hardware. Moreover, advanced data transformations that efficiently map cryptographic data onto ML data will help the DL-based cryptanalysis to be performed without the keyspace restriction.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (nos. 2019R1F1A1058716 and 2020R1F1A1065109).