Abstract
Deep learning has played an important role in many fields and shows significant potential for cryptanalysis. Although existing works have opened a new direction of machine-learning-aided cryptanalysis, a research gap remains: how can neural distinguishers be further improved? In this paper, we propose a new algorithm and a new model to improve neural distinguishers in terms of accuracy and the number of rounds. First, we design an algorithm based on SAT to improve neural distinguishers. With the help of a SAT/SMT solver, we obtain new effective neural distinguishers of SIMON using the input differences of high-probability differential characteristics. Second, we propose a new neural distinguisher model using multiple output differences. Inspired by existing works and by data augmentation in deep learning, we use the output differences to exploit more derived features and train neural distinguishers by splicing output differences into a matrix as a sample. Based on the new model, we construct neural distinguishers of SIMON and SPECK that cover more rounds with higher accuracy. Using our neural distinguishers, we can better distinguish reduced-round SIMON or SPECK from a pseudorandom permutation.
1. Introduction
Deep learning has brought significant improvements to many fields [1–3], and it has also inspired new approaches to cryptanalysis. As early as 1991, Ronald Rivest [4] discussed the similarities and differences between machine learning and cryptography and analysed the application of machine learning in the field of cryptography. In recent years, deep learning has also been applied to side-channel analysis [5, 6], where it was shown that sensitive information on embedded devices can be effectively extracted by training neural networks.
At Crypto 2019, Gohr [7] showed that deep learning can produce very powerful cryptographic distinguishers and indicated that the neural distinguisher was better than the distinguisher obtained by the traditional approach. He used a fixed input difference to train neural distinguishers of SPECK32/64 [8] based on deep residual neural networks (ResNets) [9]. If the accuracy of a neural distinguisher exceeds 0.5, it can distinguish the target cipher from a pseudorandom permutation. Gohr's work is a giant leap in differential cryptanalysis based on deep learning. However, it also left several questions open.
Why are neural distinguishers effective? How can neural distinguishers be improved in terms of accuracy and the number of rounds?
At Eurocrypt 2021, Benamira et al. [10] proposed a detailed analysis and thorough explanation of the inner workings of Gohr's distinguishers. They showed that Gohr's neural distinguisher was in fact building a very good approximation of the differential distribution table (DDT). Based on this, Benamira et al. also constructed an 8-round distinguisher of SIMON32/64. Similarly, Chen and Yu [11] bridged machine learning and cryptanalysis via the extended differential-linear connectivity table. Together, [10, 11] answer the first question. In addition to these works on the inner workings of neural distinguishers, there are some works on improving them. In [12], Chen and Yu designed a new neural distinguisher model using multiple ciphertext pairs instead of a single ciphertext pair. The new neural distinguisher can be used to improve the key recovery attack on 11-round SPECK32/64. However, Chen and Yu did not explore improving the accuracy from the perspective of the input difference or output difference, which limits the search for neural distinguishers covering more rounds. In [13], Su et al. constructed polytopic neural distinguishers of round-reduced SIMON32/64. Their work partially answered the second question, yet it is still worth studying, especially with respect to selecting the input differences and the data format. Beyond neural distinguishers, there are also some works on neural-aided key recovery attacks [14–16].
Further improvement of neural distinguishers is therefore still worth studying, especially in accuracy and the number of rounds: if the distinguishing accuracy is increased, the complexity of key search can be reduced, and if the number of rounds is increased, the key recovery attack can be extended. Unfortunately, few works explore how to improve neural distinguishers from the perspective of the input difference. Besides, neural distinguishers can be improved by using other distinguisher models. Inspired by these existing works, our core target is to answer the second question, that is, to further improve neural distinguishers in terms of accuracy and the number of rounds.
In this paper, our contributions are as follows.
An algorithm based on SAT is designed to improve neural distinguishers and is applied to SIMON. In [7], Gohr chose 0x0040/0000 as the input difference to train his distinguisher because it transitions deterministically to a low-weight output difference. But such input differences are hard to find, which makes it difficult to find effective distinguishers. To solve this problem, we propose an algorithm based on SAT to improve neural distinguishers. With the help of this automatic search tool, we search for exact r-round differential characteristics whose probability lies in a range determined by the optimal probability and the block size, and we choose their input differences to train r-round neural distinguishers. Using the algorithm, we obtain, for the first time, neural distinguishers of 9-round SIMON32/64, 10-round SIMON48/96, and 11-round SIMON64/128 with accuracy exceeding 57%. Compared with the choice of input difference presented in [10], our algorithm obtains higher-accuracy neural distinguishers. Our results are shown in Table 1.
A new neural distinguisher model using multiple output differences is proposed, and neural distinguishers of SIMON and SPECK are improved. In image recognition based on deep learning, researchers enhance certain features of pictures so that the neural network can learn more effective features, which improves the accuracy of the network. In [10], Benamira et al. explored the connection between Gohr's distinguisher and the DDT, which suggests that the output difference is helpful for improving neural distinguishers. This also implies that we can selectively enhance certain features of the output difference. Unlike [7, 12], which use ciphertext pairs as training data, we use the output differences to train neural distinguishers by splicing output differences into a matrix as a sample. We treat each matrix as an image and each output difference in the matrix as an objective feature. Our goal is not only to learn each objective feature but also to learn the connections between output differences. If all output differences of the matrix are from the same input difference, the matrix is labeled 1; otherwise, it is labeled 0. Because the new model learns more features than a model using ciphertext pairs, we improve the neural distinguishers of SIMON32/64, SIMON48/96, and SIMON64/128. Besides, we obtain new neural distinguishers of 8-round SPECK32/64, 7-round SPECK48/96, and 8-round SPECK64/128, which are better than the existing neural distinguishers. Using our improved neural distinguishers, we can better distinguish reduced-round SIMON or SPECK from a pseudorandom permutation. In addition, we show with experiments that the improvement in accuracy is not due to the increase in the number of plaintexts but to learning more features from the relationship between the output differences. The summary of our neural distinguishers together with other neural distinguishers is shown in Table 1.
The remainder of this paper is organised as follows. In Section 2, we introduce the basic notations and review Gohr's distinguishers. In Section 3, we design an algorithm based on SAT to help us find high-accuracy neural distinguishers. In Section 4, we propose a new neural distinguisher model to further improve neural distinguishers. Conclusions are drawn in Section 5, where we also suggest further work.
2. Preliminaries
To make it easier to read this paper, we first list the main notations. Then an overview of Gohr's work is given.
2.1. Notations
SIMON2n/mn  SIMON acting on 2n-bit plaintext blocks and using an mn-bit key
SPECK2n/mn  SPECK acting on 2n-bit plaintext blocks and using an mn-bit key
Bitwise XOR  ⊕
Bitwise AND  &
Bitwise OR  |
Addition modulo 2^n  ⊞
Left circular shift by j bits  S^j
Master key  
round subkey 
2.2. Overview of Gohr's Distinguisher Model
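Given a fixed input difference $\Delta$ and a plaintext pair $(P_0, P_1)$, the resulting ciphertext pair $(C_0, C_1)$ is regarded as a sample. Each sample is attached a label $Y$:
$$Y = \begin{cases} 1, & \text{if } P_0 \oplus P_1 = \Delta, \\ 0, & \text{otherwise.} \end{cases}$$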
A neural network is trained over enough samples labeled 1 and 0, with half of the training data coming from ciphertext pairs labeled 1 and the other half from ciphertext pairs labeled 0. For the samples with label 1, the ciphertext pairs follow a specific distribution related to the fixed input difference. For the samples with label 0, the ciphertext pairs follow a uniform distribution because their input difference is random. If a neural network obtains a stable distinguishing accuracy higher than 50% on a test set, we call the trained neural network a neural distinguisher. Notably, each sample is encrypted under a fresh random key, so the neural distinguisher works regardless of the key in use. In [7], Gohr chose deep residual neural networks to train neural distinguishers and obtained effective neural distinguishers of 5-round, 6-round, and 7-round SPECK32/64.
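As an illustration of the sampling procedure described above, the following minimal Python sketch generates labeled ciphertext pairs for reduced-round SPECK32/64. The helper names (`expand_key`, `encrypt`, `make_samples`), the alternating-label scheme used to get an exact half/half split, and the fixed seed are illustrative assumptions rather than the authors' code; the round function and key schedule follow the public SPECK specification [8].

```python
import random

WORD = 16                      # word size n of SPECK32/64
MASK = (1 << WORD) - 1
ALPHA, BETA = 7, 2             # rotation constants of SPECK32/64


def rol(x, r):
    """Left circular shift of a 16-bit word."""
    return ((x << r) & MASK) | (x >> (WORD - r))


def ror(x, r):
    """Right circular shift of a 16-bit word."""
    return (x >> r) | ((x << (WORD - r)) & MASK)


def speck_round(x, y, k):
    """One SPECK round on the word pair (x, y) with round subkey k."""
    x = ((ror(x, ALPHA) + y) & MASK) ^ k
    y = rol(y, BETA) ^ x
    return x, y


def expand_key(key, rounds):
    """Derive the round subkeys from a master key (l2, l1, l0, k0)."""
    subkeys = [key[3]]
    l = [key[2], key[1], key[0]]
    for i in range(rounds - 1):
        l[i % 3], k_next = speck_round(l[i % 3], subkeys[i], i)
        subkeys.append(k_next)
    return subkeys


def encrypt(plaintext, subkeys):
    """Encrypt one block with as many rounds as there are subkeys."""
    x, y = plaintext
    for k in subkeys:
        x, y = speck_round(x, y, k)
    return x, y


def make_samples(count, rounds, delta=(0x0040, 0x0000), seed=0):
    """Return a list of (C0, C1, label) with alternating labels 0 and 1."""
    rng = random.Random(seed)
    samples = []
    for i in range(count):
        label = i % 2
        # A fresh random master key for every sample.
        key = tuple(rng.getrandbits(WORD) for _ in range(4))
        subkeys = expand_key(key, rounds)
        p0 = (rng.getrandbits(WORD), rng.getrandbits(WORD))
        if label == 1:
            p1 = (p0[0] ^ delta[0], p0[1] ^ delta[1])   # fixed input difference
        else:
            p1 = (rng.getrandbits(WORD), rng.getrandbits(WORD))  # random difference
        samples.append((encrypt(p0, subkeys), encrypt(p1, subkeys), label))
    return samples
```

For example, `make_samples(10**6, 7)` would produce one million labeled pairs for 7-round SPECK32/64; a real training pipeline would additionally convert each pair into a bit vector before feeding it to the network.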
In a traditional differential attack, it is pivotal to distinguish the encryption function from a pseudorandom permutation, which is done with the help of a differential characteristic. Given an r-round optimal characteristic of a block cipher, we encrypt plaintext pairs with the fixed input difference of the characteristic and compute their output differences. If the fraction of pairs whose output difference matches that of the characteristic is close to the probability of the characteristic, then we can distinguish the block cipher from a pseudorandom permutation. This is called a distinguishing attack on block ciphers.
For Gohr's neural distinguisher, we obtain ciphertext pairs from plaintext pairs with the fixed input difference, feed them to the neural distinguisher, and let it predict their labels. If the ratio of samples predicted as label 1 exceeds 0.5, we can distinguish the block cipher from a pseudorandom permutation, and the neural distinguisher is effective. This is called a distinguishing attack based on the neural distinguisher. The higher the accuracy of the neural distinguisher, the better the distinguishing attack, and the complexity of key search can also be reduced if the distinguishing accuracy is greatly increased. It is therefore necessary to improve neural distinguishers.
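The decision rule of this distinguishing attack can be sketched in a few lines of Python. The function name `distinguish` and the list-of-labels input are assumptions made for illustration; in practice the labels would come from a trained neural distinguisher.

```python
def distinguish(predicted_labels, threshold=0.5):
    """Return True if the ratio of samples predicted as label 1 exceeds the threshold.

    predicted_labels: sequence of 0/1 predictions produced by a (hypothetical)
    trained neural distinguisher on the queried ciphertext pairs.
    """
    if not predicted_labels:
        raise ValueError("need at least one prediction")
    ratio = sum(predicted_labels) / len(predicted_labels)
    return ratio > threshold
```

With predictions `[1, 1, 0, 1]` the ratio is 0.75, so the data would be attributed to the block cipher; with `[0, 1, 0, 0]` it would not.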
In [7], Gohr explained that he chose 0x0040/0000 as the input difference because it transitions deterministically to the low-weight difference 0x8000/8000. But it is hard to find such input differences unless the full differential distribution table is used. Moreover, computing the full DDT is time-consuming, especially for block ciphers with large block sizes.
3. An Approach Based on SAT to Improve Neural Distinguisher
In traditional differential cryptanalysis, a primary task is to find a high-probability differential characteristic, which takes advantage of the unevenness of the differential distribution. The distribution of output differences differs between input differences. A neural distinguisher actually learns the distribution of the output difference given a fixed input difference. Therefore, the input difference directly affects the accuracy of the neural distinguisher.
In [7], Gohr chose 0x0040/0000 as the input difference to train the distinguisher because it transitions deterministically to a low-weight output difference. But such input differences are hard to find, which makes it difficult to find effective distinguishers. In [10], Benamira et al. chose the input difference from (r-1)-round or (r-2)-round optimal differential characteristics for r-round neural distinguishers.
In this section, we introduce our algorithm for improving r-round neural distinguishers by searching for r-round differential characteristics. With the help of a SAT/SMT solver, we search for high-probability differential characteristics whose probability lies in a range determined by the optimal probability and the block size. Using our algorithm, we obtain high-accuracy neural distinguishers for 9-round SIMON32/64, 10-round SIMON48/96, and 11-round SIMON64/128.
3.1. Generic Network Architecture
Gohr converted the task of distinguishing ciphertext pairs into a binary classification problem. His method is applicable not only to SPECK but also to SIMON. With his method, we can construct a generic network architecture for other ciphers. We refer to [7] for the description of how the network architecture is constructed.
There are multiple neural networks available to train neural distinguishers, such as MLPs and ResNets. We choose ResNets to train our neural distinguishers.
Our networks comprise three main components: the input layer, the iteration layer, and the prediction layer, as shown in Figure 1, where n refers to the word size of SIMON2n/mn. The input layer receives training data of fixed length. In the iteration layer, we use 5 residual blocks. In each residual block, we use two Conv1D layers, and each Conv1D layer is followed by a batch normalization layer and an activation layer. After the data from the iteration layer are flattened, they are sent into a fully connected layer, which consists of a hidden layer and an output unit.
[Figure 1: Network architecture, panels (a), (b), and (c).]
In our network, the kernel size of the first Conv1D layer is 1 and the kernel size of the other Conv1D layers is 3. In addition, every convolutional layer uses the same number of filters, and the padding method is SAME. Finally, we train our network with L2 weight regularization to avoid overfitting. The other hyperparameters are given in Table 2. We choose hyperparameters similar to Gohr's, so that the influence of the neural network and its parameters can be ignored. After the neural distinguisher is trained, we can use it to distinguish the output of the target cipher under a given input difference from random data. The higher its accuracy on the test set, the better it distinguishes ciphertext data.
3.2. An Algorithm Based on SAT to Improve Neural Distinguisher
SAT is the Boolean satisfiability problem. It is NP-complete and asks whether there is an assignment to Boolean variables that satisfies a given set of Boolean conditions. As a key problem of computer science and artificial intelligence, SAT has received a lot of attention since it was proposed. SAT solvers offer open-source implementations, good interfaces, high efficiency, and good compatibility. There are many cryptanalysis results based on SAT [18–20].
At present, there are two main ways to select the input differences of neural distinguishers. One way is to directly choose an existing optimal differential characteristic [12], and the other is to choose (r-1)-round or (r-2)-round optimal differential characteristics for r-round neural distinguishers [10]. But these methods cannot effectively increase the distinguishing accuracy.
Taking into account the unevenness of the distribution of output differences for different input differences, we choose the input differences of high-probability differential characteristics as the candidate differences. We search for high-probability differential characteristics with a SAT-based automatic search tool and train neural distinguishers with the input differences of these characteristics. Based on this, we design an algorithm to help us search for neural distinguishers with higher accuracy, which is shown in Algorithm 1. In Algorithm 1, we expand the search space of input differences by expanding the range of the probability. The lower bound of the probability is derived from the probability of the optimal differential characteristic and the block size of the target cipher. In our experiments, we find that if the differential probability is below this lower bound, there are almost no high-accuracy neural distinguishers, so there is little need to spend time on differential characteristics with lower probability.
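Algorithm 1 itself is not reproduced in this text, so the following Python sketch is only one reading of the prose description. The names `select_input_differences` and `train_and_evaluate`, the `(input_difference, probability)` candidate format, and the explicit `p_lower` argument are all assumptions; the candidates would come from a SAT/SMT search, and the low-Hamming-weight-first ordering follows the remark made later in this section.

```python
def hamming_weight(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def select_input_differences(candidates, train_and_evaluate, p_lower,
                             min_accuracy=0.5):
    """Keep the input differences that yield an effective neural distinguisher.

    candidates:          iterable of (input_difference, probability) pairs, e.g.
                         taken from characteristics found by a SAT/SMT solver.
    train_and_evaluate:  hypothetical callable that trains a distinguisher for
                         one input difference and returns its test accuracy.
    p_lower:             lower bound on the characteristic probability
                         (supplied by the caller).
    """
    kept = []
    # Try input differences with low Hamming weight first.
    for diff, prob in sorted(candidates, key=lambda c: hamming_weight(c[0])):
        if prob < p_lower:
            continue                      # below the bound: skip training
        accuracy = train_and_evaluate(diff)
        if accuracy > min_accuracy:       # effective distinguisher
            kept.append((diff, prob, accuracy))
    # Best distinguishers first.
    return sorted(kept, key=lambda t: t[2], reverse=True)
```

Passing the training routine in as a callable keeps the selection logic independent of the network and of the cipher under analysis.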
Using Theorem 3 in [19] and the open-source SAT/SMT solver Z3 [21], we search for high-probability differential characteristics of SIMON. Then, with Algorithm 1, we obtain 9-, 10-, and 11-round neural distinguishers of SIMON32/64, SIMON48/96, and SIMON64/128, respectively. This is the first neural distinguisher of 11-round SIMON64/128. The results are shown in Table 3.

To show that Algorithm 1 is effective, we use the other two methods [10, 12] of selecting the input difference to train neural distinguishers with the same number of rounds. For the method presented in [12], we take the input difference from [17] to train a 9-round neural distinguisher. For the method presented in [10], we train 10-round neural distinguishers of SIMON48/96 using 9-round and 8-round optimal differential characteristics. The specific results are shown in Tables 4 and 5.
In Table 3, we use the same data format as Gohr's distinguisher, namely a single ciphertext pair. Other hyperparameters are listed in Table 2. Table 3 compares the accuracy obtained with the three methods of selecting the input difference. Compared with selecting the input difference as in [10, 17], the accuracy of the neural distinguishers obtained by Algorithm 1 is significantly higher, which can be used to reduce the complexity of key recovery attacks. Although both approaches select the input difference from differential characteristics, Algorithm 1 matches the number of rounds of the differential characteristic exactly to the number of rounds of the neural distinguisher.
We also tried to search for neural distinguishers covering more rounds. Unfortunately, as the number of rounds increases, the nonrandom features of the ciphertext pairs become weaker and weaker, so it is difficult to find a neural distinguisher for more rounds even with Algorithm 1. In addition, the higher the Hamming weight of the input difference, the weaker the nonrandom features of the ciphertext pairs. So, if time is limited, Algorithm 1 should first be applied to input differences with lower Hamming weight.
4. A New Neural Distinguisher Model Using Multiple Output Differences
In [10], Benamira et al. showed that the neural distinguisher generally relies not only on the differential distribution of ciphertext pairs but also on the differential distribution in the penultimate and antepenultimate rounds. This raises the question of whether we can directly use the output differences to train neural distinguishers. Unlike [7, 12], which use ciphertext pairs as samples, we design a new neural distinguisher model with multiple output differences as a sample. Using the new model, we obtain high-accuracy neural distinguishers for 10-round SIMON32/64, 11-round SIMON48/96, 12-round SIMON64/128, 8-round SPECK32/64, 7-round SPECK48/96, and 8-round SPECK64/128. Additionally, we show with experiments that the improvement in accuracy is not due to the increase in the number of plaintexts but to learning more features from the relationship between the output differences.
4.1. New Neural Distinguisher Model
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. Deep learning is data-driven, and the quality of the data determines the quality of the model to some extent. For neural distinguishers, the choice of ciphertext pairs directly affects the accuracy, which was addressed in Section 3. The format of the training data also affects the quality of the trained model to some extent, which suggests that neural distinguishers can be improved from the perspective of data format. In image recognition, researchers commonly rotate or crop images to enhance certain features, which has been experimentally shown to be effective. Inspired by the work of Benamira et al. and by data augmentation in deep learning, we use the output differences to train neural distinguishers by splicing output differences into a matrix as a sample. We treat each matrix as an image and each output difference in the matrix as an objective feature. Our goal is not only to learn each objective feature but also to learn the connections between output differences.
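As shown in Figure 2, the plaintext pairs are encrypted under a random master key, and each ciphertext pair is converted to an output difference whose length equals the block size of the cipher. We splice multiple output differences into a matrix as a sample. Similar to Gohr's method, given an input difference $\Delta$, each sample built from $m$ plaintext pairs $(P_{i,0}, P_{i,1})$, $1 \le i \le m$, is attached a label $Y$ according to the following equation:
$$Y = \begin{cases} 1, & \text{if } P_{i,0} \oplus P_{i,1} = \Delta \text{ for all } i, \\ 0, & \text{otherwise.} \end{cases}$$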
If the label is 1, the matrix is called a positive sample; otherwise, it is called a negative sample. We call the new data format MOD (multiple output differences). By randomly generating plaintexts and keys, we make our distinguishers learn the features of the target block cipher instead of the features of the plaintext or key. In the experiments, we make the neural network learn more features by using more output differences in a matrix. The new data format therefore needs more ciphertext pairs: for the same number of training samples, the new model requires as many times more data than Gohr's model as there are output differences in a sample.
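The construction of one sample in the new data format can be sketched as follows. This is a minimal illustration under stated assumptions: `encrypt` is a hypothetical callable mapping an integer plaintext and an integer key to an integer ciphertext, the count `m` of output differences per sample is a name chosen here, one master key is drawn per sample, and negative samples use a random second plaintext as in Gohr's setting.

```python
import random


def to_bits(value, width):
    """Big-endian list of the `width` bits of `value`."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def make_mod_sample(encrypt, delta, m, block_bits, key_bits, label, rng):
    """Build one matrix sample of m output differences and return (rows, label).

    encrypt: hypothetical callable (plaintext_int, key_int) -> ciphertext_int.
    delta:   fixed input difference used for positive samples (label 1).
    """
    key = rng.getrandbits(key_bits)          # assumption: one key per sample
    rows = []
    for _ in range(m):
        p0 = rng.getrandbits(block_bits)
        if label == 1:
            p1 = p0 ^ delta                  # pair with the fixed input difference
        else:
            p1 = rng.getrandbits(block_bits)  # pair with a random input difference
        out_diff = encrypt(p0, key) ^ encrypt(p1, key)
        rows.append(to_bits(out_diff, block_bits))
    return rows, label
```

Each returned sample is an m-by-block-size bit matrix, so a data set of N samples consumes N times m ciphertext pairs, which matches the data requirement stated above.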
Because only the channel dimension is changed, we refer to Figure 1 for the description of network architecture.
4.2. Applications to SIMON and SPECK
4.2.1. Application to SIMON
We choose the input differences in Table 3 to train new neural distinguishers. Other hyperparameters are listed in Table 2. The accuracy comparison is presented in Table 6. For 11-round SIMON48/96, we do not obtain an effective neural distinguisher using the input difference in Table 3, so we search for other high-probability differentials.
In Table 6, “SCP” refers to the data format of Gohr's neural distinguisher, and “MOD” refers to the data format shown in Figure 2. As shown in Table 6, compared with using ciphertext pairs, both the number of rounds and the accuracy of the new neural distinguishers are greatly improved. In addition, the new distinguishers can be further improved by increasing the number of output differences in a sample, which shows that stacking output differences helps the neural network learn more unknown features.
4.2.2. Application to SPECK
The new format is not limited to neural distinguishers of SIMON; it is also effective for SPECK. In [7, 12], the input difference 0x0040/0000 is used to train neural distinguishers of 7-round SPECK32/64. Using this difference, we obtain a new higher-accuracy neural distinguisher of 7-round SPECK32/64. Moreover, with the help of [18, 20], we obtain a good input difference and an effective 8-round neural distinguisher. As far as we know, this is the first effective 8-round neural distinguisher of SPECK32/64 with accuracy above 55%. Besides, we also obtain neural distinguishers of 7-round SPECK48/96 and 8-round SPECK64/128. A summary of the results is shown in Table 7. In Table 7, “SCP” refers to the data format of Gohr's neural distinguisher, “MCP” refers to the data format using multiple ciphertext pairs, and “MOD” refers to the data format shown in Figure 2. Other hyperparameters are listed in Table 2.
Using the new model, we improve neural distinguishers in terms of the number of rounds and accuracy, and the new neural distinguishers achieve better results in distinguishing attacks. Since the new model uses more data than a model based on single ciphertext pairs, the improvement might seem to stem from the increase in data. We therefore perform supplementary experiments to show that the improvement in accuracy is not due to the increase in the number of plaintexts but to learning more features from the relationship between the output differences.
4.3. A Supplementary Explanation to Our New Model
Although the accuracy is higher with the new data format, the improvement might simply come from training on more data. So we train neural distinguishers with the same number of ciphertext pairs, as shown in Table 8. Other hyperparameters are listed in Table 2.
In Table 8, “SCP” refers to the data format of Gohr's neural distinguisher, and “MOD” refers to the data format shown in Figure 2. As shown in Table 8, the accuracy using multiple output differences is higher even if the same amount of data is used. In addition, output differences take up less memory than ciphertext pairs, which can reduce training time.
To further illustrate the effectiveness of the new distinguishers, we conduct additional experiments. As shown in Figure 3, we build a sample from copies of the same output difference. In contrast to MOD (Figure 2), which uses different output differences in a sample, the format of Figure 3 uses the same output difference throughout a sample. For this format, $10^{6}$ positive and negative ciphertext pairs are randomly generated, and each output difference is repeated to fill a matrix as a sample. The new neural distinguishers are then run on these $10^{6}$ samples, and we calculate their accuracy on these special data. Table 9 shows the corresponding test results.
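The control format of Figure 3 can be sketched in the same style. The function name `make_repeated_sample` and the parameter `m` (number of rows in the matrix) are assumptions introduced for illustration.

```python
def to_bits(value, width):
    """Big-endian list of the `width` bits of `value`."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def make_repeated_sample(out_diff, m, block_bits):
    """Fill an m-row matrix with copies of a single output difference."""
    row = to_bits(out_diff, block_bits)
    # Copy the row so that the m rows are independent lists.
    return [list(row) for _ in range(m)]
```

Such a sample has the same shape as a sample with distinct output differences but carries the information of only one ciphertext pair, which is what makes it a useful control.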
In Table 9, the two accuracy columns report the accuracy of neural distinguishers trained with each of the two data formats. As shown in Table 9, the accuracy with the repeated output difference of Figure 3 is lower than that with the different output differences of Figure 2. This illustrates that the new distinguishers learn more unknown features, especially from the connections between different output differences.
5. Conclusion and Future Work
In this paper, we proposed a new algorithm and a new model to further improve neural distinguishers. On the one hand, by carefully selecting the input differences with a SAT/SMT-based algorithm, we searched for exact r-round differential characteristics with high probability and trained r-round neural distinguishers. On the other hand, by adopting the new data format, we spliced multiple output differences into a matrix as a sample to capture more derived features, thereby increasing the number of rounds and the accuracy of neural distinguishers. Applications to SIMON and SPECK demonstrate the advantages of our algorithm and model.
With our results, we obtain new effective neural distinguishers, which can be used to better distinguish reduced-round SIMON or SPECK from a pseudorandom permutation. Since deep learning now offers numerous network architectures, it is worthwhile to explore other network models for improving neural distinguishers.
Appendix
A. Brief Description of SIMON and SPECK
SIMON
SIMON [8] is a lightweight block cipher proposed by the NSA (National Security Agency). The aim of SIMON is to fill the need for secure, flexible, and analyzable lightweight block ciphers. It is a family of lightweight block ciphers with block sizes of 32, 48, 64, 96, and 128 bits. The constructions are Feistel ciphers using a word size n of 16, 24, 32, 48, or 64 bits, respectively. Table 10 makes explicit all parameter choices for all versions of SIMON.
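For SIMON2n/mn, the key-dependent SIMON round function is the map $R_k: \mathrm{GF}(2)^n \times \mathrm{GF}(2)^n \rightarrow \mathrm{GF}(2)^n \times \mathrm{GF}(2)^n$ defined by
$$R_k(x, y) = (y \oplus f(x) \oplus k,\; x), \qquad f(x) = (S^{1} x \mathbin{\&} S^{8} x) \oplus S^{2} x,$$
where $k$ is the round subkey, $\oplus$ is bitwise XOR, $\&$ is bitwise AND, and $S^{j}$ is a left circular shift by $j$ bits.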
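For readers who prefer code, the SIMON round function from the specification [8] can be written as the short Python sketch below. The function names and the default word size of 16 bits (SIMON32/64) are choices made here for illustration; the inverse is included only to check the round trip.

```python
def rol(x, r, n):
    """Left circular shift of an n-bit word by r bits (0 < r < n)."""
    mask = (1 << n) - 1
    return ((x << r) & mask) | (x >> (n - r))


def simon_f(x, n):
    """Nonlinear function f(x) = (S^1 x & S^8 x) xor S^2 x."""
    return (rol(x, 1, n) & rol(x, 8, n)) ^ rol(x, 2, n)


def simon_round(x, y, k, n=16):
    """One SIMON round: (x, y) -> (y xor f(x) xor k, x)."""
    return y ^ simon_f(x, n) ^ k, x


def simon_round_inverse(x, y, k, n=16):
    """Inverse of simon_round for the same round subkey k."""
    return y, x ^ simon_f(y, n) ^ k
```

Because of the Feistel structure, the inverse reuses the same function f and does not require f to be invertible.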
SPECK
Similar to SIMON, there are some different variants of SPECK, and these parameters about SPECK are shown in Table 11.
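For SPECK2n/mn, the key-dependent SPECK round function is the map $R_k: \mathrm{GF}(2)^n \times \mathrm{GF}(2)^n \rightarrow \mathrm{GF}(2)^n \times \mathrm{GF}(2)^n$ defined by
$$R_k(x, y) = ((S^{-\alpha} x \boxplus y) \oplus k,\; S^{\beta} y \oplus (S^{-\alpha} x \boxplus y) \oplus k),$$
where $k$ is the round subkey, $\boxplus$ is addition modulo $2^n$, $S^{-\alpha}$ is a right circular shift by $\alpha$ bits, $S^{\beta}$ is a left circular shift by $\beta$ bits, and $(\alpha, \beta) = (7, 2)$ for SPECK32/64 and $(8, 3)$ for the other variants.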
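The SPECK round function from the specification [8] can likewise be sketched in Python. The function names and the defaults (word size 16 with rotation constants 7 and 2, that is, SPECK32/64) are illustrative choices; the inverse is included to check the round trip.

```python
def rol(x, r, n):
    """Left circular shift of an n-bit word by r bits (0 < r < n)."""
    mask = (1 << n) - 1
    return ((x << r) & mask) | (x >> (n - r))


def ror(x, r, n):
    """Right circular shift of an n-bit word by r bits (0 < r < n)."""
    mask = (1 << n) - 1
    return (x >> r) | ((x << (n - r)) & mask)


def speck_round(x, y, k, n=16, alpha=7, beta=2):
    """One SPECK round with round subkey k."""
    mask = (1 << n) - 1
    x = ((ror(x, alpha, n) + y) & mask) ^ k   # rotate, add modulo 2^n, add key
    y = rol(y, beta, n) ^ x                   # rotate and mix in the new x
    return x, y


def speck_round_inverse(x, y, k, n=16, alpha=7, beta=2):
    """Inverse of speck_round for the same round subkey k."""
    mask = (1 << n) - 1
    y = ror(y ^ x, beta, n)
    x = rol(((x ^ k) - y) & mask, alpha, n)
    return x, y
```

For example, `speck_round(0x0040, 0x0000, 0)` returns `(0x8000, 0x8000)`. For the other SPECK variants, pass the matching word size together with `alpha=8, beta=3`.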
As it is out of scope for our purpose, we refer to [8] for the description of the key schedules.
Data Availability
The data used to support the findings of this study are included within the article.
Disclosure
A preprint has previously been published in [22].
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.