Abstract

The known-plaintext attack is a common attack model in cryptanalysis. Given a ciphertext and part of the corresponding plaintext, but not the key, recovering the rest of the plaintext is an important part of the known-plaintext attack. This paper uses backpropagation neural networks to perform cryptanalysis on AES in an attempt to restore plaintext. The results show that the neural networks can restore an entire byte with a probability of more than 40% and can restore more than half of the bits in a byte with a probability of more than 63% in one experiment and more than 89% in the other.

1. Introduction

With the development of machine learning technology, research on cryptanalysis is no longer limited to traditional manual deciphering. Intelligent deciphering based on machine learning, and especially on neural networks, provides a new direction for cryptanalysis.

Neural network-based cryptanalysis can address the shortcomings of traditional cryptanalysis methods in terms of attack difficulty and the amount of data required. First, as an ideal black-box identification tool, a neural network can effectively analyze and simulate a black-box problem such as cryptanalysis, approximate it arbitrarily closely, and finally yield an algorithm equivalent to the encryption and decryption algorithm, thereby achieving cryptanalysis. In this case, the neural network does not need to recover the key or the specific parameter settings of the algorithm: it only takes the ciphertext as input and, after training, produces the corresponding plaintext with a certain probability. Second, as a machine learning method, a neural network can achieve the corresponding cryptanalysis results with a relatively small training set (plaintext-ciphertext pairs). Therefore, neural networks are rapidly being recognized by the community as a method of cryptanalysis and are gradually becoming a new research direction in the field.

The application of neural networks in cryptanalysis mainly concerns the global deduction of cryptographic algorithms [1] (the attacker obtains an algorithm functionally equivalent to encryption and decryption without necessarily knowing the key) and the complete break [1] (the attacker obtains the key).

In 2008, Bafghi et al. [2] used a neural network model to represent the differential operations of block ciphers in order to find differential characteristics. The differential characteristic space of a block cipher can be represented by a multilevel weighted directed graph, so the problem of finding the best differential characteristic can be transformed into the problem of finding the minimum-weight multibranch path between two known nodes in the directed graph. They used a recurrent neural network (RNN) to find such paths by minimizing the network cost function on the differential operation graph of the block cipher. In 2010, Alallayah et al. [3-5] performed neural network-based cryptanalysis on classical ciphers, stream ciphers, and simplified DES (SDES). They regarded cryptanalysis as a black-box problem and used neural networks as an ideal tool for identifying black boxes, combining system identification techniques with adaptive system techniques to construct a neuro-identifier that simulates the target cryptosystem; with it, a neural model and the key can be determined from given plaintext-ciphertext pairs. In 2012, Alani et al. [6, 7] implemented a known-plaintext attack based on neural networks against DES and 3DES. In their experiments, they trained a neural network to retrieve plaintext from ciphertext without retrieving the key used in encryption, and the attack was successfully applied to DES and 3DES. For DES, only a relatively small average number of plaintext-ciphertext pairs is needed, and cryptanalysis can be completed in about 51 minutes, achieving the effect of global deduction; for 3DES, again only a relatively small average number of plaintext-ciphertext pairs is needed, and the analysis can be completed in about 72 minutes. These results significantly reduce the number of known plaintexts required and the time needed to perform a full attack compared with other attacks. In 2013, Akiwate et al. [8] proposed a neural network-based cryptanalysis of the DES algorithm, aiming to analyze the nonlinear characteristics of DES through neural networks. Text data is used as the plaintext, and the DES-encrypted ciphertext is used as the input of the neural network and trained together with the plaintext to obtain the corresponding output; the network output is compared with the plaintext, and performance error indicators are established for analyzing the data to improve efficiency. In 2014, Danziger et al. [9] used the method of [5] to find the mapping relationships among plaintext, ciphertext, and key in SDES. With 102,400 sample data, the first, second, and fifth bits of the 10-bit key could be recovered; when the number of samples was reduced to 2,000, the first and second bits could still be recovered. Combined with differential analysis, they improved the S-box of SDES, which reduced the correlation between adjacent keys in the key space, making it resistant to key recovery attacks in neural network cryptanalysis.

Inspired by [6, 7], this paper uses their methods to construct a similar analysis framework. According to the block length and data types of the AES algorithm, a data processing procedure suitable for AES is set up. Then, according to the research object and the structural characteristics of neural networks, the specific experimental system structure is designed. Neural networks are used to perform cryptanalysis on the AES algorithm in ECB and CBC modes: without knowing the key, the plaintext is restored from the ciphertext, attempting to achieve the effect of global deduction. The results show that the plaintext restored from AES-128 and AES-256 ciphertexts by the neural networks matches more than 63% of the original plaintext.

Section 2 of this paper introduces the specific process of cryptanalysis based on neural networks. Section 3 designs the steps of the overall experiment and selects the relevant parameters. Section 4 presents the experimental results. Section 5 gives the conclusion.

2. Cryptanalysis Process Based on Neural Network

The cryptanalysis method based on neural networks utilizes the learning ability of the neural network, training it with known plaintext-ciphertext pairs. After training is completed, the neural network can restore the plaintext from ciphertext that does not belong to the training set. The corresponding system structure is shown in Figure 1. The system contains an information forward-propagation process (ciphertext → neural network → plaintext) and an error backpropagation process (plaintext → error function → weight correction → neural network).

The ciphertexts are input into the neural network, and the output is compared with the known plaintexts to obtain an error function. The weights are continuously corrected according to the error until the neural network is successfully trained, and finally the plaintext can be restored with a certain probability. This attack is considered a global deduction attack: the trained network is functionally equivalent to the original decryption algorithm without knowing the key. This analysis method is similar to the global approximation property of multilayer feedforward neural networks in [11].

Training a neural network to an acceptable error rate requires expanding the network size, which in turn increases the time of each training cycle. Many related parameters need to be set in the neural network, such as the number of neurons, the number of hidden layers, and the training function; these parameters are specified in Section 3.

3. Experimental Design and Parameter Selection

3.1. Experimental Environment

The experiments were carried out in MATLAB R2016a with the Neural Network Toolbox. The equipment used was a MacBook; its relevant specifications are shown in Table 1.

3.2. Data Representation

We divide the ciphertext data into 64-bit blocks, that is, 64 input channels and 64 output channels, and represent the data in matrix form in MATLAB:

\[
P = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,N} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{64,1} & p_{64,2} & \cdots & p_{64,N} \end{pmatrix}, \qquad
C = \begin{pmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,N} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ c_{64,1} & c_{64,2} & \cdots & c_{64,N} \end{pmatrix},
\]

where \(P\) is the plaintext matrix, \(C\) is the ciphertext matrix, \(p_{j,i}\) is the \(j\)-th plaintext bit in the \(i\)-th plaintext block, \(c_{j,i}\) is the \(j\)-th ciphertext bit in the \(i\)-th ciphertext block, and \(N\) is the number of blocks of plaintext-ciphertext pairs used in training. Each bit in \(C\) corresponds to a bit in \(P\). The number of rows of the two matrices is chosen according to the number of inputs and outputs of the neural network, and all ciphertexts in the plaintext-ciphertext pairs are encrypted with the same key.
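A minimal MATLAB sketch of this layout, assuming pt_bytes and ct_bytes are hypothetical uint8 vectors of equal length that is a multiple of 8 bytes (8 bytes = one 64-bit block); the Neural Network Toolbox treats matrix columns as samples:

```matlab
% Pack plaintext/ciphertext byte streams into 64-row bit matrices,
% one 64-bit block per column.
pt_bits = reshape((dec2bin(pt_bytes, 8) - '0')', [], 1);  % bytes -> bit stream, MSB first
ct_bits = reshape((dec2bin(ct_bytes, 8) - '0')', [], 1);
N = numel(ct_bits) / 64;        % number of 64-bit blocks
P = reshape(pt_bits, 64, N);    % P(j,i): j-th plaintext bit of block i
C = reshape(ct_bits, 64, N);    % C(j,i): j-th ciphertext bit of block i
```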

3.3. Size and Layout of Neural Network

In the experiments, we choose a feedforward backpropagation (BP) neural network and a cascaded feedforward BP neural network. In the feedforward BP network, each hidden layer and the output layer are affected only by the immediately preceding layer, whereas in the cascaded feedforward BP network the hidden layers and the output layer are connected to every preceding layer. The specific experimental structures are shown in Figures 2 and 3.

The relevant parameters of the neural network are set as follows:

(i) Four hidden layers are used, with 128, 256, 256, and 128 neurons, respectively (see the sketch following this list). The experiments in [6] cover several hidden-layer settings; with four layers of 128, 256, 256, and 128 neurons, training succeeded and the results were better, so we adopt this setting. In future research we will also vary the hidden-layer settings, taking into account the impact of different settings on the results.

(ii) The training function of the neural network is the scaled conjugate gradient method 'trainscg'. As a conjugate gradient method it is suitable for large networks because it occupies little storage space, and 'trainscg' saves more time than other conjugate gradient methods.

(iii) The error function used to correct the weights during training is the mean squared error (MSE). The error function is also called the loss function; the two most commonly used loss functions are cross entropy and mean squared error. Cross entropy characterizes the distance between two probability distributions and is mostly used in classification problems. Unlike classification, regression mainly solves the prediction of specific numerical values, and restoring plaintext from known ciphertext is a regression problem; for regression, the most commonly used loss function is the mean squared error.
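The paper does not list its exact construction calls; a plausible sketch using the Neural Network Toolbox functions feedforwardnet and cascadeforwardnet is:

```matlab
% Build the two layouts: four hidden layers of 128/256/256/128 neurons,
% scaled conjugate gradient training ('trainscg'), MSE loss.
hidden  = [128 256 256 128];
net_ff  = feedforwardnet(hidden, 'trainscg');     % Figure 2: feedforward BP
net_cf  = cascadeforwardnet(hidden, 'trainscg');  % Figure 3: cascaded feedforward BP
net_ff.performFcn = 'mse';   % mean squared error as the loss function
net_cf.performFcn = 'mse';
```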

3.4. Training Stop Condition

The maximum number of training cycles is set to 500, and there are three training stop conditions: (1) the final training cycle number of 500 is reached; (2) the acceptable mean squared error limit of 0.05 is reached; (3) the maximum number of consecutive validation failures, 20, is reached.
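In the toolbox these conditions correspond to trainParam fields; a sketch, where net stands for either network from Section 3.3:

```matlab
net.trainParam.epochs   = 500;   % (1) maximum number of training cycles
net.trainParam.goal     = 0.05;  % (2) acceptable mean squared error limit
net.trainParam.max_fail = 20;    % (3) maximum consecutive validation failures
```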

3.4.1. Training Phase

(1) The training process first creates a neural network and selects the network layout, that is, determines the number of neurons in each layer, and then sets the training stop conditions.

(2) At the beginning of training, part of the data set is used as the training set and the rest as the test set. The ciphertexts in the training set are input to the network, the corresponding results are output and XORed with the original plaintexts, and the weights are continuously corrected according to the error function.

(3) When training ends, it is judged whether training succeeded. If the number of consecutive validation failures reaches 10 and the mean squared error is above 0.1, training fails; if the mean squared error is below 0.1, training is successful.

(4) If training fails, the network is reinitialized and training restarts from step (2).

(5) If the training is successful, enter the test process.
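Under the stated assumptions (the 85%/15% split used in Section 4, and success judged by the MSE threshold alone), the training phase can be sketched as:

```matlab
% Training phase sketch: split the data, train ciphertext -> plaintext,
% and reinitialize/retrain whenever training fails (steps (1)-(5)).
nTrain = floor(0.85 * N);
Ctr = C(:, 1:nTrain);       Ptr = P(:, 1:nTrain);     % training set
Cte = C(:, nTrain+1:end);   Pte = P(:, nTrain+1:end); % test set

trained = false;
while ~trained
    net = init(net);                    % reinitialize the weights (step (4))
    [net, tr] = train(net, Ctr, Ptr);   % ciphertext in, plaintext out (step (2))
    trained = (tr.best_perf < 0.1);     % success if MSE below 0.1 (step (3))
end
% Training succeeded: proceed to the test phase (step (5)).
```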

3.4.2. Test Phase

(1) After training is completed, the ciphertext matrix used in training is input into the neural network to obtain the corresponding plaintext result \(O\). \(O\) is XORed bit by bit with the plaintext of the training set, and the error percentage, called the training error, is calculated as

\[
E_{\text{train}} = \frac{1}{nN} \sum_{i=1}^{N} \sum_{j=1}^{n} \left( o_{j,i} \oplus p_{j,i} \right) \times 100\%,
\]

where \(n\) is the number of bits per block in the training set, \(N\) is the number of blocks in the training set, \(o_{j,i}\) is the \(j\)-th bit in the \(i\)-th block of the neural network output, and \(p_{j,i}\) is the \(j\)-th bit in the \(i\)-th block of the plaintext used in training. In this experiment, \(n = 64\).

(2) Ciphertext not in the training set is input into the neural network to obtain the corresponding result \(O'\). \(O'\) is XORed bit by bit with the plaintext of the test set, and the error percentage, called the test error, is calculated as

\[
E_{\text{test}} = \frac{1}{n'N'} \sum_{i=1}^{N'} \sum_{j=1}^{n'} \left( o'_{j,i} \oplus p'_{j,i} \right) \times 100\%,
\]

where \(n'\) is the number of bits per block in the test set, \(N'\) is the number of blocks in the test set, \(o'_{j,i}\) is the \(j\)-th bit in the \(i\)-th output block during testing, and \(p'_{j,i}\) is the \(j\)-th bit in the \(i\)-th block of the plaintext used in testing. In this experiment, \(n' = 64\).

(3) The ciphertext of the entire data set is input into the neural network to obtain the corresponding result, and the output and the original plaintext are XORed bit by bit; the resulting error percentage is called the overall error:

\[
E_{\text{all}} = \frac{1}{\bar{n}\bar{N}} \sum_{i=1}^{\bar{N}} \sum_{j=1}^{\bar{n}} \left( \bar{o}_{j,i} \oplus \bar{p}_{j,i} \right) \times 100\%,
\]

where \(\bar{n}\) is the number of bits per block in the overall data set, \(\bar{N}\) is the number of blocks in the data set, \(\bar{o}_{j,i}\) is the \(j\)-th bit in the \(i\)-th output block, and \(\bar{p}_{j,i}\) is the \(j\)-th bit in the \(i\)-th plaintext block. In this experiment, \(\bar{n} = 64\).
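All three measures share the same form and can be computed by one helper; a sketch as a standalone function file (bit_error_percent is a hypothetical name), with the raw outputs rounded to bits as described in Section 4:

```matlab
% bit_error_percent.m -- percentage of wrong bits between the rounded
% network output and the reference plaintext. net: a trained network;
% X: ciphertext matrix; T: matching plaintext matrix (training set,
% test set, or whole data set).
function err = bit_error_percent(net, X, T)
    O   = double(net(X) >= 0.5);                 % round outputs: >= 0.5 -> 1, else 0
    err = 100 * sum(sum(xor(O, T))) / numel(T);  % percentage of differing bits
end
```

For example, the training error would then be bit_error_percent(net, Ctr, Ptr) with the matrices from the training sketch above.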

(4) The output result \(O\) is compared with the original plaintext \(P\): the number of fully correct bytes and the number of bytes with 1-8 erroneous bits are counted, as are the numbers of consecutive two-byte, four-byte, and eight-byte runs that are fully correct. The reason for counting consecutive correct bytes is that, in Chinese, one Chinese character occupies two bytes, a word generally consists of two Chinese characters (four bytes), and four Chinese characters (eight bytes) may form a phrase or a short sentence. So if the plaintext is Chinese, part of it can be understood once some of the Chinese characters are restored; the same applies to English and other languages. The more consecutive bytes are correct, the better the content of the plaintext can be understood.
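A sketch of these byte-level counts, assuming aligned, non-overlapping 2-, 4-, and 8-byte groups (the paper does not state the exact grouping); O and T are the rounded output and reference bit matrices:

```matlab
% Per-byte statistics: how many bytes have 0..8 wrong bits, and how many
% aligned 2-, 4-, and 8-byte groups are fully correct.
errBits  = xor(O, T);                           % 1 where a bit is wrong
byteErr  = sum(reshape(errBits(:), 8, []), 1);  % wrong bits per byte
histByte = histc(byteErr, 0:8);                 % counts of bytes with 0..8 bit errors
byteOK   = (byteErr == 0);                      % fully correct bytes
nB = numel(byteOK);
n2 = sum(all(reshape(byteOK(1:2*floor(nB/2)), 2, []), 1));  % correct 2-byte groups
n4 = sum(all(reshape(byteOK(1:4*floor(nB/4)), 4, []), 1));  % correct 4-byte groups
n8 = sum(all(reshape(byteOK(1:8*floor(nB/8)), 8, []), 1));  % correct 8-byte groups
```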

4. Experimental Result

The experiments target the ECB and CBC modes of the AES-128 and AES-256 algorithms. Based on the experimental results in [6] and the ciphertext data available to us, the experiment consists of two parts. In the first part, images from the Caltech image database [10] are spliced into 1,000 plaintext files of 512 KB each; each file is encrypted with each of the above four cipher configurations, yielding 1,000 ciphertext files per configuration. The ciphertext files are then truncated so that the file length remains 512 KB, giving 4,000 ciphertext files of 512 KB in total; the ciphertexts are converted into 01 sequences, and a fixed-length prefix of each sequence is taken for the experiment. In the second part, 1,000 files of random characters are generated with Python as plaintexts and encrypted with the ECB and CBC modes of AES-128 and AES-256 to obtain 1,000 ciphertext files per configuration; the ciphertext files are truncated to a fixed character length, giving 4,000 ciphertext files in total, and the ciphertexts are converted into 01 sequences. The neural network outputs lie between 0 and 1 and are rounded: a value less than 0.5 is taken as 0, and a value greater than or equal to 0.5 is taken as 1.
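The preparation of one plaintext-ciphertext pair in the second part can be sketched as follows; aes_encrypt_ecb and key are hypothetical (MATLAB has no built-in AES, and the paper does not name the implementation it used), and the plaintext length is illustrative:

```matlab
% Generate one random plaintext file, encrypt it, and convert the
% ciphertext to a 01 sequence. aes_encrypt_ecb and key are hypothetical.
pt_bytes = uint8(randi([0 255], 1, 512 * 1024));          % random plaintext bytes
ct_bytes = aes_encrypt_ecb(pt_bytes, key);                % hypothetical AES call
ct_bits  = reshape((dec2bin(ct_bytes, 8) - '0')', [], 1); % ciphertext -> 01 sequence
```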

In the specific experiments, for the first part we select the first 85% of the data set as the training set and the last 15% as the test set, and for each data size we count the number of output bytes with 0-8 erroneous bits. The corresponding results are shown in Tables 2 and 3. Since [3, 8] and this paper all aim at restoring plaintext, the restoration accuracy of this paper is compared with that of [3, 8]; the corresponding results are shown in Table 4. For the second part, we train the neural networks on prefixes of six increasing bit lengths of a data set and count the number of output bytes with 0-8 erroneous bits; the corresponding results are shown in Figures 4 and 5. The numbers of consecutive two-byte, four-byte, and eight-byte runs that are fully correct are shown in Figures 6, 7, and 8.

It can be seen from Tables 2 and 3 that the two neural networks show little difference in the plaintext restoration results for the four algorithm configurations: the fully correct bytes number more than 100, accounting for more than 40% of all bytes, and the bytes in which more than half of the bits are correct account for more than 89% of all bytes. Therefore, these two neural networks have a good effect on plaintext restoration in cryptanalysis, which can effectively reduce the number of exhaustive-search attempts, significantly shorten analysis time, and greatly reduce the required resources.

It can be seen from Table 4 that the epochs of the four AES configurations are all below 50 and the MSE is above 0.05, while the epochs of DES and 3DES are both above 200 and the MSE is below 0.04. This is because AES is a newer data encryption standard with higher security than DES and 3DES: it is difficult to restore the plaintext with high accuracy from only 2048 plaintext-ciphertext pairs, and this experiment uses only a personal laptop, whose memory and performance are limited. Therefore, the epoch count is small, resulting in a higher error than for DES and 3DES. However, [6, 7] consider only the DES and 3DES algorithms in ECB mode, not in CBC mode, and do not explain the block length of the algorithms or the source of the experimental data.

From Figure 4, in the experimental results of the feedforward BP neural network, the number of fully correct bytes is between 500 and 700, and the bytes in which more than half of the bits are correct account for more than 63% of all bytes; the per-byte error statistics of each algorithm are approximately normally distributed. It can be seen from Figure 5 that the cascaded feedforward BP neural network also shows a normal distribution for each algorithm, and as the training set grows, the number of fully correct bytes increases. In the experiments we also found that, for the feedforward BP neural network, when the training set is increased beyond a certain size, training fails repeatedly and the MSE stays above 0.1, whereas the training success rate of the cascaded feedforward BP neural network stays above 99%, with the epoch count increasing as the training set grows. For the feedforward BP neural network, there is no obvious linear relationship between the size of the training set and the number of fully correct bytes; for the cascaded feedforward BP neural network, there is a significant positive correlation between the two: as the training set grows, the number of fully correct bytes increases. Therefore, when the amount of training data increases to a certain extent, the cascaded feedforward BP neural network can effectively improve the efficiency of plaintext restoration and reduce the number of exhaustive-search attempts required.

Since the feedforward BP neural network is not good at restoring consecutive bytes, this paper only shows the restoration results of the cascaded feedforward BP neural network. It can be seen from Figures 6-8 that, as the training set grows, the number of fully correct runs increases, and for a given training set size the four algorithm configurations differ little. Figure 6 shows the number of consecutive two-byte runs that are fully correct: at the largest training set size, there are on average more than 2,000 such runs, accounting for more than 3% of the whole. Figure 7 shows the number of consecutive four-byte runs that are fully correct: at the largest training set size, there are more than 700 such runs, accounting for more than 2.2% of the whole. Figure 8 shows the number of consecutive eight-byte runs that are fully correct: at the largest training set size, there are about 200 such runs, accounting for about 1.2% of the whole.

5. Conclusion

This paper discusses the global deduction of the AES-128 and AES-256 algorithms. For this goal, we use the feedforward BP neural network and the cascaded feedforward BP neural network to restore plaintext from ciphertexts of the AES-128 and AES-256 algorithms in ECB and CBC modes. As a new method for cryptanalysis, a neural network can restore the corresponding plaintext from ciphertext. In the restored results, the proportion of fully correct bytes is above 40%, and the proportion of bytes with more than half of their bits correct is above 89%; for the cascaded feedforward BP neural network, as the training set grows, the error rate decreases and the number of fully correct bytes increases. In this study of global deduction we found that different neural networks give different results and that different data types lead to different error rates. Therefore, we will take this as an opportunity to continue studying different neural networks and to consider the impact of more data types on error rates.

As an emerging cryptanalysis method, plaintext restoration based on neural networks is still at the experimental stage, and its application to AES is a new attempt. To get better results, more plaintext needs to be known in advance and more restrictions need to be set. Future research may need to exploit structural features specific to the AES algorithm to perform cryptanalysis more efficiently.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing; National Key Research and Development Project 2016-2018 (2016YFE0100600); Open Fund Project of Information Assurance Technology (KJ-15-008); State Key Laboratory of Cryptography and Science.