Abstract

The use of deep learning to overcome the low detection accuracy on multiclassification tasks and the complex feature engineering of traditional intrusion detection techniques has become a research hotspot. Among deep learning models, recurrent neural networks (RNN) play an important role. In this work, the RNN processes 41 feature attributes and maps them into a 122-dimensional high-dimensional feature space. To handle multiclassification tasks, this study proposes an intrusion detection method based on a fully connected recurrent neural network and compares its performance with previous machine learning methods on benchmark datasets. The results show that the intrusion detection system (IDS) model based on the fully connected recurrent neural network is well suited to intrusion detection classification, especially for multiclassification tasks: it achieves high detection accuracy, significantly improves the detection of probe and DoS attacks, and provides a new research direction for future intrusion detection methods for industrial control systems.

1. Introduction

With the achievements of deep learning in fields such as image recognition and speech recognition, it also offers new methods and ideas for researchers in the field of intrusion detection to carry out related work.

Recurrent neural networks (RNN) have played an important role in this line of work. In 2007, Rachid et al. applied an RNN and a standard neural network to intrusion detection and conducted experiments on a small constructed sample dataset. The experimental results showed that the detection and classification performance of the RNN on the small sample dataset was only moderate and was lower than that of the neural network under the same conditions. In 2010, Mansour et al., considering the complexity of a fully connected network structure, constructed a partially connected RNN: the features within each group were connected in a fully recurrent structure, while there was no feedback connection between features in different groups [13]. This reduced the model training time and achieved good detection results on their own dataset. Although the partially connected recurrent neural network structure shortens the training time, the features were grouped manually, and the connections and interactions between different groups of features were not considered.

After in-depth analysis, this study proposes an intrusion detection model based on a fully connected recurrent neural network. The 41 feature attributes were processed and mapped into a 122-dimensional high-dimensional feature space, and the features were no longer grouped and classified. Taking the relationships between features into account, this study investigates the detection ability of the fully connected recurrent neural network on multiclassification tasks.

2. Recurrent Neural Network (RNN)

Currently, recurrent neural networks (RNN) are mainly used to solve dynamic system problems involving time series of events. Structurally, an RNN consists of an input layer, a hidden layer, and an output layer [4–6]. The current output depends on previous outputs, and the nodes of the hidden layer are no longer unconnected; the main work is therefore done by the recurrent loop of the hidden layer itself. Essentially, an RNN combines a unidirectional information flow from the input layer to the hidden layer with a unidirectional information flow from the hidden layer at the previous time step to the current hidden layer. Figure 1 visually compares traditional neural networks and RNN.

Figure 2 shows the structure of the RNN after unrolling. All states before time step t are summarized in the output at time step t-1 and affect time step t. Therefore, an RNN is a learning model with a dynamic deep structure. If the hidden unit is regarded as the storage space of the entire network, then, when the RNN is unrolled along the time series, it can be considered to have memorized all the information seen so far, which is a typical end-to-end learning method. Theoretically, an RNN can learn arbitrarily long sequences and remember end-to-end information, reflecting the “depth” of deep learning.

In training, an RNN involves a forward pass and a backward pass. Similar to the training algorithm of a traditional neural network, the forward pass produces outputs in time-series order, while the backward pass propagates the accumulated residuals of the previous period back through the RNN. During forward propagation, the hidden layer output h_t is

h_t = σ(W x_t + U h_{t-1} + b_h),

where σ is the activation function, x_t is the input vector at time step t, h_t is the output of the hidden layer, W is the input-to-hidden weight matrix, U is the self-recurrent weight matrix, and b_h is the hidden layer bias.
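As a minimal illustration of this recurrence, the following NumPy sketch computes the hidden states for a short input sequence; the dimensions and variable names are illustrative only and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W, U, b_h):
    """Compute hidden states h_t = sigmoid(W x_t + U h_{t-1} + b_h) over a sequence.

    xs: (T, input_dim); W: (hidden_dim, input_dim); U: (hidden_dim, hidden_dim); b_h: (hidden_dim,).
    """
    h = np.zeros(U.shape[0])                    # h_0 initialized to zero
    hs = []
    for x_t in xs:
        h = sigmoid(W @ x_t + U @ h + b_h)      # recurrence over the time series
        hs.append(h)
    return np.stack(hs)

# Illustrative dimensions (122-dimensional input as in the paper, 80 hidden nodes).
rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 122, 80
hs = rnn_forward(rng.normal(size=(T, input_dim)),
                 rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
                 rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
                 np.zeros(hidden_dim))
print(hs.shape)   # (5, 80)
```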

2.1. Intrusion Detection System (IDS) Based on Fully Connected Recurrent Neural Network

The overall framework of the intrusion detection model is shown in Figure 3, which mainly includes five steps [7–10].

Obviously, the training of the FCRNN-IDS has two aspects: forward propagation and weight fine-tuning. Forward propagation was responsible for computing the output from the data, while weight fine-tuning updated the weights by propagating the accumulated residuals back through the network [11], which was no different from ordinary neural network training. Training was divided into two steps: first, the forward propagation algorithm computed the output value for each training sample fed into the model; then, the weight fine-tuning algorithm adjusted all model parameters by backpropagation, finally yielding a complete fully connected RNN classification model.

According to Figure 2, Algorithm 1 is the forward propagation algorithm and Algorithm 2 is the weight update algorithm. The output of each instance xi is calculated using the forward propagation algorithm.

Input: the training samples were xi (i = 1, 2, ..., m), the weight matrices were Whx, Whh, and Wyh, the biases were bh and by, the activation function e was the sigmoid function, and the classification function g was the SoftMax function.
Output: the output value corresponding to each training sample xi
(1) for xi from 1 to m do
(2)  a_t = Whx·xi + Whh·h_{t-1} + bh
(3)  h_t = e(a_t)
(4)  o_t = Wyh·h_t + by
(5)  ŷ_i = g(o_t)
(6) End for
Input: the training samples were (xi, yi) (i = 1, 2, ..., m).
Initialization: the initial model parameters were θ = {Whx, Whh, Wyh, bh, by}
Output: the fine-tuned model parameters were θ = {Whx, Whh, Wyh, bh, by}
(1) For each sample xi input to the fully connected RNN, the output of xi was calculated by Algorithm 1
(2) Calculate the cross-entropy loss L(ŷ_i, y_i) between the output value of each sample and its label value
(3) For each model parameter θi in θ, calculate the partial derivative ∂L/∂θi
(4) Propagate the error back along the network and update each model parameter θi in θ: θi ← θi − η·∂L/∂θi, where η is the learning rate
(5) If t = k, save the model parameters and the algorithm ends
(6) If t < k, then t = t + 1 and go to step 1
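The following is a minimal NumPy sketch of one pass of Algorithms 1 and 2 for a single training sample, assuming a single-step recurrence (full backpropagation through time is omitted); the dimensions, learning rate, and initialization are hypothetical and only illustrate the forward propagation, cross-entropy loss, and weight fine-tuning steps described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(x, label, h_prev, params, lr):
    """One forward pass (Algorithm 1) plus one weight update (Algorithm 2)
    for a single connection record x with integer class `label`."""
    Whx, Whh, Wyh, bh, by = params
    # ---- forward propagation ----
    a = Whx @ x + Whh @ h_prev + bh         # hidden-layer pre-activation
    h = sigmoid(a)                          # hidden-layer output
    o = Wyh @ h + by                        # output-layer pre-activation
    y_hat = softmax(o)                      # class probabilities
    loss = -np.log(y_hat[label] + 1e-12)    # cross-entropy against the label

    # ---- backpropagation of the residual ----
    do = y_hat.copy(); do[label] -= 1.0     # gradient of softmax + cross-entropy
    dWyh = np.outer(do, h); dby = do
    dh = Wyh.T @ do
    da = dh * h * (1.0 - h)                 # derivative of the sigmoid
    dWhx = np.outer(da, x); dWhh = np.outer(da, h_prev); dbh = da

    # ---- gradient-descent weight fine-tuning (in place) ----
    Whx -= lr * dWhx; Whh -= lr * dWhh; Wyh -= lr * dWyh
    bh -= lr * dbh;   by -= lr * dby
    return loss, h

# Hypothetical sizes: 122 input features, 80 hidden nodes, 5 classes, learning rate 0.1.
rng = np.random.default_rng(0)
n_in, n_hid, n_cls = 122, 80, 5
params = [rng.normal(scale=0.1, size=(n_hid, n_in)),
          rng.normal(scale=0.1, size=(n_hid, n_hid)),
          rng.normal(scale=0.1, size=(n_cls, n_hid)),
          np.zeros(n_hid), np.zeros(n_cls)]
h = np.zeros(n_hid)
loss, h = train_step(rng.random(n_in), 0, h, params, lr=0.1)
print(f"loss = {loss:.4f}")
```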

3. Experiment

3.1. Data Sources

The dataset used in the experiments is the NSL-KDD benchmark dataset [12–15], which is widely used.

3.2. Data Feature Extraction and Selection

Each connection record in the NSL-KDD contains 41 feature attributes [16–19]. These 41 features can be divided into 4 categories:
(i) Basic features (9 types in total, numbered 1 to 9)
(ii) Content features (13 types in total, numbered 10 to 22)
(iii) Time-based network traffic statistics (9 types in total, numbered 23 to 31)
(iv) Host-based network traffic statistics (10 types in total, numbered 32 to 41)
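For reference, this grouping can be captured in a small Python mapping; the group names below are descriptive labels chosen here, not identifiers from the dataset.

```python
# Feature-index groups of the 41 NSL-KDD attributes (1-based numbering, as in the text).
NSL_KDD_FEATURE_GROUPS = {
    "basic":              range(1, 10),    # features 1-9
    "content":            range(10, 23),   # features 10-22
    "time_based_traffic": range(23, 32),   # features 23-31
    "host_based_traffic": range(32, 42),   # features 32-41
}

for name, idx in NSL_KDD_FEATURE_GROUPS.items():
    print(f"{name}: features {idx.start}-{idx.stop - 1} ({len(idx)} attributes)")
```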

3.3. Data Preprocessing

In the NSL-KDD dataset, each connection record consists of 41 feature attributes, 3 of which are non-numeric. Data preprocessing mainly includes two parts: numericalization of the non-numeric feature attributes and normalization of the numeric values. After one-hot encoding, the three values of the attribute “protocol_type” correspond to the binary feature vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). The other two non-numeric attributes, “service” and “flag”, have 70 and 11 distinct values, respectively. After this numericalization, the original 41-dimensional feature vector was converted into a 122-dimensional high-dimensional feature vector (38 numeric features plus 3 + 70 + 11 indicator features). One-hot encoding thus solves the problem of converting non-numeric data while making the calculation of “distance” between features more reasonable.
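A sketch of this one-hot step using pandas is shown below; it assumes the connection records are held in a DataFrame whose symbolic columns use the conventional NSL-KDD names protocol_type, service, and flag.

```python
import pandas as pd

symbolic = ["protocol_type", "service", "flag"]   # 3, 70, and 11 distinct values

def one_hot_encode(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the 3 symbolic attributes with binary indicator columns.

    38 numeric features + (3 + 70 + 11) indicator columns = 122 dimensions.
    """
    return pd.get_dummies(df, columns=symbolic)

# Example: protocol_type in {tcp, udp, icmp} becomes (1,0,0), (0,1,0), (0,0,1).
sample = pd.DataFrame({"duration": [0, 12], "protocol_type": ["tcp", "udp"],
                       "service": ["http", "ftp"], "flag": ["SF", "S0"]})
print(one_hot_encode(sample))
```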

The first issue was that the value ranges of some features were too large. Feature attributes such as “duration [0, 58329],” “dst_bytes [0, 1.3 × 10^9],” and “src_bytes [0, 1.3 × 10^9]” were scaled by logarithmic correction to “duration [0, 4.77],” “dst_bytes [0, 9.11],” and “src_bytes [0, 9.11],” so that all instances lie on the same order of magnitude for each feature. The second step was to normalize the data to the [0, 1] value range according to the following formula:

x_i' = (x_i − Min) / (Max − Min),

where x_i is the attribute value, Min is the minimum value, and Max is the maximum value of that attribute.
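The two numeric-preprocessing steps can be sketched as follows; the sketch assumes the logarithmic correction is log10(x + 1), which reproduces the quoted ranges (log10(58330) ≈ 4.77, log10(1.3 × 10^9) ≈ 9.11), although the paper does not state the exact base.

```python
import numpy as np
import pandas as pd

LOG_SCALED = ["duration", "src_bytes", "dst_bytes"]   # features with very wide value ranges

def preprocess_numeric(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Step 1: logarithmic correction, e.g. duration in [0, 58329] -> roughly [0, 4.77].
    for col in LOG_SCALED:
        out[col] = np.log10(out[col] + 1.0)
    # Step 2: min-max normalization to [0, 1]: x' = (x - Min) / (Max - Min).
    for col in out.columns:
        lo, hi = out[col].min(), out[col].max()
        if hi > lo:                      # avoid division by zero for constant columns
            out[col] = (out[col] - lo) / (hi - lo)
        else:
            out[col] = 0.0
    return out
```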

3.4. Test Plan

This study adopts a comparative test method to evaluate the detection accuracy of FCRNN-IDS.

3.4.1. Comparative Experiment 1: Comparison with Traditional Machine Learning Methods

Using the same NSL-KDD dataset, the detection accuracies of seven traditional machine learning algorithms [20–33], namely decision tree, Naive Bayes, Naive Bayes tree, random tree, random forest, support vector machine, and multilayer perceptron, were compared with the detection accuracy of the FCRNN-IDS model for the 2-class case (normal, anomaly) and the 5-class case (normal, probe, DoS, R2L, and U2R).

In [5], the authors investigated the anomaly detection performance of the above seven classification algorithms on the NSL-KDD benchmark dataset using the Weka machine learning and data mining tool. For the 2-class task, Figures 4 and 5 show the detection accuracy of the seven traditional machine learning methods on KDDTest+ and KDDTest-21, respectively. This study takes the results of [5] as one of the comparative experiments and compares them with the fully connected recurrent neural network-based detection model under the 2-class task.

3.4.2. Comparative Experiment 2: Comparison with Recent Similar Literature

Wang and Cai [8] studied the performance of an artificial neural network-based intrusion detection system on 2-class and 5-class tasks using the same NSL-KDD benchmark dataset. Their experimental results show that, on the KDDTest+ dataset, the highest detection rate of the model was 81.2% for the 2-class task and 79.9% for the 5-class task.

Deng et al. [9] proposed a three-layer partially connected recurrent neural network architecture with 41 features as input and the 4 intrusion categories plus the normal category as output. Taking the KDD99 dataset as the benchmark, subsets of connection records were selected as the training set and the test set, respectively. The results show that the highest detection accuracy of the model on the test set was 94.1%, with a training time of 1383 seconds.

4. Results’ Analysis

4.1. Experimental Results of 2-Class Tasks

As mentioned earlier, the 41-dimensional feature vector was mapped to a 122-dimensional feature vector, which serves as the model input in the 2-class experiment. Figure 6 shows the detection accuracy of the FCRNN-IDS model on the training set with different structures and learning rates.

Figure 7 shows the detection accuracy of the FCRNN-IDS model on the test set KDDTest+ with different structures and learning rates. As can be seen from Figure 7, when the learning rate was 0.1 and the number of hidden nodes was 80, the detection accuracy of the model on the test set KDDTest+ was 83.28%.

Figure 8 shows the detection accuracy of the FCRNN-IDS model on KDDTest-21 with different structures and learning rates. As can be seen from Figure 8, when the learning rate was 0.1 and the number of hidden nodes was 80, the model achieved its highest detection accuracy on KDDTest-21, which was 68.55%.

The experimental results were as follows.

As shown in Table 1, a model with 80 hidden nodes and a learning rate of 0.1 obtains high detection accuracy. Figure 9 details the variation in detection accuracy of the FCRNN-IDS model during iterative training on KDDTrain+, KDDTest+, and KDDTest-21.
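Conceptually, this study of structures and learning rates amounts to a grid search; the sketch below illustrates it with hypothetical grid values and placeholder train/evaluate functions, since the exact grid is given only by Table 1 and Figures 6 through 8.

```python
from itertools import product

# Hypothetical search grid; the values actually evaluated in the paper are those
# reported in Table 1 and Figures 6-8, not the ones listed here.
hidden_nodes = [20, 40, 60, 80, 120]
learning_rates = [0.01, 0.1, 0.5]

def grid_search(train_fn, evaluate_fn):
    """Train one FCRNN-IDS model per (hidden nodes, learning rate) pair and keep the best."""
    best = (None, -1.0)
    for n_hid, lr in product(hidden_nodes, learning_rates):
        model = train_fn(n_hid, lr)      # e.g. repeated calls to a train_step routine
        acc = evaluate_fn(model)         # accuracy on KDDTest+ or KDDTest-21
        if acc > best[1]:
            best = ((n_hid, lr), acc)
    return best
```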

Ashfaq et al. [5] studied the detection accuracy of classification algorithms such as J48, multilayer perceptron, Naive Bayes, support vector machine, and random forest; the results are shown in Figures 4 and 5. The artificial neural network algorithm has the highest detection accuracy on the test set KDDTest+ in the 2-class task, reaching 81.2%, which was the most recent published result for these algorithms. All of the above model results were obtained on the NSL-KDD dataset, so the comparison conditions were similar.

As shown in Figure 10, three of the algorithm models showed good classification and detection performance; in particular, the Naive Bayes tree achieved better classification performance on the test sets KDDTest+ and KDDTest-21, with detection accuracies of 82.02% and 66.16%, respectively.

The ANN-based detection method proposed in [8] achieved its highest detection accuracy of 81.2% under the 2-class task, whereas FCRNN-IDS achieved a higher detection accuracy under the same task. Table 2 shows the confusion matrix of the ANN-based detection model on the test set KDDTest+ for the 2-class task, and Table 3 presents the confusion matrix of FCRNN-IDS on KDDTest+ for the same task.

Therefore, when performing 2-class tasks, FCRNN-IDS further improved the detection ability of attack behavior, improved the accuracy, and reduced the false positive rate.

4.2. Experimental Results of Multiclassification Tasks

Figure 11 shows the detection accuracy of FCRNN-IDS on the training set with different structures and learning rates. As can be seen from the figure, when the learning rate was 0.5 and the number of hidden nodes was 60, the model achieved its highest detection accuracy on the training set, which was 99.87%.

Figure 12 shows the detection accuracy of FCRNN-IDS on the test set KDDTest+ under different structures and learning rates. When the learning rate was 0.5 and the number of hidden nodes was 80, the model achieved its highest detection accuracy on KDDTest+, which was 81.29%.

Figure 13 shows the detection accuracy of FCRNN-IDS on the test set KDDTest-21 under different structures and learning rates. When the learning rate was 0.5 and the number of hidden nodes was 80, the model achieved its highest detection accuracy on KDDTest-21, which was 64.67%.

Table 4 shows the detection accuracy of FCRNN-IDS on the training set and the two test sets when performing the multiclass detection task with different network structures and learning rates. The experimental results on the multiclassification task clearly show that the network structure and the learning rate affect the detection ability of FCRNN-IDS. As shown in Table 4, when the hidden layer of FCRNN-IDS was set to 80 nodes and the learning rate was set to 0.5, the model achieved its highest detection accuracy on the KDDTest+ and KDDTest-21 test sets, 81.29% and 64.67%, respectively.

In order to compare the detection accuracy of the various algorithms, similar to the 2-class experiment, seven machine learning models, including J48, Naive Bayes, random forest, multilayer perceptron, and support vector machine, were built with the open-source Weka data mining software. Each model was trained on the training set KDDTrain+ using 10-fold cross-validation, and its detection accuracy was then evaluated on the test sets. The experimental results are shown in Figure 14. Compared with the previous 2-class task, the detection accuracy of the traditional classification models generally drops under the multiclass task. The multilayer perceptron has the highest detection accuracy on the test sets KDDTest+ and KDDTest-21, 78.10% and 58.40%, respectively.
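For readers who prefer Python to Weka, an equivalent baseline evaluation could look like the sketch below, with scikit-learn classifiers standing in for the Weka implementations (e.g. DecisionTreeClassifier for J48); X_train and y_train denote the preprocessed 122-dimensional KDDTrain+ features and the 5-class labels.

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier          # counterpart of Weka's J48
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

baselines = {
    "J48 (decision tree)":   DecisionTreeClassifier(),
    "Naive Bayes":           GaussianNB(),
    "Random forest":         RandomForestClassifier(n_estimators=100),
    "Multilayer perceptron": MLPClassifier(max_iter=500),
    "Linear SVM":            LinearSVC(),
}

def evaluate_baselines(X_train, y_train):
    # 10-fold cross-validation on KDDTrain+, mirroring the Weka setup described above.
    for name, clf in baselines.items():
        scores = cross_val_score(clf, X_train, y_train, cv=10)
        print(f"{name}: mean CV accuracy = {scores.mean():.4f}")
```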

Under the same conditions, the neural network-based classification model achieves a detection accuracy of 79.9% when performing multiclassification tasks. Obviously, FCRNN-IDS performs better than other neural network-based detection models when performing multiclassification tasks.

Tables 5 and 6 show the confusion matrices of the neural network-based detection model and the fully connected recurrent neural network-based detection model on the test set KDDTest+, respectively.

Comparing the detection results in Tables 5 and 6, it can be seen that, in terms of correctly detecting DoS attacks, probe attacks, and U2R attacks, the detection model based on the fully connected RNN correctly detected 429, 165, and 2 more connection records, respectively, than the neural network-based detection model. Of course, in terms of correctly detecting normal connection records and R2L attacks, the fully connected RNN-based model correctly detected 20 and 272 fewer connection records, respectively, than the neural network-based model.

The confusion matrices for the four attack types detected by the model on the test set KDDTest+ are shown in Tables 7–10, which show that the false positive rate and recall of the model vary with the type of attack. Table 11 summarizes the recall and false positive rates for the different attack types.
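The recall and false positive rate reported in Table 11 can be computed from a multiclass confusion matrix as follows; the example counts are hypothetical and are not the values from Tables 7 through 10.

```python
import numpy as np

def per_class_recall_and_fpr(cm: np.ndarray):
    """Recall and false positive rate per class from a square confusion matrix.

    cm[i, j] = number of records of true class i predicted as class j.
    """
    total = cm.sum()
    recalls, fprs = [], []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp
        fp = cm[:, k].sum() - tp
        tn = total - tp - fn - fp
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        fprs.append(fp / (fp + tn) if fp + tn else 0.0)
    return recalls, fprs

# Hypothetical 3-class example only; the paper's tables use the real KDDTest+ counts.
cm = np.array([[90, 5, 5],
               [10, 80, 10],
               [2, 3, 95]])
print(per_class_recall_and_fpr(cm))
```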

In order to compare the detection performance of the fully connected recurrent neural network model with the partially connected RNN model proposed in [9], the training set and test set were constructed according to the method described in [9], as shown in Table 12. In the experiments, the model was set to 20 hidden nodes, the learning rate was 0.1, and the model was trained for 50 epochs. The detection accuracy of the trained model on the test dataset reached 97.09%, higher than the 94.1% reported in the literature.

As the above experimental results show, the fully connected model proposed in this study has a stronger ability to model the feature space and a higher accuracy. Of course, without GPU acceleration, the training time of the fully connected RNN-based model was 1765 seconds, longer than the 1383 seconds of the partially connected RNN-based model.

5. Conclusion

Compared with traditional machine learning classification models, the fully connected recurrent neural network, as a deep learning method, has a stronger feature representation ability, can map the high-dimensional feature space into a low-dimensional feature representation more comprehensively, and can express complex functions. Therefore, the detection model based on the fully connected RNN can detect a large number of abnormal attack records in both binary and multiclassification tasks, improving the accuracy of intrusion detection and reducing the false positive rate. For example, in terms of correctly classifying abnormal records, the fully connected recurrent neural network-based detection model correctly detected 462 more records than the neural network-based detection model when performing the 2-class task [34].

The main contributions of this study are as follows:
(1) A new intrusion detection system based on a fully connected recurrent neural network (FCRNN-IDS) was proposed, its training method was studied, and the detection rates of models with different structures and different learning rates were investigated.
(2) Using the NSL-KDD dataset, the detection performance of seven traditional machine learning methods on 2-class and multiclass tasks was studied, which lays a foundation for comparing FCRNN-IDS with traditional learning methods.
(3) The detection accuracy of FCRNN-IDS on 2-class and multiclassification tasks was studied, and its performance in detecting the various attack types was analyzed in depth and compared with existing methods.

Data Availability

The dataset can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.