Abstract

The security of industrial control systems (ICSs) has received much attention in recent years. ICSs were once closed networks, but with the development of IT technologies they have become connected to the Internet, increasing the potential for cyberattacks. Because ICSs are so tightly linked to human lives, any harm to them could have disastrous consequences. As a means of protection, many intrusion detection system (IDS) studies have been conducted. However, because of the complicated network environment and the growing variety of attacks, it is difficult to cover all attack classes during training, and most existing classification techniques are hard to deploy in a real environment since they cannot deal with the open set problem. We propose a novel artificial neural network-based methodology to solve this problem. Our method can classify known classes while also detecting unknown classes. We approach the problem from two directions. On the one hand, we use an openmax layer instead of the traditional softmax layer; openmax overcomes the limitations of softmax and allows the neural network to detect unknown attack classes. On the other hand, a loss function termed center loss is applied during training to improve detection ability. Under the combined supervision of center loss and softmax loss, the neural network model learns better feature representations. We evaluate the neural network on the NF-BoT-IoT-v2 and Gas Pipeline datasets. The experiments show that our proposed method is comparable with the state-of-the-art algorithm in terms of detecting unknown classes while achieving better overall classification performance.

1. Introduction

The function of industrial control systems (ICSs) is to monitor and control industrial physical processes such as gas, water, and electricity transmission. In the early years, ICSs (to improve readability, all acronyms used in this paper are listed in Table 1) were isolated by an air gap. Recently, these systems have been connected to the Internet; although this connectivity greatly improves business efficiency, it causes many security issues [1]. The authors of [2] summarized the main attacks on ICSs, the most famous of which is Stuxnet [3]. Since attacks against ICSs can lead to catastrophic consequences, including economic losses, damage to physical equipment, and potentially casualties, it is important to develop security protection technologies.

The intrusion detection system (IDS) is one of the techniques for ICS security that has been extensively studied in recent years [4]. When intrusions occur, an IDS can generate alerts by examining data records or network traffic. Based on the concept of defense in depth, an attack detection system using multisource data was constructed in [5], and a multifeature data clustering optimization model [6] was proposed to improve the detection rate and detection time. Over the last few years, deep learning approaches have proven effective in many fields, and more and more IDS studies adopt deep learning methods, such as the deep belief network [7] and the bidirectional simple recurrent unit [8]. However, most IDSs [9] presume that all classes encountered at classification time are also present in the training set, which makes them hard to employ in a real environment. This problem is called open set recognition [10].

In this research, we introduce a novel intrusion detection method based on an open set artificial neural network (ANN), with the help of openmax and the center loss function. We investigate the open set recognition problem from two perspectives. The openmax layer helps the ANN model adapt to the open set environment, breaking through the limitations of traditional softmax, and the center loss function enables the model to further improve open set detection by learning a better representation. We seek to develop an intrusion detection method that can detect unknown attack classes and classify known classes as well, and we take both abilities into account when implementing the method. This research makes the following contributions:
(i) We employ openmax instead of softmax to give the ANN-based IDS the ability to detect unknown attack classes. Using openmax, the new neural network model is capable of concurrently classifying known classes and detecting unknown classes.
(ii) The center loss function is utilized in the training phase to enhance the representation of the input data. Benefiting from the center loss function, the neural network learns not only a separable but also a discriminative representation, which improves the detection of unknown classes.
(iii) We evaluate the performance of our proposed method on two datasets (NF-BoT-IoT-v2 and Gas Pipeline). The results show that our method achieves performance comparable to baseline methods in detecting unknown attack classes, while performing better in terms of overall classification ability and taking less time to train the neural network.

The remainder of this paper is organized as follows. Section 2 reviews related literature concerning ICS intrusion detection. Section 3 gives a full description of our proposed method. Section 4 verifies the performance of our method on two intrusion detection datasets. Finally, our conclusions and future work are presented in Section 5.

2. Related Work

When it comes to the security of ICSs, there have been many related works. In [4], cyber security solutions for ICSs are divided into four categories, namely, authentication solutions, privacy-preserving solutions, key management systems, and intrusion detection systems. In addition, IDSs are divided into several categories according to the machine learning methods used (e.g., IDS based on deep learning, IDS based on support vector machines, and IDS based on decision trees). The IDS plays an important role in the security protection of ICSs. From the perspective of intrusion detection data sources, there are two ways to classify IDSs [11]: the host intrusion detection system (HIDS), which uses data that originate from the host system, and the network intrusion detection system (NIDS), which inspects traffic generated by the network. In this study, we concentrate on NIDSs based on deep learning.

Deep learning approaches, such as the ANN, convolutional neural network (CNN), recurrent neural network (RNN), and long short-term memory (LSTM, a kind of RNN) network, have demonstrated extensive applicability in a range of domains, including computer vision and natural language processing [12]. There are security applications as well, such as traffic classification [13], intrusion detection [14, 15], and malware detection [16]. By harnessing the power of deep learning, these approaches outperform typical machine learning methods.

Many papers have been published that use deep learning to implement IDSs. In [17], researchers proposed a method based on a GoogLeNet-LSTM model, in which intrusions are classified by a softmax function after multiple processing steps. An IDS based on a CNN was put forward and tested on the NSL-KDD and UNSW-NB15 datasets in [18]; compared to ResNet50 and GoogLeNet, the method performed well on the NSL-KDD dataset. Reference [19] created an effective IDS based on LSTM by combining packet content level and time-series level information. The authors of [20] employed a deep autoencoder to learn normal network behaviors and a supervised deep neural network model for classification. The methods mentioned above have been shown to be effective in experiments.

However, when an IDS is deployed in the real world, it will encounter classes it has never seen before. Since the network environment is complicated and dynamic, and attack methods keep increasing, it is difficult for an IDS to cover all attack classes during training. Because the IDS is trained under the closed set assumption, it will incorrectly classify these unknown classes into one of the known classes. A promising open set IDS should classify known classes and also reject an input as "unknown" when it is too far away from the known classes [9]. Some research has been conducted to overcome the challenge of open set recognition [10], but there is relatively little work on it in the field of deep learning-based intrusion detection.

The authors of [21] proposed an Open-CNN model that uses the openmax layer [22] instead of softmax to detect unknown attacks; in this paper, we further add the center loss function to learn more powerful data representations. A system called CADE [23] leverages the power of contrastive learning [24] to train an autoencoder neural network. To detect data that belong to unknown classes, CADE employs the Median Absolute Deviation [25] based on the distances of the latent representations between samples and class centers. Contrastive learning constructs the loss function over sample pairs, so the number of training pairs grows dramatically, which inevitably results in slow convergence and instability [26]. Although CADE is successful in detecting unknown classes, it fails to obtain good performance when classifying all classes. This approach is compared to ours in the experiments section. To demonstrate the distinctive characteristics, the main differences between related work and our research are shown in Table 2.

3. Method

Our proposed method is described in depth in this section. First, we go through the concept of open set recognition and the architecture of our solution. Then, the two strategies we employ to solve the open set problem, namely, openmax and the center loss function, are explained. Openmax aims at detecting unknown classes, while the center loss function enhances this ability.

3.1. Overview

The closed set assumption, which states that all classes seen during the testing phase are present in the training data, is used by most machine learning approaches. However, in real-world situations, this ideal assumption is hard to satisfy, as it is difficult to cover all classes while training the classifier.

According to the statement in [27], there are three basic categories of classes, namely, known classes, known unknown classes, and unknown unknown classes. Known classes are the classes with distinctly labeled positive training examples (which also serve as negative examples for other known classes). Known unknown classes are labeled negative examples, not necessarily grouped into meaningful categories. Training with known unknown classes produces a model with an explicit "other" class. Unknown unknown classes are classes unseen during training. Compared with open set recognition, closed set recognition considers the known classes only.

In this paper, we focus on known classes and unknown unknown classes. For the sake of simplicity, we refer to "unknown unknown classes" as "unknown classes." Figure 1 illustrates the difference between the closed set and open set settings. Under the closed set assumption, unknown classes are incorrectly classified as one of the known classes, as seen in Figure 1(b). Open set recognition aims to train a classifier that not only classifies known classes but also handles unknown classes [10], as Figure 1(c) shows.

From a mathematical point of view, consider a set of known classes $\mathcal{K} = \{1, 2, \dots, N\}$ and an input feature vector $x$. An ideal model trained for open set recognition should output a label for each known class and reject unknown classes, as shown by (1), where unknown classes are represented by the label $0$:

$$\hat{y} = \begin{cases} i, & \text{if } x \text{ belongs to known class } i \in \mathcal{K}, \\ 0, & \text{if } x \text{ belongs to an unknown class.} \end{cases} \quad (1)$$

When dealing with classification tasks, softmax is frequently used as the final function in a neural network. Softmax outputs a probability for each class, and the class with the maximum probability is usually taken as the final label. When facing the open set problem, we can use a threshold to reject inputs as unknown if the softmax probability is low for all known classes, but the performance is not satisfactory [22].
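
As a concrete illustration of this naive baseline, the short sketch below (NumPy only; the activation values and the 0.9 threshold are made-up examples) thresholds the softmax output and rejects a low-confidence input as unknown.

```python
import numpy as np

def softmax(av):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(av - np.max(av))
    return e / e.sum()

def predict_with_threshold(av, threshold=0.9):
    """Return a known-class index, or -1 for 'unknown' when the
    classifier is not confident enough (naive softmax thresholding)."""
    probs = softmax(av)
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else -1

# Example: a 4-class activation vector with no clearly dominant class.
print(predict_with_threshold(np.array([1.2, 1.0, 0.9, 1.1])))  # prints -1 (rejected)
```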

Our proposed framework for solving this challenge is shown in Figure 2. The structure is made up of three parts. Instead of a softmax layer, an openmax layer [22] is used to detect unknown classes, with the ANN on the left serving as the base classifier. On this basis, we add the center loss function during training. The center loss function interacts with the training phase to enable the neural network to learn not only separable but also discriminative representations, which helps the classifier identify new unseen classes [26]. Next, we describe openmax and the center loss function in depth.

3.2. Openmax

Consider a set of labeled samples $\{(x_i, y_i)\}_{i=1}^{n}$, where $n$ denotes the number of samples and $y_i \in \mathcal{K} = \{1, \dots, N\}$. Given a test sample $x$, neural networks usually use the softmax function to output a vector which contains the probability distribution over all the known classes. Let the input of the softmax function be $v(x) = \big(v_1(x), \dots, v_N(x)\big)$. In consistency with [22], $v(x)$ is called the activation vector (AV). The neural network learns a mapping function between the input data and the AV, $f: x \mapsto v(x)$. The probability of every class is given in (2):

$$P(y = j \mid x) = \frac{e^{v_j(x)}}{\sum_{i=1}^{N} e^{v_i(x)}}, \quad (2)$$

where the dimension of the AV is equal to the number of classes in $\mathcal{K}$. The corresponding predicted label is the index with the largest probability in the probability vector $P$.

In order to break the limitations of softmax, [22] proposed a new layer called openmax, which is used for open set recognition in neural networks. The neural network is still trained with softmax first. Each known class is then represented by a mean activation vector (MAV); to be precise, the mean is computed over the correctly classified training instances. By leveraging the distances between the AVs of the training samples and the MAV of the corresponding class, a Weibull distribution model is fitted for each known class. Based on this, openmax calculates a recalibrated AV that includes an unknown class. In the following, we introduce how openmax works in detail.

The openmax layer mainly works in two phases, namely, the learning phase and the testing phase. In the learning phase, the parameters used in the testing phase are calculated. Let $S_j$ denote the set of AVs of the correctly classified training instances for class $j$, where $j \in \{1, \dots, N\}$; meanwhile, $\mu_j$ represents the MAV of class $j$. A distance function $d(\cdot, \cdot)$ is used to compute the distances between the instances in $S_j$ and $\mu_j$; in this paper, we use the Euclidean distance. After sorting the distances, a Weibull model is fitted on the $\eta$ largest distances (the tail size) by the FitHigh function from libMR [28]. The fitted model for class $j$ contains the Weibull parameters $\rho_j$.
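
A minimal sketch of this learning phase is given below. Note that the paper uses the FitHigh function from libMR [28]; here we approximate the tail fit with scipy.stats.weibull_min purely for illustration, and the per-class grouping of correctly classified activation vectors (avs_per_class) is a hypothetical input.

```python
import numpy as np
from scipy.stats import weibull_min

def fit_class_models(avs_per_class, tail_size=20):
    """For each known class: compute the mean activation vector (MAV)
    and fit a Weibull model on the tail of the largest AV-to-MAV distances.
    avs_per_class: dict {class_id: array of shape (n_correct, n_classes)}.
    Returns dict {class_id: (mav, weibull_params)}."""
    models = {}
    for cls, avs in avs_per_class.items():
        mav = avs.mean(axis=0)                     # MAV of the class
        dists = np.linalg.norm(avs - mav, axis=1)  # Euclidean distances to the MAV
        tail = np.sort(dists)[-tail_size:]         # keep only the largest distances
        # floc=0 pins the location parameter; shape and scale are estimated.
        shape, loc, scale = weibull_min.fit(tail, floc=0)
        models[cls] = (mav, (shape, loc, scale))
    return models
```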

With the information derived from the training data, the testing phase for detecting new samples is summarized in Algorithm 1. Considering the AV $v(x)$ of input $x$, the Weibull CDF probability of the top $\alpha$ largest activations is used for scaling. Then, we calculate the revised activation vector with the top $\alpha$ scores changed. The calibrated AV is given by

$$\hat{v}_i(x) = v_i(x)\,\omega_i(x), \quad (3)$$

where $\omega_i(x)$ is the coefficient used to calibrate the AV. The calculation of $\omega$ is presented in lines 1–7 of Algorithm 1.

Data: Activation vector $v(x) = \big(v_1(x), \dots, v_N(x)\big)$ of test sample $x$, the number of top classes used to calibrate the AV $\alpha$, the Weibull model parameter vector $\rho = (\rho_1, \dots, \rho_N)$, and the MAVs $\mu_1, \dots, \mu_N$ of the known classes.
Result: Predicted label $y^*$.
(1) for $i = 1, \dots, \alpha$ do
(2)  $s(i) \leftarrow$ index of the $i$th largest activation in $v(x)$
(3)  $w_{s(i)} \leftarrow \mathrm{WeibullCDF}\big(d(v(x), \mu_{s(i)});\ \rho_{s(i)}\big)$
(4) end
(5) for $i = 1, \dots, \alpha$ do
(6)  $\omega_{s(i)}(x) \leftarrow 1 - \frac{\alpha - i + 1}{\alpha}\, w_{s(i)}$ (all other $\omega_j(x) = 1$)
(7) end
(8) for $j = 1, \dots, N$ do
(9)  $\hat{v}_j(x) \leftarrow v_j(x)\,\omega_j(x)$
(10) end
(11) $\hat{v}_0(x) \leftarrow \sum_{j=1}^{N} v_j(x)\,\big(1 - \omega_j(x)\big)$
(12) for $j = 0, 1, \dots, N$ do
(13)  $\hat{P}(y = j \mid x) \leftarrow e^{\hat{v}_j(x)} \big/ \sum_{i=0}^{N} e^{\hat{v}_i(x)}$
(14) end
(15) $y^* \leftarrow \arg\max_{j} \hat{P}(y = j \mid x)$; reject $x$ as unknown if $y^* = 0$ or $\hat{P}(y = y^* \mid x) < \epsilon$

Lastly, the softmax function is applied to the recalibrated activation vector again. In this way, openmax outputs a probability vector that includes the unknown class. When the $0$th (unknown) class has the largest probability, or if the largest probability is below a threshold $\epsilon$, the input belongs to the unknown classes.
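
The testing phase can be sketched as follows, building on the hypothetical fit_class_models output above. It follows the recalibration convention of the released OpenMax code (the top-ranked class receives the full Weibull weight) and is an illustration of the idea in [22] rather than the exact reference implementation; alpha and epsilon are the top rank and rejection threshold discussed above.

```python
import numpy as np
from scipy.stats import weibull_min

def openmax_predict(av, models, alpha=1, epsilon=0.5):
    """models: {class_id: (mav, (shape, loc, scale))} from the learning phase.
    Assumes class ids 0..N-1 correspond to positions in the AV.
    Returns a known class id, or -1 for 'unknown'."""
    n = len(av)
    classes = sorted(models.keys())
    ranked = np.argsort(av)[::-1]                  # class indices by activation, descending
    omega = np.ones(n)
    for i, idx in enumerate(ranked[:alpha]):
        mav, params = models[classes[idx]]
        dist = np.linalg.norm(av - mav)            # distance of the AV to the class MAV
        wscore = weibull_min.cdf(dist, *params)    # probability of being an outlier
        omega[idx] = 1.0 - ((alpha - i) / alpha) * wscore  # damp the top activations
    v_hat = av * omega                             # revised activations
    v_unknown = np.sum(av * (1.0 - omega))         # pseudo-activation for 'unknown'
    scores = np.concatenate(([v_unknown], v_hat))
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    best = int(np.argmax(probs))
    if best == 0 or probs[best] < epsilon:
        return -1                                  # rejected as unknown
    return classes[best - 1]
```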

3.3. Center Loss Function

The loss function measures the error between the neural network output and the ground truth. Since the neural network has a huge number of parameters, training it means finding the parameters that minimize the loss function computed over the training set.

The softmax loss function (also named the categorical cross-entropy loss) is usually used to measure the error in classification tasks. The detailed computation is

$$L_S = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{N} \mathbb{1}\{y_i = j\}\,\log P(y_i = j \mid x_i), \quad (4)$$

where $m$ and $N$ are the size of the mini-batch and the number of classes, respectively. $\mathbb{1}\{y_i = j\}$ is the indicator function denoting whether the input sample belongs to class $j$: when $y_i = j$ is fulfilled, $\mathbb{1}\{y_i = j\} = 1$; otherwise, $\mathbb{1}\{y_i = j\} = 0$. The calculation of $P(y_i = j \mid x_i)$ is shown in (2).

Under the effect of the softmax loss function, the trained neural network is good at finding decision boundaries. To give a demonstration, we trained a neural network on the MNIST dataset [29]. For visualization, we set the number of neurons in the last hidden layer to 2. The distribution of the ten classes in the MNIST dataset is shown in Figure 3(a); this distribution is called the deep features [26]. Different colors represent different classes, and star symbols denote the centers of the corresponding classes.

From Figure 3(a), the different classes are separable. Although such features can obtain good performance in classification tasks under closed set recognition, the classes show significant intraclass variations and are not discriminative enough. To obtain discriminative deep features, the center loss function was proposed [26]. The two additional subfigures in Figure 3 are generated by training with the center loss function. During the training phase, the center loss function enlarges the interclass differences of the deep features and also reduces their intraclass variations, as shown in Figures 3(b) and 3(c).

Openmax equips the neural network with the ability to detect unknown classes. However, openmax does not interact with the training process, which means it still uses the AV trained under the softmax loss. In this research, we adopt the center loss function to train a better AV. In the following, we describe the center loss function in detail.

Suppose we use stochastic gradient descent (SGD) with the mini-batch strategy to train the neural network; the center loss is defined in (5):

$$L_C = \frac{1}{2}\sum_{i=1}^{m} \left\| v(x_i) - c_{y_i} \right\|_2^2. \quad (5)$$

The detailed calculation process of the center loss is outlined in Algorithm 2.

Data: Training data $\{(x_i, y_i)\}$. Initialized parameters $\theta$ of the neural network and initialized centers of the AVs $\{c_j \mid j = 1, \dots, N\}$. Hyperparameters $\lambda$, $\alpha$, and learning rate $\mu$. The number of iterations $t$.
Result: The parameters $\theta$.
(1) $t \leftarrow 0$
(2) while not converge do
(3)  $t \leftarrow t + 1$; sample a mini-batch of size $m$
(4)  Compute the joint loss by $L^{t} = L_S^{t} + \lambda L_C^{t}$
(5)  for $i = 1, \dots, m$ do
(6)   Compute the backpropagation error by $\partial L^{t} / \partial v(x_i) = \partial L_S^{t} / \partial v(x_i) + \lambda\, \partial L_C^{t} / \partial v(x_i)$
(7)  end
(8)  for $j = 1, \dots, N$ do
(9)   Update the centers by $c_j^{t+1} = c_j^{t} - \alpha \cdot \Delta c_j^{t}$ as in (6) and (7)
(10)  end
(11)  Update the parameters by $\theta^{t+1} = \theta^{t} - \mu \cdot \partial L^{t} / \partial \theta^{t}$
(12) end

$v(x_i)$ is the output AV of sample $x_i$, and $c_{y_i}$ is the center of the AVs belonging to class $y_i$. Since it is hard to update the centers over the whole dataset, the mini-batch centers are used instead. Also, when updating the centers, a scalar $\alpha$ is used to control their change rate in order to decrease the effect of anomalous points. The centers are updated by

$$c_j^{t+1} = c_j^{t} - \alpha \cdot \Delta c_j^{t}, \quad (6)$$

$$\Delta c_j^{t} = \frac{\sum_{i=1}^{m} \mathbb{1}\{y_i = j\}\,\big(c_j^{t} - v(x_i)\big)}{1 + \sum_{i=1}^{m} \mathbb{1}\{y_i = j\}}. \quad (7)$$

In (7), if $y_i = j$ is satisfied, then $\mathbb{1}\{y_i = j\} = 1$; otherwise, $\mathbb{1}\{y_i = j\} = 0$.

Finally, the loss function is defined as the combination of the softmax loss and the center loss:

$$L = L_S + \lambda L_C, \quad (8)$$

where the hyperparameter $\lambda$ is used to control the balance between the two loss functions. With the joint supervision of the softmax loss and the center loss, the neural network can learn a separable and discriminative AV.
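
For illustration, the NumPy sketch below (hypothetical variable names; the actual model is implemented in Keras) computes the joint loss for one mini-batch and applies the damped center update described above, assuming the activation vectors for the batch have already been produced by the network.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def joint_loss_and_center_update(avs, labels, centers, lam=0.1, alpha=0.5):
    """avs: (m, N) mini-batch activation vectors; labels: (m,) class ids;
    centers: (N, N) per-class AV centers. Returns (joint loss, new_centers)."""
    m, n_classes = avs.shape
    probs = softmax(avs)
    # Softmax (cross-entropy) loss over the mini-batch.
    l_s = -np.mean(np.log(probs[np.arange(m), labels] + 1e-12))
    # Center loss: half the squared distance of each AV to its class center.
    diffs = avs - centers[labels]
    l_c = 0.5 * np.sum(diffs ** 2)
    loss = l_s + lam * l_c                        # joint objective L = L_S + lambda * L_C
    # Center update, damped by alpha to limit the effect of anomalous points.
    new_centers = centers.copy()
    for j in range(n_classes):
        mask = labels == j
        if mask.any():
            delta = np.sum(centers[j] - avs[mask], axis=0) / (1.0 + mask.sum())
            new_centers[j] = centers[j] - alpha * delta
    return loss, new_centers
```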

4. Evaluation

Two intrusion detection datasets are utilized to test the effectiveness of our proposed method. The experiments and corresponding results are described in this section. First, we introduce the datasets and performance measures; then, the experimental results are thoroughly examined.

4.1. Datasets

In the field of network intrusion detection, the KDD Cup 99 dataset is widely used, and the NSL-KDD dataset [30] is an improved version of KDD Cup 99. However, these datasets have been criticized extensively and are considered inadequate and out of date [31]. They also lack the characteristics of ICSs.

In this research, we select the NF-BoT-IoT-v2 dataset [32] and the Gas Pipeline dataset [33] to evaluate performance. These datasets were collected in recent years, contain some new attacks, and are more representative of the modern network environment.

4.1.1. NF-BoT-IoT-v2 Dataset

The NF-BoT-IoT dataset [34] was created by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) using the Ostinato and Node-RED tools. NF-BoT-IoT-v2 [32] is an extended version of NF-BoT-IoT using NetFlow-based features. There are four attacks in the dataset, namely, DoS, DDoS, Reconnaissance, and Theft. Since the dataset contains a very large number of records, it is difficult to train the neural network on all of them, so we randomly sample the records. The detailed distribution after sampling is shown in Table 3.

After dropping some invalid columns (i.e., IPV4-SRC-ADDR, L4-SRC-PORT, IPV4-DST-ADDR, and L4-DST-PORT) and the label, 39 features are left. We repeatedly pick one of the attack classes as an unknown class to conduct our experiments, which means the picked class only appears in the testing set. The log function is applied to decrease the effect of large differences in feature values, except for some category features (i.e., PROTOCOL, L7-PROTO, TCP-FLAGS, CLIENT-TCP-FLAGS, SERVER-TCP-FLAGS, FTP-COMMAND-RET-CODE, and DNS-QUERY-TYPE). We split the known attacks and the Benign class with a ratio of 80 : 20.

Further, the dataset needs to be encoded and normalized. The processing techniques are fitted on the training set first; the fitted normalization and encoding models are then used to handle the testing set. We use the one-hot encoding scheme to handle the category columns mentioned above. Since different attack classes contain different category values, there are some differences in the resulting features, but the training set and testing set for a specific attack class are consistent. When encountering unknown category values, the encoded result is set to 0. The standard scaler is used to scale all features to mean 0 and standard deviation 1. In order to speed up the training of the neural network, the SelectPercentile function in scikit-learn [35] with the default ANOVA F-value as an indicator is used to select features (i.e., 60%). Finally, about 120 features are left.
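
The preprocessing described above can be outlined with scikit-learn roughly as follows; the column names mirror the lists above (the on-disk dataset may spell them with underscores), and this is a sketch of the pipeline rather than our exact code.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

CATEGORY_COLS = ["PROTOCOL", "L7-PROTO", "TCP-FLAGS", "CLIENT-TCP-FLAGS",
                 "SERVER-TCP-FLAGS", "FTP-COMMAND-RET-CODE", "DNS-QUERY-TYPE"]

def build_preprocessor(numeric_cols):
    transform = ColumnTransformer([
        # Unknown category values seen only at test time are encoded as all zeros.
        # (Older scikit-learn versions use sparse=False instead of sparse_output=False.)
        ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), CATEGORY_COLS),
        # log1p compresses large numeric differences (assumes nonnegative values).
        ("num", FunctionTransformer(np.log1p), numeric_cols),
    ])
    return Pipeline([
        ("transform", transform),
        ("scale", StandardScaler()),                               # zero mean, unit variance
        ("select", SelectPercentile(f_classif, percentile=60)),    # keep top 60% by ANOVA F-value
    ])

# Fit on the training set only, then apply the fitted pipeline to the test set:
#   pre = build_preprocessor(numeric_cols)
#   X_train_p = pre.fit_transform(X_train, y_train)
#   X_test_p = pre.transform(X_test)
```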

4.1.2. Gas Pipeline Dataset

The other dataset is the Gas Pipeline dataset [33], which was created in 2014 by Mississippi State University. This dataset includes seven attack categories. As in NF-BoT-IoT-v2 described above, the distribution of the attack classes and Benign is imbalanced, which makes it hard to obtain good performance, so we downsample the Benign class and the CMRI attack. The detailed distribution is shown in Table 4. Malicious state command injection (MSCI), for example, is an attempt to send malicious commands to devices and, as a result, modify the state of the system.

Similarly, we repeatedly choose one attack class as the unknown class and split the known data with a ratio of 80 : 20. In addition, the standard scaler is employed to normalize the features. Finally, the data dimension after processing is 26.

4.2. Evaluation Metrics

In this research, we use metrics commonly applied in classification tasks to evaluate the performance of our method. Specifically, when solving the problem of detecting unknown classes, four outcomes are considered, as shown in Table 5.

A true positive (TP) means that a flow record belonging to an unknown class is predicted as unknown. Conversely, a false negative (FN) is the situation where an unknown class record has been misclassified as known. If a record belongs to a known class, it results in a false positive (FP) when the prediction is the unknown class. In contrast, a true negative (TN) is the situation where a known class record has been correctly classified as known. Using these concepts, three metrics are used to report our performance:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

Similarly, when considering the overall classification, the performance metrics mentioned above are applied to all classes in the dataset. The measures are averaged across all classes, including the unknown class, to determine the overall classification performance.
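
For reference, both kinds of measures can be computed with scikit-learn as sketched below, assuming NumPy label arrays in which the unknown class is encoded as -1 (a convention chosen here for illustration).

```python
from sklearn.metrics import precision_recall_fscore_support

# y_true / y_pred: NumPy arrays of class labels where -1 denotes the unknown class.
def unknown_detection_metrics(y_true, y_pred):
    """Binary precision/recall/F1 treating 'unknown' (-1) as the positive class."""
    return precision_recall_fscore_support(
        y_true == -1, y_pred == -1, average="binary", zero_division=0)[:3]

def overall_metrics(y_true, y_pred):
    """Macro-averaged precision/recall/F1 over all classes, unknown included."""
    return precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)[:3]
```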

4.3. Results

The proposed method was developed on a machine with an Intel® Xeon® CPU E5-2640 v4 @ 2.40 GHz, three NVIDIA Tesla P100 PCIe 12 GB graphics cards, and 128 GB of RAM. The Keras framework [36] is used to program the neural network model in Python. We utilize the PReLU activation function [37] in the hidden layers and the Adam algorithm [38] to train the neural network. Due to the different input dimensions after preprocessing, the number of neurons in the hidden layers differs between the two datasets. The detailed settings are shown in Table 6.
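
A minimal Keras sketch of the base classifier is shown below; the hidden layer sizes and other hyperparameters are placeholders (the actual values are listed in Table 6), and only the standard softmax head is shown since openmax and the center loss are applied as described in Section 3.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_ann(input_dim, n_classes, hidden_units=(64, 32)):
    """Base ANN classifier: Dense layers with PReLU activations, softmax output.
    hidden_units is a placeholder; the real sizes are given in Table 6."""
    inputs = keras.Input(shape=(input_dim,))
    x = inputs
    for units in hidden_units:
        x = layers.Dense(units)(x)
        x = layers.PReLU()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```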

The hyperparameters of openmax are crucial for detecting unknown class data. Commonly, deep learning methods use part of the training set as a validation set to select hyperparameters, but it is hard to choose suitable parameters for open set recognition since data of the unknown classes cannot be obtained. Two parameters are important for openmax: the tail size $\eta$ and the top rank $\alpha$. The tail size determines the number of distances used to fit the Weibull distribution, while the top rank determines the number of largest activations to revise. When the tail size is small, anomalies can easily affect the fit; thus, we set it to 1,500 for the NF-BoT-IoT-v2 dataset. Considering that the number of samples in the Gas Pipeline dataset is small, $\eta$ is set to 250 for this dataset. As for the top rank $\alpha$, we use a part of the training set that comprises known classes only as the validation set and use the performance of classifying known classes as an indicator to pick it. When this hyperparameter is higher, many samples that belong to known classes would be misclassified as unknown. For both datasets, $\alpha$ is 1.
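
The selection of the top rank can be sketched as a simple search over candidate values, scored on a validation split that contains known classes only; predict_fn and score_fn below are placeholders for the openmax prediction and macro-averaged metric routines sketched earlier.

```python
import numpy as np

def select_top_rank(av_val, y_val, predict_fn, score_fn, candidates=(1, 2, 3)):
    """Pick the top rank alpha on a validation split of known classes only.
    predict_fn(av, alpha) -> predicted label; score_fn(y_true, y_pred) -> scalar
    (e.g., macro F1 over the known classes)."""
    best_alpha, best_score = candidates[0], -np.inf
    for alpha in candidates:
        preds = np.array([predict_fn(av, alpha) for av in av_val])
        score = score_fn(np.asarray(y_val), preds)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```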

We begin by comparing the performance of our proposed method in detecting unknown classes to the baselines. The overall classification results are shown next. Finally, we run a sensitivity analysis on our proposed method.

4.3.1. Detection Ability of Unknown Classes

The primary task of open set recognition is to detect unknown classes. When working in a closed set environment, all samples that belong to unknown classes are classified as one of the known classes, resulting in a dramatic drop in performance. To conduct the open set experiments, we iterate over the intrusion classes, treating a different one as the unknown class each time, with Benign always being a known class because Benign samples are easier to obtain.

To begin, we go over the steps of our experiment using the NF-BoT-IoT-v2 dataset. Assume that the DDoS attack class is set as unknown. The model will not encounter any DDoS samples during the training phase, since all DDoS records are placed in the testing set. The remaining instances of the known classes are divided into training and testing sets as in common evaluation processes. As a result, the testing set contains both known and unknown classes and is used to assess the performance of the open set classifier; the sketch below illustrates this split. With the experimental settings above, there are four experiments for the NF-BoT-IoT-v2 dataset and seven for the Gas Pipeline dataset. To compare performance, we take the average of the results for each of the two datasets.
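
The per-experiment split can be sketched as follows (scikit-learn train_test_split; the helper name and the -1 label for the unknown class are our own conventions for illustration).

```python
import numpy as np
from sklearn.model_selection import train_test_split

def open_set_split(X, y, unknown_class, test_size=0.2, seed=42):
    """Hold out one class entirely for testing; split the rest 80:20."""
    unknown_mask = y == unknown_class
    X_known, y_known = X[~unknown_mask], y[~unknown_mask]
    X_train, X_test, y_train, y_test = train_test_split(
        X_known, y_known, test_size=test_size, stratify=y_known, random_state=seed)
    # The held-out class only appears in the test set, relabeled as -1 (unknown).
    X_test = np.concatenate([X_test, X[unknown_mask]])
    y_test = np.concatenate([y_test, np.full(unknown_mask.sum(), -1)])
    return X_train, y_train, X_test, y_test
```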

To demonstrate the performance of our proposed method, three baseline methods are included. The first two are Autoencoder [23] and CADE (contrastive learning + autoencoder) [23]. Both use the latent representation learned by an autoencoder, but CADE leverages contrastive learning to learn a more powerful latent representation. They then compute the centroid (the mean value of each dimension in Euclidean space) for each known class. If the distance between the latent representation of a sample and every centroid is larger than a threshold, the sample is determined to belong to an unknown class. Instead of setting the distance threshold for each class manually, they employ the Median Absolute Deviation [25], which estimates the data distribution within each class by calculating the median of the absolute deviations of the distances from their median. An openmax-only ANN is the third baseline. This method has the same hyperparameter settings as our proposed method except that it does not use the center loss function during training; it is used to highlight the improvement gained by adding the center loss function.

(1) Unknown Class Detection Results. Table 7 provides the corresponding three measures tested on the two datasets using the four different approaches. On the NF-BoT-IoT-v2 dataset, our method reports the greatest precision and F1 score, which are 0.93 and 0.84, respectively. CADE obtains the best recall of 0.95. The F1 score of our method is 0.03 higher than CADE's. Although the openmax-only method has a higher precision, it has the lowest recall; its F1 score is lower than CADE's and ours but better than that of the autoencoder.

On the Gas Pipeline dataset, our approach has the best precision and F1 score, 0.82 and 0.75, respectively, compared to the other three baselines. However, the difference in F1 score between our method and CADE is minor, and the recall of CADE is the best. The openmax-only method and the autoencoder achieve similar results on the three measures.

(2) Effectiveness of the Center Loss Function. Table 7 also confirms the effectiveness of adding the center loss function to the neural network. On the NF-BoT-IoT-v2 dataset, with the help of the center loss function, although the precision increases by only 0.03, the improvement in recall and F1 score is higher. On the Gas Pipeline dataset, the F1 score increases by 0.12.

In general, our method achieves results comparable with CADE. CADE employs contrastive learning, whereas our approach uses the center loss to interact with the training process and learn superior feature representations. By contrast, our technique takes less time to train the neural network. The comparisons in Table 7 also demonstrate the impact of the center loss function.

The detailed performance results for every attack class are shown in Figures 4 and 5. On the NF-BoT-IoT-v2 dataset, our method and CADE have comparable results for most attack classes, but CADE performs poorly on the Theft attack; the F1 score of both the autoencoder and CADE on the Theft attack is about 0.4. On the Gas Pipeline dataset, the autoencoder and openmax-only methods fail to detect the MFCI attacks, and even with our method or CADE, the F1 score only reaches approximately 0.6. In addition, all four approaches perform poorly on the NMRI attack.

4.3.2. Comparisons with Closed Set Classifier

Classification tasks in an open set environment also require the ability to classify known classes. Similarly, we repeatedly select one attack class as the unknown class, as before. When evaluating the performance of classifying all classes, we use the average of all metrics over each known class and the unknown class. The autoencoder and CADE are used as baselines as well; for these two methods, except for the records classified as unknown, the other records are classified according to the label of the closest centroid.

Figures 6 and 7 give examples of the classification confusion matrices for NF-BoT-IoT-v2 and Gas Pipeline, respectively. Specifically, the Theft attack class is set as unknown for the NF-BoT-IoT-v2 dataset (Figure 6), while Reconnaissance is the unknown class for the Gas Pipeline dataset (Figure 7).

Using Theft as the unknown class in the NF-BoT-IoT-v2 dataset, the results of three experimental settings are shown in Figure 6. Figure 6(a) presents the results when all attack classes are known; almost all records are classified correctly. In Figure 6(b), all Theft records are misclassified. Due to the closed set nature of softmax, this is inevitable, and it reduces the classification capability of the deployed model. With our method, about 91% of the Theft records are classified correctly. Some Reconnaissance records are predicted as DoS, while some records of other classes are also misclassified as the unknown (Theft) class.

For the Gas Pipeline dataset shown in Figure 7, when softmax is used in the open set environment (Figure 7(b)), most of the Reconnaissance records are wrongly predicted as Benign. Also, most of the NMRI records are misclassified even under the closed set, which is consistent with the results shown in Figure 5(c), in which the F1 score of all methods is poor. In Figure 7(c), all unknown Reconnaissance records are correctly classified into the unknown category; however, some data of the known classes are classified wrongly.

Table 8 reports the statistics on the two datasets. Softmax represents a neural network with the softmax function trained under the closed set assumption; in this case, all unknown class samples are incorrectly classified, and this result is used to measure the improvement of the other four methods. Our proposed method shows significant improvements in classification performance over the closed set softmax classifier. Because the unknown attack data are classified correctly, our method yields encouraging outcomes; compared with the other methods, it produces the best results on all three metrics for both datasets. The CADE method also enhances the overall classification performance, but our method achieves a higher improvement on both datasets. In contrast, the autoencoder and openmax-only methods produce worse results than the softmax classifier because they fail to classify known classes correctly, resulting in a decrease in performance.

4.3.3. Sensitivity Analysis

During the experiments, the setting of the hyperparameters affects the performance of the method. For simplicity, we run tests on various hyperparameter settings of the openmax function. There are two significant hyperparameters: the tail size $\eta$ (the number of largest distances used to fit the Weibull distribution) and the top rank $\alpha$ (the number of top activations whose Weibull CDF probability is scaled). We use the NF-BoT-IoT-v2 dataset to demonstrate this process and vary these two parameters to show their effect in Figures 8 and 9.

Figure 8 shows the average performance of detecting unknown classes. The x-axis represents the tail size, and the y-axis represents the corresponding performance; different colors represent different values of the top rank. The three measures are lower when the tail size is small. During the experiments, the tail size starts from 100 and increases by 200 each time. When the rank is 1, the precision grows along with the tail size and, once the tail size reaches a certain point, becomes steady. When the tail size exceeds 500, the precision begins to deteriorate as the rank rises to 2 or 3, owing to the misclassification of an increasing number of instances that belong to known classes. The recall appears to be steady when the tail size is more than 2,000. In Figure 8(c), the F1 score achieves the best result when the tail size is about 1,100 and the rank is 2.

The overall classification results are shown in Figure 9. When the tail size rises above 800, these metrics begin to fall for a top rank of 2 or 3, owing to misclassified instances; when the top rank is 1, they are stable. When the top rank is set to 2 and the tail size is 1,100, the best F1 score is 0.85. As described before, it is difficult to find ideal settings for the openmax function since there are no open set samples available for validation. The tail size indicates the amount of data used to fit the Weibull distribution, and it is hard to achieve good results when it is too small. In our tests, we use a tail size of 1,500 and a top rank of 1; although these settings are not optimal, the difference is acceptable.

5. Conclusions and Future Work

The security of ICSs is becoming increasingly crucial, as intrusions can cause dangerous consequences. In this study, we provide a novel solution for the ICS intrusion detection task, addressing the open set recognition problem by combining openmax and the center loss with an ANN model.

Considering the complicated network environment and the growing variety of attacks, a more capable IDS should also handle open set recognition, that is, train a classifier that not only classifies known classes but also handles unknown classes. To achieve this goal, openmax rejects samples that are far from the known classes using Weibull distributions derived from correctly classified instances. To improve the representation of the input data, the center loss is paired with the original softmax loss, which further increases performance. We experiment on two datasets, NF-BoT-IoT-v2 and Gas Pipeline, and our proposed method achieves promising results compared with the baseline methods. In particular, the center loss function provides great help in improving performance.

In terms of future work, there are a few directions worth exploring. In this study, for example, the base classifier is a shallow neural network; it could be improved by using a more powerful network architecture. In addition, since the center loss function provides great help to openmax, other loss functions deserve investigation. Finally, the newly encountered classes need to be dealt with: since training a neural network requires a lot of data, it is important to explore how to train a new model when the number of samples belonging to such a class is limited.

Data Availability

In this research, we use two datasets, namely, NF-BoT-IoT-v2 and Gas Pipeline. Readers who want to reproduce our results can access these datasets from the corresponding reference papers.

Conflicts of Interest

All the authors hereby declare no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the National Key Research and Development Program of China (no. 2018YFB2004200).