Abstract

The massive use of information technology has brought certain security risks to the industrial production process. In recent years, cyber-physical attacks against industrial control systems have occurred frequently. Anomaly detection technology is an essential technical means to ensure the safety of industrial control systems. Considering the shortcomings of traditional methods and to facilitate the timely analysis and location of anomalies, this study proposes a solution based on the deep learning method for industrial traffic anomaly detection and attack classification. We use a convolutional neural network deep learning representation model as the detection model. The original one-dimensional data are mapped using the feature mapping method to make them suitable for model processing. The deep learning method can automatically extract critical features and achieve accurate attack classification. We performed a model evaluation using real network attack data from a supervisory control and data acquisition (SCADA) system. The experimental results showed that the proposed method met the anomaly detection and attack classification needs of a SCADA system. The proposed method also promotes the application of deep learning methods in industrial anomaly detection.

1. Introduction

Industrial control systems (ICSs) play an important role in critical infrastructure sectors such as railway, petrochemical, and electricity. Although the application of information technology has significantly improved production efficiency, it has also exposed ICSs to many cyber-attacks that may damage national infrastructure and cause major economic losses [1]. In 2010, Stuxnet attacked an Iranian nuclear power station by exploiting operating system vulnerabilities and infected many programmable logic controllers (PLCs) connected to centrifuges [2, 3]. This was the first discovered destructive malware explicitly designed for ICSs. In 2012, the Flame virus was discovered in several Middle Eastern countries. This malware can collect sensitive information from a range of targets, including individuals, private companies, government agencies, and academic institutions. In 2014, hackers used the Havex malware to attack ICSs in the energy sector of Europe and the United States, specifically targeting supervisory control and data acquisition (SCADA) systems [4]. In addition to these well-known industrial security incidents, hundreds of attacks against ICSs are reported every year. Although there are fewer attacks on industrial systems than on the Internet, the damage caused by cyber-physical attacks cannot be ignored.

ICSs were initially designed with insufficient consideration for network security because of resource constraints and system isolation [5]. With the development of modern ICSs and information technology, potential security issues have been gradually exposed. To ensure an industrial process runs steadily and reliably, an ICS needs to be protected. Traditional information system solutions have been applied to ICSs as an early defense method. However, these measures cannot adequately detect cyber-physical attacks under real-time and resource-constrained conditions [6]. In recent years, the anomaly detection and safety protection of ICSs have been researched extensively worldwide to identify malicious patterns by measuring the deviation of current behavior with normal behavior as the standard. This study focuses on improving the performance of the industrial anomaly detection method. In addition, it is not enough to treat abnormal behavior detection as a binary classification problem. To quickly locate the source when a network is attacked and to achieve the mitigation and recovery of the control system state, it is necessary to observe ICS abnormal patterns in more detail.

Motivated by the research objectives above, this study attempted to apply deep learning to SCADA anomaly detection and attack classification. Deep learning is a type of machine learning method that has developed rapidly in recent years. Deep neural networks can capture deep relationships that are not easily obtained by ordinary means. The deep learning method can automatically learn from original data with no need to manually select features [7]. The deep neural network model we selected was the convolutional neural network (CNN), which has been widely used for image classification and recognition. We used this method to classify ICS abnormal traffic. A real gas pipeline data set published by Mississippi State University was used in this study.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 provides the theoretical basis of the method, and Section 4 introduces the methodology of the proposed method based on a CNN. Section 5 introduces the data sets used in this study and shows the relevant experimental results. Finally, Section 6 summarizes the results.

2. Related Work

2.1. Anomaly Detection

As a subset of intrusion detection, anomaly detection plays a significant role in the active defense process of ICSs. It is also a key technology for discovering abnormal behavior. To summarize existing research work, the anomaly detection approaches of ICSs include the following types.

2.1.1. Knowledge Based

This method analyzes the system state, behavior pattern, or protocol specification under normal conditions and establishes detection rules to flag attacks that do not conform to the specification. For example, the industrial intrusion detection system proposed by Khalili and Sami [8] used the Apriori algorithm to generate candidate key states of the ICS iteratively. During each iteration, the key state of the industrial process is decided based on expert experience. Carcano et al. [9] proposed a state-based SCADA intrusion detection system (IDS) that used single-signature and state analysis techniques to analyze Modbus data packets, together with a rule language to describe Modbus signatures and field device status. Experimental results showed that the proposed IDS detected potential threats to a SCADA system. Yang et al. [3] designed a multiattribute IDS for SCADA systems in smart grids. The IDS was constructed using the access control whitelist and the protocol whitelist method, with normal system behavior rules used as an additional check. To exploit the high periodicity of ICS traffic, Goldenberg and Wool [10] modeled each HMI-PLC channel with a deterministic finite automaton (DFA) to construct an accurate flow detection model. However, the traditional cyclic DFA model could not handle the burst traffic in an industrial control network. Markman et al. [11] constructed a new burst-DFA model based on semantic analysis that successfully solved this problem. The shortcoming of the knowledge-based anomaly detection technique is that it cannot detect potential attack operations that exploit system vulnerabilities or that conform to protocol specifications.

2.1.2. Statistics Based

The statistical anomaly detection technique often uses analytical and statistical correlation methods to analyze the parameters of ICSs and establish the normal behavioral contours of a system. Nasr and Varjani [12] analyzed the alarm attributes of a SCADA system and used a statistical method called SADM to detect the abnormality of a smart grid. The CUSUM algorithm is a commonly used statistical algorithm. Do et al. [13] proposed an improved algorithm named VTWL CUSUM to capture transient attacks in SCADA systems and optimized the algorithm by the finite moving average method. The inadequacy of statistical anomaly detection methods is that they are insensitive to the sequence of events and internal connections. At the same time, intruders can train detection systems to treat intrusions as normal behavior.

2.1.3. Machine Learning Based

Machine learning methods have been extensively used in data mining, speech recognition, target detection, and other fields. Researchers have tried to apply related technologies to anomaly detection in ICSs. Shang et al. [14] took the Modbus/TCP protocol as their research object and trained a one-class support vector machine (OCSVM) model using the protocol function code sequence. Furthermore, the particle swarm optimization algorithm was used to optimize the model parameters. Zhou et al. [15] comprehensively analyzed the multidomain knowledge of the field-control layer in industrial process automation and extracted multimodal data based on domain knowledge. An intelligent classifier based on the hidden Markov model (HMM) was constructed to realize the intrusion detection of industrial process data. Nader et al. [16] studied two one-class classification approaches based on support vector machines and applied them to SCADA anomaly detection. A heuristic algorithm was also proposed to optimize the machine learning parameters.

To a certain extent, intrusion detection based on machine learning can improve the accuracy of abnormal behavior detection in the industrial control environment, and it is of great significance in establishing an intelligent and efficient intrusion detection model.

2.2. Deep Learning

We were inspired by the findings of Nader et al. [16]. They proposed that a multiclass classification of anomalies can be achieved based on the completed work so that the types of attacks on a SCADA system can be directly determined. This proposal is very valuable, and we attempted to implement anomaly detection and attack classification for ICSs and also to consider algorithm performance. Deep learning methods seem to be a good choice. Compared with traditional machine learning methods, the most significant advantage of deep learning is the ability to learn features directly from the original data automatically, and it has excellent performance [17]. The successful application of deep learning methods in the field of image classification and speech recognition has fully proved this point, and there have been some related studies in the security field [18].

Javaid et al. built a network intrusion detection system using the self-taught learning method [19]. The authors used the proposed approach to perform anomaly detection on the NSL-KDD data set and further improved the classifier performance using the NB tree, random tree, and J48. The metrics for two-class and five-class problems were calculated to demonstrate the effectiveness of the self-taught learning method. Thing [20] analyzed threats that may exist in IEEE 802.11 networks and used a stacked autoencoder (SAE) to implement anomaly detection and attack classification. The author used different activation functions to enhance the classification performance of the model. The experimental results showed that the PReLU model based on the two-hidden-layer and the three-hidden-layer architecture had a relatively balanced performance in the anomaly detection and attack classification for IEEE 802.11 networks. At the same time, the SAE of the two-hidden-layer architecture was superior to that of the three-hidden-layer architecture. The proposed method reached an overall accuracy of 98.6688%, which was higher than that of the most advanced method at the time. Yan and Han [21] used an SAE model to reduce the dimensionality of the original sample data and then added sparse constraints to the model to improve its generalization ability and classification accuracy. Experiments showed that the feature extraction ability of SAE was significantly better than that of traditional machine learning methods.

Some deep learning methods have been applied to smart grids. Ashrafuzzaman et al. [22] used a feed-forward artificial neural network to detect false data injection attacks against a power grid. Wilson et al. [23] used the SAE model to detect anomalies in a power system. Wang et al. [24] used the SAE model in their research and found that a deep learning model reduced the uncertainty of electric load forecasting and thereby indirectly improved the performance of anomaly detection. Existing research shows that deep learning also has potential in industrial system security.

Existing research has shown that the deep learning method has superior performance in the field of anomaly detection and attack classification. In this study, we used these studies to promote the use of raw industrial traffic data to perform anomaly detection and attack classification tasks. We attempted to convert industrial data streams into different images and use a CNN for learning and testing. To the best of our knowledge, this is the first attempt to apply the CNN method for industrial anomaly detection and classification.

3. Problem and Solution Statement

In this section, we analyze the characteristics of industrial networks in comparison to a traditional network. These statements will provide a theoretical basis for our proposed solution. The inadequacy of anomaly detection based on the traditional machine learning method is also taken into consideration.

3.1. Correlation between Features

Industrial network traffic variables are more strongly correlated than those of general-purpose computer networks. Operating a particular variable in the industrial production process may cause a knock-on effect. Undoubtedly, the trend of changes in features caused by cyber-attacks is different from that of normal behavior. This suggests that the rational application of the correlation between industrial characteristics will assist in the detection of anomalies.

3.2. High Cost of Error

The significance of national infrastructure imposes higher requirements on the anomaly detection and classification of industrial network traffic than on those of general-purpose computer networks. Misclassification of industrial traffic may lead to catastrophic consequences. Anomaly detectors should strive for higher detection rates and response speeds to reduce associated costs. The precise classification of various attacks should be guaranteed to achieve timely analysis and identify the location of anomalies.

3.3. Imbalanced Instance Distribution

An ICS is in a stable and normal state for the majority of its operating time; therefore, it is relatively easy for researchers to obtain normal traffic samples. In reality, the probability of anomalous events is lower than that of normal events, which makes it difficult to collect attack traffic data. An imbalanced distribution may negatively impact the performance of machine learning-based anomaly detectors and classifiers. One of the traditional solutions is to synthesize minority class samples by exploiting the similarity of feature spaces, but the disadvantage of this method is overgeneralization [25]. Another option is to reduce the number of majority samples using a random deletion method; however, some important information in the majority samples will be lost. As a trade-off, feature engineering improves the model ability and addresses the imbalance problem by selecting and extracting significant features that represent the characteristics of a sample. However, the current pattern of manually selecting features is very complicated and requires researchers to have heuristic expertise.

Based on the above analysis, we propose an anomaly detection and attack classification model based on a CNN. The CNN is a popular deep learning technique and is widely regarded as one of the strongest choices in the field of image classification. The solution we propose has the following advantages.

3.4. Feature Correlation

The method we propose considers the actual situation of the industrial production process and the relationship between traffic features. We used a feature mapping method based on the Mahalanobis distance before the data input to implement the correlation measure between the dimensions of traffic instances.

3.5. High Performance

The sparse connectivity and shared weights of a CNN greatly reduce the related parameters and improve the training speed. The feature extraction layer eliminates similarities and maintains differences between various class instances. In other words, removing redundant information will be more conducive for correct detection results.
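To make the effect of weight sharing concrete, the following minimal sketch compares the parameter count of the first convolutional layer described in Section 4.3 (six 3 × 3 kernels on a 26 × 26 single-channel input) with a hypothetical fully connected layer that produces the same number of outputs. The helper names are illustrative, and the dense layer is only a point of comparison, not part of the proposed model.

```python
def conv_params(kernel, in_channels, out_channels):
    """Trainable parameters of a kernel x kernel convolution:
    one shared weight set per output channel, plus one bias each."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Trainable parameters of a fully connected layer (weights + biases)."""
    return n_inputs * n_outputs + n_outputs


# Layer sizes taken from Section 4.3: 26 x 26 x 1 input, six 24 x 24 maps out.
n_inputs = 26 * 26 * 1
n_outputs = 6 * 24 * 24

shared = conv_params(kernel=3, in_channels=1, out_channels=6)
# Hypothetical dense layer with the same input and output sizes (comparison only).
unshared = dense_params(n_inputs, n_outputs)

print(f"convolution C1: {shared} parameters")
print(f"equivalent dense layer: {unshared} parameters")
```

Under these assumptions the convolutional layer needs 60 parameters, whereas the dense equivalent needs more than two million, which is the reduction that sparse connectivity and shared weights provide.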

3.6. Feature Selection Automation

Deep neural networks can mitigate the problem of sample imbalance and reduce the workload of manually designing features because they can automatically learn from the training data and obtain new effective feature representations.

4. Methodology

Based on the discussion in Section 3, this section provides a detailed description of the relevant methods. Considering the input requirements and the full utilization of the feature correlation, we first encoded the captured industrial process traffic and mapped it into a matrix. Next, we used a CNN to learn the data features and extract deeper representations that were more conducive to recognition. Finally, anomaly detection and classification were performed by a supervised machine learning algorithm.

4.1. Feature Mapping

We propose a feature mapping method based on the Mahalanobis distance that transforms one-dimensional data into a two-dimensional matrix that can be used as CNN input. The Mahalanobis distance takes into account the relationship between features, and there is a certain correlation between the features in an ICS. We present the steps to convert an industrial process data stream into a Mahalanobis distance matrix.

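First, we defined n industrial data traffic flows as $X = \{x_1, x_2, \ldots, x_n\}$. Based on the characteristics of network traffic for an industrial environment, $x_i$ can be expressed as $x_i = [f_1^i, f_2^i, \ldots, f_m^i]^T$, where m is the number of features included in each data flow. In this study, the value of m was 26 and $f_l^i$ represents the value of the l-th feature in the i-th data stream.

To make full use of the correlation between different features in the i-th network stream feature vector, $x_i$ was converted into a matrix $X_i$ of m rows and m columns. The specific conversion method is

$$X_i = \operatorname{diag}(x_i)\, I = \begin{bmatrix} f_1^i & 0 & \cdots & 0 \\ 0 & f_2^i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f_m^i \end{bmatrix}, \tag{1}$$

where $I$ is an m-order identity matrix. Obviously, the diagonal elements of the matrix are the values of the m-dimensional features.

We can represent each column of matrix $X_i$ with an m-dimensional vector $F_j^i$:

$$F_j^i = [F_{1,j}^i, F_{2,j}^i, \ldots, F_{m,j}^i]^T, \tag{2}$$

here,

$$F_{p,j}^i = \begin{cases} f_p^i, & p = j, \\ 0, & p \neq j. \end{cases} \tag{3}$$

Therefore, $X_i$ can be expressed in m m-dimensional vectors as

$$X_i = [F_1^i, F_2^i, \ldots, F_m^i]. \tag{4}$$

Next, we calculated the covariance matrix of the matrix $X_i$ to find its inverse matrix:

$$\Sigma_i = \operatorname{Cov}(X_i), \qquad \Sigma_i^{-1} = [\operatorname{Cov}(X_i)]^{-1}. \tag{5}$$

The correlation between different features of the traffic flow feature vector is defined by the Mahalanobis distance as

$$MD_{j,k}^i = \sqrt{(F_j^i - F_k^i)^T\, \Sigma_i^{-1}\, (F_j^i - F_k^i)}, \qquad 1 \le j, k \le m. \tag{6}$$

Ultimately, the i-th flow record can be represented as a symmetric matrix with m rows and m columns whose diagonal elements are all zero:

$$\mathrm{MHD}_i = \begin{bmatrix} 0 & MD_{1,2}^i & \cdots & MD_{1,m}^i \\ MD_{2,1}^i & 0 & \cdots & MD_{2,m}^i \\ \vdots & \vdots & \ddots & \vdots \\ MD_{m,1}^i & MD_{m,2}^i & \cdots & 0 \end{bmatrix}. \tag{7}$$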
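The mapping steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mahalanobis_map` is hypothetical, the covariance is taken over the m columns of the diagonal matrix, and a Moore-Penrose pseudo-inverse is assumed in place of the ordinary inverse because the sample covariance of those columns is rank-deficient.

```python
import numpy as np


def mahalanobis_map(x, rcond=1e-10):
    """Map a 1-D feature vector to an m x m Mahalanobis distance matrix.

    Hypothetical helper following the described steps for a single flow record.
    """
    x = np.asarray(x, dtype=float)
    m = x.size
    X = np.diag(x)  # m x m matrix whose j-th column holds only feature j

    # np.cov treats rows as variables and columns as observations, so the
    # m columns of X act as m observations of an m-dimensional vector.
    cov = np.cov(X)

    # ASSUMPTION: pseudo-inverse instead of a plain inverse, because this
    # covariance matrix is singular (the centred columns sum to zero).
    cov_inv = np.linalg.pinv(cov, rcond=rcond)

    mhd = np.zeros((m, m))
    for j in range(m):
        for k in range(j + 1, m):
            d = X[:, j] - X[:, k]
            q = float(d @ cov_inv @ d)
            # Clip tiny negative values caused by round-off before the root.
            mhd[j, k] = mhd[k, j] = np.sqrt(max(q, 0.0))
    return mhd


if __name__ == "__main__":
    record = [0.5, 0.0, 2.0, 4.0]  # toy 4-feature record, not real SCADA data
    print(np.round(mahalanobis_map(record), 3))
```

The result is symmetric with a zero diagonal, as the text requires. Under this literal reading, the off-diagonal entries are determined by which features are zero rather than by their magnitudes, so the sketch illustrates the sequence of steps and should not be taken as a reproduction of the published feature maps.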

4.2. Feature Matrix Visualization

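We can treat each element in the MHD matrix after feature mapping as a pixel point; therefore, each data set instance can be transformed into a grayscale map. Before generating the grayscale map, we used the following method to normalize the MHD matrix:

$$\widehat{MD}_{j,k}^i = \frac{MD_{j,k}^i - \min(\mathrm{MHD}_i)}{\max(\mathrm{MHD}_i) - \min(\mathrm{MHD}_i)}, \tag{8}$$

where $MD_{j,k}^i$ is the element in row j and column k of the MHD matrix $\mathrm{MHD}_i$ of the i-th flow, and $\min(\mathrm{MHD}_i)$ and $\max(\mathrm{MHD}_i)$ represent the minimum and maximum values in the matrix $\mathrm{MHD}_i$, respectively. After the linear transformation shown in equation (8), the original elements were mapped to [0, 1].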
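As a small worked example of this normalization, the sketch below rescales a matrix to [0, 1] and then to 8-bit gray levels. The function names are hypothetical, the 0 to 255 scaling is an assumption (the text only states that each pixel occupies one byte), and the guard for a constant matrix is added here to avoid division by zero.

```python
def normalize_to_unit(matrix):
    """Min-max normalize a 2-D list of numbers into the range [0, 1]."""
    flat = [value for row in matrix for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # ASSUMPTION: a constant matrix maps to all zeros (not in the paper).
        return [[0.0 for _ in row] for row in matrix]
    return [[(value - lo) / (hi - lo) for value in row] for row in matrix]


def to_gray_levels(unit_matrix):
    """Scale values in [0, 1] to integer gray levels in [0, 255] (assumed)."""
    return [[int(round(value * 255)) for value in row] for row in unit_matrix]


if __name__ == "__main__":
    toy = [[0.0, 2.0, 4.0],
           [2.0, 0.0, 1.0],
           [4.0, 1.0, 0.0]]  # toy symmetric matrix, not real data
    unit = normalize_to_unit(toy)
    print(unit)
    print(to_gray_levels(unit))
```

The smallest element always maps to 0 and the largest to 1, so every flow uses the full gray range regardless of the absolute scale of its distances.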

The size of the grayscale maps generated by the data set used in this study was 676 (26 × 26) bytes. The visualized images of each category are shown in Figure 1, and the descriptions of the categories are provided in Table 1.

The observable differences between the various types of SCADA traffic suggest that the classes can be distinguished from their images, which supports the feasibility of the anomaly detection method.

4.3. Convolutional Neural Network

A CNN is a multilayer neural network developed from a traditional neural network. It essentially learns a deep nonlinear network structure, realizes complex function approximation, and represents the input-output mapping relationship [26]. At the same time, it learns the basic characteristics of a data set from a small sample set. The CNN consists of convolutional layers, subsampling layers, and fully connected layers. The common CNN architecture is primarily a combination of convolutional layers and subsampling layers. The fully connected layers are the upper layers, the input is the features extracted by the lower layers, and the output is the classification result.

Figure 2 depicts the CNN architecture used in this study. Compared with other classical architectures such as AlexNet, VGGNet, and GoogleNet, the model architecture and input size of LeNet-5 were more suitable for our requirement. We adjusted the CNN architecture of LeNet-5 appropriately in two aspects. First, the input layer of the network was designed to be 26 × 26 pixels according to the grayscale image size obtained by the feature mapping. Second, the number of nodes in the output layer was changed.

The CNN takes the original image of size 26 × 26 × 1 as input, and the convolution layer C1 performs convolution operations with six 3 × 3 kernels to obtain six 24 × 24 feature maps. The S2 layer takes the output of the C1 layer as an input and uses 2 × 2 windows to pool the six feature maps, obtaining six 12 × 12 feature maps. The kernel size of C3 was the same as that of C1, but we used the pretrained 16 channels for convolution operations; therefore, the result was 16 feature maps of size 10 × 10. The subsampling S4 layer pooled the 16 inputs using 2 × 2 receptive fields, and the result was 16 feature maps of size 5 × 5. The last two layers were fully connected layers. The number of nodes followed LeNet-5, and there were eight final output nodes. The CNN architecture used the Softmax function to implement multiclassification, dropout to mitigate overfitting of the model, and the nonlinear activation function ReLU to introduce sparsity into the neural network.
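The feature-map sizes quoted above can be checked with a short calculation. The sketch below assumes stride-1 convolutions without padding and non-overlapping 2 × 2 pooling, which is what the stated sizes imply; the function names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a square convolution (assumed stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Output width of non-overlapping pooling with a square window."""
    return size // window


def trace_shapes(input_size=26):
    """Return (layer, width, channels) for the layers described in the text."""
    shapes = [("input", input_size, 1)]
    size = conv_out(input_size, kernel=3)
    shapes.append(("C1", size, 6))
    size = pool_out(size, window=2)
    shapes.append(("S2", size, 6))
    size = conv_out(size, kernel=3)
    shapes.append(("C3", size, 16))
    size = pool_out(size, window=2)
    shapes.append(("S4", size, 16))
    return shapes


if __name__ == "__main__":
    for name, size, channels in trace_shapes():
        print(f"{name}: {channels} x {size} x {size}")
    _, size, channels = trace_shapes()[-1]
    print("flattened input to the fully connected layers:", size * size * channels)
```

With these assumptions the trace reproduces the 24, 12, 10, and 5 pixel widths given in the text, and the S4 output flattens to 400 values before the fully connected layers.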

5. Evaluation

5.1. Data Set

The most well-known intrusion detection data set is the KDD data set [27], which is derived from the DARPA data set and contains nine weeks of network connection data collected on a simulated US Air Force LAN. NSL-KDD is an improved version of the KDD data set that enhances the performance of the classification method [28]. However, KDD and NSL-KDD are intrusion detection data sets for traditional networks. The data set used in this study consisted of real data collected from the SCADA system of a natural gas pipeline test platform designed and developed by Mississippi State University [29]. The platform included a master terminal unit, a remote terminal unit, and a human-machine interface. A proportional-integral-derivative (PID) controller was used to maintain the stability of the air pressure in the pipeline. The natural gas pipeline data set had a total of 27 features grouped into two classes, network traffic features and payload content features, together with a label.

The network traffic features were used to indicate the communication status of the SCADA system and include the device address of the request/response packet and the location and byte size of the request/response memory in the packet. The time feature was used to record the time interval between the request and response in the SCADA communication. There was not much difference between normal data instances. Similarly, when the SCADA system was in a normal communication state, the value of the command/response CRC error rate feature was small, and the value of this feature may increase when the system is attacked.

The payload content characteristics changed in different SCADA systems because of the variety of measurement variables and control methods. In the gas pipeline data set used in this study, the payload features included response/command function codes, current measured values/initial values of gas pressure, and system control modes. In addition to these parameters, PID-related attributes were also considered, and adjustments to related parameters could change the behavior of the PID controller.

Finally, the label feature was used to determine whether the data instance was normal behavior or a certain type of attack. The specific categories of natural gas pipeline data set are shown in Table 1. The data set contained eight types of labels. Table 2 shows the distribution of these data instances.

5.2. Experiment Description

We used TensorFlow, developed by Google, to implement our CNN model. The goal of the proposed model was to achieve SCADA traffic classification. Eighty percent of the data set was used for training the model, and the other 20% was used for testing. The batch size was 50, and the learning rate was set to 0.001. The CNN was iteratively trained until its loss function converged, which took 20 epochs. The detailed process of the experiment is shown in Figure 3.
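The data split and training schedule described above can be sketched as follows. This is an illustration only: the shuffled split, the fixed seed, the helper names, and the sample count of 1,000 are assumptions made for the example, since the paragraph does not state how the 80/20 split was drawn.

```python
import math
import random

BATCH_SIZE = 50        # from the text
LEARNING_RATE = 0.001  # from the text
EPOCHS = 20            # from the text


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and test parts.

    ASSUMPTION: a random shuffled split; the paper only gives the 80/20 ratio.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def steps_per_epoch(n_train, batch_size=BATCH_SIZE):
    """Number of mini-batches needed to see every training sample once."""
    return math.ceil(n_train / batch_size)


if __name__ == "__main__":
    n_samples = 1000  # hypothetical placeholder, not the real data set size
    train_idx, test_idx = split_indices(n_samples)
    steps = steps_per_epoch(len(train_idx))
    print(len(train_idx), "training /", len(test_idx), "test instances")
    print(steps, "steps per epoch,", steps * EPOCHS, "steps in total")
```

For an imbalanced data set such as this one, a stratified split that preserves the class proportions in both parts may be preferable; that variation is not shown here.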

The experimental process was executed as follows:
Step 1 (collecting raw data and processing it): the SCADA data set is divided into a training data set and a test data set, and feature mapping is performed.
Step 2 (initializing CNN): construct a CNN with reference to the architecture shown in Figure 2.
Step 3 (training the model): the training data are input into the model, and the weight coefficients of each layer of the CNN are determined through training.
Step 4 (testing the model): the test data are entered into a CNN to classify the SCADA traffic and determine if each metric is above a threshold. If it is larger than the lower limit, fine-tune the parameters and seek the optimal result; otherwise, directly adjust the parameters and repeat Step 3.

Experiments were conducted on a machine with a 3.2 GHz i5 CPU and 8 GB of RAM. Each new instance spent approximately 0.253 s in the feature mapping process and 0.192 s in the detection process. The total of approximately 0.445 s per instance is shorter than the 1 s sampling period, so our method can operate in real time in a SCADA system with a sampling rate of 1 Hz.
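The real-time argument is a simple budget check: the per-instance processing time must not exceed the sampling period. The sketch below makes that arithmetic explicit using the two timings reported above; the function name is illustrative, and the 5 Hz case is a hypothetical comparison that is not taken from the experiments.

```python
FEATURE_MAPPING_S = 0.253  # reported time per instance for feature mapping
DETECTION_S = 0.192        # reported time per instance for detection


def fits_sampling_period(sampling_rate_hz, *stage_seconds):
    """Return (total processing time, sampling period, whether it fits)."""
    period = 1.0 / sampling_rate_hz
    total = sum(stage_seconds)
    return total, period, total <= period


if __name__ == "__main__":
    for rate in (1.0, 5.0):  # 5 Hz is a hypothetical faster system
        total, period, ok = fits_sampling_period(rate, FEATURE_MAPPING_S, DETECTION_S)
        print(f"{rate} Hz: {total:.3f} s needed, {period:.3f} s available, fits={ok}")
```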

5.3. Evaluation Metrics

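Two application scenarios of this method were taken into account. One case was binary classification, that is, a simple distinction between normal traffic and abnormal traffic. The other case was multiclassification, in which the instance was directly tagged with a specific label. The following metrics were used to evaluate the performance of the classifier:

$$A = \frac{TP + TN}{TP + TN + FP + FN}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R},$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.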

Accuracy (A) represents the overall performance of the classifier. In addition, the distribution in Table 2 reflects an unbalanced sample size; therefore, we calculated the precision (P) and recall (R) of each type of instance to ensure that the results were not distorted by the large number of normal samples. The F1 value is the harmonic mean of P and R and balances the two.
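For the multiclass case, these metrics are computed per class by treating each label in turn as the positive class. The following sketch uses the standard definitions of accuracy, precision, recall, and F1; the function name and the toy labels are illustrative and do not reproduce the reported results.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and {label: (precision, recall, f1)}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # missed an instance of the true class
    accuracy = sum(tp.values()) / len(y_true)

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return accuracy, metrics


if __name__ == "__main__":
    # Toy labels only; class names follow the categories used in the paper.
    y_true = ["Normal", "Normal", "Normal", "DoS", "DoS", "Recon"]
    y_pred = ["Normal", "Normal", "DoS", "DoS", "DoS", "Recon"]
    accuracy, metrics = per_class_metrics(y_true, y_pred)
    print(f"accuracy = {accuracy:.3f}")
    for label, (p, r, f1) in metrics.items():
        print(f"{label}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

In the toy example a single misclassified Normal instance lowers the recall of Normal and the precision of DoS while the Recon scores stay at 1, which shows why per-class values reveal errors that overall accuracy hides.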

5.4. CNN Design

We designed the model according to the basic hierarchical structure of LeNet-5 to analyze the influence of different architectures on the experimental results. First, considering the small input size, the sampling window of the subsampling layer was set as 2 × 2 to avoid information loss. Next, to ensure a fixed center point and sensitivity to edges in the convolution process, only convolution kernels with odd sizes were used. Finally, when the size of a convolution kernel was too large, it was easy to overfit, which is not conducive to subsequent training and testing. Therefore, the size of the convolution kernel in the designed network structure was no more than 5 × 5. The four architectures and their performance on data sets are shown in Table 3.

As shown in Table 3, the performance of the first structure was the best, with accuracy and loss values of 99.32% and 0.025, respectively. The results of the second and third architectures were roughly the same, suggesting that the order of the kernels had little effect. The results of the fourth structure were the least ideal, suggesting that a large convolution kernel size is a poor choice for this input.

5.5. Experiment Results

The metrics of the binary classification and the multiclassification were calculated. The accuracy of the two scenarios is shown in Table 4. The other three metrics of the binary classification and the multiclassification are shown in Tables 5 and 6, respectively.

The results in Table 4 show that our method satisfied the application requirements of binary classification and multiclassification (the overall accuracy of binary classification and multiclassification was 99.46% and 99.32%, respectively), and the average accuracy of the classifiers was 99.39%. Table 5 indicates that our method was good at solving the two-class problem; the lowest indicator still reached 98.76%. Our approach also performed well in the eight-class problem. Table 6 shows the precision, recall, and F1 values of the multiclassifier, where the reconnaissance attack achieved the optimal value (recall, precision, and F1 values were 100%). The NMRI and MSCI indicators were slightly lower (93%–98%), and we will further explore in the future whether this is owing to certain characteristics of the two types of data.

5.6. Comparison

Before comparing our method with other machine learning methods, we conducted a lateral comparison experiment. The reduced (10%) gas pipeline data set, also published by Mississippi State University, was used for model training and testing. The performance on the two data sets for different categories is shown in Table 7.

We found that the performance of our method on the 10% data set was no better than on the complete set in terms of both recall and precision. Only three categories (Normal, CMRI, and Recon) scored higher than 90%, and only the Recon results were at the same level as the original performance. This suggests that data set size does affect the performance of deep learning methods. In the long run, however, this is not a serious problem because the era of industrial big data has arrived.

The performance comparison with other methods is divided into two parts, accuracy performance and time occupancy performance. Due to the different evaluation metrics used in different references, it is difficult to perform a full performance comparison for the same method. Considering that the time performance of similar methods should be similar, we compare the accuracy performance and time occupancy performance of different methods using the same evaluation metrics.

We compared the accuracy performance of our method with the following three methods: the SVM ensemble method derived from the study of Nader et al. [16] that used multiple support vector machine combinations to implement the construction of a multiclassifier, and the HMM [30] and decision tree method [31] which are classical machine learning algorithms.

Table 8 shows the overall accuracy of the four multiclassification methods. The performances of the traditional machine learning methods were similar. The accuracies of the SVM ensemble method, the HMM method, and the decision tree method were 94.5%, 93.4%, and 93.1%, respectively. In contrast, our proposed anomaly classification method based on the CNN achieved the highest overall accuracy of 99.3%.

Figures 4 and 5 depict the performance of different methods in recall and precision, respectively. As can be seen from the recall analysis, our method obtained the highest score in five categories (Normal, CMRI, MSCI, MPCI, and Recon), where instances of the normal state and the reconnaissance attack were nearly perfectly classified. In the case of the DoS attack, our method had a recall of 96%, which was slightly lower than that of the HMM and the SVM ensemble method. The precision obtained through our method stayed above 98% for most types of traffic, except for the MSCI attack with a detection precision of 94.9%. Note that this value was only 2.5% lower than that of the SVM ensemble method.

In addition, we found that the metric values were zero in some situations. The HMM performed poorly at detecting both the CMRI attack and the MSCI attack, which the authors believe was owing to flaws in the training set [26]. Similarly, the SVM ensemble method did not detect an NMRI attack; both its recall and precision were zero. In contrast to these methods, our method successfully identified all kinds of attacks. In the detection of an NMRI attack, our method achieved 93.8% recall and 99.7% precision. Our method also performed well on the CMRI attack and the MSCI attack, which were not detected by the HMM. Overall, the fluctuation range of our deep learning model's indicators was smaller than that of the other methods, and the recall and precision were concentrated in the ranges 93%–100% and 94%–100%, respectively. This indicates that our proposed classifier had a more balanced performance.

We compared the time occupancy performance of our method with the following methods [32]: the self-adjusting memory (Sam) model for the k-nearest neighbor (k-NN) algorithm (SAM-kNN), the primal estimated subgradient solver for support vector machines (SVM) algorithm (Pegasos), the adaptive random forests (ARF) algorithm, and the evolving spiking restricted Boltzmann machines algorithm (e-SREBOM). The comparison results are shown in Table 9.

The comparison shows that the CNN operation consumes more time, but the difference is not large. The Kappa statistic of the e-SREBOM is 74.31, which makes it the most accurate of the comparison methods. Considering that GPUs can greatly improve the computational efficiency of CNNs and that computing power has improved continuously in recent years, our method still has advantages.

6. Conclusions and Future Work

ICSs are the lifeblood of countries, and it is necessary to implement anomaly detection to ensure their security. In this study, we proposed a method based on deep learning to achieve anomaly detection and attack classification for SCADA systems. Considering the characteristics of the relationship between the various features of ICSs, a feature mapping method based on the Mahalanobis distance was proposed. Our feature mapping method converted one-dimensional flow data into a two-dimensional matrix to be used as a CNN input.

The experimental results show that the proposed method achieved excellent performance on both the two-class problem and the multiclass problem, met the expected requirements of SCADA anomaly detection and attack classification, and provided assistance for the safety of an ICS.

At present, the method we proposed cannot detect new types of attacks, but it can theoretically detect the corresponding variant attacks by learning the basic knowledge and principles of existing attacks. In the future, we will design attacks according to the characteristics of SCADA scenarios and prove the effectiveness of our CNN method. At the same time, we will evaluate the possibility of using other deep learning methods in the SCADA anomaly detection and attack classification to achieve better industrial security defense.

Data Availability

The data used to support the findings of this study are included within the article. The original data set we used in the experiments is the industrial control system (ICS) cyber-attack data set, which was published by Tommy Morris and Wei Gao from Mississippi State University. All data can be accessed from https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Qinghai Province Natural Science Foundation (2017-ZJ-91), the National Natural Science Foundation of China (61872015), the Foundation of Science and Technology on Information Assurance Laboratory (No. 614211204031117), and the Beijing Polytechnic Research Fund (2017Z004-008-KXZ and 2018Z002-019-KXZ).