Abstract

Industrial control systems (ICSs) are closely related to human life. In recent years, many ICSs have been connected to the Internet rather than being physically isolated, which has improved business efficiency while also increasing the risk of attack. The security issues of ICSs have received considerable interest in the research community because attacks that target ICSs can cause catastrophic damage. An intrusion detection system (IDS) serves as an important protection tool, and many IDS studies using machine learning and deep learning have been proposed. However, high-dimensional data may cause overfitting, resulting in inferior performance. To improve classification performance, we propose a dimension reduction technique based on the supervised autoencoder (SupervisedAE) and principal component analysis (PCA). Unlike the conventional autoencoder, the SupervisedAE absorbs label information during training to obtain more discriminative latent representations; the improved autoencoder model is trained with reconstruction error and classification error simultaneously. We then apply the PCA algorithm to the latent representations extracted from the SupervisedAE to reduce the feature dimension further. We conduct a series of experiments on a public power system data set to evaluate the performance of the suggested technique. Compared with various dimension reduction methods, including autoencoder variants, the proposed technique shows higher performance. Meanwhile, it outperforms several existing detection methods in terms of accuracy and F1 score.

1. Introduction

Industrial control systems (ICSs) are widely used in industries to fulfill certain industrial goals, such as manufacturing, material transportation, or energy transportation [1]. Since ICSs became connected to the Internet and exposed to numerous attacks, their security problems have been studied for years. ICSs differ from standard IT systems in that an attack on them poses a major danger to human health and safety, as well as financial loss [1]. Many ICS attacks have occurred in recent years [2], including Stuxnet [3], BlackEnergy [4], and Industroyer [5].

The intrusion detection system (IDS) has received a lot of attention as a strong tool for providing protection [6–10]. In general, an IDS examines data records from the network or the host to determine whether operations are safe, raising alarms for malicious ones. The operator then makes decisions based on the examination results. IDSs based on machine learning and deep learning [11–14] have been thoroughly investigated in recent years. However, overfitting caused by high-dimensional data still poses a challenge for IDS performance. The authors of [15] point out that dimension reduction techniques, in addition to classifier algorithms, are also important in IDS research. Many works have been proposed for the dimension reduction of IDS. To build an efficient IDS, principal component analysis (PCA) and linear discriminant analysis (LDA) have been used [16, 17]. A sparse autoencoder [18] has been implemented to reduce the dimension of input features for a support vector machine (SVM) based IDS.

In this study, we suggest a dimension reduction technique that combines a supervised autoencoder (SupervisedAE) with the PCA algorithm for IDS to overcome the challenge of high-dimensional data. Compared with the conventional autoencoder, the SupervisedAE adds label information at training time to obtain more discriminative latent representations; the improved autoencoder model is trained with reconstruction error and classification error. To decrease the dimension further, the PCA algorithm is applied to the latent representations extracted from the SupervisedAE. This research makes the following contributions:

(i) We employ an improved autoencoder, called SupervisedAE, for the dimension reduction of IDS. The model adds a softmax layer connected to the output of the encoder. With the joint loss function (reconstruction error plus classification error), the SupervisedAE learns more discriminative latent representations. Experimental results demonstrate that the SupervisedAE outperforms other autoencoders.
(ii) To reduce the dimension further, we apply the PCA algorithm to the latent representations extracted from the SupervisedAE. We use a nested cross-validation procedure to select the optimal number of PCA components in the experiments. The combined technique shows higher performance than several other dimension reduction methods.
(iii) We conduct a series of experiments on a power system data set with four distinct classifiers. With the suggested dimension reduction technique, these classifiers achieve higher performance than with the original features. In particular, the K-nearest neighbors (KNN) classifier achieves the best results, which exceed those of other existing detection methods.

The remainder of the paper is laid out as follows. Section 2 presents related works, with a thorough examination of works involving dimension reduction. Section 3 gives a detailed introduction to the SupervisedAE and the whole IDS framework. Section 4 evaluates the performance of our method on the power system data set and compares the results with other methods. Finally, Section 5 draws the corresponding conclusions and points out future work.

2. Related Work

ICSs have been connected to the Internet in recent years, which increases the danger of being attacked. The security issues of ICSs have received a lot of attention [19, 20]. IDS is one of four effective categories of security solutions [21] for monitoring activities and ensuring regular processes, the other three being authentication solutions, privacy-preserving solutions, and key management systems.

Many works about IDS have been proposed in the research field. An SVM-based IDS is built using the features of ICS communication to categorize regular and abnormal packets [6]. The authors of [7] presented an IDS model based on a random forest with an adaptive boosting technique to increase the detection rate. An IDS based on a bidirectional simple recurrent unit is proposed in [8]; with the help of skip connections, the model improves training effectiveness. To build an effective intrusion detection model, a hybrid deep belief network (DBN) is developed [22], which improves accuracy over previous DBN approaches. Two detection methods based on the random subspace method were proposed in [9, 10]; these two methods are used as baselines for comparison with ours.

The authors of [15] introduce some key points related to IDS, in particular works that involve feature engineering. Because of the curse of dimensionality, high-dimensional data may induce overfitting in trained models, as well as require more memory and computational cost. Reducing the dimension of features is therefore a significant task. There are two approaches for removing irrelevant features and improving model performance: feature selection and feature extraction. Feature selection methods [23, 24] select a subset of the original features, whereas feature extraction maps the original features into a low-dimensional feature space. The authors of [25] proposed a model that combines oversampling and feature selection, employing gradient penalty Wasserstein generative adversarial networks to generate additional attack samples and the ANOVA approach to choose a feature subset; the combined model shows better performance. An artificial neural network classifier is used to create the IDS in [26], with feature reduction implemented by ranking information gain and correlation; the results are promising. A correlation-based feature selection strategy was proposed to eliminate irrelevant features and improve detection performance [27], with SVM, multilayer perceptron (MLP), and KNN used to identify attacks on the new feature subset. The KNN technique delivers the best results compared to those produced using the original features. We compare performance with these methods.

As a feature extraction method, PCA has been widely used in the field of intrusion detection [28–30]. A hybrid approach combining information gain and PCA was proposed in [29], where an ensemble classifier based on SVM, an instance-based learning algorithm, and MLP detects attacks after dimension reduction; the model achieved encouraging performance. The autoencoder, a type of neural network, is also used to reduce the dimension of features [31].

Various works have shown that dimension reduction methods can improve the performance of IDS. In this study, we focus on the feature extraction approach and improve the autoencoder model by absorbing label information during training.

3. Methods

We introduce the overall framework of our suggested technique in this section. To begin, we go through the theory of autoencoders, covering the conventional autoencoder and the sparse autoencoder; in particular, the SupervisedAE is discussed. Then, we explain the PCA algorithm applied to the latent representations extracted from the SupervisedAE. Finally, the entire framework of the IDS is presented.

3.1. Autoencoder
3.1.1. Basic Autoencoder

Autoencoder (AE) is an unsupervised neural network [32]. Typically, the autoencoder is employed to reduce the dimension of features. The fundamental concept of the autoencoder is to rebuild the input.

As shown in Figure 1, the autoencoder is separated into two parts: an encoder and a decoder. The encoder converts the input into the latent representation, while the decoder reconstructs the input from the compressed latent representation. Consider the set of input data $X = \{x_1, x_2, \ldots, x_n\}$, where $n$ is the number of samples in the data set. The encoder function $f$ transforms the input data $x_i$ into the latent representation $z_i$. Then, using the decoder function $g$, the reconstructed input $\hat{x}_i$ is obtained as follows:

$$z_i = f(x_i), \qquad \hat{x}_i = g(z_i) \qquad (1)$$

The goal of training the autoencoder is to find parameters of the encoder and decoder that minimize the reconstruction error. In this study, we use the mean squared error to calculate it. The corresponding loss function is as follows:

$$L_{rec} = \frac{1}{n} \sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2 \qquad (2)$$

The classic autoencoder is a three-layer neural network with only one hidden layer; the other two layers are the input and output layers. The number of neurons in the hidden layer is usually smaller than that in the input layer, which prevents the network from simply copying the input to the output. The architecture can be expanded to include more hidden layers, with the number of neurons gradually decreasing through the encoder and generally mirrored symmetrically in the decoder. In this way, the autoencoder learns compressed latent representations. After training, the decoder is usually discarded, and the latent representations are used for subsequent classification or other operations.
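For illustration, a minimal Keras sketch of such an autoencoder might look as follows; the 128-dimensional input matches the data set used later, while the 32-unit bottleneck is an assumed value.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal autoencoder sketch: one hidden (latent) layer between input and output.
inputs = keras.Input(shape=(128,))                          # input layer
latent = layers.Dense(32, activation='relu')(inputs)        # encoder f: x -> z
outputs = layers.Dense(128, activation='sigmoid')(latent)   # decoder g: z -> x_hat
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')           # reconstruction error (2)

# The autoencoder is trained to reproduce its own input:
# autoencoder.fit(x, x, epochs=..., batch_size=...)
# After training, the decoder is discarded and the encoder supplies latent features:
encoder = keras.Model(inputs, latent)
```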

3.1.2. Sparse Autoencoder

To extract better feature representations, several autoencoder variants impose additional constraints. The sparse autoencoder (SparseAE) [33] adds a sparsity penalty term on the hidden layer. The corresponding loss function is given by

$$L_{sparse} = L_{rec} + \beta \sum_{j=1}^{m} KL(\rho \,\|\, \hat{\rho}_j) \qquad (3)$$

Compared with the conventional autoencoder, the loss function of SparseAE adds a sparsity penalty term $\sum_{j=1}^{m} KL(\rho \,\|\, \hat{\rho}_j)$, where $m$ is the number of neurons in the hidden layer, $\rho$ is a predefined sparsity parameter, and $\hat{\rho}_j$ denotes the average activation of hidden unit $j$. The $KL(\cdot)$ function is the Kullback–Leibler divergence, which measures the divergence between $\rho$ and $\hat{\rho}_j$. The coefficient $\beta$ in (3) controls the weight of the penalty.
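As a sketch of how this penalty can be realized, the following hypothetical Keras regularizer estimates the KL term of (3) from the batch-average activation of a sigmoid hidden layer; the class name and parameter values are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

class KLSparsity(regularizers.Regularizer):
    """KL-divergence sparsity penalty of (3), estimated per mini-batch."""
    def __init__(self, rho=0.05, beta=1.0):
        self.rho, self.beta = rho, beta  # target sparsity and penalty weight

    def __call__(self, activations):
        # Average activation of each hidden unit over the batch; assumes
        # activations lie in (0, 1), e.g., from a sigmoid layer.
        rho_hat = tf.reduce_mean(activations, axis=0)
        kl = (self.rho * tf.math.log(self.rho / (rho_hat + 1e-10))
              + (1 - self.rho) * tf.math.log((1 - self.rho) / (1 - rho_hat + 1e-10)))
        return self.beta * tf.reduce_sum(kl)

# Attach the penalty to the hidden layer of an autoencoder:
hidden = layers.Dense(32, activation='sigmoid',
                      activity_regularizer=KLSparsity(rho=0.05, beta=1.0))
```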

3.1.3. Supervised Autoencoder

In this study, we utilize a supervised classification model for the development of the IDS; it must be trained on a labeled data set with normal and attack samples before practical use. As stated before, the classic autoencoder reduces the feature dimension in an unsupervised manner, without labels, seeking only to minimize the reconstruction error. The IDS model is then trained on the compressed latent representations instead of the original features.

To improve the conventional autoencoder and obtain more discriminative latent representations for classification, a supervised autoencoder model was proposed [34, 35]. The model adds the class label during the training process of the autoencoder. Figure 2 depicts the architecture of the SupervisedAE used in this study; compared with the autoencoder shown in Figure 1, it adds a softmax layer connected to the latent layer. The label information is processed by the softmax layer, whose number of neurons equals the number of classes. In this way, the SupervisedAE is trained with reconstruction error and classification error simultaneously. The implementation of the SupervisedAE employed in this study is described below.

Softmax is a function used by neural networks to perform classification tasks, in which cross-entropy measures the classification error. Consider a collection of $K$ classes $C = \{c_1, c_2, \ldots, c_K\}$ with labels $y_i \in C$. For input $x_i$, the softmax function outputs a probability $p_k$ for class $c_k$ as demonstrated in the following equation:

$$p_k = \frac{e^{a_k}}{\sum_{j=1}^{K} e^{a_j}} \qquad (4)$$

where $a_k$ is the output of the fully connected layer attached to the latent layer.

With the corresponding probabilities, the classification loss is calculated by

$$L_{cls} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{1}(y_i = c_k) \log p_k \qquad (5)$$

where $\mathbb{1}(\cdot)$ is an indicator function that outputs 1 if the condition is satisfied and 0 otherwise. The following equation displays the joint loss function $L$:

$$L = L_{rec} + \alpha L_{cls} \qquad (6)$$

The trade-off between reconstruction error and classification error is controlled by the hyperparameter $\alpha$.
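For concreteness, a minimal NumPy illustration of (4) and (5) for a single sample follows; the logit values are hypothetical.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())    # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])  # hypothetical outputs a_k for K = 3 classes
p = softmax(logits)                  # probabilities p_k in (4), summing to 1

y_true = 0                           # index of the true class
loss = -np.log(p[y_true])            # this sample's cross-entropy term in (5)
```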

In this section, we introduce the SupervisedAE model. The model is trained using a joint loss function combining reconstruction error and classification error, with the additional softmax layer supplying the classification error. After training the improved autoencoder, the decoder and softmax layer are discarded when testing new samples. The resulting latent representations are more discriminative than those obtained by the conventional autoencoder.
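A minimal Keras sketch of the SupervisedAE described above is given below. It assumes min-max-scaled inputs in [0, 1] and realizes the joint loss (6) through Keras loss weights; the 96-unit hidden layers are illustrative, and the helper name build_supervised_ae is our own.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_supervised_ae(input_dim, latent_dim, n_classes, alpha=1.0):
    inputs = keras.Input(shape=(input_dim,))
    # Encoder (illustrative layer sizes)
    h = layers.Dense(96, activation='relu')(inputs)
    latent = layers.Dense(latent_dim, activation='relu', name='latent')(h)
    # Decoder mirrors the encoder
    h_dec = layers.Dense(96, activation='relu')(latent)
    recon = layers.Dense(input_dim, activation='sigmoid', name='recon')(h_dec)
    # Softmax layer connected to the latent layer
    probs = layers.Dense(n_classes, activation='softmax', name='cls')(latent)
    model = keras.Model(inputs, [recon, probs])
    # Joint loss (6): reconstruction MSE plus alpha times cross-entropy
    model.compile(optimizer='adam',
                  loss={'recon': 'mse', 'cls': 'sparse_categorical_crossentropy'},
                  loss_weights={'recon': 1.0, 'cls': alpha})
    return model

# Training uses the input itself and the integer class labels as targets:
# model.fit(x, {'recon': x, 'cls': y}, epochs=..., batch_size=...)
# Afterwards, only the encoder is kept:
# encoder = keras.Model(model.input, model.get_layer('latent').output)
```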

3.2. PCA Algorithm

The PCA algorithm identifies the principal components of a data set that account for the largest amount of variance [36] and has been widely used as a feature extraction method. The SupervisedAE introduced above can reduce the dimension of features to any predefined value. Motivated by previous work that combines the sparse autoencoder and PCA [37], we employ the PCA algorithm to reduce the dimension of the latent representations further. In this way, we can set the hidden layer sizes of the SupervisedAE to reasonable values and use the nested cross-validation procedure to choose the best number of PCA components. The final reduced features output by PCA are used to train various classifiers.

The steps for extracting principal components are outlined below. Considering the latent representations extracted from the SupervisedAE, the mean value of each dimension is calculated first and subtracted. The covariance matrix is then computed, and its eigenvectors and eigenvalues are derived. The eigenvectors are sorted by their eigenvalues, and the top ones, according to the predefined number of components, are combined into a feature matrix. Finally, the latent representations are transformed using the feature matrix. Readers can refer to [38] for further details on the PCA calculation.
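A small NumPy sketch of these steps, applied to a latent matrix z of shape (n_samples, d), might look as follows.

```python
import numpy as np

def pca_reduce(z, k):
    """Project latent representations z onto the top-k principal components,
    following the steps outlined above."""
    z_centered = z - z.mean(axis=0)          # remove the mean of each dimension
    cov = np.cov(z_centered, rowvar=False)   # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenpairs (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1]        # sort eigenvectors by eigenvalue
    w = eigvecs[:, order[:k]]                # feature matrix of top-k eigenvectors
    return z_centered @ w                    # transform the latent representations
```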

3.3. Framework

With the SupervisedAE and the PCA algorithm introduced, the basic elements of our proposed IDS model are in place. This section presents the entire working procedure.

Figure 3 displays the whole framework. There are two phases: training and testing. The data set is divided into a training set and a testing set, and the training set will be utilized to train the overall model. The model has three parts: normalization module, dimension reduction module, and classifier module. In the training process, we first train the dimension reduction module. The detailed process is shown in Algorithm 1. All features are scaled using the min-max normalization. After that, we use the scaled training set and corresponding label information to train the SupervisedAE model as shown in lines 2–6. Then, the PCA algorithm is trained on the latent representations extracted from the SupervisedAE. Finally, the training set whose features have been reduced by the reduction module is used to train the various classifiers.

Data: Training features $x_i$ with labels $y_i$; hyperparameter $\alpha$; number of iterations $t$.
Result: Parameters of the dimension reduction module and the reduced features.
// Step 1: Preprocess the training data set
(1) Normalize data $x_i$ with min-max normalization: $x_i' = (x_i - x_{\min}) / (x_{\max} - x_{\min})$
// Step 2: Train SupervisedAE with the normalized data set
(2) while not converged do
(3)    Forward propagate $x_i'$ through the encoder, decoder, and softmax layer
(4)    Compute the joint loss $L$ by (6)
(5)    Train SupervisedAE using the joint loss and update the parameters
(6) end
// Step 3: Compute the latent representations $z_i$ with the encoder function
(7) $z_i = f(x_i')$
// Step 4: Reduce the dimension of the latent representations $z$ by PCA
(8) $z' = \mathrm{PCA}(z)$

In the testing phase, all of the trained modules are merged into the IDS model, and the integrated model predicts the class labels for the testing set.
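To make the workflow concrete, a hypothetical end-to-end sketch of the training and testing phases follows, reusing the build_supervised_ae sketch from Section 3.1.3; the variable names (x_train, y_train, x_test, n_classes) and hyperparameter values are assumptions.

```python
from tensorflow import keras
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Training phase: normalization -> SupervisedAE -> PCA -> classifier
scaler = MinMaxScaler().fit(x_train)
x_tr = scaler.transform(x_train)

sae = build_supervised_ae(input_dim=x_tr.shape[1], latent_dim=64,
                          n_classes=n_classes, alpha=1.0)
sae.fit(x_tr, {'recon': x_tr, 'cls': y_train}, epochs=100, batch_size=256, verbose=0)
encoder = keras.Model(sae.input, sae.get_layer('latent').output)

z_tr = encoder.predict(x_tr)
pca = PCA(n_components=10).fit(z_tr)   # component count chosen by nested CV in Section 4
clf = KNeighborsClassifier().fit(pca.transform(z_tr), y_train)

# Testing phase: the trained modules are merged into one IDS pipeline
x_te = scaler.transform(x_test)
y_pred = clf.predict(pca.transform(encoder.predict(x_te)))
```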

4. Evaluation

In this section, we display the performance of our proposed dimension reduction technique conducted on a power system data set. First, we introduce the data set used in the experiments. Then, the metrics used to evaluate the performance are described. Finally, the experiment results are presented and discussed.

4.1. Data Set

To evaluate the performance, we use the power system attack data set [39] created by Mississippi State University and Oak Ridge National Laboratory. The data set was generated by the configuration shown in Figure 4. The data set mainly includes measurements from each phasor measurement unit and data log from Snort, a simulation control panel, and relays.

There are two power generators in the configuration, G1 and G2, and four breakers (BR1 to BR4) that can be opened or closed by intelligent electronic devices (IEDs, R1 to R4). Each sample in the data set has 128 features and 1 label. Each phasor measurement unit (PMU) provides 29 types of measurements, accounting for 116 attributes in total. Table 1 lists the corresponding features and their descriptions; these features describe the working state of the PMUs. In addition, there are 12 features covering control panel logs, Snort alerts, and relay logs.

In the data set, there are 37 power system event scenarios, including natural events (8), no events (1), and attack events (28). The associated events and labels are listed in Table 2. There are three attack types: data injection, remote tripping command injection, and relay setting change. In a data injection attack, attackers try to blind the operator by sending fake alerts; in a relay setting change attack, the attacker alters the relay settings to disable their function; in a remote tripping command injection attack, the attacker sends a command to open a breaker. There are 15 files in the data set, and on average each file contains around 5,300 samples.

4.2. Evaluation Metrics

We evaluate our technique using the standard metrics of classification tasks. When an attack class is treated as the positive class, the four types of classification results are:

(i) True positive (TP): an attack sample is correctly classified as the attack class
(ii) False positive (FP): a normal sample is wrongly classified as the attack class
(iii) True negative (TN): a normal sample is correctly classified as normal
(iv) False negative (FN): an attack sample is wrongly classified as normal

Based on the classification results above, we calculate the performance measures with the following formulas:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Because the data set contains a large number of classes, we calculate the results for each class and then average them to obtain the overall performance.
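Assuming macro averaging over the classes, this computation can be expressed with scikit-learn as follows; y_test and y_pred denote the true and predicted labels.

```python
from sklearn.metrics import accuracy_score, f1_score

acc = accuracy_score(y_test, y_pred)
# F1 score computed per class and then averaged (macro averaging assumed)
f1 = f1_score(y_test, y_pred, average='macro')
```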

4.3. Results

We implement the proposed method in Python on a machine with an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40 GHz, three NVIDIA Tesla P100 PCIe 12 GB graphics cards, and 128 GB of RAM. We use the Keras framework [40] to create neural network models and the scikit-learn library [41] for the machine learning methods, including classifiers and dimension reduction techniques. Apart from the autoencoder and its variants, two other techniques (PCA and LDA) are used to compare dimension reduction performance with our method.

LDA [42], in contrast to the previously mentioned PCA, is a supervised dimension reduction method. Unlike PCA, LDA aims to find a projection that minimizes the within-class distance while maximizing the between-class distance. Four classifiers are applied to classify samples and evaluate the performance of our dimension reduction technique:

(i) K-nearest neighbors (KNN): KNN is a nonparametric supervised classifier. It assigns a test sample the majority class of its K nearest neighbors.
(ii) Decision tree (DT): DT is a tree-based classifier. Each internal node represents an if-then rule on an attribute, and each leaf node represents an output label.
(iii) Adaptive boosting (AdaBoost): AdaBoost is an ensemble algorithm that constructs a strong classifier from a collection of weak classifiers. The models are built sequentially, with each new model created to correct the errors of the previous one.
(iv) Bagging: Bagging is also an ensemble learning algorithm. A number of base classifiers are trained on different data sets created by bootstrap resampling.

We employ DT as the base classifier for AdaBoost and Bagging. In the hidden layers of the neural networks, we apply the ReLU activation function [43], and Adam optimization [44] is used for training. Because the neuron numbers of a neural network are hard to select, we employ three hidden layers for the SupervisedAE, as a single-layer autoencoder is too shallow to learn a good representation. Other hyperparameters are listed in Table 3; the other autoencoders use the same settings as the SupervisedAE.

After extracting the latent representations from the SupervisedAE, the PCA algorithm is used to reduce the dimension further, and various classifiers are trained to classify samples. To select optimal hyperparameters for these models and evaluate the performance, we use a nested tenfold cross-validation procedure with two loops: an inner loop and an outer loop. The outer loop splits the data set into a training set and a testing set with a ratio of 9:1, and the inner loop uses the classical cross-validation procedure to select optimal hyperparameters on the training set. The outer loop then evaluates the performance using the optimal hyperparameters derived from the inner loop. Both loops are repeated ten times, and the optimization target in the inner loop is accuracy. The final performance is averaged over the testing sets. Table 4 displays the detailed hyperparameter settings. For the PCA and LDA algorithms, we vary the number of components from 1 to 16; the PCA used in our proposed model has the same setting. Some classifier hyperparameters are selected from a collection of candidate values.
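A sketch of this nested procedure on the latent representations, using scikit-learn, might look as follows; z_latent and y denote the latent features and labels, and the candidate values for the classifier hyperparameter are assumptions.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Pipeline: PCA on the latent features followed by a KNN classifier
pipe = Pipeline([('pca', PCA()), ('knn', KNeighborsClassifier())])
param_grid = {'pca__n_components': list(range(1, 17)),  # 1-16 components
              'knn__n_neighbors': [3, 5, 7]}            # assumed candidates

inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Inner loop selects hyperparameters by accuracy; outer loop estimates performance
search = GridSearchCV(pipe, param_grid, cv=inner, scoring='accuracy')
scores = cross_val_score(search, z_latent, y, cv=outer, scoring='accuracy')
print(scores.mean(), scores.std())
```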

Next, we compare our proposed dimension reduction technique, which combines the SupervisedAE and PCA, with several other dimension reduction methods. The results are then compared with other existing detection methods. Finally, we analyze the influence of the hyperparameter settings.

4.3.1. Comparisons with Other Methods

We first compare the performance of our suggested technique with that of classifiers using the original features. The purpose of the comparison is to show that our technique achieves the desired dimension reduction results. The accuracy and F1 score are displayed in Tables 5 and 6, respectively.

In Table 5, we present the accuracy under three conditions: no dimension reduction, SupervisedAE only, and SupervisedAE combined with PCA, with four classifiers for each condition. When using the original features, the Bagging approach achieves the best accuracy of 0.8986, and the KNN classifier comes in second with an accuracy of 0.8691; the results of the other two methods are lower. Each of the four classifiers improves to a different degree when using the reduced features extracted by the SupervisedAE. The KNN classifier, in particular, reaches the best accuracy of 0.9365, an increase of around 0.067 over the results using the original features. DT and AdaBoost both increase by roughly 0.04, and Bagging by roughly 0.028, a smaller gain than that of KNN. As the table shows, the additional PCA reduces the dimension even further (from 64 to a value in 1–16), yet the performance does not suffer significantly. On some specific data files, the model with PCA performs worse than without it when using the KNN classifier, but the overall average results are comparable. Both reduced-feature settings yield more stable results than the original features, as the standard deviations illustrate; the KNN classifier has a standard deviation of just 0.005.

The F1 scores displayed in Table 6 support a similar conclusion. As previously stated, the nested tenfold cross-validation procedure chooses hyperparameters based on accuracy; hence, the F1 score is slightly lower than the accuracy. In conclusion, compared with the classifiers using the original features, our suggested technique successfully reduces the dimension of the features, and all four classifiers achieve better accuracy and F1 score with the reduced features. The additional PCA method decreases the dimension further without sacrificing much performance.

To further verify the effectiveness of our method, we compare the corresponding results with various dimension reduction methods in Figures 5 and 6. The figures present the mean values of accuracy and F1 score calculated over the 15 data files. We compare all of the dimension reduction methods using the four classifiers. The dimension reduction methods include PCA, LDA, AE, and SparseAE; "None" means that the classifiers are trained on the original features.

The accuracy results are shown in Figure 5. For the four classifiers, LDA yields the worst results, with performance lower than that obtained using the original features. When employing the KNN classifier, PCA produces better results than the original features, indicating that the dimension of the features is properly reduced; for the other classifiers, PCA performs poorly. AE and SparseAE produce similar results for all four classifiers, with SparseAE slightly poorer than AE. However, their results are inferior to PCA, although in theory they should achieve comparable results; this may be caused by insufficient training of the AE. Since their training settings are identical to those of our proposed method, the gap illustrates the improvement our method brings. In comparison with the other dimension reduction methods, our proposed technique combining SupervisedAE and PCA shows promising results. The F1 scores in Figure 6 support a similar conclusion.

From the tables and figures above, we observe that the KNN classifier has the best performance, so we use it as the final classifier to compare with other existing attack detection methods. There are four distinct baselines. RSKNN [10] and RSRT [9] are the first two; these techniques use the random subspace method to build a large number of trained classifiers, with the majority voting rule determining the final result and KNN and random tree as the base classifiers. CFS-MLP [27] and CFS-KNN [27] are the remaining two; these methods first apply a correlation-based feature selection method and then train an MLP and a KNN classifier, respectively, on the selected features. The comparisons of accuracy and F1 score are displayed in Table 7.

With an accuracy of 0.9188 and an F1 score of 0.9187, CFS-KNN outperforms the other three baselines; RSRT is second with an accuracy of 0.9131. Table 7 confirms that our suggested method has the best accuracy and F1 score, with an accuracy 0.018 higher than that of the best baseline, CFS-KNN. Furthermore, our method has a low standard deviation, indicating that its performance is stable. In conclusion, our technique yields the best results compared with the other detection methods.

4.3.2. Analysis of Hyperparameters

In this section, we analyze the influence of different hyperparameters. The hidden layer settings of the SupervisedAE and the number of PCA components are the most important ones in the experiments. We compare different hidden layer settings first and then show the influence of varying the number of PCA components.

In general, choosing optimal hidden layer settings for neural networks is difficult because the training time is longer than for traditional machine learning methods and the hyperparameter search space is huge. As previously stated, we use three hidden layers, as one layer struggles to learn a good representation, while additional layers would increase the training time. We use six combinations to demonstrate the differences, with the results displayed in Table 8. Except for the neuron numbers in the hidden layers, all experiments in this part use identical settings, and all apply the PCA algorithm to reduce the dimension further.

Table 8 confirms that when there is only one layer, the performance is poor; its results are the lowest, especially with the "32" setting. The performance improves as the number of layers increases: the accuracy of the three-layer setting "64-32-64" is 0.02 higher than that of the one-layer setting "96." In addition, the three-layer setting "96-32-96" yields lower accuracy than "96-64-96," most likely because of the rapid decline in neuron numbers. The F1 scores lead to the same conclusions. In summary, three hidden layers achieve the desired results.

In the above experiments, the optimal number of PCA components is chosen by nested cross-validation, but it is also necessary to demonstrate the performance with different numbers of PCA components. In this part, we conduct experiments under two conditions that differ in whether the SupervisedAE is used: "With SupervisedAE" means that the SupervisedAE and PCA are combined to reduce the dimension, while "Without SupervisedAE" means that only the PCA algorithm is used. We plot the relationship between accuracy and the number of components in Figure 7 for the four classifiers; combined with the two conditions, there are eight lines in the figure. The x-axis represents the number of PCA components, which, as previously stated, ranges from 1 to 16, and the y-axis represents the accuracy.

The accuracy of all classifiers in Figure 7 shows an increasing trend as the number of PCA components rises. First, we analyze the condition employing the SupervisedAE. The accuracy gradually improves and eventually becomes stable. With few PCA components the accuracy is poor: when the number is 1, the accuracy of all lines is approximately 0.30, and when it increases to 2, the accuracy improves dramatically, reaching about 0.80. The accuracy becomes stable once the number of components reaches 9–10. The KNN and Bagging classifiers achieve higher results, although the mean accuracy of Bagging is lower than that of KNN, consistent with the conclusion from Table 5. The AdaBoost and DT classifiers produce nearly identical results; their two plot lines overlap in the figure.

When the PCA algorithm is used alone, the accuracy of the classifiers also improves as the number of PCA components grows, remaining stable once the number reaches 10. The highest accuracy is again achieved by the KNN and Bagging classifiers, while DT and AdaBoost produce identical accuracy, with overlapping lines. However, the overall performance is lower than with the SupervisedAE. For simplicity and to save space, we omit the F1 figure, as it is nearly identical to the accuracy figure. In conclusion, the additional PCA algorithm produces stable results when the number of components is sufficiently high, and the nested cross-validation chooses the optimal number of PCA components in the experiments.

5. Conclusions and Future Work

Since many attack events targeting ICSs have been reported, the security issues of ICSs are becoming increasingly important. To provide protection, an IDS examines data records in ICSs and raises alerts for malicious operations. In particular, IDSs based on machine learning and deep learning have been investigated thoroughly. However, high-dimensional data still poses a challenge. To improve the performance of IDS, we propose a dimension reduction technique based on the SupervisedAE and the PCA algorithm in this study.

This research uses an improved autoencoder, named SupervisedAE, that introduces label information during training. In this way, the new autoencoder model is trained with reconstruction error and classification error simultaneously and, compared with the conventional autoencoder, obtains more discriminative latent representations. The experiment results show that classifiers trained on the latent representations extracted from the SupervisedAE outperform those trained on the original features. In addition, the PCA algorithm is applied to reduce the feature dimension further. When compared with other dimension reduction methods, our combined technique performs best, with the KNN classifier yielding the best results. Furthermore, when compared with other existing detection methods, the suggested technique achieves higher accuracy and F1 score, demonstrating its efficacy.

In the future, there are several directions to extend our work. Other autoencoder variants, such as the denoising autoencoder and the variational autoencoder, are worth investigating. Since we focus on the dimension reduction technique in this paper, only a few machine learning classifiers (e.g., KNN) are used; more sophisticated classifiers could be applied to boost performance further. In addition, we only test the dimension reduction technique on the power system data set in this work, so testing it on more ICS data sets is critical to verify its efficacy and robustness.

Data Availability

The data set we used in this paper is available at https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets. Readers who are interested in our research can access the data set and reproduce our results.

Conflicts of Interest

All the authors hereby declare that there are no conflicts of interest.

Acknowledgments

This research was funded by the National Key Research and Development Program of China (no. 2021YFB2012400).