Abstract

Recently, machine learning techniques, especially supervised learning techniques, have been adopted in Intrusion Detection Systems (IDSs). Owing to the limitations of supervised learning, most state-of-the-art IDSs do not perform well on unknown attacks and incur high computational overhead in the Internet of Things (IoT). To overcome these challenges, we propose a novel IDS based on unsupervised techniques, namely, UTEN-IDS. UTEN-IDS uses an ensemble of autoencoders to handle the network data and performs anomaly detection with an Isolation Forest algorithm. The effectiveness of the proposed method is verified using two benchmark datasets. The results show that our approach has significant advantages in classification performance over other approaches and demonstrate its utility in IoT networks.

1. Introduction

Network security has become an essential challenge in information systems, especially in the Internet of Things (IoT). IoT is a network of devices such as computers and sensors. The devices in IoT are likely to be vulnerable to various attacks [1]. According to the report [2], two-thirds of enterprises have already experienced a cybersecurity incident linked with their IoT devices. Some common cyberattacks, such as Distributed Denial of Service (DDoS), call for further research [3] and pose serious threats to IoT. The Intrusion Detection System (IDS) has been created to detect attacks in modern networks, including IoT networks. By analyzing the traces of network intrusion, IDS can detect attacks and raise alarms in real time.

However, traditional IDSs suffer from weaknesses such as a low detection rate and a high false alarm rate [4]. Machine Learning (ML) techniques are now used in IDSs to overcome these weaknesses. ML is an interdisciplinary field that emulates human intelligence. Supervised learning and unsupervised learning are two types of ML [4]; the difference between them lies in the use of labeled data. Most state-of-the-art intrusion detection methods are supervised, which means that they need both attack and normal data for model training. They have achieved a high detection rate for some well-known attacks. However, the labeling process is completed manually, so samples may be mislabeled, and the labels need to be updated regularly. Because labels for new attack types are lacking, these methods fail to detect the growing number of novel attacks in current IoT networks.

Deep learning (DL) is a branch of ML, and the neural network is its key component. Compared with traditional ML models, DL models handle big data effectively and are gaining more and more attention. Some state-of-the-art neural networks, such as the Reservoir Computing Network (RCN), are applied to target detection [5], text classification, cybersecurity, and so on. In intrusion detection, DL models with complex structures still require a large amount of labeled data for training and consume substantial computational resources. This high computational overhead makes such models difficult to deploy on IoT devices with few resources [6].

In general, the existing solutions face the following difficulties:
(a) Over-reliance on supervised techniques.
(b) High computational overhead.
(c) Poor detection performance on unseen attacks.

We propose an Unsupervised Technique ENsemble-based IDS (UTEN-IDS) to solve the above challenges. UTEN-IDS combines unsupervised techniques in an ensemble. Specifically, an ensemble of lightweight autoencoders (AEs) reconstructs the input data and calculates the Root Mean Square Errors (RMSEs). After the reconstruction, the Isolation Forest (IF) [7] uses the RMSEs for final classification. Excellent detection performance is the most fundamental requirement for an IDS [8]. We use the CSE-CIC-IDS 2018 dataset [9] and the MQTT-IOT-IDS2020 dataset [10] to verify the performance. The results show that UTEN-IDS achieves better detection performance than the compared methods. Because UTEN-IDS uses lightweight models and generates low overhead, it can be applied to IoT networks.

The contributions of this paper are as follows:
(a) We propose UTEN-IDS, an unsupervised technique-based IDS. The IDS is composed of unsupervised techniques, such as AE and IF. AEs learn the input data without labels, and IF is used to make the final decision. AE can extract crucial features from data, and the IDS takes full advantage of AE in intrusion detection to improve the detection rate.
(b) We propose a feature clustering method, which uses the Mean Shift algorithm to cluster the features based on their correlation. Without any predefined parameters, this method places closely related features in the same cluster for the training of AE.
(c) We evaluate the performance of the proposed method. The experimental results suggest that the proposed UTEN-IDS is superior to the state-of-the-art approaches.

The rest of the paper is organized as follows: Section 2 discusses the related research; Section 3 introduces the theoretical knowledge; Section 4 describes the proposed method; Section 5 presents the experiment setup and results; Section 6 concludes the paper.

2. Related Work

The ML classifiers used in IDSs, such as Decision Tree (DT) [11], Random Forest (RF) [12], Support Vector Machine (SVM) [13], and neural network [14], can analyze the features of network traffic to distinguish malicious activities from network traffic. Ingre et al. [15] proposed a detection method based on the DT classifier, and the NSL-KDD [16] dataset was used to test the performance. The experimental results showed that the proposed method could achieve high detection rates. Zhang et al. [17] presented an approach based on the Convolutional Neural Network (CNN) and sampling technique, with an accuracy of 98.82% on the UNSW-NB15 dataset [18] for binary classification. In [19], the authors designed a feedback mechanism to detect errors based on the recent detection results. They used Multilayer Perceptron (MLP) as the classifier, with an accuracy of 97.66% on the NSL-KDD. As one paradigm of ML, the ensemble method constructs the ensemble with base classifiers to improve accuracy [20]. Voting is the simplest way to implement the ensemble method. Gu et al. [21] proposed a method based on SVM ensemble and feature augmentation, with an accuracy of 99.41% on the NSL-KDD dataset. In their research, the quality-improved technique [22] provided high-quality training data, and an SVM ensemble was applied for classification. In [23], the authors proposed a Voting-based Neural Network (VNN). The method creates various neural network models and picks the best models from them. The chosen models are used to perform the detection by majority voting. Although the above methods achieve high accuracy, they need labeled data for training and are insufficient to provide security against unknown attacks.

Supervised learning is more common than unsupervised learning for intrusion detection. However, the unlabeled traffic data generated in the network is suitable for unsupervised learning, such as Unsupervised Feature Selection (UFS) [24]. In [25], the authors used one UFS technique to select features from intrusion detection datasets, and Redundancy Penalization (RP) technique based on mutual information was applied to filter the features further. Carrasco and Sicilia [26] proposed an unsupervised neural network model based on Natural Language Processing (NLP), tested against the UNSW-NB15 dataset, and it achieved 99.20% precision and 82.07% recall. Bohara et al. [27] used two unsupervised clustering algorithms for intrusion detection. But this method needs a combination of host and logs to achieve a good detection performance. Mingqiang et al. [28] used a graph-based algorithm to cluster the data and an outlier detection method to decide which cluster was malicious. This method remains computationally expensive due to the complexity of the algorithm.

In [29], the authors proposed Kitsune, an online intrusion detection method based on AE. Kitsune reconstructed the input data through AEs and used RMSE to record the reconstruction error. The maximum RMSE in the training phase was stored as a classification threshold. Kitsune performed anomaly detection tasks in a semi-supervised way. However, the threshold obtained in this way may be inaccurate. In [30], the authors proposed AE-IDS, which uses the RF algorithm to select features, groups the features by affinity propagation clustering [31], and reconstructs the feature groups with AEs. However, AE-IDS selects features in a supervised way, and the selected features are grouped based on the average, ignoring the correlation between the features. Zavrak and İskefiyeli [32] used a Variational Autoencoder (VAE) for intrusion detection and calculated the Reconstruction Probability (RP). The RP value is used for classification, but the proper threshold value is still hard to determine. In [33], the authors used AE to classify the intrusion behaviours, and IF was applied twice for reclassification. However, this method did not consider the effect of redundant features on detection performance or the shortcoming of IF when tackling high-dimensional data. The detection performance also depends on the predefined value of the threshold. Unlike [33], we use IF to perform the final prediction based on the output of AEs, and our solution does not require a predefined threshold.

Although some achievements have been made in AE-based intrusion detection, few researchers focus on the choice of threshold for classification. It is critical to find a proper threshold, because it directly affects the performance of the classifier. The method proposed in this paper addresses this issue. UTEN-IDS is inspired by the existing research but differs substantially from the above methods. It is composed of two layers. The first layer consists of AEs, and IF is in the second layer. IF plays the role of the threshold by detecting unknown attacks adaptively.

3. Background

3.1. Autoencoder

AE is a kind of neural network. It is used to output an accurate data representation by learning the low-dimensional features [34]. AE can transform the data into a lower-dimensional space [35], and we can perform the classification with the transformation differences.

Figure 1 shows the structure of the AE with three layers. Suppose that the input sample $x$ has $n$ features, and $x_i$ denotes the $i$th feature, where $i \in \{1, 2, \dots, n\}$. Each feature corresponds to a separate neuron in the input layer. Encoding and decoding are the two main working phases of AE. The data is transformed from the input layer to the hidden layer in the encoding phase, while the transformation from the hidden layer to the output layer is described as decoding. $x$ is reconstructed after the two phases to obtain the output $y$, and $y$ is close to $x$. It is considered that an AE learns the function $h_\theta$: $h_\theta(x) \approx x$.

The execution process of AE is not complex. Let $l^{(i)}$ denote the $i$th layer in AE, and let $\|l^{(i)}\|$ denote the total number of neurons in $l^{(i)}$. Thus, the weights which connect $l^{(i)}$ to $l^{(i+1)}$ can be described as the matrix $W^{(i)}$ with $\|l^{(i+1)}\|$ rows and $\|l^{(i)}\|$ columns. The bias of the connection is denoted as the vector $b^{(i)}$ with $\|l^{(i+1)}\|$ dimensions. Let $\theta$ record the total parameters for all layers, and $\theta = (W, b)$. $\theta$ is randomly generated when AE is initialized. Forward propagation is used to activate the neural network layer by layer: $l^{(2)}$ is activated by $l^{(1)}$, $l^{(3)}$ is activated by $l^{(2)}$, and so on until the output layer is activated. Let $a^{(i)}$ be the $\|l^{(i)}\|$-dimensional vector generated by the neurons in $l^{(i)}$, with $a^{(1)} = x$; the calculation of $a^{(i+1)}$ can be defined as follows:

$$a^{(i+1)} = F\left(W^{(i)} a^{(i)} + b^{(i)}\right),$$

where $F$ is called the activation function.

To make the propagation process more effective, the sigmoid function is usually applied as $F$, and it is given by:

$$F(z) = \frac{1}{1 + e^{-z}}.$$

$y$ is calculated in the output layer when forward propagation is finished. AE tries to make the output value $y$ equal to the actual value $x$. We denote the above process as $h_\theta$; then, we have $y = h_\theta(x)$. After that, the Back Propagation (BP) algorithm is usually used to reduce the losses during reconstruction.

Finally, the pattern of data is learned by AE. If the input sample is different from the samples that AE has learned, there will be a significant error between $x$ and $y$. In this work, the AE is only trained using normal data and learns the concepts of legitimate behaviours in the network. After that, the AE is tested with mixed data containing abnormal cases, leading to a high reconstruction error for the anomaly. Let $n$ denote the dimensionality of $x$. We can compute the errors of reconstruction by RMSE:

$$\mathrm{RMSE}(x, y) = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n}}.$$
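The forward pass and the reconstruction error described in this subsection can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the layer sizes, the random initialization scale, and the names `forward` and `rmse` are hypothetical and are not taken from the UTEN-IDS implementation.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Three-layer forward pass: input -> hidden (encoding) -> output (decoding)."""
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

def rmse(x, y):
    """Root mean square error between an input and its reconstruction."""
    return float(np.sqrt(np.mean((x - y) ** 2)))

rng = np.random.default_rng(0)
n, n_hidden = 6, 4                      # hypothetical layer sizes
W1 = rng.normal(scale=0.1, size=(n_hidden, n))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n, n_hidden))
b2 = np.zeros(n)

x = rng.random(n)                       # one sample, already scaled to [0, 1)
y = forward(x, W1, b1, W2, b2)          # untrained reconstruction
score = rmse(x, y)
```

With untrained weights the score is simply the error of a near-constant output; after training on normal data only, a large score signals a sample unlike those the AE has learned.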

3.2. Isolation Forest

IF is an unsupervised learning-based anomaly detection algorithm, which focuses on the isolation of a few outliers [7]. IF can be seen as an ensemble of isolation trees, which does not require the whole input data for training and captures the character of outliers through samples of the data. These trees work by splitting the data on randomly selected feature values. The path lengths of anomalies are usually shorter than those of normal samples in these trees. Based on the path length in the trees, the anomaly score is calculated to identify the outliers. Suppose that the input dataset has $m$ samples; the anomaly score of sample $x$ is defined as follows:

$$s(x, m) = 2^{-\frac{E(h(x))}{c(m)}},$$

where $h(x)$ denotes the path length in one isolation tree for $x$, $E(h(x))$ denotes the expected path length over these trees, and $c(m)$ is a constant associated with the dataset. The outliers usually have high scores. IF has been applied in different fields due to its low memory consumption and high detection precision.
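As a worked illustration of the anomaly score, the sketch below evaluates it for a given mean path length. The text above only calls the normalizer a constant associated with the dataset; the closed form used here (average path length of an unsuccessful binary search tree lookup, with the harmonic number approximated by a logarithm plus Euler's constant) follows the standard Isolation Forest formulation in [7]. The function names and the sample size of 256 are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(m):
    """Normalizing constant: average path length of an unsuccessful BST search over m samples."""
    if m <= 1:
        return 0.0
    if m == 2:
        return 1.0
    harmonic = math.log(m - 1) + EULER_GAMMA   # approximation of H(m - 1)
    return 2.0 * harmonic - 2.0 * (m - 1) / m

def anomaly_score(mean_path_length, m):
    """Anomaly score 2 ** (-E(h(x)) / c(m)); values near 1 indicate outliers."""
    return 2.0 ** (-mean_path_length / c(m))

m = 256                                  # hypothetical number of samples per tree
short_path = anomaly_score(3.0, m)       # isolated quickly: score well above 0.5
long_path = anomaly_score(12.0, m)       # hard to isolate: score below 0.5
```

A sample whose mean path length equals `c(m)` scores exactly 0.5, which is why scores well above 0.5 are read as anomalous.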

4. Methodology

In this work, we propose UTEN-IDS to identify cyberattacks, especially unknown attacks. We aim to design a lightweight IDS for IoT networks that provides accurate detection. The framework of UTEN-IDS is shown in Figure 2. UTEN-IDS works through three main phases: preprocessing, feature clustering, and anomaly detection.

We clean the input data and select the best features during the preprocessing phase. In the feature clustering phase, the features are grouped into several subsets according to the correlation between these features. After that, each sample in the dataset is divided into several sub-samples, which are distributed in different feature subsets. In the anomaly detection phase, the sub-samples in the same feature subset are processed separately by AE. All the AEs are considered an ensemble, and the number of AEs is also the number of subsets. When AE completes the reconstruction of the samples in the subset, we have the collection of RMSEs. We use the IF algorithm to make the final classification based on all the collections.

The output of each phase serves as the input of the next. As shown in Figure 3, only normal traffic data is collected for training, so there is no need to collect and label attack instances. The mixed data consisting of both normal and attack instances are used for testing. In the following subsections, we describe the proposed method in detail.

4.1. Preprocessing

The network traffic data with null values or infinite values, in many cases, cannot be used as the input of ML algorithms. Therefore, we process the input data in the first phase. We replace the infinite value and null value with zero for input data. Additionally, the features with low variance are removed because the information provided by these features is minimal, and they are not of benefit to the training of AE.
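A minimal sketch of this preprocessing step is shown below. The variance cutoff `var_threshold` is a hypothetical parameter, since the text does not state the value used, and the function name `preprocess` is illustrative.

```python
import numpy as np

def preprocess(X, var_threshold=0.0):
    """Replace NaN and infinite values with zero, then drop low-variance features.

    var_threshold is an assumed cutoff; features with variance at or below it are removed.
    Returns the cleaned matrix and the boolean mask of kept columns.
    """
    X = np.asarray(X, dtype=float).copy()
    X[~np.isfinite(X)] = 0.0                 # NaN, +inf, and -inf all become 0
    keep = X.var(axis=0) > var_threshold
    return X[:, keep], keep

X_raw = np.array([
    [1.0, 5.0, np.inf, 0.2],
    [2.0, 5.0, 3.0, np.nan],
    [3.0, 5.0, 4.0, 0.4],
])
X_clean, kept = preprocess(X_raw)            # the constant second column is dropped
```

The returned mask would be stored and reapplied to test data so that training and testing use the same feature columns.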

4.2. Feature Clustering

Feature clustering is the process that merges all features with high correlation into the same feature cluster. The purpose of feature clustering used in UTEN-IDS is to train the ensemble of AEs better. By doing so, redundant features are merged, and their impact on final classification is reduced. On the other hand, the features with the strongest correlation should be grouped into the same subset. Different subsets represent different characteristics of input data. An AE is trained with only one subset, which helps the AE learn the data pattern deeply. We will show the performance difference between the ensemble of AEs and the single AE in the experiment.

In this process, we first evaluate the correlation between the input features. The correlation coefficient between the vectors $X$ and $Y$ is computed as follows:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y},$$

where $\sigma_X$, $\sigma_Y$, $\mathrm{cov}(X, Y)$, and $\rho_{X,Y}$ are the standard deviation of $X$, the standard deviation of $Y$, the covariance of $X$ and $Y$, and the correlation coefficient, respectively.

Secondly, we define the distance between two features $f_i$ and $f_j$ based on the correlation coefficient:

$$d(f_i, f_j) = 1 - \left|\rho_{f_i, f_j}\right|,$$

where $i, j \in \{1, 2, \dots, n\}$ and $n$ is the number of features.

We believe that features are either relevant or irrelevant, and both positive and negative correlations show a link between the features. Therefore, the absolute value of the correlation coefficient is used in the equation. By doing so, the distance between features is limited to the range $[0, 1]$. If $d(f_i, f_j)$ is close to $0$, the two features have a strong correlation with each other. Then, we have the $n$-by-$n$ distance matrix $D$, where $D_{ij}$ denotes the distance between $f_i$ and $f_j$.

Finally, we use an unsupervised clustering algorithm to cluster the features. Specifically, we use the Mean Shift algorithm [36] to cluster $D$. Mean Shift is an iterative algorithm for clustering. There are two main reasons for using Mean Shift:
(1) Mean Shift does not require predefined parameters, such as the number of clusters and the initial cluster centers.
(2) The performance of Mean Shift is robust with acceptable algorithm complexity.

After that, the features in the same cluster are highly correlated, and the features in different clusters are almost irrelevant. If the input samples are all used for clustering, there will be a significant increase in the computational complexity. Therefore, we use 25% of the training set for clustering. Let $M$ represent the feature clustering result, a list of clusters $M = \{c_1, c_2, \dots, c_k\}$, where $c_j = \{f^{(j)}_1, f^{(j)}_2, \dots, f^{(j)}_{n_j}\}$, $k$ is the number of clusters, and $n_j$ is the number of features in $c_j$. $M$ is seen as the feature mapping function. According to $M$, the features of all training samples are grouped into $k$ subsets, which are used for the next phase. These subsets compose the training set. It should be noted that feature clustering is only performed in the training phase. For the testing process, UTEN-IDS maps the input samples to subsets according to $M$.
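The feature clustering phase can be sketched with NumPy and scikit-learn as follows. Several points are assumptions rather than details confirmed by the text: the distance is taken as one minus the absolute correlation, each row of the distance matrix is treated as the coordinates of one feature, and scikit-learn's `MeanShift` (which estimates its bandwidth from the data when none is given) stands in for the authors' Mean Shift implementation. The synthetic data and the name `cluster_features` are hypothetical.

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_features(X, bandwidth=None):
    """Group feature indices by Mean Shift on the correlation-distance matrix.

    bandwidth=None lets scikit-learn estimate the bandwidth from the data.
    """
    corr = np.corrcoef(X, rowvar=False)      # n-by-n; assumes constant columns were removed
    D = 1.0 - np.abs(corr)                   # distance in [0, 1]; 0 means strongly correlated
    labels = MeanShift(bandwidth=bandwidth).fit_predict(D)   # one row of D per feature
    return [np.flatnonzero(labels == c).tolist() for c in np.unique(labels)]

# Synthetic training data: three independent signals, each observed through three noisy features.
rng = np.random.default_rng(0)
base = rng.normal(size=(2000, 3))
X_train = np.repeat(base, 3, axis=1) + 0.01 * rng.normal(size=(2000, 9))

X_cluster = X_train[: len(X_train) // 4]     # 25% of the training set is used for clustering
M = cluster_features(X_cluster)              # feature mapping: list of index lists
subsets = [X_train[:, idx] for idx in M]     # one feature subset per cluster
```

At test time the stored mapping `M` is reused to slice incoming samples; clustering is not repeated.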

4.3. Anomaly Detection

A two-layer unsupervised ensemble model is used for the anomaly detection phase. The ensemble of AEs is implemented in the first layer, and IF is in the second layer.

4.3.1. The Ensemble of Autoencoders

The standard DL models (e.g., MLP) are trained in a supervised manner and consume many computing resources. But AE is trained in an unsupervised fashion, and it does not require labeled samples. AE is also applied to the anomaly detection domain [37]. For the abnormal sample, the reconstruction error calculated by AE is different from the error of normal ones. The IDS can classify the samples correctly with the errors. Therefore, we choose AE as the core of UTEN-IDS and use AE to capture the changes in the network behaviours from the input.

As mentioned before, the AE is used to reconstruct the input sample $x$. To make the reconstruction more efficient, we apply the following settings to all the AEs:
(1) The number of layers is set to 3. An AE with a three-layer structure can reconstruct well. On the other hand, more layers will increase the computational overhead and require more time for training.
(2) The input and output layers have $n$ neurons each, where $n$ is the number of input features. The hidden layer has fewer neurons than the input layer, because too many neurons in this layer may lead to overfitting.
(3) The weights in the decoding phase are the matrix transpose of the weights in the encoding phase, that is, $W^{(2)} = \left(W^{(1)}\right)^{\top}$. This is known as Tied Weights [38]. Only one set of weights is adjusted by the AE, which speeds up the training process and enables AEs to capture more information from the data.

Suppose that there are $k$ feature subsets $S_1, S_2, \dots, S_k$, where $S_j$ represents the $j$th feature subset and $j \in \{1, 2, \dots, k\}$. According to the feature mapping $M$, sample $x$ is mapped to $k$ sub-samples; then, we have $x = \{x^{(1)}, x^{(2)}, \dots, x^{(k)}\}$, and $x^{(j)}$ is in $S_j$. All the sub-samples in $S_j$ are used to train an individual AE, $\mathrm{AE}_j$, and the number of AEs is $k$. We take $\mathrm{AE}_j$ as an example to explain this process.

For $\mathrm{AE}_j$, the weights are initialized randomly before training. Firstly, the input $x^{(j)}$ is 0-1 normalized to get $\tilde{x}^{(j)}$. Secondly, based on $\tilde{x}^{(j)}$ and the weights, forward propagation using the sigmoid function is completed through the entire network; then, we have $y^{(j)}$ in the output layer. Thirdly, the BP algorithm is used to propagate the errors backward through the network. Based on the errors, we use Stochastic Gradient Descent (SGD) to tune the weights of $\mathrm{AE}_j$. By doing so, every sample is learned by the AE only once and the weights are updated gradually. SGD has the advantage of being able to train a detection model online [29]. After basic training of UTEN-IDS locally, we can deploy it on the network for detection or continue training online. Finally, the RMSE between $\tilde{x}^{(j)}$ and $y^{(j)}$ is calculated and returned as the output of $\mathrm{AE}_j$. We repeat the process until all the AEs are trained. After that, the ensemble of AEs is generated.

After training, the AEs can execute prediction on unknown samples. The input sample is also mapped into $k$ sub-samples according to $M$. The sub-samples are used as the input of $\mathrm{AE}_1, \mathrm{AE}_2, \dots, \mathrm{AE}_k$, respectively. More specifically, the weights of the AEs are not updated anymore; only forward propagation is performed, and the RMSE is returned as a prediction score. If an attack instance is processed by an AE, the output $y^{(j)}$ deviates markedly from $\tilde{x}^{(j)}$, and the RMSE is correspondingly large.
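The training and prediction procedure for a single AE in the ensemble might look like the sketch below: a three-layer sigmoid autoencoder with tied weights, updated by one SGD step per sample. The hidden-layer ratio, learning rate, squared-error loss, and synthetic data are assumptions made for illustration; the class and method names are hypothetical and do not come from the UTEN-IDS code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TiedAutoencoder:
    """Three-layer AE whose decoder weights are the transpose of the encoder weights."""

    def __init__(self, n_features, hidden_ratio=0.75, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n_hidden = max(1, int(np.ceil(hidden_ratio * n_features)))  # assumed ratio
        self.W = rng.normal(scale=0.1, size=(n_hidden, n_features))
        self.b_hid = np.zeros(n_hidden)
        self.b_out = np.zeros(n_features)
        self.lr = lr

    def _forward(self, x):
        h = sigmoid(self.W @ x + self.b_hid)        # encoding
        y = sigmoid(self.W.T @ h + self.b_out)      # decoding with tied weights
        return h, y

    def score(self, x):
        """Prediction mode: forward pass only, returns the RMSE."""
        _, y = self._forward(x)
        return float(np.sqrt(np.mean((x - y) ** 2)))

    def train_step(self, x):
        """One SGD update on a single sample; returns the RMSE before the update."""
        h, y = self._forward(x)
        err = float(np.sqrt(np.mean((x - y) ** 2)))
        delta_out = (y - x) * y * (1.0 - y)                       # output-layer error
        delta_hid = (self.W @ delta_out) * h * (1.0 - h)          # back-propagated error
        grad_W = np.outer(delta_hid, x) + np.outer(h, delta_out)  # encoder + decoder terms
        self.W -= self.lr * grad_W
        self.b_hid -= self.lr * delta_hid
        self.b_out -= self.lr * delta_out
        return err

# Synthetic normal data: eight features driven by one latent value t.
rng = np.random.default_rng(1)
t = rng.random((5000, 1))
signs = np.array([1, -1] * 4)
X = np.where(signs > 0, t, 1 - t) + rng.normal(scale=0.02, size=(5000, 8))
lo, hi = X.min(axis=0), X.max(axis=0)
X_norm = (X - lo) / (hi - lo)                        # 0-1 normalization

ae = TiedAutoencoder(n_features=8)
train_errors = [ae.train_step(x) for x in X_norm]    # each sample is seen once
x_attack = (np.full(8, 3.0) - lo) / (hi - lo)        # far outside the normal range
attack_score = ae.score(x_attack)
```

Because the tied matrix appears in both the encoder and the decoder, its gradient is the sum of the two contributions; this is the only place where tying changes the update rule.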

4.3.2. Detection Using Isolation Forest

The ensemble of AEs completes the reconstruction of the input subsets; then, we have the RMSEs $r = (r_1, r_2, \dots, r_k)$. However, using these RMSEs in a simple way, such as comparing them with a fixed threshold, is insufficient to make accurate decisions.

In [29], the maximum RMSE in the training phase is used as the classification threshold. Let $\phi$ denote the threshold. For a given instance, if the value of the reconstruction error is higher than $\phi$, this instance will be considered an anomaly. Furthermore, larger values indicate more significant anomalies. The threshold affects the classification performance, and the proper value of the threshold is usually tuned by experiments. It is not easy to find the optimal $\phi$. In other words, the existing solutions do not have the self-learning ability, and they cannot detect attacks adaptively.
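The fixed-threshold rule discussed above is easy to state in code, which also makes its fragility visible. The RMSE values below are invented for illustration, and the function names are hypothetical.

```python
def fit_threshold(train_rmses):
    """Fixed-threshold rule: the largest RMSE seen during training becomes the threshold."""
    return max(train_rmses)

def is_anomaly(rmse, threshold):
    """An instance is flagged when its reconstruction error exceeds the threshold."""
    return rmse > threshold

clean = [0.021, 0.034, 0.027, 0.030]     # hypothetical RMSEs of normal training samples
noisy = clean + [0.400]                  # same data plus one unusual but benign sample

phi_clean = fit_threshold(clean)
phi_noisy = fit_threshold(noisy)

attack_rmse = 0.150
flagged_clean = is_anomaly(attack_rmse, phi_clean)   # True
flagged_noisy = is_anomaly(attack_rmse, phi_noisy)   # False: one outlier hid the attack
```

A single atypical training sample is enough to move the threshold past the attack's error, which is one way a threshold chosen this way can turn out to be inaccurate.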

Based on the RMSEs, we use the IF algorithm to solve the threshold problem. The IF model has shown strong performance in the area of anomaly detection. It also works well in situations where the training set does not contain anomalies [7].

One advantage of our method is its self-learning ability. The IF algorithm distinguishes anomalies from normal activities, making UTEN-IDS detect different kinds of attacks in an adaptive manner. Specifically, the trained AEs are used to predict the normal samples, and the obtained RMSEs are used to train the IF. Then, AE and IF process the test set which contains attack samples. The workflow of anomaly detection can be summarized as follows:

Step 1. Split the input training set into two datasets, namely, $T_{\mathrm{AE}}$ and $T_{\mathrm{IF}}$.

Step 2. AEs are trained on $T_{\mathrm{AE}}$.

Step 3. AEs predict $T_{\mathrm{IF}}$ and obtain the collections of RMSEs, $R_{\mathrm{IF}}$.

Step 4. IF is trained on $R_{\mathrm{IF}}$.

Step 5. AEs predict the testing set and obtain the RMSEs; then IF classifies the RMSEs.

$T_{\mathrm{IF}}$ is set to 25% of the training dataset, and the samples of $T_{\mathrm{IF}}$ are also used in the feature clustering phase to obtain $M$. The remaining 75% of samples, $T_{\mathrm{AE}}$, are used to train AEs. The samples of $T_{\mathrm{IF}}$ are unknown to the AEs, so the RMSEs calculated by the AEs will be close to the ones in a realistic scenario. This allows UTEN-IDS to take full advantage of the training set. If the whole training set or $T_{\mathrm{AE}}$ is used as $T_{\mathrm{IF}}$, we find that it takes more time to create the model, but the detection rate is not improved.

If the RMSE of one instance is considered an outlier by IF, this instance will be classified as an attack. IF suffers from a “curse of dimensionality” [7], but the critical low-dimensional features (RMSEs) will help IF achieve excellent detection performance.
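The five-step workflow can be sketched end to end on synthetic data. To keep the sketch short, a one-component PCA reconstruction stands in for each trained AE; this is a deliberate substitution, not the authors' model. The feature mapping, the synthetic traffic, and the contamination value are likewise illustrative assumptions. The Isolation Forest is scikit-learn's `IsolationForest`, whose `predict` returns -1 for outliers.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

def make_normal(m):
    """Synthetic normal traffic: two groups of three correlated features."""
    latent = rng.normal(size=(m, 2))
    group_a = latent[:, [0]] + 0.05 * rng.normal(size=(m, 3))
    group_b = latent[:, [1]] + 0.05 * rng.normal(size=(m, 3))
    return np.hstack([group_a, group_b])

def rmse_matrix(models, subsets, X):
    """One RMSE column per feature subset (one per reconstruction model)."""
    cols = []
    for model, idx in zip(models, subsets):
        sub = X[:, idx]
        recon = model.inverse_transform(model.transform(sub))
        cols.append(np.sqrt(np.mean((sub - recon) ** 2, axis=1)))
    return np.column_stack(cols)

subsets = [[0, 1, 2], [3, 4, 5]]             # hypothetical feature mapping

# Step 1: split the normal-only training set (25% for IF, 75% for the reconstructors).
X_train = make_normal(4000)
T_if, T_ae = X_train[:1000], X_train[1000:]

# Step 2: train one reconstruction model per feature subset (PCA as a stand-in for an AE).
models = [PCA(n_components=1).fit(T_ae[:, idx]) for idx in subsets]

# Step 3: reconstruct the held-out split and collect its RMSEs.
R_if = rmse_matrix(models, subsets, T_if)

# Step 4: train the Isolation Forest on those RMSEs.
iforest = IsolationForest(contamination=0.02, random_state=0).fit(R_if)

# Step 5: score a mixed test set; -1 marks an outlier, i.e. a predicted attack.
X_normal_test = make_normal(500)
X_attack_test = rng.normal(size=(200, 6))    # correlation structure is broken
pred_normal = iforest.predict(rmse_matrix(models, subsets, X_normal_test))
pred_attack = iforest.predict(rmse_matrix(models, subsets, X_attack_test))
recall = float(np.mean(pred_attack == -1))
false_alarm_rate = float(np.mean(pred_normal == -1))
```

The Isolation Forest only ever sees a low-dimensional vector of RMSEs, one value per feature subset, rather than the raw features.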

5. Experiments and Analysis

5.1. Dataset

The latest intrusion detection dataset represents the modern malicious behaviours in the current network. For this reason, the CSE-CIC-IDS 2018 dataset and the MQTT-IOT-IDS2020 dataset are selected to demonstrate the effectiveness of the proposed method.

The CSE-CIC-IDS 2018 dataset was published by the Canadian Institute for Cybersecurity (CIC) in 2018 and covers a variety of modern attack behaviours. This dataset includes the network traffic and system logs of the experimental machines, along with more than 80 features extracted from the source pcap files by CICFlowMeter, a network traffic flow generator and analyzer [39]. The dataset consists of different attack scenarios, and these scenarios are stored in subdatasets. However, not all the attack samples in the dataset are suitable for testing models. Some attacks, such as “SQL injection” and “Infiltration,” cannot be detected adequately from network traffic alone [30]. In this work, we pay close attention to common attacks, such as DoS, DDoS, and brute force attacks. Therefore, as shown in Table 1, six subdatasets are used in our experiments, involving eleven types of attacks. To test the performance of UTEN-IDS properly, features such as the source and destination IP addresses are removed.

In this paper, we use the MQTT-IOT-IDS2020 dataset to test the performance of UTEN-IDS in the IoT network. The Message Queuing Telemetry Transport (MQTT) protocol is one of the most widely used communication protocols in IoT. The dataset consists of several kinds of IoT traffic, generated in a simulated MQTT-based IoT network. For this dataset, we again focus on attacks such as brute force. We extract two kinds of brute force attacks from the original bi-directional flow-based dataset. The data record the characteristics of the flows in the IoT. The final dataset used in our experiment is summarized in Table 2.

5.2. Evaluation Metrics

To evaluate the performance of UTEN-IDS, we use the following evaluation metrics: recall, F1 score, and Average Accuracy (AA). Recall reveals the detection rate, and the F1 score comprehensively evaluates the detection ability. The F1 score is the harmonic mean of the precision and recall, which can be formulated as:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

The average accuracy evaluates the generalization ability of the classifier. In the binary classification task, it is the mean value of specificity and recall. F1 score, recall, and AA are all positively correlated with the detection performance.
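The three metrics can be computed directly from the confusion-matrix counts, as in the short sketch below. The counts in the example are invented, attacks are treated as the positive class, and the function name is illustrative.

```python
def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def binary_metrics(tp, fp, tn, fn):
    """Recall, F1 score, and average accuracy for a binary task (attack = positive)."""
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    average_accuracy = (specificity + recall) / 2
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "average_accuracy": average_accuracy,
    }

# Hypothetical counts: 100 attacks (90 detected) and 900 normal flows (30 false alarms).
m = binary_metrics(tp=90, fp=30, tn=870, fn=10)
```

Because the average accuracy weights specificity and recall equally, it is not dominated by the much larger normal class.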

5.3. Experimental Setup

We implement the UTEN-IDS in Python. The experiments are conducted on the machine with Linux operating system, 32 GB RAM, Tesla V100 GPU, and Xeon GOLD 6148 CPU.

UTEN-IDS is only trained with normal traffic data in specific network scenarios. As shown in Table 3, for both the sub-datasets of CSE-CIC-IDS 2018 and the IoT dataset, 80% of normal samples are used in the training process, 10% of normal samples and 50% of attack samples are used as the test set, and the remaining samples are for validation. The normal samples far outnumber the attack samples, and AEs need many samples for the training process. Therefore, 80% of the normal data is selected for training. The proportion of each class in the test and validation sets is the same. To evaluate the performance of UTEN-IDS, each type of attack is combined with normal data to form a separate test set.

5.4. Comparative Experiments

“Contamination” is a key parameter for IF, which specifies the expected proportion of outliers in the dataset. This parameter determines the cutoff value of the anomaly score for IF. To obtain a robust model, we choose a range of “contamination” values and compare them on the validation dataset. Through experiments, we observe that the values of 0.1 and 0.02 offer the best performance for CSE-CIC-IDS 2018 and MQTT-IOT-IDS2020, respectively.
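Selecting the “contamination” value on a validation set can be sketched as a simple sweep. Everything numeric here is an assumption: the RMSE matrices are synthetic stand-ins for AE outputs, the candidate grid is arbitrary, and F1 is used as the selection criterion only for illustration. The `contamination` argument is the scikit-learn `IsolationForest` parameter of that name.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

def select_contamination(R_train, R_val, y_val, grid):
    """Fit one Isolation Forest per candidate value and keep the best validation F1."""
    scores = {}
    for value in grid:
        model = IsolationForest(contamination=value, random_state=0).fit(R_train)
        pred = (model.predict(R_val) == -1).astype(int)   # -1 means outlier, i.e. attack
        scores[value] = f1_score(y_val, pred)
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic stand-ins for RMSE vectors (three AEs): normal errors are small, attack errors large.
rng = np.random.default_rng(0)
R_train = rng.normal(0.05, 0.01, size=(2000, 3))
R_val = np.vstack([
    rng.normal(0.05, 0.01, size=(500, 3)),    # normal validation samples
    rng.normal(0.30, 0.05, size=(100, 3)),    # attack validation samples
])
y_val = np.r_[np.zeros(500), np.ones(100)].astype(int)

grid = (0.001, 0.01, 0.02, 0.05, 0.1)         # hypothetical candidate values
best, scores = select_contamination(R_train, R_val, y_val, grid)
```

Larger values flag more normal samples as outliers, so the sweep trades false alarms against missed attacks; labels are needed only for this validation step, not for training.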

Our method completes the detection through an anomaly detection algorithm, and the choice of algorithm affects the final classification result. Only state-of-the-art anomaly detection algorithms are worth considering. For this reason, we select Elliptic Envelope (EE) [40] and Local Outlier Factor (LOF) [41] as competitors of IF. They are all robust and interpretable anomaly detection algorithms. Table 4 shows that UTEN-IDS using IF achieves the best performance on all the metrics on the validation set of the IoT dataset.

Table 5 shows the effect of feature clustering on the final performance. We conduct a comparison experiment based on the two schemes: single AE and multiple AEs. Single AE represents only one AE used in UTEN-IDS, which skips the feature clustering phase; multiple AEs represent the UTEN-IDS with the ensemble of AEs. It can be inferred that multiple AEs outperform single AE from Table 5.

UTEN-IDS solves the issue of intrusion detection by using the IF, and IF is trained based on the RMSEs. To show that the AE helps improve the performance of the anomaly detection algorithm, we compare the performance of UTEN-IDS with that of a standalone IF trained on the raw data. In Table 6, the results of UTEN-IDS are significantly better than those of the IF model, which shows that IF with the RMSEs attains a higher performance than IF with the raw data.

The above experiments explain the necessity of some steps or hyperparameters in our method. We conduct comparisons with several detection methods in the following subsections.

5.4.1. Performance Analysis on CSE-CIC-IDS 2018 Dataset

In the selection of competitors, we consider state-of-the-art intrusion detection methods. Therefore, we select AE-IDS, KitNET [29], and AE. They are all advanced detection methods. Here, AE is an unsupervised neural network whose structure is the same as that used in UTEN-IDS. We use the statistical approach from [42] to set the threshold of the AE reconstruction loss, and the threshold is used for classification. As mentioned earlier, Kitsune is an online method, and KitNET is the core detection algorithm of Kitsune. The anomaly threshold is a crucial parameter for this method, and we tune its value by experiments.

To demonstrate the advantages of our proposed method against new attacks, we compare the performance of UTEN-IDS with that of the competitors on the CSE-CIC-IDS 2018 dataset. The four approaches use the same samples for training. Then, the performance is measured on the testing set.

Table 7 shows the performance comparison of the four methods on the different attack classes of the CSE-CIC-IDS 2018 dataset. For DoS attacks-GoldenEye, SSH-brute force, and DoS attacks-HOIC, the accuracy values of UTEN-IDS are significantly higher than those of the other methods.

Table 8 shows the comparison results of the F1 score on the dataset. F1 scores take into account both the detection rate and classification performance. For most subdatasets, the performance of UTEN-IDS is better than that of the other methods.

In general, UTEN-IDS achieves the best results on most subdatasets. The result shows that UTEN-IDS has a strong generalization and detection ability against DoS, DDoS, and brute force attacks. However, the proposed method does not perform well in detecting DoS attacks-Hulk, DoS attacks-SlowHTTPTest, and DDoS attack-LOIC-UDP.

For SlowHTTPTest attacks and LOIC-UDP attacks, we notice that UTEN-IDS outperforms KitNET and AE, but it is not as good as AE-IDS. One reason is the redundant features of the subdatasets. We observe that AE-IDS uses the RF algorithm to select fewer than 20 features and achieves the best results, which indicates that many features can be removed. In contrast, UTEN-IDS uses more than 60 features, including many noisy features. These features have a negative effect on the result. On the other hand, we find from the validation set that the reconstruction errors of the two types of attacks are both large. We therefore take 0.001 as the “contamination” value of IF and run the experiment again. For both attacks, the accuracy of UTEN-IDS then reaches 99.98%, and its performance improves.

In detecting “DoS attacks-Hulk,” the F1 score of our method does not reach 20%. The reason is that the normal and attack samples of this subdataset have similar features. We calculate the mean RMSE values of the normal samples and the attack samples, respectively, and find that their difference is less than 0.01, which shows that these two kinds of samples are similar. Therefore, the IF algorithm cannot make accurate decisions based on the RMSEs, which leads to poor performance.

5.4.2. Performance Analysis on the IoT Dataset

To evaluate the performance of our method in the IoT environment, we compare UTEN-IDS with other detection methods using the extracted samples of MQTT-IOT-IDS2020.

Table 9 shows the detailed performance comparison of the four methods on the test set. The recall of UTEN-IDS (99.45%) is lower than the best record (100%) in detecting MQTT brute force attacks. However, UTEN-IDS achieves the best accuracy and F1 score for each class. The results indicate that UTEN-IDS copes effectively with brute force attacks in the IoT network.

To further interpret the effectiveness of UTEN-IDS, we conduct running time comparisons on the four detection methods. Table 10 shows the total running time of these approaches based on the extracted samples of MQTT-IOT-IDS2020. It can be seen that the time cost of UTEN-IDS is lower than that of AE-IDS and KitNET. Although AE takes the shortest time, the classification performance is not as good as UTEN-IDS.

Figure 4 shows the binary classification performance on the IoT dataset, measured by recall, F1 score, and AA. Both kinds of brute force attack samples are used to test the performance of the different detection methods. We notice that both the F1 score and the accuracy of UTEN-IDS are higher than those of the other methods. The comparison results suggest that the proposed method is superior to the other intrusion detection methods.

The results of the above experiments have demonstrated the superiority of UTEN-IDS. The proposed method has not only a robust detection performance but also a low time complexity.

6. Conclusion

The threats of IoT intrusion are increasing day by day. In our view, an IDS is an effective countermeasure. In this work, we proposed UTEN-IDS, a lightweight IDS based on unsupervised techniques, to tackle IoT security threats. It was divided into preprocessing, feature clustering, and anomaly detection phases. A variance filtering method was used to select features in the preprocessing stage. The feature clustering phase was used to obtain the feature subsets. The anomaly detection module of the proposed method was a two-level ensemble model. AEs were used in the first level, and IF was in the second level. The AEs reconstructed the input feature subsets and calculated the RMSEs. The IF performed the classification based on all the RMSEs.

Two public datasets, CSE-CIC-IDS 2018 and MQTT-IOT-IDS2020, were used to verify the performance of the proposed method. The results showed that our method is superior to the other methods. However, our approach did not perform well in detecting some attack types, such as “DoS attacks-Hulk.” In our future studies, we plan to optimize our method to improve the detection rate of attacks that are hard to classify.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (grant number 61906099) and the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (grant number KF-2019-04-065).