Network intrusion detection is one of the critical techniques to enhance cybersecurity. Several few-shot learning-based methods have recently been proposed to alleviate the dependence on large training samples in many supervised learning methods. However, it is still a challenge to achieve real-time higher-accuracy intrusion detection which is an essential requirement for high-speed network security. In this study, we propose a novel few-shot learning-based network intrusion detection method to address this challenge. Specifically, we improve the detection accuracy and real-time processing speed simultaneously in the metric procedure via two mechanisms: (i) we utilize a hard sample selection scheme as a refining stage of our triplet network model training to increase the detection accuracy; and (ii) we design a lightweight embedding network and parallelize the metric feature extraction process to achieve real-time analysis speed. To evaluate the proposed method, we construct few-shot learning-based datasets by using two real and heterogeneous network traffic intrusion detection data sources. Extensive results demonstrate that our method outperforms the state-of-the-art methods in terms of real-time performance and high detection accuracy of malicious samples.

1. Introduction

Network intrusion detection plays a key role in establishing secured and reliable networks [1]. Network Intrusion Detection System (NIDS), usually a binary classification model, can effectively distinguish between abnormal attacks and normal traffic, thus ensuring the stable operation of the network. Recently, to tackle the problem of insufficient training network attack data, several few-shot learning-based detection methods, including Siamese networks [2], prototypical networks [3], and the ensemble model [4], have been proposed to deal with limited abnormal samples during training. Compared to traditional methods, such as random forest [5], deep autoencoder [6], and ResNet [7], few-shot learning-based network intrusion detection methods do not rely on a large number of samples for training and have shown better performance in solving complex and frequent types of attacks [8, 9].

Some new approaches have been recently proposed to improve the classification performance of existing few-shot learning-based models. For example, He et al. [10] used an attention mechanism to enhance feature representations extracted from a metric learning network. Xu et al. [9] introduced 3D temporal convolutional networks to exploit temporal information to improve the accuracy of NIDS. However, these methods still suffer from the reliability issue of existing metric learning models when representative data samples are not effectively selected at the training stage. In addition, these advanced few-shot learning methods have paid little attention to detection efficiency even though real-time processing performance is essential to ensuring high-speed network security. Compared to classical deep neural network models, metric learning-based models have higher computational complexity as similarities between the testing and training samples in a supporting set are computed to extract the features for the final classification [11].

In our work, we propose a novel parallelized triplet network to improve the real-time detection performance. Specifically, the proposed method uses an initialized triplet model to select hard samples which cannot be identified in a standard triplet network to enhance the generalization ability of the model. To achieve real-time processing performance, we develop a lightweight triplet network by introducing a depthwise separable convolution algorithm and a global average pooling (GAP) mechanism to reduce the number of parameters and computational complexity. This new network architecture allows feature extraction from the triplet network to be undertaken in parallel. As such, different triplet sample pairs created from the same test sample during the detection process can be simultaneously input to the model for feature extraction so that the real-time performance can be improved. The contributions of this study are summarized as follows:
(i) We conceive a multistage few-shot training framework that utilizes an initialized triplet model to select and establish a set of representative training pairs for more effective triplet loss training.
(ii) We design a lightweight deep triplet network. The efficiency of the network is improved by using depthwise separable convolution and GAP to achieve fast feature extraction and high concurrency of multiple models by reducing computational complexity.
(iii) Furthermore, we develop a mechanism that parallelizes the feature extraction of the triplet model when calculating the similarities between the testing sample and samples in the supporting set so that the real-time processing performance is achieved.

The remainder of this study is organized as follows: Section 2 discusses the related work of the proposed method, and Section 3 describes the overall detection framework, including the underlying lightweight triplet network and the parallelized triplet metric model. Section 4 presents the datasets, evaluation metrics, and experiment results. Finally, the work is concluded in Section 5.

2.1. Methods of Network Intrusion Detection

Many existing research works have been developed to address the reliability of NIDS. These efforts are usually made on exploiting machine learning and deep learning algorithms. A fuzzy rule-based automatic intrusion detection system [12] was proposed as a solution to deal with precise measurement and uncertainty in the judgment of each criterion. Furthermore, fuzzy TOPSIS (technique for order of preference by similarity to ideal solution) was used for response prioritization in multicriteria decision-making. Iannucci and Abdelwahed [13] proposed a probabilistic model-based intrusion detection system built on a multiagent discrete-time Markov decision process (MA-MDP), which effectively captures the dynamics of both the defended system and the attacker. Wu et al. [14] proposed an intrusion detection method by using a convolutional neural network. This method converted the vector format of the original data into an image format and CNNs were applied to extract image features to detect intrusions.

A reliable intrusion detection system needs to consider both detection accuracy and efficiency. Ambusaidi et al. [15] proposed a mutual information-based algorithm that analytically selects the optimal feature for classification. This mutual information-based feature selection algorithm can handle linearly and nonlinearly dependent data features and irrelevant features from the original data. The work proposed by Selvakumar and Muneeswaran [16] deployed a filter and wrapper-based approach where the firefly algorithm was used in the wrapper to search for the best subset of features. The SwiftIDS approach proposed by Jin et al. [17] achieved the abovementioned objectives in two ways. One way was to simplify data preprocessing by using LightGBM's support for categorical features. The other way was to analyze the traffic data arriving in different time windows through a parallel intrusion detection mechanism. In these ways, the delay caused by later-arriving data waiting for the end of the intrusion detection cycle of the first-arriving data can be avoided. Because WSN devices lack powerful processing capability owing to power limitations, Zhao et al. [18] proposed a lightweight dynamic autoencoder network method for NID, which realizes efficient feature extraction through lightweight structure design. Furthermore, they proposed a novel NID method [19] for IoT based on a lightweight deep neural network. To prevent high-dimensional raw traffic features from leading to high model complexity, the method used the PCA algorithm to achieve feature dimensionality reduction. In addition, the classifier used the expansion and compression structure, the inverse residual structure, and the channel shuffle operation to achieve effective feature extraction with low computational cost.

While significant progress has been made, these existing state-of-the-art methods still face challenges. For example, simplifying preprocessing methods for data or dimensionality reduction of data can reduce unnecessary computations, but the computational complexity of the algorithms has not been explored much. Second, these methods do not pay much attention to the scarcity of attack samples in real networks.

2.2. Few-Shot Intrusion Detection

Few-shot learning models have been proposed to address tasks with a limited number of training samples [8]. For example, the prototypical networks [20] feature the same embedding function for the support and query sets, turning the classification problem into a nearest neighbour problem in the embedding space. The relational networks [21] use a neural network to calculate the distance between two samples and thus analyze their degree of matching. The matching networks [22] are characterized by two different embedding functions for the support and query sets, with the output of the classifier being a weighted sum of the predicted values between the support set samples and the query set samples. The Siamese networks [23] are trained by randomly combining samples into pairs as input to the twin structure and calculating the distance between the pairs to measure similarity. Yu and Bian [3] integrated a deep convolutional neural network into the metric learning network to calculate the Euclidean distances of different samples and thereby distinguish between normal traffic samples and attack traffic samples. Ouyang et al. [24] combined one-hot encoding and principal component analysis for data preprocessing and built a complete FS-IDS model by further training a preliminary IDS model. He et al. [25] proposed a few-shot detection method based on CNN and autoencoder. The method used some anomalous samples to build a structure for extracting deep features and selected the features of normal samples as training data. The method also introduced an attention mechanism to improve detection accuracy. Similarly, Zhou et al. [2] constructed a Siamese CNN coding network to measure the distance of input samples based on their optimized feature representations and proposed a robust cost function including three specific losses to improve the training efficiency.
As an advanced detection method, Xu et al. [9] further processed traffic data from spatial and temporal features. This method combined temporally neighbouring samples in the same connection into a spatial three-channel image and constructed Siamese networks to detect image-based intrusion events using Conv3D convolution operation.

Obviously, the deep learning algorithm still occupies a vital part of the few-shot learning method. Unfortunately, these few-shot intrusion detection methods pay little attention to the computational burden that the metric procedure brings to the final decision. Complex feature extraction networks and similarity metric mechanisms still restrict the real-time processing capability of these methods. In addition, as an important barrier to the network security operation, the abovementioned methods are still inadequate in terms of accuracy and need to be further improved.

2.3. Triplet Network

The triplet network, which is derived from Siamese networks, consists of three embedded networks with shared parameters [26]. It has been widely used in metric learning tasks [27–29]. Nguyen et al. [30] automatically learned motion patterns from small image blocks by training a triplet network for change detection based on motion features. Abdullah et al. [31] proposed a bidirectional triplet network to match text with remote-sensing images. The network consisted of a long short-term memory network (LSTM) and CNNs (based on EfficientNet-B2, ResNet-50, Inception-v3, and VGG16) and used an averaging fusion strategy to fuse features associated with five image sentences to achieve a more robust embedding. Ji et al. [32] proposed a dual triplet network for image zero-shot learning. The method projected semantic information into visual space by using a mapping network and then learned visual semantic mappings via two triplet networks. In this method, one triplet network focused on negative attribute features and the other triplet network focused on negative visual features to ensure that the data information is fully utilized. Gao et al. [33] proposed a new heterogeneous information network embedding algorithm. In the data sampling phase, a metaschema-based random walk was performed to extract semihard quadruplets based on the node type and its degree. In the representation learning phase, a relational triplet loss was designed to optimize the distance of triplet embedding on diverse heterogeneous relationships.

With the widespread use of triplet networks, some improved approaches focused on the unique data format required for algorithm training, as the random selection of triplet sample pairs could lead to uneven distribution of the data and unstable performance in the model training process. Schroff et al. [34] proposed a method to select triplet sample pairs by defining three types of samples, namely easy triplets, semihard triplets, and hard triplets, and pointed out that the generalization ability of the model will be limited if only easy triplets are used, while semihard and hard samples are more effective for training the network. Hermans et al. [35] further proposed an improved version of triplet sample pair generation with online-batch hard sample mining.

Compared with the abovementioned networks, the triplet network shows greater advantages but is still neither lightweight nor stable enough. A parallelized detection structure and a lightweight embedding network can cope with large-scale traffic closer to real time. Likewise, more stable prediction results and a more effective intrusion detection system can be obtained by further exploiting the potential features of the training data.

3. Proposed Method

Despite the significant advantages of the triplet network compared to other metric learning networks, it still suffers from two weaknesses: (1) similar to other metric learning methods, it has a high computational burden at the feature extraction stage, so it is not suitable for applications with real-time processing requirements; and (2) the convergence of the network is sensitive to the training sample pairs used in the triplet loss calculation. In this study, we propose a parallelized lightweight triplet network to achieve real-time intrusion detection performance. Further, we introduce a paired sample selection stage to extract more representative pairs for the network training. As a result, our method achieves high detection accuracy with a real-time processing speed.

3.1. The Architecture of the Few-Shot Learning-Based Intrusion Detection

Figure 1 depicts the architecture of our few-shot learning-based intrusion detection method, which is built on the designed lightweight parallelized triplet network. The architecture comprises four modules: preprocessing, initial training, refinement training, and detection.

The preprocessing module is to process the few-shot training set, the query set, the support set, and the test set and to create the corresponding input sample pairs according to the position relationships of anchor, positive, and negative, which are described in detail in Section 3.2. The training set is used to train the triplet model; the query set is used to verify the performance of the model in the training phase; the support set is used as a metric for refinement training and detection; and the test set is used to test the effect of the model.

The initial training module trains a preliminary detection model on the few-shot training set by means of a triplet network, which contains the embedding network and the triplet loss. The embedding network extracts feature vectors through a deep CNN with shared weights, which keeps the detection model lightweight while avoiding overfitting. The triplet loss handles the loss calculation in the training process: it calculates the distances between the feature vectors output from the embedding network, and the network weights are then updated accordingly.

The refinement training module further improves the generalization ability of the model by selecting more representative and hard pairs for training. The initial model is used to measure the samples in the training set, and the hard samples are chosen as valuable samples according to a certain ratio. These samples are used as training samples to obtain the refined detection model.

The detection module is to obtain the labels of the testing samples by using the similarity measurements of the testing sample with samples in the supporting set. A parallel process structure is designed to measure the similarities for real-time performance.

In summary, the method obtains a more accurate model through double training in the training phase and achieves the detection of test samples in the detection phase through the parallel deployment of multiple models. In terms of deployment, the method is a concept of multiple triplet networks combined to provide a parallel network structure to enable feature extraction and real-time detection from different traffic samples. First, in the training phase, the network relies on a small number of samples to obtain an initial detection model and then refines the training to select more representative samples to improve the model’s ability to detect malicious traffic. Second, in the detection phase, improved and refined detection models are distributed to edge computing devices for parallelized deployment. This deployment strategy enables real-time detection of online traffic through a lightweight detection model and responds to anomalous traffic for future model refinement.

3.2. The Lightweight Enhanced Triplet Network for Reducing Computational Complexity

The proposed triplet network consists of three embedding networks with shared weights and takes three inputs: an anchor sample, a positive sample, and a negative sample. The parameters of the embedding network are updated by using the backpropagation algorithm. Specifically, during the training process, the triplet network randomly selects a sample from the training set as the anchor. Then, a sample of the same type as the anchor is randomly selected as the positive and a sample of a different type as the negative, forming a triplet sample pair (anchor, positive, and negative) as the input to the embedding network. Finally, the Euclidean distances of positive sample pairs (anchor and positive) and negative sample pairs (anchor and negative) are calculated separately based on the feature vectors output from the embedding network to further calculate the loss. As shown in Figure 2, triplet samples are input to the embedding network. The embedding network obtains the sample features by stacking multiple layers of convolution and pooling operations. The purpose of training is to make the distance between the anchor and the positive as small as possible and the distance between the anchor and the negative as large as possible, i.e., to update the network weights so that the anchor moves closer to the positive and away from the negative. Therefore, the triplet loss is calculated as shown in formula (1), where $d_{+}$ and $d_{-}$ are the distances of the (anchor and positive) feature vectors and the (anchor and negative) feature vectors, respectively. The specific calculation of these distances is shown in equations (2) and (3).
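The sampling and loss steps described in this paragraph can be sketched as follows. The exact loss formula is not reproduced in this text, so the sketch assumes the widely used margin-based form $\max(d_{+} - d_{-} + m, 0)$; the margin value and the helper names `sample_triplet` and `triplet_margin_loss` are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sample_triplet(samples_by_label, rng):
    """Randomly pick (anchor, positive, negative).

    The positive shares the anchor's label; the negative comes from a
    different label. Assumes at least two labels and at least two
    samples under the anchor's label.
    """
    labels = sorted(samples_by_label)
    anchor_label = rng.choice(labels)
    negative_label = rng.choice([l for l in labels if l != anchor_label])
    anchor, positive = rng.sample(samples_by_label[anchor_label], 2)
    negative = rng.choice(samples_by_label[negative_label])
    return anchor, positive, negative


def triplet_margin_loss(f_anchor, f_positive, f_negative, margin=1.0):
    """Margin-based triplet loss on embedded feature vectors (assumed form)."""
    d_pos = euclidean(f_anchor, f_positive)  # anchor-positive distance
    d_neg = euclidean(f_anchor, f_negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is the behaviour the training objective above asks for.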

The ability to address large-scale traffic is a very important metric for intrusion detection methods, which is closely related to the computational and parametric quantities of the detection model. Therefore, we used depthwise separable convolution [36] and GAP [37] to implement a lightweight deep embedding network.

Typically, CNNs need to define multiple filters to enrich the underlying features, which allows one-dimensional traffic samples to evolve into multichannel feature maps after convolution. Unlike the conventional convolution operation, depthwise separable convolution contains two parts: depthwise convolution (dw-conv) and pointwise convolution (pw-conv). Each filter of dw-conv is responsible for convolving one channel. The pw-conv operation is equivalent to a weighted fusion of multiple feature maps from the previous layer. As shown in Figure 3, dw-conv initializes three convolutional kernels for extracting three different feature maps, respectively. Pw-conv fuses the three 1D feature maps into a new feature map through a 1 × 3 filter, although this fusion inevitably loses some features. However, as shown in Figure 2, we add this operation before the underlying features enter the higher-level convolution, achieving a reduction in computational complexity while barely affecting the effectiveness of feature extraction.
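To make the saving concrete, the following sketch counts the trainable parameters of a standard 1D convolution and of its depthwise separable counterpart. The layer sizes used in the usage note are hypothetical, and the single bias vector on the output channels is an assumption about the layer layout.

```python
def conv1d_params(kernel_size, in_channels, out_channels, bias=True):
    """Parameters of a standard 1D convolution layer."""
    weights = kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def separable_conv1d_params(kernel_size, in_channels, out_channels, bias=True):
    """Parameters of a depthwise separable 1D convolution layer."""
    depthwise = kernel_size * in_channels   # one filter per input channel
    pointwise = in_channels * out_channels  # width-1 filters that fuse channels
    return depthwise + pointwise + (out_channels if bias else 0)
```

For a hypothetical layer with kernel size 3, 32 input channels, and 64 output channels, `conv1d_params(3, 32, 64)` gives 6,208 parameters, whereas `separable_conv1d_params(3, 32, 64)` gives 2,208.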

In addition, the embedding network inputs the one-dimensional vectorization (Flatten) of the feature map output from the last convolutional block to the fully connected layer. GAP was proposed to replace fully connected layers to average the output feature maps of the last convolutional layer for compact and efficient feature representations. As shown in Figure 2, the output of the last convolutional layer of each embedding is made as the input of the GAP, and the output feature vectors are aggregated through a concatenate layer after being activated. With the abovementioned two improvements, the number of parameters and computation of the embedded network are greatly reduced. The architecture avoids overfitting issues to improve the reliability of our model.
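As an illustration of why GAP is compact, the sketch below averages each channel of a 1D feature map and, for comparison, counts the parameters that a Flatten plus fully connected head would need. The function names and the sizes in the usage note are hypothetical.

```python
def global_average_pooling_1d(feature_map):
    """Average each channel over its positions.

    feature_map is a list of channels, each a list of activations.
    GAP itself has no trainable parameters.
    """
    return [sum(channel) / len(channel) for channel in feature_map]


def flatten_dense_params(length, channels, units):
    """Parameters of Flatten followed by a fully connected layer."""
    return length * channels * units + units  # weights plus biases
```

With a hypothetical final feature map of length 16 and 64 channels, a 128-unit fully connected layer would add `flatten_dense_params(16, 64, 128)` = 131,200 parameters, while GAP produces a 64-dimensional vector without adding any.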

3.3. Parallelized Triplet Model
3.3.1. Parallelized Triplet Model for Refinement Training

The effectiveness of the model is closely related to the quality of the training samples. Compared to easy samples, hard sample pairs provide more valuable distinctive power in the training of the network. By using these valuable samples, the generalization ability of the model can be improved.

However, the valuable samples are difficult to select due to the large number of combinations. As shown in Figure 4, during the training process, the similarity between samples can be obtained from the (anchor, positive) pairs, and the dissimilarity between samples can be obtained from the (anchor, negative) pairs. As described in Section 3.2, the resulting model is trained to maximize distances between different types of input sample pairs and minimize distances between the same types of pairs. In our work, we select a certain proportion of positive and negative samples as sample pairs according to the distance. The threshold for valuable sample selection is given by formula (4).

When the distance of a sample pair is higher than a defined threshold on distance, we select this pair into the training set for the refinement stage. The σ in formula (4) as a hyperparameter represents the percentage of the valuable samples to be selected in terms of distance, which we usually set to σ = 25%. After selecting the hard samples, we use them to construct (anchor, positive, and negative) sample pairs by random selection to add to the initial training set to obtain the final detection model.
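A minimal sketch of this selection step follows. Because the threshold formula is not reproduced in this text, the sketch assumes that σ is applied to the range of observed distances, so that pairs falling in the top σ fraction of the distance range are kept; this interpretation and the function names are assumptions.

```python
def distance_threshold(distances, sigma=0.25):
    """Threshold that cuts off the top sigma fraction of the distance range."""
    d_min, d_max = min(distances), max(distances)
    return d_max - sigma * (d_max - d_min)


def select_hard_pairs(pairs, distances, sigma=0.25):
    """Keep the pairs whose distance exceeds the threshold."""
    threshold = distance_threshold(distances, sigma)
    return [pair for pair, d in zip(pairs, distances) if d > threshold]
```

Under this reading, the selected pairs cover 25% of the distance range but may be only a small share of all pairs, since few pairs typically lie at the far end of the distribution.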

3.3.2. Parallelized Triplet Model for Detection

Because the model obtained by the triplet network is more suitable for unsupervised classification tasks and cannot directly give the labels of the tested samples, we build a few-shot intrusion detection framework that can perform metric learning through a parallel structure. The training process of the model contains a measure of the (anchor and positive) part, which can effectively capture the intraclass similarity. In the detection phase, different models with the same weights are able to simultaneously obtain the distance between the test samples and the positive and negative samples as a similarity. The average similarities are then compared to determine the label of the test sample.

In addition, an extremely important constraint of few-shot learning in intrusion detection is the ability to handle large-scale traffic data. Since the determination of the label in the detection process relies on the similarities between the test sample and samples in the support set, it is a computationally intensive process to calculate the similarities across all pairs. Therefore, we build a parallel intrusion detection mechanism using the metric model obtained from the training to reduce the computational complexity. As shown in Figure 5, the parallel triplet network model can simultaneously process multiple input sample pairs established by the test and the support samples to obtain the similarity values simultaneously as a feature representation for the detection. Specifically, a single detection model can extract features on two input samples at a time to obtain the Euclidean distances between the test sample and different types of support samples. Thus, parallelized deployment of multiple models allows simultaneous comparison of a given test sample with different support samples, obtaining the distance of the test sample from the positive sample set as well as the negative sample set and then, by comparing these distances, obtaining the labels of the test samples. The lightweight model can achieve the high concurrency required for parallelism, meets the limitations of storage and computational units on parallel processing, and enables fast detection of large-scale traffic.
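The detection step can be sketched as below: each (test, support) pair is measured concurrently, and the label follows from comparing the mean distances to the normal and attack support samples. The `embed` function is a placeholder for the trained embedding network, and the thread pool merely stands in for the parallel deployment of several model copies; both are assumptions for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def embed(sample):
    """Placeholder for the trained embedding network (identity here)."""
    return list(sample)


def pair_distance(pair):
    """Distance between the embedded test sample and one support sample."""
    test_sample, support_sample = pair
    return euclidean(embed(test_sample), embed(support_sample))


def detect(test_sample, normal_support, attack_support, workers=4):
    """Label a test sample by its mean distance to each support class."""
    pairs = [(test_sample, s) for s in normal_support + attack_support]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the first len(normal_support)
        # distances belong to the normal support samples.
        distances = list(pool.map(pair_distance, pairs))
    n = len(normal_support)
    mean_normal = sum(distances[:n]) / n
    mean_attack = sum(distances[n:]) / len(attack_support)
    return "attack" if mean_attack < mean_normal else "normal"
```

The sketch shows the structure of the comparison rather than its speed: pure-Python threads do not run the arithmetic in parallel, whereas the deployment described above runs multiple copies of the lightweight model side by side.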

4. Experiments, Evaluation, and Discussion

In this section, we conduct experiments on two benchmark datasets to verify the advantages of the proposed method on few-shot intrusion detection. We first introduce the datasets and the experimental environment, then we show the advantages of the method by comparing it with other algorithms, and finally we choose the state-of-the-art few-shot intrusion detection method to compare with the proposed method under the same experimental conditions. Through the experiments, we can obtain answers to the following questions:
(i) Does our proposed method have advantages in few-shot scenarios compared to existing general intrusion detection methods?
(ii) Does our proposed method improve detection accuracy compared to advanced few-shot intrusion detection methods?
(iii) How does our proposed method perform in terms of detection efficiency compared to advanced few-shot intrusion detection methods?

4.1. Experiment Setup
4.1.1. Description of the Benchmark Datasets

To demonstrate the effectiveness of the proposed approach, we conduct experiments on two publicly available and heterogeneous intrusion detection datasets.

CICIDS-2017 dataset [38]: the dataset was released by the Canadian Institute for Cybersecurity (CIC) in 2017 and contains normal samples and 14 of the latest common attacks such as distributed denial of service, SQL injection, and port scanning. Specifically, the dataset contains 2,830,743 samples collected continuously from Monday to Friday. Each sample includes 78 different features. We selected 220 of them as the few-shot training set and validated the detection effect on a test set of 20,710 samples. The label distribution of the few-shot dataset is shown in Table 1.

In the training set, the relationship between the number of normal samples and abnormal samples is 1 : 1, and the number of samples available for training is 10 for each abnormality type, which satisfies the requirement of few-shot learning. The test set contains 10,000 normal samples and 10,710 abnormal samples, which can fully examine the detection accuracy of the method. Among them, the number of Heartbleed, Infiltration and Web Attack-Sql Injection samples is relatively few. As most advanced NIDS classifiers based on deep learning are less sensitive to unknown attacks, and traffic samples for emerging forms of attacks such as “zero-day” attacks are difficult to obtain, it is important to identify unknown attacks during the detection process [8]. Therefore, we simulate these three types of samples as unknown attacks and include them in the test set only.

UNSW_NB15 dataset [39]: this dataset is the latest intrusion detection dataset created by the Australian Centre for Cyber Security (ACCS). It was generated from approximately 2,540,047 samples of network traffic through the IXIA PerfectStorm. Because the original dataset is too large, the agency also released a dataset containing 82,332 samples for evaluation, and our experiments were conducted on this dataset. The dataset covers 9 different attack types such as Backdoor, Denial of Service, Reconnaissance, Shellcode, and Worms. Each of these samples contains 49 features. Because this dataset has fewer attack types, we used 140 of these samples as a few-shot training set and 16,662 of them as a test set.

Similarly, the ratio of normal to abnormal samples in the few-shot training set created on this dataset is 1 : 1, and the number of samples available for training in each abnormality type is 10. As shown in Table 2, 10,000 normal and 6,662 abnormal samples are included in the test set, and unknown attacks are simulated using Worms and Shellcode types which are not included in the training set.
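The construction of the few-shot training sets described above can be sketched as follows. The record layout, the `"BENIGN"` label for normal traffic, and the function name are assumptions made for illustration.

```python
import random


def build_few_shot_training_set(records, unknown_types, shots=10,
                                normal_label="BENIGN", seed=0):
    """Draw `shots` samples per known attack type plus as many normal samples.

    records is a list of (features, label) tuples. Attack types listed in
    unknown_types are held out of training to simulate unknown attacks.
    """
    rng = random.Random(seed)
    by_label = {}
    for features, label in records:
        by_label.setdefault(label, []).append((features, label))
    known_attacks = [label for label in sorted(by_label)
                     if label != normal_label and label not in unknown_types]
    training = []
    for label in known_attacks:
        training.extend(rng.sample(by_label[label], shots))
    # Keep the normal-to-abnormal ratio at 1 : 1.
    training.extend(rng.sample(by_label[normal_label],
                               shots * len(known_attacks)))
    return training
```

The sizes agree with the figures above: CICIDS-2017 has 14 attack types with 3 held out, giving 11 × 10 = 110 abnormal and 110 normal samples (220 in total), and UNSW_NB15 has 9 attack types with 2 held out, giving 7 × 10 = 70 abnormal and 70 normal samples (140 in total).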

4.1.2. Implementation and Metrics

The few-shot intrusion detection in the study is a process of binary classification of traffic samples, and the detection results are classified into four types as follows:
(i) TP: attack samples are correctly detected as attack samples
(ii) FN: attack samples are incorrectly classified as normal samples
(iii) TN: normal samples are correctly detected as normal samples
(iv) FP: normal samples are incorrectly classified as attack samples

We use precision, recall, and F-measure to validate the detection performance of the method in the study. The precision is the proportion of truly positive samples among the samples that are judged to be positive and can be expressed by the formula $\mathrm{Precision} = \frac{TP}{TP + FP}$. The recall is the proportion of samples that are judged to be positive among all samples that are truly positive and can be expressed by the formula $\mathrm{Recall} = \frac{TP}{TP + FN}$. The F-measure is the harmonic mean of precision and recall, and the formula is $F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$.
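These three metrics follow directly from the confusion counts defined above. The sketch below uses the standard definitions; the counts in the usage note are made up for illustration and are not results from the experiments.

```python
def precision(tp, fp):
    """Share of samples flagged as attacks that really are attacks."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of real attacks that were flagged as attacks."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

For example, with hypothetical counts TP = 90, FP = 10, and FN = 30, the precision is 0.9, the recall is 0.75, and the F-measure is about 0.818.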

In addition, we used the number of parameters, floating-point operations (FLOPs), and detection time to measure the operational performance of the method. The number of parameters in the embedding network is related to the size of the kernels and the number of input and output channels and can measure the usage of computational resources such as memory during model training and detection. FLOPs are the number of multiplication and addition operations in the model and are used to measure the computational complexity of the model. Since some factors in the detection process are hard to measure, their effect can be judged by the actual time consumption. Therefore, the intrusion detection time for large-scale samples is also a very important evaluation metric.
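As a rough illustration of how FLOPs relate to layer shape, the sketch below counts the multiplications and additions of a 1D convolution and of its depthwise separable counterpart. Counting each multiply and each add as one operation, ignoring biases, and the layer sizes in the usage note are all assumptions.

```python
def conv1d_flops(kernel_size, in_channels, out_channels, out_length):
    """Multiply and add operations of a standard 1D convolution."""
    # Each output value needs kernel_size * in_channels multiply-add pairs.
    return 2 * kernel_size * in_channels * out_channels * out_length


def separable_conv1d_flops(kernel_size, in_channels, out_channels, out_length):
    """Multiply and add operations of a depthwise separable 1D convolution."""
    depthwise = 2 * kernel_size * in_channels * out_length
    pointwise = 2 * in_channels * out_channels * out_length
    return depthwise + pointwise
```

For a hypothetical layer with kernel size 3, 32 input channels, 64 output channels, and an output length of 78 (the number of CICIDS-2017 features, assuming the length is preserved), the standard convolution needs 958,464 operations and the separable one 334,464, a ratio of about 0.35, which equals 1/64 + 1/3.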

To complete the testing of the method, the training experiments were performed on a machine with an Intel Xeon E5-2620 CPU, an NVIDIA GTX 1080 Ti GPU, 64 GB of RAM, and 11 GB of video memory, using CuDNN 7.6.5, CUDA 11.0, TensorFlow 1.13.1, and Keras 2.2.4.

4.2. Refinement Training Results

To measure how much refinement training improves detection, we conducted experiments on a few-shot training set. After obtaining the initial model, we set the distance proportion of the valuable samples to 25%. The distribution of the valuable samples obtained after refinement training is shown in Figure 6.

Although valuable samples occupy 25% of the distribution distance, they account for only a very small proportion of the number of samples. To better fit these samples, we added the valuable samples to the training set in the post-training phase and performed refinement training. The improvement in detection accuracy of the refinement training model over the initial model is shown in Table 3.
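The gap between a 25% share of the distance range and a small share of the sample count can be illustrated with a short sketch. It assumes that "distance proportion" means the outermost 25% of the range of embedding distances observed under the initial model, which is an interpretation and not a definition confirmed by the text. The function name `select_valuable` and the distance values are hypothetical.

```python
import numpy as np


def select_valuable(distances, proportion=0.25):
    """Indices of samples whose distance lies in the top `proportion` of the range."""
    d = np.asarray(distances, dtype=float)
    d_min, d_max = d.min(), d.max()
    # Threshold is placed `proportion` of the way down from the largest distance.
    threshold = d_max - proportion * (d_max - d_min)
    return np.flatnonzero(d >= threshold)


# Hypothetical embedding distances: most samples are tightly clustered,
# one sample lies far away from the rest.
distances = np.array([0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30, 0.35, 1.00])
idx = select_valuable(distances)
print(idx, len(idx) / len(distances))
```

In this toy example the top 25% of the distance range contains one sample out of ten, so the selected set is small even though it spans a quarter of the distance axis.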

On both datasets, the models obtained after refinement training showed a significant improvement in detection accuracy. On the CICIDS-2017 dataset, the model obtained after refinement training improves precision by nearly 3 percentage points and keeps its lead in the overall evaluation metric F-measure despite a slight decrease in recall. On the UNSW_NB15 dataset, the refinement-trained model improved significantly in all three evaluation criteria: precision, recall, and F-measure. The number of misclassified normal samples decreased from 719 to 329, while the number of correctly detected abnormal samples increased by 131. Therefore, refinement training under the parallel model structure significantly improves the few-shot intrusion detection method.

4.3. Comparison and Analysis with Other Classifiers in Few-Shot Scenarios
4.3.1. Overall Performance

The merit of an intrusion detection system is evaluated based on its ability to correctly classify traffic samples. To evaluate the classification performance of our proposed method on a few-shot training set, we compare it with three widely used classifiers: the deep learning algorithms CNN [40] and ResNet [41], and the machine learning algorithm XGBoost [7]. With sufficient training data, all three algorithms can achieve high classification accuracy. However, as shown in Table 4, when samples are scarce, both the deep learning algorithms and the machine learning algorithm produce many misclassifications for both normal and abnormal samples. With so few training samples, these algorithms are prone to overfitting. The few-shot learning method is less constrained by the number of samples: the final result is determined by a metric computed against the support set of samples after feature extraction, which yields fewer false positives for normal samples than the other algorithms. As a result, the triplet network-based classifier achieved the best performance on all the evaluation metrics.
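The idea of deciding a label by a metric against the support set can be sketched as a nearest-neighbor lookup in embedding space. The sketch assumes Euclidean distance and a nearest-support decision rule; the exact metric and decision rule of the proposed method are not restated here, and the function name `classify_by_support` and the 2D embeddings are hypothetical.

```python
import numpy as np


def classify_by_support(query_emb, support_emb, support_labels):
    """Assign each query the label of its nearest support embedding."""
    q = np.asarray(query_emb, dtype=float)
    s = np.asarray(support_emb, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_support).
    dists = np.linalg.norm(q[:, None, :] - s[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    return np.asarray(support_labels)[nearest]


# Hypothetical embeddings: label 0 = normal, label 1 = attack.
support_emb = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]]
support_labels = [0, 0, 1, 1]
query_emb = [[0.05, 0.0], [1.0, 0.9]]
pred = classify_by_support(query_emb, support_emb, support_labels)
print(pred)
```

Because the decision depends only on distances to a handful of labeled support samples, no classifier head has to be fitted on the scarce training data.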

Figure 7 shows the detection performance of the four classifiers in terms of precision, recall, and the comprehensive evaluation metric F-measure. The proposed method successfully detects more than 10,000 samples while learning from only a small number of samples. On the CICIDS-2017 dataset, the detection rate of the method for attack traffic is 96.18%, which is 1 percentage point higher than that of CNN. Although this margin over CNN is small, the proposed method is better overall when recall and F-measure are also taken into account. CNN shows more false positives in the detection of normal samples. In addition, the proposed method has an all-round advantage over the other three classification models on the UNSW_NB15 dataset. The structure of the triplet metric model makes it possible to achieve a detection rate of 97.19% for abnormal samples, which is more than 10 percentage points ahead of deep learning algorithms such as CNN and ResNet. These results show that the method in the study is capable of the detection task even when samples are extremely scarce.

4.3.2. Comparison of Detection Results under Different Attack Types

Table 5 shows the detection results of different classifiers for 14 attack types on the CICIDS-2017 dataset. The first 11 attack types appear in the training set. The last 3 attack types are not trained and are used to simulate unknown attacks. The XGBoost algorithm is clearly better at detecting attack samples than the deep learning algorithms. This indicates that general deep learning algorithms are not suitable for training when samples are scarce: they are highly prone to overfitting, which affects the classification accuracy. However, the detection rate of the XGBoost algorithm is not stable. For example, it is extremely poor for known attacks such as FTP-Patator and Port Scan, as well as for unknown attacks such as Heartbleed and Infiltration. The proposed method outperforms the other three algorithms in the detection of multiple attack types. Although the advantage is not very significant, detection is stable across all known attack types. In addition, for the detection of unknown attacks, our proposed method still achieves better detection accuracy by exploiting the capability of metric learning.

In contrast, the experimental results on the UNSW_NB15 dataset fully demonstrate the advantages of the method in the study. The proposed method leads on seven out of the nine attack types shown in Table 6. The detection results of XGBoost outperform those of the two deep learning algorithms. However, combined with the recall shown in Figure 7, the advantage of XGBoost comes at the cost of a large number of false positives for normal samples.

4.4. Comparison and Analysis with the State-of-the-Art Methods
4.4.1. Comparison of Detection Accuracy

To demonstrate the detection performance and efficiency of the proposed method, we compare it with the existing state-of-the-art method, FC-Net [9]. To ensure reliable experimental results, the algorithm model in the FC-Net method is built as described in Section 4, paragraph B of the original study [9], and the consistency of the dataset is maintained.

As shown in Table 7, compared to FC-Net, the proposed method maintains the leading position on different datasets. For the detection of attack samples, the few-shot learning method built on the triplet network reaches 96.18% and 97.23% on the two datasets, respectively, exceeding the few-shot learning method built on the Siamese network structure. For the detection of normal samples, FC-Net has a lower false alarm rate than our method on the CICIDS-2017 dataset but a higher false alarm rate on the other dataset, so this cannot be regarded as a consistent advantage of FC-Net. Therefore, on the whole, our algorithm surpasses the advanced method FC-Net.

4.4.2. Comparison of Operational Performance

Intrusion detection requires real-time performance, which means that it must cope effectively with large-scale network traffic. Constrained by its embedding network and detection procedure, FC-Net performs poorly in real-time detection. As shown in Table 8, the 3D-ConV model built by FC-Net has 806,273 parameters, which means that the detection model consumes substantial memory at run time and has difficulty handling large-scale data simultaneously. In terms of FLOPs, the computational complexity of FC-Net is also enormous. In contrast, our method requires only 0.1% of the computation and number of parameters of the FC-Net method. Moreover, in terms of detection time on the different test sets, the proposed method completes the classification task for 20,710 and 16,662 test samples in 1.8 s and 1.5 s, respectively, on a dual detection framework with higher scalability. Thus, our approach is able to handle large-scale network traffic in real time.
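The reported detection times can be converted into throughput with simple arithmetic. The sketch below uses only the sample counts and times quoted above; the function name `throughput` is hypothetical, and the figures are derived values, not additional measurements.

```python
def throughput(n_samples: int, seconds: float) -> float:
    """Samples classified per second."""
    return n_samples / seconds


# Sample counts and detection times as reported for the two test sets.
reported = {
    "20,710-sample test set": (20710, 1.8),
    "16,662-sample test set": (16662, 1.5),
}

for name, (n, t) in reported.items():
    per_sample_ms = 1e3 * t / n
    print(f"{name}: {throughput(n, t):,.0f} samples/s, {per_sample_ms:.3f} ms/sample")
```

Both test sets work out to roughly 11,000 samples per second, or about 0.09 ms per sample, on the hardware described in the experimental setup.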

5. Conclusions

In this study, we proposed a few-shot intrusion detection framework based on a triplet network. The method implements staged learning and parallelized detection through the proposed lightweight model with high concurrency. Experimental results demonstrate the significant advantages of the method in terms of detection performance and efficiency when only a limited number of samples is available. High availability is a constant topic in the field of intrusion detection. In future research, we will further explore the application of a meta-learning framework in real-time intrusion detection.

Data Availability

The datasets used in the article are the two public datasets, CICIDS-2017 and UNSW_NB15. They can be downloaded from the following website links: https://www.unb.ca/cic/datasets/ids-2017.html and https://research.unsw.edu.au/projects/unsw-nb15-dataset.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was supported by the Youth Fund Project of the National Nature Fund of China under grant no. 62002038.