An Efficient Intrusion Detection Method Based on Federated Transfer Learning and an Extreme Learning Machine with Privacy Preservation
Network security is becoming increasingly important, and intrusion detection is an effective method for protecting networks from malicious attacks. This study proposes FLTrELM, an intrusion detection algorithm based on federated transfer learning and an extreme learning machine. It implements data aggregation through federated learning and facilitates the construction of a personalized transfer learning model for each organization. FLTrELM first builds a transfer extreme learning machine model to address insufficient samples and probability adaptation, then trains this model under the federated learning mechanism, which protects data privacy because training data are never shared, and finally obtains an intrusion detection model. Experiments on the NSL-KDD, KDD99, and ISCX2012 datasets verify that the proposed method achieves better detection results and robust performance, especially for small samples and new intrusions, while protecting data privacy.
Computer networks and the Internet are fundamental components of our society, contributing greatly to the economy and shaping people’s work and lifestyle. The network has become the foundation of society and modern life, and it also stores a large amount of data related to people’s private information and national security. At the same time, with the rapid development of network technology, attacks on networks are increasing at an alarming rate. A network intrusion or attack disrupts normal activities, threatens national security, and weakens the security of personal information. Therefore, network security has become increasingly important, and cybersecurity has become the focus of a growing number of people [2–4]. As an emerging security defense technology, the intrusion detection system (IDS) [5, 6] improves the reliability and security of a system by detecting and responding to various malicious behaviors while actively protecting the network from illegal external attacks. An IDS classifies network access as normal or intrusive based on rules or models: it examines network traffic data to identify harmful activities and issues alerts when such activities are detected. IDSs have proven to be efficient and powerful and have become an important technical means of protecting cyberspace against network attacks and intrusions.
In recent years, with the rapid development of machine learning, deep learning and artificial intelligence have been used for tasks such as pedestrian detection, and their application in intrusion detection has become a commonly studied research topic in the field of network security. Sumaiya Thaseen and Aswani Kumar proposed an intrusion detection model based on chi-square feature selection and a multiclass support vector machine (SVM). The model uses parameter tuning to optimize the kernel parameters of the radial basis function. The main idea is to construct a multiclass SVM, which had not previously been used for intrusion detection, to reduce training and testing time and improve the classification accuracy for individual network attacks. Ikram et al. integrated different deep neural network (DNN) models, such as the multilayer perceptron (MLP), backpropagation network (BPN), and long short-term memory (LSTM), to establish a robust anomaly detection model. The model uses XGBoost to integrate the results of each deep learning model to achieve higher accuracy. Another study proposed a model update scheme based on transfer learning, depending on whether the Internet of Vehicles (IoV) cloud can provide a small amount of labeled data for new attacks in time; vehicles can complete the update without obtaining any labeled data through the IoV cloud. The reported experimental results show that, compared with existing methods, the detection accuracy of that scheme is improved by at least 23%. Xu et al. designed a privacy-preserving multisource transfer learning intrusion detection system. The system first uses Paillier homomorphic encryption to encrypt the models that are trained on different source domains and uploaded to the cloud, and then, building on this privacy protection scheme, performs multisource transfer learning intrusion detection based on encrypted XGBoost (e-XGBoost). Cheng et al. used federated learning (FL) to protect privacy through local model training and used transfer learning (TL) to improve training efficiency through knowledge transfer.
Compared with other commonly used machine learning algorithms, such as SVM, DBN, and CNN, the extreme learning machine (ELM) has the advantages of better performance, faster learning speed, and minimal human intervention, and it has quickly gained the attention of researchers. ELM has been applied to network intrusion detection systems, effectively improving their detection performance. Cheng et al. proposed a basic ELM method based on random features and a kernel-based ELM classification method, both of which are superior to the commonly used SVM method in training and testing speed and in detection accuracy. Singh et al. proposed an intrusion detection technique based on an online sequential extreme learning machine (OS-ELM), which uses alpha analysis to reduce time complexity and discards irrelevant features through feature selection based on filtering, correlation, and consistency. Wang et al. proposed a kernel-based extreme learning machine (KELM) with supervised learning capabilities to replace the BP algorithm in DBN and shorten the training cycle. Wang et al. applied an equality-constrained optimization ELM (C-ELM) to network intrusion detection, proposed an adaptive incremental learning strategy to derive the optimal number of hidden neurons, and developed optimization criteria and a method that adaptively adds hidden neurons through a binary search.
Recently, federated learning has become one of the most promising directions for the future development of machine learning. The purpose of federated learning is to conduct collaborative training without sharing private data. It does not need to gather the data required for model training in one place for centralized computation; instead, it transmits encrypted gradient-related data and uses multisource data to collaboratively train the same model. Its emergence allows traditional machine learning models to achieve better training results while ensuring data security and privacy, with the advantages of distributed collaboration, good scalability, strong privacy protection, and low cost. Federated learning has even been described as the last mile of artificial intelligence.
In this study, benefiting from the advantages of federated learning, a framework FLTrELM based on federated transfer learning and extreme learning is proposed to solve the problems of data islands, scarce labeled samples, data privacy protection, and personalization in intrusion detection. FLTrELM uses federated learning  and homomorphic encryption  to build a powerful extreme learning machine model by aggregating data from independent institutions while protecting data privacy. After the model is built, FLTrELM reuses the transfer learning method to achieve personalized model learning for each organization, solves the lack of label samples and data distribution mismatch, and effectively improves the intrusion detection effect.
The algorithms proposed in [10, 11] do not use transfer learning and therefore cannot use related data to help the target learning task build an intrusion detection classifier. References [12, 13] are intrusion detection models based on transfer learning, but the application scenario of the former is a vehicle network, and neither can solve the problem of data islands. One existing approach does combine federated learning and transfer learning, but it uses convolutional neural networks and reinforcement learning, so its training efficiency is reduced and the data of unselected clients are wasted. Its final model is also a global model rather than a personalized model for each organization. Compared with these methods, the FLTrELM algorithm proposed in this study has higher training efficiency, makes full use of the data of each organization, effectively addresses data islands, and obtains a personalized learning model for each organization.
Our contributions are highlighted as follows:
(1) To the best of our knowledge, we are the first to apply federated transfer learning and ELM to intrusion detection, and we propose FLTrELM. It aggregates intrusion detection data from different organizations without sacrificing data privacy and security and simultaneously obtains, through knowledge transfer, a strong learning model that captures the individualized behaviors of each organization.
(2) The experimental results show that, compared with traditional machine learning methods, FLTrELM substantially improves the accuracy of intrusion detection, especially for small samples and new intrusions, while still yielding a personalized model for each institution through knowledge transfer.
The rest of the study is arranged as follows. Section 2 reviews related work on federated transfer learning and the extreme learning machine; Section 3 proposes an intrusion detection algorithm based on federated transfer learning and an extreme learning machine; Section 4 verifies the effectiveness of the algorithm on the NSL-KDD, KDD99, and ISCX2012 datasets; Section 5 summarizes the main work of this study.
2. Related Works
2.1. Federated Transfer Learning
Federated learning was first proposed by Konečný et al. in 2016 and was used by Google to train machine learning models on mobile phones distributed around the world. Unlike traditional machine learning, which requires large amounts of high-quality data from various institutions to be collected on a cloud server for centralized training, federated learning allows each user to train the model on a local machine and upload the encrypted model to the server for aggregation. A global learning model is then obtained through multiple iterations. This learning method protects users’ privacy and avoids the uncontrollable data flows and sensitive data leaks that data aggregation can cause. The process of federated learning is shown in Figure 1.
Figure 1 shows that the learning process of federated learning is as follows:
(1) Each organization downloads the global model $m_t$ from the central server.
(2) Organization $k$ trains on its local data to obtain the local model $m_{t,k}$ (the local model updated in the $t$th iteration by the $k$th organization).
(3) Each organization uploads its locally updated model to the central server.
(4) After receiving the model of each organization, the central server performs a weighted aggregation to obtain the updated global model (the global model update of the $t$th iteration).
To ensure data privacy, federated learning only allows all remote devices to exchange model gradients with a central server. In this process, each distributed device uses local data to train its own model and then uploads the local model to the central server. After aggregating all the collected models, the server returns the new global model to each device.
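The four steps above can be sketched as a single training round. This is only an illustrative sketch: the linear model, the single gradient step used as "local training", the weighting by local sample count, and the names `local_update`, `aggregate`, and `federated_round` are all assumptions made for the example, and encryption of the uploaded models is omitted.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (feature vector, target value)


def local_update(global_model: List[float], data: Sequence[Sample],
                 lr: float = 0.05) -> List[float]:
    """Steps (1)-(2): start from the downloaded global model and take one
    gradient step on the local mean squared error (a stand-in for real
    local training; the paper does not prescribe this step)."""
    grad = [0.0] * len(global_model)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(global_model, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / len(data)
    return [w - lr * g for w, g in zip(global_model, grad)]


def aggregate(local_models: Sequence[List[float]],
              sample_counts: Sequence[int]) -> List[float]:
    """Step (4): average the local models, weighted by local sample count
    (the weighting scheme is an assumption)."""
    if not local_models or len(local_models) != len(sample_counts):
        raise ValueError("need exactly one sample count per local model")
    total = sum(sample_counts)
    dim = len(local_models[0])
    return [sum(n * m[j] for m, n in zip(local_models, sample_counts)) / total
            for j in range(dim)]


def federated_round(global_model: List[float],
                    org_datasets: Sequence[Sequence[Sample]]) -> List[float]:
    """One round: every organization trains locally, then the server
    aggregates the uploaded models (step (3) is the implicit upload)."""
    local_models = [local_update(global_model, d) for d in org_datasets]
    return aggregate(local_models, [len(d) for d in org_datasets])
```

Calling `federated_round` repeatedly, with each organization holding only its own list of samples, moves the global model toward a fit of the combined data although the server never sees a raw sample.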
According to the different distribution patterns of samples and data feature spaces, federated learning can be divided into three categories, namely, horizontal federated learning, vertical federated learning, and federated transfer learning. Horizontal federated learning is suitable for situations where the user features of the two datasets overlap substantially but the users overlap little; vertical federated learning is applicable when the user features of the two datasets overlap very little but the users overlap considerably. Federated transfer learning differs from the previous two categories. It is used when both the users and the user features of the two datasets rarely overlap. Instead of segmenting the data, it uses transfer learning (the transfer of knowledge from an existing field to a new, related field) to overcome the lack of data or labels, and it is often used to address differing feature spaces and scarce labeled samples. Federated learning combined with transfer learning therefore forms a mechanism that both protects privacy and supports joint modeling. This mechanism has received strong responses in industry and is especially useful between companies or organizations in different domains. Recently, an increasing number of researchers have begun to pay attention to this field [30, 31]. This study recognizes the considerable advantages of federated transfer learning for intrusion detection and proposes the first federated transfer learning algorithm designed specifically for intrusion detection.
2.2. Extreme Learning Machine
The extreme learning machine (ELM) is a learning algorithm proposed by Huang et al. in 2006. It is used to train a single hidden layer feedforward neural network (SLFN). The input layer weights and the hidden layer biases are selected randomly, and the output layer weights are then calculated according to Moore–Penrose (MP) generalized inverse matrix theory by minimizing a loss function composed of the training error term and a regularization term on the norm of the output layer weights. As a nonlinear model, ELM has good generalization and nonlinear mapping capabilities and can be used to mitigate the curse of dimensionality. Compared with other machine learning algorithms, such as BP neural networks and SVMs, it has the advantages of fast learning speed, little human intervention, and good computational scalability.
Given a training dataset $\{(x_i, t_i)\}_{i=1}^{N}$, $w_i$ is the weight vector connecting the input layer to the $i$th hidden neuron, $b_i$ is the bias of the $i$th hidden neuron, $\beta_i$ is the output weight vector connecting the $i$th hidden neuron to the output layer, and $g(\cdot)$ is the output function of the hidden layer neurons. The number of hidden layer nodes of the extreme learning machine is $L$, and $t_i$ indicates the label corresponding to the $i$th data example. The network structure of the ELM is shown in Figure 2.
As seen in Figure 2, from left to right, the input of the neural network is the training sample set $\{x_i\}_{i=1}^{N}$, there is one hidden layer in the middle, and the input layer is fully connected to the hidden layer. The output matrix of the hidden layer is recorded as follows:
$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} \tag{1}$$
In (1), the form of $h(x)$ is shown as follows:
$$h(x) = \left[ g(w_1 \cdot x + b_1), \ldots, g(w_L \cdot x + b_L) \right] \tag{2}$$
In (2), the function $g(\cdot)$ represents the activation function, which is a nonlinear piecewise continuous function; commonly used functions include the sigmoid function and the Gaussian function. In this study, the sigmoid function is used. After passing through the hidden layer, the signal enters the output layer. According to Figure 2 and formula (2), the output of the “generalized” single hidden layer feedforward neural network ELM is as follows:
$$f(x) = \sum_{i=1}^{L} \beta_i \, g(w_i \cdot x + b_i) = h(x)\beta \tag{3}$$
where $\beta = [\beta_1, \ldots, \beta_L]^{T}$ and $T = [t_1, \ldots, t_N]^{T}$. The unknown quantities in (2) and (3) are $w$, $b$, and $\beta$. The ELM adjusts the weights and biases between neurons according to the training data, and what is actually learned is contained in the connection weights and biases. Equation (3) can be converted to (4) as follows:
$$H\beta = T \tag{4}$$
The purpose of ELM is to minimize the training error and obtain the parameters $\beta$ that give the best learning effect on the training dataset. In the ideal state, (4) holds, but in practice, the output can only be made as close to the sample labels as possible. In this study, the least-squares method is used to solve for the weights $\beta$ connecting the hidden layer and the output layer. The objective function is as follows:
$$\min_{\beta} \; \| H\beta - T \|^{2} \tag{5}$$
The Moore–Penrose generalized inverse matrix is calculated by the orthogonal projection method, and the optimal solution is as follows:
$$\hat{\beta} = H^{\dagger} T \tag{6}$$
where $H^{\dagger}$ is the Moore–Penrose generalized inverse of $H$.
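To make the training procedure concrete, the following sketch trains a small ELM with a sigmoid hidden layer. It is a hedged illustration rather than the authors' implementation. It assumes the common ridge-regularized closed form $\beta = (H^{T}H + I/C)^{-1}H^{T}T$, which approaches the Moore–Penrose solution as $C$ grows. The regularization constant `C`, the uniform weight range, and the helper names are assumptions, and a small Gauss–Jordan solver is used only to keep the example dependency-free (a linear algebra library would normally be used instead).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_output(X, W, b):
    """Hidden-layer output matrix H (N x L): H[i][j] = g(w_j . x_i + b_j)."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bj)
             for w, bj in zip(W, b)] for x in X]


def solve(A, B):
    """Solve A X = B (A is n x n, B is n x m) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(A)
    M = [ra[:] + rb[:] for ra, rb in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def train_elm(X, T, n_hidden, C=1e3, seed=0):
    """Draw random input weights and biases, then solve for the output
    weights beta in closed form: (H^T H + I/C) beta = H^T T."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden_output(X, W, b)
    HtH = [[sum(h[i] * h[j] for h in H) + (1.0 / C if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    HtT = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(len(T[0]))]
           for i in range(n_hidden)]
    return W, b, solve(HtH, HtT)


def predict(X, W, b, beta):
    """Network output f(x) = h(x) beta for every row of X."""
    H = hidden_output(X, W, b)
    return [[sum(hj * bj[k] for hj, bj in zip(h, beta))
             for k in range(len(beta[0]))] for h in H]
```

Only `beta` is fitted; `W` and `b` stay at their random values, which is what makes ELM training a single linear solve instead of an iterative procedure such as backpropagation.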
Although ELM has achieved good results, there is still room for improvement, and some scholars have proposed algorithms to optimize it [35, 36]. At present, the traditional extreme learning machine is limited by the assumptions that the training and test data are independent and identically distributed and that sufficient training data are available. In reality, it is often desirable to learn an accurate model from only a small amount of new data and a large amount of historical data. The emergence of transfer learning makes this further development of ELM possible.
3. The Proposed FLTrELM Algorithm
3.1. Definition of Problems
Given data from K different organizations (users), the feature spaces of the data are different from each other. An extreme learning machine model MALL is trained on the data of all organizations. In this study, the federated transfer learning model MFED is trained through all data collaboration. Among them, no organization will disclose its data to other organizations. For convenience, binary classification problems are considered, the class label set is , and the multiclassification problem can be extended to the binary classification problem. Assuming that the accuracy rate is expressed as Acc, then the goal of learning is to ensure that the accuracy of federated learning MFED is close to or better than that of the traditional learning model, MALL, as follows:where represents a very small nonnegative real number. Suppose the classification space is . For convenience and without loss of generality, this section only considers the binary classification problem, and the multiclassification problem can be extended to the binary classification problem.
3.2. Framework of FLTrELM
FLTrELM aims to achieve accurate detection of malicious network intrusions through federated transfer learning without sacrificing the privacy and the security of institutional data. Without loss of generality, we assume that there are 3 organizations (users) and 1 server, and more organizations can be expanded under actual conditions. Figure 3 provides an overview of the framework.
In Figure 3, three organizations, A, B, and C, cannot share their own data due to the “wall” of data privacy protection. Their respective datasets are D1, D2, and D3, and the models are Model A, Model B, and Model C, respectively. Each organization cannot use each other’s data training models. They can only exchange model knowledge with the cloud model through knowledge migration. In this way, each model establishes contact through the cloud model and breaks the “separation wall.” The framework protects the data privacy of various institutions and can use each other’s model knowledge to help train their own models.
The framework mainly includes the following four procedures:
(1) The cloud model on the server is trained on the public dataset.
(2) The server distributes the cloud model to all institutions, and each institution trains its own model on its own data (Figure 4 shows the knowledge migration process).
(3) Each organization uploads its own model to the cloud server, which trains a new cloud model through model aggregation.
(4) Procedures (1)–(3) are repeated until convergence is achieved, and each organization obtains a personalized model with a good learning effect.
Because the distribution of the server data differs greatly from that of each organization’s data, probability adaptation must be carried out with transfer learning to obtain a model better suited to each organization, that is, the knowledge migration process in Figure 3. The detailed process of this procedure is shown in Figure 4 and is as follows. The cloud model is trained on the cloud dataset DC. The knowledge in this model is then used to help organization i train its model mi: to migrate the knowledge in the cloud model that is most similar to the model mi, the knowledge to be migrated is selected through probability adaptation, which yields a new model mi. It is worth noting that this process encrypts all parameters using homomorphic encryption. The sharing process does not involve any data leaks, because no institution shares its data. Learning is completed through encrypted shared parameters to protect data privacy.
The learning process of FLTrELM involves model establishment and parameter sharing. After the cloud model is established, it can be directly applied to various organizations. In the actual situation, it is obvious that the samples in the server and the data of various organizations have highly distinct probability differences. Therefore, traditional intrusion detection models fail in personalization, and transfer learning can adapt to the probability differences between models to achieve personalization. In addition, due to the data privacy and the security issues of various institutions, the models of various institutions cannot be easily and continuously updated.
3.3. Implementation of FLTrELM
3.3.1. Construction of the Transfer Learning Model
Federated learning solves the problem of data islands between institutions. Therefore, the data of all organizations can be used to build a cloud model, and each organization can then directly use the cloud model. However, because the distribution of each organization’s data differs from that of the cloud data, the cloud model does not perform well for specific users, meaning that it cannot capture their personalized features. In this study, transfer learning is used to build a personalized model for each user (organization), as shown in Figure 4. In this way, transfer learning is performed for each user, starting from the acquired cloud model parameters, to learn a personalized model.
According to , the server is the source domain , and organization i is the target domain . On , we assume that the parameters of ELM are , combined with transfer learning and extreme learning machine structure risk minimization theory and the ELM optimization algorithm proposed in to construct the objective function as follows:
The constants C1 and C2 represent the penalty parameters of domain adaptation contribution and target classification error, respectively; represents the difference term between the classifier of source and the target domains; constraint indicates that the target domain classifier can classify correctly; constraint ensures that the effect of transfer is not worse than the classification effect only learned on the dataset of the target domain and limits the possibility of a negative transfer. By calculating (7), a classifier is found in the target domain, which can correctly classify the samples in the target domain.
The Lagrangian form of (8) is shown as follows:where , , and are Lagrange multipliers. , , , and are the number of samples in the target domain.
We take the partial derivatives of the Lagrange function in (8) with respect to its parameters and set them to zero to obtain the following formula:
3.3.2. Federated Learning Process
FLTrELM uses federated learning to implement encryption model training and sharing. The learning process of federated learning is mainly composed of two key parts, namely, cloud model learning and user model learning. The basic learning model is an extreme learning machine. fS represents the server model to be learned, and the objective function of learning is as follows:where represents the loss function of the network, n represents the number of samples on the server, and represents the network parameters that need to be learned.
After obtaining the cloud model, fS, it is distributed to all organizations. As seen in Figure 3, the organization’s “wall” prohibits the direct sharing of information. This process uses the existing homomorphic encryption technology  to avoid information leaks. Through federated learning, user data can be aggregated without affecting privacy and security.
The organizational learning model is expressed as follows:
In all organizational models, ft is based on a shared cloud model and uploaded to the server for aggregation. In federated learning, good performance can be obtained by sharing initialization and averaging models. Therefore, this study uses the model average to align the user model and averages K user models in each round of training for cloud model updating. Note that here, we are only averaging the user model, and the updated cloud model is expressed as follows:where is a parameter of the network, which represents the learning model of the kth organization. After enough iterations, the updated server model has better generalizability. Subsequently, new users can participate in the next round of server model training, so FLTrELM also has the ability to incrementally learn.
3.4. Training FLTrELM
Based on Section 3.3, the learning process of FLTrELM is given in Algorithm 1. The algorithm can continue to work with newly emerging organizational data and updates the user model and the cloud model at the same time when faced with new data. Therefore, the longer FLTrELM is used, the more personalized the model in each organization becomes, and the better the effect of intrusion detection.
4. Experimental Results
4.1. Experimental Setting and Evaluation Criteria
All experiments in this study are performed on a PC with an Intel Core (TM) 3.6 GHz processor, 8 GB of RAM, and the Windows 10 operating system. To verify the effectiveness and the generalization performance of the proposed FLTrELM algorithm in intrusion detection, three intrusion detection datasets, namely, NSL-KDD, KDD99, and ISCX2012, are used as experimental datasets. The benchmark algorithms selected in the experiment are ELM, TrAdaBoost [13, 14, 39], and SVM, among which SVM is implemented using the LIBSVM toolkit.
The 10-fold cross-validation method is a standard method for evaluating machine learning algorithms, and this study uses it to evaluate the proposed intrusion detection model. Specifically, we randomly partition the original dataset into 10 mutually exclusive subsets of equal size. In each run, nine subsets are used to train the intrusion detection model, and the remaining subset is used to test it. By repeating this process 10 times, each subset is used exactly once for testing and nine times for training. The performance of the proposed detection model is then obtained by averaging the results over the test subsets. The average of the results of all experiments repeated ten times is used as the final comparison result.
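The evaluation protocol described above can be sketched as follows. This is a minimal sketch under stated assumptions: `train_fn` and `score_fn` are hypothetical callbacks standing in for any of the compared models and metrics, and folds are built by shuffling indices and dealing them out round-robin, so their sizes differ by at most one sample when the dataset size is not a multiple of k.

```python
import random
from typing import Any, Callable, List, Sequence


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples: Sequence[Any], labels: Sequence[Any],
                   train_fn: Callable[[list, list], Any],
                   score_fn: Callable[[Any, list, list], float],
                   k: int = 10, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the
    k test scores."""
    folds = k_fold_indices(len(samples), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        model = train_fn([samples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        scores.append(score_fn(model,
                               [samples[j] for j in test_idx],
                               [labels[j] for j in test_idx]))
    return sum(scores) / len(scores)
```

Repeating the whole procedure with different `seed` values and averaging the returned scores corresponds to the repeated experiments mentioned in the text.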
Commonly used evaluation indicators for detection include precision, detection rate, accuracy, false-positive rate, and miss rate. Accuracy reflects the proportion of correctly classified samples to the total number of samples; the larger the value, the better. Precision reflects the proportion of true positive samples to the total number of samples classified as positive; the larger the value, the better. The detection rate reflects the proportion of positive samples classified as positive among all positive samples. Precision and detection rate are a pair of contradictory indicators: the higher the precision, the fewer the false positives, and the higher the detection rate, the fewer the false negatives. If more samples are classified as positive, the detection rate will increase, but the precision will decrease, and vice versa. In intrusion detection, the false-positive rate refers to the proportion of negative samples misclassified as positive among all negative samples. The smaller the value, the better; a high false-positive rate makes the system prone to “crying wolf.” The miss rate refers to the proportion of positive samples misclassified as negative among all positive samples.
The formal description of the precision, detection rate, accuracy, false-positive rate, and miss rate is as follows:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{13}$$
$$\text{Detection rate} = \frac{TP}{TP + FN} \tag{14}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$
$$\text{False-positive rate} = \frac{FP}{FP + TN} \tag{16}$$
$$\text{Miss rate} = \frac{FN}{TP + FN} \tag{17}$$
where $TP$ represents the number of positive samples that are correctly classified as positive samples, $FP$ represents the number of negative samples that are incorrectly classified as positive samples, $TN$ represents the number of negative samples that are correctly classified as negative samples, and $FN$ represents the number of positive samples that are incorrectly classified as negative samples.
In the work of this study, the average accuracy rate and false alarm rate of the experimental results obtained by the 10-fold cross-validation method are used as overall evaluation indicators to verify the effectiveness and the accuracy of the algorithm.
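As an illustration, the indicators can be computed from the four confusion-matrix counts as in the sketch below. It assumes the standard definitions (accuracy as the share of correctly classified samples, precision as the share of predicted positives that are truly positive) and treats the intrusion class as the positive class; the function names and the convention of returning 0.0 for an empty denominator are choices made for this example.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN, treating `positive` as the intrusion label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard detection indicators; a ratio with an empty denominator
    is reported as 0.0."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "detection_rate": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "miss_rate": ratio(fn, tp + fn),
    }
```

Note that the detection rate and the miss rate always sum to 1 when at least one positive sample exists, so reporting one of them determines the other.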
This section describes the NSL-KDD, KDD99, and ISCX2012 datasets and preprocesses them.
(1) ISCX2012 Dataset. Researchers noticed that the attack types considered in the KDD99 intrusion detection dataset are now out of date. In 2012, the Information Security Centre of Excellence (ISCX) of the University of New Brunswick (UNB) released an intrusion detection dataset named ISCX2012. This dataset contains seven days of original network traffic data, including normal traffic and four intrusion types, namely, DoS, Prob, R2L, and U2R (see Table 1 for details). In the experiment, 2% of the data are selected from the training dataset; most of their label information is removed to form the source domain dataset, and the remaining labeled data form the target domain dataset. The two datasets together constitute the training dataset. Similarly, 1% of the data are taken from the test dataset as the test dataset.
(2) KDD99. KDD99 is a widely used competition dataset for intrusion detection provided by the Lincoln Laboratory of the Massachusetts Institute of Technology. It is the intrusion detection dataset with the greatest influence and credibility in academia. The dataset has 5 × 10^6 records, and each record has 41 feature attributes and 1 class identifier. There are approximately 38 attack types, of which 21 appear in the training dataset, and another 17 unknown attack types appear only in the test dataset. The purpose of this design is to test the generalizability of the classifier model. The ability to detect unknown attack types is also one of the important indicators for evaluating classifiers in intrusion detection applications.
Thus far, the version most used by researchers is the 10% KDD99 dataset (including a training dataset and a test dataset), which is a 10% sample of the full KDD99 dataset; this dataset is also used in this study. The 10% dataset contains one normal class and 4 major network attack types, namely, DoS, probing, U2R, and R2L. In the two 10% datasets, the four types of cyber-attacks contain different numbers of attack behaviors. Table 2 details the 22 behaviors in the training dataset and the 39 behaviors in the test dataset, where normal data are also counted as one type in the table.
For the intrusion detection algorithm to be able to recognize new attack behaviors by learning from the training dataset, the test dataset in Table 3 contains more new attack behaviors than the training dataset. In Table 3, the proportion of normal in the two datasets in the 10% dataset is basically the same, but the proportions of the other four attack types are significantly different; because U2R and R2L have very small proportions, most of the current detection algorithms have difficulty detecting these two types of attacks.
(3) NSL-KDD. NSL-KDD is an improved version of the KDD99 dataset: duplicate records are deleted, records of different classification difficulty levels are included, and the class counts are more balanced, so it can serve as an effective benchmark for evaluating the detection ability of a model. The NSL-KDD dataset includes 4 sub-datasets, namely, KDDTrain+, KDDTrain+_20Percent, KDDTest+, and KDDTest-21. This study uses KDDTrain+ for training and KDDTest+ for testing. The dataset contains 4 anomaly types, which are subdivided into 39 attack types, of which 17 unknown attack types appear only in the test set. Each record includes 41 features and 1 category identifier. Among the 41 features, there are 9 basic TCP connection features, 13 TCP connection content features, 9 time-based network traffic statistics features, and 10 host-based network traffic statistics features. The details of the NSL-KDD dataset are shown in Table 4.
4.2.2. Data Preprocessing
The intrusion detection datasets contain nonnumerical data, and the numerical features are measured on different scales, so the data need to be converted into numerical form and brought to a unified scale. Therefore, data preprocessing includes two steps, namely, character-type digitization and data normalization.
(1) Character-Type Digitization. The ISCX2012, NSL-KDD, and KDD99 datasets are processed in the same way. In each record, the symbolic features are converted into numerical data by using 1 to N − 1 encoding. For example, for KDD99, we convert the 3 network connection types, 70 network service types, 23 attack types (including the normal type Normal), and 11 network connection states of the character type into numerical types. The converted forms of the 11 network connection states are shown in Table 5, and the other character types are handled similarly.
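The digitization step can be illustrated with a minimal Python sketch. The exact code assignment of Table 5 is not reproduced here: assigning consecutive integers in sorted order, starting from 0, is an assumption made for illustration, and the helper names `build_codebook` and `encode_column` are hypothetical.

```python
def build_codebook(values):
    """Map each distinct symbol to a consecutive integer code.

    Assumption: codes follow sorted order and start at 0; the paper's
    own assignment (Table 5) may differ.
    """
    return {symbol: code for code, symbol in enumerate(sorted(set(values)))}


def encode_column(values, codebook):
    """Replace every symbolic value in a column with its integer code."""
    return [codebook[value] for value in values]


# Example column: the three KDD99 network connection (protocol) types.
protocols = ["tcp", "udp", "icmp", "tcp", "tcp"]
codebook = build_codebook(protocols)
encoded = encode_column(protocols, codebook)
print(codebook)
print(encoded)
```

The same codebook built on the training data would normally be reused for the test data so that a given symbol always receives the same code.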
(2) Data Normalization. After digitization, the TrELM algorithm uses MMD to select the data, and MMD involves a distance calculation. The continuous feature attributes in the dataset are measured on different scales, which strongly affects the distance between data points and, in turn, the accuracy of the calculation results. To avoid this, the scale differences between features are eliminated by normalization. For discrete features, min-max normalization is adopted, which maps the values to [0, 1]; for continuous features, the Z-score method is used, as shown in formulas (18) and (19):
$$x' = \frac{x - x_{\max}\,\cdot 0 - x_{\min}}{x_{\max} - x_{\min}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{18}$$
$$x'' = \frac{x - \mu}{\sigma}, \tag{19}$$
where $x$ is the original sample data, $x_{\max}$ is the maximum value, $x_{\min}$ is the minimum value, $\mu$ is the average value, $\sigma$ is the standard deviation, and $x'$ and $x''$ are the normalized results of the original data.
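The two normalization methods named above, scaling by the maximum and minimum values and Z-score standardization, can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the use of the population standard deviation is an assumption, and returning zeros for a constant column is a design choice made here to avoid division by zero.

```python
import statistics


def min_max_normalize(column):
    """Scale values to [0, 1] using the column minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: no spread to scale (assumed convention).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def z_score_normalize(column):
    """Centre values on the mean and divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population std is an assumption
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


durations = [0.0, 2.0, 4.0, 6.0, 8.0]
scaled = min_max_normalize(durations)
standardized = z_score_normalize(durations)
print(scaled)
print(standardized)
```

After Z-score standardization the column has zero mean and unit standard deviation, but its values are not confined to [0, 1]; only the min-max form guarantees that range.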
4.3. Experimental Results and Analysis
This section analyzes the experimental results of SVM, ELM, TrAdaBoost, and the FLTrELM algorithm proposed in this study on the NSL-KDD, KDD99, and ISCX2012 datasets to verify the effectiveness of the proposed algorithm. In addition, the influence of the adjustable parameter C1 on the results of FLTrELM is discussed and analyzed.
Tables 6–8 show the average accuracy, false-positive rate, and miss rate of the algorithms on the NSL-KDD, KDD99, and ISCX2012 datasets. The following conclusions can be drawn:
(1) Sufficient available intrusion detection training samples are the basis for training a high-accuracy classifier. On the NSL-KDD and KDD99 datasets, there are a large number of samples of three types, namely, normal, Probe, and DOS, and all algorithms reach an accuracy of over 96% on these three types. In the same way, on the ISCX2012 dataset, the accuracy of all algorithms on the well-represented normal, infiltration, and DDoS types reaches 90%.
(2) On the NSL-KDD and KDD99 datasets, the attack types U2R and R2L have too few samples for traditional intrusion detection algorithms to train a high-accuracy detection model, so these algorithms have low accuracy on the two types. FLTrELM [13, 14] and TrAdaBoost are transfer learning algorithms that use knowledge from a large number of well-labeled intrusion detection samples to train detectors for U2R and R2L, which improves their detection rates on these types. Tables 6 and 7 show that FLTrELM [13, 14] and TrAdaBoost have improved accuracy on U2R and R2L, especially on R2L: the accuracy is above 70% for R2L and above 46% for U2R. On the ISCX2012 dataset in Table 8, the transfer learning algorithms FLTrELM [13, 14] and TrAdaBoost are more accurate than ELM and SVM on the attack types with fewer samples, HttpDos and BFSSH, and FLTrELM performs best. Therefore, FLTrELM has a significant effect on improving the detection rate of the U2R and R2L attack types, which contain a small number of samples.
(3) In terms of the false alarm rate, Tables 6 and 7 show that the false alarm rates for the three types normal, Probe, and DOS on the NSL-KDD and KDD99 datasets do not exceed 5%. Among them, FLTrELM has the lowest false alarm rate, which is below 6%. For the intrusion behaviors U2R and R2L, the three nontransfer benchmark algorithms perform poorly: the false alarm rates of R2L and U2R on the NSL-KDD dataset reach 10% and 9% or more, respectively, and on the KDD99 dataset reach more than 9%. However, TrELM performs relatively well on these two datasets, staying below 6%. On the ISCX2012 dataset, Table 8 shows that TrELM has the lowest false-positive rate for all attack types.
(4) In terms of the miss rate, Tables 6–8 show that FLTrELM has the lowest miss rate for all nine attack behaviors compared with the benchmark algorithms.
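The three measures reported in Tables 6–8 can be computed per attack type from true and predicted labels. The text does not spell out the formulas, so the one-vs-rest definitions in the following Python sketch are the commonly used ones and should be read as an assumption; the function name and the small label lists are hypothetical and are not experimental data.

```python
def one_vs_rest_metrics(y_true, y_pred, target):
    """Accuracy, false-positive rate, and miss rate for one class.

    Assumed definitions (one class against all others):
      accuracy            = (TP + TN) / total
      false_positive_rate = FP / (FP + TN)
      miss_rate           = FN / (TP + FN)
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == target and pred == target:
            tp += 1
        elif truth == target:
            fn += 1
        elif pred == target:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "miss_rate": fn / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only.
y_true = ["normal", "normal", "dos", "dos", "r2l", "r2l", "normal", "dos"]
y_pred = ["normal", "dos", "dos", "dos", "normal", "r2l", "normal", "dos"]

r2l = one_vs_rest_metrics(y_true, y_pred, "r2l")
dos = one_vs_rest_metrics(y_true, y_pred, "dos")
print(r2l)
print(dos)
```

In this toy example one of the two R2L records is missed, which illustrates why sparse classes such as U2R and R2L dominate the miss rate even when overall accuracy looks high.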
The experimental results show that, on the KDD99 and NSL-KDD datasets, TrELM’s accuracy for the five attack types is higher than that of the benchmark algorithms, and the accuracy on the attack types with a small number of samples is also considerably improved. On the ISCX2012 dataset, FLTrELM’s accuracy for the 5 attack types is also better than that of the benchmark algorithms.
Therefore, FLTrELM improves the detection rate of all 9 attack behaviors, especially the R2L attack behavior with sparse samples; no attack behavior has an excessively low detection rate, and the detection rates do not differ greatly across behaviors. This effectively reduces the imbalance problem in attack type detection by machine learning algorithms, and FLTrELM shows substantial advantages in the false-positive rate and the false-negative rate. In other words, FLTrELM achieves the best classification accuracy for all intrusions. This is because federated learning can indirectly use more information in distributed data to train better models, and transfer learning makes the model more consistent with the characteristics of each organization. Compared with traditional methods (SVM and ELM), the recognition results are greatly improved. Compared with [13, 14] and the TrAdaBoost algorithm, the proposed algorithm still has advantages in terms of accuracy. In short, the experiments demonstrate the effectiveness of FLTrELM; the proposed algorithm also protects data privacy without performing worse than the benchmark algorithms. The experimental results on the three datasets also show that FLTrELM has better generalization performance.
Table 9 shows the average training time of the algorithms on the three intrusion detection datasets. Compared with the transfer learning algorithms TrAdaBoost and FLTrELM, the nontransfer algorithms ELM and SVM have no transfer learning process and do not need to process additional data, so their training time is relatively low. TrAdaBoost and FLTrELM [13, 14], by contrast, need to transfer the knowledge in the auxiliary data to help build the target learning task, which increases the training time. FLTrELM needs data from multiple institutions, which also increases the training time. However, the FLTrELM proposed in this study inherits the fast solution of ELM; compared with similar algorithms that use a CNN, its training time is shorter and its efficiency is higher.
It can be seen from (12) that the objective function of FLTrELM includes the parameter C1, whose value affects the learning effect, so a sensitivity analysis is necessary. On the NSL-KDD, KDD99, and ISCX2012 datasets, C1 is set to different values in [0, 1], and the resulting changes in the classification performance of FLTrELM are recorded. The average accuracy is shown in Figures 5–7, which illustrate the influence of the parameter on the learning effect of FLTrELM.
Using the parameter grid search method provided by , the value of C1 is determined, and the experimental results on the real datasets are recorded at the same time. When C1 takes different values within a certain range, the classification performance of FLTrELM differs considerably. It can be seen that the closer the relationship between the domains, the greater the value of C1 and the higher the accuracy. Therefore, the algorithm is sensitive to the regularization parameter C1 within a certain range, and the parameter value that gives the best classification performance can be obtained for each cross-domain task.
In this study, FLTrELM, the first intrusion detection algorithm based on federated transfer learning and an extreme learning machine, is proposed. FLTrELM aggregates data from different organizations without compromising privacy and security, adapts to each user through knowledge transfer, and conducts personalized model learning. FLTrELM provides a method for future research on intrusion detection. The effectiveness of the algorithm is evaluated experimentally. The experimental results show that FLTrELM effectively addresses the problems of traditional intrusion detection algorithms, namely, low detection accuracy for small samples and emerging attack behaviors, privacy protection, and data islands, and improves the detection effect. Our future work is as follows: (1) measuring the difference in the conditional probability of each organization’s data in the process of transfer learning; (2) improving training efficiency.
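A grid search over C1 of the kind described above can be sketched in a few lines of Python. The sketch is generic and does not reproduce the cited tool: `grid_search` is a hypothetical helper, the step of 0.1 over [0, 1] is an assumption, and `toy_accuracy` is a placeholder curve standing in for "train FLTrELM with this C1 and return its validation accuracy"; it is not an experimental result.

```python
def grid_search(score_fn, candidates):
    """Evaluate every candidate value and return the best one with all scores."""
    scores = {value: score_fn(value) for value in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_accuracy(c1):
    """Placeholder score function (assumption, not measured data).

    In a real run this would train the model with regularization
    parameter c1 and return the accuracy on held-out data.
    """
    return 0.9 - (c1 - 0.6) ** 2


# Candidate values 0.0, 0.1, ..., 1.0 (step size is an assumption).
candidates = [round(0.1 * i, 1) for i in range(11)]
best_c1, scores = grid_search(toy_accuracy, candidates)
print(best_c1)
```

Plotting `scores` against the candidate values gives a sensitivity curve of the same kind as those in Figures 5–7, with one curve per dataset or cross-domain task.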
The data used to support the findings of this study can be obtained from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this study.
This work was partially supported by Major Special Fund Projects in Heilongjiang Province, China (Grant no. 2020-230100-54-01-000358).
J. Gu and S. Lu, “An effective intrusion detection approach using SVM with naïve Bayes feature embedding,” Computers & Security, vol. 103, 2021.
H. W. Wang, J. Gu, and S. S. Wang, “An effective intrusion detection framework based on SVM with feature augmentation,” Knowledge-Based Systems, vol. 136, pp. 130–139, 2017.
A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
R. Yahalom, A. Steren, Y. Nameri, M. Roytman, A. Porgador, and Y. Elovici, “Improving the effectiveness of intrusion detection systems for hierarchical data,” Knowledge-Based Systems, vol. 168, pp. 59–69, 2019.
Y. Jiang, G. Tong, and H. Yin, “A pedestrian detection method based on genetic algorithm for optimize XGBoost training parameters,” IEEE Access, vol. 99, pp. 118310–118321, 2019.
C. Cheng, W. P. Tay, and G. Huang, “Extreme learning machines for intrusion detection,” in Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, Brisbane, Australia, June 2012.
H. B. McMahan, E. Moore, and D. Ramage, “Communication-efficient learning of deep networks from decentralized data,” Artificial Intelligence and Statistics, pp. 1273–1282, 2017.
Q. Yang, “Federated learning: the last on kilometer of artificial intelligence,” CAAI Transactions on Intelligent Systems, vol. 15, no. 1, pp. 183–186, 2020.
C. C. C. C. Chang and C. C. C. Lin, “A library for support vector machines,” Neural Computation, vol. 19, 2011.
L. Dhanabal and S. P. Shantharajah, “A study on NSL-KDD dataset for intrusion detection system based on classification algorithms,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 6, pp. 446–452, 2015.