Abstract

Rapid progress in networking technologies has led to exponential growth in the number of unauthorized or malicious network actions. As a component of defense-in-depth, the Network Intrusion Detection System (NIDS) is expected to detect malicious behaviors. Currently, NIDSs are implemented with various classification techniques, but these techniques are not advanced enough to accurately detect complex or synthetic attacks, especially when facing massive high-dimensional data. Moreover, the inherent defects of NIDSs, namely, a high false alarm rate and a low detection rate, have not been effectively solved. To address these problems, data fusion (DF) has been applied to network intrusion detection and has achieved good results. However, the literature still lacks a thorough analysis and evaluation of data fusion techniques in the field of intrusion detection, so a comprehensive review is necessary. In this article, we focus on DF techniques for network intrusion detection and propose a specific definition to describe them. We review recent advances in DF techniques and propose a series of criteria to compare their performance. Finally, based on the results of the literature review, we propose a number of open issues and future research directions.

1. Introduction

The Network Intrusion Detection System (NIDS) is a new generation of network security equipment that follows traditional security measures such as firewalls and data encryption [1], and it has developed rapidly in recent years. It successfully resists many attacks and malicious actions and is called the second line of defense in the Internet. However, in the current big data era, the large amount of traffic data poses critical challenges for NIDS. First, large amounts of high-dimensional data increase processing complexity and require huge computing and storage resources. Second, redundant and irrelevant data can adversely affect network security detection. Third, some new attacks are difficult to detect in the course of big data processing and analytics. In addition, the inherent weaknesses of NIDSs, such as high numbers of false positives (FP) and false negatives (FN), urgently call for effective solutions. In recent years, Data Fusion (DF), a promising big data technology, has been applied to network intrusion detection to overcome these challenges.

The concept of DF originated from a US Air Force project; the US Department of Defense first proposed the Joint Directors of Laboratories (JDL) DF model in 1987, based on national defense monitoring needs [2]. Subsequently, DF was gradually studied and applied in other fields, such as automatic control, image recognition, target detection, and cyber security, and many scholars have proposed definitions of DF based on their own research [3]. To show clearly the role of DF technology in network intrusion detection, this article presents an expression of DF in the field of NIDS.

In general, DF can be applied at three layers according to where fusion is needed, namely, the data layer, the feature layer, and the decision layer. The data layer is the lowest system layer, which processes and integrates raw network data; the feature layer is the middle layer, which fuses and reduces the features of the preprocessed data; the decision layer is the highest layer, which fuses and combines the inferences or decisions of various processing units. In the field of NIDS, most research on data fusion focuses only on the feature layer and the decision layer, because the network data to be fused comes from public datasets that have already been fused at the data layer. Using DF technology at the feature level can greatly reduce the amount of data to be processed, thereby enhancing the efficiency of NIDSs. In addition, the useful and refined data generated by feature fusion can support decision-making and further improve the robustness and accuracy of the system. At the decision level, the decision fusion center fuses the decisions of multiple local detectors to obtain more accurate and reliable identification of network behaviors.

Currently, much research has been carried out on DF for intrusion detection in order to improve the performance of NIDS. However, we found that the open datasets, the numbers of experimental data samples, and the fusion techniques used across the literature are diverse, which makes it difficult to understand and analyze the strengths and weaknesses of different fusion techniques. It is therefore essential to specify uniform criteria for evaluating them across a large number of references and to give performance statistics for the current literature. This work is meaningful because it makes it easier for researchers and practitioners to understand the characteristics of current DF techniques and methods.

In this article, we provide a thorough review of DF techniques in NIDS. We first describe DF for NIDS by presenting the process and role of fusion, which motivates this research work. We then review existing DF techniques used in intrusion detection and propose evaluation criteria to analyze and compare the characteristics and performance of different fusion techniques. In addition, we briefly analyze different open network datasets that can be used for testing intrusion detection techniques. Based on our review, we put forward the main current challenges and point out promising research directions in this field.

The main contributions of this survey are listed as follows.
(1) We give a description of DF for NIDS in order to motivate related research in this field.
(2) We propose a number of evaluation criteria for evaluating fusion techniques for network intrusion detection.
(3) We further employ the proposed criteria to review the performance of different fusion techniques, which offers a good reference for scholars in the fields of network security and information fusion.
(4) We propose the challenges and promising research directions of DF for network intrusion detection based on our review.

The remainder of this article is organized as follows. Section 2 gives a brief introduction to the background knowledge of NIDS and DF. Several commonly used fusion techniques are elaborated in Section 3. Section 4 puts forward the evaluation criteria for data fusion techniques based on a large body of literature. The performance of different fusion techniques is analyzed and compared in Section 5. In Section 6, the existing issues of DF are discussed, and some promising research directions are proposed. Section 7 summarizes the whole article.

2. Background Knowledge

To help readers better understand this article, this section introduces some basic theory, covering network intrusion detection and DF. Network intrusion detection is a long-standing topic that has been studied repeatedly. We mainly present two kinds of intrusion detection techniques, anomaly-based and misuse-based, and explain the advantages and disadvantages of each. For DF, we introduce its origin, definitions, levels, and applications and put forward a general DF framework for intrusion detection to facilitate intuitive understanding.

2.1. Network Intrusion Detection

NIDS is a kind of network security scheme that monitors network transmission in real time and raises alerts or takes corresponding measures when it detects behaviors that threaten network security. In effect, an NIDS can be regarded as a pattern recognition system that distinguishes malicious attacks from normal network behaviors. Intrusion detection technology plays an important role in identifying malicious behaviors. Intrusion detection techniques based on data mining generally fall into two categories: misuse detection and anomaly detection [4, 5]. Misuse-based detection, also called signature-based detection, matches traffic against well-known attack signatures to identify attacks. The advantages and disadvantages of misuse-based detection are as follows [6].
(1) Advantages
(i) Fast and efficient detection of known attacks or specific attack tools.
(ii) Detecting attacks without generating an overwhelming number of false alarms.
(iii) Allowing system administrators, regardless of their security skills, to track their system security issues and run exception handlers.
(2) Disadvantages
(i) Hard to detect novel or unknown attacks.
(ii) Hard to detect the variants of known attacks.

Owing to their efficient detection and low false positive rate (FPR), misuse-based IDSs are widely used in commercial networks. Furthermore, much excellent open-source software has been implemented, typically represented by Snort. The Snort IDS is one of the most commonly used misuse-based NIDSs; it performs real-time traffic analysis, content searching, and content matching to discover attacks using preidentified attack signatures [7]. It is popular with many researchers because it is open source and adapts to various platforms. In [1], Tian et al. fused the alerts generated by Snort to test the performance of their proposed detection fusion system.

Although misuse-based detection is efficient, it can only detect known attacks and cannot detect novel or zero-day attacks [38]. To detect novel attacks, anomaly-based NIDSs have been proposed. In much of the related literature, most of the network behaviors acquired by researchers are normal, so NIDSs usually use anomaly-based detection techniques. Anomaly detection is a recognition model based on the normal behaviors of network connections. Any deviation from the established pattern of normal behaviors is considered a suspicious action. Anomaly detection appears able to detect all types of attacks, including unknown attacks. However, it also flags some activities that are suspicious but not malicious, resulting in high FP [39]. The advantages and disadvantages of anomaly-based detection are as follows [6].
(1) Advantages
(i) It can detect novel or unknown attacks.
(ii) It produces information that can in turn be used to define signatures for misuse detectors.
(2) Disadvantages
(i) It requires extensive training data of network connections and behaviors.
(ii) Its FPR is not ideal.

Misuse-based detection is efficient in detecting known attacks but cannot detect novel attacks, while anomaly-based detection can detect unknown attacks but usually has a high FPR. Therefore, an NIDS that uses only one of the two can be limited in performance and scope of application. To avoid these defects, many hybrid approaches have been proposed, which combine the advantages of both misuse and anomaly detection [40]. Hybrid intrusion detection technology can be divided into three categories as follows.
(1) Anomaly-based detection on top of misuse-based detection
(2) Misuse-based detection on top of anomaly-based detection
(3) Misuse-based and anomaly-based detection in parallel

Zhang et al. [15] implemented a hybrid system following the first approach. This hybrid system can detect known intrusions in real time and unknown intrusions offline. Overall, NIDSs have been studied extensively over the past two decades, and intrusion detection technologies continue to improve. The performance of NIDSs has been greatly optimized accordingly, but NIDSs still face many challenges. The use of DF technology in the field of NIDS is a very promising research direction that holds great potential for dealing with these challenges.

2.2. Data Fusion
2.2.1. Data Fusion Definition

The concept of DF first appeared and was applied in the military field in the 1980s; reflecting its strong military character, it was called “intelligence synthesis.” The Joint Directors of Laboratories (JDL) defined DF from the perspective of military applications as follows: DF is a process dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations, threats, and their significance. Waltz and Llinas [41] supplemented and modified this definition, replacing “position estimate” with “state estimate” and adding the detection function, which gave the following definition: data fusion is a multilevel and multifaceted process that mainly completes the detection, integration, correlation, estimation, and combination of data from single and multiple data sources. Its purpose is to achieve an accurate estimate of the status and identity of the target and to make a complete and timely assessment of the situation and threats. Many other DF definitions have been presented by scholars based on their own research and analysis. Although these definitions offer inspiration and guidance to some extent, they are not specific enough for a particular area. A more specific expression of DF in the field of intrusion detection benefits researchers within the field and motivates their work. Therefore, we present a specific description of DF in NIDS: “single source or multisource data collected from the network is preprocessed to obtain a uniform data format. More refined data of greater quality is obtained through feature fusion and association, which greatly improves the identification of malicious network behaviors. The initial decisions generated from multisource data are integrated in a decision fusion center to achieve more accurate and comprehensive inferences or decisions.” This expression is specific to network intrusion detection; the goal of DF is to improve efficiency, accuracy rate (ACC), and robustness while reducing the false negative rate (FNR) and false positive rate (FPR) and saving the computing resources of the system. We believe that the proposed definition is helpful to practitioners and researchers in the field of intrusion detection.
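The quantities named in this description (ACC, FPR, and FNR) can be derived from a confusion matrix. The following is a minimal sketch, assuming binary labels in which 1 denotes an attack and 0 denotes normal traffic; the function name `detection_metrics` and the sample labels are hypothetical and are not taken from any reviewed work.

```python
def detection_metrics(y_true, y_pred):
    """Return ACC, FPR, and FNR for binary labels (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    total = tp + tn + fp + fn
    return {
        "ACC": (tp + tn) / total if total else 0.0,
        "FPR": fp / (fp + tn) if (fp + tn) else 0.0,
        "FNR": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical ground truth and detector output for eight connections.
metrics = detection_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                            [1, 1, 0, 0, 0, 0, 1, 0])
print(metrics)
```

In this toy sample, one of three attacks is missed and one of five normal connections raises a false alarm, so a fusion scheme would aim to lower both rates without reducing ACC.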

2.2.2. Data Fusion Levels

Data fusion is mainly applied at three levels with respect to the processing stage of the fusion [42]: data, feature, and decision. The representation of information differs between levels: the outputs of data level fusion and feature level fusion are “states,” “characteristics,” and “attributes,” whereas the outputs of decision level fusion are “inferences” or “decisions.” Different fusion techniques and methods are usually used at different levels to improve the overall performance of data processing.

A brief description of the fusion levels is given below.
(1) Data level fusion: it is also called low level fusion, which combines several different raw data sources to produce refined data that is expected to be more informative and synthetic.
(2) Feature level fusion: it combines many data features and is also known as intermediate level fusion. The objective of feature fusion is to extract or select a limited number of important features for subsequent data analysis through feature reduction methods, which can reduce computation and memory resources.
(3) Decision level fusion: it is also called high level fusion, which fuses decisions coming from multiple detectors. Each detector completes basic detection locally, including preprocessing, feature reduction, and identification, to establish preliminary inferences on observed objectives. These inferences are then fused into a comprehensive and accurate decision through decision fusion techniques.

2.2.3. Data Fusion Applications

As a technology, DF is a multidisciplinary research field with a wide range of potential applications in areas such as automatic control, image recognition, target detection, and intrusion detection. The following is a brief introduction to DF applications based on a review of the related literature.

In [43], Cao et al. presented a DF-based fire automation control system for intelligent buildings. The control system consists of six layers (sensor layer, sensor subsystem layer, primary fusion subsystem layer, decision management subsystem layer, actuator subsystem layer, and actuator layer). It can be deployed in intelligent buildings to realize accurate fire alarms and fire protection automatically.

Zhang et al. proposed a DF-based smart home control system [44]. The proposed smart home control framework includes an Internet access module, an information acquisition module, an internal network service module with Bluetooth connection, a data fusion controller that uses fuzzy logic and a fuzzy neural network, and embedded computers in household appliances. It integrates information from multiple sources to control household appliances and create an intelligent home environment.

In [45], a DF system based on D-S (Dempster-Shafer) evidence reasoning was proposed, in which two Charge Coupled Device (CCD) cameras and an Infrared Radiation (IR) sensor are used to extract the characteristics for identifying a missile target. Based on D-S evidence reasoning, the authors recognized the missile target and jamming light using the region square feature, and clutter and fire pile using the position feature. The probability of identification obtained by integrating the three sensors with D-S evidence is greatly improved compared with that of a single sensor.

Hu and Wang applied DF fuzzy theory to develop a fire alarm system based on a wireless sensor network [46]. This system not only offers detection correctness, but also improves the intelligence of monitoring. The proposed method has excellent performance and it is superior to traditional diagnostic methods with a single sensor.

In [47], a deep model for remote sensing DF and classification was proposed. A Convolutional Neural Network (CNN) is used to efficiently extract abstract information characteristics from Hyperspectral Image/Multispectral Image (HSI/MSI) data and Light Detection and Ranging (LIDAR) data, respectively. A Deep Neural Network (DNN) is then used to fuse the heterogeneous characteristics obtained by the CNN. The proposed deep fusion model provides competitive results in terms of classification accuracy. In addition, the proposed deep learning idea opens a new window for future remote sensing data fusion.

In [48], Yan et al. applied DF to reputation generation and proposed a reputation generation method based on opinion fusion and mining. The opinions were fused and classified into a number of major opinion sets containing opinions with similar or identical attitudes. Based on these opinion sets, the rating is aggregated to normalize the reputation of the entity. The experimental results from actual data analysis of several popular Chinese and English commercial websites demonstrated the versatility and accuracy of the method.

Liu et al. collected four articles to study the application of DF in the Internet of Things (IoT) [49]. With a large number of wireless sensor devices, IoT generates a large amount of data, which are massive, multisourced, heterogeneous, dynamic, and sparse. In the special issue, they believed that DF was an important tool for processing and managing these data to improve processing efficiency and provide advanced intelligence. By exploiting the synergy among the datasets, DF can reduce the amount of data, filter noise measurements, and make inferences at any stage of data processing in IoT.

A DF model for intrusion detection was presented in [42], based on clustering. The model uses a centralized approach to fuse data from different analyzers and then make a final analysis decision. The main strength of the proposed approach lies in its accuracy to fuse information from different detection modules and its adaptability to scalability. In addition, the DF module takes into account the efficiency of each analyzer in the process of fusion and can predict upcoming network threats.

2.2.4. A General Fusion Framework for Network Intrusion Detection

Herein, we specify a general fusion framework for network intrusion detection, as shown in Figure 1. The framework is comprised of the following parts.

(a) Input/Data Source. In order to monitor network status and detect and prevent attacks, we need to collect data from multiple sources in the network. These data include different types of packets and the statistical logs of network devices, for example, hosts, routers, and switches. They have different types and formats and cannot be processed directly.

(b) Data Preprocessing. The function of data preprocessing is to eliminate obviously wrong, invalid, or duplicate data and to retain the valid data that can be used. The raw data is digitized and normalized during preprocessing and then converted into a unified format for analysis and processing.
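The preprocessing step can be illustrated with a minimal sketch. It assumes that each record is a tuple of fields, that a missing field is marked with `None`, that symbolic fields are strings, and that min-max scaling is the chosen normalization; the function name `preprocess` and the sample records are hypothetical. Integer coding of symbolic fields is only one possible digitization (one-hot coding is a common alternative).

```python
def preprocess(records):
    """Drop invalid or duplicate records, digitize symbolic fields, min-max scale."""
    # 1. Remove records with missing fields and exact duplicates (order kept).
    seen, clean = set(), []
    for rec in records:
        if any(value is None for value in rec) or rec in seen:
            continue
        seen.add(rec)
        clean.append(rec)
    if not clean:
        return []

    # 2. Digitize: map each distinct string in a column to an integer code.
    columns = [[rec[j] for rec in clean] for j in range(len(clean[0]))]
    numeric = []
    for col in columns:
        if any(isinstance(value, str) for value in col):
            codes = {value: i for i, value in enumerate(sorted(set(col)))}
            col = [codes[value] for value in col]
        numeric.append([float(value) for value in col])

    # 3. Normalize every column to [0, 1]; constant columns become 0.0.
    scaled = []
    for col in numeric:
        low, span = min(col), max(col) - min(col)
        scaled.append([(value - low) / span if span else 0.0 for value in col])

    # Return records in a unified numeric format.
    return [tuple(col[i] for col in scaled) for i in range(len(clean))]


# Hypothetical raw records: (duration, protocol, bytes).
raw = [(0, "tcp", 100), (0, "tcp", 100), (5, "udp", None),
       (10, "udp", 300), (5, "icmp", 200)]
unified = preprocess(raw)
print(unified)
```

Here the duplicate record and the record with a missing field are discarded, and the three remaining records share one numeric format that the later fusion stages can consume.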

(c) Feature Fusion. Network data has the characteristics of big data. Massive network data not only consumes excessive computing and storage resources but also causes the curse of dimensionality. Feature fusion occurs at the feature level and can reduce a large number of features to a few. The more streamlined data obtained after feature fusion plays a more important role in decision-making than the original features, while accelerating data processing and increasing the detection accuracy of NIDS.

(d) Classification. Intrusion detection can be seen as a pattern recognition system. Its performance is determined by the classifiers. Classifier models are obtained through training to identify abnormal network behaviors and make timely responses to the network attacks.

(e) Decision Fusion. Decision fusion is the integration of multiple results of basic detectors. The so-called decisions in the intrusion detection can be understood as the detection results of network behaviors. Decision fusion can achieve improved accuracy and more specific inference than the way of using a single detector alone. Besides, decision fusion can effectively detect complex attacks by integrating multiple decisions.

(f) Output/Decision. Output is the final decision, which usually is a judgment in the NIDS, either an abnormal behavior (e.g., an attack) or a normal behavior.

3. Data Fusion Techniques for NIDS

This section introduces the data fusion techniques, mainly focusing on feature fusion and decision fusion. We classify the fusion techniques shown in Figure 2 and describe the commonly used fusion techniques.

As mentioned above, DF techniques in NIDS can be classified into data layer fusion, feature layer fusion, and decision layer fusion. To the best of our knowledge, the majority of research on NIDS is based on open datasets, so data level fusion is omitted in the related literature. Therefore, we mainly review the DF techniques at the feature layer and the decision layer.

There are two main categories of feature fusion in NIDS: filters and wrappers [50]. Filters are applied through statistical methods, information theory based methods, or searching techniques [51], such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Independent Component Analysis (ICA), and Correlation-Based Feature Selection (CFS). A wrapper uses a machine learning algorithm to evaluate and fuse features in order to identify the best subset representing the original dataset. The wrapper is based on two parts: feature search and evaluation algorithms. The wrapper approach is generally considered to generate better feature subsets but costs more computing and storage resources than the filter [27]. The filter and the wrapper are complementary and can be combined. A hybrid method is usually composed of two stages. In the first stage, the filter method eliminates most of the useless or unimportant features, leaving only a few important ones, which effectively reduces the amount of data to be processed. In the second stage, the remaining features representing the original data are passed to the wrapper to further optimize the selection of important features.
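The two-stage hybrid method can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure of any cited work: the filter ranks features by absolute Pearson correlation with the class label, the wrapper is a greedy forward search, and `evaluate` stands in for a classifier-based score such as cross-validated accuracy. All function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_stage(columns, labels, keep):
    """Stage 1: keep the `keep` features most correlated with the label."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: abs(pearson(columns[j], labels)),
                    reverse=True)
    return ranked[:keep]


def wrapper_stage(candidates, evaluate):
    """Stage 2: greedy forward search driven by an evaluation function."""
    selected, best = [], evaluate([])
    improved = True
    while improved:
        improved, pick = False, None
        for j in candidates:
            if j in selected:
                continue
            score = evaluate(selected + [j])
            if score > best:
                best, pick, improved = score, j, True
        if improved:
            selected.append(pick)
    return selected, best


# Toy data: feature 0 equals the label, feature 1 is noise, feature 2 is partly related.
labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 1, 1]]
survivors = filter_stage(columns, labels, keep=2)


def toy_evaluate(subset):
    # Hypothetical stand-in for classifier accuracy, with a small size penalty.
    return (1.0 if 0 in subset else 0.5) - 0.01 * len(subset)


selected, score = wrapper_stage(survivors, toy_evaluate)
print(survivors, selected, score)
```

The filter is cheap because it never trains a classifier, while the wrapper calls the evaluator once per candidate subset, which is why it is applied only to the few features that survive the first stage.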

Decision fusion methods are divided into two classes, winner-take-all and weighted sum, according to how they combine the decisions of basic classifiers [32]. Majority vote, weighted majority vote, Naïve-Bayes, RF (Random Forest), Adaboost, and D-S evidence theory are classified as winner-take-all because they all have measured values for each basic classifier and the final decision depends on the classifier with the highest measured value. In the weighted sum class, the weight of each basic classifier depends on its own capabilities. The weights of the basic classifiers are calculated, and their weighted outputs are then added to give a final decision. The weighted sum class mainly includes average and neural network methods. Figure 2 gives the categories of fusion techniques. In what follows, we briefly describe several commonly used feature fusion and decision fusion techniques, respectively.

3.1. Feature Fusion Techniques

There are many types of feature fusion methods in the literature. Owing to space limitations, we introduce only some of the classic techniques below.

3.1.1. PCA

Principal Component Analysis (PCA) is a multivariate statistical technique used for feature reduction [12, 52]. The goal of PCA in intrusion detection is to extract the n (a small integer) most important features representing the dataset. It achieves dimensionality reduction while removing noise from the data and improving the performance of the system. To achieve these goals, PCA extracts new variables, that is, the principal components. The first principal component has the largest variance and is the most representative of the entire dataset. The second principal component is computed under the constraint of being orthogonal to the first component while having the largest possible variance. The other principal components are calculated in the same way. These principal components form the new features of the original data. Before PCA is applied, the data must be mean-centered and normalized to avoid imbalance between the data values. PCA is popular in feature fusion because of its simplicity and high precision. However, each principal component is a linear combination of the original features, which makes the principal components hard to interpret, especially when a large number of features are involved.
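The extraction of the first principal component can be sketched in a few lines. This is a minimal illustration that assumes power iteration on the sample covariance matrix, which is one of several ways to obtain the component; it mean-centers the data but leaves any scaling to an earlier preprocessing step. The function name `first_principal_component` and the toy data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance, scores) of the first principal component."""
    n, d = len(rows), len(rows[0])
    # Mean-center every feature.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (requires n >= 2).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration; assumes the start vector is not orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Variance captured along v, and the projection of each record onto v.
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, scores


# Two perfectly correlated toy features collapse onto a single component.
direction, variance, scores = first_principal_component(
    [(1, 1), (2, 2), (3, 3), (4, 4)])
print(direction, variance, scores)
```

Because the two toy features carry the same information, the single new feature in `scores` retains all of the variance, which is the effect PCA exploits when it reduces correlated traffic features.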

3.1.2. CFS

Correlation-Based Feature Selection (CFS) evaluates and ranks feature subsets rather than individual features [27]. It favors sets of attributes that are highly correlated with the class but have low intercorrelation. CFS often uses a variety of heuristic search strategies (such as hill climbing and best-first) to search the feature subset space within a reasonable time period. It first calculates the feature-class and feature-feature correlation matrices from the training data and then uses best-first search to explore the feature subset space [50]. The equation for CFS is

$$M_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}},$$

where $M_S$ is the heuristic merit of the feature subset $S$ containing $k$ features, $\overline{r_{cf}}$ is the average value of all feature-classification correlations, and $\overline{r_{ff}}$ is the average value of all feature-feature correlations. The numerator represents the predictive ability of the features, and the denominator indicates the redundancy between features.
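The CFS heuristic, k·r_cf / sqrt(k + k(k−1)·r_ff), can be evaluated directly once the correlations are known. The sketch below assumes the correlations have already been computed and are supplied as lookup tables; the function name `cfs_merit`, the feature names, and the correlation values are hypothetical.

```python
import math
from itertools import combinations


def cfs_merit(subset, class_corr, feature_corr):
    """CFS merit of a feature subset from precomputed correlations.

    class_corr maps feature -> correlation with the class;
    feature_corr maps frozenset({f, g}) -> correlation between f and g.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Average feature-class correlation (predictive ability).
    r_cf = sum(abs(class_corr[f]) for f in subset) / k
    # Average feature-feature correlation (redundancy); 0 for a single feature.
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(feature_corr[frozenset(p)]) for p in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical correlations: "a" and "b" are both predictive but redundant.
class_corr = {"a": 0.8, "b": 0.7, "c": 0.1}
feature_corr = {frozenset(("a", "b")): 0.9,
                frozenset(("a", "c")): 0.1,
                frozenset(("b", "c")): 0.1}

for subset in (["a"], ["a", "b"], ["a", "c"]):
    print(subset, round(cfs_merit(subset, class_corr, feature_corr), 4))
```

In this toy case, adding the redundant feature "b" to "a" lowers the merit even though "b" is individually predictive, which illustrates why CFS scores subsets rather than single features.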

3.1.3. GA

Genetic Algorithm (GA) is a search heuristic that simulates the process of natural selection [53]. This heuristic approach is often used to generate useful solutions to optimization and search problems. GA is a kind of Evolutionary Algorithm (EA), which uses techniques inspired by natural evolution (such as inheritance, mutation, selection, and crossover) to generate optimized solutions. An evaluation function is used to calculate the goodness of each chromosome. The procedure begins with an initial population of randomly generated chromosomes, and the quality of the individuals gradually increases over the generations. Each generation applies three basic GA operators, namely, selection, crossover, and mutation. In intrusion detection, when the original data has a large number of features, the GA can search for a subset of the raw features using a Support Vector Machine (SVM), Neural Network (NN), or other classifier as the evaluation function. The advantage of this approach is its flexible and powerful global search capability, which converges from multiple directions without requiring prior knowledge of system behaviors. The main drawback is the high consumption of computing resources.
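The GA-based search over feature subsets can be sketched as follows. This is a minimal sketch under several assumptions that are not taken from the reviewed works: chromosomes are 0/1 masks over the features, selection is a binary tournament, crossover is single-point, the best individual is carried over unchanged (elitism), and a toy function replaces the classifier-based evaluation. The names `ga_feature_selection` and `toy_fitness` are hypothetical.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(pop):
        # Selection: the fitter of two randomly drawn individuals survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism keeps the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # Crossover: splice the two parents at a random cut point.
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(mask):
    # Hypothetical stand-in for classifier accuracy: features 0, 2, and 5 are
    # informative, and every selected feature carries a small cost.
    relevant = {0, 2, 5}
    hits = sum(1 for i, g in enumerate(mask) if g and i in relevant)
    return hits - 0.1 * sum(mask)


best_mask, best_score = ga_feature_selection(8, toy_fitness)
print(best_mask, best_score)
```

In practice, `toy_fitness` would be replaced by training and validating a classifier on the selected features, which is where the high computational cost noted above arises.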

3.2. Decision Fusion Techniques

Compared with feature fusion, decision fusion operates at a higher level, and the data to be merged is more abstract. Decision fusion further improves the performance of the detection system, especially when it is difficult for a single detector to identify complex network behaviors. In what follows, we introduce several common decision fusion techniques.

3.2.1. Weighted Majority Vote

Weighted majority vote assigns a weight to each basic classifier, which indicates the importance of that classifier's output for the final decision [32]. The weight varies according to the ability of the basic classifier to separate the samples. The formula is as follows:

$$D = \arg\max_{j} \sum_{i=1}^{n} w_i\, d_{i,j},$$

where $d_{i,j}$ is the output of classifier $i$ in the decision vector $d = [d_{1,j}, \ldots, d_{n,j}]$, $n$ is the number of classifiers, and $d_{i,j}$ is 1 or 0 depending on whether classifier $i$ chooses class $j$ or not, respectively. The final decision $D$ obtained by fusing multiple classifiers is determined by the base classifiers' outputs $d_{i,j}$ and the corresponding weights $w_i$. This method assigns a higher weight to a basic classifier with higher accuracy, but it largely ignores the less accurate base classifiers. The weights for the base classifiers are also difficult to obtain and adjust. Therefore, it is difficult for this method to detect new network attacks.
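Weighted majority voting sums the weights of the classifiers that vote for each class and selects the class with the largest total. The sketch below assumes one hard label per classifier; the log-odds weighting in `accuracy_to_weight` is one common choice and is an assumption here, not the rule used in the reviewed works. The function names and sample accuracies are hypothetical.

```python
import math


def accuracy_to_weight(accuracy):
    """Log-odds weight; requires 0 < accuracy < 1 (an illustrative choice)."""
    return math.log(accuracy / (1.0 - accuracy))


def weighted_majority_vote(votes, weights):
    """Return the label with the largest total weight, plus all totals."""
    totals = {}
    for label, weight in zip(votes, weights):
        totals[label] = totals.get(label, 0.0) + weight
    winner = max(totals, key=totals.get)
    return winner, totals


# Three hypothetical detectors with validation accuracies 0.9, 0.6, and 0.6.
votes = ["attack", "normal", "normal"]
weights = [accuracy_to_weight(a) for a in (0.9, 0.6, 0.6)]
decision, totals = weighted_majority_vote(votes, weights)
print(decision, totals)

# With equal weights the same votes reduce to a plain majority vote.
plain_decision, _ = weighted_majority_vote(votes, [1.0, 1.0, 1.0])
print(plain_decision)
```

The single accurate detector outweighs the two weaker ones in the weighted case, whereas the plain majority vote sides with the weaker pair, which shows how strongly the outcome depends on the chosen weights.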

3.2.2. Bayesian Estimation

Bayesian estimation has long been applied to DF. It is an excellent method if the prior probabilities are known. To obtain the most accurate and comprehensive information, this method first analyzes the compatibility of the various sensors, removes false information with low confidence, and then makes the Bayesian estimate of the useful information under the assumption that the corresponding prior probabilities are known. The advantages of the Bayesian approach include explicit uncertainty characterization and fast, efficient computation. Moreover, Bayesian networks offer good generalization with limited training data and are easy to maintain when new features or new training data are added [23]. The disadvantage of Bayesian estimation is that it cannot distinguish between unknown and uncertain information, and it can only handle related events. In particular, the prior probabilities are difficult to know in practical applications. When the hypothesized prior probabilities contradict reality, the results of the inference are undesirable, and the inference becomes quite complicated when dealing with multiple hypotheses and multiple conditions. Because of these defects, Bayesian inference methods are now rarely applied in DF.
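How Bayesian estimation combines detector outputs, and why the prior matters, can be shown with a small sketch. It assumes that the detectors are conditionally independent given the true class (a naive Bayes assumption) and that each detector's true positive rate and false positive rate are known; the function name `bayes_fuse` and all rates are hypothetical.

```python
def bayes_fuse(prior_attack, detectors, alerts):
    """Posterior probability of an attack given each detector's alert status.

    detectors: list of (true_positive_rate, false_positive_rate) pairs.
    alerts:    list of booleans, True if the detector raised an alert.
    Assumes detectors are conditionally independent given the true class.
    """
    p_attack = prior_attack
    p_normal = 1.0 - prior_attack
    for (tpr, fpr), alert in zip(detectors, alerts):
        # Multiply in the likelihood of this observation under each hypothesis.
        p_attack *= tpr if alert else (1.0 - tpr)
        p_normal *= fpr if alert else (1.0 - fpr)
    evidence = p_attack + p_normal
    return p_attack / evidence if evidence else prior_attack


# Two hypothetical detectors and a 1% prior probability of attack.
detectors = [(0.9, 0.05), (0.8, 0.1)]
both_alert = bayes_fuse(0.01, detectors, [True, True])
one_alert = bayes_fuse(0.01, detectors, [True, False])
print(round(both_alert, 4), round(one_alert, 4))
```

Changing the prior from 0.01 to another value shifts both posteriors considerably, which reflects the dependence on prior knowledge discussed above.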

3.2.3. D-S Evidence Theory

The Dempster-Shafer evidence theory, abbreviated as D-S theory, is a complete theory of dealing with uncertainty. Its most notable feature is the usage of “interval estimates” rather than “point estimates” for the description of uncertainty information. It shows great flexibility in distinguishing between unknown and uncertain. These advantages make it widely applicable to information fusion, expert systems, intelligence analysis, and multiattribute decision analysis.

In an NIDS using D-S evidence theory, the results of each basic classifier are considered to be different pieces of “evidence.” Different pieces of evidence for the same hypothesis (e.g., network connection categories, such as normal or attack) are integrated to obtain the degree of support for the hypothesis. On the basis of this degree of support, the network connection is finally judged to be normal or an intrusion [31]. Zhao et al. used D-S theory to fuse several basic classifiers [33]. For every kind of intrusion, the correct rates of the fused results are close to, or even higher than, the highest correct rates of all basic detectors, so the fused system achieves a high correct rate for all intrusions. D-S evidence theory is considered a generalization of Bayesian theory. Compared with Bayesian theory, it represents “uncertainty” well and does not need prior probabilities. However, it also has some drawbacks: the pieces of evidence are required to be independent, and the computation can grow exponentially.
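The integration of evidence from two detectors can be sketched with Dempster's rule of combination. The sketch assumes a two-element frame of discernment (normal, attack) and represents each basic probability assignment as a dictionary from `frozenset` hypotheses to masses; the function name `dempster_combine` and the mass values are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule."""
    combined, conflict = {}, 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                # Agreeing evidence supports the intersection of the hypotheses.
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                # Contradictory evidence accumulates as conflict.
                conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("evidence is totally conflicting; rule is undefined")
    # Normalize by the non-conflicting mass.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


ATTACK, NORMAL = frozenset({"attack"}), frozenset({"normal"})
UNKNOWN = ATTACK | NORMAL  # mass on the whole frame expresses ignorance

# Hypothetical evidence from two basic classifiers.
m1 = {ATTACK: 0.6, NORMAL: 0.1, UNKNOWN: 0.3}
m2 = {ATTACK: 0.5, NORMAL: 0.2, UNKNOWN: 0.3}
fused = dempster_combine(m1, m2)
print({tuple(sorted(h)): round(v, 4) for h, v in fused.items()})
```

The mass assigned to `UNKNOWN` is what lets each detector express ignorance separately from disagreement. The nested loop over focal elements also hints at the computational growth mentioned above, since the number of possible subsets grows exponentially with the size of the frame.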

3.2.4. Neural Network

A Neural Network (NN) is a supervised learning method that consists of input neurons, output neurons, and hidden neurons. To represent the relationship between the input neurons and the output neurons, the neural network needs a large amount of labelled data for training in order to obtain an accurate model. NN has the characteristics of self-learning, self-adaptation, self-organization, and fault tolerance, which enable it to solve complex nonlinear problems. Furthermore, an advantage of NN is that it can automatically adjust the connection weights without any domain-specific knowledge, whereas other methods use preselected weights to combine outputs [32]. Therefore, it is well suited to the requirements of multisource DF in NIDS. In network intrusion detection, the classification results of multiple detectors are used as the input neurons, and the output neurons give the integrated classification results. The output of the neural network is used as feedback to adjust the training parameters. With the improved parameters, the detectors can be fused to produce an improved output. The main drawback of NN is the lack of valid criteria for creating, selecting, and combining the results of the base classifiers. For example, one may use a Multilayer Perceptron (MLP) or a radial basis function network, each with a different structure, to find the fusion weights.
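The idea of learning fusion weights from detector outputs can be sketched with the simplest possible network: a single sigmoid output neuron with no hidden layer, trained by gradient descent. This is an illustrative assumption and not the architecture used in the surveyed work; the function names, the learning rate, and the toy data (in which only the first detector is reliable) are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_fusion_neuron(detector_outputs, labels, epochs=500, lr=0.5):
    """Learn one weight per detector plus a bias with per-sample gradient descent."""
    weights = [0.0] * len(detector_outputs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(detector_outputs, labels):
            p = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the logistic loss w.r.t. the pre-activation
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def fuse(weights, bias, x):
    """Fused attack score in (0, 1) for one vector of detector outputs."""
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy data: three detectors emit 0/1 alerts; the true label follows detector 0.
X = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0),
     (1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)]
y = [1, 1, 0, 0, 1, 0, 1, 0]

weights, bias = train_fusion_neuron(X, y)
predictions = [1 if fuse(weights, bias, x) >= 0.5 else 0 for x in X]
print(weights, bias, predictions)
```

After training, the weight of the reliable detector dominates the others, which is the "automatic adjustment of connection weights" described above. A practical fusion network would add hidden layers and be validated on held-out data.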

Please note that DF techniques are not limited to those mentioned above. Other techniques, which can also be applied to fuse network data, are not described in detail here. The performance comparison of different fusion techniques is given in Section 5 based on the criteria proposed in Section 4.

4. Evaluation Criteria of DF Techniques

The application of DF techniques in intrusion detection has received particular attention in the field of network security. Many studies on DF have been conducted to improve the performance of NIDS. However, DF in NIDS still faces many serious challenges, such as how to reduce the complexity of massive data, how to ensure data security, and how to reduce the complexity and improve the efficiency of the fusion itself. Therefore, to facilitate the analysis and comparison of different fusion techniques, we propose a number of criteria for evaluating the performance of fusion techniques in NIDS, based on the traditional criteria of IDS performance. Herein, we introduce these evaluation criteria. Since most experiments for NIDS performance testing are based on a few public datasets, we first introduce the datasets commonly used for intrusion detection.

4.1. Datasets

Since real network data raises personal or organizational privacy issues and cannot easily be used to compare different algorithms, most studies conduct experiments on open datasets. Fusion techniques may perform differently on different datasets. Herein, we introduce some classic datasets as well as newer, more realistic datasets that are used in intrusion detection research.

4.1.1. DARPA Dataset

In order to evaluate different intrusion detection techniques, the MIT Lincoln Laboratory in the United States constructed a complete dataset in 1998, namely, DARPA 1998. The dataset consists of nine weeks of network connection data collected from a simulated US Air Force LAN and is divided into training data and testing data. The testing data contains some types of attacks that do not appear in the training data, which makes the dataset more realistic. The KDD99 dataset was generated for the KDD Cup competition by extracting 41 features from the DARPA 1998 dataset. It is one of the most popular and comprehensive intrusion detection datasets and is widely used to evaluate the performance of NIDSs [54]. It includes a complete training set, a 10% training set, and a testing set. Each connection record in the KDD99 training dataset contains 41 feature attributes and an attack type label. The attack types in the KDD99 training dataset mainly include Denial-of-Service (DoS) attacks, Probe attacks, User-to-Root (U2R) attacks, and Remote-to-Local (R2L) attacks. The KDD99_10% subset is a 10% sample of the KDD99 data, with approximately 490,000 records, and is used in most of the literature. However, KDD99 has many problems; for example, the numbers of different types of attacks are not balanced, and some data records are duplicated or invalid. To address these problems, NSL-KDD was proposed by Tavallaee et al. as a revision of KDD99 [55]. The training and testing datasets of NSL-KDD consist of 125,973 and 22,544 connection records, respectively. Similar to the KDD99 dataset, each record in this dataset has 41 quantitative and qualitative features.

4.1.2. Kyoto 2006+ Dataset

The existing benchmark dataset for network security (KDD99) has a fatal problem: it does not reflect the current network security situation or the latest attack characteristics, because it was generated from a simulated network nearly 20 years ago. To overcome its limitations, the Kyoto 2006+ dataset was presented by Song et al. [56]. It is based on actual traffic data from 2006 to 2009, collected from different types of honeypots installed at Kyoto University. The dataset consists of 14 conventional features, derived from the KDD99 dataset and captured by the honeypots, and 10 additional features. The conventional features include the duration of the session, service, source bytes, and destination bytes, which are meaningful and important for subsequent data processing or decision-making. In addition to the 14 statistical features, the additional features were extracted to make it possible to investigate more effectively what kinds of attacks happened in the networks. The dataset can be used for further analysis and evaluation of NIDSs. The Kyoto 2006+ dataset includes about 50,033,015 normal sessions and 434,343,255 attacks, in which 425,719 attacks are unknown. Each connection in the dataset has 23 features. Unlike the KDD99 dataset, the Kyoto 2006+ dataset was generated in a real network. By using the Kyoto 2006+ dataset, researchers can access more realistic and practical network security attacks.

4.1.3. UNSW-NB15 Dataset

The above-mentioned datasets, especially KDD99 and NSL-KDD, cannot meet the needs of research on the current network security situation. The UNSW-NB15 dataset [57] was created with the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a dataset that consists of real modern normal activities and synthetic contemporary attacks. The data collection period was 16 hours on January 22, 2015, and 15 hours on February 17, 2015. The tcpdump tool was used to capture 100 GB of raw traffic. This dataset contains nine types of attacks, namely, Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. Moreover, the Argus and Bro-IDS tools were used and twelve algorithms were developed to generate a total of 49 features with class labels. There are 175,341 records in the training set and 82,332 records in the testing set. The key characteristic of the UNSW-NB15 dataset is that it is a hybrid of real modern normal behaviors and synthetic attack activities. Thus, this dataset is considered a new benchmark that the NIDS research community can use for evaluating NIDSs [57]. It is worth noting that the IXIA tool contains information about new attacks that is continuously updated from the CVE site, a public dictionary of information security vulnerabilities and exposures. However, it is undeniable that the UNSW-NB15 dataset is more complex than the KDD99 dataset [58].

4.2. Validity

The validity is the key to measuring the quality of the NIDS. The purpose of the application of fusion technology is to improve the performance of intrusion detection. Therefore, the validity can still be used to measure the fusion technology.

The elements of the validity evaluation metrics include TP (the number of positive samples predicted to be positive), FP (the number of negative samples predicted to be positive), FN (the number of positive samples predicted to be negative), and TN (the number of negative samples predicted to be negative). Based on these measurement elements, the accuracy (ACC), precision rate (PR), recall rate (RR), F-Measure, FPR, and FNR are applied to evaluate the performance of the fusion techniques. These metrics’ formulas are listed in Table 1.
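For reference, these metrics can be computed from the four counts as in the Python sketch below. It uses the standard definitions, which we assume correspond to those in Table 1 (Table 1 is not reproduced in this section), and it takes F-Measure to mean the harmonic mean of precision and recall. The function name and the example counts are hypothetical.

```python
def validity_metrics(tp, fp, fn, tn):
    """Compute ACC, PR, RR, F-Measure, FPR, and FNR from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is absent from the sample.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called the detection rate
    return {
        "ACC": ratio(tp + tn, tp + fp + fn + tn),
        "PR": precision,
        "RR": recall,
        "F-Measure": ratio(2 * precision * recall, precision + recall),
        "FPR": ratio(fp, fp + tn),  # false alarms among normal samples
        "FNR": ratio(fn, fn + tp),  # missed attacks among attack samples
    }


# Hypothetical counts: 100 attacks and 900 normal connections.
metrics = validity_metrics(tp=90, fp=10, fn=10, tn=890)
print(metrics)
```

Note that on imbalanced data such as the hypothetical sample above, ACC can be high even when a noticeable share of attacks is missed, which is why FPR and FNR are reported alongside it.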

4.3. Efficiency

In the big data era, communications and activities between people generate high volume and high-dimensional network data that require real-time classification. In NIDS, not only the network behavior classification technology needs to be efficient, but also the efficiency of data fusion is crucial [59], which determines the efficiency of NIDS. Training time and testing time can be used to measure the efficiency of fusion technology. Besides, the number of features produced by feature fusion also measures the efficiency of the fusion technique.
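A minimal way to record these efficiency measures is sketched below in Python. The `MajorityClassifier` is a hypothetical stand-in for any classifier that exposes `fit` and `predict`; the helper names and the tiny dataset are assumptions made for illustration only.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Toy stand-in classifier: always predicts the most frequent training label."""

    def fit(self, rows, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.label_ for _ in rows]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def measure_efficiency(clf, train_rows, train_labels, test_rows):
    """Record the feature count after fusion plus training and testing time."""
    _, train_seconds = timed(clf.fit, train_rows, train_labels)
    predictions, test_seconds = timed(clf.predict, test_rows)
    return {
        "n_features": len(train_rows[0]),
        "train_seconds": train_seconds,
        "test_seconds": test_seconds,
        "predictions": predictions,
    }


report = measure_efficiency(
    MajorityClassifier(),
    [(0, 1), (1, 1), (1, 0)],
    ["normal", "attack", "attack"],
    [(0, 0)],
)
print(report)
```

Timings obtained this way are only comparable when the hardware, dataset, and classifier are held fixed, so that the fusion technique is the only thing that varies.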

4.4. Data Security

In actual network monitoring, DF and classification techniques must address data security issues, such as data confidentiality, integrity, and credibility, in order to provide trustworthy data fusion results. The privacy of individuals or organizations must not be compromised when network data is analyzed and fused. Therefore, data security and data privacy also need to be considered in data fusion.

4.5. Scalability

Digital communications will enter the era of 5G with the rapid technology development. Large-scale heterogeneous networks have become the trend of network development, and mass data and heterogeneous DF technologies are increasingly important. Fusion techniques and frameworks should take scalability into consideration, such as compatibility with different data formats and scalability of memory and CPU, which, therefore, becomes a measure of fusion technologies.

5. Comparisons and Discussions

Based on the above evaluation criteria, we conduct a rigorous review and analysis on 31 related studies, of which 23 are feature fusion techniques and the remaining 8 are decision fusion techniques. The results of research and analysis are listed in Tables 2 and 3, respectively. The experiments reported in the above work were conducted based on published datasets, including KDD99, NSL-KDD, Kyoto 2006+, and UNSW-NB15. We analyzed and compared the performance of different fusion techniques in terms of the feature fusion and the decision fusion based on the proposed criteria and specified metrics. It must be mentioned that the following comparisons are made based on different datasets. In addition, the experimental details in the literature are different, which may affect the performance evaluation of data fusion techniques.

5.1. Comparison of Feature Fusion Techniques

In this part, we review feature fusion techniques based on our proposed criteria and show our evaluation results in Table 2.

The original intention of feature fusion is to reduce the size of the data and improve the operating efficiency of NIDS. Therefore, efficiency is the key measure of the quality of feature fusion. In evaluating the efficiency of feature fusion, we focus on training time rather than testing time, because the training time is usually far longer than the corresponding testing time. We first analyze and compare the training time of classifiers using different feature fusion techniques on different datasets. For the KDD dataset series (DARPA99, KDD99, and KDD99_10%), network intrusion classifiers using GFR, FRM-SFM [18], and CART [23] have shorter training times than those using other techniques; CFS-GA [25] is very efficient on the NSL-KDD dataset; and on the Kyoto 2006+ dataset, PLS [12] helps to reduce the time needed for classifier training. In summary, these fusion techniques are outstandingly efficient in terms of the training time of the network behavior classifier. What they have in common is that they generate few features regardless of the dataset, with a minimum of 4 features in [25]. Among these feature fusion techniques, filter methods are more efficient than wrapper methods, and hybrid methods usually have excellent efficiency.
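The efficiency of filter methods comes from scoring each feature once against the class label without training a classifier, whereas a wrapper retrains the classifier for every candidate feature subset. The Python sketch below illustrates a plain filter that ranks discrete features by mutual information with the label. It is only a relevance ranking under our own assumptions; it is not MIFS, FMIFS, or any other algorithm from the surveyed papers, which add further terms such as redundancy penalties. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


def rank_features(rows, labels, k):
    """Return the indices of the k features with the highest relevance score."""
    scores = [
        (mutual_information([row[j] for row in rows], labels), j)
        for j in range(len(rows[0]))
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [j for _, j in scores[:k]]


# Toy data: feature 0 is constant, feature 1 equals the label, feature 2 is unrelated.
rows = [(0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 1)]
labels = [1, 1, 0, 0]

selected = rank_features(rows, labels, k=1)
print(selected)
```

Each feature is scanned once, so the cost grows linearly with the number of features; continuous features would need to be discretized first.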

In addition to efficiency, validity is also an important measure of feature fusion techniques. For the KDD dataset series, SA-SVM [20], GA-LR [16], (Filter-MISF, FMIFS) [17], PCA [11], MIFS [24], (FRM-SFM, GFR) [18], SVM [9, 10, 19], (GeFS-mRMR, GeFS-CFS) [19], and NN [9] achieved very high accuracy, exceeding 99.20%, with the highest being 99.96% for SA-SVM. In addition, the FPRs of Filter-MISF, GA-LR, Filter, MIFS, MLCFS [24], and SVM are less than 0.50%. We found that GA-LR, SVM, Filter-MISF, and MISF perform very well in terms of validity on the KDD dataset series. As for the NSL-KDD dataset, (FMIFS, MIFS, FLCFS) [24], Chi-Square [21], FVBRM [27], and CFS [30] performed excellently in accuracy, all exceeding 96.75% and reaching up to 99.91% for FMIFS. The FPRs of FMIFS, MIFS, and FLCFS are all lower than 0.53%, and the FAR of Chi-Square is 0.13%. These feature fusion techniques perform outstandingly in NIDSs evaluated on the NSL-KDD dataset. On the Kyoto 2006+ dataset, the accuracy of (FMIFS, MIFS, FLCFS) [24] and (HVS, PCA) [12] was higher than 97.12% in all cases, and the FPRs of FMIFS, MIFS, and FLCFS are all below 0.58%.

A notable fact is that classification accuracy on the new dataset (UNSW-NB15) is not as good as on the older datasets mentioned earlier (such as the KDD dataset series). The major reason is that, compared with the KDD99 dataset, the UNSW-NB15 dataset is considered complex because modern attacks and normal network traffic behave similarly [55]. So far, the effectiveness of network intrusion detection on the UNSW-NB15 dataset has not been good. According to our statistics, the highest accuracy, 81.42%, was reached in [16], where the feature fusion technique and the classifier are GA-LR and C4.5, respectively. The Decision Tree (DT) classifier has indeed performed better than other methods on the UNSW-NB15 dataset [55]. Accuracy is not the only weak point: the FAR of NIDSs on the UNSW-NB15 dataset is also poor. In general, GA-LR, SVM, Filter-MISF, and MISF show excellent validity on the KDD dataset series; FMIFS, MIFS, FLCFS, and Chi-Square are more valid on the NSL-KDD dataset; and the feature fusion techniques with high validity on the Kyoto 2006+ dataset are FMIFS, MIFS, and FLCFS. Because the performance of network intrusion detection on the UNSW-NB15 dataset is not very good, more advanced fusion and classification techniques should be further investigated in order to identify anomalies in this complex dataset.

Unfortunately, the fusion techniques in the literature we have reviewed have not considered the security of data fusion. Data privacy issues were not covered because the existing experiments were based on public datasets. In addition, the scalability of fusion technologies and frameworks was normally not mentioned in past work. However, these properties of data fusion are particularly important in the big data era. More efforts are needed to solve these issues.

5.2. Comparison of Decision Fusion Techniques

In this subsection, we analyze the performance of different decision fusion techniques based on the proposed criteria and show our evaluation results in Table 3.

As Table 3 shows, the training and testing times of the classifiers are not recorded. The reason is that decision fusion techniques fuse the recognition results of the basic classifiers. Although the training and testing times can reflect the efficiency of the classifiers, they cannot reflect the merits of the decision fusion techniques. Besides, the KDD dataset series is used in most of the surveyed literature. Therefore, we mainly analyze the validity of decision fusion techniques based on the KDD dataset series. The accuracy of D-S Evidence Theory [32, 33] and NN [33] is over 99%, which is usually higher than the accuracy of a single basic classifier. The FPR is also reduced through the integration of basic classifiers. The FPR in [31] (D-S Evidence Theory) is as low as 0.19%. As a group, D-S Evidence Theory, Data-Dependent Fusion, NN, RF, and Adaboost show good fusion performance in combining multiple basic decisions.

Like the feature fusion techniques, the existing decision fusion techniques did not consider the credibility of basic decisions or data security in the process of integration, which will affect the reliability of the final results or cause privacy leakage. Besides, most of the literature also fails to analyze the scalability of decision fusion. We believe that these aspects are very important and should attract special attention.

6. Open Issues and Future Research Directions

In recent years, DF has received special attention and has developed rapidly in many fields. In the field of network intrusion detection, scholars have conducted extensive research on DF and have made significant progress. However, current data fusion techniques still face some serious challenges or open issues, which are summarized below according to our literature review.

First, most of the existing research was conducted on open datasets, and the practicability of these fusion algorithms or techniques needs further validation. Few studies used real network data, because doing so easily exposes private information and the results cannot be measured against or compared with other existing work, which is not conducive to the development of data fusion technology. In fact, this is a difficult contradiction, which hinders the further development of network intrusion detection.

Second, in the era of big data, network security monitoring and prevention may need real-time fusion and processing of massive network data. However, the large data communication overhead and long computation delay are a major challenge to overcome.

Third, existing DF technologies do not consider data security, including confidentiality and credibility. Feature fusion techniques could reveal the private information of individuals or organizations, and decision fusion techniques need to identify the credibility of local decisions. None of these issues has been considered in past work.

Fourth, since most studies were conducted on public datasets that have already been preprocessed, few data level fusion techniques are used in intrusion detection. However, actual networks produce a large number of different types of raw data. Thus, data layer fusion becomes indispensable for intrusion detection. Special efforts are expected on data layer fusion for network intrusion detection.

Fifth, there is a lack of studies on the visualization of data fusion. By using visualization algorithms, we can not only understand the features and effectiveness of a fusion technique in depth, but also easily identify the distribution characteristics of the fused data. Few articles use visual methods to analyze classical datasets. In [60], Ruan et al. performed a visual analysis of the KDD99 dataset using MDC and PCA techniques to clearly identify normal and attack clusters. Based on this research, we believe that it is also necessary to provide a clear and comprehensive visual representation of data fusion.

In addition, based on the above open issues, we further propose a number of promising research directions in the field of data fusion for network intrusion detection.

First, the improvement of data fusion technology depends on new datasets to evaluate and verify. Most of the fusion techniques and intrusion detection technology show excellent performance on some old datasets, such as KDD99 and NSL-KDD. However, these datasets are out of date and do not represent the current network security situation, which deviates from the actual network security detection. More research needs to be done on new dataset collection, such as UNSW-NB15. The existing problem is that the performance of feature fusion based on the UNSW-NB15 dataset is not good. We should further study more advanced or appropriate fusion techniques to better identify abnormalities from complex network data.

Second, fusion techniques for big network data should be investigated. Current fusion techniques struggle to integrate, effectively and adaptively, network data that arrives at high velocity and in a variety of formats and types. In the era of big data, in addition to the large amount of data, the network data that needs to be collected comes from different sources in different types of networks. Therefore, heterogeneous network data needs to be collected to support research on more advanced fusion methods.

Third, a universal, flexible, and extensible fusion framework should be studied. There are many kinds of data fusion technologies, and the principles and mathematical theories of some of them are not easy to understand. Therefore, a simple, easy-to-use, universal, and easy-to-expand network data fusion architecture is worth studying. It could modularize mature fusion techniques and provide open interfaces for new fusion methods and architectures, and thus greatly promote the development of data fusion in the field of intrusion detection.

Fourth, data security in data fusion should be ensured. Most of the existing research is based on public datasets, and security issues were not considered at all. In an actual network, network data includes personal or organizational information, which is easily revealed during the integration process, and the credibility and integrity of network data are difficult to guarantee. Data security and privacy should be protected in order to achieve trustworthy data fusion.

Finally, data layer fusion is an essential part of study towards efficient and practical data fusion in real-time network intrusion detection. The data layer fusion has not been seriously studied by relevant literature because of the widespread use of public datasets. The study of data layer fusion is also very significant, especially for practical applications. However, it is very difficult to collect and evaluate the original network data containing various modern attacks.

7. Conclusion

In this article, we categorically presented a detailed review on the feature fusion techniques and the decision fusion techniques used in NIDSs. A specific description of DF in the field of intrusion detection was presented in order to motivate this work. Based on the literature study, we proposed the evaluation criteria of data fusion techniques in terms of NIDS. The performance of different data fusion techniques is measured using the proposed criteria. We found that, in the feature fusion, in addition to some excellent fusion techniques, such as SVM and MIFS, the improved types of fusion techniques and hybrid fusion techniques are generally efficient and valid. For the decision fusion techniques, D-S Evidence Theory, NN, RF, and Adaboost can combine multiple decisions more precisely than other methods regarding the studies based on KDD dataset series. In addition, we found many effective classification algorithms in NIDS, namely, RF, C4.5, NN, and SVM, as well as their variants. Unfortunately, the current fusion techniques normally did not consider the security and the scalability of DF.

DF has been regarded as one of the most important technologies in improving the performance of the NIDSs. The use of DF can well alleviate the defects of network intrusion detection and improve the comprehensive performance of NIDSs. However, there are still many deficiencies in current DF techniques. Based on our review, we pointed out the main challenges and promising future research directions in this field of research. In summary, this article provides a good reference for researchers and practitioners in the field of network intrusion detection.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is sponsored by the National Key Research and Development Program of China (Grant 2016YFB0800700), the NSFC (Grants 61602359, 61672410, and U1536202), the Project Supported by Natural Science Basic Research Plan in Shaanxi Province of China (Program no. 2016ZDJC-06), the Fundamental Research Funds for the Central Universities (JB181503), the 111 Project (Grants B08038 and B16037), and Academy of Finland (Grant no. 308087).