Abstract

As computer networks continue to grow rapidly, preserving the confidentiality, integrity, and availability of information systems is essential. Intrusion detection systems (IDSs) are widely used to monitor and secure networks. The two major limitations of existing intrusion detection systems are high rates of false-positive alerts and low detection rates on zero-day attacks. To overcome these problems, we need intrusion detection techniques that can learn and effectively detect intrusions. Different researchers have proposed hybrid methods based on machine learning techniques. These methods combine the strengths of single detection methods and offset their weaknesses. This paper therefore reviews 111 related studies published between 2012 and 2022 that focus on hybrid detection systems. The review points out the existing gaps in the development of hybrid intrusion detection systems and the need for further research in this area.

1. Introduction

The growth of the Internet has increased information sharing, making network security a pressing concern. Attackers around the globe target computer systems with the motive of deploying attacks. The security of an electronic device is breached when a successful attack occurs. Intrusion is defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource” [1]. The integrity aspect of a given infrastructure ensures that information remains unaltered by unauthorized users. Availability incorporates all aspects of the infrastructure that make information readily available to users in the system. Confidentiality implies that the information in a given system is protected from unauthorized access and viewing by external parties. Therefore, a computer network is considered fully secured when the core objectives of these three attributes are sufficiently met. To help achieve these objectives, intrusion detection systems have been developed with the primary intent of monitoring incoming traffic in computer networks for potential malicious intrusions.

An intrusion detection system (IDS) scans information system resources and reports any malicious activity. More advanced IDSs can also act against attacks by blocking malicious users or activities from accessing computer resources. There are two major categories of intrusion detection systems: misuse based and anomaly based. Misuse-based IDSs flag known attacks by matching traffic against patterns of well-known attacks or weak spots of the system [2]. Their strength is the ability to detect many or all known attacks with great precision; in general, they have a high detection rate and a low false alarm rate compared to anomaly-based systems. Their major challenge is the inability to flag new forms of attack, such as emerging or zero-day attacks [3]. The anomaly-based technique stores the normal behavior of a user in a database and compares it with the user's current behavior [4]. A substantial difference indicates that something is wrong or abnormal. The major advantage of anomaly detection is that it does not require information about known attacks and can therefore detect new forms of attack. Its drawback is a high false alarm rate compared to misuse-based IDS.

Hybrid intelligent systems have been developed to solve the challenges of the existing intrusion detection systems, such as the high rate of false-positive alerts and the low detection rate of novel attacks. A hybrid IDS combines misuse-based and anomaly-based techniques [5]. The hybrid technique addresses the disadvantages of the two legacy IDSs. Research shows that hybrid detection systems perform better than a single IDS.

Despite their proven performance, hybrid intrusion detection systems remain largely unexplored, as seen from the small number of existing systematic literature reviews on the topic. This work, therefore, attempts a comprehensive systematic literature review of hybrid intrusion detection systems published between 2012 and 2022, with the objective of pointing out existing gaps in the development of these systems.

This study is arranged as follows. Section 2 introduces and discusses IDS. Section 3 provides a discussion on hybrid detection techniques. Section 4 outlines the methodology adopted in this paper. Section 5 describes the review process. Section 6 presents and discusses the findings. Section 7 concludes by pointing out the existing gaps in the reviewed literature and insights for future research. Table 1 summarizes all hybrid intrusion detection systems between 2012 and 2022. Finally, Table 2 lists all abbreviations in this study.

2. Intrusion Detection Systems

Denning introduced the technique of detecting intrusion, and since then researchers have worked hard to automatically detect intrusions in network systems [6]. Intrusion detection has been defined as the technique of using artificial intelligence, machine learning, and database systems to uncover malicious patterns in large datasets [2]. IDS can be broadly classified into two major categories: anomaly-based IDS and misuse-based IDS. Recently, other methods have emerged through the integration of anomaly and misuse IDSs to yield more categories.

2.1. Anomaly-Based Intrusion Detection Systems

Anomaly intrusion detection systems profile the normal behavior of a system. They monitor the normal operations of the system, and if they detect an anomaly, a flag is raised. Instead of keeping all patterns of well-known malicious activity and updating them as new patterns emerge, anomaly detection systems outline “normal” operations of a system and flag anything that deviates from the outline [2]. According to [7], anomaly IDS contains three stages: parameterization, training, and detection. In the parameterization stage, the data are formatted to capture the normal behavior of the device. After parameterization, the model is trained to represent the normal behavior. In the detection stage, the model detects and flags any deviation from the normal behavior based on the parameterized data [7].
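The three stages can be illustrated with a minimal sketch. It assumes a single numeric feature (requests per minute) and a z-score threshold; the function names, the feature, and the threshold value are illustrative assumptions and are not taken from [7].

```python
from statistics import mean, stdev

def parameterize(raw_events):
    """Parameterization: turn raw observations into numeric features.
    Each event is assumed to be a dict with a 'requests_per_min' field."""
    return [float(e["requests_per_min"]) for e in raw_events]

def train(normal_features):
    """Training: summarize normal behavior as a mean and standard deviation."""
    return {"mean": mean(normal_features), "std": stdev(normal_features)}

def detect(profile, value, threshold=3.0):
    """Detection: flag a value that deviates too far from the normal profile."""
    if profile["std"] == 0:
        return value != profile["mean"]
    z_score = abs(value - profile["mean"]) / profile["std"]
    return z_score > threshold

# Hypothetical observations of normal traffic.
normal_traffic = [{"requests_per_min": v} for v in (48, 52, 50, 49, 51, 50, 47, 53)]
profile = train(parameterize(normal_traffic))

flag_normal = detect(profile, 51)    # close to the learned profile
flag_burst = detect(profile, 400)    # far outside the learned profile
```

A real anomaly IDS would use many features and a richer model, but the separation into the three stages stays the same.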

Different intrusion detection mechanisms have been used in the development of the anomaly IDS. Mishra and Yadav [8] outlined the following techniques: data mining techniques, machine learning-based techniques, and statistical approaches. In these techniques, some researchers have used single algorithms while others have opted to integrate algorithms to improve the performance of the IDS [8].

Atefi et al. [9] developed anomaly detection based on profile signature using genetic algorithm and support vector machine algorithms. SVM outperformed GA in terms of precision rate. The researchers combined the two algorithms to form a hybrid IDS. The evaluation of the hybrid IDS produced better performance compared to the single algorithms.

Khoei et al. [10] investigated the application of three types of ensemble learning techniques for anomaly IDS. The three techniques applied were bagging, boosting, and stacking. The performance of the three techniques was compared with that of decision tree (DT), Naïve Bayes (NB), and K-nearest neighbor (KNN). The results showed that stacking-based ensemble learning techniques outperformed the traditional learning techniques in terms of detection rate, false alarm rate, miss detection rate, and accuracy rate.
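Of the three ensemble techniques, bagging is the simplest to sketch. The example below is an illustration only and not the implementation evaluated in [10]: it assumes a single numeric feature, uses a one-threshold "stump" as the base learner, and combines the stumps by majority vote. All names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a one-feature threshold classifier: predict 1 (attack) when x > t.
    Tries the midpoint between neighbouring values and keeps the threshold
    with the fewest training errors."""
    xs = sorted({x for x, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in samples)

    return min(candidates, key=errors)

def bagging_fit(samples, n_models=15, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(samples) for _ in samples])
            for _ in range(n_models)]

def bagging_predict(thresholds, x):
    """Combine the individual stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (feature value, label), label 1 = attack.
train_data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
              (7.0, 1), (7.5, 1), (8.0, 1), (9.0, 1)]
ensemble = bagging_fit(train_data)
```

Boosting and stacking differ in how the base learners are trained and combined: boosting reweights misclassified samples between rounds, while stacking trains a second-level model on the base learners' outputs.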

Rakshe and Gonjari [11] developed an intrusion detection model based on SVM and random forest algorithms. The two algorithms were used for classification purposes. The models were evaluated using NSL-KDD. The models recorded detection accuracy of more than 95%. The performance of the two models was compared, and the random forest algorithm performed better than SVM in the classification of traffic.

Kumar et al. [12] developed an anomaly intrusion detection system based on four algorithms, namely, Naïve Bayes, ID3, MLP, and ensemble learning. The models were evaluated using CICIDS2017 dataset. The ensemble model was developed by combining NB, ID3, and MLP. The metrics used in the evaluation of the models were precision, recall, accuracy, and F1 score. ID3 (decision tree) performed better compared to the other models.

Once anomaly intrusion detection systems have been developed, they do not need regular updates unless a major user or system change has been done. Anomaly IDS can flag new forms of attacks, unlike the misuse IDS. Due to the above-mentioned characteristic of anomaly intrusion detection systems, they are considered to be more effective compared to their counterpart misuse intrusion detection system whose performance highly depends on stored patterns that require regular updates.

Profile creation is the main issue in anomaly intrusion detection because there is no fixed normal action or behavior of the user, and different users use computer systems differently. Capturing the profile of different users as normal has proven to be difficult, hence creating the main limitation of anomaly IDS. With the limitation arises the issue of high false-positive alerts because any abnormal action by the user is considered an attack. Research in this area is focused on how to profile normal action and how to reduce high false-positive rates.

2.2. Misuse/Signature Intrusion Detection System

Misuse intrusion detection systems depend on well-known attack signatures to capture attacks and to flag intrusions using well-known patterns. The well-known signatures are captured and labeled to assist in intrusion detection. The labeled patterns are stored in a database that needs regular updates when new patterns are captured. For detection of attacks, misuse-based IDS compares the received traffic with the stored signatures in the database; if the patterns are similar, the traffic is marked as an intrusion; else, the traffic will be marked as normal.
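The matching step can be sketched as a lookup against a signature store. This is a minimal illustration, assuming signatures are plain substrings of the payload; the signature names and patterns are hypothetical, and production systems use far richer rule languages.

```python
# Hypothetical signature database: name -> byte pattern expressed as a string.
SIGNATURES = {
    "sql_injection": "' OR '1'='1",
    "path_traversal": "../../",
}

def classify(payload, signatures=SIGNATURES):
    """Return the name of the first matching signature, or 'normal'."""
    for name, pattern in signatures.items():
        if pattern in payload:
            return name
    return "normal"

def add_signature(name, pattern, signatures=SIGNATURES):
    """Updating the database is what lets the IDS recognize a new attack."""
    signatures[name] = pattern
```

Traffic that matches no stored pattern is marked as normal, which is why an attack without a signature passes undetected until the database is updated.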

Unlike anomaly-based IDSs, misuse IDSs are easy to create because the pattern of the malicious code is known. The malware code is analyzed for a unique pattern, and this pattern is used to create the baseline signature for detection. This gives misuse-based IDSs a high detection rate, as they depend on well-known information. Users must keep updating the corresponding databases with new signatures.

Over the years, research has been done in the area of misuse intrusion detection. Zhang et al. [13] proposed a misuse intrusion detection system for defending LAN users using the XGBoost algorithm. To develop and evaluate the model, the researchers used real-time data collected from LANs in 10 different Asian countries. The model was evaluated using data collected from 45 networks. The model recorded 97.5% overall precision and 97.5% overall recall. In addition, the researchers observed that LAN intrusion detection is affected by the ARP, MDNS, and NBNS protocols. The main advantage of this model is that it was evaluated using real-time network data, which means that it can be deployed in existing LANs as it is or with minor changes.

Taher et al. [14] used the artificial neural network (ANN) and support vector machine (SVM) techniques to develop a signature-based intrusion detection model. The two algorithms were compared to find the one with the best classification performance. The NSL-KDD dataset was used for the evaluation of the models. According to the researchers, the ANN-based model outperformed the SVM model in classification. The ANN-based model recorded a detection rate of 94.02%. The model can be further investigated using an updated dataset.

Erlacher and Dressler [15] proposed Internet Protocol Flow Information Export (IPFIX) signature-based intrusion detection known as FIXIDS. The model uses the newly added HTTP-related flow information elements (IEs) to detect intrusion in high-speed networks. The model outperformed Snort in general. This technique can be investigated further in future for standard flow.

Tug et al. [16] used blockchain technology to propose a collaborative signature-based intrusion detection system referred to as CBSigIDS. The model uses blockchain technology to incrementally update and distribute a secure signature database in a collaborative network. Evaluation of the model shows that blockchain technology can be used to improve the performance of signature-based IDS in a secure manner. In future, research can be done on the application of blockchain technology in anomaly IDS.

The main limitation with misuse intrusion detection systems is that they cannot detect zero-day attacks or new forms of attacks. At the point of realization of a new form of attack and the creation of the signature of the attack, most of the computer systems are already left vulnerable. Misuse intrusion detection systems also require large storage memory to store the signature library.

Research on this type of intrusion detection system focuses on how to reduce the storage consumed by the signature database. Another potential area of research is how to enable this type of IDS to detect zero-day attacks.

3. Hybrid Intrusion Detection System

With the evolving variety of attacks, the two classical IDSs mentioned above cannot protect our information systems effectively. New methods of combining different intrusion detection systems to improve their effectiveness have been proposed. Research has shown that combined algorithms perform better than single algorithms [17].

The goal of hybrid intrusion detection systems is to combine several detection models to achieve better results. A hybrid intrusion detection system consists of two components. The first component processes the unclassified data. The second component takes the processed data and scans it to flag out intrusion activities [18].

Hybrid intrusion detection systems are based on combining two learning algorithms. Each learning algorithm possesses unique features, which assist in improving the performance of the hybrid [19]. Hybrid IDSs can be broadly categorized into cascaded hybrid, integrated-based hybrid, and cluster + single hybrid.

In [5], Kim et al. proposed a hybrid intrusion detection system based on signature-based and anomaly detection components. In the first stage of the model, a misuse detection component was applied to detect known attacks based on the captured patterns. This component was based on the C4.5 decision tree algorithm. The second stage consisted of an anomaly detection component to compensate for the shortcomings of the misuse detection component. To develop the second component of the model, multiple one-class SVM algorithms were used. The performance of the model was tested using the KDD Cup 99 dataset. The model performed better than the single traditional IDS.
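The two-stage arrangement can be sketched as a simple cascade. The example below is not the model in [5]: a fixed rule set stands in for the C4.5 misuse component and a deviation threshold stands in for the one-class SVM anomaly component. Field names, ports, and thresholds are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical stand-in for learned misuse rules.
KNOWN_ATTACK_PORTS = {23, 2323}

def misuse_stage(record):
    """Stage 1: flag traffic that matches a known attack pattern."""
    return record["dst_port"] in KNOWN_ATTACK_PORTS

def build_anomaly_stage(normal_records, threshold=3.0):
    """Stage 2: model normal byte counts and flag large deviations."""
    sizes = [r["bytes"] for r in normal_records]
    mu, sigma = mean(sizes), stdev(sizes)

    def anomaly_stage(record):
        return abs(record["bytes"] - mu) > threshold * sigma

    return anomaly_stage

def hybrid_detect(record, anomaly_stage):
    """Run the misuse stage first; only unmatched traffic reaches stage 2."""
    if misuse_stage(record):
        return "known attack"
    if anomaly_stage(record):
        return "unknown attack"
    return "normal"

normal = [{"dst_port": 443, "bytes": b} for b in (500, 520, 480, 510, 490)]
anomaly_stage = build_anomaly_stage(normal)
```

The design choice is that the anomaly stage only sees traffic the misuse stage did not recognize, so it acts as a safety net for attacks without a signature.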

In [20], the researchers combined feature selection and classification techniques to increase the detection rate while reducing the false alarm rate. In the first stage of the hybrid, chi-square was used for feature selection. The goal of this stage was to reduce the number of features in the dataset while retaining the important features that capture the attacks. In the second stage, a multiclass support vector machine (SVM) algorithm was used for classification to improve the classification rate. The model was evaluated using the NSL-KDD dataset, with the results showing that the model recorded a high detection rate with a low false alarm rate.
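The chi-square selection stage can be sketched as follows. This is a simplified illustration that assumes binary features and a binary label, whereas [20] works with the NSL-KDD feature set; the feature names and data are hypothetical.

```python
def chi_square(feature, labels):
    """Chi-square statistic for a binary feature against a binary label."""
    n = len(labels)
    observed = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for f, y in zip(feature, labels):
        observed[(f, y)] += 1
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row_total = observed[(f, 0)] + observed[(f, 1)]
            col_total = observed[(0, y)] + observed[(1, y)]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat

def select_top_k(features, labels, k):
    """Keep the k features most strongly associated with the label."""
    scores = {name: chi_square(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

labels = [1, 1, 1, 1, 0, 0, 0, 0]             # 1 = attack, 0 = normal
features = {
    "syn_flag_set":  [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with label
    "large_payload": [1, 1, 1, 0, 1, 0, 0, 0],  # partly aligned
    "is_weekday":    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of label
}
selected = select_top_k(features, labels, k=2)
```

A feature that is independent of the label scores zero and is dropped, which is how this stage shrinks the input to the classifier.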

In [21], Khraisat et al. developed a hybrid detection model based on a C5.0 decision tree classifier and one-class support vector machine (OC-SVM). The model consisted of two major components. A C5.0 decision tree classifier was used to develop the first component of the model for misuse detection. The second component was developed using OC-SVM for anomaly detection. The researchers tested the performance of the model using the NSL-KDD and Australian Defence Force Academy (ADFA) datasets, and the results showed that the hybrid model was superior to single-algorithm models.

Khan proposed a hybrid intrusion detection model based on convolutional neural network (CNN) and recurrent neural network (RNN). The research aimed to improve feature extraction, which is fundamental to the performance of intrusion detection systems. CNN was used in the first phase to extract local features in the dataset, and RNN was used in the second phase to extract temporal features. This technique resolved the issue of data imbalance in the available dataset. To test the performance of the model, the more recent CSE-CIC-IDS2018 dataset was used. The model outperformed other intrusion detection models, with an intrusion detection accuracy of 97.75% [22].

In [23], the researchers proposed a hybrid intrusion detection model for smart home security. The model consisted of two components. The first component applied machine learning algorithms to real-time intrusion detection. Algorithms used in this component included random forest, XGBoost, decision tree, and K-nearest neighbors. The second component applied the misuse intrusion detection technique for detection of known attacks. To test the performance of the model, the CSE-CIC-IDS2018 and NSL-KDD datasets were used. The model recorded an outstanding performance for detection of both network intrusion and user-based anomalies in smart homes.

In [24], the authors proposed a hybrid intrusion detection system for online network intrusion detection. The researchers integrated improved particle swarm optimization and regularized extreme learning machine (IPSO-IRELM). In this study, IPSO was used to optimize IRELM. The model was tested using the UCI balance dataset, the NSL-KDD dataset, and the UNSW-NB15 dataset. The model recorded a high accuracy rate as well as the capability to classify minority classes.

In [25], a hybrid detection model based on Spark ML and the convolutional-LSTM (Conv-LSTM) network was proposed. The model consists of two components: the first component uses Spark ML to detect anomaly intrusion while the second component deploys Conv-LSTM for misuse detection. To investigate the performance of the model, the researchers used ISCX-UNB dataset. The model recorded an outstanding performance of 97.29% accuracy in detection. The researchers proposed that the model can be evaluated further using a different dataset as a way of attempting to reproduce the results.

In [26], the authors developed an intrusion detection system by combining firefly and Hopfield neural network (HNN) algorithms. The researchers used Firefly algorithm to detect denial-of-sleep attacks through node clustering and authentication.

In [27], the researchers proposed a hybrid detection system for VANET (vehicular ad hoc network). The model consisted of two components. The researchers deployed a classification algorithm in the first component and a clustering algorithm in the second. In the first stage, they used random forest to detect known attacks through classification. For the second stage, they deployed a weighted K-means algorithm for the detection of anomaly intrusion. The model was evaluated using the recent CICIDS2017 dataset. The researchers proposed further evaluation of the model in real-world environments. In another work [28], the researchers integrated the random forest algorithm with an unsupervised clustering algorithm based on coresets. This model was used for detection of real-time intrusions in VANET. Compared with other models, the model recorded better performance in terms of accuracy, computational time, and detection rate.

Barani [29] proposed a hybrid detection model based on genetic algorithm and artificial immune system (AIS) (GAAIS) for intrusion detection on ad hoc on-demand distance vector-based mobile ad hoc network (AODV-based MANET). The model was evaluated using different routing attacks. Compared with other models, the model improved detection rate and decreased the false alarm rate.

In [30], the researchers integrated the firefly algorithm with a genetic algorithm for feature selection in MANET. To classify the features selected in the first stage of the model as either intrusion or normal, the researchers used a replicator neural network. The model performance was compared to that of fuzzy-based IDS. The model outperformed fuzzy-based IDS in accuracy as well as precision and recall.

4. Methodology

The methodology consists of three primary phases, planning, conducting, and reporting, as outlined by Kitchenham and Charters [31]. The three phases can be explained as follows:
(a) Planning: the main goal of this phase is to define the research goals and the review protocol. The review protocol defines how the review will be done and contains all the elements of the review.
(b) Conducting: once the protocol has been defined, the review process can start. The main stages in this phase include identifying relevant research, selecting primary studies, extracting the required data, and synthesizing the data.
(c) Reporting: finally, in reporting the review, data extraction strategies are defined and the steps to be used in data synthesis are outlined.

5. Review Process

5.1. Research Questions (RQs)

The main objective of this paper was to analyze the hybrid intrusion detection system techniques that were developed from 2012 to 2022. The following research questions were developed in line with the main objective:
(a) RQ1: which hybrid techniques have been used in intrusion detection systems? Objective: to identify techniques used in the development of hybrid IDS.
(b) RQ2: which classical algorithms were used in the integration of the hybrid? Objective: to identify commonly used algorithms in hybrid IDS.
(c) RQ3: which evaluation metrics are used in the hybrid intrusion detection systems? Objective: to identify commonly used metrics in the evaluation of IDS.
(d) RQ4: which datasets are used in hybrid intrusion detection system research? Objective: to identify commonly used datasets in hybrid IDS.

5.2. Search Strategy

Research shows that it is important to be guided by a search strategy in a systematic review [31]. In defining our search strategy, we were guided by the steps outlined by Thyago et al. [32]. The two main steps in this process are defining the keywords and the sources of the study. The keywords were derived from the research questions. The keywords and synonyms used are as follows:
(1) Hybrid OR Integrated OR Cascaded
(2) Intrusion detection System OR IDS
(3) Artificial Intelligence OR Machine Learning

We used the Boolean operators (OR) and (AND) to define the search string. The operator (OR) was used between synonyms, while (AND) was used between the keywords. The following search strings were defined:
(1) “Hybrid” OR “Integrated” OR “Cascaded”
(2) “Intrusion detection System” OR “IDS”
(3) “Artificial Intelligence” OR “Machine Learning”

Finally, the search strings were combined as follows: ((1) AND (2) OR (1) AND (2) AND (3)).
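The combination can be sketched programmatically. The snippet below builds the combined string from the three keyword groups; the helper names are hypothetical, and it assumes that AND binds more tightly than OR, as in standard Boolean logic. Under that assumption, the truth-table check shows that the combined expression matches the same documents as (1) AND (2) alone, because the second clause only narrows the first.

```python
from itertools import product

GROUPS = {
    1: ["Hybrid", "Integrated", "Cascaded"],
    2: ["Intrusion detection System", "IDS"],
    3: ["Artificial Intelligence", "Machine Learning"],
}

def or_group(terms):
    """Join synonyms with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def and_groups(*group_ids):
    """Join keyword groups with AND."""
    return "(" + " AND ".join(or_group(GROUPS[g]) for g in group_ids) + ")"

# ((1) AND (2)) OR ((1) AND (2) AND (3))
query = and_groups(1, 2) + " OR " + and_groups(1, 2, 3)

# Truth-table check over every combination of group matches.
same_as_1_and_2 = all(
    ((a and b) or (a and b and c)) == (a and b)
    for a, b, c in product([False, True], repeat=3)
)
```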

The researchers used the following digital libraries, which are recognized for publishing research in the area of intrusion detection systems [33]:
(i) The Institute of Electrical and Electronics Engineers (IEEE) Library (https://ieeexplore.ieee.org/)
(ii) The Association for Computing Machinery (ACM) Digital Library (https://dl.acm.org/)
(iii) Springer Link (https://link.springer.com/)
(iv) Science Direct (https://www.sciencedirect.com)

Several searches were done on the libraries listed above. The search strings that yielded the best results on each database are as follows:
(i) IEEE: ((“Hybrid” OR “Integrated” OR “Cascaded”) AND (“Intrusion detection System” OR “IDS”) AND (Artificial Intelligence OR Machine Learning))
(ii) ACM: [All: [[all: “hybrid”]] OR [All: [all: “integrated”]] OR [[All: [all: “cascaded”]]] AND [[All: [[all:] OR [All: “intrusion detection system”] OR [All:]]]] OR [[All: [all: “ids”]]] AND [[All: [publication] OR [All:]]]]
(iii) Springer Link: ((“Hybrid” OR “Integrated” OR “Cascaded”) AND (“Intrusion detection System” OR “IDS”) AND (Artificial Intelligence OR Machine Learning))
(iv) Science Direct: ((“Hybrid” OR “Integrated” OR “Cascaded”) AND (“Intrusion detection System” OR “IDS”))

The initial search obtained 2,084 articles. Table 3 shows the number of articles obtained from the digital database.

5.3. Publication Selection Criteria

For the inclusion criteria, all primary studies that examined hybrid intrusion detection systems and were published between January 2012 and February 2022 were included in the study. Single algorithm studies, secondary studies, short papers, duplicated studies, non-English studies, and incomplete papers were excluded. In addition, all studies that were not relevant to the research questions were excluded from the research. Table 4 summarizes the inclusion/exclusion criteria.

5.4. Study Selection Process

In the first stage of the selection process, papers were retrieved using the established search strings and screened based on their title, abstract, and keywords. Papers that passed the first stage were subjected to a second stage, which was based on reading the entire text of the paper.

The primary reviewer conducted the selection process. The secondary reviewer conducted an inter-rater reliability test on the selected papers to make sure that the primary reviewer's selection was free of bias. In the first step, the reviewers excluded 1,875 studies that did not satisfy the inclusion criteria. Of those excluded, 1,786 were out of scope, 8 were grey studies, 27 were single algorithm studies, 53 were short papers, and 1 was a duplicate paper. In the second step, the reviewers excluded 98 studies that did not satisfy the inclusion criteria. Of those excluded, 78 were out of scope, 2 were single algorithm studies, 17 were short papers, 1 was a non-English paper, and 1 was an incomplete paper. In total, 111 papers were selected for the review, as shown in Table 5.
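The selection funnel can be traced with a short calculation that uses only the totals reported above (the initial search result and the number of studies excluded at each step); the variable names are illustrative.

```python
initial_results = 2084   # articles returned by the initial search
excluded_step1 = 1875    # excluded on title, abstract, and keywords
excluded_step2 = 98      # excluded after reading the full text

after_step1 = initial_results - excluded_step1
selected = after_step1 - excluded_step2
```

The first step leaves 209 candidate papers for full-text reading, and the second step leaves the 111 papers included in the review.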

5.5. Data Extraction Process

The objective of this step is to provide an answer to the research questions for each paper in a semi-structured way. To avoid bias in the data extraction process, a data extraction form was developed. The data extraction form captured key elements to answer the research questions as shown in Table 6.

6. Results and Discussion

6.1. Year of Publication

Figure 1 shows the number of publications per year. The year with the most publications is 2020. The graph indicates a continuous increase in research in the field of hybrid IDS. This can be attributed to the desire to improve the efficiency and effectiveness of IDS.

6.2. Research Questions (RQs)

In this section, the outcome of the literature review will be analyzed and discussed as per the research questions.

RQ1: which hybrid techniques have been used in intrusion detection systems?

In this question, the research sought to understand which techniques were used in the development of the hybrid IDS. Research shows that hybrid approaches can be broadly categorized into three: cascaded hybrid, integrated-based hybrid, and cluster + single hybrid.

As shown in Table 7, the most used hybrid technique was the cascaded hybrid technique (72 papers), followed by the integrated-based hybrid technique (36 papers) and the cluster + single technique (3 papers).
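The share of each technique follows directly from these counts. A minimal calculation, using only the paper counts reported above:

```python
counts = {"cascaded": 72, "integrated-based": 36, "cluster + single": 3}

total = sum(counts.values())
# Percentage of reviewed papers per technique, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
```

The counts add up to the 111 reviewed papers, giving roughly 64.9% for cascaded, 32.4% for integrated-based, and 2.7% for cluster + single.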

RQ2: which classical algorithms were used in the integration of the hybrid?

In this question, the researchers sought to understand the classical algorithms applied in hybrid techniques. It was established that the most used algorithms in hybrid detection systems were SVM, DT, K-means, Naïve Bayes, KNN, GA, and PSO, as shown in Table 8. The rest of the algorithms appeared fewer than five times in the selected papers.

RQ3: which are the evaluation metrics used in the hybrid intrusion detection system?

A metric is a measure of the performance of an ML algorithm on a given dataset. Metrics are mostly used to compare the performance of different models and determine the most effective one.

Accuracy is a frequently applied metric. It is the ratio of correctly classified outcomes, both positive and negative, to the total number of outcomes.

True-positive rate (TPR), also known as recall, sensitivity, or detection rate, is the fraction of actual positive observations that are correctly detected.

False-positive rate (FPR), which is referred to as false alarm rate (FAR) or fall-out, is the fraction of wrongly predicted positive outcomes compared to actual negative observations.

True-negative rate (TNR) is also called specificity. This metric is the ratio of correctly predicted negative outcomes compared to actually negative observations.

False-negative rate (FNR) is also called miss rate. This metric is the ratio of wrongly predicted negative outcomes compared to positive observations.

F-score/F-measure is a measure that combines a model’s precision and recall into a single figure; the F1 score is their harmonic mean. F1 scores range from 0 to 1, with 1 being perfect and 0 indicating poor performance.

Precision is the ratio of correctly predicted positive outcomes to all positive predictions.

Time is a metric used to measure the efficiency of a model. This can be done either during the training stage or during the evaluation stage.
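The ratio-based metrics above can all be derived from the four confusion-matrix counts: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below shows one way to compute them; the function name and the example counts are hypothetical and do not come from any reviewed study.

```python
def ids_metrics(tp, fp, tn, fn):
    """Compute common IDS evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # TPR, sensitivity, detection rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": recall,
        "fpr": fp / (fp + tn),       # false alarm rate, fall-out
        "tnr": tn / (tn + fp),       # specificity
        "fnr": fn / (fn + tp),       # miss rate
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical test set: 100 attacks and 900 normal connections.
m = ids_metrics(tp=90, fp=5, tn=895, fn=10)
```

Note that TPR and FNR sum to 1, as do TNR and FPR, so reporting one of each pair is sufficient.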

This study found that three metrics were used in more than 50% of the studies, as shown in Figure 2. These are accuracy, detection rate, and false alarm rate. Accuracy tests the performance of a model in terms of the number of correctly predicted results. The higher the accuracy, the better the model. This explains why the metric has been used in most of the studies. TPR or detection rate measures the capability of a model to flag attacks. This is a very important metric, as the objective of any intrusion detection system is to flag attacks. Lastly, false alarm rate (FAR) is the measure of false alarms produced by the model. The more false alarms, the poorer the model. Designers can use the metric to improve the performance of the model by reducing or eliminating false alarms.

The above three metrics form the key evaluation metrics for any detection model. With the three metrics, it is possible to determine the overall performance of a model.

RQ4: which datasets were used in hybrid intrusion detection system research?

Figure 3 depicts the datasets used in hybrid intrusion detection system research. The dataset is one of the most important elements in the development of anomaly-based intrusion detection systems. Despite that, the review indicates that researchers are using old datasets in developing hybrid intrusion detection systems. The two most commonly used datasets are KDDCup99 and NSL-KDD. KDDCup99 dates from 1999, and NSL-KDD is a refined version derived from it. With the ever-changing digital landscape, these datasets cannot be used to develop effective models to combat current cyber threats. The analysis of the SLR has shown that very few updated datasets are in use for the existing network infrastructure. The pie chart represents these results.

7. Conclusion

This study has filled a gap in the current body of knowledge on systematic literature reviews of hybrid intrusion detection systems. This systematic analysis of hybrid IDS points out the existing gaps in the development of hybrid intrusion detection systems and the need for further research in this area. The analysis indicates that hybrid intrusion detection is an area of focus for many researchers because it increases the performance and efficiency of intrusion detection systems compared to a single algorithm. Investigating how best to integrate the existing algorithms is essential in this field. Most hybrid intrusion detection systems fall into three major categories: cascaded hybrid technique, integrated-based hybrid technique, and cluster + single technique. Based on this work, most of the studies focused on the cascaded hybrid technique (65%). This method combines the classical algorithms either in parallel or in series. The second most widely used technique according to the conducted analysis is the integrated-based hybrid technique (about 32%, or 36 of the 111 papers). This technique aims at optimizing the classical algorithms. Integrated-based hybrids are more efficient and give better results compared to other forms of hybrid techniques. Thus, to develop an efficient and effective IDS, the integrated-based hybrid should be adopted. Lastly, the cluster + single technique was the least used (3%). The literature review has shown that the existing algorithms have the potential to solve the problem of intrusion but still cannot evolve with the ever-changing digital environment. Most of the models rely on human intervention to update them. There is a need for models that can learn their environment and update themselves without human input.

According to the conducted study, researchers have deployed different types of algorithms in the development of hybrid intrusion detection systems. The commonly used algorithms include ANN, SVM, DT, K-means, Naïve Bayes, KNN, GA, and PSO.

For evaluation of the models, fifteen datasets were used in the analyzed studies. The most heavily used were KDDCup99 and NSL-KDD. Despite their popularity, these datasets have received criticism from researchers. Most researchers point out that these datasets were developed years ago, and hence they are outdated and ineffective for developing modern intrusion detection systems. In addition, researchers have observed that these datasets do not capture current forms of intrusion, and hence models built on them lack the capability to defend modern network infrastructure. To resolve this challenge, emerging datasets that capture current intrusions have been introduced in the analyzed literature. These include the CICIDS2017, UNSW-NB15, CSE-CIC-IDS2018, and Bot-IoT datasets. The problem is that most of the studies are still using the old datasets. For effective IDSs, researchers in this field need to embrace the updated datasets.

The three most commonly used metrics for performance evaluation of IDS are accuracy, true-positive rate (TPR), and false-positive rate (FPR). Future studies should also consider including CPU utilization and detection time as performance metrics. Intrusions should be detected in real time, before any damage is caused, and hence the detection time should be as low as possible. Resource utilization should also be considered in the development of intrusion detection systems. In this review, only a few papers included CPU utilization as a performance metric.
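The metrics discussed above can be made concrete with a short sketch. The Python code below computes accuracy, TPR, and FPR from binary labels using the standard confusion-matrix definitions, and measures average per-sample detection time with a wall-clock timer. The label convention (1 for intrusion, 0 for normal), the helper names, and the toy data are assumptions made for illustration, and CPU utilization is not covered here.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ConfusionCounts:
    tp: int  # intrusions correctly flagged
    fp: int  # normal records wrongly flagged
    tn: int  # normal records correctly passed
    fn: int  # intrusions missed


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Count outcomes, assuming label 1 means intrusion and 0 means normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c: ConfusionCounts) -> float:
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def true_positive_rate(c: ConfusionCounts) -> float:
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


def timed_predict(predict: Callable[[int], int],
                  samples: Sequence[int]) -> Tuple[List[int], float]:
    """Return predictions and the mean wall-clock seconds spent per sample."""
    start = time.perf_counter()
    predictions = [predict(sample) for sample in samples]
    elapsed = time.perf_counter() - start
    mean_seconds = elapsed / len(samples) if samples else 0.0
    return predictions, mean_seconds


# Toy ground truth and predictions, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)

print(accuracy(counts), true_positive_rate(counts), false_positive_rate(counts))
```

For the toy labels above, three intrusions are detected, one is missed, and one normal record is falsely flagged, which gives an accuracy of 0.75, a TPR of 0.75, and an FPR of 0.25. Reporting the mean detection time from `timed_predict` alongside these values would give a fuller picture of whether a model is suitable for real-time use.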

Data Availability

The secondary data supporting this systematic review are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Key Research and Development Program of China (grant nos. 2019YFE012990 and 2018YFC1506102), National Natural Science Foundation of China (grant no. 41605121), South African National Research Foundation (grant nos. 114911, 137951, and 132797), and Tertiary Education Support Programme (TESP) of South African ESKOM.