Research Article

Design and Development of an Efficient Network Intrusion Detection System Using Machine Learning Techniques

Table 1

Taxonomy of recent hybrid intrusion detection methods.

Hybrid intrusion detection techniques combined with feature selection methods
Year | Research papers | Algorithms | Techniques | Dataset | Evaluation criteria | Feature selection | Results

2017 | [26] | SVM, IWD | SVM is applied as the classifier; feature reduction is performed with the intelligent water drop (IWD) method | KDD-Cup ‘99 dataset | Detection rate, precision rate, accuracy rate, false alarm rate | IWD | Detection rate of 99.40%, precision of 99.10%, false alarm rate of 1.40%, accuracy of 99.05%

2017 | [28] | Particle swarm optimization (PSO), decision tree (DT) | PSO is applied to prune the nodes of a decision tree, and the pruned DT is used as the network IDS classifier | KDD-Cup ‘99 dataset | Accuracy rate, precision rate, FPR, IDR, time | PSO | Accuracy of 96.65%, precision of 99.98%, FPR of 0.136, IDR of 92.71%, execution time of 383.58 s

2017 | [29] | Prioritized KNN (PKNN), optimized SVM (OSVM), Naïve Bayes feature selection | PKNN detects input attacks and OSVM rejects outliers in a hybrid HIDS strategy; Naïve Bayes is applied as the feature selector | Kyoto 2006+, KDD-Cup ‘99, and NSL-KDD datasets | Specificity, sensitivity, detection rate, precision | NBFS | Overall sensitivity of 53.24%, detection rate of 94.6%, precision of 56.62%, specificity of 98.21% across all datasets

2017 | [30] | Artificial neural network (ANN) | The ANN is trained using particle swarm optimization (PSO), gravitational search (GS), and their combination (GSPSO) | NSL-KDD dataset | MSE, detection rate, time | Not applied | MSE of 0.4527, detection rate of 95.26%, execution time of 103.70 s

2017 | [31] | Hybrid multilevel data mining algorithm | Flexible mutual information-based feature selection (FIMS) is employed as the feature selector; multilevel hybrid machine learning (MH-ML), multilevel hybrid data engineering (MH-DE), and a micro expert module (MEM) are used for training | KDD-Cup ‘99 dataset | Detection rate, recall, accuracy rate, F-value, precision rate | FIMS | Detection rate of 66.69%, accuracy of 96.70%, recall of 96.70%, precision of 96.55%, F-value of 96.60%

2018 | [32] | Support vector machine (SVM) | ChiSqSelector is employed for feature reduction, with SVM as the classifier | KDD-Cup ‘99 dataset | AUPR, AUROC, time | ChiSqSelector | AUPR of 96.24%, AUROC of 99.55%, execution time of 10.79 s

2018 | [33] | Vector-based genetic algorithm | Three feature selection methods are employed: linear correlation-based feature selection (LCFS), modified mutual information-based feature selection (MMIFS), and a forward feature selection algorithm (FFSA); the GA represents chromosomes as vectors and uses training data as metrics | KDD-Cup ‘99 and CTU-13 datasets | FPR, accuracy rate | LCFS, FFSA, MMIFS | FPR of 0.17%; accuracy of 99.8% for DoS attacks

2018 | [34] | Neural network with resilient backpropagation, CART | A neural network with resilient backpropagation updates the weights; feature reduction is performed by CART | ISCX and ISOT datasets | Detection rate, accuracy rate, FPR | CART | Accuracy of 99.20%, detection rate of 99.08%, FPR of 0.75%

2018 | [35] | Symmetric uncertainty and genetic algorithm (SU-GA) as the classification algorithm | Symmetric uncertainty is applied to find the best features, and a genetic algorithm operates on the selected features | UCI dataset | Accuracy rate | GA | Accuracy of 83.83%; execution time of 0.23 s across all approaches

2018 | [36] | Genetic algorithm | Neurofuzzy inference system, neural fuzzy genetic, fuzzy logic controller, and multilayer perceptron are used for attack classification | KDD-Cup ‘99 dataset | Accuracy rate | Fuzzy rule | True attack detection accuracy of up to 99% with a false alarm rate of 1%

2019 | [37] | Random forest, Naïve Bayes, J-48, K-nearest neighbor | WrapperSubsetEval and CfsSubsetEval are applied as the two feature selection techniques, while random forest, K-NN, Naïve Bayes, and J-48 serve as classifiers | NSL-KDD dataset | Detection rate, accuracy rate, F-measure, TP rate, FP rate, MCC, time | Wrapper and filter | Overall accuracy of 99.86%, FPR of 0.00035, detection rate of 0.9828, F-measure of 0.706, TPR of 0.929, MCC of 0.955, total execution time of 10.625 s (executed on the NSL-KDD dataset with 25 attributes, all attack types)

2019 | [38] | K-means clustering, DBSCAN, SMO | K-means is applied for data grouping, DBSCAN eliminates noise from the data, and SMO performs intrusion detection | KDD-Cup ‘99 dataset | Detection rate, accuracy rate | DBSCAN | Approximate detection rate of 70% and approximate accuracy of 98.1%

2019 | [39] | Intelligent flawless feature selection algorithm (IFLFSA), entropy-based weighted outlier rejection (EWOD), intelligent layered classification algorithm | EWOD detects outliers in the data, IFLFSA performs feature selection, and the intelligent layered classification algorithm classifies the data | KDD-Cup ‘99 dataset | Accuracy rate | IFLFSA | Overall accuracy of 99.45%

2019 | [40] | ID3, K-nearest neighbor, isolation forest | K-nearest neighbor assigns a class to unknown data points, ID3 serves as the feature selector, and isolation forest segregates normal data from anomalies | NSL-KDD and KDD-Cup ‘99 datasets | Detection rate, accuracy rate, false alarm rate | K-NN | On KDD-Cup ‘99: detection rate of 97.20%, accuracy of 96.92%, FPR of 7.49%; on NSL-KDD: detection rate of 95.5%, accuracy of 93.95%, FPR of 10.34%

2019 | [41] | Best first search and Naïve Bayes (BFS-NB) | Best first search is applied for attribute optimization, and Naïve Bayes is employed as the classifier | KDD datasets from the US Air Force | Accuracy, sensitivity, specificity | Naïve Bayes | Sensitivity of 97%, accuracy of 92.12%, specificity of 97.5%

2020 | [42] | Deep neural network (DNN), classical AutoEncoder (CAE) | The DNN is applied for classification, and the CAE is applied as the feature selector | UNSW-NB15 dataset | Precision, F-measure, accuracy, recall, FPR | Classical AutoEncoder (CAE) | Precision of 92.08%, F-measure of 91.35%, accuracy of 91.29%, recall of 90.64%, FPR of 0.805

2020 | [43] | K-nearest neighbor (K-NN), extreme learning machine (ELM), hierarchical extreme learning machine (H-ELM), SDN controller | H-ELM, ELM, and K-NN are applied for classification, and the SDN controller is employed for feature selection | NSL-KDD dataset | Accuracy, FPR, precision, recall, F-measure | SDN controller | Accuracy of 84.29%, FPR of 6.3%, precision of 94.18%, recall of 77.18%, F-measure of 84.83%

2021 | [44] | ANN as the classifier | An integration technique () is employed to improve classification accuracy | NSL-KDD and UNSW-NB15 datasets | Accuracy, specificity, sensitivity, time | Correlation-based feature selection | On NSL-KDD: accuracy of 98.45%, specificity of 94.38%, sensitivity of 92.94%, execution time of 500 s; on UNSW-NB15: accuracy of 96.44%, specificity of 98.4%, sensitivity of 50.4%, execution time of 1023 s

2021 | [45] | SVM, modified binary gray wolf algorithm | SVM is used as the classifier, and a modified binary gray wolf algorithm is applied for feature selection | NSL-KDD dataset | Accuracy, FPR, detection rate, time | Modified binary gray wolf algorithm | Accuracy of 96%, FPR of 0.03, detection rate of 0.96, execution time of 69.6 h

2021 | [46] | Multiclassifier, deep neural network, kernel density | Random forest differential evaluation with kernel density predicts unusual activities; a multiclassifier is applied for input classification, a deep neural network handles learning and training of the data, and kernel density is used for clustering and prediction | HHAR dataset | Accuracy rate, sensitivity, specificity | Basic sort-merge tree | Accuracy of 98.4%, sensitivity of 96.02%, specificity of 99.8%
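
Nearly every row above follows the same hybrid pattern: a feature selection stage that prunes the attribute space, followed by a classifier trained on the reduced features. The following is a minimal sketch of that pattern in Python with scikit-learn, loosely in the spirit of the ChiSqSelector + SVM combination of [32]; the synthetic data, the choice of 25 retained features (echoing the 25-attribute setup of [37]), and all hyperparameters are illustrative assumptions, not the implementations used in the surveyed papers.

```python
# Sketch of the hybrid feature-selection + classifier pattern shared by
# most rows of Table 1. Chi-square selection + SVM loosely mirrors [32];
# the data and hyperparameters below are placeholders, not from the papers.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for a flow-level dataset such as KDD-Cup '99
# (41 features, binary normal/attack labels).
X, y = make_classification(n_samples=5000, n_features=41,
                           n_informative=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

pipeline = Pipeline([
    ("scale", MinMaxScaler()),             # chi2 requires non-negative inputs
    ("select", SelectKBest(chi2, k=25)),   # keep 25 of 41 features (cf. [37])
    ("classify", SVC(kernel="rbf")),       # SVM classifier on reduced features
])
pipeline.fit(X_train, y_train)
print("accuracy:", pipeline.score(X_test, y_test))
```

Swapping the "select" and "classify" steps is how the other rows differ: for example, a CART-based reducer feeding a neural network approximates [34], and a wrapper selector feeding random forest approximates [37].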
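
The evaluation criteria that recur across the table (detection rate, false alarm rate, precision, specificity, accuracy, F-measure) all derive from the binary confusion matrix. As a reference for how the reported numbers relate to one another, here is a small helper; note that definitions vary slightly between papers, and detection rate is taken here in its most common sense, the attack-class recall TP / (TP + FN).

```python
# Recurring Table 1 evaluation criteria expressed via the confusion matrix.
# Per-paper definitions may differ slightly; these are the usual forms.
from sklearn.metrics import confusion_matrix

def ids_metrics(y_true, y_pred):
    # For binary labels, ravel() yields (tn, fp, fn, tp).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    detection_rate = tp / (tp + fn)          # a.k.a. sensitivity / recall
    precision = tp / (tp + fp)
    return {
        "detection_rate": detection_rate,
        "false_alarm_rate": fp / (fp + tn),  # FPR
        "precision": precision,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f_measure": 2 * precision * detection_rate
                     / (precision + detection_rate),
    }

# Usage with the pipeline from the previous sketch:
# ids_metrics(y_test, pipeline.predict(X_test))
```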