Abstract

Identification of anomalous and malicious traffic in Internet of things (IoT) networks is essential for IoT security. Tracking and blocking unwanted traffic flows in an IoT network requires a framework that identifies attacks accurately, quickly, and with low complexity. Many machine learning (ML) algorithms have proved efficient at detecting intrusions in IoT networks, but these algorithms suffer from misclassification problems when the feature set is inappropriate or irrelevant. In this paper, an in-depth study is presented to address such issues. We present a lightweight, low-cost feature selection technique for IoT intrusion detection that achieves low complexity and high accuracy owing to its low computational time. A novel feature selection technique is proposed that integrates rank-based chi-square, Pearson correlation, and F-score correlation to extract relevant features from all available features in the dataset. Feature entropy estimation is then applied to validate the relationships among the extracted features used to identify malicious traffic in IoT networks. Finally, an extreme gradient boosting ensemble approach is used to classify the features into the relevant attack types. The simulation is performed on three datasets, i.e., NSL-KDD, UNSW-NB15, and CICIDS2017, and results are presented on different test sets. On the NSL-KDD dataset, the observed accuracy was approximately 97.48%; on UNSW-NB15 and CICIDS2017, the accuracy was approximately 99.96% and 99.93%, respectively. A state-of-the-art comparison with existing techniques is also presented.

1. Introduction

The IoT is the new era of technology in the digital world. IoT empowers physical objects to process data seamlessly [1]. It makes physical objects interactive and responsive without any human intervention. According to a Gartner report, there were around 8.4 billion connected physical things worldwide in 2017, and this number is expected to increase to 20.4 billion by the year 2020 [2]. These applications are highly promising. Such numbers motivate scholars to study IoT in terms of its potential, performance, efficiency, challenges, threats, and security. It is therefore imperative to have high security, privacy, authentication, and recovery from attacks. Spoofing, eavesdropping, DoS, and DDoS are some of the attacks on IoT applications [1], and to safeguard these applications, we need methods that can prevent them [3]. Fog computing is an emerging technology that lets users virtually store and process data between the cloud and devices, and fog can play a vital role in IoT security: fog nodes have the potential to raise an alarm or warning to IoT systems if they encounter any suspicious data or requests [1]. Some researchers have applied edge computing, one of the fastest-growing technologies, which can be combined with other technologies to improve security, capability, performance, mobility, and data management, and so it can be applied to IoT applications as well. Edge computing techniques provide a shorter response time [4], which improves latency and system performance, especially for the data generated by IoT applications, and can help prevent eavesdropping and data breaches in IoT applications.

Moreover, IoT works on a three-layer architecture comprising the perception, network, and application layers. To achieve maximum security and privacy in IoT systems, it is essential to have security at the various layers, and many architectures have been proposed for IoT security at these layers using machine learning (ML) or deep learning (DL). Scholars have studied various issues, challenges, and threats in IoT, and the existing security systems are not enough to handle all aspects of security. More advanced and enhanced security systems are therefore required; otherwise, IoT may lose its high potential and high demand. Such improved systems can be deployed with the help of the latest technologies in place of classical algorithms in IoT security: machine learning (ML), deep learning (DL), and artificial intelligence (AI) provide methods that improve the performance and efficiency of these algorithms. Efficient intrusion detection systems can help reduce malicious IoT traffic; incoming IoT packet streams are monitored continuously by intrusion detection systems [5]. There are two main threat detection approaches: signature-based and anomaly-based. In a signature-based approach, a pattern is designed from previously recognized attacks; a signature-based IDS then compares these signatures with observed events and reports a hazard when they match. There are several problems with signature-based IDSs, summarized as follows:

(i) The first problem is that only known attacks with well-investigated features can be detected, while zero-day (i.e., unknown) attacks cannot. Unfortunately, attackers continue to develop their strategies to bypass conventional security mechanisms with varied attack behaviors [6].

(ii) The second problem is that as the number of newly discovered attacks grows, so does the number of signatures, leading to more comparisons between stored patterns and new occurrences. This makes the detection systems more complex, which has a direct impact on the system's response time, a critical problem for real-time intrusion detection systems. Therefore, under certain conditions, the performance of these IDSs degrades due to limited resource availability [7].

Anomaly-based detection techniques can address all of the above limitations. A system based on anomalies observes a sequence of incoming packets and builds a model of the normal behavior of the system. The learned model then identifies abnormalities based on an index of similarity between normal and abnormal packets. The major challenge of this approach is building a model of unique normal system behavior from references with different underlying behaviors generated by individual data sources. Admittedly, varied types of data references can produce an increased false-positive rate by blurring the distinction between malicious and normal learned behaviors [8]. After reading several research works in this field, the authors mainly raise concerns about the validity of signature-based IDSs in scenarios where attack definitions are unavailable [9]; this paper gives that concern a concrete shape. Indeed, a signature-based IDS cannot detect unknown attacks because its vocabulary of attacks does not contain their definitions. Worse still, when an IDS is deployed remotely, if not on end devices or low-cost IoT gateways, the regular updates of attack definitions required by a signature-based approach become more difficult.

The major issue that arises while implementing an intrusion detection model is that it has to handle a large amount of data. Large, irrelevant, and redundant data can negatively impact the performance of machine learning. Therefore, when building machine learning algorithms, feature selection plays a major role. The accuracy and time complexity of a model are affected by the presence of irrelevant features. In this paper, an intrusion detection model is presented that incorporates feature selection: a wrapper feature selection algorithm for IDS is proposed to handle large, high-dimensional datasets. Redundant or irrelevant features are filtered out, which significantly improves the training time and accuracy of the machine learning model. Filter, wrapper, and embedded methods are the three types of feature selection. In the filter method, each feature is assigned a weight, and feature subsets are used with machine learning for classification. The filter method requires fewer computing resources and less time, but its main drawback is that it is decoupled from the classification process and thus yields low accuracy. The wrapper method, by contrast, considers classification performance while selecting the feature subset; it yields high accuracy but takes more computational time. The embedded method falls between the filter and wrapper methods in performance. In IDS, high accuracy is needed, and training time is of less concern; therefore, a wrapper feature selection method is adopted in this paper. Pearson correlation, F-score correlation, and rank-based chi-square feature selection techniques are combined to design a hybrid wrapper-type feature selection method that selects optimal, relevant features from the full set of features. Correlation-based feature selection finds associations among features, but one of its drawbacks is that it does not perform well when applied to the entire population (i.e., the entire dataset) at once. Therefore, in this paper, the hybridized feature selection techniques are applied to different subsets of the data, which yields accurate associations among features and high accuracy.

The key contributions of this paper are as follows:

(i) State-of-the-art intrusion detection frameworks for detecting malicious IoT traffic are reviewed along with their challenges.

(ii) A model based on feature selection techniques is presented, intended as a lightweight algorithm for IoT traffic.

(iii) The proposed framework is effective in the IoT scenario, as the methodology can handle big data while retaining the best features.

(iv) Finally, a comparative state of the art with existing techniques is given.

The remainder of this paper is organized as follows: Section 2 introduces the background knowledge of intrusion detection techniques for IoT security and discusses their challenges. Section 3 gives an overview of the proposed methodology and training algorithms. Section 4 presents the result analysis of the proposed model on the different datasets, along with a comparative state of the art with existing techniques. Finally, Section 5 discusses the conclusion and future research scope.

2. Related Work

The involvement of IoT devices in our daily lives is increasing, and the critical issue of securing the data collected by these devices is growing just as rapidly. Thus, in [10], a three-layer intrusion detection system is introduced. The goal of the system is to determine the domain of the IoT network targeted by cyberattacks [11]. The overall IDS architecture consists of three layers: the first layer contains a tool that scans the network, recognizes the linked IoT devices based on their MAC addresses, and categorizes them based on their behavior. The second layer classifies packets from the connected IoT devices as genuine or malicious. If any malicious packet is found in the second layer, the third layer determines which kind of attack it represents. Some commercial IoT devices were connected in a home setting, and to collect the traffic created by these devices, tcpdump was configured to run on an access point. The gathered traffic was then transferred to a Syslog server and stored in the form of PCAP logs. Once the process of collecting data from the testbed started, a time span for both benign and harmful data was decided. The testbed of IoT devices was arranged and implemented so that all inbound and outbound traffic passing through the access point was recorded using tcpdump. To increase the complexity of the network, four automated multilevel malicious attack scenarios were established on the network. Scripts were developed to generate logs whenever an individual attack took place; this is important for the data labeling task used later to supervise the machine learning. The next step is feature selection, on which the development of machine learning for the intrusion detection system is based. The limitation of this system is that it is not implemented in real time.

In [12], an intrusion detection system was designed using fog computing so that it could be deployed across a distributed network. The introduced system consists of two sections. The first is attack observation at the fog nodes, which uses the OS-ELM algorithm to detect whether an incoming IoT traffic packet is genuine or crafted to mount attacks. The second is summarization at the cloud service, which provides a global view to examine and observe the ongoing security condition of IoT applications and is used to forecast the attacker's next action. The results of this experiment are estimated based on accuracy and response time: the given system achieves 97.36% accuracy and detects attacks 25% faster than other algorithms. The limitation of this system is that it still needs to be protected from proactive attacks.

In [13], the authors introduced a privacy-preserving distributed intrusion detection framework based on progressive learning. This model is used to recognize denial-of-service attacks, because many researchers have been unsuccessful at determining real-time traffic datasets and many attackers insert malicious traffic patterns to corrupt the training structure. The proposed structure consists of three networks. The generative network gathers all the incoming traffic from the IoT devices, and unique features from all the devices are extracted with the help of autoencoders. The second network is the bridge network: all the useful features extracted by the generative network are sent to it. The gathered data are analyzed and then compared with the data available in the third network, the classifier network. Only important data are sent by the bridge network to this network, so that the model does not become time-consuming. To save execution time, a CNN (convolutional neural network) model is used to minimize false alarms and unnecessary service visits. In short, the overall process can be divided into three phases: the preprocessing phase; the comparison phase; and the classification phase, which is based on separate coding feature extraction and fusion, an incremental maintenance module, and finally the classification process. This structure achieves good classification accuracy with minimal space and low computational cost, but its limitation is that it is not able to identify new attacks in multiple-attack scenarios.

In [14], the author proposed an intrusion detection honeypot model based on SoLA to identify malware attacks. A honeypot is a decoy environment that collects data about attacks and attackers in order to trap them rather than merely block them. The architecture of the introduced intrusion detection honeypot consists of a low-interaction honeypot server and an IDS network; both collect all the information from the incoming traffic and investigate the collected data. A complex event processing (CEP) engine connects different events of direct attacks from the host and network, the honeypot agent, the SDN controller, etc. Depending on the CEP outcome, the malicious process is identified and terminated. The honeypot agent is structured by applying the social leopard algorithm (SoLA). The introduced system uses the complex event processing technique to correlate features of the host, the network, and several events. The ransomware encryption process here proceeds as read, encrypt, write, and delete. When there is a progression from one state to another in the fake folder, the system investigates whether the activity was performed by the user or by an attacker. Once a file is read and encrypted, the activity is marked as suspicious and the relevant variables are examined; the outcome of this process is sent to the CEP engine and firewall, where the engine correlates the values from the honey folder, audit watch, and SDN application and raises an alert with high accuracy, thereby detecting malware attacks with minimal loss. The software-defined networking (SDN) applications upgrade network security by applying simple commands. The system does not work on healthcare implants, so this model could further be extended to Internet-connected toys to identify malware attacks.

In [15], a lightweight intrusion detection system is used to identify harmful data inserted into the IoT network. Attack identification is done using machine learning that relies on a support vector machine (SVM). The architecture of the introduced system comprises two stages, a training stage and an evaluation stage, and the overall system is evaluated under varying traffic intensity. In the training stage, training databases carrying labeled samples are acquired; these databases are then used to extract features, generating a feature pool. This pool, accompanied by a label vector, is used to train the classifier, and the trained classifier then categorizes samples as labeled or unlabeled. To evaluate the performance of the classifier, the same features used in the training stage are extracted from the data samples. Several experiments were performed with different traffic intensities, and it was shown that the packet arrival rate feature together with the SVM-based classifier is sufficient to detect intrusions in the network compared with other classifiers such as NN, k-NN, and DT. The outcome of this work is an SVM-based intrusion detection system that achieves adequate attack detection.

In [16], the author introduced an intelligent anomaly-based intrusion detection system named Passban; this system is used to secure the Internet of things devices linked to it. Passban comprises packet flow discovery, feature extraction, training and loading of the model, an action manager, and a web management interface. The work focuses on the behavior of network traffic with the objective of identifying patterns that deviate slightly from it, naming these deviations anomalies; such patterns correspond to attacks taking place in the network. One-class classification, a type of learning strategy, is suitable for this condition. Many available algorithms rely on basically two techniques: profiling and isolation. Feature extraction is the step to which machine learning is then applied; several features are extracted from the data. Trained models are saved in the local memory of the edge device, and in the prediction phase, anomalies are detected. Finally, all the anomalies are forwarded to the action manager. Passban was evaluated in two scenarios. In the first scenario, LOF and i-forest are capable of detecting all the attacks with adequate accuracy, while in the second scenario no special arrangements are required: Passban is connected individually to the network to be monitored and can scan the overall traffic of the devices linked to it. This technique is useful for threat detection with accurate performance, and it can be applied on inexpensive devices as well.

In [5], the authors surveyed multiple works on IoT devices, security procedures, and machine learning procedures, with the main aim of creating a junction between these three areas. The first junction is between IoT and security procedures: the IoT architecture consists of three layers, namely, the perception layer, network layer, and application layer, and each layer faces different attacks. Many machine learning techniques for intrusion detection are covered. The survey provides a complete analysis of network intrusion detection for IoT security from various aspects of learning techniques. IoT attacks are categorized by challenge: spoofing; routing attacks such as the sinkhole attack, selective forwarding attack, black hole attack, wormhole attack, replay attack, tampering attack, repudiation attack, and man-in-the-middle attack (the technical categories of attacks); and attacks based on design challenges, including interoperability and diversity, security and privacy, etc., countered by mechanisms such as packet filtering, encryption, robust password authentication schemes, and activity auditing and logging. Various learning techniques described by different researchers include machine learning and deep learning, and, based on these learning strategies, algorithms such as decision tree, artificial neural network, Naïve Bayes, optimum path forest, logistic regression, support vector machine, etc. This research work is helpful from both academic and industrial research points of view.

In [17], the authors proposed a two-stage AI-based IDS for SD-IoT networks with flow classification and feature extraction as its two stages; the architecture has self-learning ability. To extract the best features, an improved bat algorithm is used, and for optimal performance, swarm division and binary differential mutation are applied with it. An improved random forest is then applied for network flow classification, with a weighting mechanism used to further improve classification. Experiments are performed on a subset of the KDD Cup 1999 dataset after downsampling. The experimental results validate the better accuracy and lower overhead of the architecture compared with previous solutions; as future work, it could be extended to traffic classification in real networks.

In [18], the authors proposed a novel SDRK-ML algorithm, a supervised DNN further extended with an unsupervised clustering technique. The algorithm is placed between the IoT and cloud layers to improve its effectiveness. Fog nodes are set up as gateways and perform data acquisition; feature extraction is then done, and the features are input to the trained SDRK, which has a deep feedforward NN and K-means at its core. In this paper, due to the unsuitability of plain K-means, a variation named "RRS-K-means" is used to overcome the NP-completeness issue. In the testbed, a programmable fog feature is installed that mitigates the attack for evaluation. The experimental results are obtained on the benchmark NSL-KDD dataset. A limitation of this paper is that the fog nodes themselves can also be a point of attack; identifying such attacks and reducing the retraining time are suggested future studies.

In [19], the author developed a CNN-based architecture that extracts the properties of the link load to detect roadside unit intrusion. This deep architecture consists of six hidden layers: three convolutional layers and three pooling layers that implement average pooling with factor two to capture abnormal fluctuations. The sigmoid function is taken as the activation function. The spatial characteristics of link loads are represented as a matrix, and a loss function based on the L1 norm is presented to train the model's backpropagation algorithm. Precision, calculated using different weights and biases, is the first assessment parameter for the performance evaluation. For the experiments, the Low Orbit Ion Cannon (LOIC) is installed to implement DDoS, and results are derived from four attacks: TCP, UDP, SYN, and HTTP.

The authors of [20] proposed the CorrAUC approach for effective feature selection to improve traffic detection in the IoT network. The proposed method works in four steps. In the first step, a feature selection metric called CorrAUC extracts the characteristics. In the second step, a wrapper technique is used to develop and design an algorithm based on the same metric. In the third step, it combines correlation attribute evaluation (CAE) and the area under the ROC curve (AUC) to select effective features for detecting bot-IoT traffic. In the last step, integrated TOPSIS and Shannon entropy on a bijective soft set are used to validate the selected features. The Pearson correlation coefficient is utilized between the M and N attributes in the feature selection matrix; a feature is considered effective if it is strongly correlated with the class but not strongly correlated with other features, so the correlation is calculated for greater precision. A newly developed dataset called Bot-IoT is used for the experimental evaluation. The most significant works reviewed in this section are summarized in Table 1 with their limitations.

3. Methodology

The proposed framework for intrusion detection in IoT is illustrated in Figure 1. It gives a general overview of a framework composed of three layers: the data layer, the communication or network layer, and the application layer. The data layer is composed of smart sensor data or IoT nodes. The collected data, whether from sensors or any data user, are communicated to the next layer, i.e., the communication or network layer. This layer is composed of gateway or switching devices that are responsible for analyzing the collected network data. Abnormal packets are detected using the proposed approach and reported to an administrator, whereas normal data packets are transmitted to the next layer for storage and analysis purposes. We explain the proposed technique step by step in this section. Our method includes three steps for effective feature selection in the IoT network: data collection and feature extraction, optimal feature selection, and classification (as shown in Figure 2). At the data collection stage, IoT packets are captured and passed on for preprocessing and feature extraction. Second, the proposed feature selection method selects features that contain sufficient information and then filters them to obtain the effective features on which the selected ML algorithm operates. The proposed algorithm combines correlation assessment with feature entropy estimation to solve the problem of effective feature selection for intrusion detection by a specific machine learning (ML) algorithm [21]. The entropy estimate, a mathematical method for homogeneity measurement, provides more detailed information on whether or not selected features are similar. In terms of effective feature selection for attack detection in the IoT network environment, this technique produces very effective results. In addition, our method selects features that carry sufficient information for identifying IoT attacks [22] on the IoT network. For clarity, the methods for effective feature selection in the IoT network are discussed below with IoT attack detection in mind.

3.1. Data Collection and Feature Extraction

The incoming traffic flow is captured, and the necessary features are extracted from the normalized incoming data packets [23]. These extracted features help to determine and identify the type of attack. Before the feature selection process, however, it is necessary to preprocess the extracted data or features, because data preprocessing plays a vital role in network traffic analysis, where the volume of data handled is huge. The algorithm for this stage is illustrated in Algorithm 1.

(1) Begin
(2) Input: Incoming IoT traffic (IoTtraffic)
(3) Output: Extracted features (Fv)
    i = input packet, g = gateway node,
    inorm = normalized input packet
(4) For each i ∈ IoTtraffic
    i ⟶ Gateway(g)
    z_score(i) ⟶ inorm
    Extract(inorm) ⟶ Fv
    Return (Fv)
(5) End

In the preprocessing process, reduction of redundant data and normalization is an important step. This results in balanced data within a specified range. The z-score technique is used to normalize the incoming data packets, as illustrated in the following equation:

$$ z(x_i) = \frac{x_i - \mathrm{mean}(x_i)}{\sigma(x_i)} $$

where x_i = ith feature set, mean(x_i) = mean of the ith feature set, and σ(x_i) = standard deviation of the ith feature set.
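For illustration, the z-score step can be sketched in Python as follows; the toy packet values and feature shapes are assumptions, not the paper's data.

import numpy as np

def z_score_normalize(X):
    """Normalize each feature column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X - mean) / std

# Toy example: three packets with two extracted numeric features each
packets = np.array([[120.0, 3.0], [80.0, 5.0], [100.0, 4.0]])
print(z_score_normalize(packets))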

3.2. Feature Selection

For effective feature selection to address malicious attack detection in IoT networks, three feature selection methodologies are hybridized with entropy estimation. This results in the selection of the best and optimal features. Correlation among features, used to select effective features, reduces the volume of data as well as the amount of computation. Feature entropy estimation is then performed to overcome the class imbalance problem by eliminating irrelevant features.

A rank-based chi-square feature selection algorithm [24] is used to evaluate the dependency of feature sets (f_i) on the class label (cl). Mathematically, it is represented as in the following equation:

$$ \chi^2(f_i, cl) = \sum_{cl} \frac{(O_{cl} - E_{cl})^2}{E_{cl}} $$

where O_cl = number of observations in class cl and E_cl = number of expected observations in class cl.
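As a minimal sketch, scikit-learn's chi2 scorer is one standard realization of this chi-square ranking; the synthetic data and the cutoff k = 5 are illustrative assumptions.

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(100, 10)))  # chi2 requires non-negative features
y = rng.integers(0, 2, size=100)        # binary class labels

selector = SelectKBest(score_func=chi2, k=5).fit(X, y)
print(np.argsort(selector.scores_)[::-1][:5])  # feature indices ranked by chi-square score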

In addition, the Pearson moment correlation technique was adopted. This technique was used to study the relationship between the independent features and the target class more thoroughly. The Pearson correlation coefficient, Pcorr, ranges from +1 to -1. A value of 0 indicates that the two variables are unrelated; a value above 0 indicates a positive association, and a value below 0 a negative one. This technique is used to identify relationships between different characteristics or attributes. Mathematically, it is represented as follows:

$$ P_{corr} = \frac{\sum_{i}(A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i}(A_i - \bar{A})^2}\,\sqrt{\sum_{i}(B_i - \bar{B})^2}} $$

where A_i and B_i = feature sets and Ā and B̄ = means of feature sets A_i and B_i, respectively.
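A minimal numpy sketch of the Pearson coefficient between each feature column and the class label follows; the synthetic data are an assumption for illustration.

import numpy as np

def pearson_scores(X, y):
    """|Pearson correlation| of every feature column with the label vector."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100)
print(pearson_scores(X, y))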

Lastly, the F-score correlation feature selection method was adopted. F-score correlation is used to determine the direct or indirect relation among data values. If the F-score value between feature sets is small, those features are not related to each other, whereas if the value is high, the feature is highly related and can be added to the feature subset. Mathematically, it is represented as in the following equation:

$$ F_{score}(f_j) = \frac{\sum_{i=1}^{k}\left(\bar{f}_{j,i} - \bar{f}_i\right)^2}{\dfrac{1}{k-1}\sum_{i=1}^{k}\sum_{n}\left(f_{j,i}^{\,n} - \bar{f}_{j,i}\right)^2} $$

where f̄_i = mean of the f_i feature set, f̄_{j,i} = mean of the ith attribute of the f_j feature set, f^n_{j,i} = ith attribute of the nth instance in the f_j feature set, and k = number of attributes in the jth feature set.
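As a hedged sketch, scikit-learn's f_classif (an ANOVA F-value between each feature and the class) is one common realization of an F-score criterion; higher values indicate features more strongly related to the class. The synthetic data are an assumption.

import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100)

f_scores, p_values = f_classif(X, y)
print(np.argsort(f_scores)[::-1])  # feature indices ranked by F-score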

After computing correlations with the three different algorithms, different feature sets are identified. These feature sets are then input into the feature entropy estimation algorithm, which yields an optimal feature selection that contributes to determining the class of input data packets. This ensembling of feature selection methods is preferred because it gives a more precise result. The procedure is illustrated in Algorithm 2.

(1) Begin
(2) Input: Fv
(3) Output: OFv
(4) R ← nrows(Fv)
(5) C ← ncols(Fv)
(6) X² ← χ²(Fv, cl)
(7) P ← Pcorr(Fv)
(8) F ← Fscore(Fv)
(9) for each i in C do
    OFv ← FEE(X², P, F)
(10) end for
(11) return OFv
(12) End

Feature entropy estimation (FEE) was used to find the best-related features; the information gain formula is used for feature selection. It evaluates the gain of each variable in the context of the target variable. In this slightly different application, the calculation is referred to as the mutual information between the two random variables. The best characteristic is determined by the entropy calculation. Entropy is an uncertainty measure that can be used to summarize the distribution of characteristics concisely. It is mathematically evaluated as in the following equation:

$$ E_f = -\sum_{i} p(f_i)\log_2 p(f_i), \qquad IG(f_i) = E_f - \sum_{n} \frac{|f_n|}{|f|}\, E_n $$

where E_f = entropy of the feature sets, f_i = ith feature set, and E_n = entropy of the nth subset of the feature sets.
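A hedged sketch of this entropy-based estimate: mutual information (information gain) between each feature and the label is one standard realization, and the median cutoff and synthetic data below are assumptions for illustration.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100)

gain = mutual_info_classif(X, y, random_state=0)
keep = gain > np.median(gain)   # assumed rule: keep the more informative half
print(np.flatnonzero(keep))     # indices of the retained features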

Therefore, for attack detection, these feature selection techniques are applied to generate important, relevant features and remove unnecessary ones, reducing computational complexity and, in turn, the execution time for detecting malicious IoT traffic.

3.3. Extreme Gradient Boosting Classification

Gradient boosting is a kind of ensemble machine learning algorithm that can be used to solve classification [25] and regression problems. Ensembles are built from decision tree models: trees are added to the ensemble sequentially and fitted to correct the prediction errors made by prior models. Gradient boosting gets its name from the fact that the loss gradient is minimized as the model is fitted, much like in a neural network. Models are fitted using a gradient-based approach and any configurable differentiable loss function. Because the GBDT algorithm is prone to overfitting, the XGBoost technique incorporates regularization factors into the original GBDT algorithm. XGBoost has been extensively enhanced compared with previous algorithms, as evidenced by greatly improved training time and accuracy. Let us consider the input x_i and the observed and predicted labels o_i and ô_i, respectively. Mathematically [26], the learning model is represented as

$$ \hat{o}_i = \sum_{t=1}^{T} f_t(x_i) $$

where f_t = the weak learning function.

The training loss function is mathematically represented as in the following equation:

$$ L_{train} = \sum_{i} l(o_i, \hat{o}_i) + \sum_{t} \Omega(f_t) $$

where L_train = training loss function, l(o_i, ô_i) = empirical loss between the observed and predicted labels, and Ω(f_t) = loss of the boosted learner.

The entire training process is illustrated in Algorithm 3.

(1) Begin
(2) Input: IoTtraffic, T = number of iterations
(3) Initialize: f_t, t = 1, 2, …, T
(4) Fv = Algorithm 1 (IoTtraffic)
(5) OFv = Algorithm 2 (Fv)
(6) for t = 1 to T do
(7)  compute gradients (g_t)
(8)  fit the weak learner f_t to the gradients
(9)  L_train ← Σ_i l(o_i, ô_i) + Σ_t Ω(f_t)
(10) if L_train == min
     return f_t
(11) end if
(12) end for
(13) End
4. Results and Discussion

In this section, we first describe the datasets used and the experimental environment. Then, the metrics used to measure the performance of the proposed models are discussed, and finally the results are presented.

4.1. Experimental Setup

We selected three datasets for the performance evaluation of the proposed approach, namely, the NSL-KDD dataset [27], the UNSW-NB15 dataset [28], and the CICIDS2017 dataset [29]. The NSL-KDD dataset was generated from the KDD Cup'99 dataset to eliminate duplicate records and alleviate the problems associated with the KDD Cup'99 dataset. There are 125,973 records in the NSL-KDD training set and 22,544 records in the test set. The size of NSL-KDD is reasonable enough that the full dataset can be used without resorting to a representative sample. The dataset comprises 41 features and 22 training intrusion attacks; 21 features describe the connection itself, and 19 features describe connections to the same host. The IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) was used to create a hybrid of real normal activities and synthetic contemporary attack behaviors from the raw network packets of the UNSW-NB15 dataset [28]; 100 GB of raw data traffic was collected using the tcpdump utility (e.g., pcap files). There are nine different security threats in this dataset. The last dataset is CICIDS2017 [29], which includes fresher data packets, both with and without attacks, that closely resemble real-world network communication. This dataset comprises realistic, network-like data gathered over five days and includes a variety of malware as well as normal data. This work was carried out on a 64-bit Intel Core i5 CPU with 8 GB RAM in a Windows 10 environment. The machine learning algorithm was implemented in MATLAB 2020a.

4.2. Performance Parameters

The proposed work was evaluated on the basis of the following parameters:

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

$$ \mathrm{Precision} = \frac{TP}{TP + FP} $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} $$

$$ \mathrm{F\_Measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} $$

where TP (true positive) counts samples that are anomalous in both the actual and predicted labels, TN (true negative) counts samples that are normal in both the actual and predicted labels, FP (false positive) counts samples that are actually normal but predicted as anomalous, and FN (false negative) counts samples that are actually anomalous but predicted as normal.
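The same four metrics can be computed directly with scikit-learn; the toy label vectors below are assumptions for illustration.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = anomaly, 0 = normal
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F_Measure:", f1_score(y_true, y_pred))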

4.3. Result Analysis

Table 2 shows the performance evaluation of the proposed intrusion detection system on the NSL-KDD dataset with 5-fold validation, reporting accuracy, precision, recall, and F_Measure. Similarly, Table 3 presents the performance evaluation on the CICIDS2017 dataset, and Table 4 that on the UNSW-NB15 dataset. In this analysis, random samples from the test dataset are selected and evaluated. In this work, 5-fold validation is performed: the dataset is divided randomly into 5 parts, one part is selected for testing, and the other parts are used for training. Table 5 presents the time complexity of the proposed algorithm. From Table 5, it was observed that the average time complexity on the NSL-KDD dataset was approximately 38 s, whereas for UNSW-NB15 and CICIDS2017 the time complexity was approximately 2 s and 3 s, respectively. The proposed methodology results in a lightweight, low-cost feature selection method for IoT devices owing to its low computational time complexity, which is illustrated in Figure 3. This figure shows the time taken for the selected number of features from incoming IoT network traffic.
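A minimal sketch of the 5-fold protocol described above, assuming an XGBoost classifier and synthetic data in place of the real datasets.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 12))
y = rng.integers(0, 2, size=500)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = XGBClassifier(n_estimators=100, max_depth=6)
    clf.fit(X[train_idx], y[train_idx])  # 4 parts train, 1 part test
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
print("mean 5-fold accuracy:", np.mean(scores))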

4.4. State-of-the-Art Comparison

The IoT or edge nodes are vulnerable to network attacks, and network connectivity enables malware injection from the Internet. In most attack detection learning models, the vanishing gradient problem occurs, and the models face overfitting issues during the later stages of training. Intrusion detection has become one of the most promising research areas, as new attacks are introduced into networks daily. This section explores the work of other researchers in the field of intrusion detection. A comparative state of the art with other existing works is presented in Table 6.

5. Conclusion

Attack detection in IoT is an essential task to keep track of the security of IoT traffic. In the past few years, many researchers have implemented machine learning (ML) techniques to track and block malicious IoT traffic. But in the presence of inappropriate features, these ML models suffer from misclassification issues along with high time complexity during the learning process. This noteworthy issue needed to be resolved by designing a framework for optimal and accurate feature selection from malicious IoT traffic. For this purpose, a new framework model was proposed. First, the proposed feature selection approach combines rank-based chi-square, Pearson correlation, and F-score correlation to extract relevant features from all available features in the dataset. These algorithms form a wrapper technique that filters out features more accurately and effectively for classification. Then, feature entropy estimation was applied to validate the relationship among all extracted features to identify malicious traffic in IoT networks. The experimental simulation was performed using three datasets, NSL-KDD, UNSW-NB15, and CICIDS2017, and compared with some existing works. It was observed that on the NSL-KDD dataset, accuracy was approximately 97.48%; similarly, the accuracy on UNSW-NB15 and CICIDS2017 was approximately 99.96% and 99.93%, respectively. The following conclusions can be derived from the implementation of the proposed algorithm:

(i) The proposed framework can enforce security and trustworthiness in the Internet of things (IoT).

(ii) The feature selection techniques remove the drawback of a local minimum and converge faster.

(iii) By selecting optimal features, training time is reduced.

(iv) Highly related features are needed to improve the performance level; unnecessary features cause computational complexity.

(v) Faster execution with reduced features results in faster intrusion alerts, so prevention measures can be applied accordingly.

In future work, this work would be extended to other datasets, and more real-time attack detection would be explored. This would create fine-grained usage limitations to ensure privacy across big datasets while still enabling classification algorithms and analyses to operate on top of them. Internet of things (IoT) application frameworks would need to develop the technical capabilities to impose sufficient security controls as ever more data are collected, transferred, and analyzed over a common infrastructure.

Notations

IoTtraffic: Incoming IoT traffic
Fv: Extracted features
g: Gateway nodes
inorm: Normalized input packets
xi: ith feature set
mean(xi): Mean of ith feature set
σ(xi): The standard deviation of ith feature set
z(xi): Z-score of feature sets
f: Feature sets
cl: Class label
χ²: Rank-based chi-square feature selection
Ocl: Number of observations in class cl
Ecl: Number of expected observations in class cl
Pcorr: Pearson moment correlation
Ai, Bi: Any feature sets
Ā, B̄: Mean of feature sets Ai and Bi
Fscore: F-score correlation
f̄i: Mean of fi feature set
f̄j,i: Mean of the ith attribute of the fj feature set
fⁿj,i: The ith attribute of the nth instance in fj feature set
k: Number of attributes in jth feature set
Ef: The entropy of feature sets
fi: ith feature set
En: The entropy of nth subset of feature sets
xi: Input data
oi: Observed class label
ôi: Predicted class label
ft: Weak learning function
Ltrain: Training loss function
l(oi, ôi): Empirical loss function between observed and predicted labels
Ω(ft): The loss function of the boosted learner
Fv: Feature vector
OFv: Optimal feature vector.

Data Availability

The data that support the findings of this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the King Khalid University Researchers Supporting Project Number (R. G. P. 1/77/42), King Khalid University, Saudi Arabia.