Abstract

The IPv6 routing protocol for low-power and lossy networks (RPL) was developed as the routing agent for low-power and lossy networks (LLNs), where the resource-constrained nature of the nodes is a challenge. The protocol operates at the network layer, builds routes, and distributes routing information among nodes. RPL is a low-power, high-throughput IPv6 routing protocol that uses distance vectors. In steady state, each router in the network has a fixed set of parents and a preferred parent on the path to the root of the destination-oriented directed acyclic graph (DODAG). Each router that is part of the graph sends DODAG information object (DIO) control messages and advertises its rank within the graph, which indicates its position in the network relative to the root. When a node receives a DIO message, it determines its own rank, which must be higher than the rank of each of its parents, and then continues sending DIO messages using the trickle timer. As a result, the DODAG begins at the root and eventually extends to encompass the whole network. To the best of our knowledge, this paper is the first review of intrusion detection systems for the RPL protocol based on machine learning (ML) techniques. The complexity of the new attack models identified for RPL, the efficiency of ML in detecting intelligent and collaborative threats, and the difficulty of deploying ML in challenging LLN environments underscore the importance of research in this area. The analysis draws on the “Google Scholar,” “Crossref,” “Scopus,” and “Web of Science” research sources and covers studies from 2016 to 2021. The results are illustrated with tables and figures.

1. Introduction

The Internet of Things (IoT) is both an old and a new concept; the term was introduced in 1999 by Kevin Ashton [1, 2]. He described a world where everything, including inanimate objects, has a digital identity of its own, and computers can organize and manage them [3, 4]. When this concept was first introduced, Ashton probably only had in mind the use of radio frequency-based identification chips. IoT research and development, which touches all aspects of human society, simplifies communication by interconnecting billions of ubiquitous objects and makes it possible to access and extract accurate information from the massive volume of data delivered [5]. With the increasing development and use of intelligent equipment, this idea is getting closer to implementation day by day. Forecasts show that between 2009 and 2021, intelligent and interconnected devices will grow by 30 percent to more than 26 billion units. IoT tries to connect all devices that can process and communicate through the IPv6 protocol. The RPL routing protocol [6] for LLNs such as 6LoWPAN (IPv6 over low-power wireless personal area networks) networks [7, 8] was proposed to standardize connectivity.

The Internet Engineering Task Force (IETF) organized the routing over low-power and lossy (ROLL) networks working group to address LLNs. The main features of LLNs are high packet loss; low bit rate, throughput, and delivery ratio; poor stability; constrained resources; and the ability to work in harsh and challenging environments for a long time [9, 10]. The LLN design started from the idea that “the Internet Protocol could and should be applied even to the smallest devices, and that low-power devices with limited processing capabilities should be able to participate in the Internet of Things.” An LLN is a sensor network that communicates using the RPL protocol and the physical and data link layers [11]. In IoT, unlike sensor networks or other wireless communication networks, there are fewer restrictions on connecting wireless equipment, and any device can be connected to any other device regardless of its processing and communication capabilities. For this reason, IoT systems are vulnerable to many attacks, and securing these devices is a pressing need [12]. One of the most important requirements of an IoT system is that its communication be secure; in other words, both confidentiality and authenticity must be ensured. Even establishing a secure connection, ensuring confidentiality and integrity, and using authentication cannot fully guarantee the security of the system and its communications: the system remains vulnerable to a wide range of attacks. As the de facto routing protocol for IoT infrastructures, RPL has experienced many security threats at the data traffic, network topology, and node resource levels [13, 14].

2. Motivation

Today, intrusion is one of the most critical and widespread threats to wireless networks, especially wireless sensor networks (WSNs) [15]. An intrusion threatens the security of data collection and access, so that the intruder can use the information to advance malicious goals. Network intrusion is defined as any unauthorized attempt to access, distort, alter, or corrupt information in order to make a system unreliable. In almost all large-scale information technology infrastructures, an efficient intrusion detection system (IDS) is needed to protect networks from existing and future attacks. Despite advances in protection and detection mechanisms, it is still impossible to protect computer networks completely. In the arms race against complex and varied network intrusions, traditional intrusion prevention methods such as firewalls, access control, or encryption are insufficient to protect networks entirely [16, 17]. An IDS tries to classify connection activity into two categories: normal and abnormal. More advanced systems sometimes also identify the type of deviant behavior, also called intrusion. In an IDS, each connection is described by a set of features, and the decision about whether the connection is normal or abnormal is made using those features. Intrusion detection systems make it possible to detect abnormal behavior when an intruder tries to gain unauthorized access after passing through a network security system. An IDS detects and exposes intruders by identifying unauthorized activities through network traffic monitoring or user activity reports [15, 16, 18].
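
The feature-based decision described above can be illustrated with a minimal sketch. It assumes a simple per-feature z-score test against a profile learned from known-normal connections; the feature names (`packets_per_s`, `mean_packet_size`), the threshold, and the sample values are hypothetical and are not taken from any of the reviewed systems.

```python
from statistics import mean, pstdev


def fit_profile(normal_connections):
    """Learn per-feature mean and standard deviation from known-normal connections."""
    profile = {}
    for name in normal_connections[0]:
        values = [conn[name] for conn in normal_connections]
        # Fall back to 1.0 so a constant feature does not cause division by zero.
        profile[name] = (mean(values), pstdev(values) or 1.0)
    return profile


def classify(connection, profile, z_threshold=3.0):
    """Label a connection 'abnormal' if any feature lies far from the normal profile."""
    for name, (mu, sigma) in profile.items():
        if abs(connection[name] - mu) / sigma > z_threshold:
            return "abnormal"
    return "normal"


# Hypothetical training data: feature vectors of connections assumed to be normal.
training = [
    {"packets_per_s": 10, "mean_packet_size": 60},
    {"packets_per_s": 12, "mean_packet_size": 64},
    {"packets_per_s": 11, "mean_packet_size": 58},
    {"packets_per_s": 9, "mean_packet_size": 62},
]
profile = fit_profile(training)

print(classify({"packets_per_s": 11, "mean_packet_size": 61}, profile))   # normal
print(classify({"packets_per_s": 400, "mean_packet_size": 61}, profile))  # abnormal
```

A real IDS would use many more features and a trained classifier; the sketch only shows how a connection described by features is mapped to one of the two categories.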

It is not possible to prevent intrusion entirely; countermeasures can only reduce its extent. Vulnerabilities in software and protocols, together with structural vulnerabilities of the network, give intruders conditions to exploit [19]. Software vulnerabilities are caused by poor implementation and programming-language weaknesses. Structural vulnerabilities are network configuration problems in controlling access to critical network points. Intrusion detection is generally divided into two types. Host-based intrusion detection: these systems are installed on host computers and inspect the system’s activities and files. Distributed intrusion detection: these systems collect, process, and analyze network packets using several agents on different parts of the network. They operate in a distributed manner to receive information from other parts of the network and to share the analysis load.

So far, numerous studies have addressed the design of a robust yet functional IDS for various IoT applications. These IDSs are centralized, hybrid, or distributed, depending on their deployment. Depending on the detection mechanism (specification-based, anomaly-based, signature-based, misuse-based, or hybrid), they can detect, classify, and counter attacks. Some of this work is designed for the RPL protocol and can detect RPL insider attacks or attacks that threaten the network structure, and can isolate attackers. Using machine learning (ML) and deep learning (DL) algorithms to detect attacks and counter threats intelligently is a new approach that recent papers have considered for RPL-based IDS design. Because of the various challenges posed by common scenario-based methods and intrusion detection datasets, learning-based models for detecting and dealing with complex and unknown attacks have become increasingly popular with researchers. Several review papers have focused on LLN and RPL security in recent years, examining the challenges of designing IDSs for LLN-based IoT infrastructures. However, there is a lack of research that comprehensively studies intrusion detection methods for RPL with a focus on ML techniques. Therefore, the important role of ML in providing an innovative and efficient intrusion detection strategy for complex and cooperative attacks in RPL is our primary motivation for this work.

3. Contribution and Organization

This paper provides an overview of research on IoT intrusion detection systems. It then examines the intrusion detection techniques, deployment strategies, architectures, attack types, and validation methods used in the proposed solutions. As the most critical solution for routing the data of many IoT LLN devices and forwarding traffic toward the Internet, the RPL protocol faces several security challenges. Identifying RPL features and classifying the different types of known attacks provides the knowledge needed to design a robust and efficient IDS. After considering ML and classification methods as part of the data mining concept, this study presents new mechanisms for designing an IDS for RPL based on these solutions. The primary purpose of this review is to convey useful information from RPL-based IoT intrusion detection sources to interested researchers, so that they can identify existing challenges and discover ideas for efficient IDS design in challenging environments that facilitate complex and unknown attacks. Based on the motivations presented, the key contributions of this study are as follows:
(i) An anatomy of IoT intrusion detection, including placement strategy, architecture, detection methods, security threats, and assessment methods
(ii) An overview of the design and system features of RPL, the structure of its control packets, the construction process, and the techniques for returning to stable conditions
(iii) Classification of insider and routing attacks in the RPL protocol
(iv) Familiarity with the materials, methods, and technologies proposed for intrusion detection in the IoT ecosystem
(v) An overview of data mining, ML and classification concepts, and popular models
(vi) Acquaintance with the facilities that data mining provides for IDS
(vii) An introduction to the various ML mechanisms proposed for intrusion detection in RPL
(viii) Open issues and challenges in IDS design for IoT and LLN, in line with the current research review

The rest of the paper is organized as follows. Section 4 provides a comprehensive literature review on the IDS types designed for the 6LoWPAN and WSN infrastructures in IoT, the standard attacks and threats to IoT applications, intrusion detection systems focusing on objective functions designed for RPL, ML-based solutions for designing IDS for RPL-based IoT, and the types of datasets used by ML-based systems to detect and counter RPL and IoT attacks. Section 5 presents a taxonomy of IoT intrusion detection solutions and describes the categories of IDS placement, architectures, and validation strategies. The RPL routing protocol, its design features, the structure of its control messages, the DODAG construction process, repair mechanisms, and the trickle timer algorithm are discussed in Section 6. Next, Section 7 introduces and categorizes RPL insider attacks and attacks that threaten the RPL routing process. Section 8 presents the data mining concept, machine learning and classification methods, and data mining aspects of IDS design. Section 9 describes machine learning-based methods for IDSs used in RPL-based IoT. Section 10 gives a statistical analysis of the current review. Sections 11 and 12 present the discussion, explain the challenges of designing an efficient IDS for RPL-based IoT, and outline future research directions for potential researchers. Finally, Section 13 provides the conclusion. Table 1 lists the abbreviations used in this work and their descriptions.

4. Literature Review

In [20], a framework for real-time intrusion detection called SVELTE was proposed and implemented in the Contiki/Cooja system. This method uses three elements to detect intrusion in real time. The first element collects traffic information on the network. The second element detects the presence or absence of intrusion in the network based on the data collected and analyzed. The third element is a small distributed firewall that prevents the attack from spreading and blocks distributed attacks. Pongle and Chavan [21] propose a Denial of Service (DoS) attack detection architecture for 6LoWPAN, a standard protocol designed to transfer data between small Internet-connected devices. The proposed architecture is an IDS integrated with the ebbits framework, and its purpose is to detect and counteract DoS attacks against 6LoWPAN networks. To evaluate the performance of the proposed architecture, the authors performed real-time experiments using penetration testing systems, which showed an improvement in the attack detection rate. Besides, as the IDS is developed further, more attacks can be detected. In [22], a profile-based IDS is presented to detect attacks on IoT routing services. The purpose of that paper is to prevent, detect, and isolate the effect of routing attacks. The mechanism identifies the attacker by analyzing the behavior of nodes in the network: the behavior and transmissions of the nodes are collected and sent to a server. This method has high detection accuracy. Bostani and Sheikhan [23] presented a real-time hybrid method to detect internal intrusions that may occur in a 6LoWPAN network. In this model, the MapReduce method is used to detect distributed intrusion. The proposed model uses real-time anomaly-based and misuse-based methods to detect intrusion. In other words, it uses both supervised and unsupervised methods. The main focus of the proposed method is on detecting distributed attacks such as distributed denial of service (DDoS).

IoT solutions combine multiple technologies, services, and standards, each with its own set of security and privacy requirements. As a result, it is reasonable to believe that the IoT model, which includes mobile communication networks (for example, WSNs), cloud systems, and the Internet, has security concerns. Standard protection and privacy controls cannot be applied directly to IoT technology because of three major factors: the minimal computing capacity of IoT components, the vast number of interconnected devices, and the data exchange between objects and users. An example of how IoT devices are vulnerable is described in [24]. In that paper, the authors examine the activities of three IoT devices (Philips light bulb, Belkin WeMo socket, and Nest smoke alarm) and demonstrate how the protection and privacy of these devices can be jeopardized. The authors found a flaw in the response message exchanged between the bridge and the Philips light bulb (a wireless router and the Philips Hue application). An intruder can discover the bridge’s registered usernames and IP addresses because they are communicated in plain text. Using the developers’ Python code, the intruder can also take complete control of the bridge via HTTP PUT requests. According to [25], the rapid development of IoT technologies may lead to security and privacy threats being ignored.

The researchers identified several security vulnerabilities by building a system from popular commercial standard services and products. They put together an intelligent irrigation device that includes a section that provides environmental readings, a module for carrying out user decisions, and a unit that connects the user to the rest of the architecture. They used an Arduino Uno single-board device to execute all sensing and actuation functions and to host the web application. Web server vulnerabilities, SQL injection, cross-site scripting (XSS), and weaknesses in wireless communication are only some of the issues described by the authors. For instance, the authors describe the following attack: as happens in the real world, an intruder can set up a software-enabled access point with the same Service Set Identifier and no authentication. By transmitting forged authentication packets, the intruder then temporarily disconnects all IoT devices. At this stage, the IoT devices attempt to reconnect to the access point with the same identifier and the best signal. According to the authors, sophisticated operating systems may deter such attacks, but the operating systems of many IoT devices lack the required functionality and may not be able to tell the difference, so they will connect to the attacker’s forged software-enabled access point. As shown in Table 2, an attacker could then eavesdrop on network traffic and send remote requests to IoT devices.

IoT vendors need to release patches for software and hardware vulnerabilities in their products. The development of new IoT products should also treat the protection of interactions between IoT entities as a concern. These steps will improve the stability of IoT networks. Additional protections, such as intrusion detection systems, are also needed because attackers may attempt to find new vulnerabilities by combining known vulnerabilities that have not been appropriately secured. Object copying, malicious object replacement, firmware removal, extraction of security parameters, eavesdropping, person-in-the-middle (PITM) attacks, routing attacks, and denial of service were all listed as security risks involving IoT entities in [41]. Table 2 organizes the intrusion detection systems proposed for the IoT into groups based on the kinds of threats they track (according to their authors) [42]. Security risks associated with the traditional technologies and interfaces used to create the IoT ecosystem, such as insecure communication over HTTP and malicious code injection, can extend to IoT systems, as noted in [42]. These types of attacks are considered regular attacks.

As seen in Table 3, the proposed IoT detection systems can be split into two classes: detection techniques for routing threats and detection techniques for denial of service attacks. Other attacks listed in the analysis include PITM and regular attacks. In this section, five groups of related scientific papers are analyzed and discussed. The five key topics are IoT procedures; attacks attributed to the Minimum Rank with Hysteresis Objective Function (MRHOF) [43] and OF0 [44] objective functions (OF); IDS procedures and feature collection; datasets and classifiers related to ML; and preprocessing and load balancing (LB) methods. Due to page restrictions, we address two papers from each group here. Table 3 includes more similar papers from each category and an overview of how they differ from our approach. Publications, claims, and literature were selected based on practical and simulation studies, expert judgment, assessment, interpretation, and opposing viewpoints. Scholarly literature search engines, archives, and journals were used to identify the strengths, shortcomings, and gaps of existing research.

The authors of [18] present a survey of IoT-based IDS that covers ML, anomaly-driven approaches, and energy-efficient intrusion detection, along with a study of objective function behavior in IoT methodologies, all of which are important to our research in this paper. Centered on mesh-under and route-over [45] systems, the researchers treat energy usage as a criterion for building typical behavior profiles against which malicious behavior can be detected. Each node must check its energy utilization at predetermined sampling rates and report any deviations from the expected amounts. Malicious behavior is defined as a deviation from predicted values, and the entry of the offending node is removed from the routing table accordingly. The research builds on the node behavior concept, emphasizing energy consumption (EC), and argues that packet overhead and memory usage are also appropriate criteria for IoT-based IDS. Even though the study presents a deep characterization of IoT-based IDS, it lacks technical scope.
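
The energy-profile check described above can be sketched as follows. This is only an illustration under stated assumptions: the relative tolerance, the millijoule units, and the names `find_energy_anomalies` and `prune_routing_table` are hypothetical and do not come from the surveyed work.

```python
def find_energy_anomalies(samples_mj, expected_mj, tolerance=0.25):
    """Return the indices of sampling intervals whose energy use deviates from
    the expected value by more than `tolerance` (a fraction of the expected value)."""
    return [
        i for i, observed in enumerate(samples_mj)
        if abs(observed - expected_mj) > tolerance * expected_mj
    ]


def prune_routing_table(routing_table, energy_log, expected_mj, tolerance=0.25):
    """Drop the routing entries of nodes that show at least one anomalous interval."""
    return {
        node: next_hop
        for node, next_hop in routing_table.items()
        if not find_energy_anomalies(energy_log.get(node, []), expected_mj, tolerance)
    }


# Hypothetical topology and per-interval energy readings in millijoules.
routing_table = {"n1": "root", "n2": "n1", "n3": "n1"}
energy_log = {
    "n1": [10.0, 10.5, 9.8],
    "n2": [10.2, 19.0, 18.5],  # energy use nearly doubles after the first interval
    "n3": [9.9, 10.1, 10.0],
}

print(prune_routing_table(routing_table, energy_log, expected_mj=10.0))
# {'n1': 'root', 'n3': 'n1'}
```

In practice the expected value would come from a learned behavior profile per node rather than a single constant.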

Besides, analysis alone may not be an appropriate way to provide an objective approach to designing ML-based IDS for IoT. Rehman et al. [46] address the susceptibility of the RPL protocol to RAOF. The attacker manipulates the routing protocol metrics. As a result, the neighboring nodes are attracted to the malicious node and choose it as their preferred parent, which indicates that the attack on the objective function has succeeded. Their simulation results show the effect of the attack when the attacker is placed at a convenient position inside the RPL DODAG. With regard to RAOF, the research is essential because it identified a connection between EC, OF, hop count (HC), and other routing criteria. Nevertheless, it does not give any counterarguments to their strategy.
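
The effect of a manipulated routing metric on parent selection can be illustrated with a deliberately simplified sketch. It assumes that a node picks as preferred parent the neighbor advertising the lowest path cost; real RPL objective functions such as OF0 and MRHOF apply additional rules, and the node names and cost values here are hypothetical.

```python
def preferred_parent(advertised_cost):
    """Pick the neighbor that advertises the lowest path cost toward the root."""
    return min(advertised_cost, key=advertised_cost.get)


# Honest neighbors advertise their real path cost.
honest = {"A": 512, "B": 768}
print(preferred_parent(honest))  # A

# A malicious neighbor M advertises a falsely low cost and attracts the node.
with_attacker = {**honest, "M": 256}
print(preferred_parent(with_attacker))  # M
```

The sketch shows why a well-placed attacker can redirect traffic: the victim has no way to verify the advertised value on its own.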

When proposing an attack on routing metrics, the relation between the EC and HC metrics should be explained in more detail. The paper included a systematic review of OF vulnerabilities across RPL networks and implemented a new attack not previously seen in the literature. Airehrour et al. [42] propose a trust-based RPL design that identifies blackhole (BH) and selective forwarding (SF) attacks in IoT and compare it with OF0 and MRHOF. The trust-aware design is examined against OF0 and MRHOF to see whether their method is effective. According to their findings, SF attacks were substantially reduced under the trust-aware design, which works by isolating malicious nodes from the rest of the network. MRHOF and OF0, on the other hand, do not identify or cut off SF-based attacks.

Furthermore, the trust-based protocol identified and distinguished BH attacks by reviewing the order and sequence IDs of transmitted packets, a feature that OF0 and MRHOF do not have. The authors also did not use highly detailed OF0 and MRHOF mechanisms to identify and separate malicious nodes, and basic information is lacking in this regard. Airehrour et al. [47] also provide a trust-based mechanism for identifying and countering rank and Sybil attacks. The performance of SecTrust-RPL is contrasted with that of the conventional OF0- and MRHOF-based protocols. MRHOF-based RPL performs better than OF0 on metrics based on resource and network flow analysis. MRHOF is more susceptible than SecTrust-RPL when only rank and Sybil attacks are considered. Although the authors claim that their protocol is more stable than OF0 and MRHOF, they do not discuss objective function and RPL protocol-based IDS. We need this context to expand our research scope, since ML combined with IDS can eliminate the need for RPL and the OF to detect and isolate attacks. Experiments comparing SecTrust-RPL to the other two OFs using an appropriate IDS would have given a reasonable assessment of IoT protection.

Sheikhan and Bostani [63] describe a security framework for detecting attacks across IoT infrastructures, built on a distributed design and using IDS methodologies with feature selection (FS). Their approach focuses on using ML to identify SF and SH attacks that disrupt planned actions or cause irregular activities. According to their reports, anomaly detection detects SH and SF attacks at a rate of up to 80.95% with a 5.92% false alarm rate, while misuse-based monitoring detects up to 97.88% of SF and SH attacks with a 1.96% false alarm rate. Despite its higher detection rate and lower false alarm rate, the misuse-based mechanism can only detect known attacks. Although this research emphasizes the need to identify significant behavioral characteristics such as packet reception rate, average latency, packet loss rate, and maximum HC, more studies are needed to detect OF-based attacks. Napiah et al. [64] propose compression header analysis intrusion detection (CHA-IDS), which uses a multilayer perceptron (MLP) to detect hybrid attacks such as HF, WH, SH, and Flood in RPL and 6LoWPAN. Drawing on the naïve Bayes (NB), support vector machine (SVM), MLP, Random Forest, Logistic, and J48 algorithms to collect and analyze raw data, this mechanism offers intrusion detection for 6LoWPAN based on anomaly and signature-based features. The authors provide experimental evidence that CHA-IDS outperforms other 6LoWPAN IDS models in detecting mixed attacks. Compared to the IDS of Pongle and to SVELTE, this mechanism uses compressed 6LoWPAN header data instead of signal strength and rank indicators calculated from detection functions. Abnormal routing patterns, destination port, context ID, destination context ID, and next header are used by the ML algorithms to identify attacks effectively. According to their findings, J48 was the most efficient ML algorithm across all datasets, while Random Forest came in second. Studies on 6LoWPAN and RPL bugs, as well as shortcomings in existing IDS processes, are among the paper’s highlights.
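
Detection rate and false alarm rate, the two figures quoted above, are usually derived from the confusion matrix of a classifier. The following sketch shows the standard computation; the label strings and the small sample are hypothetical and unrelated to the results reported in [63, 64].

```python
def detection_metrics(true_labels, predicted_labels, attack="attack"):
    """Return (detection_rate, false_alarm_rate) for a binary attack/normal task.

    detection_rate   = TP / (TP + FN): share of real attacks that were flagged.
    false_alarm_rate = FP / (FP + TN): share of normal samples flagged as attacks.
    """
    tp = fn = fp = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == attack:
            if predicted == attack:
                tp += 1
            else:
                fn += 1
        elif predicted == attack:
            fp += 1
        else:
            tn += 1
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical ground truth: 4 attack samples followed by 6 normal samples.
truth = ["attack"] * 4 + ["normal"] * 6
# Hypothetical classifier output: one missed attack and one false alarm.
predicted = ["attack", "attack", "attack", "normal"] + ["normal"] * 5 + ["attack"]

dr, far = detection_metrics(truth, predicted)
print(f"detection rate = {dr:.2%}, false alarm rate = {far:.2%}")
```

A misuse-based detector typically scores well on both numbers for known attacks, which is consistent with the comparison above, but these metrics say nothing about attacks absent from the test data.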

Buczak and Guven [70] use the publicly available datasets NetFlow, Knowledge Discovery and Data Mining (DM) 1999 (KDD 1999), Secure Shell (SSH), Domain Name System (DNS), DARPA 1998-2000, and tcpdump for ML algorithms, classification mechanisms, and DM-based intrusion detection. When ML algorithms are studied, training should be done on the same dataset to ensure accurate comparisons with other work. KDD 1999 is identified as the best dataset since its development, but it is constrained in the attacks it covers, which may cause problems when it is used as a reference dataset for OF0- and MRHOF-based RPL in IoT. Alam et al. [71] evaluate eight DM algorithms for the IoT: artificial neural network (ANN), deep learning-based ANN (DLANN), C4.5, C5.0, k-nearest neighbors (KNN), SVM, linear discriminant analysis (LDA), and NB. They use three sensor datasets from the University of California, Irvine (UCI) data repository to test whether new DM algorithms are required for IoT or whether existing standard algorithms are suitable for IoT datasets. The comparisons on IoT datasets showed that C4.5, C5.0, ANN, and DLANN performed better than NB, SVM, KNN, and LDA in terms of accuracy and elapsed time.

Each ANN contains neurons that can learn complex and nonlinear functions by emulating the behavior of the human brain, which makes ANNs suitable for fields such as DM, machine vision, medical applications, reinforcement learning, and deep learning [73, 74]. The ANN and DLANN algorithms had the highest detection accuracy, but they suffered from high computational cost and memory usage. The C4.5 and C5.0 algorithms had high processing speed and accuracy and low memory usage. The study covers an area that is hard to find in the literature, since the KDD and DARPA datasets are primarily used for ML-based IoT analyses. Because the paper contains an original investigation in a field that had not been studied before, it provides more in-depth descriptions of the UCI datasets. Yin and Gai [72] discuss the complexities of ML and DM techniques used for preprocessing and balancing very large and new types of data. Classification processes, preprocessing, feature collection, and data sampling are all discussed in that paper. According to the publication, various classification algorithms are available to establish appropriately balanced, high-quality datasets. The common preprocessing mechanism increases the accuracy of the dataset through proper sampling and by reducing the selected features. The authors tried to develop preprocessing mechanisms with ML to create high-quality datasets. They used only a C4.5-based classifier to exclude conflicting effects across the 12 datasets used.

The findings show that a classifier’s accuracy is higher when FS is performed before data sampling. When data is largely imbalanced, experimental results show that undersampling is preferable to oversampling with respect to minority classes. Other preprocessing levels could have been included in the experiments to increase the dataset’s precision further. Following the available literature, we identified the primary focus areas of the current review paper: network-related and EC criteria, feature collection, vulnerability analysis of the OF0 and MRHOF objective functions, development of an innovative dataset based on IoT attacks, and variations of IoT attacks. For example, unlike [18, 64], our paper describes an ML-IDS mechanism that detects a combination of attacks over OF0 and MRHOF based on network and power consumption metrics. Also, unlike [70, 72], feature reduction, normalization, sampling, and preprocessing methods have been used to create a dataset based on attacks against the two objective functions. Besides, to the best of our knowledge, no one uses time series-based ML classifiers with the new IoT dataset to detect a combination of attacks on different objective functions (such as OF0 and MRHOF), focusing on network and energy usage metrics. Table 4 summarizes the analysis of the most critical new high-quality papers under review.
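
The preprocessing order reported above (feature selection first, then sampling) can be sketched with a simple random undersampler. This is a minimal illustration, not the procedure of [72]: the feature names, the hand-picked feature subset, and the fixed seed are hypothetical.

```python
import random
from collections import Counter, defaultdict


def select_features(rows, keep):
    """Keep only the chosen feature columns plus the class label."""
    return [{**{k: row[k] for k in keep}, "label": row["label"]} for row in rows]


def undersample(rows, seed=0):
    """Randomly drop majority-class rows until every class has as many rows
    as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row["label"]].append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced dataset: 20 normal rows and 4 attack rows.
dataset = [
    {"hop_count": 3, "energy_mj": 10.0, "node_id": i, "label": "normal"}
    for i in range(20)
] + [
    {"hop_count": 1, "energy_mj": 19.0, "node_id": 100 + i, "label": "attack"}
    for i in range(4)
]

# Feature selection first, then sampling, in the order reported above.
reduced = select_features(dataset, keep=["hop_count", "energy_mj"])
balanced = undersample(reduced)
print(Counter(row["label"] for row in balanced))
```

After both steps each class contributes four rows, and the identifier column, which carries no predictive information, is no longer present.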

5. Intrusion Detection in IoT

Because of the unique features of IoT that influence the development of intrusion detection systems, current IDS solutions are insufficient for IoT systems. First, the memory and processing capacity of the network nodes that host the IDS is an important issue. IoT networks are made up of resource-limited nodes, so locating nodes that can support intrusion detection agents is more challenging in IoT applications. Second, the functional characteristics of the network architecture are essential. In conventional networks, end systems are connected directly to dedicated nodes, such as switches and routers, that are responsible for transferring packets to their destination. IoT networks, however, are typically multihop, and ordinary nodes both forward packets and serve as end devices. The third characteristic concerns network protocols. Protocols not used in standard networks, such as RPL, 6LoWPAN, IEEE 802.15.4, and CoAP (the Constrained Application Protocol) [75], are used in IoT networks [76]. Existing papers focus on intrusion detection systems for individual IoT-related components, but none of them look at the basic intrusion detection mechanisms for IoT. In the following, IDS placement strategies and detection methods are described, along with the traditional security threats or attacks in the IoT against which IDSs may be used. The validation strategies used by intrusion detection methods for IoT are also described. In general, related works are classified as follows, as shown in Figure 1 [77]: (1) placement strategy, (2) architecture, (3) detection methods, (4) security threats, and (5) validation strategy.

5.1. The Strategy of Intrusion Detection System Placement

An intrusion detection system can be installed on the 6LoWPAN border router (6BR), on one or more dedicated hosts, or on any physical object in an IoT network. One of the benefits of installing the IDS in the 6BR is the ability to detect intrusion attempts from the Internet against objects in the physical domain. However, because of the frequent queries the IDS sends into the network, a 6BR IDS can create communication overhead between the LLN nodes and the 6BR. The communication overhead associated with network monitoring may be reduced by deploying the IDS on LLN nodes. This does, however, require more resources (energy, storage, processing), which may be a challenge given the limited capacity of LLN nodes. Although distributing intrusion detection more widely may reduce network monitoring traffic and provide greater computing capacity, this strategy requires more precise organization of the different network domains, which can be problematic. The advantages and drawbacks of three different placement strategies for IDSs (distributed, centralized, and hybrid) are discussed below, starting with the distributed intrusion detection system deployment strategy. In the distributed model, the IDS is located within every physical entity of the LLN [77, 78].

Because node resources are limited, the IDS installed in each node must be optimized. Watchdog nodes (inspection nodes) monitor the activity of neighboring nodes. INTI (intrusion detection of SH attacks on 6LoWPAN for IoT) is a solution proposed by Cervantes et al. [38] that combines the principles of trust and reputation with watchdogs to monitor and mitigate attacks. Nodes are first classified as representatives, associated nodes, or members and arranged in a hierarchy. Each node's role can change over time because of network reconfiguration or an attack.

Each node then keeps an eye on a superior node by estimating its incoming and outgoing traffic. When it identifies an attacking node, it broadcasts a message to warn the other nodes and isolate the attacker. The authors have not addressed the solution's effect on low-capacity nodes. Examples of traffic prediction methods include Monte Carlo Q-learning [179] and multitask learning [180].

5.1.1. Placement Technique for a Centralized Intrusion Detection System

In the centralized model, the IDS is installed in a central component, such as the 6BR or a dedicated host. The 6BR collects all data from the LLN nodes and forwards it to the Internet, and it relays requests from Internet users to the LLN nodes. As a result, an IDS at the 6BR can examine all traffic passing between the LLN and the Internet. However, traffic analysis at the 6BR alone is insufficient to identify attacks that affect LLN nodes. A detection method must be built that can monitor traffic exchanged between LLN nodes while limiting the impact of this monitoring on the operation of the power-constrained nodes. During an attack that compromises a portion of the network, a centralized IDS can also have trouble monitoring nodes, even though this approach increases network traffic. The authors demonstrate that LLN nodes do not need extra memory to run the heartbeat algorithm and that the energy overhead is negligible.

5.1.2. Placement Technique for Hybrid Intrusion Detection Systems

Hybrid placement combines the centralized and distributed approaches to gain their benefits and avoid their drawbacks. In the first hybrid approach, the network is divided into clusters or domains, and the IDS is hosted only on the main node of each cluster. This node is then responsible for monitoring the other nodes in the cluster. This description appears consistent with Cervantes et al. [38]. Those authors, however, divide the network into clusters and choose cluster leaders, yet every node, whether it is a leader or not, keeps an eye on its neighbors. Hybrid approaches, in which only selected nodes host instances of the IDS, are efficient. However, IDSs with hybrid placement could take more time to build than IDSs with distributed placement. Amaral et al. [34] suggested a hybrid IDS for the Internet of Things in which the IDS is hosted on selected nodes in the network. By eavesdropping on packets exchanged in their vicinity, these selected nodes (watchdogs) help detect intrusions. Each watchdog decides whether a node is compromised based on a set of rules. Because each part of the network can behave differently, every watchdog (inspector node) has its own rules. A 6BR, for example, receives a higher number of data messages than an ordinary node. One benefit of this approach is that different rules can be created for each network area. Lee et al. [56] take a similar approach to network organization. They use a hybrid technique that builds a backbone of observer nodes. A limited number of observer nodes cover the whole network; each one listens in on its neighbors' communications and decides whether a node is compromised. This approach has the benefit of adding no communication overhead, because an observer node only listens in on transmissions between its neighbors. In another paper, Lee et al. [57] divide the network into small clusters of several related nodes.

Each cluster has a cluster head (CH), a node that is directly linked to all cluster members. Every CH runs an instance of the IDS that monitors the cluster members by listening to their communications. Cluster members must report essential information about themselves and their neighbors to the CH. According to the authors [21, 58], the CH could be a more powerful node. They choose a lightweight IDS to build their solution. In the second hybrid approach, IDS modules are placed in the 6BR and in other network nodes. The presence of a central component distinguishes this approach from the previous one. The IDS module in the 6BR is responsible for tasks that require more resources, whereas the IDS modules in ordinary nodes are typically lightweight. Raza et al. [58] proposed SVELTE, an IDS in which the 6BR hosts the processing-intensive IDS modules. For example, the 6BR is responsible for detecting intrusions by analyzing RPL network data. Network nodes handle lightweight functions such as sending RPL network data to the 6BR and notifying the 6BR about malicious traffic they receive. Pongle et al. [21] suggested a system in which network nodes are responsible for identifying changes in their neighborhood and forwarding this information to centralized modules in the 6BR. The centralized modules store and process this information to detect intrusions and identify potential attackers. The description of this IDS suggests an architecture that requires a lot of traffic to detect intrusions. Nevertheless, the results show that the energy overhead, packet overhead, and memory utilization are acceptable in a constrained-node environment.

5.2. Intrusion Detection System Architectures

This section explains the classification of IDSs by architecture: independent, distributed and participatory, hierarchical, and moving agent [32].
(i) Independent Architecture. Each observer node collects information and detects intrusions by itself. The observer nodes may be centralized or distributed. In a centralized observer node architecture, each network node acts as an observer node. In a distributed observer node architecture, each observer node monitors a specific area of the network. Each sensor node must be within range of at least one observer node. Each monitoring node runs an independent IDS
(ii) Distributed and Participatory Architecture. An IDS agent runs on each monitoring node, and all monitoring nodes cooperate in the intrusion detection procedure. Each IDS monitors the behavior of its neighboring nodes, but the data and alerts exchanged with other monitoring nodes across the network contribute to the overall decision. Such a system improves detection performance. This architecture is suitable for a network infrastructure with a single DODAG
(iii) Hierarchical Architecture. This architecture is suitable for a clustered sensor network with a hierarchical structure that includes multiple DODAGs (with a common sink node). Sink nodes act as CH agents. Local agents are designed and deployed as IDSs with an independent architecture, and they cooperate in the intrusion detection process
(iv) Moving Agent Architecture. Several mobile agents collaborate on the intrusion detection process. The mobility of IDS agents may improve the performance of IDSs. A mobile agent is a special piece of executable code that can move from one node to another and gives a particular application self-control. Agent migration means the transfer of the agent between two nodes, or the transfer of data on which computations are also performed during this process
(v) Examining Many Routing Attack Types. Attacks such as WH, SF, HF, SH, Sybil, and identifier attacks can be mentioned. Wallgren et al. [31] provided an IDS capable of detecting SF attacks. Similarly, the IDS proposed by Raza et al. [20] can detect both SH and SF attacks. Cervantes et al. [38] have developed a system for detecting SH attacks. In this work, the authors address node mobility and network self-repair, which relates closely to the work of Raza et al. [20]. Pongle et al. [21] have developed an IDS to detect WH attacks

5.3. Validation Strategy

According to Balci [79], validation entails testing whether the built model behaves with sufficient accuracy with respect to the research objectives. There are several validation processes, each characterized by one of two sources of evidence: experts and data. Relying on experts offers a subjective and often qualitative assessment, whereas data can be more relevant for quantitative validation. This analysis looks at the validation techniques used in IoT intrusion detection methods. These serve as a starting point for judging the field's maturity. For this purpose, the following classification of validation methods is used [79]:
(i) Hypothetical. Cases with an ambiguous relationship to actual phenomena and varying degrees of realism
(ii) Empirical. Experimental approaches, such as gathering systematic experimental evidence from operational contexts
(iii) Simulation. Methods that simulate IoT scenarios
(iv) Theoretical. Formal or systematic scientific arguments that justify the results
(v) None. No form of validation is included

Scientific progress is based on results that can be objectively tested and compared in large-scale simulations. Most traditional IDS evaluation is based on data from the Lincoln Laboratory/DARPA tests of 1998 and 1999. That effort is the most comprehensive evaluation of intrusion detection research published to date. Several researchers criticize this data set as outdated and unable to reflect the latest attack trends. Having a suitable data set is crucial to understanding a model correctly.

6. RPL Routing Protocol

The RPL protocol is designed for routing in LLNs, in which frequent link interruptions and packet losses are inevitable. RPL has a tree-like structure. Together with IPv6, it provides data aggregation and interoperability between Internet-based devices. RPL is mainly used in 6LoWPAN networks, where it builds a DODAG. It supports one-way traffic toward the DODAG root (usually the 6BR) as well as two-way traffic between 6LoWPAN devices and between devices and the DODAG root. The RPL protocol is proactive and starts routing as soon as the network starts. In a network, each node has a CH that acts as a gateway for that node. If a node has no entry in its routing table for a packet, it forwards the packet to its CH. This forwarding continues until the packet reaches its destination or reaches the sink node, which has the relevant information. Therefore, the CH will have a larger routing table. Route selection is one of the essential factors in RPL [80].

Machine-learning-based anomaly detection with the OCSVM technique in supervisory control and data acquisition (SCADA) networks may achieve a high detection rate while substantially lowering the false alarm and false-negative rates. The limitations of a one-class technique with a kernel strategy lie in determining a suitable threshold and minimizing the associated cost. The effect of iForest is not as strong as that of OCSVM, but it is accurate and can take advantage of important information. Hence, it is suitable for online learning. For multiclass classification, a supervised learning algorithm, particularly a decision tree, can be used to quickly learn, construct, and store intrusion rules. Although KNN with feature vectors achieves results comparable to decision trees, its processing cost restricts its application [181, 182]. Intrusion into an industrial control network differs from intrusion into an Internet network: rather than focusing on network communication flaws, the former exploits faults associated with industrial control and industrial equipment. ML and data analysis may be used to discover the relationship between normal and abnormal behavior. Experiments showed that a one-class classification approach to intrusion detection can only flag abnormal behavior and cannot identify the class of abnormality. Such unsupervised learning therefore limits the experimental goals and the interpretation of results; applying a semisupervised method to the IDS is an improvement. In addition to low cost, security, and mobility support, industrial applications demand dependable connectivity with minimal latency [183]. RPL is gaining traction in industrial applications because it meets most of these fundamental requirements and, with existing enhancements, can be used to build a versatile, reliable, and scalable routing solution. GTM-RPL improves RPL's performance by allowing it to handle mobile nodes and optimize throughput, making it a viable option for industrial uses [184].

6.1. Design Goals and Network Model Based on RPL

The RPL routing protocol operates on top of link layers such as the IEEE 802.15.4 MAC (medium access control) and physical layers, and it is both a distance-vector and a source routing protocol. The RPL protocol also has a tree-like structure. The nodes periodically send their sensing data to central points called the low-power border router (LBR) or 6BR, which is the aggregation point of traffic for low-power nodes. Finally, the data is routed to the Internet or to a non-DODAG structure. RPL supports point-to-point traffic as well. RPL LLNs have two principal characteristics [81]:
(1) The bit rate is usually low (below 250 kbps)
(2) Communication has a high error rate and, as a result, poor data throughput

A low-power link has a high bit error rate and long periods of unavailability, which significantly affects the design of the routing protocol. The protocol is designed to adapt to changing network conditions and to offer alternate routes when default routes are unavailable. As previously mentioned, RPL is built on the directed acyclic graph (DAG) topology. The DAG defines the default paths between nodes in a tree-like layout. However, a DAG is more than a regular tree [82]: a node in a DAG can have several parents, whereas a node in a classical tree has only one. RPL organizes the nodes as a destination-oriented DAG (DODAG), in which the DAG root (sink) is the destination and provides the default route to the Internet. A network may contain one or more DODAGs, each of which belongs to an RPL instance with a unique identifier. Several RPL instances can run in the same network simultaneously, but they are logically independent. A node can join several RPL instances, but within each instance it can belong to only one DODAG. The RPL routing protocol combines mesh and hierarchical topologies. RPL is hierarchical by design, and a node can participate in one or more DODAGs simultaneously, depending on parameters such as the application type. RPL also supports a mesh topology, allowing routing through sibling nodes, when necessary, instead of through parents and children. This combination of mesh and hierarchical topologies provides considerable flexibility in topology control and routing [83]. Figure 2 gives an overview of an RPL network with two instances and three DODAGs.

The following features are included in the RPL protocol [29]:

Autoconfiguration: because RPL complies with IPv6, RPL-based LLNs use standard IP routing functions to dynamically discover network routes and destinations. This functionality is provided by neighbor discovery mechanisms.

Self-healing: RPL can adapt to topology changes and node failures in the network. Links and nodes in LLNs are not stable and can vary widely. RPL reduces the risk of failure by selecting more than one parent for each node in the DAG.

Loop avoidance and detection: because a DAG is acyclic, each node must have a greater rank than its parent nodes. RPL uses reactive procedures to detect loops when the topology changes, and it initiates global and local repair procedures to fix or prevent loops.

Independence and transparency: RPL operates independently of the link layer, so it can be used over constrained links or in combination with highly constrained devices. As a result, RPL is unaffected by the choice of data link layer technology.

Multiple edge routers: in an RPL network, multiple DAGs may be formed, each with its own root. A node may be part of several instances and play a different role in each of them. As a consequence, the network benefits from higher availability and load balancing (LB).

RPL packets can be sent using the three traffic patterns shown in the third DODAG of Figure 2 [28]:
(i) Multipoint-to-point (MP2P) uses upward routes, from the leaves to the root
(ii) Point-to-multipoint (P2MP) uses downward routes, from the root to the leaves
(iii) Point-to-point (P2P) uses both upward and downward routes

Each of the traffic patterns is explained in turn in the following sections.

6.1.1. MP2P (Multipoint-to-Point) Mode of Operation

RPL can handle MP2P traffic, that is, data collection traffic from several nodes to the DODAG root. In most LLN-based IoT applications, MP2P traffic accounts for the majority of network traffic. 6BRs, which play an essential role in the network and provide the interface to the Internet, are the most common MP2P destinations. RPL supports MP2P traffic by providing routes to destinations that are reachable through the DODAG root. The upward paths are constructed, starting from the root, when the DODAG is built. The chain of preferred parents forms the default path from each node to the root [84]. The key benefit of MP2P traffic is that it requires only partial routing state: a node only has to store the route toward the destination, which is the DAG root.
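As a minimal sketch of the default upward path described above, the Python snippet below follows preferred-parent pointers from a node to the DODAG root. The node names and the `preferred_parent` mapping are hypothetical values chosen for illustration; they are not taken from any RPL implementation.

```python
def upward_path(node, preferred_parent):
    """Follow preferred-parent pointers from `node` up to the DODAG root.

    `preferred_parent` maps each non-root node to its preferred parent
    (a hypothetical, illustrative data structure).
    """
    path = [node]
    seen = {node}
    while node in preferred_parent:
        node = preferred_parent[node]
        if node in seen:
            # A repeated node means the parent pointers form a loop.
            raise ValueError("routing loop detected")
        seen.add(node)
        path.append(node)
    return path


# Hypothetical four-node DODAG: D -> C -> A -> root, B -> root.
preferred_parent = {"A": "root", "B": "root", "C": "A", "D": "C"}
print(upward_path("D", preferred_parent))
```

Each node stores only one pointer (its preferred parent), which is the partial routing state that the MP2P pattern relies on.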

6.1.2. Point-to-Multipoint (P2MP) Mode of Operation

RPL also defines the P2MP operation, in which traffic is forwarded downward from the root to multiple nodes. To support downward P2MP traffic, which is used by a small number of LLN-based IoT applications such as home and industrial automation, RPL uses a destination advertisement mechanism based on the DAO (destination advertisement object) control message. The DAO mechanism provides routers in the DAG with downward routes for reaching destinations. To install downward routes, routers send DAO messages to their parents or to the DAG root [85]. The DAO messages carry the network prefixes and addresses of each destination. Each router that forwards a DAO message toward the root adds its address to the reverse route recorded in the message. For this reason, a node can route traffic to its descendants in the DODAG structure.
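The effect of DAO propagation can be sketched as follows, assuming that every ancestor stores the advertised destinations (the variant in which routers keep downward state). The function name, the `parent` mapping, and the node names are hypothetical; real DAO messages carry prefixes and addresses rather than node labels.

```python
def install_downward_routes(parent):
    """Simulate DAO propagation from every node toward the root.

    `parent` maps each non-root node to its preferred parent.
    Returns tables[router][destination] = next-hop child, i.e. the
    downward routes each ancestor learns from the DAOs it receives.
    """
    tables = {}
    for dest in parent:
        hop = dest
        # The DAO for `dest` climbs the preferred-parent chain; every
        # ancestor records which child leads back down to `dest`.
        while hop in parent:
            up = parent[hop]
            tables.setdefault(up, {})[dest] = hop
            hop = up
    return tables


parent = {"A": "root", "B": "A", "C": "A"}
tables = install_downward_routes(parent)
print(tables["root"])  # routes known at the root
print(tables["A"])     # routes known at router A
```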

6.1.3. Point-to-Point Mode of Operation (P2P)

The RPL routing protocol also provides routes between any two nodes of a DODAG. To support P2P traffic, the 6BR must be able to route packets from the source to the destination. Two cases exist: (1) if the destination is within the transmission range of the sender, the sender can transmit the message directly to the destination without passing it to its parent. (2) Otherwise, the P2P mechanism depends on whether the network operates in storing or non-storing mode [86]. In non-storing mode, routers do not store information about downward routes (they keep no information about their children and only process the source route). Each packet must first be sent to the root along the upward DODAG route, after which the root sends it to its destination. In storing mode, routers store downward routing information locally. If the destination is a descendant of a router, the router forwards the message downward toward the destination. If the destination is not a descendant, the message is sent to the parent node, which applies the same rules. As a result, the packet travels from child to parent until it reaches the first common ancestor of the source and destination nodes, and from there it travels downward to the destination. The network model of the RPL routing protocol is described below. RPL distinguishes three kinds of nodes [86]:

6.1.4. Low-Power and Lossy Border Router (LBR) or 6BR

The root of a DODAG is a point of aggregation in the network and is the node capable of creating a DAG. The LBR serves as an edge router (or firewall) between the Internet and the LLN.
(i) Router. A device that can both generate and forward traffic. A router cannot create a new DAG; it depends on an existing one
(ii) Host. An end device that can generate data traffic but cannot forward it

The DODAG is the most fundamental topological component of RPL. A DODAG is a destination-oriented DAG rooted at a particular node called the root, as shown in Figure 2.

The properties of the DODAG root are as follows [87]:

(1) It usually serves as a 6BR
(2) It is the sink of the graph, which contains no cycles
(3) It is usually the final destination in the DODAG, acting as the transit point that connects the LLN to IPv6 networks
(4) It can create a new DODAG that extends downward to the leaf nodes

Each node in a DODAG is assigned a rank, which specifies the node's position relative to the DODAG root and to the other nodes. The root has the lowest rank value in the DODAG. The rank increases in the downward direction and decreases in the upward direction. Nodes close to the root therefore have lower ranks than their descendants, as shown in Figure 2. A DODAG's topology is close to a cluster-tree topology, with all traffic collected at the root. The DODAG differs from the cluster tree, however, in that a node relies on both its parents (with a lower rank value) and its sibling nodes (with equal rank) [88]. In a DODAG, rank is used to avoid and detect routing loops and to identify parent and sibling nodes. RPL requires nodes to maintain a list of candidate parents and siblings to be used if the current parent can no longer route. When the network topology is constructed, each router identifies a stable set of parents on a path to the DODAG root and selects a preferred parent according to the objective function. The objective function specifies how RPL nodes translate one or more metrics into ranks and how they select and optimize DODAG paths. It is also responsible for evaluating routing constraints and optimization goals and for computing rank from basic routing metrics (such as latency, link quality, and connectivity). The design of efficient objective functions is still a work in progress. One example uses the expected transmission count (ETX), the expected number of transmissions needed to deliver a packet successfully over a link, to select paths in RPL routing. The path from a node to the DODAG root is then the one that minimizes the total ETX from the node to the root [88].
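The ETX-based path selection described above can be illustrated with a small sketch. It is a centralized simplification: real RPL nodes compute their rank in a distributed way from received DIO messages, and the standardized objective functions scale and quantize rank differently. The function name, the link table, and the ETX values are hypothetical.

```python
import heapq


def build_dodag(links, root):
    """Pick, for every node, the parent that minimizes total ETX to the root.

    `links` maps an undirected link (a, b) to its ETX (hypothetical values).
    Returns (cost, parent): cost[n] is the path ETX used here as a stand-in
    for rank, and parent[n] is the preferred parent of n.
    """
    adj = {}
    for (a, b), etx in links.items():
        adj.setdefault(a, []).append((b, etx))
        adj.setdefault(b, []).append((a, etx))

    cost = {root: 0.0}
    parent = {}
    heap = [(0.0, root)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost[u]:
            continue  # stale heap entry
        for v, etx in adj.get(u, []):
            nc = c + etx
            if nc < cost.get(v, float("inf")):
                cost[v] = nc
                parent[v] = u
                heapq.heappush(heap, (nc, v))
    return cost, parent


links = {("root", "A"): 1.0, ("root", "B"): 2.5, ("A", "B"): 1.0, ("B", "C"): 1.5}
cost, parent = build_dodag(links, "root")
print(cost)    # path ETX per node
print(parent)  # preferred parent per node
```

In this example node B prefers the two-hop path through A (total ETX 2.0) over its direct but lossier link to the root (ETX 2.5), and every node ends up with a larger cost than its preferred parent, which mirrors the rank rule.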

6.2. RPL Control Messages

Figure 3 illustrates the structure of RPL messages, which are a new type of ICMPv6 control message. The RPL control message is made up of the following components:

An ICMPv6 header consists of three fields: type, code, and checksum. The message body consists of a base message and a number of options. The type field, set to 155 for RPL (as assigned by IANA), identifies the kind of ICMPv6 control message. The code field identifies the kind of RPL control message [89].

The code field currently has four values, each of which is explained below [27]:

DODAG information solicitation (DIS): the DIS message is mapped to 0x00 and is used to solicit a DODAG information object from an RPL node. DIS can be used to probe neighboring nodes in adjacent DODAGs. The current DIS message format contains flags and fields reserved for future use.

DODAG information object (DIO): the DIO message is mapped to 0x01. It is issued by the DODAG root to construct a new DAG and is then propagated through the DODAG structure. The DIO message carries network information that allows a node to discover an RPL instance, learn its configuration parameters, select a DODAG parent set, and maintain the DODAG. Figure 3 depicts the format of the DIO base object. The main fields of the DIO base object are as follows [90]:
(i) RPLInstanceID is an 8-bit field, set by the DODAG root, that indicates the ID of the RPL instance of which the DODAG is a part
(ii) Version number is the version of the DODAG, which usually increases with each update of the network information, keeping all nodes up to date
(iii) Rank is a 16-bit field that gives the rank of the node sending the DIO message
(iv) DTSN is an 8-bit field used to maintain downward routes
(v) G is a flag that indicates whether the current DODAG satisfies the application's goal
(vi) The mode of operation (MOP) defines the operating mode of the RPL instance, as determined by the DODAG root

There are four modes of operation, which differ in how they maintain downward routes and in whether they support multicast. Upward routes are supported by default. Any node joining the DODAG must be able to support the MOP in order to act as a router; otherwise, it can only join as a leaf node [90].
(i) Prf is a 3-bit field specifying the preference of this DODAG root over other DODAG roots. Its value ranges from 0x00 (the default) to 0x07 (the highest priority)
(ii) DODAGID is a 128-bit IPv6 address set by the DODAG root that uniquely identifies the DODAG

Finally, the DIO base object may be followed by an Options field.
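To make the field list concrete, the following sketch decodes a DIO base object with Python's standard `struct` module. The byte layout is assumed to follow the DIO base object of RFC 6550 (RPLInstanceID, version, rank, one byte holding G, MOP, and Prf, then DTSN, a flags byte, a reserved byte, and the DODAGID); the flags and reserved bytes are not discussed in the text above. The function name and the sample payload are hypothetical.

```python
import ipaddress
import struct


def parse_dio_base(payload: bytes) -> dict:
    """Decode the 24-byte DIO base object (layout assumed from RFC 6550)."""
    if len(payload) < 24:
        raise ValueError("DIO base object needs at least 24 bytes")
    instance_id, version, rank, g_mop_prf, dtsn, _flags, _reserved = struct.unpack(
        "!BBHBBBB", payload[:8]
    )
    return {
        "rpl_instance_id": instance_id,
        "version": version,
        "rank": rank,
        "grounded": bool(g_mop_prf & 0x80),  # G flag: most significant bit
        "mop": (g_mop_prf >> 3) & 0x07,      # 3-bit mode of operation
        "prf": g_mop_prf & 0x07,             # 3-bit DODAG preference
        "dtsn": dtsn,
        "dodag_id": ipaddress.IPv6Address(payload[8:24]),
    }


# Hypothetical DIO: instance 30, version 1, rank 256, G set, MOP 2, Prf 0.
payload = (
    bytes([0x1E, 0x01])
    + struct.pack("!H", 256)
    + bytes([0x90, 0xF0, 0x00, 0x00])
    + ipaddress.IPv6Address("fd00::1").packed
)
dio = parse_dio_base(payload)
print(dio)
```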

Destination advertisement object (DAO): the DAO message is mapped to 0x02 and is used to propagate reverse route information so that upstream nodes can record it. DAO messages are sent by nodes other than the DODAG root to advertise the addresses and prefixes of their children and to populate routing tables with them. Once a DAO message has traveled along the default DAG path from a node to the DODAG root, a complete path between the DODAG root and that node exists. The DAO base object format is shown in Figure 4. The main fields of the DAO message are as follows [91]:
(i) RPLInstanceID is an 8-bit field giving the RPL instance ID learned from the DIO
(ii) Flag K indicates that the recipient is expected to acknowledge the DAO message
(iii) DAOSequence is a sequence number that is incremented for each DAO message
(iv) DODAGID is a 128-bit field, set by the DODAG root, that identifies the DODAG. This field is present only if flag D is set to one

Destination advertisement object acknowledgment (DAO-ACK): this unicast message is sent by a DAO recipient (a DAO parent or the DODAG root) in response to a DAO message. It carries the RPLInstanceID, the DAOSequence, and status information. A status value of 128 or above indicates rejection, and the node has to select an alternate parent [92].
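The mapping from ICMPv6 header to RPL message kind can be sketched in a few lines. The type value 155 and the codes 0x00 to 0x02 come from the text above; the DAO-ACK code 0x03 is taken from RFC 6550 rather than from this section. The function and constant names are hypothetical.

```python
import struct

RPL_ICMPV6_TYPE = 155
RPL_CODES = {0x00: "DIS", 0x01: "DIO", 0x02: "DAO", 0x03: "DAO-ACK"}


def classify_rpl_message(icmpv6: bytes):
    """Return the RPL message kind, or None if the packet is not RPL."""
    if len(icmpv6) < 4:
        raise ValueError("ICMPv6 header needs at least 4 bytes")
    # Header layout: type (8 bits), code (8 bits), checksum (16 bits).
    msg_type, code, _checksum = struct.unpack("!BBH", icmpv6[:4])
    if msg_type != RPL_ICMPV6_TYPE:
        return None
    return RPL_CODES.get(code, "unknown")


print(classify_rpl_message(bytes([155, 0x01, 0x00, 0x00])))  # a DIO header
print(classify_rpl_message(bytes([128, 0x00, 0x00, 0x00])))  # echo request, not RPL
```

The checksum is unpacked but not verified here; a real parser would validate it against the IPv6 pseudo-header.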

6.3. RPL DODAG Construction Process

The DODAG is built step by step. First, as Figure 4 indicates, the root broadcasts a DIO message. This message contains the information that RPL nodes need to discover an RPL instance, learn its configuration parameters, select a parent set, and construct the DODAG. A node that receives a DIO adds the DIO sender to the parent list in its routing table and calculates its own rank according to the objective function (OF) advertised in the received DIO. The node's rank reflects its position in the graph relative to the root and must always exceed the rank of its parents to keep the graph acyclic. The node then updates the DIO message and sends it to its neighbors. From its parent list, the node selects a preferred parent, which it uses as the default gateway for sending data toward the DODAG root [93]. At the end of this step, all nodes in the DODAG have a default upward path to the root, composed of the chain of preferred parents. DIO messages are transmitted periodically using the trickle algorithm [94], which adapts the sending interval to optimize the transmission frequency of control messages. A node can also broadcast a DIS message to solicit DIO messages from its neighbors. DAO messages are used to build downward routes. Router nodes in the DODAG maintain routing tables according to the mode of operation specified in the DIO control messages. To maintain downward routes in an RPL instance, the RPL routing protocol has two operating modes [93]:
(i) Storing Mode. The child unicasts a DAO message to its preferred parent, which stores the content of the DAO messages received from its children and aggregates the reachability information before sending a new DAO message to its own parent. Multicast may be enabled or disabled in storing mode
(ii) Non-Storing Mode. The DAO message is unicast to the DODAG root. Intermediate parents do not store DAO messages; instead, they add their address to the reverse route stack of the received DAO message and forward it to their preferred parent. Consequently, no parent stores the addresses of its children, and only the root, which receives all DAO messages, stores and manages the downward routes [95]
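The practical difference between the two modes shows up in the path that a point-to-point packet takes. The sketch below assumes both endpoints belong to the same DODAG and that `parent` holds each node's preferred parent; the function names and the topology are hypothetical. With stored downward routes, the packet turns around at the first common ancestor; without them, it must climb to the root first.

```python
def ancestors(node, parent):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def p2p_route(src, dst, parent, storing):
    """Hop sequence from src to dst inside one DODAG.

    storing=True:  routers hold downward routes, so the packet turns
                   around at the first common ancestor.
    storing=False: only the root knows downward routes, so the packet
                   always travels up to the root before going down.
    """
    up = ancestors(src, parent)
    down = ancestors(dst, parent)
    if storing:
        turn = next(n for n in up if n in down)  # first common ancestor
    else:
        turn = up[-1]  # the DODAG root
    up_part = up[: up.index(turn) + 1]
    down_part = down[: down.index(turn)][::-1]
    return up_part + down_part


parent = {"A": "root", "B": "root", "C": "A", "D": "A", "E": "B"}
print(p2p_route("C", "D", parent, storing=True))
print(p2p_route("C", "D", parent, storing=False))
```

In the example, C and D share the parent A, so the packet takes two hops when routers store downward routes and four hops otherwise, which illustrates the memory-versus-path-length trade-off between the modes.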

6.4. RPL Repair Mechanisms

Inconsistencies and loop repair: the RPL routing protocol integrates loop avoidance, inconsistency detection, and DODAG repair. The count-to-infinity problem arises when a node whose path to the root is broken, and which cannot attach to another parent, increases its rank value and selects one of its former children as its preferred parent. The rank values of both parent and child then keep increasing [96]. RPL prevents such loops in the DODAG structure by limiting how much a node may increase its rank. Loops could occur if nodes did not respect the rank property; as long as they do, the DODAG remains a graph without cycles. To prevent loops, a node that detaches must advertise an infinite rank to its sub-DODAG. Alternatively, the detaching node can create an intermediate DODAG and later rejoin the original DODAG. The data path validation function of the routing protocol can also detect inconsistencies [96]. The routing information is carried in an RPL option in the IPv6 hop-by-hop options header of data packets. The relevant flags are defined as follows:
(i) Flag O (down) indicates the expected direction of the packet, upward or downward. When this flag is set, the router forwards the packet to a child node using downward routes; otherwise, the packet is forwarded upward toward the root, to a parent with a lower rank value
(ii) Flag R (rank-error) signals that a rank error has been detected. A rank error occurs when the rank relationship between the sender and the receiver does not match the direction indicated by flag O
(iii) Flag F (forwarding-error) indicates that a node cannot forward a downward packet toward its destination

RPL nodes can trigger repair procedures when inconsistencies are detected. These mechanisms also maintain the network topology when links or nodes fail [97]. When a preferred parent is no longer usable, local repair selects another parent from the node's parent set to route packets. The node can also route data packets through a sibling node, that is, a neighbor of the same rank, although this replacement path may not be optimal. This efficient local repair helps the network to converge within an appropriate time frame. When local repair cannot resolve the inconsistencies, the DODAG root may initiate a global repair by incrementing the DODAG version number, after which the RPL network is completely rebuilt [97].
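The rank check behind data path validation can be sketched as a single function. It follows the rule in RFC 6550 that a first inconsistency along a path is tolerated and marked with the rank-error flag, while a second one causes the packet to be dropped. The function name, argument names, and return convention are hypothetical, and the comparison of raw rank values is a simplification.

```python
def check_rank_consistency(down, sender_rank, my_rank, rank_error):
    """Validate one hop of a data packet against the rank rule.

    down:        value of the O flag (True = packet is travelling downward)
    sender_rank: rank of the previous hop, carried in the packet
    my_rank:     rank of the node processing the packet
    rank_error:  value of the R flag on arrival
    Returns (action, new_rank_error) with action "forward" or "drop".
    """
    if down:
        consistent = sender_rank < my_rank  # moving away from the root
    else:
        consistent = sender_rank > my_rank  # moving toward the root
    if consistent:
        return "forward", rank_error
    if rank_error:
        # Second inconsistency on the same path: likely a loop.
        return "drop", True
    # First inconsistency: tolerate it, but mark the packet.
    return "forward", True


print(check_rank_consistency(down=True, sender_rank=256, my_rank=512, rank_error=False))
print(check_rank_consistency(down=True, sender_rank=768, my_rank=512, rank_error=False))
print(check_rank_consistency(down=True, sender_rank=768, my_rank=512, rank_error=True))
```

A node that drops a packet this way would typically also trigger a repair, for example by resetting its trickle timer.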

6.5. Trickle Algorithm

DIO messages are propagated quasi-periodically using an algorithm called trickle. Trickle is a transmission scheduling algorithm for local communication between nodes, designed around a steady-state model. When the network is stable, nodes exponentially reduce their communication rate, down to a few trickle messages per hour. In contrast, when a node detects an inconsistency, it sends trickle messages quickly to resolve it [98]. The trickle algorithm was initially proposed for code propagation and maintenance in WSN. It has since been shown to be useful for other purposes, such as control traffic timing, multicast propagation, and path discovery [99].

The Internet Engineering Task Force (IETF) standardized the trickle algorithm to regulate the transmission of the DIO messages that build the network graph in RPL. Trickle divides time into intervals of varying length, so that the smallest interval is Imin and the largest interval is Imax. In each interval, each node decides whether to send its trickle message according to the trickle rules. The algorithm relies on a set of parameters, variables, and rules. Three parameters configure the trickle algorithm: the minimum interval size Imin, the maximum interval size Imax, and the redundancy constant k [5, 99].
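The behavior described above can be sketched as a small timer class. This is a simplified illustration, not a reference implementation: the class and method names are hypothetical, Imax is assumed to be expressed as a number of doublings of Imin, and the timer is assumed to start at Imin.

```python
import random


class TrickleTimer:
    """Minimal trickle timer driven by Imin, Imax (in doublings), and k."""

    def __init__(self, imin: float, imax_doublings: int, k: int, rng=None):
        self.imin = imin
        self.imax = imin * (2 ** imax_doublings)  # largest interval size
        self.k = k                                # redundancy constant
        self.rng = rng or random.Random()
        self.interval = imin
        self._start_interval()

    def _start_interval(self) -> None:
        # Reset the consistency counter and pick a transmission time
        # in the second half of the current interval.
        self.counter = 0
        self.t = self.rng.uniform(self.interval / 2, self.interval)

    def hear_consistent(self) -> None:
        """A neighbor sent a message that agrees with our state."""
        self.counter += 1

    def hear_inconsistent(self) -> None:
        """An inconsistency was detected: fall back to the smallest interval."""
        if self.interval > self.imin:
            self.interval = self.imin
            self._start_interval()

    def should_transmit(self) -> bool:
        """At time t, transmit only if fewer than k consistent messages were heard."""
        return self.counter < self.k

    def interval_expired(self) -> None:
        """Double the interval, capped at the largest interval size."""
        self.interval = min(self.interval * 2, self.imax)
        self._start_interval()
```

In a stable network the interval doubles until it reaches the cap, which is how the message rate drops; a single inconsistency brings the node back to frequent transmissions.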

The constrained nature of LLN nodes makes protection against attacks difficult. The RPL protocol specifies a number of security mechanisms. It combines local and global repair processes with loop avoidance and detection techniques. As discussed earlier, it also defines two safeguards for encrypting data packets. However, protection at the link layer and at the transport or application layer relies on the standard mechanisms of these networks. The attacks against the RPL protocol are examined thoroughly in the following. They are listed in detail in two parts: RPL-specific attacks and RPL-related routing attacks [5].

7. RPL Protocol-Specific Attacks

7.1. Internal RPL Attacks

This section describes in detail the attacks specific to the RPL protocol, which are considered part of internal attacks:

Storage routing table overhead attacks: when storing mode is enabled, RPL routers build and hold routing tables. An attacker can advertise fake routes through DAO messages, inflating the routing tables exchanged in the network topology and causing significant overhead. This saturation prevents new legitimate routes from being installed, reduces network capacity, and may cause memory overflow [100].

Rank increase attacks: these attacks consist of purposefully raising an RPL node’s rank value to create loops in the network. Each node in an RPL network determines its own rank, which corresponds to its position relative to the root in the graph structure [101, 102]. Ranks increase continuously along downward paths to keep the DODAG structure stable, so each node’s calculated rank must be higher than its parents’ ranks. To switch parent and change its rank, a node must first delete from its parent list every parent whose rank is higher than the newly calculated rank [103]. In the DODAG structure, each child node selects a preferred parent from its parent list to minimize the cost of sending data towards the destination. An attacker can advertise a fake rank that is higher than expected. If the attacker’s new preferred parent then lies in its former sub-DODAG and no loop prevention mechanism is active in RPL, multiple loops are created in the network. In this situation, the loop repair function generates many DIO messages (trickle timer resets) and needs a long convergence period. Because node batteries are depleted and the RPL network becomes congested, the attack is classified as a resource exhaustion attack. If the number of rank increases made by each node in the DAG structure is recorded, inconsistencies in the graph configuration can be detected and the impact of this type of attack can be reduced. It is worth mentioning that a node can legitimately raise its rank if it no longer has a parent that satisfies the objective function or cannot handle the traffic it receives. Any new OF must therefore include mechanisms to detect loop-based intrusions or to update the graph structure when a loop occurs. RPL has inherent capabilities for loop detection and prevention by validating data transmission paths [103].
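Two of the rules mentioned above, pruning parents that no longer have a lower rank and recording rank increases per node, can be sketched as follows. The function names and the `max_increase` bound are hypothetical; the sketch assumes integer ranks and a monitor that sees the sequence of ranks a node has advertised.

```python
def prune_parent_set(parent_ranks: dict[str, int], new_rank: int) -> dict[str, int]:
    """Keep only the parents whose rank is strictly lower than the node's new rank."""
    return {parent: rank for parent, rank in parent_ranks.items() if rank < new_rank}


def suspicious_rank_changes(history: list[int], max_increase: int) -> list[int]:
    """Return the positions in `history` where the advertised rank exceeds
    the lowest rank seen so far by more than `max_increase`."""
    flagged = []
    lowest = None
    for i, rank in enumerate(history):
        if lowest is None or rank < lowest:
            lowest = rank
        if rank > lowest + max_increase:
            flagged.append(i)
    return flagged
```

A flagged position is only a hint for further inspection, since a node may also raise its rank legitimately.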

DAG inconsistencies attack: a node detects an inconsistency in the DODAG structure when it receives a packet with the “O” flag set from a node with a higher rank or, more generally, when the direction of a packet does not match the rank relation between sender and receiver. Such a mismatch may indicate a loop in the graph. The problem is handled with the “R” (rank-error) flag. When a node detects an inconsistency, there are two possibilities [104, 105]: (1) If the rank-error flag is not set, the node sets it and then forwards the data packet. A single inconsistency along the route is not treated as a severe condition for the RPL network. (2) If the “R” bit is already set in the received packet, a further rank error has occurred. The receiving node discards the packet and resets its trickle timer, so control packets are sent more frequently. To mount the attack, a malicious node only has to modify these flags or add new flags to the header. The immediate result is that the target node resets its DIO trickle timer. The node then transmits DIO messages more frequently, causing local instability in the RPL network, draining the nodes’ batteries, and affecting link availability. All neighbors of the attacker are affected and process unnecessary traffic. Furthermore, by altering legitimate traffic, the attacker forces the target node to discard all such packets. This creates a BH that isolates parts of the network. To reduce the flooding caused by this attack, the number of trickle timer resets triggered by an RPL option has been limited to no more than 20 per hour. Instead of a fixed threshold, two solutions based on network characteristics have also been proposed. The first solution is an adaptive threshold with fixed parameters. Another form of this attack is a rank attack, in which the malicious node does not check the rank relation and does not set the “R” flag when anomalies are identified. The distinction between this rank attack and the DAG inconsistencies attack is that the intruder does not use the flags to fabricate loops. It does not, however, offer any solution for real loops, so if they occur, the consequences are identical [105].
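The fixed threshold of 20 trickle timer resets per hour can be sketched as a simple rate limiter. The class `ResetLimiter` is hypothetical, and the sliding one-hour window is an assumption made for illustration; the text above only states the hourly limit.

```python
from collections import deque


class ResetLimiter:
    """Allow at most `max_resets` trickle timer resets per time window."""

    def __init__(self, max_resets: int = 20, window_s: float = 3600.0):
        self.max_resets = max_resets
        self.window_s = window_s
        self._times = deque()  # timestamps of the resets that were allowed

    def allow_reset(self, now: float) -> bool:
        """Return True if a reset triggered at time `now` (seconds) is allowed."""
        # Forget resets that fell out of the window.
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()
        if len(self._times) < self.max_resets:
            self._times.append(now)
            return True
        # Limit reached: ignore the rank error instead of resetting the timer.
        return False
```

An adaptive variant would adjust `max_resets` from observed network conditions rather than keeping it fixed.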

Version number attacks: the version number field is an essential part of a DIO control packet and is propagated unchanged through the DAG structure. Only the 6BR increments its value, when a global repair of the graph structure is needed. If a DIO received from a node carries an older version number, the sender has not yet joined the new DODAG version and cannot be used as a parent. An attacker may destabilize the graph structure by illegitimately increasing the value of the version number field and forwarding the DIO to its neighbors. As a result, the entire DODAG graph is rebuilt unnecessarily. This attack causes many loops and, consequently, data packet loss. Repeated unnecessary graph reconstruction also significantly increases control message overhead, node resource consumption, and network congestion. The VeRA security mechanism was proposed to prevent compromised nodes from impersonating the root and sending an illegitimately incremented version number. This approach relies on a hash-based authentication method. With it, a node can quickly determine whether the root or a malicious node has changed the version number, and the attacker is unable to usurp the identity of the DODAG root [106].
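The idea of hash-based authentication of version numbers can be illustrated with a one-way hash chain. This is a simplified sketch and not the VeRA scheme itself: the function names are hypothetical, SHA-256 is an assumed hash function, and the additional protections that a complete scheme needs (for example, authenticating the anchor) are omitted.

```python
import hashlib


def build_hash_chain(seed: bytes, length: int) -> list[bytes]:
    """Root side: chain[0] is the secret seed and chain[i] = H(chain[i-1]).
    The last element is published as the anchor for the initial version."""
    chain = [seed]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def verify_version(anchor: bytes, anchor_version: int,
                   claimed_version: int, proof: bytes) -> bool:
    """Node side: accept a higher version only if hashing the proof once per
    version step reproduces the known anchor."""
    steps = claimed_version - anchor_version
    if steps <= 0:
        return False
    value = proof
    for _ in range(steps):
        value = hashlib.sha256(value).digest()
    return value == anchor
```

Because the hash function is one-way, a node that only knows the anchor cannot produce a valid proof for a higher version, whereas the root can reveal the preceding chain element each time it increments the version.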

Routing table falsification: in a routing protocol, routing information may be forged or modified to advertise falsified paths to other nodes. In an RPL network, this attack can be carried out by manipulating or forging DAO control messages to construct false downward routes, which is possible when storing mode is enabled. A malicious node may, for example, advertise routes to nodes that are not within the DODAG. The target nodes then hold incorrect paths in their routing tables, and the resulting network is misconfigured. As a result, paths may become longer, packets may be discarded, and the network may become congested. This attack has not yet been investigated for the RPL protocol [107].

Routing information broadcast attacks: in this type of attack, a malicious node in the DAG structure records control packets received from valid nodes and later rebroadcasts them on the network. Because the topology and routing paths of dynamic networks change often, this attack is very disruptive: it corrupts the network topology and persuades nodes to update their routing tables with incorrect and outdated information. The RPL protocol uses sequence counters to guarantee that routing information is fresh. In DIO packets, this counter is the version number. In DAO packets, the path sequence plays the same role for the advertised routes [108].

LR attack: in an LR attack, the attacker regularly sends LR messages even though there is no link quality problem. This triggers LR around the node. This attack affects the packet delivery ratio more than other attacks. It also increases the number of control packets exchanged, the end-to-end delay (E2ED), and the EC of the nodes [109].

Neighbor attack: in this attack, the attacking node rebroadcasts, without any change, the DIO packet received from its neighbor. The node receiving this packet may conclude that a new neighbor has sent it. It may then add this node to its list of potential parents, or even choose it as the preferred parent, although that node may not be within range of the victim node [110].

DIS attack: a node uses DIS packets to request network topology information before joining the network. If the attacker broadcasts DIS messages, each receiving node resets its DIO timer. If the attacker unicasts DIS messages, the recipient replies with a DIO packet [60, 111].

Worst parent attack: this attack, also called a rank attack, consists of systematically choosing the worst candidate as preferred parent according to the objective function. As a consequence, the resulting route is not optimal and performance suffers. This is one of the most damaging attacks on RPL because child nodes depend on their parent to route and forward their packets. The neighboring nodes cannot track or detect this type of attack [53, 112].

Storage DAO inconsistencies attack: a DAO inconsistency occurs when a node holds a downward route, learned earlier from a DAO control packet, that is no longer valid in the child node’s routing table [113]. RPL provides DAO inconsistency loop recovery to fix such inconsistencies. By setting the “F” (forwarding-error) flag in data packets, RPL routers can deprecate downward routes by signaling that a child node could not deliver a packet. The packet is returned to the parent with the “F” flag set, forcing the parent to try another neighbor. A packet sent along a downward path should normally never travel back up. It does so only in this case, when the router returns the packet to the parent with the “F” flag bit set and the “O” flag bit left unchanged. When the parent node receives a packet with the “F” flag set, it clears the flag and tries to send the packet to another neighbor according to its routing table. The procedure is repeated if the alternative neighbor also has an inconsistent state. The purpose of this attack is to make nodes discard downward paths that are in fact available. The attack also leads to isolation and instability, and to additional congestion if packets must be sent along suboptimal routes. Eventually, the child nodes suffer from delay and starvation. To reduce the impact of this attack on the network, RFC 6553 suggests limiting the rate at which downward routing entries are discarded to 20 per hour [113].

Decreased rank attacks: in the DODAG structure, the closer a node is to the root, the lower its rank. Nodes close to the root need more protection because they attract more traffic and can become network hotspots. Nodes in RPL tend to seek a lower rank and a position closer to the root. A malicious node that advertises a lower rank can attract many nodes to itself or to its DODAG instance and cause an imbalance in the graph structure. An intruder node can change its rank value by forging DIO packets in RPL [114, 115].

7.2. RPL Protocol-Related Routing Attacks

This section describes in detail the RPL protocol-related attacks, which are another part of internal attacks:

HF attack: hello packets are the packets that a node sends to other nodes to join the network. By broadcasting such packets with high signal strength and good routing metrics, an attacker can present itself as a neighbor of many nodes (or even of the entire network). However, if the nodes are far from the attacking node, the messages they send in order to connect to it never arrive, because the attacker is not within their range. In RPL, this attack occurs when an attacker uses DIO packets to advertise a DODAG. If encryption is used on the network, the attacker must first capture a network node to obtain the keys. If the nodes’ topology information is known, filtering incoming packets from remote nodes reduces the impact of this attack on RPL. RPL itself can significantly reduce the impact of this attack within 10 minutes, but some anomalies remain in the network [116, 117].

Sinkhole attacks: this attack has two phases. First, the malicious node attracts a large volume of traffic by advertising forged information (for example, better upward and downward link quality). Then, after illegitimately receiving the traffic, it modifies or discards it. In RPL networks, the attack can be carried out easily by manipulating the rank value. Because of the false advertisements, other nodes often choose the malicious node as their preferred parent, so the routes are not optimized for the network. This attack alters the network topology and decreases its efficiency. It is related to the BH attack, in which an attacker also tries to divert all network traffic [118, 119].

WH attacks: this attack relies on an out-of-band link between two nodes, using a wired or wireless connection. The WH can deliver packets faster than the regular routes. An intruder intercepts packets transmitted by nodes on one side of the network and replays them to nodes on the other side. The attack is simple to carry out in wireless networks since the intruder can tunnel the captured traffic to itself through the WH and eavesdrop on all wireless transmissions. The intruder uses this tunnel to send routing information from one part of the network topology to another, thus falsifying the data transmission path during the routing process. Nodes at the two ends of the tunnel believe that they are in the same neighborhood even if they are far apart. As a result, they create nonoptimal paths with respect to the objective function [120, 121].

BH attacks: in this type of attack, a malicious intruder drops all the packets that it is supposed to forward. When combined with SH attacks, this attack can be highly damaging, causing a massive loss of traffic. It is classified as a DoS attack. If the attacker is in a strategic location in the graph, it can disconnect many nodes from the network. Gray hole, or limited (selective) forwarding, attacks are another form in which the attacker drops only part of the traffic [122, 123].

Sniffing attacks: in a sniffing attack, the attacker listens to the traffic exchanged over wired or wireless networks and captures its content without the knowledge of the legitimate sender and receiver. To carry out the attack, the attacker may use a compromised device or directly capture packets from the shared medium of a wireless network. Partial topology information, routing information, and data content can be derived from the intercepted packets. If an attacker eavesdrops on control messages in RPL networks, it can learn the configuration of neighboring nodes, such as rank, DODAG ID, RPL instance, and version number. By sniffing the network, intruders can obtain a local view of the network topology, the addresses, and the content of the packets exchanged between source and destination. This attack is hard to detect, owing to its passive nature. When an unknown intruder may be listening, the best way to prevent interception is to encrypt communications [124, 125].

Traffic analysis attacks: these attacks use the characteristics and patterns of the traffic on a link to collect routing information. This attack can be carried out even if the packets are encrypted. Like sniffing, it gathers information about the RPL network, identifies parent-child relationships, and reconstructs a partial view of the topology. The impact of the attack depends on the attacker’s rank. An attacker close to the root node processes much traffic and therefore gathers more data than a node on the edge of the DODAG [126].

Identity attacks: these attacks, also known as CloneID, occur when a malicious node impersonates the identity of a legitimate node. The root node is the critical point for creating and maintaining the DODAG topology and for managing route information and data exchanges. By gaining access to the root or forging its identity, the attacker can eavesdrop on traffic and launch various attacks, such as Sybil, on the network. In Sybil attacks, a malicious entity uses several logical identities on the same physical node, forging the identities of other nodes to disrupt network performance and services [127, 128].

8. Data Mining (DM)

Data mining is the process of discovering interesting patterns in large volumes of data. A pattern is interesting if it is valid on test data, novel, potentially useful, and easy for humans to understand. Such interesting patterns represent knowledge. Many people equate DM with extracting knowledge from data. Others consider DM to be only one stage in the process of knowledge discovery. The knowledge discovery process consists of the following steps [127]: (1) Data cleaning: removes noise and inconsistent data. (2) Data integration: combines multiple data sources. (3) Data selection: retrieves the data relevant to the analysis from the database. (4) Data transformation: converts the data into a form suitable for DM, for example by summarization or aggregation. (5) DM: applies methods to extract data patterns. (6) Pattern evaluation: identifies the patterns that truly represent knowledge according to interestingness measures.

Steps 1 to 4 are different forms of data preprocessing, in which the data is prepared for mining. Visualization and knowledge presentation techniques are then used to show the discovered knowledge to users. The user may interact with a knowledge base during the DM process. The discovered patterns are presented to the user and can be stored in the knowledge base as new knowledge. This view presents DM as one step in the knowledge discovery process, albeit a crucial one, since it uncovers hidden patterns for evaluation. In business, media, and science, however, the term DM often refers to the whole knowledge discovery process (perhaps because it is shorter than the term knowledge discovery from data).

Consequently, a broad view of DM is adopted: DM is the process of extracting useful information and patterns from massive volumes of data. Data sources include databases, data warehouses, the Web, other data repositories, and data streamed dynamically into the system [129]. As a highly application-driven field, data mining draws on techniques from many other areas. Statistics, ML, pattern recognition, data warehousing and database systems, information retrieval, visualization, and high-performance computing algorithms are only a few examples. The technologies used in DM are depicted in detail in Figure 5.

8.1. Machine Learning (ML)

Machine learning is a growing field that analyzes how computers learn or improve their performance based on input data. This science seeks to automate the recognition of complex patterns and intelligent decision-making [130]. For example, in a post office, ML can identify handwritten postcodes on envelopes after receiving several samples of different codes.

Supervised learning: in this ML setting, the training dataset carries labels that identify the class of each instance. In the postcode example, a set of handwritten postcode images together with their machine-readable digits serves as the training data for classification-based supervised learning [185].

Unsupervised learning: this ML setting is generally equivalent to clustering, because the input instances carry no class labels. Clustering is used to discover classes within the data [131, 132]. For example, an unsupervised method can take a collection of images of handwritten postal code digits as input and group them into clusters that correspond to different digits. Because the training data are unlabeled, however, the learned model cannot assign semantic meaning to the clusters it finds.

Semisupervised learning: this ML technique uses both labeled and unlabeled instances while learning the model. In one approach, labeled instances are used to learn the class models, and unlabeled instances are used to refine the boundaries between classes [133, 134]. In a two-class problem, one set of instances can be regarded as belonging to the positive class and the rest as belonging to the negative class. The decision boundary can be defined more accurately using the unlabeled instances. Labeled instances that fall far inside the region of the other class can also be identified as noise or outliers [133].

Active learning: active learning is an ML approach that lets users play an active role in the learning process. The learning model asks a user (e.g., a domain expert) to label an instance, which may be drawn from a collection of unlabeled instances or synthesized by the learning program. The goal is to improve the model’s accuracy by acquiring knowledge from human users, given a limit on the number of instances that they can be asked to label. DM and ML have many similarities. For classification and clustering, ML research focuses on model accuracy. DM research additionally emphasizes the reliability and scalability of mining methods on massive data collections, methods for structured data, and the development of new and alternative methods [135, 136].

8.2. Classification

Classification is the task of finding a model (or function) that describes and distinguishes data classes or concepts. The model is derived from the analysis of a set of training data (data objects whose class labels are known). The model is then used to predict the class label of objects whose class is unknown. Data classification is therefore a two-stage process, with a learning stage (where the classification model is built) and a classification stage (where the model is used to predict the class labels of given data) [137]. In the first stage, a classifier is built to describe a predefined set of data classes or concepts. This is the learning (or training) step, in which a classification algorithm builds the classifier by analyzing and learning from training instances and their class labels. In the second stage, the model is used for classification. Before the model is applied to new data, the accuracy of its predictions must be estimated, which is done with a set of test instances. Models can be represented in various forms, including classification rules, decision trees, mathematical formulas, and neural networks [137, 138].
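The two stages can be sketched with a deliberately simple classifier. The nearest-centroid model below is chosen only for brevity and is not one of the techniques surveyed here; the function names, the two numeric features, and the sample values are hypothetical.

```python
def train_centroids(samples, labels):
    """Learning stage: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Classification stage: assign the class whose centroid is closest to x."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: squared_distance(centroids[y]))


def accuracy(centroids, samples, labels):
    """Estimate prediction accuracy on held-out test instances."""
    hits = sum(predict(centroids, x) == y for x, y in zip(samples, labels))
    return hits / len(samples)


# Hypothetical features per node: (DIO messages per minute, rank changes per hour).
train_x = [(1, 0), (2, 0), (1, 1), (9, 5), (8, 6), (10, 4)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
test_x = [(2, 1), (9, 6)]
test_y = ["normal", "attack"]

model = train_centroids(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
```

Keeping the test instances separate from the training instances is what makes the accuracy estimate meaningful for unseen data.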

8.3. Data Mining in the Intrusion Detection System

Data mining techniques can be categorized by the type of function performed, model representation, preference criteria, and algorithms. In the field of IDSs, one of the main functions of the models is classification. Classification is used to label data as normal, malicious, or attack. Three classification techniques are considered here, namely decision trees, support vector machines, and the Bayesian method, and each is explained separately [139, 140].

8.4. Decision Tree

A decision tree is a flowchart-like tree structure. Each internal (nonleaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. A decision tree classifies as follows: given an instance X whose class label is unknown, the attribute values of X are tested against the decision tree [141]. A path is traced from the root to a leaf node, which holds the class prediction for X. Decision trees can easily be converted into classification rules. Decision tree classifiers are popular because building them requires no domain knowledge or parameter setting, which makes them suitable for exploratory knowledge discovery. Decision trees can also handle multidimensional data. The knowledge acquired is represented as a tree, which is intuitive and generally easy for humans to understand. The learning and classification steps of decision tree induction are straightforward. Decision tree classifiers are generally accurate; however, successful use depends on the data at hand. During tree construction, attribute selection measures are used to choose the attribute that best partitions the instances into distinct classes. Many branches of a decision tree may reflect noise or outliers in the training data. Tree pruning identifies and removes such branches to improve classification accuracy on unseen data [141, 142].
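Attribute selection during tree construction can be illustrated with information gain, one commonly used selection measure, taken here as an example. The attribute names and the four sample rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting the rows on attribute `attr`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Choose the attribute that best separates the instances into classes."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


rows = [
    {"rank_changed": "yes", "dio_rate": "high"},
    {"rank_changed": "yes", "dio_rate": "low"},
    {"rank_changed": "no", "dio_rate": "high"},
    {"rank_changed": "no", "dio_rate": "low"},
]
labels = ["attack", "attack", "normal", "normal"]
root_attribute = best_attribute(rows, labels, ["rank_changed", "dio_rate"])
```

In this toy dataset, splitting on `rank_changed` yields pure partitions, so it is selected for the root test, while `dio_rate` carries no information about the class.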

8.5. Support Vector Machine (SVM)

SVMs are a classification method for both linear and nonlinear data. In brief, an SVM works as follows. It uses a nonlinear mapping to transform the original training data into a higher dimension [143]. Within this new dimension, it searches for the optimal linear separating hyperplane (i.e., the “decision boundary” separating the instances of one class from another). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can be separated by a hyperplane [144]. The SVM finds this hyperplane using support vectors (the essential training instances) and margins (defined by the support vectors). Although the training time of SVMs can be slow, they are highly accurate, owing to their ability to model complex nonlinear decision boundaries, and they are less prone to overfitting than other methods. The support vectors found also provide a compact description of the learned model. SVMs can be used for numeric prediction as well as classification [144].
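The effect of the nonlinear mapping can be shown on a toy example. The sketch below does not train an SVM: the mapping `phi`, the data, and the hyperplane parameters `w` and `b` are hand-picked assumptions, used only to show that data which no single threshold can separate in one dimension becomes linearly separable after the mapping, and which instances then lie closest to the boundary.

```python
def phi(x: float) -> tuple[float, float]:
    """Nonlinear mapping from the 1-D input space to a 2-D feature space."""
    return (x, x * x)


def decision(w, b, z) -> float:
    """Signed score of a mapped point z with respect to the hyperplane w.z + b = 0."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def separable_by_threshold(xs, ys) -> bool:
    """True if some threshold and orientation classify all 1-D points correctly."""
    pts = sorted(xs)
    cuts = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1]
    for t in cuts:
        for s in (1, -1):
            if all((s if x > t else -s) == y for x, y in zip(xs, ys)):
                return True
    return False


xs = [-3.0, -2.5, -1.0, 0.0, 1.0, 2.5, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]            # class +1 lies on both sides of class -1

w, b = (0.0, 1.0), -3.625                # hand-picked hyperplane x^2 = 3.625
margins = [y * decision(w, b, phi(x)) for x, y in zip(xs, ys)]
support = [x for x, m in zip(xs, margins) if m == min(margins)]
```

The instances in `support` are the ones closest to the boundary in the mapped space; in a trained SVM, such instances are the support vectors that define the margin.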

8.6. Bayesian Method

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given instance belongs to a particular class. Bayesian classification is based on Bayes’ theorem [145]. Studies comparing classification algorithms have found that a simple Bayesian classifier, known as the naïve Bayesian classifier, is comparable in performance to decision tree and selected neural network classifiers [144]. Bayesian classifiers have also shown high accuracy and speed when applied to large datasets. Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called “class-conditional independence.” It simplifies the computations involved, and in this sense the classifier is considered “naïve” [144].
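Under class-conditional independence, the score of a class is the prior probability of the class multiplied by the per-attribute conditional probabilities. A minimal sketch for categorical attributes follows; the class name `NaiveBayes`, the Laplace smoothing, and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayesian classifier for categorical attributes."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
        self.values = defaultdict(set)            # attribute index -> values seen
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[(y, i)][v] += 1
                self.values[i].add(v)
        return self

    def log_posterior(self, row, y):
        """log P(y) + sum_i log P(row[i] | y), up to a constant shared by all classes."""
        score = math.log(self.class_counts[y] / self.total)
        for i, v in enumerate(row):
            # Laplace smoothing avoids zero probabilities for unseen values.
            num = self.value_counts[(y, i)][v] + 1
            den = self.class_counts[y] + len(self.values[i])
            score += math.log(num / den)
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda y: self.log_posterior(row, y))


# Hypothetical attributes: (DIO rate, rank changed recently).
rows = [("high", "yes"), ("high", "yes"), ("high", "no"),
        ("low", "no"), ("low", "no"), ("low", "yes")]
labels = ["attack", "attack", "attack", "normal", "normal", "normal"]
nb = NaiveBayes().fit(rows, labels)
```

Summing logarithms instead of multiplying probabilities is a common choice that avoids numerical underflow when many attributes are involved.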

9. ML Methods in RPL Protocol

9.1. Feature Selection Methods for Building an Intrusion Detection System

In the ML process, feature extraction and selection are two crucial stages. ML models are trained using features. FS methods help determine a subset of the features in a dataset that decreases processing time and increases classification accuracy. There are three families of selection methods: (1) filter methods, (2) wrapper methods, and (3) embedded methods [124, 146]. Filter methods work as a preprocessing step and score features by their relationship to the target feature and to each other. CFS (correlation-based feature selection) is a heuristic search technique that pairs a feature-subset evaluation formula with a measure of the correlation between the features and the class label. The key goal is to identify a subset of features that are strongly correlated with the class label but not with each other. Feature reduction methods for IDS have received much attention, mainly using the KDD dataset [147–149]. Shubhangi and Meenu created an IDS that filters the attributes and uses a heuristic technique to detect denial of service attacks. Using the KDD dataset [150], they applied information gain, gain ratio, and correlation. Swapnil and Sanyam [151] take a similar approach. Sun and Kasongo developed their IDS using filter-based methods [152]. This paper goes into greater depth on how FS can enhance classification accuracy and overall IDS efficiency.
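The goal stated above for CFS, features that correlate with the class but not with each other, is usually expressed through a merit score for a feature subset. The sketch below uses the commonly cited form of this heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The function name and the correlation values are hypothetical, and the studies cited here may use a different variant.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset.

    feature_class_corr: correlation of each feature in the subset with the class.
    feature_feature_corr: pairwise correlations between features in the subset
                          (empty for a single-feature subset).
    """
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k
    if feature_feature_corr:
        r_ff = sum(feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Two subsets whose features are equally predictive of the class.
independent = cfs_merit([0.8, 0.8], [0.0])   # features unrelated to each other
redundant = cfs_merit([0.8, 0.8], [1.0])     # features duplicate each other
```

The redundant subset scores no better than a single one of its features, which is the behavior that lets a filter method discard duplicated attributes.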

CHA-IDS is a compression header analysis-based intrusion detection framework built for RPL [64]. In their research, Stephen and Arockiam [96] discussed the mitigation of the rank inconsistency attack (RInA). In RInA, the rank value is tampered with to make the network vulnerable. E2V, which consists of three stages, has been proposed to mitigate this rank attack. The primary goal of this scheme is to track and mitigate RInA-based attacks such as BH, SH, and limited forwarding. In the first stage, the rank is estimated. Next, malicious nodes are identified and removed. The energy level is used to distinguish genuine nodes from malicious ones and to detect rank inconsistencies. Nevertheless, this detection model appears to target only various types of rank attacks.

Neerugatti and Reddy [153] provide a similar algorithmic mechanism that RPL uses to discover rank attacks. The k-nearest neighbor algorithm represents the MLTKNN solution for ranking attacks. “The rank attack in the RPL protocol is the physical location of the node about the boundary router (root node) on neighbor nodes,” according to [153]. When constructing a DODAG graph in RPL, the attacking node, by tricking the 6BR, will be able to create a path through advertising a fake rank. MLTKNN has been suggested to detect this malicious or intruder node. The proposed methodology is tested with 30 motes in the Cooja simulation. However, it is worth noting that only the rank attack is discussed in this job. Shin et al. proposed a new IDS system for anomalous intrusion discovery devices in RPL [154]. This new solution can detect packet drop attacks in RPL and detect the fake packets falling by the network’s data packets losses. Following [154], “nodes in RPL retain a packet distribution ratio of their forwarding links to compute a routing metric ETX.” The suggested approach uses this value to obtain nodes’ usual behavior since this phenomenon indicates the network’s data packets loss in those communication paths [154]. For their intrusion detection system, the authors used the Cooja emulator, which is used to evaluate Contiki-based systems’ performance. “The findings of the evaluation suggest that the approach is effective at identifying malicious packet dropping attacks.” This scheme is specifically designed to detect legitimate packet-dropping attacks. It does not protect against RPL or WSN other attacks, as previously stated. Bhandari et al. [155] proposed a “congestion-aware routing protocol (CoAR) that depends on the selection of an alternate parent to ease network congestion” [155]. 
In the authors’ proposed solution, child nodes combine various routing metrics through a “multicriteria decision-making (MCDM)” mechanism to select the preferred parent node among many candidate parents. The proposed method uses the neighborhood index metric to break ties between candidates.
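The packet-drop detection idea attributed to [154] above, in which a node's usual link delivery ratio serves as a baseline, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the three-standard-deviation rule, and the sample values are assumptions made for illustration. Only the ETX formula follows the commonly used definition (the inverse of the product of forward and reverse delivery ratios).

```python
from statistics import mean, stdev
from typing import Sequence


def etx(forward_ratio: float, reverse_ratio: float) -> float:
    """Expected transmission count from forward and reverse delivery ratios."""
    return 1.0 / (forward_ratio * reverse_ratio)


def is_suspicious(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Flag a link whose current delivery ratio is far below its own baseline.

    Assumption: "far below" means more than k sample standard deviations
    under the mean of past observations. The threshold k is hypothetical.
    """
    baseline = mean(history)
    spread = stdev(history)
    return current < baseline - k * spread


# Hypothetical delivery ratios observed on one forwarding link.
past = [0.90, 0.92, 0.91, 0.89, 0.93]
print(is_suspicious(past, 0.60))  # a sharp drop is flagged
print(is_suspicious(past, 0.90))  # ordinary loss is not
```

A fixed multiple of the standard deviation is only one possible decision rule; a deployed IDS would need to tune it against the false-positive rate that ordinary lossy links produce.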

Yavuz et al. [156] have proposed a modular deep learning approach using a seven-layer ANN structure to detect different types of attacks in IoT, such as version number, HF, and decreased rank attacks. In this research, multilayer perceptron and NB classifiers are used for analysis. Neerugatti and Reddy [157] introduced a novel method for WH attack detection (ADWA). The ADWA approach uses an acknowledgment mechanism to detect WH attacks in RPL. Contiki-Cooja simulation outputs using TelosB sky motes reveal significant improvements in latency, packet delivery ratio, and WH attack detection metrics. Although the above research offers excellent countermeasures against attacks in RPL, none of it provides a strategy to detect and prevent both rank and WH attacks simultaneously.

9.2. Available Improvements for RPL Using Intelligence Approaches

Objective functions (OF) set the routing rules for the RPL protocol; however, RPL does not mandate a particular structure for the OF. As a result, researchers are free to improve or evolve the OF according to their requirements. Various routing approaches [104, 158] have been suggested to enhance network efficiency by improving multiple performance parameters, such as packet delivery ratio, throughput, EC, overhead, and others. The precise measurement of link quality is an essential consideration for connectivity in a wireless network. Ancillotti et al. provide a reinforcement learning (RL)-based mechanism for improving link probing in RPL, called RL-based link quality estimation (LQE), or RL-probe [159]. RL-probe uses an asynchronous mode for LQE, which comprises proactive and reactive phases, alongside a synchronous mode. In the proactive phase, the received signal strength indicator (RSSI) is analyzed alongside the ETX metric. The reactive phase performs a rapid RPL local repair to conduct the LQE. The synchronous mode divides the nodes into clusters to enhance probing and prioritizes the resulting clusters using a multi-armed bandit (MAB). The authors of RL-probe streamlined the probing process but did not suggest tuning the link metric. Clustering also added to the power overhead.
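How a multi-armed bandit can prioritize which cluster to probe next may be easier to see in code. The sketch below uses the standard UCB1 rule, which is not necessarily the bandit strategy used in RL-probe [159]. The meaning of the reward (for example, how much a probe improved the link-quality estimate) and all names and values are assumptions.

```python
import math
from typing import Sequence


def ucb1_select(counts: Sequence[int], rewards: Sequence[float],
                exploration: float = 2.0) -> int:
    """Return the index of the cluster to probe next under UCB1.

    counts[i]  : how many times cluster i has been probed so far
    rewards[i] : cumulative (hypothetical) reward collected from cluster i
    """
    # Probe every cluster once before comparing scores.
    for index, probes in enumerate(counts):
        if probes == 0:
            return index
    total = sum(counts)
    scores = [
        rewards[i] / counts[i] + math.sqrt(exploration * math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    # Highest mean reward plus exploration bonus wins.
    return max(range(len(scores)), key=scores.__getitem__)


# A cluster that has never been probed is chosen first.
print(ucb1_select([0, 3], [0.0, 2.0]))
# With equal probe counts, the cluster with the higher mean reward wins.
print(ucb1_select([10, 10], [8.0, 2.0]))
# A rarely probed cluster gets a large exploration bonus.
print(ucb1_select([100, 1], [60.0, 0.5]))
```

The exploration bonus shrinks as a cluster is probed more often, so probing effort drifts toward clusters whose links prove most informative while rarely probed clusters are still revisited.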

Researchers have developed a context-aware method for LB in RPL (CLRPL) [160] for IoT infrastructures with heavy and complex traffic loads. Their new OF, called CAOF, addresses the thundering herd phenomenon (Herd Decampment Phenomenon) [87, 161] in the DODAG structure and calculates each node’s rank based on its parent’s rank, residual energy, and ETX. Their other proposal, called CARF (context-aware routing factor), uses the parent chain to balance the load and residual energy instead of relying on a single parent. The proposed method has a significant DIO overhead but avoids the illusion of equality in preferred parent selection and improves network resource consumption and packet loss.

To overcome congestion in parent nodes, which arises from the uneven distribution of child nodes, the authors of CoAR [155] use the MCDM mechanism to propose a new congestion-aware objective function (CoA-OF) for RPL. This objective function selects the preferred parent from three metrics, ETX, RE, and QU, using the technique for order of preference by similarity to ideal solution (TOPSIS) [162]. Congestion detection is performed using a comparative threshold on buffer occupancy, calculated from past and present traffic. The proposed solution imposes more energy overhead on parents than the regular settings, but it improves packet delivery rates, power consumption, and throughput under heavy traffic.
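A TOPSIS-based parent choice over ETX, RE, and QU, as described above, could be sketched as follows. This illustrates the generic TOPSIS procedure (vector normalization, weighting, distances to the ideal best and worst alternatives) and is not the CoAR implementation. The weights, the candidate values, and the treatment of ETX and QU as cost metrics and RE as a benefit metric are assumptions.

```python
import math
from typing import Dict, List, Sequence


def topsis_closeness(candidates: Dict[str, List[float]],
                     weights: Sequence[float],
                     is_benefit: Sequence[bool]) -> Dict[str, float]:
    """Return the TOPSIS closeness coefficient (0 worst, 1 best) per candidate."""
    names = list(candidates)
    n_metrics = len(weights)
    # Step 1: vector-normalise each metric column, then apply its weight.
    norms = [
        math.sqrt(sum(candidates[n][j] ** 2 for n in names)) or 1.0
        for j in range(n_metrics)
    ]
    weighted = {
        n: [weights[j] * candidates[n][j] / norms[j] for j in range(n_metrics)]
        for n in names
    }
    # Step 2: ideal best and ideal worst value for every metric.
    best, worst = [], []
    for j in range(n_metrics):
        column = [weighted[n][j] for n in names]
        best.append(max(column) if is_benefit[j] else min(column))
        worst.append(min(column) if is_benefit[j] else max(column))
    # Step 3: relative closeness to the ideal best.
    scores = {}
    for n in names:
        d_best = math.dist(weighted[n], best)
        d_worst = math.dist(weighted[n], worst)
        total = d_best + d_worst
        scores[n] = d_worst / total if total > 0 else 0.0
    return scores


def preferred_parent(candidates, weights, is_benefit) -> str:
    scores = topsis_closeness(candidates, weights, is_benefit)
    return max(scores, key=scores.get)


# Hypothetical candidates: [ETX, residual energy, queue utilisation].
parents = {"A": [1.2, 0.9, 0.1], "B": [3.0, 0.3, 0.8], "C": [2.0, 0.6, 0.4]}
weights = [0.4, 0.3, 0.3]           # assumed weights, not taken from [155]
is_benefit = [False, True, False]   # lower ETX and QU are better, higher RE is better
print(preferred_parent(parents, weights, is_benefit))
```

Because candidate A is best on every metric in this example, it coincides with the ideal solution and receives a closeness of exactly 1.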

The authors have proposed a multiobjective OF based on Quality of Service (OFQS) [163] for RPL that automatically adapts across instances according to the criteria set out in the smart grid (SG) specifications. This new OF uses delay, ETX, and a three-mode power state (adjusted depending on the nodes’ remaining resources) to make routing decisions. OFQS assigns a weight to each route based on these three criteria. The authors also divide the traffic into three categories: critical, noncritical, and periodic. Critical traffic chooses the path with the shortest delay (between 1 and 30 seconds) and a reliability greater than 99.5 percent. Noncritical traffic, on the other hand, takes the path with the shortest delay and a reliability of 98 percent. Periodic traffic takes a route with a modest delay (about 5 minutes to 4 hours) and a reliability of 98 percent. E2ED, PDR, and network lifetime are all improved by the suggested approach. However, the tuning criteria are fixed and cannot cope with a complex network; an ML approach could be used to address this.
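The per-class requirements listed above can be read as a simple filter over candidate routes. The sketch below is a deliberately simplified illustration and not the OFQS weighting scheme of [163]: only the numeric bounds come from the text, while the selection logic (filter by class requirements, then take the lowest delay), the data layout, and all names are assumptions.

```python
from typing import Dict, List, Optional

# Bounds taken from the description above; None means no delay bound is stated.
CLASS_REQUIREMENTS = {
    "critical": {"min_reliability": 0.995, "max_delay_s": 30.0},
    "noncritical": {"min_reliability": 0.98, "max_delay_s": None},
    "periodic": {"min_reliability": 0.98, "max_delay_s": 4 * 3600.0},
}


def choose_route(routes: List[Dict], traffic_class: str) -> Optional[Dict]:
    """Pick the lowest-delay route meeting the class requirements, or None."""
    req = CLASS_REQUIREMENTS[traffic_class]
    eligible = [
        r for r in routes
        if r["reliability"] >= req["min_reliability"]
        and (req["max_delay_s"] is None or r["delay_s"] <= req["max_delay_s"])
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["delay_s"])


# Hypothetical candidate routes.
routes = [
    {"id": "r1", "delay_s": 12.0, "reliability": 0.999},
    {"id": "r2", "delay_s": 5.0, "reliability": 0.985},
    {"id": "r3", "delay_s": 40.0, "reliability": 0.999},
]
print(choose_route(routes, "critical")["id"])     # r2 is too unreliable, r3 too slow
print(choose_route(routes, "noncritical")["id"])  # fastest route above 98 percent
```

Hard-coded bounds like these also make the limitation noted above concrete: nothing in the table adapts when network conditions change.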

The authors in [164] provide a versatile objective function. The data forwarder is selected based on a combination of forwarding delay, ETX, and EC criteria for applications that require real-time data exchange, data durability, and energy efficiency. This OF sets a specific weight for each metric and obtains a composite additive metric. The weights are defined based on the application type or the form of traffic. This additive composite metric is applied to each entry of the parent node routing table, and the parent table is reconstructed according to the calculated metric. The proposed solution improves the packet delivery ratio and EC but imposes additional overhead on the system. RPL has also been enhanced with the chaotic genetic algorithm (CGA) [165]. This algorithm’s fundamental goal is to use chaos and genetic algorithms to improve the parent selection process. CGA improves the search by combining a genetic algorithm’s global search efficiency with the ergodicity of a chaotic algorithm to find the best solution. A composite metric (CM) combines HC, residual energy ratio, ETX, and queue length, each with a specific weight. The chaotic genetic algorithm optimizes the weighting factors in CM to choose the correct parent. This algorithm improves residual energy, E2ED, and performance rate metrics. The network overhead, on the other hand, was not taken into account in this analysis.
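The weighted additive composite metric described for [164] can be sketched in a few lines. The weight values, the application-type labels, and the assumption that each metric has already been normalized to the range 0 to 1 with lower values being better are all illustrative and are not taken from the cited work.

```python
from typing import Dict, List

# Hypothetical weights per application type; each set sums to 1.
WEIGHTS_BY_APP = {
    "real_time": {"delay": 0.6, "etx": 0.3, "energy": 0.1},
    "reliable": {"delay": 0.2, "etx": 0.6, "energy": 0.2},
    "low_power": {"delay": 0.1, "etx": 0.3, "energy": 0.6},
}


def composite_metric(entry: Dict, weights: Dict[str, float]) -> float:
    """Weighted sum of normalised metrics; a lower value means a better parent."""
    return sum(weight * entry[name] for name, weight in weights.items())


def rank_parents(parent_table: List[Dict], app_type: str) -> List[Dict]:
    """Return the parent table reordered by the composite metric, best first."""
    weights = WEIGHTS_BY_APP[app_type]
    return sorted(parent_table, key=lambda entry: composite_metric(entry, weights))


# Hypothetical parent table with metrics normalised to [0, 1].
table = [
    {"id": "p1", "delay": 0.2, "etx": 0.8, "energy": 0.5},
    {"id": "p2", "delay": 0.7, "etx": 0.2, "energy": 0.4},
]
print(rank_parents(table, "real_time")[0]["id"])  # delay dominates
print(rank_parents(table, "reliable")[0]["id"])   # ETX dominates
```

The same table yields a different preferred parent depending on the application type, which is the behavior the weighting is meant to provide. A scheme such as CGA [165] replaces the hand-picked weights with ones found by search.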

Lamaazi and Benamar [166] suggest a new objective function called OF-EC. To make routing decisions, OF-EC uses a fuzzy logic technique that combines ETX and energy consumption (EC). By choosing the right parent, the proposed OF minimizes EC and packet loss. It does, however, increase the frequency of parent changes. Bahramlou and Javidan [167] use an aggregation method to make the most of scarce resources. They estimate the number of children to calculate each parent’s rank, a technique that, in heavy traffic, improves DIO packet overhead, packet retransmissions, delivery rates, and EC but increases congestion in parents. The trigger feature in this proposed solution monitors the network environment and selects less-congested parents as the preferred parent. They also offer an efficient aggregation approach that minimizes node resource consumption by reducing data packets and combining correlated data. Zier et al. [168] suggested E-RPL as a way to fulfill QoS routing requirements while reducing network access overhead. They restrict the nodes that can wait for DIO to monitor the DIO overhead for ETX. In the DAG structure, nodes wait to receive DIOs from their neighbors before publishing their DIS requests. Otherwise, they will send DIS to receive DIO packets. The sink or gateway will first produce its DIO and then release DIS or DAO. The authors provide a multiconstrained objective function for E-RPL that considers energy and delay metrics with random weights to calculate the rank. Their contribution decreases EC as well as end-to-end delay. However, it lengthens the network convergence time.

Fabian et al. [169] provided a new objective function based on fuzzy logic that adapts dynamically to the environment using EC and ETX metrics. When the node battery level exceeds the defined threshold, ETX is used to calculate the node rank. When the battery level is below the threshold, both residual energy and cumulative ETX are used to determine the node rank. Finally, when the battery runs out, the node is turned off. While the proposed objective function improves PDR and throughput, it does so at the expense of increased EC. Ghaleb et al. [170] use an LB scheme to pick the parent, which increases network stability. Each node in the DAG structure creates a list of its children (CHlist) based on analysis of the received data packets. Preferred parents are selected based on two metrics: ETX and the number of child nodes. LB details can shift once the trickle timer is exceeded. To stop this trend, the authors have designed a fast propagation timer for CHlist and trickle. If a node has parents with the same rank, a standard metric is used to select the parent: the CHlist is examined, and the node with the fewest children is chosen as the default parent. The Balance Timer was created to avoid frequent parent changes. The authors’ proposed solution could delay network convergence and cause loops due to rapid packet propagation. Also, the packet delivery ratio and EC in the network will increase. Table 5 shows a similar approach. The authors of [58] present an opportunistic objective function based on fuzzy logic (OOP-OF). This new OF uses the number of children, ETX, and HC metrics for the parent node and seeks to ensure QoS for applications that require reliability and low latency. The proposed fuzzy system takes the mentioned metrics as input and combines them based on fuzzy rules. Finally, the aggregated output is defuzzified, and the parent node routing table is reconstructed based on these outputs. 
The proposed method will improve latency and delivery rate but increase EC.
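The child-count tie-break described above for the load-balancing scheme of [170] can be illustrated with a short sketch. The data layout, the field names, and the final tie-break on the node identifier (added only to keep the result deterministic) are assumptions and do not come from the cited work.

```python
from typing import Dict, List


def select_preferred_parent(candidates: List[Dict]) -> Dict:
    """Pick the lowest-rank candidate; among equal ranks, the fewest children wins.

    Each candidate is a dict with hypothetical keys 'id', 'rank', and
    'children' (the size of that candidate's CHlist).
    """
    return min(candidates, key=lambda c: (c["rank"], c["children"], c["id"]))


# Hypothetical neighbours: 'a' and 'b' advertise the same rank.
neighbours = [
    {"id": "a", "rank": 512, "children": 4},
    {"id": "b", "rank": 512, "children": 1},
    {"id": "c", "rank": 768, "children": 0},
]
print(select_preferred_parent(neighbours)["id"])
```

Node c has the fewest children overall but a worse rank, so it is not chosen. The child count only decides between candidates whose ranks are equal, which spreads children across equally good parents.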

10. Statistical Analysis of Review

In this paper, research on ML-based intrusion detection methods is reviewed. The analysis uses “Google Scholar”, “Crossref”, “Scopus”, and “Web of Science” as sources. Based on the Web of Science search, only nine papers have been published with “RPL” and “machine learning” in their titles and abstracts. Based on Scopus’s results, 32 papers have been published with “RPL” and “machine learning” in their topics. For papers with “RPL protocol” in the topic, the Web of Science results, comprising 344 papers, are illustrated in Figures 6 to 8. The publications with “RPL protocol” in the topic are plotted by year; the maximum number of papers belongs to 2020. Based on Figure 6, the number of papers on this topic has increased. Moreover, as shown in Figure 7, most of the papers are research papers, and 17 review papers have been published on this topic. Besides, the largest share of the papers (47/344) has authors from India, followed by South Korea, England, China, France, Italy, the USA, Iran, Spain, and Saudi Arabia.

11. Discussion

The inherent characteristics of low-power and lossy networks, such as resource constraints, unstable infrastructure, frequent link failures, unreliable communications, and topology dynamics [88], predispose them to various attacks and threats and make detecting and mitigating the effects of these intrusions even more difficult. Encryption- or authentication-based security solutions used by major routing solutions in network structures such as WSN quickly deplete node resources due to high computational overhead and are unsuitable for LLN and RPL [53, 188]. Given the widespread threats to IoT security, much research has been conducted on known RPL attacks to date. A comprehensive RPL-based intrusion detection and countermeasure solution should be able to detect multiple simultaneous or cooperative attacks. Such a mechanism should also detect and mitigate the effects of malicious nodes, isolate malicious nodes from normal ones, use an appropriate notification mechanism to inform other nodes, identify mobile nodes, and evaluate the impact of network dynamics in the event of an attack [5].

Various papers have examined different types of attacks, including known or unknown attacks on the RPL protocol. Some of these intrusions, such as SH, BH, SF, and DIS attacks, are studied more than others. Other types have received less attention. No study has designed an IDS for Worst Parent attacks, and few IDSs have focused on neighbor, rank, and DAG inconsistency attacks. No IDS designed to detect or counter all types of attacks has been provided so far; a comprehensive and ideal IDS should be able to detect all attacks and distinguish between attacks with similar behavior. Therefore, this critical issue should be further studied in the future. The better a proposed intrusion detection mechanism for RPL-based IoT can classify different attacks and evaluate the impact and depth of attacks on the network structure, the more accurate and precise the algorithm will be.

Numerous studies have used various metrics to evaluate their performance, such as energy overhead, the ability to distinguish mobile nodes from static ones, scalability, the ability to reduce the impact of attacks, and the detection rate. For an IDS to perform well in the real world, and not only in its authors’ claims, it must demonstrate that it addresses these metrics comprehensively, detects unknown attacks to an acceptable level, and remains resilient in such situations. A review of multiple studies shows that specification-based mechanisms constitute the predominant part of the mechanisms proposed for IDSs, followed by anomaly- and misuse-based detection and then hybrids. Few studies on digital signature-based IDS have been conducted to date. Specification-based IDSs are highly prevalent on LLNs because they consume less CPU and memory. To understand the behavior of different types of attacks, classify and analyze them, and provide an efficient intrusion detection solution suitable for LLN and IoT applications, conventional datasets need to be updated [172]. The datasets provided for intrusion detection, which result from different simulations or from data obtained from different testbeds, such as the KDD 1999 dataset, are usually extracted from application layer intrusions [173]. They are not the result of LLN-specific traffic flows or threats in RPL. Therefore, a reliable RPL-based IoT dataset is not yet available. Researchers should evaluate the effectiveness and validity of their proposed approach based on independent simulations and experimental results.

IDSs based on ML algorithms for RPL require a dedicated dataset built from RPL and LLN events and processes to enable training and evaluation of ML-based models. Using ML improves the detection of malicious behaviors in RPL and appropriately distinguishes such behaviors from normal processes in the network. Combining ML-based techniques with DL-based solutions can better meet the challenge of resource constraints in the IoT by optimizing the feature selection process and reducing feature dimensionality [174]. ML-based learning models in IDS design can improve the identification and handling of unknown and new attacks and improve IDS behavior on both seen and unseen data by providing new models more intelligently and independently of human control. The combination of ML algorithms with big data [175] can help to realize real-time intrusion detection scenarios [176], in which the IDS is optimally trained as new attacks occur and it is exposed to them.

12. Challenges and Open Research Issues

This section looks at fundamental challenges that may benefit future researchers and offers suggestions for addressing them. Identifying existing gaps and taking advantage of opportunities will guide the design of a robust yet secure, adaptable, and intelligent IDS that is comprehensive, lightweight, and cost-effective and that integrates with new technologies in line with the Internet of Things.

Ability of IDS to detect and analyze new and unknown attacks: IDSs are designed to detect a wide range of attacks, such as RPL attacks. Various studies have shown that specification-based and hybrid mechanisms detect unknown attacks, which the IDS has not been trained to detect, better than other mechanisms. So far, little research has focused on identifying new and lesser-known attacks on RPL. The detection capability, interoperability, and scalability of the various solutions against these intrusions need to be evaluated more accurately. Also, there is a lack of research that assesses IDS in terms of identifying and countering both all known types of RPL attacks and unknown intrusions. A comprehensive IDS has not yet been presented in this regard.

Collaborative IDS design against cooperative attacks: cooperative attacks are among the most complex ones that can significantly jeopardize DAG performance. These attacks can carry out their threat in a distributed manner by controlling multiple border routers and multiple sensor nodes. So far, little research has focused on identifying these types of attacks. No IDS has been developed that can detect these attacks with the participation of various DODAGs.

Node mobility detection and topology dynamics: most designed IDSs consider the network topology to be static. Over time, several nodes enter and exit the DODAG structure, resulting in considerable instability and losses in the LLN structure. The various attacks introduced for RPL can be performed using mobile nodes, so the IDS must track node mobility. When designing an IDS, the dynamics of the network topology must be taken into account.

Overhead and cost metrics: due to the resource limitations of LLN nodes mentioned above, cost should be considered one of the main parameters in IDS design. The practical solutions proposed for IDS do not consider cost in parallel with the robustness of the IDS against various threats. Attacks that threaten the DAG structure complicate the detection process. Intrusion detection algorithms that detect and deal with these threats cause computational overhead and deplete the memory and energy resources of LLN nodes. Therefore, the stability and robustness of an IDS against various security threats should be considered together with how lightweight it is.

Proper placement of IDS on the network: the IDS should capture all traffic exchanged between network devices, including sensor nodes, hosts, and user-side equipment, and comprehensively monitor inbound and outbound network traffic and events. Therefore, the position of the IDS directly affects how well network traffic can be monitored. Proper IDS placement can lead to better performance, such as higher detection rates and less energy and computational overhead on other network nodes.

Utilizing new ML models to detect intelligent attacks: new ML methods such as active learning for optimal IoT-based IDS training can overcome the problem of data scarcity. Due to the topology dynamics, the multiple needs and requirements of the applications, and the heterogeneity of the nodes in the IoT infrastructure, traditional scenario-based methods will not necessarily be effective. Therefore, ML-based methods can lead to efficient and lightweight intrusion detection mechanisms. At the same time, LLN node resource constraints challenge the widespread use of ML-based mechanisms for LLN and the RPL protocol. So, the ML algorithms used need to be adapted to fit LLN structures for IoT intrusion detection. No IDS has been developed to detect complex ML-based attacks that explore network security vulnerabilities.

Supporting various cyber-physical systems technologies and applications: major IoT-based IDS solutions are optimized for 6LoWPAN networks. In contrast, a wide range of cyberphysical systems (CPS) such as smart homes or industrial and enterprise applications use other standards such as Bluetooth and Wi-Fi. CoAP and MQTT are also used in the IoT application layer. IDS must cover a wide range of different IoT standards and technologies and identify and track the various attacks and threats that arise in them.

Ability to analyze real-time security notifications in IDS: by adopting an appropriate, real-time strategy for handling network security notifications such as attack type, attacker characteristics and location, and adverse effects, better decisions can be made against attacks. The large number of LLN nodes and the large volume of generated notifications, a significant portion of which have a lower priority, make notification processing tedious, complex, and time-consuming. So, in future research, real-time notification processing should be given special attention. Proper IDS placement in the network, together with data correlation and abstraction techniques, can make analyzing the notifications faster and easier.

Support for QoS metrics: many IoT applications are real-time and require minimal latency. Most IoT intrusion detection studies have limited their scenarios to small- or medium-sized networks. In the real world, however, IoT can be an infrastructure of a massive number of nodes with multiple resources. Lack of scalability support can jeopardize IDS performance in identifying complex and widespread threats in the IoT, resulting in depletion of node resources, reduced network performance, and user dissatisfaction. Therefore, an IDS for IoT applications, while highly robust, must be scalable and support QoS.

Analyzing encrypted traffic by IDS: encryption is one of the most common techniques for securing data on the network. According to the Gartner report, 80% of enterprise web traffic would be encrypted by 2019 [177]. Most IDSs designed for the IoT cannot process encrypted traffic, so attacking nodes can use encryption to their advantage to escape detection by the IDS. According to Cisco, 70% of web-based malware traffic is encrypted, and 60% of organizations’ attempts to decrypt malware web traffic have failed [178]. Using Cisco Encrypted Traffic Analytics (ETA) technology, metadata and encrypted traffic can be analyzed, and malicious activity can be detected, regardless of the protocol type and features included in the IP packets. ETA with passive monitoring can detect all kinds of threats without decrypting traffic.

13. Conclusion

IoT security is of great importance due to its increasing pervasiveness and sensitive application areas. On the other hand, each of the proposed secure routing protocols has its weaknesses and will not guarantee complete security for IoT. Accordingly, attacks on these networks can succeed, and another strategy must be considered to identify the vulnerabilities. Intrusion detection systems (IDS) serve this goal. The RPL protocol is subject to various types of security attacks. Low-power and lossy networks’ inherent features such as dynamic topology, resource constraints, infrastructure instability, unreliable communications, high losses, and low bit rates make them vulnerable to all kinds of attacks. These limitations and problems are not specific to RPL-based infrastructures; they can also be seen in a variety of WSNs or even wired communications. RPL specifies many protection mechanisms, including global and local repair mechanisms, loop avoidance, and detection strategies. It also encrypts data packets using two security modes. The normal development of such networks is focused on link layer, transport layer, and application layer protection. However, it is believed that an intruder could get around the link layer’s protection by gaining access to a shared key. An intruder may also be a faulty or misconfigured node that degrades network performance by its behavior. This paper offers a concise summary of IoT intrusion research efforts. Relevant intrusion detection systems for IoT, or IoT intrusion detection techniques that may be part of an intrusion detection system, were analyzed in the literature. These papers were published from 2009 to 2021. A classification was used to categorize these papers based on the following features: authentication approach, IDS placement, protection risk, identification method, and IDS architecture. 
Based on the analysis done, it can be inferred that intrusion detection system architectures for IoT are only in their early stages.

Data Availability

The paper is a review and data is not applicable.

Disclosure

The funding sources were not involved in the study design, in the collection, analysis, or interpretation of data, in the writing of the manuscript, or in the decision to submit the manuscript for publication.

Conflicts of Interest

We declare no conflict of interest.