Abstract

The Industrial Internet of Things (IIoT) is a recent research area that links digital equipment and services to physical systems. IIoT deployments generate large quantities of data from multiple sensors, and they face various forms of cyberattacks that jeopardize their capacity to provide organizations with seamless operations. Such attacks result in financial and reputational damage for businesses, as well as the theft of sensitive information. Several Network Intrusion Detection Systems (NIDSs) have therefore been developed to protect IIoT systems, but collecting the information needed to build an intelligent NIDS is a difficult task, so detecting existing and new attacks remains a serious challenge. This study provides a deep learning-based intrusion detection paradigm for the IIoT with hybrid rule-based feature selection to train and verify information captured from TCP/IP packets. The training process was implemented using hybrid rule-based feature selection and a deep feedforward neural network model. The proposed scheme was tested on two well-known network datasets, NSL-KDD and UNSW-NB15. According to the performance comparison, the suggested method outperforms other relevant methods, achieving an accuracy, detection rate, and false positive rate (FPR) of 99.0%, 99.0%, and 1.0%, respectively, on the NSL-KDD dataset, and 98.9%, 99.9%, and 1.1%, respectively, on the UNSW-NB15 dataset. Finally, simulation experiments using various evaluation metrics showed that the suggested method is appropriate for classifying network intrusion attacks in the IIoT.

1. Introduction

A modern industrial revolution brings deep change and human growth, resulting in the “Automation of Everything.” It uses computer networks to link digital devices, data mining, and real-world application management [1]. This revolution gives everybody access to enormous volumes of data and information, which brings new opportunities. People may experience significant increases in efficiency in the physical and digital industries, resulting in a better quality of life and a more prosperous society. The creation of vast quantities of data from various sensors is common in the Industrial Internet of Things (IIoT) world. IIoT applications can be found in various industries such as healthcare, retail, automotive, and transport. In many industries, the IIoT can greatly increase efficiency and productivity. The IIoT will first improve existing processes and facilities, but the ultimate aim is to create completely new and vastly enhanced goods and services. Many companies recognize how and where IIoT innovations and solutions can lead to organizational changes, new and improved goods and services, and entirely new business models. On the IIoT, machine learning and deep learning algorithms can increase reliability, production, and customer satisfaction by combining technological innovations, sensors, programs, and applications.

All of this requires a wide range of technologies that must be carefully integrated and orchestrated. These advancements allow intelligent machines, appliances, and integrated automation systems [2, 3] to automate routine operations and solve complex problems without human intervention. They also include improvements in the smart workplace, smart data exploration, cognitive automation, and other aspects of business smartness. A digital twin is a virtual representation of physical assets, systems, and so on. Digital twins are commonly associated with the Internet of Things (IoT), which keeps extending these appliances and supplies a rapidly growing dataset that can be evaluated for efficiency, architecture, maintenance, and a host of other issues. A key feature of any digital twin is that it is constantly updated and “learns” any changes that arise in real time. The IoT concept and its solutions have brought many changes to the physical world.

Since the cloud has altered how individuals and organizations communicate and do business online, cyberspace plays an important role in today’s societies and economies [4, 5]. The IIoT encompasses a variety of devices, software, and facilities that bridge the gap between the virtual and physical worlds [6]. Due to the convergence of information technology (IT) and operational technology (OT), industrial systems that used to depend on closed, proprietary communication systems are now vulnerable to a wide range of malicious activities [7, 8].

Machine-to-machine (M2M) and machine-to-person (M2P) connections in the IIoT run over the TCP/IP stack using various IIoT protocols [9, 10]. The number of flaws and bugs in IIoT systems that can be abused using a range of advanced attack methods has increased significantly. Attackers attempt to take advantage of these weaknesses to steal sensitive information, commit financial fraud, and corrupt device resources [11]. If the cybersecurity community does not find effective mitigation strategies for stopping cyberthreats to the IIoT, it is estimated that they will cost up to $90 trillion by 2030 [12].

Protecting vital services and infrastructure is becoming a more critical problem in every organization as the volume of IIoT devices and deployments continues to grow [13]. Among the most frequent risks in IIoT networks is malware that exploits zero-day vulnerabilities. Attackers infect vulnerable computers to track and change their activities, using a variety of techniques like Progressive Determined Risk (PDR), Denial-of-Service (DoS), and Distributed DoS (DDoS). For instance, in 2010, the Stuxnet worm attacked Iran’s nuclear program; in 2013, Iranian hackers broke into the ICS of a dam in New York; and in 2015, the BlackEnergy attack was directly responsible for approximately 80,000 power outages in Ukraine [14, 15]. These incidents showed that conventional cybersecurity methods, like security protocols, cryptography, access controls, biometrics, and intrusion detection systems (IDSs), are no longer sufficient for delivering effective critical infrastructure protection.

As a network security tool, the network intrusion detection system (NIDS) is important in detecting and addressing Internet attacks. The IIoT has become an essential part of present-day machinery for data and knowledge transfer, which increases the need for comprehensive network security [16]. To safeguard computer systems from network intrusions, NIDSs are often used to inspect network traffic. In [17], an intrusion is defined as an attempt to break an information system’s security services. Researchers have been inspired to create new IDSs in response to the threats posed by such intrusions. Several IDSs have previously been developed and upgraded, but they are still susceptible to a range of attacks. The growing interest in anomaly detection research is due to the ability of anomaly-based IDSs to track and predict malicious behavior and unknown attacks. However, current machine learning-based anomaly detection methods still have a high false alarm rate [18].

Recent findings indicate that feature selection is now at the core of a more accurate IDS [19, 20]. In most detection methods, feature selection is used to pick the input attributes for classification models, with the goal of increasing detection performance and reducing the error rate of the NIDS [8]. In particular, classifier feature vectors are large, and not all features are relevant to the classes to be predicted, which calls for a feature selection strategy. Feature selection approaches can be divided into three categories: filter, wrapper, and embedded [21]. The filter approach, the most popular strategy, selects the best-fitted features based on dataset statistics without considering the classifier’s performance. The wrapper approach, on the other hand, uses classifier feedback to evaluate the quality of each feature subset, which typically leads to higher prediction performance. The embedded approach is analogous to the wrapper approach, except that a model-building mechanism intrinsic to the classifier is used to improve the search efficiency of the learning algorithm.
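
To make the filter category concrete, the following is a minimal sketch in Python of a filter-style ranking: each feature is scored by its absolute Pearson correlation with the class label, and no classifier is consulted. The function names and the toy records are illustrative assumptions and are not taken from the cited works or from the proposed model.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def filter_rank(rows, labels):
    """Return feature indices ordered by |correlation with the label|, best first."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Sort by descending score; equal scores keep ascending index order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores]

# Hypothetical toy records: feature 0 tracks the label, feature 1 is constant,
# feature 2 is uncorrelated with the label.
rows = [[0.1, 5.0, 1.0], [0.2, 5.0, 0.0], [0.9, 5.0, 1.0], [0.8, 5.0, 0.0]]
labels = [0, 0, 1, 1]
ranking = filter_rank(rows, labels)
```

A wrapper approach would replace the correlation score with the validation performance of a classifier trained on each candidate subset, which is more expensive but takes the classifier into account.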

Several categorizations of IDSs have been proposed so far. Depending on the detection approach used, IDSs may be categorized as rule-based, misuse-detection, or hybrid schemes. IDSs may also be classified as real-time if they monitor the system continuously, or as periodic or offline if monitoring takes place only at fixed times or offline on data collected and processed over some time. Furthermore, new classifications have recently been introduced for Industrial Control Systems (ICSs), which have unique requirements and characteristics. The authors of [22] suggested a new classification of IDSs for ICSs with three types: protocol review, traffic processing, and control process modeling.

Countermeasures are taken based on the information gathered from the detection systems about the identified attacks. The more accurately the type of attack is classified, the more effective the chosen countermeasures will be, and the less they will interfere with the proper operation of the device or network. Furthermore, in some situations, countermeasures may have more severe effects than the attack itself if the type of attack is not identified correctly. As a result, we aim to develop an intrusion prevention method that performs well on each type of attack. Moreover, for both common and rare attacks, our system must have a low false alarm rate and a high detection accuracy while requiring only limited processing to classify correctly. The latter property is important because intrusion detection systems are used in industrial control systems that operate critical infrastructures, where reliable and timely warning of cyberthreats is critical [23].
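
The accuracy, detection rate, and false alarm (false positive) rate referred to above can be computed from confusion-matrix counts. The sketch below uses the standard definitions and assumes that the attack class is treated as the positive class; the function name and the example counts are hypothetical and are not results from this paper.

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute accuracy, detection rate, and FPR from confusion-matrix counts.

    tp: attacks flagged as attacks      tn: normal traffic passed as normal
    fp: normal traffic flagged as attack fn: attacks missed
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0       # recall on attacks
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0  # false alarm rate
    return {
        "accuracy": accuracy,
        "detection_rate": detection_rate,
        "fpr": false_positive_rate,
    }

# Hypothetical counts for illustration only.
metrics = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```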

The feature selection strategy is effective for the design and implementation of sound security solutions, as well as for improving IDS performance [24–26]. In some anomaly detection methods, the need for greater accuracy and a lower false alarm rate inspired the use of data preprocessing and classification as two cooperating stages of IDS models [27, 28]. The preprocessing stage supports the classification stage by removing redundant features from the dataset and retaining a reduced feature set that the base classifier can use to build a high-performance model for predicting attack classes.

Therefore, building on [8], this paper targets the emerging infrastructure for applications of the Industrial Internet of Things. We present the proposed scheme, show how it fits into a three-tier design for IIoT systems, and test it against the NSL-KDD and UNSW-NB15 datasets. A rule-based model and a genetic search tool are used for the hybrid feature selection. A subset evaluator first computes the correlation between the class and each feature, and the attribute-class relationships with the highest correlation are chosen for selection. The merit of each attribute is then evaluated by the genetic search method, which returns the attributes with the greatest merit. If two attribute subsets have the same performance score, the rule-based algorithm (rule assessment phase) returns the subset with the fewest features. Finally, the selected features are fed into the ANN for model building and attack classification. The ability of rule-based schemes combined with learning techniques to improve output precision has been demonstrated [29].
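
The rule assessment step described above (prefer the higher merit, and on equal merit prefer the subset with fewer features) can be sketched as follows. This is only an illustration under stated assumptions: the function name `pick_subset`, the tolerance used to treat two scores as equal, and the merit values are hypothetical; the feature names are borrowed from NSL-KDD purely as labels.

```python
def pick_subset(candidates, tol=1e-9):
    """Pick the candidate with the highest merit; break ties by fewest features.

    candidates: iterable of (feature_subset, merit) pairs.
    tol: scores closer than this are treated as equal (an assumption of this sketch).
    """
    best_subset, best_merit = None, None
    for subset, merit in candidates:
        if best_subset is None or merit > best_merit + tol:
            best_subset, best_merit = subset, merit
        elif abs(merit - best_merit) <= tol and len(subset) < len(best_subset):
            # Same performance score: the rule keeps the smaller subset.
            best_subset, best_merit = subset, merit
    return best_subset, best_merit

# Hypothetical candidate subsets and merit scores.
candidates = [
    (("duration", "src_bytes", "dst_bytes"), 0.82),
    (("src_bytes", "dst_bytes"), 0.82),
    (("duration", "flag"), 0.75),
]
chosen, merit = pick_subset(candidates)
```

In this toy case the two-feature subset is kept because it matches the best score with fewer attributes.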

This work was motivated by the assumption that integrating optimization techniques into feature selection and driving them with a rule-based algorithm would improve the performance of an IDS. The paper identifies intrusions in the IIoT network using the proposed model. The datasets used have a large number of features and parameters, so effective feature selection has to be employed to reduce their high dimensionality and thereby the burden on the classifier. Furthermore, feature selection makes it easier for the classifier to use the most relevant attributes and exclude those that have a detrimental impact on its performance. This motivated the creation of a new model using rule-based feature selection to select the most relevant features from the datasets. The DFFNN classifier is trained on the features selected by this hybrid rule-based feature selection. The suggested model’s performance is then compared against existing methods.

This paper’s contributions are as follows:
(i) A system for intrusion prevention in the Industrial Internet of Things network is suggested.
(ii) A hybrid approach based on deep learning and rule-based feature selection is proposed for in-depth intrusion detection analysis.
(iii) A comparison of the current approach with prior methods for intrusion detection in the IIoT network is made. According to the experimental results, the proposed approach is stable, more efficient, and less resource-intensive.

2. Industrial Internet of Things Analytics Overview

Manufacturing, transportation, electricity, and healthcare are all affected by the Industry 4.0 revolution, which requires change in industries that depend heavily on operational technology (OT). Previously, fog and edge computing [30] technologies were needed for the Industrial IoT to ensure the required integration across Industry 4.0. However, this revolution introduces a new interrelated aspect that is critical for IIoT analytics. DL algorithms improve big data analytics capabilities, while IIoT technologies enhance the utility of each of these categories. These algorithms can aid in identification, categorization, and decision-making for each data type. DL in combination with big data technologies generates practical and valuable information for policymaking. DL will be critical in IIoT and data analytics for effective and efficient decision-making, specifically for streaming data and real-time insights in conjunction with edge computing systems [1].

Several business verticals, such as healthcare, grocery, automotive, and transportation, are using IIoT applications. In many industries, the IIoT can greatly increase dependability, performance, and service quality. The IIoT will first improve current procedures and facilities, but the eventual aim is to create completely novel and vastly enhanced goods and services. Many companies recognize how and where IoT innovations and solutions can lead to organizational changes, new and improved goods and services, and entirely new business models. On the IIoT, machine learning and deep learning algorithms can increase reliability, production, and customer satisfaction by combining various machinery, procedures, and applications. All of this requires a wide range of technologies that must be carefully integrated and orchestrated.

These advancements in technology allow intelligent machines, tools, engines, and integrated control systems to execute repetitive duties and solve difficult problems without the need for human involvement [31]. They include smart workplace advancements, intelligent data discovery, cognitive automation, and other aspects of business smartness. A digital twin is a virtual representation of physical assets, systems, and so on. Digital twins are generally regarded as a result of the Internet of Things, which increasingly extends all of these appliances while providing a similarly growing data collection that can be evaluated for efficiency, architecture, and repair, among other things. The main advantage of any digital twin is that it is continually updated and “learns” any updates that occur in almost real time. The IIoT model and its applications are creating significant disruptions in the global market.

2.1. The Four Key Components of Industrial IoT Architecture

Intelligent Edge Gateway: An intelligent edge gateway is a software component deployed close to the sensor nodes that can capture, aggregate, and sanitize lightweight data streams. It uploads structured and relevant data to the IoT cloud and, in general, acts as the connection between the hardware and the cloud IoT network.

IoT Cloud: The main IoT platform, which uses data processing, machine learning, and artificial intelligence methods to handle massive quantities of data. It provides processing capabilities including device control, stream analytics, event management, a rules engine, alerts, and updates. It also offers components like big data analytics, as well as authorization, virtualization, end-to-end encryption, SDKs, and application APIs.

Business Integration and Platform: This is a backend framework that connects many IT systems to ensure that machine data is collected and processed across the full operational loop. ERP, QMS, and planning and scheduling systems are examples of such systems. Data analysis can be divided into three types depending on the form of the result obtained: descriptive, predictive, and prescriptive. Figure 1 displays the IIoT architecture with four layers: things, intelligent gateway, IoT cloud, and business application and integrations.

An Anomaly Detection System (ADS) is an essential security management system that functions as a packet capture and decoding engine for monitoring network traffic and recognizing anomalous behavior [32]. Since it can track both known and unknown (zero-day) threats, the focus is on creating a profile from normal data and treating any deviation from it as an intrusion [33]. For example, the work in [34, 35] built an ADS that uses Particle Swarm Optimization (PSO) to optimize the performance of the One-Class Support Vector Machine (OCSVM) method, using Modbus/TCP network streams to test and verify the system. In [36], the authors built an IDS/ADS based on this design, which was trained on offline data from a SCADA environment using network traces.
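
The idea of profiling normal data and flagging deviations can be illustrated with a deliberately simple statistical stand-in. The sketch below is not OCSVM or any of the cited systems: it fits a per-feature mean and standard deviation on normal traffic and flags a record when any feature lies more than a chosen number of standard deviations from the mean. The function names, the threshold of 3.0, and the toy traffic records are assumptions made for illustration.

```python
import math

def fit_profile(normal_rows):
    """Learn per-feature mean and (population) standard deviation from normal traffic."""
    n = len(normal_rows)
    dims = len(normal_rows[0])
    means = [sum(row[j] for row in normal_rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in normal_rows) / n)
        for j in range(dims)
    ]
    return means, stds

def is_anomalous(row, profile, threshold=3.0):
    """Flag a record if any feature deviates from the profile by more than `threshold` sigmas."""
    means, stds = profile
    for value, mean, std in zip(row, means, stds):
        if std == 0:
            # A feature that never varied in normal data: any change is a deviation.
            if value != mean:
                return True
            continue
        if abs(value - mean) / std > threshold:
            return True
    return False

# Hypothetical normal records: (bytes per packet, connections per second).
normal = [[100, 2], [110, 3], [90, 2], [105, 3], [95, 2]]
profile = fit_profile(normal)
typical = is_anomalous([102, 2], profile)
burst = is_anomalous([500, 2], profile)
```

A one-class model such as OCSVM plays the same role as `fit_profile` here, but learns a boundary around the normal data instead of independent per-feature statistics.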

In [37], the authors constructed an IDS for a Modbus/TCP protocol setting using a K-NN classifier. While the aforementioned mechanisms performed well in certain cases, they were designed for particular configurations and had a high FPR. Similarly, in [38], the authors proposed an improved IDS that matches the diverse structures of SCADA systems by using several OCSVM models and selecting the right one for efficiently identifying multiple attacks. In operation, however, this system consumed a large amount of computational power and had a high false alarm rate. The authors in [39] suggested an ADS for detecting attacks on the Modbus/TCP protocol that extracts different aspects of communication events from SCADA systems and uses an SVM algorithm to identify attacks. The detection method, however, was ineffective in detecting anomalous behaviors.

To limit the effect of factors that impair the OCSVM’s ability to track network attacks successfully, the authors in [40] combined the OCSVM method with a recursive k-means clustering algorithm. In another valuable effort, [41] proposed a critical infrastructure intrusion detection system based on an artificial neural network (ANN) method that trained a multilayer perceptron to identify anomalous network activity using error back-propagation and the Levenberg-Marquardt algorithm. In related work, [42] used an ANN to detect DoS/DDoS attacks in the IoT using a virtual network, and in [43], the authors proposed a decentralized IDS based on artificial immunity for IoT devices. In [44], another set of researchers proposed a Possibility Risk Identification-centered Intrusion Detection System (PRI-IDS) for detecting replay attacks by inspecting Modbus TCP/IP network traffic. However, these schemes had a high rate of false alarms and had trouble identifying certain new attacks.

In a related effort, the authors of [45] create a learning firewall that receives labeled samples and automatically configures itself by writing conservative preventive rules to avoid false alerts. They introduce a novel family of classifiers that, unlike standard classifiers that focus only on accuracy, use zero false positives as the decision-making criterion. The authors first illustrate why naïve modifications of current classifiers, such as SVM, do not produce acceptable results and then present a generic iterative technique to achieve this goal. The proposed classifier, which is based on CART, is used to create a firewall for a Power Grid Monitoring System. The technique is also tested on the KDD CUP’99 dataset, and the outcomes support the efficacy of the strategy.

Several researchers have analyzed IDSs that use shallow networks to identify anomalous observations in host- and network-based systems [46–48]. A shallow ANN has one or two hidden layers, while a deep network has several hidden layers of various architectures [49]. Deep learning is a machine learning technique widely used by academic and industrial researchers because it can learn a detailed computational model that mimics the natural behavior of the human brain [50].

Several researchers have shown that the speed at which captured network traffic turns into massive datasets poses a significant obstacle to the ability of IDS architectures to analyze the resulting large amounts of data in real time [51–53]. The authors in [54] suggested a new rule-based approach for detecting DoS attacks that relied on domain expert knowledge. A rule-based classification algorithm was used to identify DoS attacks; the final classification was carried out by applying the rules from the rule base and was confirmed by a domain expert. Feature selection techniques, also known as dimensionality reduction, can help convert datasets from a high-dimensional to a lower-dimensional space that represents the problem equally well [55, 56]. Irrelevant variables may be eliminated without lowering the data’s value to the detection model, which is the foundation of feature selection [57, 58]. According to [59, 60], many datasets have numerous attributes but few examples, which makes feature selection techniques necessary.

In [61], the authors provide an attack taxonomy based on the layers of the IoT stack, such as device, infrastructure, communication, and service, as well as the specific characteristics of each layer that can be exploited by adversaries. Furthermore, they explain IoT-related vulnerabilities, exploitation techniques, attacks, impacts, and potential mitigation mechanisms and defense strategies using nine real-world cybersecurity incidents that targeted IoT devices deployed in the consumer, commercial, and industrial sectors. These and various additional examples highlight the fundamental security vulnerabilities of IoT systems and indicate the possible attack implications for such interconnected ecosystems, while the suggested taxonomy provides a systematic approach for categorizing attacks based on the affected layer and the impact.

A data reduction strategy based on a rule-based classifier has been proposed in [62]. The suggested dimension reduction technique is an innovative data preparation technique that reduces both attributes and instances in the samples while preserving classifier precision. In [63], the authors suggested a fuzzy-based semisupervised learning method for IDS that uses a large quantity of unlabelled data, supported by labeled data, to increase classification performance. The authors used a fuzzy measure together with a trained single-hidden-layer feedforward neural network to generate a fuzzy membership vector and to group the unlabelled samples into small, medium, and large categories. The classifier is then retrained after adding each category of samples separately to the initial training dataset. In related work, the authors in [21] suggested a new wrapper-based NIDS architecture based on Bayesian networks. The feature selection technique is used in this context to extract the appropriate features from the sample so that the Bayesian network classifier can reliably predict attack types.

For intrusion detection, [64] suggested a hybrid method combining SVM and the ant colony algorithm. The aim of integrating the two machine learning techniques is to balance the weaknesses and strengths of both methods and provide a more precise classification of instances. Similarly, in [65], the authors proposed a wrapper approach for lightweight malware detection based on decision trees. The suggested technique has four stages: preprocessing to remove duplicate attack patterns, feature selection based on a genetic algorithm (GA), postprocessing to standardize the results, and traffic classification based on a neurotree technique. Similarly, [66] proposed a wrapper fitness function based on a penalty term for a large number of attributes, with good classification precision and strict enforcement. According to the experiments, the suggested wrapper fitness function is effective for feature selection while maintaining prediction performance. In [67], the authors suggested GA-based feature selection for a NIDS that uses a decision tree classifier. The researchers used a GA to derive the input features for decision trees as the classification algorithm to improve detection and reduce false alarms in cyberthreat detection.

In [53], the authors proposed a smart rule-based detection scheme for Denial-of-Service (DoS) attacks in cloud server environments. The study used scoring and ranking algorithms to simulate a cloud service, identify attacks, and choose the best features. To detect attacks, a rule-based classification procedure grounded in domain expertise was applied to the selected features. The key benefit of the proposed model is a lower rate of false alarms and increased protection. However, the uncertainty caused by the complex nature of attacks was not addressed. A modern feature selection strategy and a more effective KNN classifier were proposed in [68] for intrusion detection. The selected feature set significantly reduces the presence of irrelevant features, thus improving the KNN classifier’s ability to distinguish kinds of intrusion. Furthermore, the suggested feature selection algorithm dramatically reduces the classifier’s false alarm rate. In related work in [69], the authors suggested a new artificial potential field technique for selecting features, together with a phased architecture as a base classifier for attack identification; the suggested algorithm had better classification accuracy and a lower false alarm rate than other approaches.

A machine learning classification algorithm that classifies malware images using a mix of local and global features was proposed in [70, 71]. The approach had a classification accuracy of 98.4% in a large-scale study using 9339 samples from 25 malware families in the Malimg dataset, and 99.21% in a small-scale study with 5288 samples from 8 malware families in the same dataset. The authors in [72] created a CNN model that is used to separate threats from a corpus of binary executables. This method had a classification accuracy of 98.52% when tested against a dataset of 9339 samples from 25 malware families. In each iteration, 10% of the samples are randomly selected to evaluate a malware family. In [73], the authors proposed a CNN-based malware classification model. On a dataset of 9339 samples, this model had a 98% accuracy. Again, 10% of the samples are randomly picked in each iteration to evaluate the malware family in question.

A CNN was used to create a malware classification model in [74]. On a corpus of 9339 samples from 25 diverse malware families, this method had a 94.5% accuracy rate. In the same vein, in [75], the authors created a deep convolutional neural network that uses color image visualization to detect malware attacks on the Internet. Their findings showed improved classification performance for measuring cybersecurity threats. The authors in [76] suggested a system built on the Random Coefficient Selection and Mean Adjustment Method (RCSMMA), which performs well against a variety of modern cyberattacks. The authors in [77] outlined the most important smart city applications and discussed the major privacy and protection issues that malware attacks raise in smart city application architecture. To keep adversaries out of the global sensor network, the authors in [78] proposed a secure routing and monitoring protocol using multivariant tuples.

To establish a powerful defense against attackers, the authors in [79] recommend developing strong intrusion detection systems. In that work, an ensemble classifier based on Crow-Search is employed to classify the IoT-based UNSW-NB15 dataset. The most important features of the dataset are first identified using the Crow-Search method and then provided to the ensemble classifier for training using the linear regression, Random Forest, and XGBoost algorithms. The model’s performance is then compared to that of state-of-the-art models, and the experimental results show that it outperforms the other models studied.

The widespread use of the Internet in all aspects of human life has raised the likelihood of malicious attacks on networks. Intrusion detection systems have emerged because attacks carried out via the network can spread easily. Attack patterns are also dynamic, which calls for effective cyberattack classification and prediction. To classify intrusion detection system (IDS) datasets, the authors in [80] proposed a hybrid principal component analysis (PCA)-firefly-based machine learning model. The dataset for that study was obtained from Kaggle. The model first uses One-Hot encoding to transform the IDS datasets, then applies the hybrid PCA-firefly method for dimensionality reduction, and finally runs the XGBoost algorithm on the reduced dataset for classification. To demonstrate the strength of their strategy, the authors evaluate the model in detail against state-of-the-art machine learning approaches. The results of the experiments show that the suggested model outperforms the existing machine learning models.

The existing related work shows that DL algorithms can considerably increase the efficiency of IDSs for the IIoT by achieving high prediction performance while maintaining a low false alarm rate. This motivates the use of a DL model with a hybrid rule-based technique for automatic feature selection and for detecting anomalous patterns in data as suspicious vectors using data transmission depth coverage. The proposed DFFNN with hybrid rule-based feature selection consists of a rule-based model that uses a genetic search engine to select the relevant features and the DAE-DFFNN algorithm, which classifies IIoT network traffic using the parameter values of the DAE. It can find a good approximation for network traffic and transform high-dimensional data into low-dimensional data using the reduced layer of the DAE-DFFNN model, as explained in the following subsections.

4. The Proposed Intrusion Detection for Industrial Internet of Things Network

In this study, the deep feedforward neural network (DFFNN) is used to build an effective anomaly detection system (ADS) for IIoT environments. In the training stage, a dual feature selection step employs a genetic search together with a rule-based algorithm. The subset evaluator computes the correlation between each individual feature and the class. The features with the highest feature-class correlation are retained by the filter. This step is referred to as feature evaluation. The genetic search procedure scores each feature subset based on this evaluation and returns the attributes with the highest fitness value. If two attribute subsets have the same fitness score, the rule-based algorithm (rule assessment phase) returns the subset with the fewest attributes. Finally, the chosen attributes are fed into the ANN, which is used to build the model and classify attacks. These parameters are used to set up a standard DFFNN for detecting known and new attack instances. The DFFNN is used to detect malicious vectors during the testing process. By passing the data through the reduced hidden units, the multiple hidden layers build a compact feature representation that captures the most important features. The following subsections give the details of the proposed methodology.

4.1. Deep Feedforward Neural Network (DFFNN)

The fundamental deep learning models are deep feedforward networks, often known as feedforward neural networks or multilayer perceptrons (MLPs). A feedforward network's purpose is to approximate a function $f^{*}$. For example, in a classifier, $y = f^{*}(x)$ maps an input $x$ to a category $y$. A feedforward network defines a mapping $y = f(x; \theta)$ and learns the values of the parameters $\theta$ that result in the best function approximation. These models are referred to as feedforward models because information flows through the function being evaluated from $x$, through the intermediate calculations necessary to define $f$, and finally to the output $y$. There are no feedback links; the model's outputs are not fed back into it. When feedforward neural networks are extended to incorporate feedback connections, they are called recurrent neural networks. The DFFNN is usually described as an ANN with an input layer, several hidden layers, and an output layer that are all directly connected without any cycle [81].

Each hidden-layer node represents abstract features derived from the output of the preceding layer, and these are computed layer by layer to produce the outputs. The network is trained using stochastic gradient descent with back-propagation [82]. Adding layers gives a feedforward neural network the capacity to capture more complicated representations and therefore to represent more complex functions, which is worthwhile when the added complexity is justified.

In this deep learning technique, the source data is fed into the input nodes and forwarded to the hidden units, which apply a nonlinear transformation to the information before passing it on to the output nodes. To measure the quality of the result, a loss function, or back-propagation error [83], is calculated as the discrepancy between the predicted and actual outputs, and its value is propagated backward through the hidden nodes to update the weights. The loss function is computed on single samples or minibatches of the training examples rather than on the whole set, and the weights are adjusted at each iteration so that the model fits the data correctly.
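The forward pass and minibatch update described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in this study: the single Tanh hidden layer, the softmax output, the cross-entropy loss, the toy data, and all function names are assumptions chosen for brevity.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    """Small random weights and zero biases for a one-hidden-layer network."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Input -> tanh hidden layer -> softmax output; no feedback connections."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    logits = H @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return H, exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(P, y):
    """Mean negative log-likelihood of the true classes (assumed loss)."""
    return float(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))

def sgd_step(params, X, y, lr):
    """One back-propagation update computed on a single minibatch."""
    H, P = forward(params, X)
    n = len(y)
    dlogits = P.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dH = dlogits @ params["W2"].T
    dZ1 = dH * (1.0 - H ** 2)  # derivative of tanh
    grads = {
        "W1": X.T @ dZ1, "b1": dZ1.sum(axis=0),
        "W2": H.T @ dlogits, "b2": dlogits.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
    return params

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))            # toy feature vectors (hypothetical)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy labels: 0 = normal, 1 = attack
params = init_params(8, 5, 2, rng)
loss_before = cross_entropy(forward(params, X)[1], y)
for epoch in range(50):
    order = rng.permutation(len(y))
    for start in range(0, len(y), 32):   # minibatches of 32 samples
        idx = order[start:start + 32]
        params = sgd_step(params, X[idx], y[idx], lr=0.1)
loss_after = cross_entropy(forward(params, X)[1], y)
```

Each update uses only one minibatch, so the loss is never evaluated on the whole training set during training; comparing `loss_before` and `loss_after` shows whether the updates are reducing the error.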

This training approach depends on the random initialization of the neural network parameters, which can leave the model stuck in local minima with poor generalization [84]. To improve the convergence rate and the results of supervised learning, unsupervised pretraining strategies, specifically an AE, can be used to set the initial parameters [11].

4.2. Deep Autoencoder (DAE)

A DAE is a feedforward neural network trained in an unsupervised manner [85]. It learns an approximation of the identity function, in which the output equals the input, to build a representation of a set of data, that is, $\hat{x} \approx x$. Its architecture consists of the input feature vectors in the input layer and several hidden layers with nonlinear activation functions. To learn compact features of the input data, the hidden layers employ fewer neurons than the input layer. As a result, the network learns the most significant attributes, reduces the dimensionality, and represents the input data as an abstraction. The output layer then produces a close reconstruction of the input layer.

An AE's simplest framework comprises three layers: input, hidden, and output. Suppose the training data has $n$ samples, each a $d$-dimensional feature vector ($x \in \mathbb{R}^{d}$). The Tanh activation function [85] is used and is calculated using

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}.$$

The encoder and decoder are the two key components of the AE algorithm [86, 87]. The encoder is a deterministic mapping [86] that transforms the input vector $x$ into a hidden-layer representation $h$, reducing the dimensionality to provide the right number of codes:

$$h = f_{\theta}(x) = s(Wx + b),$$

where $W$ is a $d' \times d$ weight matrix, $d'$ is the number of neurons in the hidden layer ($d' < d$), $b$ is the bias vector, $s$ is the Tanh activation function, and $\theta = \{W, b\}$ are the mapping parameters.

The resulting hidden-layer representation $h$ is then mapped back by the decoder, a deterministic mapping that reconstructs the input as an estimate $\hat{x}$:

$$\hat{x} = g_{\theta'}(h) = s(W'h + b'),$$

where $W'$ is a $d \times d'$ weight matrix, $b'$ is a bias vector, and $\theta' = \{W', b'\}$ represents the mapping parameters.

The information in the compressed representation produced by the hidden layer is then used as the input from which the original data is reconstructed. The training process computes the reconstruction error (i.e., the difference between the original input and its reconstruction from the low-dimensional code) over a single sample or a minibatch of the training set.
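A minimal NumPy sketch of the encoder, the decoder, and the reconstruction error for a three-layer AE is given below. The names `encode`, `decode`, `W`, `b`, `W_prime`, and `b_prime` are illustrative, and the squared-error form of the reconstruction error is an assumption, since other error measures are equally possible.

```python
import numpy as np

def encode(x, W, b):
    """Encoder: map a d-dimensional input to a lower-dimensional code with tanh."""
    return np.tanh(W @ x + b)

def decode(h, W_prime, b_prime):
    """Decoder: map the code back to a d-dimensional reconstruction."""
    return np.tanh(W_prime @ h + b_prime)

def reconstruction_error(X, W, b, W_prime, b_prime):
    """Mean squared reconstruction error over a batch (assumed error measure)."""
    errors = [
        np.sum((x - decode(encode(x, W, b), W_prime, b_prime)) ** 2)
        for x in X
    ]
    return float(np.mean(errors))

rng = np.random.default_rng(1)
d, d_code = 6, 2                          # input size and reduced hidden size
X = rng.uniform(-1.0, 1.0, size=(10, d))  # toy minibatch of 10 samples
W, b = rng.normal(0.0, 0.1, size=(d_code, d)), np.zeros(d_code)
W_prime, b_prime = rng.normal(0.0, 0.1, size=(d, d_code)), np.zeros(d)

code = encode(X[0], W, b)                 # compressed representation of one sample
err = reconstruction_error(X, W, b, W_prime, b_prime)
```

Training would adjust the four parameter arrays to reduce `err`; here they are left at their random initial values, so only the shapes and the error computation are illustrated.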

Feature selection phase:

Definition 1 (subset). A feature $V_i$ is said to be relevant if there exists some $v_i$ and $c$ for which $p(V_i = v_i) > 0$ such that

$$p(C = c \mid V_i = v_i) \neq p(C = c).$$

Definition 2 (SubsetEval). If the correlation between each component of a test and the outside variable is known, as well as the intercorrelation between each pair of components, the correlation between a composite test made up of the summed components and the outside variable can be estimated as in (6):

$$r_{zc} = \frac{k\,\overline{r_{zi}}}{\sqrt{k + k(k-1)\,\overline{r_{ii}}}} \qquad (6)$$

where $r_{zc}$ is the correlation between the summed components and the outside variable, $k$ is the number of components, $\overline{r_{zi}}$ is the average of the component-to-outside-variable correlations, and $\overline{r_{ii}}$ is the average intercorrelation between components.
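The subset score in Definition 2 can be computed directly from the two average correlations. The sketch below assumes the standard correlation-based merit, in which the score is the number of features times the average feature-class correlation, divided by the square root of k + k(k - 1) times the average feature-feature correlation; the function and argument names are illustrative.

```python
import math

def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Correlation-based merit of a subset of k features.

    Rises with the average feature-class correlation and falls as the
    features become more correlated with one another (redundant).
    """
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator

# Four features, each moderately predictive of the class.
uncorrelated = cfs_merit(4, 0.5, 0.0)  # no redundancy between features
redundant = cfs_merit(4, 0.5, 1.0)     # features fully redundant
single = cfs_merit(1, 0.5, 0.0)        # one feature: merit equals its correlation
```

With fully redundant features the merit falls back to that of a single feature, which is why the evaluator favors subsets whose features are predictive of the class but not of each other.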

Definition 3 (Genetic search). The term "genetic search" refers to a search that is inspired by natural evolution. This genetic search uses a fitness function that is a linear combination of an accuracy term and a simplicity term:

$$\mathrm{Fitness}(X) = \frac{3}{4}A + \frac{1}{4}\left(1 - \frac{S + F}{2}\right),$$

where $X$ represents a feature subset, $A$ represents DFFNN's average cross-validation accuracy, $S$ represents the number of instances or training samples, and $F$ represents the number of subset features.

Definition 4 (Rule engine). If several feature subsets have identical fitness values, the rule-based engine returns the feature subset with the fewest features; otherwise, it passes the feature subset with the highest fitness value to the base classifier, as in (8).
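The tie-breaking rule in Definition 4 can be illustrated with a short sketch. The function name `select_subset`, the tolerance used to compare fitness values, and the example feature names are assumptions made for illustration only.

```python
def select_subset(candidates, tol=1e-9):
    """Pick the feature subset to pass to the base classifier.

    candidates: list of (features, fitness) pairs, where features is a
    tuple of feature names. The highest fitness wins; when several
    subsets share the highest fitness (within tol), the one with the
    fewest features is returned.
    """
    best_fitness = max(fitness for _, fitness in candidates)
    tied = [
        features for features, fitness in candidates
        if abs(fitness - best_fitness) <= tol
    ]
    return min(tied, key=len)

# Hypothetical candidate subsets produced by the genetic search.
candidates = [
    (("duration", "src_bytes", "dst_bytes"), 0.90),
    (("duration", "src_bytes"), 0.90),
    (("flag",), 0.80),
]
chosen = select_subset(candidates)
```

In this example the first two subsets tie on fitness, so the two-feature subset is chosen over the three-feature one.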

This study proposes an effective intrusion detection model for safeguarding the IIoT system against malicious activity. Figure 2 shows the architecture of the proposed model with its training and testing phases.

In an IIoT setting, the proposed scheme examines large-scale data and selects the critical information from it. The first phase of the suggested method is data preprocessing, which includes feature transformation and feature normalization.

4.2.1. Feature Transformation

Since the suggested framework accepts only numeric attributes, each symbolic attribute value is converted into a numeric value; for instance, the NSL-KDD dataset contains several symbolic attributes, such as the protocol type with values like ICMP, TCP, and UDP, which are mapped to 1, 2, and 3, respectively.
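As a small illustration of this mapping, the sketch below converts protocol names into the integer codes given above. The dictionary and function names are hypothetical, and treating the names as case-insensitive is an assumption.

```python
# Integer codes for the protocol type, following the mapping in the text.
PROTOCOL_CODES = {"icmp": 1, "tcp": 2, "udp": 3}

def encode_protocol(values):
    """Replace each symbolic protocol name with its numeric code."""
    return [PROTOCOL_CODES[value.lower()] for value in values]

encoded = encode_protocol(["TCP", "udp", "ICMP", "tcp"])
```

Other symbolic attributes would need their own lookup tables built in the same way.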

4.2.2. Feature Normalization

DL models learn a weight for each input feature, so features measured on different scales can skew the data and cause some weights to update faster than others [8, 11]. It is therefore important to deal with this problem using statistical normalization, in which each feature value $x$ is transformed by

$$z = \frac{x - \mu}{\sigma},$$

where $\sigma$ is the standard deviation and $\mu$ is the mean of the values of a given feature.
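Statistical normalization of this kind subtracts the feature mean and divides by the feature standard deviation, column by column. The following is a minimal sketch; the use of the population standard deviation and the handling of constant columns are assumptions, not details taken from this study.

```python
import numpy as np

def zscore(X):
    """Standardize each column to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)                      # population standard deviation
    sigma = np.where(sigma == 0, 1.0, sigma)   # leave constant columns at zero
    return (X - mu) / sigma

# Two toy features on very different scales; the second one is constant.
X = np.array([[1.0, 10.0],
              [2.0, 10.0],
              [3.0, 10.0]])
Z = zscore(X)
```

After the transformation every non-constant feature has the same scale, so no single feature dominates the weight updates because of its units.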

Since network data is high-dimensional, it is important to reduce its dimensionality to save computing resources and to develop a compact and flexible ADS [88]. The suggested DAE-DFFNN method is therefore used to map the high-dimensional data to a low-dimensional representation through a reduced hidden layer. More specifically, the model contains a nonlinear mechanism that encodes a large number of features into a smaller feature set in the reduced hidden layer, which makes dimensionality reduction practical without the need for expert knowledge. The purpose of the rule-based DAE-DFFNN dimensionality reduction is to learn good representations of the underlying structure of the data, which improves learning while reducing the number of attributes that must be processed.

4.3. Details of the Datasets Used

Testing, examining, and assessing the behavior of a detection scheme depends heavily on the dataset, which plays a vital role in obtaining good results. A high-quality dataset not only yields reliable results in offline evaluation but also supports success in a real deployment. Many authors have used the well-known NSL-KDD dataset, a revised version of the KDD CUP 99 dataset that addresses its main problems by removing duplicate records and selecting records according to their proportions. After preprocessing, it comprises 148,517 records (77,054 normal and 71,460 attack), each with 41 attributes and a class label. The five classes are probing, DoS, user to root (U2R), remote to local (R2L), and normal [89, 90]. However, despite being commonly used to evaluate IDSs, it is now outdated [91]. Therefore, the more recent UNSW-NB15 dataset is also used to test the proposed work. It includes contemporary synthesized attack activities and represents realistic modern normal behavior [92]. It has a total of 257,673 records (93,000 normal and 164,673 attack), each with 41 features and a class label. There are ten class labels, one normal and nine attack: fuzzers, analysis, backdoors, DoS, exploits, generic, reconnaissance, shellcode, and worms.

4.4. Performance Analysis

To compare the performance of the proposed DL and hybrid rule-based model with that of existing models, the following performance metrics were used. The numbers of correct and incorrect predictions in the classification problem were counted and compared with the reference labels. Accuracy, precision, recall, specificity, and F-score are among the most common metrics. The true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) counts were taken from the confusion matrix and used to compute them, as shown in Equations (10)–(16).
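For reference, the standard definitions of these metrics in terms of TP, TN, FP, and FN are sketched below. The equation numbers are assumed to follow the order in which the metrics are discussed in the text, with the TPR as Equation (15) and the FAR as Equation (16).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{11}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{12}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{13}$$

$$F\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{14}$$

$$\text{TPR} = \frac{TP}{TP + FN} \tag{15}$$

$$\text{FAR} = \frac{FP}{FP + TN} \tag{16}$$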

In Equations (10) to (16), accuracy denotes how often the prediction is correct, whereas precision denotes how often a predicted positive is actually positive. Recall indicates how much of the positive class was correctly predicted, whereas specificity assesses how well the negatives were identified: it is the number of correct negative predictions divided by the total number of actual negatives. The F-score is the harmonic mean of precision and recall. The true-positive rate (TPR) is defined as the proportion of correctly recognized attacks over the total number of attack records, as seen in Equation (15). The TPR is also known as the detection rate (DR). The false alarm rate (FAR) is calculated by dividing the number of normal records wrongly flagged as attacks by the total number of normal records, as defined in Equation (16). In the IIoT system, the goal of intrusion detection is therefore to achieve a higher accuracy and detection rate with a lower false alarm rate.
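As a worked illustration of these definitions, the sketch below computes the metrics from the four confusion-matrix counts. The counts are invented for the example and are not results from this study; the function name `ids_metrics` is likewise illustrative.

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_score": 2 * precision * recall / (precision + recall),
        "detection_rate": recall,            # TPR, identical to recall
        "false_alarm_rate": fp / (fp + tn),  # share of normal records flagged
    }

# Hypothetical test set: 1,000 attack records and 1,000 normal records.
m = ids_metrics(tp=990, tn=980, fp=20, fn=10)
```

With these counts, 990 of the 1,000 attacks are detected (a detection rate of 0.99) and 20 of the 1,000 normal records are wrongly flagged (a false alarm rate of 0.02), so specificity and the false alarm rate sum to one.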

5. Results and Discussion

The proposed model was implemented in the R programming language and evaluated using the performance metrics described above. Both datasets were processed by the DAE-DFFNN with the dual rule-based design so that all features are incorporated seamlessly. Samples were drawn from the NSL-KDD dataset, which contains 77,054 normal records and 71,460 attack records, and from the UNSW-NB15 dataset, which contains 93,000 normal records and 92,000 attack records, with 20 percent of the normal records representing 40%, 20%, and 60% of the samples used for testing, respectively.

The network structure and parameters were chosen experimentally to yield the highest DR and the lowest FPR. After the best features are selected using the hybrid rule-based genetic search engine together with the DAE feature selection model, the best network structure for both datasets is as follows: for the DAE, 41 nodes in the input layer, 10, 3, and 10 nodes in the three hidden layers, and 41 nodes in the output layer; for the DFFNN, 2 nodes in the output layer. For the NSL-KDD dataset, the learning rate is 0.0015 and the momentum start is 0.2. For the UNSW-NB15 dataset, the settings are L1 and L2 regularizations of , a momentum start of 0.2, a momentum stable of 0.4, a ramp momentum of 17, an annealing rate of 2-6, 100 epochs, and a learning rate of 0.002. The Tanh activation function was used.
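The DAE layer sizes reported above can be checked with a short shape sketch. Only the layer sizes come from the text; the random initialization, the batch of five toy records, and the helper names are assumptions, and no training is performed.

```python
import numpy as np

DAE_LAYERS = [41, 10, 3, 10, 41]  # input, three hidden layers, output

def init_layers(sizes, rng):
    """One (weights, bias) pair per pair of consecutive layers."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(layers, X):
    """Return the activations of every layer, starting with the input."""
    activations = [X]
    for W, b in layers:
        activations.append(np.tanh(activations[-1] @ W + b))
    return activations

rng = np.random.default_rng(0)
layers = init_layers(DAE_LAYERS, rng)
acts = forward(layers, rng.normal(size=(5, 41)))  # five toy records
bottleneck = acts[2]                               # 3-node reduced layer
n_params = sum(W.size + b.size for W, b in layers)
```

Each 41-feature record is compressed to three values at the middle layer and expanded back to 41 outputs, and the whole DAE has 944 trainable weights and biases.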

Table 1 and Figure 3 show the performance of the proposed model on both the NSL-KDD and UNSW-NB15 datasets across several metrics. The results show that the proposed model is effective for attack prediction and classification in IIoT network intrusion detection.

Table 2 displays the accuracy, detection rate, and FPR of the proposed model on the datasets. The findings reveal that the model performs better on the NSL-KDD dataset than on the UNSW-NB15 dataset, with an accuracy of 99.0 percent, a detection rate of 99.0 percent, and an FPR of 1.0 percent.

Table 3 shows the detection rates of the proposed model for the classes in the NSL-KDD and UNSW-NB15 datasets. For the UNSW-NB15 dataset, Table 3 and Figure 3 give the detection rates for the record classes: analysis (92.3%), backdoor (95.2%), DoS (97.3%), exploits (98.0%), fuzzer (67.1%), generic (99.8%), normal (99.6%), salicode (90.3%), worm (81.7%), reconnaissance (92%), and shellcode (90.2%), respectively. For the NSL-KDD dataset, Table 3 and Figures 4 and 5 give the detection rates for the record types DoS, normal, U2R, R2L, and probe, which are 99.2%, 99.7%, 75.5%, 94.3%, and 99.0%, respectively. The proposed model demonstrated good overall intrusion detection performance on both datasets, although the detection rates for some classes, such as U2R, fuzzer, and worm, are comparatively low.

5.1. The Comparison of the Proposed Model with Existing Methods

To show how feature selection affects the detection efficiency of a classification algorithm, Table 4 compares the proposed approach with several known approaches. Table 4 displays the overall performance measures for the proposed system and other models using the reduced UNSW-NB15 dataset. The accuracy and FPR of the suggested approach are better than those of the other approaches. Overall, the suggested network intrusion detection method has an accuracy of 98.9 percent, which is 0.1 percent higher than that of the modified KNN, the method with the second-highest accuracy. Similarly, compared with the other classifiers, the proposed method has a very low FPR of 1.1 percent. Compared with the other techniques on the reduced UNSW-NB15 dataset, the proposed approach performed better across all evaluation metrics. The proposed method's marginally higher accuracy is attributed to its robust feature selection and rule-based fitness assessment.

The suggested model's efficiency is compared with that of nine recently developed anomaly detection techniques, including the ADS system based on DL, the Filter-based Support Vector Machine (F-SVM) [95], the Computer Vision Technique (CVT) [96], the Dirichlet Mixture Model (DMM) [91], the Triangular Area Nearest Neighbors (TANN) [97], DBN [98], RNN [52], DNN [81], and Ensemble-DNN [99]. Table 5 compares the detection rate and false-positive rate of the proposed system with those of other models tested on the NSL-KDD dataset. The developed scheme delivers the desired performance, with 99 percent DR and 1.8 percent FPR. The first four models produced reasonable results in identifying malicious events after a feature selection process. F-SVM used mutual information to handle linear and nonlinear data properties and was then paired with the SVM for attack detection. Nevertheless, to improve IDS efficiency, this model's search strategy must be refined. CVT and TANN used the PCA technique to reduce the data dimensionality.

The detection rate and FPR are 92.2% and 8.7% for F-SVM, 95.3% and 5.6% for CVT, 97.2% and 2.4% for DMM, 91.1% and 9.4% for TANN, 95.1% and 4.5% for DBN, 73.0% and 3.6% for RNN, 76.0% and 15% for DNN, 98.0% and 14.7% for Ensemble-DNN, and 99.0% and 1.8% for ADS. The proposed model differs from previous DL-based IDSs in that it uses a basic mathematical algorithm (DAE) and a hybrid rule-based feature selection to estimate parameters that serve as suitable input to the DFFNN, so that classification is effective and efficient. Moreover, the model learns and examines high-level features, automatically reduces data dimensionality, and effectively represents important features thanks to the reduced hidden layer. As a consequence, the proposed model is well suited for use in a real-world industrial environment with a vast amount of unlabeled and unstructured data, such as IIoT.

6. Conclusion

This paper proposes an ADS model for identifying malicious activities in IIoT networks using data from TCP/IP packets. It employs unsupervised DL strategies combined with hybrid rule-based feature selection and automated dimensionality reduction to provide a good description of normal network behavior. The suggested DAE-DFFNN with hybrid rule-based design is successfully used to learn and select the essential features that improve its overall efficiency. Compared with other strategies developed in recent research, the proposed model achieves the highest detection rate of 99.0 percent and the lowest false alarm rate of 1.0 percent when tested on different data samples from the NSL-KDD and UNSW-NB15 datasets. Both NSL-KDD and UNSW-NB15 were used to evaluate the proposed model because researchers often use them as benchmarks in intrusion detection. The use of hybrid rule-based feature selection improves the consistency of the proposed model by using only the relevant features for classification in the datasets. Future work will consider the use of real-world data gathered from an IIoT system to determine the effectiveness of the model in such settings. In addition, the proposed model will be extended to accommodate different protocols.

Data Availability

No data available.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest.