Abstract

The idea of a smart city is to connect physical objects, or things, with sensors, software, electronics, and Internet connectivity so that they can exchange data through Internet of Things (IoT) devices. IoT improves productivity and efficiency through intelligent remote management, but it also increases security and privacy risks. Cyber threats advance day by day and render existing security and confidentiality measures insufficient. Because attackers operate over the Internet, new IoT vulnerabilities keep emerging, demanding new security measures for the IoT devices of a smart city. Efficient Intrusion Detection Systems (IDSs) are needed to reduce the threats concerning IoT. Machine learning algorithms can produce accurate outputs from large and complicated datasets, and these outputs can be used to detect anomalies in IoT network systems. This paper employs several machine learning classifiers and a deep learning model for intrusion detection using the seven datasets of the ToN-IoT telemetry dataset. The proposed IDS achieved an accuracy of 99.7% on the Thermostat, GPS Tracker, Garage Door, and Modbus datasets using a voting classifier.

1. Introduction

A smart city is an architecture built on information and communication technologies (ICTs) to create, deploy, and support solutions that manage cities and address the challenges of urbanization smartly and viably. The main focus of a smart city is to connect various objects so that they transmit data intelligently. Multiple countries have presented smart city initiatives to utilize resources and manage urban growth effectively. ICTs can be deployed to realize smart cities, especially the Internet of Things (IoT), which is essential for effective operations [1]. IoT devices require Internet connectivity to communicate with other objects, nodes, and applications over the cloud and to obtain information about adjacent objects. IoT devices are extensively used in advanced healthcare environments, connecting patients and doctors to extend healthcare services intelligently. Smart healthcare mostly relies on the integration of clinical decision support systems, and IoT-based systems promise cost-effective solutions for the healthcare domain [2]. Because it is used in everyday life by individuals and organizations alike, IoT is often referred to as the Internet of People (IoP). As a result, the number of connected devices is increasing all over the world. Many sensors are used in embedded systems to gather real-time data from physical objects remotely. Using the data obtained from these sensors, we can create intelligent decision systems and manage IoT environments effectively. Connecting commonly used real-world devices to the Internet, on the other hand, raises cybersecurity concerns. As a result, organizations and countries are concerned about the safety of IoT devices against anomalies. The required actions must include physical and cybersecurity measures and confirmation of protection against significant attacks on IoT architectures.
To protect and guard against attacks from infected IoT devices, intelligent intrusion detection techniques for IoT devices must be designed and built. However, many intrusion detection devices require a significant amount of computing power and energy [3].

1.1. Motivation and Contributions

New threats regularly arise because IoT devices run in an embedded and interdependent setting. Furthermore, since IoT devices are often left unattended, a malicious attacker may gain access to them. Since IoT devices are usually connected to cellular networks [4], an eavesdropper may obtain private information from the communication channel. Aside from these security concerns, IoT systems cannot support sophisticated security measures because of their limited energy and processing capacity. To secure IoT applications from cyber-attacks, another line of defense should be installed in IoT networks. AI-based systems have recently gained credibility as a common framework for detecting attacks on networks, including IoT networks. IoT sensor data and network traffic should be logged and analyzed to learn standard patterns. When observed behavior deviates from these patterns, it is treated as a sign of irregular behavior. These methods have also been examined for predicting emerging risks across a range of IoT interfaces and network security protocols.

The main contributions of this research are detailed as follows:
(a) A deep-learning based approach with current databases is employed to categorize the attacks
(b) A safeguard is introduced for the IoT network’s reputation and ensures that it is only available to approved users
(c) A basis for incorporating IDS into an IoT-based system as an application is proposed

IoT networks face more security challenges than conventional computer systems for a variety of reasons. Firstly, IoT systems are highly complex in their processors, platforms, communication methods, and protocols. Secondly, IoT systems consist of Internet-connected modules and control devices that bind physical objects together. Thirdly, IoT systems have no well-established boundaries, and these boundaries often shift because of the mobility of users and devices. Fourthly, IoT systems, or parts of them, may be physically exposed to attackers. Fifthly, limited resources make it impractical to incorporate advanced security techniques and applications on IoT devices. Finally, because of the exponential growth in the number of IoT devices, those networks could be vulnerable to attacks on privacy and security [5, 6].

Several tools and applications have been created to mitigate network attacks by detecting anomalies in the IoT environment using machine learning and deep learning techniques. Several state-of-the-art strategies for classifying these anomalies with machine learning techniques in IoT infrastructure have been reported in the literature, whereas only a few works have used deep learning methods for the same purpose. Deep learning methods have proved to be state of the art for pattern matching, and they can label any input in an IoT system as normal or malicious. Intrusion detection approaches fall into four main types: signature-based techniques, specification-based methods, anomaly-based tactics, and hybrid strategies [1].

Signature-based methods look for matches between a set of network data and a signature database. If the scanned data matches a signature record, the data is considered an intrusion. This approach is helpful for determining the type of attack precisely, and it requires little effort and few resources. Specification-based methods require system administrators to define rules and thresholds in advance. The IDS then monitors the current system and network status against these rules, detects an abnormal state once a threshold is exceeded or a rule is violated, and reacts appropriately [6].
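The two checks described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the method of any cited work: the signature patterns, the field names, and the allowed ranges below are hypothetical values chosen for the example.

```python
# Hypothetical signature database: byte patterns that mark known attacks.
SIGNATURES = {
    "sql_injection": b"' OR '1'='1",
    "xss": b"<script>",
}

# Hypothetical specification: allowed (low, high) range per telemetry field.
RULES = {
    "temperature": (0.0, 50.0),
    "requests_per_second": (0.0, 100.0),
}


def match_signatures(payload):
    """Return the names of all signatures found in the payload bytes."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def violated_rules(reading):
    """Return the fields whose values fall outside their allowed range."""
    violations = []
    for field, (low, high) in RULES.items():
        value = reading.get(field)
        if value is not None and not (low <= value <= high):
            violations.append(field)
    return violations


# A payload that contains a known pattern is reported by signature name.
print(match_signatures(b"GET /search?q=<script>alert(1)</script>"))
# A reading that exceeds a predefined threshold is reported by field name.
print(violated_rules({"temperature": 72.5, "requests_per_second": 12.0}))
```

The signature check names the attack type directly, whereas the rule check only reports which predefined limit was exceeded; neither can flag an attack for which no pattern or rule has been written.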

Anomaly-based methods aim to determine which events are abnormal and which are not. The main advantage of this approach is that it can detect potentially new intrusions; its main disadvantage is that it is prone to false positives. Machine learning algorithms are currently being studied to strengthen anomaly-based intrusion detection: they can monitor ongoing activity and compare it with known intrusion footprints to identify potential attacks. In a hybrid approach, multiple detection techniques are used in the same scheme. This solution can overcome the limitations of a single mechanism and increase the overall reliability of the IoT system. A fully developed hybrid IDS, on the other hand, would be extremely large and complicated, so the technique becomes more complex and requires more resources. In addition, intrusion detection can be time-consuming and costly because of the many protocols involved [7]. Vigneswaran et al. [8] developed an anomaly-based IDS for traditional networks and trained and tested the model on the KDDCup99 dataset. The proposed solution achieves an accuracy of 95%. However, the KDDCup99 dataset lacks homogeneous data and contains few specific records, which makes reliable findings difficult to obtain.
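The anomaly-based idea discussed in this paragraph, learning what normal behavior looks like and flagging deviations from it, can be illustrated with a very small statistical detector. The sketch below is an assumption made for illustration (a z-score test on a single telemetry value with hypothetical readings); it is not the technique used by the cited works, which rely on machine learning models.

```python
import statistics


def fit_baseline(normal_values):
    """Learn the mean and sample standard deviation of normal readings."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomalous(value, baseline, z_threshold=3.0):
    """Flag a value that lies more than z_threshold deviations from the mean."""
    mean, std = baseline
    if std == 0:
        return value != mean
    return abs(value - mean) / std > z_threshold


# Hypothetical thermostat readings recorded during normal operation.
baseline = fit_baseline([20.1, 19.8, 20.4, 20.0, 19.9, 20.2])

print(is_anomalous(20.3, baseline))  # close to the learned behavior
print(is_anomalous(35.0, baseline))  # far from the learned behavior
```

The threshold makes the trade-off mentioned above explicit: lowering `z_threshold` catches more unseen attacks but also raises more false positives on legitimate readings.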

Ajaeiya et al. [9] proposed an anomaly-based IDS that only uses network functionality. The R-tree algorithm outperforms the other machine learning models with a 99.5% true-positive rate and a 0.001% false-positive rate. Their results showed how effective mathematical algorithms like Random Forest can be. Their dataset, however, is not a standard benchmark, which raises questions about its validity. Abubakar et al. [10] proposed an SDN-compatible detection tool. It combines a signature-based IDS and an anomaly-based IDS that were trained and tested on the NSL-KDD dataset. The detection precision is higher than 97.4%. However, intrusions observed solely by anomaly detection cannot be distinguished from those detected by signature detection.

Tang and Kapitnov et al. [11] suggested a protocol for connected networks that uses blockchain technologies to facilitate peer-to-peer communication. The protocol ensures the communication mechanism’s protection and manages variability in working states. Currently, researchers are looking at turning blockchain into a multiagent system.

Li et al. [12] proposed an enhanced method for extracting IoT data features for intrusion detection in smart cities using deep migration learning. They stated that their approach compensates for the lack of an adequate training set. They also claimed that it yields higher detection rates than conventional approaches and significantly reduces clustering time.

Arshad et al. [3] suggested a new intrusion prevention scheme for IoT systems with limited resources. As a result, intrusion protection is disabled for IoT devices and the edge router. To browse network packets, IoT devices are used as IDS nodes. It can only receive raw packets from the host router node, which contain confidential data. To detect malicious behavior in real time in domestic IoT devices, Anthi et al. [7] proposed a three-layer IDS architecture. The protection layers in this architecture define intrusion for IoT systems based on their normal or irregular behavior.

2.1. Threats in IoT

IoT signifies a heterogeneous environment of sensing devices connected over the Internet [13]. The threats associated with IoT differ from those of conventional networks because IoT devices have limited computational power and memory. Furthermore, IoT devices use insecure wireless communication media, such as 802.15.4, LoRa, ZigBee, and 802.11ac. Moreover, IoT devices lack standard operating systems and use different formats and application-specific functionality, which makes a standard security protocol difficult to develop [14]. All these shortcomings cause various types of security and privacy threats.

In addition, the communicating IoT devices are mostly multivendor, demanding a reliable tool to act as a bridge [15]. Various research works have highlighted the issue of software updates to billions of IoT devices [16]. Therefore, the detection of threats and challenges associated with IoT-based systems is significant during the design and implementation of the security measures for IoT machines. Internet Engineering Task Force (IETF) has recognized various IoT threats [17], such as man-in-the-middle (MiTM) attack, Denial of Service (DoS) attack, replacement of firmware with malicious code, privacy threats, and eavesdropping attacks.

The basic ideas of security and privacy revolve around the availability of the network and the confidentiality and integrity of data. Any unauthorized access to data may cause a breach of availability, confidentiality, or integrity. Thus, privacy threats concern the privacy of the data, while security threats affect the integrity of the data and the availability of the network. Figure 1 illustrates different privacy and security threats associated with IoT devices.

2.2. Security Threats
2.2.1. Denial of Service (DoS)

DoS is a common and basic security threat that can be used against an IoT device. DoS attacks are a preferred tool for intruders because many IoT devices have weak security features. A DoS attack happens when attackers make a device unavailable to its legitimate users. The main aim of a DoS attack is to bring down the network by flooding it with illegitimate requests. The advanced form of DoS is the Distributed DoS (DDoS) attack, in which several attacking sources target a single victim [18]. Different kinds of DDoS attacks exist, but all of them have the same objective. The most common type in IoT networks is the botnet attack [19].

2.2.2. Man-in-the-Middle (MiTM)

MiTM attacks are among the oldest attack approaches in the cyber world [20]. Sybil attacks, message tampering, and spoofing can be classified as MiTM attacks. IP spoofing, DNS spoofing, ARP spoofing, and HTTPS spoofing are the common spoofing attacks.

2.2.3. Malware

Malicious software is also known as malware. It takes the form of a trojan horse, worm, spyware, virus, malvertising, or rootkit [21]. Examples of devices that suffer from malware include healthcare devices, vehicular sensors, and smart home products.

2.3. Privacy Threats

The privacy of users and their data on IoT devices can be compromised through inference attacks, sniffing, and deanonymization.

2.3.1. Man-in-the-Middle (MiTM)

There are two types of MiTM attacks: active and passive. A passive MiTM attack silently listens to the data transferred between two devices. It does not change the data; it only violates privacy. After gaining access to a device, an intruder can watch silently for a couple of days before attempting an attack. The increasing number of IoT devices, such as smartphones, toys, and wristwatches, increases the impact of passive MiTM attacks such as sniffing and eavesdropping. Active MiTM attacks, in contrast, tamper with the data. For example, a client that intends to communicate with a server may instead connect to a MiTM attacker who is impersonating the server, as illustrated in Figure 2.

2.3.2. Data Privacy

Data privacy is concerned with data leakage [22], identity theft, data tampering, and reidentification [23]. Data tampering alters the data and can be categorized as an active attack on data privacy. Similarly, data leakage and reidentification are examples of passive attacks on data privacy.

To summarize, an IoT-based system is not fully secure: the same ease of access that lets users reach their data without any trouble also gives attackers an insecure environment in which they can reach any network segment. Figure 3 depicts various ways in which IoT-based systems may be compromised. Thus, users should be aware of all these security weaknesses to protect themselves from cyber threats. Various methods are employed to reduce cyber threats. Most recently, AI-based systems have been used to classify network traffic at large scale.

3. Materials and Methods

Various machine learning algorithms, including deep learning models, are used to detect network attacks. In the proposed system, data balancing is performed first using the Synthetic Minority Oversampling Technique (SMOTE) [24] to avoid overfitting. Then, a random forest, a voting classifier (an ensemble of logistic regression, random forest, and Gaussian Naive Bayes), an artificial neural network (ANN), and a 1D convolutional neural network (CNN) are applied to distinguish normal from abnormal traffic in IoT environments. Figure 4 demonstrates the proposed IDS for the IoT network.
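To make the voting step concrete, the following sketch shows how an ensemble can combine the labels predicted by its base models through hard (majority) voting. It is a minimal illustration under stated assumptions: the text does not say whether hard or soft voting was used, and the three prediction lists are hypothetical outputs standing in for trained logistic regression, random forest, and Gaussian Naive Bayes models.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several models by hard voting.

    per_model_predictions: list of equal-length label lists, one per model.
    Returns one label per sample: the label predicted by the most models.
    """
    combined = []
    for votes in zip(*per_model_predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three base models on five samples
# (0 = normal traffic, 1 = attack traffic).
logistic_regression = [0, 1, 1, 0, 1]
random_forest = [0, 1, 0, 0, 1]
gaussian_nb = [1, 1, 0, 0, 0]

ensemble = majority_vote([logistic_regression, random_forest, gaussian_nb])
print(ensemble)
```

With an odd number of base models and two classes, a majority always exists, so no tie-breaking rule is needed; an individual model's mistake is outvoted whenever the other two agree.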

3.1. ToN-IoT Telemetry Dataset

In this work, we used a dataset known as the ToN-IoT Telemetry dataset [25], which can be retrieved at the ToN-IoT repository [26]. This dataset was gathered from various sources through Telemetry of IoT devices plus logs of operating systems and network traffic of IoT-based systems.

The ToN-IoT datasets are labeled as normal or attack for binary classification. They also include a type label with the following subclasses: DDoS, backdoor, injection, normal, password cracking, ransomware, and Cross-site Scripting (XSS). From the Train_Test_datasets folder [26], a total of seven IoT device datasets were evaluated: Weather, Thermostat, GPS Tracker, Fridge, Garage Door, Modbus, and Motion Light. The distribution of attacks for each dataset is presented in Figure 5. A brief description of these cyber-attacks is provided in Table 1.

3.2. Data Balancing

The Synthetic Minority Oversampling Technique (SMOTE) [24] is commonly employed for data balancing. The main idea of SMOTE is to create new minority samples by interpolating between existing minority samples that lie close together. First, the k-nearest neighbors of each minority sample are identified. Then, synthetic minority samples are generated at positions between the minority samples and their k-nearest neighbors until the dataset is balanced. Because the new samples are interpolated rather than duplicated, the risk of overfitting is reduced.
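The interpolation step can be sketched as follows. This is a simplified, standard-library illustration of the SMOTE idea, not the implementation used in the experiments; the function name, its parameters, and the toy points are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    minority: list of feature vectors (lists of floats) of the minority class.
    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class points in a two-feature space.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=6, k=2)
print(len(new_points))
```

In practice, `n_new` would be chosen as the difference between the majority and minority class counts so that the classes end up balanced, and oversampling would be applied to the training split only.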

3.3. Classification

The proposed model is evaluated through various machine learning algorithms and deep learning models: random forest, voting classifier, ANN, and 1D CNN. The following parameters are used during the training of classifiers, as presented in Table 2.

3.4. Evaluation Criteria

The main purpose of this model is to classify traffic as normal or attack; the possible classification outcomes are illustrated in Table 3.

The evaluation metrics are computed from TP, TN, FP, and FN using the formulae reported in Table 4. A confusion matrix is also computed to show how many samples are correctly and incorrectly classified.
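As an illustration, the four metrics used in this work can be computed from the confusion-matrix counts as shown below. The sketch uses the standard textbook definitions of accuracy, precision, recall, and F-score, which are assumed here to match Table 4; the counts in the example are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F-score from the counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts: 90 attacks detected, 10 missed,
# 95 normal samples passed, 5 normal samples flagged by mistake.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy matters for intrusion detection because a model can reach high accuracy on imbalanced traffic while still missing most attacks.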

4. Results and Comparisons

The experiments are carried out using machine learning classifiers and a deep learning model on the ToN-IoT datasets. The metrics used to evaluate performance are accuracy, precision, recall, and F-score [31]. The highest result achieved by the different classifiers for each dataset is presented in Table 5. For example, the results show that the voting classifier achieved the highest accuracy of 99.7% on the Thermostat, GPS Tracker, Garage Door, and Modbus datasets. Furthermore, these results are presented in the form of a confusion matrix, as illustrated in Table 6.

The accuracy obtained by each employed classifier on each dataset is presented in Table 7. The random forest classifier achieved a maximum accuracy of 99.7% on various datasets of the ToN-IoT telemetry dataset.

Several approaches are used to protect communication protocols [32, 33] and devices [34]. A summary of existing IDS-based methods for IoT is reported in Table 8 for comparison.

5. Conclusion and Future Work

Currently, several cybersecurity checks are implemented to maintain the security and privacy of IoT networks. As a contribution in this regard, this paper has presented an AI-based model for intrusion detection in IoT networks using the seven datasets of the ToN-IoT telemetry dataset. The proposed model observes traffic across the IoT-based system and predicts possible intrusions using embedded artificial intelligence. The proposed model is trained and tested on the seven ToN-IoT datasets and achieved 99.7% accuracy on the Thermostat, GPS Tracker, Garage Door, and Modbus datasets using a voting classifier.

Many efforts are still required to develop a smart city fully equipped with IoT-based sensors for secure and significant monitoring of all threats. Designing and building such security and privacy procedures for IoT appliances is necessary, making it a core element of any network. We propose to fuse the seven datasets of the TON-IoT dataset with various deep learning models as future work.

Data Availability

For experiments, in this work, we used a dataset known as the ToN-IoT Telemetry dataset [25], which can be retrieved at the ToN-IoT repository [26].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the Artificial Intelligence and Data Analytics Lab (AIDA), Prince Sultan University, Riyadh, Saudi Arabia. The authors are thankful for the technical support.