Abstract

5G and beyond (B5G) applications generate tremendous numbers of computing-intensive, latency-sensitive, and privacy-sensitive tasks that differ from legacy cloud computing tasks and require more sophisticated scheduling strategies. We must satisfy stringent service requirements, particularly privacy preservation, which has not been sufficiently considered in the past. Meanwhile, we need to balance the tasks offloaded to different fog nodes to avoid overwhelming some of them, which may degrade the overall performance. To appropriately schedule privacy-sensitive tasks while balancing the traffic load, we characterize IoT tasks according to their security need, processing time, and real-time requirement, and segment each task into smaller pieces based on its privacy level. The sliced pieces are scheduled to multiple fog nodes with satisfactory security reputations so that no single compromised fog node handles a whole task. Meanwhile, we consider the response time of all available fog nodes before scheduling IoT tasks, avoiding chaotic scheduling that may overwhelm some fog nodes. Accordingly, we propose a reinforcement learning (RL) model in which the agent aims to satisfy the required latency and security requirements while avoiding overloading any fog node, thereby minimizing the average delay. The numerical results demonstrate that the proposed approach achieves a better-balanced load and fewer latency and security violations.

1. Introduction

As 5G technologies mature and support pervasive Internet connections anywhere at any time, the number of Internet users is expected to reach 5.3 billion, almost 71 percent of the world population, by 2023 [1]. Most of them use wireless devices, such as smartphones, tablets, smartwatches, home meters, and wearable devices, generating massive volumes of mobile IoT data that require real-time and secure services. These services are quite different from legacy cloud data services, thus soliciting fog nodes at the edge, which are rich in computing, storage, and bandwidth resources, to satisfy the emerging service demands of IoT data [2, 3].

Efficiently offloading IoT tasks to fog nodes while satisfying the QoS requirements with minimal fog resources is of primary concern. Motivated by the fact that attacks on cloud servers result in significant data leakage, some literature has considered fragmenting IoT tasks to increase privacy. For instance, the authors in [4] sliced an IoT task into three pieces of different sizes and scheduled them to the cloud, a fog node, and the local machine, respectively. Each sliced piece lacks part of the crucial information, thus preserving the privacy of the overall data even if a portion of it is compromised. To avoid data leakage from location-based services (LBS) and conceal the actual trajectory, dummy trajectories were sent along with encrypted data to the LBS for processing, with the results collected at fog nodes [5]. A decentralized blockchain-based technique was proposed in [6] that utilizes edge nodes to relieve the burden on the formed cluster heads, aiming to minimize the system latency while maximizing user data safety.

Another crucial issue is to avoid overwhelming some fog nodes with numerous IoT tasks while leaving others underutilized, which would degrade the overall performance of fog networks. Local fog managers and SDN controllers have been utilized to balance the load both locally and globally by dividing each fog node's capacity into several levels, with each level indicating a load threshold at that fog node. Fog managers monitor the load levels of the fog nodes in their respective clusters and distribute tasks accordingly [7]. Other works have considered offloading a task to the nearest available fog node first and then to the cloud if that fog node is overloaded [8].

To satisfy the privacy requirement while balancing the load of fog nodes, this paper fragments each IoT task according to its security level and offloads each sliced piece to the fog node with the least response time and a satisfactory security reputation. For this purpose, we categorize both fog nodes and IoT tasks into three security levels, i.e., low, medium, and high. The processing capacity of fog nodes and the processing requirement of IoT tasks are categorized in a similar way. A fog node with higher processing capacity can process IoT tasks in less time than one with low processing capacity.

To realize the above ideas, this paper proposes to deploy a reinforcement learning agent at one of the edge servers in the fog layer. Before scheduling an IoT task to any fog node or the cloud, the agent slices the task into several fragments based on its security requirement and size. At most one sliced piece can be scheduled to the same fog node. If the agent schedules a task piece to a fog node with a lower security level than the fragment requires, it is considered a security violation, and the agent receives a substantial negative penalty for that action. Moreover, the agent monitors the response time required to serve the task piece by all fog nodes available at that time step and selects the node with the least response time, thus avoiding overloading only the nearest fog nodes. Accordingly, we propose a Q-learning-based algorithm that guides the agent to schedule incoming IoT task pieces to fog nodes or the cloud, considering the security, real-time, and bandwidth requirements while avoiding overwhelming some fog nodes.

The remainder of this paper is organized as follows. Section 2 reviews the related works. Section 3 presents the system description and system model. Section 4 describes the reinforcement learning model and the proposed Q-learning-based algorithm. Section 5 defines the simulation environment, and Section 6 discusses the simulation results. Finally, Section 7 concludes the work.

2. Related Works

Minimizing the delay while scheduling IoT tasks to fog nodes has been the focus of many existing works. Chiti et al. [9] proposed a distributed algorithm based on matching theory to optimally select fog nodes, aiming to minimize the maximum total task completion time and achieve efficient task offloading for real-time applications while accounting for communication and computation costs. Tran-Dang et al. proposed an adaptive resource-aware task offloading scheme that minimizes the average delay by selecting an optimal policy to determine the most appropriate fog node with available resources. Yousefpour et al. reduced the service delay of task offloading to fog nodes by considering a generalized model with flexible topology [10]. Wang et al. proposed an imitation learning-based approach that provides the most up-to-date information to a vehicular network by minimizing the Age-of-Critical-Information (AoCI) with the help of edge nodes [11].

Mobile fog computing using unmanned aerial vehicles (UAVs) as fog nodes has been widely discussed in recent years. In [12], Ning et al. considered offloading tasks to UAV-based edge servers deployed to minimize the total cost. The authors in [13] grouped IoT users into clusters and provided computing services through UAVs serving as mobile edge nodes. In [14], Wang et al. proposed an imitation learning-based approach to deploying UAVs as mobile edge nodes and addressed the problem of deploying UAVs belonging to different service providers in a shared area such that the owners' profits and the users' utilities are both maximized.

Some literature has also addressed energy efficiency when offloading tasks to fog nodes. Zhang et al. proposed an energy-minimizing algorithm that selects a fog node for task offloading by considering the energy consumption, the history of average energy usage of fog nodes, and fog node priorities [15]. Yang et al. maximized the energy efficiency of fog networks consisting of fog nodes with heterogeneous computing resources by proposing a maximal energy-efficient task scheduling algorithm [16]. Jalali et al. compared the nanodata centers (nDCs) used in fog computing with centralized cloud data centers and pointed out that energy efficiency is impacted more by the application type, the access network attached to the nanoservers, and the active and idle times of the nanoservers than by the number of hops [17].

3. System Description and System Model

3.1. System Description

A three-layer integrated fog and cloud framework is considered, as shown in Figure 1. The bottom IoT layer consists of IoT devices, such as tablets and smartphones, that generate IoT tasks with different requirements in terms of real-time service, security and privacy preservation level, and bandwidth intensity.

The middle fog layer comprises fog nodes with different capabilities, such as security credit, computing and storage capability, and mobility. Specifically, their security reputation and processing capability can each be categorized as high, medium, or low. The processing capability of a fog node determines the task execution time once a task moves from the waiting queue to the processing stage. In addition, the reinforcement learning (RL) agent is deployed at an edge node and is responsible for scheduling and offloading IoT tasks to other fog nodes or the cloud for processing.

The top cloud layer is assumed to have sufficient computing and storage resources to serve all IoT tasks but cannot satisfy real-time requirements due to its long distance from the IoT users. Serving an IoT task with a real-time requirement from the distant cloud is deemed a delay violation.

Fog nodes can serve IoT tasks whose real-time requirements can be met and whose security levels do not exceed the fog nodes' security reputations. To help preserve privacy, we slice a single IoT task into smaller pieces based on its security requirement. A fog node whose security reputation is equal to or higher than the security level of an IoT task piece can serve that piece. Moreover, a single fog node is not allowed to serve more than one piece of a sliced IoT task. Task pieces can be flexibly offloaded to either the cloud or fog nodes as long as their requirements are satisfied. The objective of task offloading is to minimize the average end-to-end (E2E) delay while avoiding overwhelming some fog nodes, thereby balancing the traffic load among them. We treat the average response time of the whole iCloudFog as the key metric for evaluating the network load balance: if some fog or cloud nodes are overloaded, their response times become considerably large, leading to a high average response time.
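As a concrete illustration, the following Python sketch shows one possible way to fragment a task by security level. The fragment-count rule (one piece per security level present, plus extra pieces for large tasks) and the size cap are our assumptions for illustration, not the paper's exact rule.

from dataclasses import dataclass

@dataclass
class Task:
    size_mb: float   # task size
    security: int    # 0 = low, 1 = medium, 2 = high
    realtime: bool   # real-time requirement

def slice_task(task: Task, max_fragment_mb: float = 10.0) -> list[Task]:
    """Split a task into fragments so no single node sees the whole task.

    Hypothetical rule: higher-security tasks are cut into more pieces,
    and each piece is capped at max_fragment_mb.
    """
    # At least (security level + 1) fragments, more if the task is large.
    n = max(task.security + 1, int(task.size_mb // max_fragment_mb) + 1)
    piece = task.size_mb / n
    return [Task(piece, task.security, task.realtime) for _ in range(n)]

fragments = slice_task(Task(size_mb=25.0, security=2, realtime=True))
print(len(fragments), [round(f.size_mb, 2) for f in fragments])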

The response time is determined by the E2E delay, which consists of the processing time, queuing delay, propagation time, and transmission time for serving an IoT task. Therefore, we always consider the E2E delay before scheduling any task and avoid offloading IoT task segments to nodes with long response times, which in turn helps balance the overall traffic load. Based on the above assumptions, this paper tackles the IoT task scheduling problem with an intelligent Q-learning-based solution that meets the security, real-time, and other requirements of IoT tasks while minimizing the average E2E delay of iCloudFog when offloading IoT tasks for load balancing.

3.2. System Model

Table 1 shows the notations of sets, parameters, and variables used for describing the system. The cloud, with sufficient computing, storage, and network resources, is represented by $c$ and can process all but the real-time tasks. We use the sets $\mathcal{F} = \{f_1, f_2, \ldots, f_M\}$ and $\mathcal{T} = \{t_1, t_2, \ldots, t_N\}$ to denote the fog nodes and IoT tasks, respectively, where $i \in \{1, \ldots, M\}$ and $j \in \{1, \ldots, N\}$ are integer indexes.

Each fog node in the set $\mathcal{F}$ is a four-tuple, i.e., $f_i = (r_i, s_i, p_i, \tau_i)$, where $r_i$, $s_i$, $p_i$, and $\tau_i$ are the available computing/storage resources, security level, processing power, and response time of fog node $f_i$, respectively. Specifically, a value of 0, 1, or 2 for $s_i$ means fog node $f_i$ has a low, medium, or high security reputation; a value of 0, 1, or 2 for $p_i$ indicates low, medium, or high processing capability of fog node $f_i$.

Similarly, each IoT task in the set $\mathcal{T}$ is a three-tuple characterized by quality-of-service (QoS) requirements, i.e., $t_j = (s_j, e_j, c_j)$, where $s_j$, $e_j$, and $c_j$ denote the security level, real-time service, and computing/storage resource requirements, respectively. Specifically, a value of 0, 1, or 2 for $s_j$ indicates a low, medium, or high security requirement; $e_j$ is one when task $t_j$ has a real-time requirement and zero otherwise. A value of 0, 1, or 2 for $c_j$ indicates that task $t_j$ is low, medium, or high computing-intensive. $d_j^{\max}$ denotes the maximum delay requirement of task $t_j$.
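The tuples above map naturally onto simple record types. The following Python sketch mirrors the notation; the field names and the three-level encodings come from the text, while the concrete class layout is our own illustration.

from dataclasses import dataclass

# Security / processing levels: 0 = low, 1 = medium, 2 = high
@dataclass
class FogNode:
    resources: int     # r_i: available computing/storage units
    security: int      # s_i: security reputation level
    power: int         # p_i: processing capability level
    resp_time: float   # tau_i: current response time (seconds)

@dataclass
class IoTTask:
    security: int      # s_j: required security level
    realtime: bool     # e_j: real-time requirement
    compute: int       # c_j: computing intensity level
    max_delay: float   # d_j^max: maximum tolerable delay (seconds)

def can_serve(node: FogNode, task: IoTTask) -> bool:
    """A fog node may serve a task piece only if its security
    reputation is at least the piece's security requirement."""
    return node.security >= task.security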

4. RL Model for Load Balancing and Privacy-Aware Task Offloading

The objective is to minimize the average delay of task offloading so as to balance the traffic load of fog nodes while satisfying various QoS requirements, especially the privacy requirement. We assume that a fog node with a high traffic load suffers a long response time when handling new tasks; hence, the response time of a fog node determines the probability of it being selected to provision a new IoT task. The problem is formulated as a Markov decision process (MDP) and solved by the reinforcement learning- (RL-) based algorithm proposed in this paper. The MDP is represented by a quadruple $(X, \mathcal{A}, P, R)$, where $X$, $\mathcal{A}$, $P$, and $R$ define a finite set of states, a finite set of actions, a transition probability matrix for every action in each state, and the reward associated with every state [18], respectively.

4.1. State

The state of fog node $f_i$ at time slot $t$ is determined by its available resources $r_i^t$, security level $s_i$, response time $\tau_i^t$, and the response time of the cloud $\tau_c^t$. Based on the assumption that the cloud always has sufficient resources to serve tasks, the cloud resources do not affect the system state and are thus not considered here. Nonetheless, the response time of the cloud is vital for task scheduling and contributes to determining the system state. The state of fog node $f_i$, consisting of multiple substates at time slot $t$, is represented as follows:

$x_i^t = \left(r_i^t, s_i, \tau_i^t\right)$. (1)

The system state space comprises the states of all fog nodes and the response time of the cloud, as represented by Equation (2), where $M$ is the total number of fog nodes:

$X^t = \left\{x_1^t, x_2^t, \ldots, x_M^t, \tau_c^t\right\}$. (2)

Assume the available-resource state of fog node $f_i$ changes from $r$ to $r'$ in time slot $t$, and let $\mathcal{R}$ be the set of possible states of the available resources; the transition probability of the available-resource state is then defined as follows:

$P_{r r'} = \Pr\left\{r_i^{t+1} = r' \mid r_i^t = r\right\}, \quad r, r' \in \mathcal{R}$. (3)

4.2. Action

The system action space consists of all the actions the agent can take in a particular state. The action for IoT task $t_j$ in time slot $t$ is chosen from the combinations of available fog nodes and the cloud, without selecting the same fog node twice. This follows from the assumption that an IoT task can be sliced into $k_j$ pieces and no node may provision more than one piece. The number of combinations of fog nodes and the cloud is defined as follows:

$A_j = \binom{M+1}{k_j}$, (4)

where $M + 1$ is the total number of available fog nodes plus the cloud and $k_j$ is the number of fragments into which IoT task $t_j$ has been sliced. Hence, the action space can be represented as follows:

$\mathcal{A} = \left\{a_1, a_2, \ldots, a_{A_j}\right\}$, (5)

where $A_j$ is the number of combinations of fog nodes and the cloud obtained from Equation (4).
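A small sketch of this enumeration follows; the node identifiers and the itertools-based listing are our illustration. Each action is a set of $k_j$ distinct nodes drawn from the $M$ fog nodes plus the cloud.

from itertools import combinations
from math import comb

M = 9                      # fog nodes f_1..f_9 (as in the simulation)
nodes = [f"f{i}" for i in range(1, M + 1)] + ["cloud"]

k_j = 3                    # fragments of task t_j
actions = list(combinations(nodes, k_j))

# The number of actions matches Equation (4): C(M+1, k_j)
assert len(actions) == comb(M + 1, k_j)
print(len(actions), actions[0])   # 120 ('f1', 'f2', 'f3')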

4.3. State Transition

When the agent takes action $a^t$ in state $X^t$ at time slot $t$, the system changes its state from $X^t$ to $X^{t+1}$, and the transition probability matrix of the system is defined as follows:

$P_{X X'}^{a} = \Pr\left\{X^{t+1} = X' \mid X^t = X, a^t = a\right\}$. (6)

4.4. Reward

When the reinforcement learning (RL) agent explores the environment, it learns from the reward fed back by the environment when it takes an action from the action set in a particular state, causing the system to transition to the next state [19]. The reward of our agent at each time step is determined by the response time of fog node $f_i$ and by its security level, which decides how well it can serve an IoT task with security requirement $s_j$; it is denoted by $R_{i,j}^t$ as follows:

$R_{i,j}^t = R_s + R_\tau$, (7)

$R_s = \begin{cases} -\eta_1, & v_s = 1, \\ \eta_2, & v_s = 0, \end{cases}$ (8)

$R_\tau = \begin{cases} -\eta_3, & v_d = 1, \\ \tau_i^t(j), & v_d = 0, \end{cases}$ (9)

where $R_s$ is the negative or positive reward received by the agent based on the security level $s_i$ of fog node $f_i$ and the security requirement $s_j$ of IoT task $t_j$; $\eta_1$, $\eta_2$, and $\eta_3$ are constant values with $\eta_1, \eta_3 \gg \eta_2$; $v_d$ and $v_s$ are binary parameters, where $v_d = 1$ indicates a delay violation, meaning the delay requirement of IoT task (fragment) $t_j$ cannot be satisfied by fog node $f_i$, and $v_s = 1$ indicates a security violation, meaning the security credit of fog node $f_i$ is lower than IoT task (fragment) $t_j$ requires, and vice versa; $R_\tau$ is the negative or positive reward that the agent receives based on the response time of fog node $f_i$; and $\tau_i^t(j)$ therein is the response time of fog node $f_i$ for provisioning IoT task $t_j$ at time slot $t$:

$\tau_i^t(j) = d_{i,j}^{\mathrm{trans}} + d_{i,j}^{\mathrm{prop}} + d_i^{\mathrm{proc}}$, (10)

$d_{i,j}^{\mathrm{trans}} = \dfrac{z_j}{B}$, (11)

$d_{i,j}^{\mathrm{prop}} = \dfrac{\ell_{i,j}}{v}$, (12)

$d_i^{\mathrm{proc}} = \dfrac{1}{\mu_i - \lambda_i}$, (13)

where $d_{i,j}^{\mathrm{trans}}$, $d_{i,j}^{\mathrm{prop}}$, and $d_i^{\mathrm{proc}}$ are the transmission delay, propagation delay, and processing delay, respectively, when scheduling task $t_j$ to fog node $f_i$, as defined in Equations (11), (12), and (13). The transmission delay is calculated by dividing the task size $z_j$ by the link bandwidth $B$; the propagation delay is calculated by dividing the distance $\ell_{i,j}$ between fog node $f_i$ and the source device of task $t_j$ by the speed of light $v$; the processing delay considers the task arrival rate $\lambda_i$ and service rate $\mu_i$ at fog node $f_i$.
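The response-time model in Equations (10)-(13) is straightforward to compute. Below is a minimal Python sketch, assuming SI units and the M/M/1-style processing delay $1/(\mu_i - \lambda_i)$ implied by the arrival and service rates; the function and parameter names are our own.

LIGHT_SPEED = 3.0e8  # m/s

def response_time(task_bits: float, bandwidth_bps: float,
                  distance_m: float, arrival_rate: float,
                  service_rate: float) -> float:
    """Equation (10): transmission + propagation + processing delay."""
    d_trans = task_bits / bandwidth_bps           # Eq. (11)
    d_prop = distance_m / LIGHT_SPEED             # Eq. (12)
    d_proc = 1.0 / (service_rate - arrival_rate)  # Eq. (13), needs mu > lambda
    return d_trans + d_prop + d_proc

# Example: a 10 Mb fragment over a 1 Gbps link to a fog node 500 m away
print(response_time(10e6, 1e9, 500.0, arrival_rate=50.0, service_rate=60.0))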

Based on Equations (7), (8), and (9), the agent receives a positive reward if it schedules the task to a fog node that satisfies the task's security requirement. If it schedules the task to a fog node with a lower security level than the task requires, causing a security violation, i.e., $v_s = 1$, the agent is penalized with a large negative reward $-\eta_1$.

In addition, the agent receives a positive reward equal to the response time of fog node $f_i$, i.e., $\tau_i^t(j)$, if there is no delay violation when scheduling IoT task $t_j$. In case of a delay violation, i.e., $v_d = 1$, the agent then selects the fog node with the least response time at that particular time step, avoiding scheduling the task to a busy fog node with a heavy load and thus balancing the system load. The considerable negative penalty for a delay violation forces the agent not to schedule task $t_j$ to a fog or cloud node whose response time exceeds the delay requirement of the task.

To avoid greedy decisions when achieving a load-balanced environment, the long-term reward is considered rather than the short-term reward, by means of the expected cumulative discounted future reward for fog node $f_i$ at time step $t$, i.e., $G_i^t$, computed as follows:

$G_i^t = \sum_{k=0}^{\infty} \gamma^{k} R_i^{t+k+1}$, (14)

where $t + k + 1$ is the future time slot when the reward is determined and $\gamma \in [0, 1]$ is the discounting factor. The agent therefore tries to select the actions that maximize the sum of rewards it will receive [19]. Specifically, based on Equation (14), the agent determines the discounted cumulative future reward for all the fog nodes in its vicinity and activates the fog node with the highest reward. The optimal fog node is defined as follows:

$f_i^{*} = \arg\max_{f_i \in \mathcal{F}} G_i^t$. (15)
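For a finite episode, the discounted return in Equation (14) can be computed with a simple backward pass; the sketch below is our illustration.

def discounted_return(rewards: list[float], gamma: float = 0.9) -> float:
    """Expected cumulative discounted reward, Eq. (14), for one node:
    G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.9 + 0.81 = 2.71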

4.5. Q-Learning Policy

We use the most widely used RL method, Q-learning, to obtain the policy. The goal is to enable the agent to learn from the environment and select the best action in a particular state to maximize its rewards. The policy dictates which action the agent selects in a particular state. To keep a balance between exploration and exploitation, we use the $\epsilon$-greedy policy [19] in our model:

$a^t = \begin{cases} \text{a random action } a \in \mathcal{A}, & \text{with probability } \epsilon, \\ \arg\max_{a \in \mathcal{A}} Q(X^t, a), & \text{with probability } 1 - \epsilon, \end{cases}$ (16)

where $\epsilon$ is the probability of taking a random action. Exploration allows the agent to probe the environment by trying multiple actions and learning their outcomes; next time, with probability $1 - \epsilon$, it exploits the learned values to schedule optimally. In the Q-learning algorithm, the Bellman equation [19] is used to update the state-action values in the Q-table:

$Q(X^t, a^t) \leftarrow Q(X^t, a^t) + \alpha\left[R^{t+1} + \gamma \max_{a} Q(X^{t+1}, a) - Q(X^t, a^t)\right]$, (17)

where $\alpha$ is the learning rate that determines how much the agent weighs newly learned information against previously learned information. If $\alpha = 1$, the agent overrides past values with the most recent one rather than learning from them; a too-small $\alpha$ results in slow learning, and $\alpha = 0$ prevents the agent from updating the old values at all. $\gamma$ is the discount factor, defining whether the agent favors long-term or short-term rewards when making decisions.
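A compact sketch of the $\epsilon$-greedy selection and the update in Equation (17), with a dictionary-backed Q-table; the state and action encodings are placeholders of our own.

import random
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)] -> value, defaults to 0.0

def select_action(state, actions, epsilon=0.1):
    """Epsilon-greedy policy, Eq. (16)."""
    if random.random() < epsilon:
        return random.choice(actions)                     # explore
    return max(actions, key=lambda a: Q[(state, a)])      # exploit

def update_q(state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Bellman update, Eq. (17)."""
    best_next = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])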

4.6. Q-Learning-Based Algorithm for Scheduling

This section introduces the proposed Q-learning-based algorithm, shown as pseudocode in Algorithm 1. It takes as input the system state space $X$, action space $\mathcal{A}$, states of available resources $\mathcal{R}$, set of IoT tasks $\mathcal{T}$, set of fog nodes $\mathcal{F}$, and cloud $c$. The objective of the algorithm is to select an optimal action for the agent so that the best fog nodes or cloud for offloading each IoT task are picked. Accordingly, the output of our algorithm is the optimal policy and the Q-table values.

For IoT task $t_j$, the agent first divides the task into $k_j$ fragments based on its security requirement and task size, as shown in line 2. The agent then takes an action by selecting, from the available fog nodes and the cloud server, a combination of nodes to serve all the fragments of IoT task $t_j$. The agent receives the response time, available resources, security level, and processing power of each selected fog node and then calculates the reward based on the response time and the security adherence the selected fog node can provide to the IoT task. Finally, the agent updates the Q-table values, the available-resource state, and the system state.

Input: $X$, $\mathcal{A}$, $\mathcal{R}$, $\mathcal{T}$, $\mathcal{F}$, $c$
Output: Delay violations, security violations, optimal Q-table values and policy
1 while not all IoT tasks are scheduled do
2   Obtain $k_j$ fragments by slicing IoT task $t_j$ based on its security requirement $s_j$ and task size $z_j$;
3   Use the $\epsilon$-greedy policy of Equation (16) to select action $a^t$;
4   Get response time $\tau_i^t(j)$ of each selected fog node $f_i$ using Equation (10);
5   Get $r_i^t$, $s_i$, $p_i$ of each selected fog node $f_i$;
6   Calculate the reward using Equation (14);
7   Use Equation (17) to update the Q-table;
8   Update the available-resource state $r_i^t$, system state $X^t$, and time step $t$;
9 end while
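To make Algorithm 1 concrete, the following self-contained sketch wires the pieces above into one scheduling loop, under our own simplifying assumptions: a fixed fragment rule, randomly drawn node properties, a toy one-dimensional state, and arbitrarily chosen reward constants. It illustrates the control flow only; it is not the authors' simulation code.

import random
from itertools import combinations
from collections import defaultdict

random.seed(0)
M, GAMMA, ALPHA, EPS = 9, 0.9, 0.1, 0.1
SEC_PENALTY, DELAY_PENALTY, SEC_REWARD = 100.0, 50.0, 10.0  # arbitrary

# Fog nodes: (security level 0-2, response time in ms); node M is the cloud
nodes = [(random.randint(0, 2), random.uniform(5, 50)) for _ in range(M)]
nodes.append((2, 120.0))  # cloud: high security credit, long response time

Q = defaultdict(float)

def reward(action, sec_req, max_delay):
    r = 0.0
    for n in action:
        sec, rt = nodes[n]
        r += SEC_REWARD if sec >= sec_req else -SEC_PENALTY   # Eq. (8)
        r += rt if rt <= max_delay else -DELAY_PENALTY        # Eq. (9)
    return r

for task in range(500):                      # while not all tasks scheduled
    sec_req = random.randint(0, 2)
    max_delay = random.uniform(50, 500)      # ms
    k = sec_req + 1                          # fragments (our assumed rule)
    state = sec_req                          # toy state: task security class
    actions = list(combinations(range(M + 1), k))
    if random.random() < EPS:                # epsilon-greedy, Eq. (16)
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: Q[(state, x)])
    r = reward(a, sec_req, max_delay)
    best_next = max(Q[(state, x)] for x in actions)
    Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])  # Eq. (17)

print(max(Q.items(), key=lambda kv: kv[1]))  # best learned (state, action)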

5. Performance Evaluation

5.1. Simulation Environment

We used a Jupyter Notebook on an Intel Core i7 system with 16 GB of RAM and an Nvidia GTX 1650 dedicated graphics card to evaluate the Q-learning algorithm. Table 2 presents the parameters and values used for the simulation. We assume nine fog nodes and one cloud server in the environment. The number of tasks varies from 100 to 500. The maximum delay threshold for IoT tasks is set in the range of 50 ms to 500 ms. The bandwidth of the link from an IoT task source to the fog nodes is assumed to be 1 Gbps, while the link between fog nodes and the cloud is assumed to be 4 Gbps [20]. IoT task sizes are distributed as follows: 10% range from 500 Mb to 1 Gb, 20% from 500 Kb to 1 Mb, and 70% from 1 Mb to 50 Mb. The distance between IoT task source devices and fog nodes is assumed to be within 1000 m. The resource units of fog nodes range from 75 to 85, while the resource units required by IoT tasks range from 5 to 9.
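These parameters map directly onto a simple configuration block; the task-size sampler below follows the stated 10%/20%/70% split, while the dictionary layout and names are our own.

import random

CONFIG = {
    "num_fog_nodes": 9,
    "num_tasks": range(100, 501, 100),
    "max_delay_ms": (50, 500),
    "bw_iot_to_fog_bps": 1e9,    # 1 Gbps
    "bw_fog_to_cloud_bps": 4e9,  # 4 Gbps
    "max_distance_m": 1000,
    "fog_resource_units": (75, 85),
    "task_resource_units": (5, 9),
}

def sample_task_size_bits():
    """Draw a task size following the 10%/20%/70% distribution."""
    u = random.random()
    if u < 0.10:
        return random.uniform(500e6, 1e9)   # 500 Mb - 1 Gb
    if u < 0.30:
        return random.uniform(500e3, 1e6)   # 500 Kb - 1 Mb
    return random.uniform(1e6, 50e6)        # 1 Mb - 50 Mb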

6. Results and Analysis

Figure 2 evaluates the system load balancing in terms of the average delay under different numbers of IoT tasks, i.e., 100 to 500, for schemes with and without IoT task slicing based on security requirements. We consider ratios of IoT tasks with low, medium, and high security levels of 1 : 1 : 1 and 1 : 1 : 3, respectively. The average delay of the proposed approach is larger than that of the approach without task slicing. The reason is that a sliced task must be scheduled to different fog nodes, incurring more complexity to satisfy the delay and security requirements of each task piece and thus a longer overall average delay. Nonetheless, the delay remains within the range required by typical real-time tasks. Moreover, the delay for the distribution with a higher ratio of high-security tasks (1 : 1 : 3) is larger than for the even distribution (1 : 1 : 1). This is due to the security constraints in place: the agent must schedule high-security tasks only to fog nodes with high security levels, which are fewer in number, so tasks must queue at those fog nodes, adding to the average delay.

Figure 3 compares the average end-to-end delay for schemes with and without IoT task slicing and load balancing. Evidently, the task offloading schemes that take load balancing into consideration, i.e., nonsliced with load balancing and sliced with load balancing, have a smaller average delay than those without load balancing, i.e., nonsliced w/o load balancing and sliced w/o load balancing. The difference is minor for small numbers of tasks but becomes considerable as the number of tasks increases from 300 to 500.

Figure 4 shows the security violations for schemes with and without security considerations. The figure shows that if the security requirement is not considered, the agent ends up scheduling high- or medium-security tasks to fog nodes with lower security credit, reflected by a high ratio of security violations as the number of IoT tasks grows, which threatens task privacy. Moreover, the security violations increase when the proportion of high-security tasks increases from 1 : 1 : 1 (33%) to 1 : 1 : 3 (60%). In contrast, our proposed model performs much better even with 60% high-security tasks.

Figure 5 shows the ratio of delay violations for schemes with and without load balancing, for IoT tasks with low-to-medium-to-high security ratios of 1 : 1 : 1 and 1 : 1 : 3, respectively. For both ratios, our proposed model with security constraints and load balancing yields fewer delay violations than the approach without load balancing.

7. Conclusion

This paper proposed an IoT task offloading algorithm based on Q-learning that forces the agent to schedule incoming IoT tasks to fog nodes with light traffic loads and satisfactory security credits. During scheduling, the agent first fragments an IoT task based on its security requirement and size and then schedules the task fragments to fog nodes whose security credits equal or exceed the fragments' security levels. Throughout this process, the agent balances the network load by considering the response time of fog nodes for each IoT task, which reduces the overall average delay while ensuring the security constraints.

Data Availability

The data in this paper were generated in our own lab according to the requirements of our simulation scenarios; we have not used any public data. Nonetheless, we declare that once our paper is accepted, we will share our data by uploading it to the submission portal as a supporting file or wherever required by the WCMC journal.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (Ministry of Education) (Grant No. 2020R1I1A3072688).