Reinforcement Learning for Security-Aware Workflow Application Scheduling in Mobile Edge Computing

Huang, Binbin; Xiang, Yuanyuan; Yu, Dongjin; Wang, Jiaojiao; Li, Zhongjin; Wang, Shangguang

doi:https://doi.org/10.1155/2021/5532410

Security and Communication Networks


Special Issue

Security Threats to Artificial Intelligence-Driven Wireless Communication Systems 2021


Research Article | Open Access

Volume 2021 | Article ID 5532410 | https://doi.org/10.1155/2021/5532410


Binbin Huang,¹ Yuanyuan Xiang,¹ Dongjin Yu,¹ Jiaojiao Wang,² Zhongjin Li,¹ and Shangguang Wang³

Academic Editor: Xiaolong Xu

Received 02 Mar 2021

Accepted 15 May 2021

Published 25 May 2021

Abstract

Mobile edge computing, as a novel computing paradigm, brings remote cloud resources to edge servers near mobile users. Within the one-hop communication range of mobile users, a number of edge servers equipped with enormous computation and storage resources are deployed. Mobile users can offload some or all of the computation tasks of a workflow application to the edge servers, thereby significantly reducing the completion time of the workflow application. However, due to the open nature of the mobile edge computing environment, tasks offloaded to the edge servers are susceptible to being intentionally overheard or tampered with by malicious attackers. In addition, the edge computing environment is dynamic and time-varying, so existing quasistatic workflow application scheduling schemes cannot be applied to the workflow scheduling problem in dynamic mobile edge computing with malicious attacks. To address these two problems, this paper formulates the workflow scheduling problem with a risk probability constraint in a dynamic edge computing environment with malicious attacks as a Markov Decision Process (MDP). To solve this problem, this paper designs a reinforcement learning-based security-aware workflow scheduling (SAWS) scheme. To demonstrate the effectiveness of the proposed SAWS scheme, this paper compares SAWS with the MSAWS, AWM, Greedy, and HEFT baseline algorithms under different performance parameters, including risk probability, security service, and risk coefficient. Extensive experimental results show that, compared with the four baseline algorithms on workflows of different scales, the SAWS strategy achieves better execution efficiency while satisfying the risk probability constraints.

1. Introduction

In recent years, with the explosive growth of smart devices (such as smart cameras, smart glasses, smart bracelets, and smart phones), a large number of advanced mobile applications (such as real-time navigation systems, interactive online games, virtual reality, and augmented reality) have emerged rapidly. To process these mobile applications efficiently, mobile devices need abundant computing resources and battery capacity [1, 2]. However, because of their limited size, mobile devices are usually resource-constrained. The conflict between the ever-increasing resource requirements of mobile applications and the limited resource capabilities of mobile devices therefore makes executing these applications very challenging.

Mobile Edge Computing (MEC), as a new computing paradigm, brings remote cloud resources to edge servers near mobile users. It enables mobile users to offload some or all of the computation tasks of mobile applications to edge servers for collaborative execution, which greatly alleviates the conflict between resource supply and demand and effectively reduces both the application completion time and the mobile devices’ energy consumption [3–5].

Many mobile applications follow a typical workflow model: they consist of a sequence of precedence-constrained tasks. For example, a video streaming-based face recognition application mainly consists of motion detection and face recognition, and face recognition in turn consists of face detection, image preprocessing, feature extraction, and classification [3, 6]. In mobile edge computing, workflow application scheduling is more complex than independent task scheduling [7–9]. In addition, workflow application scheduling in mobile edge computing faces two challenges. The first is the dynamics of the edge environment, such as the time-varying channel quality and workload of edge servers, which affect the scheduling decision. The second is security. Due to the open nature of the edge environment, edge servers that aggregate large amounts of user data frequently suffer from malicious attacks such as data leakage and tampering, which seriously threaten the successful execution of offloaded tasks [10–13]. Hence, various types of security services must be employed to defend against hostile attacks and protect the offloaded tasks. However, employing security services inevitably incurs additional security overhead, which increases the completion time of the workflow application. Therefore, it is a major challenge to design an efficient security-aware workflow scheduling scheme that reduces the completion time of a workflow application while satisfying its security requirement.

To meet the aforementioned challenges, this paper formulates the security-aware workflow scheduling problem in MEC as a Markov Decision Process (MDP) [14]. The environment state, which consists of the task list on each edge server, the workload of each edge server, and the channel states between the mobile device and the edge servers, can be observed. Based on the environment state, the task nodes of the workflow are dynamically scheduled to edge servers. Deep reinforcement learning is well suited to decision-making problems in which prior knowledge of the environment is unavailable [15–19]. This paper therefore proposes a deep reinforcement learning-based security-aware workflow scheduling scheme (SAWS), whose main objective is to optimize the completion time of the workflow while satisfying its security requirement. To evaluate the effectiveness of the SAWS scheme, this paper implements the average workload minimization (AWM), maximum SAWS (MSAWS), Greedy, and HEFT baseline algorithms. We compare the SAWS scheme with these four baseline algorithms under different risk probabilities, security services, risk coefficients, edge server computing capacities, and numbers of edge servers. The experimental results demonstrate that the SAWS strategy can optimize the completion time of a workflow application while satisfying the risk probability constraint. The main contributions of this paper can be summarized as follows:
(1) This paper focuses on the security problem of workflow scheduling in dynamic edge computing, which is more complex than independent task scheduling.
(2) This paper formulates the security-aware workflow scheduling problem in mobile edge computing as a finite Markov decision process, whose main objective is to minimize the completion time of the workflow while satisfying the risk probability constraint.
(3) This paper proposes a deep Q-network-based security-aware workflow scheduling (SAWS) scheme to solve the workflow scheduling problem in a dynamic edge computing environment with malicious attacks.
(4) Extensive experimental results demonstrate that the SAWS scheme can greatly reduce the completion time of a workflow application while satisfying the risk probability constraint.

The rest of this paper is organized as follows. Section 2 summarizes the related work. Section 3 presents the system model and problem formulation for security-aware workflow scheduling in MEC. Section 4 describes the deep reinforcement learning-based security-aware workflow scheduling scheme in detail. Section 5 sets the simulation parameters and analyzes the experimental performance. Section 6 concludes the paper.

2. Related Work

The task offloading problem in MEC has been studied extensively. According to their optimization goals, existing works can be classified into three categories. The first is task offloading with the goal of optimizing the mobile device’s energy consumption. For example, Huang et al. [7] propose a security and cost-aware task offloading scheme based on deep reinforcement learning for single-user multiserver scenarios. Its main goal is to minimize the task processing delay and the mobile device’s energy consumption while satisfying the security requirement of each task. Chen et al. [20] formulate the task offloading problem in a single-user single-server scenario as a stochastic optimization problem and decompose it into two deterministic optimization subproblems. To solve these two subproblems, a TOFEE algorithm is proposed to optimize the mobile device’s energy consumption. Wu et al. [21] propose a Lyapunov optimization-based energy-efficient task offloading scheme to determine where the application should run, with the objective of minimizing the average energy consumption of mobile devices while satisfying the average response time constraint. The second is task offloading with the goal of optimizing the task processing delay. For example, Chalapathi et al. [22] propose a task scheduling scheme for the task offloading problem in multiple cloudlets, aiming to minimize the task processing delay. Xu et al. [23] design an adaptive task offloading scheme that leverages decomposition-based multiobjective evolutionary algorithms to generate feasible solutions and optimizes the task processing latency and resource utilization of the edge system. The third is task offloading with the goal of optimizing the weighted sum of the mobile device’s energy consumption and the task processing delay. Wu et al. [24] propose a Lyapunov optimization-based energy-efficient task offloading scheme to control the computational and communication overheads and to choose the optimal computational location for the application, thereby minimizing energy consumption and task processing time. However, all of the above works focus on independent task scheduling in MEC. Because the task nodes of a workflow are precedence-constrained, these schemes are not suitable for workflow scheduling.

To further study the workflow scheduling problem in MEC, Xu et al. [25] construct a multiresource energy consumption model to address the single-resource nature of the traditional energy consumption model and propose a particle swarm algorithm-based energy-efficient multiresource workflow scheduling algorithm. Its main objective is to reduce the energy consumption of mobile devices while satisfying the completion time constraint of the workflow. Wu et al. [26] construct a weighted resource sum graph based on resource consumption and design a novel cost-efficient partitioning scheme, the objective of which is to find the optimal partitioning that reduces execution time and energy consumption. Zhu et al. [27] formulate the workflow scheduling problem in MEC as a joint optimization problem of energy consumption and time delay and adopt the deep Q network algorithm to find the optimal scheduling scheme. However, the execution order of the workflow is assumed in advance, and how to compute the execution order of a workflow with precedence constraints is not addressed. In addition, that work does not consider the security problem of workflow scheduling in MEC. Liu [28] proposes a multiworkflow scheduling scheme based on a novel maximum probability function and a deep Q network to solve the scheduling problem in a multiuser edge computing environment, which can find a high-quality workflow schedule in a dynamic environment. However, that work also does not consider the security problem of workflow scheduling in dynamic MEC. Therefore, none of the above scheduling schemes is suitable for security-aware workflow scheduling in dynamic mobile edge computing.

With the escalation of data security threats in mobile edge computing [10–12, 29, 30], many related works have taken measures to protect security-critical applications and the large amount of data generated on mobile devices from malicious attacks. Huang [6] designs a workflow scheduling scheme based on genetic algorithms to minimize the mobile device’s energy consumption under workflow completion time and risk probability constraints. Elgendy et al. [11] design a multidevice, single-server cooperative task offloading scheme to solve the security-aware multiuser resource allocation and task offloading problem. The goal is to minimize the time delay and energy consumption of the whole system. Jia et al. [31] design an identity-based anonymous authentication key agreement protocol to ensure the security of sensitive data in MEC. He et al. [32] design a security mechanism based on adaptive algorithms to solve the security problem of IoT applications in mobile edge computing. Chen et al. [33] propose a deep learning-based method for detecting malicious applications on mobile devices, which greatly improves the security of mobile edge computing. Xu et al. [34] design a secure service offloading approach to improve Internet of vehicles service utility and edge utility while ensuring privacy in software-defined network-enabled edge computing. Xu et al. [35] adopt a location-sensitive-hash (LSH) method to encrypt the feature information of the offloaded services and design an LSH-based offloading scheme, the goal of which is to minimize the energy consumption and response time of all services while guaranteeing service security. All of the above studies design security strategies from different perspectives to ensure the security of edge computing, but they do not consider the security problem of workflow scheduling in dynamic edge computing without prior knowledge of the environment. To address this gap, this paper focuses on the security-aware workflow scheduling problem in a dynamic mobile edge computing environment with security threats.

3. System Model and Problem Formulation

In this section, we first introduce the mobile edge computing model, security cost model, communication model, and risk probability model in mobile edge computing environment, respectively, and then describe the security-aware workflow scheduling problem in detail.

3.1. Mobile Edge Computing Model

As illustrated in Figure 1, we consider a mobile edge computing system that consists of one mobile device and multiple edge servers. The mobile device is described by a two-tuple comprising its CPU frequency and its number of CPU cores. Because the mobile device has limited computing resources and battery capacity, the workflow applications running on it (such as a video streaming-based face recognition application) can be scheduled to edge servers through the wireless network. Each edge server is likewise described by a two-tuple comprising its CPU frequency and its number of CPU cores. Each edge server maintains an execution queue that stores the tasks scheduled to it.

Each mobile application can be abstracted into a workflow model, represented as a directed acyclic graph (DAG) that consists of a set of task nodes and a set of edges between task nodes. Each task node is characterized by a three-tuple comprising its workload (CPU cycles), its input data size (MB), and its output data size (MB). Each edge represents a precedence constraint between two task nodes: a task can be executed only after its predecessor task has been executed. System time is logically divided into consecutively indexed time slots of equal duration. At the beginning of each time slot, one task node of the workflow is scheduled to an edge server.

3.2. Security Cost Model

The task nodes scheduled to edge servers are vulnerable to data theft and tampering. To guard against these security threats, the task nodes need to employ an encryption service and an integrity service, respectively [36–38]. Following [38], the encryption services mainly include the IDEA, DES, Blowfish, AES, and RC4 algorithms. Each encryption algorithm has its own security level and encryption speed, which are listed in Table 1. Encryption algorithms with different security levels can be flexibly selected to protect data from being stolen. The integrity services mainly include the TIGER, RipeMD160, SHA-1, RipeMD128, and MD5 hash functions. Each hash function has its own security level and hash speed, which are listed in Table 2. Hash algorithms with different security levels can be flexibly selected to protect data from being tampered with. By flexibly combining encryption and hash algorithms with different security levels, an integrated security protection is formed against these threats.

To ensure the security of task nodes scheduled to edge servers, an integrated security protection consisting of encryption and hash algorithms with different security levels needs to be employed. However, different levels of security protection lead to different security costs. When a task node of the workflow is scheduled to an edge server, the total encryption cost on the mobile device is calculated, following [6], from the security levels of the encryption service and the integrity service selected for that task node, together with the encryption speed and the hash speed associated with those levels. When the edge server receives the task, it first decrypts the task, which incurs a corresponding total decryption cost.
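The dependence of security cost on the selected security levels can be illustrated with a minimal sketch. It assumes that the overhead is simply the data size divided by the speed of the chosen encryption algorithm plus the data size divided by the speed of the chosen hash function. The speed tables below are hypothetical placeholders, not the values of Table 1 or Table 2, and `security_overhead` is an illustrative name rather than a function from the paper.

```python
# Hypothetical (security level -> speed in MB/s) tables used only for illustration.
# They are NOT the values from Table 1 / Table 2 of the paper.
ENCRYPTION_SPEED = {0.2: 90.0, 0.6: 30.0, 1.0: 12.0}
HASH_SPEED = {0.2: 24.0, 0.6: 18.0, 1.0: 10.0}


def security_overhead(data_mb, enc_level, hash_level):
    """Assumed cost model: time (s) to encrypt and to hash `data_mb` megabytes."""
    encryption_time = data_mb / ENCRYPTION_SPEED[enc_level]
    hashing_time = data_mb / HASH_SPEED[hash_level]
    return encryption_time + hashing_time


# A stronger protection level is slower, so it adds more overhead.
strong = security_overhead(60.0, enc_level=1.0, hash_level=1.0)
weak = security_overhead(60.0, enc_level=0.2, hash_level=0.2)
print(f"strong protection: {strong:.2f} s, weak protection: {weak:.2f} s")
```

Under this assumed model, raising either security level increases the overhead, which is the trade-off the scheduler has to balance against the risk constraint.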

3.3. Communication Model

Due to the user’s mobility, the channel state between the mobile device and the different edge servers changes dynamically. We assume that the channel state between the mobile device and each edge server is constant within a time slot and varies across time slots. In each time slot, the transmission rate between the mobile device and an edge server is determined by the transmission bandwidth between the mobile device and that edge server, the transmission power of the mobile device, the wireless channel gain between the mobile device and that edge server, and the Gaussian white noise power.
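The four quantities named above are the inputs of the standard Shannon capacity formula, so a transmission rate can be sketched as follows. This assumes the usual form, bandwidth times log2(1 + power × gain / noise); the function name and the numeric values are illustrative assumptions, not parameters from the paper.

```python
import math


def transmission_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Shannon-capacity rate in bits per second (assumed standard form)."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


# Illustrative values only: 10 MHz bandwidth, 0.5 W transmit power.
rate = transmission_rate(
    bandwidth_hz=10e6, tx_power_w=0.5, channel_gain=1e-6, noise_power_w=1e-9
)
print(f"rate = {rate / 1e6:.2f} Mbit/s")
```

Because the channel gain changes from one time slot to the next, the rate would be recomputed at the start of every slot.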

3.4. Risk Probability Model

To measure the risk degree of the task nodes scheduled to edge servers, it is necessary to establish a risk probability model to quantify the risk probability of these tasks.

Without loss of generality, following [36–38], the malicious attacks of data leakage and data tampering on each edge server are assumed to follow Poisson distributions, each with its own rate parameter. Accordingly, when a task node of the workflow is scheduled to an edge server, the risk probability of data leakage and the risk probability of data tampering can each be calculated following [6, 38].

Based on the above description, when a task of the workflow is scheduled to an edge server, the risk probability of the task suffering from these two malicious attacks is obtained by combining the two individual risk probabilities.

The risk probability of each task scheduled to an edge server must not exceed a given maximum risk probability, which constitutes the risk constraint on task execution.
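The risk model can be sketched in code under two stated assumptions. First, the per-attack risk is taken to have the exponential form 1 − exp(−λ(1 − sl)), where λ is the attack rate on the server and sl is the selected security level; this form is common in the security-aware scheduling literature but is assumed here, not quoted from the paper. Second, the two attacks are treated as independent when they are combined. All function names are hypothetical.

```python
import math


def attack_risk(attack_rate, security_level):
    """Assumed per-attack risk: 1 - exp(-rate * (1 - security_level))."""
    return 1.0 - math.exp(-attack_rate * (1.0 - security_level))


def task_risk(leak_rate, enc_level, tamper_rate, hash_level):
    """Risk that at least one of the two attacks succeeds (independence assumed)."""
    p_leak = attack_risk(leak_rate, enc_level)
    p_tamper = attack_risk(tamper_rate, hash_level)
    return 1.0 - (1.0 - p_leak) * (1.0 - p_tamper)


def satisfies_risk_constraint(risk, max_risk):
    """Risk constraint: the task risk must not exceed the allowed maximum."""
    return risk <= max_risk


risk = task_risk(leak_rate=2.0, enc_level=0.6, tamper_rate=3.0, hash_level=0.6)
print(f"risk = {risk:.3f}, feasible at 0.9: {satisfies_risk_constraint(risk, 0.9)}")
```

With this form, the maximum security level drives the risk to zero, while lower levels trade a higher risk for a shorter encryption and hashing time.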

3.5. Problem Formulation

In this section, we formulate the security-aware workflow scheduling problem in the mobile edge computing to be a Markov Decision Process. We first introduce the sorting strategy of workflow nodes and then define the state space, action space, and reward function of this problem. Finally, the objective function and constraints of this problem are defined.

3.5.1. Sorting of Workflow Nodes

To sort all the task nodes in the workflow, we assign a weight to each task node [39]. The weight of a task node is computed from the average execution time of the task node over all edge servers, the transmission rate between the edge servers on which the task node and its successor node are located, and the set of all successor nodes of the task node. Because the edge server to which each task node will be scheduled is not known in advance, the priority of a task node is calculated from its average execution time over all edge servers. The priorities of all task nodes in the workflow are calculated by equation (7), and the task nodes are then sorted in descending order of priority.
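A priority of this kind resembles the upward rank used in list scheduling, so the sorting step can be sketched as follows. The sketch assumes the rank of a node is its average execution time plus the largest value of (average communication time to a successor + rank of that successor); the exact form of equation (7) is not reproduced here, and the toy graph and all names are hypothetical.

```python
def upward_ranks(avg_exec_time, avg_comm_time, successors):
    """Assumed upward-rank priority: exec time + max over successors of (comm + rank)."""
    ranks = {}

    def rank(node):
        if node not in ranks:
            tail = max(
                (avg_comm_time[(node, nxt)] + rank(nxt) for nxt in successors.get(node, [])),
                default=0.0,  # exit nodes have no successors
            )
            ranks[node] = avg_exec_time[node] + tail
        return ranks[node]

    for node in avg_exec_time:
        rank(node)
    return ranks


# Toy workflow: a -> b -> d and a -> c -> d (times in seconds, illustrative only).
avg_exec_time = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
avg_comm_time = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "d"): 1.0, ("c", "d"): 1.0}

ranks = upward_ranks(avg_exec_time, avg_comm_time, successors)
order = sorted(ranks, key=ranks.get, reverse=True)  # descending priority
print(order)
```

Sorting by descending rank always places a node before its successors, so the resulting order respects the precedence constraints of the DAG.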

3.5.2. State Space

In each time slot, the sorted task nodes are scheduled in turn. The edge server to which each task node is scheduled depends on the system state. The system state in a time slot consists of three components: the workload state of the edge servers, an n-dimensional vector whose entries are the workloads of the individual edge servers in that time slot; the state of the scheduled tasks, which records for each edge server the set of all task nodes scheduled to it in that time slot; and the channel state between the mobile device and each edge server in that time slot.

3.5.3. Action Space

In each time slot, the system action consists of three components. The first is an n-dimensional vector indicating the edge server to which the current task node is scheduled: an entry equal to 1 denotes that the current task node is scheduled to the corresponding edge server, and an entry equal to 0 denotes that it is not. Because the current task node can be scheduled to only a single edge server in each time slot, exactly one entry of this vector equals 1. The second component is the security level of the encryption service employed by the task node scheduled to the selected edge server. The third component is the security level of the integrity service employed by that task node.
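One way to connect such a composite action to a discrete DQN output is to enumerate every (edge server, encryption level, integrity level) combination as a single flat index. The sketch below assumes this flat encoding, which the paper does not specify; `decode_action` and `one_hot` are hypothetical helper names, and the level indices refer to positions in a list of available security levels.

```python
def decode_action(index, num_enc_levels, num_hash_levels):
    """Split a flat action index into (server, encryption level, integrity level) indices."""
    server, rest = divmod(index, num_enc_levels * num_hash_levels)
    enc_level, hash_level = divmod(rest, num_hash_levels)
    return server, enc_level, hash_level


def one_hot(server, num_servers):
    """Server-selection vector: exactly one entry equals 1."""
    return [1 if j == server else 0 for j in range(num_servers)]


# Illustrative sizes: 5 edge servers, 5 encryption levels, 5 integrity levels.
server, enc_level, hash_level = decode_action(37, num_enc_levels=5, num_hash_levels=5)
selection = one_hot(server, num_servers=5)
print(server, enc_level, hash_level, selection)
```

With these sizes the flat action space has 5 × 5 × 5 = 125 entries, and the one-hot vector satisfies the single-server constraint by construction.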

3.5.4. Reward Function

In each time slot, given the system state, the system obtains an immediate reward after taking an action. The immediate reward is defined in terms of the task node scheduled by the action in that time slot, the execution delay of the workflow up to that time slot, and the increment of the workflow execution delay caused by scheduling the task in that time slot.

When a task node is scheduled to an edge server, its latest completion time must be calculated. This requires the start time of the task node, its encryption time on the mobile device, its transmission time from the mobile device to the edge server, its waiting time on the edge server, its decryption time on the edge server, and its execution time on the edge server. In general, a task node may have multiple predecessor nodes. Therefore, the start time of a task node is the maximum, over all of its predecessor nodes, of the sum of the predecessor’s completion time and the transmission time between the predecessor and the task node. The corresponding quantities are given by equations (11) and (12), respectively.
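The timing rule described above can be sketched as follows. The start-time rule (maximum over predecessors of completion time plus transfer time) follows the text; treating the completion time as the plain sum of the listed components is an assumption, since equations (11) and (12) are not reproduced here. All names and numbers are illustrative.

```python
def start_time(task, predecessors, completion_time, transfer_time):
    """Earliest start: all predecessor outputs must have arrived."""
    arrivals = (
        completion_time[pred] + transfer_time[(pred, task)]
        for pred in predecessors.get(task, [])
    )
    return max(arrivals, default=0.0)  # entry nodes can start at time 0


def finish_time(start, encrypt, transmit, wait, decrypt, execute):
    """Assumed completion time: start time plus each listed delay component."""
    return start + encrypt + transmit + wait + decrypt + execute


# Task "c" depends on "a" and "b" (times in seconds, illustrative only).
predecessors = {"c": ["a", "b"]}
completion_time = {"a": 4.0, "b": 5.0}
transfer_time = {("a", "c"): 3.0, ("b", "c"): 1.0}

start_c = start_time("c", predecessors, completion_time, transfer_time)
finish_c = finish_time(start_c, encrypt=1.0, transmit=2.0, wait=0.5, decrypt=1.0, execute=3.0)
print(start_c, finish_c)
```

In the toy example the slower arrival comes from "a" (4 + 3 = 7), even though "b" finishes later, which is why the maximum is taken over the sums rather than over the completion times alone.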

When task nodes are scheduled to different edge servers, they are exposed to different risk probabilities and therefore have different start times and completion times. This paper thus seeks an optimal scheduling strategy in a dynamic MEC with security threats, the main goal of which is to minimize the completion time of the workflow while satisfying the risk probability constraint of the task nodes.

The objective of this paper can be denoted by equation (13). The risk probability constraint of the task node can be denoted by equation (14).

Because the MEC environment is dynamic and its state changes (such as the gain of the wireless channel) are unknown, it is difficult for traditional optimization methods to solve the security-aware workflow scheduling problem in a dynamic MEC with security threats. Deep reinforcement learning, as a model-free machine learning approach, is well suited to such dynamic stochastic optimization problems. The next section introduces the deep reinforcement learning-based security-aware workflow scheduling scheme in detail.

4. Deep Reinforcement Learning-Based Security-Aware Workflow Scheduling Scheme

The security-aware workflow scheduling problem in a dynamic MEC with security threats is formulated as a finite Markov Decision Process whose action space is discrete. To find the optimal workflow scheduling scheme, this paper proposes a SAWS scheme based on a deep Q network (DQN).

As shown in Figure 2, the DQN framework consists of three main functional components. (1) The evaluated Q network: the evaluated Q network consists of one input layer, one hidden layer, and one output layer. The number of neurons in the input layer equals the number of dimensions of the state, the number of neurons in the hidden layer is set to 2048 in this paper, and the number of neurons in the output layer equals the number of dimensions of the action. (2) The target Q network: the structure of the target Q network is the same as that of the evaluated Q network. To continuously approximate the Q function, the parameters of the target Q network are periodically updated with the parameters of the evaluated Q network. (3) The replay memory: the replay memory stores the state transition experiences, each comprising the current state, the action taken, the immediate reward, and the next state. A minibatch of state transition experiences is randomly sampled from the replay memory to train the Q network in the direction that minimizes the loss function. The detailed steps of the deep Q-network-based SAWS scheme are described in Algorithm 1.

	BEGIN
(1)	Initialize the replay memory with a fixed capacity and set the size of the minibatch of state transition experiences;
(2)	for each episode do
(3)	Reset the system state;
(4)	for each time slot do
(5)	Observe the current state of the system at the beginning of the time slot;
(6)	Based on the current state, select a random action with probability ε and select the action with the largest Q value with probability 1 − ε;
(7)	Calculate the immediate reward and observe the system state in the next time slot;
(8)	Obtain the state transition experience and store it in the replay memory;
(9)	Accumulate the immediate rewards obtained at each step;
(10)	Randomly sample a minibatch of state transition experiences from the replay memory to train the Q network;
(11)	Calculate the expectation of the mean-squared error between the current evaluated Q value and the target Q value;
(12)	end for
(13)	end for

During the training stage, the system state in each time slot is first observed and fed into the evaluated Q network. The evaluated Q network then computes the evaluated Q values of all possible actions for that system state. The action with the largest Q value is chosen with probability 1 − ε, a random action is chosen with probability ε, and the immediate reward is calculated. Next, the system state in the next time slot is observed, and the state transition experience is stored in the replay memory, which has a fixed capacity. Finally, a minibatch of samples is randomly selected from the replay memory to train the Q network in the direction that minimizes the loss function, and the corresponding network parameters are saved.

The loss function is defined as the expectation of the mean-squared error between the current evaluated Q value and the target Q value.
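The core steps of the training loop (ε-greedy action selection, computing the target value, and the mean-squared error loss) can be sketched without any deep learning library. This is a generic DQN sketch under standard assumptions, namely that the target value is the reward plus the discounted maximum target-network Q value of the next state; it is not the authors' implementation. The Q networks are stood in for by plain functions, and all names are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the largest Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma, done):
    """Standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


def mse_loss(batch, q_eval, q_target, gamma):
    """Mean-squared error between target values and evaluated Q values."""
    errors = []
    for state, action, reward, next_state, done in batch:
        target = td_target(reward, q_target(next_state), gamma, done)
        errors.append((target - q_eval(state)[action]) ** 2)
    return sum(errors) / len(errors)


# Stand-ins for the evaluated and target networks (two actions, constant outputs).
q_eval = lambda state: [1.0, 2.0]
q_target = lambda state: [0.5, 3.0]

batch = [(0, 1, -1.0, 1, False)]  # (state, action, reward, next state, done)
loss = mse_loss(batch, q_eval, q_target, gamma=0.9)
greedy_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=random.Random(0))
print(loss, greedy_action)
```

In a full implementation the loss would be minimized by gradient descent on the evaluated network, and the target network would be refreshed from it periodically, as described for the framework in Figure 2.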

During the testing stage, the system state is first reset, and the learned network parameters are loaded. Then, at the beginning of each time slot, the current system state is observed and fed into the trained neural network. Next, the neural network selects an optimal action for the system state and the corresponding reward is calculated.

5. Experimental Evaluation

To demonstrate the effectiveness of the proposed SAWS scheme, extensive comparative experiments are conducted. In this section, the simulation parameters are first set. Then, the MSAWS, AWM, Greedy, and HEFT baseline algorithms are introduced. Finally, the performance of the SAWS scheme is compared with that of the four baseline algorithms under different simulation parameters.

5.1. Parameter Settings

This paper mainly considers a mobile edge computing system consisting of a mobile user and edge servers. Different workflow applications generated on the mobile device need to be scheduled in a dynamic MEC with security threats. Referring to the literature [6, 7], the detailed parameter settings of the experiments are as follows:
(1) The parameter settings of the mobile device: the CPU frequency and the CPU core number of the mobile device are set to and , respectively.
(2) The parameter settings of edge servers: the number of edge servers is set to 5. The CPU frequencies of the five edge servers are set to . The numbers of CPU cores are , and , respectively. The risk coefficients of the confidentiality service for these five edge servers are and , respectively. The risk coefficients of the integrity service for these five edge servers are and , respectively.
(3) The communication parameter settings: the transmission power of each edge server is , the maximum bandwidth is , the white Gaussian noise power is , the path loss constant is , the path loss exponent is , and the reference distance is [6, 7]. The distance between the mobile device and each edge server is .
(4) The parameter settings of the workflow: the number of task nodes in the different workflows is set to 50, 100, and 150, respectively. The out degree or in degree of each intermediate task node is less than 5, and every two task nodes are connected with 10% probability to form an edge. The workload of each task node is in the range of 1 to . The input data size of each task node is in the range of to , and its output data size is set from to . The maximum risk probability of each task node is .
(5) The parameter settings of the neural network: the evaluated Q network consists of one input layer, one hidden layer, and one output layer, and the number of neurons in the hidden layer is 2048. The learning rate is 0.003, and the discount factor is 0.9. The size of the replay memory is 3000, and the size of the minibatch of state transition experiences randomly sampled from the replay memory is 64. The maximum value of episodes is set to . The maximum number of steps in each episode is equal to the number of task nodes in the workflow.

5.2. Performance Analysis

To demonstrate the effectiveness of the proposed SAWS scheme, this paper implements the MSAWS, AWM, Greedy, and HEFT baseline algorithms and compares the SAWS scheme with these four baseline algorithms under different experimental parameters.
Average Workload Minimization (AWM): In each time slot, the AWM strategy chooses the edge server with the smallest average workload to schedule the task node.
SAWS: This abbreviation represents the security-aware workflow scheduling scheme. Its main goal is to minimize the completion time of the workflow while satisfying the risk probability constraint.
MSAWS: Based on the SAWS scheme, the security service with security level 1 is chosen for the scheduled task nodes.
Greedy: In each time slot, the Greedy algorithm selects the edge server that enables the scheduled task node to complete earliest in the current environment.
HEFT [40]: This abbreviation represents heterogeneous earliest finish time. This list-based workflow scheduling strategy is widely used in workflow scheduling. It first calculates the priority of the task nodes based on their computational and communication costs. Then, each task node is scheduled to the server that can complete it earliest.
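The per-slot decision rules of the AWM and Greedy baselines can be sketched as simple minimum selections. This is a minimal illustration under the assumption that the average workloads and the estimated finish times of the candidate servers are already available as lists; the function names and numbers are hypothetical.

```python
def awm_select(avg_workloads):
    """AWM: pick the edge server with the smallest average workload."""
    return min(range(len(avg_workloads)), key=avg_workloads.__getitem__)


def greedy_select(estimated_finish_times):
    """Greedy: pick the edge server on which the task would finish earliest."""
    return min(range(len(estimated_finish_times)), key=estimated_finish_times.__getitem__)


# Illustrative values for three edge servers.
awm_server = awm_select([3.0, 1.5, 2.0])
greedy_server = greedy_select([7.0, 9.0, 6.5])
print(awm_server, greedy_server)
```

Both rules look only at the current time slot, which is the limitation the learned SAWS policy is meant to overcome.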

5.2.1. The Convergence Analysis of SAWS

Three workflows with 50, 100, and 150 task nodes are scheduled by the SAWS scheme, and Figure 3 shows their learning curves. The completion time gradually decreases and then stabilizes as the learning time (i.e., the number of episodes) increases. This result indicates that the proposed SAWS scheme can learn a policy for scheduling workflow applications of different sizes that minimizes the completion time of the workflow while satisfying the risk probability constraint. Moreover, Figure 3 shows that the completion time is smallest for the workflow with 50 task nodes, larger for the workflow with 100 task nodes, and largest for the workflow with 150 task nodes, because a larger workflow contains more task nodes to execute and therefore takes longer to complete.

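Figure 3: Learning curves of the SAWS scheme for workflows with 50, 100, and 150 task nodes (panels (a) to (c)).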
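The learning process behind these curves depends on the replay memory and discount factor listed in Section 5.1. As a minimal sketch, and not the authors' implementation, the replay memory and the one-step Q-learning target can be written as follows. The class name `ReplayMemory`, the function `td_target`, and the layout of the transition tuple are illustrative assumptions; the capacity (3000), minibatch size (64), and discount factor (0.9) are the values reported in the parameter settings.

```python
import random
from collections import deque

REPLAY_CAPACITY = 3000  # size of the replay memory (Section 5.1)
BATCH_SIZE = 64         # transitions sampled per update (Section 5.1)
GAMMA = 0.9             # discount factor (Section 5.1)


class ReplayMemory:
    """Fixed-size buffer that discards the oldest transition when full."""

    def __init__(self, capacity=REPLAY_CAPACITY):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        """Uniformly sample a minibatch without replacement."""
        population = list(self.buffer)
        return random.sample(population, min(batch_size, len(population)))

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step target: r if the episode ended, else r + gamma * max_a Q(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full training loop, each sampled transition would yield a target from `td_target`, and the Q network would be updated to reduce the squared difference between its prediction and that target.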

5.2.2. The Impact of Different Risk Probabilities

To examine the impact of the risk probability on the completion time of different workflows, the risk probability constraint is varied from 0.2 to 1.0 in increments of 0.2 for workflows with 50, 100, and 150 task nodes. Figure 4 shows the completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms under these settings. As shown in Figure 4(a), the completion time of the SAWS algorithm is lower than that of the MSAWS, AWM, Greedy, and HEFT algorithms. The main reason is that the SAWS algorithm learns a security-aware scheduling policy in a dynamic MEC with security threats and makes its scheduling decisions according to the current system state, thereby minimizing the completion time of the workflow while satisfying the risk probability constraint. The AWM algorithm selects the edge server with the smallest average workload to execute each task node and therefore rarely obtains an optimal solution. The Greedy and HEFT algorithms select the edge server on which the current task node completes earliest, but they do not consider the effect of this decision on subsequent task nodes, so they also struggle to obtain an optimal solution. The MSAWS algorithm always selects the security service with security level 1 for the scheduled task nodes; this effectively guarantees the risk probability constraint but significantly increases the completion time of the workflow application. Moreover, the completion time of all five algorithms gradually decreases as the risk probability increases, because a larger permitted risk probability allows task nodes to use a lower security service level, which shortens the completion time of the workflow.

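Figure 4: Completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms under different risk probabilities for workflows with 50, 100, and 150 task nodes (panels (a) to (c)).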

In addition, Figure 4 shows that the completion time of the workflow gradually increases as the number of task nodes in the workflow grows. The reason is the same as that discussed in Section 5.2.1.
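The link between the permitted risk probability and the chosen security level can be made concrete with a short sketch. It assumes an exponential risk model of the form P = 1 - exp(-λ(1 - sl)), which is common in security-aware scheduling work; the authoritative definition for this paper is the risk probability model of Section 3.4 and may differ in detail. The function names, the risk coefficient of 2.0, and the discrete level set are hypothetical values chosen only for illustration.

```python
import math


def risk_probability(risk_coefficient, level):
    """Assumed model: risk of a task protected at security level `level` in [0, 1]."""
    return 1.0 - math.exp(-risk_coefficient * (1.0 - level))


def min_security_level(risk_coefficient, max_risk, levels):
    """Return the lowest level whose risk stays within max_risk, or None."""
    for level in sorted(levels):
        if risk_probability(risk_coefficient, level) <= max_risk:
            return level
    return None


if __name__ == "__main__":
    LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)  # hypothetical discrete levels
    for max_risk in (0.2, 0.4, 0.6, 0.8, 1.0):
        level = min_security_level(2.0, max_risk, LEVELS)
        print(f"max risk {max_risk:.1f} -> lowest admissible level {level}")
```

Under these assumptions, the lowest admissible level falls as the permitted risk grows. A lower level means less security processing per task node, which is consistent with the downward trend of completion time reported for Figure 4.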

5.2.3. The Impact of Different Security Services

To evaluate the impact of different security services on the completion times of different workflows, the task nodes employ either only the encryption service or only the integrity service. For simplicity, these two configurations are denoted by Confi_Only and Integ_Only, respectively. Figure 5 shows that the completion time of both Confi_Only and Integ_Only gradually decreases as the risk probability increases. The explanation is that a higher risk probability permits a lower security level, a lower security level has a higher processing rate, and the completion time of the workflow is therefore shorter. Moreover, the completion time of Integ_Only is shorter than that of Confi_Only. This is because, when the security level of the encryption service is approximately equal to that of the hash service, the processing rate of the hash service is higher than that of the encryption service. Finally, Figure 5 shows that the completion times of Confi_Only and Integ_Only gradually increase as the number of workflow nodes increases. The reason is the same as that discussed in Section 5.2.1.

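Figure 5: Completion time of Confi_Only and Integ_Only under different risk probabilities for workflows with 50, 100, and 150 task nodes (panels (a) to (c)).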

5.2.4. The Impact of Different Risk Coefficients

Figure 6 shows the impact of different risk coefficients on the completion times of different workflows. The risk coefficients of the stealing and tampering security threats are varied from 0.3 to 3 in increments of 0.3. Figure 6 shows that the completion time of both Confi_Only and Integ_Only gradually increases with the risk coefficient, because task nodes are attacked more frequently as the risk coefficient grows. To satisfy the risk probability constraint, a security service with a higher level must then be employed, which leads to a longer task processing delay and a longer completion time of the workflow. Moreover, the completion time of Confi_Only is higher than that of Integ_Only. The main reason is that, when the security level of the encryption service is approximately equal to that of the hash service, the processing rate of the encryption service is lower than that of the hash service, which again lengthens the task processing delay and the completion time of the workflow. Finally, the completion time of Confi_Only and Integ_Only gradually increases with the number of task nodes in the workflow. The reason is the same as that discussed in Section 5.2.1.

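Figure 6: Completion time of Confi_Only and Integ_Only under different risk coefficients for workflows with 50, 100, and 150 task nodes (panels (a) to (c)).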
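The dependence on the risk coefficient can also be seen from a short worked derivation. It assumes an exponential risk model, which is a common choice in security-aware scheduling and is used here only for illustration; the symbols $\lambda$ (risk coefficient), $sl$ (security level in $[0, 1]$), and $P_{\max}$ (risk probability constraint) are notation introduced for this sketch, and the paper's own model in Section 3.4 takes precedence.

```latex
% Assumed risk model for one security service:
P(sl) = 1 - e^{-\lambda (1 - sl)}

% Imposing the constraint P(sl) \le P_{\max} and solving for sl:
e^{-\lambda (1 - sl)} \ge 1 - P_{\max}
\;\Longrightarrow\;
sl \;\ge\; 1 + \frac{\ln(1 - P_{\max})}{\lambda}

% Example with P_{\max} = 0.4, so \ln(0.6) \approx -0.511:
\lambda = 0.3:\quad sl \ge 1 - 1.703 < 0 \quad \text{(any level satisfies the constraint)}
\lambda = 3:\quad   sl \ge 1 - 0.170 = 0.830
```

Since $\ln(1 - P_{\max})$ is negative, the lower bound on the security level rises with $\lambda$. A higher required level implies a slower security service, which matches the increase in completion time reported for Figure 6.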

5.2.5. The Impact of Different Edge Server’s Computing Capacities

Figure 7 shows the impact of the edge servers' computing capacities on the completion time of different workflows. The completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms decreases as the number of CPU cores increases. The main reason is that more CPU cores give the edge server a stronger computing capacity and thus a shorter task processing delay, so the completion time of the workflow gradually decreases. In addition, Figure 7 shows that the SAWS algorithm performs better than the MSAWS, AWM, Greedy, and HEFT algorithms in terms of workflow completion time. The reason is the same as that discussed in Section 5.2.2. Finally, the completion time of all five algorithms gradually increases with the number of task nodes in the workflow. The reason is the same as that discussed in Section 5.2.1.

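Figure 7: Completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms under different numbers of CPU cores for workflows with 50, 100, and 150 task nodes (panels (a) to (c)).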

5.2.6. The Impact of the Number of Edge Servers

Figure 8 shows the impact of the number of edge servers on the completion time of workflows with 50, 100, and 150 task nodes. To investigate this impact, the number of edge servers is varied from 2 to 6 in increments of 1. As shown in Figure 8, the completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms gradually decreases as the number of edge servers increases. The explanation is that more edge servers give the whole system a stronger computing capacity and thus a shorter workflow completion time. Moreover, the completion time of the SAWS algorithm is lower than that of the MSAWS, AWM, Greedy, and HEFT algorithms. The reason is the same as that discussed in Section 5.2.2. Finally, the completion times of all five algorithms gradually increase with the number of task nodes in the workflow. The reason is the same as that discussed in Section 5.2.1.

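Figure 8: Completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms under different numbers of edge servers for workflows with 50, 100, and 150 task nodes (panels (a) to (c)).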

6. Conclusions and Future Work

This paper proposes a reinforcement learning-based security-aware workflow scheduling (SAWS) scheme to solve the workflow scheduling problem in a dynamic MEC with security threats. It first constructs the mobile edge computing model, the security cost model, the communication model, and the risk probability model. It then formulates the security-aware workflow scheduling problem as a finite Markov Decision Process and adopts a deep Q network approach to learn a security-aware workflow scheduling policy. The SAWS scheme minimizes the completion time of workflows while satisfying the risk probability constraint. To verify its effectiveness, this paper implements the MSAWS, AWM, Greedy, and HEFT baseline algorithms and compares the SAWS scheme with them under different experimental parameters: the risk probability, the security service, the risk coefficient, the edge servers' computing capacity, and the number of edge servers. In these experiments, the SAWS scheme achieves a lower workflow completion time than the four baselines while satisfying the risk probability constraint.

Data Availability

The experiment data supporting this experiment analysis are from previously reported studies, which have been cited. The experiment data used to support the findings of this study are included within the article. The experiment data are described in Section 5 in detail.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Science Foundation of China (Nos. 62002316, 61802095, 61572162, and 61572251), the Zhejiang Provincial National Science Foundation of China (Nos. LQ19F020011 and LQ17F020003), the Zhejiang Provincial Key Science and Technology Project Foundation (No. 2018C01012), and the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) (No. SKLNST-2019-2-15).

References

C. Yi, J. Cai, and Z. Su, “A multi-user mobile computation offloading and transmission scheduling mechanism for delay-sensitive applications,” IEEE Transactions on Mobile Computing, vol. 19, no. 1, pp. 29–43, 2020.
T. Q. Dinh, J. Tang, Q. D. La, and T. Q. S. Quek, “Offloading in mobile edge computing: task allocation and computational frequency scaling,” IEEE Transactions on Communications, vol. 65, no. 8, pp. 3571–3584, 2017.
Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: the communication perspective,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2322–2358, 2017.
Z. Yuezhi and Z. Di, “Near-end cloud computing: opportunities and challenges in the post-cloud computing era,” Chinese Journal of Computers, vol. 42, no. 4, pp. 677–700, 2019.
C. Calero, “5Ws of green and sustainable software,” Tsinghua Science and Technology, vol. 25, no. 3, pp. 401–414, 2020.
B. Huang, “Security modeling and efficient computation offloading for service workflow in mobile edge computing,” Future Generation Computer Systems, vol. 97, pp. 755–774, 2019.
B. Huang, Y. Li, Z. Li et al., “Security and cost-aware computation offloading via deep reinforcement learning in mobile edge computing,” Wireless Communications and Mobile Computing, vol. 2019, Article ID 3816237, 20 pages, 2019.
G. Zhang, W. Zhang, Y. Cao, D. Li, and L. Wang, “Energy-delay tradeoff for dynamic offloading in mobile-edge computing system with energy harvesting devices,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4642–4655, 2018.
S. Ranadheera, S. Maghsudi, and E. Hossain, “Mobile edge computation offloading using game theory and reinforcement learning,” 2017, https://arxiv.org/abs/1711.09012.
S. N. Shirazi, A. Gouglidis, A. Farshad, and D. Hutchison, “The extended cloud: review and analysis of mobile edge computing and fog from a security and resilience perspective,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 11, pp. 2586–2595, 2017.
I. A. Elgendy, W. Zhang, Y. C. Tian, and K. Li, “Resource allocation and computation offloading with data security for mobile edge computing,” Future Generation Computer Systems, vol. 100, pp. 531–541, 2019.
R. Roman, J. Lopez, and M. Mambo, “Mobile edge computing, Fog et al.: a survey and analysis of security threats and challenges,” Future Generation Computer Systems, vol. 78, pp. 680–698, 2018.
T. Hongliang, Z. Yong, L. Chao, and X. Chunxiao, “Overview of research on database confidentiality protection technology in cloud environment,” Journal of Computers, vol. 40, no. 10, pp. 2245–2270, 2017.
O. Pedreira, F. Garcia, M. Piattini, A. Cortinas, and A. Cerdeira-Pena, “An architecture for software engineering gamification,” Tsinghua Science and Technology, vol. 25, no. 6, pp. 776–797, 2020.
M. Maimaiti, Y. Liu, H. Luan, and M. Sun, “Enriching the transfer learning with pre-trained lexicon embedding for low-resource neural machine translation,” Tsinghua Science and Technology, vol. 40, p. 1, 2020.
E. A. A. Alaoui, S. C. K. Tekouabou, S. Hartini, Z. Rustam, H. Silkan, and S. Agoujil, “Improvement in automated diagnosis of soft tissues tumors using machine learning,” Big Data Mining and Analytics, vol. 4, no. 1, pp. 33–46, 2021.
Y. N. Malek, M. Najib, M. Bakhouya, and M. Essaaidi, “Multivariate deep learning approach for electric vehicle speed forecasting,” Big Data Mining and Analytics, vol. 4, no. 1, pp. 56–64, 2021.
A. Guezzaz, Y. Asimi, M. Azrour, and A. Asimi, “Mathematical validation of proposed machine learning classifier for heterogeneous traffic and anomaly detection,” Big Data Mining and Analytics, vol. 4, no. 1, pp. 18–24, 2021.
V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
Y. Chen, N. Zhang, Y. Zhang, X. Chen, W. Wu, and X. S. Shen, “TOFFEE: task offloading and frequency scaling for energy efficiency of mobile devices in mobile edge computing,” IEEE Transactions on Cloud Computing, vol. 10, p. 1, 2019.
H. Wu, Y. Sun, and K. Wolter, “Energy-efficient decision making for mobile cloud offloading,” IEEE Transactions on Cloud Computing, vol. 8, no. 2, pp. 570–584, 2020.
G. Chalapathi, V. Chamola, C. K. Tham, S. Gurunarayanan, and N. Ansari, “An optimal delay aware task assignment scheme for wireless SDN networked edge cloudlets,” Future Generation Computer Systems, vol. 102, pp. 862–875, 2020.
X. Xu, X. Zhang, X. Liu, J. Jiang, L. Qi, and M. Z. A. Bhuiyan, “Adaptive computation offloading with edge for 5G-envisioned internet of connected vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 99, pp. 1–10, 2020.
H. Wu, K. Wolter, P. Jiao, Y. Deng, Y. Zhao, and M. Xu, “EEDTO: an energy-efficient dynamic task offloading algorithm for blockchain-enabled IoT-edge-cloud orchestrated computing,” IEEE Internet of Things Journal, vol. 8, no. 4, pp. 2163–2176, 2020.
J. Xu, X. Li, R. Ding, and X. Liu, “Multi-resource computing offloading strategy for energy optimization in mobile edge computing,” Computer Integrated Manufacturing Systems, vol. 25, no. 04, pp. 954–961, 2019.
H. Wu, W. J. Knottenbelt, and K. Wolter, “An efficient application partitioning algorithm in mobile environments,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 7, pp. 1464–1480, 2019.
A. Zhu, S. Guo, M. Ma et al., “Computation offloading for workflow in mobile edge computing based on deep Q-learning,” in Proceedings of 2019 28th Wireless and Optical Communications Conference WOCC 2019, IEEE, Beijing, China, July 2019.
H. Liu, “Scheduling multi-workflows over edge computing resources with time-varying performance, a novel probability-mass function and ∂ DQN-based approach,” in Proceedings of International Conference on Web Services, pp. 197–209, Beijing, China, October 2020.
M. C. Sanchez, J. M. C. de Gea, J. L. Fernández-Alemán, J. Garcerán, and A. Toval, “Software vulnerabilities overview: a descriptive study allow,” Tsinghua Science and Technology, vol. 25, no. 2, pp. 270–280, 2020.
M. S. Mahmud, J. Z. Huang, S. Salloum, T. Z. Emara, and K. Sadatdiynov, “A survey of data partitioning and sampling methods to support big data analysis,” Big Data Mining and Analytics, vol. 3, no. 2, pp. 85–101, 2020.
X. Jia, D. He, N. Kumar, and K. K. R. Choo, “A provably secure and efficient identity-based anonymous authentication scheme for mobile edge computing,” IEEE Systems Journal, vol. 14, no. 1, pp. 560–571, 2020.
D. He, S. Chan, and M. Guizani, “Security in the internet of things supported by mobile edge computing,” IEEE Communications Magazine, vol. 56, no. 8, pp. 56–61, 2018.
Y. Chen, Y. Zhang, S. Maharjan, M. Alam, and T. Wu, “Deep learning for secure mobile edge computing in cyber-physical transportation systems,” IEEE Network, vol. 33, no. 4, pp. 36–41, 2019.
X. Xu, Q. Huang, H. Zhu et al., “Secure service offloading for internet of vehicles in SDN-enabled mobile edge computing,” IEEE Transactions on Intelligent Transportation Systems, vol. 10, pp. 1–10, 2020.
X. Xu, Q. Huang, Y. Zhang, S. Li, L. Qi, and W. Dou, “An LSH-based offloading method for IoMT services in integrated cloud-edge environment,” ACM Transactions on Multimedia Computing Communications and Applications, vol. 16, no. 3, 2021.
H. Chen, X. Zhu, D. Qiu, L. Liu, and Z. Du, “Scheduling for workflows with security-sensitive intermediate data by selective tasks duplication in clouds,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 9, pp. 2674–2688, 2017.
Y. Wu, J. Shi, K. Ni et al., “Secrecy-based delay-aware computation offloading via mobile edge computing for internet of things,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4201–4213, 2019.
Z. Li, J. Ge, C. Li et al., “Energy cost minimization with job security guarantee in Internet data center,” Future Generation Computer Systems, vol. 73, pp. 63–78, 2017.
Y. Qin, H. Wang, S. Yi, X. Li, and L. Zhai, “An energy-aware scheduling algorithm for budget-constrained scientific workflows based on multi-objective reinforcement learning,” The Journal of Supercomputing, vol. 76, no. 1, pp. 455–480, 2020.
H. Topcuoglu, S. Hariri, and M. Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.

Copyright

Copyright © 2021 Binbin Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
