Abstract

Significant challenges remain in scheduling reasoning tasks, including selecting an optimal task-server solution from the numerous possible combinations, owing to the heterogeneous resources of edge environments and the complicated data dependencies within reasoning tasks. In this study, a time-driven scheduling strategy based on reinforcement learning (RL) is designed for reasoning tasks in vehicle edge computing. Firstly, the reasoning process of vehicle applications is abstracted as a model based on directed acyclic graphs. Secondly, the execution order of subtasks is defined according to a priority evaluation method. Finally, the optimal task-server scheduling solution is chosen by a Deep Q-Network (DQN). Extensive simulation experiments show that the proposed scheduling strategy effectively reduces the completion delay of reasoning tasks and outperforms classic algorithms in both convergence and runtime.

1. Introduction

In recent years, the Internet of Vehicles (IoV) has become a research hotspot for Intelligent Transportation Systems (ITS) [1]. Autonomous driving in IoV not only improves driving safety but also mitigates traffic inefficiency and lane congestion. It is challenging for autonomous driving to complete target applications under strict time constraints and restricted computing resources. Current work on autonomous driving mostly focuses on designing specific functions, such as traffic recognition, as reasoning tasks [2, 3]. Less attention is paid to scheduling these reasoning tasks onto appropriate computing nodes with low latency. Fortunately, IoV in Mobile Edge Computing (MEC) can schedule real-time tasks from vehicles to Road Side Units (RSUs) with powerful computing resources, alleviating task execution delay. Besides, reasonable scheduling of reasoning tasks in MEC can effectively reduce both the execution latency of tasks and the workload of vehicles [4–11]. However, due to the heterogeneous resources in edge environments and the complicated data dependencies in reasoning tasks, significant challenges remain for reasoning-task scheduling, including the selection of an optimal task-server solution from the numerous possible combinations.

Existing studies mainly address task scheduling and task coordination through heuristic algorithms [12–15], such as Particle Swarm Optimization (PSO), Ant Colony Algorithm (ACA), and Genetic Algorithm (GA). Although these works can obtain feasible solutions while satisfying different constraints, they fail to predict the deviation between the feasible and optimal solutions in advance, which makes their solutions easily fall into local optima. Several studies have been devoted to task scheduling using reinforcement learning (RL) algorithms [16–26], which can not only correct the deviation between feasible and optimal solutions but also accelerate convergence toward near-optimal results. Specifically, Lin et al. [23] proposed a time-driven scheduling strategy based on the Q-learning algorithm for reasoning tasks of autonomous driving in IoV. The experimental results demonstrated that the performance of RL algorithms based on simulated annealing was better than that of other classic algorithms. This work is instructive for ours. Zhao et al. [20] put forward a distribution scheduling algorithm based on DQN to achieve the best balance between latency, computational rate, and energy consumption for an edge access network of IoV. They prioritized the tasks of different vehicles according to the analytic hierarchy process (AHP). The experimental results showed that the proposed method could reduce the average task processing delay and effectively improve task offloading efficiency. However, the priority between tasks was not scientifically calculated and weighted, but only evaluated by experts based on their experience. Current work [27, 28] on priority evaluation is likewise mostly subjective, relying on expert judgment. There are great achievements in multivehicle collaborative task scheduling [5, 7, 11, 20]. However, time-driven scheduling for single-vehicle reasoning tasks with data dependencies is still an open issue.

In response to this issue, two research questions are considered: (1) how to design a model for reasoning tasks with data dependencies that evaluates the latency caused by task execution and data transmission? (2) how to develop an efficient and reliable scheduling strategy that reduces this latency during vehicle driving? To answer these questions, we design a time-driven scheduling strategy based on RL for reasoning tasks in vehicle edge computing, which considers the differences among heterogeneous real-time reasoning tasks and optimizes their completion latency.

The main contributions of this paper are summarized as follows:
(1) A latency model is designed for reasoning tasks with data dependencies, which accounts for the latency caused by both task execution and data transmission.
(2) The scheduling of reasoning tasks in MEC is formulated as a Markov Decision Process (MDP), which models the scheduling strategy for a reasoning task as the state, the resource allocation decision for each subtask as the action, and the completion latency of the reasoning task as the reward.
(3) A time-driven scheduling strategy based on DQN is designed to explore an optimal task-server solution among the numerous possible combinations in vehicle edge computing.

The remainder of the paper proceeds as follows. We review the related work in Section 2. Section 3 introduces the problem definition of reasoning-task scheduling. Section 4 describes the proposed scheduling strategy in detail. Section 5 conducts comparative experiments and analyzes the performance of the proposed strategy. Finally, Section 6 summarizes the paper and looks forward to future research directions.

2. Related Work

Task scheduling in MEC has been extensively studied [4–10]. In general, task scheduling approaches mainly include methods based on heuristic algorithms [12–15] and methods based on reinforcement learning [16–26].

2.1. Methods Based on Heuristic Algorithms

Xie et al. [12] proposed a novel Directional and Non-local-Convergent Particle Swarm Optimization (DNCPSO) to address workflow scheduling in cloud-edge environments, which can reduce the makespan and cost dramatically and works well for task scheduling in complex applications. Wu et al. [13] studied how to dynamically and effectively partition a given application into local and remote parts while reducing the total cost in a cloud-edge environment. They proposed a Min-Cost Offloading Partitioning (MCOP) algorithm, which can significantly reduce execution time and energy consumption by optimally distributing tasks between mobile devices and servers. Lin et al. [15] proposed a linear-time rescheduling algorithm for task migration in a mobile cloud computing (MCC) environment. The algorithm starts from a minimal-delay scheduling solution and subsequently performs energy reduction by migrating tasks among the local cores and the cloud.

Methods based on heuristic algorithms easily fall into local optima and therefore often fail to obtain good results. Moreover, the time allowed for processing reasoning tasks in IoV is usually strict, and heuristic methods are unsuitable for such problems because of their long execution times.

2.2. Methods Based on Reinforcement Learning

To adapt scheduling strategies to dynamic scenarios, Deep Reinforcement Learning (DRL) has been widely applied to task scheduling problems in MEC systems in recent years.

Chen et al. [16] designed a double-DQN-based computation scheduling policy for a virtual MEC system. Numerical experiments showed that the proposed policy achieved a significant improvement in computation scheduling performance. Xiong et al. [17] proposed an improved DQN algorithm to minimize the long-term weighted sum of the average job completion time and the average number of requested resources in an IoT edge computing system. Simulation results showed that the proposed algorithm outperformed the original DQN algorithm. Wang et al. [18] proposed a new DRL-based scheduling framework to address the challenges of task dependency and adaptation to dynamic scenarios in MEC systems. Their DRL solution automatically discovers the common patterns behind various applications so as to infer an optimal scheduling policy for different scenarios. Rjoub et al. [21, 26] proposed four deep- and RL-based scheduling approaches to automate the scheduling of large-scale workloads onto cloud computing resources while reducing both resource consumption and task waiting time. These approaches derive a task scheduling mechanism that minimizes both task execution delay and cloud resource utilization. Qi et al. [22] proposed a multitask DRL approach for scalable parallel task scheduling (MDTS) in IoV. To avoid the curse of dimensionality when coping with complex parallel computing environments and jobs with diverse properties, they extended the action selection in DRL to a multitask decision, where the output branches of multitask learning are matched to parallel scheduling tasks. Huang et al. [24] proposed a DRL-based Online Offloading (DROO) framework to optimally adapt task scheduling decisions and wireless resource allocations to time-varying wireless channel conditions in a wireless-powered MEC network. Numerical results showed that the framework achieves near-optimal performance while significantly decreasing computation time.

RL-based methods mostly treat the scheduling problem as a learning task. Through preliminary training, a reasonably designed RL algorithm can quickly form an effective scheduling policy for the task. Note that current work for IoV mostly focuses on multivehicle collaborative scheduling; time-driven scheduling for single-vehicle reasoning tasks with data dependencies is still an open issue.

3. Problem Definition

Table 1 shows the notations used in this paper.

Figure 1 gives an example of reasoning-task scheduling in vehicle edge computing. The example considers an autonomous driving reasoning system [2, 3], which consists of applications such as an emergency rule inference engine and security operations. The user equipment (UE) makes scheduling decisions for those applications according to the status of the edge environment and the application profiles; thus some of them are executed locally on the vehicle (i.e., the UE) while others are scheduled to the edge over wireless channels. In this work, we consider an edge environment composed of RSUs that provide computing, communication, and storage resources to the UE in each time slot, expressed as $E = \{e_1, e_2, \ldots, e_m\}$. The computation capacities of the vehicle and of RSU $e_j$ are denoted as $f^{ue}$ and $f_j$, respectively.

In time slot $t$, a reasoning task can be expressed as a directed acyclic graph (DAG) $G = (V, D)$, as in Figure 2, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of subtasks and $D$ is the set of data dependencies between subtasks. A data dependency $d_{i,j} \in D$ indicates that there is a directed arc from subtask $v_i$ to subtask $v_j$, so $v_j$ cannot start until $v_i$ has been completed. The set of direct precursors of subtask $v_i$ is expressed as $pre(v_i)$. A subtask cannot be executed until all of its direct precursors are completed.
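For concreteness, the following is a minimal Python sketch of this DAG model; the class and field names are our own illustrations, not notation from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    """One node v_i of the reasoning-task DAG."""
    idx: int
    cycles: float           # required CPU cycles c_i
    data_size: float        # data size d_i to transmit if offloaded
    tolerable_delay: float  # tolerable delay t_i of this subtask

@dataclass
class ReasoningTask:
    """DAG G = (V, D): subtasks plus data-dependency arcs."""
    subtasks: list
    arcs: set = field(default_factory=set)  # (i, j): v_j waits for v_i

    def predecessors(self, j: int) -> set:
        """Direct precursors pre(v_j); v_j starts only after all finish."""
        return {i for (i, k) in self.arcs if k == j}
```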

In vehicle edge computing, a subtask in the reasoning task can be either offloaded to the edge or executed locally on the vehicle. If offloading occurs, the processing delay is related to the subtask profile and the environment state. The subtask profile includes the required CPU cycles $c_i$, the data size $d_i$, and the tolerable delay $t_i$ of each subtask $v_i$. Besides, the environment state contains the transmission rate $r$ of the wireless channel. Therefore, the transmission latency and execution latency of subtask $v_i$ on edge node $e_j$ can be calculated as $T_i^{trans} = d_i / r$ and $T_{i,j}^{exec} = c_i / f_j$, respectively, as in (1) and (2). If a subtask is executed locally on the vehicle, there is only the execution latency on the user equipment, which can be obtained by $T_i^{local} = c_i / f^{ue}$.
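As a minimal sketch, the three latency terms above translate directly into code (the function names are illustrative):

```python
def transmission_latency(data_size: float, rate: float) -> float:
    """T_trans = d_i / r: time to ship a subtask's input data over the channel."""
    return data_size / rate

def edge_execution_latency(cycles: float, f_edge: float) -> float:
    """T_exec = c_i / f_j: computing time on edge node e_j."""
    return cycles / f_edge

def local_execution_latency(cycles: float, f_ue: float) -> float:
    """T_local = c_i / f_ue: computing time on the vehicle (no transmission)."""
    return cycles / f_ue
```

An offloaded subtask thus costs `transmission_latency + edge_execution_latency`, whereas a local subtask costs only `local_execution_latency`.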

The scheduling plan for a reasoning task is denoted as a distribution-relationship matrix $M = [m_{i,j}]$ as in (3). If $m_{i,j} = 1$, subtask $v_i$ is offloaded to edge node $e_j$; otherwise, subtask $v_i$ is executed locally. When the edge nodes are running normally, the execution latency of the reasoning task can be expressed by (4), where $T(M)$ denotes the processing latency of the reasoning task. If there is no available edge node in the edge environment, all subtasks are executed serially on the vehicle, in which case every $m_{i,j}$ is set to 0. In this worst-case scheduling, the completion latency of the reasoning task is described as in equation (5).

To make better use of computing resources in different edge environments, we assume that edge nodes satisfy the following processing principles (a feasibility-check sketch follows this list):
(1) A subtask is processed by only one edge node, which is formally defined in (6).
(2) The edge nodes begin to process the subtasks only after all subtasks have been assigned to their corresponding edge nodes.
(3) Subtasks on different edge nodes without data dependencies can be processed in parallel.
(4) Subtasks on the same edge node are processed according to their data dependencies; if no dependency exists between them, they are processed according to their priorities.
(5) The execution latency of a subtask on its assigned edge node must not exceed its tolerable delay, which is formally defined in (7).
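The following sketch checks a candidate distribution matrix against principles (1) and (5), reusing the illustrative `ReasoningTask` fields introduced earlier; it is our own helper, not the paper's code.

```python
def is_feasible(task, M, rate, f_edge, f_ue):
    """Check principles (1) and (5) for a 0/1 matrix M[i][j].

    M[i][j] == 1 offloads subtask i to edge node j; an all-zero
    row means local execution on the vehicle.
    """
    for i, st in enumerate(task.subtasks):
        targets = [j for j, x in enumerate(M[i]) if x == 1]
        if len(targets) > 1:                 # principle (1): one node per subtask
            return False
        if targets:                          # offloaded: transmit, then execute
            latency = st.data_size / rate + st.cycles / f_edge[targets[0]]
        else:                                # executed locally
            latency = st.cycles / f_ue
        if latency > st.tolerable_delay:     # principle (5): meet the deadline
            return False
    return True
```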

The reasoning-task scheduling discussed in this paper can be summarized as follows: in each time slot, a reasoning task on a vehicle is decomposed into several subtasks, and these subtasks are scheduled to edge nodes for processing by a specific scheduling algorithm. The scheduling algorithm proposed in this paper aims to minimize the completion latency of the reasoning task, which can be expressed as $\min_{M} T(M)$, subject to the processing principles above.

4. Algorithm Design

In this section, we first describe the priority evaluation for subtasks in a reasoning task, which determines the execution order of subtasks without data dependencies. We then give an overview of our proposed scheduling algorithm. Finally, we introduce the implementation of the scheduling algorithm in detail.

4.1. Priority Evaluation for Each Subtask

It is difficult to estimate the execution time of a reasoning task, which depends on the execution sequence of its subtasks. The fuzzy analytic hierarchy process (FAHP) [27–29] is usually employed to analyze multiobjective problems: it decomposes a problem hierarchically according to its features and overall goal, forming a bottom-up gradient hierarchy. In this work, FAHP is used to measure the subtask weights, which determine the execution order of subtasks without data dependencies. Each subtask weight is modified by calculating the information entropy of objective factors (i.e., each subtask's own parameters) [30, 31]. The pseudocode of the priority evaluation for each subtask is described in Algorithm 1, where $p_{x,y}$ and $r_{x,y}$ denote entries of the comparison matrices $P$ and $R$ that represent the relative importance of subtask factors, and $n$ and $H_y$ are the number of factors and their information entropy, respectively.

Input: computational complexity $c_z$, the amount of data $d_z$, the tolerable delay $t_z$
Output: the priority $w_z$ of subtask $z$
(1)  Sort the subtask factors according to equation (9) and construct the comparison matrix $P$
(2)  for $x \leftarrow 1$ to maximum rows of $P$ do
(3)   $s_x \leftarrow 0$
(4)   for $y \leftarrow 1$ to maximum columns of $P$ do
(5)    $s_x \leftarrow s_x + p_{x,y}$
(6)   end for
(7)  end for
(8)  The row sums $s_x$ are transformed through equation (12) to obtain $R$
(9)  for $x \leftarrow 1$ to maximum rows of $R$ do
(10)   $w_x \leftarrow 0$
(11)   for $y \leftarrow 1$ to maximum columns of $R$ do
(12)    update $w_x$ via equation (13)
(13)   end for
(14)  end for
(15)  calculate the information entropy $H_y$ via equations (14) and (15)
(16)  obtain the entropy-corrected weights via (16)
(17)  return the priority $w_z$ of subtask $z$
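As a compact illustration of Algorithm 1, the sketch below combines an entropy-weight correction with an FAHP-style fuzzy consistency step. The exact transforms stand in for equations (9)–(16), which are not reproduced here, so the formulas follow common FAHP/entropy-weight practice rather than the paper's precise definitions.

```python
import numpy as np

def subtask_priorities(factors: np.ndarray) -> np.ndarray:
    """Priority of each subtask from its factors matrix.

    Rows: subtasks; columns: computational complexity, data amount,
    tolerable delay (direction normalization of cost-type factors is
    omitted for brevity).
    """
    n_tasks, _ = factors.shape

    # Information entropy of each factor column (normalized to a distribution).
    col = factors / factors.sum(axis=0)
    entropy = -(col * np.log(col + 1e-12)).sum(axis=0) / np.log(n_tasks)

    # Entropy-weight correction: more discriminative (low-entropy) factors
    # receive larger weights.
    weights = (1.0 - entropy) / (1.0 - entropy).sum()

    # FAHP-style fuzzy consistency matrix from pairwise score differences:
    # r_xy = (s_x - s_y) / (2 * n) + 0.5.
    scores = factors @ weights
    r = (scores[:, None] - scores[None, :]) / (2 * n_tasks) + 0.5

    # Each subtask's priority is the mean of its row of R.
    return r.mean(axis=1)
```

Among subtasks with no mutual data dependency, the one with the larger returned value would be executed first.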
4.2. Scheduling Algorithm

MDP is the basic model of RL in this paper. The scheduling algorithm can be simplified according to the Markov property, i.e., the next state depends only on the current state, as shown in Figure 3. In Figure 3, each state represents a corresponding allocation strategy for real-time vehicle tasks in a given edge environment and corresponds to a specific reward. Each action is computed by the agent (a neural network) and guides the current state toward a better direction.

Input: initial state, maximum number of rounds, maximum number of iterations in a single round
Output: the scheduling strategy for reasoning tasks
(1)  Initialize the experience pool $D$ with constant storage space, the action-value function $Q$ with random weights $\theta$, and the corresponding target network $\hat{Q}$ with weights $\theta^- = \theta$
(2)  for $episode \leftarrow 1$ to maximum number of rounds do
(3)   $s_t \leftarrow$ initial state
(4)   for $t \leftarrow 1$ to maximum number of iterations in a single round do
(5)    With probability $1 - \epsilon$ choose the action with the largest estimated reward, $a_t = \arg\max_a Q(s_t, a; \theta)$; otherwise choose a random action
(6)    Execute action $a_t$ to obtain the next state $s_{t+1}$ and use Algorithm 3 to calculate the reward $r_t$
(7)    Store $(s_t, a_t, r_t, s_{t+1})$ in the experience pool
(8)    $s_t \leftarrow s_{t+1}$
(9)    Randomly sample a minibatch from the experience pool
(10)   Construct the error function according to equation (17) and update the parameters $\theta$ via backpropagation
(11)   Update $\hat{Q} \leftarrow Q$ every few steps
(12)   if $s_{t+1}$ satisfies the termination state, end the current iteration
(13)  end for
(14) end for

The model characteristics of the discussed problem are described as follows (a sketch of the state-action-reward encoding follows this list):
(1) State space: the number of feasible-solution states is not constant; it changes dynamically with the number of subtasks after decomposition and with the distribution of edge nodes in each time slot.
(2) Action space: the number of optional actions equals the number of subtasks. Selecting an action means scheduling the corresponding subtask in the current state to a specific edge node.
(3) Reward value: this work tries to minimize the completion latency of the reasoning task, so the reward is set to the negated completion latency, $-T(M)$.
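A minimal sketch of this state-action-reward encoding is given below; all names are illustrative, and the reward is the negated completion latency as just described.

```python
import random

class SchedulingEnv:
    """Toy MDP wrapper: the state is the current assignment vector
    (entry i = edge-node index for subtask i, or -1 for local), an
    action re-assigns one subtask, and the reward is -T(M)."""

    def __init__(self, task, n_edge_nodes, latency_fn):
        self.task = task
        self.n_edge = n_edge_nodes
        self.latency_fn = latency_fn   # e.g., an Algorithm-3-style simulator
        self.state = [-1] * len(task.subtasks)

    def step(self, action):
        subtask, node = action         # (which subtask, which node or -1)
        self.state[subtask] = node
        reward = -self.latency_fn(self.task, self.state)
        return list(self.state), reward

    def sample_action(self):
        """Uniformly random action, used for epsilon-greedy exploration."""
        return (random.randrange(len(self.state)),
                random.randrange(-1, self.n_edge))
```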

The scheduling strategy is based on the DQN algorithm. When the dimensions of the discrete state and action spaces are high, the scheduling problem can be abstracted as a function-fitting problem. The pseudocode of our scheduling algorithm is described in Algorithm 2, where $\alpha$ and $\gamma$ represent the learning rate and discount factor, respectively; $s_{t+1}$ is the state after executing action $a_t$ in iteration $t$; $\arg\max_a Q(s_{t+1}, a)$ represents the action with the largest reward in state $s_{t+1}$; and $r_t$ represents the accumulated reward during the iterations.
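The core update of Algorithm 2 can be sketched as follows in PyTorch; the network size and hyperparameters are illustrative, and the mean-squared temporal-difference error plays the role of the error function in equation (17).

```python
import torch
import torch.nn as nn

def make_q_net(state_dim: int, n_actions: int) -> nn.Module:
    """Illustrative Q-network: state vector in, one Q-value per action out."""
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.9):
    """One gradient step on a minibatch (states, actions, rewards,
    next_states) sampled from the experience pool."""
    states, actions, rewards, next_states = batch

    # Q(s_t, a_t; theta) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bellman target r_t + gamma * max_a Q_hat(s_{t+1}, a; theta-).
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(dim=1).values

    loss = nn.functional.mse_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Every few steps, the target network's weights are overwritten with the online network's (`target_net.load_state_dict(q_net.state_dict())`), matching step (11) of Algorithm 2.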

4.3. Algorithm Implementation

In various time slots, reasoning tasks and edge environments can change dynamically. These changes are summarized as follows:
(1) the topological structure of reasoning tasks and the number of nodes in the edge environment;
(2) the computational complexity, the datasets transferred between subtasks, and the tolerable delays of subtasks in various environments;
(3) the transmission latency and execution latency of subtasks.

Input: DAG $G = (V, D)$, distribution matrix $M$, subtask profiles $c_i$ and $d_i$, transmission rate $r$, computation capacities $f^{ue}$ and $f_j$
Output: the completion latency $h$ of the reasoning task
(1)  Initialization: set the in-degree array $I$, the subtask queue $Q$, and the predecessor sets $R(\cdot)$ to empty
(2)  Use the dependency relationships in $D$ to fill the in-degree array $I$
(3)  Enqueue every subtask with $I(i) = 0$ to $Q$; set the number of traversed subtasks $u \leftarrow 0$ and the number of subtasks in the current layer $k$ to the current queue size
(4)  while $Q$ is not empty do
(5)    if $u = k$ then
(6)      increase $k$ by the current queue size to advance to the next layer
(7)    end if
(8)    Dequeue a subtask from $Q$, denote it $v_z$, and set $u \leftarrow u + 1$
(9)    for $i \leftarrow 1$ to $n$ do
(10)     if there exists a directed edge from $v_z$ to $v_i$ then
(11)       Add subtask $v_z$ and its predecessor set $R(z)$ to $R(i)$; $I(i) \leftarrow I(i) - 1$
(12)       if $I(i) = 0$ then
(13)         enqueue subtask $v_i$ to $Q$
(14)       end if
(15)     end if
(16)   end for
(17) end while
(18) According to $M$, the subtasks are assigned to edge nodes
(19) Initialization: set the subtask completion list $F$ to $\emptyset$, set the remaining execution latency of each subtask from its profile and assignment, and set the current running time $h \leftarrow 0$
(20) while $F \neq V$ do
(21)   Determine the subtask to be started on each edge node, i.e., one whose direct predecessor set $R(i)$ is a subset of $F$
(22)   Find the minimum remaining execution latency $\delta$ among the subtasks currently executing in parallel
(23)   $h \leftarrow h + \delta$; when a subtask's remaining latency reaches 0, add it to $F$ and decrease the remaining latencies of the other running subtasks by $\delta$
(24) end while
(25) return $h$

The algorithm implementation calculates the completion latency of reasoning tasks in edge environments. The pseudocode of the algorithm implementation is described in Algorithm 3, and Figure 4 presents the calculation process of the execution latency, which includes the following steps (a runnable sketch follows the list).
Step 1: initialize the parameters of Algorithm 3, including the subtask queue Q and the predecessor sets R; then express the reasoning task as a specific directed acyclic graph.
Step 2: use Q to sort the subtasks according to the topology of the reasoning task.
Step 3: calculate the task execution time according to the specific strategy derived from Algorithm 2.
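A runnable sketch of this calculation is given below. It reuses the illustrative `ReasoningTask` model from Section 3 and assumes subtasks start as soon as all direct predecessors finish; per-node queuing from principle (4) is omitted for brevity, so this is a simplified reconstruction of Algorithm 3, not the paper's exact code.

```python
def completion_latency(task, assign, rate, f_edge, f_ue):
    """Completion latency h of a reasoning task under assignment `assign`
    (assign[i] = edge-node index for subtask i, or -1 for local)."""
    n = len(task.subtasks)
    preds = {i: task.predecessors(i) for i in range(n)}

    def run_time(i):
        st = task.subtasks[i]
        if assign[i] >= 0:   # offloaded: transmission plus edge execution
            return st.data_size / rate + st.cycles / f_edge[assign[i]]
        return st.cycles / f_ue

    finish, done = {}, set()
    ready = [i for i in range(n) if not preds[i]]
    while len(done) < n:
        # Start every subtask whose direct predecessors have all completed.
        for i in ready:
            start = max((finish[p] for p in preds[i]), default=0.0)
            finish[i] = start + run_time(i)
            done.add(i)
        # Collect subtasks that have just become ready.
        ready = [i for i in range(n)
                 if i not in done and preds[i] <= done]
    return max(finish.values())
```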

5. Simulation Experiment and Analysis

5.1. Experimental Parameter Settings

The simulation experiments are implemented with Python 3.7 and conducted on a 64-bit Windows 10 system configured with an Intel(R) Core(TM) i7-7700HQ CPU and 16 GB of RAM. Our proposed scheduling algorithm is DQN, and the Q-learning algorithm [23] and GA-PSO [32] are introduced as comparison algorithms. Based on the effects of adjusting parameters across many experiments, the corresponding parameters of DQN and Q-learning [23] (the learning rate $\alpha$, discount factor $\gamma$, and exploration probability $\epsilon$) and of GA-PSO [32] (its population size and evolution parameters) are tuned and fixed. In addition, the number of rounds is set to 100 and the number of iterations per round is set to 1000 for DQN, Q-learning, and GA-PSO.

All the algorithms try to find the optimal scheduling result with the shortest completion latency of reasoning tasks in edge environments.

UEs have different reasoning tasks with various topologies and numbers of subtasks; the topological structures of the reasoning tasks are shown in Figure 5. The related parameters of the vehicle edge computing environment are set according to IEEE 802.11p [33], and the other parameters are listed in Table 2.

5.2. Analysis of Results

Table 3 shows the completion latency of different reasoning tasks in various edge environments under our proposed scheduling algorithm, where m and n denote the number of edge nodes and the number of subtasks in each experiment. Note that n = 6 corresponds to Topology I, n = 9 to Topology II, and n = 12 to Topology III in Figure 5. Each cell in Table 3 corresponds to an experiment with a reasoning task of a specific topology and a specific number of edge nodes. In addition, the execution order of subtasks follows one of two rules: the traditional rule executes subtasks according to their topology depths [23], whereas the priority rule executes them according to the priority evaluation described in Section 4.1.

From Table 3, we find that the completion latency of reasoning tasks decreases as the number of edge nodes increases. Under the same circumstances, the priority rule for subtask execution effectively reduces the completion latency compared with the traditional rule, and this gap widens as the topology of the reasoning task grows more complex. This is because the maximum number of subtasks running in parallel at the same time is limited by the number of edge nodes, and the priority rule raises the achievable degree of parallelism.

Figure 6 shows the average completion latency of the different scheduling algorithms (i.e., GA-PSO, Q-learning, and DQN) with different reasoning tasks in various edge environments, where m denotes the number of edge nodes in each experiment. In each subgraph, we record the completion latency of reasoning tasks with different topologies over 100 rounds and display the average completion latency for every 10 rounds. From Figure 6(a), we find that GA-PSO struggles to converge, although it can obtain a feasible solution with a short completion latency. In contrast, DQN not only obtains a feasible solution with a shorter completion latency but also converges well. From Figures 6(b) and 6(c), the performance of Q-learning is similar to that of DQN when the numbers of subtasks and edge nodes are both small; however, the convergence of Q-learning degrades as the topology of the reasoning task grows more complex. The main reason for the different results is that the increase in the number of subtasks multiplies the number of solutions in the search space. GA-PSO finds solutions mainly through randomness and its fitness function, so when the number of feasible solutions is huge, it easily falls into a local optimum. Q-learning likewise struggles to build its Q-table and converge because of the huge number of feasible solutions. DQN, however, replaces the Q-table with a Q-value function approximated by a neural network, which can handle a huge number of states (i.e., feasible solutions) and therefore converges more easily.

Table 4 shows the average runtime (s) of the different algorithms with different reasoning tasks in various edge environments. Each cell in Table 4 is the runtime averaged over 100 rounds for the corresponding algorithm. From Table 4, we find that the runtime of GA-PSO is relatively stable across reasoning tasks and edge environments. This is because the runtime of GA-PSO mainly depends on the number of particles used in the update process, which remains stable even if the edge environment changes during scheduling. The average runtimes of DQN and Q-learning are better than that of GA-PSO, and DQN performs best across all reasoning tasks and edge environments. This is because the runtime of RL algorithms decreases as the number of learned feasible solutions increases. In addition, the neural network architecture used in DQN is better suited to reasoning-task scheduling in vehicle edge computing than the Q-table used in Q-learning.

6. Conclusions

This paper proposes a scheduling strategy based on DQN for reasoning tasks in vehicle edge computing, which aims to reduce the completion latency of reasoning tasks. Extensive simulation experiments show that the proposed strategy achieves superior performance compared with other classic methods. When the structure of reasoning tasks is simple, our strategy and the other classic methods all perform well, although GA-PSO converges poorly. Specifically, when the structure of reasoning tasks is complex, our strategy outperforms all the other classic methods in both performance and convergence.

In the future, we will improve the scheduling algorithm by optimizing the training efficiency of the neural network so that it can accommodate wireless channel fluctuations and radio interference in vehicle edge computing. In addition, we will further consider a multivehicle collaborative scheduling strategy to alleviate the uneven resource allocation of multivehicle tasks in edge environments.

Data Availability

The data used to support the findings of this study are included within the article.

Disclosure

This work was presented in part at the 2019 IEEE Intl Conf on Parallel and Distributed Processing with Applications (ISPA) with the title “A Time-Driven Workflow Scheduling Strategy for Reasoning Tasks of Autonomous Driving in Edge Environment.”

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Intelligent Computing and Application Research Team of Concord University College, Fujian Normal University under Grant no. 2020TD001; the Natural Science Foundation of China under Grant no. 62072108; the Project on the Integration of Industry and Education of Fujian Province under Grant no. 2021H6026; the Natural Science Foundation of Fujian Province under Grant nos. 2019J01286 and 2019J01427; and the Young and Middle-Aged Teacher Education Foundation of Fujian Province under Grant no. JT180098.