#### Abstract

In vehicular edge computing (VEC), tasks and data collected by sensors on the vehicles can be offloaded to roadside units (RSUs) equipped with a set of servers through wireless transmission. These tasks may depend on one another and can be modeled as a directed acyclic graph (DAG). The DAG scheduling problem aims to schedule the tasks onto the servers so as to minimize the scheduling length (makespan), i.e., the maximum finish time of all tasks. The conventional heuristic algorithms utilize only partial information of the DAG, so their performance is not stable. The state-of-the-art scheduling method employs a graph neural network to further reduce the makespan. However, this method ignores the fact that there are communication delays between tasks scheduled on different servers. In this paper, we tackle the DAG scheduling problem while accounting for communication delays, which makes the problem much more challenging. Our method is based on a graph convolutional neural network and reinforcement learning. Experimental results show that our scheduling method reduces the DAG scheduling length by 8% to 15% compared with the representative scheduling strategies based on graph neural network models (GAT, GraphSAGE) and by 15% to 25% compared with the conventional algorithms (HEFT, LC, and CPOP) and the sequence-to-sequence model.

#### 1. Introduction

With the maturity of cloud computing technology, VCC (vehicular cloud computing) [1] is considered to be a promising method to improve vehicular services. Vehicles with limited resources can offload computing-intensive tasks to the cloud through VCC. However, cloud computing servers may be far away from vehicles running on the road. It may take a long time to transfer tasks from the vehicle to the cloud server and return the calculation results from the cloud server to the vehicle. Thus, VCC might not be suitable for delay-sensitive tasks.

In order to cope with the above issue, researchers proposed vehicular edge computing (VEC) [2, 3]. VEC is a distributed deployment service that extends the computing and storage capacity to the edge of the network. In VEC, a large number of roadside units (RSUs) will be deployed near the road where the vehicles are driving. Thus, the computing tasks and the data collected by sensors on vehicles no longer need to be offloaded to the cloud servers but directly offloaded to the roadside units equipped with a set of servers through the wireless transmission such as 5G. The remaining issue is how to schedule the tasks to the servers to minimize the scheduling length.

At present, some works [2, 4] have discussed the scheduling problem in VEC. However, these works assume that the tasks to be scheduled are independent of each other. In fact, there may be dependencies between different tasks from the same application. Liu et al. [3] also pointed out the issue, and they proposed a dependency-aware task scheduling algorithm where the tasks are modeled as a directed acyclic graph (DAG). However, they assume different tasks require the same computing time and the communication time between tasks is ignored.

With the development of artificial intelligence and machine learning, the neural network is also used to solve the DAG scheduling problem. Mao et al. [5] utilized neural networks and reinforcement learning (RL) [6] to learn job-specific scheduling algorithms. However, they ignore the fact that there are nonnegligible communication delays between tasks scheduled on different servers (processors). Note that the communication delay will become zero if the tasks are scheduled on the same processor. Thus, minimizing the DAG scheduling length needs a tradeoff between placing all the tasks on one processor and placing them on all available processors. In this sense, the DAG scheduling problem with communication delays will be much more challenging than the one without considering them [7].

In this paper, we study the node-weighted and edge-weighted DAG scheduling problem where the node weight represents the task computation time and the edge weight represents the communication time (communication delay) between two tasks.

We design a scheduling method based on a two-layer graph convolutional neural network (TLGC). Similar to [5], we employ reinforcement learning [6] to optimize the training of the strategy, which takes the execution time as the reward for feedback and adjusts the network parameters. However, our reward function is different from [5] since we need to consider communication time between tasks. In addition, when the graph neural network is trained, we also consider the processor network information, which is ignored in [5]. A better scheduling strategy is generated in the process of multiple trainings. More details will be given in Section 4.

Our contributions are as follows:

(1) We study the DAG scheduling problem considering both the computation and communication time. The node information is encoded through a graph neural network. Combined with reinforcement learning, we train the network to reduce the communication overhead caused by task scheduling.

(2) The scheduling scheme based on the graph convolutional neural network is evaluated against the conventional DAG scheduling methods [8, 9] and the sequence-to-sequence scheduling method [10]. The results show that the DAG scheduling length is reduced by 15% to 25%.

(3) Compared with the state-of-the-art graph neural network models (GAT [11], GraphSAGE [12]), the evaluation shows that the DAG scheduling length of the scheme based on the graph convolutional neural network and proximal policy optimization is reduced by 8% to 15%.

#### 2. Related Work

VEC can be applied in many fields. Hong et al. [13] proposed mobile fog, which helps the police search for and track target vehicles by using traffic cameras. Wan et al. [14] migrated video processing tasks to computation units deployed by the edge to analyze real-time traffic videos accurately. It can reduce the latency of video analysis and improve the quality of video analysis. Grassi et al. [15] designed ParkMaster, a scheme for detecting open parking spaces based on edge computing. ParkMaster can analyze street videos uploaded to cloud servers to evaluate parking spaces.

Several works study how to offload the tasks. Guo et al. [16] modeled the computation offloading problem as a mixed integer nonlinear programming problem. Since the problem is NP-hard, they proposed a suboptimal solution which makes use of particle swarm optimization (PSO) and a genetic algorithm (GA). Zhu et al. [17] proposed two approximation algorithms to solve the problem where multiple mobile devices share multiple heterogeneous mobile edge computing servers. Their goal is to minimize energy consumption. Fang et al. [18] designed an offline approximation algorithm to minimize the total response time for finishing all the tasks in edge computing. Zhu et al. [4] proposed a novel scheme named Fog Following Me (Folo), which considers the mobility of vehicles. These vehicles may generate tasks or serve as fog nodes. The privacy and security issues are also considered in vehicular-related networks [19–22].

For the scheduling problem of DAG, conventional solutions define the properties of CP (critical path) [23], bottom-level (BL) [24], and top-level (TL) [25] of DAG. These scheduling algorithms are only based on the properties of one aspect of DAG. They do not consider the global structure information of the graph, so they might not obtain an efficient scheduling strategy. Taking CP as an example, CP is the critical path of DAG, that is, the longest path of the DAG. The goal of scheduling is to reduce the critical path as much as possible. The commonly used algorithms include LC (Linear Clustering) algorithm [8] which repeatedly clusters the critical path directly and CPOP (Critical-Path-on-a-Processor) algorithm [9] which calculates node scheduling priority through the sum of TL and BL values. An illustrating example for scheduling a DAG based on four representative conventional algorithms is shown in Section 5.1.

With the development of neural networks, a growing number of researchers use recurrent neural network (RNN) models [26] to read the input information about operations and their dependencies and to generate scheduling strategies for the DAG scheduling problem. Moreover, they use reinforcement learning (RL) to continuously optimize training to generate better strategies [10, 27, 28]. Although the sequence-to-sequence recurrent neural network model benefits natural language processing, it only serializes the input information. Therefore, in order to extract the information from the graph, researchers proposed graph neural networks [29]. Common graph neural networks include the graph convolutional neural network (GCN) [30], the graph attention network (GAT) [11], and GraphSAGE [12].

Although GAT introduces the attention mechanism and GraphSAGE generalizes the node information, they cannot accurately extract the key information that affects the scheduling performance, so the scheduling performance by these neural network methods might not be good enough. In our paper, the purpose of the TLGC method using graph convolutional neural network is to consider both the information such as CP and the global DAG structure to generate a more efficient scheduling strategy (cf. Figure 1).

#### 3. System Model

For the scheduling problem of the DAG, the definition of each symbol is shown in Table 1.

Graph $G = (V, E)$ is a node-weighted and edge-weighted DAG, where $V$ represents the set of nodes (tasks), and $E$ represents the set of directed edges. A directed edge $e_{ij} = (v_i, v_j)$ means that the task $v_j$ cannot be executed until task $v_i$ has been finished. $w_i$ represents the computation time of task $v_i$, and $c_{ij}$ represents the communication time between the two tasks associated with edge $e_{ij}$. Figure 2 gives an illustrating DAG. The value below a node means its computation time, and the value close to a directed edge means the communication time of the two tasks if they are scheduled on different processors.

Let $P = \{p_1, p_2, \ldots, p_m\}$ be the processor group. In order to simplify the system model, it is assumed that all processors are homogeneous and are fully connected. It means that the running time of the same task on different processors is the same, and the communication bandwidth between processors is the same.

The objective of the DAG scheduling problem is to minimize the scheduling length (makespan), which is the maximum finish time of all tasks. Note that the waiting time is counted in each task’s finish time calculation. We also assume the tasks are nonpreemptive in the sense that once a task is scheduled to a processor, it cannot be terminated until it finishes its computation on that processor.
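To make the objective concrete, the following minimal Python sketch computes the makespan of one fixed schedule under the model above: homogeneous processors, nonpreemptive tasks, and a communication delay that is paid only when a parent and its child run on different processors. The function name `makespan`, its argument layout, and the three-task toy DAG are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, List, Tuple


def makespan(comp: Dict[str, float],
             comm: Dict[Tuple[str, str], float],
             order: List[str],
             placement: Dict[str, int]) -> float:
    """Maximum finish time of all tasks for a fixed order and placement.

    comp[v]      : computation time of task v (node weight)
    comm[(u, v)] : communication time on edge u -> v (edge weight),
                   paid only if u and v are placed on different processors
    order        : tasks in scheduling order (must respect precedence)
    placement[v] : index of the processor that executes v
    """
    parents: Dict[str, List[str]] = {v: [] for v in comp}
    for (u, v) in comm:
        parents[v].append(u)

    finish: Dict[str, float] = {}
    proc_free: Dict[int, float] = {}  # time at which each processor is idle

    for v in order:
        p = placement[v]
        ready = 0.0
        for u in parents[v]:
            delay = 0.0 if placement[u] == p else comm[(u, v)]
            ready = max(ready, finish[u] + delay)
        # The task waits for its data and for the processor (waiting is counted).
        start = max(ready, proc_free.get(p, 0.0))
        finish[v] = start + comp[v]
        proc_free[p] = finish[v]
    return max(finish.values())


# Hypothetical toy DAG: a -> b (delay 4) and a -> c (delay 1).
comp = {"a": 1, "b": 2, "c": 3}
comm = {("a", "b"): 4, ("a", "c"): 1}
print(makespan(comp, comm, ["a", "b", "c"], {"a": 0, "b": 0, "c": 0}))  # 6.0
print(makespan(comp, comm, ["a", "b", "c"], {"a": 0, "b": 0, "c": 1}))  # 5.0
```

In this toy case, moving task `c` to a second processor pays a delay of 1 but still shortens the schedule, which is the tradeoff between parallelism and communication that the scheduler has to learn.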

#### 4. The TLGC Scheduling Scheme

In this paper, by considering both the computation and communication time of tasks, the initial scheduling strategy is generated through a graph neural network. The neural network is trained by reinforcement learning (RL). In the training, by observing the results of the generated scheduling strategy, a corresponding reward will be provided for the network. The reward function is set according to an evaluation mechanism, such as to minimize the DAG scheduling length. The RL algorithm utilizes this reward signal to gradually improve the scheduling scheme.

The design of the TLGC scheme faces the following challenges:

(1) In DAG scheduling, as mentioned before, the communication delay between tasks cannot be ignored, which makes the training much more difficult.

(2) The purpose of the reinforcement learning is to maximize the cumulative rewards. The reward value directly influences the DAG scheduling length, and it also determines whether the model will converge or not. Therefore, the design of the reward function should comprehensively consider factors such as the communication time of tasks and the degree of parallelism (whether or not to use all the available processors).

We now discuss how to tackle the above challenges in the subsequent subsections.

##### 4.1. Information Embedding

In each state observation, the state information (the states of the DAG and the processors) must be transformed into feature vectors to be passed to the policy network. One option is to create a flat feature vector containing all state information. However, this method cannot scale to DAGs of arbitrary size and topology. In addition, processing high-dimensional feature vectors would require a very large policy network, which would be difficult to train.

Therefore, the scalability can be achieved by using a graph neural network, which encodes or “embeds” state information (e.g., the running time of tasks, the dependency structure between nodes, the communication time between tasks, and the state of processing units) into a set of embedding vectors. The method adopted in this paper is based on a graph convolutional neural network [31] but customized for scheduling. The notations used in this paper and their descriptions are shown in Table 1.

The graph embedding takes the DAG as the input whose nodes have a set of stage attributes (such as task computation time) and outputs two different types of embeddings:

(1) Node embeddings capture information about nodes and their child nodes (for example, including the aggregated information along the critical path from the node).

(2) DAG embeddings summarize the information in the DAG and the processor's information during execution.

It is important that the information embedded and stored is not hardcoded. It will automatically learn the statistically significant content and how to calculate the information from the input DAG through end-to-end training. In other words, embedding can be regarded as a feature vector, and a graph neural network can learn and calculate without manual feature engineering.

Given the feature vector $x_v$ of node $v$ in a DAG, the embedding $e_v$ of each node is established, where $e_v$ is a vector containing the information of all nodes ($v$'s child nodes and their descendants) reaching node $v$. In order to calculate these vectors, starting from the leaves of the DAG, the information propagates from the child nodes to the parent nodes according to a series of information passing steps (Figure 3). In each information passing step, the embedding of a node $v$ (the shadow nodes in Figure 3) whose child nodes have already aggregated the information from all of their own child nodes is calculated as

$$e_v = g\Big[\sum_{u \in \xi(v)} f(e_u)\Big] + x_v, \tag{1}$$

where $f$ and $g$ are nonlinear transformations on input vectors, which are realized by the graph neural network, and $\xi(v)$ represents the child node set of $v$. The first term is the general nonlinear aggregation operation, which summarizes the embedding of $v$'s child nodes and the communication overhead to $v$'s child nodes. Adding the summary item from this aggregation to the feature vector of $v$ gives the embedding of $v$. The same nonlinear transformations $f$ and $g$ are used repeatedly in all nodes and information passing steps.

When calculating the node embedding of node $v$ through nonlinear transformation, it is usually calculated in the form of a single nonlinear transformation, $\sum_{u \in \xi(v)} f(e_u)$. In our scheme, the second nonlinear transformation $g$ is added. The reason is that the graph neural network cannot calculate some valuable features for scheduling without $g$ [5]. For example, it cannot calculate the critical path of the DAG, which requires a series of operations on nodes during information passing. Note that the communication delays play an important role in calculating this kind of critical path.

We add a summary node to the DAG to calculate the embedding of the DAG. The summary node takes all nodes in the DAG as child nodes and takes the state of processing units (processors) as its feature vector to calculate the embedding of the DAG. Similarly, the embedding of the summary node is also calculated by Equation (1). That is, each aggregation step has its own nonlinear transformations $f$ and $g$.
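The information passing described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration, not the trained TLGC network: each node's embedding is an outer transformation applied to the sum of an inner transformation over its children's embeddings, plus the node's own feature vector. The function `embed_nodes`, the randomly initialized single-layer `tanh` stand-ins for the two transformations, and the two-node example are all assumptions made here; how communication time enters the feature vectors is left open.

```python
import numpy as np


def embed_nodes(features, children, f, g):
    """Compute node embeddings by passing information from leaves to parents.

    features[v] : feature vector of node v (1-D float array)
    children[v] : list of child nodes of v
    f, g        : inner and outer nonlinear transformations
                  (stand-ins for trained neural networks)
    """
    emb = {}

    def visit(v):
        # Memoized recursion: a node is embedded after all of its children.
        if v not in emb:
            agg = np.zeros_like(features[v], dtype=float)
            for u in children[v]:
                agg = agg + f(visit(u))
            emb[v] = g(agg) + features[v]
        return emb[v]

    for v in features:
        visit(v)
    return emb


# Hypothetical stand-ins for the trained transformations (no bias terms).
rng = np.random.default_rng(0)
d = 3
W_f = rng.normal(size=(d, d))
W_g = rng.normal(size=(d, d))
f = lambda e: np.tanh(W_f @ e)
g = lambda a: np.tanh(W_g @ a)

features = {"a": np.ones(d), "b": np.array([0.5, 0.0, 2.0])}
children = {"a": ["b"], "b": []}
embeddings = embed_nodes(features, children, f, g)
```

A summary node can be handled by the same routine by adding one extra node whose children are all DAG nodes and whose feature vector holds the processor state; in the paper that level uses its own pair of transformations, which this sketch does not model.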

##### 4.2. The Design of the Scheduling Network

The TLGC scheduling scheme formulates the generation of the scheduling policy as a Markov decision process (MDP). In each decision-making step, the scheduling policy of one node is generated. The scheduling process is illustrated in Figure 1, which is built upon [5]. However, as mentioned before, we consider the processor network information and the communication delay, which are ignored there.

Determining the next task (node) to be scheduled is based on the assigned score for each task. For task $v$ in the DAG, the score of node $v$ is denoted $q_v$ and is calculated by a nonlinear function $q(\cdot)$, which is realized by a two-layer fully connected neural network. Note that, at each step $t$, only the ready tasks can be scheduled, i.e., the tasks satisfying all the precedence constraints. We denote the set of ready tasks at step $t$ as $A_t$. Then, the normalization (softmax operation) is used to calculate the probability of selecting task $v$ based on the priority scores:

$$P(v) = \frac{\exp(q_v)}{\sum_{u \in A_t} \exp(q_u)}.$$

It should be emphasized that $A_t$ changes in real time according to the execution process of the DAG and limits the output of the normalization operation.
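As a small illustration of the selection step, the sketch below applies a softmax restricted to the ready tasks, so tasks whose precedence constraints are not yet satisfied receive no probability mass. The function name `selection_probabilities` and the example scores are hypothetical; in the scheme itself the scores come from the trained scoring network.

```python
import math
from typing import Dict, Iterable


def selection_probabilities(scores: Dict[str, float],
                            ready: Iterable[str]) -> Dict[str, float]:
    """Softmax over the priority scores of the ready tasks only."""
    ready = list(ready)
    # Subtracting the maximum score keeps exp() numerically stable
    # without changing the resulting probabilities.
    top = max(scores[v] for v in ready)
    weights = {v: math.exp(scores[v] - top) for v in ready}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Task "c" has the highest score but is not ready, so it cannot be selected.
probs = selection_probabilities({"a": 1.0, "b": 1.0, "c": 5.0}, ready=["a", "b"])
print(probs)  # {'a': 0.5, 'b': 0.5}
```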

In order to gradually improve the task selection process, we need the following reinforcement learning with the carefully designed reward function.

##### 4.3. The Design of Reward Function and Training Process

We use reinforcement learning (RL) to train the neural network through many offline (simulation) experiments. In these experiments, rewards are provided by observing each decision-making process’s operation. The rewards are set through the evaluation mechanism of the DAG scheduling (such as minimizing the makespan and maximizing parallelism). RL algorithm uses this reward signal to improve the scheduling strategy gradually. Therefore, the design of the reward signal is essential to the training effect of the network.

The finish time of each task can only be obtained after the task is scheduled on a certain processor. Therefore, we adopt two calculation methods for the reward function: (1)We take the communication delay of the tasks as the negative signal of the reward. Considering the situation that the task is scheduled to the processor , there are two cases to calculate the finish time of task . The first case is all of ’s parent nodes (tasks) have also been scheduled on processor . In this case, there is no communication cost between and its parent tasks. Denote as the starting time of on processor in this case; i.e., the time processor has finished executing all the tasks already placed on it. The second case is some of ’s parent tasks have been scheduled to another processor. In this case, there will exist communication time. Denote as the starting time of on processor in this case

At this time, reward is calculated as follows:

After the DAG task is completed, the average reward of each decision is calculated according to the scheduling length of the DAG (the maximum finish time of all tasks) and the above . We now can adjust the neural network based on Equation (5). For this equation, in the numerator means the number of processors. The denominator means the scheduling length of the DAG where means the finish time of executing all tasks placed on the processor .

Similar to [5, 32], the TLGC scheme is then trained by reinforcement learning, using proximal policy optimization [33], which is a policy gradient method. The neural network parameters are updated by gradient descent using the above rewards observed during training. These are commonly used methods, and we omit the details.

The above describes how to select the task to be scheduled. Then, we need to allocate it to the processor that offers its earliest start time (EST) [34]. For a task, we need to calculate its earliest start time on each processor and then pick the processor with the smallest value. A detailed example of this calculation can be found in the TLGC processor selection example (cf. the last paragraphs of Section 5.2). Note that the communication delay between two tasks will become zero if they are placed on the same processor.
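The processor selection rule can be sketched as follows. For each processor, the earliest start time is the later of two moments: when the processor becomes idle, and when the data of all parent tasks have arrived, where a parent on the same processor contributes no communication delay. The function names, the tie-breaking rule (lowest processor index), and the example numbers are assumptions for illustration and are not taken from Table 2.

```python
from typing import Dict, List, Tuple


def earliest_start_times(task: str,
                         parents: Dict[str, List[str]],
                         comm: Dict[Tuple[str, str], float],
                         finish: Dict[str, float],
                         placement: Dict[str, int],
                         proc_free: Dict[int, float]) -> Dict[int, float]:
    """Earliest start time of `task` on every processor.

    finish[u], placement[u] : finish time and processor of each scheduled parent
    proc_free[p]            : time at which processor p becomes idle
    """
    est = {}
    for p, free_at in proc_free.items():
        ready = 0.0
        for u in parents[task]:
            # No communication delay when parent and child share a processor.
            delay = 0.0 if placement[u] == p else comm[(u, task)]
            ready = max(ready, finish[u] + delay)
        est[p] = max(ready, free_at)
    return est


def select_processor(est: Dict[int, float]) -> int:
    """Processor with the smallest EST; ties go to the lowest index (assumed)."""
    return min(est, key=lambda p: (est[p], p))


# Hypothetical state: task "t" has parents "a" (on processor 0) and "b" (on 1).
est = earliest_start_times(
    "t",
    parents={"t": ["a", "b"]},
    comm={("a", "t"): 1.0, ("b", "t"): 3.0},
    finish={"a": 4.0, "b": 5.0},
    placement={"a": 0, "b": 1},
    proc_free={0: 4.0, 1: 5.0, 2: 0.0},
)
print(est)                    # {0: 8.0, 1: 5.0, 2: 8.0}
print(select_processor(est))  # 1
```

Here the idle processor 2 is not the best choice, because both parents would then have to ship their results across the network.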

#### 5. An Illustrating Experiment for DAG Scheduling with Communication Delays

In order to give the readers a concrete feeling of DAG scheduling with different methods, taking Figure 2 as an example, we will show the corresponding scheduling results based on both the conventional scheduling algorithms and the neural network-based methods.

##### 5.1. Scheduling Results with Conventional Algorithms

For the scheduling problem of a DAG, conventional solutions are to define the properties of CP (critical path), BL (bottom level, as defined in Equation (6)), and TL (top level, as defined in Equation (7)) of a DAG. Conventional scheduling algorithms often only consider these properties but lack the global information of DAGs. Thus, the scheduling results might not be stable.

The critical path of a DAG is the longest path in the DAG. The goal of scheduling is to reduce the critical path as much as possible. The commonly used algorithms include the LC algorithm [8], which repeatedly clusters the critical paths; the DCP (Dynamic Critical Path) algorithm [23], which orders the tasks by an increasing sum of TL and BL values; the CPOP algorithm [9], which calculates node scheduling priority through a descending sum of TL and BL values; and the MCP (Modified Critical Path) algorithm [35], which prioritizes tasks by their descending BL values. Although these algorithms intuitively shorten the length of the critical path, the scheduling effects are often not satisfactory.
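For readers who want to experiment with these quantities, the sketch below computes TL, BL, and the critical path length using the definitions that are common in the DAG scheduling literature: the top level of a task is the longest path length from an entry task to it (excluding its own computation time), and the bottom level is the longest path length from it to an exit task (including its own computation time), with edge weights counted along the path. It is an assumption that these match Equations (6) and (7) exactly; the function name `levels` and the four-task DAG are hypothetical.

```python
from functools import lru_cache
from typing import Dict, Tuple


def levels(comp: Dict[str, float], comm: Dict[Tuple[str, str], float]):
    """Return (TL, BL) dictionaries for a node- and edge-weighted DAG."""
    children = {v: [] for v in comp}
    parents = {v: [] for v in comp}
    for (u, v) in comm:
        children[u].append(v)
        parents[v].append(u)

    @lru_cache(maxsize=None)
    def bl(v):
        # Own computation time plus the heaviest continuation to an exit task.
        return comp[v] + max((comm[(v, u)] + bl(u) for u in children[v]), default=0)

    @lru_cache(maxsize=None)
    def tl(v):
        # Heaviest path from an entry task up to, but excluding, v itself.
        return max((tl(u) + comp[u] + comm[(u, v)] for u in parents[v]), default=0)

    return {v: tl(v) for v in comp}, {v: bl(v) for v in comp}


# Hypothetical diamond-shaped DAG.
comp = {"a": 1, "b": 2, "c": 3, "d": 1}
comm = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 1}
TL, BL = levels(comp, comm)
cp_length = max(TL[v] + BL[v] for v in comp)
print(TL)         # {'a': 0, 'b': 5, 'c': 2, 'd': 9}
print(BL)         # {'a': 10, 'b': 5, 'c': 5, 'd': 1}
print(cp_length)  # 10
```

Tasks whose TL + BL equals the critical path length (`a`, `b`, and `d` here) lie on the critical path, which is why the sum of TL and BL is a natural priority in algorithms such as CPOP.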

Considering the DAG in Figure 2, the number of available processors is 3, and the length of the critical path of the DAG before the scheduling is 21. Figure 4(a) shows the scheduling result of the LC algorithm, which maps the DAG into 3 clusters where each processor hosts one cluster of tasks. Note that there is a total order of the tasks in each linear cluster. The scheduling length of the LC algorithm is 17. Figures 4(c) and 4(d) show the scheduling results by employing the DCP and MCP algorithms, respectively. Their scheduling lengths are both 14 which is smaller than the one by the LC algorithm. After performing the CPOP scheduling algorithm, the scheduling length becomes 13, which is shown in Figure 4(b).

Figure 4: Scheduling results of the conventional algorithms for the DAG in Figure 2. (a) LC. (b) CPOP. (c) DCP. (d) MCP.

As the scheduling results of the four representative scheduling algorithms show, the key to these conventional scheduling algorithms is to reduce the critical path's length. However, this process ignores the communication time of nodes that are not on the critical path, which is also an important factor affecting the DAG's scheduling length.

##### 5.2. Scheduling Results with Neural Networks

In order to cope with the above issue and to get a smaller scheduling length, a growing number of researchers use recurrent neural network (RNN) models [26] to read the input information about operations and their dependencies and to generate scheduling strategies. Moreover, they use reinforcement learning (RL) to continuously optimize training to generate better strategies [10, 27, 28]. For example, under the scheduling model of seq2seq [28], the scheduling length of the DAG in Figure 2 is 13, which is shown in Figure 5(a).

Figure 5: Scheduling results of the neural network-based methods for the DAG in Figure 2. (a) seq2seq. (b) GAT. (c) GraphSAGE. (d) TLGC.

As mentioned before, although the sequence to sequence recurrent neural network model benefits natural language processing, it only serializes the input information. Therefore, in order to extract the information from the graph, researchers proposed graph neural networks [29]. Common graph neural networks include graph attention network (GAT) [11] and GraphSAGE [12]. Figure 5(b) shows the scheduling result based on the GAT model. Its scheduling length is 13 since it does not thoroughly learn the impact of communication time on the scheduling length. Figure 5(c) shows the scheduling result based on the GraphSAGE model, and its scheduling length is 12 because it does not thoroughly learn the impact of previously scheduled tasks on subsequent tasks. Figure 5(d) shows the scheduling result of our TLGC scheme, and the length is 11, which is the smallest among all the scheduling methods.

The output of our TLGC scheme gives the scheduling order of tasks in Figure 2 as , , , , , , and . The processors that can handle tasks are , , and . For a task , its earliest start time determines the processor where it will be executed. For in Figure 2, its earliest start time is 0 for each processor. Without loss of generality, we schedule task to processor . is also scheduled on processor (); otherwise, the communication with will increase the earliest start time of (). Thus, the earliest start time of is 1.

If is scheduled on processor or , its earliest start time . If is scheduled on processor , it can start after is completed and . Thus, its earliest start time is 2 on processor . The case for is similar with . Its earliest start time on processors , , and are 4, 8, and 8, respectively. Thus, is scheduled on processor and is 4.

For , it depends on and . If is scheduled on processor , the communication time can be saved and since it needs to wait for to be completed. If is scheduled on process , is the maximum between () and (). Thus, is scheduled on processor and is 5. For , its earliest start time on , , and is 7, 9, and 9, respectively. Thus, is scheduled on processor .

The parent tasks of are and . If is scheduled on processor , the communication time between and can be avoided. may start after is completed which is time 9. However, since also depends on , task can only be executed after finishes its computation which is time . Thus, is 11. If is scheduled on processor , considering its parent node , the time it may be executed is . Considering its parent node , the time it may be executed is . So is 8. If is scheduled on processor , its earliest start time is 11. Thus, is 8 on processor and is scheduled on this processor.

Table 2 shows the earliest start time for each task on different processors. The last column in Table 2 gives the processor on which the corresponding task is placed. The tasks are executed in order from top to bottom.

#### 6. Experiments

In this section, based on both randomly generated and real-world data sets, we will compare the proposed TLGC scheduling scheme with the conventional scheduling algorithms and neural network based methods to show the superiority of the TLGC scheme proposed in this paper.

##### 6.1. Experimental Environment

The experiment was conducted on a Linux server with 56 Intel Xeon [email protected] GHz CPUs, 256 GB memory, and a 3.0 TB hard disk. The operating system of the server is Ubuntu 16.04.7 LTS. The code was implemented in Python 3.7.9 with TensorFlow 1.14.0. The server was equipped with an NVIDIA Tesla P100 GPU with 16 GB of video memory. The versions of the NVIDIA driver and CUDA are 440.118.02 and 10.2, respectively.

##### 6.2. Data Sets

This subsection describes the DAG data sets used in this paper. This paper adopts two data sets: random structure [36] and tasks generated from parallelized applications.

Reference [36] generated DAGs with a random structure. However, the data set does not consider the communication cost. We add the communication overhead between tasks in the DAG without changing its topology. The communication overhead is proportional to the amount of data transmitted. Reference [37] observes that the weight of an edge is affected by the computation time of its two end nodes. As a result, the communication overhead is generated as follows:

(1) A random value $r$ is generated by uniform distribution, normal distribution, or gamma distribution as the randomization parameter of communication overhead.

(2) The weights of the nodes connected with the corresponding edge (the source node $v_i$ and the destination node $v_j$) are added, and the sum is square-rooted to weaken the influence of node weights.

(3) Multiply the random value by the result obtained in step (2), i.e., $c_{ij} = r \cdot \sqrt{w_i + w_j}$, where $w_i$ and $w_j$ are the weights of the two nodes.

(4) According to the requirement of CCR (Communication to Computation Ratio), the weights of edges in the DAG can be scaled.

CCR represents the ratio of communication time to computation time in the DAG. In this paper, the values of CCR are set as 0.1, 1.0, and 10.0 to generate three randomized data sets.
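The edge-weight generation procedure can be sketched as below. Several details are assumptions of this sketch rather than statements from the paper: the random factor is drawn from a uniform distribution on [0.5, 1.5] (the paper also allows normal or gamma distributions and does not give parameters), and CCR is taken to be the mean edge weight divided by the mean node weight, which is one common definition. The function name `add_communication_costs` is hypothetical.

```python
import math
import random
from typing import Dict, Iterable, Optional, Tuple


def add_communication_costs(comp: Dict[str, float],
                            edges: Iterable[Tuple[str, str]],
                            ccr: float,
                            rng: Optional[random.Random] = None
                            ) -> Dict[Tuple[str, str], float]:
    """Assign edge weights to an existing DAG without changing its topology."""
    rng = rng or random.Random(0)
    raw = {}
    for (u, v) in edges:
        r = rng.uniform(0.5, 1.5)                       # step (1): assumed range
        raw[(u, v)] = r * math.sqrt(comp[u] + comp[v])  # steps (2) and (3)

    # Step (4): rescale so that mean edge weight / mean node weight equals
    # the requested CCR (assumed definition of CCR).
    mean_comp = sum(comp.values()) / len(comp)
    mean_comm = sum(raw.values()) / len(raw)
    scale = ccr * mean_comp / mean_comm
    return {edge: weight * scale for edge, weight in raw.items()}


comp = {"a": 1.0, "b": 4.0, "c": 9.0}
comm = add_communication_costs(comp, [("a", "b"), ("a", "c")], ccr=10.0)
```

Because the scaling in step (4) is applied uniformly, the relative ordering of edge weights produced by steps (1) to (3) is preserved for every CCR value.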

For the real data traces, this paper uses the following six DAGs (https://github.com/workflowhub/pegasus-traces) to train and evaluate the graph neural network. The number of edges is usually far less than the square of the number of nodes in practical DAGs, so it is often a sparse graph. The attribute information of the six DAGs is shown in Table 3.

##### 6.3. Results and Analysis

The scheduling network is trained by using the above data sets. Three graph neural network models (GAT, GraphSAGE, and TLGC), trained with the proximal policy optimization reinforcement learning method [33], are run on the data sets to obtain their scheduling lengths. The results are compared with the commonly used conventional algorithms and the sequence-to-sequence model [10]. The conventional algorithms include the Heterogeneous Earliest Finish Time algorithm (HEFT [9]), the Linear Clustering algorithm (LC [8]), and the Critical-Path-on-a-Processor algorithm (CPOP [9]). Note that, similar to the MCP algorithm [35] mentioned before, HEFT prioritizes the tasks by their descending BL values, but HEFT breaks ties randomly while the MCP algorithm breaks ties with the BL values of descendants.

###### 6.3.1. Results on Randomly Generated DAGs

For the randomly generated DAGs, we evaluate the data sets with 100, 1000, and 5000 nodes when CCR = 0.1 (as shown in Figure 6), 1.0 (as shown in Figure 7), and 10.0 (as shown in Figure 8). The experimental results show that the scheduling strategy generated by the TLGC scheduling scheme has stable performance and a clear advantage over conventional scheduling algorithms, the sequence-to-sequence scheduling model, and other graph neural network models (GAT, GraphSAGE).

Figures 6, 7, and 8: Scheduling lengths on the randomly generated DAGs, each with panels (a), (b), and (c).

For data sets with different CCR and node number values, the TLGC scheduling scheme can always find a good scheduling strategy, while the performance of conventional scheduling algorithms is unstable in the sense that the performance of the same algorithm may differ greatly for DAGs with different attributes. As shown in Figure 8(a), HEFT and LC perform poorly on this data set. The scheduling length of our TLGC scheme is almost one-third of the ones by HEFT and LC. In addition, the scheduling length of TLGC is still shorter than that of CPOP, which is the best conventional scheduling algorithm on this data set. Although HEFT did not perform well on this data set, it achieves the shortest scheduling lengths among the conventional scheduling algorithms on the data sets in Figures 7(a) and 7(b). However, even for these two data sets, the scheduling strategy generated by the TLGC scheme is still significantly better than HEFT. The reason is that the conventional algorithms are based on greedy or heuristic ideas and only consider the characteristics of one aspect of the DAG (such as the critical path), so they cannot devise an efficient schedule based on the global information. In contrast, the TLGC scheme can learn the information of the graph topology, so it can schedule tasks well.

We note that for some DAGs, the scheduling strategy generated by the TLGC scheme has no remarkable improvement compared with the conventional algorithms (cf. Figure 7(c)). The result can be expected. For the DAG of a specific structure, a strategy may “happen” to find the optimal or suboptimal scheduling, so there is not much room for further reducing the scheduling length.

The scheduling based on sequence to sequence (seq2seq) neural network is also unstable. For example, the scheduling lengths of the seq2seq method are even higher than the ones by conventional scheduling algorithms (cf. Figures 7(c) and 8(c)). However, for the data sets in Figures 8(a) and 8(b), the scheduling lengths of the seq2seq method are much shorter than the ones by conventional scheduling algorithms. The reason is that the sequence to sequence neural network serializes the DAG and ignores graph structure information. Therefore, the performance fluctuates wildly, and the scheduling strategy has a certain contingency.

At the same time, it can be seen that the scheduling strategy based on TLGC reduces the scheduling lengths of the DAGs by 8%-15% compared with the strategies generated by other graph neural network models. In addition, it reduces the scheduling lengths of DAGs by 15%-25% compared with the conventional scheduling algorithms and the scheduling strategy based on the seq2seq method.
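The percentages above can be read as relative reductions in scheduling length. This section does not state the formula explicitly, so the following is a sketch under the assumption that the reduction is measured against the baseline method; the symbols $L_{\text{base}}$ (scheduling length of the method being compared against) and $L_{\text{TLGC}}$ (scheduling length of TLGC on the same DAG) are introduced here for illustration only.

$$
\text{reduction} = \frac{L_{\text{base}} - L_{\text{TLGC}}}{L_{\text{base}}} \times 100\%
$$

Under this reading, a hypothetical baseline schedule of length 100 and a TLGC schedule of length 85 would correspond to a 15% reduction.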

###### 6.3.2. Results on the Real Workflow DAGs

The scheduling results for the six real DAG workflows by various scheduling methods are shown in Figure 9. It can be found that the TLGC scheduling scheme still outperforms the conventional scheduling algorithms and all the other neural network-based methods. Specifically, compared with the conventional scheduling algorithms and the scheduling strategy based on the sequence to sequence model, the TLGC scheme can reduce the scheduling lengths by around 20%. Compared with the scheduling strategies based on GAT or GraphSAGE models, the TLGC scheme can reduce the scheduling lengths by around 10%.

We also evaluate the scheduling lengths of the same workflow (cycles) DAG under different scheduling methods when increasing the number of processors. Figure 10(a) compares HEFT (the frequently used conventional scheduling algorithm), sequence-to-sequence scheduling, and the TLGC scheduling model. Figure 10(b) compares the TLGC scheduling model with the other two neural network models (GAT and GraphSAGE). It can be seen from the figure that, when the number of available processors is small, each doubling of the number of processors (from 5 to 10 and then to 20) sharply decreases the scheduling lengths. The curves then flatten, and the scheduling lengths decrease slowly. Once the number of processors reaches 80, the scheduling lengths barely change with the number of processors. This means that the number of processors is no longer the bottleneck of the task scheduling strategy, and adding more processors cannot further reduce the scheduling lengths.

**(a) Comparison of TLGC with conventional algorithm (HEFT) and seq2seq model**

**(b) Comparison of TLGC with graph neural network models (GAT, GraphSAGE)**

The scheduling length of the sequence-to-sequence model does not always decrease steadily as the number of processors increases. For example, when the number of processors increases from 70 to 80, the scheduling length increases rather than decreases, indicating that the sequence-to-sequence model does not thoroughly learn the structural information of the DAG and that its scheduling performance is unstable.

In addition, as shown in Figure 10, the TLGC scheduling scheme always achieves the shortest scheduling lengths as the number of processors increases. Compared with HEFT and the sequence-to-sequence scheduling model, the scheduling lengths are reduced by around 20%, and this proportion remains stable as the number of processors increases. Compared with the other two graph neural network scheduling models (GAT and GraphSAGE), the scheduling lengths are reduced by around 10%, and this proportion also remains stable as the number of available processors increases.
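The saturation effect described in this subsection, where adding processors eventually stops shortening the schedule, can be reproduced on a toy example. The sketch below is not TLGC, HEFT, or any method from this paper: it is a plain greedy earliest-finish-time list scheduler on identical processors, and the fork-join DAG, the task times, the unit communication delays, and the name `list_schedule` are all assumptions made for illustration. A communication delay is charged only when a task and its predecessor run on different processors.

```python
from collections import defaultdict


def list_schedule(compute, edges, num_procs):
    """Greedy earliest-finish-time list scheduling (illustrative only).

    compute: dict mapping task -> computation time
    edges:   dict mapping (u, v) -> communication delay, paid only when
             u and v are placed on different processors
    Returns the makespan (maximum finish time over all tasks).
    """
    preds, succs = defaultdict(list), defaultdict(list)
    indegree = {task: 0 for task in compute}
    for (u, v) in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1

    # Topological order via Kahn's algorithm.
    order = []
    ready = sorted(task for task in compute if indegree[task] == 0)
    while ready:
        u = ready.pop(0)
        order.append(u)
        for v in succs[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    proc_free = [0.0] * num_procs  # time at which each processor becomes idle
    placed = {}                    # task -> (processor, finish time)
    for task in order:
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives late.
            data_ready = max(
                (placed[u][1] + (0 if placed[u][0] == p else edges[(u, task)])
                 for u in preds[task]),
                default=0.0,
            )
            finish = max(proc_free[p], data_ready) + compute[task]
            if best is None or finish < best[1]:
                best = (p, finish)
        placed[task] = best
        proc_free[best[0]] = best[1]
    return max(finish for _, finish in placed.values())


# Hypothetical fork-join DAG: entry task "s", four parallel tasks, exit task "e".
compute = {"s": 1, "a": 4, "b": 4, "c": 4, "d": 4, "e": 1}
edges = {("s", t): 1 for t in "abcd"}
edges.update({(t, "e"): 1 for t in "abcd"})

makespans = {p: list_schedule(compute, edges, p) for p in (1, 2, 4, 8)}
print(makespans)
```

On this toy DAG the makespan drops when going from one to two to four processors and then stops improving, because only four tasks can ever run in parallel. The real workflows in Figure 10 are far larger, but the same mechanism limits the benefit of additional processors.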

##### 6.4. Overhead Analysis

In this subsection, we analyze what proportion of the DAG scheduling length is taken up by the time needed to generate the scheduling strategy under various scheduling methods. For the data sets generated with and node , we randomly select ten samples and calculate, for each scheduling method, both the average running time of generating the scheduling strategy and the average scheduling length. We then calculate the proportion of the scheduling strategy generation time in the scheduling length. The results are shown in Table 4.

The TLGC scheme involves the cost of both neural network training and inference. Training requires many data sets and takes a long time. However, once trained, the model can be reused to generate scheduling strategies. Thus, we only consider the inference time for generating the scheduling strategy. Table 4 shows that the scheduling generation time of TLGC is much higher than that of the conventional scheduling algorithms. However, for a relatively large DAG, the scheduling strategy generation overhead accounts for only about 1% of the scheduling length, which is negligible.
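The overhead proportion described in this subsection can be sketched as the ratio of the two averages. The function name `overhead_ratio` and all numbers below are hypothetical and are not taken from Table 4; the sketch also assumes that generation times and scheduling lengths are expressed in the same time unit.

```python
from statistics import mean


def overhead_ratio(generation_times, scheduling_lengths):
    """Ratio of mean strategy-generation time to mean scheduling length.

    Both sequences hold one value per sampled DAG and must use the same
    time unit. This mirrors the averaging described in the text.
    """
    if not generation_times or len(generation_times) != len(scheduling_lengths):
        raise ValueError("need two non-empty sequences of equal length")
    return mean(generation_times) / mean(scheduling_lengths)


# Made-up measurements for ten sampled DAGs (not the values in Table 4).
generation_times = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0]
scheduling_lengths = [95, 105, 100, 110, 90, 100, 102, 98, 100, 100]

ratio = overhead_ratio(generation_times, scheduling_lengths)
print(f"generation overhead: {ratio:.1%} of the scheduling length")
```

With these invented numbers the overhead comes out at roughly 1%, the order of magnitude reported for relatively large DAGs.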

#### 7. Conclusion

This paper studies the DAG scheduling problem, which aims to schedule all the tasks offloaded from vehicles onto the servers of the roadside units. The objective is to minimize the scheduling length, i.e., the maximum finish time of all tasks. We propose the TLGC scheduling scheme, which applies reinforcement learning on top of a graph convolutional neural network. Unlike previous works [3, 5], TLGC considers the communication delay between tasks, which makes minimizing the scheduling length more challenging. Compared with the representative conventional scheduling methods (HEFT [9], LC [8], and CPOP [9]) and the scheduling model based on seq2seq [10], TLGC reduces the scheduling length by 15% to 25%, and its scheduling performance remains stable as the number of processors increases. Compared with the other graph neural network models (GAT [11], GraphSAGE [12]), TLGC reduces the scheduling length by 8% to 15%.

#### Data Availability

The experiment data sets are from previously reported studies cited in the paper.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant numbers 61832006 and 61972447).