Abstract
The development of connected and automated vehicle (CAV) techniques is bringing a revolution to traffic management. Controlling CAVs in potential conflict areas such as on-ramps and intersections will be a complex task for traffic management once they are deployed. There is still no general framework for dispatching CAVs at these bottlenecks that ensures safety, traffic efficiency, and low energy consumption in real time. This study aims to fill this gap by proposing a comprehensive cooperative intelligent driving framework that can be used in both on-ramp and intersection scenarios. Based on a multi-objective evolutionary algorithm, the CAVs are represented as a passing sequence to be searched in the solution space, while a multitask learning neural network with an adaptive loss function provides feedback on the optimization targets as a surrogate for the simulation test procedure. The simulation results show that the proposed framework achieves satisfactory performance with low time and energy consumption. It reduces time consumption by up to 16.51% in the on-ramp scenario and 9.8% in the intersection scenario, while reducing energy consumption by up to 16.39% and 11.39% in the two scenarios, respectively. An analysis of computation time is also carried out, illustrating the flexibility and controllability of the new strategy.
1. Introduction
Connected and automated vehicles are expected to play an important role in improving traffic efficiency and saving energy [1]. Erratic driving behavior can easily lead to a series of problems, including traffic congestion, excess energy consumption, and accidents [2–4], but transport systems consisting of intelligent vehicles can make a difference by using vehicle-to-everything (V2X) communication and advanced control techniques [5–7].
The development of connected and automated vehicles (CAVs) brings both opportunities and challenges to traffic management. As bottlenecks in traffic organization, intersections and on-ramps have become research hot spots in the domain [8–10]. Conventionally, vehicles must obey traffic signals in urban scenarios, and corresponding studies have been proposed to optimize vehicle trajectories in this case [1, 11, 12]. In a traffic environment composed entirely of CAVs, traffic signals can be eliminated because the information on the road can be fully obtained [13] and the vehicles on the road can be fully controlled. Cooperative control of CAVs can then be implemented through V2X communication [14]. Thus, the design of a cooperative driving strategy that uses real-time traffic information becomes particularly important. Ann and Colombo [15] pointed out that an effective cooperative driving framework can work in different traffic scenarios such as intersections, merging roadways, and roundabouts. Given the significance of cooperative driving, researchers have proposed many theoretical methods to solve the problem in different scenarios. Grand cooperative driving challenges have also been organized to promote its development in practice [16].
Some of the existing studies are optimization-based. Yan et al. [17] proposed a dynamic programming algorithm to evacuate vehicles at the intersection as soon as possible. Zhu and Ukkusuri [18] put forward a linear programming model to dispatch vehicles at autonomous intersections in order to minimize total travel time. In addition, mixed-integer linear programming (MILP) is widely used to obtain solutions [19–21], and Li and Wang [13] proposed a framework based on the optimization principle that uses a tree search algorithm for the same purpose. All of the listed studies focus on searching for optimal solutions under different prior hypotheses.
Relevant studies pointed out that the key to solving the problem is determining the right-of-way for CAVs approaching the merging area [22–24]. In other words, the vehicles can be formulated as a passing sequence in the form of arrays, and the performance of the schedule strategy hinges on the way to generate the best passing order among a large number of possible solutions.
In terms of generating passing orders, the existing studies can be classified into two categories. One is the rule-based strategy, which uses heuristic rules to determine the passing order of vehicles. Dresner and Stone [25, 26] proposed a reservation-based system that assigns right-of-way to vehicles on a first-come-first-served (FCFS) basis. Although the effectiveness of the FCFS method has been demonstrated [26, 27], its rule-based nature leads to solutions that are feasible but generally not optimal. Moreover, the reservation-based strategy cannot outperform traditional signal control in some cases [28, 29]. Because rule-based strategies do not always perform well, the other approach to generating passing orders, called the “planning-based strategy,” was introduced [13]. Meng et al. demonstrated through comprehensive simulations that the planning-based strategy consistently outperforms the FCFS method in intersection scenarios [30]. The planning-based strategy is a framework that searches for optimal solutions in a huge solution space; it is essentially a traversal problem with intolerable computational complexity. Therefore, subsequent studies focused on reducing computing time. Xu et al. [31] proposed a grouping-based strategy, which groups CAVs to reduce the number of possible solutions. In another study of theirs, a Monte Carlo tree is built to balance coordination performance and computation time [32]. Meanwhile, Zhang et al. [33] reported a framework that uses a neural network as a surrogate for the simulation test process in order to reduce computation time. However, the only optimization targets these studies considered are traffic efficiency indexes such as passing time or total delay, while the values of other targets such as energy consumption or queue length are difficult to acquire. This is caused by the weakness of the trajectory interpretation algorithm in their studies.
Therefore, there is still a lack of a real-time, multi-objective cooperative driving strategy that can be maneuverable and reliable. To this end, we design a multi-objective discrete evolutionary algorithm (MODEA) to search for (near) optimal passing orders, which combines the non-dominated sorting method [34] and state transition algorithm [35]. A multitask learning model is proposed to be a regressor, which can give feedback of objective values to MODEA. The scenario is simulated by Simulation of Urban MObility (SUMO) [36]. The simulation results indicate that the framework can be applied to different scenarios, performing well even under a high concurrency environment.
The rest of the study is arranged as follows: Section “Problem Statement” gives the general form of cooperative driving problems and traffic scenarios the paper studied. Section “Methodology” presents the framework we proposed, including the MODEA and multitask learning method in detail. Section “Simulation and Analysis” provides the simulation results of a series of experiments. Finally, conclusions are given in Section “Conclusion.”
2. Problem Statement
Highway on-ramps and urban unsignalized intersections are two typical scenarios for cooperative driving (see Figure 1). Rios-Torres and Malikopoulos [37] pointed out that the two mainstream frameworks in the cooperative driving field are centralized coordination and decentralized coordination, respectively, while the method proposed in this study belongs to the former. The centralized frameworks rely on a central controller responsible for computing and sending control commands. The controller has a communication range (CR) that defines the boundary of communication and control. This article denotes the CR as a circle, which is widely adopted in previous studies [30, 33, 38]. Only vehicles within the CR will communicate with the controller and be controlled.

The following assumptions are made to simplify the analysis and implementation:
(i) Lane-changing behaviors are prohibited in the CR for safety reasons.
(ii) The system has no interference from pedestrians or non-motor vehicles.
(iii) All CAVs can transmit their id, position, speed, and other precise information to the controller spontaneously.
(iv) The vehicles are homogeneous pure electric CAVs for the purpose of estimating energy consumption. The energy model can be found in [39].
The general form of the objective function in cooperative driving can be defined as follows:
$$\min_{S} f(S), \tag{1}$$
where $f$ is the function that represents queue length, energy consumption, or traffic delay, and $S$ is the independent variable that determines the value of the optimization target. In this study, two objectives are considered: (a) minimizing the time consumption to evacuate all CAVs in the CR and (b) minimizing the electricity consumption of the CAVs under a scheduling scheme.
The input $S$ of the function is a passing sequence, which can be denoted as follows:
$$S = [s_1, s_2, \ldots, s_N], \tag{2}$$
where $N$ is the number of vehicles in the CR. Let $T$ be the time consumption to evacuate all CAVs in the CR and $E$ be the corresponding electricity consumption; then (1) can be transformed as follows:
$$\min_{S} \{T(S), E(S)\}. \tag{3}$$
Here, $T$ can be denoted as follows:
$$T = \max_{1 \le i \le N} t_{s_i}, \tag{4}$$
where $t_{s_i}$ represents the time when vehicle $s_i$ exits from the CR. $E$ can be denoted as follows:
$$E = \sum_{i=1}^{N} \sum_{t} e_{s_i}(t), \tag{5}$$
where $e_{s_i}(t)$ represents the energy consumption of $s_i$ at discrete time step $t$; readers can refer to [39] for the stepwise energy consumption model.
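As a minimal sketch of how the two objectives could be evaluated for one simulated schedule, the following Python snippet takes the evacuation time as the latest exit time among the CAVs and sums the stepwise energy over all vehicles. The function name `evaluate_schedule`, the dictionary layout, and the numbers are illustrative assumptions; the stepwise energy values would in practice come from the model in [39].

```python
from typing import Dict, List, Tuple


def evaluate_schedule(exit_times: Dict[str, float],
                      step_energy: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return (time consumption, energy consumption) of one passing sequence.

    exit_times  -- hypothetical map: CAV id -> time at which it leaves the CR
    step_energy -- hypothetical map: CAV id -> energy used in each time step
    """
    # Assumption: all CAVs are evacuated when the last one has left the CR.
    evacuation_time = max(exit_times.values())
    # Energy is accumulated over every time step of every CAV.
    total_energy = sum(sum(steps) for steps in step_energy.values())
    return evacuation_time, total_energy


# Made-up values for three CAVs, only to show the call.
exit_times = {"A": 12.0, "B": 15.5, "C": 14.0}
step_energy = {"A": [0.5, 0.25], "B": [0.25, 0.25, 0.5], "C": [1.0]}
T, E = evaluate_schedule(exit_times, step_energy)
print(T, E)
```

In the framework itself these values are produced by SUMO for training data and predicted by the neural network during the search.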
3. Methodology
Figure 2 illustrates the procedure of the proposed framework. The framework uses MODEA with non-dominated sorting and a multitask neural network to reduce computation time and implement multi-objective optimization. A population-based evolutionary algorithm searches the solution space, while the fitness value of every individual is obtained from a neural network that plays the role of target regressor. The framework is introduced in detail below.

3.1. Multitask Learning Model
Learning several tasks jointly has been found to improve performance compared with learning them individually [40]. Thus, in this study, a multitask deep learning model is trained to provide objective feedback to the evolutionary algorithm. The task of the model is therefore to learn the target values produced in each traffic state. Here, the targets are the time consumption and electricity consumption defined in equations (4) and (5).
To perform the regression task, the input must be appropriately expressed. As in equation (2), a passing sequence can be denoted as an array of CAV ids. We define the encoding of a single CAV $s_i$ as follows:
$$c_i = [p_i, v_i, a_i, l_i], \tag{6}$$
where $p_i$ is the position of $s_i$ measured from the beginning of the lane, $v_i$ is its speed, and $a_i$ is its acceleration; these quantities are normalized before being input into the model.
In addition, $l_i$ is the encoding of the lane that vehicle $s_i$ is driving in. The encoding method depends on the traffic scenario. For the on-ramp scenario shown in Figure 1(a), one-hot encoding is applied. In the intersection scenario, however, the spatial relationship is taken into account by combining the approach direction with the driving direction. Figure 3 shows the encoding process, taking the scenarios in Figure 1 as an example. For instance, vehicle D comes from the west approach and will turn left at the intersection, so its lane encoding combines the code of the west approach with the code of the left turn. Finally, a passing sequence can be formulated as the concatenation of the encodings of its CAVs.
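The encoding for the on-ramp scenario can be sketched as follows. The snippet builds a length-5 vector per CAV (normalized position, speed, and acceleration plus a two-lane one-hot code) and concatenates the vectors in passing order, padding with zeros up to a fixed maximum number of CAVs. The normalization constants `CR_RADIUS`, `V_MAX`, and `A_MAX` and all function names are assumptions made for illustration, not values taken from the study.

```python
from typing import List, Sequence, Tuple

# Hypothetical normalization constants (not from the paper).
CR_RADIUS = 200.0   # metres
V_MAX = 20.0        # metres per second
A_MAX = 3.0         # metres per second squared
N_LANES = 2         # on-ramp scenario: main road and ramp
WIDTH = 3 + N_LANES # length of one CAV encoding


def encode_cav(position: float, speed: float, accel: float, lane: int) -> List[float]:
    """Encode one CAV as [position, speed, acceleration, one-hot lane]."""
    one_hot = [1.0 if k == lane else 0.0 for k in range(N_LANES)]
    return [position / CR_RADIUS, speed / V_MAX, accel / A_MAX] + one_hot


def encode_sequence(cavs: Sequence[Tuple[float, float, float, int]],
                    max_cavs: int) -> List[float]:
    """Concatenate CAV encodings in passing order and zero-pad the tail."""
    if len(cavs) > max_cavs:
        raise ValueError("more CAVs than the fixed input length allows")
    flat: List[float] = []
    for cav in cavs:
        flat.extend(encode_cav(*cav))
    flat.extend([0.0] * (WIDTH * (max_cavs - len(cavs))))
    return flat


vec = encode_sequence([(100.0, 10.0, 0.0, 0), (50.0, 20.0, 3.0, 1)], max_cavs=3)
print(len(vec), vec[:5])
```

Because the vector depends on the order of the CAVs, two different passing sequences over the same vehicles yield different inputs to the network.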

Once the vectorized representations of passing sequences are constructed, a neural network model can be built to take the vectors as input. Similar to TextCNN [41], we use a convolutional neural network (CNN) to carry out the learning process, because a CNN can extract features from the original data automatically [42]. The structure of the CNN-based multitask learning model is shown in Figure 4. The backbone part takes sequence vectors consisting of several CAV encodings as inputs and extracts latent feature representations for them; the task-specific part then takes the feature representations as input and outputs the time consumption and energy consumption of the sequences in a specific traffic scenario. In the backbone part, one-dimensional convolution layers with different kernel sizes are applied to extract features.

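After determining the basic structure of the neural network, the loss function should be specified to train the learning model towards the optimization goals. If the two targets were trained in two single-task models, the loss functions would be the mean squared error (MSE):
$$L_T = \frac{1}{n}\sum_{i=1}^{n} (\hat{T}_i - T_i)^2, \qquad L_E = \frac{1}{n}\sum_{i=1}^{n} (\hat{E}_i - E_i)^2, \tag{7}$$
where $n$ is the count of test samples, $\hat{T}_i$ and $\hat{E}_i$ are predicted values, and $T_i$ and $E_i$ are the ground truth. Generally, the loss function of a multitask learning model can be defined as the naive weighted sum of the losses:
$$L = w_1 L_T + w_2 L_E, \tag{8}$$
where the loss weights $w_1$ and $w_2$ are uniform or manually tuned. The performance of the model depends heavily on the settings of these weights. Cipolla et al. pointed out that the loss function can instead be derived by maximizing the Gaussian likelihood with homoscedastic uncertainty [43]. Let $f^{W}(x)$ be the output of the neural network with weights $W$ on input $x$; the Gaussian likelihood can be defined as follows:
$$p(y \mid f^{W}(x)) = \mathcal{N}(f^{W}(x), \sigma^2), \tag{9}$$
where $\sigma$ is a scalar that represents observation noise. Taking $f^{W}(x)$ as the sufficient statistics, the multitask likelihood for the time target $y_1$ and the energy target $y_2$ can be derived as follows:
$$p(y_1, y_2 \mid f^{W}(x)) = p(y_1 \mid f^{W}(x)) \, p(y_2 \mid f^{W}(x)) = \mathcal{N}(y_1; f^{W}(x), \sigma_1^2) \, \mathcal{N}(y_2; f^{W}(x), \sigma_2^2). \tag{10}$$
Taking the negative logarithm and dropping constant terms, the new loss function can be defined as follows:
$$L(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2} L_T(W) + \frac{1}{2\sigma_2^2} L_E(W) + \log \sigma_1 + \log \sigma_2. \tag{11}$$
Notice that $\sigma_1^2$ and $\sigma_2^2$ are denominators in equation (11). To avoid division-by-zero errors, the log-variances are used in the actual training process:
$$s_k = \log \sigma_k^2, \quad k = 1, 2, \tag{12}$$
$$L(W, s_1, s_2) = \frac{1}{2} e^{-s_1} L_T(W) + \frac{1}{2} e^{-s_2} L_E(W) + \frac{1}{2} s_1 + \frac{1}{2} s_2. \tag{13}$$
Finally, the loss function is given in equation (13), whose task weights adapt during the training process.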
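The adaptive loss can be illustrated numerically. The sketch below assumes the common log-variance form of the homoscedastic-uncertainty loss, in which each task loss is scaled by exp(-s)/2 and a penalty s/2 is added, with s standing for the logarithm of that task's noise variance. The function names are hypothetical, and in a real training loop `s_time` and `s_energy` would be trainable parameters updated together with the network weights.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean squared error of one regression target."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def adaptive_loss(loss_time: float, loss_energy: float,
                  s_time: float, s_energy: float) -> float:
    """Uncertainty-weighted sum of two task losses.

    s_time and s_energy are log-variances; exp(-s) replaces 1/sigma^2,
    so no division takes place even when the variance becomes tiny.
    """
    weighted_time = 0.5 * math.exp(-s_time) * loss_time + 0.5 * s_time
    weighted_energy = 0.5 * math.exp(-s_energy) * loss_energy + 0.5 * s_energy
    return weighted_time + weighted_energy


# With both log-variances at zero the loss is half the plain sum.
print(adaptive_loss(2.0, 4.0, 0.0, 0.0))
# Raising the log-variance of the noisier task down-weights it,
# at the price of the additive penalty term.
print(adaptive_loss(2.0, 4.0, 0.0, math.log(2.0)))
```

The penalty term keeps the optimizer from driving both weights to zero, which is what makes the weighting adaptive rather than degenerate.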
4. Multi-Objective Discrete Evolutionary Algorithm
Generally, the average count of possible passing sequences in cooperative driving grows almost exponentially with the increase in numbers of CAVs in CR [30]. Thus, searching for the best solution is hard when the number of CAVs is large, so this study proposes a population-based evolutionary algorithm to obtain (near) optimal passing order from this perspective.
In multi-objective optimization problems, a solution is selected from the Pareto optimal solutions according to the practical problem [44]. The concept of the Pareto optimal solution set is introduced below. Let $T$ and $E$ denote the time and energy consumption of a passing sequence. In this minimization problem, solution $S_1$ Pareto dominates solution $S_2$ only if:
$$T(S_1) \le T(S_2), \quad E(S_1) \le E(S_2), \quad \text{and at least one inequality is strict}. \tag{14}$$
We use the corresponding symbol to denote the domination relationship:
$$S_1 \prec S_2, \tag{15}$$
which represents that $S_1$ dominates $S_2$. If there is no solution that dominates $S$, then $S$ is called a non-dominated solution. Accordingly, the Pareto optimal solution set $P^{*}$ can be defined as the set consisting of all the non-dominated solutions. Therefore, the primary purpose of the algorithm is to search for the Pareto optimal solutions. If there is more than one element in the Pareto optimal solution set, two kinds of heuristic strategies can be used:
(i) Delay-first strategy (DFS): always choose the solution with minimal time consumption from $P^{*}$.
(ii) Energy-first strategy (EFS): always choose the solution with minimal energy consumption from $P^{*}$.
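The dominance test and the two selection strategies can be sketched in a few lines of Python. Each candidate is represented here only by its objective pair (time, energy); the function names and the sample values are illustrative assumptions.

```python
from typing import List, Tuple

Objectives = Tuple[float, float]  # (time consumption, energy consumption)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def choose(front: List[Objectives], strategy: str) -> Objectives:
    """Pick one solution: 'DFS' minimizes time, 'EFS' minimizes energy."""
    if strategy not in ("DFS", "EFS"):
        raise ValueError("strategy must be 'DFS' or 'EFS'")
    index = 0 if strategy == "DFS" else 1
    return min(front, key=lambda p: p[index])


candidates = [(10.0, 5.0), (12.0, 4.0), (11.0, 6.0), (12.0, 5.0)]
front = pareto_front(candidates)
print(front, choose(front, "DFS"), choose(front, "EFS"))
```

The quadratic scan is adequate for populations of a few dozen individuals; larger populations would call for a faster non-dominated sorting routine.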
The form of the candidate solutions in the algorithm is denoted as equation (2), while the initialization operation is generating different integers with ranges from 1 to . The feasible solutions make up a population in the evolutionary algorithm. Considering that lane-changing behavior is prohibited in CR, some solutions will be illegal. For example, in Figure 1(a), the passing order cannot be accepted as candidate solution because is supposed to be in front of . Hence, a repair operation is applied to repair illegal sequence, which is defined as follows:where represents a passing sequence that can be a candidate, and is a matrix that carries out the repair operation. The matrix is constructed according to the order of vehicles on the lanes. For unfeasible sequence , which represents “C-A-B” in Figure 1(a), is as follows:
Then, will be transformed to , which represents “A-C-B,” and it will be legal.
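One way to realize the repair operation without building the matrix explicitly is to keep the slots that each lane occupies in the sequence and refill them with that lane's vehicles in road order. The sketch below is an assumption-laden illustration: the function name `repair` is hypothetical, and the lane assignment (A ahead of C on one lane, B alone on the other) is assumed so that the example reproduces the "C-A-B" to "A-C-B" transformation mentioned in the text.

```python
from typing import Dict, List


def repair(sequence: List[str], lane_order: Dict[str, List[str]]) -> List[str]:
    """Make a passing sequence respect the front-to-rear order on each lane.

    lane_order maps a lane name to its vehicles listed from front to rear.
    Each position keeps its lane, but receives the next vehicle of that lane.
    """
    lane_of = {veh: lane for lane, vehs in lane_order.items() for veh in vehs}
    next_in_lane = {lane: iter(vehs) for lane, vehs in lane_order.items()}
    return [next(next_in_lane[lane_of[veh]]) for veh in sequence]


# Assumed layout: A is in front of C on the same lane, B is on the other lane.
lanes = {"lane_1": ["A", "C"], "lane_2": ["B"]}
print(repair(["C", "A", "B"], lanes))  # illegal order, gets repaired
print(repair(["A", "B", "C"], lanes))  # already legal, left unchanged
```

The result is a permutation of the input, so the same mapping could be written as multiplication by a permutation matrix.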
The proposed algorithm uses selection operation, crossover operation, state transition with swap operation, shift operation, and symmetry operation for population evolution. Corresponding operations can be described as follows.
4.1. Selection Operation
Non-dominated sorting technique is used for layering individuals. Algorithm 1 shows the process of non-dominated sorting. In the algorithm, is the non-dominated level, and is the set of all the non-dominated solutions in ; fitness represents the virtual value of individuals, which is used for selection operation. Eventually, the roulette wheel method is applied to choose individuals in the population; then, the crossover operation can be carried out. In the roulette wheel method, the selection probability of individual is defined as follows:where is the value of after iterations in Algorithm 1.
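A possible reading of the selection step is sketched below: individuals are peeled off into successive non-dominated levels, each level receives a virtual fitness, and roulette wheel sampling uses probabilities proportional to that fitness. The specific fitness assignment (number of levels minus the level index) is an assumption chosen for illustration and is not necessarily the rule used in Algorithm 1; all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_levels(objs: Sequence[Objectives]) -> List[List[int]]:
    """Split individual indices into successive non-dominated levels."""
    remaining = set(range(len(objs)))
    levels: List[List[int]] = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i]) for j in remaining))
        levels.append(front)
        remaining -= set(front)
    return levels


def selection_probabilities(objs: Sequence[Objectives]) -> List[float]:
    """Assumed rule: earlier levels get a larger virtual fitness."""
    levels = nondominated_levels(objs)
    fitness = [0.0] * len(objs)
    for rank, front in enumerate(levels):
        for i in front:
            fitness[i] = float(len(levels) - rank)
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(population: Sequence, probs: Sequence[float],
                    k: int, rng: random.Random) -> List:
    """Draw k parents with replacement, proportionally to probs."""
    return rng.choices(list(population), weights=list(probs), k=k)


objs = [(10.0, 5.0), (12.0, 4.0), (11.0, 6.0), (13.0, 7.0)]
probs = selection_probabilities(objs)
parents = roulette_select(["s0", "s1", "s2", "s3"], probs, k=2, rng=random.Random(0))
print(nondominated_levels(objs), probs, parents)
```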
4.2. Crossover Operation
Tie-breaking crossover [45] is used in this study. This operation prevents duplicated entries from appearing in an offspring sequence, and the procedure is shown in Figure 5. The start position and length of the exchanged subsequences are generated randomly, so the sequences obtained directly after the exchange may contain duplicated items. A crossover map, which is a random ordering of integers, is also generated. The exchanged sequences are then transformed by multiplying each element by the length of the sequence and adding the corresponding element of the crossover map. Finally, as shown in phase 3 of Figure 5, the offspring are produced by a sorting operation on the transformed values.
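The three phases can be sketched as follows. The snippet assumes that the crossover map is a random permutation of 0 to n-1 (so that it only breaks ties between duplicated entries) and that the final sorting step replaces each transformed value by its rank; function names are hypothetical.

```python
import random
from typing import List, Tuple


def rank_to_permutation(keys: List[int]) -> List[int]:
    """Replace every key by its rank (1 = smallest); keys must be distinct."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    perm = [0] * len(keys)
    for rank, idx in enumerate(order, start=1):
        perm[idx] = rank
    return perm


def tie_breaking_crossover(p1: List[int], p2: List[int],
                           rng: random.Random) -> Tuple[List[int], List[int]]:
    n = len(p1)
    # Phase 1: exchange a random segment; children may contain duplicates.
    start = rng.randrange(n)
    stop = start + rng.randint(1, n - start)
    c1, c2 = list(p1), list(p2)
    c1[start:stop], c2[start:stop] = p2[start:stop], p1[start:stop]
    # Phase 2: scale by n and add a random map (assumed range 0..n-1),
    # which makes all keys distinct without reordering distinct entries.
    cmap = list(range(n))
    rng.shuffle(cmap)
    k1 = [g * n + m for g, m in zip(c1, cmap)]
    k2 = [g * n + m for g, m in zip(c2, cmap)]
    # Phase 3: sort the keys and read off the ranks as the offspring.
    return rank_to_permutation(k1), rank_to_permutation(k2)


child1, child2 = tie_breaking_crossover([1, 2, 3, 4, 5], [3, 5, 1, 2, 4],
                                        random.Random(0))
print(child1, child2)
```

The offspring are always valid permutations, but they may still violate the lane order and would then need the repair operation.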

4.3. State Transition
The state transition procedure is applied with a predefined probability $p$. In this study, $p$ is set to 0.2 to balance exploration and exploitation. The state transition operations include swap, shift, and symmetry [35]. The swap transformation randomly exchanges subsequences within a passing sequence; the shift transformation translates a subsequence to another position; and the symmetry transformation exchanges the values of two subsequences that are symmetrical about a selected central point. These operations can be implemented by matrices, which can be denoted as follows:
$$S_{k+1} = A_k^{\mathrm{sym}} S_k, \qquad S_{k+1} = A_k^{\mathrm{shift}} S_k, \qquad S_{k+1} = A_k^{\mathrm{swap}} S_k,$$
where $S_k$ is a passing sequence after $k$ iterations, and $A_k^{\mathrm{sym}}$, $A_k^{\mathrm{shift}}$, and $A_k^{\mathrm{swap}}$ represent the matrices that implement the symmetry, shift, and swap operations, respectively. Figure 6 illustrates the three transformations. The length of the subsequences is a hyperparameter of the swap and shift transformations, and its value is generated randomly according to the number of CAVs. For the symmetry operation, both the length of the subsequences and the position of the symmetry center are generated randomly. Note that boundary conditions are handled when element indexes would fall out of bounds.
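The three operators can be written directly on lists rather than as matrices. The sketch below is one plausible interpretation: swap exchanges two equally long, non-overlapping blocks, shift moves a block to a new position, and symmetry reverses a window around a center, shrinking the window at the boundaries. The function names, the block length of one in the random wrapper, and the boundary rule are assumptions.

```python
import random
from typing import List, Sequence


def swap(seq: Sequence[int], i: int, j: int, length: int = 1) -> List[int]:
    """Exchange the blocks starting at i and j (i < j, no overlap)."""
    if not (0 <= i and i + length <= j and j + length <= len(seq)):
        raise ValueError("blocks overlap or leave the sequence")
    out = list(seq)
    out[i:i + length], out[j:j + length] = out[j:j + length], out[i:i + length]
    return out


def shift(seq: Sequence[int], i: int, length: int, dest: int) -> List[int]:
    """Cut the block seq[i:i+length] and reinsert it at index dest."""
    out = list(seq)
    block = out[i:i + length]
    del out[i:i + length]
    out[dest:dest] = block
    return out


def symmetry(seq: Sequence[int], center: int, half: int) -> List[int]:
    """Mirror the elements within `half` positions of `center`."""
    half = min(half, center, len(seq) - 1 - center)  # boundary handling
    out = list(seq)
    out[center - half:center + half + 1] = out[center - half:center + half + 1][::-1]
    return out


def state_transition(seq: Sequence[int], rng: random.Random, p: float = 0.2) -> List[int]:
    """Apply one randomly chosen operator with probability p."""
    n = len(seq)
    if n < 2 or rng.random() >= p:
        return list(seq)
    op = rng.choice(["swap", "shift", "symmetry"])
    if op == "swap":
        i, j = sorted(rng.sample(range(n), 2))
        return swap(seq, i, j)
    if op == "shift":
        return shift(seq, rng.randrange(n), 1, rng.randrange(n))
    return symmetry(seq, rng.randrange(n), rng.randint(1, n // 2))


print(swap([1, 2, 3, 4, 5], 0, 3, 2))
print(shift([1, 2, 3, 4, 5], 0, 2, 3))
print(symmetry([1, 2, 3, 4, 5], 2, 1))
```

Every operator returns a permutation of its input, so the mutated sequence remains a valid candidate apart from possible lane-order violations.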

5. Vehicle Control
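Once a passing order is determined, the CAVs move according to the sequence. First of all, the motion of each vehicle is constrained by the speed limit and its acceleration ability:
$$0 \le v_i(t) \le v_{\max}, \qquad a_{\min} \le a_i(t) \le a_{\max},$$
where $v_i(t)$ and $a_i(t)$ are the speed and acceleration of vehicle $i$ at time $t$, $v_{\max}$ denotes the maximum speed limit on the road, $a_{\min}$ is the maximum deceleration, and $a_{\max}$ is the maximum acceleration permitted by vehicle dynamics.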
The virtual vehicle mapping method is used in the framework to ensure safety [46, 47]. Taking the case in Figure 1(a) as an example, if the passing order is “A-C-B,” then C will be mapped into . CAV B will then follow a virtual vehicle mapped by CAV C, which means the mode of motion of CAVs will be divided into two cases: free driving and car following, respectively. The control process of the CAVs in sequences can be given by Algorithm 2. Accordingly, is a function to judge whether there are potential conflicts between and . is a function to guide vehicle to follow vehicle . The equation of can be denoted as follows:where is the start time and is the time when arrives at the conflict zone or stop line. In addition, is the value of the safe gap between two consecutive CAVs. The gap here represents the distance from the front of the following vehicle to the rear of the leading vehicle. If is a real CAV, the value is set to , and if is a virtual vehicle, a correction factor should be added to it, which can be denoted as follows:where is a bool variable, if is virtual, the value of will be 1, and will be the distance for to cross the conflict zone.
Using Algorithm 2, the first CAV in sequence drives freely, and a CAV with a minimal relative distance with the first CAV in the rest of the sequence is chosen as car following target.
Finally, if a passing sequence is determined, it will not be altered unless the set of CAVs in CR changes.
6. Simulation and Analysis
6.1. Simulation Platform and CNN Training
This study uses the microscopic traffic simulation software SUMO to study the cooperative driving strategy in two traffic scenarios in Figure 1. Under the premise of comprehensive consideration of reality, the simulation settings are given in Table 1. The simulation step is set to for smoother time-continuous control. The radius of CR in the on-ramp scenario is set to by considering the communication capability [38]. Meanwhile, we set the radius parameter to in the urban intersection scenario because the speed of vehicles in this case is slow, while is enough for vehicle braking.
First of all, more than 50000 records were collected in SUMO for each traffic scenario to serve as the training data. The records include encoding of passing sequence and the combination of two regression targets. We use message-digest algorithm 5 (MD5) to delete duplicated data to ensure the uniqueness of the records. Because the length of CAV encoding in the two scenarios is 5 and 9, respectively, the convolution kernel sizes are set to and to extract different scales of features. The Adam optimizer is used to optimize the weights and biases for the network, and a step decay schedule for learning rate is implemented in the training process for better performance. Accordingly, the rest of the hyperparameters (e.g., batch size, the initial learning rate, and the scales of dense layers) were tuned automatically by applying tree-structured Parzen estimator (TPE), which can search significantly better results compared with random search methods [48].
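The deduplication step can be sketched with Python's standard `hashlib` module. The snippet assumes that a record is a pair (sequence encoding, regression targets) and that the whole record is serialized before hashing; the serialization format and the function names are illustrative choices, not details given in the study.

```python
import hashlib
import json
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], Sequence[float]]  # (encoding, targets)


def record_digest(encoding: Sequence[float], targets: Sequence[float]) -> str:
    """MD5 hex digest of one serialized training record."""
    payload = json.dumps([list(encoding), list(targets)]).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of every distinct record."""
    seen = set()
    unique: List[Record] = []
    for encoding, targets in records:
        digest = record_digest(encoding, targets)
        if digest not in seen:
            seen.add(digest)
            unique.append((encoding, targets))
    return unique


records = [
    ([0.5, 0.5, 0.0, 1.0, 0.0], [15.5, 2.75]),
    ([0.25, 1.0, 1.0, 0.0, 1.0], [14.0, 2.5]),
    ([0.5, 0.5, 0.0, 1.0, 0.0], [15.5, 2.75]),  # exact duplicate of the first
]
print(len(deduplicate(records)))
```

Storing fixed-length digests instead of the full vectors keeps the lookup set small when tens of thousands of records are collected.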
6.2. Simulation Results
To evaluate the proposed strategy comprehensively, we carried out two kinds of simulations based on the pre-trained CNN model. One is a discrete simulation, which is used for observing the performance of the framework under different static numbers of vehicles to be scheduled. The other is a continuous simulation, which is served to evaluate the framework in different traffic demand levels using the trace data exported from SUMO.
We choose the FCFS strategy as the baseline because it is widely used in the domain. The number of iterations and the population size of MODEA are set to 30 and 40, respectively. We generate different numbers of CAVs distributed randomly over the lanes for the two scenarios, and the results of the discrete simulation are shown in Figure 7. The proposed method always performs better than the FCFS method, and in the on-ramp scenario the gap between the two methods becomes more significant as the number of CAVs increases. This verifies the global optimization capability of MODEA, whereas the rule-based FCFS method struggles to find satisfactory solutions. Meanwhile, when there is more than one solution in the Pareto front, the final sequence can be chosen manually according to specific requirements.

For the continuous simulation, different arrival rates of CAVs are deployed for 2000 simulation steps, and the trace data are exported every 4 time steps. The trace datasets include information on the CAVs such as position, speed, and acceleration; we then reload these data in SUMO and carry out the simulations. In other words, the same trace data are used for result comparison so that randomness is eliminated.
All results presented are averaged over 10 independent runs, and the best results are shown in bold in Table 2. According to Table 2, there is no significant difference between DFS and EFS, which may be caused by the regression error of the neural network. However, as the CAV arrival rate increases, the difference between the results of FCFS and the proposed framework becomes more pronounced, which demonstrates that MODEA can optimize the two objectives jointly.
7. Discussion about Computation Time
In cooperative driving tasks, the computation time of algorithms is vital to ensure safety and efficiency. We focus on the time performance of the proposed framework in this part, and we only consider the on-ramp scenario for evaluating computation time because the time complexity of the algorithm in the two scenarios is equal. All experiments were conducted using Julia programming language on Windows 10 operating system with Intel CORE i7-10750H CPU. Meanwhile, BenchmarkTools.jl package is used to precisely evaluate the computation time performance [49].
As Figure 8 shows, the computation time of the proposed method mainly depends on the population size of MODEA, while the number of CAVs in CR has little effect on the computation complexity, which means that we can control the computation time flexibly by setting the population size of the algorithm manually.

Meanwhile, the influence of computation time on the traffic system should be discussed. First, safety is always the most primary goal to be achieved. The impact of computing time on safety considerations will be reflected in the safe gap . The can be roughly revised with the time consumption :where is used for ensuring safety under any circumstance, so that will be changed in simulations in terms of equation (22).
Then, we carry out a series of simulations using the same trace data exported from SUMO to compare the performance of the control framework under different computation delays. In the test, the delay caused by computation varies from 0.1 s to 0.4 s, while DFS is chosen to get solutions. Figures 9(a) and 9(b) show the time consumption and energy consumption under different circumstances. On average, the FCFS rule will outperform the proposed framework in the time consumption aspect when the computation delay reaches 0.3 s, and it will have almost identical performance in the electricity consumption aspect when the computation delay reaches 0.4 s.

8. Conclusions
Over the last few years, many methods have been put forward in the cooperative driving field, but the controllability of optimization objectives and the efficiency of algorithms remain difficult to achieve. By combining an evolutionary algorithm with a machine learning technique, this study proposes an intelligent framework that considers both the delay and the energy consumption of vehicles. An encoding approach for CAVs is implemented, and a passing sequence of CAVs is treated approximately as a sentence in natural language so that TextCNN can be applied to extract features. Compared with other frameworks, it has some significant advantages:
(i) Controllability and flexibility: the optimization objectives and computation time can be adjusted manually, so the framework can be adapted to different design requirements.
(ii) General applicability: similar to the FCFS protocol, the framework can be applied in different cooperative driving scenarios such as intersections and on-ramps.
In future research, a more concrete vehicle control method should be studied to improve practicability. Moreover, the neural network implemented in this study can only deal with a finite number of cases because the input length of the network is fixed. Therefore, the maximum number of CAVs must be specified in advance, and zero padding is used if the number of CAVs is less than the predefined maximum length. Hence, the form of the neural network and the CAV encodings can be further studied for better performance; for example, an encoder-decoder structure could be applied to handle different numbers of CAVs. Finally, the lane-changing behavior of vehicles and pedestrian crossing rules can be considered in the system, which would yield a more complex but more realistic system to study.
Data Availability
The data used to support the findings of this study are produced by simulations.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This study was supported in part by the Key R&D Program of Jiangsu Province in China (No. BE2020013), the Opening Project of Key Laboratory of Intelligent Transportation Systems Technologies, Ministry of Communications, P.R. China (Grant No. 2020-8501), and the R&D Project of China Communications Construction Company (Grant 2019-ZJKJ-ZDZX02-2). The work of the first author was supported in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. SJCX21_0062).