The self-adaptive traffic signal control system is an effective measure for relieving urban traffic congestion. It adjusts signal timing parameters in real time according to seasonal changes and short-term fluctuations in traffic demand, which improves the efficiency of traffic operation on urban road networks. The development of information technologies in computing, autonomous driving, vehicle-to-vehicle communication, and the mobile Internet has created abundant means of acquiring traffic data. Improvements in data acquisition include increases in the amount of available holographic data, in the range of available data types, and in accuracy. This article reviews the development of the self-adaptive signal control systems commonly used around the world, their technical characteristics, the current research status of self-adaptive control methods, and signal control methods for heterogeneous traffic flow that includes connected vehicles and autonomous vehicles. The article concludes that signal control based on multiagent reinforcement learning is a closed-loop feedback adaptive control method that outperforms many counterparts in real-time performance, accuracy, and self-learning. Because its “model-free” and “self-learning” properties suit the abundance of traffic data, it will be an important research focus for future control methods. It will also provide an entry point and technical support for the development of Vehicle-to-X systems, the Internet of vehicles, and the autonomous driving industry. The related achievements in adaptive control systems for the future traffic environment therefore have extremely broad application prospects.

1. Introduction

The number of motor vehicles and the corresponding travel demand are increasing continuously with economic and social development. Frequent traffic congestion in urban road networks has negative impacts on the economy and the environment. Land resources in large cities are limited, and socioeconomic factors restrict the construction of transportation infrastructure. Applying traffic management and control measures reasonably and effectively, improving the efficiency of existing transportation facilities, and accommodating the growing traffic demand of big cities have therefore become significant research topics in counteracting urban traffic congestion.

Traffic control is one of the most important technical means of regulating traffic flow, relieving congestion, and even reducing emissions. Its progress has always accompanied the development of information technology, computer technology, and system science. The self-adaptive control system can adjust the signal timing parameters in real time according to the control target of the manager (such as minimum delay at the intersection) and the arrival characteristics of the traffic flow at the intersection. Compared with fixed-time control and actuated control, the self-adaptive control system makes better use of the overall capacity of the road network and effectively improves the efficiency of network traffic.

The traffic data collected by current traffic control systems using induction loop detectors and other existing sensors are limited. With the advancement of wireless communication technologies and the development of vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) systems, together called Connected Vehicle or V2X, there is an opportunity to optimize the operation of urban traffic networks through cooperation between traffic signal control and driving behavior. This dissertation proposed a series of cooperative optimization methods for urban streets traffic control and driving assistant under the V2X concept. In addition to the existing induction loop detector technology, video, infrared, radar, floating car, and other acquisition technologies and equipment provide the urban traffic control system with networked, dynamically acquired data on traffic flow status and controller state. This has greatly enriched the information environment and provides more possibilities for information-based and intelligent application research. Urban traffic control is moving from a period in which only cross-section traffic flow data were available into a data-rich period of multisource holographic network traffic data.

Recent advances in traffic control methods have led to flexible control strategies for use in adaptive traffic control systems [1]. Digitized and information-based infrastructure for metropolitan road traffic, together with the related systems, has developed rapidly in the past decade. At the same time, intelligent connected vehicles and automated vehicles are jointly shaping a future travel environment. Their abilities to access and perceive individual information, as well as their response times and interactive behavior, differ significantly from those of conventional human-driven vehicles. However, the current self-adaptive traffic signal control system cannot effectively utilize these abundant real-time traffic data, and its theory, methods, and techniques have clearly lagged far behind the progress of its key basic technologies [2]. Therefore, data-driven feedback self-adaptive coordinated control in a data-rich environment has been proposed and is being actively explored by researchers [3].

Moreover, with the continuous improvement of the theory and technology of intelligent control and model-free control, the concept of traffic control is changing under the new traffic data environment, as shown in Figure 1. Researchers hope that the model of the control system will be identified from data rather than derived from the existing mechanism model [4]. They also hope that the system will rely on real-time monitoring data rather than traffic forecast data [5] and that the control system will adjust its control strategy automatically instead of through manual intervention [6].

2. The Development History and Deficiency of the Existing Traffic Self-Adaptive Control System

2.1. The Development History of the Existing Traffic Self-Adaptive Control System

According to NCHRP, more than 20 self-adaptive traffic control systems have been developed by transportation research institutes and enterprises worldwide, but fewer than half of them have been put into use [7]. In 1995, Gartner et al. proposed a classification of the development levels of urban traffic control systems based on each system’s ability to adapt to the environment and its level of intelligent decision-making [8], as shown in Figure 2.

The first-generation self-adaptive control system adopts multi-period fixed-time control with a fine division of time periods, or completely isolated self-adaptive control, to achieve simple regulation of traffic flow. Take the multi-period fixed-time control system as an example. It divides the traffic arriving within a day into multiple periods (such as peak and nonpeak) to reflect changes in daily traffic demand, optimizes the signal timing scheme for each period using the comprehensive performance index method or the green wave band method, and stores the results in a signal timing scheme library [10]. According to the day of the week and the control period, the traffic controller then selects the appropriate offline scheme directly from the library.
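The scheme-library lookup described above can be illustrated with a minimal Python sketch. The plan names, period boundaries, and timing values below are hypothetical placeholders chosen for illustration only; they are not taken from any deployed system or from [10].

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class TimingPlan:
    name: str
    cycle_s: int          # cycle length in seconds
    green_splits: tuple   # green ratio of each phase


# Hypothetical offline scheme library: (day type, start, end, plan).
PLAN_LIBRARY = [
    ("weekday", time(7, 0), time(9, 0), TimingPlan("am_peak", 120, (0.55, 0.35))),
    ("weekday", time(17, 0), time(19, 0), TimingPlan("pm_peak", 120, (0.40, 0.50))),
]
DEFAULT_PLAN = TimingPlan("off_peak", 80, (0.45, 0.45))


def select_plan(weekday_index: int, now: time) -> TimingPlan:
    """Pick the stored plan for the day type and time of day.

    weekday_index follows datetime.weekday(): 0 is Monday, 6 is Sunday.
    """
    day_type = "weekday" if weekday_index < 5 else "weekend"
    for plan_day, start, end, plan in PLAN_LIBRARY:
        if plan_day == day_type and start <= now < end:
            return plan
    return DEFAULT_PLAN
```

The controller performs no online optimization in this mode: all adaptation is encoded in how finely the day is divided and how well each stored plan was tuned offline.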

The second-generation traffic signal control system dynamically adjusts the parameters of the signal timing scheme (cycle length, green split, and offset). Compared with fixed-time and actuated coordinated control systems, the second-generation system greatly improves the flexibility and adaptive adjustment ability of the control system. Typical second-generation control systems include SCATS [11] and SCOOT [12].

The third-generation control system uses an idea similar to that of the second generation, dynamically adjusting the signal timing parameters in response to fluctuations of the time-varying traffic flow at the intersection. Typical third-generation control systems include OPAC [13] and RHODES [14]. Kosmatopoulos et al. evaluated the traffic-responsive urban control (TUC) strategy in three traffic networks with quite different traffic and control infrastructure characteristics: Chania, Greece (23 junctions); Southampton, UK (53 junctions); and Munich, Germany (25 junctions), where it was compared with the respective resident real-time signal control strategies TASS, SCOOT, and BALANCE. The main conclusion drawn from this high-effort inter-European undertaking is that TUC is an easy-to-implement, interoperable, low-cost real-time signal control strategy whose performance, after very limited fine-tuning, proved to be better than, or at least similar to, that achieved by long-standing strategies that in most cases had been very well fine-tuned over the years in the specific networks [15].

The fourth-generation self-adaptive traffic signal control system is an integrated traffic management and control system, which can realize the integrated management of network traffic and maximize the technical and performance advantages of multiple subsystems [16]. It integrates the self-adaptive traffic signal control system with other ITS traffic management systems through hardware and software integration technology, for example dynamic process models of combined traffic assignment and control with different signal updating strategies [17]. It aims to build an efficient integrated urban traffic control and management system that integrates mobile network management, so that it can better support local government decision-making [18].

The fifth-generation self-adaptive traffic signal control system is based on self-learning and high-efficiency computation in an environment of automated and regular vehicles [19]. Using empirical information and real-time traffic conditions, the fifth-generation system learns traffic control knowledge independently and intelligently reduces the computational burden of decision optimization. As of June 2014, the InSync system had been applied at 1350 intersections in more than 100 cities across the United States. It has become the fastest growing self-adaptive traffic control system in the United States and is currently recommended by the FHWA [20]. Manolis et al. developed and evaluated, by means of both theoretical analysis and extensive simulation experiments, a new methodology that fully automates the manual tuning and calibration procedure. Most importantly, this methodology, called adaptive fine-tuning (AFT), improved the performance of the system and compensated for the effect of continuous changes in its behavior caused by internal or external factors. The results of a real-life AFT application demonstrated that it could significantly improve the performance of the system in a safe and robust manner. Moreover, the real-life results showed that AFT could adapt efficiently and compensate for changes in the system behavior, even when these changes were significant [21].

2.2. The Deficiency and Expectation of the Existing Traffic Self-Adaptive Control System

Each generation of system has not only inherited the strengths of the previous generation, but has also advanced the evolution of traffic control technology with the support of key basic technologies and the guidance of new traffic control strategies. However, the existing fixed-cycle self-adaptive traffic control theory, methods, and technology have the following shortcomings:
(1) The existing static traffic prediction models and timing schemes have no learning ability. The relevant departments therefore recalibrate the model parameters only when the network traffic patterns have changed significantly.
(2) As the traffic network expands, centralized control of a large-scale regional road network makes it difficult to guarantee the quality of data transmission.
(3) The existing systems are suitable only for regional traffic with a significant corridor effect (because the phase sequence is fixed, only a one-way green wave can be achieved), and their ability to control the typical network traffic flow of the vast majority of cities is limited.
(4) Regional road network control does not respond in time to actual traffic fluctuations, so real-time control is difficult to achieve.
(5) Most existing traffic control methods simplify the control constraints in order to establish a precise mathematical model, but the resulting models differ from actual traffic flow conditions and the control effect is poor.
(6) The systems require a great deal of human intervention, and professional and technical personnel are needed to optimize and maintain them because of the localization required when a system is transferred to a new site.

Many of the existing self-adaptive traffic control systems use a traffic model to predict the evolution of network traffic flow from limited traffic flow data and then use the comprehensive index method to optimize the signal timing parameters. Volume prediction is therefore an essential part of these systems. Two aspects are associated with the prediction: resolution and accuracy. It is imperative to study the relationship and tradeoff between the control strategy, the prediction resolution, and the associated error, which are crucial to the development of self-adaptive traffic control systems. In short, studying the theory and methods of urban road traffic adaptive control in the future data-rich traffic environment is unavoidable.

3. Research on Traffic Signal Control System Based on Future Traffic Environment

3.1. The Overview of Future Traffic Environment Composition and Development

The composition of traffic flow, so far made up of regular vehicles (RVs), is being changed by the emergence and mixing of connected vehicles (CVs) and autonomous vehicles (AVs). It is foreseeable that traffic flow will consist of conventional vehicles, CVs, and AVs over the next few decades. The “Made in China 2025” national strategic plan clearly states that China should master the overall technology and the key technologies of intelligent driving in order to basically complete the transformation and upgrading of the automobile industry by 2025. In 2014, the National Highway Traffic Safety Administration (NHTSA) announced plans to require new US vehicles to have networking capabilities [22]. Vehicles with highly automated driving functions (such as Google’s driverless vehicles and Tesla Autopilot) and with networked communication functions (such as the 2016 Cadillac CTS) have completed experiments under several different driving conditions or have been put on the market, and a variety of domestic and foreign auto companies and institutions have also entered this field of research. The United States has established seven test sites to promote the testing and large-scale demonstration of intelligent connected vehicles. Nevada, Michigan, and other states now allow driverless vehicles onto public roads for testing.

The concept of the intelligent connected vehicle was formed in the 1990s and was initially known as the cooperative infrastructure vehicle system. The United States began organizing the Intelligent Vehicle Initiative (IVI), the Cooperative Vehicle-Highway Automation Systems (CVHAS) program, and the Vehicle Infrastructure Integration (VII) program [23] in 1998. In 2007, the US Department of Transportation renamed VII to IntelliDrive. Michigan, California, and other states gradually established connected vehicle test platforms from 2012. From 2004 to 2010, Europe carried out PreVENT, SAFESPOT, CVIS (Cooperative Vehicle Infrastructure Systems) [24], COOPERS, and other projects to develop the key technologies of connected vehicle systems. Japan began to build VICS in 1991 and has been developing the Smartway [25] project since 2004. These systems and projects have entered the stage of large-scale application and related technology policy development. In recent years, with the support of the National Natural Science Foundation of China (NSFC) and the National High Technology Research and Development Program (863 Program), Tongji University, Tsinghua University, Beihang University, the National University of Defense Technology, and other academic and industrial institutions have developed several connected vehicle prototypes and test systems based on short-range communication [26]. With the continuous development of related fields such as the mobile Internet and the Internet of Things, the cooperative infrastructure vehicle system and its applications have become the new trend in intelligent transportation systems. There is also broad consensus on the need for research on the relevant theories, technologies, standards, policies, and regulations. Vehicle-road and vehicle-vehicle communication and traffic safety technology based on the cooperative infrastructure vehicle system have become a research focus at this stage.

The earliest research on autonomous vehicles began in the 1980s, represented by the Navlab self-driving vehicle [27] of Carnegie Mellon University and the ALV (Autonomous Land Vehicle) project of the US Defense Advanced Research Projects Agency (DARPA) [28]. In 1995, Carnegie Mellon University developed the autonomous vehicle Navlab-5, which completed a self-driving experiment of nearly 5,000 kilometers across the US, 98.2% of which was driven by the automatic driving system [29]. In 2008, Stanford University developed the driverless vehicle Junior, which can plan its path independently, position itself precisely, perceive and interact with surrounding vehicles, and perform driving maneuvers such as lane changing, U-turns, and parking [30]. In 2010, the ARGO autonomous vehicle, developed by Professor Alberto Broggi of the University of Parma, Italy, and equipped with lidar, cameras, global positioning equipment, and other sensors, was exhibited at the Shanghai World Expo after a journey of more than 80 days from Parma, Italy, to Shanghai [31]. The Hongqi HQ3 autonomous vehicle, developed by the National University of Defense Technology, completed a 286 km self-driving experiment from Changsha to Wuhan, with manual intervention accounting for less than 1% of the total mileage [32]. Google, Nissan, Tesla, GM, Ford, and other companies are also studying autonomous vehicles, but the technical details are usually not disclosed [33]. On the theoretical side, Levinson et al. [34] optimized an existing automatic driving system so that the vehicle could adapt to a variety of lighting, weather, and traffic conditions and, to a certain extent, overcome the challenges of narrow roads, crosswalks, and signalized intersections.

3.2. Research Status of Traffic Signal Control System Based on Future Traffic Environment

Since the earliest methods of fixed signal timing and offline delay calculation proposed by Webster [35], the traffic signal control system has evolved from offline to online control, from isolated-intersection to network control, and from fixed-time to self-adaptive control. With the development of intelligent transportation systems, research on a new generation of traffic control technology based on multisource heterogeneous data has gradually started [36]. In recent years, signal control based on the cooperative infrastructure vehicle system has become a frontier of traffic control theory and application at home and abroad [37]. Professor Yang’s team at Tongji University carried out the project “Research on the Next Generation of Traffic Control Technology at Intersection Based on CVIS” from 2010 to 2012 with the support of the National Natural Science Foundation. Taking a single intersection as the object, the project analyzed the multitarget control mechanism, the vehicle-road and vehicle-vehicle communication methods, and a prototype integrated signal control platform. Research on intelligent connected vehicles initially focused mainly on optimization methods for traffic safety, such as collision warning [38] and lane changing assistance [39]. As the concepts of active safety and traffic signal control were brought in, driving optimization strategies for efficiency and emission reduction, such as speed guidance that considers the signal state [40] and eco-driving strategies [41], have been widely studied. In addition, to meet special needs such as emergency vehicles and bus priority, multimode signal priority control systems that consider the real-time status of special vehicles have been proposed and have achieved initial implementation [42].
Research on automatic driving mainly focuses on data collection and forecasting for mixed traffic flow [43] and on local optimization methods based on rolling optimization strategies [44]. Most optimization targets are efficiency-related indicators such as the least delay, the fewest stops, or the shortest crossing time [45]. For evaluating the control effect, most studies report results obtained with traditional simulation software extended through secondary development [46]. Research shows that traffic control that accounts for the mixed flow of connected and autonomous vehicles can effectively improve the traffic efficiency of the intersection compared with conventional traffic flow control [43].
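As a point of reference for the fixed-time baseline attributed to Webster above, the following Python sketch computes the classical Webster cycle length, C0 = (1.5L + 5)/(1 − Y), and distributes effective green in proportion to the critical flow ratios. The function names and the example values are illustrative assumptions, and the sketch covers only the isolated, undersaturated intersection case.

```python
def webster_cycle(lost_time_s, flow_ratios):
    """Webster's cycle length C0 = (1.5 L + 5) / (1 - Y), in seconds.

    lost_time_s: total lost time per cycle, L.
    flow_ratios: critical flow ratio (flow / saturation flow) of each phase.
    """
    y_total = sum(flow_ratios)
    if not 0.0 < y_total < 1.0:
        raise ValueError("sum of critical flow ratios must lie in (0, 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - y_total)


def effective_greens(cycle_s, lost_time_s, flow_ratios):
    """Split the effective green time in proportion to the flow ratios."""
    y_total = sum(flow_ratios)
    available = cycle_s - lost_time_s
    return [available * y / y_total for y in flow_ratios]


# Example: two phases, 10 s lost time, flow ratios 0.3 and 0.2.
cycle = webster_cycle(10.0, [0.3, 0.2])
greens = effective_greens(cycle, 10.0, [0.3, 0.2])
```

Because the inputs are average flows, the result is a single offline plan; the adaptive systems reviewed here differ precisely in recomputing such parameters from measured or predicted traffic.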

4. Development Status of Urban Traffic Signal Self-Adaptive Control Method

Traffic congestion in urban road and freeway networks leads to a strong degradation of the network infrastructure and accordingly to reduced throughput, which can be countered with suitable control measures and strategies. Traffic signal control methods have evolved through the combination of modern control theory, artificial intelligence theory, traffic information technology, and traffic engineering technology. Because modern control theory rests on the basic assumption that the mathematical or nominal model of the controlled object is precisely known, the corresponding methods are collectively referred to as Model Based Control (MBC) theory and methods [47]. Over the last 20 years, Artificial Intelligence (AI) theory and methods, represented by agents, neural networks, fuzzy logic, and group intelligence, have gradually matured. Diakaki et al. presented the design approach, objectives, development, advantages, and some application results of the traffic-responsive urban control (TUC) strategy. Based on store-and-forward modeling of urban network traffic and on linear-quadratic regulator theory, the design of TUC led to a multivariable regulator for traffic-responsive coordinated network-wide signal control that is also particularly suitable for saturated traffic conditions [48].

4.1. Traffic Self-Adaptive Control Method Based on Mathematical Model

According to the status and function of the traffic forecasting module in the traffic control model, typical MBC traffic signal control methods include the Travel‐Time Responsive (CTR) traffic signal control algorithm [49], Predictive Model Control [50], Arrival–Discharge Process control, and Store-and-Forward Response Control. The Arrival–Discharge Process signal control algorithm is based on dynamic programming, and the signal policy is optimized using a performance measure involving delays, queue lengths, and queue storage ratios [51]. Store-and-Forward Response Control uses real-time monitoring data on arriving and departing traffic flow to simulate the movement of vehicle platoons and realize predictive control [52].
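The store-and-forward idea mentioned above can be sketched in a few lines of Python: the queue on a link grows with the inflow and shrinks with an outflow that is averaged over the cycle as saturation flow times green ratio. The function name, units, and the clamp at zero are assumptions made for this illustration; they are not taken from [52].

```python
def store_and_forward_step(queue_veh, inflow_vph, sat_flow_vph,
                           green_s, cycle_s, step_s):
    """Advance one link queue by one control step.

    Outflow is averaged over the cycle as saturation flow * (green / cycle),
    so the model ignores within-cycle queue oscillation.
    """
    outflow_vph = sat_flow_vph * green_s / cycle_s
    new_queue = queue_veh + (inflow_vph - outflow_vph) * step_s / 3600.0
    # Clamp at zero: an assumption added here so the sketch stays physical
    # when demand is below capacity.
    return max(0.0, new_queue)


# Example: 10 queued vehicles, 900 veh/h arriving, 30 s green in a 90 s cycle.
next_queue = store_and_forward_step(10.0, 900.0, 1800.0, 30.0, 90.0, 90.0)
```

Because the update is linear in the green time, the model can be embedded in linear or quadratic optimization, which is what makes it attractive for network-wide regulators.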

According to the control target, MBC traffic signal control methods can be divided into coordinated control based on a comprehensive performance index and coordinated control based on the green wave band. The comprehensive index method, represented by TRANSYT [53], considers delay, number of stops, and queue length to obtain the best overall efficiency of the network. The green wave band method is designed to maximize the number of platoons that travel along the main line without stopping. Typical arterial coordinated control methods based on the green wave band include the maximum green wave band method MAXBAND and the variable-bandwidth multiband method MULTIBAND [54].
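To make the green wave idea concrete, the sketch below computes offsets for the simplest one-way progression: each downstream signal turns green when a platoon released upstream arrives at the design speed. This is only the elementary travel-time rule, not MAXBAND or MULTIBAND, which solve a mixed-integer program for two-way bandwidth; the function name and example values are illustrative assumptions.

```python
def one_way_offsets(distances_m, speed_mps, cycle_s):
    """Offsets (s) relative to the first signal for a one-way green wave.

    distances_m: spacing between consecutive intersections along the artery.
    speed_mps:   design progression speed.
    cycle_s:     common cycle length shared by all coordinated signals.
    """
    offsets = [0.0]
    travel_time = 0.0
    for distance in distances_m:
        travel_time += distance / speed_mps
        # Offsets repeat every cycle, so only the remainder matters.
        offsets.append(travel_time % cycle_s)
    return offsets


# Example: three links of 300 m, 450 m, and 300 m at 15 m/s with a 60 s cycle.
offsets = one_way_offsets([300.0, 450.0, 300.0], 15.0, 60.0)
```

The rule favors one direction only, which matches the limitation noted earlier for fixed phase sequences.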

4.2. Traffic Self-Adaptive Control Method Based on Intelligent Computing

Some researchers formulate the optimization of traffic lights in a congested network as a linear programming problem [55]. However, given the complexity of the internal structure of the urban regional traffic system and of its external operating environment, it is impossible to establish a precise mathematical model. There are also many challenges in effectively integrating signal timing tools with dynamic traffic assignment software systems, such as data availability, exchange format, and system coupling [56]. Moreover, little effort has been put into developing control frameworks that aim to improve not only the average performance of the system but also its robustness and reliability. Over the past 10 years, artificial intelligence computing techniques have been used to simulate human reasoning and learning and to derive the optimal control strategy through the interaction between the traffic controller and the road environment. Fuzzy logic, group intelligence algorithms, and neural network control dominate the traffic control methods based on intelligent computing.

4.2.1. Fuzzy Logic

Fuzzy control of urban traffic signals is one of the effective approaches to the urban traffic problem. Pham et al. proposed a fuzzy logic control for the integrated signal operation of a diamond interchange and its ramp meter, to improve traffic flows on surface streets and the motorway. This fuzzy logic diamond interchange (FLDI) control comprises three modules: a fuzzy phase timing (FPT) module that controls the green time extension of the current phase, a phase logic selection (PLS) module that decides the next phase based on the predefined phase sequence or phase logic, and a fuzzy ramp-metering (FRM) module that determines the cycle time of the ramp meter based on current traffic volumes and the conditions of the surface streets and the motorway [57]. To improve the problem-solving ability of fuzzy controllers, multilevel fuzzy and other structural models have been proposed, and fuzzy control has developed from single-point control to regional traffic control [58, 59].
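The following Python sketch shows what a fuzzy green-extension rule base of this general kind can look like. It is not the FLDI controller of [57]: the membership functions, the 0 to 20 vehicle range, the rule table, and the weighted-average defuzzification are all hypothetical choices made for illustration.

```python
def tri(x, left, peak, right):
    """Triangular membership: 0 at the feet, 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(queue_veh):
    """Degrees of membership in low / medium / high for a queue of 0-20 vehicles."""
    q = min(max(queue_veh, 0.0), 20.0)
    return {
        "low": tri(q, -10.0, 0.0, 10.0),
        "medium": tri(q, 0.0, 10.0, 20.0),
        "high": tri(q, 10.0, 20.0, 30.0),
    }


# Hypothetical rule table: (queue on green approach, queue on red approach)
# -> green extension in seconds.
EXTENSION_RULES = {
    ("low", "low"): 2.0, ("low", "medium"): 0.0, ("low", "high"): 0.0,
    ("medium", "low"): 6.0, ("medium", "medium"): 4.0, ("medium", "high"): 2.0,
    ("high", "low"): 10.0, ("high", "medium"): 8.0, ("high", "high"): 4.0,
}


def green_extension(green_queue, red_queue):
    """Weighted average of rule outputs, with min as the fuzzy AND."""
    mu_green = fuzzify(green_queue)
    mu_red = fuzzify(red_queue)
    weighted = total = 0.0
    for (g_level, r_level), extension_s in EXTENSION_RULES.items():
        strength = min(mu_green[g_level], mu_red[r_level])
        weighted += strength * extension_s
        total += strength
    return weighted / total if total > 0.0 else 0.0
```

With these assumed rules, a long queue on the green approach and an empty red approach yield the maximum extension, while intermediate inputs blend neighboring rules smoothly instead of switching abruptly at a threshold.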

4.2.2. Group Intelligence

Genetic algorithms (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) are the most widely used strategies that simulate the social behavior of biological populations. Zhao et al. developed a mathematical model for a traffic equilibrium network in which the optimization of lane reorganization and traffic control strategies was integrated in a unified framework. A GA-based heuristic is used to yield meta-optimal solutions to the model. Results from extensive numerical analyses reveal the promising property of the proposed model in enhancing network capacity and reducing congestion [16]. Li and Schonfeld presented a hybrid algorithm based on simulated annealing (SA) and a GA for arterial signal timing optimization. They proposed a decoding scheme that exploits prior expectations about efficient solutions, namely, that the optimal green time distribution should reflect the proportion of the critical lane volumes of each phase. The numerical results indicated that the SA-GA algorithm outperforms both SA and GA in terms of solution quality and convergence rate [60].
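A minimal genetic algorithm for green split optimization at one intersection might look like the Python sketch below. It is not the model of Zhao et al. or the SA-GA of Li and Schonfeld: the cost function (Webster's uniform delay term plus an oversaturation penalty), the operators, and all parameter values are assumptions chosen for illustration. In the spirit of the prior expectation described above, the initial population is seeded with one chromosome proportional to the phase volumes.

```python
import random


def total_delay(greens, volumes, sat_flow, cycle_s):
    """Sum over phases of volume * Webster uniform delay (arbitrary units)."""
    delay = 0.0
    for green, volume in zip(greens, volumes):
        lam = green / cycle_s                 # green ratio
        x = volume / (sat_flow * lam)         # degree of saturation
        if x >= 1.0:
            delay += 1e6 * x                  # penalty: phase is oversaturated
        else:
            delay += volume * cycle_s * (1 - lam) ** 2 / (2 * (1 - lam * x))
    return delay


def decode(weights, effective_green_s, min_green_s):
    """Map positive weights to greens that respect the minimum green."""
    spare = effective_green_s - min_green_s * len(weights)
    total = sum(weights)
    return [min_green_s + spare * w / total for w in weights]


def optimise_splits(volumes, sat_flow, cycle_s, lost_time_s,
                    min_green_s=8.0, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    effective = cycle_s - lost_time_s

    def cost(weights):
        greens = decode(weights, effective, min_green_s)
        return total_delay(greens, volumes, sat_flow, cycle_s)

    # Seed one chromosome with the volume proportions, the rest at random.
    population = [[v / sum(volumes) for v in volumes]]
    population += [[rng.uniform(0.1, 1.0) for _ in volumes]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        next_gen = population[:2]             # elitism: keep the two best
        while len(next_gen) < pop_size:
            p1 = min(rng.sample(population, 3), key=cost)   # tournament
            p2 = min(rng.sample(population, 3), key=cost)
            mix = rng.random()                # arithmetic crossover
            child = [mix * a + (1 - mix) * b for a, b in zip(p1, p2)]
            child = [max(1e-3, w * (1 + rng.gauss(0, 0.1))) for w in child]
            next_gen.append(child)
        population = next_gen
    return decode(min(population, key=cost), effective, min_green_s)


greens = optimise_splits([600.0, 400.0], 1800.0, 90.0, 10.0)
```

Because the two best chromosomes always survive, the returned plan is never worse under this cost function than the volume-proportional seed; whether it is materially better depends on the demand pattern.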

4.2.3. Neural Network

The neural network controller generates the optimal timing from real-time detection of traffic system information and weather conditions. Spall and Chin proposed a system-level self-adaptive signal control (S-TRAC) method based on an Artificial Neural Network (ANN) [61]. Hoogendoorn et al. proposed a new control framework based on the notion of controlled Markov processes, which explicitly takes into account the uncertainty in predicted traffic conditions and system performance [62]. In contrast to traditional optimal control approaches, the objective function can include general statistics of the random system performance, such as the mean, standard deviation, or 95th percentile. Srinivasan et al. developed a multiagent unsupervised flow response signal control model based on a hybrid neural network approach, which used a multilevel online learning process to update and adjust its knowledge base and decision-making mechanism. The results showed that the new model significantly improved the traffic situation as the complexity of the scenario increased: the average delay was reduced by 78% and the average stopping time by 85% compared with the existing credit control algorithm [63]. Kosmatopoulos et al. studied the approximation and learning properties of one class of recurrent networks, known as high-order neural networks, and applied these architectures to the identification of dynamical systems. They showed that, if enough high-order connections are allowed, such a network is capable of approximating arbitrary dynamical systems [64].

4.3. The Review of Current Traffic Self-Adaptive Control Methods

The revolutionary development of traffic information technology provides the urban traffic control system with rich data that carry implicit information about system state changes and process control. Researchers have therefore begun to consider data-driven traffic control methods, that is, how to use these data effectively to achieve optimal control of the system when no accurate model of the controlled system is available, and thus overcome the defects of the traditional MBC method. Group intelligence algorithms require efficient heuristic rules to be designed for the specific application, and group intelligence optimization suffers from long computing times and convergence to local optima [65]. The most critical problem is that intelligent computing methods have no autonomous learning ability: neural network and fuzzy logic controllers need an additional phase optimization module, which greatly increases the computational complexity of the agent.

When the information of the traffic system cannot be completely obtained, the internal mechanism of the system cannot be fully understood, and a precise dynamic model of the controlled object cannot be established, data-driven traffic control offers a model-free, self-learning method with a simple structure and a small computational load. It can remedy the deficiencies of the MBC method, such as the need for a precise mathematical model and the lack of self-learning ability. It is an inevitable choice for the development and application of self-adaptive feedback control theory for urban regional traffic in a data-rich environment [9].

5. The Future Development Trend of Traffic Self-Adaptive Signal Control System

Regional traffic control is an important unit of urban traffic control. At present, regional traffic not only suffers long-lasting congestion in peak hours, but also shows a clear need for traffic dispersion in those hours. The traditional traffic signal control method not only lacks adaptability to variable traffic flow, but also relies heavily on the traffic model. The existing adaptive, neural, and fuzzy control methodologies cannot be used to develop a systematic, automated fine-tuning procedure for general large-scale nonlinear control systems because of the strict assumptions they impose on the dynamics of the controlled system [66].

5.1. Reinforcement Learning Control Based on Data-Driven Method

The large-scale application of emerging technologies such as video detection, probe vehicles, connected vehicles, and autonomous vehicles in the transportation industry has moved traffic data collection beyond its existing modes. Reinforcement learning, as a typical “model-free, self-learning” iterative data-driven method, is applicable to regional traffic control in the form of multiagent reinforcement learning [67]. A controller adopting this method can perceive the environmental state and select the optimal action according to its objective. Unlike other machine learning methods, reinforcement learning does not intervene in how actions are generated during the control process; it only evaluates, through the reinforcement signal obtained by perceiving the environment, whether the change in environmental state that follows a selected action is good or bad. A reinforcement learning algorithm acquires knowledge in the course of decision-making and evaluation and balances the exploration and exploitation of knowledge to reach an optimal strategy. Reinforcement learning has been used in many applications and has been applied successfully to traffic signal control at isolated intersections, along arterials, and across regional networks [68].

At present, Q-learning, proposed by Watkins in 1989 [69], is one of the most frequently used algorithms in the field of reinforcement learning. It is widely used in control applications because of the particular way in which it updates its value function.

In Q-learning, the mainstream update formula of the value function is as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

According to the formula, at time $t$ the state of Q-learning is $s_t$. If the action taken is $a_t$, the corresponding value function is $Q(s_t, a_t)$. The update of the value function is determined by three factors. The first is the current value of the state-action value function, $Q(s_t, a_t)$, that needs to be updated. The second is the maximum $Q$-value over all actions in the state $s_{t+1}$ reached after executing $a_t$, and the third is the immediate return, $r_{t+1}$, received after the action. There are also two model parameters, the learning rate $\alpha$ and the discount factor $\gamma$. The former balances the learning of new knowledge against the use of existing knowledge. When $\alpha$ approaches 1, the controller tends to explore new knowledge; otherwise it relies on existing knowledge. The latter weighs the present against the future. When $\gamma$ approaches 1, the controller tends to consider the future return, and when $\gamma$ approaches 0, the controller mainly considers the immediate return.
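The update rule described above can be illustrated with a minimal tabular sketch. It is only an illustration under stated assumptions, not the implementation of any cited system: the two-phase intersection, the state labels, the reward value, and the epsilon-greedy selection rule (with its parameter `epsilon`) are hypothetical choices that do not come from the reviewed papers.

```python
import random
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Largest Q-value over all actions available in the successor state
    best_next = max(Q[(next_state, a)] for a in actions)
    # Move the old estimate toward (immediate return + discounted best future value)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q[(state, action)]


def choose_action(Q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy rule (an assumption): explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


# Hypothetical two-phase intersection; the state says which approach has the longer queue
ACTIONS = ["NS_green", "EW_green"]
Q = defaultdict(float)  # all Q-values start at zero

# One observed transition with an assumed reward of -2.0 (negative delay)
new_q = q_update(Q, "NS_longer", "NS_green", reward=-2.0,
                 next_state="EW_longer", actions=ACTIONS)
```

With all values initialized to zero, the single update above moves the chosen state-action value one tenth of the way (the learning rate) toward the observed return.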

El-Tantawy et al. proposed a multiagent coordinated RL-TSC control system called MARLIN-ATSC, a model-free system based on the Q-learning algorithm [70]. Kosmatopoulos and Kouvelas introduced and analyzed, by means of both mathematical arguments and simulation experiments, a new learning/adaptive algorithm that can provide convergent, efficient, and safe fine-tuning of general large-scale nonlinear control systems [66]. Abdulhai et al. studied a control model for an isolated two-phase intersection based on the Q-learning algorithm and obtained a signal control method that performs better than a fixed-timing scheme when the flow rate changes significantly [71].

5.2. Research on Traffic Control Based on Adaptive Performance Optimization

Diakaki et al. presented the objectives, approach, advantages, and some application results of recent extensions of the traffic-responsive urban control (TUC) strategy. Based on well-known methods of automatic control theory, TUC allows for traffic-responsive coordinated signal control of large-scale urban networks and is particularly efficient under saturated traffic conditions [72]. Adaptive optimization (AO) schemes based on stochastic approximation principles, such as the Random Directions Kiefer–Wolfowitz (RDKW), the Simultaneous Perturbation Stochastic Approximation (SPSA), and the adaptive fine-tuning (AFT) algorithms, share the serious disadvantage that they do not guarantee satisfactory transient behavior, because they require random or random-like perturbations of the parameter vector. Kosmatopoulos introduced and analyzed a new algorithm for alleviating this problem. Its application to the adaptive optimization of a large-scale, complex control system demonstrates the efficiency of the proposed scheme [73].
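To make the role of the random perturbations concrete, the following is a minimal sketch of a basic SPSA iteration. It is not the algorithm of [73]; the function name `spsa_minimize`, the gain constants `a` and `c`, and the quadratic test function in the usage note are assumptions chosen for illustration. The decay exponents 0.602 and 0.101 are conventional choices for SPSA gain sequences.

```python
import random


def spsa_minimize(loss, theta, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimize `loss` with basic SPSA, using two loss evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # decaying step size
        c_k = c / k ** 0.101  # decaying perturbation size
        # Perturb all parameters at once with random +/-1 signs
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate built from the single pair of evaluations
        grad = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta
```

A call such as `spsa_minimize(lambda th: sum(x * x for x in th), [1.0, -1.0])` drives the parameters toward zero. Every evaluation is taken at a randomly perturbed parameter vector, which in a live control system means the plant is operated at perturbed settings; this is the transient-behavior concern raised above.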

Aboudolas et al. presented a methodology in which the traffic flow process is modeled using the store-and-forward modeling paradigm, and the problem of network-wide signal control (including all constraints) is formulated as a quadratic-programming problem that aims at minimizing and balancing the link queues so as to minimize the risk of queue spillback [74]. Kouvelas et al. investigated the adaptive fine-tuning algorithm for determining the design parameters of two distinct, mutually interacting modules of the traffic-responsive urban control (TUC) strategy, i.e., split and cycle, for the large-scale urban road network of the city of Chania, Greece. Their simulation results demonstrate that the network performance, measured by the daily mean speed, attained with the proposed adaptive optimization methodology is significantly better than that of the original TUC system, even when the aforementioned design parameters are manually fine-tuned to virtual perfection by the system operators [75].
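The store-and-forward idea can be sketched in a few lines: each link queue changes by the difference between inflow and the average outflow permitted by its green time over a cycle, and the control objective penalizes large and unbalanced queues. The sketch below is a simplified illustration, not the formulation of [74]. The two-link setting, the cost that divides each squared queue by the link capacity, the clipping of queues to the range from zero to capacity, and all numerical values are assumptions, and an exhaustive search over integer green times stands in for the quadratic-programming solver.

```python
def step_queue(queue, inflow, sat_flow, green, cycle, capacity):
    """Advance one link queue (veh) by one cycle; flows are in veh/s, times in s."""
    outflow = sat_flow * green / cycle  # average outflow over the whole cycle
    new_queue = queue + cycle * (inflow - outflow)
    return min(max(new_queue, 0.0), capacity)


def balance_cost(queues, capacities):
    """Quadratic cost that penalizes large queues relative to link capacity."""
    return sum(q * q / cap for q, cap in zip(queues, capacities))


def best_split(queues, inflows, sat_flow, cycle, lost_time, capacities, min_green=10):
    """Search integer green splits for two competing links; return (cost, greens, queues)."""
    total_green = cycle - lost_time
    best = None
    for g1 in range(min_green, total_green - min_green + 1):
        greens = (g1, total_green - g1)
        nxt = [step_queue(q, d, sat_flow, g, cycle, cap)
               for q, d, g, cap in zip(queues, inflows, greens, capacities)]
        cost = balance_cost(nxt, capacities)
        if best is None or cost < best[0]:
            best = (cost, greens, nxt)
    return best


# Hypothetical demand: link 1 is more heavily loaded than link 2
cost, greens, next_queues = best_split(
    queues=[20.0, 5.5], inflows=[0.25, 0.1], sat_flow=0.5,
    cycle=90, lost_time=10, capacities=[40.0, 40.0])
```

In this example the search gives the heavily loaded link most of the available green, and the two queues end the cycle at equal length, which is the balancing behavior that the quadratic cost is meant to encourage.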

5.3. Research on Traffic Control Based on Connected Vehicles and Automated Vehicles Environment

Traffic composed of RVs (regular vehicles), CVs (connected vehicles), and AVs (automated vehicles) forms a persistent multimodal traffic stream, as shown in Figure 3, which changes the traffic composition of the urban road network. The information networks of connected and automated vehicles affect driving behavior and the control requirements of the traffic stream, so research on the control theory of multimodal traffic streams is urgently needed. The adaptive traffic control strategy aims to respond to real-time traffic demand by modeling current and predicted future traffic flow data. Compared with the traffic flow and occupancy information provided by fixed loop detectors in the traditional traffic environment, an adaptive traffic control system in the V2X environment can collect more detailed data such as vehicle position, speed, queue length, and stopping time, and it has therefore attracted the attention of many scholars. In adaptive control strategies for the V2X environment, the concept of “rolling horizon” is widely used. Goodall et al. proposed a prediction algorithm based on microsimulation to implement a distributed adaptive control strategy. The strategy uses real-time vehicle data, and simulation identified 15 seconds as the best length of the rolling optimization time window. The test results showed that, compared with coordinated actuated control, this strategy has obvious advantages in the case of traffic accidents and sudden changes in traffic demand [76].
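The rolling-horizon concept can be illustrated with a small sketch: at each decision point the controller evaluates every candidate phase plan over a short look-ahead window, applies only the first phase of the best plan, and then repeats the procedure with updated data. This is not the microsimulation-based algorithm of [76]. The two-phase intersection, the simple queue model, the discharge rate, and the horizon of three steps (for example, three 5-second steps to cover a 15-second window) are all assumptions.

```python
from itertools import product

PHASES = ("NS", "EW")  # hypothetical two-phase intersection


def simulate(queues, plan, arrivals, discharge=2):
    """Return the total queued vehicle-steps when following `plan` (one phase per step)."""
    q = dict(queues)
    delay = 0
    for phase in plan:
        for approach in q:
            q[approach] += arrivals[approach]         # vehicles arriving this step
        q[phase] = max(0, q[phase] - discharge)       # only the green approach discharges
        delay += sum(q.values())
    return delay


def rolling_horizon_action(queues, arrivals, horizon_steps=3):
    """Evaluate every phase plan over the horizon; return the first phase of the best plan."""
    best_plan = min(product(PHASES, repeat=horizon_steps),
                    key=lambda plan: simulate(queues, plan, arrivals))
    return best_plan[0]


# Only the first phase of the best plan is applied; the search is repeated at the next step
next_phase = rolling_horizon_action({"NS": 6, "EW": 1}, {"NS": 1, "EW": 0})
```

Exhaustive enumeration is feasible here only because the horizon and the phase set are tiny; the number of plans grows exponentially with the horizon length, which is one reason the choice of window length matters in practice.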

In the real world, because the large-scale installation of vehicle-road cooperative communication equipment in existing vehicles is still in progress, the applicability of traffic signal control optimization strategies is directly affected by the penetration rate of vehicle-mounted communication equipment. Therefore, many studies have incorporated this penetration rate into the optimization model. Feng et al. proposed an Estimation of Location and Speed (ELVS) algorithm for different penetration rates of vehicle-mounted communication devices and used it for real-time optimization of traffic signals. Simulation test results showed that the algorithm can reduce total vehicle delay by 16.33% at the highest penetration rate of vehicle-mounted communication equipment [77]. Guler et al. proposed a traffic control optimization algorithm for intersections of one-way streets where only some vehicles are equipped for communication [78]. Lee et al. integrated vehicle-based real-time communication data and traditional traffic detection data to propose a cumulative travel-time responsive (CTR) real-time traffic control method [79]. Ma et al. evaluated the safety effectiveness of adaptive traffic signal control using the empirical Bayes method. The analysis examined 47 urban or suburban intersections in Virginia where adaptive traffic signal control was deployed, using 235 site-years of before data and 66 site-years of after data. It concluded that adaptive traffic signal control installation can potentially reduce total crashes at highway intersections and that public agencies should consider both safety and mobility benefits when justifying adaptive traffic signal control projects [80]. Mandava et al. proposed a vehicle speed planning method based on the signal control status of intersections. By giving speed recommendations to the driver, the method increases the probability of passing through the intersection without stopping [81].
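The idea behind signal-based speed recommendation can be shown with a simplified constant-speed calculation: given the distance to the stop line and the time window of the next green, find a speed within the allowed range that brings the vehicle to the stop line during green. This sketch is not the planning method of [81]; the function name `advisory_speed`, the default speed bounds, and the constant-speed assumption (no acceleration limits, no queue ahead) are hypothetical simplifications.

```python
def advisory_speed(distance_m, green_start_s, green_end_s, v_min=5.0, v_max=15.0):
    """Return a constant speed (m/s) that reaches the stop line during green, or None.

    Times are measured in seconds from now; `v_min` and `v_max` are assumed bounds.
    """
    if distance_m <= 0 or green_end_s <= 0 or green_end_s < green_start_s:
        return None
    # Driving at v_max gives the earliest possible arrival time
    earliest = distance_m / v_max
    # Arrive as early as possible, but not before the green begins
    arrival = max(earliest, green_start_s)
    if arrival > green_end_s:
        return None  # even at v_max the vehicle would miss this green
    speed = distance_m / arrival
    # Reject recommendations that would require driving unreasonably slowly
    return speed if speed >= v_min else None
```

For a vehicle 200 m from the stop line with a green window from 20 s to 40 s, the sketch recommends 10 m/s so that the vehicle arrives just as the green begins. When no feasible speed exists it returns `None`, in which case the vehicle would have to stop or target a later green.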

5.4. A Review of the Development of Traffic Signal Self-Adaptive Control System Based on Oriented Future Traffic Environment

In summary, many scholars and research institutes have done a great deal of research on the implementation technology, traffic state analysis, and experimental methods of V2X and automated vehicles. Preliminary studies of traffic control methods for isolated intersections and other special needs under V2X conditions have been conducted. However, this research has mainly focused on the construction of physical systems, adaptability analysis, preliminary theoretical work, and traffic flow consisting of a single vehicle type or a mix of two types (CVs mixed with RVs, or AVs mixed with RVs). The theory and experimental methods of cooperative control for multimode vehicle traffic flow have not yet been formed. Therefore, the study of traffic control and coordination at intersections with multimode traffic under urban conditions is still in its infancy. The technology and means of acquiring network traffic control data are developing in the direction of “full spatio-temporal, high-precision, diversified, and high-quality,” which makes it possible to acquire a large amount of data generated during the interaction between the road traffic control system and its environment. In addition, the vehicle terminals of connected vehicles and autonomous vehicles can respond to signal control schemes, surrounding vehicles, and road network conditions, resulting in significant changes in the operational mechanism of multimode vehicle traffic flow.

The traffic control system learns independently throughout the whole control process, which makes it a closed-loop feedback self-adaptive control based on the control effect. In terms of control scope, the multiagent reinforcement learning algorithm, benefiting from networked dynamic traffic data acquisition and interaction technology, can reason precisely about the optimal joint action of the intersections. The reinforcement learning algorithm itself needs only the input and output data and does not depend on how the data are collected, so the system is highly compatible with existing and emerging traffic control systems and technologies. At the same time, under multimode vehicle traffic flow, the system can collect richer and more accurate information such as vehicle position and speed, and it can directly guide connected vehicles and control autonomous vehicles. Four control modes can therefore be formed: traffic signal control, guidance of connected vehicles, control of automated vehicles, and indirect control of conventional vehicles through connected and automated vehicles. The collaborative mechanism among these modes is in urgent need of research. Such research can not only establish a theory of multimode vehicle traffic flow but also provide key support for developing a new generation of traffic control systems that solve multimode traffic flow control problems. It is of great strategic significance and practical value for taking a leading position in this field.

6. Conclusion

Multimode traffic flow consisting of conventional vehicles, intelligent connected vehicles, and automated vehicles is gradually becoming the norm throughout the world. It is therefore imperative to build a new generation of traffic control systems that meets the needs of its development and application. Most existing traffic control systems adopt a “prior” feedforward control method or a delay-based, limited-information control method. Their control effect depends on the accuracy of the model describing the actual traffic environment, and they cannot learn and adjust their control knowledge online from feedback on the control effect. The large-scale development and application of new technologies such as floating vehicles, vehicle-to-vehicle communication (V2V), vehicle-to-infrastructure communication (V2I), V2X, the Internet of vehicles, and automated driving will greatly promote the transition of urban traffic control systems from the data-poor era to the data-rich era. Real-time detection of spatiotemporal data on the traffic state of the urban road network can provide rich, high-quality basic data and a fine-grained assessment of control effects. Given the main defects of existing self-adaptive traffic control systems, relying on abundant traffic control data and a data-driven approach to develop a closed-loop feedback self-adaptive control system with better capability to respond to uncertainty and a higher level of intelligent decision-making is the inevitable result of the objective needs of the development and application of traffic control and advanced infrastructure technologies. Such a system can also support the interaction between the traffic control system and multimode traffic flow.

Therefore, given the limitations and major shortcomings of existing traffic signal control systems, developing a collaborative control system that relies on rich traffic control interaction conditions and data and offers a high degree of refinement and precision together with better responsiveness and intelligence is the objective need and development direction of traffic control technology. Although the outcome of this paper is a theory of multi-intersection coordinated control for the future traffic environment, it can provide scientific support for the development of future road network traffic control systems and can be widely used in new-generation traffic control systems. It can also improve road network efficiency to a greater extent, reduce traffic operation costs, prevent and mitigate traffic congestion at intersections, and reduce energy consumption and emissions. Traffic signal control based on reinforcement learning is closed-loop feedback self-adaptive control in the true sense, and its real-time performance, accuracy, and self-learning can be guaranteed, so it will be one of the future research trends. In addition, it will provide an entry point and technical support for the development of V2X systems, the Internet of vehicles, and autonomous driving industries. Therefore, the related achievements of the adaptive control system for the future traffic environment have extremely broad application prospects.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

This research was supported by the National Natural Science Foundation of China (Project no. 61773293) and the Key Project of the National Natural Science Foundation of China (Project no. 51238008).