Resilience Analysis of Urban Road Networks Based on Adaptive Signal Controls: Day-to-Day Traffic Dynamics with Deep Reinforcement Learning
Table 1
The training of the DQN.
Step 1
Initialize the parameters θ of the agent's evaluation network (EN); assign them to the parameters θ′ of the target network (θ′ ← θ); set the maximum number of training epochs and the batch size; set epoch = 0
Step 2
Set epoch = epoch + 1. Assign the free-flow costs of all routes as the initial perceived route costs of the drivers; initialize the state s₀ of the DTD dynamic model, and set the day counter n = 0
Step 3
While (the DTD model has not reached convergence)
Step 4
Based on equations (12) and (13), generate the action aₙ (the signal-control adjustment)
Step 5
According to equations (7) and (9), obtain the actual route costs cₙ on day n based on the route flows fₙ and the link flows xₙ; according to the route-perception updating process (1), update the perceived route travel cost ĉₙ₊₁ on day n + 1 based on the perceived route cost ĉₙ and the actual route costs cₙ on day n; following this, determine the route flows fₙ₊₁ on day n + 1 based on the route-choice probability formula (2) and the route-assignment equation (3); according to the network loading model (9) and equation (4), obtain the link flows xₙ₊₁ on day n + 1 (the new state sₙ₊₁) and the actual route costs cₙ₊₁ on day n + 1; since the DTD model has not yet converged, the reward rₙ takes its non-equilibrium value
Update the state, actual route cost, perceived route cost, and route flow: sₙ ← sₙ₊₁, cₙ ← cₙ₊₁, ĉₙ ← ĉₙ₊₁, and fₙ ← fₙ₊₁; set n = n + 1
Step 11
When the equilibrium is reached, the reward takes 10000 − RAI, where the RAI is derived from equation (23); update the parameters of the target network, θ′ ← θ. One epoch ends
Step 12
If epoch has not reached the set number of training times, return to Step 2; otherwise, store the parameters θ, and the training of the DQN ends
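The training loop in Table 1 can be sketched in code. The sketch below is illustrative only: the paper's DTD dynamics (equations (1)-(9)), action-generation rule (equations (12) and (13)), and resilience assessment index RAI (equation (23)) are replaced by hypothetical placeholder functions (`dtd_step`, `epsilon_greedy`, `rai`), and a simple linear Q-function stands in for the evaluation and target networks. Only the control flow — epoch loop, within-epoch DTD convergence loop, terminal reward 10000 − RAI, and target-network synchronization — follows the table.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_ACTION = 4, 3                              # link-flow state size, candidate signal plans (assumed)
theta = rng.normal(size=(N_STATE, N_ACTION)) * 0.1    # evaluation network (EN) parameters (Step 1)
theta_target = theta.copy()                           # target network parameters θ′ ← θ (Step 1)

def q_values(params, state):
    """Linear Q-function stand-in for the evaluation/target network."""
    return state @ params

def epsilon_greedy(state, eps=0.1):
    """Placeholder action generation (cf. equations (12) and (13))."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTION))
    return int(np.argmax(q_values(theta, state)))

def dtd_step(state, action):
    """Placeholder for one day of the DTD dynamics (equations (1)-(9)):
    perceived-cost update, route choice, and network loading."""
    next_state = 0.9 * state + 0.01 * rng.normal(size=N_STATE)
    converged = np.linalg.norm(next_state - state) < 0.05
    return next_state, converged

def rai(state):
    """Placeholder for the resilience assessment index of equation (23)."""
    return float(np.sum(np.abs(state)))

EPOCHS, GAMMA, LR = 5, 0.9, 0.01
for epoch in range(EPOCHS):                           # Steps 2 and 12
    state = rng.normal(size=N_STATE)                  # Step 2: initialize the DTD state
    converged = False
    while not converged:                              # Step 3
        action = epsilon_greedy(state)                # Step 4
        next_state, converged = dtd_step(state, action)   # Step 5
        # Zero intermediate reward is an assumption; the terminal reward follows Step 11.
        reward = (10000.0 - rai(next_state)) if converged else 0.0
        # One-step Q-learning update on the evaluation network.
        target = reward + (0.0 if converged else
                           GAMMA * np.max(q_values(theta_target, next_state)))
        td_error = target - q_values(theta, state)[action]
        theta[:, action] += LR * td_error * state
        state = next_state                            # state/cost/flow roll-over
    theta_target = theta.copy()                       # Step 11: θ′ ← θ, one epoch ends

print("trained parameter shape:", theta.shape)        # Step 12: store θ
```

In the paper's setting the placeholder `dtd_step` would be replaced by the full day-to-day assignment (perception update, logit route choice, network loading), and the linear Q-function by the trained deep network; the loop structure itself is unchanged.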