Research Article
Adaptive Traffic Signal Control Model on Intersections Based on Deep Reinforcement Learning
Algorithm 1: DQN with experience replay
(1) Definition
(2)   $D$ := replay memory pool
(3)   $N$ := maximum number of experiences in $D$
(4)   $Q$ := action-value function in Eval_net
(5)   $\hat{Q}$ := action-value function in Target_net
(6)   $M$ := maximum number of episodes
(7)   $T$ := maximum number of iterations in each episode
(8) Initialization
(9)   Initialize replay memory $D$ to capacity $N$
(10)  Initialize evaluation action-value function $Q$ with random weights $\theta$
(11)  Initialize target action-value function $\hat{Q}$ with random weights $\theta^{-}$
(12) For episode $= 1, \dots, M$ do
(13)   Observe $n$ steps before decision-making
(14)   Initialize environment state $s_1$
(15)   For $t = 1, \dots, T$ do
(16)     With probability $\varepsilon$ select a random action $a_t$
(17)     Otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$
(18)     Execute action $a_t$ in SUMO and observe reward $r_t$ and environment state $s_{t+1}$
(19)     Store experience $(s_t, a_t, r_t, s_{t+1})$ in $D$
(20)     Sample random batch_size experiences $(s_j, a_j, r_j, s_{j+1})$ from $D$
(21)     Set $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
(22)     Update network parameters $\theta$ by performing a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$
(23)     Every $C$ steps reset $\hat{Q} = Q$
(24)     Set $s_t = s_{t+1}$
(25)   End for
(26) End for
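The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the implementation used in this work. Several pieces are assumptions made only to keep the example self-contained: a hypothetical `ToyIntersection` class stands in for SUMO, lookup tables (`q_eval`, `q_target`) stand in for Eval_net and Target_net and start at zero rather than at random weights, the warm-up of step (13) is omitted, and all hyperparameter values are arbitrary. With a table as the function approximator, the gradient descent step on the squared error reduces to moving one table entry toward the target $y_j$.

```python
import random
from collections import deque


class ToyIntersection:
    """Hypothetical stand-in for SUMO: two approaches with capped queues."""
    MAX_Q = 4

    def __init__(self, rng):
        self.rng = rng
        self.queues = [0, 0]

    def reset(self):
        self.queues = [self.rng.randint(0, self.MAX_Q) for _ in range(2)]
        return tuple(self.queues)

    def step(self, action):
        # Green on approach `action` discharges up to two vehicles (assumed).
        self.queues[action] = max(0, self.queues[action] - 2)
        # Each approach gains one vehicle with probability 0.3 (assumed).
        for i in range(2):
            if self.rng.random() < 0.3:
                self.queues[i] = min(self.MAX_Q, self.queues[i] + 1)
        reward = -sum(self.queues)  # fewer waiting vehicles is better
        return tuple(self.queues), reward


def train(episodes=30, steps=100, capacity=500, batch_size=16,
          gamma=0.9, lr=0.1, epsilon=0.1, sync_every=20, seed=0):
    rng = random.Random(seed)
    env = ToyIntersection(rng)
    actions = (0, 1)
    replay = deque(maxlen=capacity)  # D with capacity N
    q_eval = {}                      # Q: table standing in for Eval_net
    q_target = {}                    # Q-hat: table standing in for Target_net

    def q(table, s, a):
        return table.get((s, a), 0.0)  # unseen entries default to zero

    total_steps = 0
    for _ in range(episodes):                      # steps (12)-(26)
        state = env.reset()                        # step (14)
        for _ in range(steps):                     # step (15)
            if rng.random() < epsilon:             # step (16)
                action = rng.choice(actions)
            else:                                  # step (17)
                action = max(actions, key=lambda a: q(q_eval, state, a))
            next_state, reward = env.step(action)  # step (18)
            replay.append((state, action, reward, next_state))  # step (19)
            # Step (20): sample a batch (smaller while D is still filling).
            batch = rng.sample(list(replay), min(batch_size, len(replay)))
            for s, a, r, s2 in batch:
                # Step (21): target computed from the frozen target table.
                y = r + gamma * max(q(q_target, s2, b) for b in actions)
                # Step (22): gradient step on 0.5 * (y - Q(s, a))^2.
                q_eval[(s, a)] = q(q_eval, s, a) + lr * (y - q(q_eval, s, a))
            total_steps += 1
            if total_steps % sync_every == 0:      # step (23)
                q_target = dict(q_eval)
            state = next_state                     # step (24)
    return q_eval, replay
```

Calling `train()` returns the learned table and the replay memory. Replacing the tables with neural networks and `ToyIntersection` with a SUMO interface leaves the loop structure unchanged.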