Research Article

Adaptive Traffic Signal Control Model on Intersections Based on Deep Reinforcement Learning

Algorithm 1

DQN with experience replay
(1) Definition
(2) $D$: = replay memory pool
(3) $N$: = maximum number of experiences in $D$
(4) $Q$: = action-value function in Eval_net
(5) $\hat{Q}$: = action-value function in Target_net
(6) $M$: = maximum number of episodes
(7) $T$: = maximum number of iterations in each episode
(8) Initialization
(9) Initialize replay memory $D$ to capacity $N$
(10) Initialize evaluation action-value function $Q$ with random weights $\theta$
(11) Initialize target action-value function $\hat{Q}$ with random weights $\theta^{-}$
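(12) For episode $= 1, \ldots, M$ do
(13) Observe $n$ steps before decision-making
(14) Initialize environment state $s_1$
(15) For $t = 1, \ldots, T$ do
(16)  With probability $\varepsilon$ select a random action $a_t$
(17)  Otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$
(18)  Execute action $a_t$ in SUMO and observe reward $r_t$ and environment state $s_{t+1}$
(19)  Store experience $(s_t, a_t, r_t, s_{t+1})$ in $D$
(20)  Sample random batch_size experiences $(s_j, a_j, r_j, s_{j+1})$ from $D$
(21)  Set $y_j = r_j$ if $s_{j+1}$ is terminal, otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
(22)  Update network parameters $\theta$ by performing a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$
(23)  Every $C$ steps reset $\hat{Q} = Q$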
(24)  Set $s_t = s_{t+1}$
(25)End for
(26)End for
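
The loop in Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions: a hypothetical `ToyEnv` (a five-cell corridor) stands in for the SUMO intersection, a linear Q-function over one-hot states replaces the Eval_net and Target_net neural networks, and the hyperparameter values are arbitrary. The "observe n steps" warm-up in step (13) is omitted. The names `ToyEnv`, `train`, `theta`, and `theta_target` are illustrative only.

```python
import random
from collections import deque

import numpy as np


class ToyEnv:
    """Hypothetical stand-in for SUMO: a 5-cell corridor, goal at the right end."""

    n_states, n_actions = 5, 2

    def reset(self):
        self.pos = 0
        return self._obs()

    def _obs(self):
        s = np.zeros(self.n_states)
        s[self.pos] = 1.0  # one-hot state encoding
        return s

    def step(self, action):
        move = 1 if action == 1 else -1  # action 1 = right, action 0 = left
        self.pos = min(self.n_states - 1, max(0, self.pos + move))
        done = self.pos == self.n_states - 1
        return self._obs(), (1.0 if done else -0.01), done


def train(M=200, T=50, N=1000, batch_size=16, C=20,
          gamma=0.9, eps=0.1, lr=0.5, seed=0):
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)
    env = ToyEnv()
    D = deque(maxlen=N)  # replay memory with capacity N
    shape = (env.n_states, env.n_actions)
    theta = np_rng.normal(scale=0.01, size=shape)         # Eval_net weights
    theta_target = np_rng.normal(scale=0.01, size=shape)  # Target_net weights
    step = 0
    for episode in range(M):
        s = env.reset()
        for t in range(T):
            # Epsilon-greedy action selection on the evaluation function.
            if rng.random() < eps:
                a = rng.randrange(env.n_actions)
            else:
                a = int(np.argmax(s @ theta))
            s_next, r, done = env.step(a)
            D.append((s, a, r, s_next, done))
            if len(D) >= batch_size:
                grad = np.zeros_like(theta)
                for sj, aj, rj, sj1, dj in rng.sample(list(D), batch_size):
                    # Target y_j comes from the frozen target weights.
                    y = rj if dj else rj + gamma * np.max(sj1 @ theta_target)
                    td = y - (sj @ theta)[aj]
                    grad[:, aj] += -2.0 * td * sj  # gradient of (y - Q)^2
                theta -= lr * grad / batch_size    # one gradient descent step
            step += 1
            if step % C == 0:
                theta_target = theta.copy()  # every C steps reset target = eval
            s = s_next
            if done:
                break
    return theta


if __name__ == "__main__":
    print(np.argmax(train(), axis=1))  # greedy action per state
```

Keeping `theta_target` fixed between resets means the regression targets do not move with every update, and sampling from `D` breaks the correlation between consecutive transitions. In a real setting the environment calls would go through SUMO and the linear map would be replaced by the paper's networks.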