Journal of Advanced Transportation

Research Article

Adaptive Traffic Signal Control Model on Intersections Based on Deep Reinforcement Learning

DQN with experience replay

(1)	Definition
(2)	$D$ := replay memory pool
(3)	$N$ := maximum number of experiences in $D$
(4)	$Q$ := action-value function in Eval_net
(5)	$\hat{Q}$ := action-value function in Target_net
(6)	$M$ := maximum number of episodes
(7)	$T$ := maximum number of iterations in each episode
(8)	Initialization
(9)	Initialize replay memory $D$ to capacity $N$
(10)	Initialize evaluation action-value function $Q$ with random weights $\theta$
(11)	Initialize target action-value function $\hat{Q}$ with random weights $\theta^{-}$
(12)	For episode $= 1$ to $M$ do
(13)	Observe n steps before decision-making
(14)	Initialize environment state $s_1$
(15)	For $t = 1$ to $T$ do
(16)	With probability $\varepsilon$ select a random action $a_t$
(17)	Otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$
(18)	Execute action $a_t$ in SUMO and observe reward $r_t$ and environment state $s_{t+1}$
(19)	Store experience $(s_t, a_t, r_t, s_{t+1})$ in $D$
(20)	Sample random batch_size experiences $(s_j, a_j, r_j, s_{j+1})$ from $D$
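(21)	Set $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
(22)	Update the network parameters $\theta$ by performing a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$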
(23)	Every $C$ steps reset $\hat{Q} = Q$
(24)	Set the current state to $s_{t+1}$
(25)	End for
(26)	End for
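
The loop above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation: the neural networks Eval_net and Target_net are replaced here by small lookup tables, SUMO is replaced by a hypothetical two-state environment (`toy_step`), the initial observation of n steps is omitted, and all names (`ReplayMemory`, `epsilon_greedy`, `td_targets`, `train`) and hyperparameter values are assumptions chosen for illustration only.

```python
import copy
import random
from collections import deque


class ReplayMemory:
    """Replay memory pool D with capacity N; the oldest experience is dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Sample without replacement; early on the pool may hold fewer items.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(batch, target_q, gamma):
    """y_j = r_j + gamma * max_a' Q_hat(s_{j+1}, a') for every sampled experience."""
    return [r + gamma * max(target_q(s_next)) for (_, _, r, s_next) in batch]


def toy_step(state, action):
    """Hypothetical stand-in for SUMO: reward 1 when the action matches the state."""
    reward = 1.0 if action == state else 0.0
    next_state = (state + 1) % 2
    return reward, next_state


def train(episodes=40, steps=50, capacity=200, batch_size=8,
          gamma=0.9, epsilon=0.2, lr=0.1, sync_every=10, seed=0):
    rng = random.Random(seed)
    eval_q = {s: [0.0, 0.0] for s in range(2)}  # table standing in for Eval_net (Q)
    target_q = copy.deepcopy(eval_q)            # table standing in for Target_net (Q_hat)
    memory = ReplayMemory(capacity)
    step_count = 0
    for _ in range(episodes):
        state = 0  # initial environment state
        for _ in range(steps):
            action = epsilon_greedy(eval_q[state], epsilon, rng)
            reward, next_state = toy_step(state, action)
            memory.store(state, action, reward, next_state)
            batch = memory.sample(batch_size, rng)
            targets = td_targets(batch, lambda s: target_q[s], gamma)
            for (s, a, _, _), y in zip(batch, targets):
                # Gradient descent step on 0.5 * (y - Q(s, a))^2 for a table entry.
                eval_q[s][a] += lr * (y - eval_q[s][a])
            step_count += 1
            if step_count % sync_every == 0:
                target_q = copy.deepcopy(eval_q)  # every C steps reset Q_hat = Q
            state = next_state
    return eval_q
```

Calling `train()` returns the learned table. In this toy setting the entry for the matching action in each state should end up larger than the entry for the other action, since the two differ only in the immediate reward. With a real network, the table update would be replaced by an optimizer step on the same squared error.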