Journal of Advanced Transportation

Research Article

Traffic Status Prediction of Arterial Roads Based on the Deep Recurrent Q-Learning

Algorithm pseudocode.

(1)	Initialize the network structure with parameters q. Initialize the target network with parameters q′ = q.
(2)	Initialize the greedy exploration parameter epsilon, the learning rate, the reward, the attenuation (discount) coefficient gamma, the number of episodes Episodes, the number of iteration rounds per episode T, the training batch size, and the target-network parameter update cycle transfer_cycle.
(3)	for each episode in Episodes do
(4)	Initialize the traffic state.
(5)	for t from 0 to T do
(6)	Select a behavior (action), output as an integer index starting from 0: with probability 1 - epsilon, select the behavior with the largest Q-value under the current network; with probability epsilon, select a behavior at random.
(7)	Once the behavior is determined, find all states in the data table that match this behavior, and randomly select one of them as the next state (if no matching state is found in the data table, the behavior is reselected).
(8)	Store the experience in the memory pool.
(9)	Randomly sample a batch of batch-size experiences from the memory pool and compute q_eval and q_next for each.
(10)	Construct the target value q_target from the reward, gamma, and q_next.
(11)	Using q_eval and q_target, update the network parameters q by backpropagation.
(12)	If the number of iterations is an integer multiple of transfer_cycle, update the target network: q′ = q.
(13)	Set the current state to the next state.
(14)	When the maximum number of iterations T for a single episode is reached, training for that episode stops and the traffic state is reset to its initial value.
(15)	end for
(16)	end for
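
As an illustration only, the steps above can be sketched in Python. This is not the authors' implementation: to stay runnable with the standard library alone, the sketch replaces the recurrent Q-network with a plain lookup table, and it assumes the common Q-learning target q_target = reward + gamma * max(q_next). The names select_action, build_target, and train, the data_table layout (behavior index mapped to a list of (next_state, reward) rows), and the default parameter values are all hypothetical.

```python
import random
from collections import deque


def select_action(q_values, epsilon, rng):
    """Step 6: random behavior with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def build_target(reward, gamma, q_next_values):
    """Step 10 (assumed form): reward plus discounted best next-state value."""
    return reward + gamma * max(q_next_values)


def train(data_table, n_actions, episodes=20, T=30, epsilon=0.1, gamma=0.9,
          learning_rate=0.1, batch_size=8, transfer_cycle=10, seed=0):
    """Tabular stand-in for the training loop in steps 1 to 16.

    data_table maps a behavior index to a list of (next_state, reward) rows;
    this layout is an assumption made for the sketch.
    """
    if not any(data_table.get(a) for a in range(n_actions)):
        raise ValueError("data_table has no rows for any behavior")
    rng = random.Random(seed)
    states = sorted({s for rows in data_table.values() for s, _ in rows})
    q = {s: [0.0] * n_actions for s in states}         # step 1: "network" q
    q_prime = {s: list(v) for s, v in q.items()}       # step 1: target q' = q
    memory = deque(maxlen=1000)                        # memory pool
    iteration = 0
    for _ in range(episodes):                          # step 3
        state = states[0]                              # step 4 (assumed initial state)
        for _ in range(T):                             # step 5
            action = select_action(q[state], epsilon, rng)
            while not data_table.get(action):          # step 7: reselect if no match
                action = rng.randrange(n_actions)
            next_state, reward = rng.choice(data_table[action])
            memory.append((state, action, reward, next_state))   # step 8
            batch = rng.sample(list(memory), min(batch_size, len(memory)))  # step 9
            for s, a, r, s_next in batch:
                q_eval = q[s][a]
                q_target = build_target(r, gamma, q_prime[s_next])  # step 10
                # step 11: a table update stands in for backpropagation
                q[s][a] += learning_rate * (q_target - q_eval)
            iteration += 1
            if iteration % transfer_cycle == 0:        # step 12: q' = q
                q_prime = {s: list(v) for s, v in q.items()}
            state = next_state                         # step 13
    return q
```

In a faithful implementation, the table q would be replaced by the recurrent network and the table update by a gradient step on the difference between q_eval and q_target; the control flow (exploration, memory pool, periodic copy to q′) would remain as shown.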