Computational Intelligence and Neuroscience

Research Article

Simultaneous Pickup and Delivery Traveling Salesman Problem considering the Express Lockers Using Attention Route Planning Network

Pseudocode of the routing deep Q-learning algorithm.

Input: replay memory capacity N, number of training episodes M, target network update interval C
Output: trained policy network parameter set θ
(1)	Initialize replay memory D to capacity N;
(2)	Initialize policy network Q with random parameter θ;
(3)	Initialize target network $\hat{Q}$ with parameter $\theta^{-} = \theta$;
(4)	for episode ← 1 to M do
(5)	Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$;
(6)	Initialize turn number t = 1;
(7)	while sequence is not done do
(8)	if with probability $\varepsilon$ then
(9)	Select a random action a_t from accessible points;
(10)	else
(11)	Select $a_t = \arg\max_{a} Q(\phi(s_t), a; \theta)$;
(12)	end
(13)	Execute action a_t in emulator;
(14)	Observe reward r_t and status set x_{t+1};
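(15)	Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$;
(16)	Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in D;
(17)	Sample random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from D;
(18)	Set $y_j = r_j$ if the episode terminates at step $j+1$; otherwise set $y_j = r_j + \gamma \max_{a'} \hat{Q}(\phi_{j+1}, a'; \theta^{-})$;
(19)	Perform a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$ w.r.t. $\theta$;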
(20)	if t % C = 0 then
(21)	Set $\hat{Q} = Q$, i.e., set $\theta^{-} = \theta$;
	end
	t ← t + 1;
	end
	end
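
The loop above can be illustrated with a minimal Python sketch. It is not the paper's implementation: a lookup table stands in for the attention-based policy network, the preprocessing step is omitted, and the pickup, delivery, and locker constraints are left out. The four-node instance `POINTS`, the helper names (`accessible`, `step`, `train`, `greedy_tour`, `tour_length`), and all hyperparameter values are assumptions chosen for illustration. For a table, the gradient step on the squared error reduces to moving the entry toward the target y_j (the constant factor is absorbed into the learning rate).

```python
import math
import random
from collections import deque, defaultdict

# Hypothetical toy instance: node 0 is the depot, the others are customers.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]

def dist(i, j):
    (x1, y1), (x2, y2) = POINTS[i], POINTS[j]
    return math.hypot(x1 - x2, y1 - y2)

def accessible(state):
    """Unvisited nodes; a state is (current node, frozenset of visited nodes)."""
    _, visited = state
    return [n for n in range(len(POINTS)) if n not in visited]

def step(state, action):
    """Toy emulator: move to `action`; the reward is the negative distance."""
    current, visited = state
    next_state = (action, visited | {action})
    reward = -dist(current, action)
    done = len(next_state[1]) == len(POINTS)
    if done:
        reward -= dist(action, 0)  # close the tour at the depot
    return next_state, reward, done

def train(N=500, M=300, C=2, batch=16, gamma=1.0, lr=0.2, eps=0.2, seed=0):
    rng = random.Random(seed)
    D = deque(maxlen=N)            # (1) replay memory with capacity N
    Q = defaultdict(float)         # (2) policy "network" (a table here)
    Q_target = defaultdict(float)  # (3) target copy of the policy table
    for _ in range(M):             # (4)
        state, done, t = (0, frozenset({0})), False, 1  # (5)-(6)
        while not done:            # (7)
            acts = accessible(state)
            if rng.random() < eps:                       # (8)-(9) explore
                a = rng.choice(acts)
            else:                                        # (11) exploit
                a = max(acts, key=lambda b: Q[(state, b)])
            nxt, r, done = step(state, a)                # (13)-(14)
            D.append((state, a, r, nxt, done))           # (16)
            minibatch = rng.sample(list(D), min(batch, len(D)))  # (17)
            for s_j, a_j, r_j, s_next, d_j in minibatch:
                if d_j:                                  # (18) TD target
                    y = r_j
                else:
                    y = r_j + gamma * max(
                        Q_target[(s_next, b)] for b in accessible(s_next))
                Q[(s_j, a_j)] += lr * (y - Q[(s_j, a_j)])  # (19)
            if t % C == 0:                               # (20)-(21)
                Q_target = defaultdict(float, Q)
            state = nxt
            t += 1
    return Q

def greedy_tour(Q):
    """Follow the learned values greedily from the depot."""
    state, tour, done = (0, frozenset({0})), [0], False
    while not done:
        a = max(accessible(state), key=lambda b: Q[(state, b)])
        state, _, done = step(state, a)
        tour.append(a)
    return tour

def tour_length(tour):
    """Length of the closed tour, including the return to the depot."""
    return sum(dist(a, b) for a, b in zip(tour, tour[1:] + tour[:1]))
```

Calling `greedy_tour(train())` returns a visiting order that starts at the depot, and `tour_length` gives the length of the corresponding closed tour. Because each episode in this toy instance has only three steps, the sketch uses a small update interval (`C=2`) so that the target table is refreshed within every episode.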