(1) | Input parameters: $M$, $T$, $N$, $\tau$, $\gamma$, $m_{\min}$, $m_{\max}$
(2) | Randomly initialize the critic network $Q(s, a|\theta^Q)$ and the actor $\mu(s|\theta^\mu)$ with weights $\theta^Q$ and $\theta^\mu$
(3) | Initialize the target networks $Q'$ and $\mu'$ with weights $\theta^{Q'} \leftarrow \theta^Q$ and $\theta^{\mu'} \leftarrow \theta^\mu$
(4) | Initialize the replay buffer $R$
(5) | For episode = 1 to $M$ do
(6) | Initialize a random process $\mathcal{N}$ for action exploration
(7) | Initialize the AUV simulation environment |
(8) | Receive the initial observation state $s_1$ from the AUV simulation environment
(9) | For step $t = 1$ to $T$ do
(10) | Select action $a_t = \mu(s_t|\theta^\mu) + \mathcal{N}_t$ according to the current policy and exploration noise
(11) | Execute action $a_t$ in the AUV simulation environment
(12) | If $|R| \geq m_{\min}$ then
(13) | Sample a random minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $R$
(14) | Set $y_i = r_i + \gamma Q'\big(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})\,\big|\,\theta^{Q'}\big)$
(15) | Update the critic by minimizing the loss: $L = \frac{1}{N}\sum_i \big(y_i - Q(s_i, a_i|\theta^Q)\big)^2$
(16) | Update the actor policy using the sampled policy gradient: $\nabla_{\theta^\mu} J \approx \frac{1}{N}\sum_i \nabla_a Q(s, a|\theta^Q)\big|_{s=s_i,\, a=\mu(s_i)} \nabla_{\theta^\mu} \mu(s|\theta^\mu)\big|_{s_i}$
(17) | Update the target Q network: $\theta^{Q'} \leftarrow \tau\theta^Q + (1-\tau)\theta^{Q'}$
(18) | Update the target policy network: $\theta^{\mu'} \leftarrow \tau\theta^\mu + (1-\tau)\theta^{\mu'}$
(19) | End if
(20) | If $|R| > m_{\max}$ then
(21) | Remove the oldest stored data from the replay buffer $R$
(22) | End if |
(23) | Obtain the new state $s_{t+1}$
(24) | Obtain the reward $r_t$
(25) | Store the transition $(s_t, a_t, r_t, s_{t+1})$ in $R$
(26) | End for |
(27) | End for |
(28) | Output the trained critic $Q$ and actor $\mu$ with weights $\theta^Q$ and $\theta^\mu$
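
As a concrete reference point, below is a minimal PyTorch sketch of this training loop. It is an illustration under assumptions, not the paper's implementation: `AUVEnvStub` is a hypothetical stand-in for the AUV simulation environment, and the network sizes, exploration-noise scale, learning rates, and the values chosen for $M$, $T$, $N$, $\tau$, $\gamma$, $m_{\min}$, and $m_{\max}$ are placeholders. Only the control flow mirrors the pseudocode: updates begin once the buffer holds $m_{\min}$ transitions (step 12), the targets $y_i$ and the critic/actor updates follow steps 14–16, and the target networks are softly updated per steps 17–18.

```python
# Minimal DDPG sketch of the loop above; all hyperparameters and the
# environment are illustrative assumptions, not values from this work.
import collections
import random

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2        # placeholder dimensions for the AUV task
M, T, N = 100, 200, 64              # episodes, steps per episode, minibatch size
TAU, GAMMA = 0.005, 0.99            # soft-update rate and discount factor
M_MIN, M_MAX = 1_000, 50_000        # replay-buffer thresholds (m_min, m_max)


class AUVEnvStub:
    """Hypothetical stand-in for the AUV simulation environment."""

    def reset(self):
        return np.zeros(STATE_DIM, dtype=np.float32)

    def step(self, action):
        next_state = np.random.randn(STATE_DIM).astype(np.float32)
        reward = -float(np.linalg.norm(action))   # placeholder reward signal
        return next_state, reward, False          # state, reward, done


def mlp(in_dim, out_dim, out_act=None):
    layers = [nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)


def soft_update(target, source):
    # theta' <- tau * theta + (1 - tau) * theta'   (steps 17-18)
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - TAU).add_(TAU * sp.data)


actor = mlp(STATE_DIM, ACTION_DIM, nn.Tanh())       # mu(s | theta_mu)
critic = mlp(STATE_DIM + ACTION_DIM, 1)             # Q(s, a | theta_Q)
actor_tgt = mlp(STATE_DIM, ACTION_DIM, nn.Tanh())   # mu'
critic_tgt = mlp(STATE_DIM + ACTION_DIM, 1)         # Q'
actor_tgt.load_state_dict(actor.state_dict())       # theta_mu' <- theta_mu
critic_tgt.load_state_dict(critic.state_dict())     # theta_Q'  <- theta_Q

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer = collections.deque(maxlen=M_MAX)  # evicts the oldest entry (steps 20-21)
env = AUVEnvStub()

for episode in range(M):
    state = env.reset()                                     # steps 7-8
    for t in range(T):
        # Steps 10-11: policy action plus Gaussian exploration noise.
        with torch.no_grad():
            action = actor(torch.as_tensor(state)).numpy()
        action = np.clip(action + 0.1 * np.random.randn(ACTION_DIM), -1.0, 1.0)
        next_state, reward, done = env.step(action)

        # Steps 12-18: update networks once the buffer holds m_min samples.
        if len(buffer) >= M_MIN:
            batch = random.sample(buffer, N)
            s, a, r, s2 = (torch.as_tensor(np.array(x), dtype=torch.float32)
                           for x in zip(*batch))
            with torch.no_grad():                           # step 14: targets y_i
                q_next = critic_tgt(torch.cat([s2, actor_tgt(s2)], 1)).squeeze(1)
                y = r + GAMMA * q_next
            q = critic(torch.cat([s, a], 1)).squeeze(1)
            critic_loss = ((y - q) ** 2).mean()             # step 15: MSE loss
            critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
            # Step 16: ascend the policy gradient = descend -Q(s, mu(s)).
            actor_loss = -critic(torch.cat([s, actor(s)], 1)).mean()
            actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
            soft_update(critic_tgt, critic)                 # step 17
            soft_update(actor_tgt, actor)                   # step 18

        # Steps 23-25: observe s_{t+1} and r_t, store the transition in R.
        buffer.append((state, np.asarray(action, dtype=np.float32),
                       float(reward), next_state))
        state = next_state
        if done:
            break
```

Note that a `deque` with `maxlen=M_MAX` realizes steps 20–21 implicitly: once the buffer is full, each `append` discards the oldest stored transition.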