Research Article

Deep Reinforcement Learning for Vectored Thruster Autonomous Underwater Vehicle Control

Algorithm 1

DDPG algorithm for AUV low-level control.
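(1)Input parameters $M$, $T$, $N$, $\theta$, $\gamma$, $m_{\min}$, $m_{\max}$
(2)Randomly initialize critic network $Q(s, a \mid \theta^Q)$ and actor $\mu(s \mid \theta^\mu)$ with weights $\theta^Q$ and $\theta^\mu$
(3)Initialize target networks $Q'$ and $\mu'$ with weights $\theta^{Q'} \leftarrow \theta^Q$ and $\theta^{\mu'} \leftarrow \theta^\mu$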
(4)Initialize replay buffer
(5)For episode = 0 to M do
(6)Initialize a random process for action exploration
(7)Initialize the AUV simulation environment
(8)Receive initial observation state $s_1$ from the AUV simulation environment
(9)For step = 0 to T do
(10)Select action $a_t = \mu(s_t \mid \theta^\mu) + \mathcal{N}_t$ according to the current policy and exploration noise
(11)Execute action $a_t$ in the AUV simulation environment
(12)If the number of transitions stored in the replay buffer exceeds $m_{\min}$ then
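(13)Sample a random minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from the replay buffer
(14)Set $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$
(15)Update critic by minimizing the loss: $L = \frac{1}{N} \sum_i \left( y_i - Q(s_i, a_i \mid \theta^Q) \right)^2$
(16)Update the actor policy using the sampled policy gradient: $\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^Q) \big|_{s = s_i,\, a = \mu(s_i)} \nabla_{\theta^\mu} \mu(s \mid \theta^\mu) \big|_{s = s_i}$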
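(17)Update the target Q network with soft-update rate $\tau$: $\theta^{Q'} \leftarrow \tau \theta^Q + (1 - \tau) \theta^{Q'}$
(18)Update the target policy network: $\theta^{\mu'} \leftarrow \tau \theta^\mu + (1 - \tau) \theta^{\mu'}$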
(19)End if
(20)If the number of transitions stored in the replay buffer exceeds $m_{\max}$ then
(21)Remove the oldest stored data from the replay buffer
(22)End if
(23)Obtain the new state $s_{t+1}$
(24)Obtain reward $r_t$
(25)Store transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer
(26)End for
(27)End for
(28)Output $Q$ and $\mu$ with weights $\theta^Q$ and $\theta^\mu$
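
The bookkeeping parts of Algorithm 1 (bounded replay buffer, target values, and target-network updates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `ReplayBuffer`, `td_targets`, `soft_update`, and the rate `tau` are hypothetical, and the networks are stood in for by plain Python callables and flat lists of weights. Two assumptions are made: training is enabled once the buffer holds more than `m_min` transitions, and a `deque` with `maxlen=m_max` replaces the explicit "remove the oldest stored data" step.

```python
import random
from collections import deque


class ReplayBuffer:
    """Bounded buffer of (s, a, r, s_next) transitions (hypothetical helper)."""

    def __init__(self, m_min, m_max):
        self.m_min = m_min
        # Assumption: a deque with maxlen drops the oldest item when full,
        # standing in for the explicit remove-oldest step of the algorithm.
        self.data = deque(maxlen=m_max)

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def ready(self):
        # Assumption: updates start once more than m_min transitions exist.
        return len(self.data) > self.m_min

    def sample(self, n, rng=random):
        # Uniform minibatch of n distinct transitions.
        return rng.sample(list(self.data), n)


def td_targets(batch, gamma, target_actor, target_critic):
    """Compute y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each transition."""
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for (_s, _a, r, s_next) in batch]


def soft_update(target_weights, online_weights, tau):
    """Return tau * theta + (1 - tau) * theta' element-wise as a new list."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_weights, target_weights)]
```

In a training loop, one would call `store` after every environment step, check `ready()` before drawing a minibatch with `sample(N)`, and apply `soft_update` to both the critic and actor target weights after each gradient step.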