Research Article

Reinforcement Learning-Based Autonomous Navigation and Obstacle Avoidance for USVs under Partially Observable Conditions

Algorithm 1

UANOA algorithm for navigation and obstacle avoidance of the USV.
  Initialize replay memory $D$
  Initialize evaluation function $Q$ of the USV with random weights $\theta$
  Initialize target function $\hat{Q}$ of the USV with random weights $\theta^{-}$
  for episode = 1,2, …, M do
   for t = 1, …, T do
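    With probability $\varepsilon$ select a random USV rudder action $a_t$,
    otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$
    Get reward $r_t$ and next state $s_{t+1}$ by executing rudder action $a_t$
    Store experience $(s_t, a_t, r_t, s_{t+1})$ in $D$, where $s_t$ is processed by the LSTM network
    Sample random minibatch of experience $(s_j, a_j, r_j, s_{j+1})$ from $D$
    Set $y_j = r_j$ if the episode terminates at step $j+1$, otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
    Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to the weights $\theta$
    Every $C$ steps reset $\hat{Q} = Q$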
   end
  end
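
The inner loop of Algorithm 1 follows the familiar deep Q-learning pattern, and its core steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `epsilon_greedy`, `td_target`, `LinearQ`, `replay_update`, and `maybe_sync` are hypothetical, the parameters `epsilon`, `gamma`, `lr`, and `sync_every` stand for the exploration probability, discount factor, learning rate, and target-reset interval, and the toy linear `LinearQ` class stands in for the LSTM-based network, which is omitted here.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action index, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma):
    """Target y: the reward alone on terminal steps, else reward plus the
    discounted maximum of the target function at the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


class LinearQ:
    """Toy linear Q-function, Q(s, a) = w[a] . s.

    Assumption: a stand-in for the LSTM-based network of the algorithm.
    """

    def __init__(self, n_features, n_actions, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_actions)]

    def values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def sgd_step(self, state, action, target, lr):
        # Gradient descent on (target - Q(s, a))^2 with respect to w[action];
        # the constant factor 2 is absorbed into the learning rate.
        error = target - self.values(state)[action]
        self.w[action] = [wi + lr * error * si
                          for wi, si in zip(self.w[action], state)]

    def copy_from(self, other):
        # Deep-copy the weights so later updates to `other` do not leak in.
        self.w = [row[:] for row in other.w]


def replay_update(q_eval, q_target, memory, batch_size, gamma, lr, rng):
    """Sample a random minibatch from replay memory and update q_eval."""
    batch = rng.sample(list(memory), min(batch_size, len(memory)))
    for state, action, reward, next_state, done in batch:
        y = td_target(reward, q_target.values(next_state), done, gamma)
        q_eval.sgd_step(state, action, y, lr)


def maybe_sync(step, sync_every, q_eval, q_target):
    """Every sync_every steps, reset the target function to the evaluation function."""
    if step % sync_every == 0:
        q_target.copy_from(q_eval)
```

In a training loop, each time step would call `epsilon_greedy` on `q_eval.values(state)`, append the resulting `(state, action, reward, next_state, done)` tuple to a bounded replay memory such as `collections.deque(maxlen=...)`, and then call `replay_update` followed by `maybe_sync`. Keeping the target function as a separate, periodically copied object is what keeps the regression targets fixed between resets.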