Research Article

Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm

Algorithm 1

Bootstrapped and aggregated multi-DDPG (BAMDDPG).
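Randomly initialize $N$ main critic networks $Q_i(s, a \mid \theta^{Q_i})$ and $N$ main actor networks $\mu_i(s \mid \theta^{\mu_i})$ with weights $\theta^{Q_i}$ and $\theta^{\mu_i}$
Initialize $N$ target networks $Q'_i$ and $\mu'_i$ with weights $\theta^{Q'_i} \leftarrow \theta^{Q_i}$, $\theta^{\mu'_i} \leftarrow \theta^{\mu_i}$
Initialize centralized experience replay buffer $R$
for episode = 1, M do
 Initialize an Ornstein–Uhlenbeck process $\mathcal{N}$ for action exploration
 if #Env == 1 then
  Alternately select $\mu_i$ and $Q_i$ among the multiple DDPGs to interact with the environment
 else
  Select all $\mu_i$ and $Q_i$; each DDPG is bound to one environment
 end if
 for $t = 1, T$ do
  for each selected DDPG $i$ do
   Receive state $s_t$ from its bound environment
   Execute action $a_t = \mu_i(s_t \mid \theta^{\mu_i}) + \mathcal{N}_t$ and observe reward $r_t$ and new state $s_{t+1}$
   Store experience $(s_t, a_t, r_t, s_{t+1})$ in $R$
  end for
  for $i = 1, N$ do
   Update $\theta^{Q_i}$, $\theta^{\mu_i}$, $\theta^{Q'_i}$, and $\theta^{\mu'_i}$ according to equations (4)–(6)
  end for
 end for
end for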
Get the final policy by aggregating the subpolicies: $\mu(s) = \frac{1}{N} \sum_{i=1}^{N} \mu_i(s \mid \theta^{\mu_i})$
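
The control flow of Algorithm 1 in the single-environment case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `LinearActor`, `toy_env_step`, `collect`, `sample_minibatch`, and `aggregate_policy` are hypothetical names, the actors are toy linear maps rather than neural networks, Gaussian noise stands in for the Ornstein–Uhlenbeck process, the network updates of equations (4)–(6) are omitted, and aggregation is taken here to be a plain average of the subpolicy outputs.

```python
import random
from collections import deque


class LinearActor:
    """Hypothetical stand-in for one DDPG actor: action = w * state."""

    def __init__(self, w):
        self.w = w

    def act(self, state):
        return self.w * state


def toy_env_step(state, action):
    """Hypothetical 1-D environment used only to produce transitions."""
    next_state = 0.9 * state + 0.1 * action
    reward = -(state + action) ** 2
    return reward, next_state


def collect(actors, buffer, episodes=6, steps=5, noise_std=0.1, seed=0):
    """Single-environment case: the actors take turns, one per episode."""
    rng = random.Random(seed)
    for episode in range(episodes):
        actor = actors[episode % len(actors)]  # alternate among the DDPGs
        state = 1.0
        for _ in range(steps):
            # Assumption: Gaussian noise replaces the OU process for brevity.
            action = actor.act(state) + rng.gauss(0.0, noise_std)
            reward, next_state = toy_env_step(state, action)
            # Every actor writes into the same centralized buffer.
            buffer.append((state, action, reward, next_state))
            state = next_state
            # The updates of all actor-critic pairs would follow here;
            # they are omitted because equations (4)-(6) are not shown.


def sample_minibatch(buffer, batch_size, rng):
    """Assumption: each DDPG draws its own random minibatch from the buffer."""
    return [rng.choice(buffer) for _ in range(batch_size)]


def aggregate_policy(actors, state):
    """Assumed aggregation rule: average the subpolicy outputs."""
    return sum(actor.act(state) for actor in actors) / len(actors)


if __name__ == "__main__":
    shared_buffer = deque(maxlen=1000)
    ensemble = [LinearActor(-0.5), LinearActor(-1.0), LinearActor(-1.5)]
    collect(ensemble, shared_buffer)
    batch = sample_minibatch(shared_buffer, 4, random.Random(1))
    print(len(shared_buffer), len(batch), aggregate_policy(ensemble, 2.0))
```

Because all actors append to one buffer, experience gathered by any subpolicy is available to every other one at update time, while the averaged policy is only formed once training has finished.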