Research Article
Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
Algorithm 1
Bootstrapped and aggregated multi-DDPG (BAMDDPG).
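| Randomly initialize $N$ main critic networks $Q_i(s, a \mid \theta^{Q_i})$ and $N$ main actor networks $\mu_i(s \mid \theta^{\mu_i})$ with weights $\theta^{Q_i}$ and $\theta^{\mu_i}$ |
| Initialize $N$ target networks $Q_i'$ and $\mu_i'$ with weights $\theta^{Q_i'} \leftarrow \theta^{Q_i}$, $\theta^{\mu_i'} \leftarrow \theta^{\mu_i}$ |
| Initialize centralized experience replay buffer $R$ |
| for episode = 1, M do |
|   Initialize an Ornstein–Uhlenbeck process $\mathcal{N}$ for action exploration |
|   if #Env == 1 do |
|     Alternately select $\mu_i$ and $Q_i$ among multiple DDPGs to interact with the environment |
|   else do |
|     Select all $\mu_i$ and $Q_i$; each DDPG is bound to one environment |
|   end if |
|   for $t = 1, T$ do |
|     for each selected DDPG do |
|       Receive state $s_t$ from its bound environment |
|       Execute action $a_t = \mu_i(s_t \mid \theta^{\mu_i}) + \mathcal{N}_t$ and observe reward $r_t$ and new state $s_{t+1}$ |
|       Store experience $(s_t, a_t, r_t, s_{t+1})$ in $R$ |
|     end for |
|     for $i = 1, N$ do |
|       Update $\theta^{Q_i}$, $\theta^{\mu_i}$, $\theta^{Q_i'}$, and $\theta^{\mu_i'}$ according to equations (4)–(6) |
|     end for |
|   end for |
| end for |
| Get final policy by aggregating subpolicies: $\mu(s) = \frac{1}{N} \sum_{i=1}^{N} \mu_i(s \mid \theta^{\mu_i})$ |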
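The control flow of Algorithm 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `LinearActor` and `toy_env_step` are hypothetical stand-ins for the actor networks and the environment, the Ornstein–Uhlenbeck parameters are common defaults rather than values from this article, the network updates of equations (4)–(6) are left as a placeholder, and the final aggregation is assumed to be a plain average of the subpolicy actions.

```python
import random
from collections import deque


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (unit time step) for exploration noise."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, rng=None):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.rng = rng or random.Random()
        self.x = mu

    def sample(self):
        # Mean-reverting drift toward mu plus Gaussian diffusion.
        self.x += self.theta * (self.mu - self.x) + self.sigma * self.rng.gauss(0.0, 1.0)
        return self.x


class LinearActor:
    """Hypothetical stand-in for an actor network: action = w * state."""

    def __init__(self, w):
        self.w = w

    def __call__(self, state):
        return self.w * state


def select_agents(episode, n_agents, n_envs):
    """One environment: rotate through the agents. Otherwise: all agents, one per environment."""
    if n_envs == 1:
        return [episode % n_agents]
    return list(range(n_agents))


def aggregate(actors):
    """Final policy as the average of subpolicy actions (assumed aggregation rule)."""
    return lambda state: sum(actor(state) for actor in actors) / len(actors)


def toy_env_step(state, action):
    """Hypothetical 1-D environment: reward is the negative distance of the next state from 0."""
    next_state = state + action
    return -abs(next_state), next_state


def run(n_agents=3, n_envs=1, episodes=6, steps=5, seed=0):
    rng = random.Random(seed)
    actors = [LinearActor(rng.uniform(-1.0, 0.0)) for _ in range(n_agents)]
    buffer = deque(maxlen=10_000)  # centralized replay buffer shared by all agents
    for episode in range(episodes):
        noise = OUNoise(rng=rng)
        selected = select_agents(episode, n_agents, n_envs)
        states = {i: rng.uniform(-1.0, 1.0) for i in selected}
        for _ in range(steps):
            # Only the selected agents interact, but every experience goes to the shared buffer.
            for i in selected:
                s = states[i]
                a = actors[i](s) + noise.sample()
                r, s_next = toy_env_step(s, a)
                buffer.append((s, a, r, s_next))
                states[i] = s_next
            # Every agent, selected or not, draws its own minibatch from the shared buffer.
            for i in range(n_agents):
                batch = rng.sample(list(buffer), k=min(4, len(buffer)))
                # Placeholder: the updates of equations (4)-(6) for agent i would use `batch` here.
    return actors, buffer


if __name__ == "__main__":
    actors, buffer = run()
    policy = aggregate(actors)
    print(len(buffer), policy(0.5))
```

The point the sketch tries to make visible is the decoupling between data collection and learning: with a single environment only one agent acts per episode, yet all agents can learn from every stored transition because the buffer is shared.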