Research Article

Ensemble Network Architecture for Deep Reinforcement Learning

Algorithm 1

The temporal and target values ensemble algorithm.
Initialize action-value network $Q$ with random weights $\theta$
Initialize the target neural network buffer
For episode $= 1, 2, \ldots$ do
For $t = 1, 2, \ldots$ do
With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \arg\max_a Q(s_t, a; \theta)$
Execute action $a_t$ in the environment and observe reward $r_t$
and next state $s_{t+1}$, and store transition $(s_t, a_t, r_t, s_{t+1})$ in the replay memory $D$
Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$
set
Ensemble $Q$-learner
set
set
set
Set
Every steps reset
End for
End for
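The ε-greedy action selection and the replay-memory steps of Algorithm 1 (store each transition, then sample a random minibatch) are standard deep Q-learning machinery. The sketch below shows them in plain Python; the names `Transition`, `ReplayMemory`, and `epsilon_greedy`, the FIFO eviction policy, and the explicit `done` flag are illustrative assumptions, not the authors' code.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One stored experience (s_t, a_t, r_t, s_{t+1}) plus a terminal flag (assumed)."""
    state: Sequence[float]
    action: int
    reward: float
    next_state: Sequence[float]
    done: bool


class ReplayMemory:
    """Hypothetical fixed-capacity replay memory D; the oldest transitions are evicted first."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, transition: Transition) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def epsilon_greedy(q_values: Sequence[float], epsilon: float) -> int:
    """With probability epsilon pick a random action, otherwise the argmax of q_values."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3)
    for t in range(5):
        memory.store(Transition([float(t)], 0, 1.0, [float(t + 1)], False))
    print(len(memory))                                   # 3: only the newest three remain
    print(epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0))  # 1: purely greedy
```

Sampling uniformly from a bounded memory breaks the correlation between consecutive transitions, which is the usual motivation for experience replay in DQN-style learners.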
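Algorithm 1 keeps a buffer of target networks and forms an ensemble target from it. One way such a temporal ensemble can work is sketched below, assuming (1) the buffer holds the K most recent snapshots of the action-value parameters, (2) the target averages the snapshots' Q-values per action before taking the max, in the style of Averaged-DQN, and (3) a small Q-table stands in for the neural network. All three, along with the names `TargetBuffer`, `ensemble_target`, and `q_update`, are assumptions for illustration and not the paper's exact update rule.

```python
import copy
from collections import deque
from typing import Deque, List

# Hypothetical tabular stand-in for the network: q[state][action].
QTable = List[List[float]]


class TargetBuffer:
    """Assumed design: keep the K most recent parameter snapshots."""

    def __init__(self, k: int, q: QTable) -> None:
        self.snapshots: Deque[QTable] = deque(maxlen=k)
        self.snapshots.append(copy.deepcopy(q))

    def push(self, q: QTable) -> None:
        # Intended to run every C steps; once K snapshots exist the oldest is dropped.
        self.snapshots.append(copy.deepcopy(q))

    def averaged_q(self, state: int) -> List[float]:
        # Temporal ensemble: per-action mean Q-value over all stored snapshots.
        n_actions = len(self.snapshots[0][state])
        return [
            sum(snap[state][a] for snap in self.snapshots) / len(self.snapshots)
            for a in range(n_actions)
        ]


def ensemble_target(buffer: TargetBuffer, reward: float, next_state: int,
                    done: bool, gamma: float) -> float:
    """y_j = r_j at a terminal step, else r_j + gamma * max_a of the averaged Q."""
    if done:
        return reward
    return reward + gamma * max(buffer.averaged_q(next_state))


def q_update(q: QTable, state: int, action: int, target: float, lr: float) -> None:
    # Tabular stand-in for the gradient step on (y_j - Q(s_j, a_j))^2.
    q[state][action] += lr * (target - q[state][action])


if __name__ == "__main__":
    q = [[0.0, 1.0], [2.0, 4.0]]
    buf = TargetBuffer(k=2, q=q)
    q[1][1] = 8.0
    buf.push(q)
    y = ensemble_target(buf, reward=1.0, next_state=1, done=False, gamma=0.5)
    print(buf.averaged_q(1), y)  # [2.0, 6.0] 4.0
```

Deep-copying each snapshot keeps later updates to the online parameters from leaking into the stored targets, and `deque(maxlen=k)` gives the "replace the oldest target" behaviour without extra bookkeeping. Averaging before the max is meant to reduce the variance of the target estimate; whether the paper averages before or after the max is not stated here.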