Mathematical Problems in Engineering

Research Article

Ensemble Network Architecture for Deep Reinforcement Learning

The temporal and target values ensemble algorithm.

Initialize action-value network with random weights
Initialize the target neural network buffer
For episode 1, do
For , do
With probability select a random action , otherwise

Execute action in environment and observe reward
and next state , and store transition ) in
Sample random minibatch of transition ) from
set
Ensemble -learner
set
set
set
Set

Every steps reset
End for
End for