Research Article
Ensemble Network Architecture for Deep Reinforcement Learning
Algorithm 1
The temporal and target values ensemble algorithm.
Initialize action-value network $Q$ with random weights $\theta$
Initialize the target neural network buffer
For episode $= 1, M$ do
  For $t = 1, T$ do
    With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \arg\max_a Q(s_t, a; \theta)$
    Execute action $a_t$ in the environment, observe reward $r_t$ and next state $s_{t+1}$, and store transition $(s_t, a_t, r_t, s_{t+1})$ in $D$
    Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$
    Set …
    Ensemble $Q$-learner
    Set …
    Set …
    Set …
    Set …
    Every $C$ steps reset …
  End for
End for
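The loop in Algorithm 1 can be sketched in Python under several stated assumptions. A small table stands in for the action-value network and a plain learning-rate step stands in for its gradient update. The target buffer is assumed to hold the $K$ most recent snapshots of $Q$, refreshed every $C$ steps. The ensemble target is assumed to average those snapshots before taking the max over next actions. That rule is borrowed from Averaged-DQN and is not confirmed to be this paper's combination rule. The names `ReplayBuffer`, `epsilon_greedy`, `ensemble_targets`, `train` and the toy environment `chain_step` are hypothetical and used only for illustration.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Replay memory D holding (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform random minibatch without replacement.
        return random.sample(list(self.data), min(batch_size, len(self.data)))


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon a random action, otherwise argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def ensemble_targets(batch, target_buffer, gamma):
    """ASSUMPTION: average the K stored target tables, then take the max over
    next actions (Averaged-DQN style); the paper's exact rule may differ."""
    avg_q = np.mean(np.stack(list(target_buffer)), axis=0)
    return np.array([
        r if done else r + gamma * avg_q[s_next].max()
        for _, _, r, s_next, done in batch
    ])


def chain_step(s, a, n_states):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left;
    reaching the last state pays reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, float(done), done


def train(n_states=5, episodes=200, max_steps=50, K=3, C=20,
          gamma=0.9, lr=0.1, epsilon=0.3, batch_size=16, seed=0):
    random.seed(seed)
    rng = np.random.default_rng(seed)
    # Tabular stand-in for the action-value network, randomly initialized.
    q = rng.normal(scale=0.01, size=(n_states, 2))
    # Target buffer holding up to K snapshots of Q (assumed meaning).
    target_buffer = deque([q.copy()], maxlen=K)
    memory = ReplayBuffer(capacity=1000)
    step = 0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next, r, done = chain_step(s, a, n_states)
            memory.store(s, a, r, s_next, done)
            batch = memory.sample(batch_size)
            y = ensemble_targets(batch, target_buffer, gamma)
            for (sj, aj, _, _, _), yj in zip(batch, y):
                q[sj, aj] += lr * (yj - q[sj, aj])  # move Q toward the target
            step += 1
            if step % C == 0:
                target_buffer.append(q.copy())  # every C steps refresh the buffer
            s = s_next
            if done:
                break
    return q
```

For example, `q = train(epsilon=1.0)` trains off-policy from uniformly random behaviour. In this deterministic toy chain, `q[:-1].argmax(axis=1)` is then expected to pick action 1 (move right) in every non-terminal state. Averaging over several past snapshots rather than using a single target copy is meant to smooth the target values. That is the motivation behind target ensembles in general, not a measured result for this sketch.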