Research Article

Exploration Entropy for Reinforcement Learning

Algorithm 2: Probabilistic Q-learning.
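Initialize $Q(s, a)$ arbitrarily
Initialize the policy $\pi$: $p(s, a_i)$ for every state $s$ and action $a_i$
repeat
 Initialize $s$
 repeat
  $a \leftarrow$ action $a_i$ with probability $p(s, a_i)$ for $s$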
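  Take action $a$, observe reward $r$, and next state $s'$
  $Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$
  $p(s, a) \leftarrow p(s, a) + k \left( r + \max_{a'} Q(s', a') \right)$
  Normalize $p(s, \cdot)$
  $s \leftarrow s'$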
 until $s$ is the destination
until the learning process ends
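
The loop structure of Algorithm 2 can be illustrated with a minimal tabular sketch in Python. It is not the authors' implementation. It assumes the standard Q-learning update with learning rate `alpha` and discount `gamma`, and an additive reinforcement of the chosen action's probability with a step size `k`, followed by normalization. The chain environment `chain_step`, the parameter values, and the `max_steps` safety cap are all hypothetical and chosen only to keep the example self-contained. Rewards are non-negative here, which keeps the unnormalized probabilities non-negative.

```python
import random


def normalize(probs):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]


def chain_step(state, action, n_states):
    """Hypothetical toy environment: a 1-D chain of states.

    Action 0 moves left, action 1 moves right. The reward is 1.0 when the
    last state (the destination) is reached and 0.0 otherwise.
    """
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward


def probabilistic_q_learning(n_states=5, n_actions=2, episodes=200,
                             alpha=0.1, gamma=0.9, k=0.01,
                             max_steps=1000, seed=0):
    rng = random.Random(seed)
    # Initialize Q(s, a) (here to zero) and a uniform policy p(s, a_i).
    Q = [[0.0] * n_actions for _ in range(n_states)]
    p = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    destination = n_states - 1

    for _ in range(episodes):
        s = 0  # initialize s
        for _ in range(max_steps):  # safety cap, not part of Algorithm 2
            # Sample action a_i with probability p(s, a_i).
            a = rng.choices(range(n_actions), weights=p[s])[0]
            s_next, r = chain_step(s, a, n_states)
            best_next = max(Q[s_next])
            # Standard Q-learning update.
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            # Assumed probability reinforcement, then normalization.
            p[s][a] += k * (r + best_next)
            p[s] = normalize(p[s])
            s = s_next
            if s == destination:
                break
    return Q, p


if __name__ == "__main__":
    Q, p = probabilistic_q_learning()
    for state, row in enumerate(p):
        print(state, [round(x, 3) for x in row])
```

Because actions are sampled directly from `p`, the table of probabilities controls exploration by itself, and no separate exploration parameter such as an epsilon or a temperature is needed in this sketch.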