Research Article

Reinforcement Learning in an Environment Synthetically Augmented with Digital Pheromones

Algorithm 1

Temporal difference learning algorithm (adapted from [7]).
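initialize all $V(s)$ arbitrarily
for all episodes
 initialize $s$
 repeat
    choose the next state $s'$ using policy
    observe the reward $r$
    $V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]$
    $s \leftarrow s'$
 until $s$ is a terminal state
  where $V(s)$ is the value of being in state $s$ and $V(s')$ is the value estimate
   of the resultant state $s'$, $\gamma$ is the discount rate, and $\alpha$ is the learning rate.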
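
The loop in Algorithm 1 can be illustrated with a minimal Python sketch of tabular temporal difference learning. The environment here is an assumption made purely for illustration and is not taken from the article: a one-dimensional random walk in which the two end states are terminal and a reward of 1 is given only on reaching the right end. The function name `td0_evaluate` and all parameter values are likewise hypothetical, and the policy is a uniformly random choice between the two neighboring states.

```python
import random


def td0_evaluate(n_states=5, episodes=1000, alpha=0.1, gamma=1.0, seed=0):
    """Tabular TD learning on a hypothetical 1-D random walk.

    States 1..n_states are non-terminal; states 0 and n_states + 1 are
    terminal. The reward is 1 on entering the right terminal state and
    0 otherwise. Returns the list of value estimates V.
    """
    rng = random.Random(seed)
    left, right = 0, n_states + 1

    # "initialize all V(s) arbitrarily": zeros are used here, and the
    # terminal states are never updated, so they keep a value of 0.
    V = [0.0] * (n_states + 2)

    for _ in range(episodes):
        # "initialize s": every episode starts in the middle state.
        s = (n_states + 1) // 2

        while s not in (left, right):
            # "choose the next state using policy": random left/right step.
            s_next = s + rng.choice((-1, 1))

            # "observe": reward for the transition just taken.
            r = 1.0 if s_next == right else 0.0

            # Update V(s) toward the one-step target r + gamma * V(s').
            V[s] += alpha * (r + gamma * V[s_next] - V[s])

            # Move to the resultant state.
            s = s_next

    return V
```

Calling `td0_evaluate()` returns one value estimate per state. In this assumed environment with `gamma=1.0`, the estimates for states nearer the rewarding end tend to be higher, since each update moves $V(s)$ a fraction $\alpha$ of the way toward $r + \gamma V(s')$.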