Research Article
Reinforcement Learning in an Environment Synthetically Augmented with Digital Pheromones
Algorithm 1
Temporal difference learning algorithm (adapted from [7]).
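Initialize all $V(s)$ arbitrarily
For all episodes:
    Initialize $s$
    Repeat:
        Choose the next state $s'$ using policy $\pi$
        Observe $r$
        $V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]$
        $s \leftarrow s'$
    Until $s$ is a terminal state
where $V(s)$ is the value of being in state $s$ and $V(s')$ is the value estimate of the resultant state $s'$, $r$ is the observed reward, $\gamma$ is the discount rate, and $\alpha$ is the learning rate.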
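As an illustration of the temporal difference procedure in Algorithm 1, the following is a minimal Python sketch rather than the implementation used in this article. It assumes the standard tabular TD(0) update, in which the value of the current state is moved toward the observed reward plus the discounted value of the resultant state. The environment (a one-dimensional random walk with a reward of 1 at the right end), the uniform random policy, the function name `td0_value_estimation`, and all parameter defaults are hypothetical choices made only for this example.

```python
import random


def td0_value_estimation(n_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) on a hypothetical 1-D random walk (illustrative only).

    States 1..n_states are non-terminal; states 0 and n_states + 1 are terminal.
    The reward is 1 when the right terminal state is entered and 0 otherwise.
    """
    rng = random.Random(seed)
    left_terminal, right_terminal = 0, n_states + 1

    # Initialize all V(s); terminal states keep the value 0 throughout.
    V = [0.0] * (n_states + 2)

    for _ in range(episodes):
        # Initialize s (assumed here: start each episode in the middle state).
        s = (n_states + 1) // 2
        while s not in (left_terminal, right_terminal):
            # Choose the next state using the policy (assumed: uniform random step).
            s_next = s + rng.choice((-1, 1))
            # Observe the reward.
            r = 1.0 if s_next == right_terminal else 0.0
            # TD(0) update: move V(s) toward r + gamma * V(s').
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V


if __name__ == "__main__":
    values = td0_value_estimation()
    for state, value in enumerate(values):
        print(f"V({state}) = {value:.3f}")
```

For this toy chain the exact state values under a random policy are known to be s/6 for the non-terminal states s = 1, ..., 5, so the printed estimates should rise from left to right. A constant learning rate leaves some residual noise in the estimates; a smaller `alpha` reduces that noise at the cost of slower learning.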