Exact solution of the Bellman equation for a -discounted reward in a two-armed bandit with switching arms
We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which creates reward, we solve explicitly the Bellman equation for a -discounted reward and prove that a myopic policy is optimal.
Copyright © 1999 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.