Abstract

We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which yields a reward, we explicitly solve the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.