We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which creates reward, we solve explicitly the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.

Exact solution of the Bellman equation for a β-discounted reward in a two-armed bandit with switching arms


International Journal of Stochastic Analysis



https://www.hindawi.com/journals/ijsa/1999/924375/