Research Article

Dynamic Request Routing for Online Video-on-Demand Service: A Markov Decision Process Approach

Algorithm 2

Interval value iteration ( ) algorithm.
Input: a BMDP , a value function , and a variable for holding the policy in the current iteration
Output:   and
(1) Create , ; { and hold the order sequences of states in , i.e., }
(2) Create , ; { and hold the transition probabilities for the order-maximizing MDPs with respect to orders and , respectively.}
(3) Create , ; { and are the order-maximizing indices for order sequences and , respectively.}
(4) Create ; { is the index into an ordering .}
(5) ;
(6) ;
(7)for all     do
(8)for all     do
(9)   ;
(10)    ;
  {find the order-maximizing index for the transition probabilities in state under action according to (18).}
(11)   for   to   do
(12)    Update and according to (19);
(13)   end for
(14)  end for
(15)    ;(*)
(16)  if   and   then
(17)    ;
(18)    ;
(19)  else
(20)     ;(**)
(21)    ;
(22)  end if
(23) end for
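The following Python sketch illustrates the two ingredients the listing relies on: choosing an order-maximizing transition distribution inside the probability intervals (steps (10) to (13)) and iterating lower and upper value bounds. It is a minimal sketch under stated assumptions, not the article's implementation: it assumes exact rewards `R[s, a]` (no reward intervals), transition bounds `P_lo`/`P_hi`, a discount factor `gamma`, the standard greedy fill for the order-maximizing distribution, a lower bound defined as the best action under worst-case transitions and an upper bound as the best action under best-case transitions. The function names and parameters are hypothetical and are not taken from equations (18) and (19).

```python
# Sketch only: function names, parameters, and the exact-reward assumption are
# hypothetical illustrations, not the article's notation or equations (18)-(19).
import numpy as np


def order_maximizing_distribution(p_lo, p_hi, order):
    """Return (p, r): a distribution with p_lo <= p <= p_hi that puts as much
    probability mass as possible on states that appear early in `order`.

    r is the order-maximizing index: states before position r sit at their
    upper bound, states after it keep their lower bound.
    """
    if p_lo.sum() > 1.0 + 1e-12 or p_hi.sum() < 1.0 - 1e-12:
        raise ValueError("probability intervals admit no distribution")
    p = p_lo.copy()
    remaining = max(0.0, 1.0 - p_lo.sum())  # mass still to hand out
    r = len(order) - 1
    for i, s in enumerate(order):
        extra = min(p_hi[s] - p_lo[s], remaining)
        p[s] += extra
        remaining -= extra
        if remaining <= 1e-12:  # all mass assigned; later states keep p_lo
            r = i
            break
    return p, r


def interval_value_iteration(R, P_lo, P_hi, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate lower/upper value bounds for a BMDP (assumes max_iter >= 1).

    R: (S, A) exact rewards; P_lo, P_hi: (S, A, S) transition bounds.
    Lower bound: best action under worst-case transitions.
    Upper bound: best action under best-case transitions.
    """
    S, A = R.shape
    v_lo = np.zeros(S)
    v_hi = np.zeros(S)
    for _ in range(max_iter):
        order_hi = np.argsort(-v_hi)  # high-value states first: maximizes E[v_hi]
        order_lo = np.argsort(v_lo)   # low-value states first: minimizes E[v_lo]
        q_lo = np.empty((S, A))
        q_hi = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                p_best, _ = order_maximizing_distribution(P_lo[s, a], P_hi[s, a], order_hi)
                p_worst, _ = order_maximizing_distribution(P_lo[s, a], P_hi[s, a], order_lo)
                q_hi[s, a] = R[s, a] + gamma * (p_best @ v_hi)
                q_lo[s, a] = R[s, a] + gamma * (p_worst @ v_lo)
        new_lo, new_hi = q_lo.max(axis=1), q_hi.max(axis=1)
        delta = max(np.abs(new_lo - v_lo).max(), np.abs(new_hi - v_hi).max())
        v_lo, v_hi = new_lo, new_hi
        if delta < tol:
            break
    # Greedy policy w.r.t. the pessimistic bound: one possible choice, assumed here.
    policy = q_lo.argmax(axis=1)
    return v_lo, v_hi, policy


# Usage: two states, one action, every transition probability in [0.2, 0.8].
R = np.array([[1.0], [0.0]])
P_lo = np.full((2, 1, 2), 0.2)
P_hi = np.full((2, 1, 2), 0.8)
v_lo, v_hi, policy = interval_value_iteration(R, P_lo, P_hi, gamma=0.9)
print(v_lo, v_hi)
```

In this small example the worst case pushes 0.8 of the mass onto the zero-reward state and the best case onto the rewarding one, so the bounds converge to roughly [2.8, 8.2] for the first state and [1.8, 7.2] for the second. The greedy fill works because sorting states by value and saturating upper bounds in that order yields the extreme expectation over the interval polytope, which is why only an ordering and a single index need to be stored per state and action.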