Research Article
Dynamic Request Routing for Online Video-on-Demand Service: A Markov Decision Process Approach
Algorithm 2
Interval value iteration algorithm.
Input: a BMDP , a value function , and a place for holding the policy in the current iteration
Output: and
(1) Create , ; { and hold the order sequences of states in , i.e., }
(2) Create , ; { and hold the transition probabilities for the order-maximizing MDP with respect to order and , respectively.}
(3) Create , ; { and is the order-maximizing index for order sequences and , respectively.}
(4) Create ; { is the index into an ordering .}
(5) ;
(6) ;
(7) for all do
(8)   for all do
(9)     ;
(10)     ; {find order-maximizing index for transition probability in state under action according to (18).}
(11)     for to do
(12)       Update and according to (19);
(13)     end for
(14)   end for
(15)   ; (*)
(16)   if and then
(17)     ;
(18)     ;
(19)   else
(20)     ; (**)
(21)     ;
(22)   end if
(23) end for
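Equations (18) and (19) are not reproduced in this section, so the following is only a rough sketch of the general idea behind steps (7) to (23). It uses the greedy order-maximizing construction commonly described for bounded-parameter MDPs, not necessarily the exact update rules of Algorithm 2. All names (`order_maximizing_probs`, `ivi_sweep`, `lo`, `hi`, `v_lo`, `v_hi`) are hypothetical. The sketch assumes point-valued rewards, interval-valued transition probabilities, and an optimistic action choice (largest upper bound, ties broken by lower bound).

```python
from typing import List, Sequence, Tuple

TOL = 1e-12


def order_maximizing_probs(
    lo: Sequence[float], hi: Sequence[float], order: Sequence[int]
) -> Tuple[List[float], int]:
    """Pick a distribution inside [lo, hi] that pushes as much mass as
    possible toward states appearing early in `order`.

    Returns the distribution and the position in `order` at which the
    free probability mass runs out (the "order-maximizing index").
    """
    probs = list(lo)            # every state gets at least its lower bound
    slack = 1.0 - sum(lo)       # mass still free to distribute
    if slack < -TOL:
        raise ValueError("lower bounds sum to more than 1")
    r = 0
    for i, q in enumerate(order):
        give = min(hi[q] - lo[q], slack)
        probs[q] += give
        slack -= give
        r = i
        if slack <= TOL:
            break
    if slack > TOL:
        raise ValueError("upper bounds sum to less than 1")
    return probs, r


def ivi_sweep(reward, lo, hi, v_lo, v_hi, gamma=0.9):
    """One sweep over all states (assumed optimistic variant).

    reward[s][a] is a scalar; lo[s][a][q] and hi[s][a][q] bound the
    probability of moving from s to q under action a.
    """
    n = len(v_lo)
    desc = sorted(range(n), key=lambda q: -v_hi[q])  # best states first
    asc = sorted(range(n), key=lambda q: v_lo[q])    # worst states first
    new_lo, new_hi, policy = [], [], []
    for s in range(n):
        best = None
        for a in range(len(reward[s])):
            p_up, _ = order_maximizing_probs(lo[s][a], hi[s][a], desc)
            p_dn, _ = order_maximizing_probs(lo[s][a], hi[s][a], asc)
            q_up = reward[s][a] + gamma * sum(p * v for p, v in zip(p_up, v_hi))
            q_dn = reward[s][a] + gamma * sum(p * v for p, v in zip(p_dn, v_lo))
            # Lexicographic comparison: upper bound first, then lower bound.
            if best is None or (q_up, q_dn) > best[:2]:
                best = (q_up, q_dn, a)
        new_hi.append(best[0])
        new_lo.append(best[1])
        policy.append(best[2])
    return new_lo, new_hi, policy
```

Calling `ivi_sweep` repeatedly until the bounds stop changing gives an interval estimate of the value function. When every interval collapses to a point (`lo == hi`), the sweep reduces to an ordinary Bellman backup.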