Research Article

Quality Enhanced Multimedia Content Delivery for Mobile Cloud with Deep Reinforcement Learning

Algorithm 1

ADQ Training.

Input: State
Output: Optimal policy to select action
Initialization: Experience Replay Memory D, Online Network Weights, Target Network Weights, online action value function Q, target action value function, k=i=0
(1)   for  video-episode i=1 to E do
(2) Initialize state sequence for received selected video episode
(3)  for  m=1 to M do
(4)  Select action a according to the ε-greedy policy from Q (a random action with probability ε)
(5)  Execute action a and observe reward
(6)  Set and preprocess the state
(7)  
(8)  Store transition in D
(9)  Sample a mini-batch of tuples from the distributed prioritized replay memory D
(10)   // A is the set of all possible actions
(11)   Determine
(12)         
(13)  reset
(14)  s=
(15)  end for
(16)  end for
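
Several steps of Algorithm 1 lost their formulas in extraction, so the following Python sketch only illustrates the three building blocks that the surviving text names: ε-greedy action selection (step 4), sampling from a prioritized replay memory (step 9), and a target computed from an online and a target network (steps 10 and 11). It assumes a double-Q-style target, which the pairing of online and target networks suggests but the text here does not confirm. The function names, the priority exponent `alpha`, and the discount factor `gamma` are all hypothetical and are not taken from the article.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sample_prioritized(priorities, batch_size, rng, alpha=0.6):
    """Sample transition indices with probability proportional to priority**alpha.

    alpha is an assumed hyperparameter; alpha = 0 gives uniform sampling.
    """
    weights = [p ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=weights, k=batch_size)


def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Assumed double-Q target: the online network picks the action over A,
    and the target network evaluates that action."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


if __name__ == "__main__":
    rng = random.Random(0)
    # Q-values of three hypothetical actions in the current state.
    action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
    # Priorities of four stored transitions; draw a mini-batch of two.
    batch = sample_prioritized([1.0, 0.5, 2.0, 0.1], batch_size=2, rng=rng)
    target = double_q_target(1.0, [0.2, 0.8], [0.5, 0.1], gamma=0.5)
    print(action, batch, target)
```

Separating action selection (online network) from action evaluation (target network) is the usual way to reduce the overestimation bias of a single max over A. Whether ADQ uses exactly this form should be checked against the original article.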