Research Article
Quality Enhanced Multimedia Content Delivery for Mobile Cloud with Deep Reinforcement Learning
Algorithm 1
ADQ Training.
Input: State
Output: Optimal policy to select action
Initialization: experience replay memory D, online network weights, target network weights, online action value function Q, target action value function, k=i=0
(1) for video-episode i=1 to E do
(2) Initialize state sequence for the received selected video episode
(3) for m=1 to M do
(4) Select action a according to ε-greedy policy from Q with probability
(5) Execute action a and observe reward
(6) Set and preprocess the state
(7)
(8) Store transition in D
(9) Sample a minibatch of tuples from distributed prioritized replay memory D
(10) // A is the set of all possible actions
(11) Determine
(12)
(13) reset
(14) s=
(15) end for
(16) end for
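The loop structure of Algorithm 1 can be illustrated with a minimal Python sketch. The target and update expressions in steps (7) and (10) to (12) are not legible in the listing above, so the sketch substitutes assumptions: a tabular action-value function in place of a network, a double-Q-style target (the online values select the next action, the target values evaluate it), proportional prioritized sampling by absolute TD error, and a periodic copy of the online values into the target. The environment `toy_step`, the function `train`, and every hyperparameter name and value are hypothetical and are not taken from the article.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def toy_step(state, action, n_states):
    """Hypothetical environment: reward 1 when the action equals state % 2."""
    reward = 1.0 if action == state % 2 else 0.0
    return reward, (state + 1) % n_states


def train(episodes=300, steps=20, n_states=4, n_actions=2, gamma=0.9,
          lr=0.1, epsilon=0.2, batch_size=8, capacity=500,
          reset_every=10, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]   # online action values Q
    q_target = [row[:] for row in q]                   # target action values
    memory, priority = [], []                          # replay memory D
    k = 0                                              # global step counter
    for _ in range(episodes):                          # step (1)
        s = 0                                          # step (2): initial state
        for _ in range(steps):                         # step (3)
            a = epsilon_greedy(q[s], epsilon, rng)     # step (4)
            r, s_next = toy_step(s, a, n_states)       # steps (5)-(6)
            # Step (8): store the transition; overwrite the oldest slot when full.
            if len(memory) < capacity:
                memory.append((s, a, r, s_next))
                priority.append(1.0)
            else:
                memory[k % capacity] = (s, a, r, s_next)
                priority[k % capacity] = 1.0
            # Step (9): sample a minibatch proportionally to priority.
            idxs = rng.choices(range(len(memory)), weights=priority,
                               k=min(batch_size, len(memory)))
            for j in idxs:
                sj, aj, rj, sj_next = memory[j]
                # Assumed double-Q-style target: online picks, target evaluates.
                a_star = max(range(n_actions), key=lambda b: q[sj_next][b])
                y = rj + gamma * q_target[sj_next][a_star]
                td_error = y - q[sj][aj]
                q[sj][aj] += lr * td_error             # tabular stand-in for a gradient step
                priority[j] = abs(td_error) + 1e-3     # refresh the priority
            k += 1
            if k % reset_every == 0:                   # step (13): reset target
                q_target = [row[:] for row in q]
            s = s_next                                 # step (14)
    return q


if __name__ == "__main__":
    for state, row in enumerate(train()):
        print(state, [round(v, 2) for v in row])
```

In this toy setting the reward depends only on whether the chosen action matches the parity of the state, so the sketch exercises the control flow (ε-greedy selection, replay storage, prioritized sampling, target reset) rather than any video-delivery behaviour. A neural approximator and the article's actual state, action, and reward definitions would replace the table and `toy_step`.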