Wireless Communications and Mobile Computing

Research Article

Quality Enhanced Multimedia Content Delivery for Mobile Cloud with Deep Reinforcement Learning

ADQ Training.

Input: State
Output: Optimal policy to select action
Initialization: experience replay memory D, online network weights, target network weights, online action-value function Q, target action-value function, k = i = 0
(1) *for* video-episode i=1 to E do
(2) Initialize state sequence for received selected video episode
(3) *for* m=1 to M do
(4) Select action a according to the ε-greedy policy from Q with probability
(5) Execute action a and observe reward
(6) Set the next state and preprocess it
(7)
(8) Store transition in D
(9) Sample a mini-batch of tuples from the distributed prioritized replay memory D
(10) // A is the set of all possible actions
(11) Determine
(12)
(13) reset
(14) s=
(15) *end for*
(16) *end for*
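
The control flow of the listing above can be illustrated with a minimal Python sketch. Several pieces are assumptions rather than the authors' method, because the listing does not state them: the Double-DQN-style target (the online estimate picks the next action, the target estimate scores it), the discount factor `gamma`, the learning rate `alpha`, the synchronisation period `sync_every`, and the toy chain environment `step` are all hypothetical. Two Q-tables stand in for the online and target networks, and uniform sampling replaces the distributed prioritized replay memory to keep the example short.

```python
import random

N_STATES, N_ACTIONS = 5, 2  # hypothetical toy chain: action 0 = left, 1 = right


def step(state, action):
    """Hypothetical environment: reward 1.0 only when the last state is reached."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def argmax(values):
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def train(episodes=200, max_steps=20, gamma=0.9, alpha=0.5, epsilon=0.2,
          batch_size=8, sync_every=10, seed=0):
    rng = random.Random(seed)
    q_online = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # stands in for the online network
    q_target = [row[:] for row in q_online]                  # stands in for the target network
    memory = []                                              # replay memory D
    k = 0
    for _ in range(episodes):                # outer loop over episodes
        s = 0                                # initialise the state for this episode
        for _ in range(max_steps):           # inner loop over steps
            # epsilon-greedy selection; ties among greedy actions are broken at random
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                best = max(q_online[s])
                a = rng.choice([i for i, v in enumerate(q_online[s]) if v == best])
            s_next, r, done = step(s, a)                 # execute action, observe reward
            memory.append((s, a, r, s_next, done))       # store the transition in D
            # uniform mini-batch (assumption: prioritisation is omitted here)
            batch = rng.sample(memory, min(batch_size, len(memory)))
            for bs, ba, br, bn, bd in batch:
                a_star = argmax(q_online[bn])            # online estimate picks the action
                y = br if bd else br + gamma * q_target[bn][a_star]  # target estimate scores it
                q_online[bs][ba] += alpha * (y - q_online[bs][ba])   # move toward the target
            k += 1
            if k % sync_every == 0:                      # periodic reset of the target estimate
                q_target = [row[:] for row in q_online]
            s = s_next                                   # advance to the next state
            if done:
                break
    return q_online
```

Calling `train()` returns the learned online table. On this toy chain the greedy action in every non-terminal state should be "right", which gives a quick sanity check that the loop is wired correctly before the tables are swapped for real networks.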