Research Article
Double Deep Recurrent Reinforcement Learning for Centralized Dynamic Multichannel Access
1) For time-slot t = 1, …, T do
2) Observe an input x(t) and feed it into the online network
3) Generate estimates of the Q-values Q(a) for all available actions with the online network
4) Take N actions with the ε-greedy method (according to (12)) and obtain an instantaneous reward rn(t) for each SU
5) Observe the next input x(t + 1)
6) Mark ah(t), rh(t) as the action and the reward with the highest score Q(a)
7) Store the tuple (x(t), ah(t), rh(t), x(t + 1)) in the replay memory
8) Sample a random minibatch of tuples (xj, aj, rj, xj+1) from the replay memory
9) Set yj = rj + γ Q(xj+1, argmax_a Q(xj+1, a; θ); θ−), where θ are the online-network parameters and θ− the target-network parameters
10) Perform a gradient descent step on (yj − Q(xj, aj; θ))^2 with respect to the online-network parameters θ
11) End for
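The training loop above can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: the recurrent online and target networks are replaced by simple linear Q-functions, and the channel environment, dimensions (N_CHANNELS, STATE_DIM), rewards, and hyperparameters are all hypothetical stand-ins. What it does show faithfully is the double-DQN structure of steps 8–10: the online network selects the next action, the target network evaluates it, and a gradient step is taken on the squared TD error.

```python
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)

N_CHANNELS = 4   # hypothetical number of channels (actions)
STATE_DIM = 8    # hypothetical observation size
GAMMA = 0.9      # discount factor
EPSILON = 0.1    # exploration rate for the epsilon-greedy policy

# Hypothetical linear Q-functions Q(x) = W @ x, standing in for the
# recurrent online and target networks of the paper.
W_online = rng.normal(size=(N_CHANNELS, STATE_DIM))
W_target = W_online.copy()

replay = deque(maxlen=1000)  # step 7: replay memory

def q_values(W, x):
    return W @ x

def epsilon_greedy(x, eps=EPSILON):
    """Step 4: choose a channel by the epsilon-greedy rule."""
    if rng.random() < eps:
        return int(rng.integers(N_CHANNELS))
    return int(np.argmax(q_values(W_online, x)))

def double_dqn_target(r, x_next):
    """Step 9: the online network picks the greedy action,
    the target network evaluates it."""
    a_star = int(np.argmax(q_values(W_online, x_next)))
    return r + GAMMA * q_values(W_target, x_next)[a_star]

def train_step(batch_size=8, lr=1e-3):
    """Steps 8-10: sample a minibatch and descend the squared TD error."""
    batch = random.sample(list(replay), min(batch_size, len(replay)))
    for x, a, r, x_next in batch:
        y = double_dqn_target(r, x_next)
        td = y - q_values(W_online, x)[a]
        # d/dW[a] of (y - Q(x, a))^2 is -2 * td * x, so descending
        # the loss moves W[a] in the +td*x direction (2 folded into lr).
        W_online[a] += lr * td * x

# Steps 1-7: interact with a toy environment and store transitions.
x = rng.normal(size=STATE_DIM)
for t in range(50):
    a = epsilon_greedy(x)
    r = 1.0 if a == 0 else 0.0  # toy reward: pretend channel 0 is idle
    x_next = rng.normal(size=STATE_DIM)
    replay.append((x, a, r, x_next))
    train_step()
    x = x_next
```

In a full implementation the target parameters θ− would also be periodically copied from the online network; that step is outside the loop shown in the pseudocode and is omitted here.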