Research Article

Double Deep Recurrent Reinforcement Learning for Centralized Dynamic Multichannel Access

Algorithm 1: DDRQN algorithm.
1) For time-slot t = 1, …, T do
2)   Observe an input x(t) and feed it into the online network
3)   Use the online network to generate an estimate of the Q-value Q(a) for all available actions
4)   Take N actions with the ε-greedy method (according to (12)) and obtain the instantaneous reward rn(t) for each SU
5)   Observe an input x(t + 1)
6)   Mark ah(t), rh(t) as the action and the reward with high scores Q(a)
7)   Store the tuple (x(t), ah(t), rh(t), x(t + 1)) in the replay memory
8)   Sample a random minibatch of tuples (xj, aj, rj, xj+1) from the replay memory
9)   Set
10)   Perform a gradient descent step on
11) End for
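
The loop above can be illustrated with a minimal Python sketch of its generic building blocks: ε-greedy selection (step 4), replay-memory storage and sampling (steps 7 and 8), and a target and loss computation (steps 9 and 10). The expressions for steps 9 and 10 are not reproduced in the listing, so the sketch assumes the standard Double DQN form, in which the online network selects the next action and a separate target network evaluates it. The function names, the discount factor gamma, the target network, and the squared-error loss are all assumptions made for illustration and are not taken from the article. The ε-greedy rule shown is the textbook single-action version and may differ from (12).

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Textbook epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward, q_online_next, q_target_next, gamma):
    """Assumed standard Double DQN target (not confirmed by the source).

    The online network's Q-values pick the next action; the target
    network's Q-values supply the value of that action.
    """
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def squared_td_loss(targets, predictions):
    """Mean squared error between targets y_j and predicted Q(x_j, a_j)."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


def sample_minibatch(memory, batch_size, rng=random):
    """Draw a uniform random minibatch without replacement from replay memory."""
    return rng.sample(list(memory), min(batch_size, len(memory)))


# Hypothetical usage: store (x, a, r, x_next) tuples, then sample them.
memory = deque(maxlen=1000)  # oldest tuples are discarded once the buffer is full
memory.append(([0, 1], 1, 1.0, [1, 0]))
memory.append(([1, 0], 0, 0.0, [0, 1]))
batch = sample_minibatch(memory, batch_size=2)
```

In a full implementation the Q-value lists would come from the recurrent online and target networks, and the loss would be minimized by a gradient step on the online network's weights.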