Wireless Communications and Mobile Computing

Research Article

Decentralized and Dynamic Band Selection in Uplink Enhanced Licensed-Assisted Access: Deep Reinforcement Learning Approach

DQN algorithm.

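Initialize replay buffer $\mathcal{D}$
Initialize action value function $Q$ with parameter $\theta$
Initialize target action value function $\hat{Q}$ with parameter $\theta^{-} = \theta$
Input the initial state $s_1$ to the DQN
for $t = 1, \ldots, T$ do
    Execute action $a_t$ from $Q(s_t, a; \theta)$ using $\epsilon$-greedy policy
    Observe $r_t$ and $s_{t+1}$ from the environment
    Store the transition $(s_t, a_t, r_t, s_{t+1})$ into the replay buffer $\mathcal{D}$
    Sample random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $\mathcal{D}$
    Evaluate the target $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
    Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to $\theta$
    Every $C$ steps, update the target network according to $\theta^{-} \leftarrow \theta$
end for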
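
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the environment `toy_step` is a hypothetical five-state chain rather than the uplink band selection problem, the Q-function is linear over one-hot states (so its parameters form a table) instead of a deep network, and the linearly decaying exploration rate and all hyperparameter values are placeholders chosen only so that the example runs quickly.

```python
import numpy as np


def toy_step(state, action, n_states):
    """Hypothetical chain environment (an assumption, not the paper's model):
    action 1 moves right, action 0 moves left; reward 1 for landing on the last state."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return (1.0 if nxt == n_states - 1 else 0.0), nxt


def train_dqn(n_states=5, n_actions=2, steps=5000, gamma=0.9, lr=0.5,
              eps_end=0.1, batch_size=32, sync_every=50, buffer_size=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Replay buffer as a ring buffer with columns (s, a, r, s').
    buf = np.zeros((buffer_size, 4))
    filled = 0
    # Linear Q over one-hot states: Q(s, a; theta) = theta[s, a].
    theta = np.zeros((n_states, n_actions))
    theta_target = theta.copy()              # target parameters, initialised to theta
    state = 0                                # initial state
    for t in range(steps):
        # Epsilon-greedy action; the linear decay from 1 to eps_end is an assumed schedule.
        eps = max(eps_end, 1.0 - t / (0.5 * steps))
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(theta[state]))
        # Observe reward and next state, then store the transition.
        reward, nxt = toy_step(state, action, n_states)
        buf[t % buffer_size] = (state, action, reward, nxt)
        filled = min(filled + 1, buffer_size)
        # Uniform random minibatch (sampled with replacement for brevity).
        idx = rng.integers(filled, size=batch_size)
        s, a, s2 = (buf[idx, k].astype(int) for k in (0, 1, 3))
        r = buf[idx, 2]
        # Target computed with the frozen target parameters.
        y = r + gamma * theta_target[s2].max(axis=1)
        # Gradient of the mean squared error (y - Q(s, a; theta))^2 w.r.t. theta.
        td = y - theta[s, a]
        grad = np.zeros_like(theta)
        np.add.at(grad, (s, a), -2.0 * td / batch_size)
        theta -= lr * grad
        # Every sync_every steps, copy the online parameters into the target.
        if (t + 1) % sync_every == 0:
            theta_target = theta.copy()
        state = nxt
    return theta
```

Calling `train_dqn()` returns the learned parameter table, and `np.argmax(theta, axis=1)` reads off the greedy action for each state. Keeping the target parameters fixed between synchronisations is what keeps the regression target stable while the online parameters are updated.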