Research Article

Decentralized and Dynamic Band Selection in Uplink Enhanced Licensed-Assisted Access: Deep Reinforcement Learning Approach

Algorithm 2: DQN training algorithm for dynamic band selection.
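for each agent do
  Initialize replay buffer $\mathcal{D}$
  Initialize action value function $Q$ with parameter $\theta$
  Initialize target action value function $\hat{Q}$ with parameter $\theta^{-} = \theta$
  Generate initial state $s_1$ from the environment simulator
end for
for $t = 1, \ldots, T$ do
  for each agent do
   Execute action $a_t$ from $Q$ using $\epsilon$-greedy policy
   Collect reward $r_t$ and observation $o_t$
   Observe the next state $s_{t+1}$ from the environment simulator
   Store the transition $(s_t, a_t, r_t, s_{t+1})$ into $\mathcal{D}$
   Sample random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $\mathcal{D}$
   Evaluate the target $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$
   Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to $\theta$
   Every $C$ steps, update the target network according to $\theta^{-} \leftarrow \theta$
  end for
end for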
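
The training loop above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only to keep the example self-contained: a linear action value function stands in for the deep network, the environment is a hypothetical toy simulator in which an agent earns reward 1 when it is the only one on its chosen band, and the names `LinearDQNAgent`, `train`, and `one_hot` as well as all hyperparameter values are illustrative. Because the toy reward depends on collisions, all agents choose their actions before any of them learns, which differs slightly from the per-agent ordering of the listing.

```python
import random
from collections import deque


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class LinearDQNAgent:
    """One agent with a linear Q(s, a) = theta[a] . s (assumed stand-in for a deep network)."""

    def __init__(self, n_features, n_actions, gamma=0.9, lr=0.05, epsilon=0.1,
                 buffer_size=1000, batch_size=16, target_period=20, rng=None):
        self.n_features, self.n_actions = n_features, n_actions
        self.gamma, self.lr, self.epsilon = gamma, lr, epsilon
        self.batch_size, self.target_period = batch_size, target_period
        self.rng = rng or random.Random()
        self.buffer = deque(maxlen=buffer_size)                       # replay buffer
        self.theta = [[0.0] * n_features for _ in range(n_actions)]   # online parameters
        self.theta_target = [row[:] for row in self.theta]            # target parameters
        self.steps = 0

    @staticmethod
    def q_values(theta, state):
        return [sum(w * x for w, x in zip(row, state)) for row in theta]

    def act(self, state):
        # Epsilon-greedy action selection on the online Q-function.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        q = self.q_values(self.theta, state)
        return max(range(self.n_actions), key=q.__getitem__)

    def learn(self, state, action, reward, next_state):
        # Store the transition, then sample a random minibatch from the buffer.
        self.buffer.append((state, action, reward, next_state))
        batch = self.rng.sample(list(self.buffer), min(self.batch_size, len(self.buffer)))
        grad = [[0.0] * self.n_features for _ in range(self.n_actions)]
        for s, a, r, s_next in batch:
            # Target uses the frozen target parameters.
            y = r + self.gamma * max(self.q_values(self.theta_target, s_next))
            td_error = y - self.q_values(self.theta, s)[a]
            # Gradient of the mean of 0.5 * (y - Q)^2; the factor 2 is absorbed into lr.
            for i, x in enumerate(s):
                grad[a][i] -= td_error * x / len(batch)
        for a in range(self.n_actions):
            for i in range(self.n_features):
                self.theta[a][i] -= self.lr * grad[a][i]
        # Every target_period steps, copy the online parameters into the target.
        self.steps += 1
        if self.steps % self.target_period == 0:
            self.theta_target = [row[:] for row in self.theta]


def train(n_agents=2, n_bands=2, n_steps=2000, seed=0):
    """Toy multi-agent loop: state = one-hot of the agent's previous band plus a bias term."""
    rng = random.Random(seed)
    agents = [LinearDQNAgent(n_bands + 1, n_bands, rng=rng) for _ in range(n_agents)]
    states = [one_hot(rng.randrange(n_bands), n_bands) + [1.0] for _ in range(n_agents)]
    reward_log = []
    for _ in range(n_steps):
        actions = [agent.act(s) for agent, s in zip(agents, states)]
        # Hypothetical reward: 1 if the agent is alone on its band, 0 on collision.
        rewards = [1.0 if actions.count(a) == 1 else 0.0 for a in actions]
        next_states = [one_hot(a, n_bands) + [1.0] for a in actions]
        for agent, s, a, r, s2 in zip(agents, states, actions, rewards, next_states):
            agent.learn(s, a, r, s2)
        states = next_states
        reward_log.append(sum(rewards) / n_agents)
    return agents, reward_log


if __name__ == "__main__":
    _, log = train()
    print("mean reward over the last 200 steps:", sum(log[-200:]) / 200)
```

Each agent keeps its own replay buffer and parameters and learns only from its own transitions, which mirrors the decentralized structure of the listing. Replacing the linear model with a neural network changes only `q_values` and the gradient step.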