Research Article
Decentralized and Dynamic Band Selection in Uplink Enhanced Licensed-Assisted Access: Deep Reinforcement Learning Approach
Algorithm 2.
DQN training algorithm for dynamic band selection.
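for each agent $i$ do
    Initialize replay buffer $\mathcal{D}_i$
    Initialize action-value function $Q_i$ with parameter $\theta_i$
    Initialize target action-value function $\hat{Q}_i$ with parameter $\theta_i^- = \theta_i$
    Generate initial state $s_i^1$ from the environment simulator
end for
for $t = 1, \dots, T$ do
    for each agent $i$ do
        Execute action $a_i^t$ chosen from $Q_i(s_i^t, \cdot\,; \theta_i)$ using an $\epsilon$-greedy policy
        Collect reward $r_i^t$ and observation
        Observe the next state $s_i^{t+1}$ from the environment simulator
        Store the transition $(s_i^t, a_i^t, r_i^t, s_i^{t+1})$ in $\mathcal{D}_i$
        Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $\mathcal{D}_i$
        Evaluate the target $y_j = r_j + \gamma \max_{a'} \hat{Q}_i(s_{j+1}, a'; \theta_i^-)$
        Perform a gradient descent step on $\left(y_j - Q_i(s_j, a_j; \theta_i)\right)^2$ with respect to $\theta_i$
        Every $C$ steps, update the target network according to $\theta_i^- \leftarrow \theta_i$
    end for
end for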
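The per-agent training loop of Algorithm 2 can be illustrated with the minimal Python sketch below. It is a hypothetical illustration under stated assumptions, not the authors' implementation. A linear Q-function over one-hot features stands in for the deep network. The environment (`env_step`, `QUALITY`) is a toy simulator in which each band gives a fixed per-agent reward, the next state is simply the band just used, and agents do not interfere with one another. All names and hyperparameters (`GAMMA`, `LR`, `EPSILON`, `BATCH`, `CAPACITY`, `C`, `T`) are assumptions chosen for illustration.

```python
import random
from collections import deque

# Hypothetical toy settings; none of these values come from the article.
N_BANDS = 3                     # number of candidate bands per agent
QUALITY = [[0.2, 1.0, 0.5],     # agent 0: (assumed) reward of each band
           [0.9, 0.1, 0.3]]     # agent 1: (assumed) reward of each band
GAMMA, LR, EPSILON = 0.5, 0.1, 0.2
BATCH, CAPACITY, C, T = 16, 500, 20, 2000


def features(state):
    """One-hot encoding of the previously used band, plus a bias term."""
    phi = [0.0] * (N_BANDS + 1)
    phi[state] = 1.0
    phi[N_BANDS] = 1.0
    return phi


def q_value(weights, state, action):
    """Linear action value Q(s, a) = w_a . phi(s)."""
    return sum(w * x for w, x in zip(weights[action], features(state)))


def env_step(agent_idx, action):
    """Toy simulator (assumption): fixed band reward; next state = chosen band."""
    return QUALITY[agent_idx][action], action


class Agent:
    def __init__(self, idx, rng):
        self.idx, self.rng = idx, rng
        self.buffer = deque(maxlen=CAPACITY)                       # replay buffer D_i
        self.w = [[0.0] * (N_BANDS + 1) for _ in range(N_BANDS)]   # theta_i
        self.w_target = [row[:] for row in self.w]                 # theta_i^- = theta_i
        self.state = rng.randrange(N_BANDS)                        # initial state

    def greedy(self, state):
        return max(range(N_BANDS), key=lambda a: q_value(self.w, state, a))

    def act(self):
        # epsilon-greedy: explore uniformly at random with probability EPSILON
        if self.rng.random() < EPSILON:
            return self.rng.randrange(N_BANDS)
        return self.greedy(self.state)

    def learn(self):
        batch = self.rng.sample(list(self.buffer), BATCH)
        neg_grad = [[0.0] * (N_BANDS + 1) for _ in range(N_BANDS)]
        for s, a, r, s_next in batch:
            # target y = r + gamma * max_a' Q_hat(s', a'; theta^-)
            y = r + GAMMA * max(q_value(self.w_target, s_next, b)
                                for b in range(N_BANDS))
            td_error = y - q_value(self.w, s, a)
            for k, x in enumerate(features(s)):
                neg_grad[a][k] += td_error * x / BATCH
        # one gradient-descent step on the batch mean of (y - Q)^2 / 2 w.r.t. theta
        for a in range(N_BANDS):
            for k in range(N_BANDS + 1):
                self.w[a][k] += LR * neg_grad[a][k]


def train(seed=0):
    rng = random.Random(seed)
    agents = [Agent(i, rng) for i in range(len(QUALITY))]
    for t in range(1, T + 1):
        for ag in agents:
            a = ag.act()
            r, s_next = env_step(ag.idx, a)       # reward and next state
            ag.buffer.append((ag.state, a, r, s_next))
            ag.state = s_next
            if len(ag.buffer) >= BATCH:
                ag.learn()
            if t % C == 0:                        # every C steps: theta^- <- theta
                ag.w_target = [row[:] for row in ag.w]
    return agents
```

Calling `train(seed=0)` returns the trained agents; with these toy rewards, each agent's greedy choice should settle on its highest-reward band. Each agent keeps its own replay buffer and parameter sets and shares nothing with the others, which mirrors the decentralized "for each agent" structure of the algorithm.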