Research Article
A Resource Allocation Scheme for Intelligent Tasks in Vehicular Networks
Algorithm 2
Multiagent DQN algorithm.
1: Input: action space, state space, learning factor, discount factor, and other system parameters;
2: Output: target Q-network.
3: Initialization process:
4: for each agent do
5:     Initialize the replay buffer;
6:     Initialize the weights of the Q-network;
7:     Initialize the weights of the target Q-network.
8: end for
9: Learning process:
10: for each agent do
11:     repeat
12:         Initialize the state;
13:         for each time step do
14:             Generate a random number;
15:             if the random number is below the exploration threshold then
16:                 Select a random action from the action space;
17:             else
18:                 Select the action with the largest Q-value in the current state.
19:             end if
20:             Perform the selected action;
21:             Get the new state and the reward by Equation (11);
22:             Store the transition in the replay buffer;
23:             Sample random transitions from the replay buffer;
24:             Compute the loss over the sampled transitions using the target Q-network;
25:             Use this loss as the loss function to train the Q-network;
26:             Update the weights of the Q-network;
27:             ;
28:             Copy the Q-network weights to the target Q-network after a fixed number of steps.
29:         end for
30:     until convergence.
31: end for
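The control flow of Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a Q-table stands in for the Q-network, the two-state environment `toy_step` is hypothetical and replaces the reward of Equation (11), a fixed step budget replaces the convergence test, and all names and hyperparameter values (`Agent`, `train_agents`, `eps`, `sync_every`, and so on) are illustrative only.

```python
import random
from collections import deque


class Agent:
    """Independent learner; a Q-table stands in for the Q-network (assumption)."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9, eps=0.2,
                 buffer_size=500, batch_size=16, sync_every=20, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.batch_size, self.sync_every = batch_size, sync_every
        self.buffer = deque(maxlen=buffer_size)                # replay buffer
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # Q-network stand-in
        self.q_target = [row[:] for row in self.q]             # target Q-network
        self.steps = 0

    def greedy(self, state):
        """Return the action with the largest Q-value in this state."""
        row = self.q[state]
        return row.index(max(row))

    def act(self, state):
        """Epsilon-greedy selection: explore with probability eps, else exploit."""
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def learn(self, s, a, r, s_next):
        """Store one transition, replay a random batch, and sync the target."""
        self.buffer.append((s, a, r, s_next))
        k = min(self.batch_size, len(self.buffer))
        for bs, ba, br, bs_next in self.rng.sample(list(self.buffer), k):
            # Bootstrapped target computed from the (frozen) target table.
            target = br + self.gamma * max(self.q_target[bs_next])
            # Gradient step on the squared error 0.5 * (target - Q)^2.
            self.q[bs][ba] += self.lr * (target - self.q[bs][ba])
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.q_target = [row[:] for row in self.q]


def toy_step(state, action):
    """Hypothetical 2-state environment: reward 1 when the action matches the state."""
    reward = 1.0 if action == state else 0.0
    return (state + 1) % 2, reward


def train_agents(n_agents=2, n_steps=3000):
    """Train each agent independently for a fixed number of steps."""
    agents = [Agent(n_states=2, n_actions=2, seed=i) for i in range(n_agents)]
    for agent in agents:
        state = 0
        for _ in range(n_steps):
            action = agent.act(state)
            next_state, reward = toy_step(state, action)
            agent.learn(state, action, reward, next_state)
            state = next_state
    return agents
```

Calling `train_agents()` returns the trained agents, and `agent.greedy(state)` reads out each agent's learned action. In this sketch the agents learn independently and do not share buffers or weights; replacing the Q-table with a neural network would change only `greedy` and the update inside `learn`.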