Research Article
Joint Optimization for MEC Computation Offloading and Resource Allocation in IoV Based on Deep Reinforcement Learning
Algorithm 1
Decentralized multiagent DDPG optimization method.
Randomly initialize the critic network and the actor network with their weights
Initialize the target networks with their weights
Initialize the replay buffer
for each episode do
Initialize a random process for action exploration
Receive the initial observation state
for each time step do
Select an action according to the current policy and exploration noise
Execute the action, then observe the reward and the next state
Store all transitions in the replay buffer
Sample a random mini-batch of transitions from the replay buffer
Set the target value for each sampled transition
Update the critic network by minimizing the loss
Update the actor policy by using the sampled policy gradient
Update the target networks for each agent
end for
end for
end for
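The update steps of Algorithm 1 can be illustrated with a minimal sketch. The formulas in the listing did not survive extraction, so the code below assumes the standard single-agent DDPG forms: a target value of reward plus discounted target-critic estimate, a mean-squared-error critic loss, and a soft target update with rate tau. The names ReplayBuffer, td_targets, critic_loss, and soft_update, and the values of GAMMA and TAU, are hypothetical and are not taken from the article. Networks are stood in for by plain Python callables and weight lists, and the actor's policy-gradient step is omitted because it needs an automatic-differentiation library.

```python
import random
from collections import deque

GAMMA = 0.99  # discount factor (assumed value, not from the article)
TAU = 0.01    # soft-update rate (assumed value, not from the article)


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def store(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        items = list(self.storage)
        return random.sample(items, min(batch_size, len(items)))


def td_targets(batch, target_actor, target_critic, gamma=GAMMA):
    """Standard DDPG target: y = r + gamma * Q'(s', mu'(s'))."""
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for (_s, _a, r, s_next) in batch]


def critic_loss(batch, targets, critic):
    """Mean squared error between the targets and the critic's Q(s, a)."""
    errors = [(y - critic(s, a)) ** 2
              for (s, a, _r, _s_next), y in zip(batch, targets)]
    return sum(errors) / len(errors)


def soft_update(target_weights, online_weights, tau=TAU):
    """theta_target <- tau * theta + (1 - tau) * theta_target, element-wise."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_weights, target_weights)]
```

In a multiagent setting, each agent would hold its own actor, critic, and target copies and call soft_update on its own weights once per update step. A small tau keeps the target networks changing slowly, which is the usual reason for using soft rather than hard target updates.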