Research Article
Optimal Policy Learning for Disease Prevention Using Reinforcement Learning
Algorithm 3: Deep Deterministic Policy Gradient.
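(1) Randomly initialize critic network $Q(s, a \mid \theta^{Q})$ with weights $\theta^{Q}$
(2) Randomly initialize actor $\mu(s \mid \theta^{\mu})$ with weights $\theta^{\mu}$
(3) Initialize target network $Q'$ with weights $\theta^{Q'} \leftarrow \theta^{Q}$
(4) Initialize target network $\mu'$ with weights $\theta^{\mu'} \leftarrow \theta^{\mu}$
(5) Initialize replay buffer $R$
(6) for every episode do
(7)     Randomly initialize a noise process $\mathcal{N}$ for exploration
(8)     Get initial observation state $s_1$
(9)     for every step $t$ in the episode do //Repeat until s is terminal
(10)        Select action $a_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t$ as per the current policy and exploration strategy
(11)        Perform action $a_t$ and monitor reward $r_t$ and new state $s_{t+1}$
(12)        Store transition $(s_t, a_t, r_t, s_{t+1})$ in $R$
(13)        Sample a randomly selected minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $R$
(14)        Set $y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$
(15)        $L = \frac{1}{N} \sum_i \big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2$ //Update rule for critic to minimize the loss
(16)        $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s = s_i}$ //Update rule for actor policy using the sampled policy gradient
(17)        $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\, \theta^{Q'}$ //Update rule for target network
(18)        $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$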
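The bookkeeping steps of Algorithm 3 (storing and sampling transitions, forming the critic's regression target, and softly updating the target networks) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `ReplayBuffer`, `td_targets`, and `soft_update` are hypothetical. The actor and critic are passed in as plain callables, and weights are flat lists of floats. The `done` flag that switches off bootstrapping at terminal states is an added assumption, common in practice but not stated in the algorithm. The gradient steps for the critic and actor are omitted because they depend on the network library used.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once full.
        self.storage = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.storage), batch_size)


def td_targets(batch, target_actor, target_critic, gamma):
    """Compute y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each transition.

    Assumption: terminal transitions (done=True) do not bootstrap.
    """
    targets = []
    for _s, _a, r, s_next, done in batch:
        bootstrap = 0.0 if done else target_critic(s_next, target_actor(s_next))
        targets.append(r + gamma * bootstrap)
    return targets


def soft_update(target_weights, online_weights, tau):
    """Return tau * theta + (1 - tau) * theta', applied element-wise."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_weights, target_weights)]


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    buffer.store(0.0, 1.0, 1.0, 0.5, False)
    buffer.store(0.5, -1.0, 0.0, 0.0, True)
    batch = buffer.sample(2)
    # Toy stand-ins for the target actor and target critic (hypothetical).
    y = td_targets(batch, lambda s: -s, lambda s, a: s + a, gamma=0.99)
    print(y)
    print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.01))
```

A small `tau` makes the target weights track the online weights slowly, which keeps the regression target in step (14) from shifting abruptly between updates.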