Research Article

Optimal Policy Learning for Disease Prevention Using Reinforcement Learning

Algorithm 3: Deep Deterministic Policy Gradient.
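(1)Randomly initialize critic network $Q(s, a \mid \theta^{Q})$ with weights $\theta^{Q}$
(2)Randomly initialize actor $\mu(s \mid \theta^{\mu})$ with weights $\theta^{\mu}$
(3)Initialize target network $Q'$ with weights $\theta^{Q'} \leftarrow \theta^{Q}$
(4)Initialize target network $\mu'$ with weights $\theta^{\mu'} \leftarrow \theta^{\mu}$
(5)Initialize replay buffer $R$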
(6)for every episode do
(7)  Initialize a random process $\mathcal{N}$ for action exploration
(8)  Get initial observation state $s_1$
(9)  for every step $t$ in the episode do
     //Repeat until $s_t$ is terminal
(10)    Select action $a_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t$ as per the current policy and exploration strategy
(11)    Perform action $a_t$ and observe reward $r_t$ and new state $s_{t+1}$
(12)    Store transition $(s_t, a_t, r_t, s_{t+1})$ in $R$
(13)    Sample a random minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from $R$
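(14)    Set $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$
(15)    Update the critic by minimizing $L = \frac{1}{N} \sum_i \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2$
        //Update rule for critic to minimize the loss
(16)    Update the actor using $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^{Q}) \big|_{s = s_i, a = \mu(s_i)} \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}$
        //Update rule for actor policy using the sampled policy gradient
(17)    $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau) \theta^{Q'}$
        //Update rule for target network
(18)    $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau) \theta^{\mu'}$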
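
The bookkeeping parts of the algorithm (noisy action selection, the replay buffer, the bootstrapped critic target, the critic loss, and the soft target-network update) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: all names (`ReplayBuffer`, `select_action`, `td_targets`, `critic_loss`, `soft_update`) are hypothetical, actor and critic are passed in as ordinary callables, Gaussian noise stands in for whatever exploration process the authors used, and the `done` flag that disables bootstrapping at terminal states is an addition that does not appear in the algorithm box. The actor's policy-gradient step is omitted because in practice it is computed by an automatic-differentiation library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer of transitions (s, a, r, s_next, done)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.data = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, n):
        # Uniform sampling without replacement; never asks for more than is stored.
        return random.sample(list(self.data), min(n, len(self.data)))


def select_action(actor, s, noise_std):
    """Deterministic policy output plus zero-mean Gaussian exploration noise (assumed)."""
    return actor(s) + random.gauss(0.0, noise_std)


def td_targets(batch, target_actor, target_critic, gamma):
    """y_i = r_i + gamma * Q'(s_next, mu'(s_next)); no bootstrap when done is True."""
    targets = []
    for _s, _a, r, s_next, done in batch:
        bootstrap = 0.0 if done else target_critic(s_next, target_actor(s_next))
        targets.append(r + gamma * bootstrap)
    return targets


def critic_loss(batch, targets, critic):
    """Mean squared error between the targets y_i and Q(s_i, a_i)."""
    errors = [(y - critic(s, a)) ** 2 for (s, a, _r, _sn, _d), y in zip(batch, targets)]
    return sum(errors) / len(errors)


def soft_update(target_weights, online_weights, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online_weights, target_weights)]


# Toy usage with scalar states and actions (illustrative stand-ins for networks).
toy_actor = lambda s: s            # mu'(s) = s
toy_critic = lambda s, a: s + a    # Q'(s, a) = s + a

buffer = ReplayBuffer(capacity=2)
buffer.store(0.0, 0.0, 1.0, 2.0, False)  # non-terminal transition
buffer.store(0.0, 0.0, 1.0, 2.0, True)   # terminal transition

batch = list(buffer.data)
targets = td_targets(batch, toy_actor, toy_critic, gamma=0.5)
print(targets)                                           # [3.0, 1.0]
print(critic_loss(batch, targets, toy_critic))           # 5.0
print(soft_update([0.0, 2.0], [1.0, 4.0], tau=0.5))      # [0.5, 3.0]
```

A small `tau` makes the target networks trail the online networks slowly, which keeps the bootstrapped targets from shifting abruptly between updates; the value 0.5 above is chosen only so the arithmetic is easy to check by hand.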