Research Article

Optimal Policy Learning for Disease Prevention Using Reinforcement Learning

Figure 4

Reward collected by the agent over 100 episodes under different reinforcement learning algorithms. (a) Reward collected when the agent chooses actions at random. (b) Reward collected when the agent is trained with Q-Learning. (c) Reward collected when the agent is trained with SARSA. (d) Reward collected when the agent is trained with DDPG.