Research Article

Optimal Policy Learning for Disease Prevention Using Reinforcement Learning

Figure 5

Comparison of reward collection by an agent trained with three reinforcement learning algorithms (Q-Learning, SARSA, and DDPG) over 100 episodes. (a) Reward collection for each algorithm. (b) Sum of rewards over time for each algorithm.