Optimal Policy Learning for Disease Prevention Using Reinforcement Learning

<div>Comparison of reward collection over 100 episodes by an agent trained with three reinforcement learning algorithms: Q-Learning, SARSA, and DDPG. (a) Reward collection by the agent under each algorithm. (b) Sum of rewards over time under each algorithm.</div>

Scientific Programming

Figure 5
