Optimal Policy Learning for Disease Prevention Using Reinforcement Learning

<div>Reward collected by the agent trained with different reinforcement learning algorithms over 100 episodes. (a) Reward collected when the agent chooses actions randomly. (b) Reward collected when the agent is trained with Q-learning. (c) Reward collected when the agent is trained with SARSA. (d) Reward collected when the agent is trained with DDPG.</div>

Scientific Programming

Figure 4
