Research Article
Optimal Policy Learning for Disease Prevention Using Reinforcement Learning
Figure 5
Comparison of reward collection by agents trained with three reinforcement learning algorithms (Q-Learning, SARSA, and DDPG) over 100 episodes. (a) Reward collected by the agent under each algorithm. (b) Sum of rewards over time under each algorithm.