Research Article

Optimal Policy Learning for Disease Prevention Using Reinforcement Learning

Table 1

Comparison of the three reinforcement learning algorithms described in the paper (Q-Learning, SARSA, and DDPG), together with a random policy, in terms of best reward and optimal policy when each agent is run for 100 episodes.

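| Algorithm  | Best reward | Optimal policy: Year 1 | Year 2     | Year 3     | Year 4     | Year 5     |
|------------|-------------|------------------------|------------|------------|------------|------------|
| Random     | 174.16      | [0.2, 0.7]             | [0.6, 0.9] | [0.1, 0.8] | [0.4, 0.6] | [0.3, 0.1] |
| Q-Learning | 228.77      | [0.3, 0.1]             | [0.3, 0.2] | [0.5, 0.2] | [0.9, 0.5] | [0.5, 0.1] |
| SARSA      | 161.74      | [0.3, 0.1]             | [0.3, 0.1] | [0.3, 0.1] | [0.3, 0.1] | [0.3, 0.1] |
| DDPG       | 325.55      | [1.0, 0.8]             | [0.1, 0.0] | [0.1, 0.8] | [0.6, 1.0] | [0.6, 1.0] |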
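
One way to read the best-reward column of Table 1 is to express each algorithm's best reward as a percentage change relative to the Random row. The short sketch below does only that arithmetic. It is an illustrative reading of the table, not part of the authors' analysis; the helper name `relative_gain` and the choice of Random as the reference point are assumptions made here.

```python
# Best rewards copied from Table 1 (each agent run for 100 episodes).
best_reward = {
    "Random": 174.16,
    "Q-Learning": 228.77,
    "SARSA": 161.74,
    "DDPG": 325.55,
}


def relative_gain(rewards, baseline="Random"):
    """Percentage change of each best reward relative to the baseline row.

    Hypothetical helper for illustration; using Random as the baseline is an
    assumption, not something the table itself prescribes.
    """
    base = rewards[baseline]
    return {
        name: 100.0 * (reward - base) / base
        for name, reward in rewards.items()
        if name != baseline
    }


gains = relative_gain(best_reward)

# Algorithm with the highest best reward in the table.
best_algorithm = max(best_reward, key=best_reward.get)

# Print the algorithms from largest to smallest change versus Random.
for name, pct in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:+.1f}% vs. Random")
print("Highest best reward:", best_algorithm)
```

On these numbers, DDPG has the highest best reward, Q-Learning also exceeds the Random row, and SARSA falls slightly below it. Because the table reports only the best reward over 100 episodes, this comparison says nothing about average performance or variance.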