Research Article
Optimal Policy Learning for Disease Prevention Using Reinforcement Learning
| Input: |
| --- |
| States: $S = \{1, \dots, n\}$ |
| Actions: $A = \{1, \dots, n\}$ |
| Rewards: $R: S \times A \to \mathbb{R}$ |
| Transitions: $T: S \times A \to S$ |
| Learning rate $\alpha \in [0, 1]$ and discount factor $\gamma \in [0, 1]$ |
| Randomly initialize $Q(s, a)$ for all $s \in S$, $a \in A(s)$ |
| for every episode do |
| Initialize $s \in S$ |
| Select $a$ from $s$ on the basis of the exploration strategy (e.g. ε-greedy) |
| for every step in the episode do // repeat until $s$ is terminal |
| Compute $\pi$ on the basis of $Q$ and the exploration strategy (e.g. $\pi(s) = \arg\max_{a} Q(s, a)$) |
| $a \leftarrow \pi(s)$ |
| $r \leftarrow R(s, a)$ |
| $s' \leftarrow T(s, a)$ |
| $Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha\, [r + \gamma \max_{a'} Q(s', a')]$ |
| $s \leftarrow s'$ |
| end for |
| end for |
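The update loop listed above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: it assumes the standard Q-learning target $r + \gamma \max_{a'} Q(s', a')$, initializes $Q$ to zero rather than randomly, and breaks ties between equally valued actions at random. The function name `q_learning`, its parameters, and the five-state chain environment used to exercise it are all hypothetical and unrelated to any disease model.

```python
import random


def q_learning(n_states, n_actions, reward, transition, is_terminal,
               alpha=0.5, gamma=0.9, epsilon=0.1,
               episodes=500, max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration strategy."""
    rng = random.Random(seed)
    # Assumption: Q starts at zero; the listing allows any initial values.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)  # initialize s in S
        for _ in range(max_steps):
            if is_terminal(s):
                break
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                a = rng.randrange(n_actions)
            else:
                # Exploit: greedy action, ties broken at random.
                best = max(Q[s])
                a = rng.choice([b for b in range(n_actions) if Q[s][b] == best])
            r = reward(s, a)
            s_next = transition(s, a)
            # A terminal successor contributes no future value.
            future = 0.0 if is_terminal(s_next) else max(Q[s_next])
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * future)
            s = s_next
    return Q


# Hypothetical toy environment: a chain of states 0..4, where state 4 is
# terminal, action 0 moves left, action 1 moves right, and the only reward
# is 1 for stepping from state 3 into state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def chain_transition(s, a):
    return min(s + 1, GOAL) if a == 1 else max(s - 1, 0)


def chain_reward(s, a):
    return 1.0 if (s == GOAL - 1 and a == 1) else 0.0


Q = q_learning(N_STATES, N_ACTIONS, chain_reward, chain_transition,
               is_terminal=lambda s: s == GOAL)
# Greedy policy read off the learned table for the non-terminal states.
policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)
```

Because the toy chain is deterministic, a relatively large learning rate is used here; in a stochastic setting a smaller $\alpha$ would normally be the safer choice. The greedy policy is extracted only after training, which mirrors the listing's separation between the exploration strategy and the learned $Q$.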