Research Article
An Empirical Investigation of Transfer Effects for Reinforcement Learning
Algorithm 1. RL_Sort: the Q-learning-based algorithm for the sorting task.
Input: S_training, Q_n[S_n, A_n]
(1)  initialize
(2)  upper_bound = n + 1
(3)  train_steps = 0
(4)  success_rate = 0.75
(5)  S_goal = [1, 2, ..., n]
(6)  repeat
(7)      end = FALSE
(8)      swap_times = 0
(9)      s = S_training
(10)     current_rate = 0
(11)     repeat
(12)         Select an action a based on ε-greedy
(13)         Perform the action a and observe s′ and the corresponding reward
(14)         swap_times = swap_times + 1
(15)         if (s′ is S_goal) then
(16)             Q_n[s, a] ← Q_n[s, a] + α × (reward_win − Q_n[s, a])
(17)             end = TRUE
(18)             Check the success rate for the latest 100 episodes and assign it to current_rate
(19)         elseif (swap_times > upper_bound) then
(20)             …
(21)             end = TRUE
(22)         else
(23)             if (dist(s′, S_goal) > dist(s, S_goal)) then
(24)                 …
(25)             elseif (dist(s′, S_goal) < dist(s, S_goal)) then
(26)                 …
(27)             else
(28)                 …
(29)         s ← s′
(30)     until end is TRUE
(31)     train_steps = train_steps + 1
(32) until current_rate ≥ success_rate
(33) return Q_n, train_steps
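The pseudocode above can be sketched as a tabular Q-learning loop in Python. This is a minimal sketch under stated assumptions: the distance metric `dist`, the discount factor `gamma`, and the non-terminal reward constants (`reward_far`, `reward_near`, `reward_same`) are assumptions on our part, since steps (20), (24), (26), and (28) are left unspecified in the excerpt, and actions are assumed to be pairwise swaps of positions.

```python
import random
from itertools import combinations

def dist(s, goal):
    # Assumed distance metric: number of misplaced positions
    # (the paper's dist() is not specified in this excerpt).
    return sum(a != b for a, b in zip(s, goal))

def rl_sort(s_training, n, alpha=0.5, gamma=0.9, epsilon=0.1,
            success_rate=0.75, seed=0):
    """Hypothetical sketch of RL_Sort: tabular Q-learning over swap actions."""
    rng = random.Random(seed)
    goal = tuple(range(1, n + 1))
    actions = list(combinations(range(n), 2))  # action (i, j) swaps positions i, j
    Q = {}                                     # sparse table: Q[(state, action)]
    upper_bound = n + 1
    train_steps = 0
    history = []                               # 1 = win, 0 = loss, per episode
    # Assumed reward constants (not given in the excerpt):
    reward_win, reward_near, reward_far, reward_same = 1.0, 0.1, -0.1, -0.01
    current_rate = 0.0
    while current_rate < success_rate:
        s, swap_times, end = tuple(s_training), 0, False
        current_rate = 0.0                     # step (10)
        while not end:
            # Steps (12)-(13): epsilon-greedy action selection and transition.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: Q.get((s, b), 0.0))
            i, j = a
            lst = list(s)
            lst[i], lst[j] = lst[j], lst[i]
            s_next = tuple(lst)
            swap_times += 1
            if s_next == goal:
                # Step (16): terminal update toward reward_win.
                q = Q.get((s, a), 0.0)
                Q[(s, a)] = q + alpha * (reward_win - q)
                end = True
                history.append(1)
                # Step (18): success rate over the latest 100 episodes.
                recent = history[-100:]
                current_rate = sum(recent) / len(recent)
            elif swap_times > upper_bound:
                # Step (20) is unspecified in the excerpt; we simply end the episode.
                end = True
                history.append(0)
            else:
                # Steps (23)-(28): shaped reward based on distance change (assumed).
                if dist(s_next, goal) > dist(s, goal):
                    r = reward_far
                elif dist(s_next, goal) < dist(s, goal):
                    r = reward_near
                else:
                    r = reward_same
                best_next = max(Q.get((s_next, b), 0.0) for b in actions)
                q = Q.get((s, a), 0.0)
                Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
            s = s_next
        train_steps += 1
    return Q, train_steps
```

For a small instance such as `rl_sort((2, 1, 3), 3)`, the loop terminates once at least 75% of the most recent episodes reach the sorted goal within `n + 1` swaps.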