Research Article

An Empirical Investigation of Transfer Effects for Reinforcement Learning

Algorithm 1

RL_Sort: the Q-learning-based algorithm for the sorting task.
input: Straining, Qn[Sn, An]
(1) initialize
(2)  upper_bound = n + 1
(3)  train_steps = 0
(4)  success_rate = 0.75
(5)  S_goal = [1, 2, ..., n]
(6) repeat
(7)  end = FALSE
(8)  swap_times = 0
(9)  s = Straining
(10)  current_rate = 0
(11)  repeat
(12)   Select an action a with the ε-greedy policy
(13)   Perform the action a and observe s′ and the corresponding reward
(14)   swap_times = swap_times + 1
(15)   if (s′ is S_goal) then
(16)    Qn[s, a] ⟵ Qn[s, a] + α × (reward_win − Qn[s, a])
(17)    end = TRUE
(18)    Compute the success rate over the latest 100 episodes and assign it to current_rate
(19)   elseif (swap_times > upper_bound) then
(20)    Qn[s, a] ⟵ Qn[s, a] + α × (reward_lose − Qn[s, a])
(21)    end = TRUE
(22)   else
(23)    if (dist(s′, S_goal) > dist(s, S_goal))
(24)     Qn[s, a] ⟵ Qn[s, a] + α × (reward_farther + γ × max_a′ Qn[s′, a′] − Qn[s, a])
(25)    elseif (dist(s′, S_goal) < dist(s, S_goal))
(26)     Qn[s, a] ⟵ Qn[s, a] + α × (reward_closer + γ × max_a′ Qn[s′, a′] − Qn[s, a])
(27)    else
(28)     Qn[s, a] ⟵ Qn[s, a] + α × (reward_neutral + γ × max_a′ Qn[s′, a′] − Qn[s, a])
(29)    s ⟵ s′
(30)  until end is TRUE
(31)  train_steps = train_steps + 1
(32) until current_rate >= success_rate
(33) return Qn, train_steps
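The listing above can be rendered as a runnable program. The sketch below is a minimal Python reading of Algorithm 1 under several assumptions the pseudocode does not fix: actions swap adjacent positions, dist counts misplaced elements, and the reward constants and hyperparameters (α, γ, ε) are illustrative values, not the paper's.

```python
import random
from itertools import permutations

def rl_sort(n=3, alpha=0.5, gamma=0.9, epsilon=0.05,
            reward_win=10.0, reward_lose=-10.0,
            reward_closer=1.0, reward_farther=-1.0, reward_neutral=0.0,
            success_rate=0.75):
    """Sketch of RL_Sort; reward constants and dist are assumptions."""
    goal = tuple(range(1, n + 1))        # S_goal = [1, 2, ..., n]
    actions = list(range(n - 1))         # action i swaps positions i and i+1
    Q = {s: [0.0] * len(actions) for s in permutations(goal)}
    upper_bound = n + 1
    dist = lambda s: sum(x != g for x, g in zip(s, goal))  # misplaced elements
    outcomes, train_steps = [], 0

    while True:                                   # outer repeat: one episode per pass
        s = tuple(random.sample(goal, n))         # Straining: a random permutation
        swap_times, done = 0, False
        while not done:                           # inner repeat: one swap per pass
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda i: Q[s][i])
            nxt = list(s)
            nxt[a], nxt[a + 1] = nxt[a + 1], nxt[a]
            s2 = tuple(nxt)
            swap_times += 1
            if s2 == goal:                        # reached the sorted state
                Q[s][a] += alpha * (reward_win - Q[s][a])
                outcomes.append(1)
                done = True
            elif swap_times > upper_bound:        # episode failed: too many swaps
                Q[s][a] += alpha * (reward_lose - Q[s][a])
                outcomes.append(0)
                done = True
            else:                                 # shaped reward from distance change
                if dist(s2) > dist(s):
                    r = reward_farther
                elif dist(s2) < dist(s):
                    r = reward_closer
                else:
                    r = reward_neutral
                Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
                s = s2
        train_steps += 1
        recent = outcomes[-100:]                  # success rate over latest 100 episodes
        if len(recent) == 100 and sum(recent) / 100 >= success_rate:
            return Q, train_steps
```

With these settings on n = 3 the state space has only 3! = 6 permutations, so the greedy policy typically starts sorting most starting permutations within upper_bound swaps after a few hundred episodes, at which point the success-rate condition terminates training.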