Research Article

A Sarsa(λ) Algorithm Based on Double-Layer Fuzzy Reasoning

Algorithm 1: DFR-Sarsa(λ).
(1) Initialize the parameter vector, eligibility trace vector, discount factor, and step-size parameter
(2) Repeat (for every episode):
(3) Set the current state to the initial state
(4) According to (10), compute ,
(5) According to the ε-greedy policy, select the activation action
(6) According to (13), select the action for the current state
(7) According to (16), compute , ,
(8) According to (17) and (18), compute
(9) Repeat (for each step of the episode):
(10)  Update eligibility trace: , ,
(11)  Take the selected action; observe the next state and the reward
(12)  
(13)  According to the ε-greedy policy, select the activation action
(14)  According to (13), select the action for the next state
(15)  According to (16), compute , ,
(16)  According to (10), compute ,
(17)  According to (17) and (18), compute
(18)  
(19)  
(20)  
(21) Until the current state is the terminal state
(22) Until the preset number of episodes is reached or another termination condition is met
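
The loop structure of Algorithm 1 (initialize, select an action with the ε-greedy policy, update the eligibility trace, act, update the parameters, repeat until the terminal state) can be illustrated with a minimal sketch of generic linear Sarsa(λ) with accumulating traces. This is not the DFR-Sarsa(λ) method itself: the fuzzy-reasoning steps that refer to (10), (13), and (16) to (18) are replaced here by plain one-hot features, and the chain environment, the function names, and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np


def features(state, action, n_states, n_actions):
    """One-hot feature vector for a (state, action) pair (assumed stand-in
    for the fuzzy-rule activations of the original algorithm)."""
    phi = np.zeros(n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi


def epsilon_greedy(theta, state, n_states, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    q = [theta @ features(state, a, n_states, n_actions) for a in range(n_actions)]
    return int(np.argmax(q))


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.
    Every step costs -1; the rightmost state is terminal."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return nxt, -1.0, nxt == n_states - 1


def sarsa_lambda(n_states=5, n_actions=2, episodes=200, alpha=0.1,
                 gamma=0.95, lam=0.8, epsilon=0.1, max_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_states * n_actions)           # parameter vector
    for _ in range(episodes):
        e = np.zeros_like(theta)                     # eligibility trace vector
        x = 0                                        # initial state
        u = epsilon_greedy(theta, x, n_states, n_actions, epsilon, rng)
        for _ in range(max_steps):                   # safety cap on episode length
            phi = features(x, u, n_states, n_actions)
            e = gamma * lam * e + phi                # accumulating trace update
            x_next, r, done = chain_step(x, u, n_states)
            delta = r - theta @ phi                  # TD error, first part
            if not done:
                u_next = epsilon_greedy(theta, x_next, n_states, n_actions,
                                        epsilon, rng)
                delta += gamma * (theta @ features(x_next, u_next,
                                                   n_states, n_actions))
            theta += alpha * delta * e               # parameter update
            if done:
                break
            x, u = x_next, u_next
    return theta


if __name__ == "__main__":
    theta = sarsa_lambda()
    print(theta.reshape(5, 2))  # rows: states, columns: actions (left, right)
```

Because the features are one-hot, the sketch reduces to tabular Sarsa(λ); swapping `features` for a function that returns fuzzy membership activations would be the natural place to plug in a rule-based representation.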