Mathematical Problems in Engineering

Research Article

A Sarsa(λ) Algorithm Based on Double-Layer Fuzzy Reasoning

DFR-Sarsa(λ).

(1) Initialize parameter vector θ, eligibility trace vector e, discount factor γ, step-size parameter α
(2) Repeat (for every episode):
(3) ← initial state
(4) According to (10), compute ,
(5) According to the ε-greedy policy, select activation action ,
(6) According to (13), select action when state is
(7) According to (16), compute , ,
(8) According to (17) and (18), compute
(9) Repeat (for each step of the episode):
(10) Update eligibility trace: , ,
(11) Take action , receive next state and reward
(12)
(13) According to the ε-greedy policy, select activation action ,
(14) According to (13), select action when state is
(15) According to (16), compute , ,
(16) According to (10), compute ,
(17) According to (17) and (18), compute
(18)
(19)
(20)
(21) Until is the terminal state
(22) Until the preset number of episodes is reached or another termination condition is met
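
The control flow of the listing (select an action with an ε-greedy policy, update the eligibility trace, take the action, form the temporal-difference error, update the parameter vector, and repeat until the terminal state) can be illustrated with a minimal Python sketch. It is not the DFR-Sarsa(λ) algorithm itself: the double-layer fuzzy reasoning of equations (10), (13), and (16) to (18) is replaced by a plain one-hot (tabular) linear approximation, which is an assumption made only for illustration. The chain environment, the function names `epsilon_greedy` and `sarsa_lambda`, all default parameter values, the use of accumulating traces, and the per-episode reset of the trace vector are likewise hypothetical choices and do not come from the paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def sarsa_lambda(n_states=5, n_actions=2, episodes=200, alpha=0.1,
                 gamma=0.95, lam=0.9, epsilon=0.1, seed=0):
    """Sarsa(lambda) with one-hot features on a hypothetical chain task.

    States are 0..n_states-1; action 1 moves right, action 0 moves left.
    Reaching the last state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    # theta[s][a]: one weight per (state, action) pair, so Q(s, a) = theta[s][a].
    theta = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Assumption: traces are reset at the start of every episode.
        e = [[0.0] * n_actions for _ in range(n_states)]
        s = 0
        a = epsilon_greedy(theta[s], epsilon, rng)
        done = False
        while not done:
            e[s][a] += 1.0  # accumulating trace for the visited pair
            # Take the action, observe next state and reward.
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # Temporal-difference error: r + gamma * Q(s', a') - Q(s, a).
            delta = r - theta[s][a]
            if not done:
                a_next = epsilon_greedy(theta[s_next], epsilon, rng)
                delta += gamma * theta[s_next][a_next]
            # Update every weight in proportion to its trace, then decay traces.
            for i in range(n_states):
                for j in range(n_actions):
                    theta[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam
            if not done:
                s, a = s_next, a_next
    return theta


if __name__ == "__main__":
    for state, q in enumerate(sarsa_lambda()):
        print(state, [round(v, 3) for v in q])
```

In this sketch the value of moving right from the state next to the goal approaches the terminal reward of 1, and values farther from the goal are discounted accordingly. Replacing the one-hot features with fuzzy-rule activations would bring the sketch closer to the structure of the listing, but that step depends on equations not reproduced here.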