Table 1: FQL algorithm.

(1) Initialize $t = 0$ and $q[i, j] = 0$ for all $i \in [1, I]$ and $j \in [1, J]$, and observe the state $x_t$.
(2) For every rule $i$, determine $\alpha_i(x_t)$ according to the membership functions of rule $i$.
(3) For every rule $i$, select $k[i]$ with an exploration/exploitation policy (EEP, formula (7)).
(4) Calculate the inferred action $a(x_t)$ (formula (8)).
(5) Calculate the corresponding $Q(x_t, a(x_t))$ (formula (9)).
(6) Perform the action $a(x_t)$, receive the reward $r_{t+1}$, and observe the next state $x_{t+1}$.
(7) For every rule $i$, calculate $\alpha_i(x_{t+1})$.
(8) Calculate $V_t(x_{t+1})$ (formula (10)).
(9) Calculate $\Delta Q$ (formula (11)).
(10) Update $q[i, j]$: $q[i, j] \leftarrow q[i, j] + \Delta q[i, j]$ (formula (12)).
(11) Set $t \leftarrow t + 1$ and go to step (3).
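The steps of Table 1 can be sketched in Python as shown below. This is a minimal illustration and not the paper's implementation. Formulas (7) to (12) are not reproduced in this section, so the sketch assumes the common FQL forms. It uses an ε-greedy EEP per rule and sets $a(x) = \sum_i \alpha_i(x)\, a[i, k[i]]$, $Q(x, a(x)) = \sum_i \alpha_i(x)\, q[i, k[i]]$, $V(x) = \sum_i \alpha_i(x) \max_j q[i, j]$, $\Delta Q = r + \gamma V - Q$, and $\Delta q[i, k[i]] = \eta\, \Delta Q\, \alpha_i(x_t)$, with $\Delta q[i, j] = 0$ for $j \neq k[i]$. Several details are hypothetical and exist only for illustration: the one-dimensional input, the evenly spaced triangular membership functions, the candidate action set shared by all rules, the names `FuzzyQLearner` and `run`, and the parameters `lr`, `gamma`, `epsilon`, and `env_step`.

```python
import random


class FuzzyQLearner:
    """Hypothetical FQL sketch following Table 1 under assumed formulas (7) to (12)."""

    def __init__(self, centers, actions, lr=0.1, gamma=0.9, epsilon=0.1, seed=0):
        # Assumption: one input variable, one rule per triangular membership
        # function centred on `centers` (evenly spaced, at least two), and the
        # same J candidate actions for every rule.
        self.centers = centers
        self.actions = actions
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)
        # Step (1): q[i][j] = 0 for every rule i and candidate action j.
        self.q = [[0.0] * len(actions) for _ in centers]

    def firing(self, x):
        # Steps (2) and (7): normalized truth values alpha_i(x) of all rules.
        x = min(max(x, self.centers[0]), self.centers[-1])  # keep x in range
        width = self.centers[1] - self.centers[0]
        mu = [max(0.0, 1.0 - abs(x - c) / width) for c in self.centers]
        total = sum(mu)
        return [m / total for m in mu]

    def select(self):
        # Step (3): assumed epsilon-greedy EEP, applied independently per rule.
        k = []
        for row in self.q:
            if self.rng.random() < self.epsilon:
                k.append(self.rng.randrange(len(row)))  # explore
            else:
                k.append(max(range(len(row)), key=row.__getitem__))  # exploit
        return k

    def act(self, alpha, k):
        # Step (4): inferred action as the alpha-weighted sum of chosen actions.
        a = sum(al * self.actions[ki] for al, ki in zip(alpha, k))
        # Step (5): Q(x_t, a(x_t)) as the alpha-weighted sum of chosen q-values.
        q_sa = sum(al * self.q[i][ki] for i, (al, ki) in enumerate(zip(alpha, k)))
        return a, q_sa

    def value(self, alpha):
        # Step (8): V(x_{t+1}) = sum_i alpha_i(x_{t+1}) * max_j q[i][j].
        return sum(al * max(row) for al, row in zip(alpha, self.q))

    def update(self, alpha, k, q_sa, reward, alpha_next):
        # Step (9): temporal-difference error Delta Q.
        delta = reward + self.gamma * self.value(alpha_next) - q_sa
        # Step (10): only the selected action k[i] of each rule is updated,
        # in proportion to that rule's truth value at x_t.
        for i, (al, ki) in enumerate(zip(alpha, k)):
            self.q[i][ki] += self.lr * delta * al
        return delta


def run(learner, env_step, x0, steps):
    # `env_step(x, a) -> (x_next, reward)` is a hypothetical environment hook.
    x = x0
    alpha = learner.firing(x)                           # steps (1)-(2)
    for _ in range(steps):
        k = learner.select()                            # step (3)
        a, q_sa = learner.act(alpha, k)                 # steps (4)-(5)
        x_next, reward = env_step(x, a)                 # step (6)
        alpha_next = learner.firing(x_next)             # step (7)
        learner.update(alpha, k, q_sa, reward, alpha_next)  # steps (8)-(10)
        x, alpha = x_next, alpha_next                   # step (11): t <- t + 1
    return learner
```

Because $\alpha_i(x_{t+1})$ from step (7) is reused as $\alpha_i(x_t)$ in the next iteration, the loop returns to step (3) rather than step (2), as in Table 1. The sketch updates only the entry $j = k[i]$ of each rule. This matches the usual FQL credit assignment, but formula (12) in the paper should be checked for the exact form.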