Initialize $Q_0$
for $n = 0$ to $N_{\mathrm{tot}} - 1$ do
    $\sigma_n$ = chooseState
    $a_n$ = chooseAction
    $(\sigma'_n, r_n)$ = simulate$(\sigma_n, a_n)$
    /* update $Q_{n+1}$ */
    $Q_{n+1} \leftarrow Q_n$
    $d_n = r_n + \left(\gamma \max_b Q_n(\sigma'_n, b)\right) - Q_n(\sigma_n, a_n)$
    $Q_{n+1}(\sigma_n, a_n) \leftarrow Q_n(\sigma_n, a_n) + \alpha_n(\sigma_n, a_n)\, d_n$
end for
return $Q_{N_{\mathrm{tot}}}$
Algorithm 1: The Q-learning algorithm.
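As an illustration only, the loop of Algorithm 1 can be sketched in Python for a finite (tabular) problem. Several choices here are assumptions not fixed by the algorithm: `chooseState` and `chooseAction` are taken to be uniform random draws, $Q_0$ is zero, the step size is $\alpha_n(\sigma_n, a_n) = 1/(\text{number of visits to } (\sigma_n, a_n))$, and `toy_simulate` is a hypothetical two-state environment invented for the example. The maximum in $d_n$ is taken at the state returned by `simulate`.

```python
import random
from collections import defaultdict


def q_learning(states, actions, simulate, n_tot, gamma=0.5, seed=0):
    """Tabular Q-learning sketch following the structure of Algorithm 1."""
    rng = random.Random(seed)
    q = defaultdict(float)      # Q_0 = 0 everywhere (assumption)
    visits = defaultdict(int)   # visit counts, used for the step size
    for _ in range(n_tot):
        s = rng.choice(states)   # chooseState: uniform draw (assumption)
        a = rng.choice(actions)  # chooseAction: uniform draw (assumption)
        s_next, r = simulate(s, a)
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]  # alpha_n(s, a) = 1 / visits (assumption)
        # Temporal-difference term d_n, computed from the current table Q_n.
        d = r + gamma * max(q[(s_next, b)] for b in actions) - q[(s, a)]
        # Only the entry (s, a) changes, so updating in place is equivalent
        # to copying Q_n into Q_{n+1} and then modifying that one entry.
        q[(s, a)] += alpha * d
    return dict(q)


def toy_simulate(s, a):
    """Hypothetical deterministic environment with states 0 and 1.

    "go" switches state and pays reward 1 only when leaving state 0;
    "stay" keeps the state and pays nothing.
    """
    if a == "go":
        return 1 - s, (1.0 if s == 0 else 0.0)
    return s, 0.0


q = q_learning([0, 1], ["stay", "go"], toy_simulate, n_tot=5000)
greedy_in_state_0 = max(["stay", "go"], key=lambda b: q[(0, b)])
print(greedy_in_state_0, q[(0, "go")], q[(0, "stay")])
```

In this toy problem the estimate for `(0, "go")` is an average of targets that each include the reward 1, while the estimate for `(0, "stay")` only accumulates discounted values, so the greedy action in state 0 is `"go"`. Other step-size schedules or exploration rules can be substituted without changing the structure of the loop.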