Research Article

Intelligent Inventory Control via Ruminative Reinforcement Learning

Table 1

Experimental results.

Line        Methods
            SARSA    RSarsa          PRS             RSarsa.TD       PRS.Beta

Relative computation time/epoch
1     P1    1        30              26              5               30
2     P2    1        20              21              3               19
3     P3    1        31              29              6               31

Average cost of early periods
4     P1    8,421    7,619 (W)       8,379 (p0.43)   7,597 (W)       7,450 (W)
5     P2    4,935    4,606 (W)       4,792 (p0.06)   4,685 (W)       4,411 (W)
6     P3    10,502   8,694 (W)       9,958 (p0.20)   9,390 (p0.07)   8,472 (W)

Average cost of later periods
7     P1    7,214    7,355 (p0.68)   7,051 (W)       7,110 (p0.11)   7,010 (W)
8     P2    4,308    4,388 (p0.90)   4,248 (p0.14)   4,375 (p0.84)   4,194 (W)
9     P3    8,613    8,139 (p0.29)   8,312 (p0.37)   8,486 (p0.43)   7,664 (p0.18)