A Flexible Reinforced Bin Packing Framework with Automatic Slack Selection
Algorithm 1: Slack learning algorithm combined with RL.
Input: training data with items, container list with capacity, remaining capacity of the bin, learning rate, discount factor, and the number of iterations.
(1) Initialize Q-table;
(2) for each episode up to the number of iterations do
(3)
(4)   Initialize container list [1, ..., n];
(5)
(6)   while not do
(7)     According to the state and the Q-table, use the epsilon-greedy strategy to select an action;
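
The action-selection step (7) can be illustrated with a minimal Python sketch of epsilon-greedy selection over a tabular Q-function. This is not the authors' implementation: the function name `select_action`, the dictionary-based Q-table keyed by `(state, action)`, and the reading of actions as candidate slack values are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def select_action(q_table, state, actions, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit.

    q_table maps (state, action) pairs to estimated values (hypothetical layout).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.choice(actions)
    # Exploit: pick the action with the highest Q-value (first one wins ties).
    return max(actions, key=lambda a: q_table[(state, a)])


# Hypothetical setup: the state is a bin index, actions are candidate slack values.
q_table = defaultdict(float)   # unseen (state, action) pairs default to 0.0
slack_actions = [0, 1, 2]
q_table[(0, 2)] = 1.5          # pretend slack 2 has looked best so far in state 0

greedy = select_action(q_table, state=0, actions=slack_actions, epsilon=0.0)
explored = select_action(q_table, state=0, actions=slack_actions, epsilon=1.0)
```

With `epsilon=0.0` the call always exploits the current Q-table, and with `epsilon=1.0` it always explores. Intermediate values trade off the two, which is the usual reason for using this strategy during training.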