Research Article

A Flexible Reinforced Bin Packing Framework with Automatic Slack Selection

Algorithm 1

Slack learning algorithm combined with RL.
Input: training data containing the items, the container list with its capacity, the remaining capacity of the bin, the learning rate, the discount factor, and the number of iterations (episodes).
(1) Initialize the Q-table;
(2) for each episode, up to the number of iterations, do
(3)
(4) Initialize the container list [1, j, n];
(5)
(6) while the episode is not done do
(7)  Based on the current state and the Q-table, select an action with the ε-greedy strategy;
(8)   [1, i, n] = [1, j, n] − [1, k, n];
(9)  Calculate the immediate reward and get the next state;
(10)  get from ;
(11)  if the next state is not “terminal” then
(12)   ;
(13)  else
(14)   
(15)   
(16)  end if
(17)  Update the Q-table;
(18)   [1, j, n] =  [1, j, n]−;
(19) end while
(20)
(21), ;
(22) end for
Output: Q-table, .
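The structure of Algorithm 1 (a Q-table updated under an ε-greedy policy, a container update after each action, and a separate case for terminal states) can be illustrated with the minimal Python sketch below. Everything concrete in it is a hypothetical assumption, not the paper's method: the state is the remaining capacity of the open bin, each action picks a slack value from a made-up candidate set `SLACKS`, a bin is closed once its leftover capacity falls within the chosen slack, and the reward is minus the wasted capacity. The names `CAPACITY`, `SLACKS`, `step`, `greedy_action`, `train`, and `evaluate`, the ε-decay schedule, and the max-based target r + γ max_a' Q(s', a') for non-terminal states are likewise illustrative choices.

```python
import random
from collections import defaultdict

# Illustrative assumptions (hypothetical, not taken from the paper):
CAPACITY = 10            # capacity of every container
SLACKS = [0, 1, 2, 3]    # candidate slack values; action a selects SLACKS[a]
ACTIONS = range(len(SLACKS))


def step(remaining, item, slack):
    """Pack one item into the open bin, then apply the chosen slack.

    Returns (new_remaining, reward), where reward is minus the capacity
    wasted by any bin closed during this step. Assumes 1 <= item <= CAPACITY.
    """
    reward = 0
    if item > remaining:        # item does not fit: close the bin, open a new one
        reward -= remaining
        remaining = CAPACITY
    remaining -= item           # container update: remaining capacity minus item size
    if remaining <= slack:      # leftover is within the slack: close the bin
        reward -= remaining
        remaining = CAPACITY
    return remaining, reward


def greedy_action(q, state):
    """Index of the action with the highest Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(items, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=1.0, eps_decay=0.99, eps_min=0.05, seed=0):
    """Tabular Q-learning over a fixed item sequence; state = remaining capacity."""
    rng = random.Random(seed)
    q = defaultdict(float)                     # Q-table: (state, action) -> value
    for _ in range(episodes):
        remaining = CAPACITY                   # start each episode with one empty bin
        for t, item in enumerate(items):
            state = remaining
            # epsilon-greedy selection of a slack value
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            remaining, reward = step(remaining, item, SLACKS[action])
            if t == len(items) - 1:            # terminal: every item has been packed
                if remaining < CAPACITY:
                    reward -= remaining        # waste left in the last open bin
                target = reward
            else:                              # bootstrap from the next state
                target = reward + gamma * max(q[(remaining, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
        epsilon = max(eps_min, epsilon * eps_decay)   # decay exploration per episode
    return q, epsilon


def evaluate(q, items):
    """Pack the items once with the greedy policy; return the total wasted capacity."""
    remaining, waste = CAPACITY, 0
    for item in items:
        remaining, reward = step(remaining, item, SLACKS[greedy_action(q, remaining)])
        waste -= reward
    if remaining < CAPACITY:
        waste += remaining                     # last, partially filled bin
    return waste


if __name__ == "__main__":
    items = [4, 3, 6, 2, 5, 1, 7, 3] * 3       # hypothetical training items
    q, eps = train(items)
    print("final epsilon:", eps)
    print("greedy slack per remaining capacity:",
          {r: SLACKS[greedy_action(q, r)] for r in range(CAPACITY + 1)})
    print("wasted capacity:", evaluate(q, items))
```

With an untrained (all-zero) Q-table, `greedy_action` returns the first action, so the policy always uses slack 0 and behaves like plain next-fit packing. The max-based target above is the Q-learning form; if step (10) of Algorithm 1 instead draws the next action ε-greedily, the SARSA target r + γQ(s', a') would take its place. Because this toy state records only the remaining capacity, the agent cannot see the next item, which limits what it can learn.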