Research Article
A Flexible Reinforced Bin Packing Framework with Automatic Slack Selection
Table 2
Parameters and corresponding descriptions.
| Parameter | Description |
|---|---|
| | Action taken at time step |
| | State at time step |
| | Reward at time step |
| | Set of actions |
| | Set of states |
| | Instant reward |
| | Transition probability between states |
| | Sum of rewards at time step |
| | Learning rate |
| | Discount factor |
| | State-action value function |
| | Slack |
| | Number of bins used by the given algorithm |
| | Number of bins in the optimal solution of each bin packing instance |
| | Number of instances for which the algorithm does not reach the feasible optimal solution |
| | Deviation percentage between each algorithm's solution and the optimal solution |
| | Competitive ratio |
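Several of the quantities in Table 2 fit together in a short computation. The sketch below is illustrative only. It assumes a standard tabular Q-learning update that uses the learning rate, discount factor, and state-action value function. It also assumes that the deviation percentage is computed per instance as 100 × (bins used − optimal bins) / optimal bins and then averaged. The source does not confirm either form. The original symbols were lost in extraction, so the names `alpha`, `gamma`, `Q`, `q_learning_update`, `deviation_percentage`, and `evaluate` are hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical names: the table's symbols were lost, so alpha (learning rate),
# gamma (discount factor) and Q (state-action value function) are stand-ins.


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Assumed standard tabular Q-learning update (not confirmed by the source).

    Moves Q(state, action) toward reward + gamma * max_a' Q(next_state, a').
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


def deviation_percentage(bins_used, bins_optimal):
    """Percentage by which an algorithm's bin count exceeds the optimum (assumed definition)."""
    return 100.0 * (bins_used - bins_optimal) / bins_optimal


def evaluate(results):
    """Summarise (bins_used, bins_optimal) pairs, one pair per instance.

    Returns the number of instances where the optimum was not reached and
    the mean per-instance deviation percentage.
    """
    missed = sum(1 for used, opt in results if used > opt)
    mean_dev = sum(deviation_percentage(u, o) for u, o in results) / len(results)
    return missed, mean_dev


if __name__ == "__main__":
    Q = defaultdict(float)  # unseen state-action pairs start at 0.0
    print(q_learning_update(Q, "s0", "a0", 1.0, "s1", ["a0", "a1"]))
    print(evaluate([(10, 10), (11, 10)]))
```

For example, suppose an algorithm uses 11 bins on an instance whose optimum is 10 and matches the optimum on a second instance. Then one instance is counted as missed, and the mean deviation is 5%. A defaultdict keeps the Q-table sparse, so only visited state-action pairs are stored.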