Research Article

A Flexible Reinforced Bin Packing Framework with Automatic Slack Selection

Table 2

Parameters and corresponding descriptions.

Parameter | Description

Action taken at time step t
State at time step t
Reward at time step t
Set of actions
Set of states
Instant reward
Transition probability between states
Sum of rewards (return) at time step t
Learning rate
Discount factor
State-action value function
Slack
Number of bins used by the concrete algorithm
Number of bins contained in the optimal solution of each bin packing instance
Number of instances for which the algorithm does not reach the optimal solution
Percentage deviation between the solution reached by each algorithm and the optimal solution
Competitive ratio
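
To make the last four evaluation quantities concrete, the following is a minimal sketch of how they could be computed from per-instance bin counts. The function name `evaluate`, its argument names, and the exact formulas are assumptions for illustration only: the deviation is taken here as the relative excess of total bins used over total optimal bins, and the ratio as their quotient, which may differ from the definitions used in the experiments.

```python
from typing import Dict, Sequence


def evaluate(bins_used: Sequence[int], bins_optimal: Sequence[int]) -> Dict[str, float]:
    """Summarize packing quality over a set of instances (illustrative only).

    bins_used[i]    -- bins used by the algorithm on instance i
    bins_optimal[i] -- bins in the optimal solution of instance i
    """
    if not bins_used or len(bins_used) != len(bins_optimal):
        raise ValueError("inputs must be non-empty and of equal length")

    # Instances on which the algorithm needs more bins than the optimum.
    missed = sum(1 for used, opt in zip(bins_used, bins_optimal) if used > opt)

    total_used = sum(bins_used)
    total_opt = sum(bins_optimal)

    # Assumed definition: relative excess over the optimum, in percent.
    deviation_pct = 100.0 * (total_used - total_opt) / total_opt

    # Assumed definition: bins used divided by optimal bins.
    ratio = total_used / total_opt

    return {
        "missed_instances": missed,
        "deviation_pct": deviation_pct,
        "ratio": ratio,
    }


if __name__ == "__main__":
    # Three hypothetical instances; two of them use one bin more than optimal.
    print(evaluate([11, 21, 10], [10, 20, 10]))
```

Aggregating over all instances before dividing weights large instances more heavily; averaging per-instance ratios instead would be an equally plausible reading of the table.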