Mathematical Problems in Engineering


Research Article | Open Access


Ting Yang, Fei Luo, Joel Fuentes, Weichao Ding, Chunhua Gu, "A Flexible Reinforced Bin Packing Framework with Automatic Slack Selection", Mathematical Problems in Engineering, vol. 2021, Article ID 6653586, 15 pages, 2021. https://doi.org/10.1155/2021/6653586

A Flexible Reinforced Bin Packing Framework with Automatic Slack Selection

Academic Editor: José García
Received: 21 Dec 2020
Revised: 09 Apr 2021
Accepted: 10 May 2021
Published: 19 May 2021

Abstract

Slack-based algorithms are popular bin-focused heuristics for the bin packing problem (BPP). Existing methods select slacks only by predetermined policies, ignoring dynamic exploration of the global data structure, which leads to underutilization of the information in the data space. In this paper, we propose a novel slack-based flexible bin packing framework, called the reinforced bin packing framework (RBF), for the one-dimensional BPP. RBF jointly considers an RL-system, an instance-eigenvalue mapping process, and a reinforced-MBS strategy. In our work, the slack is generated with a reinforcement learning strategy, in which performance-driven rewards capture the intuition of learning the current state of the container space, the action is the choice of the packing container, and the state is the remaining capacity after packing. During the construction of the slack, an instance-eigenvalue mapping process is designed and utilized to generate a representative and classified validation set. Furthermore, the resulting slack coefficient is integrated into the MBS-based packing process. Experimental results show that, in comparison with the fit algorithms, MBS, and MBS’, RBF achieves state-of-the-art performance on the BINDATA and SCH_WAE datasets. In particular, it outperforms its baselines MBS and MBS’, increasing the number of optimal solutions by an average of 189.05% and 27.41%, respectively.

1. Introduction

As a classical discrete combinatorial optimization problem [1, 2], the bin packing problem (BPP) [3, 4] aims to minimize the number of bins used to pack items, and it is NP-hard [5, 6]. In the past few decades, four main approaches have been extensively studied to resolve the BPP: exact approaches [7–9], approximation algorithms [4, 10], heuristic algorithms, and metaheuristic algorithms [11, 12]. The exact algorithms typically prune with lower bound information to address the BPP, which makes them suitable for small-scale instances. When the scale of datasets increases, the BPP becomes challenging for approximation algorithms. The implementation of metaheuristic algorithms is difficult due to rigorous requirements on parameter adjustment and calculation complexity [13]. In contrast, heuristic algorithms are popular bin packing methods due to their efficiency in solving NP-hard problems.

As a typical heuristic algorithm, minimum bin slack (MBS) is particularly useful for problems where an optimal solution requires most of the bins, if not all, to be exactly filled [14]. It is also useful for problems where the sum of the item requirements is less than or equal to twice the bin capacity. In MBS, the selection of the packing sequence of the items is based on a predetermined strategy, which ignores the sampling deviation among the items to be packed and cannot explore the global data space.

Therefore, the MBS algorithm may quickly fall into locally optimal solutions, ignoring the exploration of the global item space in the training process. In the iterative training stage, the deviation of the locally optimal solution accumulates continuously, and the solution drifts steadily away from the global optimum. This may produce a significant difference between the algorithm’s packing result and the optimal solution, leading to a failure to achieve the desired performance [14].

In order to solve the problems of MBS described above, we propose a reinforced bin packing framework, dubbed RBF, to resolve the BPP, where a reinforcement learning (RL) method, i.e., the Q-learning algorithm, is exploited to select a high-quality slack for the packing process. The RBF treats Q-learning as a prior data spatial information detector. To ingeniously select data samples as representatives of the datasets, it explores an intrinsic spatial distribution of sample bins by interacting with the environment and estimating the optimal slack of the global bins. The learned slacks are finally exploited in the improved MBS algorithm to pack items.

The proposed RBF can be distinguished from previous work by the following characteristics:

(1) A reinforcement learning algorithm is exploited to generate the slack automatically, which is then integrated into the MBS algorithm. By producing high-quality slacks automatically, rather than by manual design or empirical speculation, our method prevents the bin packing process from falling into a locally optimal solution, a quite challenging problem especially for large-scale datasets.

(2) An instance-eigenvalue mapping function is introduced to efficiently select a representative and classified validation set from the input instances based on their similarity. This enables RBF to reduce the learning cost while generating a dynamic slack during the packing process.

The rest of this paper is organized as follows. Related work is presented in Section 2. The formulation of the BPP is depicted in Section 3. In Section 4, we briefly overview the design of the RBF and then detail its key components, such as the RL-system, the reinforced-MBS strategy, and the instance mapping process. Experimental results and theoretical analyses are presented in Section 5. Finally, conclusions are drawn in Section 6.

2. Related Work

Existing methods that address the BPP can be roughly classified into four major categories: exact approaches [15–20], approximation algorithms [4, 21, 22], heuristic algorithms [14, 23–27], and metaheuristic algorithms [11–13, 28–30]. In recent years, RL-based methods [31–35] have also been proposed to resolve the BPP.

2.1. Exact Approaches

The exact approaches establish a mathematical model and obtain the optimal solution of the problem by solving the model with optimization algorithms. CPLEX [36] solves the problem with mixed integer programming. Polyakovskiy and M’Hallah [15] characterized the properties of the two-dimensional nonoriented BPP with due dates, which packs a set of rectangular items, and experimentally proved that a tight lower bound enhances an existing bound on maximum lateness for 24.07% of the benchmark instances. Since the quality of the solution depends on whether the model is reasonable, exact approaches are only applicable to small-scale instances.

Subsequent improvements focused on reconsidering constraints in a novel manner. Chitsaz et al. [18] proposed an algorithm to separate the subcontour elimination constraints of fractional solutions to solve production, inventory, and inbound transportation decision problems; the inequalities and separation procedures were used in a branch-and-cut algorithm. A similar idea was proposed in Mara’s work [20], where an exact algorithm was built on the classic ε-constraint method; it addressed N single-objective problems by using reduction with test sets instead of an optimizer. Besides, one classic method of this group is the arc-flow formulation method [9], which represents all the patterns in a very compact graph based on an arc-flow formulation with side constraints and can be solved exactly by general-purpose mixed integer programming solvers. Generally, when the scale of the problem grows, the phenomenon of “combinatorial explosion” leads to heavy computational overhead in the optimization process, so it is difficult to apply exact algorithms to large-scale combinatorial optimization problems.

2.2. Approximation Algorithms

Approximation algorithms are popular because their time complexity is polynomial, although they do not guarantee finding the optimal solution. Typical approximation algorithms include greedy algorithms and local search. Based on the observation and arrangement of Earth observation satellites, the authors in [21] proposed an index-based multiobjective local search to solve multiobjective optimization problems. Kang and Park [37] considered the variable-size bin packing problem and described two greedy algorithms, where the objective was to minimize the total cost of used bins when the unit size cost per bin did not increase as the bin size increased. Moreover, the survey [38] presented an overview of approximation algorithms for the classical BPP and pointed out that, although approximation algorithms are universal, under polynomial time complexity there is always a gap between their solutions and the optimal one.

2.3. Heuristic Algorithms

Heuristic algorithms are based on intuitive and empirical design, and several heuristics for solving the one-dimensional bin packing problem have been presented [39]. Coffman and Garey [10] reviewed various heuristic algorithms, such as NF (Next Fit), FF (First Fit), BF (Best Fit), and WF (Worst Fit) [23]. These are typical online packing algorithms [40, 41], known as fit algorithms. Their corresponding offline packing algorithms are NFD [24], FFD, BFD, and WFD [23], which differ from the online algorithms in that they rely on overall information for sorting. The fit algorithms, for example, FF, WF, and BF, give priority for further packing to the bins that have already been packed with items, and a new bin is activated only when no suitable nonempty bin exists for the current item. This strategy ensures that each arriving item can always find a bin to accommodate it; however, it cannot guarantee that the item is the target item of the optimal solution in the current situation. To address this issue, Gupta and Ho [14] proposed MBS, which centers on bins and tries to find the collection of items that best fills each bin. One problem with this method is that its sequence selection strategy often stays in a local region of the input space, which hinders accurate estimation of the slack and may result in a locally optimal solution. To mitigate this, Fleszar and Hindi [42] found that an effective hybrid method integrates perturbation MBS’ and a good set of lower bounds into variable neighbourhood search (VNS), improving solution quality in reasonably short processing times. However, due to the complexity and uncertainty of combinatorial optimization problems, heuristic algorithms that rely on empirical criteria are not always reliable.
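As an illustration of the offline fit family described above, FFD can be sketched in a few lines; this is a minimal sketch for intuition, not the benchmarked implementation:

```python
def first_fit_decreasing(items, capacity):
    """First Fit Decreasing: sort items by non-increasing load, then place
    each item into the first open bin with enough residual capacity,
    opening a new bin only when no existing bin fits."""
    residual = []   # residual capacity of each open bin
    packing = []    # items placed in each bin
    for w in sorted(items, reverse=True):
        for i, free in enumerate(residual):
            if w <= free:
                residual[i] -= w
                packing[i].append(w)
                break
        else:  # no open bin fits: activate a new bin
            residual.append(capacity - w)
            packing.append([w])
    return packing

# Six items into bins of capacity 10 -> three exactly filled bins
print(first_fit_decreasing([7, 6, 5, 4, 3, 2], 10))  # [[7, 3], [6, 4], [5, 2]]
```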

2.4. Metaheuristic Algorithms

Metaheuristic algorithms are widely used to find optimal solutions to the BPP. Early typical representatives include genetic algorithms [28] and simulated annealing algorithms [29]. The former is a promising tool for the BPP, and one significant improvement is mainly used: grouping genetic algorithms (GGAs). Dokeroglu and Cosar [43] proposed a set of robust and scalable hybrid parallel algorithms. In GGA-CGT (grouping genetic algorithm with controlled gene transmission) [44], the transmission of the best genes in the chromosomes was promoted while keeping the balance between selection pressure and population diversity. Kucukyilmaz and Kiziloz proposed the island-parallel GGA (IPGGA) in [45], which realized the choice of communication topology, determined the migration and assimilation strategies, adjusted the migration rate, and exploited diversification technologies. Crainic et al. [46] proposed a two-level tabu search for the three-dimensional BPP by reducing the size of the solution space. Kumar and Raza [47] incorporated the concept of Pareto optimality for the BPP with multiple constraints and then proposed a family of solutions along the trade-off surface. However, due to the lack of particle diversity in the later stages of genetic and PSO algorithms, premature convergence always occurs [28].

2.5. RL-Based Methods

Machine learning has been extensively studied in recent years to resolve the NP-hard BPP. Solozabal’s model tackled the BPP with RL: it trained multistacked long short-term memory cells to form a recurrent neural network agent that could embed information from the environment. However, once the neural network overhead was introduced, the performance of the model was only comparable to the FF algorithm. Inspired by Pointer Network [48], deep learning technology was successfully applied to learn and optimize the placing order of items [32], to solve the classic TSP [33], and to tackle the 3D BPP. These methods utilized RL to keep the solution from converging to a local optimum, but they exploited neural networks [49] inside the RL, which increased the computational cost and time complexity. Heuristic algorithms rely on empirical criteria with predetermined strategies and ignore the dynamic exploration of the global data space of the BPP, whereas RL-based methods can intelligently mine data information from the environmental space through trial and error. This suggests that RL can help existing heuristic algorithms fully explore the effective information in the sample space, which inspired our method.

3. Formulation of the BPP

The classic one-dimensional BPP is formalized as follows. It is assumed that there are n items to be packed into bins with equal capacity C. The general objective is to find a packing that arranges all items with the minimum number of bins, of which the formal mathematical description can be defined as

\min \; z = \sum_{j=1}^{n} y_j.

Therein, y_j represents the indicator of whether the j-th bin B_j is used or not: a value of 1 indicates that the bin is used, and a value of 0 indicates that it is not used. Note that once bin B_j is used, the total load of the items placed in B_j cannot exceed its capacity C. Thus, we have

\sum_{i=1}^{n} w_i x_{ij} \le C \, y_j, \quad j = 1, \dots, n,

where w_i means the load of the i-th item and x_{ij} is an indicator of whether the i-th item is packed into the j-th bin or not. Especially, x_{ij} = 1 if the i-th item is placed into the j-th container; otherwise x_{ij} = 0. Furthermore, an equally fundamental constraint is that each item is placed into exactly one bin:

\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n.
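The two constraints above translate directly into a feasibility check; the sketch below (with a hypothetical `is_feasible` helper name) verifies that a packing places each item exactly once and keeps every used bin within capacity C:

```python
def is_feasible(packing, weights, capacity):
    """Check the BPP constraints: sum_j x_ij = 1 for every item i, and
    sum_i w_i x_ij <= C for every used bin j."""
    packed = [i for bin_items in packing for i in bin_items]
    if sorted(packed) != list(range(len(weights))):
        return False  # some item is missing or placed twice
    return all(sum(weights[i] for i in bin_items) <= capacity
               for bin_items in packing)

weights = [4, 5, 3, 6]
print(is_feasible([[0, 1], [2, 3]], weights, 10))  # loads 9 and 9 -> True
print(is_feasible([[0, 1, 3], [2]], weights, 10))  # load 15 > 10 -> False
```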

The parameters of the formalization are explained in Table 1.


Variable | Type | Meaning

B_j | Discrete | The j-th bin
I_i | Discrete | The i-th item
z | Discrete | The objective function of the number of bins used
C | Constant | The capacity of the bin
w_i | Constant | The load of the i-th item
x_{ij} | Binary | Indicator whether the i-th item is packed into the j-th bin or not
y_j | Binary | Indicator whether the j-th bin is used or not

4. Design of RBF

In this section, the design of the proposed RBF framework is presented. First, the overview of the RBF is outlined, and then, the details of its key components, such as the RL-system, the reinforced-MBS strategy, and the instance mapping process, are presented.

4.1. Overview

The classical MBS algorithm follows two steps:

(1) Utilize the lexicographic search optimization procedure [14] to find the item set A_k that should be allocated to bin B_k.

(2) Utilize Step 1 to traverse all items to be packed; the minimum bin slack is C − \sum_{i \in A_k} w_i, where w_i is the load of the packed i-th item.

The steps above mean that, in the classical MBS algorithm, the slack is used to jump out of the local optimal trap randomly, while the exact distribution of the sample space is ignored. To resolve the instability of the random slack, a new bin packing framework, RBF, is presented, in which the slack is learnable and adjusted according to the samples’ structure.
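For intuition, the minimum-slack idea behind the two MBS steps above can be sketched with an exhaustive subset search. This brute-force stand-in for the lexicographic search procedure of [14] is exponential in the number of items and assumes every item fits in a bin; it only illustrates how each bin is chosen to minimize the slack C − Σ w_i:

```python
from itertools import combinations

def mbs_pack_sketch(items, capacity):
    """For each new bin, pick the subset of remaining items whose total
    load leaves the minimum nonnegative slack C - sum(w_i)."""
    remaining = sorted(items, reverse=True)
    bins = []
    while remaining:
        best, best_slack = None, capacity + 1
        for r in range(1, len(remaining) + 1):
            for combo in combinations(remaining, r):
                slack = capacity - sum(combo)
                if 0 <= slack < best_slack:
                    best, best_slack = combo, slack
            if best_slack == 0:
                break  # bin exactly filled; no larger subset can do better
        for w in best:
            remaining.remove(w)
        bins.append(list(best))
    return bins

print(mbs_pack_sketch([7, 6, 5, 4, 3, 2], 10))  # [[7, 3], [6, 4], [5, 2]]
```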

The framework of RBF is illustrated in Figure 1. It consists of an RL-system, a reinforced-MBS strategy, and an instance-eigenvalue mapping process, defined as follows:

(1) RL-system: the RL-system generates a suitable slack by a reinforcement learning strategy, where the best action selection strategy is controlled by the Q-agent.

(2) Reinforced-MBS strategy: with the slack coefficient provided by the RL-system, the reinforced-MBS strategy performs the packing process.

(3) Instance-eigenvalue mapping: instead of using the whole dataset directly, the instance-eigenvalue mapping generates a representative and classified validation set for the RL-system based on the similarity of the input instances.

The main idea of RBF is to use the RL-system to learn the slack according to the spatial variation of the sample dataset, so that the slack adapts to the distribution of bins and the remaining items in the data space during the iterative packing process. With the instance-eigenvalue mapping, a representative and classified validation set of the input instances is generated. The validation set is further integrated into the RL-system, where an adaptive slack is generated by the Q-agent. The coefficient of the slack is finally applied in the reinforced-MBS strategy for the packing process.

4.2. Instance-Eigenvalue Mapping

To reduce the amount of calculation for the slack, representative instances are selected for the Q-agent, which can then learn the data space without traversing all instances. Here, an instance classification method, called the instance-eigenvalue mapping, is proposed. It is defined as a mapping e_k = f(avg_k, min_k, max_k), where avg_k is the average value of the items in the k-th instance of the dataset, min_k and max_k respectively represent the minimum and maximum item values in the k-th instance, and e_k denotes the instance eigenvalue of the k-th instance.

According to the value of the instance eigenvalue, all instances are reordered, and the dataset is divided into different subsets. The last instance of each subset is taken to form a validation set, which is then utilized to iteratively learn the slack. Therefore, at each time step in RBF, instead of using all instances, the Q-agent uses the validation set, reducing the repetitive work of the system.
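The selection above can be sketched as follows. Since the exact eigenvalue equation is not reproduced here, the mean item weight stands in for the instance eigenvalue (an assumption for illustration); instances are sorted by eigenvalue, split into subsets, and the last instance of each subset forms the validation set:

```python
def build_validation_set(instances, n_subsets):
    """Order instances by a per-instance eigenvalue, split into n_subsets,
    and keep the last instance of each subset as the validation set."""
    # Stand-in eigenvalue: mean item weight (the paper combines the mean,
    # minimum, and maximum item values).
    eigenvalue = lambda inst: sum(inst) / len(inst)
    ordered = sorted(instances, key=eigenvalue)
    size = len(ordered) // n_subsets
    subsets = [ordered[i * size:(i + 1) * size] for i in range(n_subsets)]
    subsets[-1].extend(ordered[n_subsets * size:])  # leftovers join last subset
    return [s[-1] for s in subsets if s]

instances = [[1, 2, 3], [10, 20], [4, 4, 4], [7, 9]]
print(build_validation_set(instances, 2))  # [[4, 4, 4], [10, 20]]
```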

4.3. RL-System

The validation set is integrated into the RL-system with a Q-learning algorithm [50], where the Q-agent learns an appropriate strategy and then improves the MBS strategy by selecting a high-quality slack.

The process of the RL-system can be described as a Markov decision process (MDP), represented as a tuple (S, A, P, R, γ). In this decision-making process, S is the state set, A is the action set, P is the transition probability between states, R is the return value after taking a certain action to reach the next state, and γ is the discount factor. To be adaptive to the packing circumstances, for example, the current distribution of containers and the remaining items, we propose a slack learning algorithm; the detailed process is shown in Algorithm 1, and the parameters are illustrated in Table 2. By observing the current state of the environment, the Q-agent selects the action that maximizes the value of the reward function for the observed state. As the Q-agent continually interacts with the environment, a suitable selection strategy for the slack coefficient is explored. The algorithm returns both a reward and a new state to the Q-agent in each packing iteration, where the change of states depends on the state transition probability P.

Input: training data with n items, container list with capacity C, remaining capacity c of the bin, learning rate α, discount factor γ, and the iterative number E.
(1) Initialize the Q-table;
(2) for episode in range(E) do
(3)  Initialize state s;
(4)  Initialize container list C[1, j, n];
(5)  done ← false;
(6)  while not done do
(7)   According to state s and the Q-table, use the epsilon-greedy strategy to select action a;
(8)   c[1, i, n] = C[1, j, n] − w[1, k, n];
(9)   Calculate immediate reward r and get next state s′;
(10)  Get q_predict from Q(s, a);
(11)  if s′ is not “terminal” then
(12)   q_target = r + γ max_{a′} Q(s′, a′);
(13)  else
(14)   q_target = r;
(15)   done ← true;
(16)  end if
(17)  Update Q(s, a) ← Q(s, a) + α (q_target − q_predict);
(18)  C[1, j, n] = C[1, j, n] − slack;
(19) end while
(20) Update slack according to the reward r;
(21) Reset s, r;
(22) end for
Output: Q-table, slack.

Parameter | Description

a_t | Action taken at time step t
s_t | State at time step t
r_t | Reward at time step t
A | A set of actions
S | A set of states
R | Instant reward
P | Transition probability between states
G_t | The sum of rewards at time step t
α | Learning rate
γ | The discount factor
Q(s, a) | State-action value function
slack | Slack
SOL | Number of bins used by the concrete algorithm
OPT | Number of bins contained in the optimal solution of each bin packing instance
FSOL | Number of instances for which the algorithm achieves a feasible optimal solution
Gap | Deviation percentage between the solution reached by each algorithm and the optimal solution
CR | Competition ratio

The agent receives the performance-driven reward r_t, and then the sum of discounted rewards at time step t is represented as G_t:

G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}.

Therein, \gamma \in [0, 1], and it defines the weight of future rewards in the sum. The closer \gamma is to 0, the stronger the incentive to consider short-term benefits; the closer \gamma is to 1, the stronger the incentive to consider long-term benefits.

The goal of the Q-agent at each time step is to select an action that maximizes the future discounted reward by finding an optimal policy \pi^*. Here, \pi^* is the strategy of taking the optimal action at state s_t, while \pi is the strategy of taking action a_t at state s_t. Under the policy \pi, Q^{\pi}(s, a) is defined as the expectation of the state-action value function. When the agent takes a_t at s_t, Q^{\pi}(s, a) is represented as

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s, a_t = a \right],

where \mathbb{E}_{\pi} is the expectation function.

The maximum state-action value function over all policies is represented in

Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a).

The update rule of the Q value is shown in

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right],

where \alpha is the learning rate of the RL agent.

At each time step t, the Q-agent observes the current state s_t and selects an action from a discrete set of behaviors A, whose size equals the number of items to be packed. At the beginning, the action is randomly initialized; that is, the action corresponding to a random number between 1 and |A| is selected. Then, the RL-system selects the action that maximizes the Q value at each time step t:

a_t = \arg\max_{a \in A} Q(s_t, a).

The agent uses an ε-greedy learning strategy [51] to choose actions: with probability 1 − ε it selects the action with the optimal value in the Q-table, and with probability ε it selects an action at random.
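The ε-greedy choice and the tabular Q-learning update can be sketched as follows; the function names and the state/action encoding are illustrative, not the paper's implementation:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, n_actions, epsilon):
    """With probability epsilon explore randomly; otherwise pick the
    action with the largest Q value for this state."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def q_update(Q, s, a, r, s_next, n_actions, alpha, gamma, terminal):
    """Tabular update: Q(s,a) += alpha * (target - Q(s,a)), where
    target = r + gamma * max_a' Q(s',a') for non-terminal s', else r."""
    target = r if terminal else r + gamma * max(
        Q[(s_next, a2)] for a2 in range(n_actions))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)
q_update(Q, s=0, a=1, r=1.0, s_next=2, n_actions=3,
         alpha=0.5, gamma=0.9, terminal=False)
print(Q[(0, 1)])  # 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
```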

The state is represented as the remaining capacity of the bin after each round of packing. At each time step t, the remaining items should preferably be packed into as few bins as possible. When a bin is exactly filled, the agent is given a reward; if a bin overflows, the agent is punished severely, signaling that such a state is not allowed.

The slack is defined as a function of the immediate reward: starting from an initial value in the first iteration, the slack is adjusted in each packing round by the immediate reward r achieved by the Q-agent, scaled by a constant. Thus, the slack varies within a bounded range as the reward value changes. Ultimately, the environment of the new round is updated as the bin capacity C minus the slack.

The Q-agent captures this intuition through performance-driven rewards. At each time step t, the agent’s reward combines the number of exactly filled bins, the number of bins filled within the slack space weighted by a positive reward coefficient, and the number of overflowing bins weighted by a punishment coefficient less than 0, plus a constant that regulates the value of the entire reward function.

4.4. Reinforced-MBS Algorithm

By introducing the slack learned by the RL agent into MBS, we propose the reinforced-MBS algorithm. Therein, the slack parameter is learned by the agent applying RL and is calculated by minimizing the number of used bins on the validation set. The coefficient of the slack is then passed into our reinforced-MBS algorithm, whose details are shown in Algorithm 2.

Input: training data with n items, container list with capacity C; set slack, k = 1, A_k = ∅.
(1) Initialize k, A_k;
(2) Generate random slack;
(3) for each item in itemList do
(4)  Use the improved dictionary search procedure to find the set A_k of items that should be allocated to the k-th bin;
(5)  if A_k ≠ ∅ then pack A_k into bin B_k;
(6)  else
(7)   k ← k + 1;
(8)  end if
(9) end for
Output: bins with item sets A_k, k = 1, …, K.

In Algorithm 2, the improved dictionary search procedure is utilized to find the set of items that should be assigned to the bin during the iterative process. The improved dictionary search procedure is shown in Algorithm 3.

Input: for  = 1,…,; ;  = 1, , where ; ; .
(1)Generate Slack .
(2)Step 1:
(3)if 0 then
(4)if =  , go to Step 3.
(5)else
(6) Find the arrangement of the number of the last item in the temporary item list in the original item list, that is, to find makes .
(7)ifthen
(8)  
(9)  ifthen
(10)   , prepare for packing
(11)  end if
(12)  Step 2:
(13)  ifthen
(14)   , go to Step 1.
(15)  else
(16)   ifthen
(17)    go to Step 3
(18)   else
(19)    . Find q makes , go to Step 2.
(20)   end if
(21)  end if
(22)else
(23)  go to Step 2
(24)end if
(25)end if
(26)Step 3: Place the items in into the Kth bin.
Output:.

5. Experimental Evaluation

In this section, experiments are carried out to verify the effectiveness and robustness of the proposed RBF. First, the experimental evaluation indexes are introduced and the datasets used in the experiments are detailed. Afterwards, the experimental results are presented and analyzed. Finally, the robustness and stability of our method are discussed.

5.1. Evaluation Indexes and Datasets
5.1.1. Competition Ratio

The competition ratio [52, 53], CR, is defined as

CR = \frac{SOL}{OPT},

where SOL represents the number of bins used by the concrete algorithm and OPT is the number of bins in the optimal solution for the packing instance. A competition ratio equal to 1 means that the algorithm has found the optimal solution.

Generally, OPT has a lower limit, as shown in formula (14), where \lceil \cdot \rceil is the ceiling function. Due to the limitation of bin packing conditions, the number of bins used in each bin packing iteration cannot be less than the ratio of the total load of the items to the capacity of a single bin:

OPT \ge \left\lceil \frac{\sum_{i=1}^{n} w_i}{C} \right\rceil.
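This bound is straightforward to compute; a one-line sketch:

```python
import math

def opt_lower_bound(weights, capacity):
    """Continuous lower bound: no packing can use fewer than
    ceil(total load / bin capacity) bins."""
    return math.ceil(sum(weights) / capacity)

# Eight items of load 90 into bins of capacity 100: total load 720 -> 8 bins
print(opt_lower_bound([90] * 8, 100))  # 8
```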

5.1.2. FSOL

For a dataset, FSOL represents the number of instances for which the algorithm achieves a feasible optimal solution; in other words, the number of instances whose CR is 1. For a specific algorithm alg and dataset data, it is written as FSOL(alg, data).

5.1.3. Realization Rate

Realization rate (RT) is defined as formula (15), where INS is the number of instances in the packing dataset:

RT = \frac{FSOL}{INS} \times 100\%.

5.1.4. Gap

Gap refers to the deviation between the number of bins used by the algorithm and the optimal number for the packing. The relative Gap is exploited to evaluate the performance of the algorithms, calculated as

Gap = \frac{SOL - OPT}{OPT} \times 100\%.
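The four indexes (CR, FSOL, RT, Gap) can be computed together from per-instance results; averaging the relative Gap over the instances of a dataset is an assumption about how the table values aggregate:

```python
def evaluate(sols, opts):
    """Compute per-instance CR, plus FSOL, RT, and mean relative Gap,
    from solution sizes (sols) and optimal bin counts (opts)."""
    crs = [s / o for s, o in zip(sols, opts)]
    fsol = sum(1 for cr in crs if cr == 1.0)       # instances solved optimally
    rt = 100.0 * fsol / len(sols)                  # realization rate (%)
    gap = 100.0 * sum((s - o) / o                  # mean relative deviation (%)
                      for s, o in zip(sols, opts)) / len(sols)
    return crs, fsol, rt, gap

crs, fsol, rt, gap = evaluate(sols=[18, 19, 18, 18], opts=[18, 18, 18, 18])
print(fsol, rt)  # 3 of 4 instances optimal -> 3 75.0
```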

The BINDATA [54] and SCH_WAE [55] datasets are used in the experiments for evaluation. Therein, BINDATA includes three subsets: Bin1data, Bin2data, and Bin3data. The details of the datasets are shown in Table 3, including the number of instances, the weight range of items, the capacity of bins, and the number of items in the instances.


Dataset | Instances | Item weights | Bin capacity | Items per instance

Bin1data | 720 | [1, 100] | {100, 120, 150} | {50, 100, 200, 500}
Bin2data | 480 | [1, 700] | 1000 | {50, 100, 200, 500}
Bin3data | 10 | [20000, 35000] | 100000 | 200
SCH_WAE | 200 | [150, 200] | 1000 | {100, 120}

5.2. Experimental Results and Analysis

The performance of RBF is compared with that of the classical fit algorithms, the MBS algorithm, and the MBS’ algorithm on the BINDATA and SCH_WAE datasets shown in Table 3. For each instance of each dataset, the number of items in each category is the same. The experimental results reported in this paper are the average of ten runs under each hyperparameter setting.

5.2.1. Results on BINDATA

Table 4 lists the FSOL, RT, and CR results of the compared algorithms on BINDATA, while Table 5 lists the values of Gap. In comparison with the classical heuristic algorithms NFD, FFD, WFD, AWFD, BFD, MBS, and MBS’, RBF obtains the maximum FSOL and RT on BINDATA, while its CR and Gap are minimum. Furthermore, the improvement of FSOL over a baseline, denoted I and defined in formula (17) as

I = \frac{FSOL_{RBF} - FSOL_{base}}{FSOL_{base}} \times 100\%, \quad base \in \{MBS, MBS'\},

is further calculated. Especially, for Bin1data and Bin2data, I_{MBS} is 165.08% and 179.2%, respectively, while I_{MBS'} is 5.53% and 41.3%, respectively. For the dataset Bin3data, RBF is the only algorithm that obtains optimal solutions (2 of the total 10 cases), while the others obtain none.
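The reported improvements follow directly from the FSOL values in Table 4:

```python
def fsol_improvement(fsol_rbf, fsol_baseline):
    """Relative increase in the number of optimal solutions (formula (17))."""
    return 100.0 * (fsol_rbf - fsol_baseline) / fsol_baseline

# Bin1data FSOL values from Table 4: RBF 668, MBS 252, MBS' 633
print(round(fsol_improvement(668, 252), 2))  # 165.08
print(round(fsol_improvement(668, 633), 2))  # 5.53
```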


Algorithm | Bin1data (720) FSOL / RT (%) / CR | Bin2data (480) FSOL / RT (%) / CR | Bin3data (10) FSOL / RT (%) / CR

NFD [24] | 0 / 0.00 / 1.3502 | 59 / 12.29 / 1.1271 | 0 / 0.00 / 1.1711
FFD [23] | 546 / 75.83 / 1.0497 | 236 / 49.16 / 1.0315 | 0 / 0.00 / 1.0739
WFD [25] | 442 / 61.38 / 1.0537 | 213 / 44.37 / 1.0336 | 0 / 0.00 / 1.0739
AWFD [25] | 163 / 22.63 / 1.0663 | 1 / 0.20 / 1.0806 | 0 / 0.00 / 1.0865
BFD [23] | 547 / 75.97 / 1.0497 | 236 / 49.16 / 1.0315 | 0 / 0.00 / 1.0739
MBS [14] | 252 / 35.00 / 1.0645 | 125 / 26.04 / 1.0829 | 0 / 0.00 / 1.0594
MBS’ [14] | 633 / 87.91 / 1.0471 | 247 / 51.45 / 1.0264 | 0 / 0.00 / 1.0721
RBF | 668 / 92.78 / 1.0468 | 349 / 72.71 / 1.0101 | 2 / 20.00 / 1.0198


Algorithm | Bin1data | Bin2data | Bin3data | SCH_WAE

NFD [24] | 35.03% | 12.71% | 17.12% | 6.22%
FFD [23] | 4.98% | 3.15% | 7.39% | 6.15%
WFD [25] | 5.38% | 3.37% | 7.39% | 6.15%
AWFD [25] | 6.64% | 8.06% | 8.66% | 10.69%
BFD [23] | 4.98% | 3.15% | 7.39% | 6.15%
MBS [14] | 6.46% | 3.08% | 5.95% | 5.17%
MBS’ [14] | 4.71% | 2.98% | 7.21% | 5.10%
RBF | 4.68% | 1.01% | 1.98% | 1.39%

5.2.2. Results on SCH_WAE

The results of the compared algorithms on SCH_WAE are listed in Table 6, and the detailed results of RBF on SCH_WAE, that is, SOL, OPT, and the time cost (Runtime) for each instance, are shown in Tables 7 and 8. In the setting of Table 7, the number of items is 100 and the container capacity is 1000; Table 8 shows the statistical results when the number of items is 120. Meanwhile, the Gap of each algorithm on SCH_WAE is shown in the last column of Table 5. In comparison with the other algorithms on SCH_WAE, RBF achieves the minimum CR and Gap and the maximum FSOL and RT. In particular, I_{MBS} is 472%, while I_{MBS'} is 346.88%.


Algorithm | FSOL | RT (%) | CR

NFD [24] | 1 | 0.50 | 1.0622
FFD [23] | 1 | 0.50 | 1.0614
WFD [25] | 1 | 0.50 | 1.0614
AWFD [25] | 0 | 0.00 | 1.1069
BFD [23] | 1 | 0.50 | 1.0614
MBS [14] | 25 | 12.50 | 1.0574
MBS’ [14] | 32 | 16.00 | 1.0513
RBF | 143 | 71.50 | 1.0139


Set | SOL | OPT | Runtime (s)

Number of items: 100

SCH_WAE1_1 | 18 | 18 | 1.462466
SCH_WAE1_2 | 18 | 18 | 2.727113
SCH_WAE1_3 | 18 | 18 | 3.150908
SCH_WAE1_4 | 18 | 18 | 3.70839
SCH_WAE1_5 | 18 | 18 | 0.539083
SCH_WAE1_6 | 18 | 18 | 0.758503
SCH_WAE1_7 | 18 | 18 | 0.664705
SCH_WAE1_8 | 18 | 18 | 0.679174
SCH_WAE1_9 | 19 | 18 | 4.522568
SCH_WAE1_10 | 18 | 18 | 4.374016
SCH_WAE1_11 | 18 | 18 | 0.982381
SCH_WAE1_12 | 18 | 18 | 1.691106
SCH_WAE1_13 | 18 | 18 | 4.604945
SCH_WAE1_14 | 19 | 18 | 10.755302
SCH_WAE1_15 | 18 | 18 | 1.797069
SCH_WAE1_16 | 18 | 18 | 1.450974
SCH_WAE1_17 | 19 | 18 | 1.246981
SCH_WAE1_18 | 18 | 18 | 0.722101
SCH_WAE1_19 | 18 | 18 | 0.713857
SCH_WAE1_20 | 18 | 18 | 0.709936
SCH_WAE1_21 | 18 | 18 | 0.998908
SCH_WAE1_22 | 19 | 18 | 3.028952
SCH_WAE1_23 | 18 | 18 | 0.557798
SCH_WAE1_24 | 18 | 18 | 0.976983
SCH_WAE1_25 | 18 | 18 | 1.074089
SCH_WAE1_26 | 19 | 18 | 0.31875
SCH_WAE1_27 | 18 | 18 | 3.446602
SCH_WAE1_28 | 18 | 18 | 6.492802
SCH_WAE1_29 | 18 | 18 | 4.080608
SCH_WAE1_30 | 18 | 18 | 1.483333
SCH_WAE1_31 | 18 | 18 | 0.731631
SCH_WAE1_32 | 18 | 18 | 1.545062
SCH_WAE1_33 | 18 | 18 | 1.73632
SCH_WAE1_34 | 18 | 18 | 0.730045
SCH_WAE1_35 | 18 | 18 | 1.060398
SCH_WAE1_36 | 18 | 18 | 0.987255
SCH_WAE1_37 | 19 | 18 | 13.783422
SCH_WAE1_38 | 18 | 18 | 0.861974
SCH_WAE1_39 | 18 | 18 | 3.244424
SCH_WAE1_40 | 18 | 18 | 2.785588
SCH_WAE1_41 | 18 | 18 | 1.43709
SCH_WAE1_42 | 18 | 18 | 1.009547
SCH_WAE1_43 | 18 | 18 | 1.022882
SCH_WAE1_44 | 18 | 18 | 0.58083
SCH_WAE1_45 | 18 | 18 | 4.300251
SCH_WAE1_46 | 18 | 18 | 1.063874
SCH_WAE1_47 | 18 | 18 | 7.621793
SCH_WAE1_48 | 18 | 18 | 0.681381
SCH_WAE1_49 | 18 | 18 | 0.96076
SCH_WAE1_50 | 18 | 18 | 0.520412
SCH_WAE1_51 | 18 | 18 | 0.706839
SCH_WAE1_52 | 18 | 18 | 1.22345
SCH_WAE1_53 | 18 | 18 | 1.450091
SCH_WAE1_54 | 18 | 18 | 0.876361
SCH_WAE1_55 | 18 | 18 | 0.839967
SCH_WAE1_56 | 18 | 18 | 0.522451
SCH_WAE1_57 | 19 | 18 | 9.148892
SCH_WAE1_58 | 18 | 18 | 1.304132
SCH_WAE1_59 | 18 | 18 | 0.91717
SCH_WAE1_60 | 18 | 18 | 0.860201
SCH_WAE1_61 | 18 | 18 | 11.420425
SCH_WAE1_62 | 18 | 18 | 0.818042
SCH_WAE1_63 | 18 | 18 | 1.024627
SCH_WAE1_64 | 18 | 18 | 0.630548
SCH_WAE1_65 | 18 | 18 | 1.120638
SCH_WAE1_66 | 18 | 18 | 7.398161
SCH_WAE1_67 | 18 | 18 | 1.001821
SCH_WAE1_68 | 18 | 18 | 0.979633
SCH_WAE1_69 | 18 | 18 | 0.681692
SCH_WAE1_70 | 19 | 18 | 3.835019
SCH_WAE1_71 | 18 | 18 | 4.387344
SCH_WAE1_72 | 18 | 18 | 1.003916
SCH_WAE1_73 | 18 | 18 | 3.558745
SCH_WAE1_74 | 19 | 18 | 5.927648
SCH_WAE1_75 | 18 | 18 | 0.779189
SCH_WAE1_76 | 18 | 18 | 1.061736
SCH_WAE1_77 | 18 | 18 | 1.168255
SCH_WAE1_78 | 19 | 18 | 0.446685
SCH_WAE1_79 | 18 | 18 | 1.053857
SCH_WAE1_80 | 18 | 18 | 1.261393
SCH_WAE1_81 | 18 | 18 | 3.997979
SCH_WAE1_82 | 18 | 18 | 0.953343
SCH_WAE1_83 | 18 | 18 | 0.918264
SCH_WAE1_84 | 18 | 18 | 0.965771
SCH_WAE1_85 | 18 | 18 | 0.714233
SCH_WAE1_86 | 18 | 18 | 0.950277
SCH_WAE1_87 | 18 | 18 | 1.366335
SCH_WAE1_88 | 18 | 18 | 12.544721
SCH_WAE1_89 | 18 | 18 | 0.918441
SCH_WAE1_90 | 18 | 18 | 1.302644
SCH_WAE1_91 | 18 | 18 | 2.591703
SCH_WAE1_92 | 18 | 18 | 1.554301
SCH_WAE1_93 | 18 | 18 | 1.055307
SCH_WAE1_94 | 18 | 18 | 1.010704
SCH_WAE1_95 | 18 | 18 | 1.482197
SCH_WAE1_96 | 18 | 18 | 1.351396
SCH_WAE1_97 | 18 | 18 | 1.305057
SCH_WAE1_98 | 18 | 18 | 0.872971
SCH_WAE1_99 | 18 | 18 | 3.715139
SCH_WAE1_100 | 18 | 18 | 1.34348


Set             RBF bins   Optimal bins   Runtime (s)

SCH_WAE2_1      22         22             7.881069
SCH_WAE2_2      22         22             3.576123
SCH_WAE2_3      22         21             3.484807
SCH_WAE2_4      22         21             2.672799
SCH_WAE2_5      22         22             1.696537
SCH_WAE2_6      22         22             6.829253
SCH_WAE2_7      22         22             10.166636
SCH_WAE2_8      21         21             4.537143
SCH_WAE2_9      22         22             4.785806
SCH_WAE2_10     22         21             1.873941
SCH_WAE2_11     22         22             1.246602
SCH_WAE2_12     22         22             1.538963
SCH_WAE2_13     22         22             6.513991
SCH_WAE2_14     22         21             3.337582
SCH_WAE2_15     22         21             23.913958
SCH_WAE2_16     22         22             1.695519
SCH_WAE2_17     22         21             2.759149
SCH_WAE2_18     22         22             2.037806
SCH_WAE2_19     22         22             1.961526
SCH_WAE2_20     23         22             7.537802
SCH_WAE2_21     22         21             1.318665
SCH_WAE2_22     22         22             1.634718
SCH_WAE2_23     22         22             1.795601
SCH_WAE2_24     22         22             1.40357
SCH_WAE2_25     22         22             19.901876
SCH_WAE2_26     22         21             10.612043
SCH_WAE2_27     22         21             3.323103
SCH_WAE2_28     22         21             2.773831
SCH_WAE2_29     22         21             2.13123
SCH_WAE2_30     22         21             2.424299
SCH_WAE2_31     22         22             1.590401
SCH_WAE2_32     22         22             1.371827
SCH_WAE2_33     22         22             1.966954
SCH_WAE2_34     22         21             3.440176
SCH_WAE2_35     22         22             1.483907
SCH_WAE2_36     22         22             6.622695
SCH_WAE2_37     22         22             1.249159
SCH_WAE2_38     22         22             1.299636
SCH_WAE2_39     22         22             12.406897
SCH_WAE2_40     22         22             2.000733
SCH_WAE2_41     22         21             5.219861
SCH_WAE2_42     22         21             16.346722
SCH_WAE2_43     22         22             2.031538
SCH_WAE2_44     22         22             2.05005
SCH_WAE2_45     22         22             2.064586
SCH_WAE2_46     22         21             1.883535
SCH_WAE2_47     22         21             3.715486
SCH_WAE2_48     22         21             4.25449
SCH_WAE2_49     22         22             8.43341
SCH_WAE2_50     22         21             2.382179
SCH_WAE2_51     22         22             2.021937
SCH_WAE2_52     22         22             1.854207
SCH_WAE2_53     22         22             5.216751
SCH_WAE2_54     22         22             18.804251
SCH_WAE2_55     22         22             7.729899
SCH_WAE2_56     22         21             2.015478
SCH_WAE2_57     22         21             2.654803
SCH_WAE2_58     22         22             2.015065
SCH_WAE2_59     22         21             15.788445
SCH_WAE2_60     22         22             12.14978
SCH_WAE2_61     22         22             6.24467
SCH_WAE2_62     22         21             2.718076
SCH_WAE2_63     22         21             10.584626
SCH_WAE2_64     22         21             2.177987
SCH_WAE2_65     22         22             4.06024
SCH_WAE2_66     22         22             1.741747
SCH_WAE2_67     22         22             2.653687
SCH_WAE2_68     22         22             1.437786
SCH_WAE2_69     22         21             3.402605
SCH_WAE2_70     22         22             3.326743
SCH_WAE2_71     22         21             2.248744
SCH_WAE2_72     22         22             16.249829
SCH_WAE2_73     22         22             1.297735
SCH_WAE2_74     22         21             17.573503
SCH_WAE2_75     22         21             1.374113
SCH_WAE2_76     22         22             2.612285
SCH_WAE2_77     22         21             3.444593
SCH_WAE2_78     22         21             2.393924
SCH_WAE2_79     22         21             2.894696
SCH_WAE2_80     23         22             14.637149
SCH_WAE2_81     23         22             11.182882
SCH_WAE2_82     22         22             3.12165
SCH_WAE2_83     22         21             1.875829
SCH_WAE2_84     22         22             10.238209
SCH_WAE2_85     22         21             8.727173
SCH_WAE2_86     22         22             4.11194
SCH_WAE2_87     22         21             4.925202
SCH_WAE2_88     22         22             1.467682
SCH_WAE2_89     22         22             1.679621
SCH_WAE2_90     22         21             7.822758
SCH_WAE2_91     22         21             2.573397
SCH_WAE2_92     22         21             2.676046
SCH_WAE2_93     22         21             3.51098
SCH_WAE2_94     22         21             1.595649
SCH_WAE2_95     22         21             2.751163
SCH_WAE2_96     22         21             2.599518
SCH_WAE2_97     22         22             1.551431
SCH_WAE2_98     22         21             2.829111
SCH_WAE2_99     22         22             1.861101
SCH_WAE2_100    22         21             2.520166

5.2.3. Cumulative Results

For all the instances in BINDATA and SCH_WAE, the cumulative packing results of the compared algorithms (the cumulative FSOL, the average RT, the average ratio, and the average Gap) are shown in Table 9. RBF obtained a cumulative FSOL of 1162 with an RT of 82.41%, greatly surpassing all other compared algorithms. In particular, according to formula (17), the cumulative improvement of RBF over MBS and MBS’ is 189.05% and 27.41%, respectively. Figure 2 graphically shows the statistics of the compared algorithms on each dataset. In the radar chart of Figure 2, the FSOL curve of RBF is the outermost, meaning that RBF achieved the largest number of optimal solutions on the test cases. RBF also achieved the minimum ratio and Gap. These results indicate that, in comparison with typical heuristic algorithms, RBF has stronger global optimization performance. Overall, the proposed RBF ranks first in every metric among the compared algorithms listed in Tables 4–6. In detail, the improvement in FSOL on the uncomplicated dataset, Bin1data, is limited for all compared methods; RBF achieves the best results but wins by a narrow margin. However, RBF performs much better than the other methods on the difficult datasets bin2data, bin3data, and SCH_WAE. Moreover, RBF achieves a large advantage in FSOL, beating almost all other methods on the datasets used.


Algorithm        NFD [24]   FFD [23]   WFD [25]   AWFD [25]   BFD [23]   MBS [14]   MBS’ [14]   RBF

Cumulative FSOL  60         783        656        164         784        402        912         1162
Average RT       4.26%      55.53%     46.52%     11.63%      55.60%     28.51%     64.68%      82.41%
Average ratio    1.1776     1.0541     1.0556     1.0850      1.0541     1.0660     1.0492      1.0226
Average Gap      17.77%     5.42%      5.57%      8.51%       5.42%      5.17%      5.00%       2.27%
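The improvement figures quoted in the text follow formula (17), i.e., the relative increase in cumulative FSOL over a baseline. As a quick sanity check against the values in Table 9 (the helper name below is ours, not from the paper):

```python
def fsol_improvement(fsol_rbf: int, fsol_base: int) -> float:
    """Relative increase in cumulative FSOL over a baseline, in percent."""
    return 100.0 * (fsol_rbf - fsol_base) / fsol_base

# Cumulative FSOL values from Table 9.
print(round(fsol_improvement(1162, 402), 2))   # vs. MBS  -> 189.05
print(round(fsol_improvement(1162, 912), 2))   # vs. MBS' -> 27.41
```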

5.3. Robustness and Stability

The construction of the validation set is a key procedure of RBF. This experiment verifies the validity of the eigenvalue mapping function on Bin1data. Since 10 instances of Bin1data are selected by the eigenvalue mapping function to form the validation set, different selection policies are applied for comparison in the packing process. The first policy selects the first 10 instances of Bin1data, the second selects the last 10 instances, and the third selects 10 random instances to form the validation set. The packing results under the different selection policies are shown in Table 10. The value of the slack learned by the Q-agent differs across selection policies. With the selection policy of the eigenvalue mapping function, RBF achieved the maximum FSOL and RT and the minimum ratio and Gap. The results verify the validity of the eigenvalue mapping function, which helps RBF achieve better performance.


Validation set      Parameter   FSOL   RT (%)   Ratio    Gap (%)

Top 10 cases        0.1817      189    26.25    1.0532   5.32
The last 10 cases   0.8710      232    32.22    1.0620   5.87
Random 10 cases     0.4532      307    42.64    1.0492   4.93
Mapping 10 cases    0.5813      668    92.78    1.0468   4.68
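The three baseline selection policies compared above are easy to state precisely. The sketch below implements them for illustration (the function name is ours; the paper's eigenvalue-mapping policy depends on its mapping function and is not reproduced here):

```python
import random

def select_validation_set(instances, policy, k=10, seed=0):
    """Pick k validation instances from a dataset under a named policy.

    Policies mirror Table 10: 'top' takes the first k instances,
    'last' the last k, and 'random' samples k uniformly. The paper's
    eigenvalue-mapping policy, which groups instances by similarity
    before selecting, is intentionally omitted.
    """
    if policy == "top":
        return instances[:k]
    if policy == "last":
        return instances[-k:]
    if policy == "random":
        return random.Random(seed).sample(instances, k)
    raise ValueError(f"unknown policy: {policy}")

# Dummy instance ids standing in for the 720 Bin1data instances.
ids = list(range(720))
print(select_validation_set(ids, "top")[:3])   # [0, 1, 2]
print(select_validation_set(ids, "last")[-1])  # 719
```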

6. Conclusion and Future Work

In this paper, we propose the reinforced bin packing framework (RBF) to tackle the one-dimensional BPP. The proposed RBF consists of three main components: the RL-system, the instance-eigenvalue mapping process, and the reinforced-MBS strategy. The RL-system constructs a slack selection policy automatically, in which a Q-agent selects high-quality slack for the heuristic algorithm integrated in RBF. The instance-eigenvalue mapping process generates a representative and classified validation set based on the similarity of the input instances, which greatly reduces the computational overhead and improves the generalization performance of the model. Finally, with the slack coefficient provided by the RL-system, the reinforced-MBS strategy carries out the packing process. We evaluate our model on BPP tasks, where RBF exhibits excellent packing ability, and the experimental results validate its superior performance compared with state-of-the-art proposals on the BINDATA and SCH_WAE datasets. Compared with its baseline methods, MBS and MBS’, the average number of optimal solutions achieved by RBF increases by 189.05% and 27.41%, respectively.
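The core of the RL-system is a Q-agent that learns which slack coefficient yields the best packing performance on the validation set. The sketch below is a deliberately simplified, single-state stand-in for that idea, not the paper's exact state/action/reward design; the discretization, reward, and function names are our own assumptions:

```python
import random

# Candidate slack coefficients the Q-agent can choose between.
SLACKS = [i / 100 for i in range(0, 100, 5)]

def q_learning_slack(evaluate, episodes=200, alpha=0.1, epsilon=0.2, seed=0):
    """Return the slack coefficient with the highest learned Q-value.

    `evaluate(slack)` should run the slack-augmented packing heuristic
    on the validation set and return a performance-driven reward, e.g.
    the fraction of instances packed optimally.
    """
    rng = random.Random(seed)
    q = {s: 1.0 for s in SLACKS}  # optimistic init: every slack gets tried
    for _ in range(episodes):
        # Epsilon-greedy action selection over the slack candidates.
        s = rng.choice(SLACKS) if rng.random() < epsilon else max(q, key=q.get)
        r = evaluate(s)
        # Stateless episodes: the Q-update collapses to a running average.
        q[s] += alpha * (r - q[s])
    return max(q, key=q.get)

# Toy reward peaking at slack 0.55, loosely echoing the learned parameter
# 0.5813 reported for the mapping policy in Table 10 (purely illustrative).
print(q_learning_slack(lambda s: 1.0 - abs(s - 0.55)))  # 0.55
```

The optimistic initialization ensures every candidate slack is evaluated at least once before the agent commits to exploiting the best one.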

For future work, we plan to investigate slack selection policies and new mechanisms to learn them automatically. We also foresee the extension of our method to more complex multiagent reinforcement learning frameworks, where the use of new aspects of the multiagent communication environment is crucial to boost the packing performance.

Data Availability

The datasets used in this paper are the one-dimensional bin packing datasets BINDATA and SCH_WAE, which are available at http://people.brunel.ac.uk/∼mastjjb/jeb/orlib/binpackinfo.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61472139) and the Shanghai 2020 Action Plan of Technological Innovation (no. 20dz1201400).

References

  1. H. Tang, K. C. Tan, and Z. Yi, “A columnar competitive model for solving combinatorial optimization problems,” IEEE Transactions on Neural Networks, vol. 15, no. 6, pp. 1568–1573, 2004. View at: Publisher Site | Google Scholar
  2. D. Zhang, Z. Fu, and L. Zhang, “Joint optimization for power loss reduction in distribution systems,” IEEE Transactions on Power Systems, vol. 23, no. 1, pp. 161–169, 2008. View at: Publisher Site | Google Scholar
  3. F. Eisenbrand, D. Pálvölgyi, and T. Rothvoß, “Bin packing via discrepancy of permutations,” ACM Transactions on Algorithms, vol. 9, no. 3, pp. 24-25, 2013. View at: Publisher Site | Google Scholar
  4. H. I. Christensen, A. Khan, S. Pokutta, and P. Tetali, “Approximation and online algorithms for multidimensional bin packing: a survey,” Computer Science Review, vol. 24, pp. 63–79, 2017. View at: Publisher Site | Google Scholar
  5. K. Lehmann, A. Grastien, and P. Van Hentenryck, “Ac-feasibility on tree networks is np-hard,” IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 798–801, 2015. View at: Google Scholar
  6. R. Li, Y. Wang, S. Hu, J. Jiang, D. Ouyang, and M. Yin, “Solving the set packing problem via a maximum weighted independent set heuristic,” Mathematical Problems in Engineering, vol. 2020, Article ID 3050714, 11 pages, 2020. View at: Publisher Site | Google Scholar
  7. S. Martello, D. Pisinger, and P. Toth, “New trends in exact algorithms for the 0-1 knapsack problem,” European Journal of Operational Research, vol. 123, no. 2, pp. 325–332, 2000. View at: Publisher Site | Google Scholar
  8. S. Martello and D. Vigo, “Exact solution of the two-dimensional finite bin packing problem,” Management Science, vol. 44, no. 3, pp. 388–399, 1998. View at: Publisher Site | Google Scholar
  9. F. Brandão and J. P. Pedroso, “Bin packing and related problems: general arc-flow formulation with graph compression,” Computers & Operations Research, vol. 69, pp. 56–67, 2016. View at: Publisher Site | Google Scholar
  10. E. G. Coffman Jr., M. R. Garey, and D. S. Johnson, “Approximation algorithms for bin packing: a survey,” in Approximation Algorithms for NP-Hard Problems, D. S. Hochbaum, Ed., vol. 1, pp. 46–93, PWS Publishing Co., Boston, MA, USA, 1996. View at: Google Scholar
  11. K. Sim, E. Hart, and B. Paechter, “A lifelong learning hyper-heuristic method for bin packing,” Evolutionary Computation, vol. 23, no. 1, pp. 37–67, 2015. View at: Publisher Site | Google Scholar
  12. H. Wang, W. Wang, H. Sun, and S. Rahnamayan, “Firefly algorithm with random attraction,” International Journal of Bio-Inspired Computation, vol. 8, no. 1, pp. 33–41, 2016. View at: Publisher Site | Google Scholar
  13. M. Abdel-Basset, G. Manogaran, L. Abdel-Fatah, and S. Mirjalili, “An improved nature inspired meta-heuristic algorithm for 1-d bin packing problems,” Personal and Ubiquitous Computing, vol. 22, no. 5-6, pp. 1117–1132, 2018. View at: Publisher Site | Google Scholar
  14. J. N. D. Gupta and J. C. Ho, “A new heuristic algorithm for the one-dimensional bin-packing problem,” Production Planning & Control, vol. 10, no. 6, pp. 598–603, 1999. View at: Publisher Site | Google Scholar
  15. S. Polyakovskiy and R. M’Hallah, “A hybrid feasibility constraints-guided search to the two-dimensional bin packing problem with due dates,” European Journal of Operational Research, vol. 266, no. 3, pp. 819–839, 2018. View at: Publisher Site | Google Scholar
  16. R. Villasana, L. Garver, and S. Salon, “Transmission network planning using linear programming,” IEEE Transactions on Power Apparatus and Systems, vol. PAS-104, no. 2, pp. 349–356, 1985. View at: Publisher Site | Google Scholar
  17. A. Richards, T. Schouwenaars, J. P. How, and E. Feron, “Spacecraft trajectory planning with avoidance constraints using mixed-integer linear programming,” Journal of Guidance, Control, and Dynamics, vol. 25, no. 4, pp. 755–764, 2002. View at: Publisher Site | Google Scholar
  18. M. Chitsaz, J.-F. Cordeau, and R. Jans, “A branch-and-cut algorithm for an assembly routing problem,” European Journal of Operational Research, vol. 282, no. 3, pp. 896–910, 2020. View at: Publisher Site | Google Scholar
  19. Y. Zhang, X. Sun, and B. Wang, “Efficient algorithm for k-barrier coverage based on integer linear programming,” China Communications, vol. 13, no. 7, pp. 16–23, 2016. View at: Publisher Site | Google Scholar
  20. M. I. Hartillo-Hermoso, H. Jiménez-Tafur, and J. M. Ucha-Enríquez, “An exact algebraic ϵ-constraint method for bi-objective linear integer programming based on test sets,” European Journal of Operational Research, vol. 282, no. 2, pp. 453–463, 2020. View at: Publisher Site | Google Scholar
  21. P. Tangpattanakul, N. Jozefowiez, and P. Lopez, “A multi-objective local search heuristic for scheduling earth observations taken by an agile satellite,” European Journal of Operational Research, vol. 245, no. 2, pp. 542–554, 2015. View at: Publisher Site | Google Scholar
  22. D. S. Johnson, “Approximation algorithms for combinatorial problems,” Journal of Computer and System Sciences, vol. 9, no. 3, pp. 256–278, 1974. View at: Publisher Site | Google Scholar
  23. D. S. Johnson, “Near-optimal bin packing algorithms,” Ph.D. dissertation, Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA, 1973. View at: Google Scholar
  24. M. Hofri and S. Kamhi, “A stochastic analysis of the nfd bin-packing algorithm,” Journal of Algorithms, vol. 7, no. 4, pp. 489–509, 1986. View at: Publisher Site | Google Scholar
  25. N. G. Hall, S. Ghosh, R. D. Kankey, S. Narasimhan, and W. T. Rhee, “Bin packing problems in one dimension: heuristic solutions and confidence intervals,” Computers & Operations Research, vol. 15, no. 2, pp. 171–177, 1988. View at: Publisher Site | Google Scholar
  26. R. Ren, X. Tang, Y. Li, and W. Cai, “Competitiveness of dynamic bin packing for online cloud server allocation,” IEEE/ACM Transactions on Networking, vol. 25, no. 3, pp. 1324–1331, 2016. View at: Google Scholar
  27. F. F. Boctor, “Some efficient multi-heuristic procedures for resource-constrained project scheduling,” European Journal of Operational Research, vol. 49, no. 1, pp. 3–13, 1990. View at: Publisher Site | Google Scholar
  28. J. F. Gonçalves and M. G. C. Resende, “A biased random key genetic algorithm for 2d and 3d bin packing problems,” International Journal of Production Economics, vol. 145, no. 2, pp. 500–510, 2013. View at: Publisher Site | Google Scholar
  29. Y. Wu, M. Tang, and W. Fraser, “A simulated annealing algorithm for energy efficient virtual machine placement,” in Proceedings of the IEEE international Conference on Systems, Man, and Cybernetics, pp. 1245–1250, Seoul, Korea, October 2012. View at: Google Scholar
  30. F. Luo, I. D. Scherson, and J. Fuentes, “A novel genetic algorithm for bin packing problem in jmetal,” in Proceedings of the 2017 IEEE International Conference on Cognitive Computing (ICCC), pp. 17–23, Honolulu, HI, USA, June 2017. View at: Google Scholar
  31. W. Gao and Z.-P. Jiang, “Adaptive dynamic programming and adaptive optimal output regulation of linear systems,” IEEE Transactions on Automatic Control, vol. 61, no. 12, pp. 4164–4169, 2016. View at: Publisher Site | Google Scholar
  32. I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,” 2016, http://arxiv.org/abs/1611.09940. View at: Google Scholar
  33. H. Hu, X. Zhang, X. Yan, L. Wang, and Y. Xu, “Solving a new 3d bin packing problem with deep reinforcement learning method,” 2017, http://arxiv.org/abs/1708.05930. View at: Google Scholar
  34. A. Mirhoseini, H. Pham, Q. V. Le et al., “Device placement optimization with reinforcement learning,” in Proceedings of the 34th International Conference on Machine Learning, pp. 2430–2439, Sydney, Australia, August 2017. View at: Google Scholar
  35. B. J. Hellstrom and L. N. Kanal, “Knapsack packing networks,” IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 302–307, 1992. View at: Publisher Site | Google Scholar
  36. A. Savić, J. Kratica, M. Milanović, and D. Dugošija, “A mixed integer linear programming formulation of the maximum betweenness problem,” European Journal of Operational Research, vol. 206, no. 3, pp. 522–527, 2010. View at: Google Scholar
  37. J. Kang and S. Park, “Algorithms for the variable sized bin packing problem,” European Journal of Operational Research, vol. 147, no. 2, pp. 365–372, 2003. View at: Publisher Site | Google Scholar
  38. E. G. Coffman Jr, J. Csirik, G. Galambos, S. Martello, and D. Vigo, “Bin packing approximation algorithms: survey and classification,” Handbook of Combinatorial Optimization, vol. 1, no. 2, pp. 455–531, 2013. View at: Publisher Site | Google Scholar
  39. K. Fleszar and C. Charalambous, “Average-weight-controlled bin-oriented heuristics for the one-dimensional bin-packing problem,” European Journal of Operational Research, vol. 210, no. 2, pp. 176–184, 2011. View at: Publisher Site | Google Scholar
  40. S. S. Seiden, “On the online bin packing problem,” Journal of the ACM, vol. 49, no. 5, pp. 640–671, 2002. View at: Publisher Site | Google Scholar
  41. L. Epstein and R. Van Stee, “Online bin packing with resource augmentation,” Discrete Optimization, vol. 4, no. 3-4, pp. 322–333, 2007. View at: Publisher Site | Google Scholar
  42. K. Fleszar and K. S. Hindi, “New heuristics for one-dimensional bin-packing,” Computers & Operations Research, vol. 29, no. 7, pp. 821–839, 2002. View at: Publisher Site | Google Scholar
  43. T. Dokeroglu and A. Cosar, “Optimization of one-dimensional bin packing problem with island parallel grouping genetic algorithms,” Computers & Industrial Engineering, vol. 75, pp. 176–186, 2014. View at: Publisher Site | Google Scholar
  44. M. Quiroz-Castellanos, L. Cruz-Reyes, J. Torres-Jimenez, C. Gómez S., H. J. F. Huacuja, and A. C. F. Alvim, “A grouping genetic algorithm with controlled gene transmission for the bin packing problem,” Computers & Operations Research, vol. 55, pp. 52–64, 2015. View at: Publisher Site | Google Scholar
  45. T. Kucukyilmaz and H. E. Kiziloz, “Cooperative parallel grouping genetic algorithm for the one-dimensional bin packing problem,” Computers & Industrial Engineering, vol. 125, pp. 157–170, 2018. View at: Publisher Site | Google Scholar
  46. T. G. Crainic, G. Perboli, and R. Tadei, “TS2PACK: a two-level tabu search for the three-dimensional bin packing problem,” European Journal of Operational Research, vol. 195, no. 3, pp. 744–760, 2009. View at: Publisher Site | Google Scholar
  47. D. Kumar and Z. Raza, “A pso based vm resource scheduling model for cloud computing,” in Proceedings of the IEEE International Conference on Computational Intelligence & Communication Technology, pp. 213–219, Ghaziabad, India, February 2015. View at: Google Scholar
  48. O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 2692–2700, Montreal, Canada, December 2015. View at: Google Scholar
  49. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017. View at: Publisher Site | Google Scholar
  50. B. Luo, D. Liu, T. Huang, and D. Wang, “Model-free optimal tracking control via critic-only q-learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 10, pp. 2134–2144, 2016. View at: Publisher Site | Google Scholar
  51. Q. Wei, F. L. Lewis, Q. Sun, P. Yan, and R. Song, “Discrete-time deterministic Q-learning: a novel convergence analysis,” IEEE Transactions on Cybernetics, vol. 47, no. 5, pp. 1224–1237, 2016. View at: Google Scholar
  52. G. Gutin, T. Jensen, and A. Yeo, “Batched bin packing,” Discrete Optimization, vol. 2, no. 1, pp. 71–82, 2005. View at: Publisher Site | Google Scholar
  53. E. F. Grove, “Online bin packing with lookahead,” in Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 430–436, San Francisco, CA, USA, December 1995. View at: Google Scholar
  54. A. Scholl, R. Klein, and C. Jürgens, “Bison: a fast hybrid procedure for exactly solving the one-dimensional bin packing problem,” Computers & Operations Research, vol. 24, no. 7, pp. 627–645, 1997. View at: Publisher Site | Google Scholar
  55. P. Schwerin and G. Wäscher, “The bin-packing problem: a problem generator and some numerical experiments with ffd packing and mtp,” International Transactions in Operational Research, vol. 4, no. 5-6, pp. 377–389, 1997. View at: Publisher Site | Google Scholar

Copyright © 2021 Ting Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
