Abstract
Most existing research on the job shop scheduling problem has been focused on the minimization of makespan (i.e., the completion time of the last job). However, in the fiercely competitive market nowadays, delivery punctuality is more important for maintaining a high service reputation. So in this paper, we aim at solving job shop scheduling problems with the total weighted tardiness objective. Several dispatching rules are adopted in the Giffler-Thompson algorithm for constructing active schedules. It is noticeable that the rule selections for scheduling consecutive operations are not mutually independent but actually interrelated. Under such circumstances, a probabilistic model-building genetic algorithm (PMBGA) is proposed to optimize the sequence of selected rules. First, we use Bayesian networks to model the distribution characteristics of high-quality solutions in the population. Then, the new generation of individuals is produced by sampling the established Bayesian network. Finally, some elitist individuals are further improved by a special local search module based on parameter perturbation. The superiority of the proposed approach is verified by extensive computational experiments and comparisons.
1. Introduction
The job shop scheduling problem (JSSP) has been known as an extremely difficult combinatorial optimization problem ever since the 1950s. In terms of computational complexity, JSSP is strongly -hard [1]. Because of its relevance to contemporary manufacturing systems, extensive research has been conducted on the problem [2]. In recent years, the metaheuristicsβsuch as simulated annealing (SA) [3], genetic algorithm (GA) [4β6], tabu search (TS) [7, 8], particle swarm optimization (PSO) [9, 10], and ant colony optimization (ACO) [11, 12]βhave clearly become the most popular methods.
However, most existing research is still based on the standard JSSP model, so it is inconvenient to apply these algorithms directly to real-world scheduling scenarios. Especially, most algorithms are designed for the makespan criterion (i.e., minimize ). Actually, in the make-to-order production environment nowadays, due date related performances are becoming increasingly significant, because the in-time delivery of goods is vital for maintaining a high service reputation. Therefore, the algorithms that aim at minimizing tardiness in JSSP deserve more investigations.
In an attempt to contribute to the scheduling community, in this paper we study the JSSP with the objective of minimizing total weighted tardiness (TWT). TWT is the weighted sum of each jobβs tardiness against its due date. In some respects, the TWT measure captures the critical factors related with the profitability of a firm more accurately than the makespan. Meanwhile, from the theoretical perspective, the complexity of solving JSSP with TWT objective (abbreviated as TWT-JSSP hereinafter) is much greater than that of solving JSSP with makespan objective [13]. The value of TWT is affected by all the tardy jobs and thus is more sensitive to the changes in the schedule. According to the three-field notation, the studied problem can be described as .
Relatively few publications have discussed the minimization of TWT in JSSP. The only exact method is the branch-and-bound algorithm proposed by Singer and Pinedo [14], while the rest belong to the heuristic category. The first attempt of adapting the shifting bottleneck algorithm (which was very successful for makespan minimization) to TWT-JSSP is reported in [15]. In [16, 17], the authors propose modified shifting bottleneck heuristics for complex job shops (characterized by parallel batching machines, sequence-dependent setup times, and reentrant process flows) with TWT criterion. In these algorithms, the single/parallel machine subproblems are solved basically by the ATC rule or the extended BATCS rule (for the case of batching and setups). The shifting bottleneck heuristic proposed in [18], however, uses genetic algorithms to solve the subproblems. The computations show that using GA as subproblem solution procedures leads to improved results compared to dispatching-based subproblem solution procedures. Besides shifting bottleneck, a large step random walk (LSRW) algorithm is designed in [19], which uses different neighborhood sizes depending on whether a small step or a large step is performed. A small step consists of iterative improvement, while a large step consists of a Metropolis algorithm. In [20, 21], hybrid genetic algorithms are presented for solving TWT-JSSP. The former combines GA with different dispatching rules, while the latter combines GA with an iterated local search procedure that uses the longest path approach on a disjunctive graph model. In [22], a tabu search algorithm is presented for the generalized TWT-JSSP with release times and precedence constraints. Recently, electromagnetic algorithms (EM) have been successfully applied to TWT-JSSP [23]. In fact, most of these algorithms have relatively high computational complexity, and thus they are incapable of solving large-scale TWT-JSSP.
Rule-based scheduling approaches have been investigated for job shop scheduling since the late 1990s [24, 25]. A remarkable advantage of using dispatching rules is that it helps to save computational time and generally produces satisfactory solutions. Recently, modern optimization strategies are applied to find high-quality combinations of dispatching rules. For example, a neural network is designed in [26] to determine the dispatching rule to use on each machine in a job shop (it is necessary to train the neural network beforehand with optimal rule combinations for some known instances). A memetic algorithm with tailored-encoding/decoding schemes, and a local search procedure is proposed in [27] for minimizing the number of tardy jobs in JSSP. The memetic algorithm defeats a multistart hill-climbing approach and a simulated annealing approach. A dispatching rule-based genetic algorithms (DRGA) is proposed in [28], which searches simultaneously for the best sequence of dispatching rules and the number of operations to be handled by each dispatching rule. The DRGA obtains better results than a GA using the conventional dispatching rule representation and a GA that uses the operation permutation representation. A genetic programming-based data mining approach is proposed in [29] to select dispatching rules under a given set of shop parameters (e.g., interarrival times). The results obtained from simulation show that the selected dispatching rules are appropriate according to the current shop status.
The rest of the paper is organized as follows. The discussed problem is mathematically formulated in Section 2. Section 3 makes a brief introduction to the principles of PMBGA. Section 4 proposes a rule-based PMBGA for solving TWT-JSSP. Section 5 presents the computational results and analysis. Finally, Section 6 concludes the paper.
2. Problem Formulation
In a JSSP instance, a set of jobs are to be processed on a set of machines under the following basic assumptions: (i) there is no machine breakdown; (ii) no preemption of operations is allowed; (iii) all jobs are released at time 0; (iv) the transportation times and the setup times are all neglected; (v) each machine can process only one job at a time; (vi) each job may be processed by only one machine at a time.
Each job has a fixed processing route which traverses the relevant machines (in the standard JSSP benchmark instances, each job is required to visit all the machines. But actually, the number of operations for each job () can be less than ) in a predetermined order. The manufacturing process of job on machine is noted as operation , with a duration of . Besides, a preset due date (describing the level of urgency) and a preset weight (reflecting the importance of the order) are given for each job . The objective function is defined as , where defines the tardiness of job .
JSSP can be described by a disjunctive graph [30]. is the set of nodes. is the set of conjunctive arcs which connect successive operations of the same job, so describes the technological constraints in the JSSP instance. is the set of disjunctive arcs, where denotes the disjunctive arcs corresponding to the operations on machine . Each arc in connects a pair of operations to be processed by machine and ensures that the two operations should not be processed simultaneously. Initially, the disjunctive arcs do not have fixed directions.
Under the disjunctive graph representation, finding a feasible schedule for the JSSP is equivalent to orienting all the disjunctive arcs so that no directed cycles exist in the resulting graph. In this paper, we use to denote the set of directed disjunctive arcs which are transformed from the original . Thus, if is acyclic, the schedule corresponding to is feasible (in the rest of the paper, we do not distinguish between and the schedule. For the convenience of expression, we will write as a matrix. The th row of represents the processing order of the operations on machine ).
Based on the disjunctive graph model, the discussed TWT-JSSP can be mathematically formulated as follows:In this formulation, . represents the starting time of operation . denotes the index of the machine that processes the last operation of job , so the completion time of job is . The set of constraints (a) ensure that the processing order of the operations from each job is consistent with the technological routes. The set of constraints (b) ensure that any two operations on the same machine cannot be processed simultaneously.
3. A Brief Introduction to PMBGA
Recently, there has been a growing interest in the evolutionary algorithms that explore the search space by building and utilizing probabilistic models of high-quality solutions. Indeed, these algorithms use the following two steps to replace the conventional crossover and mutation operators in GA:(1)Build a probabilistic model of the selected promising solutions;(2)Sample the built model to produce a new generation of candidate solutions.
The evolutionary algorithms based on such a principle are referred to as estimation of distribution algorithms (EDAs) or probabilistic model-building genetic algorithms (PMBGAs). The major steps of a PMBGA implementation are listed as follows, where is the maximum number of generations.
Step 1. Set the generation index . Initialize the population of the first generation, that is, .
Step 2. Select a subset of promising individuals from .
Step 3. Establish a probabilistic model which somehow describes the distribution characteristics of .
Step 4. Generate a set of new individuals by sampling .
Step 5. Select the best individuals from and assign them to the next generation population .
Step 6. Let . If , return to Step 2. Otherwise, output the best solution in .
The PMBGA is especially useful for the complex optimization problems in which the decision variables are correlated. For such problems, the realized value of a certain decision variable can produce an impact on the optimal value for another decision variable. Therefore, if these variables are optimized in a separate way (or one by one), traditional GA will be very likely to converge to local optimum. PMBGA improves traditional GA by modeling the relationship between decision variables. Clearly, the most crucial element in the design of PMBGA is the type of the adopted probabilistic model (in Step 3), which directly affects the algorithmβs capability of producing high-quality offspring solutions. In the artificial intelligence community, the commonly adopted graphical model for characterizing the relationship between a set of discrete variables is the Bayesian network [31]. A Bayesian network is a directed acyclic graph. A node in the Bayesian network indicates a variable under investigation (each variable actually corresponds to a coding gene for the solutions in PMBGA), and an arc indicates the probabilistic causal relationship between the two nodes connected by it. The direction of the arc implies that the variable corresponding to the head node of the arc is conditioned by the variable corresponding to the tail node. In general, the joint probabilistic distribution of an -variate random vector described by a Bayesian network can be calculated as In this formulation, is a vector of realized values for ; is a set of realized values for the parents (in a Bayesian network, if there exists a directed arc pointing from node to , then is called a parent of ) of the random variable .
A detailed introduction to the PMBGA can be found in [32]. Some important advances and interesting applications of EDA are covered in [33, 34]. Successes in utilizing EDA to solve scheduling problems have been reported in [35, 36] (for flow shop scheduling).
4. The Proposed PMBGA for Solving TWT-JSSP
4.1. Encoding
The proposed PMBGA relies on dispatching rules to record the scheduling policies. Eight scheduling rules are involved in this study. In the following expressions for the priority index, operation belongs to job , and and are the corresponding jobβs weight and due date. and , respectively, denote the set of job successors of operation and the set of job predecessors of operation . is used to indicate that the operation with the smallest index will be chosen from the conflict set, while is used to indicate that the operation with the largest index will be chosen from the conflict set:(1)ATC (apparent tardiness cost): ββ (in this expression, denotes the earliest starting time of operation , and denotes the average processing time of the current operations in the conflict set. is a scaling parameter (or called βlook-aheadβ parameter). is the estimated lead time of operation );(2)SPT (shortest processing time): ;(3)LPT (longest processing time): ;(4)WSPT (weighted shortest processing time): ;(5)SRPT (shortest remaining processing time): ;(6)LRPT (longest remaining processing time): ;(7)EDD (earliest due date): ;(8)ODD (operation due date): .
In PMBGA, each encoding digit (i.e., gene) is expressed by the serial number () of the selected dispatching rule. A solution is represented by a rule sequence , where indicates the rule to use when scheduling the th operation on machine . Therefore, the encoding length for each solution is .
4.2. Decoding
In order to evaluate the fitness of a solution, the Giffler-Thompson algorithm is applied to construct an active schedule based on the specified dispatching rules. The implementation of the Giffler-Thompson algorithm is detailed below.
Input: A sequence of rules .
Step 1. Let (the set of all operations), (the set of first operations of each job). Set and ().
Step 2. Find the operation , and let be the index of the machine on which this operation should be processed. Use to denote all the operations from which should be processed on machine .
Step 3. Delete from the operations that satisfy .
Step 4. Use rule to identify an operation from if currently there are more than one candidates. Schedule operation on machine at the earliest possible time. Let .
Step 5. Let , , where denotes the immediate job successor of operation (if any).
Step 6. If , set and go to Step 2. Otherwise, the decoding procedure is terminated.
In the above description, the release time equals the earliest possible starting time of operation (determined from the already scheduled operations). So, is the earliest possible completion time of operation . represents the set of operations yet to be scheduled at iteration , while represents the set of ready operations (whose job predecessors have all been scheduled) at iteration . In Step 4, the operation set is also called a conflict set.
4.3. Producing Offspring Individuals
(a) Selection
In each iteration, we first sort all the individuals in the current population according to their fitness. Then, we select the best of individuals to form the set , which will subsequently be used to build the Bayesian network in order to produce a new generation of individuals.
(b) The Adopted Network Structure
A Bayesian network has to be built to describe the probabilistic distribution of favorable settings based on the elite individuals in . Each individual can be characterized by a directed acyclic network as shown in Figure 1. In this network, a node () indicates the fact that the dispatching rule used to schedule the th operation on machine is selected as rule Number . The directed arc from node to node (or ) represents the dependency between the two nodes, so it characterizes the possible influence of the rule selection for the th operation on machine on the rule selection for the th operation on machine (or the -th operation on machine 1). Therefore, a directed path from a certain node in the first row to a certain node in the -th row can completely describe an individual in the population (because a directed path records an instantiation of all the rule selections).
(c) Calculation of the Probability Values
Since we adopt a fixed network structure in PMBGA, building the Bayesian network is equivalent to determining the values of all the conditional probabilities according to the selected solution set . After that, new individuals will be produced by iteratively sampling these probabilistic distributions, expecting to obtain high-quality offsprings.
Given a number of individuals (i.e., the training set ), an estimate of the conditional probabilities can be obtained simply by counting the frequencies of occurrence.
Example 4.1. Here, we provide a concrete example to illustrate the probability calculation process. For a PMBGA optimization instance with and , let us further suppose the current contains 40 individuals. In Figure 2, the statistics of these individuals are displayed on a network as previously defined. The weight of each arc (the number placed on the arc) indicates the occurring frequency of this arc (counted for all the individuals in ). For example, if there exists an individual coded as ββ, then the path ββ in the Bayesian network is used to record this individual, and consequently, the weights (counted frequencies) of the arcs and will be increased by 1, respectively. Note that, in the final network, the sum of the weights of all the incoming arcs of a certain node should be equal to the sum of the weights of all the outgoing arcs of the same node. This is because each individual corresponds to a complete path from the first row to the last row.
By using frequency as an approximation for probability, the relevant (conditional) probabilities should be calculated as follows:
According to the above calculation method, a connection can never be rediscovered in the PMBGA if the corresponding conditional probability is zero (e.g., from to ). To overcome this drawback, we can set the minimum count to 1. Taking as an example, the conditional probabilities for the outgoing arcs will then become Now it is possible to discover from , though the probability is small.
(d) The Sampling Process
The sampling process for generating a new individual begins from the root nodes of the Bayesian network. By selecting an outgoing arc at each node based on the calculated conditional probabilities, the whole network can be gradually instantiated.
4.4. The Embedded Local Search
The search mechanism of GA guarantees a good performance in the βexplorationβ of the solution space. However, it has been reported that GA alone cannot achieve satisfactory solution quality for complex optimization problems. Actually, a local search procedure is usually added within the framework of GA in order to provide reliable βexploitationβ ability. In this paper, we design such a local optimizer, which attempts to improve the selected solutions. In each iteration of PMBGA, the local search is carried out for the best of solutions in the current population. Thus, is an important parameter for adjusting the frequency of local search and achieving a balance between exploration and exploitation.
In the following, we will describe how to perform the local search on a given solution (a sequence of rules).
Step 1. Use the Giffler-Thompson algorithm to generate an active schedule based on . The objective value of the obtained schedule is .
Step 2. Set .
Step 3. Exert random perturbations on the processing times: generate a new set of processing times from the normal distribution .
Step 4. Use the Giffler-Thompson algorithm to generate an active schedule based on . Note that, in this process, the processing time of each operation takes its new value, that is, .
Step 5. Evaluate the objective value of under the original processing times, obtaining .
Step 6. If , exit the local search. Otherwise, continue with Step 7.
Step 7. Let . If , terminate the local search procedure with no improvement found. Otherwise, go back to Step 3.
The local search procedure attempts to find a better schedule from an initial starting solution. When the processing time of each operation varies due to the random perturbations in Step 3, different schedules may be obtained by applying the same set of dispatching rules. This is the fundamental idea of the local search procedure. The relative deviation is set as 0.2 times the mean, which can produce a moderate level of perturbation. If the variance is too large or too small, the local search will be inefficient. Finally, if it turns out that , which means that the local search has found an improvement, then the original schedule should be accordingly revised. On the other hand, if no better schedule is found within trials, the local search will quit, leaving the current solution unchanged.
Example 4.2. Here, we provide a concrete example to illustrate the local search procedure. In the TWT-JSSP instance, the processing time of each operation is marked beside the corresponding node in Figure 3 (the disjunctive graph). The due dates of each job are set as , , and . The weights of each job are set as , , and .
Suppose the initial solution is , which indicates using the SPT rule at all times.
First, the solution is decoded by applying the Giffler-Thompson algorithm under the original values of processing times. The following schedule is obtained: The objective value is (see Figure 4).
Next, we add random perturbations to the processing times and thus generate different new instances of the JSSP. In one of these instances, the processing time of each operation is as follows: The solution is decoded again by applying the Giffler-Thompson algorithm under the above new values of . The following schedule can be obtained: when evaluated under the original processing times, the objective value of is (see Figure 5).
Therefore, an improvement has been found on the initial solution (because ).
5. Computational Results
In order to test the performance of the proposed PMBGA, the same benchmark instances as in [15, 19] are used in our computational experiment. In these instances, the due date of each job is set as , where denotes the total processing time of job , and is a coefficient that controls the tightness level of the due date setting. The first 20% of jobs are assigned weighting 4 (very important), the next 60% are assigned weighting 2, and the remaining jobs are assigned weighting 1 (not important). That is, , , and .
Based on extensive computational tests, the algorithm parameters are set as follows. The population size ; the proportion of individuals selected for local search ; the maximum number of random perturbations . In the implementation of the ATC rule, we set and .
We compare the performance of the proposed PMBGA with the hybrid optimization algorithm PSO-SA [37] and a rule-based genetic algorithm (RBGA) which uses the same dispatching rules for encoding but adopts classical crossover and mutation operators. In order to make the comparisons meaningful, we set a computational time limit for all the algorithms. In the following, the time limit for solving each instance is determined as 60 seconds. Each algorithm is run for 20 independent times on each TWT-JSSP instance. Tables 1, 2, and 3 report the average objective value obtained from the 20βruns. β# optβ indicates the number of times that the optimum (the optimum for each instance (listed in the second column in the tables) is first given by the branch-and-bound algorithm [14] and recently updated by the hybrid genetic algorithm [21]) has been reached during the 20βruns. The results demonstrate the superiority of the PMBGA.
According to the computational results, the proposed PMBGA systematically outperforms the comparative methods. In addition, the following comments can be made:(1)The proposed PMBGA performs better than the PSO-SA which adopts operation permutation-based encoding scheme. The advantage of PMBGA is even stronger when the due dates in the TWT-JSSP instances are set tighter. This suggests that the rule-based optimization approach is more effective than sequence-based search when many jobs are prone to be tardy and thus are competing fiercely for the limited machine resources. Applying dispatching rules turns out to be a satisfactory and robust strategy in this situation;(2)Also, PMBGA outperforms RBGA to a greater extent. This reveals the effectiveness of the proposed approach from two aspects:(i)Since RBGA does not involve a local search module, the results show that the specialized local search procedure can help to promote the overall performance of GA. In particular, the local optimizer in PMBGA is closely based on the specific characteristics of the considered JSSP instance: the magnitude of the random sampling is consistent with the processing time (cf. Step 3 of the procedure), which ensures a reasonable size of the search scope.(ii)The results show that using the estimation of distribution principle to optimize the rule combinations is more effective than the traditional crossover and mutation operators. Noticeably, the essential point in this process is to model the relationship and interactions between the rule selections for different jobs and different machines.
Meanwhile, we also test the impact of the parameter on the final solution quality of PMBGA. A reasonable selection of will result in an effective balance between exploration and exploitation. In the following experiment, instance ORB1 under is used and the time limit is set as 40βsec and 60βsec, respectively. The computational results are displayed in Figure 6, where the vertical axis gives the average objective value obtained from 20 independent executions of the proposed PMBGA under each .
According to the results, the setting of has a considerable impact on the solution quality, especially when the computational time is scarce (40βsec), which verifies that the proposed local search module is effective in accelerating the overall convergence of PMBGA. A small means that only a few solutions in each generation can be improved by the local search, which has little effect on the entire population. A large suggests that too much time is consumed on local search, which may impair the normal function of PMBGA because of the reduced generations.
The best setting of under each constraint level is 60 (for tight time budget) and 30 (for loose time budget). When the exogenous restriction on computational time is tight, PMBGA has to rely on frequent local search to find good solutions. This is because, in the short term, local search is more efficient than PMBGAβs mechanism (Bayesian network modeling) in improving a solution. However, the price to pay is possibly a premature convergence of the whole optimization process. On the other hand, when the computational time is more sufficient, PMBGA will prefer a larger number of generations to conduct a systematic exploration of the solution space. In this case, the local search need not be used very frequently, otherwise the steady searching and learning process may be disturbed.
Finally, we observe the impact of the parameters in the ATC rule. We write the estimated waiting time of operation as proportional to its processing time: . Now, we use instance MT10 with to test the two parameters, and . The average objective values obtained by PMBGA under each combination is shown in Figure 7.
In fact, the influence of the ATC parameters is not so significant as the other parameters. But a clearly inferior setting is , which could be eliminated. Based on additional experimental results which are not listed here, both and seem acceptable. Overall, setting and yield satisfactory solution quality for most scheduling instances involved in this study.
6. Conclusion
In this paper, we propose a new probabilistic model-building genetic algorithm for solving the job shop scheduling problem under the total weighted tardiness criterion. Since the TWT objective is systematically more difficult to optimize than the conventional makespan objective, we rely on some dispatching rules for schedule construction. PMBGA is used to search for good combinations of these rules, and a specific local search algorithm is embedded into the optimization framework. The computational experiments have shown the superiority of the proposed methodology.
Future research can be conducted from the following aspects. Although the standard Bayesian network is capable of modeling the interactions between decision variables, it is necessary to improve the computational efficiency in the rule optimization process. Meanwhile, it is interesting to try other encoding schemes, which may be beneficial for the discovery of useful structural properties and may enhance the overall performance of the PMBGA.
Acknowledgment
This paper was supported by the National Natural Science Foundation of China (Grant nos. 61104176, and 60874071).