Abstract

Most existing research on the job shop scheduling problem has been focused on the minimization of makespan (i.e., the completion time of the last job). However, in the fiercely competitive market nowadays, delivery punctuality is more important for maintaining a high service reputation. So in this paper, we aim at solving job shop scheduling problems with the total weighted tardiness objective. Several dispatching rules are adopted in the Giffler-Thompson algorithm for constructing active schedules. It is noticeable that the rule selections for scheduling consecutive operations are not mutually independent but actually interrelated. Under such circumstances, a probabilistic model-building genetic algorithm (PMBGA) is proposed to optimize the sequence of selected rules. First, we use Bayesian networks to model the distribution characteristics of high-quality solutions in the population. Then, the new generation of individuals is produced by sampling the established Bayesian network. Finally, some elitist individuals are further improved by a special local search module based on parameter perturbation. The superiority of the proposed approach is verified by extensive computational experiments and comparisons.

1. Introduction

The job shop scheduling problem (JSSP) has been known as an extremely difficult combinatorial optimization problem ever since the 1950s. In terms of computational complexity, JSSP is strongly 𝒩𝒫-hard [1]. Because of its relevance to contemporary manufacturing systems, extensive research has been conducted on the problem [2]. In recent years, the metaheuristicsβ€”such as simulated annealing (SA) [3], genetic algorithm (GA) [4–6], tabu search (TS) [7, 8], particle swarm optimization (PSO) [9, 10], and ant colony optimization (ACO) [11, 12]β€”have clearly become the most popular methods.

However, most existing research is still based on the standard JSSP model, so it is inconvenient to apply these algorithms directly to real-world scheduling scenarios. Especially, most algorithms are designed for the makespan criterion (i.e., minimize 𝐢max). Actually, in the make-to-order production environment nowadays, due date related performances are becoming increasingly significant, because the in-time delivery of goods is vital for maintaining a high service reputation. Therefore, the algorithms that aim at minimizing tardiness in JSSP deserve more investigations.

In an attempt to contribute to the scheduling community, in this paper we study the JSSP with the objective of minimizing total weighted tardiness (TWT). TWT is the weighted sum of each job’s tardiness against its due date. In some respects, the TWT measure captures the critical factors related with the profitability of a firm more accurately than the makespan. Meanwhile, from the theoretical perspective, the complexity of solving JSSP with TWT objective (abbreviated as TWT-JSSP hereinafter) is much greater than that of solving JSSP with makespan objective [13]. The value of TWT is affected by all the tardy jobs and thus is more sensitive to the changes in the schedule. According to the three-field notation, the studied problem can be described as βˆ‘π‘€Jπ‘š||𝑗𝑇𝑗.

Relatively few publications have discussed the minimization of TWT in JSSP. The only exact method is the branch-and-bound algorithm proposed by Singer and Pinedo [14], while the rest belong to the heuristic category. The first attempt of adapting the shifting bottleneck algorithm (which was very successful for makespan minimization) to TWT-JSSP is reported in [15]. In [16, 17], the authors propose modified shifting bottleneck heuristics for complex job shops (characterized by parallel batching machines, sequence-dependent setup times, and reentrant process flows) with TWT criterion. In these algorithms, the single/parallel machine subproblems are solved basically by the ATC rule or the extended BATCS rule (for the case of batching and setups). The shifting bottleneck heuristic proposed in [18], however, uses genetic algorithms to solve the subproblems. The computations show that using GA as subproblem solution procedures leads to improved results compared to dispatching-based subproblem solution procedures. Besides shifting bottleneck, a large step random walk (LSRW) algorithm is designed in [19], which uses different neighborhood sizes depending on whether a small step or a large step is performed. A small step consists of iterative improvement, while a large step consists of a Metropolis algorithm. In [20, 21], hybrid genetic algorithms are presented for solving TWT-JSSP. The former combines GA with different dispatching rules, while the latter combines GA with an iterated local search procedure that uses the longest path approach on a disjunctive graph model. In [22], a tabu search algorithm is presented for the generalized TWT-JSSP with release times and precedence constraints. Recently, electromagnetic algorithms (EM) have been successfully applied to TWT-JSSP [23]. In fact, most of these algorithms have relatively high computational complexity, and thus they are incapable of solving large-scale TWT-JSSP.

Rule-based scheduling approaches have been investigated for job shop scheduling since the late 1990s [24, 25]. A remarkable advantage of using dispatching rules is that it helps to save computational time and generally produces satisfactory solutions. Recently, modern optimization strategies are applied to find high-quality combinations of dispatching rules. For example, a neural network is designed in [26] to determine the dispatching rule to use on each machine in a job shop (it is necessary to train the neural network beforehand with optimal rule combinations for some known instances). A memetic algorithm with tailored-encoding/decoding schemes, and a local search procedure is proposed in [27] for minimizing the number of tardy jobs in JSSP. The memetic algorithm defeats a multistart hill-climbing approach and a simulated annealing approach. A dispatching rule-based genetic algorithms (DRGA) is proposed in [28], which searches simultaneously for the best sequence of dispatching rules and the number of operations to be handled by each dispatching rule. The DRGA obtains better results than a GA using the conventional dispatching rule representation and a GA that uses the operation permutation representation. A genetic programming-based data mining approach is proposed in [29] to select dispatching rules under a given set of shop parameters (e.g., interarrival times). The results obtained from simulation show that the selected dispatching rules are appropriate according to the current shop status.

The rest of the paper is organized as follows. The discussed problem is mathematically formulated in Section 2. Section 3 makes a brief introduction to the principles of PMBGA. Section 4 proposes a rule-based PMBGA for solving TWT-JSSP. Section 5 presents the computational results and analysis. Finally, Section 6 concludes the paper.

2. Problem Formulation

In a JSSP instance, a set of 𝑛 jobs {𝐽𝑗}𝑛𝑗=1 are to be processed on a set of π‘š machines {π‘€π‘˜}π‘šπ‘˜=1 under the following basic assumptions: (i) there is no machine breakdown; (ii) no preemption of operations is allowed; (iii) all jobs are released at time 0; (iv) the transportation times and the setup times are all neglected; (v) each machine can process only one job at a time; (vi) each job may be processed by only one machine at a time.

Each job has a fixed processing route which traverses the relevant machines (in the standard JSSP benchmark instances, each job is required to visit all the machines. But actually, the number of operations for each job (π‘šπ‘—) can be less than π‘š) in a predetermined order. The manufacturing process of job 𝑗 on machine π‘˜ is noted as operation π‘‚π‘—π‘˜, with a duration of π‘π‘—π‘˜. Besides, a preset due date 𝑑𝑗 (describing the level of urgency) and a preset weight 𝑀𝑗 (reflecting the importance of the order) are given for each job 𝑗. The objective function is defined as βˆ‘TWT=𝑛𝑗=1𝑀𝑗𝑇𝑗, where 𝑇𝑗=max{πΆπ‘—βˆ’π‘‘π‘—,0} defines the tardiness of job 𝑗.

JSSP can be described by a disjunctive graph 𝐺(𝑂,𝐴,𝐸) [30]. 𝑂={π‘‚π‘—π‘˜βˆ£π‘—=1,…,𝑛,π‘˜=1,…,π‘š} is the set of nodes. 𝐴 is the set of conjunctive arcs which connect successive operations of the same job, so 𝐴 describes the technological constraints in the JSSP instance. ⋃𝐸=π‘šπ‘˜=1πΈπ‘˜ is the set of disjunctive arcs, where πΈπ‘˜ denotes the disjunctive arcs corresponding to the operations on machine π‘˜. Each arc in πΈπ‘˜ connects a pair of operations to be processed by machine π‘˜ and ensures that the two operations should not be processed simultaneously. Initially, the disjunctive arcs do not have fixed directions.

Under the disjunctive graph representation, finding a feasible schedule for the JSSP is equivalent to orienting all the disjunctive arcs so that no directed cycles exist in the resulting graph. In this paper, we use 𝜎 to denote the set of directed disjunctive arcs which are transformed from the original 𝐸. Thus, if 𝐴βˆͺ𝜎 is acyclic, the schedule corresponding to 𝜎 is feasible (in the rest of the paper, we do not distinguish between 𝜎 and the schedule. For the convenience of expression, we will write 𝜎 as a matrix. The π‘˜th row of 𝜎 represents the processing order of the operations on machine π‘˜).

Based on the disjunctive graph model, the discussed TWT-JSSP can be mathematically formulated as follows:minTWT=𝑛𝑗=1π‘€π‘—ξ‚€π‘‘π‘—π‘˜π‘—+π‘π‘—π‘˜π‘—βˆ’π‘‘π‘—ξ‚+,𝑑(2.1)s.t.π‘—π‘˜+π‘π‘—π‘˜β©½π‘‘π‘—π‘˜β€²ξ€·π‘‚,βˆ€π‘—π‘˜,π‘‚π‘—π‘˜β€²ξ€Έπ‘‘βˆˆπ΄,(a)π‘—π‘˜+π‘π‘—π‘˜β©½π‘‘π‘—β€²π‘˜βˆ¨π‘‘π‘—β€²π‘˜+π‘π‘—β€²π‘˜β©½π‘‘π‘—π‘˜ξ€·π‘‚,βˆ€π‘—π‘˜,π‘‚π‘—β€²π‘˜ξ€ΈβˆˆπΈπ‘˜,𝑑(b)π‘—π‘˜β©Ύ0,𝑗=1,…,𝑛,π‘˜=1,…,π‘š.(c)In this formulation, (π‘₯)+=max{π‘₯,0}. π‘‘π‘—π‘˜ represents the starting time of operation π‘‚π‘—π‘˜. π‘˜π‘— denotes the index of the machine that processes the last operation of job 𝑗, so the completion time of job 𝑗 is π‘‘π‘—π‘˜π‘—+π‘π‘—π‘˜π‘—. The set of constraints (a) ensure that the processing order of the operations from each job is consistent with the technological routes. The set of constraints (b) ensure that any two operations on the same machine cannot be processed simultaneously.

3. A Brief Introduction to PMBGA

Recently, there has been a growing interest in the evolutionary algorithms that explore the search space by building and utilizing probabilistic models of high-quality solutions. Indeed, these algorithms use the following two steps to replace the conventional crossover and mutation operators in GA:(1)Build a probabilistic model of the selected promising solutions;(2)Sample the built model to produce a new generation of candidate solutions.

The evolutionary algorithms based on such a principle are referred to as estimation of distribution algorithms (EDAs) or probabilistic model-building genetic algorithms (PMBGAs). The major steps of a PMBGA implementation are listed as follows, where GN is the maximum number of generations.

Step 1. Set the generation index 𝑔=0. Initialize the population of the first generation, that is, 𝑃(0).

Step 2. Select a subset 𝑆 of promising individuals from 𝑃(𝑔).

Step 3. Establish a probabilistic model 𝑀 which somehow describes the distribution characteristics of 𝑆.

Step 4. Generate a set 𝑁 of new individuals by sampling 𝑀.

Step 5. Select the best |𝑃(𝑔)| individuals from 𝑃(𝑔)βˆͺ𝑁 and assign them to the next generation population 𝑃(𝑔+1).

Step 6. Let 𝑔←𝑔+1. If 𝑔<𝐺𝑁, return to Step 2. Otherwise, output the best solution in 𝑃(𝑔).

The PMBGA is especially useful for the complex optimization problems in which the decision variables are correlated. For such problems, the realized value of a certain decision variable can produce an impact on the optimal value for another decision variable. Therefore, if these variables are optimized in a separate way (or one by one), traditional GA will be very likely to converge to local optimum. PMBGA improves traditional GA by modeling the relationship between decision variables. Clearly, the most crucial element in the design of PMBGA is the type of the adopted probabilistic model (in Step 3), which directly affects the algorithm’s capability of producing high-quality offspring solutions. In the artificial intelligence community, the commonly adopted graphical model for characterizing the relationship between a set of discrete variables is the Bayesian network [31]. A Bayesian network is a directed acyclic graph. A node in the Bayesian network indicates a variable under investigation (each variable actually corresponds to a coding gene for the solutions in PMBGA), and an arc indicates the probabilistic causal relationship between the two nodes connected by it. The direction of the arc implies that the variable corresponding to the head node of the arc is conditioned by the variable corresponding to the tail node. In general, the joint probabilistic distribution of an 𝑛-variate random vector 𝐗=(𝑋1,…,𝑋𝑛) described by a Bayesian network can be calculated as𝑃(𝐱)=𝑛𝑖=1𝑃π‘₯𝑖π‘₯βˆ£π‘π‘Žπ‘–ξ€Έξ€Έ.(3.1) In this formulation, 𝐱=(π‘₯1,…,π‘₯𝑛) is a vector of realized values for 𝐗; π‘π‘Ž(π‘₯𝑖) is a set of realized values for the parents (in a Bayesian network, if there exists a directed arc pointing from node 𝑋𝑗 to 𝑋𝑖, then 𝑋𝑗 is called a parent of 𝑋𝑖) of the random variable 𝑋𝑖.

A detailed introduction to the PMBGA can be found in [32]. Some important advances and interesting applications of EDA are covered in [33, 34]. Successes in utilizing EDA to solve scheduling problems have been reported in [35, 36] (for flow shop scheduling).

4. The Proposed PMBGA for Solving TWT-JSSP

4.1. Encoding

The proposed PMBGA relies on dispatching rules to record the scheduling policies. Eight scheduling rules are involved in this study. In the following expressions for the priority index, operation 𝑖 belongs to job 𝑗, and 𝑀𝑗 and 𝑑𝑗 are the corresponding job’s weight and due date. 𝐽𝑆(𝑖) and 𝐽𝑃(𝑖), respectively, denote the set of job successors of operation 𝑖 and the set of job predecessors of operation 𝑖. 𝑍 is used to indicate that the operation with the smallest index will be chosen from the conflict set, while 𝑍′ is used to indicate that the operation with the largest index will be chosen from the conflict set:(1)ATC (apparent tardiness cost): π‘ξ…žπ‘–(𝑑)=(𝑀𝑗/𝑝𝑖)Γ—exp{βˆ’(π‘‘π‘—βˆ’π‘Ÿπ‘–βˆ’π‘π‘–β€‰βˆ’β€‰βˆ‘π‘–ξ…žβˆˆπ½π‘†(𝑖)(ξ‚Šπ‘Šπ‘–ξ…ž+π‘π‘–ξ…ž))+/(𝐾×𝑝)} (in this expression, π‘Ÿπ‘– denotes the earliest starting time of operation 𝑖, and 𝑝 denotes the average processing time of the current operations in the conflict set. 𝐾 is a scaling parameter (or called β€œlook-ahead” parameter). ξ‚Šπ‘Šπ‘– is the estimated lead time of operation 𝑖);(2)SPT (shortest processing time): 𝑍𝑖=𝑝𝑖;(3)LPT (longest processing time): π‘ξ…žπ‘–=𝑝𝑖;(4)WSPT (weighted shortest processing time): π‘ξ…žπ‘–=𝑀𝑗/𝑝𝑖;(5)SRPT (shortest remaining processing time): 𝑍𝑖=𝑝𝑖+βˆ‘π‘–ξ…žβˆˆπ½π‘†(𝑖)π‘π‘–ξ…ž;(6)LRPT (longest remaining processing time): π‘ξ…žπ‘–=𝑝𝑖+βˆ‘π‘–ξ…žβˆˆπ½π‘†(𝑖)π‘π‘–ξ…ž;(7)EDD (earliest due date): 𝑍𝑖=𝑑𝑗;(8)ODD (operation due date): 𝑍𝑖(𝑑)=[𝑑𝑗/βˆ‘π‘–ξ…žβˆˆπ½π‘†(𝑖)βˆͺ{𝑖}π‘π‘–ξ…žβˆ‘]Γ—π‘–ξ…žβˆˆπ½π‘ƒ(𝑖)βˆͺ{𝑖}π‘π‘–ξ…ž.

In PMBGA, each encoding digit (i.e., gene) is expressed by the serial number (1∼8) of the selected dispatching rule. A solution is represented by a rule sequence {π‘…π‘–π‘˜βˆΆπ‘–=1,…,𝑛,π‘˜=1,…,π‘š}, where π‘…π‘–π‘˜ indicates the rule to use when scheduling the 𝑖th operation on machine π‘˜. Therefore, the encoding length for each solution is 𝑙=π‘›Γ—π‘š.

4.2. Decoding

In order to evaluate the fitness of a solution, the Giffler-Thompson algorithm is applied to construct an active schedule based on the specified dispatching rules. The implementation of the Giffler-Thompson algorithm is detailed below.

Input: A sequence of rules {π‘…π‘–π‘˜βˆΆπ‘–=1,…,𝑛,π‘˜=1,…,π‘š}.

Step 1. Let 𝑄(1)=𝑂={1,…,π‘›π‘š} (the set of all operations), 𝑅(1)=𝐹(𝑂)={𝑓1,…,𝑓𝑛} (the set of first operations of each job). Set 𝑑=1 and πœ‹π‘˜=1 (π‘˜=1,…,π‘š).

Step 2. Find the operation π‘–βˆ—=argminπ‘–βˆˆπ‘…(𝑑){π‘Ÿπ‘–+𝑝𝑖}, and let π‘šβˆ— be the index of the machine on which this operation should be processed. Use 𝐡(𝑑) to denote all the operations from 𝑅(𝑑) which should be processed on machine π‘šβˆ—.

Step 3. Delete from 𝐡(𝑑) the operations that satisfy π‘Ÿπ‘–β©Ύπ‘Ÿπ‘–βˆ—+π‘π‘–βˆ—.

Step 4. Use rule π‘…πœ‹π‘šβˆ—π‘šβˆ— to identify an operation Μ‚π‘œ from 𝐡(𝑑) if currently there are more than one candidates. Schedule operation Μ‚π‘œ on machine π‘šβˆ— at the earliest possible time. Let πœ‹π‘šβˆ—β†πœ‹π‘šβˆ—+1.

Step 5. Let 𝑄(𝑑+1)=𝑄(𝑑)⧡{Μ‚π‘œ}, 𝑅(𝑑+1)=𝑅(𝑑)⧡{Μ‚π‘œ}βˆͺ{suc(Μ‚π‘œ)}, where suc(Μ‚π‘œ) denotes the immediate job successor of operation Μ‚π‘œ (if any).

Step 6. If 𝑄(𝑑+1)β‰ βˆ…, set 𝑑←𝑑+1 and go to Step 2. Otherwise, the decoding procedure is terminated.

In the above description, the release time π‘Ÿπ‘– equals the earliest possible starting time of operation 𝑖 (determined from the already scheduled operations). So, (π‘Ÿπ‘–+𝑝𝑖) is the earliest possible completion time of operation 𝑖. 𝑄(𝑑) represents the set of operations yet to be scheduled at iteration 𝑑, while 𝑅(𝑑) represents the set of ready operations (whose job predecessors have all been scheduled) at iteration 𝑑. In Step 4, the operation set 𝐡(𝑑) is also called a conflict set.

4.3. Producing Offspring Individuals

(a) Selection
In each iteration, we first sort all the individuals in the current population according to their fitness. Then, we select the best 1/4 of individuals to form the set 𝑆, which will subsequently be used to build the Bayesian network in order to produce a new generation of individuals.

(b) The Adopted Network Structure
A Bayesian network has to be built to describe the probabilistic distribution of favorable π‘…π‘–π‘˜ settings based on the elite individuals in 𝑆. Each individual can be characterized by a directed acyclic network as shown in Figure 1. In this network, a node π‘π‘Ÿπ‘–,π‘˜ (π‘Ÿβˆˆ{1,2,…,8}) indicates the fact that the dispatching rule used to schedule the 𝑖th operation on machine π‘˜ is selected as rule Number π‘Ÿ. The directed arc from node π‘π‘Ÿπ‘–,π‘˜ to node π‘π‘Ÿξ…žπ‘–,π‘˜+1 (or π‘π‘Ÿξ…žπ‘–+1,1) represents the dependency between the two nodes, so it characterizes the possible influence of the rule selection for the 𝑖th operation on machine π‘˜ on the rule selection for the 𝑖th operation on machine π‘˜+1 (or the (𝑖+1)-th operation on machine 1). Therefore, a directed path from a certain node in the first row to a certain node in the (π‘›Γ—π‘š)-th row can completely describe an individual in the population (because a directed path records an instantiation of all the rule selections).

(c) Calculation of the Probability Values
Since we adopt a fixed network structure in PMBGA, building the Bayesian network is equivalent to determining the values of all the conditional probabilities according to the selected solution set 𝑆. After that, new individuals will be produced by iteratively sampling these probabilistic distributions, expecting to obtain high-quality offsprings.

Given a number of individuals (i.e., the training set 𝑆), an estimate of the conditional probabilities can be obtained simply by counting the frequencies of occurrence.

Example 4.1. Here, we provide a concrete example to illustrate the probability calculation process. For a PMBGA optimization instance with 𝑛=3,π‘š=1 and π‘Ÿ=3, let us further suppose the current 𝑆 contains 40 individuals. In Figure 2, the statistics of these individuals are displayed on a network as previously defined. The weight of each arc (the number placed on the arc) indicates the occurring frequency of this arc (counted for all the individuals in 𝑆). For example, if there exists an individual coded as β€œ[3,1,2]”, then the path β€œπ‘31,1→𝑁12,1→𝑁23,1” in the Bayesian network is used to record this individual, and consequently, the weights (counted frequencies) of the arcs 𝑁31,1→𝑁12,1 and 𝑁12,1→𝑁23,1 will be increased by 1, respectively. Note that, in the final network, the sum of the weights of all the incoming arcs of a certain node should be equal to the sum of the weights of all the outgoing arcs of the same node. This is because each individual corresponds to a complete path from the first row to the last row.

By using frequency as an approximation for probability, the relevant (conditional) probabilities should be calculated as follows:𝑃𝑁11,1=8+2𝑁40,𝑃21,1=7+3+4𝑁40,𝑃31,1=5+5+6,𝑃𝑁4012,1βˆ£π‘11,1=8𝑁8+2,𝑃22,1βˆ£π‘11,1𝑁=0,𝑃32,1βˆ£π‘11,1=2,𝑃𝑁8+212,1βˆ£π‘21,1=7𝑁7+3+4,𝑃22,1βˆ£π‘21,1=3𝑁7+3+4,𝑃32,1βˆ£π‘21,1=4,𝑃𝑁7+3+412,1βˆ£π‘31,1=5𝑁5+5+6,𝑃22,1βˆ£π‘31,1=5𝑁5+5+6,𝑃32,1βˆ£π‘31,1=6,𝑃𝑁5+5+613,1βˆ£π‘12,1=2𝑁2+3+15,𝑃23,1βˆ£π‘12,1=3𝑁2+3+15,𝑃33,1βˆ£π‘12,1=15,𝑃𝑁2+3+1513,1βˆ£π‘22,1=2𝑁2+6,𝑃23,1βˆ£π‘22,1=6𝑁2+6,𝑃33,1βˆ£π‘22,1𝑃𝑁=0,13,1βˆ£π‘32,1𝑁=0,𝑃23,1βˆ£π‘32,1=7𝑁7+5,𝑃33,1βˆ£π‘32,1=5.7+5(4.1)

According to the above calculation method, a connection can never be rediscovered in the PMBGA if the corresponding conditional probability is zero (e.g., from 𝑁22,1 to 𝑁33,1). To overcome this drawback, we can set the minimum count to 1. Taking 𝑁22,1 as an example, the conditional probabilities for the outgoing arcs will then become𝑃𝑁13,1βˆ£π‘22,1=3𝑁3+7+1,𝑃23,1βˆ£π‘22,1=7𝑁3+7+1,𝑃33,1βˆ£π‘22,1=1.3+7+1(4.2) Now it is possible to discover 𝑁33,1 from 𝑁22,1, though the probability is small.

(d) The Sampling Process
The sampling process for generating a new individual begins from the root nodes of the Bayesian network. By selecting an outgoing arc at each node based on the calculated conditional probabilities, the whole network can be gradually instantiated.

4.4. The Embedded Local Search

The search mechanism of GA guarantees a good performance in the β€œexploration” of the solution space. However, it has been reported that GA alone cannot achieve satisfactory solution quality for complex optimization problems. Actually, a local search procedure is usually added within the framework of GA in order to provide reliable β€œexploitation” ability. In this paper, we design such a local optimizer, which attempts to improve the selected solutions. In each iteration of PMBGA, the local search is carried out for the best 𝑒% of solutions in the current population. Thus, 𝑒 is an important parameter for adjusting the frequency of local search and achieving a balance between exploration and exploitation.

In the following, we will describe how to perform the local search on a given solution 𝐬 (a sequence of rules).

Step 1. Use the Giffler-Thompson algorithm to generate an active schedule 𝜎1 based on 𝐬. The objective value of the obtained schedule is TWT𝜎1.

Step 2. Set 𝑒=1.

Step 3. Exert random perturbations on the processing times: generate a new set of processing times {𝑝(𝑒)π‘—π‘˜} from the normal distribution 𝒩(π‘π‘—π‘˜,(0.2π‘π‘—π‘˜)2).

Step 4. Use the Giffler-Thompson algorithm to generate an active schedule 𝜎2 based on 𝐬. Note that, in this process, the processing time of each operation takes its new value, that is, 𝑝(𝑒)π‘—π‘˜.

Step 5. Evaluate the objective value of 𝜎2 under the original processing times, obtaining TWT𝜎2.

Step 6. If TWT𝜎2<TWT𝜎1, exit the local search. Otherwise, continue with Step 7.

Step 7. Let 𝑒←𝑒+1. If 𝑒>π‘ˆ, terminate the local search procedure with no improvement found. Otherwise, go back to Step 3.

The local search procedure attempts to find a better schedule from an initial starting solution. When the processing time of each operation varies due to the random perturbations in Step 3, different schedules may be obtained by applying the same set of dispatching rules. This is the fundamental idea of the local search procedure. The relative deviation is set as 0.2 times the mean, which can produce a moderate level of perturbation. If the variance is too large or too small, the local search will be inefficient. Finally, if it turns out that TWT𝜎2<TWT𝜎1, which means that the local search has found an improvement, then the original schedule should be accordingly revised. On the other hand, if no better schedule is found within π‘ˆ trials, the local search will quit, leaving the current solution unchanged.

Example 4.2. Here, we provide a concrete example to illustrate the local search procedure. In the 3Γ—3 TWT-JSSP instance, the processing time of each operation is marked beside the corresponding node in Figure 3 (the disjunctive graph). The due dates of each job are set as 𝑑1=103, 𝑑2=146, and 𝑑3=137. The weights of each job are set as 𝑀1=1, 𝑀2=7, and 𝑀3=4.

Suppose the initial solution is 𝐬=[2,2,…], which indicates using the SPT rule at all times.

First, the solution is decoded by applying the Giffler-Thompson algorithm under the original values of processing times. The following schedule is obtained:𝜎1=βŽ‘βŽ’βŽ’βŽ’βŽ’βŽ£π‘‚11𝑂21𝑂31𝑂12𝑂32𝑂22𝑂13𝑂23𝑂33⎀βŽ₯βŽ₯βŽ₯βŽ₯⎦.(4.3) The objective value is TWT𝜎1=2086 (see Figure 4).

Next, we add random perturbations to the processing times and thus generate different new instances of the JSSP. In one of these instances, the processing time of each operation is as follows:𝑝11=29.0,𝑝12=85.0,𝑝13𝑝=58.2,21=52.3,𝑝23=81.9,𝑝22𝑝=65.3,32=78.8,𝑝31=84.0,𝑝33=50.3.(4.4) The solution 𝐬 is decoded again by applying the Giffler-Thompson algorithm under the above new values of π‘π‘—π‘˜. The following schedule can be obtained:𝜎2=βŽ‘βŽ’βŽ’βŽ’βŽ’βŽ£π‘‚11𝑂21𝑂31𝑂32𝑂22𝑂12𝑂23𝑂33𝑂13⎀βŽ₯βŽ₯βŽ₯βŽ₯⎦,(4.5) when evaluated under the original processing times, the objective value of 𝜎2 is TWT𝜎2=1264 (see Figure 5).

Therefore, an improvement has been found on the initial solution (because 1264<2086).

5. Computational Results

In order to test the performance of the proposed PMBGA, the same benchmark instances as in [15, 19] are used in our computational experiment. In these instances, the due date of each job is set as 𝑑𝑗=𝑓×𝑃𝑗, where 𝑃𝑗 denotes the total processing time of job 𝑗, and π‘“βˆˆ{1.6,1.5,1.3} is a coefficient that controls the tightness level of the due date setting. The first 20% of jobs are assigned weighting 4 (very important), the next 60% are assigned weighting 2, and the remaining jobs are assigned weighting 1 (not important). That is, 𝑀1=𝑀2=4, 𝑀3=β‹―=𝑀8=2, and 𝑀9=𝑀10=1.

Based on extensive computational tests, the algorithm parameters are set as follows. The population size PS=50; the proportion of individuals selected for local search 𝑒%=30%; the maximum number of random perturbations π‘ˆ=100. In the implementation of the ATC rule, we set 𝐾=2 and ξ‚Šπ‘Šπ‘–=0.4𝑝𝑖.

We compare the performance of the proposed PMBGA with the hybrid optimization algorithm PSO-SA [37] and a rule-based genetic algorithm (RBGA) which uses the same dispatching rules for encoding but adopts classical crossover and mutation operators. In order to make the comparisons meaningful, we set a computational time limit for all the algorithms. In the following, the time limit for solving each instance is determined as 60 seconds. Each algorithm is run for 20 independent times on each TWT-JSSP instance. Tables 1, 2, and 3 report the average objective value obtained from the 20 runs. β€œ# opt” indicates the number of times that the optimum (the optimum for each instance (listed in the second column in the tables) is first given by the branch-and-bound algorithm [14] and recently updated by the hybrid genetic algorithm [21]) has been reached during the 20 runs. The results demonstrate the superiority of the PMBGA.

According to the computational results, the proposed PMBGA systematically outperforms the comparative methods. In addition, the following comments can be made:(1)The proposed PMBGA performs better than the PSO-SA which adopts operation permutation-based encoding scheme. The advantage of PMBGA is even stronger when the due dates in the TWT-JSSP instances are set tighter. This suggests that the rule-based optimization approach is more effective than sequence-based search when many jobs are prone to be tardy and thus are competing fiercely for the limited machine resources. Applying dispatching rules turns out to be a satisfactory and robust strategy in this situation;(2)Also, PMBGA outperforms RBGA to a greater extent. This reveals the effectiveness of the proposed approach from two aspects:(i)Since RBGA does not involve a local search module, the results show that the specialized local search procedure can help to promote the overall performance of GA. In particular, the local optimizer in PMBGA is closely based on the specific characteristics of the considered JSSP instance: the magnitude of the random sampling is consistent with the processing time (cf. Step 3 of the procedure), which ensures a reasonable size of the search scope.(ii)The results show that using the estimation of distribution principle to optimize the rule combinations is more effective than the traditional crossover and mutation operators. Noticeably, the essential point in this process is to model the relationship and interactions between the rule selections for different jobs and different machines.

Meanwhile, we also test the impact of the parameter 𝑒 on the final solution quality of PMBGA. A reasonable selection of 𝑒 will result in an effective balance between exploration and exploitation. In the following experiment, instance ORB1 under 𝑓=1.5 is used and the time limit is set as 40 sec and 60 sec, respectively. The computational results are displayed in Figure 6, where the vertical axis gives the average objective value obtained from 20 independent executions of the proposed PMBGA under each 𝑒.

According to the results, the setting of 𝑒 has a considerable impact on the solution quality, especially when the computational time is scarce (40 sec), which verifies that the proposed local search module is effective in accelerating the overall convergence of PMBGA. A small 𝑒 means that only a few solutions in each generation can be improved by the local search, which has little effect on the entire population. A large 𝑒 suggests that too much time is consumed on local search, which may impair the normal function of PMBGA because of the reduced generations.

The best setting of 𝑒 under each constraint level is 60 (for tight time budget) and 30 (for loose time budget). When the exogenous restriction on computational time is tight, PMBGA has to rely on frequent local search to find good solutions. This is because, in the short term, local search is more efficient than PMBGA’s mechanism (Bayesian network modeling) in improving a solution. However, the price to pay is possibly a premature convergence of the whole optimization process. On the other hand, when the computational time is more sufficient, PMBGA will prefer a larger number of generations to conduct a systematic exploration of the solution space. In this case, the local search need not be used very frequently, otherwise the steady searching and learning process may be disturbed.

Finally, we observe the impact of the parameters in the ATC rule. We write the estimated waiting time ξ‚Šπ‘Šπ‘– of operation 𝑖 as proportional to its processing time: ξ‚Šπ‘Šπ‘–=𝑏×𝑝𝑖. Now, we use instance MT10 with 𝑓=1.3 to test the two parameters, 𝐾 and 𝑏. The average objective values obtained by PMBGA under each (𝐾,𝑏) combination is shown in Figure 7.

In fact, the influence of the ATC parameters is not so significant as the other parameters. But a clearly inferior setting is 𝐾=1, which could be eliminated. Based on additional experimental results which are not listed here, both 𝐾=2 and 𝐾=3 seem acceptable. Overall, setting 𝐾=2 and 𝑏=0.4 yield satisfactory solution quality for most scheduling instances involved in this study.

6. Conclusion

In this paper, we propose a new probabilistic model-building genetic algorithm for solving the job shop scheduling problem under the total weighted tardiness criterion. Since the TWT objective is systematically more difficult to optimize than the conventional makespan objective, we rely on some dispatching rules for schedule construction. PMBGA is used to search for good combinations of these rules, and a specific local search algorithm is embedded into the optimization framework. The computational experiments have shown the superiority of the proposed methodology.

Future research can be conducted from the following aspects. Although the standard Bayesian network is capable of modeling the interactions between decision variables, it is necessary to improve the computational efficiency in the rule optimization process. Meanwhile, it is interesting to try other encoding schemes, which may be beneficial for the discovery of useful structural properties and may enhance the overall performance of the PMBGA.

Acknowledgment

This paper was supported by the National Natural Science Foundation of China (Grant nos. 61104176, and 60874071).