Abstract

We investigate complexity and approximation results on processor networks where the communication delay depends on the distance between the processors performing the tasks. We prove that, unless $\mathcal{P}=\mathcal{NP}$, there is no heuristic with a performance guarantee smaller than 4/3 for makespan minimization with a precedence graph on a large class of processor networks (hypercube, grid, torus, and so forth) with a fixed diameter $\delta$. We extend the complexity results to the case where the precedence graph is bipartite. We also design an efficient polynomial-time $O(\delta^2)$-approximation algorithm for makespan minimization on processor networks with diameter $\delta$.

1. Introduction

1.1. Problem Statement

In this paper, we consider the processor network model, which generalizes the homogeneous scheduling delay model. In the homogeneous model, the allocation of tasks to processors has no influence on the length of the schedule: since the graph of processors (denoted hereafter $G=(V,E)$, where $V=\{\pi_1,\ldots,\pi_m\}$ is a set of $m$ processors and $E$ is the set of links between them) is fully connected, the starting time of a task $i$ depends only on the potential communication delay, given by the precedence graph, between $i$ and its predecessors.

In the processor network model, this assumption is relaxed in order to take into account the fact that the processor graph may not be fully connected. The allocation of tasks to processors thus becomes an essential characteristic of a schedule. We consider a model in which a distance function (defined hereafter), denoted $d(\pi_l,\pi_{l'})$, between two processors $\pi_l$ and $\pi_{l'}$ of the processor graph affects the communication delay between two tasks $i$ and $j$ subject to a precedence constraint, and consequently the starting time of task $j$. The communication time used to compute the starting time of a task is denoted $c_{i,\pi_l,j,\pi_{l'}}$ (the communication delay between task $i$, allotted to processor $\pi_l$, and task $j$, executed on processor $\pi_{l'}$) and is assumed to be $c_{ij}\,d(\pi_l,\pi_{l'})$, where $c_{ij}$ is the communication delay given by the precedence graph.

Formally, the processor network model may be defined as
$$\forall (i,j)\in E,\quad t_j \ge t_i + p_i + c_{ij}\,d(\pi^{i},\pi^{j}), \tag{1.1}$$
where $\pi^{i}$ (resp. $\pi^{j}$) represents the processor on which task $i$ (resp. task $j$) is scheduled, $t_i$ represents the starting time of task $i$, $p_i$ represents the processing time of task $i$, $d(\pi^{i},\pi^{j})$ represents the length of the shortest path in graph $G$ (the graph of processors $G=(V,E)$) between $\pi^{i}$ and $\pi^{j}$, and $c_{ij}$ represents the communication delay if the two tasks are executed on two neighboring processors (this value is given by the precedence graph).
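As an illustration of constraint (1.1), the following minimal sketch computes shortest-path distances in a processor graph by breadth-first search and derives the earliest starting time of a task from its predecessors. The names `distances`, `earliest_start`, and `GRID_2X2` are hypothetical helpers introduced only for this example; the 2×2 grid follows the connection pattern of Example 1.1, and unit processing and communication times are assumed by default.

```python
from collections import deque

# Hypothetical 2x2 grid: pi_0 and pi_3 are each linked to pi_1 and pi_2.
GRID_2X2 = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

def distances(adj, source):
    """Shortest-path lengths (in links) from `source` in a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def earliest_start(adj, preds, proc_j, p=1, c=1):
    """Earliest t_j allowed by (1.1) for a task placed on processor `proc_j`.

    `preds` lists (t_i, proc_i) for every predecessor i of the task;
    p and c are the processing time and communication delay (UET-UCT: 1, 1).
    """
    start = 0
    for t_i, proc_i in preds:
        d = distances(adj, proc_i)[proc_j]  # 0 when both tasks share a processor
        start = max(start, t_i + p + c * d)
    return start

if __name__ == "__main__":
    for target in (0, 1, 3):
        print(target, earliest_start(GRID_2X2, [(0, 0)], target))
```

With a single predecessor started at time 0 on processor 0, the successor can start at time 1 on the same processor, at time 2 on a neighbor, and at time 3 on the opposite corner of the grid.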

We consider the classic UET-UCT scheduling problem (Unit Execution Time-Unit Communication Time, i.e., $\forall i\in V$, $p_i=1$, and $\forall (i,j)\in E$, $c_{ij}=1$) on a bounded number of processors such that the processor network is a structured graph with diameter $\delta$. In these topologies, processors are numbered $\pi_1,\pi_2,\ldots,\pi_m$, and processor $\pi_k$ may communicate with processor $\pi_l$ at a communication cost equal to $d(\pi_k,\pi_l)$, the length of the shortest path in graph $G$ between $\pi_k$ and $\pi_l$. The communication delay is therefore given by the distance function introduced above.

In scheduling theory, a problem type is categorized by its machine environment, job characteristics, and objective function. Using the three-field notation scheme $\alpha|\beta|\gamma$ proposed by Graham et al. [1] (where $\alpha$ designates the processor environment, $\beta$ the job characteristics, and $\gamma$ the criterion), we consider the problem of makespan minimization (the makespan is denoted $C_{\max}$ in what follows) with unit tasks and unit communication delays (UET-UCT) in the presence of a precedence graph, on a processor network with graph $G$ in which the communication delay depends on the shortest path in $G$. This problem is denoted by $(P,G)|\mathrm{prec};c_{ij}=d(\pi_l,\pi_k);p_i=1|C_{\max}$.

Example 1.1. Figure 1 shows the difference between the two problems $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$ and $(P,\mathrm{grid}_{2\times 2})|\mathrm{prec};c_{ij}=d(\pi_l,\pi_k);p_i=1|C_{\max}$. (The processors are connected as follows: $\pi_0$ and $\pi_3$ are each connected to $\pi_1$ and $\pi_2$.) The processing times of the tasks and the communication delays between the tasks are unitary (UET-UCT problem). Gantt diagram $G_1$ represents an optimal solution for the first problem. Note that task $z$ can be executed on any processor at $t=2$. Gantt diagram $G_2$ represents an optimal solution for the second problem. In order to obtain an optimal solution, task $a$ must be delayed by one unit of time and must be processed at $t=1$ on the same processor $\pi_2$ as task $c$. Thus, task $e$ may be executed at $t=2$ only on processor $\pi_2$.

1.2. Organization of the Paper

This paper is organized as follows. The next section is devoted to related work. In Section 3, after defining the graph class $\mathcal{G}$, we propose a general nonapproximability result for an unspecified precedence graph. We also extend this result to the case where the precedence graph is bipartite and to the case where duplication is allowed. In the last section, we design a polynomial-time approximation algorithm with a performance ratio in $O(\delta^2)$.

2.1. Complexity Results

To the best of our knowledge, the first complexity result was given by Picouleau [2]. The problem considered was to schedule unit execution time tasks with a precedence graph on an unbounded number of processors arranged in a chain or star topology (a star is a tree of depth one). Picouleau proved that this problem is $\mathcal{NP}$-complete if the precedence graph is a tree or an outtree. More recently, the authors of [3] proved that there is no heuristic with a performance guarantee smaller than 6/5 for minimizing the makespan on a processor network represented by a star. This model is close to the master-slave architecture. In [4], the authors proved that there is no hope of finding a polynomial-time approximation algorithm with a ratio $\rho<4/3$ for the problem of scheduling a set of tasks on a processor network that is a ring or a chain (see Table 1).

2.1.1. Approximation Results

In ring topology, Lahlou developed, in [5], using the list scheduling proposed by Rayward-Smith [6], a 𝜌-approximation algorithm with 𝑚𝜌1+(3/8)𝑚1/2𝑚 where 𝑚 is the number of processors.

Moreover, Hwang et al. [7] studied approximation list algorithms for scheduling problems where the communication times depend on contention, on a distance function for the tasks involved, and on the processors that execute the tasks. The authors examined a simple strategy called extended list scheduling (ELS), which is a straightforward extension of list scheduling. They proved that the ELS strategy is unsatisfactory and proposed an improved strategy called earliest task first.

Recently, in [3], the authors proposed a sophisticated three-step polynomial-time approximation algorithm with a ratio equal to four for the makespan minimization problem on a processor network in the form of a star. In [4], the authors develop two polynomial-time approximation algorithms for processor networks with limited or unlimited resources.

2.2. Our Contributions

In this paper, we answer the following question: is there a large class of graphs for which there exists a polynomial-time reduction from $n$-PARTITION showing $\mathcal{NP}$-completeness? If so, it suffices to show that a graph $G$ belongs to this class in order to prove the nonexistence of a $\mathcal{PTAS}$. To complete the study of processor networks, we design a polynomial-time approximation algorithm with a ratio of at most $((\delta+1)^2/3)+1$, where $\delta$ designates the diameter of the graph $G$.

3. Computational Complexity for a Large Class of Graphs

3.1. The Class Graph 𝒢

We propose a large class of graphs $\mathcal{G}$ for which the problem of deciding whether an instance of $(P,G)|\mathrm{prec};c_{ij}=d(\pi_l,\pi_k);p_i=1|C_{\max}$ has a schedule with $C_{\max}\le 3$ is $\mathcal{NP}$-complete.

We now present a graph class to which the same polynomial-time transformation from the 3-PARTITION problem applies, showing that our scheduling problem is $\mathcal{NP}$-complete whenever the processor network belongs to this class. Hereafter, we give the definition of the prism graph.

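Definition 3.1. A prism $P=(V_P,E_P)$ of size $k$ and length $L$ ($k,L\in\mathbb{N}$) is a connected undirected graph for which (i) there are two sets of vertices $K$ and $K'$ such that $K\subset V_P$, $K'\subset V_P\setminus K$, and $|K|=|K'|=k$. The vertices are denoted $s_1,\ldots,s_k$ (resp. $s'_1,\ldots,s'_k$); (ii) there exists an order on the vertices of $K$ and $K'$ such that ($\forall s_i\in K$, $\forall s'_i\in K'$, $1\le i\le k$) there is a path of length $L$, denoted $C_i$, between $s_i$ and $s'_i$; (iii) $(\forall i\neq j)$ $\forall x\in C_i\setminus\{s_i,s'_i\}$, $\forall y\in C_j\setminus\{s_j,s'_j\}$, $(x,y)\notin E_P$.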

Moreover, the size of a prism is polynomial in 𝑘. An illustration is given in Figure 2.

Definition 3.2. Let $\mathcal{G}$ be a collection of graphs. $\mathcal{G}$ possesses the prism property if and only if $\forall n_0,n_1\in\mathbb{N}$, $\exists G\in\mathcal{G}$ such that $G=(V,E)$ contains a unique subgraph $G_1=(V_1,E_1)$, induced by vertices $V_1\subseteq V$, that is a prism of size $k=n_0$ and length $L=n_1$.

Lemma 3.3. The class graph 𝒢 is not empty.

Proof. We will see in Section 3.2 that classic structured graphs such as the torus, the grid, the complete binary tree, and so forth, belong to this graph class.

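Theorem 3.4. The problem of deciding whether an instance of $(P,G)|\beta;c_{ij}=d(\pi_l,\pi_k);p_i=1|C_{\max}$ has a schedule of length at most two is solvable in polynomial time, with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$ and $G\in\mathcal{G}$.

Proof. In a schedule of length at most two, no communication is possible between any pair of tasks: a task and its successor must be executed consecutively on the same processor.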

The remainder of this section is devoted to proving Theorem 3.5.

Theorem 3.5. The problem of deciding whether an instance of $(P,G)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l);p_i=1|C_{\max}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $G\in\mathcal{G}$.

Proof. The proof is established by a reduction from the 3-PARTITION problem [8].
Instance
A finite set $\mathcal{A}$ of $3M$ elements $\{a_1,\ldots,a_{3M}\}$, a bound $B\in\mathbb{Z}^{+}$, and a size $s(a)$ for each $a\in\mathcal{A}$ such that each $s(a)$ satisfies $B/4<s(a)<B/2$ and such that $\sum_{a\in\mathcal{A}}s(a)=MB$.
Question 1. Can $\mathcal{A}$ be partitioned into $M$ disjoint sets $\mathcal{A}_1,\ldots,\mathcal{A}_M$ such that, for all $i\in\{1,\ldots,M\}$, $\sum_{a\in\mathcal{A}_i}s(a)=B=\sum_{a\in\mathcal{A}}s(a)/M$?
3-PARTITION is known to be $\mathcal{NP}$-complete in the strong sense [8]. (Even if $B$ is polynomially bounded by the instance size, the problem is still $\mathcal{NP}$-complete.)
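To make the source problem of the reduction concrete, here is a small exhaustive-search sketch for 3-PARTITION. It is only an illustration with exponential running time, not part of the reduction; the function name `three_partition` is an assumption of this example.

```python
from itertools import combinations

def three_partition(sizes, B):
    """Return a list of index triples, each summing to B, or None.

    Exhaustive backtracking: the first unused element is always placed,
    and the two elements completing its triple are tried in turn.
    """
    def solve(remaining):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        for a, b in combinations(rest, 2):
            if sizes[first] + sizes[a] + sizes[b] == B:
                left = [x for x in rest if x not in (a, b)]
                tail = solve(left)
                if tail is not None:
                    return [(first, a, b)] + tail
        return None

    if len(sizes) % 3 != 0:
        return None
    return solve(list(range(len(sizes))))

if __name__ == "__main__":
    # M = 2, B = 15, and every size lies strictly between B/4 and B/2.
    print(three_partition([4, 5, 6, 4, 5, 6], 15))
    print(three_partition([4, 4, 4, 6, 6, 6], 15))
```

The first instance admits the triples of indices (0, 1, 2) and (3, 4, 5); the second has the right total but no triple summing to 15.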
It is easy to see that $(P,G)|\mathrm{prec},c_{ij}=d(\pi_l,\pi_k)=1,p_i=1|C_{\max}\le 3$ belongs to $\mathcal{NP}$.
Given an instance of the 3-PARTITION problem, we construct an instance of the scheduling problem $(P,G)|\mathrm{prec};c_{ij}=d(\pi_l,\pi_k);p_i=1|C_{\max}\le 3$ with $G\in\mathcal{G}$ in the following way.
The precedence graph $\mathcal{W}+\mathcal{Z}$, which will be scheduled on the processor network $G$, is decomposed into two disjoint graphs, denoted in what follows by $\mathcal{W}$ and $\mathcal{Z}$ (the graph $\mathcal{Z}$ is a collection of graphs $Z_{s(a_j)}$, i.e., $\mathcal{Z}=\bigcup_{a_j\in\mathcal{A}}Z_{s(a_j)}$). The graphs $\mathcal{Z}$ and $\mathcal{W}$ are characterized below.

Graph 𝑍𝑖
Let $i$ be an integer such that $i>1$. Graph $Z_i$ consists of $4\times i$ vertices denoted by $Z_i[k,0]$, $Z_i[k,1]$, where $0\le k<2i$. The precedence constraints between these tasks are defined as follows: (i) arcs $Z_i[j,0]\to Z_i[j,1]$ for any $j$, $0\le j\le 2i-1$, (ii) arcs $Z_i[2j,0]\to Z_i[2j+1,1]$ for any $j$, $0\le j\le i-1$, (iii) arcs $Z_i[2j,0]\to Z_i[2j-1,1]$ for any $j$, $1\le j\le i-1$.

Remark 3.6. A valid schedule of length three for the precedence graph $Z_i$ on a path of $2i$ processors is as follows, for any $j$, $0\le j\le 2i-1$: (i) tasks $Z_i[j,0]$ and $Z_i[j,1]$ are executed on $\pi_j$, (ii) tasks $Z_i[j,\ell]$ are executed at time $\ell$, for any $\ell\in\{0,1\}$, if $j$ is even, (iii) tasks $Z_i[j,\ell]$ are otherwise executed at time $\ell+1$, for any $\ell\in\{0,1\}$.
See Figure 3 for graph 𝑍2 and Figure 4 for the valid scheduling described in Remark 3.6.
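The construction of $Z_i$ and the schedule of Remark 3.6 can be checked mechanically. The sketch below is a minimal illustration under the assumptions of this section (unit tasks, unit delays, processors arranged as a path so that the distance between $\pi_a$ and $\pi_b$ is $|a-b|$); the function names are hypothetical.

```python
def build_Z(i):
    """Arcs of Z_i; task Z_i[k, l] is encoded as the pair (k, l)."""
    arcs = [((j, 0), (j, 1)) for j in range(2 * i)]               # (i)
    arcs += [((2 * j, 0), (2 * j + 1, 1)) for j in range(i)]      # (ii)
    arcs += [((2 * j, 0), (2 * j - 1, 1)) for j in range(1, i)]   # (iii)
    return arcs

def remark_schedule(i):
    """Task (j, l) -> (start time, processor index), as in Remark 3.6."""
    return {(j, l): (l + (j % 2), j) for j in range(2 * i) for l in (0, 1)}

def is_valid(arcs, schedule):
    """Check t_v >= t_u + 1 + |proc_u - proc_v| on every arc, and that
    no processor runs two unit tasks at the same time."""
    slots = set(schedule.values())
    if len(slots) != len(schedule):
        return False
    for u, v in arcs:
        t_u, p_u = schedule[u]
        t_v, p_v = schedule[v]
        if t_v < t_u + 1 + abs(p_u - p_v):
            return False
    return True

def makespan(schedule):
    return max(t for t, _ in schedule.values()) + 1

if __name__ == "__main__":
    for i in (2, 3, 5):
        s = remark_schedule(i)
        print(i, len(s), is_valid(build_Z(i), s), makespan(s))
```

For each tested value of $i$ the schedule uses $2i$ processors, places all $4i$ tasks, and finishes at time three.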

Graph 𝒲
Remark 3.7. A path of length 𝑙 admits 𝑙+1 vertices.
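The graph $\mathcal{W}=(\mathcal{V}'\cup\mathcal{V}'';E_{\mathcal{W}})$ is defined as follows. Let $G=(V,E)$ be a graph such that $G\in\mathcal{G}$, with $V=\{v_1,\ldots,v_{n'}\}$. By Definition 3.2, there exists a unique subgraph $G'=(V'\subset V,E'\subset E)$ that is a prism of size $k$ and length $L$ with the desired properties. In the following we set $k=n$ (where $n=M$ is the number of subsets sought in the 3-PARTITION instance) and $L=2B+1$, and the size of $G'=(V',E')$ is polynomial in $k$. Note that $n'\ge 2B$.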
The graph $\mathcal{W}$ is obtained from the graph $G$ by polynomial-time transformations. The graph given in Figure 5 is used to illustrate the following construction. (i) The three-task paths are created and precedence constraints are added (see Figure 6). The two sets of tasks $\mathcal{V}_1$ and $\mathcal{V}''$ are created. (ii) The tasks are partitioned into three subsets $\mathcal{V}'$, $\mathcal{K}$, and $\mathcal{V}''$ (see Figure 7). (iii) The $\mathcal{V}_1$-tasks are thereby partitioned into two subsets $\mathcal{K}$ and $\mathcal{V}'$. We take the subgraph induced by the $\mathcal{V}'\cup\mathcal{V}''$-tasks (see Figure 8) as the graph $\mathcal{W}$.
The purpose of removing the $\mathcal{K}$-tasks is to free the slots in which the tasks of the $\mathcal{Z}$-graph can be executed once the tasks of the $\mathcal{W}$-graph, deprived of the removed tasks, are scheduled on the graph of processors.
The set of vertices $V$ is partitioned into two sets, $V=V'\cup V''$: (i) $V'=\{v_1,\ldots,v_{2n(B+1)}\}$, the vertices of $G'$, which are the vertices of the $n$ unique paths of length $2B+1$ satisfying the characteristics given by Definition 3.1; (ii) $V''=\{v_{2n(B+1)+1},\ldots,v_{n'}\}$, the set of the other vertices. Note that these vertices do not belong to the graph $G'$.
The definition of the graph $\mathcal{W}$ is given below. (i) $\forall i\in\{1,\ldots,2n(B+1)\}$, we create a three-task path $v_i[0]$, $v_i[1]$, $v_i[2]$, with arcs $v_i[0]\to v_i[1]\to v_i[2]$. This set of tasks is denoted $\mathcal{V}_1=\{v_i[j]\mid i\in\{1,\ldots,2n(B+1)\},\ j\in\{0,1,2\}\}$. The cardinality of $\mathcal{V}_1$ is $6n(B+1)$ (see Figure 6). (ii) $\forall i\in\{2n(B+1)+1,\ldots,n'\}$, we create a three-task path $v_i[0]\to v_i[1]\to v_i[2]$. This set of tasks is denoted $\mathcal{V}''$. The number of these tasks is $3(n'-2n(B+1))$ with $n'=|V|$. (iii) $\forall (k,l)\in E$, we add the arcs $v_k[0]\to v_l[2]$ and $v_l[0]\to v_k[2]$ (see Figure 6).
Now, $4nB$ tasks are removed from the $\mathcal{W}$-graph. (To keep the polynomial-time transformation clear, we prefer to create tasks and then remove some of them rather than enumerate all precedence constraints.) We consider the following index sets: (i) $J_0=\{2i(B+1)\mid i\in\{1,2,\ldots,n\}\}$, (ii) $J_1=\{2i(B+1)+1\mid i\in\{0,1,2,\ldots,n-1\}\}$, (iii) $I_0=\{k\in\{1,\ldots,2n(B+1)\}\setminus(J_0\cup J_1)\mid k\text{ is even}\}$, (iv) $I_1=\{k\in\{1,\ldots,2n(B+1)\}\setminus(J_0\cup J_1)\mid k\text{ is odd}\}$.
We remove from the set $\mathcal{V}_1$ the tasks $v_k[0]$, $v_k[1]$ with $k\in I_0$ (resp. $v_k[1]$, $v_k[2]$ with $k\in I_1$). $\mathcal{K}$ denotes the set of removed tasks (see Figure 7). Finally, we put $\mathcal{V}'=\mathcal{V}_1\setminus\mathcal{K}$, with $|\mathcal{V}'|=2nB+6n$ (see Figure 8).
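The index sets and the cardinalities stated above can be verified numerically. The following sketch assumes the reading in which the sets $J_0$, $J_1$, $I_0$, $I_1$ are taken over the indices $1,\ldots,2n(B+1)$ and two tasks are removed per index of $I_0\cup I_1$; the function names are hypothetical.

```python
def index_sets(n, B):
    """Return (J0, J1, I0, I1) over the indices 1 .. 2n(B+1)."""
    top = 2 * n * (B + 1)
    J0 = {2 * i * (B + 1) for i in range(1, n + 1)}
    J1 = {2 * i * (B + 1) + 1 for i in range(0, n)}
    rest = set(range(1, top + 1)) - J0 - J1
    I0 = {k for k in rest if k % 2 == 0}
    I1 = {k for k in rest if k % 2 == 1}
    return J0, J1, I0, I1

def removed_tasks(n, B):
    """The set K: v_k[0], v_k[1] for k in I0 and v_k[1], v_k[2] for k in I1.

    Task v_k[j] is encoded as the pair (k, j).
    """
    _, _, I0, I1 = index_sets(n, B)
    K = {(k, j) for k in I0 for j in (0, 1)}
    K |= {(k, j) for k in I1 for j in (1, 2)}
    return K

if __name__ == "__main__":
    n, B = 2, 3
    K = removed_tasks(n, B)
    V1_size = 3 * 2 * n * (B + 1)
    print(len(K), V1_size - len(K))
```

For $n=2$ and $B=3$ this gives 24 removed tasks, matching $4nB$, and 24 remaining tasks, matching $2nB+6n$.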
Figures 5, 6, 7, and 8 describe the construction of 𝒲-graph from 𝐺𝒢.
𝐸𝒲 is the set of arcs as described above.
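Lastly, the number of processors is $m=n'$, and they are numbered $\pi_i$ with $i\in[1,n']$.
In summary, the precedence graph $\mathcal{W}+\mathcal{Z}$ is composed of $\mathcal{W}=(\mathcal{V}'\cup\mathcal{V}'',E_{\mathcal{W}})$, with $3n'-4nB$ tasks and the precedence constraints given above, and of the graph $\mathcal{Z}=\bigcup_{a_j\in\mathcal{A}}Z_{s(a_j)}$, with $4nB$ tasks.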
The transformation is computed in polynomial time.
(i) Let us assume that $\mathcal{A}=\{a_1,\ldots,a_{3M}\}$ can be partitioned into $M$ disjoint subsets $\mathcal{A}_1,\ldots,\mathcal{A}_M$, each summing up to $B$. We will then prove that there is a schedule of length at most three.
Let us construct this schedule.
First, each task $v_i[j]\in\mathcal{V}'\cup\mathcal{V}''$ is executed on processor $\pi_i$ at time $t=j$, with $j\in\{0,1,2\}$ (if this task exists).
Consider the processors on which the $\mathcal{V}'$-tasks are scheduled. By the previous allocation, these processors are numbered $\pi_1,\ldots,\pi_{2n(B+1)}$.
Let $\{\mathcal{A}_1,\ldots,\mathcal{A}_n\}$ be a partition of $\mathcal{A}$. Consider $\mathcal{A}_i=\{a_{i_1},a_{i_2},a_{i_3}\}$ with a fixed $i$. The tasks of $Z_{s(a_j)}$, $a_j\in\mathcal{A}_i$, are executed between processors $\pi_{1+2(i-1)(B+1)}$ and $\pi_{2i(B+1)}$. Moreover, the tasks $Z_{s(a_j)}[l,k]$, $k\in\{0,1\}$, $l\in J_0$ (resp., $k\in\{1,2\}$, $l\in J_1$) are scheduled on $2s(a_{i_j})$ processors in succession in order to respect a schedule of length three.
Thus, without loss of generality, we suppose that the tasks of $Z_{s(a_{i_1})}$ are scheduled between processors $\pi_{1+2(i-1)(B+1)}$ and $\pi_{2(i-1)(B+1)+2s(a_{i_1})}$. In a similar way, the tasks of $Z_{s(a_{i_2})}$ (resp., $Z_{s(a_{i_3})}$) are executed between processors $\pi_{2+2(i-1)(B+1)+2s(a_{i_1})}$ and $\pi_{1+2(i-1)(B+1)+2s(a_{i_1})+2s(a_{i_2})}$ (resp., $\pi_{2+2(i-1)(B+1)+2s(a_{i_1})+2s(a_{i_2})}$ and $\pi_{2i(B+1)}$).
(ii) Let us assume now that there is a schedule $S$ of length at most three. We will prove that $\mathcal{A}=\{a_1,\ldots,a_{3M}\}$ can be partitioned into $M$ disjoint subsets $\mathcal{A}_1,\ldots,\mathcal{A}_M$, each summing up to $B$.

Lemma 3.8. In any valid schedule of length three there is no idle time.

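Proof. The number of processors is $m=n'$ and the number of tasks is $3n'$ ($4nB$ for the $\mathcal{Z}$-graph and $3n'-4nB$ for the $\mathcal{W}$-graph). A schedule of length three on $n'$ processors offers exactly $3n'$ unit slots, so every slot must be used.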

Lemma 3.9. In any valid schedule of length three, the subgraph induced by the $\mathcal{V}'$ tasks must be executed on $2(B+1)$ processors in succession.

Proof. Consider the subgraph induced by the $\mathcal{V}'$ tasks. This precedence graph admits paths of length two, and each of these paths must be executed on a single processor (no communication delay is allowed).
Consider the tasks on paths of length one. Let $v_i[0]\in\mathcal{V}'$ be a task without predecessor. By construction, $v_i[0]$ admits one successor, denoted by $v_{i+1}[2]\in\mathcal{V}'$.
Suppose that these two tasks are allotted to the same processor $\pi_l$. Since $v_{i+1}[2]$ admits another predecessor, denoted by $v_{i+2}[0]\in\mathcal{V}'$, the task $v_{i+1}[2]$ is allotted at $t=2$.
The task $v_{i+2}[0]$ cannot be executed at $t=1$ on $\pi_l$, since this task admits a successor other than $v_{i+1}[2]$. Therefore, there exists an idle slot at $t=1$ on processor $\pi_l$. By construction there is no independent task, and since the $\mathcal{Z}$ graph admits only paths of length one, no task can be allotted to this idle slot. This is impossible.
In conclusion, the subgraph induced by the $\mathcal{V}'$ tasks must be executed on $2(B+1)$ processors in succession.

Lemma 3.10. In any valid schedule of length three, two subgraphs induced by the $\mathcal{V}'$ tasks from two disjoint paths of length $2(B+1)$ cannot be allotted to the same processors.

Proof. Consider the $\mathcal{V}'$ tasks that belong to two disjoint paths of length $2(B+1)$. A task without predecessor on one path cannot be allotted to the same processor as a task without successor on the other path, since there is no isolated task available to fill the remaining slot.

Lemma 3.11. In any valid schedule of length three, the $Z_{s(a_j)}$ tasks must be executed on the same processors as the $\mathcal{V}'$ tasks.

Proof. Let $\Pi=\{\pi_l\mid \mathcal{V}'\text{ tasks are allotted to }\pi_l\}$ be the set of processors on which the $\mathcal{V}'$ tasks are executed.
Suppose that the $Z_{s(a_j)}$-tasks are executed on processors $\pi_k\notin\Pi$. By Lemma 3.8, there is no idle slot, so the tasks on the three-task paths are necessarily allotted to a processor $\pi_l\in\Pi$. This is impossible by Lemma 3.9.

From the previous lemmas, we know that $6n(B+1)$ tasks (the $\mathcal{V}'$ tasks and the $Z_{s(a_j)}$-tasks) are executed on the $n$ disjoint paths of length $2B+1$. By Definition 3.2, we know that the graph $G$ admits a unique set of $n$ disjoint paths of length $2B+1$ with the desired properties. Moreover, because of the precedence constraints, these tasks are allotted to a processor path of length $2B+1$. Without loss of generality, we suppose that a task $v_l\in\mathcal{V}''$ is executed on processor $\pi_l$ with $l\in\{2n(B+1)+1,\ldots,n'\}$.

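We now build the partition $\{\mathcal{A}_1,\ldots,\mathcal{A}_n\}$ with the desired property from a schedule $S$ of length three. We know that two tasks of the same subgraph $Z_{s(a_j)}$ (see Lemma 3.11) cannot be executed on two different paths, since the distance between two such processors is at least two.

We define $\mathcal{A}_i$ such that $a_j\in\mathcal{A}_i$ if and only if the tasks of the graph $Z_{s(a_j)}$ are executed between the processors numbered $\pi_{1+(i-1)2(B+1)}$ and $\pi_{2i(B+1)}$, for a fixed $i$.

Now, we compute $\sum_{a_j\in\mathcal{A}_i}s(a_j)$.

Using the previous remarks, without loss of generality, we suppose that the tasks $v_l[k]$ with $l\in\{1,\ldots,2n(B+1)\}$ and $k\in\{0,1,2\}$ (when they exist) are executed on $\pi_l$. Consider the $Z_{s(a_j)}$-tasks scheduled between processors $\pi_{1+(i-1)2(B+1)}$ and $\pi_{2i(B+1)}$ for a fixed $i$, excluding the processors $\pi_l$ to which complete three-task paths of $\mathcal{V}'$ are allotted.

Using Lemma 3.9, we know that the number of $\mathcal{V}'$ tasks executed on processors $\pi_{1+(i-1)2(B+1)}$ to $\pi_{2i(B+1)}$ for a fixed $i$ is $6+2B$. These $2(B+1)$ processors offer $6(B+1)$ slots in a schedule of length three and, by Lemma 3.8, none of them is idle; the remaining $4B$ slots are therefore occupied by $\mathcal{Z}$-tasks. Since $Z_{s(a_j)}$ consists of $4s(a_j)$ tasks, we obtain $\sum_{a_j\in\mathcal{A}_i}s(a_j)=B$.

In conclusion, $\{\mathcal{A}_1,\ldots,\mathcal{A}_n\}$ forms a partition of $\mathcal{A}$ with the desired properties.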

The construction suggested previously can be easily adapted to obtain a bipartite graph of depth one. Moreover, from the proof of Theorem 3.5, we can derive the following theorem.

Theorem 3.12. The problem of deciding whether an instance of $(P,G)|\beta,c_{ij}=d(\pi_l,\pi_k)=1,p_i=1|C_{\max}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$.

Proof. The proof is similar as the proof of Theorem 3.5 by considering the graph 𝐺 instead of widget 𝐺. Nevertheless each path of length two induced by the 𝒱 tasks is transformed into two paths of length one.
We use the same construction as in the proof of Theorem 3.5. Nevertheless, every three-task path is transformed into two paths in the following way: $v_i[0]\to v_i[1]$ and $v_i[0]\to v_i[2]$. These three tasks must be executed on the same processor. Indeed, if $v_i[2]$ admits several predecessors, this is obvious. Otherwise, suppose that $v_i[0]$ is allotted to a processor $\pi$. Then $v_i[1]$ must be executed at $t=1$ on $\pi$, and the task $v_i[2]$ is scheduled at $t=2$ on a neighboring processor. Therefore no task from the graphs $\mathcal{Z}$ and $G$ can be executed on processor $\pi$ at $t=2$. Using the same arguments as before, there is a schedule of length three if and only if the set $\mathcal{A}=\{a_1,\ldots,a_{3n}\}$ can be partitioned into $n$ disjoint subsets $\mathcal{A}_1,\ldots,\mathcal{A}_n$, each summing up to $B$.

The proof of Theorem 3.5 therefore implies that the problem where the tasks can be duplicated is also 𝒩𝒫-complete.

Corollary 3.13. The problem of deciding whether an instance of $(P,G)|\beta;c_{ij}=d(\pi_l,\pi_k);p_i=1,\mathrm{dup}|C_{\max}$ with $G\in\mathcal{G}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$.

Proof. The proof comes directly from Theorems 3.5 and 3.12. In fact, Lemma 3.8 implies that no task can be duplicated (the number of the tasks is equal to the number of processors times 3).

Moreover, nonapproximability results can be deduced.

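Corollary 3.14. Unless $\mathcal{P}=\mathcal{NP}$, no polynomial-time algorithm exists with a performance bound less than 4/3 for the problems $(P,G)|\beta;c_{ij}=d(\pi_l,\pi_k);p_i=1|C_{\max}$ and $(P,G)|\beta;c_{ij}=d(\pi_l,\pi_k);p_i=1,\mathrm{dup}|C_{\max}$, with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$ and $G\in\mathcal{G}$.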

Proof. The proof of Corollary 3.14 is an immediate consequence of the impossibility theorem; see [9, page 4].

3.2. Discussion

In the previous section, we proposed a graph class $\mathcal{G}$ for which the problem of deciding whether an instance of $(P,G)|\beta;c_{ij}=d(\pi_l,\pi_k);p_i=1|C_{\max}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$ and $G\in\mathcal{G}$.

Hereafter, we exhibit the parameters $(L,k)$ for some classic structured graphs in order to prove that the graph class $\mathcal{G}$ is not empty. (i) For a grid $G=\mathrm{Grid}(m,p)$ (or torus) topology, where the couple $(i,j)$ designates the $j$th position in the $i$th line, $1\le i\le m$, $1\le j\le p$, we need $k=2n+1$ lines and $L=2B+2$ columns. The sets of vertices for the graph $G'$, a subgraph of $G$ with the desired properties given by Definition 3.2, are $V'=\{(i,j),\ 2\le i\le 2n,\ i\text{ even},\ 2\le j\le 2B+3\}$ and $V''=\{(i,1),\ 1\le i\le 2n+1\}\cup\{(i,j),\ 1\le i\le 2n+1,\ i\text{ odd};\ 1\le j\le 2B+3\}$. (ii) For the complete binary tree, it is sufficient to consider a tree with a height of $\log(n)+2B+1$. (iii) For the hypercube $H(d)$ topology (or cube-connected cycles), it is sufficient to have $d=2\log(n)+B+2$. (iv) ….

4. An Approximation Algorithm for Processor Networks with a Fixed Diameter

4.1. Description and Correctness of an Algorithm

In order to design an efficient polynomial-time approximation algorithm, the classic strategy consists of taking an instance of the combinatorial optimization problem and applying some transformations and/or using polynomial-time algorithms as subroutines (shortest path, spanning tree, maximum matching, etc.). Afterwards, it is sufficient to evaluate the best lower bound for any optimal solution, and this lower bound may be compared to the feasible solution for the combinatorial optimization problem in order to determine the ratio of an approximation algorithm.

Here, instead of considering an instance $I$ and trying to directly develop a feasible solution for the $(P,G)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$ problem, we consider a partial instance of $I$, denoted $I'$. (An instance $I$ consists of a precedence graph with unit execution times and unit communication times, and of $m$ processors forming the graph $G$ with its distance function; the partial instance $I'$ consists only of the precedence graph with unit tasks and unit communication times.) For any instance $I'$, we use the classic approximation algorithm proposed by Munier and König [10] for the $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$ problem. We obtain a feasible schedule for this problem, denoted $S$ (the processor graph is ignored for the moment). Nevertheless, this solution is not feasible for our scheduling problem.

We proceed with a polynomial-time chain of transformations, from schedule $S$ to a schedule $S''$, in order to get a feasible schedule. Only in the last step, that is, only for schedule $S''$, do we guarantee a feasible schedule for the problem $(P,G)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$.

This chain is defined as follows: $I'\xrightarrow{f}S\xrightarrow{g}S'\xrightarrow{h}S''$ (the schedule $S''$ is a feasible solution for the $(P,G)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$ problem), where $f$ is the Munier-König algorithm [10], $g$ the dilatation algorithm (see [11] for details or Appendix A), and $h$ the folding algorithm (see [12] for details or Appendix B).

Subsequently, we will consider the three following scheduling problems: (i) $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$, (ii) $P|\mathrm{prec};c_{ij}\ge 2;p_i=1|C_{\max}$, (iii) and finally $(P,G)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$.

The principal steps of the algorithm are described below.

The approximation algorithm uses three steps. In each step we apply an algorithm for a specified scheduling problem [10–12]. The first two steps each produce a schedule that is not feasible for our problem. (i) In the first step, a schedule (denoted $S$, on an unbounded number of processors) is produced for the scheduling problem $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$. For this problem, Munier and König [10] presented a (4/3)-approximation algorithm that is based on an integer linear programming formulation. They use the following procedure: an integrity constraint is relaxed, and a feasible schedule is produced by rounding. (ii) The second step produces a schedule (denoted $S'$, also on an unbounded number of processors) from $S$ by applying the dilatation principle proposed in [11] for the problem $P|\mathrm{prec};c_{ij}\ge 2;p_i=1|C_{\max}$ (this algorithm produces a feasible schedule for the large communication delay problem from a schedule with unit communication delays). We therefore have $S'=g(S)$, where $g$ is the dilatation algorithm. (iii) The third step produces a schedule $S''$ (feasible for the $(P,G)|\mathrm{prec},c_{ij}=d(\pi_k,\pi_l)=1,p_i=1|C_{\max}$ problem) on the $G$ topology from $S'$ using the folding principle [12]. The folding procedure constructs a feasible schedule on a restricted number of processors from a feasible schedule on an unbounded number of processors. Thus, $S''=h(S')$, with $h$ being the folding algorithm.

Note that the length of schedule $S$ is less than that of $S'$, which is less than that of $S''$. The three steps are summarized in Figure 9. The notation is described in the proof of Theorem 4.2.

Theorem 4.1. The previous algorithm leads to a feasible schedule for the problem $(P,G)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$.

Proof. Proof is clear from the previous discussion concerning the description of an algorithm. Indeed, the communication delay is preserved and the precedence constraint is respected. Moreover, at most 𝑚 tasks are executed at any time.

4.2. Relative Performance Analysis

Theorem 4.2. The problem $(P,G)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$ can be approximated within a factor of $((\delta+1)^2/3)+1$ using the previous algorithm.

Proof. We denote by $C_{\max}^{x,y,z}$, with $x\in\{\mathrm{opt},\emptyset\}$ (an empty $x$ refers to the schedule produced by the algorithm), $y\in\{\text{UET-UCT},\text{UET-LCT}(c=\delta),G\}$, and $z\in\{m,\infty\}$, the length of the schedule. Moreover, $\rho^{G,m}$ (resp., $\rho^{G,\infty}$) designates the performance ratio on a $G$ processor network model with a bounded (resp., unbounded) number of processors.
Now let us examine the relative performance of this algorithm. (i) The first step of the algorithm deals with the problem $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$.
First of all, the schedule (UET-UCT, $\infty$) is not optimal. The algorithm from [10] gives a 4/3 relative performance, so that
$$C_{\max}^{\text{UET-UCT},\infty}\le\frac{4}{3}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}. \tag{4.1}$$
(ii) In the second step, a feasible solution for a large communication delay $c=\delta$ (recall that $\delta$ stands for the diameter of the processor network) is created. This solution is obtained with the dilatation algorithm, whose expansion coefficient is $(\delta+1)/2$ [11]. And so,
$$C_{\max}^{\text{UET-LCT}(c=\delta),\infty}\le\frac{\delta+1}{2}\cdot\frac{4}{3}\,C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}, \tag{4.2}$$
$$C_{\max}^{\text{UET-LCT}(c=\delta),\infty}\le\frac{2(\delta+1)}{3}\,C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}. \tag{4.3}$$
Thus, we have a schedule on a UET-LCT task system with a communication delay equal to $\delta$ and an infinite number of processors.
By definition it is obvious that
$$C_{\max}^{G,\infty}\le C_{\max}^{\text{UET-LCT}(c=\delta),\infty}, \tag{4.4}$$
$$C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}\le C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}. \tag{4.5}$$
It is necessary to evaluate the gap between the optimal length of the schedule on a fully connected processor graph and on a processor graph with a diameter of length $K$. For this, we consider unit tasks subject to precedence constraints and an unbounded number of processors.

Lemma 4.3. The gap between a schedule on a fully connected graph of processors with a large communication delay 𝑐, for all pairs of tasks, and a schedule on a graph of processors with a diameter of length 𝐾, is at most (𝑐+1)/2.

Proof. We need to compare first the relative performance of this schedule on our model with network processor. The relative performance for the UET-LCT task system is not valid for our model. We need to compute a new bound for this schedule on our model.
Let $p=\{x_1,x_2,\ldots,x_n\}$ be a critical path of the schedule (i.e., a path that gives the length of the schedule). Suppose that there is a communication delay between each pair of tasks $(x_i,x_{i+1})$ with $1\le i<n$. In the UET-LCT task system (with a communication delay equal to $c$ for all pairs of tasks) the length of the schedule would be $(1+c)n-c$ units of time. In the graph of processors with a diameter of length $k$, the same path allows a length of $(k/2)(n-1)+n$ units of time. The worst case for the length of this path is $n+(n-1)k$ and the best case is $2n-1$. So, the ratio is $(n(1+c)-c)/(2n-1)$. For large $n$, we obtain the desired result.
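The limit used at the end of this proof can be checked with exact arithmetic. The sketch below evaluates the ratio $(n(1+c)-c)/(2n-1)$ for growing $n$ and compares it with $(c+1)/2$; the function name `path_ratio` is an assumption of this example.

```python
from fractions import Fraction

def path_ratio(n, c):
    """Length of an n-task critical path with delay c on every arc,
    divided by its best-case length 2n - 1 on the processor graph."""
    return Fraction(n * (1 + c) - c, 2 * n - 1)

def gap_to_limit(n, c):
    """Difference between the limit (c + 1)/2 and the ratio for n tasks."""
    return Fraction(c + 1, 2) - path_ratio(n, c)

if __name__ == "__main__":
    c = 5
    for n in (1, 10, 100, 1000):
        print(n, path_ratio(n, c), float(gap_to_limit(n, c)))
```

A short calculation gives a gap of exactly $(c-1)/(2(2n-1))$, which is nonnegative for $c\ge 1$ and vanishes as $n$ grows, so the ratio never exceeds $(c+1)/2$.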

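By applying Lemma 4.3, which is valid for all schedules, and in particular for the optimum, with $c=\delta$, we obtain
$$C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}\le\frac{\delta+1}{2}\,C_{\max}^{\mathrm{opt},G,\infty} \tag{4.6}$$
and so
$$C_{\max}^{G,\infty}\le C_{\max}^{\text{UET-LCT}(c=\delta),\infty}\quad\text{by (4.4)}, \tag{4.7}$$
$$C_{\max}^{G,\infty}\le\frac{2(\delta+1)}{3}\,C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}\quad\text{using (4.3)}, \tag{4.8}$$
$$C_{\max}^{G,\infty}\le\frac{(\delta+1)^2}{3}\,C_{\max}^{\mathrm{opt},G,\infty}\quad\text{using (4.6)}, \tag{4.9}$$
$$\rho^{G,\infty}\le\frac{(\delta+1)^2}{3}. \tag{4.10}$$
Now we have to transform this schedule, which uses an infinite number of processors, into a schedule with a bounded number of processors. This can be done easily using the method from [12]. The new worst-case relative performance is just increased by one. Thus we have
$$\rho^{G,m}\le\rho^{G,\infty}+1\le\frac{(\delta+1)^2}{3}+1. \tag{4.11}$$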
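The composition of the three factors in this proof can be reproduced with exact arithmetic. The following sketch multiplies the 4/3 ratio of [10], the dilatation coefficient $(\delta+1)/2$, and the gap $(\delta+1)/2$ of Lemma 4.3, then adds one for the folding step; the function names are hypothetical.

```python
from fractions import Fraction

def ratio_unbounded(delta):
    """Bound (4.10): 4/3 times the dilatation factor times the Lemma 4.3 gap."""
    munier_konig = Fraction(4, 3)
    dilatation = Fraction(delta + 1, 2)
    lemma_gap = Fraction(delta + 1, 2)
    return munier_konig * dilatation * lemma_gap

def ratio_bounded(delta):
    """Bound (4.11): folding onto m processors adds one to the ratio."""
    return ratio_unbounded(delta) + 1

if __name__ == "__main__":
    for delta in (1, 2, 3, 5, 8):
        print(delta, ratio_unbounded(delta), ratio_bounded(delta))
```

For $\delta=1$ (a fully connected network) the sketch returns 4/3 and 7/3, which agree with the ratios recalled in Remark 4.5 for the unrestricted and restricted UET-UCT problems.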

Remark 4.4. Note that the order of the operations may be modified. Nevertheless, the ratio becomes $\frac{7}{6}\times(\delta+1)^2$. Indeed, the folding principle may be used just after the solution given by the algorithm proposed by Munier and König [10]. We then obtain a schedule on $m$ processors. Afterwards, we apply the dilatation principle. This order yields a polynomial-time approximation algorithm with a ratio bounded by $\frac{7}{6}\times(\delta+1)^2$.

Remark 4.5. We recall two classic results in scheduling for which the performance ratio increases by one between the unbounded and bounded versions.

(1) When the number of processors is unlimited, the problem of scheduling a set of $n$ tasks under precedence constraints with no communication delay is polynomial. It is sufficient to use the classical algorithm given by Bellman [13] or one of the two techniques widely used in project management: CPM (Critical Path Method) and PERT (Project/Program Evaluation and Review Technique). In contrast, when the number of processors is limited, the problem becomes $\mathcal{NP}$-complete, and a $(2-1/m)$-approximation algorithm, where $m$ designates the number of processors, was developed by Graham [14]; it is based on list scheduling in which no order on the tasks is specified.

(2) The second illustration is given by the transition from the unrestricted version of UET-UCT to the restricted variant. From [10], we know the existence of a 4/3-approximation algorithm. Using this result, Munier and Hanen [15] designed a 7/3-approximation algorithm for the restricted version.

5. Conclusion

We have sharpened the demarcation line between the polynomially solvable and $\mathcal{NP}$-hard cases of the central scheduling problem (UET-UCT) on a structured processor network by showing that its decision version is polynomially solvable for $C_{\max}\le 2$ while it is $\mathcal{NP}$-complete for $C_{\max}\le 3$. This result is given for a large class of graphs with a nonconstant diameter. It implies that there is no $\rho$-approximation algorithm with $\rho<4/3$. These results are extended to the case where the precedence graph is a bipartite graph.

Lastly, we complete our complexity results by developing a polynomial-time approximation algorithm for $(P,G)|\mathrm{prec},c_{ij}=d(\pi_k,\pi_l)=1,p_i=1|C_{\max}$ with a worst-case relative performance of $(\delta+1)^2/3+1$, where $\delta$ designates the diameter of the graph. An interesting question for further research is to find a polynomial-time approximation algorithm with performance guarantee 𝜌 with 𝜌.

Appendices

A.

This section describes the dilation principle. This principle was studied in [11] and used to design a new polynomial-time approximation algorithm with a nontrivial performance guarantee for the problem $P|\text{prec}; c_{ij}=c\ge 2; p_i=1|C_{\max}$. For the latter problem, the authors propose a $(c+1)/2$-approximation algorithm (the best ratio as far as we know).

A.1. Introduction, Notation, and Description of the Method

Notation 1. We use $\sigma$ to denote the UET-UCT schedule and $\sigma_c$ to denote the UET-LCT schedule. Moreover, we use $t_i$ (resp., $t_i^c$) to denote the starting time of task $i$ in schedule $\sigma$ (resp., in schedule $\sigma_c$).

Principle
The tasks in $\sigma_c$ keep the same assignment as in the feasible schedule $\sigma$ on an unbounded number of processors. We expand the makespan so that the communication delay constraint ($t_j^c \ge t_i^c + 1 + c$) is preserved for any two tasks $i$ and $j$, with $(i,j) \in E$, processed on two different processors. For this, each starting time $t_i$ is multiplied by a factor $d$ to obtain $t_i^c$.
In the following section, we justify and determine the coefficient $d$.
More formally, let $G=(V,E)$ be a precedence graph. We determine a feasible schedule $\sigma$ for the UET-UCT model using the $(4/3)$-approximation algorithm proposed by Munier and König [10]. This algorithm returns a pair $(t_i, \pi)$ for each $i \in V$, where $t_i$ is the starting time of task $i$ in the schedule $\sigma$ and $\pi$ is the processor on which task $i$ is processed at $t_i$.
From this solution, we derive a solution for the problem with large communication delays. For this, we compute a new pair $(t_i^c, \pi')$ for each $i \in V$ from the pair $(t_i, \pi)$ as follows: the start time is $t_i^c = d \times t_i = ((c+1)/2)\,t_i$, and $\pi' = \pi$. In other words, all tasks in the schedule $\sigma_c$ are allotted to the same processor as in the schedule $\sigma$, and the starting time of each task $i$ is scaled by the factor $(c+1)/2$. The justification of the expansion coefficient is given below. An illustration of the expansion is given in Figure 10.
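The expansion step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary layout, the function name `expand_schedule`, and the three-task schedule are hypothetical, and exact fractions are used because $(c+1)/2$ is not an integer when $c$ is even.

```python
from fractions import Fraction


def expand_schedule(schedule, c):
    """Scale every start time by d = (c + 1) / 2 and keep each processor.

    schedule: task -> (start_time, processor), a feasible UET-UCT schedule.
    """
    d = Fraction(c + 1, 2)
    return {task: (d * t, proc) for task, (t, proc) in schedule.items()}


# Hypothetical UET-UCT schedule: a -> b on one processor, a -> z across two.
sigma = {"a": (0, "p1"), "b": (1, "p1"), "z": (2, "p2")}
sigma_c = expand_schedule(sigma, c=3)  # d = 2
print(sigma_c)
```

With $c = 3$, task z moves from time 2 to time 4, which leaves exactly the $1 + c$ time units required after task a on the other processor.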

A.2. Feasibility, Analysis of the Method, and Computation of the Ratio

We now justify the choice of the coefficient $d$. Moreover, we prove that the resulting schedule is feasible for the $P|\text{prec}; c_{ij}=c\ge 2; p_i=1|C_{\max}$ problem. Lastly, we propose a worst-case analysis of the algorithm.

Lemma A.1. The expansion coefficient is $d=(c+1)/2$.

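Proof. Let $i$ and $j$ be two tasks such that $(i,j) \in E$, which are processed on two different processors in the feasible schedule $\sigma$. We are interested in obtaining a coefficient $d$ such that $t_i^c = d \times t_i$ and $t_j^c = d \times t_j$. After expansion, in order to respect the precedence constraints and the communication delay, we must have $t_j^c \ge t_i^c + 1 + c$, and so $d \times t_j - d \times t_i \ge c+1$, that is, $d \ge (c+1)/(t_j - t_i)$. Since $i$ and $j$ are processed on different processors in the UET-UCT schedule $\sigma$, we have $t_j - t_i \ge 2$, so the requirement holds whenever $d \ge (c+1)/2$. It is sufficient to choose $d=(c+1)/2$.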

Lemma A.2. The expansion algorithm gives a feasible schedule for the $P|\text{prec}; c_{ij}=c\ge 2; p_i=1|C_{\max}$ problem in $O(n)$ time.

Proof. It is sufficient to check that the solution given by the expansion algorithm is a feasible schedule for the UET-LCT model. Let $i$ and $j$ be two tasks such that $(i,j) \in E$. We use $\pi_i$ (resp., $\pi_j$) to denote the processor on which task $i$ (resp., task $j$) is executed in schedule $\sigma$. Moreover, we use $\pi'_i$ (resp., $\pi'_j$) to denote the processor on which task $i$ (resp., task $j$) is executed in schedule $\sigma_c$. Thus,

(i) if $\pi_i = \pi_j$ then $\pi'_i = \pi'_j$. Since the solution given by Munier and König [10] is a feasible schedule for the UET-UCT model, we have $t_i + 1 \le t_j$, that is, $(2/(c+1))\,t_i^c + 1 \le (2/(c+1))\,t_j^c$; hence $t_i^c + 1 \le t_i^c + (c+1)/2 \le t_j^c$;

(ii) if $\pi_i \ne \pi_j$ then $\pi'_i \ne \pi'_j$. We have $t_i + 1 + 1 \le t_j$, that is, $(2/(c+1))\,t_i^c + 2 \le (2/(c+1))\,t_j^c$; hence $t_i^c + (c+1) \le t_j^c$.
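The two cases of the proof can be checked mechanically on a small instance. The sketch below is an illustration under assumptions: the checker `is_feasible`, the data layout, and the three-task instance are hypothetical. Besides the two precedence cases, it also verifies that no two unit tasks overlap on a processor, which the proof does not spell out.

```python
from collections import defaultdict
from fractions import Fraction


def is_feasible(schedule, edges, delay):
    """True if unit-time tasks respect precedence, delay, and processor use."""
    for i, j in edges:
        t_i, p_i = schedule[i]
        t_j, p_j = schedule[j]
        # Case (i): same processor, gap 1. Case (ii): gap 1 + delay.
        gap = 1 if p_i == p_j else 1 + delay
        if t_j < t_i + gap:
            return False
    by_proc = defaultdict(list)
    for t, p in schedule.values():
        by_proc[p].append(t)
    for times in by_proc.values():
        times.sort()
        # Unit tasks on one processor must start at least 1 apart.
        if any(b - a < 1 for a, b in zip(times, times[1:])):
            return False
    return True


edges = [("a", "b"), ("a", "z")]
sigma = {"a": (0, "p1"), "b": (1, "p1"), "z": (2, "p2")}
c = 3
d = Fraction(c + 1, 2)
sigma_c = {task: (d * t, p) for task, (t, p) in sigma.items()}

print(is_feasible(sigma, edges, delay=1))    # UET-UCT schedule
print(is_feasible(sigma_c, edges, delay=c))  # expanded schedule, UET-LCT
print(is_feasible(sigma, edges, delay=c))    # unexpanded schedule, UET-LCT
```

On this instance the original schedule is feasible for unit delays, the expanded one is feasible for delay $c$, and the unexpanded one is not, which is the situation the expansion is designed to repair.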

Theorem A.3. The expansion algorithm is a $(2(c+1)/3)$-approximation algorithm for the problem $P|\text{prec}; c_{ij}=c\ge 2; p_i=1|C_{\max}$.

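Proof. We use $C_{\max}^{\text{UET-UCT}}$ (resp., $C_{\max}^{\text{opt,UET-UCT}}$) to denote the makespan of the schedule computed by the algorithm of Munier and König (resp., the optimal value of a schedule $\sigma$). In the same way, we use $C_{\max}^{\text{UET-LCT}}$ (resp., $C_{\max}^{\text{opt,UET-LCT}}$) to denote the makespan of the schedule computed by the expansion algorithm (resp., the optimal value of a schedule $\sigma_c$).
We know that
\[
C_{\max}^{\text{UET-UCT}} \le \frac{4}{3}\, C_{\max}^{\text{opt,UET-UCT}}. \tag{A.1}
\]
Moreover, $C_{\max}^{\text{opt,UET-LCT}} \ge C_{\max}^{\text{opt,UET-UCT}}$, since any schedule that is feasible with delay $c \ge 2$ is also feasible with unit delays. Thus, we obtain
\[
\frac{C_{\max}^{\text{UET-LCT}}}{C_{\max}^{\text{opt,UET-LCT}}}
= \frac{((c+1)/2)\, C_{\max}^{\text{UET-UCT}}}{C_{\max}^{\text{opt,UET-LCT}}}
\le \frac{((c+1)/2)\, C_{\max}^{\text{UET-UCT}}}{C_{\max}^{\text{opt,UET-UCT}}}
\le \frac{((c+1)/2)(4/3)\, C_{\max}^{\text{opt,UET-UCT}}}{C_{\max}^{\text{opt,UET-UCT}}}
\le \frac{2(c+1)}{3}. \tag{A.2}
\]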
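As a quick numerical illustration of the bound of Theorem A.3, the table below evaluates the expansion coefficient $d$ and the resulting guarantee for a few values of $c$. The values of $c$ are chosen arbitrarily for the example.

```latex
\[
\begin{array}{c|c|c}
c & d = (c+1)/2 & \tfrac{4}{3}\, d = 2(c+1)/3 \\ \hline
2 & 3/2 & 2 \\
3 & 2 & 8/3 \\
5 & 3 & 4
\end{array}
\]
```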

B.

In this section, we present a simple algorithm which gives a schedule $\sigma^m$ on $m$ machines from a schedule $\sigma$ on an unbounded number of processors for $P|\text{prec}, c_{ij}=1, p_i=1|C_{\max}$. Let $X_i$ be the set of tasks executed at time $t_i$ in $\sigma$, where $\sigma$ is computed by a heuristic and has length $C_{\max}^{\text{UET-UCT}}$. The tasks of $X_i$ are executed in $\lceil |X_i|/m \rceil$ units of time in the schedule $\sigma^m$. We apply this procedure for all $i = 0, \ldots, C_{\max}^{\text{UET-UCT}} - 1$. The validity of this algorithm is based on the fact that the precedence relations between the tasks executed at $t_i$ and the tasks processed at $t_{i+1}$ form at most a matching (called Brent's lemma, see [12]).
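The timing side of the folding procedure can be sketched as follows. This is a partial illustration under assumptions: the function name `folded_length`, the input layout, and the instance are hypothetical, and the sketch only computes the time window given to each set $X_i$ and the resulting length. The assignment of tasks to processors inside each window, which relies on the matching argument, is deliberately left out.

```python
from collections import Counter


def folded_length(start_times, m):
    """Give each time slot i of the unbounded schedule ceil(|X_i| / m) units.

    start_times: task -> integer start time in the unbounded schedule.
    Returns the list of (begin, end) windows per slot and the total length.
    """
    sizes = Counter(start_times.values())   # |X_i| for each slot i
    horizon = max(start_times.values()) + 1  # length of the unbounded schedule
    windows, clock = [], 0
    for i in range(horizon):
        length = -(-sizes[i] // m)           # integer ceiling of |X_i| / m
        windows.append((clock, clock + length))
        clock += length
    return windows, clock


# Hypothetical unbounded schedule: 3 tasks at time 0, 2 at time 1, 1 at time 2.
starts = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
windows, length = folded_length(starts, m=2)
print(windows, length)
```

Here $n = 6$, $m = 2$, and the unbounded schedule has length 3, so the folded length 4 stays below $n/m$ plus the unbounded length, which is the quantity bounded in the proof of Theorem B.1.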

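Theorem B.1. From any polynomial-time algorithm with performance guarantee $\rho$ (i.e., $C_{\max}^{\text{UET-UCT}} \le \rho\, C_{\max}^{\text{opt,UET-UCT}}$) for the problem $\bar{P}|\text{prec}, c_{ij}=1, p_i=1|C_{\max}$ on an unbounded number of processors, we may obtain a polynomial-time algorithm with performance guarantee $\rho^m = (1+\rho)$ for the problem $P|\text{prec}, c_{ij}=1, p_i=1|C_{\max}$ on $m$ processors.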

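Proof. Let $C_{\max}^{\text{UET-UCT}}$ (resp., $C_{\max}^{\text{UET-UCT},m}$) be the length of the schedule given by the heuristic on an unbounded number of processors (resp., by the folding algorithm on $m$ processors). In the same way, let $C_{\max}^{\text{opt,UET-UCT}}$ (resp., $C_{\max}^{\text{opt,UET-UCT},m}$) be the optimal length of the schedule on an unbounded number of processors (resp., on a restricted number of processors). We denote by $n$ the number of tasks in the schedule. Clearly, this gives us $C_{\max}^{\text{opt,UET-UCT}} \le C_{\max}^{\text{opt,UET-UCT},m}$ and $C_{\max}^{\text{UET-UCT}} \le \rho\, C_{\max}^{\text{opt,UET-UCT}}$. Moreover, $\sum_i \lfloor |X_i|/m \rfloor \le n/m \le C_{\max}^{\text{opt,UET-UCT},m}$, since $n$ unit tasks need at least $n/m$ units of time on $m$ processors. So,
\[
\begin{aligned}
C_{\max}^{\text{UET-UCT},m}
&\le \sum_{i=0}^{C_{\max}^{\text{UET-UCT}}-1} \left\lceil \frac{|X_i|}{m} \right\rceil
\le \sum_{i=0}^{C_{\max}^{\text{UET-UCT}}-1} \left( \left\lfloor \frac{|X_i|}{m} \right\rfloor + 1 \right), \\
C_{\max}^{\text{UET-UCT},m}
&\le \sum_{i=0}^{C_{\max}^{\text{UET-UCT}}-1} \left\lfloor \frac{|X_i|}{m} \right\rfloor + C_{\max}^{\text{UET-UCT}}, \\
C_{\max}^{\text{UET-UCT},m}
&\le C_{\max}^{\text{opt,UET-UCT},m} + C_{\max}^{\text{UET-UCT}}, \\
C_{\max}^{\text{UET-UCT},m}
&\le C_{\max}^{\text{opt,UET-UCT},m} + \rho\, C_{\max}^{\text{opt,UET-UCT},m}, \\
\rho^m &\le (1+\rho).
\end{aligned}
\tag{B.1}
\]
This concludes the proof of Theorem B.1.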