Erratum

An erratum for this article has been published.

Advances in Operations Research
Volume 2011, Article ID 476939, 20 pages
http://dx.doi.org/10.1155/2011/476939
Research Article

Inapproximability and Polynomial-Time Approximation Algorithm for UET Tasks on Structured Processor Networks

1Laboratoire G-SCOP, 46 avenue Félix Viallet, 38031 Grenoble Cedex 1, France
2LIRMM, 161 rue Ada, UMR 5056, 34392 Montpellier Cedex 5, France

Received 26 October 2010; Revised 22 March 2011; Accepted 4 April 2011

Academic Editor: Ching-Jong Liao

Copyright © 2011 M. Bouznif and R. Giroudeau. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We investigate complexity and approximation results on processor networks where the communication delay depends on the distance between the processors performing the tasks. We prove that there is no heuristic with a performance guarantee smaller than 4/3 for makespan minimization with a precedence graph on a large class of processor networks such as the hypercube, grid, and torus, with a fixed diameter $\delta\in\mathbb{N}$. We extend the complexity results to the case where the precedence graph is a bipartite graph. We also design an efficient polynomial-time $O(\delta^2)$-approximation algorithm for makespan minimization on processor networks with diameter $\delta$.

1. Introduction

1.1. Problem Statement

In this paper, we consider the processor network model, which is a generalization of the homogeneous scheduling delay model. In the homogeneous model, task allocation on the processors has no influence on the length of the schedule. Indeed, since the graph of processors (denoted hereafter $G^*=(V^*,E^*)$, where $V^*=\{\pi_1,\ldots,\pi_m\}$ is a set of $m$ processors and $E^*$ is the set of relationships between them) is fully connected, the starting time of a task $i$ depends only on the potential communication delay, given by the precedence graph, between $i$ and its own predecessors.

In the processor network model, this assumption is relaxed in order to take into account the fact that the processor graph may not be fully connected. Task allocation on the processors thus becomes an essential characteristic of the problem. We consider a model in which a distance function (defined hereafter), denoted $d(\pi_l,\pi_h)$, between two processors $\pi_l$ and $\pi_h$ in the graph of processors affects the communication delay between two tasks $i$ and $j$ (subject to a precedence constraint) and consequently the starting time of task $j$. The communication time used for computing the starting time of a task, denoted $c_{i,\pi_l,j,\pi_h}$ (the communication delay between task $i$, which is allotted to processor $\pi_l$, and task $j$, which will be executed on processor $\pi_h$), is assumed to be $c_{ij}\,d(\pi_l,\pi_h)$, where $c_{ij}$ is the communication delay given by the precedence graph.

Formally, the processor network model may be defined as
$$\forall (i,j)\in E,\quad t_j\ge t_i+p_i+c_{ij}\,d(\pi_\ell,\pi_h), \tag{1.1}$$
where $\pi_\ell$ (resp. $\pi_h$) represents the processor on which task $i$ (resp. task $j$) is scheduled, $t_i$ represents the starting time of task $i$, $p_i$ represents the processing time of task $i$, $d(\pi_\ell,\pi_h)$ represents the length of the shortest path in graph $G^*$ (the graph of processors $G^*=(V^*,E^*)$) between $\pi_\ell$ and $\pi_h$, and $c_{ij}$ represents the communication delay if the two tasks are executed on two neighboring processors (this value is given by the precedence graph).

We consider the classic UET-UCT scheduling problem (Unit Execution Time-Unit Communication Time, i.e., $\forall i\in V$, $p_i=1$, and $\forall (i,j)\in E$, $c_{ij}=1$) on a bounded number of processors such that the processor network is a structured graph with diameter $\delta$. In these topologies, processors are numbered $\pi_1,\pi_2,\ldots,\pi_m$, and processor $\pi_h$ may communicate with processor $\pi_l$ at a communication cost equal to $d(\pi_h,\pi_l)$, where $d(\pi_h,\pi_l)$ is the length of the shortest path in graph $G^*$ between processors $\pi_h$ and $\pi_l$. The communication delay is therefore the distance function proposed above.
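Constraint (1.1) in the UET-UCT setting can be checked mechanically. The following is a minimal sketch, assuming the processor graph is given as an adjacency dictionary and a schedule as two dictionaries (start time and processor per task). The function names and the tiny precedence graph are hypothetical and only serve as an illustration; the 2x2 grid is the one described in Example 1.1.

```python
from collections import deque

def distances(adj, src):
    """Hop distances from src in an undirected processor graph (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_feasible(adj, arcs, start, proc):
    """Check constraint (1.1) with p_i = c_ij = 1, one task per processor and time slot."""
    slots = [(proc[t], start[t]) for t in start]
    if len(set(slots)) != len(slots):
        return False
    dist = {p: distances(adj, p) for p in adj}
    return all(start[j] >= start[i] + 1 + dist[proc[i]][proc[j]] for i, j in arcs)

# 2x2 grid: pi_0 and pi_3 are each connected to pi_1 and pi_2.
grid = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
arcs = [("a", "c"), ("b", "c")]  # hypothetical precedence graph, not the one of Figure 1
start = {"a": 0, "b": 0, "c": 2}
proc = {"a": 0, "b": 3, "c": 1}
print(is_feasible(grid, arcs, start, proc))
```

Placing task c on a processor at distance one from both predecessors lets it start at time 2, whereas placing it on the processor of a would push it to time 3 because b is then at distance two.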

In scheduling theory, a problem type is categorized by its machine environment, job characteristics, and objective function. Thus, using the three-field notation scheme $\alpha|\beta|\gamma$ (where $\alpha$ designates the processor environment, $\beta$ the characteristics of the jobs, and $\gamma$ the criterion) proposed by Graham et al. [1], we consider the problem of makespan minimization (denoted in what follows by $C_{\max}$) with unitary tasks and unitary communication delays (UET-UCT) in the presence of a precedence graph $G$ on a processor network having a graph $G^*$ such that the communication delay depends on the shortest path in graph $G^*$. This problem is denoted by $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_\ell,\pi_k);p_i=1|C_{\max}$.

Example 1.1. Figure 1 shows the difference between the two problems ๐‘ƒ|prec;๐‘๐‘–๐‘—=1;๐‘๐‘–=1|๐ถmax and (๐‘ƒ,grid2ร—2)|prec;๐‘๐‘–๐‘—=๐‘‘(๐œ‹โ„“,๐œ‹๐‘˜);๐‘๐‘–=1|๐ถmax. (The relationship between processors is as follows: ๐œ‹0 and ๐œ‹3 are connected to ๐œ‹1 and ๐œ‹2.) The processing time of the tasks and the communication delay between the tasks are unitary (UET-UCT problem). Gantt diagram ๐บ1 represents an optimal solution for the ๐‘ƒ|prec;๐‘๐‘–๐‘—=1;๐‘๐‘–=1|๐ถmax problem. We can notice that task ๐‘ง can be executed on any processor at ๐‘ก=2. Moreover, Gantt diagram ๐บ2 represents an optimal solution for the problem (๐‘ƒ,grid2ร—2)|prec;๐‘๐‘–๐‘—=๐‘‘(๐œ‹โ„“,๐œ‹๐‘˜);๐‘๐‘–=1|๐ถmax. In order to obtain an optimal solution, the task ๐‘Ž must be delayed by one unit of time and must be processed on the same processor ๐œ‹2 as task ๐‘ at ๐‘ก=1. Thus, task ๐‘’ may be executed at ๐‘ก=2 only on the processor ๐œ‹2.

Figure 1: Difference between the problems $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$ and $(P,\mathrm{grid}_{2\times 2})|\mathrm{prec};c_{ij}=d(\pi_i,\pi_j);p_i=1|C_{\max}$.
1.2. Organization of the Paper

This paper is organized as follows: the next section is devoted to related works. In Section 3, after defining the class graph $\mathcal{G}$, we propose a general nonapproximability result for an unspecified precedence graph. We also extend this result to the case where the precedence graph is a bipartite graph and to the case where duplication is allowed. In the last section, we design a polynomial-time approximation algorithm with a performance ratio within $O(\delta^2)$.

2. Related Works

2.1. Complexity Results

To the best of our knowledge, the first complexity result was given by Picouleau [2]. The considered problem was to schedule unit execution time tasks with a precedence graph on an unbounded number of processors and on a chain or star (a star is a tree of depth one) topology. Picouleau proved that this problem is $\mathcal{NP}$-complete if the precedence graph is a tree or an outtree. Recently in [3], the authors proved that there is no heuristic with a performance guarantee smaller than 6/5 for minimizing the makespan on a processor network represented by a star. This model is closest to the master-slave architecture. In [4], the authors proved that there is no hope of finding a polynomial-time approximation algorithm with a ratio $\rho<4/3$ for the problem of scheduling a set of tasks on a ring or a chain as processor network (see Table 1).

Table 1: Previous complexity results on the processors network model.
2.1.1. Approximation Results

In ring topology, Lahlou developed in [5], using the list scheduling proposed by Rayward-Smith [6], a $\rho$-approximation algorithm with $\lceil\sqrt{m}\rceil\le\rho\le 1+(3/8)m-1/2m$, where $m$ is the number of processors.

Moreover, Hwang et al. [7] studied approximation list algorithms for scheduling problems where the communication times depend on contention and a distance function for the tasks involved and on the processors that execute the tasks. The authors examined a simple strategy called extended list scheduling, ELS, which is a straightforward extension of list scheduling. They proved that the ELS strategy is unsatisfactory, but improved a strategy called earliest task first.

Recently, in [3], the authors proposed a sophisticated three-step polynomial-time approximation algorithm with a ratio equal to four for the makespan minimization problem on a processor network in the form of a star. In [4], the authors developed two polynomial-time approximation algorithms for processor networks with limited or unlimited resources.

2.2. Our Contributions

In this paper, we answer the following question: is there a large class of graphs for which there exists a polynomial-time reduction from $n$-PARTITION showing $\mathcal{NP}$-completeness? If so, it is sufficient to show that a graph $G$ belongs to this class in order to prove the nonexistence of a $\mathcal{PTAS}$. To complete the study of processor networks, we design a polynomial-time approximation algorithm with a ratio of at most $((\delta+1)^2/3)+1$, where $\delta$ designates the diameter of the graph $G^*$.

3. Computational Complexity for a Large Class of Graph

3.1. The Class Graph $\mathcal{G}$

We propose a large class of graphs $\mathcal{G}$ for which the problem of deciding whether an instance of $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_\ell,\pi_k);p_i=1|C_{\max}$ has a schedule of length at most three is $\mathcal{NP}$-complete.

We now present a graph class to which the same polynomial-time transformation from the 3-PARTITION problem applies, showing that our scheduling problem is $\mathcal{NP}$-complete when the processor network belongs to this class. Hereafter, we give the definition of the prism graph.

Definition 3.1. A prism $P=(V_P,E_P)$ of size $k$ and length $L$ ($k,L\in\mathbb{N}$) is a connected undirected graph for which (i) there are two sets of vertices $K$ and $K'$ such that $K\subset V_P$, $K'\subset V_P\setminus K$, and $|K|=|K'|=k$. The vertices are denoted $s_1,\ldots,s_k$ (resp. $s'_1,\ldots,s'_k$); (ii) there exists an order on the vertices of $K$ and $K'$ such that ($\forall s_i\in K$, $s'_i\in K'$, $1\le i\le k$) there is a path of length $L$, denoted $C_i$, between $s_i$ and $s'_i$; (iii) $(i\ne j)\wedge x\in C_i\setminus\{s_i,s'_i\}\wedge y\in C_j\setminus\{s_j,s'_j\}\Rightarrow (x,y)\notin E_P$.

Moreover, the size of a prism is polynomial in $k$. An illustration is given in Figure 2.

Figure 2: An example of a prism of size $k$ and length $L$.

Definition 3.2. Let $\mathcal{G}$ be a collection of graphs. $\mathcal{G}$ possesses the prism property if and only if $\forall n_0,\forall n_1\in\mathbb{N}$, $\exists G\in\mathcal{G}$ such that $G$ contains a unique subgraph $G_1=(V_1,E_1)$, induced by vertices $V_1\subset V$, that is a prism of size $k=n_0$ and length $L=n_1$.

Lemma 3.3. The class graph $\mathcal{G}$ is not empty.

Proof. In particular, we will see in Section 3.2 that classic structured graphs such as the torus, grid, and complete binary tree belong to this class graph.

Theorem 3.4. The problem of deciding whether an instance of $(P,G^*)|\beta;c_{ij}=d(\pi_\ell,\pi_k);p_i=1|C_{\max}$ has a schedule of length at most two is polynomial with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$ and $G^*\in\mathcal{G}$.

Proof. No communication is allowed between two pairs of tasks.

The remainder of this section is devoted to proving Theorem 3.5.

Theorem 3.5. The problem of deciding whether an instance of $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l);p_i=1|C_{\max}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $G^*\in\mathcal{G}$.

Proof. The proof is established by a reduction of the 3-PARTITION problem [8].
Instance
A finite set $\mathcal{A}$ of $3M$ elements $\{a_1,\ldots,a_{3M}\}$, a bound $B\in\mathbb{N}^+$, and a size $s(a)\in\mathbb{N}$ for each $a\in\mathcal{A}$ such that each $s(a)$ satisfies $B/4<s(a)<B/2$ and such that $\sum_{a\in\mathcal{A}}s(a)=MB$.
Question 1. Can $\mathcal{A}$ be partitioned into $M$ disjoint sets $\mathcal{A}_1,\ldots,\mathcal{A}_M$ such that for all $i\in\{1,\ldots,M\}$, $B=\sum_{a\in\mathcal{A}_i}s(a)=\sum_{a\in\mathcal{A}}s(a)/M\in\mathbb{N}$?
3-PARTITION is known to be $\mathcal{NP}$-complete in the strong sense [8]. (Even if $B$ is polynomially bounded by the instance size, the problem is still $\mathcal{NP}$-complete.)
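To make the source problem of the reduction concrete, here is a minimal brute-force sketch of a 3-PARTITION decision procedure. It is exponential in the worst case and only meant for toy instances; the function name and the two sample instances are assumptions made for illustration, chosen so that every size lies strictly between B/4 and B/2.

```python
from itertools import combinations

def three_partition(sizes, B):
    """Return a list of index triples, each summing to B, or None if none exists."""
    def solve(remaining):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        # The first remaining element must belong to some triple: try every pair of partners.
        for pair in combinations(rest, 2):
            if sizes[first] + sizes[pair[0]] + sizes[pair[1]] == B:
                left = [x for x in rest if x not in pair]
                tail = solve(left)
                if tail is not None:
                    return [(first,) + pair] + tail
        return None

    # Necessary conditions: 3M elements whose sizes sum to M * B.
    if len(sizes) % 3 or sum(sizes) * 3 != B * len(sizes):
        return None
    return solve(list(range(len(sizes))))

print(three_partition([6, 7, 7, 6, 7, 7], 20))  # a yes-instance with M = 2
print(three_partition([6, 6, 6, 6, 7, 9], 20))  # a no-instance with M = 2
```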
It is easy to see that the problem $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_l,\pi_k)=1;p_i=1|C_{\max}\le 3$ belongs to $\mathcal{NP}$.
Given an instance $\mathcal{I}$ of the 3-PARTITION problem, we construct an instance $\mathcal{I}'$ of the scheduling problem $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_\ell,\pi_k);p_i=1|C_{\max}\le 3$ with $G^*\in\mathcal{G}$, in the following way.
The precedence graph $\widetilde{G}=\mathcal{W}+\mathcal{Z}$, which will be scheduled on the processor network $G^*$, is decomposed into two disjoint graphs, denoted in what follows by $\mathcal{W}$ and $\mathcal{Z}$ (the graph $\mathcal{Z}$ is a collection of graphs $Z_{s(a_j)}$, i.e., $\mathcal{Z}=\bigcup_{a_j\in\mathcal{A}}Z_{s(a_j)}$). Hereafter, graphs $\mathcal{Z}$ and $\mathcal{W}$ are characterized.

Graph $Z_i$
Let $i$ be an integer such that $i>1$. Graph $Z_i$ consists of $4\times i$ vertices denoted by $Z_i[k,0]$, $Z_i[k,1]$, where $0\le k<2i$. The precedence constraints between these tasks are defined as follows: (i) arcs $Z_i[j,0]\to Z_i[j,1]$ for any $j$, $0\le j\le 2i-1$, (ii) arcs $Z_i[2j,0]\to Z_i[2j+1,1]$ for any $j$, $0\le j\le i-1$, (iii) arcs $Z_i[2j,0]\to Z_i[2j-1,1]$ for any $j$, $1\le j\le i-1$.

Remark 3.6. A valid schedule of length three, when the precedence graph is $Z_i$ and the processors form a path of $2i$ processors, is as follows: for any $j$, $0\le j\le 2i-1$, (i) tasks $Z_i[j,0]$ and $Z_i[j,1]$ are executed on $\pi_j$, (ii) tasks $Z_i[j,\ell]$ are executed at time $\ell$, for any $\ell\in\{0,1\}$, if $j$ is even, (iii) tasks $Z_i[j,\ell]$ are otherwise executed at time $\ell+1$, for any $\ell\in\{0,1\}$.
See Figure 3 for graph $Z_2$ and Figure 4 for the valid schedule described in Remark 3.6.
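The definition of the graph Z_i and the schedule of Remark 3.6 can be checked programmatically. The sketch below builds the arcs (i)-(iii), places each task as in the remark on a path of 2i processors (where the distance between two processors is the absolute difference of their indices), and verifies the UET-UCT precedence constraint on every arc. The function names are hypothetical.

```python
def build_z(i):
    """Tasks and arcs of graph Z_i; a task Z_i[j, l] is represented by the pair (j, l)."""
    tasks = [(j, l) for j in range(2 * i) for l in (0, 1)]
    arcs = [((j, 0), (j, 1)) for j in range(2 * i)]               # arcs (i)
    arcs += [((2 * j, 0), (2 * j + 1, 1)) for j in range(i)]      # arcs (ii)
    arcs += [((2 * j, 0), (2 * j - 1, 1)) for j in range(1, i)]   # arcs (iii)
    return tasks, arcs

def remark_schedule(i):
    """Schedule of Remark 3.6: task (j, l) runs on processor j at time l (j even) or l + 1 (j odd)."""
    tasks, _ = build_z(i)
    return {(j, l): (j, l + j % 2) for (j, l) in tasks}

def check(i):
    """Return (all precedence constraints hold, schedule length) on a path of 2i processors."""
    _, arcs = build_z(i)
    sched = remark_schedule(i)
    ok = all(
        sched[v][1] >= sched[u][1] + 1 + abs(sched[u][0] - sched[v][0])
        for u, v in arcs
    )
    makespan = max(t for _, t in sched.values()) + 1
    return ok, makespan

print(check(2))
```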

Figure 3: Graph $Z_2$.
Figure 4: Valid schedule of length three for graph $Z_2$.

Graph $\mathcal{W}$
Remark 3.7. A path of length $l$ admits $l+1$ vertices.
The graph $\mathcal{W}=(\mathcal{V}\cup\mathcal{V}';E_{\mathcal{W}})$ is defined as follows. Let $G^*=(V^*,E^*)$ be a graph such that $G^*\in\mathcal{G}$, with $V^*=\{v^*_1,\ldots,v^*_{n^*}\}$. By Definition 3.2, we know that there exists a unique subgraph $G=(V\subset V^*,E\subset E^*)$ that is a prism of size $k$ and length $L$ with the desired properties. In the following we set $k=n$ and $L=2B+1$, and the size of $G^*=(V^*,E^*)$ is polynomial in $k$. Note that $n^*\gg 2B$.
The $\mathcal{W}$-graph is defined by polynomial-time transformations from the $G^*$-graph. The graph given in Figure 5 will be used to illustrate the following construction. (i) Paths of length three are created and precedence constraints are added (see Figure 6). The two sets of tasks $\mathcal{V}_1$ and $\mathcal{V}'$ are created. (ii) The tasks are partitioned into three subsets $\mathcal{V}'$, $\mathcal{K}$, and $\mathcal{V}$ (see Figure 7). (iii) The $\mathcal{V}_1$-tasks are thus partitioned into two subsets $\mathcal{K}$ and $\mathcal{V}$. We consider the subgraph induced by the $\mathcal{V}\cup\mathcal{V}'$-tasks (see Figure 8) as the $\mathcal{W}$-graph.
These tasks are removed so that the slots they would occupy (the $\mathcal{K}$ tasks) remain available when the $\mathcal{W}$-graph, deprived of them, is executed on the graph of processors.
The set of vertices $V^*$ is partitioned into two sets $V^*=V'\cup V$: (i) $V=\{v^*_1,\ldots,v^*_{2n(B+1)}\}$, the vertices of $G$, which are the vertices of the $n$ unique paths of length $2B+1$ respecting the characteristics given by Definition 3.1; (ii) $V'=\{v^*_{2n(B+1)+1},\ldots,v^*_{n^*}\}$, the set of the other vertices. Note that these vertices do not belong to the graph $G$.
The definition of the $\mathcal{W}$ graph is given below. (i) $\forall i\in\{1,\ldots,2n(B+1)\}$, we create a path of length three $v^*_i[0]$, $v^*_i[1]$, and $v^*_i[2]$, with arcs $v^*_i[0]\to v^*_i[1]\to v^*_i[2]$. This set of tasks is denoted $\mathcal{V}_1=\{v^*_i[j]\mid i\in\{1,\ldots,2n(B+1)\},\ j\in\{0,1,2\}\}$. The cardinality of $\mathcal{V}_1$ is $6n(B+1)$ (see Figure 6). (ii) $\forall i\in\{2n(B+1)+1,\ldots,n^*\}$, we create a path of length three $v^*_i[0]\to v^*_i[1]\to v^*_i[2]$. This set of tasks is denoted $\mathcal{V}'$. The number of tasks is $3(n^*-2n(B+1))$ with $n^*=|V^*|$. (iii) $\forall (k,l)\in E^*$, we add the arcs $v^*_k[0]\to v^*_l[2]$ and $v^*_l[0]\to v^*_k[2]$ (see Figure 6).
Now, $4nB$ tasks are removed from the $\mathcal{W}$-graph. (To clarify the polynomial-time transformation, we prefer to create tasks and then remove some of them rather than enumerate all precedence constraints.) Therefore, we consider the following index sets: (i) $J_0=\{2i(B+1)\mid i\in\{1,2,\ldots,n\}\}$, (ii) $J_1=\{2i(B+1)+1\mid i\in\{0,1,2,\ldots,n-1\}\}$, (iii) $I_0=\{k\in\{1,\ldots,2n(B+1)\}\setminus(J_0\cup J_1)\mid k\text{ is even}\}$, (iv) $I_1=\{k\in\{1,\ldots,2n(B+1)\}\setminus(J_0\cup J_1)\mid k\text{ is odd}\}$.
We remove from the set $\mathcal{V}_1$ the tasks $v^*_k[0]$, $v^*_k[1]$ with $k\in I_0$ (resp. $v^*_k[1]$, $v^*_k[2]$ with $k\in I_1$). $\mathcal{K}$ denotes the set of removed tasks (see Figure 7). Finally, we set $\mathcal{V}=\mathcal{V}_1\setminus\mathcal{K}$ with $|\mathcal{V}|=2nB+6n$ (see Figure 8).
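The counting claims of this construction (4nB removed tasks and |V| = 2nB + 6n remaining tasks) can be verified on small parameters. The sketch below is only a sanity check of the index sets as defined in the text; the function name and the chosen values of n and B are hypothetical.

```python
def index_sets(n, B):
    """Index sets J0, J1, I0, I1 of the construction, over the indices 1..2n(B+1)."""
    all_idx = set(range(1, 2 * n * (B + 1) + 1))
    J0 = {2 * i * (B + 1) for i in range(1, n + 1)}
    J1 = {2 * i * (B + 1) + 1 for i in range(n)}
    rest = all_idx - (J0 | J1)
    I0 = {k for k in rest if k % 2 == 0}
    I1 = {k for k in rest if k % 2 == 1}
    return J0, J1, I0, I1

n, B = 2, 3
J0, J1, I0, I1 = index_sets(n, B)
removed = 2 * (len(I0) + len(I1))       # two tasks are removed per index of I0 or I1
remaining = 6 * n * (B + 1) - removed   # |V1| minus the removed tasks
print(removed == 4 * n * B, remaining == 2 * n * B + 6 * n)
```

Each index in J0 or J1 keeps its full three-task path, which accounts for the 6n term, while every other index keeps exactly one task.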
Figures 5, 6, 7, and 8 describe the construction of the $\mathcal{W}$-graph from $G^*\in\mathcal{G}$.
$E_{\mathcal{W}}$ is the set of arcs as described above.
Lastly, the number of processors is $m=n^*$, and they are numbered $\pi_i$ with $i\in[1,n^*]$.
In summary, the precedence graph $\widetilde{G}=\mathcal{W}+\mathcal{Z}$ is composed of $\mathcal{W}=(\mathcal{V}\cup\mathcal{V}',E_{\mathcal{W}})$, with $3n^*-4nB$ tasks and the precedence constraints given before, and the graph $\mathcal{Z}=\bigcup_{a_j\in\mathcal{A}}Z_{s(a_j)}$, with $4nB$ tasks.
The transformation is computed in polynomial time. (i) Let us assume that $\mathcal{A}=\{a_1,\ldots,a_{3M}\}$ can be partitioned into $M$ disjoint subsets $\mathcal{A}_1,\ldots,\mathcal{A}_M$, each summing up to $B$. We will then prove that there is a schedule of length at most three.
Let us construct this schedule.
First, the task $v^*_i[j]\in\mathcal{V}'\cup\mathcal{V}$ is executed on processor $\pi_i$ at $t=j$ with $j\in\{0,1,2\}$ (if this task exists).
Consider the processors on which the $\mathcal{V}$-tasks are scheduled. By the previous allocation, these processors are numbered $\pi_1,\ldots,\pi_{2n(B+1)}$.
Let $\{\mathcal{A}_1,\ldots,\mathcal{A}_n\}$ be a partition of $\mathcal{A}$. Consider $\mathcal{A}_i=\{a_{i_1},a_{i_2},a_{i_3}\}$ for a fixed $i$. The tasks of $Z_{s(a_j)}$, $a_j\in\mathcal{A}_i$, are executed between processors $\pi_{1+2(i-1)(B+1)}$ and $\pi_{2i(B+1)}$. Moreover, the tasks $Z_{s(a_j)}[l,k]$, $k\in\{0,1\}$, $l\in J_0$ (resp., $k\in\{1,2\}$, $l\in J_1$) are scheduled on $2s(a_{i_j})$ processors in succession in order to respect a schedule of length three.
Thus, without loss of generality, we suppose that the tasks of $Z_{s(a_{i_1})}$ are scheduled between processors $\pi_{1+2(i-1)(B+1)}$ and $\pi_{2(i-1)(B+1)+2s(a_{i_1})}$. In a similar way, the tasks of $Z_{s(a_{i_2})}$ (resp., $Z_{s(a_{i_3})}$) are executed between processors $\pi_{2+2(i-1)(B+1)+2s(a_{i_1})}$ and $\pi_{1+2(i-1)(B+1)+2s(a_{i_1})+2s(a_{i_2})}$ (resp., $\pi_{2+2(i-1)(B+1)+2s(a_{i_1})+2s(a_{i_2})}$ and $\pi_{2i(B+1)}$).
(ii) Let us now assume that there is a schedule $S$ of length at most three. We will prove that $\mathcal{A}=\{a_1,\ldots,a_{3M}\}$ can be partitioned into $M$ disjoint subsets $\mathcal{A}_1,\ldots,\mathcal{A}_M$, each summing up to $B$.

Figure 5: The beginning of the construction of the $\mathcal{W}$ graph from $G^*\in\mathcal{G}$.
Figure 6: Next step of the construction of the $\mathcal{W}$ graph. Paths of length three are created and precedence constraints between tasks are added.
Figure 7: Partition of the $G^*$ graph into task sets $\mathcal{V}'$, $\mathcal{K}$, and $\mathcal{V}$.
Figure 8: The final $\mathcal{W}$ graph resulting from the successive transformations.

Lemma 3.8. In any valid schedule of length three there is no idle time.

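Proof. The number of processors is $m=n^*$ and the number of tasks is $3n^*$ ($4nB$ for the $\mathcal{Z}$-graph and $3n^*-4nB$ for the $\mathcal{W}$ graph). A schedule of length three on $m$ processors offers exactly $3m=3n^*$ unit slots, so every slot must be occupied.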

Lemma 3.9. In any valid schedule of length three, the subgraph induced by the $\mathcal{V}$ tasks must be executed on $2(B+1)$ processors in succession.

Proof. Consider the subgraph induced by the $\mathcal{V}$ tasks. This precedence graph admits paths of length two, and these paths must be executed on the same processor (no communication delay is allowed).
Consider the tasks of a path of length one. Let $v^*_i[0]\in\mathcal{V}$ be a task without predecessor. By construction, $v^*_i[0]$ admits one successor, denoted by $v^*_{i+1}[2]\in\mathcal{V}$.
Suppose that these two tasks are allotted to the same processor $\pi_l$. Since $v^*_{i+1}[2]$ admits another predecessor, denoted by $v^*_{i+2}[0]\in\mathcal{V}$, $v^*_{i+1}[2]$ is allotted at $t=2$.
The task $v^*_{i+2}[0]$ cannot be executed at $t=1$ on $\pi_l$, since this task admits $v^*_{i+1}[2]$ as a successor. Therefore, there exists an idle slot at $t=1$ on processor $\pi_l$. By construction there is no independent task, and since the $\mathcal{Z}$ graph admits only paths of length one, no task can be allotted to this idle slot. This is impossible by Lemma 3.8.
In conclusion, the subgraph induced by the $\mathcal{V}$ tasks must be executed on $2(B+1)$ processors in succession.

Lemma 3.10. In any valid schedule of length three, two subgraphs induced by the $\mathcal{V}$ tasks from two disjoint paths of length $2(B+1)$ cannot be allotted to the same processors.

Proof. Consider the $\mathcal{V}$ tasks which are elements of two disjoint paths of length $2(B+1)$. A task without predecessor of one path cannot be allotted to the same processor as a task without successor of the other path, since there is no isolated task to schedule.

Lemma 3.11. In any valid schedule of length three, the $Z_{s(a_j)}$ tasks must be executed on the same processors as the $\mathcal{V}$ tasks.

Proof. Let $\Pi=\{\pi_l\mid\mathcal{V}\text{ tasks allotted on }\pi_l\}$ be the set of processors on which the $\mathcal{V}$ tasks are executed.
Suppose that the $Z_{s(a_j)}$-tasks are executed on processors $\pi_k\notin\Pi$. By Lemma 3.8, there is no idle slot, so the tasks on the paths of length three are necessarily allotted to a processor $\pi^*\in\Pi$. This is impossible by Lemma 3.9.

With the previous lemmas, we know that $6n(B+1)$ tasks (the $\mathcal{V}$ tasks and the $Z_{s(a_j)}$-tasks) are executed on the $n$ disjoint paths of length $2B+1$. By Definition 3.2, we know that the graph $G^*$ admits a unique set of $n$ disjoint paths of length $2B+1$ with the desired properties. Moreover, with the precedence constraints, these tasks are allotted to a processor path of length $2B+1$. Without loss of generality, we suppose that a task $v_l^{\mathcal{V}}$ is executed on the processor $\pi_l$ with $l\in\{2n(B+1)+1,\ldots,n^*\}$.

Building the partition $\{\mathcal{A}_1,\ldots,\mathcal{A}_n\}$ with the desired property from a schedule $S$ of length three, we know that two tasks of the same subgraph $Z_{s(a_j)}$ (see Lemma 3.11) cannot be executed on two different paths, since the edge distance between the two processors would be at least two.

We define $\mathcal{A}_\ell$ such that $a_j\in\mathcal{A}_\ell$ if and only if the tasks of the graph $Z_{s(a_j)}$ are executed between the processors numbered $\pi_{1+(j-1)2(B+1)}$ to $\pi_{2j(B+1)}$ with a fixed $j$.

Now, we will compute $\sum_{a_i\in\mathcal{A}_\ell}s(a_i)$.

Using the previous remarks, without loss of generality, we suppose that $v^*_i[k]$ with $i\in\{1,\ldots,2n(B+1)\}$ and $k\in\{0,1,2\}$ (if it exists) is executed on $\pi_l$ with $l\in\{1,\ldots,2n(B+1)\}$. Consider the $Z_{s(a_j)}$-tasks which are scheduled between processors $\pi_{1+(j-1)2(B+1)}$ and $\pi_{2j(B+1)}$ for a fixed $j\in\{1,\ldots,2n(B+1)\}$, except the indices for which paths of length three constituted by tasks from $\mathcal{V}$ are allotted on $\pi_l$.

Using Lemma 3.9, we know that the number of $\mathcal{V}$ tasks executed on the processors between $\pi_{1+(j-1)2(B+1)}$ and $\pi_{2j(B+1)}$ for a fixed $j$ is $6+2B$.

In conclusion, $\{\mathcal{A}_1,\ldots,\mathcal{A}_n\}$ forms a partition of $\mathcal{A}$ with the desired properties.

The construction suggested previously can be easily adapted to obtain a bipartite graph of depth one. Moreover, from the proof of Theorem 3.5, we can derive the following theorem.

Theorem 3.12. The problem of deciding whether an instance of $(P,G^*)|\beta;c_{ij}=d(\pi_\ell,\pi_k)=1;p_i=1|C_{\max}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$.

Proof. The proof is similar to the proof of Theorem 3.5, considering the graph $\widetilde{G}'$ instead of the widget $\widetilde{G}$. Nevertheless, each path of length two induced by the $\mathcal{V}$ tasks is transformed into two paths of length one.
We use the same construction as the one proposed for the proof of Theorem 3.5. Nevertheless, all paths of length three are transformed into two paths in the following way: $v^*_i[0]\to v^*_i[1]$ and $v^*_i[0]\to v^*_i[2]$. These three tasks must be executed on the same processor. Indeed, if $v^*_i[2]$ admits several predecessors, it is obvious. Otherwise, suppose that $v^*_i[0]$ is allotted to a processor $\pi$. Then $v^*_i[1]$ must be executed at $t=1$ on $\pi$, and the task $v^*_i[2]$ is scheduled at $t=2$ on a neighboring processor. Therefore no task from the graphs $\mathcal{Z}$ and $\widetilde{G}'$ can be executed on processor $\pi$ at $t=2$. Now, using the same arguments as previously, there is a schedule of length three if and only if the set $\mathcal{A}=\{a_1,\ldots,a_{3n}\}$ can be partitioned into $n$ disjoint subsets $\mathcal{A}_1,\ldots,\mathcal{A}_n$, each summing up to $B$.

The proof of Theorem 3.5 therefore implies that the problem where the tasks can be duplicated is also $\mathcal{NP}$-complete.

Corollary 3.13. The problem of deciding whether an instance of $(P,G^*)|\beta;c_{ij}=d(\pi_\ell,\pi_k);p_i=1,\mathrm{dup}|C_{\max}$ with $G^*\in\mathcal{G}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$.

Proof. The proof comes directly from Theorems 3.5 and 3.12. In fact, Lemma 3.8 implies that no task can be duplicated (the number of the tasks is equal to the number of processors times 3).

Moreover, nonapproximability results can be deduced.

Corollary 3.14. No polynomial-time algorithm exists with a performance bound less than 4/3, unless $\mathcal{P}=\mathcal{NP}$, for the problems $(P,G^*)|\beta;c_{ij}=d(\pi_\ell,\pi_k);p_i=1|C_{\max}$ and $(P,G^*)|\beta;c_{ij}=d(\pi_\ell,\pi_k);p_i=1,\mathrm{dup}|C_{\max}$ with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$ and $G^*\in\mathcal{G}$.

Proof. The proof of Corollary 3.14 is an immediate consequence of the impossibility theorem; see [9, page 4].
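The gap argument behind this corollary can be spelled out as follows. This is a sketch of the standard reasoning, not a quotation of [9]; the symbols $A$, $\rho$, and $C^{A}_{\max}$ are introduced here for illustration only.

```latex
Suppose a polynomial-time algorithm $A$ guarantees
$C^{A}_{\max}(I) \le \rho\, C^{\mathrm{opt}}_{\max}(I)$ for every instance $I$, with $\rho < 4/3$.
\begin{align*}
C^{\mathrm{opt}}_{\max}(I) \le 3
  &\;\Longrightarrow\; C^{A}_{\max}(I) \le 3\rho < 4
   \;\Longrightarrow\; C^{A}_{\max}(I) \le 3
   \quad\text{(schedule lengths are integers)},\\
C^{\mathrm{opt}}_{\max}(I) \ge 4
  &\;\Longrightarrow\; C^{A}_{\max}(I) \ge 4 .
\end{align*}
Hence $A$ would decide in polynomial time whether $C_{\max} \le 3$, which is
$\mathcal{NP}$-complete by Theorems 3.5 and 3.12, so $\mathcal{P} = \mathcal{NP}$.
```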

3.2. Discussion

In the previous section, we proposed a class graph $\mathcal{G}$ for which the problem of deciding whether an instance of $(P,G^*)|\beta;c_{ij}=d(\pi_\ell,\pi_k);p_i=1|C_{\max}$ has a schedule of length at most three is $\mathcal{NP}$-complete with $\beta\in\{\mathrm{prec},\mathrm{bipartite}\}$ and $G^*\in\mathcal{G}$.

Hereafter, we will exhibit the parameters $(L,k)$ for some classic structured graphs in order to prove that the class graph $\mathcal{G}$ is not empty. (i) For a grid $G^*=\mathrm{Grid}(m,p)$ ($m,p\in\mathbb{N}$, where the couple $(i,j)$ designates the $j$th position in the $i$th line; $1\le i\le m$, $1\le j\le p$) (or torus) topology, we need $k=2n+1$ lines and $L=2B+2$ columns. The set of vertices of the graph $G$, a subgraph of $G^*$ with the desired properties given by Definition 3.2, is $V=\{(i,j),\ 2\le i\le 2n,\ i\text{ even},\ 2\le j\le 2B+3\}$, and $V'=\{(i,1),\ 1\le i\le 2n+1\}\cup\{(i,j),\ 1\le i\le 2n+1,\ i\text{ odd};\ 1\le j\le 2B+3\}$. (ii) For the complete binary tree, it is sufficient to consider a tree with height $\lceil\log(n)\rceil+2B+1$. (iii) For the hypercube $H(d)$ topology (or cube connected cycles), it is sufficient to have $d=2\lceil\log(n)\rceil+B+2$. (iv) ….
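For the grid case of item (i), the set V can be enumerated directly to confirm that it contains 2n(B+1) vertices arranged as n rows that are pairwise non-adjacent in the grid. The sketch below assumes the usual 4-neighbor grid adjacency; the function names and the values of n and B are hypothetical.

```python
def grid_prism_vertices(n, B):
    """Set V of item (i): even rows 2..2n, columns 2..2B+3 (1-based coordinates)."""
    return {(i, j) for i in range(2, 2 * n + 1, 2) for j in range(2, 2 * B + 4)}

def rows_are_independent(vertices):
    """True if no grid edge joins two vertices lying on different rows."""
    return not any(
        u[0] != v[0] and abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1
        for u in vertices
        for v in vertices
    )

n, B = 2, 3
V = grid_prism_vertices(n, B)
print(len(V) == 2 * n * (B + 1), rows_are_independent(V))
```

Each retained row contributes 2B+2 consecutive vertices, that is, a path of length 2B+1 in the sense of Remark 3.7, and the odd rows placed in V' keep the retained rows apart.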

4. An Approximation Algorithm for Processor Networks with a Fixed Diameter

4.1. Description and Correctness of an Algorithm

In order to design an efficient polynomial-time approximation algorithm, the classic strategy consists of taking an instance of the combinatorial optimization problem and applying some transformations and/or using polynomial-time algorithms as subroutines (shortest path, spanning tree, maximum matching, etc.). Afterwards, it is sufficient to evaluate the best lower bound for any optimal solution, and this lower bound may be compared to the feasible solution for the combinatorial optimization problem in order to determine the ratio of an approximation algorithm.

Here, instead of considering an instance $I$ and trying to directly develop a feasible solution for the $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$ problem, we consider a partial instance of $I$, denoted $I^*$. (An instance $I$ consists of a precedence graph with unit execution times and unit communication times, and $m$ processors in the form of a graph $G$ with the distance function. The partial instance $I^*$ of $I$ consists only of the precedence graph with unitary tasks and unitary communication times.) For any instance $I^*$, we use the classic approximation algorithm proposed by Munier and König [10] for the $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$ problem. We obtain a feasible schedule, denoted $S$, for that problem (we omit consideration of the processor graph for the moment). Nevertheless, this solution is not feasible for our scheduling problem.

We proceed with a polynomial-time chain of transformations, from schedule $S$ to a schedule $S''$, in order to get a feasible schedule. It is only in the last step, for schedule $S''$, that we guarantee a feasible schedule for the problem $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$.

This chain is defined as follows: $I^*\xrightarrow{f}S\xrightarrow{g}S'\xrightarrow{h}S''$ (the schedule $S''$ is a feasible solution for the $\{P,G\}|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$ problem), where $f$ is the Munier-König algorithm [10], $g$ the dilatation algorithm (see [11] for details or Appendix A), and $h$ the folding algorithm (see [12] for details or Appendix B).

Subsequently, we will consider the three following scheduling problems: (i) $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$, (ii) $P|\mathrm{prec};c_{ij}\ge 2;p_i=1|C_{\max}$, (iii) and finally $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$.

The principal steps of the algorithm are described below.

The approximation algorithm uses three steps. In each step we apply an algorithm for a specified scheduling problem [10–12]. In the first two steps, a schedule is produced (these schedules are not feasible for our problem). (i) In the first step, a schedule (denoted $S$, on an unbounded number of processors) for the scheduling problem $P|\mathrm{prec};c_{ij}=1;p_i=1|C_{\max}$ is produced. For this problem, Munier and König [10] presented a (4/3)-approximation algorithm based on an integer linear programming formulation. They use the following procedure: an integrality constraint is relaxed, and a feasible schedule is produced by rounding. (ii) The second step produces a schedule (denoted $S'$, also on an unbounded number of processors) from $S$ by applying the dilatation principle proposed in [11] for the problem $P|\mathrm{prec};c_{ij}\ge 2;p_i=1|C_{\max}$ (this algorithm produces a feasible schedule for the large communication delay problem from a schedule for unitary communication delays). We therefore have $S'=g(S)$, where $g$ is the dilatation algorithm. (iii) The third step produces a schedule $S''$ (feasible for the $(P,G^*)|\mathrm{prec};c_{ij}=d(\pi_k,\pi_l)=1;p_i=1|C_{\max}$ problem) on the $G$ topology from $S'$ using the folding principle [12]. The folding procedure constructs a feasible schedule on a restricted number of processors from a feasible schedule on an unbounded number of processors. Thus, $S''=h(S')$, with $h$ being the folding algorithm.

Note that the length of schedule $S$ is at most that of $S'$, which is at most that of $S''$. The three steps are summarized in Figure 9. The notation is described in the proof of Theorem 4.2.

Figure 9: Description of chain of polynomial-time transformations.

Theorem 4.1. The previous algorithm yields a feasible schedule for the problem $(P,G^*)|\mathrm{prec}; c_{ij}=d(\pi_k,\pi_l)=1; p_i=1|C_{\max}$.

Proof. The proof follows from the previous description of the algorithm. Indeed, the communication delays are preserved and the precedence constraints are respected. Moreover, at most $m$ tasks are executed at any time.

4.2. Relative Performance Analysis

Theorem 4.2. The problem $(P,G^*)|\mathrm{prec}; c_{ij}=d(\pi_k,\pi_l)=1; p_i=1|C_{\max}$ can be approximated within a factor of $((\delta+1)^2/3)+1$ using the previous algorithm.

Proof. We use $C_{\max}^{x,y,z}$, with $x\in\{\mathrm{opt},\emptyset\}$, $y\in\{\text{UET-UCT},\text{UET-LCT}(c=\delta),G^*\}$, and $z\in\{m,\infty\}$, to denote the length of the schedule. Moreover, $\rho^{G^*,m}$ (resp., $\rho^{G^*,\infty}$) designates the performance ratio on a $G^*$ processor network model with a bounded (resp., unbounded) number of processors.
Now let us examine the relative performance of this algorithm.
(i) The first step of the algorithm deals with the problem $P|\mathrm{prec}; c_{ij}=1; p_i=1|C_{\max}$. The schedule (UET-UCT, $\infty$) that it produces is not necessarily optimal; the algorithm from [10] has a relative performance of $4/3$. Thus, by [10], we know that
$$C_{\max}^{\text{UET-UCT},\infty}\leq\frac{4}{3}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}. \tag{4.1}$$
(ii) In the second step, a feasible solution for a large communication delay $c=\delta$ (recall that $\delta$ stands for the diameter of the processor network) is created using the dilatation algorithm. The expansion coefficient is $(\delta+1)/2$ [11]. Thus,
$$C_{\max}^{\text{UET-LCT}(c=\delta),\infty}\leq\frac{\delta+1}{2}\cdot\frac{4}{3}\,C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}, \tag{4.2}$$
$$C_{\max}^{\text{UET-LCT}(c=\delta),\infty}\leq\frac{2(\delta+1)}{3}\,C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}. \tag{4.3}$$
We therefore have a schedule on a UET-LCT task system with a communication delay equal to $\delta$ and an infinite number of processors.
By definition, it is obvious that
$$C_{\max}^{G^*,\infty}\leq C_{\max}^{\text{UET-LCT}(c=\delta),\infty}, \tag{4.4}$$
$$C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}\leq C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}. \tag{4.5}$$
It is necessary to evaluate the gap between the optimal length of a schedule on a fully connected processor graph and on a processor graph with a diameter of length $K$. For this, we consider unit tasks subject to precedence constraints and an unbounded number of processors.

Lemma 4.3. The gap between a schedule on a fully connected graph of processors with a large communication delay $c$ for all pairs of tasks and a schedule on a graph of processors with a diameter of length $K\in\mathbb{N}$ is at most $(c+1)/2$.

Proof. We first need to compare the relative performance of this schedule on our model with a processor network. The relative performance for the UET-LCT task system is not valid for our model, so we need to compute a new bound for this schedule on our model.
Let $p=\{x_1,x_2,\ldots,x_n\}$ be a critical path of the schedule (i.e., a path that gives the length of the schedule). Suppose that there is a communication delay between each pair of tasks $(x_i,x_{i+1})$ with $1\leq i<n$. In the UET-LCT task system (with a communication delay equal to $c$ for all pairs of tasks), the length of the schedule would be $(1+c)n-c$ units of time. In the graph of processors with a diameter of length $K$, the same path allows a length of $(K/2)(n-1)+n$ units of time. The worst case of the length for this path is $n+(n-1)K$ and the best case is $2n-1$. So, the ratio is $(n(1+c)-c)/(2n-1)$. For large $n$, this ratio tends to $(c+1)/2$, which is the desired result.
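The limit at the end of the proof can be checked numerically. The following is a minimal sketch, assuming only the two path lengths stated in the proof; the function name `path_ratio` and the sample value of `c` are illustrative choices, not part of the paper.

```python
from fractions import Fraction


def path_ratio(n: int, c: int) -> Fraction:
    """Ratio from the proof of Lemma 4.3 for a critical path of n unit tasks."""
    lct_length = (1 + c) * n - c      # n tasks plus (n - 1) delays of length c
    best_network_length = 2 * n - 1   # n tasks plus (n - 1) delays of length 1
    return Fraction(lct_length, best_network_length)


if __name__ == "__main__":
    c = 3  # illustrative delay, not a value taken from the paper
    limit = Fraction(c + 1, 2)
    for n in (2, 10, 100, 1000):
        ratio = path_ratio(n, c)
        # The difference to (c + 1) / 2 shrinks as n grows.
        print(n, ratio, float(limit - ratio))
```

Exact rational arithmetic is used so that the comparison with $(c+1)/2$ is not blurred by rounding.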

By applying Lemma 4.3, which is valid for all schedules, and in particular for the optimum, with $c=\delta$, we obtain
$$C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}\leq\frac{\delta+1}{2}\,C_{\max}^{\mathrm{opt},G^*,\infty}, \tag{4.6}$$
and so
$$C_{\max}^{G^*,\infty}\leq C_{\max}^{\text{UET-LCT}(c=\delta),\infty}\quad\text{by (4.4)}, \tag{4.7}$$
$$C_{\max}^{G^*,\infty}\leq\frac{2(\delta+1)}{3}\,C_{\max}^{\mathrm{opt},\text{UET-LCT}(c=\delta),\infty}\quad\text{using (4.3)}, \tag{4.8}$$
$$C_{\max}^{G^*,\infty}\leq\frac{(\delta+1)^2}{3}\,C_{\max}^{\mathrm{opt},G^*,\infty}\quad\text{using (4.6)}, \tag{4.9}$$
$$\rho^{G^*,\infty}\leq\frac{(\delta+1)^2}{3}. \tag{4.10}$$
Now we have to transform this schedule, which uses an infinite number of processors, into a schedule with a bounded number of processors. This can be done easily using the method from [12]. The new worst-case relative performance is increased by just one. Thus, we have
$$\rho^{G^*,m}\leq\rho^{G^*,\infty}+1\leq\frac{(\delta+1)^2}{3}+1. \tag{4.11}$$
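The way the individual factors combine in (4.3), (4.10), and (4.11) can be traced with a few lines of exact arithmetic. This is only an illustrative sketch; the function name `ratio_bounds` and its return layout are assumptions made for the example.

```python
from fractions import Fraction


def ratio_bounds(delta: int):
    """Return the bounds of (4.3), (4.10), and (4.11) for a diameter delta."""
    munier_koenig = Fraction(4, 3)        # (4.1): UET-UCT, unbounded processors
    dilatation = Fraction(delta + 1, 2)   # expansion coefficient with c = delta
    gap = Fraction(delta + 1, 2)          # Lemma 4.3 with c = delta
    rho_lct = munier_koenig * dilatation  # (4.3): 2(delta + 1) / 3
    rho_unbounded = rho_lct * gap         # (4.10): (delta + 1)^2 / 3
    rho_bounded = rho_unbounded + 1       # (4.11): folding adds one
    return rho_lct, rho_unbounded, rho_bounded


if __name__ == "__main__":
    for delta in (1, 2, 3, 4):
        print(delta, ratio_bounds(delta))
```

For instance, with $\delta=2$ the three values are $2$, $3$, and $4$.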

Remark 4.4. Note that the order of the operations may be modified. Nevertheless, the ratio becomes $7/6\times(\delta+1)^2$. Indeed, the folding principle may be used just after the solution given by the algorithm proposed by Munier and König [10]. We then obtain a schedule on $m$ processors. Afterwards, we apply the dilatation principle. This order yields a polynomial-time approximation algorithm with a ratio bounded by $7/6\times(\delta+1)^2$.

Remark 4.5. We may recall two classic results in scheduling for which the performance ratio increases by one between the unbounded and bounded versions.

(1) When the number of processors is unlimited, the problem of scheduling a set of $n$ tasks under precedence constraints with no communication delay is polynomial. It is sufficient to use the classical algorithm given by Bellman [13], or the two techniques widely used in project management: CPM (Critical Path Method) and PERT (Project/Program Evaluation and Review Technique). In contrast, when the number of processors is limited, the problem becomes $\mathcal{NP}$-complete, and a $(2-1/m)$-approximation algorithm was developed by Graham [14], where $m$ designates the number of processors; it is based on list scheduling in which no order on the tasks is specified.

(2) The second illustration is given by the transition from the unrestricted version of UET-UCT to the restricted variant. From [10], we know that a $4/3$-approximation algorithm exists for the unrestricted version. Using this result, Munier and Hanen [15] designed a $7/3$-approximation algorithm for the restricted version.
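To illustrate the polynomial case recalled in item (1), the following minimal sketch computes earliest start times for unit tasks on an unlimited number of processors by a longest-path pass over the precedence graph, in the spirit of CPM. The function name `earliest_starts`, the integer task labels, and the arc-list input format are assumptions made for this example only.

```python
from collections import deque


def earliest_starts(n, arcs, duration=1):
    """Earliest start of each task 0..n-1 when processors are unlimited.

    arcs is a list of precedence pairs (i, j) meaning i must finish before j.
    There is no communication delay in this model.
    """
    successors = [[] for _ in range(n)]
    in_degree = [0] * n
    for i, j in arcs:
        successors[i].append(j)
        in_degree[j] += 1

    start = [0] * n
    ready = deque(v for v in range(n) if in_degree[v] == 0)
    processed = 0
    while ready:  # topological order (Kahn's algorithm)
        v = ready.popleft()
        processed += 1
        for w in successors[v]:
            start[w] = max(start[w], start[v] + duration)
            in_degree[w] -= 1
            if in_degree[w] == 0:
                ready.append(w)
    if processed != n:
        raise ValueError("the precedence graph contains a cycle")
    return start


if __name__ == "__main__":
    starts = earliest_starts(5, [(0, 2), (1, 2), (2, 3), (1, 4)])
    print(starts, "makespan:", max(starts) + 1)
```

The makespan obtained this way equals the length of a critical path, which is optimal when the number of processors is unlimited.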

5. Conclusion

We have sharpened the demarcation line between the polynomially solvable and $\mathcal{NP}$-hard cases of the central scheduling problem (UET-UCT) on a structured processor network by showing that its decision version is polynomially solvable for $C_{\max}\leq 2$, while it is $\mathcal{NP}$-complete for $C_{\max}\geq 3$. This result is given for a large class of graphs with a nonconstant diameter. It implies that there is no $\rho$-approximation algorithm with $\rho<4/3$. These results are extended to the case where the precedence graph is bipartite.

Lastly, we complete our complexity results by developing a polynomial-time approximation algorithm for $(P,G^*)|\mathrm{prec}; c_{ij}=d(\pi_k,\pi_l)=1; p_i=1|C_{\max}$ with a worst-case relative performance of $(\delta+1)^2/3+1$, where $\delta$ designates the diameter of the graph. An interesting question for further research is to find a polynomial-time approximation algorithm with a performance guarantee $\rho$ with $\rho\in\mathbb{R}$.

Appendices

A.

This section describes the dilatation principle. This principle has been studied in [11] and used for designing a new polynomial-time approximation algorithm with a nontrivial performance guarantee for the problem $P|\mathrm{prec}; c_{ij}=c\geq 2; p_i=1|C_{\max}$. For the latter problem, the authors propose a $(2(c+1)/3)$-approximation algorithm, obtained with the expansion coefficient $(c+1)/2$ (the best ratio as far as we know; see Theorem A.3).

A.1. Introduction, Notation, and Description of the Method

Notation 1. We use $\sigma^{\infty}$ to denote the UET-UCT schedule and $\sigma_c^{\infty}$ to denote the UET-LCT schedule. Moreover, we use $t_i$ (resp., $t_i^c$) to denote the starting time of task $i$ in schedule $\sigma^{\infty}$ (resp., in schedule $\sigma_c^{\infty}$).

Principle
The tasks in $\sigma_c^{\infty}$ keep the same assignment as in the feasible schedule $\sigma^{\infty}$ on an unbounded number of processors. We proceed to an expansion of the makespan, while preserving the communication delay ($t_j^c\geq t_i^c+1+c$) for two tasks $i$ and $j$, with $(i,j)\in E$, processed on two different processors. For this, each starting time $t_i$ is multiplied by a factor $d$ to obtain $t_i^c$.
In the following section, we will justify and determine the coefficient $d$.
More formally, let $G=(V,E)$ be a precedence graph. We determine a feasible schedule $\sigma^{\infty}$ for the UET-UCT model using the $(4/3)$-approximation algorithm proposed by Munier and König [10]. The result of this algorithm is a pair of values $(t_i,\pi)$ for all $i\in V$ on the schedule $\sigma^{\infty}$, with $t_i$ being the starting time of task $i$ in the schedule $\sigma^{\infty}$ and $\pi$ the processor on which task $i$ is processed at $t_i$.
From this solution, we derive a solution for the problem with large communication delays. For this, we propose a new pair of values $(t_i^c,\pi')$ for all $i\in V$, derived from the pair $(t_i,\pi)$. The new pairs are computed as follows: the start time is $t_i^c=d\times t_i=((c+1)/2)\,t_i$, and $\pi'=\pi$. In other words, every task in the schedule $\sigma_c^{\infty}$ is allotted to the same processor as in the schedule $\sigma^{\infty}$, and the starting time of a task $i$ is scaled by the factor $(c+1)/2$. The justification of the expansion coefficient is given below. An illustration of the expansion is given in Figure 10.
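The expansion step itself is a single pass over the schedule. The following minimal sketch assumes a schedule stored as a dictionary mapping each task to a pair (start time, processor); this representation and the name `dilate` are illustrative assumptions, not the data structures of [11].

```python
from fractions import Fraction


def dilate(schedule, c):
    """Scale every start time by d = (c + 1) / 2 and keep the processor.

    schedule maps a task to (start_time, processor) in the UET-UCT schedule.
    Fractions are used because d is not an integer when c is even.
    """
    d = Fraction(c + 1, 2)
    return {task: (d * t, proc) for task, (t, proc) in schedule.items()}


if __name__ == "__main__":
    # Hypothetical UET-UCT schedule: a and b share processor 0, c runs on 1.
    sigma = {"a": (0, 0), "b": (1, 0), "c": (2, 1)}
    print(dilate(sigma, 3))
```

With $c=3$ the factor is $d=2$, so the start times $0,1,2$ become $0,2,4$ while every task stays on its processor.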

Figure 10: Illustration of the notion of expansion.
A.2. Feasibility, Analysis of the Method, and Computation of the Ratio

In what follows, we justify the existence of the coefficient $d$. Moreover, we prove that the resulting schedule is feasible for the $P|\mathrm{prec}; c_{ij}=c\geq 2; p_i=1|C_{\max}$ problem. Lastly, we propose a worst-case analysis of the algorithm.

Lemma A.1. The coefficient of the expansion is $d=(c+1)/2$.

Proof. Let $i$ and $j$ be two tasks such that $(i,j)\in E$, which are processed on two different processors in the feasible schedule $\sigma^{\infty}$. We are interested in obtaining a coefficient $d$ such that $t_i^c=d\times t_i$ and $t_j^c=d\times t_j$. After expansion, in order to respect the precedence constraints and the communication delay, we must have $t_j^c\geq t_i^c+1+c$, and so $d\times t_j-d\times t_i\geq c+1$, that is, $d\geq(c+1)/(t_j-t_i)$. Since $\sigma^{\infty}$ is feasible for the UET-UCT model and the two tasks are on different processors, $t_j-t_i\geq 2$, so the constraint holds whenever $d\geq(c+1)/2$. It is sufficient to choose $d=(c+1)/2$.

Lemma A.2. The expansion algorithm gives a feasible schedule for the $P|\mathrm{prec}; c_{ij}=c\geq 2; p_i=1|C_{\max}$ problem in $O(n)$.

Proof. It is sufficient to check that the solution given by the expansion algorithm is a feasible schedule for the UET-LCT model. Let $i$ and $j$ be two tasks such that $(i,j)\in E$. We use $\pi_i$ (resp., $\pi_j$) to denote the processor on which task $i$ (resp., task $j$) is executed in schedule $\sigma^{\infty}$. Moreover, we use $\pi'_i$ (resp., $\pi'_j$) to denote the processor on which task $i$ (resp., task $j$) is executed in schedule $\sigma_c^{\infty}$. Thus,
(i) if $\pi_i=\pi_j$, then $\pi'_i=\pi'_j$. Since the solution given by Munier and König [10] is a feasible schedule for the UET-UCT model, we have $t_i+1\leq t_j$, that is, $(2/(c+1))\,t_i^c+1\leq(2/(c+1))\,t_j^c$, and hence $t_i^c+1\leq t_i^c+(c+1)/2\leq t_j^c$;
(ii) if $\pi_i\neq\pi_j$, then $\pi'_i\neq\pi'_j$. We have $t_i+1+1\leq t_j$, that is, $(2/(c+1))\,t_i^c+2\leq(2/(c+1))\,t_j^c$, and hence $t_i^c+(c+1)\leq t_j^c$.
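The two cases of the proof translate directly into a feasibility test. The sketch below is an illustration under assumed conventions: a schedule is a dictionary from task to (start time, processor), `arcs` lists the precedence pairs, and the helper name `is_feasible` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def is_feasible(schedule, arcs, delay):
    """Check unit tasks against precedence and communication constraints."""
    # Two unit tasks on the same processor must not overlap in time.
    for (_, (t_a, p_a)), (_, (t_b, p_b)) in combinations(schedule.items(), 2):
        if p_a == p_b and abs(t_a - t_b) < 1:
            return False
    for i, j in arcs:
        t_i, p_i = schedule[i]
        t_j, p_j = schedule[j]
        # Case (i): same processor, no delay. Case (ii): delay is paid.
        gap = 1 if p_i == p_j else 1 + delay
        if t_j < t_i + gap:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical UET-UCT schedule and precedence arcs.
    sigma = {"a": (0, 0), "b": (1, 0), "c": (2, 1)}
    arcs = [("a", "b"), ("a", "c")]
    c = 4
    d = Fraction(c + 1, 2)
    sigma_c = {task: (d * t, proc) for task, (t, proc) in sigma.items()}
    print(is_feasible(sigma, arcs, 1))    # feasible for unit delays
    print(is_feasible(sigma, arcs, c))    # not feasible for delay c as is
    print(is_feasible(sigma_c, arcs, c))  # feasible after the expansion
```

The example shows that the original schedule violates the delay $c$ on the arc between different processors, and that scaling by $(c+1)/2$ restores feasibility.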

Theorem A.3. The expansion algorithm is a $(2(c+1)/3)$-approximation algorithm for the problem $P|\mathrm{prec}; c_{ij}=c\geq 2; p_i=1|C_{\max}$.

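Proof. We use $C_{\max}^{h,\text{UET-UCT},\infty}$ (resp., $C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}$) to denote the makespan of the schedule computed by the algorithm of Munier and König (resp., the optimal value of a schedule $\sigma^{\infty}$). In the same way, we use $C_{\max}^{h^*,\text{UET-LCT},\infty}$ (resp., $C_{\max}^{\mathrm{opt},\text{UET-LCT},\infty}$) to denote the makespan of the schedule computed by our algorithm (resp., the optimal value of a schedule $\sigma_c^{\infty}$).
We know that
$$C_{\max}^{h,\text{UET-UCT},\infty}\leq\frac{4}{3}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}. \tag{A.1}$$
Thus, we obtain
$$\frac{C_{\max}^{h^*,\text{UET-LCT},\infty}}{C_{\max}^{\mathrm{opt},\text{UET-LCT},\infty}}=\frac{((c+1)/2)\,C_{\max}^{h,\text{UET-UCT},\infty}}{C_{\max}^{\mathrm{opt},\text{UET-LCT},\infty}}\leq\frac{((c+1)/2)\,C_{\max}^{h,\text{UET-UCT},\infty}}{C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}}\leq\frac{((c+1)/2)(4/3)\,C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}}{C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}}\leq\frac{2(c+1)}{3}. \tag{A.2}$$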

B.

In this section, we present a simple algorithm which gives a schedule $\sigma^m$ on $m$ machines from a schedule $\sigma^{\infty}$ on an unbounded number of processors for $P|\mathrm{prec}; c_{ij}=1; p_i=1|C_{\max}$. Let $X_i$ be the set of tasks executed at $t_i$ in $\sigma^{\infty}$ using a heuristic $h^*$. The tasks of $X_i$ are executed in $\lceil|X_i|/m\rceil$ units of time in the schedule $\sigma^m$. We apply this procedure for all $i=0,\ldots,C_{\max}^{h^*,\text{UET-UCT},\infty}-1$. The validity of this algorithm is based on the fact that there is at most a matching between the tasks executed at $t_i$ and the tasks processed at $t_i+1$ (called Brent's lemma, see [12]).
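The time-slot part of this folding procedure can be sketched as follows. The sketch assumes integer start times given as a dictionary from task to start time, and the name `fold` is hypothetical. It only computes the new start times and the folded makespan; the processor placement inside each block, which relies on the matching property to keep communication delays valid, is not modeled here.

```python
from collections import defaultdict


def fold(start_times, m):
    """Fold a schedule on unbounded processors onto m processors.

    start_times maps each task to its integer start time in sigma_inf.
    Tasks sharing a start time form a block X_i, which is executed in
    ceil(|X_i| / m) consecutive time units. Returns (new_starts, makespan).
    """
    blocks = defaultdict(list)
    for task, t in start_times.items():
        blocks[t].append(task)

    new_starts = {}
    offset = 0
    for t in sorted(blocks):
        tasks = blocks[t]
        for k, task in enumerate(tasks):
            new_starts[task] = offset + k // m  # m tasks per time unit
        offset += -(-len(tasks) // m)           # ceiling of |X_i| / m
    return new_starts, offset


if __name__ == "__main__":
    sigma_inf = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
    print(fold(sigma_inf, 2))
```

In the example, the three tasks starting at time 0 need two time units on two processors, so the two tasks of the next block start at time 2 and the folded makespan is 3.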

Theorem B.1. From any polynomial-time algorithm $h^*$ with performance guarantee $\rho^{\infty}$ (i.e., $C_{\max}^{h^*,\text{UET-UCT},\infty}\leq\rho^{\infty}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}$) for the problem $P|\mathrm{prec}; c_{ij}=1; p_i=1|C_{\max}$ on an unbounded number of processors, we may obtain a polynomial-time algorithm $h$ with performance guarantee $\rho^m=(1+\rho^{\infty})$ for the problem $P|\mathrm{prec}; c_{ij}=1; p_i=1|C_{\max}$ on $m$ processors.

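Proof. Let $C_{\max}^{h^*,\text{UET-UCT},\infty}$ (resp., $C_{\max}^{h,\text{UET-UCT},m}$) be the length of the schedule given by $h^*$ (resp., by $h$). In the same way, let $C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}$ (resp., $C_{\max}^{\mathrm{opt},\text{UET-UCT},m}$) be the optimal length of the schedule on an unbounded number of processors (resp., on a restricted number of processors). We denote by $n$ the number of tasks in the schedule. Clearly, $C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}\leq C_{\max}^{\mathrm{opt},\text{UET-UCT},m}$ and $C_{\max}^{h^*,\text{UET-UCT},\infty}\leq\rho^{\infty}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}$. So,
$$\begin{aligned}
C_{\max}^{h,\text{UET-UCT},m}&\leq\sum_{i=0}^{C_{\max}^{h^*,\text{UET-UCT},\infty}-1}\left\lceil\frac{|X_i|}{m}\right\rceil\leq\sum_{i=0}^{C_{\max}^{h^*,\text{UET-UCT},\infty}-1}\left(\left\lfloor\frac{|X_i|}{m}\right\rfloor+1\right),\\
C_{\max}^{h,\text{UET-UCT},m}&\leq\sum_{i=0}^{C_{\max}^{h^*,\text{UET-UCT},\infty}-1}\left(\frac{|X_i|}{m}\right)+C_{\max}^{h^*,\text{UET-UCT},\infty},\\
C_{\max}^{h,\text{UET-UCT},m}&\leq C_{\max}^{\mathrm{opt},\text{UET-UCT},m}+C_{\max}^{h^*,\text{UET-UCT},\infty},\\
C_{\max}^{h,\text{UET-UCT},m}&\leq C_{\max}^{\mathrm{opt},\text{UET-UCT},m}+\rho^{\infty}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},m},\\
\rho^m&\leq(1+\rho^{\infty}).
\end{aligned} \tag{B.1}$$
The third inequality uses $\sum_i|X_i|=n$ and $n/m\leq C_{\max}^{\mathrm{opt},\text{UET-UCT},m}$; the fourth uses $C_{\max}^{h^*,\text{UET-UCT},\infty}\leq\rho^{\infty}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},\infty}\leq\rho^{\infty}\,C_{\max}^{\mathrm{opt},\text{UET-UCT},m}$. This concludes the proof of Theorem B.1.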
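The first two inequalities of (B.1) can be checked on concrete block sizes. The following minimal sketch assumes that the unbounded schedule is summarized by the list of block sizes $|X_i|$, one entry per time unit; the function name `folding_bound_terms` is an illustrative choice.

```python
from fractions import Fraction


def folding_bound_terms(block_sizes, m):
    """Return (folded length, n / m + unbounded length) for given |X_i|."""
    folded_length = sum(-(-x // m) for x in block_sizes)  # sum of ceilings
    n = sum(block_sizes)                                  # number of tasks
    unbounded_length = len(block_sizes)                   # one unit per block
    upper_bound = Fraction(n, m) + unbounded_length
    return folded_length, upper_bound


if __name__ == "__main__":
    folded, bound = folding_bound_terms([3, 2, 5, 1], 2)
    print(folded, bound, folded <= bound)
```

Here the folded length is 7, while $n/m$ plus the unbounded length is $19/2$, in line with the second inequality of (B.1).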

References

  1. R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan, "Optimization and approximation in deterministic sequencing and scheduling: a survey," Annals of Discrete Mathematics, vol. 5, pp. 287–326, 1979.
  2. C. Picouleau, "UET-UCT schedules on arbitrary networks," Tech. Rep., LITP, Blaise Pascal, Université Paris VI, 1994.
  3. R. Giroudeau, J. C. König, and B. Valéry, "Scheduling uet-tasks on a star network: complexity and approximation," 4OR A Quarterly Journal of Operations Research, vol. 9, no. 1, pp. 29–48, 2011.
  4. V. Boudet, Y. Cohen, R. Giroudeau, and J. C. Konig, "Complexity results for scheduling problem with non trivial topology of processors," Tech. Rep. 06050, LIRMM, 2006, submitted to Rairo-RO.
  5. C. Lahlou, "Scheduling with unit processing and communication times on a ring network: approximation results," in Proceedings of Europar, pp. 539–542, Springer, New York, NY, USA, 1996.
  6. V. J. Rayward-Smith, "UET scheduling with unit interprocessor communication delays," Discrete Applied Mathematics, vol. 18, no. 1, pp. 55–71, 1987.
  7. J. J. Hwang, Y.-C. Chow, F. D. Anger, and C.-Y. Lee, "Scheduling precedence graphs in systems with interprocessor communication times," SIAM Journal on Computing, vol. 18, no. 2, pp. 244–257, 1989.
  8. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of 𝒩𝒫-Completeness, A Series of Books in the Mathematical Science, W. H. Freeman, San Francisco, Calif, USA, 1979.
  9. P. Chrétienne and C. Picouleau, Scheduling Theory and Its Applications, Scheduling with Communication Delays: A Survey, chapter 4, John Wiley & Sons, Chichester, UK, 1995.
  10. A. Munier and J. C. König, "A heuristic for a scheduling problem with communication delays," Operations Research, vol. 45, no. 1, pp. 145–148, 1997.
  11. R. Giroudeau, J.-C. Konig, F. K. Moulai, and J. Palaysi, "Complexity and approximation for precedence constrained scheduling problems with large communication delays," Theoretical Computer Science, vol. 401, no. 1–3, pp. 107–119, 2008.
  12. R. P. Brent, "The parallel evaluation of general arithmetic expressions," Journal of the Association for Computing Machinery, vol. 21, pp. 201–206, 1974.
  13. R. Bellman, "On a routing problem," Quarterly of Applied Mathematics, vol. 16, pp. 87–90, 1958.
  14. R. Graham, "Bounds for certain multiprocessing anomalies," Bell System Technical Journal, vol. 45, pp. 1563–1581, 1966.
  15. A. Munier and C. Hanen, "An approximation algorithm for scheduling unitary tasks on m processors with communication delays," private communication, 1996.