#### Abstract

This paper focuses on real-time nonpreemptive multiprocessor scheduling with precedence and strict periodicity constraints. Since this problem is NP-hard, several approaches have been proposed to solve it. In addition, because of the periodicity constraints, our problem is a decision problem which consists in determining whether a solution exists or not. Therefore, the first criterion on which the proposed heuristic is evaluated is its schedulability; the second is its execution time. Hence, we performed a schedulability analysis which leads to a necessary and sufficient schedulability condition for determining whether a task satisfies its precedence and periodicity constraints on a processor where other tasks have already been scheduled. We also present two multiperiodic applications.

#### 1. Introduction

The central concern of hard real-time systems is to guarantee both the functional and the temporal correctness of system execution. Hard real-time scheduling has been concerned with providing guarantees of temporal feasibility of task execution under all circumstances. A scheduling algorithm is defined as a set of rules defining the execution of tasks at system run-time. It is provided thanks to a schedulability analysis, which determines whether a set of tasks with parameters describing their temporal behavior will meet their temporal constraints if executed at run-time according to the rules of the scheduling algorithm. The result of such a test is typically a yes or no answer indicating whether a solution will be found or not. These schemes and tests demand precise assumptions about task properties, which must hold for the entire system lifetime.

In order to assist designers, scientists at INRIA proposed a methodology called AAA (Algorithm Architecture Adequation) and its associated system-level CAD software called SynDEx. Together they cover the whole development cycle, from the specification of the application functions to their implementation running in real-time on a distributed architecture composed of processors and specific integrated circuits. AAA/SynDEx provides a formal framework based on graphs and graph transformations. On the one hand, these are used to specify the functions of the applications, the distributed resources in terms of processors and/or specific integrated circuits and communication media, and the nonfunctional requirements such as temporal criteria. On the other hand, they assist the designer in implementing the functions onto the resources while satisfying timing requirements and, as much as possible, minimizing the resources. This is achieved through a graphical environment which allows the user to explore, manually and/or automatically using optimization heuristics, the design space of solutions. Exploration is mainly carried out through real-time scheduling analysis and timing functional simulations, whose results predict the real-time behavior of the application functions executed on the various resources, that is, processors, integrated circuits, and communication media. This approach conforms to the typical hardware/software codesign process. Finally, code is automatically generated as a dedicated real-time executive, or as a configuration file for a resident real-time operating system such as OSEK or RTLinux; [1, 2] detail the AAA methodology and SynDEx.

In practice, periodic tasks are commonly found in applications such as avionics and process control, where accurate control requires continual sampling and processing of data. Such applications, based on automatic control and/or signal processing algorithms, are usually specified with block diagrams. They are composed of functions producing and consuming data; each function can start its execution as soon as the data it consumes are available, and the cost of scheduling a task is a constant included in its worst case execution time (WCET). This periodicity constraint is the same as the one found in the Liu and Layland model [3]. A data transfer between producer and consumer tasks leads to precedence constraints that the scheduling must satisfy. In the systems we deal with, besides periodicity and precedence constraints, some tasks must be repeated according to a strict period. These tasks represent sensors and actuators which interact with the environment surrounding the system and for which no data exchanged with this environment may be lost. Since data produced by the environment for the system are consumed strictly periodically on the one hand, and data expected by the environment from the system are produced strictly periodically on the other hand, sensors and actuators must be executed with strict periods [4]. In order to satisfy the strict periodicity of these tasks, we consider that all the system tasks have a strict period.

Strict periodicity means that if the periodic task t_i has period T_i, then for all k we have s_i^{k+1} − s_i^k = T_i, where t_i^k and t_i^{k+1} are the kth and (k+1)th instances of the task t_i, and s_i^k and s_i^{k+1} are their start times [5]. Notice that t_i^k is the kth instance or repetition of the periodic task t_i.
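As a minimal sketch, the strict periodicity constraint can be checked on a trace of instance start times as follows (the function name and the traces are illustrative, not from the paper):

```python
def satisfies_strict_periodicity(start_times, period):
    """Check that consecutive instance start times differ by exactly `period`."""
    return all(nxt - cur == period
               for cur, nxt in zip(start_times, start_times[1:]))

# A task of period 10 started at time 3: instances at 3, 13, 23, 33.
assert satisfies_strict_periodicity([3, 13, 23, 33], 10)
# A jittered trace violates strict periodicity even if the average period is 10.
assert not satisfies_strict_periodicity([3, 14, 23, 33], 10)
```

Note that an average period of T_i is not enough: every consecutive pair of instances must be separated by exactly T_i.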

The multiprocessor real-time scheduling problem with precedence constraints, but without periodicity constraints, always has a solution, whatever the number of processors. The schedule length, corresponding to the total execution time (makespan), determines the solution quality. By contrast, when periodicity constraints must be satisfied, the problem may not have a solution. In other words, an application with precedence and periodicity constraints is either schedulable or not.

Thus, this paper pursues three main goals:

(i) the precedence and periodicity constraints must be satisfied. This is achieved by repeating every task according to its period on the same processor, and by making every task receive all the data it needs before it executes. In comparison with the previous AAA/SynDEx, additional steps are required to assign the tasks to the processors and to add missing precedences before distribution and scheduling;

(ii) distributed architectures involve interprocessor communications, the cost of which must be taken into account accurately, as explained in [6]. These communications must be handled even when the communicating tasks have different periods;

(iii) since the target architecture is embedded, it is necessary to minimize the total execution time, on the one hand to ensure that feedback control is correct, and on the other hand to minimize the resource allocation.

As mentioned previously, we are interested in nonpreemptive scheduling. This choice is motivated by a variety of reasons including [7]:

(i) in many practical real-time scheduling problems, properties of device hardware and software either make preemption impossible or prohibitively expensive; the preemption cost is either not taken into account or still not really controlled;

(ii) nonpreemptive scheduling algorithms are easier to implement than preemptive algorithms and can exhibit dramatically lower overhead at run-time;

(iii) the overhead of preemptive algorithms is more difficult to characterize and predict than that of nonpreemptive algorithms. Since scheduling overhead is often ignored in scheduling models, an implementation of a nonpreemptive scheduler will be closer to the formal model than an implementation of a preemptive scheduler.

For these reasons, designers often use nonpreemptive approaches, even though elegant theoretical results on preemptive approaches do not extend easily to them [8].

##### 1.1. Related Work

The use of scheduling algorithms changes according to the multiprocessor scheduling scheme (partitioned or global):

(i) in partitioned multiprocessor scheduling, a scheduling algorithm is needed for every processor, and an allocation (distribution) algorithm is used to allocate tasks to processors (a bin-packing problem), which implies the use of heuristics. Schedulability conditions for RM and EDF on multiprocessors have been given for this scheme;

(ii) in global multiprocessor scheduling, there is only one scheduling algorithm for all the processors. A task is put in a queue that is shared by all processors and, since migration is allowed, a preempted task can return to the queue to be allocated to another processor. Tasks are chosen from the queue according to the unique scheduling algorithm and executed on whichever processor is idle. Schedulability conditions for RM and EDF on multiprocessors have been given for this scheme in the case of preemptive tasks.

It is well known that the Liu and Layland results, in terms of optimality and schedulability conditions, break down on multiprocessor systems [9]. Dhall and Liu [10] gave examples of task sets for which global RM and EDF scheduling can fail at very low processor utilization, essentially leaving almost one processor idle nearly all of the time. Reasoning from such examples, it is tempting to conjecture that RM and EDF are not good or efficient scheduling policies for multiprocessor systems, even if, at the moment, these conclusions have not been formally justified [11].

In addition, these scheduling algorithms consider only independent tasks, which means that tasks do not exchange data, whereas in our applications these dependences are specified by a directed graph of tasks.

In [12], tasks are preemptive and dependent, but only tasks with the same period can communicate. AAA/SynDEx handles dependences between tasks with multiple periods by transforming the initial graph.

##### 1.2. Model

We deal with systems of real-time tasks with precedence and strict periodicity constraints. A task (denoted in this paper as t_i) is characterized by a period T_i, a worst case execution time C_i with C_i ≤ T_i, and a start time s_i.

The precedences between tasks are represented by a directed acyclic graph (DAG) denoted G = (V, E). V is the set of tasks characterized as above, and E is the set of edges which represents the precedence (dependence) constraints between tasks. Therefore, the directed pair of tasks (t_i, t_j) means that t_j must be scheduled only if t_i was already scheduled, and thus we have s_j ≥ s_i + C_i.

We assume that periods and WCETs are multiples of a unit of time, which means that they are integers representing, for example, some number of cycles of the processor clock. If a task with execution time C is said to start at time unit s, it starts at the beginning of time unit s and completes before the beginning of time unit s + C. Thus the time interval in which the task is executed is [s, s + C).
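The task model above can be made concrete with a small sketch (the `Task` type and the sample values are illustrative assumptions, not from the paper): the kth instance of a task occupies the half-open interval [s + k·T, s + k·T + C).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    period: int  # T, in time units (e.g., processor clock cycles)
    wcet: int    # C, worst case execution time, with C <= T
    start: int   # s, start time of the first instance

def execution_interval(task, k):
    """Half-open interval [s + k*T, s + k*T + C) occupied by the k-th instance."""
    begin = task.start + k * task.period
    return (begin, begin + task.wcet)

t = Task(period=10, wcet=3, start=2)
assert execution_interval(t, 0) == (2, 5)
assert execution_interval(t, 2) == (22, 25)
```

The half-open convention means a task starting at s with WCET C completes strictly before time unit s + C, so back-to-back intervals such as [2, 5) and [5, 8) do not conflict.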

#### 2. Real-Time Nonpreemptive Scheduling with Precedence and Strict Periodicity Constraints

In order to satisfy these constraints, the heuristic we propose is divided into three algorithms, called Assignment, Unrolling, and Scheduling.

##### 2.1. Assignment

Among the two approaches mentioned above for solving multiprocessor scheduling problems, we chose the partitioned one. Consequently, the first algorithm of the heuristic consists in partitioning tasks over the architecture's processors. This first algorithm is called Assignment, reflecting the fact that each task is assigned to a processor such that all tasks assigned to one processor are schedulable on this processor. If a task is schedulable on several processors, it will be assigned to every one of them.

This algorithm declares the system schedulable if all tasks could be assigned to processors, and not schedulable if at least one task could not be assigned. The assignment algorithm uses the outcome of the following schedulability analysis.

###### 2.1.1. Scheduling Analysis

The schedulability analysis consists in verifying that a task can be scheduled together with other tasks already proved schedulable using the same analysis. “A schedulable task” means that there exist one or several time intervals in which this task can be scheduled, that is, in which its period and the periods of the already schedulable tasks are satisfied.

The following theorem gives a necessary and sufficient condition for scheduling two tasks.

Theorem 2.1. *Two tasks t_1 and t_2 are schedulable if and only if
C_1 + C_2 ≤ GCD(T_1, T_2). (2.1)*

*Proof. *Let g = GCD(T_1, T_2) (GCD denotes the greatest common divisor), and note that T_1 and T_2 are both multiples of g. We start by proving that (2.1) is a necessary condition. Assume that t_1 and t_2 are schedulable with start times s_1 and s_2. Each instance of the task t_1 is executed within an interval of the set I_1 = {[s_1 + k·T_1, s_1 + k·T_1 + C_1) : k ∈ ℕ}, and each instance of the task t_2 within an interval of the set I_2 = {[s_2 + k·T_2, s_2 + k·T_2 + C_2) : k ∈ ℕ}. Since T_1 and T_2 are multiples of g, every start time of I_1 is congruent to s_1 modulo g, and every start time of I_2 is congruent to s_2 modulo g. Moreover, since GCD(T_1, T_2) = g, the set of differences {k_2·T_2 − k_1·T_1} contains every multiple of g, so there exist instances of t_1 and t_2 whose start times differ by exactly d = (s_2 − s_1) mod g, with 0 ≤ d < g. The assumption made at the beginning (t_1 and t_2 are schedulable) implies that no intervals belonging to I_1 and I_2 overlap; for the two instances above this requires d ≥ C_1 (the instance of t_2 must start after the instance of t_1 completes) and g − d ≥ C_2 (the instance of t_2 must complete before the next start of t_1 in the same residue pattern). Adding these inequalities gives C_1 + C_2 ≤ g. This proves the necessity of condition (2.1).

In order to prove the sufficiency of (2.1), we show that if C_1 + C_2 ≤ g then the tasks t_1 and t_2 are schedulable. Without any loss of generality, we choose the start times s_1 = 0 and s_2 = C_1.

An instance of t_1 occupies an interval [k_1·T_1, k_1·T_1 + C_1) and an instance of t_2 occupies an interval [C_1 + k_2·T_2, C_1 + k_2·T_2 + C_2). These two intervals overlap if and only if
−C_2 < C_1 + k_2·T_2 − k_1·T_1 < C_1, that is, −(C_1 + C_2) < k_2·T_2 − k_1·T_1 < 0.

According to Bézout's theorem, the set of values {k_2·T_2 − k_1·T_1} is exactly the set of multiples of g. Since C_1 + C_2 ≤ g, the open interval (−(C_1 + C_2), 0) contains no multiple of g, so no two instances overlap. This concludes the proof of Theorem 2.1.
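The condition of Theorem 2.1 reduces to a one-line test. The following sketch uses illustrative names and represents a task as a (period, WCET) pair; it assumes the Korst-style condition C_1 + C_2 ≤ gcd(T_1, T_2) stated by the theorem:

```python
from math import gcd

def two_tasks_schedulable(t1, t2):
    """Theorem 2.1: two strictly periodic, nonpreemptive tasks (T1, C1) and
    (T2, C2) are schedulable iff C1 + C2 <= gcd(T1, T2)."""
    (p1, c1), (p2, c2) = t1, t2
    return c1 + c2 <= gcd(p1, p2)

assert two_tasks_schedulable((12, 2), (8, 2))      # gcd = 4 and 2 + 2 <= 4
assert not two_tasks_schedulable((12, 3), (8, 2))  # gcd = 4 but 3 + 2 > 4
```

Note that only the GCD of the periods matters, not the periods themselves: tasks with large but coprime periods have gcd 1 and can almost never share a processor under strict periodicity.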

Now we are interested in the schedulability of a set of tasks (more than two). Let us introduce the following property for a given task set.

*Definition 2.2. *If a set of tasks is such that for each two tasks t_i and t_j the value GCD(T_i, T_j) is the same, then this set is said to satisfy the SGCD property (SGCD for Same Greatest Common Divisor).

The following theorem introduces a necessary and sufficient condition for a set of tasks satisfying the SGCD property.

Theorem 2.3. *The tasks of a set S which satisfies the SGCD property are schedulable if and only if
∑_{t_i ∈ S} C_i ≤ g, (2.2)
where g is the GCD of the periods of any pair of tasks from this set.*

*Proof. *In order to prove the necessity, we proceed in the same way as in the proof of Theorem 2.1. Assume that the tasks of the set S = {t_1, …, t_n} are schedulable with start times s_1, …, s_n. Each instance of a task t_i is executed within an interval of the set I_i = {[s_i + k·T_i, s_i + k·T_i + C_i) : k ∈ ℕ}. Since every period T_i is a multiple of g, every start time of I_i is congruent to s_i modulo g; reduced modulo g, the executions of t_i therefore always occupy the same interval of length C_i starting at s_i mod g. Moreover, by the Bézout argument used in Theorem 2.1, for each pair of tasks there exist instances whose start times realize exactly this modular offset. The assumption made at the beginning (the tasks are schedulable) implies that no two intervals overlap; hence the n modular intervals of lengths C_1, …, C_n must be pairwise disjoint within a window of length g, which requires ∑_{i=1}^{n} C_i ≤ g. This proves the necessity of condition (2.2).

We prove the sufficiency of condition (2.2) by showing that if ∑_{t_i ∈ S} C_i ≤ g then the tasks of the set S are schedulable. For this we use a proof by induction.

*The Base Case*

For a set of two tasks this condition was proved in Theorem 2.1.

*The Inductive Step*

We show that if condition (2.2) allows the set {t_1, …, t_{n−1}} to be scheduled, then it also allows the task t_n to be added. Choose the start times s_1 = 0 and s_i = C_1 + ⋯ + C_{i−1} for i = 2, …, n. Consider an instance of t_n and an instance of some t_i (i < n), and let D = s_n − s_i = C_i + ⋯ + C_{n−1}. These two instances overlap if and only if −C_n < D + m·g < C_i for some achievable difference m·g of their start times; according to Bézout's theorem, the set of differences k_n·T_n − k_i·T_i is exactly the set of multiples m·g of g. The overlap condition is equivalent to −(D + C_n) < m·g < C_i − D. Since D ≥ C_i, the upper bound is at most 0, and since D + C_n ≤ ∑_{j=1}^{n} C_j ≤ g, the lower bound is at least −g; therefore the open interval contains no multiple of g and no overlap can occur. This concludes the proof of Theorem 2.3.
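Definition 2.2 and Theorem 2.3 translate into two small checks. This is a sketch with illustrative names; tasks are (period, WCET) pairs, and the condition checked is ∑ C_i ≤ g for a set whose pairwise period GCDs are all equal:

```python
from itertools import combinations
from math import gcd

def satisfies_sgcd(periods):
    """Definition 2.2: all pairwise GCDs of the periods are equal."""
    gcds = {gcd(a, b) for a, b in combinations(periods, 2)}
    return len(gcds) == 1

def sgcd_set_schedulable(tasks):
    """Theorem 2.3: an SGCD set {(T_i, C_i)} is schedulable iff sum(C_i) <= g."""
    periods = [p for p, _ in tasks]
    assert satisfies_sgcd(periods), "the set must satisfy the SGCD property"
    g = gcd(periods[0], periods[1])  # any pair gives the same GCD
    return sum(c for _, c in tasks) <= g

# Periods 10, 15, 35 have pairwise GCDs 5, 5, 5, so g = 5.
assert sgcd_set_schedulable([(10, 2), (15, 1), (35, 2)])      # 2 + 1 + 2 <= 5
assert not sgcd_set_schedulable([(10, 2), (15, 2), (35, 2)])  # 2 + 2 + 2 > 5
```

Note that the SGCD property must be checked first: periods such as 10, 15, 20 have pairwise GCDs 5, 10, 5 and fall outside the scope of Theorem 2.3.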

Theorem 2.3 gives a schedulability condition for tasks which satisfy the SGCD property (introduced in Definition 2.2). Nevertheless, we need a condition for all tasks, whatever their periods are. Unfortunately, no such general condition exists because of the complexity of the problem [13]. As an alternative, we propose the following reasoning.

We choose to give a condition which allows one task to be assigned to a processor to which a set of tasks has already been assigned. This task is called the candidate task and, once it is assigned, another task, among the tasks which are not yet assigned, becomes the candidate task. To be assigned to a processor, the candidate task and the tasks already assigned to this processor must be schedulable on the same processor.

First, we group the already assigned tasks into sets according to the SGCD property and, second, we look for a condition which takes into account the candidate task and each set of already assigned tasks.

Before going further we need to introduce the notion of “identical tasks”. Two tasks are said to be identical if they have the same period and the same WCET, even though they do not perform the same function.

The following theorem introduces an equation for computing the number of tasks identical to the candidate task which can be assigned to a processor on which a set of tasks satisfying the SGCD property has already been assigned.

Theorem 2.4. *Let t_c be the candidate task and S the set of already assigned tasks, such that S ∪ {t_c} satisfies the SGCD property with GCD g. The number N of tasks identical to the candidate task which can be assigned to this processor is given by
N = ⌊(g − ∑_{t_i ∈ S} C_i)/C_c⌋. (2.3)*

*Proof. *⌊(g − ∑_{t_i ∈ S} C_i)/C_c⌋ represents the number of tasks identical to the candidate task that can be scheduled in one interval of length g − ∑_{t_i ∈ S} C_i, the free part of each window of length g: (g − ∑_{t_i ∈ S} C_i)/C_c represents the number of intervals of length C_c in one interval of length g − ∑_{t_i ∈ S} C_i.

*Example 2.5. *Let t_1, t_2, and t_3 be three tasks already assigned. We look for assigning a task t_c. By using Theorem 2.4, we compute the number of tasks identical to t_c which can be assigned.

First we check that the tasks t_1, t_2, t_3, and t_c satisfy the SGCD property.

Then, we check that the condition of Theorem 2.3 is satisfied.

Finally, applying Theorem 2.4 gives N = 5, which means that 5 tasks identical to t_c can be scheduled on this processor.

Figure 1 shows the 5 intervals in which 5 tasks identical to t_c can be scheduled (these intervals are numbered from 1 to 5). We also show the intervals in which t_c cannot be scheduled even though they are empty and, for each such interval, the tasks (among t_1, t_2, and t_3) that cause the nonschedulability. For example, if t_1 is mentioned, it means that t_c cannot be scheduled in this interval because one of the instances of t_c and one of the instances of t_1 would be scheduled in the same interval, which is not allowed.

Let us check, in Figure 1, the result obtained by the previous calculation. We notice that if we divide the time axis into intervals of length equal to g and divide each of these intervals into subintervals of length 1, then it is always the fourth subinterval which is used to schedule t_c. This means that the order established in the first interval of length g is repeated and preserved.

*Definition 2.6. *We denote by S the set of tasks which have already been assigned to a processor. We divide the set S into several subsets S_1, …, S_m such that the tasks of each subset S_j together with the task t_c satisfy the SGCD property (introduced in Definition 2.2). Each subset S_j is characterized by a greatest common divisor (denoted g_j), and we consider that S_1 is the subset with the smallest greatest common divisor (which is denoted g_1). The sum of all the execution times of a task set S_j is denoted by C(S_j).

Now, in order to know whether t_c is schedulable on a processor where other tasks have already been scheduled, we apply the following algorithm:

(1) choose a processor;

(2) set up the subsets S_1, …, S_m;

(3) compute ⌊(g_1 − C(S_1))/C_c⌋. This represents the number of tasks identical to t_c which can be scheduled taking only S_1 into account;

(4) for each subset S_j (j = 2, …, m), subtract from the result obtained in the previous step the number of tasks identical to t_c which cannot be scheduled because of nonschedulability with tasks belonging to S_j (the next theorem gives a way to compute this number);

(5) according to the result reached, we decide to assign t_c to this processor or try to assign it to another processor (go to (1)).
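Steps of this algorithm can be sketched in code. The sketch below makes simplifying assumptions (all names are illustrative): subsets are formed by grouping each assigned task by the GCD of its period with the candidate's period, which is a simplification of the grouping described above (the SGCD property additionally constrains the GCDs inside each subset), and only the step-(3) count over S_1 is computed, without the corrective terms of step (4).

```python
from math import gcd

def sgcd_subsets(candidate_period, assigned):
    """Step (2), simplified: group assigned tasks (period, wcet) by the GCD
    of their period with the candidate's period."""
    groups = {}  # key: GCD with the candidate -> list of tasks
    for task in assigned:
        groups.setdefault(gcd(candidate_period, task[0]), []).append(task)
    return groups

def step3_count(candidate, assigned):
    """Step (3): tasks identical to the candidate that fit when only the
    subset with the smallest GCD (S_1) is taken into account."""
    p_c, c_c = candidate
    groups = sgcd_subsets(p_c, assigned)
    g1 = min(groups)                      # smallest GCD characterizes S_1
    load = sum(c for _, c in groups[g1])  # C(S_1)
    return (g1 - load) // c_c

# Candidate (period 10, WCET 1) against tasks of periods 5, 20, 30:
# GCDs with 10 are 5, 10, 10, so S_1 = {(5, 1)} and g_1 = 5.
assert step3_count((10, 1), [(5, 1), (20, 2), (30, 3)]) == 4
```

The result of `step3_count` is an upper bound only; steps (4) and (5) would then subtract the nonschedulable tasks due to the remaining subsets before deciding on the assignment.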

Let F_j (j = 2, …, m) represent the number of tasks identical to t_c that cannot be scheduled because of a nonschedulability with tasks belonging to S_j.

The following theorem gives an equation to compute F_j.

Theorem 2.7. *Let t_c be the candidate task, S the set of already scheduled tasks, and g_1 the GCD characterizing S_1. For each j (j = 2, …, m), F_j is obtained as the number of intervals of length g_1 contained in an interval of length g_j, multiplied by the number of tasks identical to t_c that cannot be scheduled inside such an interval because of the tasks of S_j, corrected by a term accounting for the tasks of the other subsets.*

*Proof. *The first factor is the number of intervals of length g_1 in an interval of length g_j. This number is multiplied by the second factor, which represents the number of tasks identical to t_c that cannot be scheduled inside an interval of length g_j. Also, in order to take the tasks of the other sets into account, we introduce the corrective term.

We now bring together the different results obtained previously to find an equation which allows us to compute the number of schedulable tasks identical to t_c, and hence to deduce whether t_c is schedulable or not.

Corollary 2.8. *Let t_c be the candidate task and S the set of already scheduled tasks. The number N of tasks identical to t_c which can be scheduled is given by
N = ⌊(g_1 − C(S_1))/C_c⌋ − ∑_{j=2}^{m} F_j,
such that the F_j are computed according to Theorem 2.7.*

*Example 2.9. *In order to illustrate the proposed method, we propose the following example. Let t_c be the candidate task for the schedulability analysis, and let S be the set of tasks already proved schedulable. In this case:

(i) the three subsets that we can set up from the set S and the task t_c according to the SGCD property are (1) S_1, with the GCD of periods equal to 5, (2) S_2, with the GCD of periods equal to 10, (3) S_3, with the GCD of periods equal to 30;

(ii) from Corollary 2.8, we must compute ⌊(g_1 − C(S_1))/C_c⌋ and the terms F_2 and F_3;

(iii) (1) the first term equals 3; (2) in the same way, we compute F_2 and F_3;

(iv) then N > 0; from this, we deduce that t_c is schedulable.

*Remark 2.10. *The previous result allows us to assign several tasks (tasks identical to t_c) at the same time; if some other tasks identical to t_c remain unassigned, it means that these tasks are not schedulable on this processor and must be assigned to another processor.

*Remark 2.11. *Notice that, throughout this schedulability analysis, nothing is mentioned in the given schedulability condition about precedence constraints, whereas we allow them in the task model. Indeed, a system with precedence constraints but without any periodicity constraint is always schedulable. The next theorem demonstrates that, for a set of tasks proved schedulable, that is, satisfying the periodicity constraints, there always exists a scheduling of these tasks which satisfies the precedences between them, whatever those precedences are.

Theorem 2.12. *Let S be a set of tasks proved schedulable. Whatever the precedence constraints between the tasks of S are, there exists at least one scheduling which satisfies these precedence constraints.*

*Proof. *Once a set of tasks is proved schedulable, these tasks can be scheduled in different ways or orders. Among these orders, at least one satisfies the precedence constraints (we remind the reader that tasks are not allowed to be preempted).

###### 2.1.2. Proposed Approaches

As a result of the previous study, we are able to give an algorithm for assigning tasks to processors while satisfying precedence and periodicity constraints. Since the corollary's condition is monoprocessor and is applied for the assignment of each task (the test may be performed several times until the right processor is found), the execution time of the assignment algorithm can be long. Hence, we propose three assignment algorithms as follows:

(1) greedy algorithm: it starts by sorting tasks following a mixed sort which takes into account both an increasing order and a priority level [14]. Then tasks are assigned without any backtracking;

(2) local search algorithm: it uses the condition of the corollary. In order to obtain a more efficient assignment than that of the greedy heuristic, we introduce a backtracking process. It is used when a task cannot be assigned to any processor: some already assigned tasks are removed from their processors until this task can be assigned, and the removed tasks are then assigned to other processors. This backtracking does not change the whole assignment but only a part of it, which should not considerably increase the algorithm's execution time;

(3) exact algorithm (optimal): it uses the condition of the corollary and takes advantage of the Branch & Cut exact method [14].

##### 2.2. Unrolling

The unrolling algorithm consists in repeating each task t_i of the graph H/T_i times, where T_i is the period of this task and H is the hyperperiod (the least common multiple of all the task periods) [15].

When two tasks are dependent and do not have the same period, there are two possibilities. If the period of the consumer task is equal to k times the period of the producer task, then the producer task must be executed k times for each execution of the consumer task, and the consumer task cannot start its execution until it has received all the data from these k executions of the producer task. Notice that the produced data differ from one execution of the producer task to another; therefore, data are not duplicated. Reciprocally, if the period of the producer task is equal to k times the period of the consumer task, then the consumer task must be executed k times for each execution of the producer task. The unrolling algorithm exploits this data transfer mechanism.
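The repetition counts used by unrolling can be sketched as follows (a minimal sketch with illustrative names; each task is represented only by its period):

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def unroll(periods):
    """Compute the hyperperiod H (LCM of all task periods) and the number of
    repetitions H / T_i of each task over one hyperperiod."""
    hyper = reduce(lcm, periods)
    return hyper, [hyper // p for p in periods]

# Periods 10, 15 and 30 give a hyperperiod of 30:
hyper, repetitions = unroll([10, 15, 30])
assert hyper == 30
assert repetitions == [3, 2, 1]
```

For a producer/consumer pair whose periods divide one another, the ratio of the two repetition counts is exactly the factor k described above: the faster task runs k times per execution of the slower one.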

##### 2.3. Scheduling

This algorithm distributes and schedules each task of the unrolled graph onto the processor to which it was assigned by the assignment algorithm. In the case where the task was assigned to several processors, the algorithm distributes it to the processor which minimizes the makespan. The minimization of the makespan is based on a cost function which takes into account the WCETs of the tasks and the communication costs due to dependent tasks scheduled on different processors.

Once a task is scheduled, all its instances take start times computed as a function of the task's period and the instance number. In addition, to be scheduled, the first instance of each task must satisfy the condition of the next theorem.

Theorem 2.13. *Let t_1, …, t_n be tasks already scheduled on a processor. The task t_c is schedulable at the date s_c on this processor if and only if, for every i ∈ {1, …, n},
C_i ≤ (s_c − s_i) mod g_i ≤ g_i − C_c,
such that g_i = GCD(T_i, T_c).*

*Proof. *In order to prove Theorem 2.13, it suffices to prove that two tasks t_1 and t_2 (with start times s_1 and s_2) are schedulable if and only if
C_1 ≤ (s_2 − s_1) mod g ≤ g − C_2, (2.18)
where g = GCD(T_1, T_2). Without any loss of generality, we assume that s_1 ≤ s_2. We start by showing the sufficiency of condition (2.18). Let us consider the time intervals [s_1 + k·g, s_1 + (k + 1)·g), k ∈ ℕ. The first C_1 time units of each of these intervals can be allocated for executions of t_1, once every T_1/g intervals, and the remaining time units wholly or partly for executions of t_2, once every T_2/g intervals. If (2.18) is verified, then the allocated time units suffice to execute t_1 and t_2 without overlap.

In order to prove the necessity of (2.18), let us consider again the time intervals [s_1 + k·g, s_1 + (k + 1)·g). If (2.18) is not verified, then the execution of t_2 overlaps the first C_1 time units of such an interval once every T_2/g intervals. We also note that the first C_1 time units of these intervals are used to execute t_1 once every T_1/g intervals. As GCD(T_1/g, T_2/g) = 1, there will be an interval in which t_1 and t_2 are executed together. Hence, if (2.18) is not verified, the tasks t_1 and t_2 cannot be scheduled with these start times on the same processor. This completes the proof of the theorem.
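The start-time test used by the scheduling algorithm can be sketched as follows. This assumes the Korst-style pairwise condition C_i ≤ (s_c − s_i) mod g ≤ g − C_c with g = gcd(T_i, T_c); names are illustrative, and already scheduled tasks are given as (period, WCET, start) triples:

```python
from math import gcd

def schedulable_at(candidate, start, scheduled):
    """Check whether a task (T_c, C_c) can start at date `start` alongside
    already scheduled tasks (T_i, C_i, s_i). Pairwise test:
    C_i <= (start - s_i) mod g <= g - C_c, with g = gcd(T_i, T_c)."""
    p_c, c_c = candidate
    for p_i, c_i, s_i in scheduled:
        g = gcd(p_i, p_c)
        d = (start - s_i) % g
        if not (c_i <= d <= g - c_c):
            return False
    return True

# One task (T=8, C=2) starting at 0; candidate (T=12, C=2), so g = 4:
assert schedulable_at((12, 2), 2, [(8, 2, 0)])      # d = 2 and 2 <= 2 <= 2
assert not schedulable_at((12, 2), 1, [(8, 2, 0)])  # d = 1 < C_i = 2
```

In the accepted case above, the instances fall at [0, 2), [8, 10), [16, 18), … for the first task and [2, 4), [14, 16), [26, 28), … for the candidate, and indeed never overlap.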

A complete example of the scheduling heuristic can be found in [14].

#### 3. Performance Evaluation

In order to determine the best approach among the three proposed for the scheduling heuristic (greedy heuristic, local search, and exact algorithm), we performed two kinds of tests.

The first one consisted in comparing the scheduling success ratio of the three approaches on different systems. This test was performed as follows: we compute for each system the value of ρ, which is the ratio between the number of processors and the number of different and nonmultiple periods. Then, we gather all the systems with the same ρ and execute the three approaches of the proposed heuristic on them. Finally, we compute the scheduling success ratio for the systems of each ρ by using the results of the previous step. This method allows us to underline the impact of the architecture, in terms of number of processors, and of the number of task periods. The diagram of Figure 2 depicts the evolution of the success ratio according to the variation of ρ. Excluding the exact algorithm, which always finds the solution, we notice that the local search heuristic displays interesting results. In addition, the greedy heuristic is efficient only for certain values of ρ, which means that its use depends on the targeted systems.

In the second test, the speed of the three approaches of the proposed heuristic is evaluated by varying the size of the systems (both the number of graph tasks and the number of processors). The diagram of Figure 3 shows that the exact algorithm explodes very quickly, whereas the local search heuristic keeps a reasonable execution time. In addition, the greedy heuristic, as expected, is the fastest. Notice that the algorithm execution time is plotted on a logarithmic scale.

To perform these tests, we automatically generated task graphs taking into account the periodicity issues, with dependent and nondependent tasks. The content of a task itself has no impact on the scheduling; only its WCET and its period are relevant. Also, we generated systems such that the number of different periods is not large relative to the number of tasks; however, we generated systems covering all the possible cases (multiple and nonmultiple periods) in order to obtain more realistic results. In addition, the architecture graphs were generated according to a star topology, meaning that any two processors can communicate through the medium without using intermediate processors (no routing).

#### 4. Applications

The first example is a simplified version of “Visual control of autonomous vehicles for platooning”, an application developed by several teams at INRIA. It consists in replacing the joystick used to manually drive a CyCab with an automatic controller based on a video camera, which gathers the information necessary to identify the vehicle ahead and to guarantee a minimum distance between the two vehicles. In order to simplify the original version, we use two major tasks: the first one includes the camera and image processing, and the second one includes the distance controller and the tasks performing moving forward, steering, and braking (Figure 4 shows the window containing the algorithm graph). The architecture is also a simplified version of the real one, with only two processors (Figure 5 shows the window containing the architecture graph). The separate execution of these two tasks showed that the first task produces a data item every 1 second, whereas the second task consumes the data every 10 milliseconds. This means that these two dependent tasks are periodic; thus, nonperiodic real-time scheduling algorithms are unable to produce a scheduling which satisfies the periods and ensures the right data transfer between them. A shared memory could be an alternative, but data sizes and memory access costs represent a significant drawback. AAA/SynDEx distributes and schedules these two periodic tasks onto the architecture, which implies that the second task is repeated ten times for each execution of the first task. Thereby, the data produced by the first task is diffused to the ten repetitions of the second task (see Figure 6). The result is a timing window corresponding to the predicted real-time behavior of the algorithm running on the architecture following the proposed distribution and scheduling algorithm.
It includes one column for each processor and communication medium, describing the distribution (spatial allocation) and the scheduling (temporal allocation) of operations on processors, and of interprocessor data transfers on communication media. Here time flows from top to bottom, the height of each box is proportional to the execution duration of the corresponding operation (periods and execution durations are given or measured by the user).

The second example shows another aspect of AAA/SynDEx multiperiodic utilization. It consists in the “Engine Cooling System” and, as for the first example, this application is also a simple version of the original one. It is composed of two sensors: the first one is a temperature sensor and the second one gives parameters representing the state of the engine. These two sensors are connected to the main task, the engine control unit, which performs the cooling of the engine. The cooling task reads the temperatures sent by the first sensor and combines them with the information representing the operating state of the engine (Figure 7 shows the window containing the algorithm graph). Thereby, it can predict the change of the temperature and achieve the temperature control (by operating the cooling fan, e.g.). In order to perform more accurate predictions, the main task needs several temperatures which must be taken at equal time intervals, as well as several pieces of state information. This is why the temperature sensor is executed every 10 milliseconds (period = 10), the second sensor every 15 milliseconds (period = 15), and the main task every 30 milliseconds (period = 30). The architecture is composed of two processors, similar to that of the previous example.

Figure 8 shows the result of the adequation applied to the algorithm and the architecture.

On Figure 8 we can observe that the temperature sensor is executed three times. These three temperatures, together with the two data produced by the two executions of the state sensor, are sent to the main task. Upon receiving all these data, the latter predicts the change of the temperature and achieves the temperature control.
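The repetition counts observed in Figure 8 follow directly from the periods: within one hyperperiod (the least common multiple of all periods, here 30 ms) each task repeats lcm/period times. A minimal sketch of this computation, with the task names chosen only for illustration:

```python
# Per-hyperperiod repetition counts for the engine-cooling example:
# tasks with periods 10, 15, and 30 ms repeat lcm/period times inside
# one hyperperiod (the lcm of all periods).
from math import lcm  # Python 3.9+

def hyperperiod(periods):
    h = 1
    for p in periods:
        h = lcm(h, p)
    return h

periods = {"temperature_sensor": 10, "state_sensor": 15, "control_unit": 30}
H = hyperperiod(periods.values())
counts = {name: H // p for name, p in periods.items()}
print(H, counts)
# -> 30 {'temperature_sensor': 3, 'state_sensor': 2, 'control_unit': 1}
```

This matches the schedule of Figure 8: three temperature readings and two state readings per execution of the main task.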

Finally, the user can launch the generation of an optimized distributed macro-executive, which produces as many files as there are processors. This macro-executive, independent of the processors and of the media, is directly derived from the result of the distribution and the scheduling. It is then macroprocessed with GM4 using executive kernels, which depend on the processors and the media, in order to produce source codes. These codes are compiled with the corresponding tools (several compilers or assemblers corresponding to the different types of processors and/or source languages), then linked and loaded on the actual processors where the applications will ultimately run in real time.

#### 5. Load and Memory Balancing

We proposed to improve the new distributed real-time scheduling heuristic in two main directions: (i) minimizing the makespan of distributed applications by equalizing the workloads of the processors (even though a first attempt was carried out in the proposed heuristic); (ii) making efficient use of the memory resource.

Towards these purposes, we proposed a load and memory balancing heuristic for homogeneous distributed real-time embedded applications with dependence and strict periodicity constraints. It deals with applications involving tasks and processors. Each task has an execution time, a start time computed by the distributed scheduling heuristic, and a required memory amount. The required memory amount may differ from task to task: it represents the memory space necessary to store the data managed by the task, that is, all the variables the task needs according to their types.

For each processor, the proposed heuristic starts by building blocks from the tasks distributed and scheduled onto this processor. Then, the blocks are processed in increasing order of their start times. This process consists in computing the cost function (defined in the next paragraph) for each processor whose last scheduled block ends no later than the start time of the current block, and in seeking the processor which maximizes this cost function. The block is moved to that processor only if the periodicity of the other blocks on this processor remains satisfied; otherwise, that processor is no longer considered and the heuristic seeks another processor which maximizes the cost function. If the moved block belongs to the first category and its start time decreases as a result, then, in order to keep the strict periodicity constraints satisfied, the heuristic looks through the remaining blocks and updates the start times of the blocks containing tasks whose instances are in the moved block. This heuristic is applied to the output of the scheduling heuristic already presented.

The heuristic is thus based on a *cost function* computed for a block initially scheduled on one processor and a candidate processor. It combines the gain in time due to moving the block with the sum of the memory amounts required by the blocks already moved to the candidate processor.
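The balancing pass can be sketched as follows. This is a simplified, hypothetical sketch: the `Block` and `Processor` structures, the concrete form of the `cost` function, and the feasibility test are assumptions for illustration (the paper's exact cost function is not reproduced here), and the periodicity re-check and start-time propagation described above are omitted.

```python
# Illustrative sketch of the load/memory balancing pass. The cost
# function below (time gain discounted by memory already placed on the
# candidate processor) is an assumption, not the paper's exact formula.
from dataclasses import dataclass, field

@dataclass
class Block:
    start: float     # start time computed by the scheduling heuristic
    duration: float
    memory: float    # memory required by the block's tasks

@dataclass
class Processor:
    capacity: float  # total memory capacity
    blocks: list = field(default_factory=list)

    def end_time(self):
        return max((b.start + b.duration for b in self.blocks), default=0.0)

    def memory_used(self):
        return sum(b.memory for b in self.blocks)

def cost(block, proc):
    """Hypothetical cost: time gained by starting the block earlier,
    discounted by the memory already moved to this processor."""
    gain = block.start - proc.end_time()
    return gain / (1.0 + proc.memory_used())

def balance(blocks, processors):
    """Process blocks in increasing start-time order; move each block to
    the feasible processor maximizing the cost function. A processor is
    feasible if its last block ends no later than the block's start time
    and it has enough remaining memory."""
    for b in sorted(blocks, key=lambda blk: blk.start):
        candidates = [p for p in processors
                      if p.end_time() <= b.start
                      and p.memory_used() + b.memory <= p.capacity]
        if candidates:
            best = max(candidates, key=lambda p: cost(b, p))
            b.start = best.end_time()  # the move may decrease the start time
            best.blocks.append(b)
```

For instance, with two empty processors and two blocks, the second block is placed on the less loaded processor because the memory term lowers the cost of the processor that already received a block.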

A detailed example, as well as complexity and theoretical performance studies, can be found in [16].

#### 6. Conclusion

We presented a new feature of the AAA/SynDEx tool which allows the scheduling of multiperiodic dependent nonpreemptive tasks onto a multiprocessor. This new feature provides a unique scheduling algorithm since it is based on a schedulability analysis which allows distributing and scheduling the tasks while satisfying their dependences and their strict periodicity. The analysis proposed here is composed of several stages leading to the main result, a necessary and sufficient condition obtained at the end through a corollary. Such a schedulability analysis can be used in suboptimal heuristics to find an assignment of the tasks to each processor when partitioned multiprocessor scheduling is intended.

When dependent tasks with two different periods are distributed and scheduled onto two different processors, the proposed heuristic correctly handles the interprocessor communications.

Since memory is limited in embedded systems, it must be used efficiently; moreover, the total execution time must be minimized since the systems we are dealing with include feedback control loops. Thus, we improved the presented heuristic in order to perform load balancing and efficient memory usage for homogeneous distributed real-time embedded systems. This is achieved by grouping the tasks into blocks and moving each block to a processor such that the block's start time decreases and the processor has enough memory capacity to execute the block's tasks.

The study in this paper represents a first step of a broader work which targets resolving all kinds of periodicity constraints.

#### Acknowledgment

This paper is part of my thesis work, which was supported by the French National Institute for Research in Computer Science and Control (INRIA) under the supervision of Yves Sorel.