Electrical Engineering Department, Faculty of Engineering - Port-Said, Suez Canal University, Port Fouad 42523, Port-Said, Egypt
Instruction scheduling is an optimization phase aimed at balancing the performance-cost tradeoffs of the design of digital systems. In this paper, a formal framework is tailored in particular to find an optimal solution to the resource-constrained instruction scheduling problem in high-level synthesis. The scheduling problem is formulated as a discrete optimization problem and an efficient population-based search technique; particle swarm optimization (PSO) is incorporated for efficient pruning of the solution space. As PSO has proven to be successful in many applications in continuous optimization problems, the main contribution of this paper is to propose a new hybrid algorithm that combines PSO with the traditional list scheduling algorithm to solve the discrete problem of instruction scheduling. The performance of the proposed algorithms is evaluated on a set of HLS benchmarks, and the experimental results demonstrate that the proposed algorithm outperforms other scheduling metaheuristics and is a promising alternative for obtaining near optimal solutions to NP-complete scheduling problem instances.
1. Introduction
High-level synthesis (HLS) is the
process of generating the register transfer level (RTL) design from the
behavioral description of the digital system [1–3]. In HLS, the description of the system is represented
in an abstract form usually a data flow graph (DFG), and this representation is
transformed and mapped onto architectural elements from a library of resources.
The synthesis process involves three major tasks: instruction scheduling,
allocation, and binding. Instruction scheduling (IS) is concerned with mapping
the operations in the behavioral description to control steps without violating
the precedence relations between them.
Instruction scheduling is an
important optimization phase aimed at balancing the cost and performance
tradeoffs in the design of digital circuits. It can be considered the most
important step during the architecture synthesis [3–6]. Most of the practical formulations of the
scheduling problem are NP-complete. A variety of heuristics exist in the
literature in order to find optimal or near optimal solutions for the scheduling
problems. Those algorithms that do not find the optimal solution often trade
good solutions for better performance and time considerations.
Optimal scheduling
algorithms have been typically based on integer linear programming (ILP) [1, 7].
The major limitation with ILP and other exact scheduling algorithms is that the
applicability of these algorithms is limited to small problem sizes as they
tend to be lengthy and intractable for large
and difficult problems. In order
to handle bigger problem instances, heuristic scheduling algorithms with
polynomial runtime complexity have been proposed. These algorithms rely on
heuristics to remove the examination of parts of the search space that appear
fruitless. In the early days of scheduling in
HLS, simple techniques like
as soon as possible (ASAP) or as late as possible (ALAP) schedules were used
pervasively [6].
Both of ASAP and ALAP
schedules suffer from the inability to exploit the available resources as they
employ local selection criteria for the next step. Several scheduling
heuristics utilize more global criteria and thus try to overcome the
shortcomings of ASAP and ALAP. These
algorithms include list scheduling [3, 5], forced-directed scheduling (FDS) [5, 6], path-based scheduling [8], genetic algorithm [4, 9, 10], tabu search,
simulated annealing, and bipartite graph matching [7].
Among these heuristics, list scheduling is the
most popular because of its simplicity, low run time, and capability to produce
good scheduling results for small-size problems. In list scheduling, operations
are sorted in a list in a topological order (top to bottom) using the precedence
information dictated by data or control dependencies [3, 5, 6]. The sorted operations
are then iteratively scheduled into control steps. When a resource conflict
occurs due to insufficient hardware, one or more operations are deferred by one
control step. The selection of the deferred operation(s) is determined by a
local priority function that is applied to all operations that could be
scheduled in the current control step. A number of alternatives are possible
for the priority function utilized in list scheduling. Popular choices include instruction
depth, successor number, and instruction mobility. The mobility of an instruction is simply the
difference between its
ASAP and ALAP schedules. Although list schedulers are fast, they are in many
cases not sufficient because
their performance depends heavily on the node order in the priority list and
very often may lead to disappointing results. Therefore, several research
efforts were carried out to incorporate list schedulers together with other
heuristics to find good node orders for the list scheduler.
Utilizing evolutionary searching techniques to find good
node order for list scheduler has been investigated in the literature. Genetic
algorithms were used in [4, 10], and
the problem was modeled as a multiobjective optimization problem that searches
the solution space for the best order of the operations. Another attempt was done in [11], where the ant system (MMAS)
optimization was used for the same purpose. Experimental results show that evolutionary-based
searching techniques effectively search the solution space and produce
excellent quality solutions. Encouraging results from previous work motivated
this research effort that tends to explore the effectiveness of using the PSO algorithm
in the same domain.
In this paper, an instruction scheduling algorithm using the
PSO approach is presented. The algorithm utilizes a novel approach that employs
PSO in conjunction with list scheduling to solve the resource constrained instruction
scheduling (CIS) problem. The local and global search heuristics in PSO are iteratively
adjusted making it an effective technique for exploring the discrete
solution space for an optimal solution. Compared with a number of different
scheduling heuristics, the proposed algorithm was found to generate better scheduling
results in most of the tested benchmarks.
The main contributions
of this work are
(i)formulation of the resource constrained instruction scheduling problem
as a discrete optimization problem,(ii)proposing a new hybrid scheduling algorithm that combines a discrete implementation
of PSO and list scheduling algorithm,(iii)evaluating the proposed algorithm using a set of HLS benchmarks and
comparing its performance with existing scheduling heuristics.
The rest of this paper is organized as follows. An
overview of particle swarm optimization is presented in Section 2. In Section 3,
the constrained instruction scheduling problem is formally formulated as a
discrete optimization problem. The proposed PSO-based instruction scheduling algorithm
is explained in Section 4. Finally, Sections 5 and 6 attain the
experimental results and conclusions.
2. Particle Swarm Optimization
Particle swarm optimization (PSO) proposed
by Dr. Eberhart
and Dr. Kennedy
in 1995 is a computational paradigm based
on the phenomenon of collective intelligence inspired by the social
behavior of bird flocking or fish schooling
[12–17]. Computation in PSO is based on a population
(swarm) of processing elements called particles. Each particle represents a
candidate solution and is identified with specific coordinates in the D-dimensional
search space. The position of the ith particle is represented as . The velocity of the
particle (rate of the position change between the current position and the
next) is denoted as . The fitness function is evaluated for each particle in the swarm and is compared
to the fitness of the best previous result for that particle and to the fitness
of the best particle among all particles in the swarm. After finding the two
best values, the particles evolve by updating their velocities and positions according
to the following equations: where ,
and is the
size of the swarm; is the particle best
reached solution, and is the global best solution in the swarm. and are cognitive and social parameters that are bounded
between 0 and 2. and are two
random numbers, with uniform distribution U(0,1). is the maximum velocity that bounds the velocity
vector, where . In (1), the first component
represents the inertia of pervious velocity. The inertia weight is a factor used to control the balance of the search algorithm between
exploration and exploitation; the second component is the “cognitive”
component representing the private experience of the particle itself; the third
component is the “social” component, representing the cooperation
among the particles. The recursive steps will go on until we reach the
termination condition (maximum number of iterations).
However, the original
intent of PSO was to graphically simulate the choreography of a bird block or a
fish school. It was found that particle swarm model can be used effectively as
an optimizer in many domains such as training artificial neural networks, linear
constrained function optimization, wireless network optimization, data
clustering, and many other areas where GA can be applied [12–14, 17, 18].
3. Problem Formulation
The input to the resource-constrained instruction
scheduling algorithm is a directed acyclic graph called the data flow graph
(DFG) defining the instructions and the dependencies among them [3, 11]. A DFG
is denoted by the ordered-pair ,
where each node represents an operation that must
be performed, and the edge denotes a dependency between nodes and .
Edges provide a partial order on the
nodes such that an edge between nodes specifies when nodes can execute relative
to each other. In addition to the
DFG, we have a resource library of R different resource types, where gives the number of available units
of resource type .
A resource-constrained scheduling algorithm tends to
schedule the instructions into the control steps such that the execution time
of these instructions are minimized without violating constraints imposed by
the available resources [11]. The resource-constrained scheduling problem can
be formally modeled as a discrete optimization problem as follows.
Minimize “schedule length (SL)”
subject to the following constraints:(i)data dependencies imposed by the DFG must be obeyed. Let and be two operations in the DFG and is
the parent of . Then, the control step in which starts must be later than the finish time of .;
that is, ;(ii)at any control step, the number of active operations of any type must be
less than or equal to the number of available resources of that type; that
is, at any given control cycle, the number of resources used is constrained by , for every .
As mentioned in Section 1, the effectiveness of a list scheduler depends
mainly on the method used to compute the input priority list [5, 6]. There
exist many different heuristics on how to order the list; however, the best
list depends on the structure of the input application. A priority list based
on a single heuristic limits the exploration of the search space for the list scheduler.
In this work, we address this problem in an evolutionary manner. The proposed
two-phase algorithm incorporates the PSO approach customized for the discrete
search space and the traditional list scheduling algorithm. The algorithm searches
for the optimal solution by an iterative evolutionary searching process. In the
first phase, PSO is used to generate particles that traverse the DFG to
construct individual instruction lists using global and local heuristics; these
lists will be used as priority lists for the second phase. In the second phase,
instruction lists are scheduled using a traditional list scheduler, and the quality
of schedules is evaluated based on the required fitness function. The fitness
function is directly linked to the generated schedule length. The fitness of
particle is the schedule length when its position vector
is used as the node
schedule list; that is, fitness of . Based on this
evaluation, the PSO heuristics are adjusted to favor better solution
components.
In order to use PSO
for instruction scheduling problems, a direct correlation must be found between
the particle vector and the solution of scheduling problem. Suppose we would
like to use a swarm of particles to solve the CIS problem whose
position vectors represent feasible solutions. Each position vector is a node
list satisfying the DFG precedence order. This vector will be used as a
priority list for subsequent list scheduling algorithm to generate schedules
for all particles in the swarm. The quality of the schedule generated by each
particle will be ranked according to the fitness function that ranks the favor of a particle in
the population. In CIS, this can be modeled as a minimization problem where the
result of the fitness function must reflect the length of the final schedule
that a member of the population generates.
Since 1995, PSO has proven to be successful in solving many continuous
optimization problems. The first use of PSO in discrete optimization was for
solving the traveling salesman problem TSP [19]. The main issue was to modify the
position and velocity vector (1) and (2) in the original PSO algorithm to span
the discrete search domain. The swap operator was defined in [19] and
subsequently used in [18, 20]. Consider a normal particle for the CIS
problem, where is a vector of nodes. If a new solution is obtained by exchanging node and node in solution , we define the operation as a swap operator, denoted
as and the swap process as ; for example,
A swap sequence SS is made up of one or more swap operators; that is, The order of the swap
operator in SS is important as the swap operators in the swap sequence
act on the solution in order. This can be described by the following formula:
According to (2), a
new position vector is obtained by imposing the velocity vector to the old
position vector. The velocity vector can be defined as a swap sequence SS acting on the position vector representing the node scheduling list. The
velocity vector evolves according to the formula given in (1). The component in (1) means
the basic swap sequence SS that should act on to get to ; . We can swap the nodes in according to from left to right to get SS. This is
also true for (). For example,consider , and . The third element in first element in , then the first swap operator to
operate on is SO(1,3); that is,
. Similarly, the second swap operator is SO (2,3),
and . The third swap operator to
operate on is SO (4,5) producing where = . Finally, we get the basic swap
sequence
The new velocity
vector consists of three SSs, the old velocity vector , , and The three swap sequences can be merged into a new equivalent swap sequence.
Suppose , , act
on one solution in this particular order, first, second, and third, to get a new
solution . This is equivalent to a new swap sequence described
as follows: Assuming that the inertia
weight factor , the new position and velocity evolution equations
in the discrete domain can be rewritten as follows:
4. Proposed PSO-CIS Algorithm
The specific steps
of the PSO-CIS algorithm are presented in Figure 1. The inputs to the algorithm
are the input DFG () and the resource library (). The output of
the algorithm is the instruction schedule. The process begins by generating an
initial group of candidate solutions as particles. Each particle is a
node list generated by topology sort on the input DFG. The performance of the
candidate solutions is evaluated using a list scheduler. The best previous
position of the th particle should be put in ,
and the best position among all the
particles should be put in . The next generation of particles is produced
by updating the position and the velocity vectors of the particles. The evolutionary
cycle will repeat until the maximum number of iterations is reached.
Figure 1: PSO-CIS algorithm.
5. Experimental Results
In this section, we present the experimental results for
the PSO-CIS algorithm described in Section 4. The proposed algorithm was implemented in C++, and experiments were
carried out on an Intel Xeon-based
Linux machine. The performance of the algorithm was compared to the results
obtained from three heuristic scheduling algorithms reported in [11]. In
addition, the performance of the PSO-CIS algorithm was compared to an optimal ILP-based scheduler solved by
CPLEX [7, 11]. For the consistency of the comparison, the same test
conditions were used as in [11]. Five different types of functional units are
included in the resource library, which are ALUs (a), fast multipliers (fm),
multipliers (m), input units (i), and output units (o). The functionality of
these resources overlaps; that is, each instruction can be performed by at
least one resource type. It is assumed that
additions and subtractions can be mapped to ALUs and last 1 cycle, whereas
multiplications are mapped to multipliers and take two cycles.
Six high-level syntheses DFG benchmarks were used to
evaluate the proposed approach, namely, a second order differential equation
solver (HAL), an autoregressive lattice filter (ARF), an elliptic wave filter (EWF), a finite impulse response filter (FIR), and
two implementations for computing the discrete cosine transform (COSINE1) and (COSINE2).
The number of nodes in these DFGs ranges
from 21 to 82 nodes, and they present a set of challenging testing samples for
the instruction scheduling problem that is widely used in the HLS literature [5, 6, 21]. A sample DFG for the elliptic wave filter is presented in Figure 2 [21].
The DFG consists of a set of 34 nodes that perform two types of operations, addition
or multiplication (marked by “*” in Figure 2), submitted to over 47 precedence
constraints.
Figure 2: DFG of elliptic
wave filter benchmark.
For each benchmark, we run the proposed
algorithm under the constraints imposed by the predefined set of resources. A
population of 10 particles is generated. The cognitive and social
components in the heuristic are balanced by setting parameters , and as follows: , and are randomly selected with uniform distribution from the
interval [0, 1]. The use of randomly generated coefficients for and balances between the exploration and
exploitation of the search space and prevents premature
convergence to local minima which presents a critical issue in all evolutionary
algorithms. The maximum number of iterations is set to 100. For each benchmark, 5 runs are conducted, and
the minimum schedule length is reported at the end of each run. The average
value of the 5 runs is reported as the performance of the algorithm for this
benchmark under these particular test conditions.
Experimental results are summarized in Table 1.
The second column in the table indicates the available resources for each test
case. Results from the proposed algorithm are compared to the results reported in
[11] which are obtained
from the mobility-based list scheduling algorithm (MB-list scheduling), the force-directed
scheduling algorithm (FDS), and the mobility-based MAX-MIN ant system (MB-MMAS). As shown in Table 1, the
proposed algorithm generates near optimal schedules consistently over most testing
cases. In all tested benchmarks, the PSO-CIS
and the MMAS algorithms are found to have comparable performance that
outperforms the traditional list scheduler and the force-directed scheduling
algorithms. The proposed algorithm
achieves up to 10.52% performance enhancement compared to the FDS and list
scheduling algorithms. The algorithm also finds a solution for the COSINE1 and COSINE2 benchmarks, where the CPLEX fails to find an optimal solution before
running out of memory.
Table 1: Schedule length
in cycles generated by different scheduling algorithms.
The reason behind
the results is due to the fact that PSO-CIS and MB-MMAS algorithms are both
based on list scheduling, and utilize an evolutionary-based searching approach
for constructing a better scheduling list. The combination of different methods
for producing the scheduling list and for generating the actual schedule is the
key factor in improving the overall schedule quality produced by these hybrid
algorithms compared to the stand alone list scheduler and the force-directed
scheduling algorithms.
Detraction from
these encouraging results is the time complexity of the PSO-CIS algorithm. For
all the benchmarks, the runtime of the PSO-CIS algorithm ranges from 0.12
second to 1.66 second. This is close to the run time obtained by the MB-MMAS
which ranges between 0.1 and 1.76 second. List scheduling is always the fastest
due to its one-pass nature with complexity , where is the
number of operations to be scheduled. It typically finishes within a small
fraction of a second. The force-directed scheduler runs much slower than the
list scheduler, because its complexity is . The proposed
algorithm utilizes list scheduling, and repeatedly performing this operation to
evaluate all particles in the population increases the time complexity. If the
number of iterations is linear with respect to the number of nodes in the DFG,
the complexity becomes . Increasing the population size also
results in similar complexity increase. An interesting extension to this work will
be the exploration of other forms of scheduling algorithms that fit within this
framework and possibly reduce the time complexity of the algorithm.
6. Conclusion
In this paper, a novel instruction scheduling algorithm utilizing
particle swarm optimization was presented. PSO is used in conjunction with the
traditional list scheduling algorithm to minimize the total schedule time in
resource-constrained scheduling problems. The algorithm is based on the phenomenon of collective intelligence as
a group of particles use local and global search heuristics to explore the
discrete search space for an optimal solution. The solutions are iteratively constructed
from efficient operation swaps guided by a fitness function that ranks the quality of the solution in the
population. Experimental results over a set of HLS benchmarks demonstrate that PSO is
an efficient method for solving the discrete problem of the instruction
scheduling. The proposed algorithm produced better schedules in most test cases
compared to other instruction scheduling heuristics.