Abstract
Hardware/software (HW/SW) partitioning determines which components of a system are implemented in hardware and which in software. It is one of the most important steps in the design of embedded systems. The HW/SW partitioning problem is an NP-hard constrained binary optimization problem. In this paper, we propose a tabu search-based memetic algorithm to solve the HW/SW partitioning problem. First, we convert the constrained binary HW/SW partitioning problem into an unconstrained binary problem using an adaptive penalty function that contains no parameters. A memetic algorithm is then proposed for solving this unconstrained problem. The algorithm uses tabu search as its local search procedure. This tabu search has a special feature with respect to solution generation, and it uses a feedback mechanism for updating the tabu tenure. In addition, the algorithm integrates a path relinking procedure to exploit newly found solutions. Computational results are presented for a number of test instances from the literature. Comparison with two other algorithms demonstrates the robustness of the proposed algorithm. The effectiveness of the proposed parameter-free adaptive penalty function is also shown.
1. Introduction
Embedded systems have become ubiquitous in a wide variety of applications and typically consist of application-specific hardware components and programmable components. With the growing complexity of embedded systems, hardware/software codesign has become an effective way of improving design quality, and HW/SW partitioning is its most critical step.
HW/SW partitioning decides which tasks of an embedded system should be implemented in hardware and which in software. A task implemented as a hardware module is faster but more expensive, while a task implemented as a software module is slower but cheaper. For this reason, the main goal of HW/SW partitioning is to assign the tasks so as to optimize one or more system objectives subject to design constraints [1].
In recent years, there has been increasing interest in the HW/SW partitioning problem. Several exact approaches [2–5] have been developed. However, since most formulations of the HW/SW partitioning problem are NP-hard [6], the practical usefulness of these approaches is limited to fairly small instances. For larger instances, a number of heuristic algorithms, both traditional and general-purpose, have been proposed.
There are two traditional families of heuristics: software-oriented and hardware-oriented. A software-oriented heuristic starts with an all-software solution and migrates parts of the system to hardware until all constraints are satisfied. Conversely, a hardware-oriented heuristic starts with an all-hardware solution and moves parts of the system to software until a constraint is violated.
Many general-purpose heuristics and metaheuristics have also been applied to the HW/SW partitioning problem. These include simulated annealing [7–9], genetic algorithms [10–12], tabu search [8, 13–15], artificial immune algorithms [16], and the Kernighan-Lin heuristic [17, 18]. There are also other methods for solving the problem; see [19] for details.
Note that the HW/SW partitioning problem seeks to minimize a cost while satisfying design constraints. It is therefore a discrete constrained optimization problem, and its optimal solutions usually lie on the boundary of the feasible region. Hence, effective constraint-handling techniques are essential.
This paper presents a tabu search-based memetic algorithm (TSMA) for solving the HW/SW partitioning problem. The TSMA uses an adaptive penalty function to convert the original problem into an equivalent unconstrained binary programming problem; both problems have the same discrete global minimizers. The TSMA is then applied to the unconstrained problem in order to solve the original problem. The TSMA integrates a tabu search procedure, a path relinking procedure, and a population updating strategy, which together balance intensification and diversification of the search. The algorithm has been tested on four benchmark instances from the literature. Experimental results and comparisons show that the proposed algorithm is effective and robust.
The remainder of this paper is organized as follows. Section 2 gives the system model and problem formulation. Section 3 presents the parameter-free adaptive penalty function used to convert the original problem into an unconstrained problem, together with some theoretical properties of the resulting unconstrained problem. Section 4 gives a full description of TSMA. Experimental results on four benchmark instances are presented in Section 5. Final remarks are given in Section 6.
2. Hardware/Software Partitioning Model and Formulation
The HW/SW partitioning model discussed in this paper is similar to that presented in [16]. To make the paper self-contained, this section briefly reviews the model considered in [16].
2.1. Function Description
The main function of the system is usually described in a high-level programming language and is then mapped into a control-data flow graph (CDFG). The CDFG is composed of nodes and arcs [20, 21] and is generally a directed acyclic graph (DAG). The nodes of the DAG are determined by the model granularity, that is, by the semantics of a node: a node can represent a short sequence of instructions, a basic block, a function, or a procedure [12]. The arcs between nodes denote their relationships. Each node in the DAG can receive data from its predecessor nodes and send data to its successor nodes.
2.2. Binary Integer Programming Formulation
In a directed acyclic graph (DAG) $G = (V, E)$ with node set $V = \{v_1, \dots, v_n\}$ and arc set $E$, every node is labeled with several attributes. The attributes of a node $v_i$ are defined as follows.
(i) $s_i$ denotes the cost of $v_i$ in software implementation.
(ii) $t_i^{s}$ denotes the executing time of $v_i$ in software implementation.
(iii) $h_i$ denotes the cost of $v_i$ in hardware implementation.
(iv) $t_i^{h}$ denotes the executing time of $v_i$ in hardware implementation.
(v) $\mathrm{in}_i$ is an array that denotes the set of incoming nodes of $v_i$ and stores the corresponding communication times.
(vi) $\mathrm{out}_i$ is an array that denotes the set of outgoing nodes of $v_i$ and stores the corresponding communication times.
(vii) $c_i$ denotes the number of times that node $v_i$ is executed.
Figure 1 [16] shows the partition model used in this paper. Hardware nodes are implemented by an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), and software nodes are executed on a programmable processor (CPU). All nodes exchange data through a shared bus and share a common memory for storing interim data. All hardware nodes without mutual dependencies can be executed concurrently [16].
After every node has been assigned to hardware or software, all nodes are scheduled. The total executing time can be evaluated by the list schedule algorithm [11, 22], and the cost of the design can then be evaluated. The detailed steps of the list schedule algorithm are shown in Algorithm 1.
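The list scheduling step can be illustrated with a short Python sketch. It is a simplified, hypothetical stand-in for Algorithm 1, not a transcription of it: it assumes that a single CPU executes software nodes one at a time in the order in which they become ready, that each hardware node starts as soon as all its inputs are available, that an arc's communication time is paid only when the arc crosses the HW/SW boundary, that bus contention can be ignored, and that every node is executed once. The names `total_time`, `t_sw`, `t_hw`, `succ`, and `comm` are illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple


def total_time(x: List[int], t_sw: List[int], t_hw: List[int],
               succ: List[List[int]], comm: Dict[Tuple[int, int], int]) -> int:
    """Makespan of partition x (x[i] == 1: hardware, 0: software) on a DAG.

    Assumptions of this sketch: one CPU runs software nodes in FIFO ready
    order, hardware nodes run concurrently, and comm[(i, j)] is paid only
    when arc (i, j) crosses the HW/SW boundary.
    """
    n = len(x)
    indeg = [0] * n
    for i in range(n):
        for j in succ[i]:
            indeg[j] += 1
    ready = [0] * n      # time at which all inputs of each node are available
    finish = [0] * n
    cpu_free = 0         # time at which the single CPU becomes idle
    queue = deque(i for i in range(n) if indeg[i] == 0)
    while queue:
        i = queue.popleft()
        if x[i]:         # hardware node: starts as soon as its inputs arrive
            finish[i] = ready[i] + t_hw[i]
        else:            # software node: must also wait for the CPU
            start = max(ready[i], cpu_free)
            finish[i] = start + t_sw[i]
            cpu_free = finish[i]
        for j in succ[i]:
            delay = comm.get((i, j), 0) if x[i] != x[j] else 0
            ready[j] = max(ready[j], finish[i] + delay)
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return max(finish, default=0)


chain = [[1], [2], []]           # arcs 0 -> 1 -> 2
comm = {(0, 1): 1, (1, 2): 1}
print(total_time([1, 0, 1], [2, 3, 4], [1, 1, 1], chain, comm))  # 7
```

For this three-node chain, the all-software partition takes 2 + 3 + 4 = 9 time units, while the mixed partition `[1, 0, 1]` finishes at 7 because the two boundary crossings each add one unit of communication time.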

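In this paper, the objective is to minimize the total cost of the system under a time constraint. Since the cost of software is usually negligible, the HW/SW partitioning problem can be described as follows [11]:
$$\min \sum_{v_i \in V_H} h_i \quad \text{s.t.} \quad T_{\mathrm{total}} \le \mathrm{TimeReq},$$
where $v_i \in V_H$ means that node $v_i$ is implemented by hardware, $T_{\mathrm{total}}$ denotes the total executing time, and TimeReq denotes the required time, which is given in advance by the designer.
Let $x = (x_1, \dots, x_n) \in \{0,1\}^n$ denote a solution of this problem, where $x_i = 1$ ($x_i = 0$) indicates that node $v_i$ is implemented by hardware (software). Then the problem can be formulated as the following constrained binary programming problem:
$$(P)\quad \min\ f(x) = \sum_{i=1}^{n} h_i x_i \quad \text{s.t.} \quad T(x) \le \mathrm{TimeReq},\ x \in \{0,1\}^n,$$
where $T(x)$ denotes the total executing time of $x$, which can be evaluated by the list schedule algorithm, that is, Algorithm 1. We suppose without loss of generality that $s_i$, $t_i^{s}$, $h_i$, $t_i^{h}$, $\mathrm{in}_i$, $\mathrm{out}_i$, $c_i$, and TimeReq are nonnegative integers.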
3. The Adaptive Penalty Function
In this section, we introduce the basic form of the penalty function for constrained optimization problems and present an adaptive penalty function that converts problem (P) into an unconstrained binary optimization problem.
3.1. The Basic Penalty Function
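A variety of penalty functions have been proposed to deal with constraints. Among them, the exact penalty function and the quadratic penalty function are the most basic and widely used [23–25]. Using these penalty functions, problem (P) can be converted into the following unconstrained problem:
$$(P_\sigma)\quad \min\ F_\sigma(x) = f(x) + \sigma \left[\max\{0, g(x)\}\right]^{\beta} \quad \text{s.t.}\ x \in \{0,1\}^n, \tag{1}$$
where $g(x) = T(x) - \mathrm{TimeReq}$, $\sigma > 0$ is the penalty parameter, and $\beta \ge 1$. For $\beta = 1$ (resp., $\beta = 2$), the optimization method that uses (1) is known as the exact (the quadratic) penalty function method.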
Let $X = \{0,1\}^n$, and denote by $S$ the feasible solution space of (P); that is, $S = \{x \in X : T(x) \le \mathrm{TimeReq}\}$. The unconstrained problem (1) thus serves as an auxiliary problem posed over the whole of $X$ rather than over $S$.
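As a small numeric illustration of (1), the Python sketch below evaluates the exact ($\beta = 1$) and quadratic ($\beta = 2$) penalty functions on two made-up solutions: a feasible one with cost 10 and an infeasible one with cost 4 that exceeds TimeReq by 2 time units. The numbers and the function name `penalized` are hypothetical; the point is only that the ranking of the two solutions depends on $\sigma$.

```python
def penalized(cost: float, g: float, sigma: float, beta: int) -> float:
    """Penalty function (1): f(x) + sigma * max(0, g(x)) ** beta."""
    return cost + sigma * max(0.0, g) ** beta


# Hypothetical solutions given as (f(x), g(x)) with g(x) = T(x) - TimeReq.
feasible = (10.0, -3.0)      # meets the deadline, cost 10
infeasible = (4.0, 2.0)      # cheaper, but 2 time units too slow

for sigma in (1.0, 5.0):
    for beta in (1, 2):      # beta = 1: exact penalty, beta = 2: quadratic
        print(sigma, beta, penalized(*feasible, sigma, beta),
              penalized(*infeasible, sigma, beta))
```

With $\sigma = 1$ the infeasible solution scores lower than the feasible one (6.0 and 8.0 versus 10.0), so minimizing (1) would prefer it; with $\sigma = 5$ the order reverses (14.0 and 24.0 versus 10.0). A suitable $\sigma$ therefore depends on the scale of the instance.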
3.2. The Proposed Adaptive Penalty Function
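It is well known that the penalty parameter $\sigma$ in (1) is sensitive and problem-dependent. Hence, it is difficult to choose a value of $\sigma$ [26] that works for a wide range of optimization problems. Therefore, a number of authors [27–31] have suggested parameter-free penalty functions that do away with the sensitive parameter $\sigma$. Following the same reasoning, we propose the following parameter-free adaptive penalty function:
$$F(x, f^*) = \begin{cases} f(x), & g(x) \le 0, \\ f(x) + g(x), & g(x) > 0 \text{ and } f(x) \ge f^*, \\ f^* + g(x), & g(x) > 0 \text{ and } f(x) < f^*, \end{cases} \tag{2}$$
where $f(x)$ is the objective function of (P), $g(x) = T(x) - \mathrm{TimeReq}$, and $f^*$ is an upper bound on the global minimum value of problem (P). Since the cost of any partition is at most the all-hardware cost $\sum_{i=1}^{n} h_i$, we can initialize $f^* = \sum_{i=1}^{n} h_i$. The value of $f^*$ is then updated with the best objective function value found so far at a feasible solution.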
Based on this adaptive penalty function, we construct the following problem:
$$(P_{f^*})\quad \min\ F(x, f^*) \quad \text{s.t.}\ x \in X = \{0,1\}^n. \tag{3}$$
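The adaptive penalty can be sketched in a few lines of Python. The sketch assumes the piecewise reading of (2) in which a feasible solution keeps its cost $f(x)$, while an infeasible solution is charged $\max\{f(x), f^*\} + g(x)$, that is, $f(x) + g(x)$ when $f(x) \ge f^*$ and $f^* + g(x)$ otherwise. The function name and the sample numbers are hypothetical.

```python
def adaptive_penalty(cost: float, g: float, f_star: float) -> float:
    """Assumed reading of the adaptive penalty (2):
    F(x, f*) = f(x)                 if g(x) <= 0 (feasible),
             = max(f(x), f*) + g(x) if g(x) > 0  (infeasible).
    """
    if g <= 0:
        return cost
    return max(cost, f_star) + g


f_star = 12.0  # hypothetical upper bound, e.g. the cost of a known feasible partition
print(adaptive_penalty(10.0, -3.0, f_star))  # 10.0: feasible, keeps its cost
print(adaptive_penalty(4.0, 2.0, f_star))    # 14.0: cheap but infeasible, f* + g(x)
print(adaptive_penalty(15.0, 1.0, f_star))   # 16.0: costly and infeasible, f(x) + g(x)
```

Every infeasible solution scores strictly above $f^*$, while a feasible solution cheaper than $f^*$ keeps its cost, so the ordering no longer hinges on a tuned $\sigma$.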
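Definition 1. A solution $\bar{x} \in X$ is called a discrete local minimizer of problem $(P_{f^*})$ if $F(\bar{x}, f^*) \le F(x, f^*)$ for all $x \in N(\bar{x})$, where $N(\bar{x})$ is a neighborhood of $\bar{x}$. Furthermore, if $F(\bar{x}, f^*) \le F(x, f^*)$ for all $x \in X$, then $\bar{x}$ is called a discrete global minimizer of problem $(P_{f^*})$. The same notions are defined for problem (P) with $f$ in place of $F(\cdot, f^*)$ and the feasible set $S$ in place of $X$.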
Theorem 2. Let $f^*$ be an upper bound on the discrete global minimum value of problem (P); then problems (P) and $(P_{f^*})$ have the same discrete global minimizers and the same global minimum values.
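Proof. Let $x^*$ be a discrete global minimizer of problem (P); then $g(x^*) \le 0$, and $f(x^*) \le f(x)$ for all feasible $x$. Moreover, $f(x^*) \le f^*$, since $f^*$ is an upper bound on the global minimum value. Consider the feasible set $S = \{x \in X : g(x) \le 0\}$ and its complement $X \setminus S$.
For all $x \in S$, by (2), we have $F(x, f^*) = f(x)$; then
$$F(x^*, f^*) = f(x^*) \le f(x) = F(x, f^*). \tag{4}$$
For all $x \in X \setminus S$, we have two cases: (i) $g(x) > 0$ and $f(x) \ge f^*$, and (ii) $g(x) > 0$ and $f(x) < f^*$.
Case (i). By (2), we have
$$F(x, f^*) = f(x) + g(x) > f(x) \ge f^* \ge f(x^*) = F(x^*, f^*). \tag{5}$$
Case (ii). By (2), we have $F(x, f^*) = f^* + g(x)$. Since $g(x) > 0$, it holds that
$$F(x, f^*) > f^* \ge f(x^*) = F(x^*, f^*). \tag{6}$$
From (4), (5), and (6), $x^*$ is a discrete global minimizer of problem $(P_{f^*})$, and $F(x^*, f^*) = f(x^*)$.
Conversely, let $\hat{x}$ be a discrete global minimizer of problem $(P_{f^*})$; then
$$F(\hat{x}, f^*) \le F(x, f^*), \quad \forall x \in X. \tag{7}$$
Suppose that $\hat{x} \notin S$; that is, $g(\hat{x}) > 0$. If $f(\hat{x}) \ge f^*$, then
$$F(\hat{x}, f^*) = f(\hat{x}) + g(\hat{x}) > f^* \ge f(x^*) = F(x^*, f^*), \tag{8}$$
where $x^*$ is a discrete global minimizer of problem (P); that is, $x^* \in S$. If $f(\hat{x}) < f^*$, then
$$F(\hat{x}, f^*) = f^* + g(\hat{x}) > f^* \ge f(x^*) = F(x^*, f^*). \tag{9}$$
Hence,
$$F(\hat{x}, f^*) > F(x^*, f^*). \tag{10}$$
Since $x^* \in X$, inequality (10) contradicts (7), which means that $\hat{x}$ would not be a discrete global minimizer of problem $(P_{f^*})$. So $\hat{x} \in S$, and by (2) and (7), $f(\hat{x}) = F(\hat{x}, f^*) \le F(x^*, f^*) = f(x^*)$; thus $\hat{x}$ is also a discrete global minimizer of (P). Therefore, problems (P) and $(P_{f^*})$ have the same discrete global minimizers and global minimum values.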
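Theorem 3. Suppose that $\bar{x}$ is a solution of problem $(P_{f^*})$ with $F(\bar{x}, f^*) \le f^*$. Then $\bar{x}$ is a feasible solution of problem (P).
Proof. If $\bar{x}$ is an infeasible solution of problem (P), then $g(\bar{x}) > 0$. By (2), if $f(\bar{x}) \ge f^*$, we have $F(\bar{x}, f^*) = f(\bar{x}) + g(\bar{x}) > f^*$. If $f(\bar{x}) < f^*$, then $F(\bar{x}, f^*) = f^* + g(\bar{x}) > f^*$. So, in both cases $F(\bar{x}, f^*) > f^*$, which contradicts the condition $F(\bar{x}, f^*) \le f^*$. Hence, the theorem holds.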
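Theorems 2 and 3 can be checked numerically by brute force on a tiny, made-up instance. The Python sketch below enumerates all $2^4$ partitions of a four-node instance whose costs, times, and TimeReq are hypothetical, uses a deliberately simple timing model in which all nodes run one after another (not the list schedule of Algorithm 1), takes $f^*$ as the all-hardware cost, and assumes the piecewise reading of (2) used in the proofs above.

```python
from itertools import product

h = [5, 3, 4, 2]       # hypothetical hardware costs h_i
t_sw = [6, 4, 5, 3]    # hypothetical software executing times
t_hw = [1, 1, 2, 1]    # hypothetical hardware executing times
time_req = 10          # hypothetical TimeReq


def cost(x):
    return sum(hi * xi for hi, xi in zip(h, x))


def g(x):
    # Toy timing model: every node runs after the previous one (no list schedule).
    t = sum(ts if xi == 0 else th for ts, th, xi in zip(t_sw, t_hw, x))
    return t - time_req


def F(x, f_star):
    # Assumed reading of (2): feasible -> f(x); infeasible -> max(f(x), f*) + g(x).
    return cost(x) if g(x) <= 0 else max(cost(x), f_star) + g(x)


X = list(product((0, 1), repeat=len(h)))
f_star = cost((1,) * len(h))                  # all-hardware cost, an upper bound
feasible = [x for x in X if g(x) <= 0]
p_min = min(cost(x) for x in feasible)        # optimal value of (P)
pf_min = min(F(x, f_star) for x in X)         # optimal value of (P_{f*})
argmin_p = {x for x in feasible if cost(x) == p_min}
argmin_pf = {x for x in X if F(x, f_star) == pf_min}
print(p_min, pf_min, argmin_p == argmin_pf)   # 8 8 True  (Theorem 2)
print(all(g(x) <= 0 for x in X if F(x, f_star) <= f_star))  # True  (Theorem 3)
```

Here the unique optimum places nodes 3 and 4 in software, and every infeasible partition scores at least $f^* + 1 = 15$, so it can never tie with a feasible one.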
4. The Proposed Memetic Algorithm for HW/SW Partitioning
Memetic algorithms (MAs) are a kind of global search technique derived from Darwinian principles of natural evolution and Dawkins’ notion of memes [32]. They are genetic algorithms that use local search procedures to intensify the search [33]. MAs have been applied to solve a variety of optimization problems [34–40].
In this section, a tabu search-based memetic algorithm, TSMA, is presented to solve the parameter-free unconstrained problem $(P_{f^*})$. A general framework of TSMA is presented first, followed by detailed descriptions of its components.
4.1. General Framework of the Hybrid Memetic Algorithm
The steps of the general framework of TSMA for solving $(P_{f^*})$ are shown in Algorithm 2. Throughout its execution, the TSMA updates the upper bound $f^*$ with the function value at the current best solution $x^{best}$.

Before its iterative process begins, the TSMA performs three main steps; see Algorithm 2. In step 1, it initializes the upper bound $f^* = f(x^0)$ using the known trivial solution $x^0$, where $x^0_i = 1$, $i = 1, \dots, n$ (all nodes in hardware). In step 2, it generates an initial random population set $Pop$ of size $p$; that is, $Pop = \{x^1, \dots, x^p\}$, $x^j \in \{0,1\}^n$, $j = 1, \dots, p$. In step 4, the TSMA refines each member of the initial population by the local search procedure, namely the tabu search, and treats the refined set as the starting population.
Once the iteration process begins, the TSMA repeatedly performs three main steps: crossover followed by tabu search, path relinking, and updating of $Pop$. In step 10, a new solution $x'$ is generated by the crossover operation followed by tabu search. In step 17, the path relinking procedure may be applied to $x'$ in an attempt to generate an improved solution from it. Finally, in step 22, the population updating process decides whether, and which, member of $Pop$ should be replaced by the (possibly improved) solution $x'$. The remaining steps of Algorithm 2 update $x^{best}$ and $f^*$. This iterative process repeats until a stopping condition is met.
The main iterative components of TSMA (the crossover operator followed by the tabu search procedure, the path relinking procedure, and the population updating strategy) are explained in the following subsections.
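The loop just described can be summarized by a Python skeleton in which every component is passed in as a function, so that only the order of the steps is fixed. This is a hedged sketch rather than Algorithm 2: all names are hypothetical, path relinking is attempted (as an assumption) only when the offspring does not improve on $x^{best}$, $f^*$ is tightened whenever the best solution scores at or below it (by Theorem 3 such a solution is feasible), and the two stopping rules of Section 4.6 appear as `max_gen` and `max_stall`.

```python
import random
from typing import Callable, List

Solution = List[int]


def tsma(n: int, pop_size: int, F: Callable[[Solution, float], float], f_star: float,
         crossover: Callable, tabu_search: Callable, path_relinking: Callable,
         update_population: Callable, max_gen: int, max_stall: int,
         rng: random.Random) -> Solution:
    """Order of the main steps of a TSMA-style loop; all components are
    supplied by the caller, so this fixes only the control flow."""
    # Random initial population, each member refined by the tabu search.
    pop = [tabu_search([rng.randint(0, 1) for _ in range(n)], f_star)
           for _ in range(pop_size)]
    best = min(pop, key=lambda s: F(s, f_star))
    stall = 0
    for _ in range(max_gen):                       # stopping rule 1
        a, b = rng.sample(pop, 2)
        child = tabu_search(crossover(a, b, rng), f_star)
        if F(child, f_star) < F(best, f_star):
            best, stall = list(child), 0
        else:
            # Assumption: relink only when the offspring did not improve x_best.
            improved = path_relinking(child, best, f_star)   # may be None
            if improved is not None:
                child, best, stall = improved, list(improved), 0
            else:
                stall += 1
        # Tighten f*; by Theorem 3 a best value <= f* belongs to a feasible x.
        f_star = min(f_star, F(best, f_star))
        pop = update_population(pop, child, f_star, rng)
        if stall >= max_stall:                     # stopping rule 2
            break
    return best
```

Keeping the components as parameters makes it easy to swap, for example, the penalty inside `F` while leaving the search loop untouched.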
4.2. Crossover Operator
We adopt the fixed crossover operator [41] to generate a new offspring. First, a pair of individuals is selected randomly from the population. Then, wherever the two selected individuals have the same bit value, the offspring inherits it; otherwise, the offspring's bit takes the value 0 or 1 at random. Let $x^a$ and $x^b$ denote the two individuals selected from the population. The pseudocode of the crossover operator is given in Algorithm 3.
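A minimal Python sketch of this crossover follows. It mirrors the description above rather than transcribing Algorithm 3; the function name and the use of a seeded `random.Random` for reproducibility are choices made here.

```python
import random
from typing import List


def fixed_crossover(xa: List[int], xb: List[int], rng: random.Random) -> List[int]:
    """Keep every bit on which both parents agree; set each remaining bit
    to 0 or 1 uniformly at random."""
    return [a if a == b else rng.randint(0, 1) for a, b in zip(xa, xb)]


rng = random.Random(42)                       # seeded for reproducibility
child = fixed_crossover([1, 0, 1, 1, 0], [1, 1, 0, 1, 0], rng)
print(child)  # positions 0, 3 and 4 are always 1, 1 and 0
```

Only the positions where the parents disagree are randomized, so the offspring always lies on a shortest Hamming path between the two parents' common bits.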

4.3. The Tabu Search Procedure
Tabu search (TS) is a metaheuristic that has been applied successfully to a number of combinatorial optimization problems [42–45]. It combines search strategies designed to avoid being trapped in a local minimizer and to prevent revisiting recently generated solutions [46]. The neighborhood structure and the tabu tenure with which TS is implemented are fundamental to achieving these objectives. A new solution $x'$ is generated from the current solution $x$, within a defined neighborhood $N(x)$ of $x$, by an operation called a “move.” The solution $x$ is then not generated again for a number of iterations (the tabu tenure).
The tabu search procedure presented here differs from standard TS mainly in solution generation. While we preserve the tabu tenure feature of TS, we do away with the neighborhood structure in generating $x'$ from $x$. This solution generation procedure is designed to suit the problem at hand and to make the search greedier. In particular, a sequence of flips (each flip returns a solution) is carried out on the allowed components of $x$, and the best solution in the sequence is taken as $x'$. More specifically, the flip gain of a node $v_i$ (node $v_i$ corresponds to the variable $x_i$) is defined as
$$\Delta_i = F(x, f^*) - F(x^{(i)}, f^*), \tag{11}$$
where $x^{(i)}$ is obtained from $x$ by flipping $x_i$ to its complement, that is, changing $x_i$ to $1 - x_i$. A positive gain means that the objective value of problem $(P_{f^*})$ decreases when $x_i$ is flipped. In addition, for each variable $x_i$ we keep an associated tabu tenure $\mathrm{tt}_i$, which prevents $x_i$ from being flipped again until $\mathrm{tt}_i$ has decreased to zero. Hence, the gain $\Delta_i$ ($i = 1, \dots, n$) is calculated using (11) only when the flip is permitted by the tabu tenure or an aspiration criterion is satisfied.
At the beginning of the tabu search procedure, the tenures $\mathrm{tt}_i$ ($i = 1, \dots, n$) are initialized by the user, and whenever $x_i$ is flipped, the corresponding $\mathrm{tt}_i$ is updated by rule (12), as suggested by Cai et al. [42]. This updating rule is called the feedback mechanism in [42], as it prevents $\mathrm{tt}_i$ from becoming too large or too small.
We use a simple aspiration criterion that permits a tabu variable to be flipped if doing so leads to a solution better than the best solution found by the tabu search procedure so far. The procedure requires several parameters to be initialized: the maximum number of iterations and the initial tenures $\mathrm{tt}_i$ ($i = 1, \dots, n$). The tabu search procedure is summarized in Algorithm 4. Note that the index $i$ selected in step 4 corresponds to a node that is either not tabu ($\mathrm{tt}_i = 0$) or satisfies the aspiration criterion (which implies a positive gain). The tabu search procedure stops when the maximum number of iterations is reached and returns the best solution found during the process.
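A simplified Python version of such a tabu search is sketched below. It is an illustration under stated assumptions, not Algorithm 4: each iteration performs the single best allowed flip (rather than the paper's sequence of flips), a fixed tenure replaces the feedback rule (12), and the objective `F` is a plain function of the solution, standing in for $F(\cdot, f^*)$ with $f^*$ held fixed. The gain is the decrease in `F` caused by a flip, and the aspiration criterion admits a tabu flip that beats the best solution found so far.

```python
from typing import Callable, List


def tabu_search(x: List[int], F: Callable[[List[int]], float],
                max_iter: int, tenure: int) -> List[int]:
    """Single-flip tabu search minimizing F over binary vectors.

    A fixed tenure is used here instead of the feedback rule (12).
    """
    n = len(x)
    x = list(x)
    fx = F(x)
    best, f_best = list(x), fx
    tt = [0] * n                      # remaining tabu iterations per variable
    for _ in range(max_iter):
        best_i, best_gain = -1, float("-inf")
        for i in range(n):
            x[i] = 1 - x[i]           # trial flip
            f_flip = F(x)
            x[i] = 1 - x[i]           # undo
            gain = fx - f_flip        # positive gain: F decreases
            allowed = tt[i] == 0 or f_flip < f_best   # aspiration criterion
            if allowed and gain > best_gain:
                best_i, best_gain = i, gain
        if best_i < 0:                # everything tabu and no aspiration
            break
        x[best_i] = 1 - x[best_i]
        fx = F(x)
        tt = [max(0, t - 1) for t in tt]
        tt[best_i] = tenure           # the flipped variable becomes tabu
        if fx < f_best:
            best, f_best = list(x), fx
    return best


def toy_objective(s: List[int]) -> int:
    # Hypothetical objective: Hamming distance to a fixed target vector.
    return sum(abs(a - b) for a, b in zip(s, [1, 0, 1, 1, 0]))


print(tabu_search([0] * 5, toy_objective, max_iter=20, tenure=2))  # [1, 0, 1, 1, 0]
```

The search keeps moving after reaching a local minimum (non-improving flips are accepted), which is what lets tabu search escape it; the returned value is always the best solution seen.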

4.4. Path Relinking (PR) Procedure
The path relinking technique was originally proposed by Glover et al. [47] to explore possible trajectories connecting high quality solutions obtained by heuristics. It has been applied successfully to solve a number of optimization problems such as data mining [48], unconstrained binary quadratic programming [49], capacitated clustering [50], and multiobjective knapsack problem [51].
The PR procedure presented here explores trajectories between two good solutions. Within the iteration process of TSMA, two such solutions are always available: the current best solution $x^{best}$ and the solution $x'$ produced by the tabu search procedure. When $x' \neq x^{best}$, PR is executed in an attempt to improve $x^{best}$. At the beginning, PR identifies the variables whose values differ in $x'$ and $x^{best}$ and stores their indices in a set $D$. PR then starts its iteration process, executing one iteration for each index in $D$. PR stops as soon as an improved $x^{best}$ has been found during an iteration.
An iteration starts by identifying the index $i^*$ with the highest gain among the remaining indices; that is, $i^* = \arg\max_{i \in D} \Delta_i$. Let $x''$ be the solution obtained from $x'$ by flipping $x'_{i^*}$. The following tasks are then performed to see (a) whether PR has obtained a new best solution or (b) whether PR should keep moving towards the current $x^{best}$.
(a) If $F(x'', f^*) < F(x^{best}, f^*)$, then PR stops, as it has found a new best solution.
(b) Otherwise, $x'$ is replaced by $x''$, so that $x'$ now agrees with $x^{best}$ in position $i^*$. The index $i^*$ is then removed from $D$.
PR then proceeds with the next iteration for the remaining indices in $D$. The pseudocode of PR is given in Algorithm 5.
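A Python sketch of this path relinking step follows. It assumes that `F` is the objective with $f^*$ held fixed and that ties in the gain are broken by the smallest index; the function and variable names are hypothetical. It returns the first solution on the path that beats $x^{best}$, or `None` when the walk reaches $x^{best}$ without improvement.

```python
from typing import Callable, List, Optional


def path_relinking(x: List[int], x_best: List[int],
                   F: Callable[[List[int]], float]) -> Optional[List[int]]:
    """Walk from x towards x_best one differing variable at a time and return
    the first solution with F below F(x_best), or None if there is none."""
    x = list(x)
    f_best = F(x_best)
    D = [i for i in range(len(x)) if x[i] != x_best[i]]   # differing indices

    def flip_gain(i: int) -> float:
        before = F(x)
        x[i] = 1 - x[i]
        after = F(x)
        x[i] = 1 - x[i]                    # undo the trial flip
        return before - after              # positive: F decreases

    while D:
        i_star = max(D, key=flip_gain)     # highest gain; ties -> smallest index
        x[i_star] = 1 - x[i_star]          # x now agrees with x_best at i_star
        if F(x) < f_best:
            return x                       # (a) new best solution found
        D.remove(i_star)                   # (b) keep moving towards x_best
    return None
```

Because every step copies one more bit of $x^{best}$, the walk ends at $x^{best}$ after $|D|$ steps, so the procedure costs at most $|D|^2$ gain evaluations.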

4.5. Population Updating Strategy
A combination of intensification and diversification is essential for designing high-quality hybrid heuristic algorithms [52, 53]. Having described how the TSMA intensifies its search via the tabu search procedure and PR, we now present its diversification strategy. Diversification is achieved by updating the population set $Pop$ with the solution $x'$ obtained by the tabu search procedure or PR. In this subsection, $x'$ is referred to as the trial solution.
As in other algorithms [35, 38], the TSMA uses population updating to maintain both the quality and the diversity of the members of $Pop$. A measure is needed to decide whether the trial solution $x'$ should be inserted into $Pop$ and, if so, which member it should replace. To this end, a measure $H$, given in (13), is used. It combines the objective value of a solution, measured relative to the current best solution $x^{best}$, with a distance term that measures how far the solution lies from the other members of $Pop$, and it contains a parameter that balances the relative importance of solution quality and population diversity.
In the updating process, we want to achieve two objectives: solution quality and population diversity. At early stages of TSMA, the points in $Pop$ are scattered and diversely distributed, so further diversification is not needed. Hence, the decision whether $x'$ should replace a member of $Pop$, and which one, should be based mainly on the contribution of the function value to $H$, and not purely on the location of $x'$ relative to the points in $Pop$ (i.e., the density of those points). When $x'$ is likely to be far from the members of $Pop$, the function-value term in (13) is likely to dominate $H$.
At later stages of TSMA, the points in $Pop$ are likely to be clustered together, so diversification of $Pop$ is needed. Since $x'$ is then more likely to be close to $x^{best}$, the function-value term in (13) is likely to be small, so that $H$ is driven mainly by the distance term.
With the functional form (13) of $H$, we now devise a strategy to achieve both objectives. Central to this strategy is the comparison of the measure values $H(x')$ and $H(x^j)$, $j = 1, \dots, p$. Here $H(x')$ (resp., $H(x^j)$) reflects the function value at $x'$ (resp., at $x^j$) and the sparsity of $x'$ (resp., $x^j$) with respect to the other solutions in $Pop \cup \{x'\}$. The point with the minimum measure is then identified and denoted by $x^w$; that is, $x^w = \arg\min \{H(x) : x \in Pop \cup \{x'\}\}$. When $x^w \neq x'$, we update $Pop$ as $Pop = (Pop \setminus \{x^w\}) \cup \{x'\}$.
On the other hand, if $x^w = x'$, then $x'$ is inserted into $Pop$ only with a small probability, because always inserting it would reduce the diversity of $Pop$. In this case, $x'$ replaces $x^{sw}$, the member of $Pop$ with the smallest measure, i.e., $H(x^{sw}) \le H(x^j)$ for all $x^j \in Pop$ (equivalently, the second smallest measure in $Pop \cup \{x'\}$). Replacing $x^{sw}$ is intended to increase the diversity of $Pop$. The pseudocode of the population updating procedure is given in Algorithm 6.
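Because the exact form of (13) is not reproduced here, the following Python sketch substitutes a hypothetical score in the spirit of quality-and-distance population updating: $H = \lambda \cdot \text{quality} + (1 - \lambda) \cdot \text{sparsity}$, where quality is $1/(1 + F(x) - F_{\min})$ and sparsity is the normalized Hamming distance to the nearest other candidate. Only the replacement logic (replace $x^w$ if it is an existing member, otherwise replace $x^{sw}$ with a small probability) follows the description above; the score, the duplicate check, and all names (`lam`, `p_small`, etc.) are assumptions.

```python
import random
from typing import Callable, List

Solution = List[int]


def hamming(a: Solution, b: Solution) -> int:
    return sum(u != v for u, v in zip(a, b))


def score(x: Solution, others: List[Solution], F: Callable[[Solution], float],
          f_min: float, lam: float) -> float:
    """Hypothetical stand-in for (13): larger means more worth keeping."""
    quality = 1.0 / (1.0 + F(x) - f_min)                    # 1.0 for the best value
    sparsity = min(hamming(x, y) for y in others) / len(x)  # distance to nearest other
    return lam * quality + (1.0 - lam) * sparsity


def update_population(pop: List[Solution], x_new: Solution, F: Callable[[Solution], float],
                      lam: float, p_small: float, rng: random.Random) -> List[Solution]:
    if x_new in pop:                          # never store duplicates
        return pop
    cand = pop + [x_new]
    f_min = min(F(c) for c in cand)
    s = [score(c, cand[:k] + cand[k + 1:], F, f_min, lam) for k, c in enumerate(cand)]
    w = min(range(len(cand)), key=s.__getitem__)        # x^w: smallest score overall
    if w != len(pop):                         # x^w is an old member: replace it by x_new
        pop[w] = x_new
    elif rng.random() < p_small:              # x^w is x_new: accept only rarely
        sw = min(range(len(pop)), key=s.__getitem__)    # x^sw: smallest among old members
        pop[sw] = x_new
    return pop
```

For clarity this recomputes every distance on each call; an implementation concerned with speed would cache the pairwise distances between members of $Pop$.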

Finally, we consider the time required to update $Pop$. The procedure must calculate the distance between $x'$ and the members of $Pop$ and evaluate the measure $H$ for each candidate solution. Moreover, it must find the solutions with the smallest and the second smallest values of $H$, which takes time linear in the population size. The total running time of the population updating procedure is bounded by the sum of these costs.
4.6. Termination Criteria
In this paper, we use two criteria to stop TSMA. The algorithm stops when the maximum number of generations is reached, or when the best solution has not improved for a prescribed number of consecutive generations, whichever occurs first.
5. Computational Experiments
In this section, we present experiments performed to evaluate the performance of TSMA. The benchmark test instances are introduced first, followed by the parameter settings for TSMA. Finally, computational results, comparisons, and analyses are reported. The algorithm is coded in C and run on a PC with a 2.11 GHz AMD processor and 1 GB of RAM.
5.1. Test Instances
Task graphs for free (TGFF) [54] can create directed acyclic graphs (DAGs), that is, task graphs; given identical parameters and input seeds, it generates identical task graphs. We used TGFF (Windows Version 3.1) to generate four DAGs as HW/SW partitioning test instances. These DAGs were also used to test the performance of the artificial immune algorithm ENSAHSP [16]. TGFF (Windows Version 3.1) can be obtained from the website (http://ziyang.eecs.umich.edu/~dickrp/tgff/), and the parameters are given in the Appendix. Since these DAGs do not contain cycles, every node in each DAG is executed only once; that is, $c_i = 1$ for every node $v_i$.
The characteristics of these test instances are given in Table 1: the total executing time when all nodes are implemented in software (TimeSw), the total executing time when all nodes are implemented in hardware (TimeHw), the required time (TimeReq), and the cost when all nodes are implemented in hardware (CostHw).
5.2. The Parameter Settings
The tabu search procedure requires three parameters; the values we used are listed in Table 3. These values differ from those used by Cai et al. [42] for unconstrained binary quadratic programming, for two reasons: the tabu search procedure proposed here differs from that of [42], and the test problems used in this paper are smaller than those used in [42]. Moreover, we chose these values based on our numerical experiments.
An important parameter of TSMA is the size $p$ of the population set $Pop$: the larger $p$ is, the higher the CPU time, because the cost of updating $Pop$ grows with $p$. The parameter in (13) that balances quality and diversity is used by TSMA to diversify $Pop$ and therefore has a significant effect on the performance of TSMA. We chose the values of these two parameters after some numerical experiments.
We first conducted a number of preliminary experiments on test instance number 3, that is, the instance with 90 nodes. We fixed the remaining parameters and ran TSMA with various values of the population size and of the balancing parameter in (13) to obtain suitable ranges for both parameters.
With the other parameter values chosen as above, we then conducted another series of runs of TSMA on the instance with 90 nodes. We used four values of the population size and, for each of them, six values of the balancing parameter, giving 24 pairs. For each pair, TSMA was run 5 times, for a total of 120 runs. The average results of the 5 runs for each pair are reported in Table 2.
Based on the results in Table 2, we selected values of the population size and of the balancing parameter and used them throughout the numerical experiments. All parameter values used in TSMA are listed in Table 3, where column 2 gives the subsection in which the corresponding parameter is discussed.
5.3. Numerical Comparisons
To show the effectiveness of TSMA, we compare its results with those of the evolutionary negative selection algorithm (ENSAHSP) [16] and a traditional evolutionary algorithm (EA) [16]. TSMA was run 20 times on each test instance in Table 1, and its average results are used in the comparison. The results of ENSAHSP and EA are taken from [16]. The comparison is presented in Table 4, where the columns “best,” “mean,” and “worst” report the best, average, and worst costs, respectively, and the column “time” reports the average CPU time of TSMA. CPU times for the other two algorithms are not given because they were not reported in [16], and obtaining them would require rerunning those algorithms ourselves. The last column of Table 4 reports the percentage improvement of TSMA over the best previous results. The results obtained by all three algorithms on the first 4 test instances are further summarized in Figures 2, 3, 4, and 5.
Figures 3–5 clearly show the superiority of TSMA over the other two algorithms, with wide differences in cost. Indeed, Table 4 shows that TSMA achieved costs 84, 65, and 176 lower than the previous best results obtained by ENSAHSP on instances 2, 3, and 4, respectively. These improvements are significant.
5.4. Effectiveness of the Adaptive Penalty Function
To show the effectiveness of the proposed parameter-free adaptive penalty function, we replaced the adaptive penalty function (2) with the exact penalty function (1) and kept all other ingredients of TSMA, including the parameter values in Table 3, unchanged. We denote this implementation by TSMA2. Since the parameter $\sigma$ in (1) is problem-dependent, we used three values of $\sigma$, including 100 and 200, and ran TSMA2 20 times on each test instance for each value. Average results, including average CPU times (in seconds), are summarized in Table 5. Observations and comparisons based on Tables 4 and 5 are presented below.
(1) Dependency on $\sigma$: Table 5 clearly shows that the results depend strongly on $\sigma$. While the results on instance number 1 are comparable for all values of $\sigma$, those on the other instances are not. For example, in terms of the “mean” value, the best result on instance number 2 was obtained with one value of $\sigma$, whereas the best “mean” values on instances 3 and 4 were obtained with a different value.
(2) Comparison using “mean”: TSMA performs better than TSMA2 on all instances except instance number 1, on which both perform the same.
(3) Comparison using “best”: TSMA performs better than TSMA2 on the last instance, and both are equally matched on the remaining instances. TSMA2 obtained these results on instances 1, 2, and 3 with a specific value of $\sigma$ (see Table 5).
(4) Comparison using “time”: The best CPU times of TSMA2 vary with $\sigma$. However, even if we take the best CPU times obtained by TSMA2 regardless of $\sigma$, the CPU times of TSMA2 and TSMA remain comparable.
Figures 6, 7, and 8 give a statistical view, in the form of boxplots, of the best values found by TSMA and TSMA2 in each of 20 runs. Only three figures are presented because the standard deviation is zero for instance number 1. The boxplots show the range over which the results of the 20 runs vary. All three figures show that the boxplots of TSMA are shorter than the corresponding boxplots of TSMA2 for all three values of $\sigma$. This clearly demonstrates the advantage of the adaptive penalty function (2) over the exact penalty function (1).
6. Conclusion
In this paper, we have presented a memetic algorithm for the hardware/software partitioning problem. The algorithm has three main components: a local tabu search, a path relinking procedure, and population updating. The local tabu search is a “controlled local search” procedure used for search intensification, the path relinking procedure further exploits good solutions, and population updating diversifies the solutions. The motivation for each of these components within the proposed memetic algorithm has been provided.
The hardware/software partitioning problem is converted into an equivalent unconstrained binary optimization problem by employing a parameter-free adaptive penalty function. We have justified this penalty function first theoretically and then numerically, by comparing its performance with that of the exact penalty function. The memetic algorithm is then applied to this unconstrained problem, and its robustness with respect to solution quality is shown by comparison with two other algorithms from the literature.
Appendix
In this paper, the test instances were generated with TGFF (Windows Version 3.1) using the following parameter settings [16], where 30.tgffopt, 60.tgffopt, 90.tgffopt, 120.tgffopt, 300.tgffopt, and 500.tgffopt are for the DAGs with 30, 60, 90, 120, 300, and 500 nodes, respectively. See Algorithm 7.

Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported partially by the National Natural Science Foundation of China under Grants 11226236, 11301255, and 61170308, the Natural Science Foundation of Fujian Province of China under Grant 2012J05007, and the Science and Technology Project of the Education Bureau of Fujian, China, under Grants JA13246 and Jk2012037.