Scientific Programming

Volume 2016 (2016), Article ID 6382765, 11 pages

http://dx.doi.org/10.1155/2016/6382765

## An Efficient Technique for Hardware/Software Partitioning Process in Codesign

Department of Electrical Engineering, National Institute of Applied Sciences and Technology, Polytechnic School of Tunisia, Advanced Systems Laboratory, B.P. 676, 1080 Tunis Cedex, Tunisia

Received 27 January 2016; Accepted 10 May 2016

Academic Editor: Michele Risi

Copyright © 2016 Imene Mhadhbi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Codesign methodology deals with the problem of designing complex embedded systems, where automatic hardware/software partitioning is one key issue. The research efforts in this issue are focused on exploring new automatic partitioning methods which consider only binary or extended partitioning problems. The main contribution of this paper is to propose a hybrid FCMPSO partitioning technique, based on Fuzzy C-Means (FCM) and Particle Swarm Optimization (PSO) algorithms suitable for mapping embedded applications for both binary and multicores target architecture. Our FCMPSO optimization technique has been compared using different graphical models with a large number of instances. Performance analysis reveals that FCMPSO outperforms PSO algorithm as well as the Genetic Algorithm (GA), Simulated Annealing (SA), Ant Colony Optimization (ACO), and FCM standard metaheuristic based techniques and also hybrid solutions including PSO then GA, GA then SA, GA then ACO, ACO then SA, FCM then GA, FCM then SA, and finally ACO followed by FCM.

#### 1. Introduction

The hardware/software partitioning process presents the crucial task of the codesign methodology. It is concerned to decide which functions are to be implemented in hardware components and which ones in software components. This partitioning process aims at finding an optimal trade-off between conflicting requirements to improve the system performance.

Recently, different optimization methods have been undertaken to automate the hardware/software partitioning process.

These optimization methods can be split into exact and heuristic methods. The exact methods, such as Integer Linear Programing (ILP) [1], dynamic programming [2, 3], and branch-and-bound [4], work effectively for smaller graph with several tens of nodes. However, the heuristic methods produce near-optimal solutions even for larger inputs. The heuristic methods can, also, be iterative or constructive. The iterative methods such as PSO [5], Genetic Algorithm (GA) [6], Ant Colony Optimization (ACO) [7], Simulated Annealing (SA) [8], Fiduccia-Matteyeses [9], Kernighan/Lin [10], and Tabu Search (TS) [11] attempt to modify a given solution until no improvement can be done. However, the constructive methods, such as greedy and hierarchical clustering [12], generate a small number of solutions starting from an initial partitioning by selecting and adding components to the partial solution until a complete solution is obtained.

Designers of embedded systems focus on achieving more optimal partitioning solutions by emphasizing a combination between existing optimization methods. They propose to combine partitioning algorithms in order to generate optimal solutions of partitioning in a reduced time. In the literature, designers focus on combining the GA and the TS algorithms [13], the PSO and the Branch-and-Bound algorithms [14], the GA and the SA algorithms [15], and the GA and the PSO algorithms [16]. The given results prove that these combinations produce more accurate solutions than the classical algorithms in terms of cost and execution time metrics. In [17], the authors consider the reliability as a factor when solving the partitioning problem, in addition to the cost and time metrics. They propose to combine the recursive and the linear programming algorithms.

Constructive algorithms are usually suggested to be integrated with iterative algorithms to increase the quality of the generated solution. For example, the authors in [18] propose an algorithm based on clustering algorithm to make the GA algorithm better in bigger-scale embedded system. The proposed algorithm overcomes the shortcoming that GA algorithm execution time is too long to achieve good results in system partitioning.

In this work, a new hybrid method combining clustering FCM algorithm and the PSO algorithm called “FCMPSO” algorithm is proposed. Experimental results indicate that the FCMPSO algorithm is superior to GA, SA, ACO, FCM, and PSO standard algorithms and PSO-GA, GA-SA, GA-ACO, ACO-SA, FCM-GA, FCM-SA, and ACO-FCM hybrid techniques for both binary and extended partitioning approaches.

This paper is organized as follows. In Section 2, related works of the hardware/software partitioning techniques are introduced. The constructed benchmarking scenario model definition for partitioning problem is described in Section 3. In Sections 4 and 5, the formulation of hardware/software partitioning problems as a binary and then as an extended approach is presented. Experimental results and comparison of the proposed FCMPSO algorithm with standards partitioning techniques and hybrid ones are discussed in Section 6. Finally, the paper concludes in Section 7 by briefing the present work.

#### 2. The Used Optimization Algorithms to Solve Partitioning Problems

This section provides some detailed notations and definitions of the PSO algorithm, the FCM algorithm, and our proposed FCMPSO algorithm.

##### 2.1. PSO Algorithm

PSO is a stochastic, iterative population-based evolutionary optimization algorithm. It was developed in 1999 by Shi and Eberhart [19]. It uses the swarm intelligence that is based on social-psychological and biological social principles. By equivalence with the swarm intelligence, each swarm member (particle) takes advantage of private memory and has a degree of randomness in its movement as well as knowledge gained by the whole swarm to discover the best available food source. The problem of a food search can be solved by optimizing a fitness function. The definition of the communication structure (or social network) is obtained by assigning the neighbors for each swarm. All particles, in the search space, have fitness values which are evaluated by the fitness function to be optimized and have velocities which direct their motion in the multidimensional search space. Each particle remembers the information about its best solution and its position in the search space and both are available to its neighbors. In order to update the appropriate changes of its position and velocity, each particle has a memory holding: the particle “” position which presents the best solution the particle has seen by itself and the global best particle location’s “” that the particle acquires by communicating with a subset of swarms. The th particle velocity and position updates are based on the following equations:where is the inertia factor that takes linearly decreasing values downward from 1 to 0 according to a predefined number of iterations, is the velocity, is the current solution (or position), and are uniform random numbers in the range between 0 and 1, and and present positive constant parameters called “acceleration coefficient.”

##### 2.2. FCM Algorithm

FCM algorithm is a determinist, constructive optimization algorithm. Different studies prove that the FCM outperforms different existing clustering algorithms, that is, the Self-Organization Map (SOM) neural network algorithm [20], -mean algorithm [21], and hierarchical clustering [20]. It is the most popular fuzzy clustering method, which was originally proposed by Dunn [22] and later had been modified by James [23].

FCM algorithm is efficient, straightforward, and easy to implement. It is based on fuzzy behavior and provides a natural technique for producing clustering where membership weights have a natural interpretation but not probabilistic (determinist). The main goal of the FCM is to minimize an objective function, taking into account the similarity of elements and cluster centers.

Suppose a set of objects in dimensional space, listed by . Each object is represented by a vector of quantitative variables defined by variables indexed by , where . Suppose a set of prototypes listed by associated with groups, where each prototype is also represented as a vector of quantitative variables , where . Suppose a matrix of membership degrees, where presents the membership degree of the object to group . Its value belongs to the real interval .

FCM algorithm aims at finding a prototype matrix and a membership degree matrix that minimize : the objective function called “fitness function.” The prototypes that minimize the objective function are updated using the following equation:The membership degrees that minimize the objective function are updated according to the following equation:where is the level of cluster fuzziness. In the limit , the membership degree converges to 0 or 1, which implies a good partitioning.

FCM algorithm is an effective algorithm. It is faster than the PSO algorithm because it requires fewer function evaluations, but it is sensitive to initial values and usually falls into local optima. The weaknesses of these two algorithms motivate the proposal of an alternative approach based on the combination of FCM and PSO algorithms to form a novel FCMPSO algorithm which maintains the merits of both PSO and FCM algorithms.

##### 2.3. FCMPSO Algorithm

In this work, FCMPSO algorithm was proposed to solve both binary and extended hardware/software partitioning problems. This algorithm takes together advantages of both PSO and FCM algorithms: the PSO algorithm has a strong global search capability, while FCM algorithm produces approximate solutions faster and fails in a local optimal solution easily. Hence, we integrated the FCM algorithm with PSO algorithm to provide near-optimal solution with faster speed.

Firstly, we applied the FCM algorithm to create the uncertain initial partitioning solutions in order to reduce (limit) the research space of the PSO algorithm. Then, we execute the PSO algorithm to have a near-optimal partitioning solution.

The pseudocode of our FCMPSO algorithm is presented in Algorithm 1.

*Algorithm 1 (FCMPSO algorithm). * = FCMPSO () *Require*: Dataset of FCM parameters: (1) Select the center of cluster (2) Select the number of objects (3) Select the maximum number of iterations (4) Select the level of cluster fuzziness () (5) Randomly initialize of object to group (6) ; ; Partitioning Objective Function () Dataset of PSO parameters: (1) Select the maximum number of iterations (2) Select the population (swarm) size: (3) Select the inertia factor “” that takes linearly decreasing values downward from 1 to 0 (4) Select the uniform random numbers and in the range between 0 and 1 (5) Select the acceleration coefficients and (positive constant parameters) *FCM algorithm* Repeatedly,* while * and (a) Update prototype matrix : fix the membership degree matrix and update prototypes using (3) (b) Update the membership degree matrix : fix the membership degrees using (4) (c) (d) Partitioning Objective Function () (e) *end while* *Return* Matrices** Y** and** U** *PSO algorithm* (1) Initialize particle position with matrix of FCM output algorithm *For* each particle position to (dimension of ) (2) Compute the fitness values of (3) Compute velocity (4) Initialize to its initial position: (5) Initialize to the minimal value of the swarm: = fitness value () *End For* *Repeat* until criteria are met () (1) Update particle’s velocity as (1) (2) Update particle’s position as (2) If , (1) Update the best know position of particle : If , (1) Update the swarm’s best position: (2) *end* *return *

The complexity analysis of the original PSO and FCM algorithm is as follows:(i)The PSO algorithm is , where is the swarm population and is the maximum number of iterations.(ii)The FCM algorithm is , where is the objects number, is the maximum number of iterations, and is the cluster center. In our case, .(iii)The FCMPSO algorithm is , where is the dimension of the population issue from the FCM algorithm, is the objects number, and is the maximum number of iterations.As can be seen, the computational complexity of the FCMPSO algorithm is mainly affected by the number of objects and the population dimension issue from the FCM algorithm. The disadvantages of the PSO algorithm are that it is easy to fall into local optima in high-population dimension and has a low convergence rate in the iterative process. It can also be observed that the computational complexity of the hybrid FCMPSO is accepted when it is applied to solve the high-dimensional and complex problems.

Our proposed FCMPSO algorithm will be tested for two kinds of partitioning approaches: binary and extended ones. The main difference between these two kinds of architectures appears in the number of the used devices and their types. In the binary partitioning approach, the target architecture includes a single hardware processing unit and a single software processing unit or a reconfigurable architecture. However, in extended partitioning approach, the target architecture includes multiprocessing hardware components and several software processing components.

Before starting the hardware/software partitioning process, it is necessary to transform the initial specification into formal specification. The benchmarking scenario model used to validate our proposed hardware/software partitioning problem is presented in the next section.

#### 3. Benchmarking Scenario: Task Graphs

Different benchmarks and applications are used to validate hardware/software partitioning approaches. These applications and benchmarks are varying from each other. The embedded application to be partitioned is generally given as a Direct Acyclic Graph (DAG) that represents the sequence of nodes in the embedded system application.

In this work, we use Task Graph for Free (TGFF) tool to generate a set of 20, 50, 100, 200, 500, 1000, and 2000 nodes graphs. Each graph is donated as , where presents the set of tasks and are the set of edges which present the data dependency between two nodes. The partitioning process aims to find a partition , where such that and . It can generate a deciding partition vector , representing the implementation way of the task nodes.

Such approach is necessary to avoid the system architecture dependency variation in the system by making parameter changes. The variations in the number of the input/output nodes, the node metrics (i.e., execution time, cost, area, and power), and the software/hardware processor number are randomly assigned in the TGFF file input configuration file.

The obtained graphs are used as a system specification to validate the efficiency of our proposed partitioning algorithm to solve both binary and extended partitioning problems.

#### 4. Hardware/Software Partitioning as a Binary Problem

This section provides the formal description of the binary partitioning problem, especially the used target architecture and the mathematical model of the objective function and the related constraints.

##### 4.1. Architecture Representation

The characteristic of the target system architecture consists of one software processor to execute software tasks and one hardware component (FPGA/ASIC) to implement hardware tasks. Both software and hardware components have their local memory and communicate with each other through a shared bus for communication between hardware and software components, as shown in Figure 1.