Abstract

This paper presents a novel hybrid evolutionary algorithm that combines Particle Swarm Optimization (PSO) and Simulated Annealing (SA). When PSO reaches a local optimum, all particles gather around it, and escaping from this local optimum becomes difficult. To avoid premature convergence of PSO, we present a new hybrid evolutionary algorithm, called HPSO-SA, based on the idea that PSO ensures fast convergence, while SA brings the search out of local optima thanks to its strong local-search ability. The proposed HPSO-SA algorithm is validated on ten standard benchmark multimodal functions, for which we obtain significant improvements. The results are compared with those obtained by existing hybrid PSO-SA algorithms. In this paper, we also provide two versions of HPSO-SA (sequential and distributed) for minimizing the energy consumption in embedded systems memories. Both versions of HPSO-SA reduce the energy consumption in memories by 76% to 98% compared to Tabu Search (TS). Moreover, the distributed version of HPSO-SA provides execution-time savings of about 73% to 84% on a cluster of 4 PCs.

1. Introduction

Several optimization algorithms have been developed over the last few decades for solving real-world optimization problems. Among them are heuristics such as Simulated Annealing (SA) [1] and optimization algorithms that make use of social or evolutionary behaviors, like Particle Swarm Optimization (PSO) [2, 3]. SA and PSO are quite popular heuristics for solving complex optimization problems, but each has its own strengths and limitations.

Particle Swarm Optimization (PSO) is based on the social behavior of individuals living together in groups. Each individual tries to improve itself by observing other group members and imitating the better ones. In this way, the group members perform an optimization procedure, which is described in [3]. The performance of the algorithm depends on how the particles (i.e., potential solutions to an optimization problem) move in the search space, given that the velocity is updated iteratively. A large body of research is therefore devoted to the analysis and proposal of different motion rules (see [4–6] for recent accounts of PSO research). To avoid premature convergence of PSO, we combine it with SA: PSO contributes to the hybrid approach by ensuring fast convergence, while SA makes the search jump out of local optima thanks to its strong local-search ability. In this paper, we present a hybrid optimization algorithm, called HPSO-SA, which intuitively exploits the positive features of PSO and SA. We also validate HPSO-SA using the ten benchmark functions given in [7] and compare the results with the classical PSO, ATREPSO, QIPSO, and GMPSO algorithms described in [2], TL-PSO [8], PSO-SA [9], and SAPSO and SUPER-SAPSO presented in [10]. We also provide two versions of HPSO-SA (sequential and distributed) for minimizing the energy consumption in embedded systems memories. Both versions reduce the energy consumption in memories by 76% to 98% compared to Tabu Search (TS). Moreover, the distributed version provides execution-time savings of about 73% to 84% on a cluster of 4 PCs.

The rest of the paper is organized as follows. Section 2 briefly introduces the PSO and SA algorithms. Section 3 is devoted to a detailed description of HPSO-SA. In Section 4, HPSO-SA is evaluated on benchmark functions. In Section 5, HPSO-SA is used to solve the memory energy consumption problem; simulation results are provided and compared with those of [11]. Conclusions and further research directions are given in Section 6.

2. Background

2.1. Simulated Annealing Algorithm

SA [12] is a probabilistic variant of the local search method which, in contrast to PSO, can escape local optima. SA is based on an analogy taken from thermodynamics: in order to grow a crystal, we start by heating material until it reaches its molten state; we then reduce the temperature of this crystal melt gradually, until the crystal structure is formed. A standard SA procedure begins by generating an initial solution at random. At each stage, a small random change is made in the current solution s_c. The objective function value of the new solution s_n is then calculated and compared with that of the current solution. A move is made to the new solution if it has a better value, or if the probability function implemented in SA has a higher value than a randomly generated number; otherwise a new solution is generated and evaluated. The probability of accepting a new solution is given as follows:

\[
p = \begin{cases}
1, & \text{if } f(s_n) - f(s_c) < 0,\\
\exp\left(-\dfrac{\left|f(s_n) - f(s_c)\right|}{T}\right), & \text{otherwise}.
\end{cases} \tag{1}
\]

The calculation of this probability relies on a parameter T, which is referred to as the temperature, since it plays a role similar to that of the temperature in the physical annealing process. To avoid getting trapped at a local minimum point, the rate of reduction should be slow. In our problem we use the following method to reduce the temperature, with γ = 0.99:

\[
T_{i+1} = \gamma\, T_i, \qquad i = 0, 1, \ldots \tag{2}
\]

Thus, at the start of SA most worsening moves may be accepted, but in the end only improving ones are likely to be allowed, which can help the procedure jump out of a local minimum. The algorithm may be terminated after a certain volume fraction of the structure has been reached or after a prespecified runtime.
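As a minimal illustration of (1) and (2), the acceptance test and the geometric cooling step might be coded in C++ as follows (the function names are ours, not those of the paper's implementation):

#include <cmath>
#include <cstdlib>

// Acceptance rule of Equation (1): an improving move is always accepted;
// a worsening move is accepted with probability exp(-|f(sn) - f(sc)| / T).
bool accept(double f_current, double f_new, double T) {
    if (f_new - f_current < 0.0) return true;                // improving move
    double p = std::exp(-std::fabs(f_new - f_current) / T);  // Equation (1)
    double r = (double)std::rand() / RAND_MAX;               // uniform in [0, 1]
    return r < p;
}

// Geometric cooling schedule of Equation (2): T_{i+1} = gamma * T_i.
double cool(double T, double gamma = 0.99) { return gamma * T; }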

2.2. Particle Swarm Optimization

PSO is a population-based stochastic optimization technique developed in [13], inspired by the social behavior patterns of organisms that live and interact within large groups. In particular, it incorporates swarming behaviors observed in flocks of birds, schools of fish, or swarms of bees, and even human social behavior.

The PSO algorithm is based on the idea that particles move through the search space with velocities that are dynamically adjusted according to their historical behaviors. The particles therefore have a tendency to move towards better and better search areas over the course of the search process. The PSO algorithm starts with a group of random (or not) particles (solutions) and then searches for optima through successive generations. Each particle is treated as a volume-less particle (a point) in the n-dimensional search space. The ith particle is represented as X_i = (x_{i1}, x_{i2}, ..., x_{in}). At each generation, the particles are updated by using the following two best values.

(i) The first value is the best solution (fitness) a particle has achieved so far (the fitness value is also stored). This value is called pbest.

(ii) The second value is the best value tracked so far (by any particle) by the particle swarm optimizer. This best value is a global best and is called gbest. When a particle takes part of the population as its topological neighbors, the best value is a local best and is called lbest.

At each iteration, these two best values are combined to adjust the velocity along each dimension, which is then used to compute a new position for the particle. One portion of the velocity adjustment is influenced by the individual's previous best position (pbest), considered as the cognition component, and another portion is influenced by the best in the neighborhood (lbest or gbest), the social component (see Figure 1). With the addition of the inertia factor ω by [14] (for balancing the global and the local search), the equations for velocity adjustment are

\[
v_{i+1} = \omega v_i + c_1 \cdot \mathrm{random}(0,1) \cdot \left(pbest_i - x_i\right) + c_2 \cdot \mathrm{random}(0,1) \cdot \left(gbest_i - x_i\right), \tag{3}
\]
\[
x_{i+1} = x_i + v_{i+1}, \tag{4}
\]

where random(0,1) is a random number independently generated within the range [0,1], and c_1 and c_2 are two learning factors which control the influence of the cognitive and social components (usually, c_1 = c_2 = 2; see [15]).

In (3), if the sum on the right side exceeds a constant value, then the velocity on that dimension is clamped to V_min or V_max. Particle velocities are thus restricted to the range [V_min, V_max], which serves as a constraint to control the global exploration ability of the PSO algorithm and reduces the likelihood of particles leaving the search space. Note that this does not restrict the values of x_i to the range [V_min, V_max]; it only limits the maximum distance that a particle can move during one iteration.
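For concreteness, one PSO update step with this velocity clamping can be sketched in C++ as follows (an illustrative sketch with our own naming; the paper's implementation may differ):

#include <algorithm>
#include <cstdlib>
#include <vector>

// One PSO update step for a particle, following Equations (3) and (4),
// with the velocity clamped to [vMin, vMax] as described above.
void updateParticle(std::vector<double>& x, std::vector<double>& v,
                    const std::vector<double>& pbest,
                    const std::vector<double>& gbest,
                    double w, double c1, double c2,
                    double vMin, double vMax) {
    for (std::size_t d = 0; d < x.size(); ++d) {
        double r1 = (double)std::rand() / RAND_MAX;
        double r2 = (double)std::rand() / RAND_MAX;
        v[d] = w * v[d] + c1 * r1 * (pbest[d] - x[d])
                        + c2 * r2 * (gbest[d] - x[d]);   // Equation (3)
        v[d] = std::min(vMax, std::max(vMin, v[d]));     // velocity clamping
        x[d] += v[d];                                    // Equation (4)
    }
}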

3. HPSO-SA Hybrid Algorithm

This section presents a new hybrid HPSO-SA algorithm which combines the advantages of both PSO (which has a strong global-search ability) and SA (which has a strong local-search ability). Other applications of hybrid PSO and SA algorithms can be found in [9, 10, 16–19].

This hybrid approach makes full use of the exploration capability of both PSO and SA and offsets the weaknesses of each. Consequently, through the application of SA to PSO, the proposed algorithm is capable of escaping from a local optimum. However, if SA were applied to PSO at each iteration, the computational cost would increase sharply and, at the same time, the fast convergence ability of PSO might be weakened. In order to integrate PSO with SA flexibly, SA is applied to PSO every K iterations if no improvement of the global best solution occurs. The hybrid HPSO-SA approach is therefore able to keep fast convergence (most of the time) thanks to PSO, and to escape from local optima with the aid of SA. To allow PSO to jump out of a local optimum, SA is applied to the best solution found so far in the swarm every K iterations, where K is predefined to be in the range 270–500 (based on our experiments).

The hybrid HPSO-SA algorithm works as illustrated in Algorithm 1, where one has the following (a small C++ sketch of the initial-temperature estimation and the neighborhood generator is given after the listing).

(i) Description of a Particle. Each particle (solution) X ∈ S is represented by its n > 0 components, that is, X = (x_1, x_2, ..., x_n), where n is the dimension of the optimization problem to solve.

(ii) Initial Swarm. The initial swarm corresponds to the population of particles that will evolve. Each particle is initialized with uniform random values between the lower and upper boundaries of the interval defining the optimization problem.

(iii) Evaluate Function. The evaluate (or fitness) function in the HPSO-SA algorithm is typically the objective function that we want to minimize; each solution is tested with it for suitability to the environment under consideration.

(iv) SA Algorithm. If no improvement of the global best solution occurs during the last K iterations, the algorithm is trapped in a local optimum. To escape from it, we apply the SA algorithm to the global best solution. The performance of SA depends on the definition of several control parameters.

(a) Initial Temperature. Kirkpatrick [20] suggested that a suitable initial temperature is one that results in an average probability χ_0 of about 0.8 of accepting a solution that increases f. The value of T_0 clearly depends on the scaling of f and is, hence, problem-specific. It can be estimated by conducting an initial search (100 iterations in the following simulations) in which all increases in f are accepted, and calculating the average observed objective increase δf. T_0 is then given by

\[
T_0 = \frac{-\delta f}{\ln\left(\chi_0\right)}. \tag{5}
\]

(b) Accept Function. The function Accept(current_solution, Neighbor, T) is decided by the acceptance probability given by (1), which is the probability of accepting the configuration Neighbor.

(c) Generate Function. The neighborhood of each solution x is generated by using the following equation:

\[
x \longleftarrow x + d\,\sigma\, r, \tag{6}
\]

where d is the direction of the new neighbor and takes the value 1 or −1, σ is a random number drawn from the Gaussian(0,1) distribution, and r is a constant corresponding to the radius of the neighborhood generator.

(d) SA_Stop_Criterion. The SA algorithm stops when it has performed 3000 function evaluations, or when the maximum total number of function evaluations is reached, or when the Optimal_solution is attained.

(e) Decrementing the Temperature. The most commonly used temperature-reducing function is geometric (see (2)). In the following simulations, γ = 0.99.

(f) Inner Loop. The length of each temperature level determines the number q = 150 of solutions generated at each temperature T.

(1)  iter ← 0, cpt ← 0, Initialize swarm_size particles
(2)  stop_criterion ← maximum number of function evaluations reached or Optimal_solution attained
(3)  while Not stop_criterion do
(4)      for each particle i ← 1 to swarm_size do
(5)          Evaluate(particle(i)); if the fitness value is better than the best fitness value (cbest) in history then
(6)              Update current value as the new cbest
(7)          end
(8)      end
(9)      Choose the particle with the best fitness value in the neighborhood (gbest)
(10)     for each particle i ← 1 to swarm_size do
(11)         Update particle velocity according to Equation (3)
(12)         Enforce velocity bounds
(13)         Update particle position according to Equation (4)
(14)         Enforce particle bounds
(15)     end
(16)     if there is no improvement of the global best solution then
(17)         cpt ← cpt + 1
(18)     else
(19)         Update global best solution
(20)         cpt ← 0
(21)     end
(22)     if cpt = K then
(23)         cpt ← 0
(24)         // Apply SA to the global best solution
(25)         iterSA ← 0, Initialize T according to Equation (5)
(26)         current_solution ← global_best_solution
(27)         current_cost ← Evaluate(current_solution)
(28)         while Not SA_stop_criterion do
(29)             while inner-loop stop criterion do
(30)                 Neighbor ← Generate(current_solution)
(31)                 Neighbor_cost ← Evaluate(Neighbor)
(32)                 if Accept(current_cost, Neighbor_cost, T) then
(33)                     current_solution ← Neighbor
(34)                     current_cost ← Neighbor_cost
(35)                 end
(36)                 iterSA ← iterSA + 1
(37)                 Update(global_best_solution)
(38)             end
(39)             Update(T) according to Equation (2)
(40)             Update(SA_stop_criterion)
(41)         end
(42)     end
(43)     iter ← iter + 1, Update(stop_criterion)
(44) end
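As a complement to items (a) and (c) above, the initial-temperature estimation (5) and the neighborhood generator (6) admit a short C++ sketch (illustrative names, assuming the average objective increase δf has already been measured during the exploratory phase):

#include <cmath>
#include <random>

// Equation (5): T0 = -delta_f / ln(chi0), with chi0 = 0.8 as suggested
// by Kirkpatrick [20].
double initialTemperature(double avgIncrease, double chi0 = 0.8) {
    return -avgIncrease / std::log(chi0);
}

// Equation (6): x <- x + d * sigma * r, where d is a random direction in
// {-1, +1}, sigma is Gaussian(0, 1), and r is the neighborhood radius.
double generateNeighbor(double x, double r, std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    double d = (rng() % 2 == 0) ? 1.0 : -1.0;   // random direction
    return x + d * gauss(rng) * r;
}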

4. Experimental Results

4.1. Benchmark Functions

In order to compare the performance of the HPSO-SA hybrid algorithm with the algorithms described in [2, 8–10], we use the benchmark functions [7] described in Table 1. These functions provide a good starting point for testing the credibility of an optimization algorithm. Each of these functions has many local optima in its solution space, and the number of local optima increases with the complexity of the function, that is, with its dimension. In the following experiments, we used 10-, 20-, and 30-dimensional functions, except in the case of the Himmelblau and Shubert functions, which are two-dimensional by definition (see Figure 2 for a 3D representation).

4.2. Simulation Results and Discussions

To verify the efficiency and effectiveness of the HPSO-SA hybrid algorithm, its experimental results are compared with those obtained by [2, 8–10]. Our HPSO-SA hybrid algorithm is written in C++ and was compiled using gcc version 2.95.2 (Dev-C++) on a laptop with Windows Vista x64 Home Premium running an Intel Core 2 Quad (Q9000) at 2 GHz with 4 GB of memory.

4.2.1. Comparison with Results Obtained by Using TL-PSO Algorithm [8]

In this section we compare the HPSO-SA approach with the TL-PSO method [8], which combines the strengths of both PSO and Tabu Search. As described in [8], we apply the HPSO-SA algorithm to the following four benchmark problems: the Rastrigin, Schwefel, Griewank, and Rosenbrock functions. Here, the number of particles in the swarm is 30, the search dimension is n = 20, and the number of objective function evaluations is 60000 (i.e., 2000 × 30). The results obtained after numerical simulations are shown in Table 2. These results indicate the Mean, Best, and Worst values obtained under the same conditions over 50 trials. Analyzing Table 2, we conclude that the results obtained by the HPSO-SA algorithm compare favorably with those obtained by the TL-PSO algorithm.

4.2.2. Comparison with Other PSO Algorithms Described in [2]

The performance of four Particle Swarm Optimization algorithms, namely, classical PSO, Attraction-Repulsion based PSO (ATREPSO), Quadratic Interpolation based PSO (QIPSO), and Gaussian Mutation based PSO (GMPSO), is evaluated in [2]. The algorithms presented in that paper are guided by the diversity of the population in searching for the global optimal solution of a given optimization problem, whereas GMPSO uses the concept of mutation and QIPSO uses the reproduction operator to generate a new member of the swarm.

In order to make a fair comparison between classical PSO, ATREPSO, QIPSO, GMPSO, and the HPSO-SA approach, we fixed, as indicated in [2], the same seed for random number generation so that the initial swarm population is the same for all five algorithms. The number of particles in the swarm is 30. The algorithms use a linearly decreasing inertia weight ω which starts at 0.9 and ends at 0.4, with the user-defined parameters c_1 = c_2 = 2.0. For each algorithm, the number of objective function evaluations is 300000. A total of 30 runs for each experimental setting were conducted, and the average fitness of the best solutions throughout the run was recorded. The mean solution and the standard deviation (note that the standard deviation indicates the stability of the algorithms) found by the five algorithms are listed in Table 3. The numerical results given in Table 3 show the following.

(i) All the algorithms outperform the classical Particle Swarm Optimization.

(ii) The HPSO-SA algorithm gives much better performance than PSO, QIPSO, ATREPSO, and GMPSO, except on the Sphere and Ackley functions.

(iii) On the Sphere function, QIPSO obtains better results than the HPSO-SA approach. But when the maximum number of iterations is fixed to 1.5 × 10^6, HPSO-SA obtains the optimal value.

(iv) The analysis of the results obtained for the Ackley function shows that QIPSO obtains a better mean result than the HPSO-SA algorithm. However, HPSO-SA has a much smaller standard deviation.

4.2.3. Comparison with Other PSO Algorithms Described in [10]

In this section, four benchmark functions are used to compare the relative performance of the HPSO-SA algorithm with the SUPER-SAPSO, SAPSO, and PSO algorithms described in [10].

For all comparisons, the number of particles was set to 30. The HPSO-SA algorithm uses a linearly decreasing inertia weight ω which starts at 0.9 and ends at 0.4, with the user-defined parameters c_1 = c_2 = 2.0. Twenty runs were conducted for each experimental setting and, for each algorithm, the average value is given in Table 4.

In all the above experiments, the HPSO-SA algorithm obtains better results than both the standard PSO and the SAPSO algorithm [10]. A comparison of the HPSO-SA algorithm with SUPER-SAPSO [10] shows that the latter converges faster than HPSO-SA.

SUPER-SAPSO uses an expression for the particle movements (x_{t+1} = (x_t + v_{t+1})T, where T ≪ 1) which is well adapted to the case where the global optimum is 0. This is the reason why SUPER-SAPSO needs a very small number of iterations in this case.

4.2.4. Comparison with PSO-SA Algorithm Described in [9]

In this section, the performance of HPSO-SA is compared with that of PSOSA [9], a Genetic Algorithm, and the hybrid algorithm of [21].

Table 5 lists the results obtained for three different dimensions of each function. The goal value for the Sphere, Rastrigin, and Griewank functions was set to 1e−10, and the goal value for the Rosenbrock function was set to 1e−06 (as indicated in [9]).

To make a fair comparison, the maximum number of function evaluations allowed was set to 20000, 30000, and 40000 for the HPSO-SA and PSOSA algorithms, with the number of particles set to 20. The HPSO-SA algorithm uses a linearly decreasing inertia weight ω which starts at 0.9 and ends at 0.4, with the user-defined parameters c_1 = c_2 = 2.0.

The numerical results given in Table 5 show the following.

(i) Over the four benchmark functions, HPSO-SA and PSOSA do better than the standard GA and the hybrid algorithm of [21].

(ii) For the Sphere, Rastrigin, and Griewank functions, the HPSO-SA and PSOSA algorithms obtain optimal solutions within the specified constraints (number of objective function evaluations).

(iii) For the Rosenbrock function, PSOSA obtains better results than HPSO-SA for dimension 20, but for dimensions 10 and 30, HPSO-SA does better and has a smaller standard deviation.

5. Reducing Memory Energy Consumption in Embedded Systems

5.1. Description of the Memory Problem

According to the trends in [22], memory will become the major energy consumer in an embedded system. Indeed, embedded systems must integrate multiple complex functionalities, which requires bigger batteries and more memory. Hence, reducing the memory energy consumption of these systems has never been as topical. In this paper, we focus on software techniques for memory management. In order to reduce memory energy consumption, most authors rely on Scratch-Pad Memories (SPMs) rather than caches [23]. Although cache memory helps a lot with program speed, it is not appropriate for most embedded systems: a cache increases the system size and its energy cost (cache area plus managing logic). Like a cache, an SPM consists of small, fast SRAM. The main difference is that the SPM is directly and explicitly managed at the software level, either by the developer or by the compiler, which makes it more predictable. An SPM requires up to 40% less energy and 34% less area than a cache [24]. In this paper, we therefore use an SPM in our memory architecture. Because of the reduced SPM size, we allocate space for interesting data only, while the remaining data are placed in main memory (DRAM). In order to determine the interesting data, we use data profiling to gather memory access frequency information. The Tabu Search (TS) approach consists of allocating space for data in the SPM based on TS principles [25]. More details about how TS is implemented can be found in [11].

In order to compute the energy cost of the system, we propose an energy consumption estimation model for our memory architecture, composed of an SPM, an instruction cache, and a DRAM. Equation (7) gives the energy model, where the three terms refer to the total energy consumed, respectively, in the SPM, in the instruction cache, and in the DRAM:

\[
E = E_{t_{spm}} + E_{t_{ic}} + E_{t_{dram}}. \tag{7}
\]

In this model, we distinguish between the two cache write policies: Write-Through (WT) and Write-Back (WB). In a WT cache, every write to the cache causes a synchronous write to DRAM. Alternatively, in a WB cache, writes are not immediately mirrored to DRAM. Instead, the cache tracks which of its locations have been written over and marks these locations as dirty. The data in these locations are written back to DRAM when they are evicted from the cache [26]. In this paper, the aim is to minimize the energy given by the following detailed estimation model:

\[
\begin{aligned}
E ={}& N_{spm_r} \cdot E_{spm_r} && (8)\\
&+ N_{spm_w} \cdot E_{spm_w} && (9)\\
&+ \sum_{k=1}^{N_{ic_r}} \Big[\, h_{ik} \cdot E_{ic_r} + \left(1 - h_{ik}\right) \cdot \big[ E_{dram_r} + E_{ic_w} + \left(1 - WP_i\right) \cdot DB_{ik} \cdot \left(E_{ic_r} + E_{dram_w}\right) \big] \Big] && (10)\\
&+ \sum_{k=1}^{N_{ic_w}} \Big[\, WP_i \cdot E_{dram_w} + h_{ik} \cdot E_{ic_w} + \left(1 - WP_i\right) \cdot \left(1 - h_{ik}\right) \cdot \big[ E_{ic_w} + DB_{ik} \cdot \left(E_{ic_r} + E_{dram_w}\right) \big] \Big] && (11)\\
&+ N_{dram_r} \cdot E_{dram_r} && (12)\\
&+ N_{dram_w} \cdot E_{dram_w}. && (13)
\end{aligned}
\]

Equations (8) and (9) represent, respectively, the total energy consumed while reading from and writing to the SPM. Equations (10) and (11) represent, respectively, the total energy consumed while reading from and writing to the instruction cache. Finally, Equations (12) and (13) represent, respectively, the total energy consumed while reading from and writing to the DRAM. The various terms used in this energy model are explained in Table 6.
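To show how (8)–(13) combine, the model can be evaluated along the following lines (a C++ sketch under our own naming assumptions; the hit indicators h_ik, the dirty bits DB_ik, and the write-policy flag WP_i, taken here as 1 for WT and 0 for WB, are supplied as data):

#include <cstddef>
#include <vector>

struct EnergyParams {
    double E_spmr, E_spmw, E_icr, E_icw, E_dramr, E_dramw;  // per-access energies
};

// Sketch of the energy model (8)-(13). hitsR[k]/hitsW[k] hold h_ik for the
// k-th instruction-cache read/write, dirtyR/dirtyW hold DB_ik, and wp is WP_i.
double totalEnergy(long Nspmr, long Nspmw, long Ndramr, long Ndramw,
                   const std::vector<int>& hitsR, const std::vector<int>& dirtyR,
                   const std::vector<int>& hitsW, const std::vector<int>& dirtyW,
                   int wp, const EnergyParams& p) {
    double E = Nspmr * p.E_spmr + Nspmw * p.E_spmw;              // (8)-(9)
    for (std::size_t k = 0; k < hitsR.size(); ++k)               // (10)
        E += hitsR[k] * p.E_icr
           + (1 - hitsR[k]) * (p.E_dramr + p.E_icw
               + (1 - wp) * dirtyR[k] * (p.E_icr + p.E_dramw));
    for (std::size_t k = 0; k < hitsW.size(); ++k)               // (11)
        E += wp * p.E_dramw + hitsW[k] * p.E_icw
           + (1 - wp) * (1 - hitsW[k]) * (p.E_icw
               + dirtyW[k] * (p.E_icr + p.E_dramw));
    E += Ndramr * p.E_dramr + Ndramw * p.E_dramw;                // (12)-(13)
    return E;
}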

As the SPM has many advantages, it is clearly preferable to put as much data as possible in it. In other words, we must maximize the terms N_{spm_r} and N_{spm_w} in the model. Hence, the problem becomes one of maximizing the number of accesses to the SPM. It is therefore a combinatorial optimization problem similar to the knapsack problem [27]: we want to fill an SPM of maximum capacity C with some combination of data from a list of N candidates, each with a size size_i and an access count accessnumber_i, so that the number of accesses to the data allocated in the SPM is maximized. This problem has a single linear constraint which sums the sizes of the data allocated in the SPM, a linear objective function which sums their access counts, and the added restriction that each data item is either in the SPM or not. If N is the total number of data items, then a solution is a finite sequence s of N terms such that s[n] is either 0 or the size of the nth data item, with s[n] = 0 if and only if the nth data item is not selected in the solution. This solution must satisfy the constraint of not exceeding the maximum SPM capacity (i.e., \(\sum_{i=1}^{N} s[i] \leq C\)).
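A minimal C++ sketch of this encoding and its capacity check, with illustrative names, is:

#include <numeric>
#include <vector>

// The encoding described above: s[n] is either 0 or the size of the n-th
// data item, so feasibility reduces to a capacity check against C.
bool feasible(const std::vector<int>& s, int C) {
    return std::accumulate(s.begin(), s.end(), 0) <= C;
}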

5.2. Discrete Sequential Hybrid HPSO-SA Algorithm

This section should be considered as an attempt to use hybrid evolutionary algorithms for reducing energy consumption in embedded systems. Here, the focus is on the use of the HPSO-SA algorithm designed in the previous sections. Since the problem under consideration is discrete and has specific features, HPSO-SA needs some changes.

π‘ π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› (Particle)
A solution can be represented by an array whose size equals the number of data items. Each element of this array denotes whether a data item is included in the SPM ("1") or not ("0"). The HPSO-SA algorithm starts with an initial swarm which is randomly initialized.

πΈπ‘£π‘Žπ‘™π‘’π‘Žπ‘‘π‘’ Function
It is the objective function that we want to minimize in the problem. Each solution is tested with it for suitability to the environment under consideration:

\[
Evaluate(solution) = Total\_Number\_Access\_all\_data - Number\_Access(solution). \tag{14}
\]
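A direct C++ transcription of (14), assuming the 0/1 encoding above and a hypothetical per-item access-count array obtained from profiling, is:

#include <cstddef>
#include <vector>

// Equation (14): the fewer accesses a solution captures in the SPM, the
// higher (worse) its fitness. The `accesses` array is illustrative.
long evaluate(const std::vector<int>& inSPM, const std::vector<long>& accesses,
              long totalAccesses) {
    long captured = 0;
    for (std::size_t n = 0; n < inSPM.size(); ++n)
        if (inSPM[n]) captured += accesses[n];   // data item n is in the SPM
    return totalAccesses - captured;             // Equation (14)
}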

Position Update Equation
Each dimension j of particle i is updated by using (15):

\[
x_{ij} = \begin{cases} 1, & \text{if } \mathrm{rand} < \mathrm{sigm}\left(v_{ij}\right),\\ 0, & \text{otherwise}, \end{cases} \tag{15}
\]

where sigm(v_{ij}) is the sigmoid function, used to scale the velocities between 0 and 1, defined as

\[
\mathrm{sigm}\left(v_{ij}\right) = \frac{1}{1 + \exp\left(-v_{ij}\right)}. \tag{16}
\]
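Equations (15) and (16) translate into a few lines of C++ (an illustrative sketch, not the paper's listing):

#include <cmath>
#include <cstdlib>

// Equation (16): squash the velocity into a probability in (0, 1).
double sigm(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// Equation (15): set the bit to 1 with probability sigm(v_ij).
int updateBit(double v_ij) {
    double r = (double)std::rand() / RAND_MAX;   // rand in [0, 1]
    return (r < sigm(v_ij)) ? 1 : 0;
}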

Generate Function
SA uses a notion of neighborhood relation. Let S be the set of all feasible solutions to the problem and f : S → ℝ the objective function to be minimized. A neighborhood relation is a binary relation N ⊆ S × S with some desired properties; the interpretation of N(s, s') is that solution s is a neighbor of solution s' in the search space S. A neighborhood heuristic proceeds in steps: it starts the search at some initial solution s_0, and each step then moves from the current solution to some neighbor according to rules specific to the heuristic. At each iteration, the SA algorithm generates a random neighbor of the current_solution (line 10). The neighborhood relation is defined as follows (see the sketch after this list):

(1) with probability 0.03, the value of each element of current_solution is flipped from 1 to 0 or from 0 to 1;

(2) the solution is validated: while Not feasible(current_solution) (current_solution must satisfy the constraint of not exceeding the maximum SPM capacity), remove a data item j having a low number of accesses from current_solution (current_solution[j] ← 0).
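The Generate step just described can be sketched in C++ as follows (illustrative names; evicting the selected item with the lowest access count is our simplification of "a data item having a low number of accesses"):

#include <cstddef>
#include <random>
#include <vector>

// Flip each membership bit with probability 0.03, then repair the solution
// by evicting low-access data until the SPM capacity constraint holds.
std::vector<int> generate(std::vector<int> s, const std::vector<int>& size,
                          const std::vector<long>& accesses, int C,
                          std::mt19937& rng) {
    std::bernoulli_distribution flip(0.03);
    for (std::size_t j = 0; j < s.size(); ++j)
        if (flip(rng)) s[j] = 1 - s[j];          // flip membership bit
    auto used = [&] {                            // total size of selected data
        long total = 0;
        for (std::size_t j = 0; j < s.size(); ++j)
            if (s[j]) total += size[j];
        return total;
    };
    while (used() > C) {                         // repair until feasible
        std::size_t worst = 0;
        bool found = false;
        for (std::size_t j = 0; j < s.size(); ++j)
            if (s[j] && (!found || accesses[j] < accesses[worst])) {
                worst = j;
                found = true;
            }
        s[worst] = 0;                            // evict lowest-access item
    }
    return s;
}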

Accept Function
The key idea in the SA approach is the function Accept, which specifies the probability of accepting the move from current_solution to a Neighbor solution and also depends on the so-called temperature T. The function Accept should satisfy the following conditions:

(1) p = 1 if solution Neighbor is better than current_solution in terms of the cost function f (i.e., f(Neighbor) < f(current_solution) in a minimization problem);

(2) if Neighbor is worse than current_solution, the value of p is positive (i.e., moving to a worse solution is allowed), but decreases with |f(Neighbor) − f(current_solution)|;

(3) for fixed current_solution and Neighbor, when Neighbor is worse than current_solution, the value of p decreases with time and tends to 0.

The function Accept(C_cost, N_cost, T) is decided by the probability of accepting the configuration Neighbor. This probability is given by the following formula:

\[
p = \begin{cases}
1, & N_{cost} < C_{cost},\\
\dfrac{1}{2}\,\mathrm{rand}\left(1 + e^{\left(C_{cost} - N_{cost}\right)/T}\right), & \text{otherwise},
\end{cases} \tag{17}
\]

where T is the temperature and rand is a random number independently generated within the range [0,1].

5.3. Discrete Cooperative Distributed Hybrid HPSO-SA Algorithm

For the distributed hybrid HPSO-SA (HPSO-SA_Dist) algorithm, we use independent subswarms of particles with their own fitness functions, which evolve in isolation except for the exchange of some particles (migration). A set of m = 30 particles is assigned to each of the P processors, for a total population size of m × P. The set assigned to each processor is its subswarm. The processors are connected by an interconnection network with a ring topology. The initial subswarms consist of a randomly constructed assignment created at each processor. Each processor, disjointly and in parallel, executes the HPSO-SA_Seq algorithm on its subswarm for a certain number of generations. Afterwards, each subswarm exchanges its best particle (migrant) with its neighbors (HPSO-SA_Dist runs in asynchronous mode: every 100 iterations, each processor sends its best solution, continues the improvement of its subswarm, and checks whether it has received a solution from its neighbor). We exchange the particles themselves (i.e., the migrant is removed from one subswarm and added to another). Hence, the size of the subswarm remains the same after migration (the worst particle is removed). The process continues with the separate improvement of each subswarm for a maximum number of iterations. At the end of the process, the best solution found constitutes the final assignment.
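A simplified C++/MPI sketch of the ring migration follows (our own code, not the authors'; for clarity this version is synchronous, whereas the paper's HPSO-SA_Dist exchanges migrants asynchronously):

#include <mpi.h>
#include <vector>

// Exchange the best particle along the ring: send the local best to the
// next processor while receiving the migrant from the previous one.
void migrate(const std::vector<int>& best, std::vector<int>& incoming) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int next = (rank + 1) % size;            // ring neighbors
    int prev = (rank + size - 1) % size;
    incoming.resize(best.size());
    MPI_Sendrecv(best.data(), (int)best.size(), MPI_INT, next, 0,
                 incoming.data(), (int)incoming.size(), MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    // The caller then replaces its worst local particle with `incoming`,
    // keeping the subswarm size constant.
}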

5.4. Experimental Results

In order to compute the energy cost of the studied memory architecture, composed of an SPM, an instruction cache, and a DRAM, we proposed an energy consumption estimation model which is explained in [11]. The hybrid HPSO-SA algorithms and TS have been implemented on a cluster of PCs running Windows XP Professional version 2002. The cluster is composed of 4 Pentium D machines running at 3 GHz, each with 1 GB of memory. Table 7 gives a description of the benchmarks used; they can also be downloaded from [28].

In the experiments, 30 different executions of each heuristic are performed, and the best and average results over these 30 executions are recorded. In this case, the best and the average solutions give similar results. Figure 3 shows that both HPSO-SA_Seq and HPSO-SA_Dist achieve better energy savings than TS. In fact, the hybrid HPSO-SA heuristics consume from 76.23% (StatemateCE) to 98.92% (ShaCE) less energy than TS.

As HPSO-SA_Seq and HPSO-SA_Dist give similar results, we compared their behavior in terms of execution time. We recorded the average execution times needed by HPSO-SA_Seq and HPSO-SA_Dist (running on a cluster of 4 PCs) to complete the 30 executions. Figure 4 presents the results obtained on the largest benchmarks. From this figure, we see that the distributed version (HPSO-SA_Dist) is consistently faster than the sequential version (HPSO-SA_Seq): it requires 73.16% (AdpcmCE) to 84.65% (CntCE) less execution time.

6. Conclusion and Perspectives

In this paper, we have designed a hybrid algorithm (HPSO-SA) that combines the exploration ability of PSO with the exploitation ability of SA and is capable of preventing premature convergence. Through comparisons with QIPSO, ATREPSO, and GMPSO [2], TL-PSO [8], PSO-SA [9], and SUPER-SAPSO [10] on well-known benchmark functions, and through the problem of reducing energy consumption in embedded systems memories, it has been shown that HPSO-SA performs well in terms of accuracy, convergence rate, stability, and robustness. In the future, we will also compare the performance of HPSO-SA with the above-mentioned algorithms on the embedded systems memory saving problem.

In addition, we will compare the HPSO-SA algorithm with other hybrid algorithms (PSO-GA, PSO-MDP, PSO-TS) whose design is in progress by the authors. Comparisons will also be carried out on additional benchmark functions and on more complex problems, including functions with dimensionality larger than 30.

Acknowledgments

The authors are grateful to the anonymous referees for their pertinent comments and suggestions. Dawood Khan helped the authors with the intricacies of the English language. The work of M. Idrissi Aouad is supported by the French National Research Agency (ANR) under the Future Architectures program.