Abstract

In the last three decades, the field of computational intelligence has seen a profusion of population-based metaheuristics applied to a variety of problems, where they achieved state-of-the-art results. This remarkable growth has been fuelled and, to some extent, exacerbated by various sources of inspiration and working philosophies, which have been thoroughly reviewed in several recent survey papers. However, the present survey addresses an important gap in the literature. Here, we reflect on a systematic categorisation of what we call “lightweight” metaheuristics, i.e., optimisation algorithms characterised by purposely limited memory and computational requirements. We focus mainly on two classes of lightweight algorithms: single-solution metaheuristics and “compact” optimisation algorithms. Our analysis is mostly focused on single-objective continuous optimisation. We provide an updated and unified view of the most important achievements in the field of lightweight metaheuristics, background concepts, and most important applications. We then discuss the implications of these algorithms and the main open questions and suggest future research directions.

1. Introduction

Hardware and software technologies are advancing at a fast pace, yielding increasingly complex computing systems. In recent decades, strong competition among manufacturers has caused intense pressure to completely change the face of commercial electronics [1], leading to the ongoing development of computing devices with ever smaller dimensions but higher performance. These devices can range from extremely small form factor devices (e.g., microcontrollers, wearable devices, wireless sensors, and actuators) to larger devices such as hand-helds or tablets. A major concern in the design of these devices is that they usually perform computations under stringent physical, weight, and cost limitations, as well as real-time constraints and limited power capacities (e.g., with batteries that might be difficult or even impossible to replace/recharge). A categorisation of this kind of device, with its specific limitations, can be found, for example, in [2]. The general goal of manufacturers is to design optimal products that meet the requirements of the market without violating any hardware-dependent constraints. However, this is difficult in practice, given the impact these restrictions have on memory capacity, computational performance, and battery life.

Most computational problems that arise in such devices can be formulated as an optimisation problem, i.e., one in which the optimal value of the given decision variables must be found with respect to a given objective function. In the remainder of the paper, we consider bound-constrained (also misleadingly referred to as “unconstrained”) single-objective continuous optimisation problems of the form

$$\min_{\mathbf{x} \in \mathbf{D}} f(\mathbf{x}),$$

where $\mathbf{x}$ is a candidate solution to the $n$-dimensional optimisation problem defined through the objective function $f: \mathbf{D} \to \mathbb{R}$ (without loss of generality, we assume this to be minimised) and $\mathbf{D} \subset \mathbb{R}^n$ is the search space delimited by the upper bound vector $\mathbf{u}$ and the lower bound vector $\mathbf{l}$, s.t. $l_i \le x_i \le u_i$ for $i = 1, \ldots, n$.

Note that, owing to the lack of a mathematical formulation, the complexity of the problem, etc., these are challenging zeroth-order optimisation scenarios, where no assumption can be made on the properties of the objective function. Typical examples are self-tuning the parameters of machine learning algorithms on board a device [3], dynamically adjusting the hardware settings (e.g., camera, microphones, battery consumption, etc.), characterising a user profile, or providing customised recommendations [4].

To deal with such scenarios, metaheuristics [5, 6], i.e., general-purpose “generate and test” black-box optimisation methods, are the most logical choice. Metaheuristics do not guarantee convergence to the theoretical optimum but offer high applicability: they need no prior information on the problem and instead learn the problem landscape while searching for solutions. Their success in solving various numerical and real-world problems [7, 8] made them popular and the subject of continuous investigations. There are many algorithms of this kind in the literature, and choosing the most suitable for a specific problem is not an easy task [9]. Analysing the problem and tailoring a metaheuristic solver to it is the right approach, when possible. Similarly, tuning the parameters of the optimisation algorithm plays an important role. This can be a time-consuming task, especially when using modern algorithms, which are often based on a hybrid structure [10] and thus have even more parameters to adjust [11]. The latter are usually difficult to implement (which makes them more susceptible to errors) and to understand, with some operators not contributing to the final performance on many problems [12]. Simplifying them would at least reduce their algorithmic overhead.

In this light, many modern metaheuristics are neither suitable nor designed for optimisation on severely constrained devices such as those discussed above. The memory limitations of the hosting environment and the minimisation of computational overhead are not factors that are usually taken into consideration during the development phase. However, there are application domains where such devices are required to be equipped with a quick and simple optimisation routine, e.g., in the Internet of Things (IoT), where cost is usually an issue [13], or in the manufacturing sector, where fast, smaller, and energy-efficient systems are a priority. As the current trend does not seem to stop, with the most effective artificial intelligence (AI) technologies having thousands and thousands of parameters to tune, we argue that minimising algorithmic overhead and memory consumption in optimisation algorithms will be a priority in several contexts in the years to come.

Most metaheuristics in the literature are population-based algorithms operating on a set of candidate solutions, a framework that has been shown to have some benefits [14]. However, single-solution algorithms also exist, and the importance of memory consumption in the study of population-based metaheuristics has been addressed in some recent studies. In [15], the authors adjust the implementation of three different Genetic Algorithms (GA) to embed them in an ATmega328P microcontroller. In their experiments, with 128 individuals represented in 32 bits, approximately of the available data memory (2 KB) was consumed. This leaves no room for other background/parallel processes. Execution time can also be problematic, as shown in [16]. Here, evolutionary and swarm computing algorithms are integrated and run on multiple embedded systems, such as a smartphone and three different Raspberry Pi models (and on a PC used as a baseline for comparisons), on 10 well-known benchmark functions with and population size . The execution time increases significantly in embedded systems and, worryingly, in the smartphone, where each algorithm is at least 10 times slower than on the PC. This shows that an ad hoc algorithm/implementation should be used in such a device to keep execution time realistic. Other authors experimented with other hardware technologies, such as Field Programmable Gate Arrays (FPGAs) [17–19] or Graphics Processing Units (GPUs). For the sake of completeness, it is worth adding that these concepts can be extended to other optimisation scenarios, such as multi-objective problems. In this regard, we point to [20], where a Multi-Objective Genetic Algorithm (MOGA) is implemented and executed on a low-end microcontroller.

To summarise the discussion above, there are important domains that call for optimisation algorithms that (1) can attain good solutions with much less memory use and (2) can be easily embedded into limited hardware platforms. In the remainder of this article, we will refer to these algorithms as “lightweight metaheuristics” [21] or “memory-saving algorithms” [22].

Given that the nature of the problem imposes the number of “design variables” $n$, reducing the need to store a population of solutions is the main goal of a memory-saving algorithm. We assume that the solutions are represented efficiently, i.e., without unnecessarily long encodings or redundant design variables. Also, we select simple algorithms with linear memory complexity, like genetic algorithms, as opposed to those that require more memory and computationally more expensive features, such as eigenvalue decomposition or the manipulation of covariance or Hessian matrices, to function; see, e.g., [23–26]. The degenerate case results in a “single-solution metaheuristic” (also referred to as “trajectory methods,” “solo search,” or “single-agent-based algorithms”). Here, we must pay attention to the working logic of the algorithm. Despite being less common, memory-consuming single-solution algorithms do exist. Examples such as the Rosenbrock algorithm [23], the Powell method [24], and SPAM [27] make use of matrices stored in memory to perturb the only candidate solution on which they operate, making them difficult to use in memory-constrained environments. In the same vein, the Nelder–Mead method (also known as the simplex method) [28] was introduced as a derivative-free optimisation algorithm. It starts with an initial solution and iteratively uses a set of solutions forming the vertices of a simplex to move it. This method teeters on the brink of two opposing perspectives: it may be thought of as a single-solution approach, yet it cannot be considered lightweight in our case because a simplex of $n + 1$ points is required for it to function. A similar argument applies to Estimation of Distribution Algorithms (EDAs) [29, 30], which evolve a probabilistic model and draw solutions from it. In this case, sampling only one solution is not enough if a memory-saving probabilistic model is not used. Such a model can be a simplification of existing models that performs a so-called uncorrelated search, which does not require storing cross-correlation values between the design variables. The compact algorithm class [31] is an established framework for obtaining memory-saving EDAs.
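To make the idea of a memory-saving probabilistic model concrete, the following is a minimal sketch in the spirit of a compact genetic algorithm on binary strings. It is an illustrative assumption rather than the exact formulation of [31]: the function name, the parameter `virtual_pop`, and the maximisation convention are hypothetical choices. The only persistent state is one probability per design variable; the two sampled individuals are discarded at every iteration.

```python
import random

def compact_ga(fitness, n_bits, virtual_pop=50, max_iters=5000, seed=0):
    """Sketch of a compact GA (maximisation); all names are illustrative."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                 # the whole "population" is this vector
    step = 1.0 / virtual_pop           # update size mimics a virtual population
    for _ in range(max_iters):
        # Sample two temporary individuals from the model.
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Shift the model towards the winner where the two individuals differ.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                p[i] += step if winner[i] == 1 else -step
                p[i] = min(1.0, max(0.0, p[i]))
        if all(pi in (0.0, 1.0) for pi in p):
            break                      # the model has fully converged
    return p

# Usage on a toy "count the ones" problem (an assumption for illustration).
model = compact_ga(sum, n_bits=16)
```

No cross-correlation between variables is stored, which is what keeps the memory footprint linear in the number of design variables.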

If properly designed, single-solution and compact algorithms can perform well and return satisfactory results in several contexts that require a very low memory footprint. These scenarios are abundant in some application fields including bioinformatics [32–34], deep learning [35, 36], evolving hardware [37], and robotics [38, 39].

The goal of this article is to present a unified survey of research in the field of lightweight algorithms, as the existing literature appears inadequate to offer a comprehensive perspective on this class of optimisers. Our work sheds light on what is currently available for dealing with optimisation problems in an environment plagued by various limitations and provides readers with a wide range of application domains. This will benefit both practitioners and algorithm designers exploring hybrid algorithmic solutions. Indeed, it is clear that most of these algorithms are currently not as well known as population-based ones, and it is not rare to encounter statements such as “To the best of our knowledge, SA (Simulated Annealing), VNS (Variable Neighbourhood Search), and TS (Tabu Search) …are the only existing single-solution metaheuristics in the literature” [40], suggesting that advances in this field are somehow ignored. Hence, here, we combine relevant research lines and place greater emphasis on approaches such as the compact algorithm paradigm in [31] and holistic analysis in [6, 41] to overcome these problems. We gather relevant literature and provide interesting perspectives on modern and historical lightweight heuristics, reporting key notions that give a global view of these algorithms, including the current state of the art that is not included in [6, 31, 41] and is becoming fragmented. Furthermore, we review and report some significant applications of these algorithms, giving examples to practitioners having to deal with these scenarios and facilitating the search for reactive algorithmic solutions that are already present in the literature in one document. For benchmarking and other performance-related numerical results, we refer to [42–44] and most of the articles included in this survey.

The remainder of this paper is organised as follows:
(i) Section 2 describes classes of algorithms based on the number of processed candidate solutions and introduces the concept of “lightweight” algorithms for systems having limited resources.
(ii) Section 3 focusses on population-based algorithms and discusses the use of micropopulations.
(iii) Section 4 introduces the Estimation of Distribution Algorithms and discusses their memory requirements.
(iv) Section 5 surveys the existing literature to report and comment on memory-saving algorithms (for both discrete and continuous optimisation) by grouping them into the two main categories of single-solution and compact algorithms.
(v) Section 6 reports relevant application scenarios.
(vi) Section 7 concludes this work and discusses open issues in the field of lightweight optimisation research.
(vii) Section 8 systematically points out areas of improvement to address in the future.

2. Metaheuristics in the Balance

When the environment hosting the optimiser requires a thrifty use of resources, even among algorithms with linear memory complexity some may be preferable to others that require additional memory slots (a vector of real values, e.g., floats or doubles, of the same size as the problem) to function. In this context, metaheuristics with a linear memory footprint can be further classified by considering the number of solutions stored in memory during the search for optima as an indicator of the resources needed to run the optimisation process.

For the sake of clarity, we remark that this shall be done only for algorithms that are already “lightweight” in their nature, i.e., metaheuristics that do not require the storage of auxiliary variables for representing or manipulating the candidate solutions (e.g., covariance matrices, etc.). Algorithms that do rely on such structures are usually developed in an attempt to obtain high performance in offline problems, but in the context of real-time and onboard optimisation, they are often infeasible choices and are considered, within the scope of this article, “heavyweight” algorithms, as opposed to those whose memory occupation is linear in the problem dimension. Note that, as previously discussed, these heavy working mechanisms can take place in both population-based and single-solution algorithms. In this work, we go even further and carefully select truly lightweight algorithms from those having a linear memory footprint, considering the number of memory slots required for them to operate, as graphically depicted in Figure 1.

It is important to note that lightweight algorithms are those that require approximately two memory slots (one for the best solution plus one for an auxiliary solution used to produce a new solution). Approaches with this feature belong to the previously introduced classes of single-solution and compact algorithms, which we refer to as “sMeta” and “cMeta” in the remainder of this paper for brevity. In line with this notation, we also use the expressions “pMeta” and “μMeta” for population-based algorithms and population-based algorithms working with so-called micropopulations of only a few individuals, respectively.
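As an illustration of the two-slot idea, the sketch below implements a very simple single-solution search that keeps only the best solution and one reusable trial vector. The sphere objective, the Gaussian perturbation, and all parameter names are assumptions made for this example; the sketch is not taken from any of the surveyed algorithms.

```python
import random

def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(v * v for v in x)

def two_slot_search(f, lower, upper, budget=2000, step=0.1, seed=0):
    """Minimise f within [lower, upper] using two solution-sized buffers."""
    rng = random.Random(seed)
    n = len(lower)
    best = [rng.uniform(lower[i], upper[i]) for i in range(n)]  # slot 1
    f_best = f(best)
    trial = [0.0] * n                                           # slot 2
    for _ in range(budget - 1):
        for i in range(n):
            width = step * (upper[i] - lower[i])
            v = best[i] + rng.gauss(0.0, width)
            trial[i] = min(max(v, lower[i]), upper[i])  # clamp to the bounds
        f_trial = f(trial)
        if f_trial <= f_best:
            # Swap the two buffers instead of allocating a new vector.
            best, trial = trial, best
            f_best = f_trial
    return best, f_best

x, fx = two_slot_search(sphere, [-5.0] * 3, [5.0] * 3)
```

Swapping the two references on acceptance avoids any further allocation, so the footprint stays at two vectors of size equal to the problem dimension plus a few scalars.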

Note that we not only report algorithms that have a modest memory footprint but also select the most successful design strategies that allow satisfactory performance despite using a small number of memory slots. As most compact algorithms mimic the behaviour of popular pMeta algorithms, for the sake of completeness, we next discuss key points on pMeta algorithms. This allows us to better introduce the μMeta algorithms surveyed in this work. These represent the simplest way to obtain a lightweight algorithm and can be found in some memory-constrained environments.

3. Populations and μPopulations

Population-based algorithms have been the go-to solution for solving a large number of (constrained or unconstrained, single-objective or multi-objective) optimisation problems for many years now. These methods have proven to be key to solving many real-world problems and are under continuous development. For more information on the many paradigms existing in the literature, the most established being Evolutionary Computing (EC), Swarm Intelligence (SI), and Hyperheuristics/Memetic Computing, we point to relevant books [6, 45–47] and surveys [48–54]. Modern hybrid structures employing machine learning components also populate the literature [55, 56]. Note that most surveys either focus on a specific algorithmic family or classify wide ranges of metaheuristics based on their inspiring metaphors. Such metaphors were very useful to the research community when developing the first nature-inspired algorithms. However, there is a clear recent trend of designing optimisation heuristics simply by using an inspiring metaphor as the main driving force and motivation. This is generating a plethora of algorithms whose contributions to the field are arguable and that are often poorly benchmarked and compared to similar strategies that were already available in the literature. This is evident from recent metaheuristic surveys, many of which focus on such metaphor-led algorithms and their variants [57–60]. In this survey, we mention some of these algorithms, depending on the relevance of the message of the corresponding article, as it is important to survey the totality of the current literature. However, we recommend that one always checks the literature to avoid reproposing similar ideas under different names, uses a more theoretically or empirically informed approach, and follows good practices [61–63] when designing novel algorithms. Proper benchmarking should also be performed. In summary, we are in favour of making progress in algorithm design and using metaphors as a means of conveying complex information, but we share the doubts and opinions expressed in [10, 64–66]. Interestingly, there are surveys on the performance of a wide range of metaheuristics on specific artificial testbed problems and real-world scenarios [67–72]. These offer practical insights on applying algorithms and highlight the importance of performing a thorough parameter tuning phase (self-adaptive algorithms might also have some parameters to tune).

Adapting algorithm parameters to ensure optimal performance can be a challenging task [73], with the exception of the population size value. We now know that high values are not necessarily recommended, but in most cases, the common belief from the literature seems to be that increasing this parameter is beneficial over noisy, highly multimodal, and large-scale problems. For example, the study in [74] suggests for Differential Evolution (DE), which can be impractical for large-scale or real-world time-expensive problems. Moreover, this is not necessarily correct in all scenarios. Some DE variants with micropopulations of a maximum of five individuals have been shown to perform well on a very large-scale problem with thousands of design variables [75], where exploration is partial under the fixed computational budget, and the decision to focus more on exploitation seems to yield better results. Similar results are obtained with other micropopulation evolutionary and swarm intelligence algorithms [76, 77].

3.1. Micropopulations

As a general rule, pMeta algorithms with small populations (roughly 20 individuals or fewer) can be referred to as μMeta algorithms. However, 20 solutions are often considered too many and are not used when the full benefit of a small population size, that is, rapid convergence, is sought. Potentially, fewer than five solutions can be used, but only if the working logic of the algorithm allows for it and is not compromised. For example, a classic DE with rand/1 mutation uses four individuals (the target plus three individuals randomly selected from the population) to generate a new offspring solution (see [54] for details on DE). Hence, fewer than four individuals is not a feasible setting, and exactly four individuals would mean that all of them are always involved in every perturbation. At least five or six individuals are preferred in order to retain the benefits of having a population while increasing exploitation and minimising memory usage. Note that DE is particularly suitable for working with micropopulations, as diversity can still be high in the initial phases of the search process [78]. An analysis of the effect of DE and PSO (Particle Swarm Optimisation) micropopulations on various problems with different characteristics is available in [79].
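The population-size argument above can be seen directly in code. The following is a minimal sketch of a DE with rand/1 mutation and binomial crossover run on a micropopulation of five individuals. The sphere objective, the values of `F` and `CR`, and the clamping of out-of-bound values are assumptions made for illustration, not settings recommended by the cited studies.

```python
import random

def sphere(x):
    """Toy objective (an assumption for illustration)."""
    return sum(v * v for v in x)

def micro_de(f, lower, upper, pop_size=5, F=0.5, CR=0.9, generations=200, seed=1):
    """Sketch of DE/rand/1/bin with a very small population (minimisation)."""
    if pop_size < 4:
        # The target plus three distinct donors are needed for rand/1.
        raise ValueError("rand/1 mutation needs at least four individuals")
    rng = random.Random(seed)
    n = len(lower)
    pop = [[rng.uniform(lower[j], upper[j]) for j in range(n)]
           for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            r1, r2, r3 = rng.sample(others, 3)   # three distinct donors
            j_rand = rng.randrange(n)            # at least one mutated gene
            trial = list(pop[i])
            for j in range(n):
                if rng.random() < CR or j == j_rand:
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    trial[j] = min(max(v, lower[j]), upper[j])
            f_trial = f(trial)
            if f_trial <= fit[i]:                # one-to-one replacement
                pop[i], fit[i] = trial, f_trial
    b = min(range(pop_size), key=fit.__getitem__)
    return pop[b], fit[b]

x, fx = micro_de(sphere, [-5.0] * 4, [5.0] * 4)
```

With `pop_size=4` the sketch still runs, but the three donors are always the same three individuals for a given target, which is the degenerate situation described in the text.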

The first investigations of micropopulations date back to the introduction of the micro-Genetic Algorithm (μGA) [80] and its variants [81, 82]. Since then, several micropopulation Evolutionary Algorithms (μEAs) followed, such as, e.g., [83]. After understanding the potential of a classic micro-Differential Evolution (μDE) over large-scale problems, several DE schemes, self-adaptive variants (such as JADE), and hybrid memetic alterations were proposed [75, 78, 84–91]. Analogously, Swarm Intelligence algorithms have been shown to have similar advantages when run with micropopulations. The results worth mentioning are those obtained with micro-Particle Swarm Optimisation (μPSO) algorithms [76, 92, 93]. Further successful examples are those of the micro-Artificial Immune System (μAIS) [94], the micro-Bacterial Foraging Algorithm (μBFA) [95], and other metaphor-led algorithms such as those in [77, 96, 97] (which are indeed very similar to more established frameworks such as DE and PSO, thus returning similar results). Finally, other important roles played by μEAs are to perform a local search within memetic algorithms [98] and to act as microalgorithms for multi-objective optimisation [99, 100].

4. Estimation of Distribution Algorithms (EDAs)

The EDA family forms a significant subset of EC algorithms where the concept of population plays a different role compared to other pMeta algorithms. This family is in continuous evolution and under active investigation, with frameworks such as Bayesian Optimisation (BO), also known as efficient global optimisation [101], currently finding their place in several time-consuming optimisation contexts. EDAs were originally referred to simply as Probabilistic Model-Building Genetic Algorithms (PMBGAs) [102]. This is because the first algorithms of this kind were modifications of previous EAs that drive the search through probabilistic models to achieve better performance on nonseparable problems characterised by high epistasis [103, 104], which are challenging for many ES and SI strategies.

An EDA builds an explicit probabilistic model (which implicitly represents the population) and samples promising candidate solutions from it. The optimisation process is then the iterative evolution/update of the model, usually starting with an exploratory distribution and ending with one that generates (near-)optimal solutions. Some EDAs draw whole populations from the corresponding distributions (several individuals are sampled per iteration), while others need fewer, or even a single candidate solution, to be drawn (as in most compact algorithms). Over the years, many algorithms have appeared based on models of varying complexity, such as Population-Based Incremental Learning (PBIL) [105], Mutual Information Maximising Input Clustering (MIMIC) [106], Bivariate Marginal Distribution Algorithm (BMDA) [107], Factorised Distribution Algorithm [108], and many others, such as several Extended Compact Genetic Algorithms [109–114]. Multivariate factorisation is also a widely used method, and since some evolution strategies incorporate multivariate normal models, these can be seen as EDAs. Among them, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [25, 115] is considered by many a great example of an EDA (see [116]). It soon became a benchmark for optimisation due to its performance and properties, such as invariance to rotations of the problem and many other peculiarities which, according to its authors, do not necessarily match the key characteristics of a “pure” EDA. Indeed, CMA-ES estimates the distribution of expected steps, while EDAs model them, etc. (see [115] for more details). However, the CMA mechanism is based on a complex probabilistic model whose time and memory requirements grow more than linearly with the problem dimension, which can be seen as a heavyweight computational requirement. This requirement can be relaxed by assuming that there is no cross-correlation between the design variables (as done in compact optimisation). The resulting diagonal covariance matrix prevents the normal distribution from rotating, while still allowing for adaptation along each coordinate axis if variances are evolved (one per axis). If, instead, the same constant variance value, which acts as a sort of step size, is used along all axes, one obtains a symmetric normal distribution, which can only move within the search space when its mean value gets updated. The latter is the simplest case, which is also the least memory-expensive.
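The memory argument can be made tangible by counting the real values each Gaussian model must store. The sketch below is illustrative only: the function names are hypothetical, and the full covariance matrix is counted as a dense square matrix (a symmetric storage scheme would roughly halve that term without changing its quadratic growth).

```python
import random

def model_slots(n):
    """Number of stored real values for three normal models in n dimensions."""
    return {
        "full_covariance": n + n * n,  # mean vector + dense n x n matrix
        "diagonal": 2 * n,             # mean vector + one variance per axis
        "isotropic": n + 1,            # mean vector + one shared variance
    }

def sample_diagonal(mean, sigma, rng):
    """Draw one solution from an uncorrelated (axis-aligned) normal model."""
    return [rng.gauss(m, s) for m, s in zip(mean, sigma)]

slots = model_slots(10)
point = sample_diagonal([0.0] * 10, [1.0] * 10, random.Random(0))
```

For a 10-dimensional problem this gives 110, 20, and 11 stored values, respectively, which is why the uncorrelated models are the ones of interest for memory-constrained devices.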

For an overview of the many models available for EDAs, we refer to [116–118]. Note that models and update rules for control parameters, such as variances, mean values, or other measures of central tendency and spread, can be very complex, with the simplest being those proposed for compact optimisation (details in Section 3.1).

5. Lightweight Metaheuristics: A Taxonomy

We propose a lightweight metaheuristic taxonomy structured as in Figure 2. This gives a graphical overview of the two main classes of algorithms we survey in this work, i.e., sMeta and cMeta, and offers a finer level of granularity by further classifying relevant subfamilies and variants of the same framework. Milestone algorithms and their more recent variations are considered in the taxonomy, as well as a few examples of modern metaphor-led algorithms to comment on current practices. While doing this, we summarise their working logic and report relevant successful applications and application domains for such algorithms.

5.1. Single-Solution Optimisation Algorithms for Combinatorial Problems

Hill Climbing (HC) [119], a.k.a. Iterative Descent, is a basic local search algorithm. Starting from an initial point, incremental perturbations are applied iteratively to improve the value of the cost function. There are four main HC variants, namely, the iterative best improvement method, the iterative first improvement method, the randomised iterative improvement method, and the probabilistic iterative improvement method; see [120] for details. Note that the first two strategies are greedy, while the others accept worsening moves, that is, candidate solutions with a worse objective function value than the current one. Applications of HC are abundant in the literature. An example of its application to a timetabling problem is given in [121].
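The two greedy variants differ only in when the neighbourhood scan stops. The sketch below shows both on bit strings with a single-bit-flip neighbourhood; the neighbourhood, the toy cost function, and the names are assumptions chosen for brevity and are not prescribed by [119, 120].

```python
def hill_climb(cost, start, best_improvement=True):
    """Greedy descent on a bit string using single-bit flips (minimisation)."""
    current = list(start)
    f_cur = cost(current)
    while True:
        move, f_move = None, f_cur
        for i in range(len(current)):
            current[i] ^= 1          # try flipping bit i in place
            f_new = cost(current)
            current[i] ^= 1          # undo the flip
            if f_new < f_move:
                move, f_move = i, f_new
                if not best_improvement:
                    break            # first improvement: stop scanning
        if move is None:
            return current, f_cur    # no improving neighbour: local optimum
        current[move] ^= 1           # commit the selected move
        f_cur = f_move

# Toy cost (an assumption): number of mismatches with a hidden target.
target = [1, 0, 1, 1, 0]
mismatches = lambda x: sum(a != b for a, b in zip(x, target))
solution, value = hill_climb(mismatches, [0, 0, 0, 0, 0])
```

Because moves are tried and undone in place, only the current solution is kept in memory, which is what makes this family a natural fit for the sMeta class.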

Iterated Local Search (ILS), presented in [122], is a simple multistart method that iteratively performs a perturbation step to explore a new starting point to then perform the local search step. When starting, the initial point must be provided or generated randomly, thus not requiring perturbation, and local search is immediately applied. These two iterated phases can be seen as alternating exploratory and exploitative searches. ILS is used successfully in other scheduling problems, such as the University Course Timetabling Problem [123]. Some notable ILS alterations are the hybrid adaptive ILS with Path-Relinking designed to solve the capacitated vehicle routing problem in [124], the ILS with ejection chains for open vehicle routing problems with time windows [125], and the two-phase ILS for the Set-Union Knapsack Problem [126]. For a detailed review of this algorithm, we refer the reader to [127].
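The alternation between perturbation and local search can be sketched as follows on a one-dimensional integer problem. The ±1 neighbourhood, the random "kick" of bounded size, and the non-worsening acceptance rule are assumptions made for this illustration; [122, 127] discuss many other choices.

```python
import random

def local_search(cost, x, lo, hi):
    """Descend to a local optimum using the +/-1 neighbourhood."""
    while True:
        neighbours = [y for y in (x - 1, x + 1) if lo <= y <= hi]
        best = min(neighbours, key=cost)
        if cost(best) >= cost(x):
            return x
        x = best

def iterated_local_search(cost, lo, hi, iterations=50, kick=10, seed=0):
    """Sketch of ILS: perturb the current local optimum, then re-optimise."""
    rng = random.Random(seed)
    current = local_search(cost, rng.randint(lo, hi), lo, hi)
    for _ in range(iterations):
        # Perturbation step: jump to a new starting point near the current one.
        start = min(max(current + rng.randint(-kick, kick), lo), hi)
        candidate = local_search(cost, start, lo, hi)
        # Acceptance criterion (assumed here): keep non-worsening optima.
        if cost(candidate) <= cost(current):
            current = candidate
    return current

# Toy multimodal cost (an assumption for illustration).
bumpy = lambda x: abs(x - 70) + 15 * (x % 5 != 0)
result = iterated_local_search(bumpy, 0, 99)
```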

Breakout Local Search (BLS) [128] is a variant of ILS that merges the steepest descent local search with adaptive perturbation strategies. BLS dynamically adjusts diversification by varying perturbation moves and types based on search history information. BLS is used successfully to address the Vertex Separator Problem (VSP) [128], the quadratic assignment problem [129], the maximum clique problem [130], and to solve the Assembly Sequence Planning Problem [131]. The hybrid BLS algorithm based on reinforcement learning from [132] shows improved performance over VSP.

Large Neighbourhood Search (LNS) [133] is a metaheuristic in which two operators, destroy and repair, alternate to obtain a new solution in a neighbourhood of the candidate solution. The destroy operator is responsible for perturbing random components of the candidate solution, which then undergoes a feasibility check where the repair operator fixes the components to ensure that the new solution is in the search space. Adaptive LNS algorithms for Vehicle Routing Problems (VRPs) can be found in [134, 135], and a hybrid adaptive LNS for the large-scale heterogeneous container loading problem is proposed in [136]. These are amongst the latest techniques proposed for these kinds of scheduling problems.

Great Deluge (GD) [137] operates by setting a threshold that acts as an upper limit for the admissible objective function values of newly generated solutions. Whenever the new candidate solution is accepted (i.e., in a minimisation context, it must have a function value lower than the upper limit), the upper limit is decreased according to the adopted decay rate. In this way, most points are accepted at the beginning, but the algorithm becomes more selective as the iterations proceed. GD was applied to a real-world examination timetabling problem in [138]. In [139], an adaptive version is proposed, called the Flex-Deluge algorithm, to solve the timetabling problems of university exams. Other hybrid variants also populate the literature, such as those in [140, 141], which are proposed to address VRP and task scheduling in grid computing, respectively.
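A minimal sketch of this acceptance mechanism, for minimisation, is given below. The linear decay of the level, the parameter names, and the separate bookkeeping of the best solution are assumptions made for illustration and may differ from the variants in [137–139].

```python
import random

def great_deluge(cost, start, neighbour, level, decay, iterations=1000, seed=0):
    """Sketch of Great Deluge: accept any candidate whose cost is below a
    level that is lowered every time a candidate is accepted."""
    rng = random.Random(seed)
    current, f_cur = start, cost(start)
    best, f_best = current, f_cur      # extra slot, kept only for reporting
    for _ in range(iterations):
        cand = neighbour(current, rng)
        f_cand = cost(cand)
        if f_cand <= level:            # admissible under the current limit
            current, f_cur = cand, f_cand
            level -= decay             # the algorithm becomes more selective
            if f_cur < f_best:
                best, f_best = current, f_cur
    return best, f_best

# Toy integer problem and neighbourhood (assumptions for illustration).
quadratic = lambda x: (x - 3) ** 2
unit_step = lambda x, rng: x + rng.choice((-1, 1))
x, fx = great_deluge(quadratic, 20, unit_step, level=quadratic(20), decay=1.0)
```

Early on, the high level lets almost every neighbour through, including worsening ones; as the level drops, only increasingly good candidates remain admissible.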

Variable Neighbourhood Search (VNS) [142] is based on the idea of systematically changing the neighbourhood. This occurs in two phases, in a local search phase which chooses the best neighbour improving the current solution to find the local optima and in a shaking phase to escape from the corresponding valley. Details on its extensions and applications in which VNS has proven to be very successful can be found in [143].

Greedy Randomised Adaptive Search (GRASP) [144] is an iterative process that combines a construction heuristic step with a sequential local search step. In the first step, a feasible solution is created using a randomised greedy heuristic. This solution serves as the starting point for the local search, which can be either a descent local search or a more advanced method. The best solution found is returned after the search process. Variations of GRASP and its applications are discussed in [145].

Similarly to the algorithms above, Guided Local Search (GLS) is built on top of a local search (LS) technique. To use GLS, one must first define a suitable set of features for the problem. Each feature has a cost and a penalty assigned by GLS. When the LS gets stuck at a local optimum, some features are selected and penalised. More details can be found in [146]. The authors in [147, 148] provide a list of GLS variants/extensions and guidelines on how to use this algorithm in practical applications, along with a variety of problems to which it was applied.

Descent-Based Local Search (DB-LS) [149] moves from the current solution to a neighbouring one according to a given neighbourhood structure in such a way that each movement leads to a better solution. This iterative process continues until no improvement is found, in which case the current solution corresponds to a local optimum. This technique was combined with the reinforcement learning technique and applied to graph colouring [149].

The Tabu Search (TS) method was used in a nontrivial number of combinatorial optimisation problems; see [150]. It was first presented in [151]. The TS algorithm explicitly leverages the history of the search not only to escape local optima but also to implement an exploration strategy. TS, like simulated annealing, allows for lower-quality solutions when a local optimum is discovered. A detailed presentation of this method and its fundamental concepts can be found in [150].

Simulated Annealing (SA) was proposed in [152]. SA accepts nonimproving solutions in order to increase the chance of exploring the search space and escaping from local optima. The algorithm starts with an initial solution and, at each iteration, generates a random neighbour using a predefined neighbourhood structure. If the newly generated solution is better than the current one, it is accepted; otherwise, it is accepted with a probability specified by the Boltzmann distribution. Notably, although SA was introduced for combinatorial optimisation, it has also been used to tackle real-valued problems. An exhaustive review of the literature is provided in [153].

Threshold Accepting (TA) [154] follows the same principle as SA but differs in the criterion used to accept candidate solutions. SA accepts a nonimproving solution only with a given probability, whereas TA accepts it whenever the degradation stays below a progressively decreasing threshold.
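
To make the difference between the two acceptance criteria concrete, the following sketch contrasts them for a minimisation problem, where `delta` is the objective degradation of the candidate with respect to the current solution. The function names and the exponential (Metropolis-type) acceptance probability are illustrative assumptions; cooling and threshold schedules are deliberately left out.

```python
import math
import random


def sa_accepts(delta, temperature, rng):
    """SA-style rule: improvements always pass, degradations pass with
    probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def ta_accepts(delta, threshold):
    """TA-style rule: deterministic, accept while the degradation stays
    below the current threshold."""
    return delta < threshold


rng = random.Random(0)
improving = sa_accepts(-1.0, 1.0, rng)    # improvement: accepted by SA
small_loss = ta_accepts(0.5, 1.0)         # degradation below threshold: accepted by TA
large_loss = ta_accepts(1.5, 1.0)         # degradation above threshold: rejected by TA
```

The only stochastic element in SA is the acceptance test itself, whereas TA needs no random number at this step; both keep a single solution in memory.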

All of these search methods have a similar structure. In addition, each has its own mechanism to diversify the exploration of the search space by escaping local optima. Other local searches are detailed in [6, 41].

Finally, hyperheuristics are prominent in dealing with discrete optimisation and are widely adopted for combinatorial problems. As first described in [155], these can be seen as “heuristics for choosing heuristics.” The selection method is arbitrary but often consists of using a learning mechanism to activate the most suitable operator. Therefore, once a set of heuristics/metaheuristics is provided, hyperheuristics work on the low-level operator space rather than the solution space and are defined in [156] as “a search method or learning mechanism to select or generate heuristics to solve computational search problems.” For details on milestone methods and recent advances, the relevant sources are [157–159], from which it can be seen that this optimisation paradigm is particularly well suited to scheduling, timetabling, and other discrete problems. Comprehensive lists of successful hyperheuristic applications are available in [157, 160]. Furthermore, one can see that most of the low-level operators involved in this framework are sMeta algorithms. This makes this framework very suitable for designing efficient single-solution memory-saving algorithms. Most importantly, single-solution optimisation plays a key role in designing hyperheuristics (even population-based ones), as the use of local searchers based, e.g., on hill climbing methods is very frequent [158, 159].

5.2. Single-Solution Optimisation Algorithms for Continuous Optimisation

When it comes to metaheuristics, most people immediately think of pMeta algorithms because using a set of multiple candidate solutions, i.e., the so-called population, is currently an established practice. This is particularly true in the continuous domain, where manipulating a population of points is generally seen as beneficial, see, e.g., [14], while sMeta algorithms are considered to result in poorer performance [161]. However, in line with [9], this is not always the case, and some sMeta algorithms, such as Simultaneous Perturbation Stochastic Approximation (SPSA) methods [162], offered ideas and played an important role in applied sciences such as physics and engineering in the past. Other algorithms, obtained as degenerate variants of existing pMeta algorithms adjusted to operate on a single solution, also provided interesting results. Other ideas, such as memetic single-solution algorithms and hyperheuristics for the continuous domain, displayed highly competitive performance. We report on all of these classes of algorithms, including relevant historical and modern methods. We remark that sMeta is not a synonym for simplicity or minimal resource consumption. Let us consider the elegant single-solution evolutionary algorithm Cholesky (1 + 1)-CMA-ES [163], which reduces the computational effort of CMA-ES from $O(n^3)$ to $O(n^2)$, where $n$ is the problem dimension, and requires only one candidate solution (plus an additional one for a temporary new solution). Its working mechanism is theoretically sound but consists of manipulating an $n \times n$ matrix; hence, it does not belong to the list of lightweight algorithms provided.

In the continuous domain, it is possible to take advantage of the notion of a gradient to guide the search. Gradient Descent (GD) [164] is a first-order single-solution method that relies on the differentiability of the objective function for its proper functioning. Depending on whether the function is to be maximised or minimised, it iteratively adjusts the current solution by moving it in the direction of the steepest ascent or descent. In the SPSA algorithm previously mentioned, this is done by stochastically approximating the classic finite-difference gradient methods. When, as in this case, Hessian matrices are not required, the method is memory-saving and can still be quite efficient, in particular if the problem is not highly multimodal. Other methods, such as those deriving from the classic Hooke–Jeeves direct search method [165], perform this indirectly by looking at objective function values in opposite directions on the same line, per component/axis (we present moderate variations in the remainder of this section). This also resembles a continuous counterpart of the methods based on neighbouring operators seen for the discrete domain. In this case, the neighbourhood is obtained with either a fixed or an adaptive exploratory radius from the candidate solution. Solis and Wets [166] present a very simple randomised search of this kind.

Non-Uniform Simulated Annealing (nuSA) [167] is an improved version of SA for continuous optimisation. It uses a nonuniform mutation which gradually shrinks neighbourhood size during the search. Quaternion SA (Q-SA) [168] instead uses a quaternion representation of candidate solutions to improve neighbourhood exploration and prevent premature convergence by widening the initial search space. Q-SA explores the quaternion space rather than the Euclidean space and does not employ specific parameters to alter the neighbourhood range. Finally, the Single Non-Uniform Mutation-based (SNUM) algorithm [21] is a simplification of the nonuniform mutation strategy of nuSA. SNUM has only one parameter, which makes it quite easy to use. Its performance does not depend significantly on the value of this parameter.

There are also historical evolutionary sMeta algorithms specifically designed for the continuous domain. One worth mentioning is the (1 + 1)-Evolution Strategy with the 1/5 Success Rule [169], which decreases the standard deviation of the normal perturbation if the proportion of successful mutations is less than 1/5 and increases it if this proportion is greater than 1/5. Other methods from the EC and SI families are described in the following.
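
A minimal sketch of a (1 + 1)-ES with a 1/5 success rule is given below. The observation window of 20 evaluations, the multiplicative factor 0.85, and the function and parameter names are assumptions chosen for illustration and are not taken from [169].

```python
import random


def one_plus_one_es(f, x0, sigma, max_evals, rng, window=20, factor=0.85):
    """(1+1)-ES for minimisation with a 1/5 success rule (illustrative sketch)."""
    x, fx = list(x0), f(x0)
    successes = 0
    for k in range(1, max_evals + 1):
        # Mutate every design variable with isotropic Gaussian noise.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy < fx:                      # successful mutation: offspring replaces parent
            x, fx = y, fy
            successes += 1
        if k % window == 0:              # adapt the step size once per window
            rate = successes / window
            if rate < 0.2:
                sigma *= factor          # too few successes: shrink the step
            elif rate > 0.2:
                sigma /= factor          # too many successes: enlarge the step
            successes = 0
    return x, fx, sigma


def sphere(x):
    """Toy objective (assumption): sum of squares."""
    return sum(v * v for v in x)


x_best, f_best, final_sigma = one_plus_one_es(
    sphere, [3.0, 3.0, 3.0], sigma=1.0, max_evals=3000, rng=random.Random(0))
```

Besides the parent and the offspring, the only state kept by the algorithm is the step size and a success counter.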

Intelligent Single Particle Optimiser (ISPO) [170], and its first formulation, referred to as IPO in [171], is a simple sMeta variant of the Particle Swarm Optimisation (PSO) metaheuristic [172]. Its working logic operates per component, i.e., each design variable is perturbed a prefixed number of times, sequentially, to complete one iteration. Therefore, it is suitable for separable problems. ISPO adjusts the velocity vector depending on a learning factor based on the number of successful updates of the particle during the search. Four parameters are required in total, but performance depends mainly on two (namely, the diversity factor and the descent factor), which are the only problem-dependent parameters according to [170, 171]. As tuning them can be challenging, the AdpISPO algorithm [173] is designed as a self-adaptive version of ISPO whose adaptation logic has been shown to work well on many testbed problems. AdpISPO outperforms ISPO on such problems and is free of problem-dependent parameters. This feature makes it suitable for hybridisation with other algorithms whose implementations depend on the parameter setting. Examples of Memetic Computing (MC) algorithms that use PSO variants and AdpISPO to perform local search are available in [174–176]. To improve performance, in particular over multimodal and large-scale domains, two multistart variants called ISPO-Restart and Very Intelligent Single Particle Optimiser (VISPO) are presented in [177]. Both ISPO-Restart and VISPO perform a “jump” in the search space before restarting the search, but the newly generated starting point is first uniformly sampled within the domain and then mixed with the “elite” solution stored in memory by inheriting some of its promising design variables. This inheritance is obtained by binomial crossover (from DE [54]) with a low crossover rate to preserve randomness at the new starting point. ISPO-Restart and VISPO perform similarly, with VISPO being preferable on only a few of the benchmark problems tested in [177]. The only difference between the two is that the first variant restarts after a prefixed number of function evaluations, while the second has a simple learning mechanism based on the number of successful continuous intervals per dimension to automatically decide when to restart.

Multiple Trajectory Search (MTS) [178] is another interesting lightweight sMeta that performs well on separable and large-scale problems (for which it was specifically designed). It can be seen as the coordination of three iterated local search algorithms: the first perturbs the solution along one axis at a time until all dimensions are covered (as in the Hooke–Jeeves local search operator); the second differs from the first in that it only searches one-fourth of the available dimensions; and the third takes three small steps along each dimension to heuristically find a candidate solution. The three operators are activated according to a grading system based on their successes, and if no improvement is registered, the search range is cut to one-half. The Multiple-Search Multi-Start (MSMS) framework in [179] is based on the same simple idea of combining a few search algorithms, but it features a multistart operator that keeps changing the initial point in an attempt to avoid premature convergence.
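
A per-axis local search of the kind described for the first MTS operator can be sketched as follows. This is a simplified illustration rather than the operator of [178]: the asymmetric probing (a full step in one direction, then a half step in the opposite one), the stopping rule, and all names are assumptions, while the halving of the search range after an unsuccessful sweep follows the description above.

```python
def axis_local_search(f, x0, radius, max_evals, min_radius=1e-9):
    """Coordinate-wise local search for minimisation (illustrative sketch)."""
    x, fx = list(x0), f(x0)
    evals = 1
    # Each sweep costs at most two evaluations per design variable.
    while evals + 2 * len(x) <= max_evals and radius > min_radius:
        improved = False
        for i in range(len(x)):
            original = x[i]
            x[i] = original - radius            # full step in one direction
            f_trial = f(x)
            evals += 1
            if f_trial < fx:
                fx, improved = f_trial, True
                continue
            x[i] = original + 0.5 * radius      # half step in the opposite direction
            f_trial = f(x)
            evals += 1
            if f_trial < fx:
                fx, improved = f_trial, True
            else:
                x[i] = original                 # neither probe helped: restore
        if not improved:
            radius *= 0.5                       # unsuccessful sweep: halve the range
    return x, fx


def sphere(x):
    """Toy separable objective (assumption): sum of squares."""
    return sum(v * v for v in x)


x_best, f_best = axis_local_search(sphere, [2.0, -1.5], radius=1.0, max_evals=1000)
```

Because the solution is modified in place one variable at a time, the memory footprint is that of a single candidate solution plus a scalar radius.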

Three-Stage Optimal Meta-memetic Exploration (3SOME) [180] is a simple MC technique characterised by the activation of three operators (memes) that perturb a single solution. These three components, namely, long (L), middle (M), and short (S) distance exploration, are arranged in a bottom-up structure and coordinated in such a way that the exploitation pressure increases as the algorithm converges to a promising area of the search space. Most of the calls to the objective function are used by the local search operator S [181], which implements the same strategy as the first local search of MTS. The coordination logic of the three memes is very simple but allows for competitive results when compared to established algorithms, including pMeta algorithms, in particular on separable problems. Following these results, several variants have been proposed. An improved M operator, which dynamically narrows the space around the solution in an attempt to provide a better-quality starting point to S, is available in [182], and many modifications (not all necessarily preserving the memory-saving nature of 3SOME) proposed in [183] allow for handling nonseparable/rotated problems. The analysis in [181] highlighted the importance of the coordination logic in the operator implementation itself and identified the least activated operators (and the most expensive ones in terms of calls to the objective function) during several optimisation processes. This led to simplified variants, resulting in significantly different algorithms with operators making fewer function calls before local refinement. A very simple one, having only two stages, is the Resampling Search algorithm [184], which can be seen as a sort of multistart ILS algorithm for continuous optimisation and was subsequently improved in the Resampled Inheritance Search (RIS) [185] framework. Further variants have been designed to deal with specific real-world applications; see, e.g., [186].

RIS [185] is a simple MC approach that performs a restarted iterated local search with a low level of inheritance of the design variables of the previous best solution after each restart. The S operator is run multiple times until a condition on the length of its exploratory radius is met (the option of fixing the number of S steps is also left available to the user). When a restart occurs, a point is drawn uniformly within the search space, and some of its design variables are replaced via crossover to retain promising elite components. Both binomial and exponential DE crossover strategies are tested, see [54] for details, with exponential being the default choice. RIS is simple yet effective and competes with (and often outperforms) pMeta algorithms on several benchmark functions. It can be seen as an optimisation framework in which a component, such as the crossover strategy or the local search, can be replaced with a more appropriate one depending on the problem if necessary. To obtain a more robust framework, the Parallel Memetic Structure (PMS) [187] followed, built on the general idea of having multiple local searchers performing complementary perturbations, thus increasing the diversity of possible moves within the search space. PMS maintains the restart mechanism with inheritance and runs two local searchers, one moving along the axes (S is used for this purpose) and one moving diagonally in the search space (the Rosenbrock method is used). This framework was proposed with the idea of including an adaptation system that dynamically allocates more budget to the local searcher performing the most successful moves during the search, see, e.g., [27, 188]. Note that the original implementation of PMS executes the Rosenbrock method to perform the diagonal move. Obviously, this adds a quadratic memory footprint to the algorithm as a whole. To have a memory-saving variant of PMS, this meme has to be replaced with a lightweight sMeta.
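
The restart-with-inheritance step can be sketched as follows, assuming a DE-style exponential crossover in which a contiguous (circular) block of elite variables is copied into a freshly sampled point. The function name, the argument names, and the crossover rate `cr` are hypothetical and are not taken from [185].

```python
import random


def restart_with_inheritance(elite, lower, upper, cr, rng):
    """Draw a new starting point and let it inherit a block of elite variables."""
    n = len(elite)
    # Fresh point drawn uniformly within the search space.
    new_point = [rng.uniform(lower[i], upper[i]) for i in range(n)]
    # Exponential crossover: copy at least one elite variable, then keep
    # copying consecutive ones while a uniform draw stays below cr.
    i = rng.randrange(n)
    new_point[i] = elite[i]
    copied = 1
    while copied < n and rng.random() < cr:
        i = (i + 1) % n
        new_point[i] = elite[i]
        copied += 1
    return new_point


rng = random.Random(7)
elite = [0.0] * 10
start = restart_with_inheritance(elite, [1.0] * 10, [2.0] * 10, cr=0.5, rng=rng)
```

A low value of `cr` keeps most of the new point random, so that the restart still explores a different region while retaining a few promising components.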

To reduce the number of parameters of the Simulated Kalman Filter (SKF) algorithm, the work in [189] proposes a single-solution SKF (ssSKF) version using only one agent. The working mechanism of ssSKF does not differ significantly from that of SKF, which goes through the three steps of prediction, measurement, and estimation. As these are performed by a single agent, working with a single solution, ssSKF is lightweight and easier to tune, which is an important aspect given the impact of setting parameters on SKF [190]. A similar idea led to the Single-Agent Finite Impulse Response Optimiser (SAFIRO) [191]. In this case, an agent is also responsible for measuring and estimating the optimal solution.

Several other nature-inspired sMeta algorithms have been proposed over the years for continuous optimisation. However, most methods proposed in recent years do not try to approximate the gradient or exploit any specific feature of the problem. Some examples, such as the Social Engineering Optimiser (SEO) [40], the Vortex Search algorithm (VS) [192], and the Simulated Raindrop Algorithm (SRA) [193], are based on inspiring metaphors that are implemented heuristically.

5.3. Compact Optimisation

The majority of the algorithms in this survey fall into this class. First, we comment on the methodology used to review the reported articles. Subsequently, we provide a comprehensive description of the compact optimisation literature.

5.3.1. Methodology and Research Questions

In scholarly writing within a specific domain, authors conventionally incorporate prevalent terminologies of that field into their research paper titles and keywords. This practice improves visibility and discoverability among a broad readership. When conducting a survey, employing the same keywords is intuitive for paper retrieval. Nevertheless, this approach may lack precision in the selection of pertinent papers. Hence, it is imperative to adopt a systematic methodology for the acquisition and meticulous selection of relevant literature.

After reviewing the Centre for Reviews and Dissemination (CRD) guidelines proposed in [194], we found that we had followed, albeit unintentionally, almost the same step-by-step procedure when compiling the literature, which ensured rigour and fostered comprehensiveness. This approach facilitated the identification and evaluation of candidate publications, thus improving the quality and reliability of the survey results and their implications within the scientific community. There are also other guidelines for performing a bibliometric analysis, such as the one proposed in [195], which elucidates the distinctions between systematic and bibliometric surveys and outlines the specific scenarios in which each method is applicable. This falls outside the scope of this work; please refer to [195] for more details.

The survey we propose is motivated by the following Research Questions (RQs):

RQ1: What characteristics of an algorithm can be used as useful classification criteria? In recent times, there has been a strong emphasis on classifying heuristics based on their inspiring metaphor, but we argue that other characteristics, such as the number of processed solutions that we picked, are more useful in practice. This RQ is addressed in Sections 2–5.

RQ2: How many kinds of “lightweight” algorithms are available in the literature? Answering this question is useful to show practitioners what a memory-saving algorithm is and for which application domain a specific class of these algorithms can be used. This RQ is addressed in Sections 1 and 2.

RQ3: What are the main characteristics of lightweight algorithms? We report a mathematical and algorithmic description of the general framework to use for obtaining most of the existing memory-saving variants. This RQ is addressed in Section 5.

RQ4: What are the application domains of memory-saving optimisation? There are numerous and obvious domains where these algorithms can be used, but we dig deeper and indicate specific cases to show which strategies and variants have been more successful. This RQ is addressed in Section 6.

RQ5: What potential future impact can lightweight algorithms have? There are open challenges to face, for example, in combining machine/reinforcement learning, where the use of fast and memory-saving algorithms can be preferred. This RQ is addressed in Sections 7 and 8.

The primary keywords used to review the literature and address the RQs are “memory-saving metaheuristics,” “lightweight metaheuristics,” “single solution metaheuristics,” “compact metaheuristics,” and all combinations where “metaheuristics” is replaced by “optimisation” or “algorithms.” The results are refined by incorporating into the search the names of the seminal algorithms, e.g., “cGA” and “rcGA,” as well as relevant authors, e.g., “Harik” or “Mininno.” It is highly improbable for a new paper on compact optimisation (discrete or continuous) to be published without referencing any of these influential works; hence, we also retrieved the papers citing them, a technique known as forward snowballing. We then looked at the references in these papers to find other relevant pieces of research. The latter approach is known as backward snowballing.

We produced a comprehensive list of articles published in the proceedings of established conferences and by prestigious publishers. This list includes some old milestone methods and several recently proposed algorithms. Each article was evaluated for alignment with the research questions of this survey, leading to its inclusion in or exclusion from the survey. Notably, relying on a single researcher in this process can introduce bias, oversight, and inconsistencies. To address these concerns, the validity of the study was protected through a secondary review conducted by a second author. Although predefining the survey scope and managing paper selection could appear subjective, all conclusions and recommendations stem from the latest insights to ensure clarity of the scope of the paper and effective communication of its core message.

The inclusion and exclusion criteria that guided the selection process towards an accurate categorisation of lightweight algorithms are as follows. Papers published in languages other than English were excluded. In many cases, the full text of the paper was scanned, not just the title and/or abstract, because it is not rare to find a paper that mentions a compact algorithm/method/approach but, after reading it, turns out to be about something different from the definition of “compactness” adopted in this survey. Similarly, algorithms that operate on single solutions may not necessarily be memory-saving. In addition, duplicate instances of the retrieved papers were omitted, as were conference papers (with a limited experimental setup) for which an extended journal version later appeared (for example, with cTLBO and cBAT). Lastly, lightweight algorithms hybridised with other components that lead to improvement but do not preserve the lightweight concept were removed or, in some cases, retained after case-by-case consideration. A further case concerned a particular algorithm, the Mean-Variance Mapping Optimisation (MVMO), on which we had different points of view: it was considered an sMeta by one author but not by another. Since we could not reach a conclusive decision, we decided to exclude it from the taxonomy. This thorough review pruned the number of articles down to a more manageable one. The resulting compilation was deemed representative, presenting a diverse range of lightweight algorithms that serves as the basis for the proposed taxonomy.

5.3.2. Compact Optimisation Algorithms

Compact algorithms are among the simplest expressions of EDAs, see Section 4, thus featuring a less memory-hungry and less computationally onerous algorithmic structure. For this reason, they have become popular since the publication of the compact Genetic Algorithm (cGA) [196], which was shown to perform similarly to the popular Simple GA with uniform crossover over discrete domains.

After cGA, counterparts for the continuous domain appeared in [197, 198], where the real compact Genetic Algorithm (rcGA) is presented. This concept was then expanded to obtain other EC approaches, such as compact DE (cDE) [199], compact PSO (cPSO) [42], and, in a more general sense, a compact optimisation framework [31]. A general template that illustrates the structure of a compact algorithm is depicted in Algorithm 1.

Algorithm 1: General template of a compact algorithm.
input: probabilistic model PV, problem size $n$
output: best solution $x_e$
sample $x_e$ by means of PV
while termination criterion is not met do
 sample a candidate solution $x$ by means of PV
 compare fitness of $x$ and $x_e$
 update PV
 if replacement condition is satisfied then
  $x_e \leftarrow x$
 end if
end while

5.4. Probabilistic Models

The binary and Gaussian probabilistic models in [196, 198] are the most widely used ones in compact optimisation, depending on the discrete or continuous/real-valued nature of the search space. Following the original notation in [196], compact algorithms require a “Probability Vector” PV to store probability values. However, this is strictly true only for the original binary case. In the continuous domain, PV has been kept as a legacy variable, but it contains (Gaussian) distribution parameters (mean values and standard deviations), thus being a two-dimensional array. Note that a “virtual” population size $N_p$ has to be indicated. This is used in the update of PV to mimic the coverage behaviour that larger or smaller populations have in the search space.

Note that other models have been proposed in the literature. We describe them in the following subsections.

5.4.1. Binary Model

In this model, PV is a vector of length equal to the problem dimension $n$, where each element is the probability of sampling 1 for the corresponding design variable. All elements of PV are initialised to 0.5, so that the initial solution is sampled from a uniform distribution. Relevant examples of algorithms using this model are, e.g., [37, 200–204].
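
To illustrate how the binary model fits the general compact template, the sketch below maximises the OneMax function with a single probability vector and a persistent elite. The step of $1/N_p$ applied to the entries where winner and loser differ follows the usual cGA-style update; the function names, the elitist replacement, and the parameter values are assumptions made for this example.

```python
import random


def compact_ga(fitness, n, virtual_pop, max_iters, rng):
    """Binary compact algorithm with a persistent elite (maximisation sketch)."""
    pv = [0.5] * n                      # probability of sampling 1 per variable

    def sample():
        return [1 if rng.random() < p else 0 for p in pv]

    elite = sample()
    f_elite = fitness(elite)
    step = 1.0 / virtual_pop
    for _ in range(max_iters):
        x = sample()
        fx = fitness(x)
        x_wins = fx > f_elite
        winner, loser = (x, elite) if x_wins else (elite, x)
        # Shift PV towards the winner only where the two solutions differ.
        for i in range(n):
            if winner[i] != loser[i]:
                shift = step if winner[i] == 1 else -step
                pv[i] = min(1.0, max(0.0, pv[i] + shift))
        if x_wins:                      # replacement condition: strict improvement
            elite, f_elite = x, fx
    return elite, f_elite, pv


def onemax(bits):
    """Number of ones in the bit string."""
    return sum(bits)


elite, f_elite, pv = compact_ga(onemax, n=20, virtual_pop=50,
                                max_iters=20000, rng=random.Random(42))
```

The whole state consists of one probability vector, one elite, and one temporary candidate, regardless of the virtual population size.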

5.4.2. Gaussian Model

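In the Gaussian model, PV is the $n \times 2$ matrix $PV = [\mu, \sigma]$, where $\mu$ and $\sigma$ are the mean and standard deviation vectors of an uncorrelated and truncated Gaussian Probability Distribution Function (PDF) [198] with domain $[-1, 1]$. Mathematically, this is formulated as in equation (1):

$$\mathrm{PDF}(x_i) = \frac{e^{-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}} \sqrt{\frac{2}{\pi}}}{\sigma_i \left( \mathrm{erf}\left(\frac{\mu_i + 1}{\sqrt{2}\sigma_i}\right) - \mathrm{erf}\left(\frac{\mu_i - 1}{\sqrt{2}\sigma_i}\right) \right)}, \tag{1}$$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation along the $i$-th axis, and $\mathrm{erf}$ is the error function.

When the algorithm is initialised, these parameters are set so that $\mu_i = 0$ and $\sigma_i = \lambda$, where $\lambda$ is a positive constant (usually $\lambda = 10$) large enough to approximate a uniform distribution in $[-1, 1]$.

The corresponding Cumulative Distribution Function (CDF) is formulated in equation (2):

$$\mathrm{CDF}(x_i) = \frac{\mathrm{erf}\left(\frac{\mu_i + 1}{\sqrt{2}\sigma_i}\right) + \mathrm{erf}\left(\frac{x_i - \mu_i}{\sqrt{2}\sigma_i}\right)}{\mathrm{erf}\left(\frac{\mu_i + 1}{\sqrt{2}\sigma_i}\right) - \mathrm{erf}\left(\frac{\mu_i - 1}{\sqrt{2}\sigma_i}\right)}. \tag{2}$$

As this CDF does not have a closed analytical expression, it is approximated by means of Chebyshev polynomials [205] for use within the algorithm.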

(1) Sampling. The polynomial approximation of the CDF is used to generate solutions within the search space. To do this, a uniform random number $r \in [0, 1]$ is first generated.

Subsequently, the inverse function of the CDF of each axis $i$ is computed and evaluated at $r$. This returns the value of the design variable $x_i$ that forms the new candidate solution $x$.

Note that this model operates in the $[-1, 1]$ domain. Therefore, when dealing with a generic domain $[a_i, b_i]$, the obtained value of $x_i$ has to be scaled back to the original decision space simply by performing $a_i + (x_i + 1)(b_i - a_i)/2$. Conversely, this also means that before feeding a solution to the algorithm, one should normalise it within $[-1, 1]$, unless that is the domain of the original problem.
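
The sampling step can be sketched in Python as follows. To keep the example short and dependency-free, the inverse of the truncated Gaussian CDF on $[-1, 1]$ is obtained here by bisection on the exact erf-based expression, which is a swapped-in technique and not the Chebyshev approximation of [205]. The function names and the tolerance are assumptions, and a strictly positive standard deviation is assumed.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def truncated_cdf(x, mu, sigma):
    """CDF of a Gaussian with parameters mu, sigma truncated to [-1, 1]."""
    left = math.erf((mu + 1.0) / (SQRT2 * sigma))
    right = math.erf((mu - 1.0) / (SQRT2 * sigma))
    return (left + math.erf((x - mu) / (SQRT2 * sigma))) / (left - right)


def sample_variable(mu, sigma, rng, tol=1e-12):
    """Inverse-CDF sampling of one design variable in [-1, 1] by bisection."""
    r = rng.random()
    low, high = -1.0, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if truncated_cdf(mid, mu, sigma) < r:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def to_original_domain(x, lower, upper):
    """Map a normalised value in [-1, 1] back to [lower, upper]."""
    return lower + (x + 1.0) * (upper - lower) / 2.0


rng = random.Random(3)
samples = [sample_variable(0.3, 0.2, rng) for _ in range(200)]
scaled = [to_original_domain(s, 2.0, 6.0) for s in samples]
```

On a memory-constrained device, a precomputed polynomial approximation of the inverse CDF would typically be preferred to the iterative bisection used here.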

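(2) Selection and update. When the newly generated individual competes with the current individual, the fittest (i.e., the one with a lower objective function value in a minimisation context) is referred to as the winner, while the other is declared the loser. The winner influences PV, as its design variables are used in the update rules of both $\mu$ and $\sigma$, shown in equations (4) and (5), respectively:

$$\mu_i^{t+1} = \mu_i^{t} + \frac{1}{N_p}\left(winner_i - loser_i\right), \tag{4}$$

$$\left(\sigma_i^{t+1}\right)^2 = \left(\sigma_i^{t}\right)^2 + \left(\mu_i^{t}\right)^2 - \left(\mu_i^{t+1}\right)^2 + \frac{1}{N_p}\left(winner_i^2 - loser_i^2\right), \tag{5}$$

where $t$ indicates the iteration counter, $N_p$ is the virtual population size, and $winner_i$ and $loser_i$ are the $i$-th components of the winner and loser, respectively.

For more details, see [198].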
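
A minimal sketch of this update step is given below, assuming the rule commonly used with the Gaussian model in real-valued compact algorithms: the mean moves by the winner-loser difference divided by the virtual population size, and the variance is adjusted so that the second moment is updated consistently. The function name is hypothetical, and clamping the variance at zero before taking the square root is a numerical safeguard added here, not part of the source.

```python
import math


def update_gaussian_model(mu, sigma, winner, loser, virtual_pop):
    """Return updated mean and standard deviation vectors (illustrative sketch)."""
    new_mu, new_sigma = [], []
    for m, s, w, l in zip(mu, sigma, winner, loser):
        # Mean update: move towards the winner and away from the loser.
        m_next = m + (w - l) / virtual_pop
        # Variance update: old second moment, shifted by the squared components.
        var_next = s * s + m * m - m_next * m_next + (w * w - l * l) / virtual_pop
        new_mu.append(m_next)
        new_sigma.append(math.sqrt(max(var_next, 0.0)))   # safeguard (assumption)
    return new_mu, new_sigma


mu, sigma = [0.0, 0.0], [10.0, 10.0]
mu, sigma = update_gaussian_model(mu, sigma, winner=[0.5, -0.2],
                                  loser=[-0.5, -0.2], virtual_pop=10)
```

Note that design variables on which winner and loser coincide leave the corresponding entries of the model unchanged, as the second component in the example shows.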

Algorithms using this model are, e.g., the real-valued compact algorithms rcGA [198], cDE [199], cPSO [42], cABC [206], etc., see Section 5.6 for more details.

5.4.3. Enhanced Gaussian Model

An improved model using two PDFs that share the same parameters $\mu$ and $\sigma$ is proposed in [207].

When this model is used, the normalised search space is seen as $[-1, \mu_i] \cup [\mu_i, 1]$, and two PDFs, namely, Q-PDF and R-PDF, are defined on these two subintervals in equations (6) and (7), respectively.

Algorithm 2 shows the sampling mechanism, where a user-defined parameter controls the probability of employing equation (6) rather than equation (7).

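Algorithm 2: Sampling mechanism of the enhanced Gaussian model.
input: the vectors $\mu$, $\sigma$ and the probability parameter
output: a new trial solution $x$
for $i = 1, \dots, n$ do
 generate a random number according to the uniform distribution in $[0, 1]$
 if the random number is smaller than the probability parameter then
  generate $x_i \in [-1, \mu_i]$ according to Q-PDF described in equation (6)
 else
  generate $x_i \in [\mu_i, 1]$ according to R-PDF described in equation (7)
 end if
end for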

If this parameter is set to 0.5, then this model is expected to perform as the original Gaussian model [207]. Recent studies using this model include [207–209].

5.4.4. Uniform Model

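The work in [210] proposes a model based on the Uniform PDF (U-PDF) defined in the following equation:

$$\text{U-PDF}(x_i) = \begin{cases} \dfrac{1}{u_i - l_i}, & l_i \le x_i \le u_i, \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$

where $l_i$ and $u_i$ are the lower and upper bounds of the uniform distribution (note that these are different from the search space bounds and change during the optimisation process). By integrating U-PDF, one can easily obtain the following CDF:

$$\text{U-CDF}(x_i) = \frac{x_i - l_i}{u_i - l_i}, \quad l_i \le x_i \le u_i. \tag{9}$$

The inverse is given by

$$x_i = l_i + r\,(u_i - l_i), \tag{10}$$

where $r$ is a uniform random number in $[0, 1]$.

Unlike the Gaussian model, whose domain is fixed, the bounds $l_i$ and $u_i$ of U-PDF vary whenever the mean and standard deviation vary; the corresponding equations are given in [210].

After calculating $l_i$ and $u_i$, a uniform random number $r$ must be generated to obtain a generic design variable $x_i$ through equation (10).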
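
The uniform model can be sketched as follows. The way the bounds are derived from the mean and standard deviation is an assumption made here for illustration (moment matching, i.e., a uniform distribution with the same mean and standard deviation, clipped to the normalised domain $[-1, 1]$) and may differ from the equations in [210]; the function names are hypothetical.

```python
import math
import random


def uniform_bounds(mu, sigma):
    """Bounds of a uniform distribution with mean mu and standard deviation sigma,
    clipped to [-1, 1] (moment matching is an assumption of this sketch)."""
    half_width = math.sqrt(3.0) * sigma
    return max(-1.0, mu - half_width), min(1.0, mu + half_width)


def sample_uniform_variable(mu, sigma, rng):
    """Inverse-CDF sampling: map a uniform number r in [0, 1) onto the bounds."""
    low, high = uniform_bounds(mu, sigma)
    r = rng.random()
    return low + r * (high - low)


rng = random.Random(5)
low, high = uniform_bounds(0.0, 0.1)
draws = [sample_uniform_variable(0.0, 0.1, rng) for _ in range(100)]
```

Since the inverse CDF is linear, sampling requires one multiplication and one addition per design variable, with no polynomial approximation.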

Examples of algorithms that employ these models are those in [38, 211].

5.5. Binary/Discrete Compact Optimisation Algorithms

The first compact algorithm, i.e., cGA, was designed for discrete optimisation. Since then, several other cMeta algorithms appeared in the literature for solving discrete problems. Alternative mutation operators for cGA are proposed in [37, 200, 203, 212], and an evolutionary strategy for the survival of the offspring is proposed in [213]. A study on elitism for GAs in [201] led to two variants with strong and weak elitism that outperformed the original nonelitist cGA and the popular (1 + 1)-ES. Elitism has also been used in [37, 203, 214, 215]. It should be noted that selection pressure can also be ensured by using a larger tournament selection [196].

Other studies try to improve the update process of PV. A moving average strategy is presented in [202], and weights are used in [216]. Learning mechanisms for choosing among multiple evolved probability vectors are also available [217, 218]. Note that by doing this, the algorithm might achieve better performance, but it would require more memory to store multiple PVs. Hence, these might no longer be considered lightweight according to the classification of this survey. However, if such memory is available on the device, these strategies can be used, as well as some small-population algorithms that might have the same memory footprint. Differently, in [219], some degree of inheritance is included in the probabilistic model, while the study [215] replaces PV with the belief vector to store probability values belonging to a Gaussian distribution with a given mean and variance. As in the continuous case, the belief vector is not strictly a vector, since it stores distribution parameters rather than single probability values, and is more memory-intensive than the original model. We would like to highlight that these advances available in the literature often perform better than the original methods. However, we argue that they are all based on adding complexity. This often leads to higher memory footprints and/or higher time overheads. This is a well-known problem in stochastic optimisation, where a trade-off between performance and other criteria (asymptotic complexity, overheads, smallest use of memory slots, etc.) must be struck according to the optimisation scenario. For the sake of completeness, we report a wide range of studies in this survey and provide these considerations to the reader.

Adaptation is an interesting feature of an optimisation algorithm. In [204, 220], adaptation to the problem is obtained by adding information on the frequencies and continuity of the update probabilities to the update rules. Most importantly, parameter adaptation schemes are proposed in [221] to obtain parameterless cGAs capable of tuning the population size to remove the unfavourable implications of genetic drift.

Multistart schemes also help in compact optimisation. After each restart, the initial point changes, and usually, PV is reinitialised. In [222], the restart occurs if no improvement in the objective function values is registered within a fixed number of consecutive generations. This can be seen as an enhanced exploration phase, where the cMeta is used to refine the new initial point. This can also be done with an opposite approach, where the cMeta provides solutions that are refined with a local searcher such as, e.g., steepest descent [223], or problem-specific local search operators [222, 224, 225].

For the sake of completeness, we report on the use of combinatorial cMeta algorithms in systems that allow for parallelisation. Here, the problem of keeping the number of memory slots at a minimum level is less evident, while the simplicity of the models is important for a feasible and error-free implementation in devices such as FPGAs. The possibility of constructing parallel versions of cGA is discussed in [226–230], as well as in [231] (which considers multi-FPGA partitioning), while a memetic variant of cGA was presented in [232], along with a mechanism for fine-grained parallelism.

Other works employed cGA on various hardware devices [37, 233–237]. More recently, a GPU-enabled implementation of cGA was presented in [238], to solve a “seriously” large-scale (up to 10 million variables) Integer Linear Programming problem taken from [239], as well as continuous and discrete versions of the OneMax benchmark problem of up to one billion variables.

The use of cGA was also analysed in the context of noisy optimisation in [240]. This study showed that cGA can handle noise efficiently by adjusting its step size according to the level of noise. This method was called graceful noise scaling.

From a theoretical perspective, the runtime of a discrete cGA is studied in multiple works. In particular, it was analysed on the jump functions in [241–243]. In [244], lower and upper cGA runtime bounds have been derived for pseudo-Boolean functions, such as OneMax. Other studies have investigated how the size of the virtual population influences the performance of cGA [245, 246].
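To make the role of the virtual population size concrete, the following minimal sketch runs a binary cGA on OneMax. It is an illustration under stated assumptions rather than the implementation analysed in the cited works: the function names and parameter values are hypothetical, and PV is stored as integer counts (PV[i] = counts[i] / virtual_pop) purely to avoid floating-point drift.

```python
import random

def onemax(bits):
    """Number of ones in the bit string (to be maximised)."""
    return sum(bits)

def cga(n_bits, virtual_pop, max_iters, seed=0):
    """Minimal binary cGA: a probability vector replaces the population."""
    rng = random.Random(seed)
    # PV[i] = counts[i] / virtual_pop; every entry starts at (about) 0.5
    counts = [virtual_pop // 2] * n_bits
    for _ in range(max_iters):
        pv = [c / virtual_pop for c in counts]
        a = [1 if rng.random() < p else 0 for p in pv]
        b = [1 if rng.random() < p else 0 for p in pv]
        winner, loser = (a, b) if onemax(a) >= onemax(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # shift PV[i] by 1/virtual_pop towards the winner's bit
                delta = 1 if winner[i] == 1 else -1
                counts[i] = min(virtual_pop, max(0, counts[i] + delta))
        if all(c in (0, virtual_pop) for c in counts):   # PV has converged
            break
    return [c / virtual_pop for c in counts]
```

A smaller `virtual_pop` gives larger update steps and faster convergence, but makes the model more prone to genetic drift, i.e., PV entries fixing at the wrong value.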

On a historical note, it is worth mentioning the Selfish Gene Algorithm (SGA) [247], which is very similar to cGA and was presented almost contemporaneously. SGA (not to be confused with the Simple GA) is based on the “selfish gene” theory in biological evolution [248]. Similarly to cGA, SGA evolves a pool of genes that is updated by means of a virtual population. Recently, a new variant of SGA, dubbed the “replacement and never penalising” SGA, was proposed in [249]. Instead of penalising the genes of the loser, this variant replaces them with those of the winner. This algorithm was applied to optimise the gymnastic movements of a humanoid robot. Two elitist variants (with persistent and nonpersistent elitism) of this algorithm are presented in [250]. These seem to significantly outperform the original SGA. For details on SGAs, we refer to [251].

Finally, we should stress that there are several compact binary variants of other metaheuristics besides cGA and SGA. An algorithm known as cBinDE, which stands for compact Binary Differential Evolution, was introduced in [252]. This algorithm follows the same principle as cGA, but it uses the binary versions of mutation and crossover of the Differential Evolution algorithm combined with a simple local search. This algorithm was successfully used to maximise the functional coverage percentage in the verification of digital systems. Binary cDE was also studied in [253, 254]. Other works instead investigated binary versions of compact PSO [255], compact Firefly Algorithm [33], compact Co-Firefly Algorithm [256, 257], compact Memetic Algorithms [258], and other kinds of compact EAs [259, 260]. We argue that potentially all metaheuristics can be made “compact.” However, finding the most useable or suitable solution for a problem is a real challenge. This either requires a time-consuming empirical phase, or a more informed approach, which can be possible only in some cases. This is a fundamental research question in the field to be prioritised in the future.

The most interesting areas of application of compact optimisation in the discrete domain include the Travelling Salesman Problem (TSP) [224, 261]; determining minimum set primers in Polymerase Chain Reaction (PCR) [262]; task scheduling in grid computing environments [263]; protein folding [264]; object recognition [265, 266]; soft decision decoding [267, 268]; minimising the number of coding operations required in multicast based on network coding [222]; estimating the parameters of the maximum log-likelihood function of a first-order moving average model [269] and a mixed model [223]; optimising the aggregation of multiple similarity measures to obtain a single similarity metric for ontology matching [270]; optimising ontology alignment [271]; designing multiple input multiple output wireless communication systems [272].

5.6. Real-Valued Compact Optimisation Algorithms

In Sections 5.6.1 and 5.6.2, we report on EC and SI cMeta for the continuous domain, respectively.

5.6.1. Compact Evolutionary Algorithms (cEAs)

The real-valued compact Genetic Algorithm (rcGA) [198] is the first compact algorithm for continuous optimisation. It uses the Gaussian model described in Section 5.4.2 and only requires storing the elite individual, a temporary solution, and PV to perform the search. A similar variant is proposed in [273], which, despite being named a compact PSO algorithm, displays the same working mechanism as rcGA (we will refer to it as cross-rcGA). The peculiarity of this variant is that it decomposes the problem into three subproblems. For each subproblem, a local best solution is needed, as well as a local PV (in this respect, it is similar to a PSO). We point out that this algorithm is based on an interesting idea but ends up requiring three local best solutions, three temporary solutions, a global best slot, and three PV matrices, thus having a memory footprint similar to that of a pMeta with a small population size. Another variant is presented in [21], where rcGA is hybridised with SNUM and is called cSNUM. Here, after generating an offspring solution with the rcGA mechanism, the SNUM operator is applied to a randomly selected design variable. cSNUM deals well with separable problems of different dimensionalities. Similarly, the Single/Multi Non-Uniform Mutation (cSM) algorithm [44] is a hybrid algorithm that combines an rcGA-like structure with the nonuniform mutation (NUM) operator. This is very similar to cSNUM but perturbs all variables instead of just one. An interesting solution is the Uniform compact Genetic Algorithm (UcGA) [211], which is based on the uniform model and features a virtual population size that decreases linearly. Furthermore, it employs a local search operator.
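As an illustration of how little state an rcGA-like search needs (the elite, one temporary solution, and a PV made of per-variable means and standard deviations), consider the minimal sketch below. It is written under assumptions that are not taken from this survey: variables are normalised to [-1, 1], elitism is persistent, the truncated Gaussian is sampled by simple rejection, and the mean/variance update is the commonly reported winner/loser rule. All names and default values are hypothetical.

```python
import math
import random

def rcga(objective, n, virtual_pop=100, budget=5000, sigma0=10.0, seed=0):
    """Sketch of a persistent-elitism real-valued cGA on [-1, 1]^n (minimisation)."""
    rng = random.Random(seed)
    mu = [0.0] * n
    sigma = [sigma0] * n          # a wide sigma makes early sampling nearly uniform

    def sample():
        x = []
        for m, s in zip(mu, sigma):
            for _ in range(1000):               # rejection sampling, truncated to [-1, 1]
                v = rng.gauss(m, s)
                if -1.0 <= v <= 1.0:
                    break
            else:
                v = min(1.0, max(-1.0, m))      # fallback: use the (clamped) mean
            x.append(v)
        return x

    elite = sample()
    elite_f = objective(elite)
    for _ in range(budget):
        x = sample()                            # the only temporary solution
        fx = objective(x)
        if fx < elite_f:
            winner, loser = x, elite
            elite, elite_f = x, fx
        else:
            winner, loser = elite, x
        for i in range(n):                      # move PV towards the winner
            new_mu = mu[i] + (winner[i] - loser[i]) / virtual_pop
            var = (sigma[i] ** 2 + mu[i] ** 2 - new_mu ** 2
                   + (winner[i] ** 2 - loser[i] ** 2) / virtual_pop)
            mu[i] = min(1.0, max(-1.0, new_mu))
            sigma[i] = math.sqrt(max(var, 1e-12))
    return elite, elite_f
```

With these settings the model narrows slowly, so short runs behave much like random sampling around the elite; a smaller `sigma0` or `virtual_pop` speeds up convergence at the cost of exploration.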

The cDE algorithm [199] generates new trial solutions using the fundamental logic of DE, but rather than selecting them from a population, it samples them from a probabilistic model. Potentially, it can be used with all possible DE mutation strategies, crossover operators, and elitism schemes. However, depending on the use of a specific mutation operator, one may need to sample more individuals, thus requiring more memory. The simplest mutation, i.e., “rand/1,” requires sampling three points to generate the so-called mutant vector. Compared to rcGA, a performance gain is recorded in most benchmark problems [199]. This might be due to the fact that DE is designed for continuous optimisation, and, therefore, cDE maintains the very same encoding and working logic as DE. This is not the case for GA, which is usually used over discrete domains and requires a real population to perform selection mechanisms such as fitness-proportionate or tournament selection (which can only be used with a size of two individuals in the memory-saving context). For these reasons, rcGA ends up performing worse than its population-based counterpart in many cases, particularly for mid- and high-dimensional problems, while cDE is comparable to its population-based counterpart. Moreover, in the continuous domain, cDE usually outperforms rcGA (but requires at least three individuals for the mutation, on top of the elite solution and a temporary vector). Similar considerations also apply to other cMeta algorithms (see [31] for details): population-based algorithms that perform selection based on pairwise comparisons can be successfully and straightforwardly encoded into a compact scheme, while the others might display substantial performance degradation. In this light, there are many cDE-based algorithms in the literature. 
We remark that some of them might require the same number of memory slots as population-based algorithms with small populations, but most are still characterised by simple and memory-cheap algorithmic structures. For example, the Disturbed Exploitation compact Differential Evolution (DEcDE) algorithm in [274] is a simple memetic approach based on a cDE algorithm that employs two DE exploitative search strategies. The first is the classic DE/rand/1/exp configuration. The second configuration instead uses the trigonometric mutation (see [54] for details on DE). These exploitative DE operators are counterbalanced by a periodic stochastic alteration of the virtual population, which is meant to introduce exploration elements in the search. Despite its simplicity, the algorithm outperforms other compact algorithms in the benchmark functions tested. Similar results are obtained by using generalised Opposition-Based Learning (OBL) within cDE. The resulting cODE (compact Opposition-Based DE) [275] is competitive with its population-based counterpart. Differently, [276] proposes a memory-saving solution called Concise DE-based Chaotic Local Search (CDE-CLS), where a local searcher is added purposely to achieve fast convergence on a real-world problem. An adaptive version is instead the Compound Sinusoidal cDE (CScDE) proposed in [277]. Here, the compound sinusoidal heuristic is used to self-adapt the crossover rate and the mutation scale factor. CScDE outperforms most state-of-the-art compact algorithms on various benchmark problems. Other methods to improve exploration include the use of multiple cDE instances running together, as in, e.g., [278]. However, these methods are not memory-saving, as they end up requiring a similar number of memory slots to a small population-based algorithm, which may be preferred.
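The way cDE builds a trial solution with the rand/1 mutation can be sketched as below. This is a simplified illustration and not the code of [199]: sampled points are clamped to the bounds instead of being drawn from a truncated Gaussian, binomial crossover is used for brevity, and the names `sample_pv` and `cde_trial` and the default values of `F` and `CR` are assumptions made here.

```python
import random

def sample_pv(mu, sigma, rng, lo=-1.0, hi=1.0):
    """Draw one point from the per-variable Gaussian model, clamped to the bounds."""
    return [min(hi, max(lo, rng.gauss(m, s))) for m, s in zip(mu, sigma)]

def cde_trial(elite, mu, sigma, F=0.5, CR=0.3, rng=None):
    """One cDE trial: rand/1 mutation on three sampled points, then
    binomial crossover between the mutant and the elite."""
    rng = rng or random.Random()
    # three temporary individuals: the extra memory cost of rand/1
    a, b, c = (sample_pv(mu, sigma, rng) for _ in range(3))
    mutant = [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]
    j_rand = rng.randrange(len(elite))      # at least one gene comes from the mutant
    return [m if (rng.random() < CR or j == j_rand) else e
            for j, (m, e) in enumerate(zip(mutant, elite))]
```

The trial would then compete with the elite, and the winner/loser pair would drive the PV update exactly as in rcGA.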

From the memory point of view, the cheapest compact DE framework is the compact Differential Evolution light (cDE-light) algorithm proposed in [279]. This is a fast approach, as it requires sampling only one candidate solution per iteration instead of the three required to perform the classic DE/rand/1/exp. Furthermore, it does not require loops to implement the exponential crossover operator, which is replaced with a counterpart capable of predicting the number of design variables to be exchanged. The main idea to reduce the number of individuals in the rand/1 mutation (which is a linear combination of three randomly selected individuals) is to exploit the properties of the Gaussian distribution from which these individuals need to be drawn (note that this is an approximation, as the distribution is truncated and is not a theoretical Gaussian function; however, the results are satisfactory). Indeed, under the reasonable assumption of having statistically independent individuals, the corresponding three Gaussian distributions can be linearly combined to obtain the Gaussian model of the resulting mutant vector. Without having to sample individuals to generate the mutant, the latter can simply be drawn from its Gaussian model. As for the crossover “light,” the derivation of the formula predicting the number of variables to be exchanged without requiring a loop through two individuals is provided in [279]. This algorithm is based on interesting design ideas, and, despite the assumptions and approximations, it performs well and behaves similarly to cDE. As seen for single-solution algorithms, a performance gain is also recorded in this case when the algorithm is equipped with the restart-with-inheritance mechanism described in Section 5.2. The study in [43] shows that most compact algorithms can benefit from this scheme by benchmarking restart variants of compact DE, PSO, and other algorithms whose compact version is introduced in the next section.
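The two ideas behind cDE-light can be illustrated with a short sketch. If a, b, and c are independent draws from N(mu, sigma^2), then a + F(b - c) is distributed as N(mu, (1 + 2F^2) sigma^2), so a single draw suffices. For the crossover, the sketch uses the standard geometric-distribution identity to draw the number of exchanged genes in one shot; whether [279] uses exactly this formula is not stated here, so it should be read as an assumption. Truncation of the Gaussian is ignored, and the function names are hypothetical.

```python
import math
import random

def light_mutant(mu, sigma, F, rng):
    """Sample the rand/1 mutant directly from its own Gaussian model.

    For independent a, b, c ~ N(mu, sigma^2), the combination a + F*(b - c)
    has mean mu and variance (1 + 2*F^2) * sigma^2 (truncation ignored).
    """
    scale = math.sqrt(1.0 + 2.0 * F * F)
    return [rng.gauss(m, scale * s) for m, s in zip(mu, sigma)]

def light_crossover_length(n, CR, rng):
    """Number of genes copied by exponential crossover, drawn without a loop.

    The loop-based operator copies one gene, then keeps copying while
    rand() < CR, up to n genes; that count is a capped geometric variable.
    """
    if CR <= 0.0:
        return 1
    if CR >= 1.0:
        return n
    u = 1.0 - rng.random()                  # uniform in (0, 1]
    return min(n, 1 + int(math.log(u) / math.log(CR)))
```

Only one temporary vector is needed for the mutant, compared with the three sampled points of the plain rand/1 scheme.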

In the literature, there are examples of compact Evolution Strategy (cES) algorithms, such as c(1 + 1)-ES and c(μ, λ)-ES [280]. From the experimental analysis in [280], the c(1 + 1)-ES algorithm appeared to be as effective as the original (1 + 1)-ES. Conversely, c(μ, λ)-ES appeared to perform worse than its population-based (μ, λ)-ES counterpart, especially for high values of the population-size parameter.

5.6.2. Compact Swarm Intelligence Algorithms

Compact Particle Swarm Optimisation (cPSO) [42] is the compact counterpart of the PSO algorithm for the continuous domain. This is simply obtained by using the Gaussian model to generate a new particle, which is perturbed by the velocity vector as in the original PSO. However, to avoid sampling the swarm, some adjustments are needed; the concept of the local best particle cannot be replicated if a single solution is employed at a time to maintain a memory-saving framework. For this reason, the PSO velocity update formula only takes into account the actual global best solution, while the local best solution is drawn from the Gaussian model with the current PV values. Note that there are variants of this algorithm using different distributions for the model, as in the real-parameter compact supervision for PSO (rcSPSO) [281], where a combination of Cauchy and Gaussian distributions is used. Self-adaptive variants, like the one in [282], have also been proposed.
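A single cPSO-style move can be sketched as follows, to show where the sampled local best enters the velocity update. This is an assumption-laden illustration rather than the exact update of [42]: the coefficient values in `phi`, the plain position update x + v, and the function name `cpso_move` are all hypothetical.

```python
import random

def cpso_move(x, v, x_gb, mu, sigma, rng, phi=(0.7, 1.5, 1.5)):
    """One cPSO-style move: the local best is not stored but sampled from PV."""
    # local best drawn from the Gaussian model (truncation omitted for brevity)
    x_lb = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    new_v = [phi[0] * vi
             + phi[1] * rng.random() * (lb - xi)     # pull towards the sampled local best
             + phi[2] * rng.random() * (gb - xi)     # pull towards the stored global best
             for vi, xi, lb, gb in zip(v, x, x_lb, x_gb)]
    new_x = [xi + nv for xi, nv in zip(x, new_v)]
    return new_x, new_v, x_lb
```

After the move, the new position and the sampled local best would be compared, and the winner/loser pair would update PV; the global best is replaced whenever the new position improves on it.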

The compact Bacterial Foraging Optimisation (cBFO) algorithm [283] employs the same chemotaxis scheme as population-based BFO, but it models the population with the Gaussian model of Section 5.4.2. Similarly to BFO, a new solution is generated from the model at each chemotactic step, and a mix of tumble/swim moves is attempted. When a new offspring is generated (either using the sampling mechanism or via a tumble/swim), its fitness value is compared to that of the current best solution. The compact implementation of the reproduction and elimination/dispersal steps is a bit different. Instead of preserving and replicating the best bacteria as BFO does, cBFO moves the PDF in favour of the elite and shrinks it around the elite. As a result, forcing a PDF update is an approximation of the sexual reproduction step. Finally, the injection of new randomly produced bacteria into the swarm is modelled using a perturbation of PV in the elimination/dispersal step.

The list of compact algorithms is large. As many population-based algorithms can be made compact, the literature keeps offering examples of new compact versions to be used mainly in applied contexts. A compact Artificial Bee Colony (cABC) is proposed in [206], and an Enhanced cABC (EcABC) variant is proposed in [207]. A parallel structure, abbreviated as pcABC, is presented in [284]. However, the latter is meant for hardware systems with multiple cores/processors which do not suffer from memory or computational limitations. Some compact Firefly Algorithms (cFAs) [38] are also widely used. Note that these are obtained with some simplifications of the original strategy, which makes the compact version (in particular, the persistent variant using the Gaussian distribution) follow the same steps as rcGA except for an extra step before updating PV. This is required to direct the loser toward the winner to adhere to the original framework. Based on the method for updating the elite solution, which can require using Lévy flight movement, Opposition-Based Learning, and the use of Gaussian or uniform distribution, 12 cFA variants can be obtained. More are presented in [285, 286]. In this light, one can see that making an algorithm compact can be simple. However, the current trend of simplifying existing algorithms just to present a new optimisation framework does not necessarily help progress in understanding what good practices are in the algorithmic design phase. This is particularly true when the design is driven only by inspiring metaphors, which often results in new algorithms whose working mechanism is either unclear or similar to other existing heuristics. These metaphor-led compact algorithms are now abundant in the literature for solving real-world applications. 
We do report some of these applied scenarios solved with, e.g., compact Cat Swarm Optimisation (cCSO) [287], compact Teaching-Learning-Based Optimisation (cTLBO) [288], compact Harris Hawks Optimisation (cHHO) algorithm [289], compact Bat Algorithms [208], compact Flower Pollination Algorithms [209, 290], compact Pigeon-Inspired Optimisation (cPIO) [291], the compact Sine Cosine Algorithm (cSCA/pcSCA) [292, 293] and McSCA with Multi-group and Multi-strategy (based on different DE mutations) [294], the compact Equilibrium Optimiser algorithm (cEO/pcEO) [295], the compact Cuckoo Search (cCS) [296], compact Harmony Search Algorithms (cHSA) [297], and many others. We direct the reader to these studies and argue that while the application domain is interesting, it is difficult to understand what the contribution of proposing such algorithms to solve such problems is. In most cases, these are similar to existing methods or just present insufficiently motivated combinations of operators. Although we believe that progress in the algorithmic design must be kept alive within the community, by surveying the recent literature, we call for more emphasis on analysing the algorithms to have a more informed design phase in the future.

6. Lightweight Metaheuristics Applications

The main scenarios and motivations for using lightweight algorithms are summarised as follows.

6.1. Embedded Systems

Lightweight algorithms can be used in the low-cost, resource-limited computing devices that are found nowadays in a wide range of miniaturised, commercially available products, such as those used in wet laboratories [32, 33], humanoid robots [38], flying robots (microaerial vehicles) [298], and mobile robots [22].

6.2. Real-Time Optimisation

When applications require a real-time optimisation problem to be solved, sMeta can be a logical choice (as well as most compact algorithms as long as the complex model is not used). In engineering, such situations are abundant and in most cases do not require an optimal solution but rather a solution of satisfactory quality within some precision thresholds. Examples of these scenarios are nonlinear optimal control, receding horizon control, and moving horizon estimation [299]. Other applications might involve solving large-scale optimisation problems, such as optimising the parameters of black-box models (e.g., a deep neural network or a hidden Markov model), or solving inverse problems [186] on board an embedded system.

6.3. Hybrid Optimisation Algorithms

Lightweight algorithms, and in particular single-solution metaheuristics, are useful “building blocks” for hybrid algorithms [211, 276]. Even in nonmemory-saving contexts, this is evident when dealing with hyper-heuristics and memetic computing approaches.

6.4. Other Situations

A universal metaheuristic does not exist, and in many real-world scenarios, simple algorithms perform better than more complex ones. Based on examples in optimal control in industrial plants [42], neural network training [288], and ontology mapping [300], we recommend taking simple algorithms into account, as they might be able to provide satisfactory results while keeping implementation difficulties relatively low.

Relevant application contexts where lightweight algorithms are used are listed as follows:
(1) Wireless Sensor Networks (WSNs), e.g., optimised deployment [301], minimised energy depletion [302], base station locations [303], topology control scheme [304], clustering formation [208, 301, 302], and node localisation [282, 293];
(2) Embedded control systems [39, 185, 198, 199, 274, 305];
(3) Robotics, e.g., industrial robots [39, 274, 276, 279, 305, 306], mobile robots [22], humanoid robots [38, 285, 286, 297], and unmanned aerial vehicles [185, 296];
(4) Electronic design, e.g., of magnetic field sensors [186], printed circuit boards [307], digital signal processing elements [180, 308–311], and antennas [312, 313];
(5) Computer vision, e.g., image segmentation [287], clustering [179], and face recognition [173, 174];
(6) Power/energy systems [42, 314, 315] and renewable energy systems [291, 295, 316];
(7) Transportation [292, 294] and civil engineering [317, 318];
(8) Design of engineering/mechanical structures [289, 319];
(9) Natural language processing [320];
(10) Machine learning [276, 288, 321–323];
(11) Ontology engineering [211, 273];
(12) Cloud computing security [324];
(13) Bioinformatics [175, 176].

Table 1 contains the aforementioned applications, related articles, and specific algorithms that are being used.

7. Discussion and Open Issues

Based on the survey of the literature on lightweight metaheuristics reported above, we can now draw the following conclusions:
(i) Compact and single-solution algorithms are commonly expected to be outperformed by population-based algorithms in terms of solution quality. However, this is not always the case. Furthermore, there might be applied contexts where these are the only choices because of memory constraints, or where they are simply preferred for their speed and ease of implementation.
(ii) Two well-known drawbacks of population-based optimisation are premature convergence and stagnation. When the first occurs, the population loses diversity, and the algorithm is stuck in a local optimum. In the second case, even though the population is still diverse, the search may stagnate, meaning that the operators cannot create offspring that outperform their parent solutions. Premature convergence also plagues lightweight algorithms. Interestingly, the latter can be used to help population-based algorithms overcome stagnation, by providing additional movement in the search space [78], and premature convergence, by adding superfit individuals to the population.
(iii) In relation to the previous point, most compact optimisation methods have an intrinsic limitation when dealing with multimodal functions [21, 277]. Indeed, since they lack an actual population of candidate solutions, they cannot provide a sufficient degree of diversity, particularly in the long term after the Gaussian model has converged. As a result, unless certain restart methods are introduced, these types of algorithms excel at exploitation but fall short of exploration, which is required to handle multimodal functions effectively. In fact, after the Gaussian model converges, new solutions are sampled from a relatively tiny subset of the search space, resulting in a local search.
(iv) Because compact optimisation algorithms by nature handle each variable independently, they perform exceptionally well on separable problems. This is especially true for some of the most recent algorithms, e.g., CScDE, SNUM, and cSNUM [21, 277]. However, it is possible to endow both cMeta and sMeta with algorithmic moves that handle multiple variables at the same time. A concrete example is the one recently proposed in [44], where the cSM approach was shown to be successful, particularly in multimodal problems.
(v) Among compact algorithms, some (e.g., rcGA) work well in lower dimensionalities, while others (e.g., cPSO) perform especially well in larger dimensionalities (a similar observation can be made for single-solution algorithms). This does not seem to be only a consequence of the inherent search logic underlying each algorithm. Another possible explanation has been provided in a recent study [326], which showed that the correlation between pairs of variables appears to continuously decrease as the size of the problem increases, from the point of view of a stochastic search algorithm. As a consequence, large-scale nonseparable high-dimensional problems can be approached as if they were separable. According to the same authors, this effect arises from the fact that in high dimensionalities only a very limited portion of the decision space can be explored with a reasonable computational budget. Therefore, exploration should be performed with any improvement along each variable, which is consistent with simple methods that use a limited computational budget to focus mainly on exploitation. This is also compatible with most multistart lightweight algorithms.

8. Future Research Directions

Promising research lines to improve upon the current state of the art in lightweight optimisation algorithms are summarised as follows:
(i) Probabilistic models: Nearly all compact algorithms employ the Gaussian distribution, which is simple to work with, but alternative models might be more suitable and should be investigated.
(ii) Multimodality: Poor performance in this class of problems suggests the need to focus more on exploratory operators or mechanisms in the algorithmic design phase.
(iii) Scalability: The problem of dealing with increasing dimensionality values without deterioration of performance needs further attention, as truly large-scale problems are becoming increasingly frequent, even in low-resource devices.
(iv) Hybridisation: It will be interesting to explore different combinations of multiple compact logics.
(v) Adaptation and learning: Lightweight algorithms with increased intelligence will be needed, that is, algorithms that can self-adapt their parameters to the problem at hand. Another possibility would be to put multiple lightweight algorithms together, using, for instance, a scheme similar to that introduced in [327]: in this scheme, six algorithms are arranged into three bags, each with two algorithms, depending on the number of solutions used in the optimisers. Once a bag is selected, its optimisers are run according to a reinforcement learning process based on their performance. As a result, if the considered optimiser performs well (based on the fitness improvement), it will be rewarded, while the others will be penalised. The best-performing optimiser will earn a higher probability of running in the next iterations, while the other optimisers will be activated less frequently, eventually until their corresponding probability is null.
(vi) Noisy/dynamic optimisation: It will be useful to investigate new compact optimisation schemes (for instance, with new distributions or new sampling mechanisms) to deal with noisy functions or Dynamic Optimisation Problems (DOPs) [328, 329] (i.e., problems where the search space changes over time). Concerning noisy optimisation, apart from some works on cGA [240, 330–332], the few works dealing with noise are based on rcGA [333], cDE [334], and a compact EDA framework [335]. In terms of dynamic optimisation, a Hooke-Jeeves-based Memetic Algorithm (HJMA) was presented in [336], where experiments were conducted on the Moving Peaks (MP) problem. This benchmark is defined in [337] as an artificial multidimensional landscape comprising multiple peaks, each of which has its height, width, and position slightly altered whenever a change occurs in the environment. Its complexity can be raised by increasing the number of dimensions/peaks and by adding noise over the whole landscape. A review of the approaches that have been tested on the dynamic MP problem can be found in [338]. More recently, some authors proposed the Deterministic Distortion and Rotation Benchmark (DDRB) [339], a method to generate Deterministic Dynamic Multimodal Optimisation Problems (DMMOP) considering both dynamic and multimodal characteristics, which can simulate more diverse sets of challenges. In the same context, another set of benchmark problems, as well as an optimisation framework called PopDMMO, containing several population-based algorithms, was designed in [340]. To address the need to continuously adapt to landscape changes, some improvements in cGA have been proposed in [341], based on techniques of hypermutation and random immigrants. The results on a modified version of the Moving Peaks Benchmark indicate that both strategies improve the algorithm performance for dynamic environments. Instead, the compact adaptive mutation genetic algorithm (amcGA) presented in [342] is based on an adaptive mechanism where the mutation scheme is directly linked to a change detection scheme, so that the change detection scheme regulates the mutation rate (i.e., the degree of change determines the probability of mutation). This method was tested in [343] using a real-world dynamic optimisation problem that includes designing and optimising a PID controller for a torsional mass-spring-damper system in a dynamic environment. Other variants of amcGA can also be found in [344]. However, more research in this direction is needed.
(vii) Constrained and multi-objective problems: Bound-constrained single-objective optimisation has been the focus of most research on lightweight metaheuristics. In many application scenarios, however, one needs to handle multiple objectives at the same time and/or handle a set of equality/inequality constraints. Therefore, future research on lightweight algorithms for multi-objective and/or constrained optimisation applications may be interesting. Regarding sMeta, some multi-objective variants of SEO have been adapted, for example, to solve the problem of home healthcare routing and scheduling [345], to optimise an integrated system of water supply and waste collection [346], and to optimise a municipal solid waste problem [347]. Instead, in [348], a multi-objective variant of the VS algorithm is introduced. Regarding cMeta, some studies already go in this direction; see, e.g., [349, 350]. These research papers show that Multi-Objective compact Differential Evolution (MOcDE) and Multi-Objective compact Particle Swarm Optimiser (MOcPSO), respectively, can be used successfully for solving unconstrained, continuous multi-objective optimisation problems. In [257, 351], the authors solve the ontology alignment problem using the compact multi-objective Co-Firefly algorithm and cPSO, respectively. One main drawback of cMeta and sMeta is that, without an actual population or an archive, they cannot keep a Pareto front in memory. Lack of diversity is also an issue for multi-objective optimisation. Therefore, more research is needed in this direction. Regarding constrained optimisation, an Improved SA (ISA) is introduced in [352], which is capable of dealing with linear constraints only. ISA is characterised by changing only one component of the current solution at each iteration, without a penalty function. Another similar work [353] proposed two variants based on a hybrid Simulated Annealing-Hill Climbing algorithm to solve constrained optimisation problems. The first version incorporates penalty methods for constraint handling, whereas the second one eliminates the need for imposing penalties in the objective function by tracing feasible and infeasible solution sequences independently. More research seems to be needed in this area as well.
(viii) The trend of designing novel algorithms solely by following an inspiring metaphor and making their compact versions available does not help in understanding how these simple methods work and how they can be improved in the future. Even if good results can be obtained with such algorithms over some application domains, this does not seem to lead to any progress in explaining the algorithmic behaviour and the well-known drawbacks of metaheuristics. Also, from the novelty point of view, this approach is questionable [65, 66, 354]. Therefore, we call for more fundamental research in the direction of overcoming drawbacks to obtain an improved self-adaptive algorithmic structure.

Data Availability

Original data were not collected or generated to support this study. Instead, we have included comments on the references surveyed in our article that contain links to data.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Souheila Khalfi conducted conceptualisation, investigation, writing of the original draft, writing of the review, editing, and visualisation; Fabio Caraffini conducted conceptualisation, writing of the original draft, writing of the review, editing, and visualisation; Giovanni Iacca carried out conceptualisation, investigation, and writing of the original draft.

Acknowledgments

Open access funding was enabled by Swansea University and organised through JISC.