Computational Intelligence and Neuroscience

Volume 2016, Article ID 8326760, 12 pages

http://dx.doi.org/10.1155/2016/8326760

## Controlling Individuals Growth in Semantic Genetic Programming through Elitist Replacement

^{1}NOVA IMS, Universidade Nova de Lisboa, 1070-312 Lisboa, Portugal

^{2}Faculty of Economics, University of Ljubljana, Kardeljeva Ploščad 17, 1000 Ljubljana, Slovenia

Received 6 June 2015; Revised 29 September 2015; Accepted 1 October 2015

Academic Editor: Ricardo Aler

Copyright © 2016 Mauro Castelli et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In 2012, Moraglio and coauthors introduced new genetic operators for Genetic Programming, called geometric semantic genetic operators. They have the very interesting advantage of inducing a unimodal error surface for any supervised learning problem. At the same time, they have the important drawback of generating very large data models that are usually very hard to understand and interpret. The objective of this work is to alleviate this drawback while maintaining the advantage. In particular, we propose an elitist version of geometric semantic operators, in which offspring are accepted in the new population only if they have better fitness than their parents. We present experimental evidence, on five complex real-life test problems, that this simple idea allows us to obtain results of comparable quality (in terms of fitness), but with much smaller data models, compared to the standard geometric semantic operators. In the final part of the paper, we also explain why we consider this a significant improvement, showing that the proposed elitist operators generate manageable models, while the models generated by the standard operators are so large that they can be considered unmanageable.

#### 1. Introduction

In the original definition of Genetic Programming (GP) [1, 2], the operators used to explore the search space, crossover and mutation, produce offspring by manipulating the syntax of the parents. In the last few years, researchers have dedicated several efforts to the definition of new GP systems based on the semantics of the solutions [3–5]. Differently from other domains [6–8], in the field of GP the term semantics refers to the behavior of a program once it is executed, or more particularly the set of its output values on the input training data [9–11]. In particular, new genetic operators, called geometric semantic operators, have been proposed by Moraglio and coauthors [9]. While these operators have interesting properties, which make them a very active GP research topic [9, 12], they present an important limitation: at each application, the newly created individuals are bigger than their parents. Even using an implementation that allows executing the system very efficiently (like the one presented in [12]), the problem persists: it is in general practically impossible to fully reconstruct the final model generated by GP, and whenever it is possible, the resulting expression is so big that it cannot be understood by a human being. For this reason, in this study we define a very simple but effective method that allows GP to produce more compact solutions, without affecting their quality. More in detail, we propose to keep the offspring in the new population only if they have a better fitness than their parents, keeping the parents otherwise. This type of “elitist” strategy, which has already been applied to standard “syntax-based” GP operators, has, to the best of our knowledge, never been applied to geometric semantic operators before.

The paper is organized as follows: Section 2 defines the geometric semantic operators first introduced in [9]; Section 3 first experimentally analyzes some characteristics of these operators and then presents the proposed elitist method. Section 4 presents the experimental settings and the obtained results, showing the appropriateness of the proposed technique in reducing individuals growth without penalizing their fitness. Finally, Section 5 concludes the paper and provides hints for possible future work.

#### 2. Geometric Semantic Operators

Even though the term semantics can have several different interpretations, it is a common trend in the GP community (and this is what we do also here) to identify the semantics of a solution with the vector of its output values on the training data [3, 11, 13]. Under this perspective, a GP individual can be identified with a point (its semantics) in a multidimensional space that we call semantic space. The term Geometric Semantic Genetic Programming (GSGP) indicates a recently introduced variant of GP in which traditional crossover and mutation operators are replaced by so-called geometric semantic operators, which exploit semantic awareness and induce precise geometric properties on the semantic space. Geometric semantic operators, introduced by Moraglio et al. [9], are becoming more and more popular in the GP community [3] because of their property of inducing a unimodal fitness landscape on any problem consisting in matching sets of input data into known targets (like supervised learning problems such as regression and classification). Geometric semantic operators define transformations on the syntax of the individuals that correspond to geometric crossover and ball mutation [11] in the semantic space. Geometric crossover is an operator of Genetic Algorithms (GAs) that generates an offspring that has, as coordinates, weighted averages of the corresponding coordinates of the parents with weights smaller than one, whose sum is equal to one. Ball mutation is a variation operator that slightly perturbs some of the coordinates of a solution. Geometric crossover generates offspring that stand on the segment joining the parents. It is possible to prove that, in all cases where fitness is a monotonic function of a distance to a given target, the offspring of geometric crossover cannot be worse than the worst of its parents, while ball mutation induces a unimodal fitness landscape.
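This “no worse than the worst parent” property can be checked numerically on semantic vectors. The sketch below is our own toy illustration (the vectors, the Euclidean distance, and the single-weight convex combination are illustrative assumptions, not taken from the paper):

```python
import random

def geometric_crossover(p1, p2, w):
    """Convex combination of two semantic vectors with a single weight w in [0, 1]."""
    return [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]

def dist(u, v):
    """Euclidean distance between two semantic vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

random.seed(42)
target = [0.5, 1.0, -2.0, 3.0]   # desired outputs on four training cases
p1 = [0.2, 1.5, -1.0, 2.0]       # semantics of parent 1
p2 = [0.9, 0.4, -2.5, 4.0]       # semantics of parent 2

for _ in range(100):
    child = geometric_crossover(p1, p2, random.random())
    # The offspring lies on the segment joining the parents in semantic space,
    # so its distance to the target never exceeds that of the worse parent.
    assert dist(child, target) <= max(dist(p1, target), dist(p2, target)) + 1e-12
```

Since fitness here is simply the distance between an individual's semantics and the target semantics, the assertion holds for every random weight, which is why geometric crossover cannot produce an offspring worse than its worst parent.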

*Geometric semantic crossover* (here we report the definition of the geometric semantic operators as given by Moraglio et al. for real functions domains, since these are the operators we will use in the experimental phase. For applications that consider other types of data, the reader is referred to [9]) generates the expression $T_{XO} = (T_1 \cdot T_R) + ((1 - T_R) \cdot T_2)$ as the unique offspring of parents $T_1, T_2$, where $T_R$ is a random real function whose output values range in the interval $[0, 1]$. Analogously, *geometric semantic mutation* returns the expression $T_M = T + ms \cdot (T_{R1} - T_{R2})$ as the result of the mutation of an individual $T$, where $T_{R1}$ and $T_{R2}$ are random real functions with codomain in $[0, 1]$ and $ms$ is a parameter called mutation step.

As Moraglio et al. point out, these operators create much larger offspring than their parents, and the fast growth of the individuals in the population rapidly makes fitness evaluation unbearably slow, making the system unusable. In [12], a possible workaround to this problem was proposed, consisting in an implementation of Moraglio’s operators that makes them not only usable in practice, but also very efficient. Basically, this implementation is based on the idea that, besides storing the initial trees, at every generation it is enough to maintain in memory, for each individual, its semantics and a reference to its parents. As shown in [12], the computational cost of evolving a population of $n$ individuals for $g$ generations is $O(ng)$, while the cost of evaluating a new, unseen instance is $O(g)$. This allows GP practitioners to use geometric semantic operators to address complex real-life problems [14].
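The core of this implementation can be sketched in a few lines; the class and function names below are ours and purely illustrative:

```python
class Individual:
    """A GSGP individual: stored semantics plus references to its ancestry.

    Only initial-population individuals need an explicit tree; offspring
    record the operator and references to their parents, so building one
    costs time proportional to the number of training cases, no matter how
    large the underlying expression would be if written out.
    """
    def __init__(self, semantics, op=None, refs=None):
        self.semantics = semantics   # output vector on the training cases
        self.op = op                 # None (initial tree), "xo", or "mut"
        self.refs = refs             # parents and random trees used

def xo(p1, p2, tr):
    """Geometric semantic crossover, (T1 * TR) + ((1 - TR) * T2),
    applied elementwise to the stored semantics."""
    sem = [a * r + (1.0 - r) * b
           for a, b, r in zip(p1.semantics, p2.semantics, tr.semantics)]
    return Individual(sem, op="xo", refs=(p1, p2, tr))

def mut(p, tr1, tr2, ms):
    """Geometric semantic mutation, T + ms * (TR1 - TR2), elementwise."""
    sem = [a + ms * (r1 - r2)
           for a, r1, r2 in zip(p.semantics, tr1.semantics, tr2.semantics)]
    return Individual(sem, op="mut", refs=(p, tr1, tr2))
```

For instance, crossing parents with semantics [1.0, 2.0] and [3.0, 0.0] under a random tree with semantics [0.5, 0.25] yields an offspring with semantics [2.0, 0.5], computed without ever materializing the offspring's (much larger) syntax tree.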

Geometric semantic operators have a known limitation [3]: the reconstruction of the best individual at the end of a run can be a hard (and sometimes even impossible) task, due to its large size.

#### 3. Elitist Geometric Semantic Operators

##### 3.1. Issues with Geometric Semantic Operators and Motivations

Even though in [9] Moraglio and coworkers presented interesting results on a set of benchmarks, clearly showing that geometric semantic operators are very promising, they also pointed out an important drawback of these operators: when these operators are used, the size of the individuals in the population grows very rapidly. In order to give an intuitive idea of the importance of the phenomenon, let us consider two individuals $T_1$ and $T_2$ and let us assume that these individuals belong to the initial population of a GP run. By the definition of geometric semantic crossover, the offspring of the crossover between $T_1$ and $T_2$ is

$$T_3 = (T_1 \cdot T_R) + ((1 - T_R) \cdot T_2), \quad (1)$$

where $T_R$ is a random tree. Assuming for simplicity that GP uses only crossover (the reasoning can be easily extended to the case of mutation), at generation 2 all the individuals in the population will have a shape like the one of the individual in (1), with the only difference that different trees will be plugged in place of $T_1$, $T_2$, and $T_R$.

If we iterate this reasoning, performing the crossover between two individuals belonging to the population at generation 2, the offspring has the following shape:

$$\big((T_1 \cdot T_{R_1}) + ((1 - T_{R_1}) \cdot T_2)\big) \cdot T_{R_3} + (1 - T_{R_3}) \cdot \big((T_3 \cdot T_{R_2}) + ((1 - T_{R_2}) \cdot T_4)\big), \quad (2)$$

and all the individuals at generation 3 will share this structure, although using different trees instead of $T_1$, $T_2$, $T_3$, $T_4$, and $T_{R_i}$, $i = 1, 2, 3$. While individuals of the same shape as the ones in (2) may already seem complex and hard to read, let us now assume that we iterate the GP run for hundreds of generations. It is not difficult to understand that the individuals in the population rapidly become so large that they are completely unreadable. Furthermore, evaluating those individuals at each generation would make the GP run extremely slow.

Moraglio and coauthors present an interesting discussion of code growth in [9], where they show that, in the case of geometric semantic crossover, this growth is even exponential. This clearly represents a problem for the usability of GP and for the readability of the generated individuals. Solving, or at least alleviating, this problem is the motivation of the present work.
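The exponential speed is easy to see from a size recurrence: each crossover combines two parent trees with two copies of a random tree, so the offspring size roughly doubles at every generation. A quick check (the node counts below are illustrative assumptions):

```python
def offspring_size(parent_size, random_tree_size):
    """Node count of (T1 * TR) + ((1 - TR) * T2): two parent trees, two
    copies of the random tree TR, plus five operator/constant nodes."""
    return 2 * parent_size + 2 * random_tree_size + 5

size = 10   # node count of an initial-population individual
for generation in range(10):
    size = offspring_size(size, 10)
print(size)  # → 35815: over 35000 nodes after only 10 generations
```

After a few hundred generations, a typical run length, the expression would have far more nodes than could ever be read, which is exactly the unmanageability the elitist operators aim to mitigate.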

##### 3.2. Principle of Elitist Geometric Semantic Operators

To partially counteract this problem, we propose the following replacement method:

(i) Considering two parents $T_1$ and $T_2$, the offspring $T_3$ obtained from the semantic crossover between $T_1$ and $T_2$ is accepted in the new population if and only if its fitness is better than the fitness of both $T_1$ and $T_2$. Otherwise, one of the two parents is copied into the new population. As we will make clear in the continuation, the parent that survives can be a random one or the best among $T_1$ and $T_2$, and the choice between these two options has a very weak effect on the overall performance of the system.

(ii) Considering an individual $T$, the offspring obtained by applying semantic mutation to $T$ is accepted in the new population if and only if its fitness is better than the fitness of $T$. In the opposite case, $T$ itself is copied into the new population.

While the idea is quite simple, it is interesting to point out that the proposed method can be useful only if the geometric semantic genetic operators tend to produce a high number of individuals whose fitness is not better than the fitness of their parents. Hence, before testing the proposed method, it makes sense to perform an experimental analysis aimed at understanding how many crossover and mutation events produce an offspring with a better fitness than the parents. To do so, we considered five real-life applications: three applications in the field of drug discovery that are becoming widely used benchmarks for GP [15], an application related to the prediction of high performance concrete strength [16], and a medical application whose objective is predicting the seriousness of the symptoms of a set of Parkinson disease patients, based on an analysis of their voice [17]. Regarding the three drug discovery applications, the objective is to predict three important pharmacokinetic parameters of molecular compounds that are candidates to become new drugs: human oral bioavailability (%F), median lethal dose (LD50), and protein plasma binding level (PPB).
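The replacement rule above can be sketched as follows, assuming fitness is an error to be minimized (the function names and the toy individuals are illustrative, not the paper's implementation):

```python
def elitist_crossover(p1, p2, fitness, xo):
    """Accept the crossover offspring only if it beats BOTH parents;
    otherwise a parent survives into the new population."""
    child = xo(p1, p2)
    if fitness(child) < fitness(p1) and fitness(child) < fitness(p2):
        return child
    # The surviving parent could also be picked at random; the paper reports
    # that this choice has a very weak effect on overall performance.
    return min(p1, p2, key=fitness)

def elitist_mutation(p, fitness, mut):
    """Accept the mutant only if it improves on its parent."""
    child = mut(p)
    return child if fitness(child) < fitness(p) else p

# Toy usage: individuals are numbers, fitness is the distance to zero.
assert elitist_mutation(3, abs, lambda x: x - 1) == 2   # improvement kept
assert elitist_mutation(3, abs, lambda x: x + 1) == 3   # regression rejected
assert elitist_crossover(4, -2, abs, lambda a, b: (a + b) // 2) == 1
```

Because a rejected offspring is replaced by its already stored parent, no new expression is added to the model in that event, which is what keeps the final reconstructed solution small.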
These problems have already been tackled by GP in published literature and for a discussion of them the reader is referred to [10]. Regarding the size of the dataset, the %F dataset consists of a matrix of rows (instances) and columns (features). The LD50 consists of instances and features, while the PPB dataset consists of instances and features. Each row is a vector of molecular descriptor values identifying a drug; each column represents a molecular descriptor, except the last one, which contains the known target values of the considered pharmacokinetic parameter. Regarding the concrete dataset, it consists of features and instances. Each row is a vector of concrete-related characteristics. The Parkinson dataset consists of features and ≈6000 instances.

For this experimental study, we use the same experimental settings considered in [12]. The only difference (besides the number of generations) is that, in this study, we considered several mutation step values. This is fundamental considering that we want to analyze the behavior of the semantic mutation operator whose performance is influenced by the mutation step value.

Results of this analysis are reported in Figure 1. In particular, we reported, for all the considered problems, the median (calculated over independent runs) of the percentage of crossover and mutation events that have produced an offspring with a fitness better than the respective parents. Let us discuss these results problem by problem, and, for each problem, considering the different mutation steps. For the %F problem with (Figure 1(a)) and (Figure 1(b)), we can draw similar observations: the mutation operator produces an offspring that has a better fitness with respect to the original individual in a percentage of the mutation events that stands between and . Regarding the crossover operator, the percentage of successful crossover events stands between and , according to the particular generation that is considered. A slightly different behavior can be observed in Figure 1(c), where a mutation step equal to has been considered. In this case, the crossover operator succeeds in producing offspring with a better fitness than both parents, on average of the times, and it is possible to see an increase of this percentage during the evolution. On the other hand, the percentage of successful mutations decreases, but still mutation succeeds in producing better offspring, on average of the times. A similar analysis can be performed for the second studied problem: PPB. With (Figure 1(d)) and (Figure 1(e)), crossover is successful on of the events, while mutation produces better offspring (with respect to their parents) in of the cases. When a mutation step equal to is considered (Figure 1(f)), the mutation success rate decreases during the evolution, passing from effectiveness of the in the initial generations to a final . Also, the success rate of crossover changes along the evolutionary process, starting with and then passing to and with a final rate of . 
The LD50 problem presents a similar behavior for all the considered mutation step values (Figures 1(g)–1(i)): mutation produces fitter individuals in of the applications and crossover in approximately . Finally, the concrete problem and the Parkinson problem present a similar behavior, and thus they can be discussed together. For and , crossover produces better individuals than both parents in a percentage between and of the applications, while mutation produces fitter individuals in a percentage between and of the cases. Similar conclusions can be drawn when a mutation step equal to is considered. The only difference in this last case is that the percentage of mutation events that produce fitter individuals rapidly decreases, passing from the initial to the final . On the other hand, crossover produces a better individual than both parents in only of the cases.