Computational Intelligence and Neuroscience

Volume 2015, Article ID 971908, 8 pages

http://dx.doi.org/10.1155/2015/971908

## Energy Consumption Forecasting Using Semantic-Based Genetic Programming with Local Search Optimizer

^{1}NOVA IMS, Universidade Nova de Lisboa, 1070-312 Lisboa, Portugal^{2}Tree-Lab Instituto Tecnológico de Tijuana, Mesa de Otay, 22500 Tijuana, BC, Mexico

Received 22 January 2015; Revised 16 May 2015; Accepted 18 May 2015

Academic Editor: Thomas DeMarse

Copyright © 2015 Mauro Castelli et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Energy consumption forecasting (ECF) is an important policy issue in today’s economies. An accurate ECF has great benefits for electric utilities and both negative and positive errors lead to increased operating costs. The paper proposes a semantic based genetic programming framework to address the ECF problem. In particular, we propose a system that finds (quasi-)perfect solutions with high probability and that generates models able to produce near optimal predictions also on unseen data. The framework blends a recently developed version of genetic programming that integrates semantic genetic operators with a local search method. The main idea in combining semantic genetic programming and a local searcher is to couple the exploration ability of the former with the exploitation ability of the latter. Experimental results confirm the suitability of the proposed method in predicting the energy consumption. In particular, the system produces a lower error with respect to the existing state-of-the art techniques used on the same dataset. More importantly, this case study has shown that including a local searcher in the geometric semantic genetic programming system can speed up the search process and can result in fitter models that are able to produce an accurate forecasting also on unseen data.

#### 1. Introduction

As reported in [1, 2], energy consumption forecasting (ECF) is the task of predicting the electricity demand on different time scales, in minutes (very short-term), hours/days (short-term), months, and years (long-term). An accurate ECF has great benefits for electric utilities and both negative and positive errors lead to increased operating costs. Overestimating the energy demand leads to an unnecessary energy production or purchase and, on the contrary, underestimation causes unmet demand with a higher probability of failures and costly operations. With the recent trend of deregulation of electricity markets, ECF has gained even more importance. In particular, in a dynamic market environment, precise forecasting is the basis of electrical energy trade and spot price establishment for the system to gain the minimum electricity purchasing cost. At the same time, with the amount of data steadily growing, the problem is getting more and more complex. All these facts show the importance of having reliable predictive models that can be used for an accurate energy consumption forecasting [2]. Numerous contributions presenting computational intelligence (CI) based approaches for ECF have appeared in the last years [3]. Surveys can be found in [2, 4]. Among the different CI methods, particular importance was given to neural networks [5, 6], particle swarm optimization [7], support vector machines [8], simulated annealing [9], and genetic algorithms [10]. One of the outcomes of the European Energy Forecast conference [11] that took place in Brussels in February 2014 was the identification the following facts and open issues. (a) ECF will have a huge impact on economy in the near future. (b) ECF is a very difficult problem, since it is influenced by asynchronous and often unpredictable facts. (c) Several different geographical and time scales can be identified for ECF, which contribute to making the task even more complex. (d) The currently existing CI technologies are inadequate to take on this new challenge, making new computational methods much in demand. In particular, the following flaws of the current CI methods were identified.(1)Given the high complexity of ECF, the iterative process of stepwise improvement of solutions that characterizes many CI methods often gets stuck in local optima, stagnating the search for better solutions.(2)Several existing forecasting methods are not able to deal with nonlinearity and other difficulties in modelling of time series. Hence, the performance of these models on test data are not as good as the one achievable on the training data.

The objective of the work presented in this paper is to fill this gap by developing a ground-breaking Genetic Programming (GP) system that (i) finds (quasi-)perfect solutions with high probability (no error on training data) and (ii) generates models able to produce near optimal predictions also on unseen data (test instances). GP is one of the most successful existing CI methods. In the last years, it has obtained excellent results on a large number of complex real-life applications [12] and it has recently made an important breakthrough: the definition of geometric semantic operators (GSOs), new genetic operators that induce a unimodal error surface on any supervised learning problem (including forecasting). Eliminating local optima, GSOs have a stronger problem solving ability. So, they are an excellent first step for overcoming issue number (1) discussed above and developing appropriate forecasting models. However, much work has still to be done in order to use GSOs in a complex application like the ECF. In particular, GSOs converge to optimal solution(s) very slowly and this behaviour is an important limitation in all the applications characterized by the presence of a large amount of data. Hence, in this paper we propose the definition of a CI system that combines GSOs with a local search algorithm. The main idea in combining GSOs and a local searcher is to integrate the exploration ability of GSOs with the exploitation ability of the local searcher. In this way, we expect to achieve optimal solutions faster and to obtain a final model that does not overfit the training data. To analyze the appropriateness of the proposed computational method for ECF, the energy consumption in Italy has been used as a test case.

The paper is organized as follows. Section 2 presents the variant of GP proposed in this study for addressing the ECF problem. Section 3 describes the data that have been considered and reports experimental results, comparing the proposed approach to the standard GP algorithm and other state-of-the-art GP variants. Section 4 concludes the paper, highlighting the main contributions of this work. In the final part of the paper, the appendix contains general introductions of basic concepts for nonexperts in the GP field.

#### 2. Methodology

This section describes the components of the proposed computational intelligence system designed for the ECF problem. In particular, Section 2.1 describes the geometric semantic operators and their properties, while Section 2.2 presents the local search strategy that we used with the GSOs.

##### 2.1. Geometric Semantic Operators

Despite the large number of human-competitive results achieved by GP [12], researchers still continue to investigate new methods for improving the power of GP as a problem solving method. In recent years, one of the emerging ideas is to include semantic awareness in the evolutionary process performed by GP. While several studies exist (a survey can be found in [13]), the definition of semantics is not unique and this concept is interpreted in different ways and under different perspectives [13]. In this work, we use the most common and widely accepted definition of semantics in the GP literature. Hence, the semantics of a program is defined as the vector of outputs obtained after executing the program (or candidate solution) on a set of the training data , such that [14].

In this section, we briefly describe the definition of the geometric semantic operators (GSOs) proposed by Moraglio and coauthors [14]. The objective of GSOs is to define modifications on the syntax of GP individuals that have a precise correspondence on their semantics. More in particular, the idea is to define transformations on the syntax of GP individuals that correspond to well known operators of genetic algorithms (GAs). In this way, GP could “inherit” the known properties of those GAs operators. Furthermore, contrarily to what typically happens in real-valued GAs or other heuristics, in the GP semantic space the target point is also known (it corresponds to the vector of expected output values in supervised learning) and the fitness of an individual is simply given by the distance between the point it represents in the semantic space and the target point (it corresponds to an error measure). The real-valued GA operators that we want to “map” into the GP semantic space are* geometric crossover* and* ball mutation*. In real-valued GAs, geometric crossover produces an offspring that stands in the segment that joins the parents. It was proven in [15] that in cases where the fitness is a direct function of the distance to the target (like the case we are interested in here) this offspring cannot have a worse fitness than the worst of its parents. Ball mutation consists in a random perturbation of the coordinates of an individual. It was shown in [14] that it induces a unimodal error surface for all the problems where fitness is a direct function of the distance to the target. The definitions of the operators that correspond to geometric crossover and ball mutation in the GP semantic space are as given in [14], respectively.

*Definition 1 (geometric semantic crossover (GSC)). *Given two parent functions , the geometric semantic crossover returns the real function , where is a random real function whose output values range in the interval .

*Definition 2 (geometric semantic mutation (GSM)). *Given a parent (as in [14], we abuse of the term “parent,” using it also to identify the solution that is transformed by a mutation operator) function , the geometric semantic mutation with mutation step ms returns the real function , where and are random real functions.

The interested reader is referred to [14] for a detailed discussion of these operators, including a justification of the fact that they correspond, respectively, to geometric crossover and ball mutation in the GP semantic space. Following [14], from now on GP that uses geometric semantic operators will be called geometric semantic GP (GSGP). As Moraglio et al. point out, geometric semantic operators create much larger offspring than their parents and the fast growth of the individuals in the population rapidly makes fitness evaluation unbearably slow, making the system unusable. Moreover, while this growth produces fitter solutions, it is responsible for creating models that are too specialized on training data, hence often generating overfitting. In [16], a possible workaround to the problem related to the slowness of the fitness evaluation process was proposed, consisting in an implementation of these operators that makes them not only usable in practice, but also very efficient. Basically, this implementation is based on the idea that, besides storing the initial trees, at every generation it is enough to maintain in memory, for each individual, its semantics and a reference to its parents. As shown in [16], the computational cost of evolving a population of individuals for generations is , while the cost of evaluating a new, unseen, instance is . Hence, the system can be efficiently used to address problems characterized by a large amount of data. This is the implementation used in this work.

##### 2.2. Local Search in Geometric Semantic Operators

In this work, we integrate a local search (LS) strategy within GSGP. In particular, we include a local searcher within the GSM mutation operator, since previous works have shown that GSGP achieves its best performance using only mutation [13]. In particular, the GSM with LS (GSM-LS) of a tree generates an individual:where . Notice that replaces the mutation step parameter used in the definition of GSM. Equation (1) defines a basic multivariate linear regression problem, which can be solved, for example, by Ordinary Least Square (OLS) regression. In this sense, after each mutation event, OLS is applied to the above expression to obtain the values of the model parameters that best fit the training fitness cases. We point out that, in some sense, this approach contrasts with previous work [17] that relied on a nonlinear local optimizer, since the linear assumption is mostly not satisfied by the expression evolved with standard GP and the corresponding parametrization. On the other hand, in this new approach it is simple to apply a linear regression optimizer, given that the GSM operator defines a linear expression in the parameter space. The idea of including a LS method is based on a very simple observation related to the properties of the geometric semantic operators: while these operators are effective in achieving good performance with respect to standard syntax-based operators, they require a lot of generations to converge to optimal solutions. Including a local search method, we expect to speed up the process and to obtain better solutions faster. Moreover, by speeding up the search process, it will be possible to limit the construction of overspecialized solutions that will eventually overfit the data.

#### 3. Experimental Study

This section describes the data, the experimental settings, and the obtained results for the ECF problem.

##### 3.1. Data Description

Historical energy consumption data and weather information in Italy in the years between and have been used to test the performance of the proposed system. TERNA S.p.A. (Rete Elettrica Nazionale) is an Italian electricity transmission system operator based in Rome, Italy. With 63,500 kilometres of power lines or around of the Italian high-voltage power transmission grid, TERNA is the first independent electricity transmission grid operator in Europe and the sixth in world based on the size of its electrical grid. TERNA is the owner of the Italian transmission grid and responsible for energy transmission and dispatching. The aim of the forecasting task studied in this paper is to predict the energy consumption at day , providing information until day (one-day ahead forecasting) using the past samples of the load and weather information. Data include temperatures, pressure values, wind speed, and other weather related information. Data from 1999 to 2006 have been used during the training phase, while the remaining available data (i.e., from 2006 to 2010) have been used to validate the model on unseen data and hence to assess the quality of the forecasting. The same dataset has been used in [1], where the standard GSGP system has been used for the same task and where GSGP was able to outperform state-of-the-art machine learning techniques in the ECF problem. Hence, in the presentation of the results, it will be interesting to assess whether or not the inclusion of a local searcher optimizer is able to produce a competitive advantage with respect to the simple use of GSGP.

##### 3.2. Experimental Settings

Four different GP systems were compared: standard GP (STGP) that uses the standard syntax-based genetic operators also considered in [1], GSGP that only uses the GSM operator; HYBRID that uses the GSM operator and the proposed GSM-LS operator, LSGP that only uses the GSM-LS operator at each generation of the evolutionary search process.

Regarding the four GP systems, all the runs used populations of individuals allowed to evolve for generations. Tree initialization was performed with the Ramped Half-and-Half method [18] with a maximum initial depth equal to . The function set contained the arithmetic operators, including protected division as in [18]. The terminal set contained variables, each one corresponding to a different feature in the dataset. Mutation has been used with probability , while in STGP we used a mutation rate of and a crossover rate of . The use of different settings for STGP is motivated by the fact that STGP with only mutation performs poorly on this problem. Hence, we decided to consider the settings that are able to produce the best performance for each of the studied systems. Survival from one generation to the other was always guaranteed to the best individual of the population (elitism). For GSM, a random mutation step has been considered in each mutation event, as suggested in [13]. Regarding the HYBRID system, GSM-LS has been used in the first generations, while in the remaining generations we considered the standard GSM operator. We decided to limit the number of generations where the local search has been used in order to assess whether the HYBRID system produces “similar” results with respect to GSGP or LSGP.

For all the considered techniques we studied the obtained performance over two different measures of error. In particular, these two measures are the mean absolute error (MAE) and the mean square error (MSE). The definition of these error measures is as follows:where is the output of the GP individual on the data sample and is the target value corresponding to . denotes the number of samples in the training or testing subset, and contains the indices of that set.

In the next section, the obtained experimental results are reported using curves of the median error on the training and test set. In particular, at each generation the best individual in the population (i.e., the one that has the smaller training error) has been chosen and the value of its error on the training and test sets has been stored. The reported curves finally contain the median of all these values collected at each generation. The median was preferred over the mean in the reported plots because of its higher robustness to outliers. The results discussed in the next section have been obtained using the GSGP implementation freely available at http://gsgp.sourceforge.net and documented in [16].

##### 3.3. Experimental Results

Figure 1 reports training and test error (MAE and MSE) for the considered GP systems against generations. For all the considered GP systems runs have been performed. These figures clearly show that LSGP outperforms GSGP and STGP on both training and test sets, for both the considered error measures. In particular, it is possible to note the fast convergence of the proposed system as well as the fact that the final model does not overfit the training data. A further corroboration about the suitability of combining GSGP with a local search optimizer is given by the performance of the HYBRID system. As it is possible to see from the figures, its performance is similar to the one achieved with LSGP, on both training and test data.