Applied Computational Intelligence and Soft Computing

Volume 2016 (2016), Article ID 3403150, 11 pages

http://dx.doi.org/10.1155/2016/3403150

## Data-Driven Machine-Learning Model in District Heating System for Heat Load Prediction: A Comparison Study

^{1}Faculty of Computer Science and Media Technology, Norwegian University of Science and Technology, 2815 Gjøvik, Norway

^{2}Faculty of Technology and Management, Norwegian University of Science and Technology, 2815 Gjøvik, Norway

Received 21 February 2016; Revised 11 May 2016; Accepted 16 May 2016

Academic Editor: Shyi-Ming Chen

Copyright © 2016 Fisnik Dalipi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We present our data-driven supervised machine-learning (ML) model to predict heat load for buildings in a district heating system (DHS). Even though ML has been used as an approach to heat load prediction in the literature, it is hard to select an approach that will qualify as a solution for our case, as existing solutions are quite problem specific. For that reason, we compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR), Partial Least Squares (PLS), and random forest (RF). We use data collected from buildings at several locations over a period of 29 weeks. To assess the accuracy of heat load prediction, we evaluate the performance of the algorithms using mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient. In order to determine which algorithm had the best accuracy, we conducted a performance comparison among these ML algorithms. The comparison indicates that, for DH heat load prediction, the SVR method presented in this paper is the most efficient of the three, and also when compared to other methods found in the literature.

#### 1. Introduction

As stated in the European Commission's energy strategy report, the continuous growth of energy demand worldwide has made energy security a major concern for EU citizens. This demand is expected to increase by 27% by 2030, with important changes to energy supply and trade [1]. As the largest energy consumer and CO_{2} emitter in the EU, the building sector is responsible for 40–50% of energy consumption in Europe and about 30–40% worldwide [2]. The North European countries have proved themselves to be forerunners in the development and application of clean and sustainable energy solutions. Their excellent performance in adopting such solutions enables them to achieve ambitious national climate objectives and requirements and to serve as key players in the entire European energy system [3].

A district heating (DH) system is an optimal way of supplying heat to various sectors of society, such as industrial, public, or private buildings. A DH network offers functional, economic, and ecological advantages and is also instrumental in reducing global and local CO_{2} emissions. It offers enormous adaptability for combining different types of energy sources efficiently [4]. Considering the recent technological trend towards smart energy infrastructures, the development of the fourth generation of district heating implies meeting the objective of more energy-efficient buildings. Moreover, it also envisions DH networks as an integrated part of the operation of smart energy systems, that is, integrated smart electricity, gas, and thermal grids [5]. The application of new and innovative technology in district heating is therefore considered essential to improve energy efficiency [6].

The deregulation of the electricity market and the increasing share of energy-efficient buildings have put district heating in a more vulnerable position with regard to cost effectiveness, supply security, and energy sustainability within the local heat market. Against this background, it is important for the district heating sector to maintain an efficient and competitive district heating system that is able to meet the various requirements which characterize the heat market. In a flexible district heating system with multiple energy sources and production technologies, the need for accurate load forecasting has become more and more important. This is especially true in a district heating system with simultaneous production of heat, steam, and electricity.

In this paper, we apply three different ML algorithms to predict heat consumption. We investigate the performance of *Support Vector Regression* (SVR), *Partial Least Squares* (PLS), and *random forest* (RF) in developing heat load forecasting models through a comparative study. Our focus is on low error, high accuracy, and validating our approach with real data. We also compare the errors of each algorithm with those of existing techniques (models) and identify the most efficient of the three.

The rest of the paper is organized as follows: Section 2 outlines the related work, where we provide an overview of many approaches to load prediction that are found in the literature. In Section 3, we provide some background information about DH concepts. This is followed by a presentation of the system framework and related prediction models, given in Section 4. Further, in Section 5, we present and discuss the evaluation and results. Finally, Section 6 concludes the paper.

#### 2. Related Work

The state of the art in the area of energy (heating, cooling, and electric energy) demand estimation in buildings is classified as *forward (classical)* and *data-driven (inverse)* approaches [7]. While the forward modelling approach generally uses equations with physical parameters that describe the building as input, the inverse modelling approach uses machine-learning techniques. Here, the model takes the monitored building energy consumption data as inputs, which are expressed in terms of one or more driving variables and a set of empirical parameters and are widely applied for various measurements and other aspects of building performance [8]. The main advantage of data-driven models is that they can also operate online, making the process very easily updatable based on new data. Considering the fact that ML models offer powerful tools for discovery of patterns from large volumes of data and their ability to capture nonlinear behavior of the heat demand, they represent a suitable technique to predict the energy demand at the consumer side.

Numerous ML models and methods have been applied for heat load prediction during the last decade. A good overview of some recent references is given by Mestekemper [6, 9]. The former also built his own prediction models using dynamic factor models. A simple model proposed by Dotzauer [10] uses the ambient temperature and a weekly pattern for prediction of the heat demand in DH. The author makes the social component equal to a constant value for all days of the week. Another interesting model addresses the utilization of a grey box that combines physical knowledge with mathematical modelling [11]. Some approaches to predicting the heat load discussed in the literature include artificial neural networks (ANN) [12–15]. In [12], a backpropagation three-layered ANN is used for the prediction of the heat demand of different building samples. The inputs of the network for training and testing are building transparency ratio (%), orientation angles (degrees), and insulation thickness (cm), and the output is building heating energy needs (Wh). When the ANN outputs of this study are compared with numerical results, an average accuracy of 94.8–98.5% is achieved. The authors have shown that ANN is a powerful tool for prediction of building energy needs. In [13], the authors discuss the way self-organizing maps (SOMs) and multilayer perceptrons (MLP) can be used to develop a two-stage algorithm for autonomous construction of prediction models. The problem of heat demand prediction in a district heating company is used as a case study, where SOM is used as a means of grouping similar customer profiles in the first stage and MLP is used for predicting heat demand in the second stage. However, the authors do not provide any information related to the error rates obtained during the predictions.
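To make the idea of a temperature-plus-weekly-pattern predictor concrete, the following is a minimal sketch of such a baseline. It is not the formulation in [10]: the two-stage fit (a straight line in ambient temperature, followed by the mean residual per weekly slot) and the names `fit_baseline`, `predict_baseline`, and `slots` are assumptions made purely for illustration.

```python
from collections import defaultdict


def fit_baseline(temps, loads, slots):
    """Fit load ~ a + b * temp, then average the residual per weekly slot.

    temps, loads: equal-length lists of floats (assumed units: deg C, kW).
    slots: weekly slot index per sample (e.g. 0..167 for hour of week).
    This is an illustrative assumption, not the model of any cited paper.
    """
    n = len(temps)
    mean_t = sum(temps) / n
    mean_y = sum(loads) / n
    # Ordinary least squares for a single explanatory variable.
    var_t = sum((t - mean_t) ** 2 for t in temps)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, loads))
    b = cov / var_t
    a = mean_y - b * mean_t

    # Weekly pattern: mean of what the temperature line fails to explain.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, y, s in zip(temps, loads, slots):
        sums[s] += y - (a + b * t)
        counts[s] += 1
    weekly = {s: sums[s] / counts[s] for s in sums}
    return a, b, weekly


def predict_baseline(model, temp, slot):
    """Predict load for a given ambient temperature and weekly slot."""
    a, b, weekly = model
    # Slots never seen during fitting get no weekly correction.
    return a + b * temp + weekly.get(slot, 0.0)
```

A baseline of this kind is cheap to fit and gives a reference error level against which the ML models discussed below can be judged.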

In [14], recurrent neural networks (RNNs) are used for heat load prediction in district heating and cooling systems. The authors compare their prediction results from RNN with the prediction results obtained from a three-layered feedforward neural network (TLNN). The mean squared error between the TLNN and the stationary actual heat load is reported to be 21.05^{2} whereas it is 11.82^{2} between the RNN and the actual heat load data. In the nonstationary case, RNN still provides lower mean squared error. The use of RNNs raises the expectation of capturing the trend of heat load, since it uses heat load data for several days as the input.

In [15], time, historical consumption data, and ambient temperatures were used as input parameters to forecast heat consumption for one week in the future. The authors compared the performances of three black-box modelling techniques (SVR, PLS, and ANN) for the prediction of heat consumption in the Suseo DH network and analyzed the accuracy of each method by comparing forecasting errors. The authors report that, for one-day-ahead forecasting, the overall average error of PLS is 3.87%, while that of ANN and SVR is 6.54% and 4.95%, respectively. The maximum error of SVR is 9.82%, which is lower than that of PLS (16.47%) and ANN (13.20%). In terms of the overall error, the authors indicate that PLS exhibits better forecasting performance than ANN or SVR.

In [16], a multiple regression (MR) model is used for heat load forecasting. The reported MAE is 9.30. The model described in [17] uses an online machine-learning approach named Fast Incremental Model Trees with Drift Detection (FIMT-DD) for heat load prediction and hence allows the flexibility of updating the model when the distribution of target variable changes. The results of the study indicate that MAE and MAPE for FIMT-DD (using Bagging) have lower values in comparison to Adaptive Model Rules (AMRules) and Instance Based Learner on Streams (IBLStreams).
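The error figures quoted in the studies above are of three kinds: MAE, MAPE, and maximum percentage error. A minimal sketch of how these quantities are conventionally computed is given below; the function names are illustrative assumptions, and the percentage-based metrics assume that no actual load value is zero.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same unit as the load."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def max_ape(actual, predicted):
    """Largest single absolute percentage error, in percent."""
    return 100.0 * max(abs((a - p) / a) for a, p in zip(actual, predicted))
```

Note that MAE depends on the scale of the load, so MAE values from different networks are not directly comparable, whereas MAPE and the maximum percentage error are scale free.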

The authors in [18] compare the performance of four supervised ML algorithms (multiple linear regression (MLR), feedforward neural network (FFN), SVR, and Regression Tree (RT)) by studying the effect of internal and external factors. The external factors include outdoor temperature, solar radiation, wind speed, and wind direction. The internal factors are related to the district heating system and include supply and return water pressure, supply and return water temperature, the difference between supply and return temperature, and circular flow. Their study shows that SVR achieved the best accuracy on heat load prediction for 1- to 24-hour horizons. However, the prediction accuracy decreases as the horizon rises from 1 to 18 hours.
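The notion of a prediction horizon can be illustrated with a short sketch that turns a load series into training pairs for an h-step-ahead forecast. Using only lagged load values as features is an assumption made here for brevity; it is not the feature set of [18], which relies on the external and internal factors listed above. The name `make_supervised` is likewise hypothetical.

```python
def make_supervised(series, n_lags, horizon):
    """Build (X, y) pairs for horizon-step-ahead forecasting.

    Each row of X holds the n_lags most recent values up to time t;
    the matching entry of y is the value observed at time t + horizon.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1 : t + 1])
        y.append(series[t + horizon])
    return X, y
```

With hourly data, `horizon=1` corresponds to one-hour-ahead prediction and `horizon=24` to one-day-ahead prediction. The larger the horizon, the older the most recent information available to the model, which is consistent with accuracy degrading as the horizon grows.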

Wu et al. [19] discuss and implement SVR as a predictive model applied to a building's historical energy use. Their predictive model proved to approximate current energy use with some seasonal and customer-specific variations in the approximations. Another work [20] discusses the importance of load prediction in a smart energy grid network. The authors propose a Bayesian network (BN) to predict the total consumer water heat consumption in households. Shamshirband et al. [21] construct an adaptive neurofuzzy inference system (ANFIS), which is a special case of the ANN family, to predict heat load for individual consumers in a DH system. Their results indicate that further improvements of the model are required for prediction horizons greater than 1 hour. Protić et al. [22] study the relevance of short-term heat load prediction for operation control in a DH network. Here, the authors apply SVR for heat load prediction for only one substation, with a time horizon of 15 minutes. To improve the predictive model, the authors also add a dummy variable to define the state of DH operation.

In the literature, research on developing load forecasting models is also discussed from different perspectives and used in different energy-related applications, such as heat load in district heating, wind turbine reaction torque prediction [23], and wind power forecasting [24, 25].

In [23], SVR is employed for wind turbine torque prediction. The results show that an improvement in accuracy can be achieved, and the authors conclude that SVR can be considered a suitable alternative for prediction. It can also be seen that the proposed SVR prediction models produce higher accuracy compared to ANN and ANFIS (adaptive neurofuzzy inference system). The work discussed in [24] considers how the penetration of renewable energies in electrical power systems increases the level of uncertainty. In such situations, traditional methods for forecasting load demand cannot properly handle these uncertainties. Hence, the authors implement a neural network method for constructing prediction intervals by using a lower upper bound estimation (LUBE) approach. They conduct a comparative analysis and show that this method can increase the quality of prediction intervals for load and wind power generation predictions.

Bhaskar and Singh [25] perform statistical-based wind power prediction using numerical weather prediction (NWP). In order to validate the effectiveness of the proposed method, the authors compared it with benchmark models, such as persistence (PER) and new-reference (NR), and show that the proposed model outperforms these benchmark models.

Additionally, due to innovations in the future sustainable and smart energy systems and recent technological trends with IoT (Internet of Things), many research works [5, 26] consider DH systems as being an integral part in Smart Grid, within the smart city concept. Moreover, such a DH system model will require high computation time and resources for knowledge representation, knowledge inference, and operational optimization problems. Thus, in response to this, researchers are continuously focusing on the development and use of fast and efficient algorithms for real-time processing of energy and behavior related data.

As a summary, previous research on heat load prediction points to various training algorithms: ANN including RNN, FFN (Feedforward Neural Network)/MLP, and SOM; MR including MLR and PLS; SVM including SVR; Bayesian networks (BN); decision trees (DT); ensemble methods [27]; FIMT-DD; AMRules; and IBLStreams.

In spite of the interest and the considerable efforts of the research community so far, there is no consensus among researchers on either the most suitable training model for heat load prediction or an appropriate set of input parameters for training the model [16] in order to achieve a high level of prediction accuracy. This is because the superiority of one model over another in heat load prediction cannot be asserted in general; the performance of each model depends on the structure of the prediction problem and the type of data available. The comparison in [15] pointed to the superior performance of SVR already; however, as our problem structure and inputs are different from theirs, we chose to compare several up-to-date models to find the most promising approach for our case. Table 1 lists models from the literature. The “plus” sign indicates that a particular algorithm has been applied, while “minus” means the opposite. Based on the table, we concluded that SVR, PLS, and RF provide us with a unique combination of models to compare with each other. Simplicity and efficiency of each model in our combination are preferred so that a rapid and simple assessment of energy demand with high accuracy can be obtained.