Energy consumption forecasting is essential for efficient resource management, which brings both economic and environmental benefits. Forecasting can be implemented through statistical analysis of historical data, application of Artificial Intelligence (AI) algorithms, physical models, and more, and focuses on two directions: the required load for a specific area, e.g., a city, and the required load for a building. Building power forecasting is challenging due to the frequent fluctuation of the required electricity and the complexity and variability of each building’s energy behavior. This paper focuses on the application of Deep Learning (DL) methods to accurately predict building (residential, commercial, or multiple) power consumption by utilizing the available historical big data. Research findings are compared to state-of-the-art statistical models and AI methods from the literature to comparatively evaluate their efficiency and justify their future application. The aim of this work is to review up-to-date proposed DL approaches, to highlight the current research status, and to point out emerging challenges and potential future directions. The review revealed a higher interest in residential building load forecasting, which covers 47.5% of the related literature from 2016 to date, with a short-term forecasting horizon in 55% of the referenced papers. The focus on residential buildings was attributed to the lack of publicly available datasets for experimentation on different building types, since it was found that 48.2% of the related literature used the same historical data on residential building load consumption.

1. Introduction

According to the statistical review on world energy (2020) of the International Energy Agency (IEA) [1], until the end of 2019 almost 80% of the electricity consumed worldwide was produced from nonrenewable sources [3] (Figure 1(a)), such as coal, lignite, oil, and natural gas. The annual variation, with few exceptions, ranged between a 1% and 4% increase, corresponding to 1% to 2% on average [4], as illustrated in Figure 1(b). Although the report for 2021 is available [2], it is not used here because of the unique impact of COVID-19 on the energy sector, which, in the authors’ opinion, requires further research and is beyond the scope of this paper. These two factors alone are sufficient to conclude that electric energy is a product with an expiration date, high production costs, and derivatives harmful to the environment (nuclear waste, greenhouse gas (GHG) emissions, etc.). The lack of sufficient means for long-term, large-scale storage of produced electricity gives rise to two problems: successfully balancing supply and demand, and forecasting power consumption. These are the most significant problems that the electricity production industry has to address on a daily basis in order to avoid either energy shortages, which could lead to production-line problems in industrial compounds, service disruptions, residential outages, etc., or the waste of produced energy [5]. Effective management of the produced electric energy, avoidance of excessive production, and minimization of energy wastage constitute the main keys toward sustainable energy consumption [6].

Over the past years, the term “smart grid” has been frequently used in scientific papers and articles [7] to describe a flexible grid regarding the production and distribution of electric energy [8]. It should be noted that a search on Scopus using the term “smart grid” returned 44,598 research articles up to 2022. This flexibility is due to the dynamic adjustment to power demand and the cost-effective distribution of electricity produced from various sources, e.g., solar, wind, and nuclear [9]. A grid is considered “smart” when it is able to monitor, predict, schedule, understand, learn, and make decisions regarding power production and distribution, carrying valuable information along with electricity [8]. Upgrading the power grid requires the infusion of AI [10]; recent studies suggest training Artificial Neural Networks (ANNs) to recognize multiple energy patterns [11].

Regarding the prediction time frame, power consumption forecasting can be divided into three main categories.
(i) Short-term, which covers a time frame of up to one day; it is useful in supply and demand (SD) adjustment [12].
(ii) Medium-term, which covers a time frame from one day up to a year; it is useful in maintenance and outage planning [13].
(iii) Long-term, for a time frame longer than a year; it is useful in infrastructure development planning [14].
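The three time frames can be expressed as a small classification rule. The following is a minimal sketch in Python; the function name `classify_horizon`, the inclusive upper bounds, and the 365-day year are assumptions made for illustration and are not taken from the referenced works.

```python
from datetime import timedelta

# Boundaries follow the three categories listed above. Treating the upper
# bounds as inclusive and a year as 365 days is an assumption of this sketch.
ONE_DAY = timedelta(days=1)
ONE_YEAR = timedelta(days=365)


def classify_horizon(horizon: timedelta) -> str:
    """Map a forecasting horizon to short-, medium-, or long-term."""
    if horizon <= timedelta(0):
        raise ValueError("forecast horizon must be positive")
    if horizon <= ONE_DAY:
        return "short-term"
    if horizon <= ONE_YEAR:
        return "medium-term"
    return "long-term"


if __name__ == "__main__":
    for h in (timedelta(hours=1), timedelta(weeks=1), timedelta(days=730)):
        print(h, "->", classify_horizon(h))
```

Under these assumptions, an hour-ahead forecast is classified as short-term, a week-ahead forecast as medium-term, and a two-year forecast as long-term.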

In the scientific literature published so far, the methodologies applied to power consumption forecasting, whether for an area or for a specific building, fall into two main categories.
(i) Physics principles-based models.
(ii) Statistical and Machine Learning (ML) models.

Building power consumption forecasting is considered more demanding than area power consumption forecasting [15]; however, more and more researchers are turning to DL to solve such problems due to the promising performance reported [16–18].

According to estimates, residential and commercial buildings are responsible for the consumption of 20–40% of the global energy production [19–22], with a high energy wastage rate due to insufficient management and planning, age of the building, lack of responsible energy usage, etc. [23]. These high consumption percentages have motivated researchers to develop new ways of saving electric energy, or to enhance and improve the existing ones, focusing mainly on the development of a solid strategy for flexible and efficient energy supply–demand management. The success of such a strategy is highly dependent on timely and accurate energy consumption forecasting [24, 25].

Therefore, a “smart” grid should be able not only to predict the total amount of energy needed in a specific area over a certain period of time, but also to calculate with precision the electricity consumption needs of a specific building, based on the building’s characteristics, such as Heating, Ventilation, and Air Conditioning (HVAC) devices, and on historic consumption data [26]. It is worth mentioning that a research work [27] estimated that an increase of 1% in forecasting accuracy could reduce the expenses of the power system of the United Kingdom by approximately £10 million per year.

Towards this end, the contribution of the present work is the methodical search, collection, analysis, and presentation of DL methods proposed in the years 2016–2022 for solving the problem of building power consumption prediction. To the best of the authors’ knowledge, literature reviews of DL methodologies and approaches that focus on forecasting energy use in buildings are limited [28, 29]. This work aims to address the issue of building energy load forecasting in greater depth, to update existing literature reviews on the same subject, to shed more light on the current status of DL performance in this area, and to highlight the main challenges that need to be addressed in the future. The main contributions of the current work compared to existing reviews on building load forecasting [28, 29] are the following.
(i) The research methodology, analyzed in Section 2, is not limited to specific publishers, resulting in a wider range of publications on the subject under study.
(ii) This work focuses on research papers that propose methodologies based on the total building energy load and does not target specific utility loads, such as HVAC load.

The rest of the paper is organized as follows. In Section 2, the methodology of the conducted literature search is presented, along with statistical analysis and graph displays of the results. In Section 3, all the nondeep learning methodologies proposed or used to date in building load forecasting are presented briefly. In Section 4, the deep learning methodologies that have been proposed and tested so far towards addressing the building energy consumption prediction problem, along with their results and conclusions, are presented in chronological order. Section 5 provides details regarding the datasets used in the referenced literature of Section 4. Section 6 discusses new reflections and concerns regarding the building load forecasting problem that were raised from the conducted research. Finally, Section 7 concludes the paper.

2. Materials and Methods

The methodology that was followed consisted of four main steps, as illustrated in Figure 2.
(1) Extensive research in the published literature by using Scopus, a certified, academically approved search engine, to establish a solid baseline for our research. Scopus complies with the most important research features of recall, precision, and importance. Regarding “importance”, Scopus is considered the most effective research engine for an overview of a topic [30] and was therefore selected for the scope of this review article. The application of several different combinations of the keywords indexed above, such as “Deep learning” and “Building load forecasting”, resulted initially in 71 papers.
(2) By studying the Title–Abstract–Conclusion parts of each paper, we were able to narrow down the papers relevant to our subject. In this step, the study focused on papers that could provide potential solutions or suggestions for wider and more generalized applications and benefit upcoming research. After this phase, 48 papers remained.
(3) Extensive and meticulous study of the remaining papers and categorization according to the suggested solution or methodology. After this phase, 34 papers remained.
(4) In the final stage of our research, to achieve a more thorough review, we traced the references in the papers of the previous step that were not included in the results of step 1. This yielded 6 more papers significant to our research, resulting in a total of 40 papers relevant to the subject.

Several useful conclusions emerged from this literature review regarding the up-to-date engagement of the scientific community with the subject. The application of deep learning methods to the forecasting of the electrical load of buildings first appeared in 2016 and has since attracted steadily growing interest from researchers, as shown in Figure 3. It should be noted that, by February 2022 alone, seven papers relevant to the subject had already been published. This trend is probably due to the promising results of DL architectures in research, compared to traditional load forecasting methods.

Regarding the type of building (residential, commercial, or multiple types), the research reveals an almost equal interest in multiple and commercial buildings, as can be seen in Figure 4, while the highest interest is in residential buildings. We assume this has more to do with limited dataset availability than with a particular interest in a specific type of building, since approximately 48.2% of the papers experimenting on residential and multiple building load forecasting use the same well-known dataset containing residential load consumption data, as will be further discussed in an upcoming section.

Among the forecasting time horizon categories (short, medium, long, and multiple, i.e., more than one category), the one most extensively researched, as shown in Figure 5, is short-term forecasting. This is probably due to dataset resolution and the widespread installation of smart meters in an increasing number of buildings. We also assume that, since building load forecasting is highly connected to the behavior of building occupants, it is probably better to predict power consumption in the short term and adjust models accordingly, since short-term prediction is more sensitive in capturing variations in building consumption patterns.

Regarding the deep learning methods and architectures that were proposed and tested, the Long Short-Term Memory (LSTM) based architectures attracted the greatest interest, as displayed in Figure 6. This is due to their ability to maintain “memory” over a large number of steps and to apply numerous parameters in order to achieve better accuracy and performance compared to most of the other models. In Figure 6, the category “Hybrid” refers mainly to LSTM Convolutional Neural Network (LSTM–CNN) hybrid architectures, the category “AE” refers to autoencoders, and the category “Other” refers to the rest of the researched architectures.

3. Nondeep Learning Methods in Building Load Forecasting

The methods, techniques, and approaches for building energy load forecasting can, according to the literature, be divided into three main categories [31].
(1) White Box or Physical methods, which include all methods that address the problem by interpreting the thermal behavior of a building. These complex methods require a detailed description of the building’s geometry, they do not require training data, and their results can be interpreted in physical terms. There are several limitations in this methodology regarding forecasting accuracy and reliability [32, 33]. There are three main approaches in this category and, due to their complexity, several software solutions exist that simplify and automate these procedures:
(i) Computational Fluid Dynamics (CFD), which is considered a three-dimensional approach [34, 35].
(ii) Zonal, a simplified CFD, which is considered a two-dimensional approach [36].
(iii) Nodal approach, which is the simplest of the three and is considered a one-dimensional approach [37].
(2) Black Box or Statistical methods using traditional Machine Learning. These methods do not require a detailed description of the building geometry; they require a sufficient amount of training data, and their results can be difficult to interpret in physical terms. The most commonly used methods are:
(i) Conditional Demand Analysis (CDA), based on the Multiple Linear Regression method [38].
(ii) Genetic Algorithms, based on Darwin’s theory of the evolution of species [39, 40].
(iii) Artificial Neural Networks (ANN), inspired by brain neurons [41, 42].
(iv) Support Vector Machine (SVM), a method for solving classification or regression problems [43, 44].
(v) Autoregressive Integrated Moving Average (ARIMA) [45].
(3) Grey Box or Hybrid models, which combine methods from the previous categories in an effort to overcome their disadvantages and utilize their advantages [46]. These methods require a rough description of the building geometry and a small amount of training data compared to the previous category, and their results can be interpreted in physical terms.

4. Deep Learning Methods in Building Load Forecasting

As displayed in Figure 4, the methodologies proposed for building load forecasting are categorized into three main categories, regarding the type of buildings under investigation. In this section, following the same categorization, the examined DL methodologies are presented.

4.1. Residential Building Load Forecasting

The first DL-based methodology was proposed by Elena Mocanu et al. [47] in 2016 for load forecasting of a residential building. The examined DL models were: (1) Conditional Restricted Boltzmann Machine (CRBM) [48] and (2) Factored Conditional Restricted Boltzmann Machine (FCRBM) [49], with reduced extra layers. The performance of both models was compared to that of the three most used machine learning methods of that time [50–52]: (1) Artificial Neural Network - Non-Linear Autoregressive Model (ANN-NAR), (2) Support Vector Machine (SVM), and (3) Recurrent Neural Network (RNN). The dataset used, entitled “Individual Household Electric Power Consumption” (IHEPC) [53], was collected from a household at a one-minute sampling rate. It contained 2,075,259 samples covering an almost four-year period (47 months), collected between December 2006 and November 2010. The attributes of the dataset used in the experiments were: Aggregated active power (average household power excluding the devices in the following attributes), Energy Submetering 1 (kitchen: oven, microwave, dishwasher, etc.), Energy Submetering 2 (laundry room: washing machine, dryer, refrigerator, and a light bulb), and Energy Submetering 3 (water heater and air-conditioning device). In all the implementations, the authors used the first three years of the dataset for model training and the fourth year for testing. The following useful conclusions were drawn from that research: all five tested models produced comparable forecasting results, with the best performance attained in experiments predicting the aggregated energy consumption rather than the three submetering attributes. It is also worth mentioning that the results were the most inaccurate in all the submetering prediction scenarios, which could be attributed to the difficulty of predicting user behavior. The proposed FCRBM deep learning model outperformed the other four prediction methods in most scenarios. All methods proved to be suitable for near real-time exploitation in power consumption prediction, but the researchers also concluded that as the prediction length increased, the accuracy of predictions decreased, with the reported prediction errors being half of those of the ANN. The authors also concluded that even though the use of the proposed deep learning methods was feasible and provided sufficient results, it could be further improved to achieve better prediction accuracy by fine-tuning and by adding extra information to the models, such as environmental temperature, time, and more [47].
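The chronological train/test protocol described above (earlier years for training, the final year for testing) can be sketched as follows. This is an illustrative example on synthetic daily values, not the authors’ code: the function name `split_by_date`, the start date of 1 December 2006, and the cutoff of 1 December 2009 are assumptions chosen only to mimic a "first three years versus fourth year" split.

```python
from datetime import datetime, timedelta


def split_by_date(samples, cutoff):
    """Split (timestamp, value) pairs chronologically.

    Samples strictly before `cutoff` go to the training set and the rest
    to the test set, so no future information leaks into training.
    """
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


# Synthetic daily readings standing in for the real one-minute data
# (the values below are made up for illustration).
start = datetime(2006, 12, 1)   # assumed start date
samples = [(start + timedelta(days=d), 1.0 + 0.1 * (d % 7))
           for d in range(4 * 365)]

cutoff = datetime(2009, 12, 1)  # assumed end of the third year
train, test = split_by_date(samples, cutoff)

if __name__ == "__main__":
    print(len(train), "training samples,", len(test), "test samples")
```

Splitting on a date rather than shuffling keeps the temporal order intact, which matters for load data because consecutive readings are strongly correlated.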

In the same year, Daniel L. Marino et al. [6] proposed another methodology using the LSTM DL model. More precisely, the authors examined three models: (1) standard Long Short-Term Memory (LSTM) [54], a Recurrent Neural Network (RNN) designed to store information for long time periods, which can successfully address the vanishing gradient issue of RNNs; (2) an LSTM-based Sequence-to-Sequence (S2S) architecture [55], which is more flexible than the standard LSTM, consists of two LSTM networks in encoder and decoder roles, and overcomes the naïve mapping problem observed in standard LSTM; and (3) the Factored Conditional Restricted Boltzmann Machine (FCRBM) method proposed in [47]. This work revealed that the standard LSTM failed in building load forecasting and a naïve mapping issue occurred. The proposed LSTM S2S network, based on the standard LSTM network, overcame the naïve mapping issue and produced results comparable to those of the FCRBM model and the other methods examined in [47], using the same dataset [53]. A significant conclusion of this research was that when the prediction length increased, the accuracy of predictions decreased. The researchers also concluded that in order to have a better grasp of the effectiveness of those methods and improve their generalization, more experiments with different datasets and regularization methods had to be conducted. It is worth mentioning that the dataset used was the same as in [47].

The following year, in 2017, Kasun Amarasinghe et al. [56] proposed a methodology based on the Convolutional Neural Network (CNN) model. The novelty of this work was the deployment of a grid topology for feeding the data to the CNN model, for the first time in this kind of problem. The authors compared the performance of the CNN model with that of: (1) standard Long Short-Term Memory (LSTM), (2) LSTM-based Sequence-to-Sequence (S2S) architecture network, (3) Factored Conditional Restricted Boltzmann Machine (FCRBM), (4) Artificial Neural Networks with Non-Linear Autoregressive Model (ANN-NAR), and (5) Support Vector Machine (SVM). The research led to the following conclusions: all the tested deep learning architectures produced better results in energy load forecasting for a single residence than SVM, and similar or more accurate results than the standard ANN. Moreover, the best accuracy was achieved by LSTM (S2S). The results of the tested CNN architectures were similar to each other, with slight variations; they were better than those of SVM and ANN and, even though the CNN architectures did not outperform the other deep learning methods, they remained promising. A more general observation that puzzled the researchers was that the results in training were better than in testing. The researchers also concluded, based on their recent and previous work [6], that the tested deep learning methods [57, 58] produced promising results in energy load forecasting. They also suggested that weather data should be considered in future forecasting work, due to the direct relationship between weather and load and the fact that such data had not been used to date elsewhere than in [57]. Finally, they came to the same conclusion as in their previous work: in order to gain a better grasp of the effectiveness of their methods and to improve their generalization, more experiments with different datasets and regularization methods had to be conducted. Once again, the same dataset [53] was utilized.

In [59], Lei et al. in 2018 introduced a short-term residential load forecasting model, named Residual Convolutional Fusion Network (RCFNet). The proposed model consisted of three branches of residual convolutional units (proximity, tendency, and periodicity modeling), a fully connected NN (weekday or weekend modeling), and an RCN that performed load forecasting based on the fusion of the previous outputs. The dataset used in this research [60] covered a two-year time period (April 2012 to March 2014) and contained data sampled every half hour from smart meters installed in 25 households in Victoria, Australia. For the purposes of this research, only the 8 households with the most complete data series were used. Approximately 91.7% (22 months) of the dataset was used for training and the remaining 8.3% (2 months) for testing. Six different variations of the proposed RCFNet model were compared to four baseline forecasting models: History Average (HA), Seasonal ARIMA (SARIMA), MLP, and LSTM. All models were evaluated by calculating the root mean-square error (RMSE) metric. The researchers concluded that their model outperformed all other models and achieved the best accuracy, scalability, and adaptability.

In [61], Kim et al. in 2019 introduced a deep learning model for building load forecasting based on the Autoencoder (AE) model. The main idea behind this approach was to devise a scheme capable of considering different features for different states or situations each time, to achieve more accurate and explanatory energy forecasts. The model consisted of two main components based on the LSTM architecture: a projector that, given the input data, defined the current energy demand as the state of the model, and a predictor that performed the building load forecasting based on that state. The user of the system had a key role and could affect the forecasting through parameter and condition choices. In this work, a well-known dataset [53] was used; 90% of the dataset was used for training and 10% for testing the model. The authors compared their model to traditional forecasting methods, ML methods, and DL methods, and they concluded that the proposed model, evaluated by mean square error (MSE), mean absolute error (MAE), and mean relative estimation error (MRE) metrics, outperformed them in most cases. The authors also concluded that their model’s efficiency was enhanced by the condition adjustment, which provided the situation or state of the model each time. The main contribution of the proposed work was that the model could both predict future demand and define the current demand pattern as a state.

The same research team of Kim et al. [62], in the same year, 2019, proposed a hybrid model in which two DL architectures, a CNN, most commonly used in image recognition, and an LSTM, most commonly used in speech recognition and natural language processing, were linearly combined in a CNN–LSTM model architecture. For the experiments, a popular dataset [53] was used. The proposed model was tested in minute, hour, day, and week resolutions, and it was discovered that as the resolution increased, accuracy improved. The CNN–LSTM model, evaluated by MSE, RMSE, MAE, and mean absolute percentage error (MAPE) metrics, was compared to several other traditional energy forecasting ML and DL models and produced the most accurate results. It should be noted that the proposed method was the first to combine CNN architectures with LSTM models for energy consumption prediction. The authors concluded that the proposed model could deal with noise drawbacks and displayed minimal loss of information. The authors also evaluated the attributes of the dataset used and the impact that each of them had on building load forecasting. The Submetering 3 attribute, representing water heater and air conditioner consumption, had the highest impact, followed by the Global Active Power (GAP) attribute. Another observation of this research concerned the lack of available relevant datasets; the authors suggested that future work should focus on data collection and on the creation of an automated method for choosing hyperparameters.
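For reference, the four error metrics named above (MSE, RMSE, MAE, and MAPE) can be computed as in the following minimal sketch. It uses the standard textbook definitions; the function names and the small example values are illustrative assumptions and do not reproduce any result from the referenced papers.

```python
import math


def _check(y_true, y_pred):
    """Validate that both sequences are non-empty and of equal length."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true, y_pred):
    """Mean square error."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean-square error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so that case is rejected.
    """
    _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero actual values")
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    actual = [2.0, 4.0, 5.0]      # made-up load values
    predicted = [1.0, 4.0, 7.0]   # made-up forecasts
    print(mse(actual, predicted), rmse(actual, predicted),
          mae(actual, predicted), mape(actual, predicted))
```

RMSE and MAE are expressed in the units of the load itself, whereas MAPE is scale-free, which is one reason several of the reviewed works report more than one metric.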

In [63], Le et al. in 2019 presented a DL model for building load forecasting, named EECP-CBL. The architecture of the model was a combination of Bi-LSTM and CNN networks. For the conducted experiments, the authors utilized the IHEPC dataset [53]. For each model, 60% of the data (first three years) was used for training and the remaining 40% (last two years) was used for testing. The EECP–CBL model was compared to several models that were state-of-the-art at the time, used in the industry or introduced by other researchers for energy load forecasting: Linear Regression, LSTM, and CNN-LSTM. After data optimization, the models were tested for real-time (1 minute), short (1 hour), medium (1 day), and long (1 week) term load prediction, and they were evaluated by MSE, RMSE, MAE, and MAPE metrics. The authors concluded that the proposed model outperformed all other models in terms of accuracy. The researchers also focused on the time consumed for training and prediction by each model and concluded that as the prediction horizon increased, the time required for each additional task decreased for each model, with the proposed model outperforming all others, while reporting a comparatively higher training time as a disadvantage. The research team also concluded that the EECP–CBL model achieved peak performance on long-term building load forecasting and could be utilized in intelligent power management systems.

In [64], Mehdipour Pirbazari et al. in 2020, in order to explore the extent to which and the way in which several factors can affect short-term (1-hour) building load forecasting, performed several experiments on four data-driven prediction models: Support Vector Regression (SVR), Gradient Boosting Regression Trees (GBRT), Feed Forward Neural Networks (FFNNs), and LSTM. The authors focused mainly on the scalability of the models and on their prediction accuracy when trained solely on historical consumption data. The dataset covered a four-year time period (November 2011 to February 2014) and contained hourly smart meter data from 5,567 individual households in London, UK [65]. After data normalization and parameter tuning, the dataset utilized in this research focused on the year 2013 (fewer missing values, etc.) and on 75 households, 15 from each of five different consumer-type groups classified by Acorn [66]. The four models were evaluated by Cumulative Weighted Error (CWE), based on RMSE, MAE, MASE, and Daily Peak Mean Average Percentage Error (DpMAPE) metrics. The researchers concluded that among the four models, LSTM and FFNN presented better adaptability to consumption variations and resulted in better accuracy, but LSTM had a higher computation cost and was clearly outperformed by GBRT, which was significantly faster. According to the reported results, other factors that affected load forecasting, for all four models, were the variations in usage, the average energy consumption, and the forecasted season temperature. Also, changes in the number of features (input lags) or in the total number of tested households (size of the training dataset) did not affect all models similarly. The developed models were expected to learn various load profiles, aiming towards generalization ability and increased robustness.
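The "input lags" mentioned above are the past consumption values supplied to a model as features. A common way to build them is a sliding window over the load series, sketched below under stated assumptions: the function name `make_lagged_samples`, its parameters, and the toy series are hypothetical and are not taken from [64].

```python
def make_lagged_samples(series, n_lags, horizon=1):
    """Turn a load series into (features, target) pairs.

    Each feature vector holds the `n_lags` most recent observations and
    the target is the value `horizon` steps after the last of them.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be at least 1")
    samples = []
    for i in range(n_lags, len(series) - horizon + 1):
        features = series[i - n_lags:i]
        target = series[i + horizon - 1]
        samples.append((features, target))
    return samples


if __name__ == "__main__":
    hourly_load = [1, 2, 3, 4, 5, 6]  # made-up hourly readings
    for features, target in make_lagged_samples(hourly_load, n_lags=3):
        print(features, "->", target)
```

With three lags and a one-step horizon, the six toy readings yield three training pairs; increasing `n_lags` gives the model more history per sample but leaves fewer samples, which is one way the number of input lags can affect models differently.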

In [67], Mlangeni et al. in 2020 introduced, for medium- and long-term building load forecasting, a Dense Neural Network (DNN), a deep learning architecture that consisted of multiple ANN layers. The dataset used for this research contained approximately 2 million records from households in the eThekwini metropolitan area, with 38 attributes, and covered a five-year period, from 2008 to 2013. After data optimization and preparation, only 709,021 samples with 7 attributes remained. For model training, 75% of the data was used, and for testing, the remaining 25%. In order to model load forecasting for the campus buildings of the University of KwaZulu, the authors assigned the household readings to rooms inside university buildings. The proposed architecture was compared to SVM and Multiple Regression (MR) models and was evaluated by RMSE and normalized RMSE (nRMSE) metrics. The authors concluded that the proposed model outperformed the rest of the models, presented good generalization ability, and could follow the data consumption trends. Dispersion of values in the data resulted in inaccurate estimations of large values, probably because these were outliers. The authors also concluded that their method could be further improved by implementing more ML architectures and then testing on more datasets against other models, or even by extending from building load forecasting to wider metropolitan areas.

In [68], Estebsari et al. in 2020, inspired by the high performance of CNN networks in image recognition, proposed a 2-dimensional CNN model for short-term (15-minute) building load forecasting. In order to encode the 1-dimensional time series into 2-dimensional images, the authors presented and experimented with three well-known methods: recurrence plots (RP) [69], Gramian angular field (GAF), and Markov transition field (MTF) [70]. For the experiments, the Boston housing dataset [53] was used; 80% of the data was used for training and the remaining 20% for testing the models. The performance of three different versions of the proposed CNN-2D model, one for each image encoding method, was compared to SVM, ANN, and CNN-1D models. All architectures were evaluated by RMSE, MAPE, and MAE metrics. The researchers concluded that the CNN-2D-RP model outperformed all other models, displaying the best forecasting accuracy; however, due to the image-encoded data, it had a significantly higher computational complexity, making it inappropriate for real-time applications.
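To illustrate how a 1-dimensional series can be encoded as a 2-dimensional image, the sketch below implements two of the encodings named above in their simplest textbook form: a thresholded recurrence plot computed on the raw scalar values (without time-delay embedding) and the summation variant of the Gramian angular field. These simplifications, the function names, and the toy series are assumptions made for illustration and may differ from the exact variants used in [68].

```python
import math


def rescale(series):
    """Linearly rescale a series to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    return [(2 * x - hi - lo) / (hi - lo) for x in series]


def gramian_angular_summation_field(series):
    """GAF (summation variant): G[i][j] = cos(phi_i + phi_j).

    Each rescaled value x is mapped to an angle phi = arccos(x).
    """
    # Clamp to guard against tiny floating-point overshoots before acos.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in rescale(series)]
    return [[math.cos(a + b) for b in phi] for a in phi]


def recurrence_plot(series, eps):
    """Binary recurrence plot: R[i][j] = 1 if |x_i - x_j| <= eps, else 0."""
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]


if __name__ == "__main__":
    load = [0.0, 1.0, 2.0]  # made-up readings
    for row in gramian_angular_summation_field(load):
        print([round(v, 3) for v in row])
    for row in recurrence_plot(load, eps=1.0):
        print(row)
```

Both encodings turn a series of length n into an n × n matrix, which is consistent with the higher computational cost reported for the image-based models compared to their 1-dimensional counterparts.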

In [71], Wen et al. in 2020 presented a Deep RNN with Gated Recurrent Unit (DRNN-GRU) architecture, consisting of five layers, for short- to medium-term load forecasting in residential buildings. The prediction accuracy of the proposed model was compared, using MAPE, RMSE, Pearson correlation coefficient (PCC), and MAE metrics, to several DL (DRNN, DRNN-LSTM) and non-DL schemes (MLP, ARIMA, SVM, MLR). The dataset used in this research contained 15 months of hourly consumption data and was obtained from the Pecan Street Inc. Dataport Web Portal [72], while weather data were obtained from [73]. For the experimental evaluation of the method, 20 individual residential buildings were selected from the dataset; the first year of the dataset (80%) was used for training and the remaining three months (20%) for testing. The load demand was calculated for the aggregated load of a group of ten individual residential buildings. The researchers drew several conclusions from their work. The proposed model achieved a lower error rate than the other tested methods, almost 5% lower than that of the LSTM layer variation of DRNN. The researchers also stated that the DRNN-GRU model achieved higher accuracy than the rest of the models for the aggregated load of 10 residential buildings as well as for the individual load of residences. There were, however, some issues to be taken into consideration regarding the use of the proposed scheme for building load forecasting. The weather attributes, based on historic data, could affect the load forecasting accuracy, since the weather could not be predicted with high certainty. In addition, the aggregated load forecasting accuracy was higher than that of the individual residence load, since the effect of uncertain human behavior decreased as the total number of residences increased.

In 2021, Jin et al. [74] developed an attention-based encoder-decoder network based on a gated recurrent unit (GRU) NN with Bayesian optimization for short-term power forecasting. The contributions of the proposed method were the incorporation of a temporal attention mechanism able to adjust the nonlinear and dynamic adaptability of the network, and the automatic verification of the hyperparameters of the encoder-decoder model, resulting in improved prediction performance. The network was tested for 24-hour load forecasting with data acquired from American Electric Power (AEP) [75]. The dataset included 26,280 samples from 2017 to 2020, with a sampling frequency of one hour; 70% of the data was used for training, 10% for validation, and 20% for testing. The model was also tested for the load prediction of four special days: Spring Equinox, Easter, Halloween, and Christmas. The proposed method demonstrated high performance and stability compared to nine other models, considering various accuracy indicators (RMSE, MAE, Pearson correlation coefficient (R), NRMSE, and symmetric mean absolute percentage error (SMAPE)). The proposed model outperformed all nine models in all cases.

In [15], a hybrid DL model was proposed for household-level energy forecasting in smart buildings. The model was based on the stacking of fully connected layers and unidirectional LSTMs on bidirectional LSTMs. The proposed model could allow the learning of exceedingly nonlinear and convoluted patterns, and correlations in data that could not be reached by the classical up-to-date unidirectional architectures. The accuracy of the model was evaluated on two datasets through score metrics in comparison with existing relevant state-of-the-art approaches. The first dataset included temperature and humidity in different rooms, appliances energy use, light fixtures energy use, weather data, outdoor temperature and relative humidity, atmospheric pressure, wind speed, visibility, and dewpoint temperature data [76]. The second dataset was the well-known IHEPC set of the University of California, Irvine (UCI) Machine Learning repository [53]. The employed performance comparison indicated the proposed model as the one with the highest accuracy, evaluated with RMSE, MAPE and MAE, even in the case of multistep ahead forecasting. The proposed method could be easily extended to long-term forecasting. Future work could focus on additional household occupancy data and on speeding up the training time of the model in order to facilitate its real-time application.

In the same year, Shirzadi et al. [13] developed and compared ML (SVM, RF) and DL models (nonlinear autoregressive exogenous NN (NARX), recurrent NN (RNN-LSTM)) for predicting electrical load demand. Ten years of historical data for Bruce County in Canada were used [77], consisting of hourly electricity consumption reported by the Independent Electricity System Operator (IESO) and supplemented with temperature and wind speed information [78] recorded from 2010 to 2019; nine years of data were used for training and one year for testing. Results revealed that DL models could predict the load demand more accurately, in terms of MAPE and R-squared metrics, for both peak and off-peak values. The window size of the analysis period was reported as a limitation of the method, since it significantly affected the computation time.

Ozer et al. in 2021 [79] proposed a cross-correlation (XCORR)-based transfer learning approach on LSTM. The proposed model was location-independent, and global features were added to the load forecasting. Moreover, only one month of original data was considered. More specifically, the training data were obtained from the Dataport website [72], while the data for the building whose load demand was estimated were collected from an academic building over one month. Evaluation metrics RMSE, MAE, and MAPE were calculated. The performance of the proposed model was not compared to different models; however, the effect of transfer learning on LSTM was emphasized. The method resulted in accurate predictions, paving the way for energy forecasting based on limited data.

More recently, in January 2022, Olu-Ajayi et al. [80] presented several techniques for predicting annual building energy consumption utilizing a large dataset of residential buildings: ANN, GB, DNN, Random Forest (RF), Stacking, kNN, SVM, Decision Tree (DT), and Linear Regression (LR) were considered. The dataset included building information retrieved from the Ministry of Housing, Communities and Local Government (MHCLG) repository [81] and meteorological data from the Meteostat repository [82]. The main novelty of that work was the introduction of key building design features as inputs, enabling designers to forecast the average annual energy consumption at the early stages of development. In addition to forecasting, the effects of building clusters, of the selected features, and of the data size on model performance were investigated. Results indicated DNN as the most efficient model in terms of R-squared, MAE, RMSE, and MSE.

In the same month of 2022, in [83], Yan et al. proposed a bidirectional nested LSTM (MC-BiNLSTM) model. The model was combined with discrete stationary wavelet transform (SWT) for more accurate energy consumption forecasting. The integrated approach of the proposed method enabled enhanced precision due to the processing of multiple subsignals. Moreover, the use of SWT was able to eliminate the signal noise by signal decomposition. The UK-DALE [84] dataset was used for the evaluation of the model by calculating MAE, RMSE, MAPE, and R-squared. The proposed method was compared to cutting-edge algorithms of the literature, such as AVR, MLP, LSTM, GRU, and seven hybrid DL models (Ensemble model combining LSTM and SWT, Ensemble model combining Nested LSTM (NLSTM) and SWT, Ensemble model combining bidirectional LSTM (BLSTM) and SWT, Ensemble model combining LSTM and empirical mode decomposition (EMD), Ensemble model combining LSTM and variational mode decomposition (VMD), Ensemble model combining LSTM and empirical wavelet transform (EWT), and Multichannel framework combining LSTM and CNN (MC–CNN–LSTM)). The proposed model achieved a reduction of MAPE to less than 8% in most of the cases. The method was developed on the edge of a centralized cloud system that integrated the edge models and could provide multiple households with a universal IoT energy consumption prediction. The method was limited by the difficulty of integrating multiple models for different household consumption patterns, which also raised data privacy issues.

In [85], a DL model based on LSTM was implemented. The model consisted of two encoders, a decoder, and an explainer. Kullback-Leibler divergence was the selected loss function, which introduced the long-term and short-term dependencies in the latent space created by the second encoder. The experiments used the IHEPC dataset [53]. The first ten months of 2010 were used for training and the remaining two months for testing. The performance of the model was examined through three evaluation metrics, MSE, MAE, and MRE. Results were compared to conventional ML models such as LR, DT, and RF, and DL models such as LSTM, stacked LSTM, the autoencoder proposed by Li [86], the state-explainable autoencoder (SAE) [61], and the hybrid autoencoder (HAE) proposed by Kim and Cho [87]. The proposed model performed similarly to the state-of-the-art methods, while additionally providing an explanation for the prediction results. Temporal information was considered, paving the way for additional explanation of not only temporal but also spatial characteristics.

In January 2022, Huang et al. [88] proposed a novel NN based on CNN-attention-bidirectional LSTM (BiLSTM) for residential energy consumption prediction. An attention mechanism was applied to assign different weights to the neurons’ outputs so as to strengthen the impact of important information. The proposed method was evaluated on IHEPC [53] household electricity consumption data. Moreover, different input timestamp lengths, of 10, 60, and 120 minutes, were selected to validate the performance of the model. Evaluation metrics of RMSE, MAE, and MAPE were calculated for the proposed model and, for comparison, for traditional ML and DL methods for time-series prediction, such as SVR, LSTM, GRU, and CNN-LSTM. Results indicated the proposed method as the one with the highest forecasting accuracy, resulting in the lowest average MAPE. Moreover, the proposed model could avoid the influence of long input sequences and was able to extract information from the features that most affect the energy forecasting. The authors suggested the consideration of weather factors [89] and electricity price policy as supplementary data for their future work.

The main characteristics of all aforementioned DL-based approaches are summarized in Table 1. Comparative performance to state-of-the-art methods is provided throughout this review instead of a numerical performance report for each method, since the referenced works calculate different evaluation metrics (root mean squared error (RMSE), correlation coefficient R, -value, mean absolute error (MAE), mean relative estimation error (MRE), etc.) and select different datasets and different time frames, so that the results are not directly comparable.
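For reference, the most frequently reported error metrics can be sketched as follows. The definitions are the standard ones; individual papers may normalize differently (CV-RMSE in particular), and the sample values are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def cv_rmse(actual, predicted):
    """RMSE normalized by the mean of the actual values, in percent (one common definition)."""
    return 100.0 * rmse(actual, predicted) / (sum(actual) / len(actual))

# Hypothetical hourly loads in kW.
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [1.0, 5.0, 5.0, 8.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are scale-dependent while MAPE and CV-RMSE are relative, which is one reason why figures reported on different datasets cannot be compared directly.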

4.2. Commercial Building Load Forecasting

In 2017, Chengdong Li et al. [86] proposed a new DL model from the combination of Stacked Autoencoders (SAE) [90] and an Extreme Learning Machine (ELM) [91]. The role of the SAE was to extract features relevant to the building’s power consumption, while the role of the ELM was to perform accurate energy load forecasting. Only the pretraining of the SAE was needed, while the fine-tuning was replaced by the least-squares learning of the parameters in the last fully connected layer. The authors compared the performance of the proposed Extreme SAE model with: (1) a Back Propagation Neural Network (BPNN); (2) a Support Vector Regressor (SVR); (3) a Generalized Radial Basis Function Neural Network (GRBFNN), a generalization of the radial basis function neural network (RBFNN); and (4) Multiple Linear Regression (MLR), a well-known and often used statistical regression and prediction method. The dataset was collected from a retail building in Fremont (California, USA) at a 15-minute sampling rate [92]. The dataset contained 34,939 samples that were aggregated to 17,469 30-minute and 8,734 1-hour samples. The effectiveness of the examined methodologies was measured in terms of MAE, MRE, and RMSE, for 30- and 60-minute time periods. The researchers concluded that the proposed approach presented the best performance in energy load forecasting, especially with abnormal testing data reflecting uncertainties in the building power consumption. The achieved accuracy from best to worst was: Extreme SAE > SVR > GRBFNN > BPNN > MLR. The authors also concluded that the proposed SAE and ELM combination was superior to the standard SAE, mainly because it did not need fine-tuning of the entire network (iterative BP algorithm), which could speed up the learning process and contribute significantly to the generalization performance. The ELM sped up the training procedure, since it required no iterations, and boosted the overall performance, due to its deeper architecture and improved learning strategies.
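The reported sample counts are consistent with merging consecutive 15-minute readings in groups of two and four and discarding an incomplete final group. The sketch below reproduces that arithmetic on synthetic data; whether the authors of [86] averaged or summed the readings is not stated in this review, so averaging is an assumption, as are the function name and the synthetic series.

```python
def aggregate(readings, factor):
    """Average consecutive groups of `factor` readings; drop an incomplete tail."""
    usable = len(readings) - len(readings) % factor
    return [sum(readings[i:i + factor]) / factor for i in range(0, usable, factor)]

# Synthetic stand-in for the 15-minute series (values are not real measurements).
quarter_hourly = [float(i % 96) for i in range(34939)]
half_hourly = aggregate(quarter_hourly, 2)
hourly = aggregate(quarter_hourly, 4)
print(len(half_hourly), len(hourly))  # 17469 8734
```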

Widyaning Chandramitasari et al. [5] in 2018 proposed a model constructed by the combination of an LSTM network, used for time-series forecasting, and a Feed Forward Neural Network (FFNN), to increase the forecasting accuracy. The research focused on a time horizon of one day ahead with a 30-minute resolution, for a construction company in Japan. The proposed model was validated and compared against the standard LSTM and the Moving Average (MA) model, which were used by a power supply company. The effectiveness of the evaluated methodologies was measured by RMSE. The dataset covered a time period of approximately one year and four months (August 2016 to November 2017) with a 30-minute resolution. Additional time information considered in the experiments was the day, time, and season (low, middle, high). The authors concluded that separating the data into “weekdays” and “All day” sets gave more accurate results in energy load forecasting for weekdays. They also pointed out that the data analysis performed for forecasting should always be adapted to the type of client (residential, public, commercial, industrial, etc.).

In the same year, Nichiforov et al. [93] experimented on RNNs with LSTM layers, consisting of one sequence input layer, a layer of LSTM units in several configurations that varied the number of hidden units (from 5 up to 125), a fully connected layer, and a regression output layer. They compared the results for two different nonresidential buildings from university campuses, one in Chicago and the other in Zurich. The datasets used in their experiments were obtained from BUDS [94] and contained hourly samples over a one-year period; after data optimization, they resulted in two datasets of approximately 8,670 samples each. Results were promising, indicating that the method could be used in load management algorithms with limited overhead for periodic adjustments and model retraining.

The following year, the same authors in [95] experimented with the same dataset and the same RNN architectures, adding to their research one more building located in New York. Useful conclusions drawn from both works were the following. The RNN architecture was a good candidate for building load forecasting, yielding promising accuracy results. The best performance, measured by RMSE, Coefficient of Variation of the RMSE (CV-RMSE), MAPE, and MSE metrics, was achieved when the LSTM layer contained 50 hidden units, while the worst accuracy was observed when it contained 125 hidden units, for all buildings. DL model testing in load forecasting had advanced in the past few years due to the availability of datasets and relevant algorithms, better hardware for testing and network modeling that could be obtained at lower prices, and the joint efforts of industry and academic research teams leading to better results. Due to the complexity of the building energy forecasting problem (buildings’ architecture, materials, consumption patterns, weather conditions, etc.), experts’ opinions in this domain could provide insights and guidance, along with further investigation and experimentation on a wide variety of models. The authors also suggested that on-site energy storage could tip the scale in favor of better energy management.

In 2019, Ljubisa Sehovac et al. [96] proposed the GRU (S2S) model [97], a simplified LSTM that maintained similar functionality. The two models differ mainly in their cells: (1) GRU (S2S) has a single all-purpose hidden state h instead of two different states, memory and hidden, and (2) the input and forget gates are replaced with an update gate z. These modifications allowed the GRU (S2S) model to train and converge in less time than the LSTM (S2S) model, while maintaining sufficient hidden state dimension and gates to preserve long-term memory. In this study, the authors experimented with all time frame categories for power consumption forecasting (short, medium, and long). The dataset used in the experiments was collected from a retail building at a 5-minute sampling rate. It contained 132,446 samples and covered a time period of one year and three months. There are 11 features in this dataset: Month, Day of Year, Day of Month, Weekday, Weekend, Holiday, Hour, Season, Temperature (°C), Humidity, and Usage (kW). The data were collected from “smart” sensors that were part of a “smart grid”; the first 80% was used for training and the remaining 20% for testing. The proposed method was compared to LSTM (S2S), RNN (S2S), and a Deep Neural Network, and their effectiveness was measured by MAE and MAPE. The authors concluded that the GRU (S2S) and LSTM (S2S) models produced better accuracy in energy load forecasting than the other two models. In addition, the GRU (S2S) model outperformed the LSTM (S2S) model and gave an accurate prediction for all three cases. Finally, a significant conclusion, which confirmed the findings of relevant research [6, 47], was that the accuracy of predictions was expected to decrease as the prediction length increased.
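The cell structure described above can be illustrated with a single GRU time step written in plain Python. This is a sketch of the standard GRU formulation, not the code of [96]: the reset gate r belongs to the standard GRU cell although it is not mentioned above, the direction of the final interpolation differs between references, and all parameter names and the zero-valued weights are assumptions for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, U, h, b):
    """Compute W x + U h + b for small dense matrices stored as lists of rows."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]

def gru_step(x, h, p):
    """One GRU step: a single hidden state h, update gate z, reset gate r."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated_h, p["bh"])]
    # z interpolates between the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def zero_params(n_in, n_hid):
    """Hypothetical all-zero parameters, used only to make the example checkable."""
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = [[0.0] * n_in for _ in range(n_hid)]
        p["U" + gate] = [[0.0] * n_hid for _ in range(n_hid)]
        p["b" + gate] = [0.0] * n_hid
    return p

h_next = gru_step([0.3, -0.7], [1.0, -1.0], zero_params(2, 2))
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so the new state is half the previous one; an LSTM cell would additionally carry a separate memory state and use separate input and forget gates.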

Mengmeng Cai et al. [98] designed Gated CNN (GCNN) and Gated RNN (GRNN) models. In this research, they tested five different models in short-term (next-day) forecasting and compared them in terms of forecasting accuracy, generalization ability, robustness, and computational efficiency. The models they tested were: (1) GCNN1, a multistep recursive model that made one-hour-ahead predictions and was applied 24 times to produce a day-ahead forecast; (2) GRNN1, the same as the previous but for the RNN model; (3) GCNN24, a multistep direct procedure that predicted the whole 24 hours at once; (4) GRNN24, the same as the previous but for the RNN model; and (5) SARIMAX, a non-DL, commonly used method for time-series problems. The authors applied the five models to three different nonresidential buildings: (1) Building A (Alexandria, VA, approx. 30,000 sq ft, academic, dataset obtained from [99]), (2) Building B (Shirley, NY, approx. 80,000 sq ft, school, dataset obtained from [100]), and (3) Building C (Uxbridge, MA, approx. 55,000 sq ft, grocery store, dataset obtained from [100]). The datasets used in their experiments consisted of one-hour samples collected over a one-year period and contained meteorological data: temperature, humidity, air pressure, and wind speed. After data preprocessing (cleaning, segmentation, formatting, normalization, etc.), which kept only the weekday samples, the researchers divided the remaining data into 90% training data, 5% validation data, and 5% testing data. Several useful conclusions were drawn. The building size, occupancy, and peak load mattered significantly in the results of GCNN1 and GRNN1, improving the accuracy of load prediction. As the number of people in the building rises, the uncertainty caused by each individual’s behavior is averaged out, resulting in a more accurate prediction. Among GCNN1, GRNN1, and SARIMAX, the best performance was achieved by GCNN1, a slightly poorer one by GRNN1, and by far the worst by SARIMAX. In another experiment, GCNN24 outperformed GRNN24 and produced better results in accuracy (22.6% lower error compared to SARIMAX) and computational efficiency (8% faster compared to SARIMAX) than GCNN1, GRNN1, and SARIMAX, establishing the GCNN24 model as the most suitable, among the five, for short-term (day-ahead) building load forecasting. As a more general conclusion, the researchers stated that DL methods were better suited to load forecasting than previously used methods.
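The difference between the recursive strategy (GCNN1, GRNN1) and the direct strategy (GCNN24, GRNN24) can be sketched independently of the network used. In the sketch below both predictors are hypothetical placeholders (a three-hour mean and a repeat-last-day rule) standing in for trained models; only the control flow around them is the point.

```python
def one_step_model(history):
    """Hypothetical one-hour-ahead predictor: mean of the last three hours."""
    return sum(history[-3:]) / 3.0

def recursive_forecast(history, horizon=24):
    """Apply a one-step model `horizon` times, feeding each prediction back as input."""
    window = list(history)
    forecast = []
    for _ in range(horizon):
        nxt = one_step_model(window)
        forecast.append(nxt)
        window.append(nxt)  # later steps depend on earlier predictions
    return forecast

def direct_forecast(history, horizon=24):
    """Hypothetical direct predictor: emits the whole horizon in one call
    (here by repeating the last observed day; needs at least `horizon` samples)."""
    return list(history[-horizon:])

# Two days of synthetic hourly load.
history = [float(h % 24) for h in range(48)]
recursive = recursive_forecast(history)
direct = direct_forecast(history)
```

In the recursive strategy, errors made in early hours propagate into later hours because predictions re-enter the input window, whereas a direct model produces all 24 values from observed data only.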

In [101], Yuan Gao et al. in 2019 experimented with long-term (one year) building load forecasting and proposed an LSTM architecture with an additional self-attention network layer [102]. The proposed model emphasized the inner logical relations within the dataset during prediction. The attention layer was used to improve the ability of the model to convey and remember long-term information. The proposed model was compared to an LSTM model and a Dense Back Propagation Neural Network (DBPNN), and load forecasting accuracy was evaluated by MAPE. All three models were applied to a nonresidential office building in China. The dataset used in this research contained 12 attributes (weather, time, energy consumption, etc.) in daily measurements over a two-year period. The main conclusion of this research was that the proposed method was able to address the issue of long-term memory and conveyed information better than the other two architectures, outperforming LSTM by 2.9% and DBPNN by 6.5%.

Heidrich Benedikt et al. [103] in 2020 proposed a combination of standard energy load profiles and CNNs, and created a Profile Neural Network (PNN). The proposed architecture consisted of three different profile modules (standard load profile, trend, and colorful noise) and the utilization of CNNs, a combination which, according to the authors, had never been proposed before. In this scheme, CNNs were used as data encoders for the second (trend encoder) and third module (external and historical), in the prediction network of the third module (colorful noise calculation), and in the aggregation layer, where the results of the three modules were aggregated to perform load forecasting. The dataset used for the experiments was the result of merging two datasets: (a) one that contained historical load data gathered over a ten-year period from two different campus buildings (one with weak and one with strong seasonal variation) and (b) weather data obtained from Deutscher Wetterdienst (DWD) [104]. The merged dataset covered an eight-year period with one-hour resolution samples; 75% of the data was used for training and the remaining 25% for testing the models. In order to measure and better understand the performance of the PNN, the authors compared the results of four different variations of their model, differing in time window size (PNN0, PNN1 month, PNN6 month, and PNN12 month), to four state-of-the-art building load forecasting methods from the literature (RCFNet, CNN, LSTM, and Stacked-LSTM) and three naïve forecasting models (periodic persistence, profile forecast, and linear regression). All models were evaluated by RMSE and MASE metrics and tested in short-term (one day) and medium-term (one week) building load forecasting. All the PNN models, besides PNN0, outperformed the rest of the tested models, and among them, PNN1 achieved the best performance for both time horizons and both types of buildings. Regarding the training time, PNN models required the least time for both types of buildings for short-term forecasting but were outperformed by CNN in medium-term forecasting. According to the authors, the excess time needed, compared to the fastest model, bought much better accuracy and was thus an acceptable trade-off. The authors also concluded that the proposed model was flexible, since modules and encoders could be changed from case to case in order to achieve better results, and that it could also be used at a larger scale than a single building.
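As a point of reference for the naïve baselines and the MASE metric mentioned above, a minimal sketch follows. It uses the common definition of MASE (forecast MAE divided by the in-sample MAE of a seasonal naïve forecast); the exact variant used in [103] is not specified in this review, and the function names and sample data are hypothetical.

```python
def periodic_persistence(history, horizon, period):
    """Naive baseline: repeat the value observed one period earlier."""
    extended = list(history)
    for _ in range(horizon):
        extended.append(extended[-period])
    return extended[len(history):]

def mase(train, actual, predicted, period=1):
    """Mean absolute scaled error relative to a seasonal naive forecast on the training set."""
    naive_errors = [abs(train[t] - train[t - period]) for t in range(period, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    forecast_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return forecast_mae / scale

train = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]   # hypothetical load with period 2
actual = [4.0, 6.0]
baseline = periodic_persistence(train, horizon=2, period=2)
score = mase(train, actual, baseline, period=2)
```

A MASE below 1 indicates that a model is more accurate than the naïve forecast on which the scale is computed, which makes the metric comparable across buildings with different load levels.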

In [105], Sun et al. in 2020 introduced a novel deep learning architecture that combined input feature selection, through the MRMR (Maximal Relevance Minimal Redundancy) criterion based on Pearson’s correlation coefficient, with an LSTM–RNN architecture. The dataset used for the short-term forecasting experiments covered one year of historical load data (2017) for three different types of buildings (office, hotel, and shopping mall), obtained from the Shanghai Power Department, while the weather-related data were collected from a local weather forecast website. In order to establish a baseline and demonstrate the proposed model’s efficiency, the researchers conducted several experiments in which variations of the MRMR-based LSTM–RNN model competed against ARIMA, BPNN variations, and BPNN-SD variations, evaluated by RMSE and MAPE metrics. According to the results, the proposed model, and more specifically its two-time-step variation, outperformed all other models and provided the most accurate load forecasting results. The authors concluded that, due to the complexity of the building energy load prediction task, the right selection of input features played a key role and, in combination with a hybrid prediction model, could yield more accurate results.
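A greedy form of correlation-based MRMR selection can be sketched as follows. The scoring rule used here (absolute correlation with the target minus mean absolute correlation with the already selected features) is one common variant and an assumption, since the exact criterion of [105] is not given in this review; the feature names and values are synthetic.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def mrmr_select(features, target, k):
    """Greedily pick k features with high relevance and low redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max((n for n in features if n not in selected), key=score)
        selected.append(best)
    return selected

# Synthetic data: the second sensor nearly duplicates the first one.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
features = {
    "temperature": [20 + 5 * ai for ai in a],
    "temperature_sensor_2": [20 + 5 * ai + ci for ai, ci in zip(a, c)],
    "occupancy": [10 + 4 * bi for bi in b],
}
load = [50 + 3 * ai + 2 * bi for ai, bi in zip(a, b)]
chosen = mrmr_select(features, load, k=2)
print(chosen)  # ['temperature', 'occupancy']
```

Although the duplicate sensor is more strongly correlated with the load than occupancy is, it is passed over in the second round because it adds almost no information beyond the first selected feature.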

In [106], Gopal Chitalia et al. in 2020 presented their findings regarding deep learning architectures in short-term load forecasting, after experimenting on nine different DL models: Encoder–Decoder scheme, LSTM, LSTM with attention, Convolutional LSTM, CNN–LSTM, BiLSTM, BiLSTM with an attention mechanism, Convolutional BiLSTM, and CNN–BiLSTM. The main idea was that RNNs with an attention layer could produce more robust and accurate results. All the above models were tested on five different types of buildings on two different continents, Asia and North America. Four out of the five datasets used in this research can be found in [100, 107, 108], while the weather data were collected from [109]. The authors investigated short-term building load forecasting from several aspects, namely feature selection, data optimization, hyperparameter fine-tuning, learning-based clustering, and minimum dataset volume, and obtained acceptable accuracy. All DL architectures were evaluated by RMSE, MAPE, CV, and Root-Mean-Square Logarithmic Error (RMSLE), which provided a fair assessment of each building’s load forecasting results. The researchers concluded that the implementation of the attention layer in RNNs increased the load forecasting accuracy of the model and that such models could perform adequately across a variety of buildings, loads, locations, and weather conditions.

In January 2022, Xiao et al. [110] proposed an LSTM model to predict the day-ahead energy consumption. Two data smoothing methods, Gaussian kernel density estimation and the Savitzky-Golay filter, were selected and compared. Data used in that work were from the Energy Detective 2020 dataset [111], including hourly consumption data from 20 office buildings and weather data, from 2015 to 2017. The authors concluded that data smoothing could help enhance the accuracy of prediction in terms of CV-RMSE; however, when raw data were taken as the reference, the prediction accuracy decreased dramatically. A larger training set was recommended in the conclusions, provided that the computing cost was acceptable.
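The Savitzky-Golay filter fits a low-order polynomial to each sliding window and replaces the center sample with the fitted value. A minimal sketch for the classical 5-point quadratic case, whose convolution coefficients are (-3, 12, 17, 12, -3)/35, is shown below; the window length and polynomial order used in [110] are not stated in this review, and leaving the two edge samples on each side unsmoothed is a simplifying assumption.

```python
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)  # 5-point quadratic kernel, divided by 35

def savgol_5pt(series):
    """Smooth interior samples with the 5-point quadratic Savitzky-Golay kernel."""
    smoothed = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / 35.0
    return smoothed

# Hypothetical hourly load with a single spike at index 4.
spiky = [1.0, 1.0, 1.0, 1.0, 6.0, 1.0, 1.0, 1.0, 1.0]
smooth = savgol_5pt(spiky)

# A purely quadratic signal passes through the filter unchanged.
quadratic = [float(x * x) for x in range(10)]
unchanged = savgol_5pt(quadratic)
```

Because the filter attenuates peaks, a forecast trained and scored on smoothed data can look accurate while deviating from the raw measurements, which is consistent with the drop in accuracy the authors observed when raw data were taken as the reference.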

The main characteristics of the DL-based approaches of this section are summarized in Table 2.

4.3. Multiple Type of Buildings Load Forecasting

In [112], H. Shi et al. in 2018 introduced a pooling-based deep RNN architecture (PDRNN), boosted by LSTM units, for short-term household load forecasting. In the proposed PDRNN, the authors combined DRNN with a new profile pooling technique, utilizing neighboring household data to address overfitting and insufficient data in terms of volume, diversity, etc. There were two stages in the proposed methodology: load profile pooling and load forecasting through DRNN. The data used for model training and testing were obtained from the Commission for Energy Regulation (CER) in Ireland [113] and were collected from smart metering customer behavior trials (CBTs). The data covered a one-and-a-half-year time period (July 2009 to December 2010). The proposed method was compared to other state-of-the-art forecasting methods (ARIMA, RNN, SVR, and DRNN models) and was evaluated by RMSE, NRMSE, and MAE metrics. The researchers concluded that PDRNN outperformed the rest of the models, achieving better accuracy and successfully addressing overfitting issues.

In the same year, Aowabin Rahman et al. [114] proposed a methodology focused on medium- to long-term energy load forecasting. The authors examined two LSTM-based (S2S) architecture models with six layers. The contributions of this work were: (1) energy load consumption forecasting for a time period ranging from a few months up to a year (medium to long term); (2) quantification of the performance of the proposed models on various consumption profiles for load forecasting in commercial buildings and for the aggregated load at the small community scale; and (3) development of an imputation scheme for missing historical consumption values by the use of deep RNN models. Two datasets were used, each collected under a different protocol: (1) A Public Safety Building (PSB) in Salt Lake City (Utah, USA). This dataset was at one-hour resolution for a time frame of 448 days (one year, two months, and three weeks), covering the period from the 18th of May 2015 until the 8th of August 2016. The proposed architectures were tested on several load profiles with a combination of variables (weather, day, month, hour of the day, etc.). The first year of the dataset was used for training and the remainder (approximately 83 days) for testing. (2) A number (combinations of at most 30) of residential buildings in Austin (Texas, USA). This dataset was acquired from the Pecan Street Inc. Dataport Web Portal [72], at one-hour resolution for an approximately two-year period from January 2015 until December 2016. It included data for 30 individual residential buildings, and the load consumption forecasting was performed on the aggregated load. The first year of the dataset was used for training and the remaining time for testing. The experiments revealed that the prediction accuracy, for both models, was limited and highly affected by the weather. Moreover, if the training data greatly differed from the testing and future weather data, then a model that produced sufficient power load consumption predictions for a specific building could not be applied successfully to a different building. In addition, if major changes regarding occupancy, building structure, consumer behavior, or the installed appliances/equipment occurred in the specific building, the same model would have decreased accuracy. According to the authors’ findings, both proposed models performed better than a three-layer MLP model in commercial building energy load forecasting, but worse in one-year forecasting of the aggregated load of the residential buildings, with the MLP model performing even better as the total number of residential buildings increased. As a final remark, the researchers concluded that there was a lot of potential in the use of deep RNN models in energy load forecasting over medium- to long-term time horizons. It is worth mentioning that, besides the historical consumption data, the authors considered several other variables (day of the week, month, time of the day, use frequency, etc.) and weather conditions acquired from the Mesowest web portal [73].

In [115], Y. Pang et al. in 2019 proposed the utilization of a Generative Adversarial Network (GAN) in order to overcome the limited historical consumption data that most buildings can provide for training short-term load forecasting models. The researchers introduced the GAN-BE model, an LSTM unit-based RNN (LSTM-RNN) deep learning architecture, and experimented with different variations of it, with or without an attention layer. The data used for the experiments were collected from four different types of buildings: an office building, a hotel, a mall, and a comprehensive building. The different variations of the proposed model were compared to four LSTM variations and evaluated by MAPE, RMSE, and Dynamic Time Warping (DTW) metrics. The proposed model, with and without the attention layer, outperformed the other models, displaying better accuracy and robustness.

In [116], Khan et al. in 2020 developed a hybrid CNN with an LSTM autoencoder architecture (CNN with LSTM-AE) that consisted of ten layers, for short-term load forecasting in residential and commercial buildings. The load forecasting accuracy of the proposed model was compared (by MAPE, RMSE, MSE, and MAE metrics) to other DL schemes (CNN, LSTM, CNN-LSTM, LSTM-AE). Two datasets were used in this research: (1) a dataset from the UCI repository [53] and (2) a custom dataset regarding a Korean commercial building, recorded by a single sensor instead of the four used in the UCI dataset, sampled in a 15-minute window and comprising a total of 960,000 records. For this experiment, the first 75% of the dataset (three years) was used for training and the remaining 25% (one year) for testing. All models were tested, on both datasets, at hourly and daily resolution. The authors drew several conclusions from their research. When they tested the above DL models on the UCI dataset at hourly resolution, they discovered that some cross combinations among them produced better results than each one of them individually. The latter inspired them to develop the proposed model, which outperformed all the above tested DL models. They also experimented using the same dataset at daily resolution, and the proposed model again achieved the best forecasting accuracy. In the next step of their research, they tested their model using their own dataset at hourly and daily resolution. Their model produced less accurate results than the LSTM and LSTM-AE models at hourly resolution but outperformed all other models at daily resolution. The general conclusion of their research was that the proposed hybrid model performed better during the experiments, especially at daily resolution, compared to other DL and more traditional building load forecasting methods.

A kCNN-LSTM deep learning framework was proposed in [117]. The proposed model combined k-means clustering for analyzing energy consumption patterns, CNNs for feature extraction, and an LSTM-NN to deal with long-term dependencies. The method was tested with real-time energy data of a four-story academic building, containing more than 30 electrical-related features. The performance of the model was assessed in terms of MAE, MSE, MAPE, and RMSE for the considered year, weekdays, and weekends. The authors observed that the proposed model provided accurate energy demand forecasting, which they attributed to its ability to learn the spatiotemporal dependencies in the energy consumption data. The kCNN-LSTM was compared to k-means variants of state-of-the-art energy demand forecast models and showed better performance in terms of computational time and forecasting accuracy.

In the same year, Lei et al. [118] developed an energy consumption prediction model based on rough set theory and a deep belief NN (DBN). Data collected from 100 civil public buildings (office, commercial, tourist, science, education, etc.) were used for rough set reduction, and data from a laboratory building were used to train and test the DL model. The public building data covered five months and comprised 20 inputs in total. The laboratory building data comprised fewer than 20 energy consumption inputs, collected for approximately a year, including building consumption and meteorological data. Short-term and medium-term predictions were included. The prediction results, in terms of MAPE and RMSPE, were compared to those of a back-propagation NN, an Elman NN, and a fuzzy NN, revealing higher accuracy in all cases. The authors concluded that rough set theory was able to eliminate unnecessary factors affecting building energy consumption and that the DBN with a reduced number of inputs achieved improved prediction accuracy.

In [119], Khan et al. introduced a hybrid model, DB-Net, by incorporating a dilated CNN (DCNN) with a bidirectional LSTM (BiLSTM). The proposed method used a moving average filter for noise reduction and handled missing values via the substitution method. Two energy consumption datasets were used: the IHEPC dataset [53], consisting of four years of energy data (three years for training and one year for testing), and the Korean dataset of the Advanced Institutes of Convergence Technology (AICT) [120] for commercial buildings, consisting of three years of energy data (two years for training and one year for testing). The proposed DB-Net model was evaluated using the MAE, MSE, RMSE, and MAPE error metrics and compared to various ML and DL models. It outperformed the referenced approaches, forecasting multi-step power consumption (hourly, daily, weekly, and monthly output) with higher accuracy. However, the method was limited by the fixed-size input data and the use of the invariance time-series data in a supervised sense. As future work, the authors suggested applying several alternative methods to boost the performance of the model, using more challenging datasets, and adopting more dynamic learning approaches.
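
The two preprocessing steps mentioned above, missing-value substitution and moving average filtering, can be illustrated with a short sketch. The source does not specify which substitution rule or window length was used in [119], so the choices below (carrying the previous valid reading forward and a trailing window of three samples) are assumptions, and the helper names are hypothetical.

```python
def fill_missing(values):
    """Replace None with the previous valid reading (the first valid one for leading gaps)."""
    first_valid = next((v for v in values if v is not None), None)
    if first_valid is None:
        raise ValueError("series contains no valid readings")
    filled, last = [], first_valid
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def moving_average(values, window=3):
    """Trailing moving average; early positions average over the samples seen so far."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


raw = [2.0, None, 4.0, 9.0, None, 3.0]   # hypothetical readings with gaps and a spike
clean = moving_average(fill_missing(raw), window=3)
```

A longer window suppresses more noise but also flattens genuine load peaks, so the window length is usually treated as a tunable preprocessing parameter.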

Wang et al. [121] proposed a DCNN based on ResNet for hour-ahead building load forecasting. The main contribution of their work was the design of a branch that integrated the hourly temperature into the forecasting branch. The learning capability of the model was enhanced by an innovative feature fusion. The building data genome project dataset [122] was adopted, which includes load and weather conditions of nonresidential buildings; the focus was on two laboratories and an office. Five DL models were considered for comparison. Comparison results for single-step and 24-step building load forecasting revealed that the proposed DCNN could provide more accurate forecasting results, higher computational efficiency, and stronger generalization across different buildings.

In January 2022, Jogunola et al. [123] introduced an architecture, named CBLSTM-AE, comprising a CNN and an autoencoder (AE) with bidirectional LSTM (BLSTM). The effectiveness of the proposed architecture was tested with the well-known UCI dataset, IHEPC [53], and the Q-Energy [124] platform dataset was used to further evaluate the generalization ability of the proposed framework. A private part of the Q-Energy dataset was used, covering two small-to-medium enterprises (SME), a hospital, a university, and residences. The time resolution of both datasets was converted to 24 hours for short-term consumption prediction. The IHEPC data were further used to compare the proposed method with state-of-the-art frameworks. The proposed model achieved lower MSE, RMSE, and MAE and improved computational time compared to the other models: LSTM, GRU, BLSTM, Attention LSTM, CNN-LSTM, and the electric energy consumption prediction model based on CNN and BLSTM (EECP-CBL). The results demonstrated good generalization ability and robustness, providing an effective prediction tool over various datasets.

In February 2022, the most recent research on energy consumption forecasting to date was presented by Sujan Reddy et al. in [125]. The authors proposed a stacking ensemble model for short-term load consumption forecasting. ML and DL models (RF, LSTM, DNN, evolutionary trees (EvTree)) were used as base models. Their prediction results were combined using Gradient Boosting (GBM) and Extreme Gradient Boosting (XGB). Experimental observations on the combinations revealed two different ensemble models with optimal forecasting abilities. The proposed models were tested on a standard dataset [126], available upon request, containing approximately 500,000 load consumption values at periodic intervals for over 9 years. Experimental results pointed to the XGB ensemble model as the optimal one, with reduced training time and higher accuracy compared to the state of the art (EvTree, RF, LSTM, NN, ARMA, ARIMA, the ensemble model of [126], the feed-forward NN (FNN–H20) of [127], and the DNN-smoothing of [127]). Five regression measures were used: MRE, R-squared, MAE, RMSE, and SMAPE. A reduction of 39% in RMSE was reported.
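
Since the same error measures recur throughout this section, a minimal reference sketch of the most common ones is given below. The definitions follow the usual textbook forms; SMAPE in particular has several variants in the literature, and the one shown (denominator equal to the mean of the absolute actual and forecast values) is an assumption rather than the definition used in any specific referenced paper. MRE is omitted because its definition varies between papers. The sample values are hypothetical.

```python
from math import sqrt


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mape(y, p):
    # Undefined when an actual value is zero, which matters for low-load periods.
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)


def smape(y, p):
    # One common variant; the denominator is the mean of |actual| and |forecast|.
    return 100.0 * sum(abs(a - b) / ((abs(a) + abs(b)) / 2) for a, b in zip(y, p)) / len(y)


def r_squared(y, p):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))   # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in y)         # total variation
    return 1.0 - ss_res / ss_tot


actual = [10.0, 20.0, 30.0, 40.0]      # hypothetical loads (kWh)
forecast = [12.0, 18.0, 33.0, 37.0]

scores = {
    "MAE": mae(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
    "SMAPE": smape(actual, forecast),
    "R2": r_squared(actual, forecast),
}
```

MAE and RMSE are expressed in the units of the load, whereas MAPE and SMAPE are percentages, so only the latter are directly comparable across buildings of different size.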

The main characteristics of the DL-based approaches of this section are summarized in Table 3.

5. Datasets

The dataset is the key element of all deep learning methods. For a model to be trained to produce useful results, the dataset should be selected carefully, and the user has to weigh which features of each dataset to use against the results that the model produces. In the problem of building load forecasting, the existing literature relies on a limited number of datasets, mostly acquired from the building under investigation. Data collection is labor intensive and presupposes a metering infrastructure installed in the buildings for effective energy consumption monitoring. Moreover, historical data of several years is usually necessary. In most research papers, the datasets comprise consumption history data (thousands of samples covering a time period of over a year), focusing on major power-consuming devices/appliances (kitchen, water heater, HVAC, etc.) and different load profiles. In some research papers, the authors considered the weather conditions, but the required data were not part of the same dataset as the consumption history and had to be acquired from different sources. In some experiments, driven mostly by the results of each methodology, researchers added weather conditions and cast the time-series data into categories such as weekday, weekend, and hour of the day, thereby achieving more promising results. Global and local climate change as well as urban overheating can seriously affect the energy consumption of urban buildings, making weather datasets unreliable over the years. So far, research has focused on testing different deep learning models, sometimes on the same dataset, in order to determine comparatively which model provides better results.
It should be noted here that approximately 48.2% of the papers on residential and multiple building load forecasting reviewed in this work use the same dataset. Towards this end, research efforts are focusing on load forecasting based on limited input variables, which would additionally lead to less computationally complex models appropriate for real-time applications. An interesting observation regarding the datasets is that they can be used efficiently only for the building from which they were acquired; any effort to adapt them to a different building does not produce the desired results. This limitation in generalization is a major drawback that needs to be addressed in future research in the field. Reliable forecasting models for varying data, building types, locations, weather, and load distributions need to be developed. The lack of detailed datasets for numerous buildings could be addressed by the rapid growth of the Internet of Things (IoT) and the growing capability of the research community to make use of and better comprehend Big Data. The evolution of the home/building and the grid to the “smart home/building” and the “smart grid”, through the deployment of sensors and actuators (IoT), will provide researchers with a vast amount of data (volume), rich in features (variety), and in almost real time (velocity), better described as Big Data.
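
The calendar categories mentioned in this section (weekday, weekend, hour of the day) are straightforward to derive from the timestamp of each reading. The following is a minimal sketch using only the Python standard library; the function name and the feature names are hypothetical and do not correspond to any specific referenced paper.

```python
from datetime import datetime


def calendar_features(timestamp):
    """Derive simple categorical inputs from a reading's timestamp."""
    weekday = timestamp.weekday()          # Monday = 0 ... Sunday = 6
    return {
        "hour": timestamp.hour,
        "weekday": weekday,
        "is_weekend": weekday >= 5,
    }


# Hypothetical reading taken on Saturday, 1 January 2022 at 18:30.
features = calendar_features(datetime(2022, 1, 1, 18, 30))
```

Such features require no additional data source, which makes them a low-cost complement to consumption history when weather data are unavailable.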

6. Discussion

Building load forecasting is an emerging area of building performance simulation (BPS) that is technically complex and of major significance to a variety of stakeholders, since it supports future operational and energy efficiency improvements in existing buildings [128]. Deep learning models have entered the load forecasting field in recent years due to their ability to deal with big data and achieve high forecasting accuracy. The review of the relevant literature on building load forecasting with deep learning methods revealed several interesting findings. To date, most DL models have been applied to residential buildings (47.5%). Residential buildings account for almost 70% of total energy consumption [129]. The increase in population and in floor area per person in urban cities has resulted in an increase in residential energy consumption. This increase motivated the research community to investigate further the energy load forecasting of residential buildings, so as to account for the spent energy and propose energy conservation measures and future green policies. Furthermore, most DL models were applied to a short-term forecasting horizon (55%), e.g., a day or an hour ahead. Short-term forecasting may lead to more accurate results, since a longer forecasting horizon significantly increases the possibility of alterations in the input data that are not known beforehand and can severely impact the forecasting accuracy. The most popular architecture in the literature was found to be the LSTM model. LSTM models provide a great number of parameters, e.g., learning rates and input and output biases; thus, they do not require fine adjustments. Although the results of DL models appear promising, many challenges need to be addressed, mainly related to data availability and improvements to the DL models.

The human factor is one of the most defining factors that add to the difficulty of the building load forecasting problem. On the small scale of a building with a number of offices or flats, or even on the scale of a single household, human behavior can challenge even the most efficient DL load forecasting methods. Several researchers pointed out this problem in their work and tried to handle it by aggregating the load of several homes before forecasting. The larger the scale of the researched structure, or the number of people working or living in it, the smaller the impact on forecasting, as individual human behavior averages out into a more general behavior that is easier to predict.
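
The smoothing effect of aggregation can be illustrated with a small simulation. The sketch below uses synthetic, statistically independent household loads, which is an assumption that overstates the effect, since real households share weather and daily routines and are therefore partly correlated. The coefficient of variation (standard deviation divided by the mean) is used here as a simple, hypothetical proxy for how erratic a load series is.

```python
import random
from statistics import mean, pstdev


def coefficient_of_variation(series):
    """Standard deviation relative to the mean; lower means a steadier load."""
    return pstdev(series) / mean(series)


random.seed(0)  # fixed seed so the illustration is repeatable
hours, homes = 500, 50
# Synthetic, independent household loads (kW); real households are partly correlated.
households = [[random.uniform(0.1, 3.0) for _ in range(hours)] for _ in range(homes)]

# Total load of all homes at each hour.
aggregate = [sum(home[t] for home in households) for t in range(hours)]

single_cv = coefficient_of_variation(households[0])
aggregate_cv = coefficient_of_variation(aggregate)
```

Under these assumptions the aggregated series is several times steadier in relative terms than any single household, which is consistent with the observation that individual behavior matters less as the scale grows.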

It is also important to point out that the first DL models utilized in building load forecasting were plainer than those that followed; as the research progressed, more complex schemes, DL model combinations, and hybrid models appeared that produced more efficient and accurate results. In general, guidelines for developing and testing a DL model are missing from the literature. For example, trial and error was applied in many cases for tuning the hyperparameters, resulting in methodologies that cannot be reproduced easily. Moreover, the models trade off high accuracy against training time and model complexity. In most of the referenced cases, the more computationally complex the model, the more accurate it is reported to be; however, this complexity increases training time, making the models inappropriate for real-time applications.

In general, building energy models need to be improved so as to represent the actual performance of the building in more detail. The solution lies in model calibration techniques, which calibrate several inputs to the existing building simulation programs. Calibration could significantly improve the performance of the energy models; however, simulation accuracy is determined by multiple parameters, namely the measured building energy data supplied as calibration inputs. The collection of detailed data may require extensive time and cost. Therefore, another challenge that researchers had to face, as already mentioned, is the lack of detailed datasets. Additionally, the absence of publicly available datasets obstructs the reproduction of results and comparative studies. In a great number of research papers, in order to explore the impact of different features and to enhance the prediction accuracy by producing more robust and generalized models, researchers had to combine different datasets or process the existing data in several different ways. Once the lack of datasets is addressed properly, the greater challenge in this field will be the development of a DL model, or the combination of existing DL models, that can be applied to several different types of buildings (office, residential, academic, etc.), use detailed real-time data, adjust itself automatically, and produce accurate and applicable results for the energy industry towards efficient energy management.

7. Conclusion

The application of deep learning methods to forecasting the electrical load of buildings is a subject that first appeared in 2016 and has since attracted steadily growing interest from researchers. This trend is probably due to the promising results of the relevant research work compared to alternative existing methods. Several useful conclusions emerged from this literature review regarding the engagement of the scientific community with the subject to date. Research revealed a higher interest in residential building load forecasting, covering 47.5% of the referenced literature, mainly towards short-term forecasting, in 55% of the papers. The latter was attributed to the lack of available public datasets for experimentation in different building types, since it was found that in 48.2% of the related literature the same historical data regarding residential building load consumption was used. Despite the several challenges encountered, researchers proved to be resourceful and resilient in their work and proposed or utilized several new or pre-existing methods to address most of the issues confronted on the way. The advancement of technology and the decrease in the price of the hardware needed to manage the vast amount of data in DL applications also contributed to the wider application of DL methods. To conclude, considering the research published to date, DL models produce accurate and promising results regarding building load forecasting, outperforming almost all traditional forecasting methods, such as physics-based and statistical models. Most of the researchers concluded that further testing of their models, with different datasets and more features, would yield more accurate results.
The latter can be addressed by the Internet of Things (IoT) and smart sensors embedded in the grid, upgrading it to “smart”, paving the way for future research work.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

This work was supported by the MPhil program “Advanced Technologies in Informatics and Computers,” hosted by the Department of Computer Science, International Hellenic University, Greece.