Abstract

For building energy consumption patterns with complex scale sensitivity, it is difficult to achieve ideal prediction performance with a single-granularity prediction model. Therefore, this paper proposes a multigranularity MgHa-LSTM model based on convolutional recurrent neural networks, comprising a multigranularity feature extraction module and a long-term dependency capture module. The multigranularity feature extraction module includes granularity segmentation, a feedback mechanism, and parallel convolutional modules, which capture short-term scale-sensitive dependencies. The long-term dependency capture module consists of a hybrid attention mechanism and long short-term memory layers, which capture long-term dependencies. For building energy consumption patterns with different scale sensitivities, MgHa-LSTM, MLP, CNN, LSTM, and MsC-LSTM models were constructed on the IHEPC building energy consumption dataset for comparative experiments. The experimental results showed that, on the IHEPC dataset, the MSE of the proposed MgHa-LSTM model is 0.2821, which is 93.72% of the MSE of the MsC-LSTM model, the smallest among the other deep learning prediction models. Compared with other deep learning prediction models, the predictions of the MgHa-LSTM building energy consumption prediction model are more accurate.

1. Introduction

In the face of carbon emissions and climate issues, the 2021 National Two Sessions clearly pointed out the importance and significance of “emission peak and carbon neutrality,” which was written into the government work report for the first time and listed as one of the key tasks for 2021. Reaching peak carbon emissions and achieving carbon neutrality is an extensive, complex, and far-reaching systemic economic and social change [1]. The “14th Five-Year Plan” pointed out that the country needs to accelerate green and low-carbon development; green buildings are one of its green and low-carbon plans, and the development of green buildings and building energy conservation is an important part of it [2].

Reports on building energy consumption at home and abroad show that building energy consumption occupies a high proportion of the total energy consumption of various countries. In China, the energy consumption of buildings was 947 million tons of standard coal equivalent in 2017, accounting for 21.11% of the country's total energy consumption. According to a report by the Building Energy Conservation Research Center of Tsinghua University, building energy consumption has reached 1 billion tons of standard coal equivalent, about 22% of China's total energy consumption. Abroad, the economies of developed countries grew by an average of 1.7% in 2019, while carbon emissions related to energy consumption fell by 3.2%; among the contributors, electricity-related industries made a large contribution to energy conservation and emission reduction, accounting for 36% of total carbon emissions. These data show that building energy consumption accounts for a relatively large proportion of total energy use and has considerable potential for energy conservation and emission reduction. Existing research has made important achievements in building energy conservation, such as building envelopes, building energy consumption prediction, and green buildings [3–5].

For the task of building energy consumption prediction, historical energy consumption data collected by many energy agencies and indoor and outdoor environmental data collected by sensors have been integrated into specific datasets for research [6]. In addition, since the proposal of deep learning methods [7] and their development over the years, deep learning has been proven to perform well in image classification, target recognition, time series prediction, natural language processing, and other tasks. With the continuous improvement of computing power [8] and the enrichment of relevant data, predicting building energy consumption with deep learning algorithms is a reasonable scheme [9]. At present, building energy consumption data contain outliers that affect prediction performance, and deep learning models [10] rarely consider feature representations with different scale sensitivities, leaving model performance stagnant.

The main contributions of this paper are as follows.
(1) In the MgHa-LSTM (Multigranularity Hybrid Attention LSTM) model, granularity segmentation and a feedback mechanism are proposed, and parallel convolutional layers are used to extract short-term features of building energy consumption, capturing its short-term scale sensitivity.
(2) The MgHa-LSTM model uses a hybrid attention mechanism to integrate deep feature maps with different granularities and uses a recurrent layer to capture the long-term temporal correlation of building energy consumption.
(3) Based on the above two points, the MgHa-LSTM model achieves better prediction performance than other models on the IHEPC dataset.

2. Related Work

Existing machine learning methods are prone to overfitting when the correlations among input variables are complex or the amount of data increases.

Deep learning has been actively researched in various application fields with good results and has been involved in many hot research areas. In speech recognition [11] and natural language processing [12], recurrent neural networks [13] perform well; in image recognition, convolutional neural networks are superior to conventional methods [14]. Training a convolutional neural network mainly involves learning the weights of the convolution kernels on the feature maps of each layer and extracting abstract visual features, such as points, lines, and surfaces, from the input data; the relationships between pixels in an image can be learned through the convolution kernel weights. During training, recurrent neural networks store long-term information in memory for proper processing, representation, and storage, and this hidden long-term information is updated as the number of iterations increases to ensure that it persists.

Fumo and Rafe Biswas conducted multiple linear regression analysis and quadratic regression analysis on hourly data of a research institution, explored the improvement that outdoor temperature and solar radiation bring to the performance of a multilayer perceptron model, and confirmed that the temporal resolution of historical data affects the performance of the prediction model [15]. Amber et al. constructed a multiple regression model and a genetic programming model for the daily electricity consumption of the administration building of the South Bank residential area of London South Bank University; both models used environmental temperature, solar radiation, relative humidity, and wind speed as independent variables [16]. Chen et al. proposed a prediction model based on the ResNet network and improved the ResNet network to cope with the nonlinear changes in building energy consumption. Validation on three datasets confirmed the effectiveness of the model, but the computational burden was heavy [17]. Wu et al. fused the features of multimodal data as a rich information source for reliable analysis [18]. Muralitharan et al. proposed a neural network-based genetic algorithm (NNGA) and a neural network-based particle swarm optimization (NNPSO) algorithm, in which NNGA performed better in short-term building energy consumption prediction, while the NNPSO algorithm was more suitable for long-term building energy consumption prediction. Through verification on real-time data provided by Pecan Street Inc., the performance of NNGA was superior to other comparison models such as the CNN model [19]. Li et al. constructed building energy consumption prediction models with stacked autoencoders and extreme learning machines, using the former to extract input variable features and the latter to predict energy consumption, obtaining the best performance compared with several popular machine learning methods [20]. Khan et al. proposed a new prediction framework based on CNN and LSTM. The framework utilizes a unique CNN model to extract multi-time-scale features that incorporate short-, medium-, and long-term dependencies in time-series data. According to the experimental results, the proposed framework achieves state-of-the-art performance in comparison with other existing methods [21]. Kim and Cho constructed a CNN-LSTM neural network combining convolutional and recurrent modules to predict residential energy consumption; the convolutional modules were used to obtain correlations between multiple variables, while the recurrent modules were used to obtain temporal correlations [22]. Massaoudi et al. proposed a hybrid convolutional neural network-bidirectional long short-term memory network model (SG-CBiLSTM) based on Evolution Strategy (ES) and Savitzky-Golay (SG) filters. The SG filter was used to smooth the energy consumption data to improve the learning ability of the model, and the ES was used to optimize the structure of the SG-CBiLSTM model. Validated on the ISO New England dataset, the performance of this model was significantly improved over the original CBiLSTM model and was also higher than that of other models [23]. Chen et al. proposed a multistep prediction model based on a BiGRU neural network with an attention mechanism to solve the long-term prediction problem [24]. Sehovac et al. proposed a novel energy consumption prediction method based on sample generation and a sequence-to-sequence (Seq2Seq) deep learning algorithm, which employed recurrent neural networks to capture temporal correlations.
Experiments showed that the proposed model (GRU-Seq2Seq) outperformed LSTM-Seq2Seq, RNN-Seq2Seq, and other neural network models [25]. Son and Kim constructed an LSTM model to predict the electricity consumption of a house in South Korea from 1991 to 2012 and introduced social characteristics such as electricity price and population into the prediction model to improve its predictive ability [26]. Fan et al. constructed a transfer learning building energy consumption prediction model and validated it on a dataset of 507 nonresidential buildings, randomly selecting 407 buildings as the source domain and 100 buildings as the target domain; compared with the prediction model without transfer learning, its root mean square error was reduced by 34% [27].

Our proposed framework differs from the abovementioned networks in the following ways.
(1) Most existing building energy consumption prediction methods are CNN-based, whereas our model adopts an LSTM-based approach. We innovatively propose granularity segmentation and a feedback mechanism in the model to capture the short-term scale sensitivity of building energy consumption. Our model addresses the fact that building energy consumption data mix different scales, such as the different usage frequencies and durations of different electrical appliances.
(2) Our model uses a hybrid attention mechanism to integrate deep feature maps with different granularities and uses a recurrent layer to capture the long-term temporal correlation of building energy consumption.

3. Methodology

The MgHa-LSTM model consists of a multigranularity feature extraction module and a long-term dependency capture module. From a system perspective, the model takes historical energy consumption data intercepted by a time window as input. It first segments the data to generate multiple batches of data of different granularities and sends them to the multigranularity feature extraction module to extract composite features and capture different short-term scale-sensitive dependencies; the resulting deep feature maps are fused through the attention mechanism and sent to the recurrent network layer to obtain long-term temporal dependencies; finally, the energy consumption value is predicted through the fully connected layer. Figure 1 shows the complete workflow of the framework.

3.1. Short-Term Scale Sensitivity Capture of Building Energy Consumption

This section mainly introduces the structure of short-term sensitivity capture in the proposed MgHa-LSTM model, including granularity segmentation, the feedback mechanism, and parallel feature extraction.

The core of the multigranularity feature extraction module is the parallel convolutional module. The module contains two convolutional layers, two batch normalization layers, two pooling layers, and two activation functions. The input is the fusion feature map, and the output is the deep feature map corresponding to each granularity in the current iteration.

When a building consumes energy, various appliances turn on and off at different times and are used for different durations each day. Nevertheless, there is a certain pattern to electricity consumption, namely a combination of multiple scale sensitivities. A single convolutional structure can extract short-term temporal and feature correlations only at the scale of its single convolution kernel. Therefore, by applying parallel convolutional structures to multigranularity data, representations of different scale sensitivities can be obtained with similar structures.

The short-term sensitivity capture procedure is shown in Figure 2. The model receives raw input data but does not use all historical energy consumption data. On the one hand, the length of historical energy consumption data differs across periods, which would complicate training; on the other hand, not all historical energy consumption data are meaningful, so there is much redundant information. Therefore, a sliding window was used for prediction. The model divides the raw data in the window to generate multiple batches of data with different granularities, sends the data to a parallel convolution module for deep feature extraction, and then passes the result to subsequent structures for long-term correlation capture.

3.1.1. Granularity Segmentation

In terms of building energy consumption, the energy consumed in the current period equals the sum of the energy consumed by all electrical appliances in that period. However, various appliances have different time-scale sensitivities depending on their duration, frequency, and mode of use. Therefore, feature extraction at a single scale would ignore other potential energy consumption modes: a fine-grained scale helps focus on the details of energy consumption over a short period while ignoring the trend over a long period, whereas a coarse-grained scale focuses on energy consumption trends over a longer period while ignoring short-term details. This paper proposes granularity segmentation of the raw building energy consumption data; multiple batches of data with different granularities are generated and fed into the parallel feature extraction module, which can learn feature representations at multiple temporal resolutions.

The input data are historical energy consumption data within a fixed-length window, which are first standardized and then undergo granularity segmentation. Since model training, validation, and testing are all supervised learning, each set of inputs has a corresponding output, and in each iteration the model processes one set of inputs and one set of outputs. Granularity segmentation is performed on the input data, and the number of granularity batches is a hyperparameter of the model. The value used in this paper is 3, so the following description is based on this setting.

For each iteration, the input data took 64 as the window length to segment the historical data, so as to obtain the energy consumption feature vectors of the latest 64 moments, denoted as $X = \{x_{t-63}, x_{t-62}, \ldots, x_t\}$, where $x_i$ represents the feature vector at time $i$, and $t$ represents the current time. Since the granularity segmentation batch hyperparameter was set to 3, the three batches of data are named $X_c$, $X_m$, and $X_f$, corresponding to coarse-grained, middle-grained, and fine-grained, respectively.

The input data for the coarse-grained batch were defined by averaging the feature vectors of every 4 consecutive moments, namely

$$X_c = \{c_1, c_2, \ldots, c_{16}\}, \quad c_j = \mathrm{mean}\left(x_{t-64+4j-3},\, x_{t-64+4j-2},\, x_{t-64+4j-1},\, x_{t-64+4j}\right) \tag{1}$$

In the formula, $\mathrm{mean}$ represents the mean, so $c_1 = \mathrm{mean}(x_{t-63}, x_{t-62}, x_{t-61}, x_{t-60})$, and so on.

The input data for the middle-grained batch were defined by averaging the feature vectors of every 2 consecutive moments over the latest 32 moments, namely

$$X_m = \{m_1, m_2, \ldots, m_{16}\}, \quad m_j = \mathrm{mean}\left(x_{t-32+2j-1},\, x_{t-32+2j}\right) \tag{2}$$

where $m_1 = \mathrm{mean}(x_{t-31}, x_{t-30})$, and so on.

The fine-grained input is defined as the latest 16 raw feature vectors:

$$X_f = \{x_{t-15}, x_{t-14}, \ldots, x_t\} \tag{3}$$

Here, $X_c$ and $X_m$ are the data obtained by averaging the original data. $X_c$ covers the shallow features of the latest 64 moments, $X_m$ covers the shallow features of the latest 32 moments, and $X_f$ contains the original features of the latest 16 moments; $X_c$, $X_m$, and $X_f$ are all 16 moments in length.

That is, to obtain fixed-length historical energy consumption data at 1-hour, 2-hour, and 4-hour resolutions, the 1-hour granularity requires only the latest 16 hours of data, the 2-hour granularity requires 32 hours of data, and the 4-hour granularity requires 64 hours of data. Sixty-four hours of data may seem large, but it is still small relative to the entire dataset.
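To make the segmentation concrete, the following is a minimal sketch of the split described above, assuming input windows of shape (batch, 64, features) in PyTorch; the function name and tensor layout are illustrative assumptions rather than the paper's reference implementation.

```python
import torch

def granularity_split(x: torch.Tensor):
    """Split a (batch, 64, features) window into coarse, middle, and fine
    batches, each 16 steps long, following Equations (1)-(3)."""
    b, t, f = x.shape
    assert t == 64, "the sliding window length is fixed to 64"
    # Coarse: mean of every 4 consecutive steps over all 64 steps.
    coarse = x.reshape(b, 16, 4, f).mean(dim=2)
    # Middle: mean of every 2 consecutive steps over the latest 32 steps.
    middle = x[:, -32:].reshape(b, 16, 2, f).mean(dim=2)
    # Fine: the latest 16 raw steps.
    fine = x[:, -16:]
    return coarse, middle, fine
```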

3.1.2. Feedback Mechanism

The feedback mechanism feeds the deep feature map extracted in the previous training iteration back into the input features of the next iteration and fuses the two features in a certain way. The composite features are then extracted by the deep feature extraction module of the current iteration, the resulting deep feature map is fed back to the next iteration, and so on. The feedback mechanism is shown in Figure 3.
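In pseudocode form, the feedback loop can be sketched as below; `extract_deep`, `upsample`, and `fuse` stand in for the modules described in Sections 3.1.3, 3.1.4, and 3.1.5 and are hypothetical names.

```python
# Sketch of the feedback mechanism across training iterations.
prev_deep = None
for shallow in window_iterator:                  # one input window per iteration
    if prev_deep is not None:
        # Fuse last iteration's deep map with this iteration's shallow input.
        shallow = fuse(upsample(prev_deep), shallow)
    deep = extract_deep(shallow)                 # parallel convolution module
    prev_deep = deep.detach()                    # fed back to the next iteration
```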

3.1.3. Upsampling of Deep Features

In each iteration of model training, since the deep feature extraction module contains pooling layers, the obtained deep feature maps are of reduced size. Before a deep feature map is fed back to the next iteration and fused with the shallow features, the sizes of the deep feature map from the current iteration and the shallow feature map of the next iteration must be equal. Therefore, an upsampling operation on the deep feature map is required in the current iteration. In the convolution operation of the convolutional layer, the padding hyperparameter is set to match the size of the convolution kernel; for example, if the convolution kernel size is $3$, the padding is 1, and if it is $5$, the padding is 2. Therefore, the length and width of the shallow input features are integer multiples of those of the deep feature map. The upsampling operation is shown in the following:

$$\tilde{D}_{t-1} = \mathrm{upsample}\left(D_{t-1}\right), \quad D_{t-1} \in \mathbb{R}^{l \times w},\ \tilde{D}_{t-1} \in \mathbb{R}^{4l \times 4w} \tag{4}$$

Among them, $t-1$ represents the previous iteration, $l$ represents the length of the feature map in the time direction, and $w$ is the width of the feature map in the variable direction. Since two pooling layers are used in the deep feature convolution module of the model in this paper, the length and width of the deep feature map must be enlarged by a factor of 4. Upsampling is performed by nearest-neighbor interpolation.
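In PyTorch, this nearest-neighbor enlargement can be written in one call; the 4x factor matches Equation (4), and the function name is an illustrative assumption.

```python
import torch.nn.functional as F

def upsample_deep(deep_map):
    """Enlarge a (batch, channels, length, width) deep feature map by 4x in
    both directions with nearest-neighbor interpolation, as in Equation (4)."""
    return F.interpolate(deep_map, scale_factor=4, mode="nearest")
```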

3.1.4. Fusion of Deep Feature Maps and Shallow Feature Maps

The coarse-grained deep feature map from the previous iteration is fused with the finer-grained shallow feature map of the current iteration to extract composite features, and this is done once between each pair of adjacent granularities. The operation is shown in the following:

$$F_m = \Phi\left(\tilde{D}^{\,c}_{t-1},\, X_m\right), \quad F_f = \Phi\left(\tilde{D}^{\,m}_{t-1},\, X_f\right) \tag{5}$$

Among them, $F_m$ and $F_f$ represent the fusion feature maps, $\tilde{D}^{\,c}_{t-1}$ and $\tilde{D}^{\,m}_{t-1}$ represent the coarse-grained and middle-grained deep feature maps upsampled from the previous iteration, and $\Phi$ represents the fusion method, which has three operation schemes: stack, concat, and attention. The operation can be seen in the following:

$$\Phi \in \{\mathrm{stack},\ \mathrm{concat},\ \mathrm{attention}\} \tag{6}$$

The stack operation concatenates the deep feature map and the shallow feature map in the channel dimension; the concat operation concatenates them in the feature dimension; and the attention operation fuses the deep and shallow feature maps using an attention mechanism.
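A minimal sketch of the three fusion schemes of Equation (6) is given below, assuming both maps have the layout (batch, channels, length, width) and matching sizes; the parameter-free attention variant is a simplified stand-in for the learned attention fusion.

```python
import torch

def fuse(deep_up, shallow, mode="stack"):
    """Fuse an upsampled deep map with a shallow map (Equations (5)-(6))."""
    if mode == "stack":       # concatenate along the channel dimension
        return torch.cat([deep_up, shallow], dim=1)
    if mode == "concat":      # concatenate along the feature dimension
        return torch.cat([deep_up, shallow], dim=3)
    if mode == "attention":   # simplified softmax-weighted sum of the two maps
        scores = torch.stack([deep_up.mean(), shallow.mean()])
        w = torch.softmax(scores, dim=0)
        return w[0] * deep_up + w[1] * shallow
    raise ValueError(f"unknown fusion mode: {mode}")
```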

3.1.5. Multigranularity Feature Extraction

The core of the multigranularity feature extraction module is the parallel convolution module, which contains two convolutional layers, two batch normalization layers, two pooling layers, and two activation functions. The input is the fusion feature map, and the output is the deep feature map corresponding to each granularity in the current iteration.

Convolutional layers can extract local features with a small number of weights. In the building energy consumption data, the input is a feature map with 1 channel, whose length is the time window length and whose width is the feature vector length. Other building energy consumption prediction studies use 2D convolutions to extract deep features; however, although convolution is meaningful in the time dimension, convolving across the feature dimension is not. Therefore, one-dimensional convolutions can be used to extract features along the time direction and the feature vector direction, respectively. The convolution operation is shown in the following:

$$Y_{i,j} = \sum_{m=1}^{h}\sum_{n=1}^{w} W_{m,n}\, X_{i+m-1,\, j+n-1} \tag{7}$$

Among them, $W$ represents the weight matrix of the convolution kernel, $h$ and $w$ represent the length and width of the convolution kernel, $X$ represents the input feature map, and $Y$ represents the output feature map.

The main goal of the pooling layer is to reduce unnecessary parameters and computation while retaining the main data features, prevent overfitting to a certain extent, and improve the generalization ability of the model. The pooling operation is shown in the following:

$$Y = \mathrm{maxpool}\left(X\right) \quad \text{or} \quad Y = \mathrm{avgpool}\left(X\right) \tag{8}$$

Among them, $X$ represents the input feature map, $Y$ represents the output feature map, $\mathrm{maxpool}$ represents the max-pooling operation, and $\mathrm{avgpool}$ represents the average-pooling operation. In this model, the pooling kernel is $2 \times 2$, so the length and width of $X$ and $Y$ have a 2-fold relationship, respectively.

The main function of the batch normalization (BN) layer is to avoid the gradient vanishing or explosion caused by the large difference in the weight data of the intermediate layer during model training. The BN layer can adjust the distribution of weight data in the middle layer.

The activation function introduces nonlinearity into the model. The activation function used in this model is the $\mathrm{ReLU}$ function, which can be described by $\mathrm{ReLU}(x) = \max(0, x)$.
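One branch of the parallel convolution module can be sketched as follows; the channel counts and the use of 1D-style kernels along the time and feature directions are illustrative assumptions consistent with the description above.

```python
import torch.nn as nn

class GranularityBranch(nn.Module):
    """One parallel branch: two rounds of Conv -> BatchNorm -> ReLU -> Pool."""
    def __init__(self, in_ch=1, mid_ch=16, out_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            # 1D-style convolution along the time direction.
            nn.Conv2d(in_ch, mid_ch, kernel_size=(3, 1), padding=(1, 0)),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # halves length and width
            # 1D-style convolution along the feature direction.
            nn.Conv2d(mid_ch, out_ch, kernel_size=(1, 3), padding=(0, 1)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # total reduction: 4x
        )

    def forward(self, x):  # x: (batch, channels, time, features)
        return self.net(x)
```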

3.2. Long-Term Temporal Correlation Capture of Building Energy Consumption

This section mainly introduces the structure of long-term temporal correlation capture in the proposed MgHa-LSTM model, including the hybrid attention mechanism and recurrent layers.

In the process of building energy consumption, there is not only sensitivity information at different scales, but also obvious periodicity in units of days, weeks, months, and seasons. Parallel convolutional layers cannot extract such long-term information due to their limited receptive fields. Therefore, the model also needs to capture long-term temporal correlation features.

The long-term correlation capture process is shown in Figure 4. The parallel convolutional layers used in short-term scale sensitivity capture need to fuse multiple features when extracting parallel multigranularity features. The hybrid attention mechanism is used here to send the fused feature map into the long-short term memory layer, and finally, a prediction value is output through the fully connected layer as the energy consumption prediction value of this iteration.

3.2.1. Hybrid Attention Mechanism

Feature fusion is required for the parallel feature maps input to the Ha-LSTM module. With the concat and stack fusion methods, both the parameters of the intermediate layers and the amount of computation increase considerably. Therefore, this paper uses the attention mechanism for feature fusion to reduce unnecessary parameters and to learn the relationship between features and results earlier.

The deep feature map corresponding to each granularity is denoted as $D_g$, $g \in \{c, m, f\}$. First, the deep feature maps are flattened into one-dimensional vectors, denoted as $v_g$, and the attention weight of each granularity's feature map is calculated with a two-layer feed-forward neural network. Then, the weighted average is taken as the fused feature map.

The definition of the two-layer feed-forward network is shown in Equation (9):

$$u_g = W_2 \tanh\left(W_1 v_g + b_1\right) + b_2 \tag{9}$$

where $W_1$ and $W_2$ are the weights of the first-layer and second-layer feed-forward networks, respectively, $b_1$ and $b_2$ are the corresponding bias terms, $v_g$ is the vector obtained by flattening the input feature map, and the activation uses the $\tanh$ function.

The schematic diagram of hybrid attention feature fusion is shown in Figure 5. The calculation of the attention weights is shown in Equation (10):

$$\alpha = \mathrm{softmax}\left(u\right), \quad \alpha_g = \frac{\exp\left(u_g\right)}{\sum_{g'} \exp\left(u_{g'}\right)} \tag{10}$$

where $u_g$ represents the output of the feature map of granularity $g$ from the two-layer feed-forward network, and $\alpha$ represents the attention weight vector. The advantage of using the $\mathrm{softmax}$ function here is that the attention weights corresponding to the feature maps sum to 1, so numerically the fused feature map will not contain very large values.

Feature fusion is performed by the attention mechanism, and the fused feature map is expressed as Equation (11):

$$v_{\mathrm{fuse}} = \sum_{g} \alpha_g v_g \tag{11}$$

where $v_{\mathrm{fuse}}$ represents the fused feature map, $\alpha_g$ represents the attention weight, and $v_g$ represents the one-dimensional vector transformed from the input feature map. The fused feature map is the sum of the one-dimensional vectors of the different granularity feature maps weighted by their attention.
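The hybrid attention fusion of Equations (9)-(11) can be sketched as the following module; the hidden dimension and the flattened-vector layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridAttentionFusion(nn.Module):
    """Fuse flattened granularity feature maps with softmax attention weights
    computed by a two-layer feed-forward network (Equations (9)-(11))."""
    def __init__(self, vec_dim, hidden_dim=64):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(vec_dim, hidden_dim), nn.Tanh(),   # Equation (9)
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, maps):                             # list of (batch, C, L, W)
        vecs = [m.flatten(start_dim=1) for m in maps]    # v_g, one per granularity
        scores = torch.cat([self.ff(v) for v in vecs], dim=1)
        alpha = torch.softmax(scores, dim=1)             # Equation (10): sums to 1
        fused = sum(alpha[:, i:i + 1] * v for i, v in enumerate(vecs))
        return fused                                     # Equation (11)
```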

3.2.2. Long Short-Term Memory Layer

The long short-term memory (LSTM) network is the most widely used member of the recurrent neural network (RNN) family. The LSTM and GRU models are currently popular extensions and improvements of the RNN model [28–30].

Because of its unique gate structure of input gate, forget gate, and output gate, the LSTM can, to a certain extent, theoretically overcome the gradient vanishing and explosion that tend to occur during back-propagation in the RNN model, and it can “memorize” long-term information during training. The structure of the LSTM model is shown in Figure 6, where the direction of the arrows indicates the direction of data flow.

First, the forget gate decides which unnecessary information to remove according to the information passed in at the previous moment, which can be expressed as

$$f_t = \sigma\left(W_f \cdot \left[h_{t-1}, x_t\right] + b_f\right) \tag{12}$$

In Formula (12), $f_t$ represents the forget threshold at moment $t$, $\sigma$ represents the activation function, $W_f$ represents the weight, $b_f$ represents the bias, $x_t$ represents the input value, and $h_{t-1}$ represents the output value of the hidden layer at moment $t-1$.

Second, the input gate determines which information should be stored in the cell based on the current input data, which can be expressed as

$$i_t = \sigma\left(W_i \cdot \left[h_{t-1}, x_t\right] + b_i\right) \tag{13}$$

$$\tilde{C}_t = \tanh\left(W_C \cdot \left[h_{t-1}, x_t\right] + b_C\right) \tag{14}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{15}$$

Among them, $i_t$ represents the input threshold at time $t$, $b_i$ and $b_C$ represent the bias terms, $W_i$ and $W_C$ represent the weights, $\tilde{C}_t$ represents the candidate state, $C_t$ represents the internal state of the cell at time $t$, and $C_{t-1}$ represents the internal state of the cell at time $t-1$, which is transferred from the previous time step to the current one.

Third, the output gate determines the output value of the cell at the current moment, which can be expressed by the following:

$$o_t = \sigma\left(W_o \cdot \left[h_{t-1}, x_t\right] + b_o\right) \tag{16}$$

$$h_t = o_t \odot \tanh\left(C_t\right) \tag{17}$$

Among them, $o_t$ represents the output threshold at time $t$, $b_o$ is the bias term, $W_o$ is the weight, $h_t$ is the output value of the cell at time $t$, which is also passed to the next time step, and $\sigma$ represents the activation function. After the data pass through the three gates, the valid information is output and the invalid information is forgotten.
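For illustration, one step of these gate equations can be written directly in PyTorch; in practice torch.nn.LSTM implements the same computation, and the stacked weight layout below is an illustrative assumption.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Equations (12)-(17). W has shape
    (4 * hidden, hidden + input) and stacks the f, i, g, o gate weights."""
    z = torch.cat([h_prev, x_t], dim=-1) @ W.T + b
    f, i, g, o = z.chunk(4, dim=-1)
    f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)
    c_t = f * c_prev + i * torch.tanh(g)   # Equations (12)-(15)
    h_t = o * torch.tanh(c_t)              # Equations (16)-(17)
    return h_t, c_t
```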

3.2.3. Fully Connected Layer

In the fully connected layer, the predicted value is calculated from the feature vector output by the LSTM layer and used as the predicted energy consumption value of this iteration; the network weights are then adjusted by back-propagating the loss value calculated by the loss function.

The predicted value of building energy consumption is defined as Equation (18):

$$\hat{y} = W_{fc}\, h + b_{fc} \tag{18}$$

where $\hat{y}$ represents the building energy consumption value predicted by the model, $W_{fc}$ and $b_{fc}$ represent the weights and bias term of the fully connected layer, and $h$ represents the output feature vector of the LSTM layer. The dropout method is used in the fully connected layer: during training, the connections between the weights and outputs of the intermediate layers are dropped with a fixed probability.

3.3. Building Energy Consumption Prediction Model Based on MgHa-LSTM

The overall structure of the MgHa-LSTM model is shown in Figure 7.

In the feed-forward direction, the input is the historical building energy consumption data. First, the data are segmented by granularity into groups. All but the coarsest-grained data are fused with the upsampled versions of the coarser-grained deep feature maps from the previous iteration. The fused features are input into the parallel convolution module, which consists of two sets of convolutional layer, batch normalization layer, activation function, and pooling layer. The feature map output by the last pooling layer is, on the one hand, upsampled and passed to the finer-grained input features of the next iteration for composite feature fusion and, on the other hand, fused through the hybrid attention mechanism. The resulting feature map is sent to the long short-term memory layer to capture the long-term dependencies of building energy consumption, the feature vector is passed to the fully connected layer, and finally a predicted building energy consumption value is output. In the feedback direction, the MSE loss is calculated from the predicted and real building energy consumption values, and the weights in the network are then adjusted through back-propagation; a sketch of one such training step follows.
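The sketch below shows one training step in this workflow; the flatten-plus-linear model is only a stand-in for the assembled MgHa-LSTM, and all names are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for the assembled MgHa-LSTM; replace with the real model.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 8, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()                  # loss used in the feedback direction

def train_step(window, target):
    """window: (batch, 64, 8) history; target: (batch, 1) next-hour power."""
    optimizer.zero_grad()
    pred = model(window)                  # feed-forward direction
    loss = criterion(pred, target)        # MSE of predicted vs. real value
    loss.backward()                       # back-propagation
    optimizer.step()
    return loss.item()
```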

4. Experiments

4.1. Selection of Datasets

This paper used the IHEPC dataset provided by Hebrail and Berard in the UCI Machine Learning Repository, which is widely used in the development of building energy consumption prediction models. The dataset, retrieved on April 28, 2017, contains a total of 2,075,259 records from a house in Sceaux, France, collected from December 2006 to November 2010, of which 25,979 records have missing values that were handled during the data processing phase. Table 1 shows the feature information of the dataset, with a total of 8 features: timestamp, global active power, global reactive power, voltage, global current intensity, and the energy consumption of three sub-metered areas of the residence. These are the features carried by the dataset itself, all internal to the building; external features such as dew point temperature, humidity, wind speed, wind direction, air pressure, and other weather factors are not involved.

In the IHEPC dataset, the data collection interval is 1 minute, a very fine sampling granularity compared with other energy consumption datasets. The prediction granularity of the building energy consumption prediction task is usually 1 hour. Therefore, in this paper, the 60 records of each hour are aggregated into one record by averaging. A total of 34,524 hourly records were obtained, which were divided into training, validation, and test sets in a 6 : 2 : 2 ratio.

In the dataset, $x_t$ is the feature vector constructed at time $t$, and the input to the model is $X_t = \{x_{t-63}, \ldots, x_{t-1}, x_t\}$, which contains 64 records, each with 8 features; $X_t$ is thus a two-dimensional matrix. The prediction target is the global active power $P_{t+1}$ at the next moment. Since the sampling interval $\Delta t$ in the dataset is constant and energy satisfies $E = P \cdot \Delta t$, predicting the power indirectly predicts the energy consumption.
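The preprocessing described above can be sketched with pandas as follows, assuming the raw semicolon-separated IHEPC file from the UCI repository; forward-filling missing values and the exact column handling are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Load the raw 1-minute IHEPC records; '?' marks missing values.
df = pd.read_csv("household_power_consumption.txt", sep=";", na_values="?",
                 low_memory=False)
df["timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.drop(columns=["Date", "Time"]).set_index("timestamp")
df = df.astype(float).ffill()              # fill records with missing values

hourly = df.resample("1H").mean()          # 60 records -> 1 hourly record

# Chronological 6 : 2 : 2 split into training, validation, and test sets.
n = len(hourly)
train, val, test = np.split(hourly.values, [int(0.6 * n), int(0.8 * n)])

def make_windows(data, window=64):
    """Build (X_t, P_{t+1}) pairs: 64 hourly records in, next-hour power out."""
    X = np.stack([data[i:i + window] for i in range(len(data) - window)])
    y = data[window:, 0]                   # column 0: Global_active_power
    return X, y
```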

Figure 8 shows the power curve of the first 1000 hours after the IHEPC dataset is aggregated at a granularity of 1 hour. It can be seen that the power curve has a certain periodicity.

Figure 9 shows the curve of Global_intensity in the IHEPC dataset for the first 1000 hours after aggregation at a granularity of 1 hour.

4.2. Results and Analysis
4.2.1. Evaluation Indicators

Building energy consumption prediction task is a regression task based on time series, which often uses evaluation indexes such as mean square error, root mean square error, and mean absolute error. The following are the performance indicators used in this article in turn.

The first indicator is the Mean Square Error (MSE) [31], which is the average of the squared errors between the true and predicted values. It is an absolute indicator commonly used in regression tasks. The calculation formula is shown in Formula (19), where $n$ represents the length of the prediction sequence:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{19}$$

The second indicator is the Root Mean Square Error (RMSE) [32], which is the square root of the average of the squared errors between the true and predicted values. It is an absolute indicator often used in regression tasks. The calculation formula is shown in Formula (20), where $n$ represents the length of the prediction sequence:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{20}$$

The third indicator is the Mean Absolute Error (MAE) [33], which is the average of the absolute errors between the true and predicted values. In contrast to MSE, MAE does not square the error between actual and predicted values, so it is less sensitive to large errors; it is likewise an absolute metric often used in regression tasks. The definition is shown in Formula (21), where $n$ represents the length of the prediction sequence, $y_i$ represents the actual value of energy consumption, and $\hat{y}_i$ represents the predicted value of energy consumption:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{21}$$
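The three indicators can be computed directly from Formulas (19)-(21); a minimal NumPy sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    """Formula (19): mean of squared errors."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Formula (20): square root of the MSE."""
    return np.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Formula (21): mean of absolute errors."""
    return np.mean(np.abs(y_true - y_pred))
```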

4.2.2. Parameters for Dividing the Number of Granularities as well as Parallel Convolutional Layers

When predicting on the IHEPC dataset, the convolutional hyperparameters of the MgHa-LSTM model are evaluated. This paper evaluates the influence of the number of parallel convolutional layers and the size of the convolution kernels on the prediction performance for building energy consumption. In the evaluation process, the control variable method is used; that is, when exploring one hyperparameter, the values of the other hyperparameters are not changed. When the number of parallel convolutional layers is 1, the model degenerates to a single-granularity model, so this section also serves as a performance comparison between multigranularity and single-granularity models. The experimental results are shown in Tables 2 and 3: prediction performance is best when the number of granularities is 3 together with the corresponding best convolution kernel size reported in Table 3, so these parameters are selected as the default parameters of the parallel convolutional layers in subsequent experiments.

4.2.3. Feature Fusion Method and Recurrent Layer Parameters

When predicting on the IHEPC dataset, the MgHa-LSTM model is evaluated with respect to the fusion of deep and shallow features across two successive iterations and the recurrent layer hyperparameters. This paper evaluates the influence of the fusion mode of deep and shallow features in two consecutive iterations and of the number of recurrent layers on prediction performance. In the evaluation process, the control variable method is used; that is, when exploring one hyperparameter, the values of the other hyperparameters are not changed. The feature fusion methods include the attention mechanism, concat, and stack. The experimental results are shown in Tables 4 and 5: when the feedback mechanism adopts stack as the feature fusion method and the number of recurrent layers is 1, the prediction performance for building energy consumption is the best, so these parameters are selected as the defaults in subsequent experiments. Since only two feature maps, the shallow one and the deep one, are fused, the attention mechanism may add too much computation and does not perform well. Moreover, as the number of recurrent layers in the MgHa-LSTM model increases, obvious overfitting occurs, resulting in a decrease in performance.

4.2.4. Comparative Experiment

This section mainly compared the prediction performance of the MgHa-LSTM model and other building energy consumption prediction models on the IHEPC dataset.

In this experiment, MLP, CNN, LSTM, and MsC-LSTM (multiscale CNN-LSTM) building energy consumption prediction models are constructed for comparison, and simulation experiments are carried out together with the MgHa-LSTM model proposed in this paper. The prediction curves on the IHEPC dataset are shown in Figure 10, and the prediction performance is shown in Table 6.

As shown in Figure 10, the IHEPC dataset is relatively complex. It can be seen that when predicting high energy consumption, the predicted value of the MgHa-LSTM model is close to the real value; when predicting low energy consumption, the MgHa-LSTM model cannot predict the real value accurately, but it is still closer to the real value than the other models.

Overall, the MgHa-LSTM model proposed in this paper fits the real values of building energy consumption well. This shows that, in the MgHa-LSTM model, the multigranularity capture of short-term scale sensitivity and the recurrent-layer capture of long-term correlation dependencies are effective for building energy consumption prediction and yield good predictive representations. As can be seen from Table 6, on the IHEPC dataset, the MSE of the MgHa-LSTM model proposed in this paper is 0.2821, smaller than that of the currently popular MsC-LSTM model, of which it is 93.72%. The RMSE and MAE of the MgHa-LSTM model are also lower than those of the other models, indicating that its prediction performance is better.

5. Conclusions

The MgHa-LSTM model proposed in this paper can be divided into two parts, namely a short-term sensitivity capture module and a long-term correlation capture module. The short-term scale sensitivity capture module introduces granularity segmentation and a feedback mechanism and adopts a parallel convolution module; the long-term correlation capture module adopts a hybrid attention mechanism and long short-term memory layers. Based on the above improvements, the mean square error of the MgHa-LSTM building energy consumption prediction model is 0.2821, which is 93.72% of that of the MsC-LSTM model, the smallest mean square error among the other deep learning prediction models. Compared with other deep learning prediction models, the multigranularity MgHa-LSTM model based on convolutional recurrent neural networks produces more accurate predictions. Although our proposed MgHa-LSTM model achieves significant performance gains, its accuracy still leaves room for improvement. In future work, we intend to integrate digital twin technology to collect multifaceted data as input to further improve performance. We believe the proposed framework can be readily applied to other time series forecasting applications, such as mechanical failure prediction, electricity consumption forecasting, sales forecasting, stock market forecasting, and disease forecasting.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the Basic Research Project (JCKY2019604C004).