Abstract

Building automation systems are becoming increasingly vital, especially for reducing building energy consumption. However, the accuracy of such systems in calculating building thermal loads is limited, as they are unable to predict future thermal loads based on prevailing environmental factors. The current paper therefore seeks to improve the understanding of the interactions between outdoor meteorological data and building energy consumption through a statistical analysis. Using weather data collected by the Korea Meteorological Administration (KMA) over a period of three years (2011–2014), prediction models that predict heating thermal loads while considering the time-lag phenomenon are developed. In addition, the study develops separate prediction models for buildings of different sizes. The results confirm the existence of the time-lag phenomenon: the heating load experienced by a building at a given time is better explained by a regression model developed using the climatic conditions that existed two hours before. As such, conventional building simulation programs must endeavor to include time-lag as well as Aerosol Optical Depth (AOD) data as important factors in the prediction of building heating loads.

1. Introduction

Energy efficiency is of particular importance in the building sector, which consumes a significant amount of energy in comparison with other sectors [1]. As such, reducing the energy consumed by buildings has been a core interest of building-related research.

Over the years, various energy-saving systems have been developed and implemented. Research from multiple disciplines, such as information technology (IT) and mechanical engineering, is being utilized to improve the energy performance of buildings through building automation. For simplification, building automation can be divided into three categories: the building management system (BMS), the security system (SS), and the energy management system (EMS). The EMS is more closely associated with the energy-consuming units of the building, such as the heating, ventilating, and air conditioning (HVAC) systems and systems that require mechanical operations, such as elevators and automated doors. Such systems are essential in predicting the energy consumed by buildings in that they collect the energy usage patterns of a given building, which can be used later in the calculation of the total energy consumed by the building [2]. These systems are also helpful in understanding complex relationships between building occupants and their environments [3].

Although systems that assist in the prediction of building energy performance already exist, such as the EMS and other related building simulation software, the process of accurately predicting the amount of energy consumed by a given space is rather complex [4]. It requires an extensive understanding of certain factors that influence how a building behaves. An example of such a factor is the outdoor climatic conditions. There is a plethora of studies that deal with the complex relationships between buildings and physical factors [5]. However, studies that deal with outdoor climatic conditions and their influence on building energy consumption are rather insufficient. The current study seeks to fill this gap and improve the understanding of the interaction between the prevailing outdoor elements and building energy consumption. In addition, this study seeks to present ways in which the newly found relationship between outdoor climatic conditions and building energy performance can be used to predict building heating thermal loads. Furthermore, this study considers aerosol weather data that were not previously taken into account when predicting building energy using computer simulation programs.

2. Energy Prediction in Buildings

Achieving significant reductions in the energy consumed by buildings requires knowledge about the thermal behavior of a given building. With the knowledge regarding building behavior and how a building interacts with its surroundings, the energy consumed by a building can then be predicted and subsequently reduced. Traditionally, building energy consumption could be predicted through mathematical equations. To do this, factors related to building thermal loads and their influence on the thermal performance of the building had to be extensively studied. Nowadays, new and simpler methods are constantly being developed to assist building professionals in their endeavors to optimize building designs and improve building energy performance [6]. Additionally, the estimation of building energy consumption has become a key factor in documenting energy consumed by buildings. However, it is not a simple task to estimate building energy consumption, as it involves various factors that are associated with the characteristics of buildings, such as the climatic conditions, occupant schedule, HVAC systems, and so forth [7].

Recently, scientific studies have been focusing on human behavior and its influence on energy consumption by buildings. Erickson et al. alluded to the issue of modern buildings being conditioned by relying on the assumption of maximum occupancy and, thus, conditioning spaces unnecessarily. Subsequently, energy is utilized where it is not needed. In addition, the study presented a method by which energy occupancy models can be developed using a wireless sensor network integrated into building HVAC systems to ensure that spaces are not conditioned without occupancy [8]. Furthermore, Ahn and Park [9] explained how poor consideration of occupant behavior might lead to inconsistencies between the actual building energy data and the modeled building energy data. In the study, multiple approaches to optimize the occupant schedule data in building modelling were also discussed. Virote [10] also stresses the importance of considering occupant behavior and space occupancy in building energy modelling. Different occupant schedules are likely to lead to different energy usage patterns. In addition, Virote demonstrates the applicability of stochastic models to estimate occupant behavior and space occupancy.

Various statistical modelling techniques are also playing a significant role in building modelling. A number of studies have developed models that either predict the amount of energy consumed by buildings or improve the accuracy of existing building energy prediction software. For instance, a study by Ahmad et al. [11] has demonstrated how building electric energy can be predicted using artificial intelligence methods. The study discusses two common approaches used in such predictions: support vector machine (SVM) and artificial neural network (ANN). In addition, the study sheds light on new data-handling techniques being developed to improve building modelling and building behavior prediction. Some of these methods include least square support vector machine (LSSVM) and group method of data handling (GMDH). Ahmad et al. conclude that there are advantages and disadvantages associated with each model, and it is thus difficult to pinpoint which of the modelling techniques is superior. Learning-based techniques have also been used to study other aspects of building modelling such as occupant behavior. Zhao et al. [12] developed a data-mining model that learns the behavior of occupants based on office appliance usage. Grey-box models have also been predominantly used in studying the short-term behavior of buildings [13].

Li et al. [14] also introduced an advanced deep learning approach to increase accuracy in building energy consumption predictions. The introduced method is a combination of two approaches: stacked autoencoders (SAEs) and extreme learning machine (ELM). The SAE is used in studying the building’s energy usage patterns whereas the ELM is used as the prediction learning tool. The proposed approach shows better performance when compared with other deep learning techniques such as support vector regression (SVR), generalized radial basis function neural network (GRBFNN) and multiple linear regression (MLR). Sensor-based machine learning techniques are also widely applied in forecasting building energy consumption. Such techniques require less input data than conventional statistical methods and are thus less complex [15]. Fan et al. [16] discuss the application of data-mining techniques in developing a set of predictive models that estimate next-day energy consumption and peak power demand. The developed ensemble models show a much higher performance in comparison with single-base models.

Among the statistical approaches used in building energy estimation, regression analysis has shown significant outcomes. Regression analysis is widely adopted because it is easy to use and provides reasonably accurate results. Braun et al. used regression models to predict the future electricity and gas consumption of a supermarket in northern England, obtaining theoretically accurate results [17]. Similarly, Tso and Yau developed regression models to predict the electricity consumed by buildings [18]. Lam et al. further highlighted the usefulness of regression analysis for predicting building behavior in the early design stage of buildings [19]. The accuracy of regression analysis in predicting the behavior of buildings has so far been high. Regression models have been used to estimate the heating energy demand in buildings, obtaining accurate results with R-values as high as 0.987 [20] and 0.90 [21]. The accuracy of regression models in predicting building energy consumption is constantly being improved through continuous research. For instance, Fumo and Rafe Biswas [22] point out the importance of the time interval of measured data in regard to the quality of the resulting regression models. Similarly, the current paper seeks to utilize regression models to improve accuracy in the prediction of building thermal heating loads by considering the time-lag phenomenon and a new climatic factor, aerosols. Neither of these factors has been studied extensively in regard to building energy modelling through regression analysis.

3. Methodology

3.1. Weather Data and Energy Predictions in Buildings

Predicting the amount of energy used by a given space requires a number of inputs, one of the most important of which is weather data. Previous studies [23] have shown indoor climatic conditions and outdoor climatic conditions to be correlated in a directly proportional manner. As such, existing outdoor climatic conditions have a significant influence on how a building occupant behaves; for instance, if it is cold outside, the occupant is more likely to activate the heating system. Many energy prediction software programs provide methods by which information about the climatic conditions of an area is entered into the program for analysis. In cases where real-time data from weather stations are unavailable, weather data can be obtained from online databases. Some types of weather data are even derived from mathematical equations. However, weather data obtained by methods other than direct measurement at weather stations have been known to cause inconsistencies in the predicted energy results. In fact, the Lawrence Berkeley National Laboratory (LBNL) attributes the differences between the actual amount of energy consumed by a building and the amount predicted by simulation software to inaccuracies in the weather data used [24]. Because of the large number of climatic variables used in simulation software to predict the energy used by a building, there have been attempts [25] to use statistical methods such as the Taguchi method to sieve out the least important climatic factors and thus reduce the number of input parameters that describe weather conditions. In a similar study, Kapetanakis et al. [26] used data from a Building Energy Management System (BEMS) to examine and subsequently provide the input weather variables necessary for predicting building heating thermal loads. Different energy prediction software also uses different formats of weather data [27]. This is mainly because such software was developed using different algorithms and might not be able to process a weather data file with a different format. Consequently, the current study converted the original data obtained from the KMA into the “epw” format, which can be read by EnergyPlus. Table 1 shows the weather data used within this study.

3.2. The Time-Lag Phenomenon

The time-lag phenomenon is studied in the current paper to address the inconsistencies brought about by the usage of weather data in building energy predictions. Theoretically, it is usually assumed that indoor climatic conditions are a direct representation of outdoor climatic conditions, especially in passive buildings that lack mechanical systems to adjust indoor environmental conditions. In reality, however, outdoor climatic conditions at a given time are not accurately matched by indoor climatic conditions at that same time. The term “time-lag” is used here to denote the delay that occurs between peak outdoor climatic conditions and peak energy usage by a given building. Figure 1 gives an example of the time-lag phenomenon. For May 13, 2011, the peak solar radiation, peak temperature, and peak cooling load occurred at approximately 1 pm, 3 pm, and 4 pm, respectively. Theoretically, the peak cooling load should occur at the same time as the peak outdoor temperature, but in reality, there was a delay of 1 hour between the time the peak temperature occurred and the time the peak cooling load occurred. Moon and Kim [28] discussed in detail the issue of time-lag in buildings.
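The delay described above can be illustrated with a minimal sketch that locates the peak hour of each hourly profile and reports the difference. The three profiles below are hypothetical shapes chosen only so that their peaks fall at 1 pm, 3 pm, and 4 pm; they are not the measured data behind Figure 1, and the function names are assumptions introduced for illustration.

```python
def peak_hour(hourly_values):
    """Return the hour index (0-23) at which the profile is largest."""
    return max(range(len(hourly_values)), key=lambda h: hourly_values[h])


def peak_lag_hours(driver, response):
    """Hours by which the peak of `response` trails the peak of `driver`."""
    return peak_hour(response) - peak_hour(driver)


# Hypothetical triangular 24-hour profiles (illustrative values only).
solar_radiation = [max(0.0, 800.0 - 120.0 * abs(h - 13)) for h in range(24)]
outdoor_temp = [24.0 - 0.5 * abs(h - 15) for h in range(24)]
cooling_load = [max(0.0, 10.0 - 1.5 * abs(h - 16)) for h in range(24)]

lag_temp_to_load = peak_lag_hours(outdoor_temp, cooling_load)
lag_solar_to_load = peak_lag_hours(solar_radiation, cooling_load)
print("temperature -> load lag (h):", lag_temp_to_load)
print("radiation -> load lag (h):", lag_solar_to_load)
```

Comparing peak positions is only one simple way to read off a lag from a single day; the regression procedure in Section 3.4 instead compares model fit across candidate lags over the whole data set.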

3.3. Reference Building

The current study used the U.S. Department of Energy (DOE) commercial buildings as case models for analysis. The DOE reference models are standard models used for energy performance assessments in building-related studies [29]. The reference building models incorporate a total of 16 building types and 16 different geographical locations in the U.S. They represent approximately 60% of the characteristics usually found within commercial buildings [30]. Table 2 shows the DOE reference buildings used.

Furthermore, ASHRAE specifies building material according to different climatic conditions. South Korea, which is the base area of this study, falls under ASHRAE climatic zones 4A and 4B; the building materials used are in line with ASHRAE standard 90.1–2004 to match the climatic conditions of the location under study. It is worth noting that the regression equations determined through this study might need revisions for buildings made of different materials.

3.4. Research Process

As mentioned in the previous sections, the current paper seeks to develop heating load prediction models while considering the time-lag phenomenon. Multiple regression modelling is used to study the relationship between the outdoor climatic conditions and the heating load in buildings of three different sizes while considering time-lag. The purpose is to find which weather data best describe the heating load at a given time. To do that, EnergyPlus is made to output hourly data in three phases. In the first phase, the time-lag is zero (time-lag0), meaning that the output results from EnergyPlus are for a time that is in sync with the time of the weather data used. In the second phase, the time-lag is one (time-lag1), meaning that the output results are for a time that is one hour ahead of the time of the weather data used. In the final phase, the time-lag is two (time-lag2), meaning that the output results are for a time that is two hours ahead of the time of the weather data used. This procedure is repeated for each building size (small, medium, and large DOE reference buildings). Figure 2 explains the study procedure.
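The three phases can be sketched as pairing the weather value at hour t with the heating load at hour t + lag and comparing the fit for lag = 0, 1, and 2. The sketch below is a deliberate simplification: it uses a single weather variable and a one-variable least-squares fit rather than the multiple regression used in the study, and the synthetic series (a load that follows the temperature of two hours earlier) is an assumption made purely for illustration.

```python
import math


def align_with_lag(weather, load, lag):
    """Pair weather at hour t with load at hour t + lag (lag >= 0)."""
    if lag == 0:
        return list(weather), list(load)
    return list(weather[:-lag]), list(load[lag:])


def r_squared(x, y):
    """R-squared of a one-variable least-squares fit (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Synthetic data: ten days of hourly temperature with a 24-hour cycle.
hours = range(240)
temperature = [10.0 * math.sin(2.0 * math.pi * h / 24.0) for h in hours]
# Assumed behaviour: the load responds to the temperature of two hours
# earlier (negative indices wrap around, which is consistent here because
# the series covers whole 24-hour periods).
heating_load = [50.0 - 2.0 * temperature[h - 2] for h in hours]

scores = {}
for lag in (0, 1, 2):
    x, y = align_with_lag(temperature, heating_load, lag)
    scores[lag] = r_squared(x, y)

best_lag = max(scores, key=scores.get)
print(scores, "best lag:", best_lag)
```

In this constructed case the fit improves as the lag approaches the true two-hour delay, which mirrors the comparison of the time-lag0, time-lag1, and time-lag2 models reported in Section 4.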

4. Results

4.1. Multiple Regression Analysis for the Prediction of the Heating Load for a Small Office considering Time-Lag

To estimate the heating load of a small office building, 20,352 weather data points were used. The influence of outdoor climatic conditions on the heating load was assessed through multiple regression analysis. In addition to the case without time-lag, the heating load values extracted from EnergyPlus one hour and two hours later were each matched to the weather data hour by hour. The results show the R-values of the time-lag0, time-lag1, and time-lag2 models to be 0.534, 0.542, and 0.592, respectively. The corresponding R-square values were 0.285, 0.294, and 0.350; this implies that the regression model for the DOE small office has a higher explanatory or predictive power when using a time-lag of two hours. Table 3 shows the model summary for the small office considering time-lag.

The F-variations for all the developed models were higher than 700, and the p-values were all less than 0.001, thus confirming the statistical significance of the models. Furthermore, the Durbin–Watson test was conducted, and the results were 0.799, 0.834, and 0.778 for the time-lag0, time-lag1, and time-lag2 models, respectively; among the three developed models, the time-lag1 model therefore has the highest Durbin–Watson value, whereas the time-lag2 model has the highest R-value.
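For reference, the Durbin–Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest little first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The sketch below computes it for two made-up residual sequences; the residuals of the study's models are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up residuals that flip sign every hour (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
# Made-up residuals that stay on one side for a while (positive autocorrelation).
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
print(dw_alternating, dw_persistent)
```

Read this way, the values between roughly 0.7 and 0.8 reported above point toward positively autocorrelated residuals, which is common in hourly time series.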

The results of the ANOVA shown in Table 4 indicate a p-value less than 0.05 for all the models; thus, all the models are assumed to be statistically significant. The time-lag2 model showed the largest regression variance, whereas the time-lag0 model showed the largest residual variance. Additionally, the F-value of the time-lag2 model was the highest among the three developed models, thus indicating that the time-lag2 model has the best fit.
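The F-value in a regression ANOVA table is the ratio of the regression mean square to the residual mean square, and it can equivalently be written in terms of R-squared. The following sketch shows both forms; the sums of squares, sample size, and number of predictors used in the example are hypothetical and are not taken from Table 4.

```python
def f_statistic(ss_regression, ss_residual, n_obs, n_predictors):
    """F = (SS_regression / k) / (SS_residual / (n - k - 1))."""
    ms_regression = ss_regression / n_predictors
    ms_residual = ss_residual / (n_obs - n_predictors - 1)
    return ms_regression / ms_residual


def f_from_r_squared(r2, n_obs, n_predictors):
    """Same F-value expressed through R-squared."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n_obs - n_predictors - 1))


# Hypothetical ANOVA inputs: 103 observations, 2 predictors.
f_direct = f_statistic(ss_regression=30.0, ss_residual=70.0, n_obs=103, n_predictors=2)
# R-squared for the same inputs is 30 / (30 + 70) = 0.3.
f_via_r2 = f_from_r_squared(0.3, n_obs=103, n_predictors=2)
print(f_direct, f_via_r2)
```

Because the residual degrees of freedom appear in the numerator of the ratio, a data set with tens of thousands of hourly observations can yield large F-values even when R-squared is moderate.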

Table 5 shows the tolerance limit, significance probability, and t-value of the variables for each model. For an independent variable to be retained, its t-value and p-value should meet certain criteria: t-value ≥ 1.96 and p-value < 0.05. In addition, to avoid issues related to multicollinearity, the tolerance limit should be 0.1 or higher; variables that do not satisfy these conditions are usually omitted from the analysis to improve the accuracy of the developed models.
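The tolerance of a predictor is commonly defined as 1 minus the R-squared obtained when that predictor is regressed on all the other predictors (its reciprocal is the variance inflation factor). A minimal pure-Python sketch of the 0.1 screening rule follows. The three short series are invented for illustration, with the dew point deliberately constructed to track the dry-bulb temperature closely, and the helper names are assumptions rather than functions from any statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared_multi(columns, target):
    """R-squared of a least-squares fit of target on columns plus intercept."""
    n = len(target)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * target[i] for i in range(n)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(target) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def tolerances(predictors):
    """Map each predictor name to 1 - R^2 from regressing it on the others."""
    result = {}
    for name, values in predictors.items():
        others = [v for key, v in predictors.items() if key != name]
        result[name] = 1.0 - r_squared_multi(others, values)
    return result


# Invented hourly values; dew point is nearly dry bulb minus 3.
predictors = {
    "dry_bulb": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0],
    "dew_point": [-2.9, -1.1, 1.1, 2.9, 5.1, 6.9, 9.1, 10.9],
    "wind_speed": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
tol = tolerances(predictors)
flagged = sorted(name for name, value in tol.items() if value < 0.1)
print(tol, "below 0.1:", flagged)
```

In this constructed example the two temperature variables fall below the 0.1 threshold because each is almost fully explained by the other, while wind speed does not.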

Consequently, visibility was omitted from the time-lag0 model, whereas the dew point temperature, relative humidity, and visibility were also removed from the time-lag1 model; these variables are thus absent from the final models.

As indicated by the results after taking into account the t-value, significance of the probability, and multicollinearity issues, the time-lag2 model was found to be the most suitable for a DOE small office building; this model is defined as the “Modified Model” and is presented in Table 6. Secondly, another model is developed based on all the variables considered in the time-lag2 model, AOD, and precipitable water values; this model is defined as the “New Model,” and it is also presented in Table 6. Additionally, a third model based on only those variables received from the Korea Meteorological Administration (KMA) and which can be input by the user into EnergyPlus was developed considering a time-lag of 2 hours; this third model is defined as the “Limited Model” and is shown in Table 6.

A multiple regression analysis was performed to predict the heating thermal load. As indicated by the obtained results, the “New Model” showed the lowest R-squared value (0.422) and the highest Durbin–Watson value (1.509); in this case, a Durbin–Watson value above 1.3 indicates only weak autocorrelation in the residuals. In addition, although the R-squared values for the “Modified Model” and “Limited Model” are higher than that of the “New Model,” multicollinearity issues were found for the two models.

The F-variation of the “New Model” was shown to be higher than 170 at . Furthermore, the variance of the regression model and residual were smaller than those of both the “Modified Model” and “Limited Model.” The F-value was shown to be highest in the “Limited Model.” Additionally, the results predicted by the “New Model” exhibited the smallest mean and standard deviation, as well as the smallest difference between the maximum and minimum predictions (Tables 7–9). A regression equation for the “New Model” can be drafted based on the final variables listed in Table 6.

4.2. Multiple Regression Analysis for the Prediction of Heating Load for a Medium Office considering Time-Lag

For a DOE medium office building, 20,352 weather data points were used to estimate the heating load. The influence of the outdoor climatic conditions on the heating load was assessed through multiple regression analysis. In addition to the case without time-lag, the heating load values extracted from EnergyPlus one hour and two hours later were each matched to the weather data hour by hour. The results of the analysis together with the input variables describing the climatic conditions are shown in Tables 10–13. The results show the R-values of the time-lag0, time-lag1, and time-lag2 models to be 0.537, 0.537, and 0.579, respectively. The corresponding R-square values were 0.289, 0.289, and 0.335; this implies that the regression model for the DOE medium office has a higher explanatory or predictive power when using a time-lag of two hours. Table 10 shows the model summaries of a DOE medium office building considering time-lag.

The F-variation for all the developed models was higher than 750, and the p-values were all less than 0.001, thus confirming the statistical significance of the models. Furthermore, the Durbin–Watson test was conducted, and the results were 0.706, 0.731, and 0.677 for the time-lag0, time-lag1, and time-lag2 models, respectively; among the three developed models, the time-lag1 model therefore has the highest Durbin–Watson value, whereas the time-lag2 model has the highest R-value, as shown in Table 11.

The results of the ANOVA indicate a p-value less than 0.05 for all models, and thus, all the models are assumed to be statistically significant. The time-lag2 model showed the largest regression variance with the smallest residual variance. In addition, the F-value of the time-lag2 model was the highest among the three developed models, indicating that the time-lag2 model has the best fit.

Table 12 shows the tolerance limit, significance probability, and t-value of the variables for each model. For an independent variable to be retained, its t-value and p-value should meet certain criteria: t-value ≥ 1.96 and p-value < 0.05. In addition, to avoid issues related to multicollinearity, the tolerance limit should be 0.1 or higher; variables that do not satisfy these conditions are usually omitted from the analysis to improve the accuracy of the developed models.

Since the time-lag phenomenon was not taken into consideration in the time-lag0 model, the most important variables when it comes to thermal conditions, such as global horizontal radiation and direct normal radiation, failed to satisfy the conditions. The dew point temperature and relative humidity in the time-lag1 model and the dew point temperature in the time-lag2 model were the variables found to be rejected from each model. These variables are thus absent from the final models. Table 13 shows residual statistics for a DOE medium office building considering time-lag.

As indicated by the results after taking into account the t-value, significance of the probability, and multicollinearity issues, the time-lag2 model was found to be the most suitable for a DOE medium office building; this model is defined as the “Modified Model” and is presented in Table 14. Secondly, another model is developed based on all the variables considered in the time-lag2 model, AOD, and precipitable water values; this model is defined as the “New Model,” and it is also presented in Table 14. Additionally, a third model based on only those variables received from the Korea Meteorological Administration (KMA) and which can be input by the user into EnergyPlus was developed considering a time-lag of 2 hours; this third model is defined as the “Limited Model.”

Multiple regression analysis was performed to predict the heating thermal load. As indicated by the obtained results, the “New Model” showed the lowest R-squared value (0.422) and the highest Durbin–Watson value (1.509); in this case, a Durbin–Watson value above 1.3 indicates only weak autocorrelation in the residuals. In addition, although the R-squared values for the “Modified Model” and “Limited Model” are higher than that of the “New Model,” multicollinearity issues were found for the two models.

The “Modified Model” explains the dependent variable best, as it exhibits the highest R-values. The residuals of the “Modified Model” also exhibit positive autocorrelation, as indicated by the Durbin–Watson value (0.671). The “Limited Model” showed R-values lower than those of the “Modified Model” but higher than those of the “New Model” by 0.03. The “New Model” showed the lowest R-value (0.458) and thus has the lowest explanatory power.

Tables 15–17 show the statistical results of the new regression model dealing with AOD. As shown in the tables, the F-variation of the model was higher than 159 at . Its variance and residual are slightly lower than those of the “Modified Model.” In addition, the F-values of the “Modified Model” were shown to be the highest of all three models. The “Limited Model” indicated the largest difference between the maximum and minimum predictions, as well as the largest mean.

4.3. Multiple Regression Analysis for the Prediction of the Heating Load for a Large Office considering Time-Lag

For a DOE large office building, 20,352 weather data points were used to estimate the heating load. The influence of the outdoor climatic conditions on the heating load was assessed through multiple regression analysis. In addition to the case without time-lag, the heating load values extracted from EnergyPlus one hour and two hours later were each matched to the weather data hour by hour. The results of the analysis together with the input variables describing the climatic conditions are shown in Tables 18–21. The results show the R-values of the time-lag0, time-lag1, and time-lag2 models to be 0.514, 0.516, and 0.579, respectively. The corresponding R-square values were 0.264, 0.266, and 0.335; this implies that the regression model for the DOE large office has a higher explanatory or predictive power when using a time-lag of two hours.

All regression models developed in the study show F-variations higher than 660 at . In addition, the time-lag2 model indicated the highest R-value (0.931) amongst the three models. The Durbin–Watson test showed values of 0.732, 0.761, and 0.677 for the time-lag0, time-lag1, and time-lag2 models, respectively.

The results of the ANOVA indicate a p-value less than 0.05 for all the models, and thus, all the models are assumed to be statistically significant. The time-lag2 model showed the largest regression variance with the smallest residual variance. In addition, the F-value of the time-lag2 model was the highest among the three developed models, indicating that the time-lag2 model has the best fit.

Table 20 shows the tolerance limit, significance probability, and t-value of the variables for each model. For an independent variable to be retained, its t-value and p-value should meet certain criteria: t-value ≥ 1.96 and p-value < 0.05. In addition, to avoid issues related to multicollinearity, the tolerance limit should be 0.1 or higher; variables that do not satisfy these conditions are usually omitted from the analysis to improve the accuracy of the developed models.

Since the time-lag phenomenon was not taken into consideration in the time-lag0 model, the most important variables when it comes to thermal conditions, such as global horizontal radiation and direct normal radiation, failed to satisfy the criteria. In addition, the dew point temperature and relative humidity in the time-lag1 model and the dew point temperature in the time-lag2 model did not meet the criteria and thus were omitted from the final model.

As indicated through the obtained results, the time-lag2 model was shown to be the most suitable for heating load prediction of the DOE reference for a large office building. After taking into consideration the criteria related to the t-value, significance probability, and multicollinearity, the dew point temperature was excluded from the analysis; this model is defined as the “Modified Model” and is presented in Table 22. Secondly, another model is developed based on all the variables considered in the time-lag2 model, AOD, and precipitable water values; this model is defined as the “New Model” and it is also presented in Table 22. Additionally, a third model based on only those variables received from the Korea Meteorological Administration (KMA) and which can be input by the user into EnergyPlus was developed considering a time-lag of 2 hours; this third model is defined as the “Limited Model” and is shown in Table 22. In other words, the third model considers all real data with the exception of the data obtained from equations. The data considered for each model are described as prediction a, prediction b, and prediction c and are shown in Table 22.

Tables 23–25 show the statistical results of the new regression model dealing with AOD. As shown in the tables, the F-variation of the model was higher than 159 at . The variance and the residuals of the “New Model” are slightly lower than those of the “Modified Model.” In addition, the F-values of the “Modified Model” were shown to be the highest of all three models. The “Limited Model” indicated the largest difference between the maximum and minimum predictions, as well as the largest mean.

5. Conclusion

The current study aimed to understand the effect of time-lag as well as Aerosol Optical Depth on the prediction of the heating load in buildings. In the process, this study also explains how heating load predictions can be improved through the consideration of time-lag. Various studies have already established that outdoor weather conditions play a significant role in determining the thermal behavior of a building; the term time-lag in this sense is used to define the phenomenon in which the time of the effect of the outdoor climatic conditions on the building heating load is misrepresented during prediction. The current study uses regression models to identify which outdoor climatic conditions, in terms of time, best explain the heating load being experienced.

Many building energy simulation programs assume that the time at which peak outdoor conditions occur is the same time the effect of these conditions is highest inside the building. However, the results obtained by the current study prove otherwise. For instance, for all the building sizes considered (DOE reference buildings for small, medium, and large offices), it has been shown that the heating load at a given time is best described by the outdoor climatic conditions that occur two hours prior to that time. In this same manner, the developed regression models can be used to predict the heating load of a future time, for example, two hours in advance; a number of factors, such as the building materials used, can explain the reason for this. For example, the outer material of a building might absorb heat during the peak solar hours only to release it later into the building. In this case, a building simulation program would estimate the highest thermal load to be at the time of the peak solar hours, yet the effect is actually experienced later when the heat has been released from the building mass. The regression models developed here, however, can be used accurately to estimate the heating load since they consider the time-lag caused by factors such as the building mass or building size. Table 26 is a summary of the regression models that best estimate the heating load per building size. In addition, the table presents the developed models in two sets: the first set considering only the time-lag and the second set considering both time-lag and Aerosol Optical Depth data.

In conclusion, the key finding of this study is that both time-lag and Aerosol Optical Depth (AOD) data are critical factors in the estimation of building energy usage. As such, conventional building simulation programs must endeavor to include time-lag as an important factor in the prediction process. The impact of each input variable on the output can be assessed by the size of the coefficient values. For example, for the DOE small office linear regression model, total sky cover has the biggest impact on building heating loads.
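When comparing coefficient sizes across input variables, it can help to keep in mind that raw regression coefficients depend on the units of each variable. One common way to put them on a common scale, shown here as a hedged sketch and not as the procedure used in this study, is to multiply each raw coefficient by the ratio of the predictor's standard deviation to the response's standard deviation. The numbers below are hypothetical.

```python
import statistics


def standardized_coefficient(raw_coefficient, predictor_values, response_values):
    """Scale a raw coefficient by sd(predictor) / sd(response)."""
    return (
        raw_coefficient
        * statistics.stdev(predictor_values)
        / statistics.stdev(response_values)
    )


# Hypothetical predictor and response in which the response is exactly
# twice the predictor, so the raw slope is 2.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.0, 4.0, 6.0, 8.0, 10.0]

beta_std = standardized_coefficient(2.0, predictor, response)
print(beta_std)
```

A standardized value close to 1 in magnitude, as in this constructed case, means that a one-standard-deviation change in the predictor corresponds to about a one-standard-deviation change in the response.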

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.