Abstract

This paper presents a novel abnormal data detection algorithm based on the first order difference method, which can be used to find outliers in building energy consumption platforms in real time. The principle and criterion of the methodology are discussed in detail. The results show that outliers in cumulative power consumption data can be detected by our method.

1. Introduction

Building energy consumption represents a significant percentage of national energy consumption in many countries; the share of energy used in buildings relative to total national consumption is estimated at 25% for Japan, 28% for China, 37% for the European Union, and 40% for the United States [1–4]. Furthermore, energy demand in buildings will continue to rise in the near future because of the growing use of buildings and the increasing demand for improved building comfort levels [5]. In this context, the energy efficiency of buildings is a prime objective of energy policy and a prime concern in both developing and developed countries for anyone who wishes to identify energy savings. Energy monitoring can help to achieve energy savings and improve the quality of the energy supply; therefore, it can be of strategic importance. The purpose of monitoring building energy consumption is to obtain data that provide greater insight into how a building consumes energy. Once the dynamics of a building's energy consumption are known, it is possible to analyze which improvements are likely to be most effective in reducing consumption.

Some intelligent monitoring platforms have already been designed to collect energy data in buildings, especially in large public buildings [6–8], which include hotels, hospitals, convenience stores, and government office buildings. However, abnormal data always exist in the data acquisition process: outliers, that is, values that are very small or very large compared with the other, normal data, as shown in Figure 1. Outliers must be removed because they increase the error during data processing on the server and can even make the calculation impossible.

In [9], the researchers developed a neural network algorithm that checks for abnormal energy consumption data by predicting the energy consumption from previously collected data. If the predicted value falls below or above the set thresholds, which are calculated as a ratio of the actual energy consumption, the data point is regarded as an outlier. Lee et al. [10] also used an intelligent algorithm to solve the problem. In [11, 12], an intelligent data-analysis system was proposed that groups days of the week with similar power consumption and can automatically detect abnormal energy use in a building; it was used to determine whether the energy consumption differs significantly from previous energy consumption and to notify building managers of issues with minimal delay, helping to reduce energy costs. Many other researchers have developed different methods for detecting abnormal energy consumption data in buildings [5, 13].

However, these methods require complex models and large amounts of computation, which makes them unsuitable for real-time application. To overcome this drawback, we develop a rapid detection algorithm based on the first order difference method, which offers a convenient way to find and process outliers. Related methods for processing outliers are discussed in Section 2. Section 3 gives a detailed description of the first order difference methodology. Examples illustrating the methodology are given in Section 4. Finally, Section 5 presents the conclusion.

3. Methodology

3.1. Principle

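When the sampling frequency satisfies the Nyquist sampling theorem, the difference between adjacent points is small, as shown in Figure 2. The relationship between adjacent points is

$$x_n - x_{n-1} \approx x_{n-1} - x_{n-2}. \tag{1}$$

In this case, it is possible to estimate the next sampling point by using the values of $x_{n-1}$ and $x_{n-2}$. Writing the estimate as the previous value plus the next increment gives

$$\hat{x}_n = x_{n-1} + (x_n - x_{n-1}). \tag{2}$$

Substituting formula (1) into formula (2) yields

$$\hat{x}_n = x_{n-1} + (x_{n-1} - x_{n-2}) = 2x_{n-1} - x_{n-2}, \tag{3}$$

where $x_n$ is the sampling value at timeslot $n$, $x_{n-1}$ is the sampling value at timeslot $n-1$, $x_{n-2}$ is the sampling value at timeslot $n-2$, and $\hat{x}_n$ is the predicted value at timeslot $n$.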
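As a minimal sketch of the prediction step, the following Python function extrapolates the next sample from the two preceding ones by assuming that the most recent increment repeats. The function name `predict_next` and the sample readings are hypothetical and chosen only for illustration.

```python
def predict_next(x_prev2: float, x_prev1: float) -> float:
    """First order difference prediction.

    x_prev2 is the sample two timeslots back, x_prev1 the sample one
    timeslot back. The prediction adds the last observed increment
    (x_prev1 - x_prev2) to x_prev1, i.e. 2 * x_prev1 - x_prev2.
    """
    return 2.0 * x_prev1 - x_prev2


# Made-up cumulative meter readings (kWh), for illustration only.
readings = [100.0, 102.0, 104.0]
print(predict_next(readings[-2], readings[-1]))  # 106.0
```

The prediction needs only one multiplication and one subtraction per sample, which is what makes the approach attractive for real-time use.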

3.2. Criterion

Let $x_n$ be the sampling value at timeslot $n$ and $\hat{x}_n$ the predicted value for the same moment. We regard $x_n$ as an outlier if $|x_n - \hat{x}_n|$ exceeds a given error bound, in which case $x_n$ should be replaced by $\hat{x}_n$. The selection of the error bound and the prediction algorithm for $\hat{x}_n$ are the two key issues of the first order difference method. If the error bound is too large, outliers may be missed; on the other hand, normal data are likely to be taken as outliers if the error bound is set too small. The sampling frequency of the monitoring system and the variation characteristics of the physical quantity should be considered when configuring the error bound. In addition, two key techniques are employed in the proposed procedure: the first is the selection of the starting point, and the other is the handling of several consecutive outliers.

3.2.1. Selection of Starting Point

There is a special case in which the starting points themselves are abnormal points caused by interference, which would mislead the normal execution of the program. To avoid this situation, the first three consecutive points selected must satisfy inequality (4).

The procedure of data identification and correction starts after three correct starting points have been found; earlier data are processed in the negative direction of the time axis and later data in the positive direction, according to (5) and (6), respectively. However, if the first three samples of the series already satisfy inequality (4), they can be used as the starting points directly.
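The starting point search and the two processing directions might be sketched as below. The acceptance test used here, that the third point lies within the error bound of the value predicted from the first two, is an assumption standing in for inequality (4); likewise the backward and forward predictions are assumed forms of (5) and (6). All names are hypothetical.

```python
from typing import List, Optional


def find_start(samples: List[float], error_bound: float) -> Optional[int]:
    """Return the index of the first of three consecutive consistent points.

    Assumed consistency test: |x[i+2] - (2*x[i+1] - x[i])| <= error_bound.
    """
    for i in range(len(samples) - 2):
        second_difference = samples[i + 2] - 2.0 * samples[i + 1] + samples[i]
        if abs(second_difference) <= error_bound:
            return i
    return None


def clean_from_start(samples: List[float], error_bound: float) -> List[float]:
    """Clean backward, then forward, from the first consistent triple."""
    start = find_start(samples, error_bound)
    if start is None:
        raise ValueError("no three consecutive consistent points found")
    cleaned = list(samples)
    # Negative time direction: predict x[n] from x[n+1] and x[n+2].
    for n in range(start - 1, -1, -1):
        predicted = 2.0 * cleaned[n + 1] - cleaned[n + 2]
        if abs(cleaned[n] - predicted) > error_bound:
            cleaned[n] = predicted
    # Positive time direction: predict x[n] from x[n-1] and x[n-2].
    for n in range(start + 3, len(cleaned)):
        predicted = 2.0 * cleaned[n - 1] - cleaned[n - 2]
        if abs(cleaned[n] - predicted) > error_bound:
            cleaned[n] = predicted
    return cleaned


# Made-up readings whose very first sample is corrupted.
raw = [0.0, 102.0, 104.0, 106.0, 108.0]
print(find_start(raw, 3.0))        # 1
print(clean_from_start(raw, 3.0))  # [100.0, 102.0, 104.0, 106.0, 108.0]
```

Note that this simple test would also accept three identical corrupted values in a row (for example three zeros), so a practical version would need an additional plausibility check on the starting triple.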

3.2.2. Continuous Outliers

The two initial points used for prediction must be replaced when the algorithm encounters consecutive outliers, in order to keep the data from deviating from the correct trend. Typically, but not always, the initial points should be changed after two consecutive outliers. However, the current sample should also be checked by (7) to ensure the accuracy of the energy data; the constant parameter in (7) should be set according to the actual situation. If the sample satisfies formula (7), it can be kept; otherwise, it should be deleted as an outlier and replaced by the predicted point.

The flowchart of the outlier detection algorithm is shown in Figure 3. It uses two thresholds: the error limit and the upper bound of the error limit, which is usually chosen as 5 times the error limit. If the difference between the measured value and the predicted value is smaller than the error limit, the measured value is regarded as normal. On the other hand, if the difference is greater than the upper bound, the measured value should be deleted as an outlier. If the difference lies between the error limit and the upper bound, the measured value should be checked by formula (7) so that the data do not deviate from the trend.
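The three-way decision described above could look like the following sketch. Because formula (7) is not reproduced here, the trend check is passed in as a caller-supplied function; the names `classify`, `trend_check`, and `upper_factor` are hypothetical, and the lambdas in the example are placeholders rather than the paper's formula.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    NORMAL = "normal"
    OUTLIER = "outlier"


def classify(
    measured: float,
    predicted: float,
    error_limit: float,
    trend_check: Callable[[float, float], bool],
    upper_factor: float = 5.0,
) -> Verdict:
    """Classify one sample using two thresholds.

    upper_factor = 5.0 follows the text (upper bound is usually 5 times
    the error limit). trend_check stands in for formula (7) and should
    return True when the measured value may be kept.
    """
    upper_limit = upper_factor * error_limit
    difference = abs(measured - predicted)
    if difference < error_limit:
        return Verdict.NORMAL
    if difference > upper_limit:
        return Verdict.OUTLIER
    # In the band between the two thresholds, defer to the trend check.
    return Verdict.NORMAL if trend_check(measured, predicted) else Verdict.OUTLIER


def accept(measured: float, predicted: float) -> bool:
    """Placeholder trend check that always keeps the sample."""
    return True


def reject(measured: float, predicted: float) -> bool:
    """Placeholder trend check that always rejects the sample."""
    return False


print(classify(104.5, 104.0, 1.0, reject))  # Verdict.NORMAL
print(classify(0.0, 104.0, 1.0, accept))    # Verdict.OUTLIER
print(classify(107.0, 104.0, 1.0, accept))  # Verdict.NORMAL
print(classify(107.0, 104.0, 1.0, reject))  # Verdict.OUTLIER
```

Keeping the trend check as a parameter lets the band between the two thresholds be tuned to the monitored quantity without changing the threshold logic.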

4. Results and Analysis

Figure 4 shows the result of the algorithm. The data come from a real building energy consumption monitoring platform in Dalian, China, and reflect the cumulative power consumption of an exhauster. Figure 4(a) shows the raw data from the database; there are three outliers, marked by arrows, whose values are zero because of interference. Figure 4(b) shows the corrected result after processing by our method; it can be seen that the three points have been replaced by predicted data. Water or heat energy consumption data can also be processed by this method.

5. Summary

Outlier detection makes it possible to make better use of energy consumption monitoring platforms in buildings, especially in large public buildings. In this paper, we present an abnormal data detection methodology based on the first order difference method, which can be implemented in building energy monitoring platforms. Energy consumption prediction and the search for the best operating mode for building management are the next phase of our work.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.