Wireless Communications and Mobile Computing

Volume 2018, Article ID 5018053, 18 pages

https://doi.org/10.1155/2018/5018053

## Predicting Short-Term Electricity Demand by Combining the Advantages of ARMA and XGBoost in Fog Computing Environment

School of Software, Central South University, Changsha 410075, China

Correspondence should be addressed to Li Kuang; kuangli@csu.edu.cn

Received 26 January 2018; Accepted 25 March 2018; Published 6 May 2018

Academic Editor: Xuyun Zhang

Copyright © 2018 Chuanbin Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

With the rapid development of IoT, the disadvantages of the Cloud framework have been exposed, such as high latency, network congestion, and low reliability. The Fog Computing framework has therefore emerged, adding a Fog Layer between the Cloud and the terminals. To address real-time prediction of electricity demand, we propose an approach based on XGBoost and ARMA in a Fog Computing environment. Taking advantage of the Fog Computing framework, we first propose a prototype-based clustering algorithm that divides enterprise users into several categories based on their total electricity consumption; we then propose a model selection approach that analyzes users’ historical records of electricity consumption and identifies the most important features. Generally speaking, if the historical records pass the stationarity and white noise tests, ARMA is used to model the user’s electricity consumption as a time series; otherwise, if the historical records do not pass the tests and discrete features, such as the weather and whether the day falls on a weekend, are the most important, XGBoost is used. The experimental results show that the proposed approach, which combines the advantages of ARMA and XGBoost, is more accurate than the classical models.

#### 1. Introduction

In recent years, with the rise of Cloud Computing [1, 2], more and more computing and storage have moved to the Cloud, and this heavy reliance on the Cloud inevitably leads to high latency, network congestion, and low reliability. At the same time, with the wide adoption of IoT services, a variety of household appliances and sensors are being connected to the Internet and producing a large amount of data [3–5]. It has been estimated that the number of devices connected by IoT will reach 50 billion to 100 billion by 2020, which means that more and more data will fall beyond the reach of existing data processing and analysis techniques; privacy leaks may result, and the quality of service may decrease [6, 7]. In this regard, the rapid development of IoT has deepened the dilemma of Cloud Computing. The emergence of Fog Computing makes up for these shortcomings and also brings new opportunities and challenges to the transformation and upgrading of traditional industries. The electricity system, which aims to provide enterprises with safe, reliable, and high-quality electric power, has become an indispensable part of the national economy and people’s lives, so it is among the first to be affected. Under current technical conditions, large-scale storage of electric energy is still not possible; electricity must therefore be generated according to the system load at any time, or else the quality of electricity supply and usage may suffer, and even the safety and stability of the system may be endangered. Improving the accuracy of electricity demand prediction in the Fog Computing framework has thus become an urgent and important research issue.

In the field of electricity demand prediction, scholars have carried out extensive research. In the early stage, scholars largely borrowed techniques from the field of economic prediction, focusing on the regularities of the load sequence itself as a time series. The prediction model is established by analyzing the qualitative relationship between the historical load and related factors, and the parameters are estimated from historical data. However, the time series based approach requires highly accurate historical data and is insensitive to factors such as weather and holidays. In fact, it is very difficult to express the nonlinear relation between the input and output with an explicit mathematical equation, since electricity data are nonlinear, time-varying, and uncertain. To further improve the accuracy of electricity demand prediction, artificial intelligence methods such as neural networks, expert systems, and wavelet analysis have been applied since the 1990s. However, the existing methods can usually be applied only to limited scenarios and are effective only for simple electricity systems with a small number of factors.

The main deficiencies of existing work are as follows. First, electricity data samples have been shown to contain anomalies, but few of the existing solutions detect and handle the abnormal data. Second, when classifying users and their data, the number of clusters needs to be specified in advance, and the density distribution of electricity consumption is usually ignored when clustering users. Third, in the modeling process, the temporal features of the data are not fully exploited, and the interaction among the features is not fully considered. Finally, most of the solutions adopt only a single model, which cannot give full play to the advantages of each model.

To overcome the shortcomings of existing solutions, in this paper we propose a short-term electricity demand prediction approach that combines the advantages of XGBoost and ARMA in the Fog Computing framework. Firstly, sensors collect real-time electricity consumption data, and the Fog Nodes then classify enterprise users into different groups according to their amount of electricity consumption and perform anomaly and outlier detection and processing for each group. Secondly, a model selection process is performed: based on a series of tests, including stationarity, white noise, and the Pearson correlation coefficient of the data, as well as the observed electricity consumption pattern, we decide whether to use a time series model or a decision tree based model for each enterprise group. Finally, the predicted values of each enterprise user are combined to obtain the final result. Experiments verify that the accuracy of the proposed approach is 20% higher than that of classical models.
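The model selection step described above can be illustrated with a minimal Python sketch. It assumes that "passing" the white noise test means the series is not white noise (a Ljung-Box Q statistic above the 5% chi-square critical value for 10 lags) and that the stationarity verdict comes from a separate test, such as an augmented Dickey-Fuller test, supplied here as a boolean. The function names, the choice of 10 lags, and the simplified two-way rule are illustrative assumptions rather than the procedure used in this paper; the feature importance and Pearson correlation checks are omitted.

```python
from typing import Sequence

# 5% critical value of the chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.307


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series carries no autocorrelation signal
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series: Sequence[float], lags: int = 10) -> float:
    """Ljung-Box Q statistic computed over lags 1..lags."""
    n = len(series)
    if n <= lags:
        raise ValueError("series must be longer than the number of lags")
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )


def is_white_noise(series: Sequence[float], lags: int = 10,
                   critical: float = CHI2_95_DF10) -> bool:
    """True when Q does not exceed the critical value (no detectable structure)."""
    return ljung_box_q(series, lags) <= critical


def select_model(series: Sequence[float], is_stationary: bool) -> str:
    """Hypothetical dispatch: ARMA for stationary, autocorrelated series; else XGBoost."""
    if is_stationary and not is_white_noise(series):
        return "ARMA"
    return "XGBoost"


if __name__ == "__main__":
    # Made-up daily consumption with a weekly pattern (lower on weekends).
    weekly = [10, 12, 11, 13, 12, 5, 4] * 20
    print(select_model(weekly, is_stationary=True))
    print(select_model(weekly, is_stationary=False))
```

The critical value is hard-coded to keep the sketch free of dependencies; changing `lags` requires supplying the matching chi-square critical value.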

The rest of the paper is organized as follows: Section 2 reviews the related work on predicting short-term electricity demand. Section 3 describes the framework of the proposed solution, as well as the details of the involved key techniques. Section 4 introduces the experiment and analyzes the results. Section 5 summarizes our work and provides the future research plan.

#### 2. Related Work

Because electricity data are nonlinear, time-varying, and uncertain, it is difficult to accurately grasp the related factors and the patterns of change in electricity consumption. How to effectively improve the accuracy of electricity demand prediction has become a major challenge for researchers [8]. At present, the methods used for short-term electricity demand prediction mainly include time series [9–11], Regression Analysis [12, 13], Support Vector Regression [14–16], Neural Network [17–20], Bayes [21], Fuzzy Theory [20, 22], and Wavelet Echo State Network [23]. Each kind of method has its own applicable scenario, and no model can achieve a satisfactory result alone.

To improve the accuracy of prediction, current research focuses mainly on three directions. The first is to optimize a single model. Zhu et al. [24] propose to first predict the daily load roughly with ARMA, then obtain the difference sequences that are noncyclical and strongly influenced by the weather, and finally build an improved ARIMA prediction model with strong adaptability to weather. In [25, 26], the factors that influence electricity consumption are identified, and the mapping relation between key factors and electricity consumption is mined. Ghelardoni et al. [27] use the empirical mode decomposition method to divide the time series into two parts, describing the trend and the local oscillation of energy consumption values, respectively, and then use them to train a support vector regression model. Che et al. [28] use human knowledge to construct fuzzy membership functions for each similar subgroup and then build an adaptive fuzzy comprehensive model based on self-organizing mapping, support vector machine, and fuzzy reasoning for prediction. In [29, 30], an electricity load model is established based on an improved particle swarm optimization algorithm and a genetic algorithm.

The second direction is to improve the accuracy of prediction by integrating different models. Haque et al. [31] propose a hybrid intelligent algorithm based on wavelet transform and fuzzy adaptive resonance theory. In [32–34], wavelet decomposition is used to project the load sequence onto different scales, different models are used to predict the different components, and the final result is obtained by reconstructing the components. Pindoriya et al. [35] propose an adaptive wavelet neural network (AWNN) for short-term price prediction in the electricity market. Pany and Ghoshal [36] propose a local linear wavelet neural network (LLWNN) model instead of a wavelet neural network for electricity price prediction. Che and Wang [37] propose a hybrid model that combines the unique advantages of SVR and ARIMA models in nonlinear and linear modeling, respectively.

The third direction is to explore composite models for prediction. The weighted average of the results of various algorithms is usually used, and there are two ways to determine the weights. The first is to improve the fitting accuracy on historical electricity consumption by minimizing the fitting error. The main methods include the monotone iterative algorithm [38], evolutionary programming [39], and quadratic programming [40]. Wang et al. [41] propose using an adaptive particle swarm optimization algorithm to optimize the weights of the integrated model. The second is to determine the weights by evaluating each algorithm’s score. Elliott and Timmermann [42] introduce the concept of the loss function, quantify the negative impact caused by different prediction errors, and then obtain the weights by minimizing the expected loss. Yao et al. [43, 44] employ the analytic hierarchy process (AHP) from multiobjective decision analysis to assess the relative merits of each algorithm in fitting accuracy, model adaptability, and result reliability; judgment matrices are obtained, and the weights are then combined by calculating the principal eigenvector of each matrix. Petridis et al. [45] study the use of probability models to determine the weight of each model and combine the values of each algorithm to obtain the final result. Lacking a sufficient quantitative theoretical basis, the weights of such models reflect only the relative advantages and disadvantages of the algorithms.
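The first kind of weighting, which minimizes the fitting error on historical data, can be sketched for the simplest case of two models. Minimizing the squared error of the combination $w a_t + (1 - w) b_t$ against the actual values $y_t$ gives the closed form $w = \sum_t (y_t - b_t)(a_t - b_t) / \sum_t (a_t - b_t)^2$, clipped to $[0, 1]$. The Python sketch below is an illustration under these assumptions only; the function names and sample values are hypothetical, and it is not the optimization procedure of any of the cited works.

```python
from typing import List, Sequence


def optimal_weight(actual: Sequence[float],
                   pred_a: Sequence[float],
                   pred_b: Sequence[float]) -> float:
    """Weight w in [0, 1] minimizing sum((y - (w*a + (1 - w)*b))**2)."""
    num = sum((y - b) * (a - b) for y, a, b in zip(actual, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        return 0.5  # identical forecasts: any weight gives the same result
    # The objective is a convex quadratic in w, so clipping the unconstrained
    # minimizer to [0, 1] yields the constrained minimizer.
    return min(1.0, max(0.0, num / den))


def combine(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Weighted average of two forecasts."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    # Made-up values: model A overshoots by 2, model B undershoots by 1.
    actual = [10.0, 20.0, 30.0]
    pred_a = [12.0, 22.0, 32.0]
    pred_b = [9.0, 19.0, 29.0]
    w = optimal_weight(actual, pred_a, pred_b)
    print(w, combine(pred_a, pred_b, w))
```

With more than two models the same idea becomes a constrained least-squares problem, which is where methods such as quadratic programming come in.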

In summary, there has been much research on the prediction of short-term electricity demand, and exploring composite models for prediction is the main trend. However, existing methods still have limitations. Against the background of the rapid development of the smart electricity grid, in this paper we propose a short-term electricity demand prediction method based on XGBoost and ARMA.

#### 3. Predicting Short-Term Electricity Demand

##### 3.1. Problem Definition

Given a dataset $D = \{d_1, d_2, \ldots, d_n\}$, $d_i$ is the $i$th electricity consumption record for a certain enterprise user and can be expressed as a 3-tuple, that is, $d_i = \langle \text{record\_date}, \text{user\_id}, \text{power\_consumption} \rangle$, where record_date represents the date, user_id is the ID of the enterprise user, and power_consumption represents the amount of electricity consumed by the enterprise user on that day.
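As an illustration of this record structure, the following Python sketch models one record with the three fields named above and aggregates total consumption per user, the quantity on which the clustering of enterprise users is based. The class name, helper function, field types, and sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class ConsumptionRecord:
    """One daily electricity consumption record for an enterprise user."""
    record_date: date
    user_id: int
    power_consumption: float


def total_by_user(records: Iterable[ConsumptionRecord]) -> Dict[int, float]:
    """Sum power_consumption over all records of each user."""
    totals: Dict[int, float] = defaultdict(float)
    for record in records:
        totals[record.user_id] += record.power_consumption
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    records = [
        ConsumptionRecord(date(2015, 1, 1), 1, 1135.0),
        ConsumptionRecord(date(2015, 1, 2), 1, 570.0),
        ConsumptionRecord(date(2015, 1, 1), 2, 13.0),
    ]
    print(total_by_user(records))
```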

For the following illustration and experiments, we use the dataset from Tianchi, which contains the historical records of 1454 enterprises in Yangzhong High-Tech Industrial Development Zone of Jiangsu Province from 2015-01-01 to 2016-11-30. Examples of the dataset are shown in Table 1.