Abstract

The grey model, abbreviated as GM (1, 1), has been widely applied in decision-making and prediction, particularly for time series with few observations, a setting that the grey-model literature describes as poor information and small samples. Previous studies focus on improving predictive accuracy but pay less attention to the robustness of the grey model to outliers, which often occur in practice because of recording errors or accidental equipment failures. To fill that gap, we develop a robust grey model, whose structural parameters are estimated by least trimmed squares, to forecast Chinese electricity demand. In addition, following the new information priority criterion, we use the last value of the first-order accumulative generating time series as the initial value. We call the proposed model the novel robust grey model integrating the new information priority criterion, abbreviated as NIPC-RGM (1, 1). We also introduce a bootstrapping test to compare the robustness to outliers of the novel robust grey model and the classical grey model. Using data on Chinese electricity demand from 2011 to 2021, we find that the novel robust grey model is both more robust to outliers and more accurate in prediction than the classical grey model. Finally, we apply the novel robust grey model to forecast Chinese electricity demand for the period 2022 to 2025. The forecasts indicate that Chinese electricity demand would continue to rise over these four years.

1. Introduction

Monitoring, forecasting, and early warning of electricity demand are important in both the energy and economic fields, because electricity demand is closely tied to industrial and human activities [1–3]. Electricity demand is also an important factor in electricity generation, and knowing it helps system operators schedule generation [4–6]. In addition, electricity demand serves, to some degree, as an indicator of macroeconomic performance for policymakers around the world, because it is highly associated with many economic variables, such as gross domestic product and gross national product [7–9]. Fluctuations in electricity demand therefore have a great effect on society [10]. Policymakers and electricity producers need an accurate and reliable approach to predicting future electricity demand.

However, predicting future electricity demand is not an easy task in practice [11–13]. On the one hand, electricity demand is affected by many factors, including population, economic growth, and climate change [14, 15]. These factors are themselves uncertain and volatile, which adds to the difficulty of forecasting [16]. On the other hand, the time series of electricity demand is itself unstable, which makes prediction challenging [17]. Thus, from a methodological perspective, a robust and reliable approach to forecasting future electricity demand is needed.

According to Hernandez et al., who reviewed the literature on techniques for predicting electricity demand over the past 40 years, these techniques can be divided into three groups [18]. The first group applies machine learning techniques [19], the second uses statistical approaches [20], and the third adopts grey models [21]. Machine learning techniques include artificial neural networks and support vector machines. Their disadvantage is that they require a large number of observations for learning. Statistical approaches include parametric, semi-parametric, and non-parametric regression. In time series analysis, statisticians often employ the autoregressive moving average model or the vector autoregressive model to fit the data in sample and to make predictions out of sample. Although the idea behind such statistical models is easy to understand, they have a drawback: they rely heavily on data collection and parameter estimation. Among the three groups, the grey model is likely to be the most appropriate approach for predicting electricity demand, because it is specialized in coping with poor information and small samples.

The grey model was proposed by Professor Deng in 1982 [22]. It is popular in prediction applications because it has a strong capability of capturing the characteristics of a system with uncertainty. Besides, the grey model achieves high predictive accuracy when applied to a sample with few observations. Like machine learning techniques and statistical approaches, the grey model is in fact a family of models. Among them, the classical grey model, abbreviated as GM (1, 1) in the literature on grey theory, is the most popular and the most widely used. Previous studies have pointed out that the classical grey model deals efficiently with the problems faced by a grey system, particularly insufficient observations and uncertain circumstances. Thus, the grey model is a good choice for predicting future electricity demand.

The prediction of electricity demand can also be regarded as a grey system problem because it is affected by a great deal of uncertainty. Factors such as total population, the level of economic development, and weather conditions all affect the accuracy and reliability of prediction. Unfortunately, we do not know exactly how many factors affect electricity demand, or how they affect it. Besides, some emerging countries such as China have only short time series of electricity demand, and at the same time their electricity demand is increasing rapidly, which makes forecasting with predictive models such as machine learning techniques and statistical approaches more difficult. Therefore, the classical grey model provides a good alternative to machine learning techniques and statistical approaches for predicting future electricity demand.

The literature on the prediction of electricity demand using the grey model is large and is expected to keep growing [23, 24]. Hu used a neural network-based grey model to forecast future electricity demand [25]. Zhao et al. proposed a rolling grey model and provided predictions of electricity demand over a relatively long period [26]. Bahrami et al. developed a grey model with a wavelet transform and used it to forecast electricity demand over a relatively short term [27]. Xu considered an optimized algorithm to update the grey model for the projection of Chinese electricity demand [28].

Although the grey models proposed in previous studies have obvious advantages and outperform traditional models, such as machine learning techniques and statistical approaches, they have some problems. For example, many existing grey models estimate the structural parameters by ordinary least squares, which assumes that the sample contains no outliers. Recent studies have demonstrated that, when outliers occur in the sample, the grey model suffers from poor robustness as well as low predictive accuracy [29, 30]. To solve this problem, we introduce least trimmed squares estimation into the classical grey model, that is, GM (1, 1), and propose a novel robust grey model to predict Chinese electricity demand. In addition, we use the new information priority criterion to further improve the proposed robust grey model. We call the resulting model the novel robust grey model integrating the new information priority criterion, abbreviated as NIPC-RGM (1, 1).

The rest of this paper is organized as follows. Section 2 describes the classical grey model and the proposed robust grey model. Section 3 reports the results, which include the robustness and the accuracy of the proposed robust model compared with the classical grey model. To test the robustness of the grey model, we introduce a bootstrapping test, whose implementation steps are explained in Section 2.3. Section 4 concludes this paper.

2. Methods

In this section, we first describe the existing grey model, which is called GM (1, 1) in the literature on grey theory. Then, we illustrate how the novel robust grey model proposed in this paper can be implemented in practice. The proposed model, which uses least trimmed squares to estimate the structural parameters and the new information priority criterion to improve its predictive capability, is abbreviated as NIPC-RGM (1, 1). Finally, we present the approach used to test the robustness of models to outliers, as well as the indicators used to measure predictive accuracy.

2.1. The Existing Grey Model, GM (1, 1)

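In this section, we provide a brief introduction to the existing grey model, that is, GM (1, 1). Suppose that there is a time series whose entries are nonnegative. The time series is described as follows:

$$X^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right). \tag{1}$$

Using the first-order accumulative generation operation, we obtain a new time series, which is described as follows:

$$X^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right), \tag{2}$$

where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$ and $k = 1, 2, \ldots, n$.

The following equation:

$$x^{(0)}(k) + a z^{(1)}(k) = b \tag{3}$$

is called the basic differential equation for the classical grey model, that is, GM (1, 1).

$z^{(1)}(k)$ is calculated by the following formula:

$$z^{(1)}(k) = \frac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right), \quad k = 2, 3, \ldots, n, \tag{4}$$

which is known as the background value in the literature on the grey model.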
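The accumulation and background-value steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual GM (1, 1) conventions (the accumulated series is a running sum, and each background value is the mean of two consecutive accumulated values); the function names and the four-point series are made up for the example and are not the paper's data.

```python
import numpy as np

def accumulate(x0):
    """First-order accumulative generation: running sums of the raw series."""
    return np.cumsum(np.asarray(x0, dtype=float))

def background_values(x1):
    """Mean of consecutive accumulated values, one value for each k = 2, ..., n."""
    x1 = np.asarray(x1, dtype=float)
    return 0.5 * (x1[1:] + x1[:-1])

# Made-up illustrative series (not taken from the paper).
x0 = [2.0, 3.0, 4.0, 5.0]
x1 = accumulate(x0)            # running sums: 2, 5, 9, 14
z1 = background_values(x1)     # means of neighbours: 3.5, 7, 11.5
```

Note that the background series has one element fewer than the raw series, which is why the regression in the next step runs over k = 2, ..., n.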

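The following equation:

$$\frac{dx^{(1)}(t)}{dt} + a x^{(1)}(t) = b \tag{5}$$

is defined as the whitening differential equation for the classical grey model, that is, GM (1, 1).

If we let

$$Y = \begin{pmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{pmatrix}, \quad B = \begin{pmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{pmatrix}, \tag{6}$$

then the structural parameters, $(a, b)$, in the classical grey model, that is, GM (1, 1), could be estimated using the least squares, which is described as follows:

$$(\hat{a}, \hat{b})^{T} = \left(B^{T} B\right)^{-1} B^{T} Y. \tag{7}$$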
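As a hedged illustration of this least squares step, the sketch below builds a design matrix with rows (-z1(k), 1) and a response vector of raw values for k = 2, ..., n, which is the standard GM (1, 1) layout and is assumed here. The helper `simulate_grey_series` is hypothetical and exists only to produce a series that satisfies the grey equation x0(k) + a*z1(k) = b exactly, so that the estimator can be checked against known parameters.

```python
import numpy as np

def simulate_grey_series(a, b, first, n):
    """Hypothetical helper: build a series obeying x0(k) + a*z1(k) = b exactly."""
    x0 = [float(first)]
    x1_prev = float(first)
    for _ in range(n - 1):
        # z1(k) = x1(k-1) + 0.5*x0(k), so the grey equation can be solved for x0(k).
        nxt = (b - a * x1_prev) / (1.0 + 0.5 * a)
        x0.append(nxt)
        x1_prev += nxt
    return np.array(x0)

def estimate_ols(x0):
    """Least squares estimate of (a, b) from the raw series."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    z1 = 0.5 * (x1[1:] + x1[:-1])
    B = np.column_stack([-z1, np.ones_like(z1)])   # rows: (-z1(k), 1)
    Y = x0[1:]                                     # x0(2), ..., x0(n)
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return float(a), float(b)

# Synthetic check with assumed parameters a = -0.05, b = 5.
series = simulate_grey_series(a=-0.05, b=5.0, first=5.0, n=8)
a_hat, b_hat = estimate_ols(series)
```

`np.linalg.lstsq` is used instead of forming the inverse of B'B explicitly; it solves the same least squares problem and is numerically safer.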

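The general solution for the classical grey model, that is, GM (1, 1), could be written as

$$x^{(1)}(t) = C e^{-at} + \frac{b}{a}, \tag{8}$$

where $C$ represents an arbitrary constant.

If we set $x^{(1)}(1)$ to be $x^{(0)}(1)$ and we set $t$ to be 1, then the constant could be calculated by the following formula:

$$C = \left(x^{(0)}(1) - \frac{b}{a}\right) e^{a}. \tag{9}$$

We substitute equation (9) into equation (8) and obtain the time response for the classical grey model, that is, GM (1, 1). That is,

$$x^{(1)}(t) = \left(x^{(0)}(1) - \frac{b}{a}\right) e^{-a(t-1)} + \frac{b}{a}. \tag{10}$$

Thus, using the estimates of the structural parameters obtained from the least squares, the prediction of $x^{(1)}(k)$ could be calculated using the following equation:

$$\hat{x}^{(1)}(k) = \left(x^{(0)}(1) - \frac{\hat{b}}{\hat{a}}\right) e^{-\hat{a}(k-1)} + \frac{\hat{b}}{\hat{a}}, \quad k = 1, 2, \ldots \tag{11}$$

The prediction of $x^{(0)}(k)$ is obtained by the inverse first-order accumulative generation operation, which is

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1), \quad k = 2, 3, \ldots \tag{12}$$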
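To make the prediction step concrete, here is a minimal sketch of a classical GM (1, 1) fit followed by the exponential time response and the inverse accumulation. It assumes the standard form of the time response, with the first raw value as the initial condition. The function names and the synthetic series with 5% growth are illustrative assumptions, not the paper's code or data.

```python
import numpy as np

def gm11_fit(x0):
    """Least squares estimate of the structural parameters (a, b)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    z1 = 0.5 * (x1[1:] + x1[:-1])
    B = np.column_stack([-z1, np.ones_like(z1)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    return float(a), float(b)

def gm11_predict(x0_first, a, b, steps):
    """Predicted raw values for k = 1, ..., steps, using x1(1) = x0(1)."""
    k = np.arange(1, steps + 1)
    x1_hat = (x0_first - b / a) * np.exp(-a * (k - 1)) + b / a
    # Inverse accumulation: keep the first value, then take differences.
    return np.concatenate([[x1_hat[0]], np.diff(x1_hat)])

# Synthetic series with 5% growth per step (an assumption for illustration).
observed = 10.0 * 1.05 ** np.arange(8)
a_hat, b_hat = gm11_fit(observed)
fitted_and_forecast = gm11_predict(observed[0], a_hat, b_hat, steps=10)
```

The first eight entries of `fitted_and_forecast` are in-sample fitted values and the last two are out-of-sample forecasts. On an exactly geometric series the fit is close but not exact, because the discrete estimate of the growth rate differs slightly from the continuous one used in the exponential.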

2.2. The Novel Robust Grey Model Integrating New Information Priority Criterion

Here, we illustrate the structure of the novel robust grey model proposed in this paper, which is expected to be robust to outliers. The model applies least trimmed squares to estimate the structural parameters and adopts the new information priority criterion to enhance predictive accuracy. We call it the robust grey model integrating the new information priority criterion, abbreviated as NIPC-RGM (1, 1). In the following, we first explain the least trimmed squares method used to estimate the structural parameters. Then, we describe the new information priority criterion used to optimize the initial condition for the grey differential equation. Finally, we present a complete algorithm for the novel robust grey model integrating the new information priority criterion, that is, NIPC-RGM (1, 1).

2.2.1. The Least Trimmed Squares

Compared with the ordinary least squares estimation used to estimate the structural parameters in the classical grey model, the least trimmed squares estimation has two advantages. On the one hand, it takes into account the ordering of the squared residuals, which is likely to improve predictive accuracy. On the other hand, it reduces the influence of outliers and thus enhances robustness to outliers.

Definition 1. Suppose that there is a series of points, arranged in time order, that is, $\left(z^{(1)}(k), x^{(0)}(k)\right)$, $k = 2, 3, \ldots, n$. It satisfies a simple regression model, that is,

$$x^{(0)}(k) = -a z^{(1)}(k) + b + \varepsilon(k), \tag{13}$$

where $\varepsilon(k)$ is the error term.

Definition 2. The structural parameters in the classical grey model are obtained using the ordinary least squares estimation, that is, equation (7), which could be rewritten using the following formula:

$$(\hat{a}, \hat{b}) = \arg\min_{a, b} \sum_{k=2}^{n} r^{2}(k), \quad r(k) = x^{(0)}(k) + a z^{(1)}(k) - b. \tag{14}$$

Definition 3. The structural parameters in the novel robust grey model are obtained from the least trimmed squares estimation. That is,

$$(\hat{a}, \hat{b}) = \arg\min_{a, b} \sum_{i=1}^{h} r^{2}_{(i)}, \tag{15}$$

where $r^{2}_{(1)} \le r^{2}_{(2)} \le \cdots \le r^{2}_{(n-1)}$ are the squared residuals $r^{2}(k)$ sorted in ascending order and $h$ represents the trimming constant, indicating that the $h$ observations with the smallest residuals are used to estimate the structural parameters in the novel robust grey model. In this paper, we set the trimming constant $h$ to half of the number of observations. That is, we keep the half of the observations with the smallest residuals to estimate the structural parameters in the novel grey model.

Comparing Definition 2 and Definition 3, we could see that if the trimming constant is equal to the total number of observations, then the estimates obtained from the least trimmed squares estimation would be the same as the estimates obtained from the ordinary least squares estimation. By Definition 3, we also see that the least trimmed squares estimation discards the observations with relatively large residuals, which could be regarded as outliers.
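One way to compute a least trimmed squares fit for a short series is to enumerate every subset of h points, fit ordinary least squares on each subset, and keep the subset with the smallest residual sum of squares. This yields the exact trimmed optimum, because minimizing the sum of the h smallest squared residuals is the same as minimizing over all h-subsets. The sketch below is only an illustration of that idea. The paper does not say which algorithm it uses, and the function name `lts_fit`, the regression form y = -a*z + b, and the toy data are assumptions.

```python
from itertools import combinations

import numpy as np

def lts_fit(z, y, h):
    """Exact least trimmed squares for y = -a*z + b by enumerating h-subsets."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)
    if not 2 <= h <= m:
        raise ValueError("h must lie between 2 and the number of points")
    best_sse, best_a, best_b = np.inf, None, None
    for subset in combinations(range(m), h):
        idx = list(subset)
        B = np.column_stack([-z[idx], np.ones(h)])
        coef, *_ = np.linalg.lstsq(B, y[idx], rcond=None)
        sse = float(np.sum((y[idx] - B @ coef) ** 2))
        if sse < best_sse:
            best_sse, best_a, best_b = sse, float(coef[0]), float(coef[1])
    return best_a, best_b, best_sse

# Toy data (assumed): points on y = 2*z + 1, i.e. a = -2 and b = 1, with one gross outlier.
z = np.arange(1.0, 7.0)
y = 2.0 * z + 1.0
y[3] = 100.0
a_trim, b_trim, sse_trim = lts_fit(z, y, h=5)    # the outlier is trimmed away
a_full, b_full, _ = lts_fit(z, y, h=len(y))      # h = all points reduces to ordinary least squares
```

The number of subsets grows combinatorially, so this enumeration is only practical for very short series such as annual data with about ten points. Longer series would need an approximate search.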

2.2.2. The New Information Priority Criterion

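In the literature on the classical grey model, most articles use $x^{(0)}(1)$ as the initial value, that is, the oldest value in the original time series. According to the new information priority criterion, we use the newest value, that is, $x^{(1)}(n)$, as the initial value.

Theorem 1. Given $\hat{a}$ and $\hat{b}$, obtained from equation (15), the following conclusions hold.

The solution of the whitening grey differential equation for the novel robust grey model integrating new information priority criterion, that is, NIPC-RGM (1, 1), could be written as

$$\hat{x}^{(1)}(t) = \left(x^{(1)}(n) - \frac{\hat{b}}{\hat{a}}\right) e^{-\hat{a}(t-n)} + \frac{\hat{b}}{\hat{a}}. \tag{16}$$

The prediction from the above time response is

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1) = \left(1 - e^{\hat{a}}\right)\left(x^{(1)}(n) - \frac{\hat{b}}{\hat{a}}\right) e^{-\hat{a}(k-n)}. \tag{17}$$

Proof. First, we consider a general solution for the novel robust grey model based on least trimmed squares estimation:

$$\hat{x}^{(1)}(t) = C e^{-\hat{a}t} + \frac{\hat{b}}{\hat{a}}. \tag{18}$$

We set $t$ to be $n$ and obtain

$$x^{(1)}(n) = C e^{-\hat{a}n} + \frac{\hat{b}}{\hat{a}}. \tag{19}$$

Then, we have

$$C = \left(x^{(1)}(n) - \frac{\hat{b}}{\hat{a}}\right) e^{\hat{a}n}. \tag{20}$$

Therefore, the time response would be

$$\hat{x}^{(1)}(t) = \left(x^{(1)}(n) - \frac{\hat{b}}{\hat{a}}\right) e^{-\hat{a}(t-n)} + \frac{\hat{b}}{\hat{a}}. \tag{21}$$

So far, Theorem 1 is proved.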

2.2.3. The Novel Robust Grey Model Integrating the New Information Priority Criterion

Now, we present the complete implementation steps, which could be described as follows:
Step 1. Obtain the raw time series, that is, $X^{(0)}$, as well as its first-order accumulative generating series, that is, $X^{(1)}$.
Step 2. Calculate the background values, that is, $z^{(1)}(k)$.
Step 3. Estimate the structural parameters using the least trimmed squares estimation.
Step 4. Calculate the predictions of the first-order accumulative generating series, that is, $\hat{x}^{(1)}(k)$.
Step 5. Obtain the prediction of the raw time series, that is, $\hat{x}^{(0)}(k)$.
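The five steps can be sketched end to end as follows. This is a hedged illustration rather than the authors' implementation. It assumes that the trimmed fit is found by enumerating subsets, that "half of the observations" means the integer part of half the number of regression points (with a floor of three), and that the time response uses the last accumulated value as the initial condition. The function name and the synthetic series are hypothetical.

```python
from itertools import combinations

import numpy as np

def nipc_rgm_fit_predict(x0, h=None, horizon=0):
    """Sketch of the five steps; returns (a, b, predicted raw series)."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                          # Step 1: accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])               # Step 2: background values
    y, m = x0[1:], n - 1
    if h is None:
        h = max(3, m // 2)                      # assumed rounding of "half"
    if not 3 <= h <= m:
        raise ValueError("h must lie between 3 and n - 1")
    best_sse, a, b = np.inf, None, None         # Step 3: trimmed estimate
    for subset in combinations(range(m), h):
        idx = list(subset)
        B = np.column_stack([-z1[idx], np.ones(h)])
        coef, *_ = np.linalg.lstsq(B, y[idx], rcond=None)
        sse = float(np.sum((y[idx] - B @ coef) ** 2))
        if sse < best_sse:
            best_sse, a, b = sse, float(coef[0]), float(coef[1])
    k = np.arange(1, n + horizon + 1)           # Step 4: newest value as initial value
    x1_hat = (x1[-1] - b / a) * np.exp(-a * (k - n)) + b / a
    # Step 5: inverse accumulation (first value kept, then differences)
    x0_hat = np.concatenate([[x1_hat[0]], np.diff(x1_hat)])
    return a, b, x0_hat

# Synthetic 5% growth series with one corrupted value (assumed data).
clean = 10.0 * 1.05 ** np.arange(11)
sample = clean[:9].copy()
sample[4] = 30.0                                # artificial outlier
a_hat, b_hat, pred = nipc_rgm_fit_predict(sample, horizon=2)
forecast = pred[9:]                             # two out-of-sample values
```

In this synthetic case the trimmed fit keeps the four points after the outlier, and anchoring the time response at the last accumulated value lets the forecasts follow the clean growth path even though the outlier shifts every later accumulated value.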

2.3. Tests for Robustness and Accuracy

The evaluation of the capability of the novel robust grey model integrating the new information priority criterion includes two aspects, that is, the robustness and the accuracy of the novel proposed model.

In order to evaluate the robustness of the novel robust grey model integrating the new information priority criterion, we perform a series of bootstrapping tests. First, we randomly choose a number from the interval between the minimum and the maximum of the original time series. Second, we replace the original value in a particular year, such as 2020, with this randomly chosen number, which forms a simulated time series with an outlier in that year. Third, we apply the novel robust grey model integrating the new information priority criterion to make predictions based on the simulated time series. Fourth, we calculate the mean absolute percentage error between the predictions and the values in the original time series. Fifth, we repeat the above steps 1000 times and obtain an empirical distribution of mean absolute percentage errors for the year in which the outlier occurs. Sixth, we perform the same bootstrapping test for the classical grey model. Seventh, we compare the distribution from the novel robust grey model integrating the new information priority criterion with the distribution from the classical grey model.

If the range of the distribution from the novel robust grey model integrating the new information priority criterion is smaller than the range of the distribution from the classical grey model, then we conclude that the novel robust grey model integrating the new information priority criterion is more robust than the classical grey model, GM (1, 1). If, on the other hand, the novel robust grey model integrating the new information priority criterion has a larger range of the distribution than the classical grey model, then we conclude that it is not more robust than the classical grey model.
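The test procedure can be sketched as a small simulation loop. The sketch assumes that the random replacement value is drawn uniformly between the minimum and the maximum of the series, which the text does not specify, and it uses a classical GM (1, 1) fit as the example model. Any function that maps a series to fitted values could be passed in instead. All names and the synthetic series are hypothetical.

```python
import numpy as np

def gm11_fitted(x0):
    """Classical GM (1, 1) fitted values for the given series."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    z1 = 0.5 * (x1[1:] + x1[:-1])
    B = np.column_stack([-z1, np.ones_like(z1)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    k = np.arange(1, len(x0) + 1)
    x1_hat = (x0[0] - b / a) * np.exp(-a * (k - 1)) + b / a
    return np.concatenate([[x0[0]], np.diff(x1_hat)])

def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))

def outlier_mape_distribution(x0, position, model, n_rep=1000, seed=0):
    """Empirical MAPE distribution when x0[position] is replaced at random."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    low, high = x0.min(), x0.max()
    errors = np.empty(n_rep)
    for r in range(n_rep):
        contaminated = x0.copy()
        contaminated[position] = rng.uniform(low, high)   # assumed uniform draw
        # Errors are measured against the original, uncontaminated values.
        errors[r] = mape(x0, model(contaminated))
    return errors

# Synthetic series (assumed); the outlier is placed at index 3.
series = 10.0 * 1.05 ** np.arange(8)
dist = outlier_mape_distribution(series, position=3, model=gm11_fitted, n_rep=200, seed=1)
spread = dist.max() - dist.min()    # the range compared across models
```

Running the same function with two different `model` arguments and comparing the two `spread` values mirrors the comparison rule described above.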

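To compare the accuracy of the novel robust grey model integrating the new information priority criterion with that of the classical grey model, we use two statistical indicators, the correlation coefficient and the mean absolute percentage error, which are defined as

$$R = \frac{\sum_{k=1}^{n} \left(x^{(0)}(k) - \bar{x}\right)\left(\hat{x}^{(0)}(k) - \bar{\hat{x}}\right)}{\sqrt{\sum_{k=1}^{n} \left(x^{(0)}(k) - \bar{x}\right)^{2}} \sqrt{\sum_{k=1}^{n} \left(\hat{x}^{(0)}(k) - \bar{\hat{x}}\right)^{2}}}, \tag{22}$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{k=1}^{n} \left| \frac{\hat{x}^{(0)}(k) - x^{(0)}(k)}{x^{(0)}(k)} \right|, \tag{23}$$

where $\bar{x}$ and $\bar{\hat{x}}$ denote the sample means of the observed and predicted values, respectively.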
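Both indicators are simple to compute. The sketch below assumes that the correlation coefficient is the Pearson correlation between observed and predicted values and that the mean absolute percentage error is reported as a fraction rather than a percentage, which matches the magnitudes quoted for the figures (values around 0.04). The numbers in the example are made up.

```python
import numpy as np

def correlation_coefficient(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(actual, predicted)[0, 1])

def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))

# Made-up numbers: the three relative errors are 10%, 10% and 0%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
error = mape(actual, predicted)
r_perfect = correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A correlation close to one only says that predictions move together with the observations, so two models can share the same correlation while differing in mean absolute percentage error.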

3. Results

In this section, we first investigate the robustness of the novel robust grey model integrating the new information priority criterion, using the bootstrapping technique described in detail in the previous section. Then, we compare the predictive accuracy of the novel robust grey model integrating the new information priority criterion and the classical grey model. Finally, we forecast Chinese electricity demand for the years 2022 to 2025.

3.1. The Robustness to Outliers

In Figure 1, we plot the empirical distributions of mean absolute percentage errors from the bootstrapping test using the classical grey model, with the outlier placed in each year of the period from 2011 to 2018 in turn. Figure 1 is divided into eight panels. In Panels A through H, we set the value in 2011, 2012, 2013, 2014, 2015, 2016, 2017, and 2018, respectively, as an outlier, which is repeatedly drawn using the bootstrapping technique from the interval between the minimum and the maximum of the original time series.

In Figure 2, we plot the empirical distributions of mean absolute percentage errors from the bootstrapping test using the novel robust grey model integrating the new information priority criterion, with the outlier placed in each year of the period from 2011 to 2018 in turn. Figure 2 is divided into eight panels. In Panels A through H, we set the value in 2011, 2012, 2013, 2014, 2015, 2016, 2017, and 2018, respectively, as an outlier, which is repeatedly drawn using the bootstrapping technique from the interval between the minimum and the maximum of the original time series. Each panel corresponds to the panel with the same letter in Figure 1.

Comparing Figure 1 with Figure 2, we could see that across all the panels except Panel A, the range of the distribution from the novel robust grey model integrating the new information priority criterion is smaller than the range of the distribution from the classical grey model in the corresponding year when an outlier occurs. For example, the range of the distribution in Panel D of Figure 2 is about half the range of the distribution in Panel D of Figure 1, suggesting that the novel robust grey model integrating the new information priority criterion is more robust than the classical grey model.

Besides, we see that the mean of the distribution from the novel robust grey model integrating the new information priority criterion is closer to zero than the mean of the distribution from the classical grey model in the corresponding year when an outlier occurs. For example, the mean in Panel B of Figure 1 is larger than 0.04, while the mean in Panel B of Figure 2 is smaller than 0.04, indicating that the novel robust grey model integrating the new information priority criterion could have a higher predictive accuracy than the classical grey model.

3.2. The Accuracy of Prediction

Here, we test the accuracy of prediction. We divide the total dataset into two datasets. One covers the period 2011 to 2018, while the other covers the period 2019 to 2021. The former is referred to as the training set, while the latter is referred to as the test set. In order to illustrate that the novel robust grey model integrating the new information priority criterion has a better capability of prediction than the classical grey model whether or not an outlier occurs in the sample, we consider two settings. One contains an outlier, while the other does not. In the setting with an outlier, we replace the value in 2015, whose real value is 5.801, with an outlier of 9.180.

3.2.1. There Is an Outlier in Sample

Table 1 reports the results for the period 2011 to 2021 when there is an outlier in the sample. In columns (1) and (2), we provide the predicted values and absolute percentage errors, respectively, obtained from the classical grey model, that is, GM (1, 1). In columns (3) and (4), we provide the predicted values and absolute percentage errors, respectively, from the novel robust grey model integrating the new information priority criterion, that is, NIPC-RGM (1, 1). At the bottom of Table 1, we report the mean absolute percentage error and the correlation coefficient, which are the indicators of predictive accuracy. From Table 1, we see that the correlation coefficients are the same for the two models, while the novel robust grey model integrating the new information priority criterion has a lower mean absolute percentage error in the test set than the classical grey model, indicating that the former has a higher predictive accuracy than the latter when an outlier occurs in the sample.

3.2.2. There Is No Outlier in Sample

Table 2 reports the results for the period 2011 to 2021 when there are no outliers in the sample. In columns (1) and (2), we provide the predicted values and absolute percentage errors, respectively, obtained from the classical grey model, that is, GM (1, 1). In columns (3) and (4), we provide the predicted values and absolute percentage errors, respectively, from the novel robust grey model integrating the new information priority criterion, that is, NIPC-RGM (1, 1). At the bottom of Table 2, we report the mean absolute percentage error and the correlation coefficient, which are the indicators of predictive accuracy. From Table 2, we find that the novel robust grey model integrating the new information priority criterion has the same correlation coefficient as the classical grey model, while the former has a lower mean absolute percentage error in the test set than the latter, suggesting that the novel robust grey model integrating the new information priority criterion has a higher predictive accuracy than the classical grey model when there is no outlier in the sample.

3.3. The Forecasts of Chinese Electricity Demand during the Period 2022 to 2025

Previously, we have illustrated the robustness and the predictive accuracy of our proposed novel robust grey model integrating the new information priority criterion. Here, we apply the novel robust grey model integrating the new information priority criterion to forecast Chinese electricity demand from 2022 to 2025, that is, the values in the next four years. Table 3 reports the results. From Table 3, we see that Chinese electricity demand would continue to rise at a faster pace in the next four years.

4. Conclusion

In this paper, we propose a novel robust grey model based on the least trimmed squares estimation. The model also integrates the new information priority criterion. We refer to it as the novel robust grey model integrating the new information priority criterion, abbreviated as NIPC-RGM (1, 1). We describe the implementation steps for the novel robust grey model integrating the new information priority criterion. We also provide evidence that the novel robust grey model integrating the new information priority criterion performs better than the classical grey model in predicting Chinese electricity demand.

Our work contributes to the literature on grey models by focusing on outliers, which often occur in practice because of recording errors or accidental equipment failures. This issue has been little explored in the literature on grey models, although recent studies have pointed out that, when outliers occur in the sample, the grey model suffers from poor robustness and low predictive accuracy. In this paper, we try to solve this problem by introducing least trimmed squares estimation to estimate the structural parameters in the classical grey model. Our study also proposes a novel approach to test and illustrate the robustness of grey models, which uses the bootstrapping technique to form samples that include artificial outliers. This approach could also be generalized to compare robustness across a set of grey models, and between grey models and other predictive models such as autoregressive integrated moving average models and machine learning models. In addition, we apply our novel robust grey model to predict Chinese electricity demand, a time series with large uncertainty. We find that both the robustness to outliers and the predictive accuracy are better when the series is modeled by the novel robust grey model than when it is modeled by the classical grey model.

Of course, our work has limitations. For example, in this paper, we set the trimming constant to half of the number of observations and do not examine other values of the trimming constant, under which the novel robust grey model integrating the new information priority criterion could have a higher predictive accuracy. Future research is needed to investigate whether the value of the trimming constant affects the predictive accuracy of the novel robust grey model integrating the new information priority criterion. Besides, future work should compare the novel robust grey model integrating the new information priority criterion with other robust grey models.

Data Availability

The data used to support the findings of this study are included within the article.

Disclosure

This paper does not reflect an official statement or opinion from the organizations.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

All authors equally contributed to this paper. Cong Wei conceptualized the study and provided supervision. Jiayang Kong collected the data, conducted the statistical analysis, and drafted the manuscript. Riquan Yao and Shaojun Jin contributed to the interpretation of the results. All authors provided critical feedback on drafts and approved the final manuscript.

Acknowledgments

The authors thank Botao Liu, Yu Cao, Lang Cheng, Wei Wei, and Yihui Li for their helpful comments on an earlier version of this paper. In addition, Jiayang Kong wants to thank, in particular, Jieru Meng for the patience, care, and support over the years. The authors also acknowledge the Science and Technology Project of State Grid Corporation of China (grant no. B311UZ21000D).