Abstract

Grey system theory has been widely used to forecast economic data, which are often highly nonlinear, irregular, and nonstationary, and for which the available datasets are often very small. Many models based on grey system theory can be adapted to various economic time series. However, some of these models do not consider the impact of recent data or tune the model parameters that affect forecast accuracy. In this paper, we propose the PRGM(1,1) model, a rolling mechanism based grey model optimized by particle swarm optimization, in order to improve the forecast accuracy. Experiments on three actual economic datasets show that PRGM(1,1) achieves much better forecast accuracy than other widely used grey models.

1. Introduction

Forecasting is an important issue in many fields of the economy. An accurate prediction can inform the economic policy of large companies and governments and encourage more reasonable behavior by financial actors. Ideally, the prediction error would become smaller and smaller; in practice, we can only keep improving prediction algorithms to raise the prediction accuracy as far as possible.

Many forecasting models have been proposed; in general, these models can be divided into two categories: causal models and time-series models [1]. Causal models assume that the historical relationship between dependent and independent variables will remain valid in the future. They include multiple linear regression analysis and econometric models, which assume that the independent variables can explain the variations in the dependent variable. However, causal models are limited by the availability and reliability of the independent variables.

Time-series models assume that history will repeat itself; time series prediction is the process by which the future values of a system are forecast from the information in past and current data points. In the literature, the two main techniques for time series prediction are statistical and artificial intelligence (soft computing) based approaches. Well-known statistical models include AR (autoregressive), MA (moving average), ARMA (autoregressive moving average), ARIMA (autoregressive integrated moving average), and Box-Jenkins models. These statistical models handle nonlinear problems poorly and can be complex to apply when predicting future values of a time series.

The widely used artificial intelligence approaches include neural networks (NN) [2–4], support vector machines (SVM) [5–8], fuzzy systems [9], linear regression, Kalman filtering [10], and hidden Markov models (HMM) [3]. All of these approaches are used for updating the model parameters. In recent years, several hybrid models [11–14] have been proposed to improve the forecast accuracy. However, these artificial intelligence based approaches demand a great deal of training data and a relatively long training period for robust generalization [11]. For economic predictions, it is very difficult to construct a model using either conventional linear statistical methods or artificial neural networks, because economic time series are highly nonlinear, highly irregular, and highly nonstationary [15].

Grey system theory was introduced and developed by Deng in 1989 for the mathematical analysis of phenomena involving uncertainty and roughness. It requires only a small set of training data, which may be discrete or incomplete, to construct a model for future forecasts. Such uncertain and rough training data are called “grey” data [16]. By analogy, “white” data means that the information is completely clear, while “black” indicates that the information is completely unclear.

Grey system theory has been widely and successfully used to forecast all kinds of data in many areas, such as economics, finance, agriculture, industry, and energy. In the past few years, it has been employed to solve economic forecasting problems. The GM(1,1) model built from grey system theory has proved very efficient at forecasting irregular and nonlinear economic time series. A combination of residual modification and artificial neural network (ANN) estimation of the residual sign has been proposed to improve the accuracy of the original GM(1,1) model [17–19]. However, this approach needs a long training period.

The rolling mechanism is one of the most effective methods to improve the performance of grey models and to handle noisy data [7, 20–22]. The authors in [22] used the rolling mechanism to improve the forecast accuracy of a grey model for education expenditure. Zhao et al. [23] applied the rolling mechanism to forecast the per capita annual net income of rural households in China, used a differential evolution algorithm to optimize the rolling grey prediction model, and showed that it outperformed other traditional grey prediction models. The authors in [24–26] proposed improved rolling grey models that can update the model parameters, applied to coal production forecasting and semiconductor industry production forecasting, respectively.

Although these improved rolling mechanism based grey models can adapt to various economic time series because they use the recent data, which improves forecast accuracy, they pay little attention to the model parameters. Either the parameters are fixed through the whole prediction period, or they are adjusted by a simple rule that performs well on noiseless sequences but cannot adapt to noisy data.

In this paper, we propose an improved rolling mechanism based grey model optimized by particle swarm optimization (PSO for short) to improve the forecast accuracy, especially for highly irregular and noisy data. PSO, which belongs to the swarm intelligence methods, has been regarded as a tool for modeling behavior and for solving difficult numerical optimization problems since it was developed in [27] as an evolutionary computing technique. The PSO algorithm has been applied successfully in about 700 applications [28]. We choose PSO to optimize our model parameters for two reasons: like NN methods, it routinely delivers good optimization results, and it is simple enough to obtain them faster and more cheaply than NN methods can.

This paper examines a rolling mechanism based grey model with PSO optimization on economic data. Section 2 outlines the original grey model GM(1,1) and the improved GM(1,1) model with rolling mechanism. Section 3 presents the rolling mechanism based grey model with PSO optimization, including a PSO based algorithm that searches for the best value of the model parameter. Section 4 illustrates that our model performs much better than other grey system theory based models on three economic datasets: financial intermediation in Beijing, real estate in Beijing, and semiconductor production in Taiwan. Section 5 concludes this paper.

2. Grey Model Background

Grey system theory mainly focuses on extracting realistic governing laws of a system from the available data, which generally contain white noise. A grey model in grey system theory is denoted by GM$(n, m)$, where $n$ indicates the order of the difference equation and $m$ indicates the number of variables.

GM(1,1) is the original grey model, which has been widely applied to short-term prediction because of its computational efficiency. It uses a first-order differential equation to predict an unknown system. The GM(1,1) algorithm is described below.

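Step 1. The original time sequence is given by
$$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right), \tag{1}$$
where $x^{(0)}(k)$ is the time series data at time $k$ and $n$ is the length of the sequence, which must be equal to or larger than $4$.
On the basis of the initial sequence $x^{(0)}$, a new sequence $x^{(1)}$ is set up through the accumulated generating operator (AGO), defined as
$$x^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right), \tag{2}$$
$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n. \tag{3}$$

For nonnegative data, the accumulated sequence $x^{(1)}$ is monotonically increasing, which weakens the variation of $x^{(0)}$ and gives the new sequence a clear growing tendency.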

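Step 2. The first-order differential equation of the grey model GM(1,1) is established as
$$\frac{dx^{(1)}}{dt} + a x^{(1)} = b, \tag{4}$$
and its difference equation is
$$x^{(0)}(k) + a z^{(1)}(k) = b, \quad k = 2, 3, \ldots, n, \tag{5}$$
where $a$ is the development coefficient, $b$ is the driving coefficient, and $z^{(1)}$ is the generated sequence of $x^{(1)}$, with $z^{(1)}(k) = P x^{(1)}(k) + (1 - P) x^{(1)}(k-1)$.

In the original GM(1,1), $z^{(1)}(k)$ is set to the mean value of adjacent data, $0.5 x^{(1)}(k) + 0.5 x^{(1)}(k-1)$, that is, $P = 0.5$. In this paper, we propose a method that uses the PSO algorithm to find a more effective value of $P$.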

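Step 3. From (5), we can obtain the following equation:
$$Y = B \begin{bmatrix} a \\ b \end{bmatrix}. \tag{6}$$

In the above, $[a, b]^{T}$ is a sequence of coefficient parameters that can be computed by employing the least squares method:
$$\begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = \left(B^{T} B\right)^{-1} B^{T} Y, \tag{7}$$
where $Y$ is the constant vector
$$Y = \left[x^{(0)}(2), x^{(0)}(3), \ldots, x^{(0)}(n)\right]^{T} \tag{8}$$
and $B$ is the accumulated matrix
$$B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}. \tag{9}$$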

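Step 4. Substituting the estimates $\hat{a}$ and $\hat{b}$ from (7) into the solution of the first-order differential equation of Step 2, the prediction value of $x^{(1)}$ at time $k+1$ is
$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{\hat{b}}{\hat{a}}\right) e^{-\hat{a} k} + \frac{\hat{b}}{\hat{a}}. \tag{10}$$
After performing an inverse accumulated generating operation on (10), the predicted value of $x^{(0)}$ at time $k+1$ is
$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), \tag{11}$$
where $k = 1, 2, \ldots$.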
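Steps 1 to 4 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gm11_fit` and `gm11_forecast` and the sample series are hypothetical, and the background value is assumed to be the weighted average $P x^{(1)}(k) + (1-P) x^{(1)}(k-1)$ with $P = 0.5$ by default. Because only two coefficients are estimated, the least squares solution is written in closed form rather than with matrices.

```python
import math


def gm11_fit(x0, p=0.5):
    """Estimate the GM(1,1) coefficients (a, b) by least squares."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    # Step 1: accumulated generating operation (AGO).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Step 2: background values z(k) = p*x1(k) + (1-p)*x1(k-1), k = 2..n
    # (assumed weighting; p = 0.5 gives the mean of adjacent data).
    z = [p * x1[k] + (1.0 - p) * x1[k - 1] for k in range(1, n)]
    y = x0[1:]
    # Step 3: least squares for x0(k) = -a*z(k) + b, in closed form.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    denom = m * szz - sz * sz
    a = (sz * sy - m * szy) / denom
    b = (szz * sy - sz * szy) / denom
    return a, b


def gm11_forecast(x0, steps=1, p=0.5):
    """Forecast the next `steps` values of x0 (assumes a != 0)."""
    a, b = gm11_fit(x0, p)
    n = len(x0)

    def x1_hat(k):
        # Step 4: time response of the accumulated series (k is 1-based).
        return (x0[0] - b / a) * math.exp(-a * (k - 1)) + b / a

    # Inverse AGO: difference of consecutive accumulated predictions.
    return [x1_hat(n + j) - x1_hat(n + j - 1) for j in range(1, steps + 1)]


# Synthetic series growing by 10% per step (illustrative data only).
series = [100.0, 110.0, 121.0, 133.1, 146.41]
a, b = gm11_fit(series)
next_value = gm11_forecast(series, steps=1)[0]
print(f"a={a:.4f}, b={b:.3f}, next={next_value:.2f}")
```

A negative development coefficient `a` corresponds to a growing series. The sketch does not handle the degenerate case `a == 0`, which arises for a constant series.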

GM(1,1) uses the whole data set for prediction. However, using the most recent data can improve forecast accuracy [21]. The rolling mechanism (RM) is a metabolism technique that updates the input data by discarding old data in each loop of the grey prediction, and it can be applied to improve the prediction. The purpose of RM is that, in each rolling step, the data used for the next forecast are the most recent data. The resulting RM-GM is an efficient technique for increasing the forecast accuracy on noisy data. Because the data may exhibit different trends or characteristics at different times, it is preferable to study such noisy data with the RM-GM, in which the RM guarantees that the input data are always the most recent values.
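The rolling idea can be sketched independently of the forecasting model. In the illustration below, `rolling_forecast` and `naive_trend` are hypothetical names, the window advances one observation at a time using actual data (an assumption about how the window is refreshed), and `naive_trend` is only a stand-in for a one-step GM(1,1) forecaster.

```python
def rolling_forecast(series, window, one_step):
    """Forecast each next point from the `window` most recent observations.

    `one_step` is any callable that maps a list of recent values to a
    one-step-ahead forecast; a GM(1,1) forecaster would be plugged in here.
    """
    if window < 2 or window > len(series):
        raise ValueError("window must be between 2 and len(series)")
    forecasts = []
    for end in range(window, len(series) + 1):
        # Discard the oldest data: keep only the latest `window` values.
        recent = series[end - window:end]
        forecasts.append(one_step(recent))
    return forecasts


def naive_trend(recent):
    """Stand-in forecaster: extend the last observed increment."""
    return recent[-1] + (recent[-1] - recent[-2])


data = [1, 2, 3, 4, 5, 6, 7]
forecasts = rolling_forecast(data, window=4, one_step=naive_trend)
print(forecasts)
```

The last forecast in the returned list is for the first point beyond the end of `series`; the earlier ones can be compared against known values to measure accuracy.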

3. PSO Optimized RM-GM Model

Because $P$ directly influences the calculation of $a$ and $b$ in the GM(1,1) model and is one of the most important factors that decide the performance of the model, we present an algorithm based on RM-GM(1,1) combined with PSO, which optimizes the parameter $P$ in each rolling period to improve the forecast accuracy.

In the basic GM(1,1) model, the value of $P$ is customarily set to the mean value 0.5 for each $z^{(1)}(k)$ in the generated sequence $z^{(1)}$. This means that each datum has an equal impact on every future predicted value. However, the authors in [29] found that the GM(1,1) model often performs very poorly and produces delay errors for quickly growing sequences because of the mean value used in the generated sequence $z^{(1)}$. Tan proposed a method that computes $P$ from the data in order to widen the adaptability of the GM(1,1) model to various kinds of time sequences. The authors in [26] found that the RM-GM with a variable $P$ value generates better forecasts than with a fixed $P$ value; they determined the $P$ value by the timely percent change. These studies suggest that, for the trend prediction of nonmonotonic functions, the forecast outcomes are much better if the value of $P$ is set appropriately. However, Tan's method used the whole data set to calculate a fixed value of $P$ and did not consider the influence of recent data, which would improve accuracy.

In an improved RM-GM(1,1) algorithm, the value of $P$ can be chosen in a variety of ways. The basic RM-GM(1,1) sets $P$ to 0.5, which ignores any influence of the sequence data. Although Tan's strategy can adapt to various sequences, it does not consider the impact of the recent data in the sequence. Chang's strategy considers only the timely percent change for the prediction; it performs well on regular and noiseless sequences, but it cannot adapt to noisy data sequences. In this paper, we select PSO as our strategy to find the value of $P$ in each loop of the $P$-RM-GM(1,1). We name our PSO-based algorithm PRGM(1,1).

3.1. Characteristics of PSO

We use PSO to calculate the parameter $P$ for two reasons: it routinely delivers good optimization results, and it is simple. Compared with another commonly used swarm intelligence method, ant colony optimization (ACO), for which it is not easy to define variables for a given problem, PSO is a metaheuristic that makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. It does not require the optimization problem to be differentiable. Since economic data are partially irregular, noisy, and changing over time, PSO is a suitable choice for optimizing the parameter $P$. Another significant advantage of the PSO algorithm is its relatively simple coding and low computational cost. Compared with other optimization algorithms such as ACO, which require massive computation, PSO can obtain better results faster and more cheaply [30]. Hence, the PSO algorithm can perform well, while preserving prediction accuracy, even in applications that need power-aware computing on smart or personal devices with limited computational, storage, and energy resources.

3.2. Calculating $P$ by PSO

PSO is a population-based optimization technique in which the optimal solution is found by iteration and the solution quality is evaluated by a fitness value. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. Each particle keeps track of the coordinates in the problem space that are associated with the best solution (fitness) it has achieved so far.

First, a $D$-dimensional space with $N$ particles is initialized, and the particles' positions and velocities are randomly initialized. The position of the $i$th particle is represented as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and its velocity as $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, where $i = 1, 2, \ldots, N$ and $d = 1, 2, \ldots, D$. Then the objective function values (forecast errors) of all particles are computed, and the particles are updated iteratively until the termination condition is satisfied. In each iteration, every particle updates its own velocity and position according to the following two formulas:
$$v_{id}^{t+1} = w v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right),$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1},$$
where $p_{id}$ and $p_{gd}$ are determined by the objective function value (fitness), which should be defined according to the actual problem; for prediction, it can be set to the prediction error, with smaller values being better. $p_{id}$ and $p_{gd}$, respectively, represent the individual extreme value found by the $i$th particle itself in the $d$th dimension and the global optimal value, which records the best particle among all the particles in the group; $t$ is the iteration counter; $c_1$ and $c_2$ are two positive acceleration constants; $r_1$ and $r_2$ are uniform random values in the range $[0, 1]$; $v_{id}^{t}$ is the velocity of a particle at iteration $t$; $x_{id}^{t}$ is the current position of the $i$th particle at iteration $t$; and $w$ is the inertia weight, which determines how much of the particle's previous velocity is preserved.

After each update, if the current objective function value of a particle is better (a smaller forecast error) than its best value so far, the particle's best position and objective function value are replaced with the current ones. Finally, the best particle of the whole population is determined from these best objective function values. If its objective function value is smaller than the current global optimal objective function value, the best position and objective function value of the entire swarm are updated with this particle's position and objective function value.
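The search described above can be sketched for a one-dimensional parameter in $[0, 1]$. The code below is an illustration under stated assumptions, not the authors' PRGM(1,1) implementation: the names `gm11_in_sample_mape` and `pso_search_p` are hypothetical, the fitness is assumed to be the in-sample MAPE of a GM(1,1) fit on the current window, the inertia weight is assumed to decrease linearly, one particle is started at 0.5 so that the result is never worse than the conventional setting, and the data are synthetic.

```python
import math
import random
from itertools import accumulate


def gm11_in_sample_mape(x0, p):
    """In-sample MAPE (as a fraction) of a GM(1,1) fit with weight p."""
    n = len(x0)
    x1 = list(accumulate(x0))                                   # AGO
    z = [p * x1[k] + (1.0 - p) * x1[k - 1] for k in range(1, n)]
    y = x0[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    denom = m * szz - sz * sz
    a = (sz * sy - m * szy) / denom        # assumes a != 0 below
    b = (szz * sy - sz * szy) / denom

    def x1_hat(k):
        return (x0[0] - b / a) * math.exp(-a * (k - 1)) + b / a

    fitted = [x1_hat(k) - x1_hat(k - 1) for k in range(2, n + 1)]
    return sum(abs(f - v) / abs(v) for f, v in zip(fitted, y)) / m


def pso_search_p(x0, particles=20, iters=60, c1=2.0, c2=2.0,
                 w_max=0.9, w_min=0.4, seed=0):
    """Search p in [0, 1] that minimizes the fitness; returns (p, fitness)."""
    rng = random.Random(seed)
    # One particle starts at the conventional value 0.5 (design assumption).
    pos = [0.5] + [rng.random() for _ in range(particles - 1)]
    vel = [rng.uniform(-0.1, 0.1) for _ in range(particles)]
    pbest = pos[:]
    pbest_f = [gm11_in_sample_mape(x0, p) for p in pos]
    g = min(range(particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g], pbest_f[g]
    for t in range(iters):
        # Linearly decreasing inertia weight (assumed schedule).
        w = w_max - (w_max - w_min) * t / max(1, iters - 1)
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))  # keep p in [0, 1]
            f = gm11_in_sample_mape(x0, pos[i])
            if f < pbest_f[i]:                 # update the personal best
                pbest[i], pbest_f[i] = pos[i], f
                if f < gbest_f:                # update the global best
                    gbest, gbest_f = pos[i], f
    return gbest, gbest_f


# Synthetic window of observations (illustrative data only).
data = [112.0, 130.5, 151.2, 168.9, 201.3, 228.7]
p_best, err_best = pso_search_p(data)
print(f"best P = {p_best:.3f}, in-sample MAPE = {100 * err_best:.2f}%")
```

In a rolling setting, this search would be repeated on each window so that $P$ can differ from one predicted year to the next.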

3.3. Parameter Selection

In the $P$-PSO algorithm, the values selected for the cognitive weight $c_1$, the social weight $c_2$, and the inertia weight $w$ affect the convergence speed and the ability of the algorithm to find the optimum. However, different values may be better for different problems. Much work has been done to select a combination of values that works well on a wide range of problems, and both theoretical and empirical studies are available to help in selecting proper values [31–34].

Generally, the cognitive and social weights $c_1$ and $c_2$ are both set to 2. A proper value of the inertia weight $w$ provides a balance between global and local exploration: a large inertia weight favors global search, while a small inertia weight favors local search [31, 35]. In general, settings near $1$ facilitate global search, and settings in the range $[0.2, 0.5]$ facilitate rapid local search. In practice, $w$ is often reduced linearly from about $0.9$ to $0.4$. The authors in [31] suggested this LDW (linear decreasing weight) policy, which improved considerably on the benchmark algorithm, but it is not always the most suitable choice because it requires the search process to be linear. It has therefore been suggested that adapting the inertia weight at each iteration may be a better choice.

The linear decreasing weight (16) dynamically adapts the inertia weight (13), with $w_{\max}$ and $w_{\min}$ usually set to $0.9$ and $0.4$:
$$w^{t} = w_{\max} - \frac{w_{\max} - w_{\min}}{t_{\max}}\, t,$$
where $t_{\max}$ is the maximum number of iterations. The nonlinearly decreasing inertia weight (14) incorporates the hyperbolic tangent function (15) to update $w$ for each particle $i$ from the neighborhood index of that particle, which is calculated at each iteration using the global worst fitness value at the current iteration. A small neighborhood index indicates that the current position is bad and needs global exploration with a large inertia weight. On the contrary, a large neighborhood index indicates the requirement of local exploitation with a small inertia weight.
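The linear decreasing weight can be sketched as a small helper. The function name `linear_decreasing_weight` is hypothetical, and the default bounds 0.9 and 0.4 are the commonly used values, assumed here for illustration.

```python
def linear_decreasing_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Inertia weight at iteration t, falling linearly from w_max to w_min."""
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    return w_max - (w_max - w_min) * t / t_max


# Early iterations favor global search (large w), late ones local search.
schedule = [linear_decreasing_weight(t, 10) for t in range(11)]
print([round(w, 2) for w in schedule])
```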

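The constriction factor $\chi$ was used to control the magnitude of the velocities, instead of $w$. The velocity update scheme is replaced with the following:
$$v_{id}^{t+1} = \chi \left[ v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right) \right], \qquad \chi = \frac{2}{\left| 2 - \varphi - \sqrt{\varphi^{2} - 4\varphi} \right|},$$
where $\varphi = c_1 + c_2 > 4$ and generally $\varphi = 4.1$.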
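As a worked illustration of the constriction factor, the sketch below assumes the widely used Clerc-Kennedy form $\chi = 2 / |2 - \varphi - \sqrt{\varphi^2 - 4\varphi}|$ with $\varphi = c_1 + c_2 > 4$, and evaluates it for the common choice $c_1 = c_2 = 2.05$. The function names are hypothetical.

```python
import math


def constriction_factor(c1, c2):
    """Constriction factor chi for phi = c1 + c2 (requires phi > 4)."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("the formula requires c1 + c2 > 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def constricted_velocity(v, x, pbest, gbest, c1, c2, r1, r2):
    """One-dimensional velocity update scaled by the constriction factor."""
    chi = constriction_factor(c1, c2)
    return chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))


chi = constriction_factor(2.05, 2.05)
print(f"chi = {chi:.4f}")
```

With these values the factor is a little below 0.73, so each update shrinks the accumulated velocity without a separate inertia weight.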

4. Experiments and Evaluations

4.1. Datasets

The prediction of the development of tertiary industry is a very important topic in economic and financial areas. However, time series prediction in the economic area is generally very difficult because the series are nonstationary, nonlinear, and highly noisy.

To illustrate that our PRGM(1,1) algorithm performs better on both smooth and noisy data when only a small set of training data is available, we used three datasets: financial intermediation in Beijing from 1994 to 2010, which has a relatively smooth trend; real estate in Beijing from 1994 to 2010, which appears strongly nonlinear; and semiconductor industry production in Taiwan from 1994 to 2002, which appears regular from 1994 to 2000 but irregular after 2000. All datasets are collected from the China Statistical Yearbook, National Bureau of Statistics of China.

4.2. Evaluation Metrics

Prediction accuracy is an important criterion for evaluating a forecasting technique [36]. In this paper, three metrics that are often adopted to assess model performance [6, 22], namely, mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE), are used to evaluate the prediction accuracy. MAPE is a generally accepted metric of prediction accuracy, expressed in percent:
$$\mathrm{MAPE} = \frac{1}{n} \sum_{k=1}^{n} \left| \frac{x^{(0)}(k) - \hat{x}^{(0)}(k)}{x^{(0)}(k)} \right| \times 100\%,$$
where $n$ is the number of predicted values. The criterion of MAPE [37] is listed in Table 1.

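MAD and MSE are two metrics of the average magnitude of the forecast errors, but the latter imposes a greater penalty on a large error than on several small errors. The smaller the values, the closer the predicted values are to the actual values [38]:
$$\mathrm{MAD} = \frac{1}{n} \sum_{k=1}^{n} \left| x^{(0)}(k) - \hat{x}^{(0)}(k) \right|, \qquad \mathrm{MSE} = \frac{1}{n} \sum_{k=1}^{n} \left( x^{(0)}(k) - \hat{x}^{(0)}(k) \right)^{2}.$$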

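Besides, the coefficient of determination, denoted as $R^2$, is also applied to evaluate models in our experiments:
$$R^{2} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}},$$
where $\mathrm{SSE} = \sum_{k=1}^{n} \left( x^{(0)}(k) - \hat{x}^{(0)}(k) \right)^{2}$, $\mathrm{SST} = \sum_{k=1}^{n} \left( x^{(0)}(k) - \bar{x} \right)^{2}$, and $\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x^{(0)}(k)$.

The higher the value of $R^2$ is, the more successful the model is at predicting statistical data [23]. The maximum value of the coefficient of determination is $1$.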
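The four metrics can be computed as in the following sketch. It assumes the standard definitions (MAPE in percent, MAD and MSE as means over the predicted points, and $R^2$ as one minus the ratio of the residual to the total sum of squares); the function names and sample numbers are illustrative only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def mad(actual, predicted):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error; penalizes large errors more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination; its maximum value is 1."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


# Made-up values for illustration, not data from the experiments.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(mape(actual, predicted), mad(actual, predicted),
      mse(actual, predicted), r_squared(actual, predicted))
```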

4.3. Experimental Setup

The experiments are divided into two parts, Experiment I and Experiment II. Experiment I used the datasets of financial intermediation and real estate in Beijing. The data from 1994 to 2005 were used as sample data, while the data from 2006 to 2010 were used for prediction and testing. Experiment I compared three prediction models on these data: GM(1,1), RM-GM(1,1), and PRGM(1,1). Experiment II compared various PRGM(1,1) variants with different parameter settings.

The values of the parameters for PRGM(1,1) are selected in both experiments. We set the number of candidates of in particle searching space to and the maximum number of iterations to . For the basic PRGM(1,1), we set the two weights, and .

4.4. Experiment I

Table 2 shows the parameters calculated by the three prediction models, GM(1,1), RM-GM(1,1), and PRGM(1,1). In GM(1,1), which is constructed from all of the data for 1994–2005 with the fixed value $P = 0.5$, the parameters $a$ and $b$ each take a single fixed value for all the predicted years in financial intermediation, and likewise for all the predicted years in real estate.

In RM-GM(1,1), we set the sample sequence with and starting from 1994 to forecast the data from 2006 to 2010. Hence, the rolling number equals . The value is also fixed to in RM-GM(1,1). However, the parameters and change for every predicted year because of the rolling mechanism.

In PRGM(1,1), similar with RM-GM(1,1), the sample sequence with and that starts from 1994 to 2005 was used for predicting the 5 years' data since 2006. However, the value of is a variable of year that is different among the predictions of 2006–2010. Hence, the parameters and change for every predicted year because of both the rolling mechanism and the variety of .

Table 3 shows the evaluation metrics for GM(1,1), RM-GM(1,1), and PRGM(1,1). For the dataset of financial intermediation, the MAPE of PRGM(1,1) is much lower than the MAPE values of GM(1,1) and RM-GM(1,1), which indicates much better prediction performance than the other two models. The MAD and the MSE also indicate the excellent results produced by PRGM(1,1), and its coefficient of determination $R^2$ is close to the maximum value $1$. For the dataset of real estate, the prediction by the PRGM(1,1) model again shows excellent results, with a lower MAPE than those produced by GM(1,1) and RM-GM(1,1). PRGM(1,1) performs markedly better than either GM(1,1) or RM-GM(1,1) in the MAPE, MAD, and MSE metrics, and its $R^2$ value shows that it predicts the future data much more successfully.

Table 4 shows the forecasting results of the semiconductor industry production from 1998 to 2002 predicted by the $P$ value RM-GM(1,1) and PRGM(1,1) using the sample data of 1994–2002. We compared our results of PRGM(1,1) with the results produced by the $P$ value RM-GM(1,1) (P-RM-GM(1,1) for short) from the literature [26]. The MAPE value of PRGM(1,1) is better than that of P-RM-GM(1,1). The error of prediction, which indicates the degree of deviation of the predicted data from the actual data, is much lower for PRGM(1,1) than for P-RM-GM(1,1) in each year of 1998–2000. The actual value then fell sharply in the slump of 2001. PRGM(1,1) catches this trend well, which means that it has a remarkable ability to predict irregular sequences and, in particular, to sense unexpected changes. For 2002, however, the $P$ value RM-GM(1,1) model obtains a very small percentage error, while the error of the PRGM(1,1) model is larger. The reason is that the PRGM(1,1) model follows the trend more closely as the production data rebounded from the slump of 2001. PRGM(1,1) is much better than the $P$ value RM-GM(1,1) at forecasting the trends of time series sequences, which is significant for economic prediction.

4.5. Experiment II

In this experiment, we estimated PSO variants of different parameter configurations. We evaluated the constant setting and linearly varying settings of and on prediction accuracy. In constant settings, the configuration of is the best. It is in accordance with most of the previous conclusions. In linearly varying setting (see (12)), there is not much improvement on the metrics compared with the constant setting. We also evaluated the forecasting performances with diverse combinations of the start values and , and the end values and ranging from with a step of and found that there still is not much difference among them for all of the three datasets.

We evaluated three kinds of settings, constant, linearly decreasing, and nonlinearly decreasing. In constant setting, the optimal setting is for all of the datasets. We also observed that the performance is exactly the same when the population size is with different values of ranging from . In linearly decreasing setting (13), we varied the combinations of and ranging from with a step of , respectively. The results showed that there is nearly no difference on the metrics among different combinations. It indicates that the historical setting does not have much impact on the forecasting performance by linearly updating .

We also used the nonlinearly varying $w$ (14) and the constriction factor (18) to update particles' velocities (17). Figure 1 shows that the nonlinearly varying $w$ setting and the constriction factor setting, combined with linearly varying $c_1$ and $c_2$, can improve the prediction performance. The nonlinearly varying method does not require an initial setting of $w_{\max}$ or $w_{\min}$. It calculates $w$ dynamically according to the current situation: a large $w$ is set if the current position is far away from the global best position, and a small $w$ is set if the current position is near the global best position. The constriction factor can slow down the velocities but needs to be combined with the linearly varying method to control the effects of $c_1$ and $c_2$ in order to search more of the space.

Figure 2 shows an illustration of the evolution of the fitness at the first predicted year in all of the datasets. According to our empirical study, the maximum iteration can be set to 60–80 in the single particle PSO. Figure 2 shows the comparison of the convergence speed among variant PSOs. There is no general rule on these PSOs for all of the datasets, but all PSOs converge after 60–80 iterations at most. The time complexity of the PSO is . The runtime is dependent on both population size and iteration number.

5. Conclusions

In this paper, we proposed a rolling mechanism based grey model whose parameter $P$, which has a significant impact on the forecast accuracy, is optimized by the PSO algorithm. The experiments show that the predictions made by the PRGM(1,1) model are highly accurate on three economic datasets, which are either regular or noisy. PRGM(1,1) achieves much better forecast accuracy than three widely used grey models: GM(1,1), which has a fixed $P$ and ignores the impact of recent data; RM-GM(1,1), which considers the impact of recent data but keeps $P$ fixed through the whole prediction period; and the $P$ value RM-GM(1,1), which considers the recent data and also adjusts $P$ in each rolling step.

We also evaluated other PSO variants with different parameter settings. Almost all metaheuristics require a number of parameters to be set, which may lead to different outcomes, for example, multiple locally optimal solutions in the parameter space in terms of solution quality. An extension of this work is to analyze the principles of balancing exploitation and exploration of metaheuristics in forecasting. We will focus on comparing the effectiveness of exploitation and exploration among them and on analyzing the different concepts or philosophies behind them.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by Cuiying Grant of China Telecom, Gansu Branch (Grant no. lzudxcy-2013-3), Science and Technology Planning Project of Chengguan District, Lanzhou (Grant no. 2013-3-1), and Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry (Grant no. 44th). This work is also partially supported by Fundamental Research Funds for the Central Universities (Grant no. XDJK2014C141 and SWU114005).