Abstract

A new methodology, which combines a nonparametric method based on the local functional coefficient autoregressive (LFAR) form with chaos theory and a regional method, is proposed for multistep prediction of chaotic time series. The objective of this study is to improve the performance of long-term forecasting of chaotic time series. Obtaining the prediction values involves three steps. Firstly, the original time series is reconstructed in m-dimensional phase space with a time delay τ by using chaos theory. Secondly, the nearest neighbor points are selected by a local method in the m-dimensional phase space. Thirdly, the nearest neighbor points are used to fit a LFAR model. The parameters of the proposed model are selected by a modified generalized cross validation (GCV) criterion. Both simulated data (Lorenz and Mackey-Glass systems) and real data (Sunspot time series) are used to illustrate the performance of the proposed methodology. By detailed investigation and comparison of our results with published research, we find that the LFAR model can effectively fit the nonlinear characteristics of chaotic time series with a simple structure and has excellent performance for multistep forecasting.

1. Introduction

In recent decades, researchers have paid much attention to chaotic motion in many fields, such as meteorology, medicine, economics, signal processing, traffic flow, power load, and Sunspot prediction [1–12], and have produced many new models for predicting chaotic time series. In the late 1960s, researchers found that it is difficult to forecast a chaotic time series, which is the evolution of a chaotic system’s observations, by using traditional time series forecasting methods [1]. A series of theories and methods was then established for understanding the essence of chaotic motion, such as Takens’ embedding theorem [13]. Chaos theory has since become an important part of nonlinear science and is used for forecasting chaotic time series.

Modeling chaotic systems from observed data and predicting one or several future values of the time series have become an important issue [14]. Many prediction methods have been proposed, such as adaptive prediction [15], the support vector machine (SVM) [16–20], polynomial estimation [21–24], and neural networks (NN) [25–29]. Most of the published literature considers single-step prediction. For multistep prediction, there are two main categories: direct and iterative methods. Direct multistep prediction does not feed predicted values back into the model; iterative multistep prediction applies a short-term predictor recursively, which means that future values are calculated from the predictor’s own earlier outputs. However, multistep prediction is a difficult task because the largest Lyapunov exponent of the chaotic system limits the prediction horizon. Some researchers have focused on multistep prediction and used NN or its extended models to improve its performance [29–31]. Some studies show that the accuracy of prediction can be improved by hybrid techniques, such as combined SVM and Neuro-Fuzzy [32] and neural network and Neuro-Fuzzy [25, 26]. Studies also show that hybrid techniques can perform well by exploiting the prediction error, such as the combined PCA and SVM [19] and ARMA and RESN [7]. Generalized nonlinear filtering methods are investigated for 5-step prediction of chaotic time series in [33]. These methods generally produce better results than single models, but they are complex, depend on personal experience, and overfit easily.

In this paper, we propose to use a functional coefficient autoregressive (FAR) model instead of a local linear structure to approximate the local attractor in reconstructed phase space. As in [34], it is a nonparametric estimation of nonlinear dynamics. The proposed method combines chaos theory with a local technique and has excellent spatial adaptation, so it can effectively fit the nonlinear characteristics of chaotic time series. Unlike the RBF-AR model, the LFAR model has a reasonably simple implementation and is rarely affected by personal experience. Furthermore, the LFAR model can avoid overfitting by controlling the dimension of the primary functions which are used for estimating its functional coefficients. In this study, an algorithm based on the dynamic least squares criterion for estimation of local functional coefficients is proposed. The effectiveness of the proposed model is demonstrated by application to simulated data (Lorenz and Mackey-Glass systems) and real data (Sunspot time series). In these cases, we analyze and estimate the functional coefficients by using the proposed algorithm and examine the properties of iterative multistep prediction.

The remainder of this paper is organized as follows. Section 2 introduces the LFAR model and establishes the optimal parameter set by using GCV. Section 3 uses two simulated chaotic systems and one real-life time series as examples to evaluate the proposed models, discusses the properties of the model’s parameters, and compares the results with published research. Section 4 concludes the paper.

2. Methodology

2.1. Phase Space Reconstruction for Chaotic Time Series

For a scalar chaotic time series , the phase space can be reconstructed by Takens’ embedding theorem and the reconstructed phase points are , where . The embedding dimension m and the time delay τ can be obtained by using Cao’s method [38]. Then, a continuous vector mapping or can be described by the unknown evolution from to or . That is, or .
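As a minimal sketch of the reconstruction step, the following builds delay vectors from a scalar series for a given embedding dimension m and time delay τ. The function name `delay_embed` and the backward-delay ordering of the coordinates are illustrative assumptions, not notation taken from this paper; choosing m and τ (for example by Cao’s method) is outside the sketch.

```python
def delay_embed(series, m, tau):
    """Return delay vectors [x(t), x(t - tau), ..., x(t - (m - 1) * tau)].

    Assumption: coordinates are ordered from the most recent value backwards;
    other conventions only reorder the coordinates of each phase point.
    """
    start = (m - 1) * tau  # first index with a complete delay vector
    if m < 1 or tau < 1 or start >= len(series):
        raise ValueError("series too short for this (m, tau)")
    return [
        [series[t - j * tau] for j in range(m)]
        for t in range(start, len(series))
    ]


if __name__ == "__main__":
    # Six observations, m = 3, tau = 2 -> two phase points.
    print(delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2))
```

Each returned row is one reconstructed phase point, so a series of length N yields N − (m − 1)τ points.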

2.2. The LFAR Model and Estimation Method

A chaotic time series prediction model for describing evolution from to can be written asThe continued vector mapping is the best prediction function in the sense that minimizes the expected prediction error:

The saturated nonparametric function cannot be estimated with reasonable accuracy due to the curse of dimensionality [30]. A LFAR model for chaotic reconstruction data is presented in m-dimensional reconstructed phase space. That is, where is the lag of the model-dependent variable , m is the embedding dimension, τ is the time delay, and functional coefficients are continuous functions.

The functional coefficients are difficult to estimate because they are treated as nonparametric and do not have a fixed form. Many nonlinear forms can be used, but finding a good one by trying one model after another is hard. Here the local nonparametric method is applied to obtain the estimations of unknown functional coefficients . Using Taylor’s series expansion, with -order derivative near the point can be described as follows:

Let ,  ,  ,   is the bandwidth for controlling the number of the nearest neighbor points, and , . Ignoring the higher order infinitesimal, we can approximate by the -dimensional primary functions as follows:where is used to replace variable and represent the weight coefficients of primary functions.

The LFAR model at the current state point in -dimensional phase space can be described as follows:where , . In order to obtain a LFAR model at the current state point in the reconstructed phase space, we select nearest neighbor points by using the Euclidean distances . The estimations of the parameters of the LFAR model at state point can be obtained by solving the following weighted least squares (WLS) regression problem:whereand is a kernel function and is a nonnegative function which emphasizes neighbor observations around . The parameter is used to determine the weights of neighboring observations around in estimating .

Let ; we have . Then we can obtain a prediction of next point:

SetThen the WLS solution iswhere is a diagonal matrix with as its th diagonal element, which entails .
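The weighted least squares step can be illustrated with a deliberately simplified sketch: a kernel-weighted local linear fit on the k nearest neighbors of the current state point, solved through the normal equations. This is not the full LFAR estimator (it omits the primary-function expansion of the coefficients), and the Epanechnikov kernel, the bandwidth rule, and all function names are assumptions made for illustration only.

```python
import math


def epanechnikov(u):
    """Assumed kernel: nonnegative, largest at u = 0, zero for |u| >= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; try more neighbors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def local_wls_predict(points, targets, query, k, kernel=epanechnikov):
    """Predict the target at `query` from a weighted linear fit on k neighbors."""
    dists = [math.sqrt(sum((p - q) ** 2 for p, q in zip(pt, query)))
             for pt in points]
    nearest = sorted(range(len(points)), key=dists.__getitem__)[:k]
    # Assumed bandwidth: slightly beyond the farthest selected neighbor.
    h = dists[nearest[-1]] * 1.01 + 1e-12
    rows = [[1.0] + list(points[i]) for i in nearest]
    w = [kernel(dists[i] / h) for i in nearest]
    y = [targets[i] for i in nearest]
    p = len(rows[0])
    # Normal equations (X^T W X) beta = X^T W y.
    xtwx = [[sum(wi * r[a] * r[b] for wi, r in zip(w, rows)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(wi * r[a] * yi for wi, r, yi in zip(w, rows, y))
            for a in range(p)]
    beta = solve_linear(xtwx, xtwy)
    return sum(bj * v for bj, v in zip(beta, [1.0] + list(query)))
```

In this sketch `points` would be reconstructed phase points and `targets` their one-step successors; the diagonal weight matrix of the WLS problem appears only implicitly through the list `w`.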

2.3. Determination of Optimal Parameters

In the process of modeling LFAR, the parameter set needs to be estimated. The embedding dimension m and the time delay τ can be calculated by Cao’s method and the autocorrelation function method. For the remaining part of the parameter set, the accuracy of prediction based on the LFAR model is clearly sensitive to the kernel function and the parameter . Here we have the form of kernel function as follows:Conventionally, we have and . Let ; then for any . And let , the kernel for any , which corresponds to the local linear model. For , the kernel changes from to , which means that the weight at the same neighbor point changes from to . The parameter can adjust the weight and the parameter can adjust the convergence speed of kernel when or . In this study, we consider simulated chaotic systems and real-life time series as examples to investigate a proper parameter to achieve the best performance for modeling a LFAR model.

It is also very crucial to choose a proper dimension of primary functions to achieve the best performance for modeling chaotic time series. Low dimension leads to bad simulation results, and high dimension increases the complexity of computation and leads to overfitting. Hence, our main purpose of this section is to determine the parameter set . Generally, the optimal dimension should be selected to minimize the mean squared error (MSE) or its improved versions. Here we use a simple and quick method which is proposed in [34]. It can be regarded as a modified generalized cross validation criterion. Let and be two given positive integers and satisfy . First we use subseries with sample size to estimate the unknown coefficient functions and then to compute the multistep forecasting errors of the next part with sample size based on the estimated model. For example, the data with sample size is used to get the estimated model, and the prediction errors for the next data are computed. Then, the data with sample size is used and so on. The average prediction error or the standard prediction errors [31] use the subseries which is given bywhere . The overall prediction error is given byFan et al. [34] set and , and Meng and Peng [28] set and . The selected bandwidth does not critically depend on the choice of and as long as is reasonably large. In practical implementations, we select Meng’s method.
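The subseries procedure described above can be sketched as a rolling evaluation: for each of several folds, a model is fitted on a shortened prefix of the series and scored on the block that follows it, and the fold errors are then averaged. The names `block`, `n_folds`, and `forecaster`, and the use of a mean squared error per fold, are assumptions for illustration; the paper’s own error formula is not reproduced here.

```python
def rolling_forecast_error(series, block, n_folds, forecaster):
    """Average multistep error over `n_folds` shortened training prefixes.

    `forecaster(train, steps)` is a hypothetical callable returning `steps`
    predictions that continue `train`.
    """
    n = len(series)
    if block * n_folds >= n:
        raise ValueError("need len(series) > block * n_folds")
    fold_errors = []
    for q in range(1, n_folds + 1):
        split = n - q * block          # training prefix ends here
        train = series[:split]
        actual = series[split:split + block]
        predicted = forecaster(train, block)
        mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / block
        fold_errors.append(mse)
    return sum(fold_errors) / n_folds, fold_errors


if __name__ == "__main__":
    # Persistence forecaster: repeat the last observed value.
    def persistence(train, steps):
        return [train[-1]] * steps

    print(rolling_forecast_error(list(range(20)), 2, 3, persistence))
```

A parameter search would call this function once per candidate parameter set, with `forecaster` closing over the candidate, and keep the candidate with the smallest overall error.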

We select the proper bandwidth by minimizing . The function is minimized by comparing its values in a finite set of scale parameters in a grid . We can obtain different with different , , and . It is also very important to choose the parameter set . Finally we can determine the optimal parameter set .

2.4. Algorithm Description and Multistep Prediction

Select the optimal parameter set in executable LFAR algorithm that contains the process of the phase space reconstruction parameter set , the number of nearest neighbor points , the dimension of primary functions , and the weight of neighbor observations parameter . Here, we choose the number of nearest neighbor points from to . To speed up the computation and avoid overfitting, we let . We select and   and set . The optimal parameters’ selection is described in detail in Algorithm 1.

Input: The relevant data, such as: chaotic time series, the initial parameters
Output: Results, such as: the optimal parameters
 Scale chaotic time series between ;
For  %  
 For  %  
  The autocorrelation function method, The Cao method.
 End
End %  Obtain the embedding dimension and the time delay
 Choose training set and select primary functions, Phase space reconstruction based on ,
Compute the Euclidean distances between phase points in the reconstructed phase space.
For
%  ,  ,  
 Compute .
End
 Select the optimal parameter set:

In Section 2.2, the single-step prediction is given. For multistep forecasting, as noted in Section 1, there are two possible approaches. The iterative -step prediction adds to the training set and applies the LFAR model iteratively. The direct -step prediction is to fitThe optimal parameter set needs to be selected to minimize the again. For multistep prediction in this paper, we use the iterative approach. The executable multistep prediction of the LFAR model based on Algorithm 1 is described in detail in Algorithm 2.

Input: The relevant data, such as: chaotic time series, the optimal parameters
Output: Results, such as: the prediction values
Scale chaotic time series between ;
Phase space reconstruction based on embedding dimension and the time delay ;
Compute the Euclidean distances between phase points in the reconstructed phase space;
For   %   is unknown data.
Compute prediction values by using the optimal parameters ;
Add prediction values to the training set;
Compute Euclidean distances between the new phase point and others;
End
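The iterative loop of Algorithm 2 can be sketched as follows: each predicted value is appended to the series, and the next prediction is made from the extended series. To keep the sketch self-contained, a nearest-neighbor successor rule stands in for the LFAR one-step predictor; that stand-in, and the names `nn_one_step` and `iterative_forecast`, are assumptions and not the method of this paper.

```python
def nn_one_step(series, m, tau):
    """Stand-in one-step predictor: successor of the nearest past delay vector."""
    query = [series[-1 - j * tau] for j in range(m)]
    best_dist, best_next = float("inf"), series[-1]
    # Only points that have a known successor are candidates.
    for t in range((m - 1) * tau, len(series) - 1):
        vec = [series[t - j * tau] for j in range(m)]
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        if dist < best_dist:
            best_dist, best_next = dist, series[t + 1]
    return best_next


def iterative_forecast(series, steps, m, tau, one_step=nn_one_step):
    """Predict `steps` values, feeding each prediction back into the series."""
    extended = list(series)
    for _ in range(steps):
        extended.append(one_step(extended, m, tau))
    return extended[len(series):]


if __name__ == "__main__":
    history = [0, 1, 2, 3] * 5
    print(iterative_forecast(history, steps=6, m=2, tau=1))
```

Because predictions re-enter the training set, any one-step error propagates forward, which is why the usable horizon of the iterative approach is finite for chaotic series.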

3. Numerical Experiments and Performance Evaluation

To compare our results with others, we looked for published research doing long-term prediction, testing it with Lorenz system, Mackey-Glass system, and the Sunspot time series. All these chaotic time series are scaled within as follows:

3.1. Prediction of Lorenz System

Lorenz time series can be produced as follows (Lorenz, 1963, see [1]) where , , and are commonly selected as , , and . The standard fourth-order Runge-Kutta method is used to get the Lorenz time series, and the -coordinate is used as observations. A time series with sample size 2300 is generated. The first 1800 data points are used for training. The data with sample size is used to estimate the model, and the prediction errors of the next data points are computed. Then, the data with sample size is used, and the prediction errors are computed, and so on. The rest of the data is used for testing. The results of multistep prediction for the Lorenz time series are shown in Figure 1.
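For readers who want to reproduce a comparable data set, the following is a minimal sketch of generating a Lorenz series with the classical fourth-order Runge-Kutta scheme. The parameter values (σ = 10, ρ = 28, β = 8/3), the step size 0.01, the initial state, and the choice of the x-coordinate as the observed variable are assumptions based on common practice; the settings actually used in this paper are not recoverable from the text.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (assumed standard parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for an autonomous system."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def lorenz_series(n, dt=0.01, state=(1.0, 1.0, 1.0), discard=0):
    """Return n samples of the x-coordinate after dropping `discard` steps."""
    samples = []
    for i in range(n + discard):
        state = rk4_step(lorenz_rhs, state, dt)
        if i >= discard:
            samples.append(state[0])
    return samples


if __name__ == "__main__":
    data = lorenz_series(2300, discard=1000)
    train, test = data[:1800], data[1800:]
    print(len(train), len(test))
```

Discarding an initial transient, as in the usage lines, is a common precaution so that the retained samples lie close to the attractor.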

From Figure 1, it can be seen that the LFAR model has small error values and that the multistep prediction values are in good agreement with the real data. The multistep prediction values of LFAR start diverging significantly only from the 450th time step, which implies that the proposed model performs well for multistep prediction of chaotic time series.

The results of prediction are shown in Figures 2, 3, and 4. Here we have four parameters, but we cannot show them in a 5-dimensional picture. So we sort the parameter set and show the less than in Figure 2. From Figure 2, we can see that the LFAR model’s with different parameters are similar. More details about the parameters are shown in Figures 3 and 4. Figure 3 shows that the functional coefficients’ order mainly takes four, the lag variable is selected from to at different phase points, the weight parameter changes with other parameters, and the number of nearest neighbor points is chosen in the vicinity of . We can obtain the optimal parameters from Figure 2. In Figure 4, we change one parameter and fix the other optimal parameters to investigate the influence of the changed parameter. From Figure 4, we can see that parameters can effectively influence the prediction error.

In Figure 5, we show the functional coefficients’ estimation. From this figure, we can see that the estimated values vary regularly within 400 prediction steps; beyond 400 steps, the accuracy of prediction declines and the estimates no longer follow the previous pattern.

3.2. Prediction of Mackey-Glass System

Mackey-Glass system is used in literature as a benchmark model due to its chaotic characteristics [39]. Mackey-Glass time series is generated by the following discrete form:where , , and and initial conditions . Thus, we can obtain a scalar chaotic time series sample set with sample size of 3300. Then we choose 1800 pieces of data for training, and the remaining part is treated as testing data.

Figure 6 compares the real values with the prediction values for the remaining part. From Figure 6, it can be seen that the LFAR model has small error values and that the prediction values are in good agreement with the real values. The multistep prediction values of the LFAR model start diverging significantly only from the 900th time step, which implies that the proposed model performs well for multistep prediction of chaotic time series.

The results of prediction are shown in Figures 7, 8, and 9. From Figure 7, we can see that the LFAR model’s with different parameters are similar. Figure 8 shows that the functional coefficients’ order mainly takes four, the lag variable is selected from to at different phase points, the weight parameter changes with other parameters, and the number of nearest neighbor points is chosen in the vicinity of . We can obtain the optimal parameters from Figure 7. From Figure 9, we can see that parameters can effectively influence the prediction error.

In Figure 10, we show the functional coefficients’ estimations. From this figure, we can see that the estimated values vary regularly within 900 prediction steps; beyond 900 steps, the prediction accuracy declines and the estimates no longer follow the previous pattern.

3.3. Prediction for Sunspot Time Series

Sunspot time series is a good indication of solar activity for solar cycles. The monthly smoothed Sunspot time series is obtained from the SIDC (World Data Center for the Sunspot Index). To compare the results with different models in the literature, data are selected in the same conditions reported by [8, 9]. Sunspot series from November 1834 to June 2001 (2000 points) are selected and scaled within . The first 1000 samples of time series are selected for training and the remaining 1000 samples are kept to test the prediction models.

Results are shown in Figures 11 and 12. From Figure 11, it can be seen that the prediction values of the LFAR model are accurate.

From Figure 12, for the multistep prediction, we find that the prediction values of the LFAR model start diverging significantly from the 60th time step. This is because the real data contain noise, and the noise affects the performance of prediction. Even so, 60 time steps of monthly data still allow us to predict 5 years of Sunspot data into the future.

3.4. Results and Discussion

We compare the proposed models with some of the models reported in the literature and measure performance with mean squared error (MSE), root mean squared error (RMSE) and normalized mean squared error (NMSE), relative error (), and symmetric mean absolute percentage error (SMAPE), namely,
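Since several of these measures have more than one definition in the literature, the sketch below states one common choice for each. The normalization of NMSE by the population variance of the actual values and the SMAPE form with the mean of absolute values in the denominator are assumptions; they may differ from the definitions used for the tables in this paper.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def nmse(actual, predicted):
    """MSE divided by the population variance of the actual values (assumed)."""
    mean = sum(actual) / len(actual)
    variance = sum((a - mean) ** 2 for a in actual) / len(actual)
    return mse(actual, predicted) / variance


def smape(actual, predicted):
    """Symmetric MAPE as a fraction; terms with a zero denominator are skipped."""
    terms = [
        abs(a - p) / ((abs(a) + abs(p)) / 2.0)
        for a, p in zip(actual, predicted)
        if abs(a) + abs(p) > 0
    ]
    return sum(terms) / len(actual)


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 6.0]
    print(mse(y_true, y_pred), rmse(y_true, y_pred),
          nmse(y_true, y_pred), smape(y_true, y_pred))
```

When comparing against published numbers, it is worth checking which normalization each source uses, because NMSE and SMAPE values are not comparable across definitions.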

The comparative results are shown in Tables 1–3. The data could not be selected under exactly the same conditions as those reported in the literature; thus the comparisons in Tables 1–3 may contain some inaccuracies. In Table 1, we can see that the proposed models are better than some of the existing methods for predicting the Lorenz time series, but the best results are those of RBF and PSORBF [30]. This implies that RBF can predict multistep results well, but its optimal parameters are difficult to obtain, whereas the proposed model can obtain all of its optimal parameters. Table 2 shows that the proposed models are the best. For the real-world time series in Table 3, the proposed models are better than most of the existing methods, except the CERNN [29].

4. Conclusions

We propose a new methodology for forecasting chaotic time series based on the FAR model, chaos theory, and a local nonparametric technique. Firstly, the chaotic time series is reconstructed in m-dimensional phase space with a time delay τ by using chaos theory. Secondly, the neighbor points are selected by a local method in the m-dimensional phase space. Thirdly, we use the nearest neighbor points to identify a novel FAR model by the local least squares method. Finally, all parameters are calculated by the GCV criterion and judged in the sense of SAPE. The proposed functional coefficient autoregressive (FAR) model is used instead of a local linear structure to approximate the local attractor in reconstructed phase space, which is a local nonparametric estimation of nonlinear dynamics.

An algorithm based on the dynamic least squares criterion for estimation of local functional coefficients is proposed. For the Mackey-Glass and Lorenz attractors, the parameters have been investigated carefully. For the Sunspot time series, which contains noise, the model has better one-step forecasting performance, and the multistep prediction model forecasts accurately up to the 60th step, which can help us predict 5 years of Sunspot data into the future. In these cases, we analyze and estimate the functional coefficients by using the proposed algorithm and examine the properties of iterative multistep prediction. By detailed investigation and comparison of our results with published research, we find that the LFAR model can effectively fit the nonlinear characteristics of chaotic time series with a simple structure and has excellent performance for multistep forecasting.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project was supported by Natural Science Foundation Project of China (Grant no. 11471060), Fundamental and Advanced Research Project of CQCSTC of China (Grant no. cstc2014jcyjA40003), and Scientific and Technological Research Program of Chongqing Municipal Education Commission of China (Grant no. KJ130818).