#### Abstract

Predicting chaotic time series in the medium-to-long term has long been a hot and challenging topic. We combine autoencoders and convolutional neural networks (AE-CNN) to capture the intrinsic certainty of chaotic time series, and we utilize transfer learning (TL) theory to improve the prediction performance in the medium-to-long term. Thus, we develop a prediction scheme for chaotic time series based on AE-CNN and TL, named AE-CNN-TL. Our experimental results show that the proposed AE-CNN-TL has much better prediction performance than AE-CNN, ARMA, and LSTM.

#### 1. Introduction

Chaos is the unity of extrinsic randomness and intrinsic certainty, and it widely exists in social fields and natural systems. In many cases, however, we cannot represent a chaotic system by an analytical mathematical model and can only observe its time series [1]. With the development of big data acquisition and research technology, we can obtain large-scale chaotic time series from a wide range of social fields and natural systems, which highlights the necessity of research on the prediction of chaotic time series. The prediction of chaotic systems has always been a hot and challenging topic in many fields. Moreover, the acquisition of chaotic time series data is usually accompanied by errors, which makes the prediction of chaotic time series more difficult.

The most significant characteristic of a chaotic system is its sensitivity to initial conditions; that is, a slight change in input produces a seemingly random and unpredictable difference after a long enough evolution, which makes medium-to-long-term prediction of a chaotic system difficult. Similarly, an error in a chaotic time series may be amplified to an unacceptable degree by this sensitivity, which leads to the failure of prediction. However, the orbit of a chaotic system also has intrinsic certainty such as self-similarity, which indicates that a chaotic system can be predicted to a certain extent, even in the medium-to-long term, by using fractal geometry and other related theories. But the self-similarity and other intrinsic certainty of a chaotic system are very difficult to measure accurately. So far, there is no commonly accepted theory that can accurately measure intrinsic certainty such as self-similarity and realize the medium-to-long-term prediction of a chaotic system. Inspired by Xia et al. [2], we try to overcome the following difficulties to effectively predict chaotic time series in the medium-to-long term: (1) how to make full use of the intrinsic certainty of chaotic time series, which requires us to propose or borrow a universal approximation framework [3] that has the powerful representation ability of deep learning and can extract high-order features from the input information for medium-to-long-term prediction of chaotic time series; and (2) how to extract intrinsic high-order features from a chaotic time series and transfer them to its coupled time series to strengthen the prediction performance of the latter.

To address the abovementioned difficulties, we make the following contributions:

(1) Employ a convolutional neural network (CNN) to extract high-order features of a chaotic time series for medium-to-long-term prediction. The CNN has a strong representation learning ability, so we can use it to classify the input information according to its hierarchical structure and extract high-order features from the input information.

(2) Utilize the transfer learning (TL) theory to improve the medium-to-long-term prediction performance. We extract the intrinsic high-order features from a coupled chaotic time series and use the transfer learning theory to overlay them with the high-order features of the target time series to enhance its prediction performance.

The remainder of this article is organized as follows: in Section 2, we review the related work; in Section 3, we describe the proposed AE-CNN-TL in detail; in Section 4, we carry out experiments and discuss the results; and in Section 5, we draw conclusions and propose some future work.

#### 2. Related Work

##### 2.1. Classification of Prediction Methods for Chaotic Time Series

According to the number of prediction steps, we can divide prediction methods into single-step prediction [4] and multistep prediction [5, 6]. In single-step prediction, the model can only be used to predict the sequence value of the next step. In multistep prediction, the model is employed to predict the sequence values of multiple future steps, which sets higher requirements. There are two strategies for multistep prediction: the direct approach and the iterative approach. The direct approach completes a long-term prediction in one pass by constructing a function mapping between the input and the multistep output. Its idea is simple and easy to implement, but it needs a more complex function mapping structure, which is a significant challenge. The iterative approach predicts the time series value only one step ahead at a time, then takes the predicted value as known, and iterates to predict the next step. The iterative approach places low requirements on the model, but in the iterative process the prediction error is also fed back into the model, which results in error accumulation and decreasing prediction accuracy in the medium-to-long term. The AE-CNN-TL proposed in this article belongs to the direct approach.
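The difference between the two multistep strategies can be illustrated with a minimal Python sketch. The function names and the toy "persistence plus drift" models below are hypothetical stand-ins introduced only for illustration; they are not the models used in this article.

```python
from typing import Callable, List, Sequence


def iterative_forecast(one_step: Callable[[Sequence[float]], float],
                       history: Sequence[float],
                       window: int,
                       horizon: int) -> List[float]:
    """Predict `horizon` steps by feeding each prediction back as input."""
    buf = list(history[-window:])
    preds: List[float] = []
    for _ in range(horizon):
        nxt = one_step(buf)
        preds.append(nxt)
        # The predicted value is treated as known, so its error is carried forward.
        buf = buf[1:] + [nxt]
    return preds


def direct_forecast(multi_step: Callable[[Sequence[float]], Sequence[float]],
                    history: Sequence[float],
                    window: int) -> List[float]:
    """Predict all future steps at once from a single input window."""
    return list(multi_step(list(history[-window:])))


# Hypothetical toy models: extrapolate the last observed difference.
def toy_one_step(w: Sequence[float]) -> float:
    return w[-1] + (w[-1] - w[-2])


def toy_multi_step(w: Sequence[float], horizon: int = 3) -> List[float]:
    step = w[-1] - w[-2]
    return [w[-1] + step * (k + 1) for k in range(horizon)]


series = [1.0, 2.0, 3.0, 4.0]
iter_preds = iterative_forecast(toy_one_step, series, window=2, horizon=3)
direct_preds = direct_forecast(toy_multi_step, series, window=2)
```

In the iterative variant the model only ever sees a one-step target, whereas in the direct variant the mapping from the input window to the whole output horizon must be learned in one piece.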

##### 2.2. Prediction of Chaotic Time Series Based on Deep Learning of Direct Strategies

To make accurate multistep predictions, Min et al. [5] constructed a new scheme based on a novel recurrent neural network (RNN) to implement long-term prediction. Yi et al. [7] introduced a deep neural network- (DNN-) based method to predict air quality in the medium term. Kuo and Huang [8] presented a DNN model (EPNet), which combines CNN and LSTM to predict the electricity price in the medium term. Mujeeb et al. [9] built a deep LSTM model for short- and medium-term prediction. Shen et al. [10] set up a prediction model for time series named SeriesNet, which includes an LSTM and a dilated causal convolution network. Mudassir et al. [11] established high-performance machine learning- (ANN, SANN, SVM, and LSTM) based classification and regression models to predict the Bitcoin price in the short-to-medium term. Hussain et al. [12] put forward a one-dimensional convolutional neural network (1D-CNN) and an extreme learning machine (ELM) to predict the one-step-ahead stream-flow in the short and medium terms. Chimmula and Zhang [13] constructed an LSTM network to predict COVID-19 cases. Livieris et al. [14] designed a CNN-LSTM model to predict gold prices. Zhang and Dong [15] provided a CRNN model consisting of CNN and RNN to predict temperature changes from historical data. Li [16] proposed the KLS method, which fuses the Kalman filter, LSTM, and SVM to forecast carbon emissions. Combining multikernel convolutional layers and convolutional LSTM layers, Asadi and Regan [17] proposed a deep learning framework for spatiotemporal forecasting problems.

#### 3. Proposal

As we know, an autoencoder (AE) can learn a representation (encoding) of a time series by training the network to ignore signal “noise,” i.e., by denoising. Because the CNN is shift invariant (space invariant), it is most commonly applied to classify images, cluster images by similarity, and perform object recognition within scenes. In this article, we employ the CNN to extract the intrinsic high-order features, including similarity, from the chaotic time series to implement predictions. By combining the advantages of AE and CNN, we can use AE-CNN to predict a chaotic time series, as shown in Figure 1.

If there is a chaotic time series *Y* associated with *X*, we can extract the high-order intrinsic features from *Y* and utilize the transfer learning theory to overlay them with the high-order intrinsic features from *X* to enhance the prediction performance. Thus, we can develop a medium-to-long-term prediction scheme for chaotic time series based on AE-CNN and transfer learning, named AE-CNN-TL, as shown in Figure 2.

We assume that both time series *X* and *Y* have length $L$, where $X = \{x_1, x_2, \ldots, x_L\}$ and $Y = \{y_1, y_2, \ldots, y_L\}$.

As shown in Figure 2, AE-CNN-TL consists of four stages: denoising, feature extraction, transfer learning, and forecasting. At the denoising stage, a typical AE is first used for data filtering to remove noise from the chaotic time series. At the feature extraction stage, the denoised chaotic time series and the original time series are taken as the inputs of the CNNs to extract deep representations. At the transfer learning stage, the deep representations from *Y* are transferred to those from *X* and overlaid together. At the forecasting stage, we obtain the prediction of *X* through a fully connected layer.
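The data flow of the four stages can be sketched in plain Python as follows. This is only an untrained forward pass under several assumptions that the text does not fix: both input windows are passed through an autoencoder, each CNN is reduced to a single one-dimensional convolution with ReLU, the "overlay" of features is implemented as concatenation, and all names, layer sizes, and random weights are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def autoencode(x: Vector, p: Dict) -> Vector:
    """Encode to a smaller hidden vector (tanh), then decode back to len(x)."""
    hidden = [math.tanh(h) for h in dense(x, p["w_enc"], p["b_enc"])]
    return dense(hidden, p["w_dec"], p["b_dec"])


def conv1d_relu(x: Vector, kernel: Vector) -> Vector:
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(kernel[i] * x[t + i] for i in range(k)))
            for t in range(len(x) - k + 1)]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(window: int, hidden: int, kernel: int, horizon: int, seed: int = 0) -> Dict:
    """Random (untrained) parameters; sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def ae() -> Dict:
        return {"w_enc": rand_matrix(rng, hidden, window), "b_enc": [0.0] * hidden,
                "w_dec": rand_matrix(rng, window, hidden), "b_dec": [0.0] * window}

    n_feat = window - kernel + 1
    return {"ae_x": ae(), "ae_y": ae(),
            "k_x": [rng.uniform(-0.5, 0.5) for _ in range(kernel)],
            "k_y": [rng.uniform(-0.5, 0.5) for _ in range(kernel)],
            "w_out": rand_matrix(rng, horizon, 2 * n_feat), "b_out": [0.0] * horizon}


def forward(x_win: Vector, y_win: Vector, p: Dict) -> Vector:
    x_dn = autoencode(x_win, p["ae_x"])        # denoising
    y_dn = autoencode(y_win, p["ae_y"])
    f_x = conv1d_relu(x_dn, p["k_x"])          # feature extraction
    f_y = conv1d_relu(y_dn, p["k_y"])
    fused = f_x + f_y                          # transfer: concatenation (assumption)
    return dense(fused, p["w_out"], p["b_out"])  # forecasting: all steps at once


params = init_params(window=8, hidden=4, kernel=3, horizon=5)
x_window = [math.sin(0.3 * t) for t in range(8)]
y_window = [math.cos(0.3 * t) for t in range(8)]
forecast = forward(x_window, y_window, params)
```

Because the final layer emits the whole horizon in one pass, the sketch follows the direct multistep strategy; a real implementation would train all parameters and would typically use a deep learning library rather than nested lists.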

The optimization of AE-CNN-TL minimizes the reconstruction error of the AE and the training error of the whole model. At the denoising stage, the output of the AE is an approximate copy of its input. Therefore, we minimize the reconstruction error between input and output, so that the denoised series retains the direct physical meaning of the original quantity (e.g., temperature). We formulate the objective function of the AE as

$$\min_{W_1, W_2, b_1, b_2} \left\| x - \tilde{x} \right\|^2, \qquad \tilde{x} = g\left(W_2 f(W_1 x + b_1) + b_2\right),$$

where $x$ and $\tilde{x}$ are the input and output of the AE, $f$ and $g$ are activation functions, $W_1$ and $W_2$ are weights, and $b_1$ and $b_2$ are biases.

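To obtain a sound model, it is necessary to minimize the training error of the whole model. We formulate the objective function of the whole model as

$$\min \sum_{t} \left(x_t - \hat{x}_t\right)^2,$$

where the sum runs over the training samples, $x_t$ denotes the observed value, and $\hat{x}_t$ denotes the predicted value.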

#### 4. Experiments and Results

##### 4.1. Data

In this article, we use the Beijing PM2.5 dataset (https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data). The hourly data span the period between January 1, 2010, and December 31, 2014. Each series in this dataset contains 43824 data points. After removing missing values, 41757 data points were finally used in the experiment. The time series we used are PM2.5 concentration (μg/m³) and temperature (°C). The descriptive statistics for PM2.5 concentration and temperature are shown in Table 1.

To obtain a desirable model, we divide the experimental data into three parts: a 70% training dataset, a 10% validation dataset, and a 20% test dataset. The training dataset is used to fit the model, the validation dataset is used to further determine the parameters of the whole network, and the test dataset is used to test the generalization ability of the model.
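The 70/10/20 division can be sketched as below. The text does not state whether the split is chronological or shuffled; the sketch assumes a chronological split, which is the usual choice for time series, and the function name and rounding rule (integer floor) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float],
                        train_pct: int = 70,
                        val_pct: int = 10) -> Tuple[List[float], List[float], List[float]]:
    """Split a series into train/validation/test parts without shuffling.

    Sizes are floored with integer arithmetic; the test part takes the remainder.
    """
    n = len(series)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = list(series[:n_train])
    val = list(series[n_train:n_train + n_val])
    test = list(series[n_train + n_val:])
    return train, val, test


# Placeholder data with the same length as the cleaned dataset (41757 points).
data = list(range(41757))
train_set, val_set, test_set = chronological_split(data)
```

Keeping the test part at the end of the series ensures that the model is evaluated only on observations that come after everything it was trained on.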

##### 4.2. Evaluation Index

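The root mean square error (RMSE) is a widely used evaluation index for forecasting continuous values. Generally speaking, the smaller the RMSE value is, the better the prediction performance is. The RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{t=1}^{m}\left(x_t - \hat{x}_t\right)^2},$$

where $x_t$ and $\hat{x}_t$ denote the observed and predicted values, respectively, and $m$ denotes the length of the test dataset.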
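As a small illustration of this index, the RMSE can be computed with the Python standard library as follows; the function name `rmse` and the sample values are chosen here only for the example.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
off = rmse([0.0, 0.0], [3.0, 4.0])  # sqrt((9 + 16) / 2)
```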

##### 4.3. 0-1 Test for Chaos in the Beijing PM2.5 Dataset

According to the 0-1 test algorithm for chaos, we sample $N$ data points from the Beijing PM2.5 dataset by taking every 10th data point, draw a random number $c$ from a uniform distribution, and compute the following translation equations:

$$p_c(n) = \sum_{j=1}^{n}\phi(j)\cos(jc), \qquad q_c(n) = \sum_{j=1}^{n}\phi(j)\sin(jc), \qquad n = 1, 2, \ldots, N,$$

where $\phi(j)$ denotes the time series of PM2.5 or temperature.

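Then, we calculate the following test value of chaos:

$$K_c = \frac{\operatorname{cov}(\xi, \Delta)}{\sqrt{\operatorname{var}(\xi)\operatorname{var}(\Delta)}},$$

where $\xi = (1, 2, \ldots, n_{\mathrm{cut}})$, $\Delta = (M_c(1), M_c(2), \ldots, M_c(n_{\mathrm{cut}}))$ with $M_c(n)$ the mean square displacement of the translation variables, and the covariance is written with vectors $x$ and $y$ of length $q$ as follows:

$$\operatorname{cov}(x, y) = \frac{1}{q}\sum_{j=1}^{q}\left(x_j - \bar{x}\right)\left(y_j - \bar{y}\right), \qquad \bar{x} = \frac{1}{q}\sum_{j=1}^{q}x_j, \qquad \operatorname{var}(x) = \operatorname{cov}(x, x).$$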
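The test can be sketched in plain Python as follows. This is a minimal sketch of the correlation form of the 0-1 test, not the exact procedure used in the experiments: the interval $(\pi/5, 4\pi/5)$ for the random number, the cutoff $n_{\mathrm{cut}} = N/10$, the median over several random numbers, and the logistic map used as toy input are all assumptions made for illustration.

```python
import math
import random
from statistics import median
from typing import List, Optional, Sequence, Tuple


def translation_variables(phi: Sequence[float], c: float) -> Tuple[List[float], List[float]]:
    """Cumulative sums p_c(n) and q_c(n) for n = 1..N."""
    p, q = [], []
    sp = sq = 0.0
    for j, x in enumerate(phi, start=1):
        sp += x * math.cos(j * c)
        sq += x * math.sin(j * c)
        p.append(sp)
        q.append(sq)
    return p, q


def mean_square_displacement(p: Sequence[float], q: Sequence[float], n_cut: int) -> List[float]:
    """M_c(n) for n = 1..n_cut, averaged over the first N - n_cut start points."""
    n_avg = len(p) - n_cut
    return [sum((p[j + n] - p[j]) ** 2 + (q[j + n] - q[j]) ** 2
                for j in range(n_avg)) / n_avg
            for n in range(1, n_cut + 1)]


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """cov(x, y) / sqrt(var(x) var(y)) with 1/q normalisation."""
    size = len(x)
    mx, my = sum(x) / size, sum(y) / size
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / size
    vx = sum((a - mx) ** 2 for a in x) / size
    vy = sum((b - my) ** 2 for b in y) / size
    return cov / math.sqrt(vx * vy)


def k_value(phi: Sequence[float], c: float, n_cut: Optional[int] = None) -> float:
    """Test value K_c: correlation of the displacement with linear growth."""
    if n_cut is None:
        n_cut = len(phi) // 10  # assumed cutoff
    p, q = translation_variables(phi, c)
    delta = mean_square_displacement(p, q, n_cut)
    xi = list(range(1, n_cut + 1))
    return correlation(xi, delta)


def median_k(phi: Sequence[float], n_c: int = 11, seed: int = 0) -> float:
    """Median of K_c over several random c (assumed interval: (pi/5, 4pi/5))."""
    rng = random.Random(seed)
    cs = [rng.uniform(math.pi / 5, 4 * math.pi / 5) for _ in range(n_c)]
    return median(k_value(phi, c) for c in cs)


def logistic_series(r: float, length: int, x0: float = 0.3, burn_in: int = 200) -> List[float]:
    """Toy input: logistic map x -> r x (1 - x), transient discarded."""
    x, out = x0, []
    for i in range(burn_in + length):
        x = r * x * (1.0 - x)
        if i >= burn_in:
            out.append(x)
    return out


k_chaotic = median_k(logistic_series(4.0, 1000))  # chaotic regime of the map
k_regular = median_k(logistic_series(3.5, 1000))  # periodic regime of the map
```

A test value close to 1 points to chaotic dynamics and a value close to 0 to regular dynamics; taking the median over several random numbers reduces the influence of isolated resonant values of $c$.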

Figure 3 shows plots of the test value $K_c$ vs. the random number $c$ for the time series of temperature. $K_c \approx 1$ implies that there is a chaotic attractor in the time series of temperature.

Figure 4 shows Brownian-like trajectories, which also indicate that there is a chaotic attractor in the time series of temperature. That is to say, the time series of temperature is chaotic.

Similar to Figure 3, Figure 5 shows plots of the test value $K_c$ vs. the random number $c$ for the PM2.5 concentration time series. $K_c \approx 1$ again implies that there is a chaotic attractor in the time series of PM2.5 concentration.

Similar to Figure 4, Figure 6 shows Brownian-like trajectories, which also indicate that there is a chaotic attractor in the time series of PM2.5 concentration. That is to say, the time series of PM2.5 concentration is chaotic.

Figure 7 shows the phase portrait of temperature vs. PM2.5 concentration, in which Brownian-like (unbounded) trajectories imply that there exists a chaotic attractor.

In short, Figures 3–7 indicate that there exist chaotic attractors in the Beijing PM2.5 dataset.

##### 4.4. Experimental Results

In this part, we test the prediction performance of the proposed AE-CNN-TL for chaotic time series.

To compare the prediction performance of AE-CNN and AE-CNN-TL, we first employ AE-CNN to predict PM2.5 concentration without TL from temperature, as shown in Figure 8. Then, we utilize AE-CNN-TL to predict PM2.5 concentration, as shown in Figure 9. In the same way, we first employ AE-CNN to predict temperature without TL from PM2.5 concentration, as shown in Figure 10. Then, we utilize AE-CNN-TL to predict temperature, as shown in Figure 11. Because the predicted values are continuous, we employ RMSE to measure the prediction performance; the smaller the RMSE value is, the better the prediction performance is. The corresponding RMSEs of prediction are shown in Table 2.

From Table 2, we can see that the prediction accuracy of PM2.5 concentration is enhanced to a large extent by TL from temperature, and the prediction accuracy of temperature is also improved significantly by TL from PM2.5 concentration. Therefore, we can conclude that AE-CNN-TL has much better prediction performance than AE-CNN, ARMA, and LSTM.

#### 5. Conclusions

In this article, we develop a medium-to-long-term prediction scheme for chaotic time series based on AE-CNN and TL, named AE-CNN-TL, which has much better prediction performance than AE-CNN, ARMA, and LSTM. This prediction scheme can be used for medium-to-long-term prediction of chaotic time series from many natural and social systems in the real world.

In fact, AE-CNN-TL can also be used to reveal the dynamic link between PM2.5 concentration and temperature. What is more, AE-CNN-TL can be used to explore the Granger causality [18] between PM2.5 concentration and temperature and to predict both with good performance. By means of experiments, we find a bidirectional Granger causality between PM2.5 concentration and temperature, which is consistent with previous research.

The Beijing PM2.5 dataset originates from a natural system. In future work, we can try to employ the proposed AE-CNN-TL for medium-to-long-term prediction of chaotic time series that come from artificial systems, such as the game system [19–21] and the financial system [22].

#### Data Availability

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

#### Acknowledgments

This work was supported by the Natural Science Foundation of Shandong Province (Grant no. ZR2016FM26) and the National Social Science Foundation of China (Grant no. 16FJY008).