Mathematical Problems in Engineering

Volume 2015, Article ID 484093, 13 pages

http://dx.doi.org/10.1155/2015/484093

## Online Sequential Prediction for Nonstationary Time Series with New Weight-Setting Strategy Using Extreme Learning Machine

^{1}College of Computer and Information Engineering, Henan Normal University, Henan, Xinxiang 453007, China

^{2}Management Institute, Xinxiang Medical University, Henan, Xinxiang 453003, China

Received 21 August 2014; Accepted 12 October 2014

Academic Editor: Amaury Lendasse

Copyright © 2015 Wentao Mao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Accurate and fast prediction of nonstationary time series is challenging and of great interest in both practical and academic areas. In this paper, an online sequential extreme learning machine with a new weighting strategy is proposed for nonstationary time series prediction. First, a new leave-one-out (LOO) cross-validation error estimation for online sequential data is derived based on block matrix inversion. Second, a new weighting strategy is built on this LOO error estimation. The strategy ranks the samples’ importance by means of the LOO error of each newly added sample and then assigns the weights accordingly. The proposed method is compared with existing algorithms on chaotic and real-world nonstationary time series data. The results show that the proposed method outperforms the classical ELM and OS-ELM in terms of generalization performance and numerical stability.

#### 1. Introduction

Time series prediction plays an important role in many engineering fields, for example, dynamic mechanics and weather diagnostics. The key goal of time series prediction is to mine the regular patterns hidden in temporal data in order to predict future data effectively [1]. Many traditional methods such as AR, ARMA, and ARIMA work well for stationary time series prediction. However, in practical applications, most time series are nonstationary, which restricts the use of the stationary methods above. According to Takens’ phase space delay reconstruction theory [2], this kind of data generally requires reconstructing the phase space via delay coordinates first. For example, in chaotic time series the $m$-dimensional vector is defined as follows:

$$\mathbf{x}(t) = [x(t), x(t-\tau), \ldots, x(t-(m-1)\tau)], \tag{1}$$

where $m$ is the embedding dimension and $\tau$ is the delay constant. The prediction model can be described as $x(t+1) = f(\mathbf{x}(t))$, where $f$ is a nonlinear map. Through this reconstruction, the time correlation is transformed into spatial correlation. Support vector machines (SVMs) [3, 4], neural networks (NNs) [5, 6], and other machine learning methods [7, 8] have then been successfully introduced to approximate the spatial correlation in nonstationary time series data.
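The delay-coordinate reconstruction described above can be illustrated with a short sketch. It assumes a scalar series, a backward delay vector, and a one-step-ahead target; the function name `delay_embed` and its parameters `m` and `tau` are hypothetical names chosen for this illustration.

```python
import numpy as np

def delay_embed(series, m, tau):
    """Build delay vectors and one-step-ahead targets from a scalar series.

    Row t of X is [x(t), x(t - tau), ..., x(t - (m - 1) * tau)];
    the matching entry of y is x(t + 1) (assumed prediction target).
    """
    x = np.asarray(series, dtype=float)
    start = (m - 1) * tau                  # first t with a complete history
    ts = np.arange(start, len(x) - 1)      # leave room for the target x(t + 1)
    X = np.stack([x[ts - k * tau] for k in range(m)], axis=1)
    y = x[ts + 1]
    return X, y
```

Each row of `X` can then be fed to any regression model, which is how the time correlation becomes a spatial (input-output) correlation.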

Generally speaking, there are two main challenges in predicting nonstationary time series effectively. One is how to choose a proper baseline algorithm that is computationally inexpensive and accurate enough. The other is how to distinguish the importance of different samples in the time series. Different from SVMs and NNs, the extreme learning machine (ELM), introduced by Huang et al. [9], has shown very high learning speed and good generalization performance in many regression estimation and pattern recognition problems [10, 11]. As a sequential modification of ELM, online sequential ELM (OS-ELM), proposed by Liang et al. [12], can learn data one-by-one or chunk-by-chunk. In many applications such as time series forecasting, OS-ELM also shows good generalization at extremely fast learning speed. Therefore, OS-ELM is a proper solution to the first challenge. Much research has been devoted to the second challenge. As recent data usually carry more important information than data from the distant past, a typical and effective method is weight-setting. Lin and Wang [13] gave the first sample in a time series the lowest importance and the most recent sample the highest importance and then assigned fuzzy memberships to every sample. Tay and Cao [14] used an exponential function to calculate every sample’s importance in financial time series prediction. Very different from these fixed weight-setting strategies, Mao et al. [15] established a heuristic algorithm to dynamically choose the optimal weights. Bao et al. [16] approached this problem from a multi-input multi-output perspective: they regarded the time samples as multiple outputs in a time slot and used multidimensional SVM to establish the model. Considering ELM, Wang and Han [17] introduced the kernel trick into OS-ELM for nonstationary time series; the essential idea is to transform the space for better approximation. Grigorievskiy et al. [18] used optimally-pruned ELM to tackle long-term time series prediction and obtained results comparable to those obtained with SVM.

However, although the ELM-based methods mentioned above work well in time series prediction [19], they have not yet successfully solved the second challenge, that is, distinguishing the significance of different samples. Specifically, when a sample is added sequentially, it is not clear that this sample is the most important one merely because it is the newest. In this scenario, the inner structure hidden in the time series data determines the samples’ significance, especially in the nonstationary setting. In other words, there is no guarantee that the newly added sample is the most valuable one for prediction. Therefore, to solve this problem, this paper first develops a new leave-one-out (LOO) cross-validation error estimation for OS-ELM aimed at time series prediction. Based on block matrix inversion, this LOO estimation is fast enough for time series data. As proved by many theoretical works [20], the LOO error using PRESS statistics is approximately unbiased and has been successfully applied to ELM with Tikhonov regularization [21]. To the best of our knowledge, this LOO error estimation is the first attempt to evaluate the generalization performance of OS-ELM on time series data. Moreover, this paper uses the LOO error estimation of each newly added sample to measure its importance. Following this weight-setting strategy, this paper then proposes a new weighted learning method for OS-ELM. A short version of this paper has been published in the proceedings of the 5th International Conference on Extreme Learning Machine (ELM 2014). Experimental results on chaotic and real-life time series data demonstrate that the proposed method outperforms the traditional ELMs in generalization performance and numerical stability.

The paper is organized as follows. In Section 2, a brief review on OS-ELM and LOO cross-validation estimation is provided. In Section 3, we describe the LOO error estimation and the weighted learning algorithm of OS-ELM on time series data. Section 4 is devoted to computer experiments on two different types of time series data sets, followed by a conclusion of the paper in Section 5.

#### 2. Brief Review

As the theoretical foundation of ELM, the work in [22] studied the learning performance of SLFNs on small-size data sets and found that an SLFN with at most $N$ hidden neurons can learn $N$ distinct samples with zero error by adopting any bounded nonlinear activation function. Based on this concept, Huang et al. [9] pointed out that ELM can analytically determine the output weights by a simple matrix inversion procedure as soon as the input weights and hidden layer biases are generated randomly, and that it then obtains good generalization performance with very high learning speed. A brief summary of ELM is provided here.

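Given a set of i.i.d. training samples $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$ with $\mathbf{x}_j \in \mathbb{R}^{n}$ and $\mathbf{t}_j \in \mathbb{R}^{m}$, standard SLFNs with $\tilde{N}$ hidden nodes are mathematically formulated as [9]:

$$\sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{o}_j, \quad j = 1, \ldots, N, \tag{2}$$

where $g(\cdot)$ is the activation function, $\mathbf{w}_i$ is the input weight vector connecting the input nodes and the $i$th hidden node, $\boldsymbol{\beta}_i$ is the output weight vector connecting the output nodes and the $i$th hidden node, and $b_i$ is the bias of the $i$th hidden node. Huang et al. [9] rigorously proved that, for $N$ arbitrary distinct samples and any $(\mathbf{w}_i, b_i)$ randomly chosen from $\mathbb{R}^{n} \times \mathbb{R}$ according to any continuous probability distribution, the hidden layer output matrix $\mathbf{H}$ of a standard SLFN with $N$ hidden nodes is invertible and $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\| = 0$ with probability one if the activation function $g$ is infinitely differentiable in any interval. Then, given $(\mathbf{w}_i, b_i)$, training an SLFN amounts to finding a least-squares solution of the following equation [9]:

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{3}$$

where

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}, \quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^{T} \\ \vdots \\ \boldsymbol{\beta}_{\tilde{N}}^{T} \end{bmatrix}_{\tilde{N} \times m}, \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^{T} \\ \vdots \\ \mathbf{t}_N^{T} \end{bmatrix}_{N \times m}. \tag{4}$$

In most cases $\tilde{N} \ll N$, so $\boldsymbol{\beta}$ cannot be computed through direct matrix inversion. Therefore, the smallest norm least-squares solution of (3) is calculated as follows:

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}, \tag{5}$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $\mathbf{H}$. Based on the analysis above, Huang et al. [9] proposed ELM, whose framework can be stated as follows.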

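*Step 1*. Randomly generate the input weight $\mathbf{w}_i$ and bias $b_i$, $i = 1, \ldots, \tilde{N}$.

*Step 2*. Compute the hidden layer output matrix $\mathbf{H}$.

*Step 3*. Compute the output weight $\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}$.

Therefore, the output of the SLFN can be calculated from $(\mathbf{w}_i, b_i)$ and $\hat{\boldsymbol{\beta}}$:

$$f(\mathbf{x}) = \sum_{i=1}^{\tilde{N}} \hat{\boldsymbol{\beta}}_i\, g(\mathbf{w}_i \cdot \mathbf{x} + b_i). \tag{6}$$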
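The three ELM steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the sigmoid activation, the uniform sampling range $[-1, 1]$, and the function names `hidden_layer`, `elm_fit`, and `elm_predict` are assumptions made for this example.

```python
import numpy as np

def hidden_layer(X, W, b):
    """Hidden layer output matrix H with a sigmoid activation (assumed choice)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden, rng):
    """Steps 1-3: random hidden parameters, then a least-squares output layer."""
    n_features = X.shape[1]
    # Step 1: random input weights and biases (the sampling range is an assumption)
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix
    H = hidden_layer(X, W, b)
    # Step 3: minimum-norm least-squares output weights via the pseudoinverse
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network output: hidden layer response times the output weights."""
    return hidden_layer(X, W, b) @ beta
```

Only the output weights are fitted; the hidden parameters stay at their random values, which is what makes training a single linear solve.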

Like ELM, all the hidden node parameters in OS-ELM are randomly generated, and the output weights are analytically determined based on the sequentially arriving data. The OS-ELM process is divided into two steps: an initialization phase and a sequential learning phase [12].

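*Step 1.* Initialization phase: choose a small chunk of initial training data $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N_0}$, where $N_0 \geq \tilde{N}$.

(1) Randomly generate the input weight $\mathbf{w}_i$ and bias $b_i$, $i = 1, \ldots, \tilde{N}$. Calculate the initial hidden layer output matrix $\mathbf{H}_0$:

$$\mathbf{H}_0 = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_{N_0} + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_{N_0} + b_{\tilde{N}}) \end{bmatrix}_{N_0 \times \tilde{N}}. \tag{7}$$

(2) Calculate the output weight vector:

$$\boldsymbol{\beta}^{(0)} = \mathbf{P}_0 \mathbf{H}_0^{T} \mathbf{T}_0, \tag{8}$$

where $\mathbf{P}_0 = (\mathbf{H}_0^{T}\mathbf{H}_0)^{-1}$, $\mathbf{T}_0 = [\mathbf{t}_1, \ldots, \mathbf{t}_{N_0}]^{T}$.

(3) Set $k = 0$.

*Step 2.* Sequential learning phase.

(1) Learn the $(k+1)$th training data: $(\mathbf{x}_{N_0+k+1}, \mathbf{t}_{N_0+k+1})$.

(2) Calculate the partial hidden layer output matrix:

$$\mathbf{H}_{k+1} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_{N_0+k+1} + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_{N_0+k+1} + b_{\tilde{N}}) \end{bmatrix}_{1 \times \tilde{N}}. \tag{9}$$

Set $\mathbf{T}_{k+1} = [\mathbf{t}_{N_0+k+1}]^{T}$.

(3) Calculate the output weight vector:

$$\begin{aligned} \mathbf{P}_{k+1} &= \mathbf{P}_k - \mathbf{P}_k \mathbf{H}_{k+1}^{T} \left(\mathbf{I} + \mathbf{H}_{k+1}\mathbf{P}_k\mathbf{H}_{k+1}^{T}\right)^{-1} \mathbf{H}_{k+1}\mathbf{P}_k, \\ \boldsymbol{\beta}^{(k+1)} &= \boldsymbol{\beta}^{(k)} + \mathbf{P}_{k+1}\mathbf{H}_{k+1}^{T} \left(\mathbf{T}_{k+1} - \mathbf{H}_{k+1}\boldsymbol{\beta}^{(k)}\right). \end{aligned} \tag{10}$$

(4) Set $k = k + 1$. Go to Step 2.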
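The two OS-ELM phases can be sketched as follows, using the standard recursive least-squares update of Liang et al. [12]. The sketch takes hidden layer matrices as input, so the random hidden mapping is left out; the function names `oselm_init` and `oselm_update` are hypothetical.

```python
import numpy as np

def oselm_init(H0, T0):
    """Initialization phase: P0 = (H0^T H0)^{-1}, beta0 = P0 H0^T T0."""
    P = np.linalg.inv(H0.T @ H0)
    beta = P @ H0.T @ T0
    return P, beta

def oselm_update(P, beta, H_new, T_new):
    """Sequential phase: fold one sample (or one chunk) into P and beta."""
    H_new = np.atleast_2d(H_new)   # a single sample becomes a 1 x L row
    T_new = np.atleast_1d(T_new)
    # Small system whose size equals the chunk size, not the total sample count
    K = np.eye(H_new.shape[0]) + H_new @ P @ H_new.T
    P = P - P @ H_new.T @ np.linalg.solve(K, H_new @ P)
    # Correct beta by the prediction error on the new data
    beta = beta + P @ H_new.T @ (T_new - H_new @ beta)
    return P, beta
```

After all samples have been processed, `beta` should match the batch least-squares solution on the full data up to rounding error, which is a convenient sanity check.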

The generalization ability of ELM has been analyzed by many researchers. Lan et al. [23] added a refinement stage that used the leave-one-out (LOO) error to evaluate the neurons’ significance in each backward step. Feng et al. [24] presented a fast LOO error estimation for regularized ELM. From the incremental learning point of view, Feng et al. [24] proposed an error minimized extreme learning machine, which measures the residual error caused by adding a new hidden node in an incremental manner. The following work is of particular relevance here. For ELM, Liu et al. [25] derived a fast LOO error estimation. The generalization error in the $i$th LOO iteration can be expressed as follows:

$$r_i = \frac{t_i - (\mathbf{H}\mathbf{H}^{\dagger}\mathbf{T})_i}{1 - (\mathbf{h}_i\mathbf{H}^{\dagger})_i}, \tag{11}$$

where $(\cdot)_i$ means the $i$th element, $\mathbf{H}$ is the hidden layer matrix, and $\mathbf{h}_i$ means the row of $\mathbf{H}$ corresponding to the sample $\mathbf{x}_i$.

Liu et al. [25] have shown that the proposed algorithm can accurately calculate the LOO error and avoids the $N$ repeated model training runs required by the original cross-validation method. Simulation experiments on artificial and real data sets have verified that the LOO cross-validation algorithm based on ELM is efficient and has good generalization performance.
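The idea of obtaining all LOO residuals from a single fit can be illustrated with the classical PRESS identity for linear least squares. This is a sketch under the assumption that the hidden layer matrix has full column rank and more rows than columns; it is not claimed to reproduce the exact formulation of [25], and both function names are hypothetical.

```python
import numpy as np

def press_loo_residuals(H, T):
    """LOO residuals from one fit: (t_i - fitted_i) / (1 - hat_ii)."""
    H_pinv = np.linalg.pinv(H)
    hat_diag = np.einsum("ij,ji->i", H, H_pinv)   # diagonal of H @ H_pinv
    fitted = H @ (H_pinv @ T)
    return (T - fitted) / (1.0 - hat_diag)

def brute_force_loo_residuals(H, T):
    """Reference version: refit N times, leaving one sample out each time."""
    n = H.shape[0]
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.lstsq(H[keep], T[keep], rcond=None)[0]
        out[i] = T[i] - H[i] @ beta_i
    return out
```

The closed form needs one pseudoinverse, whereas the brute-force version solves $N$ separate least-squares problems.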

#### 3. OS-ELM with LOO Weighted Strategy

As shown above, in the training process of the classic OS-ELM algorithm, all samples are treated equally. Whenever a new sample arrives, the network weights are updated. This rigid weight updating mechanism cannot adapt to the actual situation. Moreover, it tends to add unnecessary computation.

To improve the generalization ability of OS-ELM while keeping the model simple, this paper replaces the rigid weight updating mechanism of traditional OS-ELM with a dynamic weighting strategy. This strategy determines each sample’s importance according to its LOO error estimation in the online scenario. Consequently, a new OS-ELM based on an online LOO cross-validation weight-setting strategy (LW-OSELM) is proposed.

##### 3.1. LOO Error Estimation of ELM

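As discussed in Section 2, the fast LOO error estimation of ELM proposed by Feng et al. [24] shows that the generalization error in the $i$th LOO iteration can be expressed as follows:

$$r_i = \frac{t_i - (\mathbf{H}\mathbf{H}^{\dagger}\mathbf{T})_i}{1 - (\mathbf{h}_i\mathbf{H}^{\dagger})_i}. \tag{12}$$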

Obviously, (12) works mainly in the offline learning setting rather than in the online sequential scenario. The key reason is that the matrix inverse it relies on cannot be updated in the online stage. Because samples arrive sequentially in the online setting, (12) can be extended to the online sequential scenario, which provides a way to calculate the LOO error of each sample. The key is to calculate the inverse in (12) in an online manner. However, the time complexity of matrix inversion is $O(N^{3})$, where $N$ is the number of samples, so the modeling time increases significantly with the number of training patterns.

As pointed out by many theoretical studies, the LOO error is an almost unbiased estimate of the true generalization performance. The smaller a sample’s LOO error, the greater this sample’s contribution to the decision model. In order to highlight the samples’ contributions and ensure the generalization of the model, we set the weight of each sample according to the value of its LOO error during online learning. At the same time, to keep the model simple, the oldest sample, which is furthest from the current moment, is eliminated; that is, its contribution to the model is zero. To avoid complex calculation, we follow the idea of block matrix inversion [26], which transforms the matrix inversion into linear operations and greatly reduces the computation in the online learning stage. Thus, a new kind of extreme learning machine based on online leave-one-out cross-validation is put forward, as described in the following sections.

##### 3.2. Initial Stage of Training

Suppose there are training samples . The hidden layer output matrix is , and the output vector is , calculating the output weight vector:whereLet ; (14) can be rewritten as .

##### 3.3. Add New Sample

Add the new arrived sample into training set. The output vector becomes , and the hidden layer matrix becomes . Then we haveLet ; thenbecauseFor (17), according to Block matrix inversion, we havewhere , . So, can be calculated based on , which reduces computational cost largely. Then we have by substituting (18) into (16).
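The block-inversion step for a newly added sample can be sketched as follows. This is a minimal illustration, assuming that the matrix being inverted is a symmetric $N \times N$ matrix that gains one row and one column when a sample arrives (for example, a Gram matrix of hidden layer rows); the function name `grow_inverse` and its arguments are hypothetical.

```python
import numpy as np

def grow_inverse(A_inv, b, c):
    """Inverse of [[A, b], [b^T, c]] given A_inv = A^{-1}, with A symmetric.

    b holds the cross terms between the old samples and the new one,
    c is the new diagonal entry.
    """
    u = A_inv @ b
    rho = c - b @ u                     # scalar Schur complement
    n = A_inv.shape[0]
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = A_inv + np.outer(u, u) / rho
    out[:n, n] = -u / rho
    out[n, :n] = -u / rho
    out[n, n] = 1.0 / rho
    return out
```

Given the previous inverse, the update uses only matrix-vector products and one outer product, so it costs $O(N^{2})$ instead of the $O(N^{3})$ of a fresh inversion.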

##### 3.4. Calculate the LOO Error

Let set up the online LOO model; then the LOO error in th LOO iteration can be expressed as follows:Then we can obtain the corresponding LOO error, , of each sample from (19). According to the value of , where , we set the relevant weight of each sample, where . Note that the smaller is, the bigger is. To emphasize the newest sample and make the decision model simple, we reset the weight of the newest sample as . This paper defines as 1.02. And we set the weight of the oldest sample as zero; namely, we set its contribution to the model as zero.

##### 3.5. Weighted Training

After adding the new sample , we set the weight of oldest sample as zero, namely, excluding this sample. After excluding , the output vector becomes , and the hidden matrix becomes . Then we haveLet ; thenFrom (21), contains two parts: and . Because the calculation of involves matrix inversion, we only set weight on in order to avoid the huge computational cost in calculating LOO error. Then the hidden matrixbecomes

And becomeswhere are the corresponding weights , , and , becauseFrom (25), there is a relationship between and . So can be calculated on the basis of to simplify the calculation.

Assume that can be partitioned and expressed as follows:where , , and .

As in (25), let , ; then (25) is equivalent toBy the definition of matrix inversion, ; namely,Through the block matrix multiplication, we haveCalculating (29), we haveThus, we have by substituting (24) and (30) into (21).
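Removing the oldest sample by partitioning the inverse can be sketched in the same spirit. The sketch assumes a symmetric matrix whose first row and column belong to the oldest sample, and it leaves out the sample weights, showing only the unweighted removal step; the function name `shrink_inverse` is hypothetical.

```python
import numpy as np

def shrink_inverse(A_inv):
    """Inverse of A with its first row and column removed, given A_inv = A^{-1}.

    Partition A_inv as [[m11, m12^T], [m12, M22]]; the inverse of the
    trailing block of A is then M22 - m12 m12^T / m11.
    """
    m11 = A_inv[0, 0]
    m12 = A_inv[1:, 0]
    M22 = A_inv[1:, 1:]
    return M22 - np.outer(m12, m12) / m11
```

Together with a growing step for new samples, this keeps a fixed-size window of samples without recomputing a full inverse.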

Then we can update the network weights according to the following equation:

##### 3.6. Algorithm

The LW-OSELM algorithm can be described in two steps.

*Step 1 (The initial stage of training). * For the initial training sample set , calculate the network weights based on the training samples by the following equation:where is the hidden layer matrix based on new training set and .

Set .

*Step 2 (Online learning stage). * Set the th arrived sample as . Now the output vector becomes , and the hidden layer matrix becomes . Calculate :where Calculate the LOO error of each sample by the following equation:Then each sample is setting the corresponding weight according to the value of , where , and the smaller the is, the bigger the is. In similar way, let ; .

Set the oldest samples weight . Now the output vector becomes , and the hidden layer matrix becomes Calculate by the following equation:Make represented as a new block:In (37), considerThen substitute (39) into (37) to get .

Use (40) to update the network weight: Let , , and . Let , and go to step .

For better understanding, a flow chart of the proposed algorithm is provided in Figure 1.