Computational Intelligence and Neuroscience

Volume 2017, Article ID 9478952, 9 pages

https://doi.org/10.1155/2017/9478952

## Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory

^{1}College of Command and Information System, PLA University of Science and Technology, Nanjing, Jiangsu 210007, China

^{2}1st Department, Army Officer Academy of PLA, Hefei, Anhui 230031, China

Correspondence should be addressed to Zhisong Pan; hotpzs@hotmail.com

Received 27 August 2017; Revised 23 November 2017; Accepted 3 December 2017; Published 17 December 2017

Academic Editor: Pedro Antonio Gutierrez

Copyright © 2017 Haimin Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancelation to stock market analysis. However, as real-world time series grow longer, the data often contain outliers. If treated as normal points during prediction, these outliers can mislead the learned model. To address this issue, in this paper we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) to predict time series with outliers. The method adaptively tunes the learning rate of the stochastic gradient algorithm during prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average by modifying Adam, a popular stochastic gradient algorithm for training deep neural networks. In our algorithm, a large relative prediction error corresponds to a small learning rate, and vice versa. Experiments on both synthetic data and real time series show that our method achieves better performance than existing LSTM-based methods.

#### 1. Introduction

A time series is a sequence of real-valued signals measured at successive time intervals [1, 2]. Time series data occur naturally in many application areas such as economics, finance, environment, and medicine and often arrive as streams in many real-world systems. Time series prediction has been used successfully in a wide range of domains including speech analysis [3], noise cancelation [4], and stock market analysis [5, 6]. Traditional methods of time series prediction commonly fit a latent model, for example, autoregressive moving average (ARMA) [7], autoregressive integrated moving average (ARIMA) [1], or vector autoregressive moving average (VARMA) [8], to the data. However, these methods all need to process the whole dataset to reidentify the model parameters whenever new data arrive, which is not suitable for large datasets or online time series prediction. To address this problem, online learning methods have been explored to extract the underlying pattern representations from time series data in a sequential manner. Compared to traditional batch learning methods, online learning methods avoid expensive retraining when handling newly arriving data. Owing to their efficiency and scalability, online learning methods based on linear models [9], ensemble learning [10], and kernels [11] have been applied to time series prediction successfully.

Long short-term memory (LSTM) [12], a class of recurrent neural networks (RNNs) [13], is particularly designed for sequential data and has shown promising results for time series prediction. Its units consist of three gates: an input gate, a forget gate, and an output gate. LSTM is popular for its ability to learn hidden long-term sequential dependencies, which helps in learning the underlying representations of time series. However, real-world time series data often contain outliers, especially under cyberattacks, where outliers commonly appear as anomalies in time series that monitor measurements of network traffic. Such outliers mislead the learning method in extracting the true representations of the time series and reduce prediction performance.

In this paper, we propose an efficient online gradient learning method, which we call RoAdam (Robust Adam), for LSTM to predict time series in the presence of outliers. The method modifies Adam (Adaptive Moment Estimation) [14], a popular algorithm for training deep neural networks, by tracking the relative prediction error of the loss function with a weighted average. Adam is based on the standard stochastic gradient descent (SGD) method and does not consider the adverse effect of outliers. The learning rate of RoAdam is tuned adaptively according to the relative prediction error of the loss function: a large relative prediction error leads to a smaller effective learning rate, and a small error leads to a larger one. The experiments show that our algorithm achieves state-of-the-art prediction performance.

The rest of this paper is organized as follows. Section 2 reviews related work. In Section 3, we introduce some preliminaries. Section 4 presents our algorithm in detail. In Section 5, we evaluate the performance of our proposed algorithm on both synthetic data and real time series. Finally, Section 6 concludes our work and discusses some future work.

#### 2. Related Work

In time series, a data point is identified as an outlier if it differs significantly from the behavior of the majority of points. Outlier detection for time series data has been studied for decades, with the main work focusing on modeling time series in the presence of outliers. In statistics, several parametric models have been proposed for time series prediction; a point that deviates from the value predicted by a parametric model such as ARMA [15], ARIMA [16, 17], or VARMA [18] is identified as an outlier. Vallis et al. [19] develop a novel statistical technique that uses robust statistical metrics, including the median, the median absolute deviation, and a piecewise approximation of the underlying long-term trend, to detect outliers accurately. There also exist many machine learning models for time series prediction with outliers. The paper [20] proposes a generic and scalable framework for automated time series anomaly detection comprising two methods: a plug-in method and a decomposition-based method. The plug-in method applies a wide range of time series modeling and forecasting models to model the normal behavior of the time series. The decomposition-based method first decomposes a time series into three components, trend, seasonality, and noise, and then captures outliers by monitoring the noise component. The paper [21] gives a detailed survey on outlier detection.

LSTM has shown promising results for time series prediction. Lipton et al. use LSTM to model varying-length sequences and capture long-range dependencies; the model can effectively recognize patterns in multivariate time series of clinical measurements [22]. Malhotra et al. use stacked LSTM networks for outlier detection in time series: a predictor models the normal behavior, and the resulting prediction errors are modeled as a multivariate Gaussian distribution, which is used to identify abnormal behavior [23]. Chauhan and Vig also utilize the probability distribution of the prediction errors from LSTM models to distinguish abnormal from normal behavior in ECG time series [24]. These methods are not suitable for online time series prediction because they all need to be trained in advance on time series without outliers to model the normal behavior. In this paper, our online learning method for time series prediction is made robust to outliers by adaptively tuning the learning rate of the stochastic gradient method used to train LSTM.

#### 3. Preliminaries and Model

In this section, we formulate the problem to be solved and introduce some background on Adam, a popular algorithm for training LSTM.

##### 3.1. Online Time Series Prediction with LSTM

In the process of online time series prediction, the desired model learns useful information from $\{x_1, x_2, \ldots, x_{t-1}\}$ to give a prediction $\tilde{x}_t$ and then compares $\tilde{x}_t$ with $x_t$ to update itself, where $\{x_1, x_2, \ldots, x_t, \ldots\}$ is a time series, $\tilde{x}_t$ is the data point forecast at time $t$, and $x_t$ is the real value. LSTM is suitable for discovering dependence relationships in time series data by using specialized gating and memory mechanisms.
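To make this predict-then-update protocol concrete, here is a minimal sketch. The `LastValuePredictor` class and its `predict`/`update` methods are hypothetical illustrations standing in for an actual learned model such as LSTM; a real model would take a gradient step on the loss in `update`.

```python
class LastValuePredictor:
    """Toy model for the online protocol: predicts x_t as x_{t-1}."""
    def __init__(self):
        self.last = 0.0

    def predict(self):
        return self.last

    def update(self, x_true):
        # A real model would take a gradient step on the loss here.
        self.last = x_true


def online_predict(series, model):
    """Online loop: forecast each point before seeing it, then update."""
    errors = []
    for x_t in series:
        x_pred = model.predict()       # forecast before x_t is revealed
        errors.append(abs(x_pred - x_t))
        model.update(x_t)              # update with the true value
    return errors
```

The loop mirrors the online setting: at each step the model commits to a prediction before the true value arrives, so no future information leaks into the forecast.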

We give the formal definition of a neuron of an LSTM layer as follows. The $j$th neuron of an LSTM layer at time $t$, with memory cell $c_t^j$, consists of an input gate $i_t^j$, a forget gate $f_t^j$, and an output gate $o_t^j$, and is updated by forgetting part of the existing memory and adding new memory content $\tilde{c}_t^j$. The expressions of $i_t^j$, $f_t^j$, $o_t^j$, and $\tilde{c}_t^j$ are as follows:

$$i_t^j = \sigma\left(W_i x_t + U_i h_{t-1} + V_i c_{t-1}\right)^j,$$
$$f_t^j = \sigma\left(W_f x_t + U_f h_{t-1} + V_f c_{t-1}\right)^j,$$
$$o_t^j = \sigma\left(W_o x_t + U_o h_{t-1} + V_o c_t\right)^j,$$
$$\tilde{c}_t^j = \tanh\left(W_c x_t + U_c h_{t-1}\right)^j,$$
$$c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j.$$

Note that $W_i$, $W_f$, $W_o$, $U_i$, $U_f$, $U_o$, $V_i$, $V_f$, and $V_o$ are the parameters of the $j$th neuron of an LSTM layer at time $t$. $\sigma$ is a logistic sigmoid function. $V_i$, $V_f$, and $V_o$ are diagonal matrices. $x_t$ and $h_{t-1}$ are the vectorizations of the input at time $t$ and the hidden-layer output at time $t-1$. The output of this neuron at time $t$ is expressed as

$$h_t^j = o_t^j \tanh\left(c_t^j\right).$$
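As an illustration only (not the paper's implementation), the gate equations can be written as a single NumPy forward step for a whole layer. The parameter dictionary keys `W_*`, `U_*`, and the peephole vectors `v_*` (the diagonals of the matrices $V_i$, $V_f$, $V_o$) are naming assumptions of this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the gate equations above.

    p holds the weight matrices W_*, U_* and the peephole vectors v_*
    (diagonals of V_*), so V_* @ c reduces to elementwise v_* * c.
    """
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["v_i"] * c_prev)
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["v_f"] * c_prev)
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev)
    c = f * c_prev + i * c_tilde          # forget old memory, add new content
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["v_o"] * c)
    h = o * np.tanh(c)                    # neuron output h_t
    return h, c
```

Note that the output gate uses the freshly updated cell state $c_t$, matching the equation order above.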

In our model of online time series prediction, we set a dense layer to map the LSTM outputs to the target prediction, which is formulated as

$$\tilde{x}_{t+1} = \phi\left(W_d h_t + b_d\right),$$

where $\phi$ is the activation function of the dense layer, $W_d$ is the weight matrix, $b_d$ is the bias, and $h_t$ is the vectorization of $h_t^j$. The objective of our model at time $t$ is to update the parameters $\theta_t$. The standard process is

$$\theta_{t+1} = \theta_t - \eta \nabla_{\theta_t} L\left(x_t, \tilde{x}_t\right),$$

where $\eta$ is the learning rate and $L$ is the loss function.
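For illustration, here is a minimal NumPy sketch of the dense-layer prediction followed by one plain SGD step on a squared loss. The linear activation and the learning rate are assumptions of this sketch; the model leaves $\phi$ and $L$ generic.

```python
import numpy as np

def predict_and_sgd_update(h_t, x_true, W_d, b_d, eta=0.01):
    """Dense-layer prediction plus one SGD step on L = 0.5 * (pred - true)^2.

    Assumes a linear activation phi(z) = z for simplicity.
    """
    x_pred = W_d @ h_t + b_d               # dense layer maps h_t to prediction
    err = x_pred - x_true                  # dL/d(x_pred) for the squared loss
    W_d = W_d - eta * np.outer(err, h_t)   # gradient step on the weights
    b_d = b_d - eta * err                  # gradient step on the bias
    return x_pred, W_d, b_d
```

A full implementation would backpropagate `err` through the LSTM gates as well; this sketch updates only the dense-layer parameters to keep the SGD rule visible.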

##### 3.2. Adam

Adam is a method for efficient stochastic optimization that is often used to train LSTM. It computes adaptive learning rates for individual parameters from estimates of the first and second moments of the gradients, requiring only first-order gradients. Adam keeps exponentially decaying averages of the gradient and the squared gradient:

$$m_t = \beta_1 m_{t-1} + \left(1 - \beta_1\right) g_t,$$
$$v_t = \beta_2 v_{t-1} + \left(1 - \beta_2\right) g_t^2,$$

where $m_t$ and $v_t$, initialized as zero, are estimates of the first and second moments, and $\beta_1$ and $\beta_2$ are exponential decay rates for the moment estimates. We can find that $m_t$ and $v_t$ are biased towards zero when $\beta_1$ and $\beta_2$ are close to 1, so Adam counteracts these biases through the corrected estimates $\hat{m}_t$ and $\hat{v}_t$:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.$$

The rule for updating the parameters is

$$\theta_t = \theta_{t-1} - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$

where $\eta = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$ by default.
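The moment estimates, bias correction, and update rule above condense into a short NumPy function (a sketch for illustration, using the stated default hyperparameters):

```python
import numpy as np

def adam_step(theta, g, m, v, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for parameters theta given gradient g at step t >= 1."""
    m = b1 * m + (1 - b1) * g            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g**2         # second-moment (uncentered variance)
    m_hat = m / (1 - b1**t)              # bias correction for zero init
    v_hat = v / (1 - b2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step the bias correction exactly cancels the zero initialization, so the update magnitude is close to `eta` regardless of the gradient scale.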

#### 4. Method

In this section, we introduce our online gradient learning method, called RoAdam (Robust Adam), for training long short-term memory (LSTM) for time series prediction in the presence of outliers. Our method does not detect outliers directly; instead, it adaptively tunes the learning rate when facing a suspicious outlier.

In Algorithm 1, we provide the details of the RoAdam algorithm. The main difference between our algorithm and Adam is $d_t$, a relative prediction error term of the loss function. The relative prediction error term indicates whether the current point is an outlier: the larger the value of $d_t$, the more suspicious the current point is of being an outlier. It is computed as $d_t = \beta_3 d_{t-1} + \left(1 - \beta_3\right) r_t$, where $r_t = L_t / L_{t-1}$ and $\beta_3$ is the exponential decay rate. $L_t$ and $L_{t-1}$ are the absolute prediction errors of $\tilde{x}_t$ and $\tilde{x}_{t-1}$. In practice, a thresholding scheme is used to ensure the stability of the relative prediction error term, with $k$ and $K$ denoting the lower and upper thresholds for $r_t$. We let (1) $r_t = \min\left\{\max\left\{L_t / L_{t-1}, k\right\}, K\right\}$ if $L_t \geq L_{t-1}$ and (2) $r_t = \min\left\{\max\left\{L_t / L_{t-1}, 1/K\right\}, 1/k\right\}$ otherwise, which captures both increases and decreases of the relative prediction error. Our settings consider the different situations in which the preceding point and the current point are at different statuses. The details are listed in Table 1.
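As an illustrative sketch, the adaptive scheme described above might look as follows in NumPy. The placement of `d` in the denominator of the update and the default values of `b3`, `k`, and `K` are assumptions made for this sketch; Algorithm 1 is the authoritative version.

```python
import numpy as np

def roadam_step(theta, g, m, v, d, L_t, L_prev, t,
                eta=0.001, b1=0.9, b2=0.999, b3=0.999,
                k=0.1, K=10.0, eps=1e-8):
    """One RoAdam update: like Adam, but the learning rate is scaled down
    by d, a smoothed relative prediction error of the loss."""
    ratio = L_t / max(L_prev, eps)
    if L_t >= L_prev:                    # error increased: clip to [k, K]
        r = min(max(ratio, k), K)
    else:                                # error decreased: clip to [1/K, 1/k]
        r = min(max(ratio, 1.0 / K), 1.0 / k)
    d = b3 * d + (1 - b3) * r            # weighted average of relative error
    m = b1 * m + (1 - b1) * g            # Adam first-moment estimate
    v = b2 * v + (1 - b2) * g**2         # Adam second-moment estimate
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    # Large d (suspected outlier) yields a small effective learning rate.
    theta = theta - eta * m_hat / (d * np.sqrt(v_hat) + eps)
    return theta, m, v, d
```

When the loss is stable ($L_t \approx L_{t-1}$), $r_t \approx 1$ and $d_t$ stays near 1, so the step reduces to an ordinary Adam update; a sudden loss spike inflates $d_t$ and damps the step instead of letting the outlier drag the parameters.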