Forecasting Time Series Movement Direction with Hybrid Methodology

Waeto, Salwa; Chuarkham, Khanchit; Intarasit, Arthit

doi:https://doi.org/10.1155/2017/3174305

Journal of Probability and Statistics

Research Article | Open Access

Volume 2017 | Article ID 3174305 | https://doi.org/10.1155/2017/3174305

Academic Editor: Dejian Lai

Received 30 Nov 2016

Revised 26 Mar 2017

Accepted 04 Apr 2017

Published 31 Jul 2017

Abstract

Forecasting the tendencies of time series is a challenging task that gives a better understanding of the underlying phenomenon. The purpose of this paper is to present a hybrid model of support vector regression (SVR) and the Autoregressive Integrated Moving Average (ARIMA) model, formulated by a hybrid methodology. The proposed model is more convenient for practical use. This article is concerned with modeling the tendencies of the time series of Thailand’s south insurgency. Empirical results using the time series of monthly numbers of deaths, injuries, and incidents in Thailand’s south insurgency indicate that the proposed hybrid model is an effective way to construct an estimated model and performs better than either the classical time series model or SVR alone. Forecast accuracy is evaluated by the mean square error.

1. Introduction

Time series modeling and forecasting are challenging tasks for describing the dynamic phenomena and behavioral patterns of a time series. In recent years, accurately describing the trends of Thailand’s south insurgency has received increasing attention, and many research papers have studied the unrest in southern Thailand. Using the database of Deep South Watch [1], Jitpiromsri and McCargo [2] and Jitpiromsri [3] reported the trends of Thailand’s south insurgency with diagrams comparing the monthly number of unrest incidents. By applying polynomial least-squares regression, they provided a forecasting model describing the unrest incidents in southern Thailand. However, this polynomial does not fit the monthly number of unrest incidents well.

In this study, we aim to identify the patterns and trends of Thailand’s south insurgency and to evaluate the accuracy of models for modeling and forecasting. To do this, we use traditional time series models such as the Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA) models, which are also called the Box-Jenkins models.

In general, the time series data of Thailand’s south insurgency can be categorized as nonstationary by using the Box-Jenkins methodology. An estimated model of these data can then be obtained by support vector regression (SVR). We aim to combine ARIMA and SVR to build an adequate estimated model for forecasting the time series of Thailand’s south insurgency.

This paper is organized as follows. Section 2 provides background on the mathematical theory of time series modeling and forecasting and of SVR. The proposed hybrid model is explained in Section 3. Section 4 presents experimental results obtained by applying the proposed hybrid model to the first difference of the time series of Thailand’s south insurgency. Finally, the main conclusions are summarized in Section 5.

2. Background and Mathematical Theory

2.1. Autoregressive Integrated Moving Average Modeling

Three basic methods for forecasting time series are the naïve model, the exponential smoothing model, and the ARIMA model. The formulations of the first two models are related to a random walk. In this section, the ARIMA model is reviewed.

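An autoregressive model of order $p$, abbreviated as the AR($p$) model, is

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + a_t, \qquad (1)$$

where $y_t$ is stationary, $\phi_1, \phi_2, \ldots, \phi_p$ are constants ($\phi_p \neq 0$), and $a_t$ is a white noise series with zero mean and variance $\sigma_a^2$. The AR($p$) model (1) predicts the current value $y_t$ from the past of the series, expressing $y_t$ as a linear combination of $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$.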
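To make (1) concrete, the following minimal Python sketch simulates an AR($p$) series and computes the one-step prediction $\phi_1 y_{t-1} + \cdots + \phi_p y_{t-p}$. The coefficient values, the Gaussian noise, and the function names `simulate_ar` and `ar_one_step` are illustrative assumptions, not values or code from this study.

```python
import random


def simulate_ar(phi, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate n values of y_t = phi_1*y_{t-1} + ... + phi_p*y_{t-p} + a_t, a_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    p = len(phi)
    y = [0.0] * p  # zero starting values; their influence fades during the burn-in
    for _ in range(n + burn_in):
        past = sum(phi[i] * y[-1 - i] for i in range(p))  # phi_1*y_{t-1} + ... + phi_p*y_{t-p}
        y.append(past + rng.gauss(0.0, sigma))
    return y[p + burn_in:]


def ar_one_step(phi, history):
    """One-step prediction of y_t from the last p observations (the noise has mean zero)."""
    return sum(phi[i] * history[-1 - i] for i in range(len(phi)))


series = simulate_ar([0.5, -0.25], n=100)
print(len(series))                              # 100
print(ar_one_step([0.5, -0.25], [1.0, 2.0]))    # 0.5*2.0 - 0.25*1.0 = 0.75
```

The burn-in discards the start-up transient so that the returned values behave like a draw from the stationary process.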

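The Moving Average model of order $q$, abbreviated as the MA($q$) model, is

$$y_t = a_t + \theta_1 a_{t-1} + \theta_2 a_{t-2} + \cdots + \theta_q a_{t-q}, \qquad (2)$$

where $y_t$ is stationary, $\theta_1, \theta_2, \ldots, \theta_q$ are constants ($\theta_q \neq 0$), and $a_t$ is a Gaussian white noise series with mean zero. The MA($q$) model (2) explains the current value $y_t$ as a linear combination of the white noise terms $a_t, a_{t-1}, \ldots, a_{t-q}$.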
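The identification rules in Section 2.2 rely on the fact that the acf of an MA($q$) process is zero beyond lag $q$. The short sketch below, assuming the plus-sign convention of (2), computes the theoretical acf $\rho_k = \sum_j \psi_j \psi_{j+k} / \sum_j \psi_j^2$ with $\psi_0 = 1$ and $\psi_j = \theta_j$; the function name `ma_acf` is hypothetical.

```python
def ma_acf(theta, max_lag):
    """Theoretical acf rho_1..rho_max_lag of y_t = a_t + theta_1*a_{t-1} + ... + theta_q*a_{t-q}."""
    psi = [1.0] + list(theta)          # weights on a_t, a_{t-1}, ..., a_{t-q}
    gamma0 = sum(c * c for c in psi)   # lag-0 autocovariance divided by sigma_a^2
    acf = []
    for k in range(1, max_lag + 1):
        # Products of weights k steps apart; the sum is empty (zero) once k > q.
        gamma_k = sum(psi[j] * psi[j + k] for j in range(len(psi) - k))
        acf.append(gamma_k / gamma0)
    return acf


print(ma_acf([0.5], 3))   # [0.4, 0.0, 0.0]: the acf of an MA(1) cuts off after lag 1
```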

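The Autoregressive Moving Average model, abbreviated as the ARMA($p$, $q$) model and developed by Box and Jenkins [4], combines the autoregressive and Moving Average models. It has the form

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + a_t + \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q}. \qquad (3)$$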

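According to the original Box-Jenkins methodology, an integrated process is a nonstationary process that becomes stationary after differencing. A process whose $d$th difference is a stationary ARMA($p$, $q$) process is denoted by ARIMA($p$, $d$, $q$):

$$\nabla^d y_t = \phi_1 \nabla^d y_{t-1} + \cdots + \phi_p \nabla^d y_{t-p} + a_t + \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q}, \qquad (4)$$

where $\nabla^d y_t = (1 - B)^d y_t$ denotes the $d$th differenced time series and $B$ is the backshift operator, $B y_t = y_{t-1}$ [5]. These models are the foundation models for time series forecasting.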
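The differencing in (4) and its inverse can be sketched in a few lines of Python. The helper names are hypothetical; the inverse shows how forecasts of $\nabla y_t$ would be mapped back to the original scale by cumulative summation from a known starting value.

```python
def difference(y, d=1):
    """Apply the difference operator d times: (nabla y)_t = y_t - y_{t-1}."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


def undifference_once(first_value, dy):
    """Invert a single difference: rebuild y from y_0 and the differenced series."""
    y = [first_value]
    for change in dy:
        y.append(y[-1] + change)
    return y


y = [3, 5, 4, 8, 13]
print(difference(y))                             # [2, -1, 4, 5]
print(difference(y, 2))                          # [-3, 5, 1]
print(undifference_once(y[0], difference(y)))    # [3, 5, 4, 8, 13]
```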

2.2. Box-Jenkins Methodology

Plots of the autocorrelation function (acf) and the partial autocorrelation function (pacf) are the main tools for identifying the orders of AR, MA, ARMA, and ARIMA models. AR($p$) is used to obtain an estimated model of a time series when the acf dies down quickly, either by an exponential decay or by a damped sine wave, whereas the pacf shows spikes (significant autocorrelations) for lags up to $p$ and then cuts off immediately.

Conversely, MA($q$) is used to obtain an estimated model of a time series when the pacf dies down quickly, either by an exponential decay or by a damped sine wave, whereas the acf shows spikes (significant autocorrelations) for lags up to $q$ and then cuts off immediately.

A mixed ARMA($p$, $q$) process is suggested when neither plot cuts off: the acf shows spikes for lags up to $q$ and the pacf for lags up to $p$, after which both die down quickly, either by an exponential decay or by a damped sine wave. Diagnostic checking of the candidate values of $p$ and $q$ then selects the ARMA($p$, $q$) model that best fits the time series [6].
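The identification rules above can be checked numerically. The sketch below computes the sample acf, the partial autocorrelations via the Durbin-Levinson recursion, and the approximate 95% limits $\pm 1.96/\sqrt{n}$ commonly drawn on acf and pacf plots. The function names are hypothetical, and the example uses the theoretical acf $\rho_k = 0.5^k$ of an AR(1) process, whose pacf should cut off after lag 1.

```python
import math


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag (autocovariances divided by n)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(v * v for v in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: partial autocorrelations phi_kk from rho_1..rho_K."""
    pacf, phi_prev = [], []
    for k in range(1, len(rho) + 1):
        # phi_prev[j] holds phi_{k-1, j+1}; rho[i] holds rho_{i+1}
        num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def significance_limit(n):
    """Approximate 95% limit for sample acf/pacf values of a white noise series of length n."""
    return 1.96 / math.sqrt(n)


print(pacf_from_acf([0.5, 0.25, 0.125]))   # [0.5, 0.0, 0.0]
```

In practice, `pacf_from_acf(sample_acf(y, K))` would be compared against `significance_limit(len(y))` to decide where the spikes stop.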

The identification procedure described in this section is used to diagnose the models in our study.

2.3. Hybrid Models

In recent years, the forecasting models used in the literature can be classified into three categories: statistical models, artificial intelligence (AI) models, and hybrid models.

Statistical models, also known as time series models, include the naïve model, AR model, MA model, ARIMA model, exponential smoothing, and generalized autoregressive conditional heteroskedasticity (GARCH) volatility models. They use time series analysis to identify the pattern of a time series and provide future values based on the identified pattern.

The ARIMA model, also known as the Box-Jenkins model [4], includes the AR and MA models identified by the Box-Jenkins methodology. These models assume that the time series under study is stationary (after differencing, in the ARIMA case) and linear, meaning that the relationship between the input and output series is linear.

AI models form the second category of time series forecasting models, notably artificial neural networks (ANNs), genetic algorithms (GAs), and support vector machines (SVMs). AI models can capture nonlinear patterns and improve forecasting performance.

Many studies introduce hybrid models in order to capture both the linear and nonlinear characteristics of a time series. Wang et al. [7] reported that using a statistical model alone or an AI model alone is not adequate for forecasting stock price time series.

2.4. Hybrid Methodology

A hybrid model is a combination of models formulated with a mixed methodology. Many studies suggest that a time series consists of a linear component $L_t$ and a nonlinear component $N_t$, in the form

$$y_t = L_t + N_t. \qquad (5)$$

An estimated model of (5) is formulated as follows: a linear statistical model is first used to obtain an estimate $\hat{L}_t$ of the linear component; the residual $e_t = y_t - \hat{L}_t$, which contains only the nonlinear relationship, is then modeled to obtain an estimate $\hat{N}_t$ of the nonlinear component.
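The two-stage procedure for (5) can be sketched as follows. This is a minimal illustration and not the method of any cited study: an AR(1) model fitted by least squares stands in for the linear statistical model, and a Nadaraya-Watson kernel smoother of $e_t$ on $e_{t-1}$ stands in for the nonlinear model (Zhang [8], for instance, used a neural network). All function names and the bandwidth are assumptions.

```python
import math


def fit_ar1(y):
    """Least-squares AR(1) coefficient without intercept: sum y_t*y_{t-1} / sum y_{t-1}^2."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


def kernel_smoother(x_train, z_train, x, bandwidth):
    """Nadaraya-Watson estimate of E[z | x] with Gaussian weights (stand-in nonlinear model)."""
    w = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in x_train]
    return sum(wi * zi for wi, zi in zip(w, z_train)) / sum(w)


def two_stage_hybrid(y, bandwidth=1.0):
    """Fitted values y_hat_t = L_hat_t + N_hat_t for t = 2, ..., n-1 (0-based time index)."""
    phi = fit_ar1(y)
    linear = [phi * y[t - 1] for t in range(1, len(y))]        # L_hat_t for t = 1..n-1
    resid = [y[t] - linear[t - 1] for t in range(1, len(y))]    # e_t = y_t - L_hat_t
    x_train, z_train = resid[:-1], resid[1:]                    # pairs (e_{t-1}, e_t)
    nonlinear = [kernel_smoother(x_train, z_train, x, bandwidth) for x in x_train]  # t = 2..n-1
    return [linear[i] + nonlinear[i - 1] for i in range(1, len(linear))]


print(two_stage_hybrid([1.0, 0.5, 0.25, 0.125]))   # [0.25, 0.125]: an exact AR(1) path leaves zero residuals
```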

Zhang [8] used the hybrid model by introducing the estimated model of (5) in the form $\hat{y}_t = \hat{L}_t + \hat{N}_t$, where $\hat{L}_t$ is given by an ARIMA model and $\hat{N}_t$ by a feedforward neural network model. Modifications of Zhang’s hybrid approach with $\hat{N}_t$ estimated by a support vector machine (SVM) model can be found in many studies, for example, De Oliveira and Ludermir [9], while Aladag et al. [10] estimated $\hat{N}_t$ by an Elman recurrent neural network (ERNN) model and applied it to the Canadian lynx data.

2.5. Support Vector Regression

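Let the dot product space $\mathbb{R}^n$ be our data universe with vectors $\mathbf{x}$ as objects. Let $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_l\}$ be a sample set such that $X \subseteq \mathbb{R}^n$. Let $f: \mathbb{R}^n \to \mathbb{R}$ be the target function. Let $D = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)\} \subseteq \mathbb{R}^n \times \mathbb{R}$, with $y_i = f(\mathbf{x}_i)$, be the training set.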

The regression problem is to use $D$ to find the best approximate model $\hat{f}$ of the true underlying function $f$ mapping inputs to outputs, such that $\hat{f}(\mathbf{x}) \approx f(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{R}^n$.

A regression problem is classified as linear or nonlinear. For the linear regression model, the best approximate model is selected from the set of possible functions

$$\left\{ \hat{f}(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b \;\middle|\; \mathbf{w} \in \mathbb{R}^n,\ b \in \mathbb{R} \right\}, \qquad (6)$$

where $\mathbf{w}$ is a weight vector and $b$ is a constant.

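In general, to describe a nonlinear relationship between input and output, SVR uses a nonlinear map $\phi: \mathbb{R}^n \to \mathcal{F}$ that transforms the nonlinear regression problem in the lower-dimensional input space $\mathbb{R}^n$ into a linear regression problem in a high-dimensional feature space $\mathcal{F}$. In the new space $\mathcal{F}$, a linear model is formulated, which represents a nonlinear model in the original space:

$$\hat{f}(\mathbf{x}) = \langle \mathbf{w}, \phi(\mathbf{x}) \rangle + b, \qquad (7)$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product in $\mathcal{F}$. The linear SVR model (6) is obtained from (7) by taking $\phi$ to be the identity function.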
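As a small check of the idea behind (7), the sketch below uses an assumed degree-2 polynomial feature map $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ on $\mathbb{R}^2$, for which $\langle \phi(\mathbf{u}), \phi(\mathbf{v}) \rangle = \langle \mathbf{u}, \mathbf{v} \rangle^2$. The map is illustrative only; this section does not specify the map $\phi$ used in the study.

```python
import math


def phi(x):
    """Explicit degree-2 feature map from R^2 to R^3 (an illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(u, v):
    """Kernel k(u, v) = <u, v>^2, evaluated without forming phi explicitly."""
    return dot(u, v) ** 2


u, v = (1.0, 2.0), (3.0, 1.0)
print(dot(phi(u), phi(v)))   # approximately 25 (rounding from sqrt(2))
print(poly2_kernel(u, v))    # 25.0
```

A linear model in the three-dimensional feature space is therefore a quadratic model in the original two inputs, which is exactly the sense in which (7) represents a nonlinear model.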

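SVR fits the linear regression (7) to the training data by estimating $\mathbf{w}$ and $b$ through minimization of the following regularized function:

$$R(\mathbf{w}, b) = \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{l} L_\varepsilon\big(y_i, \hat{f}(\mathbf{x}_i)\big), \qquad (8)$$

where both $C > 0$ and $\varepsilon \geq 0$ are user-given parameters and $L_\varepsilon$ is the quadratic $\varepsilon$-insensitive loss function defined by

$$L_\varepsilon\big(y, \hat{f}(\mathbf{x})\big) = \Big(\max\big\{0,\ \lvert y - \hat{f}(\mathbf{x}) \rvert - \varepsilon\big\}\Big)^2. \qquad (9)$$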
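Equations (8) and (9) translate directly into code. The following sketch evaluates the quadratic $\varepsilon$-insensitive loss and the regularized function for a linear model of the form (6); the function names and the toy numbers are hypothetical.

```python
def eps_insensitive_sq(y, f, eps):
    """Quadratic eps-insensitive loss: zero inside the tube |y - f| <= eps, squared excess outside."""
    return max(0.0, abs(y - f) - eps) ** 2


def regularized_risk(w, b, X, Y, C, eps):
    """0.5*||w||^2 + C * sum_i L_eps(y_i, <w, x_i> + b) for a linear model."""
    norm_sq = sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, Y):
        prediction = sum(wj * xj for wj, xj in zip(w, xi)) + b
        loss += eps_insensitive_sq(yi, prediction, eps)
    return 0.5 * norm_sq + C * loss


print(eps_insensitive_sq(1.0, 1.125, 0.25))   # 0.0 (inside the tube)
print(eps_insensitive_sq(1.0, 2.0, 0.25))     # 0.5625 = (1.0 - 0.25)^2
print(regularized_risk([1.0], 0.0, [[1.0], [2.0]], [1.0, 3.0], C=2.0, eps=0.5))   # 1.0
```

Errors smaller than $\varepsilon$ cost nothing, so larger $\varepsilon$ gives a flatter model, while larger $C$ penalizes errors outside the tube more heavily.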

The following two propositions relate to the formulation of an estimated model. They are adapted from [11, 12] for our study.

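Proposition 1. Given a regression training set $D$, the optimal support vector regression model is computed by $\hat{f}(\mathbf{x}) = \langle \mathbf{w}^*, \phi(\mathbf{x}) \rangle + b^*$, where the parameters $\mathbf{w}^*$ and $b^*$ solve the following optimization problem:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^*} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{l} \big(\xi_i^2 + \xi_i^{*2}\big) \qquad (10)$$

subject to

$$y_i - \langle \mathbf{w}, \phi(\mathbf{x}_i) \rangle - b \leq \varepsilon + \xi_i, \qquad \langle \mathbf{w}, \phi(\mathbf{x}_i) \rangle + b - y_i \leq \varepsilon + \xi_i^*,$$

which hold with $\xi_i, \xi_i^* \geq 0$ for $i = 1, \ldots, l$.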

The constant $C$ is called the penalty constant; it controls the trade-off between margin maximization (a small $\lVert \mathbf{w} \rVert$) and the minimization of the slack variables $\xi_i, \xi_i^*$.

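Proposition 2. Given a regression training set $D$, the optimal support vector regression model is computed by $\hat{f}(\mathbf{x}) = \langle \mathbf{w}^*, \phi(\mathbf{x}) \rangle + b^*$, where

$$\mathbf{w}^* = \sum_{i=1}^{l} \big(\alpha_i - \alpha_i^*\big)\, \phi(\mathbf{x}_i)$$

and $\alpha_i, \alpha_i^*$ are the parameters solving the following dual quadratic optimization problem:

$$\max_{\boldsymbol{\alpha},\, \boldsymbol{\alpha}^*} \ \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle - \frac{1}{4C} \sum_{i=1}^{l} \big(\alpha_i^2 + \alpha_i^{*2}\big) \qquad (11)$$

subject to $\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \geq 0$ for $i = 1, \ldots, l$.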

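The parameter vector $\mathbf{w}^*$ is obtained from the $\alpha_i$ and $\alpha_i^*$ that solve the optimization problem (11). The parameter $b^*$ is computed as follows: obtain $\alpha_i, \alpha_i^*$ from (11) and substitute $\mathbf{w}^*$ and $\mathbf{x}_i$ into the active constraint of (10) for each support vector, which gives

$$b_i = y_i - \langle \mathbf{w}^*, \phi(\mathbf{x}_i) \rangle - \varepsilon - \frac{\alpha_i}{2C} \ \text{ if } \alpha_i > 0, \qquad b_i = y_i - \langle \mathbf{w}^*, \phi(\mathbf{x}_i) \rangle + \varepsilon + \frac{\alpha_i^*}{2C} \ \text{ if } \alpha_i^* > 0;$$

then define $b^*$ as the average of the $b_i$.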
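Solving (11) exactly requires a quadratic programming routine. As a hedged, self-contained sketch only, the code below makes one simplification that is not part of Proposition 2: it fixes $b^*$ to the mean of the targets and drops the equality constraint. With $\beta_i = \alpha_i - \alpha_i^*$ (and $\alpha_i \alpha_i^* = 0$ at the optimum), the dual then becomes an unconstrained concave problem whose coordinates have a closed-form soft-threshold update. The Gaussian kernel, its parameter `gamma`, and all function names are assumptions.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian kernel k(u, v) = exp(-gamma * ||u - v||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(u, v)))


def fit_svr_quadratic(X, Y, C=10.0, eps=0.1, gamma=1.0, sweeps=200):
    """Coordinate ascent on the quadratic-loss SVR dual in beta_i = alpha_i - alpha_i^*.

    Simplification: b is fixed to mean(Y), so sum(beta) = 0 is not enforced and the dual is
    max  sum_i r_i*beta_i - eps*sum_i |beta_i| - 0.5 * beta^T (K + I/(2C)) beta,  r = Y - b.
    """
    n = len(X)
    b = sum(Y) / n
    r = [y - b for y in Y]
    Q = [[rbf(X[i], X[j], gamma) + (1.0 / (2.0 * C) if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    beta = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            z = r[i] - sum(Q[i][j] * beta[j] for j in range(n) if j != i)
            # Soft-threshold update: beta_i = sign(z) * max(|z| - eps, 0) / Q_ii
            beta[i] = math.copysign(max(abs(z) - eps, 0.0), z) / Q[i][i]
    return beta, b


X = [[0.0], [1.0], [2.0], [3.0]]
Y = [0.0, 1.0, 2.0, 3.0]
beta, b = fit_svr_quadratic(X, Y, C=100.0, eps=0.05)
print(b)      # 1.5
print(beta)   # points inside the eps-tube receive a zero coefficient
```

A production fit would instead pass (11), including the equality constraint, to a quadratic programming solver and compute $b^*$ as described above.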

The optimal regression model is obtained by substituting $\mathbf{w}^*$ and $b^*$ into (7), which gives the following lemma.

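Lemma 3. The optimal regression model is

$$\hat{f}(\mathbf{x}) = \sum_{i=1}^{l} \big(\alpha_i - \alpha_i^*\big)\, k(\mathbf{x}_i, \mathbf{x}) + b^*, \qquad (12)$$

where $k(\mathbf{x}_i, \mathbf{x}) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}) \rangle$ is the kernel function and the coefficient $\alpha_i - \alpha_i^*$ is nonzero only when $\mathbf{x}_i$ is a support vector. Hence, the optimal regression model depends only on the support vectors.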
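Lemma 3 means that prediction only needs the training points with nonzero coefficients. The sketch below evaluates (12) over the support vectors alone, assuming a Gaussian kernel; the coefficient values in the example are hypothetical and are not fitted to any data from this study.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian kernel k(u, v) = exp(-gamma * ||u - v||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(u, v)))


def svr_predict(x, X, beta, b, gamma=1.0, tol=1e-12):
    """Equation (12): sum over support vectors of beta_i * k(x_i, x), plus b."""
    support = [i for i, coef in enumerate(beta) if abs(coef) > tol]  # indices with beta_i != 0
    return sum(beta[i] * rbf(X[i], x, gamma) for i in support) + b


X = [[0.0], [1.0], [2.0]]
beta = [0.0, 0.5, 0.0]   # hypothetical coefficients: only [1.0] is a support vector
print(svr_predict([1.0], X, beta, b=2.0))   # 0.5 * k([1.0], [1.0]) + 2.0 = 2.5
```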

3. Formulation of the Proposed Model

In this section, we formulate the proposed model. We use a hybrid model that combines several models in order to reduce the risk of using an inappropriate model, to obtain results that are more accurate than those of a single model, and to improve overall forecasting performance.

Assume that $y_t$ is the time series under study and that it is linear and stationary. We first use the Box-Jenkins methodology to examine the behavior of $y_t$. This step yields a suitable AR($p$), MA($q$), ARMA($p$, $q$), or ARIMA($p$, $d$, $q$) model for estimating $y_t$. By fitting the series with (1), (2), (3), or (4), we obtain the ARIMA estimate $\hat{y}_t^{\mathrm{ARIMA}}$. According to Lemma 3, we then apply SVR to the series to evaluate $\hat{y}_t^{\mathrm{SVR}}$ from (12). This model is a function of the past values of the series, in the form $\hat{y}_t^{\mathrm{SVR}} = \hat{f}(\mathbf{x}_t)$ with $\mathbf{x}_t = (y_{t-1}, y_{t-2}, \ldots, y_{t-m})$.
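The SVR input $\mathbf{x}_t = (y_{t-1}, \ldots, y_{t-m})$ is built by sliding a window over the series. In the minimal sketch below, the lag order `m`, the toy counts, and the function name are hypothetical choices.

```python
def make_lagged(y, m):
    """Build SVR training pairs x_t = (y_{t-1}, ..., y_{t-m}) -> y_t for t = m, ..., n-1."""
    X = [[y[t - j] for j in range(1, m + 1)] for t in range(m, len(y))]
    targets = [y[t] for t in range(m, len(y))]
    return X, targets


X, targets = make_lagged([10, 12, 9, 14, 11], m=2)
print(X)         # [[12, 10], [9, 12], [14, 9]]
print(targets)   # [9, 14, 11]
```

The first `m` observations serve only as inputs, so a series of length $n$ yields $n - m$ training pairs.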

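Consider a time set $T = \{1, 2, \ldots, n\}$ and a split point $t_0 \in T$ with $1 < t_0 < n$. There is exactly one time point $t_0$ separating the two periods: the historical period consists of all $t \in T$ with $t \leq t_0$, and the forecasting period consists of all $t \in T$ with $t > t_0$. The proposed hybrid model is the estimated model defined by setting the indicator $I_1(t) = 1$ for all $t \leq t_0$ and $I_1(t) = 0$ otherwise, and setting $I_2(t) = 1$ for all $t > t_0$ and $I_2(t) = 0$ otherwise. The proposed hybrid model can be extended to include two or more time intervals.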

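The series under study is then modeled by the proposed hybrid model as follows:

$$y_t = I_1(t)\, \hat{y}_t^{\mathrm{SVR}} + I_2(t)\, \hat{y}_t^{\mathrm{ARIMA}} + e_t, \qquad (13)$$

where $e_t$ is the residual of the model at time $t$ obtained from (13), that is,

$$e_t = I_1(t)\, e_t^{\mathrm{SVR}} + I_2(t)\, e_t^{\mathrm{ARIMA}}, \qquad (14)$$

where $e_t^{\mathrm{SVR}} = y_t - \hat{y}_t^{\mathrm{SVR}}$ and $e_t^{\mathrm{ARIMA}} = y_t - \hat{y}_t^{\mathrm{ARIMA}}$ are the residuals of the SVR and ARIMA estimated models for $t \leq t_0$ and $t > t_0$, respectively.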
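Equations (13) and (14) amount to switching between two sets of fitted values at the split point $t_0$. The sketch below assumes the SVR fitted values and the ARIMA forecasts are already available as dictionaries keyed by time; the numbers and function names are hypothetical.

```python
def hybrid_piecewise(times, t0, svr_fitted, arima_fitted):
    """y_hat_t = I_1(t)*SVR_t + I_2(t)*ARIMA_t, with I_1(t) = 1 for t <= t0 and I_2(t) = 1 for t > t0."""
    combined = {}
    for t in times:
        # Only the component whose indicator equals 1 is looked up, so each source
        # needs values only on its own side of t0.
        combined[t] = svr_fitted[t] if t <= t0 else arima_fitted[t]
    return combined


def hybrid_residuals(y, fitted):
    """e_t = y_t - y_hat_t, which equals e_t^SVR for t <= t0 and e_t^ARIMA for t > t0."""
    return {t: y[t] - fitted[t] for t in fitted}


y = {1: 10.0, 2: 11.5, 3: 12.0, 4: 13.0, 5: 15.0}
fitted = hybrid_piecewise([1, 2, 3, 4, 5], 3, {1: 10.0, 2: 11.0, 3: 12.5}, {4: 13.5, 5: 14.0})
print(hybrid_residuals(y, fitted))   # {1: 0.0, 2: 0.5, 3: -0.5, 4: -0.5, 5: 1.0}
```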

4. Application of the Proposed Hybrid Model to Thailand’s South Insurgency Movement Direction Forecasting

4.1. Data Set

In this research, we study the unrest in the four southern provinces of Thailand, namely, Pattani, Yala, Narathiwat, and parts of Songkla. We consider the monthly numbers of deaths, injuries, and incidents in these provinces. At the time of this research, the latest data were available from Deep South Watch (DSW) [1] and the Deep South Coordination Center (DSCC) [13]. Using the proposed hybrid model, we aim to formulate an estimated model for the trend of the numbers of deaths, injuries, and incidents in these regions.

Figure 1 illustrates the number of unrest incidents in the four southern provinces from 2005 to 2015. The diagram shows a high frequency of unrest incidents with small fluctuations in the first period (2005 to 2008), a decreasing frequency in the middle period (2009 to 2012), an increasing frequency from 2013 to 2014, and the lowest frequency in 2015.

The data series of our study consists of 40 months of deaths, injuries, and incidents in the four southern provinces of Thailand from September 2012 to December 2015.
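The length of the study period can be checked with simple month arithmetic, counting both endpoint months; the helper name is hypothetical.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of calendar months from start to end, counting both endpoint months."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


print(months_inclusive(2012, 9, 2015, 12))   # 40
```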

Figure 2 presents the three data series of monthly numbers of deaths, injuries, and incidents. The deaths series lies at the bottom in all periods, the injuries series lies between the deaths and incidents series in almost all periods, and the incidents series lies at the top in almost all periods.

From Figure 2, we can see that the number of incidents need not equal the sum of the numbers of deaths and injuries. An unrest incident may occur without any deaths or injuries, while some incidents cause high numbers of deaths and injuries.

The monthly numbers of injuries and incidents appear to be stationary, so candidate models for these two series can be determined from their acf and pacf plots. However, the monthly number of deaths exhibits a linear trend in the mean, since it has a clear downward slope.

Figure 3 compares the monthly number of deaths with its first differenced series (a) and shows the acf (b) and pacf (c) plots. The series of injuries and its first differenced series are shown in Figure 4, and the series of incidents and its first differenced series are shown in Figure 5.

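[Figure 3: (a) monthly number of deaths and its first differenced series; (b) acf; (c) pacf.]

[Figure 4: (a) monthly number of injuries and its first differenced series; (b) acf; (c) pacf.]

[Figure 5: (a) monthly number of incidents and its first differenced series; (b) acf; (c) pacf.]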

The plots of the first differenced series (Figures 3, 4, and 5) look like stationary processes, although the acf and pacf plots of the series of deaths, injuries, and incidents cannot clearly identify the parameters for constructing an estimated ARIMA model.

For the first difference in the monthly number of deaths, the acf tends to die down quickly, whereas the pacf shows a spike up to lag 1, ignoring isolated significant spikes that fall outside the limits in each plot. This suggests that the first difference in the monthly number of deaths can be modeled as an AR(1) process.

Similarly, the first differenced series of injuries and incidents can be modeled as AR processes. After residual checking in the diagnostic stage, however, an ARMA model is a candidate model for the first difference in the monthly numbers of deaths and injuries, and an MA model is a candidate model for the first difference in the monthly number of incidents. In the ARIMA($p$, $d$, $q$) notation, with $d = 1$ for first differencing, an ARIMA($p$, 1, $q$) model is the estimated model for the monthly numbers of deaths and injuries, and an ARIMA(0, 1, $q$) model is the estimated model for the monthly number of incidents.

Table 1 reports the mean square error (mse) of the three estimated models, formulated by ARIMA, SVR, and the hybrid model, for the monthly numbers of deaths, injuries, and incidents. The mean square error of the ARIMA model is calculated from the best trajectory among the trajectories simulated by ARIMA for each series.

Figure 6 illustrates the convergence of the mean square error, calculated between the monthly numbers and the estimated model with 2,500 and 5,000 trajectories, for the monthly numbers of deaths, injuries, and incidents.
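The trajectory-based mean square error can be sketched as follows: simulate many trajectories from a fitted model, compute the mse of each against the observed series, and keep the smallest value. The AR(1) simulator, its parameters, the toy observations, and the seed are hypothetical stand-ins for the ARIMA models fitted in this study. With the same seed, the first 2,500 of 5,000 trajectories coincide, so the best mse over 5,000 trajectories can never exceed the best over 2,500; tracking the running minimum is one way to trace this kind of convergence.

```python
import random


def mse(actual, predicted):
    """Mean square error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def simulate_ar1_path(y0, phi, sigma, steps, rng):
    """One trajectory of a hypothetical AR(1) model y_t = phi*y_{t-1} + a_t, started at y0."""
    path, prev = [], y0
    for _ in range(steps):
        prev = phi * prev + rng.gauss(0.0, sigma)
        path.append(prev)
    return path


def best_trajectory_mse(actual, y0, phi, sigma, n_traj, seed=0):
    """Smallest mse over n_traj simulated trajectories and the running minimum after each one."""
    rng = random.Random(seed)
    best, running = float("inf"), []
    for _ in range(n_traj):
        path = simulate_ar1_path(y0, phi, sigma, len(actual), rng)
        best = min(best, mse(actual, path))
        running.append(best)
    return best, running


observed = [2.0, 1.5, 1.8, 0.9, 1.2, 0.7]   # toy values, not the study's data
best_2500, _ = best_trajectory_mse(observed, 2.0, 0.8, 0.5, 2500)
best_5000, curve = best_trajectory_mse(observed, 2.0, 0.8, 0.5, 5000)
print(best_5000 <= best_2500)   # True
```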

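[Figure 6: convergence of the mean square error with 2,500 and 5,000 trajectories; panels (a), (b), and (c) for the monthly numbers of deaths, injuries, and incidents.]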

Setting the four SVR parameters to 0.0025, 150000, 3.25, and 2.75 and using the ARIMA model to select the best trajectory from the trial trajectories, we combine both models to formulate an estimated model for the monthly number of deaths in the form of the hybrid model (13).

The predictive performance of the SVR-ARIMA hybrid model for the monthly number of deaths is shown in Figure 7.

In the same way, for the monthly number of injuries, we set the four SVR parameters to 0.025, 350000, 2.755, and 0.00125, use the ARIMA model to select the best trajectory from the trial trajectories, and combine both models to formulate an estimated model for the monthly number of injuries in the form of (13).

The predictive performance of the SVR-ARIMA hybrid model for the differenced monthly number of injuries is shown in Figure 8.

For the monthly number of incidents, we set the four SVR parameters to 0.025, 200000, 1.555, and 0.725 and use the ARIMA model to select the best trajectory from the trial trajectories. These two models are then combined to formulate an estimated model for the monthly number of incidents in the form of (13).

The predictive performance of the SVR-ARIMA hybrid model for the monthly number of incidents is shown in Figure 9.

5. Conclusions

In this study, the hybrid SVR-ARIMA model has been investigated for formulating a time series model of the monthly numbers of Thailand’s south insurgency. In particular, we consider the first difference in the monthly numbers of deaths, injuries, and incidents in the Pattani, Yala, Narathiwat, and Songkla provinces over the 40 months from September 2012 to December 2015. Following the hybrid methodology, the SVR-ARIMA($p$, $d$, $q$) model is obtained by combining the ARIMA($p$, $d$, $q$) and SVR models. The autocorrelation and partial autocorrelation plots indicate that the first differences in the monthly numbers of deaths, injuries, and incidents are linear and stationary.

The estimated model obtained from the proposed hybrid model is compared with the estimated AR($p$), MA($q$), ARIMA($p$, $d$, $q$), and SVR models. The results show that the proposed hybrid model performs better than the other models. For the time series of Thailand’s south insurgency, an SVR-ARIMA model with an ARMA component is the estimated model for the monthly numbers of deaths and injuries, and an SVR-ARIMA model with an MA component is the estimated model for the monthly number of incidents. In particular, the SVR-ARIMA model consists of two components: the first uses the SVR model to formulate the estimated model for the historical data, and the second uses the ARIMA model to formulate the estimated model for unseen values in the short-term future.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the Deep South Coordination Center (DSCC) and Deep South Watch (DSW) for providing the data. This research was supported by grant funds from the Centre of Excellence in Mathematics, the Commission on Higher Education, Thailand.

References

[1] Deep South Watch (DSW). Deep South Incident Database for the Thailand. http://www.deepsouthwatch.org/dsid.
[2] S. Jitpiromsri and D. McCargo, “The Southern Thai Conflict Six Years On: Insurgency, Not Just Crime,” Contemporary Southeast Asia, vol. 32, no. 2, pp. 156–183, 2010.
[3] S. Jitpiromsri, An Inconvenient Truth about the Deep South Violent Conflict: A Decade of Chaotic, Constrained Realities and Uncertain Resolution, Center for Conflict Studies and Cultural Diversity (CSCD), Deep South Watch (DSW), Prince of Songkla University, 2010.
[4] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, Calif, USA, 1976.
[5] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice Hall, Englewood Cliffs, NJ, USA, 3rd edition, 1994.
[6] D. Asteriou and S. G. Hall, Applied Econometrics, Palgrave Macmillan, China, 2nd edition.
[7] J.-J. Wang, J.-Z. Wang, Z.-G. Zhang, and S.-P. Guo, “Stock index forecasting based on a hybrid model,” Omega, vol. 40, no. 6, pp. 758–766, 2012.
[8] G. P. Zhang, “Time series forecasting using a hybrid ARIMA and neural network model,” Neurocomputing, vol. 50, pp. 159–175, 2003.
[9] J. F. L. De Oliveira and T. B. Ludermir, “A hybrid evolutionary system for parameter optimization and lag selection in time series forecasting,” in Proceedings of the 3rd Brazilian Conference on Intelligent Systems, BRACIS 2014, pp. 73–78, Sao Paulo, Brazil, October 2014.
[10] C. H. Aladag, E. Egrioglu, and C. Kadilar, “Forecasting nonlinear time series with a hybrid methodology,” Applied Mathematics Letters, vol. 22, no. 9, pp. 1467–1470, 2009.
[11] L. Hamel, Knowledge Discovery with Support Vector Machines, John Wiley & Sons, Hoboken, NJ, USA, 2009.
[12] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, UK, 2000.
[13] Deep South Coordination Center (DSCC). Database of the violent event and victims in southern Thailand. http://dsrd.pn.psu.ac.th/webnew/index.php/database.html.

Copyright

Copyright © 2017 Salwa Waeto et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
