Journal of Probability and Statistics

Volume 2017, Article ID 3174305, 8 pages

https://doi.org/10.1155/2017/3174305

## Forecasting Time Series Movement Direction with Hybrid Methodology

^{1}Department of Mathematics and Computer Science, Faculty of Science and Technology, Prince of Songkla University, Pattani Campus, Pattani 94000, Thailand

^{2}Centre of Excellence in Mathematics, Commission on Higher Education, Ratchathewi, Bangkok 10400, Thailand

^{3}Faculty of Commerce and Management, Prince of Songkla University, Trang Campus, Trang 92000, Thailand

Correspondence should be addressed to Arthit Intarasit; a.intarasit@gmail.com

Received 30 November 2016; Revised 26 March 2017; Accepted 4 April 2017; Published 31 July 2017

Academic Editor: Dejian Lai

Copyright © 2017 Salwa Waeto et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Forecasting the tendencies of a time series is a challenging task that gives a better understanding of the underlying process. This paper presents a hybrid model that combines support vector regression (SVR) with the Autoregressive Integrated Moving Average (ARIMA) model and is formulated by a hybrid methodology. The proposed model is more convenient for practical use. The series of interest are the trends of the insurgency in southern Thailand. Empirical results on the monthly numbers of deaths, injuries, and incidents of the insurgency indicate that the proposed hybrid model gives a better estimated model than either the classical time series model or support vector regression alone. Forecast accuracy is measured by mean square error.

#### 1. Introduction

Time series modeling and forecasting are challenging because they must describe the dynamic phenomena and pattern behavior of a time series. In recent years, accurately describing the trends of the insurgency in southern Thailand has received growing attention, and many research papers have studied the unrest there. Using the database of Deep South Watch [1], Jitpiromsri and McCargo [2] and Jitpiromsri [3] reported the trends of the insurgency with diagrams comparing the monthly number of unrest incidents. By applying polynomial least-squares regression, they provided a forecasting model for the unrest incidents in the south of Thailand. However, this polynomial does not fit the monthly number of unrest incidents well.

In this study, we aim to identify patterns and trends of the insurgency in southern Thailand and to evaluate the accuracy of models for modeling and forecasting. To do this, we use traditional regression models such as Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA). These models are also called the Box-Jenkins models.

In general, the Box-Jenkins methodology categorizes the time series data of the insurgency in southern Thailand as nonstationary time series. An estimated model of these data can then be obtained by support vector regression (SVR). We aim to combine ARIMA and SVR to build an adequate estimated model for forecasting the time series of the insurgency.

This paper is organized as follows. Section 2 provides background on the mathematical theory of time series modeling and forecasting and of SVR. The proposed hybrid model is detailed in Section 3. Section 4 gives experimental results obtained by applying the proposed hybrid model to the first difference of the time series of the insurgency. Finally, the main conclusions are summarized in Section 5.

#### 2. Background and Mathematical Theory

##### 2.1. Autoregressive Integrated Moving Average Modeling

Three basic methods for forecasting time series are the naïve model, the exponential smoothing model, and the ARIMA model. The first two are formulated in relation to a random walk. In this section, the ARIMA model is reviewed.

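An autoregressive model of order $p$, abbreviated as the AR($p$) model, is

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t, \tag{1}$$

where $x_t$ is stationary, $\phi_1, \phi_2, \ldots, \phi_p$ are constants ($\phi_p \neq 0$), and $w_t$ is a white noise series with zero mean and variance $\sigma_w^2$. The AR($p$) model (1) predicts the current value $x_t$ from the past, as a linear combination of $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$.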

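The Moving Average model of order $q$, abbreviated as the MA($q$) model, is

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}, \tag{2}$$

where $x_t$ is stationary, $\theta_1, \theta_2, \ldots, \theta_q$ are constants ($\theta_q \neq 0$), and $w_t$ is a Gaussian white noise series with mean zero. The MA($q$) model (2) explains the current value $x_t$ by a linear combination of the white noise terms $w_t, w_{t-1}, \ldots, w_{t-q}$.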

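The Autoregressive Moving Average model, abbreviated as the ARMA($p$, $q$) model and developed by Box and Jenkins [4], combines the autoregressive and Moving Average models. It has the form

$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \tag{3}$$

with autoregressive coefficients $\phi_i$, moving average coefficients $\theta_j$, and white noise $w_t$.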

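According to the original Box-Jenkins methodology, an integrated process is a nonstationary process that becomes stationary after differencing. A process that follows a stationary ARMA($p$, $q$) model after being differenced $d$ times is denoted by ARIMA($p$, $d$, $q$):

$$\nabla^d x_t = \phi_1 \nabla^d x_{t-1} + \cdots + \phi_p \nabla^d x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \tag{4}$$

where $\nabla^d x_t$ denotes the $d$th differenced time series [5]. These models are the foundation for time series forecasting.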
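The relation between an ARMA process and its integrated (ARIMA) counterpart can be illustrated with a short simulation. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names `difference`, `simulate_arma`, and `integrate`, the coefficients 0.6 and 0.3, and the seed are all assumptions chosen for illustration.

```python
import random

def difference(series, d=1):
    """Apply the first-difference operator d times: x_t - x_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out

def simulate_arma(phi, theta, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate x_t = sum_i phi[i]*x_{t-1-i} + w_t + sum_j theta[j]*w_{t-1-j}."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    x, w = [], []
    for t in range(n + burn_in):
        w_t = rng.gauss(0.0, sigma)  # Gaussian white noise
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * w[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x.append(ar + w_t + ma)
        w.append(w_t)
    return x[burn_in:]  # drop the start-up transient

def integrate(series, start=0.0):
    """Undo one difference: cumulative sum beginning at `start`."""
    out = [start]
    for v in series:
        out.append(out[-1] + v)
    return out

# An ARIMA(1, 1, 1) path: simulate the stationary ARMA(1, 1) part, then integrate once.
arma_part = simulate_arma(phi=[0.6], theta=[0.3], n=120, seed=42)
arima_path = integrate(arma_part)

# Differencing the integrated path once recovers the stationary ARMA part.
recovered = difference(arima_path, d=1)
```

Differencing a quadratic sequence twice, for example `difference([1, 4, 9, 16, 25], d=2)`, leaves a constant sequence, which is the sense in which $d$ differences remove a polynomial trend of degree $d$.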

##### 2.2. Box-Jenkins Methodology

Plots of the autocorrelation function (acf) and partial autocorrelation function (pacf) are the main tools for identifying the orders of AR, MA, ARMA, and ARIMA models. An AR($p$) model is used to estimate a time series when the acf dies down quickly, either by exponential decay or as a damped sine wave, whereas the pacf shows spikes (significant autocorrelation) for lags up to $p$ and then cuts off.

Opposite to AR($p$), an MA($q$) model is used to estimate a time series when the pacf dies down quickly, either by exponential decay or as a damped sine wave, whereas the acf shows spikes (significant autocorrelation) for lags up to $q$ and then cuts off.

A mixed process ARMA($p$, $q$) is suggested when the acf and the pacf show spikes for lags up to $q$ and $p$, respectively, and then both die down quickly, either by exponential decay or as a damped sine wave. Diagnostic checking is then carried out to identify the $p$ and $q$ for which the ARMA($p$, $q$) model gives the best fit to the time series [6].

The identification procedure described in this section is used to diagnose the models in our study.
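As an illustration of the identification step, the sample acf and pacf can be computed directly. The sketch below is a plain-Python version under stated assumptions: the function names are hypothetical, the pacf is obtained with the Durbin-Levinson recursion, the AR(1) coefficient 0.7 is arbitrary, and the band $1.96/\sqrt{n}$ is the usual approximate 95% bound for white noise.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

def sample_pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion; pacf[0] = 1."""
    rho = sample_acf(x, max_lag)
    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Simulate an AR(1) series x_t = 0.7 * x_{t-1} + w_t (coefficient chosen arbitrarily).
rng = random.Random(1)
series = [0.0]
for _ in range(4000):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

acf = sample_acf(series, 10)
pacf = sample_pacf(series, 10)
bound = 1.96 / len(series) ** 0.5  # approximate 95% band for white noise

for lag in range(1, 6):
    print(lag, round(acf[lag], 3), round(pacf[lag], 3), abs(pacf[lag]) > bound)
```

For an AR(1) series the acf should decay gradually while the pacf is large at lag 1 and close to zero afterwards, which is the pattern the text associates with an AR($p$) model.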

##### 2.3. Hybrid Models

In recent years, the forecasting models used in the literature can be classified into three categories: statistical models, artificial intelligence (AI) models, and hybrid models.

Statistical models, also known as time series models, include the naïve model, the AR model, the MA model, the ARIMA model, exponential smoothing, and the generalized autoregressive conditional heteroskedasticity (GARCH) volatility model. They use time series analysis to identify the pattern of a time series and provide future values based on that pattern.

The ARIMA model is known as the Box-Jenkins model [4]; it includes the AR and MA models and is identified by the Box-Jenkins methodology. These models assume that the time series under study is stationary and linear, which means that the relationship between the input and output series is linear.

AI models are the second kind of forecasting model, notably artificial neural networks (ANNs), genetic algorithms (GAs), and support vector machines (SVMs). AI models can capture nonlinear patterns and improve forecast performance.

Many studies introduce a hybrid model in order to capture both the linear and the nonlinear characteristics of a time series. Wang et al. [7] reported that neither a statistical model alone nor an AI model alone is adequate for forecasting stock price time series.

##### 2.4. Hybrid Methodology

A hybrid model combines several models, using a mixed methodology for its formulation. Many studies suggest that a time series $x_t$ consists of a linear component $L_t$ and a nonlinear component $N_t$, in the form

$$x_t = L_t + N_t. \tag{5}$$

An estimated model of (5) is formulated as follows: a linear statistical model is first used to obtain an estimate of the linear component, denoted by $\hat{L}_t$; the residual, which contains only the nonlinear relationship, is then modeled to obtain an estimate of the nonlinear component, denoted by $\hat{N}_t$.

Zhang [8] used the hybrid model by introducing the estimated model of (5) in the form $\hat{x}_t = \hat{L}_t + \hat{N}_t$, where the linear estimate $\hat{L}_t$ is given by an ARIMA model and the nonlinear estimate $\hat{N}_t$ is given by a feedforward neural network model. Modifications of Zhang’s hybrid approach in which $\hat{N}_t$ is estimated by a support vector machine (SVM) model appear in many studies, for example, De Oliveira and Ludermir [9], while Aladag et al. [10] estimated $\hat{N}_t$ with Elman’s recurrent neural network (ERNN) model and applied it to the Canadian lynx data.
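The two-stage procedure can be written out step by step. The display below follows the additive scheme described in this subsection; the residual symbol $e_t$, the nonlinear function $g$, the lag count $n$, and the use of $x_t$ for the observed series are notation assumed here for illustration.

$$
\begin{aligned}
e_t &= x_t - \hat{L}_t && \text{(residual of the linear fit)} \\
\hat{N}_t &= g(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) && \text{(nonlinear model of the residuals)} \\
\hat{x}_t &= \hat{L}_t + \hat{N}_t && \text{(combined forecast)}
\end{aligned}
$$

In Zhang’s scheme $g$ is a feedforward neural network; replacing $g$ with an SVM or a recurrent network gives the variants cited above.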

##### 2.5. Support Vector Regression

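Let the dot product space $\mathbb{R}^n$ be our data universe with vectors $\bar{x} \in \mathbb{R}^n$ as objects. Let $S$ be a sample set such that $S \subset \mathbb{R}^n$. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the target function. Let $D = \{(\bar{x}, y) \mid \bar{x} \in S,\ y = f(\bar{x})\}$ be the training set.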

The regression problem is to find, by using $D$, the best approximate model $\hat{f}$ for the true underlying function $f$ mapping input to output, such that $\hat{f}(\bar{x}) \cong f(\bar{x})$ for all $\bar{x} \in \mathbb{R}^n$.

The regression problem is classified as linear or nonlinear. For the linear regression model, the best approximate model can be obtained from the set of possible functions with the following specification:

$$\hat{f}(\bar{x}) = \langle \bar{w}, \bar{x} \rangle + b, \tag{6}$$

where $\bar{w}$ is a weight vector and $b$ is a constant.

Generally, to describe a nonlinear relationship between input and output, SVR applies a map $\Phi$ that transforms the nonlinear regression problem in the lower-dimensional input space into a linear regression problem in a high-dimensional feature space $\mathcal{F}$. In the new space $\mathcal{F}$, a linear model is formulated, which represents a nonlinear model in the original space:

$$\hat{f}(\bar{x}) = \langle \bar{w}, \Phi(\bar{x}) \rangle + b, \tag{7}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product in $\mathcal{F}$. The linear SVR model (6) is obtained from (7) by taking $\Phi$ to be the identity function, $\Phi(\bar{x}) = \bar{x}$.

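SVR fits the linear regression (7) to the training data by estimating $\bar{w}$ and $b$ as the minimizers of the following regularized function:

$$R(\bar{w}, b) = \frac{1}{2}\lVert \bar{w} \rVert^2 + C \sum_{i=1}^{l} L_\epsilon\bigl(y_i, \hat{f}(\bar{x}_i)\bigr), \tag{8}$$

where $(\bar{x}_i, y_i)$, $i = 1, \ldots, l$, are the training pairs, both $C$ and $\epsilon$ are user-given parameters, and $L_\epsilon$ is the quadratic $\epsilon$-insensitive loss function defined by

$$L_\epsilon\bigl(y, \hat{f}(\bar{x})\bigr) = \begin{cases} 0, & \lvert y - \hat{f}(\bar{x}) \rvert \le \epsilon, \\ \bigl(\lvert y - \hat{f}(\bar{x}) \rvert - \epsilon\bigr)^2, & \text{otherwise.} \end{cases} \tag{9}$$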
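The effect of the $\epsilon$-insensitive loss is easy to check numerically. The following sketch assumes one common form of the regularized function, half the squared norm of the weights plus $C$ times the summed loss, and a model with a $+b$ offset; the function names and the toy numbers are illustrative only and are not taken from the paper.

```python
def quad_eps_insensitive(y, y_hat, eps):
    """Quadratic eps-insensitive loss: zero inside the tube, squared excess outside."""
    excess = abs(y - y_hat) - eps
    return excess * excess if excess > 0 else 0.0

def linear_model(w, b, x):
    """Linear model <w, x> + b (the +b sign convention is an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def regularized_risk(w, b, data, C, eps):
    """Assumed objective: 0.5 * ||w||^2 + C * sum of quadratic eps-insensitive losses."""
    penalty = 0.5 * sum(wi * wi for wi in w)
    loss = sum(quad_eps_insensitive(y, linear_model(w, b, x), eps) for x, y in data)
    return penalty + C * loss

# Toy data (made up): two one-dimensional training pairs (x, y).
data = [([1.0], 2.0), ([2.0], 4.5)]
risk = regularized_risk(w=[2.0], b=0.0, data=data, C=10.0, eps=0.1)
print(risk)
```

Errors smaller than $\epsilon$ cost nothing, so only points outside the tube influence the fit; a larger $C$ puts more weight on those points relative to the norm penalty.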

The following two propositions relate to the formulation of an estimated model. They are adapted from [11, 12] for our study.

Proposition 1. *Given a regression training set the optimal support vector regression model is computed by , where the parameters and solved the following optimization problem: hold with .*

The constant $C$ is called the penalty constant; it sets the trade-off between margin maximization and the minimization of the slack variables.

Proposition 2. *Given a regression training set the optimal support vector regression model is computed by , whereand are the parameters solved by the following dual quadratic optimization problem:*

The parameter is obtained by and which satisfied optimization (11). The parameter is solved as follows: obtain from (11) and substitute and in the constraint ; then solve for . Define as the average of .

The optimal regression model is obtained by substituting into and into where we have the following lemma.

Lemma 3. *The optimal regression model iswhere the coefficient is nonzero as support vector. The optimal regression model depends only on the support vectors.*
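The statement that the optimal model depends only on the support vectors can be checked with a small sketch. It assumes the usual dual-form prediction, a weighted sum of kernel evaluations against the training points plus an offset; the kernel choice, the $+b$ sign convention, the function names, and the coefficient values are all assumptions made for illustration, and the coefficients are not the result of solving any optimization problem.

```python
import math

def dot(u, v):
    """Plain dot product (linear kernel)."""
    return sum(a * b for a, b in zip(u, v))

def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) kernel, one common nonlinear choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svr_predict(x, train_x, coef, b, kernel=dot):
    """Dual-form prediction: sum_i coef[i] * k(x_i, x) + b."""
    return sum(c * kernel(xi, x) for xi, c in zip(train_x, coef)) + b

def keep_support_vectors(train_x, coef):
    """Keep only the training points whose dual coefficient is nonzero."""
    pairs = [(xi, c) for xi, c in zip(train_x, coef) if c != 0.0]
    return [xi for xi, _ in pairs], [c for _, c in pairs]

# Made-up dual coefficients: only the first and last points are support vectors.
train_x = [[0.0], [1.0], [2.0], [3.0]]
coef = [0.5, 0.0, 0.0, -0.25]
b = 0.1

sv_x, sv_coef = keep_support_vectors(train_x, coef)
full = svr_predict([1.5], train_x, coef, b, kernel=rbf)
reduced = svr_predict([1.5], sv_x, sv_coef, b, kernel=rbf)
print(full, reduced)
```

Dropping the points with zero coefficients leaves the prediction unchanged, so a fitted model only needs to store its support vectors.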

#### 3. Formulation of the Proposed Model

In this section, we formulate the proposed model. We begin with hybrid models, which combine several models in order to reduce the risk of using an inappropriate model, obtain more accurate results than a single model, and improve overall forecasting performance.

Assume that $x_t$ is the time series under study and that it is linear and stationary. We use the Box-Jenkins methodology to examine the behavior of $x_t$. This step yields a suitable AR($p$), MA($q$), or ARIMA($p$, $d$, $q$) model, and fitting the series with (2), (3), or (4) gives the estimate $\hat{L}_t$ of the linear component. Then, according to Lemma 3, we perform SVR on the series to evaluate the estimate $\hat{N}_t$ from (12). This model is a function of the past values of the series.

Consider a time set with and . There is only one single time point (necessarily from time to time ) precisely on time satisfying for all and satisfying for all . The proposed hybrid model is the estimated model defined by setting for all and vanishing otherwise and setting for all and vanishing otherwise. The proposed hybrid model can be extended to include two or more time intervals.

The under-study time series is initially modeled by the proposed hybrid model as follows:where is residuals of the time series model in the time that is as obtained from (13),where and are residuals of the under-study time series model in the time of the estimated model of , respectively, to .
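The piecewise construction described in this section, one estimated model on one time interval and the other model elsewhere, can be sketched as follows. This is an illustration under loud assumptions rather than the authors' procedure: the fitted values are made-up toy numbers, the linear fit is assumed to cover the earlier interval and the SVR fit the later one, and the helper `best_switch`, which searches for the switch time with the smallest in-sample mean square error, is hypothetical because the text here does not state how the time point is determined.

```python
def mse(actual, predicted):
    """Mean square error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def hybrid_estimate(linear_fit, svr_fit, switch):
    """Use the linear (Box-Jenkins) fit before `switch` and the SVR fit from `switch` on."""
    return list(linear_fit[:switch]) + list(svr_fit[switch:])

def best_switch(actual, linear_fit, svr_fit):
    """Hypothetical helper: switch index minimizing the in-sample MSE of the hybrid."""
    candidates = range(len(actual) + 1)
    return min(candidates,
               key=lambda s: mse(actual, hybrid_estimate(linear_fit, svr_fit, s)))

# Toy values (made up): the linear fit is good early, the SVR fit is good late.
actual = [1.0, 2.0, 3.0, 4.0]
linear_fit = [1.0, 2.0, 0.0, 0.0]
svr_fit = [0.0, 0.0, 3.0, 4.0]

switch = best_switch(actual, linear_fit, svr_fit)
hybrid = hybrid_estimate(linear_fit, svr_fit, switch)
print(switch, mse(actual, hybrid))
```

Extending the idea to two or more intervals would mean searching over several switch times instead of one.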

#### 4. Application of the Proposed Hybrid Model to Thailand’s South Insurgency Movement Direction Forecasting

##### 4.1. Data Set

In this research, we study the unrest in the four southern provinces of Thailand, namely Pattani, Yala, Narathiwat, and parts of Songkla. We consider the monthly numbers of deaths, injuries, and incidents in these provinces. At the time of this research, the latest data were obtained from Deep South Watch (DSW) [1] and the Deep South Coordination Center (DSCC) [13]. Using the proposed hybrid model, we aim to formulate an estimated model for the trend of the numbers of deaths, injuries, and incidents in these regions.

Figure 1 shows the number of unrest incidents in the four southern provinces from 2005 to 2015. The diagram shows a high frequency of unrest incidents with small fluctuations in the first period (2005 to 2008), a decreasing frequency in the middle period (2009 to 2012), an increasing frequency from 2013 to 2014, and the lowest frequency in 2015.