Mathematical Problems in Engineering

Volume 2015, Article ID 231394, 7 pages

http://dx.doi.org/10.1155/2015/231394

## A Hybrid Least Square Support Vector Machine Model with Parameters Optimization for Stock Forecasting

^{1}International Business School, Shaanxi Normal University, Xian 710062, China

^{2}Department of Management Sciences, City University of Hong Kong, Hong Kong

^{3}School of Business, Tung Wah College, Hong Kong

Received 30 May 2014; Accepted 20 August 2014

Academic Editor: Shifei Ding

Copyright © 2015 Jian Chai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper proposes an EMD-LSSVM (empirical mode decomposition least squares support vector machine) model to analyze the CSI 300 index. A WD-LSSVM (wavelet denoising least squares support vector machine) model is also proposed as a benchmark against which the performance of EMD-LSSVM is compared. Since parameter selection is vital to the performance of the model, different optimization methods are used, including simplex, GS (grid search), PSO (particle swarm optimization), and GA (genetic algorithm). Experimental results show that the EMD-LSSVM model with the GS algorithm outperforms the other methods in predicting the direction of stock market movement.

#### 1. Introduction

The stock market is one of the most sophisticated and challenging financial markets, since many factors affect its movement, including government policy, the global economic situation, investors’ expectations, and even correlations with other markets [1]. References [2, 3] described financial time series as essentially noisy, dynamic, and deterministically chaotic data sequences. A precise prediction of stock index movement can nevertheless help investors decide when to take or shed positions in the stock market and make profits, and many works have been published with the aim of maximizing investment profits and minimizing risk. Predicting the stock market is therefore of considerable practical importance.

Neural networks have been successfully applied to forecasting financial time series during the past two decades [4–6]. Neural networks are general function approximators that can approximate many nonlinear functions regardless of the properties of the time series data [7]. Besides, neural networks are able to learn dynamic systems, which makes them a more powerful tool for studying financial time series than traditional models [8–10]. However, neural networks have a couple of weaknesses when used to forecast financial time series. For instance, a typical back-propagation neural network has a huge number of parameters that must be controlled. This makes the solution unstable and causes overfitting, which results in poor out-of-sample performance and has become a critical issue for researchers.

Accordingly, [11] proposed a support vector machine (SVM) model. According to [12–14], there are two advantages of using SVM rather than neural networks. One is that SVM generalizes better. Unlike the empirical risk minimization principle in traditional neural networks, SVM reduces generalization error bounds based on the structural risk minimization principle. SVM seeks an optimal structure by striking a balance between the generalization error and the Vapnik-Chervonenkis (VC) confidence interval. The other advantage is that SVM prevents the model from getting stuck in local minima.

Since its introduction, SVM has been rapidly adopted in real-world applications. There are mainly two ways of applying SVM: one is classification and the other is regression. For classification, [15] constructed an SVM-based model to accurately evaluate consumers’ credit scores and solve classification problems. SVM is also widely used in the area of forecasting. Reference [16] used SVM to predict the direction of the daily stock price in the Korea composite stock price index (KOSPI). More recently, [17] applied support vector regression to forecast the Nikkei 225 opening index and the TAIEX closing index after detecting and removing the noise by independent component analysis (ICA).

However, the performance of SVM depends mainly on the input data and is sensitive to its parameters. Recent empirical studies have demonstrated that model performance is influenced by two aspects: a low signal-to-noise ratio (SNR) and instability of the model specification during the estimation process. For example, [18] investigated hyperparameter selection for the support vector machine under different noise distributions to compare model performance. Moreover, [19] applied wavelets to denoise bearing vibration signals, thereby improving the SNR, and then identified the best model according to the performances of ANN and SVM.

To improve classification and forecasting accuracy, several researchers, including [20, 21], have shown that combined classifying and forecasting models perform better than any individual model. Also, [22] showed that ensemble empirical mode decomposition (EEMD) can be integrated with the extreme learning machine (ELM) into an effective forecasting model for computer product sales. In this paper, we propose a hybrid EMD-LSSVM (empirical mode decomposition least squares support vector machine) model with different parameter optimization algorithms. Firstly, we use the empirical mode decomposition and wavelet denoising algorithms to process the original input data. Secondly, the parameters of the SVM are optimized by different methods, including simplex, grid search (GS), particle swarm optimization (PSO), and genetic algorithm (GA). The experimental results show that the EMD-LSSVM model performs better than the WD-LSSVM (wavelet denoising least squares support vector machine) model and that the hybrid EMD-LSSVM model with GS parameter optimization outperforms the other models.

#### 2. EMD-LSSVM Model and WD-LSSVM

##### 2.1. Empirical Mode Decomposition (EMD)

References [23, 24] proposed empirical mode decomposition (EMD), which decomposes a data series into a number of intrinsic mode functions (IMFs). It was designed for nonstationary and nonlinear data sets. Each IMF must satisfy the following two conditions.

(1) The number of extrema (local maxima plus local minima) and the number of zero crossings must be equal or differ by at most 1. In other words, every local maximum or local minimum must be followed by one zero crossing.

(2) The local average is zero; that is, the mean value of the upper envelope (defined by the local maxima) and the lower envelope (defined by the local minima) must be zero.
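The first of the two conditions above is easy to test numerically. The following is a minimal sketch in plain Python, assuming the series is a list of floats; the function names are hypothetical and are not taken from the paper. It checks only the counting condition, since the envelope-mean condition requires constructing the envelopes first.

```python
import math

def count_extrema(x):
    """Count strict interior local maxima and minima of the list x."""
    return sum(
        1
        for i in range(1, len(x) - 1)
        if (x[i - 1] < x[i] > x[i + 1]) or (x[i - 1] > x[i] < x[i + 1])
    )

def count_zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)

def satisfies_count_condition(x):
    """Condition (1): extrema and zero crossings differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

# A zero-mean sine passes the check; the same sine shifted upwards does not,
# because it has extrema but never crosses zero.
wave = [math.sin(2 * math.pi * t / 20 + 0.3) for t in range(100)]
shifted = [2.0 + v for v in wave]
```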

Thus, if a function is an IMF, it represents a signal that is symmetric about a local mean of zero. An IMF is a simple oscillatory mode that is more general than the simple harmonic function, since its frequency and amplitude can vary. A data series $x(t)$ can then be decomposed by the following sifting procedure.

(1) Find all local maxima and minima of $x(t)$. Use a cubic spline to connect all local maxima to generate the upper envelope $x_u(t)$ and all local minima to generate the lower envelope $x_l(t)$.

(2) From the upper and lower envelopes obtained in Step (1), calculate the envelope mean $m_1(t)$:

$$m_1(t) = \frac{x_u(t) + x_l(t)}{2}.$$

(3) The data series minus the envelope mean gives the first component $h_1(t)$:

$$h_1(t) = x(t) - m_1(t).$$

(4) Check whether $h_1(t)$ satisfies the IMF requirements; if it does not, go back to Step (1) and replace $x(t)$ with $h_1(t)$ to conduct the second sifting procedure; that is, $h_{11}(t) = h_1(t) - m_{11}(t)$. Writing $h_{10}(t) = h_1(t)$, repeat the sifting procedure $k$ times until the following stop criterion is satisfied:

$$\sum_{t=1}^{T} \frac{\left[h_{1(k-1)}(t) - h_{1k}(t)\right]^{2}}{h_{1(k-1)}^{2}(t)} \le \mathrm{SD},$$

where $\mathrm{SD}$ is the stopping condition. Normally, it is set between 0.2 and 0.3. Then, we get the first IMF component; that is, $c_1(t) = h_{1k}(t)$.

(5) Subtract the first IMF component from the data series to get the residual $r_1(t)$:

$$r_1(t) = x(t) - c_1(t).$$

(6) Treat $r_1(t)$ as the new data series and repeat Steps (1) to (5) to get the new residual $r_2(t)$. In this way, after repeating $n$ times, we get

$$r_2(t) = r_1(t) - c_2(t), \quad \ldots, \quad r_n(t) = r_{n-1}(t) - c_n(t).$$

When the residual $r_n(t)$ becomes a monotonic function, the data series cannot be decomposed any further and the EMD is complete. The original data series can then be described as the combination of $n$ IMF components and a mean trend $r_n(t)$; that is,

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t).$$

In this way, the original data series can be decomposed into IMFs and a mean trend function. Then, we use the IMFs for instantaneous frequency analysis.
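The sifting procedure can be sketched in a few dozen lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: piecewise-linear interpolation stands in for the cubic spline envelopes, the envelopes are pinned to the first and last samples, and the stopping ratio is computed from summed squares rather than pointwise to avoid division by values near zero. All function names and default values (`sd_limit`, `max_iter`, `max_imfs`) are hypothetical.

```python
import math

def local_extrema(x):
    """Return index lists of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through (i, x[i]) for i in idx.
    Assumption: linear interpolation replaces the cubic spline, and the
    envelope is pinned to the first and last samples."""
    knots = [0] + idx + [len(x) - 1]
    env, k = [], 0
    for t in range(len(x)):
        while knots[k + 1] < t:
            k += 1
        a, b = knots[k], knots[k + 1]
        w = (t - a) / (b - a)
        env.append((1 - w) * x[a] + w * x[b])
    return env

def sift(x, sd_limit=0.25, max_iter=50):
    """Extract one IMF candidate from x by repeated envelope-mean removal."""
    h = list(x)
    for _ in range(max_iter):
        maxima, minima = local_extrema(h)
        if len(maxima) + len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h_new = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
        # Stopping ratio: summed-squares variant of the SD criterion.
        num = sum((a - b) ** 2 for a, b in zip(h, h_new))
        den = sum(a * a for a in h) + 1e-12
        h = h_new
        if num / den < sd_limit:
            break
    return h

def emd(x, max_imfs=8):
    """Decompose x into a list of IMFs and a final residual (trend)."""
    imfs, residual = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) < 2:  # residual is close to monotonic
            break
        c = sift(residual)
        imfs.append(c)
        residual = [r - ci for r, ci in zip(residual, c)]
    return imfs, residual

# Two oscillations plus a linear trend as a toy input.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50) + 0.01 * t
          for t in range(200)]
imfs, residual = emd(signal)
```

By construction, summing the returned IMFs and the residual recovers the input series, which mirrors the decomposition identity stated above.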

The traditional Fourier transform decomposes a data series into a number of sine or cosine waves for analysis. The EMD technique, by contrast, decomposes the data series into several sinusoid-like signals with variable frequencies and a mean trend function. EMD has several advantages. Firstly, the method is relatively easy to understand and is widely applied, since it avoids complex mathematical algorithms. Secondly, EMD is suitable for nonlinear and nonstationary data series. Thirdly, EMD is well suited to analysing data series with trends, such as weather and economic data. Finally, EMD is able to find the residual, which reveals the trend of the data series [25–27].

##### 2.2. Wavelet Denoising Algorithm

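While traditional Fourier analysis can only remove noise of certain patterns over the entire time horizon, wavelet analysis can deal with multiple scales and more detailed data and is more suitable for financial time series. Wavelets are continuous functions $\psi(t)$ that satisfy the admissibility and unit energy conditions in

$$C_{\psi} = \int_{0}^{\infty} \frac{|\hat{\psi}(f)|^{2}}{f}\,df < \infty, \qquad \int_{-\infty}^{\infty} |\psi(t)|^{2}\,dt = 1,$$

where $\hat{\psi}(f)$ is the Fourier transform of the wavelet $\psi(t)$ at frequency $f$.

The continuous wavelet transform orthogonally transforms the original data $x(t)$ into subdata series in the wavelet domain. Consider

$$W(u, s) = \int_{-\infty}^{\infty} x(t)\,\psi_{u,s}(t)\,dt, \qquad \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t - u}{s}\right),$$

where $s$ is the dilation parameter and $u$ is the translation parameter.

The wavelet synthesis rebuilds the original data series, guaranteed by the properties of the orthogonal transformation in

$$x(t) = \frac{1}{C_{\psi}} \int_{0}^{\infty} \int_{-\infty}^{\infty} W(u, s)\,\psi_{u,s}(t)\,du\,\frac{ds}{s^{2}}.$$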

In wavelet analysis, the denoising technique separates the data from the noise in the original data set by selecting a threshold. The raw data series is first decomposed into a number of data subsets. Then, based on a chosen threshold selection strategy, the boundary between noise and data is set. Values whose magnitude falls below this boundary are eliminated, and the remaining values are adjusted according to the thresholding rule. Finally, the denoised series is rebuilt from the retained values [28].
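The decompose, threshold, and rebuild steps can be illustrated with a short sketch in plain Python. The text does not specify which wavelet or threshold rule is used, so the choices here are assumptions made purely for illustration: the orthonormal Haar wavelet, soft thresholding, and a universal threshold with a median-based noise estimate. Function names are hypothetical.

```python
import math

def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def soft_threshold(coeffs, thr):
    """Zero coefficients below thr in magnitude and shrink the rest by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def wavelet_denoise(x, levels=3, thr=None):
    """Decompose x, threshold the detail coefficients, and rebuild."""
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)
    if thr is None:
        # Assumed rule: universal threshold, noise scale estimated from the
        # median absolute value of the finest-level detail coefficients.
        finest = sorted(abs(c) for c in details[0])
        sigma = finest[len(finest) // 2] / 0.6745
        thr = sigma * math.sqrt(2.0 * math.log(n))
    details = [soft_threshold(d, thr) for d in details]
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

With a threshold of zero the transform reconstructs the input exactly, which reflects the orthogonality property mentioned in the text; a very large threshold leaves only the coarse approximation.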

##### 2.3. LSSVM in Function Estimation

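This section presents the basic theory of the least squares support vector machine. The support vector methodology has been used mainly in two areas, namely classification and function estimation. Consider regression in the set of functions $f(x) = w^{T}\varphi(x) + b$ with given training data inputs $x_i \in \mathbb{R}^{n}$ and outputs $y_i \in \mathbb{R}$, $i = 1, \ldots, N$, where $\varphi(\cdot)$ maps from $\mathbb{R}^{n}$ to a feature space $\mathbb{R}^{n_h}$. Note that $\varphi$ can be infinite dimensional and is defined only implicitly, so the vector $w$ can also be infinite dimensional. The optimization problem becomes

$$\min_{w, b, \xi, \xi^{*}} \; \frac{1}{2} w^{T} w + c \sum_{i=1}^{N} \left(\xi_i + \xi_i^{*}\right)$$

$$\text{s.t.} \quad y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0, \quad i = 1, \ldots, N.$$

The constant $c > 0$ determines how much deviation from the desired accuracy is tolerated; it sets the weight of the empirical risk relative to the regularization term. The larger $c$ is, the more weight the empirical risk receives compared with the regularization term. $\varepsilon$ is called the tube size and represents the accuracy required at the training data points.

By introducing Lagrange multipliers $\alpha_i, \alpha_i^{*}, \eta_i, \eta_i^{*} \ge 0$, we obtain the Lagrangian for this problem. Consider

$$L = \frac{1}{2} w^{T} w + c \sum_{i=1}^{N} \left(\xi_i + \xi_i^{*}\right) - \sum_{i=1}^{N} \alpha_i \left(\varepsilon + \xi_i - y_i + w^{T}\varphi(x_i) + b\right) - \sum_{i=1}^{N} \alpha_i^{*} \left(\varepsilon + \xi_i^{*} + y_i - w^{T}\varphi(x_i) - b\right) - \sum_{i=1}^{N} \left(\eta_i \xi_i + \eta_i^{*} \xi_i^{*}\right).$$

The second set of Lagrange multipliers $\alpha_i^{*}$ is needed because there is a second set of slack variables $\xi_i^{*}$. The solution is a saddle point of the Lagrangian; setting its partial derivatives with respect to the primal variables to zero, we obtain

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) \varphi(x_i), \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) = 0,$$

$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; c - \alpha_i - \eta_i = 0, \qquad \frac{\partial L}{\partial \xi_i^{*}} = 0 \;\Rightarrow\; c - \alpha_i^{*} - \eta_i^{*} = 0.$$

Then we obtain the following dual problem:

$$\max_{\alpha, \alpha^{*}} \; -\frac{1}{2} \sum_{i,j=1}^{N} \left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} \left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{N} y_i \left(\alpha_i - \alpha_i^{*}\right)$$

$$\text{s.t.} \quad \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) = 0, \quad \alpha_i, \alpha_i^{*} \in [0, c].$$

Here we use the kernel function $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$ for $i, j = 1, \ldots, N$. Then the function estimation becomes

$$f(x) = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) K(x, x_i) + b,$$

where $\alpha_i, \alpha_i^{*}$ are the solutions of the above quadratic programming problem and $b$ is obtained from the complementarity KKT conditions. The decision function is determined by the support vectors, that is, the training points whose coefficients $(\alpha_i - \alpha_i^{*})$ are not zero. In practice, a larger $\varepsilon$ results in a smaller number of support vectors and thus a sparser solution. However, the larger $\varepsilon$ is, the worse the accuracy at the training points will be. Hence, $\varepsilon$ can be used to control the balance between closeness to the training data and sparseness of the solution.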

A kernel function can be obtained by seeking a function that satisfies Mercer’s condition. Some popular kernel functions are [14, 29, 30]: linear: $K(x, x_i) = x_i^{T} x$; polynomial: $K(x, x_i) = (x_i^{T} x + 1)^{d}$, where $d$ is the degree of the polynomial kernel; RBF kernel: $K(x, x_i) = \exp\left(-\|x - x_i\|^{2} / \sigma^{2}\right)$, where $\sigma$ is the bandwidth of the Gaussian kernel.

Parameters of the kernel function define the structure of the high dimensional feature space and also control the accuracy of the final solution. Thus, they should be selected carefully.
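To make the role of the kernel and its parameters concrete, the following plain-Python sketch implements the three kernels and a small least squares SVM regressor. It rests on several assumptions that go beyond this section: the kernel forms are the common textbook ones (polynomial offset 1, RBF exponent scaled by $\sigma^2$), and the fit uses the standard least squares formulation, in which equality constraints and a squared error term replace the $\varepsilon$-insensitive loss so that training reduces to solving one linear system. The names `gamma` (regularization constant) and `sigma` (RBF bandwidth), and all function names, are hypothetical.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def poly_kernel(x, z, d=2):
    return (linear_kernel(x, z) + 1.0) ** d

def rbf_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sigma ** 2)

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, coefficients alpha

def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X_train))

# Toy usage: fit four points of a parabola with weak regularization.
X_demo = [[0.0], [1.0], [2.0], [3.0]]
y_demo = [0.0, 1.0, 4.0, 9.0]
b_demo, alpha_demo = lssvm_fit(X_demo, y_demo, gamma=1e6, sigma=1.0)
```

In this formulation `gamma` and `sigma` are the two quantities a parameter search would tune: a large `gamma` fits the training points closely, while `sigma` controls how quickly the influence of each training point decays.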

#### 3. Empirical Study

##### 3.1. Data Description

The CSI 300 is chosen for empirical analysis and to examine the performance of the proposed model. This index comprises 179 stocks from Shanghai stock exchange and 121 stocks from Shenzhen stock exchange and is managed by the China Securities Index Company Ltd.

In the past, most researchers have chosen international indices, including the S&P 500, NIKKEI 225, NASDAQ, and DAX, together with the gold price, as input variables. They have also examined the cross relationship between the stock market index and macroeconomic variables. The potential input variables for a forecasting model mainly consist of the gross domestic product (GDP), gross national product (GNP), short-term interest rate (ST), long-term interest rate (LT), and term structure of interest rates (TS) [1, 31, 32].

Although China has overtaken Japan to become the world's second largest economy and the Chinese stock market has developed into one of the most important markets in the global economy, Chinese consumption capacity in the domestic market is limited. The movement of the stock market is closely related to the money available to investors, which is determined by the money supply and the interest rate. Considering that the Chinese stock market is affected by the global economic situation as well as by domestic economic development, we choose the US Dollar Index (USDX), Shanghai Interbank Offered Rate (SHIBOR), P/E ratio (PE), money supply (M2), repurchase agreement (REPO), China CNY Monthly New Loan, market capitalization of the 300 publicly traded companies (mkt cap), People’s Bank 5-year CDS, and short-mid note as input variables.

The lag of the input variables is 3 days. We use daily data to predict the CSI 300 index by nonlinear SVM regression. Since M2, short-mid note, and New Loan are published once a month, we transform these variables into daily variables by dividing them by a daily variable. We divide the data into two sections. The first section is the training part, used to find the optimal parameters for the LSSVM and to avoid overfitting by training and validating the model; the other section is used for testing. As shown in Table 1, we choose nine variables as inputs and one variable as the output. The training part comprises 643 daily observations from May 1, 2009, to August 23, 2011. Once the parameters are obtained, we use the same input and output variables from August 24, 2011, to January 20, 2012 (100 daily observations), to examine the performance of the different models in the testing part.
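The data preparation described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that a 3-day lag means the inputs observed on day $t-3$ are paired with the index value on day $t$, and that the split is chronological with the last 100 observations held out. The function names and the synthetic placeholder data are hypothetical.

```python
def make_lagged(inputs, target, lag=3):
    """Pair the feature row from day t - lag with the target on day t.

    inputs: list of feature rows, one per trading day
    target: list of index values, one per trading day
    """
    if len(inputs) != len(target):
        raise ValueError("inputs and target must cover the same days")
    return inputs[:-lag], target[lag:]

def chronological_split(X, y, n_test=100):
    """Keep time order: earliest samples for training, last n_test for testing."""
    return (X[:-n_test], y[:-n_test]), (X[-n_test:], y[-n_test:])

# Placeholder data: 746 days of nine synthetic features and a synthetic index,
# chosen only so the sample counts match the 643 / 100 split in the text.
n_days = 746
features = [[float(day + k) for k in range(9)] for day in range(n_days)]
index_values = [float(day) for day in range(n_days)]

X_all, y_all = make_lagged(features, index_values, lag=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X_all, y_all, n_test=100)
```

A chronological split is used rather than a random one so that no information from the testing period leaks into parameter selection.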