Research Article  Open Access
$\ell_p$-Norm Multikernel Learning Approach for Stock Market Price Forecasting
Abstract
Linear multiple kernel learning models have been used for predicting financial time series. However, $\ell_1$-norm multiple kernel support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt $\ell_p$-norm multiple kernel support vector regression ($p \ge 1$) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and an interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily closing prices of the Shanghai Stock Index in China. Experimental results show that the proposed model performs better than the $\ell_1$-norm multiple kernel support vector regression model.
1. Introduction
Forecasting the future values of financial time series is an appealing yet difficult activity in the modern business world. As explained by Deboeck and Yaser [1, 2], financial time series are inherently noisy, nonstationary, and deterministically chaotic. In the past, many methods were proposed for tackling this kind of problem. For instance, the linear models for forecasting the future values of stock prices include the autoregressive (AR) model [3], the autoregressive moving average (ARMA) model [4], and the autoregressive integrated moving average (ARIMA) model [4]. Over the last decade, nonlinear approaches have received increasing attention in financial time series prediction and have been proposed as a more satisfactory answer to the problem. For example, Yao and Tan [5] used time series data and technical indicators as the input of neural networks to increase the forecast accuracy of exchange rates; Cao and Tay [6, 7] applied the support vector machine (SVM) to financial forecasting and compared it with the multilayer backpropagation (BP) neural network and the regularized radial basis function (RBF) neural network; Qi and Wu [8] proposed a multilayer feedforward network to forecast exchange rates; Pai and Lin [9] investigated a hybrid ARIMA and support vector machines model for stock price forecasting; Pai et al. [10] presented a hybrid SVM model to exploit the unique strengths of the linear and nonlinear SVM models in forecasting exchange rates; Kwon and Moon [11] proposed a hybrid neurogenetic system for stock trading; Hung and Hong [12] presented an improved ant colony optimization algorithm in a support vector regression (SVR) model, called SVRCACO, for selecting suitable parameters in exchange rate forecasting; Jiang and He [13] introduced local grey SVR (LGSVR), which integrates the grey relational grade with local SVR, for financial time series forecasting; and so on.
In comparison with the previous models, SVR with a single kernel function can exhibit better prediction accuracy because it embodies the structural risk minimization principle, which considers both the training error and the capacity of the regression model [14, 15]. However, researchers have to determine in advance the type of kernel function and the associated kernel hyperparameters for SVR. Unsuitably chosen kernel functions or hyperparameter settings may lead to significantly poorer performance [16, 17].
In recent years there has been a lot of interest in designing principled regression algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance and a lower generalization error. When the right choice of features is unknown, learning linear combinations of multiple kernels is an appealing strategy. This approach, in which the combination is found by an optimization process, is called multiple kernel learning (MKL). A first step towards a more realistic model of MKL was achieved by Lanckriet et al. [18], who showed that, given a candidate set of kernels, it is computationally feasible to learn a support vector machine and a linear kernel combination simultaneously. In MKL we need to solve a joint optimization problem while also learning the optimal weights for combining the kernels. Several practitioners have adopted linear multiple kernels to deal with practical problems. For example, Rakotomamonjy et al. [19] addressed the MKL problem through a weighted 2-norm regularization formulation and proposed an algorithm, named SimpleMKL, for solving this MKL problem. Bach [20] studied the asymptotic model consistency of the group Lasso. Zhang and Shen [21] presented a multimodal multitask learning algorithm for joint prediction of multiple regression and classification variables in Alzheimer’s disease. In particular, Chi-Yuan Yeh and his coworkers [22] developed a two-stage MKL algorithm by incorporating sequential minimal optimization and the gradient projection method. The new method [22] performed better than previous ones for forecasting financial time series. Previous approaches to MKL have promoted sparse kernel combinations to support interpretability and scalability. Unfortunately, sparsity at the kernel level may harm the generalization performance of the learner, and therefore $\ell_1$-norm MKL is rarely observed to outperform trivial baselines in practical applications [23].
To allow for robust kernel mixtures that generalize well, researchers have extended $\ell_1$-norm MKL to arbitrary norms, that is, $\ell_p$-norm MKL ($p \ge 1$). For example, Marius Kloft et al. developed two efficient interleaved strategies for $\ell_p$-norm MKL and showed that it can achieve better accuracy than $\ell_1$-norm MKL for real-world problems [23]; Francesco Orabona et al. presented an MKL optimization algorithm based on stochastic gradient descent for $\ell_p$-norm MKL, which possesses a faster convergence rate as the number of kernels grows [24].
In this paper, a multiple kernel learning framework is established for learning and predicting stock prices. We present a regression model for the future values of stock prices, namely, $\ell_p$-norm multiple kernel support vector regression ($\ell_p$-norm MKSVR), where $p \ge 1$. We decompose the optimization problem into smaller subproblems and adopt the interleaved optimization strategy to solve the regression model. Our experimental results show that $\ell_p$-norm MKSVR achieves better forecasting performance than the models it is compared with.
The rest of this paper is arranged as follows. Section 2 details the construction of the $\ell_p$-norm MKSVR model and describes the algorithm for our regression model. Experimental results are presented in Section 3. Section 4 concludes the paper and provides some future research directions.
2. Forecasting Methodology
2.1. $\ell_p$-Norm Multiple Kernel Support Vector Regression
In this section, the idea of $\ell_p$-norm multiple kernel support vector regression ($\ell_p$-norm MKSVR) is introduced formally.
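Let $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, be the training set. Each $y_i$ is the desired output value for the input vector $x_i$. Consider a function $\phi$ that maps the samples into a high, possibly infinite, dimensional space. A regression model is learned from the training set and used to predict the target values of unseen input vectors. SVR is a nonlinear kernel-based regression method which tries to locate a regression hyperplane with small risk in a high-dimensional feature space [14]. Considering the soft margin formulation, the following objective function and constraints for SVR should be solved:
$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i, \quad \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \ldots, n. \tag{1}$$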
The SVR model usually uses a single mapping function and hence a single kernel function . Although the SVR model has good function approximation and generalization capabilities, it is not well suited to a dataset which has a locally varying distribution. To resolve this problem, we can construct an MKSVR model. By combining multiple kernels instead of using a single one, the $\ell_p$-norm MKSVR model can capture the varying distribution well. We therefore use the composite feature map which has a block structure: to map the input space to the feature space, where are weights of component functions. Given a set of base kernels which correspond to the previous feature maps , linear MKSVR aims to learn a linear combination of the base kernels as . In learning with MKSVR we aim at minimizing the loss on the training data with respect to the optimal kernel mixture in addition to regularizing to avoid overfitting. The primal can therefore be formulated as Previous research on MKSVR employs the regularizer of the form , which promotes sparse kernel mixtures. However, sparsity is not always desirable, since the information carried in the zero-weighted kernels is lost. Therefore we propose to use nonsparse and thus more robust kernel mixtures by employing an $\ell_p$-norm constraint with , that is, , and , . In (3), let , , , and the first equation be divided with , then the following $\ell_p$-norm MKSVR is obtained:
An alternative to the previous formulation has been considered by other researchers. For example, Zien and Ong [25] upper-bound the value of the regularizer and incorporate the regularizer as an additional constraint into the optimization problem. Following this idea, the $\ell_p$-norm MKSVR model (4) can be transformed into the following form:
It can be shown (see the Appendix for details) that the dual of (5) is where , , , , and is the dual norm of . Suppose the optimal , and are found by solving (6); then the regression hyperplane for the $\ell_p$-norm MKSVR model is given by where is obtained from any and , with . In the following section, an efficient algorithm is proposed for solving the optimization problem (6).
2.2. An Optimization Algorithm
The $\ell_p$-norm MKSVR model (6) can be trained with several algorithms, for example, the sequential minimal optimization algorithm [26] and multikernel learning with online-batch optimization [24]. In this paper, interleaved optimization is used as the optimization scheme, following the idea of [23]. We can exploit the structure of the $\ell_p$-norm MKSVR cost function by alternating between optimizing the linear combination of the base kernels and the remaining variables as and . We do so by setting up a two-stage optimization algorithm. The basic idea of the algorithm is to divide the optimization variables of the $\ell_p$-norm MKSVR problem (6) into two groups, on one hand and on the other. Our procedure alternates between those two stages via a block coordinate descent algorithm. The optimization of the kernel weights is carried out analytically, and the remaining variables are computed in the dual. The two stages are iteratively performed until the specified stopping criterion is met, as shown in Figure 1.
In the first stage, the variables are kept fixed, that is, the are known. Then the optimal kernel weights in the $\ell_p$-norm MKSVR model (6) can be calculated analytically by the following process.
Set the ’s first partial derivatives with respect to , and let it be : In the optimal point holds, so the previous equation yields where , and .
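As a hedged illustration of an analytical kernel-weight step, the sketch below implements the closed-form update given by Kloft et al. [23] for $\ell_p$-norm MKL, $\theta_m = \|w_m\|^{2/(p+1)} / \big(\sum_{m'} \|w_{m'}\|^{2p/(p+1)}\big)^{1/p}$. It is assumed here, not confirmed by the text above, that the first-stage update has this form. The function name `update_kernel_weights` and the argument `w_norms` (the per-kernel block norms $\|w_m\|$) are hypothetical names chosen for illustration.

```python
def update_kernel_weights(w_norms, p):
    """Closed-form l_p-norm kernel-weight update in the style of Kloft et al. [23].

    w_norms: list of non-negative block norms ||w_m||, one per base kernel
             (assumed to be supplied by the SVR step).
    p:       norm parameter, p >= 1.
    Returns weights theta with sum(theta_m ** p) == 1.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if not w_norms or any(wn < 0 for wn in w_norms):
        raise ValueError("w_norms must be a non-empty list of non-negative values")

    m = len(w_norms)
    total = sum(wn ** (2.0 * p / (p + 1.0)) for wn in w_norms)
    if total == 0.0:
        # Degenerate case (assumption): fall back to uniform weights on the l_p sphere.
        return [(1.0 / m) ** (1.0 / p)] * m

    denom = total ** (1.0 / p)
    return [wn ** (2.0 / (p + 1.0)) / denom for wn in w_norms]
```

By construction the returned weights satisfy $\|\theta\|_p = 1$, and a kernel whose block norm is zero receives zero weight. For example, `update_kernel_weights([1.0, 2.0, 3.0], 2.0)` gives three increasing weights whose squares sum to one.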
In the second stage, the following algorithm is used. We give a chunking-based training algorithm (Algorithm 1) via analytical update for $\ell_p$-norm MKSVR. Kernel weighting and are optimized in an interleaved way. The basic idea of this algorithm is to divide the optimization problem into an inner subproblem and an outer subproblem. The algorithm alternates between solving the two subproblems until convergence.

In every iteration, the inner subproblem ( and step) identifies the constraint that maximises (6) with the kernel weighting fixed. The outer subproblem ( step) is also called the restricted master problem. is computed with (10), .
The interleaved optimization algorithm is depicted in Algorithm 1, and the details of it are as follows.
2.2.1. Initialization
Assume the original values of and are 0, for all , and the initial value of is , for all , where is a constant.
2.2.2. Chunking and Carrying out with SVR
In the iteration process, the procedure is standard in chunking-based SVR solvers and is carried out by , where is chosen as described in [28]. We implement the greedy second-order working set selection strategy of [28]. Rather than computing the gradient repeatedly, we speed up variable selection by caching, separately for each kernel. The cache needs to be updated every time we change and in the reduced variable optimisation. In Algorithm 1, (4) and (5) compute the objective values of SVR. Finally, the analytical value of is carried out in (10).
2.2.3. Stopping Criterion
When the duality gap falls below a prespecified threshold, that is, , we terminate the algorithm and output , , .
3. Experimental Results
In this section, two experiments on a real financial time series are carried out to assess the performance of $\ell_p$-norm MKSVR. The motivation behind the two experiments is to compare the performance of our proposed method with that of other methods, that is, single kernel support vector regression (SKSVR) [29] and $\ell_1$-norm MKSVR [22]. All calculations are performed with programs developed in MATLAB R2010a.
3.1. Experiment I
Firstly, we compare the performance of $\ell_p$-norm MKSVR with that of SKSVR. In this experiment, the daily closing prices of the Shanghai Stock Index in China for the period of January 2003 to December 2007 are used, and the training/validating/testing data sets are generated by a one-season moving-window testing approach. Following [29], three data sets, data1 to data3, are formed. For instance, in data1 the daily closing prices from January 2003 to December 2006 are selected as the training data set, those from January 2007 to March 2007 as the validating data set, and those from April 2007 to June 2007 as the testing data set. The corresponding time periods for data1 to data3 are listed in Table 1.

According to [29], we can derive training patterns based on the original daily stock closing prices for SKSVR and $\ell_p$-norm MKSVR. Let × be the day exponential moving average of the th day, where is the th day daily stock closing prices and , then the output variable can be defined as Let be the input vector and let be the lagged relative difference in percentage of price (RDP). Moreover, we can obtain a transformed closing price by subtracting a day EMA from the closing price, that is,
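Based on in the previously mentioned, the input variables can be defined as , , , , and . We adopt the root mean squared error (RMSE) for performance comparison, that is,
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \tag{13}$$
where $y_i$ and $\hat{y}_i$ are the desired output and the predicted output, respectively, and $n$ is the number of evaluated samples.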
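The quantities used in this subsection can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the smoothing factor $2/(n+1)$ and the seeding of the EMA with the first price are common conventions rather than details taken from this paper, the window lengths are left as parameters, and the function names `ema`, `rdp`, and `rmse` are hypothetical.

```python
import math


def ema(prices, n):
    """n-day exponential moving average (assumed smoothing factor 2 / (n + 1))."""
    alpha = 2.0 / (n + 1.0)
    out = [prices[0]]  # assumption: seed the average with the first price
    for price in prices[1:]:
        out.append(alpha * price + (1.0 - alpha) * out[-1])
    return out


def rdp(prices, k):
    """Relative difference in percentage of price over a lag of k days."""
    return [100.0 * (prices[i] - prices[i - k]) / prices[i - k]
            for i in range(k, len(prices))]


def rmse(actual, predicted):
    """Root mean squared error between desired and predicted outputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

A transformed closing price in the sense described above would then be `prices[i] - ema(prices, n)[i]` for a chosen window `n`.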
There are three parameters that should be determined in advance for SKSVR, that is, , , and for using the RBF kernel. The forecasting performance of SKSVR is examined with and . Because the forecasting performance obtained by SKSVR is affected by the parameter , we try different settings of it from 0.01 to 3 with a step of 0.05. Figure 2 shows the RMSE on the three data sets obtained by SKSVR. The figure shows that SKSVR requires different settings for different data sets to obtain the best performance. For example, the best performance for data1 occurs when . The best RMSE values obtained by SKSVR are listed in Table 2.

For the $\ell_p$-norm MKSVR training model, we adopt the RBF kernel . A kernel combining 60 different RBF kernels is considered, that is, with step . Hence, the kernel matrix is a weighted sum of 60 kernel matrices, that is, where denotes the kernel weight for the first kernel matrix with and denotes the kernel weight for the second kernel matrix with , and so on. For the three data sets, the RMSE values obtained by $\ell_p$-norm MKSVR are also listed in Table 2. When , , and , the $\ell_p$-norm MKSVR model performs better than SKSVR for the data1, data2, and data3 data sets, respectively.
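The weighted sum of RBF kernel matrices described above can be sketched as follows. The parametrization $k(x, x') = \exp(-\|x - x'\|^2 / (2\sigma^2))$ is an assumption, since the kernel formula did not survive in the text, and the names `rbf_kernel_matrix` and `combine_kernels` as well as the widths in the usage note are hypothetical.

```python
import math


def rbf_kernel_matrix(xs, sigma):
    """Gram matrix of an RBF kernel, assuming k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    n = len(xs)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            value = math.exp(-sq_dist / (2.0 * sigma ** 2))
            gram[i][j] = value
            gram[j][i] = value  # the Gram matrix is symmetric
    return gram


def combine_kernels(kernels, theta):
    """Weighted sum K = sum_m theta_m * K_m of equally sized kernel matrices."""
    if len(kernels) != len(theta):
        raise ValueError("need exactly one weight per kernel matrix")
    n = len(kernels[0])
    return [[sum(t * k[i][j] for t, k in zip(theta, kernels)) for j in range(n)]
            for i in range(n)]
```

For instance, `combine_kernels([rbf_kernel_matrix(xs, s) for s in (0.5, 1.0, 2.0)], [0.2, 0.3, 0.5])` mixes three widths; with non-negative weights the result remains a valid kernel matrix, since a non-negative combination of positive semidefinite matrices is positive semidefinite.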
3.2. Experiment II
Secondly, we compare the performance of $\ell_p$-norm MKSVR with that of $\ell_1$-norm MKSVR. In this experiment, the daily closing prices of the Shanghai Stock Index in China for the period of January 2008 to December 2011 are used, and the training/validating/testing data sets are generated by a one-season moving-window testing approach. Following Tay and Cao [29], three data sets, DI to DIII, are formed. The corresponding time periods for DI to DIII are listed in Table 3.

We also adopt RMSE (13) for performance comparison. For the $\ell_p$-norm MKSVR and $\ell_1$-norm MKSVR training models, a kernel combining 40 different RBF kernels is considered, that is, , . Hence, the kernel matrix is a weighted sum of 40 kernel matrices, that is, where denotes the kernel weight for the first kernel matrix with and denotes the kernel weight for the second kernel matrix with , and so on. For the three data sets, the RMSE values obtained by $\ell_p$-norm MKSVR and $\ell_1$-norm MKSVR are listed in Table 4. When , , and , the $\ell_p$-norm MKSVR model performs better than the $\ell_1$-norm MKSVR model for the DI, DII, and DIII data sets, respectively. Figure 3 shows the forecasting results for DI and DII by the two regression models.

Figure 3: Forecasting results for DI and DII by the two regression models, panels (a) to (d).
Furthermore, we can use the statistical test proposed by Diebold and Mariano [30] to assess the statistical significance of the forecasts by the $\ell_p$-norm MKSVR model. The loss-differential series of $\ell_1$-norm MKSVR and $\ell_p$-norm MKSVR are shown in Figures 4 and 5. According to [30], we adopt the asymptotic test as the test statistic, where is the loss-differential series of the $\ell_1$-norm MKSVR and $\ell_p$-norm MKSVR models, and denote the forecasting errors; is the weighted sum of the available sample autocovariances: , where is the sample size, , and is the lag window, defined as where ; reports the number of forecasting steps ahead.
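For readers who want to reproduce a test of this kind, the sketch below computes the asymptotic Diebold and Mariano statistic in its standard textbook form. It assumes a squared-error loss and a rectangular lag window that keeps autocovariances up to lag $h - 1$ for $h$-step-ahead forecasts; neither choice is confirmed by the text above, and the function name `diebold_mariano` is hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b, h=1):
    """Asymptotic Diebold-Mariano statistic for two forecast-error series.

    Assumptions: squared-error loss, rectangular lag window of width h - 1.
    A large positive value indicates that model B forecasts more accurately.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length >= 2")

    # Loss differential d_t = e_a(t)^2 - e_b(t)^2 and its sample mean
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    size = len(d)
    d_bar = sum(d) / size

    def autocov(lag):
        # Sample autocovariance of the loss differential at the given lag
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar)
                   for t in range(lag, size)) / size

    long_run_var = autocov(0) + 2.0 * sum(autocov(lag) for lag in range(1, h))
    if long_run_var <= 0.0:
        raise ValueError("non-positive variance estimate; statistic undefined")
    return d_bar / math.sqrt(long_run_var / size)
```

Under the null hypothesis of equal forecasting accuracy the statistic is asymptotically standard normal, so its absolute value can be compared with the usual normal critical values at the chosen significance level.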
We denote as the forecasting accuracy of $\ell_1$-norm MKSVR and as the forecasting accuracy of $\ell_p$-norm MKSVR. Under the null hypothesis: , the test was performed at the and significance levels [12]. The test results are shown in Table 5. For the three data sets, all asymptotic tests reject . The test results show that the $\ell_p$-norm MKSVR model indeed improves the forecasting accuracy in comparison with the $\ell_1$-norm MKSVR model.

We briefly note that the superior performance of the $\ell_p$-norm MKSVR model ($p > 1$) is not surprising. When we use the sparsity-inducing norm ($p = 1$), some of the kernel weights are forced to become zero, and the corresponding kernels are eliminated, leading to some information loss. The daily stock closing prices do not carry large parts of overlapping information, and the information is discriminative. A nonsparse kernel mixture can therefore access more information and perform more robustly.
4. Summary and Prospect
In this paper, an $\ell_p$-norm MKSVR model for stock market price forecasting is proposed. The model is trained with an efficient interleaved optimization scheme. In an empirical evaluation, we show that $\ell_p$-norm MKSVR can improve predictive accuracy on relevant real-world data sets. Although we focus on stock market price forecasting in this paper, our $\ell_p$-norm MKSVR model could be applied to more general financial forecasting problems. In the future we will therefore apply our $\ell_p$-norm MKSVR model to other financial markets, such as exchange markets.
Appendix
$\ell_p$-Norm MKSVR Dual Formulation
In this appendix, we detail the dual formulation of $\ell_p$-norm MKSVR. We again consider $\ell_p$-norm MKSVR with a general convex loss,
In the following, we build the Lagrangian of (A.1). By introducing Lagrangian multipliers , , , and , the Lagrangian saddle point problem is given by Set the Lagrangian’s first partial derivatives with respect to , , , and , and let them be to reveal the optimality conditions
Resubstituting the previous equations to the Lagrangian yields the following which can also be written as
For standard support vector regression formulations, the hinge loss function can be defined as . This loss is also convex with a subgradient bounded by . The Fenchel-Legendre conjugate of a function is defined as , and the dual form is denoted by (the norm defined via the identity ). According to (A.3), (A.5), and the Fenchel-Legendre conjugate of the hinge loss function, we can obtain the following dual: where , , , , and .
In the following, we find at optimality. Let us solve for the unbounded ; then we can obtain the optimal as Obviously, , so we can ignore the corresponding constraint from the optimization problem and plug (A.7) into (A.6). Then the following dual optimization problem for $\ell_p$-norm MKSVR is written as
For the choice of $\ell_p$-norm, holds in the optimal point so that the term can be discarded [23]. Therefore the previous equations reduce to an optimization problem that depends on and as
Now, the $\ell_p$-norm MKSVR model has been constructed.
Acknowledgments
The authors would like to thank the handling editor and the anonymous reviewers for their constructive comments, which led to significant improvement of the paper. This work was partially supported by the National Natural Science Foundation of China under Grant no. 51174236.
References
[1] J. W. Hall, “Adaptive selection of U.S. stocks with neural nets,” in Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets, G. J. Deboeck, Ed., John Wiley & Sons, New York, NY, USA, 1994.
[2] Y. S. Abu-Mostafa and A. F. Atiya, “Introduction to financial forecasting,” Applied Intelligence, vol. 6, no. 3, pp. 205–213, 1996.
[3] D. G. Champernowne, “Sampling theory applied to autoregressive schemes,” Journal of the Royal Statistical Society B, vol. 10, pp. 204–231, 1948.
[4] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Prentice Hall, Englewood Cliffs, NJ, USA, 3rd edition, 1994.
[5] J. Yao and C. L. Tan, “A case study on using neural networks to perform technical forecasting of forex,” Neurocomputing, vol. 34, pp. 79–98, 2000.
[6] L. Cao and F. E. H. Tay, “Financial forecasting using support vector machines,” Neural Computing and Applications, vol. 10, no. 2, pp. 184–192, 2001.
[7] L. J. Cao and F. E. H. Tay, “Support vector machine with adaptive parameters in financial time series forecasting,” IEEE Transactions on Neural Networks, vol. 14, no. 6, pp. 1506–1518, 2003.
[8] M. Qi and Y. Wu, “Nonlinear prediction of exchange rates with monetary fundamentals,” Journal of Empirical Finance, vol. 10, no. 5, pp. 623–640, 2003.
[9] P. F. Pai and C. S. Lin, “A hybrid ARIMA and support vector machines model in stock price forecasting,” Omega, vol. 33, no. 6, pp. 497–505, 2005.
[10] P. F. Pai, W. C. Hong, C. S. Lin, and C. T. Chen, “A hybrid support vector machine regression for exchange rate prediction,” International Journal of Information and Management Sciences, vol. 17, no. 2, pp. 19–32, 2006.
[11] Y. K. Kwon and B. R. Moon, “A hybrid neurogenetic approach for stock forecasting,” IEEE Transactions on Neural Networks, vol. 18, no. 3, pp. 851–864, 2007.
[12] W. M. Hung and W. C. Hong, “Application of SVR with improved ant colony optimization algorithms in exchange rate forecasting,” Control and Cybernetics, vol. 38, no. 3, pp. 863–891, 2009.
[13] H. Jiang and W. He, “Grey relational grade in local support vector regression for financial time series prediction,” Expert Systems with Applications, vol. 39, no. 3, pp. 2256–2262, 2012.
[14] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
[15] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[16] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing multiple parameters for support vector machines,” Machine Learning, vol. 46, no. 1–3, pp. 131–159, 2002.
[17] K. Duan, S. S. Keerthi, and A. N. Poo, “Evaluation of simple performance measures for tuning SVM hyperparameters,” Neurocomputing, vol. 51, pp. 41–59, 2003.
[18] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan, “Learning the kernel matrix with semidefinite programming,” Journal of Machine Learning Research, vol. 5, pp. 27–72, 2004.
[19] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, “SimpleMKL,” Journal of Machine Learning Research, vol. 9, pp. 2491–2521, 2008.
[20] F. R. Bach, “Consistency of the group lasso and multiple kernel learning,” Journal of Machine Learning Research, vol. 9, pp. 1179–1225, 2008.
[21] D. Zhang, D. Shen, and The Alzheimer’s Disease Neuroimaging Initiative, “Multimodal multitask learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease,” NeuroImage, vol. 59, pp. 895–907, 2012.
[22] C. Y. Yeh, C. W. Huang, and S. J. Lee, “A multiple-kernel support vector regression approach for stock market price forecasting,” Expert Systems with Applications, vol. 38, no. 3, pp. 2177–2186, 2011.
[23] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, “$\ell_p$-norm multiple kernel learning,” Journal of Machine Learning Research, vol. 12, pp. 953–997, 2011.
[24] F. Orabona, L. Jie, and B. Caputo, “Multi kernel learning with online-batch optimization,” Journal of Machine Learning Research, vol. 13, pp. 165–191, 2012.
[25] A. Zien and C. S. Ong, “Multiclass multiple kernel learning,” in Proceedings of the 24th International Conference on Machine Learning (ICML'07), pp. 1191–1198, June 2007.
[26] S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt, and M. Varma, “Multiple kernel learning and the SMO algorithm,” in Advances in Neural Information Processing Systems, 2010.
[27] C. C. Chang and C. J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011.
[28] R. E. Fan, P. H. Chen, and C. J. Lin, “Working set selection using second order information for training support vector machines,” Journal of Machine Learning Research, vol. 6, pp. 1889–1918, 2005.
[29] F. E. H. Tay and L. Cao, “Application of support vector machines in financial time series forecasting,” Omega, vol. 29, no. 4, pp. 309–317, 2001.
[30] F. X. Diebold and R. S. Mariano, “Comparing predictive accuracy,” Journal of Business and Economic Statistics, vol. 20, no. 1, pp. 134–144, 2002.
Copyright
Copyright © 2012 Xigao Shao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.