Multivariate Local Polynomial Regression with Application to Shenzhen Component Index

Su, Liyun

doi:https://doi.org/10.1155/2011/930958

Discrete Dynamics in Nature and Society

Research Article | Open Access

Volume 2011 | Article ID 930958 | https://doi.org/10.1155/2011/930958

Academic Editor: Carlo Piccardi

Received 24 Nov 2010

Revised 28 Jan 2011

Accepted 09 Mar 2011

Published 05 May 2011

Abstract

This study characterizes and predicts the stock index series of the Shenzhen stock market using multivariate local polynomial regression. Because the stock index time series is nonlinear and chaotic, we consider multivariate and univariate local polynomial prediction methods, all of which rely on phase space reconstruction according to Takens' theorem. To fit the stock index series, the single series is extended to a bivariate series. To evaluate the results, the multivariate predictor for the bivariate time series, based on the multivariate local polynomial model, is compared with the univariate predictor on the same Shenzhen stock index data. The numerical results for the Shenzhen component index show that the prediction mean squared error of the multivariate predictor is much smaller than that of the univariate one, and that the multivariate predictor also performs much better than three existing methods. Even when only the last half of the training data is used in the multivariate predictor, its prediction mean squared error is smaller than that of the univariate predictor. The multivariate local polynomial prediction model for multivariate time series is therefore a useful tool for stock market price prediction.

1. Introduction

Complex nonlinear systems arise in many theoretical and practical problems. Because the stock market can be regarded as a nonlinear dynamical system [1–3], nonlinear methods are suitable for short-term stock price prediction. Researchers in economics and finance have been interested in predicting stock price behavior for many years, and a variety of forecasting methods have been proposed and implemented. Among them, nonlinear prediction is a comparatively new approach developed in recent decades [4, 5]. Nonlinear system prediction poses a significant challenge for the complex system analyst, since the nonlinear structure tends to be very intricate and nonuniform.

Although frequently referred to as unpredictable deterministic behavior, complex nonlinear systems can in fact be forecast over limited time scales. In many situations, it is hard to build an exact analytic model for complex systems (such as the stock market and the electric load) because their structure is very intricate and the information available is incomplete and inaccurate. Complex systems are therefore usually analyzed through time series observed or measured from the systems. Most prediction methods can be grouped into global and local methods. The class of local nonlinear prediction methods is based on nearest-neighbor searches and was introduced by Lorenz [6]. Many introductions to nearest-neighbor techniques have been published. A very effective local prediction method has been proposed for the multivariate time series case and has achieved satisfactory results [7]. In this paper, the multivariate local polynomial prediction method for multivariate time series [8–10] is applied to the Shenzhen stock market in China. Kernel-smoothing-based local prediction methods (local mean, local linear, and local polynomial) and backpropagation (BP) neural networks are analyzed and applied to predict stock price behavior, and the accuracy of these prediction methods is compared. The proposed model combines the advantages of traditional local, weighted, and multivariate prediction methods. The results show that the multivariate local polynomial prediction method has a lower mean squared error than the univariate predictor based on univariate local polynomial regression (U-LPR) and than the traditional methods considered (local mean prediction, local linear prediction, and BP neural network prediction).

The remainder of this paper is organized as follows. Section 2 describes the data examined in this study. In Section 3, the multivariate local polynomial estimator is used to obtain a multivariate nonlinear predictor, and the selection of the time delay, the embedding dimension, the order of the multivariate local polynomial function, the kernel function, and the bandwidth is described. The application to the Shenzhen stock index time series and a discussion of the results are given in Section 4. Conclusions are drawn in Section 5.

2. Data

As is well known, stock market data such as stock prices often show very complicated behavior, so it is difficult to predict their movement accurately. To build a good prediction model for such financial indices, it is important to find a suitable variable that affects the price index. Daily Shenzhen component index data are sourced from the Shenzhen stock market. The nonlinearity of stock market index time series has been tested in many studies [1–3], so we can apply the prediction method given in this paper to fit and predict the index. To suit the multivariate setting, the change index is considered in addition to the component index, since it is one of the key factors influencing traders' decisions. We selected 1091 data points each from the Shenzhen component index and the change index, from January 4, 2006 to June 30, 2010, and used the multivariate local polynomial prediction method to predict the Shenzhen component index. We denote the Shenzhen component index (ShCpIn) by $x_{1,n}$ and the change index by $x_{2,n}$. This gives the bivariate time series $\{x_{1,n}\}$ and $\{x_{2,n}\}$ (where $n = 1, 2, \ldots, 1091$). The original data points are shown in Figure 1.

[Figure 1: The original data points, in panels (a) and (b).]
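As a minimal sketch of how such a bivariate series could be assembled, the snippet below pairs each index value with its consecutive difference, following the description of the change index in Section 4. The function name `bivariate_series` is hypothetical, and dropping the first observation (which has no predecessor) is an assumption of this sketch; the paper does not state how the first point is handled.

```python
import numpy as np

def bivariate_series(index_values):
    """Pair each index value with its change from the previous value.

    Returns an array of shape (N - 1, 2): column 0 is the index itself,
    column 1 is the consecutive difference. The first observation is
    dropped because it has no predecessor (an assumption of this sketch).
    """
    x = np.asarray(index_values, dtype=float)
    change = np.diff(x)                 # change[n - 1] = x[n] - x[n - 1]
    return np.column_stack([x[1:], change])
```

For example, `bivariate_series([10, 12, 11])` yields the rows `(12, 2)` and `(11, -1)`.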

3. Multivariate Nonlinear Time Series Predictor with Local Polynomial Regression

3.1. Phase Space Reconstruction Model

Takens' theorem is the delay embedding theorem published by Takens in 1981 [11]. A delay embedding theorem gives the conditions under which a chaotic dynamical system can be reconstructed from a sequence of observations of its state. The reconstruction preserves the properties of the dynamical system that do not change under smooth coordinate changes. The phase space reconstruction model [12–14] has been studied and applied in many fields. Suppose that we have an $M$-dimensional time series $X_1, X_2, \ldots, X_N$, where $X_n = (x_{1,n}, x_{2,n}, \ldots, x_{M,n})$. As in the case of a univariate time series (when $M = 1$), the phase space reconstruction can be described by

$$V_n = \bigl(x_{1,n}, x_{1,n-\tau_1}, \ldots, x_{1,n-(d_1-1)\tau_1}, \; \ldots, \; x_{M,n}, x_{M,n-\tau_M}, \ldots, x_{M,n-(d_M-1)\tau_M}\bigr),$$

where $\tau_i$ and $d_i$ ($i = 1, \ldots, M$) are the time delays and the embedding dimensions, respectively. Following Takens' delay embedding theorem, if $d = \sum_{i=1}^{M} d_i$ or each $d_i$ is large enough, there exists a $d$-dimensional continuous vector mapping $F: \mathbb{R}^d \to \mathbb{R}^d$ such that

$$V_{n+1} = F(V_n),$$

or there exists a $d$-dimensional continuous function $F_i: \mathbb{R}^d \to \mathbb{R}$ such that

$$x_{i,n+1} = F_i(V_n), \quad i = 1, \ldots, M.$$

Thus, the evolution from $V_n$ to $V_{n+1}$ reflects the motion of the original unknown dynamics. This means that the geometrical characteristics of the strange attractor in the reconstructed space are equivalent to those in the original state space, so any differential or topological invariants computed for the reconstructed strange attractor are identical to those of the original state space.
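The reconstruction can be illustrated with a short sketch that stacks delayed copies of each component series into one state vector per time index. The function name `delay_embed` and its argument layout are assumptions made for illustration; the backward-delay convention (each coordinate looks into the past) is also an assumption, chosen because it is the usual one for prediction.

```python
import numpy as np

def delay_embed(series, delays, dims):
    """Build multivariate delay vectors from M component series.

    series: array of shape (N, M); delays and dims: length-M sequences.
    Returns (V, first_n). Row r of V is the delay vector at time index
    n = first_n + r, where first_n = max_i (dims[i] - 1) * delays[i].
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]        # treat a 1-D input as M = 1
    n_obs, n_vars = series.shape
    first_n = max((d - 1) * t for d, t in zip(dims, delays))
    columns = []
    for i in range(n_vars):
        for k in range(dims[i]):
            lag = k * delays[i]
            # Column holds x_{i, n - lag} for n = first_n, ..., n_obs - 1.
            columns.append(series[first_n - lag : n_obs - lag, i])
    return np.column_stack(columns), first_n
```

With two component series, delays `[2, 1]`, and dimensions `[2, 2]`, each row has the form `(x1[n], x1[n-2], x2[n], x2[n-1])`.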

3.2. Prediction Model Based on Multivariate Local Polynomial Regression

Multivariate local polynomial fitting is attractive from both theoretical and practical points of view. It has a smaller mean squared error than the Nadaraya-Watson estimator, which leads to an undesirable form of the bias, and than the Gasser-Müller estimator, which pays a price in variance when dealing with a random design model. Multivariate local polynomial fitting also has other advantages. The method adapts to various types of designs, such as random and fixed designs and highly clustered and nearly uniform designs. Furthermore, there are no boundary effects: the bias at the boundary automatically stays of the same order as in the interior, without the use of specific boundary kernels. The local polynomial approximation approach is also appealing on general scientific grounds, because the least squares principle it relies on opens the way to a wealth of statistical knowledge and thus to easy generalizations. These properties are discussed in [8, 15–18]. In this section, we briefly outline the extension of multivariate local polynomial fitting to multivariate nonlinear time series forecasting.

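Suppose that the state vector at time $T$ is $V_T$. The value $p$ steps later than $T$ on the attractor is fitted by the function

$$x_{i,T+p} = F_i(V_T).$$

Our purpose is to obtain an estimate $\hat{F}_i$ of the function $F_i$. In this paper, we use the $q$th-order multivariate local polynomial to predict the value at the fixed point $V_T$. The polynomial function can be described as

$$F_i(V) \approx \sum_{0 \le |j| \le q} b_j \, (V - V_T)^j,$$

where

$$j = (j_1, \ldots, j_d), \quad |j| = \sum_{l=1}^{d} j_l, \quad j! = j_1! \cdots j_d!, \quad (V - V_T)^j = \prod_{l=1}^{d} (v_l - v_{T,l})^{j_l}, \quad b_j = \frac{1}{j!} D^{(j)} F_i(V_T),$$

and $d$ is the dimension of the state vector.

In the multivariate prediction method, the change of $V_T$ with time on the attractor is assumed to be the same as that of the nearby points $V_{T_a}$ ($a = 1, \ldots, A$), ordered by distance. Using the $A$ pairs $(V_{T_a}, x_{i,T_a+p})$, for which the values are already known, the coefficients $b_j$ are determined by minimizing

$$\sum_{a=1}^{A} \Bigl[ x_{i,T_a+p} - \sum_{0 \le |j| \le q} b_j \, (V_{T_a} - V_T)^j \Bigr]^2 K_H(V_{T_a} - V_T),$$

where $K_H$ is the kernel function scaled by the bandwidth matrix $H$.

For this weighted least squares problem, when $X^T W X$ is invertible, the solution can be described by

$$\hat{B} = (X^T W X)^{-1} X^T W Y,$$

where $Y = (x_{i,T_1+p}, \ldots, x_{i,T_A+p})^T$, $B$ is the vector of coefficients $b_j$, $W = \mathrm{diag}\{K_H(V_{T_1} - V_T), \ldots, K_H(V_{T_A} - V_T)\}$, and $X$ is the $A \times S$ design matrix whose $a$th row contains the monomials $(V_{T_a} - V_T)^j$, $0 \le |j| \le q$, with $S$ the number of such monomials. We then get the estimate $\hat{x}_{i,T+p} = \hat{F}_i(V_T) = E_1 (X^T W X)^{-1} X^T W Y$, where $E_1 = (1, 0, \ldots, 0)$.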
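The weighted least squares step can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function names are hypothetical, the bandwidth is set just beyond the farthest selected neighbor (an assumed rule, so that every neighbor receives a positive weight), the Epanechnikov weights are used up to a constant factor, and `numpy.linalg.lstsq` on the square-root-weighted system replaces the explicit inverse for numerical stability.

```python
import numpy as np
from itertools import combinations_with_replacement

def design_matrix(diffs, order):
    """Columns are all monomials of the rows of `diffs` up to total degree `order`."""
    n_rows, dim = diffs.shape
    cols = [np.ones(n_rows)]                        # degree-0 (intercept) term
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(dim), degree):
            cols.append(np.prod(diffs[:, list(idx)], axis=1))
    return np.column_stack(cols)

def local_poly_predict(V, y, v_query, n_neighbors, order=2):
    """Local polynomial estimate of the target at v_query.

    V: (n, d) known state vectors; y: (n,) their known future values.
    Returns the fitted intercept, i.e. the polynomial evaluated at v_query.
    """
    V = np.asarray(V, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = V - v_query
    dist = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(dist)[:n_neighbors]
    # Assumed bandwidth rule: slightly larger than the farthest neighbour.
    h = dist[nearest].max() * 1.01 + 1e-12
    weights = np.clip(1.0 - (dist[nearest] / h) ** 2, 0.0, None)
    X = design_matrix(diffs[nearest], order)
    sw = np.sqrt(weights)
    # Minimises sum_a w_a * (y_a - X_a b)^2; same solution as the
    # closed form (X^T W X)^{-1} X^T W Y when X^T W X is invertible.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[nearest] * sw, rcond=None)
    return coef[0]
```

Because the fit is centered at the query point, only the intercept is needed for the prediction; the remaining coefficients estimate local derivatives.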

There are several important issues about the bandwidth, the order of multivariate local polynomial function, and the kernel function which have to be discussed.

3.3. Parameters Estimations and Selections

We calculate the time delays $\tau_i$ with the mutual information method [19] separately for each univariate time series $\{x_{i,n}\}$, $i = 1, \ldots, M$. Unlike the autocorrelation function, which is based on linear statistics and does not take nonlinear dynamical correlations into account, the time-delayed mutual information also captures nonlinear dependence. Therefore, it is advocated that one look for the first minimum of the time-delayed mutual information.
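A minimal sketch of this delay selection is given below, assuming a simple equal-width histogram estimate of the mutual information. The function names, the number of bins, and the maximum lag are illustrative choices, not values taken from the paper.

```python
import numpy as np

def mutual_information(x, lag, bins=16):
    """Histogram estimate (in nats) of the mutual information between
    x[n] and x[n + lag]."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)       # marginal of x[n]
    p_y = p_xy.sum(axis=0, keepdims=True)       # marginal of x[n + lag]
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def first_minimum_delay(x, max_lag=50, bins=16):
    """Smallest lag at which the estimated mutual information has a
    local minimum, or None if no interior minimum is found."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1                        # mi[k] belongs to lag k + 1
    return None
```

Histogram estimates are sensitive to the bin count and to the series length, so in practice the curve is usually inspected rather than trusting a single automatic minimum.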

There are many algorithms for choosing the embedding dimensions [12, 20]. For a univariate time series $\{x_{i,n}\}$, $i = 1, \ldots, M$, a popular method for finding the embedding dimension $d_i$ is the so-called false nearest-neighbor method [21, 22]. Here, we apply this method to the multivariate case.
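The univariate false nearest-neighbor test can be sketched as below. This is a simplified version that uses only the distance-ratio criterion with an assumed tolerance of 15 and an assumed acceptance threshold of 1%; the function names are hypothetical, forward delay vectors are used for convenience (they differ from backward ones only by an index shift), and the brute-force distance matrix limits it to series of a few thousand points.

```python
import numpy as np

def fnn_fraction(x, dim, delay, r_tol=15.0):
    """Fraction of nearest neighbours in dimension `dim` that separate
    strongly once the (dim + 1)-th delay coordinate is added."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - dim * delay                # leaves room for the extra coordinate
    emb = np.column_stack(
        [x[k * delay : k * delay + n_vec] for k in range(dim)]
    )
    extra = x[dim * delay : dim * delay + n_vec]
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbour
    nn = dist.argmin(axis=1)
    nn_dist = dist[np.arange(n_vec), nn]
    valid = nn_dist > 0                         # skip exact duplicates
    ratio = np.abs(extra[valid] - extra[nn[valid]]) / nn_dist[valid]
    return float(np.mean(ratio > r_tol))

def minimum_embedding_dimension(x, delay, max_dim=10, threshold=0.01):
    """Smallest dimension whose false-neighbour fraction is below `threshold`."""
    for dim in range(1, max_dim + 1):
        if fnn_fraction(x, dim, delay) < threshold:
            return dim
    return None
```

On noisy financial data the fraction rarely drops exactly to zero, which is why a small positive threshold is used here.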

For the multivariate local polynomial predictor, three issues have a significant influence on the prediction accuracy and computational complexity. The first is the choice of the bandwidth matrix $H$, which plays a rather crucial role. The bandwidth matrix is taken to be diagonal. For simplicity, we set $H = h I_d$, where $I_d$ is the $d \times d$ identity matrix. In theory, there exists an optimal bandwidth $h_{\mathrm{opt}}$ in the mean squared error sense, such that

$$h_{\mathrm{opt}} = \arg\min_{h} \int \bigl(F(V) - \hat{F}_h(V)\bigr)^2 \, dV.$$

Another issue in multivariate local polynomial fitting is the choice of the order of the polynomial. Since the modeling bias is primarily controlled by the bandwidth, this issue is less crucial. For a given bandwidth $h$, a large order $q$ would be expected to reduce the modeling bias, but at the cost of a larger variance and a considerable computational burden. Since the bandwidth is used to control the modeling complexity, and because local data are sparse in multidimensional space, a higher-order polynomial is rarely used. We therefore apply local quadratic regression to fit the model (i.e., $q = 2$).

The third issue is the selection of the kernel function. In this paper, we choose the optimal spherical Epanechnikov kernel function [8, 15], which minimizes the asymptotic mean square error (MSE) of the resulting multivariate local polynomial estimators, as our kernel function.
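For reference, the spherical Epanechnikov kernel in $d$ dimensions is commonly written as $K(u) = \frac{d(d+2)}{2 S_d} (1 - \|u\|^2)_+$, where $S_d = 2\pi^{d/2} / \Gamma(d/2)$ is the surface area of the unit sphere in $\mathbb{R}^d$. The sketch below evaluates it, together with the scaled version for a bandwidth matrix of the form $h I_d$; the function names are hypothetical and the scalar-bandwidth form is an assumption of this example.

```python
import math
import numpy as np

def spherical_epanechnikov(u):
    """Spherical Epanechnikov kernel evaluated at each row of u, shape (n, d)."""
    u = np.atleast_2d(np.asarray(u, dtype=float))
    d = u.shape[1]
    sphere_area = 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)   # S_d
    const = d * (d + 2) / (2.0 * sphere_area)
    return const * np.clip(1.0 - np.sum(u ** 2, axis=1), 0.0, None)

def scaled_kernel(diffs, h):
    """K_H(u) = h^(-d) * K(u / h) for the bandwidth matrix H = h * I."""
    diffs = np.atleast_2d(np.asarray(diffs, dtype=float))
    d = diffs.shape[1]
    return spherical_epanechnikov(diffs / h) / h ** d
```

In one dimension the constant reduces to 3/4, the familiar univariate Epanechnikov kernel. In a weighted least squares fit the normalizing constant cancels, so only the shape $(1 - \|u\|^2)_+$ matters.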

4. Practical Applications and Discussions

In the following, we apply our prediction algorithms to the Shenzhen stock index series. The forecast variables are the closing price and its change index.

For the nonlinear stock index time series $\{X_n\}_{n=1}^{N}$, divide it into two parts: $\{X_n\}_{n=1}^{N_1}$ and $\{X_n\}_{n=N_1+1}^{N}$. The former is used to construct the model and estimate the coefficients and is called the training set; the latter is used for forecasting and is called the prediction set. Denote the prediction of $x_{i,T+p}$ by $\hat{x}_{i,T+p}$; this is called a $p$-step prediction. For simplicity, we only predict the first variable ($i = 1$).

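Furthermore, in order to evaluate the prediction accuracy and effectiveness, we apply the following indices, namely, the mean squared prediction error (MSE) and the absolute error:

$$\mathrm{MSE} = \frac{1}{N_p} \sum_{n} \bigl(x_{1,n} - \hat{x}_{1,n}\bigr)^2, \qquad e_n = \bigl|x_{1,n} - \hat{x}_{1,n}\bigr|,$$

where $x_{1,n}$ and $\hat{x}_{1,n}$ are the observed and predicted values, and the sum runs over the $N_p$ points of the prediction set.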
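These two error measures are straightforward to compute; a minimal sketch is shown below, with hypothetical function names and the pointwise reading of the absolute error (one value per predicted point) assumed.

```python
import numpy as np

def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

def absolute_errors(actual, predicted):
    """Pointwise absolute prediction error, one value per predicted point."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted)
```

For instance, observed values `[1, 2, 3]` and predictions `[1, 2, 5]` give an MSE of 4/3 and absolute errors `[0, 0, 2]`.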

The first 1000 data points are used as the training set and the last 91 data points as the prediction set. We normalize the original time series; the formula is as follows:

For Shenzhen component index , we obtain the optimal time delay and the minimum embedding dimension , and we reconstruct a phase space with .

In nonlinear time series prediction, most existing methods use phase space reconstruction of a univariate time series. Theoretically, if the embedding dimension and the time delay are chosen reasonably, a univariate time series can achieve satisfactory prediction results. In most practical problems, however, the acquired time series is of limited length and often contaminated by noise, so the reconstructed phase space of a univariate time series cannot describe the evolution of the state variables of the dynamical system very accurately. In addition, we often do not know whether a univariate time series contains complete information about the dynamical system, whereas a multivariate time series usually contains more complete system information. Phase space reconstruction from a multivariate time series can therefore yield a more accurate phase space. Studies [5, 19, 20] have verified that multivariate time series forecasting methods based on phase space reconstruction give more accurate predictions. This paper applies multivariate phase space reconstruction and multivariate local polynomial regression to fit and predict the Shenzhen stock index series.

Following Section 2, we combine the Shenzhen component index and the change index into one multivariate time series to predict the evolution of the component index. First, the two component series are obtained by combining the original Shenzhen index time series with the time series of its consecutive differences. Then, based on the embedding dimensions and time delays, the multivariate phase space is reconstructed, and the proposed methods are used to fit and predict the multivariate time series. For the multivariate nonlinear time series , we obtain the optimal time delay , and the minimum embedding dimension , according to the methods of Section 3.3, and we reconstruct a multivariate phase space with .

Mean squared prediction errors with univariate data are shown in Table 1. The BPNN predictor is a BP neural network predictor whose hidden layer consists of 12 neurons and which is trained for 900 iterations. The LM predictor denotes the local mean predictor, and the LL predictor denotes the local linear predictor. The three local approaches (LM, LL, and U-LPR) use the same number of nearest neighbors, 200. From Table 1, we can conclude that the prediction results of the U-LPR predictor are significantly better than those of the three traditional methods on the same univariate data.

To examine how the choice of reconstructed vectors affects prediction on the same data from the Shenzhen stock market, the prediction errors of the univariate and multivariate predictors are shown in Table 2. From Table 2, we can see that the results predicted with the multivariate local polynomial regression (M-LPR) method are better than those of the U-LPR method.

The results in Figures 2–5 show that the proposed multivariate nonlinear stock index time series predictor based on multivariate local polynomial fitting is effective: even when only the last half of the fitting data is used, the method performs well in predicting the multivariate nonlinear stock index time series. Figure 2 shows the one-step prediction results with 1000 fitting data points, and Figure 3 shows the corresponding absolute error. Figure 4 shows the one-step prediction results with just the last 500 fitting data points, and Figure 5 shows the corresponding absolute error. Figures 2–5 show that a larger fitting set yields a smaller mean squared error. This is one of the advantages of nonparametric approaches: more fitting data make the prediction results more accurate.

5. Conclusion

In this paper, we have presented a new method for predicting stock market index time series, viewed as multivariate nonlinear complex systems, based on multivariate local polynomial regression with kernel smoothing. The multivariate local polynomial and the weighted least squares method are applied to the prediction of the nonlinear complex system. The univariate predictor has been compared with the multivariate predictor based on multivariate local polynomial fitting on the same stock index data, and comparisons with three conventional predictors have also been made. The results obtained for the Shenzhen component index indicate that the prediction mean squared error of the M-LPR predictor is much smaller than that of the U-LPR predictor, even when only the last half of the fitting data is used in the former, and that the proposed method also performs much better than existing methods including local mean prediction, local linear prediction, and BP neural network prediction.

Acknowledgments

This work was supported by Chongqing CSTC foundation of China (CSTC2010BB2310, CSTC2009BB2420), Chongqing CMEC foundation of China (KJ100810, KJ100818), and CQUT foundation of China (2007ZD16).

References

[1] W. Haiyan and T. Longkun, “Testing for nonlinearity in Shanghai stock market,” International Journal of Modern Physics B, vol. 18, no. 17–19, pp. 2720–2724, 2004.
[2] J. Ma and L. Liu, “Multivariate nonlinear analysis and prediction of Shanghai stock market,” Discrete Dynamics in Nature and Society, vol. 2008, Article ID 526734, 8 pages, 2008.
[3] H. Liu, Z. Zhang, and Q. Zhao, “The volatility of the index of shanghai stock market research based on ARCH and its extended forms,” Discrete Dynamics in Nature and Society, vol. 2009, Article ID 743685, 9 pages, 2009.
[4] S. A. R. B. Rombouts, R. W. M. Keunen, and C. J. Stam, “Investigation of nonlinear structure in multichannel EEG,” Physics Letters A, vol. 202, no. 5-6, pp. 352–358, 1995.
[5] A. Porporato and L. Ridolfi, “Multivariate nonlinear prediction of river flows,” Journal of Hydrology, vol. 248, no. 1–4, pp. 109–122, 2001.
[6] E. N. Lorenz, “Atmospheric predictability as revealed by naturally occurring analogues,” Journal of the Atmospheric Sciences, vol. 26, no. 4, pp. 636–646, 1969.
[7] K. Kocak, L. Saylan, and J. Eitzinger, “Nonlinear prediction of near-surface temperature via univariate and multivariate time series embedding,” Ecological Modelling, vol. 173, no. 1, pp. 1–7, 2004.
[8] J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, vol. 66 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1996.
[9] L.-Y. Su, “Prediction of multivariate chaotic time series with local polynomial fitting,” Computers & Mathematics with Applications, vol. 59, no. 2, pp. 737–744, 2010.
[10] L. Su and F. Li, “Deconvolution of defocused image with multivariate local polynomial regression and iterative wiener filtering in DWT domain,” Mathematical Problems in Engineering, vol. 2010, Article ID 605241, 14 pages, 2010.
[11] F. Takens, “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence, Warwick 1980 (Coventry, 1979/1980), vol. 898 of Lecture Notes in Mathematics, pp. 366–381, Springer, Berlin, Germany, 1981.
[12] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, “Determining embedding dimension for phase-space reconstruction using a geometrical construction,” Physical Review A, vol. 45, no. 6, pp. 3403–3411, 1992.
[13] F. M. Roberts, R. J. Povinelli, and K. M. Ropella, “Identification of ECG using phase space reconstruction,” Lecture Notes in Computer Science, vol. 2168, pp. 411–423, 2001.
[14] S. Ma, Z. Cai, Y. Hua, X. Li, and Y. Ge, “An approach of combustion diagnosis in boiler furnace based on phase space reconstruction,” Communications in Computer and Information Science, vol. 2, pp. 528–535, 2007.
[15] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer Series in Statistics, Springer, New York, NY, USA, 2003.
[16] J. Fan and I. Gijbels, “Adaptive order polynomial fitting: bandwidth robustification and bias reduction,” Journal of Computational and Graphical Statistics, vol. 4, no. 3, pp. 213–227, 1995.
[17] J. Fan and I. Gijbels, “Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation,” Journal of the Royal Statistical Society. Series B, vol. 57, no. 2, pp. 371–394, 1995.
[18] J. Fan, N. E. Heckman, and M. P. Wand, “Local polynomial kernel regression for generalized linear models and quasi-likelihood functions,” Journal of the American Statistical Association, vol. 90, no. 429, pp. 141–150, 1995.
[19] L. Cao, A. Mees, and K. Judd, “Dynamics from multivariate time series,” Physica D, vol. 121, no. 1-2, pp. 75–88, 1998.
[20] S. Boccaletti, D. L. Valladares, L. M. Pecora, H. P. Geffert, and T. Carroll, “Reconstructing embedding spaces of coupled dynamical systems from multivariate data,” Physical Review E, vol. 65, no. 3, pp. 1–4, 2002.
[21] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, vol. 7 of Cambridge Nonlinear Science Series, Cambridge University Press, Cambridge, UK, 1997.
[22] A. M. Fraser and H. L. Swinney, “Independent coordinates for strange attractors from mutual information,” Physical Review A, vol. 33, no. 2, pp. 1134–1140, 1986.

Copyright

Copyright © 2011 Liyun Su. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
