Abstract

This study attempts to characterize and predict the stock index series of the Shenzhen stock market using multivariate local polynomial regression. Given the nonlinearity and chaos of the stock index time series, we consider multivariate local polynomial prediction methods and a univariate local polynomial prediction method. All of them use phase space reconstruction according to Takens' theorem. To fit the stock index series, the single series is transformed into a bivariate series. To evaluate the results, the multivariate predictor for the bivariate time series, which is based on a multivariate local polynomial model, is compared with the univariate predictor on the same Shenzhen stock index data. Numerical results for the Shenzhen component index show that the prediction mean squared error of the multivariate predictor is much smaller than that of the univariate predictor and much smaller than those of three existing methods. Even when only the last half of the training data is used in the multivariate predictor, its prediction mean squared error is still smaller than that of the univariate predictor. The multivariate local polynomial prediction model for non-single time series is therefore a useful tool for stock market price prediction.

1. Introduction

Complex nonlinear systems are ubiquitous in many theoretical and practical problems. Viewing the stock market as a nonlinear dynamical system [1–3] is well suited to short-term stock price prediction. Researchers in economics and finance have been interested in predicting stock price behavior for many years, and a variety of forecasting methods have been proposed and implemented. Among them, nonlinear prediction methods have been developed over the last decades [4, 5]. Nonlinear system prediction poses a significant challenge for the complex system analyst, since the nonlinear structure tends to be very intricate and nonuniform.

Although their behavior is frequently described as deterministic yet unpredictable, complex nonlinear systems can in fact be forecast over limited time scales. In many situations, it is hard to build an exact analytic model for a complex system (such as the stock market or the electric load) because its structure is very intricate and the available information is incomplete and inaccurate. Complex systems are therefore usually analyzed through time series observed or measured from them. Most prediction methods can be grouped into global and local methods. Local nonlinear prediction methods are based on nearest-neighbor searches and were introduced by Lorenz [6]. Many introductions to nearest-neighbor techniques have been published. A very effective local prediction method has been proposed for the multivariate time series case and has achieved satisfactory results [7]. In this paper, the multivariate local polynomial prediction method for multivariate time series [8–10] is applied to the Shenzhen stock market in China. Local prediction methods based on kernel smoothing (local mean, local linear, and local polynomial), together with backpropagation (BP) neural networks, are analyzed and applied to predict stock price behavior. We also compare the accuracy of these prediction methods. The model combines the advantages of traditional local, weighted, and multivariate prediction methods. The results show that the multivariate local polynomial prediction method has a lower mean squared error than the univariate predictor based on univariate local polynomial regression (U-LPR) and than most of the traditional predictors (such as local mean prediction, local linear prediction, and BP neural network prediction).

The remainder of this paper is organized as follows. Section 2 describes the data examined in this study. In Section 3, the multivariate local polynomial estimator is applied to obtain a multivariate nonlinear predictor for complex systems, and the selection of the time delay, the embedding dimension, the order of the multivariate local polynomial, the kernel function, and the bandwidth is described. Applications and discussions for the Shenzhen stock index time series are given in Section 4. Conclusions are drawn in Section 5.

2. Data

As is well known, stock market data such as stock prices often show very complicated behavior, which makes their movements difficult to predict accurately. To build a good prediction model for such financial indices, it is important to find a suitable additional variable that affects the price index. Daily Shenzhen component index data were obtained from the Shenzhen stock market. Since the nonlinearity of stock market index time series has been tested in many studies [1–3], the prediction method presented in this paper can be applied to fit and predict the index. To apply the proposed scheme in a multivariate setting, the change index (the series of consecutive differences of the component index) is considered in addition to the component index, since it is one of the key factors influencing traders' decisions. The Shenzhen component index and the change index were selected, and the multivariate local polynomial prediction method was used to predict the Shenzhen component index. We selected 1091 data points each from the Shenzhen component index and the change index, from January 4, 2006 to June 30, 2010, and we denote the Shenzhen component index (ShCpIn) by $\{x_{1,n}\}$ and the change index by $\{x_{2,n}\}$. This gives the bivariate time series $\{x_{1,n}\}$ and $\{x_{2,n}\}$ (where $n = 1, 2, \ldots, 1091$). The original data points are shown in Figure 1.
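The bivariate input can be assembled from the closing values of the component index alone. The sketch below assumes that the change index is the difference between consecutive trading days, since the second series is described as the time series of consecutive differences; the function name `build_bivariate_series` is hypothetical, and the change index published by the exchange may be defined differently.

```python
import numpy as np

def build_bivariate_series(close):
    """Stack the component index with its consecutive differences.

    Assumption: the change index is x_{2,n} = x_{1,n} - x_{1,n-1}.
    Row n of the result is (x_{1,n}, x_{2,n}); the first observation is
    dropped because it has no predecessor.
    """
    close = np.asarray(close, dtype=float)
    if close.ndim != 1 or close.size < 2:
        raise ValueError("expected a 1-D series with at least two values")
    change = np.diff(close)                      # x_{1,n} - x_{1,n-1}
    return np.column_stack([close[1:], change])  # shape (N - 1, 2)

# Toy closing values, not Shenzhen data.
X = build_bivariate_series([10.0, 12.0, 11.0, 15.0])
```

Dropping the first day keeps both columns aligned on the same trading dates. The alternative is to pad the first change with zero, but that would invent an observation.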

3. Multivariate Nonlinear Time Series Predictor with Local Polynomial Regression

3.1. Phase Space Reconstruction Model

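Takens' theorem is the delay embedding theorem published by Takens in 1981 [11]. In mathematics, a delay embedding theorem gives the conditions under which a chaotic dynamical system can be reconstructed from a sequence of observations of its state. The reconstruction preserves the properties of the dynamical system that do not change under smooth coordinate changes. Phase space reconstruction [12–14] has been studied and applied in many fields. Suppose that we have an $M$-dimensional time series $X_1, X_2, \ldots, X_N$, where $X_n = (x_{1,n}, x_{2,n}, \ldots, x_{M,n})$. As in the case of a univariate time series (when $M = 1$), the phase space reconstruction can be described by

$$V_n = \left(x_{1,n}, x_{1,n-\tau_1}, \ldots, x_{1,n-(m_1-1)\tau_1},\; x_{2,n}, x_{2,n-\tau_2}, \ldots, x_{2,n-(m_2-1)\tau_2},\; \ldots,\; x_{M,n}, x_{M,n-\tau_M}, \ldots, x_{M,n-(m_M-1)\tau_M}\right),$$

where $\tau_i$ and $m_i$ ($i = 1, 2, \ldots, M$) are the time delays and the embedding dimensions, respectively. Following Takens' delay embedding theorem, if $d = \sum_{i=1}^{M} m_i$ or each $m_i$ is large enough, there exists a $d$-dimensional continuous vector mapping $F: \mathbb{R}^d \to \mathbb{R}^d$ such that

$$V_{n+1} = F(V_n),$$

or there exists an $M$-dimensional continuous function $f: \mathbb{R}^d \to \mathbb{R}^M$ such that

$$X_{n+1} = f(V_n).$$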
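The delay reconstruction maps directly onto array slicing. The sketch below builds, for each time $n$, the vector $V_n$ that collects every component $x_{i,n}$ together with its $m_i - 1$ delayed copies at spacing $\tau_i$, in the component-by-component order used above. It uses 0-based time indices and starts at the first time $n_0 = \max_i (m_i - 1)\tau_i$ for which every delayed value exists. The function name `multivariate_embed` is hypothetical.

```python
import numpy as np

def multivariate_embed(X, taus, ms):
    """Delay-embed an (N, M) array component by component.

    Row r of the result is V_n for n = n0 + r (0-based), laid out as
    (x_{1,n}, x_{1,n-tau_1}, ..., x_{1,n-(m_1-1)tau_1}, x_{2,n}, ..., x_{M,n-(m_M-1)tau_M}),
    where n0 = max_i (m_i - 1) * tau_i is the first time with a full history.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                       # univariate case, M = 1
    N, M = X.shape
    if len(taus) != M or len(ms) != M:
        raise ValueError("need one delay and one dimension per component")
    n0 = max((m - 1) * tau for tau, m in zip(taus, ms))
    cols = []
    for i, (tau, m) in enumerate(zip(taus, ms)):
        for lag in range(m):
            # x_{i, n - lag * tau} for n = n0, ..., N - 1
            cols.append(X[n0 - lag * tau : N - lag * tau, i])
    return np.column_stack(cols), n0

# Toy bivariate series: x_1 = 0, 2, ..., 18 and x_2 = 1, 3, ..., 19.
X = np.arange(20.0).reshape(10, 2)
V, n0 = multivariate_embed(X, taus=[1, 2], ms=[2, 2])
```

Here $d = m_1 + m_2 = 4$, so each row of `V` is a point in $\mathbb{R}^4$. The first row corresponds to $n = 2$, that is, $(x_{1,2}, x_{1,1}, x_{2,2}, x_{2,0})$.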

Thus, the evolution from $V_n$ to $V_{n+1}$ reflects the motion of the original unknown dynamics. This means that the geometrical characteristics of the strange attractor in the reconstructed space are equivalent to those in the original state space. Consequently, any differential or topological invariants computed for the reconstructed strange attractor are identical to those in the original state space.

3.2. Prediction Model Based on Multivariate Local Polynomial Regression

Multivariate local polynomial fitting is attractive from both theoretical and practical points of view. It has a smaller mean squared error than the Nadaraya-Watson estimator, which has an undesirable form of bias. It also outperforms the Gasser-Müller estimator, which pays a price in variance when dealing with a random design. Multivariate local polynomial fitting has other advantages as well. It adapts to various types of designs, such as random and fixed designs and highly clustered and nearly uniform designs. Furthermore, it is free of boundary effects: the bias at the boundary automatically stays of the same order as in the interior, without the use of special boundary kernels. The local polynomial approach is also appealing on general scientific grounds. Because it rests on the least squares principle, it opens the way to a wealth of statistical knowledge and thus to easy generalizations. All of these advantages are discussed in [8, 15–18]. In this section, we briefly outline how multivariate local polynomial fitting extends to multivariate nonlinear time series forecasting.

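Suppose that the state vector at time $t$ is $V_t$. The value $T$ steps later than $t$ on the attractor is fitted by the function

$$x_{1,t+T} = f(V_t).$$

Our purpose is to obtain an estimate of the function $f$. In this paper, we use a $p$th order multivariate local polynomial to predict the value at the fixed point $V_t$. The polynomial function can be described as

$$f(V) \approx \sum_{0 \le |j| \le p} \frac{1}{j!} D^{(j)} f(V_t)\,(V - V_t)^{j} = \sum_{0 \le |j| \le p} b_j(V_t)\,(V - V_t)^{j},$$

where $j = (j_1, \ldots, j_d)$ is a multi-index and the remaining notation is as follows:

$$|j| = \sum_{l=1}^{d} j_l, \qquad j! = \prod_{l=1}^{d} j_l!, \qquad (V - V_t)^{j} = \prod_{l=1}^{d} (v_l - v_{t,l})^{j_l},$$

$$D^{(j)} f = \frac{\partial^{|j|} f}{\partial v_1^{j_1} \cdots \partial v_d^{j_d}}, \qquad b_j(V_t) = \frac{D^{(j)} f(V_t)}{j!}.$$

In the multivariate prediction method, the change of $V_t$ with time on the attractor is assumed to be the same as that of its $k$ nearest neighbors $V_{T_i}$ ($i = 1, 2, \ldots, k$), ordered by distance. The $k$ pairs $(V_{T_i}, x_{1,T_i+T})$ have values that are already known, and they are used to determine the coefficients $b_j(V_t)$ by minimizing

$$\sum_{i=1}^{k} \Big[ x_{1,T_i+T} - \sum_{0 \le |j| \le p} b_j(V_t)\,(V_{T_i} - V_t)^{j} \Big]^2 K_H(V_{T_i} - V_t),$$

where $K_H(u) = |H|^{-1} K(H^{-1} u)$, $K$ is a $d$-variate kernel function, and $H$ is the bandwidth matrix.

For this weighted least squares problem, when $\mathbf{X}^\top \mathbf{W} \mathbf{X}$ is invertible, the solution can be described by

$$\hat{\boldsymbol\beta} = (\mathbf{X}^\top \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{W} \mathbf{Y}.$$

Here $\boldsymbol\beta$ stacks the coefficients $b_j(V_t)$ with the constant term $b_{(0,\ldots,0)}(V_t)$ first. The $i$th row of $\mathbf{X}$ contains the monomials $(V_{T_i} - V_t)^{j}$, $0 \le |j| \le p$, in the same order. The response vector is $\mathbf{Y} = (x_{1,T_1+T}, \ldots, x_{1,T_k+T})^\top$, and the weight matrix is $\mathbf{W} = \mathrm{diag}\{K_H(V_{T_1} - V_t), \ldots, K_H(V_{T_k} - V_t)\}$. We then obtain the estimate $\hat{x}_{1,t+T} = \hat{f}(V_t) = \mathbf{e}_1^\top \hat{\boldsymbol\beta}$, where $\mathbf{e}_1 = (1, 0, \ldots, 0)^\top$.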
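The weighted least squares estimator described above can be sketched in NumPy as follows. The sketch makes three assumptions: Euclidean nearest-neighbor search, the isotropic bandwidth $H = hI_d$, and spherical Epanechnikov weights left unnormalized, because the constant cancels in weighted least squares. `local_polynomial_predict` and `monomial_exponents` are hypothetical names. The fit calls `numpy.linalg.lstsq` on the square-root-weighted system instead of forming $(\mathbf{X}^\top\mathbf{W}\mathbf{X})^{-1}$ explicitly. This gives the same solution whenever that matrix is invertible and is numerically safer.

```python
import itertools
import numpy as np

def monomial_exponents(d, p):
    """Multi-indices j with 0 <= |j| <= p; the constant term (0, ..., 0) comes first."""
    exps = [j for j in itertools.product(range(p + 1), repeat=d) if sum(j) <= p]
    return sorted(exps, key=sum)   # stable sort keeps (0, ..., 0) first

def local_polynomial_predict(V_train, y_train, v_query, k, h, p=2):
    """Estimate f(v_query) by a p-th order local polynomial fit on the k nearest neighbours.

    Assumed: Euclidean k-NN, H = h * I, unnormalised spherical Epanechnikov weights.
    y_train[i] holds the known successor value x_{1,T_i+T} of V_train[i].
    """
    V_train = np.asarray(V_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    v_query = np.asarray(v_query, dtype=float)
    dist = np.linalg.norm(V_train - v_query, axis=1)
    idx = np.argsort(dist)[:k]                     # k nearest neighbours T_1, ..., T_k
    D = V_train[idx] - v_query                     # V_{T_i} - V_t
    w = np.clip(1.0 - np.sum((D / h) ** 2, axis=1), 0.0, None)
    if np.count_nonzero(w) == 0:
        raise ValueError("bandwidth h is too small: every neighbour has zero weight")
    exps = monomial_exponents(D.shape[1], p)
    X = np.column_stack([np.prod(D ** np.array(j), axis=1) for j in exps])
    sw = np.sqrt(w)
    # Solves min ||sqrt(W) (Y - X beta)||^2, i.e. (X^T W X)^{-1} X^T W Y when invertible.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y_train[idx] * sw, rcond=None)
    return float(beta[0])                          # e_1^T beta

# Synthetic check on an exactly quadratic function of two variables.
rng = np.random.default_rng(0)
V = rng.uniform(-1.0, 1.0, size=(400, 2))
y = 1 + 2 * V[:, 0] - V[:, 1] + 0.5 * V[:, 0] ** 2 + V[:, 0] * V[:, 1]
pred = local_polynomial_predict(V, y, [0.2, -0.1], k=100, h=10.0)
```

The test function is exactly quadratic, so the local quadratic fit ($p = 2$) reproduces it. The prediction at $(0.2, -0.1)$ therefore equals $1.5$ up to rounding. On real index data, $k$ and $h$ control the bias-variance trade-off.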

Several important issues concerning the bandwidth, the order of the multivariate local polynomial, and the kernel function remain to be discussed.

3.3. Parameters Estimations and Selections

We calculate the time delays with the mutual information method [19] separately for each univariate component series $\{x_{i,n}\}$, $i = 1, 2, \ldots, M$. The autocorrelation function is based on linear statistics and does not take nonlinear dynamical correlations into account. Therefore, it is advocated that one look for the first minimum of the time-delayed mutual information instead.
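The first-minimum rule can be sketched with a simple histogram estimate of the time-delayed mutual information. The bin count of 16 and the fallback to the global minimum are assumptions for illustration, as are the function names `delayed_mutual_information` and `first_minimum_delay`. The estimator used in [19] may differ.

```python
import numpy as np

def delayed_mutual_information(x, tau, bins=16):
    """Histogram estimate (in nats) of I(x_n; x_{n+tau}); tau must be >= 1."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x_n
    py = pxy.sum(axis=0, keepdims=True)      # marginal of x_{n+tau}
    nz = pxy > 0                             # empty cells contribute nothing
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

def first_minimum_delay(x, max_tau=50, bins=16):
    """Smallest tau at which the delayed mutual information has a local minimum.

    Assumed fallback: the global minimum over 1..max_tau if no local minimum exists.
    """
    mi = [delayed_mutual_information(x, tau, bins) for tau in range(1, max_tau + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1                     # mi[i] belongs to tau = i + 1
    return int(np.argmin(mi)) + 1

# Noisy sine as a stand-in for one component series.
rng = np.random.default_rng(1)
n = np.arange(5000)
x = np.sin(0.157 * n) + 0.05 * rng.standard_normal(n.size)
tau = first_minimum_delay(x, max_tau=30)
```

For a roughly periodic signal, the first minimum typically falls near a quarter of the period. At that delay the delayed coordinate carries the most new information while the two coordinates are still dynamically related.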

There are many algorithms for choosing the embedding dimensions [12, 20]. For a univariate component series $\{x_{i,n}\}$, $i = 1, 2, \ldots, M$, a popular method for finding the embedding dimension is the so-called false nearest-neighbor method [21, 22]. Here, we apply this method to the multivariate case.
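The false nearest-neighbor test for a single component series can be sketched as follows. A nearest neighbor in dimension $m$ is counted as false when adding the next delay coordinate increases the separation by more than `rtol` times the original distance. The tolerance of 15, the 5% stopping threshold, and the function names are assumptions. The sketch handles one component at a time and does not show the extension to the multivariate case used in the paper.

```python
import numpy as np

def delay_vectors(x, m, tau):
    """Rows (x_n, x_{n+tau}, ..., x_{n+(m-1)tau}) for every admissible n."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[l * tau : l * tau + n] for l in range(m)])

def false_neighbor_fraction(x, m, tau, rtol=15.0):
    """Fraction of nearest neighbours in dimension m that are false.

    A neighbour is false if the (m + 1)-th delay coordinate separates the pair
    by more than rtol times their distance in dimension m (assumed criterion).
    """
    x = np.asarray(x, dtype=float)
    emb = delay_vectors(x, m + 1, tau)
    base, extra = emb[:, :m], emb[:, m]
    false = 0
    for i in range(len(base)):
        dist = np.linalg.norm(base - base[i], axis=1)
        dist[i] = np.inf                          # exclude the point itself
        j = int(np.argmin(dist))
        if dist[j] > 0 and abs(extra[i] - extra[j]) / dist[j] > rtol:
            false += 1
    return false / len(base)

def minimum_embedding_dimension(x, tau, max_m=10, threshold=0.05):
    """Smallest m whose false-neighbour fraction drops below the (assumed) threshold."""
    for m in range(1, max_m + 1):
        if false_neighbor_fraction(x, m, tau) < threshold:
            return m
    return max_m

# A clean sine lies on a closed curve, so two delay coordinates should suffice.
x_clean = np.sin(0.157 * np.arange(1500))
m_min = minimum_embedding_dimension(x_clean, tau=10, max_m=5)
```

In one dimension, points on the rising and falling halves of the sine are mistaken for neighbors. In two dimensions these halves are separated, which is why the fraction drops at $m = 2$. The brute-force neighbor search costs $O(N^2)$, which is acceptable for about 1000 points.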

For the multivariate local polynomial predictor, three issues have a significant influence on the prediction accuracy and the computational complexity. The first is the choice of the bandwidth matrix, which plays a crucial role. The bandwidth matrix is taken to be diagonal. For simplicity, we set $H = hI_d$, where $I_d$ is the $d \times d$ identity matrix. In theory, there exists an optimal bandwidth $h_{\mathrm{opt}}$ in the mean squared error sense, such that

$$h_{\mathrm{opt}} = \arg\min_{h > 0} \mathrm{MSE}(h).$$
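The MSE-optimal bandwidth depends on unknown quantities, so in practice it has to be approximated from data. The sketch below uses a simple hold-out search over a grid of candidate values for a one-dimensional local linear smoother with an Epanechnikov kernel. It is a swapped-in, illustrative procedure and not the paper's selection rule. The candidate grid, the hold-out size, and the names `local_linear_1d` and `holdout_bandwidth` are hypothetical. For the multivariate predictor with $H = hI_d$, the same search would run over $h$ with the multivariate fit in place of `local_linear_1d`.

```python
import numpy as np

def local_linear_1d(x_train, y_train, x0, h):
    """Local linear estimate at x0 with a 1-D Epanechnikov kernel of bandwidth h."""
    u = (x_train - x0) / h
    w = np.clip(1.0 - u ** 2, 0.0, None)
    if np.count_nonzero(w) < 2:
        return np.nan                               # window too narrow to fit a line
    X = np.column_stack([np.ones_like(x_train), x_train - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
    return float(beta[0])

def holdout_bandwidth(x, y, candidates, n_valid):
    """Return the candidate h with the smallest MSE on the last n_valid points,
    together with the score of every candidate (inf if some fit failed)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_tr, y_tr = x[:-n_valid], y[:-n_valid]
    x_va, y_va = x[-n_valid:], y[-n_valid:]
    scores = {}
    for h in candidates:
        pred = np.array([local_linear_1d(x_tr, y_tr, x0, h) for x0 in x_va])
        ok = np.all(np.isfinite(pred))
        scores[h] = float(np.mean((pred - y_va) ** 2)) if ok else np.inf
    return min(scores, key=scores.get), scores

# Synthetic regression data, not stock data.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 2 * np.pi, 600)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
h_best, scores = holdout_bandwidth(x, y, [0.05, 0.2, 0.5, 1.0, 10.0], n_valid=100)
```

A very large $h$ reduces the fit to an almost global straight line and has a large bias. A very small $h$ leaves too few points in each window. The hold-out score exposes both failure modes.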

Another issue in multivariate local polynomial fitting is the choice of the order $p$ of the polynomial. This issue is less crucial, however, since the modeling bias is primarily controlled by the bandwidth. For a given bandwidth $h$, a large value of $p$ would be expected to reduce the modeling bias, but it would increase the variance and the computational cost considerably. Since the bandwidth already controls the modeling complexity, and local data are sparse in multidimensional space, higher-order polynomials are rarely used. We therefore apply local quadratic regression to fit the model (i.e., $p = 2$).
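The cost of raising the order can be made concrete. A local polynomial of total degree at most $p$ in $d$ variables has $\binom{d+p}{p}$ coefficients, and each local fit needs at least that many neighbors with positive weight. The small helper below, `n_local_poly_coefficients` (a hypothetical name), tabulates this count.

```python
from math import comb

def n_local_poly_coefficients(d, p):
    """Number of monomials of total degree <= p in d variables, C(d + p, p)."""
    return comb(d + p, p)

for d in (2, 4, 8):
    # orders p = 1 (local linear), 2 (local quadratic), 3 (local cubic)
    print(d, [n_local_poly_coefficients(d, p) for p in (1, 2, 3)])
# 2 [3, 6, 10]
# 4 [5, 15, 35]
# 8 [9, 45, 165]
```

With an 8-dimensional reconstructed state, moving from quadratic to cubic terms more than triples the number of unknowns per local fit. This is the sparsity argument for stopping at $p = 2$.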

The third issue is the selection of the kernel function. In this paper, we choose the spherical Epanechnikov kernel [8, 15] as our kernel function. It is optimal in the sense that it minimizes the asymptotic mean squared error (MSE) of the resulting multivariate local polynomial estimators.
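In $d$ dimensions, the spherical Epanechnikov kernel is $K(u) = \frac{d+2}{2c_d}(1 - u^\top u)$ for $u^\top u \le 1$ and $0$ otherwise. Here $c_d = \pi^{d/2}/\Gamma(d/2 + 1)$ is the volume of the unit ball, so $K$ integrates to one. The sketch below evaluates $K$ together with the scaled kernel $K_H(u) = |H|^{-1}K(H^{-1}u)$ for the isotropic choice $H = hI_d$. The function names are hypothetical.

```python
import math
import numpy as np

def unit_ball_volume(d):
    """Volume c_d of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def spherical_epanechnikov(u):
    """K(u) = (d + 2) / (2 c_d) * (1 - u^T u) if u^T u <= 1, else 0.

    u has shape (n, d); one kernel value per row is returned.
    """
    u = np.atleast_2d(np.asarray(u, dtype=float))
    d = u.shape[1]
    r2 = np.sum(u * u, axis=1)
    return (d + 2) / (2 * unit_ball_volume(d)) * np.clip(1.0 - r2, 0.0, None)

def scaled_kernel(u, h):
    """K_H(u) = |H|^{-1} K(H^{-1} u) for the isotropic bandwidth H = h * I_d."""
    u = np.atleast_2d(np.asarray(u, dtype=float))
    return spherical_epanechnikov(u / h) / h ** u.shape[1]

# Numerical check that the 1-D kernel integrates to one.
t = np.linspace(-1.0, 1.0, 200001)
integral = spherical_epanechnikov(t[:, None]).sum() * (t[1] - t[0])
```

For $d = 1$, this reduces to the familiar $\frac{3}{4}(1 - u^2)$. In the weighted least squares fit the constant $\frac{d+2}{2c_d}$ cancels, so only the shape $1 - u^\top u$ matters for the prediction.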

4. Practical Applications and Discussions

In the following, we apply our prediction algorithms to the Shenzhen stock index series. The forecast variables here are the closing index and its change index.

The nonlinear stock index time series $\{X_n\}_{n=1}^{N}$ is divided into two parts: $\{X_n\}_{n=1}^{N_1}$ and $\{X_n\}_{n=N_1+1}^{N}$. The former, called the training set, is used to construct the model and estimate its coefficients. The latter, called the prediction set, is used for forecasting. The prediction of $X_{n+T}$ is denoted by $\hat{X}_{n+T}$ and is called $T$-step prediction. For simplicity, we only predict the first variable.

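Furthermore, to evaluate the prediction accuracy and effectiveness, we use two indices, namely, the mean squared prediction error (MSE) and the absolute error:

$$\mathrm{MSE} = \frac{1}{N_p} \sum_{n=1}^{N_p} \left( x_{1,n} - \hat{x}_{1,n} \right)^2, \qquad e_n = \left| x_{1,n} - \hat{x}_{1,n} \right|,$$

where $N_p$ is the number of predicted points, and $x_{1,n}$ and $\hat{x}_{1,n}$ are the actual and predicted values of the first variable in the prediction set.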
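Both indices are simple to compute once predictions are available. The sketch assumes that actual and predicted values are supplied on the same scale (normalized or original) and in the same order. The helper name `prediction_errors` is hypothetical.

```python
import numpy as np

def prediction_errors(actual, predicted):
    """Return the mean squared prediction error and the pointwise absolute errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    err = actual - predicted
    return float(np.mean(err ** 2)), np.abs(err)

# Toy values; with the paper's split, N_p would be the 91 prediction points.
mse, abs_err = prediction_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```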

The first 1000 data points are used as the training set and the remaining 91 as the prediction set. We normalize the original time series using the following formula:
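Min-max scaling to $[0, 1]$ is one common normalization. The sketch below assumes it, which may differ from the formula used in the paper. It takes the minimum and maximum from the training part only, which avoids using information from the prediction set. Predictions can be mapped back with the inverse transform if errors are to be reported on the original index scale. The function names are hypothetical.

```python
import numpy as np

def minmax_fit(train):
    """Return (lo, hi) from the training part only (assumed min-max scheme)."""
    lo, hi = float(np.min(train)), float(np.max(train))
    if hi == lo:
        raise ValueError("constant training series cannot be min-max scaled")
    return lo, hi

def minmax_apply(x, lo, hi):
    """Map values to [0, 1] relative to the training range."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def minmax_invert(z, lo, hi):
    """Undo the scaling, e.g. to report predictions on the index scale."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo

# Toy values: the last point lies outside the training range.
lo, hi = minmax_fit([2.0, 4.0, 6.0])
z = minmax_apply([2.0, 4.0, 6.0, 8.0], lo, hi)
```

Values in the prediction set can fall outside $[0, 1]$ when the index leaves its training range, as the last toy value shows. The scaling remains invertible in that case.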

For Shenzhen component index , we obtain the optimal time delay and the minimum embedding dimension , and we reconstruct a phase space with .

In nonlinear time series prediction, most existing methods use phase space reconstruction of a univariate time series. In theory, if the embedding dimension and the delay time are selected reasonably, a univariate time series can achieve satisfactory prediction results. In most practical problems, however, the acquired time series is of limited length and often contains noise. As a result, the phase space reconstructed from a univariate time series cannot accurately describe the evolution of the state variables of the dynamical system. In addition, it is often unknown whether a univariate time series contains complete information about the dynamical system for phase space reconstruction, whereas a multivariate time series usually contains more complete system information. Phase space reconstruction from a multivariate time series can therefore yield a more accurate phase space. Studies [5, 19, 20] have verified that multivariate time series forecasting methods based on phase space reconstruction can give more accurate predictions. This paper applies multivariate phase space reconstruction and multivariate local polynomial regression to fit and predict the Shenzhen stock index series.

As described in Section 2, we combine the Shenzhen component index and the change index into one multivariate time series to predict the evolution of the component index. First, the two component series are obtained from the original Shenzhen index time series and the time series of its consecutive differences. Then, based on the embedding dimensions and time delays, the multivariate phase space is reconstructed, and the proposed methods are used to fit and predict the multivariate time series. For the multivariate nonlinear time series , we obtain the optimal time delay , and the minimum embedding dimension , according to the methods of Section 3.3, and we reconstruct a multivariate phase space with .

Mean squared prediction errors with univariate data are shown in Table 1. The BPNN predictor is a BP neural network predictor with a hidden layer of 12 neurons, trained for 900 iterations. LM denotes the local mean predictor, and LL denotes the local linear predictor. These three traditional approaches are compared with the U-LPR predictor. The neighbor-based predictors all use the same number of nearest neighbors, 200 (i.e., $k = 200$). Table 1 shows that the prediction results of the U-LPR predictor are significantly better than those of the three traditional methods on the same univariate data.
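The LM and LL predictors are identified only by name here. The sketch below assumes their common definitions. LM averages the known successor values of the $k$ nearest neighbors. LL fits an unweighted affine model to those neighbors and evaluates it at the query point. The function names, including the helper `_nearest`, are hypothetical. In the comparison above $k = 200$, whereas the toy usage uses $k = 3$.

```python
import numpy as np

def _nearest(V_train, v_query, k):
    """Indices of the k training vectors closest to v_query (Euclidean)."""
    dist = np.linalg.norm(V_train - v_query, axis=1)
    return np.argsort(dist)[:k]

def local_mean_predict(V_train, y_train, v_query, k):
    """LM (assumed definition): average of the known successors of the k neighbours."""
    V_train, y_train = np.asarray(V_train, float), np.asarray(y_train, float)
    idx = _nearest(V_train, np.asarray(v_query, float), k)
    return float(np.mean(y_train[idx]))

def local_linear_predict(V_train, y_train, v_query, k):
    """LL (assumed definition): unweighted affine least-squares fit on the k
    neighbours, centred at v_query, so the intercept is the prediction."""
    V_train, y_train = np.asarray(V_train, float), np.asarray(y_train, float)
    v_query = np.asarray(v_query, float)
    idx = _nearest(V_train, v_query, k)
    X = np.column_stack([np.ones(len(idx)), V_train[idx] - v_query])
    beta, *_ = np.linalg.lstsq(X, y_train[idx], rcond=None)
    return float(beta[0])

# Toy 1-D states whose successors are exactly linear in the state.
V = np.array([[0.0], [1.0], [2.0], [10.0]])
y = 3.0 + 2.0 * V[:, 0]
lm = local_mean_predict(V, y, [0.9], k=3)
ll = local_linear_predict(V, y, [0.9], k=3)
```

On this toy data, LM returns the plain average 5.0 of the three nearest successors. LL recovers the exact linear value 4.8. Compared with the U-LPR predictor, LL omits both the kernel weights and the quadratic terms.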

To further examine how the choice of reconstructed state vectors (univariate versus multivariate) influences prediction on the same data from the complex nonlinear Shenzhen stock market, the prediction errors are shown in Table 2. Table 2 shows that the predictions of the multivariate local polynomial regression (M-LPR) method are better than those of the U-LPR method.

The results in Figures 2–5 show that the proposed multivariate nonlinear stock index predictor based on multivariate local polynomial fitting is effective. Even when only the last half of the fitting data is used, the method predicts the complex multivariate nonlinear stock index time series well. Figure 2 shows the one-step prediction results with 1000 fitting data points, and Figure 3 shows the corresponding absolute error. Figure 4 shows the one-step prediction results with only the last 500 fitting data points, and Figure 5 shows the corresponding absolute error. Figures 2–5 also indicate that a larger fitting set yields a smaller mean squared error. This reflects one of the advantages of nonparametric approaches: more fitting data tend to make the prediction results more accurate.

5. Conclusion

In this paper, we have presented a new method for predicting multivariate nonlinear complex systems and applied it to stock market index time series. The method is based on multivariate local polynomial regression with a kernel smoothing technique, and it applies the multivariate local polynomial and weighted least squares methods to nonlinear complex system prediction. On the same stock index data, the univariate predictor has been compared with the multivariate predictor based on multivariate local polynomial fitting. Comparisons with three conventional predictors have also been made. For the Shenzhen component index, the prediction mean squared error of the M-LPR predictor is much smaller than that of the U-LPR predictor, even when only the last half of the fitting data is used in the former. The proposed method is also much better than most existing methods, including local mean prediction, local linear prediction, and BP neural network prediction.

Acknowledgments

This work was supported by Chongqing CSTC foundation of China (CSTC2010BB2310, CSTC2009BB2420), Chongqing CMEC foundation of China (KJ100810, KJ100818), and CQUT foundation of China (2007ZD16).