Abstract

This study characterizes and predicts stock return series from the Shanghai Stock Exchange using concepts from nonlinear dynamical systems theory. The surrogate data method for multivariate time series shows that all of the stock return series exhibit nonlinearity. Multivariate and univariate nonlinear prediction methods, all of which rely on phase space reconstruction, are considered. The results indicate that the multivariate nonlinear prediction model outperforms the univariate one, and that the local linear prediction method for multivariate time series outperforms both the local polynomial prediction method and the BP neural network method. The multivariate nonlinear prediction model is a useful tool for stock price prediction in emerging markets.

1. Introduction

Researchers in economics and finance have been interested in predicting stock price behavior for many years, and a variety of forecasting methods have been proposed and implemented. Among them, nonlinear prediction is a comparatively new approach developed over the last few decades. It is well suited to short-term stock price prediction because the stock market can be viewed as a nonlinear dynamical system.

An important aspect of nonlinear prediction is detecting nonlinear structure in the time series. One of the most commonly applied tests for nonlinearity is the surrogate data method of Theiler et al. [1]. In 1994, the method was extended to the multivariate case by Prichard and Theiler [2], and it has since been used in many fields, such as electroencephalogram (EEG) analysis [3] and finance [4]. Most prediction methods can be grouped into global and local methods. The class of local nonlinear prediction methods is based on nearest-neighbor searches and goes back to Lorenz [5]. Many introductions to nearest-neighbor techniques have been published; a very simple nearest-neighbor prediction method was proposed by Farmer and Sidorowich [6] in 1987. With the development of multidimensional phase space reconstruction, the method has been extended to the multivariate case and has achieved satisfactory results [7, 8].

In this paper, the surrogate data method for multivariate time series and multivariate nonlinear prediction methods are applied to the Shanghai stock market in China. Local nonlinear prediction methods based on nearest-neighbor techniques, including local linear, local polynomial, and BP neural network predictors, are applied to forecast stock price behavior. We also compare the accuracy of the different prediction methods considered in this paper.

The remainder of this paper is organized as follows. Section 2 describes the data examined in this study. Section 3 tests the data for nonlinearity using the surrogate data method. Multivariate nonlinear prediction methods based on multidimensional phase space reconstruction are proposed in Section 4. Section 5 applies the prediction methods to the Lorenz system and the Shanghai stock market and reports the results. Section 6 draws conclusions.

2. Data

Daily stock market index data are sourced from the Shanghai Stock Exchange (SSE) for the period 1 January 2001 through 31 December 2006, giving a total of 1443 observations. The following stock price indexes are studied: the SSE Composite Index (SHCI), the SSE Constituent Index (SSE 180 Index) (SHCI1), the SSE A Share Index (SHAI), and the SSE B Share Index (SHBI). The price series, which is nonstationary and contains trends, seasonality, and cycles, is converted into a continuously compounded return series to obtain an acceptably stationary series: $r_t = \ln P_t - \ln P_{t-1}$, where $P_t$ is the closing price on day $t$. The time series of daily close prices are shown in Figure 1.
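As a concrete illustration, the price-to-return transformation can be sketched in a few lines; the price values below are hypothetical and for illustration only:

```python
import numpy as np

def log_returns(prices):
    """Continuously compounded returns: r_t = ln(P_t) - ln(P_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

# Hypothetical daily close prices (illustrative values, not SSE data).
prices = np.array([100.0, 101.0, 99.5, 100.2])
r = log_returns(prices)  # one fewer observation than the price series
```

Note that the transformation shortens the series by one observation, which is consistent with the 1443 prices yielding 1442 return data points used later in the paper.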

3. Testing for Nonlinearity Using Surrogate Data Method

The most widespread and powerful approach to testing for nonlinearity in observed time series is based on the surrogate data method [1]. The idea is to generate many surrogate data sequences from the original record that preserve its linear properties but destroy any nonlinear structure that may exist. In 1996, Paluš [9] proposed an extension of the nonlinearity test to multivariate time series, which combines the redundancy and linear redundancy approach with the surrogate data technique. Suppose we have $m$ measured variables $X_1, \dots, X_m$ with zero mean, unit variance, and correlation matrix $C$. The linear redundancy of $X_1, \dots, X_m$ can be defined as

$$L(X_1; \dots; X_m) = \frac{1}{2}\sum_{i=1}^{m} \log c_{ii} - \frac{1}{2}\sum_{i=1}^{m} \log \sigma_i,$$

where $\sigma_i$ are the eigenvalues and $c_{ii}$ the diagonal elements (variances) of the correlation matrix $C$. The general redundancy of $X_1, \dots, X_m$ can be defined as

$$R(X_1; \dots; X_m) = \sum_{i=1}^{m} H(X_i) - H(X_1, \dots, X_m),$$

where $H(X_i)$ is the entropy of the discrete random variable $X_i$. If $X_1, \dots, X_m$ have an $m$-dimensional Gaussian distribution, $L$ and $R$ are theoretically equivalent. The general redundancy detects all dependences in the data under study, while the linear redundancy is sensitive only to linear structure. Due to stationarity, the redundancies do not depend on time and are functions of the time delays only.
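A minimal numerical sketch of the linear redundancy under the Gaussian formula above (the function name and data are ours, not the paper's):

```python
import numpy as np

def linear_redundancy(X):
    """Linear redundancy L = (1/2) sum log c_ii - (1/2) sum log sigma_i,
    where sigma_i are the eigenvalues of the correlation matrix C.
    For a correlation matrix the diagonal c_ii is 1, so the first sum vanishes
    and L reduces to -(1/2) log det(C)."""
    C = np.corrcoef(X, rowvar=False)      # m x m correlation matrix of the columns
    eigvals = np.linalg.eigvalsh(C)       # eigenvalues sigma_i (positive for full-rank data)
    return 0.5 * np.sum(np.log(np.diag(C))) - 0.5 * np.sum(np.log(eigvals))
```

For independent columns $C \approx I$ and $L \approx 0$; strong linear correlation drives $L$ up, which mirrors the statement that the linear redundancy responds only to linear structure.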

We use the general redundancy $R$ and the linear redundancy $L$ as test statistics to compare the original data with the surrogates. The surrogates are generated with the phase-randomized Fourier transform algorithm for multivariate data proposed by Prichard and Theiler [2]. Typically, the confidence of rejection is given in terms of the significance

$$S = \frac{|Q_{\mathrm{orig}} - \langle Q_{\mathrm{surr}} \rangle|}{\sigma_{\mathrm{surr}}},$$

where $Q_{\mathrm{orig}}$ is the test statistic of the original data, and $\langle Q_{\mathrm{surr}} \rangle$ and $\sigma_{\mathrm{surr}}$ are the average and the standard deviation of the test statistics of the surrogates. A significance larger than about 1.96 indicates that the null hypothesis is rejected at the 0.05 level, so the time series is nonlinear with 95% confidence. If the null hypothesis is not rejected, the time series can be described by a linear Gaussian process at the 95% confidence level.
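The test can be sketched as follows. This is a minimal univariate phase-randomization sketch; the multivariate algorithm of Prichard and Theiler additionally preserves the cross-spectra by adding the same random phase sequence to every channel:

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """FFT surrogate: keep the amplitude spectrum, randomize the phases.
    This preserves the linear autocorrelation but destroys nonlinear structure."""
    n = len(x)
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    phases[0] = 0.0            # keep the zero-frequency (mean) component real
    if n % 2 == 0:
        phases[-1] = 0.0       # the Nyquist component must stay real for even n
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n)

def significance(q_orig, q_surr):
    """S = |Q_orig - mean(Q_surr)| / std(Q_surr)."""
    q_surr = np.asarray(q_surr, dtype=float)
    return abs(q_orig - q_surr.mean()) / q_surr.std()
```

With 39 surrogates, as used in the paper, a two-sided rank test at the 0.05 level is also exact, which is a common reason for choosing that particular number.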

The surrogate data method is first used to detect the nonlinearity of the Lorenz system, given by the set of nonlinear equations

$$\dot{x} = \sigma (y - x), \quad \dot{y} = x(r - z) - y, \quad \dot{z} = xy - bz,$$

where the parameters $\sigma$, $r$, and $b$ are fixed in the chaotic regime. We create time series of the $x$, $y$, and $z$ components with a Runge-Kutta algorithm using a time step of 0.001 and assumed initial values. We use 4500 samples of the three component time series and generate 39 multivariate surrogate data sets. The linear redundancy $L$ and the general redundancy $R$ as functions of the delay time are presented in Figures 2(a) and 2(b), respectively. Clearly, the linear redundancy differs from the general redundancy. The significances of the linear-redundancy and general-redundancy statistics as functions of the delay time are also presented in Figure 2. Since the significance of the general-redundancy statistic shown in the figures exceeds 1.95 over the delays considered, the null hypothesis is rejected. The nonlinearity is reliably detected, consistent with the known dynamics of the system.
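The numerical setup can be sketched as below. The paper's exact parameter values and initial conditions did not survive extraction, so the classic chaotic values $\sigma = 10$, $r = 28$, $b = 8/3$ and a generic initial state are assumed here:

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    # Classic chaotic parameter values; the paper's exact choices are assumed.
    x, y, z = state
    return np.array([sigma * (y - x), x * (r - z) - y, x * y - b * z])

def integrate_rk4(f, state, dt, n_steps):
    """Fixed-step fourth-order Runge-Kutta integration."""
    out = np.empty((n_steps, len(state)))
    for i in range(n_steps):
        k1 = f(state)
        k2 = f(state + 0.5 * dt * k1)
        k3 = f(state + 0.5 * dt * k2)
        k4 = f(state + dt * k3)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out[i] = state
    return out

# 4500 samples of the x, y, z components, as in the paper's setup.
traj = integrate_rk4(lorenz_rhs, np.array([1.0, 1.0, 1.0]), dt=0.001, n_steps=4500)
```

Each column of `traj` is one scalar component time series; the three columns together form the multivariate series fed to the surrogate test.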

We then analyze the nonlinearity of the stock return time series in the Shanghai stock market. Thirty-nine surrogates are generated using the phase randomization algorithm. The linear redundancy $L$, the general redundancy $R$, and their significance statistics as functions of the delay time are presented in Figure 3. Clearly, the linear redundancy differs from the general redundancy for all four stock return series; in other words, the surrogate data are technically sound and should not be a source of spurious results in the test. Since the significance of the general-redundancy statistic shown in the figures exceeds 1.95 over the delays considered, the null hypothesis is rejected. The nonlinearity of all four stock return series is reliably detected.

4. Prediction Method Based on Multivariate Time Series

In this section, multivariate prediction methods, which extend the univariate prediction method of Farmer and Sidorowich [6], are introduced; the one-dimensional versions are obtained as a special case. Suppose we have an $M$-dimensional time series $x_{1,n}, x_{2,n}, \dots, x_{M,n}$, $n = 1, 2, \dots, N$. The embedding of the multivariate time series [10] is given as

$$V_n = \big(x_{1,n}, x_{1,n-\tau_1}, \dots, x_{1,n-(m_1-1)\tau_1}, \dots, x_{M,n}, x_{M,n-\tau_M}, \dots, x_{M,n-(m_M-1)\tau_M}\big),$$

where $\tau_1, \dots, \tau_M$ are the time delays and $m_1, \dots, m_M$ are the embedding dimensions. In this paper, we use the mutual information function [11] to choose the time delay separately for each scalar time series. The minimum embedding dimension is determined from the minimum forecasting error.
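The joint embedding can be sketched as follows (function and variable names are ours, for illustration):

```python
import numpy as np

def multivariate_embedding(series, delays, dims):
    """Stack delay vectors from M scalar series into joint state vectors.

    series : list of M 1-D arrays of equal length N
    delays : list of M time delays tau_i
    dims   : list of M embedding dimensions m_i
    Row n of the result is
    (x_{1,n}, x_{1,n-tau_1}, ..., x_{1,n-(m_1-1)tau_1}, ..., x_{M,n}, ...).
    """
    n = len(series[0])
    # First index at which every delayed coordinate is defined.
    start = max((m - 1) * tau for m, tau in zip(dims, delays))
    rows = []
    for i in range(start, n):
        v = []
        for s, tau, m in zip(series, delays, dims):
            v.extend(s[i - j * tau] for j in range(m))
        rows.append(v)
    return np.array(rows)
```

Setting `M = 1` (a single series in the list) recovers the standard univariate delay embedding.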

Following the delay embedding theorem, if the total embedding dimension $m = \sum_{i=1}^{M} m_i$ is sufficiently large, there exists a map $F$ such that $x_{1,n+1} = F(V_n)$. In local prediction methods, the evolution of $V_n$ on the attractor is assumed to be the same as that of nearby points $V_{n_j}$, $j = 1, \dots, q$, where $q$ is the number of neighbor points. In general, for a time series prediction problem, a predictor fits a model to the given data and finds an approximate mapping between the input and output values. To estimate the predictor of the vector $V_n$, we employ the set of its $q$ nearest neighbors $V_{n_1}, \dots, V_{n_q}$.

The real value is defined as $x_{n+1} = \hat{x}_{n+1} + e_{n+1}$, where $\hat{x}_{n+1}$ is the predicted value and $e_{n+1}$ is the prediction error. The three prediction models used in this paper are defined as follows.

(1) Linear regression prediction with $M$-dimensional inputs (MLP) is defined as

$$\hat{x}_{n+1} = a_0 + \sum_{k} a_k (V_n)_k,$$

with the coefficients fitted on the $q$ nearest neighbors. When $M = 1$, it reduces to linear regression prediction with univariate input (ULP).

(2) Polynomial prediction with $M$-dimensional inputs (MPP) is defined analogously, with the local fit extended to higher-order polynomial terms in the components of $V_n$.

(3) Back-propagation neural network prediction with $M$-dimensional inputs (MBP) is defined as

$$\hat{x}_{n+1} = f(W V_n),$$

where $W$ is a matrix of weights and $f$ denotes the network's transfer function.
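The local linear (MLP) step can be sketched as a least-squares fit over the $q$ nearest neighbors; the function name and array shapes below are our assumptions:

```python
import numpy as np

def local_linear_predict(V_train, targets, v_query, q=10):
    """One-step local linear prediction.

    V_train : (n, d) array of reconstructed state vectors V_n
    targets : (n,) array of the corresponding next values x_{n+1}
    v_query : (d,) state vector whose successor we want to predict
    q       : number of nearest neighbors used for the local fit
    """
    dists = np.linalg.norm(V_train - v_query, axis=1)   # distances to all states
    idx = np.argsort(dists)[:q]                         # indices of q nearest neighbors
    A = np.column_stack([np.ones(q), V_train[idx]])     # design matrix with intercept a0
    coef, *_ = np.linalg.lstsq(A, targets[idx], rcond=None)
    return float(np.concatenate(([1.0], v_query)) @ coef)
```

ULP is the $M = 1$ special case (states built from one series only), and replacing the design matrix with polynomial features of the neighbor states gives an MPP-style variant.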

To evaluate forecasting performance, the predicted values are compared with the actual values according to the root mean squared error (RMSE) and normalized mean squared error (NMSE) criteria.
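The two criteria can be sketched as follows; NMSE conventions vary, so the variance-normalized form used here is an assumption:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared prediction error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def nmse(y_true, y_pred):
    """Mean squared error normalized by the variance of the true series,
    so a predictor no better than the series mean scores about 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```

This normalization explains the behavior reported later: a series with very small variance, such as daily returns, inflates NMSE even when the absolute errors (RMSE) stay small.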

5. Numerical Simulation

5.1. Nonlinear Prediction of Lorenz System

The proposed methods are first used to predict the Lorenz system described above; the forecast variable is a single component of the system. The first 1400 data points are taken as training data, and the fitted model is then used to predict the last 100 data points. Univariate time series prediction is performed first. The delay time and the embedding dimension $m = 5$ are selected based on the mutual information function [11] and the false nearest neighbor method [7]. The results of one-step prediction with the univariate time series (ULP) are given in Table 1.

We then combine the $x$, $y$, and $z$ time series into one multivariate time series to predict the evolution of the same component. The time delay is found to be 4 for each variable using the mutual information functions, and the embedding dimension is selected as $m = 2$ for each variable based on the false nearest neighbor method. A three-layer BP neural network is constructed with 6 input neurons, 15 hidden neurons, and 1 output neuron for the Lorenz system. We use RMSE and NMSE to measure prediction performance. Table 1 gives the performance of the three multivariate methods (MLP, MPP, and MBP). As can be seen from Table 1, MLP yields the smallest RMSE and NMSE, ULP ranks second, followed by MBP and MPP. MLP, with the most hits of minimal errors, gives the best performance among the four methods and is the most suitable for prediction of the Lorenz system.

5.2. Nonlinear Prediction for Shanghai Stock Market

In the following, we apply our prediction algorithms to the stock series. The forecast variable here is the next day's close price. The total number of data points in this period is 1442. The first 1319 data points are taken as training data, and the model is then used to predict the last 123 data points.

First, univariate prediction of the stock price is performed. We choose the delay time for each of the four daily close price series based on the mutual information function and, likewise, set the embedding dimension for the daily close price series of SHCI, SHCI1, SHAI, and SHBI based on the false nearest neighbor method. The results of one-step prediction with the univariate time series (ULP) are given in Table 2.

We then combine the close, open, high, and low price time series into one multivariate time series to predict the evolution of the close price. The time delay is found to be 1 for each variable using the mutual information functions. The embedding dimensions are selected separately for the close, open, high, and low price series of SHCI, SHCI1, SHAI, and SHBI. After the reconstruction of the phase space, one-step prediction based on the multivariate time series is applied to the close price. For the neural network model, we again choose a three-layer BP network, constructed with the corresponding numbers of input, hidden, and output neurons for SHCI, SHCI1, SHAI, and SHBI. Table 2 gives the performance of the three methods (MLP, MPP, and MBP). As can be seen from Table 2, MLP yields the smallest RMSE and NMSE.

From Tables 1 and 2, we can see that the errors between observation and prediction are all very small, which suggests that prediction based on the multivariate time series is very effective. The practical results thus show the effectiveness of the proposed approaches. Comparison of the errors among the models shows that the multivariate model is superior to the univariate model, and that the local linear prediction method for multivariate time series outperforms both the local polynomial prediction method and the BP neural network method.

We also see that the NMSE errors of all stock return series are relatively large for all methods, possibly because the values of the stock returns are very small. After transforming all prices into natural logarithms, we use our methods to predict the log series of the stock close price. Table 2 gives the performance of all four methods (ULP, MLP, MPP, and MBP). From Table 2, we see no significant differences between the RMSE errors of the log close price series and those of the stock returns, but the NMSE errors of the log price series are far smaller than those of the returns, which suggests that forecasting the stock price itself, or its natural logarithm, gives better results.

6. Conclusion

In this work, multivariate time series of stock returns on the Shanghai Stock Exchange have been analyzed in order to discover whether a nonlinear dynamical approach can provide better predictions than other methods. Several kinds of analysis have been conducted on the data. The surrogate data method for multivariate time series provides evidence of a nonlinear deterministic component in the dynamics. Predictions are obtained by approximating the nonlinear dynamics with local linear, local polynomial, and BP neural network models, in the context of the nearest-neighbor method, and one-step prediction is performed. The predictive results are, on the whole, satisfactory with regard to the agreement between the observed and predicted time series. Comparison of the errors among the models shows that the multivariate local linear predictor outperforms its univariate counterpart, and that the local linear prediction method for multivariate time series outperforms the local polynomial prediction method and the BP neural network method. Forecasting the stock price itself, or its natural logarithm, gives better results than forecasting the returns. It is also conjectured that stock price time series could be modeled and predicted better by the dynamical systems approach.