Integrating Independent Component Analysis and Principal Component Analysis with Neural Network to Predict Chinese Stock Market
We investigate the statistical behaviors of Chinese stock market fluctuations by independent component analysis. The independent component analysis (ICA) method is integrated into the neural network model. The proposed approach uses the ICA method to analyze the input data of the neural network and obtain the latent independent components (ICs). After the IC that represents noise is identified and removed, the remaining ICs are used as the input of the neural network. To forecast the fluctuations of the Chinese stock market, the data of the Shanghai Composite Index are selected and analyzed, and we compare the forecasting performance of the proposed model with those of the common BP model integrating principal component analysis (PCA) and the single BP model. Experimental results show that the proposed model outperforms the other two models in both the relatively small and the relatively large sample, and that the performance of the BP model integrating PCA is closer to that of the proposed model in the relatively large sample. Further, for all three models, the predictions at points where prices fluctuate violently deviate noticeably from the corresponding real market data.
1. Introduction
Recently, some progress has been made in investigating the statistical behaviors of financial market fluctuations (see [1–6]), and some forecasting methods for price changes have been developed and studied by using the theory of artificial neural networks [7–9]. A financial market is a complex system with many influencing factors and many kinds of uncertainty, and its fluctuation often exhibits strong nonlinear characteristics, so the forecasting of financial time series has long been a focus of financial research. Unlike traditional time-series analysis methods such as exponential smoothing, GARCH, and ARIMA, an artificial neural network, which can handle disordered and comprehensive information, does not require strong model assumptions and also has good nonlinear approximation, strong self-learning, and self-adaptive abilities. Therefore, neural networks are often applied to forecast financial time series; see [12–15]. The most popular neural network for financial forecasting is the backpropagation (BP) neural network, which has powerful problem-solving ability. When the BP neural network is applied to forecast financial time series, the input factors are typically correlated with one another and contain a large amount of noise. The noise in the data could lead to overfitting or underfitting problems. The key to using the BP neural network is therefore to eliminate the correlation and noise among the input data as far as possible, which improves the performance of the BP model and the prediction accuracy.
In the present paper, the methods of independent component analysis (ICA) and principal component analysis (PCA) are integrated into the BP neural network for forecasting financial time series; the resulting models are called the ICA-BP model and the common PCA-BP model, respectively. Independent component analysis is a recently developed method of feature extraction in blind source separation, which is the process of separating the source signals from the mixed signals when the mixing model is unknown. The idea of ICA is that multichannel observed signals are decomposed, according to statistical independence, into several independent components (ICs) through an optimization algorithm. ICA can also be used in financial models to extract latent factors influencing the financial market. In the ICA-BP model, the input data are first analyzed by ICA to obtain several latent ICs. After the IC that represents the noise is identified and eliminated, the remaining ICs are used as the input of the BP model, so that the noise component of the original data is removed and the inputs are made independent of each other. The PCA-BP approach extracts the principal components (PCs) from the input data by the PCA method and uses the PCs as the input of the BP model, which eliminates redundancies in the original information and removes the correlation between the inputs. To forecast the fluctuations of the Chinese stock market, we compare the forecasting performance of the ICA-BP model with those of the common PCA-BP model and the single BP model on the data of the Shanghai Composite Index (SHCI). The SHCI plays an important role in Chinese financial markets. It reflects the activity and the trend of Chinese security markets to a large degree, so it is helpful for understanding the state of the Chinese macroeconomy. The data are from the Shanghai Stock Exchange; see www.sse.com.cn.
This paper is organized as follows. Section 2 gives a brief introduction to independent component analysis, the BP neural network, and principal component analysis. The forecasting models of the stock market are described in Section 3. Section 4 presents the experimental results on the SHCI datasets.
2. Research Methodology
2.1. Independent Component Analysis
Independent component analysis is a statistical method developed in recent years; for example, see [18, 19]. The ICA model has been widely applied in signal processing, face recognition, and feature extraction; see [20, 21]. Kiviluoto and Oja also employed the ICA model to find the fundamental factors influencing the cash flow of 40 stores belonging to the same retail chain. They found that the cash flow of the retail stores was mainly influenced by holidays, seasons, and competitors' strategies. The purpose of ICA is to decompose the observed data linearly into statistically independent components, that is, to find several latent unobserved independent source signals. Suppose that the observed signal $x(t) = (x_1(t), x_2(t), \dots, x_m(t))^{T}$ is a zero-mean data vector observed at time $t$ and that the source signal $s(t) = (s_1(t), s_2(t), \dots, s_m(t))^{T}$ is a zero-mean vector whose components are mutually independent, such that $x(t) = A s(t)$, where $A$ is a full-rank linear mixing matrix. The algorithm considers a linear transformation $y = w^{T} x(t) = w^{T} A s(t) = z^{T} s(t)$ to obtain the solution of the ICA model, where $z = A^{T} w$, $w$ is an estimator of a row of the matrix $A^{-1}$, and $y$ is a linear combination of the $s_i$ with weights $z_i$. Since a sum of two independent random variables is closer to a Gaussian distribution than the original variables, $z^{T} s$ is closer to a Gaussian distribution than any $s_i$. Consider $w$ as a vector that maximizes the non-Gaussianity of $w^{T} x$; then $w^{T} x = z^{T} s$ equals one of the independent components. A quantitative measure of the non-Gaussianity of a random variable is negentropy, which is based on the concept of entropy in information theory. The entropy of a random variable can be interpreted as the degree of information that an observation of the variable gives. The negentropy $J$ is defined by $J(y) = H(y_{\mathrm{gauss}}) - H(y)$, where $H$ is the entropy, given by $H(y) = -\int p(y) \log p(y)\, dy$, $p(y)$ is the probability density, and $y_{\mathrm{gauss}}$ is a Gaussian random vector having the same covariance matrix as $y$. The negentropy is always nonnegative, and it is zero if and only if $y$ follows a Gaussian distribution.
For the calculation of negentropy, an approximation is often applied, which is given as follows: $J(y) \approx \sum_{i=1}^{p} k_i \left[ E\{G_i(y)\} - E\{G_i(v)\} \right]^2$, where the $k_i$ are some positive constants, $v$ is a Gaussian variable with zero mean and unit variance, $y$ is a random variable with zero mean and unit variance, and the $G_i$ are some nonquadratic functions. Even in this case, the approximation is not accurate. Here we use a single nonquadratic function $G$, and the approximation becomes $J(y) \propto \left[ E\{G(y)\} - E\{G(v)\} \right]^2$, where $G(y) = (1/a) \log\cosh(a y)$ or $G(y) = -\exp(-y^2/2)$, $1 \le a \le 2$, and $a$ is some appropriate constant. If the signals are repeatedly observed, the observed signals are denoted as the original signals matrix $X$. When the matrix $W$, the separating matrix, is the inverse of the mixing matrix $A$, the independent component matrix $Y = WX$ can be used to estimate the source signals matrix $S$, where each row of $Y$ is an independent component. In this paper, we apply the FastICA algorithm, which is based on a fixed-point iteration and is applicable to any type of data, to solve for the separating matrix.
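As a rough illustration of the single-function negentropy approximation, the following Python sketch estimates $[E\{G(y)\} - E\{G(v)\}]^2$ with $G(u) = (1/a)\log\cosh(au)$ for a sample that is first standardized to zero mean and unit variance. The function names, the Monte Carlo estimate of $E\{G(v)\}$, and the test samples are assumptions made for illustration; this is not the FastICA implementation used in the paper.

```python
import math
import random


def log_cosh_contrast(u, a=1.0):
    """Nonquadratic function G(u) = (1/a) * log(cosh(a * u))."""
    return math.log(math.cosh(a * u)) / a


def standardize(sample):
    """Shift and scale a sample to zero mean and unit variance."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in sample) / n)
    return [(v - mean) / std for v in sample]


def negentropy_approx(sample, a=1.0, n_ref=20000, seed=0):
    """Approximate J(y), up to a positive constant, as (E[G(y)] - E[G(v)])^2."""
    y = standardize(sample)
    rng = random.Random(seed)
    # E[G(v)] for a standard Gaussian v is estimated by Monte Carlo (assumption).
    e_g_gauss = sum(log_cosh_contrast(rng.gauss(0.0, 1.0), a)
                    for _ in range(n_ref)) / n_ref
    e_g_y = sum(log_cosh_contrast(v, a) for v in y) / len(y)
    return (e_g_y - e_g_gauss) ** 2


# Hypothetical samples: one Gaussian, one clearly non-Gaussian (uniform).
sample_rng = random.Random(1)
gaussian = [sample_rng.gauss(0.0, 1.0) for _ in range(20000)]
uniform = [sample_rng.uniform(-1.0, 1.0) for _ in range(20000)]
print(negentropy_approx(gaussian), negentropy_approx(uniform))
```

The value for the Gaussian sample should be close to zero, while the uniform sample should give a visibly larger value, in line with negentropy vanishing only for Gaussian variables.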
2.2. Artificial Neural Network Model
A neural network is a large-scale nonlinear dynamic system, which has the abilities of highly nonlinear operation, self-learning, and self-organization. Since Lapedes and Farber first applied neural network technology to prediction research in 1987, many researchers have been engaged in the study of neural network prediction methods. Azoff also applied neural networks to forecast time series of financial markets. In the financial field, a neural network is often applied to predict the closing stock price of the next trading day from historical data. The stock data of the last trading day, including the daily open price, daily closing price, daily highest price, daily lowest price, daily volume (stock trading amount), and daily turnover (stock trading money), are very important indicators. We can use these historical indicators as the input of the neural network and the closing price of the next trading day as the output to predict the stock price.
In practice, the feed-forward neural network, which can be thought of as a highly nonlinear mapping from the input to the output, is usually adopted for prediction. Since a three-layer feed-forward neural network can approximate any complicated continuous function, it is suitable for time series prediction. The BP neural network, which is characterized by error backpropagation, is a kind of multilayer feed-forward neural network. A three-layer BP neural network, which contains an input layer, one hidden layer, and an output layer, is chosen in this study. Figure 1 shows the corresponding topological structure. The training of the BP neural network is as follows: for the neuron $j$, its input $I_j$ and output $O_j$ are calculated with the following formula: $I_j = \sum_i w_{ij} O_i$, $O_j = f(I_j + \theta_j)$, where $w_{ij}$ is the weight of the connection from the $i$th neuron in the previous layer to the neuron $j$, $f$ is the activation function of the neurons, and $\theta_j$ is the bias input to the neuron. The error $E$ in the output is calculated with the following formula: $E = \frac{1}{2n} \sum_{n} \sum_{k} (T_k - O_k)^2$, where $n$ is the number of training samples, $k$ runs over the output nodes, $O_k$ is the output value, and $T_k$ is the target value. When the error $E$ falls below the threshold or tolerance level, the training ends. The error $\delta_k$ in the output layer and the error $\delta_j$ in the hidden layer are calculated according to the following formula: $\delta_k = \lambda (T_k - O_k) f'(O_k)$, $\delta_j = \lambda \sum_k \delta_k w_{jk} f'(O_j)$, where $T_k$ is the expected output of the $k$th output neuron, $O_k$ is the actual output in the output layer, $O_j$ is the actual output value in the hidden layer, and $\lambda$ is the adjustable variable in the activation function. The weights and biases in both the output and hidden layers are updated with the backpropagated error. The weights $w_{ij}$ and biases $\theta_j$ are adjusted as follows: $w_{ij}(t+1) = w_{ij}(t) + \eta \delta_j O_i$, $\theta_j(t+1) = \theta_j(t) + \eta \delta_j$, where $t$ is the number of the epoch and $\eta$ is the learning rate.
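The training procedure described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the activation is assumed to be the logistic sigmoid (with its adjustable variable fixed to 1), updates are applied sample by sample, and the class name `ThreeLayerBP`, the toy data, and the initial weight range are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerBP:
    """Input layer, one hidden layer, output layer; sigmoid units (assumption)."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.1, seed=0):
        rng = random.Random(seed)
        self.eta = eta
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b_h = [0.0] * n_hidden
        self.w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
        self.b_o = [0.0] * n_out

    def forward(self, x):
        hidden = [sigmoid(sum(x[i] * self.w_ih[i][j] for i in range(len(x))) + self.b_h[j])
                  for j in range(len(self.b_h))]
        output = [sigmoid(sum(hidden[j] * self.w_ho[j][k] for j in range(len(hidden))) + self.b_o[k])
                  for k in range(len(self.b_o))]
        return hidden, output

    def train_step(self, x, target):
        hidden, output = self.forward(x)
        # Output-layer error term; O * (1 - O) is the sigmoid derivative.
        delta_o = [(target[k] - output[k]) * output[k] * (1.0 - output[k])
                   for k in range(len(output))]
        # Hidden-layer error term, propagated back through the current weights.
        delta_h = [hidden[j] * (1.0 - hidden[j])
                   * sum(delta_o[k] * self.w_ho[j][k] for k in range(len(output)))
                   for j in range(len(hidden))]
        for j in range(len(hidden)):
            for k in range(len(output)):
                self.w_ho[j][k] += self.eta * delta_o[k] * hidden[j]
        for k in range(len(output)):
            self.b_o[k] += self.eta * delta_o[k]
        for i in range(len(x)):
            for j in range(len(hidden)):
                self.w_ih[i][j] += self.eta * delta_h[j] * x[i]
        for j in range(len(hidden)):
            self.b_h[j] += self.eta * delta_h[j]

    def error(self, samples):
        """Mean of half the squared output errors over the samples."""
        total = 0.0
        for x, t in samples:
            _, out = self.forward(x)
            total += sum((t[k] - out[k]) ** 2 for k in range(len(t)))
        return total / (2 * len(samples))


# Hypothetical toy task: learn the average of two inputs in [0, 1].
data_rng = random.Random(42)
samples = []
for _ in range(50):
    x = [data_rng.random(), data_rng.random()]
    samples.append((x, [(x[0] + x[1]) / 2.0]))

net = ThreeLayerBP(n_in=2, n_hidden=5, n_out=1, eta=0.1, seed=0)
error_before = net.error(samples)
for epoch in range(1000):            # maximum number of training cycles
    for x, target in samples:
        net.train_step(x, target)
    if net.error(samples) < 1e-4:    # minimum error threshold
        break
error_after = net.error(samples)
print(error_before, error_after)
```

Targets are kept inside $[0, 1]$ because a sigmoid output unit cannot leave that range; real price data would need to be scaled accordingly before training.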
2.3. Principal Component Analysis
PCA is a well-established technique for feature extraction and dimensionality reduction. The basic concept of PCA is to use a smaller number of comprehensive indexes to replace the original, larger set of variables while reflecting as much of their information as possible; these comprehensive indexes are the principal components. Ouyang used the PCA method to evaluate the ambient water quality monitoring stations located in the main stem of the lower St. Johns River (LSJR). The outcome showed that the number of monitoring stations can be reduced from 22 to 19. Yang et al. built a prediction model for the occurrence of paddy stem borer based on a BP neural network, and they applied the PCA approach to create fewer factors as the input variables of the neural network. Because the essence of PCA is a rotation of the space coordinates that does not change the data structure, the obtained PCs are linear combinations of the variables, reflect the original information to the greatest degree, and are uncorrelated with each other. The specific steps are as follows: assume that the data matrix with $m$ variables $x_1, x_2, \dots, x_m$ and $n$ observations is $X = (x_{ij})_{n \times m}$. First, we normalize the original data by $x_{ij}^{*} = (x_{ij} - \bar{x}_j)/s_j$, where $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$ and $s_j = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}$. For convenience, the normalized $X^{*}$ is still denoted as $X$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$ be the eigenvalues of the covariance matrix of the normalized data, and let $\alpha_1, \alpha_2, \dots, \alpha_m$ be the corresponding eigenvectors; the $i$th principal component is $F_i = \alpha_i^{T} x$ with $x = (x_1, x_2, \dots, x_m)^{T}$, where $i = 1, 2, \dots, m$. Generally, $\lambda_k / \sum_{i=1}^{m} \lambda_i$ is called the contribution rate of the $k$th principal component and $\sum_{i=1}^{k} \lambda_i / \sum_{i=1}^{m} \lambda_i$ is called the cumulative contribution rate of the first $k$ principal components. If the cumulative contribution rate exceeds 85%, the first $k$ principal components contain most of the information of the original variables.
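The normalization step and the 85% rule can be sketched in Python as follows. The eigenvalues in the example are hypothetical values chosen for illustration (they are not taken from Table 2), the function names are assumptions, and the sample standard deviation with divisor $n - 1$ is one common choice for the normalization.

```python
import math


def standardize_columns(rows):
    """Normalize each column to zero mean and unit sample standard deviation."""
    n = len(rows)
    m = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]


def contribution_rates(eigenvalues):
    """Return (contribution rates, cumulative contribution rates), largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates = [lam / total for lam in ordered]
    cumulative = []
    running = 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    return rates, cumulative


def components_needed(eigenvalues, threshold=0.85):
    """Smallest k whose cumulative contribution rate exceeds the threshold."""
    _, cumulative = contribution_rates(eigenvalues)
    for k, value in enumerate(cumulative, start=1):
        if value > threshold:
            return k
    return len(cumulative)


# Hypothetical eigenvalues of a 6 x 6 covariance matrix of normalized data.
example_eigenvalues = [4.2, 1.5, 0.2, 0.06, 0.03, 0.01]
rates, cumulative = contribution_rates(example_eigenvalues)
print(rates, cumulative, components_needed(example_eigenvalues))
```

With these illustrative eigenvalues the first component alone stays below 85%, and the first two together exceed it, so two principal components would be retained.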
3. The Forecasting Models of Stock Market
3.1. Common PCA-BP Forecasting Model
The BP neural network model requires that the input variables be weakly correlated, because strong correlation between input variables implies that they carry more repeated information, which may increase the computational complexity and reduce the prediction accuracy of the model. The concept of the common PCA-BP forecasting model is explained as follows; for more details, see [14, 26]. First, the PCA method is used to extract the principal components from the input data of the BP neural network, and then the principal components are used as the input of the BP neural network. The following example illustrates how to extract the principal components from the input data by PCA. Six financial time series of equal length are denoted as $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, and $x_6$. Table 1 exhibits their correlation, measured by the Pearson correlation coefficient. From Table 1, we can clearly see that the six time series are strongly correlated, which means that they contain much repeated information.
Table 2 shows the PCA result for the six time series. It indicates that the cumulative contribution rate of the first two PCs exceeds 99%; namely, the first two PCs contain more than 99% of the information of the original data. The two PCs are recorded as $F_1$ and $F_2$, respectively, and are used as the input of the PCA-BP model instead of the original data.
3.2. ICA-BP Forecasting Model
In the proposed ICA-BP model, the ICA method is first used to extract the independent components from the original signals. The features of the original signals are contained in the ICs; each IC represents a feature. The IC containing the least effective information of the original signals is the noise IC. After the noise IC is identified and removed, the remaining ICs are used as the input of the BP model. Here the observed time series represent the original signals. The PCs obtained by the PCA method are only uncorrelated with each other, whereas the ICs obtained by the ICA method, which exploits higher-order statistics, are also independent of each other. In statistical theory, independence is a stronger condition than uncorrelatedness. The key to the model is to identify the noise IC after obtaining the latent ICs. The testing-and-acceptance (TnA) method is used to solve this problem in this study.
Consider again the six time series $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, and $x_6$ given above; Figure 2 shows their tendencies. With each time series taken as a row, they form the matrix $X$ with six rows. By the ICA method, the separating matrix $W$ and the independent component matrix $Y$ can be obtained. Each row $y_i$ of $Y$ represents an IC. Figure 3 shows the tendencies of the six ICs. It can be seen from Figure 3 that each IC represents different features of the original time series data in Figure 2.
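Now the TnA method is applied to identify the noise IC. To introduce the TnA algorithm, we consider the $m$ obtained ICs. In each iteration one IC is excluded, and the remaining $m - 1$ ICs are used to reconstruct the original signals matrix. Let $y_k$ be the excluded IC and $\hat{X}^{k}$ the reconstructed original signals matrix. $\hat{X}^{k}$ can be calculated according to the following equation: $\hat{X}^{k} = \sum_{i=1, i \ne k}^{m} a_i y_i$, $1 \le k \le m$, where $\hat{X}^{k} = (\hat{x}_1^{k}, \hat{x}_2^{k}, \dots, \hat{x}_m^{k})^{T}$, $\hat{x}_i^{k}$ is the $i$th reconstructed variable, $a_i$ is the $i$th column vector of the mixing matrix $A$, which is the inverse of the separating matrix $W$, and $y_i$ is the $i$th IC. We consider the cases $k = 1, 2, \dots, m$ in turn; that is, the iteration is repeated $m$ times and each IC is excluded once. The reconstruction error between each reconstructed matrix $\hat{X}^{k}$ and the original signals matrix $X$, measured by the relative Hamming distance (RHD), can then be computed. The RHD is computed as follows: $\mathrm{RHD} = \frac{1}{n-1} \sum_{t=1}^{n-1} \left[ R(t) - \hat{R}(t) \right]^2$, where $R(t) = \mathrm{sign}[A(t+1) - A(t)]$, $\hat{R}(t) = \mathrm{sign}[P(t+1) - P(t)]$. Here $\mathrm{sign}(u) = 1$ if $u > 0$, $\mathrm{sign}(u) = 0$ if $u = 0$, and $\mathrm{sign}(u) = -1$ if $u < 0$. $A(t)$ is the actual value, $P(t)$ is the predicted value, and $n$ is the total number of data points.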
The RHD reconstruction error can be used to assess the similarity between the original variables and their corresponding reconstructed variables. An RHD value closer to zero indicates higher similarity between the original variables and their reconstructed counterparts; that is, the ICs used for the reconstruction contain more features of the original variables and the eliminated IC contains less effective information. Conversely, an RHD value farther from zero means that the similarity between the original variables and their reconstructed counterparts is lower; that is, the eliminated IC contains more effective information of the original variables. So the reconstruction whose RHD value is closest to zero should be found; the corresponding eliminated IC is the noise IC. For the six given financial time series, Table 3 shows the RHD reconstruction errors of each iteration.
Table 3 reveals that the RHD value of the reconstruction that uses IC2, IC3, IC4, IC5, and IC6 and eliminates IC1 is the smallest. It is concluded that IC1 contains the least information and represents the noise IC. IC2, IC3, IC4, IC5, and IC6 are used as the input of the proposed ICA-BP model.
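The leave-one-IC-out search for the noise IC can be sketched in Python as follows. The sketch assumes a sign-based relative Hamming distance (the mean squared difference between the up/down/flat movements of two series) and assumes that the reconstruction error of a matrix is the average RHD over its rows, which the text does not specify. The function names, the mixing matrix, and the two toy ICs are hypothetical.

```python
def sign(u):
    """Return 1, 0, or -1 according to the sign of u."""
    return (u > 0) - (u < 0)


def rhd(actual, predicted):
    """Relative Hamming distance between the movement directions of two series."""
    n = len(actual)
    total = 0
    for t in range(n - 1):
        r = sign(actual[t + 1] - actual[t])
        r_hat = sign(predicted[t + 1] - predicted[t])
        total += (r - r_hat) ** 2
    return total / (n - 1)


def reconstruct(mixing, ics, excluded=None):
    """Sum of a_i * y_i over all ICs except the excluded one (a_i: i-th column)."""
    n = len(ics[0])
    return [[sum(row[i] * ics[i][t] for i in range(len(ics)) if i != excluded)
             for t in range(n)]
            for row in mixing]


def find_noise_ic(mixing, ics):
    """Exclude each IC once; the IC whose removal gives the smallest RHD is noise."""
    original = reconstruct(mixing, ics)
    errors = []
    for k in range(len(ics)):
        rebuilt = reconstruct(mixing, ics, excluded=k)
        # Assumption: average the RHD over the rows of the signals matrix.
        row_errors = [rhd(o, r) for o, r in zip(original, rebuilt)]
        errors.append(sum(row_errors) / len(row_errors))
    return errors.index(min(errors)), errors


# Hypothetical example: one informative IC and one small jitter IC.
trend = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
jitter = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
mixing = [[1.0, 0.01], [0.5, 0.02]]
noise_index, errors = find_noise_ic(mixing, [trend, jitter])
print(noise_index, errors)
```

In this toy case, removing the jitter IC leaves every movement direction unchanged, so its reconstruction error is zero and it is selected as the noise IC (index 1, counting from 0).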
4. Empirical Research
4.1. Selection of Datasets
To evaluate the performance of the proposed ICA-BP forecasting model and the common PCA-BP forecasting model, we select the data of the Shanghai Composite Index and compare the models. In the BP model, the network inputs include six kinds of data: daily open price, daily closing price, daily highest price, daily lowest price, daily volume, and daily turnover. The network output is the closing price of the next trading day, because practical experience in stock markets shows that these six kinds of data from the last trading day are very important indicators when predicting the closing price of the next trading day at the technical level. To compare the performance, two sets of data are analyzed, namely, Set 1 and Set 2. Set 1 contains relatively few data, namely, the SHCI data of each trading day from April 11, 2008, to November 30, 2009. Figure 4 presents the daily SHCI closing price in this period.
Set 1 includes 400 selected data points, of which the first 300 are used as the training set and the remaining 100 as the testing set. Set 2 contains relatively more data, namely, the SHCI data of each trading day from January 4, 2000, to November 30, 2009. Figure 5 presents the daily SHCI closing price in this period. In Set 2, there are 2392 selected data points, of which the 2171 data points from 2000 to 2008 are used as the training set and the remaining 221 as the testing set.
4.2. Performance Criteria and Basic Setting of Model
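The prediction performance is evaluated by using the following performance measures: the mean absolute error (MAE), the root mean square error (RMSE), and the correlation coefficient ($R$). The corresponding definitions are given as follows: $\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |A(t) - P(t)|$, $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (A(t) - P(t))^2}$, $R = \frac{\sum_{t=1}^{n} (A(t) - \bar{A})(P(t) - \bar{P})}{\sqrt{\sum_{t=1}^{n} (A(t) - \bar{A})^2 \sum_{t=1}^{n} (P(t) - \bar{P})^2}}$, where $A(t)$ is the actual value, $P(t)$ is the predicted value, $\bar{A}$ is the mean of the actual values, $\bar{P}$ is the mean of the predicted values, and $n$ is the total number of data points. Smaller MAE and RMSE values and a larger $R$ value represent less deviation, that is, better performance of the forecasting model.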
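The three performance measures can be computed as in the following Python sketch, which uses the standard definitions of the mean absolute error, the root mean square error, and the Pearson correlation coefficient. The function names and the sample values are hypothetical and are not taken from the paper's data.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical actual and predicted values.
actual_values = [1.0, 2.0, 3.0, 4.0]
predicted_values = [1.5, 2.5, 2.5, 4.5]
print(mae(actual_values, predicted_values),
      rmse(actual_values, predicted_values),
      correlation(actual_values, predicted_values))
```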
To compare the forecasting performance of the single BP model, the common PCA-BP model, and the proposed ICA-BP model, we give the BP neural network in all three models a similar architecture and the same parameters, since all three contain a BP neural network. This isolates the effect of processing the input data of the BP neural network with the PCA and ICA methods. For the BP neural network, we set only one hidden layer. The number of neural nodes in the input layer is $I$ (it is different for the three models), the number of neural nodes in the hidden layer is set to $2I + 1$ according to the empirical formula, and the number of neural nodes in the output layer is 1. We can use $I$-$(2I+1)$-$1$ to represent the architecture of the network. The maximum number of training cycles is 1000, the minimum error threshold is 0.0001, the learning rate is 0.1, and the same activation function is used throughout. In the single BP model, the number of neural nodes in the input layer is 6, corresponding to the daily open price, daily closing price, daily highest price, daily lowest price, daily volume, and daily turnover; the number of neural nodes in the hidden layer is 13; the number of neural nodes in the output layer is 1, corresponding to the closing price of the next trading day; and the architecture is 6-13-1. In the common PCA-BP model, after analyzing the six original time series by the PCA method, we obtain two PCs (see Section 3.1); the number of neural nodes in the input layer is 2, corresponding to the two PCs, the number of neural nodes in the hidden layer is 5, the output layer is the same as in the single BP model, and the architecture is 2-5-1.
In the proposed ICA-BP model, after analyzing the six time series by the ICA method and eliminating the one IC that represents the noise, we obtain five ICs (see Section 3.2); the number of neural nodes in the input layer is 5, corresponding to the five ICs, the number of neural nodes in the hidden layer is 11, the output layer is again the same as in the single BP model, and the architecture is 5-11-1. The architectures of all three models thus have the form $I$-$(2I+1)$-$1$.
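The three reported architectures (6-13-1, 2-5-1, and 5-11-1) are all consistent with a hidden layer of $2I + 1$ nodes for $I$ input nodes. The short Python sketch below assumes that rule and reproduces the three architectures; the function name `bp_architecture` is hypothetical.

```python
def bp_architecture(n_inputs, n_outputs=1):
    """Return (input, hidden, output) node counts, assuming 2 * I + 1 hidden nodes."""
    return (n_inputs, 2 * n_inputs + 1, n_outputs)


# Input sizes of the single BP, PCA-BP, and ICA-BP models, respectively.
for model_name, n_inputs in [("single BP", 6), ("PCA-BP", 2), ("ICA-BP", 5)]:
    print(model_name, "-".join(str(v) for v in bp_architecture(n_inputs)))
```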
4.3. The Comparisons of Forecasting Results
For comparing the forecasting performance of the proposed ICA-BP model with the common PCA-BP model and the single BP model, the two sets of data (Set 1 and Set 2) are, respectively, used for the empirical study.
(I) Firstly, Table 4 gives the forecasting results of the daily SHCI closing price for the three forecasting models on the Set 1 data. It can be observed that, in the proposed ICA-BP model, the MAE is 68.5315, the RMSE is 90.3209, and $R$ is 0.9334. The MAE and the RMSE are smaller and $R$ is larger than those of the other two models. We can conclude that the proposed ICA-BP model outperforms the other two models and that the common PCA-BP model outperforms the single BP model. Figure 6 leads to the same conclusion.
Table 5 and Figure 7 both show the forecasting result of daily SHCI closing price with the three forecasting models by using Set 2 data. The conclusion is similar to that of Set 1, that is, the proposed ICA-BP model has the best performance and the common PCA-BP outperforms the single BP model.
(II) Comparing Table 4 with Table 5, we can also see that the proposed ICA-BP model distinctly outperforms the common PCA-BP model in Set 1. The MAE values are 68.5315 and 84.3123, respectively, a difference of about 16, and the RMSE values are 90.3209 and 119.5324, respectively, a difference of about 29. In Set 2, by contrast, the MAE values are 51.5165 and 56.4246, respectively, a difference of about 5, and the RMSE values are 70.8551 and 80.9682, respectively, a difference of about 10. This shows that the performance of the common PCA-BP model becomes closer to that of the proposed ICA-BP model. It suggests that the denoising ability of the ICA method is clearly better than that of the PCA method in relatively small samples, but that the denoising ability of the PCA method is closer to that of the ICA method in relatively large samples. This may be because the PCA method is based on a Gaussian assumption and the ICA method is based on a non-Gaussian assumption. In the case of a small sample, the corresponding distribution usually deviates from the Gaussian distribution.
(III) In this part, we consider the statistical behaviors of the price returns in the Shanghai stock market and the relative error of the forecasting results in Set 2. The formulas of the stock logarithmic return $r(t)$ and the relative error $e(t)$ are given as follows: $r(t) = \ln A(t+1) - \ln A(t)$, $e(t) = (A(t) - P(t))/A(t)$, where $A(t)$ and $P(t)$, respectively, denote the actual value and the predicted value of the daily closing price of the SHCI at the date $t$, $t = 1, 2, \dots$. In Figure 8, we consider the fluctuation of the daily SHCI return and the relative error of the forecasting result from the single BP model. Similarly, Figure 9 plots the daily SHCI return and the relative error of the forecasting result from the common PCA-BP model, and Figure 10 plots the daily SHCI return and the relative error of the forecasting result from the proposed ICA-BP model. From Figures 8–10, it can be seen that all three models have some points with large relative forecasting errors. These points appear mainly where the return volatility is large (marked in Figures 8–10). This indicates that the predictions of the three models at points where prices fluctuate violently are relatively unsatisfactory. The marked parts in Figures 6 and 7 also support this observation.
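The logarithmic return and the relative error can be computed as in the following Python sketch. It assumes that the return at date $t$ is the difference of the logarithms of consecutive closing prices and that the relative error is the actual value minus the predicted value, divided by the actual value; the sign convention and the sample prices are assumptions made for illustration.

```python
import math


def log_returns(prices):
    """Logarithmic returns ln A(t+1) - ln A(t) of a price series."""
    return [math.log(prices[t + 1]) - math.log(prices[t])
            for t in range(len(prices) - 1)]


def relative_errors(actual, predicted):
    """Relative errors (A(t) - P(t)) / A(t); the sign convention is an assumption."""
    return [(a - p) / a for a, p in zip(actual, predicted)]


# Hypothetical closing prices and predictions.
actual_prices = [100.0, 110.0, 99.0]
predicted_prices = [95.0, 108.0, 104.0]
print(log_returns(actual_prices))
print(relative_errors(actual_prices, predicted_prices))
```

Plotting the two resulting series against the date is one way to check whether large relative errors coincide with large absolute returns.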
5. Conclusion
In the present paper, we investigate and forecast the fluctuations of the Shanghai stock market. The independent component analysis method and the principal component analysis method are introduced into the neural network model to forecast the stock price. In the proposed ICA-BP model, the input data are first analyzed by ICA to obtain several latent ICs; after the IC that represents the noise is identified and eliminated, the remaining ICs are used as the input of the BP model. Further, an empirical study compares the actual daily SHCI closing price with the predicted values of the three models, and the relative errors of the forecasting results are analyzed. Empirical results show that the ICA-BP model outperforms the other two models.
The authors are supported in part by the National Natural Science Foundation of China Grant no. 70771006 and no. 10971010.
R. Gamberini, F. Lolli, B. Rimini, and F. Sgarbossa, “Forecasting of sporadic demand patterns with seasonality and trend components: an empirical comparison between Holt-Winters and (S)ARIMA methods,” Mathematical Problems in Engineering, vol. 2010, Article ID 579010, 14 pages, 2010.
R. Gaylord and P. Wellin, Computer Simulations with Mathematica: Explorations in the Physical, Biological and Social Science, Springer, New York, NY, USA, 1995.
K. Ilinski, Physics of Finance: Gauge Modeling in Non-equilibrium Pricing, John Wiley, New York, NY, USA, 2001.
M. F. Ji and J. Wang, “Data analysis and statistical properties of Shenzhen and Shanghai land indices,” WSEAS Transactions on Business and Economics, vol. 4, pp. 29–33, 2007.
Q. D. Li and J. Wang, “Statistical properties of waiting times and returns in Chinese stock markets,” WSEAS Transactions on Business and Economics, vol. 3, pp. 758–765, 2006.
T. C. Mills, The Econometric Modelling of Financial Time Series, Cambridge University Press, Cambridge, UK, 2nd edition, 1999.
E. D. McKenzie, “General exponential smoothing and the equivalent ARMA process,” Journal of Forecasting, vol. 3, pp. 333–344, 1984.
B. Y. Lu, Y. L. Chen, and Y. Y. Li, “The forecast of the pre-processing data with BP neural network and principal component analysis,” Science & Technology Information, vol. 17, pp. 29–30, 2009.
A. Back and A. Weigend, “Discovering structure in finance using independent component analysis,” in Proceedings of 5th International Conference on Neural Networks in Capital Market, pp. 15–17, Kluwer Academic Publishers, 1997.
A. Hyvarinen, Independent Component Analysis, Mechanic Industry Press, 2007.
Z. Q. Yang, Y. Li, and D. W. Hu, “Independent component analysis: a survey,” Acta Automatica Sinica, vol. 28, no. 5, pp. 762–772, 2002.
K. Kiviluoto and E. Oja, “Independent component analysis for parallel financial time series,” in Proceeding of the 5th International Conference on Neural Information, pp. 895–898, 1998.
L. Q. Han, Theory, Design and Application of Artificial Neural Network, Chemical Industry Press, 2002.
E. M. Azoff, Neural Network Time Series Forecasting of Financial Market, Wiley, New York, NY, USA, 1994.
Y. Ouyang, “Evaluation of river water quality monitoring stations by principal component analysis,” Water Research, vol. 39, pp. 2621–2635, 2005.
L.-N. Yang, L. Peng, L.-M. Zhang, L.-I. Zhang, and S.-S. Yang, “A prediction model for population occurrence of paddy stem borer (Scirpophaga incertulas), based on Back Propagation Artificial Neural Network and Principal Components Analysis,” Computers and Electronics in Agriculture, vol. 68, no. 2, pp. 200–206, 2009.