Special Issue: Machine Learning Applications in Complex Economics and Financial Networks
Advantages of Combining Factorization Machine with Elman Neural Network for Volatility Forecasting of Stock Market
Stock market dynamics forecasting has received much attention in finance. Predicting stock market fluctuations is challenging because stock price time series are nonlinear and nonstationary. The Elman recurrent network is known for its capability of handling dynamic information, which has made it a successful tool for prediction. We develop a hybrid approach that combines the Elman recurrent network with the factorization machine (FM) technique, i.e., the FM-Elman neural network, to predict stock market volatility. The Standard & Poor’s 500 Composite Stock Price (S&P 500) index, the Dow Jones Industrial Average (DJIA) index, the Shanghai Stock Exchange Composite (SSEC) index, and the Shenzhen Securities Component Index (SZI) are used to demonstrate the validity of the proposed FM-Elman model in time-series prediction. The results are compared with predictions from two other models, the basic BP neural network and the Elman neural network. Experiments show that the FM-Elman model outperforms both under several accuracy measures. Furthermore, the effect of the volatility degree on prediction performance is investigated for the different stock indexes. Numerical experiments also reveal an interesting dependence of the FM-Elman neural network's performance on the user-specified dimension.
In recent years, the fluctuation analysis of financial time series has attracted considerable attention, and stock market volatility prediction has become a significant topic in economic research. Volatility forecasts can help policy makers take appropriate decisions on asset allocation and risk management, so predicting the volatility of financial time series with reasonable accuracy deserves study. However, the stock market is nonlinear and chaotic in nature [1, 2]. Statistical models have difficulty handling nonlinear and nonstationary time series, and they struggle to deliver satisfactory forecasts under the assumption of normally distributed observations, which makes prediction more challenging.
Artificial neural networks have the advantage of learning from sample data and capturing the nonlinear relations among interconnected neurons through training. They can handle nonlinear high-dimensional data and approximate any nonlinear function with arbitrary precision [4–7]. In particular, the simple recurrent network, i.e., the Elman neural network (Elman NN), has shown stronger ability because it is time-varying. The Elman NN is a feedback network in which the added layer connected to the hidden layer acts as a time delay operator capable of memorizing recent events. It is a time-varying predictive control system with faster convergence and more accurate mapping ability.
The Elman NN has been applied to financial prediction and to many other types of time series, and most studies on the Elman NN report improved accuracy. Zheng used an Elman NN to forecast opening prices on the Shanghai Stock Exchange. Wu and Duan applied the Elman NN to predicting stock and gold futures markets, respectively. In the area of electricity prediction, Rani and Victoire integrated a decomposition method and the group search optimization algorithm into the Elman NN and showed that it outperformed other approaches.
There are also other artificial neural networks, such as the wavelet neural network and the radial basis function neural network [13–16]. Other artificial intelligence techniques, such as expert systems [17, 18], support vector machines (SVMs) [19, 20], and hybrid methods [21, 22], have also been applied to forecasting stock prices. Recently, novel models that combine random jumps or random time-effective functions with different neural networks [23, 24] have been proposed for forecasting financial markets.
Although models based on artificial intelligence have achieved remarkable results, they still have limitations. Few techniques in most models pay attention to the nonlinear interactions among the inputs. For example, the nonlinearities in neural network models are handled by the activation functions. Such models, which do not consider interactions between features with different scales, have been widely used in applications such as image processing, machine translation, and speech recognition [25–27].
FM was first introduced by Rendle and was originally used for collaborative recommendation. FM is a supervised learning method that can model second-order feature interactions even when the data are very sparse and high dimensional. FMs show state-of-the-art performance because they have two main benefits. First, FMs achieve empirical accuracy on a par with polynomial regression while being smaller and faster to evaluate. Second, unlike linear regression, FMs can infer the weights of feature interactions that were not observed in the training dataset. The weights of the second-order feature interactions have a low-rank structure, which has made FMs increasingly popular in recommender systems. FM generalizes matrix factorization and is more flexible, since the matrix factorization method only models the relation between two entities. FMs are general predictors like SVMs and have many applications in industry. They work with any real-valued feature vector and are not restricted to recommender systems. FM gives a promising direction for prediction in regression, classification, and ranking [30–33].
Real-world time series are rarely time-invariant, and linear regression cannot always capture the interactions between features that are common in applications. The problem of handling both time variation and nonlinear interactions can therefore be addressed by combining FM with the Elman NN. Moreover, it is almost universally agreed in the forecasting literature that no single model is best in every situation, because real-world problems are often complex and a single model may not capture different patterns equally well. Therefore, we propose a forecasting model that combines the FM technique with the Elman NN for stock market volatility prediction.
In this paper, we apply the FM-Elman neural network to forecast the behavior of the volatility degree of the Standard & Poor’s 500 Composite Stock Price (S&P 500) index, the Dow Jones industrial average (DJIA) index, the Shanghai Stock Exchange Composite (SSEC) index, and the Shenzhen Securities Component index (SZI) from January 2nd, 2000, to December 31st, 2011. Different threshold values are introduced into our model, and the corresponding volatility prediction results are presented. To show the advantages of the proposed FM-Elman model, we compare its predictions with those of two other neural network models, the BP network and the Elman recurrent network, using three performance measures: the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE).
The remainder of this paper is organized as follows. Section 2 reviews the Elman NN and FM, which underlie our proposed model. Section 3 presents the FM-Elman neural network prediction model: we first describe the model and introduce its ingredients and then give its training algorithm. Section 4 presents the main forecasting results of the FM-Elman model, compares it with the BP neural network and the Elman neural network, examines the effects of parameters such as the volatility degree and the user-specified dimension on its performance, and considers further evaluation measures. Section 5 concludes.
2. Elman Neural Network and Factorization Machine
2.1. Elman Neural Network (Elman NN)
The Elman neural network was introduced by Elman in 1990 and is known for its recurrent topology. Unlike the BP network, an Elman NN has a set of recurrent nodes. These recurrent nodes act as a buffer: each receives the output of its paired node in the hidden layer and then transmits it back to the hidden layer. Every hidden node is connected to exactly one recurrent neuron, and the message is unchanged in transmission. Hence, the number of recurrent-layer nodes equals the number of hidden nodes, and the recurrent layer holds the state passed from the hidden layer.
Figure 1 gives the structure of the multi-input Elman NN. The Elman NN is composed of the input layer, the hidden layer, the output layer, and the recurrent layer. There are $n$ nodes in the input layer, and both the hidden layer and the recurrent layer have $m$ nodes. The output layer contains a single neuron. The nonlinear state equations of the Elman NN are

$$h(t) = f\left(W^{(1)} x(t) + W^{(2)} h(t-1)\right), \qquad y(t) = g\left(W^{(3)} h(t)\right), \tag{1}$$

where $h(t)$ is the vector of output values in the hidden layer, $y(t)$ is the final output of the network, and $x(t)$ denotes the input of the network at time $t$. The weight matrix $W^{(1)}$ connects the input layer nodes to the nodes in the hidden layer, $W^{(2)}$ connects the nodes in the recurrent layer to the hidden layer neurons, and $W^{(3)}$ is the matrix which connects the nodes in the hidden layer to the output node. Functions $f$ and $g$ are the activation functions, where $f$ is the sigmoid function and $g$ is the identity function in this paper.
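From equation (1), substituting the hidden state of the previous time step, we obtain

$$h(t) = f\left(W^{(1)} x(t) + W^{(2)} f\left(W^{(1)} x(t-1) + W^{(2)} h(t-2)\right)\right), \tag{2}$$

where the hidden output $h(t)$ depends on the input-to-hidden weight matrix $W^{(1)}$ and the recurrent-to-hidden weight matrix $W^{(2)}$ acting on inputs $x$ from different times. The Elman NN can therefore adapt to time-varying series.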
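The recurrence described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, assuming a sigmoid hidden activation and an identity output as stated in the text; the zero initial state of the recurrent layer, the function names, and the weight layout are our assumptions and are not taken from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic activation used for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def elman_forward(
    inputs: Sequence[Sequence[float]],
    w_in: Sequence[Sequence[float]],   # m x n: input layer -> hidden layer
    w_rec: Sequence[Sequence[float]],  # m x m: recurrent layer -> hidden layer
    w_out: Sequence[float],            # length m: hidden layer -> single output
) -> List[float]:
    """Run an Elman network over a sequence and return one output per time step."""
    m = len(w_in)
    context = [0.0] * m  # assumption: the recurrent layer starts at zero
    outputs: List[float] = []
    for x in inputs:
        hidden = []
        for j in range(m):
            z = sum(w_in[j][i] * x[i] for i in range(len(x)))
            z += sum(w_rec[j][l] * context[l] for l in range(m))
            hidden.append(sigmoid(z))
        # Identity output activation: a plain weighted sum of hidden values.
        outputs.append(sum(w_out[j] * hidden[j] for j in range(m)))
        # Each hidden node is copied to its paired recurrent node (weight one).
        context = hidden
    return outputs
```

Because the recurrent layer is overwritten with the hidden values at every step, the output at time $t$ depends on all earlier inputs, which is the memory effect the text attributes to the Elman NN.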
2.2. Factorization Machine
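FM has the same prediction ability as SVMs and can additionally estimate reliable parameters under very sparse data. Its modelling of all variable interactions is comparable to a polynomial kernel in an SVM. The model equation for an FM with second-order features is defined as follows:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j, \tag{3}$$

where the parameters $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^{n}$, and $V \in \mathbb{R}^{n \times k}$ have to be determined, and $\langle \cdot, \cdot \rangle$ is the inner product of two vectors of size $k$. Then,

$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} v_{j,f},$$

which models the interaction between the $i$th variable and the $j$th variable, where $v_i$ is the row of $V$ that describes the $i$th variable with $k$ factors.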
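A straightforward computation of equation (3) has complexity $O(kn^2)$ because all pairwise interactions have to be computed. As no model parameter depends directly on two variables, the pairwise interactions in equation (3) can be reformulated as follows:

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2} x_i^{2} \right],$$

where $x_i$ is the $i$th of the $n$ input variables and $v_{i,f}$ is the $f$th of its $k$ factors.
After the reformulation, the equation can be computed in linear time $O(kn)$. So, FMs are practical from a computational point of view.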
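To make the runtime argument concrete, the following sketch computes the pairwise interaction term of a second-order FM in two ways: directly over all pairs, and with the linear-time reformulation. The function names are hypothetical and the code is a plain-Python illustration, not the authors' implementation.

```python
from typing import Sequence


def fm_pairwise_naive(x: Sequence[float], v: Sequence[Sequence[float]]) -> float:
    """Sum <v_i, v_j> * x_i * x_j over all pairs i < j (quadratic in len(x))."""
    n, k = len(x), len(v[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(v[i][f] * v[j][f] for f in range(k))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_linear(x: Sequence[float], v: Sequence[Sequence[float]]) -> float:
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - sq
    return 0.5 * total


def fm_predict(
    x: Sequence[float], w0: float, w: Sequence[float], v: Sequence[Sequence[float]]
) -> float:
    """Second-order FM prediction: bias + linear term + pairwise interactions."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_linear(x, v)
```

Both pairwise functions return the same value up to floating-point rounding; the second touches each factor of each variable only once, which is where the linear runtime comes from.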
3. Our Proposed Method
We construct the Elman recurrent neural network with factorization machine, i.e., FM-Elman neural network, to predict the volatility of different stock indexes. The detailed topology of FM-Elman neural network is presented in Figure 2.
The layers of the FM-Elman neural network are analyzed as follows:
(1) Hidden Layer. The nodes in the hidden layer are partitioned into two parts. One part consists of normal nodes, which capture the linear relations among the input data. The remaining nodes incorporate all interactions between each pair of features from the input data. The values of the hidden nodes are computed from the following quantities: the input value from each input node; the undetermined weights relating each input node to each normal node in the hidden layer; the undetermined weights connecting each input node to each remaining node in the hidden layer; the two corresponding sets of weights, with the same meaning, that link the nodes in the recurrent layer to the hidden layer; the iteration number; the user-specified dimension; and the activation function.
(2) Recurrent Layer. The number of nodes in the recurrent layer is the same as the number of hidden nodes. Each hidden node is connected to only one node in the recurrent layer, and the connection weight is the constant one. So, the recurrent layer is also partitioned into two parts.
(3) Output Layer. The output value of each node in the output layer is computed from undetermined weights and an activation function. There are different loss functions to calculate the error between the actual and the estimated values from the output layer. We consider the squared loss function between the estimated value and the actual value in each iteration.
So, the final output error of the FM-Elman model is computed by
To optimize the FM-Elman model, we use the stochastic gradient descent method to update the weights until convergence.
3.2. Algorithm of FM-Elman Model
The training process of the FM-Elman model is detailed as follows:
(1) The gradients of the weights in the output layer and the corresponding update rule are computed using the learning rate.
(2) The gradients of the weights that connect the recurrent layer to the hidden layer and the corresponding update rule are calculated in two cases, using the learning rate, the user-specified dimension, and the derivatives of the corresponding activation functions.
(3) The gradients of the weights that connect the input layer to the hidden layer and the corresponding update rule are likewise computed in two cases, using the learning rate, the user-specified dimension, and the derivatives of the corresponding activation functions.
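As a rough illustration of the first step only, the sketch below performs one stochastic gradient descent update of output-layer weights. It assumes an identity output activation and a squared loss of the form $\frac{1}{2}(d - y)^2$; both the factor $\frac{1}{2}$ and every name in the code are our assumptions, and the hidden-layer and recurrent-layer updates of the FM-Elman model are not reproduced here.

```python
from typing import List, Sequence, Tuple


def sgd_output_step(
    weights: Sequence[float],
    hidden: Sequence[float],
    target: float,
    learning_rate: float,
) -> Tuple[List[float], float]:
    """One SGD step on output weights; returns (new_weights, loss_before_update)."""
    prediction = sum(w * h for w, h in zip(weights, hidden))
    error = target - prediction
    loss = 0.5 * error * error
    # For loss = 0.5 * (target - prediction)^2 and a linear output,
    # d(loss)/d(w_j) = -error * hidden_j, so descending the gradient adds
    # learning_rate * error * hidden_j to each weight.
    new_weights = [w + learning_rate * error * h for w, h in zip(weights, hidden)]
    return new_weights, loss
```

Repeating the step on the same sample shrinks the loss as long as the learning rate is small enough, which is the convergence behaviour the training loop relies on.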
4. Forecasting Results
4.1. Data Selecting and Processing
The changing behaviors of stock prices and the prediction of their volatility have long been a focus of economic research. We use the logarithmic return to describe the statistical characteristics of stock return volatility. The stock logarithmic return is defined as

$$r(t) = \ln P(t) - \ln P(t-1),$$

where $P(t)$ denotes the stock daily closing price at time $t$.
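A minimal sketch of the logarithmic return computation is given below. It assumes the common convention that the return at a given day is the difference between the log closing price of that day and of the previous trading day; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def log_returns(closing_prices: Sequence[float]) -> List[float]:
    """Daily logarithmic returns; the result has one element fewer than the input."""
    if any(p <= 0 for p in closing_prices):
        raise ValueError("closing prices must be positive")
    return [
        math.log(closing_prices[t]) - math.log(closing_prices[t - 1])
        for t in range(1, len(closing_prices))
    ]
```

Logarithmic returns are additive over time, so the sum of the daily values equals the log of the ratio between the last and the first closing price.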
The data (http://www.finance.yahoo.com) chosen for our experiment are the Standard & Poor’s 500 Composite Stock Price (S&P 500) index, the Dow Jones industrial average (DJIA) index, the Shanghai Stock Exchange Composite (SSEC) index, and the Shenzhen Securities Component index (SZI). All the data are collected on trading days from January 2nd, 2000, to December 31st, 2011, so the four time series contain 3019, 3027, 2901, and 2902 observations, respectively. We partition them into training sets (from January 2nd, 2000, to December 31st, 2007) and testing sets (from January 2nd, 2008, to December 31st, 2011). Table 1 describes the statistical features of the data, including the sizes of the whole data and of the testing samples.
In this paper, a threshold value $\theta$ is introduced as the volatility degree. Let $R_\theta$ denote the set of stock returns whose absolute values are greater than $\theta$, i.e., $R_\theta = \{ r(t) : |r(t)| > \theta \}$, where $r(t)$ is the logarithmic return at time $t$. Once the value of $\theta$ is set, we obtain the dataset of the stock daily closing prices that satisfy this condition. Figure 3 gives an example of stock returns with different thresholds. For a fixed threshold value $\theta$, the corresponding stock trading dates are determined: they are the times $t$ that satisfy $|r(t)| > \theta$. The newly formed series are arranged in chronological order. Table 2 gives the numbers of data points for the stock indexes in the training and testing data under different threshold values $\theta$.
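The threshold selection can be sketched as a simple filter that keeps, in chronological order, only the dates whose absolute return exceeds the chosen volatility degree. The function names and the date labels in the usage note are hypothetical; the strict inequality follows the wording "greater than" in the text.

```python
from typing import Dict, List, Sequence, Tuple


def filter_by_threshold(
    dates: Sequence[str], returns: Sequence[float], threshold: float
) -> List[Tuple[str, float]]:
    """Keep (date, return) pairs with |return| > threshold, preserving input order."""
    if len(dates) != len(returns):
        raise ValueError("dates and returns must have the same length")
    return [(d, r) for d, r in zip(dates, returns) if abs(r) > threshold]


def count_by_threshold(
    returns: Sequence[float], thresholds: Sequence[float]
) -> Dict[float, int]:
    """Number of returns whose absolute value exceeds each threshold."""
    return {th: sum(1 for r in returns if abs(r) > th) for th in thresholds}
```

For example, calling `count_by_threshold(returns, [0.003, 0.006, 0.009, 0.012])` on a return series gives counts that can only stay equal or shrink as the threshold grows, in line with the trend reported for Table 2.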
When the threshold value equals 0, the average daily absolute returns for S&P 500, DJIA, SSEC, and SZI are 0.0094, 0.0089, 0.0117, and 0.0130, respectively. We set the volatility degree to 0.003, 0.006, 0.009, and 0.012 to see how many data points fall in the training and testing datasets. In Table 2, as the threshold value increases, the number of data points exceeding the threshold in both the training and testing datasets gradually decreases. It can be expected that, when the threshold is larger than 0.012, the corresponding numbers will be smaller still.
Four input variables, the daily opening price, the daily highest price, the daily lowest price, and the daily closing price, are selected on the newly formed dates. We choose the closing price at the next date in the chronologically ordered dataset as the output variable. In order to reduce the impact of noise in the stock markets, all the input data are normalized as follows:

$$S(t)' = \frac{S(t) - \min S(t)}{\max S(t) - \min S(t)},$$

where $S(t)$ is a data series and the minimum and maximum are taken over that series.
Then, the actual prediction value is easily obtained through $S(t) = S(t)' \left( \max S(t) - \min S(t) \right) + \min S(t)$.
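The normalization and its inverse can be sketched as follows, assuming min-max scaling to the interval from 0 to 1 with the minimum and maximum taken over one series. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def minmax_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale a series to [0, 1]; also return the (min, max) needed to invert it."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("series is constant; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def minmax_denormalize(scaled: float, lo: float, hi: float) -> float:
    """Map a normalized value (for instance, a network output) back to price units."""
    return scaled * (hi - lo) + lo
```

The minimum and maximum returned by `minmax_normalize` have to be stored, since the network output lives on the normalized scale and must be mapped back before it is compared with actual prices.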
4.2. Performances of FM-Elman Model
In the proposed FM-Elman neural network, we choose a structure in which the number of input nodes is 4, the number of hidden nodes is 10, and the number of output nodes is 1. We set the maximum number of iterations to 5000, , and the predefined minimum training threshold is .
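To analyze and evaluate the predicting performance of the FM-Elman neural network model, we use the following accuracy measures:

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} |d_t - y_t|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (d_t - y_t)^2}, \qquad \mathrm{MAPE} = 100 \times \frac{1}{N} \sum_{t=1}^{N} \left| \frac{d_t - y_t}{d_t} \right|,$$

where $d_t$ and $y_t$ are the actual value and the predictive value in the $t$th iteration and $N$ is the sample size. The smaller the values of these measures, the better the prediction performance.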
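The three accuracy measures can be computed with a few lines of plain Python. The sketch assumes the standard definitions, with MAPE expressed in percent; the function names are hypothetical.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(d - y) for d, y in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((d - y) ** 2 for d, y in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((d - y) / d) for d, y in zip(actual, predicted)) / len(actual)
```

A windowed variant over the most recent observations can be obtained by slicing both sequences before the call, for example `mape(actual[-100:], predicted[-100:])`.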
In this section, we study the fluctuation behaviors of stock prices through the proposed FM-Elman neural network. First, the proposed model is shown to perform better than the BPNN and the Elman neural model under different evaluation measures. Then, the prediction performance of the FM-Elman neural network is measured as the threshold value varies. Finally, we examine how the user-specified dimension affects the performance of the proposed model.
When the threshold value equals 0, the training datasets and testing datasets are the original datasets. And the prediction results of S&P500 and SSEC by the FM-Elman neural model with are presented in Figures 4 and 5.
We then give the performance comparisons among different prediction models for in Table 3, where MAPE (100) means the MAPE over the latest 100 days of the testing data. The three prediction models are the BPNN, the Elman neural network, and the FM-Elman neural network with the user-specified dimension . Table 3 shows that the FM-Elman model’s evaluation errors are all smaller than those of the other two models. In addition, the MAPE (100) value is smaller than the corresponding stock index’s MAPE value, which suggests that short-term prediction outperforms long-term prediction.
4.3. The Impact of the Volatility Degree
When the threshold value varies from 0.003 to 0.012, the prediction of the indexes S&P 500, DJIA, SSEC, and SZI can be analyzed with the FM-Elman neural network. Figures 6 and 7 show the prediction analysis of S&P 500 and SSEC by the FM-Elman neural model. The two figures also show the effectiveness of forecasting under different volatility degree values. The empirical results indicate that volatility prediction performs better when the threshold is small: in Figures 6(a) and 6(b), the predictive values are closer to the actual values than those in Figures 6(c) and 6(d). Figure 7 indicates similar results.
We choose the often recommended MAPE criterion to measure the prediction performance of the FM-Elman neural model for the stock indexes S&P 500, DJIA, SSEC, and SZI; the results are presented in Table 4. As the threshold value increases, the MAPE increases gradually. The numerical experiments show that forecasting the volatility degree with the FM-Elman neural network model is feasible.
4.4. The Impact of the User-Specified Dimension
In this subsection, we analyze how the user-specified dimension affects the prediction performance of the FM-Elman neural model through numerical experiments when . From the description in Section 3, as the user-specified dimension increases, the number of red nodes in both the hidden layer and the recurrent layer becomes larger. This means that more hidden and recurrent nodes in the proposed FM-Elman neural network contain interaction information from the connected inputs, and the computation becomes more complicated. It is interesting to see in Table 5 that the MAE, MAPE, and RMSE values for the indexes S&P 500, DJIA, SSEC, and SZI first increase and then decrease as the user-specified dimension increases. So, either a low or a high user-specified dimension is the better choice. Whichever dimension is chosen, the FM-Elman model outperforms the BPNN and the Elman NN results in Table 3.
4.5. Further Predicting Performance Evaluation
We adopt three trend-type statistical measures, i.e., directional symmetry (DS), correct up-trend (CP), and correct down-trend (CD), to check the practical stock movement. The larger the values of these three measures, the more precise the forecast of the direction of change. DS is computed over all testing samples, CP over the testing samples that satisfy the up-trend condition, and CD over the testing samples that satisfy the down-trend condition; in each case, the actual value and the predictive value in each iteration are compared.
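The sketch below shows one common formulation of the three trend measures, expressed in percent. It is an assumption rather than the paper's exact definition: a step counts as correct when the actual change and the predicted change have the same sign (their product is nonnegative), and CP and CD are normalized by the number of actual upward and downward moves, respectively. All names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def trend_measures(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, Optional[float]]:
    """Return DS, CP, and CD in percent; a measure is None if it has no samples."""
    steps = len(actual) - 1
    ds_hits = up_total = up_hits = down_total = down_hits = 0
    for t in range(steps):
        actual_change = actual[t + 1] - actual[t]
        predicted_change = predicted[t + 1] - predicted[t]
        if actual_change * predicted_change >= 0:
            ds_hits += 1
        if actual_change > 0:  # assumption: condition on the actual up move
            up_total += 1
            if predicted_change > 0:
                up_hits += 1
        elif actual_change < 0:  # assumption: condition on the actual down move
            down_total += 1
            if predicted_change < 0:
                down_hits += 1

    def percent(hits: int, total: int) -> Optional[float]:
        return 100.0 * hits / total if total else None

    return {
        "DS": percent(ds_hits, steps),
        "CP": percent(up_hits, up_total),
        "CD": percent(down_hits, down_total),
    }
```

Under this formulation, a value of 50 corresponds to getting the direction right half of the time, which is why values above 50 are read as a useful directional forecast.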
Table 6 presents the trend-type measures DS, CP, and CD for the stock indexes S&P 500, DJIA, SSEC, and SZI under varying volatility degrees. When the threshold value changes, the measures for all the stock indexes change only slightly. The direction forecasts for SSEC and SZI perform better than those for S&P 500 and DJIA, since the DS, CP, and CD values of the former two indexes all exceed 50. The results for S&P 500 and DJIA are more sensitive to large threshold values. For example, when the threshold changes from 0.009 to 0.012, the values of DS, CP, and CD for S&P 500 and DJIA decrease sharply.
In this study, we developed an improved Elman recurrent neural network by introducing the factorization machine. Through extensive numerical experiments on data from the stock indexes S&P 500, DJIA, SSEC, and SZI, we demonstrated the effectiveness of the FM-Elman neural network. The prediction accuracy for all financial time series shows that the proposed FM-Elman model outperforms the BP neural network and the original Elman NN. We selected training and testing datasets under different volatility degrees, i.e., varying threshold values, for prediction. The prediction performance of the FM-Elman model degrades as the threshold becomes larger. We also investigated the effect of the user-specified dimension on the prediction performance of the FM-Elman neural model.
The contribution of this work includes the following two points: (1) we combine the FM technique with the Elman NN to form an FM-Elman neural model for nonstationary analysis that enjoys the benefits of both FM and the Elman NN, and (2) we demonstrate the prediction accuracy under various metrics. The numerical experiments show improvements in prediction accuracy over the BP neural network and the Elman NN. However, a limitation of this research is that the proposed model is data dependent, so excellent predictions are not guaranteed on all datasets. Further study of high-order interactions among the inputs also remains challenging.
Combining FM with neural networks is likely to improve performance in classification and regression more broadly, which will be useful for future studies. We believe that FM can be used in conjunction with other deep learning networks such as LSTM to form high-quality prediction methods. Various combinations of techniques and approaches can be investigated in the future to solve problems arising in different applications.
The data used to support this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This research was supported by the R&D Program of Beijing Municipal Education Commission (grant no. KJZD20191000401). This research was also supported by the Program of the Co-construction with Beijing Municipal Commission of Education of China (grant nos. B20H100020 and B19H100010) and funded by the Key Project of Beijing Social Science Foundation Research Base (grant no. 19JDYJA001). Fang Wang is grateful for the support by the Beijing Laboratory of National Economic Security Early-Warning Engineering (grant no. B19H100030).
J. Y. Zheng, “Forecast of opening stock price based on Elman neural network,” Chemical Engineering Transactions, vol. 46, pp. 565–570, 2015.
X. Y. Qian and S. Gao, “Financial series prediction: comparison between precision of time series models and machine learning methods,” 2018.
S. Rendle, “Factorization machines,” in Proceedings of the 2010 IEEE International Conference on Data Mining, pp. 995–1000, Sydney, Australia, December 2010.
X. He, H. Zhang, M. Y. Kan, and T. S. Chua, “Fast matrix factorization for online recommendation with implicit feedback,” in Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 549–558, Pisa, Italy, July 2016.
F. Zhou, H. Zhou, Z. Yang, and L. Yang, “EMD2FNN: a strategy combining empirical mode decomposition and factorization machine based neural network for stock market trend prediction,” Expert Systems with Applications, vol. 115, pp. 136–151, 2019.