Mathematical Problems in Engineering


Research Article | Open Access

Volume 2020 |Article ID 2746845 | https://doi.org/10.1155/2020/2746845

Can Yang, Junjie Zhai, Guihua Tao, "Deep Learning for Price Movement Prediction Using Convolutional Neural Network and Long Short-Term Memory", Mathematical Problems in Engineering, vol. 2020, Article ID 2746845, 13 pages, 2020. https://doi.org/10.1155/2020/2746845

Deep Learning for Price Movement Prediction Using Convolutional Neural Network and Long Short-Term Memory

Academic Editor: Petr Hájek
Received 24 Mar 2020
Revised 05 Jun 2020
Accepted 15 Jun 2020
Published 16 Jul 2020

Abstract

The prediction of stock price movement direction is significant in financial studies. In recent years, a number of deep learning models have gradually been applied to stock prediction. This paper presents a deep learning framework to predict price movement direction based on historical information in financial time series. The framework combines a convolutional neural network (CNN) for feature extraction and a long short-term memory (LSTM) network for prediction. Specifically, the CNN takes a three-dimensional input tensor that carries information on the time series, the technical indicators, and the correlation between stock indices. In this tensor, the technical indicators are converted into deterministic trend signals and the stock indices are ranked by the Pearson product-moment correlation coefficient (PPMCC). During training, a fully connected network is used to drive the CNN to learn a feature vector, which acts as the input of the concatenated LSTM. After both the CNN and the LSTM have been trained, they are used for prediction on the testing set. The experimental results demonstrate that the framework outperforms state-of-the-art models in predicting stock price movement direction.

1. Introduction

Financial time series prediction, particularly stock price movement prediction, has been one of the most difficult problems for investors and researchers. Accurately forecasting the direction of stock price movement plays a key role in deciding when to buy and sell a stock. However, stock prices are easily affected by macro- and microeconomic factors, such as interest rates, exchange rates, and monetary policy, which makes prediction a challenging task. Motivated by the great profits in stock market investment, researchers and speculators have focused on stock market prediction for decades. Traditional statistical methods such as logistic regression, exponential averaging, ARIMA, and GARCH have been used to predict stock price movement [1, 2]. However, statistical methods assume that the time series is generated by a linear process and therefore perform poorly in nonlinear stock price movement prediction. Accordingly, owing to their success on nonlinear problems, machine learning and deep learning methods have gradually been applied to forecasting stock price movement. Most of them perform two-stage prediction: features are extracted first and then used as input to a model that makes the predictions.

Feature extraction is one of the most important parts of the stock prediction process. Better market features generally contribute to better predictions. Technical analysis is most commonly used to extract features from the original market data [3]. Machine learning methods such as kNN, ANN, SVM, and RF are often used to learn the relationship between the features from technical analysis and price movement [3, 4]. Moreover, deep learning methods, especially CNNs, which have achieved great success in computer vision and image processing, are also used for feature extraction. A time series to image conversion approach was proposed in [5] to help the CNN extract useful features from financial variables. Nevertheless, that approach ignored the potential influence of correlated stock markets. To address this problem, a three-dimensional input tensor construction approach was designed in [6], which is capable of extracting features from correlated stock markets. Inspired by this idea, this paper also employs the three-dimensional input tensor construction approach for feature extraction. Another important part of the stock prediction process is selecting or enhancing a model. Recent studies have revealed that deep learning models are superior to traditional machine learning models in financial market prediction [7-10]. CNN [8], RNN [10], and LSTM [11] are commonly used deep learning models for predicting stock price movement. In addition, constructing hybrid models is a popular way to enhance model performance, for example, the SVM-ANN model [12], the CNN-SVM model [13], and the CNN-LSTM model [14-20].

In this study, we propose a hybrid model consisting of a CNN and an LSTM to predict the direction of stock price movement. On the one hand, we improve the three-dimensional input tensor from which the CNN extracts features. There are two differences between our approach and that of Hoseinzade and Haratizadeh [6]. First, Hoseinzade and Haratizadeh used a variety of financial variables, including stock prices, technical indicators, and stock indices from other markets, to construct a three-dimensional input tensor as the input of a specified CNN model. Their input tensor ignores both the effect of transforming the technical indicators and the degree of correlation between stock markets, while in our improved three-dimensional tensor, technical indicators are converted into deterministic trend signals following a certain rule and stock markets are ordered according to PPMCC. Second, the prediction model used in [6] is a specified CNN, while our approach employs a hybrid model consisting of a CNN and an LSTM. The hybrid model combines the advantages of the CNN in feature extraction with the advantages of the LSTM in time series prediction. On the other hand, we propose a CNN-LSTM model for stock price movement forecasting. Compared with other CNN-LSTM models [14-20], the main difference of our hybrid model lies in the CNN-based feature extraction module. Their feature extraction modules mainly aim at extracting features from one-dimensional or two-dimensional input variables, while ours targets a three-dimensional input tensor. Different purposes lead to different structures of the feature extraction modules. The final experimental results demonstrate that the improvement of the input tensor and the combination of CNN and LSTM can significantly improve the prediction performance of the model.

In brief, the main contributions of this work can be summarized as follows:
(1) We built an improved three-dimensional input tensor for CNN by converting the technical indicators into deterministic trend signals and using PPMCC to order the correlated stock indices.
(2) We designed a CNN-based feature extraction module, which is suitable for extracting features from the three-dimensional input tensor.
(3) Extensive experiments demonstrated that our improvement on the three-dimensional input tensor can significantly improve prediction performance, and our proposed model outperforms several state-of-the-art models in terms of F-measure.

The rest of the paper is organized as follows. The related work is introduced in Section 2. Section 3 proposes our framework and methods. Section 4 provides extensive experiments. Finally, the conclusion is drawn in Section 5.

2. Related Work

In the stock market forecasting domain, previous research approaches are usually categorized into two groups. One focuses on achieving better feature extraction from a series of financial variables. The other attempts to improve prediction performance by enhancing the models.

2.1. Feature Extraction

Extracting useful features from a diverse set of financial variables is one of the most important issues in stock price movement prediction. Better input features lead to better prediction performance. Technical analysis can be used to extract market features from the original financial variables, and stock prediction often relies on it to form the input features of the models. As reported by Shynkevich et al. [3], approximately 20% of stock market prediction models use technical indicators as input features. The models used for extracting market features from technical indicators are mainly machine learning models and deep learning models.

ANN and SVM are commonly used machine learning models for feature extraction in stock market prediction. Thenmozhi and Chand [21] used SVM to extract information transmission features from six global markets over the period from 1999 to 2011 to predict stock returns. Patel et al. [4] focused on investigating the effect of feature extraction on the prediction performance of models. They employed four machine learning models, which are ANN, SVM, RF, and NB, to extract features from ten technical indicators that were converted into deterministic trend signals and then made predictions in Indian stock markets. Their results showed that converting technical indicators into deterministic trend signals is beneficial to feature extraction and hence improving prediction performance.

As a typical deep learning model, CNN has exhibited great ability for feature extraction in computer vision and image processing. Recently, it has gradually been applied to extract market features in stock prediction. Persio and Honchar [22] used a CNN to extract features from a one-dimensional input variable obtained from the history of the close price. To compensate for the limited information in a one-dimensional input, researchers have attempted to provide richer financial variables for the CNN to extract market features. Some researchers directly used the candlestick chart as the input of the CNN [23, 24]. Furthermore, instead of directly taking the image as the input of the CNN, Sim et al. [25] employed high-frequency close price data to construct the input image for the CNN model. Sezer and Ozbayoglu [5] proposed a time series to image conversion approach, which utilized 15 technical indicators and 15 different intervals of technical indicators to generate an input image. However, that approach ignored the potential influence of correlated stock markets. To address this problem, Hoseinzade and Haratizadeh [6] recently proposed an approach to build a three-dimensional input tensor for the CNN to extract market features. Their experimental results showed that the three-dimensional input tensor is effective in extracting features and hence contributes to improving the performance of the model in predicting the direction of stock price movement.

2.2. Model Enhancement

Combining the model with other techniques is a common way to improve prediction performance. In [26], the authors used harmony search and GA to enhance a traditional ANN model and then used the enhanced ANN to make predictions. The results showed that the proposed ANN model dominates the other models it was compared with. Besides, Yin and Bai [27] designed an adaptive SVR for stock data at different time scales. Experimental results showed that the improved SVR, whose learning parameters are dynamically optimized by PSO, achieves better results than the traditional SVR. However, in recent years, machine learning models have been challenged by deep learning models in stock market prediction [28]. By investigating the Chinese stock market, Chen et al. [8] found that the deep learning model outperforms the backpropagation network, the extreme learning machine, and RBFNN in stock price prediction. Similarly, Yu and Yan [9] designed a DNN model based on PSR and LSTM to predict stock prices. By predicting multiple stock indices over different periods, they found that the proposed DNN model achieves a higher prediction accuracy than ARIMA, SVR, and MLP. A similar conclusion can be drawn from [29].

Designing a hybrid model is another popular way to enhance the prediction performance of a single-structure model. In [12], a two-stage fusion approach was proposed: the first stage uses SVR, and the second stage involves different models, including ANN, RF, and SVR. Experiments on the Indian stock market demonstrated the effectiveness of the fusion prediction models. Zhou et al. [30] developed a learning architecture by cascading a logistic regression model onto the GBDT for predicting stock indices. Cao and Wang [13] established a hybrid prediction model consisting of CNN and SVM to make stock market predictions. Their results illustrated that the combination of CNN and SVM can significantly improve the model's prediction performance. Long et al. [31] proposed an end-to-end model named MFNN for feature extraction in the stock price movement prediction task. In their model, both convolutional and recurrent neurons were integrated to construct the multifilter structure. Experiments on the Chinese stock market index CSI300 showed the superiority of MFNN over traditional machine learning models, statistical models, CNN, RNN, and LSTM in terms of accuracy, profitability, and stability. A more commonly used hybrid model is the CNN-LSTM model [14-20]. For example, in [14], the authors found that the CNN-LSTM model is superior to LSTM and CNN in stock price movement prediction. In [17], Li et al. added an attention mechanism to the CNN-LSTM model and further improved its scalability and prediction accuracy. Similarly, Zhou et al. [18] developed a generic framework that uses LSTM and CNN for adversarial training to predict stock price direction in the high-frequency stock market and achieved significant results.

3. The Proposed Framework

The architecture of our proposed model is illustrated in Figure 1, which is comprised of three major steps, including input data representation, CNN for feature extraction, and LSTM for prediction.

3.1. Data Representation
3.1.1. Data Labelling

In the field of forecasting stock price movement, the price movement direction is often classified into two classes: up and down [6, 32]. Class labels indicate the movement direction of the stock price. In this paper, the labels are computed by using the daily close price of a stock index. Let $C_t$ be the close price for a stock index on day $t$. The class label for the $t$-th day is defined as

$$y_t = \begin{cases} 1, & C_{t+1} > C_t, \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$
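
The labelling step can be illustrated with a minimal sketch. It assumes that the label for a day is 1 when the next day's close is higher than that day's close and 0 otherwise; the function name `label_movements` and the sample prices are hypothetical and only serve the illustration.

```python
from typing import List


def label_movements(close: List[float]) -> List[int]:
    """Return 1 where the next close is higher than the current close, else 0.

    Assumption: "up" means the next day's close strictly exceeds today's close.
    The last day has no following close, so it receives no label.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(close, close[1:])]


# Hypothetical close prices, for illustration only.
closes = [100.0, 101.5, 101.0, 101.0, 103.2]
labels = label_movements(closes)
print(labels)
```

Under this assumption an unchanged close is counted as "down", so the two classes cover every trading day.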

3.1.2. Transformed Deterministic Signals

It is well known that technical indicators are widely used in stock market prediction. In this paper, we employ ten technical indicators and convert them into deterministic trend signals for prediction, since Patel et al. [4] demonstrated that trend deterministic values of technical indicators are better than the original values of the indicators in stock trend forecasting. Table 1 presents the specific details.


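Table 1

| Name of indicators | Formulas | Rules for deterministic trend signals |
|---|---|---|
| Simple moving average | $\mathrm{SMA}_t = \frac{C_t + C_{t-1} + \cdots + C_{t-n+1}}{n}$ | If $C_t > \mathrm{SMA}_t$, label “1”; otherwise, label “0” |
| Weight moving average | $\mathrm{WMA}_t = \frac{n C_t + (n-1) C_{t-1} + \cdots + C_{t-n+1}}{n + (n-1) + \cdots + 1}$ | If $C_t > \mathrm{WMA}_t$, label “1”; otherwise, label “0” |
| Momentum | $\mathrm{MOM}_t = C_t - C_{t-n+1}$ | If $\mathrm{MOM}_t > 0$, label “1”; otherwise, label “0” |
| Stochastic K% | $K_t = \frac{C_t - LL_{t-(n-1)}}{HH_{t-(n-1)} - LL_{t-(n-1)}} \times 100$ | If $K_t > K_{t-1}$, label “1”; otherwise, label “0” |
| Stochastic D% | $D_t = \frac{1}{n} \sum_{i=0}^{n-1} K_{t-i}$ | If $D_t > D_{t-1}$, label “1”; otherwise, label “0” |
| Moving average convergence divergence | $\mathrm{MACD}_t = \mathrm{MACD}_{t-1} + \frac{2}{n+1} (\mathrm{DIFF}_t - \mathrm{MACD}_{t-1})$, $\mathrm{DIFF}_t = \mathrm{EMA}(12)_t - \mathrm{EMA}(26)_t$ | If $\mathrm{MACD}_t > \mathrm{MACD}_{t-1}$, label “1”; otherwise, label “0” |
| Relative strength index | $\mathrm{RSI}_t = 100 - \frac{100}{1 + \left(\sum_{i=0}^{n-1} UP_{t-i}/n\right) / \left(\sum_{i=0}^{n-1} DW_{t-i}/n\right)}$ | If $\mathrm{RSI}_t \le 30$ or $\mathrm{RSI}_t > \mathrm{RSI}_{t-1}$, label “1,” and if $\mathrm{RSI}_t \ge 70$ or $\mathrm{RSI}_t \le \mathrm{RSI}_{t-1}$, label “0” |
| William’s %R | $R_t = \frac{HH_{t-(n-1)} - C_t}{HH_{t-(n-1)} - LL_{t-(n-1)}} \times 100$ | If $R_t > R_{t-1}$, label “1”; otherwise, label “0” |
| Commodity channel index | $\mathrm{CCI}_t = \frac{M_t - SM_t}{0.015\, D_t}$, $M_t = \frac{H_t + L_t + C_t}{3}$, $SM_t = \frac{1}{n} \sum_{i=1}^{n} M_{t-i+1}$, $D_t = \frac{1}{n} \sum_{i=1}^{n} \lvert M_{t-i+1} - SM_t \rvert$ | If $\mathrm{CCI}_t < -200$ or $\mathrm{CCI}_t > \mathrm{CCI}_{t-1}$, label “1,” and if $\mathrm{CCI}_t > 200$ or $\mathrm{CCI}_t \le \mathrm{CCI}_{t-1}$, label “0” |
| Accumulation/distribution oscillator | $\mathrm{AD}_t = \frac{H_t - C_{t-1}}{H_t - L_t}$ | If $\mathrm{AD}_t > \mathrm{AD}_{t-1}$, label “1”; otherwise, label “0” |

$C_t$, $L_t$, and $H_t$ denote the close price, low price, and high price at time $t$, respectively; $LL_t$ and $HH_t$ represent, respectively, the lowest low and highest high in the last $t$ days; $UP_t$ means the upward price change while $DW_t$ is the downward price change at time $t$. EMA refers to the exponential moving average, $\mathrm{EMA}(k)_t = \mathrm{EMA}(k)_{t-1} + \alpha\,(C_t - \mathrm{EMA}(k)_{t-1})$, $\alpha = 2/(1+k)$, and $k$ denotes the time period of the $k$-day exponential moving average; $n$ is the number of days in the look-back period, following [4].
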
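
The conversion from indicator values to deterministic trend signals can be sketched for two of the ten indicators. The sketch assumes the commonly used rules that a close above its simple moving average signals "1" and that a positive momentum signals "1"; the function names, the look-back default of 10 days, and the toy prices are assumptions made for illustration rather than the authors' code.

```python
from typing import List, Optional

Signal = Optional[int]  # None marks days without enough history


def sma_trend_signal(close: List[float], n: int = 10) -> List[Signal]:
    """1 if the close is above its n-day simple moving average, else 0."""
    signals: List[Signal] = []
    for t in range(len(close)):
        if t < n - 1:
            signals.append(None)
            continue
        average = sum(close[t - n + 1 : t + 1]) / n
        signals.append(1 if close[t] > average else 0)
    return signals


def momentum_trend_signal(close: List[float], n: int = 10) -> List[Signal]:
    """1 if the close is above the close n - 1 days earlier, else 0."""
    signals: List[Signal] = []
    for t in range(len(close)):
        if t < n - 1:
            signals.append(None)
            continue
        momentum = close[t] - close[t - n + 1]
        signals.append(1 if momentum > 0 else 0)
    return signals


# Toy prices and a short window (n=3) so the result is easy to check by hand.
prices = [1.0, 2.0, 3.0, 2.0, 1.0, 4.0]
sma_signals = sma_trend_signal(prices, n=3)
momentum_signals = momentum_trend_signal(prices, n=3)
print(sma_signals)
print(momentum_signals)
```

Each indicator therefore contributes a single 0/1 value per day and per market, which is what fills the input tensor described next.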
3.1.3. Input Tensor Building

In [32], the authors ordered the features in the two-dimensional input matrix according to the correlation between instances and features before presenting them as input to the CNN. Their results showed that a CNN with specifically ordered features outperforms a CNN that uses randomly ordered features. Inspired by this idea, we apply correlation-based ordering to the three-dimensional input tensors for the CNN.

Figure 2 shows the representation of the three-dimensional input tensor. In the proposed framework shown in Figure 1, the input is a three-dimensional tensor whose dimensions represent the number of technical indicators, the number of trading days, and the number of correlated stock indices. In this paper, these are 10, 10, and 11, respectively: there are 10 deterministic variables converted from the technical indicators for each market, 10 days used for prediction, and 11 correlated market indices.
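
A minimal sketch of how one such input sample could be assembled from per-market signals is given below. The axis order (indicator, day, market), the helper name `build_input_tensor`, and the synthetic 0/1 signals are assumptions for illustration; the text only fixes the three sizes 10, 10, and 11.

```python
from typing import List

Tensor3D = List[List[List[int]]]


def build_input_tensor(
    signals: List[List[List[int]]], end_day: int, window: int = 10
) -> Tensor3D:
    """Cut one sample out of signals[market][day][indicator].

    Returns tensor[indicator][day_in_window][market], covering the `window`
    days that end at `end_day` (inclusive).
    """
    n_markets = len(signals)
    n_indicators = len(signals[0][0])
    start = end_day - window + 1
    if start < 0:
        raise ValueError("not enough history for the requested window")
    return [
        [[signals[m][start + j][k] for m in range(n_markets)] for j in range(window)]
        for k in range(n_indicators)
    ]


# Synthetic 0/1 signals: 11 markets, 30 days, 10 indicators (illustration only).
signals = [[[(m + d + k) % 2 for k in range(10)] for d in range(30)] for m in range(11)]
tensor = build_input_tensor(signals, end_day=29, window=10)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
print(shape)
```

Sliding `end_day` over the trading days yields one tensor per labelled day.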

Different from [6], in our three-dimensional input tensor, the technical indicators are transformed into deterministic trend signals and the stock indices are ranked by PPMCC. PPMCC is one of the most common measures of linear dependence and reflects the degree of linear correlation between two variables [33, 34]. The calculation formula is as follows:

$$r_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\,\sqrt{\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}}, \qquad (2)$$

where $x_{ij}$ and $x_{ik}$ are the values of the $j$-th and the $k$-th feature on the $i$-th day, $\bar{x}_j$ and $\bar{x}_k$ are the average values of the $j$-th and $k$-th feature, and $n$ represents the number of data points. If $r_{jk} > 0$, there is a positive correlation, and if $r_{jk} < 0$, the features are negatively correlated; otherwise, they are linearly uncorrelated.

In detail, we take the calculation of PPMCC between S&P 500 and DJIA as an example. $x_{ij}$ and $x_{ik}$ in equation (2) are the close prices of S&P 500 and DJIA on the $i$-th day, respectively. $\bar{x}_j$ and $\bar{x}_k$ are the corresponding averages of the close prices of S&P 500 and DJIA. $n$ is the number of trading days in S&P 500. Following equation (2), we can obtain the PPMCC between S&P 500 and DJIA. The PPMCC between S&P 500 and the other 10 stock indices can be obtained in a similar way. The results can be found in Figure 3. Therefore, the order of stock indices in the three-dimensional input tensor is S&P 500, NASDAQ, DJIA, RUSSELL, NYSE, DAX, N225, FTSE, CAC40, HSI, and SSE.
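
The ranking step can be sketched as follows. The sketch computes the Pearson coefficient between a target series and each candidate series and sorts the candidates in descending order of correlation; the descending order, the function names, and the three toy series "A", "B", and "C" are assumptions for illustration and do not reproduce the values in Figure 3.

```python
import math
from typing import Dict, List


def ppmcc(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def rank_by_correlation(target: List[float], others: Dict[str, List[float]]) -> List[str]:
    """Order the other series from most to least correlated with the target."""
    return sorted(others, key=lambda name: ppmcc(target, others[name]), reverse=True)


# Toy close-price series (illustration only).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
others = {
    "B": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly negatively correlated
    "C": [1.0, 3.0, 2.0, 5.0, 4.0],   # partially correlated
    "A": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly positively correlated
}
order = rank_by_correlation(target, others)
print(order)
```

The target index itself has a coefficient of 1 with its own series, which is why it occupies the first position of the ordering.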

3.2. CNN for Feature Extraction

In general, the CNN model includes several layers [35], such as the input layer, the convolutional layer, the pooling layer, the fully connected layer, and the output layer. In this paper, we do not employ the pooling layer because Yang et al. [36] claimed, in a financial study, that information would probably be lost if a pooling layer were adopted. The convolutional layer performs convolution operations on the input data. The convolution operation can be considered as a filter applied to the input data, and the size of a filter determines its coverage. Moreover, each filter shares the same weights across all positions of the input during the convolution operation, and the weights are updated in training. Similar to [6], Figure 2 exhibits how the filter works in the three-dimensional input tensor. Next, a fully connected layer is used for linking the flattened layer to the output layer; this is an MLP network that can perform the prediction and classification operations.

In [32], the authors used a parallel convolutional layer to generate multiple time series representations at different time scales and achieved significant results. Inspired by this, the proposed CNN feature extraction module has 5 layers: a parallel convolutional layer, a merge layer, two convolutional layers, and a flattened layer, which are shown in the dashed frame of Figure 4. In the parallel layer, the convolutions of the different branches are independent of each other. In the merge layer, all features extracted by the parallel branches are concatenated. Then, the concatenated feature is processed by the remaining two convolutional layers. Finally, the flattened layer produces the feature vector. Notably, the fully connected layer and the output layer are only used for training; in testing, the LSTM network replaces them and is concatenated with the feature vector generated by the flattened layer.

In the specific configuration of the CNN feature extraction module, the input tensor is a 10 by 11 matrix with a depth of 10. The two parallel convolutional layers perform different convolutional operations, each with ten filters; they are followed by one convolutional layer with ten filters and then another convolutional layer, also with ten filters. In each convolutional layer, the padding method is “same.” Then, a flattened layer is used to generate the feature vector. During training, the flattened layer is concatenated with a fully connected network consisting of two hidden layers: the first layer has 10 neurons and the second layer has 2 neurons. The loss function is categorical cross entropy, the number of epochs is 24, and the batch size is 32 in our experiments. Finally, the “softmax” activation function is employed in the output layer.

3.3. LSTM for Prediction

In the combined model, the LSTM network, concatenated with the trained CNN, is used for the final prediction. Specifically, the feature vector generated by the flattened layer acts as the input for the LSTM network to make a prediction. The LSTM network is comprised of an input layer, a hidden layer, and an output layer. The hidden layer, which includes the memory cells, is the main characteristic of LSTM networks. Each of the memory cells has three gates designed for maintaining and adjusting its cell state $c_t$: a forget gate ($f_t$), an input gate ($i_t$), and an output gate ($o_t$). Each of the gates can be considered a filter that fulfills a certain purpose. The forget gate and the input gate define which information to remove from and add to the cell state, respectively. The output gate specifies which information from the cell state will be utilized as output.

Figure 5 illustrates the structure of a memory cell. We formulate the LSTM model to process time series of stock indices, referring to the literature [37]. During a forward pass, $h_t$ denotes the output of the LSTM at day $t$ and can be calculated as follows:

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t]\right),$$
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t]\right),$$
$$\tilde{c}_t = \tanh\left(W_c\,[h_{t-1}, x_t]\right),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,$$
$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t]\right),$$
$$h_t = o_t \odot \tanh(c_t),$$

where $W$ is the weight matrix and $x_t$ is the input vector at time $t$. $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates at time $t$, respectively. $\tilde{c}_t$ and $c_t$ denote the distorted input to the memory cell and the content of the memory cell at time $t$. In addition, $h_{t-1}$ represents the value of the hidden node carried over from the previous time step, and the symbol $\odot$ represents the elementwise product. The corresponding details of backpropagation through time are introduced in [38].
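
A single forward step of a memory cell can be sketched in plain Python. This is a minimal illustration of the standard LSTM gating scheme, not the authors' Keras implementation; it assumes one weight matrix per gate acting on the concatenation of the previous hidden value and the current input, and it omits bias terms for brevity. All names and the all-zero weights are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(
    x: Vector, h_prev: Vector, c_prev: Vector, weights: Dict[str, Matrix]
) -> Tuple[Vector, Vector]:
    """One forward step; returns the new hidden value h and cell content c."""
    z = h_prev + x  # list concatenation of previous hidden value and input
    forget = [sigmoid(a) for a in matvec(weights["f"], z)]
    inp = [sigmoid(a) for a in matvec(weights["i"], z)]
    out = [sigmoid(a) for a in matvec(weights["o"], z)]
    candidate = [math.tanh(a) for a in matvec(weights["c"], z)]
    # Forget part of the old cell content and add the gated candidate.
    c = [f * cp + i * g for f, cp, i, g in zip(forget, c_prev, inp, candidate)]
    h = [o * math.tanh(ct) for o, ct in zip(out, c)]
    return h, c


# Two hidden units, three inputs, all-zero weights: every gate equals 0.5.
zero = [[0.0] * 5 for _ in range(2)]
weights = {"f": zero, "i": zero, "o": zero, "c": zero}
h, c = lstm_step([0.3, 0.1, -0.2], [0.0, 0.0], [1.0, -2.0], weights)
print(h, c)
```

With all-zero weights the forget gate halves the previous cell content and the candidate contributes nothing, which makes the result easy to verify by hand.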

In terms of the configuration of the LSTM network, the optimizer is the “Adam” optimization algorithm, the loss function is categorical cross entropy, the number of epochs is 12, and the batch size is 64. As for the time steps, the number of hidden neurons, and the dropout rate, we present candidate levels instead of fixed values. Details can be found in Table 2. For each stock index, these parameters are determined according to the prediction performance on the validation set. The proposed model is implemented in Python 3.5.4. We mainly use machine learning libraries such as “Keras” and “NumPy” for various functionalities.


Table 2

| Parameters | Levels |
|---|---|
| Time steps | 6, 7, 8, …, 14, 15 |
| Number of hidden neurons | 50, 100, 150, 200 |
| Dropout rate | 0.1, 0.2, 0.3, 0.4 |
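
The parameter levels above define a grid of 10 × 4 × 4 = 160 candidate settings. A minimal sketch of selecting the best setting is shown below; `select_best` and `toy_score` are hypothetical names, and the scoring function is a stand-in for training the LSTM and measuring the macroaverage F-measure on the validation set, which is not reproduced here.

```python
from itertools import product
from typing import Callable, Tuple

TIME_STEPS = list(range(6, 16))          # 6, 7, ..., 15
HIDDEN_NEURONS = [50, 100, 150, 200]
DROPOUT_RATES = [0.1, 0.2, 0.3, 0.4]

Setting = Tuple[int, int, float]


def select_best(score: Callable[[int, int, float], float]) -> Setting:
    """Return the (time steps, hidden neurons, dropout) triple with the highest score."""
    grid = product(TIME_STEPS, HIDDEN_NEURONS, DROPOUT_RATES)
    return max(grid, key=lambda setting: score(*setting))


def toy_score(steps: int, neurons: int, dropout: float) -> float:
    """Stand-in for the validation F-measure; peaks at (6, 100, 0.3) by construction."""
    return -(abs(steps - 6) + abs(neurons - 100) / 50 + abs(dropout - 0.3) * 10)


n_settings = len(list(product(TIME_STEPS, HIDDEN_NEURONS, DROPOUT_RATES)))
best = select_best(toy_score)
print(n_settings, best)
```

In practice the scoring function would train and evaluate one model per setting, so the search is repeated for each stock index.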

4. Experiments

In this study, we use 11 influential international stock market indices, including CAC40, DJIA, S&P 500, NASDAQ, DAX, FTSE, NYSE, HSI, N225, SSE, and RUSSELL. The data are from the period of January 4, 2010, to December 29, 2017. All the data are downloaded from Yahoo Finance (https://finance.yahoo.com/).

In addition, our experimental scheme is based on the proposed deep learning framework, called “CNN3D-DR + LSTM,” and the workflow of the proposed model can be seen in Figure 6. The main steps are described as follows:
(1) Data preprocessing: the original dataset is used to generate the labels and the deterministic trend signals, which act as the input within three-dimensional tensors. Then, the stock indices in the three-dimensional tensor are ranked by PPMCC.
(2) Data partitioning: all the labelled data are first divided into 3 parts: the training set for training, the validation set for parameter determination, and the testing set for performance evaluation.
(3) Training: the training dataset is used to train the CNN connected to a fully connected neural network, and then the trained CNN is used to generate a series of feature vectors, which act as the input of the LSTM neural network. Next, we set different parameters and use the obtained feature vectors to train the LSTM network. Then, we use the validation set to evaluate the prediction performance of the hybrid model and obtain the parameters that give the best prediction performance.
(4) Testing: the testing dataset and the trained CNN are used to compute the feature vectors, which are then fed into the LSTM with the optimal parameters to predict the direction of stock price movements.
(5) Evaluation: the prediction performance is evaluated by comparing the predicted values with the real ones.
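
The data partitioning step can be sketched as a chronological split. The sketch assumes that the earliest samples form the training set, the following ones the validation set, and the most recent ones the testing set, with an 80/20 split between training plus validation and testing and a 4 : 1 split between training and validation, as used later in Section 4.2. The function name and the default arguments are hypothetical.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    samples: Sequence[int], train_val_share: float = 0.8, train_to_val: float = 4.0
) -> Tuple[List[int], List[int], List[int]]:
    """Split time-ordered samples into training, validation, and testing sets.

    No shuffling is applied, so every validation and testing sample is more
    recent than every training sample.
    """
    n = len(samples)
    n_train_val = int(round(n * train_val_share))
    n_train = int(round(n_train_val * train_to_val / (train_to_val + 1.0)))
    train = list(samples[:n_train])
    validation = list(samples[n_train:n_train_val])
    test = list(samples[n_train_val:])
    return train, validation, test


# 100 day indices stand in for the labelled samples.
train, validation, test = chronological_split(range(100))
print(len(train), len(validation), len(test))
```

Keeping the split chronological avoids using future information when fitting the model.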

4.1. Evaluation Methodology

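The evaluation scheme is based on the confusion matrix for two-class classification shown in Table 3; here, $tp$, $fp$, $fn$, and $tn$ denote true positive, false positive, false negative, and true negative counts, respectively. Precision, recall, accuracy, and F-measure are commonly used indicators to evaluate the prediction performance, and the corresponding formulas are as follows:

$$\text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Recall} = \frac{tp}{tp + fn},$$
$$\text{Accuracy} = \frac{tp + tn}{tp + fp + fn + tn}, \qquad \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$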


Table 3

| Actual/predicted | Positive | Negative |
|---|---|---|
| Positive | tp | fn |
| Negative | fp | tn |

Accuracy is an important evaluation indicator. However, it may not be suitable for an unbalanced dataset [32]. For a full assessment of the prediction performance, we also take precision, recall, and F-measure into consideration. In order to evaluate the prediction performance for each class, the precision, recall, and F-measure are taken as the mean of the values for the positive and negative classes. The mean of the F-measure values for the positive and negative classes is also called the macroaverage F-measure [6, 32]. Furthermore, we use the ROC curve, which is created by plotting the TPR against the FPR at different possible thresholds, to visualize the performance of the proposed models. The AUC (area under the ROC curve) is taken as an overall performance measure because it is independent of the cutoff value. The higher the AUC value, the better the prediction performance of the model.
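
The macroaveraged metrics can be computed directly from the four confusion-matrix counts. The sketch below is a minimal illustration with made-up counts; it treats the negative class by swapping the roles of the counts, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def class_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure for one class treated as 'positive'."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure


def macro_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy plus the class-averaged precision, recall, and F-measure."""
    positive = class_scores(tp, fp, fn)
    # For the negative class, tn acts as the hit count and fn/fp swap roles.
    negative = class_scores(tn, fn, fp)
    precision, recall, f_measure = ((p + q) / 2 for p, q in zip(positive, negative))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up counts, for illustration only.
metrics = macro_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Note that the macroaverage F-measure here is the mean of the two per-class F-measures, not the F-measure of the averaged precision and recall.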

4.2. Experiments on S&P 500

In this section, we conduct extensive experiments on the S&P 500 to investigate the effectiveness of the proposed model. For simplicity of description, we let “D” represent the fact that the technical indicators have been converted into deterministic trend signals, and we let “R” represent the fact that the stock indices in the three-dimensional tensor have been ranked by PPMCC. Specifically, we design several models for comparative experiments as follows:
(1) CNN3D: we take a three-dimensional input tensor as the input data for a CNN model to make a prediction. In the input tensor, the value of a technical indicator is normalized and not converted to deterministic trend signals.
(2) CNN3D-DR: the difference from CNN3D is that the technical indicators in the input tensor are transformed into deterministic trend signals and the stock indices in the tensor are ranked by PPMCC.
(3) LSTM-D: to test the LSTM network, the deterministic trend signals transformed from technical indicators are used as the input to make predictions.
(4) CNN3D + LSTM: without using the deterministic trend signals, the CNN is used to extract features from the three-dimensional input tensor, while the LSTM network is employed for making predictions. The input tensor used here is the same as in CNN3D.
(5) CNN3D-D + LSTM: in the three-dimensional input tensor, the technical indicators are transformed into deterministic trend signals but the stock indices are not ranked. The CNN is used to extract features, while the LSTM network is used to make predictions.
(6) CNN3D-DR + LSTM: in contrast to CNN3D + LSTM, in the input tensor, the technical indicators are transformed into deterministic trend signals and the stock indices are ranked by PPMCC.

First of all, we divide the dataset into three parts: training set, validation set, and testing set. The validation set is used to determine the optimal parameters of the LSTM network. Here, we define the split ratio as the ratio of the training and validation sets together to the testing set. For example, a split ratio of 80/20 means that the training and validation sets together account for 80% of the dataset, while the testing set accounts for 20%. For simplicity, we set the ratio between the training set and the validation set to 4 : 1. Table 4 shows the macroaverage F-measure of CNN3D-DR + LSTM with different parameters on the validation set for S&P 500 when the split ratio is 80/20. We find that the optimal number of time steps in the LSTM is 6, the optimal number of hidden neurons is 100, and the optimal dropout rate is 0.3.


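Table 4

| Number of hidden neurons | Dropout rate | n = 6 | n = 7 | n = 8 | n = 9 | n = 10 | n = 11 | n = 12 | n = 13 | n = 14 | n = 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| N = 50 | 0.1 | 0.6289 | 0.6107 | 0.6213 | 0.5865 | 0.5583 | 0.5912 | 0.5708 | 0.5546 | 0.5368 | 0.5767 |
| | 0.2 | 0.6057 | 0.6055 | 0.5928 | 0.5759 | 0.5565 | 0.6254 | 0.5720 | 0.5888 | 0.5787 | 0.5492 |
| | 0.3 | 0.6177 | 0.6070 | 0.5801 | 0.6235 | 0.5943 | 0.6059 | 0.5640 | 0.5622 | 0.5445 | 0.5445 |
| | 0.4 | 0.6147 | 0.6207 | 0.5822 | 0.6070 | 0.5604 | 0.6131 | 0.5593 | 0.5599 | 0.5559 | 0.5582 |
| N = 100 | 0.1 | 0.6095 | 0.6033 | 0.6156 | 0.5980 | 0.5883 | 0.6223 | 0.5739 | 0.5572 | 0.5467 | 0.5380 |
| | 0.2 | 0.5879 | 0.6069 | 0.5792 | 0.5874 | 0.5988 | 0.6442 | 0.5753 | 0.5821 | 0.5783 | 0.5505 |
| | 0.3 | | 0.6190 | 0.5980 | 0.5950 | 0.5540 | 0.6217 | 0.5723 | 0.5993 | 0.5545 | 0.5555 |
| | 0.4 | 0.5998 | 0.6107 | 0.5996 | 0.5950 | 0.5681 | 0.6028 | 0.5681 | 0.5991 | 0.5405 | 0.5593 |
| N = 150 | 0.1 | 0.6095 | 0.6295 | 0.5950 | 0.5794 | 0.5644 | 0.5884 | 0.5811 | 0.5555 | 0.5559 | 0.5613 |
| | 0.2 | 0.6170 | 0.6010 | 0.5876 | 0.6021 | 0.5876 | 0.6258 | 0.5940 | 0.5458 | 0.5915 | 0.5622 |
| | 0.3 | 0.6057 | 0.6093 | 0.5876 | 0.6137 | 0.5816 | 0.6187 | 0.5786 | 0.5678 | 0.5368 | 0.5626 |
| | 0.4 | 0.5863 | 0.6123 | 0.5749 | 0.6026 | 0.5597 | 0.6203 | 0.5487 | 0.5686 | 0.5661 | 0.5603 |
| N = 200 | 0.1 | 0.6043 | 0.6198 | 0.6040 | 0.5809 | 0.5712 | 0.6284 | 0.5375 | 0.5661 | 0.5456 | 0.5245 |
| | 0.2 | 0.5981 | 0.6174 | 0.5869 | 0.6033 | 0.5621 | 0.5919 | 0.5708 | 0.5858 | 0.5467 | 0.5670 |
| | 0.3 | 0.6035 | 0.6093 | 0.6047 | 0.5988 | 0.5943 | 0.6150 | 0.5895 | 0.5970 | 0.5268 | 0.5513 |
| | 0.4 | 0.6078 | 0.6204 | 0.5869 | 0.5853 | 0.5742 | 0.6269 | 0.5357 | 0.5670 | 0.5473 | 0.5783 |

The columns n = 6 to n = 15 give the number of time steps in the LSTM.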

Then, we design a group of experiments with different sizes of the training set and the testing set to examine the suitability and robustness of the proposed framework. We conduct the experiments on S&P 500 and report the average prediction results over different split ratios. The split ratio is set to 60/40, 65/35, 70/30, 75/25, and 80/20, and Table 5 presents the corresponding optimal parameters. Furthermore, we show the average performance of different models with different split ratios in Table 6. For a clearer visualization, we illustrate the results in Figure 7.


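Table 5

| | Time steps in LSTM | Number of hidden neurons | Dropout rate |
|---|---|---|---|
| 60/40 | 8 | 100 | 0.1 |
| 65/35 | 9 | 50 | 0.1 |
| 70/30 | 6 | 50 | 0.3 |
| 75/25 | 6 | 100 | 0.4 |
| 80/20 | 6 | 100 | 0.3 |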


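Table 6

| Metrics | | CNN3D | CNN3D-DR | LSTM-D | CNN3D + LSTM | CNN3D-D + LSTM | CNN3D-DR + LSTM |
|---|---|---|---|---|---|---|---|
| Accuracy | 60/40 | 0.5082 | 0.4957 | 0.5202 | 0.4993 | 0.5693 | 0.5833 |
| | 65/35 | 0.5095 | 0.5009 | 0.5348 | 0.5130 | 0.5605 | 0.5728 |
| | 70/30 | 0.4798 | 0.4952 | 0.5339 | 0.5079 | 0.5991 | 0.6160 |
| | 75/25 | 0.4856 | 0.5200 | 0.5229 | 0.5150 | 0.5968 | 0.6316 |
| | 80/20 | 0.4878 | 0.5180 | 0.5175 | 0.5010 | 0.5653 | 0.5916 |
| | Average | 0.4942 | 0.5060 | 0.5258 | 0.5072 | 0.5782 | 0.5991 |
| Precision | 60/40 | 0.5019 | 0.4954 | 0.5124 | 0.4991 | 0.5671 | 0.5818 |
| | 65/35 | 0.5071 | 0.4985 | 0.5317 | 0.5029 | 0.5529 | 0.5650 |
| | 70/30 | 0.4904 | 0.4927 | 0.5150 | 0.5004 | 0.5929 | 0.6107 |
| | 75/25 | 0.5045 | 0.5165 | 0.5006 | 0.5222 | 0.5921 | 0.6281 |
| | 80/20 | 0.4949 | 0.5183 | 0.5089 | 0.5009 | 0.5620 | 0.5894 |
| | Average | 0.4998 | 0.5043 | 0.5137 | 0.5051 | 0.5734 | 0.5950 |
| Recall | 60/40 | 0.5017 | 0.4964 | 0.5107 | 0.4992 | 0.5635 | 0.5779 |
| | 65/35 | 0.5062 | 0.4975 | 0.5276 | 0.5027 | 0.5513 | 0.5621 |
| | 70/30 | 0.4938 | 0.4937 | 0.5130 | 0.5002 | 0.5910 | 0.6094 |
| | 75/25 | 0.5034 | 0.5155 | 0.5003 | 0.5218 | 0.5897 | 0.6249 |
| | 80/20 | 0.4951 | 0.5184 | 0.5087 | 0.5008 | 0.5596 | 0.5860 |
| | Average | 0.5001 | 0.5043 | 0.5121 | 0.5049 | 0.5710 | 0.5921 |
| F-measure | 60/40 | 0.4954 | 0.4951 | 0.4962 | 0.4986 | 0.5602 | 0.5754 |
| | 65/35 | 0.4989 | 0.4980 | 0.5154 | 0.5000 | 0.5507 | 0.5609 |
| | 70/30 | 0.4632 | 0.4923 | 0.5043 | 0.4989 | 0.5912 | 0.6096 |
| | 75/25 | 0.4791 | 0.5157 | 0.4911 | 0.5141 | 0.5894 | 0.6250 |
| | 80/20 | 0.4875 | 0.5163 | 0.5072 | 0.4995 | 0.5577 | 0.5845 |
| | Average | 0.4848 | 0.5035 | 0.5028 | 0.5022 | 0.5698 | 0.5911 |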

Comparing CNN3D with CNN3D-DR, we find that CNN3D-DR provides better average performance (for example, an average accuracy of 0.5060 versus 0.4942 in Table 6). The gap is larger between the hybrid models: CNN3D-DR + LSTM reaches an average accuracy of 0.5991, against 0.5072 for CNN3D + LSTM, which indicates that the improved three-dimensional input tensor raises the prediction accuracy. Furthermore, neither CNN3D-DR nor LSTM-D alone matches CNN3D-DR + LSTM, indicating that the hybrid model is effective in improving prediction performance. In brief, CNN3D-DR + LSTM outperforms the other models in this setting, and both the improved three-dimensional input tensor and the combination of CNN and LSTM contribute to the gain. To better evaluate the prediction of stock price movement direction, Figure 8 shows the ROC curves of the different experiment groups for the 80/20 split; the area under the ROC curve (AUC) of the proposed model is larger than that of the others.
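For readers unfamiliar with the AUC, the following minimal sketch computes it with the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This is an illustrative computation, not necessarily the procedure used to produce Figure 8; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = upward movement, by assumption).
    scores: iterable of predicted scores, higher meaning "more likely 1".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example with four samples.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random guessing, so for a two-class direction problem the margin above 0.5 is the quantity of interest.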

4.3. Comparison with Other Models

In addition, we conduct a group of experiments to compare the proposed model with several state-of-the-art models. We apply all the models to predict the stock price movement direction on five stock indices: S&P 500, DJIA, NASDAQ, NYSE, and RUSSELL. For this comparison, we divide the dataset into two parts: the first 80% of the data is used for training, and the remaining 20% is used for testing. The models are compared in terms of the average macroaverage F-measure.
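The macroaverage F-measure can be sketched as follows. The sketch assumes the common definition in which precision, recall, and F-measure are computed per class and then averaged with equal weight; whether the reported figures use exactly this variant is an assumption. The function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred, classes=(0, 1)):
    """Return (macro precision, macro recall, macro F-measure).

    Each metric is computed per class and then averaged with equal weight,
    so a rarely predicted class counts as much as a frequent one.
    """
    precisions, recalls, f_values = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_value = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_values.append(f_value)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f_values) / n


# A model that always predicts "up" (1) on a set with four up days
# and two down days: class 0 is never predicted, so its F-measure is 0.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1]
print(macro_scores(y_true, y_pred))
```

Under this definition a model that always predicts one direction is penalized through the class it never predicts, which is why the macroaverage is a stricter criterion than plain accuracy on imbalanced data.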

For the proposed hybrid model, Table 7 shows the optimal parameters on the different stock indices. For the other models, we use the parameter settings reported in the original papers. The details of the other models are as follows:
(1) PCA + ANN [39]: first, the initial data are mapped to a new feature space by using PCA. Then, the resulting representation of the data is used to train a three-layered ANN for stock price direction prediction. The hidden layer has 10 neurons and uses a tangent sigmoid function, and the output layer uses a logistic sigmoid transfer function.
(2) SVM [4]: ten technical indicators are represented as trend deterministic data and are then fed into an SVM to predict stock price index movement. For each stock, the optimal parameters of the SVM are selected from several given parameter levels. The ten technical indicators are the same as those used in this paper.
(3) CNN-cor [32]: the feature set is extracted from different technical indicators, price, and temporal information and is then ordered by the correlations between instances and features. Finally, the ordered features are used to build a two-dimensional input matrix for the specified CNN to predict the direction of stock price movement.
(4) CNNpred [6]: a diverse set of financial variables, including technical indicators, stock indices, commodities, future contracts, etc., is used to construct three-dimensional input tensors. Then, the input tensors are fed into a specified CNN model to make predictions.
(5) CNN + LSTM: we implement a common CNN-LSTM model for comparison. In this model, ten technical indicators are used to construct a two-dimensional input for the CNN to extract features. The ten technical indicators are the same as those in [4], and the parameters of the CNN are the same as those in this paper. An LSTM is then used for price direction forecasting. The time steps, number of hidden neurons, and dropout rate are 10, 50, and 0.1, respectively.
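To make the phrase "trend deterministic data" in item (2) concrete, the sketch below converts one indicator, a simple moving average (SMA), into a +1/-1 trend signal. The rule used here, that a closing price above its SMA signals an upward trend and anything else a downward trend, is an assumption chosen for illustration; the rules actually applied to the ten indicators are those defined in [4]. The function name `sma_trend_signal` and the sample prices are hypothetical.

```python
def sma_trend_signal(closes, window):
    """Map closing prices to +1/-1 signals based on a simple moving average.

    Assumed rule (illustrative): +1 if the close is above the SMA of the
    last `window` closes (including the current one), otherwise -1.
    The first window - 1 days have no SMA and therefore no signal.
    """
    if window < 1 or window > len(closes):
        raise ValueError("window must be between 1 and len(closes)")
    signals = []
    for i in range(window - 1, len(closes)):
        sma = sum(closes[i - window + 1 : i + 1]) / window
        signals.append(1 if closes[i] > sma else -1)
    return signals


# Hypothetical closing prices: a rise followed by a fall.
print(sma_trend_signal([1.0, 2.0, 3.0, 2.0, 1.0], window=3))  # [1, -1, -1]
```

Replacing continuous indicator values with such discrete signals removes their scale, so indicators with very different ranges can be fed to the same model without further normalization.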


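Table 7: Optimal parameters of the proposed model on different stock indices.

| Stock index | Time steps in LSTM | Number of hidden neurons | Dropout rate |
|---|---|---|---|
| S&P 500 | 6 | 100 | 0.3 |
| DJIA | 10 | 50 | 0.1 |
| NASDAQ | 9 | 200 | 0.3 |
| NYSE | 6 | 50 | 0.1 |
| RUSSELL | 10 | 200 | 0.1 |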
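The "time steps" parameter in Table 7 determines how many consecutive daily feature vectors the LSTM receives per sample. A minimal sketch of this windowing is given below. It assumes that each window is labeled with the movement direction associated with its last day, which is an assumption made for illustration. The function name `make_sequences` and the toy features are hypothetical.

```python
def make_sequences(features, labels, time_steps):
    """Group daily feature vectors into overlapping windows for an LSTM.

    features: list of per-day feature vectors (e.g. CNN outputs).
    labels:   list of per-day direction labels, same length as features.
    Returns (windows, targets), where windows[k] holds time_steps consecutive
    feature vectors and targets[k] is the label of the window's last day
    (an assumed alignment).
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if time_steps < 1 or time_steps > len(features):
        raise ValueError("time_steps must be between 1 and len(features)")
    n_windows = len(features) - time_steps + 1
    windows = [features[i : i + time_steps] for i in range(n_windows)]
    targets = labels[time_steps - 1 :]
    return windows, targets


# Eight hypothetical days with one-dimensional features, six time steps.
features = [[float(day)] for day in range(8)]
labels = [0, 1, 1, 0, 1, 0, 0, 1]
windows, targets = make_sequences(features, labels, time_steps=6)
print(len(windows), targets)  # 3 [0, 0, 1]
```

A larger number of time steps gives the LSTM more history per sample but reduces the number of samples by the same amount, which is one reason the optimal value can differ between indices.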

Table 8 shows the average results, and Table 9 shows the best performance of each model. On every stock index, the proposed model attains the highest macroaverage F-measure among the compared models (PCA + ANN, SVM, CNN-cor, CNNpred, and CNN + LSTM). Averaged over the five indices, it reaches 0.5967 in Table 8, against 0.5106 for the next-best model, CNN + LSTM.


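Table 8: Average macroaverage F-measure of the models on different stock indices.

| Market | PCA + ANN | SVM | CNN-cor | CNNpred | CNN + LSTM | CNN3D-DR + LSTM |
|---|---|---|---|---|---|---|
| S&P 500 | 0.4469 | 0.4963 | 0.3928 | 0.4837 | 0.5165 | 0.5864 |
| DJIA | 0.4150 | 0.4595 | 0.3900 | 0.4979 | 0.5036 | 0.6215 |
| NASDAQ | 0.4199 | 0.4284 | 0.3796 | 0.4931 | 0.5081 | 0.5860 |
| NYSE | 0.4070 | 0.4299 | 0.3906 | 0.4751 | 0.5007 | 0.5910 |
| RUSSELL | 0.4525 | 0.5102 | 0.3924 | 0.4846 | 0.5241 | 0.5987 |
| Average | 0.4283 | 0.4649 | 0.3891 | 0.4869 | 0.5106 | 0.5967 |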


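Table 9: Best macroaverage F-measure of the models on different stock indices.

| Market | PCA + ANN | SVM | CNN-cor | CNNpred | CNN + LSTM | CNN3D-DR + LSTM |
|---|---|---|---|---|---|---|
| S&P 500 | 0.5627 | 0.5208 | 0.5723 | 0.5532 | 0.5345 | 0.6061 |
| DJIA | 0.5518 | 0.5594 | 0.5253 | 0.5612 | 0.5280 | 0.6418 |
| NASDAQ | 0.5487 | 0.4776 | 0.5498 | 0.5576 | 0.5423 | 0.6005 |
| NYSE | 0.5251 | 0.5012 | 0.5376 | 0.5592 | 0.5214 | 0.6053 |
| RUSSELL | 0.5665 | 0.5115 | 0.5602 | 0.5787 | 0.5476 | 0.6181 |
| Average | 0.5510 | 0.5141 | 0.5491 | 0.5620 | 0.5348 | 0.6144 |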

5. Conclusion

This paper presented a combined deep learning framework with CNN and LSTM neural networks to predict the stock price movement direction. First, we improved the three-dimensional input tensor by transforming the technical indicators into deterministic trend signals and ranking the correlated stock indices according to PPMCC. Then, we designed a CNN-based module for feature extraction. Finally, we employed an LSTM network for stock price movement direction prediction.

Extensive experiments demonstrated that the deterministic trend signals and the ranked stock indices in the three-dimensional input tensor play an important role in improving the prediction performance. Moreover, the comparison with several state-of-the-art models showed the superiority of the proposed model in predicting the direction of stock price movement.

In future work, a core challenge will be to design learning models that extract more valuable features and thereby further improve the prediction performance.

Abbreviations

ANN: Artificial neural network
ARIMA: Autoregressive integrated moving average
CAC40: CAC40 index
CNN: Convolutional neural network
DAX: DAX performance index
DJIA: Dow Jones industrial average
DNN: Deep neural network
FPR: False positive rate
FTSE: FTSE 100 index
GA: Genetic algorithm
GARCH: Generalized autoregressive conditional heteroscedasticity
GBDT: Gradient boosted decision tree
HSI: Hang Seng index
kNN: k-nearest neighbor
LSTM: Long short-term memory
MFNN: Multifilter neural network
MLP: Multilayer perceptron
N225: Nikkei 225 index
NASDAQ: NASDAQ composite index
NB: Naive Bayes
NYSE: New York stock exchange index
PPMCC: Pearson product-moment correlation coefficient
PSO: Particle swarm optimization
PSR: Phase-space reconstruction
RBFNN: Radial basis function neural network
RF: Random forest
RNN: Recurrent neural network
ROC: Receiver operating characteristic
RUSSELL: RUSSELL 2000 index
S&P 500: S&P 500 index
SSE: SSE composite index
SVM: Support vector machine
SVR: Support vector regression
TPR: True positive rate.

Data Availability

The data used to support the findings of this study can be downloaded from Yahoo Finance (https://finance.yahoo.com/).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

  1. J. Sun, K. Xiao, C. Liu, W. Zhou, and H. Xiong, “Exploiting intra-day patterns for market shock prediction: a machine learning approach,” Expert Systems with Applications, vol. 127, pp. 272–281, 2019.
  2. Z. Lin, “Modelling and forecasting the stock market volatility of SSE composite index using GARCH models,” Future Generation Computer Systems, vol. 79, pp. 960–972, 2018.
  3. Y. Shynkevich, T. M. McGinnity, S. A. Coleman, A. Belatreche, and Y. Li, “Forecasting price movements using technical indicators: investigating the impact of varying input window length,” Neurocomputing, vol. 264, pp. 71–88, 2017.
  4. J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques,” Expert Systems with Applications, vol. 42, no. 1, pp. 259–268, 2015.
  5. O. B. Sezer and A. M. Ozbayoglu, “Algorithmic financial trading with deep convolutional neural networks: time series to image conversion approach,” Applied Soft Computing, vol. 70, pp. 525–538, 2018.
  6. E. Hoseinzade and S. Haratizadeh, “CNNpred: CNN-based stock market prediction using a diverse set of variables,” Expert Systems with Applications, vol. 129, pp. 273–285, 2019.
  7. Y. Chen, W. Lin, and J. Z. Wang, “A dual-attention-based stock price trend prediction model with dual features,” IEEE Access, vol. 7, pp. 148047–148058, 2019.
  8. L. Chen, Z. Qiao, M. Wang, C. Wang, R. Du, and H. E. Stanley, “Which artificial intelligence algorithm better predicts the Chinese stock market?” IEEE Access, vol. 6, pp. 48625–48633, 2018.
  9. P. Yu and X. Yan, “Stock price prediction based on deep neural networks,” Neural Computing and Applications, vol. 132, pp. 1–20, 2019.
  10. H. M, G. E. A., V. K. Menon, and S.K. P., “NSE stock market prediction using deep-learning models,” Procedia Computer Science, vol. 132, pp. 1351–1362, 2018.
  11. S. Borovkova and I. Tsiamas, “An ensemble of LSTM neural networks for high-frequency stock market classification,” Journal of Forecasting, vol. 38, no. 6, pp. 600–619, 2019.
  12. J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock market index using fusion of machine learning techniques,” Expert Systems with Applications, vol. 42, no. 4, pp. 2162–2172, 2015.
  13. J. Cao and J. Wang, “Stock price forecasting model based on modified convolution neural network and financial time series analysis,” International Journal of Communication Systems, vol. 32, no. 12, p. e3987, 2019.
  14. S. Jain, R. Gupta, and A. A. Moghe, “Stock price prediction on daily stock data using deep neural networks,” in Proceedings of the 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), pp. 1–13, IEEE, New York, NY, USA, 2018.
  15. J. Eapen, D. Bein, and A. Verma, “Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction,” in Proceedings of the 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 264–270, IEEE, New York, NY, USA, 2019.
  16. X. Zhan, Y. Li, R. Li, X. Gu, O. Habimana, and H. Wang, “Stock price prediction using time convolution long short-term memory network,” in Proceedings of the International Conference on Knowledge Science, Engineering and Management, pp. 461–468, Springer, Berlin, Germany, 2018.
  17. C. Li, X. Zhang, M. Qaosar, S. Ahmed, K. M. R. Alam, and Y. Morimoto, “Multi-factor based stock price prediction using hybrid neural networks with attention mechanism,” in Proceedings of the 2019 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pp. 961–966, IEEE, Berlin, Germany, 2019.
  18. X. Zhou, Z. Pan, G. Hu, S. Tang, and C. Zhao, “Stock market prediction on high-frequency data using generative adversarial nets,” Mathematical Problems in Engineering, vol. 34, 2018.
  19. J. Liu, Y. Chen, K. Liu, and J. Zhao, “Attention-based event relevance model for stock price movement prediction,” in Proceedings of the China Conference on Knowledge Graph and Semantic Computing, pp. 37–49, Springer, Berlin, Germany, 2017.
  20. P. Oncharoen and P. Vateekul, “Deep learning using risk-reward function for stock market prediction,” in Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, pp. 556–561, Berlin, Germany, 2018.
  21. M. Thenmozhi and G. Sarath Chand, “Forecasting stock returns based on information transmission across global markets using support vector machines,” Neural Computing and Applications, vol. 27, no. 4, pp. 805–824, 2016.
  22. L. D. Persio and O. Honchar, “Artificial neural networks architectures for stock price prediction: comparisons and applications,” International Journal of Circuits, Systems and Signal Processing, vol. 10, pp. 403–413, 2016.
  23. S.-J. Guo, F.-C. Hsu, and C.-C. Hung, “Deep candlestick predictor: a framework toward forecasting the price movement from candlestick charts,” in Proceedings of the 2018 9th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 219–226, IEEE, Berlin, Germany, 2018.
  24. K. Jearanaitanakij and B. Passaya, “Predicting short trend of stocks by using convolutional neural network and candlestick patterns,” in Proceedings of the 2019 4th International Conference on Information Technology (InCIT), pp. 159–162, IEEE, Berlin, Germany, 2019.
  25. H. S. Sim, H. I. Kim, and J. J. Ahn, “Is deep learning for image recognition applicable to stock market prediction?” Complexity, vol. 10, 2019.
  26. M. Göçken, M. Özçalıcı, A. Boru, and A. T. Dosdoğru, “Integrating metaheuristics and artificial neural networks for improved stock price prediction,” Expert Systems with Applications, vol. 44, pp. 320–331, 2016.
  27. Y. Guo, S. Han, C. Shen, Y. Li, X. Yin, and Y. Bai, “An adaptive SVR for high-frequency stock price forecasting,” IEEE Access, vol. 6, pp. 11397–11404, 2018.
  28. R. Singh and S. Srivastava, “Stock prediction using deep learning,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18569–18584, 2017.
  29. Q. Wang, W. Xu, and H. Zheng, “Combining the wisdom of crowds and technical analysis for financial market prediction using deep random subspace ensembles,” Neurocomputing, vol. 299, pp. 51–61, 2018.
  30. F. Zhou, Q. Zhang, D. Sornette, and L. Jiang, “Cascading logistic regression onto gradient boosted decision trees for forecasting and trading stock indices,” Applied Soft Computing, vol. 84, p. 105747, 2019.
  31. W. Long, Z. Lu, and L. Cui, “Deep learning-based feature engineering for stock price movement prediction,” Knowledge-Based Systems, vol. 164, pp. 163–173, 2019.
  32. H. Gunduz, Y. Yaslan, and Z. Cataltepe, “Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations,” Knowledge-Based Systems, vol. 137, pp. 138–148, 2017.
  33. M.-T. Puth, M. Neuhäuser, and G. D. Ruxton, “Effective use of Pearson's product-moment correlation coefficient,” Animal Behaviour, vol. 93, pp. 183–189, 2014.
  34. J. Guo and X. Li, “Prediction of index trend based on LSTM model for extracting image similarity feature,” in Proceedings of the 2019 International Conference on Artificial Intelligence and Computer Science, pp. 335–340, New York, NY, USA, 2019.
  35. Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time series,” The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, p. 1995, 1995.
  36. H. Yang, Y. Zhu, and Q. Huang, “A multi-indicator feature selection for CNN-driven stock index prediction,” in Proceedings of the International Conference on Neural Information Processing, pp. 35–46, Springer, Berlin, Germany, 2018.
  37. C. Yang, S. Ren, Y. Liu, H. Cao, Q. Yuan, and G. Han, “Personalized channel recommendation deep learning from a switch sequence,” IEEE Access, vol. 6, pp. 50824–50838, 2018.
  38. K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: a search space odyssey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2016.
  39. X. Zhong and D. Enke, “Forecasting daily stock market return using dimensionality reduction,” Expert Systems with Applications, vol. 67, pp. 126–139, 2017.

Copyright © 2020 Can Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

