Multistep-Ahead Stock Price Forecasting Based on Secondary Decomposition Technique and Extreme Learning Machine Optimized by the Differential Evolution Algorithm
Forecasting stock market prices is of great practical significance. Based on a secondary decomposition technique that combines variational mode decomposition (VMD) and ensemble empirical mode decomposition (EEMD), this paper constructs a new hybrid prediction model by incorporating an extreme learning machine (ELM) optimized by the differential evolution (DE) algorithm. The hybrid model applies VMD to the original stock index price sequence to obtain the modal components and a residual term, applies EEMD to the residual term, and then superimposes the DE-ELM predictions for each modal component and for the residual term to obtain the final prediction. To verify the validity of the model, this paper constructs a series of benchmark models and tests them on samples of the S&P 500 index and the HS300 index using one-step, three-step, and five-step-ahead forecasting. The empirical results show that the proposed hybrid model achieves the best prediction performance in all prediction scenarios, which indicates that the modeling idea of focusing on the residual term effectively improves prediction performance. In addition, the models combined with decomposition techniques outperform the single DE-ELM model, and the secondary decomposition technique has a significant advantage over the single decomposition technique.
The stock index is an important indicator reflecting the development of the stock market. The stock index price forecast can not only provide investors with a reasonable investment basis, but also help relevant policy makers to monitor systemic risks and asset price bubbles in the stock market in a timely manner. However, the price fluctuations of the stock index are affected by complex factors such as the development of the overall financial market and investor sentiment [1–3]. Compared to a single stock, the price of a stock index has more uncertainty. It has a series of characteristics such as nonstationarity, nonlinearity, and long memory. Therefore, the majority of scholars have been committed to continuously improving the accuracy of stock index prediction [4–6].
Stock index price prediction models can be divided into two categories according to the variables they involve: multivariable prediction and univariate prediction. In multivariable forecasting, the model includes not only historical stock price data but also technical analysis tools, various macroeconomic variables, and investor sentiment that affect stock index prices [7–9]. Among them, technical analysis tools have been widely used by stockbrokers. In stock price prediction research, technical analysis tools are often combined with other prediction models, especially artificial intelligence models [10, 11].
In univariate prediction, the input variables of the prediction model include only historical stock index price data [12–15]. This article falls within the scope of univariate prediction.
Scholars have conducted in-depth research on univariate prediction. There are three main types of prediction models: traditional econometric models, artificial intelligence models, and hybrid models. Traditional econometric models include the autoregressive conditional heteroscedasticity (ARCH) model, the generalized autoregressive conditional heteroscedasticity (GARCH) model [17, 18], the autoregressive moving average (ARMA) model [19, 20], the autoregressive integrated moving average (ARIMA) model, and the vector autoregressive model. However, traditional econometric models often need to satisfy a series of strict assumptions when dealing with data; they are usually suitable only for stationary, linear data and do not predict the stock index well. Therefore, artificial intelligence models, which predict nonlinear time series well, have been widely used in stock market price prediction in recent years.
Artificial intelligence models improve prediction accuracy by training on historical price data. Common representative models include the artificial neural network (ANN) [23–25], feedforward neural network (FFNN), support vector machine (SVM) [27, 28], and long short-term memory neural network (LSTM) [29, 30]. Unfortunately, stock index prices are highly volatile and the data series contain a lot of noise, while the traditional feedforward network is difficult to tune and has poor noise resistance. A large number of studies have shown that the prediction effect of hybrid models is often better than that of single models [31–33]. The typical method among hybrid models is the TEI@I complex system research methodology. The method first uses a decomposition algorithm to process the original, noisy sequence and then models each decomposed component with a prediction model. This strategy significantly improves the prediction accuracy of time series and has been widely used in various fields of prediction in recent years [35–39].
Common decomposition algorithms include wavelet decomposition (WD)  and empirical mode decomposition (EMD) . Compared with WD, EMD is more suitable for processing nonlinear complex sequences. Wei  proposed a mixed time series model based on empirical mode decomposition to predict the stock prices of the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) and Hang Seng Stock Index (HSI). The empirical results prove the effectiveness of the EMD algorithm in forecasting time series. Cheng and Wei  combined empirical mode decomposition and the support vector regression model to build a mixed model for forecasting time series, taking the stock price of Taiwan Stock Exchange (TAIEX) as an empirical target. The empirical results show that the mixed model is superior to the autoregressive model and the support vector regression model in forecasting performance.
The EMD algorithm has the disadvantage of easily superimposing various modes in the result of decomposing the original sequence. Therefore, the ensemble empirical mode decomposition (EEMD)  was further proposed and applied. Li and Feng  applied ensemble empirical mode decomposition (EEMD) to the processing of nonlinear and nonstationary financial time series. Fang  used ensemble empirical mode decomposition (EEMD) to explore the prediction ability of investor sentiment in different periods of stock market returns. Although EEMD eliminates the phenomenon of mode aliasing to some extent, the EEMD algorithm cannot completely separate components with similar frequencies. In addition, the EEMD algorithm has the defect of overenveloping or incomplete enveloping the data, and the decomposition of the original data is still insufficient.
In recent years, as an improvement on the above decomposition algorithms, variational mode decomposition (VMD) has performed excellently in processing signals corrupted by noise. VMD can effectively separate components with similar frequencies and improve the accuracy of the decomposition of the original sequence. It has been widely combined with various prediction models in empirical research and applied to prediction in the energy price market and the financial market [36, 48]. Lahmiri combined variational mode decomposition (VMD) and the backpropagation neural network (BPNN) to construct the VMD-PSO-BPNN model to predict stock prices. The empirical results show that, compared with the benchmark model PSO-BPNN, the VMD-PSO-BPNN model has advantages in prediction performance. Bisoi et al. combined variational mode decomposition (VMD) and an optimized extreme learning machine to build a DE-VMD-RKELM model to predict the BSE S&P 500 Index (BSE), Hang Seng Index (HSI), and Financial Times Stock Exchange 100 Index (FTSE). Wang et al. established a two-level decomposition hybrid model based on fast ensemble empirical mode decomposition (FEEMD), VMD, and the backpropagation (BP) neural network to predict the electricity price.
As a method to form the prediction part of the TEI@I complex system research methodology, any regression method is adequate. However, there are differences in the prediction accuracy and the completion time of different models. The artificial intelligence models mentioned above (SVR, ANN, and LSTM) are better choices than traditional econometric models. Unfortunately, the prediction ability of the traditional feedforward network is limited by parameter adjustment, which has the defects of slow calculation speed and unstable generalization ability. Therefore, extreme learning machine (ELM)  has been widely used in various prediction fields due to its advantages in learning convergence speed and parameter setting [53, 54].
In summary, in the existing stock index price prediction research, the TEI@I complex system research methodology represented by “decomposition-prediction” has gained recognized advantages in the prediction field. However, the existing research still has the following deficiencies:
(1) In the existing research using VMD technology, the residual term is either discarded or treated as an ordinary component. None of these studies focuses on the residual, which contains rich information.
(2) Existing studies that use only the basic ELM algorithm, whose input weights and hidden layer thresholds are set randomly, suffer from reduced reliability of the prediction results and reduced stability of the prediction performance.
(3) Multistep forward forecasting can help investors plan long-term investment goals. However, most existing studies on stock index price forecasting are one-step forward, and few address multistep forward prediction.
In view of the shortcomings of the existing research, the VMD-RES.-EEMD-DE-ELM hybrid model is constructed in this paper, where RES. represents the residual term after VMD. Based on the secondary decomposition technology of VMD and EEMD, this model further incorporates the ELM model optimized by the DE algorithm. The innovations of the VMD-RES.-EEMD-DE-ELM model are as follows:
(1) The secondary decomposition technique combining VMD and EEMD is applied to the price sequence, in which EEMD is used to process the residual term of VMD. The secondary decomposition technology can better capture the overall features of the data and improve the overall prediction accuracy of the hybrid model.
(2) The DE algorithm, which has been proven effective [44, 56, 57], is introduced to optimize the input weights and hidden layer thresholds of ELM during the training process, thus improving the stability and reliability of the ELM prediction results. The DE-ELM model is used as the prediction model in the VMD-RES.-EEMD-DE-ELM model.
(3) The VMD-RES.-EEMD-DE-ELM model is applied to multistep forward prediction, which verifies the accuracy and robustness of the model’s prediction performance.
In order to verify the validity of the model proposed in this article, the daily price data of the S&P 500 index and the HS300 index are used as empirical samples, and seven benchmark models are selected, namely, ELM, KELM, DE-ELM, EEMD-DE-ELM, EEMD-VMD-DE-ELM, VMD-DE-ELM, and VMD-RES.-DE-ELM.
The rest of this article is organized as follows: Section 2 introduces the component methods of the VMD-RES.-EEMD-DE-ELM model and the detailed steps of constructing it. Section 3 introduces the benchmark models and conducts empirical analysis on the S&P 500 index and the HS300 index samples to test the performance of the proposed model. Section 4 summarizes this article.
2. Introduction of Methodology
This paper builds a combination model (VMD-RES.-EEMD-DE-ELM) based on the secondary decomposition technique and the machine learning method to predict stock index prices. Before building the model, it is necessary to briefly introduce its components: VMD technology, EEMD technology, the extreme learning machine, the differential evolution algorithm, and the VMD-RES.-EEMD-DE-ELM hybrid model constructed in this paper.
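VMD decomposes the original input signal $f(t)$ into $K$ band-limited intrinsic mode functions with certain sparsity. In this paper, each of them is simply called a VMF, i.e., the set $\{u_k\} = \{u_1, \ldots, u_K\}$, in which each mode $u_k$ is concentrated around its center frequency $\omega_k$. By minimizing the sum of the bandwidth estimates of all modes, the mode component signals and the related parameters can be obtained. The VMD decomposition process is therefore the solution process of the following constrained variational problem:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t), \tag{1}$$

where $\{u_k\}$ are the modal components (VMFs) obtained by VMD; $\{\omega_k\}$ are the center frequencies corresponding to each VMF; $\partial_t$ represents the partial derivative with respect to $t$; $\delta(t)$ is the impulse function; $*$ is the convolution symbol; and $f(t)$ is the original input signal. The analytic signal associated with $u_k(t)$ is obtained by the Hilbert transform, which yields its unilateral spectrum. The exponential term $e^{-j\omega_k t}$ shifts the spectrum of each $u_k(t)$ to the baseband according to its estimated center frequency $\omega_k$. To solve the constrained variational problem, it is converted into an unconstrained one. By introducing the quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda(t)$, the following augmented Lagrangian describing the unconstrained variational problem is obtained:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle, \tag{2}$$

where the penalty factor $\alpha$ ensures the accuracy of signal reconstruction and the Lagrange multiplier $\lambda(t)$ enforces the constraint strictly. Further, the alternating direction method of multipliers (ADMM) is adopted to search iteratively for the saddle point of the Lagrange function, thus obtaining the optimal solution of the unconstrained variational problem in equation (2). The update expressions of the VMFs and the center frequencies are, respectively,

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2}, \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}. \tag{4}$$

The specific implementation steps of the VMD decomposition method are as follows:
Step 1. Initialize the values of parameters such as the VMFs and the center frequencies, that is, $\{\hat{u}_k^{1}\}$ and $\{\omega_k^{1}\}$, and select the appropriate number of VMFs, that is, the value of $K$.
Step 2. Update the values of $\hat{u}_k$ and $\omega_k$ according to equations (3) and (4), respectively.
Step 3. Update the value of $\hat{\lambda}$, where $\tau$ is the update step size of the multiplier:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right). \tag{5}$$

Step 4. Given the determination accuracy $\varepsilon > 0$, if the following condition is met, stop the iteration; otherwise, return to Step 2:

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon. \tag{6}$$

In the above equations, $\hat{u}_k^{n+1}(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms corresponding to $u_k^{n+1}(t)$, $f(t)$, and $\lambda(t)$.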
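The iteration in Steps 1 to 4 can be illustrated with a short NumPy sketch. It is a simplified illustration under stated assumptions, not the implementation used in this paper: it works directly on the one-sided FFT spectrum without the mirror extension that practical VMD codes apply, the function name `vmd_sketch` and the defaults `alpha=2000`, `tau=0` are hypothetical choices, and a small constant is added to the denominators to avoid division by zero. The returned `residual` is simply the input minus the sum of the extracted modes.

```python
import numpy as np

def vmd_sketch(signal, n_modes, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD on the one-sided spectrum (illustrative only)."""
    f = np.asarray(signal, dtype=float)
    T = f.size
    freqs = np.fft.fftfreq(T)                 # cycles per sample, in [-0.5, 0.5)
    pos = freqs >= 0
    f_hat = np.fft.fft(f)
    f_hat[freqs < 0] = 0.0                    # keep the non-negative half only
    f_hat[freqs > 0] *= 2.0                   # so that Re(ifft) recovers the signal
    u_hat = np.zeros((n_modes, T), dtype=complex)
    omega = 0.5 * np.arange(n_modes) / n_modes   # Step 1: initial centre frequencies
    lam_hat = np.zeros(T, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(n_modes):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Step 2a: mode update (Wiener-filter form)
            u_hat[k] = (f_hat - others + lam_hat / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Step 2b: centre frequency = power-weighted mean frequency
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-12)
        # Step 3: multiplier update (no effect when tau = 0)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Step 4: relative change of the modes as stopping criterion
        change = sum(
            np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2) / (np.sum(np.abs(u_prev[k]) ** 2) + 1e-12)
            for k in range(n_modes)
        )
        if change < tol:
            break

    modes = np.real(np.fft.ifft(u_hat, axis=1))   # back to the time domain
    residual = f - modes.sum(axis=0)              # what the modes do not explain
    return modes, omega, residual

# Usage: two cosines at 0.05 and 0.2 cycles per sample
t = np.arange(500)
x = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, omega, residual = vmd_sketch(x, n_modes=2)
```

With `tau=0` the constraint is not enforced exactly, so the modes do not sum to the input and a nonzero residual remains; this is the quantity that the hybrid model later passes to EEMD.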
Compared with empirical mode decomposition (EMD), EEMD has stronger adaptability and local variation characteristics and can effectively identify the nonlinearity and nonstationarity of time series. For an original signal $x(t)$, the EEMD algorithm is briefly implemented as follows:
Step 1. A white noise sequence $\varepsilon_m(t)$ is added to $x(t)$ to obtain the noisy signal $x_m(t) = x(t) + \varepsilon_m(t)$.
Step 2. $x_m(t)$ is decomposed into IMFs and a residual.
Step 3. Steps 1 and 2 are repeated $M$ times, using a different white noise sequence each time ($M$ is the ensemble number).
Step 4. The influence of the white noise on the original components is eliminated by calculating the average of all IMF components and residual terms.
The complete time series can then be expressed as the sum of the IMFs and the residual term:

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t), \tag{7}$$

where $c_i(t)$ is the $i$th intrinsic mode function (IMF), $r_n(t)$ represents the final residual component, $n$ is the total number of IMFs, and $i$ is the modal index. An IMF should meet the following two conditions [41, 57]: first, over the entire length of the IMF, the number of extrema and the number of zero crossings must be equal or differ by at most 1; second, at any point, the average value of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is zero. Therefore, IMF1 has the maximum amplitude and the highest frequency, the subsequent IMFs have successively lower amplitude and frequency, and the residual term represents the slow change process around the long-term average.
Extreme learning machine (ELM) was originally proposed by Huang Guangbin. As a special kind of the single-hidden layer feedforward neural network (SLFN), it contains only one hidden layer, and its network structure is shown in Figure 1.
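In the ELM network training process, the connection weights between the input layer and the hidden layer and the thresholds of the hidden layer can be randomly given and need not be adjusted. Assume there are $N$ arbitrary training samples $(x_j, t_j)$, where $x_j = [x_{j1}, \ldots, x_{jn}]^T \in \mathbb{R}^n$ and $t_j = [t_{j1}, \ldots, t_{jm}]^T \in \mathbb{R}^m$. Given that the number of hidden layer nodes of ELM is $L$, the randomly generated connection weight matrix between the input layer and the hidden layer is $w = [w_1, \ldots, w_L]$, and the offset of the hidden layer is $b = [b_1, \ldots, b_L]$, the network output can be expressed as

$$o_j = \sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i), \quad j = 1, \ldots, N, \tag{8}$$

where $o_j$ is the training output, $\beta_i$ is the connection weight between the hidden layer and the output layer, $g(\cdot)$ is the corresponding activation function, and $w_i \cdot x_j$ is the inner product of $w_i$ and $x_j$. The goal of the ELM network training is to make the error function reach the minimum, that is,

$$\min \sum_{j=1}^{N} \left\| o_j - t_j \right\|. \tag{9}$$

Given the input weights $w_i$ and the hidden layer offsets $b_i$, the solution process of the network is transformed into searching for the optimal output weights $\beta_i$, so that

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N. \tag{10}$$

The matrix expression is

$$H\beta = T, \tag{11}$$

where $H$ is the output matrix of the hidden layer nodes:

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \tag{12}$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}. \tag{13}$$

The least-squares solution of the linear system (11) is

$$\hat{\beta} = H^{+} T, \tag{14}$$

where $H^{+}$ is the generalized (left) inverse matrix of $H$. Therefore, once the input weights and the hidden layer offsets are randomly given in the ELM training process, the output matrix $H$ of the hidden layer is uniquely determined, and the output weights follow directly without any iterative parameter adjustment.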
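The training procedure described above amounts to one random draw followed by one pseudoinverse. The following NumPy sketch illustrates it under assumptions that the text does not fix: a sigmoid activation, weights and offsets drawn uniformly from [-1, 1], and the hypothetical function names `elm_fit` and `elm_predict`.

```python
import numpy as np

def _hidden_output(X, W, b):
    """Hidden-layer output matrix H with a sigmoid activation (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W.T + b)))

def elm_fit(X, y, n_hidden, rng):
    """Draw random input weights and offsets, then solve for the output weights."""
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))  # input weights, never tuned
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden-layer offsets
    H = _hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ y                             # output weights = pinv(H) @ targets
    return W, b, beta

def elm_predict(X, W, b, beta):
    return _hidden_output(X, W, b) @ beta

# Usage on a toy regression problem
rng = np.random.default_rng(0)
X_toy = rng.uniform(-1.0, 1.0, size=(200, 2))
y_toy = X_toy[:, 0] + 2.0 * X_toy[:, 1]
W, b, beta = elm_fit(X_toy, y_toy, n_hidden=35, rng=rng)
fit_rmse = np.sqrt(np.mean((elm_predict(X_toy, W, b, beta) - y_toy) ** 2))
```

Because `W` and `b` are random, two runs with different seeds generally give different fits, which is the instability that the DE optimization in the next subsection is meant to reduce.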
2.3.2. Differential Evolution
DE algorithm is an evolutionary computation method based on population difference and has strong global optimization ability. Firstly, a mutation operation is performed on the difference of parent individuals to generate mutation individuals. Then, the parent individual and the variant individual perform crossover operation according to a certain probability to generate mixed individuals. Finally, the selection operation of survival of the fittest is carried out between the parent and the mixed individuals according to the fitness to achieve the evolution of the population. The basic process is shown in Figure 2.
The basic steps are as follows:
Step 1. Initialize the related parameters: the population size is set to $NP$; the mutation factor is $F$; the crossover factor is $CR$; the spatial dimension is $D$; the generation counter is $t = 0$.
Step 2. Initialize the parent population: $X^0 = \{x_1^0, \ldots, x_{NP}^0\}$, $x_i^0 = (x_{i,1}^0, \ldots, x_{i,D}^0)$.
Step 3. Calculate the fitness value of each individual, that is, the objective function of the model.
Step 4. Mutation operation: the mutant individual $v_i^{t+1}$ is generated from the difference between parent individuals. The generation process is shown in the following equation:

$$v_i^{t+1} = x_{r_1}^{t} + F \left( x_{r_2}^{t} - x_{r_3}^{t} \right), \tag{15}$$

where $x_{r_1}^{t}$, $x_{r_2}^{t}$, and $x_{r_3}^{t}$ are three parent individuals randomly selected from the population and $r_1 \neq r_2 \neq r_3 \neq i$.
Step 5. Crossover operation: a crossover operation is performed between the parent individual $x_i^{t}$ and the mutant individual $v_i^{t+1}$ to generate a mixed individual $u_i^{t+1}$. The generation process is as follows:

$$u_{i,j}^{t+1} = \begin{cases} v_{i,j}^{t+1}, & \text{if } \mathrm{rand}(j) \le CR \text{ or } j = j_{\mathrm{rand}}, \\ x_{i,j}^{t}, & \text{otherwise}, \end{cases} \tag{16}$$

where $\mathrm{rand}(j)$ is a random number in the range $[0, 1]$ and $j_{\mathrm{rand}}$ is a random integer in $\{1, \ldots, D\}$.
Step 6. Selection operation: the differential evolution algorithm keeps excellent individuals and eliminates inferior individuals through continuous evolution. It selects the one with the better fitness value from the parent individual $x_i^{t}$ and the mixed individual $u_i^{t+1}$ as the next-generation individual $x_i^{t+1}$, as shown in the following formula:

$$x_i^{t+1} = \begin{cases} u_i^{t+1}, & \text{if } f(u_i^{t+1}) < f(x_i^{t}), \\ x_i^{t}, & \text{otherwise}, \end{cases} \tag{17}$$

where $f(\cdot)$ is the fitness function, that is, the objective function to be minimized.
Step 7. Iteration termination test: if the error requirement is met or the maximum number of iterations is reached in the next-generation population $X^{t+1}$, the iteration is stopped and the optimal individual is output. Otherwise, mutation, crossover, and selection operations continue until the stop condition is met.
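Steps 1 to 7 can be sketched in plain Python as follows. This is an illustrative sketch, not the code used in the paper: the function name, the uniform initialization inside `bounds`, the fixed generation budget as the only stopping rule, and the use of `<=` when comparing trial and parent fitness are assumptions made here.

```python
import random

def differential_evolution(fitness, dim, bounds, pop_size=10, F=0.5, CR=0.9,
                           max_gen=20, seed=0):
    """Minimise `fitness` with a basic rand/1/bin differential evolution loop."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Steps 1-3: initialise the parent population and evaluate its fitness
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(max_gen):
        new_pop, new_fit = [], []
        for i in range(pop_size):
            # Step 4: mutation from three distinct parents other than i
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
            # Step 5: binomial crossover; j_rand guarantees one mutant gene
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() <= CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Step 6: greedy selection between parent and trial individual
            f_trial = fitness(trial)
            if f_trial <= fit[i]:
                new_pop.append(trial)
                new_fit.append(f_trial)
            else:
                new_pop.append(pop[i])
                new_fit.append(fit[i])
        pop, fit = new_pop, new_fit
    # Step 7: return the best individual found within the generation budget
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def sphere(v):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(x * x for x in v)

best_x, best_f = differential_evolution(sphere, dim=3, bounds=(-5.0, 5.0),
                                        pop_size=20, max_gen=100, seed=1)
```

Since selection never replaces a parent with a worse trial, the best fitness in the population cannot increase from one generation to the next.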
2.3.3. Construction of DE-ELM Model
Since the input weight matrix and hidden layer threshold of the model are randomly set during the network training of the basic ELM, the training stability of the model is weakened. In order to improve the training stability of the ELM, the DE algorithm is adopted to optimize the input weight matrix and hidden layer threshold of the ELM model. It is worth mentioning that, in the ELM model, the number of hidden layer neurons must be determined before the input weight matrix and the hidden layer threshold can be optimized. Based on experience, this paper sets the number of hidden layer neurons in the ELM network to 35. On this basis, the DE-ELM model is proposed in this paper. Its specific steps are as follows:
Step 1. The random input weights and the hidden layer thresholds of the ELM model are encoded to initialize the population.
Step 2. Initialize the relevant parameters of the DE algorithm. The mutation factor of the DE algorithm is 0.5, the crossover factor is 0.9, the maximum number of iterations is 20, the dimension of each individual is 245, and the population size is 10. The same settings are used for all hybrid models in this paper.
Step 3. The fitness value of each individual in the population is calculated, which is the root mean square error of the predicted output of the ELM model.
Step 4. Mutation, crossover, and selection operations are performed in sequence, and the iteration termination check is performed. If the termination condition is satisfied, the optimal input weights and hidden layer thresholds are output; otherwise, the iteration continues until the optimal output is obtained.
Step 5. The input weights and hidden layer thresholds of the ELM model are set to the optimal individual found by the DE algorithm, thus obtaining the optimized DE-ELM model.
In the above steps of optimizing the ELM model, the specific decision variables are the input weight matrix and the hidden layer threshold, and the goal of optimization is to minimize the value of the root mean square error (RMSE) index as a function of fitness. The specific formula of RMSE can be found in equation (19).
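The individual dimension of 245 is consistent with encoding every input weight and every hidden layer threshold in a single vector. Assuming that encoding, and writing $n$ for the number of inputs, $L$ for the number of hidden neurons, and $D$ for the dimension (symbols introduced here for illustration), the six-dimensional input layer and the 35 hidden neurons used in this paper give

$$D = n \times L + L = 6 \times 35 + 35 = 245.$$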
In addition, the input layer dimension of the ELM network in the DE-ELM model is set to 6; that is, the data of the 7th day are predicted based on the data of the previous 6 days, and by analogy, rolling prediction is carried out. That is, $\hat{x}_t = f(x_{t-1}, x_{t-2}, \ldots, x_{t-6})$, where $\hat{x}_t$ is the predicted value of the sequence at time $t$ and $x_{t-1}$, …, $x_{t-6}$ are the values of the previous 6 days of the sequence at time $t$, respectively.
2.4. Construction of VMD-RES.-EEMD-DE-ELM Hybrid Model
It is well known that the price sequence of the stock index shows nonstationary and nonlinear complex characteristics. VMD can effectively remove noise from the original price series and extract the main features hidden in it. In addition, unlike with EEMD, the residual term remaining after VMD processes the original sequence contains rich information. However, previous studies directly discarded the residual term or treated it as a common component, which failed to capture the complete characteristics of the time series and reduced the overall prediction accuracy of the combined model. Therefore, this paper proposes a secondary decomposition technology based on the combination of VMD and EEMD: after VMD processes the original sequence, EEMD is used to further process the residual term. Further, this paper combines the forecast advantages of the DE-ELM model to build the VMD-RES.-EEMD-DE-ELM hybrid model. The modeling steps are visualized in Figure 3 and are detailed as follows:
Step 1. VMD is applied to the original price sequence of the stock index to obtain the modal components (denoted as VMF) and the residual term containing complex information.
Step 2. The modal components are normalized. After an appropriate number of training and test samples are selected, the DE-ELM model is trained on the training samples and applied to the test samples to obtain the prediction results of each modal component.
Step 3. EEMD is applied to the residual term to obtain different modal components (denoted as IMF). The content of Step 2 is repeated, and the DE-ELM model is used to predict each component. After superposition, the predicted value of the residual term is obtained.
Step 4. The predicted values of the modal components obtained after applying VMD and the predicted value of the residual term are superimposed, and finally the predicted values of the original price series are obtained.
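The superposition logic of Steps 1 to 4 can be sketched independently of any particular decomposition or forecasting code. In the sketch below, `primary_decompose`, `secondary_decompose`, and `forecast` are hypothetical callables standing in for VMD, EEMD, and a trained DE-ELM predictor; normalization is omitted for brevity, and the toy stand-ins at the bottom exist only to make the example runnable.

```python
def hybrid_forecast(series, primary_decompose, secondary_decompose, forecast):
    """Forecast the next value by summing component-wise forecasts.

    primary_decompose(series)     -> list of modes, each as long as `series`
    secondary_decompose(residual) -> list of components of the residual
    forecast(component)           -> one-value forecast for that component
    """
    # Step 1: primary decomposition and the residual it leaves behind
    modes = primary_decompose(series)
    residual = [x - sum(vals) for x, vals in zip(series, zip(*modes))]
    # Step 2: forecast every primary mode separately
    mode_forecasts = [forecast(m) for m in modes]
    # Step 3: decompose the residual again and forecast each piece
    residual_forecast = sum(forecast(c) for c in secondary_decompose(residual))
    # Step 4: superimpose all component forecasts
    return sum(mode_forecasts) + residual_forecast

# Toy stand-ins (assumptions for illustration only)
def moving_average_mode(series, window=3):
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return [out]

def split_in_half(residual):
    return [[v / 2.0 for v in residual], [v / 2.0 for v in residual]]

def last_value(component):
    return component[-1]

prices = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 11.0]
next_value = hybrid_forecast(prices, moving_average_mode, split_in_half, last_value)
```

With the naive last-value forecaster, the component forecasts add back up to the last observed price, which is a quick sanity check that no part of the series is lost between the two decomposition levels.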
3. Empirical Analysis
3.1. Sources and Processing of Data
This paper selects the S&P 500 index and the HS300 index of mainland China as representatives of the stock market [58–60] to test the prediction accuracy of the VMD-RES.-EEMD-DE-ELM hybrid model constructed in this paper. The S&P 500 index covers a wide range of stocks in the market, while the HS300 index consists of 300 large-scale and highly liquid-constituent stocks on China’s Shanghai and Shenzhen Stock Exchanges, both of which are representatives of the securities market.
The sample of the S&P 500 index selected in this paper covers a period of 1,511 transaction data from January 2, 2014, to December 27, 2019. The sample of the HS300 index covers 1,465 transaction data from January 2, 2014, to January 2, 2020. All models in this paper are run by using MATLAB 2019b software package. All the data come from the Choice Financial Terminal of Oriental Fortune.
The samples in this paper are composed of the training set and the test set, respectively. In the sample of the S&P 500 index, the first 1361 data are training sets, and the next 150 data are set as test sets. In the sample of the HS300 index, the first 1315 data are training sets, and the next 150 data are set as test sets. Figure 4 shows the trend of the S&P 500 index and the HS300 index samples. Table 1 lists the information of descriptive statistical analysis of the S&P 500 index and the HS300 index, including the maximum (Max), minimum (Min), average (Ave), median (Med), and standard deviation (Std).
In addition, in order to further test the robustness of the model, all models in this paper are predicted by one-step, three-step, and five-step-ahead forecasting; i.e., the data of the previous 6 trading days are used to predict the data of the 7th, 9th, and 11th trading days, respectively. That is, in all the prediction models constructed in this paper, the input variables of the model are the data of the previous 6 trading days, and in the one-step-ahead forecasting, the output variable is the forecast data of the 7th trading day; in the three-step-ahead forecasting, the output variable is the forecast data of the 9th trading day; in the five-step-ahead forecasting, the output variable is the forecast data of the 11th trading day.
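The construction of input and output pairs for the three horizons can be made concrete with a short sketch. The function name `make_samples` and its parameters `n_lags` and `horizon` are hypothetical; the sketch assumes direct multistep forecasting, in which the target is simply taken further ahead of the same 6-value input window.

```python
def make_samples(series, n_lags=6, horizon=1):
    """Build (input window, target) pairs for direct h-step-ahead forecasting."""
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        inputs.append(list(series[start:start + n_lags]))     # n_lags consecutive values
        targets.append(series[start + n_lags + horizon - 1])  # value `horizon` steps after the window
    return inputs, targets

# Days numbered 1..20 so that each value equals its trading-day index
days = list(range(1, 21))
X1, y1 = make_samples(days, horizon=1)   # first target is day 7
X3, y3 = make_samples(days, horizon=3)   # first target is day 9
X5, y5 = make_samples(days, horizon=5)   # first target is day 11
```

A longer horizon leaves fewer usable samples, because the last windows no longer have a target inside the series.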
When the original sequence is processed by VMD, the number of modal components must be given in advance. Denoting the number of decomposed modal components by $n$, this paper chooses $n = 10$ according to the principle that $2^n$ does not exceed the sample size ($2^{10} = 1024$ is below both 1,511 and 1,465). Furthermore, in order to ensure the training effect of the DE-ELM model, this paper applies min-max normalization to the sequence data of each modal component generated by the decomposition. The specific mathematical expression is as follows:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{18}$$

where $x^{*}$ is the normalized subsequence data, $x$ is its original value, and $x_{\max}$ and $x_{\min}$ are its maximum and minimum values, respectively.
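A minimal sketch of the min-max normalization and its inverse is given below. The function names are hypothetical, and whether the minimum and maximum are taken over the whole component or over the training part only is not stated in the text; the sketch leaves that choice to the caller.

```python
def minmax_fit(values):
    """Return the (min, max) pair used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi

def minmax_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_invert(scaled, lo, hi):
    """Undo the scaling, e.g. to bring predictions back to price units."""
    return [s * (hi - lo) + lo for s in scaled]

component = [10.0, 15.0, 20.0]
lo, hi = minmax_fit(component)
scaled = minmax_scale(component, lo, hi)
restored = minmax_invert(scaled, lo, hi)
```

The inverse transform is needed before the component predictions are superimposed, since each component is scaled with its own minimum and maximum.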
3.2. Evaluation Index of Model Prediction Results
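In order to compare the prediction effects of different prediction models quantitatively, this article introduces three evaluation indicators of prediction performance, namely, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The specific calculation formulas are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( y_t - \hat{y}_t \right)^2}, \tag{19}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| y_t - \hat{y}_t \right|, \tag{20}$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \tag{21}$$

where $y_t$ and $\hat{y}_t$ are the real and predicted values of the S&P 500 index and the HS300 index, respectively, $N$ is the size of the test sample, $t$ is the serial number of the test sample points, and RMSE, MAE, and MAPE indicate the prediction accuracy of the model; the smaller the value of the index, the higher the prediction accuracy of the model.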
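The three indicators can be computed with a few lines of plain Python. The function names are illustrative; MAPE is returned here as a fraction, and multiplying it by 100 gives the percentage form that some studies report.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (actual values must be nonzero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0]
predicted = [110.0, 190.0]
scores = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
# scores == (10.0, 10.0, 0.075) up to floating-point rounding in the last entry
```

RMSE and MAE are in index points and therefore not comparable between the S&P 500 and the HS300, whereas MAPE is scale free.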
3.3. Comparative Analysis of Model Prediction Effects
In order to prove the superiority of the VMD-RES.-EEMD-DE-ELM model, this paper constructs seven other different benchmark models for comparison. Figure 5 shows the multistep forward prediction results of each model for the S&P 500 index and the HS300 index, respectively. In addition, Tables 2–4, respectively, record the specific values of evaluation indexes generated by the eight prediction models for the S&P 500 index and the HS300 index prediction. The empirical results show that the VMD-RES.-EEMD-DE-ELM hybrid model proposed in this paper has achieved the best performance in all cases compared with all other benchmark models in the prediction results of the S&P 500 index and the HS300 index. Detailed analysis of the results is described below.
3.3.1. Comparative Analysis of Single Prediction Models
Table 2 shows the prediction results of the ELM model, the kernel extreme learning machine (KELM) model, and the DE-ELM model for the S&P 500 index and the HS300 index. In the one-step, three-step, and five-step-ahead forecasting results, the DE-ELM model shows an obvious improvement in RMSE, MAE, and MAPE compared with the traditional ELM model and the KELM model; that is, the DE-ELM model optimized by the DE algorithm has prediction advantages over the basic ELM model and the KELM model, showing the superiority and robustness of its prediction performance. Therefore, the DE-ELM model is selected as the prediction model in the hybrid model constructed in this paper.
3.3.2. Comparative Analysis of Mixed Models without Residual Decomposition
Table 3 shows the prediction results of the EEMD-DE-ELM, EEMD-VMD-DE-ELM, and VMD-DE-ELM models for the S&P 500 index and the HS300 index. EEMD-DE-ELM is to discard the residual term after applying EEMD technology to the original sequence and use the DE-ELM model to carry out combined prediction on the remaining components. The model EEMD-VMD-DE-ELM indicates that after EEMD technology is applied to the original sequence, VMD technology is further applied to the first high-frequency component (IMF1) to improve its prediction accuracy, and then the prediction results of each component and residual term are superimposed to form the final prediction result. VMD-DE-ELM discards the residual term after applying VMD technology to the original sequence and uses the DE-ELM model to carry out combined prediction on the remaining various components.
By comparing the prediction results of the single prediction models in Table 2 with those of the prediction models combined with decomposition technology in Table 3, it can be found that the combined models have an overwhelming advantage in prediction accuracy over the noncombined models. The prediction accuracy of the single prediction models is poor, while the prediction models using decomposition technology are much more accurate. The results show that the decomposition technique can effectively reduce the complexity of the price series and improve the prediction accuracy.
In addition, by comparing the prediction results of the EEMD-DE-ELM model and the VMD-DE-ELM model, it can be found that the prediction accuracy of the VMD-DE-ELM model is better than that of the EEMD-DE-ELM model at the same decomposition level. Although the EEMD-VMD-DE-ELM combination model with secondary decomposition technology achieves better prediction results than the EEMD-DE-ELM model, its performance is still inferior to that of the VMD-DE-ELM model. The results show that VMD performs better than EEMD in extracting sequence data features and processing complex signals.
3.3.3. Comparative Analysis of Mixed Models considering Residual Decomposition
VMD-RES.-DE-ELM in Table 4 is a combined model that uses the DE-ELM model to predict each VMF component and residual term after applying VMD technology to the original sequence. The VMD-RES.-EEMD-DE-ELM model is the hybrid model proposed in this paper.
By comparing the prediction results in Tables 3 and 4, it can be seen that, in the multistep forward prediction results, all the evaluation indexes of the VMD-RES.-DE-ELM model, which includes the residual term, are better than those of the VMD-DE-ELM model. This shows that including the residual term, which contains complex information, in the predictive analysis helps improve the overall prediction performance of the model.
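As an illustration of the one-step, three-step, and five-step settings, the sketch below builds training pairs for direct h-step-ahead forecasting: each input is a window of lagged values, and the target is the value h steps after the end of that window. This section does not state whether the paper uses a direct or a recursive strategy, nor which lag length it uses, so the direct strategy, the function `make_samples`, and the parameter `n_lags` are assumptions made for illustration.

```python
def make_samples(series, n_lags, horizon):
    """Build (inputs, targets) for direct `horizon`-step-ahead forecasting.

    Each input is `n_lags` consecutive values; the target is the value
    `horizon` steps after the last value of that window.
    """
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags + horizon - 1])
    return inputs, targets


series = list(range(10))  # toy series 0..9 so the indexing is easy to check
for h in (1, 3, 5):
    X, y = make_samples(series, n_lags=3, horizon=h)
    print(h, len(X), X[0], y[0])
```

Longer horizons leave fewer usable samples from the same series and place the target further from the most recent input, which is one reason multistep results are usually reported separately for each horizon.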
Further, comparing the prediction results of the VMD-RES.-DE-ELM model and the VMD-RES.-EEMD-DE-ELM model shows that the residual term generated by applying VMD technology to the original sequence does contain important information and that its complexity is relatively high. When the DE-ELM model is used to predict the residual term directly, the improvement is very limited. Therefore, in this study, the VMD-RES.-EEMD-DE-ELM model is the best choice, and the empirical results confirm that the proposed secondary decomposition technology can combine the advantages of VMD and EEMD to generate smoother subsequences and thus obtain more accurate prediction results.
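The structure of the hybrid model can be sketched as a small pipeline: decompose the series, compute the residual, decompose the residual again, forecast every subsequence separately, and add the forecasts. In this sketch the decomposers and the forecaster are passed in as callables, because real VMD, EEMD, and DE-ELM implementations are outside its scope; `hybrid_forecast`, `toy_partial_split`, `toy_full_split`, and `last_value_forecast` are hypothetical placeholders and do not reproduce the paper's methods.

```python
def hybrid_forecast(series, primary, secondary, forecast):
    """Secondary-decomposition forecast (structure only).

    primary(series)   -> list of components (the role VMD plays in the paper)
    secondary(series) -> list of components (the role EEMD plays, on the residual)
    forecast(series)  -> next-step forecast of one component (the role of DE-ELM)
    """
    modes = primary(series)
    # Residual left over after the primary decomposition.
    res = [x - sum(m[i] for m in modes) for i, x in enumerate(series)]
    # Decompose the residual again instead of forecasting it directly.
    sub_modes = secondary(res)
    # Forecast every subsequence separately and superimpose the results.
    return sum(forecast(c) for c in modes + sub_modes)


def running_mean(series):
    """Mean of all values seen so far, at each position."""
    out, total = [], 0.0
    for i, x in enumerate(series, start=1):
        total += x
        out.append(total / i)
    return out


def toy_partial_split(series):
    """Placeholder primary decomposer: leaves a nonzero residual behind."""
    mean = running_mean(series)
    return [mean, [(x - m) / 2 for x, m in zip(series, mean)]]


def toy_full_split(series):
    """Placeholder secondary decomposer: components sum back to the input."""
    mean = running_mean(series)
    return [mean, [x - m for x, m in zip(series, mean)]]


def last_value_forecast(component):
    """Placeholder forecaster: the next value is assumed to equal the last one."""
    return component[-1]


prices = [100.0, 101.5, 99.8, 102.3, 103.1]  # made-up values
prediction = hybrid_forecast(prices, toy_partial_split, toy_full_split,
                             last_value_forecast)
```

With these placeholders every part of the series ends up in exactly one subsequence, so the combined last-value forecast equals the last price. That only checks the bookkeeping; any gain in accuracy would have to come from the real decomposers and forecaster.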
Accurate stock index price predictions are not only of great significance to regulators, but also crucial to the stability of the financial system. Based on the advantages of VMD, EEMD technology, and extreme learning machine, this paper proposes a combined model, VMD-RES.-EEMD-DE-ELM, which combines secondary decomposition technology with an artificial intelligence algorithm. Further, this paper takes the S&P 500 index and the HS300 index as experimental samples and uses three evaluation indexes to test the performance of the VMD-RES.-EEMD-DE-ELM model against seven other benchmark models in one-step, three-step, and five-step forward prediction. The conclusions of the empirical analysis are as follows:
(1) The VMD-RES.-EEMD-DE-ELM hybrid model proposed in this paper not only takes full advantage of the secondary decomposition, but also solves the problem that the residual term is not fully considered in traditional time series prediction models based on VMD technology. In predicting the prices of the S&P 500 index and the HS300 index, the VMD-RES.-EEMD-DE-ELM model achieves the best performance in all cases, indicating that the proposed hybrid model can fully capture the characteristics of the original sequence and has good prediction performance.
(2) In predicting the S&P 500 index and HS300 index prices, VMD technology decomposes the original sequence more accurately than EEMD technology. In the empirical prediction tests combined with the DE-ELM algorithm, VMD technology improves the prediction performance of the model more than EEMD technology does.
(3) In predicting the S&P 500 index and HS300 index prices, the secondary decomposition technology demonstrates its superiority over the single decomposition technology. The two-layer decomposition technology can better handle complex original sequences and effectively improve the prediction accuracy of the hybrid model.
However, the price trend of the stock market is affected by complex factors, such as the economic development of different countries, geopolitical conflicts, linkages between international markets, and the scale of development and business conditions of different industries. Future research could therefore incorporate such multidimensional influencing factors to further improve the overall prediction performance.
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This study was funded by the National Natural Science Foundation of China (Grant nos. 71573042 and 71973028).
Z. Hu, W. Liu, J. Bian, X. Liu, and T. Y. Liu, “Listening to chaotic whispers: a deep learning framework for news-oriented stock trend prediction,” in Proceedings of the 11th ACM International Conference on Web Search and Data Mining-WSDM 2018, pp. 261–269, Los Angeles, CA, USA, February 2018.
J. Dong, W. Dai, Y. Liu, L. Yu, and J. Wang, “Forecasting Chinese stock market prices using Baidu search index with a learning-based data collection method,” International Journal of Information Technology & Decision Making, vol. 18, no. 5, pp. 1605–1629, 2019.
J. Z. Wang, J. J. Wang, Z. G. Zhang, and S. P. Guo, “Forecasting stock indices with back propagation neural network,” Expert Systems with Applications, vol. 38, no. 11, pp. 14346–14355, 2011.
Y. Sai, F. T. Zhang, and T. Zhang, “Research of Chinese stock index futures regression prediction based on support vector machines,” Chinese Journal of Management Science, vol. 3, pp. 35–39, 2013.
Y. F. Yang, Y. K. Bao, Z. Y. Hu, and R. Zhang, “Crude oil price prediction based on empirical mode decomposition and support vector machines,” Chinese Journal of Management, vol. 12, pp. 1884–1889, 2010.
S. Wang, L. Yu, and K. K. Lai, “Crude oil price forecasting with TEI@I methodology,” Journal of Systems Science and Complexity, vol. 18, no. 2, pp. 145–166, 2005.
N. E. Huang, Z. Shen, S. R. Long et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
Z. Wu and N. E. Huang, “A study of the characteristics of white noise using the empirical mode decomposition method,” Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 460, no. 2046, pp. 1597–1611, 2004.
H. L. Li and C. E. Feng, “Relationship between investor sentiment and stock indices fluctuation based on EEMD,” Xitong Gongcheng Lilun Yu Shijian/System Engineering Theory and Practice, vol. 34, no. 10, pp. 2495–2503, 2014.
D. Wang, H. Luo, O. Grunder, Y. Lin, and H. Guo, “Multi-step ahead electricity price forecasting using a hybrid model based on two-layer decomposition technique and BP neural network optimized by firefly algorithm,” Applied Energy, vol. 190, pp. 390–407, 2017.
G. Bin Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: a new learning scheme of feedforward neural networks,” in Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), vol. 2, pp. 985–990, Budapest, Hungary, July 2004.
K. Yan, Z. Ji, H. Lu, J. Huang, W. Shen, and Y. Xue, “Fast and accurate classification of time series data using extended elm: application in fault diagnosis of air handling units,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 7, pp. 1349–1356, 2019.