Interpretation of Machine Learning: Prediction, Representation, Modeling, and Visualization 2021
Research Article, Open Access
Jie Wang, Jun Wang, "A New Hybrid Forecasting Model Based on SWLSTM and Wavelet Packet Decomposition: A Case Study of Oil Futures Prices", Computational Intelligence and Neuroscience, vol. 2021, Article ID 7653091, 22 pages, 2021. https://doi.org/10.1155/2021/7653091
A New Hybrid Forecasting Model Based on SWLSTM and Wavelet Packet Decomposition: A Case Study of Oil Futures Prices
Abstract
Forecasting crude oil futures prices is a significant research topic for the management of the energy futures market. To improve the accuracy of energy futures price prediction, this paper establishes a new hybrid model, WPDSWLSTM, which combines wavelet packet decomposition (WPD) with a long short-term memory network (LSTM) improved by a stochastic time effective weight (SW) function. In the proposed framework, WPD is a signal processing method used to decompose the original series into subseries with different frequencies, and the SWLSTM model is constructed from stochastic process theory and the principle of the LSTM network. To investigate the prediction performance of the new forecasting approach, SVM, BPNN, LSTM, WPDBPNN, WPDLSTM, CEEMDANLSTM, VMDLSTM, and STGRU are considered as comparison models. Moreover, a new error measurement method (multiorder multiscale complexity invariant distance, MMCID) is proposed to evaluate the forecasting results of the different models, and the numerical results demonstrate that high-accuracy forecasting of oil futures prices is achieved.
1. Introduction
Crude oil is a natural and nonrenewable resource that has an irreplaceable effect on the development of the global economy and international financial markets. Since oil is the main source of energy production, it is often considered the single most important commodity in the world. Fluctuations in the price of crude oil may affect the economic situation, social stability, and even national security of countries around the world [1]. Meanwhile, international crude oil price series are regarded as nonlinear and nonstationary time series. Hence, accurate forecasting of the crude oil price is a challenging task in the energy market and has increasingly become an active research field.
In recent years, numerous methods for time series predictions have been proposed [2–13]. These methods can be classified into the following three categories: traditional econometric models, machine learning approaches and deep learning models. The autoregressive integrated moving average model (ARIMA) is a popular statistical model applied to time series prediction. Liu et al. [3] proposed two novel forecasting models based on ARIMA, which was employed to forecast two sections of actual wind speed series. Abdollahi and Ebrahimi [4] established a new composite model to predict Brent crude oil prices by integrating the adaptive neuro fuzzy inference system (ANFIS), autoregressive fractionally integrated moving average (ARFIMA), and Markovswitching models. However, the traditional econometric models have evident shortcomings. For instance, the time series data must be stable when these models are used for forecasting. It is difficult to capture the characters if the datasets are nonstationary. Therefore, the model is less effective when applied for time series forecasting during periods of sharp fluctuations [14]. With the development of artificial intelligence, machine learning models, such as support vector machine (SVM) and artificial neural networks (ANNs), have attracted a lot of attention because of the learning capabilities for nonlinear kernel mapping between input and output vectors. For instance, Huang et al. [7] explored the forecasting ability of SVM for financial movement direction and proposed a combining model based on SVM and classification methods. Ghiassi et al. [15] presented a dynamic neural network model for time series events prediction, and compared with the ARIMA model, the prediction results of the proposed model have higher accuracy. Liao and Wang [6] established an improved neural network, the stochastic timeeffective neural network model, and analyzed the volatility statistics characteristics of the Chinese stock price indices. 
Wang and Wang [8] established a hybrid model by combining the principal component analysis (PCA) algorithm with random time-effective neural networks (STNN) and explored its predictive performance on financial time series. Although machine learning techniques have considerable predictive capacity, they are still inefficient at capturing the correlations within data. Meanwhile, these methods are extremely time-consuming for big data, and their predictions often fall short of expectations [16]. Through their hidden layer units, recurrent neural networks (RNNs) can transmit historical information. Wang and Wang [9] proposed a new forecasting model to raise the prediction accuracy of crude oil price fluctuations, based on multilayer perceptrons (MLP) and Elman recurrent neural networks (ERNN) with a stochastic time effective function. Berradi and Lazaara [17] combined principal component analysis and RNNs to predict stock prices on the Casablanca Stock Exchange; the results improved on the accuracy of the original method and gave a desirable prediction of the stock price. Deep learning methods belong to a broader family of machine learning methods that try to learn advanced features from the given data. Compared with traditional neural network models, deep learning methods contain multiple hidden layers, and they perform better in handling strongly nonlinear characteristics. The long short-term memory network (LSTM) is a type of deep learning method devised specifically to deal with long-term dependence problems [18]. The network structure of LSTM is much more complex than that of RNNs; it uses memory cell states to retain essential historical information and discard the unimportant. 
Owing to this mechanism, LSTM is widely applied to natural language processing (NLP) and sentiment analysis [19, 20], time series forecasting [10, 21, 22], and music synthesis [23]. However, individual forecasting models cannot precisely reveal the complicated relationships that exist in nonlinear and nonstationary datasets.
To obtain more accurate and reliable time series prediction, different kinds of hybrid forecasting models have been proposed which could take the advantage of different single models [24–26]. Among them, the hybrid models based on decomposition and prediction have been widely recognized, and such models are usually composed of nonlinear decomposition method and forecasting model. Liu et al. [27] presented an improved hybrid forecasting model for wind speed, which includes the empirical wavelet transform method and three types of deep learning networks. By comparing all the data results of different methods, the proposed reinforcement learning based hybrid model is effective in combining three types of deep learning networks and performs better than conventional optimizationbased hybrid models. Wang and Wang [28] combined empirical mode decomposition (EMD) method with random time strength neural network to predict global stock indices, and the empirical results showed that the proposed approach veritably has a great effect in predicting stock market fluctuations. Wang et al. [29] established a twolayer decomposition model and then developed an ensemble approach by integrating the fast ensemble empirical mode decomposition method (FEEMD), variational mode decomposition (VMD), and optimized backpropagation neural network by firefly algorithm (FABPNN). The empirical results indicated that the developed new model has exceptional forecasting implementation in electricity price series. The first key point of hybrid models is to break down the original data series into several independent subseries and makes it likely for models to adaptively learn the nonlinear characteristics of fluctuations in each subseries. Then, by using the inverse transformation algorithm, the forecasting series of each subseries are integrated to acquire the final forecasting results. 
These hybrid models can raise the efficiency and precision of modelling by overcoming the nonlinearity and nonstationarity of the original series [30–32]. However, the decomposition methods have limitations. Wavelet transform (WT) is a time-frequency localization analysis method in which the window area is fixed but its shape can change. Because it only decomposes the low-frequency signals further during the decomposition process and no longer breaks down the high-frequency signals, its frequency resolution decreases as the frequency increases. The EMD, FEEMD, and VMD methods also have certain limitations, for example, inadequate mathematical explanations, boundary effects, oversensitivity to noise, and pattern overlap. These may cause excessive decomposition of the original data and adversely affect the prediction results [33, 34]. On the other hand, well-known deep learning models are prone to overfitting and rely on historical information without considering the statistical regularity of behavior in the financial market, which leads to deficient precision [10, 32].
To address the shortcomings of the above widely recognized decomposition methods and of traditional deep learning methods, this paper proposes a novel ensemble energy forecasting framework, WPDSWLSTM, which combines wavelet packet decomposition (WPD), the stochastic time strength weights (SW) method, and LSTM. WPD was proposed to address the inferior frequency resolution of wavelet decomposition in the high-frequency range and its poor time resolution in the low-frequency range. It is a more sophisticated method of signal analysis that improves the temporal resolution of the signal. Moreover, WPD works faster than the traditional WT, and by selecting an appropriate wavelet basis function and mother function, the frequency-mixing problem can be mitigated. Therefore, WPD is adopted in this research to explore the complex nonlinear characteristics of the original energy futures time series. In fact, complicated factors affect energy futures prices during market transactions. SW is based on a stochastic process, which conforms with both the real trading market and the gating mechanism in the forecasting model [6, 8, 10]. The mechanism of SW is to weight historical information according to its time of occurrence. The more recent the historical data, the more valuable its information is for predicting future values, so that historical prices can be used to better capture the fluctuation statistics of the energy futures series. In addition, this research is the first to apply the WPD method to decompose the original crude oil series and the first to improve the conventional LSTM model with stochastic time strength weights for crude oil price forecasting. With the WPD method, the original energy futures price series can be decomposed into several subseries, which lie in different frequency bands. Then, a separate SWLSTM model is built for each subseries. 
Finally, the ensemble forecasting result of the original energy futures series is produced by integrating all the predicted components. To estimate the predictive power of the proposed model WPDSWLSTM, conventional and recent hybrid models (SVM, BPNN, LSTM, WPDBPNN, WPDLSTM, CEEMDANLSTM, VMDLSTM, and STGRU) are introduced for comparative analysis. To reveal the predictive capabilities of the different forecasting models, quantitative analysis is performed with several error measures. In addition, this research proposes a new error measurement method called multiorder multiscale complexity invariant distance (MMCID) [9, 35]. The main contributions of this paper are summarized as follows:
(a) A novel hybrid forecasting model, SWLSTM, is established for energy futures series, based on the LSTM network and the theory of stochastic processes.
(b) With the WPD method, several subseries with different fluctuation frequencies are derived from the original data series. Each subseries is trained by its own SWLSTM model.
(c) The empirical results of the corresponding forecasting models are estimated and contrasted using different error criteria and the new measurement MMCID.
The structure of this article is as follows. Section 2 explains the price datasets from the energy futures markets. Section 3 introduces the WPD and SWLSTM methodologies and provides the main framework of this paper. Section 4 demonstrates the experimental forecasting results in detail. Section 5 compares the proposed hybrid method with the other models: SVM, BPNN, LSTM, WPDBPNN, WPDLSTM, CEEMDANLSTM, VMDLSTM, and STGRU. Error measurement methods are also applied in that section to estimate the prediction performance of each model. Finally, Section 6 summarizes the main conclusions of this study.
2. Datasets
Crude oil is an international bulk financial commodity, which can be traded in markets around the world either as spot oil or through financial derivative contracts. This research focuses on the oil futures market, and four representative oil futures indices are selected for the case study: West Texas Intermediate (WTI) futures prices, Brent crude oil futures prices, RBOB gasoline, and heating oil. These four datasets are from the New York Mercantile Exchange (NYMEX) energy futures market and can be downloaded from https://www.wind.com.cn/. The WTI crude oil price is widely applied in the pricing of US domestic crudes. Brent is the international oil benchmark; most oil is priced against Brent crude, which is connected with two-thirds of all the world’s oil contracts. Brent crude and WTI dominate the oil market, and both determine pricing in their corresponding markets. They are known as light sweet oils because they contain little sulfur, making them “sweet,” and have low density, making them “light.” Gasoline and heating oil are refined from crude oil and are usually traded as futures contracts in financial markets. Figure 1 shows the similar dynamic changes of the four oil futures series over a period of more than 10 years, from January 2, 2009, to October 23, 2019. Over this period, the price fluctuation trends of the four futures series are almost the same, which indicates a certain correlation between them.
Figure 1: Price series of the four oil futures indices from January 2, 2009, to October 23, 2019, panels (a)–(d).
3. Methodology
3.1. Wavelet Packet Decomposition
Wavelet transform is a mathematical method developed to decompose nonstationary signals. Compared with wavelet analysis, wavelet packet decomposition (WPD) can analyze the signal more meticulously. Wavelet packet analysis divides the time-frequency plane in more detail, and its resolution of the high-frequency part of the signal is better than that of wavelet analysis [36]. It can also adaptively select the best wavelet basis function according to the characteristics of the signal in order to analyze the signal better. The theory of WPD analysis is as follows [37–39]. The wavelet packet function is a time-frequency function; it can be defined as
$$W_{j,k}^{n}(t) = 2^{j/2}\, W^{n}\!\left(2^{j} t - k\right),$$
where the integers $j$ and $k$ are the index scale and translation operations. The index $n$ is an operation modulation parameter or oscillation parameter. The first two wavelet packet functions are the scaling and mother wavelet functions:
$$W^{0}(t) = \varphi(t), \qquad W^{1}(t) = \psi(t).$$
When $n = 1, 2, \ldots$, the function has the following recursive relationship:
$$W^{2n}(t) = \sqrt{2} \sum_{k} h(k)\, W^{n}(2t - k), \qquad W^{2n+1}(t) = \sqrt{2} \sum_{k} g(k)\, W^{n}(2t - k),$$
where $h(k)$ and $g(k)$ are the quadrature filter functions related to the previously defined scaling function and mother wavelet function. The wavelet packet coefficients $w_{j,k}^{n}$ of a signal $f(t)$ are calculated by the inner product $\langle f, W_{j,k}^{n} \rangle$, which is defined as
$$w_{j,k}^{n} = \langle f, W_{j,k}^{n} \rangle = \int_{-\infty}^{\infty} f(t)\, W_{j,k}^{n}(t)\, dt.$$
According to the literature [40], the number of decomposition levels in forecasting models is usually between 2 and 4. In the present work, a 3-level WPD framework is applied, which is shown schematically in Figure 2(a). Additionally, the Daubechies wavelet of order 4 is employed as the mother wavelet in this research [41], and the corresponding decomposition result for WTI crude oil is shown in Figure 2(b). Each subseries, with its own frequency band, represents a kind of oscillatory factor embedded in the futures price indices. In Figure 2(b), the decomposed subseries “DDD3,” “DDA3,” “DAD3,” “DAA3,” “ADD3,” “ADA3,” “AAD3,” and “AAA3” are recorded, in that order, as the eight subseries used in the rest of the paper.
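Figure 2: (a) Schematic of the three-level WPD framework; (b) WPD decomposition results for the WTI crude oil series.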
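The full-tree structure of a three-level WPD can be illustrated with a minimal sketch. For brevity it swaps in the Haar filter pair for the order-4 Daubechies wavelet used in the paper, and the function names, node labels, and synthetic price series are hypothetical, not taken from the source. The point is only that both the approximation and the detail branch are split at every level, giving 2^3 = 8 subseries.

```python
import math


def haar_split(signal):
    """One analysis step with Haar filters (an assumption; the paper uses db4).

    Returns (approximation, detail), each half the input length.
    A trailing unpaired sample, if any, is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [s * (signal[i] + signal[i + 1]) for i in pairs]
    detail = [s * (signal[i] - signal[i + 1]) for i in pairs]
    return approx, detail


def wavelet_packet_decompose(signal, level=3):
    """Build the full wavelet packet tree and return its leaf nodes.

    Unlike the plain wavelet transform, BOTH branches are split again
    at every level, so the result has 2**level leaves.
    Leaf labels record the path from the root ("A" = low-pass, "D" = high-pass).
    """
    nodes = {"": list(signal)}
    for _ in range(level):
        next_nodes = {}
        for name, coeffs in nodes.items():
            approx, detail = haar_split(coeffs)
            next_nodes[name + "A"] = approx
            next_nodes[name + "D"] = detail
        nodes = next_nodes
    return nodes


# Synthetic stand-in for a price series (length divisible by 2**3).
prices = [50.0 + 5.0 * math.sin(0.3 * t) + 0.1 * t for t in range(64)]
leaves = wavelet_packet_decompose(prices, level=3)

# The Haar filters are orthonormal, so signal energy is preserved.
energy_in = sum(x * x for x in prices)
energy_out = sum(c * c for coeffs in leaves.values() for c in coeffs)
```

In practice a wavelet library with db4 filters and proper boundary handling would replace `haar_split`; the tree traversal stays the same.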
3.2. Long ShortTerm Memory Network
Long short-term memory networks are a particular form of RNNs that can handle both long-term and short-term dependencies. They were introduced in 1997 by Hochreiter and Schmidhuber [18] and were improved and promoted in subsequent work. Although traditional RNNs are in theory entirely capable of handling long-term memory dependencies, their effect is limited in practice [42]. Therefore, the memory storage capacity of RNNs is more suitable for short-term sequences. In LSTM, cell states and a gate mechanism are added to the hidden layer of conventional RNNs, so that the vanishing gradient problem can be largely mitigated through the control gates. In addition, each time a historical message is dispatched to the neurons of the hidden layer, several control gates with different functions regulate the past and the latest information. The principle of a control gate is as follows. It is mainly composed of a sigmoid neural net layer and a pointwise multiplication operation. The output values of the sigmoid function lie between 0 and 1 and indicate how much information can be delivered to the next step. A value of zero means letting nothing through, while a value of one means letting everything through. The LSTM control gates comprise three gates: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$. The forget gate determines how much historical information from the last moment is kept at the current moment. The input gate decides which information is saved in the cell state, and the output gate decides the output data based on the cell state. The architecture of the LSTM network is shown in Figure 3. The description of LSTM networks follows Fischer and Krauss [43], Sainath et al. [44], and He et al. [45]. 
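The specific algorithm steps of LSTM are as follows:
(i) The memory cell reads in the input $x_t$ and the previous hidden state $h_{t-1}$, which can reveal long-term dynamic trends and abandon redundant information. The forget gate is determined by the following equation:
$$f_t = \sigma\left(W_{f,x}\, x_t + W_{f,h}\, h_{t-1} + b_f\right)$$
(ii) The first part of the input gate determines how much current information should be retained in the cell state:
$$i_t = \sigma\left(W_{i,x}\, x_t + W_{i,h}\, h_{t-1} + b_i\right)$$
(iii) The second part generates a new candidate vector to update the state, according to the following equation:
$$\tilde{s}_t = \tanh\left(W_{\tilde{s},x}\, x_t + W_{\tilde{s},h}\, h_{t-1} + b_{\tilde{s}}\right)$$
(iv) After that, the new cell state is constructed from the outcomes of the previous steps, with $\circ$ denoting the Hadamard (elementwise) product:
$$s_t = f_t \circ s_{t-1} + i_t \circ \tilde{s}_t$$
(v) Finally, the output gate is updated, and the final output is decided from the updated state and the output gate state:
$$o_t = \sigma\left(W_{o,x}\, x_t + W_{o,h}\, h_{t-1} + b_o\right), \qquad h_t = o_t \circ \tanh(s_t)$$

In the previous equations, the following notation is used:
(i) $x_t$ is the input vector at the current time step $t$.
(ii) $W_f$, $W_i$, $W_{\tilde{s}}$, and $W_o$ are the weight matrices associated with the corresponding vectors. They can be split into the input parts $W_{f,x}$, $W_{i,x}$, $W_{\tilde{s},x}$, $W_{o,x}$ and the recurrent parts $W_{f,h}$, $W_{i,h}$, $W_{\tilde{s},h}$, $W_{o,h}$.
(iii) $b_f$, $b_i$, $b_{\tilde{s}}$, and $b_o$ are bias vectors.
(iv) $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate vectors.
(v) $s_t$ and $\tilde{s}_t$ are vectors for the cell states and candidate values.
(vi) $h_t$ is a vector for the output of the LSTM layer.
(vii) $\sigma$ and $\tanh$ are the sigmoid function and hyperbolic tangent function, respectively.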
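The five steps above can be traced in a small pure-Python sketch of a single LSTM time step. The function names, the parameter layout (a dictionary with one `(W_x, W_h, b)` triple per gate), and the random initialization are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_x, W_h, b, x, h):
    """Return W_x x + W_h h + b for list-of-lists matrices."""
    return [
        sum(w * v for w, v in zip(W_x[r], x))
        + sum(w * v for w, v in zip(W_h[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x, h_prev, s_prev, params):
    """One LSTM step; params maps 'f', 'i', 's', 'o' to (W_x, W_h, b)."""
    # (i) forget gate: how much of the previous cell state to keep
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]
    # (ii) input gate: how much of the new candidate to write
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]
    # (iii) candidate cell values
    s_tilde = [math.tanh(z) for z in affine(*params["s"], x, h_prev)]
    # (iv) new cell state via elementwise (Hadamard) products
    s = [ft * sp + it * ct for ft, sp, it, ct in zip(f, s_prev, i, s_tilde)]
    # (v) output gate and hidden state
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]
    h = [ot * math.tanh(st) for ot, st in zip(o, s)]
    return h, s


def random_gate(rng, n_hidden, n_inputs):
    """Small random weights for one gate (illustrative initialization)."""
    W_x = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    W_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    return W_x, W_h, b


rng = random.Random(0)
params = {gate: random_gate(rng, n_hidden=2, n_inputs=4) for gate in "fiso"}
h, s = lstm_step([0.1, 0.2, 0.3, 0.4], [0.0, 0.0], [0.0, 0.0], params)
```

Feeding a window of four past values one step at a time, and reusing the returned `h` and `s` as the next `h_prev` and `s_prev`, gives the recurrent unrolling.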
3.3. LSTM with Stochastic Time Effective Weight Function (SWLSTM)
Dufresne and Gatheral et al. [46, 47] demonstrate that the prediction of financial market price series should integrate a large amount of historical data, because the information from different periods has different impacts on future results. In other words, the closer the data are to the current time, the stronger their impact, and, conversely, the further away the data are, the weaker their influence [48]. Therefore, to improve forecasting accuracy in practical applications, this paper combines the SW function with LSTM theory in the predictive modelling process. During model training, the SW function is integrated into the LSTM model to construct a novel forecasting model, referred to as the long short-term memory with stochastic time strength weight function model (SWLSTM). The expression of the SW function derives from a stochastic process [6]. It assigns different weights to different data according to their time of occurrence. The mathematical expression is as follows:
$$\phi(t_n) = \frac{1}{\beta} \exp\left\{ \int_{t_0}^{t_n} \mu(t)\, dt + \int_{t_0}^{t_n} \sigma(t)\, dB(t) \right\},$$
where $\beta$ is the depth of market parameter, $t_0$ is the moment of the latest time point in the dataset, and $t_n$ is an arbitrary time point in the dataset. $B(t)$ is the standard Brownian motion, which is commonly regarded as the random movement of a particle in liquid [49]. $\mu(t)$ is the drift function, which mainly directs trend changes. $\sigma(t)$ is the wave function, which is applied to model uncertain events during the forecasting process. The mathematical expression of $\mu(t)$ and $\sigma(t)$ is as follows:
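In the training process of the conventional LSTM network, the parameter matrices $W_f$, $W_i$, $W_{\tilde{s}}$, and $W_o$ are updated in each iteration by backpropagation through time, as in typical RNNs [17]. The model training error of the sample point $n$ is defined as
$$E(t_n) = \frac{1}{2}\left(d_{t_n} - y_{t_n}\right)^2,$$
where $d_{t_n}$ and $y_{t_n}$ denote the actual value and the model output at time $t_n$.
For the SWLSTM model, a new description of the model training error can be obtained by weighting each sample with the SW function $\phi(t_n)$:
$$E(t_n) = \frac{1}{2}\, \phi(t_n) \left(d_{t_n} - y_{t_n}\right)^2.$$
Then, the corresponding global error of model training over $N$ sample points is defined as
$$E = \frac{1}{N} \sum_{n=1}^{N} E(t_n).$$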
In the modelling process, based on the newly defined global error , the model parameters are updated through the gradient descent method [10, 50, 51]. First, the partial derivative of each model parameter needs to be calculated from the global error function. Then, the principle of parameter update is as follows:where denotes the input of the corresponding function, , , , and .
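The idea of a time-weighted training error can be sketched as follows. This is only an illustration under loud assumptions: the drift and wave functions are taken as constants (`mu`, `sigma`), time is measured in integer steps with unit-variance Brownian increments, and all names are hypothetical. The specific drift and wave functions used in the paper are not reproduced here.

```python
import math
import random


def sw_weights(n_points, beta=1.0, mu=0.01, sigma=0.0, seed=0):
    """Stochastic time effective weights on an integer time grid.

    Assumes constant drift mu and constant volatility sigma, so that
    weight(t_n) = exp(mu * (t_n - t0) + sigma * (B(t_n) - B(t0))) / beta,
    where t0 is the index of the newest sample.
    """
    rng = random.Random(seed)
    t0 = n_points - 1
    weights = [0.0] * n_points
    brownian = 0.0  # B(t_n) - B(t0); zero at the newest sample
    for t_n in range(t0, -1, -1):
        drift = mu * (t_n - t0)  # non-positive for older samples when mu > 0
        weights[t_n] = math.exp(drift + sigma * brownian) / beta
        brownian += rng.gauss(0.0, 1.0)  # one more unit step into the past
    return weights


def sw_global_error(targets, outputs, weights):
    """Mean of the per-sample errors 0.5 * weight * (target - output)**2."""
    n = len(targets)
    return sum(
        0.5 * w * (d - y) ** 2 for w, d, y in zip(weights, targets, outputs)
    ) / n


# Deterministic case (sigma = 0): older samples get smaller weights.
w_det = sw_weights(5, beta=1.0, mu=0.1, sigma=0.0)
# Stochastic case: the same decay perturbed by a simulated Brownian path.
w_rand = sw_weights(5, beta=1.0, mu=0.1, sigma=0.2, seed=42)
example_error = sw_global_error([1.0, 2.0], [1.0, 4.0], [1.0, 1.0])
```

With `sigma = 0` the weights reduce to a plain exponential decay in the age of the sample, which makes the role of the drift term easy to see; the Brownian term then adds random perturbations around that decay.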
The above is the algorithm of the SWLSTM model, which corrects the model parameters according to the gradient descent method. Figure 4 illustrates the training procedure of the proposed model, which involves six steps. For the different subseries of the different crude oil series, different hyperparameters, including the training steps, the number of hidden layer units, the learning rate, the number of iterations, and the batch size, should be tuned for the proposed model. The specific modelling and empirical prediction are given in Section 4.
3.4. Forecasting Process of the Hybrid WPDSWLSTM Model
In this study, the proposed hybrid forecasting model, WPDSWLSTM, is applied to the fluctuations of energy futures prices. The procedure of the WPDSWLSTM approach is described briefly below, and the flowchart of this research is shown in Figure 5. The main process of the proposed model is displayed in the upper left of Figure 5 and includes three steps. The first step is data decomposition, where the original preprocessed data are decomposed by the WPD method. The second step is subseries forecasting with the improved SWLSTM method, and the third step is ensemble forecasting, in which the final forecasting results are obtained by aggregating the subseries forecasts with the inverse wavelet packet transform. The specific description of each step is as follows:
Step 1: the WPD technique is employed to analyze the original energy futures series. Eight subseries are derived from the three-layer WPD method; they represent the local oscillations in different frequency bands. The details of the WPD algorithm are given in Section 3.1.
Step 2: each subseries derived from the WPD method is separated into training and testing datasets. The SWLSTM network is trained on the training dataset to establish the forecasting model. The model parameters, which include the learning rate, the number of hidden layer units, the number of iterations, and the batch size, need to be set in advance. They are essential for the predictive precision of the model. The training procedure of the SWLSTM model is described in Sections 3.2 and 3.3.
Step 3: the predictions of all subseries are combined by the inverse wavelet packet transform to obtain the final forecasting results. Moreover, linear regression and relative error are applied to investigate the correlation between the predicted points and the actual values.
Step 4: multiple evaluation indicators are adopted to estimate the prediction ability of WPDSWLSTM: MAE, RMSE, MAPE, SMAPE, TIC, and a novel method, the multiorder multiscale complexity invariant distance (MMCID), based on information theory. In addition, other models such as SVM, BPNN, LSTM, WPDBPNN, and WPDLSTM are taken into account for prediction comparison.
4. Forecasting and Statistical Analysis
4.1. Data Preprocessing
To estimate the performance of the proposed WPDSWLSTM forecasting model, the futures prices of WTI crude oil, Brent crude oil, RBOB gasoline, and heating oil are selected. The selected datasets of all indices run from 02/01/2009 to 23/10/2019. Nontrading days are regarded as frozen, so this research only adopts data from trading days. To conduct the experiments, nearly eighty percent of the samples, from 2009 to 2017, are used to train the model, and the remaining twenty percent are used for testing to examine the effectiveness of the proposed model. Table 1 provides the selection and division of the four selected oil futures indices. To minimize the influence of noise and enhance forecasting accuracy, each subseries derived from WPD is normalized to a fixed range before training [52, 53].
 
Table 1: Selection and division of the four oil futures datasets. Note: "training number" is the number of samples in the training set; "testing number" is the number of samples in the testing set.
After that, to obtain the true predicted values and compare the numerical results directly with the actual values, the normalized outputs are mapped back to the original price scale by inverting the normalization.
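A common way to implement such a normalize-then-revert step is min-max scaling, sketched below. The target range [0, 1] and the function names are assumptions made for illustration; the range actually used in the paper is not stated here. The minimum and maximum are taken from the training portion only, so that no information from the test period leaks into the scaling.

```python
def minmax_fit(train_series):
    """Return the (minimum, maximum) of the training data."""
    return min(train_series), max(train_series)


def normalize(series, low, high):
    """Map values linearly so that low -> 0 and high -> 1 (assumed range)."""
    return [(x - low) / (high - low) for x in series]


def denormalize(scaled, low, high):
    """Invert normalize(): map model outputs back to the price scale."""
    return [z * (high - low) + low for z in scaled]


train = [48.0, 52.0, 50.0, 60.0, 55.0]
low, high = minmax_fit(train)
scaled = normalize(train, low, high)
restored = denormalize(scaled, low, high)
```

Test-period values outside the training range simply map slightly outside [0, 1], which is usually acceptable for this kind of scaling.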
4.2. Training and Forecasting by the Hybrid WPDSWLSTM Model
In this section, four different energy futures price series are used to evaluate the proposed hybrid WPDSWLSTM model. The decomposition capability of WPD makes it well suited to the extraction of feature sequences. The model parameters are trained by calculating the root mean square error between the predicted value and the actual value. The global error between the predicted value and the actual target is reduced by modifying the weights. Training proceeds to the next step when the global error is less than the preset value. For all prediction models involved in this article, the number of input units is set to 4, and the number of output units is set to 1. In the WPDSWLSTM model, the batch size is set to 32, the hidden size is 30, and the number of epochs is 400.
Afterwards, the normalized subseries obtained from WPD are trained and predicted by the SWLSTM model. The number of input samples is set to 4, and the number of outputs is set to 1; that is, the previous four observations are used to predict the value of the next period. Figure 6 shows the forecasting results of each subseries of the WTI crude oil futures series. It can be seen that the predicted values of each subseries are almost consistent with the actual values. To further illustrate the predictions of the SWLSTM forecasting model, Figure 7 shows the empirical results of each subseries of RBOB gasoline. Figures 6 and 7 present the decomposed forecasting results of WTI crude oil and RBOB gasoline as examples; the subseries forecasts are a critical component of the final prediction, especially for forecasting the direction of fluctuations accurately. The low-frequency approximation subseries is recognized as the overall trend of the futures price series, and it is well predicted by the proposed forecasting model. The curves of the actual data and the predicted data are visibly very close. Then, the final predictive results of the four sample datasets can be calculated by the inverse wavelet packet transform.
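The 4-input, 1-output setup corresponds to a sliding window over each subseries. A minimal sketch follows; the function names are hypothetical, and the chronological 80/20 split is an assumption that mirrors the division described in Section 4.1.

```python
def make_windows(series, n_inputs=4):
    """Turn a series into (inputs, targets) pairs.

    Each input is n_inputs consecutive values; the target is the value
    that immediately follows them.
    """
    inputs, targets = [], []
    for end in range(n_inputs, len(series)):
        inputs.append(series[end - n_inputs:end])
        targets.append(series[end])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split without shuffling, so the test set lies strictly in the future."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


subseries = [float(v) for v in range(1, 15)]  # stand-in for one WPD subseries
X, y = make_windows(subseries, n_inputs=4)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

A series of length n yields n - 4 samples, since the first four values have no complete history behind them.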
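Figure 6: Forecasting results of the eight subseries of the WTI crude oil futures series, panels (a)–(h).
Figure 7: Forecasting results of the eight subseries of the RBOB gasoline futures series, panels (a)–(h).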
Figure 8 shows the final predictive results for the four indices, WTI, Brent, heating oil, and RBOB, obtained with the proposed WPDSWLSTM model. In this figure, the fluctuation trends of the predicted data are extremely close to those of the actual data. In addition, the relative errors of the empirical analysis are also shown in Figure 8. It can be concluded that the predicted results have nearly the same trends as the fluctuations of the actual data. The errors are concentrated in a narrow band; only a few data points exceed 0.01, and these remain below 0.015. This means that, with repeated experiments, the energy futures series have been trained well and the forecasting performance of the WPDSWLSTM model improves.
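Figure 8: Final predictive results and errors of the WPDSWLSTM model for WTI, Brent, heating oil, and RBOB, panels (a)–(d).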
The predicted results and the actual values can be fitted by linear regression, where the predicted points are regarded as the dependent variable $y$ and the actual data are regarded as the independent variable $x$. Through linear regression analysis between the predicted values of the WPDSWLSTM model and the actual data, the prediction accuracy can be judged by the goodness of fit. The closer the goodness of fit is to 1, the closer the predicted values are to the true values. An effective numerical indicator of the relation between the two variables is the correlation coefficient $R$. The linear regression curves for the WTI, Brent, heating oil, and RBOB series are shown in Figure 9, and the numerical results are given in Table 2. In detail, the values of $R$ for these four series are all above 0.98, and the regression coefficients of the linear equations are close to 1, which indicates that the predicted values are very close to the actual values. The fitted regression parameters for WTI approach the ideal case in which the regression line coincides with $y = x$; the parameters for Brent, heating oil, and RBOB gasoline are also reported in Table 2.
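The regression check described here amounts to an ordinary least squares fit of predicted on actual values, together with the Pearson correlation coefficient. A minimal sketch with hypothetical names and made-up illustrative numbers (not results from the paper):

```python
import math


def linear_fit(actual, predicted):
    """Fit predicted = a * actual + b by least squares.

    Returns (a, b, r), where r is the Pearson correlation coefficient.
    A perfect forecast gives a = 1, b = 0, r = 1.
    """
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Illustrative values only.
actual_prices = [50.0, 52.0, 51.0, 55.0, 58.0]
predicted_prices = [50.4, 51.7, 51.3, 54.6, 57.9]
slope, intercept, corr = linear_fit(actual_prices, predicted_prices)
```

Reading the three numbers together is more informative than the correlation alone: a high correlation with a slope far from 1 would still indicate a systematic scaling error in the forecasts.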
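Figure 9: Linear regression between predicted and actual values for WTI, Brent, heating oil, and RBOB, panels (a)–(d).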

5. Models Comparison and Prediction Accuracy Evaluation
5.1. Performance Evaluation Criteria
After the established WPDSWLSTM model has been applied in the forecasting experiments, it is also necessary to validate its forecasting performance against other models. Five models (SVM, BPNN, LSTM, WPDBPNN, and WPDLSTM) are therefore used for the forecasting evaluations in this part. The support vector machine (SVM) technique is included because it is regarded as a state-of-the-art machine learning theory for binary classification [54–56]. Additionally, to fully demonstrate the effectiveness of the proposed model, BPNN, LSTM, and WPDBPNN are selected for comparison, because the proposed model is constructed on the LSTM network and the backpropagation neural network (BPNN) is the most typical neural network. To estimate the forecasting error of the new hybrid model and compare it with the other five models, the errors between the actual data points and the predicted values of the different models are investigated. The mean absolute error (MAE), root mean square error (RMSE), mean absolute percent error (MAPE), symmetric mean absolute percent error (SMAPE), and Theil inequality coefficient (TIC) are selected as the error evaluation criteria, which indicate the forecasting performance of each model. Generally, the smaller the error values (MAE, RMSE, MAPE, SMAPE, and TIC) are, the more accurate the forecasting model is [52]. The evaluation definitions are expressed as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| d_t - y_t \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( d_t - y_t \right)^2},$$
$$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{d_t - y_t}{d_t} \right|, \qquad \mathrm{SMAPE} = \frac{100}{N} \sum_{t=1}^{N} \frac{2 \left| d_t - y_t \right|}{\left| d_t \right| + \left| y_t \right|},$$
$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( d_t - y_t \right)^2}}{\sqrt{\frac{1}{N} \sum_{t=1}^{N} d_t^2} + \sqrt{\frac{1}{N} \sum_{t=1}^{N} y_t^2}},$$
where $d_t$ and $y_t$ are the actual value and the predicted value at time $t$, respectively, and $N$ is the total number of data points.
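The five criteria can be computed directly from the actual and predicted series. The sketch below uses common textbook definitions; in particular, MAPE and SMAPE are scaled to percent and SMAPE uses the form with the sum of absolute values in the denominator, which are assumptions, since several variants are in use. The function name is hypothetical.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, MAPE (%), SMAPE (%), and TIC as a dictionary."""
    n = len(actual)
    abs_err = [abs(d - y) for d, y in zip(actual, predicted)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # Percentage errors; actual values must be nonzero for MAPE.
    mape = 100.0 * sum(e / abs(d) for e, d in zip(abs_err, actual)) / n
    smape = 100.0 * sum(
        2.0 * e / (abs(d) + abs(y))
        for e, d, y in zip(abs_err, actual, predicted)
    ) / n
    # Theil inequality coefficient: 0 for a perfect forecast, at most 1.
    tic = rmse / (
        math.sqrt(sum(d * d for d in actual) / n)
        + math.sqrt(sum(y * y for y in predicted) / n)
    )
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "SMAPE": smape, "TIC": tic}


errors = forecast_errors([2.0, 4.0], [1.0, 5.0])
```

MAE and RMSE are in price units and therefore depend on the price level of each index, whereas MAPE, SMAPE, and TIC are scale-free, which is why the latter are easier to compare across the four series.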
Figure 10 illustrates the forecasting results for WTI, Brent, RBOB, and heating oil from the six forecasting models under comparison. The inset plots of Figure 10 show the local predictions on the training sets and testing sets from the proposed WPDSWLSTM model. The proposed model shows distinct advantages over the other five models (SVM, BPNN, LSTM, WPDBPNN, and WPDLSTM), especially in stages of large fluctuation. Affected by changes in the social economy and the external environment, the energy market exhibits different fluctuations. During periods of small fluctuation, by comparison, the predicted results appear comparatively accurate for all predictive models.
Figure 10: Forecasting results of the six models for WTI, Brent, RBOB, and heating oil, panels (a)–(d).
Tables 3–6 give a detailed quantitative comparison of the six aforementioned models in terms of MAE, RMSE, MAPE, SMAPE, and TIC. The numerical results demonstrate that the evaluation indicators of the WPDSWLSTM model are the smallest among these models, and that the indicators of the hybrid models are almost always smaller than those of the individual models. For example, the MAPE values for the WTI futures index from the three hybrid models are 1.4329, 2.0092, and 2.7653, whereas those from the individual models are 4.6351, 5.4562, and 5.6108, respectively. Overall, the empirical results demonstrate that the WPDSWLSTM predictor has higher forecasting accuracy. From the error evaluations, the hybrid models WPDSWLSTM, WPDLSTM, and WPDBPNN are superior to the LSTM, BPNN, and SVM models. Moreover, the superior accuracy of the proposed WPDSWLSTM model over the WPDLSTM and WPDBPNN models shows that the stochastic time effective weights (SW) method plays an important role in the forecasting process. In particular, after WPDLSTM is combined with SW, the forecasting performance improves markedly: the error indicators MAE, RMSE, MAPE, SMAPE, and TIC are reduced by 33.32%, 19.14%, 28.69%, 39.59%, and 48.06%, respectively. To show the forecast results more intuitively, Figure 11 displays the values of MAE, RMSE, MAPE, SMAPE, and TIC for the different models. Owing to the different data structures and characteristics of the four indices, the left axis of Figure 11 for WTI and Brent shows the values of MAE, RMSE, MAPE, and SMAPE, and the right axis shows the TIC value, whereas for RBOB and heating oil the left axis shows the values of MAE, RMSE, and TIC, and the right axis shows the values of MAPE and SMAPE. Figure 11 shows that MAPE and SMAPE have similar numerical results in all four cases.
The MAPE, SMAPE, and TIC values for RBOB and heating oil indicate no obvious difference between the WPDLSTM and WPDBPNN models, but according to MAE and RMSE, the former is slightly better than the latter.
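The percentage changes quoted for the SW method can be read as relative error reductions. The short sketch below works through one of them, under the assumption that the WTI MAPE values in the text are listed in the order WPDSWLSTM, WPDLSTM, WPDBPNN; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline_error, improved_error):
    """Percentage by which improved_error is lower than baseline_error."""
    return 100.0 * (baseline_error - improved_error) / baseline_error


# WTI MAPE values quoted in the text (assumed order: WPDSWLSTM, WPDLSTM, WPDBPNN).
mape_wpdlstm = 2.0092
mape_wpdswlstm = 1.4329
print(f"{relative_reduction(mape_wpdlstm, mape_wpdswlstm):.2f}%")
```

Under that assumption the result is roughly 28.7%, in line with the MAPE figure reported in the text; the small difference in the last digit would be expected if the published percentage was computed from unrounded errors.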




Figure 11: Evaluation values of MAE, RMSE, MAPE, SMAPE, and TIC for the different models, panels (a)–(d).
In order to verify whether the forecasts of the proposed model differ significantly from those of the other forecasting models (WPDLSTM, WPDBPNN, LSTM, BPNN, and SVM), the nonparametric Wilcoxon signed rank test is applied to the absolute errors of each pair of compared models [57–59]. The corresponding test results for the four indexes are presented in Table 7. The results show that the differences between the proposed model and the other models are statistically significant. Moreover, in Tables 3–6, the MAE, RMSE, MAPE, SMAPE, and TIC values of WPDSWLSTM are all smaller than those of the other five models for the WTI, Brent, RBOB, and heating oil indexes. It can therefore be inferred that the WPDSWLSTM model is significantly superior to the other models for the four indexes.
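To make the paired test concrete, the following is a minimal standard-library sketch of a two-sided Wilcoxon signed rank test on the absolute errors of two models. It is an illustration under stated simplifications, not the procedure used in the study: it uses the large-sample normal approximation without tie or continuity corrections, and the function name and sample errors are hypothetical. In practice a statistics library routine such as `scipy.stats.wilcoxon` would normally be preferred.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed rank test via the normal approximation.

    Returns (w, p), where w is the smaller of the positive and negative
    rank sums. Zero differences are discarded before ranking.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| in ascending order, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w, p


# Hypothetical absolute errors of two models on the same test points.
abs_err_proposed = [0.21, 0.35, 0.18, 0.40, 0.27, 0.31, 0.22, 0.29, 0.33, 0.25]
abs_err_baseline = [0.55, 0.61, 0.43, 0.72, 0.50, 0.58, 0.47, 0.66, 0.59, 0.52]
w, p = wilcoxon_signed_rank(abs_err_proposed, abs_err_baseline)
print(f"W={w}, p={p:.4g}, reject at 0.05: {p < 0.05}")
```

A p-value below the chosen significance level (0.05 here) leads to rejecting the null hypothesis that the two models' absolute errors come from the same distribution.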

5.2. Evaluation of Multiorder Multiscale CID Analysis (MMCID)
In this section, a novel error evaluation method is proposed to assess the prediction performance. The new analysis method is based on the complexity-invariant distance (CID), which generally brings about major improvements in time series classification and clustering accuracy [35]. Complexity invariance uses knowledge about the complexity discrepancy between two datasets as a correction factor for existing distance measures [35, 60]. By improving the CID method, the multiorder multiscale complexity invariant distance (MMCID) is derived to evaluate the predictions of energy futures prices from different forecasting models. In practical applications, complexity is not limited to a single scale, so the MMCID measurement considers multiple time scales when validating and quantifying the connection between different futures series. The MMCID measurement consists of the following two procedures. (i) For a one-dimensional discrete time series, consecutive coarse-grained series are calculated for each value of the scale parameter, following the procedure of [61].
In particular, when the scale parameter equals 1, the coarse-grained time series is simply the primitive sequence. The length of each coarse-grained time series equals the length of the primitive series divided by the scale parameter. (ii) According to the principle of CID, the multiorder value of CID is computed for each coarse-grained time series, which yields the MMCID as a function of the scale parameter. The distance is defined for two time series of equal length.
The multiorder distance between the two time series is made complexity invariant by introducing a complexity correction index, which is built from a complexity evaluation of each time series. The correction index accounts for complexity differences between the datasets under comparison: it pushes time series with distinctly different complexities further apart. The multiorder parameter is applied to amplify the effect of large changes in the error evaluation.
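The two procedures can be sketched as follows. This is an illustrative reading, not the authors' exact formulation: coarse-graining follows the non-overlapping window average of [61], the CID structure (distance multiplied by a ratio of complexity estimates) follows [35], and the multiorder generalization is an assumption in which the order parameter replaces the exponent 2 in both the distance and the complexity estimate. All function names are hypothetical.

```python
import math


def coarse_grain(series, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n_windows = len(series) // scale
    return [
        sum(series[j * scale:(j + 1) * scale]) / scale
        for j in range(n_windows)
    ]


def complexity_estimate(series, order=2):
    """Order-p norm of successive differences (assumed multiorder form)."""
    total = sum(
        abs(series[i + 1] - series[i]) ** order
        for i in range(len(series) - 1)
    )
    return total ** (1.0 / order)


def cid(a, b, order=2):
    """Order-p distance scaled by the complexity correction index."""
    dist = sum(abs(x - y) ** order for x, y in zip(a, b)) ** (1.0 / order)
    ce_a = complexity_estimate(a, order)
    ce_b = complexity_estimate(b, order)
    low, high = min(ce_a, ce_b), max(ce_a, ce_b)
    if low == 0.0:
        # Both series flat: no correction. Only one flat: unbounded correction.
        correction = 1.0 if high == 0.0 else math.inf
    else:
        correction = high / low
    return dist * correction


def mmcid(actual, predicted, max_scale=20, order=2):
    """CID of the coarse-grained series for every scale 1..max_scale."""
    return {
        s: cid(coarse_grain(actual, s), coarse_grain(predicted, s), order)
        for s in range(1, max_scale + 1)
    }


# Hypothetical series for illustration only (not data from the study).
actual = [50.0, 51.2, 50.7, 52.3, 53.1, 52.6, 54.0, 53.4]
predicted = [50.3, 51.0, 51.1, 52.0, 52.8, 53.0, 53.6, 53.9]
for scale, value in mmcid(actual, predicted, max_scale=3).items():
    print(f"scale={scale}: MMCID={value:.4f}")
```

With the order set to 2 and the scale set to 1, the sketch reduces to the CID of [35] applied to the original series, since the coarse-grained series at scale 1 is the primitive sequence itself.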
When evaluating with the MMCID method, the actual values are taken as one series and the predicted results as the other. According to the theory of MMCID, a smaller MMCID value means better prediction performance and indicates that the fluctuation trends of the prediction are almost consistent with the actual data. In this study, the multiorder parameter is set to 2 and the scale parameter ranges from 1 to 20. Table 8 shows the MMCID values between the forecasting results and the actual values for the six models at a selected value of the scale parameter. The empirical results from the four types of experimental data demonstrate that the proposed hybrid model performs much better than the other five forecasting models. Figure 12 shows the MMCID results between the actual futures price series and the corresponding predictions from each model. The MMCID value between the actual data and the predictions of the WPDSWLSTM model is clearly the smallest of all, and the results from the hybrid models are much better than those from the single models for all four futures indices. With this novel evaluation method, the forecasting merits of the proposed WPDSWLSTM model are further demonstrated, and the benefit of adding the SW method to the WPDLSTM model is also clearly revealed. In view of the above empirical analysis, the new hybrid forecasting approach is effective in improving the forecasting accuracy of energy futures prices.

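Figure 12: MMCID results between the actual futures price series and the predictions of each model, panels (a)–(d).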
5.3. Comparative Analysis with Existing Hybrid Models
In this section, recent hybrid models are taken as benchmark models for predicting the four selected energy futures indexes. Recently, many researchers have combined decomposition methods with machine learning algorithms to establish hybrid forecasting models. Lin et al. [34] proposed the CEEMDANLSTM model for exchange rate forecasting. Niu et al. [32] and He et al. [45] applied the VMDLSTM model to the forecasting of stock prices and exchange rate movements. Li and Wang [62] developed a novel model, STGRU, by embedding a stochastic time intensity function into the gated recurrent unit (GRU) model. This section therefore compares the WPDSWLSTM model with the CEEMDANLSTM, VMDLSTM, and STGRU models. Table 9 lists the error evaluation results of the four hybrid forecasting models, and Table 10 gives the results of the Wilcoxon signed rank test for the different model pairs. The p-values are all close to 0 and the test decision values are all 1, indicating that the test rejects the null hypothesis. Hence, the prediction errors of the WPDSWLSTM model are significantly different (at the 0.05 significance level) from those of the other three hybrid models. Furthermore, the error evaluations in Table 9 are very close across the models, but those of the proposed model are smaller than those of the other models. Combined with the statistical test results in Table 10, it can be deduced that the proposed model outperforms the three recent hybrid models in forecasting energy futures prices.


6. Conclusion
In this research, a new hybrid forecasting model, WPDSWLSTM, has been established by integrating wavelet packet decomposition with an LSTM network that uses the stochastic time effective weight function method. After the primitive futures series is decomposed into several subseries, a forecasting model is established for each subseries according to the properties of its own frequency band. The correlation coefficients for the four energy futures series are all above 0.98 and very close to 1, which implies that the proposed model achieves a strong predictive fit. Furthermore, compared with the empirical results of the SVM, BPNN, LSTM, WPDBPNN, and WPDLSTM forecasting models, the predicted values and the different error evaluations reveal that the proposed WPDSWLSTM model improves the forecasting accuracy of energy futures prices. In addition, according to the evaluation errors MAE, RMSE, MAPE, SMAPE, and TIC, the hybrid models WPDSWLSTM, WPDLSTM, and WPDBPNN have better prediction performance than the individual models LSTM, BPNN, and SVM. The stochastic time effective weight function is the key reason why the accuracy of the WPDSWLSTM model is far higher than that of the other five models. By introducing the novel error evaluation method, MMCID, the forecasting effectiveness of the proposed model is further confirmed. Finally, the Wilcoxon test shows that the forecasting errors of the proposed model are significantly different from those of the recent hybrid CEEMDANLSTM, VMDLSTM, and STGRU models. Combined with the error evaluation results, it can be inferred that the forecasting accuracy of the proposed model is higher than that of the benchmark models for energy futures prices forecasting.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors acknowledge the financial support from the funds of North China University of Technology, China (no. 110051360002).
References
[1] K. Lang and B. R. Auer, “The economic and financial properties of crude oil: a review,” The North American Journal of Economics and Finance, vol. 52, Article ID 100914, 2020.
[2] D. Faruk, “A hybrid neural network and ARIMA model for water quality time series prediction,” Engineering Applications of Artificial Intelligence, vol. 23, pp. 86–94, 2010.
[3] H. Liu, H.-Q. Tian, and Y.-F. Li, “Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction,” Applied Energy, vol. 98, pp. 415–424, 2012.
[4] H. Abdollahi and S. B. Ebrahimi, “A new hybrid model for forecasting Brent crude oil price,” Energy, vol. 200, Article ID 117520, 2020.
[5] J. Li, S. Zhu, and Q. Wu, “Monthly crude oil spot price forecasting using variational mode decomposition,” Energy Economics, vol. 83, pp. 240–253, 2019.
[6] Z. Liao and J. Wang, “Forecasting model of global stock index by stochastic time effective neural network,” Expert Systems with Applications, vol. 37, no. 1, pp. 834–841, 2010.
[7] W. Huang, Y. Nakamori, and S.-Y. Wang, “Forecasting stock market movement direction with support vector machine,” Computers & Operations Research, vol. 32, no. 10, pp. 2513–2522, 2005.
[8] J. Wang and J. Wang, “Forecasting stock market indexes using principle component analysis and stochastic time effective neural networks,” Neurocomputing, vol. 156, pp. 68–78, 2015.
[9] J. Wang and J. Wang, “Forecasting energy market indices with recurrent neural networks: case study of crude oil price fluctuations,” Energy, vol. 102, pp. 365–374, 2016.
[10] B. Wang and J. Wang, “Deep multihybrid forecasting system with random EWT extraction and variational learning rate algorithm for crude oil futures,” Expert Systems with Applications, vol. 2020, Article ID 113686, 2020.
[11] P. Du, J. Wang, W. Yang, and T. Niu, “A novel hybrid model for short-term wind power forecasting,” Applied Soft Computing, vol. 80, pp. 93–106, 2019.
[12] H. Su, E. Zio, J. Zhang, M. Xu, X. Li, and Z. Zhang, “A hybrid hourly natural gas demand forecasting method based on the integration of wavelet transform and enhanced Deep-RNN model,” Energy, vol. 178, pp. 585–597, 2019.
[13] L. Q. Han, Theory, Design and Application of Artificial Neural Network, Chemical Industry Press, Beijing, China, 2002.
[14] Z. Yang and J. Wang, “A hybrid forecasting approach applied in wind speed forecasting based on a data processing strategy and an optimized artificial intelligence algorithm,” Energy, vol. 160, pp. 87–100, 2018.
[15] M. Ghiassi, H. Saidane, and D. K. Zimbra, “A dynamic artificial neural network model for forecasting time series events,” International Journal of Forecasting, vol. 21, no. 2, pp. 341–362, 2005.
[16] H. H. H. Aly, “A proposed intelligent short-term load forecasting hybrid models of ANN, WNN and KF based on clustering techniques for smart grid,” Electric Power Systems Research, vol. 182, Article ID 106191, 2020.
[17] Z. Berradi and M. Lazaar, “Integration of principal component analysis and recurrent neural network to forecast the stock price of Casablanca stock exchange,” Procedia Computer Science, vol. 148, pp. 55–61, 2019.
[18] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[19] E. Azari and S. Vrudhula, “An energy-efficient reconfigurable LSTM accelerator for natural language processing,” in Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), pp. 4450–4459, Los Angeles, CA, USA, December 2019.
[20] G. Liu and J. Guo, “Bidirectional LSTM with attention mechanism and convolutional layer for text classification,” Neurocomputing, vol. 337, pp. 325–338, 2019.
[21] I. E. Livieris, E. Pintelas, and P. Pintelas, “A CNN-LSTM model for gold price time-series forecasting,” Neural Computing and Applications, vol. 32, no. 5, pp. 1–10, 2020.
[22] A. Sagheer and M. Kotb, “Time series forecasting of petroleum production using deep LSTM recurrent networks,” Neurocomputing, vol. 323, pp. 203–213, 2019.
[23] T. Hussain, K. Muhammad, A. Ullah et al., “Cloud-assisted multiview video summarization using CNN and bidirectional LSTM,” IEEE Transactions on Industrial Informatics, vol. 16, no. 1, pp. 77–86, 2019.
[24] H. Y. Kim and C. H. Won, “Forecasting the volatility of stock price index: a hybrid model integrating LSTM with multiple GARCH-type models,” Expert Systems with Applications, vol. 103, pp. 25–37, 2018.
[25] L. Yu, S. Wang, and K. K. Lai, “Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm,” Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.
[26] A. Safari and M. Davallou, “Oil price forecasting using a hybrid model,” Energy, vol. 148, pp. 49–58, 2018.
[27] H. Liu, C. Yu, H. Wu, Z. Duan, and G. Yan, “A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting,” Energy, vol. 202, Article ID 117794, 2020.
[28] J. Wang and J. Wang, “Forecasting stochastic neural network based on financial empirical mode decomposition,” Neural Networks, vol. 90, pp. 8–20, 2017.
[29] D. Wang, H. Luo, O. Grunder, Y. Lin, and H. Guo, “Multistep ahead electricity price forecasting using a hybrid model based on two-layer decomposition technique and BP neural network optimized by firefly algorithm,” Applied Energy, vol. 190, pp. 390–407, 2017.
[30] A. A. Abdoos, “A new intelligent method based on combination of VMD and ELM for short term wind power forecasting,” Neurocomputing, vol. 203, pp. 111–120, 2016.
[31] H. Liu and C. Chen, “Data processing strategies in wind energy forecasting models and applications: a comprehensive review,” Applied Energy, vol. 249, pp. 392–408, 2019.
[32] H. Niu, K. Xu, and W. Wang, “A hybrid stock price index forecasting model based on variational mode decomposition and LSTM network,” Applied Intelligence, vol. 50, no. 12, pp. 1–14, 2020.
[33] M. A. Jallal, A. Gonzalez-Vidal, A. F. Skarmeta et al., “A hybrid neuro-fuzzy inference system-based algorithm for time series forecasting applied to energy consumption prediction,” Applied Energy, vol. 268, Article ID 114977, 2020.
[34] H. Lin, Q. Sun, and S.-Q. Chen, “Reducing exchange rate risks in international trade: a hybrid forecasting approach of CEEMDAN and multilayer LSTM,” Sustainability, vol. 12, no. 6, Article ID 2451, 2020.
[35] G. E. A. P. A. Batista, E. J. Keogh, O. M. Tataw, and V. M. A. De Souza, “CID: an efficient complexity-invariant distance for time series,” Data Mining and Knowledge Discovery, vol. 28, no. 3, pp. 634–669, 2014.
[36] R. R. Coifman, Y. Meyer, and V. Wickerhauser, “Wavelet analysis and signal processing,” in Wavelets and their Applications, Jones and Bartlett, Sudbury, MA, USA, 1992.
[37] J. D. Wu and C. H. Liu, “An expert system for fault diagnosis in internal combustion engines using wavelet packet transform and neural network,” Expert Systems with Applications, vol. 36, pp. 4278–4286, 2009.
[38] J. Zarei and J. Poshtan, “Bearing fault detection using wavelet packet transform of induction motor stator current,” Tribology International, vol. 40, pp. 763–769, 2007.
[39] L. Y. Zhao, L. Wang, and R. Q. Yan, “Rolling bearing fault diagnosis based on wavelet packet decomposition and multiscale permutation entropy,” Entropy, vol. 17, no. 9, pp. 6447–6461, 2015.
[40] N. Amjady and F. Keynia, “Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm,” Energy, vol. 34, no. 1, pp. 46–57, 2009.
[41] H. Liu, X. Mi, and Y. Li, “Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network,” Energy Conversion and Management, vol. 166, pp. 120–131, 2018.
[42] J. Schmidhuber, S. Hochreiter, and Y. Bengio, “Evaluating benchmark problems by random guessing,” in A Field Guide to Dynamical Recurrent Networks, J. Kolen and S. Cremer, Eds., Wiley-IEEE Press, Hoboken, NJ, USA, 2001.
[43] T. Fischer and C. Krauss, “Deep learning with long short-term memory networks for financial market predictions,” European Journal of Operational Research, vol. 270, no. 2, pp. 654–669, 2018.
[44] T. N. Sainath, O. Vinyals, A. Senior et al., “Convolutional, long short-term memory, fully connected deep neural networks,” in Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4580–4584, IEEE, South Brisbane, Australia, April 2015.
[45] F. He, J. Zhou, Z.-K. Feng, G. Liu, and Y. Yang, “A hybrid short-term load forecasting model based on variational mode decomposition and long short-term memory networks considering relevant factors with Bayesian optimization algorithm,” Applied Energy, vol. 237, pp. 103–116, 2019.
[46] D. Dufresne, “The integral of geometric Brownian motion,” Advances in Applied Probability, vol. 33, no. 1, pp. 223–241, 2001.
[47] J. Gatheral and A. Schied, “Optimal trade execution under geometric Brownian motion in the Almgren and Chriss framework,” International Journal of Theoretical and Applied Finance, vol. 14, no. 03, pp. 353–368, 2011.
[48] G. Dudek, “Generating random weights and biases in feedforward neural networks with random hidden nodes,” Information Sciences, vol. 481, pp. 33–56, 2019.
[49] M. Abdechiri, M. R. Meybodi, and H. Bahrami, “Gases Brownian motion optimization: an algorithm for optimization (GBMO),” Applied Soft Computing, vol. 13, no. 5, pp. 2932–2946, 2013.
[50] Y. Yang, J. Wang, and B. Wang, “Prediction model of energy market by long short term memory with random system and complexity evaluation,” Applied Soft Computing, vol. 95, Article ID 106579, 2020.
[51] R. H. Abiyev, “Fuzzy wavelet neural network based on fuzzy clustering and gradient techniques for time series prediction,” Neural Computing and Applications, vol. 20, no. 2, pp. 249–259, 2011.
[52] S. Makridakis, “Accuracy measures: theoretical and practical concerns,” International Journal of Forecasting, vol. 9, no. 4, pp. 527–529, 1993.
[53] X. T. Liu, “Study on data normalization in BP neural network,” Mechanical Engineering & Automation, vol. 3, pp. 122–123, 2010.
[54] S. Lahmiri, “A comparison of PNN and SVM for stock market trend prediction using economic and technical information,” International Journal of Computer Applications, vol. 29, pp. 24–30, 2011.
[55] R. Rosillo, J. Giner, and D. De la Fuente, “Stock market simulation using support vector machines,” Journal of Forecasting, vol. 33, no. 6, pp. 488–500, 2014.
[56] T. Papadimitriou, P. Gogas, and E. Stathakis, “Forecasting energy markets using support vector machines,” Energy Economics, vol. 44, pp. 135–142, 2014.
[57] G.-F. Fan, S. Qing, H. Wang, W.-C. Hong, and H.-J. Li, “Support vector regression model based on empirical mode decomposition and auto regression for electric load forecasting,” Energies, vol. 6, no. 4, pp. 1887–1901, 2013.
[58] Y. Chen, W.-C. Hong, W. Shen, and N. Huang, “Electric load forecasting based on a least squares support vector machine with fuzzy time series and global harmony search algorithm,” Energies, vol. 9, no. 2, p. 70, 2016.
[59] M.-W. Li, Y.-T. Wang, J. Geng, and W.-C. Hong, “Chaos cloud quantum bat hybrid optimization algorithm,” Nonlinear Dynamics, vol. 103, no. 1, pp. 1167–1193, 2021.
[60] J. A. Rodger, “A fuzzy nearest neighbor neural network statistical model for predicting demand for natural gas and energy cost savings in public buildings,” Expert Systems with Applications, vol. 41, no. 4, pp. 1813–1829, 2014.
[61] M. Costa, A. L. Goldberger, and C. K. Peng, “Multiscale entropy analysis of biological signals,” Physical Review E, vol. 71, Article ID 021906, 2005.
[62] J. Li and J. Wang, “Forecasting of energy futures market and synchronization based on stochastic gated recurrent unit model,” Energy, vol. 213, Article ID 118787, 2020.
Copyright
Copyright © 2021 Jie Wang and Jun Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.