Research Article | Open Access
Waqas Ahmad, Muhammad Aamir, Umair Khalil, Muhammad Ishaq, Nadeem Iqbal, Mukhtaj Khan, "A New Approach for Forecasting Crude Oil Prices Using Median Ensemble Empirical Mode Decomposition and Group Method of Data Handling", Mathematical Problems in Engineering, vol. 2021, Article ID 5589717, 12 pages, 2021. https://doi.org/10.1155/2021/5589717
A New Approach for Forecasting Crude Oil Prices Using Median Ensemble Empirical Mode Decomposition and Group Method of Data Handling
Accurate time series forecasting is important and can help organizations make timely decisions for better planning and management. Several classical econometric and computational approaches show promising results for ordinary time series forecasting tasks, but they are not satisfactory in crude oil price forecasting. Ensemble empirical mode decomposition (EEMD) addresses the nonlinearity and nonstationarity of time series, but it also creates problems of its own (i.e., mode mixing and mode splitting). In this study, we propose a new hybrid method that combines median ensemble empirical mode decomposition and the group method of data handling (MEEMD-GMDH) to reduce mode splitting and forecast crude oil prices. MEEMD is obtained by replacing the mean operator with the median operator during the EEMD process. For testing and validation of the different models, two benchmark crude oil price series are used (i.e., Brent and West Texas Intermediate (WTI)). To check the performance of the proposed model, different evaluation measures are used, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the Diebold-Mariano (DM) test. All the forecasting accuracy measures confirm that the proposed model performs well in crude oil price forecasting compared to the other hybrid models.
The global market price fluctuates dramatically and increases in the long term. Commodity prices’ fluctuation causes a massive impact on the global economy, for example, soaring the cost of imports, stimulating inflation, sluggish growth in the economy, and decreasing the efficacy of macroeconomic policy. Thus, analyzing the characteristics of international market variations in goods in order to forecast the price and pattern is critical for the world economy. Goods are vital for global growth from the perspective of nations and governments and have a significant strategic effect on national economic stability. If consumer prices can be more reliably estimated, goods can be imported at low prices to significantly relieve imported inflation pressures. As import prices drop, government subsidies to businesses may be lowered and fiscal policy stability increases. Moreover, the currency reserves of a nation could be used to accommodate the flexibility of exchange rates to improve the resilience of monetary policies regardless of the decline of foreign exchange spending.
From the perspective of producers, commodities are raw materials for the aviation, shipping, and food processing sectors. They are also products of the oil extraction industry and nonferrous metallurgy businesses. Fluctuations in commodity markets influence the costs and earnings of companies. Effective forecasting of commodity values helps farmers schedule production accurately, minimize costs, and achieve greater profitability. For trading firms, sharp commodity price swings are also a cause of significant losses. The risk from market volatility is often underestimated due to the lack of an analysis team, an operational team, and the required decision-making process. In fact, the expected price often differs considerably from the real price when dealing with commodity futures. By building a forecasting model for commodity prices and a method for mitigating price fluctuations, trading firms are able to avoid risks and lower trade losses. In general, it is important and urgent to study commodity price prediction to provide rational support for government and business decision-makers.
Crude oil is one of the major natural commodities, with demand and supply exceeding 80 million barrels per day, and it covers about two-thirds of the world’s direct energy consumption. Oil plays an undeniably important role in the global economy, since about 66% of the world’s energy use comes from crude oil and gasoline. Sharp oil price movements are likely to shake aggregate economic activity; since January 2004 in particular, the world oil price has been rising rapidly and creating striking fluctuations in the world economy. Consequently, oil prices are a subject of major interest for many analysts, researchers, and organizations. The price of crude oil is essentially determined by its demand and supply but is also strongly affected by numerous unpredictable past, present, and future events, such as climate change, stock levels, GDP growth, and political factors. These factors lead to a strongly fluctuating and nonlinear market, and the basic mechanism governing its complex dynamics is not well understood.
In 2019, global oil consumption reached 1,0075 million barrels per day according to data from the International Energy Agency (IEA). Oil plays the most important role in fulfilling global energy needs. Asian emerging-market countries have become the key contributors to the rising demand for crude oil, as their fast economic development has dramatically raised their consumption. China’s oil consumption, for instance, has risen from an average of 69,700 barrels per day in 2005 to 145,100 barrels per day in 2019. As a demand factor, rising crude oil prices result in higher production costs for nonoil companies and shrinking profits. As crude oil is a critical commodity for the global economy, many governments, investors, and scholars have invested a lot of effort in building models to predict fluctuations in its price. The price is complex and susceptible to factors such as supply and demand, speculative activities, competition between suppliers, technological development, and ongoing wars [3, 4].
Due to their nonlinear and complex nature, the high volatility of crude oil prices is difficult to understand. The West Texas Intermediate (WTI) crude oil price peaked in July 2008 at USD 145.31 per barrel. Due to the financial crisis, the price then fell sharply to USD 30.28 per barrel by the end of 2008, a drop of about 80 percent from the high. Prices climbed to USD 113 per barrel in April 2011, when the economy boomed, but dropped again to USD 27 per barrel in February 2016, owing to political causes and variations in demand and supply. The impact of crude oil price fluctuations on national economies is reflected in two aspects. First, soaring crude oil prices seriously affect the economic strength of oil-importing economies. Second, a decline in crude oil prices (such as the decline in 1998) causes serious budget deficit problems for oil-exporting countries. Since crude oil price series are generally considered to be nonlinear and nonstationary and are influenced by several factors, accurately predicting the price of oil can be quite challenging. Because the oil price pattern displays nonlinear, nonstationary, and multiscale features, researchers have started to analyze oil price volatility using multiscale techniques, such as wavelet analysis and empirical mode decomposition (EMD). These techniques have strong time and frequency resolution and can make the variations more regular. The methods developed by researchers for analyzing and predicting oil prices can be roughly divided into single models and combined models. Single models include empirical approaches, causal inference methods, time series methods, and mathematical methods. Combined models are built by integrating single models according to certain rules.
In the past decades, future observation and prediction based on time series data have attracted great attention in many research fields. To predict the future behavior of a particular phenomenon, many techniques have been developed to address this issue, such as cointegration analysis, vector error correction model (VECM), vector autoregression (VAR), linear-regression (linR), random walk model, GARCH, and ARIMA models. Other than that, computational approaches such as empirical mode decomposition (EMD), artificial neural network (ANN), and ensemble empirical mode decomposition (EEMD) have also been used. Gülen  used a cointegration methodology to predict the WTI crude oil price. Lanza et al.  utilized the error correction model (ECM) to predict crude oil prices. Another famous methodology is the GARCH model; likewise,  used the GARCH properties to predict Brent crude oil price. Mohammadi and Su  applied the ARIMA-GARCH model on weekly crude oil spot prices in eleven international markets, to forecast the conditional mean and volatility. ANN and ARIMA models were used to predict the future price of WTI crude oil . They documented a comparative analysis between the ANN and ARIMA models to show the techniques with the best results based on the forecasting accuracy measures including Mean Absolute Error (MAE), Mean Square Error (MSE), and Mean Absolute Percentage Error (MAPE). The scholars concluded that the ANN had better prediction results than the ARIMA model. Mirmirani and Li  investigated US oil prices using vector autoregression (VAR) and ANN and concluded that BPN-GA attains the best results. Ahmad  predicted the Oman crude oil prices using the ARIMA model and proved that ARIMA (1, 1, 5)(1, 1, 1) achieved the best results. Aamir et al.  used the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) hybrid model to forecast crude oil prices for both Brent and WTI markets. 
Three-layer feedforward neural network (FNN) was incorporated by  for forecasting short-term crude oil price. The authors [15, 16] forecast the crude oil price by using a support vector machine (SVM) and contrasted the results with backpropagation neural network (BPNN) and ARIMA models. Their findings indicated that SVM outperformed the BPNN and ARIMA models. The aim of the authors  was to achieve some improvements in the prediction of oil price volatility by using an Artificial Neural Network Generalized Autoregressive Conditional Heteroscedasticity ANN-GARCH hybrid model combined with financial variables. They concluded that the hybrid model improves the volatility prediction accuracy by more than 30% through the results measured by the heteroscedasticity-adjusted mean square error (HMSE) model. Lin and Sun  used CEEMDAN-MLGRU based decomposition method to forecast WTI crude oil price. Li et al.  used a new hybrid-based model, namely, EEMD-SBL-ADD, and concluded that the proposed model is promising for forecasting crude oil price. The authors  also worked on Box-Jenkins (ARIMA) and neural network (NN) models to forecast repairable system failure analysis. They concluded that both models performed better with short-term forecasting; however, NN as compared to the ARIMA model gave satisfactory performances. Moreover, the authors in  used machine-learning (decision tree) models to forecast crude oil prices. This study also concluded that the decision tree models achieved higher prediction accuracy than benchmark models such as multiple linR and ARIMA. Due to nonlinear features, the classical time series models were not capable of predicting crude oil prices accurately [21, 22].
Motivated by the potential of median ensemble empirical mode decomposition (MEEMD) in signal decomposition, in particular its ability to reduce mode splitting and mode mixing, we propose a new method for the prediction of crude oil prices that combines MEEMD with the group method of data handling (MEEMD-GMDH). MEEMD-GMDH has three phases. First, MEEMD is used to decompose the raw daily crude oil price series into several relatively simple components. Second, GMDH, with autoregressive (lagged) terms as inputs, is used to forecast each component separately. Finally, the predicted results for the components are aggregated into the final forecast.
This paper makes the following main contributions:
(1) We propose a new MEEMD framework that uses the median operator rather than the mean operator when combining the noisy intrinsic mode function (IMF) trials of the ensemble.
(2) We forecast crude oil prices by integrating MEEMD and GMDH following the “decomposition and ensemble” framework. To the best of our knowledge, this combination is used for forecasting purposes for the first time.
(3) Experimental outcomes show that the proposed approach is beneficial for forecasting crude oil prices.
The remainder of the study is structured as follows. In Section 2, we concisely explain the MEEMD, GMDH, ANN, and ARIMA models. Section 3 formulates the proposed MEEMD-GMDH model in detail. Results and discussion are presented in Section 4 to evaluate the proposed model, and finally, Section 5 concludes this paper.
EMD is one of the popular and widely used decomposition methods for nonlinear and nonstationary time series forecasting. EMD decomposes the data into IMFs along with a residual. Intermittent recurrence of signals in EMD is usually due to mode mixing and mode splitting. Mode mixing is defined as one IMF containing different scales, while mode splitting is defined as the spread of one scale over two or more IMFs. To remove the effect of mode mixing, EEMD adds a white noise term to the original signal before applying EMD. The added white noise solves the problem of mode mixing; however, it inevitably creates new mode splitting for two main reasons: (i) signals having a scale located in the overlapping region of the EMD equivalent filter have a finite probability of mode splitting; (ii) the added white noise cannot guarantee full uniformity across all scales, which may lead to unexpected signal intermittency and irregularity. MEEMD was proposed to reduce the mode splitting problem. It is a variation of the EEMD method that uses the median operator instead of the mean operator when combining the noisy IMF trials of the ensemble. The steps of MEEMD are as follows:
(i) Create the ensemble: for $n = 1, \ldots, N$, $y_n(t) = x(t) + \varepsilon w_n(t)$, where $w_n(t) \sim N(0, 1)$.
(ii) Decompose every member $y_n(t)$ of the ensemble into a maximum number of IMFs using standard EMD to obtain the IMFs $c_{m,n}(t)$ and one residue.
(iii) Use the median operator to obtain the final IMFs within MEEMD. The IMFs are computed as
$$c_m(t) = \operatorname{median}\{c_{m,1}(t), \ldots, c_{m,N}(t)\}.$$
For a normal distribution $N(\mu, \sigma^2)$, the sample median is asymptotically normal. That is,
$$\operatorname{median}(X_1, \ldots, X_N) \sim N\!\left(\mu, \frac{\pi \sigma^2}{2N}\right).$$
The added noise for MEEMD is as follows:
$$\varepsilon_N = \varepsilon \sqrt{\frac{\pi}{2N}},$$
where $\varepsilon_N$ is the final standard deviation of the error, which is equal to the difference between the input and the sum of the IMFs. The flowchart of MEEMD is presented in Figure 1.
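The effect of replacing the mean with the median in step (iii) can be illustrated with a small sketch. It does not implement EMD itself; it only shows the sample-by-sample combination of one IMF across ensemble trials. The function name `ensemble_imf` and the toy trial values are hypothetical and chosen only for illustration, with one trial made deliberately aberrant to mimic a mode-splitting event.

```python
import statistics

def ensemble_imf(trials, operator="median"):
    """Combine one IMF across noisy trials, sample by sample.

    trials: list of equal-length lists, one per ensemble member.
    operator: "median" (MEEMD-style) or "mean" (EEMD-style).
    """
    combine = statistics.median if operator == "median" else statistics.fmean
    return [combine(samples) for samples in zip(*trials)]

# Hypothetical toy data: four well-behaved trials and one aberrant trial.
trials = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [1.0, 2.0, 3.0],
    [9.0, 9.0, 9.0],  # outlier trial, e.g. caused by mode splitting
]

imf_median = ensemble_imf(trials, "median")
imf_mean = ensemble_imf(trials, "mean")
print(imf_median)  # the outlier trial does not move the median
print(imf_mean)    # the mean is pulled toward the outlier
```

In this toy case the median recovers the common shape of the four regular trials, while the mean is shifted by the single aberrant trial, which is the intuition behind the median operator.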
2.2. ARIMA Model
Box and Jenkins (1976) introduced the Box-Jenkins technique (ARIMA models) in the area of time series analysis. An ARIMA model predicts a value in a time series as a linear combination of its own p previous values and q previous error terms. Box-Jenkins models are highly versatile due to the use of both AR and MA concepts. The interdependencies within the time series ($Y_t$) are captured by the AR terms, while the dependence on previous errors is captured by the MA terms. For a univariate series, the ARIMA process of order (p, d, q) is defined as follows, where $Y_t$ denotes the series after differencing d times:
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}.$$
Here, $Y_t$ is the target value of the time series, $Y_{t-1}, \ldots, Y_{t-p}$ are the lagged previous values, $\phi_1, \ldots, \phi_p$ are the coefficients of the lagged values, $\theta_1, \ldots, \theta_q$ are the coefficients of the previous error terms, $\varepsilon_t$ is an error term with normal distribution $N(0, \sigma^2)$, and $\varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ are the previous error terms. Most time series are nonstationary, whereas the ARIMA model needs stationary data. Stationarity can be achieved by differencing the time series data. The ACF and PACF plots can be used to select the appropriate orders of the AR and MA terms.
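As a minimal sketch of how differencing and the AR term work together, the following example fits an ARIMA(1, 1, 0) model by hand: the series is differenced once, an AR(1) coefficient is estimated by least squares without an intercept, and the one-step forecast is integrated back to the price level. The function names and the toy price list are hypothetical; a real study would use an established ARIMA implementation with order selection.

```python
def difference(series):
    """First difference: d_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(x):
    """Least-squares AR(1) coefficient without intercept."""
    num = sum(cur * prev for prev, cur in zip(x, x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den

def forecast_arima_110(series):
    """One-step-ahead forecast of an ARIMA(1, 1, 0) model."""
    d = difference(series)      # make the series stationary
    phi = fit_ar1(d)            # AR term on the differenced series
    next_diff = phi * d[-1]     # forecast of the next difference
    return series[-1] + next_diff, phi  # undo the differencing

# Hypothetical toy prices whose increments halve at every step.
prices = [10.0, 11.0, 11.5, 11.75, 11.875]
forecast, phi = forecast_arima_110(prices)
print(phi, forecast)
```

Because the increments of the toy series halve at each step, the estimated coefficient is 0.5 and the forecast adds half of the last increment to the last price.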
2.3. Artificial Neural Network Model
ANN is a very successful model for time series forecasting. One of the characteristics of ANNs is universal approximation; that is, ANNs can approximate any nonlinear continuous function to any desired degree of accuracy [29, 30]. The model most commonly used for time series forecasting is the single hidden layer feedforward neural network (SLFN). The relationship between the output $y_t$ and the inputs $y_{t-1}, \ldots, y_{t-p}$ has the following form:
$$y_t = b_0 + \sum_{j=1}^{q} w_j \, g\!\left(b_{0j} + \sum_{i=1}^{p} w_{ij} y_{t-i}\right) + \varepsilon_t,$$
where $w_j$ and $w_{ij}$ are the weights, $b_0$ and $b_{0j}$ are the bias terms, $p$ is the number of input nodes, and $q$ is the number of hidden nodes. The logistic function $g(x) = 1/(1 + e^{-x})$ is used as the hidden layer activation function. The first layer is the input layer, where the data are introduced to the network, the second layer is the hidden layer, and the last layer is the output layer, where the result for a given input is produced. The ANN architecture is shown in Figure 2.
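The forward pass of a single hidden layer feedforward network with a logistic hidden activation can be sketched as follows. The function and argument names are hypothetical, and the weights in the usage example are arbitrary placeholders rather than trained values; training (for example by backpropagation) is outside the scope of this sketch.

```python
import math

def logistic(x):
    """Logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))

def slfn_forward(lags, hidden_weights, hidden_biases, output_weights, output_bias):
    """Output of a single hidden layer network for one input vector.

    lags: the p lagged inputs.
    hidden_weights: one list of p weights per hidden node.
    hidden_biases: one bias per hidden node.
    output_weights: one weight per hidden node.
    """
    hidden = [
        logistic(b + sum(w * x for w, x in zip(ws, lags)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))

# With all hidden weights and biases at zero, each hidden node outputs 0.5.
y_hat = slfn_forward(
    lags=[1.0, 2.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[2.0, 4.0],
    output_bias=1.0,
)
print(y_hat)  # 1 + 2 * 0.5 + 4 * 0.5
```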
2.4. Group Method of Data Handling (GMDH)
The idea of GMDH was first proposed by Ivakhnenko in 1966 as an inductive learning algorithm. The GMDH methodology fits higher-order regression polynomials and can thereby solve modeling and classification problems. In time series forecasting, the GMDH algorithm identifies the relationship between the variables based on their lagged values. After learning the relationship between the variables, the GMDH methodology automatically selects the structure of the model. GMDH has been shown to generalize well and to fit the complexity of nonlinear systems. The Ivakhnenko polynomial is defined as follows:
$$y = a_0 + \sum_{i=1}^{T} a_i x_i + \sum_{i=1}^{T} \sum_{j=1}^{T} a_{ij} x_i x_j + \sum_{i=1}^{T} \sum_{j=1}^{T} \sum_{k=1}^{T} a_{ijk} x_i x_j x_k + \cdots$$
Here, $y$ represents the response variable, $a$ represents the weights (coefficients), and $x_i$ represents the lagged time series data. The GMDH model consists of the following five steps:
Step 1. Divide the data into training and testing sets. Choose the input variables $x_1, \ldots, x_T$, where $T$ is the number of inputs. Construct the GMDH model using the training data and evaluate the estimated model using the testing dataset.
Step 2. For the partial descriptions of GMDH, form new variables from pairs of inputs, where $k = T(T-1)/2$ is the number of combinations. The partial description has the form
$$u = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2.$$
Many researchers regard the partial description as a transfer function. There are many types of transfer functions. In this study, we use the radial basis function (RBF). The radial basis function has the form
$$z = e^{-u^2}.$$
Step 3. Estimate the vector of coefficients of the partial description using the least squares method:
$$A = (X^{\top} X)^{-1} X^{\top} Y.$$
Here, $A$ is the vector of polynomial coefficients, $Y$ is the vector of observed values, and $X$ is the matrix of regressors built from the input pairs.
Step 4. Identify the new inputs for the second layer. The best variables are chosen according to a performance index, such as MSE or Root Mean Square Error (RMSE). Under the first criterion, the best of the $k$ neurons is identified by its MSE on the checking dataset; if the MSE of the outputs $Z$ is less than the threshold, the process stops; otherwise, the outputs $Z$ become the new inputs. Under the second criterion, the weakest variables are ignored and replaced by those columns of $Z$ that best estimate the response variable in the checking set.
Step 5. Check the stopping criterion. To decide whether the set of polynomials of the model can be further improved, the lowest MSE obtained in the current layer is compared with the smallest MSE obtained in the previous layer. If an improvement is achieved, go back and repeat steps 1 to 5; otherwise, the process stops and the algorithm is complete. The GMDH model is shown in Figure 3.
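Steps 2 to 4 can be sketched for a single layer as follows. The sketch assumes the common quadratic form of the partial description (six terms per input pair) with a plain polynomial output rather than the radial basis transfer function, estimates the coefficients through the normal equations, and keeps the pair with the lowest MSE on the checking set. All function names and the synthetic data are hypothetical, and a full GMDH implementation would repeat this over several layers.

```python
import random
from itertools import combinations

def features(xi, xj):
    """Terms of a quadratic partial description for one pair of inputs."""
    return [1.0, xi, xj, xi * xj, xi * xi, xj * xj]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_partial(rows, y, i, j):
    """Least-squares coefficients from the normal equations (X'X) a = X'y."""
    X = [features(r[i], r[j]) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[p] * x[q] for x in X) for q in range(k)] for p in range(k)]
    xty = [sum(x[p] * t for x, t in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)

def mse(coef, rows, y, i, j):
    preds = [sum(c * f for c, f in zip(coef, features(r[i], r[j]))) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def best_pair(train_rows, train_y, check_rows, check_y):
    """Fit every input pair on the training set; rank by checking-set MSE."""
    scored = []
    for i, j in combinations(range(len(train_rows[0])), 2):
        coef = fit_partial(train_rows, train_y, i, j)
        scored.append((mse(coef, check_rows, check_y, i, j), (i, j), coef))
    return min(scored, key=lambda s: s[0])

# Hypothetical data: three inputs, but the response uses only the first two.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [1.0 + 2.0 * r[0] + 3.0 * r[0] * r[1] for r in rows]
check_mse, pair, coef = best_pair(rows[:40], y[:40], rows[40:], y[40:])
print(pair, check_mse)
```

With three inputs there are $3 \cdot 2 / 2 = 3$ candidate pairs, and the selection step should retain the pair of inputs that actually generates the response.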
3. The Proposed MEEMD-GMDH Model
Due to nonlinear features, the classical time series models are not capable of predicting crude oil prices accurately. Therefore, inspired by the advantages of MEEMD, this study uses a novel approach that integrates MEEMD and GMDH for forecasting crude oil prices. The decomposition and ensemble framework of MEEMD-GMDH is shown in Figure 4 and consists of the following steps:
Step 1. Decomposition of data: MEEMD is applied to decompose the crude oil price series into two parts: (i) the IMF components and (ii) one residue component.
Step 2. Individual prediction: for each component, divide the data into training and testing sets, construct the GMDH model using the training data, and evaluate the estimated model using the testing dataset.
Step 3. Ensemble prediction: the test-set predictions for all components from Step 2 are added together to give the final prediction.
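The three steps of the decomposition and ensemble framework can be sketched in a few lines. To keep the example self-contained, both building blocks are replaced by simple stand-ins that are not the methods of this paper: a trailing moving average splits the series into a fast and a slow component in place of MEEMD, and an AR(1) one-step forecast replaces the fitted GMDH model. All names and the toy series are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average, used here as the slow component."""
    out = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def decompose(series):
    """Stand-in for MEEMD: components that add back up to the series."""
    slow = moving_average(series, window=3)
    fast = [s - m for s, m in zip(series, slow)]
    return [fast, slow]

def forecast_component(train):
    """Stand-in for a fitted GMDH model: AR(1) one-step forecast."""
    num = sum(cur * prev for prev, cur in zip(train, train[1:]))
    den = sum(prev * prev for prev in train[:-1])
    phi = num / den if den else 0.0
    return phi * train[-1]

def decomposition_ensemble_forecast(series):
    components = decompose(series)                        # Step 1
    parts = [forecast_component(c) for c in components]   # Step 2
    return sum(parts), parts, components                  # Step 3

# Hypothetical toy price series.
series = [50.0, 51.0, 49.5, 52.0, 53.0, 52.5, 54.0]
forecast, parts, components = decomposition_ensemble_forecast(series)
print(forecast, parts)
```

The design point is that each component is forecast independently and the final forecast is simply the sum of the component forecasts, so any decomposition whose components add back to the original series fits this scheme.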
4. Experimental Results
4.1. Data Description
In this study, daily crude oil price time series are used, namely, WTI and Brent. The WTI series consists of 8000 observations from Feb 10, 1989, to Oct 10, 2019; 80 percent (6400 observations) are used as the training set, while 20 percent (1600 observations) are used as the testing set. The Brent dataset consists of 12000 observations from Dec 10, 1973, to Oct 10, 2019; 80 percent (9600 observations) are used as the training set, whereas 20 percent (2400 observations) are used as the testing set to check model performance.
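For time series, the 80/20 split is taken in chronological order, so the testing set always follows the training set in time. A minimal sketch, in which the function name is hypothetical and integer placeholders stand in for the actual price observations:

```python
def chronological_split(series, train_fraction=0.8):
    """Split a series into an early training part and a later testing part."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Placeholders with the same lengths as the WTI and Brent series.
wti_train, wti_test = chronological_split(list(range(8000)))
brent_train, brent_test = chronological_split(list(range(12000)))
print(len(wti_train), len(wti_test), len(brent_train), len(brent_test))
```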
4.2. Evaluation Criteria
Forecasting accuracy measures are the most important criteria for choosing among competing models. In this study, the forecasting capacity of the models is measured using four criteria, as presented in Table 1, where $n$ is the number of data points, $y_t$ represents the observed values, and $\hat{y}_t$ represents the predicted values.
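The three error measures can be computed as in the following sketch, which uses their standard textbook definitions. MAPE is returned in percent here, which is an assumption about the reporting convention; the function names and sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean Absolute Error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Hypothetical observed and predicted values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```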
The Diebold-Mariano (DM) test statistic compares the prediction errors of two models:
$$DM = \frac{\bar{d}}{\sqrt{\widehat{\operatorname{Var}}(\bar{d})}},$$
where $\bar{d} = \frac{1}{n} \sum_{t=1}^{n} d_t$, $d_t = (y_t - \hat{y}_{1,t})^2 - (y_t - \hat{y}_{2,t})^2$, $\hat{y}_{1,t}$ is the forecast obtained from the first model, and $\hat{y}_{2,t}$ is the forecast from the second model.
Using MEEMD, the original WTI crude oil series is decomposed into 10 IMFs and one residue component, while the Brent series is decomposed into 11 IMFs and one residue component. We add white noise with a standard deviation of 0.02, and the ensemble size is equal to the size of the data. Both decompositions are illustrated in Figures 5 and 6.
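A simplified version of the DM test can be sketched as follows. The sketch assumes squared-error loss and one-step-ahead forecasts, so the variance of the mean loss differential is estimated without autocovariance terms, and the two-sided p-value uses the standard normal approximation. The function name and the sample forecasts are hypothetical.

```python
import math

def dm_test(actual, pred1, pred2):
    """Simplified Diebold-Mariano test (squared-error loss, horizon 1).

    Returns the statistic and a two-sided p-value from the standard
    normal approximation. A negative statistic favours the first model.
    """
    d = [(a - p1) ** 2 - (a - p2) ** 2 for a, p1, p2 in zip(actual, pred1, pred2)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    stat = d_bar / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Hypothetical forecasts: model_a tracks the actual values more closely.
actual = [50.0, 51.0, 49.0, 52.0, 53.0, 51.5]
model_a = [50.1, 50.8, 49.1, 52.2, 52.9, 51.3]
model_b = [51.0, 50.2, 50.2, 51.1, 54.1, 50.5]
stat, p_value = dm_test(actual, model_a, model_b)
print(stat, p_value)
```

For multi-step forecasts the variance estimate would need to include autocovariances of the loss differential, which this sketch deliberately omits.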
4.3. Fitting Proposed Model to the Data
In designing the GMDH model, one must determine the number of input variables. The selection of the input variables plays an important role in many successful applications of the GMDH model, yet no theory is available to guide the choice of the number of inputs. To build the MEEMD-GMDH model, we choose the best order (p, d, q) of the ARIMA model for each IMF based on AIC and BIC, and this order is used to determine the input variables for the proposed model.
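Once an autoregressive order p has been chosen for a component, the lagged inputs can be arranged into input rows and targets as in the sketch below. The function name is hypothetical, and using exactly the p most recent lags as inputs is an assumption about how the selected order is turned into input variables.

```python
def lagged_inputs(series, p):
    """Build (inputs, target) pairs from the p most recent lags.

    Each input row is [y_{t-1}, ..., y_{t-p}] and the target is y_t.
    """
    rows, targets = [], []
    for t in range(p, len(series)):
        rows.append(series[t - p:t][::-1])
        targets.append(series[t])
    return rows, targets

rows, targets = lagged_inputs([1, 2, 3, 4, 5], p=2)
print(rows)     # [[2, 1], [3, 2], [4, 3]]
print(targets)  # [3, 4, 5]
```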
4.4. Predictive Performance of Single Models
We proposed a hybrid model that includes two components: a decomposition by MEEMD and forecasting by the GMDH model. The compared single models include GMDH, ANN, and classical time series ARIMA. In this study, the comparison of the three competing single models concerning forecasting evaluation (accuracy) for testing datasets is presented in Table 2.
From Table 2, among all these models, GMDH model attains the smallest value (better performance) on the metrics (RMSE, MAE, and MAPE). As shown in Table 2, the single ANN model performed better than the classical ARIMA model. Table 2 also indicated that the ARIMA model performed worst, because the classical econometric and time series method does not perform well for nonlinear time series.
The forecasting evaluation measures for the single models in Table 2 lead to the following conclusions:
(1) In terms of RMSE, the single GMDH model obtained the lowest value and outperformed the other single models (ANN and ARIMA) for both markets.
(2) The MAPE values of the single GMDH model, 0.2994% and 0.7584% for the two markets (Brent and WTI), fall in the category of highly accurate forecasts.
(3) The ANN model ranked second (second-lowest values), and the ARIMA model ranked third in forecasting performance.
4.5. Predictive Performance of Hybrid Models
Regarding the hybrid models (i.e., EEMD-GMDH, EEMD-ANN, EEMD-ARIMA, MEEMD-GMDH, MEEMD-ANN, and MEEMD-ARIMA), Tables 3 and 4 show the experimental results of RMSE, MAE, and MAPE on both markets, Brent and WTI, respectively.
The RMSE results are shown in Tables 3 and 4. They indicate that the proposed MEEMD-GMDH model significantly outperformed the other models for both the Brent and WTI crude oil markets, while the other MEEMD-based models (MEEMD-ANN and MEEMD-ARIMA) also attain lower values (better performance) than the corresponding EEMD-based models (EEMD-ANN and EEMD-ARIMA).
The MAE results for both crude oil markets are also presented in Tables 3 and 4. The proposed model obtained the lowest MAE and outperformed the other models for both markets, and the MEEMD-based models again perform better than the corresponding EEMD-based models.
The MAPE results on both markets for all selected models are presented in Tables 3 and 4. In terms of MAPE, the proposed model significantly outperformed the other models for both markets. The MEEMD-ANN and MEEMD-ARIMA models also attain lower values than the other benchmark models and rank second and third, respectively. The MAPE values of the proposed model are 0.0087 and 0.0051 for the Brent and WTI markets, respectively, which fall in the category of highly accurate forecasts.
From the forecasting evaluation measures shown in Tables 3 and 4, some interesting conclusions can be drawn:
(1) Among all these models, the proposed method performed well based on the RMSE, MAE, and MAPE presented in Tables 3 and 4.
(2) The hybrid models based on MEEMD have better performance than those based on EEMD.
(3) The proposed model performs better for long-term dependence than other classical and machine-learning methods for predicting crude oil prices.
(4) Both MEEMD and EEMD hybrid models performed better than single time series and machine-learning models.
(5) The proposed MEEMD-GMDH framework clearly outperforms all other compared models in terms of MAPE, RMSE, and the DM test by combining the benefits of MEEMD and GMDH.
Together, these results indicate that MEEMD-GMDH can effectively forecast crude oil prices.
Next, to confirm the superiority of the proposed model, we apply the DM test. For WTI dataset, the DM test statistic and their p-values are shown in Table 5, while for the Brent series, the DM test statistic and their corresponding p-values are presented in Table 6.
Note for Tables 5 and 6: statistically significant at 1%.
The DM test assumes that the two methods produce the same number of predictions. The DM test confirmed the above conclusion. The MEEMD-based models statistically outperformed the ANN and ARIMA models, with p-values less than 0.01 for both markets, which shows the superiority of the MEEMD-GMDH model; the p-values for the EEMD hybrid models are also less than 0.01. Overall, the proposed model performed better than the other models in this study.
4.6. Monte Carlo Simulations
In this section, simulation is performed to check the robustness and generalizability of the proposed MEEMD-GMDH model. Crude oil price data are a combination of stochastic and deterministic components. The MEEMD and EEMD procedures divide the original time series into IMFs in such a way that the first IMF is more stochastic than the second IMF, the second IMF is more stochastic than the third IMF, and so on, whereas the last IMF is completely deterministic. Synthetic time series datasets composed of additive white noise and a sine function are described in two different scenarios as follows [37–39]:
(1) In the first synthetic time series, a sine function represents the deterministic component, whereas normally distributed noise represents the stochastic component (equation (15)).
(2) In the second synthetic time series, the sine function represents the deterministic component, whereas an ARMA model with an error of 0.25 represents the stochastic component (equation (16)).
Different time series are generated using equations (15) and (16) with different numbers of observations, that is, 500, 1000, 2000, 5000, and 10000, and all the series are decomposed using MEEMD and EEMD. For every series, 80 percent of the data are used for training and 20 percent for testing. The forecast accuracy measures RMSE, MAE, and MAPE for the testing datasets are presented in Tables 7 and 8 for scenarios 1 and 2, respectively, for all models, that is, EEMD-GMDH, EEMD-ANN, EEMD-ARIMA, MEEMD-ARIMA, MEEMD-ANN, and the proposed MEEMD-GMDH model.
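The two scenarios can be sketched as simple generators. The exact equations and parameters of the study are not reproduced in the text, so every numerical choice below is an assumption made only for illustration: the sine frequency of 0.1, the ARMA(1, 1) coefficients, and the reading of "an error of 0.25" as the standard deviation of the innovations. The function names are hypothetical as well.

```python
import math
import random

def scenario_one(n, noise_sd=0.25, seed=0):
    """Sine (deterministic) plus Gaussian white noise (stochastic)."""
    rng = random.Random(seed)
    return [math.sin(0.1 * t) + rng.gauss(0.0, noise_sd) for t in range(n)]

def scenario_two(n, phi=0.5, theta=0.3, noise_sd=0.25, seed=0):
    """Sine (deterministic) plus ARMA(1, 1) noise (stochastic)."""
    rng = random.Random(seed)
    series, prev_u, prev_e = [], 0.0, 0.0
    for t in range(n):
        e = rng.gauss(0.0, noise_sd)             # innovation
        u = phi * prev_u + e + theta * prev_e    # ARMA(1, 1) recursion
        series.append(math.sin(0.1 * t) + u)
        prev_u, prev_e = u, e
    return series

sizes = [500, 1000, 2000, 5000, 10000]
datasets = {n: (scenario_one(n), scenario_two(n)) for n in sizes}

# Chronological 80/20 split of one generated series.
sample = datasets[500][0]
cut = round(len(sample) * 0.8)
train, test = sample[:cut], sample[cut:]
print(len(train), len(test))
```

Fixing the seed makes each synthetic series reproducible, which is convenient when the same series has to be decomposed by both MEEMD and EEMD for a like-for-like comparison.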
From Tables 7 and 8, it is observed that the MEEMD improved the performance of the ARIMA, ANN, and GMDH models as compared to EEMD. Thus, for forecasting the crude oil prices, the MEEMD is recommended for data decomposition. The model MEEMD-GMDH outperforms all of the models for a different number of observations, that is, 500, 1000, 2000, 5000, and 10000. The MAPE values of the model MEEMD-GMDH are less than 1 for all sets of observations which demonstrated the highly accurate forecasts [37, 38]. The experimental findings of both scenarios demonstrated that all ensemble methodologies were effective but MEEMD was more effective. Moreover, the forecasting accuracy measures in terms of MAE, RMSE, and MAPE highlighted that the model MEEMD-GMDH is the most efficient method for forecasting daily crude oil prices.
One of the most important quantitative models with significant interest in the literature is time series forecasts. Oil prices are a crucial factor influencing the economic agenda and policies of government and trading enterprises, because of the importance of the role of crude oil in the world economy. Proactive experience of their potential movements will also contribute to improved decision-making at all levels of government and management. The forecasts for oil prices are very complicated since the financial time series is extremely unpredictable, nonlinear, and erratic.
Despite attempts to address the problem with new mathematical approaches, oil prices remain difficult to forecast because of their inherent complexity, apparent volatility, and the many variables that influence fluctuations in crude oil demand. Strategies that incorporate computational approaches for prediction and for supporting investment decisions have therefore become more indispensable than ever.
Achieving an accurate prediction of a time series is a very important but difficult task because of its nonlinearity and nonstationarity. In this paper, we proposed a hybrid model called MEEMD-GMDH for crude oil price forecasting. The MEEMD method uses the median operator instead of the mean operator when combining the noisy IMF trials of the ensemble in the standard EEMD process. The advantage of MEEMD is that it reduces the mode splitting problem of IMFs. This is the first time that MEEMD-GMDH has been applied to predict crude oil prices. The experimental results show that the proposed methodology outperforms the other decomposition-based hybrid models (EMD, EEMD, and CEEMD). This shows that MEEMD-GMDH is a superior and promising alternative to the autoregressive integrated moving average model, ANN, and other machine-learning approaches studied by other researchers. Beyond crude oil prices, given its robustness, the MEEMD-GMDH methodology can be applied to more complex tasks. Both theoretical and empirical evidence in the literature indicates that hybrid models reduce generalization error when they combine dissimilar models or models that differ strongly. Moreover, the hybrid procedure can reduce the model instability that is usually present in statistical inference and time series forecasting due to potentially unreliable or evolving data trends. A literature analysis of crude oil forecasting reveals that there have been limited studies on AI and complex methods. The main objective of this approach is to help decision-makers reduce crude oil risks and improve the accuracy of crude oil price forecasts. Moreover, the results of this study are crucial to national economic growth and sustainable development.
Future work could extend this study in two directions: (1) MEEMD-GMDH can be applied to predict other time series, such as gold prices, electricity, and wind speed; (2) to attain a more accurate and specialized decomposition of time series data, alternative aggregation operators such as the weighted mean, quartiles, or geometric mean can be applied.
The data used to support the findings of this study are included within the supplementary information files.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The crude oil price data used in this paper consist of the WTI and Brent series (Supplementary Materials).
Copyright © 2021 Waqas Ahmad et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.