Abstract

Crude oil is the most important nonrenewable energy resource and the most key element for the world. In contrast to typical forecasts of oil price, this study aims at forecasting the demand of imported crude oil (ICO). This study proposes different single stage and two-stage hybrid stages of forecasting models for prediction of ICO in Taiwan. The single stage forecasting modeling includes multiple linear regression (MLR), support vector regression (SVR), artificial neural networks (ANN), and extreme learning machine (ELM) approaches. While the first step of the two-stage modeling is to select the fewer but more significant explanatory variables, the second step is to generate predictions by using these significant explanatory variables. The proposed two-stage hybrid models consist of integration of different modeling components. Mean absolute percentage error, root mean square error, and mean absolute difference are utilized as the performance measures. Real data set of crude oil in Taiwan for the period of 1993–2010 and twenty-three associated explanatory variables are sampled and investigated. The forecasting results reveal that the proposed two-stage hybrid modeling is able to accurately predict the demand of crude oil in Taiwan.

1. Introduction

Natural resources are often classified into two groups: renewable and non-renewable resources. Renewable energy is the one which comes from natural resources such as sunlight, wind, and rain, and it is naturally replenished. It is reported that about 16% of global final energy consumption comes from renewable energy. The share of renewable energy in global electricity generation is around 19% [1]. In the United States, renewables provided 12.7% of total domestic electricity in 2011, up from 10.2% in 2010, and 9.3% in 2009. In China, wind power generation increased by more than 48.2% in 2011. In European Union, renewables accounted for more than 71% of total electric capacity additions in 2011 [2]. Non-renewable resources form very slowly or do not naturally form in the environment. A good example of the nonrenewable is fossil fuels. Although many studies have reported that the renewable sources of energy are currently receiving considerable attention, the fossil fuels are still the most needed element of the world energies [3]. In particular, crude oil is the most important nonrenewable energy resource and the most key element for the world [4, 5].

The relationship between economic growth and oil consumption was addressed [6], and the study determined that the minimum statistical (lower-bound) annual oil consumption for developed countries was 11 barrels per capita. By using autoregressive distributed lag (ARDL) bounds testing approach of cointegration, the study discussed a long-run relationship among quantity of crude oil import, income, and price of the imported crude in India [7]. The results showed that the long-term income elasticity of imported crude in India is 1.97 and there existed a unidirectional long-run causality running from economic growth to crude oil import. The demand for ICO in South Africa as a function of real income and the price of crude oil were studied [8]. The estimated long-run price and income elasticities revealed that import demand for crude oil is price and income inelastic.

The study estimated the short-run and long-run elasticities of demand for crude oil in Turkey for the period of 1980 to 2005 [9]. The comparative study was performed in the study [10]. A decision support system for forecasting fossil fuel production in Turkey, using a regression ARIMA and SARIMA method, was developed for the period of 1950–2003. Also, the study applied the ARIMA and SARIMA methods to predict primary energy demand in Turkey for the period from 2005 to 2020 [11].

In addition to forecasting the oil demand or consumption, several forecasting methods were applied to predict different types of energies. For example, regression modeling was employed to forecast the coal, oil, and gas electricity requirement [12]. The MR models were also applied to predictions for electricity consumption in Taiwan, Brazil, Delhi, and Hong-Kong, respectively [1315]. The study [16] used the multiple linear regression techniques to develop simple empirical relations for the estimation of daily and monthly evaporation in Kuwait. While the ARIMA approaches were commonly used for predictions of energy demand [10, 11], the ANN models were also widely used for predictions of energy demand [11, 1720]. Another promising forecasting technique, SVR, was used to forecast the energy demand [21, 22]. Additionally, the various hybrid modeling approaches were reported in forecasting energy demand in several studies [2326], and the hybrid modeling schemes appear to be a promising technique. The study provided literature review in detail about the forecasting modeling in energy issue [27].

Extreme learning machine (ELM) proposed by Huang et al. [28, 29] is a novel learning algorithm for single-hidden-layer feedforward neural networks (SLFN). It provides much better generalization performance with much faster learning speed and avoids many issues faced in the traditional algorithms such as stopping criterion, learning rate, number of epochs and local minima, and the over-tuned problems [29]. Moreover, the universal approximation capability of ELM has been analyzed and proven by [3032] to show the effectiveness of ELM. Thus, ELM has attracted a lot of attentions in recent years and been used for various forecasting issues, such as sales forecasting [3335], stock price forecasting [36, 37], and electricity price forecasting [38].

In June 1946, Chinese Petroleum Corp. (CPC) was funded, and the headquarters was set up in Taipei under the direction of the Ministry of Economic Affairs. With service facilities covering the whole nation, its operations today include the import, exploration, development, refining, transport, marketing, and sale of petroleum and natural gas. CPC’s total capital stands at NT$130.1 billion, and its total revenues in 2011 amounted to NT$1.03 trillion.

Since crude oil is extremely important for development of Taiwan’s economy, the predictions of the demand of imported crude oil are a must. Accordingly, this study is aimed at proposing single and two-stage forecasting techniques to predict the demand of imported crude oil in Taiwan. The single stage forecasting modeling includes the support vector regression (SVR), artificial neural networks (ANN), extreme learning machine (ELM), and multiple linear regression (MLR) approaches. The two-stage models combine the two modeling components. The first component of the model uses its own feature to capture the significant explanatory variables. Then, the second component generates the predictions based on these explanatory variables. In this study, the combinations of MLR and SVR (i.e., refer to MLR-SVR), MLR and ANN (i.e., refer to MLR-ANN), and MLE and ELM (i.e., refer to MLR-ELM) are used as the two-stage models.

Real data are sampled for the period of 1993–2010 for the ICO in Taiwan. According to the suggestion [39], twenty-three associated variables are collected for serving as the explanatory variables. The predictions for ICO in Taiwan can be made based on the single stage and two-stage forecasting models. This study uses the first 14 years (1993–2006) of data for model building, and the last four years’ data are used for the purpose of confirmation. The mean absolute percentage error (MAPE), the root mean square error (RMSE), and the mean absolute difference (MAD) are used as the forecasting accuracy measures.

The contents of this study are organized as follows. The following section introduces the proposed forecasting techniques. Section 3 presents the real data of ICO and the forecasting results. The performances for all of the forecasting models are demonstrated and discussed. The final section, Section 4, concludes this study.

2. Research Methodologies

This study considers MLR, SVR, ANN, ELM, and their hybrid modeling schemes as possible forecasting models for import demand of crude oil in Taiwan. These forecasting techniques are introduced in the subsequent sections.

2.1. Multiple Linear Regression

The multiple linear regression analysis is the procedure by which an algebraic equation is formulated to estimate the value of a dependent variable , given the values of some other independent variables , . The relationship between the dependent and independent variables is expressed by a linear algebraic model as follows: Based on the least squares or maximum likelihood criterion, the so-called normal equations and the point estimators of can be obtained. Under the normal and some mild assumptions, the sampling distributions and hence the statistical inference for the estimators of can be derived.

To identify significant independent variables, the backward elimination, forward selection, or stepwise regression procedures can be applied. The backward elimination procedure begins with the model which includes all of the available explanatory variables, and successively deletes one variable at a time to the model in such a way that at each step, the variable deleted is the variable contributing the least to the prediction of dependent variable at that step. On the contrary, the forward selection procedure begins with the constant model that includes no explanatory variable, and successively adds one variable at a time to the model in such a way that, at each step, the variable added is the variable contributing the most to the prediction at that step. The stepwise regression procedure is an admixture of the backward elimination procedure and the forward selection procedure. This selection procedure builds a sequence of models and at each step deletes or adds an explanatory variable according to some selection criterions such as coefficient of partial correlation or error sum of squares reduction.

2.2. Support Vector Regression

While support vector machine (SVM) is one of the most powerful techniques in machine learning areas [4044], support vector regression (SVR) is the most common application form of SVMs. Due to its effectiveness, SVR has been used for predictions in many fields [4549].

Statistical learning theory and structural risk minimization principle have provided a very effective framework for development of support vector regression [4851]. Based on the computation of a linear regression function in a high dimensional feature space, the inputs for SVR are mapped via a nonlinear function. The modeling of SVR can be described as follows. Suppose that where is the weight vector, represents the model inputs, is a bias, and stands for a kernel function which uses a nonlinear function to transform the nonlinear input to be linear mode in a high dimension feature space.

Typical regression modeling obtains the coefficients through minimizing the square error, which can be considered as empirical risk based on loss function. The -insensitivity loss function was introduced [51], and it can be described as follows: where is the target outputs and defines the region of -insensitivity; when the predicted value falls into the band area, the loss is zero. However, when the predicted value falls outside the band area, the loss is defined as the difference between the predicted value and the margin.

When empirical risk and structure risk are both considered, the SVR can be setup to minimize the following quadratic programming problem: where is the number of training data, represents the empirical risk, stands for the structure risk preventing over-learning and lack of applied universality, and is a modifying coefficient representing the trade-off between empirical risk and structure risk. With an appropriate modifying coefficient , band area width , and kernel function, the optimum value of each parameter can be solved by Lagrange procedure.

The general form of the SVR-based regression function is described as follows [51]: where and are Lagrangian multipliers and satisfy the equality . Additionally, since the radial basis function (RBF) is the most widely used kernel function [49], this study uses it for our experimental study. The RBF can be defined as follows: where denotes the width of the RBF.

2.3. Artificial Neural Networks

Artificial neural networks, originally derived from neurobiological models, are massively parallel, computer-intensive, and data-driven algorithmic systems composed of a multitude of highly interconnected nodes, known as neurons as well. Mimicking human neurobiological information-processing activities, each elementary node of a neural network is able to receive an input single from external sources or other nodes and the algorithmic procedure equipped in each node is sequentially activated to locally transforming the corresponding input single into an output single to other nodes or environments.

It was indicated that knowledge is not stored within individual processing units, but is represented by the strength between units [52]. They also stated that neural networks can be classified into two categories: the feedforward networks and the feedback networks. The feedback networks contain neurons that are connected to each other, enabling a neuron to influence other neurons. The feedback networks contain neurons that are connected to themselves, enabling a neuron to influence other neurons and itself. Also, in recent years, there has been a great deal of attention toward the field of ANN [41, 43, 53, 54].

For ANN modeling, the relationship between output and inputs can be formed as where and are model connection weights, is the number of input nodes, is the number of hidden nodes, and is the error term. The transfer function in the hidden layer is often represented by a logistic function; that is,

Accordingly, the ANN model in (7) accomplishes a nonlinear functional mapping from the inputs to the output ; that is, where is a vector of all model parameters and is a function determined by the ANN structure and connection weights.

2.4. Extreme Learning Machine

ELM randomly selected the input weights and analytically determined the output weights of SLFNs. One may randomly choose and fix the hidden node parameters which are the key principle of the ELM. After randomly choosing the hidden nodes parameters, SLFN becomes a linear system where the output weights of the network can be analytically determined using simple generalized inverse operation of the hidden layer output matrices [28, 29].

In general, the concept of ELM is similar to that of the random vector functional-link (RVFL) network where the hidden neurons are randomly selected. However, the main difference between ELM and RVFL is the characteristics of hidden neuron parameters. In ELM, all the hidden node parameters are randomly generated independently of the target functions and the training patterns [55]. In RVFL, the selection of hidden neurons is based on partial randomness and the randomly generated hidden node parameters are not completely independent of the training data [56]. That is, the universal approximation capability of ELM can be linearly extended to RVFL [55, 56].

Consider arbitrary distinct samples () where and . SLFNs with hidden neurons and activation function can approximate samples with zero error. This means that where and , , is the weight vector connecting the th hidden node and the input nodes, is the weight vector connecting the th hidden node and the output nodes, and is the threshold of the th hidden node. denotes the inner product of and . is called the hidden layer output matrix of the neural network; the th column of is the th hidden node output with respect to inputs . Therefore, the determination of the output weights is as simple as finding the least-square solution to the given linear system. The minimum norm least-square (LS) solution to the linear system is where is the Moore-Penrose generalized inverse of matrix . The minimum norm LS solution is unique and has the smallest norm among all the LS solutions.

The first step of ELM algorithm is to randomly assign input weight and bias . Then, the hidden layer output matrix is calculated. Finally, one can calculate the output weight , , where .

2.5. Hybrid Models

Recent research indicates that hybrid systems which are integrated with several standard ones can help to achieve a better performance for some applications. For example, the hybrid modeling applications have been reported in forecasting [5762], credit risk [61], and manufacturing process [4144]. As a consequence, this study proposes a two-step hybrid data mining mechanisms to predict the demand of ICO in Taiwan. This study proposes three hybrid data mining models, namely, the integration of MLR and ANN (i.e., MLR-ANN), MLR and SVR (i.e., MLR-SVR), and MLR and ELM (i.e., MLR-ELM). The concept of those hybrid modeling is as follows.

In the first stage of hybrid modeling, more significant variables are selected using MLR with forward selection, backward elimination or stepwise regression, say, . In the second stage, the selected significant variables obtained in the first stage are served as the inputs of ANN, SVR, and ELM in order to establish the two-stage hybrid forecasting models. By providing the ANN, SVR, and ELM with good starting points, it is hoped that more effective models can be developed on the strength of their learning ability. Such two-stage hybrid models then are compared with the single-stage of MLR, ANN, SVR, and ELM models.

3. Results and Analysis

To show the effectiveness of the proposed hybrid modeling, the real data, from the years 1993 to 2010, were sampled for the ICO from Bureau of Energy in Taiwan [62]. Following the suggestion [39] and having discussed with the ICO practitioners, we sample 23 associated influential variables. Table 1 lists these 23 variables. The yearly data were collected for the period of 1993–2010 from the web sites of Bureau of Energy in Taiwan [62]. The first 14 years’ data are used for model building, and the last four years’ data are used for model confirmation.

3.1. Single-Stage Modeling

Figure 1 displays the yearly data of ICO in Taiwan for the periods of 1993–2006. For MLR modeling, we first compute the variance inflation factors (VIFs) to examine the existence of collinearity.

Our numerical results reveal that all the values of VIFs of independent variables are greater than 10 except the variables of and , indicating that there may exists high collinearity among the independent variables. To eliminate variables with high collinearity, we apply the Pearson correlation coefficient. If the correlation coefficient between two independent variables is greater than 0.7, we eliminate the variable which has a lower relationship with dependent variable . Table 2 lists the analysis results. As shown in Table 2, after excluding the variables with high collinearity, there remained six independent variables, including , , , , , and . In Table 3, it can be found that all the values of VIFs of the remaining variables are smaller than 10. As a consequence, there is no high collinearity among these independent variables.In addition, we apply the backward elimination, forward selection, and stepwise regression proceduresto identify significant independent variables, and use a 0.05 significance level to perform the MLR analysis.All the three procedures have the same selection results. As listed in Table 4, significant independent variables related to imported crude oil in Taiwan include average electricity price () and dependence on imported energy (). Accordingly, the MLR model is derived as follows:

The regression coefficients in the MLR model indicate that the higher the average electricity price is, the lower the imported crude oil is. On the contrary, the higher the dependence on imported energy is, the lower the imported crude oil is.

For ANN modeling, since the backpropagation neural network (BPNN) structure has been widely used [43, 54], this study employs BPNN as ANN modeling structure. In BPNN structure, we have 23 input nodes and one output node. The hidden nodes range from to , where is the number of input variables. Thus, the hidden nodes were set up as 21, 22, 23, 24, and 25, respectively. The training and testing processes include 14 and 4 data vectors for possible parameter setting. The learning rates are 0.01, 0.005, and 0.001, respectively, according to the suggestions [43].

After applying ANN to ICO data, we have obtained the {23-22-1} topology with a learning rate of 0.01 which provides the best result. The represents the number of nodes in the input layer, hidden layer, and output layer, respectively. For SVR modeling, same as ANN modeling, we have 23 input variables. The two parameters, and gamma, were estimated as and , respectively.

As mentioned earlier, the most important ELM parameter is the number of hidden nodes and that ELM tends to be unstable in single run forecasting [29, 33]. Therefore, the ELM models with different numbers of hidden nodes varying from 1 to 15 are constructed. For each number of nodes, an ELM model is repeated 30 times and the average RMSE of each node is calculated. The number of hidden nodes that gives the smallest average RMSE value is selected as the best parameter of ELM model. In modeling ELM, the forecasting model with eight hidden nodes has the smallest average RMSE values and is therefore the best ELM model.

3.2. Proposed Hybrid Modeling

A rational strategy for a hybrid modeling is to use the fewer but more informative variables, which were selected by the first stage of modeling approaches, as the inputs for the second stage of classifier approaches. Accordingly, in this study, the significant variables selected, that is, average electricity price () and dependence on imported energy (), are used as the input variables of the ANN and SVR for hybrid modeling.

After completing the first stage of hybrid modeling, the ANN topology settings can be established. This study has found that the {2-3-1} topology with a learning rate of 0.01 provides the best result for the hybrid model. The network topology with the minimum testing RMSE is also considered as the optimal network topology. For the MR/SVR hybrid modeling, the parameters of and gamma were still estimated as and , respectively. In the construction of MR/ELM hybrid model, the model with eight hidden nodes has the smallest average RMSE values is the best MR/ELM hybrid model.

3.3. Experimental Results

In this study, we consider the forecasting accuracy measures of MAPE, MSE, and MAD to address the forecasting performance for the five different approaches, ANN, SVR, ELM, MRSEL, , , and , respectively. These prediction measurements are defined as follows: where stands for the value of the residual at time . A low MAPE, MSE or MAD is associated with better forecasting accuracy. The results are listed in Table 5. In Table 5, by considering single stage modeling approaches, we note that ELM model has the best performance than the MR, ANN, and SVR models.

In comparison to the single stage and our proposed hybrid models in Table 5, one is able to apparently observe that our proposed hybrid models provide more accurate results than the single stage models. In terms of MAPE, MSE, or MAD, the two hybrid models are all lower than the three single stage models. For example, the MAPE percentage improvements of the proposed model over the four-single-stage models, ANN, SVR, MRSEL, and ELM are 53.43%, 67.26%, 37.87%, and 28.23%, respectively. Table 6 lists a comparison with respect to the overall improvement percentage in the single stage models. In addition, we use independent sample t-test to test whether the hybrid models are superior to the single ones. Let and be the means of the absolute value of residuals for the single-stage models and hybrid models, respectively. The independent sample t-test is applied to test the following: The value is 0.0127. Obviously, the hybrid models outperform the single stage models.

Figure 2 shows the actual ICO values for the last four years and the corresponding forecasts by using four-single-stage models. It can be seen that the ELM model provides the best predictions of ICO. Figure 3 displays the actual ICO values and the corresponding forecasts by using three hybrid models. It shows that the hybrid MR-ANN model has the best forecasting capability. Figure 4 plots the actual ICO values and the corresponding forecasts by using the single MR and ANN models and their hybrid technique, MR-ANN. It apparently can be observed that the proposed hybrid model outperforms the single models. The same conclusions can be drawn by observing Figures 5 and 6. That is, the performance of the hybrid models is better than the single models.

As shown in Figures 4, 5, and 6 and Tables 5 and 6, the proposed hybrid models outperform all three-single-stage models. As a consequence, the proposed two-stage hybrid approaches are more efficient for forecasting ICO in Taiwan than the typical single stage methods.

4. Conclusions

Oil is not only used to make gas for cars, but for heating homes, producing electricity, making plastics, and other commodities. Oil and its byproducts are ingrained into almost every culture in the world. Therefore, the accurate prediction of the demand of ICO is very important for the economic development of a country.

Because it is difficult to fully capture the characteristics of the real ICO data, the two hybrid prediction models are then proposed to forecast the demand of ICO in Taiwan. Based on our numerical results, it is found that the proposed hybrid approaches are more accurate than the established single-stage ones. The modeling procedures and results of this work may provide a guidance to develop forecasting models for other energies.

Besides, there are many other two-stage hybrid forecasting models that have been proposed and applied in various fields [6370]. The proposed hybrid model in this study is not the only hybrid forecasting scheme for the prediction of demand of crude oil, as one can combine other artificial intelligence techniques or traditional multivariate models, like decision tree, multivariate adaptive regression splines, logistic regression, rough set, or independent component analysis, with ANN, SVR, or ELM for further improving the prediction accuracies. Based on our work, further research may be expanded. For example, extensions of the proposed hybrid prediction methods to other machine learning techniques or statistical prediction methods or to multistage prediction procedures are possible. Such works deserve further research and are our future concern.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported in part by the National Science Council of Taiwan, Grants NSC102-2221-E-030-019 and NSC 102-2118-M-030-001.