#### Abstract

The global economy has experienced considerable turbulence over the past five years owing to sharp increases in oil prices and terrorist attacks. Accurate prediction of oil prices is important but extremely difficult. This study attempts to forecast prices of crude oil futures using three popular neural network methods: the multilayer perceptron (MLP), the Elman recurrent neural network (ERNN), and the recurrent fuzzy neural network (RFNN). Experimental results indicate that neural networks are appropriate for forecasting crude oil futures prices and that consistent learning is achieved across different training times. The results further demonstrate that, in most situations, learning performance improves as the training time increases. Moreover, the RFNN has the best predictive power and the MLP the worst among the three networks. Under ERNNs and RFNNs, predictive power improves as the training time increases. The exception is the MLP, trained as a backpropagation network (BPN), whose predictive power improves as the training time is reduced. In sum, the RFNN outperformed the other two neural networks in forecasting crude oil futures prices.

#### 1. Introduction

During the past three years, the global economy has experienced dramatic turbulence owing to unease over terrorist attacks and rapidly rising oil prices. For example, the US light crude oil futures price recently climbed to an all-time peak of about US$80. Meanwhile, the US Federal Reserve raised its benchmark short-term interest rate seventeen times through August 2006 to prevent inflation. Consequently, many governments and corporate managers have sought a method of accurately forecasting crude oil prices.

Accurate prediction of crude oil prices is important yet extremely complicated and difficult. For example, Kumar [1] found that traditional model-based forecasts had larger errors than forecasts of crude oil prices based on futures prices. However, Pham and Liu [2] and Refenes et al. [3] showed that neural networks performed well in forecasting. According to Chen and Pham [4], numerous real-world application problems cannot be fully described and handled via classical set theory, whereas fuzzy set theory can deal with partial membership. Omlin et al. [5] argued that fuzzy neural networks (FNNs) combine the advantages of both fuzzy systems and neural networks, but they also noted that most FNNs can only process static input-output relationships and cannot process temporal input sequences of arbitrary length. Since recurrent neural networks (RNNs) are dynamic systems involving temporal state representation, RNNs are computationally powerful. Although the Elman recurrent neural network (ERNN) is a special case of RNN and is less efficient than the standard RNN, Pham and Liu [2] posited that the ERNN can model a very large class of linear and nonlinear dynamic systems. Additionally, Lee and Teng [6] argued that RFNNs have the same advantages as RNNs and extend the application domain of FNNs to temporal problems. These findings motivate us to apply three neural network models, namely, traditional backpropagation neural networks (BPNs), Elman recurrent neural networks (ERNNs), and recurrent fuzzy neural networks (RFNNs), to forecast crude oil futures prices.

The focus of this paper is to apply neural networks to predict crude oil futures prices. This work has the following objectives: to forecast crude oil futures prices using BPNs, ERNNs, and RFNNs; to compare the learning and predictive performance of BPNs, ERNNs, and RFNNs; and to explore how training time affects prediction accuracy.

This study classifies the previous literature into three main groups: (1) studies that compared artificial neural networks (ANNs) with other methods to forecast futures prices, (2) works that combined fuzzy systems with recurrent neural networks, and (3) studies that examined the evolution or forecasting accuracy of energy futures prices.

The following studies have applied various ANNs to predict futures prices. Refenes et al. [3], Castillo and Melin [7], Giles et al. [8], Donaldson and Kamstra [9], and Sharma et al. [10] all demonstrated that neural networks outperformed classical statistical techniques in forecasting ability. Kamstra and Boyd [11] also found that ANNs outperformed the naive model in forecasting ability for most commodities, but that ANNs had less predictive power than the linear model for barley and rye.

The following works combined fuzzy systems with recurrent neural networks. Omlin et al. [5], Juang and Lin [12], Nürnberger et al. [13, 14], Zhang and Morris [15], Giles et al. [8], Lee and Teng [6], Mastorocostas and Theocharis [16, 17], Juang [18], Yu and Ferreyra [19], Hong and Lee [20], and Lin and Chen [21] all combined recurrent neural networks (RNNs) with fuzzy systems for identification and prediction.

The following studies examined the evolution or forecasting accuracy of energy prices. Hirshfeld [22], Ma [23], Serletis [24], Kumar [1], Pindyck [25], Adrangi et al. [26], and Krehbiel and Adkins [27] reviewed or examined energy futures prices and price risk. Among these studies, Adrangi et al. [26] found strong evidence of nonlinear dependencies in crude oil futures prices, although the evidence is not consistent with chaos.

The paper is organized as follows. Section 2 describes the three kinds of artificial neural networks. Section 3 presents the performance valuation method. Section 4 describes the computer simulations used to examine the performance of the learning algorithms. Section 5 presents the conclusion.

#### 2. Neural Networks for Crude Oil Forecasting

##### 2.1. Multilayer Perceptron (MLP)

Artificial neural networks (ANNs) are a promising generation of information processing systems that can learn, recall, and generalize from training patterns or data. An ANN is an interconnected assembly of simple processing nodes whose functionality is similar to that of human neurons. ANNs have become popular during the last two decades for diverse applications, ranging from financial prediction to machine vision. According to Refenes et al. [3], the main strengths of ANNs include the following: (1) handling complex nonlinear functions; (2) learning from training data; (3) adapting to changing environments; (4) handling incomplete, noisy, and fuzzy information; (5) performing high-speed information processing. The most popular and widespread method used to train the multilayer perceptron (MLP) is the backpropagation algorithm; an MLP trained in this way is referred to below as a backpropagation network (BPN). The MLP can be interpreted as a universal approximator, and its parameter values are estimated via a gradient descent algorithm in problems involving nonlinear regression. The popularity of the MLP is based on the simplicity and power of the underlying algorithm. Figure 1 shows the structure of the MLP.

BPN involves two steps. The first step generates a forward flow of activation from the input layer to the output layer via the hidden layer.

The sigmoid function usually serves as the activation function:

$$f(x) = \frac{1}{1 + e^{-x}},$$

where $x$ is the net input to the node. The activation function must be differentiable because the steepest descent method is employed to derive the weight updating rule. The response of the hidden layer is the input of the output layer. In the second step, an overall error, $E$, which measures the difference between the actual and the desired output, is minimized through the supervised learning task performed by the MLP:

$$E = \sum_{p} E_p = \frac{1}{2} \sum_{p} \sum_{j} \left(d_{pj} - o_{pj}\right)^2,$$

where $E$ denotes the total error for a neural network across the entire training set, $E_p$ represents the network error for the $p$th pattern, $d_{pj}$ denotes the desired output of the $j$th unit in the output layer for pattern $p$, and $o_{pj}$ is the actual output of the $j$th unit in the output layer for the $p$th pattern.

The gradient method is then applied to optimize the weight vector $\mathbf{w}$ so as to minimize the summed square error $E$ between the actual and the desired network outputs throughout the training period. The network weights are adjusted whenever a training pattern is presented. The size of the adjustment is positively related to the sensitivity of the error function to the connection weight. The general weight updating rule for the connection weight $w_{ji}$ between the $i$th input node and the $j$th output node is as follows:

$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}},$$

where $\eta$ is the learning rate.
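The two steps described above, a forward pass followed by a steepest-descent weight update, can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the layer sizes, the linear output node, the absence of bias terms, the learning rate of 0.1, and the sample values are all assumptions, and the function names `forward` and `train_step` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Step 1: propagate activations from input to hidden to output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))  # linear output node (assumption)
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.1):
    """Step 2: one steepest-descent update on E_p = 0.5 * (target - output)**2."""
    hidden, output = forward(x, w_hidden, w_out)
    err = target - output
    # Hidden-layer weights first, so they use the output weights from before the update.
    for j, row in enumerate(w_hidden):
        delta_j = err * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for i in range(len(row)):
            row[i] += lr * delta_j * x[i]
    # Output-layer weights: minus the gradient of E_p is err * hidden[j].
    for j in range(len(w_out)):
        w_out[j] += lr * err * hidden[j]
    return 0.5 * err ** 2


# Illustrative run on one made-up, normalized training pattern.
random.seed(0)
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(4)]
x, target = [0.2, 0.4, 0.6], 0.8
errors = [train_step(x, target, w_hidden, w_out) for _ in range(200)]
```

The list `errors` records the pattern error before each update, so its decline over the iterations reflects the descent on the error surface.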

BPNs can be widely applied to sample identification, pattern matching, compression, classification, diagnosis, credit rating, stock price trend forecasting, adaptive control, functional link, optimization, and data clustering. They can also be trained via a supervised learning task to reduce the difference between the desired and the actual outputs, and they have high learning accuracy. Yet BPNs have the following weaknesses: (1) slow learning speed, (2) long execution time, (3) very slow convergence, (4) a tendency to fall into a local minimum of the error function, (5) a lack of systematic methods for the network dynamics, and (6) an inability to use past experience to forecast future behavior. This study therefore also uses two dynamic neural networks to predict crude oil futures prices.

##### 2.2. Recurrent Neural Networks

Recurrent neural networks (RNNs) were first developed by Hopfield [28] in a single-layer form and were later developed using multilayer perceptrons comprising concatenated input-output, processing, and output layers. An RNN is a dynamic neural network that permits self-loops and backward connections so that the neurons have recurrent actions and provide local memory functions. The feedback within an RNN can be achieved either locally or globally. Feedback with delay provides memory to the network and makes it appropriate for prediction. According to Haykin [29], an RNN can summarize all required information on past system behavior to forecast future behavior. The ability of an RNN to dynamically incorporate past experience through internal recurrence makes it more powerful than a BPN. The structure of an RNN generally comprises the following: (a) recurring information from the output or hidden layer to the input layer, and (b) mutual connection of neurons within the same layer. The advantages of RNNs are as follows: (1) fast learning speed, (2) short execution time, and (3) fast convergence to a stable state.

Refenes et al. [3] argued that RNNs exhibit full connection between each node and all other nodes in the network, whereas partially recurrent networks contain a number of specific feedback loops. They quoted Hertz et al. [30], who assigned the name Elman architecture to networks in which the feedback to the network input comes from one or more hidden layers. This name originated from Elman [31], who designed a neural network with recurrent links to provide networks with dynamic memory. The Elman recurrent neural network (ERNN) is a simple temporal recurrent network with a hidden layer, and it is assumed to operate in discrete time steps. The activations of the hidden units at time $t$ are fed back and serve as inputs to the "context layer" at time $t+1$, and thus represent a form of short-term memory that enables limited recurrence. Moreover, the feedback links run from the hidden layer to the context layer and produce both temporal and spatial patterns. As depicted in Figure 2, this network is a two-layer network involving feedback in the first layer [32].

##### 2.3. Elman Recurrent Neural Network

According to Hammer and Nørskov [33], the ERNN is a special case of the RNN, differing mainly in that the learning algorithm is simply a truncated gradient descent method and training is less efficient than the standard method employed for RNNs. However, Pham and Liu [2] argued that the ERNN can be used to model a very large class of linear and nonlinear dynamic systems. Li et al. [34] introduced a new nonlinear neuron-activation function into the framework of an Elman network, thus enabling the neural network to generate chaos. Li et al. [34] described the context layer in the Elman network as follows:

$$x_l^{c}(k) = x_l(k-1),$$

where $x_l(k)$ is the output of the $l$th hidden-layer neuron at time step $k$ and $x_l^{c}(k)$ is the output of the $l$th context-layer neuron.

The descriptive equations of the ERNN can be considered as a nonlinear state-space model in which all weight values are constant following initialization:

$$x_i(k) = f\left(\sum_{l=1}^{n} w_{il}^{c}\, x_l^{c}(k) + w_i^{x}\, u(k-1)\right), \qquad y(k) = \sum_{i=1}^{n} w_i^{y}\, x_i(k),$$

where $u$ is the network input, $y$ is the network output, $w_{il}^{c}$ denotes the weight linking the $i$th hidden-layer neuron and the $l$th context-layer neuron, $w_i^{x}$ indicates the weight linking the input neuron and the $i$th hidden-layer neuron, $w_i^{y}$ refers to the weight linking the output neuron and the $i$th hidden-layer neuron, $f$ is the nonlinear activation function in the hidden-layer node, and $n$ is the number of hidden-layer nodes. Since it is difficult to interpret the network functions of RNNs, this study further incorporates fuzzy logic into RNNs.
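To make the role of the context layer concrete, the following Python sketch runs a single-input, single-output Elman network over a short input sequence. It assumes the common formulation in which the context layer simply stores a copy of the previous hidden activations; the `tanh` activation, the weight values, the timing of the input, and the function names `elman_step` and `run_sequence` are illustrative assumptions, not details taken from this study.

```python
import math


def elman_step(u, context, w_context, w_input, w_output):
    """One discrete time step of a small Elman network.

    context holds the hidden activations from the previous step.
    Returns the network output and the new hidden activations.
    """
    hidden = []
    for i in range(len(w_input)):
        net = sum(w_context[i][l] * context[l] for l in range(len(context)))
        net += w_input[i] * u
        hidden.append(math.tanh(net))  # hidden activation (tanh is an assumption)
    y = sum(w_output[i] * hidden[i] for i in range(len(hidden)))
    return y, hidden


def run_sequence(inputs, w_context, w_input, w_output):
    """Feed a sequence through the network, copying hidden to context each step."""
    context = [0.0] * len(w_input)  # empty memory at the start
    outputs = []
    for u in inputs:
        y, context = elman_step(u, context, w_context, w_input, w_output)
        outputs.append(y)
    return outputs


# Made-up weights for two hidden nodes; an impulse followed by zero input.
w_context = [[0.1, 0.2], [0.3, -0.1]]
w_input = [0.5, -0.3]
w_output = [1.0, 1.0]
outputs = run_sequence([1.0, 0.0, 0.0], w_context, w_input, w_output)
```

Because the context layer carries the earlier hidden state forward, the network still produces a nonzero output at the second step even though the input there is zero. A feedforward MLP with the same weights and no bias would output zero.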

##### 2.4. Recurrent Fuzzy Neural Networks

Fuzzy set theory was first introduced by Zadeh [35, 36]. Zimmermann [37] argued that fuzzy set theory can be adapted to different circumstances and contexts. Buckley and Hayashi [38] stated that a fuzzy neural network (FNN) is a layered, feedforward neural net that has fuzzy set signals and weights. Neural networks may utilize the data bank to train and learn, while the solution obtained by fuzzy logic may be verified by empirical study and optimization. Omlin et al. [5] noted that the FNN offers both clear physical meaning and good training ability. However, the FNN only applies to static problems.

Besides the fact that the underlying theory of the RNN is complicated and the RNN is difficult to interpret, Hu and Chang [39] also found that both the BPN and the RNN are limited in their ability to forecast accurate valuations over the long term. This study thus uses a recurrent fuzzy neural network (RFNN) model in addition to the BPN and the ERNN. According to Lee and Teng [6], the RFNN has several key aspects: dynamic mapping capability, temporal information storage, universal approximation, and a fuzzy inference system. Li et al. [34] also argued that the RFNN has the same dynamic and robust advantages as the RNN. In addition, the network function can be interpreted using the fuzzy inference mechanism. Therefore, long-term prediction fuzzy models can be easily implemented using RFNNs. The network output in the RFNN is fed back to the network input using one or more time delay units. As depicted in Figure 3, the general microstructure of the RFNN consists of four layers: an input layer, a membership layer, a fuzzy rule layer, and an output layer.

The information transmission process and basic functions of each layer are as follows, where $u^{(l)}$ and $O^{(l)}$ denote the input and output of a node in layer $l$ and $k$ denotes the discrete time.

(1) Input layer: the input nodes in this layer represent the input variables. The input layer only transmits the input values directly to the next layer, and no computation is conducted in this layer. From (2.5), the connection weight at the input layer ($w_i^{(1)}$) is unity:

$$O_i^{(1)}(k) = u_i^{(1)}(k), \qquad w_i^{(1)} = 1. \tag{2.5}$$

(2) Membership layer: the membership layer is also known as the fuzzification layer and contains several different types of neurons, each of which performs a membership function. The membership nodes in this layer correspond to the linguistic labels of the input variables in the input layer and serve as a unit of memory. Each input variable is transformed into several fuzzy sets in the membership layer, where each neuron corresponds to a particular fuzzy set and the neuron output provides the actual membership value. Each neuron in this layer represents the characteristics of one membership function, and the Gaussian function serves as the membership function. The $j$th neuron for the $i$th input in this layer has the following input and output:

$$u_{ij}^{(2)}(k) = O_i^{(1)}(k) + \theta_{ij}\, O_{ij}^{(2)}(k-1), \qquad O_{ij}^{(2)}(k) = \exp\left\{-\frac{\left(u_{ij}^{(2)}(k) - m_{ij}\right)^2}{\sigma_{ij}^2}\right\},$$

where $m_{ij}$ denotes the mean value of the Gaussian membership function of the $j$th term with respect to the $i$th input variable, $\sigma_{ij}$ represents the standard deviation of the Gaussian membership function of the $j$th term with respect to the $i$th input, $u_{ij}^{(2)}(k)$ is the input of this layer at the discrete time $k$, $O_{ij}^{(2)}(k-1)$ denotes the feedback unit of memory, which stores the past network information and represents the main difference between the FNN and the RFNN, and $\theta_{ij}$ indicates the connection weight of the feedback unit. Each node in the membership layer possesses three adjustable parameters: $m_{ij}$, $\sigma_{ij}$, and $\theta_{ij}$.

(3) Fuzzy rule layer: the fuzzy rule layer comprises numerous nodes, each of which corresponds to a fuzzy operating region of the process being modeled. This layer constructs the entire fuzzy rule set. The number of nodes in this layer equals the number of fuzzy sets corresponding to each external linguistic input variable, and each node receives the one-dimensional membership degrees of the associated rule from the nodes of a set in the membership layer. The output of each neuron in the fuzzy rule layer is obtained by a multiplication operation. The input and output for the $j$th neuron in the fuzzy rule layer are as follows:

$$u_{ij}^{(3)}(k) = O_{ij}^{(2)}(k), \qquad O_j^{(3)}(k) = \prod_{i} u_{ij}^{(3)}(k),$$

where $u_{ij}^{(3)}(k)$ is the $i$th input value to the $j$th neuron of the fuzzy rule layer, and $O_j^{(3)}(k)$ denotes the output of a fuzzy rule node, representing the "firing strength" of its corresponding rule. Links entering the fuzzy rule layer represent the preconditions of the rules, and links leaving it represent the consequences of the fuzzy rule nodes.

(4) Output layer: the output layer performs the defuzzification operation. Nodes in this layer are called output linguistic nodes, where each node corresponds to an individual output of the system. The links between the fuzzy rule layer and the output layer carry the weighting values $w_{jo}^{(4)}$. For the $o$th neuron in the output layer,

$$y_o(k) = O_o^{(4)}(k) = \sum_{j} w_{jo}^{(4)}\, u_j^{(4)}(k), \qquad u_j^{(4)}(k) = O_j^{(3)}(k),$$

where $w_{jo}^{(4)}$ is the output action strength of the $o$th output associated with the $j$th fuzzy rule and serves as the tuning factor of this layer, $O_o^{(4)}(k)$ is the final inferred result, and $y_o(k)$ represents the $o$th output of the RFNN.
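The four-layer information flow described above can be sketched as a forward pass in Python. This is an illustrative sketch under stated assumptions, not the model trained in this study: it assumes a single output node, a Gaussian membership of the form exp(-((u - m) / sigma)^2), feedback memory initialized to zero, and made-up parameter values; the class name `RFNN` and its attribute names are hypothetical.

```python
import math


class RFNN:
    """Forward pass of a four-layer recurrent fuzzy neural network (sketch)."""

    def __init__(self, means, sigmas, thetas, out_weights):
        # means[i][j], sigmas[i][j], thetas[i][j]: parameters of the j-th fuzzy
        # term of the i-th input; out_weights[j]: weight of rule j (single output).
        self.means, self.sigmas, self.thetas = means, sigmas, thetas
        self.out_weights = out_weights
        # Feedback memory: membership outputs of the previous time step.
        self.prev = [[0.0] * len(row) for row in means]

    def step(self, x):
        n_inputs, n_terms = len(self.means), len(self.means[0])
        member = [[0.0] * n_terms for _ in range(n_inputs)]
        for i in range(n_inputs):        # layer 1 passes x[i] through unchanged
            for j in range(n_terms):     # layer 2: recurrent Gaussian membership
                u = x[i] + self.thetas[i][j] * self.prev[i][j]
                z = (u - self.means[i][j]) / self.sigmas[i][j]
                member[i][j] = math.exp(-z * z)
        self.prev = member
        # Layer 3: firing strength of rule j is the product over all inputs.
        firing = [math.prod(member[i][j] for i in range(n_inputs))
                  for j in range(n_terms)]
        # Layer 4: defuzzification as a weighted sum of firing strengths.
        return sum(w * f for w, f in zip(self.out_weights, firing))


# One input with two fuzzy terms; all values are made up for illustration.
net = RFNN(means=[[0.0, 1.0]], sigmas=[[1.0, 1.0]],
           thetas=[[0.5, 0.5]], out_weights=[1.0, 2.0])
y_first = net.step([0.0])
y_second = net.step([0.0])
```

Feeding the same input twice gives different outputs because the second step also sees the stored membership values through the feedback weights. Setting all feedback weights to zero removes that memory and reduces the sketch to a static FNN.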

#### 3. Valuation Performance

This work uses the mean square error (MSE) to assess the performance of the three neural networks. The MSE is calculated as the average of the squared errors, where each error is the difference between the actual and the desired output. The MSE is thus computed as

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left(y_n - d_n\right)^2,$$

where $N$ indicates the total number of samples, $n$ indexes the estimated samples, $y_n$ represents the actual output, and $d_n$ denotes the desired output.
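As a small worked illustration of this measure, the following Python function averages the squared differences between actual and desired outputs. The function name `mean_square_error` and the sample values are hypothetical and are not taken from the data used in this study.

```python
def mean_square_error(actual, desired):
    """Average of squared differences between actual and desired outputs."""
    if len(actual) != len(desired) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - d) ** 2 for a, d in zip(actual, desired)) / len(actual)


# Made-up values: squared errors are 0, 0.25, and 1, so the mean is 1.25 / 3.
mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```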

#### 4. Computer Simulations

##### 4.1. Data Description

This study focuses on near-month energy futures. Daily oil prices for Brent, WTI, DUBAI, and IPE are used in this investigation. The data are obtained from the Energy Bureau of the USA and the International Petroleum Exchange (IPE) of Great Britain. Because this work explores the influence of training time on prediction performance, it divides the training period from January 1, 1990 to April 30, 2005 into three five-year sections. Table 1 shows the different training periods.

##### 4.2. Comparison among Various Neural Networks

This work divides the training period into three parts and uses Matlab software to perform training and testing. To compare the predictive power of the three artificial neural networks (ANNs), the Levenberg-Marquardt method is employed as the training function, and the number of iterations over the data set is arbitrarily set to 1000 for training each neural network.

Table 2 shows the notation and training times of the multilayer perceptron (MLP), Elman recurrent neural networks (ERNNs), and recurrent fuzzy neural networks (RFNNs).

*(1) The Comparison of Learning Performance*

After 1000 training iterations, Table 3 gives the following ranking for the learning performance of the three ANNs: RFNN ranks first, followed by ERNN, and finally BPN. Table 3 also shows that the learning performance of the ANNs improves as the training time increases. The one exception is the MSE for part 2 under RFNN, which is less than that obtained for part 3.

*(2) The Comparison of Predictive Power*

The empirical results indicate that the predictive power of the three ANNs is ranked as follows: RFNN ranks first, followed by ERNN, and finally MLP. Table 4 shows that, under ERNNs and RFNNs, the predictive power of the ANNs improves with increasing training time. However, the MLP behaves differently from ERNNs and RFNNs: its predictive power deteriorates with increasing training time. One possible explanation for this phenomenon is that a large difference exists between the forecast value and the actual value from March 20, 2005 to March 28, 2005. In addition, the MLP is not a dynamic network and cannot use past experience to forecast future behavior.

#### 5. Conclusion

This study uses the multilayer perceptron (MLP), Elman recurrent neural networks (ERNNs), and recurrent fuzzy neural networks (RFNNs) to forecast crude oil prices and compares the predictive power of these three neural network models. The results of this work are summarized as follows.

All of the MSE values obtained under different training times with MLP, ERNNs, and RFNNs are below 0.0026768, suggesting that the use of neural networks to forecast crude oil futures prices is appropriate and that consistent learning ability can be obtained across different training times. This investigation confirms that, under most circumstances, the longer the training time, the better the learning performance of the neural networks. The only exception occurs at part 2 under the RFNN model, where the MSE is slightly less than that obtained for part 3.

Regarding the predictive power of the three neural networks, this study finds that RFNN has the best predictive power and MLP has the least among the three neural networks. This work also finds that, under ERNNs and RFNNs, the predictive power improves as the training time increases. The results under MLP are different: its predictive power improves as the training time decreases. A possible explanation for this phenomenon is the existence of a large difference between the predicted value and the actual value during a 9-day period. To summarize, this study concludes that the recurrent fuzzy neural network is the best among the three neural networks.

#### Acknowledgments

The authors would like to thank Dr. Oleg Smirnov for his insightful comments at the 81st WEA Annual Conference on July 1, 2006. Also, the authors would like to thank the anonymous referees for their valuable comments. This research is partially supported by the National Science Council of Taiwan under grant NSC 99-2410-H-033-026-MY3.