The Scientific World Journal

Volume 2016 (2016), Article ID 3460293, 15 pages

http://dx.doi.org/10.1155/2016/3460293

## Intelligent Soft Computing on Forex: Exchange Rates Forecasting with Hybrid Radial Basis Neural Network

^{1}Faculty of Management Science and Informatics, University of Zilina, Univerzitna 8215/1, 010 26 Zilina, Slovakia

^{2}Faculty of Economics, VSB-Technical University of Ostrava, Sokolska Trida 33, 701 21 Ostrava 1, Czech Republic

Received 15 May 2015; Accepted 20 September 2015

Academic Editor: Venkatesh Jaganathan

Copyright © 2016 Lukas Falat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper deals with the application of quantitative soft computing prediction models in the financial area, as reliable and accurate prediction models can be very helpful in the management decision-making process. The authors suggest a new hybrid neural network which is a combination of the standard RBF neural network, a genetic algorithm, and a moving average. The moving average is supposed to enhance the outputs of the network using the error part of the original neural network. The authors test the suggested model on high-frequency time series data of USD/CAD and examine its ability to forecast exchange rate values for a horizon of one day. To determine the forecasting efficiency, they perform a comparative statistical out-of-sample analysis of the tested model against autoregressive models and the standard neural network. They also incorporate a genetic algorithm as an optimizing technique for adapting the parameters of the ANN, which is then compared with standard backpropagation and with backpropagation combined with the *K*-means clustering algorithm. Finally, the authors find that their suggested hybrid neural network is able to produce more accurate forecasts than the standard models and can help reduce the risk of making a bad decision in the decision-making process.

#### 1. Introduction

Techniques of artificial intelligence and machine learning have increasingly been applied to time series forecasting. One of the reasons was the study of Bollerslev [1], which demonstrated the existence of nonlinearity in financial data. The first machine learning models applied to time series forecasting were artificial neural networks (ANNs) [2]. This was due to the fact that the artificial neural network is a universal black-box approximator of nonlinear functions [3–5], which is especially helpful in modeling nonlinear processes whose functional relations are a priori unknown or whose system of relations is very complex to describe [6]; ANNs are even able to model chaotic time series [7]. They can be used for nonlinear modeling without knowing the relations between input and output variables. Thanks to this, ANNs have been widely used to perform tasks such as pattern recognition, classification, or financial prediction [8–11]. Following the theoretical work on perceptron neural networks published by McCulloch and Pitts [12] and Minsky and Papert [13], it is nowadays mainly the radial basis function (RBF) network [14, 15] that is used, as it has been shown to be a better approximator than the basic perceptron network [16–18].

In this work we extend the standard RBF model by using a moving average to model the errors of the RBF network. We chose the RBF neural network for our exchange rate forecasting experiment because, according to some studies [19], ANNs have the biggest potential in predicting financial time series. In addition, Hill et al. [20] showed that ANNs work best with high-frequency financial data. Moreover, we combine the standard ANN with an evolutionary computation (EC) technique, the genetic algorithm. Since, according to some scientists [21], the use of technical analysis tools can lead to efficient profitability on the market, we decided to combine our customized RBF network with moving averages [22]. We use the simple moving average to model the error part of the RBF network, as we supposed it could enhance the prediction outputs of the model.

In the prediction analysis, the forecasting ability of this nonlinear model will be compared and contrasted with a standard neural network and an autoregressive (AR) model with GARCH errors to determine the best model parameters for this currency pair forecasting problem. We provide out-of-sample evidence because it focuses directly on predictability, and because it is important to avoid in-sample overfitting for this type of nonlinear model [23].

The soft computing application we suggest is novel in two ways: first, we hybridize the standard neural network with simple moving averages to form a new hybrid model; second, besides the standard algorithm for training the neural network, we also use other (advanced) techniques.

#### 2. Suggested Hybrid-RBF Neural Network Combined with Moving Average

Hybrid models have become popular in the field of financial forecasting in recent years. Since studies by Yang [24] and Clemen [25] theoretically proved that a combination of multiple models can produce better results, we also use a combined model consisting of a customized RBF neural network (supplemented by genetic algorithms for weights adaptation) and a simple moving average tool for modeling the error part of the RBF. We reduce the error of the neural network by modeling the residuals of the RBF.

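Let $F$ be a function defined as $F: x_t \in \mathbb{R}^{k} \rightarrow y_t \in \mathbb{R}$, which is a representation assigning one value $y_t$ to the $k$-dimensional input $x_t$ in a given time period $t$. Let $G$ be a restriction of $F$ defined as $G(x_t, w, v, s): x_t \in \mathbb{R}^{k}_{\text{train}} \rightarrow y_t \in \mathbb{R}_{\text{train}}$, where $\mathbb{R}_{\text{train}}$ is a complement of $\mathbb{R}_{\text{val}}$ to $\mathbb{R}$. Then, the hybrid neural network model of RBF and SMA(*q*) is defined as

$$G(x_t, w, v, s) = \sum_{j=1}^{s} v_j \, \psi_2(u_j) + \frac{1}{q}\sum_{i=1}^{q} \varepsilon_{t-i},$$

where $\psi_2(u_j)$ is the activation of the $j$th hidden neuron and $\varepsilon_{t-i}$ denotes the error of the RBF network at time $t-i$, with

$$E = \sum_{t}\left(G(x_t, w, v, s) - y_t\right)^{2} \rightarrow \min. \tag{1}$$

The necessary condition is that the model must be adapted to approximate the unknown function $F$; that is, the model must fulfill the condition that the difference between the estimated output produced by the model and the original value $y_t$ is minimal.

$x_t$, $w$, $v$, and $s$ denote the ANN parameters; $x_t$ is the input vector of dimension $k$; $w$ and $v$ are the parameters of the network (also called synapses or weights) and are used for the interconnection of the neural network. The input vector $x_t$ forms the input layer of the network; $w_j$ are the weights going from the input layer to the hidden layer, which is formed by $s$ hidden neurons. In the RBF network, a radial basis function of Gaussian type is used to activate the neurons in the hidden layer, instead of the sigmoid function used in a perceptron network. The Gaussian activation function of the $j$th hidden neuron is defined as $\psi_2(u_j) = \exp\left(-u_j / 2\sigma_j^{2}\right) = \exp\left(-\lVert x_t - w_j \rVert^{2} / 2\sigma_j^{2}\right)$, $j = 1, 2, \ldots, s$, where $\sigma_j^{2}$ is the variance of the $j$th neuron and $u_j$ is the potential of the neuron. Furthermore, $v_j$, $j = 1, 2, \ldots, s$, are the weights between the $j$th hidden neuron and the output layer, which is represented by just one neuron (the network output). The activated neurons are weighted by the weight vector $v$ in order to get the output of the network, computed as $y = \sum_{j=1}^{s} v_j \, \psi_2(u_j)$.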
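The forward pass described above can be illustrated with a minimal sketch. It assumes that the potential of a hidden neuron is the squared Euclidean distance between the input and the neuron's weight vector, which is the usual choice for Gaussian RBF networks; the function name `rbf_output` and the toy numbers are hypothetical and are not taken from the authors' Java implementation.

```python
import math

def rbf_output(x, centers, variances, v):
    """Output of a Gaussian RBF network with a single linear output neuron.

    x         -- input vector (list of floats)
    centers   -- one weight vector per hidden neuron (input-to-hidden weights)
    variances -- one variance per hidden neuron
    v         -- hidden-to-output weights, one per hidden neuron
    """
    activations = []
    for w_j, var_j in zip(centers, variances):
        # Potential of the neuron: squared Euclidean distance (assumption).
        u_j = sum((x_i - w_i) ** 2 for x_i, w_i in zip(x, w_j))
        # Gaussian activation of the hidden neuron.
        activations.append(math.exp(-u_j / (2.0 * var_j)))
    # Weighted sum of the activated hidden neurons.
    return sum(v_j * a_j for v_j, a_j in zip(v, activations))

# Toy usage: one lagged exchange rate as input, two hidden neurons.
y_hat = rbf_output([1.02], centers=[[1.00], [1.05]],
                   variances=[0.01, 0.01], v=[0.4, 0.6])
```

When the input coincides with a neuron's centre, that neuron's activation is exactly 1, so its contribution to the output equals its output weight.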

#### 3. Methodology

The neural network we used for this research was the RBF network, which is one of the most frequently used networks for regression. RBF networks have been widely used to capture a variety of nonlinear patterns (see [26]) thanks to their universal approximation properties (see [27]).

In order to optimize the outputs of the network and to maximize the accuracy of the forecasts, we had to optimize the parameters of the ANN. The most popular method for learning in multilayer networks is backpropagation, first introduced by Bryson and Ho [28]. There are, however, some drawbacks to backpropagation. One of them is the “scaling problem.” Backpropagation works well on simple training problems; however, as the problem complexity increases (due to increased dimensionality and/or greater complexity of the data), its performance falls off rapidly [29]. Furthermore, the convergence of this algorithm is slow, and because it performs stochastic gradient descent on an error surface that is not flat, it generally converges to some local minimum. The gradient method therefore does not guarantee finding the optimal values of the parameters, and becoming trapped in a local minimum is quite possible.

As genetic algorithms have become a popular optimization tool in various areas, in our implementation of the ANN, backpropagation is replaced by the GA as an alternative learning technique in the process of weights adaptation. Genetic algorithms (GA), which are EC algorithms for optimization and machine learning, are stochastic search techniques that guide a population of solutions towards an optimum using the principles of evolution and natural genetics [30]. Adopted from biological systems, genetic algorithms are based loosely on several features of biological evolution [31]. In order to work properly, they require five components [32]: (1) a way of encoding solutions to the problem on chromosomes, (2) an evaluation function which returns a rating for each chromosome given to it, (3) a way of initializing the population of chromosomes, (4) operators that may be applied to parents when they reproduce to alter their genetic composition, and (5) parameter settings for the algorithm, the operators, and so forth. GA are also characterized by basic genetic operators, which include reproduction, crossover, and mutation [33]. Given these genetic operators and the five components stated above, a genetic algorithm operates according to the steps described in [29]. When the components of the GA are chosen appropriately, the reproduction process will continually generate better children from good parents; the algorithm can produce populations of better and better individuals, converging finally on results close to a global optimum. Additionally, GA can efficiently search large and complex (i.e., possessing many local optima) spaces to find nearly global optima [29]. Also, GA should not have the same problem with scaling as backpropagation. One reason for this is that a GA generally improves the current best candidate monotonically. It does this by keeping the current best individual as part of the population while it searches for better candidates.

In addition, as Kohonen [34] demonstrated that nonhierarchical clustering algorithms used with artificial neural networks can improve the results of the ANN, an unsupervised learning technique will be used together with RBF in order to find out whether this combination can produce an effective improvement of this network in the domain of financial time series. We will combine RBF with the standard unsupervised technique called *K*-means (see [35]). The *K*-means algorithm, which belongs to the group of unsupervised learning methods, is a nonhierarchical exclusive clustering method based on the relocation principle. The most common type of characteristic function is location clustering, and the most common distance function is Euclidean.

The *K*-means algorithm will be used in the phase of nonrandom initialization of the weight vector $w$, performed before the phase of network learning. In many cases it is not necessary to interpolate the output value by radial functions; it is quite sufficient to use one function for a set of data (a cluster), whose center is considered to be the center of the activation function of a neuron. The values of the centroids will be used as initialization values of the weight vector $w$. The weights should then be located near the global minimum of the error function (1), so a lower number of epochs should suffice for network training. We decided to use *K*-means because it is quite simple to implement and, in the domain of nonextreme values, it is a relatively efficient algorithm. In our experiments, the adaptive version of *K*-means will be used, which is defined as follows:
(1) random initialization of centroids in the dimension of the input vector,
(2) introduction of the input vector $x_t$,
(3) determination of the centroid nearest to the given input,
(4) adaptation of the coordinates of that centroid according to the rule $c_{j^{*}} \leftarrow c_{j^{*}} + \eta\,(x_t - c_{j^{*}})$, where $c_{j^{*}}$ is the nearest cluster to the introduced input and $\eta$ is a learning rate parameter,
(5) termination of the algorithm if all inputs were processed or the coordinates of the clusters are not changing anymore.
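The adaptive clustering steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the update moves the nearest centroid a fraction of the way towards the current input (the standard online rule), centroids are initialised from randomly chosen input vectors, and the loop simply runs for a fixed number of passes instead of checking for convergence. The function name `adaptive_kmeans` and its parameters are hypothetical.

```python
import random

def adaptive_kmeans(inputs, n_clusters, eta=0.001, cycles=1, seed=0):
    """Online (adaptive) K-means; returns the list of centroids."""
    rng = random.Random(seed)
    # Step 1: initialise centroids as randomly chosen input vectors.
    centroids = [list(x) for x in rng.sample(inputs, n_clusters)]
    for _ in range(cycles):
        for x in inputs:  # Step 2: present one input vector at a time.
            # Step 3: find the centroid nearest to the input (Euclidean).
            nearest = min(
                range(n_clusters),
                key=lambda j: sum((a - c) ** 2 for a, c in zip(x, centroids[j])),
            )
            # Step 4: move that centroid towards the input by the rate eta.
            centroids[nearest] = [
                c + eta * (a - c) for a, c in zip(x, centroids[nearest])
            ]
    # Step 5 (simplified): stop after a fixed number of passes.
    return centroids

# Toy usage: two well-separated groups of one-dimensional inputs.
data = [[0.0], [0.1], [10.0], [10.1]]
centers = adaptive_kmeans(data, n_clusters=2, eta=0.1, cycles=50)
```

The resulting centroids would then serve as initial values of the input-to-hidden weights before backpropagation starts.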

#### 4. Empirical Research

We chose the forex market for our experiments. Our experiment focuses on the time series of daily close prices of USD/CAD (Canadian dollar versus US dollar), one of the major currency pairs, covering a historical period from October 31, 2008, to October 31, 2012 (1044 daily observations); the data were downloaded from http://www.global-view.com/forex-trading-tools/forex-history/. For the purpose of model validation, the data were divided into two parts (Figure 7). The first part included 912 observations (from 10/31/2008 to 4/30/2012) and was used for model training. The second part of the data (5/1/2012 to 10/31/2012), comprising 132 observations, was used for model validation by making one-day-ahead ex-post forecasts. These observations are new data which were not incorporated into model estimation (the parameters of the model were no longer changed in this phase). The reason for this procedure is that an ANN can become so specialized for the training set that it loses flexibility and hence accuracy on the test set.

We used our own implementation of the RBF neural network, written in Java, with one hidden layer; according to Cybenko [36], a feedforward network with one hidden layer is able to approximate any continuous function. For the hidden layer, the radial basis function was used as an activation function, as it has been shown to provide better accuracy than the perceptron network. We estimated the RBF part of the model with several adaptation algorithms: RBF trained with a backpropagation algorithm, with a genetic algorithm, and with a combination of *K*-means and backpropagation. For backpropagation learning, the learning rate was set to 0.001 to reduce the risk of becoming trapped in a local minimum. The number of epochs for each experiment with backpropagation was set to 5000, as this proved to be a good number for backpropagation convergence. The final results were taken from the best of the 5000 epochs and not from the last epoch in order to avoid overfitting of the neural network. *K*-means was used instead of random initialization of the weights before they were adapted by backpropagation. The coordinates of the clusters were initialized as the coordinates of randomly chosen input vectors. The *K*-means cycle was repeated 5000 times and the learning rate for cluster adaptation was set to 0.001. The number of clusters was set to the number of hidden neurons.

The genetic algorithm required the following: a method of encoding chromosomes, the fitness function used to calculate the fitness values of chromosomes, the population size, the initial population, the maximum number of generations, a selection method, a crossover function, and a mutation method. Our implementation of the genetic algorithm used for weight adaptation is as follows. The chromosome length was set according to the formula $k \cdot s + s$, where $s$ is the number of hidden neurons and $k$ is the dimension of the input vector. A specific gene of a chromosome was a float value and represented a specific weight in the neural network. The whole chromosome represented the weights of the whole neural network. The fitness function for evaluating the chromosomes was the mean squared error (MSE). The chromosome (individual) with the best MSE was automatically transferred into the next generation. The other individuals of the next generation were chosen as follows: using tournament selection, 100 individuals were randomly chosen from the population, and the fittest of them was chosen as a parent. The second parent was chosen in the same way. The new individual was then created by a crossover operation. If a value randomly generated from the interval $\langle 0, 1)$ was lower than 0.5, the weight of the first parent at the specific position was assigned to the new individual. Otherwise, the new individual received the weight of the second parent. The mutation rate was set to 0.01. If mutation was performed, the specific gene (weight) of a chromosome was changed to a random value. The size of the population and the number of generations for the genetic algorithm were set in accordance with the settings of backpropagation. Based on some experiments, we used a population size of 1000 and set the number of generations to 10.
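The genetic operators described above (elitism, tournament selection, position-wise crossover with probability 0.5, and random-reset mutation) can be sketched as follows. This is a minimal illustration, not the authors' Java code: the function name `evolve`, the initial weight range of [-1, 1], and the toy fitness function are assumptions, and the defaults are much smaller than the population of 1000 and tournament of 100 reported in the text so that the sketch runs quickly. Lower fitness is better, as with MSE.

```python
import random

def evolve(fitness, chrom_len, pop_size=50, generations=10,
           tournament=5, mutation_rate=0.01, seed=0):
    """Minimise `fitness` over real-valued chromosomes of length chrom_len."""
    rng = random.Random(seed)
    # Initial population: random weights in [-1, 1] (assumed range).
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(chrom_len)]
           for _ in range(pop_size)]

    def pick_parent():
        # Tournament selection: the fittest of a random subset becomes a parent.
        return min(rng.sample(pop, tournament), key=fitness)

    for _ in range(generations):
        # Elitism: the best individual is copied unchanged to the next generation.
        next_pop = [min(pop, key=fitness)[:]]
        while len(next_pop) < pop_size:
            p1, p2 = pick_parent(), pick_parent()
            # Crossover: each gene comes from parent 1 with probability 0.5.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: a gene is replaced by a random value with small probability.
            child = [rng.uniform(-1.0, 1.0) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)

# Toy fitness standing in for the network's MSE on the training set.
def toy_fitness(chromosome):
    return sum(g * g for g in chromosome)

best = evolve(toy_fitness, chrom_len=3)
```

Because the best individual always survives, the best fitness in the population cannot get worse from one generation to the next, which is the monotonic improvement mentioned in the methodology.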

When the best configuration of the RBF network was found, the RBF error was then modelled in order to minimize the total error of the model. Using the moving average, the forecast of the future error of the RBF was computed as the average of the last *q* network errors. We used only the simple moving average; that is, all previous network errors had the same weight. To find the optimal number of parameters of the moving average tool, we tried various numbers of previous errors for computing the future (average) value of the RBF error.
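The error correction by a simple moving average can be sketched as below. The sketch assumes that the error is defined as the actual value minus the RBF forecast and that, while fewer than `q` errors are available, the average is taken over those that exist (with no correction for the very first forecast); neither detail is specified in the text. The function name `sma_corrected_forecasts` is hypothetical.

```python
def sma_corrected_forecasts(rbf_forecasts, actuals, q):
    """Add the mean of the last q RBF errors to each one-step-ahead forecast."""
    errors = []      # past errors, known only after the actual value is observed
    corrected = []
    for y_hat, y in zip(rbf_forecasts, actuals):
        recent = errors[-q:]
        # Equal weights for all recent errors (simple moving average).
        correction = sum(recent) / len(recent) if recent else 0.0
        corrected.append(y_hat + correction)
        # The error of this step becomes available for later forecasts.
        errors.append(y - y_hat)
    return corrected

# Toy usage: a network that is consistently 0.2 too low.
hybrid = sma_corrected_forecasts([1.0] * 5, [1.2] * 5, q=3)
```

In this toy case the correction removes the systematic bias from the second forecast onwards, which is the kind of improvement the moving average is meant to provide.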

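The models were assessed using the numerical characteristic called mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(y_{t+H} - \hat{y}_{t+H}\right)^{2},$$

where $y_{t+H}$ is the observed value, $\hat{y}_{t+H}$ is its forecast, $H$ is the forecasting horizon, and $N$ is the total number of predictions for the horizon $H$ over the forecast period.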
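For reference, the mean squared error of a set of forecasts can be computed as in the following small sketch; the function name `mse` and the example numbers are illustrative only.

```python
def mse(actuals, forecasts):
    """Mean squared error between observed values and their forecasts."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - f) ** 2 for y, f in zip(actuals, forecasts)) / len(actuals)

# Toy usage with three one-day-ahead predictions.
example_mse = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```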

To allow a comparison of our suggested model with standard statistical models, we also performed the empirical Box-Jenkins analysis [37] (for details of the Box-Jenkins analysis see the appendix). The Box-Jenkins analysis focused on the original and differenced series of daily observations of the USD/CAD currency pair covering a historical period from October 31, 2008, to October 31, 2012. The data, as stated above, were downloaded from the following website: http://www.global-view.com/forex-trading-tools/forex-history/. The best results for out-of-sample prediction were achieved with the EGARCH model with Gaussian error distribution. Therefore, this model was compared with the neural network models and our suggested model. The volatility of this model is defined as follows:

#### 5. Results and Discussion

The prediction quality was assessed on the validation set (ex-ante predictions) because an ANN can become so specialized for the training set that it loses accuracy on the test set. Therefore, the estimation of all models was based only on the first 912 observations, so that the predictions could be compared against the 132 remaining observations. In this paper, we only used one-step-ahead forecasts; that is, the horizon of predictions was equal to one day. In order to prevent our results from being distorted by a single replication, we used the procedure applied in Heider et al. [38]; that is, the experiment for every model configuration was performed twelve times, the best and worst results were eliminated, and the mean and standard deviation were computed from the rest. The result reported for a given model is that of the best neuron configuration (for every model we tested between 3 and 10 hidden neurons to find the best output results of the network).
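The replication procedure described above (drop the best and the worst run, then report the mean and standard deviation of the rest) can be sketched as follows. The use of the sample standard deviation is an assumption, since the text does not say which variant was used; the function name `summarize_runs` and the example values are hypothetical.

```python
import statistics

def summarize_runs(mse_values):
    """Drop the best and worst run, then return (mean, stdev) of the rest."""
    if len(mse_values) < 4:
        raise ValueError("need at least four runs to trim two and keep two")
    trimmed = sorted(mse_values)[1:-1]  # remove the smallest and largest MSE
    # Sample standard deviation (assumption; the source does not specify).
    return statistics.mean(trimmed), statistics.stdev(trimmed)

# Toy usage with six made-up run results; one outlier is trimmed away.
mean_mse, std_mse = summarize_runs([1.0, 2.0, 3.0, 4.0, 100.0, 0.0])
```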

In Table 1 (RBF network, one autoregressive input), we see that the network trained with BP achieved its best results with 4 neurons in the hidden layer (see also Figure 1). The advanced methods for network learning achieved their best results with 4 neurons (GA) and 9 neurons (*K*-means + BP), respectively. However, with these advanced methods the number of hidden neurons did not seem to play an important role, as the results were comparable. It follows that a small number of hidden neurons (three or four) is enough to capture the relationships in this time series. For standard BP, the error increases with the number of neurons because the more neurons there are, the longer the adaptation of the network weights takes.