Special Issue: Recent Trends and Techniques in Computing Information Intelligence
Research Article, Open Access
Lukas Falat, Dusan Marcek, Maria Durisova, "Intelligent Soft Computing on Forex: Exchange Rates Forecasting with Hybrid Radial Basis Neural Network", The Scientific World Journal, vol. 2016, Article ID 3460293, 15 pages, 2016. https://doi.org/10.1155/2016/3460293
Intelligent Soft Computing on Forex: Exchange Rates Forecasting with Hybrid Radial Basis Neural Network
Abstract
This paper deals with the application of quantitative soft computing prediction models in the financial area, as reliable and accurate prediction models can be very helpful in the management decision-making process. The authors suggest a new hybrid neural network which is a combination of the standard RBF neural network, a genetic algorithm, and a moving average. The moving average is intended to enhance the outputs of the network by using the error part of the original neural network. The authors test the suggested model on high-frequency time series data of USD/CAD and examine its ability to forecast exchange rate values for a horizon of one day. To determine the forecasting efficiency, they perform a comparative statistical out-of-sample analysis of the tested model against autoregressive models and the standard neural network. They also incorporate a genetic algorithm as an optimizing technique for adapting the parameters of the ANN, which is then compared with standard backpropagation and with backpropagation combined with the K-means clustering algorithm. Finally, the authors find that their suggested hybrid neural network is able to produce more accurate forecasts than the standard models and can be helpful in eliminating the risk of making a bad decision in the decision-making process.
1. Introduction
Techniques of artificial intelligence and machine learning have come to be applied in time series forecasting. One of the reasons was the study of Bollerslev [1], which demonstrated the existence of nonlinearity in financial data. The first machine learning models applied to time series forecasting were artificial neural networks (ANNs) [2]. This was because the artificial neural network is a universal black-box function approximator of nonlinear type [3–5], which is especially helpful in modeling nonlinear processes whose functional relations are a priori unknown or whose system of relations is very complex to describe [6]; ANNs are even able to model chaotic time series [7]. They can be used for nonlinear modeling without knowing the relations between input and output variables. Thanks to this, ANNs have been widely used for tasks such as pattern recognition, classification, and financial prediction [8–11]. Following the theoretical knowledge of the perceptron neural network published by McCulloch and Pitts [12] and Minsky and Papert [13], nowadays it is mainly the radial basis function (RBF) network [14, 15] that is used, as it has been shown to be a better approximator than the basic perceptron network [16–18].
In this work we extend the standard RBF model by using a moving average to model the errors of the RBF network. We chose the RBF neural network for our exchange rate forecasting experiment because, according to some studies [19], ANNs have the biggest potential in predicting financial time series. In addition, Hill et al. [20] showed that ANNs work best with high-frequency financial data. Moreover, we combine the standard ANN with an evolutionary computation (EC) technique called genetic algorithms. Since, according to some scientists [21], the use of technical analysis tools can lead to efficient profitability on the market, we decided to combine our customized RBF network with moving averages [22]. We use the simple moving average to model the error part of the RBF network, as we expected that it could enhance the prediction outputs of the model.
Using prediction analysis, the forecasting ability of this nonlinear model is compared and contrasted with a standard neural network and an autoregressive (AR) model with GARCH errors to determine the best model parameters for this currency pair forecasting problem. We provide out-of-sample evidence, since it focuses directly on predictability and since it is important to avoid in-sample overfitting for this type of nonlinear model [23].
The soft computing application we suggest is novel in two ways. First, we hybridize the standard neural network with simple moving averages to form a whole new hybrid model. Second, in addition to the standard algorithm for training the neural network, we also use other (advanced) techniques.
2. Suggested Hybrid RBF Neural Network Combined with Moving Average
Hybrid models have become popular in the field of financial forecasting in recent years. Since studies by Yang [24] and Clemen [25] showed theoretically that a combination of multiple models can produce better results, we also use a combined model consisting of a customized RBF neural network (supplemented by genetic algorithms for weight adaptation) and a simple moving average tool for modeling the error part of the RBF. We reduce the error of the neural network by modeling the residuals of the RBF.
Let $F$ be a function defined as $F\colon x_t \in \mathbb{R}^k \to y_t \in \mathbb{R}^1$, which is a representation assigning one value $y_t$ to the $k$-dimensional input $x_t$ in a given time period $t$. Let $G$ be a restriction of $F$ defined as $G(x_t, w, v, s)\colon x_t \in \mathbb{R}^k_{\mathrm{train}} \to y_t \in \mathbb{R}^1_{\mathrm{train}}$, where $\mathbb{R}_{\mathrm{train}}$ is a complement of $\mathbb{R}_{\mathrm{val}}$ to $\mathbb{R}$. Then, the hybrid neural network model of RBF and SMA($q$) is defined as
$$\hat{y}_t = G(x_t, w, v, s) + \frac{1}{q}\sum_{i=1}^{q} \varepsilon_{t-i},$$
where
$$G(x_t, w, v, s) = \sum_{j=1}^{s} v_j\,\psi(u_j)$$
with
$$\varepsilon_t = y_t - G(x_t, w, v, s).$$
The necessary condition is that the model must be adapted to approximate the unknown function $F$; that is, the model must fulfill the condition that the difference between the estimated output $\hat{y}_t$ produced by the model and the original value $y_t$ is minimal.
$x_t$, $w$, $v$, $s$ denote the ANN parameters; $x_t$ is the input vector of the dimension $k$; $w$ and $v$ are the parameters of the network (also called synapses or weights) and are used for the interconnection of the neural network. The input vector $x_t$ forms the input layer of the network; $w$ are weights going from the input layer to the hidden layer that is formed by $s$ hidden neurons. In the RBF network, a radial basis function of Gaussian type, instead of the sigmoid function of a perceptron network, is used for activating neurons in the hidden layer. The Gaussian function for activating neurons is for the $j$th hidden neuron defined as $\psi(u_j) = \exp\left(-u_j / (2\sigma_j^2)\right)$, $j = 1, 2, \ldots, s$, where $\sigma_j^2$ is the variance of the $j$th neuron and $u_j$ is the potential of the neuron. Furthermore, $v_j$, $j = 1, 2, \ldots, s$, are weights between the $j$th hidden neuron and the output layer that is represented by just one neuron (the network output). Activated neurons are weighted by the weight vector $v$ in order to get the output of the network counted as $y = \sum_{j=1}^{s} v_j\,\psi(u_j)$.
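The output of the RBF network described above (Gaussian hidden neurons weighted by the output weights) can be illustrated with a minimal sketch. It is not the authors' Java implementation. The function name `rbf_output` and its argument names are hypothetical, and the potential of a neuron is assumed here to be the squared Euclidean distance between the input and the neuron's weight vector, a common choice that the text does not spell out.

```python
import math


def rbf_output(x, centers, variances, v):
    """Output of a Gaussian RBF network with a single output neuron.

    x         : input vector of dimension k
    centers   : s weight vectors w_j (input layer -> hidden layer)
    variances : s variances sigma_j^2, one per hidden neuron
    v         : s weights v_j (hidden layer -> output neuron)
    """
    y = 0.0
    for w_j, var_j, v_j in zip(centers, variances, v):
        # Potential u_j: ASSUMED to be the squared Euclidean distance ||x - w_j||^2.
        u_j = sum((xi - wi) ** 2 for xi, wi in zip(x, w_j))
        # Gaussian activation psi(u_j) = exp(-u_j / (2 * sigma_j^2)).
        y += v_j * math.exp(-u_j / (2.0 * var_j))
    return y
```

For an input that coincides with a neuron's center, the activation of that neuron is 1, so its contribution to the output is simply its output weight.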
3. Methodology
The neural network we used for this research was the RBF network, which is one of the most frequently used networks for regression. RBF networks have been widely used to capture a variety of nonlinear patterns (see [26]) thanks to their universal approximation properties (see [27]).
In order to optimize the outputs of the network and to maximize the accuracy of the forecasts, we had to optimize the parameters of the ANN. The most popular method for learning in multilayer networks is called backpropagation. It was first introduced by Bryson and Ho [28]. However, backpropagation has some drawbacks. One of them is the “scaling problem.” Backpropagation works well on simple training problems. However, as the problem complexity increases (due to increased dimensionality and/or greater complexity of the data), the performance of backpropagation falls off rapidly [29]. Furthermore, the convergence of this algorithm is slow, and it generally converges to whatever local minimum it reaches on the error surface, since gradient descent moves over a surface which is not flat. The gradient method therefore does not guarantee finding the optimal values of the parameters, and getting trapped in a local minimum is quite possible.
As genetic algorithms have become a popular optimization tool in various areas, in our implementation of the ANN, backpropagation will be substituted by a GA as an alternative learning technique in the process of weight adaptation. Genetic algorithms (GAs), which are evolutionary computation (EC) algorithms for optimization and machine learning, are stochastic search techniques that guide a population of solutions towards an optimum using the principles of evolution and natural genetics [30]. Adopted from biological systems, genetic algorithms are based loosely on several features of biological evolution [31]. In order to work properly, they require five components [32]: a way of encoding solutions to the problem on chromosomes; an evaluation function which returns a rating for each chromosome given to it; a way of initializing the population of chromosomes; operators that may be applied to parents when they reproduce to alter their genetic composition; and parameter settings for the algorithm, the operators, and so forth. GAs are also characterized by basic genetic operators, which include reproduction, crossover, and mutation [33]. Given these genetic operators and the five components stated above, a genetic algorithm operates according to the steps stated in [29]. When the components of the GA are chosen appropriately, the reproduction process continually generates better children from good parents, so the algorithm can produce populations of better and better individuals, finally converging on results close to a global optimum. Additionally, a GA can efficiently search large and complex spaces (i.e., spaces possessing many local optima) to find nearly global optima [29]. Also, a GA should not have the same problem with scaling as backpropagation. One reason for this is that it generally improves the current best candidate monotonically, by keeping the current best individual as part of its population while it searches for better candidates.
In addition, as Kohonen [34] demonstrated that nonhierarchical clustering algorithms used with artificial neural networks can improve the results of an ANN, an unsupervised learning technique will be used together with the RBF in order to find out whether this combination can effectively improve this network in the domain of financial time series. We will combine the RBF with the standard unsupervised technique called K-means (see [35]). The K-means algorithm, which belongs to the group of unsupervised learning methods, is a nonhierarchical exclusive clustering method based on the relocation principle. The most common type of characteristic function is location clustering, and the most common distance function is Euclidean.
K-means will be used in the phase of nonrandom initialization of the weight vector $w$, performed before the phase of network learning. In many cases it is not necessary to interpolate the output value by radial functions; it is quite sufficient to use one function for a set of data (a cluster), whose center is taken as the center of the activation function of a neuron. The values of the centroids will be used as initialization values of the weight vector $w$. The weights should then be located near the global minimum of the error function (1), so a lower number of epochs is expected to be needed for network training. We decided to use K-means because it is quite simple to implement and because, in the domain of nonextreme values, it is a relatively efficient algorithm. In our experiments, the adaptive version of K-means will be used, which is defined as follows:
(1) random initialization of the centroids in the dimension of the input vector,
(2) introduction of an input vector $x_t$,
(3) determination of the centroid nearest to the given input,
(4) adaptation of the coordinates of that centroid according to the rule $c_{j^*} \leftarrow c_{j^*} + \eta\,(x_t - c_{j^*})$, where $c_{j^*}$ is the cluster nearest to the introduced input and $\eta$ is a learning rate parameter,
(5) termination of the algorithm if all inputs were processed or the coordinates of the clusters are not changing anymore.
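The five steps of the adaptive K-means procedure can be sketched as follows. This is an illustrative sketch rather than the authors' code. The function name `adaptive_kmeans` and its parameters are hypothetical, centroids are assumed to be initialized as copies of randomly chosen input vectors, and termination is simplified to a fixed number of passes over the data.

```python
import random


def adaptive_kmeans(inputs, n_clusters, eta=0.001, cycles=1, seed=0):
    """Adaptive (online) K-means: move the nearest centroid towards each input.

    inputs     : list of input vectors (lists of floats of equal length)
    n_clusters : number of centroids (e.g. the number of hidden neurons)
    eta        : learning rate for the centroid update
    cycles     : number of passes over the inputs (ASSUMED stopping rule)
    """
    rng = random.Random(seed)
    # (1) Initialize centroids as copies of randomly chosen input vectors.
    centroids = [list(x) for x in rng.sample(inputs, n_clusters)]
    for _ in range(cycles):
        for x in inputs:  # (2) introduce an input vector
            # (3) Find the nearest centroid (squared Euclidean distance).
            nearest = min(
                range(n_clusters),
                key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[j])),
            )
            # (4) Move that centroid towards the input: c <- c + eta * (x - c).
            centroids[nearest] = [
                ci + eta * (xi - ci) for xi, ci in zip(x, centroids[nearest])
            ]
    # (5) Stop after the fixed number of passes.
    return centroids
```

The returned centroids could then serve as initial values of the hidden-layer weights before any further training.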
4. Empirical Research
We chose the forex market for our experiments. Our experiment focuses on the time series of daily close prices of USD/CAD (US dollar versus Canadian dollar), one of the major currency pairs, covering a historical period from October 31, 2008, to October 31, 2012 (1044 daily observations). The data were downloaded from the website http://www.globalview.com/forextradingtools/forexhistory/. For model validation, the data were divided into two parts (Figure 7). The first part included 912 observations (from 10/31/2008 to 4/30/2012) and was used for model training. The second part of the data (5/1/2012 to 10/31/2012), comprising 132 observations, was used for model validation by making one-day-ahead ex post forecasts. These observations are new data which were not incorporated into model estimation (the parameters of the model were no longer changed in this phase). The reason for this procedure is that an ANN can become so specialized for the training set that it loses flexibility and hence accuracy on the test set.
We used our own implementation of the RBF neural network, written in Java, with one hidden layer; according to Cybenko [36], a feedforward network with one hidden layer is able to approximate any continuous function. For the hidden layer, the radial basis function was used as the activation function, as it has been shown to provide better accuracy than the perceptron network. We estimated the RBF part of the model with several adaptation algorithms: the RBF implemented with a backpropagation algorithm, with a genetic algorithm, and with a combination of K-means and backpropagation. For backpropagation learning, the learning rate was set to 0.001 to avoid getting trapped easily in a local minimum. The number of epochs for each experiment with backpropagation was set to 5000, as this proved to be a good number for backpropagation convergence. The final results were taken from the best of the 5000 epochs and not from the last epoch, in order to avoid overfitting of the neural network. K-means was used instead of random initialization of the weights before they were adapted by backpropagation. The coordinates of the clusters were initialized as the coordinates of a randomly chosen input vector. The K-means cycle was repeated 5000 times and the learning rate for cluster adaptation was set to 0.001. The number of clusters was set to the number of hidden neurons.
For the GA, the following was needed: a method of encoding chromosomes, the fitness function used to calculate the fitness values of chromosomes, the population size, the initial population, the maximum number of generations, the selection method, the crossover function, and the mutation method. Our implementation of the genetic algorithm used for weight adaptation is as follows. The chromosome length was set according to the formula $D = s \cdot k + s$, where $s$ is the number of hidden neurons and $k$ is the dimension of the input vector. A specific gene of a chromosome was a float value and represented a specific weight in the neural network. The whole chromosome represented the weights of the whole neural network. The fitness function for evaluating the chromosomes was the mean squared error (MSE). The chromosome (individual) with the best MSE was automatically transferred into the next generation. The other individuals of the next generation were chosen as follows: by tournament selection, 100 individuals were randomly chosen from the population, and the fittest of them was chosen as a parent. The second parent was chosen in the same way. The new individual was then created by a crossover operation: if a value generated from the interval $\langle 0, 1 \rangle$ was lower than 0.5, the weight of the first parent at the specific position was assigned to the new individual; otherwise, the new individual received the weight of the second parent. The mutation rate was set to 0.01. If mutation was performed, the specific gene (weight) of the chromosome was changed to a random value. The size of the population and the number of generations for the genetic algorithm were set in accordance with the settings of backpropagation. Based on some experiments, we used a population size of 1000 and the number of generations was set to 10.
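The GA configuration described above (elitism, tournament selection, gene-wise crossover with probability 0.5, and random-reset mutation) can be sketched in a few lines. This is a hedged illustration, not the authors' implementation. The function name `genetic_minimize`, the range `[low, high]` used for initial and mutated gene values, and the generic `fitness` callback are assumptions introduced for the example. In the paper's setting, `fitness` would be the training MSE of the RBF network whose weights are encoded in the chromosome.

```python
import random


def genetic_minimize(fitness, length, pop_size=1000, generations=10,
                     tournament=100, mutation_rate=0.01,
                     low=-1.0, high=1.0, seed=0):
    """Minimize `fitness` over float chromosomes of the given length."""
    rng = random.Random(seed)
    # Initial population: random float genes (the range is an ASSUMPTION).
    population = [[rng.uniform(low, high) for _ in range(length)]
                  for _ in range(pop_size)]

    def select_parent():
        # Tournament selection: the fittest of `tournament` random individuals.
        return min(rng.sample(population, tournament), key=fitness)

    for _ in range(generations):
        # Elitism: the best individual is copied unchanged to the next generation.
        next_gen = [min(population, key=fitness)[:]]
        while len(next_gen) < pop_size:
            p1, p2 = select_parent(), select_parent()
            # Crossover: each gene comes from parent 1 if a draw from [0, 1)
            # is below 0.5, otherwise from parent 2.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: with probability `mutation_rate`, reset a gene randomly.
            child = [rng.uniform(low, high) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)
```

Because the best individual always survives, the best fitness in the population cannot get worse from one generation to the next, which is the monotonic improvement property mentioned in the methodology. The defaults mirror the settings reported in the text (population 1000, 10 generations, tournament size 100, mutation rate 0.01); smaller values are advisable for quick experiments.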
When the best configuration of the RBF network was found, the RBF error was then modelled in order to minimize the total error of the model. Using a moving average, the forecast of the future error of the RBF was computed as the average of the last network errors. We used only the simple moving average, in which all previous network errors have the same weight. To find the optimal value of the moving average parameter, we tried various numbers of previous errors for computing the future (average) value of the RBF error.
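The error-correction step can be illustrated with a short sketch: each one-step-ahead RBF forecast is shifted by the simple moving average of the most recent RBF errors. The function name `hybrid_forecasts` is hypothetical. The error is assumed to be defined as actual minus forecast, and the handling of the first few forecasts (averaging whatever errors are available, with no correction for the very first one) is an assumption, since the text does not describe the warm-up.

```python
def hybrid_forecasts(rbf_forecasts, actuals, q):
    """Correct one-step-ahead RBF forecasts with an SMA(q) of past RBF errors.

    rbf_forecasts[t] is the RBF forecast of actuals[t]; q must be >= 1.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    corrected = []
    errors = []  # past RBF errors, defined here as actual - forecast
    for forecast, actual in zip(rbf_forecasts, actuals):
        window = errors[-q:]  # the last q errors (fewer at the start)
        correction = sum(window) / len(window) if window else 0.0
        corrected.append(forecast + correction)
        # The error of this forecast is known only after the actual value
        # is observed, so it can influence later forecasts only.
        errors.append(actual - forecast)
    return corrected
```

If the RBF network is biased by a constant amount, the correction removes that bias after the first observation; larger values of `q` smooth the correction over a longer history.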
The numerical characteristic used for assessing the models was the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(y_{t+H} - \hat{y}_{t+H}\right)^2,$$
where $H$ is the forecasting horizon and $N$ is the total number of predictions for the horizon $H$ over the forecast period.
In order to compare our suggested model with a standard statistical model, we also performed an empirical Box-Jenkins analysis [37] (for details of the Box-Jenkins analysis see the appendix). The Box-Jenkins analysis focused on the original and differenced series of daily observations of the USD/CAD currency pair covering a historical period from October 31, 2008, to October 31, 2012. The data, as stated above, were downloaded from the following website: http://www.globalview.com/forextradingtools/forexhistory/. The best results for out-of-sample prediction were achieved with an EGARCH model with Gaussian error distribution. Therefore, this model was compared with the neural network models and our suggested model. The volatility of this model is defined as follows:
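The fitted volatility equation and its estimated coefficients are not reproduced here. For orientation only, one common parameterization of the conditional variance of an EGARCH model is sketched below. The order (1,1) and the symbols $\omega$, $\alpha$, $\gamma$, $\beta$ are assumptions made for illustration, not the authors' estimated specification.

```latex
% Generic EGARCH(1,1) conditional variance (illustrative form, assumed order):
%   \varepsilon_t : residual of the mean equation
%   \sigma_t^2    : conditional variance
\ln \sigma_t^2 = \omega
  + \beta \ln \sigma_{t-1}^2
  + \alpha \left| \frac{\varepsilon_{t-1}}{\sigma_{t-1}} \right|
  + \gamma \, \frac{\varepsilon_{t-1}}{\sigma_{t-1}}
```

Because the logarithm of the variance is modelled, the variance stays positive without sign restrictions on the coefficients, and a nonzero $\gamma$ allows positive and negative shocks to affect volatility asymmetrically.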
5. Results and Discussion
The prediction qualities were assessed on the validation set (ex ante predictions) because an ANN can become so specialized for the training set that it loses accuracy on the test set. Therefore, the estimation of all models was based only on the 912 training observations, in order to make further comparisons with the predictions for the 132 remaining observations. In this paper, we used only one-step-ahead forecasts; that is, the horizon of the predictions was equal to one day. In order to prevent our results from being distorted by a single replication, we used the procedure applied in Heider et al. [38]; that is, the experiment for every model configuration was performed twelve times, the best and worst results were eliminated, and the mean and standard deviation were computed from the rest. The result of a given model is taken from the best neuron configuration (for every model we tested numbers of hidden neurons from 3 to 10 to find the best output results of the network).
In Table 1 (RBF network, one autoregressive input), we see that the network with BP achieved the best results with 4 neurons in the hidden layer (see also Figure 1). On the other hand, the advanced methods for network learning (K-means + BP, GA) achieved the best results with 4 neurons (GA) and 9 neurons (K-means + BP), respectively. However, with these advanced methods the number of hidden neurons did not seem to play an important role, as the results were comparable. From that one can deduce that a small number of hidden neurons (three or four) is sufficient for capturing the relationships in this time series. As for the results of the standard BP, the error increases with a higher number of neurons because the more neurons there are, the longer the weight adaptation of the network takes.
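The replication procedure (repeat the experiment, discard the best and worst results, and summarize the rest) can be written compactly. The function name `summarize_runs` is hypothetical, and the use of the sample standard deviation is an assumption, as the text does not say which estimator was used.

```python
import statistics


def summarize_runs(mse_values):
    """Drop the lowest and the highest MSE, then return (mean, stdev) of the rest."""
    if len(mse_values) < 4:
        raise ValueError("need at least four runs to trim two and still get a stdev")
    kept = sorted(mse_values)[1:-1]  # remove the best and the worst run
    # Sample standard deviation (ASSUMED; the source does not specify).
    return statistics.mean(kept), statistics.stdev(kept)
```

With twelve runs per configuration, as in the experiments, ten values remain after trimming.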
 
stdev: standard deviation. 
The standard BP was also the great weakness of the neural network. Its convergence is really slow, and it generally converges to whatever local minimum it reaches on the error surface, since gradient descent moves over a surface which is not flat. The gradient method therefore does not guarantee finding the optimal values of the parameters, and getting trapped in a local minimum is quite possible. Another drawback of backpropagation is the “scaling problem.” Backpropagation works well on simple training problems. However, as the problem complexity increases (due to increased dimensionality and/or greater complexity of the data), the performance of backpropagation falls off rapidly.
Due to these disadvantages of BP, we tested other methods for network adaptation. It is no surprise that the RBF network combined with K-means or a GA for weight adaptation provided significantly better results than the original RBF (see Table 1). Moreover, besides a lower mean MSE, another advantage of using the genetic algorithm or the K-means upgrade is the consistency of the predictions, that is, the standard deviation of the experiments performed with the same network configuration (see Figure 2). The standard deviation of these methods is incomparably lower than the standard deviation obtained with standard backpropagation (see Table 1 and Figure 2).
The biggest strength of K-means is the speed of convergence of the network. Without K-means, it took considerably longer to reach the minimum. However, when K-means was used to set the weights of the network before backpropagation, the time needed to reach the minimum was much shorter. The advantage of this combination is that a lower number of epochs is needed for network training. Moreover, K-means is quite simple to implement. However, one must bear in mind that K-means is a relatively efficient algorithm only in the domain of nonextreme values.
We also tested a GA for weight adaptation and found that the convergence is also considerably faster than with backpropagation; it is therefore no surprise that the network sometimes converged after only 5 generations. In addition, the genetic algorithm does not have the same problem with scaling as backpropagation. One reason for this is that it generally improves the current best candidate monotonically, by keeping the current best individual as part of its population while it searches for better candidates. Genetic algorithms are generally not bothered by local minima. Also, genetic algorithms are especially capable of handling problems in which the objective function is discontinuous or nondifferentiable, nonconvex, multimodal, or noisy. Since the algorithms operate on a population instead of a single point in the search space, they climb many peaks in parallel and therefore reduce the probability of getting stuck in a local minimum. A key concept for genetic algorithms is the schema. A schema is a subset of the fields of a chromosome set to particular values, with the other fields left to vary. As originally observed in Holland [31], the power of genetic algorithms lies in their ability to implicitly evaluate large numbers of schemata simultaneously and to combine smaller schemata into larger schemata [29]. The disadvantage of using genetic algorithms in the neural network is that quite a lot of parameters must be set correctly (population size, mutation rate, crossover function, crossover rate, tournament size, fitness function, etc.).
When comparing weight adaptation via a GA with K-means plus backpropagation, the results are almost the same. Even though K-means provided better results than the GA, the differences are not very large. However, the GA has a bigger potential to produce even better forecasts, as it has more parameters that can be optimized. Backpropagation, even when used with K-means, seemed to have reached its minimum: even with a higher number of epochs (we tested backpropagation up to 10000 cycles) the results were almost the same.
For assessing our new hybrid neural network model we used the same strategy as for the standard ANN. For the parameter of the moving average, we tested values from one to one hundred and experimentally found the best value for the tested data (for the majority of testing procedures the optimal value of the moving average parameter was 44). Finally, just as for the standard RBF, the mean and standard deviation of the best results of the suggested hybrid (with the optimal value of the MA parameter) were computed from the best ten out of twelve experiments. For every number of hidden neurons tested, the results are stated in Table 1, which contains the results of the out-of-sample predictions provided by the different models and optimization techniques, respectively. Illustrative results from one testing procedure are shown in Table 2 (it is important to note that the final results presented in Table 1 are the mean and standard deviation of ten procedures like the one in Table 2).

We also performed a predictive comparison with the standard RBF network as well as with the statistical ARIMA and GARCH models in order to show the prediction power of our suggested model. Table 4 states the final results of the numerical comparison of the tested models. The standard RBF provided the best outputs when combined with the K-means and backpropagation algorithms. The prediction error of this network was slightly lower than that of the statistical model; however, these two models provided almost the same results. Nonetheless, the suggested hybrid neural network model provided much better forecasts than these two models. Judging by the numerical (see Table 3) as well as the graphical results (see Figures 3, 4, and 5), the hybrid improved the prediction power of the standard RBF considerably. We can state that the application of our suggested new hybrid neural network model in the domain of exchange rates provides significantly better results than the standard RBF neural network as well as the statistical models.

 
MSE: mean squared error; stdev: standard deviation.
6. Conclusion
Quantitative methods are an excellent tool in the decision-making process, as they rely on facts, numbers, and accurate mathematical methods and models. The most widely used approach is the statistical approach. Statistical methods are verified methods which have been used in the forecasting process for many years. Intelligent computing technologies, in turn, are becoming more and more popular. Their main representatives, neural networks, are based on a mathematical model of the human neuron and therefore do not have to fulfil any initial assumptions, unlike statistical models. In this paper we tested the predictive power of neural networks in the domain of exchange rates. We suggested a new hybrid network model combined with moving averages. We used USD/CAD data, which were divided into a training set and a validation set for model checking. We also performed tests with a statistical model. In addition, we used other algorithms in the neural network training process; we combined the RBF with an unsupervised learning method called K-means and incorporated a GA into the RBF. The reason for incorporating other algorithms into the network was that BP is considered a weakness of the RBF. Both of these upgrades proved helpful in creating better forecasts and should definitely be used instead of the standard BP.
From our experiments we can deduce that ANN models are relatively fast, that they are able to generalize, and that it is not necessary to know all the relationships of the system. Thanks to that, ANN modeling is accessible to people who are not able to identify the relations between the variables of the model by using Box-Jenkins, GARCH, or any other methodology. Moreover, in this work we also suggested a new hybrid neural network model. The reason for this was to improve the prediction accuracy of our customized standard neural network. As for the prediction results of our hybrid, our experiments showed that the suggested RBF-MA hybrid neural network has a significant predictive superiority over the statistical model as well as over the standard neural network models. On the validation set the tested hybrid model produced excellent results and, according to the MSE on the validation set, it was by far the best of all tested models. In our experiments its numerical characteristics always outperformed those of the individual models (ANN, statistical model); the improvements ranged from about 18 per cent to more than 89 per cent. Our hybrid neural network model proved to be a great improvement on the standard RBF neural network: we showed experimentally that, for USD/CAD, this hybrid model provided significantly better forecasts than both the standard RBF neural network and the statistical model, and hence there was a clear benefit in the form of better one-day-ahead forecasts.
Although neural networks and soft computing techniques are a minor approach in the decision process of business forecasting, they are definitely an attractive alternative to traditional statistical models. Moreover, based on our empirical findings for out-of-sample one-step-ahead forecasts, we believe that our suggested hybrid model also has great potential in the whole domain of financial forecasting as well as in other areas of continuous forecasting.
Appendix
Box-Jenkins Statistical Modeling
The empirical Box-Jenkins analysis [37] focuses on the original and differenced series of daily observations of the USD/CAD currency pair covering a historical period from October 31, 2008, to October 31, 2012. The reason for this particular study was to perform a comparison between our tested model and the standard statistical model. The data, as stated above, were downloaded from the following website: http://www.globalview.com/forextradingtools/forexhistory/, and their statistical characteristics are shown in Figure 6.
As stated in the previous part of this paper, the data were divided into two parts: the training part and the validation part. As the validation part was used for model checking, we used only observations from the training set for statistical modeling. For statistical modeling, which included model identification, model quantification, and model validation, the EViews software was used. Some of the advantages of this software are its simplicity, user friendliness, and detailed outputs. In addition, it has various versions of the GARCH model implemented.
The results of the unit root tests [39–42] presented in Table 5 show that this series is not stationary, as it is characterized by a unit root. In order to make the series stationary, it was differenced. As seen from Table 5, the unit root tests confirmed that the differenced series was stationary, which is a necessary condition in Box-Jenkins modeling.
(a)  
 
(b)  
 
(I): model without constant and deterministic trend (5%). (II): model with constant and without deterministic trend (5%). (III): model with constant and deterministic trend (5%). 
Analysis of the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series of USD/CAD (see Figure 8) revealed no significant correlation coefficients, so one could deduce that the first differences of the original series formed a white noise process. In that case, the original series would have formed a random walk process (RWP), as a RWP is an I(1) process. Assuming that the original series formed a random walk process, we selected AR(0) as the basic level model for the returns. The Ljung-Box statistics confirmed this assumption and the applicability of the AR(0) process, as the correlations were not statistically significant.
However, the assumption of normality of the residuals of the AR(0) returns was rejected (see Table 6). Moreover, the observed asymmetry might have indicated the presence of nonlinearities in the evolution process of the returns. This nonlinearity was confirmed by a graphical comparison of quantiles (versus the normal distribution) and by a scatter plot of the series, which did not appear to be in the form of a regular ellipsoid (see Figures 9 and 10). In addition, the BDS test rejected the random walk hypothesis (see Table 7), as the BDS statistic is greater than the critical value at 5%. Therefore, other tests had to be performed in order to model this series correctly.
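The white noise check on the differenced series can be illustrated with a small sketch of the sample autocorrelation and the Ljung-Box Q statistic, $Q = n(n+2)\sum_{k=1}^{m} \hat{\rho}_k^2 / (n-k)$. The function names are hypothetical and the code is not the EViews procedure used in the analysis. To decide significance, the resulting value would be compared with a chi-square critical value with $m$ degrees of freedom, which is not computed here.

```python
def first_differences(series):
    """Return the differenced series x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (lag >= 1)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
```

A typical use would be `ljung_box_q(first_differences(prices), m)` for a chosen number of lags `m`; a small Q relative to the critical value is consistent with the absence of autocorrelation.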
 
J.B.: Jarque-Bera statistic and A.D.: Anderson-Darling statistic.
 
The BDS statistic was computed by two methods, with . 
We noted that the residuals (Figure 11) were not characterized by a Gaussian distribution (see Table 6). The asymmetry might have indicated nonlinearities in the residuals. When looking at the graph of residuals (Figure 11) one could observe that the variability of these residuals could be caused by a nonconstant variance. Residual with small value follows another residual with a small value. On the other hand, residual with a large value usually follows a residual with another large value. However, this was not typical for a white noise process. This assumption leads us to think about stochastic model for volatility. The suitability for using stochastic volatility model was also accepted by performing heteroskedasticity test. ARCH test (see Table 6) confirmed that the series was heteroskedastic since the null hypothesis of homoscedasticity was rejected at 5% and so the residuals were characterized by the presence of ARCH effect which is quite a frequent phenomenon at financial time series. Therefore, we applied a stochastic volatility model into the basic model.
We estimated several ARCH [43] and GARCH [1] models; in addition to the basic GARCH model, we estimated GARCH extensions too. To specify the ARCH model we used the information from the correlogram of squared residuals (Figure 12). For each model we calculated the Akaike [44] and Schwarz [45] information criteria and the log-likelihood function. It is important to remember that the models were estimated only on the 912 in-sample observations, so that ex-ante predictions could be made for the remaining 132 observations. We used the Marquardt optimization procedure to find the optimal values of the GARCH parameters; initial parameter values were computed using the Ordinary Least Squares (OLS) method, and these values were then refined by an iterative process consisting of 500 iterations. The convergence criterion was set to 0.0001.
In view of Table 8, the information criteria were minimal for the GARCH model with GED error distribution. This showed that GARCH is well suited to this type of time series, as it is to other financial time series. Given that we applied the models to exchange rate data, it was no surprise that the asymmetric effect was not present. The residuals of the models were characterized by the absence of conditional heteroskedasticity: the ARCH-LM statistics were strictly less than the critical value at the 5% level. In addition, the Ljung-Box test on the standardized residuals confirmed that there were no significant autocorrelation coefficients in the residuals of these models. Figure 13 shows these results for the GARCH GED model.
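To make the model-selection step concrete, the sketch below evaluates a Gaussian GARCH(1,1) log-likelihood for given parameters and derives the Akaike and Schwarz criteria from it. It is an illustration under several assumptions: it does not perform the Marquardt optimization, the residuals are simulated, the recursion is initialised with the sample variance, and the criteria are written in their unnormalised form (some econometric packages divide them by the number of observations). All function names are hypothetical.

```python
import math
import random


def garch11_loglik(resid, omega, alpha, beta):
    """Gaussian log-likelihood of zero-mean residuals under GARCH(1,1)."""
    n = len(resid)
    var = sum(e * e for e in resid) / n  # start from the sample variance
    loglik = 0.0
    for e in resid:
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(var) + e * e / var)
        var = omega + alpha * e * e + beta * var  # next conditional variance
    return loglik


def aic(loglik, k):
    """Akaike criterion for a model with k estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def sic(loglik, k, n):
    """Schwarz criterion for k parameters and n observations."""
    return -2.0 * loglik + k * math.log(n)


# Simulate 912 residuals (the in-sample size) from a GARCH(1,1) process
# with illustrative parameters, not estimates from the paper.
random.seed(2)
omega, alpha, beta = 0.05, 0.10, 0.85
var = omega / (1.0 - alpha - beta)
resid = []
for _ in range(912):
    e = math.sqrt(var) * random.gauss(0.0, 1.0)
    resid.append(e)
    var = omega + alpha * e * e + beta * var

sample_var = sum(e * e for e in resid) / len(resid)
ll_const = garch11_loglik(resid, sample_var, 0.0, 0.0)  # constant variance
ll_garch = garch11_loglik(resid, omega, alpha, beta)    # GARCH(1,1)

print("constant variance: AIC", round(aic(ll_const, 1), 2),
      "SIC", round(sic(ll_const, 1, len(resid)), 2))
print("GARCH(1,1):        AIC", round(aic(ll_garch, 3), 2),
      "SIC", round(sic(ll_garch, 3, len(resid)), 2))
```

The model with the smaller criterion value is preferred; the Schwarz criterion penalises additional parameters more heavily than the Akaike criterion once the sample exceeds about eight observations.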
 
GED: generalized error distribution. 
To compare the forecasting performance of the tested models, two criteria were used: the mean squared error (MSE) and the mean absolute percentage error (MAPE). We primarily tested the forecasting accuracy on the validation set, that is, on out-of-sample predictions, and we used one-step-ahead predictions.
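Both criteria are simple averages over the forecast errors, as the sketch below shows. The exchange rate values and forecasts in the example are made up for illustration, and the function names are hypothetical.

```python
def mse(actual, forecast):
    """Mean squared error of the forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


# Hypothetical out-of-sample values and one-step-ahead forecasts.
actual = [1.00, 1.25, 1.60]
forecast = [1.10, 1.20, 1.60]

print("MSE  =", mse(actual, forecast))
print("MAPE =", mape(actual, forecast), "%")
```

MSE penalises large errors more heavily because of the squaring, whereas MAPE is scale-free and therefore easier to compare across series with different price levels.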
The best results for out-of-sample prediction were achieved with the EGARCH model with Gaussian error distribution (Table 9). Therefore, this model is later compared with the neural network models and our suggested model. This volatility is defined as follows:

Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper has been written with the support of European Social Fund, project Quality Education with support of Innovative Forms of High Quality Research and International Cooperation, a successful graduate for needs of practice, ITMS Code 26110230090, and project Innovation and Internationalization of Education, instruments to increase the quality of the University of Žilina in the European educational area, ITMS Code 26110230079. Modern Education for the Knowledge Society/Project is funded by EU. This paper has been supported by VEGA Grant Project 1/0942/14: Dynamic Modeling and Soft Techniques in Prediction of Economic Variables. This work has been partially supported by the European Regional Development Fund in the IT4 Innovations Centre of Excellence Project (CZ.1.05/1.1.00/02.0070).
References
1. T. Bollerslev, “Generalized autoregressive conditional heteroskedasticity,” Journal of Econometrics, vol. 31, no. 3, pp. 307–327, 1986.
2. H. White, “Economic prediction using neural networks: the case of IBM daily stock returns,” in Proceedings of the 2nd IEEE Annual International Conference on Neural Networks, vol. 2, pp. 451–458, IEEE, San Diego, Calif, USA, July 1988.
3. K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
4. K. Hornik, “Some new results on neural network approximation,” Neural Networks, vol. 6, no. 8, pp. 1069–1072, 1993.
5. L. S. Maciel and R. Ballini, “Design a Neural Network for Time Series Financial Forecasting: Accuracy and Robustness Analysis,” 2008, http://www.cse.unr.edu/~harryt/CS773C/Project/89516971PB.pdf.
6. G. A. Darbellay and M. Slama, “Forecasting the short-term demand for electricity: do neural networks stand a better chance?” International Journal of Forecasting, vol. 16, no. 1, pp. 71–83, 2000.
7. G. Zhang, B. E. Patuwo, and M. Y. Hu, “Forecasting with artificial neural networks: the state of the art,” International Journal of Forecasting, vol. 14, no. 1, pp. 35–62, 1998.
8. J. A. Anderson and E. Rosenfeld, Neurocomputing: Foundations of Research, MIT Press, 1988, A collection of articles summarizing the state-of-the-art as of 1988.
9. R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, Reading, Mass, USA, 1990.
10. J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Westview Press, 1991.
11. C. Hiemstra and J. D. Jones, “Testing for linear and nonlinear Granger causality in the stock price-volume relation,” The Journal of Finance, vol. 49, no. 5, pp. 1639–1664, 1994.
12. W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115–133, 1943.
13. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, 1969.
14. M. J. L. Orr, “Introduction to radial basis function networks,” Tech. Rep., Center for Cognitive Science, University of Edinburgh, Edinburgh, Scotland, 1996.
15. M. D. Buhmann, Radial Basis Functions: Theory and Implementations, Cambridge University Press, 2003.
16. N. K. Ahmed, A. F. Atiya, N. El Gayar, and H. El-Shishiny, “An empirical comparison of machine learning models for time series forecasting,” Econometric Reviews, vol. 29, no. 5–6, pp. 594–621, 2010.
17. D. Marcek and L. Falat, “Volatility forecasting in financial risk management with statistical models and ARCH-RBF neural networks,” Journal of Risk Analysis and Crisis Response, vol. 4, no. 2, pp. 77–95, 2014.
18. D. Marcek and M. Marcek, Neural Networks and Their Applications, EDIS-ZU, Žilina, Slovakia, 2006.
19. J. G. de Gooijer and R. J. Hyndman, “25 years of time series forecasting,” International Journal of Forecasting, vol. 22, no. 3, pp. 443–473, 2006.
20. T. Hill, L. Marquez, M. O'Connor, and W. Remus, “Artificial neural network models for forecasting and decision making,” International Journal of Forecasting, vol. 10, no. 1, pp. 5–15, 1994.
21. C. H. O. Park and S. H. Irwin, “The profitability of technical analysis: a review,” AgMAS Project Research Report 2004-04, University of Illinois at Urbana-Champaign, Champaign, Ill, USA, 2004.
22. Y. Chou, Statistical Analysis: With Business and Economic Applications, Quantitative Methods Series, Holt, Rinehart and Winston, 1975.
23. R. Dacco and S. Satchell, “Why do regime-switching models forecast so badly?” Journal of Forecasting, vol. 18, no. 1, pp. 1–16, 1999.
24. Y. Yang, “Combining forecasting procedures: some theoretical results,” Econometric Theory, vol. 20, no. 1, pp. 176–222, 2004.
25. R. T. Clemen, “Combining forecasts: a review and annotated bibliography,” International Journal of Forecasting, vol. 5, no. 4, pp. 559–583, 1989.
26. K. Hornik, M. Stinchcombe, and H. White, “Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks,” Neural Networks, vol. 3, no. 5, pp. 551–560, 1990.
27. M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, “Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,” Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.
28. A. E. Bryson and Y. C. Ho, Applied Optimal Control: Optimization, Estimation, and Control, Blaisdell Publishing Company, 1969.
29. D. J. Montana and L. Davis, “Training feedforward neural networks using genetic algorithms,” in Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI '89), vol. 1, pp. 762–767, Morgan Kaufmann Publishers, San Francisco, Calif, USA, 1989.
30. M. Dharmistha, “Genetic algorithm based weights optimization of artificial neural network,” International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 1, no. 3, pp. 206–211, 2013.
31. J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
32. L. Davis, Genetic Algorithms and Simulated Annealing, Pitman, London, UK, 1987.
33. D. Whitley, Applying Genetic Algorithms to Neural Network Problems, International Neural Network Society, 1988.
34. T. Kohonen, Self-Organizing Maps, Springer, Berlin, Germany, 1995.
35. J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.
36. G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303–314, 1989.
37. G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, Calif, USA, 1976.
38. D. Heider, J. Verheyen, and D. Hoffmann, “Predicting Bevirimat resistance of HIV-1 from genotype,” BMC Bioinformatics, vol. 11, article 37, 2010.
39. D. A. Dickey and W. A. Fuller, “Distribution of the estimators for autoregressive time series with a unit root,” Journal of the American Statistical Association, vol. 74, no. 366, pp. 427–431, 1979.
40. P. C. B. Phillips and P. Perron, “Testing for a unit root in time series regression,” Biometrika, vol. 75, no. 2, pp. 335–346, 1988.
41. D. Kwiatkowski, P. C. B. Phillips, P. Schmidt, and Y. Shin, “Testing the null hypothesis of stationarity against the alternative of a unit root; How sure are we that economic time series have a unit root?” Journal of Econometrics, vol. 54, no. 1–3, pp. 159–178, 1992.
42. G. Elliott, T. J. Rothenberg, and J. H. Stock, “Efficient tests for an autoregressive unit root,” Econometrica, vol. 64, no. 4, pp. 813–836, 1996.
43. R. F. Engle, “Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation,” Econometrica, vol. 50, no. 4, pp. 987–1008, 1982.
44. H. Akaike, “Statistical predictor identification,” Annals of the Institute of Statistical Mathematics, vol. 22, pp. 203–217, 1970.
45. G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
Copyright
Copyright © 2016 Lukas Falat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.