Special Issue: Mathematical Tools of Soft Computing 2014
Comparing the Selected Transfer Functions and Local Optimization Methods for Neural Network Flood Runoff Forecast
This paper analyzes the influence of the transfer function and the training algorithm on neural network flood runoff forecasts. Nine of the most significant flood events, caused by extreme rainfall, were selected from 10 years of measurements on a small headwater catchment in the Czech Republic, and the flood runoff forecast was investigated using an extensive set of multilayer perceptrons with one hidden layer of neurons. The analyzed artificial neural network models, with 11 different activation functions in the hidden layer, were trained using 7 local optimization algorithms. The results show that the Levenberg-Marquardt algorithm was superior to the remaining tested local optimization methods. Among the 11 nonlinear transfer functions used in the hidden layer neurons, the RootSig function was superior to the other analyzed activation functions.
1. Introduction
Over the past three decades, implementations of various models based on artificial neural networks (ANN) have been intensively explored in hydrological engineering. General reviews of ANN modeling strategies and applications, with emphasis on the modeling of hydrological processes, are presented in [1–3]. They confirm that the multilayer perceptron (MLP) [4, 5] is among the most frequently studied ANN models in hydrological modeling [6–9].
The MLP is a nonlinear data-driven model. Architecturally, it is a fully connected feed-forward network that organizes the processing units (neurons) into layers and allows interconnections only between neurons in two consecutive layers. As proved by Hornik et al., the MLP is a universal function approximator. This important property has been widely confirmed by many hydrological studies [11–14].
Despite the positive results of a large number of studies on MLP runoff forecasting, there is still a need for clear methodological recommendations on the selection of MLP transfer functions [15, 22–24], combined with an assessment of training methods and the implementation of new training methods [8, 18, 19, 25].
The main aims of this paper are to analyze the hourly flood runoff forecast on a small headwater catchment with MLP-ANN models based on 12 different MLP transfer functions following [15, 24], to compare 7 local optimization algorithms [5, 17, 19], and finally to evaluate the MLP performance with 4 selected model evaluation measures [26, 27].
2. Material and Methods
The tested runoff prediction with MLP-ANN models uses a set of rainfall-runoff data. An MLP-ANN implementation for runoff forecasting generally consists of data preprocessing, model architecture selection, MLP training, and model validation. In this section, we briefly describe the MLP-ANN model architecture, the tested optimization schemes, and the datasets.
2.1. MLP-ANN Model
We analyzed the MLP model with one hidden layer. A similar ANN architecture was used in a large number of hydrologically oriented studies [18, 28–31]. The studied MLP models had three layers of neurons in total: the input layer, the hidden layer, and the output layer. As proved by Hornik et al., this type of artificial neural network, with a sufficiently large number of neurons in the second (hidden) layer, can approximate any measurable functional relationship with the desired precision.
The implemented MLP-ANN models had the general form
\[ \hat{Q} = v_0 + \sum_{j=1}^{N_h} v_j \, f\!\left( w_{j0} + \sum_{i=1}^{N_{in}} w_{ji} x_i \right), \]
where $\hat{Q}$ is the network output, that is, the flood runoff forecast for a given time interval, $x_i$ is the network input for input layer neuron $i$, $N_{in}$ is the number of MLP inputs, $w_{ji}$ is the weight of input $i$ to hidden layer neuron $j$, $f$ is the activation function, which is the same for all hidden layer neurons, $N_h$ is the number of hidden neurons, $v_j$ is the weight for the output from hidden neuron $j$, and $w_{j0}$, $v_0$ are neuron biases [2–4, 18, 25, 31].
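The forward pass of such a one-hidden-layer network can be sketched in a few lines of NumPy. This is only an illustration: the function and argument names are hypothetical, the linear output neuron is an assumption, and the default `np.tanh` stands in for whichever hidden layer activation function is being tested.

```python
import numpy as np

def mlp_forward(x, hidden_weights, hidden_bias, output_weights, output_bias,
                activation=np.tanh):
    """Forward pass of an MLP with one hidden layer and a linear output.

    x               : inputs, shape (n_inputs,)
    hidden_weights  : input-to-hidden weights, shape (n_hidden, n_inputs)
    hidden_bias     : hidden neuron biases, shape (n_hidden,)
    output_weights  : hidden-to-output weights, shape (n_hidden,)
    output_bias     : output neuron bias (scalar)
    """
    x = np.asarray(x, dtype=float)
    # Weighted sum of inputs plus bias, passed through the shared activation.
    hidden_output = activation(hidden_weights @ x + hidden_bias)
    # Linear combination of hidden outputs (assumed linear output neuron).
    return float(output_weights @ hidden_output + output_bias)
```

For example, `mlp_forward([0.5, 9.0], np.array([[1.0, 0.0]]), np.zeros(1), np.array([2.0]), 0.0)` evaluates `2 * tanh(0.5)`.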
2.1.1. MLP-ANN Transfer Functions
The type of activation function, together with the network architecture, influences the generalization of a neural network. Imrie et al. empirically confirmed that the bounding of the transfer function influences ANN generalization and the simulation of hydrological extremes during runoff forecasting. Following [15, 24], we implemented 12 different types of transfer functions, 11 of which were tested in the hidden layer of the analyzed MLP-ANN models. Table 1 lists them.
The type of activation function, combined with the specific training method, influences the average performance of the learning algorithm and the computing time [15, 24]. For example, Bishop pointed out that using the hyperbolic tangent function speeds up the training process compared to the logistic sigmoid.
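To make the comparison concrete, three of the tested function types can be sketched as follows. The logistic sigmoid and hyperbolic tangent are standard; the RootSig form `a / (1 + sqrt(1 + a^2))` is an assumption based on its commonly cited definition, since Table 1 is not reproduced here, and the function names are illustrative.

```python
import numpy as np

def logistic_sigmoid(a):
    """Logistic sigmoid, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def hyperbolic_tangent(a):
    """Hyperbolic tangent, bounded in (-1, 1) and symmetric about zero."""
    return np.tanh(a)

def root_sig(a):
    """RootSig (assumed form), bounded in (-1, 1) with algebraic tails."""
    return a / (1.0 + np.sqrt(1.0 + a * a))
```

The hyperbolic tangent is a rescaled and shifted logistic sigmoid, `tanh(a) = 2 * logistic_sigmoid(2a) - 1`, so the two differ mainly in the symmetry of their output range.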
2.1.2. MLP-ANN Local Optimization Methods
All tested gradient-based local search methods (except BP_regul) minimized an error function defined as the sum of squared residuals, where the residuals are the differences between the observed and computed flood runoff.
The two first-order local training methods are standard backpropagation and backpropagation with a regularization term (BP_regul). Both backpropagation methods use a constant learning rate and a momentum parameter. BP_regul adds a regularization term that penalizes the size of the estimated weights, and its error function is defined as
\[ E_{reg} = \beta \sum_{t=1}^{n} \left( Q_t - \hat{Q}_t \right)^2 + \alpha \sum_{k=1}^{N_w} w_k^2, \]
where $Q_t$ and $\hat{Q}_t$ are the observed and computed flood runoff in time interval $t$, $n$ is the number of time intervals, and $N_w$ is the total number of MLP-ANN weights $w_k$. The hyperparameters $\alpha$ and $\beta$ were kept constant within the standard backpropagation with the regularization term [4, 16].
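A minimal sketch of the regularized error and of a gradient step with constant learning rate and momentum is given below. It assumes the common weight-decay form, a weighted sum of squared residuals plus a weighted sum of squared weights; the exact scaling used in PONS2train is not recoverable from the text, and all names are illustrative.

```python
import numpy as np

def regularized_error(residuals, weights, alpha, beta):
    """Weighted sum of squared residuals plus a penalty on weight size."""
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return beta * np.sum(residuals ** 2) + alpha * np.sum(weights ** 2)

def momentum_step(weights, gradient, velocity, learning_rate, momentum):
    """One backpropagation update with constant learning rate and momentum.

    Returns the updated weights and the updated velocity (previous step).
    """
    velocity = momentum * velocity - learning_rate * gradient
    return weights + velocity, velocity
```

With `momentum = 0` the update reduces to plain gradient descent; larger values reuse more of the previous step.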
The scaled conjugate gradient methods are combined with a safeguarded line search based on golden section search with bracketing of the minimum [33, 34]. The implementation allows restarts during the iterative search, following the recommendations of [21, 35]. A restart is controlled by a prescribed number of iterations or by the gradient norm. The implementation of the scaled conjugate gradient uses four different updating schemes, described in detail in [19, 36].
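The golden section part of such a line search can be sketched as below. This is a generic textbook version, not the PONS2train implementation: it assumes the minimum has already been bracketed in `[lower, upper]` and that the function is unimodal there, and the function name and tolerance are illustrative.

```python
import math

def golden_section_minimize(f, lower, upper, tol=1e-8):
    """Locate the minimizer of a unimodal function f on [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lower, upper
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)
```

In a conjugate gradient iteration, `f` would be the error function restricted to the current search direction, so that the returned value is the step length; each loop pass costs only one new function evaluation.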
All gradient-based methods apply the standard backpropagation algorithm to estimate the derivatives of the objective function with respect to the weights. The Levenberg-Marquardt method approximates the Hessian matrix using first-order derivatives, neglecting the terms with second-order derivatives [4, 17].
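The Hessian approximation used by the Levenberg-Marquardt method can be illustrated with a single update step. The sketch assumes the standard damped Gauss-Newton form, in which the Hessian is replaced by the product of the transposed Jacobian of the residuals with itself plus a damping term; the function name and the fixed damping argument are illustrative, and a full implementation would also adapt the damping between iterations.

```python
import numpy as np

def levenberg_marquardt_step(jacobian, residuals, damping):
    """Return the weight update for one Levenberg-Marquardt iteration.

    jacobian  : derivatives of the residuals with respect to the weights,
                shape (n_patterns, n_weights)
    residuals : current residual vector, shape (n_patterns,)
    damping   : non-negative damping parameter
    """
    # Gauss-Newton approximation of the Hessian (second-order terms dropped).
    approx_hessian = jacobian.T @ jacobian
    gradient = jacobian.T @ residuals
    damped = approx_hessian + damping * np.eye(approx_hessian.shape[0])
    return -np.linalg.solve(damped, gradient)
```

A small damping value gives a Gauss-Newton step, while a large value shrinks the update toward a short gradient descent step.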
2.1.3. The MLP-ANN Performance
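We based the evaluation of the MLP-ANN model simulations on the training, testing, and validation datasets on the following statistics [26, 27, 38]:
mean absolute error (MAE)
\[ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| Q_t - \hat{Q}_t \right|, \]
Nash-Sutcliffe efficiency (NS)
\[ \mathrm{NS} = 1 - \frac{\sum_{t=1}^{n} \left( Q_t - \hat{Q}_t \right)^2}{\sum_{t=1}^{n} \left( Q_t - \bar{Q} \right)^2}, \]
fourth root mean quadrupled error (R4MS4E)
\[ \mathrm{R4MS4E} = \left( \frac{1}{n} \sum_{t=1}^{n} \left( Q_t - \hat{Q}_t \right)^4 \right)^{1/4}, \]
persistency index (PI)
\[ \mathrm{PI} = 1 - \frac{\sum_{t=1}^{n} \left( Q_t - \hat{Q}_t \right)^2}{\sum_{t=1}^{n} \left( Q_t - Q_{t-k} \right)^2}, \]
where $n$ represents the total number of time intervals to be predicted, $\hat{Q}_t$ is the computed flood runoff, $\bar{Q}$ is the average of the observed flood runoff $Q_t$, and $k$ is the time shift describing the last observed flood runoff $Q_{t-k}$.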
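The four statistics can be sketched in NumPy as follows, assuming their commonly used definitions. The function names and the default time shift of one interval are illustrative choices, not taken from PONS2train.

```python
import numpy as np

def mean_absolute_error(observed, simulated):
    return np.mean(np.abs(observed - simulated))

def nash_sutcliffe(observed, simulated):
    """1 is a perfect fit; 0 means no better than the observed mean."""
    return 1.0 - (np.sum((observed - simulated) ** 2)
                  / np.sum((observed - observed.mean()) ** 2))

def r4ms4e(observed, simulated):
    """Fourth root of the mean fourth-power error; emphasizes peak errors."""
    return np.mean((observed - simulated) ** 4) ** 0.25

def persistency_index(observed, simulated, lag=1):
    """Compares the model with a forecast equal to the value `lag` steps back."""
    model_error = np.sum((observed[lag:] - simulated[lag:]) ** 2)
    naive_error = np.sum((observed[lag:] - observed[:-lag]) ** 2)
    return 1.0 - model_error / naive_error
```

A positive PI means the model beats the naive forecast that simply repeats the last observed runoff, which is a demanding reference for slowly varying hourly series.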
2.1.4. The PONS2train
The tested MLP-ANN models were implemented using the PONS2train software application. PONS2train is written in the C++ programming language, and its main goal is to test MLP models with different architectures. The application uses the LAPACK, BLAS, and Armadillo linear algebra libraries [39–41]. It is freely distributed upon request to the authors.
PONS2train has additional features. The weight initialization can be performed using two methods: the first follows the work of Nguyen and Widrow, while the second uses random initialization drawn from the uniform distribution.
Giustolisi and Laucelli extensively studied eight methods for improving MLP performance and generalization. One of them, early stopping, is incorporated in the designed application. Following the recommendations of Stäger and Agarwal, PONS2train also guards against neuron saturation.
The software design also allows a comparative analysis of MLP architectures with or without bias neurons in the layers. PONS2train also enables the comparison of MLPs trained on shuffled and unshuffled datasets. The shuffling of data patterns follows the random permutation algorithm of Durstenfeld.
The MLP datasets are scaled using two methods. Both methods scale the analyzed datasets into an interval with an arbitrarily chosen upper bound. The nonlinear scaling obtains the transformed data from the original data using an exponential transformation with a control parameter. The second scaling method is linear.
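Durstenfeld's random permutation algorithm (the in-place variant of the Fisher-Yates shuffle) can be sketched as follows. The function name and the optional `rng` argument are illustrative, and the sketch returns a shuffled copy rather than modifying the input.

```python
import random

def durstenfeld_shuffle(patterns, rng=None):
    """Return a uniformly random permutation of `patterns`."""
    if rng is None:
        rng = random.Random()
    shuffled = list(patterns)
    # Walk from the last position down, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled
```

Passing a seeded generator, for example `durstenfeld_shuffle(range(10), random.Random(42))`, makes the pattern order reproducible across training runs.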
2.2. The Dataset Description
We explored the MLP-ANN models using rainfall and runoff time series obtained from 10 years of monitoring in the Modrava catchment (0.17 km²). The experimental watershed was established in 1998 in the upper parts of the Bohemian Forest National Park. The basin belongs to a set of testbeds designed to monitor the hydrological behavior of headwater forested catchments. A description of the watershed is given by Pavlasek et al.
The catchment is a clearing with young, artificially planted forest combined with an undergrowth of herbs (mainly Calamagrostis villosa, Avenella flexuosa, Scirpus sylvaticus, and Vaccinium myrtillus) and bryophytes (Polytrichastrum formosum, Dicranum scoparium, and Sphagnum girgensohnii). A small part of the catchment (less than 10%) is covered by 40-year-old forest. A bark beetle calamity removed the original forest cover. The catchment bedrock is formed by granite, migmatite, and paragneiss, covered by Haplic Podzols with depths of up to 0.9 m. The mean runoff coefficient is 0.2, and the mean daily runoff is 1.2 mm.
The nine most significant rainfall-runoff events, observed at an hourly time step, were selected from 10 years of measurements. The flood runoff prediction was analyzed with the proposed MLP-ANN models. The characteristics of the flood events are described in Table 3. All flood events were complemented with the 5 preceding days. The rainfall-runoff events were divided into nonoverlapping training, testing, and validation datasets.
The division of flood events into the datasets was made with respect to the similarity of empirical distribution functions of training, testing, and validation datasets and to their independence. The empirical distribution functions were estimated using the quantile estimation method, which was specifically developed for the description of hydrological time series (for detailed information see ). The selected quantiles of all datasets are shown in Table 4. The quantiles show that the distinctions of the information in training, testing, and validation datasets are not significant.
3. Results and Discussion
We tested MLP-ANN models with 4 MLP architectures, which differ in the number of hidden layer neurons. For each MLP architecture, we prepared 11 types of MLP-ANN models according to the type of hidden layer activation function (AF) (see Table 1). Each of them was trained with 7 training algorithms (TA) (see Table 2).
All MLP-ANN datasets consisted of all available pairs of four inputs and one output. The inputs were one runoff interval and three rainfall intervals, and the output was one runoff value, for all available time intervals. The training dataset contained 1270 input-output pairs, the testing dataset 1221, and the validation dataset 1423.
Although suitable methodologies exist for selecting the proper input vector for an MLP model, for example, [48–50], we based our flood forecast on a small number of previous rainfall intervals and one previous runoff value, mainly because of the fast hydrological response of the analyzed watershed. The datasets were transformed using the nonlinear exponential transformation.
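Building such input-output pairs from hourly series can be sketched as follows. The exact lags are an assumption: the sketch uses the runoff one interval back and the rainfall one, two, and three intervals back to predict the current runoff, and the function name is hypothetical.

```python
import numpy as np

def build_patterns(runoff, rainfall, n_rain_lags=3):
    """Return (inputs, targets) for a one-step-ahead runoff forecast.

    Each input row is [runoff[t-1], rainfall[t-1], ..., rainfall[t-n_rain_lags]]
    and the matching target is runoff[t].
    """
    runoff = np.asarray(runoff, dtype=float)
    rainfall = np.asarray(rainfall, dtype=float)
    inputs, targets = [], []
    # Start at the first interval for which all lagged values exist.
    for t in range(n_rain_lags, len(runoff)):
        rain_lags = [rainfall[t - k] for k in range(1, n_rain_lags + 1)]
        inputs.append([runoff[t - 1], *rain_lags])
        targets.append(runoff[t])
    return np.array(inputs), np.array(targets)
```

Applied separately to each flood event, this keeps the training, testing, and validation patterns from overlapping across event boundaries.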
Each training algorithm was repeated 150 times. The random initialization of network weights was performed by the method of . Each optimization multirun used the same values of 150 mutually different initial random vectors of weights, in order to ensure that the comparison of performances of optimization algorithms was based on similar random weights initializations.
3.1. The Benchmark Model
The flood forecast was also simulated using a benchmark based on a simple linear model (SLMB). The SLMB parameters were estimated by ordinary least squares. Table 5 shows the results of the SLMB benchmark model.
Since the benchmark model provides a single simulation and one value for each model comparison measure, we compared the SLMB results with those of the best single MLP-ANN models. The model ensemble contained MLP-ANN models that were superior to the SLMB.
For example, the model performance based on the PI index shows that all MLP-ANN variants provided models that were superior to the SLMB (see Table 6). The largest differences between the best PI values of the ANN and the PI of the SLMB were obtained for the MLP-ANN trained with the LM algorithm on the training dataset. The LM and PER training algorithms provided the models with the highest PI values on the testing and validation datasets.
These conclusions agree with the values of the remaining model performance measures (MAE, NS, and R4MS4E; see Table 7). The LM and BP_regul were superior in terms of differences from the SLMB according to the MAE and R4MS4E. The LM and PER were superior to the SLMB in NS values on the training, testing, and validation datasets.
Similar results are found when comparing the SLMB with the best MLP-ANN models grouped by transfer function. The largest differences in PI values were obtained on the training dataset for the MLP-ANN with the LL transfer function, on the testing dataset for the RS transfer function, and on the validation dataset for the LL transfer function. These were calculated for MLP-ANN models with transfer functions that were successful in more than 10% of simulations on the validation dataset.
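A linear benchmark of this kind can be sketched with NumPy's least squares solver. The sketch assumes that the benchmark uses the same input patterns as the MLP together with an intercept term, which the text does not state explicitly; the function names are illustrative.

```python
import numpy as np

def fit_linear_benchmark(inputs, targets):
    """Estimate intercept and slopes by ordinary least squares."""
    inputs = np.asarray(inputs, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(inputs)), inputs])
    coefficients, *_ = np.linalg.lstsq(design, np.asarray(targets, dtype=float),
                                       rcond=None)
    return coefficients

def predict_linear_benchmark(coefficients, inputs):
    """Apply the fitted linear model to new input patterns."""
    return coefficients[0] + np.asarray(inputs, dtype=float) @ coefficients[1:]
```

Because the fit has a closed-form solution, it yields one simulation and one value per performance measure, unlike the ensemble of randomly initialized MLP runs.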
Those results were confirmed by the values of MAE, NS, and R4MS4E obtained for the best model of a simulation ensemble. The RS transfer function provided the best results in terms of differences between , , and on training, testing, and validation datasets.
3.2. The Optimization Algorithms
The results of the MLP-ANN models are assessed through the values of the model performance measures shown in Tables 6 and 7. All training computations controlled neuron saturation using the method of Stäger and Agarwal. The parameters of the TA (i.e., number of epochs, learning rate, etc.) were selected so that the number of MLP-ANN evaluations was similar for all tested TA.
Table 6 shows the results of persistency index, which was used as a main reference index, since the PI compares the model with last observed information . The best TA according to the number of successful models with was the PER (the scaled conjugate gradient method with Perry updating formula). The highest number of successfully trained models was found on MLP with (see the ntrained = 1181, ntest = 838, and nval = 468 in Table 6).
When comparing the performance of TA according to the best single value of PI (see columns PI_train, PI_test, and PI_val in Table 6) and the average performance of best MLP-ANN models on PI (see columns mPI_train, mPI_test, and mPI_val in Table 6), the Levenberg-Marquardt algorithm was mostly superior compared to all remaining TA, except for three cases, when the PER and BP_regul were better on validation datasets for MLP with on best single value of PI and for average of mPI_val for .
Table 7 displays the results of best models for remaining statistical measures of MLP-ANN models trained on tested TA. Only three algorithms were superior at least for one architecture of MLP and on one dataset. They are LM, PER, and BP_regul. Again, the LM was mostly superior compared to the other tested TA. The differences between results of LM and PER and BP_regul were very small.
The best values of NS were in agreement with values of PI (see, e.g., the PER on MLP with ). The BP_regul was better in terms of the length of residuals for MAE_test on MLP ANN models with . Also when comparing the simulation of peak flow in terms of R4MS4E, the BP_regul was better on MLP with for validation dataset.
Our findings agree with the runoff forecasting results of Piotrowski and Napiorkowski, who compared the Levenberg-Marquardt approach with more robust global optimization schemes and found that LM provides results comparable to those of MLPs trained using the selected evolutionary computation methods.
3.3. The Transfer Functions
The results of PI, MAE, NS, and R4MS4E are shown in Tables 8 and 9. The PI has again served as a reference. We trained the MLP with all AF listed in Table 1. Tables 8 and 9 show the results of AF for MLP-ANN models, which were successful in more than 10% of simulations on validation dataset.
When comparing the absolute values of number of MLP-ANN models with , the models with two AF (RS and CLm) were superior compared to MLP models with remaining 9 AFs. The MLP with RS provided the larger number of better models in terms of PI value on 8 datasets, while the MLP with CLm transfer function was successful on 4 datasets.
RS was also the most successful AF on the training dataset at MLPs with (note that for the differences in PI between RS and CLm are almost insignificant). The LL also provided good results on the training dataset (for all tested values of ) and on validation data for .
The mean performances based on arithmetical means of PI values of best models showed that three AFs were superior compared to remaining 8 AFs (see mPI_train, mPI_test, and mPI_val in Table 8). They were CL, HT, and RS MLP ANN models. Their differences of PI were again very small.
Table 9 shows the averages of MAE, NS, and R4MS4E on set of tested models. The results point out that the RS transfer function provided in summary superior values compared to rest of tested AF. The CLm, HT, and LS activation functions were on some datasets better in terms of mean values of tested statistical measures but the differences between the RS MLP ANN models were again negligible.
Considering the results of da S. Gomes et al., who recommended the CL, CLm, and LL functions for MLP-ANN models, we point out the ability of MLP models with RS to improve the flood runoff forecast.
Our findings on the selection of a suitable AF for MLP-ANN models suggest that different AFs should be tested when implementing MLP models for flood runoff forecasting.
4. Conclusions
During the extensive computational test, we trained a total of 46200 multilayer perceptron models with one hidden layer. The main aim of the computational exercise was to evaluate the impact of the transfer function selection and to test selected local optimization schemes on flood runoff forecasting.
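As a consistency check, this total follows from the experimental design described in Section 3, namely 4 MLP architectures, 11 activation functions, 7 training algorithms, and 150 repetitions of each training run:

\[ 4 \times 11 \times 7 \times 150 = 46200. \]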
Using the rainfall-runoff data of nine of the most significant flood events, we analyzed the short-term runoff forecast on a small watershed with a fast hydrological response. The developed MLP-ANN models were able to predict flood runoff using records of past rainfall and runoff from the basin.
When compared with the benchmark simple linear model, the developed MLP models were superior to the SLMB in terms of the values of the model performance measures.
The PONS2train software application was developed to evaluate MLP-ANN models with different architectures and to provide simulations of neural network flood forecasts.
When analyzing the 7 different gradient-oriented optimization schemes, we found that the Levenberg-Marquardt algorithm was superior to the tested set of scaled conjugate gradient methods and the two first-order local optimization schemes.
When analyzing the 11 different transfer functions used in the hidden neurons, we found that, according to the values of the four model performance measures, the RootSig function was the most promising activation function for flood runoff forecasting.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
H. R. Maier, A. Jain, G. C. Dandy, and K. P. Sudheer, “Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions,” Environmental Modelling and Software, vol. 25, no. 8, pp. 891–909, 2010.
C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, NY, USA, 1995.
R. D. Reed and R. J. Marks, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, MIT Press, Cambridge, Mass, USA, 1998.
D. J. C. MacKay, Information Theory, Inference and Learning Algorithms, Cambridge University Press, New York, NY, USA, 2003.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C++: The Art of Scientific Computing, Cambridge University Press, 2002.
T. Masters, Practical Neural Network Recipes in C++, Morgan Kaufmann, 1st edition, 1993.
E. Anderson, Z. Bai, J. Dongarra et al., “LAPACK: a portable linear algebra library for high-performance computers,” in Proceedings of the ACM/IEEE Conference on Supercomputing, pp. 2–11, IEEE Computer Society Press, Los Alamitos, Calif, USA, November 1990.
C. Sanderson, “Armadillo: an open source C++ linear algebra library for fast prototyping and computationally intensive experiments,” Tech. Rep., NICTA, Sydney, Australia, 2010.
D. Nguyen and B. Widrow, “Improving the learning speed of 2-layer neural networks by choosing initial values of adaptive weights,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '90), vol. 1–3, pp. C21–C26, International Neural Network Society, San Diego, Calif, USA, June 1990.
R. Durstenfeld, “Algorithm 235: random permutation,” Communications of the ACM, vol. 7, no. 7, p. 420, 1964.
J. Pavlasek, M. Tesar, P. Maca et al., “Ten years of hydrological monitoring in upland microcatchments in the Bohemian Forest, Czech Republic,” in Status and Perspectives of Hydrology in Small Basins, pp. 213–219, IAHS, 2010.
R. J. Hyndman and Y. Fan, “Sample quantiles in statistical packages,” American Statistician, vol. 50, no. 4, pp. 361–365, 1996.
R. J. May, G. C. Dandy, H. R. Maier, and J. B. Nixon, “Application of partial mutual information variable selection to ANN forecasting of water quality in water distribution systems,” Environmental Modelling & Software, vol. 23, no. 10-11, pp. 1289–1299, 2008.