#### Abstract

Nowadays, drought phenomena increasingly affect large areas of the globe; therefore, the need for a careful and rational management of water resources is becoming more pressing. Considering that most of the world’s unfrozen freshwater reserves are stored in aquifers, the capability of prediction of spring discharges is a crucial issue. An approach based on water balance is often extremely complicated or ineffective. A promising alternative is represented by data-driven approaches. Recently, many hydraulic engineering problems have been addressed by means of advanced models derived from artificial intelligence studies. Three different machine learning algorithms were used for spring discharge forecasting in this comparative study: M5P regression tree, random forest, and support vector regression. The spring of Rasiglia Alzabove, Umbria, Central Italy, was selected as a case study. The machine learning models have proven to be able to provide very encouraging results. M5P provides good short-term predictions of monthly average flow rates (e.g., in predicting average discharge of the spring after 1 month, , %, if a 4-month input is considered), while RF is able to provide accurate medium-term forecasts (e.g., in forecasting average discharge of the spring after 3 months, , %, if a 4-month input is considered). As the time of forecasting advances, the models generally provide less accurate predictions. Moreover, the effectiveness of the models significantly depends on the duration of the period considered for input data. This duration should be close to the aquifer response time, approximately estimated by cross-correlation analysis.

#### 1. Introduction

In recent years, long and frequent droughts have affected many countries in the world. These events require an ever more careful and rational management of water resources. Most of the globe’s unfrozen freshwater reserves are stored in aquifers. Groundwater is generally a renewable resource that shows good quality and resilience to fluctuations. Thus, if properly managed, groundwater could ensure long-term supply in order to meet increasing water demand.

For this purpose, it is of crucial importance to be able to predict the flow rates provided by springs. These represent the transitions from groundwater to surface water and reflect the dynamics of the aquifer, with the whole flow system behind. Moreover, spring influences water bodies into which they discharge. The importance of springs in groundwater research is highlighted in some significant contributions [1, 2]. In-depth studies on springs started only after the concept of sustainability was introduced in the management of water resources [3].

A spring hydrograph is the consequence of several processes governing the transformation of precipitation in the spring recharge area into the single output discharge at the spring. A water balance states that the change rate in water stored in the feeding aquifer is balanced by the rate at which water flows into and out of the aquifer. A quantitative water balance generally has to take the following terms into account: precipitation, infiltration, surface runoff, evapotranspiration, groundwater recharge, soil moisture deficit, spring discharge, lateral inflow to the aquifer, leakage between the aquifer and the underlying aquitard, well pumpage from the aquifer, and change of the storage in the aquifer.

In many cases, the evaluation of the terms of the water balance is very complicated. The complexity of the problem arises from many factors: hydrologic, hydrographic, and hydrogeological features, geologic and geomorphologic characteristics, land use, land cover, water withdrawals, and climatic conditions.

Even more complicated would be to estimate future spring discharges by using a model based on the balance equations. Therefore, simplified approaches are frequently pursued for practical purposes.

Many authors have addressed the problem of correlating the spring discharges to the rainfall through different approaches. Zhang et al. [4] used a lumped-parameter model and least-squares method to simulate temporal variations of discharge from a limestone aquifer in Iowa, USA. Lambrakis et al. [5] applied nonlinear time series analysis and artificial neural network to the study of the regime and the possibility of short-term forecasting of the discharges of the karstic spring of Almyros in Iraklion, Greece. Hu et al. [6] developed a model for simulating spring discharges using ANN and applied the model to Niangziguan Springs, China. Fiorillo and Doglioni [7] presented a study on the relation between rainfall and the discharge from two karst springs in Southern Italy based on cross-correlation analyses. Fan et al. [8] proposed an assembled extreme value statistical model to investigate spring discharge depletion processes under extreme climate change and intense groundwater development. Diodato et al. [9] proposed a lumped climatological model for spring discharge estimation in which groundwater is represented by a single reservoir and spring discharge is described by a single valued function of storage in the reservoir.

Recently, many researchers have investigated the feasibility of addressing hydraulic engineering issues by means of advanced models derived from artificial intelligence studies. Regression Tree models, Ensemble methods, and support vector machines have been increasingly used in solving water engineering problems.

Dibike et al. [10] compared support vector machines (SVMs) and artificial neural networks (ANNs) in rainfall-runoff modelling. Ahmad et al. [11] employed SVMs to estimate soil moisture on the base of remote sensing data. Raghavendra and Deka [12] provided an extensive review of the SVM applications to hydrology problems. Granata et al. [13] carried out a comparative study between support vector regression (SVR) and Stormwater Management Model in the rainfall-runoff modelling in urban drainage. Najafzadeh et al. [14] used SVM and ANFIS for scour prediction in long contractions in waterways. Granata et al. [15] applied SVR and regression trees in order to predict wastewater quality indicators.

Tree models or ensemble methods were implemented to forecast flood events [16], to foresee mean annual flood [17], to predict scour depth due to waves [18], to evaluate sediment yield in rivers [19], to predict maximum scour depth around piers [20], to address wastewater hydraulics issues [21], to forecast local scour depth downstream of sluice gates [22], to evaluate sediment transport [23], and to evaluate the flow discharge in main channels and floodplains [24].

The aim of this study is to assess the ability of a machine-learning algorithm-based approach in predicting average monthly discharge of a spring, when the forecast horizon does not exceed a few months, if few years of monthly flow rate measurements and rainfall data are available. Therefore, regression tree, random forest, and support vector regression were used to build forecasting models and to perform a comparative study. The proposed approach was tested by means of experimental data obtained from the spring of Rasiglia Alzabove, Umbria, Central Italy. Time series data are available from the Regional Agency for Environmental Protection (http://www.arpa.umbria.it). Future monthly average discharges were correlated to the average discharges of the past months and to the cumulative rainfall of the same months.

#### 2. Materials and Methods

##### 2.1. Machine Learning Models: Regression Trees

A regression tree (RT) model (Figure 1) develops a decision tree in order to make predictions [25]. The target variables take real values. In a regression tree, each internal node represents one of the input variables, whereas each leaf corresponds to an assigned value of the target variable and a root node contains all data.

During the growth of a regression tree model, the input data domain is recursively divided into subdomains. The predictions are made in each of them by means of multivariable linear regression models. At the first step of the iterative algorithm, all data are allocated into two branches, considering all the possible split on every field. Subsequently, the development process of the regression tree continues by splitting each branch into smaller partitions, as the system expands. At each stage, the procedure identifies the subdivision in two distinct partitions that minimize the sum of the squared deviations from the mean. This sum can be considered a measure of the “impurity” at a node that is a quantification of the predictive capability of the node. The algorithm continues until the lowest impurity level is obtained or until a stopping rule is met. Usually, a stopping rule is related to the threshold for the minimum impurity variation provided by new splits, the minimum number of units in each node, or the maximum tree depth.

The algorithm here used is commonly known as M5P and is based on Quinlan’s M5 algorithm [26]. The impurity of each node is estimated by the least-squared deviation (LSD), , defined as in which is the number of sample units in node , is the value of the target variable for the unit, and is the mean of the target variable in node .

The split process at each node is carried out based on the following function of the LSD: in which and are the left and right nodes generated by the split , while and are the portions allocated in the left and right child node. Finally, the subdivision that maximizes the value of is adopted.

A regression tree might suffer from overfitting when the model structure is fully developed. Overfitting occurs when a machine learning model has become too attuned to the data on which it was trained and therefore loses its applicability to any other dataset. Therefore, overfitting reduces the tree ability to make predictions, when the model is applied to novel data. To minimize this risk, a *pruning* process is generally carried out to reduce the size of the regression tree by removing the splits that do not bring significant improvements to the forecasting capability. Overfitting could also be prevented by extending the training dataset.

##### 2.2. Random Forest

If the results of different regression trees are combined into a single prediction, an *ensemble method* is obtained. The simplest combination is a weighted average.

A random forest (RF) [27] is obtained by an ensemble of uncorrelated, simple regression trees (Figure 2). Each tree is built using a different bootstrap sample of the data, and each new training set is attained, with replacement, from the original training set. In addition, random forests differ in how the regression trees are built. In a standard tree, each node is divided by referring to the best split among all variables. A random forest, instead, is built by randomly choosing, at each node, a small group of input variables to split on. If the input variables are , a number is identified such that at each node, variables are randomly selected out of and the best split on these is used to subdivide the node. The value of is kept constant during the expansion of the forest. Each tree is expanded as far as possible. No pruning is carried out.

The forest error rate is affected by two factors: the correlation between any two trees and the strength of each single tree. The forest error rate rises if the correlation increases. Moreover, the forest error rate decreases if the strength of the individual trees increases. If is reduced, both the strength and the correlation decrease.

##### 2.3. Support Vector Regression

Different is the approach of support vector machine algorithms [28]. They are supervised learning models that analyse data for classification or regression analysis [29]. If is a training dataset, where is the space of the input patterns (e.g., ), support vector regression (SVR) is aimed at identifying a function that has a maximum deviation from the experimental target values for all the training data. Smaller than errors can be tolerated, while the greater than errors are generally unacceptable (Figure 3). In addition, should be as flat as possible.

Therefore, given a linear function in the form in which , , and is the dot product in , the Euclidean norm has to be minimized, respecting the maximum deviation constraint. This condition leads to a convex optimization problem. However, in many cases, a certain error has to be tolerated. Therefore, slack variables have to be introduced in the constraints of the optimization problem that can be formulated as

The constant affects both the flatness of and the accepted deviations.

The optimization problem stated in (4) is generally solved in its dual formulation, by means of Lagrange multipliers: in which .

The partial derivatives of with respect to the variables must be equal to zero in correspondence with the optimal condition. It follows that the optimization problem in dual form can be stated as

The evaluation of is obtained imposing the Karush-Kuhn-Tucker conditions [30], according to which the product between constraints and dual variables must be equal to zero at the optimal condition.

In order to make the SVR algorithm nonlinear, the training patterns may be preliminarily processed by a function : , where is some feature space. The SVR algorithm only depends on the dot products between the different patterns; therefore, it is possible to use a kernel instead of specifically using the map (Figure 4).

It follows that condition (6) in the optimization problem can be replaced by while constraints expressed by (7) remain unchanged.

In the nonlinear case, the optimization problem requires finding the flattest function in the feature space, not in the input space. The expansion of can be formulated as

In this research, a radial basis function (RBF) was selected as kernel. The RBF has the form

In particular, in the case study, the parameters assume the following values: , , and .

The described algorithms were implemented in a specific code written in MATLAB language. The search for the optimal structure of the models was conducted by means of a trial-and-error iteration procedure. The holdout method was used in the cross-validation process during training. It involves removing a part of the training data and using it to get predictions from the model trained on the rest of the data. The estimation error tells how the model is doing on unseen data or the validation set.

##### 2.4. Case Study

The Menotre River Valley, in correspondence of the Rasiglia town (Umbria Region), is characterized by the Capo Vena spring (elevation of 670 m a.s.l. and average discharge of 700 l/s), Alzabove spring (elevation of 650 m a.s.l. and average discharge of 250 l/s), and the minor spring of Verchiano Aqueduct (elevation of 650 m a.s.l. and average discharge of 45 l/s) [31]. The Alzabove spring, partially collected by the Sella Valle Umbra Sud aqueduct (derived water discharge 125 l/s), represents an ideal site for the application of the proposed methodology, due to the available discharge time series and the favorable hydrogeological setting.

Carbonate deposits of the Umbria-Marche sequence characterize the area [31]. The stratigraphic and tectonic setting influences the groundwater circulation and the emergence point of the Alzabove spring. In wider terms, the Umbria-Marche series can be divided in three hydrogeological complexes consisting in the Corniola and Calcare Massiccio basal complex, the Maiolica complex, and the Scaglia Calcarea complex. These hydrogeological complexes with high permeability and high storing capacity constitute the main regional aquifers.

The complexes of “Marne del Sentino-Rosso Ammonitico-Marne ad Aptici Fm.,” “Marne a Fucoidi,” and “Scaglia Cinerea-Marnoso Arenacea” have an aquitard function and divide the groundwater circulation of the calcareous complexes. Groundwater of the Alzabove spring takes origin from the contact between Maiolica Fm. (lower Cretaceous) and Marne a Fucoidi Fm. (middle Cretaceous), via a contact of 500 m length, along the right side of Menotre River (Figure 5(a)). The contact between Maiolica Fm. and Marne a Fucoidi Fm. is located along the western limb of the Toricello Mt. anticlinal (Figures 5(a) and 5(b)) belonging to a large fold system. The anticline fold consists in layers of the Maiolica Fm. and constitutes the natural reservoir which feeds the Alzabove spring. Marne a Fucoidi Fm. represents the permeability limit of the aquifer [31–33] (Figures 5(a) and 5(b)).

In this context, the Alzabove spring can be considered an overflow spring (Figure 5(b)), collected via a trench drain, located along the Toricello Mt. and at the contact between Maiolica Fm. and Marne a Fucoidi Fm. From a geochemical point of view, the Alzabove spring is a bicarbonate-calcium water with a TDS of 300 mg/l, a Mg/Ca ratio of 0.12, and isotopic values of ∂34S equal to +7.5‰. Values are different from the water of the Capo Vena spring that has bicarbonate-sulphate-calcium characteristics with a TDS of 500 mg/l, Mg/Ca ratio of 0.25, and isotope value of ∂34S equal to +13‰ [33].

The chemical analysis proves that the Alzabove spring is characterized by low contents of Mg, typical of Maiolica Fm. On the contrary, groundwater circulation of the Capo Vena spring is affected by dolomitic horizons with sulphates. Therefore, the groundwater path of the Alzabove spring is shallower and less influenced by karst phenomena and tectonics, if compared with the Capo Vena reservoir which comprehends the basal portion of the Umbria-Marche sequence (lower Lias).

The identification of the hydrogeological model of the Alzabove spring confirms that it constitutes a useful case for the proposed algorithm training.

##### 2.5. Effectiveness of the Models

The effectiveness of forecasting algorithms was evaluated by the following criteria: the coefficient of determination , the mean absolute error (MAE), the root mean squared error (RMSE), and the relative absolute error (RAE).

The coefficient of determination, , is a measure of the accuracy of the model, because it assesses how well the model replicates observed outcomes and how well it predicts future outcomes. is defined as where is the total number of observed data, is the predicted value for data point , is the measured value for data point , and is the averaged value of the experimental data.

The mean absolute error measures how much the predictions are close to the observed values. It is evaluated by

RMSE is the sample standard deviation of the differences between experimental and predicted values. It is given by

Finally, RAE normalizes the total absolute error dividing it by the total absolute error of the simple predictor. Its definition is

#### 3. Results and Discussion

The input data of the different models are the past monthly average flow rates, , and the cumulative monthly rainfall on the aquifer basin, . A time series of 60 months (Figure 6) was used in the calculations; 80% of the available data was used for the algorithm training. The proposed approach is effective even with data from a single rain gauge; thus, rainfall data were taken from the Norcia rain gauge (Figure 5(c)). They optimally represent the cumulative rainfall on the basin. In addition, preliminary analyses showed that data from Norcia pluviometry are characterized by the highest values of cross-correlation with the flow rates of the Alzabove springs with respect to the other rain gauges located in the recharge basin.

The available time series is not very long, but this is not a limitation as one of the primary objectives of this study is to evaluate the predictive capabilities of the considered models when a few years of experimental observations are available.

In Figure 7 the cross-correlation between cumulative monthly rainfall and average monthly discharge is reported. Obviously, the positive values of the lag have no real interest. The analysis of the cross-correlogram shows that the cross-correlation takes a maximum value for a lag value of about 4 months. The hydrogeological characteristics of several aquifers in Central Italy would suggest a higher lag. Therefore, data on the previous 4, 6, and 8 months were alternatively considered, in order to assess the influence of the input period duration on the model’s response.

Preliminary analysis showed that better results are obtained if the input vector has the same number of flow rate and cumulative rainfall data. Thus, each vector of the input matrix to the three different models is composed as follows: where , 6, 8.

Different models were built to predict the monthly average discharge of the spring after 1 month, , and the average discharge after 2 months, , after 3 months, , and after 4 months, .

Table 1 and Figures 8–19 show a comparison of the results provided by the different models.

With regard to the average discharge of the following month, , all models show adequate predictive capabilities (Table 1) when the input data takes into account the previous 4 months (Figure 8) or the previous 6 months (Figure 9). In these cases, the M5P algorithm, followed by the SVR algorithm, provides the best results. The RF algorithm leads to not fully satisfactory results, particularly for an input based on the previous 4 months, for which . M5P provides slightly better results with an input based on the previous 4 months (, , , and ), while SVR performance is slightly better if an input based on the previous 6 months is considered (, , , and ). If, on the other hand, 8-month input is considered, the results of all models (Figure 10) get significantly worse (Table 1). Again, in this case M5P leads to the best results (, , , and ) while RF provides the least satisfactory ones (, , , and ).

Regarding the forecast of the average flow rate after two months, , all models lead to reliable results (Table 1), both with 4-month input (Figure 11) and 6-month input (Figure 12). M5P and RF show the best performance in terms of coefficient of determination , but RF predictions are characterized by higher values of MAE, RMSE, and RAE. The best predictions of are provided by M5P with a 6-month input (, , , and ).

Considering an 8-month input (Figure 13), it can be noted that the performance of all models significantly declines (Table 1). M5P is characterized by the lowest and the highest errors, while performance of SVR and RF, which are very similar, is much better than that of M5P.

All models show the most significant errors for .

Analyzing the average flow rate after 3 months, , referring to a 4-month input (Figure 14), it can be seen that RF provides the best results in terms of both and MAE, RMSE, and RAE, while M5P provides less accurate predictions and SVR leads to the worse results (Table 1). Considering instead a 6-month input (Figure 15), the accuracy of M5P improves significantly, with increasing from 0.859 to 0.921 and MAE decreasing from 0.0487 m^{3}/s to 0.0349 m^{3}/s, RMSE from 0.0544 m^{3}/s to 0.0405 m^{3}/s, and RAE from 56.43% to 40.40%. RF performance indices remain almost unchanged; therefore, RF continues to be characterized by the higher value of , while error values are now higher than those of M5P. SVR provides less accurate predictions either with a 4-month input or with a 6-month input. In the first case, , , , and ; in the second case, , , , and : even the accuracy of SVR predictions improves by switching from 4-month input to 6-month input.

If, on the other hand, an 8-month input is considered (Figure 16), once again the performance of forecast models significantly declines (Table 1). Only the RF algorithm is characterized by a fairly high value of , equal to 0.855, while SVR provided results affected by excessive errors.

Again, all the models show the most significant errors for .

With regard to the average discharge after 4 months, , the algorithm RF-based model provides the most accurate predictions for all input durations considered (Table 1). Considering a 4-month input (Figure 17), RF is characterized by , , , and RAE = 47.41%. For a 6-month input (Figure 18), instead, , , , and : the performance of the model with two different inputs are very close. Better than expected is the RF performance for an 8-month input: , , , and . Both in the case of 4-month input and 6-month input are performances of M5P and SVR very similar. With reference to the 8-month input (Figure 19), however, SVR provides more accurate predictions than M5P (Table 1).

The error of the models is fairly evenly distributed over the entire flow rate range considered for testing.

It can also be noted that as the forecasting horizon advances, the M5P model provides less accurate predictions. Similar is the behavior of SVR. The accuracy of the RF model, instead, is less reduced as the monthly timeframe advances. While M5P provides very good short-term forecasts, RF is able to provide fairly accurate predictions of average flow rates that will be available after some months.

All the considered models tend to be less effective if the input data relate to an appreciably longer period than the actual aquifer response to rainfall on the basin; therefore, it is appropriate to estimate this time by means of a cross-correlation analysis.

#### 4. Conclusions

The capability of forecasting spring discharges is essential for a careful management and an accurate planning of water resources. In many cases, a prediction of the flow rate that will be available in the future on the basis of the basin water balance is very complicated or impossible. Machine learning models represent a very interesting alternative. These models can be built on the basis only of past discharges and cumulative rainfall.

Three different machine learning algorithms were used and compared in this study: M5P, random forest, and support vector regression. The spring of Rasiglia Alzabove, Umbria, Central Italy, was chosen as a case study.

The considered models have proven to be able to provide encouraging results even if the available time series for training is rather limited. M5P provides very good short-term predictions of monthly average flow rates, while RF is able to provide accurate medium-term forecasts.

As the time of forecasting advances, the models generally lead to less accurate predictions. Moreover, the effectiveness of the models significantly depends on the duration of the period considered for input data. This time should be approximately estimated by means of a cross-correlation analysis, in order to evaluate the actual aquifer response time.

#### Notations

b: | Bias in SVR algorithm |

C: | Cost of error in SVR algorithm |

N: | Number of sample units in the generic node |

P_{−k}: | Cumulative monthly rainfall of k months before |

p_{L}: | Portion of units assigned to the left child node |

p_{R}: | Portion of units assigned to the right child node |

Q_{+k}: | Predicted monthly average flow rate after k months |

Q_{−k}: | Monthly average flow rate of k months before |

R: | Least-squared deviation, within variance for the generic node |

s:_{p} | Generic split during tree model growing |

t: | Generic node of a tree model |

t_{L}: | Left node generated by the generic split s_{p} |

t_{R}: | Right node generated by split s_{p} |

w: | Variable in SVR algorithm |

y:_{i} | Value of the target variable for the i-th unit in the generic node |

y_{m}: | Mean value of the target variable in the generic node |

: | Function used to make the SVR algorithm nonlinear |

α:_{i} | Variable in SVR algorithm |

α:_{i}^{∗} | Variable in SVR algorithm |

ε: | Deviation parameter in SVR algorithm |

: | Function of the least-squared deviation |

η:_{i} | Variable in SVR algorithm |

η_{i}^{∗}: | Variable in SVR algorithm |

: | Slack variable in SVR algorithm |

: | Slack variable in SVR algorithm. |

#### Data Availability

The data used to support the findings of this study were provided by Umbria regional agency for environmental protection. They are freely available online (http://www.arpa.umbria.it/articoli/portata-delle-sorgenti-000).

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.