Frontiers in Data-Driven Methods for Understanding, Prediction, and Control of Complex Systems 2021
A Comparative Analysis of Data-Driven Empirical and Artificial Intelligence Models for Estimating Infiltration Rates
Infiltration is a vital phenomenon in the water cycle, and consequently, estimation of the infiltration rate is important for many hydrologic studies. In the present paper, different data-driven models, including Multiple Linear Regression (MLR), Generalized Reduced Gradient (GRG), two Artificial Intelligence (AI) techniques (Artificial Neural Network (ANN) and Multigene Genetic Programming (MGGP)), and the hybrid MGGP-GRG, have been applied to estimate infiltration rates. The estimated infiltration rates were compared with those obtained by empirical infiltration models (Horton’s model, Philip’s model, and modified Kostiakov’s model) for published infiltration data. Among the conventional models considered, Philip’s model provided the best estimates of the infiltration rate. The hybrid MGGP-GRG model and MGGP improved the estimates of infiltration rates as compared to the conventional infiltration models, while ANN provided the best prediction of infiltration rates. More specifically, the application of ANN and the hybrid MGGP-GRG reduced the sum of squared errors by 97.86% and 81.53%, respectively. Based on this comparative analysis, AI-based models are suggested as a more accurate alternative for estimating infiltration rates in hydrological models.
1. Introduction
Infiltration can be defined as the process by which water enters the surface of the Earth. It leads to the entrance of water into the soil, thereby contributing to groundwater recharge and subsurface runoff. In essence, infiltration is among the most crucial processes of the water cycle. Furthermore, estimates of the infiltration capacity of soil are required in the design of efficient irrigation systems and in the estimation of evapotranspiration, groundwater recharge, surface runoff, effective rainfall, crop water requirement, and transport of chemicals in surface and subsurface water. As a result, modelling and prediction of infiltration rates are an essential part of hydrological modelling. For instance, Morel-Seytoux reviewed the importance of infiltration in large-scale hydrologic modelling. Furthermore, Šraj et al. pointed towards the impact of the estimation of infiltration rates on the runoff hydrograph, which plays a vital role in watershed modelling and water management. Similarly, Wen et al. demonstrated the implication of excessive infiltration on watershed models. Together, these studies demonstrate why an accurate estimation of time-dependent infiltration is important in hydrological modelling.
Owing to the wide applications of the infiltration rate, its estimation has gained significant attention from researchers. Over the years, various infiltration models have been proposed for the estimation of infiltration rates. They include models that have physical, semiempirical, and empirical formulations. Despite the development of several models, no single model universally outperforms the others. The suitability of an infiltration model for a particular site depends on the type of soil and field conditions. In this regard, many comparative studies have been conducted to assess the suitability of various infiltration models for different soil types under varying field conditions. Mishra et al. conducted one of the most comprehensive analyses on the suitability of infiltration models for different soils. Similarly, the methodology used to model the infiltration rate has a significant impact on the estimation of infiltration. Deep and Das compared various optimization algorithms to estimate the parameters of infiltration models. Nonetheless, the application of different optimization techniques can only move the solution from locally optimal parameters towards globally optimal parameters; it cannot increase the flexibility of infiltration models to mimic actual infiltration rates. Haghiabi et al. employed a dimensionless form of infiltration data to estimate infiltration parameters accurately. However, Zakwan suggested that such a transformation may not necessarily improve the accuracy of infiltration equations. Finally, Chen et al. utilized a genetic algorithm to improve the estimates of the Green-Ampt infiltration model under rainfall conditions.
Recent applications of computational techniques in water resource engineering have widened the scope further [11–21]. Advances in computational methods and modelling approaches have also provided a viable alternative for the estimation of infiltration rates. Kumar and Sihag applied Gene Expression Programming (GEP) to model infiltration rates. Moreover, Dewidar et al. proposed the application of fuzzy logic to estimate infiltration rates. In addition, Patle et al. employed a multiple linear regression model to predict time-dependent infiltration values based on several soil properties such as bulk density, silt, sand percentage, and moisture content. Furthermore, Sihag et al. used the support vector machine (SVM) for modelling infiltration rates in sandy soil. Also, Pahlavan-Rad et al. compared the performance of Multiple Linear Regression (MLR) and Random Forest Tree in depicting the spatial variation of infiltration rates and reported the superiority of Random Forest Tree over MLR. Recently, Sepahvand et al. utilized several data-driven models to predict infiltration rates. Their investigation revealed the superiority of neural networks over other data-driven techniques such as model tree, Gaussian process, and regression analysis. According to these recent studies, the use of time as the exclusive state variable in empirical models should be revisited in favour of better infiltration predictions, and the AI-based models were found to perform better than the conventional infiltration models. Therefore, despite previous efforts in improving the estimation of infiltration rates, further studies are needed to explore these issues.
The present study aims to compare the performances of different infiltration methods. Additionally, it attempts to assess the capability of MGGP and of the novel hybrid MGGP-GRG to model the infiltration process. In a bid to find a better time-dependent infiltration model, the performances of the MGGP-based models were compared with those of the conventional models, regression techniques, and a commonly used neural network.
2. Materials and Methods
In the present study, the infiltration data reported by Sihag et al. were utilized. The data were divided into training and testing data sets: 75% of the data were used for training, while the remaining 25% were used to test the obtained results. Table 1 summarises the data sets used in the present paper.
Figure 1 shows the infiltration rates observed over the same time duration. It also illustrates that the infiltration rate may depend on factors other than time, namely, soil properties such as bulk density and sand percentage. The infiltration data set, which was obtained from the literature, comes from infiltration observations carried out in the Davood Rashid and Honam regions in Lorestan Province and the Kelat region in Ilam Province in Western Iran.
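The 75%/25% division described above can be sketched as follows. This is a minimal illustration only: the source does not state whether the split was random or sequential, so the shuffling, the `seed` argument, and the record layout are assumptions, and the records are synthetic placeholders rather than the published data.

```python
import random

def split_train_test(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: random split
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical records: (time in min, sand %, density in g/cm3, rate in cm/min)
records = [(float(t), 60.0, 1.5, 1.0 / (t + 1)) for t in range(1, 21)]
train, test = split_train_test(records)
print(len(train), len(test))  # 15 5
```

Using the same division for every method, as the study does, keeps the comparison between models fair.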
2.2. Conventional Infiltration Models
There are a number of infiltration models available in the literature. A brief description of the commonly used infiltration models considered in the present study is as follows.
(1) Horton’s Model
Horton proposed an empirical equation for the exponential decay of the infiltration rate after analysing several infiltrometer data sets:
$$f = f_c + (f_0 - f_c)\,e^{-Kt}, \tag{1}$$
where f is the infiltration capacity at any time t from the start; fc is the final or ultimate infiltration capacity occurring at t = tc; f0 is the initial infiltration capacity at time t = 0; and K is Horton’s decay coefficient.
(2) Philip’s Model
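Philip proposed an infinite series solution of Richards’ equation to derive a relationship between the cumulative infiltration (F) and soil properties. Retaining the first two terms of the series gives
$$F = S\,t^{1/2} + A\,t, \tag{2}$$
where S is the sorptivity and A is a parameter related to the hydraulic conductivity of the soil.
By differentiating the above equation with respect to time, the infiltration rate may be represented as
$$f = \tfrac{1}{2}\,S\,t^{-1/2} + A. \tag{3}$$
(3) Modified Kostiakov’s Model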
Kostiakov observed the temporal variation of infiltration into soil and proposed a time-dependent infiltration model, commonly known as Kostiakov’s model. The major limitation of Kostiakov’s model is that it predicts an infinite infiltration rate at the start and a final infiltration rate that approaches zero rather than a constant value. Smith modified Kostiakov’s equation to include the constant term fi. The modified version is shown in the following equation:
$$f = a\,t^{-b} + f_i, \tag{4}$$
where a and b are empirical constants and fi is the constant final infiltration rate.
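The parameters of the different infiltration models were obtained by minimizing the sum of squared errors (SSE) using a nonlinear optimization tool. Thus, the objective function becomes
$$\text{minimize}\quad \mathrm{SSE} = \sum_{i=1}^{N}\left(f_{obs,i} - f_{est,i}\right)^2, \tag{5}$$
where fobs is the observed infiltration rate, fest is the estimated infiltration rate at any time t, and N is the number of observations.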
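As an illustration of the least-squares calibration described above, the following sketch fits the rate form of Philip’s model, assumed here to be the standard two-term form f = 0.5·S·t^(−1/2) + A. Because this form is linear in its two parameters, the SSE minimum has a closed-form solution, so no nonlinear solver is needed; Horton’s model and the modified Kostiakov model are nonlinear in their parameters and would require an iterative optimizer such as the one used in the study. The function names and the synthetic data are hypothetical.

```python
def philip_rate(t, S, A):
    """Infiltration rate from the two-term Philip model (assumed form)."""
    return 0.5 * S * t ** -0.5 + A

def sse(observed, estimated):
    """Sum of squared errors between observed and estimated rates."""
    return sum((o - e) * (o - e) for o, e in zip(observed, estimated))

def fit_philip(times, rates):
    """Closed-form least-squares fit; returns (S, A) minimizing the SSE."""
    x = [t ** -0.5 for t in times]            # regressor: t^(-1/2)
    n = len(x)
    mean_x = sum(x) / n
    mean_f = sum(rates) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxf = sum((xi - mean_x) * (fi - mean_f) for xi, fi in zip(x, rates))
    slope = sxf / sxx                         # slope equals 0.5 * S
    intercept = mean_f - slope * mean_x       # intercept equals A
    return 2.0 * slope, intercept

# Synthetic, noise-free data generated from known parameters (not the study's data)
times = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0, 60.0]
rates = [philip_rate(t, 1.2, 0.05) for t in times]
S, A = fit_philip(times, rates)
print(S, A, sse(rates, [philip_rate(t, S, A) for t in times]))
```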
2.3. Multiple Linear Regression
MLR has been widely used in water resource engineering [17, 33]. It has also been applied to estimate the infiltration rate. In accordance with MLR, the infiltration rate can be expressed as
$$f = a_0 + a_1 t + a_2 s + a_3 d, \tag{6}$$
where $a_0$, $a_1$, $a_2$, and $a_3$ are coefficients; f is the infiltration rate in cm/min; t is time in minutes; s is the percentage of sand; and d is the density in g/cm3.
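A minimal sketch of how such a regression could be fitted is given below, assuming the linear form f = a0 + a1·t + a2·s + a3·d with time, sand percentage, and density as predictors. The coefficients are obtained from the normal equations with a small hand-written Gaussian elimination so that the example has no external dependencies; the function names and the synthetic samples are hypothetical, and a statistics library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_mlr(samples, rates):
    """Fit f = a0 + a1*t + a2*s + a3*d; samples are (t, s, d) tuples."""
    rows = [[1.0, t, s, d] for t, s, d in samples]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * f for r, f in zip(rows, rates)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic samples (time in min, sand %, density in g/cm3), not the study's data
samples = [(1.0, 50.0, 1.40), (5.0, 60.0, 1.50), (10.0, 55.0, 1.60),
           (20.0, 70.0, 1.45), (30.0, 65.0, 1.55), (60.0, 52.0, 1.62)]
true_coeffs = [0.9, -0.01, 0.004, -0.2]
rates = [true_coeffs[0] + true_coeffs[1] * t + true_coeffs[2] * s + true_coeffs[3] * d
         for t, s, d in samples]
print(fit_mlr(samples, rates))
```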
2.4. Generalized Reduced Gradient (GRG)
GRG is a gradient-based nonlinear optimization technique. Earlier, Zakwan et al. and Muzzammil et al. suggested that the GRG technique is superior to the conventional graphical method for estimating infiltration parameters and rating curve parameters. In accordance with GRG, the infiltration rate is expressed by equation (7), a nonlinear function of time, sand percentage, and density with four coefficients.
In the present study, the GRG solver embedded in Microsoft Excel was used to estimate the infiltration rate by minimizing the sum of squared errors. A detailed explanation of the working of the GRG technique is available in the literature [17, 36].
2.5. Artificial Neural Network
ANN is a well-documented AI model. It has been used for solving various problems in water resources and hydrological modelling [37, 38]. Generally, an ANN consists of a few layers of neurons. The neurons in each layer (input, hidden, and output layers) are connected with neurons in the previous and next layers, whereas there is no connection between neurons within the same layer. The flexible architecture of ANN facilitates the estimation of a relationship between input and output data. In this study, a feed-forward ANN was used to predict the infiltration rate. The controlling parameters of ANN were set to those used in previous studies.
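The layer-to-layer connectivity described above can be illustrated with a forward pass through a network with a single hidden layer. This is only a sketch: the source does not report the architecture, activation function, or weights, so the tanh activation, the two hidden neurons, and all numerical values below are assumptions chosen for illustration.

```python
import math

def feed_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> tanh hidden layer -> single linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical weights for three inputs (normalized t, s, d) and two hidden neurons
hidden_weights = [[-1.5, 0.4, -0.3], [0.8, -0.2, 0.6]]
hidden_biases = [0.1, -0.2]
output_weights = [0.7, -0.5]
output_bias = 0.3

estimate = feed_forward([0.2, 0.6, 0.5], hidden_weights, hidden_biases,
                        output_weights, output_bias)
print(estimate)
```

Training adjusts the weights and biases so that the outputs match the observed infiltration rates; only the prediction step is shown here.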
2.6. Multigene Genetic Programming
MGGP is a modified version of genetic programming (GP), which is classified as an AI technique. It uses a genetic algorithm as its search engine and works as a flexible estimator that does not require the form of the prediction model to be known in advance. In essence, MGGP follows a similar solving approach to GP, using a tree-like structure, while it enables the use of more than one gene, i.e., tree, in each individual. This characteristic makes MGGP well suited to developing estimation models when the relation among the variables involved is complicated. As a result, a typical MGGP solution consists of a set of equations, each associated with one gene, which are algebraically summed using weighting coefficients. These coefficients are calibrated in MGGP, while a term commonly called the bias is also added to the final solution. The terms comprising the final solution of MGGP help in improving its flexibility in capturing the relationship between input and output data.
In this study, an open-access code of MGGP was used. This code was adopted from the literature, and it has been used in previous studies for other purposes. It minimizes the root mean square of errors between the estimated and observed values of the normalized infiltration rates. Additionally, the MGGP parameters were selected from previous studies [20, 43]. Since each run of MGGP may result in a unique equation, more than 100 runs of MGGP were considered for developing the relation between the infiltration rates and the other variables involved. The common number of MGGP runs in the literature is 50 [20, 45]; at least double that number, i.e., 100 runs or more, was used here to make sure that the best relation was achieved.
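The structure of a multigene solution, a bias plus a weighted sum of gene outputs, can be sketched as follows. The two genes, the weights, and the bias below are hypothetical placeholders, not the equations obtained in the study; in practice the genes are expression trees evolved by the MGGP code and the weights are calibrated by regression.

```python
def multigene_predict(x, genes, weights, bias):
    """Evaluate a multigene model: bias + sum of weight_i * gene_i(x)."""
    return bias + sum(w * gene(x) for w, gene in zip(weights, genes))

# Hypothetical genes acting on time (t), sand percentage (s), and density (d)
genes = [
    lambda x: x["t"] ** -0.5,      # gene 1: decays with time
    lambda x: x["s"] * x["d"],     # gene 2: interaction of soil properties
]
weights = [2.0, 0.01]
bias = 0.1

sample = {"t": 4.0, "s": 50.0, "d": 1.5}
print(multigene_predict(sample, genes, weights, bias))  # 0.1 + 2.0*0.5 + 0.01*75.0
```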
2.7. Hybrid MGGP-GRG Technique
The hybrid MGGP-GRG was first proposed in the literature for developing stage-discharge relationships. In this technique, MGGP and GRG are used in two successive steps to find the best-fit model. Figure 2 depicts the flowchart of the hybrid MGGP-GRG for estimating infiltration rates. As shown, MGGP is first run to search for the form of equation that best fits the data, and the GRG technique is then used to optimize the coefficients of the equation obtained by MGGP. Hence, this hybrid technique not only benefits from the powerful capability of MGGP for seeking an accurate prediction model but also uses GRG to enhance the performance of the estimation model.
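The two-step idea can be sketched as follows: the equation form is held fixed (standing in for the structure found by MGGP) and only its coefficients are refined by minimizing the SSE. Note the substitutions: the study used the GRG solver in Microsoft Excel, whereas this sketch uses plain gradient descent with a backtracking step size and finite-difference gradients, and the functional form in `model` is a hypothetical placeholder rather than an equation from the study.

```python
def model(x, c):
    """Hypothetical fixed structure standing in for an MGGP-derived equation."""
    t, s, d = x
    return c[0] + c[1] * t ** -0.5 + c[2] * s * d / 100.0

def sse(coeffs, samples, targets):
    """Sum of squared errors of the model for the given coefficients."""
    total = 0.0
    for x, f in zip(samples, targets):
        diff = f - model(x, coeffs)
        total += diff * diff
    return total

def gradient(coeffs, samples, targets, step=1e-6):
    """Central finite-difference gradient of the SSE."""
    grad = []
    for i in range(len(coeffs)):
        up, down = list(coeffs), list(coeffs)
        up[i] += step
        down[i] -= step
        grad.append((sse(up, samples, targets) - sse(down, samples, targets)) / (2 * step))
    return grad

def refine(coeffs, samples, targets, iterations=200):
    """Gradient descent with backtracking; accepts only steps that lower the SSE."""
    coeffs = list(coeffs)
    best = sse(coeffs, samples, targets)
    for _ in range(iterations):
        grad = gradient(coeffs, samples, targets)
        rate, improved = 1.0, False
        while rate > 1e-12:
            trial = [c - rate * g for c, g in zip(coeffs, grad)]
            value = sse(trial, samples, targets)
            if value < best:
                coeffs, best, improved = trial, value, True
                break
            rate *= 0.5                      # shrink the step and try again
        if not improved:
            break                            # no descent step found; stop
    return coeffs, best

# Synthetic data (t in min, sand %, density in g/cm3), not the study's data
samples = [(1.0, 50.0, 1.4), (5.0, 60.0, 1.5), (10.0, 55.0, 1.6), (30.0, 70.0, 1.45)]
targets = [model(x, [0.1, 1.0, 0.2]) for x in samples]
start = [0.0, 0.5, 0.0]                      # coefficients before refinement
refined, refined_sse = refine(start, samples, targets)
print(sse(start, samples, targets), refined_sse)
```

Because every accepted step lowers the SSE, the refined coefficients can never fit the training data worse than the starting ones, which mirrors the role of the second step in the hybrid technique.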
2.8. Performance Evaluation Criteria
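The performance of infiltration models and soft computing techniques was compared based on several criteria, namely, the sum of squared errors (SSE), normalized root mean square error (NRMSE), mean absolute error (MAE), mean absolute relative error (MARE), maximum absolute relative error (MXARE), Nash-Sutcliffe efficiency (NSE), and Willmott index (WI) [28, 46]:
$$\mathrm{SSE} = \sum_{i=1}^{N}\left(f_{obs,i} - f_{est,i}\right)^2,$$
$$\mathrm{NRMSE} = \frac{1}{f_{\max} - f_{\min}}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_{obs,i} - f_{est,i}\right)^2},$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|f_{obs,i} - f_{est,i}\right|,$$
$$\mathrm{MARE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|f_{obs,i} - f_{est,i}\right|}{f_{obs,i}},$$
$$\mathrm{MXARE} = \max_{i}\left(\frac{\left|f_{obs,i} - f_{est,i}\right|}{f_{obs,i}}\right),$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(f_{obs,i} - f_{est,i}\right)^2}{\sum_{i=1}^{N}\left(f_{obs,i} - \bar{f}_{obs}\right)^2},$$
$$\mathrm{WI} = 1 - \frac{\sum_{i=1}^{N}\left(f_{obs,i} - f_{est,i}\right)^2}{\sum_{i=1}^{N}\left(\left|f_{est,i} - \bar{f}_{obs}\right| + \left|f_{obs,i} - \bar{f}_{obs}\right|\right)^2},$$
where fobs is the observed infiltration rate, $f_{\max}$ and $f_{\min}$ are the maximum and minimum observed infiltration rates, fest is the estimated infiltration rate at any time, $\bar{f}_{obs}$ is the mean of the observed infiltration capacity, and N is the number of data points. The Nash criterion has been widely used as an indicator of goodness of fit; its value ranges from −∞ to 1.0, with 1.0 indicating a perfect fit. Higher values of NSE indicate a better agreement between measured and estimated data. Similarly, WI values close to unity represent the best-fitted model. In contrast, SSE, NRMSE, MAE, MARE, and MXARE should be as low as possible for the model with the highest accuracy.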
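The criteria named above can be computed with a short helper such as the one below. It assumes the standard definitions of these metrics, with NRMSE normalized by the range of the observed rates and the relative errors expressed as fractions rather than percentages; whether the study reports them in exactly this form is an assumption. The sample values are synthetic.

```python
import math

def performance_metrics(observed, estimated):
    """Return SSE, NRMSE, MAE, MARE, MXARE, NSE, and WI as a dictionary."""
    n = len(observed)
    errors = [o - e for o, e in zip(observed, estimated)]
    mean_obs = sum(observed) / n
    sse = sum(err * err for err in errors)
    relative = [abs(err) / abs(o) for err, o in zip(errors, observed)]
    wi_denominator = sum(
        (abs(e - mean_obs) + abs(o - mean_obs)) ** 2
        for o, e in zip(observed, estimated)
    )
    return {
        "SSE": sse,
        "NRMSE": math.sqrt(sse / n) / (max(observed) - min(observed)),
        "MAE": sum(abs(err) for err in errors) / n,
        "MARE": sum(relative) / n,
        "MXARE": max(relative),
        "NSE": 1.0 - sse / sum((o - mean_obs) ** 2 for o in observed),
        "WI": 1.0 - sse / wi_denominator,
    }

# Synthetic observed and estimated infiltration rates in cm/min
observed = [1.0, 2.0, 4.0]
estimated = [1.5, 2.0, 3.5]
for name, value in performance_metrics(observed, estimated).items():
    print(name, round(value, 4))
```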
2.9. Sensitivity Analysis
A sensitivity analysis can be conducted to determine how sensitive the results of a model are to each input parameter. In this study, the sensitivity analysis (SA) percentage of the infiltration rate with respect to each input parameter (time, sand percentage, and density), which were selected based on the study of Sihag et al., is computed using
$$\mathrm{SA}_i = \frac{f_{\max}(x_i) - f_{\min}(x_i)}{\sum_{j=1}^{n}\left[f_{\max}(x_j) - f_{\min}(x_j)\right]} \times 100,$$
where $f_{\min}(x_i)$ and $f_{\max}(x_i)$ are the minimum and maximum infiltration rates determined by varying the input parameter $x_i$ while each of the other input parameters is set to its average value, and n is the number of input parameters. The higher the SA percentage for a specific input variable, the more sensitive the model is to that variable.
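The procedure described above, varying one input over its observed values while the others are held at their averages, can be sketched as follows. The normalization of each output range by the sum of all ranges (so that the percentages add up to 100) is an assumption about the exact form of the SA equation, and the toy model and inputs are hypothetical.

```python
def sensitivity_percentages(model, input_values):
    """Compute the SA percentage of each input.

    input_values maps each input name to the list of its observed values.
    """
    means = {name: sum(v) / len(v) for name, v in input_values.items()}
    spans = {}
    for name, values in input_values.items():
        outputs = []
        for value in values:
            point = dict(means)     # all other inputs stay at their averages
            point[name] = value     # only this input varies
            outputs.append(model(**point))
        spans[name] = max(outputs) - min(outputs)
    total = sum(spans.values())
    return {name: 100.0 * span / total for name, span in spans.items()}

# Hypothetical model in which the output responds twice as strongly to t as to s
toy_model = lambda t, s: 2.0 * t + 1.0 * s
result = sensitivity_percentages(toy_model, {"t": [0.0, 0.5, 1.0], "s": [0.0, 1.0]})
print(result)
```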
2.10. Reliability Analysis
The reliability analysis is conducted to investigate the overall consistency of a prediction model. For this analysis, the relative error of the estimation model is computed for each data point and compared with a threshold. Then, the number of cases with a relative error equal to or lower than the specified threshold is divided by the total number of points. This ratio, expressed as a percentage, is the reliability metric, which indicates how reliably the prediction model performs with respect to the chosen threshold. In this study, the reliability analysis was carried out for all methods used for predicting the infiltration rate, and the threshold was set to 20% based on the literature.
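The reliability metric described above reduces to a simple count, sketched below with the 20% threshold used in the study. The function name and the sample values are hypothetical.

```python
def reliability_percentage(observed, estimated, threshold=0.20):
    """Percentage of points whose relative error does not exceed the threshold."""
    within = sum(
        1 for o, e in zip(observed, estimated)
        if abs(o - e) / abs(o) <= threshold
    )
    return 100.0 * within / len(observed)

# Synthetic values: relative errors are 10%, 25%, 0%, and 40%
observed = [1.0, 2.0, 4.0, 5.0]
estimated = [1.1, 2.5, 4.0, 3.0]
print(reliability_percentage(observed, estimated))  # 50.0
```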
3. Results and Discussion
Accurate estimation of the infiltration rate plays a vital role in various aspects of watershed hydrology. The present work focuses on improving the estimates of the infiltration rate through the application of different soft computing approaches. The infiltration rates estimated by these techniques were compared with those approximated by the conventional infiltration models (Horton’s model, modified Kostiakov’s model, and Philip’s model). In the conventional infiltration models, the observed infiltration rates and time were used in accordance with the model equations to obtain the estimated infiltration rate. On the other hand, in MLR, GRG, ANN, MGGP, and the hybrid MGGP-GRG models, the observed infiltration rates were used together with time, sand percentage, and density as input variables to obtain the estimated infiltration rates.
3.1. Comparison of the Conventional Infiltration Models
Table 2 presents the model parameters obtained in the training phase for the three conventional infiltration models. For the test phase, these parameters were used to estimate the infiltration rate based on equations (1), (3), and (4).
The results of the different approaches considered in the present study were compared with respect to four criteria for both the training and testing data. This comparative analysis, including the metrics used, is shown in Table 3. The same data division was used for all methods. Based on Table 3, it may be observed that the performance of Horton’s model was the worst for both the training and testing parts of the data. The modified Kostiakov’s model improved the estimates of the infiltration rate by almost 4% and 10% as compared to those of Horton’s model during training and testing, respectively. The performances of Philip’s model and modified Kostiakov’s model were almost comparable.
3.2. Comparison of the Conventional Models with Soft Computing Approaches
A perusal of Table 3 reveals that the technique used to model infiltration rates influences the estimates of the infiltration rate considerably. It can be observed that MLR provides the worst estimates of infiltration rates, which may reflect the nonlinear nature of the infiltration process. The conventional models provide slightly better predictions of infiltration rates as compared to those obtained by MLR. Application of the GRG solver further improves the estimate of infiltration, as equation (7) involves a higher nonlinearity and a larger number of parameters as compared to equations (1)-(4). Before the application of MGGP and the hybrid MGGP-GRG model, the observed infiltration rates were normalized as
$$f_{n,i} = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}},$$
where $f_{n,i}$ is the normalized infiltration rate of the ith observation, and $f_{\min}$ and $f_{\max}$ are the minimum and maximum observed infiltration rates. The normalized infiltration rates obtained from MGGP and the hybrid MGGP-GRG model are given by explicit equations in terms of time (t), sand percentage (s), and density (d).
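The normalization step mentioned above can be sketched as follows, assuming the usual min-max form in which the smallest observed rate maps to 0 and the largest to 1. The inverse transformation is included because model outputs in normalized form must be converted back to cm/min before they are compared with the observations. The function names and sample values are hypothetical.

```python
def normalize(values):
    """Min-max normalize values to [0, 1]; also return the minimum and maximum."""
    lowest, highest = min(values), max(values)
    scaled = [(v - lowest) / (highest - lowest) for v in values]
    return scaled, lowest, highest

def denormalize(scaled, lowest, highest):
    """Map normalized values back to the original units."""
    return [lowest + v * (highest - lowest) for v in scaled]

# Synthetic infiltration rates in cm/min
rates = [0.2, 0.5, 1.0]
scaled, lowest, highest = normalize(rates)
print(scaled)
print(denormalize(scaled, lowest, highest))
```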
Figures 3–10 present the relative error plots obtained from the different conventional infiltration models and computational techniques during training and testing. These figures also compare the different methods based on MARE and MXARE for both the training and testing data. Although the relative error plots of the conventional infiltration models and the other computational techniques (MLR, GRG, MGGP, and the hybrid MGGP-GRG) followed a similar pattern, the relative error plots of ANN followed a different pattern during both training and testing. It may also be observed from Figures 3–10 that the relative errors achieved by ANN are the lowest of all the methods. On the other hand, the relative errors obtained by Horton’s infiltration model were the highest. Furthermore, the AI-based models (ANN, MGGP, and the hybrid MGGP-GRG), which consider three independent variables (t, s, and d) instead of one variable (t), achieved much better MARE and MXARE in comparison with the empirical models during the training and testing phases. According to Figures 3–10, ANN and the hybrid MGGP-GRG resulted in the best and second-best MARE and MXARE values, whereas MLR and Horton’s model yielded the worst and second-worst MARE and MXARE values for the training and testing data.
Figures 11 and 12 depict the comparison between the observed and estimated infiltration rates obtained by the best-fit models (ANN and the hybrid MGGP-GRG) and the worst-fit model (Horton’s model). It may be observed from Figure 11 that the infiltration rates estimated by ANN almost fit the observed data during the training phase. On the other hand, the infiltration rates predicted by Horton’s model deviated significantly from the observed data. The performance of the hybrid MGGP-GRG was better than that of Horton’s model but poorer than that of ANN. During the testing phase, the estimates of the hybrid MGGP-GRG and ANN were almost identical, as shown in Figure 12. The estimates obtained by Horton’s model during the testing phase were again significantly different from the corresponding observed values. Hence, Figures 11 and 12 demonstrate how much the infiltration estimates can be enhanced by considering other variables involved in the process in addition to time, and they indicate the better performance of the AI-based models in comparison with the available empirical equations.
Figure 13 depicts the results of the sensitivity analysis, which was conducted for ANN and MGGP. As shown, time has the highest SA percentage (SA = 91.98%) for ANN, which implies that the infiltration rates predicted by ANN are far more sensitive to time than to the other two input variables (sand percentage and density). This finding is in agreement with the fact that the empirical models (such as Horton’s and modified Kostiakov’s) used for estimating infiltration rates rely only on time. On the other hand, the MGGP-based model, which yielded a lower accuracy for predicting infiltration rates than ANN, was found to be more sensitive to sand percentage than to time. Therefore, since the physical background of the problem indicates that infiltration rates depend strongly on time, the results of the sensitivity analysis also indicate that ANN estimated infiltration rates better than the MGGP-based model.
The reliability analysis was carried out for the training and testing data separately. The results of this analysis are presented in Figure 14. As shown, ANN achieved the highest percentages of reliability for both the training and testing data. Furthermore, the reliability percentages obtained by MGGP and the hybrid MGGP-GRG were higher than those of the empirical models, MLR, and GRG. Finally, the reliability analysis conducted in this study reveals the improvement made by the AI models over other data-driven methods available in the literature for predicting infiltration rates.
The structure of the equations developed by the conventional infiltration models, MLR, and GRG is known in advance of applying these methods. On the other hand, ANN, MGGP, and the hybrid MGGP-GRG are highly nonlinear techniques with greater degrees of freedom and complexity and, therefore, provide better estimates of the infiltration rate. However, the more precise results of ANN, MGGP, and the hybrid MGGP-GRG are obtained at the expense of higher computational effort. These machine learning tools require a considerable number of runs, unlike the conventional models and MLR, in which a single attempt is sufficient for determining the model output. Based on the comparative analysis conducted in this study, ANN yielded the best estimates of infiltration rates. However, the estimates obtained from the hybrid MGGP-GRG were also comparable, especially for the test data. Furthermore, unlike ANN, the hybrid MGGP-GRG model provided explicit equations for predicting infiltration rates. Such equations can be implemented directly in hydrological models and may be preferred in practice by engineers, which may be counted as an advantage of this AI-based technique.
4. Conclusions
In the present study, published infiltration data were used to assess the performances of MGGP and the hybrid MGGP-GRG technique in modelling the infiltration rates of soil. The estimated infiltration rates were compared with those obtained by the conventional models (Horton’s model, Philip’s model, and modified Kostiakov’s model). It was observed that the application of the hybrid MGGP-GRG and MGGP improved the estimates of infiltration rates as compared to the conventional infiltration models by over 80%. On the other hand, ANN provided the best estimates of infiltration rates. In addition to improving the accuracy, the application of ANN, MGGP, and the hybrid MGGP-GRG increased the complexity of the modelling equations. Future studies may focus on the comparison of the hybrid MGGP-based models with other machine learning approaches, and the explicit infiltration models developed by either MGGP or the hybrid MGGP-GRG should be applied in hydrological models to assess their performance in practice.
Data Availability
The data used in this study are available in the related literature.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
References
M. Šraj, L. Dirnbek, and M. Brilly, “The influence of effective rainfall on modeled runoff hydrograph,” Journal of Hydrology and Hydromechanics, vol. 58, no. 1, pp. 3–14, 2010.
K. Deep and K. N. Das, “Optimization of infiltration parameters in hydrology,” World Journal of Modelling and Simulation, vol. 4, pp. 120–130, 2008.
A. Haghiabi, J. Abedi-koupai, M. Heidarpour, and J. Mohammadzadeh-habili, “A new method for estimating parameters of Kostiakov and Modified Kostiakov infiltration equations,” World Applied Science Journal, vol. 15, pp. 129–135, 2011.
A. D. Mehr, V. Nourani, E. Kahya, B. Hrnjica, A. M. Sattar, and Z. M. Yaseen, “Genetic programming in water resources engineering: a state-of-the-art review,” Journal of Hydrology, vol. 566, pp. 643–667, 2018.
M. Najafzadeh and G. Oliveto, “More reliable predictions of clear-water scour depth at pile groups by robust artificial intelligence techniques while preserving physical consistency,” Soft Computing, vol. 25, pp. 5723–5746, 2021.
P. Sihag, N. K. Tiwari, and S. Ranjan, “Support vector regression-based modeling of cumulative infiltration of sandy soil,” ISH Journal of Hydraulic Engineering, vol. 26, no. 1, pp. 44–50, 2020.
R. E. Horton, “The interpretation and application of runoff plot experiments with reference to soil erosion problems,” Soil Science Society of America Proceedings, vol. 3, pp. 340–349, 1938.
A. N. Kostiakov, “On the dynamics of the coefficients of water percolation in soils,” in Sixth Commission, pp. 15–21, International Society of Soil Science, Part A, Vienna, AT, USA, 1932.
M. Zakwan, M. Muzzammil, and J. Alam, “Estimation of soil properties using infiltration data,” in Proceedings of the National Conference of Advanced Geotechnological Engineering, pp. 198–201, Aligarh, India, December 2016.
M. Muzzammil, J. Alam, and M. Zakwan, “A spreadsheet approach for prediction of rating curve parameters,” in Hydrologic Modeling. Water Science and Technology Library, V. Singh, S. Yadav, and R. Yadava, Eds., vol. 81, Springer, Singapore, 2018.
A. R. Nawaz, M. Zakwan, I. Khan, and Z. A. Rahim, “Comparative analysis of variants of Muskingum model,” Water and Energy International, vol. 63, no. 7, pp. 64–73, 2020.
D. Searson, GPTIPS: Genetic Programming and Symbolic Regression for MATLAB, 2009, https://www.researchgate.net/publication/277297934_GPTIPS_Genetic_Programming_Symbolic_Regression_for_MATLAB_User_Guide.
A. Garg, K. Tai, V. Vijayaraghavan, and P. M. Singru, “Mathematical modelling of burr height of the drilling process using a statistical-based multi-gene genetic programming approach,” The International Journal of Advanced Manufacturing Technology, vol. 73, no. 1-4, pp. 113–126, 2014.
F. Saberi-Movahed, M. Najafzadeh, and A. Mehrpooya, “Receiving more accurate predictions for longitudinal dispersion coefficients in water pipelines: training group method of data handling using extreme learning machine conceptions,” Water Resources Management, vol. 34, no. 2, pp. 529–561, 2020.