Modelling Soil Water Retention Using Support Vector Machines with Genetic Algorithm Optimisation
This work presents point pedotransfer function (PTF) models of the soil water retention curve. The developed models allow estimation of the soil water content at the specified soil water potentials of −0.98, −3.10, −9.81, −31.02, −491.66, and −1554.78 kPa, based on the following soil characteristics: granulometric composition, total porosity, and bulk density. Support Vector Machines (SVM) methodology was used for model development, and a new methodology for elaborating retention function models is proposed. In contrast to previous approaches known from the literature, the ν-SVM method was used for model development, and the results were compared with those of the formerly used C-SVM method. Genetic algorithms were used as an optimisation framework for the model parameter search. A new form of the aim function used for this search is proposed, which allowed the development of models with better prediction capabilities. This new aim function avoids the overfitting of models that is typically encountered when the root mean squared error is used as the aim function. The developed models showed good agreement with measured soil water retention data, with coefficients of determination in the range 0.67–0.92. The study demonstrated the usability of the ν-SVM methodology together with genetic algorithm optimisation for retention modelling; this combination gave better-performing models than the other approaches tested.
1. Introduction
Soil hydrologic parameters have a great impact on soil water transport processes. The soil water retention curve and the soil hydraulic conductivity are required for an appropriate description of soil water phenomena such as drainage, infiltration, or the movement of soil pollutants. The retention curve describes the relationship between soil water content and soil water potential; it is especially important for hydrological modelling and agronomic practice because it determines the availability of soil water to plants.
Measurements provide a direct evaluation of the hydraulic properties of soils. Unfortunately, measuring the soil water retention curve is time-consuming and requires specialised equipment. The classical pressure plate extractor technique may be used to determine the soil water retention curve, as may an alternative technique based on dynamic, simultaneous time-domain reflectometry (TDR) measurements of soil water content and pressure head.
Fortunately, in many applications the hydraulic parameters can be estimated rather than measured. Pedotransfer functions (PTFs) are commonly used in such circumstances and allow the retention curve or hydraulic conductivity to be estimated from easily measured soil characteristics. The soil characteristics most widely used for PTF development are granulometric composition, bulk density, and organic matter content.
The soil water retention curve may basically be modelled in two ways: indirectly (parametric PTFs) and directly (point PTFs). The first method estimates the parameters of a model of the soil water retention curve, for instance, the Mualem-van Genuchten parameters θr, θs, α, and n; the second estimates water content at a specified set of water potential values. Recently, a new method of retention modelling has been developed which directly estimates soil water content but is not limited to a fixed set of soil water potentials. This so-called pseudocontinuous approach allows the soil water content to be estimated at any potential value in the range from 0 kPa to a model-dependent minimum value.
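To make the difference between the two approaches concrete, the sketch below evaluates the Mualem-van Genuchten retention function, θ(h) = θr + (θs − θr)[1 + (α|h|)^n]^(−m) with m = 1 − 1/n, at the six potentials used later in this study. A parametric PTF would predict the four parameters, from which the whole curve follows; a point PTF predicts the six water contents directly. The parameter values are hypothetical, chosen only for illustration, and α is expressed here in kPa⁻¹.

```python
# Potentials (kPa) at which the point PTFs in this study estimate water content.
POTENTIALS_KPA = [-0.98, -3.10, -9.81, -31.02, -491.66, -1554.78]


def van_genuchten_theta(h_kpa, theta_r, theta_s, alpha, n):
    """Volumetric water content at soil water potential h_kpa (kPa, <= 0).

    Mualem-van Genuchten form with m = 1 - 1/n; alpha is in 1/kPa here.
    """
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * abs(h_kpa)) ** n) ** m


# Hypothetical parameter set, e.g. the output of a parametric PTF.
params = {"theta_r": 0.05, "theta_s": 0.42, "alpha": 0.15, "n": 1.6}

# The continuous curve can be sampled at any potential, including the six
# fixed points that a point PTF would estimate directly.
curve = [van_genuchten_theta(h, **params) for h in POTENTIALS_KPA]
for h, theta in zip(POTENTIALS_KPA, curve):
    print(f"{h:>9.2f} kPa  ->  theta = {theta:.3f} m3/m3")
```

The parametric route fixes the curve shape in advance; the point route makes no shape assumption but only delivers values at the chosen potentials.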
Numerous attempts have been made to develop PTFs for the retention curve, using a wide range of mathematical methods. Regression modelling is a widely used tool for PTF development. Some methods rely only on the granulometric distribution as input, while others also use soil bulk density [7–9]. The mean weight diameter of soil particles, a parameter derived from granulometric composition, is also used by some authors. Soil organic carbon content is another widely used input for PTF development [9–11]. Other models additionally use soil water content at specified potentials, for example, −33 kPa and −1500 kPa.
Artificial neural networks (ANNs) are another technique often used for developing PTFs. Feed-forward or radial basis function neural networks allow a set of output parameters to be estimated from known input parameters. ANNs have been used extensively as a tool for PTF development [13–17]. Unfortunately, ANN modelling has some specific problems, such as a tendency to get stuck in local minima of the mean squared error surface during training, or difficulty in choosing an appropriate ANN architecture, which can lead to overfitting of the ANN to the training data. A partial solution to this problem is bootstrap averaging, which averages the predictions of many ANNs, each trained on a random resample of the training data.
Recently, another mathematical tool, the Support Vector Machine (SVM), has been used for PTF modelling. This technique avoids the problems typical of ANN-based PTF development. SVMs have been used for PTF development both in a direct manner and in parametric form, where Mualem-van Genuchten parameters were estimated. In both cases the SVM models were compared with ANN-based counterparts and showed better performance in water retention modelling.
In this work we focused on the development of point PTFs for estimating the soil water retention curve using SVM, and on some methodological aspects of this approach. We investigated whether the newer ν-SVM method, not used before in PTF studies, was applicable to this purpose and possibly better than the typically used C-SVM method. Genetic algorithms were used as an optimisation framework for the automated search of model parameters. A new form of the aim function for this search is proposed, which allows better selection of model parameters and, as a result, the development of more accurate PTF models.
2. Material and Methods
2.1. Soil Datasets
The soil dataset used in this study was an extract from the Soil Profiles Bank of the Polish Mineral Soils database and contained 639 soil samples taken from 290 different soil profiles. For most soil profiles, samples were collected from three horizons. Undisturbed samples were collected in metal cylinders of 100 cm³ volume and 5 cm diameter, and basic soil parameters were then analysed. The following soil parameters were extracted from the database for the purposes of this work: soil water content at six soil water potential values (−0.98 kPa, −3.10 kPa, −9.81 kPa, −31.02 kPa, −491.66 kPa, and −1554.78 kPa); particle size distribution; total porosity; and bulk density. Particle size distribution was determined for the following fractions: clay (< 0.002 mm), silt (0.002–0.05 mm), and sand (0.05–1 mm). Basic statistics of the soil dataset used in this study are presented in Table 1.
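The six potentials look irregular, but they appear to correspond to round pF values (pF is the decimal logarithm of suction expressed in cm of water column): 1.0, 1.5, 2.0, 2.5, 3.7, and 4.2. The short check below converts these assumed pF values to kPa, assuming that 1 cm of water column corresponds to 0.0981 kPa (water density 1000 kg m⁻³, g = 9.81 m s⁻²). Both the pF reading and the conversion constant are our inference, not information stated for the database.

```python
# Soil water potentials (kPa) listed for the dataset.
LISTED_KPA = [-0.98, -3.10, -9.81, -31.02, -491.66, -1554.78]

# Assumed pF values behind them (pF = log10 of suction in cm of water).
PF_VALUES = [1.0, 1.5, 2.0, 2.5, 3.7, 4.2]

KPA_PER_CM_WATER = 0.0981  # assumes rho_w = 1000 kg/m3 and g = 9.81 m/s2


def pf_to_kpa(pf):
    """Convert a pF value to a (negative) soil water potential in kPa."""
    suction_cm = 10.0 ** pf
    return -suction_cm * KPA_PER_CM_WATER


converted = [round(pf_to_kpa(pf), 2) for pf in PF_VALUES]
for pf, kpa, listed in zip(PF_VALUES, converted, LISTED_KPA):
    print(f"pF {pf:.1f}: {kpa:>9.2f} kPa (listed {listed:.2f})")
```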
2.2. SVM Methodology
SVM is one of the classes of soft-computing techniques. SVM was originally developed for solving classification problems; its use was later extended to regression-type applications for function estimation. SVMs used for regression modelling estimate one output variable from a set of input variables. As a supervised learning method, SVM uses a training dataset for model development. The resulting model reproduces the input-output relationship present in the training dataset and can make estimations for arbitrary input values.
From the user’s perspective, SVM model development consists of the following steps:
(a) preparation of the training and testing datasets,
(b) SVM model selection (C-SVM or ν-SVM),
(c) selection of the kernel function,
(d) selection of the SVM model and kernel function parameters,
(e) model learning using the training dataset,
(f) validation of the model against the training dataset,
(g) validation of the developed model against the testing dataset.
When the model parameters are being optimised, steps (d) and (e) may be repeated until the best parameters are found.
The training dataset consists of records in which several input values are mapped to one output value. The training data used for model development are application specific, and the training dataset has to be representative of the modelled problem. The quality of these data affects the model’s generalisation capability and its ability to make accurate predictions.
There are two different types of SVM algorithms used in regression modelling: C-SVM and ν-SVM. The SVM methodology originally introduced by Vapnik is currently known as C-SVM or SVM Type 1. These models have two adjustable parameters that influence their behaviour: C, the so-called penalty parameter, and ε, the width of the insensitivity zone. C determines the trade-off between the training error and the model complexity. Increasing C penalises larger errors more heavily, which decreases the approximation error on the training data. The insensitivity zone ε describes the tolerance for training error in the SVM model: decreasing ε leads to stricter fitting of the training data, which may cause overfitting and degrade the model’s generalisation properties. SVM models with lower ε values use a larger number of support vectors.
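The roles of C and ε can be read off the objective that a C-SVM (usually called ε-SVR in software such as LIBSVM) minimises for a linear model f(x) = wᵀx + b: ½‖w‖² + C Σ max(0, |yᵢ − f(xᵢ)| − ε). Errors inside the ε-tube cost nothing, and C weights the remaining errors against the flatness term ½‖w‖². The sketch below only evaluates this objective for a given (w, b); it is not a solver, and the data and candidate model are hypothetical.

```python
def eps_insensitive_loss(residual, eps):
    """Zero inside the insensitivity zone (|residual| <= eps), linear outside."""
    return max(0.0, abs(residual) - eps)


def svr_primal_objective(w, b, xs, ys, C, eps):
    """0.5*||w||^2 + C * sum of eps-insensitive losses for f(x) = w.x + b."""
    flatness = 0.5 * sum(wi * wi for wi in w)
    errors = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        errors += eps_insensitive_loss(y - prediction, eps)
    return flatness + C * errors


# Hypothetical one-input data and a fixed candidate model (w, b).
xs = [[0.1], [0.4], [0.7], [0.9]]
ys = [0.40, 0.33, 0.27, 0.20]
w, b = [-0.25], 0.43

for C in (0.1, 10.0):
    for eps in (0.0, 0.02):
        value = svr_primal_objective(w, b, xs, ys, C, eps)
        print(f"C={C:<5} eps={eps:<5} objective={value:.5f}")
```

With ε = 0.02 every residual of this candidate falls inside the tube, so only the flatness term remains; with ε = 0 the residuals count, and a larger C makes them dominate the objective.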
Support vectors are selected records taken from the training dataset. Which records become support vectors is decided by the SVM algorithm during the model training phase. Support vectors are a vital part of the developed model, as the SVM algorithm uses them to make model estimations.
One of the most important properties of SVM regression models is the ability to generalise, which allows appropriate predictions for previously unseen input data. The technical criterion behind this requirement is a limit on the number of support vectors of about half of all vectors in the training dataset.
If an SVM model with an arbitrarily chosen percentage of support vectors is required, it is convenient to use another SVM formulation known as ν-SVM or SVM Type 2. In this type of SVM model, another parameter, ν, is used instead of ε; it internally trades off the model accuracy on the training data against the model complexity (number of support vectors), which influences the model’s generalisation properties. ν-SVM regression models thus use two parameters: C and ν. Formally, ν expresses the desired fraction of support vectors (SVs) and is a lower bound on the fraction of support vectors. In C-SVM models, the number of support vectors participating in the model formulation is related only indirectly to the model parameters C and ε, while in ν-SVM models it is tied directly to the parameter ν itself.
The kernel function is an important part of an SVM model. It is used internally by the SVM algorithm to map the input parameters into a high-dimensional feature space used in the algorithm’s internal computations. Thanks to nonlinear kernel functions, the SVM algorithm allows estimations where the dependence between the estimated output variable and the input variables is highly nonlinear. Commonly used kernel functions include the linear kernel, $K(\mathbf{x}_i,\mathbf{x}_j)=\mathbf{x}_i^{\top}\mathbf{x}_j$, the polynomial kernel, $K(\mathbf{x}_i,\mathbf{x}_j)=(\gamma\,\mathbf{x}_i^{\top}\mathbf{x}_j+r)^{d}$, and the radial basis kernel, $K(\mathbf{x}_i,\mathbf{x}_j)=\exp(-\gamma\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^{2})$. The radial basis kernel is the one most often used in SVM regression modelling. The linear kernel is in fact a special case of the polynomial kernel with fixed parameters: $\gamma = 1$, $r = 0$, and $d = 1$.
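The three kernels can be written out directly. The sketch below assumes the common parameterisation used, for example, by LIBSVM (linear uᵀv, polynomial (γuᵀv + r)^d, radial basis exp(−γ‖u − v‖²)) and checks that the polynomial kernel with γ = 1, r = 0, and d = 1 reduces to the linear one. The two input vectors are hypothetical, scaled soil records (sand fraction, clay fraction, total porosity, bulk density).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma, r, d):
    return (gamma * dot(u, v) + r) ** d


def rbf_kernel(u, v, gamma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


# Two hypothetical input records, scaled to [0, 1]:
# (sand fraction, clay fraction, total porosity, bulk density).
x1 = [0.62, 0.08, 0.41, 0.55]
x2 = [0.35, 0.21, 0.47, 0.48]

print(linear_kernel(x1, x2), polynomial_kernel(x1, x2, gamma=1.0, r=0.0, d=1))

# A larger gamma makes the radial basis kernel decay faster with distance, so
# each training record influences a smaller neighbourhood of the input space.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x1, x2, gamma))
```

The last loop hints at why a very large γ favours overfitting: similarity between distinct records approaches zero, so the model can only reproduce the training points themselves.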
The main purpose of SVM model development is to select proper support vectors from the training dataset. The selection of support vectors affects model performance and the ability to generalise. The SVM model parameters, together with the kernel function parameters (if any), have to be chosen carefully during the model building phase.
2.3. Genetic Algorithm Parameters Optimisation
SVM models depend on parameters that must be adjusted. The different types of models tested use different sets of parameters: the SVM cost (C), the insensitive zone width (ε), and the parameter (γ) of the radial basis kernel function. Table 2 summarises the types of SVM models, the kernel functions used, the model parameters, and the model names used for reference in this paper.
The values of these parameters may be selected arbitrarily or determined using a universal, developer-independent procedure. In fact, the determination of SVM model parameters is an optimisation problem, and typical methods for such tasks may be used. The leave-one-out method was used in previous work (Lamorski et al.), but it is extremely computationally demanding. Another commonly used method is a simple search for an optimal solution on a grid of possible parameter values. Unfortunately, grid search does not test the whole space of possible parameter values, as it is limited to arbitrarily chosen, fixed combinations of parameters; as a result, nonoptimal model parameters may be chosen. Optimisation algorithms can be used instead.
Genetic algorithms (GA) with elitism were used in the present study to search for the optimal values of the model parameters. GA is a technique used, among other purposes, for optimisation and is especially well suited to applications where the aim function is not differentiable and may have local minima. Classical optimisation methods, such as the Nelder-Mead downhill simplex method or gradient-based methods, cannot be relied on in such circumstances.
The parameter values searched by the GA for an optimal solution were restricted to the following bands: , , and .
The operation of the genetic algorithm is controlled by two main parameters: the population size and the number of GA iterations. Other vital GA parameters are the elitism percentage and the mutation chance. In our study the elitism percentage was 0.2 and the mutation chance was 0.02.
The population size determines the number of points in the space of input parameters at which model performance is evaluated. A population size of 100 was used for the parameter search in this study.
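A minimal real-coded GA with elitism, using the settings quoted above (population 100, elitism fraction 0.2, mutation chance 0.02), might look like the sketch below. The operators (truncation selection from the elite, blend crossover, Gaussian mutation), the number of generations, and the function names are our assumptions, since they are not specified here; a cheap hypothetical test function stands in for the cross-validated SVM aim function.

```python
import random


def genetic_minimise(aim, bounds, pop_size=100, elite_frac=0.2,
                     mutation_chance=0.02, generations=100, seed=1):
    """Minimise aim(params) over a box with a simple elitist, real-coded GA."""
    rng = random.Random(seed)
    n_elite = max(2, int(elite_frac * pop_size))   # 20 elites for 100 members

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []                                   # best aim value per generation
    for _ in range(generations):
        pop.sort(key=aim)                          # lower aim value = fitter
        history.append(aim(pop[0]))
        elite = pop[:n_elite]                      # elites survive unchanged
        children = []
        while len(children) < pop_size - n_elite:
            p1, p2 = rng.sample(elite, 2)          # parents drawn from the elite
            child = []
            for (lo, hi), a, b in zip(bounds, p1, p2):
                u = rng.uniform(-0.25, 1.25)       # blend crossover
                gene = a + u * (b - a)
                if rng.random() < mutation_chance:  # Gaussian mutation
                    gene += rng.gauss(0.0, 0.1 * (hi - lo))
                child.append(min(max(gene, lo), hi))  # keep within the band
            children.append(child)
        pop = elite + children
    pop.sort(key=aim)
    history.append(aim(pop[0]))
    return pop[0], history


def toy_aim(params):
    # Hypothetical stand-in with its minimum at (2, -1); in the study the aim
    # would be evaluated for a candidate (C, epsilon, gamma) via SVM training.
    x, y = params
    return (x - 2.0) ** 2 + (y + 1.0) ** 2


best, history = genetic_minimise(toy_aim, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best, history[-1])
```

Because the elite is copied unchanged into the next generation, the best aim value can never get worse from one generation to the next; this is the practical benefit of elitism.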
2.4. Model Formulation
SVM was used for model development. The soil database was randomly split into two subsets: a training dataset (414 samples) and a testing dataset (225 samples). SVM models were built using the training dataset and tested against the testing dataset. The testing dataset was not used at any stage of model development, except for final model testing and validation.
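A reproducible random split of the 639 records into 414 training and 225 testing records can be sketched as follows. The integer indices stand in for the database rows, and the seed value is an arbitrary choice for illustration.

```python
import random

N_RECORDS, N_TRAIN = 639, 414

rng = random.Random(2014)          # fixed seed so the split can be reproduced
indices = list(range(N_RECORDS))   # stand-ins for the soil sample records
rng.shuffle(indices)

train_idx = sorted(indices[:N_TRAIN])
test_idx = sorted(indices[N_TRAIN:])  # held out until final validation
print(len(train_idx), len(test_idx))
```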
Correlation analysis was performed on the soil dataset used for model elaboration. Based on this analysis, the following input parameters were selected for the developed models: sand fraction, clay fraction, total porosity, and bulk density.
Because SVM regression models estimate only one output parameter for a given set of input parameters, six different SVM models were developed, one for each soil water potential at which water content was evaluated.
The k-fold approach was used for model elaboration, which allowed cross-validation during the training phase of model development. For the purpose of the k-fold method, the training dataset was randomly divided into ten distinct subsets of (approximately) equal size.
The value returned by each developed model was the average output of the 10 submodels resulting from the k-fold method. Each submodel was trained on nine subsets joined together (373 soil samples), and the remaining tenth subset, rotated across the ten submodels, was used for cross-validation. The k-fold method also allowed the variance of the evaluated properties to be estimated.
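The k-fold ensemble described above can be sketched as follows: the training records are split into ten folds, each submodel is fitted on nine folds and validated on the tenth, and the final prediction is the mean of the ten submodel outputs. To keep the sketch dependency-free, a hypothetical one-input least-squares line (`fit_line`) stands in for the SVM submodel, and the data are synthetic placeholders.

```python
import random
import statistics


def fit_line(xs, ys):
    """Hypothetical stand-in submodel: least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def kfold_ensemble(xs, ys, fit, k=10, seed=0):
    """Train k submodels, each on k-1 folds; return their averaged predictor."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k disjoint, near-equal folds
    submodels, val_rmse, train_sizes = [], [], []
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        resid = [ys[j] - model(xs[j]) for j in val]   # held-out fold
        val_rmse.append((sum(r * r for r in resid) / len(resid)) ** 0.5)
        submodels.append(model)
        train_sizes.append(len(train))

    def ensemble(x):
        return statistics.fmean(m(x) for m in submodels)

    return ensemble, val_rmse, train_sizes


# Synthetic stand-in for the 414 training records (one input, one output).
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(414)]
ys = [0.45 - 0.2 * x + rng.gauss(0.0, 0.02) for x in xs]

predict, val_rmse, train_sizes = kfold_ensemble(xs, ys, fit_line)
print(sorted(set(train_sizes)))                        # submodel training sizes
print(statistics.fmean(val_rmse), statistics.stdev(val_rmse))
print(predict(0.5))
```

With 414 records, four folds hold 42 records and six hold 41, so the submodels are trained on 372 or 373 records. The spread of the ten validation RMSE values is one way to estimate the variance mentioned above.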
One of the major decisions in SVM model development is the choice of an appropriate kernel function. Previous attempts to model the retention curve with SVM have applied the radial basis kernel function [20, 21]. In the present study we wished to investigate the influence of the kernel function on PTF model performance, so we tested two kernel functions: the linear kernel and the radial basis kernel. The advantage of the linear kernel over the radial basis kernel is the smaller number of model parameters: the radial basis kernel introduces an additional parameter, γ, whereas the linear kernel does not depend on any additional parameters.
One of the expected features of a PTF model is the ability to generalise, that is, to predict correctly the results for previously unseen input data. The technical criterion in SVM model formulation behind this feature is the number of support vectors used in the model. In the commonly used C-SVM models there is no direct control over the number of support vectors, which depends implicitly on the other model parameters. The objective of the present study was to compare two classes of PTFs: ν-SVM based (with a fixed value of ν) and C-SVM based. In ν-SVM based models, the parameter ν explicitly determines the expected percentage of SVs used in the model formulation. Model performance was checked with the theoretically optimal 50% of SVs, so the value of ν was fixed at 0.5.
2.5. Model Performance Criteria
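Model performance criteria are needed in SVM model development for validation purposes. Typically, the root mean square error (RMSE) and the coefficient of determination (R²) are used:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \qquad (1)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (2)$$
where n is the number of data analysed, ŷᵢ is a value approximated by the model, yᵢ is the “true” measured value, and ȳ is the mean of the measured values. In ideal conditions, when the values approximated by the model equal the measured ones, RMSE = 0 and R² = 1.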
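Both criteria translate directly into code. The sketch below takes RMSE as the square root of the mean squared difference between estimated and measured values, and R² as one minus the ratio of the residual sum of squares to the total sum of squares about the measured mean; the water content values are hypothetical.

```python
import math


def rmse(estimated, measured):
    """Root mean square error between model estimates and measurements."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)


def r_squared(estimated, measured):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for e, m in zip(estimated, measured))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and estimated water contents (m3/m3).
measured = [0.31, 0.27, 0.22, 0.18, 0.12]
estimated = [0.29, 0.28, 0.23, 0.16, 0.13]

print(round(rmse(estimated, measured), 4), round(r_squared(estimated, measured), 4))
```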
3. Results and Discussion
3.1. Overfitting and the Radial Basis Kernel Function
RMSE (1) and R² (2) are widely used model performance criteria for PTF development. Model parameters are adjusted to minimise the RMSE for the training dataset. Usually, a model that minimises RMSE for the training dataset also has a small RMSE for the testing dataset; if so, the model has good generalisation properties.
One of the main machine learning paradigms states that only the training dataset is used for model development, whereas the testing dataset is used only at the very last step, to check how the developed model performs on previously unseen data. The model with the lowest RMSE between its estimations and the measured values from the testing dataset is considered to be the best model.
At the initial stage of model development, RMSE was used as the aim function and GAs were used to seek the optimal model parameters. For both models using the linear kernel function, C-linear and nu-linear, using RMSE as the aim function was a successful strategy: the GA was able to find optimal model parameters.
However, when the radial basis kernel function was used, GA optimisation led to overfitting regardless of the type of SVM method. Very low RMSE values were achieved for the training dataset, whereas the RMSE for the testing dataset was high. The generalisation capability of these models was very poor, and the number of support vectors was close to the number of all training vectors, which is an indicator of model overfitting. This phenomenon was caused by too high a value of the radial basis kernel parameter γ determined by the GA: the GA consistently chose the highest available value of γ, that is, the upper limit of the range of possible parameter values.
The SVM models seem to be especially sensitive to the value of the parameter γ of the radial basis kernel function. Increasing γ leads to an increased number of support vectors in the model, which degrades its generalisation capabilities. When RMSE was used as the aim function, the GA chose parameter values that minimised RMSE for the training dataset. For these parameters, however, the RMSE for the testing dataset was relatively high, so the parameters were not acceptable from the point of view of model generalisation. In fact, using RMSE as the aim function together with genetic algorithm optimisation led to models that were not optimal when the radial basis kernel function was used.
An example of the overfitting phenomenon, for a radial basis kernel SVM model estimating water content at a soil water potential of −0.98 kPa, is given in Figure 1. Figure 1(a) shows the dependence between R² and γ, calculated for the training dataset (R² train) and the testing dataset (R² test), with the other model parameters held fixed. Above a certain value of γ, R² for the training dataset increased until an unrealistic value was reached (overfitting), while R² for the testing dataset decreased greatly (lack of generalisation capability). Figure 1(a) also shows the dependence between the number of support vectors (number of SV) in the model and γ: for high values of γ, the number of SVs reaches the maximum of 373 (the whole k-fold training dataset), which is further evidence of overtraining. Similarly (Figure 1(b)), RMSE for the training dataset decreases with growing γ, while RMSE for the testing dataset increases. Figure 2 shows the analogous dependence of R² and RMSE on γ for the model based on the other SVM formulation, also estimating water content at −0.98 kPa.
To give better insight into the model structure, a sensitivity analysis was performed. For this purpose, the Morris global sensitivity analysis method, modified according to Campolongo et al., was used in its classical one-at-a-time (OAT) formulation. The outcome of the sensitivity analysis was the index μ*, which represents the impact of a tested model parameter on the model outcome, relative to the other parameters and averaged over the whole parameter space. The model outcome used for the sensitivity analysis was the sum of the model estimates for all records in the training dataset.
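A simplified Morris screening, computing the mean absolute elementary effect (the μ* measure of Campolongo et al.) from one-at-a-time trajectories on the unit hypercube, could look like the sketch below. The toy model, the number of trajectories and grid levels, the use of upward steps only, and the normalisation to shares summing to 1 are our simplifications; in the study, the factors were the (rescaled) SVM parameters and the output was the sum of model estimates over the training records.

```python
import random


def morris_mu_star(model, k, r=20, levels=4, seed=0):
    """Mean absolute elementary effect (mu*) of each of k inputs on [0, 1].

    Simplified Morris design: each of r trajectories starts at a random grid
    point and moves one factor at a time by +delta, in a random order.
    """
    rng = random.Random(seed)
    delta = levels / (2.0 * (levels - 1))
    grid = [i / (levels - 1) for i in range(levels)]
    starts = [g for g in grid if g + delta <= 1.0 + 1e-12]   # stay inside [0, 1]
    abs_effects = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(starts) for _ in range(k)]
        y = model(x)
        for i in rng.sample(range(k), k):        # one factor at a time
            x_next = list(x)
            x_next[i] += delta
            y_next = model(x_next)
            abs_effects[i].append(abs(y_next - y) / delta)
            x, y = x_next, y_next
    return [sum(e) / len(e) for e in abs_effects]


def toy_model(x):
    # Hypothetical model output: strongly driven by x[0], weakly by x[1].
    return 10.0 * x[0] + 0.1 * x[1]


mu_star = morris_mu_star(toy_model, k=2)
shares = [m / sum(mu_star) for m in mu_star]   # normalised to sum to 1
print(mu_star, shares)
```

For a linear toy model every elementary effect equals the corresponding coefficient, which makes the result easy to check; for an SVM model the effects vary across the parameter space, and μ* averages their magnitudes.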
Figure 3 presents, as an example, normalised values of the sensitivity indices determined for all models estimating water content at a soil water potential of −0.98 kPa. Results for the nu-linear model are missing because this model depends on only one parameter, C. The models depend relatively weakly on the SVM cost parameter C. The C-SVM model using the linear kernel function (C-linear), which depends on the parameters C and ε, is almost insensitive to C. Likewise, in the remaining C-SVM model (C-radial), the parameter C has a low influence on the model’s outcome. The radial basis kernel ν-SVM model (nu-radial) depends on two parameters, C and γ, of which γ has a much higher impact on the model’s estimations.
3.2. Alternative Form of the Aim Function
Models for the other soil water potentials, both C-SVM and ν-SVM based, demonstrated the same behaviour (overfitting) when RMSE was used as the aim function with the radial basis kernel function. If an automated model-parameter search method such as GA is to be used, RMSE is therefore not recommended as the aim function, because of the dominant influence of γ on its value.
An aim function is needed that explicitly takes into account both factors: the model performance criterion (e.g., RMSE) and the model generalisation capability. An alternative form of the aim function, to be used instead of RMSE, is proposed in (3).
Equation (3) depends on two arguments: RMSE and the number of support vectors in the elaborated model (n_SV). The formula mimics the bivariate normal distribution, with the parameters μ_SV, σ_RMSE, and σ_SV, which are constants.
The proposed aim function has its minimum at the point where RMSE = 0 and the number of support vectors, n_SV, equals μ_SV; both criteria have to be met at the minimum of this function. The parameter μ_SV is the number of support vectors expected in the developed model. To achieve an optimal, nonovertrained model, μ_SV should be equal to half of the total number of records in the training dataset [25, 26]. In the present study each k-fold training dataset had 373 records, and we assumed the theoretically ideal proportion (1/2) between the number of support vectors and the number of records in the training dataset, so μ_SV was set to half of 373.
The constants σ_RMSE and σ_SV affect the shape of the aim function and control its slopes. Their values were selected empirically, on the basis of numerical tests and an analysis of their influence on the shape of the aim function (Figure 4). The parameter σ_RMSE was related to RMSE_max, the maximum RMSE value that could occur during the SVM model parameter search. In this study RMSE_max was estimated together with the sensitivity analysis of the models, but the values obtained were typical for point PTF development and could have been chosen arbitrarily.
Similarly, the parameter σ_SV was related to N, the total number of records in the training dataset; an appropriate value of each constant was chosen relative to its reference quantity (RMSE_max and N, respectively). Figure 4 presents the shape of the aim function in relation to the values of RMSE and n_SV.
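Because the exact expression of Equation (3) is not reproduced here, the sketch below uses an assumed form that matches the description of a function mimicking a bivariate normal distribution with its minimum at RMSE = 0 and n_SV equal to half of the training records: one minus an unnormalised bivariate Gaussian in (RMSE, n_SV). The function name `aim_function`, the constants `sigma_rmse` and `sigma_sv`, and all candidate values are hypothetical. The example shows why such a criterion helps: plain RMSE prefers an overfitted candidate that uses every record as a support vector, whereas this aim function prefers a slightly less accurate candidate with about half the records as support vectors.

```python
import math


def aim_function(rmse, n_sv, n_train, sigma_rmse, sigma_sv):
    """Assumed Gaussian-shaped aim function (lower is better).

    Equals 0 at rmse = 0 and n_sv = n_train / 2, and approaches 1 far away.
    """
    mu_sv = n_train / 2.0                    # expected number of support vectors
    z = (rmse ** 2 / (2.0 * sigma_rmse ** 2)
         + (n_sv - mu_sv) ** 2 / (2.0 * sigma_sv ** 2))
    return 1.0 - math.exp(-z)


N_TRAIN = 373                       # records in each k-fold training subset
SIGMA_RMSE, SIGMA_SV = 0.05, 50.0   # hypothetical shape constants

# Hypothetical candidates from a parameter search: (training RMSE, n_SV).
candidates = {"overfitted": (0.010, 373), "balanced": (0.050, 190)}

scores = {name: aim_function(err, n_sv, N_TRAIN, SIGMA_RMSE, SIGMA_SV)
          for name, (err, n_sv) in candidates.items()}
print(scores)
```

Shrinking `sigma_sv` penalises deviations from the target number of support vectors more steeply; shrinking `sigma_rmse` does the same for the error term.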
The new aim function (3) was used instead of the simple RMSE formula, in conjunction with GA optimisation, to search for the model parameters. For the ν-SVM based models (nu-radial) an optimal solution was found and both criteria were met: RMSE was low and an optimal number of support vectors was selected. For the C-SVM based models (C-radial) the results were much better than those found previously with RMSE as the aim function, but they were still inadequate.
3.3. PTF Model Performance
Finally, eight types of PTF models were developed, each type comprising one model for each soil water potential at which water content was estimated. These models combined C-SVM and ν-SVM based modelling, both kernel functions (radial basis and linear), and the two types of aim function used with the GA. Table 3 compares the results for these eight model types.
Table 3 summarises the performance indices of the elaborated models: the number of support vectors, RMSE, and R², all calculated for the training and testing datasets. RMSE and R² were calculated by comparing model estimates against measurements of soil water content at the specified soil water potentials.
The ability of a model to predict previously unseen data correctly is very important, so the model results for the testing dataset are of most interest.
For the ν-SVM models based on the radial basis kernel function, the new form of the aim function proposed in this paper allowed much better model parameters to be selected. As a result, the developed models are not overtrained and perform much better than models developed using RMSE as the aim function.
We may state that the best combination of SVM algorithm, kernel function, and parameter search method was the ν-SVM model with the radial basis kernel function, trained using the aim function in the form of (3).
The newly proposed aim function also substantially improved the results of the C-SVM models with the radial basis kernel function, but in this case the results are still not good enough for these models to be used. Other models, even the linear kernel based ones, perform better.
Table 3 shows that when the linear kernel is used, there is practically no difference between models developed using different methodologies. Regardless of the SVM algorithm type (C-SVM or ν-SVM) or the aim function used for the GA parameter search, the models achieve the same results. The small differences in RMSE (0.0495–0.0517) for the soil water potential of −491.66 kPa, in the case of one of the linear kernel models, are not important from the practical point of view of PTF usage.
This result, together with the observation that the models are almost insensitive to the value of C (the only parameter of the ν-SVM linear model), leads to the conclusion that PTF models based on the ν-SVM algorithm with a linear kernel function may be constructed without any optimisation of C. General rules for selecting the value of C may be used instead.
When results for the testing dataset were considered, the nu-radial model, based on the ν-SVM method and the radial basis kernel function, was the model of first choice for further PTF evaluations. However, the nu-linear model was also very appealing: its accuracy was not much lower than that of its nu-radial counterpart, while it has only one model parameter, which substantially simplifies model development. One caveat applies to this statement. It is known from the SVM kernel literature that the linear kernel may be considered an alternative to the radial basis kernel only for large training datasets; for smaller training datasets the radial basis kernel should be used. The results obtained here suggest that, in our case, the number of soil records used for model development (373) was almost large enough to use the linear kernel function.
4. Conclusions
SVM methodology was successfully applied to water retention modelling. The developed point PTF models used sand fraction, clay fraction, total porosity, and bulk density as input parameters.
The newly proposed ν-SVM based retention models, with a fixed value of ν, showed better performance than the C-SVM based models.
The proposed new form of the aim function (3), used in the search for model parameters, allows the development of better radial basis kernel models.
The results of this study showed that the ν-SVM method is suitable for developing PTF models for retention curve approximation. The advantage of this method (with ν fixed) is a smaller number of model parameters in comparison with the C-SVM methodology.
The investigated linear kernel function may be used successfully instead of the radial basis function for point PTF development. This kernel function reduces the number of model parameters by one compared with the radial basis kernel function and also simplifies model development.
Abbreviations
ANN: Artificial neural network
SVM: Support Vector Machines
SWRC: Soil water retention curve
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The work was partly financed from the budget of the National Science Centre under Contract no. UMO-2011/01/B/ST10/07544.
C. Sławiński, R. T. Walczak, and W. Skierucha, “Error analysis of water conductivity coefficient measurement by instantaneous profiles method,” International Agrophysics, vol. 20, no. 1, pp. 55–61, 2006.
L. R. Ahuja, J. W. Naney, and R. D. Williams, “Estimating soil water characteristics from simpler properties or limited data,” Soil Science Society of America Journal, vol. 49, no. 5, pp. 1100–1105, 1985.
R. Walczak, B. Witkowska-Walczak, and C. Sławiński, “Comparison of correlation models for the estimation of the water retention characteristics of soil,” International Agrophysics, vol. 16, no. 1, pp. 79–82, 2001.
S. C. Gupta and W. E. Larson, “Estimating soil water retention characteristics from particle size distribution, organic matter percent, and bulk density,” Water Resources Research, vol. 15, no. 6, pp. 1633–1635, 1979.
W. J. Rawles and D. L. Brakensiek, “Estimating soil water retention from soil properties,” Journal of the Irrigation & Drainage Division, vol. 108, no. 2, pp. 166–171, 1982.
B. Ghanbarian-Alavijeh and H. Millán, “Point pedotransfer functions for estimating soil water retention curve,” International Agrophysics, vol. 24, no. 3, pp. 243–251, 2010.
H. Merdun, Ö. Çinar, R. Meral, and M. Apan, “Comparison of artificial neural network and regression pedotransfer functions for prediction of soil water retention and saturated hydraulic conductivity,” Soil and Tillage Research, vol. 90, no. 1-2, pp. 108–116, 2006.
B. Minasny and A. B. McBratney, “The neuro-m method for fitting neural network parametric pedotransfer functions,” Soil Science Society of America Journal, vol. 66, no. 2, pp. 352–361, 2002.
Y. A. Pachepsky, D. Timlin, and G. Varallyay, “Artificial neural networks to estimate soil water retention from easily measurable data,” Soil Science Society of America Journal, vol. 60, no. 3, pp. 727–733, 1996.
V. Kecman, Learning and Soft Computing, Support Vector Machines, Neural Networks, and Fuzzy Logic Models, MIT Press, Cambridge, Mass, USA, 2001.
V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer, New York, NY, USA, 2nd edition, 2008.
B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett, “New support vector algorithms,” Neural Computation, vol. 12, no. 5, pp. 1207–1245, 2000.
R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms, John Wiley & Sons, Hoboken, NJ, USA, 2004.
M. Morris, “Factorial sampling plans for preliminary computational experiments,” Technometrics, vol. 33, no. 2, pp. 161–174, 1991.