Abstract

We investigated eleven particle-size distribution (PSD) models to determine which are appropriate for describing the PSDs of 16349 Chinese soil samples. These data are based on three soil texture classification schemes: one ISSS (International Society of Soil Science) scheme with four data points and two Katschinski’s schemes with five and six data points, respectively. The adjusted coefficient of determination ($R^2_{\mathrm{adj}}$), Akaike’s information criterion (AIC), and the geometric mean error ratio (GMER) were used to evaluate model performance. The soil data were converted to the USDA (United States Department of Agriculture) standard using PSD models and the fractal concept. The performance of PSD models was affected by soil texture and by the fraction classification scheme, and it also varied with the clay content of soils. The Anderson, Fredlund, modified logistic growth, Skaggs, and Weibull models performed best.

1. Introduction

Particle-size distribution (PSD) is a basic physical property of soils that affects many other important soil properties. PSD has been widely used for estimating various soil hydraulic properties [1, 2]. PSD prediction has also been used for comparing and converting texture measurements from different classification systems [3–6]. Many soil databases do not contain detailed PSD data; they contain only a few mass fractions and use different cutoff points to separate the size fractions. To obtain a more complete description of texture, various parametric PSD models are used [7–12]. In the Second National Soil Surveys of China, soil textures were measured according to the ISSS (International Society of Soil Science) and Katschinski’s systems. Conversion from the ISSS and Katschinski’s systems to the popular USDA (United States Department of Agriculture) system is required to achieve compatibility of soil data and to use pedotransfer functions as estimators of soil hydraulic properties [13, 14].

PSD models can be categorized into several classes: regression [15], spline [13], fractal methods [16, 17], methods based on PSD statistics [12], the similarity procedure [13], the gray model [18], and so on. In this study, we investigated regression models because they are simple and effective when only a small number of fractions are available.

A few studies have sought to determine the best model for fitting particle-size distribution curves of soils [9, 15, 19–23]. Hwang et al. [15] compared seven PSD models fitted to PSD data sets of Korean soils and found that the Fredlund model with four fitting parameters [11] showed the best performance for the majority of texture types. Hwang [19] examined nine PSD models and found that their performance was affected by soil texture and that most of them improved with an increase in clay content. He also reported that the Fredlund model and two Skaggs models [12] showed better performance for several specific soil textures. Liu et al. [20] evaluated the suitability of five models for fitting particle-size analysis data of Chinese soils of three texture types. The results indicated that the four-parameter Fredlund model gave the best representation of particle-size distributions, and that the three-parameter Fredlund model and the modified logistic growth model produced comparable results for silty clay loam and silt loam soils but worse results for sandy loam soils. da Silva et al. [21] tested 14 models for fitting the cumulative soil particle-size distribution curve based on four measured points and found that the most suitable models were the Skaggs et al. [12], Lima and da Silva [24], Weibull [25], and Morgan et al. [26] models, all of which are three-parameter models. Botula et al. [23] evaluated ten PSD models for soils of the humid tropics and found that the performance depended on the bimodal character of the soil PSD. However, these studies did not fully investigate the performance of PSD models, and they used only data measured according to the USDA standard.

Several studies have proposed conversion methods between texture classification systems. Nemes et al. [13] evaluated the accuracy of four interpolation procedures using a data set from European countries, and these procedures were used to standardize the particle-size description according to the FAO (Food and Agriculture Organization)/USDA system. Minasny and McBratney [4] used an empirical multiple linear regression to convert between the ISSS and USDA systems. Shirazi et al. [27] developed a conversion table between the USDA and ISSS systems based on the assumption of a lognormal particle-size distribution within each size fraction (sand, silt, and clay). Shirazi et al. [5] extended the PSD statistics (i.e., the geometric mean particle diameter and its standard deviation) to compare the USDA and ISSS systems including rock fragments, which can be applied to the conversion between the two systems. Most PSD models cannot support the conversion from Katschinski’s system to the USDA system, because they cannot extrapolate the soil fraction above the upper limit of Katschinski’s system (i.e., 1 mm), whereas the USDA system includes the fraction between 1 mm and 2 mm. Few studies have been devoted to the conversion between the USDA and Katschinski’s systems [3]. The conventional method is to plot the particle-size curve by semilogarithmic regression [28], but it is time-consuming and of low accuracy. Rousseva [3] used the fraction between 1 mm and 3 mm to interpolate the cumulative fraction at the 2 mm limit by exponential and power-law distribution models during the transformation from Katschinski’s system to the ISSS and USDA systems. However, gravel contents are usually not measured (e.g., in the Second National Soil Surveys of China). The following question arises: is it possible to extrapolate the soil fraction between 1 mm and 2 mm based on the data below the 1 mm limit? The fractal method [29] provides one possible solution, which was used in this work.

The objectives of this study are to evaluate the performance of several PSD models in order to find those that best describe the soil PSD, to convert the data from the ISSS and Katschinski’s schemes to the USDA standard, and to investigate the effects of soil texture and classification schemes on the performance of PSD models.

2. Material and Methods

2.1. The Chinese Soil Profile Database

The Chinese soil profile database was established using the results of the Second National Soil Surveys of China conducted in the 1980s. It contains data from 33010 soil horizons representing 8979 profiles. The data were published by the National Soil Survey Office [30], the soil survey offices of provinces, and the soil survey offices of some Tibetan counties. This work collected and digitized these data, which are available in Microsoft Excel worksheet format. Not all soil horizons have PSD data. Fine size fractions were determined by the hydrometer or pipette method, whereas the coarse fractions were obtained by sieving [28]. Particle-size fraction data were classified according to the ISSS or Katschinski’s schemes. Katschinski’s [31] scheme was used not only in China but also widely in countries of the former USSR and Eastern Europe. Because some schemes have only a small number of soil samples and some have too few particle-size limits (e.g., 2, 0.02, and 0.002 mm) for PSD models to fit the curve, we used only part of the data (i.e., 16349 samples) in this study. Four to six particle-size fractions are given in one ISSS scheme and in two Katschinski’s schemes (Table 1).

2.2. Particle-Size Distribution Models

Eleven PSD models were tested in this study. Ten of them are shown in Table 2; the self-similar model (SELF) is described here because it cannot be given as a simple equation [32] and has not yet been compared with other models in previous studies.

The self-similar model uses the soil textural data to define an iterated function system, which determines how the self-similar distribution reproduces its fractal structure at different length scales [33]. A set of proportions of soil mass $p_1, p_2, \ldots, p_N$ is selected (for simplicity, assume that $N = 3$). The fractal model for PSD derived from the self-similarity hypothesis is constructed as follows (see Martín and Taguas [32] for the details): let $I_1 = [0, a]$, $I_2 = [a, b]$, and $I_3 = [b, c]$ be the subfractions of sizes corresponding to the three size classes and let $p_1$, $p_2$, and $p_3$ be the proportions of mass for the fractions $I_1$, $I_2$, and $I_3$, respectively. Notice that $I = [0, c] = I_1 \cup I_2 \cup I_3$ and $p_1 + p_2 + p_3 = 1$. Associated with these definitions, we consider the following functions: $\varphi_1(x) = r_1 x$, $\varphi_2(x) = r_2 x + a$, and $\varphi_3(x) = r_3 x + b$, where $r_1 = a/c$, $r_2 = (b - a)/c$, and $r_3 = (c - b)/c$ and $x$ is any point (or particle size value) in the fraction $I$. $\varphi_1$, $\varphi_2$, and $\varphi_3$ are the linear mappings (similarities) that transform the points of the fraction $I$ into the points of the fractions $I_1$, $I_2$, and $I_3$, respectively. The set $\{\varphi_1, \varphi_2, \varphi_3; p_1, p_2, p_3\}$ is called an iterated function system (IFS), and it defines a self-similar distribution $\mu$ supported on $I$, which satisfies $\mu(J) = \sum_{i=1}^{3} p_i\, \mu(\varphi_i^{-1}(J))$ for every $J \subset I$. The set of textural data, together with the self-similarity hypothesis mentioned above, determines a self-similar mass distribution $\mu$ for particle sizes in $I$ that fits the input data exactly; that is, $\mu(I_i) = p_i$. From the self-similarity, the contribution to soil mass of particles with sizes in the fraction $\varphi_i(I_j)$ is given by $p_i p_j$. In general, for any size fraction $J$, the IFS model allows one to determine the mass $\mu(J)$ of soil corresponding to the soil particles with sizes in the fraction $J$. In this study, the fraction limits $a$, $b$, and $c$ are fixed at chosen values (in μm), and the model is treated as a biparametric model whose two parameters are varied until the best simulation, in terms of minimizing the distance between simulated and real data, is reached.
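The self-similar construction can be illustrated with a short Python sketch. It assumes three size classes $[0, a]$, $[a, b]$, and $[b, c]$ with mass proportions $p_1$, $p_2$, and $p_3$, and it maps the whole interval $[0, c]$ linearly onto each class to obtain the nine second-level subfractions, each with mass $p_i p_j$. The function name `ifs_second_level` and the numerical limits and proportions are hypothetical and serve only as an illustration; they are not values from this study.

```python
from itertools import product


def ifs_second_level(limits, props):
    """Return (lower, upper, mass) for each second-level subfraction.

    limits: (a, b, c) upper limits of the three size classes.
    props:  (p1, p2, p3) mass proportions of the three classes, summing to 1.
    """
    a, b, c = limits
    edges = [0.0, a, b, c]
    # The i-th similarity x -> r_i * x + t_i maps [0, c] onto the i-th class.
    ratios = [(edges[i + 1] - edges[i]) / c for i in range(3)]
    shifts = edges[:3]
    cells = []
    for i, j in product(range(3), repeat=2):
        lower = ratios[i] * edges[j] + shifts[i]
        upper = ratios[i] * edges[j + 1] + shifts[i]
        # Self-similarity: class j rescaled into class i carries mass p_i * p_j.
        cells.append((lower, upper, props[i] * props[j]))
    return cells


# Hypothetical inputs (assumptions for illustration only).
limits = (2.0, 50.0, 2000.0)   # size limits in micrometres
props = (0.2, 0.5, 0.3)        # clay-, silt-, and sand-sized mass proportions
cells = ifs_second_level(limits, props)
for lower, upper, mass in cells:
    print(f"{lower:10.4f} to {upper:10.4f} um: mass {mass:.3f}")
```

Repeating the same step on each subfraction refines the distribution further, which is how the model produces a full cumulative curve from only three measured fractions.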

Six models that showed relatively good performance in the work of Hwang et al. [15, 19] were tested: the Fredlund4P (F4P), Skaggs (S), van Genuchten (VG), Weibull (W), offset-nonrenormalized lognormal (ONL), and offset-renormalized lognormal (ORL) models. In addition, four other models with different numbers of fitting parameters were tested. These models were selected from the many PSD models in the literature according to their performance in previous studies. The Anderson model (AD) [8] assumes that the log mass of the particles is arctangent distributed (i.e., Cauchy distributed). Fredlund et al. [11] found that fixing one parameter of the F4P model (Table 2) at 0.001 provided a reasonable fit in most cases, so the resulting three-parameter Fredlund3P (F3P) model was also tested. The modified logistic growth model (ML) [20] is derived from the logistic growth model, whose graph has a sigmoid shape. The van Genuchten type model modified by Zhuang et al. [34] (VGM) is likewise based on the assumption that the particle-size distribution is sigmoidal. The spline method was not included because Liu et al. [20] found that it performs worse than the modified logistic growth model.

2.3. Model Comparison and Fitting Techniques

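We used three statistical indices to evaluate the performance of the PSD models. The adjusted coefficient of determination ($R^2_{\mathrm{adj}}$) was used as a relative measure of goodness of fit between predicted and observed data; it is a better criterion than the coefficient of determination ($R^2$) [23]. $R^2_{\mathrm{adj}}$ is defined as

$$R^2_{\mathrm{adj}} = 1 - \frac{N - 1}{N - P}\left(1 - R^2\right) \qquad (3)$$

with

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2},$$

where $N$ is the number of observed data points, $P$ is the number of model parameters, $y_i$ and $\hat{y}_i$ are the observed and predicted cumulative mass fractions, respectively, and $\bar{y}$ is the mean of the observed values. A further index was Akaike’s information criterion (AIC) [35], which weighs the complexity of the model against its goodness of fit to the sample data and thereby discourages overfitting. AIC is defined as

$$\mathrm{AIC} = N \ln\left(\frac{\mathrm{RSS}}{N}\right) + 2P \qquad (4)$$

with

$$\mathrm{RSS} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2,$$

where RSS is the residual sum of squares. The geometric mean error ratio (GMER) was used to test whether the model predictions underestimate (values greater than 1) or overestimate (values smaller than 1) the observations:

$$\mathrm{GMER} = \exp\left(\frac{1}{N} \sum_{i=1}^{N} \ln \varepsilon_i\right), \qquad \varepsilon_i = \frac{y_i}{\hat{y}_i}. \qquad (5)$$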
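The three indices can be computed as in the following Python sketch. It rests on assumptions that should be checked against the original equations: the adjusted coefficient of determination uses the denominator $N - P$, AIC is taken in the common least-squares form $N \ln(\mathrm{RSS}/N) + 2P$, and the GMER error ratio is taken as observed over predicted so that values above 1 indicate underestimation. The function names and the sample numbers are hypothetical.

```python
import math


def rss(obs, pred):
    """Residual sum of squares between observed and predicted fractions."""
    return sum((y - f) ** 2 for y, f in zip(obs, pred))


def adjusted_r2(obs, pred, n_params):
    """Adjusted R^2 with denominator N - P (assumed form)."""
    n = len(obs)
    if n - n_params <= 0:
        return float("nan")  # undefined when N - P is not positive
    mean = sum(obs) / n
    tss = sum((y - mean) ** 2 for y in obs)
    r2 = 1.0 - rss(obs, pred) / tss
    return 1.0 - (n - 1) / (n - n_params) * (1.0 - r2)


def aic(obs, pred, n_params):
    """Least-squares AIC, N ln(RSS / N) + 2P (assumed form); needs RSS > 0."""
    n = len(obs)
    return n * math.log(rss(obs, pred) / n) + 2 * n_params


def gmer(obs, pred):
    """Geometric mean of observed / predicted (assumed ratio direction)."""
    logs = [math.log(y / f) for y, f in zip(obs, pred)]
    return math.exp(sum(logs) / len(logs))


# Hypothetical cumulative mass fractions for one sample.
obs = [0.20, 0.45, 0.70, 0.90, 1.00]
pred = [0.22, 0.43, 0.71, 0.88, 1.00]
print(adjusted_r2(obs, pred, 3), aic(obs, pred, 3), gmer(obs, pred))
```

With four data points and a four-parameter model, `adjusted_r2` returns NaN because $N - P = 0$, whereas `aic` can still be evaluated as long as the fit is not exact.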

The “soiltexture” package of the R language was developed to fit the parametric models to the raw data [36]. The optimization methods in the “optim” function of R were used, including the Nelder-Mead, quasi-Newton, conjugate-gradient, box-constrained, and simulated annealing algorithms [37]. We ran all methods and kept the result with the minimum RSS. The nonlinear optimization procedures were carried out using at least five random initial parameter estimates for all soils. When the runs for a soil converged to different parameter values, the parameter values with the best fitting statistic (i.e., the smallest RSS) were kept. In most cases, the parameters were similar.
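The multi-start strategy described above can be sketched in plain Python. This is not the procedure used in the study: a simple compass (pattern) search stands in for the “optim” algorithms, and the two-parameter sigmoid curve is a hypothetical model chosen for brevity rather than one of the models in Table 2. The observations are synthetic, generated from known parameters so that the recovered fit can be checked.

```python
import math
import random


def model(d, params):
    """Hypothetical sigmoid PSD curve F(d) = 1 / (1 + (d50 / d) ** n)."""
    d50, n = params
    z = n * math.log(d50 / d)
    if z > 700.0:
        return 0.0  # avoid overflow in exp for extreme trial parameters
    return 1.0 / (1.0 + math.exp(z))


def rss(params, sizes, obs):
    """Residual sum of squares; infinite outside the valid parameter range."""
    if params[0] <= 0.0 or params[1] <= 0.0:
        return float("inf")
    return sum((y - model(d, params)) ** 2 for d, y in zip(sizes, obs))


def compass_search(start, sizes, obs, step=0.5, tol=1e-8, max_iter=10000):
    """Derivative-free local search (a stand-in for the optim() methods)."""
    best, best_rss = list(start), rss(start, sizes, obs)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(best)):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[k] += sign * step * max(abs(best[k]), 1e-3)
                trial_rss = rss(trial, sizes, obs)
                if trial_rss < best_rss:
                    best, best_rss, improved = trial, trial_rss, True
        if not improved:
            step *= 0.5  # shrink the step when no neighbour is better
    return best, best_rss


def multistart_fit(sizes, obs, n_starts=5, seed=0):
    """Run the local search from random starts and keep the smallest RSS."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_starts):
        start = [10 ** rng.uniform(-3, 0), rng.uniform(0.2, 3.0)]
        fits.append(compass_search(start, sizes, obs))
    return min(fits, key=lambda fit: fit[1])


sizes = [0.002, 0.02, 0.2, 2.0]                 # mm, hypothetical limits
obs = [model(d, (0.05, 0.8)) for d in sizes]    # synthetic observations
params, best_rss = multistart_fit(sizes, obs)
print(params, best_rss)
```

Keeping the run with the smallest RSS guards against a single start ending in a poor local minimum, which matters more as the number of parameters approaches the number of data points.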

2.4. Conversion to the USDA Standard

The soil data were converted to the USDA standard, that is, to the USDA soil fractions: the clay fraction (<0.002 mm), the silt fraction (0.002–0.05 mm), and the sand fraction (0.05–2 mm).

After estimating the parameters of each model for every soil sample, the predicted cumulative mass fraction can be obtained at any point between the upper and lower limits of the three soil fraction schemes, so the fractions in the USDA standard can be calculated. For each sample, the result of the model with the largest $R^2_{\mathrm{adj}}$ value was adopted in the conversion to the USDA standard. For the T1 scheme, the clay fraction was already known, the silt fraction was the predicted cumulative fraction at the 0.05 mm limit minus the clay fraction, and the sand fraction was calculated by subtracting the silt and clay fractions from 100%. For the T2 and T3 schemes, the clay fraction was predicted by PSD models, and the silt fraction was calculated by subtracting the clay fraction from the cumulative fraction at the observed 0.05 mm limit.

To predict the cumulative mass fraction at the 2 mm limit, the fractal method was adopted. In recent years, the PSD of soil samples has been successfully studied by fractal methods. Turcotte [16] and Tyler and Wheatcraft [17] developed a relationship between mass measurements and diameters for the analysis of PSD, which is expressed as

$$\frac{M(r < R)}{M_T} = \left(\frac{R}{R_L}\right)^{3 - D}, \qquad (6)$$

where $M(r < R)$ is the mass of soil particles with a radius smaller than a prespecified size $R$; $M_T$ is the total mass of particles; $R_L$ is the upper particle size limit; and $D$ is the fractal dimension of fragmentation. The fractal dimension of fragmentation can be found by estimating the slope coefficient of the $\log(M(r < R)/M_T)$ versus $\log R$ plot. Many studies using detailed experimental data have shown multiple fractal dimensions on log-transformed PSDs of soil samples. Wu et al. [38] found three domains within PSDs determined over six orders of magnitude in particle size. Bittelli et al. [39] reported that three main power-law domains could characterize the PSD across the whole range of measurements, including a sand domain extending up to 2 mm. Millán et al. [40] divided the whole PSD domain into two fractal domains, one of which spanned approximately one order of magnitude. Prosperini and Perugini [41, 42] found that all soils in their analysis displayed two scaling domains in the range from 0.0014 to 50.8 mm. According to these studies, the log-log plot of cumulative fraction against particle size is linear in the domain from 0.5 to 2 mm. The relationship between the cumulative fraction at 2 mm and the cumulative fractions at 1 and 0.5 mm can therefore be derived from (6). Because the size ratios 2/1 and 1/0.5 are equal, the equation is

$$P_2 = \frac{P_1^2}{P_{0.5}}, \qquad (7)$$

where $P_2$, $P_1$, and $P_{0.5}$ are the cumulative fractions with a diameter smaller than 2, 1, and 0.5 mm, respectively. $P_1$ and $P_{0.5}$ were predicted by PSD models. All three fractions were divided by the predicted value at 2 mm so that the final cumulative value was 100 percent. We did not apply the fractal model alone to do the conversion, because there are not enough observed data points to determine the subdomains and fractal dimensions needed for the interpolation of each soil sample in this study.
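The extrapolation and rescaling steps can be sketched in Python as follows. The sketch assumes a single power law between 0.5 and 2 mm, so that the cumulative fraction is proportional to the size raised to $3 - D$; since 2/1 equals 1/0.5, the cumulative fraction below 2 mm is then the square of the fraction below 1 mm divided by the fraction below 0.5 mm. The function names and the input percentages are hypothetical.

```python
import math


def extrapolate_to_2mm(p1, p05):
    """Cumulative fraction below 2 mm from those below 1 mm and 0.5 mm."""
    return p1 ** 2 / p05


def fractal_dimension(p1, p05):
    """Fractal dimension D from the log-log slope (3 - D) over 0.5 to 1 mm."""
    slope = (math.log(p1) - math.log(p05)) / (math.log(1.0) - math.log(0.5))
    return 3.0 - slope


def usda_fractions(p_0002, p_005, p1, p05):
    """Clay, silt, and sand (%) rescaled so that the 2 mm value is 100%.

    p_0002 and p_005 are cumulative percentages below 0.002 and 0.05 mm.
    """
    p2 = extrapolate_to_2mm(p1, p05)
    clay = 100.0 * p_0002 / p2
    silt = 100.0 * (p_005 - p_0002) / p2
    sand = 100.0 - clay - silt  # equals 100 * (p2 - p_005) / p2
    return clay, silt, sand


# Hypothetical model-predicted cumulative percentages for one sample.
p_0002, p_005, p05, p1 = 25.0, 60.0, 92.0, 98.0
p2 = extrapolate_to_2mm(p1, p05)
clay, silt, sand = usda_fractions(p_0002, p_005, p1, p05)
print(p2, fractal_dimension(p1, p05), clay, silt, sand)
```

Because the extrapolated value at 2 mm exceeds the 1 mm value, dividing by it shrinks the clay and silt percentages slightly and assigns the remainder to sand.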

3. Results and Discussion

3.1. Model Performance

Across all three soil fraction schemes and all models, values of $R^2_{\mathrm{adj}}$ ranged from 0.873 to 1 (Figure 1). In the T1 scheme of ISSS, the S and W models showed the highest values; the F3P, ML, and ORL models gave similar performance; and the other four models (ONL, VG, VGM, and SELF) had lower values. In the two Katschinski’s schemes (i.e., T2 and T3), the performances of the PSD models were quite similar to each other. The S model with two parameters yielded the highest values. The AD, F3P, F4P, ML, and W models showed similar performance. The other five models had lower values. The F4P model with four parameters performed only slightly better than the F3P model with three parameters in both Katschinski’s schemes, which indicates that F3P is sufficient for most soils and has the advantage of one fewer parameter [11]. The ONL model had lower values in the T1 scheme of ISSS than in Katschinski’s schemes, while the ORL model performed better in the T1 scheme. The performance of the AD model improved from the T2 scheme to the T3 scheme as one limit (0.25 mm) of PSD is omitted in T3, while the opposite happened to the SELF model. The results of the AIC test were generally consistent with those of the $R^2_{\mathrm{adj}}$ assessment (Figures 1 and 2). The performance of the AD and F4P models, which have four parameters, can be evaluated in the T1 scheme by AIC but not by $R^2_{\mathrm{adj}}$, because the denominator (i.e., $N - P$) in (3) is zero. The AD model had the lowest AIC, while the F4P model had an AIC similar to that of the F3P model. Although AIC penalizes the number of parameters, the AD model with four parameters and the W model with three parameters were still among the best PSD models.

Although the performances evaluated by $R^2_{\mathrm{adj}}$ and AIC were quite similar on the whole (Figures 1 and 2), the best model for a specific soil can differ between the two criteria. Figure 3 shows the percentage of cases in which a model was the best according to $R^2_{\mathrm{adj}}$ and AIC for soils of the T3 scheme. The percentage of best cases for the AD and F4P models was smaller according to $R^2_{\mathrm{adj}}$ than according to AIC, while the opposite happened to the S and ONL models.
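The percentage of best cases per criterion can be tallied as in the following Python sketch. The per-sample scores are invented for illustration and are not results from this study; the function name `best_model_shares` and the dictionary layout are likewise assumptions.

```python
from collections import Counter


def best_model_shares(samples, criterion, higher_is_better):
    """Percentage of samples for which each model is best under a criterion.

    samples: list of dicts mapping model name -> {"r2_adj": ..., "aic": ...}.
    """
    pick = max if higher_is_better else min
    winners = Counter()
    for scores in samples:
        winners[pick(scores, key=lambda name: scores[name][criterion])] += 1
    return {name: 100.0 * n / len(samples) for name, n in winners.items()}


# Invented scores for four samples and two models (illustration only).
samples = [
    {"AD": {"r2_adj": 0.990, "aic": -40.0}, "S": {"r2_adj": 0.995, "aic": -38.0}},
    {"AD": {"r2_adj": 0.980, "aic": -35.0}, "S": {"r2_adj": 0.970, "aic": -30.0}},
    {"AD": {"r2_adj": 0.9990, "aic": -50.0}, "S": {"r2_adj": 0.9991, "aic": -45.0}},
    {"AD": {"r2_adj": 0.950, "aic": -20.0}, "S": {"r2_adj": 0.960, "aic": -25.0}},
]
by_r2 = best_model_shares(samples, "r2_adj", higher_is_better=True)
by_aic = best_model_shares(samples, "aic", higher_is_better=False)
print(by_r2, by_aic)
```

The largest adjusted coefficient of determination and the smallest AIC are the winning values, so the two tallies can disagree for the same set of fits, as they do in this invented example.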

Figure 4 shows GMER for all three soil fraction schemes and all models. Most models overestimated the fractions, while the S and VG models generally underestimated them. The VGM model overestimated in the T1 scheme but underestimated in T2 and T3, while the opposite happened to the ONL model. The AD and W models showed the best results, without apparent underestimation or overestimation.

Overall, the AD, S, and W models provided the best performance for soils in the ISSS and Katschinski’s schemes according to the $R^2_{\mathrm{adj}}$ analysis, the AIC tests, and GMER. The F4P, F3P, and ML models also displayed relatively good performance.

3.2. The Validation of Combined Use of PSD Model and Fractal Method

The PSD models used for interpolation and the fractal method used for extrapolation rest on different statistical assumptions. We therefore need to know whether the combined use of two distinct models yields a realistic representation of a soil sample. To validate the combined use, we used 20 soil samples that have observed cumulative particle-size percentages at the 2 mm, 0.05 mm, and 0.002 mm limits of the USDA standard and at the limits of the T2 scheme. Only these samples were available with both sets of measurements. We used each PSD model combined with the fractal method to predict the PSD from the data points of the T2 scheme. Figure 5 illustrates the suitability of the combined method for predicting the percentages of particles finer than 0.05 and 0.002 mm. The resulting coefficient of determination was 0.960, which was slightly lower than those of the PSD models. Thus, the combined method with interpolation and extrapolation is suitable for transferring data from Katschinski’s scheme to the USDA scheme. However, it cannot be determined whether this method is suitable for a specific soil textural class, because the sample size is too small.

3.3. Effect of Soil Textural Classes and Fraction Schemes

The AIC analysis revealed that the performance of a PSD model can be affected by soil textural classes and soil fraction schemes (Tables 3 and 4). The dataset used in this study represents all 12 textural classes of the ISSS system and 10 textural classes of the Katschinski system, but to varying degrees (Tables 3 and 4). The dominant textural classes in the ISSS scheme are light clay (25.7%), clay loam (21.7%), sandy loam (15%), and loam (9%). The dominant textural classes in the T2 scheme are light clay (40.4%), moderate clay (27.6%), heavy loam (14.8%), and heavy clay (12%). The dominant textural classes in the T3 scheme are light clay (33.6%), moderate clay (21%), heavy loam (19%), and moderate loam (11.3%). Table 3 shows, for each soil textural class of ISSS in the T1 scheme, the number of cases in which each model was the best, that is, had the smallest AIC value. Every model performed best in some cases. The AD model had the largest number of best cases for most textural classes, while the S model did so for sand and silty clay. Overall, the AD model had the largest number of best cases, followed by the S model. Although the S and W models performed similarly in the $R^2_{\mathrm{adj}}$ and AIC analyses (Figures 1 and 2), the S model had more cases with the smallest AIC value than the W model for all soil textural classes. Although the SELF model did not perform well in the overall analysis (Figures 1 and 2), it had a number of best cases similar to that of the W model. These results suggest that the PSD models that perform well for soil PSDs overall are not always good models for each soil textural class. Table 4 shows the number of cases in which each tested PSD model had the smallest AIC value for each soil textural class in the T2 and T3 schemes. In the T2 scheme (Table 4(a)), most soils are clay soils in the Katschinski system (i.e., heavy clay, moderate clay, and light clay). The AD model had the largest number of cases with the smallest AIC values for these clay soils. However, the S model performed better for heavy loam. In the T3 scheme (Table 4(b)), the AD model also had the largest number of cases with the smallest AIC values for almost all soil textures, though the F4P, S, and W models also had a large number of such cases. From the comparisons among the T1, T2, and T3 schemes, it appears that the performance of PSD models varies not only with the soil textural classes but also with the soil fraction schemes. However, the dataset used in this work does not seem sufficient to draw a well-founded conclusion for some textural classes with a small number of samples, especially in the T2 and T3 schemes.

The performance of PSD models appears to be affected by the clay content of soils in different ways. In the T2 scheme (Figure 6), the clay content is defined as particles smaller than 0.001 mm in the Katschinski system. All models performed best with a clay content between 30% and 40%; the AD model had the smallest $R^2_{\mathrm{adj}}$ values with a clay content between 10% and 20%, and the other models had the smallest values with a clay content larger than 40%; the S and W models improved with increasing clay content but were very poor with a clay content over 40%. However, the performance of these models at different clay contents in the T1 and T3 schemes is quite different from that in the T2 scheme (results not shown here). Hwang et al. [15] found better fitting with increasing clay content. Botula et al. [23] observed no clear trend of performance with clay content using a more clayey soil database. Our results generally confirmed the findings of Botula et al. [23].

4. Conclusions

Eleven models for soil PSD were compared (the AD, F4P, F3P, ML, ONL, ORL, S, SELF, VG, VGM, and W models) using data from one ISSS scheme (i.e., T1) with four data points and two Katschinski’s schemes (i.e., T2 and T3) with five and six data points. The results from the $R^2_{\mathrm{adj}}$ values, AIC, and the GMER test indicated that the AD, F4P, F3P, S, and W models had the best performance in the three schemes. The combined use of a PSD model and the fractal method was suitable for transferring soil texture data from Katschinski’s scheme to the USDA scheme when only sparse observed data points are available. Soil texture can affect the performance of PSD models, and the performance also differs between experimental data from different soil fraction schemes. The clay content of soils can affect the performance of PSD models in quite different ways under different soil fraction schemes. This work contributes to the selection of PSD models for describing the soil particle-size distribution curve, which can be used to estimate hydraulic properties and to compare and convert between different soil texture classification systems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the Chinese NSFC under Grant 41205037, MOST no. 2010CB951802, and the Fundamental Research Funds for the Central Universities. A group of students at Beijing Normal University assisted with the data collection. This research was also partially supported by Plan Nacional de Investigación Científica (I+D+i) under reference AGL2011/25175 and by DGUI (Comunidad de Madrid) and UPM under reference QM100245066.