#### Abstract

Geotechnical models are usually built upon assumptions and simplifications, inevitably resulting in discrepancies between model predictions and measurements. To enhance prediction accuracy, geotechnical models are typically calibrated against measurements by introducing additional empirical or semiempirical correction terms. Different approaches have been used in the literature to determine the optimal values of the empirical parameters in the correction terms. When measured data are abundant, calibration outcomes using different approaches can be expected to be practically the same. However, if measurements are scarce, calibration outcomes could differ significantly, depending largely on the adopted calibration approach. In this study, we examine the two most commonly used approaches for geotechnical model calibration in the literature, namely, (1) the purely data-catering (PDC) approach and (2) the root mean squared error (RMSE) approach. Here, the purely data-catering approach refers to the selection of empirical parameter values that minimize the coefficient of variation of the model factor while maintaining its mean value at one, based solely on measured data. A real case of calibrating the Federal Highway Administration (FHWA) simplified facing load model for the design of soil nail walls is used to illustrate the differences in practical calibration and design outcomes between the two approaches under scarce data conditions.

#### 1. Introduction

It has been well recognized that model uncertainty plays a key role in reliability-based design of geotechnical structures [1–5], as it is usually much larger than the uncertainty associated with design parameters (e.g., soil cohesion, unit weight, and internal friction angle). Typically, geotechnical models need to be assessed and calibrated against measured or observed data before being used for design. However, in many cases, the measurements or observations available for model assessment and calibration are limited, mainly for two reasons: first, obtaining in situ geotechnical data is generally costly and time consuming; and second, monitored data have often been undervalued and thus not well collected and pooled, although in recent years geotechnical engineers have started to realize the value of data and to make efforts to make the best use of it [6–13].

Despite this situation, assessing and calibrating geotechnical models using limited data is far better than doing nothing at all [3]. Usually, calibration of a geotechnical model can be done in two steps: (1) introduce an empirical or semiempirical correction term to the model and (2) determine the constants in the correction term according to certain criteria. Based on the calibration criteria, two methods have been widely adopted for geotechnical model calibration in the literature. One is the purely data-catering (PDC) approach, and the other is the root mean squared error (RMSE) approach.

The PDC approach calibrates a model by adjusting the constants to satisfy two criteria: keeping the mean of bias equal to one while minimizing the coefficient of variation (COV) of bias. Here, bias is defined as the ratio of the measured to the predicted value. Bias, also referred to as model bias or model factor elsewhere, is commonly treated as a random variable and used as an indicator for quantifying model accuracy. Previous geotechnical model calibration using the PDC approach can be seen in, e.g., Lin and Liu [14], Lin et al. [15], Lin et al. [16], Phoon and Kulhawy [17], Phoon and Tang [18], Tang and Phoon [19], Yuan et al. [20], and Yuan et al. [21].

On the other hand, the RMSE approach determines the constants as the set that minimizes the root mean squared error between measurements and predictions. Common examples are geotechnical models developed using the response surface method and machine learning methods, e.g., Bathurst and Yu [22], Lin et al. [16], Liu et al. [23], Yu and Bathurst [24], Zhang et al. [25], Zhang et al. [26], Zhang et al. [27], and Zhang et al. [28]. These two calibration approaches are not equivalent, especially when the data for calibration are scarce. They result in different calibrated values for the constants, which in turn lead to different geotechnical design outcomes. The differences between the two approaches are discussed later in this study.

Although the two approaches have been extensively used, their influences on geotechnical calibration outcomes and the resulting consequences have not yet been thoroughly examined. To fill this gap, this study investigates the differences in both calibration and practical design outcomes that result from using the two calibration approaches. To allow model competence to be compared from a third angle, the Bayesian information criterion (BIC) is employed [29]. A case study of calibrating the default Federal Highway Administration (FHWA) facing load model for the facing design of soil nail walls is presented to illustrate the influences.

#### 2. Approaches for Model Calibration and Ranking

This section introduces in detail the PDC and RMSE approaches for calibrating geotechnical models against observed data. The commonalities and differences of the two approaches are discussed. A likelihood-based model ranking method, the Bayesian information criterion (BIC), is also introduced; it is used later to quantify the competence of the calibrated models.

##### 2.1. Purely Data-Catering Approach

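Suppose $y_p = f(\mathbf{x})$ is the geotechnical model to be calibrated. Here, $y_p$ is the model output, which is taken to be a scalar for simplicity, and $\mathbf{x}$ is the model input parameter vector, $\mathbf{x} = (x_1, x_2, \ldots, x_m)$. Note that the input parameter vector is the same across different design scenarios, while the values of its elements could vary. For convenience, we denote $\mathbf{x}_i$ and $y_{p,i} = f(\mathbf{x}_i)$ for design scenario $i$, where $i = 1, 2, \ldots, n$.

Let $y_{m,i}$ be the observed or measured value for $y_{p,i}$ from real cases, and $\lambda_i = y_{m,i}/y_{p,i}$ be the model factor (which is a random variable) for $f(\mathbf{x})$. Then, based on the method of moments, the sample mean and sample standard deviation of $\lambda$, denoted as $\mu_\lambda$ and $\sigma_\lambda$, respectively, can be computed as

$$\mu_\lambda = \frac{1}{n}\sum_{i=1}^{n}\lambda_i, \qquad \sigma_\lambda = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\lambda_i - \mu_\lambda\right)^2}$$

The purely data-catering (PDC) approach assumes that a model can be adequately (albeit as a compromise) calibrated in terms of $\mu_\lambda$ and $\sigma_\lambda$ obtained by the method of moments. The PDC approach introduces an empirical correction term, $\delta$, to the original model for calibration purposes, where $\delta$ is a function of $\mathbf{x}$ and $\boldsymbol{\theta}$, with $\boldsymbol{\theta}$ being a vector of empirical constants to be determined using the measured values $y_{m,i}$. In general, the correction term can be written as $\delta = \delta(\mathbf{x}; \boldsymbol{\theta})$. As such, the calibrated model can be expressed as

$$y_c = \delta(\mathbf{x}; \boldsymbol{\theta})\, f(\mathbf{x})$$

where $y_c$ is the model output after calibration. The sample mean and sample standard deviation of the model factor for the calibrated model, denoted as $\lambda_c = y_m / y_c$, can then be calculated as

$$\mu_{\lambda_c} = \frac{1}{n}\sum_{i=1}^{n}\lambda_{c,i} \tag{3}$$

$$\sigma_{\lambda_c} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\lambda_{c,i} - \mu_{\lambda_c}\right)^2} \tag{4}$$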

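The calibration principle of the PDC approach is to adjust the values of the empirical constants in the correction term until two criteria are simultaneously satisfied: (1) the sample mean of the calibrated model factor, $\mu_{\lambda_c}$, is equal to 1, and (2) its sample standard deviation, $\sigma_{\lambda_c}$, is minimized. Evidently, when the data available for calibration are limited, i.e., $n$ in equations (3) and (4) is not sufficiently large, both $\mu_{\lambda_c}$ and $\sigma_{\lambda_c}$ could be significantly influenced by individual inputs $\mathbf{x}_i$ and measurements $y_{m,i}$, resulting in unstable calibration outcomes.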
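The two PDC criteria can be illustrated with a minimal sketch. It assumes a hypothetical power-form multiplicative correction, `a * z**b`, where `z` is a dimensionless input; the data, the names `bias_stats` and `calibrate_pdc`, and the grid search are illustrative assumptions, not the procedure used in this study. The sketch relies on the fact that the bias COV does not depend on the scale constant `a`, so `b` can be searched first and `a` set afterwards to bring the mean bias to one.

```python
import statistics


def bias_stats(measured, predicted):
    """Return (sample mean, sample COV) of bias = measured / predicted."""
    bias = [m / p for m, p in zip(measured, predicted)]
    mean = statistics.fmean(bias)
    cov = statistics.stdev(bias) / mean  # stdev uses the n - 1 denominator
    return mean, cov


def calibrate_pdc(measured, predicted, z, b_grid):
    """Grid-search b to minimize the bias COV, then set a so the mean bias is 1.

    Assumed (hypothetical) correction term: delta = a * z**b.
    """
    best = None
    for b in b_grid:
        shaped = [p * zi ** b for p, zi in zip(predicted, z)]
        # The COV of the bias is unaffected by the scale constant a.
        mean_b, cov_b = bias_stats(measured, shaped)
        if best is None or cov_b < best[2]:
            # Choosing a = mean_b makes the calibrated mean bias equal to 1.
            best = (mean_b, b, cov_b)
    return best  # (a, b, minimal COV)


# Hypothetical data, for illustration only.
z = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0]
predicted = [20.0, 35.0, 40.0, 55.0, 60.0, 90.0]
measured = [22.0, 30.0, 31.0, 38.0, 36.0, 45.0]

b_grid = [i / 100 for i in range(-300, 301)]
a, b, cov_c = calibrate_pdc(measured, predicted, z, b_grid)
calibrated = [a * zi ** b * p for zi, p in zip(z, predicted)]
mean_c, cov_check = bias_stats(measured, calibrated)
print(f"a={a:.4f}, b={b:.2f}, mean bias={mean_c:.4f}, COV={cov_check:.4f}")
```

With only six points, removing or perturbing a single pair changes the selected `b` noticeably, which is the instability under scarce data noted above.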

##### 2.2. Root Mean Squared Error (RMSE) Approach

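This method assesses the accuracy of a model by computing the root mean squared error (RMSE) between model predictions (i.e., $y_{p,i}$ or $y_{c,i}$) and measured (true) values $y_{m,i}$. The RMSE can be computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{m,i} - y_{p,i}\right)^2}$$

for the original model, i.e., $y_p = f(\mathbf{x})$, and

$$\mathrm{RMSE}_c = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{m,i} - y_{c,i}\right)^2}$$

for the calibrated (corrected) model, i.e., $y_c = \delta(\mathbf{x}; \boldsymbol{\theta})\, f(\mathbf{x})$.

The calibration principle of the RMSE method is to select the $\boldsymbol{\theta}$ that minimizes $\mathrm{RMSE}_c$. Note that this method does not necessarily result in $\mu_{\lambda_c} = 1$ with minimal $\mathrm{COV}_{\lambda_c}$, where $\mathrm{COV}_{\lambda_c}$ is the coefficient of variation of the calibrated model factor $\lambda_c = y_m / y_c$. Although the RMSE method has been widely used for geotechnical model development and calibration, it does not provide an intuitive impression of model accuracy.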
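A comparable sketch for the RMSE criterion is given below, again assuming a hypothetical power-form correction `a * z**b` and made-up data; the function names are illustrative. For a fixed exponent `b`, the scale `a` that minimizes the squared error has a closed-form least-squares solution, so only `b` needs to be searched.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error between measurements and predictions."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def calibrate_rmse(measured, predicted, z, b_grid):
    """Grid-search b; for each b use the least-squares optimal scale a.

    Assumed (hypothetical) correction term: delta = a * z**b.
    """
    best = None
    for b in b_grid:
        shaped = [p * zi ** b for p, zi in zip(predicted, z)]
        # Closed-form minimizer of sum((m - a * s)**2) over a.
        a = sum(m * s for m, s in zip(measured, shaped)) / sum(s * s for s in shaped)
        err = rmse(measured, [a * s for s in shaped])
        if best is None or err < best[2]:
            best = (a, b, err)
    return best  # (a, b, minimal RMSE)


# Hypothetical data, for illustration only.
z = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0]
predicted = [20.0, 35.0, 40.0, 55.0, 60.0, 90.0]
measured = [22.0, 30.0, 31.0, 38.0, 36.0, 45.0]

b_grid = [i / 100 for i in range(-300, 301)]
a, b, rmse_c = calibrate_rmse(measured, predicted, z, b_grid)
rmse_0 = rmse(measured, predicted)
print(f"a={a:.4f}, b={b:.2f}, RMSE before={rmse_0:.3f}, after={rmse_c:.3f}")
```

The constants selected this way generally do not give a mean bias of exactly one, which is the nonequivalence with the PDC criteria noted above.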

##### 2.3. Model Competence Ranking

The Bayesian information criterion (BIC) is a relative measure of goodness-of-fit among models given observed data and has been widely used in ranking model competence [29, 30]. The BIC is computed as

$$\mathrm{BIC} = k \ln(n) - 2\ln(\hat{L}) \tag{7}$$

where $\ln(\hat{L})$ is the log of the maximum likelihood, $k$ is the number of model parameters, and $n$ is the number of data points.

Typically, the maximum value of the likelihood function and the corresponding maximum likelihood estimators can be found using numerical methods. Technical descriptions of the maximum likelihood estimation method, e.g., the construction of a likelihood function, are not provided here for brevity. Interested readers are directed to, e.g., Juang et al. [31] and Lin and Liu [14] for more details.

The criterion simply states that the smaller the BIC value of a fitted model, the better the model captures the observations. It should be emphasized that the absolute value of the BIC itself is meaningless in terms of model competence; only the difference between BICs helps rank the models.
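As a rough illustration of how a maximum log-likelihood and the BIC could be obtained from a set of bias values, the sketch below uses the closed-form maximum likelihood estimators for normal and log-normal distributions. The bias sample is hypothetical, the function names are illustrative, and fitting a two-parameter distribution directly to the bias values is an assumption made here for simplicity rather than the likelihood construction of [14, 31].

```python
import math


def normal_max_loglik(sample):
    """Maximum log-likelihood of a normal fit (MLE variance uses n, not n - 1)."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((s - mu) ** 2 for s in sample) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_max_loglik(sample):
    """Maximum log-likelihood of a log-normal fit (normal fit to the logs)."""
    logs = [math.log(s) for s in sample]
    # Change of variables adds the Jacobian term -sum(ln x_i).
    return normal_max_loglik(logs) - sum(logs)


def bic(max_loglik, k, n):
    """BIC = k ln(n) - 2 ln(L_hat)."""
    return k * math.log(n) - 2.0 * max_loglik


# Hypothetical bias values, for illustration only.
bias = [0.62, 0.71, 0.85, 0.93, 1.02, 1.10, 1.24, 1.53]
n = len(bias)
bic_normal = bic(normal_max_loglik(bias), 2, n)
bic_lognormal = bic(lognormal_max_loglik(bias), 2, n)
print(f"BIC normal={bic_normal:.2f}, BIC log-normal={bic_lognormal:.2f}")
```

Only the difference between the two printed values is meaningful; the distribution with the smaller BIC is the preferred description of the same data.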

##### 2.4. Discussion

Although the above two approaches have been widely adopted for geotechnical model calibrations based on observed data, there are some fundamental differences in calibration outcomes and interpretations, as given in Table 1.

The PDC approach uses two indicators to jointly describe the model accuracy, i.e., the sample mean and sample COV of the bias, $\mu_\lambda$ and $\mathrm{COV}_\lambda$, whereas the RMSE approach uses only one indicator, the RMSE. Typically, $\mu_\lambda$ is interpreted as an indicator of on-average accuracy, while $\mathrm{COV}_\lambda$ is taken as an indicator of dispersive accuracy. The advantage of such an accuracy assessment scheme is that it provides an immediate and general idea of the performance of a model. For example, $\mu_\lambda = 0.9$ and $\mathrm{COV}_\lambda = 0.3$ suggest that overall model predictions are about 10% larger than the corresponding observations, while the dispersion in prediction accuracy is 30%. Clearly, a perfect model would have $\mu_\lambda = 1$ and $\mathrm{COV}_\lambda = 0$. The disadvantage is the limited ability to compare accuracies among models. This can be easily seen for two models, A and B, where A has a mean bias closer to one but a larger COV than B. In such a case, model A has a better on-average accuracy, but its predictions are more dispersive, whereas model B has a poorer on-average accuracy, but its predictions spread less. Hence, it is difficult to determine directly which model is more accurate without further analysis.

For the RMSE approach, the smaller the RMSE, the better the model accuracy. For a perfect model, $\mathrm{RMSE} = 0$. As this method uses a single index for model assessment, accuracy comparison among models is straightforward. However, this approach does not provide an intuitive sense of the accuracy of the model itself.

Another difference lies in the criteria set for calibration. The PDC approach minimizes the standard deviation or COV of the calibrated bias, $\sigma_{\lambda_c}$ or $\mathrm{COV}_{\lambda_c}$, conditioned on $\mu_{\lambda_c} = 1$. A mean of $\lambda_c$ equal to one indicates that the calibrated model is unbiased on average, within the context of the observations $y_m$. The objective of the RMSE approach is to minimize the RMSE of the calibrated model to obtain the best accuracy. As pointed out earlier, this is not necessarily equivalent to the criteria of the PDC approach.

Generally, both the PDC and RMSE approaches can only handle uncensored data. If observation data are censored, then more robust approaches such as the maximum likelihood method or the Bayesian inference technique can be employed. However, to do so, the maximum likelihood method and the Bayesian approach would first have to assume the probability distribution of the bias $\lambda$ so as to construct the likelihood function $L$. Therefore, the estimators by these approaches are conditioned on the distribution of $\lambda$. For the PDC and RMSE approaches, the sample mean and sample standard deviation of the bias and the RMSE are computed without any assumption on the distribution of $\lambda$.

Last, it is pointed out that while each measurement or observation is equally weighted in the PDC calibration approach, it is not the case in the RMSE calibration approach. For the RMSE approach, measurements with large values weigh much more than those with small values in the calibration process.
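The weighting difference can be seen from a two-point example with hypothetical values: both measurements are overpredicted by the same 20%, so they contribute identical bias values, yet the larger measurement dominates the sum of squared errors.

```python
# Two hypothetical measurements, both overpredicted by 20%.
measured = [10.0, 100.0]
predicted = [12.0, 120.0]

# Bias (measured / predicted) treats both points identically.
bias = [m / p for m, p in zip(measured, predicted)]

# Squared errors scale with the square of the magnitude.
sq_err = [(m - p) ** 2 for m, p in zip(measured, predicted)]
share = [e / sum(sq_err) for e in sq_err]

print("bias values:", [round(b, 4) for b in bias])
print("squared errors:", sq_err)
print("share of total squared error:", [round(s, 4) for s in share])
```

Here the tenfold larger measurement carries a hundredfold larger squared error, so an RMSE-based calibration is driven almost entirely by that point.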

#### 3. Case Study

This section presents a case study to illustrate the difference in model calibration outcomes using the two approaches. The example is to calibrate the default Federal Highway Administration (FHWA) simplified model for computing the facing loads of soil nail walls using a total of 23 measured data points. These data were collected by Liu et al. [32] from the literature. They correspond to facing loads monitored during or at the completion of wall construction. Hence, they should be interpreted as "short-term" facing loads.

In the following, the measured facing load database established by Liu et al. [32] is first introduced, followed by a brief review of the default FHWA simplified facing load model. Section 3.3 presents the calibration results along with comparisons and discussion. Note that calibration of the FHWA facing load model has been done by Liu et al. [32] using the PDC approach. Section 3.4 shows how the selection of a calibration approach affects the practical design of the facing of soil nail walls.

##### 3.1. Database of Short-Term Facing Loads

Figure 1 shows the side and front views of the facing of a typical soil nail wall. Nails are structurally connected to the facing at their heads. As the wall deforms, lateral active earth pressures act on the facing and are then transferred to the nails through the nail-facing connections. At equilibrium, a nail is responsible for the lateral earth pressure within a tributary area centered on the nail head. The product of the earth pressure and the tributary area is referred to as the nail head tensile load or facing load (Figure 1(a)) in this study. Here, the tributary area is the product of the horizontal and vertical nail spacings (Figure 1(b)).

**Figure 1:** (a) Side view and (b) front view of the facing of a typical soil nail wall.

Liu et al. [32] developed a database containing measured long-term and short-term facing loads. The short-term load data are extracted and briefly reviewed here for the reader's convenience. They collected 31 short-term load data points in total; however, 8 were identified as questionable and thus excluded from the analyses. Table 2 provides the wall geometry, soil properties, facing type, and nail spacing for the soil nail walls from which the remaining 23 data points were obtained.

The data were collected from five wall sections, ranging from 4 to 12 m high. All walls had vertical or steep facings and horizontal back slopes. Four walls were in cohesionless soils, while one was in cohesive soil. Two walls were subjected to surcharge, in addition to soil self-weight. The facings were constructed with shotcrete or concrete panels. Nail spacings were set between 1 and 2 m, which is typical. Readers are directed to Liu et al. [32] for a detailed description of the collected facing load data.

##### 3.2. FHWA Facing Load Models

The facing of a soil nail wall, as shown in Figure 1, can be simplified as a continuous two-way slab. According to the FHWA soil nail wall design manuals [33], under working conditions, the maximum facing load due to the lateral active earth pressure, $T_0$, can be calculated as

$$T_0 = F_s F_d K_a \left(\gamma H + q\right) S_h S_v \tag{8}$$

where $F_s$ is the empirical spacing factor expressed as $F_s = 0.6 + 0.2\,(S_{\max} - 1)$; $S_{\max}$ (unit: m) is the larger of the horizontal and vertical nail spacings, $S_h$ and $S_v$, respectively; $F_d$ is the empirical depth factor expressed as $F_d = 1.25\,h/H + 0.5$ for $0 < h/H \le 0.2$, $F_d = 0.75$ for $0.2 < h/H \le 0.7$, and $F_d = 2.03 - 1.83\,h/H$ for $0.7 < h/H \le 1$, where $h$ and $H$ are the depth and wall height, respectively; and $K_a$, $\gamma$, and $q$ are the Coulomb active earth pressure coefficient, soil unit weight, and surcharge, respectively.
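The piecewise depth factor described above can be sketched in code as follows. The depth factor follows the three ranges stated in the text. The spacing factor and the product form used in `facing_load` are assumptions based on a common statement of the FHWA simplified model and should be checked against the design manual [33]; all function names and input values are illustrative.

```python
def depth_factor(h_over_H):
    """Empirical depth factor for a normalized depth h/H in (0, 1]."""
    if not 0.0 < h_over_H <= 1.0:
        raise ValueError("h/H must lie in (0, 1]")
    if h_over_H <= 0.2:
        return 1.25 * h_over_H + 0.5
    if h_over_H <= 0.7:
        return 0.75
    return 2.03 - 1.83 * h_over_H


def spacing_factor(s_max):
    """Assumed spacing factor; s_max is the larger nail spacing in metres."""
    return 0.6 + 0.2 * (s_max - 1.0)


def facing_load(h, H, s_h, s_v, k_a, gamma, q=0.0):
    """Assumed form: T0 = F_s * F_d * K_a * (gamma * H + q) * s_h * s_v, in kN."""
    f_s = spacing_factor(max(s_h, s_v))
    f_d = depth_factor(h / H)
    return f_s * f_d * k_a * (gamma * H + q) * s_h * s_v


# Hypothetical inputs: 10 m wall, 1.2 m spacing, K_a = 0.3, gamma = 18 kN/m^3.
for depth in (1.0, 2.0, 5.0, 7.0, 9.0, 10.0):
    load = facing_load(depth, 10.0, 1.2, 1.2, 0.3, 18.0)
    print(f"h = {depth:4.1f} m  ->  T0 = {load:6.2f} kN")
```

The three branches of the depth factor meet at $h/H = 0.2$ and $h/H = 0.7$ with a value of about 0.75, so the predicted load has a trapezoidal profile with depth.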

The measured facing loads are plotted against the corresponding values predicted using equation (8), as shown in Figure 2(a). The data points in the figure appear to form two clusters: one around the 1 : 1 correspondence line and the other below the line. This suggests that the current default FHWA facing load model is conservative, as it generally overestimates the maximum facing loads. This observation is confirmed by the sample mean and sample COV of the bias of equation (8): on average, equation (8) overpredicts the maximum facing loads by 23%, and the predictions are highly dispersive according to the ranking scheme proposed by Phoon and Tang [18]. Furthermore, Figure 2(b) shows that the bias tends to decrease as the predicted facing load increases. The dependency is quantitatively confirmed at a significance level of 0.05 by the Spearman's rank correlation test results that are also given in the figure. Such a dependency is undesirable, and its effects on reliability-based geotechnical design have been investigated by Lin and Bathurst [34]. As a result, it is necessary to calibrate equation (8) to improve its accuracy.

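**Figure 2:** (a) Measured versus predicted facing loads using equation (8); (b) bias dependency, with Spearman's rank correlation test results.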

To identify the sources of model errors and the abovementioned dependency, the bias values are plotted against each input parameter of equation (8), as shown in Figure 3. Spearman's rank correlation test results show that the bias values are statistically correlated with three of the input parameters at a significance level of 0.05. Therefore, a correction term that is a function of these three parameters can be introduced to the equation for calibration. However, since two of these parameters are highly correlated with each other, only one of them is kept, while the other is removed from the formulation for simplification. Moreover, a further parameter is removed from the correction term to simplify the calibration, as the focus of this study is on the comparison of calibration approaches rather than model development. Last, a power-form expression is assumed for the correction term, which turns equation (8) into the calibrated model, equation (9). It contains two empirical constants to be determined using the 23 measured facing load data, together with a typical tributary area that is used to make the argument of the correction term dimensionless. The PDC and RMSE approaches discussed earlier in this study are then used to determine the values of the two constants. The results are shown and discussed in the next section.

##### 3.3. Analysis Results

Calibration of equation (9) is carried out in this section. The accuracy of the calibrated models is then compared based on the sample mean and sample COV of the bias and on the root mean squared errors (RMSEs). After that, the accuracy is re-examined from a maximum likelihood perspective, which helps in understanding the calibration outcomes.

###### 3.3.1. Comparisons between PDC and RMSE Approaches

Table 3 provides the calibration outcomes for equation (9) using the two approaches. By using the PDC approach, the two constants in equation (9) are determined as $a = 0.4964$ and $b = -1.0677$, which correspond to a sample mean of bias equal to one and a minimal sample COV of bias. The RMSE for this case is computed to be 38.3618. On the other hand, by the RMSE approach, the optimal $a$ and $b$ values are found to be 0.4866 and −1.3118, respectively. The minimal RMSE is 35.9635. The sample mean and sample COV of bias for this case are reported in Table 3. Several interesting observations can be made from these results.

First, these two approaches are not practically equivalent from the perspective of model calibration when the data volume available for the calibration is small. The resulting values of the first constant, $a$, are close (0.4964 versus 0.4866), whereas those of the second constant, $b$, are significantly different (−1.0677 versus −1.3118).

Second, based on the bias statistics of equation (9), the PDC approach appears to be superior, as its results are unbiased on average and less dispersive than those by the RMSE approach (i.e., 4% overestimation and 2% more dispersive). Interestingly, comparing the RMSEs leads to the opposite conclusion: the RMSE approach appears better than the PDC approach, as it gives a lower RMSE, 35.9635 against 38.3618. Therefore, it is difficult to judge which one is better based merely on the results given in Table 3. The comparison should be made from a third angle, which is discussed in the next subsection.

Last, both approaches are effective and efficient in model calibration for accuracy enhancement. The essence of calibration is to move the data points towards the 1 : 1 correspondence line in general, as shown in Figure 4. The difference is that the data points by the PDC approach seem to be more skewed, while those by the RMSE approach appear to be more uniform. Despite this, the data points by the two approaches follow the 1 : 1 correspondence line closely, and thus, the two methods in general do not lead to drastically different outcomes.

###### 3.3.2. Rescrutinization from a Maximum Likelihood Perspective

The accuracies of the three facing load models, i.e., default FHWA, calibrated FHWA by PDC, and calibrated FHWA by RMSE, are reassessed in this section using the maximum likelihood method. The cumulative distributions of the bias values of the default and calibrated models are shown in Figure 5. Kolmogorov–Smirnov tests are applied to the three bias datasets, and the results show that all the datasets can be considered as both normally and log-normally distributed at a significance level of 0.05. Hence, the maximum likelihood estimation (MLE) is carried out assuming both normal and log-normal bias for the default and calibrated models. The estimation outcomes are given in Table 4.

For both calibrated models, the means estimated by MLE are practically the same as the sample means, while the COVs estimated by MLE are slightly less than the sample COVs, by about 2–4%, which is practically negligible. However, for the default model case (i.e., equation (8)), the bias COV estimated by MLE assuming a log-normal distribution is much larger than the sample COV or that assuming a normal distribution, i.e., about 20% higher.

For the normal case, the computed maximum log-likelihood values are −12.8317 and −12.9868 for the PDC and RMSE approaches, respectively. For both models, the number of parameters is $k = 2$ and the number of data points is $n = 23$; by using equation (7), the BIC values are correspondingly calculated to be 31.93 and 32.24. This means that, if the calibrated bias is a normal random variable, then in this case the PDC calibration approach is better than the RMSE approach, as the BIC value for the former is smaller. On the other hand, for the log-normal case, the maximum log-likelihood values are −9.5680 if using the PDC approach and −8.6353 if using the RMSE approach. The corresponding BIC values are 25.41 and 23.54. This suggests that if the calibrated bias is a log-normal random variable, then model calibration using the RMSE approach is preferable. For both approaches, the BIC values for the log-normal case are always less than those for the normal case; hence, it can be said that the calibrated bias is more likely a log-normal random variable. Comparing the four BIC values, one would conclude that the RMSE-log-normal scenario is the best one, as its BIC is the smallest.
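As a check on the arithmetic, the four reported BIC values can be reproduced from the reported maximum log-likelihoods, assuming the standard form $\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})$ with $k = 2$ and $n = 23$, so that $k\ln(n) = 2\ln 23 \approx 6.2710$:

```latex
\begin{align*}
\text{PDC, normal:}      \quad & 6.2710 - 2(-12.8317) = 6.2710 + 25.6634 \approx 31.93 \\
\text{RMSE, normal:}     \quad & 6.2710 - 2(-12.9868) = 6.2710 + 25.9736 \approx 32.24 \\
\text{PDC, log-normal:}  \quad & 6.2710 - 2(-9.5680)  = 6.2710 + 19.1360 \approx 25.41 \\
\text{RMSE, log-normal:} \quad & 6.2710 - 2(-8.6353)  = 6.2710 + 17.2706 \approx 23.54
\end{align*}
```

Because $k$ and $n$ are the same for all four cases, the ranking is determined entirely by the maximum log-likelihood values.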

##### 3.4. Practical Influences

The analyses presented earlier show how the selection of a calibration approach affects the calibration outcomes, i.e., the values of the empirical constants. In this part, we investigate the influences of the calibration outcomes on the practical design of the facing of soil nail walls. The facing design must ensure adequate margins of safety against various limit states, including facing flexural, punching shear, and headed-stud tensile failures. For illustration purposes, only the facing flexural limit state is considered here, which requires estimation of the maximum facing load and the ultimate facing flexural capacity. The primary design parameter for this limit state is the reinforcement ratio, i.e., the reinforcement cross-sectional area per unit width at the nail head and at midspan. The influences of the two calibration approaches are assessed by using the calibrated models to compute the maximum facing loads and to determine the reinforcement ratio.

The example wall used for analysis is 10 m in height with a horizontal back slope and a vertical facing . The means of the soil strength parameters are assumed to be kPa, , and . The COVs are taken as 0.15 for and 0.075 for . Nails are spaced at 1.2 m horizontally and vertically.

###### 3.4.1. On Estimation of Facing Loads

Both equations (8) and (9) are used to compute the nominal facing loads along the depth. The results are shown in Figure 6. Visually, the nominal facing load along the depth has a trapezoidal shape, with larger values in the middle and smaller values at the top and bottom of the wall. The default FHWA model gives the highest facing loads; for example, at the depth examined, the nominal facing load is about 42 kN, while the FHWA model calibrated by the PDC approach gives the smallest value, about 33 kN. The difference is over 20%. The nominal facing load by the FHWA model calibrated by the RMSE approach is about 36 kN. The difference between the two calibration approaches is about 8%.

Figure 7 shows the distributions of the facing loads corrected by the biases of the default and calibrated models, where the bias statistics are estimated by the method of moments (referred to as sample bias) and by the maximum likelihood method (referred to as MLE bias). Visually, the distributions of the bias-corrected facing loads by the PDC and RMSE approaches are highly similar, regardless of the method used to compute the bias statistics. Compared to the calibrated FHWA cases, the distributions for the default FHWA case move leftward, meaning that the bias-corrected facing load for the default model is smaller overall. For the default FHWA case, there is a noticeable difference between the distributions by sample bias and by MLE bias, mainly due to the large difference in the COV of the bias, as given in Table 5, which summarizes the computed means and COVs of the bias-corrected facing loads at the depth examined using different calibration approaches and bias estimation approaches. In addition, the distribution for the default FHWA case has longer tails than those for the calibrated FHWA cases, owing to the much higher COVs of the default model bias compared to those of the calibrated model biases. Last, it is observed that, in general, the differences in the means and COVs of the bias-corrected facing loads based on the RMSE and PDC approaches are practically insignificant, as given in Table 5.

###### 3.4.2. On Facing Design

Consider identical reinforcement cross-sectional area per unit width at the nail head and at midspan horizontally and vertically, with , the facing flexural capacity, , calculated as [33]where is the empirical factor accounting for the nonuniformity of soil pressures behind the facing and is equal to 2.0, 1.5, and 1.0 for temporary walls when the facing thickness is [35], and is the reinforcement tensile yield strength. In this example, the facing thickness is taken as mm, and thus, . The nominal value of is taken as 414 MPa. The reinforcement area is the main design parameter to be determined given a target margin of safety (e.g., factor of safety or reliability index). The design factor of safety for this limit state can be calculated as

The performance function, $g$, is written in terms of the nominal facing load and facing flexural capacity, each multiplied by its model factor (bias). In this study, the load bias is that of the default FHWA model if using the default model and that of the calibrated model if using the calibrated FHWA models. The bias of the facing flexural capacity is taken as a log-normal random variable with a mean of 1.1 and a COV of 0.1 [36, 37].

Table 6 provides the design outcomes of using the deterministic approach, where the margin of safety is expressed as a factor of safety, and of using the reliability approach, where it is expressed as a target reliability index. Here, the target reliability index roughly corresponds to a probability of failure of 1/5000. Note that the adopted factor of safety and target reliability index are consistent with those recommended in the FHWA soil nail wall design manual by Lazarte et al. [33] and AASHTO [38]. Based on the deterministic approach (equation (11)), using the FHWA facing load model calibrated by the PDC approach gives the smallest required reinforcement area, 212 mm²/m, whereas the default FHWA model gives the largest, 266 mm²/m. The difference is about 20%.

However, based on the reliability approach, the required reinforcement areas computed using the default FHWA model are much larger than those using the calibrated FHWA models, i.e., 730 versus 446 and 492 mm²/m, and 1135 versus 430 and 447 mm²/m. The difference is about 40–60%. The design outcomes by the FHWA models calibrated by the PDC and RMSE approaches are broadly similar, although those by the RMSE approach are slightly higher.

Figure 8 shows the required reinforcement area for a factor of safety ranging from 1 to 5 and a target reliability index ranging from 2 to 4, using the default and calibrated FHWA facing load models. It confirms that the difference in required reinforcement area is insignificant between the two calibrated FHWA models; both give much smaller values than the default model. This highlights the importance of performing geotechnical model calibration, while the influence of the selected calibration approach is secondary. Tables 5 and 6 and Figure 8 together suggest that adopting either the PDC or the RMSE approach for geotechnical model calibration does not result in a drastic difference in practical design outcomes, even under scarce data conditions.

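**Figure 8:** Required reinforcement area versus (a) factor of safety and (b) target reliability index, using the default and calibrated FHWA facing load models.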

#### 4. Conclusions

Calibration of geotechnical models in many cases has to be carried out with scarce data. This study examines two approaches that have been widely adopted for geotechnical model calibration in the literature, namely, the purely data-catering (PDC) approach and the root mean squared error (RMSE) approach. The PDC approach calibrates a model by adhering to two criteria: maintaining a mean model bias of one while minimizing the COV of the model bias, where model bias is defined as the ratio of the measured to the predicted value. The RMSE approach calibrates a model by minimizing the root mean squared error between measured and predicted values.

A case study is presented to illustrate the influence of the selection of a model calibration approach from a practical point of view. The case study is on calibration of the default Federal Highway Administration (FHWA) simplified facing load model for the facing design of soil nail walls. A total of 23 measured facing load data collected by Liu et al. [32] are adopted for calibration. Calibration results confirm that the two approaches are not practically equivalent when the data available for calibration are scarce. A model calibrated by the PDC approach usually does not reach the minimal RMSE, and vice versa. The Bayesian information criterion (BIC) is introduced to rank the competence of the PDC- and RMSE-calibrated models in fitting the data. According to the BIC, a model calibrated by the PDC approach may or may not be superior to its counterpart by the RMSE approach, depending on the assumed distribution of the model bias.

The two PDC- and RMSE-calibrated models are then used for estimation of facing loads and design of reinforcement ratio against the facing flexure limit state using both deterministic and reliability-based design approaches. It is demonstrated that the estimated facing loads and the determined reinforcement ratios using both calibrated models do not differ significantly from each other. Therefore, in practice, either approach can be adopted for geotechnical model calibration even with scarce data.

Last, there are also other approaches for model calibration, for example, the Bayesian inference technique. The Bayesian approach provides distributions rather than point estimates of the model parameters. This is the basic difference between the Bayesian approach and the PDC and RMSE approaches. Discussion on parameter determination using Bayesian approaches can be found in, e.g., Lin and Yuan [39] and Lin et al. [40].

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors are grateful for financial support provided by Guangdong Provincial Key Laboratory of Marine Civil Engineering (LMCE202107).