Abstract

This study employs geostatistical and artificial intelligence (AI) methods to estimate the degree of ground improvement after dynamic compaction. We implement artificial neural network (ANN) and random forest (RF) models for AI-based spatial interpolation to investigate the efficiency of dynamic compaction while considering the spatial distribution of the geotechnical parameters. The data used in this study include the averaged SPT N value before dynamic compaction (Nbefore), the averaged SPT N value after dynamic compaction (Nafter), the applied energy (AE), the X- and Y-coordinates of each borehole, and the degree of ground improvement (DI). The data were obtained from a total of 42 borehole logs after dynamic compaction, with an average depth of 17 m and a testing depth interval of 1.5 m, and from 26 SPT-N logs before dynamic compaction. The optimal spatial interpolation tool selected in this study is used to develop a bearing capacity map after dynamic compaction. Model performance is examined by comparing SPT-based and predicted bearing capacities in terms of the mean absolute error (MAE), coefficient of determination (r2), and root mean square error (RMSE); the model with the lowest MAE and RMSE and the highest r2 is selected as optimal. The optimal RF model (RFVD) has an RMSE of 15.83, whereas ordinary kriging (OK), the better of the two geostatistical models considered, has an RMSE of 22.62. The results show that RF spatial interpolation outperforms traditional geostatistical methods. The ANN model shows good compatibility with the physical and intuitive processes pertinent to dynamic compaction, predicting DI with an RMSE of 0.11 and an r2 of 0.97. Because it relies on easily obtainable field data, this approach for evaluating the efficiency of dynamic compaction will be useful to geotechnical engineers designing ground improvement projects, especially dynamic compaction of coarse-grained soils.

1. Introduction

Ground improvement techniques are applied in geotechnical engineering to enhance the engineering properties of loose, soft, or weak soils. In particular, dynamic compaction densifies in-situ coarse-grained soils by dropping a heavy tamper in free fall from a height H (usually H = 10 to 30 m) onto the ground; it is economical and yields an immediate improvement of soil properties [1–4]. The factors governing dynamic compaction design and the efficiency of its implementation can be divided into two groups: (1) site conditions, such as soil type, groundwater table, bulk unit weight or in-situ density of the soil, stratigraphy, and seams of soft material; and (2) the compaction method, including the hammer shape, weight, and surface area, the drop height, the number of drops, the grid spacing, the time interval between phases, and the number of phases [5, 6]. Furthermore, soft, compressible underlying soils and/or oversaturated soils often reduce the compaction efficiency [6].

In general, the results of dynamic compaction are analyzed using penetration resistance, interpreted through empirical correlations and charts that help engineers overcome the challenges of penetration resistance testing [4, 7]. However, empirical correlations and charts cannot account for the spatial variability and covariance of geotechnical properties and parameters in heterogeneous soil deposits [8, 9]. Furthermore, compaction efficiency and the degree of ground improvement vary with high- or low-energy compaction modes and with hammer characteristics [6], so penetration test results analyzed with empirical charts and equations may not properly reflect the hammer attributes and the spatial variability [6]. Soil improvement assessments after dynamic compaction therefore require reliable data interpretation techniques that can handle the complex spatial variability specific to each compaction site.

Spatial interpolation methods can reduce the uncertainty associated with the standardized use of empirical correlations and charts [10]. These methods include deterministic spatial interpolation (e.g., inverse distance weighting), geostatistical interpolation (e.g., ordinary kriging), and geostatistical simulation (e.g., sequential Gaussian simulation) [8, 11]. The deterministic method is a distance-based exact estimator that cannot account for orientations, trends, or anisotropy in the dataset [12]. Its weights are local and chosen arbitrarily, with interpolation (i.e., error minimization) as the primary goal; because it cannot apply direction-dependent weighting, it often produces unrealistic maps with so-called “bull’s-eyes” [13, 14]. By contrast, geostatistics builds on the semivariogram, can account for anisotropy, trends, and orientations, and generates realistic maps [13, 15]. However, geostatistical interpolation and simulation models cannot account for errors in estimating the semivariogram; they also assume second-order stationarity of the data distribution and, in some cases, require data transformation and back-transformation of the interpolated values [16]. Geostatistical conditional simulations introduce a level of randomness conditioned on the data [15], which overcomes the smoothing effect of kriging. Of the three approaches, the geostatistical methods (i.e., the latter two) are relatively better estimators, with smaller errors than the deterministic methods (Zou et al., 2017). Concerns about geostatistical interpolation remain, but machine learning (ML) algorithms appear to be robust tools with a broad range of applications to geotechnical engineering and scientific problems [17–19].

Artificial intelligence algorithms are efficient at solving complex, nonlinear problems and at detecting trends in very large datasets [20, 21]. Their applications also include classification and object detection in video and still images. However, ML-based spatial interpolation remains an emerging research area within AI and has been applied only a few times in geotechnical engineering. This approach could overcome the key issues associated with geostatistical models.

This study aims to establish a robust approach for assessing dynamic compaction using geostatistical methods, namely ordinary kriging (OK) and sequential Gaussian simulation (SGS), together with random forest (RF) and an artificial neural network (ANN). The geostatistical and RF methods were applied to spatially interpolate the bearing capacity after dynamic compaction. In addition, the ANN was used to develop a model that predicts the spatial distribution of the degree of ground improvement (DI) attained after dynamic compaction, from which the spatial distribution of bearing capacity after dynamic compaction can be back-calculated. This manuscript starts with a brief introduction to the test site under consideration.

2. Study Area

Land reclamation was performed to create a construction site for oil facilities off the southwest coast of Ulsan Province, South Korea. As shown in Figure 1, the entire site covers a total area of 675,400 m2 with a perimeter of 5,350 m and is divided into 9 sections for construction purposes. The study site is the 9th section, the last area, demarcated by a red rectangular border in Figure 1. The reclamation was carried out in two steps: (1) hydraulic filling up to sea level with soil dredged from the ocean bed by a sand dredger vessel, and (2) dumping of soil gathered on land to complete the reclamation up to a predetermined final elevation. The thickness of the reclamation fill ranged from approximately 3 to 27 m depending on location; although the fill thickness varies spatially, the fills are mainly composed of gravels and sands.

A standard penetration test (SPT) was performed to characterize the site before and after the reclamation work. As shown in Figure 2(a), the measured SPT-N values range from 2 to 32 with a mean of 14, indicating loose soil and low strength immediately after reclamation. Although the measured N values satisfy the predetermined design N value for constructing tanks at a few locations, low N values dominate most of the study area. Therefore, dynamic compaction was conducted to improve the engineering properties of the soil within the top 15 m. The main tamper used in this study weighed 21.5 tons, had a base area of 2.25 m2, and was dropped from a height of 20 m; the ironing tamper weighed 10 tons and was also dropped from 20 m. After dynamic compaction, the N values increased significantly at all depths, ranging from 30 to 43 with a mean of 37.

Figure 2(b) shows the coefficient of variation (COV) of the N values before and after dynamic compaction. Before dynamic compaction, the COV fluctuates considerably with depth, from approximately 0.4 to 0.6, whereas after dynamic compaction it varies little with depth, with a mean of approximately 0.25. The variability of the measured N values was therefore significantly reduced by dynamic compaction, implying a consistent improvement in the engineering properties of the soil. Figure 2(c) shows the layout of the average SPT N at each borehole location after dynamic compaction (DC). The geographical distribution of the boreholes is adequate for a spatial evaluation of the gains from DC in terms of bearing capacity.
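As a worked illustration of the COV comparison above, the following Python sketch computes COV = standard deviation / mean for the N values at each test depth. The N values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the estimator used for Figure 2(b) is not stated.

```python
import statistics


def coefficient_of_variation(values):
    """COV = sample standard deviation / mean (assumes n >= 2 and a nonzero mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def cov_by_depth(n_by_depth):
    """Return {depth (m): COV of the SPT-N values measured at that depth}."""
    return {depth: coefficient_of_variation(ns) for depth, ns in sorted(n_by_depth.items())}


# Hypothetical SPT-N values at two test depths (not the study's data)
before = {1.5: [4, 10, 18, 12], 3.0: [6, 20, 9, 15]}
after = {1.5: [33, 38, 41, 36], 3.0: [31, 40, 37, 35]}
print(cov_by_depth(before))  # larger COVs before compaction
print(cov_by_depth(after))   # smaller COVs after compaction
```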

3. Methodology

This section details the research methodology, which involves two main parts: (1) spatial interpolation of the bearing capacity using geostatistical and machine learning (ML) approaches and (2) estimation of the optimal spatial variability of the degree of ground improvement (DI) and of the bearing capacity after dynamic compaction using a reliable spatial interpolation model. This study uses Quantum GIS (QGIS) to develop the location map of the study area, while RStudio is used to analyze the data and build both the geostatistical and AI models [22–24].

3.1. Bearing Capacity Estimation Using Empirical Correlation

The basic data used in this study are the bearing capacities calculated from the N values measured with depth before and after dynamic compaction, together with their geographic coordinates transformed to a projected coordinate system. The bearing capacity (qa) of a raft foundation is computed using an empirical equation [25] (Equation (1)), in which N = uncorrected average SPT-N value from the footing depth Df to Df + B, Df = embedment depth of the foundation [m], B = width of the foundation (the oil-tank width in this study, 20 m), Fd = depth factor = 1 + 0.33(Df/B) ≤ 1.33, and s = tolerable settlement (25.4 mm in this study). The bearing capacity calculated empirically from the averaged SPT N value after dynamic compaction serves as the observed (raw) data for geostatistical and AI modeling.
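The depth factor Fd is fully specified in the definitions above, including its cap, so it can be illustrated on its own. The Python sketch below covers only that term of the bearing capacity equation; the embedment depth Df = 2 m is a hypothetical value, not one reported in the study.

```python
def depth_factor(df_m, b_m):
    """Depth factor Fd = 1 + 0.33 * (Df / B), capped at 1.33."""
    if b_m <= 0:
        raise ValueError("foundation width B must be positive")
    return min(1.0 + 0.33 * df_m / b_m, 1.33)


B = 20.0   # oil-tank width used in this study (m)
Df = 2.0   # hypothetical embedment depth (m), for illustration only
print(depth_factor(Df, B))    # approximately 1.033
print(depth_factor(40.0, B))  # 1 + 0.33 * 2 exceeds the cap, so 1.33
```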

3.2. Geostatistical Approaches

Geostatistical approaches can be categorized as linear or nonlinear [15, 26]. Kriging is a linear weighted estimation algorithm that computes the best linear unbiased estimator based on a spatial stochastic model [9, 27, 28]. Linear methods are simple and derive estimates directly from the observed values, assuming that the samples are normally distributed; different kriging types are applied depending on the stochastic properties of the random field. Nonlinear methods also use linear estimators, but they apply them to nonlinearly transformed values of the measured data to estimate spatial parameters [15], and they estimate the probability distribution of the variable conditioned on the original dataset [29]. Their predictions are generally more accurate and reliable where a Gaussian random process is unsuitable for generating predictive models [26, 29]. In this study, two geostatistical methods were used to assess the spatial distribution of the bearing capacity after dynamic compaction: ordinary kriging (OK) as the linear model and sequential Gaussian simulation (SGS) as the nonlinear model. Both OK and SGS rely on a measure of spatial autocorrelation, determined from a semivariogram, to derive interpolation weights for predictions at unsampled locations.

3.2.1. Semivariogram

The semivariance at a given lag, a measure of dissimilarity, is defined as half the average squared difference between pairs of values separated by that lag [15]. An experimental semivariogram is obtained by plotting the semivariance against the separation (Euclidean) distance between the data points. Because estimating values at unsampled locations across the entire study area requires semivariogram values at all distances, theoretical semivariograms are fitted to the experimental semivariogram [30]; semivariogram modeling is therefore a fundamental step in geostatistical interpolation and simulation. The experimental semivariogram γ(h) for a lag h is calculated as

$$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left[ z(x_i) - z(x_i + h) \right]^2$$

where γ(h) is the semivariogram value at a lag of h, n is the number of data pairs separated by a lag h, z(xi) is the field data value at location xi, and z(xi + h) is the field data value at location xi + h.
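The binned estimator of γ(h) can be sketched in plain Python as below. Grouping pairs into fixed-width lag bins and counting each unordered pair once is an assumed convention, since the lag width and tolerance used in the study are not stated; the function name experimental_semivariogram is hypothetical.

```python
import math
from collections import defaultdict


def experimental_semivariogram(points, values, lag_width, max_lag):
    """Binned estimate gamma(h) = 1 / (2 n(h)) * sum of (z_i - z_j)^2 over pairs in each lag bin.

    points: (x, y) projected coordinates in metres; values: field values z at those points.
    Returns (mean lag distance, semivariance, number of pairs) for each non-empty bin.
    """
    sq_sums = defaultdict(float)
    dist_sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):  # each unordered pair once
            h = math.dist(points[i], points[j])
            if h >= max_lag:
                continue
            k = int(h // lag_width)  # lag bin index
            sq_sums[k] += (values[i] - values[j]) ** 2
            dist_sums[k] += h
            counts[k] += 1
    return [(dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            for k in sorted(counts)]


print(experimental_semivariogram([(0, 0), (10, 0), (20, 0)], [1.0, 3.0, 5.0], 15.0, 30.0))
# [(10.0, 2.0, 2), (20.0, 8.0, 1)]
```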

3.2.2. Ordinary Kriging (OK)

Ordinary kriging (OK) is a commonly used kriging technique that estimates a value at any location for which the semivariogram is known. OK assumes a constant but unknown mean and stationarity of the spatial autocorrelation (i.e., second-order stationarity). An obvious limitation of OK is this assumption of second-order stationarity, which may not hold for some field data; nonetheless, OK is a simple and versatile spatial prediction tool [9, 31]. The OK estimate at a target location x0 is

$$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i z_i, \qquad \sum_{i=1}^{n} \lambda_i = 1$$

where λi is the kriging weight of the data point at the ith location, zi is the raw data value at a known location (i.e., the empirically calculated bearing capacity from Equation (1)), and n is the number of data points in the search neighborhood. The optimal kriging weights are obtained from the semivariogram (covariance) structure of the data so as to minimize the prediction variance.
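A minimal ordinary kriging sketch follows, assuming an exponential semivariogram parameterized by its practical range and a small hand-written linear solver in place of a numerical library. It solves the standard OK system in semivariogram form, Σj λj γ(xi, xj) + μ = γ(xi, x0) with Σ λi = 1, where μ is a Lagrange multiplier, and returns the estimate, the kriging variance Σ λi γ(xi, x0) + μ, and the weights. All function names, coordinates, and bearing capacities in it are hypothetical.

```python
import math


def exp_semivariogram(h, nugget, psill, a):
    """Exponential model with practical range a; gamma(0) = 0 by definition."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-3.0 * h / a))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, kriging variance, weights) at target."""
    n = len(points)
    # Left-hand side: semivariances between data points, bordered by the unbiasedness row/column
    A = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = sum(wi * zi for wi, zi in zip(w, values))
    variance = sum(wi * gi for wi, gi in zip(w, b[:n])) + mu
    return estimate, variance, w


gamma = lambda h: exp_semivariogram(h, nugget=0.0, psill=1500.0, a=60.0)
pts = [(0.0, 0.0), (40.0, 10.0), (15.0, 50.0)]
bc = [300.0, 260.0, 340.0]  # hypothetical bearing capacities (kN/m2)
print(ordinary_kriging(pts, bc, (20.0, 20.0), gamma))
```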

3.2.3. Sequential Gaussian Simulation (SGS)

Sequential Gaussian simulation (SGS) uses both the kriging mean and the kriging variance to generate a Gaussian field. Because SGS is intended for continuous, normally (Gaussian) distributed data, the data are typically transformed to a Gaussian distribution by a normal score transformation [9, 15]. SGS assumes that the data are stationary, meaning that the mean, variance, and spatial structure (semivariogram) do not change over the spatial domain of the data [9, 15]. SGS represents regional variance better than kriging because it incorporates local variability that kriging does not capture. Geostatistical simulation can generate multiple, equally likely realizations of the spatial distribution of the variable under consideration, which makes SGS the Monte Carlo technique of geostatistics [32].
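To make the sequential procedure concrete, here is a deliberately small one-dimensional SGS sketch in Python. It assumes the data are already normal scores with zero mean and unit variance, uses simple kriging with an assumed exponential covariance built from the k nearest informed nodes, and omits the back-transformation to original units; the grid, conditioning values, and function names are hypothetical. Calling it with a different seed yields another equally likely realization.

```python
import math
import random


def cov(h, a=30.0):
    """Exponential covariance of normal scores: unit sill, practical range a (assumed)."""
    return math.exp(-3.0 * abs(h) / a)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sgs_1d(grid_x, data, k=8, seed=0):
    """One conditional SGS realization on a 1D grid of normal scores.

    data: {grid index: normal-score value}; these nodes are kept unchanged.
    The other nodes are visited along a random path; each is drawn from N(m, s2), where m and s2
    are the simple-kriging mean (global mean 0) and variance from the k nearest informed nodes
    (data plus previously simulated nodes).
    """
    rng = random.Random(seed)
    sim = dict(data)
    path = [i for i in range(len(grid_x)) if i not in sim]
    rng.shuffle(path)
    for i in path:
        near = sorted(sim, key=lambda j: abs(grid_x[j] - grid_x[i]))[:k]
        if near:
            C = [[cov(grid_x[p] - grid_x[q]) for q in near] for p in near]
            c0 = [cov(grid_x[p] - grid_x[i]) for p in near]
            w = solve(C, c0)
            mean = sum(wj * sim[j] for wj, j in zip(w, near))
            var = max(1.0 - sum(wj * cj for wj, cj in zip(w, c0)), 0.0)
        else:
            mean, var = 0.0, 1.0
        sim[i] = rng.gauss(mean, math.sqrt(var))  # simulated node becomes conditioning data
    return [sim[i] for i in range(len(grid_x))]


grid = [5.0 * i for i in range(20)]  # hypothetical 1D grid (m)
cond = {0: -1.0, 10: 1.5}            # hypothetical normal-score data at nodes 0 and 10
print(sgs_1d(grid, cond, seed=1))
```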

3.3. Artificial Intelligence (AI)
3.3.1. Random Forest (RF) Spatial Interpolation

The random forest (RF) model is an ensemble method that solves problems by training many weak learners (i.e., decision trees) via bagging [33–35]. Bagging consists of two parts: bootstrapping and aggregation. Bootstrapping (sampling with replacement) is performed repeatedly on the whole dataset, producing a large number of weak learners, and aggregation combines their outputs into the final forecast (averaging for regression, majority voting for classification). Decision trees, also known as classification and regression trees (CART), are an ML approach that forecasts using a sequence of splitting rules: the splitting rules are expressed by nodes, decisions by branches, and predictions by leaves [33, 36]. A single CART is prone to overfitting the training data and is not robust, which lowers its prediction accuracy.

Random forests mitigate the poor performance of individual decision trees: RF reduces the variance of the predicted values by averaging many decision trees trained on different bootstrap samples of the same training set. The RF technique has two essential hyperparameters: the number of trees in the forest and the number of randomly selected variables considered at each split [37]. An optimal RF model can be developed through hyperparameter optimization, i.e., by selecting the number of features and the number of trees through several iterations [38].
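The bagging and per-split feature sampling described above can be sketched from scratch in Python as a toy stand-in for a library implementation (the study's RStudio code is not shown). Each tree is grown on a bootstrap sample, mtry randomly chosen features are tried at each split, and the forest averages the tree outputs. The depth and split-size limits are hypothetical defaults; for an RFC-style model, the feature vector would simply be the (X, Y) coordinates.

```python
import random
import statistics


def _best_split(rows, ys, features):
    """Return the (feature, threshold) pair with the lowest summed child SSE, or None."""
    best, best_sse = None, float("inf")
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:  # keeps both children non-empty
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if sse < best_sse:
                best, best_sse = (f, t), sse
    return best


def build_tree(rows, ys, mtry, rng, depth=0, max_depth=6, min_split=4):
    """Grow a regression tree; mtry randomly chosen features are tried at every split."""
    if depth >= max_depth or len(ys) < min_split or len(set(ys)) == 1:
        return statistics.fmean(ys)  # leaf: mean of the node's targets
    split = _best_split(rows, ys, rng.sample(range(len(rows[0])), mtry))
    if split is None:
        return statistics.fmean(ys)
    f, t = split

    def grow(idx):
        return build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                          mtry, rng, depth + 1, max_depth, min_split)

    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=100, mtry=1, seed=0, **tree_kw):
    """Bagging: each tree is grown on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng, **tree_kw))
    return forest


def predict_forest(forest, x):
    """Aggregation: average the predictions of all trees (regression)."""
    return statistics.fmean([predict_tree(tree, x) for tree in forest])


# Hypothetical RFC-style example: coordinates as the only features
X = [(float(i), float(j)) for i in range(5) for j in range(5)]
y = [200.0 + 10.0 * xx for xx, _ in X]
forest = random_forest(X, y, n_trees=20, mtry=1, seed=42)
print(predict_forest(forest, (2.0, 2.0)))
```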

To spatially interpolate the bearing capacity after dynamic compaction, three RF variants with different input variables were applied in this study. The first uses only coordinates as inputs (RFC), while the other two use collinear variables: vector distance maps (RFVD), and the nearest observed bearing capacity values together with their distances from the interpolation point (RFNO). Introducing these collinear variables derived from the dataset allows the RF algorithm to learn the spatial autocorrelation of the dataset better [39, 40].
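One way to build the two kinds of covariates is sketched below in Python. The feature definitions follow the general idea described above (distances to all observation points for RFVD; values and distances of the nearest observations for RFNO), but the exact construction used in the study is not specified, so the function names, the number of neighbors, and the self-exclusion rule are assumptions.

```python
import math


def distance_covariates(target, obs_points):
    """RFVD-style features: Euclidean distance from target to every observation location."""
    return [math.dist(target, p) for p in obs_points]


def nearest_obs_covariates(target, obs_points, obs_values, n_near=3, exclude_self=True):
    """RFNO-style features: values of the n_near nearest observations, then their distances.

    exclude_self drops observations at zero distance, so that a training row built at an
    observation location does not contain its own target value.
    """
    pairs = sorted((math.dist(target, p), v) for p, v in zip(obs_points, obs_values))
    if exclude_self:
        pairs = [(d, v) for d, v in pairs if d > 0.0]
    near = pairs[:n_near]
    return [v for _, v in near] + [d for d, _ in near]


# Hypothetical observation locations (m) and bearing capacities (kN/m2)
obs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
vals = [100.0, 200.0, 300.0]
print(distance_covariates((0.0, 0.0), obs))              # [0.0, 5.0, 10.0]
print(nearest_obs_covariates((0.0, 0.0), obs, vals, 2))  # [200.0, 300.0, 5.0, 10.0]
```

Each row of the RF training table would then pair these features with the bearing capacity at that location.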

3.3.2. Artificial Neural Network (ANN)

An artificial neural network (ANN) is a computational model that mimics the structure and function of the neural systems of the human brain [20, 41]. An ANN is trained on input and output data, and its learning process captures the simple and/or complex relationships between the inputs and outputs [42]. Training evaluates the error of each prediction and adjusts the weights for the next prediction until the difference between the predicted values and the targets satisfies a preset error margin or stopping criterion [21]. Training an ANN requires selecting activation functions and a backpropagation training algorithm from many related options [43].

Selecting and developing an ANN architecture involves the model geometry and how information flows through the network. Although numerous architectures and models are available, this study employs the multilayer perceptron (MLP) feed-forward backpropagation model, which is suited to pattern identification and prediction [19, 44]. The architecture specifies the number of hidden layers and the number of nodes in each hidden layer. Nonlinear activation functions in the hidden layer enable the model to capture the nonlinearity and complexity of the relationship between the input and output variables. The number of nodes is selected purely through an iterative process; it affects convergence and determines the performance of the ANN model. Too few nodes lead to high errors (an underfitted model), whereas too many nodes cause overfitting and poor generalization [21].

In this study, the ANN with a single hidden layer was first trained with one hidden node, and the number of nodes was then increased sequentially up to thirteen to select the optimum. Overfitting was avoided by stopping criteria based on a maximum of one million steps and an error threshold of 0.001 [45]. The ANN model was considered optimal when cross-validation on the testing dataset showed a low error and a high coefficient of determination between estimated and observed values, before overfitting began. After the optimal model was identified, extended training confirmed convergence to the global minimum rather than a local minimum. Backpropagation, a very common optimization algorithm for feed-forward neural networks, was used to obtain the optimal weights [46]. The optimal ANN model used in this study has 4 nodes in the input layer, 10 nodes in the hidden layer, and one node in the output layer (Figure 3).
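The training loop described above (logistic activations, backpropagated gradients, and stopping on an error threshold or a step limit) can be sketched as follows. This is a plain-Python, full-batch gradient-descent illustration, not the study's RStudio implementation; treating the 0.001 threshold as a bound on the summed squared training error, as well as the learning rate and weight initialization, are assumptions. Repeating the fit with n_hidden from 1 to 13 and comparing cross-validated errors would mirror the node search.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """Feed-forward network: n_in inputs -> n_hidden logistic nodes -> 1 logistic output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [logistic(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        return h, logistic(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def train(self, X, T, lr=0.5, max_steps=1_000_000, tol=1e-3):
        """Full-batch gradient descent on E = 0.5 * sum((y - t)^2), gradients by backpropagation.

        Stops when E < tol or after max_steps epochs; returns (E, epochs run).
        """
        for step in range(1, max_steps + 1):
            gw1 = [[0.0] * len(row) for row in self.w1]
            gb1 = [0.0] * len(self.b1)
            gw2 = [0.0] * len(self.w2)
            gb2 = err = 0.0
            for x, t in zip(X, T):
                h, y = self.forward(x)
                err += 0.5 * (y - t) ** 2
                d_out = (y - t) * y * (1.0 - y)  # dE/dz at the output node
                gb2 += d_out
                for j, hj in enumerate(h):
                    gw2[j] += d_out * hj
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)  # error backpropagated to hidden node j
                    gb1[j] += d_hid
                    for i, xi in enumerate(x):
                        gw1[j][i] += d_hid * xi
            if err < tol:
                return err, step
            self.b2 -= lr * gb2
            for j in range(len(self.w2)):
                self.w2[j] -= lr * gw2[j]
                self.b1[j] -= lr * gb1[j]
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * gw1[j][i]
        return err, max_steps


rng = random.Random(1)
X = [[rng.uniform(0.1, 0.9) for _ in range(4)] for _ in range(12)]  # 4 hypothetical scaled inputs
T = [sum(x) / 4 for x in X]                                         # hypothetical scaled target
net = MLP(n_in=4, n_hidden=10, seed=0)
print(net.train(X, T, max_steps=500))
```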

One input parameter of the ANN model is the applied energy (AE), normalized by the cross-sectional area of the tamper. It is calculated from Nd = number of drops of the tamper per phase, Wt = weight of the tamper, Hd = drop height, Ae = influence area of each impact point (Ae = s2, where s = spacing between tamper drop points), and At = cross-sectional area of the tamper.

The data were scaled to the range 0.1 to 0.9 for ANN modeling using

$$I_{scaled} = 0.1 + 0.8\,\frac{I_{unscaled} - I_{min}}{I_{max} - I_{min}}$$

where Imin and Imax are the minimum and maximum values of the unscaled dataset (Iunscaled), respectively. This range was selected for consistency with the logistic activation function used in this ANN model, whose output lies between 0 and 1 and flattens near those limits. Scaling also promoted faster convergence and a more efficient learning process; after training and testing, the scaling was reversed to recover the predicted values in their original units.
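The scaling to [0.1, 0.9] and its inverse are straightforward to express in code. A minimal Python sketch follows; the function names and the DI values are hypothetical.

```python
def scale(values, lo=0.1, hi=0.9):
    """Min-max scale values to [lo, hi]; also return (Imin, Imax) for the inverse transform."""
    i_min, i_max = min(values), max(values)
    if i_max == i_min:
        raise ValueError("cannot scale a constant variable")
    return [lo + (hi - lo) * (v - i_min) / (i_max - i_min) for v in values], (i_min, i_max)


def unscale(scaled, i_min, i_max, lo=0.1, hi=0.9):
    """Invert scale(): map network outputs back to the original units."""
    return [i_min + (s - lo) * (i_max - i_min) / (hi - lo) for s in scaled]


di = [1.4, 2.1, 2.9, 3.3]  # hypothetical DI values
scaled, (i_min, i_max) = scale(di)
print(scaled)
print(unscale(scaled, i_min, i_max))  # recovers di up to floating-point rounding
```

Keeping Imin and Imax from the training data is what allows predictions to be mapped back after training and testing.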

3.4. Validation of Models

To compare the accuracy of the geostatistical and ML spatial interpolation models, this study employs leave-one-out cross-validation (LOOCV). In LOOCV, a single sample is held out for testing while the remaining samples are used for training, and the procedure is repeated until every sample has been held out once. LOOCV is thus a special case of k-fold cross-validation in which k equals the total number of samples. For the ANN, 10-fold cross-validation (CV) was applied, and the model with the smallest RMSE was selected for further analyses because it best represents the characteristics of the training and testing data [47]. To reduce bias, the dataset was randomly divided into 10 folds; in each of the 10 training and testing iterations, a different fold served as the testing set while the remaining folds formed the training set. The measures of accuracy (MOA) used to assess the predictive accuracy of the models are the mean absolute error (MAE), coefficient of determination (r2), root mean square error (RMSE), and mean bias error (MBE).
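The cross-validation schemes and accuracy measures above can be sketched in Python as follows. Computing r2 as the squared Pearson correlation between observed and predicted values, and defining MBE as mean(predicted - observed) so that a positive MBE indicates overestimation, are assumptions about the exact definitions; mean_model is a hypothetical placeholder for any fitted interpolator.

```python
import math
import random


def accuracy_metrics(observed, predicted):
    """MAE, RMSE, MBE (predicted - observed) and r2 as the squared Pearson correlation."""
    n = len(observed)
    err = [p - o for o, p in zip(observed, predicted)]
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return {
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "MBE": sum(err) / n,  # > 0 means a tendency to overestimate
        "r2": sxy ** 2 / (sxx * syy),
    }


def cv_predictions(X, y, fit_predict, k=None, seed=0):
    """Out-of-fold predictions: k=None gives LOOCV, otherwise k random folds."""
    n = len(y)
    if k is None:
        folds = [[i] for i in range(n)]
    else:
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
    preds = [None] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        fitted = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        for i, p in zip(test, fitted):
            preds[i] = p
    return preds


def mean_model(X_train, y_train, X_test):
    """Hypothetical baseline: predicts the training mean everywhere."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)


y_obs = [1.0, 2.0, 3.0]
loo = cv_predictions([[0.0], [1.0], [2.0]], y_obs, mean_model)
print(loo, accuracy_metrics(y_obs, loo))
```

With mean_model on y = [1, 2, 3], LOOCV predicts [2.5, 2.0, 1.5], which is perfectly anti-correlated with the observations, so r2 computed this way equals 1 even though MAE is 1.0; reading r2 together with MAE, RMSE, and MBE guards against such cases.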

3.5. Degree of Ground Improvement

The degree of ground improvement (DI) achieved by dynamic compaction is assessed from the average N values before and after compaction, where Nafter is the average N value after dynamic compaction and Nbefore is the average N value before dynamic compaction. The DI results are then used in ANN modeling to assess site improvement in terms of bearing capacity.

4. Results and Discussion

4.1. Semivariogram Modeling

Figure 4 shows the experimental semivariograms of the raw and normal score-transformed bearing capacities after dynamic compaction, fitted with three theoretical semivariogram models: exponential, Gaussian, and spherical. The normal score transformation (NST) converts data into a normal distribution [15, 48] by ranking the values from lowest to highest and assigning each the corresponding quantile of the normal distribution. The ranges of all fitted theoretical semivariograms were 45 to 100 m, suggesting reasonable spatial correlation given the size of the study area. Furthermore, the low nugget values indicate minimal variation between data points separated by very small lags. Beyond the range, all semivariograms fluctuate above and below the sill (i.e., they show weak negative and positive correlations), as expected once autocorrelation has decayed with distance.
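The rank-based transformation can be sketched with the Python standard library's NormalDist. Assigning rank r the standard normal quantile at (r - 0.5)/n and breaking ties by input order are common but assumed conventions, since the study's exact NST implementation is not given.

```python
from statistics import NormalDist


def normal_score(values):
    """Rank-based NST: the value with rank r (1..n) maps to the standard normal quantile at (r - 0.5)/n.

    Ties receive distinct ranks in input order (a simplification).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    nd = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = nd.inv_cdf((rank - 0.5) / n)
    return scores


print(normal_score([300.0, 250.0, 350.0]))  # hypothetical bearing capacities (kN/m2)
```

Simulated normal scores must later be back-transformed through the same rank-quantile mapping to return to bearing-capacity units.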

Based on the criteria of the highest coefficient of determination (r2) and the lowest MAE and RMSE (Table 1), the exponential model provides the best match to the experimental semivariogram of the raw bearing capacity (Figure 4(a)). Applying the same criteria to the normal score-transformed bearing capacity also identified the exponential model as the best fit (Figure 4(b)). Table 1 summarizes the range, partial sill, and nugget of the optimized theoretical semivariogram models; these parameters were used to determine the spatial estimation weights for geostatistical interpolation and simulation at unsampled locations [15]. Bold text in Table 1 highlights the optimal theoretical semivariograms, with their measures of accuracy and the semivariogram parameters required for semivariogram-based spatial interpolation and simulation.
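Fitting and comparing the three theoretical models can be illustrated with a brute-force grid search in Python. Here a is the practical range; some software (e.g., R's gstat) instead parameterizes the exponential model as 1 - exp(-h/a), whose practical range is about 3a. Selecting by RMSE alone, the parameter grid, and the synthetic semivariances are assumptions for illustration, whereas the study also considered r2 and MAE.

```python
import math


def exponential(h, c0, c, a):
    return c0 + c * (1.0 - math.exp(-3.0 * h / a)) if h > 0 else 0.0


def gaussian(h, c0, c, a):
    return c0 + c * (1.0 - math.exp(-3.0 * (h / a) ** 2)) if h > 0 else 0.0


def spherical(h, c0, c, a):
    if h <= 0:
        return 0.0
    if h >= a:
        return c0 + c  # sill reached at the range
    r = h / a
    return c0 + c * (1.5 * r - 0.5 * r ** 3)


def fit_model(model, lags, gammas, nuggets, psills, ranges):
    """Grid search; returns (RMSE, nugget, partial sill, range) with the lowest RMSE."""
    best = None
    for c0 in nuggets:
        for c in psills:
            for a in ranges:
                sse = sum((model(h, c0, c, a) - g) ** 2 for h, g in zip(lags, gammas))
                rmse = math.sqrt(sse / len(lags))
                if best is None or rmse < best[0]:
                    best = (rmse, c0, c, a)
    return best


lags = [10.0 * i for i in range(1, 11)]
gammas = [exponential(h, 0.1, 1.0, 60.0) for h in lags]  # synthetic experimental semivariances
grid = dict(nuggets=[0.0, 0.05, 0.1, 0.2], psills=[0.5, 1.0, 1.5],
            ranges=[30.0, 45.0, 60.0, 80.0, 100.0])
fits = {m.__name__: fit_model(m, lags, gammas, **grid) for m in (exponential, gaussian, spherical)}
print(min(fits.items(), key=lambda kv: kv[1][0]))  # best-fitting model and its parameters
```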

4.2. Geostatistical Interpolations and Simulations
4.2.1. Cross-Validation

Figure 5 presents the leave-one-out cross-validation (LOOCV) plots comparing the bearing capacities after dynamic compaction predicted by geostatistics with the empirical-based values. The predicted bearing capacity is obtained using ordinary kriging (OK) and sequential Gaussian simulation (SGS), while the empirical-based bearing capacity is computed using Equation (1). A general comparison of the two LOOCV plots in terms of their measures of accuracy (MOA) shows that OK predicts the bearing capacity relatively more accurately: OK has a higher coefficient of determination (r2) and lower MAE and RMSE than SGS (Table 2). Despite the higher r2 of OK, the two models do not differ significantly in their estimates. SGS is commonly used to address the smoothing effect of kriging; although the two models perform similarly, SGS reproduces a distribution range closer to that of the original dataset. In particular, SGS distinguishes small-scale features better than kriging [49].

Table 2 presents the summary statistics of the bearing capacities predicted by OK and SGS and of the empirical-based bearing capacity after dynamic compaction, together with the MOA obtained from the LOOCV analyses. The RMSE and MAE describe the prediction accuracy of the OK and SGS estimates, while the mean bias error (MBE) indicates the tendency to overestimate or underestimate. SGS has a larger MBE than OK; yet both models have small positive MBE values near zero, indicating a minor tendency to overestimate. The mean values of both models differ negligibly from that of the raw bearing capacity data after dynamic compaction. However, the standard deviations of both models are smaller than that of the raw data, implying that the estimated values have less variance than the raw data. The narrow LOOCV range of the OK predictions in Table 2 highlights the smoothing effect of ordinary kriging [50]: the LOOCV range is 248.03 to 351.98 kN/m2 for OK, 235.83 to 365.94 kN/m2 for SGS, and 237.74 to 368.91 kN/m2 for the empirical-based bearing capacity after dynamic compaction. The SGS predictions are thus clearly closer to the range of the empirically estimated bearing capacity used as the raw data for spatial interpolation, because SGS counteracts the smoothing effect common to kriging techniques such as OK [49, 51].

4.2.2. Spatial Distribution Maps

Figure 6 displays the spatial distribution of the bearing capacity after dynamic compaction estimated by the geostatistical interpolation and simulation models. Each row in Figure 6 (i.e., (a) and (b)) has three plots: the estimated bearing capacity map, the prediction error map (i.e., uncertainty expressed as the standard deviation from the mean), and probability density function (PDF) bar plots of the raw data overlaid with the spatially predicted bearing capacity after dynamic compaction. Both OK and SGS similarly indicate that (1) lower bearing capacities are expected on the right side of the site than in the center and on the left side and (2) higher bearing capacities occur in the extreme upper-left corner and in the lower-central area (left column in Figure 6). In addition, the prediction uncertainty of both OK and SGS decreases significantly near the sampled points, indicating that the measured site data strongly condition the prediction maps (middle column in Figure 6). By adding simulated values to the original dataset for subsequent estimates, SGS overcomes the smoothing effect of kriging, as seen in the clearer demarcation of smaller details [15, 49]. The PDF bar plots for both OK and SGS show a normal distribution similar to that of the empirical-based bearing capacity used as the raw data for spatial interpolation (right column in Figure 6), implying that the distribution of the estimated bearing capacity (right-hand plots in Figures 6(a) and 6(b)) was influenced by the raw data used for OK and SGS modeling [15, 49].

4.3. Machine Learning Spatial Interpolation (ML-SI)

This study uses RF as the machine learning spatial interpolation (ML-SI) model in three variants: RFC using coordinates, RFVD using vector distances, and RFNO using nearest observations and their distances. For RFC, the input parameters are the X- and Y-coordinates of the study area, and the target value is the bearing capacity after dynamic compaction. By contrast, RFVD and RFNO use collinear variables: RFVD uses distance vectors, while RFNO uses the nearest observations of the empirically calculated bearing capacity after dynamic compaction and their distances from the interpolation point, enabling the model to learn the spatial autocorrelation better [39, 40].

4.3.1. Cross-Validation

Figure 7 shows the LOOCV plots for the three ML-SI models. In general, the predicted and empirical-based bearing capacities after dynamic compaction follow a linear trend, yet all three models tend to both overestimate and underestimate the bearing capacity at around 300 kN/m2, and the deviation of the predicted bearing capacity from the 1:1 line varies with the model. Table 3 shows the summary statistics of the bearing capacities predicted by ML-SI and of the raw data (i.e., the empirical-based bearing capacity after dynamic compaction), together with the MOA obtained from the LOOCV analyses. Based on the mean and standard deviation values in Table 3, the ML-SI predictions are statistically very similar to the empirical-based bearing capacity dataset, indicating reliable estimation of the bearing capacity after dynamic compaction. Overall, the low positive MBE values indicate that ML-SI tends to slightly overestimate the raw data. In addition, relative to the empirical-based bearing capacity used as the raw data, RFVD produces the widest LOOCV range, followed by RFNO and then RFC. The predictions of RFVD are therefore the most similar to the raw data, followed by RFNO and finally RFC.

According to the RMSE, MAE, and r2 values in Table 3, RFNO performs best, followed by RFVD and then RFC, making RFC the weakest spatial predictor of bearing capacity after dynamic compaction. In terms of these accuracy measures, all three ML-SI models tend to outperform the two geostatistical models (Figures 5 and 7; Tables 2 and 3). The LOOCV findings of the three ML-SI models are compared to aid in selecting the optimal ML-SI model.

4.3.2. Spatial Distribution Maps

Figure 8 shows the estimated bearing capacity maps and related inferential plots for the three ML-SI models. Each row of Figure 8 (a, b, and c) contains three plots: a map of the estimated bearing capacity after dynamic compaction, a prediction error map, and a PDF bar plot of the estimated bearing capacity overlaid on the PDF bar plot of the empirical-based bearing capacity after dynamic compaction.
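Overlaying two PDF bar plots, as in the third plot of each row, requires both histograms to share the same bins and to be normalized so that the bar areas sum to one. A minimal sketch, with the bin limits and count supplied by the caller (the function name is hypothetical):

```python
def density_histogram(values, lo, hi, n_bins):
    """Empirical PDF bar heights on fixed bins over [lo, hi]; total bar area = 1."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp edge/out-of-range values into end bins
        counts[idx] += 1
    n = len(values)
    return [c / (n * width) for c in counts]
```

Calling it with the same `lo`, `hi`, and `n_bins` for the predicted and the empirical-based bearing capacities yields directly comparable bar heights.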

The first plot in Figure 8(a) shows that the spatial distribution of the bearing capacities estimated by RFC is characterized by blocky features, as delineated by the contour lines. Higher bearing capacity values prevail from the left-hand side to the central part of the site, while relatively lower values appear in the right-hand region. Although the blocky RFC prediction map displays the general spatial distribution of bearing capacity, it does not guarantee an adequate prediction of a realistic spatial distribution. Using this model therefore requires further analyses to capture the spatial autocorrelation at the site, either by introducing additional correlated covariates or by incorporating geostatistics, as in ML-geostatistics hybrid models [39]. The RFC prediction error was generally low over the whole site, with larger errors at a few peripheral locations (second plot in Figure 8(a)). Note that the RFC estimation error differs from a conventional spatial prediction error map [15]. In the third plot of Figure 8(a), the PDF bar plot of the RFC-estimated bearing capacities is similar to that of the empirical-based bearing capacity, indicating that the raw data strongly influence the predicted values.

The bearing capacity map developed by RFVD delineates relatively high, medium, and low bearing capacity zones (first plot in Figure 8(b)). The RFVD prediction error is generally low across most of the map, except along the boundaries between the low, medium, and high bearing capacity zones (second plot in Figure 8(b)); if these boundaries are of critical importance, additional field data should be collected to confirm them. The RFVD-estimated FBC PDF bar plot overlaid on the PDF bar plot of the raw data (third plot in Figure 8(b)) shows that the distribution of the RFVD estimates closely resembles that of the raw data.

RFNO demarcates the site into three clear zones of high, medium, and low estimated bearing capacity (first plot in Figure 8(c)). However, RFNO shows generally high prediction errors across the site (second plot in Figure 8(c)). The PDF bar plot of the RFNO-estimated bearing capacity suggests that most predictions are biased toward the mean value: the predicted FBC likewise follows a Gaussian distribution, but with a considerably higher density around the mean than the raw data, i.e., the empirical-based bearing capacity (third plot in Figure 8(c)). This indicates that the RFNO-predicted bearing capacity after dynamic compaction may deviate from the empirical-based bearing capacity in its statistical characteristics.

In general, machine learning spatial interpolation (ML-SI) can predict the spatial distribution of data at unsampled locations. Based on the spatial maps and PDF bar plots of the interpolated bearing capacity after dynamic compaction (Figure 8), RFVD is judged to be the best model: unlike RFC, its map is not blocky, and unlike RFNO, its predictions are not excessively concentrated around the mean. Although its blocky character makes RFC unsuitable for map modeling on its own, it might be used in conjunction with other computational models in future research [39].

4.4. Artificial Neural Network (ANN) Modeling

Because the artificial neural network (ANN) is a commonly applied AI algorithm in geotechnical engineering that has produced reliable and robust predictions, this study employs an ANN to predict the degree of ground improvement (DI) after dynamic compaction. The inputs are the projected coordinates (X, Y), the applied energy (AE) normalized by the tamper cross-sectional area, and the average SPT-N value before dynamic compaction (Nbefore). Although penetration resistances measured before and after dynamic compaction have traditionally been used to evaluate its efficiency, this evaluation may introduce uncertainty because, in practice, the pre- and post-compaction measurements are taken at different locations. This section addresses this challenge by using the optimal RFVD model to match the empirical-based bearing capacities before and after dynamic compaction with their estimated counterparts at the same locations.
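One way to picture the location-matching step is sketched below: an interpolator fitted to the pre-compaction boreholes (26 logs in this study) is evaluated at the post-compaction borehole locations (42 logs), so that every location carries both a before and an after value. The sketch uses a nearest-neighbour stand-in where the study uses the optimal RFVD model; the function names are hypothetical.

```python
import math

def nearest_neighbour(point, pts, vals):
    """Stand-in interpolator: value of the closest sampled borehole."""
    return min(zip(pts, vals), key=lambda pv: math.dist(point, pv[0]))[1]

def colocate(before_pts, before_vals, after_pts, predict=nearest_neighbour):
    """Estimate the pre-compaction value at each post-compaction borehole location."""
    return [predict(p, before_pts, before_vals) for p in after_pts]
```

Replacing `nearest_neighbour` with a trained spatial model (such as an RFVD-style random forest) keeps the same structure.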

4.4.1. Optimal ANN Model: Architecture

Figure 3 shows the architecture of the optimal ANN model for DI prediction, including the connecting lines between the layers, which convey information about the optimal connection weights. Black lines represent positive weights, grey lines represent negative weights, and the thickness of each line corresponds to the magnitude of the weight. The thick black line connecting I4 to H2 in Figure 3 suggests that Nbefore may play a primary role in the ANN model. By contrast, the mix of grey and black lines for the remaining variables (X, Y, and AE) suggests that they have a secondary and roughly equal influence on DI prediction.
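The network in Figure 3 can be read as a one-hidden-layer perceptron. The following sketch of its forward pass assumes sigmoid hidden units, a linear output, and inputs ordered as I1 = X, I2 = Y, I3 = AE, I4 = Nbefore (only I4 = Nbefore is stated in the text); the number of hidden units and all weights are hypothetical. Under the plotting convention described above, a positive weight would be drawn black, a negative one grey, with line thickness scaling with its absolute value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer: inputs -> sigmoid hidden units -> linear output (DI).

    w_hidden[j][i] is the weight from input I(i+1) to hidden unit H(j+1);
    w_out[j] is the weight from H(j+1) to the output node.
    """
    hidden = [
        sigmoid(sum(wi * xi for wi, xi in zip(w_row, x)) + b)
        for w_row, b in zip(w_hidden, b_hidden)
    ]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out
```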

4.4.2. Training and Testing

The ANN model was trained over ten iterations using ten-fold cross-validation, which produces ten randomly sampled training and testing subsets to avoid sampling bias. Figure 9(a) shows the root mean square error (RMSE) for the 10-fold cross-validation (CV). The training RMSE was mostly lower than the testing RMSE because the training set is known to the model, whereas the testing set is entirely new to it. The training RMSE increased sharply at the second iteration and then gradually decreased as the iteration number increased. This trend suggests that the individual characteristics of each training and testing set can influence the predictions. The model from iteration eight was selected as the representative model for further analyses because its testing RMSE is slightly lower than those of the other iterations (Figure 9(a)). Figure 9(b) presents the CV results for the estimated and empirical-based degree of ground improvement (DI). The testing sets show good predictive performance, with a high r2 = 0.93, a low MAE = 0.15, and a low RMSE = 0.19. This implies that the learning process was carried out successfully, without significant overfitting or underfitting between the predicted and calculated DI for both the training and testing subsets.
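The fold construction can be sketched as follows: the sample indices are shuffled once and dealt into ten near-equal folds, and each fold serves once as the testing set while the remaining nine form the training set. The seed and function names are hypothetical.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 once and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Training one model per split and recording its training and testing RMSE gives one pair of points per iteration, as plotted in Figure 9(a).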

The statistics of the total, training, and testing data listed in Table 4 indicate that the randomly sampled subsets have similar statistics and do not differ markedly from the total dataset. The feature variables and the output (the degree of ground improvement after dynamic compaction) show no significant differences among the total, training, and testing datasets; the training subset therefore adequately captures the representative characteristics of the total dataset. Using a 10-fold CV reduces the risk of bias in the datasets used for ANN modeling and supports the generalization ability of the trained ANN model.
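The comparison summarized in Table 4 amounts to computing the same descriptive statistics for the total dataset and for each subset. A minimal sketch with hypothetical function names, applied to one variable at a time:

```python
import statistics

def summary(values):
    """Descriptive statistics of the kind reported per subset."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (needs >= 2 values)
    }

def compare_subsets(values, train_idx, test_idx):
    """Statistics for the total, training, and testing data of one variable."""
    return {
        "total": summary(values),
        "training": summary([values[i] for i in train_idx]),
        "testing": summary([values[i] for i in test_idx]),
    }
```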

4.4.3. Sensitivity Analyses for ANN Model Variables

A sensitivity analysis was performed to examine whether the ANN model conforms to the general physical behavior of the dynamic compaction process (Figure 10). The analysis assesses the contribution and behavior of a single input variable on the degree of ground improvement (DI) while the other independent variables are varied according to predefined quantiles. As noted above, the ANN input variables are the X- and Y-coordinates, the applied energy (AE), and the averaged SPT-N value before dynamic compaction (Nbefore). These variables were divided into six splits corresponding to their minimum and maximum values and their 0.2, 0.4, 0.6, and 0.8 quantiles. This approach allows the variation in the degree of ground improvement to be examined from lower to higher quantiles.
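The procedure appears to resemble a Lek-profile analysis. A minimal sketch under that assumption: the variable of interest is swept across its range while the remaining inputs are held at one of the six quantile levels (0, 0.2, 0.4, 0.6, 0.8, 1), giving one profile per split. `model` stands for any trained predictor that maps a dict of inputs to DI; the quantile uses linear interpolation, which may differ from the convention used in the study, and all names are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list (q in [0, 1])."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

# Minimum, four interior quantiles, and maximum -> six splits.
QUANTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

def sensitivity_profile(model, data, var, n_points=5):
    """For each split, hold the other inputs at one quantile level and sweep `var`."""
    lo, hi = min(data[var]), max(data[var])
    sweep = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    profiles = []
    for q in QUANTS:
        fixed = {name: quantile(col, q) for name, col in data.items() if name != var}
        profiles.append([model({**fixed, var: v}) for v in sweep])
    return sweep, profiles
```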

Figure 10(a) shows the effect of varying X (i.e., the X-coordinate) on DI while the other explanatory variables vary across the quantiles stated above. DI generally decreased with increasing quantiles of the other explanatory variables, irrespective of the value of X (i.e., regardless of the split). Also, at a constant quantile of the other explanatory variables, DI decreased with increasing X, implying that the ANN model detected a location-dependent trend and thereby accounts for site-specific characteristics. Figure 10(b) shows the corresponding effect of Y: DI changes little or not at all for most of the Y splits, although split 1 shows the highest DI values. This pattern suggests that, for low Y values, the ANN model captured the influence of the site's geographic location on the estimated DI. The slight trends revealed by the projected X and Y coordinates (Figures 10(a) and 10(b)) imply that the ANN model identified a relationship between these input variables and DI, which can be interpreted as spatial characteristics describing site-specific information.

Split 1 of AE, which corresponds to the first tamper drop, recorded the highest DI for all quantiles of the other explanatory variables (Figure 10(c)), reflecting that the maximum DI is generally achieved at the initial impact of the tamper. In addition, for the first two AE splits, DI increases with increasing quantiles of the other explanatory variables (Figure 10(c)). This implies that, depending on site-specific conditions, higher DI can be achieved in the early stages of dynamic compaction. AE splits 3 to 6, which correspond to higher energy, show relatively little change in DI as the other explanatory variables increase, suggesting that additional energy in dynamic compaction yields diminishing gains [7, 52].

According to the ANN sensitivity plot for Nbefore in Figure 10(d), the largest DI was recorded at the initial application of the tamper regardless of the magnitude of Nbefore, and the lowest DI occurred at the highest values of the other explanatory variables. This is consistent with the physical and intuitive behavior of dynamic compaction. It also suggests that, regardless of the Nbefore split, the largest gains in a dynamic compaction process occur at the beginning, and that after a certain number of tamper drops only smaller improvements are achieved because soil strength increases progressively during compaction [7]. Comparing the general behavior of the explanatory variables, Nbefore appears to be the most sensitive parameter, as it changes significantly with increasing quantiles of the other explanatory variables, whereas the other parameters vary less. This suggests that Nbefore is the most influential explanatory variable for DI prediction.

4.4.4. Variable Importance

Figure 11 shows the importance of the independent variables. Nbefore clearly has the largest influence on the model, which is consistent with the intuitive understanding of dynamic compaction: the initial soil strength at the site has a strong impact on the outcome of the dynamic compaction. Loose soils with lower strength typically show greater gains from dynamic compaction than soils with higher bearing strength. X is the next most influential input, followed by Y and, lastly, AE. This ranking is also consistent with the physical process of dynamic compaction, because site-specific characteristics can be located by their coordinates; it is therefore not surprising that the ANN model captured the relevance of the input variables in this way. Furthermore, AE has a limited influence on the dynamic compaction process because only very small improvements are gained for soils with higher bearing strength, regardless of the AE applied [7]. In addition, site conditions serve as the basis for selecting an appropriate AE, which makes Nbefore the dominant parameter relative to the others.
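The text does not state how the importance values in Figure 11 were computed. One common option for this type of network is Olden's connection-weight method, sketched below as an assumption: the importance of input i is the sum over hidden units j of w_ij * v_j, where w_ij is the input-to-hidden weight and v_j the hidden-to-output weight, and inputs are ranked by absolute value. Garson's algorithm, which uses absolute weights, is another frequently used alternative and can rank inputs differently. Function names are hypothetical.

```python
def olden_importance(w_hidden, w_out):
    """Connection-weight importance for a single-output, one-hidden-layer network.

    w_hidden[j][i]: weight from input i to hidden unit j; w_out[j]: hidden j -> output.
    """
    n_inputs = len(w_hidden[0])
    return [
        sum(w_hidden[j][i] * w_out[j] for j in range(len(w_out)))
        for i in range(n_inputs)
    ]

def rank_inputs(names, importance):
    """Sort (name, importance) pairs from most to least influential by magnitude."""
    return sorted(zip(names, importance), key=lambda t: abs(t[1]), reverse=True)
```

For the model in this study, `names` would be `["X", "Y", "AE", "Nbefore"]` and the weights would come from the trained network.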

4.4.5. ANN Model Results: Spatial Distribution of Degree of Ground Improvement (DI)

Figure 12(a) presents the spatial distribution of the degree of ground improvement (DI) obtained using the ANN ML-SI model. Most of the largest gains from dynamic compaction were achieved in the central part of the site, presumably reflecting relatively lower soil strength there before dynamic compaction. Figure 12(b) shows the spatial distribution of the bearing capacity after dynamic compaction derived from the DI. The presumed bearing capacity after dynamic compaction falls predominantly into three distinct zones: the right-hand side has relatively low bearing capacities, the central part and portions of the left-hand side have medium bearing capacities, and the upper and lower portions of the left-hand side have relatively high bearing capacities. The contour lines in Figure 12(b) aid in identifying these high, medium, and low zones. The PDF bar plot of the predicted bearing capacity after dynamic compaction (Figure 12(c)) closely resembles the PDF of the raw (empirical-based) bearing capacities. The ANN model was trained to simulate the dynamic compaction process and estimate the degree of improvement (DI), which may serve as a parameter for dynamic compaction efficacy. The ANN ML-SI model employed in this research predicts DI well and may be used to create bearing capacity maps. The prediction model used in this study is therefore beneficial because it can produce the spatial distribution of estimated bearing capacity after dynamic compaction from its target variable, DI, which is important for evaluating dynamic compaction.

5. Conclusion

This study assesses the efficiency of dynamic compaction using geostatistical methods, namely ordinary kriging (OK) and sequential Gaussian simulation (SGS), together with random forest (RF) and artificial neural network (ANN) models. The geostatistical and random forest methods were used for spatial interpolation of bearing capacity after dynamic compaction, and the ANN was used to predict the spatial distribution of the degree of ground improvement (DI) attained after dynamic compaction.

According to the LOOCV MOA results of the two geostatistical models, OK outperformed SGS. For bearing capacity after dynamic compaction, the smoothing effect of OK produced a narrower range of predicted values, whereas SGS produced a wider range of predicted bearing capacities that was closer to the raw data (i.e., the empirically calculated bearing capacities).

In general, all ML-SI models outperformed the geostatistical models, and ML-SI modeling may be used to address the smoothing effect of kriging. Based on the LOOCV approach, the best ML-SI model (RFVD) appeared reliable and robust, with low prediction errors (RMSE = 15.83 and MAE = 12.33).

The cross-validation results of the trained ANN ML-SI model showed that the predictive model for the response variable DI is reliable and satisfactory. A bearing capacity map after dynamic compaction was developed from the DI predicted by the ANN ML-SI model. A comparison of the maps generated by the ML-SI models indicates that the additional covariates captured by the ANN ML-SI model can contribute to a better prediction of bearing capacities.

According to the variable importance evaluation of the ANN model, the average SPT-N value before dynamic compaction (Nbefore) was the most significant variable, whereas the applied energy (AE) contributed the least to predicting the degree of improvement by dynamic compaction. In the ANN DI prediction model, the X- and Y-coordinates ranked between Nbefore and AE in importance. This technique (i.e., the ANN model) can be used to assess the efficacy of dynamic compaction and to derive a final bearing capacity map after or during dynamic compaction.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1I1A1A01052283) and funded by the Ministry of Science, ICT & Future Planning (2021R1A2C1010281).