Artificial neural networks (ANNs) are a form of artificial intelligence that has proved highly competent in solving many complex engineering problems that are beyond the computational capability of classical mathematics and traditional procedures. In particular, ANNs have been applied successfully to almost all aspects of geotechnical engineering. Despite the increasing number and diversity of ANN applications in geotechnical engineering, the contents of reported applications indicate that progress in ANN development and procedures has been marginal and has not moved forward since the mid-1990s. This paper presents a brief overview of ANN applications in geotechnical engineering, outlines the operation of ANN modeling, investigates the current research directions of ANNs in geotechnical engineering, and discusses some ANN modeling issues that need further attention in the future, including model robustness, transparency and knowledge extraction, extrapolation, and uncertainty.

1. Introduction

Artificial neural networks (ANNs) are well suited to model the complex behavior of most geotechnical engineering materials which, by their very nature, exhibit extreme variability. ANNs have also demonstrated superior predictive ability when compared with traditional methods. Since the early 1990s, ANNs have been applied successfully to virtually every problem in geotechnical engineering. In this section, post-2001 applications of ANNs in geotechnical engineering are briefly examined, and interested readers are referred to Shahin et al. [1], where the pre-2001 papers are reviewed in some detail.

The behavior of deep (pile) and shallow foundations in soils is complex, uncertain, and not yet entirely understood. This fact has encouraged many researchers to apply the ANN technique to the prediction of the behavior of foundations. For example, ANNs have been used extensively for modeling the axial and lateral load capacities of deep foundations in compression and uplift, including driven piles [2–6], drilled shafts [7, 8], and ground anchor piles [9, 10]. The prediction of behavior of shallow foundations has also been investigated, including settlement estimation [11–16] and bearing capacity [17–19].

Classical constitutive modeling based on elasticity and plasticity theories has limited capability to simulate properly the behavior of geomaterials. This is attributed to reasons associated with the formulation complexity, idealization of material behavior, and excessive empirical parameters [20]. In this regard, many neural networks have been proposed as a reliable and practical alternative to model the constitutive monotonic and hysteretic behavior of geomaterials [21–29].

Geotechnical properties and behavior of soils are controlled by factors such as mineralogy, fabric, and pore water, and the interactions of these factors are difficult to establish solely by traditional statistical methods due to their interdependence [30]. Based on the application of ANNs, methodologies have been developed for estimating several soil properties, including the preconsolidation pressure [31], shear strength and stress history [30, 32–37], swell pressure [38, 39], lateral earth pressure [40], compaction characteristics and permeability [41, 42], soil composition and classification [43, 44], and properties of soil dynamics [45, 46].

Liquefaction during earthquakes is one of the most dangerous ground failure phenomena and can cause a large amount of damage to most civil engineering structures. Although the liquefaction mechanism is well known, the prediction of liquefaction potential is very complex [47]. This fact has attracted many researchers to investigate the applicability of ANNs for predicting liquefaction [47–55].

Other applications of ANNs in geotechnical engineering include earth retaining structures [56], dams [57, 58], blasting [59], mining [60], environmental geotechnics [61], rock mechanics [62–67], site characterization [68], tunnels and underground openings [69–74], slope stability and landslides [71, 75–79], and deep excavation [80].

2. Brief Overview of Artificial Neural Networks

Many authors have described the structure and operation of ANNs (e.g., [81, 82]), and whilst a comprehensive description of ANNs is beyond the scope of this paper, it is useful to provide a brief overview. ANNs are a data driven artificial intelligence approach that attempts to mimic, in a very simplistic way, the cognition capability of the human brain. ANNs learn by examples of data inputs and outputs presented to them so that the subtle functional relationships among the data are captured, even if the underlying relationships are unknown or the physical meaning is difficult to explain. This is in contrast to most traditional empirical and statistical methods, which need prior knowledge about the nature of the relationships among the data. This is one of the main benefits of ANNs when compared with most empirical and statistical methods.

Typically, the architecture of ANNs consists of a series of processing elements (PEs), or nodes, that are usually arranged in layers: an input layer, an output layer, and one or more hidden layers, as shown in Figure 1.

The input from each PE in the previous layer ($x_i$) is multiplied by an adjustable connection weight ($w_{ji}$). At each PE, the weighted input signals are summed and a threshold value ($\theta_j$) is added. This combined input ($I_j$) is then passed through a nonlinear transfer function ($f(\cdot)$) to produce the output of the PE ($y_j$). The output of one PE provides the input to the PEs in the next layer. This process is summarized in (1) and (2) and illustrated in Figure 1.

$$I_j = \sum_{i} w_{ji} x_i + \theta_j \qquad (1)$$

$$y_j = f(I_j) \qquad (2)$$
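As a minimal sketch of the computation just described, the following Python fragment evaluates a single PE and then a small layered network. The logistic sigmoid is assumed as the transfer function (the text only requires it to be nonlinear), and the weights, thresholds, and function names are illustrative values chosen for this example, not taken from any of the cited models.

```python
import math

def sigmoid(value):
    """Logistic transfer function (an assumed choice of nonlinear function)."""
    return 1.0 / (1.0 + math.exp(-value))

def pe_output(inputs, weights, threshold, transfer=sigmoid):
    """Output of one processing element: transfer(sum of w*x, plus threshold)."""
    combined = sum(w * x for w, x in zip(weights, inputs)) + threshold
    return transfer(combined)

def layer_output(inputs, weight_rows, thresholds, transfer=sigmoid):
    """Outputs of all PEs in one layer; weight_rows[j] holds the weights of PE j."""
    return [pe_output(inputs, row, theta, transfer)
            for row, theta in zip(weight_rows, thresholds)]

def forward(inputs, layers):
    """Propagate inputs through a list of (weight_rows, thresholds) layers."""
    signal = list(inputs)
    for weight_rows, thresholds in layers:
        signal = layer_output(signal, weight_rows, thresholds)
    return signal

# Hypothetical network: 2 inputs, 2 hidden PEs, 1 output PE.
example_layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                   # output layer
]
prediction = forward([0.6, 0.9], example_layers)
```

Because each layer only consumes the outputs of the previous one, adding a further hidden layer in this sketch amounts to appending another `(weight_rows, thresholds)` pair to the list.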

The propagation of information in an ANN starts at the input layer, where the input data are presented. The network adjusts its weights on the presentation of a training data set and uses a learning rule to find a set of weights that will produce the input/output mapping that has the smallest possible error. This process is called “learning” or “training.” Once the training phase of the model has been successfully accomplished, the performance of the trained model needs to be validated using an independent testing set. The main steps involved in the development of an ANN, as suggested by Maier and Dandy [83], are illustrated in Figure 2. Several of these steps are discussed in some depth in the following section.
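The training and validation steps can be illustrated with a deliberately small sketch. It assumes a single sigmoid PE, a squared-error measure, and plain gradient descent applied one record at a time; the learning rate, number of epochs, and the synthetic data are all assumptions made for illustration, and practical ANN models use multilayer networks and more elaborate learning rules.

```python
import math
import random

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def predict(x, weights, threshold):
    """Output of a single sigmoid PE."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + threshold)

def mean_squared_error(data, weights, threshold):
    return sum((predict(x, weights, threshold) - t) ** 2 for x, t in data) / len(data)

def train(data, epochs=500, learning_rate=0.5, seed=0):
    """Adjust weights and threshold to reduce the squared error on `data`."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(data[0][0]))]
    threshold = 0.0
    for _ in range(epochs):
        for x, target in data:
            y = predict(x, weights, threshold)
            # Gradient of 0.5 * (y - target)^2 with respect to the combined input.
            delta = (y - target) * y * (1.0 - y)
            weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
            threshold -= learning_rate * delta
    return weights, threshold

# Synthetic, noise-free records from a known relationship (illustrative only).
rng = random.Random(1)
records = []
for _ in range(60):
    x = [rng.random(), rng.random()]
    records.append((x, sigmoid(2.0 * x[0] - 1.0 * x[1])))
training_set, testing_set = records[:40], records[40:]

weights, threshold = train(training_set)
untrained_error = mean_squared_error(testing_set, [0.0, 0.0], 0.0)
testing_error = mean_squared_error(testing_set, weights, threshold)
```

The error is reported on `testing_set`, which plays no part in the weight updates, mirroring the requirement that the trained model be checked against independent data.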

3. Current Development and Future Directions in Utilization of ANNs

One issue that needs to be addressed in order to improve the performance of ANN models is the utilization of a systematic approach in their development. Such an approach needs to address major factors, including the determination of adequate model inputs, data division and preprocessing, choice of suitable network architecture, careful selection of some internal parameters that control the optimization method, stopping criteria, and model validation. For example, in relation to the second step, the choice of data sets and the method of data division, Shahin et al. [84] provided guidance using a geotechnical engineering example and recommended the use of three statistically consistent but independent data sets, one each for training, testing, and validation. In this context, Shahin et al. [84] introduced three approaches so that data division can be carried out in a systematic manner: trial-and-error, self-organizing maps, and fuzzy clustering. For a detailed treatment of each of the steps in the model development process, interested readers are referred to Shahin et al. [85].
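The idea of dividing data into three statistically consistent subsets by trial and error can be sketched as follows. This is only an illustration under stated assumptions: the 60/20/20 proportions, the use of the mean and standard deviation of a single variable as the consistency measure, and the function names are all hypothetical choices, not the procedure of Shahin et al. [84].

```python
import random
import statistics

def split_three(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and cut into training, testing, and validation subsets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_test = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

def mismatch(subsets):
    """Spread of the subset means plus spread of the subset standard deviations."""
    means = [statistics.mean(s) for s in subsets]
    stdevs = [statistics.stdev(s) for s in subsets]
    return (max(means) - min(means)) + (max(stdevs) - min(stdevs))

def trial_and_error_split(records, trials=200):
    """Try many random divisions and keep the most statistically consistent one."""
    candidates = (split_three(records, seed=s) for s in range(trials))
    return min(candidates, key=mismatch)

# Hypothetical measurements of one variable (e.g., a settlement in mm).
rng = random.Random(42)
measurements = [rng.lognormvariate(3.0, 0.5) for _ in range(100)]

first_try = split_three(measurements, seed=0)
best = trial_and_error_split(measurements)
```

With several input and output variables, the same comparison would be repeated for each variable; the single-variable version is kept here for brevity.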

Other key issues in relation to ANN modeling that have received recent attention and require further research in the future include developing approaches that (i) ensure the development of robust models, (ii) increase model transparency and enable knowledge to be extracted from trained ANNs, (iii) improve extrapolation ability, and (iv) deal with uncertainty. Each of these is discussed in what follows.

3.1. Model Robustness

Model robustness is the predictive ability of ANN models to generalize over a range of data similar to that used for model training. Kingston et al. [86] stated that if “ANNs are to become more widely accepted and reach their full potential, they should not only provide a good fit to the calibration and validation data, but the predictions should also be plausible in terms of the relationship modeled and robust under a wide range of conditions,” and that “while ANNs validated against error alone may produce accurate predictions for situations similar to those contained in the training data, they may not be robust under different conditions unless the relationship by which the data were generated has been adequately estimated.” This is in agreement with the investigation into the robustness of ANNs carried out by Shahin et al. [87] for a case study of predicting the settlement of shallow foundations on granular soils. Shahin et al. [87] found that good performance of ANN models on the data used for model calibration and validation does not guarantee that the models will perform in a robust fashion over a range of data similar to those used in the model calibration phase. For this reason, Shahin et al. [87] proposed a method to test the robustness of the predictive ability of ANN models by carrying out a sensitivity analysis to investigate the response of ANN model outputs to changes in their inputs. The robustness of the model can then be determined by examining how well model predictions agree with the known underlying physical processes of the problem at hand over a range of inputs. In addition, Shahin et al. [87] advised that the connection weights should be examined as part of the interpretation of ANN model behavior, using, for example, the method suggested by Garson [88]. On the other hand, Kingston et al. [86] adopted the connection weight approach of Olden et al. [89] for a case study in hydrological modeling in order to assess the relationship modeled by the ANNs, as Olden et al. [89] found that this approach provided the best overall methodology for quantifying ANN input importance in comparison to other commonly used methods, though with a few limitations.
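The two weight-based measures and the sensitivity analysis mentioned above can be sketched for a network with one hidden layer and a single output. The Garson-style function follows one commonly quoted formulation based on absolute weights, and the connection weight function sums the raw weight products over the hidden nodes; both are written here as illustrations, and the exact variants used in [87–89] should be checked against those references. The weights and the stand-in model are hypothetical.

```python
def garson_importance(input_hidden, hidden_output):
    """Relative importance from absolute weights (a commonly quoted Garson-style form).

    input_hidden[i][j]: weight from input i to hidden node j.
    hidden_output[j]:   weight from hidden node j to the single output.
    Assumes every hidden node has at least one nonzero incoming weight.
    """
    n_inputs, n_hidden = len(input_hidden), len(hidden_output)
    shares = [0.0] * n_inputs
    for j in range(n_hidden):
        column_total = sum(abs(input_hidden[i][j]) for i in range(n_inputs))
        for i in range(n_inputs):
            shares[i] += abs(input_hidden[i][j]) / column_total * abs(hidden_output[j])
    total = sum(shares)
    return [s / total for s in shares]

def connection_weight_importance(input_hidden, hidden_output):
    """Signed sum over hidden nodes of input-hidden times hidden-output weights."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, hidden_output))
            for row in input_hidden]

def sensitivity_sweep(model, baseline, index, values):
    """Model outputs as one input is varied and the others stay at baseline."""
    outputs = []
    for value in values:
        trial = list(baseline)
        trial[index] = value
        outputs.append(model(trial))
    return outputs

# Hypothetical weights for a 2-input, 2-hidden-node, 1-output network.
input_hidden = [[1.0, -2.0], [0.5, 0.5]]
hidden_output = [1.0, 1.0]
garson = garson_importance(input_hidden, hidden_output)
signed = connection_weight_importance(input_hidden, hidden_output)

# Stand-in for a trained model, used only to show the sweep.
sweep = sensitivity_sweep(lambda x: 2.0 * x[0] - x[1], [0.0, 1.0], 0, [0.0, 1.0, 2.0])
```

In this example the absolute-weight measure ranks the first input as dominant, whereas the signed products show that its two pathways largely cancel, which is the kind of difference that motivates comparing the two measures. The sweep output can then be checked against the expected physical trend, for instance that predicted settlement should not decrease as the applied load increases.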

Support vector machines (SVMs) are an alternative data-driven modeling approach that is claimed to provide better generalization capabilities and higher accuracy than ANNs and are therefore worth further consideration in relation to achieving improved model robustness [90]. Interested readers are referred to A. T. C. Goh and S. H. Goh [91] for a good overview of this technique. Recent applications of SVMs in the field of geotechnical engineering include the prediction of liquefaction potential [90, 91], analysis of slope stability [92], and modeling friction capacity of driven piles [93].

3.2. Model Transparency and Knowledge Extraction

Model transparency and knowledge extraction refer to the feasibility of interpreting ANN models in a way that provides insights into how model inputs affect outputs. Figure 3 shows the classification of modeling techniques based on colors [94], in which the greater the physical knowledge used during model development, the better the physical interpretation of the phenomenon that the model provides to the user. Mathematical models can accordingly be classified as white-, black-, or grey-box models, each of which can be explained as follows [95]. White-box models are systems that are based on first principles (e.g., physical laws) where model variables and parameters are known and have physical meaning by which the underlying physical relationships of the system can be explained. Black-box models are data-driven or regressive systems in which the functional form of relationships between model variables is unknown and needs to be estimated. Black-box models rely on data to map the relationships between model inputs and corresponding outputs rather than to find a feasible structure of the model input-output relationships. Grey-box models are conceptual systems in which the mathematical structure of the model can be derived, allowing further information on the system behavior to be resolved.

ANNs belong to the class of black-box models due to their lack of transparency and the fact that they neither consider nor explain the underlying physical processes explicitly. This is because the knowledge extracted by ANNs is stored in a set of weights that are difficult to interpret properly and, owing to the complexity of the network structure, ANNs fail to give a transparent function that relates the inputs to the corresponding outputs. Consequently, it is difficult to understand the nature of the input-output relationships derived. This issue has been addressed by many researchers with respect to hydrological engineering. For example, Jain et al. [96] examined whether or not the physical processes in a watershed were inherent in a trained ANN rainfall-runoff model. This was carried out by assessing the strengths of the relationships between the distributed components of the ANN model, in terms of the responses from the hidden nodes, and the deterministic components of the hydrological process, computed from a conceptual rainfall-runoff model, along with the observed input variables, using correlation coefficients and scatter plots. They concluded that the trained ANN did, in fact, capture different components of the physical process and that a careful examination of the distributed information contained in the trained ANN can be informative about the nature of the physical processes captured by various components of the ANN model. Sudheer [97] performed perturbation analysis to assess the influence of each individual input variable on the output variable and found it to be an effective means of identifying the underlying physical process inherent in the trained ANN. Olden et al. [89], Sudheer and Jain [98], and Kingston et al. [99] also addressed this issue of model transparency and knowledge extraction.

In the context of geotechnical engineering, Shahin et al. [12] and Shahin and Jaksa [9] expressed the results of the trained ANNs in the form of relatively straightforward equations. This was possible due to the relatively small number of input and output variables, and hidden nodes. Neurofuzzy applications are another means of knowledge extraction that facilitate model transparency. Neurofuzzy networks use the fuzzy logic system to store knowledge acquired from a set of input variables ($x_1, x_2, \ldots, x_n$) and the corresponding output variable ($y$) in a set of linguistic fuzzy rules that can be easily interpreted, such as IF ($x_1$ is high AND $x_2$ is low) THEN ($y$ is high), with confidence $c$, where $c$ is the rule confidence, which indicates the degree to which the above rule has contributed to the output. Examples of such applications in geotechnical engineering include Ni et al. [100], Shahin et al. [16], Gokceoglu et al. [62], Provenzano et al. [19], and Padmini et al. [18].
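To make the form of such a rule concrete, the sketch below evaluates a single rule of the kind quoted above. It assumes linear membership functions on inputs scaled to the range 0 to 1, the minimum operator for AND, and a simple multiplication by the rule confidence; these are illustrative choices, and the neurofuzzy systems in the cited studies may use different membership functions and operators.

```python
def membership_high(x, low=0.0, high=1.0):
    """Degree to which x is 'high': 0 at `low`, 1 at `high`, linear in between."""
    return min(1.0, max(0.0, (x - low) / (high - low)))

def membership_low(x, low=0.0, high=1.0):
    """Degree to which x is 'low', taken here as the complement of 'high'."""
    return 1.0 - membership_high(x, low, high)

def rule_activation(x1, x2, confidence):
    """IF (x1 is high AND x2 is low) THEN (y is high), scaled by rule confidence.

    AND is modeled with the minimum operator (an assumption of this sketch).
    """
    firing_strength = min(membership_high(x1), membership_low(x2))
    return confidence * firing_strength

# Hypothetical scaled inputs and rule confidence.
support_for_high_output = rule_activation(0.8, 0.5, 1.0)
```

Each rule remains readable on its own, which is the sense in which a rule base of this kind is more transparent than a set of connection weights.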

A recent technique that belongs to the class of grey-box models, and therefore does not suffer from the problem of model transparency and knowledge extraction, is genetic programming (GP). Several researchers (e.g., [34, 50, 101–104]) have recently used the GP technique as an alternative to ANNs in order to obtain greatly simplified formulae for some geotechnical engineering problems. GP is a computing method that attempts to mimic the biological evolution of living organisms. GP makes use of the principles of genetic algorithms (GAs) for parameter optimization in which a population of expressions (or computer programs) for a function $f(x)$, coded in tree structures of variable size, is generated and executed. The generated expressions are then modified by means of artificial evolution in order to perform a global search to arrive at the best-fit mathematical expression for $f(x)$ that solves a certain problem. An additional advantage of GP over ANNs is that the structure and network parameters of ANNs (e.g., number of hidden layers and their number of nodes, transfer functions, learning rate, etc.) must be identified a priori and are usually obtained using ad hoc trial-and-error approaches, whereas the number and combination of terms, as well as the values of GP modeling parameters, are all evolved automatically during model calibration. Hybrid approaches can also be used, however, in which genetic algorithms are used to evolve optimal ANN structures and connection weight values. It should be noted that while white-box models provide maximum transparency, they may be difficult to construct for many geotechnical engineering problems, where the underlying mechanism is not entirely understood.
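The tree-based representation and evolutionary search can be sketched in a few lines. This is a heavily simplified illustration, not a full GP implementation: it uses only addition, subtraction, and multiplication, applies subtree mutation without crossover, and keeps the best quarter of each generation unchanged. The target relationship, population size, and all names are assumptions made for the example.

```python
import math
import random

OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def random_tree(rng, depth):
    """Random expression: 'x', a constant, or (operator, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(sorted(OPERATORS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPERATORS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant

def fitness(tree, data):
    """Sum of squared errors; non-finite results are treated as worst possible."""
    error = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        error += diff * diff
    return error if math.isfinite(error) else float("inf")

def mutate(tree, rng, depth=2):
    """Replace a randomly chosen subtree with a new random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth), right)
    return (op, left, mutate(right, rng, depth))

def evolve(data, generations=40, population_size=60, seed=1):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(population_size)]
    history = []  # best error found in each generation
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, data))
        history.append(fitness(population[0], data))
        survivors = population[: population_size // 4]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda t: fitness(t, data))
    return best, history

# Hypothetical target relationship y = x^2 + x sampled on a small grid.
samples = [(i / 4.0, (i / 4.0) ** 2 + i / 4.0) for i in range(-8, 9)]
best_tree, error_history = evolve(samples)
```

The evolved result is an explicit expression tree such as `("add", ("mul", "x", "x"), "x")`, which can be read directly as a formula; this is the property that makes GP output easier to interpret than a trained set of weights.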

3.3. Model Extrapolation

Model extrapolation is the ability of ANN models to predict well outside the range of the data used for model calibration. It is generally accepted that ANNs perform best when they do not extrapolate beyond the range of the data used for calibration [105–107]. Whilst this is not unlike other models, it is nevertheless an important limitation of ANNs, as it restricts their usefulness and applicability. Extreme value prediction is of particular concern in several areas of civil engineering, such as hydrological engineering, when floods are forecast, as well as in geotechnical engineering when, for example, liquefaction potential and the stability of slopes are assessed. Sudheer et al. [108] highlighted this issue and proposed a methodology, based on the Wilson-Hilferty transformation, for enabling ANN models to predict extreme values with respect to peak river flows. Their methodology yielded superior predictions when compared with those obtained from an ANN model using untransformed data.
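For orientation, the sketch below implements one form of the Wilson-Hilferty transformation that is widely used in hydrological frequency analysis, mapping a standardized, skewed variate to an approximately normal one and back. Whether this is exactly the variant applied by Sudheer et al. [108] is an assumption, and the flow values are hypothetical. In such a scheme the network would be trained on the transformed values and its predictions back-transformed.

```python
import math
import statistics

def cube_root(value):
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)

def skewness(values):
    """Population skewness coefficient of a sample."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def wh_forward(standardized, skew):
    """Standardized skewed variate -> approximately normal variate (skew != 0)."""
    return (6.0 / skew) * (cube_root(skew * standardized / 2.0 + 1.0) - 1.0) + skew / 6.0

def wh_inverse(normalized, skew):
    """Inverse of wh_forward: back to the standardized skewed variate."""
    return (2.0 / skew) * ((1.0 + skew * normalized / 6.0 - skew ** 2 / 36.0) ** 3 - 1.0)

# Hypothetical peak flows with a long upper tail.
flows = [120.0, 135.0, 150.0, 160.0, 180.0, 210.0, 260.0, 340.0, 520.0, 900.0]
mean_flow = statistics.mean(flows)
sd_flow = statistics.pstdev(flows)
skew_flow = skewness(flows)

standardized = [(q - mean_flow) / sd_flow for q in flows]
transformed = [wh_forward(k, skew_flow) for k in standardized]
recovered = [wh_inverse(z, skew_flow) * sd_flow + mean_flow for z in transformed]
```

The transformation compresses the upper tail, so that extreme flows are less isolated from the bulk of the training data; the inverse step restores predictions to the original units.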

3.4. Model Uncertainty

Finally, a further limitation of ANNs is that the uncertainty in the predictions generated is seldom quantified [109]. Failure to account for such uncertainty makes it impossible to assess the quality of ANN predictions, which severely limits their efficacy. In an effort to address this, a few researchers have applied Bayesian techniques to ANN training in the context of hydrological engineering (e.g., [110–113]), as have Goh et al. [7] with respect to geotechnical engineering. Goh et al. [7] observed that the integration of the Bayesian framework into the back-propagation algorithm enhanced neural network prediction capabilities and provided an assessment of the confidence associated with network predictions. Research to date has demonstrated the value of Bayesian neural networks, although further work is needed in the area of geotechnical engineering. Shahin et al. [13, 114] also incorporated uncertainty in the ANN process by developing a series of design charts expressing the reliability of settlement predictions for shallow foundations on cohesionless soils.

4. Discussion and Conclusions

In the field of geotechnical engineering, it is possible to encounter some types of problems that are very complex and not well understood. In this regard, ANNs provide several advantages over more conventional computing techniques. For most traditional mathematical models, the lack of physical understanding is usually supplemented by either simplifying the problem or incorporating several assumptions into the models. Mathematical models also rely on assuming the structure of the model in advance, which may be suboptimal. Consequently, many mathematical models fail to simulate the complex behavior of most geotechnical engineering problems. In contrast, ANNs are a data driven approach in which the model can be trained on input-output data pairs to determine the structure and parameters of the model. In this case, there is no need to either simplify the problem or incorporate any assumptions. Moreover, ANNs can always be updated to obtain better results by presenting new training examples as new data become available. These factors combine to make ANNs a powerful modeling tool in geotechnical engineering.

Despite the success of ANNs in geotechnical engineering and other disciplines, they suffer from some shortcomings that need further attention in the future, including model robustness, transparency and knowledge extraction, extrapolation, and uncertainty. In addition, according to Flood [115], ANNs in civil engineering, including geotechnical engineering, have been used mostly as simple vector mapping devices for function modeling in applications that rarely require more than a few tens of neurons without higher-order structuring. Together, improvements in these areas will greatly enhance the usefulness of ANN models and will offer the next generation of applied artificial neural networks the best way of advancing the field to the next level of sophistication and application. Until such improvements are achieved, the authors agree with Flood and Kartam [105] that neural networks might, for the time being, be treated as a complement to conventional computing techniques rather than as an alternative, or be used as a quick check on solutions developed by more time-consuming and in-depth analyses.