#### Abstract

Since 2000, research on uncertainty quantification (UQ) has been successfully applied in many fields and has been highly valued and strongly supported by academia and industry. This review first discusses the sources and types of uncertainties and gives an overall discussion of the goal, practical significance, and basic framework of UQ research. Then, the core ideas and typical methods of several important UQ processes are introduced, including sensitivity analysis, uncertainty propagation, model calibration, Bayesian inference, experimental design, surrogate modeling, and model uncertainty analysis.

#### 1. Introduction

In weapon manufacturing [1], aerodynamics modeling [2], detonation modeling [3], inertial confinement fusion [4], and other natural science and engineering fields, *test*, *modeling*, and *simulation* are the three main approaches to understanding a complex process. The basic elements of scientific exploration are experiment and observation. For example, the cylinder test is a reliable and representative measure of an explosive's ability to accelerate metal [5]. However, due to the limitation of test cost, lack of test conditions, environmental damage caused by testing, political constraints, and other reasons, we sometimes have to rely on other means, such as *modeling and simulation* (*M&S*) [6]. The process of mathematical and physical modeling is to decompose and refine the complex system and then attempt to reveal the basic principles behind a phenomenon. Often, complex mathematical models, such as partial differential equations (PDEs), have no closed-form solution [7]. Hence, we need to rely on numerical simulation methods, such as finite element or finite difference schemes, to obtain the results. The importance of simulation is that it allows parameters to be changed in the models to understand the cause and effect of complex phenomena that might be too expensive or dangerous to study with conventional experimental methods [8].

In these processes, many uncertainties are introduced. For the experiment, the main sources of uncertainty are observation error and random perturbation of some experimental conditions. Then, because of the complexity of reality, the incompleteness of information, and the limitation of cognition, the mathematical model neglects some influencing factors and can only approximate the real behavior to a certain level of accuracy, which naturally introduces discrepancies between the physical system and the mathematical model [7]. These discrepancies are a source of uncertainty, including multiple model forms describing the same physical process, random variables, unknown distribution parameters, and initial and boundary conditions. In addition, in the process of numerical solution, time-space discretization, truncation error, rounding error, and the finite precision of digital systems all introduce uncertainties into the system [9].

Generally, uncertainties are broadly classified into two categories: aleatory and epistemic uncertainty. Aleatory uncertainty describes the natural/intrinsic variability of a quantity of interest (QoI). Epistemic uncertainty, on the other hand, describes the lack of knowledge and is potentially reducible by acquiring more knowledge. This lack of knowledge comes from many sources [10], for example, inadequate understanding of the processes, incomplete knowledge of the phenomena, and imprecise evaluation of the related characteristics.

The uncertainties are major obstacles for the predictive capability and the reliability of simulations [11]. Different types of uncertainties may be present in a given problem and interact with each other, affecting almost all aspects of engineering modeling and design. Hence, it is important to quantify the errors in order to interpret the results [9]: identifying the main sources of uncertainty, analyzing how uncertainty propagates in complex systems, and finding solutions that remain stable and near-optimal across a wide range of inputs, so that better decisions can be made at a known level of confidence, reducing development time, prototype cost, and unexpected failures.

The research on uncertainty in the deterministic engineering modeling of complex physical processes dates back to around 1980 [12]. After nearly four decades of development, uncertainty quantification (UQ) has played an important role and has been successfully applied in many fields. For example, NASA Ames Research Center carried out research on Mars atmospheric entry conditions [13] to isolate the rate-limiting mechanisms and identify the chief sources of aeroheating uncertainty. Through Monte Carlo sensitivity analysis and uncertainty analysis, a total of 130 input parameters were statistically varied to shortlist a handful of parameters that essentially control the heat flux prediction. In theoretical nuclear physics, Lawrence Livermore National Laboratory carried out research on the reliable UQ of nuclear forces [14], which is an urgent necessity in ab initio nuclear structure and nuclear reaction calculations. Based on the discrepancy between the systematic uncertainty of the Skyrme parameters and the sample standard deviation of the six interactions, they discussed and showed how a detailed statistical scrutiny of the nucleon-nucleon scattering data may provide valuable hints on the interplay between theory and experiment and their assumed uncertainties. In addition, UQ and aeroelasticity are important ingredients for the design optimization of more reliable aircraft [15]. Through design under uncertainty, designers can seek to minimize structural weight while meeting probabilistic safety constraints. At the same time, a surrogate model with high accuracy and low computational complexity is also an important tool for aerodynamic optimization design [16]. Moreover, due to the uncertainty involved, the credibility of risk assessment results is still a major issue [17]. Hence, uncertainty quantification in risk assessment is crucial in high-risk fields such as the nuclear and chemical industries.
Parameter uncertainty characterization (representation) must be managed correctly so that it can then be propagated through risk models and produce a satisfactory output. Then, by considering all the uncertainties in the risk estimation, the decision with lower risk will be obtained.

Since 2000, research on UQ has been highly valued and strongly supported by academia and industry and has become one of the important research directions of applied mathematics [18]. At the same time, different approaches to UQ have been developed, for instance, sensitivity analysis, Monte Carlo simulation, response surface approaches, evaluation of classical statistical confidence bounds, Dempster–Shafer theory, and Bayesian inference. At present, the methods of UQ are relatively isolated and have not yet formed a complete system. This paper gives the basic framework of research on UQ and systematically reviews the core ideas and typical methods of several important UQ processes.

#### 2. The Framework of UQ

Figure 1 shows the connections between some of the essential components of UQ. The forward UQ processes, such as sensitivity analysis and uncertainty propagation, always start with the characterization of the input uncertainties. Unfortunately, this information is not always readily available, a situation known as the “lack of input uncertainty information” issue. Up to now, in the uncertainty, sensitivity, and validation studies of engineering, “expert opinion” or “user self-assessment” has been predominantly used [19]. Such ad hoc specifications of input uncertainty information have been considered reasonable for a long time. However, these approaches are subjective, lack mathematical rigor, and can lead to inconsistencies.

The “lack of input uncertainty information” issue necessitates research on inverse UQ (the backward problem [20] or inverse uncertainty propagation [21]), including model calibration and Bayesian inference. According to Oberkampf [20], “the backward problem asks whether we can reduce the output uncertainty by updating the statistical model using comparisons between computations and experiments.”

While simulation-based uncertainty quantification seems straightforward, the slow convergence rate poses a major challenge in applications where the computational cost of each sample is high [11]. Experimental design is an effective method, which focuses on how to achieve the same effect as a full factorial design with fewer samples [22]. Another approach for overcoming the difficulty of expensive model simulations is the surrogate model or response surface method [11]. Surrogate models provide an approximate functional mapping that replaces the true model mapping. Once constructed, surrogate models can be evaluated at negligible computational cost.

Model uncertainty also needs attention. This concerns mainly phenomenological or data-driven models with insufficient theoretical support coupled into complex physical systems or engineering problems, as well as the influence of numerical discretization techniques, such as the density and type of the discrete mesh. Multiple models with similar effects but different structures can be evaluated and integrated using model averaging and model selection methods that combine the differences between the tests and the simulations [11]. Note that, in some instances, this also depends on the optimization and calibration of model parameters, so as to eliminate the influence of other factors, such as parameter uncertainty.

#### 3. Sensitivity Analysis

Sensitivity analysis (SA) studies the impacts of the input parameters on the outputs of a mathematical model or system. Through SA, an importance ranking of the input parameters can be given according to their contribution to the output uncertainties. This ranking provides two main ideas for system optimization design.

First, by neglecting the uncertainty of input factors of small importance, we can effectively reduce the dimension of a high-dimensional complex system to mitigate the “curse of dimensionality” and reduce the cost of trial and error, providing more efficient guidance for system optimization design.

Second, by controlling the uncertainty of the important input factors, the designer can reduce the uncertainty of the model output at minimum economic and time cost, so as to improve the robustness of the model prediction or reduce the failure probability of the structural system to the greatest extent, directly achieving an optimal structural design.

There are generally two types of sensitivity analysis: local sensitivity analysis and global sensitivity analysis [23]. Both statistical and deterministic methods are used for sensitivity analysis [24]. In principle, both types of procedures can be used for either local or global sensitivity and uncertainty analysis, but in practice, deterministic methods are used mostly for local analysis, while statistical methods are mostly used for global analysis.

##### 3.1. Local Sensitivity Analysis

The local sensitivity analysis is to analyze the effects of local changes of a parameter in the system [25]. It can further provide insight into the local structure of the system. The main method for local sensitivity analysis is the partial-derivative method [26], which calculates the exact slopes $S_i = \partial R / \partial \alpha_i$ of the model response $R$ with respect to the $i$-th model parameter $\alpha_i$. For some complex systems, the partial derivatives can be determined indirectly by recalculating the response using parameter values that deviate by a small amount, $\delta\alpha_i$, from their nominal values $\alpha_i^0$ as follows:

$$S_i \approx \frac{R(\alpha_1^0, \dots, \alpha_i^0 + \delta\alpha_i, \dots, \alpha_n^0) - R(\alpha_1^0, \dots, \alpha_n^0)}{\delta\alpha_i},$$

where $i = 1, \dots, n$. Although this *indirect* (*or brute-force*) *method* [27] is conceptually simple to use and requires no additional model development, it is expensive for models based on nonlinear partial differential equations with many sensitivity coefficients, because each solution of the equations is itself expensive to obtain. Additionally, it involves a trial-and-error process when selecting the parameter perturbations $\delta\alpha_i$. Note that erroneous sensitivities will be obtained if (i) $\delta\alpha_i$ is chosen too small, in which case computational round-off errors will overwhelm the correct values, or (ii) the parameter dependence is nonlinear and $\delta\alpha_i$ is chosen too large, in which case the assumption of local linearity is violated.
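As a minimal sketch of the brute-force approach, assuming a cheap analytic toy model in place of an expensive simulation (the model and parameter values below are illustrative only), the one-sided finite-difference sensitivities can be computed as follows:

```python
import numpy as np

def model(a):
    """Toy analytic response R(alpha); stands in for an expensive simulation."""
    return a[0] ** 2 * np.sin(a[1]) + a[2]

def brute_force_sensitivities(model, alpha0, delta=1e-6):
    """One-sided finite-difference estimate of dR/d(alpha_i) at alpha0.
    A delta chosen too small amplifies round-off; too large violates the
    local-linearity assumption."""
    alpha0 = np.asarray(alpha0, dtype=float)
    r0 = model(alpha0)
    sens = np.empty_like(alpha0)
    for i in range(alpha0.size):
        perturbed = alpha0.copy()
        perturbed[i] += delta
        sens[i] = (model(perturbed) - r0) / delta  # one extra model run per parameter
    return sens

s = brute_force_sensitivities(model, [2.0, 0.5, 1.0])
# Analytic slopes for comparison: [2*a0*sin(a1), a0^2*cos(a1), 1.0]
```

Note that the method needs one additional model evaluation per parameter, which is exactly why it becomes expensive for simulations with many parameters.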

Another method of calculating the sensitivity coefficients is the *direct method*. In the direct method, the sensitivity coefficients are calculated from an auxiliary set of equations derived from the model equations. For some applications of the direct method in kinetics, the model equations and auxiliary equations have been coupled and solved together. This coupled solution procedure, however, has been found to be unstable or has failed completely for several stiff problems, and it is also quite inefficient [27]. For other applications, the auxiliary equations have been decoupled from the model equations and the two sets of equations solved separately. Among these, the most advanced and computationally economical method is the *decoupled direct method* (*DDM*) [23], which was originally proposed by Dunker [27]. In this approach, the Jacobian matrix needed to solve the original system at a given time step is also used to solve the sensitivity equations at that time step before proceeding to solve both the original and sensitivity systems at the next time step. The decoupling procedure greatly increases the efficiency of the method by taking advantage of the fact that the auxiliary equations for different sensitivity coefficients are quite similar.

To overcome the difficulties connected with the early coupled direct method, the *Green’s function method* (*GFM*) was developed [28]. The basis for the method is the well-known Green’s function technique. By working with the same differential equations as the conventional direct method, this new approach reduces the number of differential equations to be solved and replaces them by a set of integrals. Sensitivity coefficients of all orders are expressed in integral form and evaluated in a recursive manner. Since evaluating well-behaved integrals is usually much easier than solving stiff differential equations, substantial savings can be achieved by the method when the number of system parameters is large. Afterwards, a modification of the GFM with the analytically integrated Magnus (GFM/AIM) [29] was presented, which dramatically reduces the computational effort required to determine linear sensitivity coefficients. The technique employs the piecewise Magnus method for more efficient calculation of Green’s function kernels and treats the sensitivity integrals analytically.

For large-scale systems, in which the number of system parameters to be considered exceeds the number of responses, the *adjoint sensitivity analysis procedure* (*ASAP*) is, by far, the most efficient method even though it can only be implemented with an appropriately constructed adjoint sensitivity system [30]. The remarkable efficiency of the ASAP stems from the fact that the adjoint sensitivity system is linear in the adjoint function and is independent of any parameter variations. Hence, the adjoint sensitivity equation needs to be solved only once, for each response, in order to obtain the adjoint function. In particular, if the original model is linear in the state (i.e., dependent) variables, then the adjoint sensitivity equation can be solved independently of the original model. In turn, once the adjoint function has been calculated, it is used to obtain the sensitivities to all system parameters, by simple quadratures, without needing to repeatedly solve differential and/or integral equations.
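The efficiency argument can be illustrated on a small linear system. The sketch below (toy matrices; the parameters are assumed, for simplicity, to enter only through the diagonal of the system matrix) obtains the sensitivities of a response r = c^T x to all parameters from a single adjoint solve and checks them against brute-force finite differences:

```python
import numpy as np

# Toy linear state equation A(alpha) x = b with response r = c^T x, where
# A = A0 + diag(alpha) (an assumption of this sketch).
A0 = np.array([[4.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, 0.0, 1.0])
alpha = np.array([0.5, 0.2, 0.1])

def solve_state(alpha):
    return np.linalg.solve(A0 + np.diag(alpha), b)

# ONE adjoint solve (A^T lam = c) yields sensitivities to ALL parameters:
# dr/dalpha_i = -lam^T (dA/dalpha_i) x, which reduces to -lam[i]*x[i] here
# because dA/dalpha_i = e_i e_i^T.
x = solve_state(alpha)
lam = np.linalg.solve((A0 + np.diag(alpha)).T, c)
adjoint_sens = -lam * x

# Brute-force check: one extra state solve PER parameter.
eps = 1e-7
fd_sens = np.array([
    (c @ solve_state(alpha + eps * np.eye(3)[i]) - c @ x) / eps
    for i in range(3)
])
```

With n parameters, the adjoint route costs one state solve plus one adjoint solve, versus n + 1 state solves for the brute-force route.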

In addition, there is another kind of sensitivity, namely, implicit sensitivity. In some cases, the implicit sensitivity is ignored because it is difficult to quantify, but this may lead to wrong conclusions. For example, in nuclear reactor physics calculations, the perturbation of a microscopic cross section may impact the resonance calculation and consequently the macroscopic cross section, and then the eigenvalue and neutron flux distribution. Response (eigenvalue, reaction rate, etc.) sensitivity with respect to cross sections can be divided into two parts, namely, explicit sensitivity and implicit sensitivity. The former is the direct impact of the cross-section perturbation on the responses through the neutron transport equation, while the latter is the indirect impact of the cross-section perturbation on the responses through the resonance self-shielding procedure [31]. As an indirect impact related to resonance calculation, implicit sensitivity is often neglected in sensitivity and uncertainty analyses, and many sensitivity and uncertainty analysis codes lack the ability to perform implicit sensitivity calculations. However, from the original research of Greenspan et al. [32] to the subsequent research of Williams et al. [31], the results indicated that the implicit sensitivity had a nonnegligible importance relative to the explicit sensitivity and that the implicit effect had a magnitude greater than that of the explicit effect in some cases. Up to now, however, most implicit sensitivity studies have been established for simple resonance-calculation methods such as the Bondarenko method [31], the generalized Stammler method [33], and so on [34], which are not applicable to complex fuel and core designs. In order to expand the implicit sensitivity analysis method to a wider application domain, Liu et al. [35] proposed a method based on the generalized perturbation theory (GPT) to calculate the implicit sensitivity coefficients by using the subgroup method in the resonance self-shielding calculation. The numerical results show that it is necessary to perform implicit sensitivity analysis in sensitivity and uncertainty analysis to obtain more rigorous results.

##### 3.2. Global Sensitivity Analysis

To overcome the limitations of local methods (linearity and normality assumptions, local variations) [26], another class of methods has been developed in a statistical framework. In contrast to local SA, global SA aims to evaluate the entire parameter space [24] and measure the contribution of input variables to the output from the average point of view. In the global SA framework, the uncertainty of the inputs is modeled by random vectors. Common approaches for global sensitivity analysis include variance-based methods, moment-independent importance measures, and reliability-based SA.

The *variance-based methods*, such as the correlation ratio-based methods and the Sobol method [36], use variance as a measure of the importance of a parameter $X_i$ in contributing to the overall uncertainty of the response $Y$. For the correlation ratio-based method, the sensitivity measure (first-order effect) is written as follows:

$$S_i = \frac{\operatorname{Var}\big(\operatorname{E}[Y \mid X_i]\big)}{\operatorname{Var}(Y)},$$

where $\operatorname{E}[Y \mid X_i]$ denotes the conditional expectation of $Y$ while keeping $X_i$ fixed and $\operatorname{Var}(Y)$ denotes the variance of $Y$.

If $X_i$ has a great influence on $Y$, then the fluctuation of $Y$ will be largely determined by that of $X_i$. It means that the variation of $Y$ at any given $X_i$, *i.e.*, $\operatorname{Var}(Y \mid X_i)$, should be small, as should its expectation $\operatorname{E}[\operatorname{Var}(Y \mid X_i)]$. From the following decomposition of variance (the law of total variance)

$$\operatorname{Var}(Y) = \operatorname{Var}\big(\operatorname{E}[Y \mid X_i]\big) + \operatorname{E}\big[\operatorname{Var}(Y \mid X_i)\big],$$

we can see that $\operatorname{Var}(\operatorname{E}[Y \mid X_i])$, and hence $S_i$, should then be large. Hence, $S_i$ can reflect the importance of the impact of $X_i$ on $Y$. The advantage of this index is that it avoids assuming any specific form of the model and is applicable to all square-integrable systems. The disadvantage is that it can only describe the importance of a single variable to $Y$, but cannot reflect the joint influence of a group of variables.

To supplement this insufficiency, Sobol et al. proposed some more generalized indexes based on the orthogonal decomposition (Hoeffding decomposition [37]) of the objective function [38]. Under the assumption that the inputs $X_1, \dots, X_n$ are mutually independent and $f$ is square integrable, the objective function can be divided into the sum of several functions of different dimensions:

$$f(X_1, \dots, X_n) = f_0 + \sum_{i=1}^{n} f_i(X_i) + \sum_{1 \le i < j \le n} f_{ij}(X_i, X_j) + \dots + f_{1,2,\dots,n}(X_1, \dots, X_n),$$

where $f_0 = \operatorname{E}[Y]$, and so on for a total of $2^n$ terms. The unicity condition of the decomposition is granted by [39]

$$\int f_{i_1, \dots, i_s}(x_{i_1}, \dots, x_{i_s}) \, \mathrm{d}x_{i_k} = 0, \quad 1 \le k \le s,$$

where $1 \le i_1 < \dots < i_s \le n$. The functions are obtained from

$$f_i(X_i) = \operatorname{E}[Y \mid X_i] - f_0,$$

and similarly for higher orders:

$$f_{ij}(X_i, X_j) = \operatorname{E}[Y \mid X_i, X_j] - f_i(X_i) - f_j(X_j) - f_0,$$

where $\operatorname{E}[Y \mid X_i, X_j]$ denotes the conditional expectation of $Y$ at given $X_i$ and $X_j$. Because of the orthogonality of the decomposition,

$$\operatorname{Cov}\big(f_{i_1,\dots,i_s}, f_{j_1,\dots,j_t}\big) = 0 \quad \text{for } (i_1,\dots,i_s) \ne (j_1,\dots,j_t),$$

where $\operatorname{Cov}$ denotes the covariance of the two terms. Then, we can get the following conclusions:

$$\operatorname{Var}(Y) = \sum_{i=1}^{n} V_i + \sum_{1 \le i < j \le n} V_{ij} + \dots + V_{1,2,\dots,n},$$

where $V_{i_1,\dots,i_s} = \operatorname{Var}\big(f_{i_1,\dots,i_s}(X_{i_1},\dots,X_{i_s})\big)$. In this way, the fluctuation of the output variable is decomposed into the sum of the fluctuations of several items. Based on this variance decomposition, several indexes are proposed. Sobol (1993) [39]: $S_{i_1,\dots,i_s} = V_{i_1,\dots,i_s} / \operatorname{Var}(Y)$, known as the Global Sensitivity Index, which well reflects the interaction effect of several input variables on the response $Y$. Homma and Saltelli (1996) [40]: $S_{T_i} = \sum_{u \ni i} S_u$, where $S_{T_i}$ denotes the total effect of $X_i$ on $Y$ and the sum runs over all the subsets $u$ of $\{1, \dots, n\}$ that include $i$.

These indexes can be used to measure the importance of the variables, and they can identify whether a group of input variables has an interaction effect on the output variables. More importantly, they impose no restrictions on the specific form of the model: the model can be nonlinear, nonmonotonic, and nonadditive. However, due to the high-dimensional integration of complex functions involved in the calculation of the expectations and variances, this method is sometimes difficult to apply in practice.
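In practice the expectations and variances are usually estimated by Monte Carlo sampling. A minimal sketch of the first-order indices using the common pick-freeze estimator on a toy additive model whose exact indices are S1 = 0.2 and S2 = 0.8 (model and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Toy model with exact first-order Sobol indices S1 = 0.2, S2 = 0.8."""
    return x[:, 0] + 2.0 * x[:, 1]

n, d = 100_000, 2
A = rng.random((n, d))            # two independent sample matrices
B = rng.random((n, d))
yA, yB = f(A), f(B)
var_y = np.var(np.concatenate([yA, yB]))

S = np.empty(d)
for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]           # replace only column i ("pick-freeze")
    S[i] = np.mean(yB * (f(ABi) - yA)) / var_y
```

The cost is (d + 2) model runs per sample, which is why these estimators become expensive when each model evaluation is costly.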

The class of *moment-independent importance measures* comprises sensitivity measures based on discrepancies between density functions, cumulative distribution functions, and value of information [41]. The name “moment independent” communicates the intuition that these sensitivity measures take into account the change in the entire distribution (density) of the model output, instead of the variation of one of its particular moments (*e.g.*, variance). The common idea of these methods is to construct an index which can measure the difference between a conditional distribution (obtained by fixing a variable) and the unconditional distribution, so as to reflect the importance of the variable in the system. The following are some examples.

(i) *Density-Based Importance Measures*. The first class of moment-independent sensitivity measures introduced is the class of density-based sensitivity measures:

$$\delta_i = \frac{1}{2} \operatorname{E}_{X_i}\left[\int \big| f_Y(y) - f_{Y \mid X_i}(y) \big| \, \mathrm{d}y\right],$$

where the factor $1/2$ is inserted for normalization purposes, $f_Y$ denotes the density function of $Y$, and $f_{Y \mid X_i}$ denotes the conditional density function obtained by fixing $X_i$. In particular, $\delta_i$ assumes the null value if and only if $Y$ is independent of $X_i$. In fact, if $Y$ and $X_i$ are independent, fixing $X_i$ leaves the distribution of $Y$ unchanged. The sensitivity measure $\delta_i$ is defined by the $L^1$-norm. Similarly, several other distance measurements, such as the Kullback–Leibler (K-L) divergence, can also be used to define probabilistic sensitivity measures:

$$\eta_i = \operatorname{E}_{X_i}\big[D_{\mathrm{KL}}\big(f_{Y \mid X_i} \,\|\, f_Y\big)\big],$$

where $D_{\mathrm{KL}}$ represents the K-L divergence between the conditional and unconditional model output distributions. Finally, while full details about estimation cannot be given due to space limitations, we observe that the estimation of any density-based sensitivity measure involves the problem of estimating the empirical density.

(ii) *Cumulative Distribution-Based Sensitivity Measures*. A second class of moment-independent sensitivity measures takes the separation between cumulative distribution functions into account. In general, we write

$$\beta_i = \operatorname{E}_{X_i}\big[d\big(F_Y, F_{Y \mid X_i}\big)\big],$$

where $F_Y$ is the unconditional model output cumulative distribution function, $F_{Y \mid X_i}$ is the cumulative distribution function conditional on fixing model input $X_i$, and $d$ is a distance or divergence between cumulative distribution functions, such as a generic $L^p$-distance metric, the Kolmogorov–Smirnov distance, or Kuiper’s distance. These metrics hold different properties, including scale invariance and transformation invariance. In Baucells and Borgonovo [42], the choice of the metric is thoroughly discussed.

In many practical problems, the primary interest of the analyst may be focused on a particular mode of failure of the system under consideration, while the detailed spectrum of probabilistic outcomes may be of secondary concern [43]. For such problems, the so-called *reliability-based algorithms* [41] provide much faster and more economical solutions regarding the particular mode of failure. The concept of “failure” is characterized mathematically by a threshold level on the response. The reliability algorithms most often used are first-order reliability methods and second-order reliability methods [44]. Both of these methods use optimization algorithms to seek “the most likely failure point” in the space of uncertain parameters, which is defined by the mathematical model and the response function. Once this most likely failure point (referred to as the “design point”) has been determined, the probability of failure is approximately evaluated by fitting a first- (or second-) order surface at that point. Reliability algorithms have been applied to a variety of problems including structural safety, offshore oil field design and operation, and multiphase flow and transport in subsurface hydrology.
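As a rough empirical sketch of the CDF-based class, one can approximate the conditioning on an input by binning its samples and averaging the Kolmogorov–Smirnov separation between the conditional and unconditional output CDFs (the toy model, bin count, and binning approximation are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    grid = np.sort(np.concatenate([sample_a, sample_b]))
    Fa = np.searchsorted(np.sort(sample_a), grid, side="right") / len(sample_a)
    Fb = np.searchsorted(np.sort(sample_b), grid, side="right") / len(sample_b)
    return np.max(np.abs(Fa - Fb))

# Toy model in which X2 dominates the output, so beta_2 should exceed beta_1.
n = 20_000
X = rng.random((n, 2))
Y = X[:, 0] + 10.0 * X[:, 1]

def cdf_based_measure(X, Y, i, n_bins=20):
    """Average K-S separation between F_Y and F_{Y|X_i in bin}, weighting each
    bin by its sample mass (conditioning on bins is an empirical shortcut)."""
    edges = np.quantile(X[:, i], np.linspace(0.0, 1.0, n_bins + 1))
    dist = 0.0
    for k in range(n_bins):
        mask = (X[:, i] >= edges[k]) & (X[:, i] <= edges[k + 1])
        dist += mask.mean() * ks_distance(Y[mask], Y)
    return dist

beta = [cdf_based_measure(X, Y, i) for i in range(2)]
```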

#### 4. Uncertainty Propagation

Uncertainty propagation (UP, or error propagation) [45] aims to measure the impact of disturbances in the input variables on the system output. This information is critical when the system needs to quantify the confidence of the outputs [9]. Through appropriate methods, we can make horizontal comparisons of the uncertainties generated by several different models or analyze which steps are the key factors in the diffusion of uncertainty in a complex coupled multiphysics system. Unlike SA, which focuses on the importance ranking of inputs, UP pays more attention to the result of error propagation in the system and its influence on system stability. In general, probabilistic methods are a mature methodology for uncertain problems when sufficient sample information is available to construct accurate probability distributions of the random inputs [46]. However, due to expensive experimental costs, the sample data in many engineering practices are scarce. In this case, there is not enough information to make an accurate probabilistic representation of the different types of uncertainties existing in the system. Therefore, some other promising uncertainty theories have emerged, for example, the nonprobabilistic interval process [47] and Dempster–Shafer theory [48]. Compared with traditional probability theory, these new representations of uncertainty are able to represent epistemic uncertainty more accurately.

##### 4.1. Probability Boxes

In many cases, knowledge on the input is incomplete and probability theory is not sufficient to describe the uncertainty. This motivates the introduction of so-called probability boxes (p-boxes) which account for aleatory as well as epistemic uncertainty in the description of a variable [49]. There are two distinguished types of p-boxes, namely, free and parametric p-boxes.

Mathematically speaking, a *free p-box* is defined by lower and upper bounds, denoted by $\underline{F}(x)$ and $\overline{F}(x)$, on the cumulative distribution function (CDF) of a variable. A free p-box can be constructed from Kolmogorov–Smirnov confidence bounds [50], Chebyshev’s inequality [51], or robust Bayes’ methods [52]. The true CDF can have an arbitrary shape as long as it fulfils the characteristics of a generic CDF and lies within the bounds of the p-box. Because the shape of the true CDF is not specified, different types of curves are possible, including nonsmooth ones [49].
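A minimal sketch of constructing a free p-box from a distribution-free confidence band around the empirical CDF. Here the Dvoretzky–Kiefer–Wolfowitz form of the band is used as an approximation of the Kolmogorov–Smirnov bound; the sample and confidence level are illustrative:

```python
import numpy as np

def ks_pbox(samples, alpha=0.05):
    """Free p-box bounds [F_lower, F_upper] around the empirical CDF, using
    the DKW band half-width eps = sqrt(ln(2/alpha) / (2 n))."""
    n = len(samples)
    x = np.sort(samples)
    F_emp = np.arange(1, n + 1) / n
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    F_lower = np.clip(F_emp - eps, 0.0, 1.0)
    F_upper = np.clip(F_emp + eps, 0.0, 1.0)
    return x, F_lower, F_upper

rng = np.random.default_rng(2)
x, lo, hi = ks_pbox(rng.normal(size=200))
```

Any CDF that stays between `lo` and `hi` at the sorted points `x` is consistent with this free p-box.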

*Parametric p-boxes* (or distributional p-boxes) are defined as distribution function families whose parameters are known only within intervals:

$$\big\{F(x \mid \theta) : \theta \in D_\theta\big\},$$

where $D_\theta = [\underline{\theta}_1, \overline{\theta}_1] \times \dots \times [\underline{\theta}_k, \overline{\theta}_k]$ is the interval domain of the distribution parameters. The lower and upper boundary curves of the parametric p-box are obtained by

$$\underline{F}(x) = \min_{\theta \in D_\theta} F(x \mid \theta), \qquad \overline{F}(x) = \max_{\theta \in D_\theta} F(x \mid \theta).$$

A parametric p-box can be generated by determining confidence bounds on the distribution parameters. Parametric p-boxes allow for a clear separation of aleatory and epistemic uncertainty [7]: aleatory uncertainty is represented by the distribution function family, whereas epistemic uncertainty is represented by the intervals of the distribution parameters. However, parametric p-boxes are more restrictive than free p-boxes because they require knowledge of the distribution family.

##### 4.2. Dempster–Shafer Theory

*Dempster–Shafer theory* (*D-S theory*) is also known as evidence theory. In particular, evidence theory considers two measures, *i.e.*, belief and plausibility, for each event $E$ in the event space $\Omega$:

$$\operatorname{Bel}(E) = \sum_{A \subseteq E} m(A), \qquad \operatorname{Pl}(E) = \sum_{A \cap E \ne \varnothing} m(A),$$

where the belief measure $\operatorname{Bel}(E)$ is defined as the minimum amount of likelihood that must be associated with an event $E$, whereas the plausibility $\operatorname{Pl}(E)$ is defined as the maximum amount that could be associated with $E$. The basic probability assignment $m(A)$ is interpreted as the amount of likelihood that is associated with event $A$ but without any specification of how this likelihood might be apportioned. Note that

$$\operatorname{Bel}(E) \le \operatorname{Pl}(E), \qquad \operatorname{Pl}(E) = 1 - \operatorname{Bel}(\bar{E}).$$

The advantage of using evidence theory lies in the fact that it can successfully quantify the degree of uncertainty when the amount of available information is small, and it loosens the strict assumption of a single probability measure made in probability theory. Apart from that, since there is uncertainty in the given information, the evidential measure for the occurrence of an event and that for its negation do not have to sum to unity. However, Dempster’s rule for combining evidence assumes that the sources of evidence are independent. This rule has been subject to some criticism in the sense that it tends to completely ignore the conflicts that exist between the available evidence from different sources.
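The belief and plausibility of an event follow directly from a basic probability assignment over focal elements. A minimal sketch with toy masses (the frame and the numbers are assumptions of this example):

```python
# Basic probability assignment m over subsets (focal elements) of the frame
# {a, b, c}; the masses must sum to 1.
m = {
    frozenset({"a"}): 0.4,
    frozenset({"b", "c"}): 0.3,
    frozenset({"a", "b", "c"}): 0.3,
}

def belief(event, m):
    """Bel(E): total mass of focal elements wholly contained in E."""
    e = frozenset(event)
    return sum(mass for focal, mass in m.items() if focal <= e)

def plausibility(event, m):
    """Pl(E): total mass of focal elements that intersect E."""
    e = frozenset(event)
    return sum(mass for focal, mass in m.items() if focal & e)

bel_a = belief({"a"}, m)          # only {a} is contained in {a}
pl_a = plausibility({"a"}, m)     # {a} and {a, b, c} intersect {a}
```

Note that Bel({a}) = 0.4 and Pl({a}) = 0.7 do not sum to 1 with Bel of the negation, and Pl(E) = 1 - Bel(complement of E) holds as stated above.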

##### 4.3. Interval Process

*Interval process* is considered a flexible method for uncertainty propagation because only the variational ranges need to be well defined, without any detailed statistical characteristics of the uncertain parameters [47]. With widespread attention over the recent two decades, many interval approaches have been successfully developed [53]. By treating the response bounds as two extreme-value problems, optimization-based methods can in principle obtain accurate results [54]. In particular, when the system response is monotonic with respect to the uncertain parameters, the exact response bounds can be derived by the interval vertex method, in which only the endpoint combinations of the interval parameters are simulated [55]. When the system does not satisfy monotonicity, the response bounds can be simulated directly via substantial sampling, or sometimes obtained via monotonic interval determination with the help of sensitivity analysis to avoid the huge computational cost.
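A minimal sketch of the interval vertex method for a response that is monotonic in each input (the response function and the intervals are illustrative assumptions):

```python
import numpy as np
from itertools import product

def vertex_bounds(func, intervals):
    """Interval vertex method: for a response monotonic in each input, the
    exact bounds are attained at corner combinations of the input intervals,
    so only 2^n evaluations are needed."""
    values = [func(np.array(corner)) for corner in product(*intervals)]
    return min(values), max(values)

def response(x):
    """Toy response, monotonic in each component over the intervals below."""
    return 3.0 * x[0] - 2.0 * x[1] + x[0] * x[2]

intervals = [(1.0, 2.0), (0.5, 1.5), (0.0, 1.0)]
lo, hi = vertex_bounds(response, intervals)
```

For this monotonic toy response the exact bounds are attained at the corners (1.0, 1.5, 0.0) and (2.0, 0.5, 1.0); for nonmonotonic responses the corner enumeration can miss interior extrema, which is exactly the limitation noted above.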

In practice, p-boxes and D-S theory provide measures of the uncertainty of the variables. However, in order to perform the uncertainty propagation, we need to combine them with the interval process. To obtain more details about the response for each variable over its domain, and to weaken the presupposition of monotonicity, the slicing algorithm needs to be introduced. The slicing algorithm transforms the propagation of p-boxes into the propagation of a large number of intervals [7].

First, each p-box is discretized into a number of intervals with associated probability masses. For variable $X_i$, the probability range $[0, 1]$ is divided into $m_i$ subintervals $[p_i^{(j-1)}, p_i^{(j)}]$ with corresponding thickness (probability mass) $w_i^{(j)} = p_i^{(j)} - p_i^{(j-1)}$, where $p_i^{(0)} = 0$, $p_i^{(m_i)} = 1$, and $\sum_{j=1}^{m_i} w_i^{(j)} = 1$. Given the bounds $\underline{F}_i$ and $\overline{F}_i$ of the free p-box of $X_i$, the corresponding intervals in $X_i$ are as follows:

$$\underline{x}_i^{(j)} = \overline{F}_i^{-1}\big(p_i^{(j-1)}\big), \qquad \overline{x}_i^{(j)} = \underline{F}_i^{-1}\big(p_i^{(j)}\big).$$

Then, let $\mathbf{j} = (j_1, \dots, j_n)$ be a multi-index defining a combination of intervals, one for each input parameter:

$$\mathbf{j} \in \mathcal{J} = \{1, \dots, m_1\} \times \dots \times \{1, \dots, m_n\}.$$

Let $\mathcal{D}_{\mathbf{j}}$ be the hyperrectangle defined by

$$\mathcal{D}_{\mathbf{j}} = \big[\underline{x}_1^{(j_1)}, \overline{x}_1^{(j_1)}\big] \times \dots \times \big[\underline{x}_n^{(j_n)}, \overline{x}_n^{(j_n)}\big].$$

For each $\mathbf{j} \in \mathcal{J}$, two optimization problems are solved to define the associated bounds of the response:

$$\underline{y}_{\mathbf{j}} = \min_{\mathbf{x} \in \mathcal{D}_{\mathbf{j}}} \mathcal{M}(\mathbf{x}), \qquad \overline{y}_{\mathbf{j}} = \max_{\mathbf{x} \in \mathcal{D}_{\mathbf{j}}} \mathcal{M}(\mathbf{x}),$$

where $\mathcal{M}$ denotes the model. When the number of inputs $n$ and the numbers of subintervals $m_i$ become large, this quickly becomes intractable due to the large number of optimizations. Hence, surrogate models are sometimes needed to simplify the process.
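A compact sketch of the slicing procedure, assuming free p-boxes bounded by uniform CDFs (so the quantile bounds are available in closed form) and a monotonic toy model, so the inner optimizations can be replaced by the vertex method; all functions and numbers are illustrative:

```python
from itertools import product

# Quantile bounds of each input's free p-box: for probability level p the true
# quantile lies in [q_lo(p), q_hi(p)]. Uniform bounding CDFs are assumed here
# purely to keep the sketch self-contained and invertible in closed form.
def uniform_quantile_bounds(a_lo, b_lo, a_hi, b_hi):
    return (lambda p: a_lo + p * (b_lo - a_lo),   # inverse of the upper CDF
            lambda p: a_hi + p * (b_hi - a_hi))   # inverse of the lower CDF

pboxes = [uniform_quantile_bounds(0.0, 1.0, 0.2, 1.4),
          uniform_quantile_bounds(1.0, 2.0, 1.1, 2.3)]

def model(x):
    """Toy monotonic response, so corner evaluation replaces the optimizer."""
    return x[0] + x[1] ** 2

m = 10  # slices per input, each carrying probability mass 1/m

# Focal interval j of input i covers probability levels [j/m, (j+1)/m].
focal = [[(q_lo(k / m), q_hi((k + 1) / m)) for k in range(m)]
         for q_lo, q_hi in pboxes]

out_intervals, masses = [], []
for combo in product(*focal):                      # all m^n hyperrectangles
    corners = [model(c) for c in product(*combo)]  # vertex method stands in
    out_intervals.append((min(corners), max(corners)))  # for the two optimizations
    masses.append((1.0 / m) ** len(pboxes))

y_min = min(lo for lo, _ in out_intervals)
y_max = max(hi for _, hi in out_intervals)
```

The output intervals and masses form a discretized p-box on the response; the m^n growth of the loop is the intractability noted above.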

#### 5. Model Calibration

Here we focus on the difference between computer simulations and physical experiments. Model calibration aims to reduce parameter uncertainty and improve consistency between computer models and physical experiments by adjusting selected tunable parameters through estimation, optimization, and other approaches. Tunable parameters are modeling or calculation parameters that either have no physical meaning, or have physical meaning but are subject to large cognitive (epistemic) uncertainty, and that have an obvious influence on M&S results. Calibration can be classified as deterministic or statistical [56]. Deterministic calibration merely determines point estimates of the best-fit input parameters that minimize the discrepancies between code output and experimental data [57].

The earliest calibration work was mostly manual, or based on the modeler's experience. For example, for calibration of the equation-of-state parameters in detonation tests, Lee et al. [58] put forward an initial guess for the parameters, varied their values in a hydrocode, and repeated the calculation until satisfactory agreement between calculation and experiment was obtained. In these studies, parameters were "guessed" [58] or "fiddled" [59] based on experience. Current deterministic calibration approaches range from manually tuned parameters [60] to brute-force search methods [61]; all rely on repeated simulation. An underlying challenge in simulation-based calibration is the curse of dimensionality: the search space (and therefore computation time) increases drastically, often exponentially, with the number of parameters being estimated simultaneously. Hence, simulation-based calibration methods are more suitable for small-scale problems. For large-scale applications, we need to rely on high-precision surrogate models [62].

Moreover, the calibration problem requires locating the minimum of an objective function that measures the fit of the model results to the data (e.g., the sum of squared errors). Unfortunately, the objective function may have many local minima, and the global minimum may reside in a part of the parameter space that the modeler considers undesirable [60]. Statistical methods, by contrast, can produce a more global calibration result. We review statistical inverse UQ in detail in the next section.
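To make the deterministic picture concrete, here is a small, self-contained sketch of brute-force calibration of a single hypothetical decay-rate parameter against synthetic noisy data (the model, parameter value, and noise level are all invented for the example):

```python
import numpy as np

# synthetic "experimental" data from a hypothetical decay model y = exp(-k t)
t = np.linspace(0.0, 2.0, 21)
rng = np.random.default_rng(0)
y_obs = np.exp(-1.3 * t) + 0.01 * rng.standard_normal(t.size)  # k_true = 1.3

def sse(k):
    """Objective: sum of squared errors between code output and data."""
    return np.sum((np.exp(-k * t) - y_obs) ** 2)

# brute-force grid search: workable for one parameter, but the cost
# grows exponentially with the number of parameters (the curse of
# dimensionality), hence the move to surrogate-assisted calibration
grid = np.linspace(0.1, 3.0, 581)
k_hat = grid[np.argmin([sse(k) for k in grid])]
print(round(float(k_hat), 2))
```

With more parameters, the same grid search would require exponentially many simulator runs, which is exactly the cost that surrogate models are meant to avoid.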

#### 6. Bayesian Inference

Bayesian inference is a typical UQ method under probabilistic representation. It can be used to analyze how uncertainty is transferred from input to output in a complex multilevel system. It can also be used to quantify the uncertainty of the input parameters through posterior probability analysis, so as to reduce the discrepancy between experiment and numerical simulation.

##### 6.1. Hierarchical Bayesian Approach

Simulation of a complex physical system involves multiple levels of modeling, from material (lowest level) through component and subsystem to system (highest level). Different interdependent physics-based predictive models, and their implementations in computer simulation codes, are developed at each level. The uncertainties in the input variables are propagated through the simulation codes from one level to the next. The challenges are to identify the relationships between the models at the various levels and to propagate uncertainty between the computational models.

To quantify the uncertainties in multilevel models, a *hierarchical Bayesian approach* has been proposed [63]: structural equation modeling (SEM) with latent variables, supplemented with Bayesian regression and inference methods. The variabilities and uncertainties are quantified at the individual levels, and the probability of meeting requirements may be assessed by model extrapolation from the subsystem to the full system. By using Bayesian estimation, unbiased estimates are derived for the relationships between latent variables at different levels, the measurement and prediction errors are modeled explicitly, and the variabilities of the input variables in the computational model are updated effectively [64].

Multilevel Bayesian methods allow more flexible assumptions regarding both the model parameters and the prediction-error probability distributions [65]. The development of hierarchical Bayesian models has brought about many successful applications in different scientific disciplines [66]. In molecular dynamics, hierarchical models have recently been developed for calibrating parametric models and fusing heterogeneous experimental data from different system operating conditions [67]. In structural dynamics, Behmanesh et al. [68] developed a hierarchical framework to model the variability of modal parameters over dissimilar experiments; this framework has found extensive application in uncertainty quantification and propagation for dynamical models. Nagel et al. [69] proposed a unified multilevel Bayesian framework for calibrating dynamical models in the special case of noise-free vibration measurements. Sedehi et al. [70] introduced a novel Bayesian hierarchical setting, which breaks time-history vibrational responses into several segments so as to capture and identify the variability of the inferred parameters over multiple segments.

##### 6.2. Bayesian Inference for Inverse UQ

*Inverse UQ* aims to quantify the uncertainty in input parameters such that the discrepancies between code output and observed experimental data can be reduced [71]. In this sense, it is similar to model calibration or parameter estimation. However, unlike deterministic calibration methods, statistical calibration methods also capture the uncertainty of the estimates rather than merely determining point estimates of the best-fit input parameters [19].

The inverse UQ mainly employs Bayesian inference theory and explores the posterior PDF with Monte Carlo sampling. According to Bayes' theorem, the posterior PDF of the input parameters $\boldsymbol{\theta}$ given the observed data $\boldsymbol{d}$ is

$$p(\boldsymbol{\theta} \mid \boldsymbol{d}) = \frac{p(\boldsymbol{d} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\boldsymbol{d})},$$

where $p(\boldsymbol{\theta})$ is the prior, $p(\boldsymbol{d} \mid \boldsymbol{\theta})$ is the likelihood function, and $p(\boldsymbol{\theta} \mid \boldsymbol{d})$ is the posterior. In brief, the prior and posterior probabilities represent degrees of belief about possible values of $\boldsymbol{\theta}$, before and after observing the data $\boldsymbol{d}$. A simple model for the likelihood assumes that independent additive errors account for the deviation between the predicted and observed values of $\boldsymbol{d}$:

$$\boldsymbol{d} = \mathcal{M}(\boldsymbol{\theta}) + \boldsymbol{\epsilon},$$

where the components of $\boldsymbol{\epsilon}$ are *i.i.d.* (independent and identically distributed) random variables. The posterior PDF is the Bayesian solution to the inverse problem. Compared with most deterministic inverse methods, it yields not just a single value but a PDF, from which various moments and marginal densities can be computed. However, the posterior PDF so defined is nonstandard, implicit, and unnormalized, and a numerical sampling method is needed to explore it.

Monte Carlo sampling is commonly used to explore the posterior PDF by approximating integrals of the form

$$\mathbb{E}\big[f(\boldsymbol{\theta}) \mid \boldsymbol{d}\big] = \int f(\boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \boldsymbol{d})\, \mathrm{d}\boldsymbol{\theta}$$

with the numerical form

$$\mathbb{E}\big[f(\boldsymbol{\theta}) \mid \boldsymbol{d}\big] \approx \frac{1}{N} \sum_{k=1}^{N} f\big(\boldsymbol{\theta}^{(k)}\big), \qquad \boldsymbol{\theta}^{(k)} \sim p(\boldsymbol{\theta} \mid \boldsymbol{d}).$$

*Plain Monte Carlo* methods are rarely used due to the difficulty of randomly sampling from complex or high-dimensional distributions. Instead, *Markov chain Monte Carlo* (*MCMC*) methods are commonly used: a class of sequential sampling strategies in which the next sampled state depends only on the current state [11]. An MCMC algorithm, such as Metropolis–Hastings or Gibbs sampling, samples from a given distribution by constructing a Markov chain whose stationary distribution coincides with the target distribution. A major challenge of MCMC is that it requires a large number of samples to achieve statistical convergence, with the specific number depending on the shape of the posterior distribution and the effectiveness of the sampling. In CFD applications, each likelihood evaluation involves a simulation that takes hours or even weeks to run. Clearly, it is impractical to perform a full simulation for each likelihood evaluation in the MCMC sampling. Therefore, surrogate models are commonly used for likelihood evaluation in MCMC-based model uncertainty quantification to alleviate the high computational cost of simulations [72].
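A minimal random-walk Metropolis sampler illustrates the idea; the standard-normal target below is a stand-in for a real posterior, whose evaluation would call the simulator or, more cheaply, a surrogate:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis: propose a Gaussian jump and accept it
    with probability min(1, posterior ratio); the stationary
    distribution of the chain is the target posterior."""
    rng = np.random.default_rng(seed)
    theta, lp = theta0, log_post(theta0)
    chain = np.empty(n_samples)
    for i in range(n_samples):
        prop = theta + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # MH acceptance test
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

# toy target: standard-normal log-posterior (up to a constant)
chain = metropolis_hastings(lambda th: -0.5 * th ** 2, 0.0, 20000)
samples = chain[5000:]          # discard burn-in
print(samples.mean(), samples.std())
```

The sample mean and standard deviation approach the target's values (0 and 1) as the chain grows; each `log_post` call is where the expensive simulation, or its surrogate, would sit.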

Besides, when the exact probability is not critical and only low-order moments such as the mean and variance are important, various approximate Bayesian inference methods can be used. These methods use the *maximum a posteriori* (*MAP*) probability estimate to obtain the mode (peak) of the posterior rather than the full posterior distribution. The MAP point is obtained by finding the values of $\boldsymbol{\theta}$ that maximize the log posterior (minimize the negative log posterior):

$$\boldsymbol{\theta}_{\mathrm{MAP}} = \arg\max_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta} \mid \boldsymbol{d}) = \arg\min_{\boldsymbol{\theta}} \big[-\log p(\boldsymbol{d} \mid \boldsymbol{\theta}) - \log p(\boldsymbol{\theta})\big].$$

This effectively eliminates the burn-in procedure of an MCMC chain, in which some initial portion of the chain is discarded, since the chain can instead be initiated from a high-probability starting point: the MAP solution.

Further, the MAP estimate can be computed in several other ways, among which the most commonly used are variational methods [73] and ensemble methods [74]. In variational methods, the minimization problem is often solved using gradient descent, with the gradient obtained via adjoint methods. In contrast, ensemble methods use samples to estimate the covariance of the state vector, which is then used to solve the optimization problem. Variational methods have long been the standard in data assimilation and still dominate the field, while ensemble methods such as ensemble Kalman filtering have matured over the past decades [75]. Hybrid approaches combining the two are an area of intense research and have been explored in CFD applications. Moreover, mathematicians have performed analyses to shed light on why these methods work well in practice despite their theoretical limitations [76].

#### 7. Experimental Design

In the M&S procedure, the choice of the experimental design, *i.e.,* the set of input samples, is crucial for an accurate representation of the computational model [77]. Various approaches are available, from purely deterministic to fully stochastic sampling techniques. The performance of a sampling strategy and the quality of its resulting samples directly control the efficiency and robustness of any associated sampling-based analysis. The key question is how to estimate the behavior of the computational model with a few representative samples, which should be carefully chosen so that the experimental design covers the entire space of input parameters.

##### 7.1. Initial Experimental Design

Intuitively, a regular grid is a good choice to cover the whole space of input variables in a deterministic way [77]. It is a full factorial design, *i.e.*, all regions are covered regularly with the same density of samples in each subdomain. The drawback of a full factorial design is that it involves a large number of samples, *i.e.*, $s^M$, where $s$ is the number of levels in each dimension and $M$ is the number of variables (dimensions). Therefore, more efficient methods of generating samples are needed.

In theory, *Monte Carlo sampling* is a purely stochastic design of experiments: the samples are generated randomly according to an assumed probability density function. For greater efficiency, semirandom designs were created, which combine randomness with a deterministic component [77].

*Orthogonal experimental design* is one such partly deterministic method. When there are interactions between variables, the workload of a full experiment becomes very large, even infeasible; orthogonal experimental design is a better choice in this situation [78]. A highly efficient tool for orthogonal experimental design is the orthogonal table [79]. The experimenter finds the appropriate orthogonal table according to the number of factors, the number of levels of each factor, and whether there are interactions, and then selects representative points from the comprehensive experiment. Orthogonality makes it possible to achieve results equivalent to a large comprehensive experiment with the smallest number of runs. Orthogonal table design is therefore an efficient, fast, and economical multifactor experimental design method [80].

*Latin hypercube sampling* (*LHS*) is a special case of an orthogonal array [77] and is the most widely used method at present. The defining property of this experimental design is that the projection onto any axis of the $M$-dimensional space results in a uniform distribution [81]. The space of input parameters is partitioned by a regular grid, and the samples are arranged so that each one-dimensional bin of the grid contains exactly one sample. Inside each cell, the sample coordinates are chosen randomly [82]; variants include random Latin hypercube sampling (RLHS) and median Latin hypercube sampling (MLHS). LHS is typically used to save computing time when running Monte Carlo simulations.
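A compact sketch of random LHS in NumPy (the sample size and dimension are arbitrary); the final check verifies the defining property that each one-dimensional projection occupies every bin exactly once:

```python
import numpy as np

def latin_hypercube(n, dim, seed=0):
    """Random LHS: each dimension is split into n equal-probability
    bins; exactly one sample lands in each bin, at a uniformly random
    position inside it."""
    rng = np.random.default_rng(seed)
    # independently shuffled bin indices per dimension
    bins = rng.permuted(np.tile(np.arange(n), (dim, 1)), axis=1).T
    return (bins + rng.uniform(size=(n, dim))) / n

x = latin_hypercube(8, 2)
# the projection onto every axis is uniform: one point per 1/8 bin
occupied = np.sort(np.floor(x * 8).astype(int), axis=0)
print(np.array_equal(occupied, np.tile(np.arange(8)[:, None], (1, 2))))  # True
```

Using the bin midpoints instead of the uniform jitter would give the median-LHS variant.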

Due to the high dimensionality of the input parameters in reactor-physics M&S, the sample size required by conventional sampling methods is very large, and it is impossible to directly estimate the exact sample size that provides converged UQ results. A key issue is therefore to reduce and determine the sample size while ensuring UQ results consistent with those obtained under infinite sample size and infinitesimal statistical fluctuation. In this context, Sui et al. [83] proposed a covariance-oriented sample transformation (COST) to generate multivariate normal samples for uncertainty analysis. In this method, samples from the standard normal distribution are transformed linearly such that the mean and covariance of the transformed samples equal those of the input parameter population. In this way, the transformed samples fully describe the uncertainty information of the input parameters. Numerical comparisons show that COST provides consistent uncertainty analysis results with a very small sample size, compared with conventional sampling methods requiring very large sample sizes.

##### 7.2. Adaptive Experimental Design

In practice, we hope to have a “proper sample size” from the initial sampling process for a given simulation model and sampling-based analysis [84]. First, it refers to a sufficiently large number of sample points that can ensure the convergence or robustness of the analysis results. Then, on the premise of achieving the previous goal, we hope the sample size can be as small as possible. However, the “proper sample size” is not typically known a priori. It means that we may have to supplement some samples in the subsequent analysis process according to the information we get from the initial samples and calculations.

One major drawback of traditional LHS and many other sampling strategies is that they generate the entire set of sample points at once [84], which is referred to as one-stage or one-shot sampling. This requires users to specify the sample size prior to the associated sampling-based analysis. It is also often the case that the user is not satisfied with the resulting sampling-based analysis (*e.g.*, convergence criteria are not met) and needs to enlarge the sample size and resume the analysis with the updated sample. The need to maintain the desired distributional properties while the sample size grows progressively warrants the development and application of multistage or sequential sampling. In this way, sequential sampling allows the user to monitor the performance of the sampling-based analysis and assess stopping criteria (*e.g.*, convergence criteria) in an online manner. To address these needs, Sheikholeslami et al. [84] introduced a novel strategy, called *PLHS* (*progressive Latin hypercube sampling*), for sequentially sampling the input space while progressively maintaining the Latin hypercube properties. PLHS is composed of a series of smaller slices generated such that the union of the slices from the beginning to the current stage optimally preserves the desired distributional properties while achieving maximum space-filling.

In addition, unlike space-filling DoE techniques (*e.g.*, LHS), there is another class of adaptive design techniques that generate more design points in areas of interest, for example, areas with high gradients of the response function. Such a technique starts by generating an initial design with one of the space-filling techniques and building an approximation based on this initial sample. It then enriches the sample iteratively, adding the next best design point in the most interesting regions to minimize the uncertainty of the approximation [85]. The question then becomes how to define "best." In information theory, the mutual information is a measure of the reduction in the uncertainty of one random variable due to knowledge of another [86]. Recast in the context of experimental design, the mutual information represents how much the proposed experiment and resulting observation would reduce the uncertainties in the model parameters. Therefore, among a set of candidate experimental design conditions, the one that maximizes the mutual information is the most desirable. This is the premise that motivates the Bayesian experimental design algorithm implemented in Dakota [85]. Similarly, Sudret [7] introduced an efficient adaptive experimental design strategy that adds multiple samples at each iteration to increase the accuracy of the estimated QoI. Further, this algorithm is equipped with a new stopping criterion that monitors the convergence of the QoI better than existing ones, further reducing the total computational resources needed for an accurate estimation.

#### 8. Surrogate Model

State-of-the-art numerical simulations are often characterized by a vast number of input parameters [87]. Such applications may require thousands to millions of runs of the high-fidelity (parameterized) computational model, which is not affordable in many practical cases even with high-performance, parallel computing architectures [7]. Hence, the idea of using a simpler surrogate model to represent a complex phenomenon has gained increasing popularity over the past three decades [88]. A surrogate model, also known as a response surface or meta-model, is an easy-to-evaluate mathematical function that approximates the actual simulation model based on pairs of input-output samples [89]. Here, we focus on surrogate models that treat the computational model as a black box and need only the input values and the corresponding output values of the QoI. Among them, several types of models are widely used and have received sustained attention from researchers: polynomial chaos expansions (PCEs), kriging, reduced-order models (ROMs), and artificial neural networks (ANNs).

##### 8.1. Polynomial Chaos Expansions

PCE models approximate the computational model by a linear combination of multivariate polynomials:

$$Y = \mathcal{M}(\boldsymbol{X}) \approx \sum_{\boldsymbol{\alpha} \in \mathbb{N}^M} y_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{X}),$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_M)$ are multi-indices, with $\alpha_i$ denoting the degree of the univariate polynomial in $X_i$; $\Psi_{\boldsymbol{\alpha}}$ are multivariate orthonormal polynomials built in coherency with the distribution of the input random vector $\boldsymbol{X}$; and $y_{\boldsymbol{\alpha}}$ are the associated deterministic coefficients.

If the simulation response function can be assumed to be a continuous, well-behaved function of the input variables, one may represent it as a Taylor series about some particular environmental condition [90]. However, since the computational model is not explicitly expressed, the coefficients of such an expansion cannot be obtained by differentiation, but rather by regression methods such as least squares (LS) and maximum likelihood (ML) [91]. In practice, an infinite series expansion is not tractable. In the early days, research focused on how to find the orthonormal basis [92, 93], and the so-called projection methods were developed (see [94] for a comprehensive review). These findings opened the possibility of representing stochastic processes with different orthogonal polynomials according to the properties of the process.

In practice, the use of infinite-dimensional PCEs is not tractable, and for smooth functions a small number of polynomials can typically represent the output of the computational model accurately. Hence, other researchers have focused on approximate representations by means of truncation, introducing truncation schemes to reduce the number of candidate polynomials [95]. The truncated representation is given by

$$Y \approx \sum_{\boldsymbol{\alpha} \in \mathcal{A}} y_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{X}) + \varepsilon,$$

in which $\mathcal{A} \subset \mathbb{N}^M$ is a truncation set and $\varepsilon$ is the truncation-induced error. A classical truncation scheme consists in selecting all polynomials of total degree less than or equal to $p$, in which case the truncation set reads

$$\mathcal{A}^{M,p} = \big\{\boldsymbol{\alpha} \in \mathbb{N}^M : \|\boldsymbol{\alpha}\|_1 \le p\big\},$$

which may be generalized to the hyperbolic truncation set

$$\mathcal{A}^{M,p,q} = \big\{\boldsymbol{\alpha} \in \mathbb{N}^M : \|\boldsymbol{\alpha}\|_q \le p\big\}, \qquad \|\boldsymbol{\alpha}\|_q = \Big(\sum_{i=1}^{M} \alpha_i^q\Big)^{1/q},$$

where $q \in (0, 1]$ is a hyperparameter and a decreasing $q$ leads to a smaller number of interactive polynomials. Later, variable selection methods from statistics were gradually introduced into the construction of PCE models to screen out the variables that have an important impact on the response. By means of submodel testing, or by adding a penalty term to the likelihood function or to the least-squares objective [96], the number of variables is reduced and a sparse model is constructed.
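As a toy illustration of a truncated PCE, the sketch below fits probabilists' Hermite polynomials to a simple analytic model by least squares (the model and design size are invented for the example):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

# Fit a truncated PCE of M(X) = X^2 + X with X ~ N(0, 1), using
# probabilists' Hermite polynomials He_k (orthogonal under the
# standard normal measure); coefficients are found by least squares.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)            # experimental design
y = x ** 2 + x                          # model evaluations
p = 3                                   # truncation: total degree <= 3
Psi = hermevander(x, p)                 # design matrix [He_0 .. He_3]
coef, *_ = np.linalg.lstsq(Psi, y, rcond=None)
# exact expansion: x^2 + x = He_0(x) + He_1(x) + He_2(x)
print(np.round(coef, 6))
```

Because the toy model lies in the span of the basis, the fitted coefficients recover the exact expansion (1, 1, 1, 0); for a real simulator the residual would reflect the truncation error $\varepsilon$.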

##### 8.2. Kriging

Kriging [97], also known as Gaussian process modeling, assumes that the computational model is a realization of a Gaussian random process. It is a regression algorithm for the spatial modeling and prediction of a stochastic process or random field based on a covariance function [98]. The property of being the best linear unbiased predictor has made it widely applied in fields related to spatial statistics, such as geostatistics, environmental science, and atmospheric science. The general form of a kriging surrogate is the sum of two terms, a trend (mean prediction) defined by known independent basis functions at the specified location, and a random error with zero mean whose correlation structure links distinct samples:

$$\mathcal{M}^{\mathrm{K}}(\boldsymbol{x}) = \sum_{j=1}^{P} \beta_j f_j(\boldsymbol{x}) + \sigma^2 Z(\boldsymbol{x}),$$

where $\mu(\boldsymbol{x}) = \sum_{j=1}^{P} \beta_j f_j(\boldsymbol{x})$ is the mean value (a.k.a. trend) of the Gaussian process and $\beta_j$ is the trend coefficient corresponding to the $j$-th basis function $f_j$; $\sigma^2$ is the process variance; and $Z(\boldsymbol{x})$ is a zero-mean, unit-variance stationary Gaussian process, characterized by an autocorrelation function $R(\boldsymbol{x}, \boldsymbol{x}'; \boldsymbol{\theta})$ with hyperparameters $\boldsymbol{\theta}$.

Various correlation functions can be found in the literature [99], including linear, exponential, Gaussian (also called squared exponential), and Matérn autocorrelation functions. According to the complexity of the trend function, there are three popular kriging models [100], namely, simple, ordinary, and universal kriging. *Simple kriging* assumes that the trend has a known constant value, *i.e.*, $\mu(\boldsymbol{x}) = \mu_0$ with $\mu_0$ known. *Ordinary kriging* assumes that the trend has a constant but unknown value, *i.e.*, $\mu(\boldsymbol{x}) = \beta_0$ with $\beta_0$ unknown. The most general and flexible is *universal kriging*, which assumes that the trend is a sum of preselected basis functions, $\mu(\boldsymbol{x}) = \sum_{j=1}^{P} \beta_j f_j(\boldsymbol{x})$.

However, specifying a trend or a value for the mean when the underlying function is unknown may lead to inaccurate predictions. To avoid this problem, blind kriging was proposed [101], which uses a Bayesian technique to select the trend model with maximum posterior probability. Overall, kriging focuses on the local behavior of the computational model, resulting in high prediction accuracy close to the sample points of the experimental design, but its global behavior can be poor. To compensate, Schöbi et al. [77] proposed a novel PC-kriging approach that uses a classical PCE as the kriging trend:

$$\mathcal{M}^{\mathrm{PCK}}(\boldsymbol{x}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} y_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{x}) + \sigma^2 Z(\boldsymbol{x}),$$

which enjoys a good global approximation property. Numerical results show that PC-kriging performs better than the two traditional meta-modeling techniques taken separately [7].
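A bare-bones simple-kriging predictor (zero trend, Gaussian autocorrelation, illustrative data) shows the interpolation property at the experimental-design points:

```python
import numpy as np

def simple_kriging(X, y, Xs, theta=1.0, nugget=1e-10):
    """Simple kriging (known zero trend) with a Gaussian autocorrelation
    R(h) = exp(-(h / theta)^2); returns the BLUP mean at points Xs."""
    def corr(A, B):
        return np.exp(-((A[:, None] - B[None, :]) / theta) ** 2)
    R = corr(X, X) + nugget * np.eye(len(X))   # small nugget for stability
    return corr(Xs, X) @ np.linalg.solve(R, y)

# train on a few evaluations of an expensive "model" (here, sin)
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.sin(X)
mean = simple_kriging(X, y, np.array([0.5, 0.75]))
# kriging interpolates: the prediction at sample point 0.5 matches sin(0.5)
print(np.round(mean, 3))
```

In a full implementation, the hyperparameter `theta` would be estimated (e.g., by maximum likelihood) and the predictive variance would be returned alongside the mean.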

##### 8.3. Reduced-Order Model

In the study of many complex physical processes, the solution is usually high-dimensional, which poses many mathematical challenges for traditional statistical methods. For example, despite the tremendous progress made in the computational fluid dynamics community over the past few decades, numerical tools are still too slow for the simulation of practical flow problems, consuming thousands or even millions of computational core-hours. To enable feasible multidisciplinary analysis and design, the numerical techniques need to be accelerated by orders of magnitude. Reduced-order modeling (ROM) has been considered a promising approach for this purpose.

From a mathematical perspective, the problem of dimensionality/order reduction can be formulated as follows: given the $M$-dimensional vector $\boldsymbol{x}$, find a reduced representation $\boldsymbol{z}$ of dimensionality $m$ that retains the geometry of the data as much as possible. Here, $m$ is the intrinsic dimensionality, and $m \ll M$ is expected. Intrinsic dimensionality means that the original data lie on or near a manifold of dimensionality $m$ embedded in the $M$-dimensional space. In general, dimensionality reduction techniques can be divided into two groups, linear and nonlinear. Linear techniques assume that the data lie on or near a linear subspace of the high-dimensional space, so the reduction can be written as $\boldsymbol{z} = \boldsymbol{W}^{\mathsf{T}} \boldsymbol{x}$, where $\boldsymbol{W}$ is a linear transformation matrix. In nonlinear methods, the linear transformation is replaced with $\boldsymbol{z} = g(\boldsymbol{x})$, where $g$ denotes a nonlinear mapping function. Here we introduce two popular ROM technologies; for a detailed review of other ROMs, the reader is referred to Benner et al. [102] and Yu et al. [103].

One of the most popular linear techniques is the proper orthogonal decomposition (POD) [103], also known as the Karhunen–Loève procedure, principal component analysis (PCA), Hotelling analysis, empirical component analysis, quasiharmonic modes, or empirical eigenfunction decomposition in different fields. POD was originally introduced in fluid problems to help analyze coherent structures of turbulence. Through singular value decomposition (SVD) of a snapshot matrix, it yields an orthonormal basis, the POD basis. This basis is optimal in the sense that, among orthonormal bases of size $k$, it minimizes the least-squares error of snapshot reconstruction. Due to its broad applicability to linear and nonlinear systems, as well as to parametrically varying systems, POD has become widely used in many application domains as a method for computing the reduced basis.
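The snapshot-POD procedure can be sketched in a few lines: build a snapshot matrix, take its SVD, and reconstruct with the leading modes (the two-mode synthetic field below is invented for the example):

```python
import numpy as np

# Snapshot POD: snapshots are columns of S; the SVD gives an
# orthonormal basis, and keeping the k leading left singular vectors
# minimizes the least-squares snapshot reconstruction error.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
x = np.linspace(0.0, np.pi, 200)
S = (np.outer(np.sin(x), np.cos(2 * np.pi * t))              # mode 1
     + 0.1 * np.outer(np.sin(2 * x), np.sin(2 * np.pi * t))  # mode 2
     + 1e-6 * rng.standard_normal((200, 50)))                # noise
U, s, Vt = np.linalg.svd(S, full_matrices=False)
k = 2
Phi = U[:, :k]                 # POD basis (orthonormal columns)
S_hat = Phi @ (Phi.T @ S)      # optimal rank-k reconstruction
err = np.linalg.norm(S - S_hat) / np.linalg.norm(S)
print(err < 1e-4)              # two modes capture the synthetic field
```

A reduced-order model would then evolve only the $k$ coefficients `Phi.T @ S` instead of the full state.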

*Dynamic mode decomposition* (*DMD*) was designed to decompose time-resolved data into modes, each of which has a single characteristic frequency and growth/decay rate. In principle, DMD is the eigendecomposition of a best-fit linear operator that approximates the underlying dynamics embedded in the data. A number of improved variants of DMD have appeared in the literature recently. Hemati et al. [104] proposed an efficient DMD method for large datasets, and a parallel version of DMD was developed by Belson et al. [105]. Moreover, it should be noted that the original DMD modes are not orthogonal, which is undesirable for a reduced-order model; orthogonalized DMD modes can be obtained with a recursive method [106]. For problems with highly intermittent dynamics the original DMD method performs poorly, and the multiresolution [107] and time-delay DMD [108] methods are promising.
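A minimal exact-DMD sketch on a synthetic traveling wave recovers the single oscillation frequency from the eigenvalues of the reduced operator (the data and rank are chosen for the example):

```python
import numpy as np

def dmd(X1, X2, r):
    """Exact DMD: eigendecompose the best-fit linear operator A with
    X2 ~= A @ X1, computed in a rank-r POD subspace of X1."""
    U, s, Vt = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vt[:r].T
    A_tilde = U.T @ X2 @ V / s            # reduced operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (X2 @ V / s) @ W              # exact DMD modes
    return eigvals, modes

# synthetic traveling wave sin(x + 2 t): a single oscillation at 2 rad/s
dt = 0.1
t = np.arange(100) * dt
x = np.linspace(0.0, np.pi, 64)
data = np.sin(x[:, None] + 2.0 * t[None, :])
eigvals, _ = dmd(data[:, :-1], data[:, 1:], r=2)
# continuous-time frequency recovered from the discrete-time eigenvalues
freq = np.abs(np.imag(np.log(eigvals)) / dt)
print(np.round(freq, 3))
```

Each complex-conjugate eigenvalue pair encodes one frequency and growth/decay rate, which is exactly the mode-by-mode decomposition described above.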

##### 8.4. Artificial Neural Network

ANN is also a good choice for a surrogate model due to its proven universal approximation property: a standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the activation function is not a polynomial [109, 110]. ANNs have been widely applied in modern surrogate modeling because their evaluation consumes trivial computational effort; they can thus assist CFD calculation [111, 112] for a large number of designed geometries, which greatly increases optimization efficiency. Their accurate generalization and parallel computation capabilities in complex engineering design problems are helpful for rapid investigation of the design space and for searching for the optimal solution [113]. For example, ANNs have been used to expedite decision-making in the early stages of aircraft design and to select a proper combination of engine thrust, wing area, and aircraft weight without going through the elaborate details of other direct approaches [114].

However, more often than not, in the course of analyzing complex physical systems, the cost of data acquisition is prohibitive, and we are inevitably faced with the challenge of drawing conclusions and making decisions under partial information. In this small-data regime, the vast majority of state-of-the-art ANN techniques lack robustness and fail to provide any guarantees of convergence [115]. Therefore, unsupervised learning methods for complex mathematical models (PDEs) will become a focus, and a difficulty, of future research. Recently, Raissi et al. [111] introduced physics-informed neural networks: data-driven algorithms for inferring solutions to general nonlinear partial differential equations. They showcase promising results for a diverse collection of problems in computational science, opening a new path for constructing surrogate models for mathematical physics under zero-sample circumstances.

#### 9. Model Uncertainty Analysis

Nowadays, model uncertainty has become one of the most important problems in both academia and industry [116]. The modeling of complex environmental stochastic systems is a difficult task. Because of the complexity of reality, the incompleteness of information, and the limitation of cognition, the mathematical model will neglect some influence factors and can only approximate the real behavior at a certain level of accuracy [7]. While scientists have made substantial progress in getting a better insight into environmental relationships and changes, model uncertainties are still a major obstacle for the predictive capability of simulations [11]. There are many ways in which model uncertainty may occur: the structure of the model may be misspecified a priori, or the model may be identified incorrectly from the data, etc. [117]. All these situations could cause serious problems. For example, in the simulation of turbulent flow, since it is often not possible to know beforehand if one or more of such flow features will be present in a new flow configuration, predictions based on the Reynolds averaged Navier–Stokes equations are flawed by a structural (i.e., model-form) uncertainty [11]. Here we introduce three main approaches to remove or reduce structural uncertainty.

##### 9.1. Model Correction

Here we first consider the fact that the observed quantities differ from the true ones by experimental (observational) noise, which may be expressed through the relation

$$\boldsymbol{d} = \boldsymbol{d}^{\mathrm{true}} + \boldsymbol{\epsilon}_{\mathrm{obs}},$$

where $\boldsymbol{d}^{\mathrm{true}}$ is the true value of $\boldsymbol{d}$ and $\boldsymbol{\epsilon}_{\mathrm{obs}}$ is a random vector representative of the experimental noise. The experimental noise is often assumed to be independently distributed without spatial correlation, and it is modeled as a Gaussian process with a diagonal covariance matrix.

Theoretically, the true value $\boldsymbol{d}^{\mathrm{true}}$ could be obtained as an output of the model $\mathcal{M}$ once a suitable set of parameters $\boldsymbol{\theta}^{*}$ has been identified, i.e., $\boldsymbol{d}^{\mathrm{true}} = \mathcal{M}(\boldsymbol{\theta}^{*})$. In practice, however, few models are perfect [11]: even with no parameter uncertainty, the predicted value will not equal the true value of the process [118]. This discrepancy is due to model inadequacy. A general framework including a model inadequacy term in the stochastic model was first proposed by Kennedy and O'Hagan [119]. Model discrepancy can be taken into account by introducing an additional error term into the statistical model above, which may be of additive nature, i.e.,

$$\boldsymbol{d} = \mathcal{M}(\boldsymbol{\theta}) + \boldsymbol{\delta} + \boldsymbol{\epsilon}_{\mathrm{obs}},$$

or of multiplicative nature:

$$\boldsymbol{d} = \boldsymbol{\delta} \circ \mathcal{M}(\boldsymbol{\theta}) + \boldsymbol{\epsilon}_{\mathrm{obs}}.$$

The symbol $\circ$ denotes the Hadamard (element-wise) multiplication, and $\delta$ is a random field representative of the model inadequacy. The choice of the model inadequacy formulation largely depends on the nature of, and prior knowledge about, the observed quantity. In several cases, $\delta$ may involve additional parameters introduced to describe the error behavior, referred to as hyperparameters, which can be estimated by likelihood maximization or calibrated from the data along with the physical model parameters. When an additive model inadequacy term is used, it becomes difficult to separate its effect from that of the observational error. As for the multiplicative one, if the competing multiplicative statistical models describe the inadequacy term as a Gaussian process, the observations can also be modeled as a Gaussian process based on the delta method [120].
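The additive formulation can be sketched numerically. The following is a minimal, hypothetical Python example in which an imperfect model misses a trend in the data, and a crude smoothed-residual estimate of the inadequacy term $\delta$ corrects the prediction; the model, noise level, and smoothing choice are illustrative assumptions, not prescriptions from the literature (a Gaussian process would typically play the role of the moving average).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a "true" process and an imperfect model M(theta)
# that misses a linear trend (all choices here are illustrative).
x = np.linspace(0.0, 1.0, 50)
y_true = np.sin(2.0 * np.pi * x) + 0.3 * x       # the unknown reality

def model(theta):
    # imperfect model: captures the oscillation but not the 0.3*x trend
    return theta * np.sin(2.0 * np.pi * x)

sigma_eps = 0.05                                  # observational noise std
d = y_true + rng.normal(0.0, sigma_eps, x.size)   # data: d = y_true + eps

theta_hat = 1.0                  # assume the parameter is already calibrated
residual = d - model(theta_hat)  # evidence of model inadequacy plus noise

# Additive discrepancy d = M(theta) + delta + eps: estimate delta by
# smoothing the residual (a moving average stands in for a Gaussian process).
kernel = np.ones(5) / 5.0
delta_hat = np.convolve(residual, kernel, mode="same")

corrected = model(theta_hat) + delta_hat
rmse_raw = np.sqrt(np.mean((model(theta_hat) - y_true) ** 2))
rmse_corr = np.sqrt(np.mean((corrected - y_true) ** 2))
print(rmse_raw, rmse_corr)  # the correction reduces the prediction error
```

Note that the estimated $\delta$ absorbs part of the observational noise as well, which is exactly the additive-term identifiability issue discussed above.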

Although the use of model inadequacy terms is helpful in alleviating parameter overfitting problems, the approach suffers from several limitations: the correction terms are specific to the observed QoI and depend on the spatial distribution of the observed data for the specific scenario [11, 121].

##### 9.2. Model Selection

Model selection consists in choosing the best model, in some statistical sense, among a class of competing models. This amounts to providing estimates of the posterior probability of each model in the considered set given the observed data. The "model" here should be interpreted in a broader sense, including not only physical models with associated coefficients but also statistical models. Each model $M_k$ consists of a family of distributions indexed by its parameter vector $\theta_k$. For such setups, the Bayesian approach provides a natural and general probabilistic framework that simultaneously treats both model and parameter uncertainty. Coupled with the advent of MCMC methods for posterior computation, the development and application of Bayesian methods for model uncertainty have seen remarkable evolution over the past decade [122].

The comprehensive Bayesian approach for multiple-model setups proceeds by assigning a prior distribution $\pi(\theta_k \mid M_k)$ to the parameters of each model $M_k$ and a prior probability $P(M_k)$ to each model. Marginalizing out the parameters and conditioning on the data $d$ yields the posterior model probabilities

$$P(M_k \mid d) = \frac{P(d \mid M_k)\, P(M_k)}{\sum_{j} P(d \mid M_j)\, P(M_j)},$$

where

$$P(d \mid M_k) = \int P(d \mid \theta_k, M_k)\, \pi(\theta_k \mid M_k)\, \mathrm{d}\theta_k$$

is the marginal likelihood of $M_k$. Based on these posterior probabilities, a pairwise comparison of models $M_k$ and $M_j$ is summarized by the posterior odds

$$\frac{P(M_k \mid d)}{P(M_j \mid d)} = \frac{P(d \mid M_k)}{P(d \mid M_j)} \times \frac{P(M_k)}{P(M_j)}.$$

Insofar as the priors provide an initial representation of model uncertainty, the model posterior provides a complete representation of post-data model uncertainty that can be used for a variety of inferences and decisions. Treating $P(M_k \mid d)$ as a measure of the "truth" of model $M_k$, a natural and simple strategy for model selection is to choose the most probable model, i.e., the modal model $M_{k^{\ast}}$ for which $P(M_k \mid d)$ is largest.
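As a minimal sketch of these posterior model probabilities, consider two hypothetical one-parameter models compared on synthetic data, with uniform parameter priors, equal model priors, and a simple grid approximation of the marginal-likelihood integral; all of these choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a linear trend observed with Gaussian noise.
x = np.linspace(0.0, 1.0, 30)
sigma = 0.1
d = 2.0 * x + rng.normal(0.0, sigma, x.size)

def log_lik(pred):
    # Gaussian log-likelihood with known noise std (diagonal covariance)
    return (-0.5 * np.sum(((d - pred) / sigma) ** 2)
            - d.size * np.log(sigma * np.sqrt(2.0 * np.pi)))

# Two competing one-parameter models with theta ~ Uniform(-5, 5) priors.
models = [lambda t: t * x,               # M1: linear through the origin
          lambda t: np.full_like(x, t)]  # M2: constant
thetas = np.linspace(-5.0, 5.0, 2001)
log_prior = -np.log(10.0)                # uniform prior density on [-5, 5]

def log_evidence(model):
    # log p(d | M) = log of the integral of p(d | theta, M) * pi(theta | M)
    # over theta, approximated by a Riemann sum on the grid
    ll = np.array([log_lik(model(t)) + log_prior for t in thetas])
    m = ll.max()                         # shift for numerical stability
    return m + np.log(np.sum(np.exp(ll - m)) * (thetas[1] - thetas[0]))

log_ev = np.array([log_evidence(M) for M in models])
post = np.exp(log_ev - log_ev.max())     # equal model priors P(Mk) = 1/2
post /= post.sum()
odds = post[0] / post[1]                 # posterior odds of M1 over M2
print(post, odds)                        # M1 dominates for these data
```

Because the data were generated from a linear trend, the marginal likelihood strongly favors M1, and the modal-model rule would select it.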

This approach has drawbacks, however. Selecting one particular model may lead to riskier decisions: if the chosen model is wrong, the consequences can be severe. Moral-Benito [123] already pointed out this concern: from a purely empirical viewpoint, model uncertainty is a concern because estimates may well depend on the particular model considered. Therefore, combining multiple models to reduce model uncertainty is highly desirable.

##### 9.3. Model Averaging

The difficulty in making predictions with a single calibrated model clearly calls for a framework based on multimodel ensembles. Multimodel approaches have been used in aerodynamics [124] and many other applications [125, 126]. Bayesian model averaging is among the most widely used multimodel approaches; it combines multiple models, enabling researchers to draw conclusions based on the whole universe of candidate models. Rather than choosing a single best model, the analyst may wish to utilize several models that are thought to be plausible a priori, or that seem to provide a reasonable approximation to the given data for the required objective. Prior knowledge is used to select a set of plausible models, and prior probabilities are attached to them. The data are then used to evaluate posterior probabilities for the different models, after which models with low posterior probabilities may be discarded to keep the problem manageable. Finally, a weighted sum of the predictions from the remaining competing models is calculated. Two different approaches to model averaging appear in the literature: frequentist model averaging (FMA) and Bayesian model averaging (BMA) [116].

Frequentist approaches focus on improving prediction and use a weighted mean of estimates from different models, while Bayesian approaches focus on the probability that a model is true and consider priors and posteriors for the different models. The FMA approach does not involve priors, so the corresponding estimators depend solely on the data. Owing to its simplicity, the FMA approach has received considerable attention over the last decade [127].
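A small, hypothetical sketch of the frequentist flavor follows, using smoothed-AIC weights, which are one common data-driven weighting choice; the polynomial candidate set, noise level, and data-generating model are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: a linear trend observed with Gaussian noise.
x = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2, x.size)

def fit_poly(deg):
    # least-squares polynomial fit; returns in-sample predictions and AIC
    coef = np.polyfit(x, y, deg)
    pred = np.polyval(coef, x)
    n, k = x.size, deg + 1
    rss = np.sum((y - pred) ** 2)
    aic = n * np.log(rss / n) + 2 * k    # Gaussian-likelihood AIC
    return pred, aic

preds, aics = zip(*(fit_poly(deg) for deg in (0, 1, 2, 3)))
aics = np.array(aics)

# Smoothed-AIC weights: w_k proportional to exp(-0.5 * (AIC_k - AIC_min)).
w = np.exp(-0.5 * (aics - aics.min()))
w /= w.sum()

# FMA prediction: weighted mean of the candidate models' predictions.
y_fma = sum(wi * pi for wi, pi in zip(w, preds))
print(w)  # weights concentrate on the well-fitting low-order models
```

No priors appear anywhere: the weights are functions of the data alone, which is the defining feature of the FMA estimators mentioned above.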

The BMA approach expresses the posterior of the predicted quantity $\Delta$, given calibration data $d$ and a set of $K$ models, as

$$P(\Delta \mid d) = \sum_{k=1}^{K} P(\Delta \mid M_k, d)\, P(M_k \mid d).$$

In this framework, the posterior of $\Delta$ is an average of the posterior predictive distributions of the competing models, weighted by their respective model posteriors. The Bayesian approaches have the advantage of incorporating arbitrary domain knowledge through a proper prior. However, how to set prior probabilities, and how to deal with priors that conflict with each other, remain open problems [128]. The probably approximately correct (PAC) Bayes theory, first proposed by McAllester [129], combines the performance guarantees of the PAC framework with Bayesian prior knowledge; hence, it can make full use of prior information, provide tight generalization error bounds for various learning algorithms, and evaluate their generalization performance [130]. Generally speaking, compared to model selection methods, the result of an averaged model will not be as good as the (a priori unknown) best model, but it will not be as bad as the worst one [116].
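At the level of the first two moments, the BMA mixture has a simple closed form via the laws of total expectation and total variance. The sketch below assumes hypothetical model posteriors $P(M_k \mid d)$ (e.g., obtained from marginal likelihoods) and hypothetical per-model predictive moments; all numbers are illustrative.

```python
import numpy as np

# Assumed quantities for illustration: model posteriors P(Mk | d) and
# each model's posterior predictive mean/variance for the quantity Delta.
post = np.array([0.7, 0.3])          # model posteriors, sum to 1
mu = np.array([1.2, 0.8])            # E[Delta | Mk, d] per model
var = np.array([0.05, 0.10])         # Var[Delta | Mk, d] per model

# Law of total expectation: BMA mean is the posterior-weighted mean.
mu_bma = np.sum(post * mu)

# Law of total variance: within-model variance plus between-model spread.
var_bma = np.sum(post * var) + np.sum(post * (mu - mu_bma) ** 2)
print(mu_bma, var_bma)
```

The between-model term makes the BMA predictive variance larger than the average within-model variance whenever the models disagree, which is precisely the model uncertainty that a single selected model would hide.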

#### 10. Conclusion

For complex physical and engineering problems, M&S research has gradually matured into a well-developed system, but it has also encountered new challenges. The uncertainties introduced by many different sources have become a major obstacle to the predictive capability and reliability of simulations. UQ research aims to support better decisions, reduce the cost of trial and error during code development, and improve the reliability of simulation by identifying the main sources of uncertainty, analyzing how uncertainty propagates, searching for stable optimized solutions, and so on.

This paper gives a comprehensive review of the goals, ideas, and principal methods for each of the UQ processes. According to the data flow, UQ research can be divided into two categories: forward analysis, concerned with how uncertainty propagates from input to output, and backward analysis, concerned with how to infer the inputs from experimental data and simulation output. In addition, as aids, experimental design and surrogate models are introduced to improve the efficiency of UQ. In practice, most of the UQ processes are related to and dependent on each other. For example, a good analysis or approximation requires representative samples, while adaptive sampling depends on a good surrogate model and a sufficient understanding of the inputs, which must be obtained through sensitivity analysis, Bayesian inference, etc. Last but not least, a new trend has emerged: machine learning methods, which are well suited to the analysis of big data (*e.g.*, parameter selection and high-dimensional approximation), show great potential for UQ research on complex physical processes and will probably become an important research direction in the future.

Finally, it should be noted that this paper aims to highlight the main conceptual ideas of the whole system of UQ research and to focus on innovative concepts; hence, some details of the methods are deliberately omitted so as not to compromise the integrity and coherence of the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by National Numerical Wind Tunnel Project (Grant no. NNW2019ZT7-A13), Science Challenge Project (Grant no. TZ2018001), and the Development Program for Defense Ministry of China (Grant no. C1520110002).