Science and Technology of Nuclear Installations

Volume 2015, Article ID 839249, 17 pages

http://dx.doi.org/10.1155/2015/839249

## Demonstration of Emulator-Based Bayesian Calibration of Safety Analysis Codes: Theory and Formulation

^{1}MIT, 77 Massachusetts Avenue, Cambridge, MA 02139, USA^{2}FPoliSolutions, LLC, 4618 Old William Penn Highway, Murrysville, PA 15668, USA^{3}INL, P.O. Box 1625, Idaho Falls, ID 83415-3870, USA

Received 16 January 2015; Revised 1 April 2015; Accepted 28 May 2015

Academic Editor: Francesco Di Maio

Copyright © 2015 Joseph P. Yurko et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

System codes for simulation of safety performance of nuclear plants may contain parameters whose values are not known very accurately. New information from tests or operating experience is incorporated into safety codes by a process known as calibration, which reduces uncertainty in the output of the code and thereby improves its support for decision-making. The work reported here implements several improvements on classic calibration techniques afforded by modern analysis techniques. The key innovation has come from development of code surrogate model (or code emulator) construction and prediction algorithms. Use of a fast emulator makes the calibration processes used here with Markov Chain Monte Carlo (MCMC) sampling feasible. This work uses Gaussian Process (GP) based emulators, which have been used previously to emulate computer codes in the nuclear field. The present work describes the formulation of an emulator that incorporates GPs into a factor analysis-type or pattern recognition-type model. This “function factorization” Gaussian Process (FFGP) model allows overcoming limitations present in standard GP emulators, thereby improving both accuracy and speed of the emulator-based calibration process. Calibration of a friction-factor example using a Method of Manufactured Solution is performed to illustrate key properties of the FFGP based process.

#### 1. Introduction

Propagating input parameter uncertainty for a nuclear reactor system code is a challenging problem due to often nonlinear system response to the numerous parameters involved and lengthy computational times, issues that compound when a statistical sampling procedure is adopted, since the code must be run many times. Additionally, the parameters are sampled from distributions that are themselves uncertain. Current industry approaches rely heavily on expert opinion for setting the assumed parameter distributions. Observational data is typically used to judge if the code predictions follow the expected trends within reasonable accuracy. All together, these shortcomings lead to current uncertainty quantification (UQ) efforts relying on overly conservative assumptions, which ultimately hurt the economic performance of nuclear energy.

This work adopts a Bayesian framework that allows reducing computer code predictive uncertainty by calibrating parameters directly to observational data; this process is also known as solving the inverse problem. Unlike the current heuristic calibration approach, Bayesian calibration is systematic and statistically rigorous, as it calibrates the parameter distributions to the data, not simply tune point values. With enough data, any biases from expert opinion on the starting parameter distributions can be greatly reduced. Multiple levels of data are easier to handle as well, since Integral and Separate Effect Test (IET and SET) data can be used simultaneously in the calibration process. However, implementing Bayesian calibration for safety analysis codes is very challenging. Because the posterior distribution cannot be obtained analytically, approximate Bayesian inference with sampling is required. Markov Chain Monte Carlo (MCMC) sampling algorithms are very powerful and have become increasingly widespread over the last decade [1]. However, for even relatively fast computer models practical implementation of Bayesian inference with MCMC would simply take too long because MCMC samples must be drawn in series. As an example, a computer model that takes 1 minute to run but needs 10^{5} MCMC samples would take about 70 days to complete. A very fast approximation to the system code is thus required to use the Bayesian approach. Surrogate models (or* emulators*) that emulate the behavior of the input/output relationship of the computer model but are computationally inexpensive allow MCMC sampling to be possible. An emulator that is 1000x faster than the computer model would need less than two hours to perform the same number of MCMC samples. As the computer model run time increases, the surrogate model becomes even more attractive because MCMC sampling would become impractically lengthy.

Gaussian Process- (GP-) based emulators have been used to calibrate computer code for a variety of applications. Please consult [2–5] for specific cases as well as reviews of other sources. This work applies a relatively new class of statistical model, the function factorization with Gaussian Process (FFGP) priors model, to emulate the behavior of the safety analysis code. The FFGP model builds on the more commonly used GP emulator but overcomes certain limiting assumptions inherent in the GP emulator, as will be explained later. The FFGP model is therefore better suited to emulate the complex time series output produced by the system code. The surrogate is used in place of the system code to perform the parameter calibration, thereby allowing the observational data to directly improve the current state of knowledge.

The rest of this paper is organized as follows. An overview of the entire emulator-based Bayesian calibration process is described in Section 2. Section 3 discusses the emulators in detail. The first half of Section 3 summarizes the important expressions related to GP emulators. Most of these expressions can be found in numerous other texts and references on GP models, including [6, 7]. They are repeated in this paper for completeness as well as providing comparison to the FFGP expressions in the latter half of Section 3. Section 4 presents a method of manufactured solutions-type demonstration problem that highlights the benefits of the FFGP model over the standard GP model.

#### 2. Overview of Emulator-Based Bayesian Calibration

As already stated, the emulator-based approach replaces the potentially very computationally expensive safety analysis code (also known as a simulator, computer code, system code, or simply the code) with a computationally inexpensive surrogate. Surrogate models are used extensively in a wide range of engineering disciplines, most commonly in the form of response surfaces and look-up tables. Reference [4] provides a thorough review of many different types of surrogate models. The present work refers to the surrogates as* emulators* to denote that they provide an estimate of their own uncertainty when making a prediction [5]. An emulator is therefore a probabilistic response surface which is a very convenient approach because the emulator’s contribution to the total uncertainty can be included in the Bayesian calibration process. An uncertain (noisy) emulator would therefore limit the parameter posterior precision, relative to calibrating the parameters using the long-running computer code itself. Obviously, it is desirable to create an emulator that is as accurate as possible relative to the computer code, which limits the influence of error and uncertainty on the results.

The emulator-based approach begins with choosing the input parameters and their corresponding prior distributions. If the emulator was not used in place of the system code, the Bayesian calibration process would start in the exact same manner. The priors encode the current state of knowledge (or lack thereof) about each of the uncertain input parameters. Choice of prior for epistemically uncertain variables is controversial and relies heavily on expert opinion. Justification for the priors used in the applications of this work is given later on, but choice of the priors is not the focus of this work. Additionally, the choice of the specific input parameters to be used for calibration may be controversial. Dimensionality reduction techniques might be used to help screen out unimportant input parameters [4]. Some screening algorithms such as the Reference Distribution Variable Selection (RDVS) algorithm use GPs to identify statistically significant input parameters [8]. In the nuclear industry specifically, expert opinion-based Phenomena Identification and Ranking Tables (PIRTs) are commonly used to down-select the most important physical processes that influence a Figure of Merit (FOM) [9]. More recently, Quantitative PIRTs, or QPIRTs, have been used in place of the traditional expert opinion PIRTs to try to remove bias and to capture relevant physical processes as viewed by the computer code [10, 11]. No matter the approach, the set of input parameters and their corresponding prior distribution must be specified.

In the emulator-based approach, the prior has the additional role of aiding in choosing the training set on which the emulator is based. As the phrase implies, the training set is the sequence of computer code evaluations used to build or train the emulator. Once trained on selected inputs and outputs, the emulator reflects the complex input/output relationship, so training is clearly an essential piece of the emulator-based approach. There are numerous methods and decision criteria for the selection of the training set; see [4, 5] for more details. Reference [12] provides an excellent counter point for the dangers of not using enough points in generating the training set. This work does not focus on choosing the “optimal” or “best” training set, which is an active area of research. The input parameter prior is used to set bounds on the input parameter values; Latin Hypercube Sampling (LHS) is then used to create a “space filling” design within those bounds. Although not guaranteed to produce the best possible training set, this method adequately covers the prior range of possible input parameter values. An active area of research is how to enhance the training set during the calibration process itself, in order to focus more on the posterior range of possible values.

With the training input values chosen, the computer code is run the desired number of times to generate the training output. The complete training set is then the training input values with their corresponding training output. The emulator is then built by learning specific characteristics that allow the emulator to represent the input/output relationship encoded in the training set. The specific characteristics that must be learned depend on the type of emulator being used. Training algorithms for the standard GP emulator and FFGP emulator are described in Section 3.

Once trained, the emulator is used in place of the computer code in the MCMC sampling via an emulator-modified likelihood function. The modified likelihood functions are presented in Section 3 for each of the emulators used in this work. Regardless of the chosen type of emulator, the emulator-based calibration process results in uncertain input parameter posterior distributions and posterior-approximated predictions, conditioned on observational data. A flow chart describing the key steps in the emulator-based Bayesian calibration process is shown in Figure 1.