Science and Technology of Nuclear Installations

Volume 2017, Article ID 2679243, 16 pages

https://doi.org/10.1155/2017/2679243

## Comprehensive Uncertainty Quantification in Nuclear Safeguards

^{1}SGIM/Nuclear Fuel Cycle Information Analysis, International Atomic Energy Agency, Vienna, Austria

^{2}International Safeguards, Institute of Energy and Climate Research, Forschungszentrum Jülich GmbH, Jülich, Germany

Correspondence should be addressed to T. Burr; t.burr@iaea.org

Received 12 March 2017; Accepted 13 June 2017; Published 12 September 2017

Academic Editor: Oleg Melikhov

Copyright © 2017 E. Bonner et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Nuclear safeguards aim to confirm that nuclear materials and activities are used for peaceful purposes. To ensure that States are honoring their safeguards obligations, quantitative conclusions regarding nuclear material inventories and transfers are needed. Statistical analyses used to support these conclusions require uncertainty quantification (UQ), usually by estimating the relative standard deviation (RSD) in random and systematic errors associated with each measurement method. This paper has two main components. First, it reviews why UQ is needed in nuclear safeguards and examines recent efforts to improve both top-down (empirical) UQ and bottom-up (first-principles) UQ for calibration data. Second, simulation is used to evaluate the impact of uncertainty in measurement error RSDs on estimated nuclear material loss detection probabilities in sequences of measured material balances.

#### 1. Introduction

Nuclear material accounting (NMA) provides a quantitative basis to detect nuclear material loss or diversion at declared nuclear facilities. NMA involves periodically measuring facility input transfers T_in, output transfers T_out, and physical inventory I to compute a material balance (MB) defined for balance period t as MB_t = I_{t-1} + T_in,t − T_out,t − I_t = B_t − I_t, where B_t = I_{t-1} + T_in,t − T_out,t is the book inventory. In NMA, one MB or a collection of MBs is tested for the presence of any statistically significant large differences and/or for trends, while allowing for random and systematic errors in variance propagation to estimate the measurement error standard deviation σ_MB of MB_t. Similarly, in verification activities done by an inspector, paired operator and inspector data are tested for any large differences and/or for trends [1, 2]. Therefore, both material balance evaluation and verification activities require statistical analyses, which in turn require UQ.
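The material balance bookkeeping above can be sketched in a few lines. All numbers, the value of sigma_mb, and the 3-sigma alarm rule are illustrative assumptions for this sketch, not values from the paper:

```python
import numpy as np

# Minimal sketch of a material balance (MB) sequence; all numbers and the
# 3-sigma alarm rule below are illustrative assumptions, not the paper's data.
def material_balances(t_in, t_out, inventory):
    """inventory[0] is the beginning inventory and inventory[t] closes period t;
    MB_t = I_{t-1} + T_in_t - T_out_t - I_t (book minus physical inventory)."""
    t_in, t_out, inv = (np.asarray(a, float) for a in (t_in, t_out, inventory))
    return inv[:-1] + t_in - t_out - inv[1:]

mb = material_balances(t_in=[10.0, 12.0], t_out=[8.0, 11.0],
                       inventory=[100.0, 102.1, 103.0])
sigma_mb = 0.05                       # assumed sigma_MB from variance propagation
alarms = np.abs(mb) > 3.0 * sigma_mb  # flag any period with |MB_t| > 3 sigma_MB
```

In practice sigma_MB comes from propagating the random and systematic error variances of every measured transfer and inventory item, which is exactly why the UQ discussed in this paper matters.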

In metrology for nuclear safeguards, the term “uncertainty” characterizes the dispersion of estimates of a quantity known as the measurand, which is typically the amount of NM (such as U or Pu) in an item. To measure the amount of NM, both destructive analysis (DA, a sample of item material is analyzed by mass spectrometry in an analytical chemistry laboratory) and nondestructive assay (NDA, an item is assayed by using a neutron or gamma detector) are used. NDA uses calibration and modeling to infer NM mass on the basis of radiation particles, such as neutrons and gammas emitted by the item and registered by detectors. For any measurement technique, one can use a first-principles physics-based or “bottom-up” approach to UQ by considering each key step and assumption of the particular method. Alternatively, one can take an empirical or “top-down” approach to UQ, for example, by comparing assay results on the same or similar items by multiple laboratories and/or calibration periods.

A well-known guide for bottom-up UQ is the Guide to the Expression of Uncertainty in Measurement (GUM, [3]). The GUM also briefly mentions top-down UQ in the context of applying analysis of variance (ANOVA, [4]) to data from interlaboratory studies. Although the GUM is useful, it is being revised because it has known limitations [5–7]. For example, the GUM provides little technical guidance regarding calibration as a type of bottom-up UQ or regarding top-down UQ [5–8]. The GUM also mixes Bayesian with non-Bayesian concepts. In a Bayesian approach, all quantities, including the true measurand value, are regarded as random. In a non-Bayesian (frequentist) approach, some quantities are regarded as random and other quantities, such as the true value of the measurand, are regarded as unknown constants. This paper uses both Bayesian and non-Bayesian concepts but specifies when each is in effect. For example, in the Bayesian approach to top-down UQ in Section 3, the true RSD values are regarded as being random.

In NDA safeguards applications, the facility operator declares the NM mass of each item. Then, some of those items are randomly selected for NDA verification measurement by inspectors. This is a challenging NDA application because often the detector is brought to the facility where ambient conditions can vary over time and because the items to be assayed are often heterogeneous in some way and/or are different from the items that were used to calibrate/validate and assess uncertainty in the NDA method. Because of such challenges, “dark uncertainty” [9] can be large, as is evident whenever bottom-up UQ predicts smaller measurement error RSDs than are observed in top-down UQ [1]. The RSD of an assay method is often defined as the reproducibility standard deviation as estimated in an interlaboratory comparison. As shown in Section 3, comparing NDA verification measurements to the operator’s DA measurements can be regarded as a special case of an interlaboratory evaluation [10–12].

For top-down UQ applied to NM measurements of the same item by both the operator (often using DA) and the inspector (often using NDA), this paper describes an existing and a new approach to separately estimate operator and inspector systematic and random error variance components. Systematic and random error components must be separated because their modes of propagation are different (Section 4). Currently, random error variance estimates from paired data are based on Grubbs’ estimator or variations of it, originally developed to estimate the random error variance separately for each of two methods applied to each of several items, without repeated measurement by either method [13, 14]. In Section 3, Grubbs’ estimator, constrained versions of Grubbs’ estimator, and a Bayesian alternative [7] are described; the Bayesian option easily allows parameter constraints and prior information regarding the relative magnitudes of variance components to be exploited to improve top-down UQ.
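The idea behind Grubbs’ estimator can be illustrated with a short simulation (a sketch under assumed error models and simulated values, not the paper’s data): because both methods measure the same items, the item-to-item variance cancels when the sample covariance of the paired data is subtracted from each method’s sample variance:

```python
import numpy as np

def grubbs_variances(y1, y2):
    # Grubbs' estimator for paired data y_m = T + e_m (no repeated measurements):
    # Var(y1) = Var(T) + sigma1^2 and Cov(y1, y2) = Var(T), so subtracting the
    # sample covariance cancels the item-to-item variance Var(T).
    c = np.cov(y1, y2)                       # 2x2 sample covariance matrix
    return c[0, 0] - c[0, 1], c[1, 1] - c[0, 1]

rng = np.random.default_rng(3)
truth = rng.normal(100.0, 5.0, 20000)        # items' true values (illustrative)
y1 = truth + rng.normal(0, 1.0, 20000)       # method-1 random errors, sd 1.0
y2 = truth + rng.normal(0, 2.0, 20000)       # method-2 random errors, sd 2.0
v1, v2 = grubbs_variances(y1, y2)            # estimates of 1.0**2 and 2.0**2
```

This simple additive version ignores systematic errors, which is why the constrained and Bayesian extensions described in Section 3 are needed for realistic operator-inspector data.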

This paper is organized as follows. Section 2 provides a background on bottom-up UQ for NDA, describes a gamma-based NDA example and a neutron-based NDA example, and illustrates why simulation is necessary for improved UQ for calibration data. Section 3 reviews currently used top-down UQ and describes a new Bayesian option [7] that applies approximate Bayesian computation. Section 4 provides a new simulation study assessing the sensitivity of estimated NM loss detection probabilities to estimation errors in measurement error RSDs. Section 5 concludes with a summary.

#### 2. Bottom-Up UQ

For bottom-up UQ, the GUM [3] assumes that the measured value can be expressed using a measurand equation

Y = f(X_1, X_2, …, X_N), (1)

which relates input quantities X_1, …, X_N (data collected during the measurement process and relevant fundamental nuclear data such as attenuation corrections) to the output Y (the final measurement value). The GUM’s main technical tool is a first-order Taylor approximation applied to the measurand equation, which relates the input quantities (regarded as random) to the measurand Y (also regarded as random). The input quantities can include estimates of other measurands or of calibration parameters, so (1) is quite general. The variance of each X_i and any covariances cov(X_i, X_j) between pairs of X_i’s are then propagated using the Taylor approximation (or using simulation if the Taylor approximation is not sufficiently accurate) to estimate the variance σ_Y^2 of Y.

According to (1), the estimated value of the measurand is a random variable, regardless of whether the left side of (1) is expressed as Y (as in a typical Bayesian approach) or as Ŷ (as in a typical non-Bayesian setting). The hat notation is a frequentist convention for denoting an estimator, so Ŷ is an estimate of the measurand, whose unknown true value is denoted T (for “true value”).
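The first-order Taylor propagation behind (1) can be sketched numerically (the measurand function and input covariance below are toy assumptions chosen so the result has a simple analytic check):

```python
import numpy as np

# Sketch of GUM-style first-order (Taylor) propagation for Y = f(X_1, ..., X_N):
# sigma_Y^2 ~ g^T C g, where g is the gradient of f at the input estimates and
# C is the input covariance matrix. The measurand function here is a toy example.
def propagate_variance(f, x, cov, h=1e-6):
    x = np.asarray(x, float)
    grad = np.array([(f(x + h*e) - f(x - h*e)) / (2.0*h)   # central differences
                     for e in np.eye(x.size)])
    return float(grad @ np.asarray(cov) @ grad)

# Toy measurand Y = X1 * X2 with independent inputs:
var_y = propagate_variance(lambda x: x[0] * x[1], x=[2.0, 3.0],
                           cov=np.diag([0.1**2, 0.2**2]))
# Analytic check: x2^2*sigma1^2 + x1^2*sigma2^2 = 9*0.01 + 4*0.04 = 0.25
```

When the Taylor approximation is not accurate enough (as Section 2.3 shows for calibration), the same covariance inputs can instead drive a Monte Carlo propagation.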

##### 2.1. Calibration UQ as an Example of Bottom-Up UQ

Typically, calibration is performed using reference materials having nominal measurand values (known to within a relatively small uncertainty), and then, in the case of linear calibration, (1) can be reexpressed as

X̂ = (Y − β̂_0)/β̂_1, (2)

where X̂ is the estimated measurand value, β̂_0 and β̂_1 are parameters estimated from calibration data, Y is the net count rate (usually the net gamma or net neutron count rate in NDA; see Section 2.3), and the three inputs in mapping to (1) are β̂_0, β̂_1, and Y. The estimates β̂_0 and β̂_1 will vary in predictable ways (Sections 2.2 and 2.3) across repeats of the calibration.

The convention in the statistical literature of reversing the roles of X and Y from those in GUM’s equation (1) will be followed here, so X denotes the quantity to be inferred (the measurand value) and Y denotes the detected net radiation count rate. Then, in the case of reverse regression (see below), (1) can be expressed as X̂ = α̂_0 + α̂_1 Y, identifying X_1 = α̂_0, X_2 = α̂_1, and X_3 = Y. Following calibration on data consisting of n pairs (x_i, y_i) (lowercase denotes the observed value of a random variable), the three “input quantities” α̂_0, α̂_1, and Y have variances and covariances that can be estimated. However, in most applications of calibration in NDA, accurate estimation of these variances and covariances requires simulation, because analytical approximations as described in Section 2.2 have been shown to be inadequate (see Section 2.3).

Expressing (2) as X̂ = (Y − β̂_0)/β̂_1 indicates how the estimate X̂ is computed and how to assign systematic and random error variances to X̂. For example, and to introduce notation used in top-down UQ (Section 3), one could express the estimate as X̂ = T + S + R, where T denotes the true value of the measurand, S denotes systematic error due to estimation error in the fitted slope and intercept and/or due to correlations among the inputs, and R denotes random error. If there are no correlations among the inputs but only estimation error in the fitted slope and intercept during calibration, then expressions for the variances of S and R, denoted σ_S^2 and σ_R^2, respectively, can be given (Sections 2.3 and 3), which allow comparison between bottom-up UQ and top-down UQ. The GUM does not discuss calibration in much detail; instead, the GUM applies propagation of variance to the steps modeled in (1), which sometimes leads to a defensible estimate of the combined variance of S and R. The GUM does not attempt to separately estimate the variances of S and R, but such separation is needed in some applications, such as assigning an uncertainty to a sum of measurand estimates ([15] and Section 4).
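A minimal sketch of the two calibration directions discussed in this subsection follows; the true line, noise level, and test item are assumed for illustration only:

```python
import numpy as np

# Sketch comparing the two calibration directions (simulated data; the true
# line, noise level, and test item below are assumed for illustration only).
rng = np.random.default_rng(1)
x_cal = np.linspace(1.0, 5.0, 6)                   # reference measurand values
y_cal = 2.0 + 3.0 * x_cal + rng.normal(0, 0.1, 6)  # noisy net count rates

b1, b0 = np.polyfit(x_cal, y_cal, 1)   # inverse (classical): fit Y on X ...
a1, a0 = np.polyfit(y_cal, x_cal, 1)   # reverse: fit X directly on Y

y_new = 2.0 + 3.0 * 4.0                # count rate of a test item with true X = 4
x_inv = (y_new - b0) / b1              # ... then invert the fitted line
x_rev = a0 + a1 * y_new                # direct reverse prediction
```

Repeating the whole calibration many times (Section 2.2) is what reveals the systematic component S: the fitted slope and intercept are shared by all items assayed under one calibration.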

##### 2.2. Extension of Standard Regression Results to Calibration

One way to express how the net count rate Y depends on the true measurand value X is

Y = β_0 + β_1 X + ε, (3)

which is a typical model used in regression when there is negligible error in X. If errors in predictors cannot be ignored, (3) should be modified; however, one can still regress measured Y on measured X, so, in effect, (3) can be reexpressed as Y = β̃_0 + β̃_1 X + ε̃, where the tildes denote that the parameter values and the random error are different from those in (3).

In inverse calibration, (3) is used, and one inverts the fitted model with estimates β̂_0 and β̂_1 so that a future measured Y predicts the measurand (e.g., enrichment) using

X̂ = (Y − β̂_0)/β̂_1, (4)

which is regression followed by inversion. An alternative model to (3) is reverse calibration:

X = α_0 + α_1 ν + ε. (5)

In reverse calibration, (5) expresses the measurand X as a function of the true net count rate ν, but in practice one must regress X on Y = ν + e_Y, where e_Y is a random error. As an aside, this paper does not consider models with additive systematic error terms in (3) or (5). Cheng and Van Ness [16] point out that any such additive systematic errors in Y or X could be absorbed into the intercepts β_0 and α_0, respectively; however, any systematic errors in the X values used for calibration would remain a part of the total uncertainty.

Both inverse and reverse calibrations involve ratios of random variables, which can be problematic [7, 17]. In inverse calibration, the solution in (4) involves division by the random variable β̂_1, which has a normal distribution under typical modeling assumptions. Williams [18] notes that X̂ in (4) has infinite variance even if the expected value of β̂_1 is nonzero, due to division by a normal random variable [19], and hence has infinite mean squared error, while the reverse estimator has finite variance and mean squared error. In reverse calibration, the least squares solution also involves division of random variables (the denominator involves the sum of squares of the y values used in calibration). Experience suggests that one can develop adequate approximations for the ratio of random variables when the ratio is almost certain to be far from infinity or zero [19]. Ignoring errors in predictors, [20] uses the following common approximation for the variance of the ratio of random variables U and V,

Var(U/V) ≈ [E(U)^2/E(V)^2] {Var(U)/E(U)^2 + Var(V)/E(V)^2 − 2 Cov(U, V)/[E(U)E(V)]}, (6)

where E denotes expectation, to derive the approximation (for inverse calibration) for the variance σ_X̂^2 due to uncertainty in the estimated calibration coefficients β̂_0 and β̂_1 and in the test measurement Y:

σ_X̂^2 ≈ (σ^2/β_1^2) [1 + 1/n + (x_0 − x̄)^2/S_xx], (7)

where S_xx = Σ_{i=1}^{n} (x_i − x̄)^2, σ^2 is the variance of the random error ε in (3), x_0 is the true measurand value of the test item, and x̄ is the mean of the x values in the calibration data. To apply (7), β_1 and σ^2 are estimated from the calibration data (assuming (3) in forward calibration or the alternate version of (3), Y = β̃_0 + β̃_1 X + ε̃). Equation (7) is almost the same as the corresponding well-known result for regression; the only differences are the swapping of the roles of x and y and the appearance of β_1^2 in the denominator. For reverse regression, [20] derives an analogous variance expression in terms of the residual variance and the spread S_yy = Σ_{i=1}^{n} (y_i − ȳ)^2 of the calibration y values, together with approximate expressions for the long-term bias of inverse calibration and of reverse calibration. Notice that the bias of inverse calibration decreases as n increases (because S_xx and S_yy increase as n increases), but the bias of reverse calibration does not decrease as n increases; recall, however, that, in NDA applications, n is small, usually 3 to 10.
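The quality of an approximation of the form (7) can be checked against brute-force simulation; the calibration design, error level, and test value below are assumed for illustration:

```python
import numpy as np

# Check a (7)-style analytic approximation against brute-force simulation; the
# calibration design, sigma, and test value x0 are assumed for illustration.
rng = np.random.default_rng(0)
x_cal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b0_true, b1_true, sigma, x0 = 1.0, 2.0, 0.05, 3.5
n = x_cal.size
sxx = np.sum((x_cal - x_cal.mean())**2)
var_approx = (sigma**2 / b1_true**2) * (1 + 1/n + (x0 - x_cal.mean())**2 / sxx)

est = []
for _ in range(20000):                        # repeat calibration + one assay
    y_cal = b0_true + b1_true * x_cal + rng.normal(0, sigma, n)
    b1, b0 = np.polyfit(x_cal, y_cal, 1)
    y0 = b0_true + b1_true * x0 + rng.normal(0, sigma)
    est.append((y0 - b0) / b1)                # inverse-calibration estimate
var_sim = float(np.var(est))
```

With a well-behaved design like this one the two agree closely; the NDA examples of Section 2.3 are precisely the settings where they do not, which is why simulation is recommended.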

A common summary performance measure of an estimator combines squared bias and repeatability variance, RMSE^2 = (repeatability variance) + (bias)^2; that is, RMSE^2(X̂) = E(X̂ − T)^2 = Var(X̂) + [E(X̂) − T]^2, where E denotes the expected value (i.e., the first moment of the underlying probability distribution) [7]. Some technical details arise regarding the best model-fitting approach if the predictor is measured with nonnegligible error. In addition, there is controversy regarding the relative merits of inverse and reverse calibration [7, 17, 21, 22]. Simulation can be used to choose between inverse and reverse calibration, because simulation provides accurate UQ (such as RMSE estimation) for both options. In simulations for NDA calibration, errors in the standard reference materials’ nominal values (the x’s) are usually small compared to errors in the instrument responses (the Y’s), which are possibly adjusted by using adjustment factors that have uncertainty (see Section 2.3).
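The RMSE decomposition above lends itself directly to the simulation comparison just recommended (toy true line, noise level, and test item; not the paper’s NDA data):

```python
import numpy as np

# Simulate RMSE^2 = variance + bias^2 for inverse vs. reverse calibration
# (toy true line, noise, and test item; not the paper's NDA data).
rng = np.random.default_rng(7)
x_cal = np.linspace(1.0, 5.0, 5)
b0, b1, sigma, x0 = 1.0, 2.0, 0.3, 4.5
inv_est, rev_est = [], []
for _ in range(20000):
    y_cal = b0 + b1 * x_cal + rng.normal(0, sigma, x_cal.size)
    y0 = b0 + b1 * x0 + rng.normal(0, sigma)
    s1, s0 = np.polyfit(x_cal, y_cal, 1)    # inverse: fit Y on X, then invert
    inv_est.append((y0 - s0) / s1)
    r1, r0 = np.polyfit(y_cal, x_cal, 1)    # reverse: fit X on Y directly
    rev_est.append(r0 + r1 * y0)

def rmse2(est, truth):
    est = np.asarray(est)
    return float(est.var() + (est.mean() - truth)**2)

rmse2_inv, rmse2_rev = rmse2(inv_est, x0), rmse2(rev_est, x0)
```

Because the true value x0 is known in simulation, the bias term can be isolated here, which is exactly what real assay data cannot provide (Section 2.3.3, point (4)).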

##### 2.3. Summary of Recent NDA Examples

Recent publications have used simulation to assess the adequacy of (7) in the context of NM measurements by gamma detection [7, 17] and neutron detection [1, 7, 23, 24].

###### 2.3.1. Enrichment Meter Principle (EMP)

The EMP aims to infer the fraction of ^{235}U in U (enrichment, defined as atom percent of ^{235}U in an item) by measuring the count rate of the strongest-intensity direct (full-energy) gamma from decay of ^{235}U, which is emitted at 185.7 keV [7, 25, 26]. The EMP makes three key assumptions: (1) the detector field of view into each item is the same as that into the calibration items; (2) the item is homogeneous with respect to both ^{235}U enrichment and chemical composition; and (3) the container attenuation of gamma-rays is the same as or similar to that of the calibration items, so empirical correction factors have modest impact and are reasonably effective. If these three assumptions are approximately met, the enrichment of ^{235}U in the U is directly proportional to the count rate of the 185.7 keV gamma-rays emitted from the item. It has been shown empirically that, under good measurement conditions, the EMP can have a random error RSD of less than 0.5% and a long-term bias of less than 1% relative to the true value, depending on the specific implementation of the EMP. Implementation details include features such as the detector resolution, stability, and extent of corrections needed to adjust items to calibration conditions. However, in some EMP applications, the random error RSD can be larger than bottom-up UQ predicts and larger than the 0.5% goal. For example, assay of the ^{235}U mass in a stratum of UO_{2} drums suggests that there is a larger-than-anticipated random RSD [17].

###### 2.3.2. Uranium Neutron Coincidence Collar (UNCL)

The UNCL uses an active neutron source to induce fission in ^{235}U in fresh fuel assemblies [27]. Neutrons from fission are emitted in short bursts of time and so exhibit non-Poisson bursts in detected count rates. Neutron coincidence counting is used to measure the “doubles” neutron coincidence rate Y, which can be used to estimate the linear density X of ^{235}U in a fuel assembly (g ^{235}U/cm) using calibration parameters a and b. The coincidence rate is the observed rate of detecting two neutrons in very short time gates, each of approximately 10^{-6} sec, and is attributable to fission events. The equation commonly used to convert the measured doubles rate Y to an estimate of X (grams ^{235}U per cm) is X̂ = kY/(â − b̂kY), where â and b̂ are calibration parameters and k = k_1 k_2 k_3 k_4 k_5 k_6 is a product of correction factors that adjust to item-, detector-, and source-specific conditions in the calibration [27]. Therefore, X̂ = kY/(â − b̂kY) is a special case of GUM’s equation (1) (with X and Y reversed), where the two calibration parameters â and b̂ and the six correction factors k_1, …, k_6 are among the X_i’s in (1).

Reference [23] showed that calibration is most effective (leading to smallest RMSE in X̂) if there is no adjustment for errors in the predictor and that errors in k_1, …, k_6, and hence in k, should be included in synthetic calibration data. Note that, by working with 1/X and 1/Y, one can convert X = kY/(a − bkY) to a model that is linear in the transformed predictor: 1/X = a[1/(kY)] − b.
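The linearizing transformation can be verified numerically (a sketch assuming the form X = kY/(a − bkY); the constants below are illustrative, not real UNCL calibration values):

```python
import numpy as np

# Sketch of the UNCL linearization, assuming X = kY/(a - b*k*Y), i.e.,
# kY = aX/(1 + bX). Working with 1/X and 1/(kY) gives the linear relation
# 1/X = a*(1/(kY)) - b, so ordinary linear fitting applies.
a, b, k = 0.5, 0.02, 1.1           # illustrative constants, not real UNCL values
x = np.array([1.0, 2.0, 4.0])      # linear density, g 235U per cm
y = a * x / (k * (1.0 + b * x))    # noise-free doubles rates implied by the model

# Fitting 1/X against 1/(kY) recovers slope a and intercept -b.
slope, intercept = np.polyfit(1.0/(k*y), 1.0/x, 1)
```

With noise-free data the fit is exact; with realistic counting noise, the errors-in-predictors issues of Section 2.2 reappear in the transformed variables.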

###### 2.3.3. Main Results for Sections 2.3.1 and 2.3.2

The main results for Sections 2.3.1 and 2.3.2 can be summarized in four main points as follows.

(1) If possible, both classical (see (2)) and reverse (see (5)) regression methods should be compared; however, reverse regression tends to perform as well as or better than classical regression. Analytical approximations such as (7) have been shown not to be sufficiently accurate in some settings, so simulation is recommended to compare classical and reverse regression and to estimate the variance components σ_S^2 and σ_R^2 (Section 3).

(2) Error sources that are expected to be present in test measurements, such as container thickness measurements, can be simulated in synthetic calibration data. Such error sources often lead to item-specific biases (Burr et al., 2016).

(3) If reverse regression is used, then there is no need to adjust for errors in the predictors in (3). If inverse regression is used, then it is better to adjust for errors in predictors.

(4) Figure 1 plots (a) the observed and predicted bias and (b) the observed and predicted RMSE in a generic NDA example involving either gamma or neutron counting. It is not well known that calibration applications lead to bias, and [7, 17] showed that the bias cannot be easily removed, because measurement errors obscure the true measurand value and hence the true bias. Note in Figure 1(a) that the observed bias (in simulated data) is not in close agreement with the predicted bias, which is obtained from the expressions in Section 2.2. Therefore, long-term bias should be estimated using simulation rather than by relying on the approximate bias expressions for inverse and reverse calibration in Section 2.2. Similarly, Figure 1(b) illustrates that the observed RMSE is not well predicted by the expressions in Section 2.2, so, again, simulation is needed for adequate estimation of the RMSE. Note that the smallest RMSE is for reverse regression. Burr et al. [7, 17] show that reverse regression tends to have smaller RMSE than inverse regression but that, if inverse regression is used, then methods to adjust for errors in predictors should be used.