#### Abstract

Nuclear material accounting (NMA) is the only safeguards system whose benefits are routinely quantified. Process monitoring (PM) is another safeguards system that is increasingly used, and one challenge is how to quantify its benefit. This paper considers PM in the role of enabling frequent NMA, which is referred to as near-real-time accounting (NRTA). We quantify NRTA benefits using period-driven and data-driven testing. Period-driven testing makes a decision to alarm or not at fixed periods. Data-driven testing decides as the data arrives whether to alarm or continue testing. The difference between period-driven and data-driven viewpoints is illustrated by using one-year and two-year periods. For both one-year and two-year periods, period-driven NMA using once-per-year cumulative material unaccounted for (CUMUF) testing is compared to more frequent Shewhart and joint sequential cusum testing using either MUF or standardized, independently transformed MUF (SITMUF) data. We show that the data-driven viewpoint is appropriate for NRTA and that it can be used to compare safeguards effectiveness. In addition to providing period-driven and data-driven viewpoints, new features include assessing the impact of uncertainty in the estimated covariance matrix of the MUF sequence and the impact of both random and systematic measurement errors.

#### 1. Introduction

One challenge in modern safeguards at declared facilities that process special nuclear material (SNM) is how to quantify the benefit of process monitoring (PM) [1–4]. There are many examples of PM data used in safeguards, such as neutron-based counting of a waste stream (the “hulls”) exiting the dissolver in an aqueous reprocessing facility, or in-tank bulk mass measurements of solutions in tanks. Although there is no standard definition of PM data, it is generally collected very frequently (in-line), is often an indirect measurement of SNM, and is often collected by the operator for process control. Recent efforts to quantify the benefits of PM data are described, for example, in Burr et al. [1, 2].

Because PM can have several roles, it is necessary to consider the quantitative benefits of each possible role. For example, PM can have a “front-line” role in monitoring for indicators of facility misuse, such as a shift in nitric acid concentration to direct excess Pu to a waste stream from a separations area in an aqueous reprocessing facility [5]. Alternatively, in the role this paper considers, PM can enable in-process inventory estimation by using empirical modeling and measurements of flows in and out of a separations area [4]. In a pyroreprocessing facility, several PM options are being studied, such as monitoring voltage and current in the electrorefiner, which holds most of the in-process inventory [6–9]. Electrorefiner voltage and current are among the measured quantities that can, using a model such as the one in Zhang [10], predict the SNM inventory in the electrorefiner in real time.

Traditionally, nuclear material accounting (NMA) consists of relatively infrequent material balance closures (such as once per year), with the material balance (MB) defined as $\mathrm{MB} = I_{\mathrm{begin}} + T_{\mathrm{in}} - T_{\mathrm{out}} - I_{\mathrm{end}}$, where $I$ is an inventory and $T$ is a transfer. Assuming that the MB has approximately a normal distribution (the assumption is justified because many measurements enter an MB calculation, so the central limit theorem applies), the measurement error standard deviation of the MB, $\sigma_{\mathrm{MB}}$, determines the probability of detecting a specified amount of SNM for a given false alarm probability. Therefore, for a single balance period, $\sigma_{\mathrm{MB}}$ is the main quantitative measure of safeguards effectiveness, with PM in a support role without a quantitative measure of effectiveness. Similarly, for multiple balance periods, the covariance matrix $\Sigma$ of the MB sequence is the main quantitative measure of safeguards effectiveness, with single-period variances on the diagonal and between-period covariances on the off-diagonal. Some studies such as Burr et al. [1, 2] have considered data from PM used in another role not considered here, where PM data are put on the same statistical footing as NMA data (the MB sequences).

Toward the goal of quantifying the benefits of PM data, this paper revisits statistical methods for NMA and for frequent NMA, called near-real-time accounting (NRTA). Both NMA and NRTA have been discussed in two main literature reviews, Speed and Culpin [11] and Goldman et al. [12]. Speed and Culpin [11] described some of the sequential tests used for NMA and advocated a controversial Bayesian approach to testing for loss of SNM. The controversy arose for the usual reasons in Bayesian approaches: the need to specify costs of wrong decisions and the need to specify a prior probability of SNM loss. Goldman et al. [12] speculated that NRTA would replace conventional NMA and described the important possibility of data falsification by the operator in the context of international safeguards. Here we consider the operator's MB sequence, so we most directly address domestic safeguards. However, the operator's MB sequence is also used within the framework of international safeguards, along with the inspector's “difference” statistic.

NRTA is made possible by PM in the support role of enabling frequent material balance closures. The statistical methods include Shewhart charts based on multiple material balances during a year on MB data (the International Atomic Energy Agency refers to the MB sequence as the material unaccounted for, MUF, sequence), SITMUF (standardized independent transformed MUF) data, and a once-per-year balance based on the MUF data, known as CUMUF. The SITMUF sequence is a transform of the MUF sequence given by $Y = C^{-1}X$, where $X$ is the MUF sequence and $C$ is the Cholesky factor of $\Sigma$, so that $\Sigma = CC^{T}$. Under no loss, the entries of $Y$ are independently and identically distributed as normal random variables with mean 0 and variance 1 [13]. Therefore, the SITMUF sequence has an identity matrix as its covariance matrix, while the MUF sequence has a positive definite but otherwise arbitrary covariance matrix $\Sigma$.
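The transform described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 3-by-3 covariance matrix and the MUF values are hypothetical, and the function names `cholesky` and `sitmuf` are introduced here only for the example.

```python
import math

def cholesky(sigma):
    """Return lower-triangular C with sigma = C C^T (sigma must be positive definite)."""
    n = len(sigma)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                c[i][j] = (sigma[i][j] - s) / c[j][j]
    return c

def sitmuf(muf, sigma):
    """Solve C y = muf by forward substitution, i.e. y = C^{-1} muf."""
    c = cholesky(sigma)
    y = []
    for i, x in enumerate(muf):
        y.append((x - sum(c[i][k] * y[k] for k in range(i))) / c[i][i])
    return y

# Hypothetical 3-balance example: variance 3 on the diagonal, covariance -1 at lag 1.
sigma = [[3.0, -1.0, 0.0],
         [-1.0, 3.0, -1.0],
         [0.0, -1.0, 3.0]]
muf = [1.0, -0.5, 2.0]
y = sitmuf(muf, sigma)
```

Because $C$ is lower triangular, the $i$th SITMUF value depends only on the first $i$ MUF values, so the sequence can be updated as each balance closes.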

As is customary in safeguards, entries in $\Sigma$ are estimated using metrology of individual measurement methods and variance propagation to combine measurement error effects from multiple measurements [12, 14, 15]. After sufficient operating history, production facilities can typically estimate $\Sigma$ reasonably well, with much of the uncertainty in the estimated entries (variances and covariances) in $\Sigma$ arising from in-process inventory measurements whose quality is not well known. For example, Yamaha et al. [4] used measurements of flows in and out of a pulsed column at the Rokkasho aqueous reprocessing facility to estimate the SNM (Pu) in the column. The quality of such indirect SNM measurements is assessed using very limited data, typically from measurements of SNM that is recovered from infrequent equipment washout. Section 3.6 considers the impact of estimation error in $\Sigma$ on estimated alarm probabilities from sequential testing.

We also consider joint cumulative sum or cusum (also known as Page's test) methods [16] for the multiple balance MUF and SITMUF data. Shewhart and cusum methods are studied with several forms of $\Sigma$, with and without systematic measurement error in addition to random measurement error.

We revisit NRTA as a quantitative component of safeguards at declared facilities to monitor for SNM loss using both period-driven and data-driven testing. A period-driven approach makes a statistical decision to alarm or not at the end of each fixed period. A data-driven approach (also known as sequential testing) does not use a set decision period but instead decides as the data arrives whether to alarm or continue testing.

The difference between period-driven and data-driven viewpoints is illustrated simply by using both one-year and two-year periods, with sequential testing using a two-year truncation period serving as a surrogate for data-driven testing. For both one-year and two-year periods, conventional (period-driven) NMA using a once-per-year cumulative MUF (CUMUF, cumulative material unaccounted for) testing is compared to Shewhart and joint sequential cusum testing using either MUF or SITMUF (standardized, independently transformed MUF) data.

In addition to providing period-driven and data-driven viewpoints, new features include assessing the impact of uncertainty in the estimated covariance matrix of the MUF sequence and the impact of both random and systematic measurement errors.

There is interest in comparing pyroreprocessing to aqueous (and comparing several variations of aqueous) reprocessing with regard to proliferation resistance [6, 7]. With PM in the role of enabling NRTA to be performed (mainly by assisting with in-process inventory estimates), NRTA effectiveness can be compared by comparing $\Sigma$ at a pyrofacility to $\Sigma$ at an aqueous facility under data-driven testing, which, as we show in Section 2, is more appropriate than period-driven testing for NRTA.

The paper is organized as follows. Section 2 reviews and then describes our extensions to previous period-driven NMA studies. Section 3 gives numerical results of our simulation study. Section 4 describes our two-year truncated sequential test that is a surrogate for full-blown sequential testing. Section 5 is a summary.

#### 2. One-Year Study

Avenhaus and Jaech [17] and Jones [18–23] studied period-driven testing, making a statistical decision to alarm or not once per year.

Avenhaus and Jaech [17] compared the annual CUMUF test to more frequent Shewhart testing and used the Neyman-Pearson lemma from classical statistical hypothesis testing to prove that if the loss vector $\mu$ is proportional to the row sums of $\Sigma$, $\mu \propto \Sigma\mathbf{1}$, then the CUMUF test has the highest alarm probability (AP). Assuming a loss of one significant quantity (SQ) over one year, testing for loss only (not gain), and setting the false alarm probability (FAP) to 0.05, the AP is at least 0.95, provided $\sigma_{\mathrm{MB}} \le \mathrm{SQ}/3.3$. The factor of 3.3 arises from double use of the value 1.65, which corresponds to both the 0.05 FAP and the 0.95 AP [1, 2].
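The factor of 3.3 can be checked in one line. The derivation below assumes a one-sided test on a normally distributed MB with alarm threshold $1.65\,\sigma_{\mathrm{MB}}$ and a loss of exactly $\mathrm{SQ} = 3.3\,\sigma_{\mathrm{MB}}$; $\Phi$ denotes the standard normal distribution function, notation introduced here for the illustration.

$$
\mathrm{AP} = 1 - \Phi\!\left(1.65 - \frac{\mathrm{SQ}}{\sigma_{\mathrm{MB}}}\right) = 1 - \Phi(1.65 - 3.3) = \Phi(1.65) \approx 0.95 .
$$

A smaller $\sigma_{\mathrm{MB}}$ makes the argument of $\Phi$ more negative, so the AP can only increase.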

In period-driven (annual) testing, Burr et al. [1, 2] point out that an adversary could divert a significant quantity (SQ) over 12 months by diverting one half SQ during year one and the other half during year two (e.g., from months 7 to 18). In that case, assuming the same 0.05 annual FAP, the 0.95 AP decreases to 0.75, because the per-year AP decreases from 0.95 to 0.50 and the two-year AP is $1 - (1 - 0.50)^2 = 0.75$, which follows because the probability of nondetection over the two years is $0.50 \times 0.50 = 0.25$. This simple example illustrates why we endorse data-driven testing over period-driven testing. The more relevant AP is 0.75 rather than 0.95 because it is unrealistic to assume the adversary will divert the entire SQ during a specific 12-month period. Also, “front-line” roles for PM are under evaluation [1, 2] in which data-driven testing will almost certainly be used. Therefore, for a fair comparison of NMA alone to PM alone to combined NMA and PM systems, it is helpful to assume that the same type of testing (data-driven or period-driven) will be used on all candidate safeguards system options.
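The arithmetic of this example can be reproduced with a short sketch. It assumes a one-sided test on a normally distributed annual MB, an SQ equal to $3.3\,\sigma_{\mathrm{MB}}$, and independent annual tests; the function name `alarm_probability` is introduced here for illustration.

```python
from statistics import NormalDist

def alarm_probability(loss, sigma, fap=0.05):
    """One-sided test AP for a normal MB with mean `loss` and standard deviation `sigma`."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - fap)   # about 1.65 for FAP = 0.05
    return 1.0 - std_normal.cdf(threshold - loss / sigma)

sigma_mb = 1.0
sq = 3.3 * sigma_mb                      # one SQ sized so the single-year AP is about 0.95

ap_full = alarm_probability(sq, sigma_mb)        # entire SQ inside one balance year
ap_half = alarm_probability(sq / 2, sigma_mb)    # half an SQ inside each of two years
ap_two_year = 1.0 - (1.0 - ap_half) ** 2         # assumes the two annual tests are independent
```

Under these assumptions `ap_full` is close to 0.95, `ap_half` is close to 0.50, and `ap_two_year` is close to 0.75.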

In response to the “worst-case” loss scenario given previously from Avenhaus and Jaech [17], Jones [18–23] evaluated a statistical test consisting of two sequential cusum tests. The cusum test at period $i$ for an abrupt threat is calculated as $S_i^{a} = \max(0, S_{i-1}^{a} + x_i - k_a)$, where $x_i$ is the quantity being monitored, such as the MUF sequence or the SITMUF sequence, and $k_a$ is a control parameter. For a protracted threat, the cusum test at period $i$ is calculated as $S_i^{p} = \max(0, S_{i-1}^{p} + x_i - k_p)$. As in the Shewhart test, the cusum tests alarm on period $i$ if $S_i^{a} > h_a$ or $S_i^{p} > h_p$ for alarm thresholds $h_a$ and $h_p$. In using two cusum tests, Jones [18–23] aimed for high AP against abrupt (single-period) loss while still having almost as high AP against the “worst-case” protracted loss as the single CUMUF test (which has the highest possible AP for the “worst-case” loss). Of course any number of cusum tests could be used, but limiting to two cusum tests, one best for abrupt loss and the other best for protracted loss, seems to provide reasonable AP for a wide range of loss scenarios.
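A joint cusum of this kind can be sketched as follows. The recursion used is the standard one-sided form of Page's test, which is an assumption about the exact form used by Jones; the control parameters and thresholds are hypothetical values chosen only to make the example easy to follow, and the function names are introduced here.

```python
def page_cusum(x, k):
    """Page's one-sided cusum stream S_i = max(0, S_{i-1} + x_i - k), with S_0 = 0."""
    stream, s = [], 0.0
    for xi in x:
        s = max(0.0, s + xi - k)
        stream.append(s)
    return stream

def joint_cusum_alarm(x, k_abrupt, h_abrupt, k_protracted, h_protracted):
    """Return the first 0-based period at which either stream exceeds its threshold, else None."""
    abrupt = page_cusum(x, k_abrupt)
    protracted = page_cusum(x, k_protracted)
    for i, (sa, sp) in enumerate(zip(abrupt, protracted)):
        if sa > h_abrupt or sp > h_protracted:
            return i
    return None

# Hypothetical tuning values, for illustration only (not taken from Jones or from this paper).
params = dict(k_abrupt=1.5, h_abrupt=1.0, k_protracted=0.25, h_protracted=2.0)

abrupt_loss = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]   # one large shift in period 2
protracted_loss = [0.75] * 6                   # the same small shift in every period
no_loss = [0.0] * 6
```

With these values the abrupt sequence alarms in the period of the shift, while the protracted sequence never moves the abrupt stream and alarms only after the protracted stream has accumulated several periods of loss. This is the compromise the two-stream design is meant to provide.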

Avenhaus and Jaech [17] and Jones [18–23] ignored the impact of systematic errors on $\Sigma$ and the effects of estimation error in $\Sigma$. Therefore, one contribution of this paper is to redo these studies while considering the impact of systematic measurement error. In addition, modern computational methods (we use the package mvtnorm in R from the R Development Core Team [24]) allow us to provide exact critical values for an FAP of 0.05 under the null hypothesis of no loss. Although both “exact” and “approximate” critical values are given in Avenhaus and Jaech [17], even the “exact” critical values were based on approximate numerical techniques. However, our study confirms that the reported APs in Avenhaus and Jaech [17] are correct for the cases considered. We calculated the AP under various loss scenarios using the multivariate normal cdf (the pmvnorm function in the mvtnorm package for R) for Shewhart tests applied to the MUF and SITMUF data. We used simulated multivariate normal sequences to estimate the AP for the cusum tests applied to the MUF and SITMUF data.
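For the SITMUF Shewhart test the calculation is simple enough to sketch without a multivariate normal routine, because the SITMUF values are independent. The sketch below is in Python rather than R and is not the authors' code: the mean shift of 0.5 per balance is hypothetical, the function names are introduced here, and the MUF Shewhart case (correlated values) would still need a multivariate normal cdf such as pmvnorm.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def shewhart_threshold(n, fap=0.05):
    """Common critical value c with P(any of n iid N(0,1) values > c) = fap."""
    return STD_NORMAL.inv_cdf((1.0 - fap) ** (1.0 / n))

def shewhart_ap(means, c):
    """Exact AP when the SITMUF values are independent N(mean_i, 1)."""
    p_no_alarm = 1.0
    for m in means:
        p_no_alarm *= STD_NORMAL.cdf(c - m)
    return 1.0 - p_no_alarm

def shewhart_ap_simulated(means, c, n_sims=20000, seed=1):
    """Monte Carlo estimate of the same AP, as a cross-check."""
    rng = random.Random(seed)
    alarms = sum(
        any(rng.gauss(m, 1.0) > c for m in means) for _ in range(n_sims)
    )
    return alarms / n_sims

n = 12
c = shewhart_threshold(n)
fap_exact = shewhart_ap([0.0] * n, c)            # no loss: should equal 0.05
fap_sim = shewhart_ap_simulated([0.0] * n, c)    # simulation check of the same FAP
ap_exact = shewhart_ap([0.5] * n, c)             # hypothetical SITMUF mean shift of 0.5 per balance
```

The simulated FAP should agree with the exact value to within Monte Carlo error. This mirrors the approach described above of using exact calculations where available and simulation otherwise.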

We consider the following setup, motivated by Avenhaus and Jaech [17] and Jones [18–23].

(i) 48, 36, 24, 12, 6, 4, 1 balances per year, $n$.

(ii) $\sigma_I$ for inventory ($\sigma_I$ is the absolute standard deviation of the inventory measurement).

(iii) $\sigma_T$ for transfers with associated total amount lost the total threat amount over one year. Note that the to ratio varies as 10, 2, 1. For comparison to other methods, the APs for CUMUF are 0.85 for all three cases defined by for (a tridiagonal covariance matrix) and total loss (actually, , resp., when our exact calculations replace Avenhaus's approximate calculations and we fix the CUMUF AP at 0.85 for all three cases). See the next point regarding MUF covariance matrices.

(iv) MUF covariance matrix $\Sigma$.

(a) $\Sigma$ is tridiagonal with variances equal to $2\sigma_I^2 + \sigma_T^2$ and off-diagonals equal to $-\sigma_I^2$ for elements $(i, j)$ with $|i - j| = 1$. Note that the base case is for $n = 12$ as in Avenhaus and Jaech [17]. The negative correlation at lag 1 results from the previous balance ending inventory being the next balance beginning inventory. Jones [18–23] assumed a facility cleanout after each balance, so his covariance matrix is slightly different. Note that for this covariance matrix the sum of its entries is the same for all $n$, so the CUMUF variance does not depend on $n$, the number of balances per year. Specifically, for the $i$th balance, $\mathrm{MUF}_i = I_{i-1} + T_i - I_i$, where $\sigma_I$ and $\sigma_T$ are the standard deviations of $I_i$ and $T_i$, respectively. Then $\mathrm{var}(\mathrm{MUF}_i) = 2\sigma_I^2 + \sigma_T^2$ for all $i$ and $\mathrm{cov}(\mathrm{MUF}_i, \mathrm{MUF}_{i+1}) = -\sigma_I^2$.

(b) Avenhaus and Jaech [17] and Jones [18–23] considered only random errors. Here we also consider systematic errors in the measurement error modeling. For the $i$th balance, where and are the random and systematic error standard deviations of and and are the random and systematic error standard deviations of . Then we have and for , We let and with and both equal to 2 and 1; these are denoted by and , respectively.

(v) The methods we consider are as follows.

(a) MUF for $n$ balances, with covariance matrix $\Sigma$.

(b) SITMUF for $n$ balances, with identity covariance matrix.

(c) CUMUF: one balance per year, so $n = 1$; alarm if the CUMUF value exceeds a critical value.

(d) MUF Shewhart: multiple balances; alarm if any MUF value exceeds a common critical value.

(e) SITMUF Shewhart: multiple balances; alarm if any SITMUF value exceeds a common critical value.

(f) MUF joint cusum: two cusum streams, one tuned for abrupt loss and one for protracted loss, computed on the normalized MUF sequence. Alarm if either stream exceeds its threshold. The two thresholds are chosen so that the FAP using the protracted stream alone is 0.04 and the joint FAP using both streams is 0.05, following Jones [21, 22].

(g) SITMUF joint cusum: two cusum streams, one tuned for abrupt loss and one for protracted loss, computed on the SITMUF sequence. Alarm if either stream exceeds its threshold. The two thresholds are chosen so that the FAP using the protracted stream alone is 0.04 and the joint FAP using both streams is 0.05, following Jones [21, 22].

(vi) Our approach: critical values are found exactly or by simulation. The APs (powers) are found exactly or by simulation.

(vii) Our performance criteria are as follows.

(a) For protracted threat, AP over one year.

(b) For abrupt threat, timeliness AP, that is, AP within 30 days (90 and 60 days for $n$ equal to 4 and 6, respectively). Jones [21] introduced timeliness AP.

(c) For protracted threat (same loss per balance), AP and expected loss. The AP is over the entire year, and the expected loss EL is computed from $L_i$, the cumulative loss after the $i$th balance, and $p_i$, the probability of alarming in the $i$th balance. The expected loss for CUMUF is . Jones [21] introduced expected loss.
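The claim in (iv)(a) that the CUMUF variance does not depend on the number of balances per year can be checked numerically. The sketch below assumes the random-error-only model in which balance $i$ is the beginning inventory plus the transfer minus the ending inventory, so each variance is $2\sigma_I^2 + \sigma_T^2$ and each lag-1 covariance is $-\sigma_I^2$. It also assumes that the transfer standard deviation scales as $\sqrt{12/n}$, so the yearly transfer variance is fixed. The numerical values of the standard deviations are hypothetical.

```python
def muf_covariance(n, sigma_i, sigma_t):
    """Tridiagonal covariance of n consecutive balances with random errors only."""
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = 2.0 * sigma_i ** 2 + sigma_t ** 2      # two inventories and one transfer
        if i + 1 < n:
            cov[i][i + 1] = cov[i + 1][i] = -sigma_i ** 2   # shared inventory at lag 1
    return cov

def cumuf_variance(cov):
    """Variance of the sum of all balances: the sum of every covariance entry."""
    return sum(sum(row) for row in cov)

SIGMA_I = 1.0        # hypothetical inventory standard deviation
SIGMA_T_12 = 1.0     # hypothetical transfer standard deviation at 12 balances per year

variances = {}
for n in (48, 36, 24, 12, 6, 4, 1):
    sigma_t = SIGMA_T_12 * (12.0 / n) ** 0.5   # assumed scaling: yearly transfer variance held fixed
    variances[n] = cumuf_variance(muf_covariance(n, SIGMA_I, sigma_t))
```

Under these assumptions every entry of `variances` equals $2\sigma_I^2 + 12\,\sigma_{T,12}^2$: the interior inventory terms cancel in the sum, leaving only the first and last inventories and the yearly transfer total.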

#### 3. Results

##### 3.1. Protracted Threat, AP

From Figure 1 for (, ), we make the following observations. The CUMUF APs (dashed lines) are smallest to largest in the order to to for but are smallest to largest in the opposite order to to for . The SITMUF Shewhart dominates MUF Shewhart for , but for and , SITMUF Shewhart is only somewhat better than MUF Shewhart especially for large . The SITMUF joint cusum dominates MUF joint cusum for especially for large . For , SITMUF joint cusum is only somewhat better than MUF joint cusum especially for large . Note for large , the SITMUF joint cusum dominates CUMUF.


From Figure 2 for (, ), we see that the CUMUF AP (dashed lines) decreases from – for all . The performance of SITMUF Shewhart, MUF joint cusum and SITMUF joint cusum decreases substantially for and for large so that all these methods are not much better than MUF Shewhart. We see that for and and large , MUF joint cusum is slightly better than SITMUF joint cusum. For and , MUF Shewhart is slightly better than SITMUF Shewhart. Note that SITMUF joint cusum dominates CUMUF only for .


For the (, ) case, there are similar patterns except that for and , the MUF versions are better than the SITMUF versions of Shewhart (for small ) and joint cusum (for large ).

We emphasize that what is new here is that we have identified different covariance matrices than considered previously where the SITMUF version does not dominate (have uniformly higher APs than) the MUF version and the joint cusum versions do not dominate CUMUF in terms of the AP.

##### 3.2. Protracted Threat, Expected Loss

For CUMUF, the expected loss . From Figure 3 for (, ), all methods outperform CUMUF. The SITMUF versions outperform the MUF versions. We see that as increases the joint cusum methods outperform the Shewhart methods. The methods do better going from to for and do worse going from to for . The SITMUF joint cusum EL is less than 70% of CUMUF's especially for , slightly worse for and somewhat more worse for for .


For (, ), all methods outperform CUMUF but not substantially for and . All methods are similar for and . Only for and especially for large do the SITMUF versions outperform the MUF versions and the joint cusum methods outperform the Shewhart methods. The SITMUF joint cusum EL is less than 70% of CUMUF's only for and .

For (, ), similar observations hold as those for (, ). Here the SITMUF versions are not much better than the MUF versions. The SITMUF joint cusum EL is less than 75% of CUMUF's only for and .

##### 3.3. Abrupt Threat, Timeliness AP

From Figure 4 for (, ), timeliness AP (TAP) improves going from to with the SITMUF Shewhart and joint cusum methods outperforming the MUF Shewhart method; the SITMUF Shewhart and joint cusum methods perform similarly.


For (, ), the performance is flat across all the methods and near 1.0 for all covariance matrices. Perhaps there is more differentiation for smaller . The SITMUF joint cusum method is somewhat better than MUF joint cusum method for especially for larger . Figures illustrating this case and other cases throughout the results sections are in Burr and Hamada [25], which is available upon request.

For (, ), the performance is flat across all the methods and near 1.0 for all covariance matrices. It is possible there would be differences in APs for smaller .

##### 3.4. Optimal Protracted Threat for CUMUF, AP

From Figure 5 we see that CUMUF outperforms the other methods for , , as Avenhaus and Jaech [17] showed for MUF Shewhart under optimal protracted threat (MUF means are proportional to row sums of ). This holds for the other values not shown here, and results are qualitatively similar for , and for , .


##### 3.5. SITMUF Joint Cusum

Here we consider the SITMUF joint cusum method performance as varies in Figures 6–8. For protracted loss, Figure 6 shows that AP improves as increases for but decreases as increases for and . Consequently, for and , a small is recommended. Note that the CUMUF AP plotted as dashed lines is a function of and depends on the MUF covariance matrix.


From Figure 7(a) for protracted loss and (, ), EL improves as increases for but first decreases and then increases as increases for and .

From Figures 7(b) and 7(c) for protracted loss and (, ) and (, ), EL decreases as increases for but increases as increases for and .

From Figure 8 and abrupt loss, the timeliness AP (TAP) is quite high especially for . Note that in Figure 8, the “30 Day AP” is actually “90 Day AP” and “60 day AP” for equal to 4 and 6, respectively. We see that TAP increases from to .

Jones [22] points out a serious drawback of the MUF joint cusum method: one needs to know the MUF covariance ahead of time in order to set up the method. Consequently, the SITMUF joint cusum method is preferred from this practical point of view. However, while the SITMUF joint cusum generally has higher AP over most of the situations considered here, we show in Section 3.7 that the MUF joint cusum can have just as large an AP.

##### 3.6. SITMUF Cusum or Joint Cusum Robustness to MUF Covariance Matrix

As mentioned in Section 1, production facilities with some operation history can reasonably expect to achieve approximately 15% relative estimation error in the entries in $\Sigma$. That is, we chose the 15% variation as being a typical amount by which estimated entries in $\Sigma$ might vary from the respective true values. Recall that $\Sigma$ is estimated using variance propagation applied to measurement control data collected from the same instruments as those making the NMA measurements.

As a convenient way to introduce estimation error in the MUF covariance matrix $\Sigma$, we use sample MUF covariance matrices based on 50 to 100 samples, which produce sample variances and covariances that vary about 15% relative to the true values.
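The link between sample size and relative error can be sketched for a single variance. For normal data the relative standard deviation of a sample variance based on $N$ observations is $\sqrt{2/(N-1)}$, which is roughly 0.14 for $N = 100$ and 0.20 for $N = 50$. The simulation below is only a cross-check of that formula for variances (covariances are not treated), and the function names are introduced here for illustration.

```python
import random

def relative_sd_theory(n_samples):
    """Normal-theory value of sd(s^2) / sigma^2 for a sample variance s^2."""
    return (2.0 / (n_samples - 1)) ** 0.5

def relative_sd_simulated(n_samples, n_reps=2000, seed=7):
    """Estimate the same quantity by simulating unit-variance normal samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        mean = sum(draws) / n_samples
        estimates.append(sum((d - mean) ** 2 for d in draws) / (n_samples - 1))
    centre = sum(estimates) / n_reps
    spread = (sum((e - centre) ** 2 for e in estimates) / (n_reps - 1)) ** 0.5
    return spread   # the true variance is 1, so this is already a relative value

theory_100 = relative_sd_theory(100)
theory_50 = relative_sd_theory(50)
simulated_100 = relative_sd_simulated(100)
```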

To study robustness to estimation error in $\Sigma$, previous analyses were repeated with estimated (rather than true) covariance matrices. For example, for the SITMUF cusum in one representative case, the nominal FAP of 0.05 varied from 0.04 to 0.06 across 1000 simulations when $\Sigma$ was fixed without error. When $\Sigma$ was estimated using 100 samples, the FAP varied from 0.033 to 0.070 and when estimated using 200 samples, the FAP varied from 0.029 to 0.077. The alarm probabilities showed a similar sensitivity to estimation error in $\Sigma$. When the number of balance periods per year was increased (using the same alarm threshold as in the previous case), the FAP varied from 0.11 to 0.12, and from 0.09 to 0.14 when $\Sigma$ was estimated using 100 samples.

Similarly, for the SITMUF joint cusum, the critical values we used were based on the true MUF covariance matrix, and the FAP increases on average from 0.05 to 0.055 and the 0.05 and 0.95 quantiles across 1000 realizations of the estimated $\Sigma$ are approximately 0.05 and 0.06, respectively (depending on the case). The 0.05 and 0.95 quantiles in the APs across 1000 realizations of the estimated $\Sigma$ are approximately 0.67 and 0.68, respectively (depending on the case).

These findings for the SITMUF cusum or joint cusum suggest robustness to 15% estimation error in $\Sigma$. Note for comparison that if only a single MUF value is considered and $\Sigma$ is the scalar $\sigma^2$, then 15% estimation error in $\sigma$ also increases the FAP on average from 0.05 to 0.055, but the 0.05 and 0.95 quantiles of the FAP are 0.02 and 0.11, respectively, and the corresponding AP quantiles are 0.34 and 0.66 for a given mean shift. So if only a single balance period is considered, then 15% error in $\sigma$ leads to large uncertainty in the true FAP and true AP. But when a matrix of, for example, 12-by-12 variances and covariances in a sequence of 12 MUFs has 15% estimation error in each entry, the effects apparently tend to cancel, leading to small changes in the FAPs and APs.

We find that the estimated APs are robust, being nearly the same as in the situation of having zero error in $\Sigma$.
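The single-balance comparison can be sketched with a direct calculation. Several assumptions are made here that the text does not state: the 15% error is taken as the relative standard deviation of the estimated standard deviation, that estimate is treated as approximately normal, the test is one-sided with a nominal 0.05 FAP, and the mean shift is set to 1.65 true standard deviations. Under these assumptions the computed quantiles come out close to the values quoted above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.95)      # one-sided 0.05 critical value, about 1.645

def true_alarm_probability(ratio, shift=0.0, fap=0.05):
    """Alarm probability when the threshold uses sigma_hat = ratio * sigma.

    `shift` is the true mean of the MUF in units of the true sigma.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - fap) * ratio
    return 1.0 - STD_NORMAL.cdf(threshold - shift)

REL_ERROR = 0.15                           # assumed relative sd of sigma_hat
ratio_low = 1.0 - Z95 * REL_ERROR          # 0.05 quantile of sigma_hat / sigma (normal approximation)
ratio_high = 1.0 + Z95 * REL_ERROR         # 0.95 quantile of sigma_hat / sigma

# An overestimated sigma raises the threshold and lowers both FAP and AP, and vice versa.
fap_range = (true_alarm_probability(ratio_high), true_alarm_probability(ratio_low))
ap_range = (true_alarm_probability(ratio_high, shift=1.65),
            true_alarm_probability(ratio_low, shift=1.65))
```

Here `fap_range` is roughly (0.02, 0.11) and `ap_range` is roughly (0.34, 0.66), which illustrates how strongly a single-balance test depends on the accuracy of the estimated standard deviation.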

##### 3.7. Threat Scenarios in Which MUF Outperforms SITMUF for the Joint Cusum Method

In a real facility, $\Sigma$ is rarely known in advance of a processing campaign, so it must be estimated at each balance period. For that reason, there is a logistical advantage in using tests based on the SITMUF sequence rather than the MUF sequence. The advantage is that the SITMUF sequence is iid if no loss has occurred, so it is relatively simple to control the FAP for any type of testing.

Regarding a possible performance advantage of SITMUF over MUF, this section shows that on average across random loss scenarios, the AP for SITMUF-based testing is the same as the AP for MUF-based testing. Note that because the SITMUF sequence is a transform of the MUF sequence given by $Y = C^{-1}X$ (where $X$ is the MUF sequence and $C$ is the Cholesky factor of $\Sigma$, so that $\Sigma = CC^{T}$), a test based on SITMUF with a given loss vector will have statistically the same behavior as a test based on MUF with a correspondingly transformed loss vector. By “statistically the same behavior,” we mean that the average AP across many realizations of the sequence is the same for the two tests with their respective loss vectors (which we confirmed using simulation in R).

Burr and Hamada [25] evaluate the APs for MUF and for SITMUF for the study reported previously (MUF joint cusum AP minus SITMUF joint cusum AP so that a positive value means that the MUF-based method has higher AP).

Burr and Hamada [25] give results for cases 1–3 corresponding to –. Under each case, there are matrices for the three 's (0.1, 0.5, 1). The matrices are 7 rows by 5 columns. The 7 rows correspond to equal to 48, 40, 36, 24, 12, 6, and 4. The 5 columns correspond to threat scenarios: none, protracted with even loss each balance, protracted with even loss every other balance, abrupt loss at balance , and protracted loss that CUMUF is guaranteed to dominate according to Avenhaus and Jaech [17].

We also used a genetic algorithm (GA) to find scenarios that maximize the MUF versus SITMUF difference. We considered 8 cases, each defined by a choice of settings and MUF covariance matrix. We ran the GA for 25 generations or stopped sooner if the best MUF versus SITMUF difference stopped changing. There is no guarantee that the GA found the scenario with the maximum difference, but it did find several with positive differences that are too large to explain by chance.

#### 4. Two-Year Study

To contrast period-driven testing (e.g., make a statistical decision to alarm or not once per year) with data-driven testing (make a statistical decision to alarm or not on the fly, as data arrives), we can mimic data-driven testing by comparing a two-year study with truncated sequential testing to a one-year period-driven study.

We consider five covariance matrices for two years. The first has (1) no cleanout and no systematic measurement error. The other four have systematic measurement error with (2) no cleanout, with measurement calibration, (3) no cleanout, without measurement calibration, (4) with cleanout, with measurement calibration, and (5) with cleanout, without measurement calibration.

##### 4.1. A&J/Jones Model

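Avenhaus and Jaech [17] assume only random error components for $I_i$ and $T_i$, the inventory and transfer at the $i$th balance. There is no cleanout and no systematic measurement error. For the $i$th balance, $\mathrm{MUF}_i = I_{i-1} + T_i - I_i$, where $\sigma_I$ and $\sigma_T$ are the standard deviations of $I_i$ and $T_i$, respectively. Then $\mathrm{var}(\mathrm{MUF}_i) = 2\sigma_I^2 + \sigma_T^2$ for all $i$ and $\mathrm{cov}(\mathrm{MUF}_i, \mathrm{MUF}_{i+1}) = -\sigma_I^2$. A&J specify $\sigma_I$ and $\sigma_T$ for 12 balance periods per year. If there are $n$ balance periods per year, then $\sigma_T$ is scaled by $\sqrt{12/n}$, which keeps the CUMUF variance the same for all $n$.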

The scenarios in the remainder of this section have systematic measurement error.

##### 4.2. Extended Model, No Cleanout, with Measurement Calibration

In the remaining sections we consider both random and systematic error components for and . Avenhaus and Jaech [17] and Jones [18] considered only random errors. We have the th balance, where and are the random and systematic error standard deviations of and and are the random and systematic error standard deviations of . Then we have for the first year: and for , For multiple years, the covariance matrix is block diagonal, except the last balance of one year and the first balance of the next year, where and for the first balance of one year and the first balance of the next year, where

##### 4.3. Extended Model, No Cleanout, No Measurement Calibration

We have the th balance, where and are the random and systematic error standard deviations of and and are the random and systematic error standard deviations of . Then we have and for , This holds for multiple years.

##### 4.4. Extended Model, Cleanout, Measurement Calibration

We have the th balance, where and are the random and systematic error standard deviations of and and are the random and systematic error standard deviations of . Because of cleanout, and are zero with no associated measurement error. Then we have for the first year:

For ,

For ,

For ,

For , , and , For multiple years, the covariance matrix is block diagonal.

##### 4.5. Extended Model, Cleanout, No Measurement Calibration

We have the same covariances for balances within the same year as given in the preceding scenario. The Appendix gives what has to be added for years 1 and 2 which applies to all pairs of years.

##### 4.6. Results

The FAP is set at $1 - 0.95^2 = 0.0975$ over two years. We compare the CUMUF and SITMUF joint cusum tests. The protracted loss cusum FAP is 0.08 and the joint cusum FAP is 0.0975. We consider the loss scenario of even loss across the last half of the first year and the first half of the second year, with the same case-dependent total loss values as before.

First we compare the APs from – in Figure 9.


The APs for loss over two years of and are qualitatively similar to those shown in Figure 9.

Next we compare SITMUF joint cusum with CUMUF for , , and in Figure 10. Results for a two-year loss of and are somewhat different than the results for a two-year loss of as in Figure 10 [25].


#### 5. Summary

We have considered process monitoring in the support role of enabling NRTA. In most facilities, frequent balance closure is made possible by process monitoring to aid in estimating in-process inventory. We have evaluated options for NRTA, extending results from the safeguards literature by presenting both data-driven and period-driven views, including the effects of systematic measurement errors in several distinct covariance matrices for a sequence of material balances. The quantitative effect of estimation error in was also evaluated.

Our main summary points are the following.

(i) In evaluating NRTA data, data-driven (sequential) testing is more appropriate than period-driven testing. However, the Neyman-Pearson lemma for fixed-period testing is convenient (as shown by [17]) for identifying the worst-case loss scenario and calculating the corresponding detection probability, provided one assumes that the loss occurs over the particular 12-month period being evaluated.

(ii) For system studies such as those presented in Sections 2–4, data-driven testing can be evaluated using period-driven (truncated) sequential testing over long periods. For example, the IAEA goal is to detect abrupt or protracted (over 1 year) diversion with high probability. To detect diversion over 1 year, it is adequate to consider a 2-year period truncated version of the cusum (Page's) sequential test.

(iii) In sequential testing, there is no analogue to the classic Neyman-Pearson lemma to identify the best test. However, a joint cusum test as in Jones [18] is recommended as a good compromise that balances the competing goals of having large AP for abrupt loss and reasonably large AP for protracted loss. For logistical convenience in system studies and because entries in $\Sigma$ become known at each balance period rather than in advance, it is best to apply the joint cusum to the SITMUF rather than to the MUF sequence. For a suitably transformed loss vector, the AP is the same for the joint cusum applied to the SITMUF or to the MUF sequence; therefore, there is no performance advantage in using the SITMUF sequence rather than the MUF sequence.

(iv) Our numerical study of robustness to estimation error in $\Sigma$ suggests that at least 15% relative standard deviation in the entries in $\Sigma$ can be present with quite small effects on the APs.

(v) Our inclusion of systematic errors in the measurements that are involved in estimating $\Sigma$ clearly shows that systematic errors have an important impact on $\Sigma$ and therefore also on APs.

(vi) We concur with Avenhaus and Jaech [17] and Jones [18] that NRTA is not a panacea, because NRTA does not increase APs for protracted loss, even in data-driven testing. That is, using PM to enable NRTA will not solve the known dilemma that protracted loss is difficult to detect. However, PM in other front-line roles could detect the specific protracted losses that the PM option is tuned to detect [1, 2].

There are other possible roles for PM in safeguards, so future work will quantify the benefit of PM in possible roles other than enabling NRTA.

#### Appendix

#### Section 4.5 Additional Results

This appendix gives what has to be added to Section 4.5 results for years 1 and 2 and applies to all pairs of years: For , For , For and , For , For ,

#### Acknowledgments

The authors acknowledge support from the National Nuclear Security Administration Office of Nuclear Nonproliferation Research and Development (NA-22) and Nuclear Energy (NE) programs.