Abstract

For sequentially collected data, this paper introduces a lag-one differencing method to estimate the random error standard deviation and then uses the estimate to calculate a change detection threshold in a moving window method to detect shifts in the short-term systematic error. Performance results on simulated and real data are presented. The motivating example arises from facilities under nuclear safeguards agreements, where inspector data collected during International Atomic Energy Agency (IAEA) verifications are compared to corresponding operator data. The differences between the operator and inspector values are evaluated using an application of analysis of variance (ANOVA). Typically, it is assumed that short-term systematic errors change across inspection periods, so inspection periods form the groups used in the ANOVA. In some data sets, however, it appears that the short-term errors have changed at other times, so change detection methods can be used to detect the actual change times. Fortunately, the impact of having to perform change detection on the estimated short-term systematic and random error variances is anticipated to be modest or small.

1. Introduction and Background

For facilities under nuclear safeguards agreements, inspector data collected during International Atomic Energy Agency (IAEA) verifications are compared to corresponding operator data. The differences between the operator and inspector values are evaluated using an application of analysis of variance (ANOVA). It is typically assumed that short-term systematic errors change across inspection periods (groups); however, in some data sets, it appears that the short-term errors have changed at other times, so change detection methods could be used to detect the actual change times. This paper introduces a lag-one differencing method to estimate the within-group random error standard deviation and then uses the estimate to calculate a change detection threshold in a moving window method to detect shifts in the short-term systematic error. Performance results on simulated and real data are presented.

An effective measurement error model for inspector measurement data must account for variation within and between groups, where in this context a group is usually (but not always) an inspection period. A model for multiplicative errors for the inspector $I$ (and similarly for the operator $O$) is

$$I_{ij} = \mu_{ij}(1 + S_{Ij} + R_{Iij}), \quad (1)$$

where for item $i$ (from 1 to $n$) from group $j$ (from 1 to $g$), $I_{ij}$ is the inspector's measured value, $\mu_{ij}$ is the true unknown value, $S_{Ij}$ is the short-term systematic error in group $j$, and $R_{Iij}$ is a random error. The notation $S_{Ij} \sim N(0, \delta_{SI}^2)$ means that $S_{Ij}$ has a normal distribution with mean 0 and variance $\delta_{SI}^2$, and similarly $R_{Iij} \sim N(0, \delta_{RI}^2)$ [1]. For simplicity, it is assumed here that there are $n$ measurements in each inspection period; allowing for a differing number of measurements in each inspection period is straightforward.

The measurement error model in (1) sets the stage for applying ANOVA with random effects [1–4]. Neither $S_{Ij}$ nor $R_{Iij}$ is observable. However, for various types of observed data, the variances $\delta_{SI}^2$ and $\delta_{RI}^2$ can be estimated. The variance of $I_{ij}$ is $\sigma_I^2 = (\sigma_\mu^2 + \bar{\mu}^2)(\delta_{SI}^2 + \delta_{RI}^2) + \sigma_\mu^2$, or for a fixed value of $\mu$, $\sigma_I^2 = \mu^2(\delta_{SI}^2 + \delta_{RI}^2)$, where the item variance $\sigma_\mu^2$ is the variance of the $\mu_{ij}$, defined as $\sigma_\mu^2 = E(\mu_{ij} - \bar{\mu})^2$, and $\bar{\mu}$ is the overall mean of all the true values. The values $D_{ij} = (O_{ij} - I_{ij})/O_{ij}$, by linear error variance propagation, have approximate variance $\sigma_D^2 \approx \delta_S^2 + \delta_R^2$ (an accurate approximation for quite large $\delta_R$, up to approximately $\delta_R = 0.60$). Regarding notation, $\delta_S^2 = \delta_{SO}^2 + \delta_{SI}^2$ and $\delta_R^2 = \delta_{RO}^2 + \delta_{RI}^2$. This paper focuses on estimating $\delta_S^2$ and $\delta_R^2$.
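The first-order variance approximation for $D$ can be checked with a small simulation. The following Python sketch is illustrative only: the function name and the numeric RSD values are assumptions, not values from the paper. It draws paired $(O, I)$ data from the multiplicative model and compares the sample variance of $D = (O - I)/O$ with $\delta_S^2 + \delta_R^2$.

```python
import random
import statistics

def simulate_d(n_groups, n_per_group, d_so, d_si, d_ro, d_ri, seed=1):
    """Draw paired (O, I) data from the multiplicative model and
    return the relative differences D = (O - I) / O."""
    rng = random.Random(seed)
    d = []
    for _ in range(n_groups):
        s_o = rng.gauss(0.0, d_so)  # operator short-term systematic error (one per group)
        s_i = rng.gauss(0.0, d_si)  # inspector short-term systematic error (one per group)
        for _ in range(n_per_group):
            mu = rng.uniform(50.0, 150.0)  # true (unknown) item value
            o = mu * (1.0 + s_o + rng.gauss(0.0, d_ro))  # operator measurement
            i = mu * (1.0 + s_i + rng.gauss(0.0, d_ri))  # inspector measurement
            d.append((o - i) / o)
    return d

# Hypothetical RSDs; delta_S^2 = d_so^2 + d_si^2 and delta_R^2 = d_ro^2 + d_ri^2.
d = simulate_d(n_groups=500, n_per_group=10,
               d_so=0.01, d_si=0.02, d_ro=0.005, d_ri=0.01)
approx = 0.01**2 + 0.02**2 + 0.005**2 + 0.01**2
print(statistics.variance(d), approx)  # the two values should be close
```

With many groups, the sample variance of $D$ agrees closely with the first-order approximation $\delta_S^2 + \delta_R^2$.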

2. ANOVA

The usual ANOVA decomposition (a standard analysis technique) is

$$\sum_{j=1}^{g}\sum_{i=1}^{n}(D_{ij} - \bar{D})^2 = \text{SSW} + \text{SSB}, \quad (2)$$

where $\text{SSW} = \sum_{j=1}^{g}\sum_{i=1}^{n}(D_{ij} - \bar{D}_j)^2$, $\text{SSB} = n\sum_{j=1}^{g}(\bar{D}_j - \bar{D})^2$, $\bar{D}_j = (1/n)\sum_{i=1}^{n} D_{ij}$, and $\bar{D} = (1/(gn))\sum_{j=1}^{g}\sum_{i=1}^{n} D_{ij}$. In (2), SSW is the within-group sum of squares, and SSB is the between-group sum of squares. In random effects ANOVA (the group means are random), the expected values are $E(\text{MSW}) = \delta_R^2$ and $E(\text{MSB}) = \delta_R^2 + n\delta_S^2$, where $\text{MSW} = \text{SSW}/(g(n-1))$ and $\text{MSB} = \text{SSB}/(g-1)$, so reasonable estimators are $\hat{\delta}_R^2 = \text{MSW}$ and $\hat{\delta}_S^2 = (\text{MSB} - \text{MSW})/n$. It follows that $E(\hat{\delta}_S^2) = \delta_S^2$. The vector $(\bar{D}, \text{SSW}, \text{SSB})$ is a complete sufficient statistic, so by the Lehmann-Scheffé theorem, the estimators $\hat{\delta}_R^2$ and $\hat{\delta}_S^2$ have the minimum variance among all possible unbiased estimators (MVUE) [1]. However, biased estimators can have smaller mean-squared error (MSE), where $\text{MSE} = \text{bias}^2 + \text{variance}$. Also, the MVUE property relies on the normality assumption. Therefore, other estimators are sometimes considered [1]; for example, setting $\hat{\delta}_S^2$ to 0 if it is negative leads to a positive bias, but also to a lower MSE.
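The estimators above are easy to implement directly. The following minimal Python sketch (illustrative, not the paper's R code) computes $\hat{\delta}_R^2 = \text{MSW}$ and $\hat{\delta}_S^2 = (\text{MSB} - \text{MSW})/n$ for $g$ equal-sized groups.

```python
def anova_estimates(groups):
    """One-way random-effects ANOVA estimates: returns (MSW, (MSB - MSW)/n),
    i.e., the estimates of delta_R^2 and delta_S^2, for g groups of equal size n."""
    g = len(groups)
    n = len(groups[0])
    grand = sum(sum(grp) for grp in groups) / (g * n)
    means = [sum(grp) / n for grp in groups]
    ssw = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)
    ssb = n * sum((m - grand) ** 2 for m in means)
    msw = ssw / (g * (n - 1))   # E(MSW) = delta_R^2
    msb = ssb / (g - 1)         # E(MSB) = delta_R^2 + n * delta_S^2
    return msw, (msb - msw) / n

# Toy example: three groups of four values with clearly different group means,
# so the between-group estimate dominates the within-group estimate.
groups = [[1.0, 1.2, 0.9, 1.1], [2.0, 2.1, 1.9, 2.0], [0.1, 0.0, 0.2, 0.1]]
var_r, var_s = anova_estimates(groups)
print(var_r, var_s)  # 0.01 and 0.9 for this toy data
```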

Figure 1(a) plots 10 simulated $D$ values in each of 3 groups. The alarm limit on the right side of the plot in Figure 1(a) is estimated on the basis of the $g = 3$ sets of $n = 10$ values of $D$. The groups in data such as that in Figure 1(a) are typically inspection periods; however, in some data sets, it appears that the short-term errors have changed at other times, so change detection methods can be used to detect the actual change times. This paper introduces a lag-one differencing method to estimate the random error standard deviation $\delta_R$ and then uses the estimate to calculate a change detection threshold in a moving window method to detect shifts in the short-term error $S$. Performance results on simulated and real data are presented.

Data such as that in Figure 1 are used for verification, and also for two main metrology purposes. One metrology purpose is to estimate performance values for the four relative standard deviations (RSDs) $\delta_{SO}$, $\delta_{SI}$, $\delta_{RO}$, and $\delta_{RI}$. For verification, statistical tests are applied to the $D_{ij}$ values, so a second metrology purpose is to estimate $\delta_S$ and $\delta_R$, by applying random effects one-way ANOVA to real paired relative difference data that are typically assumed to follow (2). Reference [5] evaluates impacts on alarm probabilities of using estimates of $\delta_S$ and $\delta_R$ (this paper's focus) instead of the true quantities.

Figure 2 plots four pairs of real operator and inspector data for each of three measurement periods ("groups"), plotted as $D = (O - I)/O$ versus the measurement order. In the context of change detection for paired data such as that in Figure 1 (simulated data) or Figure 2 (real data), there are situations where the default choice to group by inspection period seems to be contradicted by the data. The main goals for such paired data are to

(1) estimate the within-group and between-group variance for $D$ data (this paper's focus),
(2) estimate thresholds that correspond to small false alarm probabilities for future $D$ data as in Figure 1,
(3) estimate the within-group and between-group variance for both $O$ and $I$ data for international target values [4] and for use in error variance propagation in material balance evaluation.

This application is an example of random effects ANOVA, which is a standard analysis technique, but with the nonstandard feature that the group memberships are unknown. Some research issues arise in applying ANOVA to paired data, including the following three:

(1) One-way random effects ANOVA with unknown groups, as sometimes arises in the verification context, has not been investigated in the published literature; see Section 3.

(2) Although ANOVA dates to the 1930s [1], research continues regarding, for example, constructing confidence intervals (CIs) for $\delta_S^2$. Assuming that the $S$ and $R$ values are normally distributed, an exact CI can be constructed for $\delta_R^2$ using the $\chi^2$ distribution, but there are only approximate methods to construct CIs for $\delta_S^2$, because $\hat{\delta}_S^2 = (\text{MSB} - \text{MSW})/n$ is a difference of two independent scaled $\chi^2$ random variables. Kraemer [6] proposes a modified CI construction method for $\delta_S^2$ and investigates impacts of nonnormality. Kraemer's modified CI method is based on the chi-squared based Satterthwaite approximation, $\widehat{df} = 2[E(\hat{\delta}_S^2)]^2/\text{var}(\hat{\delta}_S^2)$, which approximates the chi-squared degrees of freedom for a CI, and also on the approximation $\text{var}(s^2) \approx \sigma^4\left(\frac{2}{n-1} + \frac{\gamma_2}{n}\right)$ [7], where $\gamma_2$ is the excess kurtosis of the distribution of $S$. If the $S$ and $R$ distributions are normal, then $\gamma_2 = 0$. This approach works reasonably well in simulations reported in [6]; however, it ignores the fact (unpublished) that $\text{var}(s^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)$, where $\mu_4$ is the fourth central moment of the distribution of $S$.
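The Satterthwaite step can be made concrete. The following Python sketch uses the standard textbook Satterthwaite form for the degrees of freedom of a linear combination of mean squares (which may differ in detail from Kraemer's exact variant); the numeric values are illustrative.

```python
def satterthwaite_df(msb, msw, g, n):
    """Standard Satterthwaite approximation to the effective degrees of
    freedom of the linear combination (MSB - MSW)/n:
        df = (sum_i c_i MS_i)^2 / sum_i (c_i MS_i)^2 / df_i,
    with df(MSB) = g - 1 and df(MSW) = g(n - 1). The common factor 1/n
    cancels in the ratio, so it is omitted."""
    num = (msb - msw) ** 2
    den = msb ** 2 / (g - 1) + msw ** 2 / (g * (n - 1))
    return num / den

# When MSB dominates MSW, the effective df approaches df(MSB) = g - 1.
df = satterthwaite_df(msb=3.61, msw=0.01, g=3, n=4)
print(df)  # just under g - 1 = 2 for these inputs
```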

(3) A variance approximation issue arises in analysing $D = (O - I)/O$ instead of $(O - I)/\mu$ (not available in practice, because $\mu$ is unknown). Linear first-term error variance propagation is accurate for quite large $\delta_R$, up to approximately $\delta_R = 0.60$. The results reported in Section 4 use $D = (O - I)/O$ to estimate $\delta_S^2$ and $\delta_R^2$. Also, when checking whether future $D$ values are within alarm limits, there is no need to separately estimate $\delta_{SO}^2$, $\delta_{SI}^2$, $\delta_{RO}^2$, and $\delta_{RI}^2$, so usual ANOVA is adequate, rather than Grubbs'-type ANOVA [2–5].

3. One-Way Random Effects ANOVA with Unknown Groups

Regarding research issue one, if wrong groupings are chosen, then $\hat{\delta}_R^2$ and $\hat{\delta}_S^2$ can be adversely impacted. For example, Figure 3 plots an exaggerated example for ease of illustration, in which successive groups are incorrectly combined. To show that $\hat{\delta}_R^2$ and $\hat{\delta}_S^2$ are impacted by mismatch between the true and assumed grouping, consider the ANOVA decomposition equation (2), where $D_{ij} = (O_{ij} - I_{ij})/O_{ij}$ for multiplicative models. In Figure 3, the groups 1 and 2, 3 and 4, and 5 and 6 are merged. Specifically, the assumed values are $n = 12$ and $g = 3$, but the true values are $n = 6$ and $g = 6$. One must take care to interpret terms, because there are two values of $n$ ($n = 6$ and $n = 12$) and two values of $g$ ($g = 6$ and $g = 3$), depending on whether the true or merged groups are being used in the calculation.

Because the new group 1 has two different values of $S$ (one for the first 6 values and another for the second 6 values), the within-group (former groups 1 and 2) mean square, which should estimate the random error variance only, has expected value $E(\text{MSW}) = \delta_R^2 + \frac{6}{11}\delta_S^2$.

Therefore, $E(\hat{\delta}_R^2) = \delta_R^2 + \frac{6}{11}\delta_S^2$.

The between-groups mean square also has the wrong expected value, due to the fact that $\text{var}(\bar{S})$ is being estimated, because the average of 2 successive $S$ values is used in each of the 3 (merged) groups, and $\text{var}(\bar{S}) = \delta_S^2/2$. So, MSB has expected value $E(\text{MSB}) = \delta_R^2 + 12 \cdot \delta_S^2/2 = \delta_R^2 + 6\delta_S^2$. Therefore, the estimate $\hat{\delta}_S^2 = (\text{MSB} - \text{MSW})/12$ has expected value $E(\hat{\delta}_S^2) = \frac{\left(6 - \frac{6}{11}\right)\delta_S^2}{12} = \frac{5}{11}\delta_S^2$.

The general expressions for $E(\hat{\delta}_R^2)$ and $E(\hat{\delta}_S^2)$ when merging successive pairs of true groups of size $n$ are

$$E(\hat{\delta}_R^2) = \delta_R^2 + \frac{n}{2n-1}\delta_S^2, \quad (3)$$

$$E(\hat{\delta}_S^2) = \frac{n-1}{2n-1}\delta_S^2. \quad (4)$$

Results (3) and (4) have been confirmed by simulation in the R statistical programming language [8] for data simulated from (1) such as that in Figure 3 (see Table 1). Notice from (3) and (4) that $E(\hat{\delta}_R^2) + E(\hat{\delta}_S^2) = \delta_R^2 + \delta_S^2$.
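Results (3) and (4) can also be checked with a quick Monte Carlo simulation in Python rather than R; the true RSD values below are hypothetical, chosen only for illustration.

```python
import random

# Monte Carlo check of (3) and (4): merge the 6 true groups in pairs and
# estimate delta_R^2 and delta_S^2 using the (wrong) merged grouping.
random.seed(0)
g_true, n_true = 6, 6          # as in the Figure 3 example
d_s, d_r = 0.03, 0.02          # hypothetical true delta_S and delta_R
reps = 20000
sum_r2 = sum_s2 = 0.0
for _ in range(reps):
    s = [random.gauss(0, d_s) for _ in range(g_true)]
    data = [[s[j] + random.gauss(0, d_r) for _ in range(n_true)]
            for j in range(g_true)]
    merged = [data[0] + data[1], data[2] + data[3], data[4] + data[5]]
    g, n = len(merged), len(merged[0])      # g = 3 merged groups of n = 12
    grand = sum(map(sum, merged)) / (g * n)
    means = [sum(grp) / n for grp in merged]
    msw = sum((x - m) ** 2 for grp, m in zip(merged, means)
              for x in grp) / (g * (n - 1))
    msb = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    sum_r2 += msw
    sum_s2 += (msb - msw) / n
# Predictions from (3) and (4) with n_true = 6:
pred_r2 = d_r**2 + (n_true / (2 * n_true - 1)) * d_s**2   # delta_R^2 + (6/11) delta_S^2
pred_s2 = ((n_true - 1) / (2 * n_true - 1)) * d_s**2      # (5/11) delta_S^2
print(sum_r2 / reps, pred_r2)
print(sum_s2 / reps, pred_s2)
```

The Monte Carlo averages match the predicted expected values to well within simulation noise.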

As another extreme case, assume that each of the 6 true groups in Figure 3 is split into 3 groups, creating 18 assumed groups of size 2. It is easily shown for this case that $E(\hat{\delta}_R^2) = \delta_R^2$ and $E(\hat{\delta}_S^2) = \frac{15}{17}\delta_S^2$. The factor $\frac{15}{17}$ arises from having 3 repeated values of each $S_j$ in the expression $E\sum_{k=1}^{18}(\bar{S}_k - \bar{S})^2 = 3\sum_{j=1}^{6}E(S_j - \bar{S})^2 = 15\delta_S^2$. More generally, if each of $g$ true groups is split into $m$ assumed groups, then $E(\hat{\delta}_R^2) = \delta_R^2$ and $E(\hat{\delta}_S^2) = \frac{m(g-1)}{mg-1}\delta_S^2$, and so the negative bias in $\hat{\delta}_S^2$ due to splitting is small when $g$ is large. The extreme case of merging distinct groups 1 and 2, 3 and 4, and 5 and 6 or of splitting each of the 6 true groups into 3 groups will not occur in this simulated data or in other similar data if any reasonably effective change detection method is used, as illustrated in Section 4.
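The splitting case can be verified the same way; again, the true RSD values below are hypothetical.

```python
import random

# Monte Carlo check of the splitting case: each of the g = 6 true groups
# of size n = 6 is split into 3 assumed groups of size 2, so each S value
# repeats 3 times. Predictions: E(delta_R^2 hat) = delta_R^2 and
# E(delta_S^2 hat) = (15/17) delta_S^2.
random.seed(0)
d_s, d_r, reps = 0.03, 0.02, 20000
sum_r2 = sum_s2 = 0.0
for _ in range(reps):
    split = []
    for _ in range(6):                                # 6 true groups
        s = random.gauss(0, d_s)
        vals = [s + random.gauss(0, d_r) for _ in range(6)]
        split += [vals[0:2], vals[2:4], vals[4:6]]    # 18 assumed groups of size 2
    g, n = 18, 2
    grand = sum(map(sum, split)) / (g * n)
    means = [sum(grp) / n for grp in split]
    msw = sum((x - m) ** 2 for grp, m in zip(split, means)
              for x in grp) / (g * (n - 1))
    msb = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    sum_r2 += msw
    sum_s2 += (msb - msw) / n
print(sum_r2 / reps, d_r**2)                # expect approximately delta_R^2
print(sum_s2 / reps, (15 / 17) * d_s**2)    # expect approximately (15/17) delta_S^2
```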

4. Change Detection to Detect Changes in the Value of $S$

There are many change detection options for sequences of $D = (O - I)/O$ data [9, 10]. Assuming that not too many changes occur in $S$ during the entire analysis period, a robust estimate of $\sigma_d$, where $\sigma_d^2 = 2\delta_R^2$ is the variance of the lag-one differences, is available, using the median absolute deviation (MAD) of the lag-one differences, adjusted to provide an unbiased estimate of the true standard deviation for Gaussian data [8]. The following function madwin implements the MAD-based estimate of $\sigma_d$ for a moving window change detector in the R statistical programming language [8]. The moving window compares the difference $\bar{D}_2 - \bar{D}_1$ of adjacent window averages to a threshold $T$, where $\bar{D}_w$ is the average value of $D$ in window $w$ (which by the central limit theorem can be assumed to have approximately a normal distribution) and $T$ is chosen such that the false alarm probability over the entire analysis period is small, such as 0.05.

madwin = function(x = testdata1 - testdata2, nwin1 = 5, nwin2 = 7,
                  t.percentile = 0.99, peak.threshold = 0.5) {
  # nwin1 is the width of the moving window;
  # nwin2 is the width of a second window that is used to eliminate
  # duplicate inferred change points arising from the same mean shift
  # (shift in S value); peak.threshold is retained from the original
  # interface but is not used in this version.
  n = length(x)
  temp.check = numeric(n)
  indices.change = 0
  # the mad function provides a robust estimate of the standard deviation
  # that is adjusted to have zero bias for Gaussian data; applied to the
  # lag-one differences, it estimates sigma_d = sqrt(2) * delta_R.
  rstd = mad(diff(x))
  # approximate degrees of freedom for the t-based thresholds
  df1.use = mean(c(n - 1, nwin1))
  df2.use = mean(c(n - 1, nwin2))
  for (i in (nwin1 + 1):(n - nwin1 + 1)) { # first candidate change point is at index nwin1 + 1
    indicesl = (i - nwin1):(i - 1)
    indicesh = i:(i + nwin1 - 1)
    tempdiff = mean(x[indicesh]) - mean(x[indicesl]) # difference between left and right windows
    temp.check[i] = tempdiff / (rstd / nwin1^0.5)    # sd of tempdiff is sigma_d / sqrt(nwin1)
    if (abs(temp.check[i]) > qt(t.percentile, df = df1.use))
      indices.change = c(indices.change, i)
  }
  indices.change = indices.change[-1]
  indices.change1 = 0
  temp = abs(temp.check)
  # keep only the local peak within each window of width nwin2, to avoid
  # reporting the same mean shift more than once
  for (i in (nwin2 + 1):(n - nwin2 - 1)) {
    if (temp[i] == max(temp[(i - nwin2):(i + nwin2)]) &&
        abs(temp.check[i]) > qt(t.percentile, df = df2.use))
      indices.change1 = c(indices.change1, i)
  }
  indices.change1[-1] # function returns indices.change1, discarding the initial 0
}
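The core idea behind madwin, the MAD-of-lag-one-differences estimate of the random error standard deviation, can be sketched in Python as follows (the function name is illustrative). Differencing removes a step change in the mean everywhere except at the single change index, so the robust estimate stays close to the true standard deviation even when level shifts are present.

```python
import random
import statistics

def mad_sigma_from_diffs(x):
    """Estimate the random error SD from lag-one differences: MAD of diff(x),
    scaled by 1.4826 for consistency at the Gaussian (R's mad() applies this
    constant automatically), then divided by sqrt(2) because each lag-one
    difference has variance 2 * sigma^2."""
    diffs = [b - a for a, b in zip(x, x[1:])]
    med = statistics.median(diffs)
    mad = statistics.median(abs(d - med) for d in diffs)
    return 1.4826 * mad / 2 ** 0.5

# A level shift at index 100 inflates the ordinary SD but barely affects
# the difference-based robust estimate (true sigma = 1.0 here).
random.seed(2)
x = [random.gauss(5.0 if i < 100 else 8.0, 1.0) for i in range(200)]
print(statistics.stdev(x))           # inflated by the level shift
print(mad_sigma_from_diffs(x))       # stays near the true sigma
```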

Figure 4 illustrates a moving window option and a cumulative sum option [3, 10] (monitoring separately for positive shifts and for negative shifts) that both use the MAD-based estimate of $\sigma_d$. The data are simulated with $g = 4$, $n = 100$, and specified values of $\delta_S$ and $\delta_R$. The estimated number of groups is a random quantity that varies across hypothetical realizations of the data for the same fixed unknown true grouping. In simulation studies, one can know the true grouping and enforce all assumptions to hold, so it is possible to compare the performance of $\hat{\delta}_R^2$ and $\hat{\delta}_S^2$ for candidate grouping options.

In Figure 4, there are three change points, at 101, 201, and 301, so there are four true groups. The black 1s are the $D$ values. Notice the positive mean shift from group 1 (1:100) to group 2 (101:200). The red 2s are the moving average difference, scaled by using the MAD-based robust estimate of $\sigma_d$. The blue and green curves are Page's cumulative sum, defined as $C_i^+ = \max(0, C_{i-1}^+ + x_i - k)$ for positive change detection and $C_i^- = \min(0, C_{i-1}^- + x_i + k)$ for negative change detection, where $k$ is a user-chosen reference value [3, 10]. The red 2s (size 3 moving window) easily detect all three changes, at 101, 201, and 301. The cusums are more difficult to assess visually, but correctly have edges near the change points. The two horizontal lines are approximate 0.95 false alarm probability limits for the parametric size 3 moving window using the MAD approach, so the moving window option has a few false alarms where the red 2s exceed the horizontal lines away from the three change points, meaning it detects more than four groups in this realization. The moving window option computes the difference between successive nonoverlapping moving averages of a user-specified number (for example, 3) of values.
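For reference, Page's cusum recursion can be written compactly. The following Python sketch monitors for positive and negative mean shifts; the parameter values ($k$, the alarm threshold $h$, and the simulated shift) are illustrative, not the paper's.

```python
import random

def page_cusum(x, k, h):
    """Page's cusum for positive and negative mean shifts: k is the
    reference value, h the alarm threshold. Returns both cusum paths
    and the indices at which either path exceeds its threshold."""
    c_pos = [0.0]
    c_neg = [0.0]
    alarms = []
    for i, xi in enumerate(x):
        c_pos.append(max(0.0, c_pos[-1] + xi - k))  # accumulates positive drift
        c_neg.append(min(0.0, c_neg[-1] + xi + k))  # accumulates negative drift
        if c_pos[-1] > h or c_neg[-1] < -h:
            alarms.append(i)
    return c_pos[1:], c_neg[1:], alarms

# Example: zero-mean data with an upward mean shift of 1.0 at index 50.
random.seed(3)
x = [random.gauss(0.0, 0.5) for _ in range(50)] + \
    [random.gauss(1.0, 0.5) for _ in range(50)]
_, _, alarms = page_cusum(x, k=0.5, h=2.0)
print(alarms[0])  # first alarm occurs shortly after the shift at index 50
```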

Figure 5 is a simulated example to illustrate that even if $\delta_S$ is large, so that most mean shifts due to a shifting $S$ will be easily observed, some shifts will be small and difficult to detect. For example, in Figure 5, the mean shift from group 1 to group 2 might not be detected, particularly if $\delta_R$ increased.

Other change detection options can be considered; for example, a nonparametric option has been implemented that works with the ranks of the $D$ values in a moving window scheme similar to the parametric window scheme used here. Nonparametric options can outperform parametric options if the distributional assumptions made by the parametric options are badly violated in the data.

Several simulations have been performed in R [8] to assess the impacts of having to perform change detection to choose the groups. As shown above, incorrect group assignments can lead to negative bias in $\hat{\delta}_S^2$ and to positive bias in $\hat{\delta}_R^2$. The IAEA's calculations of the sample size needed to attain a specified loss detection probability require separate estimation of $\delta_S^2$ and $\delta_R^2$ [5].

For the results in Tables 1 and 2, the simulations used a total of 48 observations, divided into cases defined as having 2, 3, 4, 6, 8, 12, 16, or 24 equal-sized groups of 24, 16, 12, 8, 6, 4, 3, or 2 observations, respectively. Each case was analysed using $10^5$ simulations, and for brevity here, only some of the results are reported in Tables 1 and 2. The Table 1 results (repeatable across sets of $10^5$ simulations) are the relative bias (in percent) and the relative root mean-squared error (RMSE, in percent). Table 2 is the same, but negative values of $\hat{\delta}_S^2$ are set to 0, and the Table 2 entries are for estimation of $\delta_S$ and $\delta_R$ rather than for estimation of $\delta_S^2$ and $\delta_R^2$. The RMSE is defined in the usual manner, for example, for $\hat{\delta}_S^2$ in Table 1 as $\text{RMSE} = \sqrt{E(\hat{\delta}_S^2 - \delta_S^2)^2}$, and the relative RMSE is defined as $\text{RMSE}/\delta_S^2$. The results reported in Tables 1 and 2 use the correct grouping ($g = 6$, $n = 8$) and several wrong groupings as indicated. Tables 1 and 2 also use the MAD-based windowing change detection method (with a false alarm probability, FAP, of 0.1, 0.05, or 0.01 as listed), which reduces the bias in the estimates $\hat{\delta}_S^2$ and $\hat{\delta}_R^2$ relative to the wrong fixed groupings. The RMSE in $\hat{\delta}_S^2$ or $\hat{\delta}_R^2$ is only slightly larger than in the case where the true groups are known.
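The relative bias and relative RMSE reported in the tables can be computed as in this small Python helper (a sketch; the function name and toy inputs are illustrative, with both quantities returned in percent as in Tables 1 and 2):

```python
def rel_bias_rmse(estimates, truth):
    """Relative bias and relative RMSE (both in percent) of a set of
    simulated estimates against the true value."""
    m = len(estimates)
    bias = sum(e - truth for e in estimates) / m
    mse = sum((e - truth) ** 2 for e in estimates) / m  # MSE = bias^2 + variance
    return 100 * bias / truth, 100 * mse ** 0.5 / truth

rb, rr = rel_bias_rmse([0.9, 1.1, 1.0, 1.2], 1.0)
print(rb, rr)  # relative bias 5%, relative RMSE about 12.2%
```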

Figure 6(a) plots an example realization of the data with $g = 6$ and $n = 8$. Figure 6(b) plots the results of the MAD-based windowing change detection method with a false alarm probability (the probability of deciding that a potential change point is a true change point when in fact it is not) of 0.01. Figure 7 is the same as Figure 6, but for different values of $\delta_S$ and $\delta_R$. Based on the $10^5$ simulations, the distribution of the number of groups found using the false alarm rate of 0.05 is 0.17, 0.33, 0.30, 0.13, 0.02, 0.002, and 0.00006 for 4, 5, 6, 7, 8, 9, 10, or 11 inferred groups, respectively; the average number of groups inferred is 5.4, and the standard deviation of the estimated number of groups is 1.1.

Figure 8 plots real $(O-I)/O$ data with the groups being the inspection periods in Figure 8(a) and the groups being inferred using windowing in Figure 8(b). Figure 9 plots a second example of real $(O-I)/O$ data with the groups being the inspection years in Figure 9(a), the inspection periods (multiple inspection periods per year) in Figure 9(b), and the groups being inferred using windowing in Figure 9(c). Each grouping choice in Figures 8(a), 8(b), 9(a), 9(b), and 9(c) leads to different estimates of $\delta_S$ and $\delta_R$.

5. Discussion and Summary

This paper introduced a lag-one differencing method to estimate the random error standard deviation and then used the estimate to calculate a change detection threshold in a moving window method to detect shifts in the short-term systematic error. Performance results on simulated and real data were presented. The impact of having to perform change detection on the estimated short-term systematic and random error variances appears to be modest or small. In the motivating example, it is typically assumed that short-term systematic errors change across inspection periods, so inspection periods form the groups used in the ANOVA. In some data sets, it appears that the short-term errors have changed at other times, so change detection methods can be used to detect the actual change times.

The default assumption that is often adequate is that $S$ changes across inspection periods, so the groups are defined as inspection periods. However, if the data contradict the default assumption that $S$ changes across inspection periods, then change detection as presented in Section 4 is a reasonable option. Recall from Section 3 that $E(\hat{\delta}_R^2) + E(\hat{\delta}_S^2) = \delta_R^2 + \delta_S^2$ under merging, so if the grouping structure were ignored totally (implicitly assuming $g = 1$), then the sample variance has expected value $\delta_R^2 + \frac{n(g-1)}{N-1}\delta_S^2$ for $g$ true groups of size $n$ and $N = gn$ observations. For example, with an assumed single group, $N = 48$, and true $g = 6$ and $n = 8$ (the Section 4 example), the expected value of the sample variance is $\delta_R^2 + (40/47)\delta_S^2 \approx \delta_R^2 + 0.85\delta_S^2$, which is not dramatically different from the desired value $\delta_R^2 + \delta_S^2$. Fortunately (see Tables 1 and 2), the bias in $\hat{\delta}_S^2$ is not that much larger when the groups are chosen by change detection than in the known-groups case. Interestingly, due to a negative bias in $\hat{\delta}_S^2$, the RMSE in $\hat{\delta}_S^2$ can be slightly smaller for some wrong-groups choices than for the known-groups case. However, negative bias in $\hat{\delta}_S^2$ could lead to more false alarms in nuclear safeguards applications, such as the application illustrated in Figure 1 [5], so it is preferable to use change detection to avoid large negative biases. Any reasonably effective change detection method [9, 10] should be adequate in this context; the windowing scheme based on a robust MAD-based estimate of $\sigma_d$ is among the simplest reasonable options. References [2–5] indicate how the error variances $\delta_S^2$ and $\delta_R^2$ propagate in several examples; generally, $\delta_S$ tends to be smaller than $\delta_R$ but often has larger impact on the variance of the difference statistic or of the material balance [2–5]. Therefore, grouping choices that lead to negative bias in $\hat{\delta}_S^2$ will tend to underestimate the variance of the difference statistic and the variance of the material balance.
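The claim that ignoring the grouping entirely gives $E(s^2) = \delta_R^2 + (40/47)\delta_S^2$ for $g = 6$ and $n = 8$ can be checked by simulation; the RSD values below are hypothetical.

```python
import random

# Monte Carlo check: with g = 6 true groups of size n = 8 but the grouping
# ignored (one big group of N = 48), the sample variance has expected value
# delta_R^2 + (n (g - 1) / (N - 1)) delta_S^2 = delta_R^2 + (40/47) delta_S^2.
random.seed(4)
g, n = 6, 8
d_s, d_r, reps = 0.03, 0.02, 40000
acc = 0.0
for _ in range(reps):
    data = []
    for _ in range(g):
        s = random.gauss(0, d_s)                     # one S value per true group
        data += [s + random.gauss(0, d_r) for _ in range(n)]
    mean = sum(data) / len(data)
    acc += sum((x - mean) ** 2 for x in data) / (len(data) - 1)
pred = d_r**2 + (n * (g - 1) / (g * n - 1)) * d_s**2
print(acc / reps, pred)  # the two values should be close
```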

Data such as that in Figure 1(a) are collected in statistical quality control applications, where in phase I control limits are estimated, and in phase II the process is monitored using the estimated limits from phase I [11]. A key choice in quality control is the choice of rational subgroups [11]; a good rational subgrouping minimizes variation within groups and maximizes variation between groups. Ideally, only random errors are present within groups, while other effects might be present between groups. Analysis of phase I data is retrospective, such as presented here for paired data, where the windowing scheme based on a robust MAD-based estimate of $\sigma_d$ was introduced to find good subgroups such as those found in Figure 7 (the inferred groups closely approximated the true groups).

One open question is how to estimate the RMSE in $\hat{\delta}_S^2$ for any particular data set for which the true values of $\delta_S^2$ and $\delta_R^2$ are unknown. One option is to use the same type of simulations that were used to estimate the bias and RMSE in $\hat{\delta}_S^2$ (Tables 1 and 2), using $\hat{\delta}_S^2$ and $\hat{\delta}_R^2$ as the true values of $\delta_S^2$ and $\delta_R^2$. However, this approach is likely to be quite noisy unless the number of groups is large enough that the RMSE in $\hat{\delta}_S^2$ is small. The relative RMSE of 52% in $\hat{\delta}_S$ is too large (Table 2) for $g = 6$ and $n = 8$, so more than 6 groups would probably be needed for reliable estimation of the RMSE in $\hat{\delta}_S^2$. A second open question is how to estimate the actual false alarm probability using the estimates $\hat{\delta}_S^2$ and $\hat{\delta}_R^2$ in data such as that in Figure 1, when the nominal false alarm probability is specified, such as 0.05. Reference [12] addressed this question using tolerance intervals, assuming that the group memberships are known. A third open question is to evaluate the impact of having to infer group memberships on such tolerance intervals.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.