Abstract
Automated public health records provide the necessary data for rapid outbreak detection. An adaptive exponentially weighted moving average (EWMA) plan is developed for signalling unusually high incidence when monitoring a time series of nonhomogeneous daily disease counts. A Poisson transitional regression model is used to fit background/expected trend in counts and provides “one-day-ahead” forecasts of the next day's count. Departures of counts from their forecasts are monitored. The paper outlines an approach for improving early outbreak data signals by dynamically adjusting the exponential weights to be efficient at signalling local persistent high side changes. We emphasise outbreak signals in steady-state situations; that is, changes that occur after the EWMA statistic had run through several in-control counts.
1. Introduction
Early detection of disease outbreaks is essential for the efficient control of acute public health risks. For example, identifying the source of, and restricting exposure to, contaminated food products can only proceed once there is enough information for an outbreak data signal. Automated detection systems that accumulate sufficient information for earlier identification of an outbreak data signal are common now days. Identifying unusual disease incidence is normally carried out by monitoring data streams such as daily, weekly, or monthly counts or rates.
Statistical process control (SPC) methods, including control charts, are increasingly being applied to public health surveillance [1, 2]. Unusual incidence is automatically signalled when the control chart of the disease time series exceeds a threshold “control limit’’. However, disease incidence time series can exhibit characteristics that increase the complexity of detection of unusual incidences. Counts typically vary due to known influences such as school holidays, seasons, day of the week, and normal transitional (autocorrelation or autoregressive) influences. Subtracting expected behaviour from the counts reduces predictable variation effects in the time series, making it easier to detect unusual behaviour. Traditional monitoring of standardised differences between observed and modelled counts “standardised forecast errors’’ assumes normality of the forecast errors. However, Buckeridge et al. [2] indicated that conventional process control methods applied to syndromic data result in thresholds that produce unacceptably high false alarm rates when thresholds are not adjusted for the violation of distributional and other assumptions. The focus of the paper is detecting unexpected aberrations in heterogeneous time series of counts while trying to control for false alarms.
Exponentially weighted moving average (EWMA) control charts have been useful for monitoring industrial production processes (e.g., see Lucas and Saccucci [3]). More recently they have been used in public health surveillance (see review article by Woodall [1]) and for monitoring network counts (see Lambert and Liu [4]). Lambert and Liu used an EWMA plan of pseudo residuals (see Zucchini and MacDonald [5]) to monitor network counts (note that a surveillance methodology with all parameters specified is defined as a plan in this paper). We propose an alternative approach that avoids conversion to pseudo residuals. Lambert and Liu [4] used the EWMA to update means every minute, or on hourly cycles, whereas we use transitional Poisson regression models to update one-day-ahead forecast mean daily rates similar to that of Burr et al. [6]. We compared the CUSUM rather than EWMA of pseudoforecast errors to our proposed plans, because CUSUM and EWMA have similar performance in terms of early detection, but the CUSUM plan can be optimised [7].
Grigg and Spiegelhalter [8] used generalised linear models to estimate expected outcomes and correct counts by adjusting for risk factors. Their adjusted scores (called pseudo-observations) facilitate the temporal comparisons to baseline values thus allowing confounders to the comparison to be removed. Similarly we adjust for the influences of known variation caused by day of the week influences, seasonal influences and usual transitional influences, thus giving our plans a better chance of detecting unusual causes of variation. Unlike Grigg and Spiegelhalter [8] a primary aim of our work is to improve the early detection of outbreak data signals; a major goal of automated real-time public health surveillance. We offer a practical way of dynamically changing the exponential weights, giving the plan a better chance of detecting predictable (yet relatively small) (unusual) epidemic footprints. The strategy used is similar to that applied by Capizzi and Masarotto [9] and Nembhard and Kao [10]. The traditional method for overcoming the heterogeneity in trend is to use one-step-ahead forecasts from regression models that conform to usual, background disease count behaviour and then calculate control charts from the standardised forecast residuals (e.g., see Parker [11], Miller et al. [12], Kleinschmidt et al. [13], Wang et al. [14], Farrington et al. [15], Cowling et al. [16], Stoto et al. [17], and Kaninda et al. [18]). The forecast residuals are the difference between new count observations and the forecast count predicted by the regression model. In other words, regression models are used to define usual or expected behaviour, and the forecast residuals to detect unusual variation. We included explanatory variables that accounted for(i)the annual seasonal influence,(ii)day of the week influences,(iii)usual transitional (day-to-day) carry over influences of the lag counts.
EWMA plans are used to accumulate memory of local trends. The benefit of accumulating memory is having sufficient power to signal a change. The exponential weights control the amount of memory retained in the EWMA plan. Lower values retain a longer memory.
Monitoring performance is measured in terms of out-of-control average time to signal. We used the average number of days from the day of onset of outbreak to the day the outbreak signal to define performance. The average time to signal for a step change of in mean (denoted by ) is used to measure a plan early detection performance. An in-control of 100 is used for monitoring daily counts, which translates to just over 3 false alarms per annum.
Most of the work on control chart efficiency in the literature on surveillance has been carried out for EWMA plans in their initial state where the out-of-control behaviour is generated starting the EWMA at its in-control value. In practice, process changes seldom start when EWMA statistics equals the in-control value. Therefore, all values are found using the more realistic steady-state situation, where the EWMA statistic has worked through some in-control background data prior to the introduction of outbreak data. Steady-state situations were simulated by running the EWMA plans through 25 days of in-control counts (denoted the burn-in period) before introducing a change in the mean count representing outbreak data.
Section 2 introduces the EWMA for homogeneous counts. Section 2.1 examines how the EWMA plan can be improved when the in-control count follows a homogeneous Poisson distribution. Section 3 extends the EWMA plan for nonhomogeneous background (in-control) counts, and examines their improvement when parameters are known. We discuss how the performance of EWMA plans can be improved when counts are nonhomogeneous Poisson distributed. We consider early detection of both seasonal epidemic footprints, such as the start of the influenza season, and unusual increases in counts. Seasonal epidemic footprints are expected each year and therefore are not unusual. However, early detection is important because seasonal epidemic footprints vary in size and timing. Early detection of the start of the epidemic footprint defines when closer monitoring is required for informed decision making, such as, increasing the use of agency staff to cope with increased patient demand for hospital emergency department services. A simulated example is used to demonstrate the efficiency of the recommended improvement strategy and demonstrate that this plan is significantly better than using the “optimal’’ adaptive CUSUM of pseudo residuals in terms of early signals for the same false alarm rate. Section 4 presents an example of application that demonstrates the value of our improved plans relative to alternatives. The paper concludes with a discussion in Section 5.
2. EWMA Plans for Homogeneous Counts
Let the daily count on day of a certain disease be denoted by . The EWMA statistic for these counts is given by (in-control or background means are constant with time therefore homogeneous counts have means that do not depend on ) and where (the exponential weight) is a suitable constant . The value determines the level of memory included in the EWMA plan. At time , , the count for time is given a weight of , therefore a burn-in of 25 observations is adequate for the EWMA output series to reduce to a steady-state for values of , because is close to zero.
Suppose outbreak data cause the mean to increase from to . An important consideration is the selection of a -value that detects this shift earliest.
For Poisson counts with mean , the EWMA plan signals a significant shift in mean from the in-control mean of whenever where is a function of , , and (the in-control average run length). Values for were found using simulation techniques (for more details on how the control limits were found see Sparks et al. [19]). Table 1 provides values for for a range of and values.
The statistic is easy to interpret because it provides a local estimate of the mean for daily counts (the average incidence expected per day). Improving EWMA charts for early detection is now investigated in steady-state situations.
2.1. Improving EWMA Plans for Homogeneous Poisson Counts When Parameters Are Known
Sparks [7] and Shu and Jiang [20] optimised adaptive CUSUM plans for detecting shifts when data are approximately normally distributed. The same approach is followed here. Let the value be the exponential weight that provides the earliest signal (the smallest ) for the shift from to (i.e., ). An operating model for interpolating values for values of is provided in (2.3). The operating model is used for interpolating values for any given values for and . When the normal approximation is adequate and standard EWMA plans can be applied.
For disease counts that are Poisson distributed, a model for interpolating “optimal’’ exponential weights (denoted by ) that deliver approximately the smallest for and is where is an indicator variable that takes on a value of if and zero otherwise. For example, improving EWMA plans for detecting shifts from 1 to 2.25, 2 to 4, 3 to 5.25, and 4 to 6.5 would be achieved approximately by selecting equal 0.175, 0.2, 0.225, and 0.225, respectively. The restriction of the model (2.3) to is not serious because the values for shifts that lead to are generally 2 or less, and therefore there is little opportunity to improve the EWMA performance for such large shifts. Table 2 reports for all combinations of or and where or . The correlation between the “true’’ values (derived from simulation) and the model fitted values (expected) using (2.3) is 0.984 with the standard error for the regression model is given by 0.0188. The standard error of 0.0188 is adequate because the EWMA plan with delivers close to the lowest value. The change in is only slight within this range. In practice, improving EWMA plans for shifts from to is achievable by selecting only when and are known or can be reliably estimated. The minimum value of is taken as 0.05. Placing this lower bound on reduces the importance of early detection of small shifts that may not be worth investigating.
3. Plans for Nonhomogeneous Counts
For nonhomogeneous counts , the EWMA statistic of (2.1) needs to be standardised in some way to remove the influence of nonhomogeneity, that is, changing with time , denoted by . In addition, the optimisation process for the plan involves changing the exponential weights dynamically for each day to (). One possible standardisation is to divide both sides of inequality (2.2) by to adjust for changing and with time . For known moving means , this standardisation leads to the adaptive EWMA for nonhomogeneous Poisson counts given by Statistic (3.1) is similar to the adaptive control statistic used in Sparks [7] for normally distributed data. Sparks developed a CUSUM plan, that is, nearly invariant to changes in the reference value. Similarly 's control limit is nearly invariant to changes in and . That is, the statistic signals whenever with average time between false alarms close to . The invariant property of plans are very useful, because plans do not need to be adjusted for changes in and . The multiplicative adjustment to the threshold of 1 for needed to achieve is examined. The control limit of 1 is based on homogeneous Poisson count control limits with known means, and these may need to be adjusted for nonhomogeneous Poisson counts where means are unknown and are estimated from the data. The adjustments for specific models are available for particular examples in Tables 3 and 4. Note that values are very close to 1 for all models considered, that is, almost no adjustment was required. In addition, it will be demonstrated later that estimating to be temporally “optimal’’ makes the plan perform better than fixing at .
A disadvantage with is that it has lost the simple interpretation of estimating the local mean. In addition, it is difficult to establish whether is trending above or below what is expected. It is therefore helpful to draw the time series line of the EWMA smoothed value of on the time series plot of (represented by a dashed line in all plots). If is trending above or below the dashed line then counts are higher or lower than expected, respectively.
If future shifts are known in advance for homogeneous counts then the value can be selected to minimise the EWMA plan's out-of-control as outlined in Section 2.
3.1. Efficient Plans for Nonhomogeneous Poisson Counts When Means Are Unknown
An adaptive EWMA plan using should work significantly better than guessing in advance a value for and using it unchanged. In practice, efficient estimates of and are required before an improvement is possible. In addition, we are less interested in detecting small shifts in mean (), for example, we may decide not to respond to any increase in mean of or less. Designing plans for has the advantage of delaying signals of increases in means less than . We achieve this by setting the lower bound for , thus never fall below . In this paper, we arbitrarily set . Therefore, the application of the methodology involves the following steps.(1)Estimate or using one-day-ahead forecasts (details on how this is achieved under different situations are given in the next two subsections).(2) Use the estimates in step to estimate , and in turn calculate .(3)Calculate in (3.1) using the estimates of and obtained in step .(4)Signal an unusual epidemic footprint whenever where is the adjustment value discussed earlier.
Let the one-day-ahead model forecast for mean made at time be denoted by . The next two subsections describe the improved EWMA design procedure for different purposes.
3.1.1. Detecting Unexpected Increases in Counts
The one-day-ahead forecast is used to remove the natural variation that comes from day of the week, holidays, seasonal and transitional influences, that is, is estimated by . This process improves the detection of unusual outbreak data signals such as might be caused by bioterrorism or an epidemic footprint. The smoothed day ahead model forecast error , constrained to be positive (since we are only interested in shifts on the high side) is defined by where is a suitable smoothing weight. Note that estimates . The estimate of is therefore The estimates of and given by and above, respectively, are used in step to estimate from (2.3).
3.1.2. Detecting Seasonal Epidemics Early
Suppose the aim of surveillance is identifying the start of a seasonal epidemic as early as possible, for example, the start of the annual influenza season. An epidemic footprint is expected each year, but the start time varies from season to season and varies in magnitude. Assume we have prior information that the seasonal epidemic event is rare in a certain period of the year (e.g., summer for influenza). Let the average (nearly homogenous) mean during the nonepidemic period be denoted by (used to estimate ). Assume we are not interested in the early detection of any epidemic with a mean less than for some fixed . An improved adaptive EWMA can be constructed by estimating by and by , and use these estimates in step to estimate from (2.3). The resulting plan should be close to the “optimal’’ plan for detecting shifts from to . The closeness depends on how well estimates . In other words, if , for all days , are reasonable forecasts of the means during epidemics, then the improved plan should be efficient at detecting the epidemic footprint early.
3.2. Efficiency of the Plans
A simulation study is used to evaluate the practical efficiency of the plans. Tables 3 and 4 evaluate the average time to signal for two types of outbreak data: step changes and parabolic changes, respectively.
In-control background counts: were generated using means equal to given by with the following models.
Model 1. for (Monday to Friday), (Saturday), (Sunday), , ,
Model 2. for (Monday to Friday), (Saturday), (Sunday), , ,
Model 3. for (Monday to Friday), (Saturday), (Sunday), , ,
Model 4. for (Monday to Friday), (Saturday), (Sunday), , ,
For Models 1–4, for respectively, (i.e., ). The value in (3.5) relates to the moving window size in the model fitting process and therefore can be ignored in the data simulation process. After generating the counts, these models are assumed hidden and refitted using the simulated data. The parameter estimates are assumed to move slowly by estimating them using a moving window of observations. Therefore, no knowledge of the fixed regression parameters are assumed, and the parameters are assumed to change slowly over time with (say) process improvements, for example, improve data collection or slow drifts in population sizes. The moving window can be thought of as allowing the monitoring plan to adapt to changes in the process in the same way as control charts are updated as the process improves. The moving window relates more to good practice in updating the plan and is unrelated to the simulated data.
Simulated examples of unusual outbreak data: in-control counts were generated as above and used as training data to fit the model for the first day of the outbreak. An epidemic footprint was generated by adding counts simulated from a Poisson distribution with mean to the in-control generating counts on day . The result is Poisson counts with means .
Two outbreak data used for the results in Tables 3 and 4, respectively, are the following.(i)A step change of in mean was simulated to mimic increasing disease counts that result, for example, from health consequences to step changes in environmental pollution. The simulated epidemic footprint was continued until all plans signalled.(ii)A parabolic change in mean simulated to mimic say seasonal increases in say influenza counts. Here the mean increases of where were generated as an unusual epidemic starting from day . The indicates how long the epidemic lasts. and are selected to keep the epidemics of the same size, that is, is varied so that is constant for all selections of making all outbreaks roughly equally important. We use A signal could only be given for its epidemic footprint duration, and therefore, it could be missed by the monitoring plans. Therefore the proportion of missed detections is recorded (see Table 4).
The model in (3.5) was fitted using the first three years of simulated data to deliver one-day-ahead forecast in-control (usual) behaviour (expected counts). Thereafter forecasts were made using the fitted model for a moving window of days of data. The fitted values were used to select values that delivered smaller values for the “unknown’’ simulated epidemic footprints. In Table 3, new data were recursively simulated until the outbreak footprint was signalled, and the time to signal (in days) was recorded. This same process is repeated 10 000 times to give the average time to signal (i.e., or ). Tables 3 and 4 compare the improved adaptive EWMA to the adaptive EWMA with and the adaptive CUSUM plan (Sparks [7] and Shu and Jiang [20]) applied to pseudo-residuals given by For each simulation run different three thousand Poisson counts were generated using the above model. The first 1095 days (3 years) were used as training data to fit the model that provides a forecast for day 1096, and thereafter a moving window of 1095 days of data fitted the model to give the day ahead forecasts. Additional counts for the epidemic were added to the counts starting from day 1096 onwards (going for L days for the parabolic increase in means). The number of days from 1095 to the plan's first out-of-control signal is taken as the time to detection (in days). Repeat the process 10 000 times to estimate of . The values are reported in Tables 3 and 4 for the step change and parabolic change, respectively. The plans in Tables 3 and 4 all are designed to deliver in-control of 100, so plans are comparable. The CUSUM of the pseudoforecast errors has the multiplier adjustment needed to deliver a plan with , , not close to one, thus indicating the normal approximation of pseudo residuals is inadequate to use EWMA plans based on normality without adjustment. However, more importantly is not approximately constant across all models, which means that(1)if the conventional CUSUM is applied to the pseudoforecast errors (without adjustment), the plan will deliver too many false alarms,(2)the threshold needs an adjustment for each change in the mean distribution.
On the other hand, the Poisson adaptive EWMA plans need very little adjustments for the design control limit of one ( ranging from 1.0075 to 1.02). From the results in Table 3, we conclude that the improved Poisson adaptive EWMA is always, on average, more efficient than the Poisson adaptive EWMA with for step changes. For Models in Table 3, improved Poisson adaptive EWMA plans have smaller values for all . For Models and in Table 3, improved Poisson adaptive EWMA plans have smaller values for all . In Table 4, the improved Poisson adaptive EWMA’s advantage only holds for Models and . In addition, for Models and , the improved Poisson adaptive EWMA is likely to miss fewer parabolic unusual changes (not evident from Table 4 because the gains are small). For low counts (Model ), there is generally insufficient information to improve the exponential weights, and so we recommend using for low counts.
In summary, the dynamic adaptive Poisson EWMA plan signals, on average, earlier than the adaptive Poisson EWMA plans with for most step changes in mean of 2 or more (Table 3), while for parabolic changes the results are mixed. This knowledge allows users to revise the dynamic plan by setting when means trend lower.Notice that the Poisson adaptive EWMA signals step changes on average earlier than the “optimal’’ adaptive CUSUM of the pseudoforecast residuals. The similar results hold for parabolic changes but these results are not reported. Therefore, the “optimal’’ adaptive CUSUM of the pseudoforecast residuals plan is not recommended based on these simulation results.
4. An Application
The application involves influenza A daily counts in New South Wales (NSW), Australia ranging from the end of 2003 to the end of 2005. Temporal trends in the daily disease counts are found in the first plot of Figures 1 and 2. Figures 1 and 2 examine the detection of unusual epidemic footprints and early detection of the start of the flu season, respectively. The second plot in Figures 1and2 is the “optimal’’ adaptive EWMA plan (), and the last plot is a time series of the “optimal’’ CUSUM of the one-day-ahead pseudoforecast errors. These are compared in terms of early detection. The adaptive plan is designed to give a false alarm every 100 days under the assumption that counts are Poisson distributed with the means given by their one-day-ahead forecasts. We consider models of the form of (3.5) except that a school holiday factor was included and the number of lags in Poisson transition regression model (3.5) was taken as 4 (the transitional model that gave the smallest AIC value). Model (3.5) is used to forecast the mean for the day ahead. Similar approaches to modelling disease counts can be found in Burr et al. [6], Bjornstad et al. [21], and Christensen et al. [22]. Moving windows of one, two, and three years of data were tried. In addition, various discounting weights, for example, , giving more recent observations greater weight in determining regression parameters were tried. The forecasting plan that produced the smallest forecast errors was (see [23])(i)a moving window of 730 days of data (2 years),(ii)discounted weights as the number of days moves away from the day being forecasted ().

(a)

(b)

(c)

(a)

(b)

(c)
The autocorrelations of standardised forecast errors for influenza were insignificant (see Figure 3). Therefore applying standard control assuming independence is reasonable provided that the normality assumption is not violated.

The forecasts, together with the smoothed forecast errors, are used to optimise the adaptive EWMA plan as outlined previously. After each signal, EWMA plans reset the EWMA statistic to its initial value, and therefore follow-up signals indicate that either the epidemic footprint persisted, or that the counts continued to be higher than expected. The “optimal’’ adaptive CUSUM plans for the one-day-ahead pseudoforecast errors used in Tables 3 and 4 were compared to the dynamic adaptive Poisson EWMA plan.
4.1. Early Seasonal Epidemic Footprint Detection
The first plot in Figure 1 indicates daily counts for influenza A in NSW, Australia. We assumed that a seasonal increase in influenza counts can start any time from beginning of May to the end of October. The model is fitted using a moving window of two years (730 days). As such, control limits adjust, for example, as either disease control improves or the population size changes. This adjustment is similar to revising the control limits as the process improves in manufacturing. This process helps signal small epidemic footprints after counts reduce due to improved policy or control.
The second plot in Figure 1 presents the Poisson adaptive plan with dynamic exponential weights designed for earlier detection of epidemic footprints. The dashed line in Figure 1 indicates the expected trend for the dynamic Poisson adaptive EWMA. The dynamic Poisson adaptive EWMA line trends above the dashed line when the counts are trending higher than expected.
Two epidemic footprints were detected in each of years 2004 and 2005 (see Figure 1 with time on the x-axis). Note that the Poisson adaptive EWMA flags the first epidemic footprint in 2004 in mid-July 2004 (second plot in Figure 1), much earlier than the “optimal’’ adaptive CUSUM of the pseudoforecast residuals plan (second plot in Figure 1). However, the second epidemic footprint is signalled a few days earlier by the “optimal’’ adaptive CUSUM. The major difficulty with the “optimal’’ adaptive CUSUM is what adjustment () to use for nonnormality. The value (see Table 3) was selected because it was appropriate for early detection of the first epidemic footprint, but it was not appropriate for the second. This point highlights the difficulty with using the “optimal’’ adaptive CUSUM in practical circumstances. The Poisson adaptive EWMA plan, on the other hand, is very robust to changes in the process mean.
4.2. Detection of Unusual Counts
The second plot in Figure 2 presents the dynamic adaptive plans for early detection of unusual high influenza A incidence after correcting for usual seasonal, day of the week, school holiday, and transitional influences. The dynamic Poisson adaptive EWMA plan signals unusual increase in counts many more times than the “optimal’’ adaptive CUSUM plan independent of adjustment of normality (both and are recorded in Figure 2). Again this highlights the difficulty with applying the adaptive CUSUM plan to pseudo residuals because of the difficulty knowing what adjustment to use.
A laboratory diagnosing patients as having influenza A was found to produce many false positives in early 2005. Poisson adaptive EWMA plan signalled these whereas the “optimal’’ adaptive CUSUM plan fails to signal (see southern hemisphere summer of 2004/2005 in Figure 2).
Further analysis (not reported) demonstrated that the Poisson adaptive EWMA plan for misses the first unusual increase in Figure 2, thereafter the performance of both Poisson adaptive EWMA plans was almost identical. The dynamic Poisson adaptive EWMA plan has an advantage over the Poisson adaptive EWMA plan with in this example.
5. Discussion
In summary, we offer a dynamic Poisson adaptive EWMA plan for early detection of epidemic footprints when disease counts follow a Poisson regression model. Results demonstrate that in the case of step changes in means, the dynamic Poisson adaptive EWMA detects outbreaks earlier than the CUSUM plan of pseudoforecast errors. Results for parabolic changes demonstrate the superiority of the dynamic Poisson adaptive EWMA over adaptive EWMA with for only counts with means not too low. We recommend the adaptive EWMA with when counts are low (e.g., average means less than 2).
If the in-control mean changes dramatically from season to season, then the adaptive CUSUM of pseudo residuals plan threshold adjust would need revising from season to season. This aspect makes the implementation of the adaptive CUSUM of pseudo residuals plan more difficult.
Most of the effort has been placed on sequential hypothesis testing, but the performance of plans is also influenced by recursive estimation of model parameters. More effort in modelling would improve the performance of the Poisson adaptive EWMA plan. There is also value in monitoring model lack of fit for each moving window of data (see Zeileis [24, 25] and Zeileis and Kleiber [26]).
Acknowledgments
The authors are grateful to Dr. Petra Graham for writing R programs for fitting the models in the application. In addition, the authors thank the referees for their valuable comments that have improved the presentation of this paper.