Abstract

Monitoring a process over time with a control chart allows quick detection of unusual states. In Phase I, historical process data, assumed to come from an in-control process, are used to construct the control limits. In Phase II, the process is monitored on an ongoing basis using the control limits from Phase I. Observations falling outside the control limits, or unusual patterns of observations, signal that the process has shifted from its in-control settings. Such signals trigger a search for an assignable cause and, if the cause is found, corrective action is implemented to prevent its recurrence. The purpose of this paper is to introduce a new methodology for constructing a robust control chart when nonnormal or contaminated data may arise in Phase I. Through extensive Monte Carlo simulations, we examine the behavior and performance of the proposed MM robust control chart when there is a shift in the process mean.

1. Introduction

Statistical process control (SPC) concepts and methods have become very important in the manufacturing and process industries. Their goal is to monitor the performance of a process over time in order to judge whether or not the process remains in a “state of statistical control.” This state of control is said to occur if certain process or product variables remain close to their desired values and the only source of variation is “common-cause” variation, that is, variation which affects the process all the time and is essentially unavoidable within the current process. Shewhart charts are used to monitor key product variables in order to detect the occurrence of any event having a “special” or “assignable” cause. Once assignable causes are discovered, long-term improvements in the process and in product quality can be achieved by eliminating the causes or by improving the process or its operating procedures.

Detecting one or more change points in a batch of observations has attracted substantial investigation in the statistical, engineering, and econometric literature. Given an ordered sequence of observations, usually, but not necessarily, taken at equally spaced times, there is a change point between two successive observations if their statistical distributions are different. Between change points, the distributions are usually considered to be identical. In practice, knowing when a process has changed simplifies the search for the special cause. If the time of the change can be identified, process engineers have a smaller search window within which to look for the special cause. Consequently, the special cause can be determined more quickly, and the actions needed to improve quality can be carried out sooner. In this paper, we analyze the efficiency of a change point estimator for the process mean [1] once the Shewhart $\bar{X}$, Median, or proposed MM control chart issues a signal. The derivation of the change point estimator, shown in the appendix, is due to Hinkley [2], who also discussed the asymptotic properties of the estimator. Whenever the Shewhart $\bar{X}$, Median, or proposed MM chart signals that a special cause is present, the estimator provides practitioners with a useful estimate of the time of the process change. In Section 2, we introduce a model for a step change in the location of a process. A step change in the process mean occurs when the mean suddenly changes its value and then remains at the new value until corrective action has been taken. On the basis of this step-change model, we adopt the estimator of the time of the process change when the corresponding chart signals. In Section 5, we analyze the performance of each chart by means of Monte Carlo simulation.

2. Process Step-Change Model

Suppose that the process is initially in control, with observations coming from a normal distribution with a known mean of $\mu_0$ and a known standard deviation of $\sigma_0$. After an unknown point in time $\tau$ (known as the process change point), the process location changes from $\mu_0$ to $\mu_1 = \mu_0 + \delta\sigma_0/\sqrt{n}$, where $n$ is the subgroup size and $\delta$ is the unknown magnitude of the change. We also assume that once this step change in the process location occurs, the process remains at the new level $\mu_1$ until the special cause has been identified and removed. We let $\bar{X}_T$ be the first subgroup average to exceed a control limit and assume that this signal is not a false alarm. Hence, $\bar{X}_1, \dots, \bar{X}_\tau$ are the subgroup averages that come from the in-control process, while $\bar{X}_{\tau+1}, \dots, \bar{X}_T$ are from the changed process.
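The step-change model can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code: it assumes 3-sigma Shewhart limits with known $\mu_0$ and $\sigma_0$, it simply ignores any false alarm before the change point, and the function name `simulate_until_signal` and its parameters are hypothetical.

```python
import numpy as np

def simulate_until_signal(tau, delta, n, mu0=0.0, sigma0=1.0, rng=None,
                          max_subgroups=100_000):
    """Generate subgroup averages under a step change after subgroup tau.

    Returns the averages X_bar_1, ..., X_bar_T, where T is the first subgroup
    after tau whose average falls outside the (assumed) 3-sigma limits.
    """
    rng = np.random.default_rng() if rng is None else rng
    se = sigma0 / np.sqrt(n)              # standard error of a subgroup average
    lcl, ucl = mu0 - 3 * se, mu0 + 3 * se
    mu1 = mu0 + delta * se                # shifted mean: mu0 + delta*sigma0/sqrt(n)
    xbars = []
    for i in range(1, max_subgroups + 1):
        mean = mu0 if i <= tau else mu1
        xbar = rng.normal(mean, se)
        xbars.append(xbar)
        # Simplification: exceedances at or before tau (false alarms) are ignored.
        if i > tau and not (lcl <= xbar <= ucl):
            return np.array(xbars)
    raise RuntimeError("no signal within max_subgroups")

xbars = simulate_until_signal(tau=100, delta=3.0, n=5, rng=np.random.default_rng(1))
print(len(xbars))  # signal time T; always larger than tau = 100
```

The returned array ends at the signalling subgroup, so its length is the signal time $T$ in the notation above.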

3. Definitions

To illustrate, we concentrate on robust estimates for the simple location-scale model. Let $x_1, \dots, x_n$ be observations on the real line satisfying $x_i = \mu + \sigma u_i$, $i = 1, \dots, n$, where the $u_i$ are independent and identically distributed with variance equal to 1. We are interested in estimating $\mu$; the scale $\sigma$ is a nuisance parameter. We consider the M-location estimates proposed by Huber [3], who defined $\hat{\mu}$ as the solution of an estimating equation of the form $\sum_{i=1}^{n} \psi\left((x_i - \hat{\mu})/\hat{\sigma}\right) = 0$, where $\hat{\sigma}$ is a robust estimate of the residual scale and $\psi$ is a bounded, nondecreasing, and odd real function. We focus on a $\psi$ function that is a continuous and differentiable influence function, given in (3.2) (see [4]), with a user-chosen tuning constant $c$; see [5] for other choices of smooth functions $\psi$.

The scale estimate $\hat{\sigma}$ in (3.2) is an S-estimate of scale (see [6]), which is defined as follows. Let $\rho$ be a bounded, continuous, and even function, and let $b$ be a positive constant. The S-scale is defined by $\hat{\sigma} = \inf_t s(t)$, where, for each $t$, $s(t)$ is the solution of $\frac{1}{n}\sum_{i=1}^{n} \rho\left((x_i - t)/s(t)\right) = b$. Associated with this family are the S-location estimates, $\tilde{\mu} = \arg\inf_t s(t)$. Beaton and Tukey [7] proposed a family of $\rho$ functions indexed by a positive tuning constant $d$. According to Yohai [8], the M-location estimates obtained with an S-scale estimate are called MM-location estimates. Specifically, the MM-location estimate $\hat{\mu}$, the S-location estimate $\tilde{\mu}$, and the S-scale estimate $\hat{\sigma}$ jointly solve a system of three estimating equations. With suitable choices of the tuning constant for $\rho$ in (3.7), of the constant $b$ in (3.5), and of the tuning constant for $\psi$ in (3.3), this yields a location estimate with 50% breakdown point and 95% efficiency when the errors have a normal distribution.
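To make the two-stage idea concrete, the following is a minimal sketch of an MM-type location estimate, not the authors' implementation. It assumes the Tukey bisquare family in its normalized form, $\rho_d(u) = 3(u/d)^2 - 3(u/d)^4 + (u/d)^6$ for $|u| \le d$ and $1$ otherwise, with the commonly quoted constants $d = 1.547$, $b = 0.5$ (50% breakdown) and $c = 4.685$ (95% normal efficiency); the constants used in the paper may differ. As a further simplification, the S-scale is computed around the sample median instead of being minimized over all candidate locations. All function names are hypothetical.

```python
import numpy as np

def rho_bisquare(u, d):
    """Tukey bisquare rho, normalised so that rho = 1 for |u| >= d."""
    v = np.clip(np.abs(u) / d, 0.0, 1.0)
    return 3 * v**2 - 3 * v**4 + v**6

def s_scale(res, d=1.547, b=0.5, tol=1e-10, max_iter=200):
    """Solve mean(rho(res / s)) = b for s by fixed-point iteration."""
    s = np.median(np.abs(res)) / 0.6745      # MAD-type starting value (assumed > 0)
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(res / s, d)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

def mm_location(x, c=4.685, d=1.547, b=0.5, tol=1e-10, max_iter=200):
    """M-location with bisquare psi, using a preliminary S-type scale."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                        # simplification: median as initial location
    sigma = s_scale(x - mu, d, b)
    for _ in range(max_iter):
        u = (x - mu) / (c * sigma)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # bisquare weights psi(u)/u
        mu_new = np.sum(w * x) / np.sum(w)   # iteratively reweighted mean
        if abs(mu_new - mu) <= tol * sigma:
            break
        mu = mu_new
    return mu_new, sigma

x = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 25.0])   # one gross outlier
mu_hat, sigma_hat = mm_location(x)
print(mu_hat, np.mean(x))   # the MM-type estimate stays near 10; the mean is 12.5
```

The outlier receives zero weight in the M-step because its standardized residual exceeds the tuning constant, which is the mechanism behind the robustness of the location estimate.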

3.1. Sample Median

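The sample median was used in early process control charts because it is insensitive to behavior in the tails of the distribution. However, under the normal distribution, the efficiency of the sample median drops off rapidly towards its asymptotic value of 0.64 as the sample size increases. For a random sample of $n$ observations $x_1, \dots, x_n$ with order statistics $x_{(1)} \le \dots \le x_{(n)}$, the sample median, denoted by MD, is defined as follows: $\mathrm{MD} = x_{((n+1)/2)}$ if $n$ is odd, and $\mathrm{MD} = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$ if $n$ is even.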

The appeal of the sample median, MD, is that it is easy to determine, requires only the middle values to calculate, can be used when a distribution is skewed, is not affected by outliers, and has the maximal breakdown point of 50 percent. Moreover, its gross-error sensitivity is low and, as the sample size increases, the variance of the MD decreases at rate $1/n$, whereas the maximum bias does not change. Hence, bias is the property of importance for large sample sizes, and MD is the estimator that possesses the smallest maximum bias for a given proportion of contamination $\varepsilon$. Huber [3] showed that it minimizes the maximum asymptotic bias over contamination neighborhoods. On the other hand, the disadvantages of the sample median are that it is difficult to handle in mathematical equations, it does not use all available values, and it can be misleading for long-tailed distributions because it may discard useful information (see [9, 10]). Nevertheless, the sample median has become a good general-purpose estimator and is generally considered an alternative to the sample mean, especially when outliers may be present.

3.2. Median Absolute Deviation from the Sample Median

The median absolute deviation from the sample median, denoted as MAD, is a more robust scale estimator than the standard deviation. The MAD was first introduced by Hampel [11], who attributed it to Gauss. It is simple and easy to compute and is mainly used for detecting outliers in data. The estimate is often used as an initial value for the computation of more efficient robust estimators. Let $x_1, \dots, x_n$ denote a random sample of $n$ observations with sample median MD; then $\mathrm{MAD} = \operatorname{median}_i |x_i - \mathrm{MD}|$. The MAD possesses the following properties: (i) it has the maximal breakdown point of 50 percent, which is twice that of the interquartile range (IQR); (ii) in the case of the standard normal distribution, $N(0, 1)$, the influence function of the MAD estimator is a step function that takes on two values. This is bounded by the sharpest possible bound among all scale estimators. With regard to the optimality properties of the MAD, Martin and Zamar [12] established expressions for the maximum asymptotic bias of M-estimates of scale over a contamination neighborhood as a function of the fraction of contamination, and showed that similarly strong results hold in terms of maximum asymptotic bias for the MAD as for the MD.
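The robustness of the median and the MAD can be seen in a small numerical example. This is an illustrative sketch only: the data values are made up, the helper `median_and_mad` is hypothetical, and the MAD is left unscaled (no consistency factor is applied).

```python
import numpy as np

def median_and_mad(x):
    """Return the sample median MD and the unscaled MAD = median(|x_i - MD|)."""
    x = np.asarray(x, dtype=float)
    md = np.median(x)
    mad = np.median(np.abs(x - md))
    return md, mad

clean = np.array([9.8, 10.1, 10.0, 9.9, 10.2])
dirty = np.array([9.8, 10.1, 10.0, 9.9, 100.0])   # last value replaced by an outlier

md_clean, mad_clean = median_and_mad(clean)
md_dirty, mad_dirty = median_and_mad(dirty)

# The median and MAD are essentially unchanged by the outlier,
# while the sample mean and standard deviation are not.
print(md_clean, mad_clean, md_dirty, mad_dirty)
print(clean.mean(), dirty.mean(), clean.std(ddof=1), dirty.std(ddof=1))
```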

3.3. Control Limits

In this paper, we consider the general equations for constructing control limits (see [13]). With a robust location estimator and the corresponding scale estimator, the control limits are given by (3.10). The constant in (3.10) is determined in such a way that the rescaled scale estimator is an unbiased estimator of the scale parameter. The most commonly used control charts are Shewhart $\bar{X}$ charts using the sample range. For charts using the sample standard deviation, the location estimator in (3.10) is the sample mean and the scale estimator is the sample standard deviation.

Shewhart modified these control limits using rational subgroups (see [14]), in which $m$ rational subgroups, each of size $n$, are taken. According to Shewhart's suggestion, these subgroups are formed so that the between-group variability is maximized while the within-group variation is minimized. In this view, the location and scale are estimated from the subgroup mean and the subgroup standard deviation, respectively. Since each of these estimates is an unbiased estimate of the corresponding parameter, control limits can be computed for each rational subgroup.

In practice, the control limits are the average of the control limits for the subgroups. In the case of the $\bar{X}$ chart, for which the sample range is used to estimate the scale, the scale is estimated with the average range $\bar{R}$, computed by averaging the subgroup ranges $R_1, \dots, R_m$. The control limits are then $\bar{\bar{X}} \pm 3\bar{R}/(d_2\sqrt{n})$, where $\bar{\bar{X}}$ is the average of the subgroup means and $d_2$ is the constant that makes $\bar{R}/d_2$ unbiased for the standard deviation under normality.

For constructing the control charts under a normal distribution using the robust estimators, we determine the appropriate constant for each estimator through computer simulation. To illustrate, samples of size $n$ were drawn from the $N(0, 1)$ distribution, and the constant was computed by averaging the scale estimate over 100,000 repetitions. For a given scale estimator, the average over the 100,000 repetitions estimates the expected value of that estimator when the true scale is 1, for any given $n$. Table 1 exhibits the resulting constants. It can be seen that if the scale estimator is the sample range, the simulated values agree closely with the standard tabled values. Table 1 also gives the corresponding constants for the Median and the proposed MM charts.
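The simulation of such constants can be sketched as follows. This is an assumption-laden illustration and does not reproduce Table 1: the subgroup size $n = 5$ is chosen only for the example, the MAD is left unscaled, and the function name `simulate_scale_constants` is hypothetical. For the range, the simulated value can be compared with the standard tabled $d_2 = 2.326$ for $n = 5$.

```python
import numpy as np

def simulate_scale_constants(n, reps=100_000, seed=0):
    """Estimate E[range] and E[MAD] for N(0, 1) samples of size n by simulation."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))                  # reps samples of size n
    ranges = x.max(axis=1) - x.min(axis=1)              # sample range per row
    med = np.median(x, axis=1, keepdims=True)
    mads = np.median(np.abs(x - med), axis=1)           # unscaled MAD per row
    return {"range": ranges.mean(), "mad": mads.mean()}

constants = simulate_scale_constants(n=5)
print(constants)   # the "range" entry should be close to the tabled d2 = 2.326
```

Dividing a scale estimate by its simulated constant gives an estimator that is approximately unbiased for the standard deviation under normality, which is how such a constant would enter the control limits.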

In the same way, location and scale estimators are defined for the Median-MAD chart and for the proposed MM chart. The three pairs of estimators under investigation in this study are as follows:

$\bar{X}$ chart: location = sample mean; scale = range,

Median chart: location = median; scale = MAD,

MM chart: location = MM-location estimate; scale = S-scale estimate.

3.4. Confidence Regions (Confidence Set)

One of the benchmarks for assessing the performance of a control chart is the confidence region it yields for the time of the process change. A confidence region on the change point gives practitioners useful starting points for searching their process log books and records for the special cause. It provides a “search window” for the special cause and aids quicker identification. Practitioners can then act on the special cause sooner, improving quality and reducing process downtime.

Basically, we will incorporate the likelihood function to obtain a confidence region for the process change point. The confidence region approaches in the statistics literature involve the likelihood function that relies on asymptotic theory (see [15]). In process monitoring, there are relatively small time intervals between the process change point and the time of the control chart signal. Thus, approximations based on asymptotic theory may not be appropriate.

Box and Cox [16] proposed a method involving the log likelihood function for constructing a possibly noncontiguous confidence region (also called a confidence set) on a parameter. Their approach can be used to build a confidence set (CS) for the process change point using the log likelihood function, having the form $\mathrm{CS} = \{t : \ln L(t) > \ln L(\hat{\tau}) - D\}$. Here, $\ln L(\hat{\tau})$ is the maximum value of the log likelihood function; $\hat{\tau}$ is the MLE of $\tau$ (i.e., the value of $t$ that maximizes the log likelihood function); $\ln L(t)$ is the value of the log likelihood function at $t$; and $D$ is a threshold constant. The log likelihood $\ln L(t)$ can in turn be expressed through constants determined by the subgroup averages.

Box and Cox [16] proposed using a threshold of $\frac{1}{2}\chi^2_{1,1-\alpha}$ to obtain a $100(1-\alpha)\%$ confidence region. Siegmund [17] used asymptotic theory to develop a confidence set for the change point of a normal process mean based on the log likelihood function, and proposed a different threshold value.

By means of Monte Carlo simulation, we study the confidence sets obtained with a nominal confidence coefficient of 0.90. Box and Cox [16] and Siegmund [17] give different threshold values for a 90% confidence set. It was observed that the value suggested by Box and Cox [16] provides 90% coverage for a magnitude of change $\delta$ between 2.0 and 3.0, while the value of Siegmund [17] provides at least 90% coverage for certain values of $\delta$. By trial and error, the threshold value 2.97 was found to provide at least 90% coverage for certain values of $\delta$.
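A likelihood-based confidence set of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes independent normal subgroup averages with known in-control mean `mu0` and standard deviation `sigma0`, uses the profile log likelihood $\ln L(t) = \text{const} + n(T-t)(\bar{X}_{T,t}-\mu_0)^2/(2\sigma_0^2)$ implied by the step-change model, and treats the threshold (here 2.97, the value quoted in the text) as an input. The data and the function name are hypothetical.

```python
import numpy as np

def change_point_confidence_set(xbar, mu0, sigma0, n, threshold):
    """Return (tau_hat, members) for CS = {t : lnL(t) > lnL(tau_hat) - threshold}.

    t = 0, ..., T-1 is the candidate index of the last in-control subgroup
    (subgroups are numbered 1..T, so t = 0 means the change preceded subgroup 1).
    """
    xbar = np.asarray(xbar, dtype=float)
    T = xbar.size
    counts = T - np.arange(T)                        # T - t for t = 0..T-1
    rev_avg = np.cumsum(xbar[::-1])[::-1] / counts   # average of subgroups t+1..T
    # Profile log likelihood up to an additive constant that does not depend on t.
    loglik = n * counts * (rev_avg - mu0) ** 2 / (2 * sigma0**2)
    tau_hat = int(np.argmax(loglik))
    members = np.flatnonzero(loglik > loglik[tau_hat] - threshold)
    return tau_hat, members

# Made-up subgroup averages: the level visibly rises after the fourth subgroup.
xbar = [0.1, -0.2, 0.05, -0.1, 1.9, 2.1, 2.4]
tau_hat, cs = change_point_confidence_set(xbar, mu0=0.0, sigma0=1.0, n=1, threshold=2.97)
print(tau_hat, cs.tolist())   # 4 [2, 3, 4, 5]
```

A larger threshold widens the set and raises coverage at the cost of a longer search window, which is the trade-off the threshold values discussed above are balancing.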

4. Methodology

To analyze the performance of the control charts, we consider the Shewhart $\bar{X}$, Median, and proposed MM control charts. When a control chart signals that a process change has occurred, the change point estimator (see the appendix) is applied to the data to estimate the time of the change: we find the value of $t$ in the range $0 \le t < T$ which maximizes $C_t = (T - t)(\bar{X}_{T,t} - \mu_0)^2$. Here the reverse cumulative average, $\bar{X}_{T,t} = \frac{1}{T-t}\sum_{i=t+1}^{T}\bar{X}_i$, is the overall average of the $T - t$ most recent subgroups, and the value of $t$ maximizing $C_t$ is our estimator $\hat{\tau}$ of the last subgroup from the in-control process.
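The estimator described above amounts to a single pass over reverse cumulative averages. The following minimal sketch assumes a known in-control mean `mu0`; the data are made up and the function name `change_point_estimate` is hypothetical.

```python
import numpy as np

def change_point_estimate(xbar, mu0):
    """Return (tau_hat, C) with C[t] = (T - t) * (revavg_t - mu0)**2, t = 0..T-1.

    revavg_t is the average of subgroups t+1, ..., T (1-based numbering),
    so tau_hat estimates the index of the last in-control subgroup.
    """
    xbar = np.asarray(xbar, dtype=float)
    T = xbar.size
    counts = T - np.arange(T)                        # number of subgroups averaged
    rev_avg = np.cumsum(xbar[::-1])[::-1] / counts   # reverse cumulative averages
    C = counts * (rev_avg - mu0) ** 2
    return int(np.argmax(C)), C

xbar = [0.1, -0.2, 0.05, -0.1, 1.9, 2.1, 2.4]   # the chart signals at subgroup T = 7
tau_hat, C = change_point_estimate(xbar, mu0=0.0)
print(tau_hat)   # 4: subgroups 1-4 are judged in control, 5-7 shifted
```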

5. Simulation Study

We now analyze the performance of the change point estimator under the three control charts through a simulation study. We assume that the process is initially in control, with observations coming from a normal distribution with a known mean of $\mu_0$ and a known standard deviation of $\sigma_0$. After an unknown point in time $\tau$ (known as the process change point), the process location changes from $\mu_0$ to $\mu_1$, where $\delta$ is the unknown magnitude of the change. We also assume that once this step change in the process location occurs, the process remains at the new level $\mu_1$ until the special cause has been identified and removed.

To illustrate, the data are generated under different distributional settings. In addition to the normal distribution, two alternative distributional forms are considered: the contaminated model (Case 2 of (5.2)) and the Slash distribution. Under each type of distribution, for each run, data consisting of subgroups of fixed size are used to construct the control limits, and summary statistics are calculated. To assess the performance of the corresponding chart, observations for subgroups 1 to 100 are generated from the standard normal distribution. Then, starting from subgroup 101, observations are randomly generated from a normal distribution with a shifted mean and standard deviation 1 until each of the Shewhart $\bar{X}$, Median, and proposed MM control charts produces a signal. The procedure was repeated a total of 10,000 times for each of the magnitudes $\delta$ studied, namely 1.0, 2.0, and 3.0. For each simulation run, the change point estimate $\hat{\tau}$ was computed. Subsequently, the average of the estimates of $\tau$ over the 10,000 simulation runs was computed along with its standard error, the expected length, and the coverage probability.
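For readers unfamiliar with the Slash distribution, the sketch below generates Slash variates using the standard representation as the ratio of a standard normal variate to an independent Uniform(0, 1) variate, and compares an extreme quantile with that of the normal distribution. The sample size and seed are arbitrary choices for illustration, and `slash_sample` is a hypothetical helper.

```python
import numpy as np

def slash_sample(size, rng):
    """Draw Slash variates as Z / U with Z ~ N(0, 1) and U ~ Uniform(0, 1]."""
    z = rng.standard_normal(size)
    u = 1.0 - rng.random(size)      # values in (0, 1], which avoids division by zero
    return z / u

rng = np.random.default_rng(0)
slash = slash_sample(100_000, rng)
normal = rng.standard_normal(100_000)

# The extreme quantiles of |X| are far larger for the Slash sample than for the normal one.
q_slash = np.quantile(np.abs(slash), 0.999)
q_normal = np.quantile(np.abs(normal), 0.999)
print(q_slash, q_normal)
```

Such heavy tails inflate the sample mean and range, which is why control limits based on them behave poorly in this setting.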

For analyzing the outlier model, we modified the contaminated model of Davis and Adams [18]. Specifically, the in-control conditions with contaminated data values are determined by a specified frequency of contaminated data and a random number generated from a Uniform(0, 1) distribution. A Uniform(0, 1) random value is generated for each observation in each sample, and an indicator function, defined from that value and the contamination frequency, marks whether the observation is contaminated. A random observation in the simulated data is described by (5.2), which also shows the different changes for each of the process states. For the purpose of comparison, several frequencies of contamination are considered in order to create disturbances in the data.

Case 1. In-control (no contaminated data): , , , .

Case 2. In-control (contaminated data): , , , .

For the computation of the sizes of the confidence sets obtained with a specific threshold value, for each control chart and magnitude of change studied, a step change in the normal process mean was simulated following subgroup 100. The confidence set estimator was applied following a signal from the corresponding control chart. The size of the confidence set was recorded, as well as whether the confidence set covered the true process change point of 100. This procedure was repeated for a total of 10,000 simulation runs for each of the magnitudes considered. The proportion of the 10,000 runs that covered the true process change point was also determined. This gives the estimates of the coverage probabilities obtained by setting the threshold value to 2.97, the value found by trial and error in Section 3.4. The results are tabulated along with the average sizes of the confidence sets. For a given coverage probability, a smaller confidence set is preferred, so that process engineers can focus their search for the special cause more narrowly. In general, the coverage probability is expected to increase as the magnitude of the shift increases.

From Tables 2, 3, 4, and 5, we can see that the performances of the three charts in terms of coverage probability are quite similar, especially when the process data come from a normal distribution. Generally, the Median chart and the proposed MM chart perform better in most of the cases (particularly for a larger proportion of contamination). For instance, in one of the contaminated settings, the coverage probability of the $\bar{X}$ chart is 0.588, while the coverage probabilities of the Median chart and the proposed MM chart are 0.626 and 0.632, respectively.

In Tables 2 to 5, averages of the change point estimates are also tabulated for various sizes of change in the process mean, together with the corresponding standard error estimates, for the normal setting. As the actual change point for the simulation was at time 100, the average estimated time of the process change should be close to 100. With the $\bar{X}$ chart, across the three standardized magnitudes of change studied, the average estimated time of the process change was 100.00, 99.60, and 99.47. Hence, on average, the estimate of the time of the process change is close to the actual time of the change, regardless of the magnitude of the change.

By the same token, with the Median chart, the average estimated time of the process change across the three standardized magnitudes studied was 99.70, 99.49, and 99.57, all close to the actual change point of 100.

Lastly, when the process is monitored with our proposed robust MM chart, the estimate of the time of the process change is again close to the actual time of the change, regardless of the magnitude of the change: across the three standardized magnitudes studied, the average estimated time of the process change was 99.90, 99.42, and 99.59. Overall, for all types of charts under study, the change point estimator locates the change considerably close to the actual time of the change, irrespective of the magnitude of the change.

Another benchmark for evaluating a control chart is the expected length of the signal, that is, the expected time at which the control chart signals a change in the process mean that occurred at time 100. It is generally recognized that a Shewhart control chart might signal a change in the process mean a considerable amount of time after the change actually occurred. Thus, estimating the time of the process change by the time at which the control chart issues a signal would lead to a biased, and probably misleading, estimate of the time of the process change. This bias is due to the potentially large delay in generating a signal from the control chart. Hence, the criterion for evaluating the performance of the control chart is how quickly the chart signals (expected length). Tables 2 to 5 and Figures 1 to 4 summarize the performance in terms of expected length for the three charts.

In the normal setting, for a step change in the process mean of magnitude $\delta = 1$, Figure 1 shows that the expected length for the Shewhart $\bar{X}$ chart is 57. This suggests that, when the shift is small, the $\bar{X}$ chart is better than the Median chart and the proposed MM chart, which need 157 and 98, respectively. The situation improves as the magnitude of the shift increases to 2 or 3, where all three charts perform quite similarly.

On the other hand, the Shewhart $\bar{X}$ chart is inferior under the outlier model. According to Table 4, in the contaminated setting reported there, both the Median chart and the proposed MM chart appear to be better than the Shewhart $\bar{X}$ chart for the different magnitudes of shift. A similar situation arises when a very heavy-tailed distribution (the Slash distribution) is considered. Again, the Median chart and the proposed MM chart outperform the Shewhart $\bar{X}$ chart with respect to the expected length. It can be observed from Table 5 that the differences are quite apparent: when the magnitude of the shift is 1, the Shewhart $\bar{X}$ chart has an expected length of 24, whereas the Median chart and the proposed MM chart both need about 3 subgroups before the first signal.

We now turn to the observed frequency with which the estimates of the time of the step change were within $\pm m$ observations of the actual time of the change, for increasing values of $m$. The results are tabulated in Tables 6, 7, 8, 9, 10, 11, 12, 13, and 14. This provides an indication of the precision of the estimator under the three different charts. The proportion of the 10,000 runs in which the estimated time of the change was within $\pm m$ of the actual change is expected to increase as $m$ increases. Referring to Tables 6 to 14, we observe that the precision increases with $m$ for each magnitude of change. Let us first consider the normal setting, for the magnitude of step change discussed here. Monitoring the process with the traditional Shewhart $\bar{X}$ chart identified the change point correctly in 60.49% of the trials. The estimate was within one observation of the actual change point in 83.23% of the trials, and within two observations in 91.39% of the trials. With the Median chart (Table 7), the change point was identified correctly in 59.59% of the trials; the estimate was within one observation of the actual change point in 83.46% of the trials, and within two observations in 91.66% of the trials. Based on Table 8, our proposed MM chart located the change point exactly in 59.61% of the trials; the estimate was within one observation of the actual change point in 82.88% of the trials, and within two observations in 91.25% of the trials. The three charts are therefore comparable and perform almost equally in this setting.
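The precision measure described above is simply the proportion of runs whose estimate falls within a given distance of the true change point. A minimal sketch follows; the estimates in the example are made up for illustration and are not taken from the tables, and `precision_table` is a hypothetical helper.

```python
import numpy as np

def precision_table(tau_hats, tau, max_m=2):
    """Proportion of estimates with |tau_hat - tau| <= m, for m = 0, ..., max_m."""
    err = np.abs(np.asarray(tau_hats) - tau)
    return {m: float(np.mean(err <= m)) for m in range(max_m + 1)}

# Hypothetical change point estimates from eight simulation runs, true tau = 100.
tau_hats = [100, 99, 101, 97, 100, 104, 100, 98]
table = precision_table(tau_hats, tau=100)
print(table)   # {0: 0.375, 1: 0.625, 2: 0.75}
```

By construction the proportions are nondecreasing in $m$, which matches the pattern expected in the tables.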

Next, we consider the outlier model setting. For the magnitude of step change considered there, the Shewhart $\bar{X}$ chart estimator identified the time of the change exactly in just 23.65% of the trials and was within one (two) observations of the time of the actual process change in 45.03% (56.69%) of the trials. For the Median chart, 24.99% of the 10,000 simulation trials identified the change point exactly, and the change point was estimated to be within ±1 and ±2 of the actual time of the process change in 46.92% and 58.11% of the trials, respectively. For the proposed MM chart, 25.25% of the simulation trials identified the change point correctly; in 47.53% of the trials the estimate was within ±1 observation, and in 58.61% of the trials it was within ±2 observations. Overall, the procedure using the proposed MM chart performs slightly better than the other two charts in this respect.

Finally, we consider nonnormal data without outliers, where the Median chart and the proposed MM chart generally lead to narrower control limits than the traditional Shewhart $\bar{X}$ chart. Here, we limit ourselves to the Slash distribution (see [13]), which is a very heavy-tailed distribution. For the magnitude of step change considered, monitoring the process with the traditional $\bar{X}$ chart identified the change point correctly in 5.31% of the trials. The estimate was within one observation of the actual change point in 14.44% of the trials, and within two observations in 20.64% of the trials. For the Median chart (Table 13), the change point was detected correctly in 5.45% of the trials; the estimate was within one observation of the actual change point in 12.08% of the trials, and within two observations in 17.95% of the trials. Based on Table 14, our proposed MM chart located the change point exactly in 5.49% of the trials; the estimate was within one observation of the actual change point in 12.28% of the trials, and within two observations in 18.02% of the trials. On the whole, we would say that the Median and the proposed MM charts perform better and more consistently than the $\bar{X}$ chart in this setting.

6. Conclusions

Control charts are used to detect whether or not a process has changed. When a control chart signals that a process has changed, practitioners must initiate a search for the special cause. However, given a signal from a control chart, practitioners generally do not know what caused the process to change or when the change occurred. Identifying the time of the process change would simplify the search for the special cause. If practitioners knew when the process changed, the search would be reduced to discovering what aspect of the process changed at that time. As a result, practitioners would increase their chances of identifying the special cause correctly and quickly. This, in turn, allows them to take the appropriate actions immediately to improve quality.

In this paper, monitoring processes in the presence of data contamination and under nonnormal settings is of primary concern. We have applied an estimator that is useful for identifying the change point of a step change in the process mean under normal, nonnormal, and contaminated data. We have discussed the performance of the change point estimator and other criteria when the process is monitored by the Shewhart $\bar{X}$, the Median, and the proposed MM control charts. The results show that the proposed MM robust control chart consistently performed well in a range of situations. It provides a useful and much better alternative to using the time of the signal from the conventional control chart. Although the proposed MM chart and the Median chart are comparable under nonnormal and contaminated settings, the Median chart becomes worse in the normal setting. The proposed MM control chart has good properties in terms of expected length and coverage probability for contaminated data and for data from heavy-tailed distributions, for moderate sample sizes. The proposed MM chart compares favorably with the traditional Shewhart control chart in the normal setting, especially when the magnitudes of shift are 2 and 3. It is interesting to note that the proposed robust MM chart is more efficient than the Shewhart chart when the process distribution is heavy-tailed.

The philosophy of the proposed robust MM control chart is in keeping with the desire to provide robust limits when the distribution is nonnormal or when outliers may arise in data collection. The proposed robust MM charting methodology gives better performance than the traditional Shewhart chart if the underlying chance-cause distribution is nonnormal or contaminated. This is a desirable feature for any control chart to be applied in industry.

Appendix

In this appendix, we show the derivation of the maximum likelihood estimator (MLE) of $\tau$, the process location change point. For maximum-likelihood estimation techniques, see [1]. Denote the MLE of the change point by $\hat{\tau}$ and let the subgroup averages be $\bar{X}_1, \dots, \bar{X}_T$. The MLE of $\tau$ is the value of $\tau$ that maximizes the likelihood function or, equivalently, its logarithm, which is given in (A.1). There are two unknowns in the log-likelihood function: $\tau$ and $\mu_1$. If the change point $\tau$ were known, the MLE of $\mu_1$ would be $\bar{X}_{T,\tau}$, the average of the $T - \tau$ most recent subgroup averages. Substituting this back into (A.1), we obtain a log-likelihood that depends on the change point only. It is easy to verify that maximizing it is equivalent to maximizing $C_t = (T - t)(\bar{X}_{T,t} - \mu_0)^2$. It follows that the value of $\tau$ that maximizes the log-likelihood function is $\hat{\tau} = \arg\max_{0 \le t < T} C_t$, that is, $\hat{\tau}$ is the value of $t$ in the range $0 \le t < T$ which maximizes $C_t$.
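The steps of this derivation can be written out explicitly. The following is a sketch under the step-change model of Section 2, assuming independent normal subgroup averages with variance $\sigma_0^2/n$; $K$ denotes a constant that does not depend on $\tau$ or $\mu_1$, and the notation is an assumption where the original displays are not available.

```latex
\begin{align*}
\ln L(\tau, \mu_1) &= K - \frac{n}{2\sigma_0^2}
  \left[ \sum_{i=1}^{\tau} (\bar{X}_i - \mu_0)^2
       + \sum_{i=\tau+1}^{T} (\bar{X}_i - \mu_1)^2 \right], \\
\hat{\mu}_1(\tau) &= \bar{X}_{T,\tau}
  = \frac{1}{T-\tau} \sum_{i=\tau+1}^{T} \bar{X}_i, \\
\ln L\bigl(\tau, \hat{\mu}_1(\tau)\bigr) &= K
  - \frac{n}{2\sigma_0^2} \sum_{i=1}^{T} (\bar{X}_i - \mu_0)^2
  + \frac{n}{2\sigma_0^2} (T-\tau) (\bar{X}_{T,\tau} - \mu_0)^2, \\
\hat{\tau} &= \arg\max_{0 \le t < T} \; (T-t) (\bar{X}_{T,t} - \mu_0)^2 .
\end{align*}
```

The third line uses the identity $\sum_{i=\tau+1}^{T} (\bar{X}_i - \mu_0)^2 = \sum_{i=\tau+1}^{T} (\bar{X}_i - \bar{X}_{T,\tau})^2 + (T-\tau)(\bar{X}_{T,\tau} - \mu_0)^2$; since the first two terms of that line do not depend on $\tau$, only the last term matters for the maximization.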

Acknowledgment

The authors wish to thank the anonymous referee(s) for very constructive comments and suggestions that improved the paper.