Abstract

Recently, an alternative robust control chart based on a new robust estimator, the minimum vector variance (MVV) estimator, was introduced for Phase II monitoring. The resulting Hotelling $T^2$ chart was able to detect out-of-control signals and simultaneously control the false alarm rate even as the dimension increased. However, its estimated upper control limits (UCLs) are large compared with those of the traditional chart. In this study, we improve the MVV estimators in terms of consistency and bias. The results show a great improvement in the control limit values while the chart maintains its good performance in terms of false alarm rate and probability of detection.

1. Introduction

The Hotelling $T^2$ statistic was the first statistic known to be used in a multivariate control chart, and the resulting chart is referred to as the Hotelling $T^2$ control chart. The statistic measures the significance of the shift from the nominal mean vector to the out-of-control mean vector, under the assumption that the covariance matrix remains constant. The purpose of the control chart is to monitor the stability of a multivariate process in Phase I and Phase II. Analysis in Phase I seeks to identify a stable historical data set (HDS). From this data set, the in-control mean vector and the in-control variance-covariance matrix are estimated for later use in the Phase II analysis. Successful process monitoring in Phase II depends entirely on the parameter estimates obtained from a stable HDS. However, the classical estimators are easily affected by an unstable process, that is, by multivariate outliers. The existence of outliers can violate the normality assumption. This violation may inflate the control limits and reduce the probability of detection in Phase I, which in turn distorts the false alarm level and reduces the power to detect changes in Phase II [1]. The false alarm rate is the probability of an out-of-control signal when the process is in fact in control. Its value becomes large if the process is unstable due to an increase in variability. An inflated false alarm rate can lead to unnecessary process adjustments and loss of confidence in the control chart as a monitoring tool [2]. Hence, a method that can hold the false alarm rate at the desired (nominal) level is necessary.

However, the traditional Hotelling $T^2$ control chart is only effective in eliminating extreme outliers in small samples, and it fails to detect moderate outliers, particularly when the number of variables is large [3–5]. To overcome this problem, alternative estimation methods have been proposed in the literature. One approach is to calculate the $T^2$ statistic from the successive differences variance-covariance matrix estimator [6–9]. Although this approach is effective in detecting shifts in the mean vector, it fails to detect other outliers, as shown in Vargas [3]. Another approach is to use robust estimators in place of the classical estimators (the sample mean vector and the sample covariance matrix). Robust estimators are known to be more effective than the classical estimators in detecting deviations in the data, or outliers [10]. A wide range of robust estimators of multivariate location and scatter is available; see [11, 12] for a review. Among them, the minimum covariance determinant (MCD) estimator is particularly attractive because it has good theoretical properties: affine equivariance, a high breakdown value, a bounded influence function, and a better convergence rate [13, 14].

Studies on the role of MCD estimators in the construction of robust Hotelling $T^2$ charts are easily found in the literature. Vargas [3] and Jensen et al. [4] introduced robust control charts based on the MCD estimator for multivariate individual observations. They identified and removed the outliers in the Phase I analysis using the robust estimator and then calculated the classical estimators from the remaining good data points for the Phase II analysis. They noticed some drawbacks when MCD was used in Phase I. The Hotelling $T^2$ based on MCD needed a larger sample size when a large number of outliers was suspected, to ensure that the MCD estimator did not break down and lose its ability, especially when monitoring a larger number of quality characteristics. To abate these problems, Chenouri et al. [5] proposed a robust Hotelling $T^2$ chart based on the reweighted MCD (RMCD) estimator. Besides possessing the good properties of the MCD estimator, the RMCD estimator is not unduly influenced by outliers and is more efficient than MCD. Their approach differed from that of Vargas [3] and Jensen et al. [4] in that they used the RMCD estimators directly in place of the classical estimators when constructing the Hotelling $T^2$ chart for Phase II data. Using the same approach as Chenouri et al. [5], Alfaro and Ortega [15] compared the performance of Hotelling $T^2$ control charts in Phase II based on the MCD, MVE, reweighted MCD, and trimmed estimators. Their findings showed a conflict, under certain conditions, between the percentage of outliers detected and the ability of the robust control charts to control the overall false alarm rate.

To alleviate this conflict, Yahaya et al. [16] introduced the MVV estimator into the Hotelling $T^2$ chart in Phase II. In general, the results showed that the MVV chart was able to increase the detection of out-of-control signals and simultaneously control false alarm rates even with a large number of quality characteristics. In contrast, the MCD charts performed well in detecting out-of-control signals but failed to control false alarm rates. The traditional chart was able to control false alarm rates but was not effective in detecting out-of-control signals. Despite the good performance of the MVV chart, the estimated UCLs of the Hotelling $T^2$ chart based on the MVV estimators were large compared with those of the traditional and MCD charts. Thus, this study attempts to improve the MVV estimators so as to achieve the desired UCLs by making the estimators consistent at the normal model. Since in practice we always deal with finite samples, the issue of bias in a finite sample is also considered. The advantage of an estimator that is unbiased for a finite sample is that it remains unbiased as the sample size becomes larger [17]. With respect to this issue, the paper also seeks to improve the performance of the MVV estimator by making it unbiased for finite samples.

The remaining part of this paper is organized as follows. The formal definition of the MVV estimator and the adjustment made to the scatter estimator to ensure that it is consistent and unbiased are discussed next, followed by an investigation of the improved estimator through a simulation study. The discussion continues with the computation of control limits for the traditional, MCD, and improved MVV Hotelling $T^2$ charts, where the improvement of the proposed chart is shown. A real data analysis from the aircraft industry is then presented to illustrate the applicability of the proposed chart, before the conclusion in the last section.

2. Minimum Vector Variance (MVV) Estimator

Herwindiati et al. [18] proved that the MVV estimators possess three major properties of a good robust estimator, that is, a high breakdown point, affine equivariance, and computational efficiency. The main tool used in the estimation of MVV is the Mahalanobis squared distance (MSD). Consider a data set of $n$ $p$-variate observations, and denote by $h$ the size of the subset used for estimation. The MVV estimators of location and scatter are determined from the set of $h$ data points whose scatter matrix has the minimum vector variance, that is, the minimum trace of the squared scatter matrix, among all possible sets of $h$ data points. To compute the MVV estimates, we used the algorithm proposed in Yahaya et al. [16]. The location and scatter estimators, referred to below as (1) and (2), are the mean vector and the scatter matrix of the observations in that set.
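
The subset criterion described above can be illustrated with a small sketch. This is not the algorithm of Yahaya et al. [16]; it is an exhaustive search that is only feasible for very small $n$, and the function names, the divisor used in the covariance, and the choice of subset size `h` are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def vector_variance(scatter):
    """Vector variance Tr(S^2) of a scatter matrix S."""
    return float(np.trace(scatter @ scatter))


def mvv_brute_force(data, h):
    """Search every h-subset and keep the one with minimum Tr(S^2).

    Hypothetical helper for illustration only; the cost grows
    combinatorially with the number of observations.
    """
    n = data.shape[0]
    best = None
    for idx in combinations(range(n), h):
        subset = data[list(idx)]
        # The divisor (h - 1) used by np.cov is an assumption.
        scatter = np.cov(subset, rowvar=False)
        vv = vector_variance(scatter)
        if best is None or vv < best[0]:
            best = (vv, idx, subset.mean(axis=0), scatter)
    _, idx, location, scatter = best
    return idx, location, scatter


# Eight clean bivariate points plus two gross outliers (rows 8 and 9).
rng = np.random.default_rng(0)
clean = rng.standard_normal((8, 2))
outliers = np.array([[50.0, 50.0], [60.0, -40.0]])
data = np.vstack([clean, outliers])

# Assumed subset size: floor((n + p + 1) / 2).
h = (data.shape[0] + data.shape[1] + 1) // 2
subset_idx, location, scatter = mvv_brute_force(data, h)
print("selected rows:", subset_idx)
```

Because a subset containing either outlier has a far larger vector variance than any subset of clean points, the selected rows exclude the outliers, which is the behaviour the robust estimator relies on.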

2.1. Consistency Factor

The aim of the Hotelling $T^2$ chart in Phase I is to estimate the in-control parameters of location and scatter. The usual estimators for these parameters are the normal maximum likelihood estimators (MLE). The estimation is based on a data set assumed to come from the multivariate normal distribution with density (3). However, the distribution in (3) is only an approximation because a portion of the data may be contaminated by outliers [19]. When outliers are present, the MLE, which are known to be sensitive to outliers, cannot estimate the parameters precisely. To address this problem, we propose replacing the MLE with the MVV estimators, that is, the robust estimators with the highest breakdown point (50%) proposed by Herwindiati [20]. We compute the MVV estimators on the Phase I data set, with the location and scatter estimators as defined in (1) and (2), respectively. The MVV estimator depends on a fixed integer $h$, the subset size, which is bounded as in (4).

The preferred choice of $h$ for outlier detection is its lower bound, which yields the highest breakdown value. The MVV location and scatter estimators are the mean and the scatter matrix calculated from the $h$ observations out of $n$ whose classical scatter matrix has the lowest vector variance, resulting from the smallest MSDs. The MVV scatter estimator is a positive definite, symmetric (PDS), and affine equivariant scatter matrix [20]. However, this estimator is not consistent under the normal model. A robust scatter estimator is typically calibrated to be consistent for the normal model. Known as Fisher consistency, this is a standard concept in robust statistics, which denotes that the functionals evaluated at the model distribution return the true parameter value [19]. To achieve consistency under the normal model, the scatter estimator in (2) is multiplied by a consistency factor, giving the estimator in (5). An approximation of the consistency factor can be obtained from elliptical truncation in the multivariate normal distribution based on squared distance; the resulting factor, given in (6), depends on a quantile of the $\chi^2$ distribution. This formula was derived by Butler et al. [13] and further discussed in Croux and Haesbroeck [14] based on the functional form of the MCD estimator. Since the MVV estimator has the same functional form as the MCD estimator, we use (6) as the consistency factor for the MVV scatter estimator. Although consistency under the normal distribution is guaranteed, Pison et al. [17] cautioned that MCD estimators are biased for small sample sizes. Thus, the consistency factor in (6) alone might not be sufficient to make the MVV estimator unbiased for small sample sizes. For that reason, we also compute a correction factor for any sample size $n$ and dimension $p$.
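
As a rough illustration of how such a consistency factor can be evaluated, the sketch below assumes the form commonly quoted for the MCD functional: with $\alpha = h/n$ and $q_\alpha$ the $\alpha$-quantile of the $\chi^2_p$ distribution, the factor is $\alpha / P(\chi^2_{p+2} \le q_\alpha)$. Whether this matches (6) term by term is an assumption, and the helper names are hypothetical. The $\chi^2$ distribution function is computed with the standard library only, using the power series of the incomplete gamma function.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series of the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total and k < 10000:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(prob, df):
    """Chi-square quantile by bisection on the CDF."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def consistency_factor(n, p, h):
    """Assumed MCD-type factor: alpha / P(chi2_{p+2} <= chi2_{p, alpha})."""
    alpha = h / n
    q_alpha = chi2_quantile(alpha, p)
    return alpha / chi2_cdf(q_alpha, p + 2)


# Example: half of the data retained in dimension p = 2.
factor = consistency_factor(n=100, p=2, h=50)
print(round(factor, 4))
```

The factor is larger than 1 because a scatter matrix computed from the most concentrated part of the data underestimates the true scatter, and it shrinks towards 1 as $h$ approaches $n$.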

2.2. Correction Factor

A simulation study on the effect of the correction factor on the MVV estimator is carried out for several sample sizes $n$ and dimensions $p$ = 2, 5, 10, 15, and 20. We generated data sets from the standard multivariate normal distribution. For each data set, we then determine the consistent scatter estimator in (5). If the estimator is unbiased, its expectation is the identity matrix, and therefore the $p$th root of its determinant equals 1. The mean of the $p$th root of the determinant is computed over the simulated data sets. To determine the correction factor, we performed simulations for different sample sizes $n$ and dimensions $p$, and defined the factor as in (7). The computed values are displayed in Table 1. Then, using (7) as the correction factor for the estimator in (5), we obtain the estimator in (8). Since the estimator in (8) can be considered consistent and unbiased, its determinant should approach 1.
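
The simulation can be sketched as follows. The sketch assumes, in the spirit of Pison et al. [17], that the correction factor is the reciprocal of the mean $p$th root of the determinant; whether this coincides exactly with (7) is an assumption. Since the MVV algorithm is not reproduced here, the normal MLE covariance (divisor $n$) serves as a stand-in estimator, and the function names and the number of replications are hypothetical.

```python
import numpy as np


def correction_factor(scatter_fn, n, p, n_sim=1000, seed=0):
    """Reciprocal of the mean p-th root of det(S) under N(0, I_p) data."""
    rng = np.random.default_rng(seed)
    roots = np.empty(n_sim)
    for i in range(n_sim):
        data = rng.standard_normal((n, p))
        scatter = scatter_fn(data)
        roots[i] = np.linalg.det(scatter) ** (1.0 / p)
    return 1.0 / roots.mean()


def ml_covariance(data):
    """Stand-in estimator: normal MLE covariance, biased downward."""
    return np.atleast_2d(np.cov(data, rowvar=False, bias=True))


factor = correction_factor(ml_covariance, n=30, p=2)
print("correction factor:", factor)
```

Multiplying the stand-in estimator by this factor pulls the average $p$th root of its determinant back to 1; the same routine would apply to the estimator in (5) once a function computing it is supplied as `scatter_fn`.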

3. Investigation through Simulation Experiment

Gather and Becker [21] emphasized that robust estimators used for outlier detection should converge at a sufficient rate to the true underlying model parameter, for both consistency and unbiasedness. A sequence of asymptotically unbiased estimators of a parameter is called consistent if it converges to that parameter as $n \to \infty$. To illustrate the consistency of the improved MVV scatter estimator at the multivariate normal model, data are randomly generated from the multivariate normal distribution. The experiment is carried out for increasing sample sizes until convergence, for a fixed moderate dimension $p$. Figure 1 shows the determinant of the improved scatter estimator against the sample size $n$. As $n$ increases, the determinant approaches 1, which implies that the estimator is consistent.

Next, the simulation experiment is extended to show that the improved MVV location and scatter estimators, which replace the MLE in the Hotelling $T^2$ statistic, are consistent and unbiased. Squared distances computed with any affine-equivariant robust location and scatter estimators that are consistent and unbiased under the normal model are asymptotically $\chi^2_p$ distributed [21]. Therefore, if the improved estimators are consistent and unbiased for the location and scatter parameters, then for i.i.d. $p$-variate normal observations the squared distance is asymptotically $\chi^2_p$ distributed. Since this squared distance has the same form as the Hotelling $T^2$ statistic, the improved Hotelling $T^2$ should follow the $\chi^2_p$ distribution as $n \to \infty$ if the estimators are consistent. If we consider a sample of $n$ observations on $p$ quality characteristics as a Phase I data set, the improved $T^2$ statistic can be constructed as in (9). To check the distribution of the improved $T^2$, we employed QQ plots and evaluated the goodness of fit on those plots based on the slope and the $R$-square of the fitted straight line, as shown in Table 2. The hypothetical distribution represents the data without error if all points lie on a straight line with slope equal to 1 and $R$-square also equal to 1 [22]. Random data were generated from the multivariate standard normal distribution.

The study is carried out for a sample size of $n$ = 10,000 with dimensions $p$ = 2, 5, 10, 15, and 20. From Table 2 we observe that the $R$-square values are 0.999 in all cases. With regard to the slopes, there is a considerable difference between the Hotelling $T^2$ with the original MVV estimators and the Hotelling $T^2$ with the improved MVV estimators, especially when $p$ = 2. The slopes for the improved statistic are stable and approximately equal to 1 regardless of the dimension $p$. In contrast, the slopes for the original statistic are far from 1, even though they decline towards 1 as $p$ increases. For the improved statistic, the values of the two measurements ($R$-square and slope) are very close to the ideal value, which signifies that the $\chi^2_p$ distribution fits the simulated values well. The result implies that the multiplicative factors fulfil their purpose of making the MVV estimators consistent and unbiased.

4. MVV Hotelling Control Chart

Consider a $p$-variate random sample of $n$ observations forming the preliminary data set in Phase I. Calculate the improved MVV location and scatter estimators from this sample. Since the estimators are known to be free from outliers owing to their estimation process, they can readily be used as in-control estimators in Phase II. Using these estimates, the $T^2$ statistic in (9) is computed for each Phase II observation.
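
The Phase II computation can be sketched as below, assuming the statistic in (9) has the usual quadratic form $(x - T)^{\top} S^{-1} (x - T)$ with $T$ and $S$ the Phase I location and scatter estimates. The function name and the example values, including the control limit, are hypothetical.

```python
import numpy as np


def t2_statistic(observations, location, scatter):
    """Quadratic-form T^2 for each row of `observations`."""
    obs = np.atleast_2d(np.asarray(observations, dtype=float))
    centered = obs - location
    # Solve S y = (x - T) instead of forming the inverse explicitly.
    solved = np.linalg.solve(scatter, centered.T)
    return np.einsum("ij,ji->i", centered, solved)


# Hypothetical Phase I estimates and two Phase II observations.
location = np.zeros(2)
scatter = np.eye(2)
phase2 = np.array([[3.0, 4.0], [0.5, -0.5]])
t2 = t2_statistic(phase2, location, scatter)

ucl = 11.0  # placeholder control limit, not a value from the paper
signals = t2 > ucl
print(t2, signals)
```

An observation is flagged as out of control when its $T^2$ value exceeds the UCL of the chart being used.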

4.1. Estimation of Control Limits

In this section, we present the control limits of the improved MVV control chart obtained from simulated data with different combinations of sample size $n$ and dimension $p$. These control limits are then compared with those of the original MVV chart, the robust Hotelling $T^2$ chart using MCD, and the traditional Hotelling $T^2$ chart. Using robust estimators in place of the mean and covariance structure in the traditional Phase II Hotelling $T^2$ statistic changes the distributional properties of the traditional chart [9]. To assess the performance of the robust control charts, we need to identify the distribution of each statistic in order to obtain appropriate control limits, that is, UCLs. Since the exact distributions of the robust statistics are unknown, we apply the Monte Carlo method to estimate their quantiles for several combinations of sample sizes and dimensions. To estimate the 95% quantile for a given Phase I sample size $n$ and dimension $p$, we generate 5000 samples of size $n$ from the standard multivariate normal distribution. For each data set of size $n$, we compute the location vector and the modified scatter matrix estimates. In addition, for each data set, we randomly generate a new observation from the same distribution, treat it as a Phase II observation, and calculate the corresponding $T^2$ value. The empirical distribution function is based on the 5000 simulated $T^2$ values. We sort these values in ascending order, and the UCL is their 95% quantile. The results are presented in Table 3. We observe that the estimated UCLs for the original MVV chart are large compared with those of the other control charts. However, after the scatter estimator is made consistent and unbiased as in (8), the results improve immensely: the UCLs of the improved chart are much closer to the traditional UCLs.
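
The Monte Carlo procedure above can be sketched as follows. Because the MVV algorithm itself is not reproduced here, the sketch plugs in the classical mean vector and covariance matrix as a stand-in; any function returning a location vector and a scatter matrix could be passed instead. The function names and the seed are assumptions.

```python
import numpy as np


def simulate_ucl(n, p, estimator, n_rep=5000, level=0.95, seed=0):
    """Estimate the UCL as an empirical quantile of simulated T^2 values."""
    rng = np.random.default_rng(seed)
    t2 = np.empty(n_rep)
    for k in range(n_rep):
        # Phase I sample from the standard multivariate normal.
        phase1 = rng.standard_normal((n, p))
        location, scatter = estimator(phase1)
        # One new in-control observation treated as Phase II.
        new_obs = rng.standard_normal(p)
        diff = new_obs - location
        t2[k] = diff @ np.linalg.solve(scatter, diff)
    return float(np.quantile(t2, level))


def classical_estimator(data):
    """Stand-in: sample mean vector and sample covariance matrix."""
    return data.mean(axis=0), np.cov(data, rowvar=False)


ucl = simulate_ucl(n=50, p=2, estimator=classical_estimator)
print("estimated 95% UCL:", ucl)
```

Replacing `classical_estimator` with a robust estimator changes only the plug-in step, which is why the same simulation can be used to compare the UCLs of the different charts on an equal footing.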

4.2. Real Data Analysis

The application of the improved method to real data is illustrated using data furnished by Asian Composites Manufacturing Sdn. Bhd. (ACM), which is involved in the production of advanced composite panels for the aircraft industry. ACM produces flat and contoured primary (Aileron Skins, Spoilers and Spars) and secondary (Flat Panels, Leading Edges and MISC: Components) structure composite bond assemblies and sub-assemblies for the aerospace industry. The company provided us with data on spoilers, as shown in Table 4. For the purpose of this study, the company furnished a sample of 47 spoilers measured on several features, namely, trim edge, trim edge spar, and drill hole. Of the total, 21 spoilers were collected in 2009 and the rest in 2010. Hence, we used the 2009 spoilers as Phase I historical data and treated the 2010 spoilers as future data. Estimates of the location vector and scatter matrix are presented in Table 5. In the last column of Table 5, we can clearly observe that the upper control limit (UCL) of the improved MVV chart is the closest to that of the traditional Hotelling $T^2$ chart, with values of 11.5513 and 11.035, respectively, whereas the other control charts produce large UCL values, especially the original MVV chart. Comparing the improved and original MVV charts, we observe a large disparity between the two UCLs: 41.298 for the original and 11.5513 for the improved chart. The result indicates a great improvement in the UCL from the original to the improved chart. Table 6 identifies the out-of-control observations (bold font) under the different statistics. Among the four statistics, the three robust statistics signal observations 20, 22, and 25 as out of control, whereas the traditional $T^2$ fails to signal observation 22 and flags only observations 20 and 25. This is expected given the low probability of detection of the traditional control chart [16]. For a clearer view of how the control charts perform in detecting out-of-control observations, the corresponding control charts are presented graphically in Figure 2.

5. Conclusion

The UCL of the Hotelling $T^2$ control chart based on the consistent and unbiased MVV estimators improved markedly over that of the chart based on the original MVV estimators. The improved control chart was also tested on real data. Although its detection performance was on par with that of the original MVV chart, the improved estimators successfully reduced the inflated UCL of the original chart to a value close to the UCL of the traditional Hotelling $T^2$ control chart. Moreover, when the improved chart was compared with the traditional chart, whose UCL is almost equal, the improved chart performed better in detecting out-of-control observations. Given their good properties and performance, the improved MVV estimators should be considered as alternatives to the usual mean vector and covariance matrix in the construction of robust Hotelling $T^2$ control charts as well as in other multivariate statistical procedures.