Abstract

Multivariate analysis is increasingly used to cover all dimensions of the quality concept, in light of the rapid development of customer requirements. With recent advances in information technology and in data recording, large amounts of multivariate data now need to be analyzed. Many charting procedures are based on the Mahalanobis distance, but their applicability relies heavily on the requirement of normality and their performance depends on the choice of a type I error rate. An alternative charting scheme based on data depth is pursued, and its performance is assessed through a real example. This performance and that of a $T^2$ chart for individual observations are discussed. Using the centre-outward ranking, this new method, named the DD-diagram, is used to detect any multivariate quality datum that has at least one component exceeding its limiting variation interval. For a given error-free sample, the DD-diagram can be used to single out any point of another observed sample taken from a multivariate quality process. This new scheme based on data depth uses a properly chosen limiting variation line in order to evaluate the outlyingness of every point in the observed sample in all directions of the $p$ variates considered in the quality process.

1. Introduction

Control charts are standard tools used to monitor a quality process and to identify instability within the manufacturing process. In practice, the quality of a product is determined by the interaction of multiple correlated characteristics; it is a multivariate phenomenon by nature. Adequate techniques are therefore needed to monitor the multivariate quality process.

The multivariate Shewhart control chart was first introduced in 1947. It is based on the $T^2$ test statistic and is known as Hotelling’s $T^2$ chart. Since then, a number of multivariate control charts have been designed to suit different situations, such as the multivariate CUSUM and multivariate EWMA charts. These classical monitoring charts have been developed under a number of assumptions quoted by [1].

The performance of the multivariate control charts relies heavily on the hypothesis that the underlying distribution of the quality process is multivariate normal. It is well known that in practice this hypothesis rarely holds, so alternative procedures are needed to overcome this limitation. Based on the notion of data depth, [2] refined a visual procedure, named the DD-diagram, which uses a data depth plot to monitor multivariate quality data and does not require any assumption about the underlying distribution of the process. This graphical method visualizes a change in position and/or in scale of an empirical sample with respect to a reference one. Furthermore, this diagram permits easier interpretation and rapid corrective action in the multivariate quality process through newly suggested control limits that detect any out-of-control signal.

In this paper, an application of both the $T^2$ control chart and the DD-diagram is investigated using individual observations taken from a real quality process in Tunisian industry. The samples are collected at two different times of the production process. The reference sample measures are drawn from a production period during which the process is considered in control, whereas the empirical sample measures are drawn later in the framework of a quality control routine. The $T^2$ chart and the DD-diagram are presented in Sections 2 and 3, respectively. In Section 4 these monitoring techniques are applied. The results and concluding remarks are provided in Section 5.

2. The $T^2$ Control Chart

Let $X_t$ be the quality of an item at time period $t$. If this level of quality is characterized by $p$ quality characteristics, then $X_t = (X_{t1}, \dots, X_{tp})'$, or simply $X$, is a vector-valued output at time $t$. The observations in the sample $\{X_1, \dots, X_m\}$ are assumed to be independent and identically distributed multivariate normal random vectors with mean vector $\mu$ and covariance matrix $\Sigma$.

Under these assumptions, the $T^2$ control chart is a suitable monitoring technique for the above vector-valued output series, since the statistic

$$T^2 = (X_t - \mu)' \Sigma^{-1} (X_t - \mu) \tag{2}$$

has a chi-square distribution with $p$ degrees of freedom. The statistic $T^2$ is the (squared) Mahalanobis distance of the vector $X_t$ from the mean $\mu$.

When the mean vector $\mu$ and the covariance matrix $\Sigma$ are known, the $T^2$ control chart for the $X_t$ series is constructed using the statistic in (2). As stated by [3], its lower control limit (LCL) and upper control limit (UCL) in (3) are specified by percentiles of the chi-square distribution with $p$ degrees of freedom. In reality, however, the parameters $\mu$ and $\Sigma$ are rarely known, so they have to be estimated from a base period of observations when the process is in control. If the base period sample, referred to as the reference sample of size $m$, is denoted by $\{X_1, \dots, X_m\}$, then the mean vector is estimated by

$$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i \tag{4}$$

and the covariance matrix is estimated by

$$S = \frac{1}{m-1} \sum_{i=1}^{m} (X_i - \bar{X})(X_i - \bar{X})'. \tag{5}$$

Replacing $\mu$ and $\Sigma$ by their respective estimators in (2), the empirical version of the statistic is

$$T^2 = (X_t - \bar{X})' S^{-1} (X_t - \bar{X}). \tag{6}$$

When the $t$th observation $X_t$ is independent of both $\bar{X}$ and $S$, then according to [3] the statistic in (6) is exactly distributed as $\frac{p(m+1)(m-1)}{m(m-p)}$ times an $F$ variable with $p$ and $m-p$ degrees of freedom, and the control limits (7) of the $T^2$ control chart are the correspondingly scaled percentiles of that $F$ distribution. The process is declared out of control whenever the $T^2$ value of an observation lies outside the band defined by (7).
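The estimation steps (4) to (6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy bivariate sample are hypothetical, and only the standard library is used. The control limit itself requires an $F$ percentile (from tables or a statistics library), so only the standard scaling factor for a new observation independent of the estimates is computed here.

```python
def mean_vector(sample):
    """Sample mean vector, as in (4)."""
    m = len(sample)
    p = len(sample[0])
    return [sum(row[j] for row in sample) / m for j in range(p)]


def covariance_matrix(sample, mean):
    """Sample covariance matrix with divisor m - 1, as in (5)."""
    m = len(sample)
    p = len(mean)
    return [
        [sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in sample) / (m - 1)
         for b in range(p)]
        for a in range(p)
    ]


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def t_squared(x, mean, cov):
    """T^2 of observation x: (x - mean)' cov^{-1} (x - mean), as in (6)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))


def t2_scale_factor(p, m):
    """Factor multiplying the F(p, m - p) percentile in the control limit."""
    return p * (m + 1) * (m - 1) / (m * (m - p))


# Toy bivariate reference sample (illustrative values only, not the paper's data).
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
x_bar = mean_vector(reference)
s = covariance_matrix(reference, x_bar)
print(t_squared((3.0, 1.0), x_bar, s))
print(t2_scale_factor(5, 60))
```

Solving the linear system instead of inverting the covariance matrix explicitly keeps the sketch short and avoids forming $S^{-1}$; with a numerical library available, its own solver would be the natural replacement.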

3. The DD-Diagram

Many charting procedures are based on the Mahalanobis distance, but their applicability relies heavily on the requirement of normality and their performance depends on the choice of a type I error rate. An alternative charting scheme based on data depth is pursued. For a given error-free sample, this new method, named the DD-diagram, can be used to single out any point of another observed sample taken from a multivariate quality process. This new scheme based on data depth uses a properly chosen limiting variation line in order to evaluate the outlyingness of every point in the observed sample in all directions of the $p$ variates considered in the quality process.

Let $F$ be a probability distribution in $\mathbb{R}^p$, $p \ge 1$. Throughout the following, unless stated otherwise, we assume that $F$ is absolutely continuous and that the reference sample $\{X_1, \dots, X_m\}$ is derived from $F$. Therefore, if the quality of the $i$th observed unit is denoted by $X_i$, then $X_i \sim F$.

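According to [4], a data depth is a way of measuring how deep or central a given point $x \in \mathbb{R}^p$ is with respect to $F$ or to a given data cloud $\{X_1, \dots, X_m\}$. Using (2), the Mahalanobis depth at $x$ with respect to $F$ is defined to be

$$MD_F(x) = \left[1 + (x - \mu_F)' \Sigma_F^{-1} (x - \mu_F)\right]^{-1}, \tag{8}$$

where $\mu_F$ and $\Sigma_F$ are the mean vector and the covariance matrix of $F$. As in [4], the sample version of (8) is obtained by replacing $\mu_F$ and $\Sigma_F$ with their respective sample estimates in (4) and (5):

$$MD_{F_m}(x) = \left[1 + (x - \bar{X})' S^{-1} (x - \bar{X})\right]^{-1}. \tag{9}$$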

Henceforth, $D_F(\cdot)$ or $D_{F_m}(\cdot)$ will be used to denote the data depth, and a larger value of $D_F(x)$ always implies a deeper (or more central) $x$ with respect to $F$.
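As a small illustration of the Mahalanobis depth, the following Python sketch evaluates one over one plus the Mahalanobis quadratic form. It assumes the mean vector and the inverse covariance matrix have already been obtained; the function name and the numerical values are hypothetical and chosen only so the arithmetic can be checked by hand.

```python
def mahalanobis_depth(x, mean, cov_inverse):
    """Depth 1 / (1 + d' A d), with d = x - mean and A the inverse covariance."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    p = len(d)
    quad = sum(d[a] * cov_inverse[a][b] * d[b] for a in range(p) for b in range(p))
    return 1.0 / (1.0 + quad)


# Illustrative bivariate values: mean (1, 1) and covariance diag(4/3, 4/3),
# whose inverse is diag(0.75, 0.75).
mean = [1.0, 1.0]
cov_inverse = [[0.75, 0.0], [0.0, 0.75]]

print(mahalanobis_depth((1.0, 1.0), mean, cov_inverse))  # the mean has depth 1.0
print(mahalanobis_depth((3.0, 1.0), mean, cov_inverse))  # 1 / (1 + 3) = 0.25
```

The depth equals 1 at the mean and decreases toward 0 as the point moves away from it, which is the centre-outward behaviour described above.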

Given a notion of data depth, one can compute the depths of all quality measures and order them according to decreasing depth values. This gives a ranking in which $X_{[j]}$ denotes the sample point associated with the $j$th highest depth value. Consequently, $X_{[1]}, \dots, X_{[m]}$ are the order statistics, with $X_{[1]}$ being the deepest or most central point, or simply the centre, and $X_{[m]}$ the most outlying point. The implication is that a larger rank is associated with a more outlying position with respect to the data cloud. These order statistics induced by a data depth differ from the usual order statistics on the real line: the latter are ordered from the smallest sample point to the largest, while the former start from the middle sample point and move outwards in all directions, see [4].
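The depth-induced ordering described above can be sketched in a few lines of Python. The function names and the depth values are hypothetical; any depth notion could supply the input list.

```python
def center_outward_order(depths):
    """Indices of the sample points sorted from deepest to most outlying."""
    return sorted(range(len(depths)), key=lambda i: depths[i], reverse=True)


def depth_ranks(depths):
    """Rank of each point: 1 for the deepest, len(depths) for the most outlying."""
    ranks = [0] * len(depths)
    for rank, index in enumerate(center_outward_order(depths), start=1):
        ranks[index] = rank
    return ranks


depths = [0.20, 0.90, 0.50, 0.05]    # hypothetical depth values
print(center_outward_order(depths))  # [1, 2, 0, 3]
print(depth_ranks(depths))           # [3, 1, 2, 4]
```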

Given definitions (8) or (9), the sample becomes $X_{[1]}, \dots, X_{[m]}$, and there is a natural choice of location parameter for the observed distribution. Specifically, the centre is the most central point, so

$$\text{centre} = X_{[1]}. \tag{10}$$

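When the depth-equivalence class contains more than one point, according to [4] the median or the centre is the average of the deepest points, so in this case, with $k$ points sharing the highest depth,

$$\text{centre} = \frac{1}{k} \sum_{j=1}^{k} X_{[j]}. \tag{11}$$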
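The choice of centre in (10) and (11) can be illustrated with a short Python sketch. The function name, the tolerance `tol` used to decide when two depths are tied, and the data are assumptions made for the example only.

```python
def depth_centre(points, depths, tol=1e-12):
    """Average of all points whose depth is within tol of the maximum depth."""
    top = max(depths)
    deepest = [pt for pt, d in zip(points, depths) if top - d <= tol]
    p = len(points[0])
    return [sum(pt[j] for pt in deepest) / len(deepest) for j in range(p)]


points = [(0.0, 0.0), (2.0, 2.0), (4.0, 0.0)]  # hypothetical data
print(depth_centre(points, [0.3, 0.8, 0.1]))    # single deepest point: [2.0, 2.0]
print(depth_centre(points, [0.8, 0.8, 0.1]))    # tie: average of the first two, [1.0, 1.0]
```

With a single deepest point the function returns that point, as in (10); with ties it returns their average, as in (11).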

On this basis and using data depth, (10) and (11) determine a centre, or multivariate median. Moreover, [4] stated that if the Mahalanobis depth is used, the central point defined in (10) and (11) turns out to be the mean of the observed data. This suggests concepts of location which are intermediate between the mean and the median.

A data depth plot is a graphical comparison between two multivariate distributions based on data depth. In addition to the reference sample $\{X_1, \dots, X_m\}$ from $F$, let $\{Y_1, \dots, Y_n\}$ be another sample, referred to as the empirical sample, from the distribution $G$ characterizing the observed quality process. The reference sample is taken when the production process is in control, and the empirical sample is drawn when inspecting the process during a later period. The DD-diagram is obtained by plotting the data depth values with respect to one sample against those with respect to the other. Precisely, the DD-diagram is defined by

$$DD(F, G) = \{(D_F(x), D_G(x)) : x \in \mathbb{R}^p\}. \tag{12}$$

Since $DD(F, G)$ is a subset of $\mathbb{R}^2$, the resulting graph can be drawn in the plane whatever the dimension $p$; for $p = 1$ it is a one-dimensional curve. If the two distributions are identical, the DD-diagram in (12) turns out to be the segment of the diagonal line from the point $(0, 0)$ to the point $(\max_x D_F(x), \max_x D_F(x))$. Different patterns of deviation from the diagonal line in the DD-diagram indicate differences in specific characteristics of $F$ and $G$.

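In general, the distributions are rarely known, so instead we use an empirical version of the DD-diagram. If $F$ and $G$ are the unknown distributions of the samples $\{X_1, \dots, X_m\}$ and $\{Y_1, \dots, Y_n\}$, then the DD-diagram is obtained by plotting

$$DD(F_m, G_n) = \{(D_{F_m}(x), D_{G_n}(x)) : x \in \{X_1, \dots, X_m\} \cup \{Y_1, \dots, Y_n\}\}, \tag{13}$$

where (9) is used to compute the data depth.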
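The empirical DD-diagram can be sketched as follows in Python. This is a hedged illustration restricted to the bivariate case so that the covariance inverse has a closed form; the function names and both toy samples are hypothetical. Each pooled point is mapped to the pair formed by its sample Mahalanobis depth with respect to the reference sample and with respect to the empirical sample.

```python
def mahalanobis_depth_2d(x, sample):
    """Sample Mahalanobis depth of x with respect to a bivariate sample."""
    m = len(sample)
    mx = sum(pt[0] for pt in sample) / m
    my = sum(pt[1] for pt in sample) / m
    sxx = sum((pt[0] - mx) ** 2 for pt in sample) / (m - 1)
    syy = sum((pt[1] - my) ** 2 for pt in sample) / (m - 1)
    sxy = sum((pt[0] - mx) * (pt[1] - my) for pt in sample) / (m - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form d' S^{-1} d using the closed-form 2x2 inverse.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + quad)


def dd_points(reference, empirical, depth=mahalanobis_depth_2d):
    """Pairs (depth w.r.t. reference, depth w.r.t. empirical) for the pooled points."""
    pooled = list(reference) + list(empirical)
    return [(depth(x, reference), depth(x, empirical)) for x in pooled]


# Toy samples: the empirical one is shifted and more spread out.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
empirical = [(1.0, 1.0), (5.0, 1.0), (1.0, 5.0), (5.0, 5.0)]

for d_ref, d_emp in dd_points(reference, empirical):
    print(round(d_ref, 3), round(d_emp, 3))
```

Plotting the first coordinate against the second gives the DD-diagram; calling `dd_points(reference, reference)` puts every point on the diagonal, which matches the identical-distribution case.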

If $p \ge 2$ and if $F$ and $G$ are both absolutely continuous, then the DD-diagram corresponds to a region with nonzero area. The area of this region can serve as a measure of the discrepancy between $F$ and $G$, see [4]. If the two distributions are identical, the data cloud of the DD-plot should be concentrated along the diagonal line. Other patterns indicate differences in specific characteristics of $F$ and $G$, that is, in position, in scale, in skewness, and so forth.

In most cases, the departure from the diagonal line takes the form of the points being pulled down from the upper right end of the diagonal toward the origin, leaving the upper right corner empty and spreading the points out as a scatter plot. In order to bring out scale differences, the centres of the samples should be equalized first by subtracting the centre obtained from (10) or (11) from the data points. Suppose that the empirical sample is more spread out than the reference sample; then the points in the DD-diagram tend to arch toward the sample around the origin.

In a manner analogous to the multivariate Shewhart chart, [2] suggested a pair of control limits in order to detect visually the shifts in location and/or in dispersion. The in-control region is located between these two limits, and it turns out to be very large. To make the DD-diagram more sensitive, it is best to include the centre defined in (10) or (11) when detecting any out-of-control shift, so we suggest using instead the control limit defined in (14) as the least critical value of data depth, below which the corresponding point is considered to have components that do not satisfy their variation intervals.

This control limit is inversely proportional to the sum of the depth of the reference sample centre and the function of the modified degrees of freedom of the empirical sample, minus the depth of the closest point to the centre, 1, all multiplied by a factor involving $p$, the number of variables considered to affect quality. The limit is thus a function of the $p$ variates that characterize the quality of a product, the sample length, and the data depth of the deepest point (or centre) of the reference distribution $F$. It appears to be sensitive to any change in the values of these three parameters. The role of this limiting value line is to decide graphically whether the production process is in control. Therefore, if the depth of a point computed using (12) or (13) is lower than this limit, the observed process is out of control and an investigation must be made to identify which of the $p$ characteristics is responsible.
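The decision rule itself is simple to express in code. In the Python sketch below the limit is passed in as a plain number, because the formula (14) is not reproduced here; the function name, the depth values, and the limit 0.1 are all hypothetical.

```python
def flag_out_of_control(depths, limit):
    """1-based positions of observations whose depth falls below the limit."""
    return [i for i, d in enumerate(depths, start=1) if d < limit]


# Hypothetical depths of empirical observations with respect to the reference
# sample, and a hypothetical limiting value (not computed from (14)).
empirical_depths = [0.60, 0.05, 0.30, 0.01]
print(flag_out_of_control(empirical_depths, limit=0.1))  # [2, 4]
```

Each flagged position would then be examined to identify which quality characteristic lies outside its variation interval.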

4. Application of the $T^2$ Chart and the DD-Diagram

The basic idea is to collect individual observations from a production process during a period in which the process is considered in control. These observations are used to identify the parameters of the distribution $F$ of the observed process. Then, another series of individual observations is drawn from the same process when its distribution has drifted to $G$. Both series of observations are used to construct the monitoring schemes, the $T^2$ chart and the DD-diagram, and to discuss their performance.

4.1. Quality Processing of the Firm

Kairouan Tobacco Manufacture (KTM) and the National Tobacco and Matches Corporation hold the tobacco monopoly in Tunisia. This study focuses on the quality of cigarettes manufactured by KTM. Aiming to improve quality, to satisfy consumers, and to comply with legislative standards, both companies have equipped their laboratories with equipment that enables rigorous quality control during all phases of the manufacturing process. KTM produces several types of cigarettes, but in this application only the quality data of “Cristal light” cigarettes are considered. The quality control activities cover all steps of the production process, from the supervision of the manufacturing of tobacco sheets to the conditioning of the cigarettes in packages. In the tobacco industry, quality is defined by numerous requirements imposed by the State. These requirements are taken into account when setting up the cigarette specifications. Hence, tolerance limits are defined for each quality characteristic of the cigarettes. KTM uses five (5) characteristics to ensure the quality of cigarettes: weight, module, humidity rate of tobacco, pulling resistance, and folding capacity. These characteristics are carefully measured to comply with the State norms.

(i) The weight of a cigarette is made up of the weights of the tobacco, the filter, and the cigarette paper. It varies between 0.965 and 1 gram.
(ii) The module of a cigarette corresponds to its diameter, varying from 6.75 to 8.0 mm.
(iii) The humidity rate of tobacco is the proportion of water contained in a cigarette. It is considered acceptable if it varies between 11.5% and 13.5%.
(iv) The pulling resistance of a cigarette is defined by the difference in pressure between the two extremities of a cigarette when a quantity of air is passed through it. It is considered acceptable when it varies from 100 to 115 CE (colonne d’eau).
(v) The folding capacity, or density, corresponds to the volume occupied by the mass of the tobacco inside a cigarette. It is tolerated within a specified range, measured in cm3.

4.2. Collecting and Processing the Data

As introduced earlier, the data are collected at two different periods of time. The reference sample of 60 measures is drawn on February 1, 2004, when the process is in control, whereas the empirical sample, also of 60 measures, is drawn on September 30, 2009. These collected samples are given in Table 2.

Like any multivariate quality process, the observation involves the simultaneous measurement of five variables as indicated: (1) the weight, (2) the module, (3) the humidity rate, (4) the pulling resistance, and (5) the density.

Processing the observed data begins with the start-up stage, which consists of estimating the parameters of $F$, constructing the control limits of the $T^2$ chart, and determining the centre of $F$ from the reference sample.

The sample mean vector is computed according to (4) and the sample covariance matrix according to (5).

To construct the multivariate $T^2$ control chart, the control limits are determined according to (7) for the chosen type I error rate.

Figure 1 shows the corresponding $T^2$ chart. The observations were examined individually to determine a possible assignable cause, and no observation was found to lie outside the in-control region specified by the above control limits.

To determine the centre, the data depths of all observations of the reference sample are calculated using (9). As recorded in Table 2, the highest data depth in the reference sample of size $m = 60$ corresponds to observation 45, so this observation is the most central point; it defines a depth-equivalence class of order one containing a single cigarette, with rank 45.

In order to detect graphically any point that does not satisfy the limiting variation intervals, the limiting variation value for the minimum data depth is calculated according to (14). It is the least acceptable data depth value, below which the corresponding point is considered out of control, that is, at least one of the $p$ characteristics exceeds its limiting variation interval.

When constructing the DD-diagram in Figure 2, no restriction is placed on the distribution of the data generated by the production process. To obtain both subplots of Figure 2, (13) is used with the depth with respect to the reference sample in place of the depth with respect to an empirical sample, so that the reference sample is employed on both the $x$-axis and the $y$-axis. This is done twice, before and after centring the measures with respect to the identified centre of the reference distribution.

If the observed measures are centred with respect to the centre of $F$, this most central point is still characterized by the highest depth (or by the smallest Mahalanobis distance). Thus, in the reference sample, the point of order 45 has the maximum data depth both before and after centering with respect to the computed vector-valued centre. This reflects the fact that the Mahalanobis depth is affine invariant, so it is used in both the DD-diagram and the DD-diagram of centred measures, see [4]. This deepest point in the reference sample is marked by a circle in both subplots of Figure 2.

The second stage consists of using both control schemes to evaluate the stability of the observed production process when an empirical sample is drawn. At this phase, the parameters of the reference sample obtained in the start-up stage are used to monitor any empirical sample taken in the future. Specifically, after drawing the empirical sample given in Table 2, the reference mean vector and covariance matrix are used to compute the $T^2$ charting statistic and to construct the $T^2$ chart.

Figure 3 shows the corresponding multivariate control chart. Observations 19, 33, 50, 52, and 54 of the empirical sample lie outside the in-control region. These points are examined thoroughly to determine which characteristics are causing this drift in quality. The examination indicates that cigarettes 19 and 54 are out of control because the module (variable 2) exceeds its specified measure. The other observations, 33, 50, and 52, are out of control because the humidity rate (variable 3) is lower than its minimum value.

Henceforth, the vector-valued centre and the limiting variation value are used to evaluate the data depths of all measures making up the empirical sample and also to identify any change in location and/or in scale of the process.

Figure 4 shows the corresponding DD-diagram both before and after centering the measures. Both subplots (left and right) of Figure 4 show any change in location and/or scale increase when moving from the distribution $F$ to the distribution $G$ in the multivariate quality process.

Before centering the measures, the DD-diagram gives a larger set of out-of-control observations than the $T^2$ chart at its chosen type I error rate: not only the points 19, 33, 50, 52, and 54, but also 2, 3, 27, 29, 39, and 43 of the empirical sample. These points have data depths lower than the computed limiting variation value because of a change in location and/or in scale. They are indicated with red stars in the left subplot of Figure 4. To determine the characteristics responsible for this drift, refer to Table 1.

For measures of both samples centered with respect to the centre of $F$, the DD-diagram in the right subplot of Figure 4 shows the observations under the effect of a scale change only. It can be deduced that observations 3 and 50 are out of control because of a change in location only: when the measures are centered, these two observations disappear from the out-of-control region in the right subplot of Figure 4. Observations 2, 19, 27, 29, 33, 39, 43, 52, and 54, however, are out of control under the effect of a change in dispersion as well as a location shift. These points are indicated with red stars in the right subplot of Figure 4.

5. Results and Concluding Remarks

The DD-diagram is a graphical comparison that exhibits location shifts and/or scale increases when moving from the distribution of the reference sample to the distribution of the empirical one. Using this diagram requires no assumption about the nature of the distribution of the observed multivariate quality process. Although this procedure resembles a nonparametric method, the DD-diagram does not require large samples: a sample size beyond 30 suffices to ensure reasonable performance, and the performance of the DD-diagram improves as the sample size grows.

The above application suggests that the DD-diagram performs better than the $T^2$ chart, because its out-of-control signal does not depend on a type I error rate as that of the $T^2$ control chart does. Prior to using the DD-diagram to monitor an observed process, the reference sample is tested for being in control by using it twice, as both the reference sample and the empirical one, on the $x$-axis and the $y$-axis, respectively. In the above application, the DD-diagram detects 11 points whose components exceed their specified limits, whereas the $T^2$ control chart flags only 5 points at the type I error rate used. To detect the same points as the DD-diagram, the $T^2$ control chart would need a larger type I error rate.

When a multivariate quality process changes its distribution from $F$ to $G$ and the location shift is eliminated, that is, when the measures are centered with respect to the centre or deepest point of $F$, the DD-diagram makes it possible to distinguish between the out-of-control observations that drifted because of both location shifts and scale increase and those that drifted under the effect of variations in dispersion only. This is not feasible with the $T^2$ chart.

In general, consider the test of a null hypothesis asserting stability of the production process against an alternative asserting shifts in location and/or in scale. Under the alternative, the empirical sample has higher dispersion than the reference one, as deduced from the fact that the resulting clouds, whether or not the measures are centered, are located under the limiting variation line. The DD-diagram thus makes it possible to display and detect graphically any out-of-control observation whose components exceed their specified limits. In addition, the DD-diagram sends an out-of-control signal when the outlyingness of a point exceeds a specified value in all directions.