Research Article  Open Access
A One-Class Classification-Based Control Chart Using the K-Means Data Description Algorithm
Abstract
This paper aims to enlarge the family of one-class classification-based control charts, referred to as OC-charts, and to extend their applications. We propose a new OC-chart using the k-means data description (KMDD) algorithm, referred to as the KM-chart. The proposed KM-chart gives the minimum closed spherical boundary around the in-control process data. It measures the distance between the center of the KMDD-based sphere and each new incoming sample to be monitored. Any sample whose distance is greater than the radius of the KMDD-based sphere is considered an out-of-control sample. Phase I and II analysis of the KM-chart was evaluated through a real industrial application. In a comparative study based on the average run length (ARL) criterion, the KM-chart was compared with the kernel-distance-based control chart, referred to as the K-chart, and the k-nearest neighbor data description-based control chart, referred to as the KNN-chart. Results revealed that, in terms of ARL, the KM-chart performed better than the KNN-chart in detecting small shifts in the mean vector. Furthermore, the paper provides the MATLAB code for the KM-chart, developed by the authors.
1. Introduction
In recent years, several attempts have been made to integrate data mining with statistical process control (SPC) [1–6]. The objective is to overcome the limitations of traditional parametric control charts, especially the normality assumption, which may not hold for modern manufacturing systems. The most commonly applied data mining technique in SPC is one-class classification, and one-class classification methods have been widely used for process monitoring [7–12]. The principle of one-class classification consists in constructing a closed boundary, typically a sphere, which contains as much of the data as possible within a minimum volume. This boundary distinguishes in-control process data, also known as target data, from out-of-control process data. The shape and volume of the one-class boundary depend on the one-class classifier used, also known as the data description algorithm. The one-class approach was applied to develop a new family of control charts called one-class classification-based control charts, referred to as OC-charts.
Several types of one-class classifiers exist in the literature, but only two have so far been used to develop control charts: the support vector data description (SVDD) algorithm [13] and the k-nearest neighbor data description (KNNDD) algorithm [14]. Sun and Tsung [15] used SVDD to develop the kernel-distance-based multivariate control chart, also known as the K-chart, which is considered the first OC-chart to use support vector principles. When monitoring more than two variables, the K-chart uses kernel methods, which provide the advantage of dealing with high-dimensional data. During the last decade, the K-chart has received significant attention and has witnessed several improvements through many works [16–19].
The SVDD algorithm was also employed to develop other control charts, such as an SVDD-based multivariate cumulative sum control chart [20], to monitor batch processes [21] and nonlinear processes [22], and to perform industrial calibration [23]. Despite its successful application, the high computational cost of SVDD remains its main drawback: the algorithm loses its efficiency when the training set becomes large. To overcome this shortcoming, Sukchotrat et al. [24] proposed the use of the KNNDD algorithm to develop the KNN-chart. KNNDD is a simple and fast algorithm that performs better with high-dimensional data and does not consume much time during the training phase. Gani and Limam [25] compared the performance of the K-chart and the KNN-chart and demonstrated that the K-chart is sensitive to small shifts in the mean vector, while the KNN-chart is sensitive to moderate shifts in the mean vector.
This paper investigates the use of another one-class classifier, the k-means data description (KMDD) algorithm, to construct a new OC-chart, referred to as the KM-chart. The objective of this work is twofold. First, we aim to enlarge the family of OC-charts and extend their applications by showing the methodology of their construction and providing the necessary software codes. Second, we attempt to propose an OC-chart that can compete with the K-chart and the KNN-chart in terms of the average run length (ARL) criterion.
The rest of this paper is organized as follows. A review of OC-charts is presented in Section 2. The proposed KM-chart is introduced in Section 3. The construction methodology of the KM-chart using a real data example is shown in Section 4, while performance analysis of the proposed control chart is discussed in Section 5. Section 6 summarizes this paper.
2. Background on OC-Charts
In the literature, there are two common OC-charts: the K-chart and the KNN-chart. In the following, we give a review of these control charts.
2.1. The K-Chart
The K-chart relies on the SVDD algorithm, which is an unsupervised one-class classifier, to fit a sphere around the target data. This sphere is determined by solving the following quadratic programming problem:

$$\min F(R, a) = R^2, \tag{1}$$

subject to

$$\|x_i - a\|^2 \le R^2, \quad i = 1, \dots, n, \tag{2}$$

where $F(R, a)$, $a$, and $R$ are, respectively, the cost function to minimize, the center, and the radius of the sphere. Equation (2) shows that the vectors of quality characteristics, denoted by $x_i$, having a distance smaller than the radius are considered as target. To allow the possibility of having outliers in the training set, the distance from $x_i$ to the center $a$ should not be strictly smaller than $R^2$, and larger distances should be penalized. Therefore, we introduce slack variables $\xi_i \ge 0$ and the minimization problem becomes

$$\min F(R, a, \xi) = R^2 + C \sum_i \xi_i, \tag{3}$$

subject to

$$\|x_i - a\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n, \tag{4}$$

where $C$ is a parameter introduced for the trade-off between the volume of the sphere and the errors.
Equation (4) can be incorporated into (3) by using Lagrange multipliers:

$$L(R, a, \xi; \alpha, \gamma) = R^2 + C \sum_i \xi_i - \sum_i \alpha_i \left( R^2 + \xi_i - \|x_i - a\|^2 \right) - \sum_i \gamma_i \xi_i, \tag{5}$$

with the Lagrange multipliers $\alpha_i \ge 0$ and $\gamma_i \ge 0$. $L$ should be minimized with respect to $R$, $a$, and $\xi_i$ and maximized with respect to $\alpha_i$ and $\gamma_i$. Setting partial derivatives of $L$ to zero, we obtain

$$\frac{\partial L}{\partial R} = 0: \quad \sum_i \alpha_i = 1, \tag{6}$$

$$\frac{\partial L}{\partial a} = 0: \quad a = \sum_i \alpha_i x_i, \tag{7}$$

$$\frac{\partial L}{\partial \xi_i} = 0: \quad C - \alpha_i - \gamma_i = 0. \tag{8}$$

From (8), $\alpha_i = C - \gamma_i$; since $\alpha_i \ge 0$ and $\gamma_i \ge 0$, the multipliers $\gamma_i$ can be removed and we have

$$0 \le \alpha_i \le C. \tag{9}$$

By substituting (6)–(8) into (5), we obtain the dual problem

$$\max_{\alpha} L = \sum_i \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j), \tag{10}$$

subject to

$$0 \le \alpha_i \le C, \quad \sum_i \alpha_i = 1. \tag{11}$$
A test sample, denoted by $z$, is accepted when its distance to the center is smaller than or equal to the radius. This is equivalent to

$$\|z - a\|^2 = (z \cdot z) - 2 \sum_i \alpha_i (z \cdot x_i) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \le R^2. \tag{12}$$

Generally, data is not spherically distributed. To make the method more flexible, the vectors $x_i$ are transformed to a higher-dimensional feature space. The inner products in (10) and (12) are substituted by a kernel function $K(x_i, x_j)$. In a higher dimension, the sphere becomes a complex form called a "hypersphere." The problem of finding the optimal hypersphere is given by

$$\max_{\alpha} L = \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j), \tag{13}$$

subject to (11).

A test sample $z$ is accepted when

$$K(z, z) - 2 \sum_i \alpha_i K(z, x_i) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \le R^2. \tag{14}$$
The construction of the K-chart consists in determining which samples are support vectors (SVs) by solving the following quadratic programming problem:

$$\max_{\alpha} \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j), \tag{15}$$

subject to

$$0 \le \alpha_i \le C, \quad \sum_i \alpha_i = 1. \tag{16}$$

Once the SVs are obtained, the kernel distance (KD) of each sample is computed. For a test sample $z$, the KD is computed as follows:

$$KD(z) = \sqrt{K(z, z) - 2 \sum_{x_i \in SV} \alpha_i K(z, x_i) + \sum_{x_i, x_j \in SV} \alpha_i \alpha_j K(x_i, x_j)}, \tag{17}$$

where $SV$ is the set of SVs.

The KD of the SVs, denoted by $R$, represents the upper control limit (UCL) for the K-chart used to monitor a new sample $z$. This can be illustrated by the following hypothesis test:

$$H_0: KD(z) \le R \quad \text{versus} \quad H_1: KD(z) > R. \tag{18}$$

Under $H_0$ the process is considered as in-control, and under $H_1$ the process is considered as out-of-control when sample $z$ was taken.
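The kernel distance used by the K-chart can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB implementation; the toy support vectors, equal weights, and the RBF kernel width are assumptions of the example, not values from the paper.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Radial basis function kernel K(x, y) = exp(-||x - y||^2 / sigma^2)."""
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def kernel_distance(z, svs, alphas, sigma=1.0):
    """Kernel distance from a test sample z to the SVDD sphere center:
    sqrt(K(z,z) - 2*sum_i a_i K(z,x_i) + sum_{i,j} a_i a_j K(x_i,x_j))."""
    cross = sum(a * rbf_kernel(z, x, sigma) for a, x in zip(alphas, svs))
    const = sum(ai * aj * rbf_kernel(xi, xj, sigma)
                for ai, xi in zip(alphas, svs)
                for aj, xj in zip(alphas, svs))
    return np.sqrt(rbf_kernel(z, z, sigma) - 2 * cross + const)

# Toy example: two support vectors with equal weights (alphas sum to 1).
svs = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
alphas = [0.5, 0.5]
near = kernel_distance(np.array([0.5, 0.0]), svs, alphas)
far = kernel_distance(np.array([5.0, 5.0]), svs, alphas)
print(near < far)  # a sample far from the target data has a larger KD
```

A sample would be flagged as out-of-control when its kernel distance exceeds the KD of the support vectors, which plays the role of the UCL.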
2.2. The KNN-Chart
The KNN-chart uses an unsupervised one-class classifier called KNNDD, which constructs a one-class boundary by estimating the local density of the data. To understand the mechanism of the KNN-chart, a brief description of the KNNDD algorithm, based on the work of Sukchotrat et al. [24], is presented below.
Let $NN_k(x)$ be the $k$th nearest neighbor training observation of the data point $x$ that needs to be monitored. Let $V(\|x - NN_k(x)\|)$ be the volume of the hypersphere containing the $k$ nearest neighbor training observations and $n$ the size of the training set. The local density of $x$, denoted by $f(x)$, can be determined as

$$f(x) = \frac{k/n}{V\left(\|x - NN_k(x)\|\right)}. \tag{19}$$

Similarly, the local density of $NN_k(x)$, denoted by $f(NN_k(x))$, can be determined by

$$f(NN_k(x)) = \frac{k/n}{V\left(\|NN_k(x) - NN_k(NN_k(x))\|\right)}, \tag{20}$$

where $NN_k(NN_k(x))$ is the $k$th nearest neighbor of $NN_k(x)$ in the same training set.

The KNNDD method classifies $x$ as the target class when the ratio of its local density to the local density of $NN_k(x)$ is greater than or equal to one, which can be expressed as follows:

$$\frac{f(x)}{f(NN_k(x))} = \frac{V\left(\|NN_k(x) - NN_k(NN_k(x))\|\right)}{V\left(\|x - NN_k(x)\|\right)} \ge 1, \tag{21}$$

which is equivalent to requiring $\|x - NN_k(x)\| \le \|NN_k(x) - NN_k(NN_k(x))\|$.

To make the algorithm more robust, the average of the distances to the first $k$ nearest neighbors is considered (for $j = 1, \dots, k$). Thus, (21) becomes

$$\frac{(1/k)\sum_{j=1}^{k} \|x - NN_j(x)\|}{(1/k)\sum_{j=1}^{k} \|NN_k(x) - NN_j(NN_k(x))\|} \le 1. \tag{22}$$

To construct the KNN-chart, the statistic representing the average distance between $x_i$ and its $k$ nearest observations is computed as follows:

$$K_i^2 = \frac{1}{k} \sum_{j=1}^{k} \|x_i - NN_j(x_i)\|. \tag{23}$$

The $K_i^2$ values are used as monitoring statistics.
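The averaged nearest-neighbor distance used as the KNN-chart monitoring statistic can be sketched in Python as follows. This is an illustration only (the paper's implementation is in MATLAB); the toy training set and the choice k = 2 are assumptions of the sketch.

```python
import numpy as np

def knn_statistic(x, train, k=2):
    """Average Euclidean distance from x to its k nearest training
    observations, the KNN-chart's monitoring statistic."""
    dists = np.sort(np.linalg.norm(train - x, axis=1))
    # Skip the zero self-distance if x itself is a training observation.
    if dists[0] == 0:
        dists = dists[1:]
    return dists[:k].mean()

# Toy training set: the four corners of the unit square.
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
stat = knn_statistic(np.array([0.5, 0.5]), train, k=2)
print(stat)  # average distance to the two nearest corners
```

In a chart, each statistic would be plotted against a UCL obtained from the empirical distribution of the statistic over the in-control training data.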
3. The Proposed KM-Chart
The proposed KM-chart gives the minimum closed spherical boundary around the in-control process data using the KMDD algorithm. The latter is an unsupervised one-class classifier based on the k-means algorithm, which is a very popular clustering method. It measures the distance between the center of the KMDD-based sphere and the new incoming sample to be monitored. The sphere is described by $k$ clusters placed such that the average distance to a cluster center is minimized.
Phase I of the KM-chart consists in determining the optimal KMDD-based one-class boundary by estimating the optimal number of clusters $k$. In this step, the k-means clustering algorithm aims to find $k$ clusters, denoted by $C_1, \dots, C_k$, that minimize the within-cluster sum of squares as follows:

$$\min_{C_1, \dots, C_k} \sum_{j=1}^{k} \sum_{i \in C_j} \|x_i - \mu_j\|^2, \tag{24}$$

where $C_1, \dots, C_k$ are the disjoint sets of cluster indices partitioning the $n$ observations of the training phase, $\mu_j$ is the sample mean of the observations in the $j$th cluster, and $\|x_i - \mu_j\|$ is the Euclidean distance of the quality characteristic $x_i$ from $\mu_j$. It should be noted that these distances are used as charting statistics for the KM-chart. The optimization problem in (24) can be solved by iterating the following two steps.
(i) Given cluster centers $\mu_1, \dots, \mu_k$, assign each point to the cluster with the closest center.
(ii) Given a cluster assignment, update the cluster centers to be the sample mean of the observations in each cluster.
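The two alternating steps can be sketched in Python. This is an illustrative sketch, not the authors' MATLAB code; the deterministic first-k-points initialization and the toy data are assumptions made here to keep the example reproducible.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain iteration of the two steps above: (i) assignment, (ii) update.
    For this sketch the first k points serve as the initial centers."""
    centers = X[:k].astype(float).copy()
    for _ in range(iters):
        # Step (i): assign each point to the cluster with the closest center.
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                          axis=2), axis=1)
        # Step (ii): move each center to the sample mean of its cluster.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged: assignments are stable
            break
        centers = new
    return centers, labels

# Two well-separated groups of training observations.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
centers, labels = kmeans(X, k=2)
```

Each center converges to the sample mean of its group, which is exactly the minimizer of (the within-cluster sum of squares) for a fixed assignment.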
In phase II, the distance of a new incoming sample, denoted by $z_t$, is computed as follows:

$$D_t = \min_{1 \le j \le k} \|z_t - \mu_j\|, \quad t = 1, \dots, m, \tag{25}$$

where $m$ is the number of observations in the testing phase.

The test sample $z_t$ is accepted when its distance is smaller than or equal to the radius of the KMDD-based sphere, denoted by $R$. This is equivalent to

$$D_t \le R, \tag{26}$$

where $R$ is the radius of the KMDD-based one-class boundary, representing the UCL for the KM-chart. It is set according to the number of clusters $k$ used for the construction of the one-class boundary.
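Phase II monitoring can be sketched as follows, assuming (as in this sketch, not necessarily in the authors' MATLAB code) that the monitored statistic is the Euclidean distance to the nearest cluster center and that the radius has already been fixed in phase I; the toy centers and radius below are illustrative values.

```python
import numpy as np

def km_chart_statistic(z, centers):
    """Distance from a new sample to the nearest cluster center (phase II)."""
    return np.min(np.linalg.norm(centers - z, axis=1))

def is_out_of_control(z, centers, radius):
    """Signal when the monitoring statistic exceeds the UCL (the radius)."""
    return km_chart_statistic(z, centers) > radius

# Hypothetical phase I result: two cluster centers and a radius of 1.0.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
print(is_out_of_control(np.array([0.2, 0.1]), centers, radius=1.0))  # False
print(is_out_of_control(np.array([5.0, 5.0]), centers, radius=1.0))  # True
```

A sample lying inside any of the cluster spheres is accepted; a sample outside all of them triggers an out-of-control signal.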
4. A Real Industrial Application
To demonstrate the efficacy of the proposed KM-chart, we applied it to the "Cristal Light" cigarettes data set. "Cristal Light" cigarettes are a Tunisian trademark produced by the Kairouan Tobacco Manufacture. The production process of "Cristal Light" cigarettes comprises 12 sequences of operations, which are humidification of tobacco leaves, threshing tobacco leaves, strip processing, hashing, drying, expansion of edges, casing, flavoring, introduction of expanded tobacco, confection of cigarettes, packing and boxing, and conditioning. Details about the "Cristal Light" data set can be found in Hajlaoui [26].
The quality of "Cristal Light" cigarettes is defined by five main characteristics, which are as follows.
(1) The weight of a cigarette, which is made up of the tobacco, the filter, and the cigarette paper weights. It varies between 0.965 and 1 gram.
(2) The module of a cigarette, which corresponds to its diameter. It varies from 6.75 to 8.0 millimeters.
(3) The humidity rate of tobacco, which is the proportion of water contained in a cigarette. It is considered acceptable if it varies between 11.5% and 13.5%.
(4) The pulling resistance of a cigarette, which is defined by the difference in pressure between the two extremities of a cigarette when a quantity of air is passed through it. The pulling resistance is considered acceptable when it varies from 100 to 115 CE (colonne d'eau).
(5) The folding density, which corresponds to the volume occupied by the mass of the tobacco inside a cigarette. Its tolerance interval is 450 ± 20 cm^{3}.
The "Cristal Light" data set is composed of 65 observations. The first 60 cigarettes are used to construct the OC-charts in phase I; each cigarette took one minute to be collected. The five remaining cigarettes are used for testing out-of-control states in phase II. For the construction of the KM-chart, we follow the methodology of Gani and Limam [25], which consists of three main steps.
Step 1. The data set is analyzed using the principal component analysis (PCA) method to obtain independent and identically distributed data, which is a fundamental assumption for the one-class classification problem.
Step 2. The principal components (PCs) resulting from Step 1 are used to construct the one-class boundary. In our application, we have three one-class classifiers, which are SVDD, KNNDD, and KMDD.
Step 3. The optimal one-class boundary obtained from Step 2 is used to construct the OC-charts by computing the charting statistics, namely, the KD for the K-chart, the average nearest-neighbor distance for the KNN-chart, and the distance to the nearest cluster center for the KM-chart.
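Step 1 of this methodology can be sketched in Python (the paper itself uses MATLAB); the synthetic five-variable data below is a hypothetical stand-in for the cigarette measurements, and the 90% threshold mirrors the variance retained in this application.

```python
import numpy as np

def pca_scores(X, var_threshold=0.90):
    """Project the centered data onto the fewest principal components
    whose cumulative explained variance reaches var_threshold, via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    n_pc = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    return Xc @ Vt[:n_pc].T, n_pc

# Synthetic 5-variable data driven by two latent factors, standing in
# for the five quality characteristics of the cigarettes.
rng = np.random.default_rng(1)
latent = rng.normal(size=(60, 2))
loadings = np.array([[1.0, 2.0, 0.0, 0.0, 1.0],
                     [0.0, 1.0, 3.0, 1.0, 0.0]])
X = latent @ loadings + 0.01 * rng.normal(size=(60, 5))
scores, n_pc = pca_scores(X)
```

The resulting scores (here at most two PCs, since two latent factors drive the data) would then feed Steps 2 and 3.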
All calculations were carried out with MATLAB software. For the construction of the K-chart and the KNN-chart, we used the MATLAB codes of Gani and Limam [25]. For the construction of the KM-chart, we used the MATLAB code developed by the authors (see Algorithm 1).

After performing PCA, two PCs explaining more than 90% of the variation were retained. Several numbers of clusters $k$ were tested for the construction of the KMDD-based one-class boundary and to determine the in-control state of the "Cristal Light" process. It is clear from Figure 1 that the number of clusters influences the shape of the KMDD-based one-class boundary and plays a pivotal role in determining the trade-off between oversmoothness and undersmoothness of the control boundary. In our application, the KMDD-based one-class boundary was constructed with a small number of clusters, since the sample size used was not large.
The detection of an abnormal observation in the target class depends on the shape of the established one-class boundary. The KMDD provided a spherical one-class boundary, while SVDD gave a flexible nonspherical one due to the use of SVs. It is worth noticing that the shape of the SVDD-based one-class depends on the width of the radial basis function, while the shape of the KNNDD-based one-class is a function of the number of nearest neighbors, denoted by $k$. Details about the characteristics of SVDD- and KNNDD-based one-classes can be found in Gani and Limam [25].
The KM-chart exceeded its control limit of 6.001 at the 19th, 25th, 28th, 40th, 48th, and 50th cigarettes, as shown in Figure 2. In other words, these out-of-control cigarettes have a distance greater than the radius of the established KMDD-based sphere. For these six abnormal cigarettes, at least one of the five quality characteristics did not respect its tolerance interval, as discussed above. In comparison with the two other control charts, the proposed KM-chart succeeded in detecting a new out-of-control observation, namely, cigarette number 48. Both the K-chart and the KNN-chart failed to detect this abnormal cigarette. Once these out-of-control observations were removed, no additional outliers were detected, and the in-control process was established.
In phase II, five "Cristal Light" cigarettes were used to detect out-of-control states. The KM-chart triggered an alarm at the 62nd cigarette and remained below its control limit for the last three cigarettes. Cigarette number 62 was also declared by the K-chart as an out-of-control observation, while cigarettes number 62 and 65 were declared by the KNN-chart as out-of-control observations. Figure 3 shows the discussed OC-charts for phase II.
5. Performance Comparison
In this section, we study the performance of the KM-chart and compare it with the K-chart and the KNN-chart. The performance study is based on the ARL criterion, which is defined as the expected number of samples taken before a shift is detected. It is given by

$$ARL = \frac{1}{p}, \tag{27}$$

where $p$ is the probability that a single point plots out-of-control.
A simulation study was conducted to estimate the ARL of the OC-charts. In order to be consistent with Gani and Limam [25], we follow their simulation procedure, given by the following steps.
Step 1. Five multivariate normal variables were generated with mean vector (0.986; 7.650; 0.121; 107.183; 451.527) and covariance matrix equal to those of the "Cristal Light" data set used in Section 4. The K-chart, KNN-chart, and KM-chart were designed to achieve an overall in-control ARL of 200. Each ARL value was estimated by averaging the run lengths obtained from 1000 simulated charts.
Step 2. Multivariate shifts were introduced in the mean vector according to Table 1. Larger shift values correspond to bigger shifts in the mean; a shift of zero corresponds to the in-control state.
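The run-length averaging of Step 1 can be sketched with a deliberately simplified univariate chart. This is only a stand-in: the actual study simulates five correlated variables and three OC-charts, while here a single standard normal statistic is monitored, and the UCL value 2.807 is chosen only so that the in-control ARL is near 200 (since P(|N(0,1)| > 2.807) is about 0.005, giving 1/p of roughly 200).

```python
import numpy as np

def estimate_arl(ucl, shift=0.0, n_charts=1000, max_len=100_000, seed=0):
    """Monte Carlo ARL estimate: average, over simulated charts, of the
    number of samples drawn until the statistic exceeds the control limit."""
    rng = np.random.default_rng(seed)
    run_lengths = []
    for _ in range(n_charts):
        t = 0
        while t < max_len:
            t += 1
            # Signal when the monitored statistic plots beyond the UCL.
            if abs(rng.normal(loc=shift)) > ucl:
                break
        run_lengths.append(t)
    return float(np.mean(run_lengths))

arl_in_control = estimate_arl(ucl=2.807)          # close to 1/p = 200
arl_shifted = estimate_arl(ucl=2.807, shift=1.0)  # a mean shift shortens it
```

The same loop structure, with the appropriate multivariate statistic and UCL substituted, yields the ARL values reported for each OC-chart.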

For detecting small shifts, the KM-chart performed better than the KNN-chart, since it gave an ARL of 192.308, while the KNN-chart gave an ARL of 200. For the same shift level, the K-chart yielded an ARL of 100, which was better than that of the KM-chart.
For detecting moderate shifts, the KNN-chart behaved better than the other two control charts, since it gave an ARL of 40 against ARLs of 50 and 147.059 for the K-chart and the KM-chart, respectively.
The difference in sensitivity to shifts in the mean vector among the three OC-charts is due to the difference in the nature of the distance used by each control chart. The K-chart uses the KD, whereas the KM-chart and the KNN-chart are based on the Euclidean distance. The advantage of the KD over the Euclidean distance lies essentially in the use of the kernel function, which is equivalent to measuring the distance between two samples in a higher-dimensional space. This allows the K-chart to easily detect any small shift in the process. In terms of ARL, and for small shifts in the mean vector, one can conclude that our proposed KM-chart is situated between the KNN-chart and the K-chart (KNN-chart < KM-chart < K-chart). Broadly speaking, each OC-chart has its advantages and disadvantages. For example, the K-chart performs better than the KM-chart and the KNN-chart in quickly detecting changes in the process, while the computational cost of the KM-chart and the KNN-chart is lower than that of the K-chart. Table 2 summarizes the characteristics of each OC-chart.

6. Conclusion
In this paper, we have developed a new OC-chart using the KMDD algorithm, called the KM-chart. The construction methodology of the KM-chart is demonstrated through a real industrial application. Performance analysis of the KM-chart in phases I and II showed that our proposed control chart is a competitive SPC tool. In phase I, the proposed KM-chart detected a new abnormal observation, namely, cigarette number 48, which both the K-chart and the KNN-chart failed to detect. Based on the ARL criterion, our proposed control chart outperformed the KNN-chart in detecting small shifts in the mean vector.
The proposed KM-chart can be extended to monitor nonlinear processes by using the global kernel k-means algorithm instead of the standard k-means algorithm. Global kernel k-means has the advantage of identifying nonlinearly separable clusters and therefore allows the KM-chart to monitor sophisticated manufacturing processes.
Appendix
The MATLAB Code for the KM-Chart
The MATLAB code for the KM-chart requires the PRtools toolbox, available at http://www.prtools.org, and the dd_tools toolbox, available at http://prlab.tudelft.nl/davidtax/dd_tools.html.
For more details see Algorithm 1.
Conflict of Interests
The authors declare that they do not have any direct financial relation with the software mentioned in this paper and that they have no competing interests.
Acknowledgment
The authors express their appreciation to LARODEC of ISG, University of Tunis, for supporting this paper.
References
[1] D. F. Cook, "Using basis function networks to recognize shifts in correlated process parameters," IIE Transactions, vol. 30, no. 3, pp. 227–234, 1998.
[2] R. B. Chinnam, "Support vector machines for recognizing shifts in correlated and other manufacturing processes," International Journal of Production Research, vol. 40, no. 17, pp. 4449–4466, 2002.
[3] W. Hwang, G. Runger, and E. Tuv, "Multivariate statistical process control with artificial contrasts," IIE Transactions, vol. 39, no. 6, pp. 659–669, 2007.
[4] J. Hu, G. Runger, and E. Tuv, "Tuned artificial contrasts to detect signals," International Journal of Production Research, vol. 45, no. 23, pp. 5527–5534, 2007.
[5] W. Gani, H. Taleb, and M. Limam, "Support vector regression based residual control charts," Journal of Applied Statistics, vol. 37, no. 2, pp. 309–324, 2010.
[6] S. B. Kim, W. Jitpitaklert, S.-K. Park, and S.-J. Hwang, "Data mining model-based control charts for multivariate and autocorrelated processes," Expert Systems with Applications, vol. 39, no. 2, pp. 2073–2081, 2012.
[7] S. Mahadevan and S. L. Shah, "Fault detection and diagnosis in process data using one-class support vector machines," Journal of Process Control, vol. 19, no. 10, pp. 1627–1639, 2009.
[8] S. B. Kim, W. Jitpitaklert, and T. Sukchotrat, "One-class classification-based control charts for monitoring autocorrelated multivariate processes," Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 461–474, 2010.
[9] S. Kittiwachana, D. L. S. Ferreira, G. R. Lloyd et al., "One class classifiers for process monitoring illustrated by the application to online HPLC of a continuous process," Journal of Chemometrics, vol. 24, no. 3-4, pp. 96–110, 2010.
[10] S. B. Kim, T. Sukchotrat, and S.-K. Park, "A nonparametric fault isolation approach through one-class classification algorithms," IIE Transactions, vol. 43, no. 7, pp. 505–517, 2011.
[11] R. G. Brereton, "One-class classifiers," Journal of Chemometrics, vol. 25, no. 5, pp. 225–246, 2011.
[12] T. Sukchotrat, S. B. Kim, K.-L. Tsui, and V. C. P. Chen, "Integration of classification algorithms and control chart techniques for monitoring multivariate processes," Journal of Statistical Computation and Simulation, vol. 81, no. 12, pp. 1897–1911, 2011.
[13] D. M. J. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.
[14] D. M. J. Tax, One-class classification: concept-learning in the absence of counter-examples [Ph.D. thesis], Delft University of Technology, Delft, The Netherlands, 2001.
[15] R. Sun and F. Tsung, "A kernel-distance-based multivariate control chart using support vector methods," International Journal of Production Research, vol. 41, no. 13, pp. 2975–2989, 2003.
[16] S. Kumar, A. K. Choudhary, M. Kumar, R. Shankar, and M. K. Tiwari, "Kernel distance-based robust support vector methods and its application in developing a robust K-chart," International Journal of Production Research, vol. 44, no. 1, pp. 77–96, 2006.
[17] F. Camci, R. B. Chinnam, and R. D. Ellis, "Robust kernel distance multivariate control chart using support vector principles," International Journal of Production Research, vol. 46, no. 18, pp. 5075–5095, 2008.
[18] W. Gani, H. Taleb, and M. Limam, "An assessment of the kernel-distance-based multivariate control chart through an industrial application," Quality and Reliability Engineering International, vol. 27, no. 4, pp. 391–401, 2011.
[19] W. Gani and M. Limam, "On the use of the K-chart for phase II monitoring of simple linear profiles," Journal of Quality and Reliability Engineering, vol. 2013, Article ID 705450, 8 pages, 2013.
[20] S. He and C. Zhang, "Support vector data description based multivariate cumulative sum control chart," Advanced Materials Research, vol. 314–316, pp. 2482–2485, 2011.
[21] Z. Ge, F. Gao, and Z. Song, "Batch process monitoring based on support vector data description method," Journal of Process Control, vol. 21, no. 6, pp. 949–959, 2011.
[22] X. Liu, K. Li, M. McAfee, and G. W. Irwin, "Improved nonlinear PCA for process monitoring using support vector data description," Journal of Process Control, vol. 21, no. 9, pp. 1306–1317, 2011.
[23] H.-W. Cho, M. K. Jeong, and Y. Kwon, "Support vector data description for calibration monitoring of remotely located microrobotic system," Journal of Manufacturing Systems, vol. 25, no. 3, pp. 196–208, 2006.
[24] T. Sukchotrat, S. B. Kim, and F. Tsung, "One-class classification-based control charts for multivariate process monitoring," IIE Transactions, vol. 42, no. 2, pp. 107–120, 2010.
[25] W. Gani and M. Limam, "Performance evaluation of one-class classification-based control charts through an industrial application," Quality and Reliability Engineering International, vol. 29, no. 6, pp. 841–854, 2013.
[26] M. Hajlaoui, "On the charting procedures: T^{2} chart and DD-diagram," International Journal of Quality, Statistics, and Reliability, vol. 2011, Article ID 830764, 8 pages, 2011.
Copyright
Copyright © 2014 Walid Gani and Mohamed Limam. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.