Special Issue: Artificial Intelligence and Its Applications 2014
Recognition of Mixture Control Chart Pattern Using Multiclass Support Vector Machine and Genetic Algorithm Based on Statistical and Shape Features
Control charts have been widely utilized for monitoring process variation in numerous applications. Abnormal patterns exhibited by control charts imply certain potentially assignable causes that may deteriorate the process performance. Most previous studies are concerned with the recognition of single abnormal control chart patterns (CCPs). This paper introduces an intelligent hybrid model for recognizing mixture CCPs that comprises three main parts: feature extraction, classification, and parameter optimization. In the feature extraction stage, statistical and shape features of the observation data are computed and used as effective inputs for the classifier. A multiclass support vector machine (MSVM) is applied to recognize the mixture CCPs. Finally, a genetic algorithm (GA) is utilized to optimize the MSVM classifier by searching for the best values of the MSVM and kernel function parameters. The performance of the hybrid approach is evaluated by simulation experiments, and the results demonstrate that the proposed approach is able to recognize mixture CCPs effectively.
In today’s manufacturing and service industries, control charts are particularly important tools for improving product quality and monitoring the production process. Various kinds of control charts have been developed for different quality attributes and control targets. Recognizing control chart patterns (CCPs) is one of the most widely used techniques to detect process disturbances, equipment malfunctions, or other special events. In general, six basic CCPs are commonly exhibited by control charts: normal (NOR), cyclic (CC), increasing trend (IT), decreasing trend (DT), upward shift (US), and downward shift (DS). Figure 1 shows these six types of control chart patterns. Over the past two decades, much attention has been given to improving the recognition accuracy of these basic CCPs using normalized original data. Automatic CCPs recognition has been an active research area in the last decade but has not yet been fully realized.
There are numerous research papers on CCPs recognition. Most previous studies are concerned with the recognition of single abnormal CCPs [2–4]. However, in practice, the observed process data may be mixture CCPs, which combine two or three basic patterns. Compared to the basic patterns, the mixture patterns are more difficult to recognize and cause serious degradation in pattern recognition performance. It is therefore a challenging task to identify mixture patterns effectively. Only a few studies have reported on mixture pattern recognition [5–8]. Guh and Tannock use a back-propagation neural network to recognize mixture CCPs. H. Yang and S. Yang propose an efficient statistical correlation coefficient method for the recognition of mixture CCPs. Chen et al. integrate the wavelet method and a back-propagation neural network for online recognition of mixture CCPs. Lu et al. propose a hybrid system that uses independent component analysis and a support vector machine to recognize mixture CCPs.
Feature extraction plays an important role in CCPs recognition. Most of the existing literature uses normalized original data as the inputs. Such inputs normally lead to large model structures and are not very effective for complicated recognition problems. A smaller input size can lead to faster training and higher efficiency. For this reason, researchers have proposed various methods to extract features for CCPs recognition [2, 9, 10]. Ranaee et al. use both shape features and statistical features as the data inputs, and their results show that this method works well for control chart recognition. Hassan et al. introduce feature-based control chart pattern recognition with six statistical features: mean, variance, skewness, mean-square value, autocorrelation, and cusum. The aim is to improve the performance of the CCPs recognizer with a smaller feature set. Gauri and Chakraborty present an improved feature extraction that selects from a large number of potentially useful features using a CART-based approach. Other feature extraction methods have been proposed for eliminating duplicated information, such as independent component analysis (ICA), Fisher discriminant analysis (FDA), and principal component analysis (PCA) [13, 14]. The feature extraction efforts cited above did not arrive at a suitable set of features. In this paper, thirteen features that consist of both statistical and shape features of the CCPs are initially chosen. Feature extraction of this kind acts as a dimensionality reduction technique: it suppresses noise and correlated measurements and maps the measurement data onto a simpler, smaller, and more informative subspace.
Traditionally, CCPs were analyzed and interpreted manually until the end of the 1980s, when expert systems were employed for control chart pattern recognition [15, 16]. With the development of computer technology, machine learning techniques have been widely adopted in automatic process monitoring. In particular, artificial neural networks (ANNs) are the most frequently used technique in control chart pattern recognition [17–20]. The use of ANNs has overcome some drawbacks of the traditional expert system method. These networks typically utilize a multilayer perceptron with back-propagation training to classify unnatural patterns and show higher accuracy. In subsequent studies, many other methods such as decision trees, fuzzy clustering, and wavelet analysis have been combined with ANNs to recognize CCPs [19, 20].
However, ANNs also suffer from several weaknesses, such as the need for a large amount of training data, poor generalization ability, the risk of model overfitting, difficulty in obtaining a stable solution, and a tendency to get trapped in local extrema. These weaknesses limit the application of ANNs. The support vector machine (SVM), based on statistical learning theory, has been proposed for recognizing CCPs because of its excellent performance in practical applications. It relies on the principle of structural risk minimization, which gives it greater generalization ability when only a small sample is available and which is superior to the empirical risk minimization principle used by artificial neural networks [7, 21, 22]. The biggest problems encountered in setting up the SVM model are how to select the kernel function and its parameter values. The penalty parameter and the kernel function parameter should be optimized together.
The purpose of this study is to develop an intelligent hybrid CCPs recognition model that can be applied to mixture CCPs to improve the recognition accuracy. This paper considers six basic and four mixture CCPs, uses their statistical and shape features as the inputs, and uses a multiclass support vector machine (MSVM) as the classifier. A genetic algorithm (GA) is chosen as the optimization tool for the MSVM parameters. The model is intended to improve CCPs recognition performance.
2. Modeling for Control Chart Patterns Recognition
The aim of this model is to recognize CCPs effectively and automatically. Figure 2 shows the schematic diagram representing the procedure of the CCPs recognition, in which three modules are in series: feature extraction, classifier, and parameters optimization (F_MSVM_GA).
In the feature extraction module, statistical and shape features of the observation data are used as the data inputs for the classifier. Every control chart pattern has different properties, and features represent the properties of the various CCPs. If effective features are chosen to reflect the pattern, it is easier to recognize the abnormal patterns. Using the original data as inputs usually means a large input size, which is not very effective for complicated recognition problems. In this paper, statistical and shape features of the CCPs are therefore extracted to obtain suitable input data. In the classifier module, an MSVM classifier is developed for recognizing the basic and mixture patterns. In order to achieve satisfactory recognition performance, the MSVM classifier needs to be properly designed, trained, and tested. However, using an MSVM raises some difficulties, such as how to select the optimal kernel function type and the most appropriate hyperparameter values for the MSVM training and testing stages. Therefore, in the parameter optimization module, a genetic algorithm is applied to find the optimum values of the hyperparameters, that is, the kernel parameter and the classifier parameter.
2.1. Statistical and Shape Features
The patterns can be described by the original data, and the statistical and shape features can be derived from the original data. Using features is an efficient way to reduce the amount of data while retaining the useful information. In this paper, eight statistical features and five shape features are chosen to reflect the patterns; these thirteen features are described below.
2.1.1. Statistical Features
They are as follows: mean, standard deviation, mean-square value, autocorrelation, positive cusum, negative cusum, skewness, and kurtosis.
(1) Mean. The mean for normal and cyclic patterns is around zero, while that for other patterns is different from zero. Therefore, it may be a good candidate to differentiate normal and cyclic patterns from other patterns.
(2) Standard Deviation. The standard deviation of the sample data; its value differs from pattern to pattern.
(3) Mean-Square Value. The mean of the squared sample values.
(4) Average Autocorrelation. This paper takes the average of the degree of correlation between property values for each sample.
(5) Positive Cusum. For the sample data points that are greater than the average, the gaps between the data and their average are accumulated.
(6) Negative Cusum. For the sample data points that are smaller than the average, the gaps between the data and their average are accumulated.
(7) Skewness. It provides information regarding the degree of asymmetry.
(8) Kurtosis. It measures the relative peakedness or flatness of the distribution.
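As an illustration, the eight statistical features can be computed from one observation window roughly as follows. This is a minimal sketch under stated assumptions: the divide-by-n variance, the lag-averaged autocorrelation, and the moment-based skewness and kurtosis are common textbook forms and may differ in detail from the equations used in this paper, and the function name `statistical_features` is hypothetical.

```python
import math


def statistical_features(x):
    """Return eight statistical features of one observation window x (list of floats)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]              # deviations from the window mean
    var = sum(d * d for d in dev) / n        # assumption: population (divide-by-n) variance
    std = math.sqrt(var)
    mean_square = sum(v * v for v in x) / n

    if var > 0:
        # Assumption: average the sample autocorrelation over lags 1..n-1.
        def autocorr(k):
            return sum(dev[i] * dev[i + k] for i in range(n - k)) / (n * var)

        avg_autocorr = sum(autocorr(k) for k in range(1, n)) / (n - 1)
        skewness = sum(d ** 3 for d in dev) / (n * std ** 3)
        kurtosis = sum(d ** 4 for d in dev) / (n * std ** 4)
    else:
        # A constant window has no spread, so these ratios are undefined; use 0.
        avg_autocorr = skewness = kurtosis = 0.0

    return {
        "mean": mean,
        "std": std,
        "mean_square": mean_square,
        "avg_autocorr": avg_autocorr,
        "pos_cusum": sum(d for d in dev if d > 0),   # accumulated gaps above the mean
        "neg_cusum": sum(d for d in dev if d < 0),   # accumulated gaps below the mean
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

Calling `statistical_features(window)` on a 40-point window would give the first eight entries of the thirteen-dimensional input vector described in this section.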
2.1.2. Shape Features
They are as follows: slope, N1, N2, APML, and APLS.
(1) Slope. The slope of the least-square line: the slope for normal and cyclic patterns is around zero, while that for other patterns is greater than zero. Therefore, it may be a good candidate to differentiate normal and cyclic patterns from other patterns.
(2) N1. The number of mean crossings: it is almost zero for shift and trend patterns but very high for normal patterns; the cyclic pattern is intermediate. This difference can distinguish the normal and cyclic patterns from shift and trend patterns.
(3) N2. The number of least-square line crossings: this feature is the highest for normal and trend patterns, intermediate for shift patterns, and the lowest for cyclic patterns. Thus it can be used to separate normal and trend patterns from the others.
(4) APML. The area between the pattern and its mean line: this feature is the lowest for the normal pattern; therefore, it differentiates the normal pattern from the others.
(5) APLS. The area between the pattern and its least-square line: normal and trend patterns have lower values than shift and cyclic patterns. Thus it can be used to distinguish normal and trend patterns from shift and cyclic patterns.
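The five shape features can be sketched in the same way. The details below are assumptions made for illustration: the time index runs from 0 to n−1, a crossing is counted as a strict sign change between consecutive points, and the two areas are approximated by sums of absolute deviations with unit spacing. The function name `shape_features` is hypothetical.

```python
def shape_features(x):
    """Return the five shape features of one observation window x (at least 2 points)."""
    n = len(x)
    t = range(n)                               # assumption: time index 0..n-1
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n

    # Ordinary least-squares line x ~ intercept + slope * t.
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x)) / sxx
    intercept = x_mean - slope * t_mean

    dev_mean = [xi - x_mean for xi in x]
    dev_line = [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]

    def crossings(d):
        # Count strict sign changes between consecutive deviations.
        return sum(1 for a, b in zip(d, d[1:]) if a * b < 0)

    return {
        "slope": slope,
        "N1": crossings(dev_mean),               # mean-line crossings
        "N2": crossings(dev_line),               # least-square-line crossings
        "APML": sum(abs(d) for d in dev_mean),   # area approximated with unit spacing
        "APLS": sum(abs(d) for d in dev_line),
    }
```

A pure trend such as `[0.0, 1.0, 2.0, 3.0, 4.0]` gives a slope of 1 and an APLS of 0, while its APML is large; this is the contrast between trend and normal patterns that the text describes.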
2.2. Support Vector Machine
The basic SVM was invented by Vapnik and his team at AT&T Bell Labs. It is based on the VC dimension theory and the structural risk minimization principle of statistical learning theory, and it seeks the best compromise between model complexity and learning ability given limited sample information. The basic SVM deals with two-class problems. However, it can be extended to a multiclass SVM [7, 21–23].
An SVM performs classification tasks by constructing optimal separating hyperplanes (OSHs). An OSH maximizes the margin between the two nearest data points belonging to two separate classes. Suppose that the training set $(x_i, y_i)$, $i = 1, \ldots, n$, $x_i \in \mathbb{R}^d$, $y_i \in \{-1, +1\}$, can be separated by the hyperplane $w^T x + b = 0$, where $n$ is the number of sample observations, $d$ is the dimension of each observation, $w$ is the weight vector, and $b$ is the bias. If this hyperplane separates the data from the two classes with maximal margin width, the points on the margin boundary are called support vectors, and the SVM solves the following optimization problem:

$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, \ldots, n. \tag{14}$$

This is a convex quadratic programming (QP) problem, and Lagrange multipliers are used to solve it. For input data with a high noise level, an SVM using soft margins can be expressed with the introduction of the nonnegative slack variables $\xi_i$. Equation (14) is transformed into the following constrained form:

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n. \tag{15}$$

In (15), $C$ is the penalty factor; it determines the degree to which errors are penalized. It can be viewed as a tuning parameter that controls the trade-off between maximizing the margin and minimizing the classification error.
An MSVM method is adopted in the classifier stage. There are two common methods: one-against-all (OAA) and one-against-one (OAO). In OAA, for an N-class pattern recognition problem, N independent SVMs are constructed and each of them is trained to separate one class of samples from all others. When testing the system after all the SVMs are trained, a sample is input to all the SVMs. Suppose that this sample belongs to class N1; ideally only the SVM trained to separate class N1 from the others gives a positive response. In OAO, for an N-class problem, N(N − 1)/2 SVMs are constructed and each of them is trained to separate one class from another class. The decision for a testing sample is based on the voting results of these SVMs. In this paper, OAO is adopted for pattern recognition.
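The one-against-one voting rule can be illustrated with a short sketch. The binary classifiers here are toy stand-ins (a nearest-centre rule on a single number) rather than trained SVMs, and the names `oao_predict` and `nearest_centre_pairwise` as well as the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def oao_predict(sample, classes, pairwise):
    """Classify sample by majority vote over all pairwise (one-against-one) classifiers.

    pairwise[(a, b)] is a callable that returns either a or b for a sample.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](sample)] += 1
    # Ties are broken in favour of the class listed first (an arbitrary choice).
    return max(classes, key=lambda c: votes[c])


def nearest_centre_pairwise(centres):
    """Toy stand-in for trained binary SVMs: pick the class whose centre is closer."""
    def make(a, b):
        return lambda s: a if abs(s - centres[a]) <= abs(s - centres[b]) else b

    return {(a, b): make(a, b) for a, b in combinations(sorted(centres), 2)}


centres = {0: 0.0, 1: 5.0, 2: 10.0}           # three classes, so 3 pairwise classifiers
pairwise = nearest_centre_pairwise(centres)
label = oao_predict(4.2, sorted(centres), pairwise)
```

With the ten patterns considered in this paper, the same scheme would need 45 pairwise classifiers.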
In nonlinearly separable cases, where the data cannot be linearly separated in the input space, the SVM uses the kernel method to transform the original input space into a high-dimensional feature space, where an optimal linear separating hyperplane can be found. Although there are several types of kernel function, the most widely used is the radial basis function (RBF), which is defined as

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right), \quad \gamma > 0, \tag{16}$$

where $\gamma$ is the kernel function parameter.
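A minimal sketch of the RBF kernel follows. It assumes the common parameterization $\exp(-\gamma \|x - z\|^2)$ with a single positive width parameter, called `gamma` here; the helper names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel value for two equal-length feature vectors (assumed gamma form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma):
    """Kernel (Gram) matrix of all pairwise kernel values."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]
```

A large `gamma` makes the kernel value fall off quickly with distance (a thin kernel), while a small `gamma` makes it decay slowly (a fat kernel).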
The main difficulty in using an MSVM is how to select the values of the penalty parameter $C$ and the kernel function parameter $\gamma$. The GA is used to search for the best values of these parameters in the MSVM classifier.
2.3. Genetic Algorithms (GA)
The GA is a powerful tool in the field of global optimization. Compared with traditional optimization algorithms, it offers better search efficiency, robustness, and parallelism. Genetic algorithms belong to the larger class of evolutionary algorithms, which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.
In this paper, OAO and the RBF kernel function are adopted for the MSVM, whose performance is mainly determined by the setting of two parameters: the penalty parameter $C$ and the kernel parameter $\gamma$. The GA is used to search for the best values of these parameters in the MSVM classifier. Each chromosome has two dimensions, $C$ and $\gamma$, and the accuracy on the training set is selected as the fitness function. The steps are as follows.
Step 1. Set the GA parameters, such as the population size, the number of generations, the crossover and mutation probabilities, and the parameter ranges.
Step 2. Encode the parameters to be optimized and initialize the population.
Step 3. Decode the chromosomes of the population, train the model on the training set, and calculate the recognition rate. The recognition rate on the training set is used as the fitness function of the GA, so that the search leads to the optimal MSVM parameters ($C$ and $\gamma$).
Step 4. Genetic manipulation (selection, crossover, and mutation): selection, crossover, and mutation are applied to the chromosomes according to their fitness, so that low-fitness chromosomes are excluded and high-fitness chromosomes are retained. The new population is derived from the outstanding members of the previous generation and is therefore better than the previous generation. The GA iterates until a predetermined stopping target is met.
Step 5. Get the optimal parameters: decode the best chromosome, use the optimal parameters to train the support vector machine classifier on the training data, and finally obtain the optimized support vector machine classifier.
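The five steps above can be sketched as a small binary-coded GA. Everything specific in this sketch is an assumption made for illustration: the 16-bit encoding per parameter, binary tournament selection, one-point crossover, elitism, the default population size and probabilities, and the names `decode`, `run_ga`, and `toy_fitness`. In the paper the fitness is the training-set recognition rate of the MSVM; the toy function below merely stands in for it so that the example runs on its own.

```python
import math
import random

C_RANGE = (0.1, 100.0)       # search range for the penalty parameter C
GAMMA_RANGE = (0.01, 10.0)   # search range for the kernel parameter
BITS = 16                    # bits per parameter (assumed encoding length)


def decode(chrom):
    """Step 3: map a bit-list chromosome to real-valued (C, gamma)."""
    def to_real(bits, lo, hi):
        value = int("".join(map(str, bits)), 2)
        return lo + (hi - lo) * value / (2 ** len(bits) - 1)

    return to_real(chrom[:BITS], *C_RANGE), to_real(chrom[BITS:], *GAMMA_RANGE)


def run_ga(fitness, pop_size=20, generations=50, p_cross=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)                                   # Step 1: GA settings
    pop = [[rng.randint(0, 1) for _ in range(2 * BITS)]         # Step 2: initial population
           for _ in range(pop_size)]
    best = max(pop, key=lambda ch: fitness(*decode(ch)))
    history = [fitness(*decode(best))]
    for _ in range(generations):
        scores = [fitness(*decode(ch)) for ch in pop]           # Step 3: evaluate fitness

        def select():
            # Step 4 (selection): binary tournament.
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = [best[:]]                                    # elitism: keep the best as is
        while len(children) < pop_size:
            a, b = select()[:], select()[:]
            if rng.random() < p_cross:                          # Step 4 (one-point crossover)
                cut = rng.randrange(1, 2 * BITS)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for k in range(2 * BITS):                       # Step 4 (bit-flip mutation)
                    if rng.random() < p_mut:
                        child[k] ^= 1
                children.append(child)
        pop = children[:pop_size]
        best = max(pop, key=lambda ch: fitness(*decode(ch)))
        history.append(fitness(*decode(best)))
    return decode(best), history                                # Step 5: best parameters


def toy_fitness(c, gamma):
    """Stand-in for training accuracy: peaks at C = 10, gamma = 0.1."""
    return -((math.log10(c) - 1) ** 2 + (math.log10(gamma) + 1) ** 2)


(best_c, best_gamma), history = run_ga(toy_fitness)
```

Because the best chromosome is copied unchanged into each new generation, the best fitness recorded in `history` never decreases. Replacing `toy_fitness` with a function that trains the MSVM and returns its recognition rate would give the scheme described in the steps.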
3. Simulation and Results Analysis
3.1. Data Generation
In order to analyze CCPs recognition, the Monte Carlo method is used to generate the sample data. The following equation is applied to generate the data points for the six basic patterns; the parameters for each pattern are shown in Table 1:

$$x(t) = \mu + r(t)\,\sigma + d(t), \tag{17}$$

where $x(t)$ is the value of the sample data at time $t$; $\mu$ is the mean of the data; $r(t)$ is a random value from the standard normal distribution, between −1 and 1; $\sigma$ is the standard deviation of the normal distribution; and $d(t)$ is the abnormal value. We fix the values of $\mu$ and $\sigma$ and use the 40 data points of the observation window as inputs of the feature extraction module. For every pattern, 100 samples are generated.
However, in practice the observed process data may be mixture control chart patterns, which combine two or three basic patterns. Figure 3 shows four kinds of mixture CCPs, which are combinations of the cyclic, increasing trend, and downward shift patterns. Because the principles of increasing and decreasing trends, and of upward and downward shifts, are similar, only the increasing trend and the downward shift are chosen for the mixture CCPs. The sample data of the mixture CCPs can be generated by different parameters in Table 3. Six basic CCPs (see Figure 1) and four mixture CCPs (see Figure 3) are used for training and testing the proposed F_MSVM_GA method in this study (Table 2).
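The data generation can be sketched as follows, assuming the additive form "mean plus scaled noise plus abnormal value" described in this section. The functional forms of the abnormal value (sine wave, linear trend, step shift) are the usual choices in the CCP literature, and every numeric default (`mu`, `sigma`, `amplitude`, `period`, `gradient`, `shift`, `shift_at`) is a placeholder, since the actual values of Tables 1 to 3 are not reproduced here. The noise is drawn with `random.gauss`, so the "between −1 and 1" bound mentioned in the text is not enforced.

```python
import math
import random


def disturbance(kind, t, amplitude=2.0, period=8, gradient=0.1, shift=2.0, shift_at=20):
    """Abnormal value d(t) for one basic pattern; all default magnitudes are placeholders."""
    if kind == "NOR":
        return 0.0
    if kind == "CC":
        return amplitude * math.sin(2 * math.pi * t / period)
    if kind in ("IT", "DT"):
        sign = 1.0 if kind == "IT" else -1.0
        return sign * gradient * t
    if kind in ("US", "DS"):
        sign = 1.0 if kind == "US" else -1.0
        return sign * shift if t >= shift_at else 0.0
    raise ValueError(f"unknown pattern: {kind}")


def generate(kinds, n=40, mu=0.0, sigma=1.0, rng=None, **params):
    """One window of n points; a mixture pattern is the sum of its basic disturbances."""
    if rng is None:
        rng = random.Random()
    return [
        mu + rng.gauss(0.0, 1.0) * sigma + sum(disturbance(k, t, **params) for k in kinds)
        for t in range(1, n + 1)
    ]


rng = random.Random(42)
basic = {k: [generate((k,), rng=rng) for _ in range(100)]
         for k in ("NOR", "CC", "IT", "DT", "US", "DS")}
mixture = [generate(("CC", "IT", "DS"), rng=rng) for _ in range(100)]
```

Passing a tuple such as `("CC", "IT")` or `("IT", "DS")` gives the other mixture types in the same way.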
3.2. Parameters of MSVM and GA
The performance of the MSVM is influenced by its parameters. As discussed in Section 2.2, an MSVM based on the RBF kernel function is chosen in this study. The related parameters $C$ and $\gamma$ for this kernel were varied in the fixed ranges [0.1, 100] and [0.01, 10], respectively, so as to cover strong or weak regularization of the classifier and fat or thin kernels. The GA optimization module has several coefficients whose values can be adjusted to produce better performance during training; they are summarized in Table 3.
4. Performance Analyses
In this section, we measure the performance of the proposed recognizer. For this purpose, we generated samples of 10 patterns, 100 of each type; every sample has 40 data points in the observation window. We used about 50% of the samples for training the classifier and the rest for testing. The testing samples are used to estimate the performance of the recognizer for each pattern and then to compute the average recognition accuracy of the CCPs. Several experiments are carried out to verify the effectiveness of the proposed model.
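The evaluation protocol described above can be sketched as a random half split followed by per-pattern and average recognition accuracy. The helper names `split_half` and `recognition_accuracy` are hypothetical, and the random split is an assumption, since the text only states that about 50% of the samples are used for training.

```python
import random
from collections import defaultdict


def split_half(samples, rng):
    """Shuffle a copy of the samples and return (training half, testing half)."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def recognition_accuracy(y_true, y_pred):
    """Per-pattern recognition accuracy in percent, plus the average over patterns."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == predicted)
    per_class = {c: 100.0 * correct[c] / total[c] for c in total}
    average = sum(per_class.values()) / len(per_class)
    return per_class, average
```

For example, with true labels `["NOR", "NOR", "CC", "CC"]` and predictions `["NOR", "CC", "CC", "CC"]`, the per-pattern accuracies are 50% for NOR and 100% for CC, and the average is 75%.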
4.1. Performance of Recognizer in Optimization
First, we applied the MSVM classifier with and without GA optimization. Table 4 shows the recognition accuracy (RA) of the proposed F_MSVM_GA model, which uses the 13 statistical and shape features together with the GA optimization algorithm. In order to demonstrate the superior performance of the proposed F_MSVM_GA scheme, an MSVM using the 13 features as inputs without GA optimization (called F_MSVM) is also constructed; its performance results are shown in Table 5.
As reported in Tables 4 and 5, the average recognition accuracies of F_MSVM_GA and F_MSVM are 97.4% and 91.4%, respectively. The proposed F_MSVM_GA model has better recognition performance for the mixture CCPs, especially for TS and CTS. The genetic algorithm searches for the combination of MSVM classifier parameters that maximizes the fitness, which improves the recognition rate on the testing samples.
4.2. Performance of Recognizer in Different Features
Feature extraction can lead to faster training and higher efficiency in CCPs recognition. Thirteen statistical and shape features are utilized as the inputs in this paper. In order to demonstrate their effectiveness, an MSVM classifier using the original 40 data points as the inputs (called D_MSVM) is constructed. Table 6 shows its recognition accuracy for the mixture CCPs.
The average recognition accuracies of D_MSVM (78.0%) and F_MSVM (91.4%) show that the feature extraction method plays an important role in improving the recognition accuracy. The data show that mixture control chart patterns are difficult to recognize because of the complex relations between their components, but the result is much better after the statistical and shape feature extraction method is applied.
4.3. Comparison of the Proposed Method with BP Method
Artificial neural networks (ANNs) are the most frequently used technique in control chart pattern recognition. They typically utilize a multilayer perceptron with back-propagation training to classify abnormal patterns. Back-propagation (BP) is a common method of training artificial neural networks and is used in conjunction with an optimization method such as gradient descent. We set the number of input neurons to 50 and the number of hidden layers to 5. Table 7 shows the performance results.
Compared with the MSVM models, the accuracy of the BP method is only 65.4%, much lower than that of any of the MSVM methods. The reason is that the BP neural network depends heavily on the quantity and quality of the sample data, but only 50 training samples are considered in this study, which makes this a small-sample, noisy problem.
We have compared the proposed model with the other approaches. The comparison is shown in Figure 4, where the six basic CCPs and the four mixture CCPs are numbered from 1 to 10.
Figure 4: Classification results of (a) F_MSVM_GA, (b) F_MSVM, (c) D_MSVM, and (d) BP.
Control charts are among the most useful tools in statistical process control, and mixture control chart patterns are increasingly encountered in manufacturing and service processes. Recognizing mixture CCPs plays an important role in finding abnormal quality problems. In this study, a hybrid method that integrates statistical and shape feature extraction, an MSVM, and a GA is presented for recognizing mixture CCPs. The proposed method first uses statistical and shape features to obtain effective input data; the combination of MSVM and GA is then applied to recognize the mixture patterns, with the GA optimizing the MSVM penalty and kernel parameters. Six basic CCPs and four mixture CCPs are used in this study for evaluating the performance of the proposed method. The simulation results indicate that the intelligent hybrid method achieves the highest average recognition accuracy among the tested methods.
Future work will focus on the following aspects: (1) comparing the statistical and shape feature extraction method with other feature extraction methods, (2) comparing the GA with other intelligent algorithms, including particle swarm optimization, simulated annealing, and ant colony optimization, (3) researching the fundamental principles of mixture CCPs with the help of mathematicians, and (4) seeking an economic interpretation of our method with the help of economists.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is financially supported by National Natural Science Foundation of China (NSFC) under Grant no. 51175442 and the Fundamental Research Funds for the Central Universities under Grant no. 2682014BR022.
D. C. Montgomery, Introduction to Statistical Quality Control, John Wiley & Sons, New York, NY, USA, 2001.
S.-Y. Yang, D.-H. Wu, and H.-T. Su, “Abnormal pattern recognition method for control chart based on principal component analysis and support vector machine,” Journal of System Simulation, vol. 18, no. 5, pp. 1314–1318, 2006 (Chinese).
C. Wu and L. Zhao, “Control chart pattern recognition based on wavelet analysis and SVM,” China Mechanical Engineering, vol. 21, no. 13, pp. 1572–1576, 2010 (Chinese).