Abstract
The identification of critical-to-quality characteristics (CTQs) for complex products is the first step in implementing quality control. To improve the efficiency and accuracy of CTQs identification, we propose a novel hybrid approach based on mutual information and an improved gravitational search algorithm, which combines the advantages of filter and wrapper methods. First, information relevance and redundancy are measured by mutual information. Then, the improved gravitational search algorithm is used to search for the CTQs. Experiments are carried out on two UCI data sets, and the classification capability of the identified CTQs is tested with SVM and tenfold cross validation. The results show that the proposed method is effective and practically applicable.
1. Introduction
The quality characteristics of complex products, which have complex structures, high technology content, and highly integrated components, are high-dimensional. It is not economically or logistically feasible to control or monitor all of the quality characteristics of such high-dimensional data. Among these characteristics, some are critical-to-quality characteristics (CTQs), which determine the quality of the product, while others are redundant or insignificant [1–4]. Therefore, identifying the CTQs is the key to monitoring, analyzing, and improving the quality of complex products. In this paper, we propose a novel hybrid approach based on mutual information and an improved gravitational search algorithm (MIGSA) for CTQs identification; see Algorithm 1. The rest of this paper is structured as follows. Section 2 reviews the literature on CTQs identification. Some basic concepts of mutual information, the gravitational search algorithm, and rough set theory are described briefly in Section 3. The fundamentals of the proposed hybrid approach to CTQs identification of complex products are described in Section 4. The experimental results and discussion are presented in Section 5. Finally, conclusions are given in Section 6.

2. Literature Reviews of CTQs
According to the literature, research on CTQs identification can be divided into two categories: CTQs in the design phase and CTQs in the manufacturing process.
In the design phase, the design team identifies the voice of the customer (VOC) and determines the CTQs from the engineering characteristics based on the relationship between the VOC and the engineering characteristics. The typical method of CTQs identification is quality function deployment (QFD) [5, 6]. QFD provides a means of translating customer requirements into the appropriate technical requirements for each stage of product development and production [7]. For example, Zhang et al. established a multiple CTQs optimization model by using QFD technology and successfully extracted various influencing factors for multiple quality characteristics [8]. He et al. put forward an approach to CTQs decomposition from customer requirements into critical technical parameters based on the relational tree [6]. Thornton distinguished the degree of importance of the obtained characteristics by adding process data [4, 9]. Rowe presented a methodology compatible with Design for Six Sigma (DFSS) for constructing comprehensive statistical design and process control specifications for CTQs [10].
During the manufacturing process, the CTQs can be identified by using sequential experimental design. Shen and Wan proposed controlled sequential factorial design (CSFD) for discrete-event simulation experiments [11]. Rout and Mittal used a combined array design of experiments approach to screen the factors influencing the performance of a manipulator [12]. Mathieu and Marguet presented an integrated method for CTQs identification based on an assembly directed graph [13]. Whitney proposed the concept of a datum flow chain (DFC), which was used to analyze the factors affecting CTQs [14]. Variation mode and effect analysis (VMEA) was used to identify the noise factors that cause CTQs to fluctuate, with a risk coefficient measuring their importance [15]. Wang et al. proposed a CTQs identification method for multistage manufacturing processes that combines the partial least squares regression (PLSR) method with the state space model [16]. Wu presented an approach to optimizing correlated multiple quality characteristics based on a modified double-exponential desirability function [17].
The above methods have been applied successfully in many fields. However, if the number of factors is large (say more than 30) and the output dimension is relatively high, it is hard to obtain the expected characteristics reduction by using the above methods.
According to the characteristics of the manufacturing process, Yan et al. constructed the relationship between the quality characteristics and the class of each product sample and then used the information gain (IG) methodology to identify the CTQs of high-dimensional complex products [18]. CTQs identification in complex products can be regarded as a feature selection problem, so data mining approaches can be used to solve it. Yan et al. [1] used the ReliefF algorithm to identify CTQs in complex products. The method was shown to be feasible, but the classification accuracy of the results is low (nearly 70%).
The MIGSA method proposed in this paper combines the efficiency of filter methods with the high accuracy of wrapper methods. The following sections provide a more detailed description of the approach.
3. Preliminaries
3.1. Mutual Information
In information theory, the uncertainty of a random variable can be measured by its entropy [19]. Let $X$ be a random variable with discrete values; its entropy is defined as
$$H(X) = -\sum_{x \in X} p(x) \log p(x), \tag{1}$$
where $p(x)$ is the probability mass function of $X$. Then, let $X$ and $Y$ be two discrete random variables with joint probability mass function $p(x, y)$; their joint entropy is
$$H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x, y). \tag{2}$$
Conditional entropy describes the remaining uncertainty of variable $X$ when variable $Y$ is known. It is defined as
$$H(X \mid Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x \mid y). \tag{3}$$
The mutual information is defined to measure the common information of two variables $X$ and $Y$:
$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}. \tag{4}$$
From the above definition, a high value of $I(X; Y)$ means that the two variables $X$ and $Y$ are closely related, whereas a small value means that they are not closely related; in particular, they are totally unrelated when $I(X; Y) = 0$.
According to (1)–(4), the relation between the entropy and the mutual information can be described as
$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y). \tag{5}$$
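The definitions above can be checked numerically on a small sample. The following is a minimal sketch, assuming discrete observations and empirical (plug-in) probability estimates with base-2 logarithms; the function names and the toy data are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy H(X) in bits, estimated from a list of discrete samples."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def joint_entropy(xs, ys):
    """Joint entropy H(X, Y): the entropy of the paired samples."""
    return entropy(list(zip(xs, ys)))

def conditional_entropy(xs, ys):
    """Conditional entropy H(X | Y), computed as H(X, Y) - H(Y)."""
    return joint_entropy(xs, ys) - entropy(ys)

def mutual_information(xs, ys):
    """I(X; Y) computed directly from the definition with empirical probabilities."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Toy data (hypothetical): two binary variables that agree on most samples.
x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 0, 1, 1, 1, 1, 0, 0]
mi = mutual_information(x, y)
```

On this sample, `mi` agrees with both `entropy(x) - conditional_entropy(x, y)` and `entropy(x) + entropy(y) - joint_entropy(x, y)` up to rounding, which is the relation between entropy and mutual information stated above.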
3.2. Gravitational Search Algorithm
The gravitational search algorithm (GSA) is a heuristic search algorithm recently proposed by Rashedi et al. [20], inspired by the Newtonian laws of gravity and motion. As a stochastic population-based heuristic optimization tool, GSA provides an iterative method that simulates mass interactions and moves through a multidimensional search space under the influence of gravitation. In GSA, agents are considered as objects and their performance is measured by their masses. The algorithm is introduced as follows [20, 21].
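For a system with $N$ agents, the position of the $i$th agent is defined as
$$X_i = (x_i^1, \ldots, x_i^d, \ldots, x_i^n), \quad i = 1, 2, \ldots, N, \tag{6}$$
where $x_i^d$ represents the position of the $i$th agent in the $d$th dimension and $n$ is the dimension of the search space.
The velocity and position of the $i$th agent in the $d$th dimension can be updated using (7) and (12):
$$v_i^d(t+1) = \mathrm{rand}_i \times v_i^d(t) + a_i^d(t), \tag{7}$$
where $\mathrm{rand}_i$ is a uniform random number in the interval $[0, 1]$, which is utilized to give a randomized characteristic to the search. The acceleration $a_i^d(t)$ can be calculated as follows:
$$a_i^d(t) = \frac{F_i^d(t)}{M_i(t)}, \tag{8}$$
where $F_i^d(t)$ is the total force exerted on agent $i$ in the $d$th dimension and $M_i(t)$ is the mass of agent $i$ at time $t$.
The force acting on agent $i$ from agent $j$ at time $t$ is defined as
$$F_{ij}^d(t) = G(t)\, \frac{M_i(t)\, M_j(t)}{R_{ij}(t) + \varepsilon}\, \bigl(x_j^d(t) - x_i^d(t)\bigr), \tag{9}$$
where $G(t)$ is the gravitational constant at time $t$, $\varepsilon$ is a small constant, and $R_{ij}(t)$ is the distance between agents $i$ and $j$. Then, $F_i^d(t)$ and $M_i(t)$ can be calculated:
$$F_i^d(t) = \sum_{j=1,\, j \neq i}^{N} \mathrm{rand}_j\, F_{ij}^d(t), \qquad m_i(t) = \frac{\mathrm{fit}_i(t) - \mathrm{worst}(t)}{\mathrm{best}(t) - \mathrm{worst}(t)}, \qquad M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}, \tag{10}$$
where $\mathrm{rand}_j$ is a random number in the interval $[0, 1]$, $\mathrm{fit}_i(t)$ is the fitness value of agent $i$ at time $t$, and $\mathrm{best}(t)$ and $\mathrm{worst}(t)$ are the best and worst fitness values at time $t$, respectively.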
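The mass, force, and velocity computations described above can be sketched as one update step of continuous GSA. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard formulation of Rashedi et al., in which the normalised mass is $M_i = m_i / \sum_j m_j$ with $m_i = (\mathrm{fit}_i - \mathrm{worst})/(\mathrm{best} - \mathrm{worst})$, the pairwise force is proportional to $G\, M_i M_j / (R_{ij} + \varepsilon)$, the distance $R_{ij}$ is Euclidean, fitness is maximised, and the continuous position update is $x \leftarrow x + v$. The names `masses` and `gsa_step` are hypothetical.

```python
import math
import random

def masses(fitness):
    """Normalised masses M_i from fitness values (larger fitness is better)."""
    best, worst = max(fitness), min(fitness)
    if best == worst:                        # all agents equally good
        return [1.0 / len(fitness)] * len(fitness)
    m = [(f - worst) / (best - worst) for f in fitness]
    total = sum(m)
    return [mi / total for mi in m]

def gsa_step(X, V, fitness, G, eps=1e-9, rng=random):
    """One continuous GSA update; returns the new positions and velocities."""
    N, n = len(X), len(X[0])
    M = masses(fitness)
    new_X, new_V = [], []
    for i in range(N):
        acc = [0.0] * n
        for j in range(N):
            if j == i:
                continue
            R = math.dist(X[i], X[j])        # Euclidean distance R_ij
            r = rng.random()                 # rand_j, one draw per neighbour
            for d in range(n):
                # a_i^d = F_i^d / M_i: the factor M_i cancels, so a zero
                # mass (the worst agent) never causes a division by zero.
                acc[d] += r * G * M[j] * (X[j][d] - X[i][d]) / (R + eps)
        ri = rng.random()                    # rand_i for the velocity update
        v = [ri * V[i][d] + acc[d] for d in range(n)]
        new_V.append(v)
        new_X.append([X[i][d] + v[d] for d in range(n)])
    return new_X, new_V
```

With two agents of which one has the best fitness, the better agent carries all the mass and does not move, while the other is pulled toward it; heavier (fitter) agents therefore act as attractors for the rest of the population.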
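According to the way the position is updated, GSA can be divided into continuous GSA (CGSA) and binary GSA (BGSA). In the binary algorithm, updating the position means switching between the values "0" and "1". In the implementation of BGSA, a large velocity must give a high probability of changing the position of the mass with respect to its previous position, and a small velocity must give a small probability of changing the position. So, $v_i^d(t)$ can be transformed into a probability function as follows:
$$S\bigl(v_i^d(t)\bigr) = \bigl|\tanh\bigl(v_i^d(t)\bigr)\bigr|. \tag{11}$$
Then, the agents move according to the following rule:
$$x_i^d(t+1) = \begin{cases} \overline{x_i^d(t)}, & \text{if } \mathrm{rand} < S\bigl(v_i^d(t+1)\bigr), \\ x_i^d(t), & \text{otherwise}, \end{cases} \tag{12}$$
where $\overline{x_i^d(t)}$ denotes the complement of the bit $x_i^d(t)$ and $\mathrm{rand}$ is a uniform random number in $[0, 1]$.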
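The binary position update can be illustrated with a few lines of code. The sketch below assumes the transfer function $|\tanh(v)|$ commonly used in the binary GSA literature to turn a velocity into a flip probability; the function name `bgsa_move` is hypothetical.

```python
import math
import random

def bgsa_move(x, v, rng=random):
    """Flip each bit of x with probability |tanh(v_d)|; otherwise keep it."""
    new_x = []
    for bit, vel in zip(x, v):
        if rng.random() < abs(math.tanh(vel)):
            new_x.append(1 - bit)    # complement of the current bit
        else:
            new_x.append(bit)        # position unchanged in this dimension
    return new_x
```

A zero velocity never changes a bit, and a very large velocity (positive or negative) flips it almost surely, which matches the requirement that large velocities give a high probability of changing the position.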
4. The Approach to CTQs Identification
4.1. Characteristics Ranking by Mutual Information
In information theory, mutual information is used to quantitatively analyze the relationship between two characteristics or between a characteristic and a class variable. The characteristics are split into two subsets: the already selected subset $S$ and the unselected subset $F$. Among all the characteristics in $F$, the characteristic $f$ that carries the most information about the classes not already provided by the selected characteristics is moved into $S$ [19]. The mutual information of characteristic $f$ is estimated as in [22], where $f \in F$ and $s \in S$ is a characteristic of the already selected subset.
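One way to realise such a ranking is greedy forward selection with a relevance-minus-redundancy score. The sketch below is an assumption and may differ from the estimator of [22]: it uses an mRMR-style score, namely the mutual information between a candidate and the class minus the average mutual information between the candidate and the characteristics already selected. The helper names and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def rank_characteristics(columns, labels, k):
    """Greedily pick k column indices by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(mutual_information(columns[f], columns[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy
        best = max(remaining, key=score)     # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: four classes encoded by a high bit (column 0) and a low bit
# (column 2); column 1 merely duplicates column 0.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
top2 = rank_characteristics(columns, labels, 2)
```

In this toy case all three columns are equally relevant to the class, but after column 0 is selected the duplicate column 1 is fully redundant, so the complementary column 2 is chosen second.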
4.2. Improved GSA
We use the GSA method to identify the CTQs for complex products. GSA has been reported to outperform PSO and GA in most cases [20]. However, like other intelligent algorithms, GSA is prone to premature convergence. To overcome this drawback, we use opposition-based learning and an immune strategy to improve GSA (IGSA).
The concept of opposition-based learning was introduced by Tizhoosh [23]. The main idea behind opposition-based learning is the simultaneous consideration of an estimate and its corresponding opposite estimate in order to achieve a better approximation of the current candidate solution [24]. Let $x$ be a real number defined on a certain interval: $x \in [a, b]$. The opposite number $\breve{x}$ is defined as follows:
$$\breve{x} = a + b - x.$$
Analogously, the opposite number in a multidimensional case can be defined.
The immune algorithm (IA) is an optimization method based on the characteristics of the biological immune system. It was proposed by Farmer, Packard, and Perelson in 1986, when they discussed links between the immune system and other artificial intelligence methods. It has the capability to control a complex system [25] and has been applied in many fields. In the IA, the antigen represents the problem to be solved. A population of antibodies is generated, each member of which represents a candidate solution. Affinity is the fitness of an antibody to the antigen. The key to IA is how to generate antibodies [26].
The immune strategy can effectively maintain population diversity and improve the convergence speed through immune recognition and immune memory. In the immune strategy, affinity is used to describe the information contained in an antibody [25]. The affinity function is computed from the dimension of the search space, the number of antibodies, and the best position found so far. Then the affinity between each memory antibody and the best position is calculated.
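The opposite point and its use as a second candidate can be sketched as follows. This is an illustration under assumptions: the opposite of each coordinate is taken as lower bound plus upper bound minus the coordinate, fitness is maximised, and the better of each candidate and its opposite is kept. How exactly the paper embeds this step into GSA is not specified here, so `opposition_step` is a hypothetical helper.

```python
def opposite(x, lower, upper):
    """Opposite point: coordinate x_d maps to lower_d + upper_d - x_d."""
    return [a + b - xd for xd, a, b in zip(x, lower, upper)]

def opposition_step(population, fitness, lower, upper):
    """For every candidate, keep the better of itself and its opposite."""
    result = []
    for x in population:
        x_opp = opposite(x, lower, upper)
        result.append(x if fitness(x) >= fitness(x_opp) else x_opp)
    return result
```

For binary positions with bounds 0 and 1, the opposite point is simply the bitwise complement of the position.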
4.3. Implementation of Hybrid Approach
4.3.1. Representation of Position
Assume that there are $m$ characteristics in total; there are then $2^m$ different characteristic subsets. If each agent represents one characteristic subset, the agent's position can be encoded as a binary bit string of length $m$ in which every bit corresponds to one characteristic: 1 means that the characteristic is selected, and 0 means that it is not.
4.3.2. Fitness Function
The CTQs subset should not only be short but also have a high classification quality [27]. So the fitness function can be defined as follows:
$$\mathrm{Fitness} = \alpha\, \gamma_R(D) + \beta\, \frac{|C| - |R|}{|C|},$$
where $\gamma_R(D)$ is the classification quality of the condition attribute set $R$ relative to the decision $D$ in rough set theory, $|R|$ is the length of the selected characteristic subset, and $|C|$ is the total number of characteristics. $\alpha$ and $\beta$ are two parameters corresponding to the importance of classification quality and subset length, with $\alpha \in [0, 1]$ and $\beta = 1 - \alpha$. A high $\alpha$ ensures that the best position is at least a real rough set reduction. The quality of each position is evaluated with this formula, and the goal is to maximize the fitness value [27].
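The fitness evaluation of a binary position can be sketched as follows. This is a minimal illustration, assuming discrete characteristic values, the rough-set classification quality taken as the fraction of samples whose values on the selected characteristics determine the decision unambiguously, and a fitness of the form $\alpha\,\gamma + (1-\alpha)\,(\text{total} - \text{selected})/\text{total}$. The default `alpha=0.9`, the function names, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict

def classification_quality(rows, decisions, mask):
    """Rough-set dependency degree for the characteristics whose mask bit is 1."""
    groups = defaultdict(set)    # equivalence class -> decisions seen in it
    sizes = defaultdict(int)     # equivalence class -> number of samples
    for row, d in zip(rows, decisions):
        key = tuple(v for v, bit in zip(row, mask) if bit)
        groups[key].add(d)
        sizes[key] += 1
    # Positive region: samples in classes that map to a single decision.
    positive = sum(sizes[k] for k, ds in groups.items() if len(ds) == 1)
    return positive / len(rows)

def fitness(rows, decisions, mask, alpha=0.9):
    """alpha * quality + (1 - alpha) * (total - selected) / total."""
    total, selected = len(mask), sum(mask)
    gamma = classification_quality(rows, decisions, mask)
    return alpha * gamma + (1 - alpha) * (total - selected) / total

# Toy data: the first characteristic alone determines the decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
f_small = fitness(rows, decisions, [1, 0, 0])
f_all = fitness(rows, decisions, [1, 1, 1])
f_bad = fitness(rows, decisions, [0, 1, 0])
```

Here `f_small` exceeds `f_all` because both subsets classify every sample consistently but the first is shorter, and both exceed `f_bad`, whose single characteristic cannot separate the decisions.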
4.3.3. Hybrid Approach (MIGSA)
The proposed method has two major parts. The first ranks the characteristics by mutual information, and the second finds the optimal subset of characteristics with the improved GSA, based on the result of the first part. Let $k$ be the number of characteristics retained by the mutual information ranking. This selection reduces the dimension of the characteristic space before the search starts.
5. Experiments
For our experiments, we implemented the hybrid approach for CTQs identification in MATLAB 7.10.0. We tested the algorithm on two datasets from the UCI machine learning repository: the SECOM dataset and the Statlog (LS) dataset. Brief information about them is given in Table 1. The SECOM dataset describes a semiconductor manufacturing process. It contains 1567 instances in two classes, including 104 fails, and has some missing values. The Statlog (LS) dataset contains Landsat satellite data. It has 6435 instances in 6 classes, with 1533, 703, 1358, 626, 707, and 1508 instances per class, respectively. The SECOM dataset is used to test the capability of the proposed method, and the Statlog (LS) dataset is used for comparison with other methods in the literature.
Before the hybrid approach is run, the data are preprocessed, which includes handling missing values and balancing the classes. Then the mutual information between each characteristic and the class variable is calculated. After that, the top-ranked characteristics are selected and passed to the improved GSA for optimization. The program terminates when the algorithm reaches the stopping criterion. The parameter configuration of GSA is given in Table 2.
Figure 1 shows the evolution of the global best on SECOM and Statlog (LS) for IGSA and GSA. From Figure 1, we find that IGSA overcomes the premature convergence of GSA. Then, SVM is used as the classifier, and the classification accuracy of the CTQs is estimated by tenfold cross validation. The results are given in Table 3. From Table 3, we can see that the number of CTQs is 11 and the classification accuracy is 88.3% for the SECOM dataset, while the number of CTQs is 5 and the classification accuracy is 84.9% for the Statlog (LS) dataset.
To assess the capability of the proposed method, it is compared with four algorithms from the literature (MRPSO, BPSO, CBPSO, and ReliefF) [28] in two respects, the number of CTQs and the accuracy, using Statlog (LS). The results are listed in Table 4. From Table 4, we can see that the accuracies obtained by the five methods are similar. However, the numbers of CTQs are significantly different. Of the four existing methods, the best is MRPSO with 85.24 percent accuracy and 16 CTQs, while the proposed MIGSA reaches 84.9 percent accuracy with only 5 CTQs. Hence, the proposed approach is an efficient method of CTQs identification.
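The tenfold cross-validation protocol used to estimate accuracy can be sketched independently of the classifier. The code below is a minimal illustration, not the experimental setup of the paper: the classifier is passed in as a callback, and a 1-nearest-neighbour rule is used as a stand-in because an SVM is beyond the scope of a short example. All names and the toy data are hypothetical, and the folds are not stratified.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit_predict, k=10, seed=0):
    """Mean accuracy over k folds.

    fit_predict(train_X, train_y, test_X) must return the predicted labels.
    """
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        scores.append(correct / len(fold))
    return sum(scores) / k

def nearest_neighbour(train_X, train_y, test_X):
    """Stand-in classifier (1-NN, squared Euclidean distance); not an SVM."""
    def predict(x):
        j = min(range(len(train_X)),
                key=lambda t: sum((a - b) ** 2 for a, b in zip(train_X[t], x)))
        return train_y[j]
    return [predict(x) for x in test_X]

# Toy data: two well-separated one-dimensional classes.
X = [[float(i)] for i in range(20)] + [[100.0 + i] for i in range(20)]
y = [0] * 20 + [1] * 20
acc = cross_val_accuracy(X, y, nearest_neighbour)
```

To evaluate a candidate CTQs subset in this way, the columns of `X` would first be restricted to the selected characteristics, and `nearest_neighbour` would be replaced by a callback that trains and applies the chosen classifier.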
6. Conclusions
To identify the CTQs of complex products, we propose a hybrid approach based on mutual information and GSA. Because GSA often suffers from premature convergence, we improve the algorithm through opposition-based learning and an immune strategy. First, we compare the improved method IGSA with the original GSA, and the experimental results show that IGSA has a stronger search capability. Then, MIGSA is compared with four methods from the literature; the experimental results show that it greatly reduces the number of CTQs while achieving similar accuracy. The experiments indicate that MIGSA is an effective method for identifying the CTQs of complex products.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is financially supported by the National Science Fund for Distinguished Young Scholars of China (no. 71225006) and National Natural Science Foundation of China (nos. 71071107 and 70931004). The authors thank the editor and reviewers for their insightful comments and suggestions.