Abstract
Engine ignition patterns can be analyzed to identify engine faults according to both specific prior domain knowledge and the shape features of the patterns. One of the challenges in ignition system diagnosis is that more than one fault may appear at a time. This kind of problem is referred to as simultaneous-fault diagnosis. Another challenge is the acquisition of a large number of costly simultaneous-fault ignition patterns for constructing the diagnostic system, because the number of training patterns depends on the combination of different single faults. These problems can be resolved by the proposed framework, which combines feature extraction, probabilistic classification, and decision threshold optimization. With the proposed framework, the features of the single faults in a simultaneous-fault pattern are extracted and then detected using a new probabilistic classifier, namely, the pairwise coupled relevance vector machine, which is trained with single-fault patterns only. Therefore, a training dataset of simultaneous-fault patterns is not necessary. Experimental results show that the proposed framework performs well for both single-fault and simultaneous-fault diagnoses and is superior to the existing approach.
1. Introduction
1.1. Background of Engine Ignition Patterns
Although automotive engine ignition systems vary in construction, they are similar in basic operation. All of them have a primary circuit that causes a spark in the secondary circuit, which is then delivered to the correct spark plug at the proper time. The conditions inside the ignition system and the cylinder also affect the ignition pattern in the secondary circuit. Consequently, the ignition patterns reflect the conditions within the ignition system and help pinpoint its faults [1], such as wide or narrow spark-plug gaps and open spark-plug cables. After capturing the ignition pattern, the automotive mechanic compares the features of the captured pattern with samples from handbooks for diagnosis [2, 3]. This procedure is called ignition system diagnosis. However, the automotive mechanic faces several challenges, which are as follows.
(1) The engine ignition pattern is time dependent. Different engine models produce ignition patterns of various amplitudes and durations for the same kind of fault. Even the same engine may produce slightly different shapes of ignition patterns in each engine cycle due to engine speed fluctuation and various testing conditions. Therefore, there is no exact scale and duration for the sample patterns in the handbooks. Hence, the traditional diagnosis merely relies on prior domain knowledge and the engineer's experience.
(2) Practically, engine ignition-system diagnosis is a simultaneous-fault problem, but many handbooks only provide single-fault patterns for reference. To determine simultaneous faults, the engineer can only extract and analyze some specific features of single-fault patterns from a simultaneous-fault pattern, such as frequency, firing voltage, and burn time, and make a decision about the presence of simultaneous faults according to experience and knowledge.
(3) As suggested in the existing literature [1–3], ignition-system diagnosis based on the shape features and the prior domain knowledge of the ignition pattern cannot give a definite answer, because many possible faults may occur individually or simultaneously. The handbooks do not rank the possible faults by probability. Therefore, to find a fault based on ignition patterns, many trials of disassembling and assembling engine parts are often necessary unless the engineer is very experienced.
To tackle these challenges, an effective feature extraction method for engine ignition patterns is required, which combines domain knowledge (DK), time-frequency decomposition, and dimension reduction techniques. Moreover, an advanced probabilistic classifier is necessary to rank the possible faults and provide reliable diagnostic results. In recent years, some intelligent diagnostic methods based on pattern recognition have been developed for multiclass fault diagnosis (i.e., single-fault diagnosis, because only a single fault is identified) of mechanical systems [4–9]. Generally, these methods include two steps: feature extraction and classification.
1.2. Feature Extraction Methods
Feature extraction is very important because the in-depth and hidden features of single-fault patterns can be detected through frequency sub-band decomposition. In the existing literature, many classical feature extraction techniques have been applied to fault diagnosis; the most typical one is the fast Fourier transform (FFT) [10–13]. However, its main drawback is its unsuitability for nonstationary patterns. Wavelet packet transform (WPT) [1, 4, 14–19] is another popular time-frequency localization analysis method that has been widely used in the past decade. By means of multiscale analysis, WPT can be successfully applied to nonstationary patterns, based on sub-band coding and a systematic decomposition of a pattern into its sub-band levels for pattern analysis. Therefore, WPT is employed in this research for feature extraction.
Nevertheless, one drawback of WPT is that the size of the extracted features is larger than or equal to that of the original pattern. If the original pattern is of high dimension, the large number of extracted features may cause two issues: (1) the trained classifiers become highly complex because of the huge number of inputs; (2) many of the extracted features may be redundant or unimportant, so that noise is introduced. Both issues can degrade the classifier performance. Therefore, it is suggested to compensate for this drawback by employing a dimension reduction technique such as principal component analysis (PCA) [20–22]. In this research, PCA is selected as the dimension reduction technique for simple illustration purposes. More advanced techniques could be considered in the future. Compared to other dimension reduction techniques, PCA has three advantages: (1) it has no hyperparameter; (2) it eliminates the linear interaction of variables because the principal components are uncorrelated with each other; (3) the principal components are sorted by their information weights, so the unimportant principal components can be discarded. Then, the feature extraction approach of WPT+PCA can transform an original ignition pattern into a feature vector of reduced dimension while retaining most of the information content.
1.3. Classification Methods
For classification, a fault can be considered as a label, no matter whether it is a single fault or a simultaneous fault. To date, there has been little research on simultaneous-fault diagnosis. The typical classification method for simultaneous-fault diagnosis is to build a number of classifiers according to the combinations of all possible faults; this method is called mono-label classification [23]. However, it is practically difficult to obtain training data for all possible combinations, particularly for ignition patterns. Normally, the number of combinations of all faults in an engineering problem is very large, which affects the diagnostic accuracy because the complexity of the classifiers also increases immensely. Moreover, if a new single fault is added in the future, the number of required simultaneous-fault training patterns grows significantly. To overcome this drawback, Yélamos et al. [23] proposed a binarization strategy using the support vector machine (SVM) and applied it to simultaneous-fault diagnosis of a simulated chemical process based on time-independent data, in which the labels of the single faults or simultaneous faults were processed as binary vectors, that is, 0 or 1 only. For each label, a binary classifier was constructed using SVM with a one-versus-all splitting strategy. Given an unknown pattern, the classifier would output a vector of binary results (0 or 1). With this approach, only single-fault patterns are used for training the classifiers while simultaneous-fault patterns are not necessary. The experimental results showed that the overall accuracy of their binarization approach is almost the same as that of the traditional mono-label approach.
This kind of binarization approach is attractive but still suffers from several drawbacks: (1) the approach assumes that informative features are obvious and available, which is not always the case for time-dependent signal patterns, so the approach is not directly suitable for ignition patterns; (2) the one-versus-all strategy ignores the pairwise correlation between the labels, and hence the classification accuracy is degraded; (3) the approach only reports the presence or absence of a fault; when the corresponding output is close to the classification margin, it gives no measure of confidence in the classification, that is, the degree of belief in each fault.
From a practical point of view, a proper classifier has to offer the probabilities of all possible faults. Then the user can at least trace the other possible faults according to the rank of their probabilities when the fault(s) predicted by the classifier turn out to be incorrect. Therefore, it is better to employ a probabilistic classifier for simultaneous-fault diagnosis. The probabilistic structure is also suitable for faults with uncertainty, as in engine ignition-system diagnosis. Typically, the probabilistic neural network (PNN) [24, 25] has been employed as a probabilistic classifier. It was shown in [24] that the performance of PNN is superior to that of the SVM-based method for multilabel classification. However, the main drawback of PNN lies in the limited number of inputs, because the complexity of the network and the training time depend heavily on the number of inputs. Recently, Widodo et al. [6] proposed to apply an advanced classifier, namely, the relevance vector machine (RVM), to fault diagnosis of low-speed bearings. They showed that RVM is superior to SVM in terms of diagnostic accuracy. Besides, RVM can also handle regression problems [26]. RVM is a statistical learning method proposed by Tipping [27], which trains a probabilistic classifier with a sparser model using a Bayesian framework. RVM can be extended to a multiclass version using the one-versus-all (1vA) strategy. However, this strategy was shown to produce a large region of indecision [28, 29]. In view of this drawback, this research is the first in the literature to incorporate pairwise coupling, that is, the one-versus-one (1v1) strategy, into RVM, giving the pairwise coupled relevance vector machine (PCRVM). As PCRVM considers the correlation between every pair of fault labels, a more accurate estimate of label probabilities for simultaneous-fault signals can be achieved.
1.4. Decision Threshold Optimization
If a probabilistic classifier is applied to fault detection, the predicted fault is usually inferred as the one with the largest probability. An alternative approach is that the probabilistic classifier ranks all the possible faults according to their probabilities and lets the engineer make a decision. These inference approaches work well for single-fault detection but fail to determine which faults occur simultaneously in the simultaneous-fault problem, because the engineer cannot identify the number of simultaneous faults from the output probability of each label. For instance, an output probability vector for five labels is given as [0.21, 0.5, 0.69, 0.01, 0.6]. In this example, it is difficult for the engineer to judge whether the simultaneous faults are labels 2, 3, and 5. To identify the number of simultaneous faults, a decision threshold must be introduced, and thus a new step of decision threshold optimization is proposed in the current framework in addition to feature extraction and probabilistic classification.
1.5. Research Objectives and the Proposed Framework
Currently, very little research examines whether the features of single-fault ignition patterns are reflected in the ignition patterns of simultaneous faults. If they are, some reasonable (not all) simultaneous faults are likely to be identified based on the prior domain knowledge and the features of single-fault ignition patterns. In other words, the features of the single faults in a simultaneous-fault pattern could be detected and then classified using a probabilistic classifier trained with single-fault patterns only. Under this concept, simultaneous-fault patterns are not necessary for training the classifiers. When a new single fault is added in the future, the diagnostic system can be easily extended because the issue of combinations of single faults has been eliminated. To verify the feasibility and determine the best feature extraction method, this research proposes to extract the important knowledge-specific, time-domain, and frequency-domain features of the single-fault patterns using combinations of WPT+PCA, FFT, and DK. Then the pairwise coupled probabilistic classifier is trained using a training dataset of these extracted single-fault features in order to identify simultaneous faults in reasonable unseen patterns. Therefore, a feasibility study of this idea for simultaneous-fault diagnosis is an important contribution of this research. Another important contribution is the reduction of the training patterns required for simultaneous-fault diagnosis.
This paper is organized as follows. The proposed framework and the related techniques are described in Section 2. In Section 3, the experimental setup is presented, followed by the results and a comparison with the latest approach [23] in Section 4 and a discussion in Section 5. Finally, a conclusion is given in Section 6.
2. Proposed Framework and Related Techniques
The proposed diagnosis framework (Figure 1) includes three steps: feature extraction, classification, and threshold optimization. The framework is general, so different feature extraction, probabilistic classification, and threshold optimization techniques could be adopted. In this paper, FFT, WPT, and PCA are examined in the feature extraction step; their detailed descriptions can be found in [22, 30, 31]. In addition, each of these techniques is combined with time-related domain knowledge (DK) for a comprehensive comparison.
2.1. Formulation of the Proposed Framework
Given a sample dataset $D = \{(\mathbf{x}_i, \mathbf{l}_i)\}$ of (single-fault or simultaneous-fault) patterns, $i = 1$ to $N$, $\mathbf{l}_i = [l_{i1}, l_{i2}, \ldots, l_{id}]$ is the vector of single-fault labels of the pattern $\mathbf{x}_i$, and $d$ is the number of single faults. There may be more than one fault in $\mathbf{x}_i$, so that $l_{ij} \in \{0, 1\}$ for $j = 1$ to $d$. In Figure 1, the sample dataset is divided into three groups: a training dataset, a validation dataset, and a test dataset, where the training dataset involves only single-fault patterns.
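After applying feature extraction techniques to the patterns $\mathbf{x}_i$, a set of feature vectors is produced. A training dataset of single-fault patterns only (no simultaneous-fault patterns are necessary) is selected to train a multilabel classifier $C$ using a probabilistic classification algorithm. Then $C$ takes an unknown feature vector $\mathbf{x}$ as input and outputs a probability vector $\boldsymbol{\rho} = [\rho_1, \rho_2, \ldots, \rho_d]$, where $d$ is the number of single-fault labels. Here $\rho_i$ denotes the probability that $\mathbf{x}$ belongs to the $i$th label for $i = 1$ to $d$. Since every $\rho_i$ is an independent probability, $\sum_{i=1}^{d} \rho_i$ is not necessarily equal to one. At this stage, the diagnostic system can provide the probability vector $\boldsymbol{\rho}$ to the user as a quantitative measure for reference and further use. Afterwards, the multilabel decision vector $\mathbf{y} = [y_1, y_2, \ldots, y_d]$ is constructed from $\boldsymbol{\rho}$ using (1):

$$ y_i = \begin{cases} 1, & \rho_i \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{1} $$

where $\varepsilon$ is a user-defined decision threshold and $y_i$ indicates whether $\mathbf{x}$ belongs to the $i$th label (Figure 2). For example, taking the probability vector from Section 1.4, if $\boldsymbol{\rho} = [0.21, 0.5, 0.69, 0.01, 0.6]$ and $\varepsilon = 0.5$, then $\mathbf{y} = [0, 1, 1, 0, 1]$. Therefore, $\mathbf{x}$ is diagnosed as a simultaneous fault of labels 2, 3, and 5. Notice that $\mathbf{y} = [0, 0, \ldots, 0]$ indicates that no fault has been found, and hence the unseen instance is diagnosed as a normal pattern.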
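The thresholding step can be illustrated with a minimal Python sketch. The function names `decision_vector` and `diagnosed_labels` are hypothetical, and the probability vector is the five-label example from Section 1.4 with an assumed threshold of 0.5.

```python
def decision_vector(probabilities, threshold=0.5):
    """Map per-label probabilities to 0/1 decisions (fault present if p >= threshold)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def diagnosed_labels(decisions):
    """Return the 1-based indices of detected labels; an empty list means a normal pattern."""
    return [i + 1 for i, y in enumerate(decisions) if y == 1]


rho = [0.21, 0.5, 0.69, 0.01, 0.6]      # example vector from Section 1.4
y = decision_vector(rho, threshold=0.5)  # assumed threshold, not an optimized value
print(y, diagnosed_labels(y))
```

With this threshold the sketch flags labels 2, 3, and 5; a different threshold would change the number of detected faults, which is why the threshold is optimized later in the framework.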
2.2. Extraction of Prior Domain Knowledge Features for Ignition Patterns
When an engine starts firing, its secondary coil produces a rapid high voltage that causes the spark plug to produce a spark. This high voltage is called the firing voltage. Then the spark voltage decreases to zero. The spark voltage is the voltage required to maintain the spark for the duration of the spark line. This duration is called the burn time. After the burn time, the energy in the ignition coil is nearly exhausted, and the residual energy forms a slight oscillation in the ignition coil. The entire procedure is shown in Figure 3. Using the ignition pattern to diagnose engine faults is a common diagnostic method for automotive engineers. With reference to some handbooks [2, 3], the following prior domain knowledge about a pattern can be observed for engine fault diagnosis (Figure 3): (1) firing voltage $F_v$; (2) burn time $B_t$; (3) average spark voltage $S_v$.
In this study, all patterns start from the firing voltage, which is at the first sampling point:

$$ F_v = v_1 \tag{2} $$

where $v_1$ is the voltage of the first sampling point. Ideally, the burn time starts from the spark voltage and ends at the position where the spark voltage falls to zero. However, in practice, the voltage may oscillate slightly after the burn time, so that an exact zero value may not be reached. In this study, when the voltage falls to 0.1% of the firing voltage, it is considered to be zero and the burn time ends. The feature $B_t$ can be obtained as illustrated in Figure 4, where $k$ indicates the end point of the burn time and $n$ is the length of the patterns. With the index $k$ and time step $\Delta t$, the burn time is $B_t = k \, \Delta t$ and the average spark voltage of the spark line can be calculated as follows:

$$ S_v = \frac{1}{k} \sum_{i=1}^{k} v_i \tag{3} $$
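As an illustration of these three domain-knowledge features, the following Python sketch computes them from a list of sampled voltages. It is a sketch under stated assumptions: the function name `domain_features` is hypothetical, the burn time is taken as the number of samples before the voltage first falls to 0.1% of the firing voltage multiplied by the time step, and the average spark voltage is averaged over those same samples (including the first one).

```python
def domain_features(voltages, dt, zero_fraction=0.001):
    """Return (firing voltage, burn time, average spark voltage) of one pattern.

    voltages: sampled secondary voltage, starting at the firing voltage
    dt: sampling time step in seconds
    zero_fraction: fraction of the firing voltage treated as zero (0.1% in the text)
    """
    firing_voltage = voltages[0]
    cutoff = zero_fraction * abs(firing_voltage)

    # k = index of the first sample at or below the cutoff; if the cutoff is
    # never reached, fall back to the full pattern length (an assumption).
    k = len(voltages)
    for i in range(1, len(voltages)):
        if abs(voltages[i]) <= cutoff:
            k = i
            break

    burn_time = k * dt
    average_spark_voltage = sum(voltages[:k]) / k
    return firing_voltage, burn_time, average_spark_voltage


# Toy pattern sampled at 100 kHz (dt = 1e-5 s); the values are made up.
fv, bt, sv = domain_features([10.0, 2.0, 2.0, 1.0, 0.0, 0.0], dt=1e-5)
print(fv, bt, sv)
```

Whether the firing-voltage sample itself belongs in the average is a design choice that the text does not settle; excluding it only requires slicing from index 1.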
2.3. Feature Extraction Using WPT and PCA and Combined Feature Vector
WPT is a generalization of wavelet decomposition that offers a richer signal analysis [31]. It is well known that WPT can extract time-frequency features of a signal pattern. Given a set of patterns $\mathbf{x}_i$, $i = 1$ to $N$, WPT transforms an ignition pattern $\mathbf{x}$ of length $n$ into a set of $2^j$ coefficient packets at level $j$, each of size $\lceil n / 2^j \rceil$, where $\lceil \cdot \rceil$ is the ceiling function and $j = 1$ to $J$. Then, these packets are concatenated as $\mathbf{x}_{\mathrm{WPT}}$, the extracted features of the pattern $\mathbf{x}$. It is believed that the in-depth and hidden features of the single-fault patterns can be detected through the coefficient packets after WPT decomposition. WPT is applied to every $\mathbf{x}_i$ to form a set of features $X_{\mathrm{WPT}} = \{\mathbf{x}_{\mathrm{WPT},i}\}$, $i = 1$ to $N$.
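To make the decomposition concrete, here is a minimal pure-Python sketch of a Haar wavelet packet transform. It is an illustration only, not the toolbox routine used in the experiments: the function names are hypothetical, odd-length inputs are zero-padded (one of several possible boundary treatments), packets are returned in natural rather than frequency order, and only the packets of the final level are kept.

```python
import math


def haar_step(signal):
    """One orthonormal Haar analysis step: return (approximation, detail)."""
    if len(signal) % 2:
        signal = signal + [0.0]  # zero-pad odd-length input (assumed boundary rule)
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wpt(signal, level):
    """Full wavelet packet tree: return the 2**level packets at the given level."""
    packets = [list(signal)]
    for _ in range(level):
        next_packets = []
        for packet in packets:
            approx, detail = haar_step(packet)  # both branches are split, unlike plain DWT
            next_packets.extend([approx, detail])
        packets = next_packets
    return packets


packets = haar_wpt([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], level=2)
features = [c for packet in packets for c in packet]  # concatenated feature vector
print(len(packets), len(features))
```

With this padding rule, a pattern of 18,000 points decomposed to level 9 yields 512 packets of 36 coefficients each, that is, 18,432 features, which illustrates why the feature size is at least that of the original pattern.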
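Usually, the dimension of $\mathbf{x}_{\mathrm{WPT}}$ is large and a certain number of the features may be redundant. Therefore, PCA is employed for dimension reduction of $X_{\mathrm{WPT}}$ while retaining its important information. The details of PCA can be found in [22]. After applying PCA to $X_{\mathrm{WPT}}$, a set of eigenvectors $\mathbf{v}_k$ and eigenvalues $\lambda_k$ is returned, which represent the transformation vectors and the importance of the transformed dimensions, where $k = 1$ to $m$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$. The $m'$ most important dimensions are selected based on the criterion $\sum_{k=1}^{m'} \bar{\lambda}_k \ge 0.99$, that is, a 1% information loss is allowed, where $\bar{\lambda}_k = \lambda_k / \sum_{q=1}^{m} \lambda_q$ is a normalized eigenvalue. Knowing the value of $m'$, the corresponding transformation matrix $V = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{m'}]$ is then formed, so $X_{\mathrm{PCA}} = X_{\mathrm{WPT}} V$ is the reduced feature dataset. For any unseen ignition pattern $\mathbf{x}$ in the future, its feature vector can be obtained by $\mathbf{x}_{\mathrm{PCA}} = \mathbf{x}_{\mathrm{WPT}} V$, where $\mathbf{x}_{\mathrm{WPT}}$ is the WPT feature vector of $\mathbf{x}$. By combining the prior domain knowledge, the final feature vector used as the classifier input is given in the following:

$$ \mathbf{x}' = [F_v, B_t, S_v, \mathbf{x}_{\mathrm{PCA}}] \tag{4} $$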
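The component-selection criterion and the assembly of the final feature vector can be sketched in Python as follows. This is a minimal sketch that assumes the eigenvalues have already been computed by a PCA routine; the function names and the numeric values are hypothetical.

```python
def select_components(eigenvalues, retained=0.99):
    """Smallest number of leading components whose normalized eigenvalues sum to >= retained."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= retained:
            return count
    return len(ordered)


def combine_features(firing_voltage, burn_time, avg_spark_voltage, reduced):
    """Concatenate the three domain-knowledge features with the PCA-reduced features."""
    return [firing_voltage, burn_time, avg_spark_voltage] + list(reduced)


eigenvalues = [50.0, 30.0, 15.0, 4.5, 0.4, 0.1]   # made-up spectrum
m_kept = select_components(eigenvalues)            # allows 1% information loss
x_final = combine_features(10.0, 4e-5, 3.75, [0.3, -1.2, 0.8, 0.05])
print(m_kept, len(x_final))
```

In the experiments reported later, the same criterion keeps 22 components, so the final vector has 25 entries once the three domain-knowledge features are added.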
2.4. Relevance Vector Machine
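Relevance vector machine [27] is a statistical learning method utilizing the Bayesian learning framework and popular kernel methods. In fault diagnosis, RVM is designed to predict the posterior probability of the binary class membership (i.e., either positive or negative) for an unseen input $\mathbf{x}$, given a set of training data $(X, \mathbf{t}) = \{\mathbf{x}_n, t_n\}$, $n = 1$ to $N$, $t_n \in \{0, 1\}$, where $N$ is the number of training data. It follows the statistical convention and generalizes the linear model by applying the logistic sigmoid function $\sigma(y) = 1 / (1 + e^{-y})$ to the predicted decision $y(\mathbf{x}; \mathbf{w})$ and adopting the Bernoulli distribution for $P(\mathbf{t} \mid \mathbf{w})$. The likelihood of the data is written as follows [27]:

$$ P(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} \sigma\{y(\mathbf{x}_n; \mathbf{w})\}^{t_n} \left[1 - \sigma\{y(\mathbf{x}_n; \mathbf{w})\}\right]^{1 - t_n}, \qquad y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0 \tag{5} $$

where $\mathbf{w} = (w_0, w_1, \ldots, w_N)^T$ are the adjustable parameters, and a radial basis function (RBF) is typically chosen for the kernel $K(\cdot, \cdot)$.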
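The current objective is to find the optimal weight vector $\mathbf{w}$ in (5) for the given dataset, which is equivalent to finding $\mathbf{w}$ so as to maximize the probability $P(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}) \propto P(\mathbf{t} \mid \mathbf{w}) P(\mathbf{w} \mid \boldsymbol{\alpha})$, with $\boldsymbol{\alpha} = [\alpha_0, \alpha_1, \ldots, \alpha_N]$ a vector of $N + 1$ hyperparameters. However, it is impossible to determine the weights analytically, so closed-form expressions for either the marginal likelihood or, equivalently, the weight posterior are not available. Thus, the following approximation procedure is chosen [32], which is based on Laplace's method.
(a) For the current fixed values of $\boldsymbol{\alpha}$, the most probable weights $\mathbf{w}_{\mathrm{MP}}$ are found, which is the location of the posterior mode. Since $P(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}) \propto P(\mathbf{t} \mid \mathbf{w}) P(\mathbf{w} \mid \boldsymbol{\alpha})$, this step is equivalent to the following maximization:

$$ \mathbf{w}_{\mathrm{MP}} = \arg\max_{\mathbf{w}} \left\{ \sum_{n=1}^{N} \left[ t_n \log y_n + (1 - t_n) \log(1 - y_n) \right] - \frac{1}{2} \mathbf{w}^T A \mathbf{w} \right\} \tag{6} $$

with $y_n = \sigma\{y(\mathbf{x}_n; \mathbf{w})\}$ and $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.
(b) Laplace's method is simply a Gaussian approximation to the log-posterior around the mode of the weights $\mathbf{w}_{\mathrm{MP}}$. Equation (6) is differentiated twice to give

$$ \nabla_{\mathbf{w}} \nabla_{\mathbf{w}} \log P(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}) \big|_{\mathbf{w}_{\mathrm{MP}}} = -(\Phi^T B \Phi + A) \tag{7} $$

where $B$ is a diagonal matrix with $\beta_n = \sigma\{y(\mathbf{x}_n)\}[1 - \sigma\{y(\mathbf{x}_n)\}]$, and $\Phi$ is an $N \times (N + 1)$ design matrix with $\Phi_{nm} = K(\mathbf{x}_n, \mathbf{x}_{m-1})$ and $\Phi_{n1} = 1$, $n = 1$ to $N$, and $m = 2$ to $N + 1$. By negating and inverting (7), the covariance matrix $\Sigma = (\Phi^T B \Phi + A)^{-1}$ can be obtained.
(c) The hyperparameters $\boldsymbol{\alpha}$ are updated using an iterative re-estimation equation. Firstly, randomly guess $\alpha_i$ and calculate $\gamma_i = 1 - \alpha_i \Sigma_{ii}$, where $\Sigma_{ii}$ is the $i$th diagonal element of the covariance matrix $\Sigma$. Then re-estimate $\alpha_i$ as follows:

$$ \alpha_i^{\mathrm{new}} = \frac{\gamma_i}{u_i^2} \tag{8} $$

where $\mathbf{u} = \mathbf{w}_{\mathrm{MP}} = \Sigma \Phi^T B \mathbf{t}$. Set $\alpha_i = \alpha_i^{\mathrm{new}}$ and re-estimate $\gamma_i$ and $\alpha_i^{\mathrm{new}}$ again until convergence. Then $\mathbf{w} = \mathbf{w}_{\mathrm{MP}}$ is estimated so that the classification model is obtained.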
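Two pieces of this procedure are simple enough to sketch in Python: the posterior probability produced by a trained RVM, and one hyperparameter re-estimation step in the standard form given by Tipping [27]. The sketch assumes a Gaussian RBF kernel with a hypothetical `width` parameter and takes the weights and the covariance diagonal as given; finding the posterior mode and inverting the Hessian are not shown.

```python
import math


def rbf_kernel(a, b, width=1.0):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def rvm_probability(x, relevance_vectors, weights, bias, width=1.0):
    """Posterior probability of the positive class: sigmoid of the kernel expansion."""
    score = bias + sum(
        w * rbf_kernel(x, rv, width) for w, rv in zip(weights, relevance_vectors)
    )
    return 1.0 / (1.0 + math.exp(-score))


def reestimate_alpha(alpha, w_mp, sigma_diag):
    """One update alpha_i <- gamma_i / w_i**2 with gamma_i = 1 - alpha_i * Sigma_ii."""
    updated = []
    for a, w, s in zip(alpha, w_mp, sigma_diag):
        gamma = 1.0 - a * s
        updated.append(gamma / (w * w))
    return updated


p = rvm_probability([0.0, 0.0], [[0.0, 0.0]], weights=[1.0], bias=0.0)
new_alpha = reestimate_alpha([1.0], [2.0], [0.5])
print(p, new_alpha)
```

In a full implementation, weights whose hyperparameters grow very large are pruned, which is what leaves the sparse set of relevance vectors used by `rvm_probability`.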
2.5. Pairwise Coupled RVM
The traditional RVM formulation is designed only for binary classification; that is, the output is either positive or negative. In order to resolve the current simultaneous-fault problem, the multiclass strategies of one-versus-all (1vA) and one-versus-one (1v1, also known as pairwise coupling) [28] can be employed. Traditionally, the 1vA strategy constructs a group of classifiers $C = [C_1, C_2, \ldots, C_d]$ in a $d$-label classification problem. For any unknown input $\mathbf{x}$, the classification vector is $\mathbf{y} = [y_1, y_2, \ldots, y_d]$, where $y_i = 1$ if the output of $C_i(\mathbf{x})$ is positive or $y_i = 0$ if it is negative. The 1vA strategy is simple and easy to implement. However, it generally gives a poor result [29, 33, 34], since 1vA does not consider the pairwise correlation and hence induces a much larger indecisive region than 1v1, as shown in Figure 5.
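On the other hand, pairwise coupling (1v1) also constructs a group of classifiers $C = [C_1, C_2, \ldots, C_d]$ in a $d$-label classification problem. However, each $C_i = [C_{i1}, \ldots, C_{ij}, \ldots, C_{id}]$ is composed of a set of $d - 1$ different pairwise classifiers $C_{ij}$, $j \ne i$. Since $C_{ij}$ and $C_{ji}$ are complementary, there are in total $d(d - 1)/2$ pairwise classifiers in $C$ (Figure 6(b)).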
In this study, each $C_{ij}$ can be an RVM classifier which estimates the pairwise probability $\rho_{ij}$ that an unknown instance $\mathbf{x}$ belongs to the $i$th label against the $j$th label, that is, $\rho_{ij} = C_{ij}(\mathbf{x})$. There are several methods for the pairwise coupling strategy [28], which are, however, suitable for multiclass diagnosis only because of the constraint $\sum_{i=1}^{d} \rho_i = 1$. Note that the nature of simultaneous-fault diagnosis is that $\sum_{i=1}^{d} \rho_i$ is not necessarily equal to 1. Therefore, the following simple pairwise coupling strategy for simultaneous-fault diagnosis is proposed.
Every $C_{ij}$ is trained only with the training data of the $i$th and $j$th labels. Let $\rho_{ij}$ be the pairwise probability of the $i$th label against the $j$th label for an unknown instance $\mathbf{x}$, where $\rho_{ij}$ is estimated using RVM. Then, $\rho_i$ is calculated as

$$ \rho_i = \frac{\sum_{j: j \ne i} n_{ij} \rho_{ij}}{\sum_{j: j \ne i} n_{ij}} \tag{9} $$

where $n_{ij}$ is the number of training data with the $i$th and $j$th labels. Hence, the probability $\rho_i$ can be more accurately estimated from $\rho_{ij}$ because the pairwise correlation between the labels is taken into account. With the above pairwise coupling strategy, PCRVM can more accurately estimate the probability vector $\boldsymbol{\rho}$ and hence achieve a higher classification accuracy for simultaneous-fault diagnosis.
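The coupling rule can be sketched in a few lines of Python. The sketch assumes that the label probability is the average of the pairwise probabilities weighted by the number of training patterns in each pair; the function names, the pairwise probabilities, and the counts are all hypothetical.

```python
def pairwise_classifier_count(d):
    """Number of distinct pairwise classifiers for d labels: d * (d - 1) / 2."""
    return d * (d - 1) // 2


def couple_pairwise(pair_prob, pair_count):
    """Combine pairwise probabilities rho_ij into per-label probabilities rho_i.

    pair_prob[i][j]: probability of label i against label j (diagonal ignored)
    pair_count[i][j]: number of training patterns carrying label i or label j
    """
    d = len(pair_prob)
    rho = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        numerator = sum(pair_count[i][j] * pair_prob[i][j] for j in others)
        denominator = sum(pair_count[i][j] for j in others)
        rho.append(numerator / denominator)
    return rho


# Three-label toy example; None marks the unused diagonal.
pair_prob = [[None, 0.9, 0.8],
             [0.1, None, 0.3],
             [0.2, 0.7, None]]
pair_count = [[0, 240, 240],
              [240, 0, 240],
              [240, 240, 0]]
rho = couple_pairwise(pair_prob, pair_count)
print(rho, pairwise_classifier_count(10))
```

The resulting probabilities are not forced to sum to one, which is what allows several labels to exceed the decision threshold at the same time. For the ten single-fault labels used in the experiments, 45 pairwise classifiers are needed.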
2.6. Decision Threshold Optimization and F-Measure
PCRVM can only provide the probability vector $\boldsymbol{\rho}$ of the single-fault labels, but the desired result is the classification vector $\mathbf{y}$. It is obvious that the value of the decision threshold $\varepsilon$ will greatly affect the classification accuracy. In a situation without any prior information, $\varepsilon$ may simply be set to 0.5; that is, a fault is considered present if its probability is at least 0.5. However, the value of $\varepsilon$ should be optimized according to the classification accuracy. In other words, the value should be chosen to produce the highest classification accuracy over a validation dataset.
Besides, the traditional evaluation of classification accuracy only considers exact matching of the predicted label vector $\mathbf{y}$ against the true label vector $\mathbf{t}$. This evaluation is, however, not suitable for simultaneous-fault diagnosis, where partial matching is preferred. Therefore, a common evaluation measure called F-measure is employed.
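F-measure [35] is commonly used for performance evaluation of information retrieval systems, where a document may belong to a single tag or to multiple tags simultaneously. This is very similar to the current application, which contains a mixture of single-fault and simultaneous-fault patterns. With F-measure, the evaluation of single-fault and simultaneous-fault test patterns can be appropriately done at one time. To define the F-measure $F$, the two concepts of precision and recall are used so that

$$ F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{10} $$

where Precision and Recall were originally designed for single-fault patterns only but can be extended to handle simultaneous-fault patterns. For single-fault and simultaneous-fault test data,

$$ \mathrm{Precision} = \frac{\sum_{i} \sum_{j=1}^{d} y_j^i t_j^i}{\sum_{i} \sum_{j=1}^{d} y_j^i}, \qquad \mathrm{Recall} = \frac{\sum_{i} \sum_{j=1}^{d} y_j^i t_j^i}{\sum_{i} \sum_{j=1}^{d} t_j^i} \tag{11} $$

where $y_j^i$ and $t_j^i$ are, respectively, the $j$th predicted label and the $j$th true label in the $i$th test data, and $F \in [0, 1]$. Substituting (11) into (10), the final F-measure equation is given in (12). The larger the F-measure value, the higher the diagnostic accuracy is.

$$ F = \frac{2 \sum_{i} \sum_{j=1}^{d} y_j^i t_j^i}{\sum_{i} \sum_{j=1}^{d} y_j^i + \sum_{i} \sum_{j=1}^{d} t_j^i} \tag{12} $$

With F-measure, the value of $\varepsilon$ can be optimized using typical direct search techniques such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) [36].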
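The evaluation measure and the threshold search can be sketched in Python as follows. This sketch computes the F-measure with partial matching over 0/1 label vectors and, for simplicity, replaces GA and PSO with an exhaustive search over a small set of candidate thresholds; the function names, the validation data, and the candidate values are hypothetical.

```python
def f_measure(predicted, actual):
    """F-measure with partial matching over lists of 0/1 label vectors."""
    hits = sum(y * t for yp, tp in zip(predicted, actual) for y, t in zip(yp, tp))
    total = sum(map(sum, predicted)) + sum(map(sum, actual))
    # Convention assumed here: all-normal data predicted as all-normal scores 1.0.
    return 2.0 * hits / total if total else 1.0


def best_threshold(prob_vectors, actual, candidates):
    """Return (threshold, F-measure) of the best candidate on a validation set."""
    best = None
    for eps in candidates:
        predicted = [[1 if p >= eps else 0 for p in rho] for rho in prob_vectors]
        score = f_measure(predicted, actual)
        if best is None or score > best[1]:
            best = (eps, score)
    return best


validation_probs = [[0.9, 0.2, 0.1], [0.6, 0.55, 0.1], [0.3, 0.1, 0.8]]
validation_truth = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(best_threshold(validation_probs, validation_truth, [0.3, 0.5, 0.7]))
```

A grid search is deterministic and adequate for a single scalar threshold; a population-based optimizer becomes more attractive if a separate threshold per label is searched.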
2.7. Principle of Detection of Single Faults and Simultaneous Faults
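After an unknown instance $\mathbf{x}$ is passed to the above system, a probability vector $\boldsymbol{\rho}$ is produced. If $\mathbf{x}$ is caused by a single fault (e.g., the $i$th fault), $\mathbf{x}$ contains only the symptoms of the $i$th fault. Then, in $\boldsymbol{\rho}$, the corresponding probability $\rho_i \ge \varepsilon$, so that $y_i = 1$ in the decision vector $\mathbf{y}$ while all other $y_j = 0$, $j \ne i$. In other words, $\sum_{k=1}^{d} y_k = 1$ and hence a single fault is detected.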
For the case that $\mathbf{x}$ is caused by two simultaneous faults (e.g., the $i$th and $j$th faults), $\mathbf{x}$ is constituted by the symptoms of the $i$th and $j$th faults. These symptoms may overlap or distort each other. In the current diagnostic system, probabilities are employed to give the similarity of $\mathbf{x}$ to the $i$th and $j$th faults by $\rho_i$ and $\rho_j$, respectively. If their symptoms do not overlap or distort each other heavily, there is a high chance that the corresponding probabilities satisfy $\rho_i, \rho_j \ge \varepsilon$. Under this circumstance, $y_i = 1$ and $y_j = 1$, making $\sum_{k=1}^{d} y_k = 2$, so that a simultaneous fault can be detected. The mechanism is similar for three or more simultaneous faults. By combining these cases, the proposed system can diagnose both single faults and simultaneous faults using classifiers trained with single faults only.
2.8. Summary of Proposed Framework and Techniques
The above framework and techniques are summarized in Algorithm 1. Figure 7(a) shows the workflow of using DK and WPT+PCA for feature extraction. Every dataset for training, validation, and test must go through the feature extraction step. Figure 7(b) shows the construction of the classifier $C$. The classifier has the architecture of pairwise coupling as depicted in Figure 6(b). Then the classifier is passed to an optimizer to search for the optimal decision threshold $\varepsilon$ based on a validation set and F-measure as shown in Figure 7(c), where $C$ outputs the probability vector for each case in the validation set. To optimize the threshold, the F-measure over the validation set can be evaluated as the fitness value. Since direct search techniques are easily trapped in local optima, the optimization step in Figure 7(c) should be run several times to avoid this issue. For testing and running, the step in Figure 7(d) is very similar to Figure 7(c) except that the optimal threshold has already been determined. The choice of parameters for the feature extraction, classification, and direct search techniques is discussed in Section 4.

Figure 7: (a) Feature extraction; (b) Training; (c) Decision threshold optimization; (d) Test/run.
3. Experimental Setup
To verify the effectiveness of the proposed methodology, an experiment was set up for sample data acquisition and evaluation tests. The details of the experimental setup and preparation of datasets are presented in the following subsections.
3.1. Data Sampling
A set of single faults and simultaneous faults was imitated and selected as demonstration examples. There are 10 kinds of single faults as described in Table 1 and 4 kinds of simultaneous faults as described in Table 2. Note that the simultaneous-fault patterns are not caused by random combinations of single faults but only by reasonable combinations (e.g., it is impossible to have a wide spark-plug gap and a narrow spark-plug gap at the same time). Moreover, the experimental data show that a simultaneous-fault ignition pattern is caused by a combination of at most three single faults. Beyond these constraints, the ignition patterns cannot be captured due to engine stall. Some sample ignition patterns of these single faults and reasonable simultaneous faults are shown in Figures 8 and 9, respectively.
In this study, five well-known inline 4-cylinder electronic ignition engines, namely, HONDA B18C, HONDA D15B, HONDA K20A, TOYOTA 2NZ-FE, and MITSUBISHI 4G15, were employed as the experimental platforms, and a computer-linked automotive scope meter (Figure 10) was used to capture raw ignition patterns. Different engine models were used for training in order to enhance the generalization of the classifier. To capture ignition patterns, the sampling frequency of the scope meter was set to a high rate of 100 kHz, that is, 100,000 sampling points per second. Using the software provided with the scope meter, ignition patterns were recorded on a PC and converted into Excel-format files for processing and analysis.
For each case (single fault or simultaneous faults in Tables 1 and 2) in every test engine, sixteen ignition patterns (four patterns for each cylinder) were captured under each of two engine testing conditions (1200 rpm and 2000 rpm) according to the standard procedure in [3]. As the pattern obtained from each cylinder varies somewhat from one engine cycle to the next, four patterns per cylinder are required. The patterns are not exactly repeatable because a constant engine speed is difficult to hold during sampling. Furthermore, each cylinder has its own manufacturing error, different inlet and exhaust flow characteristics, and so forth. In total, there were 1600 ignition patterns of single faults (i.e., 10 labels × 4 patterns × 4 cylinders × 2 testing conditions × 5 engines) and 800 ignition patterns of simultaneous faults (i.e., 5 labels × 4 patterns × 4 cylinders × 2 testing conditions × 5 engines).
3.2. Data Normalization
As the number of sampling points of every captured pattern is not exactly the same due to engine speed fluctuation and various testing conditions, all patterns were normalized to the same length in order to match the number of inputs of the classifier. Normalization of the ignition patterns was done in terms of duration. In this study, the number of sampling points for every pattern was less than 17,000. To be conservative, a standard number of sampling points for all patterns was set to 18,000 in order not to lose any exceptional information. To standardize the duration of all patterns, steady-state values are appended to the rear part of the patterns where necessary. Normally, the steady-state value of the ignition pattern is equal to zero (0 V), so zeros are appended to those patterns having fewer than 18,000 data points. In this way, the durations of all sample patterns were normalized before feature extraction using WPT+PCA.
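The duration normalization amounts to zero-padding each pattern to a common length, as in the following minimal Python sketch. The function name `pad_pattern` is hypothetical, and raising an error for over-long patterns is an assumption, since the text reports that no pattern exceeded 17,000 points.

```python
def pad_pattern(pattern, target_length=18000, fill=0.0):
    """Append the steady-state value (0 V) until the pattern has target_length points."""
    if len(pattern) > target_length:
        raise ValueError("pattern is longer than the standard length")
    return list(pattern) + [fill] * (target_length - len(pattern))


raw = [12.5, 3.1, 2.9, 0.4]          # made-up short pattern
normalized = pad_pattern(raw)
print(len(normalized), normalized[:5])
```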
3.3. Allocation of Datasets
In order to test the diagnostic performance for both single faults and simultaneous faults, about 3/4 of the single-fault patterns were taken as the training dataset. The validation dataset contained 1/16 of the single-fault patterns and 1/5 of the simultaneous-fault patterns, while the remaining 3/16 of the single-fault patterns and 4/5 of the simultaneous-fault patterns were used as the test dataset.
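The resulting dataset sizes can be checked with a short Python sketch that applies the stated fractions to the pattern counts of Section 3.1. The function name `allocate` is hypothetical, and the fractions are treated as exact here although the text describes the training share as approximate.

```python
from fractions import Fraction


def allocate(total, shares):
    """Split `total` patterns by exact fractional shares that must sum to 1."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to 1")
    return {name: int(total * share) for name, share in shares.items()}


single_total = 10 * 4 * 4 * 2 * 5        # 1600 single-fault patterns
simultaneous_total = 5 * 4 * 4 * 2 * 5   # 800 simultaneous-fault patterns

single = allocate(single_total, {
    "train": Fraction(3, 4), "valid": Fraction(1, 16), "test": Fraction(3, 16)})
simultaneous = allocate(simultaneous_total, {
    "valid": Fraction(1, 5), "test": Fraction(4, 5)})
print(single, simultaneous)
```

Under these assumptions the classifier is trained on 1200 single-fault patterns, validated on 100 single-fault and 160 simultaneous-fault patterns, and tested on 300 single-fault and 640 simultaneous-fault patterns.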
4. Experimental Results
To select the best combination of techniques for feature extraction, classification, and threshold optimization, many experiments based on the sample dataset were conducted. The sample dataset was separated into 3 groups: the training dataset for training the classifier, the validation dataset for threshold optimization and selection of direct search techniques, and the test dataset for evaluating the performance of different combinations of the feature extraction, classification, and threshold optimization techniques. The performance evaluation over the test dataset is based on F-measure, which can evaluate single-fault and simultaneous-fault patterns at one time according to the partial matching criterion. All experiments were carried out on a PC with a Core i5 @ 3.20 GHz and 4 GB RAM. All the proposed techniques were implemented using MATLAB R2008a.
4.1. Results of Various Combinations of Feature Extraction and Classification Techniques
The reasonable combinations of DK, FFT, and WPT+PCA for feature extraction were tested, as shown in Table 3 along with the corresponding evaluation. The classification techniques used in the experiment were PNN, RVM, and PCRVM. PNN [24] was selected for comparison because it is a traditional probabilistic classifier that uses a radial basis (Gaussian) kernel. The input dimension of the classifiers under evaluation depends on the feature extraction technique. For the combination of WPT, PCA, and DK, WPT transforms the original patterns of 18,000 points into different packets at a chosen decomposition level. The value of this level can be determined using entropy information. A built-in function bestlev (meaning best level) is available in the Matlab wavelet toolbox for this purpose. After many experiments with the function bestlev, the decomposition level was determined to be 9 for the sample dataset of ignition patterns. In this study, a common mother wavelet, the Haar wavelet, was selected for the purposes of illustration and comparison of different feature extraction techniques. For better performance, different types of mother wavelets could be evaluated in the future. After PCA, the 22 most important dimensions were selected as described in Section 2.3. Therefore, the input dimension of the classifier is equal to 22 plus the three domain features, that is, 25 in total. For FFT and DK, the input dimensions are equal to 18,000 and 3 features, respectively.
In constructing the intelligent engine diagnostic systems with different techniques for comparison, each feature extraction technique was first employed to preprocess the training dataset, and then the different classification techniques were applied. The performance of every combination was evaluated over the test dataset using the F-measure. In order to reflect the effectiveness of the feature extraction, the classification techniques were also examined on the original datasets without any preprocessing. Therefore, there were 18 combinations of feature extraction and classification techniques in total, as shown in Table 3.
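To make the wavelet packet step more concrete, the following is a minimal pure-Python sketch of a full Haar wavelet packet decomposition. It is not the Matlab toolbox routine used in the study: the function names are hypothetical, the sketch requires the signal length to be divisible by 2 raised to the level (the toolbox handles other lengths through boundary extension), the packets are returned in tree order, and the subsequent PCA step is omitted.

```python
import math


def haar_step(signal):
    """One orthonormal Haar step: return (approximation, detail) half-length lists."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavelet_packets(signal, level):
    """Full Haar wavelet packet tree: returns 2**level packets in tree order."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    packets = [list(signal)]
    for _ in range(level):
        next_packets = []
        for p in packets:
            # Unlike the plain wavelet transform, both halves are split again.
            a, d = haar_step(p)
            next_packets.extend([a, d])
        packets = next_packets
    return packets


signal = [float(i % 5) for i in range(16)]   # hypothetical toy signal
packets = haar_wavelet_packets(signal, level=3)
print(len(packets), len(packets[0]))  # 8 2
```

At level 9, the same recursion yields 2^9 = 512 packets, which is why a dimension reduction step such as PCA is applied before classification.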
For the classification techniques PNN, RVM, and PCRVM, several simple settings are necessary. PNN requires a hyperparameter called the smoothing factor or spread, which is equivalent to the width of the Gaussian kernel within PNN. If the value of spread is set too low, the trained classifier may easily overfit the training patterns and hence generalize poorly, whereas a value that is too high oversmooths the class boundaries. In the case study, the value of spread for PNN was simply set to 0.2 according to a rule of thumb [37]. RVM and PCRVM employ different classification strategies (1vA versus 1v1) but they share the same set of hyperparameters, namely, the type of kernel function and the corresponding kernel parameters. For illustration purposes, the Gaussian function was selected as the kernel function and its kernel width was set to 1.0 in order to calculate the design matrix in (7). The experimental results of various combinations of feature extraction and classification techniques are shown in Table 3. In order to compare the F-measures under different combinations of preprocessing and classification techniques, the decision threshold was simply set to 0.5 for a simple and fair comparison in this phase.
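The role of the decision threshold can be illustrated with a short sketch that converts a vector of per-fault probabilities into a binary label vector. The function name `decide`, the example probabilities, and the use of "greater than or equal to" at the boundary are assumptions for illustration only.

```python
def decide(probabilities, threshold=0.5):
    """Map a per-fault probability vector to a binary label vector."""
    # Assumption: a fault is reported when its probability reaches the threshold.
    return [1 if p >= threshold else 0 for p in probabilities]


probs = [0.91, 0.08, 0.62, 0.30]     # hypothetical classifier output
print(decide(probs))                  # [1, 0, 1, 0]
print(decide(probs, threshold=0.7))   # [1, 0, 0, 0]
```

Because every label is thresholded independently, more than one fault can be reported for a single pattern, and the number of reported faults depends directly on the threshold.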
4.2. Results with Threshold Optimization
The genetic algorithm (GA) is the most classical direct search technique, while particle swarm optimization (PSO) is another popular choice. Both of them were tested for the optimization of the decision threshold, and they share the same objective function. Since the F-measure lies between 0 and 1 and a larger value indicates better performance, the objective function of the optimization can simply be set to the F-measure itself: the higher the F-measure, the better the optimization result. The optimization procedure follows Algorithm 1, where the number of runs was set to 20. Tables 4 and 5 show the detailed settings of the GA and PSO operators and parameters, respectively, according to the literature [36]. For every combination of feature extraction and classification techniques, the best optimized threshold among the 20 runs and its corresponding F-measure are shown in Tables 6 and 7 for the GA and PSO, respectively.
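The threshold search described above can be sketched with a very small one-dimensional PSO. Everything specific here is an assumption rather than the authors' configuration: the micro-averaged form of the F-measure, the swarm size, the inertia weight `w`, the coefficients `c1` and `c2`, the toy validation data, and the choice to place one particle at 0.5 so that the result is never worse than the fixed-threshold baseline.

```python
import random


def f_measure(pred, true):
    """Micro-averaged F-measure over binary label vectors (assumed form)."""
    hits = sum(y * t for ys, ts in zip(pred, true) for y, t in zip(ys, ts))
    total = sum(map(sum, pred)) + sum(map(sum, true))
    return 2.0 * hits / total if total else 0.0


def objective(threshold, probs, true):
    """F-measure obtained when every probability is cut at the given threshold."""
    pred = [[1 if p >= threshold else 0 for p in row] for row in probs]
    return f_measure(pred, true)


def pso_threshold(probs, true, n_particles=10, n_iter=30,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search [0, 1] for the threshold that maximizes the F-measure."""
    rng = random.Random(seed)
    pos = [0.5] + [rng.random() for _ in range(n_particles - 1)]
    vel = [0.0] * n_particles
    best_pos = pos[:]
    best_val = [objective(x, probs, true) for x in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            vel[i] = (w * vel[i]
                      + c1 * rng.random() * (best_pos[i] - pos[i])
                      + c2 * rng.random() * (g_pos - pos[i]))
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))  # keep inside [0, 1]
            val = objective(pos[i], probs, true)
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


# Hypothetical validation probabilities and true label vectors.
probs = [[0.9, 0.6, 0.1], [0.2, 0.8, 0.55], [0.65, 0.1, 0.9], [0.95, 0.3, 0.75]]
true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]]

# Repeat the search with different seeds and keep the best run.
thr, f_opt = max((pso_threshold(probs, true, seed=s) for s in range(20)),
                 key=lambda r: r[1])
print(thr, f_opt, objective(0.5, probs, true))
```

Repeating the search and keeping the best pair mirrors the multiple-run procedure in the text; the spread of the per-run results is what the later stability comparison between the GA and PSO relies on.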
4.3. Individual Results of Single-Fault and Simultaneous-Fault Diagnosis
The objective of this research is to train a probabilistic classifier using single-fault patterns and then predict both single and simultaneous faults. However, the results in Section 4.2 do not show how well the trained probabilistic classifier performs on simultaneous faults alone, because the classification results of the different combinations of techniques were all evaluated over the whole test dataset, which contains both single-fault and simultaneous-fault patterns. To better illustrate the performance of the proposed method, the test dataset was further separated into two groups, one containing purely single-fault cases and another containing purely simultaneous-fault cases. All evaluation tests were done using the combination of DK+WPT+PCA as feature extraction and the PSO-optimized threshold of 0.7147, because Tables 6 and 7 show that this combination produces the best F-measure. The F-measures of purely single faults and purely simultaneous faults are shown in Tables 8 and 9, respectively, which were calculated using (12) with the related faults. For example, for Fault 1, the F-measure is evaluated on the test cases of Fault 1 only. For a simultaneous fault that combines two single faults, the prediction yields a classification vector and the test case provides a true label vector; the predicted elements for the two constituent faults, together with the corresponding true values from the test cases, are then used to compute two separate F-measure values for detailed analysis.
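A per-fault evaluation of this kind might look like the sketch below, which restricts the F-measure to one label position across a group of test cases. The per-label form of the formula, the function name, and the example vectors are assumptions; equation (12) itself is not reproduced in this section.

```python
def per_fault_f_measure(pred, true, fault):
    """F-measure restricted to one label position (assumed per-fault form)."""
    y = [row[fault] for row in pred]
    t = [row[fault] for row in true]
    denom = sum(y) + sum(t)
    return 2.0 * sum(a * b for a, b in zip(y, t)) / denom if denom else 0.0


# Hypothetical test cases that all carry the same two faults (positions 0 and 2).
true = [[1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 0, 1]]
pred = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0]]

f_first = per_fault_f_measure(pred, true, 0)    # first constituent fault
f_second = per_fault_f_measure(pred, true, 2)   # second constituent fault
print(f_first, round(f_second, 3))  # 1.0 0.667
```

Reporting the two values separately shows which constituent fault is being missed in a combination, which an overall score would hide.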
4.4. Results Comparison with the Latest Technique
To further verify the effectiveness of the presented framework, the existing binarization approach using SVM [23] was applied to the ignition system diagnosis for comparison. The binarization approach builds classifiers directly on raw ignition patterns, so there is no feature extraction step. In this approach, a number of binary classifiers were constructed using support vector machines (SVM) with the one-versus-all splitting strategy, one classifier for each single fault. For an unknown pattern, a decision vector is obtained by thresholding the raw output value of every SVM classifier, so that each element of the vector indicates whether the corresponding fault is present or not. As in the proposed framework, only single-fault patterns were used for training the binary classifiers, so simultaneous-fault patterns are not necessary either. Since the binarization approach generates only a binary decision vector and no probabilistic output, no decision threshold optimization is necessary in this experiment. The results using the binarization approach are shown in Table 10.
5. Discussion of Results
5.1. Effect of Feature Extraction and Pairwise Probabilistic Classification
The experimental results presented in Section 4 are discussed in this section. Table 3 illustrates that the step of feature extraction is effective. DK captures the time-related features of an ignition pattern but improves the overall classification accuracy by only about 1% as compared with the methods without any feature extraction, while FFT and WPT+PCA give about 4.4% and 4.8% improvement, respectively. When both time-related and frequency-related features are combined by DK and WPT+PCA, the overall classification accuracy is about 7% higher than that without any feature extraction. Table 3 also indicates that, no matter which classification technique is employed, the integration of DK and WPT+PCA as feature extraction gives the best accuracy. In addition, the three classification techniques are compared using the F-measure as well. Both PNN and RVM employ the 1vA strategy for probabilistic classification. In other words, only one binary classifier was constructed per label, so that there are large indecision regions between pairs of classes. Therefore, when a test case lies in these regions, PNN and RVM mostly fail to classify the faults correctly. PCRVM, however, employs the 1v1 strategy, which minimizes those indecision regions. Table 3 verifies the effectiveness of the 1v1 strategy because PCRVM outperforms the other two classification techniques. The same trend is observed in the tests with the optimized decision threshold, as shown in Tables 6 and 7. Therefore, the proposed PCRVM is a very effective and promising classification technique.
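To illustrate how a 1v1 strategy can still deliver one probability per label, the sketch below combines pairwise probabilities by a count-weighted average. This coupling rule is an assumption suggested by the quantities listed in the Notation (pairwise probabilities and the number of training data per label pair); the exact rule used by PCRVM is defined elsewhere in the paper and may differ. The function name and example values are hypothetical.

```python
def couple_pairwise(pairwise_prob, pair_counts):
    """Combine pairwise probabilities into one probability per label.

    pairwise_prob[i][j]: probability of label i against label j (i != j).
    pair_counts[i][j]:   number of training cases for the label pair (i, j).
    """
    d = len(pairwise_prob)
    coupled = []
    for i in range(d):
        others = [j for j in range(d) if j != i]   # diagonal entries are unused
        num = sum(pair_counts[i][j] * pairwise_prob[i][j] for j in others)
        den = sum(pair_counts[i][j] for j in others)
        coupled.append(num / den)
    return coupled


# Hypothetical outputs of the three pairwise classifiers for 3 labels.
pairwise_prob = [[0.0, 0.9, 0.8],
                 [0.1, 0.0, 0.4],
                 [0.2, 0.6, 0.0]]
pair_counts = [[0, 20, 20],
               [20, 0, 20],
               [20, 20, 0]]
coupled = couple_pairwise(pairwise_prob, pair_counts)
print([round(p, 2) for p in coupled])  # [0.85, 0.25, 0.4]
```

With d labels, a 1v1 scheme uses d(d-1)/2 pairwise classifiers instead of d. The coupled values need not sum to one, which suits simultaneous-fault diagnosis because each label is later compared with the decision threshold independently.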
5.2. Effect of Decision Threshold Optimization
Tables 3, 6, and 7 illustrate that the GA and PSO can improve the overall accuracy by 3.48% and 3.5%, respectively, as compared with the fixed decision threshold of 0.5, but these two techniques give nearly the same threshold and F-measure. The reason is that the experiment was run 20 times for both the GA and PSO, and the pair of results with the highest F-measure was returned. However, the standard deviations of the 20 results for the GA and PSO are 1.02E-3 and 3.23E-4, respectively, so the standard deviation of the GA is larger than that of PSO in this case study. This result indicates that PSO is more stable than the GA and theoretically requires fewer runs than the GA to obtain a suboptimal result. This is because PSO is relatively insensitive to the initial values, whereas the GA is initialized with random start points within the search space and its search result is very sensitive to the initial values [36]. Consequently, PSO is recommended for this application.
5.3. Diagnosis of Simultaneous Faults
Table 8 reveals that the classifiers trained using PNN, RVM, and PCRVM all perform well when the test cases contain single-fault patterns only. Owing to the advantage of pairwise coupling, PCRVM performs the best among the three classification techniques.
For the test cases of simultaneous-fault patterns, there are only five reasonable combinations of simultaneous faults because not every combination is possible. Since a simultaneous-fault pattern is caused by different single faults, some of the time-related and frequency-related features may be distorted or may even vanish. Therefore, the feature extraction using DK and WPT+PCA does not work as well, and the F-measure values in Table 9 drop somewhat as compared with the values in Table 8, although they still provide an accuracy ranging from 0.49 to 0.8. Once again, PCRVM outperforms the other classification techniques because of the pairwise coupling strategy. Within the simultaneous-fault diagnosis, the most misclassified fault is Fault 10, because the ignition pattern of Fault 10 is largely distorted by Fault 3. Nevertheless, the experimental results still verify the following:
(1) The proposed framework can alleviate the problem of exponential growth of the training dataset for simultaneous-fault ignition patterns by training the probabilistic classifier using single-fault patterns only. The evidence can be found in Tables 8 and 9: the single-fault patterns are almost always correctly classified, while the overall classification accuracy for simultaneous-fault ignition patterns is still satisfactory.
(2) The feature extraction techniques of DK combined with WPT+PCA can effectively capture the time-related and frequency-related features from single-fault and simultaneous-fault ignition patterns.
(3) The features of single-fault ignition patterns can indeed be detected in some feasible simultaneous-fault ignition patterns; this feasibility opens a new research direction for automotive engine diagnosis.
(4) RVM is more robust than PNN for probabilistic classification.
(5) The pairwise coupling (1v1) strategy can improve the accuracy of common probabilistic classification techniques.
5.4. Comparison with the Latest Approach
Table 10 reveals that the binarization approach performs poorly on ignition pattern classification. In other words, the binarization method is not suitable for engine ignition-system diagnosis. In addition, after feature extraction, the performance of the binarization approach can generally be raised by about 50% as well. Therefore, the effectiveness of feature extraction is verified under all frameworks and techniques tested in this paper. It is believed that this feature extraction can also work well in many other practical applications.
6. Conclusions
One of the challenges in ignition system diagnosis is that more than one single fault may appear at a time. Another challenge is the acquisition of a large amount of costly simultaneous-fault ignition patterns for constructing the diagnostic system, because the number of training patterns depends on the combination of different single faults. In this paper, simultaneous-fault diagnosis for automotive engine ignition patterns was studied, and a new framework combining feature extraction, probabilistic classification, and decision threshold optimization based on a fair multilabel assessment, the F-measure, has been successfully developed. With the proposed diagnosis framework, the acquisition of a large amount of simultaneous-fault patterns can be avoided.
In this study, combinations of the feature extraction techniques DK, FFT, and WPT+PCA were tried along with the classification techniques PNN, RVM, and PCRVM to tackle simultaneous-fault diagnosis. The experimental results reveal that PCRVM combined with WPT+PCA and DK performs the best for both single-fault and simultaneous-fault diagnoses. Its average accuracy for single-fault diagnosis is about 0.95, while the average accuracy for simultaneous faults is only about 0.76. This implies that the feature extraction technique based on DK and WPT+PCA may not be perfect for simultaneous-fault detection. Alternative approaches, such as the integration of feature extraction, classification, and multiexpert reasoning, could be studied in the future.
This study also shows that the decision threshold for identifying the number of simultaneous faults can be optimized over the F-measure using direct search techniques, such as the GA and PSO. Both the GA and PSO generate almost the same decision threshold, but PSO requires less computational time and is more stable because of its lower standard deviation over multiple runs. Moreover, PSO has fewer operators and hence fewer adjustable parameters, which further reduces the user burden. Overall, PSO should be the first choice of threshold optimization technique in the current application.
To further verify the effectiveness of the proposed framework, the latest method, the binarization method using SVM, was also employed to diagnose the simultaneous faults. The results show that the diagnosis accuracy of the binarization method is worse than that of the proposed framework. Therefore, the proposed framework is very suitable for engine ignition-system fault diagnosis. Since the proposed framework for simultaneous-fault diagnosis is general, it can be adapted to other similar applications. Finally, the original contributions of the research are summarized as follows.
(1) The research is a first attempt at integrating DK+WPT+PCA, PCRVM, and direct search techniques into a general framework for simultaneous-fault diagnosis of automotive ignition systems.
(2) The proposed diagnostic system is the first in the literature that can be trained with single-fault signal patterns (i.e., single-fault time-dependent patterns) only, while it can also diagnose simultaneous-fault signal patterns.
(3) This paper is also the first in the literature to report that the features of single-fault ignition patterns can be detected in some feasible simultaneous-fault ignition patterns. This fact is an important contribution to automotive engine diagnosis.
(4) The integration of the pairwise coupling (1v1) strategy into RVM is original, and the 1v1 strategy can indeed improve the classification accuracy of RVM.
Notation
:  Diagonal matrix of hyperparameters 
:  End point of burning time 
:  Diagonal matrix in RVM 
:  th binary classifier 
:  th probabilistic classifier 
:  Probability of belonging to the th label 
:  Pairwise classifier 
:  Pairwise probability of belonging to the th label against the th label 
:  Cognitive parameter of PSO 
:  Social parameter of PSO 
:  Sample dataset 
:  Number of labels (faults) 
:  th eigenvalue 
:  th normalized eigenvalue 
:  Set of feature vectors 
:  Set of feature vectors created by WPT and PCA 
:  Firing voltage 
:  Burn time 
:  Average spark voltage of spark line 
:  measure 
:  Feature vector 
:  Feature vector created by WPT and PCA 
:  th feature vector 
:  Probabilistic classifier 
:  PCA transformation matrix 
:  th eigenvector 
:  Decomposition level of WPT 
:  Kernel function in RVM 
:  Length of ignition pattern (i.e., number of data points in ignition pattern) 
:  True label vector 
:  th true label vector 
:  th label in 
:  th label in 
:  th label in the th test data 
:  Number of training data 
:  Number of cases in sample dataset 
:  Number of test data 
:  Number of training data with the th and th labels 
:  Probability of belonging to 
:  Probability 
:  Input dimension of classifier to be evaluated 
:  Set of faulty labels in training dataset 
:  Faulty label of the th training case 
:  Original test dataset 
:  Singlefault cases in test dataset 
:  Simultaneousfault cases in test dataset 
:  Test dataset after feature extraction 
:  Original training dataset 
:  Training dataset after feature extraction 
:  Set of coefficient vectors 
:  Original validation dataset 
:  Validation dataset after feature extraction 
:  Coefficient vector 
:  Optimal vector in RVM 
:  th optimal parameter in RVM 
:  Most probable weight vector in RVM 
:  Inertial weight of PSO 
WPT(·):  Wavelet packet transform function 
:  Set of ignition pattern vectors 
:  Unseen ignition pattern 
:  th data point in 
:  Predicted label vector 
:  th predicted label 
:  th predicted label in the jth test data 
:  Predicted decision 
:  Hyperparameter vector of RVM 
:  th hyperparameter of RVM 
:  Decision threshold 
:  th tentative threshold produced in optimization process 
:  Optimized threshold 
:  Decision function of binarization approach 
:  Precision 
:  Probability vector 
:  Probability of the th label 
:  Pairwise probability of the th label against the th label 
:  Covariance matrix in RVM 
:  th diagonal element of covariance matrix 
:  Logistic sigmoid function 
:  Recall 
:  Initial population 
:  Design matrix in RVM. 
Acknowledgment
The research is supported by the University of Macau Research Grant nos. MYRG075(Y2L2)FST12VCM, MYRG141(Y2L2)FST11IWF, and MYRG149(Y2L2)FST11WPK.