Abstract

A novel feature extraction and selection scheme is presented for intelligent engine fault diagnosis by utilizing two-dimensional nonnegative matrix factorization (2DNMF), mutual information, and the nondominated sorting genetic algorithm II (NSGA-II). Experiments are conducted on an engine test rig, in which eight different engine operating conditions, including one normal condition and seven fault conditions, are simulated to evaluate the presented feature extraction and selection scheme. In the feature extraction phase, the S transform is first utilized to convert the engine vibration signals to the time-frequency domain, which provides richer information on engine operating conditions. Then a novel feature extraction technique, two-dimensional nonnegative matrix factorization, is employed to characterize the time-frequency representations. In the feature selection phase, a hybrid filter and wrapper scheme based on mutual information and NSGA-II is utilized to acquire a compact feature subset for engine fault diagnosis. Experimental results with three different classifiers demonstrate that the proposed feature extraction and selection scheme achieves very satisfactory classification performance with fewer features for engine fault diagnosis.

1. Introduction

The engine is one of the core mechanical components in a wide range of industrial applications. Detecting and identifying engine faults at an early stage, while the machine is still in operation, can help avoid serious accidents and reduce economic losses. For this reason, various intelligent techniques such as artificial neural networks (ANN), support vector machines (SVM), relevance vector machines (RVM), and extreme learning machines (ELM) have been successfully applied to the automated detection and diagnosis of engine conditions over the past few years [1–6].

For any intelligent fault diagnosis system, feature extraction and feature selection can be regarded as the two most important steps. Feature extraction is a mapping process from the measured signal space to the feature space. Representative features associated with the conditions of machinery components should be extracted using appropriate signal processing and computation approaches. Various techniques, such as wavelet analysis, empirical mode decomposition, multivariate statistics, morphological pattern spectrum, fractal theory, and nonnegative matrix factorization, have been applied to engine fault diagnosis in the past few years [7–11]. In this work, a new feature extraction approach based on the S transform [12] and two-dimensional nonnegative matrix factorization (2DNMF), which was used for bearing fault diagnosis in our earlier work [13], is employed for engine fault diagnosis. To the best of our knowledge, this feature extraction approach has never been utilized in engine fault diagnosis. The main advantage of 2DNMF over NMF is that it does not need to transform the 2D matrices into 1D vectors, which leads to a high-dimensional vector space and the loss of some structure information hidden in the original 2D matrices. Thus, 2DNMF is capable of characterizing the time-frequency representations more effectively and efficiently for engine fault diagnosis.

Filter and wrapper methods can be regarded as the two main categories of feature selection approaches in the literature. Several studies have applied filter or wrapper methods to mechanical fault diagnosis, utilizing Fisher's criterion [14], the distance evaluation technique [15, 16], decision trees [17, 18], and evolutionary algorithms (EA) combined with ANN and SVM [19–24]. They have largely improved the efficiency and accuracy of mechanical fault diagnosis in practice. Despite these successful applications, it should be noted that wrapper and filter methods can complement each other. Filter methods can search through the feature space efficiently but usually fail to obtain good accuracy, while wrappers can provide good accuracy but require much computation time. Therefore, it is very desirable to combine filter and wrapper methods to achieve high efficiency and accuracy simultaneously [25–28]. In our earlier works [29, 30], hybrid feature selection schemes combining filter and wrapper methods based on ReliefF, the genetic algorithm (GA), mutual information, and the nondominated sorting genetic algorithm II (NSGA-II) were investigated for gear fault diagnosis. However, to the best of our knowledge, these schemes have not yet been used for engine fault diagnosis. Therefore, in this work, the hybrid feature selection scheme based on mutual information and NSGA-II is employed and verified for engine fault diagnosis. In addition, a more detailed demonstration of the above-mentioned feature extraction and selection scheme, which was not given in the earlier works, is provided here.

Experiments are conducted on an engine test rig, in which eight different engine operating states, including one normal state and seven fault states, are simulated. The performance of the proposed feature extraction and selection scheme is verified on the engine dataset. Experimental results demonstrate the superiority of the proposed intelligent scheme in terms of computation cost and classification accuracy for engine fault diagnosis. Figure 1 illustrates the flowchart of the intelligent engine fault diagnosis scheme presented in this work.

The remainder of this work is organized as follows. Section 2 describes the feature extraction method based on the S transform and 2DNMF. In Section 3, the hybrid filter and wrapper feature selection scheme based on mutual information and NSGA-II is detailed. Section 4 presents the application results of the proposed feature extraction and selection scheme for engine fault diagnosis. The conclusions of this paper are summarized in Section 5.

2. Feature Extraction Based on S Transform and Two-Dimensional Nonnegative Matrix Factorization

2.1. S Transform

The S transform, put forward by Stockwell et al. in 1996, can be regarded as an extension of the ideas of the Gabor transform and the wavelet transform. The S transform of a signal x(t) is defined as

S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(\tau - t, f)\, e^{-i 2\pi f t}\, dt, \quad (1)

where

w(\tau - t, f) = \frac{|f|}{\sqrt{2\pi}}\, e^{-(\tau - t)^2 f^2 / 2}. \quad (2)

Then the S transform can be given by combining (1) and (2):

S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(\tau - t)^2 f^2 / 2}\, e^{-i 2\pi f t}\, dt. \quad (3)

Since the S transform is a representation of the local spectra, the Fourier (time-averaged) spectrum can be obtained directly by averaging the local spectra:

\int_{-\infty}^{\infty} S(\tau, f)\, d\tau = X(f), \quad (4)

where X(f) is the Fourier transform of x(t).

The inverse S transform is given by

x(t) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} S(\tau, f)\, d\tau \right] e^{i 2\pi f t}\, df. \quad (5)

The main advantage of the S transform over the short-time Fourier transform (STFT) is that the standard deviation σ(f) of the Gaussian window is actually a function of the frequency f. Consequently, the window function w is also a function of time and frequency. As the width of the window is controlled by the frequency, the window is wider in the time domain at lower frequencies and narrower at higher frequencies. In other words, the window provides good localization in the frequency domain for low frequencies while providing good localization in the time domain for higher frequencies. This is a very desirable characteristic for the accurate representation of nonstationary vibration signals in the time-frequency domain.
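To make the discrete computation concrete, the following is a minimal NumPy sketch of the S transform using the standard FFT-based (frequency-domain) formulation. It is illustrative only: the function name s_transform is our own, and this is not the MATLAB implementation used in the experiments of this paper.

import numpy as np

def s_transform(x):
    # Discrete S transform of a length-N signal via the FFT-based formula
    # S[n, :] = IFFT_m( X[m + n] * exp(-2*pi^2*m^2/n^2) ).
    N = len(x)
    X = np.fft.fft(x)
    XX = np.concatenate([X, X])          # allows shifted indexing X[m + n]
    n_freqs = N // 2 + 1
    S = np.zeros((n_freqs, N), dtype=complex)
    S[0, :] = np.mean(x)                 # zero-frequency row is the signal mean
    m = np.arange(N)
    shift = np.minimum(m, N - m)         # wrapped frequency offsets |m|
    for n in range(1, n_freqs):
        gauss = np.exp(-2.0 * np.pi**2 * shift**2 / n**2)
        S[n, :] = np.fft.ifft(XX[n:n + N] * gauss)
    return S                             # (N/2 + 1) x N time-frequency matrix

Applied to a 4096-point sample, the positive-frequency rows of S correspond to the roughly 2048 × 4096 time-frequency matrix used in Section 4.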

2.2. Nonnegative Matrix Factorization (NMF)
2.2.1. Nonnegative Matrix Factorization

The NMF algorithm is a technique that compresses a matrix into a smaller number of basis functions and their encodings [31]. The factorization can be expressed as follows:

V \approx W H, \quad (6)

where V denotes an n × m matrix, m is the number of examples in the dataset, and each column of V contains an n-dimensional observed data vector with nonnegative values. This matrix is then approximately factorized into an n × r matrix W and an r × m matrix H. The rank r of the factorization is usually chosen such that (n + m)r < nm, and hence compression or dimensionality reduction is achieved. The key characteristic of NMF is the nonnegativity constraints imposed on the two factors; these constraints are compatible with the intuitive notion of combining parts to form a whole.
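As a concrete illustration, the classical multiplicative update rules of Lee and Seung for the Frobenius-norm objective can be sketched as follows. This is a minimal version under our own naming, not necessarily the exact variant used in [31]:

import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9):
    # Factorize a nonnegative (n x m) matrix V into W (n x r) and H (r x m)
    # by minimizing ||V - W H||_F^2 with multiplicative updates.
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update encodings
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H

Because every factor in the updates is nonnegative, W and H remain nonnegative throughout the iterations.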

2.2.2. Two-Dimensional Nonnegative Matrix Factorization (2DNMF)

The key difference between 2DNMF and NMF is that the former adopts a novel representation of the original time-frequency representations. In traditional NMF, the 2D time-frequency matrices must first be transformed into 1D vectors. The resulting vectors usually lead to a high-dimensional vector space, in which it is difficult to find good bases to approximately reconstruct the original time-frequency distributions. This is also known as the "curse of dimensionality" problem, which is more apparent in small-sample-size cases. Another disadvantage of NMF is that such a matrix-to-vector transform may cause the loss of some structure information hidden in the original 2D time-frequency representations [13].

Assume the matrix A_i, of size m × n, denotes the time-frequency representation of the i-th training sample signal acquired from the engine. In traditional NMF, each 2D time-frequency representation is first transformed into a 1D vector, and the training database is then represented by an mn × s matrix, each column of which contains the nonnegative values of one of the s time-frequency representations. In 2DNMF, however, the 2D time-frequency representations are never transformed into their corresponding 1D vectors. Instead, a more straightforward way is used, which views a time-frequency representation as a 2D matrix.

The 2DNMF method consists of two successive stages. First, it aligns the s training TFR matrices into an m × sn matrix X = [A_1, A_2, …, A_s], where each A_i denotes one of the TFR matrices. Similar to NMF, 2DNMF first finds a nonnegative m × d matrix L and a nonnegative d × sn matrix H such that

X \approx L H. \quad (7)

Here L and H are the basis functions and encoding coefficients, respectively. Since each column of L corresponds to a column of an original TFR matrix, L is also called the column bases.

The second stage of 2DNMF involves computing the row bases. In this stage, a new n × sm matrix Y = [A_1^T, A_2^T, …, A_s^T] is first constructed, where each A_i^T denotes the transpose of A_i. Similarly, 2DNMF seeks a nonnegative n × g matrix R and a nonnegative g × sm matrix G such that

Y \approx R G. \quad (8)

Here R and G are the basis functions and encoding coefficients, respectively, and R is called the row bases.

By now the m × d column bases L and the n × g row bases R have been obtained. A new representation of a TFR A projected onto the column bases and row bases can be denoted as

C = L^{T} A R, \quad (9)

where C is a d × g matrix, which can be regarded as a reduced-dimension representation of the TFR and can be used as features for the fault diagnosis of engine states.
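Putting the two stages together, the following sketch reuses the nmf helper above and follows the description in this section; the function names are ours, and the actual update details of 2DNMF in [13] may differ:

import numpy as np

def two_dnmf(tfrs, d, g):
    # tfrs: list of s nonnegative (m x n) TFR matrices.
    X = np.hstack(tfrs)                   # stage 1: (m x s*n) aligned matrix
    L, _ = nmf(X, d)                      # L: (m x d) column bases
    Y = np.hstack([A.T for A in tfrs])    # stage 2: (n x s*m) transposed TFRs
    R, _ = nmf(Y, g)                      # R: (n x g) row bases
    return L, R

def tfr_features(A, L, R):
    # Reduced (d x g) representation C = L^T A R of a single TFR, cf. (9).
    return L.T @ A @ R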

3. Hybrid Filter and Wrapper Feature Selection Scheme Based on Mutual Information and NSGA-II

3.1. Hybrid Filter and Wrapper Feature Selection Scheme

Filters and wrappers are the two main categories of feature selection algorithms in the literature. Filter methods evaluate the goodness of a feature subset using intrinsic characteristics of the data. They are relatively cheap computationally, since they do not involve the induction algorithm, but they run the risk of selecting feature subsets that may not match the chosen induction algorithm. Wrapper methods, by contrast, use the classifiers directly to evaluate the feature subsets. They generally outperform filter methods in terms of prediction accuracy but are usually more computationally intensive [32–35]. In summary, wrapper and filter methods can complement each other, in that filter methods can search through the feature space efficiently while wrappers can provide good accuracy. It is desirable to combine the filter and wrapper methods to achieve high efficiency and accuracy simultaneously.

In this work, a hybrid feature selection technique combining a filter based on mutual information and a wrapper based on the nondominated sorting genetic algorithm II (NSGA-II) is employed [30]. This hybrid scheme involves two stages. In the first stage, a candidate feature subset is chosen from the original feature set according to the max-relevance and min-redundancy (mRMR) criterion based on mutual information. In the second stage, a classifier combined with NSGA-II is adopted to find a more compact feature subset from the candidate feature subset. In this stage, the feature selection problem is defined as a multiobjective problem dealing with two competing objectives, namely, fewer features and a lower classification error rate.

3.2. Filter Method Based on Mutual Information
3.2.1. Mutual Information

Mutual information is one of the most widely used measures of the relevancy of variables [36]. This section focuses on the feature selection method based on mutual information. Given two random variables X and Y, their mutual information can be defined in terms of the probability density functions p(x), p(y), and p(x, y):

I(X; Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy. \quad (10)

Equivalently, the mutual information can be written in terms of entropies as

I(X; Y) = H(X) + H(Y) - H(X, Y). \quad (11)

The estimation of the mutual information of two variables was detailed in [36].

In supervised classification, one can view the classes as a variable C with N_c possible values (where N_c is the number of classes of the system) and each feature component as another variable F_i (i = 1, …, N_f, where N_f is the number of features of the system). One can then compute the mutual information between the classes C and the feature F_i:

I(C; F_i) = \sum_{c} \sum_{f} p(c, f) \log \frac{p(c, f)}{p(c)\, p(f)}. \quad (12)

Then the informative features with larger I(C; F_i) can be identified, and a more compact feature subset can be obtained by selecting the best features from the original feature set based on (12).

Equation (12) provides a measure for evaluating the effectiveness of a "global" feature that is simultaneously suitable for differentiating all classes of signals. For a small number of classes, this approach may be sufficient; the more signal classes there are, the more ambiguous I(C; F_i) becomes.
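For reference, a simple histogram-based estimate of the mutual information between two sampled variables can be sketched as follows. Note that [36] uses a density-estimation approach; this discrete approximation and the helper name mutual_information are ours and purely illustrative:

import numpy as np

def mutual_information(x, y, bins=16):
    # Estimate I(X; Y) in nats from samples, using a joint histogram.
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                        # joint probabilities p(x, y)
    px = pxy.sum(axis=1, keepdims=True)     # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = pxy > 0                            # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))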

3.2.2. Max-Relevance and Min-Redundancy

Max-relevance means that the selected features are required, individually, to have the largest mutual information I(f_i; c) with the target class c; that is, the best individual features are selected according to this criterion. It can be represented as

\max D(S, c), \quad D = \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c), \quad (13)

where |S| denotes the number of features contained in S.

However, it has been proved that a simple combination of the best individual features does not necessarily lead to good performance. In other words, "the m best features are not the best m features" [36, 37]. The most important problem with max-relevance is that it neglects the redundancy between features, which may degrade the classification performance.

Therefore, the min-redundancy criterion should be added to the selection of the optimal subsets. It can be represented as

\min R(S), \quad R = \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j). \quad (14)

The criterion combining the above two constraints is called "maximal relevance minimal redundancy" (mRMR) [36]. The operator Φ(D, R) is defined to optimize D and R simultaneously:

\max \Phi(D, R), \quad \Phi = D - R. \quad (15)

3.2.3. Candidate Feature Subset Obtained Based on Max-Relevance and Min-Redundancy

In practice, greedy search methods can be used to find near-optimal features under Φ. Let F be the original feature set and S the selected subset. Suppose that S_{m−1} already contains m − 1 selected features. The next task is to select the m-th feature from the set F − S_{m−1}. This is done according to the following criterion:

\max_{f_j \in F - S_{m-1}} \left[ I(f_j; c) - \frac{1}{m-1} \sum_{f_i \in S_{m-1}} I(f_j; f_i) \right]. \quad (16)

The main steps can be represented as follows.

Step 1. Let F be the original feature set and S the selected subset. S is initialized as an empty subset, S = ∅.

Step 2. Calculate the relevance of each individual feature f_i with the target class c, denoted by I(f_i; c).

Step 3. Find the feature f_k which has the maximum relevance:

f_k = \arg\max_{f_i \in F} I(f_i; c). \quad (17)

Let S = {f_k}, F = F − {f_k}.

Step 4. This step consists of the following:
for m = 2, …, N: find the feature f_j ∈ F that maximizes the criterion in (16); let S = S ∪ {f_j} and F = F − {f_j}; end.

In this way, N sequential feature subsets can be obtained, satisfying S_1 ⊂ S_2 ⊂ ⋯ ⊂ S_N. In practice, the first n features can be selected as the candidate feature subset for the wrapper method, as sketched below.
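A minimal sketch of this greedy procedure, reusing the hypothetical mutual_information helper from Section 3.2.1 (the function name mrmr_select is likewise ours), might look as follows:

import numpy as np

def mrmr_select(features, labels, k, bins=16):
    # features: (num_samples x num_features) array; labels: (num_samples,).
    # Returns the indices of k features chosen by the incremental mRMR search.
    n_feat = features.shape[1]
    relevance = np.array([mutual_information(features[:, i], labels, bins)
                          for i in range(n_feat)])
    selected = [int(np.argmax(relevance))]      # Step 3: most relevant feature
    remaining = set(range(n_feat)) - set(selected)
    while len(selected) < k:                    # Step 4: incremental search
        scores = {j: relevance[j]
                     - np.mean([mutual_information(features[:, j],
                                                   features[:, i], bins)
                                for i in selected])
                  for j in remaining}           # criterion (16)
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected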

3.3. Wrapper Method Based on NSGA-II
3.3.1. A Brief Review on NSGA-II

The presence of multiple objectives in practical problems has given rise to the rapid development of multiobjective evolutionary algorithms over the past few years. The nondominated sorting genetic algorithm (NSGA), which was suggested by Goldberg and implemented by Srinivas and Deb [38], has proved to be an effective approach for multiobjective optimization problems. However, NSGA suffers from three main drawbacks: the high computational complexity of nondominated sorting, a lack of elitism, and the need to specify the sharing parameter.

As an improved version of NSGA, NSGA-II was introduced by Deb et al. in 2002 [39]. NSGA-II overcame the defects of the original NSGA by introducing a fast nondominated sorting algorithm to reduce computational complexity, an elitist-preserving mechanism to speed up the evolution, and a crowded-comparison operator to avoid specifying the sharing parameter. It has been verified that NSGA-II is able to maintain a better spread of solutions and converge better in the obtained nondominated front than other elitist multiobjective evolutionary algorithms (MOEAs). More details on its description and implementation can be found in [39].

3.3.2. Wrapper Feature Selection Using NSGA-II

In most conventional wrapper methods for fault diagnosis, the feature selection problem is formulated as a single-objective problem [19, 21, 22, 24, 40]. However, feature selection is inherently a multiobjective problem dealing with two competing objectives: the feature dimension and the classification accuracy. An optimal feature set should contain a minimal number of features and produce the minimum classification error rate.

In this work, the feature selection problem for engine fault diagnosis is formulated as a multiobjective problem. The NSGA-II mentioned above is utilized to optimize the two objectives: minimizing the number of features and minimizing the classification error rate. A step-by-step procedure for solving the feature selection problem using NSGA-II is illustrated in Figure 2.

3.3.3. Implementation Issues for Wrapper Feature Selection Using NSGA-II

In the wrapper feature selection approach, several factors control the process of NSGA-II while it searches for suboptimal feature subsets for the classifiers. To apply NSGA-II to feature selection, the following issues are addressed.

(1) Fitness Functions. Two competing objectives are defined as the fitness functions: the first is the minimization of the number of used features, and the second is the minimization of the classification error rate. Three popular classifiers, namely, the nearest neighbor classifier (NNC) [41], the naïve Bayes classifier (NBC) [42], and the least-squares support vector machine (LS-SVM) [43], are employed as induction algorithms to implement and evaluate the proposed feature selection approach. The NNC and NBC are implemented using the MATLAB Toolbox for Pattern Recognition (PRTools 4.1) [44]. The LS-SVM is implemented with LS-SVMlab1.5, which can be downloaded from [45].

(2) Encoding Scheme. A binary coding system is used to represent the chromosomes in this investigation. In a chromosome representing a feature subset, a bit with value "1" indicates that the corresponding feature is selected, and "0" indicates that it is not, as shown in Figure 3.

(3) Genetic Operators. The genetic operators consist of two basic operators, crossover and mutation. The crossover technique used is uniform crossover, which exchanges the genetic material of the two selected parents uniformly at several points. The mutation operator used in this work is the conventional one, operating on each bit separately and changing its value randomly. A sketch of these components is given below.
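The following sketch illustrates the three ingredients above, with a leave-one-out 1-NN error standing in for the NNC wrapper accuracy. It is a simplified stand-in under our own function names, not the PRTools/LS-SVMlab setup used in the experiments, and the NSGA-II sorting and crowding machinery itself (see [39]) is omitted:

import numpy as np

rng = np.random.default_rng(0)

def objectives(mask, features, labels):
    # Two objectives to minimize: feature count and 1-NN leave-one-out error.
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return (mask.size, 1.0)                 # empty subset: worst case
    X = features[:, idx]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # exclude each sample itself
    err = float(np.mean(labels[d.argmin(axis=1)] != labels))
    return (idx.size, err)

def uniform_crossover(p1, p2):
    # Exchange genes of two boolean parent chromosomes position by position.
    swap = rng.random(p1.size) < 0.5
    c1, c2 = p1.copy(), p2.copy()
    c1[swap], c2[swap] = p2[swap], p1[swap]
    return c1, c2

def mutate(chrom, rate=0.01):
    # Flip each bit independently with a small probability.
    child = chrom.copy()
    flip = rng.random(child.size) < rate
    child[flip] = ~child[flip]
    return child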

4. Results and Discussion

4.1. Engine Dataset Description

To evaluate the performance of the presented feature extraction and selection scheme, experiments are carried out on an F3L912 engine with three cylinders, from which the vibration signals are measured. Eight engine running conditions, including one healthy state and seven fault states as summarized in Table 1, are tested in the experiments. All the valve defects are set on the first cylinder. The running speed is set to 1200 rpm.

Vibration data are collected using accelerometers attached with magnetic bases to the cylinder head near the first cylinder. The sampling frequency is 20 kHz. One working cycle of the vibration signal is recorded as a sample, each of which includes 4096 sample points. Forty samples are recorded for each engine condition, giving 320 samples in total. Figure 4 shows the waveforms of the vibration signals from the eight engine states.

4.2. Feature Extraction Based on S Transform and 2DNMF

The S transform is utilized to convert the vibration signals from the time domain to the time-frequency domain, which can provide more discriminative information on engine working states. Figure 5 illustrates the time-frequency representations of the eight engine states obtained by the S transform.

For each engine sample, a 2048 × 4096 time-frequency matrix can be obtained from the S transform. This matrix is too large for direct processing, so it is first segmented into 256 × 512 blocks, where every block consists of an 8 × 8 submatrix. The mean value of each submatrix is calculated to represent its block, reducing the time-frequency matrix to 256 × 512. The new matrix still provides enough information for the classification of engine faults, but it is still unrealistic to take all the elements of the time-frequency distribution as features, since the dimension would be 131072. For this reason, it is necessary to compress the feature dimension of the time-frequency matrix. The 2DNMF, as described in Section 2.2, is used to characterize the time-frequency representations (TFRs) for engine fault diagnosis. A sketch of the block-averaging step is given below.
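The following is a minimal NumPy version of this mean pooling (block_mean is an illustrative helper name of our own, assuming the matrix divides evenly into 8 × 8 blocks):

import numpy as np

def block_mean(tfr, bh=8, bw=8):
    # Average non-overlapping bh x bw blocks: 2048 x 4096 -> 256 x 512.
    H, W = tfr.shape
    return tfr.reshape(H // bh, bh, W // bw, bw).mean(axis=(1, 3))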

Based on the S transform, 320 TFRs in total can be obtained. Forty TFRs, five samples for each engine state, are selected as training samples to calculate the column basis matrix L and row basis matrix R of the 2DNMF. The parameters d and g of the 2DNMF are both set to 10. Figures 6 and 7 show the column basis matrix and row basis matrix obtained by 2DNMF for characterizing the TFRs of the engine.

Then, by mapping each TFR onto the column basis matrix and row basis matrix according to (9), a 10 × 10 mapping coefficient matrix can be obtained. The feature matrices of the forty training TFRs are shown in Figure 8, in which every row represents one engine state.

It can be found that the feature matrices obtained by 2DNMF can distinguish the eight engine operating states very effectively. The 100 elements of the feature matrix are used as the original feature set for engine fault diagnosis.

4.3. Feature Selection Based on Mutual Information and NSGA-II

Although the original feature set obtained by 2DNMF has been shown to be effective in identifying the eight engine states, its dimension can be reduced, and the classification accuracy improved, via feature selection. In this subsection, the hybrid filter and wrapper scheme based on mutual information and NSGA-II is employed to find a more compact feature subset for engine fault diagnosis.

In the first stage, the filter method based on mutual information is used to obtain a candidate feature subset. Figures 9 and 10 show the relevance values of the 100 features and the redundancy values between the 100 features, calculated by mutual information.

According to the relevance and redundancy values calculated above, 100 sequential feature subsets can be obtained based on the mRMR criterion described in Section 3.2. The first 50 features, that is, half of the original feature set, are selected as the candidate feature subset for the wrapper method in this work.

In the second stage, the wrapper method based on NSGA-II is used to find a more compact feature subset from the candidate feature subset. For comparison with this two-stage feature selection scheme, an experiment is also conducted in which the wrapper method is applied directly to the original feature set.

Three classifiers, NNC, NBC, and LS-SVM, as mentioned in Section 3.3, are employed to evaluate the presented two-stage feature selection scheme for engine fault diagnosis. The NNC classifier is selected here to illustrate the wrapper method based on the NSGA-II algorithm. For every chromosome created by NSGA-II, the number of features selected by that chromosome and the classification error rate of the NNC classifier are taken as the fitness functions. The other parameters of NSGA-II for feature selection are summarized in Table 2.

Figures 11 and 12 show the distributions of the solutions over the objective plane obtained by NSGA-II with the NNC, based on the original feature set and the candidate feature subset, respectively. Each figure shows the solutions obtained by NSGA-II at generations 10, 20, 40, and 60; all solutions are marked as blue asterisks, and the Pareto-optimal solutions are marked with red circles. It can be observed that, as the generation number increases, the Pareto fronts move toward the ideal solution of the feature selection problem, that is, a lower feature dimension and a lower classification error rate.

Furthermore, it can clearly be seen from Figures 11 and 12 that the hybrid feature selection scheme, which applies the wrapper method to the candidate feature subset, obtained better solutions than the wrapper method applied directly to the original feature set: it achieved a similar or lower classification error rate with far fewer features.

4.4. Classification Performances of the Different Feature Subsets

In this section, the classification performance of four different feature subsets, namely, the original feature set, the subset obtained by the filter method based on mutual information, the subset obtained by the wrapper method based on NSGA-II, and the subset obtained by the hybrid filter and wrapper method, is evaluated and compared. The computation time, feature subset dimensions, and classification accuracies of the different feature subsets with the three classifiers are shown in Table 3. All the experiments are conducted on a personal computer with a 2.93 GHz CPU and 512 MB of memory, using MATLAB version 7.1.

It can be found from Table 3 that the original feature set, which consists of all 100 features, achieved satisfactory classification performance with all three classifiers for engine fault diagnosis. This verifies the effectiveness of the presented feature extraction scheme based on the S transform and 2DNMF.

However, it can also be observed that the classification accuracies of the original feature set are the lowest among the four feature subsets. This confirms our assumption that the original feature set contains many irrelevant and redundant features, which degrade the classification performance; a feature selection procedure is therefore indispensable before classification.

The filter-selected subset performed better than the original feature set, and its dimension is half that of the original. However, its classification accuracies are obviously inferior to those of the wrapper and hybrid subsets.

The subsets obtained by the wrapper method and by the hybrid filter and wrapper method demonstrated similar classification accuracies in our case: one achieved the highest classification rates with the NNC and LS-SVM classifiers, while the other gained the best performance with the NBC classifier. However, it can be found from Table 3 that the feature dimension of the hybrid subset is lower than that of the wrapper subset. Moreover, the hybrid feature selection scheme required much less computation than the wrapper method. Therefore, it is very desirable to use the hybrid filter and wrapper feature selection scheme to obtain a satisfactory refined feature subset for engine fault diagnosis.

It can also be found that very satisfactory classification performance is achieved by the three classifiers with only 7, 9, and 9 features, respectively, for engine fault diagnosis. This indicates that the presented feature extraction and selection scheme provides a very effective and efficient approach to intelligent engine fault diagnosis.

5. Conclusion

This work has presented a new feature extraction and feature selection scheme for the intelligent fault diagnosis of engines by utilizing the S transform, 2DNMF, mutual information, and NSGA-II. Eight different engine operating states, simulated on an engine test rig, are employed to evaluate the effectiveness of the proposed methods.

In the feature extraction phase, the S transform is first adopted to obtain time-frequency representations of the vibration signals. Then the 2DNMF technique is applied to characterize the time-frequency representations. Experimental results reveal that the features extracted by the S transform and 2DNMF achieve very promising performance in identifying the eight engine states.

In the feature selection phase, the hybrid filter and wrapper scheme based on mutual information and NSGA-II is employed to obtain a more compact feature subset and higher classification accuracy, and its performance is compared with that of the separate filter and wrapper methods. Experimental results show that the hybrid feature selection scheme achieves very promising performance with very few features for engine fault diagnosis. The fault classification accuracies of the three classifiers using the features selected by the presented scheme are consistently higher than those using the original feature set or the feature subsets obtained by the other feature selection methods. The dimension of the feature subset obtained by the hybrid scheme is lower than that obtained by the filter or wrapper method alone, and its computation cost is much less than that of the wrapper method.

This research clearly demonstrates that the presented feature extraction and feature selection scheme has great potential as an effective and efficient tool for the fault diagnosis of engines and can easily be extended to other machinery.

Competing Interests

The authors declare that they have no competing interests.