Abstract

Machine fault classification is an important task for intelligently identifying the health patterns of a monitored mechanical system. Effective feature extraction from vibration data is critical to the reliable classification of machine faults of different types and severities. In this paper, a new method is proposed to acquire sensitive features through a combination of local discriminant bases (LDB) and locality preserving projections (LPP). In the method, the LDB is employed to select, from the redundant wavelet packet (WP) library generated by the wavelet packet transform (WPT), the optimal WP nodes that exhibit high discrimination. Considering that the discriminatory features obtained on these selected nodes characterize the class pattern with different sensitivities, the LPP is then applied to mine the inherent class pattern structure embedded in the raw features. The proposed feature extraction method combines the merits of LDB and LPP and extracts the inherent pattern structure embedded in the discriminatory feature values of samples in different classes. Therefore, the proposed feature considers not only the discriminatory features themselves but also the sensitive class pattern structure. The effectiveness of the proposed feature is verified by case studies on vibration data-based classification of bearing fault types and severities.

1. Introduction

Machine fault classification is an important task for intelligent identification of the condition patterns of the system being monitored. For a mechanical system, vibration monitoring is often employed to evaluate the system dynamics. A specific application considered in this paper is monitoring the health condition of a machine or its components, such as bearings, to timely identify possible faults, which is increasingly significant for reducing machine downtime and ensuring high productivity. Once a fault occurs in a machine, it makes sense to identify the fault type and severity through vibration data analysis so that maintenance can be scheduled in time and safety can be guaranteed. There are many causes of machine failure. For instance, poor lubrication, acid corrosion, and plastic deformation can each cause a bearing to work in an abnormal condition [1]. In addition, typical damages of a rolling bearing are located at the outer raceway, inner raceway, or rolling element. To effectively monitor and recognize the machine condition, the major challenge is to extract reliable features from vibration data, which are often disturbed by environmental noise. Traditional features, such as time-domain and frequency-domain features, are often applied to fault diagnosis [2–6]. However, vibration signals exhibit many nonlinear characteristics, and the methods mentioned above cannot extract these nonlinear features effectively for classifying fault types and severities. Therefore, this study intends to find a feature representation of the raw signals that yields higher discriminatory information.

The wavelet transform is able to represent nonstationary signals well and to capture sensitive features with its multiresolution capability, and it has achieved great success in fault classification [7]. As one of the most widely used wavelet transform methods, the wavelet packet transform (WPT) is well known for its orthogonal, complete, and local properties [8]. The WPT decomposes a signal into a redundant binary tree of time-frequency subspaces, each of which is spanned by a set of wavelet packet (WP) base vectors. The whole collection of subspaces is called a WP library. Different WP bases give rise to different representations of a given signal, so it is important to select optimal WP bases out of the whole WP library for enhanced signal analysis or classification. For classification, the main objective is to find an optimal set of WP nodes that yield high discriminatory information for separating different classes as much as possible. This can be realized by the local discriminant bases (LDB) algorithm [9], which identifies optimal bases with high discriminatory information by applying a dissimilarity measure to the given dataset. Many related works in the last two decades have demonstrated the effectiveness of the LDB in achieving good classification by selecting the optimal WP bases from the redundant WP subspaces [9–18]. Although discriminatory features can be obtained by selecting WP nodes via the LDB, different nodes display different sensitivities in characterizing class information. In machine learning-based classification, the classification accuracy mainly depends on the sensitive features. Current methods mainly employ the most sensitive bases for classification. We focus on another approach, which uses dimensionality reduction techniques to mine more sensitive features from the whole set of discriminatory features selected by the LDB. Therefore, in this study, one challenge is how to extract the most useful and sensitive information hidden in high-dimensional data based on the selected WP nodes.

In the past few decades, many useful dimensionality reduction techniques have been employed for fault diagnosis and classification [2, 3, 19–27]. These techniques can be broadly divided into two types: linear and nonlinear approaches. Linear dimensionality reduction aims to find a set of low-dimensional bases from high-dimensional data through a linear transformation. Two well-known linear learning methods are principal component analysis (PCA) [19, 20] and linear discriminant analysis (LDA) [21, 22]. The other type, nonlinear dimensionality reduction, searches for nonlinear structure hidden in high-dimensional data. The two main nonlinear approaches are kernel-based techniques [2, 23, 24] and manifold learning techniques [25–27]. Manifold learning pursues the goal of embedding data that originally lie in a high-dimensional space into a lower-dimensional space while preserving local characteristic properties, for example, local geometric properties (Isomap [28]), local embedding structure (LLE [29]), local adjacency relations (LE [30]), and local tangent space information (LTSA [31]). Although these nonlinear manifold learning methods have been effectively applied to machine fault classification, they incur heavy computational cost and are difficult to extend to new data [25–27, 32]. He and Niyogi [33] proposed a linear model, locality preserving projections (LPP), which can reveal the nonlinear manifold structure embedded in a dataset through a kernel that maintains local information. LPP has the notable advantage of providing an explicit linear map for the manifold learning problem, which makes it simple to operate and easy to apply to new samples. Some works have indicated that LPP is beneficial to feature extraction in machine fault classification [3, 24]. Hence, the LPP is employed in this study to extract the sensitive information hidden in the raw feature data from the selected WP nodes.

In this paper, based on the energy features of the nodes selected by the LDB algorithm from the WP library, a new effective feature is proposed that uses LPP to mine the nonlinear pattern information, applied here to bearing fault classification. The proposed feature is intended to overcome the weakness that the discriminatory WP nodes characterize the fault pattern with different sensitivities. Specifically, vibration signals from bearings with different fault types and severities are first decomposed into the WP library, and the LDB is then applied to identify the optimal WP subspaces that supply maximum dissimilarity information among the classes. After that, the root energies of the selected nodes constitute a raw feature set. Owing to the redundancy of these features in representing the fault pattern, some important sensitive information may be submerged among them. Therefore, the LPP is employed to extract the nonlinear sensitive pattern information embedded in the dataset. These sensitive features are finally chosen as inputs to a diagnostic classifier for characterizing bearing fault types and severities.

The rest of this paper is organized as follows. Section 2 describes the theoretical background and major principle of the proposed feature extraction method that combines the LDB and the LPP. In Section 3, experimental results on bearing fault classification are used to verify the effectiveness of the proposed method as compared to other traditional feature extraction methods. Finally, conclusions are provided in Section 4.

2. Theoretical Background

2.1. WPT for Signal Decomposition

The WPT is an excellent signal decomposition tool with the well-known properties of being orthogonal, complete, and local [8]. In operation, the WPT recursively filters the signal being analyzed with a pair of low-pass and high-pass filters. In this way, a signal is decomposed by the WPT into a set of WP nodes in the form of a full binary tree, where each node corresponds to a specific time-frequency subspace. Let $\Omega_{0,0}$ denote the vector space corresponding to the root node of the tree. At each level, a vector space is split into two mutually orthogonal subspaces by the pair of low-pass and high-pass filters. The split process can be given by

$\Omega_{j,k} = \Omega_{j+1,2k} \oplus \Omega_{j+1,2k+1}$,  (1)

where $j$ indicates the level of the tree and $k$ represents the node index in level $j$ with $k = 0, 1, \ldots, 2^j - 1$. This process is repeated until level $J$, giving rise to $2^J$ mutually orthogonal subspaces.

Each subspace $\Omega_{j,k}$ is spanned by a set of base vectors $\{w_{j,k,m}\}_{m=0}^{2^{n-j}-1}$, where $0 \le j \le J$ and $0 \le k \le 2^j - 1$ ($N = 2^n$ is the signal length and $J$ is the maximum level of signal decomposition). The vector $w_{j,k,m}$ represents the WP base function indexed by the triplet $(j, k, m)$ representing scale, frequency band (oscillation), and time position, respectively.

The WP coefficients of a signal $x$ can be calculated as the inner product of the signal with every WP base function as follows:

$c_{j,k,m} = \langle x, w_{j,k,m} \rangle$,  (2)

where $c_{j,k,m}$ denotes the $k$th set of WP coefficients at the $j$th scale parameter and $m$ is the translation parameter. In other words, at level $J$ the signal is decomposed into $2^J$ subspaces with $2^{n-J}$ coefficients in each subspace.

The signal can then be expressed as

$x = \sum_{(j,k) \in T} \sum_{m} c_{j,k,m} w_{j,k,m}$,  (3)

where, in the index set $T$, $(j,k)$ corresponds to the terminal (leaf) nodes and $c_{j,k,m}$ are the base vector coefficients at position $m$.
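As a concrete illustration of the decomposition in (1)-(3), the following sketch builds the full WP tree of a signal using the PyWavelets library (an implementation choice assumed here; the paper names no software). The helper wp_decompose and its defaults mirror the settings used later in Section 3.2 (Daubechies 8 wavelet, level 6).

```python
import numpy as np
import pywt

def wp_decompose(x, wavelet="db8", level=6):
    """Full WP binary tree of a 1-D signal.

    Returns a dict mapping each node path (e.g. 'ad', level = len(path))
    to its coefficient array, i.e. the WP library of Section 2.1.
    """
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    coeffs = {}
    for j in range(1, level + 1):
        for node in wp.get_level(j, order="freq"):
            coeffs[node.path] = np.asarray(node.data)
    return coeffs

# Example on a 2000-point signal, the sample length used in Section 3.1.
x = np.random.randn(2000)
library = wp_decompose(x)
print(sum(len(p) == 6 for p in library))  # 2^6 = 64 terminal subspaces
```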

2.2. LDB for WP Selection

The LDB is a pruning algorithm that identifies the subspaces, and their bases, that exhibit high discrimination between signal classes according to a given dissimilarity measure [9]. The optimal selection of LDB subspaces for a given dataset is driven by the nature of the dataset and the dissimilarity measure. A dissimilarity measure is designed to evaluate the "statistical distances" among different classes for each WP node. Numerous dissimilarity measures have been developed so far, such as relative entropy, energy difference, correlation index, and nonstationarity. In this paper, relative entropy is adopted as the dissimilarity measure for identifying optimal WP subspaces.

The LDB algorithm is used to identify the WP nodes that exhibit high discrimination, indicated by a large statistical distance between classes. A set of training signals for all classes is decomposed into full binary WP trees of depth $J$. Let each signal in the training set be denoted by $x_i^{(c)}$, where the indices $i = 1, \ldots, N_c$ and $c = 1, \ldots, C$ correspond to the $i$th training signal in the $c$th class. The WP tree is pruned by the LDB algorithm in such a way that a node is split only if the cumulative discriminant measure of its children nodes is greater than that of the parent node; in other words, a node is split only if the children nodes have better discriminative power than the parent node. As a result, the process ends with a subset of terminal WP nodes that maximize the statistical distance between different classes.

Mathematically, the LDB selection process is described as follows. Suppose that $A_{j,k}$ represents the desired local discriminant basis restricted to the span of $B_{j,k}$, the set of base vectors at node $(j,k)$, and that $\Delta_{j,k}$ is the array containing the discriminant measure of the same node.

LDB Algorithm. A training dataset $\{\{x_i^{(c)}\}_{i=1}^{N_c}\}_{c=1}^{C}$ consisting of $C$ classes of signals, with $N_c$ being the total number of training signals in class $c$, is given.

Step 1. Choose a time-frequency decomposition method, such as the WPT, to decompose the signals contained in the training dataset.

Step 2. Construct time-frequency energy maps $\Gamma_c$ for $c = 1, \ldots, C$ on the WP coefficients. Here, $\Gamma_c$ is calculated by accumulating the squares of the expansion coefficients of the signals at each position, followed by a normalization with respect to the total energy of all the training signals belonging to class $c$, as follows:

$\Gamma_c(j,k,m) = \dfrac{\sum_{i=1}^{N_c} \left( w_{j,k,m}^{T} x_i^{(c)} \right)^2}{\sum_{i=1}^{N_c} \left\| x_i^{(c)} \right\|^2}$,  (4)

where $j = 0, \ldots, J$, $k = 0, \ldots, 2^j - 1$, and $m = 0, \ldots, 2^{n-j} - 1$.

Step 3. Set $A_{J,k} = B_{J,k}$, where $B_{J,k}$ is the base set spanning the subspace of node $(J,k)$, and then evaluate $\Delta_{J,k}$ for $k = 0, \ldots, 2^J - 1$. Let $\Delta_{j,k} = D(\{\Gamma_c(j,k,\cdot)\}_{c=1}^{C})$; for multiple-class problems, the dissimilarity measure based on relative entropy is expressed as

$D(\{\Gamma_c\}_{c=1}^{C}) = \sum_{p=1}^{C-1} \sum_{q=p+1}^{C} \sum_{m} \Gamma_p(j,k,m) \log \dfrac{\Gamma_p(j,k,m)}{\Gamma_q(j,k,m)}$.  (5)

Step 4. Determine the best subspace $A_{j,k}$ for $j = J-1, \ldots, 0$ and $k = 0, \ldots, 2^j - 1$ by the following rule: set $\Delta_{j,k} = D(\{\Gamma_c(j,k,\cdot)\}_{c=1}^{C})$; if $\Delta_{j,k} \ge \Delta_{j+1,2k} + \Delta_{j+1,2k+1}$, that is, if the discriminatory power of a parent node in the WP tree is greater than that of its children nodes, then $A_{j,k} = B_{j,k}$; else $A_{j,k} = A_{j+1,2k} \oplus A_{j+1,2k+1}$ and set $\Delta_{j,k} = \Delta_{j+1,2k} + \Delta_{j+1,2k+1}$.

Step 5. Order the chosen base functions by their power of discrimination.

Step 6. Use the $K$ (normally much less than $N$) most discriminant base functions for constructing classifiers.

After Step 4 is performed, a complete set of orthogonal bases is constructed. The orthogonality of the bases ensures that the wavelet coefficients used as features during the classification process are as uncorrelated as possible. Subsequently, one can use all the WP coefficients from each of the terminal nodes of the pruned tree, use only the subset with the highest discriminant power as in Step 6, or employ a statistical method to produce low-dimensional features as the input features of a classifier for discriminating different classes. In this paper, the WP coefficients of the selected optimal WP nodes are used to calculate the root energy contained in each node. Mathematically, for the WP coefficients $c_{j,k,m}$, $m = 0, \ldots, 2^{n-j} - 1$, of each selected node $(j,k)$, the root energy is calculated as

$E_{j,k} = \sqrt{\sum_{m} c_{j,k,m}^{2}}$.  (6)

The root energy values of all of the selected nodes are put together to form a vector denoted by $F_{\mathrm{LDB}} = [E_{j,k}]_{(j,k) \in S}$ (where $S$ is the subscript set of the selected WP nodes), which is conveniently called the LDB feature. The set of root energy values of the WP nodes at the final level of the WPT is called the WPT feature in this paper and denoted by $F_{\mathrm{WPT}}$.
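A minimal sketch of Steps 1-4 and of the root-energy feature in (6) follows, building on the wp_decompose helper sketched in Section 2.1; all function and variable names are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def energy_maps(signals_by_class, level=6):
    """Step 2: per-class normalized time-frequency energy maps (4)."""
    maps = {}
    for c, signals in signals_by_class.items():
        acc, total = {}, 0.0
        for x in signals:
            total += float(np.sum(np.asarray(x, dtype=float) ** 2))
            for path, data in wp_decompose(x, level=level).items():
                acc[path] = acc.get(path, 0.0) + data ** 2
        maps[c] = {path: e / total for path, e in acc.items()}
    return maps

def relative_entropy(maps, path, eps=1e-12):
    """Pairwise relative entropy (5) among all classes at one WP node."""
    classes = list(maps)
    d = 0.0
    for i, p in enumerate(classes):
        for q in classes[i + 1:]:
            gp = maps[p][path] + eps
            gq = maps[q][path] + eps
            d += float(np.sum(gp * np.log(gp / gq)))
    return d

def select_ldb_nodes(maps, level=6):
    """Steps 3-4: bottom-up pruning of the WP tree."""
    delta = {path: relative_entropy(maps, path)
             for path in maps[next(iter(maps))]}
    best = {path: [path] for path in delta if len(path) == level}
    for j in range(level - 1, 0, -1):
        for path in (p for p in delta if len(p) == j):
            child_sum = delta[path + "a"] + delta[path + "d"]
            if delta[path] >= child_sum:
                best[path] = [path]                    # parent wins
            else:
                best[path] = best[path + "a"] + best[path + "d"]
                delta[path] = child_sum
    # Terminal nodes of the pruned tree (the root itself is never kept here).
    return best["a"] + best["d"]

def ldb_feature(x, selected, level=6):
    """Root energy (6) of each selected node -> LDB feature vector."""
    coeffs = wp_decompose(x, level=level)
    return np.array([np.sqrt(np.sum(coeffs[p] ** 2)) for p in selected])
```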

2.3. LPP for Feature Pattern Mining

In this section, we briefly describe the LPP algorithm for learning a locality preserving subspace from high-dimensional data containing the sample values of a feature vector $F_{\mathrm{LDB}}$ or $F_{\mathrm{WPT}}$. Let $X = [x_1, x_2, \ldots, x_N]$ denote a data matrix representing a set of $D$-dimensional samples of size $N$ with zero mean. Now, consider the problem of representing the data matrix $X$ by a single vector $y = [y_1, y_2, \ldots, y_N]^T$ such that $y_i$ represents $x_i$. We thus seek a linear mapping, denoted by a transformation vector $a$, from the $D$-dimensional space to a one-dimensional space, so that $y_i = a^T x_i$. LPP is a technique that seeks to preserve the intrinsic geometry and local structure of the data. The criterion of the objective function for choosing a map of the LPP is as follows:

$\min_{a} \sum_{i,j} (y_i - y_j)^2 W_{ij}$,  (7)

where $y_i$ is the one-dimensional representation of $x_i$ and the matrix $W$ is a similarity matrix.

A possible way of defining $W$ is as follows:

$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2 / t\right), & \|x_i - x_j\|^2 < \varepsilon, \\ 0, & \text{otherwise}, \end{cases}$  (8)

where the parameter $t \in \mathbb{R}$ and $\varepsilon$ defines the radius of the local neighborhood and is sufficiently small but bigger than 0. Two samples $x_i$ and $x_j$ are viewed as being within a local $\varepsilon$-neighborhood provided that $\|x_i - x_j\|^2 < \varepsilon$.

The objective function in (7) with the choice of symmetric weights $W_{ij}$ incurs a heavy penalty if neighboring points $x_i$ and $x_j$ are mapped far apart, that is, if $(y_i - y_j)^2$ is large. Therefore, minimizing the objective function ensures that if $x_i$ and $x_j$ are close, then $y_i$ and $y_j$ are close as well. On this basis, the local structure of the input data can be preserved. Following some algebraic steps, we get

$\frac{1}{2} \sum_{i,j} (y_i - y_j)^2 W_{ij} = a^T X L X^T a$,  (9)

where $D$ is a diagonal matrix whose entries are the column (or row, since $W$ is symmetric) sums of $W$, $D_{ii} = \sum_j W_{ji}$, and $L = D - W$ is the Laplacian matrix. The bigger the value $D_{ii}$ (corresponding to $y_i$) is, the more important $y_i$ is. Therefore, the LPP algorithm imposes a constraint as follows:

$y^T D y = 1$, that is, $a^T X D X^T a = 1$.  (10)

Then, the minimization problem reduces to finding

$\arg\min_{a^T X D X^T a = 1} a^T X L X^T a$.  (11)

The transformation vector $a$ that minimizes the objective function is finally given by the minimum-eigenvalue solution to the generalized eigenvalue problem

$X L X^T a = \lambda X D X^T a$,  (12)

where the matrices $X L X^T$ and $X D X^T$ are symmetric and positive semidefinite. The top several projective vectors that minimize the objective function are the optimal linear approximations to the eigenfunctions of the Laplace–Beltrami operator on the manifold, so they are capable of discovering the nonlinear manifold structure [33]. In this paper, the top several projective vectors are chosen as the mapping vectors to represent the LDB feature. They characterize the inherent class pattern and are thus expected to mine the useful sensitive features for classification.
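The following sketch implements (7)-(12) under the assumption of a k-nearest-neighbor heat-kernel similarity matrix (the ε-neighborhood in (8) is the alternative stated in the paper; the k-NN variant matches the "neighborhood parameter 12" used in Section 3.2). scipy.linalg.eigh solves the symmetric generalized eigenproblem in (12).

```python
import numpy as np
from scipy.linalg import eigh

def lpp_fit(X, n_components=3, k=12, t=1.0):
    """X: (N, D) zero-mean samples (rows). Returns a (D, n_components)
    projection matrix whose columns solve (12) for the smallest eigenvalues.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros_like(sq)
    nbrs = np.argsort(sq, axis=1)[:, 1:k + 1]        # k nearest neighbors
    for i in range(X.shape[0]):
        W[i, nbrs[i]] = np.exp(-sq[i, nbrs[i]] / t)  # heat kernel as in (8)
    W = np.maximum(W, W.T)                           # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                        # graph Laplacian
    # Generalized eigenproblem (12); a tiny ridge keeps X^T D X definite.
    M_L = X.T @ L @ X
    M_D = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])
    _, vecs = eigh(M_L, M_D)                         # ascending eigenvalues
    return vecs[:, :n_components]

def lpp_transform(X, A):
    return X @ A                                     # y_i = A^T x_i, row form
```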

2.4. Proposed Feature Extraction Scheme for Data Classification

Among the techniques mentioned above, the LDB and the LPP have complementary merits for classification. Specifically, the LDB algorithm focuses on the identification of optimal decomposition subspaces for discriminatory feature extraction, while the LPP addresses the nonlinear pattern structure that represents the inherent condition class pattern. In other words, the LDB focuses on the extraction of optimal raw features, but each feature characterizes the class pattern with a different sensitivity, while the LPP mainly addresses mining the inherent class pattern structure embedded in the raw features. Therefore, this paper proposes to combine the merits of these two techniques into a novel feature extraction method. Specifically, the novel feature extracts the inherent pattern structure embedded in the optimal WP nodes. The proposed feature thus considers not only the static discriminatory WP node features themselves but also the sensitive class pattern structure embedded in the samples.

The idea of the proposed feature is illustrated in Figure 1. Although the optimal WP nodes (filled in black in Figure 1) have been selected through the LDB algorithm, they have different sensitivities in characterizing the class pattern. However, after applying the LPP algorithm to the feature values, a new sensitive feature that clearly represents the class pattern is effectively extracted. In this process, the sensitive feature characterizes a nonlinear class pattern manifold embedded in the sample values of the raw features. This indicates that the LPP is beneficial for improving the class sensitivity of the selected discriminatory features. Therefore, this new kind of feature is well suited to vibration data-based machine fault classification.

Based on the principle of combining LDB and LPP, the proposed feature extraction algorithm can then be described as follows.

LDB-LPP Feature Extraction Algorithm. A training dataset consisting of $C$ classes of signals and a testing dataset are given.

Step 1. Conduct the WPT to decompose the signals contained in the dataset into the WP library of level $J$ via (2).

If the signal is from the training dataset, go to Step 2.

Else, if the signal is from the testing dataset, then go to Step 3.

Step 2. Conduct the LDB algorithm to identify the optimal WP nodes that supply maximum dissimilarity information among the training dataset.

Step 3. Calculate the root energy of the coefficients of the selected WP nodes to constitute a raw feature set via (6).

If the signal is from the training dataset, go to Step 4.

Else, if the signal is from the testing dataset, then go to Step 5.

Step 4. Apply the LPP algorithm to the raw feature value sets of the training dataset to obtain the mapping matrix by solving (12); then go to Step 5.

Step 5. Use the mapping matrix obtained in Step 4 to calculate the new feature values of the dataset.

The proposed features have the most sensitive discriminatory capability and are thus chosen as inputs to a diagnostic classifier for characterizing data classes. For clarity, the flowchart of the proposed algorithm is shown in Figure 2, together with the scheme of machine fault classification. The scheme includes two parts: the LDB-LPP feature values are first extracted for both the training and testing signals, and then a diagnostic classifier is trained for classification of the fault signals.
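Putting the pieces together, a compact sketch of Steps 1-5 is given below, reusing the hypothetical helpers sketched in Section 2; the function names and the k = 12 neighborhood are assumptions carried over from those sketches and from Section 3.2.

```python
import numpy as np

def ldb_lpp_extract(train_by_class, test_signals, level=6, n_components=3):
    # Steps 1-2: decompose training signals, select discriminative WP nodes.
    maps = energy_maps(train_by_class, level=level)
    selected = select_ldb_nodes(maps, level=level)
    # Step 3: root-energy raw features (6) for both datasets.
    F_train = np.array([ldb_feature(x, selected, level)
                        for xs in train_by_class.values() for x in xs])
    F_test = np.array([ldb_feature(x, selected, level) for x in test_signals])
    # Step 4: learn the LPP mapping on zero-mean training features via (12).
    mu = F_train.mean(axis=0)
    A = lpp_fit(F_train - mu, n_components=n_components, k=12)
    # Step 5: map both datasets with the learned matrix.
    return (F_train - mu) @ A, (F_test - mu) @ A
```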

3. Experimental Results and Analysis

In order to evaluate the effectiveness of the feature extraction scheme proposed above for machine fault classification, the bearing data with multiple faults from real bearing experiments are analyzed in this study.

3.1. Experimental Dataset

The experimental data are from the Case Western Reserve University Bearing Data Center [34]. The experimental setup consists of four parts: an induction motor, a dynamometer, a torque transducer, and control electronics. The resulting vibration was measured by an accelerometer mounted on the motor housing at the drive end of the motor, as illustrated in Figure 3. The accelerometer is a vibration sensor with a bandwidth up to 5000 Hz and a 1 V/g output. Single-point faults of sizes 0.007, 0.014, 0.021, and 0.028 inches were seeded on the drive-end bearings using the electric discharge machining approach. These faults were set on the rolling element, inner raceway, and outer raceway, respectively. The sampling frequency of the data is 12 kHz, the sample length is 2000 points, and the motor speed was 1748 rev/min.

Datasets A and B to be analyzed consist of ten classes (class labels are marked in Table 1) covering different bearing fault types and severities as listed in Table 1. In the datasets, there are four different fault types including normal, outer-race fault, inner-race fault, and ball fault, and each of the last three fault types includes three different defect sizes of 0.007, 0.014, and 0.021 inches, respectively. In dataset A, the samples are split into 500 training ones (50 in each class) and 500 testing ones (50 in each class), while dataset B contains 250 training samples (25 in each class) and 250 testing samples (25 in each class). This is a complex ten-class problem to identify both the fault type and the fault size for the operating bearing conditions.

3.2. Feature Evaluation

In this study, the decomposition level of the WPT is set to 6 and the Daubechies 8 wavelet is employed. The nodes selected by the LDB are shown in Figure 4. In the following study, the root energy of the signal in each selected node is calculated to form the raw features of the proposed method, while the root energy in each node of the last level is used to form the traditional WPT feature for comparison.

To quantitatively evaluate the capability of the LDB feature in pattern classification, three common clustering evaluation metrics are analyzed as follows. The first is the widely used discriminant factor. Suppose that there is a feature vector $f \in \mathbb{R}^d$, where $d$ is the dimension of the feature; then the discriminant factor is defined as follows:

$J = \dfrac{S_b}{S_w}$,  (13)

where $S_b$ indicates the between-class scatter describing the scatter among different classes, while $S_w$ is the within-class scatter representing the concentration within the same class. These two scatters are, respectively, defined as

$S_b = \sum_{c=1}^{C} N_c \left\| \bar{f}^{(c)} - \bar{f} \right\|^2, \qquad S_w = \sum_{c=1}^{C} \sum_{i=1}^{N_c} \left\| f_i^{(c)} - \bar{f}^{(c)} \right\|^2$,  (14)

where $N_c$ is the total number of samples in the $c$th class, $\bar{f}^{(c)}$ is the average feature vector for samples in the $c$th class, and $\bar{f}$ is the total average of the feature vectors over all classes. It can be seen that the discriminant factor is a comprehensive indicator that combines the between-class scatter and the within-class scatter. A larger discriminant factor indicates a stronger discriminating capability of the given feature and is better for classification.
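A short sketch of (13)-(14) follows, assuming the Euclidean scatter around the class means and the global mean as reconstructed above.

```python
import numpy as np

def discriminant_factor(F, labels):
    """F: (N, d) feature matrix; labels: (N,) class labels. Larger is better."""
    mean_all = F.mean(axis=0)
    s_b = s_w = 0.0
    for c in np.unique(labels):
        Fc = F[labels == c]
        mean_c = Fc.mean(axis=0)
        s_b += len(Fc) * np.sum((mean_c - mean_all) ** 2)  # between-class
        s_w += np.sum((Fc - mean_c) ** 2)                  # within-class
    return s_b / s_w
```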

The other two clustering evaluation metrics are the cluster accuracy (ACC) and normalized mutual information (NMI) metrics [35], which are defined as follows, respectively.

Cluster Accuracy (ACC). Assuming that $r_i$ and $s_i$ are the acquired cluster label and the provided real label of a given point $x_i$, the ACC is defined as follows:

$\mathrm{ACC} = \dfrac{\sum_{i=1}^{N} \delta\left(s_i, \mathrm{map}(r_i)\right)}{N}$,  (15)

where $N$ is the total number of samples, $\delta(x, y)$ equals 1 if $x = y$ and 0 otherwise, and $\mathrm{map}(\cdot)$ is the optimal mapping function that permutes each cluster label $r_i$ to match the real labels; it can be found by the Kuhn-Munkres (KM) algorithm. Here, we assume that the relationship of the identified clusters with the predefined classes is known. Thus, a larger ACC value indicates better clustering and, generally, better classification.
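The ACC in (15) can be computed as sketched below, with the optimal label mapping obtained by the Kuhn-Munkres algorithm as implemented in scipy.optimize.linear_sum_assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(true_labels, cluster_labels):
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # overlap[r, s] = number of points with cluster label r and true label s.
    overlap = np.array([[np.sum((cluster_labels == r) & (true_labels == s))
                         for s in classes] for r in clusters])
    rows, cols = linear_sum_assignment(-overlap)  # maximize matched overlap
    return overlap[rows, cols].sum() / len(true_labels)
```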

Normalized Mutual Information (NMI). This is a mutual information (MI) based metric defined as

$\mathrm{NMI} = \dfrac{\sum_{p=1}^{C} \sum_{q=1}^{C} N_{p,q} \log\left( \dfrac{N \cdot N_{p,q}}{N_p \hat{N}_q} \right)}{\sqrt{\left( \sum_{p=1}^{C} N_p \log \dfrac{N_p}{N} \right) \left( \sum_{q=1}^{C} \hat{N}_q \log \dfrac{\hat{N}_q}{N} \right)}}$,  (16)

where $N_p$ is the number of acquired samples in cluster $p$ and $\hat{N}_q$ is the number of provided samples in the ground truth class $q$. In addition, $N_{p,q}$ is the number of intersected samples between cluster $p$ and class $q$. A larger NMI reveals better clustering performance, which is beneficial to classification.
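For the NMI in (16), scikit-learn's normalized_mutual_info_score with geometric normalization matches the square-root normalization reconstructed above, as in the following sketch.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def nmi(true_labels, cluster_labels):
    # 'geometric' reproduces the sqrt normalization in (16).
    return normalized_mutual_info_score(true_labels, cluster_labels,
                                        average_method="geometric")

# Toy check: a perfect clustering (up to relabeling) gives NMI = 1.
print(nmi(np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0])))  # -> 1.0
```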

For visualization and a fair comparison, the dimensions of the LDB and the WPT features are both reduced to 3 by dimensionality reduction techniques including the LPP and the traditional PCA. Note that the LDB feature followed by the LPP yields exactly the feature proposed in this study. We then calculated the three clustering evaluation metrics mentioned above. Here, the $k$-means clustering method is applied to obtain the cluster labels of the reduced 3-dimensional features before calculating the ACC and NMI. The number of clusters used in the $k$-means method is set to 10, which is the number of classes. Moreover, to realize efficient and stable convergence, the initial points are set as the mathematical center of each class.

The neighborhood parameter of the LPP is taken as 12. As an illustration, the scatter plots of the training data of the ten-class dataset A are drawn in Figure 5. It can be seen that the LDB feature shows a better classification capability than the WPT feature. It can also be found that the LPP has a clearly better classification capability than the PCA. Accordingly, the LDB-LPP shows the best classification capability in terms of between-class and within-class scatter. The extracted feature patterns are also demonstrated in Figure 6, where it can be clearly seen that the third LPP component of the LDB feature values distinguishes each class better than the third LPP component of the WPT feature values. Moreover, the quantitative results listed in Table 2 also support the above statements. The clustering evaluation metrics $J$, ACC, and NMI of the LDB feature are higher than those of the WPT feature, and the LPP performs much better than the PCA. The combination of LDB and LPP shows the most beneficial performance for classification. Moreover, the clustering evaluation of dataset B (with half the number of samples of dataset A) is also computed, as shown in Table 3. These clustering evaluation values show the same tendency and indicate that the LPP can learn a good nonlinear class pattern structure from the discriminatory LDB feature values.

3.3. Classification of Fault Types and Severities

To further evaluate the performance of the proposed feature in data classification, the ten-class datasets A and B are employed for fault classification by comparing various features. In this study, the proposed LDB-LPP feature is compared to traditional feature extraction methods including PCA, LDA, LPP, supervised LPP (SLPP) [36], LE, and LLE. Among the six methods, PCA and LPP are unsupervised linear techniques, LDA and SLPP are supervised linear techniques, and LE and LLE are nonlinear manifold learning techniques.

To emphasize the feature performance, the nearest mean classifier, one of the simplest and most intuitive statistical classifiers, is applied for classification in this study. This classifier is based on the closest Euclidean distance and the similarity principle that similar patterns should be assigned to the same class. In this study, the mean vector of the training data in each class is used to represent that pattern class, and samples are assigned according to the minimum distance criterion, which corresponds to maximum similarity. Moreover, a more advanced classifier, the Gaussian mixture model (GMM) classifier, is also applied in this study.
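A minimal sketch of the nearest mean classifier described here is given below; the function names are illustrative.

```python
import numpy as np

def nearest_mean_fit(F_train, labels):
    classes = np.unique(labels)
    means = np.array([F_train[labels == c].mean(axis=0) for c in classes])
    return classes, means

def nearest_mean_predict(F_test, classes, means):
    # Euclidean distance from each test sample to every class mean.
    d = np.linalg.norm(F_test[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```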

The recognition accuracies of the proposed LDB-LPP feature and the other comparison features (extracted from the WPT feature without node selection) are shown in Table 4. It can be seen that the proposed LDB-LPP feature outperforms the traditional features produced by PCA, LDA, LPP, SLPP, LE, and LLE, which verifies the benefit of the LDB for choosing discriminatory features. Moreover, the GMM classifier further improves the recognition rate over the nearest mean classifier. Note that the recognition accuracy on dataset A is generally higher than that on dataset B because the number of samples in dataset A is larger. The improvement in recognition accuracy of the LDB-LPP feature over the other features is more obvious on dataset B than on dataset A for both classifiers. For instance, introducing the LDB into the LPP feature extraction yields an average testing improvement of 1% on dataset A, while on dataset B the average testing improvement is 3% (over the two classifiers). Figures 7 and 8 intuitively show that the proposed LDB-LPP feature performs the best among all the comparison features. In this study, the testing recognition accuracy based on the LDB-LPP feature is equal or very close to 100% for both classifiers. These results imply that the proposed LDB joint LPP feature extraction method can significantly improve classification accuracy.

4. Conclusions

This paper presents a feature extraction method that integrates the LDB and the LPP to explore useful and powerful characteristics for vibration data-based machine fault classification. The LDB is used to select the most discriminant WP bases from a library of redundant, orthogonal time-frequency subspaces. The input features are produced from the selected optimal wavelet bases, but they possess different sensitivities in characterizing class information. The LPP is then employed to acquire the sensitive feature that characterizes the inherent class pattern embedded in the raw features, yielding much better identification accuracy. The proposed feature extraction method combines the merits of the LDB and the LPP and thus displays valuable benefits for data classification. To verify the effectiveness of the proposed method, vibration data representing different bearing fault types and severities are analyzed in comparison with other features extracted from the WPT feature. The experimental results for bearing fault classification indicate that the LDB-LPP feature is more effective than the feature extraction methods based on the WPT feature without base selection. The presented LDB joint LPP feature extraction method is also expected to be well suited to fault classification of other machine components, such as gears, spindles, and cutting tools, owing to its excellent representation of the class patterns.

Moreover, the technical aspects of the proposed LDB-LPP feature extraction framework can be further improved and strengthened. First, this paper fairly compares the LDB feature and the WPT feature at the same decomposition level, which validates the benefits of the LDB in data classification; however, how to select a well-suited decomposition level for the LDB remains an open issue for further study. Second, the LPP is a typical and effective feature extraction method that obtains the manifold structure through a linear projection. Although the LPP has been successfully used to overcome the weakness of the LDB in this study, it would be meaningful to apply newer, well-performing manifold learning methods in place of the LPP in the proposed framework to further enhance classification performance; this should also depend on the complexity of the data to be analyzed. Other possible applications to complex classification problems remain to be studied in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 51005221), the Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20103402120017) and the Program for New Century Excellent Talents in University, China (Grant no. NCET-13-0539). The authors would like to thank Case Western Reserve University for offering free download of the bearing data and the anonymous reviewers for their constructive and valuable comments.