#### Abstract

This paper presents a comprehensive multifault diagnosis methodology for incipient rolling element bearing failures. This is done by combining a wavelet packet transform- (WPT-) based kurtogram and a new vector median-based feature analysis technique. The proposed approach first extracts useful features that are characteristic of the bearing health condition from the time domain, frequency domain, and envelope power spectrum of incoming acoustic emission (AE) signals by using a WPT-based kurtogram. Then, an enhanced feature analysis approach based on the linear discriminant analysis (LDA) technique is used to select the most discriminant bearing fault features from the original feature set. These selected fault features are used by a Naïve Bayes (NB) classifier to classify the bearing fault conditions. The performance of the proposed methodology is tested and validated under various bearing fault conditions on an experimental test rig and compared with conventional state-of-the-art approaches. The proposed bearing fault diagnosis methodology yields average classification accuracies of 91.11%, 96.67%, 98.89%, 99.44%, and 98.61% at rotational speeds of 300, 350, 400, 450, and 500 rpm, respectively.

#### 1. Introduction

For the past several decades, the development of reliable fault diagnosis systems to accurately detect and classify various bearing faults has been at the heart of research in the field of machine condition monitoring for preventive and predictive maintenance. These fault diagnosis systems aim to accurately detect bearing faults at the early stages of their development in order to prevent potential breakdown of industrial machines, thereby improving their reliability and reducing maintenance costs.

Several signal processing-based methods have been developed for fault diagnosis of rolling element bearings. These methods extract characteristic fault features from the inherently nonstationary signals obtained from faulty bearings. Antoni [1] presented the kurtogram, which has proven to be a powerful method for characterizing and extracting hidden nonstationary features in bearing fault signals [2–4]. The kurtogram method relies on spectral kurtosis, which was originally proposed by Dwyer [5], to detect the presence of nonstationary transients and accurately indicate their location in the frequency bands of bearing signals. Recently, in attempts to enhance the performance of the kurtogram method for fault diagnosis, many researchers have integrated the kurtogram either with the short-time Fourier transform (STFT) [6] or with multirate filter banks (MRFB) [1]. These approaches, however, have yielded little improvement in extracting transient characteristics and have rendered kurtogram analysis more sensitive to irrelevant impulsive components [7]. These shortcomings have been addressed by introducing enhanced variations of the kurtogram, such as the one proposed by Shi et al. [8], which uses the complex Morlet wavelet transform. The complex Morlet wavelet transform, however, has its own limitations. These include its high computational complexity and its inability to effectively separate the high frequency components, which usually carry the characteristic fault information. In this paper, we propose a way to improve the kurtogram-based method by using the envelope power spectrum and the wavelet packet transform (WPT). The wavelet packet transform (WPT) is computationally more efficient and highly effective in frequency decomposition. The proposed kurtogram is performed by computing kurtosis values from the envelope power spectrum of each subband signal at different levels of decomposition. The subband signal that yields the highest spectral kurtosis value is selected for extracting the features that are characteristic of the bearing fault conditions.

In addition to feature extraction techniques, feature analysis methods have also been widely studied in machine fault diagnosis in order to prevent the degradation of classification performance that is caused by redundant information in the feature space [9–13]. Dimensionality reduction techniques have been used to identify the features that effectively represent the high-dimensional data in a lower-dimensional space while retaining the intrinsic information of the bearing defects; these eventually lead to improvements in the classification performance of the fault diagnosis system. Among various dimensionality reduction methods, principle component analysis (PCA) [14], independent component analysis (ICA) [15], and linear discriminant analysis (LDA) [16] are the most popular methods that have been effectively utilized in machinery fault diagnosis. For instance, Widodo and Yang [11] applied the PCA technique to machine health prognostics to extract population characteristics from the available condition monitoring data. Guo et al. [12] investigated the performance of an ICA algorithm in separating the envelopes of vibration signals to extract the weak impulsive features of incipient faults in rolling element bearings. In their study looking into fault diagnosis in ball bearings, Harmouche et al. [13] demonstrated the use of an LDA technique to effectively discriminate the rather weak and otherwise difficult to detect/identify spectral features extracted from the envelope spectra of vibration signals. The LDA feature space yields better separation between different classes of bearing faults than conventional feature analysis methods, thereby resulting in improved classification performance. PCA and ICA are unsupervised methods that ignore the label information during dimensionality reduction, whereas LDA is a supervised method that utilizes the class label information to find an optimal low-dimensional representation of the original feature set while also preserving discriminant information between classes. As a result, LDA yields better classification results as compared to PCA and ICA [9, 13, 14]. This paper presents a new LDA-based feature analysis method that uses a vector median-based discriminant criterion (VMDC) to characterize the intraclass compactness and interclass separability of the feature space in order to select the most discriminative subset of bearing fault features.

This paper proposes a reliable multifault diagnosis scheme for rolling element bearings that combines an improved WPT-based kurtogram and the VMDC-based feature analysis methods. The contributions discussed in this paper can be summarized as follows:(1)An improved WPT-based kurtogram is proposed for extracting the most characteristic fault features from nonstationary acoustic emission (AE) signals. The kurtogram is performed by computing the spectral kurtosis values for the envelope power spectrum of each subband signal at different levels of decomposition. The subband signal that yields the highest spectral kurtosis value is selected for extracting the fault features.(2)An efficient VMDC-based discriminative fault feature analysis approach is proposed for selecting the most discriminative subset of fault features. A vector median-based discriminant criterion is presented to minimize the intraclass compactness and maximize the interclass separability of the feature space.(3)A comprehensive multiclass fault diagnosis methodology for rolling element bearings is presented, and its performance is validated using AE data for various single and compound bearing defects acquired under different simulated crack sizes and bearing rotational speeds.

The remainder of the paper is structured as follows. Section 2 describes the overall bearing fault diagnosis methodology, Section 3 introduces the self-designed fault simulator (including the data acquisition system and seeded bearing defects for collecting fault signals), and Section 4 shows the experimental results. Finally, Section 5 concludes this paper.

#### 2. Proposed Bearing Fault Diagnosis Methodology

The proposed comprehensive bearing fault diagnosis scheme consists of four main processes including signal preprocessing, feature extraction, feature selection, and fault classification. Figure 1 illustrates the overall flowchart of the proposed methodology.

##### 2.1. Preprocessing

The performance of wavelet kurtogram analysis can be improved if the variations between the peaks and flanks of the spectral energy are minimized as shown in Table 6; this can be achieved by preprocessing the AE signal before feature extraction [17]. In this study, the prewhitening technique is used to reduce the spectral energy variations in the incoming AE fault signal. To obtain the prewhitened signal, an autoregressive (AR) model is utilized, which is defined as follows:where is the incoming bearing signal, are the AR coefficients, denotes the order of the AR model, and is the residual signal representing a spectrum close to the white noise spectrum.

##### 2.2. Feature Extraction

For the purpose of extracting the characteristic fault signatures from the AE signals, this paper presents a two-step feature extraction process that works as follows.

*Step 1. *In this step, the proposed WPT-based kurtogram method is performed on the prewhitened AE signal to determine its most informative subband. The flowchart of the proposed WPT-based kurtogram is illustrated in Figure 2, and its details are provided below.

First, the prewhitened bearing signal is decomposed into a series of subbands by performing the wavelet packet transform (WPT) with five decomposition levels () using a Daubechies 2 (or db2) filter. The five-level WPT decomposition results in a total of 63 subband signals. The envelope spectrum of each subband signal is then calculated using the fast Fourier transform (FFT) to identify the characteristic bearing fault frequencies. The envelope spectrum of an AE signal obtained from a defective bearing is usually flooded with the characteristic defect frequency and its harmonics. We utilize kurtosis to measure the degree of protrusion of the envelope spectrum and then characterize the hidden bearing defect signatures. In order to do this effectively, we first determine the bearing defect frequencies, which include the ball pass frequency of the bearing outer raceway (BPFO), the ball pass frequency of the bearing inner raceway (BPFI), the ball spin frequency of the bearing roller (BSF), and the first harmonics ( in this paper) of each of these frequencies. These defect frequencies are defined as follows:where denotes the number of cylindrical rollers, is the shaft speed, is the contact angle, and and are the roller and pitch diameters, respectively. Each defect-related bandwidth ranges from to (), which can be calculated as follows:Here, is the error rate of the bearing test rig and is the cage frequency. Once the bearing fault bandwidths are identified, three different kurtosis values are calculated corresponding to the three bearing defect frequencies (i.e., BPFO, BPFI, and BSF). After obtaining the kurtosis values for the subband signals, the wavelet kurtogram is generated (Figure 3). Figure 3(a) shows the frequency-scale paving of the proposed wavelet kurtogram, while Figure 3(b) illustrates an example color kurtogram of the three bearing fault frequencies (i.e., BPFO, BPFI, and BSF), which represents the kurtosis values of all of the subbands.

Finally, for each kurtogram, the subband signal with the highest kurtosis value (i.e., the signal that has the most discriminative fault features) is selected for extracting the bearing fault signatures. In Figure 3(b), the selected subband with the highest kurtosis value is highlighted using a dashed rectangle.

*Step 2. *In this step, 10 time domain and three frequency domain statistical features are extracted from each of the selected subband signals, as shown in Tables 1 and 2, respectively. Furthermore, we calculate three root mean square (RMS) features () for the first three harmonics of the corresponding defect frequencies in the envelope spectrum of each of the selected subband signals, as described in Table 3. This results in a feature pool that consists of a total of features, where is the number of bearing fault classes, is the number of AE signals obtained for each fault type, and is the total number of features. In this case, is 10 + 3 + 3 = 16 and includes 10 time domain statistical features, three frequency domain statistical features, and three envelope spectrum RMS features (calculated using the extracted wavelet kurtogram) for each of the three bearing defect frequencies (i.e., BPFO, BPFI, and BSF).

##### 2.3. Feature Selection

In this paper, we propose a feature analysis method based on the linear discriminant analysis (LDA) technique plus a vector median-based discriminant criterion (VMDC) to find the most discriminative subset of the extracted fault features for accurate fault diagnosis. LDA is a supervised dimensionality reduction algorithm that utilizes the class label information to find an optimal linear transformation matrix that projects the original high-dimensional feature space onto the lower-dimensional representation while also preserving the information that can help discriminate among different classes. Let be a dataset, where is the original dimensionality and is the number of samples in . Each sample belongs to a class . Let be the number of samples in class and let be the total number of samples in all of the classes. Then, the intraclass scatter matrix and the interclass scatter matrix can be evaluated with (4) and (5), respectively: Here, is the mean of the samples labelled as class and is the mean of all of the samples. The LDA projects the space of the original variables onto a ()-dimensional space by maximizing the Fisher discriminant rule [18], which is defined as follows:

This discriminant rule maximizes the ratio of interclass separability to intraclass compactness by selecting a reduced number of the most discriminative components only. However, LDA-based feature analysis approaches are limited in their ability to efficiently select the optimal number of discriminative components to gain the highest classification performance. In the existing literature, there is no clear consensus on the optimal number of LDA components that yield the highest classification performance. To address this problem, we present a robust method based on the vector median approach to obtain the optimal number of LDA components that can maximize the classification performance. In order to identify the vector median of each class, this study utilizes a cumulative distance criterion using the Euclidean distance. The vector median of class is defined as the data point that yields the minimum accumulated distance and is determined as follows:

Once the vector medians of all classes are determined, the modified intraclass compactness and interclass separability values are calculated using (8) and (9), respectively: In (8), the intraclass compactness of a class is reflected by the distance from its vector median to other samples in that class. Alternatively, in (9), the interclass separability of a class from other classes is represented by the minimum distance of the vector median of that class to the vector medians of other classes. The effect of outliers in the feature space is minimized by averaging these distance measurements, which improves the discriminatory power of the proposed feature analysis method. The vector median-based discriminant criterion that is used to select the optimal number of useful LDA components is defined as follows:

##### 2.4. Fault Classification

In this study, a Naïve Bayes classifier [19] is used to classify various single and compound bearing faults. This classifier is selected due to its simplicity and high classification performance at a relatively low computational cost [19, 20]. The Naïve Bayes classifier has complexity , where is the total number of training samples and is the number of conditional features. Other classification algorithms, such as KNN, ANN, and SVM, have complexities , , and , respectively [20]. Furthermore, the Naïve Bayes classifier is adaptive in nature and does not require any fixed parameters (unlike other classification algorithms).

In this study, in order to evaluate the generalized classification performance of the proposed comprehensive bearing fault diagnosis method, -fold cross validation (-cv) [21] is employed. In -cv, the feature pool is randomly partitioned into mutual subsets ( in this study). Then, at the th iteration of -cv, one of the subsets is used as the testing dataset and the other () subsets are employed to train the Naïve Bayes classifier. The classification performance is averaged over trials of -cv. The advantage of this technique is that no matter how we divide the data every data sample is in the testing set once and is included in the training set () times. Thus, variance in the classification results is minimized. To validate the classification performance of the proposed method, this paper utilizes the average classification accuracy (ACA) and the true positive rate (TPR) as performance measures. These are defined in (11) and (12), respectively:where is the number of cross validation folds and and are the number of true positives and false negatives of class resulting in the th iteration of -cv, respectively. In addition, the number of true positives is defined as the total number of samples in class that are accurately classified as class . Alternatively, the number of false negatives indicates the number of samples in class that are not classified as .

#### 3. Experimental Setup

In this study, we used an experimental test rig developed by Intelligence Dynamics Lab (Gyeongsang National University, Korea) to validate the performance of the proposed fault diagnosis methodology, as shown in Figure 4. An acoustic emission (AE) sensor (WS type, from Physical Acoustic Cooperation) was employed to capture continuous AE signals. Table 3 describes the specifications of the data acquisition system used in this study.

The bearings used in this study were cylindrical rolling element type (FAG NJ206-3-TVP2). These bearings were engrained with different bearing defects, which are shown in Figure 5 and listed as follows: bearing-crack-on-outer-raceway (BCO), bearing-crack-on-inner-raceway (BCI), bearing-crack-on-roller (BCR), bearing-crack-on-outer-and-inner-raceways (BCOI), bearing-crack-on-outer-raceway-and-roller (BCOR), bearing-crack-on-inner-race-and-roller (BCIR), and bearing-crack-on-outer-and-inner-raceways-and-roller (BCOIR). A healthy or defect-free bearing (DFB) was used as a reference for baseline measurements. In total, eight types of bearing fault signals, including a normal defect-free bearing (as a baseline), were acquired under different rotational speeds (300, 350, 400, 450, and 500 rpm). A total of 360 AE fault signals, each 10 seconds in duration, were collected at a sampling rate of 250 KHz for each bearing defect in this study.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

#### 4. Experimental Results

##### 4.1. Data Preprocessing

In this subsection, we investigate and justify the preprocessing of the AE fault signals to improve the classification performance of our methodology. As mentioned earlier, a total of eight single and compound bearing defects were investigated in this study. Figures 6 and 7 show the feature distribution of the optimal LDA discriminant components for four datasets. These datasets were acquired at two different rotational speeds (i.e., 500 rpm and 300 rpm) and for two different bearing crack sizes (i.e., 3 mm and 12 mm). From Figure 6, it is clear that the fault features are not well separated and discrimination between different classes is poor without preprocessing. For instance, the class clusters of the 3 mm crack length and 300 rpm rotational speed datasets are heavily overlapped, which leads to higher error rate in classification. Conversely, preprocessing the original signals to obtain the prewhitened signals reduces overlapping between clusters and improves separation between their boundaries.

**(a)**

**(b)**

**(a)**

**(b)**

Likewise, Figure 7 shows that the boundaries between some class clusters of the original signals are blurred and unclear, whereas the prewhitened signals have clearer and more distinct boundaries. These results strongly suggest that prewhitened signals are able to strongly project the discriminative information that is hidden in the bearing signals. The variation between the peaks and flanks of the spectral energy in spectral kurtosis analysis is one reason why each class of the original signals cannot be clustered accurately. Therefore, a prewhitening process is required to improve separation between different clusters, which makes classification easier and more accurate, even with a Naïve Bayes classifier.

##### 4.2. Proposed VMDC-Based Feature Analysis Method

In order to validate the effectiveness of the proposed feature analysis method, this study compares the feature selection performance (in terms of the classification accuracy) of the proposed method and the original LDA for 10 different datasets, as shown in Figures 8 and 9. The vertical axis represents the average classification accuracy calculated for the classification results of different fault types through trials of -cv, and the horizontal axis represents the number of LDA discriminant components that are selected. There is no general consensus about the appropriate number of LDA components that always guarantees the highest classification accuracy. As evident in Figures 8 and 9, the LDA method achieves its highest classification accuracy when the number of LDA discriminant components reaches seven. Nevertheless, there are some exceptions. For example, in Figure 8 (for the dataset of a 3 mm length crack and a rotational speed of 450 rpm), the highest accuracy that can be achieved is 91.67% (with six LDA components); the highest accuracy obtained with seven LDA components is only 90.83%. Thus, the number of LDA components that can achieve the maximum classification accuracy must be determined empirically for each study. On the contrary, the proposed feature analysis approach always shows better average classification accuracies when compared with the original LDA algorithm. For instance, in Figure 9 (for the dataset of a 12 mm length crack and a rotational speed of 300 rpm), the proposed method yields a classification accuracy of up to 92.8%, clearly outperforming the LDA method with any number of components. This can be explained by the fact that our proposed method utilizes the vector-median discriminant criterion, which can adaptively select the optimal number of LDA components to maximize the ratio of interclass separability and intraclass compactness, thereby achieving the maximum discriminatory power in a low-dimensional feature space. This enables the proposed approach to always achieve the best classification performance. The analysis of these results highlights the main contributions of this study in analyzing and selecting the optimal features from the original feature pool to achieve superior classification performance as compared to conventional feature analysis algorithms. Using the LDA method, the number of LDA components that yield the best classification results must be determined empirically on a dataset to dataset basis, whereas the proposed VMDC-based approach does this adaptively.

##### 4.3. Classification Performance

In this section, we compare the classification performance of the proposed methodology with other state-of-the-art feature analysis approaches (i.e., PCA, ICA, and LDA). The optimal number of components for PCA, ICA, and LDA was determined by utilizing the training set of -cv fold at each iteration and finding the number of components that yielded the highest accuracy for each of these methods. The final classification results, which have been calculated through folds of -cv, are summarized in Tables 4 and 5. We used two evaluation indexes: the true positive rate (TPR) and the average classification accuracy (ACA). From the figures in Tables 4 and 5, it is clear that the proposed feature analysis scheme consistently outperforms the conventional approaches of PCA, ICA, and LDA for all of the datasets that were analyzed in this study. For instance, Table 5 reveals that the TPR of the proposed method is around 100% for almost all bearing faults (except for the BCR fault condition, where it ranges between 93 and 97% but is still higher than that of PCA and ICA). In Table 4, it can be observed that the proposed method achieves good classification performance even though the feature clustering of the 3 mm length crack datasets was poor and not well separated as compared to the 12 mm length crack datasets. This comparatively high classification performance demonstrates that the proposed approach is more effective in feature analysis as compared to other conventional methods. Our approach is also able to yield maximum discrimination in a lower-dimensional feature space.

#### 5. Conclusion

This paper proposed a comprehensive multifault diagnosis methodology based on AE analysis to detect multiple localized bearing faults of a rolling element bearing. The method comprises a prewhitening step, a feature extraction based on wavelet kurtogram, a useful feature selection with the proposed VMDC-based feature analysis, and a fault classification by employing the NB classifier. One of the main advantages of the proposed methodology is that the subband of the nonstationary AE signal, which carries the most sensitive information related to fault characteristic, is adaptively identified without any prior knowledge of the bearing fault type by using an enhanced WPT-based kurtogram algorithm. In addition, the proposed VMDC-based discriminative feature analysis approach efficiently identifies optimal features where their discriminatory power is maximized. The proposed methodology was evaluated by an experimental setup using a healthy bearing and seven damaged ones under different rotational speeds and different simulated crack sizes. Experimental results indicated that the proposed multifault diagnosis methodology achieves the highest classification accuracy up to 91.11%, 96.67%, 98.89%, 99.44%, and 98.61% under bearing rotational speeds of 300 rpm, 350 rpm, 400 rpm, 450 rpm, and 500 rpm, respectively.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (no. NRF-2013R1A2A2A05004566).