Research Article  Open Access
WenAn Yang, Maohua Xiao, Wei Zhou, Yu Guo, Wenhe Liao, Gang Shen, "Trace Ratio Criterion-Based Kernel Discriminant Analysis for Fault Diagnosis of Rolling Element Bearings Using Binary Immune Genetic Algorithm", Shock and Vibration, vol. 2016, Article ID 8631639, 15 pages, 2016. https://doi.org/10.1155/2016/8631639
Trace Ratio Criterion-Based Kernel Discriminant Analysis for Fault Diagnosis of Rolling Element Bearings Using Binary Immune Genetic Algorithm
Abstract
The rolling element bearing is a core component of many systems such as aircraft, trains, steamboats, and machine tools, and its failure can lead to reduced capability, downtime, and even catastrophic breakdowns. Due to misoperation, manufacturing deficiencies, or the lack of monitoring and maintenance, it is often found to be the most unreliable component within these systems. Therefore, effective and efficient fault diagnosis of rolling element bearings plays an important role in ensuring the continued safe and reliable operation of their host systems. This study presents a trace ratio criterion-based kernel discriminant analysis (TRKDA) for fault diagnosis of rolling element bearings. The binary immune genetic algorithm (BIGA) is employed to solve the trace ratio problem in TRKDA. The numerical results obtained using extensive simulation indicate that the proposed TRKDA using BIGA (called TRKDA-BIGA) can effectively and efficiently classify different classes of rolling element bearing data, while also providing real-time visualization that is useful for practitioners monitoring the health status of rolling element bearings. Empirical comparisons show that the proposed TRKDA-BIGA performs better than existing methods in classifying different classes of rolling element bearing data. The proposed TRKDA-BIGA may be a promising tool for fault diagnosis of rolling element bearings.
1. Introduction
The rolling element bearing is a core component of many systems such as aircraft, trains, steamboats, and machine tools, and its failure can lead to reduced capability, downtime, and even catastrophic breakdowns [1–6]. Due to misoperation, manufacturing deficiencies, or the lack of monitoring and maintenance, it is often found to be the most unreliable component within these systems. Therefore, effective and efficient fault diagnosis of rolling element bearings plays an important role in ensuring the continued safe and reliable operation of their host systems.
Over the past few years, much research effort has been devoted to approaches for fault diagnosis of rolling element bearings. When faults occur in rolling element bearings, the vibration signals in the time and frequency domains have been shown to deviate from their normal behaviour because of increased friction and impulsive forces [7–10]. Usually, dozens or even hundreds of time- and frequency-domain features are calculated from the bearing vibration signals to represent the different health statuses. In the current study, 9 time-domain features and 6 time-frequency-domain features are extracted from the bearing vibration signals to form a 15-dimensional feature vector. Fault diagnosis of rolling element bearings is thus usually solved as a high-dimensional pattern recognition problem. However, the intrinsic dimension of high-dimensional data may be small; for example, only a few features may be responsible for a certain type of fault pattern. Moreover, projecting high-dimensional data onto a 2- or 3-dimensional subspace provides real-time visualization, which makes it convenient for the user to monitor the health status of rolling element bearings. Projection onto a low-dimensional subspace also acts as data compression, which helps with efficient storage and retrieval. Thus, dimensionality reduction techniques are often used to project the high-dimensional feature space onto a lower-dimensional space while preserving most of the “intrinsic information” contained in the data [11–15]. After dimensionality reduction, the compact representation of the data can be used for subsequent tasks (e.g., visualization and classification). Among the various dimensionality reduction methods [16–24], principal component analysis (PCA) and linear discriminant analysis (LDA) are the two most common [21].
The former is an unsupervised method that seeks the directions of maximum variance for optimal reconstruction. The latter is a supervised method that aims to maximize the between-class scatter while minimizing the within-class scatter. Owing to its use of label information, the latter generally outperforms the former if sufficient labeled samples are provided [21]. In the past few years, a series of studies have formulated LDA variants for pattern recognition, including Fukunaga [21], Wang et al. [22], Sun and Chen [25], Guo et al. [26], Zhao et al. [27], Jin et al. [28], and Jia et al. [29]. Generally, LDA is formulated with the ratio trace criterion rather than the trace ratio criterion, because the ratio trace problem is more tractable. Nevertheless, as pointed out by Wang et al. [22], solutions obtained from the ratio trace criterion may deviate from the original intent of the trace ratio problem. To improve the behaviour of LDA, Wang et al. [22], Guo et al. [26], Zhao et al. [27], Jin et al. [28], and Jia et al. [29] presented various trace ratio criterion-based LDAs (TRLDAs), in which the numerator and denominator of the criterion directly reflect the Euclidean distances between inter- and intraclass samples. Another advantage of the trace ratio criterion is that the calculated projection matrix is orthogonal, which eliminates redundancy between different projection directions. In addition, when Euclidean distance is used to evaluate the similarity between data points, the orthogonal projection preserves such similarities without any change [22]. Despite these advantages, the above TRLDA formulation methods are criticized for their inability to deal with redundancy among eigenvectors. For example, if the most discriminative eigenvector is duplicated several times, these methods tend to select all of the copies.
This is problematic for selecting an optimal subset of eigenvectors, because other discriminative and complementary eigenvectors will be missed, and a classifier built on eigenvectors selected in this way can perform poorly. Therefore, the issue of TRLDA formulation remains unresolved.
A review of the related literature also indicates that most previous work applying LDA or TRLDA to fault diagnosis assumed that the samples in each class follow a linear distribution. In many fault diagnosis practices, however, the samples in each class may follow a nonlinear distribution that violates this assumption. In that case, the separation of different classes may not be well characterized by the scatter matrices, which degrades the classification results [21]. To solve this problem, the kernel trick [30–32], which extends many linear methods to their nonlinear kernel versions, can be used to extend TRLDA to nonlinear problems. This study therefore develops a nonlinear kernel version of TRLDA, namely, trace ratio criterion-based kernel discriminant analysis (TRKDA), for fault diagnosis of rolling element bearings. Like other TRLDA models, however, TRKDA faces the trace ratio problem when formulating its projection matrix, and it inherits the inability of existing TRLDA formulation methods to handle redundancy in eigenvector selection: if the most discriminative eigenvector is duplicated several times, all copies tend to be selected, and other discriminative and complementary eigenvectors are missed. Fortunately, the immune genetic algorithm (IGA), an evolutionary computation technique developed by Jiao and Wang [33], has the potential to determine a set of discriminative and mutually irredundant eigenvectors.
In this study, we propose a method called TRKDA-BIGA that uses a binary IGA (BIGA) to formulate TRKDA for dimensionality reduction of statistical and wavelet features extracted from vibration signals, enabling effective and efficient fault diagnosis of rolling element bearings. In particular, the contributions are to:
(i) use an immune evolutionary computation technique, BIGA, to obtain a reduced set of discriminative and mutually irredundant eigenvectors for TRKDA-BIGA formulation;
(ii) provide a two-dimensional representation of bearing data that is useful for practitioners monitoring the health status of bearings;
(iii) build a TRKDA-BIGA model architecture for vibration measurements for effective and efficient fault diagnosis of rolling element bearings.
The rest of this study is structured as follows. Section 2 briefly reviews the basic concepts of TRLDA and kernel extension. Section 3 presents the TRKDA-BIGA method. Section 4 discusses its convergence and initialization. Section 5 evaluates the performance of the proposed TRKDA-BIGA on benchmark problems. Section 6 describes the overall flow of the proposed TRKDA-BIGA for fault diagnosis of rolling element bearings. Section 7 summarizes the conclusions drawn from this study.
2. Review of TRLDA and Kernel Extension
2.1. Review of TRLDA
Suppose we are given a set of $n$ $m$-dimensional samples $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ belonging to $c$ different classes. LDA seeks a linear projection matrix $W \in \mathbb{R}^{m \times d}$ that maps the original $m$-dimensional data onto $d$-dimensional data (usually $d < m$) by maximizing the between-class scatter while minimizing the within-class scatter. The between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are expressed as follows:

$$S_b = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^T, \qquad S_w = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_j^i - \mu_i)(x_j^i - \mu_i)^T, \tag{1}$$

where $\mu$ represents the total sample mean vector, $n_i$ represents the number of samples in the $i$th class, $\mu_i$ represents the average vector of the $i$th class, and $x_j^i$ represents the $j$th sample in the $i$th class. The new mapped feature vectors can then be expressed as $y = W^T x$.

The original LDA formulation, known as Fisher LDA [21], only handles binary classification problems. However, many practical applications involve multiclass classification. To overcome this limitation, a number of researchers have proposed optimization criteria that extend Fisher LDA to multiclass problems. The first optimization criterion is in a ratio trace form (referred to as RTLDA):

$$W^* = \arg\max_{W} \operatorname{Tr}\left[(W^T S_w W)^{-1} (W^T S_b W)\right], \tag{2}$$

where $\operatorname{Tr}(\cdot)$ denotes the matrix trace. To obtain a set of orthonormal vectors, the constraint $W^T W = I$, where $I$ is an identity matrix, is usually added to (2). The second optimization criterion is in a trace ratio form (referred to as TRLDA):

$$W^* = \arg\max_{W^T W = I} \frac{\operatorname{Tr}(W^T S_b W)}{\operatorname{Tr}(W^T S_w W)}. \tag{3}$$

The ratio trace problem in (2) can be solved directly through the generalized eigenvalue decomposition (GED) method [22]:

$$S_b w_k = \lambda_k S_w w_k, \tag{4}$$

where $\lambda_k$ is the $k$th largest eigenvalue, $w_k$ is the eigenvector corresponding to $\lambda_k$, and $w_k$ constitutes the $k$th column vector of the matrix $W$. Although a closed-form solution for (3) can be approximately obtained with the GED, it does not necessarily give the best trace ratio. Thus, approximating trace ratio optimization by ratio trace optimization may reduce the discriminative capability of the derived low-dimensional feature space. Moreover, the physical meaning of the trace ratio form is clearer than that of the ratio trace form.
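To make the difference between (2) and (3) concrete, the following Python sketch (NumPy/SciPy) builds $S_b$ and $S_w$ from (1), solves the ratio trace problem through the GED (4), and then evaluates the trace ratio (3) of the resulting subspace. The function names, the small ridge term `reg`, and the synthetic three-class data are illustrative assumptions, not part of the original method. Note that GED eigenvectors are $S_w$-orthogonal rather than orthonormal, so they are orthonormalized before (3) is evaluated.

```python
import numpy as np
from scipy.linalg import eigh


def scatter_matrices(X, labels):
    """Between-class and within-class scatter matrices of (1).

    X is m x n with one sample per column; labels holds the n class labels.
    """
    mu = X.mean(axis=1, keepdims=True)              # total mean vector
    m = X.shape[0]
    Sb, Sw = np.zeros((m, m)), np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)       # class mean vector
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
        Sw += (Xc - mu_c) @ (Xc - mu_c).T
    return Sb, Sw


def ratio_trace_lda(Sb, Sw, d, reg=1e-8):
    """Ratio trace solution of (2) via the GED (4): top-d eigenvectors of (Sb, Sw)."""
    evals, evecs = eigh(Sb, Sw + reg * np.eye(Sw.shape[0]))   # ascending order
    order = np.argsort(evals)[::-1][:d]
    return evecs[:, order], evals[order]


def trace_ratio(W, Sb, Sw):
    """Trace ratio objective of (3)."""
    return np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)


rng = np.random.default_rng(0)
X = np.hstack([rng.normal(loc, 1.0, size=(5, 30)) for loc in (0.0, 2.0, 4.0)])
labels = np.repeat([0, 1, 2], 30)
Sb, Sw = scatter_matrices(X, labels)
W, lam = ratio_trace_lda(Sb, Sw, d=2)
Q, _ = np.linalg.qr(W)          # orthonormalize so that Q^T Q = I, as (3) requires
ratio = trace_ratio(Q, Sb, Sw)
```

Because $S_b + S_w$ equals the total scatter matrix, the sum of their traces equals the total squared deviation of the samples from $\mu$, which is a quick sanity check on (1).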
However, the optimization problem in (3) is generally nonconvex, and no closed-form solution exists. Fortunately, Guo et al. [26] showed that, using the trace difference function

$$f(\lambda) = \max_{W^T W = I} \operatorname{Tr}\left[W^T (S_b - \lambda S_w) W\right],$$

the trace ratio problem can be solved equivalently by finding the zero point of the equation $f(\lambda) = 0$. Following Guo et al.’s work, Wang et al. presented an iterative method named the ITR algorithm to solve the trace ratio problem [22]. The ITR algorithm optimizes the objective function in an iterative and incremental manner. The projection matrix in the $t$th iteration step, $W_t$, is obtained by solving the trace difference problem $W_t = \arg\max_{W^T W = I} \operatorname{Tr}[W^T (S_b - \lambda_{t-1} S_w) W]$, where $\lambda_{t-1} = \operatorname{Tr}(W_{t-1}^T S_b W_{t-1}) / \operatorname{Tr}(W_{t-1}^T S_w W_{t-1})$ is the trace ratio value derived from the projection matrix of the previous iteration step, $W_{t-1}$. However, the initialization substantially influences the convergence performance of the ITR algorithm: a good initialization generally yields quick convergence, whereas a bad initialization usually increases the number of iterations. Moreover, in the ITR algorithm, the $W$ formed with the eigenvectors corresponding to the $d$ largest eigenvalues of $S_b - \lambda_{t-1} S_w$ maximizes the trace difference $\operatorname{Tr}[W^T (S_b - \lambda_{t-1} S_w) W]$, but it does not necessarily maximize the trace ratio $\operatorname{Tr}(W^T S_b W) / \operatorname{Tr}(W^T S_w W)$. On the other hand, from the perspective of fault diagnosis, the main aim is to find a set of projection vectors that provide the highest discrimination among the different fault patterns. Thus, the eigenvectors with the largest eigenvalues are not necessarily the most representative for discriminating one class from the others, as mentioned in Section 1. To overcome these shortcomings, this study presents a BIGA-based solution method for the trace ratio criterion, discussed in detail in Section 3.
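The ITR iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions (random positive semidefinite stand-ins for $S_b$ and $S_w$, an initial value $\lambda_0 = 0$, and a simple tolerance-based stop), not Wang et al.'s reference implementation. It records the sequence of trace ratio values so that their monotone increase can be checked.

```python
import numpy as np


def itr(Sb, Sw, d, lam0=0.0, tol=1e-10, max_iter=100):
    """Sketch of an iterative trace ratio (ITR) solver for (3)."""
    lam, history = lam0, [lam0]
    for _ in range(max_iter):
        # Trace difference step: the d leading eigenvectors of Sb - lam*Sw
        # maximize Tr[W^T (Sb - lam*Sw) W] over W with W^T W = I.
        _, evecs = np.linalg.eigh(Sb - lam * Sw)      # eigenvalues in ascending order
        W = evecs[:, -d:]
        new_lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)  # trace ratio update
        history.append(new_lam)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return W, lam, history


rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
Sb = A @ A.T                      # stand-in PSD "between-class" matrix
Sw = B @ B.T + np.eye(6)          # stand-in positive definite "within-class" matrix
W, lam, hist = itr(Sb, Sw, d=2)
```

At convergence, the sum of the $d$ largest eigenvalues of $S_b - \lambda S_w$, which equals $f(\lambda)$, is numerically zero, matching the zero-point characterization of Guo et al.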
2.2. Kernel Extension
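In some applications, the linear discriminant method TRLDA is insufficient to model the data. To address nonlinearities in the data, this section presents a nonlinear discriminant method based on the kernel trick [30–32], that is, TRKDA. The so-called kernel trick maps the original data into a high-dimensional Hilbert space $\mathcal{F}$ through a nonlinear mapping function $\phi: \mathbb{R}^m \to \mathcal{F}$. Let $\Phi$ denote the data matrix in the Hilbert space: $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)]$. The functional form of the mapping $\phi$ does not need to be known, since it is implicitly defined by the choice of kernel function $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$, that is, the inner product in the kernel-induced feature space. The kernel function may be any positive kernel satisfying Mercer’s condition. The radial basis function (RBF) kernel, one of the most popular kernel functions in kernel learning algorithms, is adopted in this study. Then, (3) in the Hilbert space can be written as follows:

$$W^{\phi *} = \arg\max_{(W^{\phi})^T W^{\phi} = I} \frac{\operatorname{Tr}\left((W^{\phi})^T S_b^{\phi} W^{\phi}\right)}{\operatorname{Tr}\left((W^{\phi})^T S_w^{\phi} W^{\phi}\right)}, \tag{5}$$

where $W^{\phi}$, $S_b^{\phi}$, and $S_w^{\phi}$ are the matrices in the Hilbert space corresponding to $W$, $S_b$, and $S_w$ in (3), respectively. Notably, simple manipulation shows that the matrices $S_b$ and $S_w$ in (3) can be expressed as $X L_b X^T$ and $X L_w X^T$, respectively, where $e = [1, 1, \ldots, 1]^T$ is the $n$-dimensional all-ones vector. The matrices $L_b$ and $L_w$ are the graph Laplacian matrices [34] of weighted undirected graphs reflecting the between-class and within-class relationships of the samples. Consider

$$S_b = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^T = \sum_{i=1}^{c} n_i \left(\frac{1}{n_i} X e_i - \frac{1}{n} X e\right)\left(\frac{1}{n_i} X e_i - \frac{1}{n} X e\right)^T = X\left(\sum_{i=1}^{c} \frac{1}{n_i} e_i e_i^T - \frac{1}{n} e e^T\right) X^T, \tag{6}$$

where $e_i$ is an $n$-dimensional vector whose $j$th entry is 1 if $x_j$ belongs to the $i$th class and 0 otherwise. We can simplify the above equation even further by defining

$$M = \sum_{i=1}^{c} \frac{1}{n_i} e_i e_i^T, \tag{7}$$

$$L_b = M - \frac{1}{n} e e^T. \tag{8}$$

Thus, we get

$$S_b = X L_b X^T. \tag{9}$$

Then, the matrix $S_w$ can similarly be computed as follows:

$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_j^i - \mu_i)(x_j^i - \mu_i)^T = \sum_{i=1}^{c} X_i \left(I_{n_i} - \frac{1}{n_i} E_i\right) X_i^T = \sum_{i=1}^{c} n_i \Sigma_i, \tag{10}$$

where $X_i = [x_1^i, \ldots, x_{n_i}^i]$ is an $m \times n_i$ matrix, $e_{n_i}$ is an $n_i$-dimensional all-ones vector, $I_{n_i}$ is the $n_i \times n_i$ identity matrix, $E_i = e_{n_i} e_{n_i}^T$ is an $n_i \times n_i$ matrix, and $\Sigma_i$ is the data covariance matrix of the $i$th class. Based on (7), the above equation can be simplified similarly by defining

$$L_w = I - M. \tag{11}$$

Thus, we get

$$S_w = X L_w X^T. \tag{12}$$

Using (9) and (12), (5) can be rewritten as follows:

$$W^{\phi *} = \arg\max_{(W^{\phi})^T W^{\phi} = I} \frac{\operatorname{Tr}\left((W^{\phi})^T \Phi L_b \Phi^T W^{\phi}\right)}{\operatorname{Tr}\left((W^{\phi})^T \Phi L_w \Phi^T W^{\phi}\right)}. \tag{13}$$

To find $W^{\phi}$, solving the above equation involves decomposing $\Phi$ into an orthogonal matrix $Q$ (satisfying $Q^T Q = I$) and a right (upper) triangular matrix $R$ such that $\Phi = QR$. We have

$$Q^T \Phi = R, \qquad K = \Phi^T \Phi = R^T Q^T Q R = R^T R, \tag{14}$$

where $K$ is the kernel matrix with entries $K_{ij} = k(x_i, x_j)$. Let us map $W^{\phi}$ into the span of $Q$. Since $Q$ is an orthogonal basis of $\Phi$, we have

$$W^{\phi} = Q V, \tag{15}$$

where $V$ is an orthogonal matrix satisfying $V^T V = I$. Using (14) and (15), (13) can be further rewritten as follows:

$$V^* = \arg\max_{V^T V = I} \frac{\operatorname{Tr}(V^T R L_b R^T V)}{\operatorname{Tr}(V^T R L_w R^T V)}. \tag{16}$$

Let $\tilde{S}_b = R L_b R^T$ and $\tilde{S}_w = R L_w R^T$; then (16) can be further rewritten as follows:

$$V^* = \arg\max_{V^T V = I} \frac{\operatorname{Tr}(V^T \tilde{S}_b V)}{\operatorname{Tr}(V^T \tilde{S}_w V)}. \tag{17}$$

After the matrix $V$ is obtained with the BIGA-based solution method (discussed in detail in Section 3), the output point of a sample $x$ in the reduced data space can be expressed as

$$y = (W^{\phi})^T \phi(x) = V^T Q^T \phi(x) = V^T (R^T)^{-1} \Phi^T \phi(x) = V^T (R^T)^{-1} k_x, \tag{18}$$

where $k_x = [k(x_1, x), k(x_2, x), \ldots, k(x_n, x)]^T$.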
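The derivation above can be checked numerically with a short NumPy sketch. It builds $L_b$ and $L_w$ from class indicator vectors as in (7), (8), and (11), obtains $R$ from a Cholesky factorization of $K$ as in (14), forms $\tilde{S}_b$ and $\tilde{S}_w$, and maps samples with (18). The RBF form $k(a, b) = \exp(-\|a - b\|^2 / (2\sigma^2))$, the bandwidth $\sigma = 1$, the small diagonal `jitter`, and all function names are assumptions made for illustration; the placeholder $V$ is a random orthonormal matrix, not a BIGA result.

```python
import numpy as np


def laplacians(labels):
    """L_b = M - ee^T/n and L_w = I - M, with M = sum_i e_i e_i^T / n_i (eqs. (7), (8), (11))."""
    labels = np.asarray(labels)
    n = labels.size
    M = np.zeros((n, n))
    for c in np.unique(labels):
        e_c = (labels == c).astype(float)          # class indicator vector e_i
        M += np.outer(e_c, e_c) / e_c.sum()
    return M - np.ones((n, n)) / n, np.eye(n) - M


def rbf_kernel(A, B, sigma=1.0):
    """Assumed RBF kernel exp(-||a - b||^2 / (2 sigma^2)); columns of A and B are samples."""
    sq = (A ** 2).sum(0)[:, None] + (B ** 2).sum(0)[None, :] - 2 * A.T @ B
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))


def kernel_scatter(X, labels, sigma=1.0, jitter=1e-8):
    """K = R^T R (Cholesky, R upper triangular), then R L_b R^T and R L_w R^T."""
    K = rbf_kernel(X, X, sigma)
    R = np.linalg.cholesky(K + jitter * np.eye(K.shape[0])).T
    L_b, L_w = laplacians(labels)
    return R, R @ L_b @ R.T, R @ L_w @ R.T


def project(V, R, X_train, X_new, sigma=1.0):
    """Equation (18): y = V^T R^{-T} k_x for every column x of X_new."""
    k_x = rbf_kernel(X_train, X_new, sigma)        # n x (number of new points)
    return V.T @ np.linalg.solve(R.T, k_x)


rng = np.random.default_rng(2)
X = rng.standard_normal((3, 12))                   # 12 samples (columns) in 3 dimensions
labels = np.repeat([0, 1, 2], 4)
L_b, L_w = laplacians(labels)
R, Sb_t, Sw_t = kernel_scatter(X, labels)
V = np.linalg.qr(rng.standard_normal((12, 2)))[0]  # placeholder orthonormal V
Y = project(V, R, X, X)                            # 2 x 12 reduced training data
```

Projecting the training samples themselves gives $V^T (R^T)^{-1} K = V^T R$ (up to the jitter), which is one way to confirm that (14) and (18) are consistent.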
3. The Proposed TRKDA-BIGA
As previously mentioned, the construction of TRKDA requires selecting $d$ out of $n$ eigenvectors to form the matrix $V$ for dimensionality reduction. However, finding a subset of eigenvectors based on the trace ratio criterion is not an easy task, since the space of possible subsets, of size $\binom{n}{d}$, is very large, especially when $n$ is large. It is thus impractical to use exhaustive search to find an optimal subset of eigenvectors. Instead, in this study, the BIGA is used to select $d$ out of $n$ eigenvectors of $\tilde{S}_b - \lambda_{t-1}\tilde{S}_w$ as the bases of the projection matrix under the trace ratio criterion, such that the trace ratio value is maximized. The immune genetic algorithm, originally developed by Jiao and Wang [33], is a genetic algorithm based on biological immune theory that combines the immune mechanism with the evolutionary mechanism. The proposed TRKDA-BIGA is discussed further below.
3.1. Chromosome Encoding
Encoding a solution of a problem into a chromosome is an important issue when using BIGAs. In this study, every chromosome in a BIGA corresponds to a discrete binary selector $s = [s_1, s_2, \ldots, s_n]$, where a gene $s_i$ equal to “1” indicates that the eigenvector $v_i$ of $\tilde{S}_b - \lambda_{t-1}\tilde{S}_w$ is used in forming the projection matrix $V_t$ of the $t$th step, while “0” denotes its absence. Thus, the length of the chromosome is $n$.
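The encoding can be illustrated with a minimal NumPy sketch: a chromosome is an integer array with exactly $d$ ones, and decoding it keeps the corresponding eigenvector columns. The helper names and the identity-matrix stand-in for the eigenvectors are hypothetical; `math.comb` shows how quickly the search space $\binom{n}{d}$ grows.

```python
import math

import numpy as np


def random_selector(n, d, rng):
    """Binary selector s of length n with exactly d genes set to 1 (one chromosome)."""
    s = np.zeros(n, dtype=int)
    s[rng.choice(n, size=d, replace=False)] = 1
    return s


def decode(s, eigvecs):
    """Projection matrix V_t: the columns (eigenvectors) whose gene equals 1."""
    return eigvecs[:, s.astype(bool)]


rng = np.random.default_rng(3)
n, d = 8, 3
s = random_selector(n, d, rng)
V = decode(s, np.eye(n))            # identity columns as stand-in eigenvectors
n_subsets = math.comb(n, d)         # size of the search space: C(8, 3) = 56
```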
3.2. Genetic Operators
Genetic operators give every chromosome the chance to become the fittest chromosome of its generation. If it is difficult to reach the target of trace ratio optimization, crossover and mutation may introduce degeneracy into generations of chromosomes.
3.2.1. Crossover Operator
The crossover operator in a BIGA generates two new child chromosomes from two parent chromosomes selected from the current population according to a prespecified crossover rate. In this study, the “one-point” crossover operator is adopted: a cut point is randomly selected, and the segments between the cut point and the end of the parent chromosomes are exchanged. Specifically, suppose that two parent chromosomes $s^{(1)} = [s_1^{(1)}, \ldots, s_n^{(1)}]$ and $s^{(2)} = [s_1^{(2)}, \ldots, s_n^{(2)}]$, selected randomly from the population, undergo crossover at a randomly selected crossover point $p$, where $1 \le p < n$. One-point crossover then yields the two offspring chromosomes

$$\hat{s}^{(1)} = [s_1^{(1)}, \ldots, s_p^{(1)}, s_{p+1}^{(2)}, \ldots, s_n^{(2)}], \qquad \hat{s}^{(2)} = [s_1^{(2)}, \ldots, s_p^{(2)}, s_{p+1}^{(1)}, \ldots, s_n^{(1)}].$$

However, the procedure is not simply an exchange of the gene segments after the crossover point, because the number of eigenvectors included in the subset must remain equal to $d$. Therefore, a simple but effective strategy is used in this study to ensure that the crossover operator does not change the total number of “1” genes in the chromosomes.
Let $q = \sum_{i=1}^{n} \hat{s}_i$ denote the number of “1” genes in an offspring chromosome. When $q$ is not equal to $d$, the following retention criterion is applied:
(1) If $q$ is larger than $d$, randomly select genes with “0” bits from the current offspring chromosome and reset them to “1” bits, and then randomly select genes with “1” bits from the current offspring chromosome and reset them to “0” bits, so that exactly $d$ “1” genes remain.
(2) If $q$ is smaller than $d$, randomly select genes with “1” bits from the current offspring chromosome and reset them to “0” bits, and then randomly select genes with “0” bits from the current offspring chromosome and reset them to “1” bits, so that exactly $d$ “1” genes remain.
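A minimal sketch of one-point crossover followed by a count-preserving repair is shown below. The numbers of genes flipped in each step of the retention criterion are not specified in this text, so the `repair` helper uses a simplified, hypothetical variant: it flips exactly $|q - d|$ randomly chosen surplus bits. The net effect, exactly $d$ “1” genes per offspring, is the same.

```python
import numpy as np


def repair(child, d, rng):
    """Restore exactly d '1' genes by flipping randomly chosen surplus bits."""
    child = child.copy()
    ones = np.flatnonzero(child == 1)
    zeros = np.flatnonzero(child == 0)
    q = ones.size
    if q > d:                                   # too many eigenvectors: drop q - d of them
        child[rng.choice(ones, size=q - d, replace=False)] = 0
    elif q < d:                                 # too few: add d - q of them
        child[rng.choice(zeros, size=d - q, replace=False)] = 1
    return child


def one_point_crossover(p1, p2, d, rng):
    """Exchange the tails after a random cut point p (1 <= p < n), then repair."""
    p = rng.integers(1, p1.size)                # upper bound is exclusive
    c1 = np.concatenate([p1[:p], p2[p:]])
    c2 = np.concatenate([p2[:p], p1[p:]])
    return repair(c1, d, rng), repair(c2, d, rng)


rng = np.random.default_rng(6)
p1 = np.array([1, 1, 0, 0, 0, 0, 1, 0])
p2 = np.array([0, 0, 0, 1, 1, 1, 0, 0])
c1, c2 = one_point_crossover(p1, p2, d=3, rng=rng)
```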
3.2.2. Mutation Operator
The mutation operator in a BIGA is used primarily to maintain diversity in the population. For each gene in a chromosome undergoing mutation, a real number is drawn at random from the range $[0, 1]$. If this number is less than the prespecified mutation rate, the gene changes from “0” to “1” or vice versa. After an eigenvector is added (or removed) in this way, a different one is randomly removed (or added), so that the number of eigenvectors included in the subset remains equal to $d$. The mutation operator thus helps guide the search into new areas.
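The paired flip described above amounts to swapping a gene with a randomly chosen gene of the opposite value. The following NumPy sketch is an illustrative version with a hypothetical `mutate` helper; it preserves the count of “1” genes by construction.

```python
import numpy as np


def mutate(s, rate, rng):
    """Bit-flip mutation that keeps the number of '1' genes fixed.

    Each gene flips with probability `rate`; the flip is paired with an opposite
    flip of a randomly chosen gene holding the other bit value.
    """
    s = s.copy()
    for i in range(s.size):
        if rng.random() < rate:
            old = s[i]
            partners = np.flatnonzero(s != old)   # genes holding the opposite bit
            if partners.size:                     # nothing to pair with if all genes are equal
                j = rng.choice(partners)
                s[i], s[j] = 1 - old, old         # flip gene i and compensate at gene j
    return s


rng = np.random.default_rng(8)
s = np.array([1, 0, 1, 0, 0, 1, 0, 0])
mutant = mutate(s, rate=0.3, rng=rng)
```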
3.3. Immune Operators
The immune ability of BIGAs is realized through two kinds of immune operators: a vaccination and an immune selection. The vaccination is responsible for improving individuals’ overall fitness levels. The immune selection is responsible for prevention of deterioration.
3.3.1. Vaccination Operator
Given a chromosome $x$, the vaccination operation in a BIGA modifies the genes at some bits according to a priori knowledge, so that individuals with higher fitness have a greater probability of being selected. Let $P = \{x_1, \ldots, x_{N_p}\}$ be a population; the vaccination operation on $P$ is performed on $\alpha N_p$ chromosomes selected from $P$ according to the proportion $\alpha$ ($0 < \alpha \le 1$), where $N_p$ represents the population size of the BIGA. A vaccine is abstracted from prior knowledge of the problem at hand, and its information content and validity play an important role in the performance of the algorithm.
3.3.2. Immune Selection Operator
The immune selection operation consists of two steps. The first step is the immunity test: if the fitness of a chromosome is smaller than that of its parent chromosome, which indicates that degeneration occurred during crossover and mutation, the parent chromosome is used instead in the next competition. The second step is the annealing selection [35]: a chromosome $x_i$ is selected from the current offspring population to join the new parents with the probability

$$P(x_i) = \frac{e^{f(x_i)/T_k}}{\sum_{j=1}^{N_p} e^{f(x_j)/T_k}},$$

where $f(x_i)$ is the fitness of the individual $x_i$ and $\{T_k\}$ is a temperature-controlled series tending towards 0.
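The two immune selection steps can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the cooling schedule for $T_k$ is not specified in this text, and the maximum fitness is subtracted before exponentiation purely for numerical stability, which leaves the probabilities unchanged. A high temperature gives nearly uniform selection, whereas a low temperature concentrates selection on the fittest chromosome.

```python
import numpy as np


def immunity_test(children, parents, fitness):
    """Keep the parent whenever its child has lower fitness (degeneration)."""
    return [c if fitness(c) >= fitness(p) else p for c, p in zip(children, parents)]


def annealing_probabilities(fit_values, T):
    """Annealing selection: P_i = exp(f_i / T) / sum_j exp(f_j / T)."""
    z = np.asarray(fit_values, dtype=float) / T
    z -= z.max()                     # numerical stability; probabilities unchanged
    w = np.exp(z)
    return w / w.sum()


def annealing_select(pop, fit_values, T, rng):
    """Draw a new population of the same size using the probabilities above."""
    p = annealing_probabilities(fit_values, T)
    idx = rng.choice(len(pop), size=len(pop), p=p)
    return [pop[k] for k in idx]


rng = np.random.default_rng(7)
fit = [1.0, 2.0, 3.0]
p_hot = annealing_probabilities(fit, T=100.0)    # early, high temperature
p_cold = annealing_probabilities(fit, T=0.1)     # late, low temperature
survivors = immunity_test([1.0, 5.0], [3.0, 2.0], fitness=lambda x: x)
new_pop = annealing_select(["a", "b", "c"], fit, T=0.1, rng=rng)
```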
3.4. Fitness Evaluation
Fitness evaluation plays a critical role in selecting offspring chromosomes from the current population for the next generation. In this study, the fitness function for eigenvector selection is defined as

$$F(s) = \frac{\sum_{i=1}^{n} s_i a_i}{\sum_{i=1}^{n} s_i b_i},$$

where $a_i = v_i^T \tilde{S}_b v_i$ denotes the value for the $i$th eigenvector, $b_i = v_i^T \tilde{S}_w v_i$ denotes the value for the $i$th eigenvector, $s_i \in \{0, 1\}$, $i = 1, 2, \ldots, n$, and $\sum_{i=1}^{n} s_i = d$. Here, $s = [s_1, \ldots, s_n]$ is called the binary selector and $d$ is the desired lower feature dimension. Finally, according to the evolved binary selector $s^*$, the projection matrix $V_t$ of the $t$th step is formed by choosing the eigenvectors $v_i$ with $s_i^* = 1$. The procedures of the proposed TRKDA-BIGA are summarized below, followed by the computational flow of the BIGA built from the aforementioned genetic and immune operators.
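Because $\operatorname{Tr}(V^T S V)$ is the sum of $v_i^T S v_i$ over the columns of $V$, the fitness $F(s)$ equals the trace ratio of the projection matrix formed by the selected eigenvectors. The NumPy sketch below checks this identity; random positive (semi)definite matrices stand in for $\tilde{S}_b$ and $\tilde{S}_w$, and the values of $\lambda_{t-1}$ and $s$ are arbitrary assumptions.

```python
import numpy as np


def selector_fitness(s, a, b):
    """F(s) = sum_i s_i a_i / sum_i s_i b_i."""
    s = np.asarray(s, dtype=float)
    return (s @ a) / (s @ b)


rng = np.random.default_rng(4)
n, lam_prev = 6, 0.5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
Sb_t = A @ A.T                                  # stand-in for R L_b R^T
Sw_t = B @ B.T + np.eye(n)                      # stand-in for R L_w R^T
_, V_all = np.linalg.eigh(Sb_t - lam_prev * Sw_t)   # eigenvectors v_i as columns
a = np.einsum("ij,jk,ki->i", V_all.T, Sb_t, V_all)  # a_i = v_i^T Sb_t v_i
b = np.einsum("ij,jk,ki->i", V_all.T, Sw_t, V_all)  # b_i = v_i^T Sw_t v_i
s = np.array([0, 1, 0, 0, 1, 0])                # a selector with d = 2
V = V_all[:, s == 1]                            # projection matrix from the selector
F = selector_fitness(s, a, b)
TR = np.trace(V.T @ Sb_t @ V) / np.trace(V.T @ Sw_t @ V)
```

Precomputing $a_i$ and $b_i$ once per outer iteration means each fitness evaluation costs only two dot products, which keeps evolving a population cheap.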
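The Procedures of the Proposed TRKDA-BIGA. The procedures are as follows:
(1) Construct the kernel matrix $K$, with $K_{ij} = k(x_i, x_j)$.
(2) Perform the Cholesky decomposition of the kernel matrix, $K = R^T R$.
(3) Form the kernel scatter matrices $\tilde{S}_b = R L_b R^T$ and $\tilde{S}_w = R L_w R^T$.
(4) Set the iteration number $t$ to 1.
(5) Set the initial trace ratio value to $\lambda_0$.
(6) Compute the eigendecomposition of $\tilde{S}_b - \lambda_{t-1}\tilde{S}_w$, where $v_i$ ($i = 1, \ldots, n$) is the $i$th eigenvector of $\tilde{S}_b - \lambda_{t-1}\tilde{S}_w$.
(7) Calculate $a_i = v_i^T \tilde{S}_b v_i$ and $b_i = v_i^T \tilde{S}_w v_i$ for each eigenvector $v_i$.
(8) Generate a population of BIGA selectors.
(9) Evolve the population, where the fitness of a BIGA selector $s$ is measured as $F(s) = \sum_{i} s_i a_i / \sum_{i} s_i b_i$.
(10) Let $s^*$ be the evolved best BIGA selector.
(11) Form the projection matrix $V_t$ by choosing the eigenvectors $v_i$ with $s_i^* = 1$.
(12) Update the trace ratio value $\lambda_t = \operatorname{Tr}(V_t^T \tilde{S}_b V_t) / \operatorname{Tr}(V_t^T \tilde{S}_w V_t)$, set $t = t + 1$, and go to step (6). Repeat this procedure until convergence, which is declared when the trace ratio value has not increased for 5 consecutive iterations.
(13) Output $V^* = V_t$.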
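Steps (4) to (13) can be sketched as an outer loop around a pluggable subset selector. In the sketch below, the BIGA of steps (8) to (10) is deliberately replaced by a hypothetical exhaustive search over all $\binom{n}{d}$ subsets (`best_subset`), which is only feasible for very small $n$ and is not the authors' selector. The stopping rule follows step (12), with $\lambda_0 = 0$ and random stand-ins for $\tilde{S}_b$ and $\tilde{S}_w$ as assumptions.

```python
import itertools

import numpy as np


def best_subset(a, b, d):
    """Stand-in for the BIGA: exhaustively pick the d indices maximizing
    sum(a[S]) / sum(b[S]). Only feasible for small n."""
    best, best_val = None, -np.inf
    for idx in itertools.combinations(range(a.size), d):
        idx = list(idx)
        val = a[idx].sum() / b[idx].sum()
        if val > best_val:
            best, best_val = idx, val
    return best


def trkda_outer_loop(Sb_t, Sw_t, d, lam0=0.0, patience=5, max_iter=100):
    """Steps (4)-(13): eigendecompose Sb_t - lam*Sw_t, select d eigenvectors,
    update lam; stop when lam has not increased for `patience` iterations."""
    lam, best_lam, best_V, stall = lam0, -np.inf, None, 0
    for _ in range(max_iter):
        _, vecs = np.linalg.eigh(Sb_t - lam * Sw_t)            # step (6)
        a = np.einsum("ji,jk,ki->i", vecs, Sb_t, vecs)         # step (7): v_i^T Sb_t v_i
        b = np.einsum("ji,jk,ki->i", vecs, Sw_t, vecs)         #           v_i^T Sw_t v_i
        V = vecs[:, best_subset(a, b, d)]                      # steps (8)-(11)
        lam = np.trace(V.T @ Sb_t @ V) / np.trace(V.T @ Sw_t @ V)   # step (12)
        if lam > best_lam + 1e-12:
            best_lam, best_V, stall = lam, V, 0
        else:
            stall += 1
            if stall >= patience:
                break
    return best_V, best_lam                                    # step (13)


rng = np.random.default_rng(9)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
Sb_t = A @ A.T                       # stand-in for R L_b R^T
Sw_t = B @ B.T + np.eye(6)           # stand-in for R L_w R^T
V, lam = trkda_outer_loop(Sb_t, Sw_t, d=2)
```

In a full implementation, `best_subset` would be replaced by the BIGA, whose computational flow is given next.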
The Computational Flow of the BIGA. The computational flow is as follows:
(1) Set the generation counter $k$ to 1.
(2) Randomly initialize the original population $A_k$.
(3) Evaluate each chromosome in the population $A_k$.
(4) Abstract vaccines according to the prior knowledge.
(5) Check the termination criteria. If neither the fixed maximum number of generations has been reached nor a satisfactory chromosome has been found, go to the next step. Otherwise, output the best chromosome found so far as the final solution for further decision-making.
(6) Perform the crossover operation on $A_k$ to generate the population $B_k$.
(7) Perform the mutation operation on $B_k$ to generate the population $C_k$.
(8) Perform the vaccination operation on $C_k$ to generate the population $D_k$.
(9) Perform the immune selection operation on $D_k$ to generate the next population $A_{k+1}$, set $k = k + 1$, and go to step (3).
4. The Convergence of the Proposed TRKDA-BIGA
In this section, we analyze the convergence of the proposed TRKDA-BIGA. Before doing so, it is worth noting that the BIGA itself is convergent: Jiao and Wang [33] demonstrated that, provided enough iterations are completed, the immune genetic population converges towards the true optimum with probability one.
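Recall the trace difference function $\operatorname{Tr}[V^T (\tilde{S}_b - \lambda_{t-1}\tilde{S}_w) V]$; it follows that

$$\operatorname{Tr}\left[V_t^T (\tilde{S}_b - \lambda_{t-1}\tilde{S}_w) V_t\right] \ge \operatorname{Tr}\left[V_{t-1}^T (\tilde{S}_b - \lambda_{t-1}\tilde{S}_w) V_{t-1}\right].$$

Since $\lambda_{t-1} = \operatorname{Tr}(V_{t-1}^T \tilde{S}_b V_{t-1}) / \operatorname{Tr}(V_{t-1}^T \tilde{S}_w V_{t-1})$ as previously mentioned, the right-hand side is zero, and we get

$$\operatorname{Tr}\left[V_t^T (\tilde{S}_b - \lambda_{t-1}\tilde{S}_w) V_t\right] \ge 0.$$

Considering the inequality $\operatorname{Tr}(V_t^T \tilde{S}_w V_t) > 0$ and the equation $\lambda_t = \operatorname{Tr}(V_t^T \tilde{S}_b V_t) / \operatorname{Tr}(V_t^T \tilde{S}_w V_t)$, we have

$$\lambda_t - \lambda_{t-1} = \frac{\operatorname{Tr}\left[V_t^T (\tilde{S}_b - \lambda_{t-1}\tilde{S}_w) V_t\right]}{\operatorname{Tr}(V_t^T \tilde{S}_w V_t)} \ge 0.$$

Consequently, $\lambda_t \ge \lambda_{t-1}$. Substituting the subscript $t$ by $t + 1$ yields $\lambda_{t+1} \ge \lambda_t$. So we obtain the following inequality, which gives the first expression of convergence of the proposed TRKDA-BIGA:

$$\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{t-1} \le \lambda_t \le \lambda_{t+1}.$$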
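Further, suppose that $\lambda^*$ is the optimal trace ratio value; it follows that

$$\lambda^* = \max_{V^T V = I} \frac{\operatorname{Tr}(V^T \tilde{S}_b V)}{\operatorname{Tr}(V^T \tilde{S}_w V)} = \frac{\operatorname{Tr}(V^{*T} \tilde{S}_b V^*)}{\operatorname{Tr}(V^{*T} \tilde{S}_w V^*)},$$

where $V^*$ is the optimal projection matrix. We therefore have, for any $V$ with $V^T V = I$ and in particular for $V = V_t$,

$$\operatorname{Tr}(V_t^T \tilde{S}_b V_t) - \lambda^* \operatorname{Tr}(V_t^T \tilde{S}_w V_t) \le 0.$$

Considering that $\tilde{S}_w$ is positive semidefinite and that $\operatorname{Tr}(V_t^T \tilde{S}_w V_t) > 0$, we have

$$\lambda_t = \frac{\operatorname{Tr}(V_t^T \tilde{S}_b V_t)}{\operatorname{Tr}(V_t^T \tilde{S}_w V_t)} \le \lambda^*,$$

which gives the second expression of convergence of the proposed TRKDA-BIGA. We therefore conclude that, for a given initial trace ratio value $\lambda_0$, the updated value always satisfies (1) $\lambda_t \ge \lambda_{t-1}$ and (2) $\lambda_t \le \lambda^*$. The sequence $\{\lambda_t\}$ is thus nondecreasing and bounded above, and hence it converges.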
5. Performance Evaluation on Benchmark Problems
To extensively verify the performance of the proposed TRKDA-BIGA, it is first tested on several commonly used benchmark problems from the UCI machine learning repository and evaluated with the classification rate (i.e., the number of correctly classified test examples divided by the total number of test examples), in comparison with existing methods such as PCA, LDA, KPCA [30, 31], KDA [32], and TRLDA. These data sets, namely, Heart-statlog, Ionosphere, Iris, Wine, Waveform, Balance, and Synthetic Control Chart Time Series (SCCTS) (Table 1), cover small and large sizes as well as low and high dimensions. For the comparative study, we randomly select 50% of the data points from each data set as the training set and use the remaining data points as the test set. Each method uses the training set, mapped into its output reduced space, to train a one-nearest-neighbor (1-NN) classifier, which is then used to evaluate the classification rate on the test set. To limit the influence of random effects, PCA, LDA, KPCA, KDA, TRLDA, and TRKDA-BIGA are compared on each benchmark problem over 20 independent runs. Table 2 compares the classification rates of the proposed TRKDA-BIGA with those of PCA, LDA, KPCA, KDA, and TRLDA. As seen in Table 2, the proposed TRKDA-BIGA performs better than all the compared methods except on Heart-statlog.


The results demonstrate the ability of the proposed TRKDA-BIGA to classify different classes well, which suggests that it may be effectively employed for fault diagnosis of rolling element bearings.
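The evaluation protocol above (a random 50% per-class split, a 1-NN classifier in the reduced space, and the classification rate averaged over 20 runs) can be sketched with NumPy alone. In this illustration, samples are stored as rows, the dimensionality reduction step is omitted (identity), where PCA, LDA, KPCA, KDA, TRLDA, or TRKDA-BIGA would be plugged in, and the two-class Gaussian data are synthetic placeholders rather than any of the UCI data sets.

```python
import numpy as np


def stratified_split(labels, frac, rng):
    """Randomly take `frac` of the samples of every class for training."""
    train = []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[: int(round(frac * idx.size))])
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test


def one_nn_rate(Y_train, y_train, Y_test, y_test):
    """Classification rate of a 1-NN classifier (rows are samples in the reduced space)."""
    d2 = ((Y_test[:, None, :] - Y_train[None, :, :]) ** 2).sum(-1)  # squared distances
    pred = y_train[d2.argmin(axis=1)]
    return np.mean(pred == y_test)


rng = np.random.default_rng(5)
X = np.vstack([rng.normal(m, 0.5, size=(40, 4)) for m in (0.0, 3.0)])
y = np.repeat([0, 1], 40)
rates = []
for run in range(20):                              # 20 independent random splits
    tr, te = stratified_split(y, 0.5, rng)
    rates.append(one_nn_rate(X[tr], y[tr], X[te], y[te]))   # reducer omitted here
mean_rate = np.mean(rates)
```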
6. The Proposed TRKDA Using BIGA for Fault Diagnosis of Rolling Element Bearings
In this section, the proposed TRKDA-BIGA is applied to fault diagnosis of rolling element bearings. The vibration signals from the rolling element bearings are first filtered with a low-pass filter. The filtered vibration signals are then divided into sections of equal window length, and a set of relevant features computed from each window characterizes, to some extent, the health status of the bearings. Most faults in rolling element bearings introduce increased friction and impulsive forces while the bearings rotate, which generally causes the vibration signals in the time domain, frequency domain, and/or time-frequency domain to differ from the normal ones. In this study, 9 time-domain statistical features (Table 3), which reflect the characteristics of the time series in the time domain, are extracted from the vibration signal. Moreover, 6 time-frequency-domain wavelet features, namely, the percentages of energy corresponding to the wavelet coefficients, are extracted from the vibration signal by using the Daubechies-4 (db4) wavelet to decompose the vibration signal into five levels [32]. Wavelet features extracted in this way reflect the distribution of vibration energy in the time-frequency domain. Thus, the 9 time-domain statistical features together with the 6 time-frequency-domain wavelet features are used to represent the vibration signal of each window.
 
where $x(n)$ is a digital signal series, $n = 1, 2, \ldots, N$, $N$ is the number of elements of the digital signal, and $\bar{x}$ and $x_{\mathrm{rms}}$ are the mean value and root-mean-square value of the digital signal series, respectively.
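The six wavelet energy features can be sketched as follows: a five-level db4 decomposition yields six coefficient bands (the approximation at level 5 and the details at levels 5 to 1), and each feature is the percentage of the total coefficient energy in one band. To keep the sketch dependency-free, it implements an orthogonal DWT with periodic boundary handling in NumPy, using the 8-tap Daubechies low-pass filter that PyWavelets names `db4`; the window length of 1024 samples and the random test signal are assumptions, and the low-pass prefiltering step is omitted. With PyWavelets installed, `pywt.wavedec(x, "db4", level=5)` returns the same six-band layout, although its default boundary mode differs, so the numbers are not bit-identical.

```python
import numpy as np

# 8-tap Daubechies low-pass decomposition filter ("db4" in PyWavelets naming).
DB4_LO = np.array([-0.010597401784997278, 0.032883011666982945,
                   0.030841381835986965, -0.18703481171888114,
                   -0.02798376941698385, 0.6308807679295904,
                   0.7148465705525415, 0.23037781330885523])
# Quadrature-mirror high-pass filter: g[j] = (-1)^j * h[L - 1 - j].
DB4_HI = np.array([(-1) ** j * DB4_LO[-1 - j] for j in range(DB4_LO.size)])


def dwt_periodic(x, lo=DB4_LO, hi=DB4_HI):
    """One level of an orthogonal DWT with periodic (wrap-around) boundaries."""
    n = len(x)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(lo.size)[None, :]) % n
    windows = x[idx]                       # (n/2, L) wrapped signal segments
    return windows @ lo, windows @ hi      # approximation, detail


def wavelet_energy_features(x, level=5):
    """Percentage of energy in the bands [cA_level, cD_level, ..., cD_1]."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** level):
        raise ValueError("window length must be divisible by 2**level")
    approx, details = x, []
    for _ in range(level):
        approx, detail = dwt_periodic(approx)
        details.append(detail)
    bands = [approx] + details[::-1]       # coarsest band first, as pywt.wavedec
    energies = np.array([np.sum(band ** 2) for band in bands])
    return 100.0 * energies / energies.sum()


def split_windows(signal, width):
    """Non-overlapping windows of equal length (trailing samples dropped)."""
    n_win = len(signal) // width
    return np.asarray(signal[: n_win * width], dtype=float).reshape(n_win, width)


rng = np.random.default_rng(10)
sig = rng.standard_normal(4096)            # placeholder vibration signal
feats = np.array([wavelet_energy_features(w) for w in split_windows(sig, 1024)])
```

Because the transform is orthogonal, the band energies sum to the energy of the window, so each row of `feats` sums to 100.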
6.1. Experimental Setup
To demonstrate the performance of the proposed TRKDA-BIGA, rolling element bearing data obtained from the Bearing Data Centre of Case Western Reserve University [36] are used. The test bearings were SKF 6205 JEM deep groove ball bearings. Single-point faults were seeded into the drive-end bearing using electro-discharge machining. Faults in rolling element bearings introduced impact-like vibration signals when the bearings were rotating. An accelerometer mounted on the drive end of the motor housing detected these impacts, which behaved like damped oscillations. Vibration signals were captured from four different health statuses of the bearing: normal bearing (Normal), inner race fault (IR), ball fault (BA), and outer race fault (OR). For each of the three abnormal statuses (IR, BA, and OR), there are three levels of severity, with fault diameters of 0.007, 0.014, and 0.021 inches. All experiments were conducted under three load conditions (1 HP, 2 HP, and 3 HP). Figure 1 illustrates the experimental setup. The experimental data were collected from the drive-end bearing of a test rig driven by an induction motor (Reliance Electric 2 HP IQPreAlert). Table 4 gives a short description of the rolling element bearing data.

6.2. Experiment Results
6.2.1. Visualization of Bearing Data
The visualization performance of the proposed TRKDA-BIGA is compared with that of PCA, LDA, KPCA, KDA, and TRLDA in simulations, where KPCA and KDA are the kernel extensions of PCA and LDA, respectively. The two-dimensional visualization results of bearing data under the three load conditions (1, 2, and 3 HP) obtained with PCA, LDA, KPCA, KDA, TRLDA, and the proposed TRKDA-BIGA are shown in Figures 2, 3, and 4, respectively. As seen in Figures 2, 3, and 4, the proposed TRKDA-BIGA outperforms all the compared methods under all three load conditions, both in closely clustering bearing data belonging to the same class and in clearly separating bearing data belonging to different classes. Compared with the unsupervised methods (i.e., PCA and KPCA), the supervised methods (i.e., LDA, KDA, TRLDA, and TRKDA-BIGA) preserve more of the discriminative information embedded in the bearing data and obtain clearer, less overlapping class boundaries. Figures 2, 3, and 4 also show that the methods using the kernel trick (i.e., KPCA, KDA, and TRKDA-BIGA) separate samples from different classes in the learned subspace better than the methods without the kernel trick (i.e., PCA, LDA, and TRLDA).
6.2.2. Classification of Bearing Data
The classification performance of the proposed TRKDA-BIGA is compared with that of PCA, LDA, KPCA, KDA, and TRLDA. To show the robustness of the proposed TRKDA-BIGA, we perform 4 independent experiments for each load condition with 4 different data partitions: 10, 20, 30, and 40 samples per class are randomly selected from the bearing data as the training set, and the remaining samples form the test set. Each method then uses the training set to train a 1-NN classifier that classifies the health status of the test samples. Tables 5, 6, and 7 summarize the average classification results of PCA, LDA, KPCA, KDA, TRLDA, and the proposed TRKDA-BIGA with various numbers of training samples for the 1 HP, 2 HP, and 3 HP load conditions, respectively. The overall average classification performance for health status is fairly good, and Tables 5, 6, and 7 show that the proposed TRKDA-BIGA performs markedly better than the compared methods (PCA, LDA, KPCA, KDA, and TRLDA). It should be noted that the proposed model also provides real-time visualization that is useful for practitioners monitoring the health status of rolling element bearings. Tables 5, 6, and 7 also show that the number of training samples significantly affects the classification accuracy for bearing health status.
7. Conclusions
The rolling element bearing is a core component of many systems, and its failure can lead to reduced capability, downtime, and even catastrophic breakdowns. Effective and efficient fault diagnosis of rolling element bearings therefore plays an important role in the safe and reliable operation of their host systems. In the current study, fault diagnosis of rolling element bearings is treated as a pattern recognition problem: a high-dimensional feature data set that represents the different health states of the bearings is calculated from vibration signals. Specifically, TRKDA is presented for fault diagnosis of rolling element bearings, and BIGA is employed to solve the trace ratio problem in TRKDA. The numerical results obtained from extensive simulations indicate that the proposed TRKDA-BIGA can effectively classify different classes of rolling element bearing data, while also providing real-time visualization capability that is very useful for practitioners monitoring the health status of rolling element bearings. Empirical comparisons show that the proposed TRKDA-BIGA performs better than existing methods in classifying different classes of rolling element bearing data. The proposed TRKDA-BIGA may be a promising tool for fault diagnosis of rolling element bearings.
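In this study BIGA solves the trace ratio problem, i.e., it searches for a projection matrix W with orthonormal columns that maximizes tr(W^T Sb W) / tr(W^T Sw W), where Sb and Sw are the between-class and within-class scatter matrices. The genetic search is not described in this section, so the sketch below plainly swaps in a different, deterministic solver: the iterative trace ratio algorithm of Wang et al. ("Trace ratio vs. ratio trace for dimensionality reduction", in the reference list), which alternates between evaluating the ratio lam and taking the top d eigenvectors of Sb - lam * Sw. It works in the linear (TRLDA-style) input space, and the small `ridge` added to Sw is an assumption made to keep the denominator positive.

```python
import numpy as np


def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter matrices of the rows of X."""
    mu = X.mean(axis=0)
    dim = X.shape[1]
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * diff @ diff.T  # class-size-weighted spread of class means
        Dc = Xc - mc
        Sw += Dc.T @ Dc  # spread of samples around their own class mean
    return Sb, Sw


def trace_ratio(Sb, Sw, d, tol=1e-10, max_iter=100, ridge=1e-8):
    """Iterative trace ratio solver: max tr(W'SbW) / tr(W'SwW) subject to W'W = I.

    This is NOT the BIGA solver used in the study; ridge is an assumed regularizer.
    """
    Sw = Sw + ridge * np.eye(Sw.shape[0])  # keep the denominator strictly positive
    W = np.linalg.eigh(Sb)[1][:, -d:]  # start from the top-d eigenvectors of Sb
    lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
    for _ in range(max_iter):
        # The top-d eigenvectors of Sb - lam * Sw maximize tr(W'(Sb - lam Sw)W).
        W = np.linalg.eigh(Sb - lam * Sw)[1][:, -d:]
        new_lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
        if abs(new_lam - lam) < tol:
            lam = new_lam
            break
        lam = new_lam
    return W, lam
```

For example, `Sb, Sw = scatter_matrices(X, y)` followed by `W, lam = trace_ratio(Sb, Sw, d=2)` gives a two-dimensional projection `X @ W`. Applying the same routine to kernel PCA coordinates is one simple way to obtain a kernelized variant, in the spirit of the KPCA-plus-LDA framework; whether this matches the exact TRKDA formulation is not established here.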
Three research directions are worth pursuing. First, although this study addresses fault diagnosis of rolling element bearings specifically, the proposed method can be modified and extended to the fault diagnosis of gearboxes [37, 38] and cutting tools [39, 40]. Second, frequency-domain information can be utilized for fault diagnosis of rolling element bearings [41, 42]; it would thus be interesting to integrate frequency-domain features with time-domain and time-frequency-domain features. Third, empirical mode decomposition is a very powerful tool for nonlinear and non-stationary signal processing [43–45]; it would also be interesting to employ empirical mode decomposition to separate the periodic components and random transient components in the bearing vibration signal mixture, which may greatly help the extraction of fault signatures from collected bearing vibration signals.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was partially funded by the National Science Foundation of China (51405239), the National Defense Basic Scientific Research Program of China (A2620132010, A2520110003), the Jiangsu Provincial Natural Science Foundation of China (BK20150745, BK20140727), the Jiangsu Province Science and Technology Support Program (BE2014134), the Fundamental Research Funds for the Central Universities (1005YAH15055), and the Jiangsu Postdoctoral Science Foundation of China (1501024C). The authors would like to express their sincere appreciation to Professor K. A. Loparo and Case Western Reserve University for making the bearing data set available and for permission to use it.
References
[1] X. Jin, E. W. M. Ma, L. L. Cheng, and M. Pecht, “Health monitoring of cooling fans based on Mahalanobis distance with mRMR feature selection,” IEEE Transactions on Instrumentation and Measurement, vol. 61, no. 8, pp. 2222–2229, 2012.
[2] X. Jin and T. W. S. Chow, “Anomaly detection of cooling fan and fault classification of induction motor using Mahalanobis–Taguchi system,” Expert Systems with Applications, vol. 40, no. 15, pp. 5787–5795, 2013.
[3] J. Zarei, “Induction motors bearing fault detection using pattern recognition techniques,” Expert Systems with Applications, vol. 39, no. 1, pp. 68–73, 2012.
[4] J.-B. Yu, “Bearing performance degradation assessment using locality preserving projections,” Expert Systems with Applications, vol. 38, no. 6, pp. 7440–7450, 2011.
[5] D. Wang, P. W. Tse, and Y. L. Tse, “A morphogram with the optimal selection of parameters used in morphological analysis for enhancing the ability in bearing fault diagnosis,” Measurement Science and Technology, vol. 23, no. 6, Article ID 065001, 2012.
[6] W. Wang and M. Pecht, “Economic analysis of canary-based prognostics and health management,” IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3077–3089, 2011.
[7] R. B. Randall and J. Antoni, “Rolling element bearing diagnostics: a tutorial,” Mechanical Systems and Signal Processing, vol. 25, no. 2, pp. 485–520, 2011.
[8] Y. Yang, Y. Liao, G. Meng, and J. Lee, “A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis,” Expert Systems with Applications, vol. 38, no. 9, pp. 11311–11320, 2011.
[9] J. Rafiee, M. A. Rafiee, and P. W. Tse, “Application of mother wavelet functions for automatic gear and bearing fault diagnosis,” Expert Systems with Applications, vol. 37, no. 6, pp. 4568–4579, 2010.
[10] W. He, Z.-N. Jiang, and K. Feng, “Bearing fault detection based on optimal wavelet filter and sparse code shrinkage,” Measurement, vol. 42, no. 7, pp. 1092–1102, 2009.
[11] J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[12] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[13] M. D. Prieto, G. Cirrincione, A. G. Espinosa, J. A. Ortega, and H. Henao, “Bearing fault detection by a novel condition-monitoring scheme based on statistical-time features and neural networks,” IEEE Transactions on Industrial Electronics, vol. 60, no. 8, pp. 3398–3407, 2013.
[14] M. B. Zhao, Z. Zhang, and T. W. S. Chow, “Trace ratio criterion based generalized discriminative learning for semi-supervised dimensionality reduction,” Pattern Recognition, vol. 45, no. 4, pp. 1482–1499, 2012.
[15] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, “Face recognition using Laplacianfaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328–340, 2005.
[16] K. Feng, Z. Jiang, W. He, and B. Ma, “A recognition and novelty detection approach based on Curvelet transform, nonlinear PCA and SVM with application to indicator diagram diagnosis,” Expert Systems with Applications, vol. 38, no. 10, pp. 12721–12729, 2011.
[17] Q. Jiang, M. Jia, J. Hu, and F. Xu, “Machinery fault diagnosis using supervised manifold learning,” Mechanical Systems and Signal Processing, vol. 23, no. 7, pp. 2301–2311, 2009.
[18] E. G. Strangas, S. Aviyente, and S. S. H. Zaidi, “Time-frequency analysis for efficient fault diagnosis and failure prognosis for interior permanent-magnet AC motors,” IEEE Transactions on Industrial Electronics, vol. 55, no. 12, pp. 4191–4199, 2008.
[19] Y. Wang, E. W. M. Ma, T. W. S. Chow, and K.-L. Tsui, “A two-step parametric method for failure prediction in hard disk drives,” IEEE Transactions on Industrial Informatics, vol. 10, no. 1, pp. 419–430, 2014.
[20] J. Yu, “Local and nonlocal preserving projection for bearing defect classification and performance assessment,” IEEE Transactions on Industrial Electronics, vol. 59, no. 5, pp. 2363–2376, 2012.
[21] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.
[22] H. Wang, S. Yan, D. Xu, X. Tang, and T. Huang, “Trace ratio vs. ratio trace for dimensionality reduction,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), Minneapolis, Minn, USA, June 2007.
[23] M. B. Zhao, R. H. M. Chan, P. Tang, T. W. S. Chow, and S. W. H. Wong, “Trace ratio linear discriminant analysis for medical diagnosis: a case study of dementia,” IEEE Signal Processing Letters, vol. 20, no. 5, pp. 431–434, 2013.
[24] L. Zhou, L. Wang, and C. H. Shen, “Feature selection with redundancy-constrained class separability,” IEEE Transactions on Neural Networks, vol. 21, no. 5, pp. 853–858, 2010.
[25] T. Sun and S. Chen, “Class label versus sample label-based CCA,” Applied Mathematics and Computation, vol. 185, no. 1, pp. 272–283, 2007.
[26] Y.-F. Guo, S.-J. Li, J.-Y. Yang, T.-T. Shu, and L.-D. Wu, “A generalized Foley-Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition,” Pattern Recognition Letters, vol. 24, no. 1–3, pp. 147–158, 2003.
[27] M. B. Zhao, X. H. Jin, Z. Zhang, and B. Li, “Fault diagnosis of rolling element bearings via discriminative subspace learning: visualization and classification,” Expert Systems with Applications, vol. 41, no. 7, pp. 3391–3401, 2014.
[28] X. H. Jin, M. B. Zhao, T. W. S. Chow, and M. S. Pecht, “Motor bearing fault diagnosis using trace ratio linear discriminant analysis,” IEEE Transactions on Industrial Electronics, vol. 61, no. 5, pp. 2441–2451, 2014.
[29] Y. Q. Jia, F. P. Nie, and C. S. Zhang, “Trace ratio problem revisited,” IEEE Transactions on Neural Networks, vol. 20, no. 4, pp. 729–735, 2009.
[30] J. Yang, A. F. Frangi, J.-Y. Yang, D. Zhang, and Z. Jin, “KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 230–244, 2005.
[31] C. Zhang, F. Nie, and S. Xiang, “A general kernelization framework for learning algorithms based on kernel PCA,” Neurocomputing, vol. 73, no. 4–6, pp. 959–967, 2010.
[32] S. Ji and J. Ye, “Kernel uncorrelated and regularized discriminant analysis: a theoretical and computational study,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 10, pp. 1311–1321, 2008.
[33] L. C. Jiao and L. Wang, “A novel genetic algorithm based on immunity,” IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 30, no. 5, pp. 552–561, 2000.
[34] F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, No. 92, American Mathematical Society, 1997.
[35] J. S. Zhang, Z. B. Xu, and Y. Liang, “The whole annealing genetic algorithms and their sufficient and necessary conditions of convergence,” Science of China, vol. 27, no. 2, pp. 154–164, 1997.
[36] K. A. Loparo, “Bearings vibration data set,” Case Western Reserve University, http://csegroups.case.edu/bearingdatacenter/pages/welcomecasewesternreserveuniversitybearingdatacenterwebsite.
[37] D. Wang, Q. Miao, and R. Kang, “Robust health evaluation of gearbox subject to tooth failure with wavelet decomposition,” Journal of Sound and Vibration, vol. 324, no. 3–5, pp. 1141–1157, 2009.
[38] D. Wang, P. W. Tse, W. Guo, and Q. Miao, “Support vector data description for fusion of multiple health indicators for enhancing gearbox fault diagnosis and prognosis,” Measurement Science and Technology, vol. 22, no. 2, Article ID 025102, 2011.
[39] F. J. Alonso and D. R. Salgado, “Analysis of the structure of vibration signals for tool wear detection,” Mechanical Systems and Signal Processing, vol. 22, no. 3, pp. 735–748, 2008.
[40] K. P. Zhu, G. S. Hong, and Y. S. Wong, “A comparative study of feature selection for hidden Markov model-based micro-milling tool wear monitoring,” Machining Science and Technology, vol. 12, no. 3, pp. 348–369, 2008.
[41] D. Wang, Q. Miao, X. F. Fan, and H.-Z. Huang, “Rolling element bearing fault detection using an improved combination of Hilbert and wavelet transforms,” Journal of Mechanical Science and Technology, vol. 23, no. 12, pp. 3292–3301, 2010.
[42] Y. G. Lei, M. J. Zuo, Z. J. He, and Y. Y. Zi, “A multidimensional hybrid intelligent method for gear fault diagnosis,” Expert Systems with Applications, vol. 37, no. 2, pp. 1419–1430, 2010.
[43] D. Wang, W. Guo, and P. W. Tse, “An enhanced empirical mode decomposition method for blind component separation of a single-channel vibration signal mixture,” Journal of Vibration and Control, 2014.
[44] Y. G. Lei, M. J. Zuo, and M. R. Hoseini, “The use of ensemble empirical mode decomposition to improve bispectral analysis for fault detection in rotating machinery,” Proceedings of the Institution of Mechanical Engineers Part C: Journal of Mechanical Engineering Science, vol. 224, no. 8, pp. 1759–1769, 2010.
[45] Y. G. Lei, Z. J. He, and Y. Y. Zi, “Application of the EEMD method to rotor fault diagnosis of rotating machinery,” Mechanical Systems and Signal Processing, vol. 23, no. 4, pp. 1327–1338, 2009.
Copyright
Copyright © 2016 Wen-An Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.