#### Abstract

The rolling element bearing is a core component of many systems, such as aircraft, trains, steamboats, and machine tools, and its failure can lead to reduced capability, downtime, and even catastrophic breakdowns. Because of misoperation, manufacturing deficiencies, or a lack of monitoring and maintenance, it is often found to be the most unreliable component within these systems. Effective and efficient fault diagnosis of rolling element bearings therefore plays an important role in ensuring the continued safe and reliable operation of their host systems. This study presents a trace ratio criterion-based kernel discriminant analysis (TR-KDA) for fault diagnosis of rolling element bearings, in which the binary immune genetic algorithm (BIGA) is employed to solve the trace ratio problem. The numerical results obtained using extensive simulation indicate that the proposed TR-KDA using BIGA (called TR-KDA-BIGA) can effectively and efficiently classify different classes of rolling element bearing data, while also providing real-time visualization that is very useful for practitioners monitoring the health status of rolling element bearings. Empirical comparisons show that the proposed TR-KDA-BIGA performs better than existing methods in classifying different classes of rolling element bearing data. The proposed TR-KDA-BIGA may be a promising tool for fault diagnosis of rolling element bearings.

#### 1. Introduction

The rolling element bearing is a core component of many systems, such as aircraft, trains, steamboats, and machine tools, and its failure can lead to reduced capability, downtime, and even catastrophic breakdowns [1–6]. Because of misoperation, manufacturing deficiencies, or a lack of monitoring and maintenance, it is often found to be the most unreliable component within these systems. Effective and efficient fault diagnosis of rolling element bearings therefore plays an important role in ensuring the continued safe and reliable operation of their host systems.

Over the past few years, much research effort has been devoted to developing approaches to fault diagnosis of rolling element bearings. When faults occur in rolling element bearings, the vibration signals in the time/frequency domain have been shown to deviate from their normal ones because of increased friction and impulsive forces [7–10]. Usually, several dozens or even hundreds of time/frequency-domain features are calculated from the bearing vibration signals to represent the different health statuses. In the current study, 9 time-domain features and 6 time-frequency-domain features are extracted from the bearing vibration signals to jointly construct a 15-dimensional feature vector. In this way, fault diagnosis of rolling element bearings is usually solved as a high-dimensional pattern recognition problem. However, the intrinsic dimension of high-dimensional data may be small; for example, only a few features may be responsible for a certain type of fault pattern. Moreover, projecting high-dimensional data onto a 2- or 3-dimensional subspace provides real-time visualization, which is convenient for users monitoring the health status of rolling element bearings. In addition, projecting high-dimensional data onto a low-dimensional subspace also compresses the data, which helps with efficient storage and retrieval. Thus, dimensionality reduction techniques are often used to project the high-dimensional feature space onto a lower-dimensional space while preserving most of the “intrinsic information” contained in the data [11–15]. After dimensionality reduction, the compact representation of the data can be used for subsequent tasks (e.g., visualization and classification). Among the various dimensionality reduction methods [16–24], principal component analysis (PCA) and linear discriminant analysis (LDA) are the two most common [21].
The former is an unsupervised method that seeks the directions of maximum variance for optimal reconstruction. The latter is a supervised method that aims to maximize the between-class scatter while minimizing the within-class scatter. Owing to its use of label information, LDA generally outperforms PCA if sufficient labeled samples are provided [21]. In the past few years, a series of studies on formulating LDA for pattern recognition have been conducted by Fukunaga [21], Wang et al. [22], Sun and Chen [25], Guo et al. [26], Zhao et al. [27], Jin et al. [28], Jia et al. [29], and others. Generally, LDA is formulated with the ratio trace criterion rather than the trace ratio criterion, because the ratio trace problem is more tractable than the trace ratio problem. Nevertheless, as pointed out by Wang et al. [22], solutions obtained with the ratio trace criterion may deviate from the original intent of the trace ratio problem. To improve LDA, Wang et al. [22], Guo et al. [26], Zhao et al. [27], Jin et al. [28], and Jia et al. [29] presented various trace ratio criterion-based LDAs (TR-LDAs), in which the numerator and denominator of the criterion directly reflect the Euclidean distances between interclass and intraclass samples, respectively. Another advantage of the trace ratio criterion is that the calculated projection matrix is orthogonal, which eliminates redundancy between different projection directions. In addition, when Euclidean distance is used to evaluate the similarity between data points, the orthogonal projection can preserve such similarities [22]. Although these TR-LDA formulation methods have the aforementioned advantages, they are criticized for their inability to deal with redundancy among eigenvectors. For example, if the most discriminative eigenvector is duplicated several times, these TR-LDA formulation methods are prone to selecting all of the copies.
This is problematic for selecting an optimal subset of eigenvectors, because other discriminative and complementary eigenvectors will be missed, and a classifier built on eigenvectors selected in this way can perform poorly. Therefore, the issue of TR-LDA formulation has remained unresolved.

A review of the related literature also indicates that most previous work applying LDA or TR-LDA to fault diagnosis assumed that the samples in each class follow a linear distribution. In many fault diagnosis practices, however, the samples in each class may follow a nonlinear distribution that violates this assumption. In that case, the separation of different classes may not be well characterized by the scatter matrices, which degrades the classification results [21]. To solve this problem, the kernel trick [30–32], which extends many linear methods to their nonlinear kernel versions, can be used to extend TR-LDA to nonlinear problems. Thus, this study develops a nonlinear kernel version of TR-LDA, that is, trace ratio criterion-based kernel discriminant analysis (TR-KDA), for fault diagnosis of rolling element bearings. However, like other TR-LDA models, the TR-KDA model presented in this study inherits the trace ratio problem in the formulation of its projection matrix, including the inability of existing formulation methods to handle redundancy in eigenvector selection described above. Fortunately, the immune genetic algorithm (IGA), a novel evolutionary computation technique developed by Jiao and Wang [33], has the potential to determine a set of discriminative and mutually irredundant eigenvectors.
In this study, we propose a method called TR-KDA-BIGA that uses binary IGA (BIGA) to formulate TR-KDA for dimensionality reduction of statistical and wavelet features extracted from vibration signals, enabling effective and efficient fault diagnosis of rolling element bearings. In particular, the contributions are to (i) use an immune evolutionary computation technique, BIGA, to obtain a reduced set of discriminative and mutually irredundant eigenvectors for the TR-KDA-BIGA formulation, (ii) provide a two-dimensional representation of bearing data that is very useful for practitioners monitoring the health status of bearings, and (iii) build a TR-KDA-BIGA model architecture for vibration measurements that enables effective and efficient fault diagnosis of rolling element bearings.

The rest of this study is structured as follows. Section 2 briefly reviews the basic concepts of TR-LDA and kernel extension. Section 3 presents a TR-KDA-BIGA method. Section 4 discusses its convergence and initialization. Section 5 conducts performance evaluations of the proposed TR-KDA-BIGA on benchmark problems. Section 6 describes an overall flowchart of the proposed TR-KDA-BIGA for fault diagnosis of rolling element bearings. Section 7 summarizes the conclusions drawn from this study.

#### 2. Review of TR-LDA and Kernel Extension

##### 2.1. Review of TR-LDA

Suppose we are given a set of $D$-dimensional samples $\{x_i\}_{i=1}^{N}$ belonging to $C$ different classes. LDA seeks a linear projection matrix $W \in \mathbb{R}^{D \times d}$ that maps the original $D$-dimensional data onto $d$-dimensional data (usually $d \ll D$) by maximizing the between-class scatter while minimizing the within-class scatter. The between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are expressed as follows:

$$S_b = \sum_{c=1}^{C} n_c\left(\mu_c - \mu\right)\left(\mu_c - \mu\right)^T, \qquad S_w = \sum_{c=1}^{C}\sum_{i=1}^{n_c}\left(x_i^c - \mu_c\right)\left(x_i^c - \mu_c\right)^T, \tag{1}$$

where $\mu$ is the total sample mean vector, $n_c$ is the number of samples in the $c$th class, $\mu_c$ is the mean vector of the $c$th class, and $x_i^c$ is the $i$th sample in the $c$th class. The mapped feature vectors are then $y_i = W^T x_i$. The original LDA formulation, known as Fisher LDA [21], only handles binary classification problems. However, many practical applications involve multiclass classification. To overcome this limitation, a number of researchers have proposed optimization criteria that extend Fisher LDA to multiclass problems. The first criterion takes a ratio trace form (referred to as RT-LDA):

$$W^{*} = \arg\max_{W}\operatorname{tr}\left[\left(W^T S_w W\right)^{-1}\left(W^T S_b W\right)\right], \tag{2}$$

where $\operatorname{tr}(\cdot)$ denotes the matrix trace. To obtain a set of orthonormal projection vectors, the constraint $W^T W = I$, where $I$ is an identity matrix, is usually added to (2). The second criterion takes a trace ratio form (referred to as TR-LDA):

$$W^{*} = \arg\max_{W^T W = I}\frac{\operatorname{tr}\left(W^T S_b W\right)}{\operatorname{tr}\left(W^T S_w W\right)}. \tag{3}$$

The ratio trace problem in (2) can be solved directly through the generalized eigenvalue decomposition (GED) method [22]:

$$S_b w_k = \lambda_k S_w w_k, \tag{4}$$

where $\lambda_k$ is the $k$th largest eigenvalue, $w_k$ is the eigenvector corresponding to $\lambda_k$, and $w_k$ constitutes the $k$th column vector of the matrix $W$. Although an approximate closed-form solution to (3) can be obtained with the GED in this way, it does not guarantee the optimal trace ratio. Approximating trace ratio optimization by ratio trace optimization may therefore reduce the classification capability of the derived low-dimensional feature space.
Moreover, the physical meaning of the trace ratio form is clearer than that of the ratio trace form. However, the optimization problem in (3) is generally nonconvex, and no closed-form solution exists for it. Fortunately, a study by Guo et al. [26] showed that, using the trace difference function

$$f(\lambda) = \max_{W^T W = I}\operatorname{tr}\left[W^T\left(S_b - \lambda S_w\right)W\right],$$

the trace ratio problem can be solved equivalently by finding the zero point of the equation $f(\lambda) = 0$. Following Guo et al.'s work, Wang et al. [22] presented an iterative method named the ITR algorithm to solve the trace ratio problem. The ITR algorithm optimizes the objective function in an iterative and incremental manner. The projection matrix $W_t$ in the $t$th iteration step is obtained by solving the trace difference problem

$$W_t = \arg\max_{W^T W = I}\operatorname{tr}\left[W^T\left(S_b - \lambda_{t-1} S_w\right)W\right],$$

where $\lambda_{t-1}$ is the trace ratio value derived from $W_{t-1}$, the projection matrix of the previous iteration step. However, the initialization of $\lambda$ substantially influences the convergence behaviour of the ITR algorithm: a good initialization generally yields quick convergence, whereas a bad initialization usually increases the number of iterations. Moreover, in the ITR algorithm, although the $W_t$ formed by the eigenvectors corresponding to the $d$ largest eigenvalues of $S_b - \lambda_{t-1} S_w$ seems to maximize the trace difference $\operatorname{tr}[W^T(S_b - \lambda_{t-1} S_w)W]$, it does not necessarily maximize the trace ratio $\operatorname{tr}(W^T S_b W)/\operatorname{tr}(W^T S_w W)$. On the other hand, from the perspective of fault diagnosis, the main aim is to find a set of projection vectors that yields the highest discrimination among the different fault patterns. Thus, as mentioned in Section 1, the eigenvectors with the largest eigenvalues are not necessarily the most representative for discriminating one class from the others. To overcome these shortcomings, this study presents a BIGA-based solution method for the trace ratio criterion, discussed in detail in Section 3.
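To make the contrast between the two criteria concrete, the NumPy sketch below builds the between-class and within-class scatter matrices, solves the ratio trace problem through the GED, and runs an ITR-style iteration for the trace ratio problem in (3) by taking, at each step, the $d$ leading eigenvectors of $S_b - \lambda_{t-1} S_w$. It is a minimal illustration, not the authors' code: the function names, the small ridge added to $S_w$, the starting value $\lambda_0 = 0$, and the stopping tolerance are all assumptions made here.

```python
import numpy as np


def scatter_matrices(X, y):
    """Between-class S_b and within-class S_w scatter matrices.

    X has shape (N, D) with one sample per row; y holds the N class labels.
    """
    mu = X.mean(axis=0)
    D = X.shape[1]
    S_b, S_w = np.zeros((D, D)), np.zeros((D, D))
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        S_b += len(X_c) * np.outer(mu_c - mu, mu_c - mu)
        S_w += (X_c - mu_c).T @ (X_c - mu_c)
    return S_b, S_w


def ratio_trace_lda(S_b, S_w, d, ridge=1e-8):
    """Ratio trace solution from the GED S_b w = lambda S_w w.

    The GED is reduced to a symmetric eigenproblem with the Cholesky factor
    of S_w; the ridge that keeps S_w positive definite is an assumption.
    """
    D = S_w.shape[0]
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w + ridge * np.eye(D)))
    evals, evecs = np.linalg.eigh(L_inv @ S_b @ L_inv.T)  # ascending eigenvalues
    top = np.argsort(evals)[::-1][:d]                      # indices of the d largest
    return L_inv.T @ evecs[:, top], evals[top]             # back-transform w = L^{-T} v


def trace_ratio(W, S_b, S_w):
    """tr(W^T S_b W) / tr(W^T S_w W), the criterion in (3)."""
    return np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)


def itr_trace_ratio(S_b, S_w, d, lam0=0.0, tol=1e-10, max_iter=100):
    """ITR-style iteration: W_t maximizes tr(W^T (S_b - lam_{t-1} S_w) W) over
    W^T W = I (the d leading eigenvectors); lam_t is the trace ratio of W_t.
    """
    lam, history = lam0, [lam0]
    for _ in range(max_iter):
        evals, evecs = np.linalg.eigh(S_b - lam * S_w)
        W = evecs[:, np.argsort(evals)[::-1][:d]]  # orthonormal columns
        new_lam = trace_ratio(W, S_b, S_w)
        history.append(new_lam)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return W, history
```

In this sketch the GED columns satisfy $W^T S_w W = I$ (up to the ridge) rather than $W^T W = I$, whereas the ITR columns are orthonormal, and the recorded trace ratio values are nondecreasing from one iteration to the next.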

##### 2.2. Kernel Extension

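In some applications, the linear TR-LDA is insufficient to model the data. To address nonlinearities in the data, this section presents a nonlinear discriminant method based on the kernel trick [30–32], that is, TR-KDA. The kernel trick maps the original data to a high-dimensional Hilbert space $\mathcal{H}$ through a nonlinear mapping function $\phi: \mathbb{R}^D \to \mathcal{H}$. Let $\Phi$ denote the data matrix in the Hilbert space: $\Phi = [\phi(x_1), \ldots, \phi(x_N)]$. The functional form of the mapping $\phi$ need not be known, since it is implicitly defined by the choice of kernel function $k(x_i, x_j) = \langle\phi(x_i), \phi(x_j)\rangle$, that is, the inner product in the kernel-induced feature space. The kernel function may be any positive kernel satisfying Mercer's condition. The radial basis function (RBF) kernel, one of the most popular kernel functions in kernel learning algorithms, is adopted in this study. Then, (3) in the Hilbert space can be written as follows:

$$W^{\phi*} = \arg\max_{(W^{\phi})^T W^{\phi} = I}\frac{\operatorname{tr}\left((W^{\phi})^T S_b^{\phi} W^{\phi}\right)}{\operatorname{tr}\left((W^{\phi})^T S_w^{\phi} W^{\phi}\right)}, \tag{5}$$

where $W^{\phi}$, $S_b^{\phi}$, and $S_w^{\phi}$ are the matrices in the Hilbert space corresponding to $W$, $S_b$, and $S_w$ in (3), respectively. Notably, the matrices $S_b$ and $S_w$ in (3) can be expressed as $X L_b X^T$ and $X L_w X^T$, respectively, through simple manipulation, where $X = [x_1, \ldots, x_N]$ is the $D \times N$ data matrix. The matrices $L_b$ and $L_w$ are the graph Laplacian matrices [34] of the weighted undirected graphs reflecting the between-class and within-class relationships of the samples. Consider

$$S_b = \sum_{c=1}^{C} n_c\left(\mu_c - \mu\right)\left(\mu_c - \mu\right)^T = X\left(\sum_{c=1}^{C}\frac{1}{n_c}e_c e_c^T - \frac{1}{N}e e^T\right)X^T, \tag{6}$$

where $e$ is the $N$-dimensional all-ones vector and $e_c$ is the $N$-dimensional class indicator vector

$$\left[e_c\right]_i = \begin{cases} 1, & \text{if } x_i \text{ belongs to the } c\text{th class}, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$

We can simplify (6) even further by defining

$$L_b = \sum_{c=1}^{C}\frac{1}{n_c}e_c e_c^T - \frac{1}{N}e e^T. \tag{8}$$

Thus, we get

$$S_b = X L_b X^T. \tag{9}$$

Then, the matrix $S_w$ can similarly be computed as follows:

$$S_w = \sum_{c=1}^{C}\sum_{i=1}^{n_c}\left(x_i^c - \mu_c\right)\left(x_i^c - \mu_c\right)^T = \sum_{c=1}^{C} n_c \Sigma_c = \sum_{c=1}^{C} X_c H_c X_c^T, \qquad H_c = I_{n_c} - \frac{1}{n_c}e_{n_c}e_{n_c}^T, \tag{10}$$

where $X_c$ is the $D \times n_c$ matrix of the samples in the $c$th class, $e_{n_c}$ is the $n_c$-dimensional all-ones vector, $I_{n_c}$ is the $n_c \times n_c$ identity matrix, $H_c$ is an $n_c \times n_c$ centering matrix, and $\Sigma_c = \frac{1}{n_c}X_c H_c X_c^T$ is the data covariance matrix of the $c$th class. Based on (7), the above equation can be simplified similarly by defining

$$L_w = I - \sum_{c=1}^{C}\frac{1}{n_c}e_c e_c^T. \tag{11}$$

Thus, we get

$$S_w = X L_w X^T. \tag{12}$$

Using the definitions in (9) and (12), (5) can be rewritten as follows:

$$W^{\phi*} = \arg\max_{(W^{\phi})^T W^{\phi} = I}\frac{\operatorname{tr}\left((W^{\phi})^T \Phi L_b \Phi^T W^{\phi}\right)}{\operatorname{tr}\left((W^{\phi})^T \Phi L_w \Phi^T W^{\phi}\right)}. \tag{13}$$

To solve for $W^{\phi}$, $\Phi$ is decomposed into a matrix $Q$ with orthonormal columns (satisfying $Q^T Q = I$) and an upper (right) triangular matrix $R$ such that $\Phi = QR$. We have

$$\Phi L_b \Phi^T = Q R L_b R^T Q^T, \qquad \Phi L_w \Phi^T = Q R L_w R^T Q^T, \qquad K = \Phi^T\Phi = R^T R, \tag{14}$$

so $R$ can be obtained from the Cholesky decomposition of the kernel matrix $K$, whose entries are $K_{ij} = k(x_i, x_j)$. Let us map $W^{\phi}$ into the span of $Q$. Since $Q$ is an orthonormal basis of the span of $\Phi$, we have

$$W^{\phi} = QV, \tag{15}$$

where $V$ is a matrix with orthonormal columns satisfying $V^T V = I$. Using (14) and (15), (13) can be further rewritten as follows:

$$V^{*} = \arg\max_{V^T V = I}\frac{\operatorname{tr}\left(V^T R L_b R^T V\right)}{\operatorname{tr}\left(V^T R L_w R^T V\right)}. \tag{16}$$

Let $\tilde S_b = R L_b R^T$ and $\tilde S_w = R L_w R^T$; then (16) can be further rewritten as follows:

$$V^{*} = \arg\max_{V^T V = I}\frac{\operatorname{tr}\left(V^T \tilde S_b V\right)}{\operatorname{tr}\left(V^T \tilde S_w V\right)}. \tag{17}$$

After the matrix $V$ is obtained with the BIGA-based solution method (discussed in detail in Section 3), the output points in the reduced data space can be expressed as

$$Y = (W^{\phi})^T\Phi = V^T Q^T Q R = V^T R. \tag{18}$$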
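The kernel quantities can be formed without ever computing $\phi$ explicitly. The NumPy sketch below builds an RBF kernel matrix, the Laplacians $L_b$ and $L_w$, the upper-triangular factor $R$ with $K = R^T R$, and the reduced scatter matrices $\tilde S_b = R L_b R^T$ and $\tilde S_w = R L_w R^T$. It also maps points through $V^T R^{-T} k(x)$, which follows from $Q = \Phi R^{-1}$ and reduces to $V^T R$ on the training samples; this out-of-sample formula is derived here rather than quoted from the source. The kernel width `sigma`, the diagonal `jitter`, and all function names are assumptions.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)).

    Rows of A and B are samples; the width sigma is an assumed tuning parameter.
    """
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def graph_laplacians(y):
    """L_b and L_w such that S_b = X L_b X^T and S_w = X L_w X^T (X is D x N)."""
    N = len(y)
    P = np.zeros((N, N))                  # sum_c e_c e_c^T / n_c
    for c in np.unique(y):
        e_c = (y == c).astype(float)      # class indicator vector
        P += np.outer(e_c, e_c) / e_c.sum()
    L_b = P - np.ones((N, N)) / N
    L_w = np.eye(N) - P
    return L_b, L_w


def kernel_scatter(K, y, jitter=1e-8):
    """Cholesky K = R^T R, then S~_b = R L_b R^T and S~_w = R L_w R^T.

    numpy returns a lower-triangular C with K = C C^T, so R = C^T is upper
    triangular. The diagonal jitter guarding against round-off is an assumption.
    """
    N = K.shape[0]
    R = np.linalg.cholesky(K + jitter * np.eye(N)).T
    L_b, L_w = graph_laplacians(y)
    return R, R @ L_b @ R.T, R @ L_w @ R.T


def project(V, R, K_cross):
    """Reduced coordinates V^T R^{-T} k(z) for each column of K_cross.

    K_cross[i, j] = k(x_i, z_j) between training samples x_i and points z_j.
    """
    return V.T @ np.linalg.solve(R.T, K_cross)
```

Because everything is expressed through $K$, the cost is governed by the number of training samples $N$ rather than by the dimension of $\mathcal{H}$.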

#### 3. The Proposed TR-KDA-BIGA

As previously mentioned, constructing TR-KDA requires selecting $d$ out of $N$ eigenvectors to form the matrix $V$ for dimensionality reduction. However, finding a subset of eigenvectors based on the trace ratio criterion is not an easy task, since the space of $\binom{N}{d}$ possible subsets is very large, especially when $N$ is large. It is therefore impractical to use exhaustive search to find an optimal subset of eigenvectors. Instead, in this study, the BIGA is used to select $d$ out of the $N$ eigenvectors of $\tilde S_b - \lambda_{t-1}\tilde S_w$ as the bases of the projection matrix such that the trace ratio value is maximized. The immune genetic algorithm, originally developed by Jiao and Wang [33], is a novel genetic algorithm based on biological immune theory that combines the immune mechanism with the evolutionary mechanism. The proposed TR-KDA-BIGA is discussed in further detail below.
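A quick count shows why exhaustive search is ruled out. The sketch below uses Python's `math.comb` to compute $\binom{N}{d}$, the number of candidate eigenvector subsets; the $(N, d)$ pairs are illustrative values chosen here, not settings from the source.

```python
from math import comb


def subset_count(N, d):
    """Number of ways to choose d eigenvectors out of N (binomial coefficient)."""
    return comb(N, d)


# Illustrative sizes only.
for N, d in [(20, 2), (100, 2), (100, 10)]:
    print(f"N = {N}, d = {d}: {subset_count(N, d):,} candidate subsets")
```

Already for $N = 100$ and $d = 10$ the count exceeds $1.7 \times 10^{13}$.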

##### 3.1. Chromosome Encoding

Encoding a solution of a problem into a chromosome is an important issue when using BIGAs. In this study, every chromosome in a BIGA corresponds to a discrete binary selector $s = [s_1, \ldots, s_N]$, where each gene $s_i$ in the chromosome is “1” if the eigenvector $u_i$ of $\tilde S_b - \lambda_{t-1}\tilde S_w$ is used in forming the projection matrix $V_t$ of the $t$th step, while “0” denotes its absence. Thus, the length of the chromosome is $N$.
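A minimal NumPy sketch of this encoding, assuming the eigenvectors are stored as the columns of a matrix `U` (a name chosen here): a chromosome is an integer 0/1 array of length $N$ with exactly $d$ ones, and decoding it keeps the columns whose gene is 1. In the usage lines, eigenvectors of a random symmetric matrix merely stand in for those of $\tilde S_b - \lambda_{t-1}\tilde S_w$.

```python
import numpy as np


def random_selector(N, d, rng):
    """Binary chromosome s of length N with exactly d genes equal to 1."""
    s = np.zeros(N, dtype=int)
    s[rng.choice(N, size=d, replace=False)] = 1
    return s


def selector_to_projection(U, s):
    """Keep the columns u_i of U (eigenvectors) whose gene s_i is 1."""
    return U[:, np.asarray(s, dtype=bool)]


# Usage with a stand-in symmetric matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
_, U = np.linalg.eigh(A + A.T)
s = random_selector(8, 3, rng)
V = selector_to_projection(U, s)  # 8 x 3 with orthonormal columns, since U is orthogonal
```

Because the selected columns come from an orthogonal eigenvector matrix, any decoded $V$ automatically satisfies $V^T V = I$.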

##### 3.2. Genetic Operators

Genetic operators give every chromosome the chance to become the fittest chromosome of its generation. If it is difficult to reach the target of trace ratio optimization, crossover and mutation may introduce degeneracy into generations of chromosomes.

###### 3.2.1. Crossover Operator

The crossover operator in a BIGA generates two new child chromosomes from two parent chromosomes selected from the current population according to a prespecified crossover rate. In this study, the one-point crossover operator is adopted: a cut point is selected at random, and the parts of the parent chromosomes between the cut point and the end of the string are exchanged. Specifically, suppose that two parent chromosomes $s^a = [s_1^a, \ldots, s_N^a]$ and $s^b = [s_1^b, \ldots, s_N^b]$, selected randomly from the population, undergo crossover at a randomly selected crossover point $p$, where $1 \le p < N$. The two offspring chromosomes $s^{a'}$ and $s^{b'}$ are then

$$s^{a'} = \left[s_1^a, \ldots, s_p^a, s_{p+1}^b, \ldots, s_N^b\right], \qquad s^{b'} = \left[s_1^b, \ldots, s_p^b, s_{p+1}^a, \ldots, s_N^a\right].$$

However, the procedure is not simply an exchange of the gene segments after the crossover point, because the number of eigenvectors included in the subset must remain equal to $d$. Therefore, a simple but effective strategy is applied to ensure that the crossover operator does not change the total number of “1” genes in a chromosome.

Let

$$m = \sum_{i=1}^{N} s_i'$$

denote the number of “1” genes in an offspring chromosome $s'$. When $m$ is not equal to $d$, the following retention criterion is applied:

1. If $m$ is larger than $d$, randomly select genes with “0-bit” from the current offspring chromosome and reset these selected genes to “1-bit,” and then randomly select genes with “1-bit” from the current offspring chromosome and reset these selected genes to “0-bit.”
2. If $m$ is smaller than $d$, randomly select genes with “1-bit” from the current offspring chromosome and reset these selected genes to “0-bit,” and then randomly select genes with “0-bit” from the current offspring chromosome and reset these selected genes to “1-bit.”
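The crossover and the count-restoring step can be sketched as follows. The one-point crossover matches the description above; the `repair` function is a simplified stand-in that directly switches off surplus 1-bits or switches on missing ones until exactly $d$ genes equal 1, rather than a reproduction of the two-stage retention criterion, whose exact gene counts are not given here. Function names are hypothetical.

```python
import numpy as np


def one_point_crossover(s_a, s_b, rng):
    """Swap the tails of two parent chromosomes after a random cut point p, 1 <= p <= N-1."""
    N = len(s_a)
    p = rng.integers(1, N)  # upper bound is exclusive
    child_a = np.concatenate([s_a[:p], s_b[p:]])
    child_b = np.concatenate([s_b[:p], s_a[p:]])
    return child_a, child_b


def repair(s, d, rng):
    """Simplified retention step (assumption): flip random genes until sum(s) == d."""
    s = np.array(s, dtype=int)
    m = int(s.sum())
    if m > d:    # too many eigenvectors: switch off m - d randomly chosen 1-bits
        s[rng.choice(np.flatnonzero(s == 1), size=m - d, replace=False)] = 0
    elif m < d:  # too few eigenvectors: switch on d - m randomly chosen 0-bits
        s[rng.choice(np.flatnonzero(s == 0), size=d - m, replace=False)] = 1
    return s
```

For example, crossing `[1,1,1,0,0,0,0,0]` with `[0,0,0,0,0,1,1,1]` can produce a child with up to six 1-bits, which `repair` then trims back to $d = 3$.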

###### 3.2.2. Mutation Operator

The mutation operator in a BIGA is used primarily to maintain diversity in the population. For each gene in a chromosome undergoing mutation, a real-valued number is randomly selected within the range $[0, 1]$. If this number is less than the prespecified mutation rate, the gene changes from “0-bit” to “1-bit” or vice versa. After an eigenvector is added (or removed) in this way, a different one is randomly removed (or added) so that the number of eigenvectors included in the subset remains equal to $d$. The mutation operator thus guides the search into new areas.
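The count-preserving mutation described above can be sketched in NumPy as below. Each flip is paired with a compensating flip on another randomly chosen gene; the function name and the choice to undo a flip when no partner gene exists (which only happens when $d = 0$ or $d = N$) are assumptions made here.

```python
import numpy as np


def mutate(s, p_m, rng):
    """Count-preserving bit-flip mutation of a binary selector (a sketch).

    Each gene is flipped when a uniform draw in [0, 1) falls below p_m. To keep
    sum(s) == d, a different gene that now holds the flipped gene's new value is
    set to the flipped gene's old value (add one eigenvector, remove another).
    """
    s = np.array(s, dtype=int)
    for i in range(len(s)):
        if rng.random() < p_m:
            old = s[i]
            s[i] = 1 - old
            partners = np.flatnonzero(s == s[i])
            partners = partners[partners != i]
            if partners.size == 0:
                s[i] = old                     # no partner available: undo the flip
            else:
                s[rng.choice(partners)] = old  # compensating flip on another gene
    return s
```

With `p_m = 0` the selector is returned unchanged; with any rate the number of selected eigenvectors stays at $d$.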

##### 3.3. Immune Operators

The immune ability of BIGAs is realized through two kinds of immune operators: a vaccination and an immune selection. The vaccination is responsible for improving individuals’ overall fitness levels. The immune selection is responsible for prevention of deterioration.

###### 3.3.1. Vaccination Operator

Given a chromosome $s$, the vaccination operation in a BIGA modifies the genes at some bits according to a priori knowledge so that individuals with higher fitness have a greater probability of being selected. Let $P$ be a population; vaccinating $P$ means that the operation is performed on $\alpha n$ chromosomes selected from $P$ according to the proportion $\alpha$, where $n$ is the population size of the BIGA. A vaccine is abstracted from prior knowledge of the problem at hand, and its information content and validity play an important role in the performance of the algorithm.

###### 3.3.2. Immune Selection Operator

The immune selection operation consists of two steps. The first step is the immunity test: if the fitness of a chromosome is smaller than that of its parent chromosome, which indicates that degeneration occurred during crossover and mutation, the parent chromosome is used in the next competition instead. The second step is the annealing selection [35]: a chromosome $s_i$ is selected from the current offspring population to join the new parents with the probability

$$P(s_i) = \frac{\exp\left(F(s_i)/T_k\right)}{\sum_{j=1}^{n}\exp\left(F(s_j)/T_k\right)},$$

where $F(s_i)$ is the fitness of the individual $s_i$, the sum runs over the $n$ chromosomes of the offspring population, and $\{T_k\}$ is a temperature-controlled sequence tending towards 0.
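The two immune-selection steps can be sketched in NumPy as below. The probabilities follow the annealing formula above; subtracting the maximum fitness before exponentiating is a standard numerical safeguard that leaves the probabilities unchanged. The geometric cooling schedule `temperature` is a hypothetical choice of $\{T_k\}$, since the source only requires $T_k \to 0$.

```python
import numpy as np


def annealing_probabilities(fitness, T):
    """Annealing selection P(s_i) = exp(F_i / T) / sum_j exp(F_j / T)."""
    f = np.asarray(fitness, dtype=float)
    z = np.exp((f - f.max()) / T)  # shift by max(F) to avoid overflow
    return z / z.sum()


def immunity_test(child, parent, fitness_fn):
    """Return the parent instead of the child if the child's fitness degenerated."""
    return parent if fitness_fn(child) < fitness_fn(parent) else child


def temperature(k, T0=1.0, rate=0.95):
    """Hypothetical geometric cooling schedule T_k = T0 * rate**k, which tends to 0."""
    return T0 * rate**k
```

At high temperature the selection is nearly uniform, which favours exploration; as $T_k$ shrinks, the probability mass concentrates on the fittest chromosomes.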

##### 3.4. Fitness Evaluation

Fitness evaluation plays a critical role in selecting offspring chromosomes from the current population for the next generation. In this study, the fitness function for eigenvector selection is defined as

$$F(s) = \frac{\sum_{i=1}^{N} s_i b_i}{\sum_{i=1}^{N} s_i w_i},$$

where $b_i = u_i^T \tilde S_b u_i$ is the numerator value for the $i$th eigenvector $u_i$ of $\tilde S_b - \lambda_{t-1}\tilde S_w$, $w_i = u_i^T \tilde S_w u_i$ is the denominator value for the $i$th eigenvector, $s = [s_1, \ldots, s_N]$, $s_i \in \{0, 1\}$ for $i = 1, \ldots, N$, and $\sum_{i=1}^{N} s_i = d$. Notably, $s$ is called the binary selector and $d$ is the desired lower feature dimension. Finally, according to the evolved binary selector $s^{*}$, the projection matrix $V_t$ of the $t$th step is formed by choosing the eigenvectors $u_i$ with $s_i^{*} = 1$. The procedures of the proposed TR-KDA-BIGA are summarized below under “The Procedures of the Proposed TR-KDA-BIGA,” and the computational flow of the BIGA, built from the genetic and immune operators described above, is given under “The Computational Flow of the BIGA.”
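The fitness can be computed cheaply from per-eigenvector scores. In the NumPy sketch below (function names are hypothetical), `eigen_scores` evaluates $b_i$ and $w_i$ for every column of an eigenvector matrix `U`, and `selector_fitness` combines them for a selector $s$. Because the selected eigenvectors are orthonormal, $F(s)$ equals the trace ratio $\operatorname{tr}(V^T\tilde S_b V)/\operatorname{tr}(V^T\tilde S_w V)$ of the decoded matrix $V$, so each fitness evaluation costs only $O(N)$ once the scores are known.

```python
import numpy as np


def eigen_scores(U, S_b, S_w):
    """b_i = u_i^T S~_b u_i and w_i = u_i^T S~_w u_i for every column u_i of U."""
    b = np.sum(U * (S_b @ U), axis=0)
    w = np.sum(U * (S_w @ U), axis=0)
    return b, w


def selector_fitness(s, b, w):
    """F(s) = sum_i s_i b_i / sum_i s_i w_i for a binary selector s."""
    s = np.asarray(s, dtype=float)
    return float(s @ b) / float(s @ w)
```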

*The Procedures of the Proposed TR-KDA-BIGA.*
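The procedures are as follows:

1. Construct the kernel matrix $K$.
2. Perform the Cholesky decomposition of the kernel matrix, $K = R^T R$.
3. Form the kernel scatter matrices as $\tilde S_b = R L_b R^T$ and $\tilde S_w = R L_w R^T$.
4. Set the iteration number $t$ to 1.
5. Set the initial trace ratio value $\lambda_0$.
6. Compute the eigendecomposition of $\tilde S_b - \lambda_{t-1}\tilde S_w$, where $u_i$ denotes its $i$th eigenvector.
7. Calculate $b_i = u_i^T \tilde S_b u_i$ and $w_i = u_i^T \tilde S_w u_i$ for each eigenvector $u_i$.
8. Generate a population of BIGA selectors.
9. Evolve the population, where the fitness of a BIGA selector $s$ is measured as $F(s) = \sum_{i} s_i b_i / \sum_{i} s_i w_i$.
10. Let $s^{*}$ be the evolved best BIGA selector.
11. Form the projection matrix $V_t$ by choosing the eigenvectors $u_i$ with $s_i^{*} = 1$.
12. Update the trace ratio value $\lambda_t = \operatorname{tr}(V_t^T \tilde S_b V_t)/\operatorname{tr}(V_t^T \tilde S_w V_t)$, set $t \leftarrow t + 1$, and go to step 6. Repeat this procedure until convergence, which is declared when the trace ratio value has not increased for 5 consecutive iterations.
13. Output the projection matrix $V_t$.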
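The loop in steps 4 to 13 can be sketched in NumPy as below, starting from precomputed $\tilde S_b$ and $\tilde S_w$ (steps 1 to 3). To keep the example short and self-contained, the BIGA in steps 8 to 10 is replaced by a hypothetical `select` callable whose default is an exhaustive search over all subsets, which is only feasible for very small $N$; the starting value $\lambda_0 = 0$ and the iteration cap are also assumptions. The stopping rule follows step 12: stop when the trace ratio has not increased for 5 consecutive iterations.

```python
import numpy as np
from itertools import combinations


def best_subset_bruteforce(b, w, d):
    """Stand-in for BIGA (assumption): exhaustive search, feasible only for small N."""
    best, best_val = None, -np.inf
    for idx in combinations(range(len(b)), d):
        idx = list(idx)
        val = b[idx].sum() / w[idx].sum()
        if val > best_val:
            best, best_val = idx, val
    s = np.zeros(len(b), dtype=int)
    s[best] = 1
    return s


def tr_kda_outer_loop(Sb, Sw, d, lam0=0.0, select=best_subset_bruteforce,
                      patience=5, max_iter=50):
    """Steps 4-13: eigendecomposition, subset selection, and trace ratio update."""
    lam, best_lam, best_V, stall = lam0, -np.inf, None, 0
    history = []
    for _ in range(max_iter):
        _, U = np.linalg.eigh(Sb - lam * Sw)              # step 6
        b = np.sum(U * (Sb @ U), axis=0)                  # step 7: b_i = u_i^T Sb u_i
        w = np.sum(U * (Sw @ U), axis=0)                  #         w_i = u_i^T Sw u_i
        s = select(b, w, d)                               # steps 8-10
        V = U[:, s.astype(bool)]                          # step 11
        lam = np.trace(V.T @ Sb @ V) / np.trace(V.T @ Sw @ V)  # step 12
        history.append(lam)
        if lam > best_lam + 1e-12:
            best_lam, best_V, stall = lam, V, 0
        else:
            stall += 1
            if stall >= patience:  # no increase for `patience` consecutive iterations
                break
    return best_V, best_lam, history                      # step 13
```

Swapping `select` for a BIGA implementation leaves the rest of the loop unchanged, which is the point of keeping the selector pluggable in this sketch.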

*The Computational Flow of the BIGA*. The computational flow is as follows:

1. Set $k$ (the generation counter) to 1.
2. Randomly initialize the original population $P_k$.
3. Evaluate each chromosome in the original population $P_k$.
4. Abstract vaccines according to the prior knowledge.
5. Check the termination criteria. If the fixed number of generations has not been reached and no satisfactory chromosome has been found, go to the next step. Otherwise, output the best chromosome found so far as the final solution for further decision-making.
6. Perform the crossover operation on $P_k$ to generate the population $C_k$.
7. Perform the mutation operation on $C_k$ to generate the population $M_k$.
8. Perform the vaccination operation on $M_k$ to generate the population $E_k$.
9. Perform the immune selection operation on $E_k$ to generate the next-generation population $P_{k+1}$, set $k \leftarrow k + 1$, and continue with the next generation.
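A compact, simplified version of this flow is sketched below in NumPy; it is an illustration under stated assumptions, not the authors' implementation. Differences from the source: the generation budget is the only termination test, vaccination is a hypothetical `vaccinate` hook that does nothing by default because the vaccine is not specified, count restoration after crossover uses a direct repair, each child is compared with the parent at the same population index in the immunity test, and the temperature follows an assumed geometric schedule $T_k = T_0 \cdot \text{cooling}^k$.

```python
import numpy as np


def biga(fitness, N, d, pop_size=20, generations=50, p_c=0.8, p_m=0.05,
         T0=1.0, cooling=0.95, vaccinate=None, seed=0):
    """Simplified binary immune GA over selectors with exactly d ones (a sketch).

    `vaccinate(s, rng)` is a hypothetical prior-knowledge hook; by default no
    vaccination is done. Assumes an even pop_size. Returns the best selector.
    """
    rng = np.random.default_rng(seed)

    def new_selector():
        s = np.zeros(N, dtype=int)
        s[rng.choice(N, size=d, replace=False)] = 1
        return s

    def repair(s):  # restore exactly d ones after crossover
        m = int(s.sum())
        if m > d:
            s[rng.choice(np.flatnonzero(s == 1), size=m - d, replace=False)] = 0
        elif m < d:
            s[rng.choice(np.flatnonzero(s == 0), size=d - m, replace=False)] = 1
        return s

    pop = [new_selector() for _ in range(pop_size)]          # step 2
    best = max(pop, key=fitness).copy()                      # step 3
    for k in range(generations):                             # step 5: fixed budget
        children = []
        for i in range(0, pop_size, 2):                      # step 6: crossover
            a, b = pop[i].copy(), pop[i + 1].copy()
            if rng.random() < p_c:
                cut = rng.integers(1, N)
                a, b = np.concatenate([a[:cut], b[cut:]]), np.concatenate([b[:cut], a[cut:]])
            children += [repair(a), repair(b)]
        for c in children:                                   # step 7: mutation
            for j in range(N):
                if rng.random() < p_m:
                    partners = np.flatnonzero(c != c[j])
                    if partners.size:                        # swapping keeps sum(c) == d
                        q = rng.choice(partners)
                        c[j], c[q] = c[q], c[j]
        if vaccinate is not None:                            # step 8: vaccination
            children = [vaccinate(c, rng) for c in children]
        # step 9a: immunity test, a degenerated child is replaced by its parent
        children = [c if fitness(c) >= fitness(parent) else parent.copy()
                    for c, parent in zip(children, pop)]
        # step 9b: annealing selection at temperature T_k = T0 * cooling**k
        f = np.array([fitness(c) for c in children])
        prob = np.exp((f - f.max()) / (T0 * cooling**k))
        idx = rng.choice(pop_size, size=pop_size, p=prob / prob.sum())
        pop = [children[i].copy() for i in idx]
        if f.max() > fitness(best):
            best = children[int(np.argmax(f))].copy()
    return best
```

The returned selector can be passed straight to the step 11 decoding, for example `V = U[:, best.astype(bool)]`.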

#### 4. The Convergence of the Proposed TR-KDA-BIGA

In this section, we analyze the convergence of the proposed TR-KDA-BIGA. Before doing so, it is worth noting that the BIGA itself is convergent: Jiao and Wang [33] demonstrated that, as long as enough iterations are completed, the immune genetic population converges towards the true optimum with probability one.

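Recall the trace difference function $f(\lambda) = \max_{V^T V = I}\operatorname{tr}[V^T(\tilde S_b - \lambda\tilde S_w)V]$; it follows that

$$\operatorname{tr}\left[V_t^T\left(\tilde S_b - \lambda_{t-1}\tilde S_w\right)V_t\right] \ge \operatorname{tr}\left[V_{t-1}^T\left(\tilde S_b - \lambda_{t-1}\tilde S_w\right)V_{t-1}\right].$$

Since $\lambda_{t-1} = \operatorname{tr}(V_{t-1}^T \tilde S_b V_{t-1})/\operatorname{tr}(V_{t-1}^T \tilde S_w V_{t-1})$, as previously mentioned, the right-hand side equals zero, and we get

$$\operatorname{tr}\left(V_t^T \tilde S_b V_t\right) - \lambda_{t-1}\operatorname{tr}\left(V_t^T \tilde S_w V_t\right) \ge 0.$$

Consider the inequality $\operatorname{tr}(V_t^T \tilde S_w V_t) > 0$ and the equation $\lambda_t = \operatorname{tr}(V_t^T \tilde S_b V_t)/\operatorname{tr}(V_t^T \tilde S_w V_t)$; we have

$$\left(\lambda_t - \lambda_{t-1}\right)\operatorname{tr}\left(V_t^T \tilde S_w V_t\right) \ge 0.$$

Consequently, $\lambda_t \ge \lambda_{t-1}$. Substituting $t - 1$ for $t$ yields $\lambda_{t-1} \ge \lambda_{t-2}$, and so on. So we obtain the inequality

$$\lambda_t \ge \lambda_{t-1} \ge \cdots \ge \lambda_1,$$

which gives the first expression of convergence of the proposed TR-KDA-BIGA.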

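Further, suppose that $\lambda^{*}$ is the optimal trace ratio value; it follows that

$$\lambda^{*} = \frac{\operatorname{tr}\left(V^{*T}\tilde S_b V^{*}\right)}{\operatorname{tr}\left(V^{*T}\tilde S_w V^{*}\right)} \ge \frac{\operatorname{tr}\left(V^T\tilde S_b V\right)}{\operatorname{tr}\left(V^T\tilde S_w V\right)} \quad \text{for all } V \text{ with } V^T V = I,$$

where $V^{*}$ is the optimal projection matrix. We therefore have

$$\max_{V^T V = I}\operatorname{tr}\left[V^T\left(\tilde S_b - \lambda^{*}\tilde S_w\right)V\right] = 0.$$

Consider $V_t^T V_t = I$, and that $\tilde S_w$ is positive semidefinite, so that $\operatorname{tr}(V_t^T \tilde S_w V_t) \ge 0$ (it is assumed to be nonzero); we have

$$\operatorname{tr}\left(V_t^T \tilde S_b V_t\right) \le \lambda^{*}\operatorname{tr}\left(V_t^T \tilde S_w V_t\right).$$

So we obtain the following inequality, which gives the second expression of convergence of the proposed TR-KDA-BIGA:

$$\lambda_t \le \lambda^{*}.$$

We therefore conclude that, for a particular initial trace ratio value $\lambda_0$, the updated value $\lambda_t$ always satisfies both $\lambda_t \ge \lambda_{t-1}$ and $\lambda_t \le \lambda^{*}$. Because the sequence $\{\lambda_t\}$ is nondecreasing and bounded above by $\lambda^{*}$, it converges.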

#### 5. Performance Evaluation on Benchmark Problems

To verify the performance of the proposed TR-KDA-BIGA extensively, it is first tested on a range of commonly used benchmark problems taken from the UCI machine learning repository and evaluated with the classification rate (i.e., the number of correctly classified test samples divided by the total number of test samples), in comparison with existing methods such as PCA, LDA, KPCA [30, 31], KDA [32], and TR-LDA. These data sets, namely, Heart-statlog, Ionosphere, Iris, Wine, Waveform, Balance, and Synthetic Control Chart Time Series (SCCTS) (Table 1), range from small to large in size and from low to high in dimensionality. For the comparative study, we randomly select 50% of the data points in each data set as the training set and use the remaining data points as the test set. Each method uses the training set in the output reduced space to train a one-nearest-neighbor (1NN) classifier, whose classification rate is then evaluated on the test set. To reduce the influence of random effects, the experiments with PCA, LDA, KPCA, KDA, TR-LDA, and TR-KDA-BIGA are performed independently 20 times on each benchmark problem. Table 2 compares the classification rates of the proposed TR-KDA-BIGA on the benchmark problems with those of PCA, LDA, KPCA, KDA, and TR-LDA. As seen in Table 2, the proposed TR-KDA-BIGA performs better than all the compared methods, except in the case of Heart-statlog.
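The evaluation protocol above can be sketched in NumPy as follows: a random 50/50 split, a Euclidean 1NN classifier trained on the training part, the classification rate on the test part, and an average over 20 runs. The `fit_transform` argument is a hypothetical hook for any of the compared dimensionality reduction methods, fitted on the training split only; the function names and the random seed are assumptions made here.

```python
import numpy as np


def random_split(n, train_fraction, rng):
    """Random train/test index split (50% training in the experiments above)."""
    perm = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return perm[:n_train], perm[n_train:]


def one_nn_rate(Z_train, y_train, Z_test, y_test):
    """Test-set classification rate of a Euclidean 1NN classifier."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    predicted = y_train[np.argmin(d2, axis=1)]
    return float(np.mean(predicted == y_test))


def average_rate(X, y, fit_transform=None, train_fraction=0.5, runs=20, seed=0):
    """Mean and standard deviation of the 1NN test rate over independent random splits.

    fit_transform(X_train, y_train, X_test) -> (Z_train, Z_test) is a hypothetical
    hook for a dimensionality reduction method fitted on the training split only.
    """
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(runs):
        tr, te = random_split(len(y), train_fraction, rng)
        Z_tr, Z_te = X[tr], X[te]
        if fit_transform is not None:
            Z_tr, Z_te = fit_transform(X[tr], y[tr], X[te])
        rates.append(one_nn_rate(Z_tr, y[tr], Z_te, y[te]))
    return float(np.mean(rates)), float(np.std(rates))
```

Fitting the reduction inside each run, rather than once on all data, keeps test samples from influencing the learned projection.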

The results obtained demonstrate the ability of the proposed TR-KDA-BIGA in classifying different classes well. Thus, the proposed TR-KDA-BIGA may be effectively employed for fault diagnosis of rolling element bearings.

#### 6. The Proposed TR-KDA Using BIGA for Fault Diagnosis of Rolling Element Bearings

In this section, the proposed TR-KDA-BIGA is applied to fault diagnosis of rolling element bearings. Vibration signals from the rolling element bearings are first filtered with a low-pass filter. The filtered vibration signals are then divided into sections of equal window length, and one set of relevant features obtained from each window is used to characterize, to some extent, the health status of the rolling element bearings. Most faults occurring in rolling element bearings introduce increased friction and impulsive forces when the bearings rotate, which generally causes the vibration signals in the time domain, frequency domain, and/or time-frequency domain to deviate from the normal ones. In this study, 9 time-domain statistical features (Table 3), all of which reflect the characteristics of the time series in the time domain, are extracted from the vibration signal. In addition, 6 time-frequency-domain wavelet features, namely, the percentages of energy corresponding to the wavelet coefficients, are extracted by decomposing the vibration signal into five levels with the Daubechies-4 (db4) wavelet [32]. Wavelet features extracted in this way reflect the distribution of vibration energy in the time-frequency domain. Thus, the 9 time-domain statistical features together with the 6 time-frequency-domain wavelet features are used to represent the vibration signal of each window.
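The windowing and the wavelet energy features can be sketched in NumPy as below. `split_windows` cuts a signal into non-overlapping windows of equal length, and `wavelet_energy_percentages` turns a list of wavelet coefficient arrays into energy percentages. The function names, the non-overlapping windows, and dropping an incomplete final window are assumptions made here; the window length and low-pass filter settings are not given in this section.

```python
import numpy as np


def split_windows(signal, window_length):
    """Non-overlapping windows of equal length; an incomplete tail is dropped."""
    signal = np.asarray(signal)
    n_windows = len(signal) // window_length
    return signal[: n_windows * window_length].reshape(n_windows, window_length)


def wavelet_energy_percentages(coeffs):
    """Percentage of the total energy carried by each coefficient array."""
    energies = np.array([np.sum(np.square(c)) for c in coeffs], dtype=float)
    return 100.0 * energies / energies.sum()
```

With PyWavelets installed, `pywt.wavedec(window, 'db4', level=5)` returns six arrays (`[cA5, cD5, cD4, cD3, cD2, cD1]`), so applying `wavelet_energy_percentages` to them yields six features per window, matching the count of wavelet features above.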

##### 6.1. Experimental Setup

To demonstrate the performance of the proposed TR-KDA-BIGA, rolling element bearing data obtained from the Bearing Data Centre, Case Western Reserve University [36], are used. The test rolling element bearings were SKF 6205 JEM deep groove ball bearings. Single-point faults were seeded into the drive end ball bearing using electrodischarge machining. Faults in the rolling element bearings introduced impact-like vibration signals when the bearings were rotating. An accelerometer was mounted on the drive end of the motor housing to detect these impacts, which behaved like damped oscillations. Vibration signals were captured for four different health statuses of the bearing, that is, normal bearings (Normal), inner race fault (IR), ball fault (BA), and outer race fault (OR). For each of the three abnormal statuses (IR, BA, and OR), there are three levels of severity, with fault diameters of 0.007 inches, 0.014 inches, and 0.021 inches. All the experiments were performed under three different load conditions (1 HP, 2 HP, and 3 HP). Figure 1 illustrates the experimental setup. Experimental data were collected from the drive end ball bearing of a test rig driven by an induction motor (Reliance Electric 2 HP IQPreAlert). Table 4 gives a short description of the rolling element bearing data.

*Figure 1: Experimental setup (panels (a) and (b)).*

##### 6.2. Experiment Results

###### 6.2.1. Visualization of Bearing Data

The visualization performance of the proposed TR-KDA-BIGA is compared with that of PCA, LDA, KPCA, KDA, and TR-LDA, where KPCA and KDA are the kernel extensions of PCA and LDA, respectively. The two-dimensional visualizations of the bearing data for the three load conditions (1, 2, and 3 HP) obtained with PCA, LDA, KPCA, KDA, TR-LDA, and the proposed TR-KDA-BIGA are shown in Figures 2, 3, and 4, respectively. As seen in Figures 2, 3, and 4, under all three load conditions the proposed TR-KDA-BIGA outperforms all the compared methods, both in tightly clustering bearing data belonging to the same class and in clearly separating bearing data belonging to different classes. Compared with the unsupervised methods (i.e., PCA and KPCA), the supervised methods (i.e., LDA, KDA, TR-LDA, and TR-KDA-BIGA) preserve more of the discriminative information embedded in the bearing data and obtain clearer, less overlapping class boundaries. Figures 2, 3, and 4 also show that the methods using the kernel trick (i.e., KPCA, KDA, and TR-KDA-BIGA) separate samples from different classes in the learned subspace better than the methods without the kernel trick (i.e., PCA, LDA, and TR-LDA).

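*Figure 2: Two-dimensional visualization of the bearing data under the 1 HP load condition obtained with PCA, LDA, KPCA, KDA, TR-LDA, and TR-KDA-BIGA (panels (a) to (f)).*

*Figure 3: Two-dimensional visualization of the bearing data under the 2 HP load condition obtained with PCA, LDA, KPCA, KDA, TR-LDA, and TR-KDA-BIGA (panels (a) to (f)).*

*Figure 4: Two-dimensional visualization of the bearing data under the 3 HP load condition obtained with PCA, LDA, KPCA, KDA, TR-LDA, and TR-KDA-BIGA (panels (a) to (f)).*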

###### 6.2.2. Classification of Bearing Data

The classification performance of the proposed TR-KDA-BIGA is compared with that of PCA, LDA, KPCA, KDA, and TR-LDA. To show the robustness of the proposed TR-KDA-BIGA, we perform 4 independent experiments for each load condition using 4 different data partitions: 10, 20, 30, and 40 samples per class are randomly selected from the bearing data as the training set, and the remaining samples form the test set. Each method then uses the training set to train a 1NN classifier that classifies the health status of the samples in the test set. Tables 5, 6, and 7 summarize the average classification results of PCA, LDA, KPCA, KDA, TR-LDA, and the proposed TR-KDA-BIGA with various numbers of training samples for the 1 HP, 2 HP, and 3 HP load conditions, respectively. The overall average classification performance for health status is fairly good, and Tables 5, 6, and 7 show that the proposed TR-KDA-BIGA performs markedly better than the compared methods (PCA, LDA, KPCA, KDA, and TR-LDA). It should be noted that the proposed model also provides real-time visualization, which is very useful for practitioners monitoring the health status of rolling element bearings. Tables 5, 6, and 7 also show that the number of training samples significantly affects the classification accuracy for bearing health status.

#### 7. Conclusions

The rolling element bearing is a core component of many systems, and its failure can lead to reduced capability, downtime, and even catastrophic breakdowns. Effective and efficient fault diagnosis of rolling element bearings plays an extremely important role in the safe and reliable operation of their host systems. In the current study, fault diagnosis of rolling element bearings is treated as a pattern recognition problem by calculating, from vibration signals, a high-dimensional feature data set that represents the different statuses of the bearings. Specifically, TR-KDA is presented for fault diagnosis of rolling element bearings, and the BIGA is employed to solve the trace ratio problem in TR-KDA. The numerical results obtained using extensive simulation indicate that the proposed TR-KDA-BIGA can effectively classify different classes of rolling element bearing data, while also providing real-time visualization that is very useful for practitioners monitoring the health status of rolling element bearings. Empirical comparisons show that the proposed TR-KDA-BIGA performs better than existing methods in classifying different classes of rolling element bearing data. The proposed TR-KDA-BIGA may be a promising tool for fault diagnosis of rolling element bearings.

Three research directions are worth pursuing. First, although this study considers the specific fault diagnosis of rolling element bearings, the proposed method can be modified and extended to address the fault diagnosis of gearboxes [37, 38] and cutting tools [39, 40]. Second, frequency-domain information can be utilized for fault diagnosis of rolling element bearings [41, 42]; it would thus be interesting to integrate frequency-domain features with time-domain and time-frequency-domain features. Third, empirical mode decomposition is a very powerful tool for nonlinear and nonstationary signal processing [43–45]; it would also be interesting to employ empirical mode decomposition to extract periodic components and random transient components from the bearing vibration signal mixture, which may be very helpful for extracting fault signatures from a collected bearing vibration signal.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The research is funded partially by the National Science Foundation of China (51405239), National Defense Basic Scientific Research Program of China (A2620132010, A2520110003), Jiangsu Provincial Natural Science Foundation of China (BK20150745, BK20140727), Jiangsu Province Science and Technology Support Program (BE2014134), Fundamental Research Funds for the Central Universities (1005-YAH15055), and Jiangsu Postdoctoral Science Foundation of China (1501024C). The authors would like to express sincere appreciation to Professor KA Loparo and Case Western Reserve University for their efforts in making the bearing data set available and for permission to use it.