#### Abstract

Dimensionality reduction is a crucial task in machinery fault diagnosis. Manifold learning, a popular dimensionality reduction technique, has recently been applied successfully in many fields. However, most manifold learning methods are poorly suited to this task because they are unsupervised and fail to discover the discriminant structure in the data. To overcome these weaknesses, the kernel local linear discriminate (KLLD) algorithm is proposed. KLLD combines the advantages of neighborhood preserving projections (NPP), the Floyd algorithm, the maximum margin criterion (MMC), and the kernel trick, and it has four advantages. First, KLLD is a supervised dimensionality reduction method that overcomes the out-of-sample problem. Second, the short-circuit problem is avoided. Third, KLLD uses the between-class and within-class scatter matrices more efficiently. Finally, the kernel trick is included to find a more precise solution. The main feature of the proposed method is that it attempts both to preserve the intrinsic neighborhood geometry of the data and to extract the discriminant information. Experiments on the iris dataset and a rolling bearing fault dataset show that KLLD separates the classes better than the traditional methods it is compared with (KPCA, NPP, and LLD).

#### 1. Introduction

As data acquisition technology advances, a huge amount of data is produced while mechanical equipment is running. The sensitive information that reflects the running status of the equipment is submerged in a large amount of redundant data. Effective dimensionality reduction can solve this problem, and it is one of the key technologies for equipment condition monitoring and fault diagnosis. The nonlinear and nonstationary vibration signals generated by rolling bearings [1, 2] make the original high-dimensional feature space, which consists of statistical characteristics of the signal, inseparable. Traditional linear dimensionality reduction methods such as PCA and ICA assume that the data have a global linear structure, and each uses its own linear transformation matrix to find the best low-dimensional projection. Classification information plays an important role in fault diagnosis; however, when the original high-dimensional feature space has a nonlinear structure, this information is difficult to obtain by linear methods. KPCA is a traditional nonlinear dimensionality reduction method that reduces dimension by discarding the relatively small projections in a higher-dimensional linear space. However, KPCA seeks the principal components with the largest variance, which may lose useful discriminant information [3].

Manifold learning is a data-driven approach that can reveal the underlying structure of complex data, which provides a new way to analyze the intrinsic dimension based on the data distribution. It has produced a series of research results in feature extraction [4–6]. Manifold learning methods fall broadly into two categories [7], each with its own advantages and disadvantages: global (e.g., Isomap [8]) and local (e.g., locally linear embedding [9]). The author of [10] points out that, for discriminant analysis, the local structure is usually more important than the global structure when there are not enough samples. Among the local methods, local linear embedding (LLE) has many advantages, such as a globally optimal solution and fast computation. Furthermore, its minimum-reconstruction-error weights keep the local neighborhood geometry unchanged when the data are translated, scaled, or rotated. LLE has therefore been applied to fault feature extraction [11–15].

NPP [16] is a manifold learning method that builds on LLE by introducing a linear transformation matrix. NPP has been applied successfully to dimension reduction of the well-known “Swiss roll” and “S-curve” datasets. The algorithm assumes that the data are locally linear. However, when the data manifold is strongly curved, manifold learning methods suffer from the short-circuit problem.

To address these issues, a fault feature extraction method named KLLD is proposed in this paper. The method is applied to dimensionality reduction of both the iris dataset and a rolling bearing original feature dataset constructed from wavelet packet energy. Its effectiveness is verified by comparison with conventional analysis methods.

The rest of this paper is organized as follows. Section 2 briefly reviews the LLE, NPP, Floyd, and MMC algorithms. In Section 3, the KLLD algorithm proposed in this paper is first derived; then the short-circuit problem is introduced and the Floyd algorithm is employed to overcome it; finally, the calculation steps of LLD and KLLD are given. In Section 4, we design a KLLD experimental procedure for dimension reduction of the iris and rolling bearing datasets and then apply the KPCA, NPP, LLD, and KLLD algorithms to them. Conclusions and issues for future study are given in Section 5.

#### 2. Basic Principle

##### 2.1. Local Linear Embedding

LLE (local linear embedding) is a nonlinear dimensionality reduction technique that uses local neighborhood relations to learn the global structure of nonlinear manifolds.

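Given $N$ real-valued $D$-dimensional vectors $x_i$ ($i = 1, \ldots, N$), assume that each point $x_i$ and its $k$ nearest neighbors lie in a locally linear patch, so that $x_i$ can be reconstructed from the neighbors $x_j$ in its neighborhood with weighting coefficients $W_{ij}$. The weights $W_{ij}$ are selected by minimizing the cost function. That is,

$$\varepsilon(W) = \sum_{i=1}^{N} \Bigl\| x_i - \sum_{j} W_{ij} x_j \Bigr\|^2, \qquad \sum_{j} W_{ij} = 1, \tag{1}$$

where $W_{ij} = 0$ if $x_j$ is not a neighbor of $x_i$.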

In LLE, each low-dimensional point $y_i$ is reconstructed from its neighbors with the same weights $W_{ij}$ that were computed in the high-dimensional space. The low-dimensional coordinates $Y$ are obtained by eigenvalue decomposition of the matrix $M$ in (2): the bottom $d + 1$ eigenvectors, corresponding to its smallest eigenvalues, are identified, and $Y$ is constructed from these eigenvectors after discarding the one that corresponds to the smallest (zero) eigenvalue. The derivation can be found in [9]:

$$\min_{Y} \Phi(Y) = \sum_{i=1}^{N} \Bigl\| y_i - \sum_{j} W_{ij} y_j \Bigr\|^2 = \operatorname{tr}\bigl(Y M Y^T\bigr), \qquad M = (I - W)^T (I - W). \tag{2}$$

In summary, we state LLE as the following algorithm.

*Step 1.* Select the $k$ nearest neighbors of each point $x_i$ by the nearest neighbor algorithm.

*Step 2.* Reconstruct each $x_i$ from its neighbors with linear weights $W_{ij}$ by (1).

*Step 3.* Map to the embedded coordinates $Y$ by (2).
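The three steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `lle`, the one-sample-per-column layout of `X`, and the regularization term `reg` (needed when the number of neighbors exceeds the input dimension) are all introduced here for illustration.

```python
import numpy as np

def lle(X, k, d, reg=1e-3):
    """Embed the columns of X (D x N) into d dimensions with plain LLE."""
    D, N = X.shape
    W = np.zeros((N, N))
    for i in range(N):
        # Step 1: k nearest neighbors of x_i by Euclidean distance.
        dist = np.linalg.norm(X - X[:, [i]], axis=0)
        dist[i] = np.inf                      # a point is not its own neighbor
        nbrs = np.argsort(dist)[:k]
        # Step 2: weights minimizing ||x_i - sum_j w_j x_j||^2 with sum_j w_j = 1.
        Z = X[:, nbrs] - X[:, [i]]            # neighbors centered on x_i
        G = Z.T @ Z                           # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)   # assumed regularization
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W).
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    Y = vecs[:, 1:d + 1].T                    # discard the constant eigenvector
    return Y, W, M
```

Calling `lle(X, k=5, d=2)` on a `D x N` array returns the `2 x N` embedding together with `W` and `M`, which the later sections reuse.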

##### 2.2. Neighborhood Preserving Projections

LLE has many advantages; however, its computational cost is higher than that of linear dimensionality reduction methods. Moreover, it cannot map new test data into the low-dimensional space directly, which is called the out-of-sample problem. NPP derives from LLE and inherits its neighborhood-preserving property, but it modifies LLE by introducing a linear transformation matrix $A$, which overcomes these drawbacks.

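The low-dimensional feature dataset $Y$ is obtained from the original dataset $X$ by the projection matrix $A$. The main idea of the method is to compute the transformation matrix while keeping the reconstruction error as small as possible (see (3) and (4)):

$$\min_{Y} \sum_{i=1}^{N} \Bigl\| y_i - \sum_{j} W_{ij} y_j \Bigr\|^2, \tag{3}$$

$$\text{s.t. } Y Y^T = I. \tag{4}$$

Let $Y = A^T X$. (3) can be transformed to the following:

$$\min_{A} \operatorname{tr}\bigl(A^T X M X^T A\bigr) \quad \text{s.t. } A^T X X^T A = I, \tag{5}$$

where $M$ is obtained from (2). Applying the Lagrange multiplier method gives

$$X M X^T a = \lambda X X^T a. \tag{6}$$

The columns of the projection matrix $A$ are therefore the generalized eigenvectors of $(X M X^T, X X^T)$ corresponding to the smallest eigenvalues.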
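The generalized eigenproblem $X M X^T a = \lambda X X^T a$ can be solved with NumPy alone by reducing it to a symmetric standard problem. The sketch below is illustrative: the function name `npp_projection`, the small ridge added to $X X^T$, and the assumption that $X X^T$ is positive definite (more samples than dimensions) are not taken from the paper.

```python
import numpy as np

def npp_projection(X, M, d, ridge=1e-9):
    """Return A (D x d) and the d smallest eigenvalues of X M X^T a = lam X X^T a."""
    P = X @ M @ X.T
    B = X @ X.T + ridge * np.eye(X.shape[0])   # assumed ridge keeps B positive definite
    # Reduce to a symmetric standard problem with the Cholesky factor B = L L^T.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ P @ L_inv.T
    C = (C + C.T) / 2                          # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(C)             # ascending eigenvalues
    A = L_inv.T @ vecs[:, :d]                  # map back to the original variables
    return A, vals[:d]
```

Because `A` is an explicit linear map, a new sample `x_new` is embedded as `A.T @ x_new` without recomputing the neighborhood graph, which is how NPP avoids the out-of-sample problem.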

##### 2.3. Floyd Algorithm

The Floyd algorithm calculates the shortest distance between any two specified points of a weighted graph $G$. Let $D$ be the distance matrix of $G$, and let $i$ and $j$ be the indices of the points in $G$; $d_{ij}$ represents the length of the shortest path between vertex $i$ and vertex $j$, and point $k$ is an intermediate point. The Floyd algorithm contains two calculation steps.

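*Step 1.* Initialization: compute $d_{ij}$ for every pair of points. We set $d_{ij} = \infty$ when no direct path between $i$ and $j$ exists; otherwise $d_{ij}$ is the length of the edge between them.

*Step 2.* For $k = 1, \ldots, n$, iterate (7) to calculate the shortest path between every pair of points in $G$:

$$d_{ij} = \min\bigl(d_{ij},\; d_{ik} + d_{kj}\bigr). \tag{7}$$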
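The two steps correspond to the standard Floyd-Warshall recurrence $d_{ij} = \min(d_{ij}, d_{ik} + d_{kj})$. A minimal pure-Python sketch follows; the function name `floyd` and the nested-list representation of the distance matrix are choices made here for illustration.

```python
INF = float("inf")

def floyd(dist):
    """All-pairs shortest path lengths.

    dist[i][j] is the edge length between i and j, INF where no edge
    exists, and 0 on the diagonal. The input is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]          # Step 1: initialize with edge lengths
    for k in range(n):                    # Step 2: allow k as an intermediate point
        for i in range(n):
            dik = d[i][k]
            if dik == INF:
                continue                  # no path i -> k, nothing to relax
            for j in range(n):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    return d
```

The triple loop costs $O(n^3)$, which is acceptable for feature datasets of a few dozen to a few hundred points.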

##### 2.4. Maximum Margin Criterion

LDA (linear discriminant analysis) is a popular linear feature extractor. The key step is to find a transformation matrix $A \in \mathbb{R}^{D \times d}$ that maximizes the Fisher criterion, where $D$ is the dimension of the original dataset and $d$ is the dimension of the projected data [17]:

$$J(A) = \operatorname{tr}\Bigl(\bigl(A^T S_w A\bigr)^{-1} \bigl(A^T S_b A\bigr)\Bigr), \tag{8}$$

where

$$S_b = \sum_{i=1}^{c} p_i (m_i - m)(m_i - m)^T, \qquad S_w = \sum_{i=1}^{c} p_i S_i. \tag{9}$$

$S_b$ represents the between-class scatter matrix and $S_w$ represents the within-class scatter matrix. $c$ is the number of classes; $m_i$, $p_i$, and $S_i$ are the mean vector, a priori probability, and covariance matrix of class $i$, respectively. $m$ is the overall mean vector.

However, (8) cannot be applied when $S_w$ is singular, which happens in the small sample size problem. The MMC method was proposed in [17] to overcome this drawback. Following LDA, MMC can be represented as

$$J(A) = \operatorname{tr}\bigl(A^T (S_b - S_w) A\bigr). \tag{10}$$
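The scatter matrices and the MMC projection can be sketched in NumPy as follows. The function names, the one-sample-per-column layout, and the use of class frequencies as the a priori probabilities are assumptions made for this illustration. MMC needs no matrix inverse, so a singular within-class scatter matrix causes no difficulty.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Between-class and within-class scatter of X (D x N, one sample per column)."""
    labels = np.asarray(labels)
    D, N = X.shape
    m = X.mean(axis=1, keepdims=True)            # overall mean vector
    S_b = np.zeros((D, D))
    S_w = np.zeros((D, D))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        p = Xc.shape[1] / N                      # prior estimated by class frequency
        mc = Xc.mean(axis=1, keepdims=True)      # class mean vector
        S_b += p * (mc - m) @ (mc - m).T
        Zc = Xc - mc
        S_w += p * (Zc @ Zc.T) / Xc.shape[1]     # p_i times the class covariance
    return S_b, S_w

def mmc_projection(X, labels, d):
    """Top-d eigenvectors of S_b - S_w, which maximize tr(A^T (S_b - S_w) A)."""
    S_b, S_w = scatter_matrices(X, labels)
    vals, vecs = np.linalg.eigh(S_b - S_w)       # ascending eigenvalues
    order = np.argsort(vals)[::-1][:d]           # keep the d largest
    return vecs[:, order], vals[order]
```

With these definitions, `S_b + S_w` equals the total (biased) covariance of the data, which is a convenient sanity check.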

#### 3. Kernel Local Linear Discriminate Algorithm

##### 3.1. Local Linear Discriminate Algorithm

Following the LLTSTA derivation in [18], the basic idea of the LLD algorithm proposed in this paper is that if the linear transformation matrix of NPP (see (5)) also satisfies (10), its ability to discriminate between different classes of data will be greatly improved. This problem can be expressed as a multiobjective optimization problem: Equation (11) can be converted into a constrained optimization problem: The Lagrange multiplier method is used to solve this problem; that is, which is further deduced as follows: Equation (14) is converted into so the projection matrix consists of the eigenvectors of .
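As a hedged sketch only, one derivation consistent with this description is given below. It assumes that LLD inherits the NPP constraint $A^T X X^T A = I$ and that the two objectives are combined with equal weight; both are assumptions made here, and the lines are not a reproduction of the paper's equations (11) to (15). $X$ is the data matrix, $M = (I - W)^T (I - W)$ is the LLE matrix, and $S_b$, $S_w$ are the between-class and within-class scatter matrices.

$$
\begin{aligned}
&\min_{A} \operatorname{tr}\bigl(A^T X M X^T A\bigr), \qquad \max_{A} \operatorname{tr}\bigl(A^T (S_b - S_w) A\bigr) \\
\Rightarrow\ &\min_{A} \operatorname{tr}\Bigl(A^T \bigl(X M X^T - (S_b - S_w)\bigr) A\Bigr) \quad \text{s.t. } A^T X X^T A = I \\
&L(A, \Lambda) = \operatorname{tr}\Bigl(A^T \bigl(X M X^T - (S_b - S_w)\bigr) A\Bigr) - \operatorname{tr}\Bigl(\Lambda \bigl(A^T X X^T A - I\bigr)\Bigr) \\
&\frac{\partial L}{\partial A} = 0 \ \Rightarrow\ \bigl(X M X^T - (S_b - S_w)\bigr) a = \lambda X X^T a \\
\Rightarrow\ &\bigl(X X^T\bigr)^{-1} \bigl(X M X^T - (S_b - S_w)\bigr) a = \lambda a
\end{aligned}
$$

Under these assumptions the projection vectors are the eigenvectors associated with the smallest eigenvalues, because the combined objective is minimized.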

##### 3.2. Kernel Local Linear Discriminate Algorithm

Suppose that $\phi$ is a nonlinear mapping to some feature space $F$; (15) can be changed into the following:

To find local linear discriminant information in the feature space $F$, (16) must be written in terms of dot products of the input patterns, and the dot products are then replaced with a kernel function. From the theory of reproducing kernels, any solution must lie in the span of all training samples in $F$. Therefore, the solution can be expanded as follows: Combining (16) and (17) and then multiplying both sides of (16) by , we have We set , and then (18) can be rewritten as follows: where is the index of the th sample, is the number of samples, and is the sample dimension. Taking the polynomial kernel function as an example, can be rewritten as . As in Section 3.1, (19) can be treated as a generalized eigenvalue decomposition problem.
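The kernel trick replaces every dot product of mapped samples with a kernel evaluation. The sketch below uses one common form of the polynomial kernel, $k(x, z) = (x^T z + c)^p$; this form, the parameter names `degree` and `coef0`, and the coefficient matrix `alpha` (standing for the expansion coefficients of the solution over the training samples) are assumptions made for illustration, since the exact kernel expression is not shown above.

```python
import numpy as np

def poly_kernel(X, Z, degree=2, coef0=1.0):
    """Polynomial kernel matrix between the columns of X (D x N) and Z (D x M).

    Entry (i, j) is (x_i . z_j + coef0) ** degree.
    """
    return (X.T @ Z + coef0) ** degree

def kernel_project(alpha, X_train, X_new, degree=2, coef0=1.0):
    """Project new samples with hypothetical expansion coefficients alpha (N x d).

    Each projection vector is a combination of mapped training samples,
    so the projection only needs kernel values: Y = alpha^T K(X_train, X_new).
    """
    return alpha.T @ poly_kernel(X_train, X_new, degree, coef0)
```

Because `kernel_project` works for any `X_new`, test samples can be embedded without retraining, which keeps the out-of-sample property of the linear method.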

##### 3.3. Short-Circuit Problem

The traditional Euclidean distance has many advantages: it is intuitive, easy to understand, and easy to compute. However, it can easily lead to the short-circuit problem [19] when the high-dimensional space has a hypersurface with large curvature. The short-circuit problem refers to the neighborhood of a point being mixed with points of other classes, so that discriminant information cannot be extracted effectively. The distribution of two-class data in a two-dimensional space is shown in Figure 1.

Figure 1 shows two types of points, round and square. Under Euclidean distance, the five nearest neighbors of round12 are {round11, round13, square2, square3, and square4}. This distorts the result of dimensionality reduction in the low-dimensional space. To overcome this drawback, we construct a connection graph in which points of different types are not connected, and the distance between unconnected points is taken to be infinite. In this way, the five nearest neighbors of round12 are {round9, round10, round11, round13, and round14}. After the connection graph has been established in the high-dimensional sample space, the Floyd algorithm of Section 2.3 is used to find the distances between points in the graph. Using the Floyd algorithm to find the correct nearest neighbors in the LLD algorithm effectively avoids mixing data samples of different classes.
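The neighbor selection described above can be sketched in pure Python. The function names, the threshold argument `eps` for the connection distance, and the toy coordinates in the usage note are illustrative assumptions; the shortest-path step is the Floyd recurrence of Section 2.3.

```python
import math

INF = float("inf")

def class_graph(points, labels, eps):
    """Connect two points only if they share a label and lie within eps."""
    n = len(points)
    g = [[INF] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 0.0
        for j in range(i + 1, n):
            dij = math.dist(points[i], points[j])
            if labels[i] == labels[j] and dij <= eps:
                g[i][j] = g[j][i] = dij
    return g

def geodesic_neighbors(points, labels, eps, k):
    """k nearest neighbors of every point by shortest-path distance in the graph."""
    g = class_graph(points, labels, eps)
    n = len(g)
    for m in range(n):                          # Floyd relaxation
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    neighbors = []
    for i in range(n):
        reachable = [j for j in range(n) if j != i and g[i][j] < INF]
        reachable.sort(key=lambda j: g[i][j])   # unreachable points are never chosen
        neighbors.append(reachable[:k])
    return neighbors
```

For example, with five round points on a line and three square points placed close to the middle round point, the Euclidean nearest neighbors of the middle round point are squares, while `geodesic_neighbors` returns only round points.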

##### 3.4. Steps of LLD Calculation

According to the analysis in Sections 3.1, 3.2, and 3.3, we state LLD as the following algorithm.

*Input.* The original space matrix $X$, the number of nearest neighbors $k$, the connection distance, the low-dimensional embedding dimension $d$, and the kernel parameter.

*Output.* The low-dimensional matrix $Y$.

*Step 1.* Compute the Euclidean distance between every pair of points $x_i$ and $x_j$.

*Step 2.* Set the connection threshold and the number of neighbor points, and construct a weighted graph similar to Figure 1.

*Step 3.* Calculate the distance between $x_i$ and the remaining points with the Floyd algorithm of Section 2.3, and select the $k$ nearest neighbors of $x_i$ as the points with the smallest shortest-path distance.

*Step 4.* Compute the reconstruction weight matrix $W$ according to formula (1).

*Step 5.* Form the matrix in formula (15) and perform its eigenvalue decomposition; the projection matrix $A$ consists of the eigenvectors corresponding to the 2nd to $(d+1)$th smallest eigenvalues.

*Step 6.* Calculate $Y$ by $Y = A^T X$.

##### 3.5. Steps of KLLD Calculation

In addition to the input and output of LLD, the kernel parameter must also be supplied as an input of KLLD. There are three calculation steps.

*Step 1.* Compute the reconstruction weight matrix $W$ as in Steps 1 to 4 of Section 3.4, and then calculate $M$ according to formula (2).

*Step 2.* Calculate the expansion coefficients according to formula (19); the projection matrix can then be obtained by (17).

*Step 3.* Calculate the low-dimensional matrix $Y$ from the projection matrix and the kernel matrix.

#### 4. Application Analysis

##### 4.1. Iris Dataset Dimensionality Reduction

We evaluated the performance of the new approach on the iris plants database, which contains three classes: *I. setosa*, *I. versicolor*, and *I. virginica*. The features of each sample are sepal length, sepal width, petal length, and petal width, and each class has 50 samples. We divided each class equally into two parts, named dataset1 and dataset2, so each dataset contains 25 samples per class. KPCA, NPP, and LLD were also applied in this section for comparison. A polynomial kernel function was employed in KLLD and KPCA, and the same number of nearest neighbors was used in the NPP, LLD, and KLLD methods. The results of iris database dimension reduction are shown in Figure 2.

**Figure 2.** Results of iris database dimension reduction: (a)–(d) KPCA and NPP; (e)–(f) LLD; (g)–(h) KLLD.

KPCA and NPP can hardly discriminate the three types of plant, as shown in Figures 2(a)–2(d), because both reduce dimension in order to describe the data; that is, they keep the information that represents the data rather than the information that separates the classes. Figures 2(e)–2(h) show the results of the LLD and KLLD methods. Points representing different plants overlap in Figures 2(e)-2(f), because LLD is a linear method. As shown in Figures 2(g)-2(h), the KLLD method separates the different kinds of plants properly. To investigate classification accuracy, SVM [20] is used as the classifier, with dataset1 as the training dataset and dataset2 as the testing dataset. Table 1 shows that the KLLD method obtains the highest classification accuracy.

##### 4.2. Rolling Bearing Fault Datasets Dimensionality Reduction Experiment

###### 4.2.1. KLLD Experiment Process Design

The calculation steps are designed as follows.

*Step 1.* Collect the vibration signal of the rolling bearing.

*Step 2.* Construct the original feature datasets from the wavelet packet energy.

*Step 3.* Obtain the projection matrix by KLLD.

*Step 4.* Compute the dimension reduction result with the projection matrix.

The KLLD dimensionality reduction process for rolling bearing fault datasets is shown in Figure 3.

###### 4.2.2. Wavelet Packet Energy Original Feature Construction

Under normal and fault operating conditions, time-domain waveform signals are shown in Figure 4.

**Figure 4.** Time-domain waveform signals under normal and fault operating conditions, panels (a)–(d).

The time-domain waveform of a bearing inner race fault shows typical shock components. The waveform of a normal bearing is stable, with little fluctuation in amplitude. The waveform of a rolling element fault includes random single impact components, while the waveform of an outer race fault is very similar to that of the inner race fault. It is therefore hard to distinguish the different fault conditions of a rolling bearing from the time-domain waveform alone. Wavelet packet analysis is a precise signal analysis method that is widely used in bearing fault diagnosis, so we use it to construct the original features. The wavelet packet energy of typical faults is shown in Figure 5. After wavelet packet decomposition, the signals of different rolling bearing faults have significantly different energy in the different frequency bands.

**Figure 5.** Wavelet packet energy of typical faults, panels (a)–(d).

###### 4.2.3. Wavelet Packet Energy Original Feature Construction

To verify the validity of the KLLD method, the experiment was performed on the rolling bearing vibration database of the Electrical Engineering Laboratory of Case Western Reserve University. We selected bearing model SKF6203 at a running speed of 1730 rpm under four conditions: normal, inner race fault, ball fault, and outer race fault. The signals were processed by wavelet packet decomposition, and two original feature datasets, named dataset1 and dataset2, were constructed. Each dataset has 40 points. Table 2 shows the original features of dataset1.
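A wavelet packet energy feature vector can be sketched as follows. The wavelet and the decomposition level used in the experiment are not stated above, so the Haar wavelet, `level=3` (eight bands), and the function name `haar_packet_energy` are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def haar_packet_energy(signal, level=3):
    """Relative energy of the 2**level terminal nodes of a Haar wavelet packet tree.

    Nodes are returned in natural (filter-bank) order, not frequency order.
    """
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(level):
        next_nodes = []
        for x in nodes:
            if len(x) % 2:
                x = x[:-1]                           # drop a sample so pairs line up
            low = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
            high = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
            next_nodes.extend([low, high])
        nodes = next_nodes
    energy = np.array([np.sum(n ** 2) for n in nodes])
    total = energy.sum()
    return energy / total if total > 0 else energy
```

Stacking one such vector per signal segment as a column gives an original feature matrix of the kind that is passed to the dimensionality reduction step.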

###### 4.2.4. Parameter Settings

According to [21], the optimal embedding dimension of a manifold learning algorithm is calculated by

$$d = c - 1, \tag{20}$$

where $d$ is the optimal embedding dimension and $c$ is the number of categories. This study considers four operational states of the rolling bearing, so the dimension of the low-dimensional space is 3 by (20). In addition, the distribution of the data can be illustrated clearly in three-dimensional space.

The parameter $k$ is the number of nearest neighbors. It is one of the most important parameters in manifold learning algorithms: if the number of nearest neighbors is too large, the small-scale structure of the manifold is eliminated and the whole manifold is smoothed; conversely, if the number of nearest neighbors is too small, a continuous manifold may be divided into disjoint submanifolds. The residual variance is defined as $1 - \rho^2_{D_X D_Y}$, where $D_X$ and $D_Y$ are the Euclidean distance matrices of the points in $X$ and $Y$, respectively, and $\rho$ is the standard linear correlation coefficient. The smaller the residual variance, the better the high-dimensional dataset is embedded in the low-dimensional space [9]. The optimal value of $k$ can be found by

$$k_{opt} = \arg\min_{k} \bigl(1 - \rho^2_{D_X D_Y}\bigr). \tag{21}$$
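The residual variance criterion can be sketched in NumPy as follows. The function names and the `embed(X, k)` callback, which stands for any embedding routine such as LLE, are hypothetical and introduced only for this illustration.

```python
import numpy as np

def pairwise_dist(P):
    """Euclidean distance matrix for samples stored as the columns of P."""
    sq = np.sum(P ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * P.T @ P
    return np.sqrt(np.maximum(d2, 0.0))          # clip tiny negative round-off

def residual_variance(X, Y):
    """1 - rho^2 between the pairwise distances in X and in its embedding Y."""
    dx, dy = pairwise_dist(X), pairwise_dist(Y)
    iu = np.triu_indices(dx.shape[0], k=1)       # each pair counted once
    rho = np.corrcoef(dx[iu], dy[iu])[0, 1]
    return 1.0 - rho ** 2

def best_k(X, embed, k_values):
    """Return the k with the smallest residual variance and all the scores."""
    scores = {k: residual_variance(X, embed(X, k)) for k in k_values}
    return min(scores, key=scores.get), scores
```

Scanning `k_values = range(2, 40)` with an LLE routine as `embed` reproduces the kind of sweep described for dataset1.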

LLE is the basis of NPP, LLD, and KLLD, so it can be used to determine the optimal number of neighbors for the datasets. Dataset1 is the input of the LLE algorithm, the number of neighbors ranges from 2 to 39, and the results are illustrated in Figure 6, from which the optimal number of neighbors is read.

Table 3 shows that the distance between different classes is numerically large while the distance within the same class is small. To guarantee that the number of neighbors in the same class is 10, the connection distance is set according to Table 3. A polynomial kernel function is used in KLLD and KPCA. Dataset2 is processed in the same way as dataset1. The KLLD algorithm is then applied to dataset1 and dataset2.

###### 4.2.5. Calculation and Discussion

KPCA, NPP, and LLD are also applied for comparison with KLLD. Table 2 is the input matrix. Figure 7 shows the distribution of the results in three-dimensional space.

**Figure 7.** Distribution of the dimension reduction results in three-dimensional space: (a)–(b) KPCA; (c)–(f) NPP and LLD; (g)–(h) KLLD.

As shown in Figures 7(a)-7(b), inner race fault and ball fault overlap with each other; KPCA can hardly distinguish the different classes of points during dimension reduction. Figures 7(c)-7(d) illustrate that normal and inner race fault can be distinguished; however, ball fault and outer race fault show slight overlap, especially in dataset2. The same phenomenon appears in Figures 7(e)-7(f). The results of KLLD are shown in Figures 7(g)-7(h): the KLLD algorithm distinguishes the different classes of the rolling bearing dataset. Figure 7 suggests that the characteristics sensitive to discriminating the different states are retained.

To evaluate the effectiveness of the KLLD method precisely, the within-class distance in the low-dimensional space is calculated and shown in Table 4, which shows that the proposed method has better clustering ability. To evaluate the separability quantitatively, the ratio of the average between-class distance to the average within-class distance is calculated. Table 5 shows that the ratio for KLLD is $4.38 \times 10^{11}$ in dataset1, while the other methods are below 517.7; in dataset2, KLLD reaches 186.4, while the other methods are below 120.8. SVM is used to calculate the classification accuracy on the low-dimensional datasets. The results in Table 6 illustrate that the KLLD-SVM method can recognize each condition of the rolling bearing vibration signal.
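A separability ratio of this kind can be sketched in NumPy. The exact distance definitions behind Tables 4 and 5 are not given above, so this sketch assumes that the within-class distance is the mean distance of samples to their own class centroid and that the between-class distance is the mean distance between class centroids; the function name is also an assumption.

```python
import numpy as np

def separability_ratio(Y, labels):
    """Average between-class distance divided by average within-class distance.

    Y is d x N with one embedded sample per column. At least two classes are
    required, and the classes must not all collapse to single points.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = [Y[:, labels == c].mean(axis=1, keepdims=True) for c in classes]
    # Mean distance of each sample to its own class centroid, averaged over classes.
    within = np.mean([
        np.linalg.norm(Y[:, labels == c] - m, axis=0).mean()
        for c, m in zip(classes, centroids)
    ])
    # Mean distance between every pair of class centroids.
    between = np.mean([
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(classes))
        for j in range(i + 1, len(classes))
    ])
    return between / within
```

A larger ratio means tighter clusters that lie farther apart; a very large value indicates that each class has collapsed to almost a single point in the embedding.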

#### 5. Conclusions

A novel dimension reduction algorithm for discrimination, called kernel local linear discriminate (KLLD), has been proposed in this paper. The most prominent property of KLLD is that it preserves both the discriminant and the local geometrical structure of the data, whereas traditional dimension reduction algorithms cannot properly preserve the discriminant structure. We first applied the algorithm to dimension reduction of the iris database; the experiment demonstrated that it can extract the features of the different kinds of iris and is suitable for classification. We then applied KLLD to machinery fault diagnosis. First, the original feature space of the rolling bearing dataset was constructed from wavelet packet energy. Second, KLLD and the other dimensionality reduction methods were applied to the original feature space. Finally, SVM was used for classification. The experiment shows that our method has excellent clustering and dimension reduction capability.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 51175316), the Specialized Research Fund for the Doctoral Program of Higher Education (no. 20103108110006), and Shanghai Science and Technology Commission Basic Research Project (no. 11JC1404100).