#### Abstract

Real fault samples are unevenly distributed, and the dimension reduction effect of the locally linear embedding (LLE) algorithm is easily affected by the choice of neighboring points. To address these problems, an improved locally linear embedding algorithm of homogenization distance (HLLE) is developed. By using the homogenization distance instead of the traditional Euclidean distance, the method makes the overall distribution of sample points more homogeneous and reduces the influence of neighboring points, which helps to select effective neighboring points for constructing the weight matrix used in dimension reduction. Because the improvement in fault recognition performance offered by HLLE is limited and unstable, the paper further proposes a locally linear embedding algorithm of supervision and homogenization distance (SHLLE) by adding a supervised learning mechanism. On the basis of the homogenization distance, supervised learning adds the category information of the sample points, so that sample points of the same category are gathered and sample points of different categories are scattered. This effectively improves fault diagnosis performance while maintaining stability. The methods above were compared in a rotor system fault diagnosis simulation experiment, and the results show that the SHLLE algorithm has superior fault recognition performance.

#### 1. Introduction

With the progress of modernization, the structure of mechanical equipment has become increasingly sophisticated, while its degree of automation and its functionality have grown steadily. In the fault diagnosis of rotating machinery, the more sophisticated the monitoring and control system, the larger the number of sensors. Moreover, the data that present the state characteristics in real space are complex. With so many variables, the data describing the running status of the equipment become more complex, and the abstraction used to describe that state is high-dimensional data. Faced with fault samples that are high-dimensional and diverse, traditional linear methods have great limitations. Nonlinear manifold learning, by contrast, has developed rapidly since it was first proposed in the journal Science in 2000. Working from a huge amount of complicated and changeable high-dimensional observation data, these methods extend data analysis and state decision-making from the original Euclidean space to the manifold. They can identify the key information accurately and uncover the essential characteristics and internal rules of the signal, which are then analyzed and judged for fault diagnosis. Classic manifold learning methods such as isometric feature mapping (ISOMAP) [1], locally linear embedding (LLE) [2], local tangent space alignment (LTSA) [3, 4], and Laplacian eigenmap (LE) [5, 6] are mainly applied in the fields of data mining, image processing, pattern recognition, and information retrieval [7–9].

In recent years, manifold learning methods have been applied to many kinds of rotating machinery fault diagnosis. Yang et al. [10] put forward a noise reduction algorithm for nonlinear time series based on phase space reconstruction and main manifold identification and successfully extracted the impact characteristics of a gearbox fault signal from the noise. Yang et al. [11] put forward an incremental local tangent space alignment method; in recognizing the different states of rolling bearings, this incremental learning algorithm achieves a higher recognition rate for new samples. Liang et al. [12] proposed an adaptive impact fault feature extraction algorithm based on manifold learning and extracted the optimal impact fault feature. For the extraction of weak features of early faults, Li et al. [13] proposed early fault diagnosis and feature extraction methods for rolling bearings based on manifold learning, which improved fault pattern classification performance. However, because fault samples are complex, diverse, and particular, most manifold learning methods used in practice are optimized variants, some of them merely combine two existing methods, and their stability needs to be improved. Therefore, taking rotating machinery as the research object, this study proposes an improved locally linear embedding algorithm of homogenization distance (HLLE) on the basis of the locally linear embedding (LLE) method, in order to address two problems: real fault sample sets are not evenly distributed, and LLE is easily affected by the neighboring points. Moreover, to further improve the recognition rate and stability of the method, a supervised learning mechanism is added, and a new locally linear embedding algorithm of supervision and homogenization distance (SHLLE) is proposed. A rotor system fault simulation experiment is then used to analyze and validate the effectiveness of the method for fault diagnosis.

#### 2. Locally Linear Embedding Algorithm of Homogenization Distance

##### 2.1. Locally Linear Embedding Algorithm

The LLE algorithm uses the Euclidean distance for local linear fitting in order to reveal the overall topological structure; that is, it represents the global nonlinear structure by means of local linear fits. LLE is a manifold learning algorithm based on a local weight matrix, and it assumes that the observed data set lies on, or approximately on, a low-dimensional manifold embedded in the high-dimensional space. The basic idea of LLE is that each data point in the high-dimensional space can be best reconstructed from its neighbors by a set of weights, and that these weights carry the local geometric information of the manifold from the high-dimensional space to the low-dimensional space. LLE holds that the constructed weight matrix preserves the essential characteristics of each local neighborhood; that is, the weight matrix keeps the geometric properties of the local neighborhoods of the data set unchanged under scaling or rotation.

The basic steps of the LLE algorithm (illustrated in Figure 1) are as follows:

*Input.* The original feature matrix $X = \{x_1, x_2, \ldots, x_N\}$, composed of $N$ vectors of dimension $D$.

*Output.* The feature vector matrix $Y = \{y_1, y_2, \ldots, y_N\}$, composed of $N$ vectors of dimension $d$ ($d < D$).

*Step 1.* Set the number of neighbor points ($k$) and the embedding dimension ($d$).

*Step 2.* Calculate the Euclidean distance between every two points in the sample set $X$, and for each point $x_i$ choose the $k$ nearest points to constitute its neighborhood.

*Step 3.* Within the neighborhood, let $w_{ij}$ represent the weight between $x_i$ and $x_j$. Calculate the best reconstruction weights of $x_i$ so that the reconstruction error $\varepsilon(W) = \sum_{i=1}^{N} \bigl\| x_i - \sum_{j} w_{ij} x_j \bigr\|^2$ is minimized under the constraint $\sum_{j} w_{ij} = 1$.

*Step 4.* Calculate the embedded $d$-dimensional vectors $y_i$ from the weight matrix $W$ by minimizing the reconstruction error $\varepsilon(Y) = \sum_{i=1}^{N} \bigl\| y_i - \sum_{j} w_{ij} y_j \bigr\|^2$.

In the LLE algorithm, a nonlinear data set is divided into pieces with local linear structure, so that the dimension of nonlinear data can be reduced effectively. The main parameters are the number of neighbor points and the embedding dimension. In the calculation, the least squares problem is transformed into an eigenvalue problem, which reduces the amount of computation. In general, LLE has the advantages of few undetermined parameters, a globally optimal analytical solution, low computational complexity, and a direct geometric meaning.
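The four steps can be sketched in NumPy as follows. This is a minimal illustration of standard LLE, not the authors' implementation: the function name `lle`, the optional `dist` argument, and the regularization term `reg` (needed when the number of neighbors exceeds the input dimension) are assumptions introduced here.

```python
import numpy as np

def lle(X, k, d, dist=None, reg=1e-3):
    """Standard LLE sketch. X is (N, D); returns an (N, d) embedding.

    dist: optional precomputed (N, N) distance matrix used only to pick
    neighbors (assumed hook; defaults to the Euclidean distance).
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]

    # Step 2: pairwise distances and the k nearest neighbors of each point.
    if dist is None:
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dist = np.array(dist, dtype=float)       # copy so the caller's matrix is untouched
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Step 3: reconstruction weights under the sum-to-one constraint.
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]           # neighbors centered on x_i, shape (k, D)
        G = Z @ Z.T                          # local Gram matrix, shape (k, k)
        G = G + reg * (np.trace(G) + 1e-12) * np.eye(k)  # assumed regularization
        w = np.linalg.solve(G, np.ones(k))
        W[i, neighbors[i]] = w / w.sum()     # enforce sum_j w_ij = 1

    # Step 4: eigenvectors of M = (I - W)^T (I - W) with the smallest eigenvalues.
    A = np.eye(N) - W
    M = A.T @ A
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    # Skip the first eigenvector (eigenvalue close to zero, constant vector).
    # Columns have unit norm; scale by sqrt(N) for a unit-covariance convention.
    return eigvecs[:, 1:d + 1]
```

Passing a different `dist` matrix changes only the neighbor selection in Step 2; the weights are still computed from the original coordinates.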

##### 2.2. Locally Linear Embedding Algorithm of Homogenization Distance Algorithm

The original LLE algorithm has some difficulty with the complicated data samples met in engineering practice. Because LLE assumes that the sample points are sampled continuously, densely, and uniformly on the manifold, selecting local neighborhoods directly by Euclidean distance in the high-dimensional space may not truly reflect the intrinsic structure of the manifold. When the manifold is curled or folded, that is, when two manifold surfaces lie a short distance apart, the reconstruction process places nearby points from different surfaces in the same local neighborhood, which distorts the manifold structure. In addition, the number of neighbor points ($k$) has an obvious effect on dimension reduction: the local neighborhood formed by the $k$ nearest neighbors is clearly larger where the sample points are sparse than where they are dense, so the selected neighbor points differ. Therefore, an improved locally linear embedding algorithm of homogenization distance (HLLE) is proposed. On the basis of the distance measurement between data in the Conformal-Isomap method and in accordance with the characteristics of practical fault samples, the following improved distance replaces the Euclidean distance in the LLE algorithm:

$$d'(x_i, x_j) = \frac{\|x_i - x_j\|}{\sqrt{M(i)\,M(j)}}, \tag{1}$$

where $\|x_i - x_j\|$ is the Euclidean distance between $x_i$ and $x_j$, while $M(i)$ and $M(j)$ represent the average distances of $x_i$ and $x_j$, respectively, to the other points, namely,

$$M(i) = \frac{1}{N-1} \sum_{j \neq i} \|x_i - x_j\|. \tag{2}$$

The purpose of the homogenization distance is to make the distance between sample points relatively shorter in sparse areas and relatively longer in dense areas, so that the overall distribution of sample points tends to be homogeneous. This helps the classification of the sample set: the change of distance makes the categories more separable and reduces the influence of the neighbor points on dimension reduction. Figure 2 illustrates the effect of the change in distance measurement.
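As a rough illustration, the homogenization distance can be computed for a whole sample set at once. The sketch below assumes that each Euclidean distance is divided by the square root of the product of the two points' average distances to all other points, and that the points are distinct so that no average is zero; the function name `homogenization_distance` is hypothetical.

```python
import numpy as np

def homogenization_distance(X):
    """Return the (N, N) matrix of Euclidean distances rescaled by sqrt(M(i) * M(j))."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    M = D.sum(axis=1) / (N - 1)      # average distance of each point to the other points
    return D / np.sqrt(np.outer(M, M))

# Three closely spaced points and one isolated point on a line.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
H = homogenization_distance(X)
print(H[2, 3] / H[0, 1])             # gap to the isolated point relative to a dense gap
```

In this toy example the Euclidean gap to the isolated point is 8 times the gap between the first two points, while the same ratio under the rescaled distance is about 5.55, so the sparse region is drawn in relative to the dense one.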

The numerator of the homogenization distance in formula (1), the original Euclidean distance, is unchanged. When sample points lie in a sparse area, the distances between a point and the other points are longer, so its average distance to the other points is also longer. The original distance is therefore divided by a larger denominator, and the new distance is reduced accordingly. Conversely, when sample points lie in a dense area, the distances between a point and the other points are shorter, and its average distance to the other points is also shorter, which makes the new distance relatively larger. As a result, the homogenization distance makes the overall distribution of sample points tend to be homogeneous. The distribution area of the two kinds of sample points changes accordingly: the new distances in the sparse area become shorter, so the sparse area shrinks and the dense area expands in relative terms. From the distance formula, the difference between the new distances of sample points in the sparse and dense areas, namely, the ratio of the density, can be derived.

(1) Sparse area:

$$d'(x_i, x_j) = \frac{\|x_i - x_j\|}{\sqrt{M(i)\,M(j)}}. \tag{3}$$

(2) Dense area:

$$d'(x_i, x_k) = \frac{\|x_i - x_k\|}{\sqrt{M(i)\,M(k)}}. \tag{4}$$

Here $x_j$ and $x_k$ are neighbor points of $x_i$. If $x_i$ is the same point and $x_j$ and $x_k$ are neighboring points in the sparse and dense areas, respectively, the ratio of the density is as follows:

$$\frac{d'(x_i, x_j)}{d'(x_i, x_k)} = \frac{\|x_i - x_j\|}{\|x_i - x_k\|} \sqrt{\frac{M(k)}{M(j)}}. \tag{5}$$

From formula (5), because the average distance $M(j)$ in the sparse area exceeds $M(k)$ in the dense area, the new distance changes over a wider range in the sparse area than in the dense area, so the sparse area can be separated from the dense one to a certain degree. In the subsequent steps of the algorithm, the neighborhood selected by the homogenization distance therefore consists mostly of points of the same kind. This reduces the influence of the neighboring points and is more conducive to feature extraction. Gathering samples with similar features also allows the samples to be classified effectively.

#### 3. Locally Linear Embedding Algorithm of Supervision and Homogenization Distance

##### 3.1. Supervised Locally Linear Embedding Algorithm

Locally linear embedding is a highly efficient manifold method for nonlinear dimension reduction, but in essence it is an unsupervised method. Because it does not make full use of the category information of the samples, classification accuracy suffers, and the method does not achieve an optimal effect when applied to tasks such as classification. Therefore, de Ridder et al. [14] proposed the SLLE algorithm. Its main purpose is to increase the distance between classes and to minimize the global reconstruction error of the local data by reducing the distance within each class. The key is to modify how the $k$ neighbor points are selected by adding a distance term to samples of different classes, thus adding the category information of the sample points. In the SLLE algorithm, the following formula replaces the Euclidean distance of the LLE algorithm when building the neighborhoods of the sample set for dimension reduction:

$$D' = D + \alpha \max(D)\,\Lambda. \tag{6}$$

In formula (6), $D'$ is the distance that contains category information, $D$ is the Euclidean distance without category information, and $\max(D)$ is the maximum distance between classes. $\Lambda$ is a matrix whose entries are 0 or 1: when two sample points belong to the same class, the entry is 0; otherwise it is 1. $\alpha$ is a parameter that adjusts the distance between point sets, $\alpha \in [0, 1]$, and it is an empirical parameter. When $\alpha = 0$, the SLLE algorithm is equivalent to the LLE algorithm.
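The supervised distance is easy to express on a precomputed distance matrix. The sketch below is illustrative only: the function name `supervised_distance` is hypothetical, and the maximum is taken as the largest entry of the distance matrix, following the usual SLLE formulation, which is an assumption about how the term is evaluated.

```python
import numpy as np

def supervised_distance(D, labels, alpha):
    """Add alpha * max(D) to every between-class distance; within-class entries are unchanged."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    # Indicator matrix: 0 where two samples share a class, 1 otherwise.
    Lam = (labels[:, None] != labels[None, :]).astype(float)
    return D + alpha * D.max() * Lam   # assumption: max over all pairwise distances

D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 3.0],
              [4.0, 3.0, 0.0]])
print(supervised_distance(D, labels=[0, 0, 1], alpha=0.5))
```

With `alpha=0.5` the two between-class distances grow from 4 and 3 to 6 and 5, while the within-class distance of 1 is unchanged; with `alpha=0` the matrix is returned as is, which corresponds to plain LLE neighbor selection.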

##### 3.2. Locally Linear Embedding Algorithm of Supervision and Homogenization Distance Algorithm

By changing the distance, the locally linear embedding algorithm of homogenization distance makes the categories more separable. However, the improvement in its classification performance is limited, and it is still an unsupervised algorithm. To raise the fault recognition rate and the stability of the method, a supervised learning mechanism is introduced into HLLE, and the locally linear embedding algorithm of supervision and homogenization distance (SHLLE) is proposed. On the basis of the homogenization distance, supervised learning adds the category information of the sample points, so that samples of the same fault category are gathered and samples of different fault categories are scattered. The main steps of the method are as follows.

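*Step 1.* Set the number of neighbor points $k$, the embedding dimension $d$, and the parameter $\alpha$.

*Step 2.* Calculate the homogenization distance. For a given data set $X = \{x_1, x_2, \ldots, x_N\}$, calculate the Euclidean distance between sample points and obtain the homogenization distance according to formula (1).

*Step 3.* Add the category information and select local neighbors. Apply formula (6) to the homogenization distance to obtain $D'$, and use it to find the $k$ nearest neighbor points of each sample point in the high-dimensional space.

*Step 4.* Calculate the reconstruction weight matrix. Calculate the locally optimal reconstruction weights of the sample points so that the reconstruction error is minimized. That is, find the optimal solution of

$$\min\ \varepsilon(W) = \sum_{i=1}^{N} \Bigl\| x_i - \sum_{j} w_{ij} x_j \Bigr\|^2, \quad \text{s.t. } \sum_{j} w_{ij} = 1,$$

where $x_j$ is a neighbor point of $x_i$ and $w_{ij}$ is the weight between $x_i$ and $x_j$. When $x_i$ and $x_j$ are not neighbors, $w_{ij} = 0$.

*Step 5.* Calculate the low-dimensional embedding matrix $Y$. The weight matrix $W$ obtained in the steps above is used to find the optimal low-dimensional embedding matrix $Y = [y_1, y_2, \ldots, y_N]$ by minimizing the reconstruction error with the loss function $\varepsilon(Y) = \sum_{i=1}^{N} \bigl\| y_i - \sum_{j} w_{ij} y_j \bigr\|^2$. The matrix $Y$ must satisfy $\sum_{i=1}^{N} y_i = 0$ and $\frac{1}{N} \sum_{i=1}^{N} y_i y_i^{T} = I$, where $I$ is the $d$-dimensional identity matrix. The optimization problem is transformed into the following constrained optimization problem:

$$\min_{Y}\ \varepsilon(Y) = \operatorname{tr}\bigl(Y M Y^{T}\bigr), \quad M = (I_N - W)^{T}(I_N - W), \quad \text{s.t. } \frac{1}{N} Y Y^{T} = I,$$

where $I_N$ is the $N$-dimensional identity matrix. By the Lagrange multiplier method, this reduces to the eigenvalue problem $M Y^{T} = \lambda Y^{T}$. The eigenvectors corresponding to the $d$ smallest nonzero eigenvalues of $M$ form the required low-dimensional coordinate matrix $Y$. As usual, the minimum eigenvalue is almost zero, so the eigenvectors corresponding to the 2nd through $(d+1)$th smallest eigenvalues are taken, which gives the optimal embedding result.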
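The reduction of Step 5 to an eigenvalue problem can be spelled out as follows. This is the standard LLE derivation written in assumed notation: $Y = [y_1, \ldots, y_N]$ is the $d \times N$ embedding, $W$ is the $N \times N$ weight matrix whose rows sum to one, $I_N$ is the $N \times N$ identity, and $u_m^{T}$ denotes the $m$-th row of $Y$, that is, the $m$-th embedding coordinate of all samples. The symbols $u_m$ and $\lambda_m$ are introduced here for illustration only.

$$
\begin{aligned}
\varepsilon(Y) &= \sum_{i=1}^{N} \Bigl\| y_i - \sum_{j} w_{ij} y_j \Bigr\|^{2}
= \bigl\| Y (I_N - W)^{T} \bigr\|_F^{2}
= \operatorname{tr}\bigl(Y M Y^{T}\bigr)
= \sum_{m=1}^{d} u_m^{T} M u_m, \qquad M = (I_N - W)^{T}(I_N - W), \\
L &= \sum_{m=1}^{d} \Bigl[ u_m^{T} M u_m - \lambda_m \Bigl( \tfrac{1}{N} u_m^{T} u_m - 1 \Bigr) \Bigr], \\
\frac{\partial L}{\partial u_m} &= 2 M u_m - \tfrac{2 \lambda_m}{N} u_m = 0
\;\Longrightarrow\; M u_m = \tfrac{\lambda_m}{N} u_m .
\end{aligned}
$$

Each $u_m$ is therefore an eigenvector of the symmetric matrix $M$, and at such a point $u_m^{T} M u_m = \lambda_m$, so the cost equals $\sum_m \lambda_m$ and is smallest for the smallest eigenvalues. Eigenvectors of a symmetric matrix can be chosen mutually orthogonal, which takes care of the off-diagonal part of the constraint. Because the rows of $W$ sum to one, $(I_N - W)\mathbf{1} = 0$, so the constant vector $\mathbf{1}$ is an eigenvector with eigenvalue zero; it violates the centering condition $\sum_i y_i = 0$ and is discarded, which is why the 2nd through $(d+1)$th eigenvectors are used.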

#### 4. Simulation Experiment of Rotor System Fault

The comprehensive fault simulation test bed of Spectra Quest Company (USA) is used as the experimental platform. Specifically, the fault simulation experiment system is composed of the Spectra Quest integrated fault simulation test bench and a PULSE data acquisition system. As shown in Figure 3, the rotor disk has two rings with a total of 36 threaded holes, in which bolts can be installed at arbitrary positions to simulate the rotor unbalance fault; the mass of the bolt used in the experiment is 5.596 g. The rotor misalignment fault is simulated by adjusting the two identical knobs on the base, which control the relative axial and radial positions of the bearings at both ends, with 0.025 mm per scale division. In the experiment, the left end is rotated clockwise by 5 divisions (0.125 mm), and the right end is rotated clockwise by 10 divisions (0.25 mm). The pedestal loose fault is simulated by loosening part of the bolts at the left end.

The rotor system was run in four states (normal, unbalanced, misaligned, and loose pedestal) at running speeds of 10 Hz, 20 Hz, and 30 Hz. Vibration signals were acquired with a total of six acceleration sensors mounted in three directions on the two bearing seats, and a total of 144 groups of data signals were obtained.

#### 5. Fault Diagnosis Based on SHLLE

After the raw signal data are merged and reconstructed, and through analysis and comparison, one data set (data 1) is taken as the fault feature vectors representing the normal state, the misalignment fault, and the unbalanced fault. Another data set (data 2) is taken as the fault feature vectors representing the loose fault, the misalignment fault, and the unbalanced fault. The two data sets are then filtered, and 8 time-domain parameters are extracted to form the original feature space, which is arranged into two data matrices.
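The source does not list the 8 time-domain parameters. Purely as an illustration of how such a feature vector might be built from one vibration record, the sketch below computes eight commonly used time-domain statistics; the choice of statistics and the function name `time_domain_features` are assumptions, and the signal is assumed to be non-constant.

```python
import numpy as np

def time_domain_features(x):
    """Return 8 illustrative time-domain statistics of a 1-D signal (assumed feature set)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    centered = x - mean
    skewness = np.mean(centered ** 3) / std ** 3
    kurtosis = np.mean(centered ** 4) / std ** 4
    crest_factor = peak / rms
    shape_factor = rms / np.mean(np.abs(x))
    return np.array([mean, std, rms, peak, crest_factor,
                     shape_factor, skewness, kurtosis])

# Stacking one feature vector per record gives a feature matrix with 8 columns.
records = [np.sin(np.linspace(0, 20, 500)), np.cos(np.linspace(0, 35, 500))]
features = np.vstack([time_domain_features(r) for r in records])
print(features.shape)
```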

The classification and fault diagnosis results of LLE, HLLE, SLLE, and SHLLE are analyzed and compared as follows.

##### 5.1. Fault Identification and Comparison

Figures 4 and 5 show the two-dimensional classification maps of the original feature data sets obtained with the (a) LLE, (b) HLLE, (c) SLLE, and (d) SHLLE algorithms; the horizontal and vertical axes represent component 1 and component 2 of the main characteristics, respectively.

**Figure 4.** Two-dimensional classification maps of data 1: (a) LLE, (b) HLLE, (c) SLLE, (d) SHLLE.

**Figure 5.** Two-dimensional classification maps of data 2: (a) LLE, (b) HLLE, (c) SLLE, (d) SHLLE.

In Figure 4 (data 1), separate markers indicate the normal state and the misalignment fault, and “+” indicates the unbalanced fault. With the LLE and HLLE methods, the normal state is separated well, but the unbalanced and misalignment faults overlap. With the SLLE method and its chosen parameter setting, the separated data set gathers into a columnar shape; the normal data set is identified, but the misalignment and unbalanced faults partially overlap. With the SHLLE method and its chosen parameter setting, the normal state, the misalignment fault, and the unbalanced fault are clearly classified and aggregated. It has a better recognition effect than LLE and SLLE.

In Figure 5 (data 2), separate markers indicate the loose fault and the misalignment fault, and “+” indicates the unbalanced fault. With the LLE and HLLE methods, the loose fault can be separated but not gathered, and its influence makes it difficult to separate the misalignment fault from the unbalanced fault. With the SLLE method and its chosen parameter setting, the misalignment and unbalanced faults are better separated and gathered without being affected by the loose fault, but the loose fault still cannot be gathered. With the SHLLE method under the same parameter setting, the loose, misalignment, and unbalanced faults are effectively classified, and the fault recognition effect is relatively better. In short, SHLLE shows a clear advantage in handling the original feature data sets and is more effective at classifying and identifying faults.

##### 5.2. Comparison of Fault Recognition Rate

A two-dimensional map of the data set cannot fully reflect the fault recognition effect of each algorithm, because some methods can also classify and identify the faults when the number of neighbor points is small. Therefore, a comparative analysis of how the recognition rate changes with the number of neighbor points is given below.

Figure 6(a) shows how the fault diagnosis accuracy for the normal, misalignment, and unbalanced data sets (data 1) changes with the number of neighbor points $k$ for the LLE, HLLE, SLLE, and SHLLE algorithms, with the embedding dimension and the other parameters held fixed. The “box” line and the “x” line are the recognition rate curves of LLE and HLLE, respectively, and the other two marked lines are those of SLLE and SHLLE. The diagram and the data show the following. LLE and HLLE reach a recognition rate of about 80% with fewer neighbor points, but the rate fluctuates unstably as $k$ increases. The recognition rate of the SLLE algorithm trends higher, reaching 100% and remaining stable from $k$ equal to 10, which suits cases where the number of neighbor points exceeds 10. The SHLLE algorithm has a great advantage in recognizing the normal, misalignment, and unbalanced states: its recognition rate reaches 100% and stays stable from $k$ equal to 5.

**Figure 6.** Fault recognition rate versus the number of neighbor points: (a) data 1, (b) data 2.

Figure 6(b) shows how the fault diagnosis accuracy for the loose, misalignment, and unbalanced data sets (data 2) changes with the number of neighbor points $k$ for the LLE, HLLE, SLLE, and SHLLE algorithms, with the embedding dimension and the other parameters held fixed. The “box” line and the “x” line are again the recognition rate curves of LLE and HLLE, respectively, and the other two marked lines are those of SLLE and SHLLE. The loose fault still degrades the LLE and HLLE algorithms, to the point that the purpose of fault diagnosis cannot be achieved. The recognition rate of the SLLE algorithm increases in general but fluctuates unstably when $k$ is small; it reaches 100% and stays stable when $k$ is equal to 8, so SLLE is still suited to cases where the number of neighbor points is larger.

The SHLLE algorithm keeps its advantage in recognizing the loose, misalignment, and unbalanced faults: its recognition rate is the first to reach 100%, and after a single fluctuation it stays stable from $k$ equal to 8.

In conclusion, fault diagnosis based on the SHLLE algorithm performs better than that based on the other LLE algorithms. The SLLE algorithm is suitable for fault identification when the neighborhood is slightly larger, where it is relatively stable. The SHLLE algorithm, however, has the best performance, and its fault identification is more stable than that of the others.
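A recognition-rate curve of this kind can be produced by sweeping the number of neighbor points and scoring each embedding. The source does not state which classifier yields the recognition rate, so the sketch below assumes a leave-one-out 1-nearest-neighbor classifier in the embedded space; `recognition_rate`, `rate_versus_k`, and the `embed(X, k)` callable are hypothetical names standing in for whichever dimension reduction method is being evaluated.

```python
import numpy as np

def recognition_rate(Y, labels):
    """Leave-one-out 1-nearest-neighbor accuracy of the embedded samples Y, shape (N, d)."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # exclude the sample itself
    predicted = labels[np.argmin(dist, axis=1)]    # label of the nearest other sample
    return float(np.mean(predicted == labels))

def rate_versus_k(embed, X, labels, k_values):
    """Recognition rate for each neighbor count k; embed(X, k) returns an (N, d) embedding."""
    return {k: recognition_rate(embed(X, k), labels) for k in k_values}

# Toy check with two well-separated groups and an identity "embedding".
Y = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(rate_versus_k(lambda X, k: X, Y, [0, 0, 1, 1], k_values=[3, 5]))
```

For the supervised variants, a fair evaluation would also need a rule for mapping unlabeled test samples into the embedding, which this sketch does not cover.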

#### 6. Conclusion

This paper studies rotating machinery. To obtain a better recognition effect, LLE of homogenization distance (HLLE) and LLE of supervision and homogenization distance (SHLLE) are proposed. The validity of the fault diagnosis was verified by a rotor system fault simulation experiment, and the following conclusions are reached.

(1) In the two-dimensional maps of each algorithm for the two data sets, the unbalanced fault and the misalignment fault overlap when the LLE, HLLE, and SLLE methods are used. SHLLE, however, has a strong advantage and is more effective in fault classification.

(2) In the maps of recognition rate against the number of neighbor points for the two data sets, the SLLE algorithm is suitable for fault diagnosis when the number of neighbor points is slightly larger. Fault diagnosis with the SHLLE algorithm, however, performs better than with the other LLE algorithms, and its fault identification is more stable.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

Financial support from the National Natural Science Foundation of China (51175170), the Industrial Cultivation Program of Scientific and Technological Achievements in Higher Educational Institutions of Hunan Province (10CY008), and the Aid Program for Science and Technology Innovative Research Team in Higher Educational Institutions of Hunan Province is gratefully acknowledged.