#### Abstract

Rolling bearing faults often lead to electromechanical system failure due to its high speed and complex working conditions. Recently, a large amount of fault diagnosis studies for rolling bearing based on vibration data has been reported. However, few studies have focused on fault diagnosis for rolling bearings under variable conditions. This paper proposes a fault diagnosis method based on image recognition for rolling bearings to realize fault classification under variable working conditions. The proposed method includes the following steps. First, the vibration signal data are transformed into a two-dimensional image based on recurrence plot (RP) technique. Next, a popular feature extraction method which has been widely used in the image field, scale invariant feature transform (SIFT), is employed to extract fault features from the two-dimensional RP and subsequently generate a 128-dimensional feature vector. Third, due to the redundancy of the high-dimensional feature, kernel principal component analysis is utilized to reduce the feature dimensionality. Finally, a neural network classifier trained by probabilistic neural network is used to perform fault diagnosis. Verification experiment results demonstrate the effectiveness of the proposed fault diagnosis method for rolling bearings under variable conditions, thereby providing a promising approach to fault diagnosis for rolling bearings.

#### 1. Introduction

Rolling bearings are considered to be a critical mechanical component in industrial applications. The bearings’ defects usually lead to a considerable decline in plant productivity and may even cause huge economic losses [1, 2]. Thus, it is important to diagnose rolling bearing fault to keep the bearings in good technical state.

Vibration-based methods have garnered particular attention due to their noninvasive nature and their high reactivity to incipient fault. Therefore, vibration signal analysis is vital for rolling bearings fault diagnosis due to its connection to fault feature extraction accuracy [3]. Aiming to extract the fault features, many feature extraction methods, including Wigner-Ville distribution (WVD) [4], wavelet packet decomposition (WPT) [5, 6], and empirical mode decomposition (EMD) [7–9], have been proposed and have been demonstrated to be powerful. Additionally, many fault diagnosis methods also have been proposed, such as fast spectral kurtosis based on genetic algorithms [10], multiscale entropy and adaptive neurofuzzy inference system [11], and time varying singular value decomposition [12]. However, most of these methods are proposed based on the assumption that the rolling bearings operate under fixed conditions when performing fault diagnosis. Moreover, the application of these methods is limited because of the tough, complex, and particularly variable working environment of rolling bearings [13, 14]. Therefore, it is important to investigate the fault diagnosis method suitable for varying conditions.

Many studies have researched rolling bearings fault diagnosis. However, thus far, few researchers have studied fault diagnosis under variable conditions. In 1990, Potter [15] proposed a constant angular sampling method (i.e., order tracking) that utilized the electronic impulse angular encoder and solved the frequency smearing phenomenon of the spectrum caused by fluctuating rotating speeds and realized the fault diagnosis for rotating machines. Considering the special analog hardware whose function is sampling data increases the cost of equipment; Fyfe and Munck [16] developed the computed order tracking (COT) technique based on order tracking to realize fault diagnosis for rotating machines. However, the COT may make the carrier frequencies of the transient responses, which are caused by the faults at various speeds, expand to a wider scope because the natural characteristic of the bearing system hardly changes, which is not beneficial for extracting the fault characteristic. In addition, [13] has proposed a new method for rolling bearing fault diagnosis under variable conditions. This new method utilizes LMD-SVD to extract features, but LMD also has the problem of iterative calculation capacity, frequency aliasing, end effect and other issues. Because of problems associated with the above methods, we need to research a new method for bearing vibration signal feature extraction, a method based on the nonstationary and nonlinear bearing vibration signals, thereby achieving fault diagnosis under variable conditions.

Scale invariant feature transform (SIFT), an image invariant feature extraction method, can recognize the same image when it is rotated, scaled, translational, and affine transformed. By extracting the 128-dimensional feature containing scale, orientation, and location information, SIFT can perform image recognition and matching under translation, rotation, scaling, and brightness changes [17]. Many studies have used SIFT to recognize images. For example, Montazer and Giveki [18] have utilized SIFT to extract image features and match them to a database (i.e., a content-based image retrieval system). Li et al. [19] have employed Robust SIFT to match remote sensing images, and a number of studies have also applied SIFT to such methods as facial expression recognition [20], ear recognition for a new biometric technology [21], and wheat grain classification [22]. Inspired by SIFT, the vibration signals of rolling bearings are considered to be transformed into images. The recurrence plot (RP) is a kind of method to describe the recursive behavior of dynamic orbit in the phase-space reconstruction; it is an important method to analyze the instability of time series. RPs of rolling bearing vibration signals under different conditions reveal translation and scaling characteristics, so RP is employed to transform the vibration signals under different conditions into images and SIFT is utilized to extract the features of transformed RPs, which is without interference of working conditions.

After the 128-dimensional invariant features are extracted, to reduce the data redundancy between the extracted features and the occupation of computer resources, a dimensionality reduction method is utilized to identify the low-dimensional structure hidden in high-dimensional data. Principal component analysis (PCA) is a widely utilized dimension reduction technique performed by linearly transforming a high-dimensional input space onto a lower dimensional one in which the components are uncorrelated. However, PCA will not perform well when the process exhibits nonlinearity. Hence, kernel principal component analysis (KPCA) was developed to overcome the limitations of PCA in dealing with the nonlinear system [23].

This paper is structured as follows. Section 2 first introduces the image transformation method, which generates images for the following recognition. Then, SIFT, the core of this paper, is described, which is utilized to extract the stable fault features under variable working conditions. Subsequently, KPCA is introduced for the dimensionality reduction. At last, probabilistic neural network (PNN) is described for the final fault classification. Section 3 describes the entire fault diagnosis method for rolling bearing under variable conditions, including descriptions of the experimental data, image transformation, feature extraction, and fault classification. Section 4 includes the results and discussion, and the conclusions are presented in Section 5.

#### 2. Related Theories

##### 2.1. Recurrence Plot

To achieve fault diagnosis under variable conditions, image transformation for SIFT is important to ensure success. Therefore, choosing a good image transformation method is particularly important. On account of the nonlinear and nonstationary characteristics of rolling bearing signals, detecting dynamical changes in complex systems is one of the most difficult problems. Recursiveness is one of the basic characteristics of a dynamic system, and the recurrence plot (RP) based on this characteristic is a good dynamic mainstream shape-description method. Through the black and white dots in the two-dimensional space, the recursive state can be visualized in the phase space [24]. This approach can uncover hidden periodicities in a signal in the recurrence domain. These periodicities are not easily noticeable, and it is an important method that analyzes the periodic, chaotic, and nonstationary of time series. The following theories are related.

The RP analysis is based on the phase-space reconstruction theory, which is described as follows.(1)For a time series, (), whose sample interval is , we chose the mutual information method and CAO algorithm to calculate the suitable embedding dimension and delay time , which could reconstitute the time series. The reconstructed time series is (2)Calculate the Euclidean norm of and in the reconstructed phase space [25]:(3)Calculate the recurrence value [26]: where is the threshold value and is the Heaviside function:(4)Utilize a coordinate graph whose abscissa is and whose ordinate is to draw , where and are the time series labels and the image is RP.

###### 2.1.1. Mutual Information Method

The mutual information method estimating the delay time has been proposed by Fraser and Swinney, based on the Shannon information theory, which is widely used in phase-space reconstruction [27].

The Shannon theory shows that we can obtain the information content of from the event :The relationship between and could be expressed with comentropy :

Apply the theory of the mutual information, and set isand set is

Then, (6) translates into

Usually at the beginning, is large; therefore, we can obtain an infinite amount of information in . and are completely independent for chaotic signals when is large; when , . Generally the first minimum of is selected as the delay time.

###### 2.1.2. CAO Algorithm

The CAO algorithm was proposed by CAO in 1997, and it has excellent properties to clearly distinguish real signal and noise, as well as high computational efficiency [28]. First, we calculated the distance of the points under the embedded dimensionality:where is the norm of the vector; is th vector after phase-space rebuilding, and the embedded dimension is ; is the nearest vector from .

Next, we calculated the average value of the distance change under the same dimension:where is the average value of all .

Finally, according to the discriminant equationwhen , stops changing or changes slowly, and is the minimum embedding dimension.

##### 2.2. SIFT Theory

Recognizing the images that are rotating, scaling, and translating refers to finding the stable points of the images. These points, such as the corners, blobs, T-junctions, and light spots in dark regions, do not disappear with the rotation, scale, translation, and brightness changes. SIFT was developed by Nurhaida et al. to extract distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene [29]. SIFT has five basic steps: constructing scale space, extreme points detection, precise location of key points, orientation assignment, and descriptor calculation [30].

###### 2.2.1. Gaussian Blur

SIFT finds key points in the different scale spaces, and the acquisition of scale space needs to be realized using Gaussian blur. Lindeberg has proved that Gaussian convolution kernel is the only kernel to achieve scale transformation, and it is the only linear kernel [31].

Gaussian blur is an image filter that utilizes normal distribution to calculate the fuzzy template, and the template is used to perform convolution operations with the original image to achieve the transition of fuzzy images.

The normal distribution equation of dimensional space iswhere is the standard deviation of the normal distribution; the larger is, the fuzzier image is. is the fuzzy radius that refers to the distance between the template element and the center of the template. If the two-dimensional template size is , then on the template corresponding to the Gauss equation is

According to the value of , the size of the Gaussian blur template matrix is . Equation (14) is used to calculate the value of the Gaussian template matrix; convolution is calculated with the original image, and the Gaussian blur image of the original image is obtained.

###### 2.2.2. Scale Space Construction

*(1) Scale Space Theory*. Scale space theory was first proposed by Iijima in 1962, and it was widely used in the field of computer vision after being promoted by Duits et al. [32].

The basic concept of scale space is as follows. A scale parameter is introduced in the image model, and the scale space sequence at multiple scales is obtained through the continuous change of scale parameter. The principal contours are extracted from the scale space of these sequences, and the principal contours are utilized as a feature vector to realize edge detection, corner detection, and feature extraction at different resolutions.

*(2) Representation of Scale Space*. The scale space of an image is defined as the convolution calculation between the Gauss function and the original image :where represents convolution: where and are the dimensionality of the Gaussian template determined by . is the pixel location in the image. is the scale space factor, the smaller value of which is the least amount of the smoothed image, and the corresponding scale is smaller.

*(3) Gaussian Pyramid*. The pyramid model of an image is as follows: the original image is constantly downsampling, and it generates a series of different sizes of images, ranging from large to small and from the bottom to the top, thereby constructing a tower-shaped model. The original image is the first stratum of the pyramid, and the new image obtained through downsampling is a stratum of the pyramid. The number of strata in the pyramid is jointly decided through the size of the original and top images. The equation is as follows:where and are the sizes of the original image and is logarithm of the minimum dimensionality of the top image.

To reflect the continuity of scale, the Gaussian pyramid introduces the Gaussian filter on the simple downsampling, as shown in Figure 1. The image in each stratum calculates the Gaussian blur using different parameters; thus, each stratum of the pyramid contains multiple Gaussian blur images. The images in each stratum are named octaves. The initial image (bottom image) of an octave in the Gaussian Pyramid is obtained by sampling from the last third image of the previous octave of images.

*(4) DOG Pyramid*. In 2002, Mikolajczyk found that the scale normalization of the Laplacian Gaussian function can produce the most stable image features compared to other feature extraction functions. The difference of Gaussian (DOG) function is similar to the scale normalization of the Laplacian Gaussian function [17]. Therefore, the DOG filter is applied to the input image. The image is gradually downsampled, and the filtering is performed at several scales. Figure 2 demonstrates the creation process for the DOG filters at different scales.

###### 2.2.3. Extreme Point Detection

The key points are the local extreme points (in the DOG space) whose initial exploration is accomplished by comparing the two adjacent images of each DOG in the same group. To determine the key points, a neighborhood comparison is used, as shown in Figure 3. Each pixel processed by the DOG pyramid is compared with 26 points of its 3-dimensional neighborhood to obtain the maximum or minimum, as the preliminary feature points.

###### 2.2.4. Precise Location of Key Points

*(1) Location of Interpolation*. The extreme points detected by the above methods are the extreme points of the discrete space, which are not the real extreme points. Figure 4 shows the difference of the extreme point of a two-dimensional function in discrete and continuous space. SIFT utilizes the linear interpolation method to obtain accurate key points.

*(2) Remove Edge Response*. The detected key points are further examined to choose the “best” candidates. The stability of the resulting set of key points is determined. Locations with low contrast and unstable locations along edges are discarded by calculating the ratio of the square of the matrix trace and determining the Hessian matrix.

###### 2.2.5. Orientation Assignment

To determine the descriptor with rotation invariance, the local features of images are needed to assign a reference orientation for each key point. By calculating the gradient orientations of the neighborhood pixels of key points, the orientation parameter is specified for each feature point. The gradient values and orientations in are

The gradient histogram statistical method is employed to further ascertain the orientation of the key point. The gradient values of the key points in the neighborhood window are calculated with the key point as the center and as the radius. The 360° of a circle are divided into 36 bins by drawing the gradient histogram, the contribution of each neighborhood point to the orientation of the gradient decreases with the increase of the distance between the neighborhood and the key point. The peak of the histogram is the main orientation of the key points.

After selecting the main orientation, there may also be one or more peaks whose values are more than 80% of the main peak. To enhance the robustness of the match, some key points whose locations and scales are the same as the original key point will be employed.

###### 2.2.6. Descriptor Calculation

After performing the above steps, the key points have the location, scale, and orientation. The next step is to create a descriptor for the key points. First, the coordinate axes are rotated as the key points to ensure the rotation invariance. Then, a window is taken; the center of this window has the key points shown in Figure 5(a). Each grid represents a pixel in the scale space of the neighborhood of the key point; the orientation of the arrow represents the gradient orientation of the pixel, and the length of the arrow represents the gradient mode. In the figure, the circle represents the Gauss range weighted. Next, the gradient orientation histogram of 8 orientations is calculated in every image, and the seed point is formed by drawing the cumulative value of each gradient orientation, as shown in Figure 5(b). In this figure, a key point is composed of 16 () seed points, each of which has eight orientation vectors. The key point can generate 128 () data sets and then form a 128-dimensional feature vector. The concept of the neighborhood orientation information alliance enhances the antinoise ability and also provides good fault tolerance for the feature matching with the localization error.

**(a)**pixel window

**(b)**subregions##### 2.3. Kernel Principal Component Analysis

KPCA projects the dimensional observed data matrix (, input space) onto a high-dimensional feature space , which can be expressed asSimilar to PCA, KPCA aims to project a high-dimensional feature space onto a lower space, in which the principal components are linear, uncorrelated combinations of the feature space [33]. The covariance matrix in the feature space can be formulated asThe characteristic equation iswhere is th sample in the feature space with zero mean, denotes the sample size, and is the transpose operation. Let be the data matrix in the feature space. Hence, can be expressed as . Due to the difficulty of obtaining , a Gram kernel matrix is determined as follows to directly avoid eigen-decomposition :where ; therefore, the inner product in the feature space (see (20)) can be obtained by introducing a kernel function to the input space.

Let

Simultaneously, (20), (21), and (22) obtain the following equation:

The vector of (23) is

To extract the principal components, the projection of feature vector in the feature space is calculated:

##### 2.4. Probabilistic Neural Network

The probabilistic neural network was proposed by Specht in 1990 [34]. It is a feed-forward neural network that was developed from the radial basis function, and its theoretical basis is the Bayes minimum risk rule (Bayes decision theory). As one of the radial basis networks, the PNN is suited to pattern classification, it places the Bayes decision analysis (with the Parzen window function) into the framework of a neural network, and the Bayes classification is produced by combining the Bayes decision and nonparametric estimation of probability density function. It can be described in the following manner. Assuming that there are two fault modes ( and ) for a fault feature sample ,where and denote the prior probability of fault modes and , generally, , , and and are the number of training samples of and , respectively, and is the total number of training samples. is the cost factor used to classify the feature sample belonging to into mode (falsely). is the cost factor used to classify the feature sample belonging to into mode (falsely). and are the probability density functions of the fault modes and , respectively.

Figure 6 shows the PNN structure that demonstrates that the input mode is divided into 2 types. As shown in Figure 6, the PNN is a feed-forward neural network with a 4-layer structure: the input layer, pattern layer, summation layer, and output layer. The input layer transmits input samples to each node of the pattern layer. The node of the pattern layer calculates the weighted sum of the data passed by the input node, following the operation of a nonlinear operator, which transmits the results to the summation layer. The nonlinear operator isAssuming that and are standardized into unit lengths, (1) is equivalent towhere is the weight vector.

The summation layer sums up the input from the pattern layer and then obtains the estimated probability densities. The classification result selected by the output layer is the maximum output of the summation layer.

The PNN is equal to the Bayes pattern classification method, which utilizes the Gauss kernel multivariate probability density function. The density function can be estimated as follows:where is input sample vector, is the number of sample vectors’ variables, (the weight in the PNN) is th training vector of fault mode , and is the number of training samples belonging to mode ; is the smoothing parameter.

#### 3. Method for Fault Diagnosis of Rolling Bearing under Variable Working Conditions

Inspired by SIFT, this study proposes a novel fault diagnosis for rolling bearings under variable conditions. The diagnostic procedure is shown in Figure 7.

The diagnostic process generally consists of four steps. First, the vibration signals in different fault modes under different conditions are transformed into RPs that are regarded as the objects of SIFT. To more accurately reconstruct the phase space, the delay time and embedded dimensionality are calculated using the mutual information algorithm and CAO algorithm is used to calculate the RPs. Second, the SIFT extracts the invariable features of the RPs (as described in Section 2), and the constructing scale space, extreme point detection, precise locations of the key points, orientation assignment, and descriptor calculation are determined to achieve the salient invariable features. Third, due to the high-dimensional vector of the extraction features, KPCA is employed by introducing a kernel function to reduce the dimensionality. Finally, the PNN is used as a classifier to diagnose the fault classification using data from one of the conditions to train the neural network and using data from the other conditions to test the proposed method.

#### 4. Results and Discussion

In this section, vibration data of rolling bearings collected from the Case Western Reserve University Bearing Data Center under different working conditions and fault modes were utilized to validate the effectiveness of the proposed method.

##### 4.1. Description of the Experimental Data

The experimental data used to test and verify the proposed method were obtained from the Bearing Data Center of Case Western Reserve University, Cleveland, OH, USA. The experimental setup used a Reliance Electric 2HP motor connected to a dynamometer, which was used as the prime mover to drive a shaft coupled with a bearing housing. Faults (i.e., size 7 mils, 14 mils, 21 mils, and 28 mils) were introduced into the drive-end bearing (6205-2RS JEM SKF) and fan-end (NTN equivalent bearing) of a motor using the electric discharge machining (EDM) method, with the motor speed varied at 1730, 1750, 1772, and 1797 rpm, respectively. These faults were introduced separately at the inner raceway, rolling element (ball), and outer raceway [35]. To quantify the stationary effect of the outer raceway faults, experiments were conducted for the FE and DE bearings, with outer raceway faults located at 3 o’clock, 6 o’clock, and 12 o’clock. An impulsive force was applied to the motor shaft, and the resulting vibration was measured using two accelerometers, one mounted on the motor housing and the other placed at the 12 o’clock position of the outer race of the drive-end bearing. Digital data were collected at 12,000 samples per second, and data were also collected at 48,000 samples per second for the drive-end bearing faults.

In this study, the DE bearing data for the normal, inner race fault, outer race fault, and rolling element fault with the speed varied between the 4 conditions were acquired for the fault pattern classification, and the fault diameters were 21 mils. The fault information (21 mils and outer race fault at 6 o’clock with four speeds), in terms of the test bearings, is listed in Table 1.

##### 4.2. Image Transformation of the Vibration Signals under Different Conditions

In this section, the vibration signals under different conditions are transformed into 2-dimensional images, which facilitate the extraction of invariable features for the fault classification. As previously mentioned, RP can uncover the hidden periodicities in a signal in the recurrence domain, and it is important that the method analyzes the periodic, chaotic, and nonstationary elements of the time series. Thus, RPs are particularly suitable for the image transformation of vibration signals without loss of signal information.

To accurately confirm the suitable embedding dimension, and delay time of each signal for the phase-space reconstruction, mutual information algorithm, and CAO algorithm were used. The parameters and for each condition are shown in Table 2. Unfortunately, to guarantee the calculation speed, for the data segments of each vibration signal, only 1,000 points were chosen to transform into an RP (with dimensionalities of ) when reconstructing the phase-space due to calculating the ergodic Euclidean norms of and . This experiment selected a 20-set data segment for each signal. Figure 8 shows the RPs in each fault mode under 4 different conditions that were randomly selected in a 20-set data segment. From the results, we see that the size of the RPs in the different modes under different conditions reveals slight differences.

In Figure 8, the row represents the condition changes, and the column represents the different fault modes (i.e., the first column shows the RPs in the normal state, the second column shows the RPs in the inner fault state, the third column shows the RPs in the element fault state, and the fourth column shows the RPs in the outer fault state). In Figure 8, we can see that the RPs of the different fault modes under different conditions have different structural characteristics, while the same fault modes under different conditions are notably similar. Affected by the condition changes, the RPs under the different conditions show the translation variation, scale variation, and combination of these changes.

##### 4.3. Feature Extraction Based SIFT and the Dimensionality Reduction

In this section, the invariable features in each fault mode under the different conditions are extracted from the transformed RPs, based on SIFT. Using SIFT, the scales, orientation, and locations of the key points are calculated. The scale information is obtained by establishing the difference of the Gaussian pyramid, which has 7 octaves. Each octave has 5 strata, and the scale factor is utilized to make the image fuzzy between the different strata. Because of the length of the paper, the DOG scale space of four different fault modes (only under condition 1) are shown in Figures 9–12; the locations of the key points are calculated by locating the extreme value points and through further interpolation to determine the exact extreme points on a continuous space. The detected key points of the RPs are shown in Figure 13, and the orientations of the key points are obtained by calculating the gradient orientation of the neighborhood pixels of the key points. The orientation parameters are specified for each key point through the gradient histogram statistics. After performing the above steps, the descriptor of the key point is established through a 128-dimensional vector.

Due to the essential features of RPs hidden in the high-dimensional space, which makes the calculation difficult, the aforementioned KPCA method was used to reduce dimensionality. First, the input space is mapped onto a high-dimensional feature space using the kernel function, and then the PCA is used to reduce the dimensionality. However, the features of the low-dimensional space are also too large and complex to be taken as feature vectors. To solve this problem and to improve the robustness of the feature vectors, singular value decomposition (SVD) was utilized in this paper to compress the scale of the fault feature vectors and to obtain more stable feature vectors [13]. Figure 14 shows the 3-dimensional visual feature points reduced by KPCA and SVD.

##### 4.4. Fault Classification Based on PNN

In this paper, PNN is employed as the classifier to classify the extracted features from vibration signals under different conditions, which are processed by SIFT and KPCA. To verify that the training data for the different conditions are effective, cross-validation is also necessary; the vibration data collected under each condition are orderly selected as training data, and the data collected under the other three conditions are used as test data, as shown in Table 3.

In each cross-validation, the training data and test data are composed as follows.

*Training Data*. 20 groups of data are selected for each fault mode under only one working condition; thereby totally 80 groups of data are selected for 4 fault modes.

*Test Data*. 20 data groups for each fault mode under the other 3 conditions, a total of 240 groups of data: 1–80 groups of data comprise the first condition, 81–160 groups of data comprise the second condition, and 161–240 groups of data comprise the third condition.

The results of the PNN classification are shown in Figures 15–18, where Figure 15 is the result of the first group of cross-validation, Figure 16 is the result of the second group of cross-validation, Figure 17 is the result of the third group of cross-validation, and Figure 18 is the result of the fourth group of cross-validation. The red spots are the actual fault mode, and the blue triangles are the result of the PNN classifier. In the vertical axis, 1, 2, 3, and 4 represent the normal, inner race fault, element fault, and outer race fault, respectively. At last, the detailed error samples statistics and classification accuracy of the cross-validation are shown in Tables 4 and 5. From Table 5, it can be seen that the classification accuracy values of the four groups of cross-validation are all higher than 97%, indicating that the proposed method is of high effectiveness.

#### 5. Conclusions

A novel rolling bearing fault diagnosis method under variable conditions, which was originally introduced for image recognition, is described in this paper. First, this method transforms the vibration signals into images, which are expressed in RPs. Then, as mentioned above, through the use of SIFT, the scales, orientation, and locations of the key points are calculated to identify the angular points, peripheral points, and bright points in the dark space of the RPs to extract the invariant features under variable conditions. After creating the key point descriptors, KPCA was employed to reduce the dimensionality of the high-dimensional feature vectors using kernel function to map the feature vector onto a higher-dimensional feature space and then using PCA to reduce the dimensionality. Finally, PNN was utilized as a classifier to execute the fault classification.

Future plans include conducting more experimental and object tests to further test the applicability of the proposed fault diagnosis method. In addition, because of the calculation speed limitation of phase-space reconstruction, new image transformation methods are also needed. To increase the feature extraction speed, the improved SIFT algorithm can be tested.

#### Competing Interests

The authors declare that there is not any potential conflict of interests in the research.

#### Acknowledgments

This study was supported by the Fundamental Research Funds for the Central Universities (Grant no. YWF-16-BJ-J-18) and the National Natural Science Foundation of China (Grant no. 51575021), and the Technology Foundation Program of National Defense (Grant no. Z132013B002).