#### Abstract

A novel bearing fault diagnosis method based on improved locality-constrained linear coding (LLC) and adaptive PSO-optimized support vector machine (SVM) is proposed. In traditional LLC, each feature is encoded using a fixed number of bases, without considering the distribution of the features or the weight of the bases. To address these problems, an improved LLC algorithm based on adaptive and weighted bases is proposed. Firstly, preliminary features are obtained from wavelet packet node energy. Then, dictionary learning with the class-wise K-SVD algorithm is implemented. Subsequently, based on the learned dictionary, the LLC codes are solved using the improved LLC algorithm. Finally, an SVM optimized by adaptive particle swarm optimization (PSO) is utilized to classify the discriminative LLC codes, and thus bearing fault diagnosis is realized. In the dictionary learning stage, other methods, such as selecting the samples themselves as the dictionary and K-means, are also conducted for comparison. The experimental results show that the LLC codes can effectively extract the bearing fault characteristics and that the improved LLC outperforms traditional LLC. The dictionary learned by class-wise K-SVD achieves the best performance. Additionally, the adaptive PSO-optimized SVM greatly enhances the classification accuracy compared with an SVM using default parameters and a linear SVM.

#### 1. Introduction

Rolling bearings have been widely used in rotating machinery. However, rolling bearings are prone to faults due to their structural characteristics, and serious mechanical failures may occur. Hence, it is essential to conduct fault diagnosis of rolling bearings. Vibration signals are usually used to monitor the bearing operating conditions. Feature extraction and pattern recognition are two crucial steps for machinery fault diagnosis using vibration signals. As for feature extraction of mechanical vibration signals, many methods based on the time domain, frequency domain, or time-frequency analysis have been proposed, such as envelope analysis [1], short time Fourier transform [2], wavelet transform [3, 4], Wigner-Ville distribution (WVD) [5], empirical mode decomposition [6], and Hilbert-Huang transform [7].

Compared with the aforementioned signal processing methods, sparse representation over a redundant dictionary represents a signal more adaptively, and it has been widely applied in the field of image processing, for example, image denoising [8], image compression [9], and face recognition [10]. Recently, many researchers have paid attention to methods of machinery fault diagnosis based on sparse representation. Feng and Chu implemented matching pursuit for gear damage detection [11]. Liu et al. proposed a feature extraction scheme with sparse coding and realized bearing fault diagnosis [12]. Tang et al. used a shift-invariant dictionary learning method to decompose the vibration signal into a series of latent components and then detected the bearing and gear faults [13]. Chen et al. proposed a new method, namely, Sparse Extraction of Impulse by Adaptive Dictionary (SpaEIAD), to extract impulse components of vibration signals with heavy noise and detected the gearbox fault using the proposed method [14].

As shown in [15], locality is more essential than sparsity. Based on [15], the locality-constrained linear coding (LLC) algorithm was proposed to reduce the computation cost; it employs locality constraints to project each feature into its local coordinate system and ensures that similar features have similar codes [16]. LLC has been widely used for feature extraction of images such as WCE images [17], lepidopteran images [18], CT images [19], and facial images [20]. In the area of machinery fault diagnosis, LLC has also attracted extensive attention. Wang and Liu employed LLC for extracting the characteristics of time-frequency images of bearing vibration signals [21]. Li et al. utilized a locality-constrained sparse coding method for bearing fault diagnosis and showed that locality-constrained sparse coding is effective for the feature extraction of bearing faults [22]. However, in traditional LLC each feature is encoded with a fixed number of bases, which may produce weak responses in the low-density area and poor representation in the high-density area. To tackle this problem, an adaptive number of bases can be found based on the distribution of the features [17]. Moreover, in traditional LLC the weight information of the bases is ignored. The weight of the bases can be used to boost the discriminative power of LLC codes [23]. To address the two problems, an improved LLC algorithm based on adaptive and weighted bases is proposed in this paper.

A proper dictionary is very important for sparse representation: it should reflect the latent characteristic patterns of the signals so as to obtain a better representation. The dictionary can be predefined; however, prior knowledge of the signal is often difficult to obtain in practical applications. The dictionary can also be constructed by selecting the training samples directly. Nevertheless, when dealing with a large-scale dataset the performance of this method is usually poor, so a dictionary learning method that learns a small-sized dictionary is essential. Some dictionary learning methods have been proposed, such as K-means [24], K-means singular value decomposition (K-SVD) [25], method of optimal directions (MOD) [26], and shift-invariant sparse coding [27]. The K-SVD algorithm has proved to be very effective in image compression and denoising [8, 28]. Feature extraction methods based on the K-SVD algorithm have been employed for machinery fault diagnosis recently. Feng and Liang exploited shift-invariant K-means singular value decomposition (SI-K-SVD) to extract the components of planetary gearbox signals; subsequently, background noise was suppressed and the vibration patterns were obtained [29]. In this study, the K-SVD algorithm is applied to learn the dictionary for solving LLC codes.

The intelligent diagnosis of mechanical faults is essentially a pattern recognition problem. Many pattern recognition methods have been employed for mechanical fault diagnosis, such as artificial neural network (ANN) [30], support vector machine (SVM) [31–33], Bayesian classifiers [34], and hidden Markov model (HMM) [35]. SVM is a popular machine learning method based on statistical learning theory [36]. It employs the structural risk minimization principle, and it can overcome the problems of overfitting, local minima, and slow convergence that affect neural networks. Moreover, SVM is more suitable when dealing with small training sets and high-dimensional data. If the data cannot be linearly separated, a kernel function can be used in SVM to map the data into a high-dimensional feature space. Linear kernel, polynomial kernel, and radial basis function (RBF) kernel are commonly used kernel functions. The penalization coefficient and the parameter of the kernel function have a great influence on the performance of SVM. Particle swarm optimization (PSO) [37] can be employed to optimize the parameters of SVM. SVM with PSO-optimized parameters has been widely applied for machinery fault diagnosis. Liu et al. proposed a hybrid intelligent diagnosis method of bearing fault based on empirical mode decomposition (EMD), distance evaluation technique, and wavelet support vector machine (WSVM) with PSO [32]. The standard PSO algorithm easily falls into a local optimum and converges slowly. To this end, an adaptive PSO algorithm with adaptive weight and acceleration coefficients is adopted for the optimization of the parameters of SVM in this paper [38].

In our work, a novel method based on improved LLC algorithm and adaptive PSO-optimized SVM is proposed for bearing fault diagnosis. Firstly, preliminary feature is extracted from the bearing vibration signals under different operating conditions. Then, class-wise K-SVD algorithm is employed to learn the dictionary for solving LLC codes. Afterwards, based on the learned dictionary LLC codes can be solved using the improved LLC algorithm and employed as the input feature vectors of SVM. Finally, adaptive PSO-optimized SVM is utilized to distinguish different operating conditions of bearings and hence the bearing fault diagnosis is realized.

The paper is organized as follows: in Section 2 the feature extraction method based on improved LLC is presented. In Section 3 the classification method based on adaptive PSO-optimized SVM is described. In Section 4 the summary of the proposed bearing fault diagnosis method is demonstrated. An experiment of rolling bearings with artificial fault is conducted to validate the proposed method in Section 5. Finally, the conclusions are drawn in Section 6.

#### 2. Feature Extraction Based on Improved Locality-Constrained Linear Coding

Firstly, feature extraction based on traditional methods is employed to acquire the preliminary feature. Then two stages including dictionary learning and LLC codes solving are conducted for feature extraction based on sparse LLC codes which can be regarded as high level sparse features.

##### 2.1. Preliminary Feature Based on Wavelet Packet Node Energy

Wavelet packet node energy (WPNE) is adopted for the preliminary feature extraction [34, 39]. Wavelet transform is very efficient for local time-frequency analysis of nonstationary signals. Based on discrete wavelet transform (DWT), discrete wavelet packet transform (DWPT) continues to decompose the detailed coefficients of DWT to finer frequency bands.

Denote a signal by $x(t)$. After the wavelet packet transform is conducted, we obtain the wavelet packet coefficients corresponding to wavelet node $(j, n)$, where $j$ is the decomposition level and $n$ is the node number within that level. At time $k$ the coefficient of node $(j, n)$ is denoted as $c_{j,n}^{k}$, where $k$ indexes the coefficients of the node. The reconstructed signal obtained from node $(j, n)$ is $x_{j,n}(t)$, where $t$ indexes the samples. The energy of a signal can be expressed by the square of its 2-norm. The energy of a wavelet packet node and that of the corresponding reconstructed signal are equal, and both are denoted as $E_{j,n}$:

$$E_{j,n} = \sum_{k} \left| c_{j,n}^{k} \right|^{2} = \sum_{t} \left| x_{j,n}(t) \right|^{2}. \tag{1}$$

The total energy of the wavelet packet nodes at depth $j$ is equal to the energy of the signal $x(t)$; that is,

$$E = \sum_{n=0}^{2^{j}-1} E_{j,n} = \sum_{t} \left| x(t) \right|^{2}. \tag{2}$$

The wavelet packet node $(j, n)$ corresponds to the frequency range $\left[ n f_s / 2^{j+1},\ (n+1) f_s / 2^{j+1} \right]$, where $f_s$ is the sampling frequency, so we can obtain the energy distribution over the frequency domain, which is the relative wavelet packet node energy:

$$p_{j,n} = \frac{E_{j,n}}{\sum_{m=0}^{2^{j}-1} E_{j,m}}. \tag{3}$$

In this paper the vibration signal is decomposed with 3-level DWPT and eight wavelet packet nodes corresponding to different frequency bands are produced. The mother wavelet Daubechies 2 (db2) is used since it is effective for bearing fault diagnosis [4]. Feature vectors can be formulated as follows:

$$\mathbf{T} = \left[ p_{3,0},\ p_{3,1},\ \ldots,\ p_{3,7} \right]. \tag{4}$$

In this way an 8-dimensional feature vector is constructed for each signal, which is used for further feature extraction as shown below.
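As a rough illustration of the WPNE feature described above, the following Python sketch (using NumPy) computes the relative node energies of a 3-level db2 wavelet packet decomposition. The periodic boundary handling, the natural filter-bank node order, and the helper names `analysis_step` and `wpne_features` are assumptions made for this sketch and are not taken from the experiment.

```python
import numpy as np

# db2 low-pass analysis filter; the high-pass filter is its quadrature mirror.
SQRT3 = np.sqrt(3.0)
LO = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * np.sqrt(2.0))
HI = np.array([LO[3], -LO[2], LO[1], -LO[0]])


def analysis_step(x, filt):
    """Filter x with periodic wrap-around, then downsample by two.

    Assumes len(x) is even and at least the filter length.
    """
    n = len(x)
    # Row k gathers the samples x[2k], x[2k+1], x[2k+2], x[2k+3] (mod n).
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(len(filt))[None, :]) % n
    return x[idx] @ filt


def wpne_features(signal, level=3):
    """Relative energy of the 2**level wavelet packet nodes (sums to one)."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(level):
        # Split every node into a low-pass and a high-pass child.
        nodes = [analysis_step(node, f) for node in nodes for f in (LO, HI)]
    energy = np.array([np.sum(c ** 2) for c in nodes])
    return energy / energy.sum()
```

With periodic extension the transform is orthogonal, so the node energies add up to the signal energy and a 2048-point segment yields an 8-dimensional vector whose entries sum to one.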

##### 2.2. Dictionary Learning

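Proper dictionary selection is important for sparse features: the dictionary should fully reflect the characteristics of the signals in order to better extract sparse features. However, predefined dictionaries such as the Dirac dictionary, Fourier dictionary, wavelet dictionary, and Gabor dictionary usually cannot match the characteristics of signals well. When the dataset becomes large, using all the training samples directly or randomly selecting some training samples as the dictionary may not perform well, so learning a small dictionary that better captures the signal characteristics is essential. Assume that there is a set of signals $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N] \in \mathbb{R}^{d \times N}$; dictionary learning learns an optimal dictionary $\mathbf{D} = [\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_M] \in \mathbb{R}^{d \times M}$ with $M$ specified dictionary atoms which is adaptive to the signals. More generally, we should optimize both the dictionary and the sparse coefficients using a sparsity constraint of $\ell_0$ norm as follows:

$$\min_{\mathbf{D}, \mathbf{A}} \left\| \mathbf{X} - \mathbf{D}\mathbf{A} \right\|_F^2 \quad \text{s.t.} \quad \forall i,\ \left\| \boldsymbol{\alpha}_i \right\|_0 \le T_0, \tag{5}$$

where $\boldsymbol{\alpha}_i$ is the sparse coefficient vector corresponding to the signal $\mathbf{x}_i$ and $\mathbf{A} = [\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \ldots, \boldsymbol{\alpha}_N]$ collects the sparse coefficients corresponding to all signals. $T_0$ is the sparsity prior, which means that the number of nonzero entries in $\boldsymbol{\alpha}_i$ should not be more than $T_0$. The notation $\|\cdot\|_F$ stands for the Frobenius norm.

We can also use a sparsity constraint of $\ell_1$ norm as follows:

$$\min_{\mathbf{D}, \mathbf{A}} \left\| \mathbf{X} - \mathbf{D}\mathbf{A} \right\|_F^2 + \beta \sum_{i=1}^{N} \left\| \boldsymbol{\alpha}_i \right\|_1, \tag{6}$$

where $\beta$ is a scalar controlling the contribution of the sparsity term.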

In (5) and (6) there are two issues.

*(1) Coefficients Solving*. To solve the sparse coefficients with a specified dictionary, many algorithms are available for doing this approximately, including $\ell_1$ norm based basis pursuit (BP) [40], $\ell_0$ norm based matching pursuit (MP) [41], and orthogonal matching pursuit (OMP) [42]. In this paper, we employ the OMP method for solving the sparse coefficients because OMP performs better and runs faster.

*(2) Dictionary Learning*. In order to find a dictionary **D** which best matches the structures in the signals, some algorithms have been developed for dictionary learning, for example, K-means [24], K-SVD [25], MOD [26], and shift-invariant sparse coding [27].

Through dictionary learning, the dictionary atoms can capture the characteristic patterns of the signals. Subsequently, the signals can be sparsely represented based on the learned dictionary. The nonzero coefficient represents activation of a characteristic pattern; thus the characteristic pattern of the signal is more explicit than preliminary characteristic. Hence, sparse codes can be taken as high level representations of the signals.
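The coefficient-solving step with OMP can be sketched as follows. This is a minimal NumPy illustration rather than the implementation used in the paper; the function name `omp`, the tolerance `tol`, and the assumption that the dictionary columns have unit norm are choices made for the example.

```python
import numpy as np


def omp(D, x, sparsity, tol=1e-10):
    """Greedy sparse coding of x over the columns of D.

    D: (d, M) dictionary, assumed to have unit-norm columns.
    x: (d,) signal.
    sparsity: maximum number of nonzero coefficients.
    """
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(sparsity):
        if np.linalg.norm(residual) <= tol:
            break
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected coefficients by least squares (the "orthogonal" step).
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - D @ coef
    return coef
```

Each nonzero entry of the returned vector marks an activated atom, which is the sense in which the sparse code exposes the characteristic patterns of the signal.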

A redundant dictionary can be learned based on the K-SVD algorithm. The K-SVD algorithm, which solves (5), can learn a small dictionary from the training samples. It optimizes the objective function in (5) with a two-step iterative procedure that alternates between sparse coding with the dictionary fixed and updating the atoms one at a time using singular value decomposition. Finally, a small-sized dictionary which best suits the training samples can be learned with the K-SVD algorithm [25].

Using the K-SVD algorithm, we can either learn one dictionary from samples of all classes or learn a subdictionary separately from the samples of each class, which is class-wise K-SVD. For class-wise K-SVD, assuming that there are $L$ classes of signals, the learned dictionary corresponding to each class, named subdictionary $\mathbf{D}_l$ (where $l = 1, 2, \ldots, L$ indicates the class label), can be obtained with the K-SVD algorithm. Each subdictionary contains $m$ atoms. Then a whole redundant dictionary with $M = Lm$ atoms can be formulated by concatenating the subdictionaries:

$$\mathbf{D} = \left[ \mathbf{D}_1, \mathbf{D}_2, \ldots, \mathbf{D}_L \right]. \tag{7}$$
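The class-wise assembly of the dictionary can be sketched in Python as below. To keep the example short, the sub-dictionary learner is a stand-in that simply picks training samples of the class as atoms; a K-SVD routine with the same signature could be passed in its place. The names `sample_atoms` and `classwise_dictionary` are hypothetical.

```python
import numpy as np


def sample_atoms(X_class, n_atoms, rng):
    """Stand-in learner: use randomly chosen samples of one class as unit-norm atoms."""
    cols = rng.choice(X_class.shape[1], size=n_atoms, replace=False)
    atoms = X_class[:, cols].astype(float)
    return atoms / np.linalg.norm(atoms, axis=0, keepdims=True)


def classwise_dictionary(X, labels, n_atoms, learner=sample_atoms, seed=0):
    """Learn one sub-dictionary per class and concatenate them column-wise.

    X: (d, N) training signals stored as columns; labels: (N,) class labels.
    """
    rng = np.random.default_rng(seed)
    subdicts = [learner(X[:, labels == c], n_atoms, rng) for c in np.unique(labels)]
    return np.hstack(subdicts)
```

Because the sub-dictionaries are stacked in label order, the atoms of one class occupy a contiguous block of columns in the whole dictionary.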

##### 2.3. Locality-Constrained Linear Coding

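Assume that the signal set is $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$. After the dictionary $\mathbf{D} = [\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_M]$ has been learned, sparse codes of the training samples and test samples can be solved by (5) or (6). However, since locality necessarily leads to sparsity while sparsity does not necessarily lead to locality, locality is more essential than sparsity [15]. Therefore, in LLC the locality constraint is substituted for the sparsity constraint as follows [16]:

$$\min_{\mathbf{c}_1, \ldots, \mathbf{c}_N} \sum_{i=1}^{N} \left\| \mathbf{x}_i - \mathbf{D}\mathbf{c}_i \right\|^2 + \lambda \left\| \mathbf{p}_i \odot \mathbf{c}_i \right\|^2 \quad \text{s.t.} \quad \mathbf{1}^{\mathsf{T}} \mathbf{c}_i = 1,\ \forall i, \tag{8}$$

where $\odot$ represents element-wise multiplication. $\mathbf{p}_i \in \mathbb{R}^{M}$ denotes the locality adaptor, which allows different freedom for each base proportional to its similarity to the signal $\mathbf{x}_i$:

$$\mathbf{p}_i = \exp\left( \frac{\operatorname{dist}(\mathbf{x}_i, \mathbf{D})}{\sigma} \right), \tag{9}$$

where $\operatorname{dist}(\mathbf{x}_i, \mathbf{D}) = [\operatorname{dist}(\mathbf{x}_i, \mathbf{d}_1), \ldots, \operatorname{dist}(\mathbf{x}_i, \mathbf{d}_M)]^{\mathsf{T}}$ and $\operatorname{dist}(\mathbf{x}_i, \mathbf{d}_j)$ is the Euclidean distance between $\mathbf{x}_i$ and $\mathbf{d}_j$. The locality adaptor is further normalized to $(0, 1]$ by subtracting $\max(\operatorname{dist}(\mathbf{x}_i, \mathbf{D}))$ from $\operatorname{dist}(\mathbf{x}_i, \mathbf{D})$. The parameter $\sigma$ is utilized to adjust the weight decay speed of the locality adaptor. The constraint $\mathbf{1}^{\mathsf{T}} \mathbf{c}_i = 1$ enforces the shift invariance of the LLC codes. The parameter $\lambda$ denotes the weight of the locality constraint term.

Compared with sparse codes solved by (5) or (6), LLC codes have the property of local smooth sparsity, so that similar codes are produced for similar signals. In addition, (8) has an analytical solution for each signal $\mathbf{x}_i$:

$$\mathbf{c}_i = \frac{\left( \mathbf{S}_i + \lambda \operatorname{diag}(\mathbf{p}_i)^2 \right)^{-1} \mathbf{1}}{\mathbf{1}^{\mathsf{T}} \left( \mathbf{S}_i + \lambda \operatorname{diag}(\mathbf{p}_i)^2 \right)^{-1} \mathbf{1}}, \tag{10}$$

where $\mathbf{S}_i = (\mathbf{D} - \mathbf{x}_i \mathbf{1}^{\mathsf{T}})^{\mathsf{T}} (\mathbf{D} - \mathbf{x}_i \mathbf{1}^{\mathsf{T}})$ is the data covariance matrix. Hence, LLC codes can be solved very fast using the analytical solution, while sparse codes solved by (5) or (6) always need an iterative optimization process and thus have high computation complexity.

In order to improve the encoding speed, fast approximation LLC can be achieved by selecting only the $k$ ($k < M$) nearest neighbors of $\mathbf{x}_i$ as the local bases $\mathbf{D}_i$. Thus (8) is converted to

$$\min_{\tilde{\mathbf{c}}_1, \ldots, \tilde{\mathbf{c}}_N} \sum_{i=1}^{N} \left\| \mathbf{x}_i - \mathbf{D}_i \tilde{\mathbf{c}}_i \right\|^2 \quad \text{s.t.} \quad \mathbf{1}^{\mathsf{T}} \tilde{\mathbf{c}}_i = 1,\ \forall i. \tag{11}$$

Using fast approximation LLC, the computation complexity can be reduced from $O(M^2)$ to $O(M + k^2)$. In this paper, fast approximation LLC is employed.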
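A minimal NumPy sketch of fast approximation LLC for a single signal is given below. It solves the sum-to-one least squares problem on the nearest bases in closed form. The function name `llc_encode` and the small trace-scaled regularization term `reg`, added here only for numerical stability, are assumptions of this sketch.

```python
import numpy as np


def llc_encode(x, D, k=5, reg=1e-4):
    """Approximate LLC code of x over its k nearest atoms.

    x: (d,) signal; D: (d, M) dictionary with atoms as columns.
    Returns an (M,) code with at most k nonzeros that sum to one.
    """
    dist = np.linalg.norm(D - x[:, None], axis=0)
    nearest = np.argsort(dist)[:k]
    # Shift the selected atoms so that the signal sits at the origin.
    shifted = D[:, nearest] - x[:, None]
    cov = shifted.T @ shifted
    # ASSUMPTION: trace-scaled ridge term to keep the k-by-k system well conditioned.
    cov = cov + reg * max(np.trace(cov), 1e-12) * np.eye(len(nearest))
    w = np.linalg.solve(cov, np.ones(len(nearest)))
    code = np.zeros(D.shape[1])
    code[nearest] = w / w.sum()  # enforce the sum-to-one constraint
    return code
```

Only a distance search over the $M$ atoms and a small $k \times k$ linear solve are needed per signal, which is where the speed-up over the full problem comes from.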

##### 2.4. Improved LLC Based on Adaptive and Weighted Bases

###### 2.4.1. Adaptive LLC

In traditional LLC, each signal is encoded with a fixed number of bases. However, considering the distribution of the feature vectors, LLC with a fixed number of bases may produce weak responses in the low-density area and poor representation in the high-density area [43]. In order to solve this problem, adaptive bases can be selected based on the feature density. The traditional LLC is reformulated aswhere represents the weight of adaptive locality constraint term. denotes the adaptive locality adaptor:where . represents the set of nearest neighbors of .

For each signal , the variance of the distances to all bases can be computed:

As for the signal , if the distance variance is large which means the signal locates in the low-density area with relatively few neighbors, fewer nearest neighbors can represent the signal enough. On the contrary, if the distance variance is small which means the signal locates in the high-density area with more neighbors, then larger number of nearest bases should be selected to better represent the signal. Accordingly, the number of bases corresponding to the signal can be adaptively adjusted depending on the distance variance as follows:where denotes the distance variance of all the signals. The symbol int represents the round operation. The adaptive coding parameter is a specified scalar. After the adaptive number of nearest bases is obtained, the LLC codes can be solved using fast approximation LLC as the above section.
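The idea of adapting the number of bases to the distance variance can be illustrated with the following Python sketch. The exact adjustment rule is not reproduced here: the inverse-proportional rule below, the clipping limits, and the names `adaptive_num_bases`, `k_base`, and `k_min` are all assumptions chosen only to show the qualitative behaviour (large variance gives fewer bases, small variance gives more).

```python
import numpy as np


def adaptive_num_bases(X, D, k_base=5, k_min=1):
    """Number of nearest bases per signal, driven by the distance variance.

    X: (d, N) signals as columns; D: (d, M) dictionary with atoms as columns.
    """
    # Pairwise Euclidean distances between every signal and every atom, shape (N, M).
    dists = np.linalg.norm(X[:, :, None] - D[:, None, :], axis=0)
    var = dists.var(axis=1)  # distance variance of each signal
    # ASSUMED rule: scale k_base by (mean variance / own variance), then round.
    ratio = var.mean() / np.maximum(var, 1e-12)
    k = np.rint(k_base * ratio).astype(int)
    return np.clip(k, k_min, D.shape[1])
```

The resulting per-signal counts could then be used as the number of nearest neighbors in fast approximation LLC.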

###### 2.4.2. Weighted LLC

A dictionary contains both sample information and weight information [23]. The sample information is the base vector of the dictionary. The weight information is the weight of a base depending on the proportion of training samples assigned to the base. However, the traditional LLC lacks the weight information and assumes that every base owns the same weight. In order to boost the discriminative ability of the LLC codes, the weight information of the dictionary is considered.

In the dictionary training stage, the dictionary is learned from the training samples. After the dictionary is obtained, for each training sample the Euclidean distance between the sample and each base can be calculated, and the nearest base is treated as the base to which the sample is assigned. Then for each base $\mathbf{d}_j$, the accumulated number $n_j$ of training samples assigned to the base can be acquired and the weight of the base is

$$w_j = \frac{n_j}{N}, \tag{16}$$

where $N$ is the total number of training samples.

Assuming that nearest neighbors are selected using fast approximation LLC, the corresponding weight vector is , where denotes the weight of the th selected base. The weight vector should be normalized as follows:

Suppose that the solved LLC code of signal **x** corresponding to selected base using fast approximation LLC is ; the weighted code can be acquired by
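The weighting idea can be sketched in Python as follows. The base weight as the fraction of training samples assigned to each base follows the description above; the normalization of the selected weights to unit sum and the element-wise scaling of the code in `weight_code` are assumptions made for illustration, and both function names are hypothetical.

```python
import numpy as np


def base_weights(X_train, D):
    """Fraction of training samples whose nearest base is each atom.

    X_train: (d, N) samples as columns; D: (d, M) dictionary with atoms as columns.
    """
    dists = np.linalg.norm(X_train[:, :, None] - D[:, None, :], axis=0)  # (N, M)
    nearest = dists.argmin(axis=1)
    counts = np.bincount(nearest, minlength=D.shape[1])
    return counts / X_train.shape[1]


def weight_code(code, weights):
    """Scale the nonzero entries of an LLC code by normalized base weights.

    ASSUMPTION: the weights of the selected bases are normalized to sum to one
    and applied element-wise.
    """
    selected = np.flatnonzero(code)
    w = weights[selected]
    total = w.sum()
    if total == 0:
        return code.copy()
    out = np.zeros_like(code, dtype=float)
    out[selected] = code[selected] * w / total
    return out
```

Bases that attract many training samples therefore contribute more strongly to the final code than rarely used ones.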

###### 2.4.3. Improved LLC Based on Adaptive and Weighted Bases

Considering both adaptive number of bases and the weight of the selected bases, an improved LLC algorithm based on adaptive and weighted bases is proposed. Assume that for a signal using improved LLC based on adaptive bases, nearest bases are selected, the LLC code of the signal corresponding to selected bases is , and the corresponding weight vector is . Then the weight vector is normalized by

Finally, the weighted code can be solved as follows:

The LLC codes corresponding to the unselected bases are set to zero directly. The improved LLC codes (ILLC codes) of the training samples and test samples, which are solved by the improved LLC algorithm based on adaptive and weighted bases, can be used as feature vectors directly. Thus an $M$-dimensional feature vector, where $M$ is the number of dictionary atoms, can be obtained for each sample.

#### 3. Classification Based on APSO-Optimized SVM

##### 3.1. Support Vector Machine

Support vector machine (SVM) is a popular machine learning method for the problems of classification, regression, and outlier detection, which is based on statistical learning theory. The basic SVM is applied to two-class classification problems, mapping the data into a high-dimensional feature space and finding the optimal hyperplane by maximizing the margin between the two classes. In practical applications there are usually multiple classes to discriminate, that is, multiclass classification. Many methods have been proposed to solve multiclass problems by combining two-class classifiers, such as one-against-one, one-against-all, and binary tree architecture. In this paper, LIBSVM is used for multiclass classification with the one-against-one method [44]. Suppose there are $L$ classes; $L(L-1)/2$ two-class classifiers are constructed.
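The one-against-one scheme can be illustrated with a short Python sketch that enumerates the class pairs and combines the pairwise decisions by majority vote. The helper names and the tie-breaking rule (smallest label wins) are assumptions of this example, not a description of LIBSVM internals.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs; one two-class classifier is trained per pair."""
    return list(combinations(classes, 2))


def vote(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers.

    ASSUMPTION: ties are broken in favour of the smallest label.
    """
    counts = Counter(pairwise_winners)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)
```

For the four bearing states considered later (normal, IRF, ORF, REF), `one_vs_one_pairs` returns six pairs, in agreement with $L(L-1)/2$ for $L = 4$.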

The kernel function maps the data into a high-dimensional feature space so that the data become linearly separable for classification in SVM. Common SVM kernel functions include linear, polynomial, sigmoid, and radial basis function (RBF) kernels. The RBF kernel is widely used for nonlinear classification and thus the RBF kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2)$ is utilized, where $\gamma$ controls the width of the RBF kernel function. In addition, the other important parameter is the penalization parameter $C$. Therefore, the parameters $(C, \gamma)$ need to be optimized to obtain the best performance of SVM. Firstly, the data should be linearly normalized to a common range, and cross validation over the parameters $(C, \gamma)$ is conducted to find the highest cross validation accuracy. Then the corresponding best parameters are employed to train on the training set, which yields the SVM model. Finally, the test set is predicted using the SVM model. Adaptive particle swarm optimization (APSO) is implemented for the optimization of the parameters of SVM as shown below.
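The normalization and kernel evaluation steps can be sketched with NumPy as follows. The target range $[0, 1]$, the LIBSVM-style parameterization $\exp(-\gamma \|\mathbf{a} - \mathbf{b}\|^2)$, and the function names are assumptions of this sketch; note that samples are stored as rows here.

```python
import numpy as np


def fit_minmax(X_train):
    """Per-feature offset and span estimated on the training set (rows are samples)."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant features
    return lo, span


def apply_minmax(X, lo, span):
    """Linear scaling; training data land in [0, 1] (assumed target range)."""
    return (X - lo) / span


def rbf_kernel(A, B, gamma):
    """Kernel matrix with entries exp(-gamma * ||a - b||^2) for rows a of A and b of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))
```

The scaling parameters are estimated on the training set only and then reused for the test set, so that no test information leaks into the model selection.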

##### 3.2. Adaptive Particle Swarm Optimization

Particle swarm optimization (PSO) is an evolutionary algorithm based on swarm intelligence [37]. Compared with the genetic algorithm, PSO has no crossover and mutation operations, and the particles are updated only through their internal velocity, so the algorithm is simple and easier to implement.

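Assume that the particle swarm has $n$ particles and the $i$th particle is denoted as $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iq})$ (where $q$ denotes the number of parameters; in this paper $q = 2$). The corresponding velocity of the $i$th particle is denoted as $\mathbf{v}_i = (v_{i1}, v_{i2}, \ldots, v_{iq})$. During the iteration, the best value of the $i$th particle and the best particle among all the particles are denoted as $\mathbf{p}_{\text{best},i}$ and $\mathbf{g}_{\text{best}}$, respectively, where $\mathbf{p}_{\text{best},i}$ represents the local best value of the $i$th particle while $\mathbf{g}_{\text{best}}$ represents the global best value. At first the particles of the particle swarm are randomly initialized within the specified range of parameter values. Then during the $t$th iteration the particles and the corresponding velocities are updated as follows:

$$\begin{aligned} \mathbf{v}_i^{t+1} &= w\,\mathbf{v}_i^{t} + c_1 r_1 \left( \mathbf{p}_{\text{best},i} - \mathbf{x}_i^{t} \right) + c_2 r_2 \left( \mathbf{g}_{\text{best}} - \mathbf{x}_i^{t} \right), \\ \mathbf{x}_i^{t+1} &= \mathbf{x}_i^{t} + \mathbf{v}_i^{t+1}, \end{aligned}$$

where $w$, called the weight, is the elastic coefficient for the velocity update, which indicates the effect of the previous velocity on the current velocity. $c_1$ and $c_2$ are positive scalars named acceleration coefficients, where $c_1$ reflects the local search ability while $c_2$ reflects the global search ability. $r_1$ and $r_2$ are random variables uniformly distributed in $[0, 1]$.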

Each iteration can be regarded as one generation. When the maximum number of iterations, namely, maximum generations, has been reached, the iteration is terminated. Finally, the global best value can be found corresponding to the highest cross validation accuracy. The fitness function of PSO is also the cross validation accuracy.
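The standard PSO loop described above can be sketched in pure Python as follows. The fixed values of `w`, `c1`, and `c2`, the clamping of positions to the search bounds, and the toy fitness function in the usage note are assumptions of this example; in the diagnosis method the fitness would be the cross validation accuracy of the SVM for a given parameter pair.

```python
import random


def pso_maximize(fitness, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over a box given as a list of (low, high) pairs."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):  # one iteration corresponds to one generation
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # ASSUMPTION: positions are clamped to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

For example, `pso_maximize(lambda p: -((p[0] - 1) ** 2 + (p[1] + 2) ** 2), [(-5, 5), (-5, 5)])` searches a two-dimensional box for the maximizer of a simple concave function, mirroring the two-parameter search over the SVM settings.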

The standard PSO algorithm easily falls into a local optimum and converges slowly. The weight is very important for balancing exploration and exploitation and thus has a great influence on the accuracy and efficiency of the search. In many papers, the weight is linearly decreased with the number of iterations. However, the iteration number does not indicate the convergence state of the particle swarm well. The average velocity of the particle swarm is large in the early iterations and becomes small in the later iterations. Therefore, the average velocity of the particle swarm can better reflect the convergence state as follows:

Based on the average velocity of the particle swarm, adaptive weight can be acquired bywhere is the specified maximum corresponding to the th parameter. It can be found that the weight is adaptively updated according to the average velocity of the particle swarm.

Moreover, the acceleration coefficients and are also adaptively updated bywhere is a specified constant. It indicates that the acceleration coefficient for local search is decreased while for global search is increased with the decline of the average velocity.

Therefore, in the improved PSO algorithm with adaptive weight and acceleration coefficients, namely, adaptive PSO (APSO), the particles and the corresponding velocities can be updated as follows:

The adaptive PSO algorithm can not only improve the global search ability but also accelerate the convergence compared with standard PSO algorithm [38].

#### 4. Bearing Fault Diagnosis Model Based on ILLC Codes and APSO-Optimized SVM

In this study, a bearing fault diagnosis model based on ILLC codes and APSO-optimized SVM is proposed. The proposed fault diagnosis method includes six stages: preliminary feature extraction, dictionary learning, sparse feature extraction based on ILLC codes, optimization of the parameters of SVM, SVM model training, and fault diagnosis. The flow chart of the proposed scheme is shown in Figure 1. The detailed description is as follows.

*(1) Preliminary Feature Extraction*. The bearing vibration signal is decomposed with 3-level DWPT and an 8-dimensional wavelet packet node energy vector is obtained for each signal.

*(2) Dictionary Learning*. The dictionary for solving LLC codes is learned by the class-wise K-SVD algorithm based on the training set.

*(3) Sparse Feature Extraction Based on ILLC Codes*. Based on the learned dictionary, the sparse LLC codes of the training set and test set can be solved with the improved LLC algorithm. Since the sparse LLC codes are discriminative, they can be used as the input feature vectors of the classifiers.

*(4) Optimization of the Parameters of SVM*. Adaptive particle swarm optimization (APSO) is carried out on the training set to obtain the penalization parameter and kernel parameter of SVM that correspond to the highest cross validation accuracy.

*(5) SVM Model Training*. Under the best penalization and kernel parameters, the SVM model is trained using the training set.

*(6) Fault Diagnosis*. The class label of the test sample can be acquired based on the trained SVM model and thus the bearing fault diagnosis is realized.

#### 5. Experiment and Analysis

##### 5.1. Description of Data Set

An artificial fault experiment on rolling bearings is carried out to verify the proposed method based on ILLC codes and adaptive PSO-optimized SVM. The bearing test rig is shown in Figure 2. The machine shaft is driven by an AC motor through rubber belts and a shaft coupling. The rolling bearing supports the shaft; its outer race is fixed while the inner race rotates with the shaft. A data acquisition system (NI PXI-1042) is installed to acquire the vibration signals from the rolling bearing. A bracket is attached to the rolling bearing and an accelerometer (Kistler 8791A250) is mounted on the bracket. A series of GB203 rolling bearings are used in the experiment. Single point faults are generated by electrodischarge machining on the surface of the inner race, outer race, and rolling element of the bearings, which introduces inner race fault, outer race fault, and ball fault, respectively. Therefore, the rolling bearings have four states: normal, inner race fault (IRF), outer race fault (ORF), and ball fault (REF). The rotating speed is 720 r/min. The vibration signals are collected at a sampling frequency of 25.6 kHz.

In the experiment, there are 20,480 points in each example and 120 examples are collected in total under four operating states with 30 examples for each state. Time series with 2048 points are obtained through truncation on each example and thus 1200 data samples can be acquired under four states with 300 data samples for each state. The raw vibration signals of rolling bearings under four different states are shown in Figure 3.

For each state, 150 data samples are selected randomly as the training set and the rest are taken as the test set. Therefore, 600 training samples and 600 test samples are formed. In total, there are four classes, one for each running state.
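The sample counts quoted above can be checked with a few lines of Python. The sketch assumes that the 2048-point series are obtained by non-overlapping truncation, which is consistent with the stated totals; the function name `segment_counts` is hypothetical.

```python
def segment_counts(points_per_example=20480, segment_length=2048,
                   examples_per_state=30, n_states=4, train_per_state=150):
    """Bookkeeping for the data set sizes (assumes non-overlapping segments)."""
    segments_per_example = points_per_example // segment_length
    samples_per_state = segments_per_example * examples_per_state
    total = samples_per_state * n_states
    n_train = train_per_state * n_states
    return {
        "segments_per_example": segments_per_example,
        "samples_per_state": samples_per_state,
        "total_samples": total,
        "train_samples": n_train,
        "test_samples": total - n_train,
    }
```

Each 20,480-point example yields 10 segments, so 30 examples per state give 300 samples per state and 1200 samples overall, split evenly into 600 training and 600 test samples.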

##### 5.2. Feature Extraction Based on Improved LLC

Firstly, the wavelet packet transform using the db2 wavelet basis was carried out on both training samples and test samples, and the preliminary features based on wavelet packet node energy were then obtained. For each signal, an 8-dimensional preliminary feature was acquired. The preliminary features based on WPNE under four different states are shown in Figure 4. It can be found that although the features of the four classes differ visibly, the classes are still hard to separate, so further feature extraction based on high level sparse LLC codes is implemented to enhance the discriminative ability.

Then dictionary learning was conducted using the class-wise K-SVD algorithm based on the training samples. The number of dictionary atoms for each class is specified as 10 and the sparsity prior is set to 8. The learned dictionary, with 40 atoms in total, is shown in Figure 5. As shown in the figure, the learned dictionary atoms corresponding to different classes differ markedly, while the dictionary atoms corresponding to the same class are more similar on the whole.

The method of randomly selecting some training samples as a subdictionary for each class and then concatenating the subdictionaries to form a redundant dictionary was also conducted. For fair comparison, the number of samples selected per class is the same as the number of atoms per class in the proposed method. Moreover, we also employed dictionary learning methods based on class-wise K-means and K-SVD for comparison. Class-wise K-means learns a subdictionary for each class using the K-means algorithm, and a whole redundant dictionary is then formed by concatenating the subdictionaries, while K-SVD directly learns the whole dictionary from all the samples of the four classes. The dictionary size is also kept the same as in the proposed method. During the dictionary learning stage of K-SVD, the sparsity prior is also set to 8.

After the dictionary has been learned, the sparse LLC codes of the training samples and test samples can be solved using the improved LLC algorithm based on adaptive and weighted bases. The parameter is set to 10. Hence, a 40-dimensional feature vector is obtained for each signal, which can be used as the input of the classifier. Using the improved LLC algorithm, the sparse LLC codes of four test samples corresponding to the four classes and the sum of absolute sparse LLC codes of different test samples corresponding to the same class are shown in Figures 6 and 7, respectively, where the dictionary atom numbers 1~10, 11~20, 21~30, and 31~40, respectively, represent the 1st class (normal state), the 2nd class (inner race fault), the 3rd class (ball fault), and the 4th class (outer race fault). The sum of absolute sparse LLC codes of different test samples using traditional LLC is also shown in Figure 8.

As can be seen from Figure 6, for a test sample with class label $l$, the corresponding LLC code has significant values at the dictionary atoms corresponding to the $l$th class. As shown in Figure 7, for the test samples belonging to the same class, the peak values are mostly located at the dictionary atoms belonging to the corresponding class, which indicates that the improved LLC algorithm ensures that similar signals belonging to the same class have similar sparse codes, and thus the sparse LLC codes are discriminative for classification. Comparing Figure 7 with Figure 8, it can be seen that the improved LLC codes have larger values at the dictionary atoms belonging to the same class and smaller values at the dictionary atoms belonging to the other classes, and thus are more discriminative than traditional LLC codes. Therefore, the improved LLC codes can effectively reflect the fault characteristics of rolling bearings and distinguish different bearing states well.

##### 5.3. Fault Diagnosis with ILLC Codes

After feature extraction, SVM-based classification was employed. Firstly, a linear SVM was used. Then the effect of different preliminary features and the influence of the dictionary learning and LLC parameter settings were discussed. Finally, bearing fault diagnosis based on ILLC codes and APSO-optimized SVM was conducted.

###### 5.3.1. Diagnosis Results Based on Linear SVM

Four feature extraction methods were implemented by combining different dictionary generation methods with the improved LLC algorithm: randomly selecting some training samples as the dictionary, class-wise K-means, K-SVD, and class-wise K-SVD, as mentioned before. They are denoted as original, CK-means, K-SVD, and CK-SVD, respectively. Combining class-wise K-SVD with the traditional LLC algorithm was also tested and is denoted as CKSVD-LLC. The classification results are shown in Table 1. The detailed classification accuracy for each class is shown in Figure 9.

Table 1 shows that the dictionary has a significant influence on the classification accuracy. The feature extraction method that combines class-wise K-SVD with the improved LLC algorithm achieves the highest accuracy, which validates the superiority of the improved LLC codes. The accuracy of class-wise K-SVD is higher than that of class-wise K-means, which indicates that K-SVD is more effective for dictionary learning. Class-wise K-SVD also outperforms K-SVD. Based on the same dictionary learned by class-wise K-SVD, the improved LLC is superior to traditional LLC, which shows the effectiveness of the improved LLC algorithm. As shown in Figure 9, the classification accuracy of the ball fault (REF) is the lowest for all the methods, which indicates that the ball fault is harder to distinguish. The proposed method based on class-wise K-SVD and improved LLC has the highest accuracy for the ball fault (REF), which shows that it can improve the classification accuracy of the ball fault.

###### 5.3.2. Comparison with Different Preliminary Features

For comparison, different preliminary features were produced, based either on direct use of the raw signal or on statistical features in the time domain [45]. The nine time-domain features are standard deviation, root mean square (RMS), kurtosis, skewness, crest factor, peak-peak, impulse factor, clearance factor, and shape factor. The sparse features based on class-wise K-SVD and improved LLC were then extracted, and linear SVM was employed as the classifier. The parameters of dictionary learning and LLC are the same as in Section 5.2. Moreover, the preliminary features were also used directly as the input of the SVM. The classification results are shown in Figure 10, where No FE (no feature extraction) represents the method that directly uses the raw signal. For each bar, the differently colored top segment shows the improvement obtained by the high level feature based on improved LLC. As shown in Figure 10, WPNE achieves the best accuracy both when the preliminary feature is used directly and when LLC codes are used. For every preliminary feature, the improved LLC performs better than direct use of the preliminary feature, especially for the raw signal, which shows the superiority of the sparse feature based on improved LLC.
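The nine time-domain features listed above can be computed as in the following sketch. The definitions used here are common textbook ones (population moments, peak taken as the maximum absolute value); they are an assumption, since the exact definitions in [45] may differ slightly. The function name and the sine test signal are hypothetical.

```python
import math


def time_domain_features(x):
    """Nine common time-domain statistics of a 1-D signal (list of floats).

    Assumes a non-constant signal; a constant signal would divide by zero.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n           # population variance
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    abs_mean = sum(abs(v) for v in x) / n
    sqrt_abs_mean = sum(math.sqrt(abs(v)) for v in x) / n
    peak = max(abs(v) for v in x)
    return {
        "std": std,
        "rms": rms,
        "kurtosis": sum(c ** 4 for c in centered) / n / var ** 2,
        "skewness": sum(c ** 3 for c in centered) / n / std ** 3,
        "crest_factor": peak / rms,
        "peak_to_peak": max(x) - min(x),
        "impulse_factor": peak / abs_mean,
        "clearance_factor": peak / sqrt_abs_mean ** 2,
        "shape_factor": rms / abs_mean,
    }


# Hypothetical test signal: one full period of a unit-amplitude sine wave.
N = 1000
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
feats = time_domain_features(sine)
for name, value in feats.items():
    print(f"{name:>16s}: {value:.4f}")
```

Each signal is thereby reduced to a nine-dimensional vector, which would then be encoded against a learned dictionary in the same way as the WPNE feature.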

###### 5.3.3. Influence of Dictionary Learning and ILLC Parameter Sets

The influence of the parameters on the method that combines class-wise K-SVD, improved LLC, and linear SVM was investigated. Two major parameters were examined: the number of dictionary atoms per class, which determines the whole dictionary size, and the adaptive coding parameter in improved LLC.

Firstly, the atom number per class was set to 10 and the adaptive coding parameter was varied between 3 and 15. Then dictionaries of different sizes were employed by varying the atom number per class and hence the whole dictionary size. The classification results are shown in Figures 11 and 12, respectively, and the dictionary training time for varying dictionary size is shown in Figure 13.

Figure 11 shows that the adaptive coding parameter in improved LLC has a large influence on the classification result. The best classification accuracy is acquired at . From Figure 12 we can see that the classification accuracy rises rapidly when the dictionary size is small, while the increase slows down once the dictionary size becomes large enough. Overall, increasing the dictionary size improves the classification accuracy. However, Figure 13 shows that the training time grows rapidly as the dictionary size increases, so the atom number per class should be chosen to balance computational complexity against classification accuracy.

###### 5.3.4. Diagnosis Results Based on APSO-Optimized SVM

Based on class-wise K-SVD and improved LLC codes, adaptive PSO was implemented to optimize the penalty parameter and the kernel parameter of the SVM with RBF kernel function. The parameters of dictionary learning and LLC remain the same as in Section 5.2. Both SVM parameters are restricted to a bounded search range, and 5-fold cross validation is conducted. The population size is 20, the maximum iteration number is 100, and the fitness function is the cross validation accuracy. The parameter is set to 3. Standard PSO with the same settings was also carried out for comparison. For standard PSO, the inertia weight and the two acceleration coefficients are set to 1, 1.5, and 1.5, respectively. The fitness curves of standard PSO and adaptive PSO are shown in Figures 14 and 15, respectively. Figures 14 and 15 show that adaptive PSO converges faster than standard PSO. Moreover, the final cross validation accuracy of adaptive PSO is slightly higher than that of standard PSO.
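For orientation, the standard PSO baseline can be sketched as below, using the stated settings (20 particles, 100 iterations, inertia weight 1, acceleration coefficients 1.5 and 1.5). This is the textbook update rule only, not the adaptive variant. Several parts are assumptions made for illustration: the search box, the velocity and position clamping, and the smooth function `toy_cv_accuracy`, which merely stands in for the 5-fold cross validation accuracy of an RBF SVM.

```python
import random


def standard_pso(fitness, bounds, n_particles=20, n_iter=100,
                 w=1.0, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over a box using the textbook PSO update."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Assumption: clamp velocity and position to keep particles in the box.
                vmax = hi - lo
                vel[i][d] = max(-vmax, min(vmax, vel[i][d]))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)   # best fitness so far, one entry per iteration
    return gbest, gbest_fit, history


def toy_cv_accuracy(p):
    """Hypothetical stand-in for cross validation accuracy; peaks at (3, -2)."""
    log2_c, log2_gamma = p
    return 1.0 - 0.01 * ((log2_c - 3.0) ** 2 + (log2_gamma + 2.0) ** 2)


search_box = [(-10.0, 10.0), (-10.0, 10.0)]   # hypothetical log2 search range
best, best_fit, history = standard_pso(toy_cv_accuracy, search_box)
print("best position:", best, "best fitness:", best_fit)
```

The `history` list plays the role of a fitness curve like those in Figures 14 and 15. In a real run, `fitness` would train and cross-validate an SVM for each candidate parameter pair, which is why the computation time of PSO-based tuning is dominated by fitness evaluations.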

After the best penalty and kernel parameters are obtained, the optimized SVM model is trained on the training set using these parameters. Finally, the test set is predicted by the SVM model. In libsvm [44], the default penalty parameter is 1 and the default RBF kernel parameter is the reciprocal of the data dimension. SVM with default parameters and linear SVM were also run for comparison. The classification results and computation time are shown in Table 2. The classification accuracy of the proposed method based on improved LLC and adaptive PSO-optimized SVM is the highest, which indicates that the proposed method can effectively distinguish the different states of rolling bearings and thus realizes bearing fault diagnosis. The accuracy is relatively low with the default parameters, which shows that the penalty and kernel parameters have a great effect on the performance of SVM with RBF kernel function, so parameter optimization is essential. Compared with linear SVM, the computation time of adaptive PSO-optimized SVM is much longer, but the accuracy is clearly higher, which supports the effectiveness of the RBF kernel. Compared with standard PSO, adaptive PSO has higher accuracy and shorter computation time, which validates the superiority of adaptive PSO.
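To see why the kernel parameter matters, the following sketch evaluates the RBF kernel for two nearby 40-dimensional vectors, once with the libsvm default (the reciprocal of the data dimension) and once with a larger value. The function `rbf_kernel`, the two vectors, and the alternative value 8.0 are hypothetical choices for illustration only.

```python
import math


def rbf_kernel(x, y, gamma=None):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma defaults to 1 / dimension."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if gamma is None:
        gamma = 1.0 / len(x)   # libsvm default: 1 / number of features
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical 40-dimensional codes that differ by 0.1 in every coordinate.
u = [0.0] * 40
v = [0.1] * 40

k_default = rbf_kernel(u, v)             # gamma = 1/40
k_large = rbf_kernel(u, v, gamma=8.0)    # a much narrower kernel
print("default gamma:", k_default)
print("gamma = 8.0  :", k_large)
```

With the default value the two vectors look almost identical to the kernel, while a larger value separates them sharply. This is one plausible reading of why default parameters can give low accuracy on small-magnitude sparse codes, although the paper does not state the cause.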

#### 6. Conclusion

A novel method based on the improved LLC algorithm and adaptive PSO-optimized SVM is proposed for fault diagnosis of rolling bearings. The experimental results show that the improved LLC algorithm is superior to traditional LLC. Moreover, the choice of dictionary has a large influence on the LLC codes. The dictionary learned by the class-wise K-SVD algorithm produces more discriminative LLC codes than the dictionary generated by selecting the samples themselves or by other dictionary learning methods such as K-means and K-SVD. Some other preliminary features were also employed for comparison, which shows that the high level sparse feature based on improved LLC outperforms the preliminary feature, especially when the raw signal is used directly. The optimization results of SVM with RBF kernel function show that parameter selection is very important and that adaptive PSO can greatly enhance the performance of SVM. Possible future work includes using other sparse coding algorithms to extract the sparse features of bearing vibration signals more effectively.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 51575339 and 51475286).