#### Abstract

Extracting features from vibration signals and diagnosing rolling bearing faults remain difficult. To address this, we propose a novel diagnosis method that combines the multisynchrosqueezing S transform with faster dictionary learning (MSSST-FDL). First, MSSST transforms vibration signals into high-resolution time-frequency images. Then, the local binary pattern (LBP) operator extracts low-dimensional texture features from the time-frequency images, which speeds up fault recognition. Finally, nonnegative matrix factorization (NMF), which has only one hyperparameter, and the nonnegative linear equation (NLE) are used for dictionary learning and feature coding, respectively. The feature coding is input into the classifier for training and recognition. Experiments show that our method performs well on the rolling bearing datasets of Case Western Reserve University (CWRU) and the Society for Machinery Failure Prevention Technology (MFPT). Further, the proposed method is applied to a loudspeaker pure-tone detection dataset, where it achieves loudspeaker anomaly diagnosis. The diagnosis results indicate that our method can meet the needs of practical engineering.

#### 1. Introduction

Rotating machinery plays an increasingly important role in electric manufacturing, transportation, power, and other industries. As the core component of rotating machinery, rolling bearings directly affect the operation of the entire equipment. However, damage to rolling bearings is inevitable and may cause serious economic losses and even safety accidents [1–3]. Thus, it is of great significance to detect bearing faults in time and take appropriate maintenance measures according to the diagnosis results [4, 5].

Vibration signals contain rich information and reflect the running state of rotating machinery well, so they have become the most commonly used signal source in fault diagnosis of rotating machinery. The vibration signals of rolling bearings are markedly nonlinear and nonstationary, which makes it difficult to recognize faults directly from them. By processing the vibration signal with an appropriate time-frequency analysis method, we can obtain how its spectrum varies with time. The idea of time-frequency analysis originated from the Gabor transform [6]. Thereafter, the short-time Fourier transform (STFT) [7], continuous wavelet transform (CWT) [8], and S transform (ST) [9] appeared successively. Although these methods are easy to implement, Heisenberg’s uncertainty principle [10, 11] prevents them from improving time and frequency resolution simultaneously. To obtain time-frequency images of vibration signals with better energy concentration, Daubechies et al. [12] proposed the synchrosqueezed wavelet transform (SSWT). In essence, it is a time-frequency analysis method based on energy rearrangement: starting from the CWT, the spectrum energy is redistributed and concentrated around the instantaneous frequency [13]. Based on this idea, Huang et al. [14] proposed the synchrosqueezing S transform (SSST), and Yu et al. [15] proposed the multisynchrosqueezing transform (MSST). SSST is an energy rearrangement algorithm based on ST, and ST performs better than STFT. MSST is a multiple energy rearrangement algorithm based on STFT; that is, the synchrosqueezing transform is applied iteratively, and the more iterations are performed, the better the time-frequency energy concentration becomes. Combining the advantages of both SSST and MSST, the multisynchrosqueezing S transform (MSSST) [16] was proposed as a new time-frequency analysis method for rolling bearing vibration signals.

Although the time-frequency image of a rolling bearing vibration signal is more intuitive than the raw signal, its dimension is too large for it to be input directly into the classifier for training and recognition. Therefore, it is necessary to extract low-dimensional, informative features from the time-frequency image. In recent years, many scholars have studied this problem. In references [17–19], a convolutional neural network (CNN) was used to extract the features of CWT, STFT, and HHT time-frequency images, respectively. Li et al. [20] proposed convolution sparse self-learning (CSSL) to extract the morphological features of defective bearings. Wang et al. [21] designed a one-dimensional vision ConvNet (VCN) to extract multiscale sensitive features of bearings in complex operating environments. These deep learning-based methods are relatively new, but they have three major problems: (1) they require a large number of training samples, which are expensive and difficult to obtain; (2) it is challenging and time-consuming to train a good CNN from scratch with only a small number of samples; (3) many hyperparameters have to be predetermined for a CNN, such as the activation functions, number of epochs, learning rate, momentum, kernel sizes, and number of layers.

Therefore, traditional machine learning methods based on feature engineering are still worth further study. In reference [22], two-dimensional nonnegative matrix factorization (2DNMF) was developed to extract more informative features from ST time-frequency images for accurate fault recognition. In reference [23], the features of ST time-frequency images were first extracted by nonnegative matrix factorization (NMF) [24, 25], and then the nondominated sorting genetic algorithm (NSGA-II) was used to perform a secondary selection of features. Yu et al. [26] proposed a rolling bearing fault diagnosis method based on the Hilbert–Huang transform (HHT) and supervised sparse coding (SSC). This method adopted SSC to obtain a sparse representation of the marginal spectrum generated by HHT and used a support vector machine (SVM) to achieve fault recognition. Although these methods obtain high recognition precision on their own datasets, they may not be suitable for simultaneous diagnosis of fault location and damage degree.

In this regard, Sun et al. [27] proposed a method combining MSST and sparse feature coding based on dictionary learning (SFC-DL). Li et al. [28] designed a symplectic weighted sparse SMM (SWSSMM) model with a sparsity constraint and a low-rank constraint, and Li et al. [29] developed the discriminative manifold random vector functional link neural network (DMRVFLNN) model. None of these fault diagnosis methods resolves the problem of selecting the optimal parameters for its model. More importantly, the method proposed by Sun et al. consumes considerable computing resources and running time because all the elements of the time-frequency images are used for nonnegative matrix factorization with sparseness constraints (NMFSC) [30] and sparse coding.

To overcome the abovementioned problems, we propose to first extract the texture features of time-frequency images with the local binary pattern (LBP) [31, 32] operator and then use these texture features for dictionary learning. The LBP algorithm is simple to compute, with linear time and space complexity. Meanwhile, the texture features of time-frequency images are information-rich and low-dimensional, which can greatly improve the performance of dictionary learning and feature coding. In addition, we use NMF instead of NMFSC for dictionary learning and reduce the number of hyperparameters to one, which makes the optimal fault diagnosis model easier to obtain. We name these optimizations faster dictionary learning (FDL).

In summary, the main contributions of this article are as follows:

(1) A new method for rolling bearing fault diagnosis is proposed by combining MSSST and FDL, which is named MSSST-FDL.

(2) MSSST is adopted to obtain high-resolution time-frequency images of vibration signals, which can promote the accuracy of fault diagnosis.

(3) To improve the speed of feature extraction from time-frequency images, we design the FDL algorithm by introducing LBP and NMF.

(4) Experimental results on two rolling bearing datasets and one loudspeaker dataset show that the proposed method performs well and has the potential to be applied to different types of equipment.

The remainder of this article is organized as follows. Section 2 introduces the theory of MSSST and FDL. Section 3 presents the experimental comparisons. An extended application of the proposed method is shown in Section 4. Finally, the conclusions are presented in Section 5.

#### 2. The Proposed Method

In this paper, a new method based on time-frequency analysis and improved dictionary learning is proposed for fault diagnosis of rolling bearings. The main procedures are as follows:

*Step 1.* Raw vibration signals of the rolling bearing are collected under different working conditions, and their states are noted.

*Step 2.* The raw signals are segmented so that each sample signal contains at least one complete period.

*Step 3.* MSSST is applied to the sample signals to obtain high-resolution time-frequency images.

*Step 4.* FDL is used to process the time-frequency images, which quickly yields an effective feature coding of each sample.

*Step 5.* The feature coding set is divided into the training, validation, and testing sets. Then, the diagnosis model is obtained by cross-validation on the training set and the validation set.

*Step 6.* The testing set is input into the diagnosis model for fault diagnosis.

Figure 1 shows the overall framework of the proposed method, and the following subsections provide details of MSSST and FDL.

##### 2.1. Multisynchrosqueezing S Transform

In practice, raw vibration signals are always nonlinear and nonstationary, so these complex signals need to be processed before diagnosis. Time-frequency analysis is an effective approach to reveal the frequency components and time-varying features of vibration signals. Among the various time-frequency analysis methods, MSSST combines the advantages of SSST and MSST to generate better energy concentration and suppress cross-terms over the time-frequency plane.

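Let the vibration signal be $x(t)$; the expression of its ST is then as follows:

$$ST(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, w(\tau - t, f)\, e^{-i 2\pi f t}\, dt, \qquad w(\tau - t, f) = \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{f^{2}(\tau - t)^{2}}{2}}, \tag{1}$$

where $f$ is the frequency, $\tau$ represents the time-axis displacement parameter, $i$ is the imaginary unit, and $w(\tau - t, f)$ is the window function.

It can be seen from equation (1) that the window function of ST is flexible: its width changes with the frequency $f$, being wider in the low-frequency part and narrower in the high-frequency part. This not only remedies the fixed-window shortcoming of STFT but also inherits the multiresolution characteristics of CWT. Therefore, combining the advantages of ST with the iterative compression idea of MSST, the MSSST can be expressed as

$$\begin{aligned} MSSST^{[1]}(\tau, \eta) &= \int_{-\infty}^{+\infty} ST(\tau, f)\, \delta\big(\eta - \hat{f}(\tau, f)\big)\, df, \\ MSSST^{[N]}(\tau, \eta) &= \int_{-\infty}^{+\infty} MSSST^{[N-1]}(\tau, f)\, \delta\big(\eta - \hat{f}(\tau, f)\big)\, df, \quad N \ge 2, \end{aligned} \tag{2}$$

where $ST(\tau, f)$ is the ST of signal $x(t)$, $MSSST^{[N]}(\tau, \eta)$ is the MSSST after *N* iterations, and $\hat{f}(\tau, f)$ is the instantaneous frequency (IF) estimate based on ST, which is defined as follows [14]:

$$\hat{f}(\tau, f) = f + \frac{1}{i 2\pi}\, \frac{\partial_{\tau} ST(\tau, f)}{ST(\tau, f)}. \tag{3}$$

We substitute $MSSST^{[1]}$ into $MSSST^{[2]}$, and then $MSSST^{[2]}$ can be expressed as follows:

$$MSSST^{[2]}(\tau, \eta) = \int_{-\infty}^{+\infty} ST(\tau, f)\, \delta\Big(\eta - \hat{f}\big(\tau, \hat{f}(\tau, f)\big)\Big)\, df. \tag{4}$$

The rest can be done in the same manner, and then $MSSST^{[N]}$ can be expressed as follows:

$$MSSST^{[N]}(\tau, \eta) = \int_{-\infty}^{+\infty} ST(\tau, f)\, \delta\big(\eta - \hat{f}^{[N]}(\tau, f)\big)\, df, \tag{5}$$

where $\hat{f}^{[1]}(\tau, f) = \hat{f}(\tau, f)$ and $\hat{f}^{[N]}(\tau, f) = \hat{f}\big(\tau, \hat{f}^{[N-1]}(\tau, f)\big)$.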

The MSSST uses the S transform of the SSST to obtain time-frequency coefficients with better energy concentration. At the same time, combined with the idea of multiple iterations in the MSST, the time-frequency results can be further sharpened. After one iteration, MSSST will construct a new IF estimate to reassign the blurry ST energy. Therefore, after several iterations, the IF estimation in the MSSST will get closer and closer to the real IF of the vibration signal. Namely, the energy of the time-frequency distribution can be gradually concentrated.
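The iterative reassignment can be illustrated with a short Python sketch. It is not the authors' MATLAB implementation, and several choices are assumptions made here for illustration only: the ST is computed with the usual FFT-based discrete algorithm on the analytic signal, the IF estimate is rounded to the nearest frequency bin, and magnitudes rather than complex coefficients are reassigned (enough to form a time-frequency image, but not to reconstruct the signal). The function names `s_transform_with_if` and `mssst` are hypothetical.

```python
import numpy as np


def s_transform_with_if(x):
    """Discrete S transform and ST-based IF estimate, both in frequency bins."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Assumption: work on the analytic signal (negative frequencies removed).
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[1:n // 2] = 2.0
        weights[n // 2] = 1.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    spectrum = np.fft.fft(x) * weights
    m = np.fft.fftfreq(n) * n            # signed frequency offsets in bins
    n_freq = n // 2 + 1
    st = np.zeros((n_freq, n), dtype=complex)
    dst = np.zeros((n_freq, n), dtype=complex)  # proportional to d(ST)/d(tau)
    st[0] = x.mean()
    for k in range(1, n_freq):
        # Gaussian window in the frequency domain; it gets wider as k grows.
        shifted = np.roll(spectrum, -k) * np.exp(-2.0 * np.pi**2 * m**2 / k**2)
        st[k] = np.fft.ifft(shifted)
        dst[k] = np.fft.ifft(m * shifted)
    mag = np.abs(st)
    valid = mag > 1e-8 * mag.max()       # skip numerically empty coefficients
    offset = np.zeros((n_freq, n))
    offset[valid] = np.real(dst[valid] / st[valid])
    if_bins = np.arange(n_freq)[:, None] + offset   # f + dST / (i 2 pi ST)
    return st, if_bins


def mssst(x, n_iter=3):
    """Return (|ST|, squeezed magnitude image) after n_iter reassignments."""
    st, if_bins = s_transform_with_if(x)
    n_freq, n_time = st.shape
    target = np.clip(np.rint(if_bins), 0, n_freq - 1).astype(int)
    cols = np.broadcast_to(np.arange(n_time), target.shape)
    mapped = target
    for _ in range(n_iter - 1):
        # Re-evaluate the IF estimate at the previously estimated frequency.
        mapped = target[mapped, cols]
    squeezed = np.zeros((n_freq, n_time))
    np.add.at(squeezed, (mapped, cols), np.abs(st))
    return np.abs(st), squeezed
```

For a pure tone, one pass should already move every significant coefficient to the bin of the tone; for signals whose frequency varies with time, larger `n_iter` values are expected to sharpen the ridge further, at the cost of one extra index lookup per iteration.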

##### 2.2. Faster Dictionary Learning

The dimension of the time-frequency image generated by MSSST is 1600 × 800, which is too high. If all the elements of the time-frequency image are directly input into the classifier for training and recognition, serious overfitting will occur. Therefore, we propose the FDL algorithm to extract the effective features of time-frequency images. The procedures of FDL are shown in Figure 2. First, the texture feature vectors of the MSSST time-frequency images are extracted by the LBP operator, and each feature vector is taken as one column of *V*. Next, one-tenth of the samples are selected uniformly at random to compose the matrix $V_s$, which is decomposed by NMF to generate the dictionary *W*. Finally, using *W*, the feature coding set of all samples is solved by the nonnegative linear equation (NLE). More details are described in the following subsections.

###### 2.2.1. Local Binary Pattern Operator

As shown in Figure 3, the original LBP operator is defined on a 3×3 square window. The center pixel of the window is taken as the threshold and compared with its 8 adjacent pixels. If a surrounding pixel value is greater than the value of the center pixel, that pixel is marked as 1; otherwise, it is marked as 0. Reading the 8 resulting bits as a binary number gives one of $2^8 = 256$ possible binary patterns.

To further reduce the number of binary patterns and improve the statistics, the uniform pattern [32] was designed, which counts the number of transitions between 0 and 1 in the circular binary code of the LBP operator. If the number of transitions is less than or equal to 2, the code is called a uniform pattern; all codes that are not uniform patterns are grouped into a single class. As a result, the number of patterns is reduced from 256 to 59 (58 uniform patterns plus one class for all the others). Namely, the dimension of the feature vector of the time-frequency image is 59. The low-dimensional texture features of time-frequency images extracted by LBP can speed up dictionary learning and feature coding.
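As a minimal sketch of the operator described above, the following Python code computes a 59-bin uniform LBP histogram for a grayscale image. The function names `uniform_lookup` and `lbp_histogram`, the clockwise neighbour order, the skipping of border pixels, and the normalisation of the histogram are assumptions made for illustration; the text does not specify them.

```python
import numpy as np


def uniform_lookup():
    """Map each of the 256 LBP codes to one of 59 bins (58 uniform + 1 shared)."""
    table = np.zeros(256, dtype=int)
    next_bin = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        # Count 0/1 transitions around the circular 8-bit code.
        jumps = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if jumps <= 2:
            table[code] = next_bin      # each uniform pattern gets its own bin
            next_bin += 1
        else:
            table[code] = 58            # all other patterns share the last bin
    return table


def lbp_histogram(image):
    """Normalised 59-bin uniform LBP histogram of a 2-D array."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]            # border pixels are skipped (assumption)
    # 8 neighbours, clockwise from the top-left corner (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Strict comparison, as in the text: neighbour > center gives 1.
        codes |= (neighbour > center).astype(int) << bit
    hist = np.bincount(uniform_lookup()[codes].ravel(), minlength=59)
    return hist / hist.sum()
```

Each pixel is visited a fixed number of times, so the cost grows linearly with the number of pixels, and the output length is 59 regardless of the image size.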

###### 2.2.2. Feature Coding for LBP

By extracting the LBP features of the time-frequency images, the feature size can be reduced to 59. However, there are still some redundant features that affect the recognition accuracy. The optimal dimension can be further determined by feature coding so as to achieve the best recognition.

Before feature coding for LBP, it is necessary to find its basis dictionary. This is done by NMF, which is defined as follows:

$$V_s \approx W H_s, \tag{6}$$

where $V_s \in \mathbb{R}_{+}^{m \times n}$, $W \in \mathbb{R}_{+}^{m \times r}$, and $H_s \in \mathbb{R}_{+}^{r \times n}$ are nonnegative matrices, $V_s$ is the small-batch set of the LBP features, $W$ is the basis dictionary, and $H_s$ is the feature coding set. Each column of $V_s$ is the LBP feature of one time-frequency image, and each column of $H_s$ is the feature coding of the corresponding LBP feature. Thus, *r* controls the dimension of the feature coding.

NMF has only one hyperparameter, *r*, so using it for dictionary learning makes the optimal model easier to find. All the remaining samples can be represented by different linear combinations of the column vectors of the basis dictionary. The coefficients of these linear combinations are called the feature coding. We can solve for the feature coding by the nonnegative linear equation (NLE), which is expressed as follows:

$$v_i = W h_i, \quad \text{s.t. } h_{ij} \ge 0, \; j = 1, \ldots, r, \tag{7}$$

where $v_i$ represents the *i*th sample, $h_i$ represents the corresponding feature coding of $v_i$, and $h_{ij}$ represents the *j*th element in the feature coding $h_i$.

By equation (7), we can obtain the vector set $\{h_1, h_2, \ldots\}$ in Figure 2. This set is the feature coding set of all LBP features, which is denoted as *H* and input into the classifier for training and fault recognition.
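The two stages, learning a nonnegative dictionary from a small batch and then coding every sample against it, can be sketched in Python as follows. The text does not state which solvers were used, so this sketch assumes the classical multiplicative update rules for NMF and reuses the same update with the dictionary held fixed as an approximate nonnegative solver for the coding step. The names `learn_dictionary` and `encode`, the iteration counts, and the initialisation are all hypothetical.

```python
import numpy as np


def learn_dictionary(V_s, r, n_iter=500, eps=1e-9, seed=0):
    """NMF by multiplicative updates: V_s (m x n) ~ W (m x r) @ H_s (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = V_s.shape
    W = rng.random((m, r)) + eps
    H_s = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H_s *= (W.T @ V_s) / (W.T @ W @ H_s + eps)
        W *= (V_s @ H_s.T) / (W @ H_s @ H_s.T + eps)
    return W, H_s


def encode(W, V, n_iter=500, eps=1e-9):
    """Nonnegative coding of every column of V against a fixed dictionary W."""
    r = W.shape[1]
    H = np.full((r, V.shape[1]), 1.0 / r)   # strictly positive starting point
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Same update as in NMF, but W is not changed, so H stays nonnegative.
        H *= WtV / (WtW @ H + eps)
    return H
```

In use, `V` would hold one 59-dimensional LBP feature per column, `V_s` would be a random one-tenth subset of its columns, and the columns of `encode(W, V)` would be passed to the classifier. An exact nonnegative least-squares routine such as SciPy's `scipy.optimize.nnls` is a possible alternative for the coding step.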

#### 3. Experimental Study

To verify the performance of the proposed method, the Case Western Reserve University (CWRU) [33] and the Machinery Failure Prevention Technology (MFPT) [34] rolling bearing vibration signal datasets are selected for experiments. All experiments are carried out with Windows 10 (64 bit), CPU Intel Xeon [email protected] GHz, memory 64 GB, and MATLAB 2017b.

##### 3.1. Description of Dataset

The CWRU test platform is mainly composed of a motor, torque sensor, dynamometer, and electronic control equipment. The designation of the tested bearing is 6205-2RS JEM SKF, which is located at the drive end. Electrical discharge machining (EDM) is adopted to set faults at three different positions (inner race, outer race, and ball) of the bearing. Each fault location has three different degrees of damage (fault diameters of 0.007, 0.014, and 0.021 inches, respectively). Therefore, after adding the normal state, there are 10 kinds of health conditions of rolling bearings. Vibration signals under each health condition are collected at four different motor loads (0, 1, 2, and 3 hp) and speeds (1797, 1772, 1750, and 1730 r/min) with a sampling frequency of 12 kHz. To retain as many data features as possible and increase the number of samples, the 40 raw samples obtained under 10 health conditions and 4 working conditions are further divided. As shown in Figure 4, each raw sample is divided consecutively into 150 samples of 800 sampling points each. The total number of samples is therefore 6000: 600 samples for each health condition, made up of 150 samples for each of its four working conditions. These are divided into the training set, validation set, and testing set at a ratio of 6 : 2 : 2. To improve the robustness of the diagnosis method and meet the needs of practical engineering, the influence of working conditions on fault recognition is not considered; that is, samples from all working conditions of a health condition are pooled. At the same time, to avoid chance effects, the 150 samples under each working condition are randomly divided into the training set (90), validation set (30), and testing set (30). The details of the CWRU dataset are shown in Table 1.
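The segmentation and the 6 : 2 : 2 split described above can be sketched as follows. This is an illustrative Python sketch rather than the authors' code; the function names `segment` and `split_indices` and the fixed random seed are assumptions.

```python
import numpy as np


def segment(signal, n_segments=150, length=800):
    """Cut a raw record into consecutive, non-overlapping samples."""
    signal = np.asarray(signal)
    needed = n_segments * length
    if signal.size < needed:
        raise ValueError("record is too short for the requested segmentation")
    return signal[:needed].reshape(n_segments, length)


def split_indices(n_samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split sample indices into training, validation, and testing."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```

Applied to one raw record of at least 120,000 points, `segment` yields 150 samples of 800 points, and `split_indices(150)` yields 90, 30, and 30 indices, matching the counts given in the text for each working condition.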

The MFPT dataset includes inner race fault, outer race fault, and normal health conditions with a motor speed of 25 Hz. The sampling frequencies of the inner race and outer race data are both 97656 Hz. The inner race data are collected at 7 different motor loads (0, 50, 100, 150, 200, 250, and 300 lbs). The outer race data are collected at 7 different motor loads (25, 50, 100, 150, 200, 250, and 300 lbs). The sampling frequency of the normal data is 97656 Hz, and the motor load is 270 lbs. The MFPT data are segmented in the same way as the CWRU data, except that each segmented sample has 4000 sampling points. To reduce the redundant information in each sample and facilitate the subsequent operations, each sample is downsampled by keeping one point in every 5, which gives a final sample length of 800 sampling points. In the end, a total of 2,100 samples are obtained, with 700 samples for each health condition. For the inner and outer race faults, 100 samples correspond to each motor load. The ratio of the training set, validation set, and testing set is 6 : 2 : 2, and the sets are selected at random in the same way as for the CWRU dataset. The details of the MFPT dataset are shown in Table 2.
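The downsampling step amounts to keeping every fifth point, which can be written in one line of Python. The function name `downsample` is hypothetical, and the sketch assumes plain decimation; the text does not say whether a low-pass filter was applied beforehand, as is often done to limit aliasing.

```python
import numpy as np


def downsample(sample, step=5):
    """Keep one point in every `step` points (no anti-aliasing filter)."""
    return np.asarray(sample)[::step]
```

A 4000-point sample becomes an 800-point sample, and the effective sampling frequency drops from 97656 Hz to 97656 / 5 = 19531.2 Hz.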

##### 3.2. Time-Frequency Analysis of Vibration Signals

To demonstrate the superiority of MSSST, it is compared with several popular time-frequency analysis methods. One sample is drawn at random from each fault type in the two datasets for this study. Table 3 shows the Rényi entropy (RE) [35, 36] of the samples. The time-domain waveform of the vibration signal of the CWRU inner race fault is shown in Figure 5(a). Figures 5(b)–5(g) show the time-frequency distributions obtained by STFT, ST, Wigner–Ville transformation (WVT) [37], SSST, MSST, and MSSST, respectively.

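**Figure 5.** (a) Time-domain waveform of the CWRU inner race fault signal; time-frequency distributions obtained by (b) STFT, (c) ST, (d) WVT, (e) SSST, (f) MSST, and (g) MSSST.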

It can be seen from Figure 5 that the energy concentration of the time-frequency images produced by STFT and ST is poor, and WVT has serious cross-terms. Comparing SSST, MSST, and MSSST further, we find that MSST and MSSST have better energy concentration, which verifies that iterative compression improves the time-frequency analysis algorithm. We also find that the time-frequency image of MSSST contains less redundant information than that of MSST. Meanwhile, it can be seen from Table 3 that the RE of MSSST is always the lowest, which further indicates that the energy of the time-frequency distribution produced by MSSST is more concentrated. Therefore, in this paper, MSSST is adopted as the time-frequency analysis method for the rolling bearing vibration signal.
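For reference, a Rényi entropy of a time-frequency distribution can be computed as in the following Python sketch. The order `alpha=3` and the normalisation of the squared magnitude to unit sum are assumptions made here (order 3 is a common choice for time-frequency distributions); the text does not state which settings produced Table 3. The function name `renyi_entropy` is hypothetical.

```python
import numpy as np


def renyi_entropy(tfr, alpha=3.0):
    """Renyi entropy (in bits) of a time-frequency representation."""
    energy = np.abs(np.asarray(tfr)) ** 2
    p = energy / energy.sum()            # treat the energy as a distribution
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)
```

Under these assumptions, a distribution spread evenly over 16 cells has an entropy of 4 bits, whereas one concentrated in a single cell has an entropy of 0, which is why a lower value is read as better energy concentration.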

##### 3.3. Ablation Study

###### 3.3.1. Hyperparameter *r*

From Section 2.2.2, it can be seen that the hyperparameter *r* directly determines the dimension of the feature coding of a sample. To obtain the algorithm model, the value of *r* must therefore be determined. We vary *r* and combine different time-frequency analysis methods with FDL to carry out the experiments. To reduce the influence of chance, each method is run ten times for each value of *r*. We determine the value of *r* by the average recognition accuracy on the validation set.

Detailed results are available in Figure 6. Table 4 records the highest average recognition accuracy and its corresponding standard deviation and *r*. According to Figure 6, from the overall trend, the average accuracy of MSSST is higher than that of other time-frequency analysis methods. From Table 4, it can be observed that the standard deviations of diagnostic accuracy are relatively small, which indicates that the obtained models are relatively stable.
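The selection rule described above (ten repeated runs per candidate *r*, then the *r* with the highest mean validation accuracy) can be sketched in Python as follows. The callable `evaluate(r, seed)` is a hypothetical placeholder for one complete train-and-validate run and is not defined in the text; `select_r` is likewise an illustrative name.

```python
import numpy as np


def select_r(candidates, evaluate, n_runs=10):
    """Pick r by mean validation accuracy over repeated runs.

    `evaluate(r, seed)` must return the validation accuracy of one run.
    Returns (best_r, mean_accuracy, std_accuracy).
    """
    stats = {}
    for r in candidates:
        accs = np.array([evaluate(r, seed) for seed in range(n_runs)])
        stats[r] = (accs.mean(), accs.std())
    best_r = max(stats, key=lambda r: stats[r][0])
    return best_r, stats[best_r][0], stats[best_r][1]
```

Returning the standard deviation alongside the mean mirrors what Table 4 reports and gives a quick indication of how stable the selected model is.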

**Figure 6.** Average recognition accuracy on the validation set for different values of *r*, panels (a) and (b).

###### 3.3.2. Module Combination

We carry out a series of experiments to study the effect of the time-frequency analysis and feature extraction methods. The values of the hyperparameter *r* applied to the testing set are set according to Table 4. The experimental results are shown in Figure 7. We have the following three findings. (1) Under the same conditions, the recognition accuracy of MSSST is always higher than that of MSST. The reason is that MSST is based on STFT, whose window width is fixed, whereas MSSST is based on ST, whose window width varies with frequency; this improves the adaptability of the spectrum analysis and allows more detailed time-frequency characteristics to be extracted from vibration signals. (2) The features extracted using NMF + NLE have higher recognition accuracy than those extracted directly using the LBP operator. That is because NMF + NLE finds the optimal feature dimension, while LBP has a fixed dimension of 59. (3) When LBP is combined with NMF + NLE, the recognition accuracy is the highest. This indicates that when NMF + NLE is applied directly to the time-frequency image, the input dimension is too large, so too much redundant information is also extracted, which degrades recognition.

**Figure 7.** Recognition accuracy of the different module combinations, panels (a) and (b).

Taken together, these results suggest that whether in the CWRU dataset or the MFPT dataset, MSSST + FDL is the best combination.

###### 3.3.3. Time for Feature Extraction

To verify the time efficiency of the feature extraction algorithm in this paper, four different methods are used to extract the features of the MSSST time-frequency image, and the time taken by each is recorded. Two quantities are measured: the time spent in dictionary learning and the time spent in feature extraction of one sample.

As shown in Table 5, on the CWRU and MFPT datasets, dictionary learning on the texture feature vectors of the time-frequency images takes 0.037 and 0.012 hours, respectively. By comparison, dictionary learning on the time-frequency images themselves takes 16.469 and 3.897 hours, respectively, which is very time-consuming. In the fault recognition stage, using LBP alone takes the least time, but Section 3.3.2 shows that the recognition accuracy of this method is low. In addition, the feature coding of a whole time-frequency image takes 39,525 and 37,413.6 milliseconds, respectively.

By contrast, the proposed algorithm takes less than 100 milliseconds, which is more than 300 times faster and can better meet the real-time requirements of practical engineering applications. The feature coding algorithm in reference [27] is also very time-consuming, which further indicates the time efficiency of the proposed algorithm.

##### 3.4. Comparisons with Other Methods

To prove the effectiveness of the proposed method, Table 6 shows the recognition accuracy of different fault diagnosis methods for bearing faults.

Reference [27] directly adopted NMFSC + NLE to obtain the sparse coding of MSST time-frequency images and trained an SVM to diagnose bearing faults. The sparsity parameter was set to 0.7, and the rank parameter was set to 25 and 100 for the CWRU and MFPT datasets, respectively. In references [38, 39], the Hilbert–Huang transform and a convolutional neural network (HHT + CNN) were combined to recognize the bearing state. The former input CWRU time-frequency images of 32 × 32 pixels into the CNN, while the latter input MFPT time-frequency images of 32 × 32 and 96 × 96 pixels. In reference [40], wavelet packet energy features combined with multifractal features (WPE-MFs), with a feature size of 33, were used to train an SVM.

It can be seen from the results that, compared with other methods, our method has only one hyperparameter and can achieve higher recognition accuracy with fewer features.

#### 4. Extended Application

To demonstrate the practical engineering application value of the proposed method, we try to apply the proposed method to loudspeaker fault diagnosis.

##### 4.1. Data Description and Analysis

The loudspeaker signal acquisition system consists of a microphone, acquisition card, sweep generator, and acquisition software. First, the sweep generator touches the signal collection points of the loudspeaker. Then, the signal acquisition software synchronously collects the data of the loudspeaker sound signal. Finally, the label of the collected sound signal is marked.

The sampling frequency of the loudspeaker sound signal is 8 kHz. To increase the number of samples and ensure that each sample contains at least one complete period, the raw sound signal is randomly segmented into samples of 8000 sampling points, as shown in Figure 8. Each segmented sample is then downsampled to 1600 sampling points. It can be seen from Figure 8 that downsampling not only reduces the computational complexity but also retains the waveform of the segmented sample. Finally, we obtain a total of 2000 samples, which are divided into the training set, validation set, and testing set at a ratio of 6 : 2 : 2. More sample information is shown in Table 7.

**Figure 8.** Segmentation and downsampling of the loudspeaker sound signal, panels (a) and (b).

##### 4.2. Fault Diagnosis

The time-frequency images obtained by MSSST are shown in Figure 9. They have concentrated energy, but the dimension is 1600×3200, and there is a lot of irrelevant background information. Using the whole image as a raw sample for dictionary learning will consume a lot of computing resources and time, and redundant information will be added to the generated dictionary. Therefore, we can extract the optimal features of time-frequency images by FDL.

**Figure 9.** MSSST time-frequency images of the loudspeaker sound signals, panels (a) and (b).

Figure 10 shows the average recognition accuracy of the validation set under different *r* values. It can be seen that the optimal value of *r* is 46. At this time, the average recognition accuracy of the validation set is 99%. This model is applied to the testing set. The experimental results show that 2 normal samples are misdiagnosed as the abnormal state and 1 abnormal sample is misdiagnosed as the normal state. Namely, the recognition accuracy is 98.5%.

In reference [41], the second-order time-reassigned multisynchrosqueezing transform was used to obtain the time-frequency images of the loudspeaker sound signals, and CNN was adopted to implement the feature extraction and fault recognition. The accuracy was 98.25%. By comparison, the proposed method is not inferior to the deep learning method.

#### 5. Conclusions

In this paper, a new fault diagnosis method for rolling bearings, called MSSST-FDL, is proposed to boost the speed and accuracy of recognition. The experiments support three conclusions. First, MSSST has better energy concentration than the other time-frequency analysis methods. Second, time-frequency images with better energy concentration improve the quality of fault diagnosis. Third, dictionary learning and feature coding on LBP feature vectors are faster than on whole time-frequency images, which makes it possible both to determine the optimal hyperparameter quickly and to meet the real-time requirement of fault diagnosis. The effectiveness of the proposed method is verified on the CWRU and MFPT datasets, where the fault recognition accuracy is 99% and 97.85%, respectively. Furthermore, we apply the proposed method to loudspeaker anomaly diagnosis, and the recognition accuracy reaches 98.5%, which indicates that our method has the potential to be applied to other equipment.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (grant no. 51775177).