#### Abstract

The rolling bearing fault test signal has nonstationary and nonlinear characteristics. The feature extraction method based on variational mode decomposition (VMD) and permutation entropy can effectively measure the regularity of the signal and detect weak changes. Since the center frequency of the intrinsic mode function (IMF) of each fault test signal contains more details, this paper further extracts the multiscale permutation entropy feature for each IMF. The training samples and test samples of each IMF are constructed, and then the support vector machine (SVM) and the *K*-nearest neighbor algorithm (KNN) are used to identify the faults. The test results of the IMF components are used to determine the classification results combined with the maximum attribution index. Compared with the relevant feature extraction, the experimental results show that the method achieves a certain improvement in the accuracy of fault identification. The research results of rolling bearing fault data show that the multiscale permutation entropy and SVM/KNN can more accurately diagnose different fault modes, different fault sizes, and different operating states of rolling bearings.

#### 1. Introduction

In the diagnosis field of rolling bearings fault, machine learning-based fault diagnosis methods have attracted much attention, mainly including two procedures: feature extraction and pattern recognition.

In terms of feature extraction of faults, many statistical parameter estimation methods have been applied. In recent decades, entropy has commonly been used as the evaluation criteria of classification or clustering results, of which approximate entropy, permutation entropy, sample entropy, and fuzzy entropy have been introduced by many scholars in rolling bearing fault diagnosis.

In 1948, Shannon [1] proposed to take Shannon entropy as a measure of uncertainty for stochastic process results. As described in Shannon’s seminal paper, information entropy provides a measure of information content. High entropy levels imply a high degree of “unpredictability”in the system. Conversely, if each subsequent state of a system can be easily predicted from the previous state, the system is said to have low entropy. Subsequently, Kolmogorov [2] defined the concept of entropy for a new class of dynamic systems. In 1991, Pincus et al. [3, 4] modified Kolmogorov–Shannon (KS) entropy and created an approximate entropy (ApEn), which can be used to measure the complexity of real-life time series without coarse-grained procedures. In 2000, Richman and Moorman [5, 6] further modified ApEn and created sample entropy (SampEn), which has two advantages over ApEn: data length independence and higher relative agreement under multiple parameters. In 2002, Costa et al. proposed multiscale entropy (MSE) [7, 8] to address the problem of classifying healthy individuals, patients with congestive heart failure, and patients with unstable arrhythmias on a single time scale. The concept of a coarse-grained process was proposed in the entropy calculation process. In addition, Richman and Moorman [5] proposed cross-SampEn for the measurement of asynchrony and phase anisotropy of two different time series.

Guo et al. used dynamic time warping (DTW) and symplectic geometry mode decomposition(SGMD) to recognize the fault type of the test sample [9]. Kumar et al. employed an entropy-based state space model for estimating the remaining useful life (RUL) and carrying out the dynamic degradation monitoring of rolling element bearings [10]. Liu et al. solved the problem of failure samples being difficult to obtain by constructing FEM models with faults to obtain simulation signals [11]. Li et al. proposed a new complexity measure for analyzing time series and termed it as reverse dispersion entropy (RDE) [12].

In terms of uncertainty and randomness, sample entropy is superior to approximate entropy [9], and permutation entropy (PE) is superior to sample entropy [10]. PE has high sensitivity to signal change, strong noise elimination ability, and fast operation speed, as proposed by Bandt and Pompe, it is also available for the detection of the randomness of time series [11]. Considering these features, it can be used to quantify the fault sign gearbox signal present in vibration [12]. When any gear fault occurs during operation, it shows obvious signal characteristics. Therefore, the vibration signals obtained from such a system result in varied PE values in different health conditions. The smaller the entropy values, the lower the uncertainties present in the signal, and vice versa [13]. Due to the better predictability of the faults, the entropy will increase as the faults increase. As a consequence, PE can be used to distinguish bearing faults.

Among them, permutation entropy, as one of the most commonly used entropy methods, can effectively measure the regularity of signals and detect weak changes. It has the advantages of antinoise and invariance to nonlinear monotonic transformations. In the case of rolling bearing failure, the nonlinear dynamic complexity changes accordingly. Therefore, PE is well suited for feature extraction of rolling bearings. However, since PE analyzes time series on only one scale, the implementation of multiscale methods [14] has led to multiscale permutation entropy (MPE) techniques [15]. This enables permutation entropy to search for information contained in long-term trends in the signal, which can be lost if its original data points are analyzed directly. Similar to traditional single scale nonlinear metrics, PE can only measure the complexity of a signal (time series) on a single scale. In order to make up for the shortcomings, Aziz and Arif developed a multiscale permutation entropy (MPE) with better robustness than PE on the basis of PE [16], which was successfully applied to estimate the complexity and randomness of time series at different scales. Hsieh et al. used MPE to quantitatively analyze the vibration signals of rotating machinery at different scales and construct the original feature set [17]. A multichannel fused convolutional neural network was constructed to extract features from PE at multiple scales for fault identification of rolling bearings. Wang et al. proposed a data-driven method combining multiscale permutation entropy and linear local tangent spatial alignment to diagnose faults in vehicle suspension systems [18].

Furthermore, PE and MPE rely on pattern counts inside the signal, which itself is simply a sample series from a wider phenomenon. Thus, the measured entropy content is not a measure of certainty but an estimator [19, 20].

Considering the above theory and application results, after decomposing the original time series by VMD, we used multiscale permutation entropy (MPE) to perform feature extraction of intrinsic modal function components again, which were classified by the SVM method. Finally, the features were comprehensively judged using the calculation method of attribution degree. Multiscale permutation entropy of the original time series and VMD-based permutation entropy were applied for feature extraction. Moreover, two classification methods, KNN and SVM, were used for comparison.

#### 2. Multiscale Permutation Entropy

Permutation entropy is used to measure the complexity of time series. The size of the permutation entropy value corresponding to different time series varies. The more complex the time series, the greater the permutation entropy value; the more regular the time series, the smaller the permutation entropy value [21–23]. Due to the different harmonic content of current signals of different types of faults, the complexity is quite different. Therefore, the fault signal feature extraction can be realized by calculating the arrangement entropy of fault current signals. Compared with permutation entropy, multiscale permutation entropy can measure the complexity of time series from different time scales by overcoming the problem of a single feature of time series in a single scale permutation entropy metric. The steps to calculate multiscale permutation entropy are as follows: Step 1. Coarse-grain the IMF components obtained after decomposition at the s-scale and obtain a new time series at the s-scale, which is formulated as follows: In (1) is the scale factor; ; *N* is the sequence length; and indicates rounding down. Step 2. Perform phase space reconstruction with embedding dimension *m* and delay time *t* on the new time series after coarse-graining processing and incrementally arrange the interior of each subsequence . At this time, map each m-dimensional subsequence to one of the arrangements. Step 3. Through the above steps, represent an m-dimensional subspace as a symbol sequence and denote the probability distribution of all symbols by , where . Calculate the probability of the occurrence of each set of sequences at different time scales. The multiscale permutation entropy expression for each IMF component can be obtained as follows:When , take the maximum value of . Normalize , then

In (3): , which reflects the degree of randomness of the time series, of which the degree becomes stronger as the entropy value increases.

The multiscale permutation entropy algorithm is mainly affected by three parameters: scale factor, embedding dimension, and delay time. Therefore, it is very important to select three parameters of the embedding dimension *m*, delay time *τ,* and scale factor *s* of multiscale permutation entropy. Furthermore, the changes of these three parameters will affect our calculation results to a great extent. Zheng [24] et al. experimentally analyzed the change of the entropy value of the multiscale arrangement of the signal when the value of delay *τ* was different and verified that when *τ* was changed from 1–6, the entropy value of multiscale arrangement changed little, indicating that the effect of delay *τ* on entropy value of the multiscale arrangement was small, so *τ* = 1 was selected to calculate the entropy of the multiscale arrangement. Aziz and Arif [14] suggested that when the value of embedding dimension *m* was generally 3–7, the operation result reached the optimal state. When the value was 1–2, the number of reconstruction vector states was small, and the change of signal dynamics mutation could not accurately detect the change of signal. When the value was 6–7, the reconstruction of signal space homogenizes the time series, and the calculation amount of multiscale arrangement entropy increased dramatically and could not well reflect the subtle change in the time-series. He et al. [25] showed that the value of multiscale permutation entropy reaches an optimal state when the embedding dimension *m* = 5, *m* = 6, or *m* = 7. Therefore, in this paper, the embedding dimension *m* = 6 was selected.

In this paper, the vibration data of roller bearings were taken as the research object. The data source was the bearing vibration experimental data of the Electrical Engineering Laboratory of Case Western Reserve University. The driving end bearing model was deep groove ball bearing SKF6205, with 9 rollers, fault diameter of 0.1778 mm, motor load of 2 horsepower, motor speed of 1750 r/min, and sampling frequency of 12 KHz. The fault location was the inner ring, roller, and outer ring, so the data labels were divided into 4 types (normal state, inner ring fault, roller fault, and outer ring fault}. Figure 1 is the data display in one of the time series windows, and the number of sampling points is 2048.

It can be seen from the data diagram that the occurrence of a fault leads to the occurrence of a vibration shock, resulting in shock fluctuations in the amplitude of the three vibration signals. The shock frequency of the inner ring fault was large, the amplitude was small, and the shock frequency of the outer ring fault was relatively small, but the amplitude was large.

The results of the calculation of permutation entropy for the original time series of the four states are shown in Figure 2. The number of sampling points per frame was 2048, the head of the next frame was adjacent to the tail of the previous frame, and a total of 50 frames were obtained, with an average of 1.9247, 2.1105, 2.4871, and 1.5809 for the four states. Different thresholds were selected for comparison, and the results are shown in Table 1.

The normal state vibration data can be regarded as the superposition of noise and a certain frequency signal. According to the permutation entropy calculation rule, the appearance of a single periodic signal will make the entropy value reduce, and the amplitude and the number of a single signal will increase this reduction. So, the appearance of a periodic signal and its amplitude reduce the entropy value by 0.3438. For the roller, the irregular signal increases due to the irregular rotation of the roller itself, and then the entropy value increases.

The appearance of fault signals makes it difficult to analyze the entropy features of samples, so it is necessary to perform modal decomposition of the original time series.

#### 3. VMD

##### 3.1. VMD Rationale

VMD is a signal decomposition estimation method with better time-frequency distribution than EMD and LMD (local mean decomposition), and its overall framework is a variational problem, that is, it decomposes the signal according to the number of preset modal components. The original signal is decomposed into *K* modal function with central frequencies of , where *K* is the number of preset modal components. In the VMD algorithm, the intrinsic mode function (IMF) is redefined as an AMP-FM signal.where, phase is a nondecreasing function, ; the envelope is non-negative, ; and the envelope and the instantaneous frequency are retarded for the phase .

In order to obtain *K* modal components with certain bandwidth frequencies, first of all, for each modal function , the marginal spectrum is obtained by Hilbert transform; then, for each modal analytical signal, a central frequency is mixed and estimated; the spectrum of each modal is modulated to the corresponding base frequency band; then the square norm of the analytical signal gradient is calculated to estimate the bandwidth of each modal signal. The constrained variable problem is as follows:where, is number (*K*) of modal components obtained by decomposition; is the frequency center of each component; and is the pulse function.

In order to obtain the optimal solution to the above constrained variable problem, the quadratic penalty factor and Lagrange multiplication operator are introduced, and the alternate direction method of multipliers is used to solve the above variable problem. The saddle point of the extended Lagrange expression is gained by alternately updating , and .

The value of the modal function can be expressed as follows:where, *a* is the penalty parameter; is the Lagrange multiplier.

By using the Parseval/Plancherel Fourier isometric transform, the (6) is transited to the frequency domain.

The solution of the quadratic optimization is obtained after further transformation as follows:where, is the core of the power spectrum of the current modal function, and it can be seen that the Wiener filter is embedded in the VMD algorithm, and the algorithm has better noise robustness.

The value of the center frequency can be expressed as follows:

According to the same process, the value taking of the center frequency is first transformed into the frequency domain.

The solution to the quadratic optimization of the center frequency is as follows:

The VMD algorithm steps are as follows: Step 1. Initialize and *n*; Step 2. According to the formula (11) update and ; Step 3. Update ; Step 4. Repeat Steps 2 and 3, the cycle is ended until the iteration stop condition is met, and the result is output to obtain a modal component.

#### 4. Data Analysis and Discussion

VMD was used to decompose the original time series. It was necessary to determine the number of modes K and the penalty factor. In this paper, three cases of *K* = 3, 4, and 5 were analyzed, and the results are shown in Table 2–4. The center frequency aliasing was small in the case of *K* = 4, so *K* = 4 was determined. By referring to the references, the penalty factor was finally determined to be = 2000.

The original time series of various states are selected for VMD decomposition. The decomposition results of the normal state are selected, and the results are shown in Figure 3.

For the four selected bearing operation states (normal state, loss diameter 0.1778 inner ring fault, roller fault, and outer ring fault), the time windows of each running state were decomposed by VMD. The time window length selected was 2048, and the overlap between adjacent windows is determined. A total of 50 time windows were used for permutation entropy extraction, and the permutation entropy mean value of 50 time windows was also calculated. The results are shown in Figure 4.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

One of the time windows was selected for the extraction of entropy from multiscale samples, with a scale factor selection of 20 and nonoverlapping adjacent scales. The results are shown in Figure 5.

For each modal component of the four states, 100 time windows were selected and extracted for multiscale permutation entropy to obtain 100 20 training samples, and finally, 400 20 samples of the four states were labeled and used as input for SVM and KNN classifiers, with effects shown in Figure 6. The remaining IMF classification results were 100%.

**(a)**

**(b)**

To compare the accuracy of multiscale permutation entropy, the permutation entropy of modal components of each state was constructed as 100 5 training samples, and finally 400 5 samples of the four states were labeled and used as inputs for SVM and KNN classifiers. The effects are shown in Figure 7. It can be seen from the figure that four normal test samples are misjudged as roller faults when performing SVM classification.

**(a)**

**(b)**

To further compare the accuracy of multiscale permutation entropy, the time window of each state was used to extract the multiscale permutation entropy, which was constructed as 100 20 training samples. Finally, 400 20 samples of the four states were labeled and used as the input of SVM and KNN classifiers. From Figure 8, it can be seen that both classification methods reached 100% when performing classification.

**(a)**

**(b)**

For the selected four bearing operation states (normal state, inner ring fault with a loss diameter of 0.1778, roller fault, and outer ring fault), the test of data combination 1 was performed. The comprehensive test results are shown in Table Table 3. In the VMD-comprehensive test, the calculation method was to perform a membership degree calculation for the prediction results of four components and the residual error. That is, if one result belongs to one state with the most times, the result is judged to belong to the state. It can be seen from the table that the comprehensive judgment method can reach 100% judgment (Table 5).

To further validate the method proposed in this paper, the test of data combination 2 was performed. The running states of the four bearings were under load of 2 horsepower, the loss diameters of the inner ring fault were 0.1778 mm, 0.3556 mm, 0.5334 mm, and 0.7112 mm, respectively, which are shown in Table 6. It can be seen that in this test, except for misinformation in a small number of samples in IMF4 and residual errors, the VMD-synthesis method can reach 100% judgment.

The test of data combination 3 was performed. The four states were bearing normal operation states under different loads. From Table 7, it can be seen that the use of SVM for multiscale permutation entropy of the original time series has a better test effect; the use of KNN for permutation entropy obtained after VMD decomposition of the original time series has a better test effect; the use of VMD-multiscale permutation entropy combined with KNN has the best test effect, and its SVM classification effect is intermediate between the other two methods.

#### 5. Conclusion

In view of the fact that the central frequency of the intrinsic mode function (IMF) after test signal VMD decomposition contains more details, this paper further extracted the features of multiscale permutation entropy for each IMF. Subsequently, the training and test samples of each IMF were constructed, followed by the fault identification using support vector machines and KNN. The test results of the five modal components were applied to determine the classification results through the maximum assignment index. The feature extraction used at the same time includes the permutation entropy after VMD decomposition of the original time series as well as the multiscale permutation entropy of the original time series. The combinations of data used include varied fault modes, fault sizes, and operating states of rolling bearings. On account of obvious characteristics of fault mode and fault size, the above feature extraction mode could reach 99% extraction at the lowest. Compared with other feature extractions, this method could improve the accuracy of fault identification in different load test data combinations under normal operations. The research results of fault data of rolling bearings showed that the diagnosis of varied fault modes, fault sizes, and operating states of rolling bearings could be accurately performed based on multiscale permutation entropy and SVM/KNN.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research received financial supports from the Key Laboratory of Intelligent Industrial Equipment Technology of Hebei Province (Hebei University of Engineering) (NO. 202203) and China University Innovation Fund (NO. 2021DZ026).