#### Abstract

Signals with multiple components and fast-varying instantaneous frequencies reduce the readability of the time-frequency representations obtained by traditional synchrosqueezing transforms due to time-frequency blurring. We discussed a vertical synchrosqueezing transform, which is a second-order synchrosqueezing transform based on the short-time Fourier transform and compared it to the traditional short-time Fourier transform, synchrosqueezing transform, and another form of the second-order synchrosqueezing transform, the oblique synchrosqueezing transform. The quality of the time-frequency representation and the accuracy of mode reconstruction were compared through simulations and experiments. Results reveal that the second-order frequency estimator of the vertical synchrosqueezing transform could obtain accurate estimates of the instantaneous frequency and achieve highly energy-concentrated time-frequency representations for multicomponent and fast-varying signals. We also explored the application of statistical feature parameters of time-frequency image textures for the early fault diagnosis of roller bearings under fast-varying working conditions, both with and without noise. Experiments showed that there was no direct positive correlation between the resolution of the time-frequency images and the accuracy of fault diagnosis. However, the early fault diagnosis of roller bearings based on statistical texture features of high-resolution images obtained by the vertical synchrosqueezing transform was shown to have high accuracy and strong robustness to noise, thus meeting the demand for intelligent fault diagnosis.

#### 1. Introduction

Fault diagnosis of rotating machinery has long been studied, and many effective fault diagnosis methods have been proposed. A nonideal environment, with varying operating conditions, time-varying loads, fast-varying speeds, and random transient phenomena, results in extraordinarily complex vibration patterns in rotating machinery [1, 2]. Vibration signals are often compound, nonlinear, and nonstationary [3]. Faults occurring on rotating parts, such as bearings, gears, and rotors, will generate fault feature frequencies, leading to more complex frequency structures and enhancing the fast-varying instantaneous frequency (IF) [4–6]. Consequently, the extraction of effective fault features becomes more difficult. Even if rotating machinery operates in a stable state, some faults will cause a fast-varying stiffness, resulting in rapid oscillations of the IF [7], which will produce fast-varying fault characteristics.

As mentioned above, nonstationary operating conditions cause the components of vibration signals to show fast variations of both the IF and instantaneous amplitude (IA). Time-frequency analysis (TFA) is a powerful method to obtain insights into the time-frequency (TF) structures of nonstationary signals. However, traditional TFA methods cannot meet the needs of nonstationary signal analysis [8–12]. Time-frequency rearrangement (TFR), also known as the reassignment method (RM), is a remarkably effective method to process multicomponent and nonstationary signals. It improves the energy concentration by transferring the calculated value at each point to the center of gravity of the signal’s local energy distribution, but TFR technology (TFRT) lacks signal reconstruction capabilities [8]. Daubechies et al. proposed the synchrosqueezing transform (SST), a TF compression method based on the wavelet transform (WT), which is valuable in the study of audio signals [13]. The SST is similar to modal decomposition [14], and it combines the sparsity of the RM with the reconstruction ability that traditional TFR lacks. The synchrosqueezing wavelet transform (SWT) was first proposed for the subsequent processing of a continuous wavelet transform (CWT). With certain modifications, it can be applied to the short-time Fourier transform (STFT) and the wavelet packet transform (WPT) [15–18]. Daubechies further proposed the concept of the frequency and time concentration based on the STFT or SST, which are based on the WT [19]. Huang et al. extended the WT-based SST to an S-transform-based variant, SSST [20], which could better reflect the TF characteristics of the high-frequency and weak-amplitude components of a signal, and applied it to seismic spectrum analysis for verification [21, 22]. The corresponding squeezing effect was closely related to the underlying TF transformation. Scholars introduced TFR applications for multicomponent signals under unstable conditions, such as for seismic research [23–25], health monitoring [26, 27], paleoclimate analysis [28], and vibration analysis [29, 30]. Feng et al. [31] proposed an iterative generalized SST method characterized by the demodulation of the fast-varying transient signal into a carrier signal with a constant transient frequency to improve the TF energy concentration and used the proposed approach to detect and diagnose gearbox faults. Sheu et al. [32] applied the Rényi entropy to measure the TF representation of the SST and determine an optimal window width.

The change of speed has a considerable impact on the vibration signals through modulation, and a small change will give rise to significant frequency aliasing. When the speed changes significantly, the IF changes faster, the vibration signal becomes more nonstationary, and the TF spectrogram becomes more blurred. Much research has been conducted on the theory and application of the SST, most importantly in the analysis of multicomponent nonstationary signals. Oberlin et al. [33] improved the mode location and reconstruction of the traditional STFT-based SST using a second-order local estimate of the IF and extended the second-order SST to the CWT [34]. The applicability of this method to nonstationary signals, especially those with strong frequency modulation (FM) and multiple components, was verified. The TFR resolution is significantly improved, and this method allows component separation and mode reconstruction. Moreover, it is robust to noise. Pham and Meignen [35] proposed a high-order STFT-based synchrosqueezing transform and demonstrated that a TF image provided by the third- and fourth-order SST is much sharper than that obtained by the SST and second-order SST, especially for strongly nonlinear sinusoidal frequency modulations and high-order polynomial amplitude modulations. This study further improved the TFR. Hu et al. [36] proposed a time-frequency method based on the high-order SST and a multitaper empirical wavelet transform (MTEWT) for a wind turbine planetary gearbox under nonstationary conditions. They also proposed a high-order synchrosqueezing wavelet transform and applied it to planetary gearbox fault diagnosis under variable operating conditions [37]. Lu et al. [38] and Fei et al. [39] proposed the multifeature entropy distance for the process characteristic analysis and diagnosis of rolling bearing faults by the integration of four information entropies in the time, frequency, and TF domains and two kinds of signals: vibration signals and acoustic emission signals. Keshtegar et al. [40] proposed a sensitivity analysis using modified multiextremum response surface basis models (MRSM) to consider the effect of variations in the input variables on the nonlinear responses. Yu et al. proposed the multisynchrosqueezing transform (multi-SST) [41] and combined it with normalized TF coefficients [42] to detect the amplitudes of the weak components contained in a multicomponent signal. Yu et al. also proposed a second-order multi-SST [43], in which a second-order two-dimensional IF estimate was embedded in a multi-SST framework. Wang et al. [44] introduced a matching synchrosqueezing transform (MSST) that, like standard TF reassignment methods, simultaneously considered time and frequency variables and incorporated three estimators in a comprehensive and accurate IF estimator. The MSST can obtain a more concentrated TFR than common methods like the STFT, Hilbert–Huang transform, and SST. The effectiveness of the MSST was verified in practical applications for gearbox fault diagnosis and rotor rub-impact fault diagnosis. Wang et al. [45] proposed a parameterized time-frequency transform (PTFT) method to address the difficulty of accurately extracting the instantaneous rotational frequency from strongly nonstationary signals. Liu et al. [46] proposed an asymmetric penalty sparse model- (APSM-) based cepstrum analysis method to improve the cepstrum effectiveness. Peng et al. [47] developed the polynomial chirplet transform (PCT) to obtain a highly concentrated TFR, and an effective IF estimation algorithm was also proposed.

The second-order SST, high-order SST, multi-SST, and MSST have made great breakthroughs in the acquisition of highly concentrated TFR, and TF images obtained from nonstationary and multicomponent signals have high sharpness and good legibility. However, the robustness to noise has not been verified in the research of high-order SSTs, which is important for equipment fault diagnosis, because noise is unavoidable in real engineering. Multi-SST methods have been validated by numerical simulations, but experimental verification has only been carried out at constant speeds, and the performance depends on the iteration number, which is determined by the type of fault feature to be extracted.

For machine fault diagnosis, many methods, such as the multifeature entropy distance method, have been proposed. Most TFA-related methods rely on observations of an experienced technician to detect fault frequency features based on the frequency distributions in TF images, that is, manual diagnosis. This depends on the quality of the TFR, especially under time-varying operating conditions. This is the reason that scholars are committed to improving the IF and IA estimation accuracy and obtaining highly concentrated TF ridges, which can reduce the recognition difficulty for human eyes. For intelligent fault diagnosis, although high-order SSTs and the MSST greatly increase the concentration of TFR, there has been no verification that high-resolution TFR can improve the accuracy of intelligent diagnosis, because it depends on the extraction of fault characteristics. Therefore, in addition to the acquisition of high-quality TF images, this paper aims to determine whether highly concentrated TF images result in high accuracy of intelligent diagnosis using the textures of images with the same recognition algorithm, especially under fast-varying operating conditions and complex frequency structures. Based on a comprehensive comparison and our previous research [48], a second-order SST may be a better mathematical tool for vibration signal analysis.

The remainder of this paper is organized as follows. Section 2 introduces the concept of a multicomponent and fast-varying signal, reviews important concepts of the TFRT, STFT, and original STFT-based SST and describes two types of second-order SSTs. A model for signals with multiple components and fast-varying signals is defined in Section 3, and the performance of the second-order SST is analyzed theoretically by comparison to other methods in terms of the mode reconstruction quality and accuracy. A practical implementation of the second-order SST on the vibration signals of roller bearings with different fault types is described in Section 4 to validate its superiority, and it is applied to bearing fault diagnosis in Section 5. Conclusions are drawn in Section 6.

#### 2. Time-Frequency (TF) Transformation and Synchrosqueezing Transform (SST)

##### 2.1. Model of Multicomponent and Fast-Varying Signals

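Multicomponent and fast-varying signals are usually composed of several single-component fast-varying signals overlapping in the time domain. The mathematical model of a multicomponent mixed signal containing *K* signal components can be expressed as

$$f(t) = \sum_{k=1}^{K} f_k(t) = \sum_{k=1}^{K} A_k(t)\, e^{i 2\pi \phi_k(t)}, \tag{1}$$

where $\phi_k'(t)$ is the IF of the $k$th signal component and $A_k(t)$ and $\phi_k(t)$ are the fast-varying amplitude and fast-varying phase, respectively. For any $t$, $A_k(t) > 0$, $\phi_k'(t) > 0$, and $\phi_{k+1}'(t) > \phi_k'(t)$. $\phi_k'(t)$ is the IF, and $A_k(t)$ is the IA.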
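
As an illustration of this signal model, the following minimal sketch synthesizes a sum of amplitude- and frequency-modulated components. It assumes the common convention that each component has the form A_k(t)·exp(i2πφ_k(t)); the helper name `multicomponent_signal` and the two example components are hypothetical and are not the test signal used later in the paper.

```python
import numpy as np

def multicomponent_signal(t, amplitudes, phases):
    """Sum of K components A_k(t) * exp(2j*pi*phi_k(t)).

    amplitudes and phases are sequences of callables (hypothetical helper,
    assuming the exponential AM/FM convention stated in the lead-in).
    """
    return sum(A(t) * np.exp(2j * np.pi * phi(t)) for A, phi in zip(amplitudes, phases))

# Example with two illustrative components (values chosen arbitrarily):
t = np.arange(1000) / 1000.0
amplitudes = [
    lambda t: 1.0 + 0.2 * np.cos(2 * np.pi * t),   # slowly varying amplitude
    lambda t: np.full_like(t, 0.8),                 # constant amplitude
]
phases = [
    lambda t: 50.0 * t + 40.0 * t**2,               # IF = 50 + 80 t  (linear chirp)
    lambda t: 200.0 * t + 5.0 * np.sin(2 * np.pi * 2.0 * t) / (2 * np.pi),  # oscillating IF
]
f = multicomponent_signal(t, amplitudes, phases)
```

With these choices the second component's IF (between 195 and 205 Hz) stays above the first component's IF (50 to 130 Hz) at all times, so the components remain ordered in frequency.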

##### 2.2. Short-Time Fourier Transform- (STFT-) Based SST

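The Fourier transform of the function $f$ is defined as

$$\hat{f}(\eta) = \int_{\mathbb{R}} f(t)\, e^{-i 2\pi \eta t}\, dt, \tag{2}$$

where $\eta$ is the frequency. Given $\eta$, $\hat{f}(\eta)$ represents the component of the vibration at frequency $\eta$ over the entire time domain. We let $g$ be a real-valued even function with a unit norm. Considering the sliding window $g(\tau - t)$, the STFT of $f$ is defined as

$$V_f^g(t,\eta) = \int_{\mathbb{R}} f(\tau)\, g(\tau - t)\, e^{-i 2\pi \eta (\tau - t)}\, d\tau. \tag{3}$$

The representation $|V_f^g(t,\eta)|^2$ is called the spectrogram of $f$ in the TF plane, and another version of the STFT is often used, defined as

$$\int_{\mathbb{R}} f(\tau)\, g(\tau - t)\, e^{-i 2\pi \eta \tau}\, d\tau = e^{-i 2\pi \eta t}\, V_f^g(t,\eta). \tag{4}$$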

Thus, we can obtain the definitions of the so-called reassignment operators for the STFT reassignment method.

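Frequency reassignment operator:

$$\hat{\omega}_f(t,\eta) = \frac{1}{2\pi}\,\partial_t \arg V_f^g(t,\eta) = \eta - \frac{1}{2\pi}\,\mathrm{Im}\!\left(\frac{V_f^{g'}(t,\eta)}{V_f^g(t,\eta)}\right). \tag{5}$$

Group delay reassignment operator:

$$\hat{\tau}_f(t,\eta) = t - \frac{1}{2\pi}\,\partial_\eta \arg V_f^g(t,\eta) = t + \mathrm{Re}\!\left(\frac{V_f^{tg}(t,\eta)}{V_f^g(t,\eta)}\right), \tag{6}$$

where $V_f^{g'}$ and $V_f^{tg}$ denote the STFTs of $f$ computed with the windows $g'(t)$ and $t\,g(t)$, respectively.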

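Assuming that the window satisfies $g(0) \neq 0$ and is continuous at point 0, the signal can be expressed in terms of its STFT by

$$f(t) = \frac{1}{g(0)} \int_{\mathbb{R}} V_f^g(t,\eta)\, d\eta. \tag{7}$$

If the signal is analytic (i.e., $\hat{f}(\eta) = 0$ for $\eta \le 0$), then the integral domain for $\eta$ is restricted to $\mathbb{R}^{+}$.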

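As an alternative TF reassignment method, the STFT-based SST has three main steps. The first is to calculate the STFT representation of the analyzed signal according to equation (3). The second step is to calculate a candidate IF, $\hat{\omega}_f(t,\eta)$, using equation (5). The third step is the energy redistribution,

$$T_f(t,\omega) = \frac{1}{g(0)} \int_{\mathbb{R}} V_f^g(t,\eta)\, \delta\big(\omega - \hat{\omega}_f(t,\eta)\big)\, d\eta, \tag{8}$$

from the $(t, \eta)$ plane to the $(t, \omega)$ plane, where $\delta$ is a Dirac distribution. We focus on the synchrosqueezing transform in the STFT context.

Knowing the time-varying phase function $\phi_k(t)$, the mode $f_k$ can be retrieved as

$$f_k(t) \approx \int_{|\omega - \phi_k'(t)| < d} T_f(t,\omega)\, d\omega, \tag{9}$$

where $d$ sets the half-width of the integration interval around the ridge.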

A previous study showed that the STFT-based SST can also obtain reasonable accuracy with slowly time-varying frequency-modulated signals [15].
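
The three steps and the mode retrieval described above can be sketched for a single time instant as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes the STFT convention with the phase factor exp(-i2πη(τ - t)), a Gaussian window exp(-πu²/σ²) with g(0) = 1, a direct Riemann-sum STFT instead of an FFT, and nearest-bin reassignment. The function names (`stft_column`, `fsst_column`, `retrieve_mode`), the threshold, and all numerical values are hypothetical.

```python
import numpy as np

def stft_column(f, t, t0, etas, window_values):
    # Riemann-sum STFT at time t0 for all frequencies in etas:
    # V(t0, eta) = sum_tau f(tau) w(tau - t0) exp(-2i*pi*eta*(tau - t0)) dt
    dt = t[1] - t[0]
    kernel = np.exp(-2j * np.pi * np.outer(etas, t - t0))
    return kernel @ (f * window_values) * dt

def fsst_column(f, t, t0, etas, sigma=0.05, threshold=1e-8):
    u = t - t0
    g = np.exp(-np.pi * u**2 / sigma**2)      # assumed Gaussian window, g(0) = 1
    dg = -2.0 * np.pi * u / sigma**2 * g      # its derivative g'

    # Step 1: STFT with the window g (and with g', needed for step 2).
    V = stft_column(f, t, t0, etas, g)
    Vdg = stft_column(f, t, t0, etas, dg)

    # Step 2: candidate IF, omega = eta - Im(V^{g'} / V^{g}) / (2*pi),
    # evaluated only where the STFT is not negligible.
    keep = np.abs(V) > threshold
    omega = etas[keep] - np.imag(Vdg[keep] / V[keep]) / (2.0 * np.pi)

    # Step 3: move each coefficient from eta to the frequency bin nearest omega.
    # T[j] stores the contribution integrated over bin j (bin width included).
    d_eta = etas[1] - etas[0]
    bins = np.rint((omega - etas[0]) / d_eta).astype(int)
    inside = (bins >= 0) & (bins < etas.size)
    T = np.zeros(etas.size, dtype=complex)
    np.add.at(T, bins[inside], V[keep][inside] * d_eta)
    return V, T

def retrieve_mode(T, etas, ridge_frequency, d):
    # Mode value at this time instant: sum the squeezed coefficients
    # whose frequency lies within d of the ridge.
    near_ridge = np.abs(etas - ridge_frequency) <= d
    return T[near_ridge].sum()

# Usage on a pure 100 Hz tone sampled at 1 kHz (illustrative values).
fs = 1000.0
t = np.arange(1000) / fs
f = np.exp(2j * np.pi * 100.0 * t)
etas = np.arange(0.0, 201.0)
V, T = fsst_column(f, t, t0=0.5, etas=etas)
mode_sample = retrieve_mode(T, etas, ridge_frequency=100.0, d=3.0)
```

For this tone the STFT magnitude is spread over tens of frequency bins around 100 Hz, whereas the squeezed coefficients collapse onto the 100 Hz bin, and `mode_sample` approximates the signal value at t0. A practical implementation would use an FFT-based STFT over all time instants and a ridge detector instead of a known ridge frequency.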

##### 2.3. Second-Order SST

###### 2.3.1. Enhanced Version of Second-Order SST

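The hypothesis of the first-order SST is suitable for low-modulation signals; that is, for any time $t$, if $\phi_k''(t) \approx 0$, then $\hat{\omega}_f(t,\eta)$ can be used to estimate the frequency $\phi_k'(t)$. However, when $|\phi_k''(t)|$ is relatively large, the TF focusing ability is limited, which unfortunately leads to a fuzzy TF representation, and the ideal effect cannot be obtained. Hence, FM cannot be ignored in multicomponent and fast-varying signal analysis.

Motivated by this heuristic, a second-order difference using the phase of the STFT was proposed by Oberlin et al. [33]. If the modulation parameter is $\tilde{q}_f(t,\eta) = \partial_t \tilde{\omega}_f(t,\eta) / \partial_t \tilde{\tau}_f(t,\eta)$, where $\tilde{\omega}_f$ and $\tilde{\tau}_f$ are the complex-valued reassignment operators whose real parts are $\hat{\omega}_f$ and $\hat{\tau}_f$, then the frequency estimator of the second-order SST is calculated as

$$\hat{\omega}_f^{(2)}(t,\eta) = \begin{cases} \mathrm{Re}\big\{\tilde{\omega}_f(t,\eta) + \tilde{q}_f(t,\eta)\,\big(t - \tilde{\tau}_f(t,\eta)\big)\big\}, & \partial_t \tilde{\tau}_f(t,\eta) \neq 0, \\ \hat{\omega}_f(t,\eta), & \text{otherwise}. \end{cases} \tag{10}$$

The synchrosqueezed Fourier transform “squeezes” the TF spectrum by reassigning the amplitude from $(t, \eta)$ to $(t, \hat{\omega}_f(t,\eta))$. Based on this idea, the vertical second-order synchrosqueezing transform (VSST) representation is defined by replacing $\hat{\omega}_f$ by $\hat{\omega}_f^{(2)}$ in equation (8) to obtain

$$TV_f(t,\omega) = \frac{1}{g(0)} \int_{\mathbb{R}} V_f^g(t,\eta)\, \delta\big(\omega - \hat{\omega}_f^{(2)}(t,\eta)\big)\, d\eta. \tag{11}$$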

The reason the term “vertical” is used is related to the oblique synchrosqueezing transform (OSST), which is introduced in Section 2.3.2. The coefficients of the VSST at a certain moment only move in the vertical direction, that is, the frequency direction.

This significantly enhances the sharpness of the TFR, which leads to a more accurate and robust estimate of the local frequency of the signal, as illustrated below. This gives a sharpened energy distribution on the phase space [49].
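
To make the difference between the first- and second-order frequency estimators concrete, the sketch below evaluates both at a single time instant of a linear chirp. It is an illustration under assumptions rather than the implementation used here: it assumes the STFT convention with the phase factor exp(-i2πη(τ - t)), a Gaussian window, and the closed-form modulation estimate expressed through STFTs taken with the windows g, g', g'', t·g, and t·g', as commonly used in the second-order SST literature [33]. The function name `frequency_estimators` and all numerical values are hypothetical, and no thresholding of small coefficients is included.

```python
import numpy as np

def frequency_estimators(f, t, t0, etas, sigma=0.05):
    """First- and second-order IF estimates at time t0 (illustrative sketch)."""
    dt = t[1] - t[0]
    u = t - t0
    g = np.exp(-np.pi * u**2 / sigma**2)                     # assumed Gaussian window
    dg = -2.0 * np.pi * u / sigma**2 * g                     # g'
    d2g = (-2.0 * np.pi / sigma**2 + (2.0 * np.pi * u / sigma**2) ** 2) * g  # g''

    kernel = np.exp(-2j * np.pi * np.outer(etas, u))

    def stft(window_values):
        # Riemann-sum STFT at t0 for every frequency in etas.
        return kernel @ (f * window_values) * dt

    Vg, Vdg, Vd2g = stft(g), stft(dg), stft(d2g)
    Vtg, Vtdg = stft(u * g), stft(u * dg)

    # Complex reassignment operators (their real parts are the usual ones).
    omega = etas - Vdg / Vg / (2j * np.pi)
    tau = t0 + Vtg / Vg
    # Modulation (chirp-rate) estimate q = d(omega)/dt / d(tau)/dt.
    q = (Vd2g * Vg - Vdg**2) / (Vtg * Vdg - Vtdg * Vg) / (2j * np.pi)
    # Second-order estimate: correct omega by the chirp rate times the time offset.
    omega2 = omega + q * (t0 - tau)
    return omega.real, omega2.real, q.real

# Linear chirp with IF(t) = 100 + 200 t Hz, sampled at 1 kHz (illustrative values).
t = np.arange(1000) / 1000.0
f = np.exp(2j * np.pi * (100.0 * t + 0.5 * 200.0 * t**2))
etas = np.arange(180.0, 221.0)          # bins around the true IF of 200 Hz at t0 = 0.5
omega1, omega2, q = frequency_estimators(f, t, 0.5, etas)
```

For this chirp, the first-order estimate is biased by several hertz at bins away from the ridge, whereas the second-order estimate returns approximately 200 Hz at every bin and `q` recovers the chirp rate of 200 Hz/s. This is the property that allows the vertical reassignment to concentrate the energy of strongly modulated components.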

###### 2.3.2. Oblique Synchrosqueezing Transform

The SST only reallocates the energy of signals in the frequency direction, while ignoring the time direction [13]. When dealing with multicomponent signals with a fast-varying IF, the TFR from the SST will become fuzzy. The differences in the TFR if both the time and frequency directions are taken into account are unknown. To explore this problem, another second-order SST method, the OSST [50], is introduced for comparative analysis with the VSST. Unlike the VSST, which uses a first-order Taylor expansion of the derivative of the phase, the OSST directly includes the phase shift. The OSST is defined as

The strategy of the STFT-based SST (FSST) and VSST to deal with perturbations is to integrate the coefficients near the ridge to compensate for the error caused by the estimated IF. However, the OSST simultaneously moves the coefficients in the time and frequency directions and therefore cannot be regularized in this way.

#### 3. Numerical Simulation Study and Performance Analysis

##### 3.1. Model of Multicomponent and Fast-Varying Signal

A multicomponent and fast-varying test signal is represented by , which has three parts. The expression of the test signal and each component is shown as follows:

The three components of the test signal are illustrated in Figure 1(a). Figure 1(b) displays the reconstruction error at each time instant for the three modes of the test signal when the FSST was used for mode reconstruction. Mode 3 was reconstructed correctly, but the reconstruction accuracy of modes 1 and 2 was poor due to the strong FM. The reconstruction accuracy is closely related to the degree of signal modulation. This is also reflected in Table 1, which shows the signal-to-noise ratio (SNR) of the reconstructed signal.

##### 3.2. Quality Comparison of Time–Frequency Representation Methods

Figure 2 shows the TFRs obtained by the STFT, FSST, OSST, and VSST. The TFR obtained by the STFT was relatively “blurred” and “scattered” due to the energy dispersion. The TF energy could not be well focused on the TF ridges because the signal energy was distributed around the real IF during the TF transformation. Moreover, the resolution of the TFR was even worse, especially in the high-frequency band, and a false frequency distribution appeared. The last three TFRs in Figure 2 demonstrate that when the SST algorithm was used in the TF transform process, the TF energy could be squeezed onto the IF ridges; that is, it was sharper. The resolution significantly improved. However, the local enlarged images in Figure 3 show that the ridges of the three components obtained by the FSST were still relatively fuzzy. In particular, the resolution of the TF ridge in the high-frequency band was very low, and the compression effect was not ideal. Because the first-order SST was sensitive to the window selection, estimates of the instantaneous frequency of the traditional SST methods, such as the FSST, have low accuracy. In contrast, the frequency distributions of the three components obtained by the VSST and OSST, as shown in Figure 3, are clear, and the TF resolution is relatively high. If the frequency components of the multicomponent signals cross, then the IF estimation at the frequency intersection will be affected, as shown in Figure 4. It can be concluded that STFT and FSST cannot meet the requirements, and the OSST and VSST were significantly better. The OSST was slightly better around the intersection region.

To facilitate the comparison of the quality of each TFR, we measured the amount of information contained in the largest-amplitude coefficients at each time step and adopted a cumulative normalized energy [33]. The coefficients were sorted by amplitude from large to small, the cumulative sum of squares of the first N coefficients was obtained, and this was divided by the sum of squares of all the coefficients. This was defined as the cumulative normalized energy. The faster this increased to 1, the better the energy concentration and the sharper the TF representation became.
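
A minimal sketch of this measure is given below. It assumes that the coefficients are ranked by decreasing magnitude separately at each time step and that the energy of the N largest coefficients per time step is summed over time before normalization; the function name `cumulative_normalized_energy` and the array layout (rows are frequency bins, columns are time steps) are assumptions made for illustration.

```python
import numpy as np

def cumulative_normalized_energy(tfr):
    """Fraction of total energy held by the N largest coefficients per time step.

    tfr: 2-D array, rows = frequency bins, columns = time steps (assumed layout).
    Returns an array whose entry N-1 is the normalized energy for N coefficients.
    """
    energy = np.abs(np.asarray(tfr)) ** 2
    ranked = np.sort(energy, axis=0)[::-1]            # largest coefficient of each column first
    kept = np.cumsum(ranked, axis=0).sum(axis=1)      # energy of the N largest, summed over time
    return kept / energy.sum()

# A perfectly sharp representation (one nonzero coefficient per time step)
# reaches 1 with a single coefficient; a flat one grows linearly.
sharp = np.zeros((4, 3))
sharp[1, 0], sharp[3, 1], sharp[0, 2] = 2.0, 1.0, 3.0
flat = np.ones((4, 3))
curve_sharp = cumulative_normalized_energy(sharp)
curve_flat = cumulative_normalized_energy(flat)
```

Comparing such curves for different transforms of the same signal gives the kind of plot discussed in this section: the curve that approaches 1 with fewer coefficients corresponds to the more concentrated representation.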

Noise is inevitable in scientific research and practical engineering. Thus, robustness analysis should be conducted to support the bearing fault experimental study presented below. Two groups of numerical simulation analyses were carried out, one with added Gaussian white noise (noise level SNR = 0 dB) and one without. The reassignment method (RM) is a general method that is suitable for sharp TF representations; it has been proven to be able to accurately represent multicomponent signals [51, 52], and it is also introduced here for comparison. The cumulative normalized energies of the STFT, SST, OSST, and VSST without and with noise are displayed in Figures 5(a) and 5(b), respectively. Under noise-free conditions, the SST, OSST, and VSST had similar effects and could rapidly increase to 90% of the energy value and obtain a satisfactory TF resolution. Under noisy conditions, the growth rate of the normalized energy decreased. Figure 5(b) shows that the OSST had basically the same effect as the RM, while the effect of the VSST was slightly worse than that of the OSST. The effect of the SST under noisy conditions was significantly worse compared with that in the noise-free condition, and the growth rate of the normalized energy was significantly lower. With the same number of coefficients, the cumulative normalized energy value of the SST decreased from 82.1% to 74.8%. The STFT had the worst performance under both the noise-free and noisy conditions.
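
For readers who wish to reproduce a noisy test case, the following sketch adds white Gaussian noise to a signal at a prescribed SNR. It assumes that the SNR is defined as the ratio of mean signal power to mean noise power in decibels; the function name `add_white_noise` and the example signal are hypothetical.

```python
import numpy as np

def add_white_noise(x, snr_db, rng=None):
    """Return x plus white Gaussian noise scaled to the requested SNR in dB.

    Assumes SNR = 10*log10(mean signal power / mean noise power).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x**2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

# Example: a 5 Hz sinusoid contaminated at 0 dB, i.e. noise power equal to signal power.
t = np.arange(100000) / 1000.0
clean = np.sin(2 * np.pi * 5.0 * t)
noisy = add_white_noise(clean, snr_db=0.0, rng=np.random.default_rng(0))
```

At 0 dB the noise carries as much power as the signal itself, which explains why the energy concentration of all the compared transforms degrades in the noisy case.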

##### 3.3. Accuracy of Mode Reconstruction in terms of Signal-to-Noise Ratio (SNR)

Since the signal contained modulation and noise, it can be deduced from Figure 5 that five coefficients are needed on average to cover most of the energy at each instant. Therefore, to realize the asymptotic reconstruction of each mode, it is necessary to integrate the FSST and VSST near the ridgelines according to equation (9). It is important to recall that this regularization process is naturally applicable to the VSST rather than the OSST due to the simultaneous variation of the frequency and time.

Because the parameter $d$ determines the size of the integration interval around each ridge, the relationship between the output SNR and $d$ is used to characterize the mode reconstruction of the signal, with the analysis results shown in Figure 6. Figure 6(a) shows that $d$ did not need to be set to a large value for the VSST to achieve a high SNR (SNR ≥ 13 when $d$ ≥ 3) without noise. However, using the FSST method, even when $d$ was very large (i.e., many coefficients were integrated), high reconstruction accuracy could not be obtained (SNR ≥ 8 could only be achieved when $d$ ≥ 10). Under normal conditions, a real monitoring signal will contain noise, so more signal coefficients should be considered during reconstruction; that is, a larger value for $d$ should be used. However, this will introduce more noise interference. Therefore, $d$ should be set neither too large nor too small in the interest of high accuracy and low noise interference. As illustrated in Figure 6(b), due to the influence of noise, the reconstruction accuracy acquired using the FSST could not meet the accuracy requirements. Only the VSST could successfully reconstruct the ridge estimate of the three frequency components. This was because the test signal contained relatively strong amplitude modulation (AM) and FM. Furthermore, regardless of whether the signal contained noise, the STFT could not achieve satisfactory reconstruction accuracy.
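
The output SNR used to score a reconstruction can be computed as in the sketch below. The definition is an assumption (the ratio of the norm of the reference mode to the norm of the reconstruction error, in decibels, as is common in the SST literature), and the function name `output_snr_db` is hypothetical.

```python
import numpy as np

def output_snr_db(reference, reconstructed):
    """Reconstruction quality in dB: 20*log10(||reference|| / ||reference - reconstructed||)."""
    reference = np.asarray(reference)
    error = reference - np.asarray(reconstructed)
    return 20.0 * np.log10(np.linalg.norm(reference) / np.linalg.norm(error))

# Example: a reconstruction that underestimates the amplitude by 10% scores about 20 dB.
mode = np.cos(2 * np.pi * 5.0 * np.arange(1000) / 1000.0)
estimate = 0.9 * mode
snr = output_snr_db(mode, estimate)
```

Evaluating this quantity for reconstructions obtained with increasing integration half-widths around the ridge yields a curve of the kind discussed for Figure 6.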

Based on the above numerical analysis of the test signal, it was concluded that second-order SST methods, such as the VSST, can not only enhance the quality of mode reconstruction but also improve the accuracy of the ridge recognition.

Therefore, the VSST was applied to roller bearing fault signals with multiple components under fast-varying conditions to study whether this method can minimize FM interference, which cannot be avoided in practical engineering. This is equivalent to eliminating the false energy in the monitoring signals as much as possible and obtaining high-resolution TFRs that are sensitive to faults. Whether TF images obtained by this method can accurately capture bearing fault features for fault diagnosis also warrants further study, as it will have important engineering value for the early prediction and diagnosis of equipment faults caused by bearings.

#### 4. Performance Analysis of Practical Bearing Vibration Signals in Fast-Varying-Speed Conditions and Characteristic Extraction

We carried out a practical implementation study to demonstrate the effectiveness of the VSST for the fault diagnosis of roller bearings at fast-varying speeds and its effect on improving the clarity of the TFRs for signals with strong FM. The experimental platform is shown in Figure 7. A personal computer (PC) controlled the motor speed to achieve a fast-varying speed rotation of the bearing. The variable-speed process of the shaft in this experiment was described by . The signal examined in this paper varied rapidly relative to that during constant-speed operation or cases with small speed fluctuations (where the speed does not undergo rapid changes similar to a step function signal). In the experiment, the rotational speed decreased from 3000 to 0 r/min in 8 s. Data was collected every second. The test bearing type was ER-12K, and fault points were preset at the inner ring, outer ring, and rolling elements. Vibration data were acquired under five states: fault-free, inner ring fault, outer ring fault, rolling element fault, and combined inner and outer ring fault.

##### 4.1. Comparison of Time–Frequency Rearrangements (TFRs) under Different Fault States of Roller Bearing

The left column in Figure 8 shows the overall TFRs obtained by the STFT, SST, OSST, and VSST. Texture differences are evident in the TF images of the different fault types under fast-varying speed conditions, and the legibility is relatively strong. This tentatively shows that it was feasible to extract fault characteristics from the TFRs. The right column shows enlarged views of local ridges of the corresponding TFRs on the left, from which it can be seen that IF ridge textures obtained by the STFT were very poor. Compared to the STFT, the IF resolution of the SST was improved to a certain extent. More importantly, TFRs obtained by the OSST and VSST could squeeze the TF energy well on the IF ridges, and the texture sharpness of the TF images was significantly improved. Therefore, using the TFRs of the second-order SST for fault diagnosis is advantageous.

##### 4.2. Quality of TFRs under Different Fault States

Figures 9(a)–9(e) indicate the cumulative normalized energies of the STFT, SST, OSST, and VSST under five different states. Under the fault-free state shown in Figure 9(a), the SST, OSST, and VSST had similar effects, which could rapidly increase to 90% of the energy value needed to acquire satisfactory TFRs. The conclusions obtained under the outer ring fault states displayed in Figure 9(d) were basically consistent with those under the fault-free state, but the growth rate was slightly lower. Figures 9(b), 9(c), and 9(e) show that the effects of the SST, OSST, and VSST were still much better than that of the STFT, and the order of the effects was OSST > VSST > SST. Due to different effects of the faults on the AM and FM of the vibration signals, the growth rates of the ball and the combined fault states were slightly lower than those of the inner ring fault state. In conclusion, the second-order SST could use a small number of coefficients to cover more than 80% of the energy value in order to acquire satisfactory TFRs, which shows its advantages in the analysis of multicomponent and fast-varying vibration signals collected in different states.

##### 4.3. Accuracy of Mode Reconstruction in terms of SNR under Different Fault States

We examined the relationship between SNR and (Section 3.3) to demonstrate the reconstruction accuracy of vibration signals. Considering the relative simplicity of the experimental environment and to include the influence of noise to resemble an actual project site more closely, we added Gaussian noise to the original vibration signals to obtain the overall reconstruction accuracy of the monitoring signals. The fast speed variation process was defined by . Since the original vibration data had a certain degree of noise during collection, Gaussian noise with a level of 21 dB was added to the original vibration signals to further explore the consequence of severe signal pollution. The SNR of the overall reconstruction of the signal under the fault-free state and four different fault states is shown in Figure 10.

The results showed that the reconstruction accuracy of the STFT was far lower than that of the other two methods, and it did not meet the signal reconstruction accuracy requirements. While the reconstruction effect of the SST was worse than that of the VSST in the inner ring fault state, the reconstruction effects of the VSST and SST in the other states were basically similar, except that the effect of the SST was slightly lower. This was mainly because the fast-varying speed scheme in this experiment was a linear speed change, so the vibration signals contained linear AM and linear FM. The analysis results show that when the integration parameter $d$ was larger, the reconstruction accuracy was higher. However, in an actual signal reconstruction process, a larger $d$ is not necessarily better. Although it can account for more signal coefficients, more noise is introduced, which is not conducive to the analysis of signal frequency components and fault identification. From this point of view, the VSST still has an advantage over the SST.

##### 4.4. Statistical Feature Extraction of TF Images

The TFRs obtained in Section 4.1 revealed that the textures of the images in different states under fast-varying conditions were significantly different. Therefore, it is reasonable to explore whether the texture features of images can be used for fault feature extraction and fault state identification. The gray level cooccurrence matrix (GLCM) can reflect the comprehensive information about the direction, adjacent interval, and variation range of the gray images, which is the basis for analyzing the local patterns and the arrangement rules of images. Based on the GLCM, different statistical features reflecting the consistency and contrast of the image textures were extracted to identify the bearing state.

Haralick et al. [53] statistically described the texture of an image and proposed 14 quantitative measures of texture based on the GLCM, which are widely used as statistical feature parameters (SFPs). We selected six commonly used linearly uncorrelated SFPs: energy, contrast, correlation, entropy, homogeneity, and maximum probability. We used the weighted average method to convert the colored images to grayscale. The size of each image was 429 × 543, with 256 gray levels. Six SFPs were extracted from the GLCMs of all the gray images.
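
The grayscale conversion step can be sketched as follows. The weights 0.299, 0.587, and 0.114 are the widely used luminance weights and are an assumption here, since the exact weights of the weighted average method are not stated; the function name `to_gray_levels` is hypothetical, and the input is assumed to be an RGB array with values from 0 to 255.

```python
import numpy as np

def to_gray_levels(rgb, levels=256):
    """Weighted-average grayscale conversion followed by quantization to integer levels.

    rgb: array of shape (rows, cols, 3) with values in [0, 255] (assumed input range).
    """
    weights = np.array([0.299, 0.587, 0.114])           # assumed luminance weights
    gray = np.asarray(rgb)[..., :3].astype(float) @ weights
    scaled = np.rint(gray * (levels - 1) / 255.0)
    return np.clip(scaled, 0, levels - 1).astype(np.int64)

# Example: one white, one pure-red, and two black pixels.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = (255, 255, 255)
image[0, 1] = (255, 0, 0)
gray = to_gray_levels(image)
```

The integer output can be used directly as row and column indices when the co-occurrence matrix is accumulated.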

This experiment also adopted the vibration data in five states collected by the above-mentioned roller bearing experimental platform with a fast-varying speed. Forty samples were randomly selected for each state. The colored TF images transformed using the STFT, SST, OSST, and VSST were converted to gray TF images, and 200 gray images were obtained for each TF transformation method, for a total of 800 gray images between the four methods. Because of the evident directionality of the image texture, SFPs in four directions, that is, 0°, 45°, 90°, and 135°, were extracted. The GLCM of each gray image was calculated in each direction, and the resulting 24 SFPs (six SFPs in each of the four directions) were extracted to generate a feature matrix with dimensions of 200 × 24 for each TF transform method. The distributions of the SFPs for the five states are shown for the STFT, SST, OSST, and VSST in Figures 11(a)–11(d), respectively. The first row represents the maximum probability, entropy, contrast, correlation, energy, and homogeneity in the 0° direction. The next three rows illustrate the six SFPs in the 45°, 90°, and 135° directions, respectively.
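
The construction of the 24-dimensional feature vector of one gray image can be sketched as follows. This is an illustrative implementation under assumptions: a pixel distance of 1, a symmetric and normalized GLCM, energy taken as the sum of squared entries (the angular second moment), and entropy computed with a base-2 logarithm. The names `glcm`, `texture_features`, and `feature_vector` are hypothetical.

```python
import numpy as np

# (row step, column step) for each direction at distance 1 (assumed distance).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    dr, dc = offset
    rows, cols = image.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    reference = image[r0:r1, c0:c1]
    neighbor = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels))
    np.add.at(counts, (reference.ravel(), neighbor.ravel()), 1)
    counts = counts + counts.T                 # count each pair in both orders
    return counts / counts.sum()

def texture_features(P):
    """Six SFPs in the order used for Figure 11 (definitions assumed, see lead-in)."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * P).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * P).sum())
    nonzero = P[P > 0]
    return [
        P.max(),                                            # maximum probability
        -(nonzero * np.log2(nonzero)).sum(),                # entropy
        (((i - j) ** 2) * P).sum(),                         # contrast
        ((i - mu_i) * (j - mu_j) * P).sum() / (sd_i * sd_j),  # correlation (undefined for flat images)
        (P ** 2).sum(),                                     # energy
        (P / (1.0 + (i - j) ** 2)).sum(),                   # homogeneity
    ]

def feature_vector(image, levels=256):
    """Concatenate the six SFPs over the 0, 45, 90, and 135 degree directions."""
    features = []
    for angle in (0, 45, 90, 135):
        features.extend(texture_features(glcm(image, OFFSETS[angle], levels)))
    return np.array(features)

# Small 4-level example image.
example = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]])
vector = feature_vector(example, levels=4)
```

Stacking such vectors for the 200 gray images of one TF transform method gives a feature matrix with dimensions of 200 × 24.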

The sensitivity of the SFPs obtained by the four TF transform methods to the different states of the bearing was comprehensively evaluated from the aspects of intraclass aggregation and interclass separability. Figure 11 shows that, from the point of view of intraclass aggregation, the eigenvalue distribution intervals of the SFPs obtained by the OSST were relatively large, and the aggregation was the worst. However, the SFPs from the other three methods showed good aggregation. This was likely because the coefficients moved simultaneously in the time and frequency directions during the IF estimation for the OSST. Although the TF ridges were sharper, some key information that correlated with the fault characteristics was lost during the squeezing process, which was consistent with the conclusion from Figure 10.

From the viewpoint of interclass separability, the distribution of the SFPs calculated by the OSST was the worst. In the normal and inner ring fault states, the maximum probability values in the four directions obtained by the VSST, SST, and STFT occupied the same interval, that is, the two states completely overlapped. For the other three states, the feature value distribution intervals obtained by the VSST were slightly better separated than those of the SST and much better separated than those of the STFT. Thus, the performance ranking of the four TF transformation methods by maximum probability was VSST > SST > STFT > OSST. Using the same criterion, the interclass discrimination of the other five SFPs in the four directions was compared. The ranking by entropy and contrast was still VSST > SST > STFT > OSST, whereas for correlation, energy, and homogeneity, it was VSST > STFT > SST > OSST.

Roller bearing fault diagnosis in fast-varying speed conditions relies heavily on capturing the time-varying fault features. From the comparison of intraclass aggregation and interclass separability, the six SFPs obtained by the VSST had the best sensitivity to different states of the bearing; thus, these parameters can be used for fault identification and diagnosis.
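Intraclass aggregation and interclass separability can also be summarized numerically. The sketch below uses a Fisher-type ratio of between-class scatter to within-class scatter for each SFP. This score is an assumption introduced for illustration (the comparison above was made from the distributions in Figure 11), and the function name `fisher_scores` and the synthetic data are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature ratio of between-class scatter to within-class scatter.

    A large value means tight classes (good intraclass aggregation) whose
    means lie far apart (good interclass separability).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)  # guard against zero scatter

# Synthetic illustration: five states, 40 samples each, two features.
jitter = np.linspace(-0.1, 0.1, 40)
labels = np.repeat(np.arange(5), 40)
separable = labels + np.tile(jitter, 5)    # class means differ
overlapping = np.tile(jitter, 5)           # identical in every class
scores = fisher_scores(np.column_stack([separable, overlapping]), labels)
```

Applied to a 200 × 24 feature matrix, the same function would return one score per SFP and direction, which could then be compared across the four TF transform methods.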

##### 4.5. Discussion

From the direct observation of the time-frequency images and the quality of the TFR under different fault states presented in Sections 4.1 and 4.2, the VSST and OSST performed similarly and both produced sharper time-frequency images. By visual inspection, that is, manual judgment, the difference between the VSST and OSST was not significant. However, for intelligent fault recognition, it is crucial to find high-quality features for classification algorithms. We adopted SFPs based on the GLCM for feature extraction. The analysis in Section 4.4 showed that the SFPs obtained by the OSST did not distinguish faults well. Therefore, the VSST is better than the OSST in terms of intelligent fault diagnosis, which will be verified in Section 5.

We have shown that more concentrated TF images do not necessarily provide high-quality SFPs, and for intelligent fault diagnosis approaches, feature extraction is key. Further research is needed on whether characteristics that are conducive to intelligent fault diagnosis can be extracted from highly concentrated TF images using methods such as the high-order SST, multi-SST, and MSST.

#### 5. Application in Roller Bearing Fault Diagnosis

Based on the above simulation and experimental data analysis, we compared the bearing fault diagnosis performances based on the texture features of TF images obtained by different methods. The flowchart in Figure 12 shows the main steps of the feature extraction and pattern recognition. The least squares support vector machine (LS-SVM) can solve classification and regression problems with small samples, many feature points, and local changes, and it is suitable for fault diagnosis. We adopted the radial basis function (RBF) kernel here [54].

Forty vibration data samples under each of the four fault states and the fault-free state were obtained from the test platform, and the 40 samples for each state were divided into 20 samples for training and 20 samples for testing. The training and testing datasets therefore each contained 100 samples, with 20 samples for each state. As shown in Figure 12, four TF transformation methods, STFT, SST, OSST, and VSST, were used to convert vibration data to TF images, and the GLCMs were calculated, from which all the SFPs were extracted. We adopted LS-SVM for the identification of the five states. Each recognition process was run six times, and the results were averaged, as shown in Table 2.
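The recognition step can be illustrated with a compact LS-SVM. The sketch below is not the authors' implementation: it uses the function-estimation form of the LS-SVM on one-vs-rest ±1 targets, so that training reduces to solving one linear system, and the regularization `gamma`, kernel width `sigma`, class name `LSSVM`, and the synthetic 24-dimensional features standing in for the SFPs are all assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

class LSSVM:
    """One-vs-rest LS-SVM with an RBF kernel (hyperparameters are assumptions)."""

    def __init__(self, gamma=10.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        n = len(self.X)
        # One column of +1/-1 targets per class.
        targets = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        # Linear system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; targets].
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(self.X, self.X, self.sigma) + np.eye(n) / self.gamma
        rhs = np.vstack([np.zeros((1, len(self.classes))), targets])
        solution = np.linalg.solve(A, rhs)
        self.b, self.alpha = solution[0], solution[1:]
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X, self.sigma)
        scores = K @ self.alpha + self.b
        return self.classes[scores.argmax(axis=1)]

# Synthetic stand-in for the SFP feature matrix: 5 states x 40 samples x 24 features.
rng = np.random.default_rng(0)
X = np.vstack([2.0 * s + 0.3 * rng.standard_normal((40, 24)) for s in range(5)])
y = np.repeat(np.arange(5), 40)

# 20 training and 20 testing samples per state, as in the split described above.
index = np.arange(200).reshape(5, 40)
train, test = index[:, :20].ravel(), index[:, 20:].ravel()
X_train, y_train = X[train], y[train]

model = LSSVM(gamma=10.0, sigma=5.0).fit(X_train, y_train)
accuracy = (model.predict(X[test]) == y[test]).mean()
```

With real SFPs, the features would typically need scaling before the kernel is applied, and `gamma` and `sigma` would be tuned, for example by cross-validation on the training set.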

Table 2 shows that, in a noise-free state, the fault identification accuracy was relatively satisfactory. The recognition accuracy of the OSST was the lowest, which agreed with the analysis of Figure 11. However, the addition of noise increased the difficulty of IF estimation under time-varying operating conditions, which affected the accuracy of the TFR and reduced the recognition accuracy to varying degrees. The classification accuracy of the SST declined the most, and it could not achieve satisfactory accuracy. After adding noise, the recognition accuracy of the STFT decreased less than that of the SST and OSST. This was because the addition of noise caused the cross-interference resolution of the TF image to decrease, and the texture feature information suffered from aliasing distortion. In addition, in the TF transformation process, the compression performed to improve the clarity of the TF images could cause a loss of useful information, resulting in a significant loss of accuracy for fault identification. Although the OSST can provide higher-quality TF images than the VSST, the accuracy of the fault diagnosis using the OSST was low.

#### 6. Conclusions

We compared the performances of the vertical and oblique second-order synchrosqueezing transforms with those of the traditional STFT, SST, and RM for analyzing strongly modulated and multicomponent signals with a fast-varying IF. In the simulations, the second-order frequency estimator of the VSST and OSST estimated the IF more accurately. This accurate IF estimation in turn yielded more energy-concentrated TFRs than the STFT and SST, and the sharpness of the TF images obtained by the OSST was slightly higher than that obtained by the VSST. The VSST allowed for better mode reconstruction than the original SST, whereas the OSST did not, and the VSST showed strong robustness to noise in the reconstruction. Experimental verification of the quality of the TFR and the accuracy of the mode reconstruction was conducted on a roller bearing experimental platform. As a postprocessing method, the VSST produced more energy-concentrated TFRs than some nonreassignment and reassignment methods.

We experimentally studied practical applications for roller bearing fault diagnosis under fast-varying speed conditions. A comprehensive comparison of the STFT, SST, OSST, and VSST was conducted from the perspective of intraclass aggregation and interclass separability. Six SFPs in four directions were found to be more sensitive to fast-varying fault characteristics in gray TF images when the VSST was applied. An experimental study validated the VSST as a powerful tool for condition monitoring and bearing fault diagnosis under a fast-varying speed, and we concluded that TF images with high sharpness do not necessarily lead to high accuracy of fault diagnosis. The results showed that the VSST method can effectively characterize the time-varying fault features by using texture features of TF images for fast-varying signals. Thus, the VSST method can meet the requirements of intelligent fault diagnosis.

Further work should be devoted to examining the influence of reassignment operators at high frequencies, especially in frequency-intersection regions when frequency components cross. This paper only studied the second-order SST based on the STFT. Further study is needed to introduce the second-order SST to the CWT, S-transform, and WPT, to understand the influence of different versions of the TFA on multicomponent and fast-varying signals, and to determine the application effect on intelligent fault diagnosis. The comparative analysis of high-order SSTs and the MSST can also be introduced in future studies.

#### Data Availability

Data are available upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors are grateful for the financial support of this study. This research was funded by the Fundamental Research Funds for the Central Universities (Grant no. 2021YQJD14), the National Natural Science Foundation of China (Grant no. U1361127), the Yue Qi Distinguished Scholar Project of China University of Mining and Technology, Beijing (Grant no. 800015Z1145), and the National Key Research and Development Program of China (Grant no. 2016YFC0600900).