#### Abstract

Effective filtering and noise reduction, feature extraction and fault diagnosis, and prognostics technology are important to Prognostics and Health Management (PHM) of equipment. Therefore, a multifeature fusion fault diagnosis method based on the combination of quadratic filtering and QPSO-KELM algorithm is proposed. In the quadratic filtering, stable filtering can reduce the impact of noise and fast-kurtogram can filtrate fault frequency bands with rich fault information. Then, the time-domain, frequency-domain, and time-frequency parameters of the secondary filter signal are extracted. MSSST was used to analyze the filtered signal, and the time-frequency image was obtained. The time-frequency parameter was extracted from the time-frequency image by 2DPCA, and all the extracted parameters are taken as the fusion fault feature of the gearbox. Finally, the fault feature parameters are taken as the training sample and testing sample of QPSO-KELM for training and testing to achieve the purpose of fault diagnosis. The experimental results show that the proposed method can effectively filter the noise, complete the fault mode identification of gearbox, and improve the fault diagnosis accuracy better than other methods.

#### 1. Introduction

As a key component of mechanical transmission system, gearbox failure will lead to equipment shutdown, economic losses, and even casualties. The earlier the gearbox fault is found and repaired in time, the less the loss will be caused. Gearbox is mainly composed of rotating shaft, bearing, and gear. Its common fault types are gear fault and bearing fault. The fault will produce an impact signal, but the background noise interference and complex propagation path make it difficult to judge the fault state of gearbox timely and accurately. In order to filter out the noise, extract the effective state information, and better monitor and diagnose the state of gearbox, the related research is being carried out gradually.

As the signal is inevitably disturbed by noise in the process of acquisition, filtering technology as a signal denoising method plays an important role in the follow-up research. At present, the commonly used methods of filtering and denoising can be divided into linear filtering and nonlinear filtering. Linear noise reduction algorithm is not suitable for processing complex vibration signals. Therefore, the nonlinear filtering noise reduction method is the mainstream noise reduction method in the field of mechanical equipment fault diagnosis. The commonly used technologies include noise reduction method based on empirical mode decomposition (EMD) [1–4], noise reduction method based on wavelet transform [5–7], and noise reduction method based on manifold learning [8–11]. For example, Boudraa and Cexus [1, 2] proposed a noise reduction method based on EMD threshold, Gu et al. [7] used wavelet analysis to denoise data, and Song et al. [11] and others used the manifold learning method to denoise. Although the above methods can achieve some effective results, the mechanical transmission system is often disturbed by impact noise in the motion process, and the impact noise can be better filtered by a stable filter.

Feature extraction technology is the core of fault diagnosis. The commonly used feature parameters include time-domain feature [12, 13], frequency-domain feature [14], and time-frequency feature [15]. Time-domain feature is the feature information directly extracted from the time-domain signal, which can intuitively obtain the signal feature. The frequency-domain feature is extracted from the frequency spectrum obtained by Fourier transform. The time-frequency feature contains more information. The commonly used time-frequency analysis methods include short-time Fourier transform (STFT) [16, 17], continuous wavelet transform (CWT) [18, 19], and S-transform (ST) [20–22]. The S-transform inherits the advantages of STFT and CWT and develops on the basis of STFT and CWT. However, there are still some problems such as low resolution and energy leakage of the time-frequency images obtained by using ST technology.

Intelligent classification algorithm is widely used in the field of fault diagnosis, and its effect is good. Common intelligent algorithms include neural network [23–25], support vector machine (SVM) [26, 27], extreme learning machine (ELM) [28, 29], kernel extreme learning machine (KELM) [30], deep learning [31, 32], and other methods. Among them, the KELM algorithm has stronger comprehensive advantages in sample demand, network generalization, calculation speed, and accuracy and is more suitable for solving gearbox fault diagnosis problems with high calculation speed and accuracy requirements. However, the classification ability of KELM network is affected by the value of kernel parameters and penalty coefficient, so it is necessary to optimize the above structural parameters. Therefore, a method, using the quantum particle swarm optimization algorithm [33–35] to optimize the structural parameters of KELM (QPSO-KELM), is proposed in this paper. The algorithm based on QPSO can optimize the structural parameters of KELM with strong global search ability, so as to improve the learning speed and classification accuracy of KELM and improve the accuracy of gearbox fault diagnosis.

At present, most of the methods are not perfect in terms of diagnosis process and structure, and main problems are as follows:(1)The effect of signal filtering and noise reduction is not obvious enough, leading to the fault feature extraction which is not obvious enough.(2)The common time-frequency analysis methods also have the problems of energy leakage and low resolution, which cannot extract effective two-dimensional features.(3)The fault diagnosis accuracy of intelligent classification algorithm is not enough. To a large extent, the classification accuracy can be improved by improving the parameters.

Therefore, filtering noise reduction, feature extraction, and fault diagnosis technology are the focus of this paper [36].

In order to solve the above problems, denoising, feature extraction, and intelligent fault diagnosis technology are listed as the research priorities in this paper, and the following researches are carried out. The related steps involved are shown in Figure 1.

The working process is summarized as follows:(1)Quadratic filtering: collecting the signals of gearbox under different fault conditions, as the signals are susceptible to noise, the stable filter is first used for primary filtering, and then, the fast-kurtogram is used to filter frequency bands with rich fault information [37–39] to complete quadratic filtering.(2)Feature extraction: one-dimensional features and two-dimensional features of the filtered signal are extracted. One-dimensional features are characteristic parameters in the time domain and frequency domain. The extraction process of two-dimensional features is that MSSST is first used for time-frequency analysis of the filtered signal, and then, 2DPCA [40, 41] is used to extract the two-dimensional characteristic parameters of the time-frequency image.(3)Fault diagnosis: the extracted features are divided into training sample and test sample, which are used as the input of the QPSO-KELM model. The test sample is input into the trained QPSO-KELM, and then, the result of fault diagnosis can be obtained.

#### 2. Basic Theory of the Proposed Method

##### 2.1. Stable Filter and Fast-Kurtogram

Materials and Methods should be described with sufficient details to allow others to replicate and build on published results. Please note that publication of your manuscript implicates that you must make all materials, data, computer code, and protocols associated with the publication available to readers. Please disclose at the submission stage any restrictions on the availability of materials or information. New methods and protocols should be described in detail, while well-established methods can be briefly described and appropriately cited.

Research manuscripts reporting large datasets that are deposited in a publicly available database should specify where the data have been deposited and provide the relevant accession numbers. If the accession numbers have not yet been obtained at the time of submission, please state that they will be provided during review. They must be provided prior to publication.

Interventional studies involving animals or humans and other studies requiring ethical approval must list the authority that provided approval and the corresponding ethical approval code.

Conventional signal processing methods usually filter noise by linear algorithms. But when the noise is impact and the trailing is serious, the linear filter cannot deal with it effectively. Stable filter is better for this kind of impact signal.

The standard additive noise model of a signal is as follows:where is the signal of interest and is the noise. Noise signals come from a wide range of sources, such as equipment operation, resonance, and environmental noise. The distribution of such noise can be expressed by symmetrical stable distribution. Stable filter can filter out the noise to the maximum extent and recover the signal .

For the noise which conforms to the stable distribution, the nonlinear filtering technology based on the same steady-state distribution model has the best processing effect. These filters have good robustness and can also effectively deal with the situation of serious signal tailing distribution. Suppose is the negative value of logarithmic density, and the cost function is defined as follows:

The stable filter is defined as the minimum value of the cost function:

When , the minimum value of cost function is the maximum likelihood estimation of parameter . When the distribution is the Gaussian distribution of , , and the minimum value can be obviously got. When , filtering is nonlinear, and the minimum value of equation (3) must be found by numerical method.

A variety of filtering forms can be obtained by replacing the cost function equation (2). The simplest extension is to add a nonnegative weight :

In the detection problem, the known signal is extracted from the signal interfered by stable distributed noise, which can be expressed as follows:

If there is no known signal in the signal, , if there is a known mode in the signal . The filtering process can be realized by constructing a real filter function:

By the minimization equation (6), can be obtained, and is the estimated value of the extracted known signal.

After getting the stable-filtered signal, the fast-kurtogram is used to select the frequency band with rich fault information. The basic theory of fast-kurtogram is as follows.

Spectral kurtosis is a performance index reflecting the instantaneous impact strength. Its basic theory is to calculate the kurtosis value of each spectral line, so as to find the frequency band of impact [42]. The calculation equation is as follows:where is the complex envelope of signal at the corresponding central frequency and bandwidth .

The spectral kurtosis is proposed by Antoni et al. But the calculation of kurtosis is cumbersome and not conducive to engineering practice. Therefore, Antoni further proposed the concept of fast spectral kurtosis. The fast spectral kurtosis is an inverted pyramid structure. The abscissa represents the frequency, the ordinate represents the number of decomposition layers, and the color depth of the image represents the spectral kurtosis under different central frequencies and bandwidths. According to the color depth of fast spectral kurtosis, the optimal bandwidth and center frequency can be obtained. Then, the fault frequency band is extracted by band-pass filter, and then, the square envelope order spectrum of fault signal is obtained by square envelope analysis. The inverted pyramid structure of the fast-kurtogram is shown in Figure 2.

##### 2.2. MSSST-2DPCA

For signal , the S-transform of can be defined as follows:where the result of represents the time spectrum of the signal; is the Gaussian window function; and represent time and frequency, respectively; and is a time parameter that represents the position of the Gaussian window on the timeline.

According to equation (9), synchronous squeezing S-transformation (SSST) is performed on signal , and the calculation steps are as follows.

The coefficients of S-transform are , which can be obtained by equation (8).

Calculate the time partial derivative of the S-transform, and the equation is as follows:

The instantaneous frequency can be obtained by transforming equation (10), and the calculated result is

By compressing the value of the interval to , we can get the synchronous compression transformation value . The formula is as follows:

The time-frequency analysis method of MSSST can be obtained by multiple iterations of the SSST, and the iterative equation of MSSST is as follows:where is the and *N* is the number of iterations . Substituting into (*k* = 2, 3, …, *N*), after *N* − 1 calculations, the final result can be simplified to

For MSSST, it is only required to calculate and substitute it into the first-order SSST, and the calculated result is the same as equation (13). Therefore, the computational complexity of MSSST is similar to that of the first-order SSST, and it is beneficial to improve the calculation speed. The proposed MSSST is effective to solve energy leakage and low resolution of the spectrum.

After getting the time-frequency images, 2DPCA is used to extract the feature matrix. The theory of two-dimensional principal component analysis (2DPCA) is as follows.

Suppose is the *n*-dimensional column vector, is the image matrix of *m* × *n*, and is the *m*-dimensional projection vector of on after linear transformation. The process can be expressed as follows:

The trace of covariance matrix of is defined as the total divergence:

In maximum total divergence criterion, when the divergence of the projected vector is the largest, the direction of vector is the optimal projection direction, where the equation of is

The total divergence of is

The covariance matrix expression of the image is as follows:

According to equation (19), is a nonnegative definite *n* × *n*-dimensional matrix. Suppose there are *M* training images, the *j*-th image is represented as , and the mean value of all training images is recorded as , then the divergence matrix expression is as follows:

The normalized expression is as follows:

The obtained from the criterion formula of the maximum divergence is called the optimal projection axis. The optimal projection axis is the eigenvector corresponding to the maximum eigenvalue of . In general, one optimal projection axis is not enough. Therefore, the first *d* orthogonal unit eigenvectors with the largest eigenvalues are selected as the optimal projection axis, which can be expressed by the following equation:

The optimal projection vector of 2DPCA is used to extract the two-dimensional features of the image. Suppose sample image A, according to the above equations, can be obtained, which is called the principal component vector of sample image A.

The matrix of *m* × *d* composed of principal component vectors is the feature matrix (feature image) of sample image A.

##### 2.3. QPSO-KELM

KELM algorithm is adding kernel function into ELM, and the input weights and offsets in ELM are replaced by kernel function mapping, which makes the output of the ELM more stable and solves the over learning problems.

ELM is a single hidden layer feedforward neural network, and its hidden layer weight does not need to be adjusted by feedback regulation. The structure of the ELM is shown in Figure 3.

Suppose are *N* training sample sets, where is the input characteristic parameter and is the sample label, then the ELM training model can be expressed aswhere and represent the input and output weight matrix of the *i*th hidden layer node, respectively; represents the output layer vector of the ELM.

The goal of ELM training is to make the error between the actual output of the training sample and the sample label close to 0, and the equation is

So , , and can be used to obtain equation (25), as follows:

Equation (23) can also be expressed as a matrix:where is the output matrix of the hidden layer, and the equation of is as follows:where represents the inverse matrix of , and is the optimal quadratic solution to .

According to the structural optimization and ERM criteria, the output weight of ELM can be determined, that is,where represents the network calculation error, is the structural error, is the empirical error, and is the penalty factor.

Through the optimization of the solution, the optimal solution of can be obtained as follows:

Therefore, the output model function of ELM is as follows:

By using kernel operation to replace matrix operation *H* in ELM, the KELM classification model can be established. The kernel matrix is defined as follows:where is the symmetric matrix of and is the kernel function.

Equation (30) can be expressed by kernel function as follows:where represents the output weight matrix of KELM.

Different support vector machines can be constructed by selecting different kernel functions. Among the commonly used kernel functions, compared with the polynomial kernel function and *sigmoid* kernel function, the radial basis kernel function has the advantages of simple parameters and strong adaptability to randomly distributed samples. Therefore, a SVM based on radial basis kernel function is established in this paper, and its expression is shown as follows:

According to equations (28) and (33), it can be seen that KELM contains kernel parameter and penalty coefficient *C*, and it is necessary to find the optimal parameter to make the classification effect of KELM the best.

By quantizing the iterative updating process of particles of PSO algorithm, the QPSO can reduce the algorithm complexity and improve algorithm convergence speed and global search ability. The basic principles of QPSO are as follows.

Assume that is the *d*-dimensional search space, and the population number of particles in the space is *b*, then the position of the *i*th particle can be expressed as

Suppose the individual optimal position of the particle is , and the global optimal position of the particle is ; the and are as follows:

The particle can find and update its individual optimal position and population optimal position through iterative operation, and the average optimal position can be introduced as the population optimal center. Then, the particle optimization process can be expressed aswhere is the contraction expansion factor, which is dynamically adjusted in the iterative operation according to the following equation:where *N* represents the maximum number of iterations. and are the initial and final values of , respectively; usually, the two parameters are set as and .

According to the basic principle of KELM and QPSO, the network training process which used QPSO to optimize the network structure parameters of KELM is as follows: *Step 1*. Initialize the position of population particles, and set parameters such as particle swarm size, iteration step size, and termination conditions. *Step 2*. Initialize the current position of each particle as , define the classification accuracy of KELM as fitness function, and calculate the fitness value of each particle. The position of the particle with the maximum fitness value was initialized as . *Step 3*. Update the particle position according to equations (36)–(38). *Step 4*. The fitness value of each particle is calculated, and the individual optimal position , the group optimal position , and the group optimal center are updated based on the optimal fitness value. *Step 5*. Judge whether the termination conditions are met. If so, stop the calculation and output the result; if not, return to step 3.

According to the above steps, the modeling flow chart of QPSO-KELM can be drawn, as shown in Figure 4.

#### 3. Experimental Verification and Discussion

To verify whether the proposed method is available, three preset fault tests of the gearbox are conducted. The fault simulation test rig is shown in Figure 5. The test rig consists of four parts: the power and control part, the bearing fault simulation part (not used), the gear fault simulation part, and the data acquisition part (not shown). This section mainly uses the gearbox fault simulation part. The perspective and internal structure of the gearbox are shown in Figure 6. The teeth number in the gearbox from high-speed shaft to low-speed shaft is 41, 79, 36, and 90. The fault gear is gear 3 with 36 teeth. The fault types include wear, break, and miss teeth. The fault bearing is located in the deep groove ball bearing at the end cover of the gear 3 side of the medium-speed shaft (Figure 6(b)). The deep groove ball bearing type is ER-16k, and its fault types include inner race fault and outer race fault.

**(a)**

**(b)**

**(c)**

In the test, two vibration acceleration sensors are set in vertical and horizontal directions, respectively (Figure 6(c)). The data acquisition parameters are set as follows: sampling frequency *fs* = 20.48 kHz, sampling time *t* = 48 s, and motor speed 20 r/s. The main dimension parameters of the bearing ER-16K are shown in Table 1. According to the speed of the motor and the parameters of the gearbox and the bearing ER-16K, the relevant frequency can be calculated. The calculation results are shown in Table 2.

*f*_{r1} is the rotating frequency of high-speed axis, *f*_{r2} is the rotating frequency of medium-speed axis, and *f*_{r3} is the rotating frequency of low-speed axis; *f*_{m1} is the meshing frequency of gear 1 and gear 2, and *f*_{m2} is the meshing frequency of gear 3 and gear 4; *f*_{BPFO} is the fault characteristic frequency of the bearing outer race, *f*_{BPFI} is the fault characteristic frequency of the bearing inner race, and *f*_{BSF} is the fault characteristic frequency of the bearing ball spin.

##### 3.1. Denoising with Quadratic Filter

The quadratic filtering method can effectively filter out the interference noise in the signal and select the fault frequency band with rich fault information. Two cases of bearing inner race fault and gear broken tooth fault are selected as fault cases in the quadratic filtering part.

Firstly, the stable filter is used to filter out the noise, and the detailed process is as follows.

The fault signal of bearing inner race and gear broken tooth are filtered with stable filter, and the filtering parameters are set as , , and ; then, the filtered spectrum can be obtained, as shown in Figure 7.

**(a)**

**(b)**

**(c)**

**(d)**

Comparing with the spectrum before and after filtering, the signal-to-noise ratio (SNR) is improved with a stable filter. The SNR of bearing inner race fault before filtering is −9.3491, the SNR after filtering is increased to −8.2371, the SNR before filtering is −8.7745, and the SNR after filtering is −8.1016. Comparing with the spectrum before and after filtering, it can be seen that the trailing part of the signal is reduced, which indicates that the stable filter is effective to reduce noise.

Secondly, the fast-kurtogram technology is used to process the filtered signals, and the fault frequency band with rich fault information can be obtained. The fast-kurtogram diagram and square envelope results of bearing inner race fault and tooth break fault are shown in Figure 8.

**(a)**

**(b)**

Figure 9 is the square envelope spectrum of bearing inner ring fault and gear broken tooth after quadratic filtering. It can be seen from Figure 9 that the feature frequency of bearing inner race fault is obvious, and the bearing inner race fault can be judged according to the fault feature frequency. In the square envelope spectrum of tooth broken fault, the meshing frequency, frequency doubling, and the surrounding side band can be observed, so as to judge the gear tooth broken fault.

**(a)**

**(b)**

The above results show that the fault information of gearbox can be reflected to a great extent through the signal analysis after quadratic filtering.

##### 3.2. Feature Parameters of Time Domain and Frequency Domain

Time-domain feature parameters after quadratic filtering are as follows: mean square value (*P*1), root mean square (*P*2), variance (*P*3), standard deviation (*P*4), energy (*P*5), root square amplitude (*P*6), mean square amplitude (*P*7), mean square amplitude (*P*8), kurtosis (*P*9), skewness (*P*10), waveform index (*P*11), peak index (*P*12), pulse index (*P*13), margin index (*P*14), and clearance coefficient (*P*15). These characteristic parameters are extracted from the original signal. Frequency-domain characteristic parameters include mean frequency (MF, *P*16), frequency center (FC, *P*17), root mean square frequency (RMSF, *P*18), and standard deviation frequency (STDF, *P*19). In order to facilitate calculation, the extracted feature parameters are normalized, and the normalized parameters are expressed by NP, as shown in Table 3.

##### 3.3. Feature Images Extraction with MSSST-2DPCA

Time-frequency analysis is carried out on the filtered inner race fault signal of bearing and tooth break fault signal, and the time-frequency analysis results of STFT, CWT, ST, and MSSST are compared. The time-frequency analysis results are shown in Figures 10 and 11.

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(c)**

**(d)**

From the time-frequency analysis images, it can be seen that the order of time-frequency resolution and energy aggregation is MSSST > ST > CWT > STFT, both in bearing inner race fault or gear broken tooth fault.

After getting the time-frequency images, 2DPCA is used to extract the six cases of normal, tooth miss, tooth break, tooth wear, and bearing inner race fault and outer race fault. The parameter settings are as follows: the original image size is 258 330, and the image size after 2DPCA feature extraction is 72 99. By using the MSSST method, 960 images are generated for each case, and a total of 5760 images can be obtained. Then, the 2DPCA algorithm is used to extract the feature matrix of the image. Comparing STFT and MSSST, and the results are as follows.

The images in Figure 12 and 13 are the time-frequency features of the filtered fault signal. Comparing the results of 2DPCA-STFT and 2DPCA-MSSST, it can be seen that the feature extracted by 2DPCA-MSSST contains more abundant fault information.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

##### 3.4. Feature Fusion

In the previous paper, 19 feature parameters are extracted by calculation, and the feature matrix corresponding to the time-frequency image is obtained by using MSSST-2DPCA technology. The fusion features are obtained by fusing the feature parameters and the feature matrix. The specific steps are as follows:(1)Since the size of the feature image is 72 99, in order to ensure the same length of the feature vector, 99 segments of the same length signal are intercepted from the collected signals, 19 feature parameters are extracted, respectively, and the feature parameter matrix of 19 99 can be obtained.(2)For the two sensors (Figure 6(c)), the matrix of feature parameters is extracted according to the method in step 1, and the eigenvector of the feature matrix is calculated.(3)Calculate the feature vectors of feature image.(4)By combining the feature vectors corresponding to the two feature parameters and the eigenvectors corresponding to the feature images, fusion feature matrix can be obtained.

The corresponding image is shown in Figure 14.

##### 3.5. Fault Diagnosis Based on QPSO-KELM

The time-domain, frequency-domain, and time-frequency feature parameters obtained above are summarized; then, 960 groups of features can be obtained, and the features are randomly divided into training samples and testing samples according to the ratio of 3 : 1; the training samples number is 720, and the testing samples number is 240. The 720 training samples are used to train the QPSO-KELM. After training, the 240 testing samples are input into the QPSO-KELM, and then, the test results are obtained, as shown in Figure 15. There is no error point in Figure 15, so the classification accuracy is 100%.

#### 4. Comparison and Discussion

In order to verify the effectiveness of the proposed method, ELM and PSO-KELM algorithms are used to compare with the QPSO-KELM algorithm. Meanwhile, in order to verify the noise filtering effect of quadratic filter, the fault classification accuracy before filtering is also analyzed based on the above three methods, which is compared with the fault diagnosis effect after filtering. The experimental results are shown in Table 4 and Figure 16.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

In Figure 16 and Table 4, it can be seen that, among all the classification methods, QPSO-KELM has the best classification effect and the highest fault diagnosis accuracy. By comparing the fault diagnosis accuracy before and after filtering, it can be concluded that no matter which method is used, the fault diagnosis accuracy after filtering has been improved. Therefore, the effectiveness of the proposed method is proved.

#### 5. Conclusion

In this paper, a fault diagnosis method based on gearbox multifeature fusion, quadratic filter, and QPSO-KELM algorithm is proposed. Through the analysis of the experimental results, the following conclusions can be drawn: after stable filtering of the collected signal, the noise interference can be effectively filtered out, and the SNR can be improved. Combined with the method of fast-kurtogram, the fault frequency band with rich fault characteristic signals can be screened out, which is also conducive to the improvement of fault diagnosis accuracy. By extracting the time-domain, frequency-domain, and time-frequency characteristic parameters, more fault characteristic information can be extracted, which is beneficial to improve the accuracy of fault diagnosis. The MSSST method is proposed to improve the time-frequency analysis method, which can effectively solve the problems of energy leakage and low resolution. The KELM algorithm optimized by QPSO, to a large extent, improves the learning ability and classification accuracy of the KELM algorithm. Compared with other methods, the classification effect is better.

Although the method proposed in this paper is better than the traditional fault diagnosis method, it still needs to continue to explore in some aspects, looking for some new fault characteristic parameters to make it more conducive to monitor the current state of the equipment. In this paper, only some single fault modes of the gearbox are analyzed, but in the actual situation, the compound fault of gearbox is more common. So, the next step of the study is to extract more sensitive and effective fault feature parameters and try to analyze more complex gearbox failure.

#### Data Availability

The experiment data are obtained through the laboratory bench of our unit, which is not open to the public.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.