#### Abstract

The planetary gearbox is one of the most widely used core parts in heavy machinery. Once it breaks down, it can lead to serious accidents and economic loss. Motor current signal analysis (MCSA) is a noninvasive method that uses the induction motor current to detect faults. Currently, most MCSA-based fault diagnosis studies focus on the parallel shaft gearbox, and there is a paucity of studies on the planetary gearbox. In addition, the effects of various signal processing methods on motor current and the performance of different machine learning models are rarely compared. Therefore, fault diagnosis of the planetary gearbox based on MCSA is conducted in this study. First, the effects of various faults on motor currents are studied. Specifically, the characteristic frequencies of a fault in the sun/planet/ring gears and supporting bearings of the planetary gearbox are derived. Then, a signal preprocessing method, namely, singular spectrum analysis (SSA), is proposed to remove the supply frequency component in the current signal. Subsequently, four classical machine learning models, including the support vector machine (SVM), decision tree (DT), random forest (RF), and AdaBoost, are used for fault classification based on the features extracted via principal component analysis (PCA). The convolutional neural network (CNN), which can automatically extract features, is also adopted. A dynamic experiment on a planetary gearbox with seven types of faults, including tooth chipping in the sun/planet/ring gears, inner race spall in the planet bearing, and inner race, outer race, and ball spalls in the input support bearing, is conducted. The raw current signal in the time domain, the signal reconstructed by SSA, and the current spectra in the frequency domain are used as the inputs of the various models. The classification results show that PCA-SVM is the best model for learned data, while CNN is the best model for unlearned data on average.
Furthermore, SSA mainly increases the accuracy of CNN in the time domain and exhibits a positive effect on unlearned data in the time domain. The classification accuracy increases significantly after transforming the time domain current data to the frequency domain.

#### 1. Introduction

A planetary gearbox has a compact structure with a large transmission ratio and high transmission efficiency [1]. The mechanical power system composed of a planetary gearbox and an induction motor is widely used in heavy-duty equipment, such as helicopters, wind turbines, and large cranes. Due to heavy loads and harsh working environments, the key components in a planetary gearbox, such as the planet gears, sun gear, ring gear, and supporting bearings, are prone to contact fatigue, wear, and even fracture, which may lead to accidents. Once there is a fault in the planetary gearbox, the system must be shut down for maintenance. This leads to a reduction in production efficiency and to economic loss. Therefore, it is necessary to conduct fault diagnosis of planetary gearboxes.

A fault in rotating machinery equipment affects the motor current via torque transmission [2]; thus motor current signal analysis (MCSA) was proposed for condition monitoring. This is a noninvasive method that utilizes the current signal to detect faults. Given that the induction motor usually has its own current monitoring system, there is no need to install additional sensors. However, acquisition of other signals, such as vibration, temperature, and acoustics [3–5], requires installation of additional intrusive sensors, which is costly and may damage the original structure [6]. Fault diagnosis of rotating machinery based on MCSA has received significant attention in extant studies. Most studies were focused on the parallel shaft gearboxes and motor bearings.

Yilmaz and Ayaz [7] used the sum of the power spectral densities for certain frequency intervals of motor current and vibration signal, as input features, and added temperature as another feature. Subsequently, they used adaptive neuro-fuzzy inference systems (ANFIS) to classify the fault of motor bearing. Lessmeier et al. [8] conducted experiments on motor bearings with inner and outer race faults and classified motor current signals detected in the experiment using a variety of classical machine learning models. In addition to the abovementioned method of signal analysis, Han et al. [9] proposed an intelligent triboelectric bearing, which exhibits the ability of self-powering and self-sensing wherein the output current changes according to the bearing fault. Stator current modeling for defective rolling bearings based on magnetic equivalent circuits was also conducted [10]. As for fault diagnosis of a gearbox, Mohanty and Kar [11] studied the induction motor current signal of a three-stage gearbox under fault condition via amplitude demodulation and frequency demodulation. Jiang et al. [12] analyzed vibration data measured from a transmission gearbox with gear fault and discussed the impact of initial center frequencies on variational mode decomposition. When compared with the parallel shaft gearbox, the periodic situation of the planetary gearbox is more complex. First, there are multiple planet gears that mesh with the sun gear. This can lead to signal superposition [13]. Second, the planet gear not only rotates but also revolves with the carrier. Third, the original property of the periodic signal is varied with many modulation signals and a complex transmission path. Thus, fault diagnosis in planetary gearbox becomes more difficult than those of parallel shaft gearbox and motor bearing due to these characteristics. Furthermore, there is a paucity of studies on this topic. 
For planetary gearbox under different fault conditions, time-frequency space vector modulus analysis [14] and magnetic equivalent model [15] of stator current have been established. Furthermore, Zhang et al. [16] analyzed the motor current under the condition of ring gear tooth fracture via a resonance residual method, which proves that the frequency spectrum near the natural frequency of the transmission system contains rich information.

In terms of research methods, demodulation, wavelet transform [17], empirical mode decomposition [18], Hilbert–Huang transform [19], extraction and analysis of nonlinearities [20], and other methods are commonly adopted for extracting fault features from signals. With the enhancement of computing performance, data-driven machine learning models have been developed [21, 22]. Increasingly, researchers use signal analysis and statistical methods as the basis of feature extraction and combine them with traditional machine learning models for intelligent fault diagnosis. Lei et al. [23] conducted filtering and empirical mode decomposition for vibration signals of the motor under different motor bearing faults in the time and frequency domains. They then selected features and classified them with the genetic algorithm and ANFIS. Peng et al. [24] designed two support vector machine (SVM) models to classify the vibration and current signals of the gearbox in wind turbines according to the extracted features and used D-S information fusion to determine the fault category. Cheng et al. [25] proposed an autoencoder, which reduces the dimension of the input signal from different fault experiments, and then adopted a multiclass SVM model to classify the output of the autoencoder.

To extract representative features from signals, the methods of intelligent fault diagnosis mentioned above are time-consuming. To a great extent, the methods depend on the prior knowledge of signal processing technology and rotor dynamics. However, unsupervised learning and deep learning represented by neural networks reduce the requirement of prior knowledge because of their ability to automatically extract features and lead to direct diagnoses of the original signal or simply processed signal. Shen et al. [26] proposed a method based on stacked contractive autoencoder for feature extraction from amplitude spectrums. Jiang et al. [27] used the deep belief network to fuse the vibration signal and current signal to classify the fault of a two-stage gearbox. Hoang and Kang [28] adopted a convolutional neural network to obtain the features from two phases of motor currents and conducted the fault diagnosis based on the information fusion of perceptron, support vector machine, and *k*-nearest neighbor. Li et al. [29] proposed a deep random forest model, which integrates different signals as input, to diagnose a two-stage parallel gearbox. Lei et al. [30] proposed a two-stage bearing fault diagnosis method. In the first stage, unsupervised sparse filtering was used to extract features from raw signals, and the softmax function was adopted for classification in the second stage.

The above review shows that existing research on MCSA-based fault diagnosis of planetary gearboxes is relatively scarce, and the performance of different feature extraction methods and classification models should be examined. Therefore, fault diagnosis of the planetary gearbox based on MCSA is conducted in this study. All the techniques adopted in this paper are common, and one of the aims of this study is to help engineers and researchers select an appropriate strategy for practical planetary gearbox fault detection by discussing preprocessing, feature extraction, and classification models. After the effects of various faults on the characteristic frequencies of the motor current are studied, a signal preprocessing method, singular spectrum analysis (SSA), is proposed to remove the supply frequency component and obtain a reconstructed signal. Subsequently, a convolutional neural network (CNN) and four classical machine learning models are selected to determine the type of fault. For the classical models, principal component analysis (PCA) is performed to extract the features used as inputs. Then, dynamic experiments on a planetary gearbox with seven types of faults, including tooth chipping in the sun/planet/ring gears, inner race spall in the planet bearing, and inner race, outer race, and ball spalls in the input support bearing, are conducted. Finally, a few conclusions are summarized at the end.

#### 2. Influence of Fault on Motor Current

A typical planetary gearbox structure is shown in Figure 1, which consists of sun/planet/ring gears and supporting bearings. When a fault occurs, such as tooth chipping in gears or spall in bearings, it causes the load to oscillate.

Therefore, the load torque under fault condition can be expressed as the sum of a fixed torque and a periodic oscillation torque due to the fault. The periodic oscillation torque can be expanded as a Fourier series. Neglecting the higher harmonics,

$$T_{\mathrm{load}}(t) = T_0 + T_c\cos(\omega_c t),$$

where $T_0$ denotes the fixed torque, $T_c$ denotes the amplitude of the oscillation torque, $\omega_c = 2\pi f_c$, and $f_c$ denotes the characteristic frequency due to the fault. Given that the oscillating torque is generated via rotation, $f_c$ is generally considered a multiple of the input shaft rotation frequency $f_r$, that is, $f_c = k f_r$, where $k$ is greater than 0 and is determined by the structure.

In the planetary gearbox, the fault characteristic frequency of the sun gear, planet gear, and ring gear can be expressed as follows [31]:

$$f_c = \frac{f_m}{Z_i}, \qquad f_m = \frac{Z_r Z_s}{Z_r + Z_s}\, f_r,$$

where $f_m$ denotes the meshing frequency, $Z_r$ denotes the number of ring gear teeth, $Z_s$ denotes the number of sun gear teeth, $Z_i$ denotes the number of teeth corresponding to the faulty part, and $f_r$ denotes the input shaft rotation frequency.
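As a rough numerical illustration, the sketch below evaluates a meshing frequency and a per-component fault frequency for a given input shaft frequency. It assumes a fixed ring gear with the sun gear as input, so that $f_m = f_r Z_r Z_s/(Z_r + Z_s)$, and it takes the fault frequency as $f_m/Z_i$. The function names and the tooth numbers in the demo are illustrative assumptions, not values from the experiment.

```python
def meshing_frequency(f_r, z_sun, z_ring):
    """Meshing frequency in Hz, assuming sun-gear input and a fixed ring gear."""
    return f_r * z_ring * z_sun / (z_ring + z_sun)


def gear_fault_frequency(f_r, z_sun, z_ring, z_faulty):
    """Fault characteristic frequency in Hz, taken here as f_m / Z_i."""
    return meshing_frequency(f_r, z_sun, z_ring) / z_faulty


if __name__ == "__main__":
    f_r = 10.0                              # input shaft frequency in Hz (arbitrary)
    z_sun, z_planet, z_ring = 20, 30, 80    # arbitrary tooth numbers
    print("f_m =", meshing_frequency(f_r, z_sun, z_ring))
    for name, z in (("sun", z_sun), ("planet", z_planet), ("ring", z_ring)):
        print(name, gear_fault_frequency(f_r, z_sun, z_ring, z))
```

Other formulations in the literature also include the number of planet gears, so the values should be read as an order-of-magnitude guide only.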

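In addition to gears, rolling element bearings are also used to support the rotary components of the planetary gearbox. Taking the rolling bearing that supports the sun gear as an example, the fault characteristic frequencies of the outer race, the inner race, and the ball are, respectively, as follows:

$$f_o = \frac{N_b}{2}\left(1 - \frac{d}{D}\right) f_r, \qquad f_i = \frac{N_b}{2}\left(1 + \frac{d}{D}\right) f_r, \qquad f_b = \frac{D}{d}\left(1 - \frac{d^2}{D^2}\right) f_r,$$

where $N_b$ denotes the number of balls, $d$ denotes the diameter of the ball, $D$ denotes the bearing pitch diameter, and $f_r$ denotes the shaft rotation frequency.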

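Under normal working conditions, the load torque of the motor is in balance with the output torque of the motor. Once a fault appears, the resultant torque that acts on the mechanical system is $T_{\mathrm{motor}}(t) - T_{\mathrm{load}}(t) = -T_c\cos(\omega_c t)$, where $T_c$ and $\omega_c$ are the amplitude and angular frequency of the oscillation torque. The total moment is equal to the moment of inertia $J$ times the angular acceleration of the rotor, and it can be expressed as follows:

$$J\,\frac{\mathrm{d}\omega_r(t)}{\mathrm{d}t} = -T_c\cos(\omega_c t),$$

where $\omega_r$ denotes the rotor speed.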

The rotor phase can be obtained by integrating twice on both sides of the equation as follows:

$$\theta_r(t) = \frac{T_c}{J\omega_c^2}\cos(\omega_c t) + \omega_{r0}\, t,$$

where $\omega_{r0}$ denotes the constant component of the rotor speed.

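By considering the slip $s$ between the synchronous speed of the stator, $\omega_s/p$, and the speed of the rotor, $\omega_{r0}$, and using the transformation of the reference frame [2], the magnetomotive force of the rotor can be obtained as follows:

$$F_r(\theta, t) = \hat{F}_r\cos\bigl(p\theta - \omega_s t - \beta\cos(\omega_c t)\bigr), \qquad \beta = \frac{p\,T_c}{J\omega_c^2},$$

where $p$ denotes the number of pole pairs, $\omega_s$ denotes the supply angular frequency, and $\hat{F}_r$ denotes the amplitude of the rotor magnetomotive force.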

The magnetomotive force of the stator is not affected by the fault. Hence, it can be directly obtained as follows:

$$F_s(\theta, t) = \hat{F}_s\cos(p\theta - \omega_s t - \varphi),$$

where $\varphi$ denotes the phase difference of the initial magnetomotive force between the stator and rotor.

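The airgap flux density is determined by the sum of the magnetomotive forces of the stator and rotor and by the airgap permeance. With the magnetic flux in the winding written as $\Phi(t) = \Phi_s\cos(\omega_s t + \varphi) + \Phi_r\cos\bigl(\omega_s t + \beta\cos(\omega_c t)\bigr)$, the induced electromotive force can be obtained via differentiation as follows:

$$\frac{\mathrm{d}\Phi(t)}{\mathrm{d}t} = -\omega_s\Phi_s\sin(\omega_s t + \varphi) - \omega_s\Phi_r\sin\bigl(\omega_s t + \beta\cos(\omega_c t)\bigr) + \omega_c\beta\,\Phi_r\sin\bigl(\omega_s t + \beta\cos(\omega_c t)\bigr)\sin(\omega_c t).$$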

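In practice, $\beta$ is much less than 1. Hence, the last term can be ignored. Subsequently, the stator current can be expressed as follows:

$$I(t) = I_{st}\sin(\omega_s t + \varphi) + I_{rt}\sin\bigl(\omega_s t + \beta\cos(\omega_c t)\bigr).$$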

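It is shown that the stator current is composed of two parts after the occurrence of a fault. The first part is from the stator, and the second part is from the oscillation of the rotor. Thus, the fault frequency can be expressed as follows:

$$f_{\mathrm{fault}} = \lvert f_s \pm f_c \rvert,$$

where $f_c$ denotes the characteristic frequency due to the fault and $f_s$ denotes the supply frequency of the stator. The relation of $f_s$ with the rotation frequency $f_r$ is $f_r = 2(1 - s)f_s/P$, where $P$ is the number of poles and $s$ is the slip.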
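The sideband structure can be checked numerically. The following is a minimal sketch, assuming a stator current of the form $I_{st}\sin(\omega_s t + \varphi) + I_{rt}\sin(\omega_s t + \beta\cos(\omega_c t))$ with a small modulation index $\beta$; all parameter values and function names are illustrative assumptions. For small $\beta$, the spectrum should show lines at $f_s \pm f_c$ with an amplitude of about $I_{rt}\beta/2$.

```python
import numpy as np


def stator_current_spectrum(f_s=50.0, f_c=8.0, beta=0.1, i_st=1.0, i_rt=1.0,
                            phi=0.3, fs=2000.0, duration=2.0):
    """Return (freqs, amplitude) of the single-sided spectrum of the modelled current."""
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    w_s, w_c = 2 * np.pi * f_s, 2 * np.pi * f_c
    # Stator part plus rotor part whose phase is modulated at the fault frequency.
    current = (i_st * np.sin(w_s * t + phi)
               + i_rt * np.sin(w_s * t + beta * np.cos(w_c * t)))
    amplitude = 2.0 * np.abs(np.fft.rfft(current)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amplitude


def amplitude_at(freqs, amplitude, f):
    """Amplitude of the spectral bin closest to frequency f (Hz)."""
    return amplitude[np.argmin(np.abs(freqs - f))]


if __name__ == "__main__":
    freqs, amp = stator_current_spectrum()
    for f in (42.0, 50.0, 58.0):
        print(f, amplitude_at(freqs, amp, f))
```

The supply line at 50 Hz dominates the two sidebands by more than an order of magnitude, which is why a preprocessing step that suppresses the supply component is of interest.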

#### 3. Preprocessing Technique: Singular Spectrum Analysis (SSA)

In the current signal, the supply frequency of the current is the dominant component, while the fault characteristic frequency components are usually weak. Hence, it is necessary to separate the different components of the current signal via preprocessing methods.

Singular spectrum analysis (SSA) is often used to deal with the nonlinear time series data [32]. By decomposing and reconstructing the time series, the sequences that represent different components can be extracted, which is helpful in separating weak fault feature information.

The first step of SSA for reconstruction is to transform the time series into a series of multidimensional delay vectors. The method involves sliding a window of length $L$ over the time series and collecting the windowed segments to form the trace matrix $\mathbf{X}$.

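It is assumed that a time series of length $N$ is expressed as $x = (x_1, x_2, \ldots, x_N)$. The elements on each antidiagonal (running from the lower left to the upper right) of the trace matrix $\mathbf{X}$ are equal. Specifically, $\mathbf{X}$ is also termed the Hankel matrix and can be expressed as follows:

$$\mathbf{X} = \begin{bmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{bmatrix},$$

where $1 < L < N$, $K = N - L + 1$.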

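The second step of SSA for reconstruction involves performing a singular value decomposition (SVD) on the trace matrix $\mathbf{X}$:

$$\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathsf{T}} = \sum_{i=1}^{r}\sigma_i\,\mathbf{u}_i\mathbf{v}_i^{\mathsf{T}} = \sum_{i=1}^{r}\mathbf{X}_i,$$

where $\mathbf{U}$ denotes the unit orthogonal matrix of size $L \times L$, and $\mathbf{u}_i$ denotes its $i$th column vector; $\boldsymbol{\Sigma}$ denotes the diagonal matrix of size $L \times K$, and the elements $\sigma_i$ on its diagonal are the singular values of the trace matrix, arranged from large to small; $\mathbf{V}$ denotes a unit orthogonal matrix of size $K \times K$, and $\mathbf{v}_i$ is its $i$th column vector; $r$ denotes the rank of $\mathbf{X}$; and $\mathbf{X}_i$ is the trace matrix corresponding to the $i$th component of the signal.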

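Given that the elements on each antidiagonal of $\mathbf{X}_i$ are generally not equal, the diagonal averaging method is used to replace the elements on each antidiagonal of $\mathbf{X}_i$ with their average value, which yields a new Hankel matrix $\tilde{\mathbf{X}}_i$. Each element in $\tilde{\mathbf{X}}_i$ can be calculated as follows:

$$\tilde{x}^{(i)}_{lk} = \frac{1}{\lvert A_s \rvert}\sum_{(m,n)\in A_s} x^{(i)}_{mn},$$

where $s = l + k$ and $A_s = \{(m, n) : m + n = s,\ 1 \le m \le L,\ 1 \le n \le K\}$.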

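After the trace matrix of each component is obtained, different types of signal can be reconstructed as follows:

$$\mathbf{Y} = \sum_{i\in I}\tilde{\mathbf{X}}_i,$$

where $I$ is the index set of the selected components, and the series read from the antidiagonals of $\mathbf{Y}$ can represent the trend, period, noise, or any other composition of the original signal.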
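The four steps (embedding, SVD, diagonal averaging, and grouping) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments; the function name `ssa_decompose` and the test signal are assumptions, and treating the two leading components as the supply-frequency oscillation is only reasonable when that oscillation dominates the signal energy.

```python
import numpy as np


def ssa_decompose(x, window):
    """Return an array of shape (r, N) holding the SSA elementary series of x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = n - window + 1
    # Embedding: column j of the trajectory (Hankel) matrix is x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = np.empty((s.size, n))
    for i in range(s.size):
        x_i = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: after flipping the rows, antidiagonals become diagonals.
        flipped = x_i[::-1]
        components[i] = [flipped.diagonal(d).mean()
                         for d in range(-(window - 1), k)]
    return components


if __name__ == "__main__":
    t = np.arange(400) / 1000.0
    # Strong 50 Hz "supply" oscillation plus a weak 130 Hz component (synthetic).
    x = np.sin(2 * np.pi * 50 * t) + 0.05 * np.sin(2 * np.pi * 130 * t)
    comps = ssa_decompose(x, window=100)
    residual = x - comps[:2].sum(axis=0)   # drop the dominant oscillatory pair
    print(np.std(x), np.std(residual))
```

Summing all elementary series returns the original signal, so any grouping of components is a lossless split of the input.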

#### 4. Feature Extraction and Classification Model

##### 4.1. Principal Component Analysis (PCA)

Principal component analysis (PCA) is a classical dimension reduction method in data mining. When a group of data has more than one variable, the problem becomes complicated. Typically, there is a certain correlation between these variables, and the information reflected by the two (or more) variables may be redundant. In this case, PCA uses one variable to reflect multiple variables with high similarity and thereby reduces the number of variables.

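In signal processing, it is considered that the principal components have a larger variance and the noise exhibits a smaller variance. A set of $n$ zero-mean signal samples can be expressed as $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]$. The length of each sample is the characteristic dimension $m$, which is equivalent to the length of the signal. To project the $m$-dimensional variables to the new $p$-dimensional space, a group of orthonormal basis vectors in the original $m$-dimensional space should be constructed to ensure that the correlation between different dimensions is zero and the variance of the projected samples is maximum. This basis can be expressed as $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p]$. The projection variance maximization can be realized by the following constrained optimization problem:

$$\max_{\mathbf{W}}\ \operatorname{tr}\bigl(\mathbf{W}^{\mathsf{T}}\mathbf{C}\mathbf{W}\bigr) \quad \text{s.t.} \quad \mathbf{W}^{\mathsf{T}}\mathbf{W} = \mathbf{I},$$

where $\mathbf{C}$ denotes the covariance matrix:

$$\mathbf{C} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^{\mathsf{T}} = \frac{1}{n}\mathbf{X}\mathbf{X}^{\mathsf{T}}.$$

Furthermore, $\mathbf{X} \in \mathbb{R}^{m\times n}$ and $\mathbf{C} \in \mathbb{R}^{m\times m}$.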

With the Lagrange multiplier method, the optimization problem can be transformed into a matrix eigenvalue decomposition problem. The eigenvalues of the covariance matrix from large to small can be listed as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$. By preserving the eigenvectors of the first $p$ eigenvalues, the eigenmatrix is constructed as $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p]$.

Given that the covariance matrix is a real symmetric matrix, the unit eigenvectors are orthogonal to each other. Multiplying the transpose of this eigenmatrix with $\mathbf{X}$, we obtain the projection $\mathbf{W}^{\mathsf{T}}\mathbf{X}$ of the sample features in $p$ dimensions. The percentage of the overall information carried by the retained basis vectors can be calculated by the following equation. Hence, during the experiment, it can be judged whether the extracted features can represent the original data:

$$\eta = \frac{\sum_{i=1}^{p}\lambda_i}{\sum_{i=1}^{m}\lambda_i}. \tag{17}$$
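A minimal NumPy sketch of this procedure is given below. It is an illustration under stated assumptions: samples are stored as rows (the transpose of the column convention used in the text), the covariance is scaled by $1/n$, and the function name `pca_project` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_project(samples, p):
    """Project (n, m) samples onto the p leading eigenvectors of the covariance.

    Returns the (n, p) projection and the retained-variance ratio.
    """
    x = samples - samples.mean(axis=0)           # enforce zero mean
    cov = x.T @ x / x.shape[0]                   # (m, m) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]            # sort from large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    w = eigvecs[:, :p]                           # eigenmatrix of the first p vectors
    ratio = eigvals[:p].sum() / eigvals.sum()
    return x @ w, ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))
    mixing = rng.normal(size=(2, 30))
    data = latent @ mixing + 0.01 * rng.normal(size=(200, 30))
    projected, ratio = pca_project(data, p=2)
    print(projected.shape, ratio)
```

Because the ratio depends only on the eigenvalues, it is unaffected by whether the covariance is scaled by $1/n$ or $1/(n-1)$.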

##### 4.2. Classical Machine Learning Model

Four classical machine learning models, including support vector machine (SVM), decision tree (DT), random forest (RF), and AdaBoost, are adopted in this study. They are realized with Scikit-learn, an open-source machine learning library. The structure and principle of these models are briefly introduced as follows:

The SVM has been widely used in classification and regression problems due to its good robustness and strong generalization ability for unknown data. The core idea of SVM is to find a hyperplane that linearly separates the training samples in space. The samples that are closest to the hyperplane are called support vectors. The model is optimized by maximizing the distance between the support vectors and the hyperplane. When the samples cannot be separated linearly in the low-dimensional space, SVM can use a kernel function to map the data to a high-dimensional space in which a separating hyperplane can be found. The main parameters of SVM in programming are the kernel function, the kernel coefficient gamma, and the regularization parameter C. Usually, the kernel function is the Gaussian kernel function. As for gamma, if it is too large, the variance of the Gaussian distribution is too small. Hence, the model acts only in the neighborhood of the support vectors, which may lead to overfitting. Conversely, if gamma is too small, then the Gaussian distribution is too smooth, which results in underfitting. As for C, if it is too large, then the model penalizes classification errors heavily, which leads to an overfit model.

The structure of DT is similar to that of a tree. Each internal node represents a judgment on an attribute, each branch represents an outcome of the judgment, and each leaf node represents a classification result. The priority among different attributes is determined by their contribution to reducing impurity. The main parameters of the decision tree are the criterion function, maximum depth, minimum number of samples for node subdivision, and minimum number of samples required at a leaf node. The criterion function is used for calculating the impurity (for example, entropy or Gini impurity). The other parameters determine the structure of the tree.

RF and AdaBoost are ensemble models that are the integration of traditional machine learning algorithms. Their core idea involves combining multiple weak classifiers to obtain a strong classifier. When compared with a single classifier, it reduces the possibility of overfitting the model. In this study, the base classifier for both types of ensemble models is the decision tree.

RF establishes multiple weak classifiers that are independent and equally weighted. Each classifier is trained separately to obtain the combined model. When a new sample is input, its category is selected by voting.

AdaBoost learns the weak classifiers iteratively and adaptively. Each iteration increases the weights of the samples misclassified in the previous iteration and reduces the weights of the correctly classified samples. The weak classifiers are then combined linearly into a strong classifier, with a larger weight given to the weak classifiers with lower error rates.

##### 4.3. One-Dimensional Convolutional Neural Network (CNN)

The neural network differs from the classical machine learning model because it represents the learning method that can automatically extract and select features from the data.

For a given sample $\mathbf{x}$ and its corresponding label $y$, the neural network can use the error backpropagation algorithm to continuously approximate the real mapping function between them. A convolutional neural network is a type of feedforward neural network, which consists of convolutional layers and fully connected layers. An example of a one-dimensional convolutional neural network is shown in Figure 2.

In the convolutional layer, features are extracted from the input data:

$$\mathbf{y} = f(\mathbf{x} * \mathbf{w} + b),$$

where $\mathbf{x}$ denotes the input data, $*$ denotes the convolution operation, $\mathbf{w}$ denotes the kernel, $b$ denotes the bias, and $f$ denotes the activation function.

The convolution kernel performs the convolution operation by sliding a window over the input signal. Each convolutional layer can use multiple convolution kernels to extract different features from the current input. The purpose of the activation function is to introduce nonlinearity into the output of a neuron. If the activation function is not used, then the whole network reduces to a linear combination of its inputs. The commonly used activation functions are the rectified linear unit (ReLU) and sigmoid:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}.$$

Subsequently, the feature obtained via convolution enters the pooling layer. The pooling operation is also realized by sliding a certain length of window on feature maps. When the window slides, the average or maximum value in each window is calculated to form a new feature map. The essence of pooling is actually a resampling operation, which reduces the amount of data.

Suppose that a signal sample input into the convolutional layer has size $1 \times l$. By using $n$ convolution kernels of size $1 \times k$, $n$ feature maps are obtained. After pooling, the input of the next layer has size $n \times l'$, where $l'$ is the length of each pooled feature map. Meanwhile, the size of each convolution kernel in the next layer should be $n \times k$. Normally, $k$ in each layer is selected from the commonly used values in CNN research, and $n$, which is also known as the number of channels, is determined by the complexity of the classification task.
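The shape bookkeeping described above can be made concrete with a small NumPy sketch. It is an illustration only: it assumes stride 1 with no padding, uses ReLU and max pooling, and, as in most deep learning libraries, slides the kernel without flipping it. The helper names `conv1d_layer` and `max_pool1d` are hypothetical.

```python
import numpy as np


def conv1d_layer(x, kernels, bias):
    """x: (c_in, length); kernels: (n, c_in, k); bias: (n,). Stride 1, no padding, ReLU."""
    n, c_in, k = kernels.shape
    out_len = x.shape[1] - k + 1
    out = np.empty((n, out_len))
    for j in range(out_len):
        window = x[:, j:j + k]                                  # (c_in, k) slice
        out[:, j] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool1d(x, size):
    """Non-overlapping max pooling along the length axis; trailing samples are dropped."""
    n, length = x.shape
    trimmed = x[:, :length - length % size]
    return trimmed.reshape(n, -1, size).max(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.normal(size=(1, 16))                           # one channel, l = 16
    h1 = conv1d_layer(signal, rng.normal(size=(4, 1, 3)), np.zeros(4))
    p1 = max_pool1d(h1, 2)
    # Kernels of the next layer span all 4 channels of the pooled feature maps.
    h2 = conv1d_layer(p1, rng.normal(size=(8, 4, 3)), np.zeros(8))
    print(h1.shape, p1.shape, h2.shape)
```

Each kernel in the second layer has one row per incoming channel, which is why its size grows from $1 \times k$ to $n \times k$.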

By alternately stacking multiple convolution operations and pooling layers, the overall structure of the convolutional layers can be obtained. The output feature map of the last convolutional layer is directly flattened into a one-dimensional vector and input into the fully connected layer. For the fully connected layer, the operation process of each layer can be expressed as follows:

$$\mathbf{a}^{(l)} = f^{(l)}\bigl(\mathbf{W}^{(l)}\mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}\bigr),$$

where $\mathbf{W}^{(l)}$ denotes the weight matrix of layer $l$, $\mathbf{b}^{(l)}$ denotes the bias of layer $l$, $\mathbf{a}^{(l-1)}$ denotes the input of layer $l$, $\mathbf{a}^{(l)}$ denotes the output of layer $l$, which is also the input of the next layer, and $f^{(l)}$ is the activation function of layer $l$.

The length of the last output vector $\mathbf{z}$ from the fully connected layer is equal to the number of categories $N_c$. Then, softmax is used to convert each element of the output vector to a value in the interval (0, 1), such that the values sum to 1. Therefore, the conversion result can be regarded as the probability of each category. Finally, the result is the category that corresponds to the maximum probability. The softmax equation is as follows:

$$\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{N_c} e^{z_j}}, \qquad i = 1, \ldots, N_c.$$

With respect to the training process, the cross-entropy loss function is used to measure the error between the predicted and real fault type because this is a classification problem. The cross-entropy loss function is as follows:

$$\mathcal{L} = -\sum_{i=1}^{N_c} y_i \log \hat{y}_i,$$

where $y$ denotes the real fault category label and $\hat{y}$ denotes the fault category label predicted by the network.
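As a small worked example, the sketch below evaluates softmax and the cross-entropy loss for one sample. It assumes a one-hot encoded true label; the function names and the example scores are illustrative, and the small `eps` term is only a numerical safeguard against taking the logarithm of zero.

```python
import numpy as np


def softmax(z):
    """Convert a score vector into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # subtract the maximum for numerical stability
    return e / e.sum()


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(-np.sum(y_true * np.log(y_prob + eps)))


if __name__ == "__main__":
    scores = [2.0, 0.5, -1.0]        # hypothetical outputs of the last layer
    probs = softmax(scores)
    print(probs, int(np.argmax(probs)), cross_entropy([1, 0, 0], probs))
```

The loss is zero only when the full probability mass is assigned to the true category, and it grows as that probability shrinks.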

#### 5. Experiment

##### 5.1. Test Bench and Dataset

The test rig, as shown in Figure 3, consists of an induction motor (rated power: 3 kW, rated speed: 1480 rpm), a 2K-H planetary gearbox, and a magnetic powder brake. A torque/speed transducer is placed between the motor and gearbox via mechanical couplings. The torque load is applied via the magnetic powder brake. The planetary gearbox has three planet gears, and the tooth numbers are 16 (sun gear), 63 (planet gear), and 143 (ring gear). The input rolling bearing, which is used to support the rotation of the sun gear, is a deep-groove ball bearing (6208). The planet bearing is a cylindrical roller bearing (NJ204).

Seven types of faults have been seeded in the test gearbox. They are chipped teeth in the sun/planet/ring gears (SG, PG, RG), inner race spall in the planet bearing (PG-IR), and inner race, outer race, and ball spalls in the input support bearing (IR, OR, B). The pictures of the faulty components are shown in Figure 3. For comparison, there is a control group with no fault.

A current sensor was utilized to acquire the phase A current signal of the motor. There are five types of torque loads in the system that run under each type of fault, and the experiments are conducted four times under each operating condition (see Table 1). Each experiment collects 30 s data, and the sampling frequency is 25.6 kHz.

In this study, 20 s of data from each of two files under torque loads of 0 N·m, 5.25 N·m, and 10.5 N·m are selected to build the model. They are split by a window of size 0.1 s. Thus, a total of 9600 samples are acquired. The 9600 samples are divided into a training set and test set 1 at a ratio of 70% to 30%. In addition, 3200 samples (10 s of data from each of two files under torque loads of 2.6 N·m and 7.8 N·m), which represent unlearned conditions, are used to evaluate the robustness of the model. This set of 3200 samples is termed test set 2.
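The sample counts can be checked with a few lines of arithmetic. The sketch below assumes non-overlapping windows, eight conditions (seven faults plus the normal case), and that the stated duration is taken from each of the two files; under these assumptions it reproduces the 9600 and 3200 samples quoted above. The function name is hypothetical.

```python
def count_windows(seconds_per_file, n_files, n_loads, n_conditions, window_s):
    """Number of non-overlapping windows over all files, loads, and conditions."""
    per_file = round(seconds_per_file / window_s)
    return per_file * n_files * n_loads * n_conditions


if __name__ == "__main__":
    sampling_rate = 25600            # Hz
    window_s = 0.1                   # s
    points_per_sample = round(sampling_rate * window_s)

    learned = count_windows(20, 2, 3, 8, window_s)      # loads 0, 5.25, 10.5 N·m
    unlearned = count_windows(10, 2, 2, 8, window_s)    # loads 2.6, 7.8 N·m
    n_train = round(0.7 * learned)
    n_test1 = learned - n_train

    print(points_per_sample, learned, n_train, n_test1, unlearned)
```

Each sample therefore contains 2560 points, and the 70%/30% split gives 6720 training samples and 2880 samples in test set 1.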

##### 5.2. Signal Analysis

The current signals of the seven types of fault and of the normal condition are analyzed. As shown in Figure 4, the supply frequency at 50 Hz and its harmonics are the most evident features. No fault characteristic frequency is observed in the amplitude spectrum, and there is no clear visual difference among the eight amplitude spectra.

According to other research [33], there are sidebands and many other characteristic frequencies in the amplitude spectrum, although they may not be directly observable. To eliminate the strong effect of the supply frequency and expose the signal that may contain the fault information, wavelet packet decomposition (WPD), ensemble empirical mode decomposition (EEMD), and independent component analysis (ICA) are considered, in addition to the proposed SSA, for removing the supply frequency. However, EEMD is not suitable for massive signal processing, and the experimental results indicate that it cannot isolate the complete supply frequency component for certain samples. As for ICA, if it is used for dimension reduction, then its function is the same as that of PCA in this case. If it is used for blind signal separation, then multiple sensors are needed to acquire multiple signals. Therefore, these two methods are not adopted in this study.

After processing the current signal with SSA, by considering the inner race fault bearing as an example (as shown in Figure 5), the reconstructed signal by SSA can be divided into two categories, namely, the signal with the same periodicity as the supply frequency and residual signal.

Therefore, the periodic components of the original signal can be removed, and the signal can be reconstructed with the residual signal. The periodic component is compared with the residual and original signals, as shown in Figure 6.

Besides SSA, an approach based on WPD is also considered for comparison with SSA. After removing the nodes, which represent frequencies near the supply frequency in the binary decomposition tree, signals can be reconstructed.

Figure 7 presents the amplitude spectra of the inner race fault signal from Figure 4 after applying SSA and WPD. Evidently, SSA exhibits a better ability to eliminate the supply frequency components.

##### 5.3. Experimental Results

Before the experiment, signals are labeled according to the types of fault. Because traditional machine learning models require feature engineering, PCA functions as the feature extractor. The resulting combinations (PCA-SVM, PCA-DT, PCA-RF, and PCA-AdaBoost), together with CNN, are used in this classification task. The model inputs can be divided into four groups according to whether they are preprocessed by SSA and whether they are transformed into the frequency domain via fast Fourier transform (FFT). Because SSA involves large-scale matrix operations, and resampling the original data can lead to information loss, the window size should not be extremely large. In this case, it is 640.

The experiment is implemented in Python. First, PCA compresses the raw signal to 20 dimensions in advance. By using equation (17) to calculate the proportion of information in the 20 dimensions, the result exceeds 95%. Thus, these 20-dimensional data represent the original data. With respect to SVM, gamma and C are chosen by selecting the best combination according to the results. In this study, the combination that gives the best result is gamma at its default value and C = 100. For DT, Gini impurity is selected as the criterion function, the maximum depth is set to 30, the minimum number of samples for node subdivision is 10, and the minimum number of samples required at a leaf node is set to 5. The parameters of the base classifier of RF and AdaBoost are the same as those of the DT above.
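With Scikit-learn, the settings listed above could be arranged roughly as follows. This is a sketch under assumptions rather than the code used in the study: the Gaussian (RBF) kernel, the `"scale"` default for gamma, the library-default number of estimators in the ensembles, and the helper name `build_models` are all choices made here for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Tree settings quoted in the text, shared by DT, RF, and the AdaBoost base tree.
TREE_PARAMS = dict(criterion="gini", max_depth=30,
                   min_samples_split=10, min_samples_leaf=5)


def build_models(n_components=20, random_state=0):
    """Return the four PCA-based pipelines keyed by the names used in the text."""
    def with_pca(classifier):
        return make_pipeline(PCA(n_components=n_components), classifier)

    base_tree = DecisionTreeClassifier(random_state=random_state, **TREE_PARAMS)
    return {
        "PCA-SVM": with_pca(SVC(kernel="rbf", C=100, gamma="scale")),
        "PCA-DT": with_pca(DecisionTreeClassifier(random_state=random_state,
                                                  **TREE_PARAMS)),
        "PCA-RF": with_pca(RandomForestClassifier(random_state=random_state,
                                                  **TREE_PARAMS)),
        # The base estimator is passed positionally because its keyword name
        # differs between Scikit-learn versions.
        "PCA-AdaBoost": with_pca(AdaBoostClassifier(base_tree,
                                                    random_state=random_state)),
    }
```

Each pipeline is fitted with `model.fit(X_train, y_train)` and evaluated with `model.score(X_test, y_test)`, where the rows of `X` are the windowed current samples.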

However, these machine learning models, which require features to be extracted in advance, are not suitable for the original long sequence signal. Thus, CNN is adopted. In view of the computational burden, the specific structure is designed as shown in Table 2. The kernel size and number of channels are popular choices in CNN studies. There are 4 convolutional layers and a fully connected layer. In the case of the CNN with time domain signal input, the adaptive moment estimation (Adam) optimizer is selected, and its learning rate can adapt to the training process. In the case of the CNN with frequency domain signal input, the stochastic gradient descent (SGD) optimizer is selected, and the learning rate *α* is set to 0.0015. The optimizers are chosen experimentally: each reaches a higher training accuracy in its respective domain. It has been shown that Adam and SGD exhibit different performances in different tasks. Adam usually converges faster than SGD. However, it may not converge in some cases and can miss the optimum solution [34]. In another study [28], a similar strategy using different optimizers is also adopted.

After the training, the current signals of test set 1 and test set 2 are inputted into each model to obtain the classification results. The accuracy of test set 1 is shown in Table 3.

The comparison of different models reveals that PCA-SVM exhibits the best performance in the time domain, with or without preprocessing. It is also the best model in the frequency domain for reconstructed signals. It exhibits a maximum accuracy of 99.83% and an average accuracy of 93.92%. Furthermore, PCA-AdaBoost exhibits the second-highest average accuracy of 92.57%. It is the best model for frequency domain signals that are not preprocessed. The average accuracy of CNN is slightly inferior to that of PCA-AdaBoost. These three models are very stable. PCA-RF is ranked after these three models. The worst model among them is PCA-DT. Its maximum accuracy is 94.13%. However, its average accuracy is lower than 80%.

The preprocessing method, SSA, seems to have only a weak impact on the results for test set 1. It improves the performance of CNN in the time domain and, slightly, that of PCA-SVM in the frequency domain. The accuracies of PCA-DT and of the two ensemble models (PCA-RF and PCA-AdaBoost) decrease in both domains, which suggests that the components of the reconstructed signal may contain misleading features that these three models learn. The performance with WPD is even worse than with SSA, which suggests that SSA may be the better signal reconstruction method.

After the signals are transformed into the frequency domain via FFT, the average classification accuracy increases. For CNN and PCA-SVM, performance in the time domain is already extremely good, so little headroom remains and the transform yields no obvious improvement or even a slight decrease in accuracy. For the other three models, with data not preprocessed, the accuracy of PCA-DT increases by 13.40%, that of PCA-RF by 3.13%, and that of PCA-AdaBoost by 0.35%. For the signal reconstructed by SSA, the accuracy of PCA-DT increases by 22.16%, that of PCA-RF by 5.98%, and that of PCA-AdaBoost by 2.30%. WPD shows the same improving trend.
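The following sketch illustrates why a frequency domain representation can expose fault features that are hard to see in the time domain. It builds a synthetic current with a simple amplitude-modulation fault model and computes its single-sided amplitude spectrum. All numerical values (a 1000 Hz sampling rate, a 50 Hz supply frequency, an 8 Hz fault frequency, a 5% modulation depth) are assumptions chosen for illustration and are not the experimental settings. A naive DFT is used to keep the example dependency-free; in practice an FFT routine computes the same spectrum far more efficiently.

```python
import cmath
import math

def amplitude_spectrum(x, fs):
    """Single-sided amplitude spectrum of a real signal via a naive O(N^2) DFT."""
    n = len(x)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        # DC and (for even n) Nyquist bins have no mirrored partner to fold in.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * fs / n)
        amps.append(scale * abs(coeff) / n)
    return freqs, amps

# Illustrative values only (assumptions, not the experimental settings).
FS = 1000.0       # sampling rate in Hz
N = 500           # number of samples, giving a 2 Hz bin spacing
F_SUPPLY = 50.0   # supply frequency in Hz
F_FAULT = 8.0     # hypothetical fault characteristic frequency in Hz
DEPTH = 0.05      # hypothetical modulation depth

# Supply-frequency carrier whose amplitude is weakly modulated by the fault.
current = [
    (1.0 + DEPTH * math.cos(2 * math.pi * F_FAULT * i / FS))
    * math.cos(2 * math.pi * F_SUPPLY * i / FS)
    for i in range(N)
]
freqs, amps = amplitude_spectrum(current, FS)

# The three largest bins: the supply line and its two sidebands.
top = sorted(range(len(amps)), key=lambda k: amps[k], reverse=True)[:3]
peak_freqs = sorted(freqs[k] for k in top)
```

In this toy signal the sidebands at 42 Hz and 58 Hz are 40 times smaller than the supply line, yet each occupies its own spectral bin, whereas in the time domain they appear only as a faint ripple on the envelope.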

In general, PCA-SVM is the best model on test set 1, which shows that a deep learning model is not necessarily better than a classical machine learning model. SSA mainly benefits CNN in the time domain and does not significantly improve the performance of the other models. Thus, the preprocessing method should be designed carefully to avoid adverse effects. Transforming the time domain data into the frequency domain can improve diagnosis accuracy.

Figure 8 presents the confusion matrix of the CNN classification results for the raw signals. The classification errors are concentrated on the bearing faults. The accuracy of identifying ball spalls in the input supporting bearing is the lowest; this fault is mostly confused with the outer race fault, followed by the sun gear fault. The inner race fault is likely to be identified as an outer race fault or a ball fault. The outer race fault and the sun gear fault are easily confused with the planet gear fault. The other fault types are recognized with high accuracy. These confusions probably arise because the rolling bearing supports the sun gear and the distance between them is small, which leads to similar signal transmission paths.
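The per-class reading of a confusion matrix described above can be sketched as follows. The labels and predictions are invented toy values that only mimic the qualitative pattern (a ball fault most often mistaken for an outer race fault); they are not the data behind Figure 8, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def per_class_recall(cm):
    """Fraction of each true class that was classified correctly."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(cm)]

def top_confusion(cm, labels, true_label):
    """Class most often predicted in place of `true_label`, or None.

    Ties are broken by label name (the alphabetically larger label wins).
    """
    i = labels.index(true_label)
    wrong = [(n, labels[j]) for j, n in enumerate(cm[i]) if j != i and n > 0]
    return max(wrong)[1] if wrong else None

# Invented toy data (assumption), not the experimental results.
labels = ["ball", "outer_race", "sun_gear"]
y_true = ["ball"] * 5 + ["outer_race"] * 3 + ["sun_gear"] * 2
y_pred = ["ball", "ball", "outer_race", "outer_race", "sun_gear",
          "outer_race", "outer_race", "outer_race",
          "sun_gear", "sun_gear"]

cm = confusion_matrix(y_true, y_pred, labels)
recall = per_class_recall(cm)
worst_pair = top_confusion(cm, labels, "ball")
```

Reading the matrix row by row in this way separates classes that are simply hard to recognize from classes that are systematically mistaken for one particular neighbor, which is the distinction drawn in the discussion of the bearing faults.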

For test set 2, the accuracy of the different methods is shown in Table 4. In the time domain, CNN is the best model for both preprocessed and raw signals. In the frequency domain, PCA-AdaBoost and PCA-SVM exhibit the best performance, as they do for test set 1. On average, CNN exhibits the highest accuracy, followed in order by PCA-SVM, PCA-AdaBoost, PCA-RF, and PCA-DT. Apart from the position of CNN, this ranking is similar to that for test set 1.

For test set 2, unlike for test set 1, reconstructing the original current signal by SSA plays a positive role. The accuracy of all the models in the time domain increases significantly after SSA, especially that of CNN, which suggests that SSA increases robustness in the time domain. SSA also still outperforms WPD. A possible explanation is that, for test set 2, the component containing weak fault characteristics is exposed by the SSA decomposition but not by WPD. However, the positive effect in the frequency domain is not significant.

The advantage of transforming to the frequency domain is more evident for test set 2, with or without preprocessing. The performance in the frequency domain is better than that in the time domain, and even the accuracy of SVM increases.

Vibration signals are also used for comparison. With CNN, the classification accuracy for vibration signals is 99.0%. Although the best classification accuracy of the current signal on test set 1 exceeds that of the vibration signal, training on the current signal takes much longer: the vibration-signal model reaches 99.0% accuracy in only five epochs, while the current-signal model requires more than 100 epochs. Additionally, the vibration-signal model easily attains an accuracy of 95% on test set 2, while the current signal attains a maximum accuracy of only 81.34%. This indicates that the model trained on the vibration signal is more robust. Therefore, in terms of training cost and robustness, the method based on the vibration signal is still better than the method based on the current signal.

#### 6. Conclusions

In this paper, a deep learning model and several classical machine learning models are used for fault diagnosis of bearings and planetary gearboxes. The motor current signal is processed via different methods, and the results are compared.

From the perspective of models, PCA-SVM is the best model for learned data, while CNN is the best model for unlearned data in the time domain. In the frequency domain, PCA-AdaBoost and PCA-SVM exhibit the best performance on both test sets. PCA-DT is the least recommended model. From the perspective of preprocessing, SSA mainly increases the accuracy of CNN in the time domain and exhibits a positive effect on unlearned data in the time domain. However, in other cases, SSA decreases the accuracy. This suggests that an effective preprocessing method should be designed for the target model in the future. Finally, transforming the time domain data to the frequency domain increases the accuracy significantly, which shows that fault features are more likely to be exposed in the frequency domain.

These conclusions can aid in the selection of methods for future studies on MCSA-based fault diagnosis of planetary gearboxes.

#### Data Availability

The motor current data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The research work described in the paper was supported by the National Science Foundation of China under Grant no. 11872222 and the State Key Laboratory of Tribology under Grant no. SKLT2019B09.