Abstract

Pumps are important components in aviation fuel hydraulic systems, and advances in sensor technology and industrial intelligence have made efficient state monitoring of pumps possible. However, when data quality is poor or the amount of data is small, a single data-driven model may not achieve the required diagnostic accuracy. A condition monitoring method for hydraulic gear pumps based on mechanism-data fusion is therefore proposed. The method combines a mechanism model based on the volumetric efficiency formula with a data-driven model based on vibration signals. First, the parameters of volumetric efficiency are solved by fitting the pressure–flow relationship. Subsequently, a multichannel fusion and multikernel function-weighted ensemble support vector classification (MCMK-SVC) model is developed to establish a data-driven model. Finally, through data-level fusion, feature-level fusion, and decision-level fusion, a condition monitoring model based on mechanism-data fusion is built. Experimental verification shows that the accuracy of the fusion models at all three levels exceeds 96.9%. Compared to the single data-driven model or other traditional data-driven models, the accuracy of the proposed method is improved by 3% to 33%, demonstrating the effectiveness of the mechanism-data fusion model.

Keywords: condition monitoring; hydraulic gear pumps; mechanism-data fusion; support vector classification

1. Introduction

The swift advancement of artificial intelligence (AI) technology has catalyzed a transformative wave in the industrial sector. With AI at its core, a myriad of data-driven approaches for monitoring and anticipating equipment conditions have been developed [1–3], showcasing their strengths in accuracy, efficiency, and proactive insights. These innovations have played a pivotal role in augmenting production efficacy and curtailing production costs [4].

Hydraulic gear pumps, celebrated for their compact architecture and outstanding performance, act as the cornerstone of hydraulic systems. They are broadly deployed across various fluid transportation sectors, including aviation. Once a hydraulic gear pump malfunctions, it can affect the operation of the entire transmission system, so it is necessary to detect its faults quickly. Research indicates that component wear due to aviation fuel contamination is a primary factor affecting hydraulic systems’ reliability and structural integrity [5]. Hence, delving into the issues of hydraulic component wear and hydraulic pump failure caused by oil pollution is crucial for ensuring the safe, stable, and reliable operation of hydraulic pumps and the overall equipment. This holds significant economic value and engineering significance.

Mechanism modeling is a traditional method for condition monitoring and fault diagnosis that has strong interpretability. Many scholars have conducted relevant research on the mechanism of pumps. Bensaad et al. [6] constructed a physical model of an axial piston pump and implemented a new leakage measurement method through Kalman filtering. Rituraj and Vacca [7] modeled the leakage flow rate of a gear pump by bending and contracting the flow rate. Novak et al. [8] investigated the effect of particle wear on hydraulic pumps. Peng et al. [9] proposed a three-dimensional (3D) reconstruction of wear particles using a multiview image sequence, to extract the 3D morphologies of wear particles. Guan et al. [10] constructed a theoretical model of the working characteristics of a spherical water pump and conducted relevant validation experiments. Analyzing the working characteristics of a pump through its signal is also an effective mechanism modeling method [11, 12]. The gear is an important part of pumps. Zharkevich et al. [13] have used the finite element method to optimize the gear pump casing, reducing the weight of the pump while ensuring strength.

As the frontiers of intelligent condition recognition technologies expand, there is a growing trend among scholars towards embracing data-driven strategies for effective condition monitoring of machinery [14]. In the specific context of hydraulic pumps, numerous researchers have initiated their exploration by focusing on the pump’s overall performance metrics. Lan et al. [15] and Liu et al. [16] both used the extreme learning machine (ELM) for the fault diagnosis of hydraulic pumps. Zhu et al. [17] applied the improved LeNet-5 and particle swarm optimization (PSO) to the intelligent fault diagnosis of hydraulic piston pumps. Fault identification models based on convolutional neural networks (CNNs) have excellent performance [18–22]. Sun et al. [23] used an improved inverse Gaussian process with random effects and measurement errors to predict the remaining useful life (RUL) of the hydraulic piston pump. Multisource fusion is increasingly being applied to pump fault diagnosis [24]. When faced with insufficient samples, semisupervised learning, unsupervised learning, and transfer learning have become excellent choices for data-driven models [25–28]. Monitoring the status of individual components within a pump presents an alternative analytical perspective on the pump’s performance, with gears and bearings being critical elements that influence the condition of hydraulic gear pumps. Zhang et al. [29] have developed a digital twin model for bearings. Zhang et al. [30] introduced a multisource domain adaptive fault diagnosis technique, substantiating its efficacy through validation with data from both gears and bearings.

The preceding discussion has delineated two methodologies for equipment condition monitoring: mechanism models and data-driven models. Mechanism models are grounded in physical principles and offer robust interpretability. Nonetheless, they are characterized by intricate equations, restricted applicability, and challenges in providing real-time analysis, rendering them more apt for supplementary advisory roles. Conversely, data-driven models emerge as a highly efficient, real-time capable, and moderately adaptable monitoring technique. Despite this, they impose stringent data quality and quantity requirements and tend to function as a “black box,” which diminishes their capacity for explanation. Combining the mechanism model and the data-driven model can form complementary advantages and obtain a more comprehensive and accurate condition monitoring model.

Many scholars have verified the superiority of fusing data-driven models and mechanism models. Wang et al. [31] used mechanism-guided data to predict the loosening characteristics of bolted connections. Song et al. [32] combined simulated signals and experimental data by using multispectral equilibrium technology to realize the automatic fault diagnosis of variable speed bearings. Zheng et al. [33] proposed a capacity prediction framework for lithium-ion batteries based on an empirical model and a data-driven model, which achieves accurate capacity prediction in the event of battery aging. Ni et al. [34] developed a fast and accurate method to estimate the remaining capacity of failed LiFePO4 batteries based on a mechanism and data-driven fusion model. Li et al. [35] integrated knowledge features and the data-driven model to achieve high-power diesel engine fault diagnosis based on progressive adaptive spark attention learning. Dessena et al. [36] improved the traditional data-driven structural health monitoring system using the Loewner matrix.

Nevertheless, within the previously discussed research, the fusion of mechanism models and data-driven models is simplistic, lacking direct reflection of the equipment’s actual physical state and exhibiting a marked dependency on the quality and quantity of data. Given that real production settings are often fraught with substantial noise that can compromise data quality, such conventional methods might be inadequate. To address this, an innovative method for hydraulic gear pump condition monitoring that synergizes the mechanism model with the data-driven model is proposed. This technique incorporates the pump’s volumetric efficiency as a mechanism model, harnesses empirical triaxial vibration signals for a data-driven framework, and enhances the conventional support vector classification (SVC) model. Through multilevel fusion, including data level, feature level, and decision level, the data generated by the mechanism model can serve as a supplement for the data-driven model, and it achieves precise monitoring even when data quality and quantity are less than ideal.

The main contributions of this paper are listed below.
1. A novel condition monitoring method based on multichannel fusion and multikernel function-weighted ensemble support vector classification (MCMK-SVC) is proposed. This approach integrates the predictions of multiple channels through majority voting. Then, it forms an ensemble SVC model by combining multiple kernel functions with weighted combinations, thus enhancing the performance and applicability of the model.
2. A novel condition monitoring method for hydraulic gear pumps based on the fusion of mechanism and data is proposed, which combines the simulated data generated by the volumetric efficiency formula with the vibration signals. Using data-level fusion, feature-level fusion, and decision-level fusion, the MCMK-SVC condition monitoring model is established to improve the accuracy of hydraulic gear pump condition monitoring.
3. Based on the above, the condition monitoring results of different methods are compared and their advantages and disadvantages are analyzed, confirming that selecting an appropriate fusion method for each scenario and condition achieves better performance.

The rest of this paper is organized as follows. Section 2 introduces the methodology. In Section 3, the performance of the proposed method is verified on the data collected by the experimental platform of hydraulic gear pumps. Section 4 provides the conclusion.

2. Methodology

As shown in Figure 1, a multimodel condition monitoring method based on mechanism-data fusion is proposed for hydraulic gear pumps. The proposed method is implemented from three aspects: mechanism modeling based on volumetric efficiency, data-driven modeling based on MCMK-SVC, and mechanism-data fusion modeling. Fusion modeling includes data-level fusion, feature-level fusion, and decision-level fusion.

2.1. Volumetric Efficiency of Hydraulic Gear Pumps

The volumetric efficiency of hydraulic pumps refers to the ratio of the actual flow rate to the leak-free flow rate. It can reflect the performance of the hydraulic pumps: If the value is small, it indicates that the actual flow of the pumps is small and the leakage flow is large, which means that the clearance between the moving pairs is large and the wear is severe. If the value is large, the opposite is true.

In hydraulic pumps, hydraulic oil plays a variety of roles such as transmitting medium, coolant, and lubricant, and pollutants generated during the operation of hydraulic pumps are also carried away by it. Therefore, the viscosity of the oil directly affects the pumps’ efficiency [37]. The viscosity of hydraulic oil is sensitive to changes in temperature and pressure. The relationship between hydraulic oil viscosity, temperature, and pressure can be expressed as follows:

$$\mu = \mu_0 \exp\left[\alpha p - \lambda \left(T - T_0\right)\right] \tag{1}$$

where $\mu$ is the viscosity of hydraulic oil at the pressure of $p$ and the temperature of $T$, $\mu_0$ is the viscosity of hydraulic oil at the pressure of 101.325 kPa and the initial temperature of $T_0$, $\alpha$ is the pressure viscosity coefficient (Pa−1), and $\lambda$ is the temperature viscosity coefficient (°C−1).

Ignoring the compression loss of the flow, the leakage flow of hydraulic pumps is related to the oil viscosity by Equation (2) [38]. The quantities in Equation (2) are the volumetric loss, the leakage coefficient, the pressure difference of the pump body pressure chamber, the mesh gap between the gear end face and the pump body end face, the radial gap between the tip of the tooth and the shell, the dynamic viscosity of hydraulic oil, the number of teeth in contact with the shell at the tooth tip, and the thickness of the tooth tip.

Equation (2) can be simplified to obtain Equation (3).

Since the remaining terms in Equation (3) are all constant, they are collected into a single leakage coefficient (Equation (4)), which is determined by the structure of hydraulic gear pumps and other factors.

Combining Equations (1) and (4) gives Equation (5).

The volumetric efficiency of hydraulic gear pumps can be expressed as

$$\eta_v = \frac{Q}{Q_t} \tag{6}$$

where $\eta_v$ is the volumetric efficiency of hydraulic pumps, $Q$ is the actual flow, and $Q_t$ is the theoretical flow.

For hydraulic gear pumps, the theoretical flow is approximated by Equation (7) in terms of the number of gear teeth, the modulus of the gears, and the speed of the gears.

Combining Equations (5)–(7), the relationship between volumetric efficiency and pressure and temperature can be expressed as Equation (8).

In the experiment on the pollution wear of hydraulic pumps in this paper, the temperature is kept constant through an oil cooler, so the temperature and its related terms are treated as constants. Rearranging Equation (8) yields Equation (9). Its terms are the hydraulic oil viscosity and related parameters of the pumps, the pressure of the pumps (a measured value), a coefficient related to the pressure viscosity, a coefficient related to the temperature viscosity, and three constant parameters to be solved.

2.2. Proposed Data-Driven Model
2.2.1. Data Preprocessing and Feature Engineering

Data preprocessing and feature engineering are important methods to improve data quality. The first step is data preprocessing, which mainly includes signal dimensionality reduction and noise reduction. In this paper, the dimensionality of the vibration signals is reduced from the energy perspective [39], based on the fact that the energy of a vibration signal is proportional to the square of its amplitude (Equation (10)).
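The energy-based reduction can be illustrated with a minimal sketch. It assumes that the axes are merged sample by sample as the square root of the summed squared amplitudes; this is an assumption made for illustration and not necessarily the exact form of Equation (10). The function name `merge_channels_by_energy` and the sample values are hypothetical.

```python
import math


def merge_channels_by_energy(*channels):
    """Merge equally long channels into one sequence, sample by sample.

    Assumption: each merged sample is sqrt(sum of squared amplitudes), so its
    square equals the summed per-axis energy at that instant.
    """
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [math.sqrt(sum(v * v for v in samples)) for samples in zip(*channels)]


# Hypothetical two-axis vibration samples.
x_axis = [3.0, 0.0, -1.0]
y_axis = [4.0, 2.0, 1.0]
merged = merge_channels_by_energy(x_axis, y_axis)
```

The merged sequence is non-negative by construction, so sign information is lost; whether that is acceptable depends on the features extracted afterwards.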

For signal denoising, complex signals are decomposed into a set of intrinsic mode functions (IMFs) by ensemble empirical mode decomposition (EEMD) [40]. The autocorrelation coefficient series of each IMF and of the raw signal are calculated, and then the correlation coefficient between the series of the raw signal and the series of each IMF is calculated. The IMFs whose correlation coefficients are larger than 0.5 are summed to complete signal reconstruction and noise reduction.
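The IMF selection step can be sketched as follows. The sketch assumes the IMFs are already available (EEMD itself is not implemented here) and uses two toy components as stand-ins; the helper names, the maximum lag of 50, and the toy signal are assumptions for illustration only.

```python
import math


def autocorr(series, max_lag):
    """Normalized autocorrelation coefficients for lags 0..max_lag."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(len(series) - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return cov / norm


def reconstruct(raw, imfs, threshold=0.5, max_lag=50):
    """Sum the IMFs whose autocorrelation series correlates with that of the raw signal."""
    raw_ac = autocorr(raw, max_lag)
    kept = [
        i for i, imf in enumerate(imfs)
        if pearson(raw_ac, autocorr(imf, max_lag)) > threshold
    ]
    denoised = [sum(imfs[i][t] for i in kept) for t in range(len(raw))]
    return kept, denoised


# Toy stand-ins for EEMD output: a slow oscillation plus a small fast component.
n = 200
slow = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
fast = [0.1 * (-1) ** t for t in range(n)]
raw = [s + f for s, f in zip(slow, fast)]
kept, denoised = reconstruct(raw, [fast, slow])
```

In this toy case only the slow component passes the 0.5 threshold, so the reconstruction discards the fast component as noise.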

After data preprocessing, feature extraction is performed, including features of the time, frequency, and time–frequency domain.

The time-domain features are shown in Table 1. They are defined on the time-domain signal, using the total number of sampling points of the signal and the index of each sampling point.

A fast Fourier transform (FFT) is performed on the time-domain signals to obtain frequency-domain signals, and the frequency-domain features are extracted as shown in Table 2. They are defined on the frequency-domain signal, using the index of each spectral line and the frequency corresponding to that spectral line.
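As an illustration of frequency-domain feature extraction, the sketch below computes three commonly used spectral features: centroid frequency, mean square frequency, and variance frequency. The amplitude-weighted definitions are standard textbook forms and are assumed here; they may differ from the exact definitions in Table 2. A direct DFT stands in for the FFT to keep the example dependency-free, and all names are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(signal, fs):
    """One-sided amplitude spectrum via a direct DFT (adequate for short signals)."""
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        amps.append(abs(coeff) / n)
    return freqs, amps


def spectral_features(freqs, amps):
    """Amplitude-weighted centroid, mean square, and variance of the frequency."""
    total = sum(amps)
    centroid = sum(f * a for f, a in zip(freqs, amps)) / total
    mean_square = sum(f * f * a for f, a in zip(freqs, amps)) / total
    variance = sum((f - centroid) ** 2 * a for f, a in zip(freqs, amps)) / total
    return {"centroid": centroid, "mean_square": mean_square, "variance": variance}


# Hypothetical test signal: an 8 Hz sine sampled at 64 Hz for one second.
fs = 64
signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(fs)]
freqs, amps = amplitude_spectrum(signal, fs)
features = spectral_features(freqs, amps)
```

For a pure tone the centroid sits at the tone frequency and the variance is close to zero; a broadband fault signal would raise the variance.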

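The time–frequency-domain features are extracted through wavelet packet decomposition. The raw signal is decomposed by $j$-layer wavelet packets to obtain $2^j$ sub-bands. The energy of each sub-band is as follows:

$$E_i = \sum_{k=1}^{N_i} \left| x_i(k) \right|^2, \quad i = 1, 2, \ldots, 2^j \tag{11}$$

where $x_i(k)$ is the $k$th coefficient of the $i$th sub-band and $N_i$ is the number of coefficients in that sub-band.

The total energy of the raw signal is as follows:

$$E = \sum_{i=1}^{2^j} E_i \tag{12}$$

The energy proportion of each sub-band is as follows:

$$P_i = \frac{E_i}{E} \tag{13}$$

The energy entropy is as follows:

$$H = -\sum_{i=1}^{2^j} P_i \log P_i \tag{14}$$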

In this paper, the vibration signal will be decomposed into eight sub-bands using three-layer wavelet packet decomposition, and the energy proportion of the eight sub-bands and energy entropy will be extracted as time–frequency-domain features.
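The nine time–frequency-domain features can be sketched as below. The paper does not name the wavelet basis, so a Haar wavelet packet is assumed purely to keep the example self-contained; the natural logarithm in the entropy is likewise an assumption. The signal length is assumed to be divisible by 2 to the power of the number of levels, and all names are hypothetical.

```python
import math


def haar_wavelet_packet(signal, levels):
    """Full Haar wavelet packet tree; returns the 2**levels sub-bands of the last level."""
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            pairs = list(zip(band[0::2], band[1::2]))
            next_bands.append([(a + b) / math.sqrt(2) for a, b in pairs])  # approximation
            next_bands.append([(a - b) / math.sqrt(2) for a, b in pairs])  # detail
        bands = next_bands
    return bands


def energy_features(bands):
    """Energy proportion of each sub-band and the energy entropy (natural log)."""
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    proportions = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in proportions if p > 0)
    return proportions, entropy


# Hypothetical 64-point signal with a slow and a fast component.
signal = [math.sin(0.3 * t) + 0.5 * math.sin(2.1 * t) for t in range(64)]
bands = haar_wavelet_packet(signal, levels=3)
proportions, entropy = energy_features(bands)
```

Because the Haar transform is orthonormal, the sub-band energies sum to the energy of the raw signal, which gives a quick sanity check on any implementation. The eight proportions plus the entropy form the nine features.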

Finally, 16 time-domain features, 6 frequency-domain features, and 9 time–frequency-domain features are extracted, totaling 31 features.

Excessive features may cause redundancy and increase computational complexity; therefore, feature screening is also necessary. In this paper, the importance of features is calculated using three algorithms: extremely randomized trees, random forests, and AdaBoost. The feature importances are sorted from highest to lowest and accumulated. Once the cumulative sum exceeds 0.9, the features that participated in the accumulation are considered important and retained. Then, the weights of the features are obtained through linear SVC, and features with a weight of 0 are removed. Finally, a distribution map of the selected features is drawn and inspected manually; if some features still overlap to a high degree, only one of them is kept.
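The cumulative-importance rule can be sketched as follows. The importance values are hypothetical placeholders rather than outputs of the three tree-based algorithms, and how the three importance rankings are combined is not specified here, so the sketch takes a single importance score per feature as given.

```python
def select_by_cumulative_importance(importances, threshold=0.9):
    """Keep the highest-ranked features until their normalized importance sum exceeds threshold."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value / total
        if cumulative > threshold:
            break
    return selected


# Hypothetical importance scores for five features.
importances = {
    "rms": 0.40,
    "kurtosis": 0.30,
    "centroid_frequency": 0.15,
    "energy_entropy": 0.10,
    "peak": 0.05,
}
selected = select_by_cumulative_importance(importances)
```

Here the first three features accumulate to 0.85, so a fourth is needed to pass 0.9 and the least important feature is dropped.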

2.2.2. Improved SVC-Based Multichannel and Multikernel Functions

The principle of SVC [41, 42] is to find a hyperplane to distinguish data with different labels and at the same time make the distance between different categories as large as possible, to maintain a strong generalization ability.

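Given the training set $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, n$, the hyperplane can be expressed as follows:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i \left(x_i \cdot x\right) + b\right) \tag{15}$$

where $\alpha_i$ is the Lagrange multiplier and $b$ is the distance between the hyperplane and the origin.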

The hyperplane is usually used to distinguish linear data. For nonlinear data, the kernel function can be used for transformation.

To better utilize the information of vibration data and improve the performance of the data-driven model, an MCMK-SVC method is proposed. The method first splits the data according to different channels (dimensions) and then trains multiple SVCs using different kernel functions for each channel, including linear kernel, polynomial kernel, Gaussian kernel, sigmoid kernel, and Laplacian kernel. Next, the prediction results of multiple channels are fused by majority voting. If more than half of the predictions are correct, it is considered a successful prediction. Finally, the weights of the corresponding SVCs for each kernel function are calculated through the proportion of successful predictions in different channels, and an ensemble SVC model is obtained through weighted ensemble. The modeling process of MCMK-SVC is shown in Figure 2.
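The voting and weighting logic can be sketched without training any classifier. The per-channel predictions, labels, and kernel weights below are hypothetical placeholders for the outputs of trained SVCs, and the function names are illustrative; with three channels and two classes, "more than half of the predictions are correct" is equivalent to the voted label matching the true label.

```python
from collections import Counter


def majority_vote(channel_predictions):
    """Fuse the per-channel labels of one sample; the most common label wins."""
    return Counter(channel_predictions).most_common(1)[0][0]


def success_proportion(per_channel_preds, labels):
    """Share of samples whose voted label matches the true label."""
    voted = [majority_vote(preds) for preds in zip(*per_channel_preds)]
    return sum(v == y for v, y in zip(voted, labels)) / len(labels)


def weighted_ensemble(kernel_votes, kernel_weights):
    """Combine the voted labels of several kernels for one sample by summed weight."""
    scores = {}
    for kernel, label in kernel_votes.items():
        scores[label] = scores.get(label, 0.0) + kernel_weights[kernel]
    return max(scores, key=scores.get)


# Hypothetical predictions of one kernel's SVCs on three channels for four samples.
labels = [0, 1, 1, 0]
channel_x = [0, 1, 0, 0]
channel_y = [0, 1, 1, 1]
channel_z = [1, 1, 1, 1]
proportion = success_proportion([channel_x, channel_y, channel_z], labels)

# Hypothetical voted labels and weights of three kernels for one sample.
final_label = weighted_ensemble(
    {"linear": 1, "gaussian": 0, "polynomial": 0},
    {"linear": 0.5, "gaussian": 0.3, "polynomial": 0.1},
)
```

In the last call a single heavily weighted kernel outvotes two lightly weighted ones, which is the intended effect of weighting kernels by their success proportion.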

2.3. Proposed Method of Mechanism-Data Fusion Modeling
2.3.1. Mechanism-Data Fusion Modes

Mechanism-data fusion model refers to the fusion of the mechanism model and the data-driven model in some way to form a new model. According to the different subject models, the mechanism-data fusion model can be divided into three fusion modes: (a) the mechanism model is integrated into the input layer, feature layer, algorithm layer, or decision layer of the data-driven model, with the data-driven model as the main and the mechanism model as the auxiliary; (b) based mainly on the mechanism model, supplemented by the data-driven model, utilizing the data-driven model to modify and compensate for the parameters of the mechanism model; and (c) fusion modeling of the mechanism model and the data-driven model.

The proposed method uses the fusion mode of the data-driven model as the main part and the mechanism model as the auxiliary part of the model. This mode means that the output of the mechanism model is integrated into the input layer, feature layer, algorithm layer, decision layer, and other processes of the data-driven model, as shown in Figure 3.

2.3.2. Levels of Mechanism-Data Fusion

Different fusion modes can derive different levels of fusion. The fusion mode with the data-driven model as the main body and the mechanism model as the auxiliary can be divided into four fusion levels: data-level fusion, feature-level fusion, algorithm-level fusion, and decision-level fusion [43]. The proposed method uses the three fusion levels other than algorithm-level fusion.

2.3.2.1. Data-Level Fusion

Data-level fusion [44] means that the output of the mechanism model is taken as the input of the data-driven model, or the output of the mechanism model is taken as the data subset, which together forms the dataset with the data subset of the data-driven model. Data-level fusion should first define the data types required for modeling, such as vibration, temperature, or pressure, and establish the corresponding mechanism models. Second, verify the accuracy of the mechanism model and fine-tune the parameters of the mechanism model. Finally, the output of the mechanism model is used as the data required for data-driven modeling.

The proposed data-level fusion combines the flow rate and volumetric efficiency data obtained from the mechanism model with the vibration signals collected from the hydraulic gear pump pollution loss test to form a dataset. The specific process is shown in Figure 4. The main steps are as follows:
a. The flow data and the volumetric efficiency data are obtained as mechanism data from the optimized volumetric efficiency formula of the hydraulic pump, and then the mechanism data are upsampled through interpolation for subsequent feature extraction.
b. The vibration signals are downsampled to match the mechanism data, and the two are then combined to form a new dataset.
c. The restructured data require preprocessing, feature extraction, and feature selection. The mechanism data are obtained from the mechanism formula, so there is no need for data preprocessing, and only time-domain features are extracted. When selecting features, it is necessary to ensure consistency between mechanism features and signal features.
d. Finally, the features are input into the data-driven model, and the classification results and the model metrics are used as the data-level fusion results.
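Steps a and b can be sketched as a length-alignment problem. The sketch assumes linear interpolation for upsampling and plain decimation (no anti-aliasing filter) for downsampling; both choices, the function names, and the numeric values are assumptions for illustration.

```python
def upsample_linear(values, factor):
    """Insert factor - 1 linearly interpolated points between neighbouring samples."""
    out = []
    for a, b in zip(values, values[1:]):
        out.extend(a + (b - a) * k / factor for k in range(factor))
    out.append(values[-1])
    return out


def downsample(values, target_length):
    """Keep target_length evenly spaced samples (plain decimation)."""
    step = len(values) / target_length
    return [values[int(i * step)] for i in range(target_length)]


# Hypothetical mechanism data (volumetric efficiency) and vibration samples.
mechanism = [0.98, 0.95, 0.90]
vibration = [float(i) for i in range(100)]

mechanism_up = upsample_linear(mechanism, factor=30)  # (3 - 1) * 30 + 1 points
vibration_down = downsample(vibration, len(mechanism_up))
dataset = list(zip(mechanism_up, vibration_down))
```

Each row of `dataset` now pairs one mechanism value with one vibration value, which is the form needed before feature extraction on the combined data.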

Data-level fusion can solve the problems of insufficient data (small sample) and poor data quality (multinoise). Relevant research shows that data-level fusion has gradually become a popular way to acquire data for uncomplicated systems or devices, such as some methods for multisource data fusion [45].

2.3.2.2. Feature-Level Fusion

Feature-level fusion [46–48] means that the output of the mechanism model is processed (or not) to form feature subsets, which together with the feature subsets of the data-driven model form the feature set. The mechanism model reflects the internal operating law of the equipment, and its feature curve often represents the state and performance changes of equipment. Compared to features extracted from the data-driven model, features extracted from the mechanism model often have better performance. Therefore, combining the features extracted from the data-driven model and the mechanism model yields a more representative feature set, and the final output results will be more accurate.

The proposed feature-level fusion extracts and screens the features of the mechanism data and then forms a new feature set together with the features extracted from vibration data. The new set of features is again subjected to feature selection and becomes the input of the data-driven model. The specific process is shown in Figure 5. The main steps are as follows:
a. The flow data and volumetric efficiency data are obtained from the optimized volumetric efficiency formula of the hydraulic pump and upsampled. Then, the time-domain features of the mechanism data are extracted and screened.
b. Data preprocessing, feature extraction, and feature screening are performed on the vibration signal to obtain its features.
c. The mechanism features and signal features are combined into a feature set, and feature selection is performed again to ensure consistency of the features.
d. Finally, the features are input into the data-driven model, and the classification results and the model metrics are used as the results of the feature-level fusion.

2.3.2.3. Decision-Level Fusion

Decision-level fusion [49] means combining the results (evaluation metrics) of the mechanism model and the data-driven model into a new result in some way. This combination method can be implemented by weighted decision-making, classical reasoning, Bayesian inference, or Dempster-Shafer method. In this paper, the weighted combination method will be used in the experiment. The decision-level fusion method first reduces the uncertainty caused by the evaluation indicator of a single model, that is, whether the indicator of a single model can 100% reflect the current equipment state. Second, the proportion distribution of each evaluation indicator among all indicators is considered. The division is more reasonable, which helps improve the accuracy and credibility of the model and makes the results more convincing.

The proposed decision-level fusion combines the metrics of the two models through weighted calculation, and the specific calculation method is as follows:

$$Y = \omega_1 Y_1 + \omega_2 Y_2 \tag{16}$$

where $Y$ is the weighted calculation result; $\omega_i$, $i = 1, 2$, is the assigned weight value; and $Y_1$ and $Y_2$ represent the output value of the mechanism model and the data-driven model, respectively.

The final metric is the weighted calculation of $Y_1$ and $Y_2$. The decision-level fusion modeling process is shown in Figure 6.
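A minimal sketch of the weighted combination follows. It normalizes the two weights so that they sum to one, which is an assumption rather than something stated above; the output values and weights are hypothetical.

```python
def decision_level_fusion(mechanism_output, data_output, w_mechanism, w_data):
    """Weighted combination of the two model outputs, with weights normalized to sum to 1."""
    total = w_mechanism + w_data
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return (w_mechanism * mechanism_output + w_data * data_output) / total


# Hypothetical outputs of the mechanism model and the data-driven model.
fused = decision_level_fusion(0.99, 0.90, w_mechanism=0.4, w_data=0.6)
```

With normalized, non-negative weights the fused value always lies between the two model outputs, so a high-confidence mechanism output can only pull the result toward itself in proportion to its weight.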

3. Experimental Verification

3.1. Modification of Parameters of the Volumetric Efficiency Mechanism Model

To solve and modify the parameters of Equation (9), a hydraulic gear pump oil pollution wear test is carried out to collect relevant test data and modify the parameters of the hydraulic gear pump oil pollution wear mechanism model.

The hydraulic gear pump pollution and wear test bench consists of an oil tank, a hydraulic gear pump, an oil cooler, a flow meter, a particle pollutant detector, an oil filter, a loading valve, a pressure gauge, and other equipment. The test bench is shown in Figure 7. First, the equipment is prepared, and the temperature and working pressure are set. Second, the hydraulic gear pump is run under no-load and pollution-free conditions to obtain its no-load flow rate, and the volumetric efficiency threshold is set. Then, contaminated particles are injected into the hydraulic fluid, the experiment is started, and the relevant data are recorded. Finally, the volumetric efficiency of the hydraulic gear pump is calculated from the recorded data. The experiment was carried out at two temperatures, 50°C and 60°C, and the temperature was kept essentially constant by the oil cooler. Each experiment was carried out in 25 groups, with the pressure adjusted from 1 to 73 MPa, and flow data and pressure data were recorded at an interval of 3 MPa. Partial test data are shown in Table 3.

The volumetric efficiency in Table 3 is calculated from the recorded data. A scatter plot of pressure against volumetric efficiency is drawn, and curve fitting is performed on the data using Python to solve the three parameters in Equation (9). The fitting results are shown in Figure 8(a). Similarly, processing the volumetric efficiency data at a temperature of 60°C yields the results shown in Figure 8(b).
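The fitting step can be illustrated with a simplified stand-in model. The sketch assumes a two-parameter form, 1 − η = A·p·exp(−B·p), motivated by leakage being proportional to pressure divided by a viscosity that grows exponentially with pressure. This is not Equation (9), which has three parameters; the form, the parameter values, and the synthetic data are all assumptions. Taking logarithms makes the model linear in p, so ordinary least squares suffices.

```python
import math


def model(p, a, b):
    """Assumed simplified efficiency curve: eta = 1 - a * p * exp(-b * p)."""
    return 1.0 - a * p * math.exp(-b * p)


def fit_loglinear(pressures, efficiencies):
    """Fit a and b by least squares on ln((1 - eta) / p) = ln(a) - b * p."""
    ys = [math.log((1.0 - eta) / p) for p, eta in zip(pressures, efficiencies)]
    n = len(pressures)
    mean_p = sum(pressures) / n
    mean_y = sum(ys) / n
    slope = (
        sum((p - mean_p) * (y - mean_y) for p, y in zip(pressures, ys))
        / sum((p - mean_p) ** 2 for p in pressures)
    )
    intercept = mean_y - slope * mean_p
    return math.exp(intercept), -slope


# Pressure grid of the test plan: 25 points from 1 to 73 MPa in steps of 3 MPa.
pressures = [1 + 3 * i for i in range(25)]
# Synthetic efficiencies generated from hypothetical parameter values.
true_a, true_b = 0.002, 0.01
efficiencies = [model(p, true_a, true_b) for p in pressures]
fitted_a, fitted_b = fit_loglinear(pressures, efficiencies)
```

For a model that cannot be linearized, such as one with a third additive parameter, a nonlinear least-squares routine would be needed instead of this closed-form fit.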

According to the fitting curve and Equation (9), the following are obtained:
1. When the temperature is 50°C, substituting the fitted parameter values into Equation (9) gives the volumetric efficiency expression of Equation (17).
2. When the temperature is 60°C, substituting the fitted parameter values into Equation (9) gives the volumetric efficiency expression of Equation (18).

To ensure the accuracy of the volumetric efficiency mechanism model parameters, the determination coefficient $R^2$ is used to evaluate the fitting effect of the volumetric efficiency data. $R^2$ can be expressed as follows:

$$R^2 = 1 - \frac{\sum_{i} \left(y_i - \hat{y}_i\right)^2}{\sum_{i} \left(y_i - \bar{y}\right)^2} \tag{19}$$

where $y_i$ and $\hat{y}_i$ represent actual and fitted values, respectively, and $\bar{y}$ represents the mean of the actual values. The fitting degree of the volumetric efficiency data is shown in Table 4.
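The determination coefficient can be computed directly from its definition as one minus the ratio of the residual sum of squares to the total sum of squares. The values below are hypothetical and are not taken from Table 3 or Table 4.

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and fitted volumetric efficiencies.
actual = [0.98, 0.96, 0.93, 0.89]
fitted = [0.979, 0.962, 0.929, 0.890]
r2 = r_squared(actual, fitted)
```

A value close to 1 means the residuals are small relative to the spread of the measurements; it does not by itself say anything about behavior outside the fitted pressure range.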

As shown in Table 4, the volumetric efficiency data at both temperatures have a fitting degree greater than 0.99, indicating a good fit: the mechanism model explains more than 99% of the variance in the measured volumetric efficiency, so the data obtained from it can be considered reliable. However, the volumetric efficiency data are a priori and static and cannot reflect the real-time state of the hydraulic pump, whereas the data-driven model pays more attention to the pump state data.

3.2. Condition Monitoring Model Based on MCMK-SVC
3.2.1. Dataset Construction

The experimental object is an LR025CLS hydraulic gear pump, and vibration signals are collected through a triaxial accelerometer installed directly above the meshing of the two gears of the pump. The sensor model is CT1000SLFP, and the installation position is shown in Figure 9. Then, the vibration signals are collected through software named FluMoSLightV01.50. The sampling frequency is 10,000 Hz, with each sampling period lasting 1 s and a total of 43 samples taken. Among them, there are 24 groups of data at 50°C and 19 groups of data at 60°C.

The collected vibration signal time sequence diagram is shown in Figure 10.

The signals recorded when the pump is working normally and when it fails are compared in Figure 11, which shows that the fault signal is more chaotic than the normal signal.

3.2.2. Condition Monitoring Modeling

After the vibration signals are collected, data preprocessing and feature engineering are performed following Section 2.2. The vibration signal in one of the three directions was removed because of its poor quality. Using the method of Equation (10), the signals in the remaining two directions are reduced and recombined into a single sequence. Then, the new data are denoised through EEMD, and finally, outlier processing and normalization are performed to complete data preprocessing.

The 31 features of the signal in the time domain, frequency domain, and time–frequency domain were extracted by feature engineering, and then 13 features were selected by feature screening: maximum value, square mean root, waveform factor, energy proportion of sub-band 1, energy proportion of sub-band 3, kurtosis factor, skewness factor, energy proportion of sub-band 6, energy proportion of sub-band 8, centroid frequency, mean square frequency, energy entropy, and variance frequency. The distributions of these features are plotted as curves in Figure 12. Figure 12 shows that the energy proportions of sub-bands 1, 3, 6, and 8 overlap almost completely, so only the energy proportion of sub-band 3 is retained because of its high information proportion, as shown in Figure 13.

The data are divided into 43 groups, each containing 3 samples, and then split into training and test sets: the training set contains 27 groups and the test set contains 16 groups, as shown in Table 5.

A data-driven condition monitoring model is designed based on the proposed MCMK-SVC. The data are split into three channels according to vibration signals in three different directions, the data of each channel are used to train SVCs based on five different kernel functions, and the hyperparameters are optimized using the PSO algorithm. Due to the small amount of data, the average result of fivefold cross-validation is used to calculate the metrics of the validation set, resulting in Table 6.

The formulas for calculating each metric in Table 6 are shown in Table 7, where accuracy represents the proportion of correctly predicted samples among all samples; precision reflects the model’s ability to distinguish between negative and positive samples and is defined as the ratio of correctly predicted positive samples to the total number of predicted positive samples; recall indicates the proportion of correctly predicted positive samples among all actual positive samples, thus reflecting the model’s ability to identify positive samples; and the F1-score is a comprehensive measure of precision and recall, calculated by taking the harmonic mean of the two. It is used to evaluate the performance of the model. The successful proportion represents the proportion of groups for which the prediction results of the three channels are still correct after the voting mechanism, out of all the training sample groups. This value is equivalent to the accuracy of the prediction after the vote.
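The four metrics can be computed from the confusion counts as sketched below. The sketch assumes a binary labeling with class 1 as the positive class; the label vectors are hypothetical and the function name is illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true and predicted labels for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With three true positives, three true negatives, one false positive, and one false negative, all four metrics equal 0.75 in this example.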

The weights of each SVC based on different kernel functions in the ensemble will be calculated according to the principle of negation. As shown in the table above, the number of groups for which the five SVCs failed to predict is 2, 5, 4, 4, and 4, respectively. The proportion of each SVC in all failed groups is calculated separately, resulting in 2/19, 5/19, 4/19, 4/19, and 4/19, respectively. The kernel function with the highest success rate should be assigned the highest weight. Therefore, the corresponding weights of the five SVCs are 5/19, 2/19, 4/19, 4/19, and 4/19, respectively.
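The weight assignment can be reproduced with exact fractions. The sketch below is one reading of the "principle of negation": the failure shares are computed, sorted in descending order, and handed out to the kernels in ascending order of failures, so the kernel with the fewest failures receives the largest share. The function name is hypothetical.

```python
from fractions import Fraction


def negation_weights(failed_counts):
    """Give the largest failure share to the kernel with the fewest failures, and so on."""
    total = sum(failed_counts)
    shares_desc = sorted((Fraction(c, total) for c in failed_counts), reverse=True)
    order = sorted(range(len(failed_counts)), key=lambda i: failed_counts[i])
    weights = [None] * len(failed_counts)
    for share, index in zip(shares_desc, order):
        weights[index] = share
    return weights


# Failed-group counts of the five SVCs, as reported in the text.
weights = negation_weights([2, 5, 4, 4, 4])
```

The result matches the weights stated above (5/19, 2/19, 4/19, 4/19, 4/19), and the weights sum to 1 because they are a permutation of the failure shares.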

The performance of the data-driven model is evaluated on the test set. The results of the five SVCs and of MCMK-SVC, obtained after the voting mechanism is applied to the three channels, are shown in Table 8 and Figure 14.

Although MCMK-SVC is not the best-performing model, its performance surpasses that of most single-kernel function SVCs. Owing to the weighted averaging, its performance is also slightly better than the average performance of the individual SVCs. In addition to the improvement brought about by multikernel function weighting, multichannel fusion also demonstrates superiority. As shown in Figure 15 for the linear kernel SVC, fusing the channels makes fuller use of the available information, and the complementarity among the channels improves the performance of the model.
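The two fusion steps discussed above, voting across channels and weighting across kernels, can be sketched as follows. This is a minimal illustration under stated assumptions: the channel predictions of each kernel are first combined by majority vote, and the kernel-level results are then combined by a weighted vote. The function names, labels, and example predictions are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def channel_vote(channel_predictions):
    """Majority vote over the per-channel labels of one kernel.

    With three channels and two classes there is always a strict majority;
    for other settings, ties are resolved by first occurrence.
    """
    return Counter(channel_predictions).most_common(1)[0][0]


def mcmk_predict(predictions_by_kernel, kernel_weights):
    """Weighted vote over kernels, each kernel voting with its channel majority."""
    scores = {}
    for channel_predictions, weight in zip(predictions_by_kernel, kernel_weights):
        label = channel_vote(channel_predictions)
        scores[label] = scores.get(label, 0) + weight
    # The label with the largest accumulated weight is the ensemble output.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    weights = [Fraction(5, 19), Fraction(2, 19), Fraction(4, 19),
               Fraction(4, 19), Fraction(4, 19)]
    # Hypothetical predictions: five kernels, three channels each.
    predictions = [
        ["fault", "fault", "normal"],
        ["normal", "normal", "normal"],
        ["fault", "normal", "fault"],
        ["normal", "normal", "fault"],
        ["fault", "fault", "fault"],
    ]
    print(mcmk_predict(predictions, weights))
```

In this example three kernels vote for the same label with a combined weight of 13/19, so that label is returned even though two kernels disagree.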

Although the data-driven model based on MCMK-SVC has improved its performance compared to most single-kernel function SVC models, it still cannot meet the requirements in terms of various metrics. By combining the mechanism model with it and using high-confidence mechanism data, the performance of the data-driven model can be further improved.

3.3. Condition Monitoring Model Based on Mechanism-Data Fusion

Based on the above theory, a fusion model is proposed in which the data-driven model is the primary component and the mechanism model is the auxiliary component. MCMK-SVC condition monitoring models are established at three fusion levels (data level, feature level, and decision level) to achieve condition monitoring of hydraulic gear pumps and to explore the scenarios in which each fusion-level modeling method is applicable.

3.3.1. Data-Level Fusion Modeling

The data-level mechanism-data fusion model is constructed according to the method introduced in Section 2.3. The flow data and volumetric efficiency data are obtained from the volumetric efficiency formula (Equations (17) and (18)). To allow feature extraction, the mechanism data must be enriched through upsampling: using the volumetric efficiency formula, the experimental data with an interval of 3 MPa are interpolated into simulated data with an interval of 0.1 MPa.
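The upsampling step can be sketched as evaluating the fitted mechanism formula on a finer pressure grid. In the sketch below, the formula is passed in as a callable because Equations (17) and (18) are not reproduced here; the linear placeholder, the pressure range, and the function name are hypothetical.

```python
def upsample_mechanism(formula, p_min, p_max, step=0.1):
    """Evaluate a mechanism formula on a pressure grid with spacing `step` (MPa).

    `formula` stands in for the fitted volumetric efficiency or flow formula.
    Returns a list of (pressure, value) pairs including both end points.
    """
    n_intervals = round((p_max - p_min) / step)
    # Rounding avoids accumulated floating-point drift in the grid values.
    pressures = [round(p_min + i * step, 10) for i in range(n_intervals + 1)]
    return [(p, formula(p)) for p in pressures]


if __name__ == "__main__":
    # Placeholder formula for illustration only, NOT the paper's equation.
    placeholder = lambda p: 1.0 - 0.01 * p
    grid = upsample_mechanism(placeholder, p_min=0.0, p_max=3.0)
    print(len(grid), grid[0], grid[-1])
```

Each 3 MPa experimental interval is thereby replaced by 30 simulated sub-intervals.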

To match the mechanism data with the vibration data, the pressure at which the mechanism data transition from normal to faulty must first be found. Then, on the basis of this pressure, the vibration signal is downsampled. Taking the mechanism formula at 50°C as an example, matching can be achieved with an interval of 3 MPa in the mechanism data. At this interval, the average flow rate change per 3 MPa is 0.0141 L/s and the sampling frequency of the vibration signal is 10000 Hz. Therefore, the number of sampling points should be 0.0141 × 10000 = 141. Similarly, the number of sampling points for the data at 60°C is 154.
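The arithmetic for the 50°C case can be checked with a short sketch. It assumes that the number of sampling points is the product of the average flow rate change and the sampling frequency, rounded to the nearest integer; the function name is hypothetical.

```python
def sampling_points(flow_change_l_per_s, sampling_frequency_hz):
    """Number of vibration samples matched to one mechanism-data interval.

    Assumption: points = average flow rate change x sampling frequency,
    rounded to the nearest integer.
    """
    return round(flow_change_l_per_s * sampling_frequency_hz)


if __name__ == "__main__":
    # 50 degC case from the text: 0.0141 L/s per 3 MPa at 10000 Hz.
    print(sampling_points(0.0141, 10000))
```

For the values quoted in the text, this gives 141 points.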

Finally, the mechanism data and the downsampled vibration data are concatenated into a new dataset for data preprocessing and feature engineering. The mechanism data do not require preprocessing, and only time-domain features are extracted from them. The vibration data are processed according to the method in Section 2.2. To ensure consistency, only features common to both types of data are selected. The resulting features are skewness factor, kurtosis factor, root mean square, peak value, square mean root, peak factor, pulse factor, maximum, and standard deviation. Since the flow rate and volumetric efficiency data are added, the number of channels input to MCMK-SVC increases to five. The dataset composed of these nine features is input into the MCMK-SVC condition monitoring model, and training, parameter tuning, ensemble modeling, and testing are carried out according to the methods in Section 3.2. The results are shown in Figure 16 and Table 9.
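For reference, the nine time-domain features named above can be computed as in the following sketch. It uses common textbook definitions (population standard deviation, peak as the maximum absolute value), which are assumptions and may differ in detail from the definitions used in Section 2.2; the function name and dictionary keys are hypothetical.

```python
import math


def time_domain_features(x):
    """Compute nine time-domain features of a 1-D signal (list of floats).

    Common textbook definitions are assumed. A constant or all-zero signal
    would cause division by zero and is not handled here.
    """
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)   # population std
    rms = math.sqrt(sum(v * v for v in x) / n)
    smr = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2     # square mean root
    peak = max(abs(v) for v in x)
    return {
        "maximum": max(x),
        "standard_deviation": std,
        "root_mean_square": rms,
        "square_mean_root": smr,
        "peak_value": peak,
        "peak_factor": peak / rms,        # also called crest factor
        "pulse_factor": peak / abs_mean,  # also called impulse factor
        "skewness_factor": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis_factor": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
    }


if __name__ == "__main__":
    # Hypothetical short signal, for illustration only.
    print(time_domain_features([1.0, -1.0, 1.0, -1.0]))
```

Applying the same function to every channel keeps the feature definitions identical for the mechanism data and the vibration data, which is the consistency requirement stated above.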

3.3.2. Feature-Level Fusion Modeling

According to the method introduced in Section 2.3, a feature-level mechanism-data fusion model is constructed. Mechanism data are obtained through the same approach as in data-level fusion. Subsequently, the mechanism data and vibration data are matched according to their label. Then, feature engineering is performed on the mechanism data to extract mechanism features. The vibration signal continues to undergo data preprocessing and feature engineering using the method proposed in Section 2.3, resulting in the same set of features as described in Section 3.2.

The mechanism features and signal features are concatenated into a new feature set, and a further feature selection is performed to ensure the consistency of the features. This yields six features: maximum value, skewness factor, kurtosis factor, square mean root, root mean square, and waveform factor. The new feature set is input into the five-channel MCMK-SVC for training, parameter tuning, ensemble modeling, and testing, and the results are shown in Figure 17 and Table 9.

3.3.3. Decision-Level Fusion Modeling

Decision-level fusion is achieved by integrating the metrics of the mechanism model and the data-driven model. The metric of the mechanism model is the determination coefficient R². The experiment in Section 3.1 gave a value of 0.99, so the output value of the mechanism model is taken as 0.99. The metrics of the data-driven model were obtained in the experiment in Section 3.2 and are shown in Table 8; the accuracy of each SVC is used as the output value of the data-driven model. Comparing the metrics of the two models shows that the results of the mechanism model are more reliable than those of the data-driven model. Therefore, the mechanism model is considered more important and is given the higher weight of 0.6, while the weight of the data-driven model is 0.4. The final results are shown in Table 9.
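The weighting described above can be sketched as a simple linear combination. This assumes that the fused metric is the weighted sum of the two model outputs, which is one plausible reading of the description; the function name and the example accuracy value are hypothetical.

```python
def decision_level_fusion(mechanism_metric, data_metric,
                          w_mechanism=0.6, w_data=0.4):
    """Fuse the mechanism-model and data-driven metrics by a weighted sum.

    Assumption: fusion is a linear combination with weights 0.6 and 0.4.
    """
    return w_mechanism * mechanism_metric + w_data * data_metric


if __name__ == "__main__":
    r_squared = 0.99       # mechanism-model output quoted in the text
    svc_accuracy = 0.90    # hypothetical SVC accuracy, for illustration only
    print(decision_level_fusion(r_squared, svc_accuracy))
```

Because the mechanism term contributes a fixed 0.6 × 0.99 = 0.594, the fused value varies only with the data-driven accuracy, scaled by 0.4.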

3.3.4. Conclusion of Mechanism-Data Fusion Modeling

In comparison to single data-driven models, the performance of the mechanism-data fusion models is improved under the various kernel functions.

a. For data-level fusion, thanks to the high accuracy of the mechanism data, the accuracy reached 100% for most SVCs. However, if only the channels with vibration data are considered, the prediction accuracy decreases because downsampling of the vibration data causes a loss of effective information. Fortunately, because two channels of mechanism data are fused, the final result is likely to be correct as long as one of the remaining three channels predicts correctly.

b. For feature-level fusion, the accuracy also reached 100% for all models except the polynomial kernel SVC and the sigmoid kernel SVC. Compared to the single data-driven model and the data-level fusion model, the feature-level fusion model uses fewer features because of the additional feature screening process, which may result in slightly lower prediction accuracy. For example, the sigmoid kernel SVC does not perform as well as it does under data-level fusion. However, feature-level fusion further reduces computational complexity, improves prediction efficiency, and also improves the final prediction results of MCMK-SVC.

c. Decision-level fusion obtains a more comprehensive evaluation result by fusing the indicator of the mechanism model with the indicators of the data-driven model. The determination coefficient of the mechanism model is 0.99, which indicates that the model explains about 99% of the variance in the measured data, so its results are regarded as highly reliable. Therefore, a higher weight of 0.6 is assigned to the mechanism model, while the weight of the metrics of the data-driven models is set at 0.4, and the resulting metrics are all excellent.

d. From the results of the mechanism-data fusion models at the different fusion levels, the applicable scenarios for each fusion level can be determined. The conclusions are shown in Table 10.

When the data types of the two models differ, data-level fusion usually requires downsampling, which may result in the loss of some useful information. For hydraulic pump condition monitoring, data-level fusion is the most appropriate choice if the sampling frequency of the simulated flow data is equivalent to that of the vibration signals. Feature-level fusion is a good approach when data-level fusion does not perform well, because it retains as much effective data as possible. Although features are still reduced in the secondary screening, the impact is smaller than that of reducing the raw data. The performance of data-level and feature-level fusion in the different channels of the linear kernel SVC is compared in Figure 18. Although the two achieve similar performance after multichannel fusion, feature-level fusion performs better within the three original channels of the data-driven model. Decision-level fusion usually performs well when the mechanism data have high accuracy, and it is suitable for application scenarios that rely on final metrics for decision-making.

3.4. Comparative Analysis of Different Methods

To validate the effectiveness of the proposed method and the superiority of mechanism-data fusion, it will be compared with several other methods, including the following:

Method 1: SVC with single channel and single Laplacian kernel function;

Method 2: SVC with single channel and multiple kernel functions;

Method 3: SVC with multiple channels and single Laplacian kernel function;

Method 4: Proposed MCMK-SVC;

Method 5: Data-level fusion;

Method 6: Feature-level fusion;

Method 7: Decision-level fusion;

Method 8: ELM;

Method 9: Backpropagation (BP) neural network;

Method 10: CNN.

The comparison results are shown in Table 11. The metrics of Methods 1 to 4 validate the improvements that multichannel fusion and the weighted ensemble of multiple kernel functions bring to SVC, demonstrating the superiority of ensemble learning. Compared to the single data-driven models, the mechanism-data fusion models of Methods 5 to 7 achieve varying degrees of further improvement. Finally, the comparison with traditional data-driven condition monitoring methods for hydrostatic pumps (Methods 8 to 10) shows that the mechanism-data fusion models have higher predictive accuracy and generalization performance.

4. Conclusions

A novel approach for hydraulic gear pump condition monitoring that combines a mechanism model with a data-driven model has been introduced. The method integrates a volumetric efficiency-based mechanism model with an MCMK-SVC data-driven model at three distinct levels of fusion (data, feature, and decision), thereby enhancing accuracy significantly, particularly when data are scarce or of poor quality. Furthermore, this approach can be coupled with digital twin technology for application in aviation fuel production lines, offering valuable guidance for fault monitoring during actual production processes.

However, certain limitations need to be acknowledged. The data-driven model is not sufficiently advanced and lacks transferability. Future considerations will include adopting more sophisticated and complex models, as well as exploring the potential for applying this method across varying operational conditions. Furthermore, we are dedicated to tackling intricate issues to integrate our method into operational workflows, propelling the smart evolution of aviation fuel production lines.

Data Availability Statement

The datasets used to support the findings of this study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

The authors received no specific funding for this work.