Abstract
The check valve is one of the most important and most easily damaged components of the high pressure diaphragm pump, a typical representative of reciprocating machinery. To ensure normal operation of the pump, its running state must be monitored and its faults diagnosed. However, in check valve fault diagnosis, classification models with a single kernel function cannot fully interpret the classification decision function, and the unreasonable assumption of equal diagnostic costs has a significant impact on classification results. Therefore, a multikernel function and a cost-sensitive mechanism are introduced in this paper to construct a check valve fault diagnosis model based on the multikernel cost-sensitive extreme learning machine (MKLCSELM). Comparative tests on the check valve of a high pressure diaphragm pump show that MKLCSELM achieves comparable or slightly better performance than ELM, CSELM, MKLELM, and the multikernel cost-sensitive support vector learning machine (MKLCSSVM). The presented method also attains very high accuracy on imbalanced datasets, overcomes the weakness of the equal-cost assumption, and improves the interpretability and reliability of the decision function of the classification model. It is therefore more suitable for practical application.
1. Introduction
The high pressure diaphragm pump is the most important equipment in high concentration slurry pipeline transportation. Its working condition directly determines whether the pump can be restarted after stopping and whether accelerated flow occurs in batch transportation. The check valve is the core component of the high pressure diaphragm pump and also the most easily damaged one. To ensure normal operation of the pump, its running state must be monitored and its faults diagnosed [1]. Research on condition monitoring and fault diagnosis of the high pressure diaphragm pump therefore has important practical significance for the development of slurry pipeline transportation.
However, the fault characteristics of reciprocating machinery are difficult to extract because of its complex structure, multiple excitation sources, and unstable operation [2]. To carry out condition monitoring and fault diagnosis of reciprocating machinery effectively, scholars have adapted fault diagnosis methods developed for rotating machinery and have obtained many valuable results [3–5]. Ogle and Morrison [6] analyzed a diaphragm pump failure and found that environmental stress cracking of the diaphragm is one of the main causes of diaphragm pump failure. Their results have provided theoretical support for accident prevention and pipeline maintenance and have greatly reduced maintenance costs. In recent years, the wavelet transform, Fourier transform, information entropy, neural networks, bispectrum analysis, feature fusion, evidence theory, chaos theory, fractal theory, decision trees, and SVM have been widely applied to the fault diagnosis of reciprocating machinery, with many significant achievements [7–16]. Yet compared with the fault diagnosis of rotating machinery, much remains to be improved: the data sample size of reciprocating machinery is huge, and the data hold a great deal of multisource heterogeneous information owing to the complex structure, multiple excitation sources, multiple wearing parts, signal coupling, and strong nonlinearity of such machinery. A single kernel function (such as the radial basis function kernel or the polynomial kernel) is not a reasonable choice for processing all the samples and cannot explain the signal completely. Combining multiple kernel functions is therefore a natural way to achieve better processing results [17–19].
Fault diagnosis models cannot achieve ideal classification results when the fault diagnosis datasets are imbalanced (fault samples are far fewer than normal samples) and the diagnostic costs are unequal. For example, the cost of identifying a normal state as a fault state differs greatly from the cost of identifying a fault state as a normal state: the former only causes an unnecessary inspection and repair for the operator, whereas the latter can lead to a major safety incident. The assumptions of minimum classification error and equal diagnostic cost made by existing classification models therefore need to be overcome [20]. At present, the BP neural network and SVM are relatively mature classification learning methods and play an important role in the fault diagnosis of reciprocating machinery. However, the BP neural network easily falls into local minima and may fail to converge, while the optimization load of SVM grows with the number of parameters to be optimized and the data sample size, and many parameters must be tuned to obtain an optimal SVM classification model. Exploring new classification methods that train quickly, require few optimization parameters, and obtain a globally optimal solution is therefore a hot research topic [21].
In recent years, ELM has been widely used in machine learning and related fields because it is effective, fast, easy to implement, and directly applicable to multiclass problems [22–24]. Moreover, modified ELM models can handle imbalanced samples and obtain better performance [25–28], so modified ELM methods have become a main research direction. On the one hand, the original hidden-layer transfer function based on random feature mapping has been replaced with more efficient transfer functions: the sigmoid function and the radial basis function (RBF) [29–32], both widely used in neural networks, have been introduced into ELM and have yielded better experimental results. On the other hand, improving the classification performance of ELM under multisource heterogeneous data and information fusion is one of the latest research trends. Liu et al. [33] proposed the multikernel ELM (MKLELM), which combines ELM with constrained multikernel learning. Compared with the traditional ELM, MKLELM addresses the selection and optimization of the multikernel function and the application of multisource heterogeneous data processing and information fusion in classification. However, [33] does not consider the impact of classification cost on the classification model. A cost-sensitive mechanism was introduced into the conventional ELM in [34], giving a cost-sensitive classification model that overcomes the drawback of the equal-cost assumption. That model, however, is not very effective for multisource heterogeneous data and information fusion because it is restricted to a single, fixed kernel in the subsequent processing.
With the intensive study of ELM theory and applications, MKLELM and CSELM have greatly promoted the development of ELM, but there is still plenty of room for improvement and extension. Two questions stand out: how to select the most appropriate cost-sensitive method, and how to construct a more general multikernel function that can be widely used in fault diagnosis. Based on the points discussed above, this paper introduces the multikernel function and the cost-sensitive mechanism into ELM to construct an MKLCSELM-based fault diagnosis model for the check valve of the high pressure diaphragm pump.
This paper makes the following main contributions. First, the advantages, shortcomings, and application ranges of oversampling, undersampling, and threshold adjusting are analyzed to provide theoretical support for the choice of cost-sensitive method. Second, a new fault diagnosis method based on MKLCSELM is proposed to diagnose check valve faults of the high pressure diaphragm pump. Third, comparison experiments with ELM, CSELM, MKLELM, MKLCSSVM, and MKLCSELM are carried out, and the effectiveness of the proposed MKLCSELM method is verified.
The remainder of this paper is organized as follows. Section 2 describes the fundamental theory of ELM, MKLELM, cost-sensitive learning, and the evaluation indicators of the classification model. Section 3 presents the implementation of the proposed method in detail. Section 4 describes the experiments. Section 5 analyzes the experimental results. Section 6 offers the discussion and conclusion.
2. Related Work
2.1. Extreme Learning Machine (ELM)
From the classification optimization point of view, the principle of ELM is similar to SVM and LSSVM, whose goal is to obtain the minimum training error and maximum classification margin or generalization ability. So, on the basis of SVM principle analysis, the optimized mathematical model of ELM is described as follows [35]:
In (1), stands for the connection weights between the hidden layer and the output layer, is the Frobenius norm, represents the regularization parameter or penalty factor that balances the minimum training error against the maximum classification margin, is the th column of the error matrix , represents the transpose of a matrix (similarly hereinafter), is the output function of the hidden layer for the input neuron , represents the given training set, and represents the case that sample belongs to the classification label . and are the numbers of training samples and categories, respectively. According to the Karush-Kuhn-Tucker (KKT) conditions, the analytical solution of (1) can be calculated; the detailed derivation is given in [36]. The output weight is solved using the Moore-Penrose generalized inverse:
In (2), the output matrix of the hidden layer is , the output result of ELM classification model is , and is identity matrix.
For a given new sample , the output decision function of the ELM is shown as follows:
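Because the displayed equations are not reproduced here, the following is a minimal sketch of a regularized ELM classifier under the standard formulation, in which the output weights have the closed form H^T (I/C + H H^T)^{-1} Y. The function names, the sigmoid activation, the number of hidden nodes, and the value of C are illustrative assumptions, not settings taken from this paper.

```python
import numpy as np

def _hidden(X, W, b):
    # Sigmoid hidden-layer output, shape (n_samples, n_hidden).
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def train_elm(X, Y, n_hidden=50, C=10.0, seed=0):
    """Fit a regularized ELM (assumed standard form); Y is one-hot, shape (n, classes)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn at random once and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, W, b)
    n = X.shape[0]
    # Closed-form output weights: beta = H^T (I/C + H H^T)^{-1} Y
    beta = H.T @ np.linalg.solve(np.eye(n) / C + H @ H.T, Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Return the index of the output node with the largest response."""
    return np.argmax(_hidden(X, W, b) @ beta, axis=1)
```

Only the output weights are fitted, by a single linear solve, which is the source of the fast training speed attributed to ELM above.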
2.2. Multikernel Extreme Learning Machine (MKLELM)
A multikernel function is commonly defined as a linear combination of basic kernel functions. The combination coefficients of the optimal kernel function and the maximum margin of ELM are therefore the core of MKLELM [33]. A typical form of the multikernel function is shown in (4).
In (4), represents the basic kernel functions.
For convenience of processing and computation, the combination coefficients of the basic kernel functions satisfy the restricting condition . The feature mapping of (4) is shown in (5).
In (5), and are the high dimensional feature mapping of and , respectively.
In the construction of the multikernel function, the RBF kernel, the Laplace kernel, and the inverse-distance kernel are selected as the basic kernel functions, as given in (6).
In (6), is the parameter of kernel function. In this paper, the value of is calculated by . At the same time, represents the mean Euclidean distance between the samples.
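The three basic kernels and their linear combination can be sketched as follows. The exact kernel expressions of (6) and the formula that maps the mean Euclidean distance to the kernel parameter are not reproduced here, so the forms below (a Gaussian RBF, exp(-d/sigma) for the Laplace kernel, and 1/(1 + d/sigma) for the inverse-distance kernel) are assumptions, as are all function names.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between the rows of A (n, d) and the rows of B (m, d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

def mean_distance(X):
    """Mean Euclidean distance over all distinct pairs of samples (needs >= 2 rows)."""
    D = pairwise_dist(X, X)
    return D[np.triu_indices(X.shape[0], k=1)].mean()

def base_kernels(A, B, sigma):
    """Return [RBF, Laplace, inverse-distance] kernel matrices (assumed forms)."""
    D = pairwise_dist(A, B)
    k_rbf = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    k_lap = np.exp(-D / sigma)
    k_inv = 1.0 / (1.0 + D / sigma)
    return [k_rbf, k_lap, k_inv]

def combined_kernel(kernels, gamma):
    """Linear combination sum_p gamma_p * K_p with nonnegative coefficients."""
    gamma = np.asarray(gamma, dtype=float)
    if np.any(gamma < 0):
        raise ValueError("combination coefficients must be nonnegative")
    return sum(g * K for g, K in zip(gamma, kernels))
```

A nonnegative combination of positive semidefinite kernels is itself positive semidefinite, which is why the coefficients are constrained to be nonnegative in this sketch.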
To ensure that the final solution of the multikernel optimization problem is bounded and that the combined kernel function is symmetric positive semidefinite, the norm is used as the constraint on the combination coefficients of the multikernel function. Different values of in the norm give different constraint norms. According to the theory of multikernel SVM [37, 38] and (5), the conventional MKLELM is formulated as follows:
In (7), the connecting weighting coefficient is and is the connecting weighting of the th basic kernel function.
Substituting (5) into (7), the expression of MKLELM is obtained and shown in (8).
If is equal to , then (8) can be simplified to (9).
Equation (9) is similar to the expression of ELM. So, the Lagrangian function of MKLELM can be calculated:
In (10), and are the Lagrangian multipliers. Then, the KKT optimality conditions are calculated and shown in (11).
The matrix form of (11) is expressed in (12).
In (12), the compound kernel function represents . Then, the solution of is shown as follows:
At the same time, the combination coefficient of multikernel function can be calculated by the derivative of
The sparse MKLELM constrained norm is given by in (14). The optimal parameters and are calculated by iterative optimization. Now, for a given sample , the output decision function of MKLELM is expressed as follows:
In (15), the component of represents .
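As an illustration of the kernel-form decision function, the sketch below fits a kernel ELM on a fixed combination of two base kernels by solving (K + I/C) alpha = Y and classifies a new sample by the largest entry of k(x)^T alpha. This is a simplified sketch, not the method of this paper: the combination coefficients are held fixed rather than optimized iteratively, only two kernels are used, and the kernel forms, names, and parameter values are assumptions.

```python
import numpy as np

def _dist(A, B):
    # Euclidean distance matrix between the rows of A and the rows of B.
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))

def multikernel(A, B, gamma, sigma):
    """gamma[0] * RBF + gamma[1] * Laplace (assumed base kernels)."""
    D = _dist(A, B)
    return gamma[0] * np.exp(-(D ** 2) / (2.0 * sigma ** 2)) + gamma[1] * np.exp(-D / sigma)

def fit_mk_elm(X, Y, gamma=(0.5, 0.5), sigma=1.0, C=10.0):
    """Solve (K + I/C) alpha = Y for the dual coefficients alpha."""
    K = multikernel(X, X, gamma, sigma)
    return np.linalg.solve(K + np.eye(X.shape[0]) / C, Y)

def predict_mk_elm(X_new, X_train, alpha, gamma=(0.5, 0.5), sigma=1.0):
    """Decision function k(x)^T alpha; the predicted class is the argmax."""
    return np.argmax(multikernel(X_new, X_train, gamma, sigma) @ alpha, axis=1)
```

In a full implementation the coefficients would be re-estimated after each solve for alpha and the two steps repeated until convergence.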
2.3. Cost-Sensitive Methods
Cost-sensitive methods largely fall into three groups [39]: constructing the cost-sensitive classification model directly, establishing the cost-sensitive classification model using Bayesian risk theory, and building the cost-sensitive classification model by changing the sample distribution. This paper focuses on the latter two [40].
Assuming that the number of class labels in the training set is and the number of training samples in each category is , the classification cost is defined as follows.
Cost () is the misclassification cost incurred when category is misclassified as category . Then, Cost is obvious.
Cost() represents the total cost function of category , namely, .
The cost expressions of oversampling, undersampling, and threshold adjusting are discussed as follows.
In oversampling and undersampling, the cost expression of is defined by (16), where is the number of categories . represents the resample category of oversampling and undersampling, which is calculated by (17) and (18), respectively:
However, the realization principle of threshold adjusting can be interpreted as follows:
In (19), is the actual output of different output nodes of ELM, and it satisfies constraint condition . is the normalization coefficient of . At the same time, the output of threshold adjusting also satisfies constraint condition .
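The three cost-sensitive strategies can be sketched as follows. Since (16)-(19) are not reproduced here, the sketch assumes the common rescaling rule: each class is resampled to floor(Cost(k) / Cost(base) * N_base), where the base class is chosen so that no class shrinks (oversampling) or no class grows (undersampling), and threshold adjusting multiplies each normalized output by the total cost of its class and renormalizes. The function names and the example costs are hypothetical.

```python
import math

def resample_targets(counts, cost, mode="over"):
    """Target sample count per class.

    counts: dict class -> number of training samples N_k
    cost:   dict class -> total misclassification cost Cost(k)
    """
    ratio = {k: cost[k] / counts[k] for k in counts}
    if mode == "over":
        base = min(ratio, key=ratio.get)   # every other class is duplicated
    elif mode == "under":
        base = max(ratio, key=ratio.get)   # every other class is reduced
    else:
        raise ValueError("mode must be 'over' or 'under'")
    return {k: math.floor(cost[k] * counts[base] / cost[base]) for k in counts}

def threshold_adjust(outputs, cost):
    """Scale normalized outputs O_i by Cost(i) and renormalize to sum to 1."""
    scaled = [o * c for o, c in zip(outputs, cost)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Illustrative use with the binary sample counts of this paper (70 NC, 20 NK)
# and a hypothetical fault cost of 2.5 against a normal-class cost of 1.
counts = {"NC": 70, "NK": 20}
cost = {"NC": 1.0, "NK": 2.5}
over = resample_targets(counts, cost, mode="over")
under = resample_targets(counts, cost, mode="under")
adjusted = threshold_adjust([0.6, 0.4], [1.0, 2.5])
```

Under these assumptions, undersampling keeps only a handful of normal samples when the dataset is small, whereas threshold adjusting leaves the training set untouched and only shifts the decision toward the costlier class.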
2.4. The Evaluation Indicators of Classification Model
The evaluation indicators for binary classification and multiclassification are introduced in this section to validate the effectiveness of the proposed method.
2.4.1. The Cost-Sensitive Evaluation Indicators of Binary Classification
In binary imbalanced learning, the cost matrix is shown in Table 1. It is generally recognized that the cost of correct classification is defined as .
Based on Table 1, the cost-sensitive evaluation indicators of binary classification are defined as follows.
The classification accuracy of positive samples (AP):
The classification accuracy of negative samples (AN):
Global classification accuracy (Accuracy):
In (20)-(22), |·| denotes the number of elements in a set.
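The three indicators can be computed from the counts of correctly and incorrectly classified samples, as in the following sketch. It assumes the usual definitions (AP as the fraction of positive samples classified correctly, AN as the fraction of negative samples classified correctly); treating the fault class as the positive class is an assumption, and the function name is hypothetical.

```python
def binary_indicators(y_true, y_pred, positive=1):
    """Return (AP, AN, Accuracy) for a binary problem.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    ap = tp / (tp + fn)            # accuracy on positive samples
    an = tn / (tn + fp)            # accuracy on negative samples
    accuracy = (tp + tn) / len(pairs)
    return ap, an, accuracy
```

On an imbalanced dataset the global accuracy is dominated by the majority class, which is why AP and AN are reported separately.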
2.4.2. The Cost-Sensitive Evaluation Indicator of Multiclassification
The cost-sensitive evaluation indicator for multiclassification is more complicated than that for binary classification. The robustness indicator of [41] is introduced to describe multiclass classification performance. It is calculated by (23).
In (23), is the average cost of method . represents the maximum average cost of the designed method. The lower the robustness indicator, the better the robust performance of the method.
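A minimal sketch of such a robustness indicator is given below, assuming it is the ratio of a method's average cost to the largest average cost among the compared methods. This reading of (23) is an assumption, and the function name and the numbers in the usage note are purely illustrative, not experimental results.

```python
def robustness(avg_costs):
    """Map each method to its average cost divided by the worst average cost.

    avg_costs: dict method name -> average misclassification cost (> 0 for
    at least one method). Values lie in (0, 1]; lower is better.
    """
    worst = max(avg_costs.values())
    return {method: c / worst for method, c in avg_costs.items()}
```

For example, `robustness({"A": 2.0, "B": 4.0})` assigns 0.5 to method A and 1.0 to method B, so A would be judged the more robust of the two.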
3. Classification Method of Imbalance Sample Distribution Based on MKLCSELM
The main procedure of the proposed MKLCSELM method involves data preprocessing (data normalization and feature extraction), construction of the multikernel function, and cost-sensitive learning. The overall process is shown in Figure 1, and the details are given in Algorithms 1 and 2. Algorithm 1 describes the oversampling process (undersampling follows the same principle), and Algorithm 2 describes the threshold adjusting method.


4. Experimental Description
4.1. The Principle of Check Valve and Experiment Platform
4.1.1. The Principle of Check Valve
The check valve completes one feeding and discharging cycle in every stroke of the diaphragm pump. Assuming a stroke rate of 50 r/min, the inlet and outlet check valves each operate 72000 times (50 × 60 × 24) during one day of normal operation. The check valve is therefore the most frequently moving core component of the diaphragm pump, and this frequent motion is one of the main causes of check valve failure. The high pressure diaphragm pump and a failed check valve used for mineral slurry pipeline transportation with solid-liquid two-phase flow are shown in Figure 2.
In Figure 2, the check valve of the high pressure diaphragm pump is a cone valve, and its simplified structure is shown in Figure 3. The spool and spring form a weakly damped oscillation system. The system vibrates for two reasons: external excitation (resonance) and its own characteristics. When the frequency of the external excitation source is an integral multiple of the natural frequency of the valve system, the whole system resonates during operation. The different running states of the check valve can therefore be judged effectively by analyzing its vibration signal.
4.1.2. Vibration Data Acquisition Experiment Platform
Figure 4 shows the experiment platform of the check valve. The three-cylinder diaphragm pump has 3 pairs of check valves, that is, 3 inlet check valves and 3 outlet check valves. For data acquisition, six PCB 352C33 accelerometers are installed on the check valve housings, and the vibration data are collected by a PXI3342. The sampling frequency is 2560 Hz, and each record contains 20480 data points (8 s of signal).
(a) Inlet check valve
(b) Outlet check valve
(c) Data acquisition device
4.2. Experimental Setting
The data attributes of check valve and classification information are defined as in Table 2.
Based on the data characteristics in Table 2, three kinds of cost matrices are introduced and defined as follows [42].
4.3. The Feature Extraction of Wavelet Packet Energy Entropy
Figure 5 shows the time-domain and frequency-domain waveforms of the check valve vibration signal under 3 operating conditions: normal condition (NC), stuck valve fault (NK), and abrasion fault (NM). The waveforms show that an abnormality has occurred in the check valve, but its cause and category cannot be determined from them. To identify the different running states of the check valve automatically, effective characteristics of the running state must be extracted and a state identification model then constructed.
(a) NC
(b) NM
(c) NK
The feature extraction in this paper makes full use of the advantages of the wavelet packet and entropy. The third-layer wavelet packet energy distribution coefficients and the energy entropy are extracted as the characteristic parameters of the subsequent classification model [42]. This feature extraction method is selected for the following reasons.
(1) With the wavelet packet technique, the vibration signal of the check valve can be mapped onto wavelet basis functions without information loss, and the technique is well suited to the localized analysis of nonstationary signals.
(2) Entropy is introduced to depict the operation state characteristics of the check valve, because the more disordered a system is, the greater its entropy becomes. Sensitive and transient features can thus be extracted to describe the operation state of the check valve.
The steps of feature extraction are listed below.
(1) Signal decomposition and reconstruction: the vibration signal of the check valve is analyzed by a three-layer wavelet packet transform to obtain the wavelet coefficients of the third-layer decomposition. In this paper, the db10 wavelet is chosen as the wavelet basis function, mainly because it reflects the sensitive and transient features of the check valve vibration signal well.
(2) Feature vector extraction: the wavelet packet energy distribution coefficients of the signals reconstructed from the third-layer wavelet packet coefficients, together with the energy entropy, compose the feature vector and are calculated by (27) and (28), where denotes the number of component signals () and represents the energy of the signal reconstructed from the third-layer wavelet coefficients.
According to the definitions in (27) and (28), the feature vectors of the check valve can be calculated. Because of limited space, only some of the features are shown in Table 3. Comparing the normal check valve with the faulty check valves, the operating conditions are easily distinguished by the wavelet packet energy distribution coefficient and the energy entropy . This shows that the feature extraction method based on wavelet packet energy entropy is effective and reliable.
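The feature computation can be sketched as follows, assuming the usual definitions: the energy of each third-layer sub-band is the sum of its squared samples, the energy distribution coefficient is that energy divided by the total, and the energy entropy is the Shannon entropy of the distribution (natural logarithm assumed). The decomposition helper uses the third-party PyWavelets package and, as a simplification, takes the energies of the third-layer coefficients rather than of the reconstructed sub-band signals; all function names are hypothetical.

```python
import numpy as np

def energy_features(subbands):
    """Return (energy distribution coefficients, energy entropy).

    subbands: sequence of 1-D arrays, one per third-layer node (8 for 3 layers).
    """
    energies = np.array([np.sum(np.square(s)) for s in subbands], dtype=float)
    p = energies / energies.sum()
    nonzero = p[p > 0]                      # 0 * log(0) is taken as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return p, entropy

def wavelet_packet_subbands(signal, wavelet="db10", level=3):
    """Third-layer wavelet packet coefficients, ordered by frequency band."""
    import pywt  # PyWavelets; imported here so energy_features needs only NumPy
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    return [node.data for node in wp.get_level(level, order="freq")]
```

A feature vector for one record would then be the 8 distribution coefficients followed by the entropy, for example `np.append(*energy_features(wavelet_packet_subbands(x)))` for a signal `x`.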
5. Discussion of Experimental Results
Based on the definition of the cost functions, the diagnosis cost matrix is constructed and shown in Table 4. The diagnostic cost ranges from 1 to 5 () and is increased by a fixed step length (usually 0.5) in the experiments.
In the experiments, 110 data samples were collected, comprising 70 NC samples, 20 NK samples, and 20 NM samples. The check valve samples are processed by combining the cost matrix in Table 4 with the oversampling, undersampling, and threshold adjusting methods described in Section 2.3. The ELM, CSELM, MKLELM, MKLCSELM, and MKLCSSVM fault diagnosis classification models are then constructed. The binary classification and multiclassification results for the check valve are elaborated below.
5.1. The Experimental Results Analysis of Binary Classification for Check Valve
In the binary classification experiments, the datasets of NC and NK are selected as the test data. The cost matrix is consistent with Table 4. The experimental results are described as follows.
5.1.1. The Experimental Results of Oversampling
The data sample distribution for oversampling is calculated from the cost matrix in Table 4 together with (16) and (17) and is shown in Table 5. Of the 90 data samples, 54 are selected as training samples and the remaining 36 as test samples. The recognition results of the classification models are presented in Figure 6.
(a) AP
(b) AN
(c) Accuracy
Several conclusions can be drawn from Figure 6. With cost-sensitive oversampling, the AP of CSELM, MKLCSSVM, and MKLCSELM first increases and then decreases as the cost increases, the AN first increases and then levels off, and the global classification accuracy (Accuracy) first increases and then decreases. The recognition results of ELM and MKLELM do not change with the cost, because the data distribution used by these two models does not change; they are independent of the diagnostic cost and serve only as a baseline for comparison. For CSELM, MKLCSSVM, and MKLCSELM, the optimal recognition effect is obtained when the diagnostic cost is . Compared with ELM and MKLELM, which ignore the diagnostic cost, introducing the diagnostic cost improves the accuracy and reliability of the CSELM, MKLCSSVM, and MKLCSELM classification models. The experimental results also show that the multikernel learning mechanism further improves the diagnostic performance of the classification models. Finally, Figure 6 shows that CSELM and MKLCSELM are more sensitive to the cost than MKLCSSVM.
5.1.2. The Experimental Results of Undersampling
The data sample distribution for undersampling is calculated from Table 4, (16), and (18). The recognition results of the above classification models are displayed in Figure 7.
(a) AP
(b) AN
(c) Accuracy
As shown in Figure 7, the conclusions are similar to those for oversampling. However, the classification performance of the models for the check valve is slightly poorer with undersampling. The experimental results indicate that this is mainly due to the lack of sufficient check valve samples and the extreme imbalance of the sample distribution that arises during undersampling. An interesting phenomenon is also observed: when the sample is very small, the classification results of MKLCSELM are slightly worse than those of the other classification models. This is probably indirect evidence that training MKLCSELM also requires sufficient samples, since MKLCSELM is in essence a single-hidden-layer feedforward neural network. The results also indirectly demonstrate the superiority of SVM in classification with small samples.
5.1.3. The Experimental Results of Threshold Adjusting
Based on Table 4 and (19), the recognition results of the five classification models with threshold adjusting are presented in Figure 8.
(a) AP
(b) AN
(c) Accuracy
In Figure 8, CSELM, MKLCSSVM, and MKLCSELM obtain good results owing to the cost-sensitive learning mechanism. The AN, AP, and Accuracy of these models improve significantly as the cost increases, while the misclassified and missed-diagnosis samples are sharply reduced. Compared with oversampling and undersampling, the threshold adjusting algorithm also achieves satisfactory results. Threshold adjusting is therefore also an effective choice for binary classification problems with imbalanced data and unequal diagnostic costs.
5.2. The Experimental Results Analysis of Multiclassification for Check Valve
To test the validity and generalization ability of MKLCSELM, the three cost-sensitive methods are applied to identify multiple operation states of the check valve. The effectiveness of the proposed method is then verified by multiclassification tests.
5.2.1. The Experimental Results of Oversampling
In the multiclassification experiments, 110 data samples were collected; 66 samples are selected as training samples and the remaining 44 as test samples. The data sample distribution for oversampling is calculated from (16) and (17). The recognition results of the classification models are presented in Figure 9.
(a) AP
(b) AN
(c) Accuracy
As seen in Figure 9, the classification accuracy of CSELM, MKLCSSVM, and MKLCSELM increases with the cost, while the number of misclassified samples drops sharply. All three models reach their optimal classification performance in oversampling when the cost equals 2.5. Comparing the results in Figures 9(a), 9(b), and 9(c), the following conclusions can be summarized. CSELM and MKLCSELM are more sensitive to the cost than MKLCSSVM. The classification performance of MKLCSELM is slightly better than that of the other models. Classification accuracy, misclassified samples, and missed-diagnosis samples vary with the cost as follows: the diagnosis cost of 2.5 can be regarded as a demarcation line and inflection point of classification accuracy. When the cost is less than 2.5, the misclassified and missed-diagnosis samples decrease drastically. When the cost is greater than 2.5, the misclassified samples fall to 0 and remain there, but the missed-diagnosis samples increase sharply and the classification accuracy gradually decreases. The experimental results show that the above cost-sensitive methods are feasible for check valve fault diagnosis in the industrial field.
5.2.2. The Experimental Results of Undersampling
As in the oversampling approach, the data sample distribution for undersampling is calculated. The experimental results of the above classification models are presented in Figure 10.
(a) AP
(b) AN
(c) Accuracy
As shown in Figure 10, the classification accuracy of the multikernel cost-sensitive diagnosis models decreases noticeably because undersampling sharply reduces the number of data samples. Nevertheless, undersampling effectively restrains the misclassified samples (even reducing them to 0) when the cost equals 2.5. Figure 10 also shows that undersampling should not be used when samples are insufficient and high accuracy is required.
5.2.3. The Experimental Results of Threshold Adjusting
In the same way, the multiclassification recognition results of five mentioned classification models by the threshold adjusting are presented in Figure 11.
(a) AP
(b) AN
(c) Accuracy
As depicted in Figure 11, with threshold adjusting the misclassified samples are reduced to 0 when the cost is increased to 2.5. The cost-sensitive classification models reach a steady state at this cost, and the missed-diagnosis samples and accuracy of CSELM, MKLCSSVM, and MKLCSELM show no obvious change as the cost increases further.
5.3. Robust Performance Evaluation of Three Cost-Sensitive Methods for Check Valve
To assess the effectiveness of the three cost-sensitive classification methods and choose a proper evaluation index for check valve fault diagnosis, the robust performance index described in Section 2.4.2 is calculated; its variation with the cost is shown in Figure 12.
(a) CSELM
(b) MKLCSSVM
(c) MKLCSELM
Figure 12 compares the robust performance of the three cost-sensitive methods. The robust performance index of undersampling is the largest; that is, undersampling is not suitable when the sample distribution is very imbalanced. For CSELM, MKLCSSVM, and MKLCSELM, the robust performance index of oversampling first decreases and then increases with the cost, while that of threshold adjusting first decreases and then levels off. Figure 12 also shows that the robust performance index of oversampling is smaller than that of threshold adjusting when the diagnosis cost is less than 2.5, and the relation is reversed when the diagnostic cost is greater than 2.5. Oversampling and threshold adjusting are therefore the more appropriate cost-sensitive methods for recognizing multiple operation states of the check valve.
6. Discussion and Conclusion
6.1. Discussion
High pressure diaphragm pump is often used as the core power equipment in slurry pipeline transportation, and its operating conditions are extremely complex. Therefore, it is critical to improve state recognition accuracy for ensuring operation safety and stability. However, the check valve is the core component of the high pressure diaphragm pump, and it is one of the most easily damaged and frequently replaced parts. Meanwhile, in the developed data acquisition system of check valve, the vibration data with normal operation has been collected in most of the time; on the contrary, the vibration data of fault time and fault state accounted for less. Therefore, it is of great significance to identify the operation state of the check valve effectively under the condition of complex operation and information asymmetry. Inspired by multikernel learning and costsensitive analysis, a fast diagnosis method of check valve based on MKLCSELM is proposed. The presented MKLCSELM method can complete the rapid positioning and analysis of the check valve fault and provide theoretical support for the adjustment and optimization in operation conditions of check valve during the followup operation.
The multikernel learning mechanism is introduced to realize a multikernel projection of nonlinear and nonstationary data. It effectively overcomes the incomplete characterization of information by a single kernel function and improves the ability to represent signals. Three common kernel functions are used to construct the multikernel classification model in the experiments. The comparison between MKLELM and ELM shows that introducing multikernel learning effectively improves the recognition accuracy of the classification model. However, there is still no normative mechanism for choosing which kernel functions to use or how many. The basic kernel functions therefore have to be selected by combining the signal characteristics with previous empirical rules on kernel selection, and the multikernel function is then constructed from them.
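The idea of combining several basic kernels can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes that the three basic kernels are RBF, polynomial, and linear kernels, that the combination coefficients are fixed, nonnegative, and sum to one, and that the classifier takes the standard kernel ELM closed form with a regularization parameter `C`. All function names, parameter values, and the choice of kernels are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Squared Euclidean distance between every row of A and every row of B.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def poly_kernel(A, B, degree=2, coef0=1.0):
    return (A @ B.T + coef0) ** degree

def linear_kernel(A, B):
    return A @ B.T

def multikernel(A, B, weights):
    """Convex combination of three basic kernels (assumed form, fixed weights)."""
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (3,) or np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be three nonnegative values that sum to 1")
    kernels = (rbf_kernel(A, B), poly_kernel(A, B), linear_kernel(A, B))
    return sum(w * K for w, K in zip(weights, kernels))

def kelm_fit(X, T, weights, C=10.0):
    """Kernel ELM output weights: solve (K + I / C) alpha = T."""
    K = multikernel(X, X, weights)
    return np.linalg.solve(K + np.eye(X.shape[0]) / C, T)

def kelm_predict(X_train, alpha, X_new, weights):
    """Class scores for new samples; the predicted class is the row-wise argmax."""
    return multikernel(X_new, X_train, weights) @ alpha

# Toy usage with two well-separated clusters and one-hot targets.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
y = np.array([0, 0, 0, 1, 1, 1])
T = np.eye(2)[y]
weights = [0.5, 0.3, 0.2]          # hypothetical combination coefficients
alpha = kelm_fit(X, T, weights)
train_pred = kelm_predict(X, alpha, X, weights).argmax(axis=1)
```

Because each basic kernel is positive semidefinite and the weights are nonnegative, the combined matrix is again a valid kernel matrix. In practice the weights and kernel parameters would have to be chosen from the signal characteristics or tuned on validation data, which is the selection problem discussed above.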
To overcome the deficiency of assuming equal classification costs and to improve the practical adaptability of the model, common cost-sensitive processing methods are used to construct the CSELM model. The binary and multiclass recognition results demonstrate the effectiveness of introducing the cost-sensitive mechanism. The results obtained with the three cost-sensitive methods are also compared under different conditions to provide theoretical support and guidance for selecting a cost-sensitive method. However, the experimental comparison shows that the diagnostic cost needs to be moderate; otherwise, it reduces the overall recognition accuracy of the classification model.
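The three cost-sensitive processing methods compared in the experiments (oversampling, undersampling, and threshold adjusting) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in this paper: the cost is treated as a single ratio of at least 1, resampling is random, and threshold adjusting simply rescales nonnegative class scores by a per-class cost before taking the argmax. All function and parameter names are hypothetical.

```python
import numpy as np

def oversample(X, y, minority_label, cost, seed=0):
    """Randomly replicate minority samples until they count about `cost` times."""
    if cost < 1.0:
        raise ValueError("cost must be >= 1")
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(y == minority_label)
    n_extra = int(round((cost - 1.0) * idx.size))
    extra = rng.choice(idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(y.size), extra])
    return X[keep], y[keep]

def undersample(X, y, majority_label, cost, seed=0):
    """Randomly keep about 1 / `cost` of the majority samples."""
    if cost < 1.0:
        raise ValueError("cost must be >= 1")
    rng = np.random.default_rng(seed)
    major = np.flatnonzero(y == majority_label)
    rest = np.flatnonzero(y != majority_label)
    n_keep = max(1, int(round(major.size / cost)))
    kept = rng.choice(major, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([rest, kept]))
    return X[keep], y[keep]

def threshold_adjust(scores, class_costs):
    """Rescale nonnegative class scores by per-class costs, then take the argmax."""
    scores = np.asarray(scores, dtype=float)
    return np.argmax(scores * np.asarray(class_costs, dtype=float), axis=1)

# Toy usage: 8 normal samples (label 0) and 2 fault samples (label 1).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_over, y_over = oversample(X, y, minority_label=1, cost=3.0)
X_under, y_under = undersample(X, y, majority_label=0, cost=2.0)
labels = threshold_adjust([[0.6, 0.4]], class_costs=[1.0, 2.0])
```

The sketch also shows why the cost has to be moderate. A large cost makes oversampling replicate the few fault samples many times, makes undersampling discard most of the normal data, and makes threshold adjusting favor the costly class even when its score is low, all of which can lower the overall recognition accuracy.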
6.2. Conclusion
A fault diagnosis model, MKLCSELM, is constructed based on multikernel learning and cost-sensitive learning, and check valve datasets are used to verify the effectiveness of the proposed method. The comparative tests lead to the following conclusions.
MKLCSELM achieves comparable or better performance than the other classification models, including ELM, CSELM, MKLELM, and MKLCSSVM.
The comparative analysis of robust performance demonstrates that oversampling and threshold adjusting are the more appropriate cost-sensitive methods for multiclass applications of the check valve.
The study of the three cost-sensitive methods shows that, with an appropriately selected cost, the constructed classification model can reduce the misclassification rate, balance the misclassification rate, the missed diagnosis rate, and the accuracy, and improve the overall reliability of the classification model.
The overall experimental results for the check valve show that multikernel learning and cost-sensitive learning can effectively overcome the drawbacks of imbalanced sample distribution and of the equal diagnostic cost assumed in conventional classification models, and improve the accuracy and reliability of the classification models.
Abbreviations
ELM:  Extreme learning machine 
MKLELM:  Multikernel ELM 
MKLCSELM:  Multikernel cost-sensitive ELM 
RBF:  Radial basis function 
KKT:  Karush-Kuhn-Tucker 
NK:  Stuck valve fault 
:  Regularization parameter 
AP:  The classification accuracy of positive samples 
:  The number of basic kernel functions is 
The typical form of multikernel function  
:  The high dimensional feature mapping of 
SVM:  Support vector machine 
CSELM:  Cost-sensitive ELM 
MKLCSSVM:  Multikernel cost-sensitive SVM 
LSSVM:  Least squares SVM 
NC:  Normal condition 
NM:  Abrasion fault 
Accuracy:  Global classification accuracy 
AN:  The classification accuracy of negative samples 
:  The combination coefficients of basic kernel functions 
:  The high dimensional feature mapping of . 
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (51765022, 61663017, and 51169007) and the Science & Research Program of Yunnan Province (2015ZC005).