Abstract

Because the performance of a single fuzzy ARTMAP (FAM) is affected by the sequence of sample presentation in the offline mode of training, a FAM ensemble approach based on an improved Bayesian belief method is proposed to improve classification accuracy. The training samples are presented to a committee of FAMs in different orders, the outputs of these FAMs are combined, and the final decision is derived by the improved Bayesian belief method. The experimental results show that the proposed FAM ensemble can classify the different categories reliably and achieves better classification performance than a single FAM.

1. Introduction

Recently, artificial neural networks (ANNs) have been widely used as intelligent classifiers to identify different categories by learning patterns from empirical data in complex systems [1]. For example, the BP, RBF, and SVM models have developed rapidly and have been used to classify different fault classes of machine equipment [2–6]. However, these traditional neural network methods have limited generalization ability and tend to overfit the training samples. To address this problem, the fuzzy ARTMAP (FAM) neural network, an incremental and supervised network model designed in accordance with adaptive resonance theory, has been developed and applied to classification tasks [7–9]. Although FAM is able to overcome the stability-plasticity dilemma [10], in real-world applications the performance of FAM is affected by the sequence of sample presentation in the offline mode of training [11, 12].

To address this drawback, preprocessing procedures known as ordering algorithms, such as min-max clustering and the genetic algorithm [13, 14], have been proposed for FAM. Furthermore, a number of fusion techniques have been proposed for FAM to overcome this problem. Tang and Yan employed the voting algorithm of FAM to diagnose bearing faults [15], and Loo and Rao applied multiple FAMs based on the probabilistic plurality voting strategy to medical diagnosis and classification problems [16]. Since these voting algorithms do not consider the effect of the number of samples in each class, an improved Bayesian belief method (BBM) is used in this paper to combine multiple FAM classifiers that are trained offline with different orderings of the samples.

In view of the above principles, a novel ensemble of FAM classifiers is proposed to improve the classification performance of a single FAM. The identification scheme is shown in Figure 1. Firstly, several feature parameters are extracted from the raw signals using different feature extraction methods. Secondly, the optimal feature set is selected from the original feature set by a modified distance discriminant technique. Finally, an ensemble of multiple FAM classifiers based on the improved BBM is employed to produce the final classification results. The proposed method is applied to the fault diagnosis of a hydraulic pump, and the experimental results show the effectiveness of the proposed ensemble of FAM classifiers.

2. Fuzzy ARTMAP Ensemble Using the Improved Bayesian Belief Method

2.1. Fuzzy ARTMAP (FAM)

FAM consists of two ART modules, namely, the $ART_a$ and $ART_b$ modules, which are bridged via a map field $F^{ab}$ [10]. FAM is capable of forming associative maps between clusters of the input domain, in which the $ART_a$ module performs clustering, and clusters of the output domain, in which the $ART_b$ module performs clustering. Each ART module comprises three layers: the normalization layer $F_0$, the input layer $F_1$, and the recognition layer $F_2$. The structure of FAM is shown in Figure 2. When the output domain is a finite set of class labels, FAM can be utilized as a classifier. The FAM algorithm can be briefly described as follows.

The $ART_a$ module receives the input pattern: an $M$-dimensional input vector $\mathbf{a} = (a_1, \dots, a_M)$, normalized so that each $a_i \in [0,1]$, is complement-coded to the $2M$-dimensional vector
$$\mathbf{I} = (\mathbf{a}, \mathbf{a}^c) = (a_1, \dots, a_M, 1-a_1, \dots, 1-a_M).$$
Then, the norm of the input vector is kept constant:
$$|\mathbf{I}| = \sum_{i=1}^{M} a_i + \sum_{i=1}^{M} (1 - a_i) = M.$$
Afterward, the input sample selects a category node stored in the network by the category choice function (CCF):
$$T_j(\mathbf{I}) = \frac{|\mathbf{I} \wedge \mathbf{w}_j|}{\alpha + |\mathbf{w}_j|},$$
where $\wedge$ is the fuzzy AND (element-wise minimum) operator, $T_j$ is the choice function value, $\alpha > 0$ is the choice parameter, and $\mathbf{w}_j$ is the weight vector of the $j$th category node.
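For concreteness, the following is a minimal NumPy sketch of complement coding and the category choice function (function and variable names are illustrative, not taken from the original paper):

```python
import numpy as np

def complement_code(a):
    """Complement-code a normalized input vector a in [0, 1]^M to I = (a, 1 - a)."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

def category_choice(I, W, alpha=0.001):
    """Category choice function T_j = |I ^ w_j| / (alpha + |w_j|) for all nodes.

    I : complement-coded input, shape (2M,)
    W : weight matrix, one row per committed category node, shape (N, 2M)
    """
    fuzzy_and = np.minimum(I, W)                  # element-wise min (fuzzy AND)
    return fuzzy_and.sum(axis=1) / (alpha + W.sum(axis=1))
```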

When a winning category node $J = \arg\max_j T_j$ is selected, a vigilance test (VT), namely, a similarity check of the chosen category node against the vigilance parameter $\rho_a$, takes place:
$$\frac{|\mathbf{I} \wedge \mathbf{w}_J|}{|\mathbf{I}|} \ge \rho_a,$$
where $J$ is the index of the winning node. When the above category match function (CMF) satisfies the criterion, resonance occurs and learning takes place; namely, the weight vector is updated according to
$$\mathbf{w}_J^{(\mathrm{new})} = \beta \big(\mathbf{I} \wedge \mathbf{w}_J^{(\mathrm{old})}\big) + (1 - \beta)\, \mathbf{w}_J^{(\mathrm{old})},$$
where $\beta \in [0,1]$ is the learning rate. Otherwise, a new node that codes the input pattern is created in $F_2^a$. In the meantime, the same learning algorithm is carried out simultaneously in the $ART_b$ module using the target pattern.
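A corresponding sketch of the vigilance test and the learning update, continuing the previous snippet (names again illustrative):

```python
def vigilance_test(I, w_J, rho_a):
    """Category match function: |I ^ w_J| / |I| >= rho_a."""
    return np.minimum(I, w_J).sum() / I.sum() >= rho_a

def update_weight(I, w_J, beta=1.0):
    """Resonance learning: w_J <- beta * (I ^ w_J) + (1 - beta) * w_J.

    beta = 1.0 corresponds to the fast-learning mode used in this paper."""
    return beta * np.minimum(I, w_J) + (1.0 - beta) * w_J
```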

After resonance occurs in $ART_a$ and $ART_b$, the winning node in $ART_a$ sends a prediction to $ART_b$ via the map field $F^{ab}$. The map field vigilance test is used to check this prediction. If the test fails, it indicates that the winning node of $ART_a$ predicts an incorrect target class in $ART_b$, and a match tracking process is initiated. During match tracking, the value of $\rho_a$ is increased until it is slightly higher than the current match value $|\mathbf{I} \wedge \mathbf{w}_J| / |\mathbf{I}|$; then a new search for another winning node in $ART_a$ is carried out, and the process continues until the selected node makes a correct prediction in $ART_b$.
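The search-with-match-tracking loop can be sketched as follows. This is a simplified outline under stated assumptions rather than a full FAM implementation: `labels[j]` is a hypothetical array mapping each $ART_a$ node to its associated class, and `category_choice` is the helper from the earlier snippet.

```python
def predict_with_match_tracking(I, W, labels, target, rho_base=0.0, alpha=0.001, eps=1e-6):
    """Search ART_a nodes, raising the vigilance (match tracking) whenever the
    winning node predicts the wrong class, until a correct prediction is found
    or a new node must be committed (signalled by returning None)."""
    rho = rho_base
    candidates = set(range(W.shape[0]))
    while candidates:
        T = category_choice(I, W, alpha)
        J = max(candidates, key=lambda j: T[j])            # current winner
        match = np.minimum(I, W[J]).sum() / I.sum()
        if match < rho:                                    # fails the ART_a vigilance test
            candidates.discard(J)
            continue
        if labels[J] == target:                            # map field test passes
            return J
        rho = match + eps                                  # match tracking: raise rho_a
        candidates.discard(J)
    return None                                            # commit a new category node
```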

2.2. Decision Fusion Using Bayesian Belief Method

The Bayesian belief method was proposed in [17]. It is based on the assumption of mutual independence of the classifiers and takes the error behavior of each classifier into account. Assume that the pattern space contains $M$ classes $c_1, \dots, c_M$ and that $K$ classifiers $e_1, \dots, e_K$ are available. A classifier can be considered as a function
$$e_k(x) = j_k, \qquad j_k \in \{1, \dots, M\},$$
which signifies that the sample $x$ is assigned to class $c_{j_k}$ by the classifier $e_k$. Its two-dimensional confusion matrix can be represented as $\mathbf{N}^{(k)} = \big(n_{ij}^{(k)}\big)_{M \times M}$, which is obtained by executing $e_k$ on a test data set after $e_k$ is trained. Each row corresponds to a true class $c_i$ and each column corresponds to an assigned class $c_j$. The matrix entry $n_{ij}^{(k)}$ is the number of input samples from class $c_i$ that are assigned to class $c_j$ by classifier $e_k$. The number of samples in class $c_i$ is $n_{i\cdot}^{(k)} = \sum_{j=1}^{M} n_{ij}^{(k)}$, and the number of samples labeled as $c_j$ by $e_k$ is $n_{\cdot j}^{(k)} = \sum_{i=1}^{M} n_{ij}^{(k)}$. Considering the difference in the number of samples in each class, a belief measure of classification can be calculated for each classifier on the basis of its confusion matrix by the following belief function [18]:
$$\mathrm{bel}_k\big(x \in c_i \mid e_k(x) = j_k\big) = \frac{n_{i j_k}^{(k)} \big/ n_{i\cdot}^{(k)}}{\sum_{m=1}^{M} n_{m j_k}^{(k)} \big/ n_{m\cdot}^{(k)}}.$$
When multiple classifiers are developed, their corresponding beliefs are computed from the performance of the base classifiers, and combining the belief measures of all classifiers yields the final belief measure of the multiple-classifier system. In the case of equal a priori class probabilities, the combination rule can be written as
$$\mathrm{bel}(i) = \frac{\prod_{k=1}^{K} \mathrm{bel}_k\big(x \in c_i \mid e_k(x) = j_k\big)}{\sum_{m=1}^{M} \prod_{k=1}^{K} \mathrm{bel}_k\big(x \in c_m \mid e_k(x) = j_k\big)}, \qquad i = 1, \dots, M.$$
Thus, $x$ is classified into the class with the largest belief, which gives the final decision $e(x) = \arg\max_{i} \mathrm{bel}(i)$.
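A minimal NumPy sketch of this fusion rule, under the assumption that the class-size correction amounts to row-normalizing each confusion matrix before computing beliefs (function names are illustrative):

```python
import numpy as np

def belief_from_confusion(conf, predicted):
    """Belief over the true classes given that one classifier predicted `predicted`.

    conf : (M, M) confusion matrix, rows = true class, columns = assigned class.
    The row normalization compensates for different class sizes (assumption)."""
    rates = conf[:, predicted] / conf.sum(axis=1)   # per-class rates for this predicted column
    return rates / rates.sum()

def bbm_combine(conf_matrices, predictions):
    """Combine the decisions of K classifiers with the Bayesian belief method.

    conf_matrices : list of K (M, M) confusion matrices (one per classifier).
    predictions   : list of K predicted class indices for the same test sample.
    Returns the index of the class with the highest combined belief."""
    combined = np.ones(conf_matrices[0].shape[0])
    for conf, pred in zip(conf_matrices, predictions):
        combined *= belief_from_confusion(np.asarray(conf, dtype=float), pred)
    combined /= combined.sum()
    return int(np.argmax(combined))
```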

3. Case Study

In order to evaluate the effectiveness of the proposed FAM ensemble, the fault identification of a hydraulic pump is taken as an example. Figure 3 shows the schematic diagram of the experiment rig. Four accelerometers are attached to the housing with magnetic bases and mounted at positions P1, P2, P3, and P4. A pressure sensor is mounted at position P5. Considering its sensitivity to the fault conditions of the hydraulic pump, the vibration signal acquired by the accelerometer at position P2 is used to identify the fault categories. The vibration signals are acquired under the normal condition and under different fault conditions: inner plunger wear, inner race wear, ball wear, swashplate wear, port plate wear, and paraplunger wear.

3.1. Data Preparation

The data set contains 490 samples. These data samples are divided into 245 training and 245 test samples. The detailed descriptions of the three data sets are shown in Table 1. In order to identify the different fault categories, a seven-class classification problem needs to be solved.

3.2. Feature Extraction and Selection
3.2.1. Feature Extraction

Feature parameters are used to characterize the information relevant to the conditions of the hydraulic pump. To acquire more fault-related information, many features in different symptom domains are extracted from the measured signals.

The frequency domain provides another description of a signal. In [19], some novel features that give a much fuller picture of the frequency distribution in each frequency band are proposed. Suppose that the $N$ points of the normalized PSD $s_i$ ($i = 1, \dots, N$) of the vibration signal are divided into $L$ segments, where $L$ is 1 in this study. The four features based on the moment estimates of power in the $l$th segment can be obtained as follows:
$$P_{1l} = \frac{1}{N_l}\sum_{i \in l} s_i, \qquad P_{2l} = \frac{1}{N_l - 1}\sum_{i \in l} \big(s_i - P_{1l}\big)^2,$$
$$P_{3l} = \frac{\sum_{i \in l} \big(s_i - P_{1l}\big)^3}{N_l \big(\sqrt{P_{2l}}\big)^3}, \qquad P_{4l} = \frac{\sum_{i \in l} \big(s_i - P_{1l}\big)^4}{N_l\, P_{2l}^{2}},$$
where $N$ is the total number of data points and $N_l$ is the number of sample points in the $l$th segment.

In order to characterize the spectrum with higher accuracy, the moment estimates of frequency weighted by power are calculated by the following formulas:
$$F_{1l} = \frac{\sum_{i \in l} f_i s_i}{S_l}, \qquad F_{2l} = \sqrt{\frac{\sum_{i \in l} (f_i - F_{1l})^2 s_i}{S_l}},$$
$$F_{3l} = \frac{\sum_{i \in l} (f_i - F_{1l})^3 s_i}{S_l\, F_{2l}^{3}}, \qquad F_{4l} = \frac{\sum_{i \in l} (f_i - F_{1l})^4 s_i}{S_l\, F_{2l}^{4}},$$
where $f_i$ is the frequency corresponding to $s_i$ and $S_l = \sum_{i \in l} s_i$ is the total power in the segment. Then, the total number of features extracted for each spectrum is $1 \times 8$.
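A NumPy sketch of these eight spectral features follows; the normalizations implement the moment definitions written above, which are a standard reading of this type of feature rather than the verbatim formulas of [19]:

```python
import numpy as np

def spectral_moment_features(psd, freqs):
    """Eight frequency-domain features from one PSD segment: four moments of the
    power values and four power-weighted frequency moments."""
    s = np.asarray(psd, dtype=float)
    f = np.asarray(freqs, dtype=float)
    n = s.size
    # Moments of the power values
    p1 = s.mean()
    p2 = s.var(ddof=1)
    p3 = np.sum((s - p1) ** 3) / (n * p2 ** 1.5)         # skewness of power
    p4 = np.sum((s - p1) ** 4) / (n * p2 ** 2)           # kurtosis of power
    # Power-weighted frequency moments
    total = s.sum()
    f1 = np.sum(f * s) / total                           # mean frequency
    f2 = np.sqrt(np.sum((f - f1) ** 2 * s) / total)      # frequency standard deviation
    f3 = np.sum((f - f1) ** 3 * s) / (total * f2 ** 3)
    f4 = np.sum((f - f1) ** 4 * s) / (total * f2 ** 4)
    return np.array([p1, p2, p3, p4, f1, f2, f3, f4])
```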

To depict the fault-related information about the hydraulic pumps quantitatively, the first-order continuous wavelet grey moment (WGM) [20] of the vibration signal is extracted. Assume that $\mathbf{W} = (w_{ij})_{m \times n}$ is the wavelet coefficient matrix that can be displayed as the continuous wavelet transform (CWT) scalogram, where $m$ and $n$ correspond to the scale and time axes of the scalogram, respectively. The matrix is divided into $K$ equal parts along the scale axis, and the first-order wavelet grey moment of the $l$th part can be calculated by
$$G_l = \frac{K}{m \times n} \sum_{i \in \text{part } l} \sum_{j=1}^{n} \lvert w_{ij} \rvert,$$
where $w_{ij}$ is the element of matrix $\mathbf{W}$. In this paper, $K$ is set as 8 and the wavelet function is the Morlet wavelet.
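A sketch of this computation, given a CWT coefficient matrix (for example from a Morlet CWT); interpreting the first-order grey moment of each part as the mean absolute coefficient value is an assumption made here:

```python
import numpy as np

def wavelet_grey_moments(coeffs, n_parts=8):
    """First-order wavelet grey moments of a CWT coefficient matrix.

    coeffs  : (m, n) matrix of CWT coefficients (rows = scales, columns = time).
    n_parts : number of equal parts along the scale axis (8 in the paper)."""
    grey = np.abs(np.asarray(coeffs, dtype=float))       # scalogram "grey levels"
    parts = np.array_split(grey, n_parts, axis=0)        # split along the scale axis
    return np.array([part.mean() for part in parts])     # mean grey level of each part
```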

In addition, because the model parameters are sensitive to the shape of the vibration data, autoregressive (AR) model parameters are utilized to characterize the condition of the hydraulic pumps. The AR model is written as
$$x(t) = \sum_{i=1}^{p} a_i\, x(t - i) + e(t),$$
where $x(t-1), \dots, x(t-p)$ are the previous samples, $x(t)$ is the predicted sample of the signal, $e(t)$ is the residual error, and $a_1, \dots, a_p$ are the AR model parameters, which can be obtained by the least squares method [21] as
$$\hat{\mathbf{a}} = \big(\mathbf{X}^{\mathrm{T}} \mathbf{X}\big)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{x},$$
where $\mathbf{X}$ is the matrix of lagged samples and $\mathbf{x}$ is the vector of observed samples. In this study, the model order $p$ is set as 8.
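A minimal least-squares estimate of the AR(8) parameters in NumPy (a sketch of the standard least-squares setup, not necessarily the exact estimator used in [21]):

```python
import numpy as np

def ar_parameters(x, order=8):
    """Estimate AR parameters a_1..a_p of x(t) = sum_i a_i x(t-i) + e(t)
    by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    # Row t of X contains the lagged samples [x(t-1), ..., x(t-p)]
    X = np.column_stack([x[order - i:len(x) - i] for i in range(1, order + 1)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a
```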

Thus, 24 features constitute the original feature set.

3.2.2. Feature Selection

In order to improve the identification accuracy and reduce the computation burden, some sensitive features providing characteristic information for the classification system need to be selected, and irrelevant or redundant features must be removed. In this study, based on [22], a modified distance discriminant technique is employed to select the optimal features.

Suppose that a feature set of $C$ classes consists of $N$ samples and that the $c$th class contains $n_c$ samples, where $c = 1, \dots, C$ and $\sum_{c=1}^{C} n_c = N$. Each sample is represented by $D$ features, and the $j$th feature of the $i$th sample is written as $x_{ij}$. Then, the feature selection process can be described as follows.

Step 1. Calculate the mean and the standard deviation of all samples in the $j$th feature:
$$\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad \sigma_j = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} \big(x_{ij} - \mu_j\big)^2}.$$

Step 2. Calculate the mean and the standard deviation of the samples of the $c$th class in the $j$th feature:
$$\mu_{cj} = \frac{1}{n_c}\sum_{i \in c} x_{ij}, \qquad \sigma_{cj} = \sqrt{\frac{1}{n_c - 1}\sum_{i \in c} \big(x_{ij} - \mu_{cj}\big)^2}.$$

Step 3. Calculate the weighted standard deviation of the class centers in the $j$th feature:
$$\sigma_{bj} = \sqrt{\sum_{c=1}^{C} p_c\, \mu_{cj}^{2} - \Big(\sum_{c=1}^{C} p_c\, \mu_{cj}\Big)^{2}},$$
where $\mu_j$ is the center of all samples in the $j$th feature, $\mu_{cj}$ is the center of the samples of the $c$th class in the $j$th feature, $\sum_{c} p_c \mu_{cj}^{2}$ and $\sum_{c} p_c \mu_{cj}$ are the weighted means of the squared class centers and of the class centers in the $j$th feature, and $p_c = n_c / N$ is the prior probability of the $c$th class, with $\sum_{c=1}^{C} p_c = 1$.

Step 4. Calculate the distance discriminant factor of the $j$th feature:
$$d_j = \sigma_{bj} - \lambda\, \sigma_{wj}, \qquad \sigma_{wj} = \sum_{c=1}^{C} p_c\, \sigma_{cj},$$
where $\sigma_{bj}$ measures the distance of the $j$th feature between different classes, $\sigma_{wj}$ corresponds to the distance of the $j$th feature within classes, and $\lambda$ is used to control the impact of $\sigma_{wj}$, which is set as 2 in this paper.
Considering the degree of overlap among different classes, a compensation factor is calculated as follows. Firstly, define and calculate the variance factor $v_{bj}$ of $\sigma_{bj}$ in the $j$th feature. Secondly, define and calculate the variance factor $v_{wj}$ of $\sigma_{wj}$ in the $j$th feature. Then, the compensation factor $\gamma_j$ of the $j$th feature is defined and calculated from these two variance factors. Thus, the modified distance discriminant factor $\bar{d}_j$ is obtained by applying the compensation factor $\gamma_j$ to $d_j$.

Step 5. Rank the features in descending order according to the modified distance discriminant factors $\bar{d}_j$; then normalize the $\bar{d}_j$ values to obtain the distance discriminant criteria. Clearly, a bigger criterion value signifies that the corresponding feature separates the classes better.

Step 6. Set a threshold value and select from the feature set the sensitive features whose distance discriminant criteria exceed the threshold.
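A compact NumPy sketch of the basic selection procedure follows. It covers Steps 1 to 6 without the compensation factor, whose exact definition is not reproduced above; the weighting $\lambda = 2$ follows the text, while the threshold value and all names are illustrative:

```python
import numpy as np

def distance_discriminant_select(X, y, lam=2.0, threshold=0.5):
    """Basic distance discriminant feature selection (compensation factor omitted).

    X : (N, D) feature matrix, y : (N,) integer class labels.
    Returns the indices of the selected features."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                                     # p_c = n_c / N
    centers = np.array([X[y == c].mean(axis=0) for c in classes])      # mu_cj
    stds = np.array([X[y == c].std(axis=0, ddof=1) for c in classes])  # sigma_cj
    # Between-class spread: weighted std of the class centers (Step 3)
    sigma_b = np.sqrt(priors @ centers**2 - (priors @ centers) ** 2)
    # Within-class spread: weighted mean of the class stds (Step 4)
    sigma_w = priors @ stds
    d = sigma_b - lam * sigma_w                                        # discriminant factor
    crit = d / d.max()                                                 # Step 5: normalize
    return np.where(crit >= threshold)[0]                              # Step 6: threshold
```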

3.3. Diagnosis Analysis

It is well known that the ordering of the training samples can affect the classification accuracy of a single FAM, and that using a single output to represent multiple classes may lead to lower classification accuracy. In order to assess how well the proposed FAM ensemble works, that is, how much the generalization ability is improved by using the improved Bayesian belief method to combine the classification results of a committee of single FAMs trained with different orderings of the training samples, the performance of a single FAM is also evaluated.

In the diagnosis phase, the single FAM and the FAM ensemble are all trained in the fast-learning and conservative mode (i.e., the learning rate $\beta$ is set to 1 and the choice parameter $\alpha$ is set close to 0). Besides, in order to balance stability and plasticity, the baseline vigilance parameter of FAM is fixed, and the ensemble size is set as 5.
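The overall ensemble procedure can be sketched as follows, reusing the helpers from the earlier snippets; a `FuzzyARTMAP` class with `fit`/`predict` methods is assumed (hypothetical, not shown), and `bbm_combine` is the fusion rule sketched in Section 2.2:

```python
import numpy as np

def train_fam_ensemble(X_train, y_train, X_val, y_val, n_members=5, n_classes=7, seed=0):
    """Train n_members FAMs on differently ordered copies of the training data and
    record each member's confusion matrix on a validation set for BBM fusion."""
    rng = np.random.default_rng(seed)
    members, conf_matrices = [], []
    for _ in range(n_members):
        order = rng.permutation(len(X_train))                 # a different presentation order
        fam = FuzzyARTMAP(rho=0.0, alpha=0.001, beta=1.0)     # hypothetical FAM classifier
        fam.fit(X_train[order], y_train[order])
        conf = np.zeros((n_classes, n_classes))
        for xi, yi in zip(X_val, y_val):
            conf[yi, fam.predict(xi)] += 1                    # rows: true class, cols: predicted
        members.append(fam)
        conf_matrices.append(conf)
    return members, conf_matrices

def ensemble_predict(x, members, conf_matrices):
    """Fuse the member decisions for one sample with the Bayesian belief method."""
    predictions = [fam.predict(x) for fam in members]
    return bbm_combine(conf_matrices, predictions)
```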

In order to improve the classification accuracy and reduce the computation time, salient features are selected from each feature set by the modified distance discriminant technique and then presented to the five single FAMs in different orders during training. Figure 4 shows the modified distance discriminant factor of all features in the feature sets. From the figure it can be seen that the threshold corresponding to the optimal features differs from case to case; that is to say, the number of salient features differs.

Figure 5 summarizes the classification results in terms of the test accuracy of the single FAMs and the FAM ensemble. From the figure, it can be seen that the FAM ensemble (0.988) outperforms the single FAMs in terms of accuracy, and the test accuracy increases as the number of single FAMs in the ensemble grows. These results indicate that the FAM ensemble can identify the different fault categories of the hydraulic pump well.

3.3.1. Effect of Different Threshold for Feature Selection

As shown in Figure 4, when the threshold value is set properly, some redundant and irrelevant features can be removed from the original feature set. To test the effect of the proposed feature selection method based on the modified distance discriminant technique, a series of experiments is carried out for different threshold values, in which the parameters of the single FAM are the same as above and the size of the FAM ensemble is set as 5.

Figure 6 shows the classification accuracy of the five individual FAMs and the FAM ensemble against the different thresholds. From the figure, it can be noticed that when no features are removed (the original feature set), the test accuracy of the single FAM and the FAM ensemble is 0.824 and 0.845, respectively. The highest test accuracies of the single FAM and the FAM ensemble (0.915 and 0.988) are reached simultaneously at the threshold where the optimal feature set is selected. However, when the threshold value continues to increase, the test accuracy of the single FAM and the FAM ensemble tends to decrease, and at the largest thresholds it falls below the accuracy obtained with all features. This is mainly because the drastic reduction of features discards useful information, which leads to a decrease in the test accuracy.

3.3.2. Classification Performance Comparison with Other Classification Methods

In order to test the superiority of the proposed FAM ensemble method, the test results produced by the FAM ensemble and the single FAM are compared with those produced by other classification methods. In this experiment, the parameters of the FAM ensemble and the single FAM are the same as above.

Table 2 shows the test results of the FAM ensemble versus other classification methods. From the table it can be seen that the average test accuracy of the single FAM is the lowest. However, the test accuracy produced by the two FAM ensemble methods is higher than that produced by any single classifier, and the test accuracy of the proposed FAM ensemble is the highest, exceeding that of the FAM ensemble with the voting algorithm. These results indicate that the proposed FAM ensemble has superior diagnosis performance.

4. Conclusions

The classification performance of FAM is affected by the ordering of the training samples. In this paper, a novel and reliable FAM ensemble based on the improved Bayesian belief method is proposed to improve the classification performance of FAM; it combines the outputs of a committee of FAMs fed with different orderings of the training samples and derives the combined decision.

The proposed FAM ensemble method is applied to the fault identification of a hydraulic pump. The experimental results show that the proposed FAM ensemble can diagnose the fault categories accurately and reliably and has better diagnosis performance than a single FAM. These results indicate that the proposed FAM ensemble holds good promise for engineering classification and decision-making applications.

Acknowledgments

This work is supported by the National Scientific and Technological Achievement Transformation Project of China (Grant no. 201255), Electronic Information Industry Development Fund of China (Grant no. 2012407), the National Natural Science Foundation of China (Grant no. 61374172), and the Fundamental Research Funds for the Central Universities, Hunan University, China.