Abstract

Clustering methods have been widely applied to the fault diagnosis of mechanical systems, but the need to determine the number of clusters in advance limits their range of application. In this paper, a novel clustering method that combines adaptive resonance theory (ART) with a similarity measure based on Yu’s norm is presented and applied to the fault diagnosis of rolling element bearings; it generates the number of clusters adaptively through the vigilance parameter test. Time-domain features, frequency-domain features, and time series model parameters are extracted to capture the fault-related information about the bearings. Because some features are irrelevant or redundant, the salient features are then selected by an improved distance discriminant technique and input into the proposed clustering method to diagnose the bearing faults. The experimental results confirm that the proposed clustering method diagnoses the fault categories accurately and performs better than fuzzy ART and the Self-Organizing Feature Map (SOFM).

1. Introduction

To decrease the downtime of production machinery and increase its reliability against possible failures, some important machinery is equipped with condition monitoring systems, but classifying the data samples collected by such systems intelligently remains challenging. Artificial neural networks (ANNs) have been widely applied as intelligent classification tools in machine fault diagnosis, which is treated as a classification problem based on learning patterns from empirical data in complex mechanical processes and systems [1]. In the diagnosis process, however, some ANNs are unable to detect unexpected faults. In other words, if a fault category was not learned during training, the trained network cannot identify it when it occurs. In this case, the network must be retrained on the complete data set, and the trained model must be modified to learn the new category. The previously learned knowledge is thus forgotten and the memories of prior training are destroyed, which makes the process time-consuming and costly. Moreover, in practice it is impossible to obtain a data set that represents the features of all fault categories. These characteristics limit the application of such neural networks in fault diagnosis. To solve this problem, the adaptive resonance theory (ART) network has been developed and applied to pattern recognition and fault diagnosis. It is designed according to adaptive resonance theory to overcome the stability-plasticity dilemma; that is, its learning system protects useful historical data from corruption (stability) while simultaneously learning new data (plasticity) [2, 3]. Owing to these advantages, ART models such as ART2 and fuzzy ART have been developed and applied in specific fields [4–6].
Furthermore, building on the advantages of ART, classification methods that combine ART with other neural networks, such as the CNN-ART algorithm [7] and the ART-KOHONEN neural network [8], have been proposed for fault diagnosis; they expand their knowledge continuously without losing previously learned knowledge while learning new knowledge.

Clustering methods, which do not depend on supervision, have been widely studied and applied to fault diagnosis. Following the principle that similar objects belong to the same cluster and dissimilar objects belong to different clusters, most clustering methods, such as k-nearest neighbor (KNN) and fuzzy c-means (FCM) [9–11], employ similarity and distance measures to partition a dataset into several clusters. Recently, a novel clustering method using a similarity measure derived from Yu’s norm was also developed to classify medical datasets [12]; it can deal with uncertainty through the fuzzy formalism. With these methods, however, the number of cluster nodes must be determined before classification. In the real world, it is rather hard to predict the number of classes. This is especially true in industrial applications, where mechanical equipment operates in a dynamic environment and under the influence of numerous uncertain factors. These characteristics therefore limit the range of application of these clustering methods.

To the best of our knowledge, the clustering method using the similarity measure based on Yu’s norm has seldom been applied to the fault diagnosis of mechanical systems. In fault diagnosis applications, when the samples of different fault classes overlap in some regions of the feature space, a traditional hard (crisp) clustering method mainly uses distance to compare a data sample with the fault classes and assigns the sample to one and only one cluster [13], which can result in misclassification. In fuzzy clustering, by contrast, a data sample belongs to a cluster with a certain grade of membership, and a clustering method based on a similarity measure makes the classification decision by comparing how similar the data sample is to the class vectors. Therefore, it is preferable to use the similarity measure based on Yu’s norm to develop a new diagnostic method that learns from the process data stream by identifying the different fault categories automatically; namely, the method generates the number of cluster nodes adaptively, in real time, according to the number of faults. On this basis, a novel clustering method that combines ART with the similarity measure based on Yu’s norm (ART-similarity) is proposed to diagnose the faults of rolling element bearings.

The rest of the paper is organized as follows. Section 2 reviews adaptive resonance theory and the clustering method using the similarity measure based on Yu’s norm. Section 3 describes the proposed ART-similarity clustering method based on Yu’s norm. Section 4 explains the diagnosis system using the proposed clustering method; namely, feature parameters are extracted by signal processing methods to capture the fault-related information, and the salient features are selected by the modified distance discriminant technique and input into the ART-similarity clustering method to diagnose the bearing faults. Finally, the conclusions about the proposed clustering method in the field of fault diagnosis are given in Section 5.

2. Review of ART and Clustering Method Using Similarity Measure Based on Yu’s Norm

Adaptive resonance theory (ART), which is treated as a theory of human cognitive information processing, was designed by Carpenter and Grossberg in 1976. The ART network is an online learning system that uses self-organization to develop stable and plastic clustering of input samples; namely, it uses the vigilance test to resolve the stability-plasticity dilemma. In the learning process, when a new sample is input into the ART network, the network attempts to categorize it by comparing it with the stored weight vectors of the existing cluster nodes, each of which represents a category. If the sample is similar to an existing category and the match degree is greater than or equal to the vigilance value, the sample is classified into that category and the weight vector of the corresponding cluster node is modified. Otherwise, a new category is created without affecting the existing memory. Its detailed dynamic behavior and algorithm can be found in [4].

Because a fuzzy relation can be called a similarity relation [14], some clustering methods using similarity measures based on fuzzy set theory have been developed. In 2007, a similarity measure based on Yu’s norm was constructed and the corresponding clustering method was presented by Luukka [12]. Yu’s norms in fuzzy logic, namely, Yu’s t-norm $T$ and Yu’s t-conorm $S$, are written as follows, respectively: $T(x, y) = \max\{0, (1 + \lambda)(x + y - 1) - \lambda x y\}$ and $S(x, y) = \min\{1, x + y + \lambda x y\}$, in which $\lambda > -1$. According to the equivalence relations obtained from t-norms, t-conorms, and negation [7], the similarity measure can be derived as given in [12]. Thus, the data samples can be clustered by the similarity measure.
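As a minimal sketch of the two norms, the following Python functions implement Yu’s t-norm and t-conorm in their commonly cited form, $T(x, y) = \max\{0, (1 + \lambda)(x + y - 1) - \lambda x y\}$ and $S(x, y) = \min\{1, x + y + \lambda x y\}$ with $\lambda > -1$. The function names are hypothetical, and the value 0.6 used in the usage lines is purely illustrative.

```python
def _check(x: float, y: float, lam: float) -> None:
    """Validate the arguments shared by both norms."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in [0, 1]")
    if lam <= -1.0:
        raise ValueError("lam must be greater than -1")


def yu_t_norm(x: float, y: float, lam: float) -> float:
    """Yu's t-norm: max(0, (1 + lam) * (x + y - 1) - lam * x * y)."""
    _check(x, y, lam)
    return max(0.0, (1.0 + lam) * (x + y - 1.0) - lam * x * y)


def yu_t_conorm(x: float, y: float, lam: float) -> float:
    """Yu's t-conorm: min(1, x + y + lam * x * y)."""
    _check(x, y, lam)
    return min(1.0, x + y + lam * x * y)


if __name__ == "__main__":
    # Illustrative values only; lam = 0.6 is an arbitrary choice here.
    print(yu_t_norm(0.8, 0.7, 0.6))
    print(yu_t_conorm(0.2, 0.3, 0.6))
```

With $\lambda = 0$ the pair reduces to the Łukasiewicz t-norm and t-conorm, which gives a quick sanity check of the implementation.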

The clustering algorithm is described as follows. Assume that a set of objects needs to be classified into different classes using feature parameters that describe the objects, and that these feature parameters are normalized so that the objects are vectors belonging to $[0, 1]^n$. Then, the weight vector $\mathbf{w}_k$ that represents class $k$ can be obtained by calculating the arithmetic mean of a set of samples known to belong to class $k$. Once the weight vectors are determined, the class to which a sample vector $\mathbf{x}$ belongs can be decided by comparing it with the weight vector of each class. The comparison is made by computing the similarity degree based on Yu’s norm between $\mathbf{x}$ and $\mathbf{w}_k$, as defined in [12]. The parameter of this similarity degree affects the classification accuracy; in general, the smaller it is, the higher the classification accuracy. Without loss of generality, it is set to 0.6 here. The sample $\mathbf{x}$ is then assigned to the class whose weight vector gives the largest similarity degree.
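The two steps of this classifier, building one weight vector per class as an arithmetic mean and assigning a sample to the most similar class, can be sketched as follows. The similarity function used here, the mean of $1 - |x - w|$ over the features, is only a simple stand-in and is not the measure derived from Yu’s norm in [12]; all function names are hypothetical.

```python
import numpy as np


def class_weight_vectors(samples, labels):
    """Return {label: arithmetic mean of the samples carrying that label}."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    return {k.item(): samples[labels == k].mean(axis=0) for k in np.unique(labels)}


def stand_in_similarity(x, w):
    """Placeholder similarity in [0, 1] for vectors normalized to [0, 1].

    Assumption: this is NOT the Yu's-norm-based measure, only a stand-in
    so that the decision rule can be demonstrated.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.mean(1.0 - np.abs(x - w)))


def classify(x, weights, similarity=stand_in_similarity):
    """Assign x to the class whose weight vector is most similar to it."""
    return max(weights, key=lambda k: similarity(x, weights[k]))


if __name__ == "__main__":
    train = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
    labels = [0, 0, 1, 1]
    weights = class_weight_vectors(train, labels)
    print(classify([0.7, 0.95], weights))
```

Because the similarity function is passed in as an argument, a different measure can be substituted without changing the decision rule.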

3. ART-Similarity Clustering Method Based on Yu’s Norm

As noted above, the ART neural network and the clustering method using the similarity measure based on Yu’s norm each have their own advantages, and the proposed ART-similarity clustering method combines them. Its architecture, shown in Figure 1, is similar to that of fuzzy ART, excluding the adaptive filter, and it adjusts the number of clusters adaptively through the vigilance parameter $\rho$. It is mainly composed of an input layer that stores the input samples, a comparison layer that receives bottom-up input from the input layer and top-down input from the discernment layer, and a discernment layer that contains the active category and stores the category nodes. Each normalized input is denoted as an $n$-dimensional vector $\mathbf{x}_i$, where $\mathbf{x}_i$ is the normalized data sample and $n$ is the dimension of the vector. The weight vector associated with category node $j$ in the discernment layer is represented as $\mathbf{w}_j$. Since the proposed clustering method is based on a similarity measure, the generated categories are formed by merging similar samples. Initially, the weight vector is set to the zero vector, and the number of category nodes is set to zero. For every input sample except the first, categorization with the ART-similarity clustering method is performed by category choice, vigilance test, and learning.

(1) Category Choice. The purpose of this stage is to select the winner category node from all existing category nodes. The choice function is the similarity measure between the $j$-th category and the $i$-th input sample, by (4), which is represented as $T_j = S(\mathbf{x}_i, \mathbf{w}_j)$, where $\mathbf{x}_i$ and $\mathbf{w}_j$ are the $i$-th input sample and the weight vector of the $j$-th category node, respectively. The winner category $J$, which has the largest similarity degree, is selected by the following equation: $J = \arg\max_j T_j$.

(2) Vigilance Test. To determine whether the $i$-th input sample $\mathbf{x}_i$ matches the selected winner category $J$, a vigilance parameter $\rho$, introduced as the evaluation criterion of similarity, is used to test the degree to which $\mathbf{x}_i$ is a subset of category $J$. If the similarity degree meets the vigilance criterion $T_J \geq \rho$, where $\rho \in [0, 1]$, the input sample is sufficiently similar to the selected winner category $J$; the sample is classified into the $J$-th category, and learning is performed. Otherwise, a new category node is generated in the discernment layer to contain the input sample; correspondingly, the weight vector of the new category is given by $\mathbf{w}_{N+1} = \mathbf{x}_i$, where $N$ is the total number of current category nodes.

(3) Learning. When the selected category satisfies the vigilance criterion, the weight vector of the current category is updated by the following learning equation: $\mathbf{w}_J^{\mathrm{new}} = (n_J \mathbf{w}_J^{\mathrm{old}} + \mathbf{x}_i) / (n_J + 1)$, where $\mathbf{w}_J^{\mathrm{new}}$ is the enhanced weight vector of category $J$, $\mathbf{w}_J^{\mathrm{old}}$ is the original weight vector of category $J$, and $n_J$ is the number of samples that belong to category $J$.
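The three stages above can be sketched as a single pass over the input samples. This is a minimal illustration under stated assumptions rather than the authors’ implementation: the similarity function is a simple stand-in (the mean of $1 - |x - w|$), not the Yu’s-norm-based measure; the learning step is assumed to keep each weight vector equal to the running mean of the samples merged into its node; and the names `art_similarity_cluster` and `stand_in_similarity` are hypothetical.

```python
import numpy as np


def stand_in_similarity(x, w):
    """Placeholder similarity in [0, 1]; NOT the Yu's-norm-based measure."""
    return float(np.mean(1.0 - np.abs(x - w)))


def art_similarity_cluster(samples, rho, similarity=stand_in_similarity):
    """Cluster normalized samples with a vigilance test.

    Returns the node index assigned to each sample and the final weight vectors.
    """
    weights = []      # one weight vector per category node
    counts = []       # number of samples merged into each node
    assignments = []  # node index chosen for each input sample

    for x in np.asarray(samples, dtype=float):
        if weights:
            # Category choice: the node with the largest similarity wins.
            scores = [similarity(x, w) for w in weights]
            winner = int(np.argmax(scores))
        if weights and scores[winner] >= rho:
            # Learning (assumed running mean of the node's samples).
            n = counts[winner]
            weights[winner] = (n * weights[winner] + x) / (n + 1)
            counts[winner] = n + 1
        else:
            # Vigilance test failed (or no node exists yet): create a node.
            weights.append(x.copy())
            counts.append(1)
            winner = len(weights) - 1
        assignments.append(winner)
    return assignments, weights


if __name__ == "__main__":
    data = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.90], [0.88, 0.92]]
    nodes, w = art_similarity_cluster(data, rho=0.9)
    print(nodes, len(w))
```

Raising the vigilance value makes the test stricter, so more nodes are created for the same data; this is the mechanism by which the number of clusters adapts.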

4. Diagnosis System Using ART-Similarity Clustering Method

4.1. Structure of Diagnosis System

The fault diagnosis system is shown in Figure 2. The system mainly includes four stages: data acquisition, feature extraction, feature selection, and fault diagnosis. The main objective of the study is to use the proposed ART-similarity clustering method based on Yu’s norm to diagnose the different fault categories of rolling element bearings. To ensure the credibility of the diagnosis results, the vibration signals used in this paper are obtained from the rolling element bearing dataset [15]. Time-domain features, frequency-domain features, and autoregression (AR) model parameters are then extracted from the raw signals and optimized by the modified distance discriminant technique. Finally, the proposed ART-similarity clustering method is trained with the salient features and used to diagnose the bearing faults.

4.2. Vibration Data Acquisition

The schematic diagram of the experimental rig is shown in Figure 3. It consists of a three-phase induction motor, a load motor, and a torque sensor. The bearings are installed in a motor-driven mechanical system. The load motor is controlled to obtain the desired torque load levels. An accelerometer with a magnetic base is mounted at the 12 o’clock position on the drive end of the motor housing. Because the frequency content of interest in the vibration signals of the system under study does not exceed 5000 Hz, the vibration signals are acquired by a DAT recorder at a sampling rate of 12,000 samples per second.

The test bearing is a 6205-2RS JEM SKF deep groove ball bearing. Single point defects are introduced into the drive-end bearing of the motor by electro-discharge machining. Four different defect diameters (0.007, 0.014, 0.021, and 0.028 inch) are introduced into the balls to simulate different fault severities; a 0.014 inch defect diameter is introduced into the inner race and the outer race, respectively, to simulate different fault categories. All defects are 0.011 inch deep. Each bearing is tested under four different loads (0, 1, 2, and 3 hp) at approximately 1800 rpm. Thus, the bearing data sets are obtained from the experimental system under different operating loads and seven different conditions: (1) normal condition; (2) outer race fault; (3) inner race fault; (4)–(7) ball faults of four different severity levels.

4.3. Features Extraction

Feature parameters are mainly used to capture the fault-related information about the bearings. To acquire more information, many different feature parameters are extracted from the vibration signals.

Statistical feature parameters in the time domain and the frequency domain are often used to characterize the shape of vibration signals from different perspectives. In this study, nine time-domain feature parameters and seven frequency-domain feature parameters, listed in Table 1, are extracted and used as the basis for the fault diagnosis of the bearings.
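To illustrate what such statistical parameters look like in practice, the sketch below computes a handful of widely used time-domain statistics from a vibration segment. These particular statistics are an assumption chosen for illustration; they are not claimed to be the nine time-domain parameters listed in Table 1, and the function name is hypothetical.

```python
import numpy as np


def time_domain_features(signal):
    """Compute a few common time-domain statistics of a 1-D signal."""
    x = np.asarray(signal, dtype=float)
    mean = x.mean()
    std = x.std()                       # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))      # root mean square
    peak = np.max(np.abs(x))            # largest absolute amplitude
    centered = x - mean
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,
        "crest_factor": peak / rms,
    }


if __name__ == "__main__":
    # A pure sine sampled over whole periods, used only as a sanity check.
    t = np.arange(1000) / 1000.0
    feats = time_domain_features(np.sin(2 * np.pi * 10 * t))
    for name, value in feats.items():
        print(f"{name}: {value:.4f}")
```

Impulsive faults tend to raise statistics such as kurtosis and crest factor, which is one reason features of this kind are commonly used for bearing signals.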

A time series model can characterize the dynamic process of a mechanical system. Because the model parameters are sensitive to the shape of the vibration data, they are also used as feature parameters to capture the fault-related information about the bearing. The autoregression (AR) model, which is the basic time series model, can work as a predictor; its basic expression can be written as $\hat{x}_t = \sum_{i=1}^{p} a_i x_{t-i}$, where $x_{t-i}$ are the previous samples, $\hat{x}_t$ is the predicted sample of the signal $x_t$, and $a_i$ are the AR model parameters. The algorithm used to obtain these AR model parameters is described in detail in [16]. Here the model order $p$ is set to 16.
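As a minimal sketch of how AR parameters can serve as features, the code below fits the coefficients of a linear predictor of each sample from its preceding samples. It uses ordinary least squares via `numpy.linalg.lstsq`, which is an assumption made for illustration and not necessarily the estimation algorithm of [16]; the function name is hypothetical.

```python
import numpy as np


def fit_ar_least_squares(signal, order):
    """Estimate AR coefficients a_1..a_order by ordinary least squares.

    The model predicts x[t] from x[t-1], ..., x[t-order].
    """
    x = np.asarray(signal, dtype=float)
    if len(x) <= order:
        raise ValueError("signal must be longer than the model order")
    # Column i-1 holds the lag-i samples x[t-i] for t = order .. len(x)-1.
    lagged = np.column_stack(
        [x[order - i: len(x) - i] for i in range(1, order + 1)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segment = rng.standard_normal(2048)  # placeholder for a vibration segment
    features = fit_ar_least_squares(segment, order=16)
    print(features.shape)
```

With an order of 16, each signal segment contributes 16 coefficients to the feature vector.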

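Thus, an original feature set containing 32 feature parameters (nine time-domain parameters, seven frequency-domain parameters, and 16 AR model parameters) is obtained, which preserves the fault-related information from the time domain, the frequency domain, and the time series model.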

4.4. Features Selection

When all 32 features are used as the input of the proposed ART-similarity clustering method to diagnose the bearings, the diagnosis accuracy may decrease and the computation time may increase because some features are redundant or irrelevant. To improve the diagnosis performance, the sensitive features that provide characteristic information for the diagnosis system need to be selected, and the irrelevant or redundant features must be removed. Here, the distance discriminant technique [17] is adopted to select the salient features from the original feature set. To account for the degree of overlap among different classes, an improved version is proposed.

Assume that a feature set of $C$ classes consists of $N$ samples, and that the $c$-th class contains $N_c$ samples, where $c = 1, 2, \ldots, C$ and $\sum_{c=1}^{C} N_c = N$. Each sample is represented by $M$ features, and the $m$-th feature of the $i$-th sample is written as $x_{im}$. The feature selection process can be described as follows.

Step 1. Calculate the standard deviation and the mean of all samples in the $m$-th feature.

Step 2. Calculate the standard deviation and the mean of the samples of the $c$-th class in the $m$-th feature, respectively.

Step 3. Calculate the weighted standard deviation of the class centers in the $m$-th feature. This step involves the center of all samples in the $m$-th feature, the center of the samples of the $c$-th class in the $m$-th feature, the weighted means of the squared class centers and of the class centers in the $m$-th feature, and the prior probability $p_c$ of the $c$-th class, which can be calculated by $p_c = N_c / N$, with $\sum_{c=1}^{C} p_c = 1$.

Step 4. Calculate the distance of the $m$-th feature between different classes.

Step 5. Define and calculate the variance factor of the between-class distance in the $m$-th feature.

Step 6. Calculate the distance of the $m$-th feature within classes.

Step 7. Define and calculate the variance factor of the within-class distance in the $m$-th feature.

Step 8. Define and calculate the compensation factor of the $m$-th feature.

Step 9. Calculate the modified distance discriminant factor of the $m$-th feature, which includes a parameter that controls the impact of one of its terms.

Step 10. Rank the features in descending order according to their modified distance discriminant factors. Then normalize the factors to obtain the normalized modified distance discriminant factors. Clearly, a bigger normalized factor signifies that the corresponding feature separates the classes better.

Step 11. Set a threshold value, and select from the set of $M$ features the sensitive features whose modified distance discriminant factor meets the threshold.
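The last two steps, ranking the features by their discriminant factors and keeping those that pass a threshold, can be sketched as follows. The sketch takes the factors as given and does not reproduce Steps 1 to 9. Two details are assumptions made for illustration: the factors are normalized by their maximum, and a feature is kept when its normalized factor is greater than or equal to the threshold. The function name and the sample values are hypothetical.

```python
import numpy as np


def select_salient_features(factors, threshold):
    """Rank features by discriminant factor and keep those above a threshold.

    Returns the selected feature indices (best first) and the normalized factors.
    """
    factors = np.asarray(factors, dtype=float)
    if factors.max() <= 0:
        raise ValueError("at least one factor must be positive")
    normalized = factors / factors.max()   # assumed normalization by the maximum
    order = np.argsort(-normalized)        # indices in descending order of factor
    selected = [int(j) for j in order if normalized[j] >= threshold]
    return selected, normalized


if __name__ == "__main__":
    # Hypothetical factors for four features.
    chosen, scaled = select_salient_features([2.0, 0.5, 4.0, 1.0], threshold=0.4)
    print(chosen, scaled)
```

Lowering the threshold keeps more features, so the threshold directly trades input dimensionality against the risk of retaining redundant features.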

Further, to demonstrate the advantages and behavior of the improved distance discriminant technique, a numerical example comparing it with the original distance discriminant technique is presented in the Appendix.

4.5. Fault Diagnosis

In the fault diagnosis phase, data samples of the bearings are used to evaluate the performance of the proposed method. The data samples cover seven different conditions, which are labeled by the Arabic numerals 1 to 7, respectively, and each condition contains 25 data samples. The detailed description is given in Table 2.

The detailed fault diagnosis flow chart of the proposed method is shown in Figure 4. First, the data samples are preprocessed to obtain the 32 feature parameters. Second, to reduce the computation time and improve the diagnosis accuracy, the improved distance discriminant technique is used to select the salient features from the original feature set. Figure 5 shows the modified distance discriminant factors of all features. It can be seen from the figure that, for the chosen threshold, the number of selected salient features is 19.

Finally, the proposed ART-similarity clustering method based on Yu’s norm is applied to the fault diagnosis of the bearings. A characteristic of the method is that training and testing are carried out together, and the 175 data samples are used for both. At the beginning of training, the cluster model is empty. When the first data sample is input into the cluster model, the first cluster node is produced and is considered as one fault category. When the next input sample enters the model, it is compared with the first cluster node. If the similarity degree is bigger than the set vigilance parameter $\rho$, the input sample is classified into the first cluster, and the corresponding weight vector of the cluster node is modified by (7); otherwise, the second cluster node is produced. When the third input sample enters the model, it is compared with all the existing cluster nodes, and the cluster node with the biggest degree of similarity is the winner. If the similarity degree meets the vigilance criterion, the sample is classified into the winner cluster; otherwise, a new cluster node is produced. Proceeding in this way for the remaining samples, a trained classifier is obtained.

Generally, one fault class needs many cluster nodes to be learned because of the complex fault mechanism. To evaluate the performance of the proposed ART-similarity clustering method based on Yu’s norm and to understand the relationship among the classification accuracy, the number of cluster nodes, and the vigilance parameter, a series of fault diagnosis experiments with increasing vigilance parameters is conducted. For convenience of computation, the classification accuracy can be obtained by the following formula [8]: $\eta = \frac{N_c}{N - N_n} \times 100\%$, where $\eta$ is the classification accuracy, $N_c$ is the number of correctly classified samples, $N$ is the total number of samples, and $N_n$ is the number of generated cluster nodes. The difference $N - N_n$, that is, the total number of samples minus the number of training samples, is the number of samples used for testing.
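The accuracy measure described above can be written as a small function. The sketch assumes one reading of the formula: the samples that create cluster nodes are counted as training samples, so the denominator is the total number of samples minus the number of cluster nodes. The function name and the example counts are hypothetical.

```python
def classification_accuracy(num_correct: int, num_total: int, num_nodes: int) -> float:
    """Accuracy in percent over the samples that did not create a cluster node.

    Assumption: each generated cluster node consumes one sample for training,
    so num_total - num_nodes samples remain for testing.
    """
    num_test = num_total - num_nodes
    if num_test <= 0:
        raise ValueError("the number of cluster nodes must be below the sample count")
    if not 0 <= num_correct <= num_test:
        raise ValueError("num_correct must lie between 0 and the number of test samples")
    return 100.0 * num_correct / num_test


if __name__ == "__main__":
    # Hypothetical counts: 175 samples, 15 nodes, 120 test samples correct.
    print(classification_accuracy(120, 175, 15))
```

Under this reading, creating more cluster nodes shrinks the test set, which is why accuracy and node count need to be examined together when choosing the vigilance parameter.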

Figure 6 shows the relationship between the classification accuracy and the vigilance parameter. The classification accuracy rises as the vigilance parameter $\rho$ increases, although not continuously, and for sufficiently large $\rho$ it reaches 100%. Figure 7 shows the relationship between the number of cluster nodes and the vigilance parameter. The number of cluster nodes also rises with $\rho$, and each fault class can be composed of many cluster nodes; the number of cluster nodes covering the 7 conditions of the bearings can be as high as 20. The two figures together show that, at the low end of the tested range, the number of cluster nodes is about 10 but the classification accuracy is below 75%; as $\rho$ rises, both the classification accuracy and the number of cluster nodes increase; and when $\rho = 0.999995$, the classification accuracy is the highest, reaching 100%, while the corresponding number of cluster nodes, 15, is relatively the smallest. Thus, according to the relationship among the number of cluster nodes, the vigilance parameter, and the classification accuracy, $\rho$ is set to 0.999995 here.

For ease of understanding, Figure 8 and Table 3 show the classification result obtained using all samples for training and testing when $\rho = 0.999995$. The table and figure show that all samples are classified accurately and that the number of cluster nodes differs among the bearing conditions. Conditions 1, 3, 4, 5, and 6 each use only one node for learning and testing, whereas condition 7, namely, the very severe ball fault, needs eight cluster nodes. This is mainly because the region covered by a cluster node becomes bigger or smaller depending on the spatial distribution of the data samples of the same condition. When the node region becomes small, the corresponding number of cluster nodes increases.

4.6. Performance Test of ART-Similarity Clustering Method Based on Bootstrap Method

It is well known that the initial conditions affect the performance of the ART-similarity clustering method. To study the stability and generalization of the proposed method, the bootstrap method, which is useful for estimating a parameter when its underlying distribution function is unknown [18], is used to compute the estimated mean, standard deviation, and confidence interval of the classification accuracy. A total of 1000 bootstrap replicate samples are generated for the statistical analysis of the diagnosis accuracy. The estimated statistical results are given in Table 4. The table shows that the estimated mean of the diagnosis accuracy is 98.95%, the standard deviation is 0.54%, and the 95% confidence interval is [97.87%, 99.38%]. These results indicate that the performance of the proposed ART-similarity clustering method is stable and generalizes well.
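A bootstrap estimate of this kind can be sketched as follows. The sketch resamples per-sample correctness flags with replacement and reports the mean, the standard deviation, and a percentile confidence interval of the accuracy. What exactly is resampled in the study and how the interval is constructed are not stated above, so both are assumptions here; the function name and the example data are hypothetical.

```python
import numpy as np


def bootstrap_accuracy_stats(correct_flags, n_replicates=1000, confidence=0.95, seed=0):
    """Bootstrap the accuracy (in percent) from 0/1 correctness flags.

    Returns (mean, standard deviation, (lower, upper)) over the replicates,
    where the interval is a percentile interval (an assumed choice).
    """
    flags = np.asarray(correct_flags, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one replicate: indices drawn with replacement.
    idx = rng.integers(0, len(flags), size=(n_replicates, len(flags)))
    accuracies = flags[idx].mean(axis=1) * 100.0
    alpha = (1.0 - confidence) / 2.0
    lower, upper = np.percentile(accuracies, [100 * alpha, 100 * (1 - alpha)])
    return accuracies.mean(), accuracies.std(ddof=1), (lower, upper)


if __name__ == "__main__":
    # Hypothetical outcome: 98 correct and 2 incorrect classifications.
    outcome = [1] * 98 + [0] * 2
    mean, std, interval = bootstrap_accuracy_stats(outcome)
    print(mean, std, interval)
```

A narrow interval and a small standard deviation across replicates are what support a claim of stable performance.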

4.7. Classification Performance Comparison with Other Methods

To validate the advantages of the proposed ART-similarity clustering method, the classification result produced by the ART-similarity classifier is compared with those produced by other conventional unsupervised neural networks, namely, the fuzzy ART and SOFM networks. The same data samples are used to evaluate these methods: 100 of the samples are used for training the networks, and the rest are used for testing. The classification results of the ART-similarity clustering method and the other classification methods with the same salient feature parameters are shown in Table 5. The table shows that the classification accuracy of ART-similarity is the highest, reaching 100%, and that its number of cluster nodes, 15, is the smallest. For fuzzy ART and SOFM, the classification accuracy is 96.57% and 94.36%, and the corresponding number of cluster nodes is 79 and 76, respectively. These results indicate that the proposed ART-similarity clustering method has comparatively superior classification performance.

5. Conclusions

In this paper, a new clustering method that combines adaptive resonance theory (ART) with the similarity measure based on Yu’s norm is presented to diagnose the faults of rolling element bearings; it generates the cluster nodes dynamically. Before the proposed clustering method is applied to the fault diagnosis of the bearings, time-domain statistical features, frequency-domain statistical features, and AR time series model parameters are extracted to characterize the fault-related information of the bearings.

Because some features are redundant or irrelevant, the improved distance discriminant technique is used to select the sensitive features, which are then input into the proposed clustering method to diagnose the fault categories of the bearings. The experimental results show that the proposed ART-similarity clustering method diagnoses the bearing faults successfully and that its diagnosis accuracy is higher than that of fuzzy ART and SOFM. Because the initial conditions affect the performance of the proposed clustering method, the bootstrap method is used to analyze the diagnosis results. The statistical analysis shows that the proposed clustering method is stable and generalizes well. All these results indicate that the proposed method has better diagnosis ability and performance and that it is promising for the fault diagnosis of mechanical systems.

Appendix

Comparison of the Modified Distance Discriminant Technique with the Original Distance Discriminant Technique: A Numerical Example

Suppose the simulated data set is composed of four classes. Each class consists of 35 samples, and each sample is described by six features. Figure 9 shows the distributions of the six features under the four classes. The figure shows clearly that the six features differ in classification ability and can be ranked accordingly.

To ascertain the classification ability of each feature, the original distance discriminant technique is adopted to evaluate the sensitivities of the six features. The evaluation results are shown in Figure 10(a), and the results produced by the modified distance discriminant technique are displayed in Figure 10(b). A comparison of Figure 10(a) with Figure 10(b) shows that the distance discriminant factors of the six features produced by the modified distance discriminant technique are in accord with the truth. The original technique, however, ranks two pairs of features in the wrong order: in each pair, the feature with the lower classification ability receives the larger distance discriminant factor and therefore appears to be the more sensitive one. This is because, in the original distance discriminant technique, the distance between different classes and the distance within classes are obtained by simple averaging, and the variance factors of the two distances are not considered. In the modified distance discriminant technique, by contrast, the variance factors are incorporated into the distance between different classes and the distance within classes. The drawbacks of the original technique are thus overcome, and the modified distance discriminant factors shown in Figure 10(b) are produced. These results further demonstrate that the modified distance discriminant technique is superior to the original distance discriminant technique.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work is supported financially by the National Natural Science Foundation of China (Grant no. 51405353), the Open Fund of the Key Laboratory for Metallurgical Equipment and Control of Ministry of Education in Wuhan University of Science and Technology (2014B01), and the National Natural Science Foundation of China (Grant nos. 51475339 and 51575202).