Mathematical Problems in Engineering

Volume 2015, Article ID 368190, 14 pages

http://dx.doi.org/10.1155/2015/368190

## Fault Diagnosis with Evolving Fuzzy Classifier Based on Clustering Algorithm and Drift Detection

^{1}Graduate Program in Electrical Engineering, Federal University of Minas Gerais, Avenue Antônio Carlos 6627, 31270-901 Belo Horizonte, MG, Brazil

^{2}Department of Computer Engineering, Faculdade de Ciência e Tecnologia de Montes Claros, Avenue Deputado Esteves Rodrigues 1637, 39400-142 Montes Claros, MG, Brazil

^{3}Department of Electronics Engineering, Federal University of Minas Gerais, Avenue Antônio Carlos 6627, 31270-901 Belo Horizonte, MG, Brazil

Received 16 April 2014; Accepted 23 July 2014

Academic Editor: Minping Jia

Copyright © 2015 Maurilio Inacio et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The emergence of complex machinery and equipment in several areas demands efficient fault diagnosis methods. Several fault diagnosis methods based on different theories and approaches have been proposed in the literature. According to the concept of intelligent maintenance, the application of intelligent systems to accomplish fault diagnosis from process historical data has been shown to be a promising approach. In problems involving complex nonstationary dynamic systems, an adaptive fault diagnosis system is required to cope with changes in the monitored process. In order to address fault diagnosis in this scenario, use of the so-called “evolving intelligent systems” is suggested. This paper proposes the application of an evolving fuzzy classifier for fault diagnosis based on a new approach that combines a recursive clustering algorithm and a drift detection method. In this approach, the clustering update depends not only on a similarity measure, but also on monitoring changes in the input data flow. A cluster merging mechanism was incorporated into the algorithm to enable the removal of redundant clusters. Multivariate Gaussian membership functions are employed in the fuzzy rules to avoid information loss if there is interaction between variables. The novel approach provides greater robustness to outliers and noise present in data from process sensors. The classifier is evaluated in fault diagnosis of a DC drive system. In the experiments, a DC drive system fault simulator was used to simulate normal operation and several faulty conditions. Outliers and noise were added to the simulated data to evaluate the robustness of the fault diagnosis model.

#### 1. Introduction

The advance of technology has resulted in the emergence of complex machinery and equipment, which imposes great challenges for management and maintenance. In industry, for instance, fault diagnosis in major processes is vitally important to assure normal operation of a plant. In these cases, due to the complexity of the systems, it is infeasible for human operators to diagnose abnormal situations (faults) in a timely manner, which can lead them to make wrong decisions. Statistical studies indicate that approximately 70% of industrial accidents are caused by human error, which can result in economic losses, reduced safety, and environmental damage [1].

This scenario led to the emergence of new concepts on management and maintenance of machinery and equipment, such as condition-based maintenance (CBM) [2]. CBM refers to the use of machine or equipment data obtained in real time to infer its working condition (or faulty condition), allowing maintenance scheduling and preventing equipment crashes. Based on CBM, the concept of intelligent maintenance has emerged [3]. It employs advanced fault diagnosis systems to achieve the desired goals. Thus, intelligent maintenance becomes necessary for current complex machinery and equipment.

Over the past decades, several intelligent fault diagnosis methods based on different theories and approaches have been proposed in the literature. In general, these methods use mathematical/statistical models, accumulated experience, or process data to perform fault diagnosis [1]. Although methods based on models or experience have been shown to be effective, they have the disadvantage of requiring previous knowledge of the dynamic system in question. In contrast, methods based on process data do not require prior knowledge. They are based solely on data obtained directly from the system.

Recently, fault diagnosis methods based on process data have received great emphasis, since the acquisition of data through sensors is common in today’s automation systems [4, 5]. In this scenario, it is often easier to extract knowledge from data than to develop a model or accumulate experience. For this type of diagnosis, several works have already proposed data-based diagnostic methods employing so-called “intelligent systems,” which are tools derived from computational intelligence, mainly artificial neural networks, fuzzy systems, and neurofuzzy networks, among others [2].

However, despite the good performance achieved by intelligent systems in fault diagnosis, they tend to face difficulties when the problem involves complex nonstationary dynamic systems, which represent the vast majority of current real cases. In such systems, physical parameters, operating characteristics, and fault behaviours change over time, requiring an adaptive fault diagnosis system that can adapt itself to cope with changes in the monitored system. In order to address fault diagnosis in this scenario, several works propose the use of the so-called “evolving intelligent systems” [6–10].

Evolving intelligent systems are systems based on fuzzy inference systems, artificial neural networks, or a combination of both (neurofuzzy networks), whose main characteristic is the ability to gradually determine both their structure and parameters from input data acquired in online mode and often in real time [11, 12]. The application of evolving intelligent systems has been growing in recent years. Many works present successful applications in real-world complex problems involving modeling, control, classification, or prediction [13]. An important aspect of evolving intelligent systems is that there are different theoretical and practical approaches which can be used for their implementation. Regardless of the approach used, the main features of evolving intelligent systems are as follows:

(i) the structure is not fixed and is not defined a priori: it changes (expands or shrinks) naturally as the system evolves;

(ii) the parameters are adjusted (adapted) as the system evolves;

(iii) the operation is continuous; that is, the systems are based on online learning algorithms that run, if necessary, in real time.

One of the most used approaches to define the structure of an evolving intelligent system is unsupervised recursive clustering. Generally, the algorithm performs data clustering in the input or input-output data space in an incremental manner, defining the center of each cluster, and in some cases, the radius of the cluster (or zone of influence). During the evolving process, the algorithm can create new clusters, update existing clusters, or eliminate redundant ones. The models proposed in [14–21] are examples of intelligent systems based on evolving clustering algorithms.

Most evolving intelligent systems based on recursive clustering adopt a mechanism to update the structure and parameters of the system (creation/modification/removal of clusters) using some measure of similarity between input data samples and existing clusters. Although this mechanism is functional, it may lead to an erroneous definition of the structure, since outliers or noisy samples (which are common in data acquired by sensors in industrial environments) that exceed the similarity threshold may generate clusters that do not effectively represent the spatial structure of the data [21]. Some evolving intelligent systems adopt more elaborate mechanisms to update the model structure and system parameters, such as the models proposed in [20, 21], which use methods to ignore or filter outliers and noise.

Considering the fault diagnosis problem, the use of evolving intelligent systems based on recursive clustering algorithms robust to outliers and data noise is mandatory. In this problem, each new cluster created is usually associated with a new faulty condition. Thus, if the clustering procedure is not robust, the fault diagnosis model tends to have a high false alarm rate; that is, new faulty conditions are erroneously detected. In this context, this paper proposes a fault diagnosis approach based on an evolving fuzzy classifier which uses a new robust unsupervised recursive clustering algorithm. The proposed classifier uses a modified version of the Gustafson-Kessel (GK) clustering algorithm [22] with the incorporation of the drift detection method (DDM) [23].

GK is a powerful clustering algorithm. Unlike many others, it allows the identification of clusters with different shapes and orientations in space. The algorithm adapts the distance metric to the shape of each cluster using an estimate of the cluster covariance matrix. Furthermore, the GK algorithm also has the advantage of being relatively insensitive to data scale and to the initialization of the partition matrix [24]. Several applications based on this clustering algorithm have been proposed in the literature, such as time series prediction, dynamic systems modeling, fault diagnosis, and prognosis.

According to the literature, drift detection is a method for detecting gradual changes in the context of the input data. A context is understood as a set of data generated while the process is stationary. Thus, a drift detection method is able to detect the time instants at which the context of the data changes. The detection of a new context suggests that the current model is outdated and needs to be updated using current relevant information. Drift detection methods are suitable for applications involving machine learning, where algorithms are applied to real-world problems in complex, nonstationary, and dynamic environments. In these applications, large amounts of information are provided in a continuous, high-speed data flow that varies over time, as in the real-time monitoring of industrial plants [25]. The learning algorithms must be able to monitor the behavior of the dynamic system in question and adapt the model as changes occur. Among the several methods proposed for drift detection, the DDM algorithm employs a simple and computationally efficient method to detect the moments when changes occur. It is an independent drift detection method that can be embedded into any learning algorithm, increasing its efficiency in problems involving nonstationary dynamic models.

The new unsupervised recursive clustering algorithm proposed in this paper combines the advantages of the GK algorithm, especially the ability to identify clusters with different shapes and orientations in an online mode, with the DDM algorithm. The DDM algorithm is used to detect changes in the input stream, which trigger updates in the cluster structure. In the proposed algorithm, any clustering update depends not only on the similarity measure, but also on monitoring changes in the input data flow, which gives the algorithm greater robustness to the presence of outliers and noise. A cluster merging mechanism was also incorporated into the algorithm to enable the removal of redundant clusters. The fuzzy rule base of the proposed classifier is updated whenever the cluster structure is modified. The cluster centers and covariance matrices are used as parameters of the fuzzy rules. Multivariate Gaussian membership functions are employed in the rules. Each is characterized by a center vector and a dispersion matrix, which represents the current dispersion of the input variables as well as the interactions between them [21].

In accordance with the characteristics of the proposed recursive clustering algorithm, the main benefits achieved by the classifier used in this work are

(i) the ability to learn faults of the dynamic system in online mode and, if necessary, in real time, eliminating the need for prior knowledge of the system;

(ii) the ability to adapt whenever changes are detected in the monitored system, allowing the application to real problems;

(iii) low false alarm rate and high fault isolation rate due to the robustness to outliers and noise, increasing the reliability of diagnosis.

To evaluate the performance of the proposed approach in fault diagnosis, a DC drive system fault simulator was used to simulate normal operation and several faulty conditions. Outliers and noise were added to the simulated data to evaluate the robustness of the fault diagnosis model.

This paper is organized as follows. Section 2 presents the theoretical concepts regarding the recursive clustering algorithm and the drift detection method, and then describes the proposed recursive clustering algorithm. Next, Section 3 presents the proposed classifier and its application in fault diagnosis. Section 4 presents the experiments and results. Finally, Section 5 presents the conclusion and suggestions for future work.

#### 2. Recursive Clustering Algorithm and Drift Detection

##### 2.1. Recursive Gustafson-Kessel Algorithm

In pattern recognition, clustering algorithms are among the most useful tools to solve problems that involve analysis of nonlabeled data, or unsupervised learning [26]. Over the past decades, thousands of clustering algorithms have been proposed [27], but most of them are based on offline (batch) learning, in which it is assumed that the entire dataset is available in advance. However, in many applications data is acquired in real time, which requires online learning.

In contrast to clustering algorithms for offline learning, which find clusters using an iterative strategy, such as K-means and Fuzzy C-Means (FCM) [27], clustering algorithms for online learning are based on recursive strategies, which allow the algorithm to find clusters while processing each input data sample only once. Several algorithms based on this approach have been proposed in recent years, such as the evolving clustering method (ECM) [14], evolving vector quantization (eVQ) [18], and eClustering [28]. A common feature of these algorithms is that they assume that the clusters are spherical, which can be a limiting factor in real applications, where the clusters may have different shapes and orientations in space.

Unlike many clustering algorithms that employ the Euclidean distance as a measure of similarity, the GK algorithm employs the Mahalanobis distance, which allows the identification of clusters with ellipsoidal shapes. In this algorithm, the distance is defined as follows:

$$d_{ik}^2 = (x_k - v_i)^T A_i (x_k - v_i), \tag{1}$$

where $d_{ik}$ represents the distance between an input data sample $x_k$, $k = 1, \ldots, N$, and the cluster center $v_i$, $i = 1, \ldots, c$, where $N$ is the number of data samples, $n$ is the number of data dimensions, and $c$ is the number of clusters. The norm-inducing matrix $A_i$, $i = 1, \ldots, c$, defines the shape and orientation of each cluster in space. It depends on a fuzzy covariance matrix $F_i$, $i = 1, \ldots, c$, and on the membership degree $\mu_{ik}$ of the input data sample $x_k$, $i = 1, \ldots, c$, $k = 1, \ldots, N$. The GK algorithm uses an iterative process to estimate the parameters of the clusters (the cluster center and fuzzy covariance matrix), which are used to define the distance $d_{ik}$ and the membership degree $\mu_{ik}$. This process finishes when a certain convergence criterion is reached. However, as discussed at the beginning of this section, when the application requires clustering in online mode, a recursive procedure is required. More details about the GK algorithm can be found in [22].

In [24], an extended version of the GK algorithm named evolving GK-like algorithm (eGKL) is proposed. This approach estimates the number of clusters and performs the adaptation of its parameters recursively, maintaining the advantages of the GK algorithm, such as the ability to identify clusters with generic shapes and orientations. The eGKL algorithm does not demand any a priori information regarding the number of clusters. In order to estimate the number of clusters, a strategy to evaluate each new input data sample is used. The strategy checks if each sample belongs to an existing cluster. If the current data sample belongs to a cluster already set, the parameters of the cluster (center and covariance matrix) are updated. If the data sample does not belong to any of the existing clusters, it is used to define a new one. To evaluate the similarity between a new data sample and one of the existing clusters, the eGKL algorithm employs the Mahalanobis distance, defined as follows:

$$D_{ik}^2 = (x_k - v_i)^T \Sigma_i^{-1} (x_k - v_i), \tag{2}$$

where $x_k$ is the current data sample and $v_i$ and $\Sigma_i$ are the center and the covariance matrix of cluster $i$.

In this strategy, the current data sample belongs to an existing cluster if the distance to the cluster center is smaller than the cluster radius. The eGKL algorithm uses an approach inspired by concepts of statistical process control to estimate the radius of each cluster. In this approach, it is assumed that a sample belongs to a cluster if the following relationship holds:

$$D_{ik}^2 < \chi^2_{n,\beta}, \tag{3}$$

where $D_{ik}^2$ is the squared Mahalanobis distance between the sample and the cluster center and $\chi^2_{n,\beta}$ is the value of a Chi-squared distribution with $n$ degrees of freedom and a confidence level $\beta$. The degrees of freedom correspond to the input space dimension. This approach has the advantage of avoiding the problem called “curse of dimensionality” [29], that is, the growth of the distance between two adjacent points as the input space dimensionality increases, since $\chi^2_{n,\beta}$ increases with the dimension of the input data.
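The membership test described above can be sketched in a few lines. This is only an illustration restricted to two dimensions, where the Chi-squared quantile has the closed form $-2\ln(1-\beta)$; the function names and the value $\beta = 0.95$ are assumptions made for the example, not settings taken from the paper.

```python
import math


def mahalanobis_sq_2d(x, center, cov):
    """Squared Mahalanobis distance (x - v)^T cov^{-1} (x - v) for 2-D data."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - center[0], x[1] - center[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def chi2_threshold_2dof(beta):
    """Chi-squared quantile for 2 degrees of freedom (CDF is 1 - exp(-t / 2))."""
    return -2.0 * math.log(1.0 - beta)


def belongs_to_cluster(x, center, cov, beta=0.95):
    """Similarity test: the sample lies inside the cluster's confidence ellipse."""
    return mahalanobis_sq_2d(x, center, cov) < chi2_threshold_2dof(beta)


# An elongated cluster: wide along the first axis, narrow along the second.
cov = [[4.0, 0.0], [0.0, 0.25]]
print(belongs_to_cluster((3.0, 0.0), (0.0, 0.0), cov))  # along the long axis
print(belongs_to_cluster((0.0, 3.0), (0.0, 0.0), cov))  # along the short axis
```

Both test points are at Euclidean distance 3 from the center, yet only the one aligned with the long axis passes the test, which is the practical difference between a spherical and an ellipsoidal cluster model.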

In the eGKL algorithm, if condition (3) is satisfied, the current data sample belongs to a cluster, so the cluster parameters are updated. Otherwise, it is assumed that the current data sample does not belong to any of the existing clusters, and a new cluster is created. The complete procedure of the eGKL algorithm can be found in [24].

To increase the robustness of the eGKL algorithm to outliers, the authors propose a mechanism based on the number of data samples that belong to a cluster. In this mechanism, if the number of data samples $M_i$, $i = 1, \ldots, c$, already assigned to an existing cluster is less than $M_{\min}$ (a minimum number initially chosen), the cluster parameters are updated even if the new data sample does not belong to that cluster. Although it is functional, this mechanism depends on a proper choice of parameters for the problem at hand, which can be difficult when a priori information is not available.

##### 2.2. Drift Detection Method

Several drift detection methods have been proposed. In general, they can be classified into two categories: methods that perform adaptive learning at regular intervals regardless of the occurrence of changes and methods that detect changes first and subsequently adapt the learning to these changes [25]. In the first category, methods can use time windows of fixed size or weight the data according to their age or utility [30–32]. When time windows of a fixed size are used, at each time frame, learning is performed only with the data samples included in the window. An inherent difficulty with methods using fixed-size windows is choosing the appropriate window size for each problem. In the second category, methods use indicators monitored over time to detect changes, such as performance measures, data distribution, or data properties [23, 33, 34]. If a drift is detected during the monitoring process, actions are taken to adapt the model to the change that has occurred. With an adaptive-size time window, for example, the action is to adjust the window according to the extent of the context change.

The DDM algorithm, which belongs to the second category, employs a simple method with direct application. This method is based on monitoring the number of errors produced by a learning model during prediction. The method uses the binomial distribution to determine the general form of the probability for the random variable that represents the number of prediction errors in a sequence of input data samples. For each point $k$ in the sequence, the error rate $p_k$ is the probability of a prediction error, with standard deviation $s_k = \sqrt{p_k (1 - p_k)/k}$. According to the probably approximately correct (PAC) learning model [35], the error rate of the learning algorithm decreases as the number of input data samples increases, provided that the distribution is stationary; a significant increase in the error rate therefore suggests a context change. In this case, it is assumed that the current model is inappropriate and should be updated.
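As a worked illustration of these two quantities (the counts are hypothetical and chosen only to keep the arithmetic simple), suppose a model has made 20 prediction errors in its first 200 samples. Writing $p$ for the observed error rate and $s$ for its binomial standard deviation:

$$p = \frac{20}{200} = 0.1, \qquad s = \sqrt{\frac{p\,(1 - p)}{200}} = \sqrt{\frac{0.1 \times 0.9}{200}} = \sqrt{0.00045} \approx 0.0212.$$

With the same error rate observed over 2000 samples, $s$ would fall to about $0.0067$. The band around the error rate therefore narrows as evidence accumulates, so a given rise in the error rate becomes more significant the longer the stream has been stationary.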

In this method, a warning level and a drift level are defined while the error is monitored. When $p_k + s_k$ exceeds the warning level, the data samples are stored in memory. If $p_k + s_k$ exceeds the drift level, it is considered that there is a context change. In this situation, the model induced by the learning algorithm should be updated with the data samples stored since the warning level was reached. It is possible that the error increases and, after reaching the warning level, decreases to lower levels. This situation corresponds to a false alarm: there is no change of context, so no action is required and the data samples stored in memory are no longer needed. More details about the DDM method can be found in [23].
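The warning and drift logic can be sketched as a small monitor class. This is a sketch rather than the reference implementation of [23]: the class and method names are hypothetical, the levels of 2 and 3 standard deviations above the recorded minimum follow the original DDM description, and the 30-sample warm-up and strict comparisons are assumptions borrowed from common implementations.

```python
import math


class DriftDetector:
    """Error-rate monitor in the style of DDM (illustrative sketch only)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples  # assumed warm-up before any decision
        self.reset()

    def reset(self):
        """Forget the current context (called after a drift is signalled)."""
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += 1 if is_error else 0
        if self.n < self.min_samples:
            return "stable"
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        # Remember the lowest p + s seen in the current context.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


detector = DriftDetector()
for outcome in [False] * 40:
    state = detector.update(outcome)
print(state)
```

A caller would typically buffer samples while the state is `"warning"`, retrain on that buffer when `"drift"` is returned, and discard the buffer if the state falls back to `"stable"`.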

Embedding the DDM algorithm in a model learning algorithm can keep the dynamic system model continuously updated to the current context. For instance, DDM can be embedded in a recursive clustering algorithm. In this case, the definition of the clusters is adjusted whenever a context change is detected. DDM is used to avoid the nonrobust approach of creating new clusters whenever a similarity measure threshold is violated. This mechanism gives the recursive clustering algorithm greater robustness to outliers and noise in applications where online learning of nonstationary dynamic models is necessary.

##### 2.3. Proposed Algorithm

This section describes the proposed unsupervised recursive clustering algorithm with a new mechanism of clustering update. The algorithm is a recursive version of the GK algorithm, inspired by the eGKL algorithm, incorporating the DDM algorithm. In the proposed algorithm, clustering is performed in online mode and, if necessary, in real time.

Assuming that there is no a priori information about the clustering structure nor an initial set of input data samples, the proposed algorithm starts by associating the center of the first cluster $v_1$ with the first data sample $x_1$. The corresponding covariance matrix $\Sigma_1$, the learning rate $\alpha_1$, and the number of samples $M_1$ associated with the first cluster are defined as follows:

$$v_1 = x_1, \qquad \Sigma_1 = \gamma I, \qquad \alpha_1 = \alpha_0, \qquad M_1 = 1, \tag{4}$$

where $I$ is an identity matrix of size $n \times n$, $\gamma$ is a small positive number (default value: ), and $\alpha_0$ is the initial learning rate (default value: ).

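The algorithm stops when all data samples are processed; otherwise, a new data sample $x_k$ is obtained and the distance between the data sample and the centers of the existing clusters is computed:

$$D_{ik}^2 = (x_k - v_i)^T \Sigma_i^{-1} (x_k - v_i), \quad i = 1, \ldots, c, \tag{5}$$

where $v_i$ and $\Sigma_i$ are the center and the covariance matrix of cluster $i$ and $c$ is the current number of clusters.

The similarity between the current data sample and the existing clusters is verified by the similarity condition

$$D_{ik}^2 < \chi^2_{n,\beta}, \tag{6}$$

where $\chi^2_{n,\beta}$ is the Chi-squared threshold used by the eGKL algorithm in Section 2.1.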

If similarity condition (6) is met for a given cluster, it is assumed that the current sample belongs to this cluster. The cluster parameters (center, covariance matrix, learning rate, and number of samples in the cluster) are then updated as follows: where .
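To make the update step concrete, the sketch below uses an exponentially weighted update of the center and covariance matrix, a form commonly used by eGKL-style recursive algorithms. It is an assumption standing in for the update equations of this section, not a transcription of them: the learning rate is kept fixed for simplicity, and the function name is hypothetical.

```python
def update_cluster(center, cov, x, alpha):
    """Move a cluster toward sample x with learning rate alpha (0 < alpha < 1).

    Assumed form: v <- v + alpha * (x - v) and
    S <- S + alpha * ((x - v)(x - v)^T - S), with v the center before the update.
    """
    n = len(center)
    diff = [x[j] - center[j] for j in range(n)]
    new_center = [center[j] + alpha * diff[j] for j in range(n)]
    new_cov = [
        [cov[r][c] + alpha * (diff[r] * diff[c] - cov[r][c]) for c in range(n)]
        for r in range(n)
    ]
    return new_center, new_cov


center, cov = update_cluster([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0], 0.1)
print(center, cov)
```

In this example the sample lies along the first axis, so the center shifts along that axis, the variance in that direction grows, and the variance in the orthogonal direction shrinks: the cluster stretches toward the data it absorbs.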

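If similarity condition (6) is not met, it is assumed that the current sample does not belong to any existing cluster. The algorithm increments a variable that represents the number of dissimilarities, $n_d$; then, the error probability $p_k$ and the standard deviation $s_k$ are computed as

$$p_k = \frac{n_d}{k}, \qquad s_k = \sqrt{\frac{p_k (1 - p_k)}{k}}, \tag{8}$$

where $k$ is the number of data samples processed so far.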

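In this algorithm, the $p_k$ and $s_k$ values are stored whenever $p_k + s_k$ reaches its lowest value during the process, obtaining $p_{\min}$ and $s_{\min}$. If the following condition is met,

$$p_k + s_k < p_{\min} + s_{\min}, \tag{9}$$

then $p_{\min} = p_k$ and $s_{\min} = s_k$. Note that, when the algorithm starts, the $p_{\min}$ and $s_{\min}$ values must be initialized to a positive number; a value of one is suggested for each.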

To decide whether the current data sample represents a new cluster or is just an outlier, warning and drift conditions are evaluated. The warning condition is verified as

$$p_k + s_k \geq p_{\min} + \lambda_w s_{\min}, \tag{10}$$

where $\lambda_w$ is the warning level (default value: ). If the warning level is reached, then the current data sample is stored in a window of samples $x_j^w$, $j = 1, \ldots, n_w$ (where $n_w$ is the current size of the window), and then the drift condition is evaluated. Otherwise, the algorithm processes the next input data sample. The drift condition is verified as

$$p_k + s_k \geq p_{\min} + \lambda_d s_{\min}, \tag{11}$$

where $\lambda_d$ is the drift level (default value: ). If the drift level is reached, a new cluster is created and the center and the covariance matrix of the new cluster are determined by the samples stored in the data window as follows:

The remaining parameters of the new cluster (learning rate and number of samples in the cluster) are initialized as
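The interplay between the similarity test and the warning and drift levels can be sketched for one-dimensional data as follows. Everything specific here is an assumption made for illustration: the class name, the fixed learning rate, the initial and minimum variances, the levels 2 and 3 (taken from the original DDM description), the use of the plain mean and variance of the window for the new cluster, and the choice to reset only the dissimilarity statistics after a cluster is created.

```python
import math

CHI2_1DOF_95 = 3.841  # 0.95 quantile of Chi-squared with 1 degree of freedom


class DriftGatedClusterer:
    """1-D sketch: a cluster is created only when the drift level is reached."""

    def __init__(self, alpha=0.1, init_var=1.0, min_var=0.05,
                 warning_level=2.0, drift_level=3.0):
        self.alpha, self.init_var, self.min_var = alpha, init_var, min_var
        self.warning_level, self.drift_level = warning_level, drift_level
        self.clusters = []
        self.k = 0  # number of samples processed so far
        self._reset_drift_state()

    def _reset_drift_state(self):
        self.dissimilar = 0
        self.p_min = self.s_min = 1.0  # initial value of one, as suggested
        self.window = []

    def process(self, x):
        self.k += 1
        if not self.clusters:
            self.clusters.append({"center": x, "var": self.init_var, "count": 1})
            return "init"
        best = min(self.clusters,
                   key=lambda c: (x - c["center"]) ** 2 / c["var"])
        diff = x - best["center"]
        if diff * diff / best["var"] < CHI2_1DOF_95:
            # Similar sample: adapt the nearest cluster.
            best["center"] += self.alpha * diff
            new_var = best["var"] + self.alpha * (diff * diff - best["var"])
            best["var"] = max(new_var, self.min_var)
            best["count"] += 1
            return "update"
        # Dissimilar sample: update the drift statistics instead.
        self.dissimilar += 1
        p = self.dissimilar / self.k
        s = math.sqrt(p * (1.0 - p) / self.k)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s < self.p_min + self.warning_level * self.s_min:
            self.window.clear()  # below warning level: treat as an outlier
            return "ignored"
        self.window.append(x)
        if p + s < self.p_min + self.drift_level * self.s_min:
            return "warning"
        mean = sum(self.window) / len(self.window)
        var = sum((w - mean) ** 2 for w in self.window) / len(self.window)
        self.clusters.append({"center": mean, "var": max(var, self.min_var),
                              "count": len(self.window)})
        self._reset_drift_state()
        return "new"


# Data near 0 with one outlier, followed by a sustained shift to about 8.
stream = [0.5 if i % 2 else -0.5 for i in range(1, 41)]
stream[19] = 10.0
stream += [8.0 + (0.5 if i % 2 else -0.5) for i in range(41, 61)]
model = DriftGatedClusterer()
events = [model.process(x) for x in stream]
print(len(model.clusters), events[19])
```

In this toy stream the isolated outlier is ignored, while the sustained shift accumulates enough dissimilar samples to reach the drift level and produce a second cluster, which is the robustness argument made in this section.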

In order to avoid redundant cluster formation, during the update, the similarity between clusters is checked. To achieve this, distances between the centers of the clusters are computed as follows:

If one of the following similarity conditions is met for two existing clusters and , the clusters are merged. These clusters have a hyper ellipsoidal shape, defined by a mean vector, a covariance matrix, and a number of samples associated with each one. The combination of these two clusters produces a new one with parameters computed as follows [36]:
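One standard way to combine two clusters described by a mean, a covariance matrix, and a sample count is to pool their moments, as sketched below. This is offered as an assumed stand-in and may differ from the expressions of [36]; the function name is hypothetical, and the formula is exact when both covariance matrices are population covariances (normalized by the sample count).

```python
def merge_clusters(v1, cov1, m1, v2, cov2, m2):
    """Pool two clusters given as (center, covariance, sample count).

    The merged center is the count-weighted mean. The merged covariance is the
    count-weighted average of the two covariances plus a term that accounts for
    the separation between the two centers.
    """
    m = m1 + m2
    n = len(v1)
    center = [(m1 * v1[j] + m2 * v2[j]) / m for j in range(n)]
    d = [v1[j] - v2[j] for j in range(n)]
    w = m1 * m2 / (m * m)
    cov = [
        [(m1 * cov1[r][c] + m2 * cov2[r][c]) / m + w * d[r] * d[c]
         for c in range(n)]
        for r in range(n)
    ]
    return center, cov, m


identity = [[1.0, 0.0], [0.0, 1.0]]
print(merge_clusters([1.0, 0.0], identity, 2, [5.0, 0.0], identity, 2))
```

In the example the two clusters are separated along the first axis only, so the merged cluster keeps unit variance along the second axis but becomes much wider along the first.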

Algorithm 1 summarizes the proposed recursive clustering algorithm.