Abstract

Existing anomaly detection models of mechanical systems often struggle with equipment that operates under multiple working conditions: a model learned under a single working condition adapts poorly to new working conditions, and centralized learning of multicondition samples leads to low detection accuracy. A multiworking condition variational auto-encoder (MW-CVAE) is proposed to solve the problem. Based on the variational auto-encoder model, the working conditions of the equipment are regarded as a conditional input. The anomaly evaluation threshold of each independent working condition is established by the centralized learning of normal multiworking condition samples. At the same time, it is found that the representation of each working condition's samples in the space of the hidden layer forms a distribution close to the prior probability, providing a theoretical basis for the separation and evaluation of working conditions. Comparative verification on the CWRU, JNU, and PU datasets shows that the new method significantly improves anomaly detection (the F1-score is increased by 18–19%) and can be widely used for anomaly detection in mechanical systems with various discrete working conditions.

1. Introduction

Anomaly detection, the task of identifying abnormal or inconsistent data, is an essential branch of data mining and machine learning. In the era of big data, manual data processing falls far behind computers in speed, so faster detection of abnormal data is a valuable task [1, 2]. In industry, the anomaly detection of mechanical devices is crucial: faster and more accurate anomaly detection helps prevent accidents and improves reliability and production efficiency [3, 4].

Machine-learning-based anomaly detection algorithms mainly include One-Class Support Vector Machines (OC-SVMs), Principal Component Analysis (PCA), and Local Outlier Factor (LOF). In OC-SVM, normal data are used to train the model to obtain a hyperplane that encloses the positive data. OC-SVM takes the hyperplane as the criterion and considers the samples inside the boundary to be positive. Since the computation of the kernel function is time-consuming, OC-SVM is not widely used on massive data [5–7]. PCA is a statistical algorithm that converts a set of potentially correlated variables into a set of linearly uncorrelated variables, called principal components [8]. LOF measures the density deviation of a given sample with respect to its neighbours and determines whether a point is an outlier by comparing the density of the sample with that of its neighbours. This algorithm is suitable for data with noticeable density differences, but its complexity is high; therefore, it does not scale to big data [9].

In recent years, anomaly detection algorithms based on deep learning have gradually become an effective approach and a focus of academic research [10–22]. The Auto-Encoder (AE) is widely used for anomaly detection due to its excellent deep representation. An AE-based anomaly detection algorithm minimizes the reconstruction error to establish a representation model of normal samples and identifies the samples whose reconstruction error exceeds a threshold as anomalies [19]. Chen et al. proposed a novel quadratic function-based deep convolutional auto-encoder (DCAE) for predicting the remaining useful life (RUL) of bearings [23]. The bearing vibration signals are first preprocessed by low-pass filtering and then fed into the quadratic function-based DCAE neural network. The network generates a bearing Health Indicator (HI) from raw vibration signals that is better suited to RUL prediction than other existing HIs. However, an AE only gives a low-dimensional hidden space representation and cannot learn the characteristics of a sample’s probability distribution. By contrast, a variational auto-encoder (VAE) expresses a specific sample as a distribution of possible samples, so the latent space becomes a continuous distribution space. As a generative model, VAE is therefore competitive in the field of anomaly detection. In 2015, An and Cho [17] demonstrated the feasibility of VAE in unsupervised anomaly detection and applied VAE to network intrusion detection. Reference [11] proposed Donut, an unsupervised anomaly detection model based on VAE. The encoder extracts representative features from Key Performance Indicator (KPI) sequences. The decoder reconstructs the sequence from these features, and anomalies are identified from the deviation between the reconstructed sequence and the original sequence. This model makes full use of the representation ability of deep generative models to model KPI sequences. References [12, 13] discuss how the condition K affects VAE. Depending on the condition, a different latent distribution can be generated and used for detection. Because the anomaly thresholds differ, the model can detect both local and global anomalies. References [24, 25] discuss the unsupervised construction of HIs. Reference [24] presents a new unsupervised HI construction approach, the distribution contact ratio metric health indicator (DCRHI), which represents the degradation process well and yields a uniform failure threshold. Qin et al. proposed a novel degradation-trend-constrained VAE (DTC-VAE) to construct an HI vector with a distinct degradation trend [22]. Compared with other typical unsupervised HI construction methods, this method can more easily determine the uniform failure threshold.

Mechanical devices work under different conditions, such as changes of load, rotating speed, and input power, leading to two challenges in the anomaly detection of a mechanical system. (i) A model learned from a single working condition is not appropriate for a new working condition and may even identify normal samples under another condition as anomalies. (ii) Centralized learning of samples under multiple working conditions may lead to low detection accuracy. Research on anomaly detection under multiple working conditions remains relatively inadequate [4, 26].

The Multiworking Conditions Variational Auto-Encoder (MW-CVAE) is presented in this paper to solve the low detection accuracy under multiple working conditions. MW-CVAE takes the working conditions as the conditional input of VAE, obtains the distribution of samples by centralized learning on normal data, and determines the threshold of anomaly detection for each working condition, increasing the detection accuracy under multiple working conditions.

2. Anomaly Detection of Multiworking Conditions

2.1. Problem Analysis

Figure 1 shows the vibration responses of normal and abnormal cases, with data taken from the JNU dataset (more details on the JNU dataset are given in Section 5.1). Figures 1(a)–1(c) show the vibration of normal cases, and the others show abnormal cases.

Figure 2 illustrates the results of VAE anomaly detection (introduced in Section 3.1). A typical VAE anomaly detection utilizes the reconstruction error ($L_{MSE}$) as the anomaly score to evaluate the abnormal condition. Threshold 1 in Figure 2 represents the best threshold according to the maximum principle of the F1-score (introduced in Section 5.2.3), and normal and abnormal are expressed as 0 (False) and 1 (True), respectively. Anomaly scores that exceed the threshold are considered abnormal. Table 1 compares the detection performance of VAE anomaly detection on the JNU and PU datasets; both the Area Under Curve (AUC) and the F1-score are low, indicating poor anomaly detection performance. The scores of the three normal cases separate into three layers in Figure 2, representing three typical working conditions. If threshold 2 is used as the standard, some abnormal samples (in region 1) may be misclassified as normal. If threshold 3 is used as the standard, all the samples in region 1 and region 2 are misclassified as normal. This phenomenon is called Misclassification Caused by Working Condition Interference, and it is the main reason for the low anomaly detection performance.
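The interference effect can be illustrated with a toy example. The sketch below uses hypothetical anomaly scores (not values from the JNU or PU datasets) in which each working condition has its own score level, and compares the best single global threshold against one threshold per working condition. The function name `count_errors` and all numbers are assumptions made for illustration.

```python
# Toy anomaly scores (hypothetical values, not taken from the paper's datasets).
# Each tuple is (working_condition, anomaly_score, is_abnormal).
samples = [
    (0, 0.10, False), (0, 0.20, True),
    (1, 0.30, False), (1, 0.40, True),
    (2, 0.50, False), (2, 0.60, True),
]

def count_errors(samples, threshold_for):
    """Count misclassified samples; threshold_for maps a condition to its threshold."""
    errors = 0
    for wc, score, is_abnormal in samples:
        predicted_abnormal = score > threshold_for(wc)
        if predicted_abnormal != is_abnormal:
            errors += 1
    return errors

# One global threshold: try every candidate and keep the fewest errors.
candidates = [i / 100 for i in range(1, 100)]
best_global_errors = min(
    count_errors(samples, lambda wc, t=t: t) for t in candidates
)

# One threshold per working condition, placed between that condition's two scores.
per_condition = {0: 0.15, 1: 0.35, 2: 0.55}
per_condition_errors = count_errors(samples, lambda wc: per_condition[wc])

print(best_global_errors, per_condition_errors)
```

Because the normal scores of one condition overlap the abnormal scores of another, no single threshold separates all six samples, whereas condition-specific thresholds do.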

2.2. Idea of Paper

Figure 3 shows the idea of the paper. Ellipses 1 and 2 represent the distributions of the normal samples of two working conditions obtained by learning a generative distribution model, and the ellipse edges represent the classification boundaries. Points 1 and 2 stand for abnormal and normal samples of working condition 1, respectively. Point 3 stands for an abnormal sample of working condition 2. If the two working conditions were not distinguished, two kinds of errors would occur: (1) point 1 is an abnormal sample of working condition 1 but would be considered normal according to the distribution of working condition 2; (2) point 2 is a normal sample of working condition 1 but would be considered abnormal according to the distribution of working condition 2. An anomaly detection method is established to avoid these two kinds of errors, as shown in Figure 3(b), by separating the two working conditions. In the low-dimensional space, the normal samples of each independent working condition are learned to form a Gaussian distribution. After the two working conditions are isolated, points 1 and 2 are compared with the distribution of working condition 1. According to the probability density, point 1 is judged to be an abnormal sample, and point 2 is deemed a normal sample. Similarly, point 3 can also be correctly identified.

By utilizing the ability of a variational auto-encoder (VAE) to learn the data distribution and of a conditional variational auto-encoder (CVAE) to separate the working conditions, a multiworking conditions anomaly detection method, MW-CVAE, is proposed in this paper. Based on the centralized learning of various working conditions, the proposed method establishes the respective distribution characteristics (latent space) of the samples of different working conditions and calculates a dedicated anomaly metric for the samples of each working condition.

The new method can overcome the misclassification problem caused by working condition interference under multiple working conditions.

2.3. Definition of Multiworking Conditions Anomaly Detection

Multiworking conditions anomaly detection is defined as follows. A mechanical system has C kinds of working conditions, and l data samples are collected under each working condition, so the total number of samples for the C working conditions is n = C·l. Each sample is a time series of length J, and the subscript “i” (i = 1, 2, …, n) denotes the ith sample. A model, described by a similarity function, is established to determine the anomaly score of each sample and the corresponding threshold. If the anomaly score is below the threshold, the sample is normal; otherwise, it is abnormal.

3. Methodological Foundation

3.1. Variational Auto-Encoder (VAE)

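VAE, shown in Figure 4, is a deep Bayesian network that establishes a relationship between the visible variable x and the latent variable z, whose prior is usually a multivariate unit Gaussian distribution. VAE simulates the data distribution $p_\theta(x \mid z)$ through a neural network with parameters $\theta$ and takes samples from the hidden layer z. Data that conform to the distribution of x can then be generated by $p_\theta(x \mid z)$. Since the true posterior $p_\theta(z \mid x)$ is intractable by analytic methods, the parameter estimation approach of a VAE, similar to an auto-encoder, approximates it through a simple distribution $q_\phi(z \mid x)$. Referring to [10], the calculation is as follows:

$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right) \tag{1}$$

The training loss of a VAE is

$$L_{VAE} = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right) \tag{2}$$

To calculate the Kullback–Leibler (KL) divergence, the model considers $p_\theta(z)$ and $q_\phi(z \mid x)$ to follow normal distributions, $p_\theta(z) = \mathcal{N}(0, I)$ and $q_\phi(z \mid x) = \mathcal{N}(\mu, \sigma^2 I)$. Here $\mu$ and $\sigma^2$ represent the mean and variance generated by the encoder network $q_\phi(z \mid x)$, and the reparameterization trick is utilized to sample z.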
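As a minimal sketch of the two ingredients mentioned above, the following code computes the closed-form KL divergence between a diagonal Gaussian and a unit Gaussian prior, and draws a latent sample with the reparameterization trick. It assumes the encoder outputs a mean vector and a log-variance vector; the function names `kl_to_standard_normal` and `reparameterize` and the numeric values are illustrative, not taken from the paper.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )

def reparameterize(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness lives in eps, so z remains a deterministic
    function of mu and sigma for a given eps.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]

# A 2-dimensional latent code (assumed size for this example).
mu = [0.5, -1.0]
log_var = [0.0, math.log(0.25)]
print(kl_to_standard_normal(mu, log_var))
print(reparameterize(mu, log_var, random.Random(0)))
```

The KL term is zero exactly when the encoder output equals the prior (zero mean, unit variance) and grows as the encoded distribution moves away from it.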

3.2. Conditional Variational Auto-Encoder (CVAE)

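CVAE, shown in Figure 5, is a conditional directed graphical model in which the input observations modulate the prior on the latent variables that generate the outputs, so that the distribution of a high-dimensional output space is modeled as a generative model conditioned on the input observation [13]. There are three types of variables: the random observable x, the latent variable z (unknown, unobserved), and the condition c (known, observed); z and c are independent random variables. The conditional probability $p_\theta(x \mid z, c)$ is formed by a nonlinear transformation with parameter $\theta$. $q_\phi(z \mid x, c)$ is another nonlinear function that approximates the inference posterior $p_\theta(z \mid x, c)$. The latent variable z allows for modeling multiple modes in the conditional distribution of x given c, making the model sufficient for modeling one-to-many mappings. To estimate $\theta$ and $\phi$, the ELBO in (3) is given as

$$\log p_\theta(x \mid c) \ge \mathbb{E}_{q_\phi(z \mid x, c)}\left[\log p_\theta(x \mid z, c)\right] - D_{KL}\left(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\right) \tag{3}$$

The hidden variable z is sampled as a Gaussian latent variable. Referring to [7], assume $q_\phi(z \mid x, c) = \mathcal{N}(\mu, \sigma^2 I)$ and let $p_\theta(z \mid c)$ be $\mathcal{N}(0, I)$; the second part in (3) then has an analytical solution:

$$\mathcal{L}_{ELBO} = \mathbb{E}_{q_\phi(z \mid x, c)}\left[\log p_\theta(x \mid z, c)\right] + \frac{1}{2}\sum_{k=1}^{K}\left(1 + \log \sigma_k^2 - \mu_k^2 - \sigma_k^2\right) \tag{4}$$

The first item in (4), called the reconstruction error, represents the difference between the input x and the reconstruction $\hat{x}$; the loss function is then expressed as

$$L = \frac{1}{J}\sum_{j=1}^{J}\left(x_j - \hat{x}_j\right)^2 - \frac{1}{2}\sum_{k=1}^{K}\left(1 + \log \sigma_k^2 - \mu_k^2 - \sigma_k^2\right) \tag{5}$$

where J is the length of the input sequence and K is the dimension of $\mu$ and $\sigma$.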
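A loss of this kind can be sketched as follows, assuming a mean squared reconstruction error and a closed-form KL term against a unit Gaussian prior. The sketch also assumes that the condition c enters the encoder and decoder by simple concatenation with x and z, which is a common CVAE construction rather than something stated at this point in the text. The names `cvae_loss` and `with_condition` are hypothetical.

```python
import math

def cvae_loss(x, x_hat, mu, log_var):
    """Mean squared reconstruction error plus closed-form KL to N(0, I)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    kl = -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )
    return mse + kl

def with_condition(vector, one_hot_c):
    """Assumed conditioning by concatenation: encoder sees [x, c], decoder sees [z, c]."""
    return list(vector) + list(one_hot_c)

x = [0.2, 0.4, 0.6]
x_hat = [0.2, 0.5, 0.4]          # hypothetical reconstruction
print(cvae_loss(x, x_hat, mu=[0.0, 0.0], log_var=[0.0, 0.0]))
print(with_condition([0.1, -0.3], [0, 1, 0]))
```

With the latent code equal to the prior, the KL term vanishes and the loss reduces to the reconstruction error alone.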

4. MW-CVAE Anomaly Detection

4.1. Process of Anomaly Detection

The MW-CVAE model is based on CVAE, and its process is shown in Figure 6. The details of detection are described as follows:

(a) Data preprocessing. In the data preprocessing of anomaly detection, the most critical part is normalization. References [8, 16] adopt normalization by mean and variance, namely, $x' = (x - \mu)/\sigma$, where $\mu$ and $\sigma^2$ stand for the mean and variance of the training data. However, this method is only applicable when normal and anomalous samples have the same range. Usually, the range of anomalous samples is larger than that of normal samples. In the training step, only normal samples, which have a relatively small range, are used. In the testing phase, the mean and variance of the testing data may change. To avoid the error caused by the normalization of testing data, this article suggests adopting linear global normalization:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{6}$$

where $x_{\min}$ and $x_{\max}$ represent the static minimum and maximum of the range of all samples, respectively, rather than statistical functions such as $\min(x)$ and $\max(x)$. In practical applications, we choose proper values of $x_{\min}$ and $x_{\max}$ to ensure that all the training and testing samples lie in the range $[x_{\min}, x_{\max}]$.

(b) Network structure. The network structure is shown in Figure 7. The input layer takes the normalized vibration signal X, and the number of nodes is 8192. The working condition c adopts a 30-node one-hot coding mode, which is detailed in part (c). The encoder is composed of two fully connected layers with 400 and 200 nodes, respectively, and the activation function is ReLU. The hidden layer size is 2. The decoder, conversely, uses fully connected layers with 200 and 400 nodes, and the number of nodes in the output layer is 8192.

(c) Encoding of working conditions. The encoder and decoder take the same additional input representing a specific working condition c. The working condition c is a known variable in the training and testing process and may be one or more of the following: (i) the devices run at different speeds; (ii) the devices are under different loads; (iii) the devices produce workpieces with different specifications. The working condition c is encoded in one-hot form. Since the speed of devices is continuous (for example, the rotating speed lies between 200 and 950 rpm), the number of possible values is infinite. In practice, for the sake of safety, energy saving, and high efficiency, only a few working speeds are used, such as low speed (0), medium speed (1), and high speed (2). The load and specification can be encoded similarly. The one-hot encoding of working conditions is shown in Figure 8.

(d) Training process. In the training process, Stochastic Gradient Variational Bayes (SGVB) is used [7]. Since the CVAE model is adopted, each sample is input together with its working condition information, and the normal samples covering all working conditions are used for learning.

(e) Anomaly score. In this paper, we adopt the reconstruction-based approach to evaluate the degree of abnormality [27]. When a sample is abnormal, the reconstruction error is high, indicating a high degree of abnormality. The anomaly score of sample x is expressed as

$$\text{score}(x) = \frac{1}{J}\sum_{j=1}^{J}\left(x_j - \hat{x}_j\right)^2 \tag{7}$$

where $\hat{x}$ is the reconstruction of x and J is the dimension of the input samples.

(f) Determination of anomaly threshold. The method for determining the anomaly threshold in this paper differs significantly from that of a traditional CVAE method. An anomaly detection method based on an anomaly score must determine an anomaly threshold to distinguish whether a sample is normal or abnormal. The selection of the anomaly threshold is the key to the detection performance [27]. Referring to [28], we first obtain a set of test samples with known abnormal labels and compute all their anomaly scores. By drawing the curve of the F1-score (which combines precision and recall) against the threshold, we take the threshold corresponding to the maximum value of the F1-score as the best threshold. However, in the multicondition scenario, due to the Misclassification Caused by Working Condition Interference, the optimal threshold obtained by the above method still cannot guarantee good anomaly detection performance (the problem analyzed in Section 2.1). To prevent this problem, the proposed method uses the idea of working condition separation to learn the best threshold. The method given in Algorithm 1 can greatly improve the accuracy of multiworking condition anomaly detection. The core idea of the algorithm is to group the test samples with abnormal labels by working condition, that is, to learn the optimal threshold for each working condition.

(g) MW-CVAE anomaly detection. After the detection threshold is determined in step (f), Algorithm 2 can be used to determine the abnormal state under a given working condition.
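Steps (a), (c), and (e) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the static bounds `X_MIN` and `X_MAX`, the example signal, and the example reconstruction are hypothetical, and the score is assumed to be the mean squared reconstruction error over the input dimension.

```python
def global_min_max(x, x_min, x_max):
    """Linear global normalization with fixed (static) bounds,
    not the per-batch minimum and maximum of the data."""
    return [(v - x_min) / (x_max - x_min) for v in x]

def one_hot(condition, num_codes=30):
    """One-hot code for a discrete working condition (30 nodes as in the text)."""
    code = [0] * num_codes
    code[condition] = 1
    return code

def anomaly_score(x, x_hat):
    """Reconstruction-based score: mean squared error over the input dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

X_MIN, X_MAX = -10.0, 10.0        # hypothetical static sensor range
signal = [-5.0, 0.0, 5.0, 10.0]   # hypothetical raw vibration values
normalized = global_min_max(signal, X_MIN, X_MAX)
print(normalized)
print(one_hot(1)[:4])
print(anomaly_score(normalized, [0.25, 0.5, 0.75, 0.5]))
```

Because the bounds are fixed in advance, a test sample is scaled exactly as the training samples were, even when its amplitude exceeds anything seen during training.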

4.2. Testing Algorithm
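Algorithm 1
Input: Trained CVAE model; testing data set X = {x1, …, xN}, working condition W = {w1, …, wN}, abnormal label A = {a1, …, aN};
Output: best_threshold: (best_t1, …, best_tC)
 scores ← score(X) # get anomaly scores by equation (7)
 for i = 1 to C
  wi_ind ← index(W = i) # get index of samples under working condition i
  Si ← scores[wi_ind]
  Ai ← A[wi_ind]
  t = 0.01
  while t < 1 do
   Ai_pred ← Si > t
   best_t ← t if F1-score(Ai, Ai_pred) is the largest so far
   t = t + 0.01
  end while
  best_threshold[i] ← best_t # best threshold for working condition i
 end for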

In Step 1, the testing data set X, the working condition W, the abnormal label A, and the trained CVAE model for testing are collected. The number of samples is N.

In Step 2, the optimal threshold is identified for each working condition. We obtain the set of test samples with known abnormal labels and compute the anomaly scores by equation (7). The index of the data under each working condition is recorded as wi_ind. Corresponding to the index, the sample anomaly scores and the true anomaly labels are recorded as Si and Ai, respectively. The normalized threshold t (range: 0 to 1) starts at 0.01 and increases in steps of 0.01. According to Si, Ai, and t, the best threshold (best_t, the value of t that generates the maximum F1-score) is calculated. The above operation is repeated to obtain the optimal threshold under each working condition.
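The per-condition threshold search can be sketched in Python as follows. This is a minimal illustration that assumes the anomaly scores have already been computed and normalized to the range (0, 1); working conditions are indexed from 0 here, and the function names, scores, and labels are hypothetical. The grid is generated from integer steps instead of repeated addition to avoid floating-point drift.

```python
def f1_score(labels, preds):
    """F1-score with abnormal samples treated as the positive class."""
    tp = sum(1 for a, p in zip(labels, preds) if a and p)
    fp = sum(1 for a, p in zip(labels, preds) if not a and p)
    fn = sum(1 for a, p in zip(labels, preds) if a and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_thresholds(scores, conditions, labels, num_conditions, step=0.01):
    """Grid-search one threshold per working condition by maximizing F1."""
    result = {}
    for wc in range(num_conditions):
        idx = [i for i, c in enumerate(conditions) if c == wc]
        s_i = [scores[i] for i in idx]   # scores under this condition
        a_i = [labels[i] for i in idx]   # true labels under this condition
        best_t, best_f1 = None, -1.0
        for k in range(1, round(1 / step)):
            t = k * step
            f1 = f1_score(a_i, [s > t for s in s_i])
            if f1 > best_f1:             # keep the first threshold reaching the maximum
                best_t, best_f1 = t, f1
        result[wc] = best_t
    return result

# Hypothetical normalized scores for two working conditions.
scores     = [0.10, 0.12, 0.20, 0.30, 0.33, 0.40]
conditions = [0, 0, 0, 1, 1, 1]
labels     = [0, 0, 1, 0, 0, 1]
print(best_thresholds(scores, conditions, labels, num_conditions=2))
```

In this toy data the two conditions need clearly different thresholds, which a single search over all six samples could not provide.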

Algorithm 2
Input: sample x and corresponding working condition w
Output: anomaly or not
 score ← score(x) # get anomaly score by equation (7)
 if score < best_threshold(w)
  x is normal
 else
  x is an anomaly
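The decision rule can be written as a small Python function. The threshold values below are hypothetical placeholders for the output of the threshold search, and the name `is_anomaly` is chosen for illustration only.

```python
def is_anomaly(score, condition, best_threshold):
    """A sample is normal only if its score is below its own condition's threshold."""
    return not (score < best_threshold[condition])

# Hypothetical per-condition thresholds.
best_threshold = {0: 0.12, 1: 0.33}

# The same score leads to different decisions under different conditions.
print(is_anomaly(0.20, 0, best_threshold))
print(is_anomaly(0.20, 1, best_threshold))
```

Looking up the threshold by working condition is what distinguishes this rule from a conventional single-threshold detector.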

5. Examples

5.1. Introduction of Dataset

The CWRU dataset provided by the Case Western Reserve University Bearing Data Center [29] is one of the most famous open-source datasets in fault diagnosis research. The CWRU data were collected by accelerometers attached to the housing with magnetic bases. This paper uses the drive-end bearing fault data sampled at 12 kHz, and four working conditions are listed in Table 2.

The JNU dataset is a rolling bearing dataset provided by Jiangnan University [23] and contains four health statuses: normal, inner ring fault, outer ring fault, and roller fault. Accelerometers collect the vibration signal under three rotating speeds, 600 rpm, 800 rpm, and 1000 rpm, at a sampling frequency of 50 kHz. Three typical working conditions are listed in Table 3.

The PU dataset is a rolling bearing dataset provided by Universität Paderborn [24]. The type of rolling bearing is 6203. The faults are divided into practical faults and artificial faults; the latter are discussed in this paper. The artificial faults (crack, spalling, and pitting) are machined by electrical discharge machining, drilling, and electrical engraving. The vibration responses are acceleration signals with a sampling frequency of 48 kHz. The working conditions are divided according to the torque, radial force, and rotating speed. Four typical working conditions in the PU dataset are listed in Table 4.

5.2. Experimental Method and Criterion
5.2.1. Baseline Method

In the experiment, five methods are utilized for comparison. The first two are classical machine learning methods, and the latter three are deep learning methods.

(a) Principal Component Analysis (PCA). After eigendecomposition of the covariance matrix of the samples, the eigenvalues are the variances of the samples projected onto the corresponding axes. A smaller eigenvalue indicates that the samples are concentrated along that axis, so an anomaly deviates more easily there. This deviation can be used as an indicator to distinguish anomalies [8].

(b) Local Outlier Factor (LOF). The LOF method compares the density of given data points to that of their neighbors. Since outliers come from areas of lower density, the density ratio of abnormal data points is higher. The LOF method detects whether the data are normal by comparing the densities of given data points with those of the data points near them [9].

(c) Variational Auto-Encoder (VAE). For a VAE, the reconstruction error is used as the score. The scores are linearly normalized, and the best threshold is determined by the maximum of the F1-score on the testing set. The best threshold is regarded as the criterion for anomaly detection [25, 27].

(d) Conditional Variational Auto-Encoder (CVAE). References [12, 13] introduce anomaly detection by CVAE. For the case of learning over multiple working conditions, best_thr is the single threshold corresponding to the maximum of the F1-score.

(e) Multiworking Condition Anomaly Detection (MW-CVAE). When there are C kinds of working conditions, the data are divided into C independent working conditions, and the threshold determination and anomaly detection are carried out for each independent working condition.

5.2.2. Experimental Parameters

The parameters of the three deep learning models in this article are listed as follows:

(a) VAE. For the encoder, the size of the input layer is 8192. The nodes in the middle layers are 400 and 200, respectively. The size of the hidden layer z is 2. For the decoder, the input size is 2. The nodes in the middle layers are 200 and 400, respectively. The output layer has the same size as the input layer, namely 8192.

(b) CVAE. For the encoder, the size of the input layer is 8192. The nodes in the middle layers are 400 and 200, respectively. The size of the hidden layer z is 2. For the decoder, the input size is 2. The nodes in the middle layers are 200 and 400, respectively. The output layer has the same size as the input layer, namely 8192. The working conditions are encoded as one-hot vectors of length 30. The training adopts Adam optimization, and the learning rate is 10e−3.

(c) MW-CVAE. The parameters are the same as those of CVAE.
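To give a sense of the model size implied by these settings, the following sketch counts the weights and biases of the CVAE's fully connected layers. It rests on two assumptions that the text does not spell out: the 30-bit one-hot condition is concatenated to the encoder input and to the decoder input, and the encoder ends in a single layer that outputs both the mean and the log-variance of the 2-dimensional latent code.

```python
INPUT_DIM, COND_DIM, LATENT_DIM = 8192, 30, 2

# Assumed layout (concatenated condition, joint mean/log-variance head):
# encoder: [x, c] -> 400 -> 200 -> (mu, log_var)
# decoder: [z, c] -> 200 -> 400 -> x_hat
encoder_layers = [(INPUT_DIM + COND_DIM, 400), (400, 200), (200, 2 * LATENT_DIM)]
decoder_layers = [(LATENT_DIM + COND_DIM, 200), (200, 400), (400, INPUT_DIM)]

def count_parameters(layers):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)

print(count_parameters(encoder_layers), count_parameters(decoder_layers))
```

Under these assumptions, almost all parameters sit in the first encoder layer and the last decoder layer, because both connect to the 8192-node signal.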

5.2.3. Criteria

Under the centralized learning of multiple working conditions, the detection accuracy of the samples covering all working conditions and health status is compared to verify the effectiveness of MW-CVAE. The area under curve (AUC), F1-score, and accuracy are treated as indicators for comparing different methods.

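To introduce the performance indexes F1-score and Accuracy, the confusion matrix in Table 5 is introduced. TP is the number of positive (abnormal) samples predicted as positive. FN is the number of positive samples predicted as negative. FP is the number of negative (normal) samples predicted as positive, and TN is the number of negative samples predicted as negative. The formulas of Accuracy, Precision, Recall, and F1-score are shown in equations (8)–(11), respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11}$$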
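As a quick worked example, the four indexes can be computed from confusion matrix counts as shown below. The counts are hypothetical and the function name `metrics` is illustrative; the formulas are the standard definitions of accuracy, precision, recall, and F1-score.

```python
def metrics(tp, fn, fp, tn):
    """Return (accuracy, precision, recall, f1) from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 50 abnormal samples (40 detected), 50 normal samples (30 passed).
accuracy, precision, recall, f1 = metrics(tp=40, fn=10, fp=20, tn=30)
print(accuracy, precision, recall, f1)
```

Note that the F1-score ignores TN, so it is less flattering than accuracy when normal samples dominate the test set.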

5.3. Results of Anomaly Detection
5.3.1. Performance

The anomaly detection results of MW-CVAE and CVAE are compared on the JNU dataset, as shown in Figure 9. Figures 9(a)–9(c) visualize anomaly detection by MW-CVAE under wc = 0, wc = 1, and wc = 2, and Figure 9(d) shows that by CVAE under wc = [0, 1, 2]. The blue dashed line marks the position of the best threshold. In Figure 9(d), there is stratification among the three normal working conditions, indicating that the three working conditions are at different anomaly score levels (green “+” points in Figure 9(d)). Figures 9(a)–9(c) show the distributions of the best threshold and samples. Due to the separation of working conditions, the interference of working conditions is avoided.

Table 6 compares the results of MW-CVAE and CVAE. The method proposed in this paper achieves a significant improvement: accuracy is improved by 0.16–0.1645, AUC is improved by 0.1139–0.1154, and F1-score is improved by 0.1815–0.1918. The method for determining the anomaly threshold presented in this paper differs from that of the CVAE method: MW-CVAE learns the best threshold according to the idea of working condition separation and thereby avoids the interference of working conditions.

Figure 10 compares the receiver operating characteristic curve (ROC) using VAE and CVAE on the JNU dataset. The AUC of CVAE is higher than that of VAE. Also, the AUC of MW-CVAE is higher than that of MW-VAE. The reason is that there is an interference between different working conditions using VAE.

5.3.2. Comparison of Different Datasets

Table 7 compares the F1-score and AUC of the methods on the CWRU, JNU, and PU datasets. Except for the AUC of the PCA method on the PU dataset, the MW-CVAE method achieves the best F1-score and AUC values on all datasets compared with the other methods.

5.4. Effects of Parameters
5.4.1. Effects of Latent Space

Figure 11 shows the distribution of the training samples in the latent space at different epochs when the hidden layer size is 2. The red, blue, and green points represent the training samples of three different working conditions. As the epochs progress, the network learning gradually converges, and the final distribution of each working condition forms a normal distribution of its own. The mean and variance differ for each working condition, which is consistent with the assumption of CVAE.

5.4.2. Effects of the Size of Latent Layers

Figure 12 shows the F1-score on the JNU and PU datasets under different sizes K of the hidden layer. When K increases from 2 to 200, the F1-score on the JNU dataset fluctuates slightly for K between 2 and 10. In other cases, the F1-score remains almost the same. This indicates that the size of the hidden layer has little effect on MW-CVAE accuracy.

6. Conclusion

Aiming at the problem that the accuracy of mechanical system anomaly detection is significantly reduced under multiple working conditions, an MW-CVAE is proposed in the paper. The working condition is encoded as conditional input to establish the anomaly detection model. Compared with the typical CVAE model, each working condition has a corresponding best threshold of anomaly detection in the MW-CVAE model, which improves the detection. For instance, the AUC increases by 11-12%, and the F1-score increases by 18-19%. The method proposed in the paper can be applied to the anomaly detection of discrete working conditions in the industry. According to the distribution of multiple working conditions in the latent space of the MW-CVAE model, each working condition tends to the normal distribution after the convergence of learning, providing a basis for classification by working conditions and anomaly detection.

Data Availability

The data used in this study can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by a Special Project of National Key Research and Development Program-Research of China on the technology of whole-life-cycle detection, monitoring and integrity evaluation of manned equipment in amusement parks and scenic spots (Project no.: 2016YFF0203100).